Titu Andreescu

Essential Linear Algebra with Applications
A Problem-Solving Approach
Titu Andreescu
Natural Sciences and Mathematics
University of Texas at Dallas
Richardson, TX, USA
ISBN 978-0-8176-4360-7
ISBN 978-0-8176-4636-3 (eBook)
DOI 10.1007/978-0-8176-4636-3
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2014948201
Mathematics Subject Classification (2010): 15, 12, 08
© Springer Science+Business Media New York 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.birkhauser-science.com)
Preface
This textbook is intended for an introductory course followed by an advanced course in
linear algebra, with emphasis on its interactions with other topics in mathematics,
such as calculus, geometry, and combinatorics. We took a straightforward path to
the most important topic, linear maps between vector spaces, most of the time finite
dimensional. However, since these concepts are fairly abstract and not necessarily
natural at first sight, we included a few chapters with explicit examples of vector
spaces such as the standard n-dimensional vector space over a field and spaces of
matrices. We believe that it is fundamental for the student to be very familiar with
these spaces before dealing with more abstract theory. In order to maximize the
clarity of the concepts discussed, we included a rather lengthy chapter on $2 \times 2$
matrices and their applications, including the theory of Pell’s equations. This will
help the student manipulate matrices and vectors in a concrete way before delving
into the abstract and very powerful approach to linear algebra through the study of
vector spaces and linear maps.
The first few chapters deal with elementary properties of vectors and matrices
and the basic operations that one can perform on them. A special emphasis is
placed on the Gaussian Reduction algorithm and its applications. This algorithm
provides efficient ways of computing some of the objects that appear naturally in
abstract linear algebra such as kernels and images of linear maps, dimensions of
vector spaces, and solutions to linear systems of equations. A student mastering
this algorithm and its applications will therefore have a much better chance of
understanding many of the key notions and results introduced in subsequent
chapters.
The bulk of the book contains a comprehensive study of vector spaces and linear
maps between them. We introduce and develop the necessary tools along the way,
by discussing the many examples and problems proposed to the student. We offer a
thorough exposition of central concepts in linear algebra through a problem-based
approach. This is more challenging for the students, since they have to spend time
trying to solve the proposed problems after reading and digesting the theoretical
material. In order to assist with the comprehension of the material, we provided
solutions to all problems posed in the theoretical part. On the other hand, at the
end of each chapter, the student will find a rather long list of proposed problems,
for which no solution is offered. This is because they are similar to the problems
discussed in the theoretical part and thus should not cause difficulties to a reader
who understood the theory.
We truly hope that you will have a wonderful experience in your linear algebra
journey.
Richardson, TX, USA
Titu Andreescu
Contents

1 Matrix Algebra
   1.1 Vectors, Matrices, and Basic Operations on Them
      1.1.1 Problems for Practice
   1.2 Matrices as Linear Maps
      1.2.1 Problems for Practice
   1.3 Matrix Multiplication
      1.3.1 Problems for Practice
   1.4 Block Matrices
      1.4.1 Problems for Practice
   1.5 Invertible Matrices
      1.5.1 Problems for Practice
   1.6 The Transpose of a Matrix
      1.6.1 Problems for Practice

2 Square Matrices of Order 2
   2.1 The Trace and the Determinant Maps
      2.1.1 Problems for Practice
   2.2 The Characteristic Polynomial and the Cayley–Hamilton Theorem
      2.2.1 Problems for Practice
   2.3 The Powers of a Square Matrix of Order 2
      2.3.1 Problems for Practice
   2.4 Application to Linear Recurrences
      2.4.1 Problems for Practice
   2.5 Solving the Equation $X^n = A$
      2.5.1 Problems for Practice
   2.6 Application to Pell’s Equations
      2.6.1 Problems for Practice

3 Matrices and Linear Equations
   3.1 Linear Systems: The Basic Vocabulary
      3.1.1 Problems for Practice
   3.2 The Reduced Row-Echelon Form and Its Relevance to Linear Systems
      3.2.1 Problems for Practice
   3.3 Solving the System $AX = b$
      3.3.1 Problems for Practice
   3.4 Computing the Inverse of a Matrix
      3.4.1 Problems for Practice

4 Vector Spaces and Subspaces
   4.1 Vector Spaces: Definition, Basic Properties, and Examples
      4.1.1 Problems for Practice
   4.2 Subspaces
      4.2.1 Problems for Practice
   4.3 Linear Combinations and Span
      4.3.1 Problems for Practice
   4.4 Linear Independence
      4.4.1 Problems for Practice
   4.5 Dimension Theory
      4.5.1 Problems for Practice

5 Linear Transformations
   5.1 Definitions and Objects Canonically Attached to a Linear Map
      5.1.1 Problems for Practice
   5.2 Linear Maps and Linearly Independent Sets
      5.2.1 Problems for Practice
   5.3 Matrix Representation of Linear Transformations
      5.3.1 Problems for Practice
   5.4 Rank of a Linear Map and Rank of a Matrix
      5.4.1 Problems for Practice

6 Duality
   6.1 The Dual Basis
      6.1.1 Problems for Practice
   6.2 Orthogonality and Equations for Subspaces
      6.2.1 Problems for Practice
   6.3 The Transpose of a Linear Transformation
      6.3.1 Problems for Practice
   6.4 Application to the Classification of Nilpotent Matrices
      6.4.1 Problems for Practice

7 Determinants
   7.1 Multilinear Maps
      7.1.1 Problems for Practice
   7.2 Determinant of a Family of Vectors, of a Matrix, and of a Linear Transformation
      7.2.1 Problems for Practice
   7.3 Main Properties of the Determinant of a Matrix
      7.3.1 Problems for Practice
   7.4 Computing Determinants in Practice
      7.4.1 Problems for Practice
   7.5 The Vandermonde Determinant
      7.5.1 Problems for Practice
   7.6 Linear Systems and Determinants
      7.6.1 Problems for Practice

8 Polynomial Expressions of Linear Transformations and Matrices
   8.1 Some Basic Constructions
      8.1.1 Problems for Practice
   8.2 The Minimal Polynomial of a Linear Transformation or Matrix
      8.2.1 Problems for Practice
   8.3 Eigenvectors and Eigenvalues
      8.3.1 Problems for Practice
   8.4 The Characteristic Polynomial
      8.4.1 Problems for Practice
   8.5 The Cayley–Hamilton Theorem
      8.5.1 Problems for Practice

9 Diagonalizability
   9.1 Upper-Triangular Matrices, Once Again
      9.1.1 Problems for Practice
   9.2 Diagonalizable Matrices and Linear Transformations
      9.2.1 Problems for Practice
   9.3 Some Applications of the Previous Ideas
      9.3.1 Problems for Practice

10 Forms
   10.1 Bilinear and Quadratic Forms
      10.1.1 Problems for Practice
   10.2 Positivity, Inner Products, and the Cauchy–Schwarz Inequality
      10.2.1 Practice Problems
   10.3 Bilinear Forms and Matrices
      10.3.1 Problems for Practice
   10.4 Duality and Orthogonality
      10.4.1 Problems for Practice
   10.5 Orthogonal Bases
      10.5.1 Problems for Practice
   10.6 The Adjoint of a Linear Transformation
      10.6.1 Problems for Practice
   10.7 The Orthogonal Group
      10.7.1 Problems for Practice
   10.8 The Spectral Theorem for Symmetric Linear Transformations and Matrices
      10.8.1 Problems for Practice

11 Appendix: Algebraic Prerequisites
   11.1 Groups
   11.2 Permutations
      11.2.1 The Symmetric Group $S_n$
      11.2.2 Transpositions as Generators of $S_n$
      11.2.3 The Signature Homomorphism
   11.3 Polynomials

References
Chapter 1
Matrix Algebra

Abstract This chapter deals with matrices and the basic operations associated with them in a concrete way, paving the path to a more advanced study in later chapters. The emphasis is on special types of matrices and their stability under the described operations.

Keywords Matrices • Operations • Invertible • Transpose • Orthogonal • Symmetric matrices
Before dealing with the abstract setup of vector spaces and linear maps between
them, we find it convenient to discuss some properties of matrices. Matrices are a
very handy way of describing linear phenomena while being very concrete objects.
The goal of this chapter is to define these objects as well as some basic operations
on them.
Roughly, a matrix is a collection of “numbers” displayed in some rectangular
board. We call these “numbers” the entries of the matrix. Very often, these “numbers” are simply rational, real, or more generally complex numbers. However, these
choices are not always adapted to our needs: in combinatorics and computer science,
one works very often with matrices whose entries are residue classes of integers
modulo prime numbers (especially modulo 2 in computer science), while other
areas of mathematics work with matrices whose entries are polynomials, rational
functions, or more generally continuous, differentiable, or integrable functions.
There are rules that allow one to add and multiply matrices (if suitable conditions on the
sizes of the matrices are satisfied), provided that the set containing the entries of these matrices
is stable under these operations. Fields are algebraic structures specially designed to
have such properties (and more…), and from this point of view they are excellent
choices for the sets containing the entries of the matrices we want to study.
The theory of fields is extremely beautiful and one can write a whole series of
books on it. Even the basics can be fairly difficult to digest for a reader without some
serious abstract algebra prerequisites. However, the purpose of this introductory
book is not to deal with subtleties related to the theory of fields, so we decided
to take the following rather pragmatic approach: we will only work with a very
explicit set of fields in this book (we will say which ones in the next paragraphs), so
the reader not familiar with abstract algebra will not need to know the subtleties of
the theory of fields in the sequel. Of course, the reader familiar with this theory will
realize that all the general results described in this book work over general fields.
In most introductory books on linear algebra, one works exclusively over the
fields $\mathbb{R}$ and $\mathbb{C}$ of real numbers and complex numbers, respectively. They are indeed
sufficient for essentially all applications of matrices to analysis and geometry, but
they are not sufficient for some interesting applications in computer science and
combinatorics. We will introduce one more field that will be used from time to time
in this book. This is the field $\mathbb{F}_2$ with two elements 0 and 1. It is endowed with
addition and multiplication rules as follows:

$$0 + 0 = 0, \qquad 0 + 1 = 1 + 0 = 1, \qquad 1 + 1 = 0$$

and

$$0 \cdot 0 = 0 \cdot 1 = 1 \cdot 0 = 0, \qquad 1 \cdot 1 = 1.$$
We do not limit ourselves exclusively to $\mathbb{R}$ and $\mathbb{C}$, since a certain number of issues
arise from time to time when working with general fields, and the field $\mathbb{F}_2$ allows
us to make a series of remarks about these issues. From this point of view, one can
see $\mathbb{F}_2$ as a test object for some subtle issues arising in linear algebra over general
fields.
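As a quick computational illustration (a Python sketch of ours, not part of the original text), the arithmetic of $\mathbb{F}_2$ is just integer arithmetic reduced modulo 2, so the two tables above can be verified mechanically:

```python
# A minimal sketch of F_2: the elements are 0 and 1, operations are taken mod 2.
def add_f2(a, b):
    return (a + b) % 2

def mul_f2(a, b):
    return (a * b) % 2

# Reproduce the addition and multiplication tables stated above.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {add_f2(a, b)}   {a} * {b} = {mul_f2(a, b)}")

# In particular 1 + 1 = 0 in F_2, unlike in Q, R, or C.
assert add_f2(1, 1) == 0
```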
Important convention: in the remainder of this book, we will work exclusively with one of the following fields:

• the field $\mathbb{Q}$ of rational numbers;
• the field $\mathbb{R}$ of real numbers;
• the field $\mathbb{C}$ of complex numbers;
• the field $\mathbb{F}_2$ with two elements, with the addition and multiplication rules described above.

We will assume familiarity with each of the sets $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, as well as the
basic operations that can be done with rational, real, or complex numbers (such as
addition, multiplication, or division by nonzero numbers).
We will reserve the letter $F$ for one of these fields (if we do not want to
specify which one of the previous fields we are working with, we will simply say
“Let $F$ be a field”). The even more pragmatic reader can simply assume that $F$
stands for $\mathbb{R}$ or $\mathbb{C}$ in the sequel.
1.1 Vectors, Matrices, and Basic Operations on Them
Consider a field $F$. Its elements will be called scalars.

Definition 1.1. Let $n$ be a positive integer. We denote by $F^n$ the set of $n$-tuples
of elements of $F$. The elements of $F^n$ are called vectors and are denoted either in
row form $X = (x_1, \ldots, x_n)$ or in column form

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

The scalar $x_i$ is called the $i$th coordinate of $X$ (be it written in row or column form).
The previous definition requires quite a few clarifications. First of all, note that
if we want to be completely precise, we should call an element of $F^n$ an $n$-vector
or $n$-dimensional vector, to make it apparent that it lives in a set which depends
on $n$. This would make a lot of statements fairly cumbersome, so we simply call the
elements of $F^n$ vectors, without any reference to $n$. So $(1)$ is a vector in $F^1$, while
$(1, 2)$ is a vector in $F^2$. There is no relation whatsoever between the two exhibited
vectors, as they live in completely different sets a priori.
While the abuse of notation discussed in the previous paragraph is rather easy
to understand and accept, the convention about writing vectors either in row or in
column form seems strange at first sight. It is easily understood once we introduce
matrices and basic operations on them, as well as the link between matrices and
vectors, so we advise the reader to take it simply as a convention for now and make
no distinction between the vector $(v_1, \ldots, v_n)$ and the vector

$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$

We will see later on that from the point of view of linear algebra the column notation is more useful.
The zero vector in $F^n$ is denoted simply 0, and it is the vector whose coordinates
are all equal to 0. Note that the notation 0 is again slightly abusive, since it does
not make apparent the dependency on $n$: the zero vector in $F^2$ is definitely not the
same object as the zero vector in $F^3$. However, this will (hopefully) not create any
confusion, since in the sequel the context will always make it clear which zero vector
we consider.
Definition 1.2. Let $m, n$ be positive integers. An $m \times n$ matrix with entries in $F$
is a rectangular array

$$A = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{bmatrix}.$$

The scalar $a_{ij} \in F$ is called the $(i,j)$-entry of $A$. The column vector

$$C_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$

is called the $j$th column of $A$, and the row vector $L_i = [a_{i1}, a_{i2}, \ldots, a_{in}]$ is called the $i$th row of $A$. We denote by $M_{m,n}(F)$ the set of all $m \times n$ matrices with entries in $F$.

Definition 1.3. A square matrix of order $n$ with entries in $F$ is a matrix $A \in M_{n,n}(F)$. We denote by $M_n(F)$ the set $M_{n,n}(F)$ of square matrices of order $n$.
We can already give an explanation for our choice of denoting vectors in two
different ways: an $m \times n$ matrix can be seen as a family of vectors, namely its rows.
But it can also be seen as a family of vectors given by its columns. It is rather natural
to denote rows of $A$ in row form and columns of $A$ in column form. Note that a row
vector in $F^n$ can be thought of as a $1 \times n$ matrix, while a column vector in $F^n$ can
be thought of as an $n \times 1$ matrix. From now on, whenever we write a vector as a row
vector, we think of it as a matrix with one row, while when we write it in column
form, we think of it as a matrix with one column.
Remark 1.4. If $F_1 \subset F$ are fields, then we have a natural inclusion $M_{m,n}(F_1) \subset M_{m,n}(F)$: any matrix with entries in $F_1$ is naturally a matrix with entries in $F$. For
instance, the inclusions $\mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$ induce inclusions of the corresponding sets of
matrices, i.e.,

$$M_{m,n}(\mathbb{Q}) \subset M_{m,n}(\mathbb{R}) \subset M_{m,n}(\mathbb{C}).$$

Whenever it is convenient, matrices in $M_{m,n}(F)$ will be denoted symbolically
by capital letters $A, B, C, \ldots$ or by $[a_{ij}], [b_{ij}], [c_{ij}], \ldots$, where $a_{ij}, b_{ij}, c_{ij}, \ldots$,
respectively, represent the entries of the matrices.
Example 1.5. a) The matrix $A = [a_{ij}] \in M_{2,3}(\mathbb{Q})$, where $a_{ij} = i^2 + j$, is given by

$$A = \begin{bmatrix} 2 & 3 & 4 \\ 5 & 6 & 7 \end{bmatrix}.$$

b) The matrix

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \end{bmatrix}$$

can also be written as the matrix $A = [a_{ij}] \in M_4(\mathbb{Q})$ with $a_{ij} = i + j - 1$.
Remark 1.6. Two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are equal if and only if they
have the same size (i.e., the same number of columns and rows) and $a_{ij} = b_{ij}$ for
all pairs $(i, j)$.
A certain number of matrices will appear rather constantly throughout the book,
and we would like to make a list of them. First of all, we have the zero $m \times n$ matrix,
that is, the matrix all of whose entries are equal to 0. Equivalently, it is the matrix
all of whose rows are the zero vector in $F^n$, or the matrix all of whose columns are
the zero vector in $F^m$. This matrix is denoted $O_{m,n}$ or, if the context is clear, simply
0 (in this case, the context will make it clear that 0 is the zero matrix and not the
element $0 \in F$).

Another extremely important matrix is the unit (or identity) matrix $I_n \in M_n(F)$, defined by

$$I_n = \begin{bmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{bmatrix},$$

with entries

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$
Among the special but important classes of matrices that we will have to deal
with quite often in the sequel, we mention:

• The diagonal matrices. These are square matrices $A = [a_{ij}]$ such that $a_{ij} = 0$
unless $i = j$. The typical shape of a diagonal matrix is therefore

$$\begin{bmatrix} a_1 & 0 & \ldots & 0 \\ 0 & a_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & a_n \end{bmatrix}.$$

• The upper-triangular matrices. These are square matrices $A = [a_{ij}]$ whose
entries below the main diagonal are zero, that is, $a_{ij} = 0$ whenever $i > j$. Hence
the typical shape of an upper-triangular matrix is

$$A = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ 0 & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & a_{nn} \end{bmatrix}.$$

Of course, one can also define lower-triangular matrices as those square matrices
whose entries above the main diagonal are zero.
We will deal now with the basic operations on matrices. Two matrices of the
same size $m \times n$ can be added together to produce another matrix of the same size.
The addition is done component-wise. The re-scaling of a matrix by a scalar is done
by multiplying each entry by that scalar. The obtained matrix has the same size as
the original one. More formally:

Definition 1.7. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be matrices in $M_{m,n}(F)$ and let $c \in F$
be a scalar.

a) The sum $A + B$ of the matrices $A$ and $B$ is the matrix

$$A + B = [a_{ij} + b_{ij}].$$

In fully expanded form,

$$\begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \ldots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \ldots & a_{2n}+b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \ldots & a_{mn}+b_{mn} \end{bmatrix}.$$

b) The re-scaling of $A$ by $c$ is the matrix

$$cA = [c a_{ij}].$$
Remark 1.8. a) We insist on the fact that it does not make sense to add two
matrices if they do not have the same size.
b) We also write $-A$ instead of $(-1)A$; thus we write $A - B$ instead of $A + (-1)B$,
if $A$ and $B$ have the same size.
Example 1.9. We have

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 3 & 4 & 4 \\ 2 & 3 & 5 & 6 \end{bmatrix},$$

but

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + I_3$$

does not make sense. As another example, we have

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}$$

in $M_{3,4}(\mathbb{R})$. On the other hand, we have the following equality in $M_{3,4}(\mathbb{F}_2)$:

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$
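The last two sums can be checked mechanically; here is a minimal Python sketch (ours, not the book’s), where working over $\mathbb{F}_2$ simply means reducing each entry modulo 2:

```python
# Entrywise matrix addition; over F_2 we additionally reduce mod 2.
def mat_add(A, B, mod=None):
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes must match"
    C = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    if mod is not None:
        C = [[x % mod for x in row] for row in C]
    return C

A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
B = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]

print(mat_add(A, B))         # over R:   [[1,2,1,1],[0,1,2,1],[0,0,1,2]]
print(mat_add(A, B, mod=2))  # over F_2: [[1,0,1,1],[0,1,0,1],[0,0,1,0]]
```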
As we observed in the previous section, we can think of column vectors in $F^n$ as
$n \times 1$ matrices, thus we can define addition and re-scaling for vectors by using the
above definition for matrices. Explicitly, we have

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} := \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

and, for a scalar $c \in F$,

$$c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}.$$

Similarly, we can define operations on row vectors by thinking of them as matrices
with only one row.
Remark 1.10. a) Again, it makes sense to add two vectors if and only if they
have the same number of coordinates. So it is nonsense to add a vector in $F^2$
and a vector in $F^3$.
b) Similarly, we let $-X$ be the vector $(-1)X$ and, if $X, Y \in F^n$, we let $X - Y = X + (-Y)$.

The following result follows from the basic properties of the addition and multiplication rules in a field. We leave the formal proof to the reader.
Proposition 1.11. For any matrices $A, B, C \in M_{m,n}(F)$ and any scalars $\alpha, \beta \in F$
we have

(A1) $(A + B) + C = A + (B + C)$ (associativity of the addition);
(A2) $A + B = B + A$ (commutativity of the addition);
(A3) $A + O_{m,n} = O_{m,n} + A = A$ (neutrality of $O_{m,n}$);
(A4) $A + (-A) = (-A) + A = O_{m,n}$ (cancellation with the opposite matrix);
(S1) $(\alpha + \beta)A = \alpha A + \beta A$ (distributivity of the re-scaling over scalar sums);
(S2) $\alpha(A + B) = \alpha A + \alpha B$ (distributivity of the re-scaling over matrix sums);
(S3) $\alpha(\beta A) = (\alpha \beta)A$ (homogeneity of the scalar product);
(S4) $1 \cdot A = A$ (neutrality of 1).
Since vectors in $F^n$ are the same thing as $n \times 1$ matrices (or $1 \times n$ matrices,
according to our convention of representing vectors), the previous proposition
implies that the properties (A1)–(A4) and (S1)–(S4) are also satisfied by vectors
in $F^n$. Of course, this can also be checked directly from the definitions.
Definition 1.12. The canonical basis (or standard basis) of $F^n$ is the $n$-tuple of
vectors $(e_1, \ldots, e_n)$, where

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$

Thus $e_i$ is the vector in $F^n$ whose $i$th coordinate equals 1 and all other coordinates
are equal to 0.
Remark 1.13. Observe that the meaning of $e_i$ depends on the context. For example,
if we think of $e_1$ as the first standard basis vector in $F^2$, then $e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, but if we
think of it as the first standard basis vector in $F^3$, then $e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$. It is customary not
to introduce extra notation to distinguish such situations but to rely on the context
in deciding on the meaning of $e_i$.
The following result follows directly by unwinding definitions:

Proposition 1.14. Any vector $v \in F^n$ can be uniquely written as

$$v = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n$$

for some scalars $x_1, \ldots, x_n \in F$. In fact, $x_1, \ldots, x_n$ are precisely the coordinates
of $v$.

Proof. If $x_1, \ldots, x_n$ are scalars, then by definition

$$x_1 e_1 + x_2 e_2 + \ldots + x_n e_n = \begin{bmatrix} x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ \vdots \\ 0 \end{bmatrix} + \ldots + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

The result follows. □
We have similar results for matrices:

Definition 1.15. Let $m, n$ be positive integers. For $1 \le i \le m$ and $1 \le j \le n$,
consider the matrix $E_{ij} \in M_{m,n}(F)$ whose $(i,j)$-entry equals 1 and all other entries
are 0.

The $mn$-tuple $(E_{11}, \ldots, E_{1n}, E_{21}, \ldots, E_{2n}, \ldots, E_{m1}, \ldots, E_{mn})$ is called the
canonical basis (or standard basis) of $M_{m,n}(F)$.

Proposition 1.16. Any matrix $A \in M_{m,n}(F)$ can be uniquely expressed as

$$A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} E_{ij}$$

for some scalars $a_{ij}$. In fact, $a_{ij}$ is the $(i,j)$-entry of $A$.

Proof. As in the proof of Proposition 1.14, one checks that for any scalars $x_{ij} \in F$
we have

$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} E_{ij} = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{bmatrix},$$

which yields the desired result. □
Example 1.17. Let us express the matrix

$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 2 & 2 \end{bmatrix}$$

in terms of the canonical basis. We have

$$A = E_{11} + E_{12} + E_{22} + E_{23} + 2E_{33} + 2E_{34}.$$
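Here is a brief Python sketch (ours, with illustrative helper names) that rebuilds the matrix of Example 1.17 from the canonical basis, confirming the decomposition above:

```python
# E(i, j): the m x n matrix with a 1 in position (i, j) (1-indexed) and 0 elsewhere.
def E(i, j, m=3, n=4):
    M = [[0] * n for _ in range(m)]
    M[i - 1][j - 1] = 1
    return M

def mat_sum(*mats):
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(M[r][c] for M in mats) for c in range(cols)] for r in range(rows)]

def scale(c, M):
    return [[c * x for x in row] for row in M]

# A = E11 + E12 + E22 + E23 + 2*E33 + 2*E34, as in Example 1.17.
A = mat_sum(E(1, 1), E(1, 2), E(2, 2), E(2, 3), scale(2, E(3, 3)), scale(2, E(3, 4)))
assert A == [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 2, 2]]
```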
1.1.1 Problems for Practice

1. Write down explicitly the entries of the matrix $A = [a_{ij}] \in M_{2,3}(\mathbb{R})$ in each of
the following cases:

   a) $a_{ij} = \frac{1}{i+j-1}$.
   b) $a_{ij} = i + 2j$.
   c) $a_{ij} = ij$.

2. For each of the following pairs of matrices $(A, B)$, explain which of the matrices
$A + B$ and $A - 2B$ make sense, and compute these matrices whenever they do
make sense:

   a) $A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix}$.
   b) $A = \begin{bmatrix} 1 & 1 & 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}$.
   c) $A = \begin{bmatrix} 3 & 1 & 0 \\ 1 & 1 & 1 \\ 2 & 0 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 & 0 \\ 4 & 1 & 1 \\ 6 & 4 & 3 \end{bmatrix}$.
3. Consider the vectors

$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 1 \\ 4 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 2 \\ 2 \\ 1 \\ 4 \\ 3 \end{bmatrix}.$$

What are the coordinates of the vector $v_1 + 2v_2$?

4. Express the matrix $A = \begin{bmatrix} 3 & 1 & 0 & 4 \\ 7 & 1 & 1 & 2 \\ 8 & 9 & 5 & 3 \end{bmatrix}$ in terms of the canonical basis of
$M_{3,4}(\mathbb{R})$.

5. Let $(E_{ij})_{1 \le i \le 2,\, 1 \le j \le 3}$ be the canonical basis of $M_{2,3}(\mathbb{R})$. Describe the entries of
the matrix $E_{11} - 3E_{12} + 4E_{23}$.
6. Let $F$ be a field.

   a) Prove that if $A, B \in M_n(F)$ are diagonal matrices, then $A + cB$ is a diagonal
matrix for any $c \in F$.
   b) Prove that the same result holds if we replace diagonal with upper-triangular.
   c) Prove that any matrix $A \in M_n(F)$ can be written as the sum of an upper-triangular matrix and a lower-triangular matrix. Is there a unique such
writing?

7. a) How many distinct matrices are there in $M_{m,n}(\mathbb{F}_2)$?
   b) How many of these matrices are diagonal?
   c) How many of these matrices are upper-triangular?
1.2 Matrices as Linear Maps

In this section we will explain how to see a matrix as a map on vectors. Let $F$ be
a field and let $A \in M_{m,n}(F)$ be a matrix with entries $a_{ij}$. To each vector $X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in F^n$ we associate a new vector $AX \in F^m$ defined by

$$AX = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n \end{bmatrix}.$$

We obtain therefore a map $F^n \to F^m$ which sends $X$ to $AX$.
Example 1.18. The map associated with the matrix

$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \in M_{3,4}(\mathbb{R})$$

is the map $f : \mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$f\left(\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}\right) = A \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} x + y \\ x + y + z \\ z + t \end{bmatrix}.$$

In terms of row vectors we have

$$f(x, y, z, t) = (x + y, x + y + z, z + t).$$
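Since the $i$th coordinate of $AX$ is the dot product of the $i$th row of $A$ with $X$, the map of Example 1.18 is easy to reproduce in code; the following Python sketch (ours, not from the text) checks it:

```python
# Apply the linear map attached to A: the i-th coordinate of AX is
# the dot product of the i-th row of A with X.
def apply_matrix(A, X):
    assert all(len(row) == len(X) for row in A), "A must have len(X) columns"
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
x, y, z, t = 1, 2, 3, 4
assert apply_matrix(A, [x, y, z, t]) == [x + y, x + y + z, z + t]
```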
Remark 1.19. Consider the canonical basis $e_1, \ldots, e_n$ of $F^n$. Then by definition, for
all $1 \le i \le n$,

$$Ae_i = C_i = \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{bmatrix},$$

the $i$th column of $A$. In general, if $X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in F^n$ is any vector, then

$$AX = x_1 C_1 + x_2 C_2 + \ldots + x_n C_n,$$

as follows directly from the definition of $AX$.
The key properties of this correspondence are summarized in the following:

Theorem 1.20. For all matrices $A, B \in M_{m,n}(F)$, all vectors $X, Y \in F^n$, and all
scalars $\alpha, \beta \in F$ we have

a) $A(\alpha X + \beta Y) = \alpha AX + \beta AY$.
b) $(\alpha A + \beta B)X = \alpha AX + \beta BX$.
c) If $AX = BX$ for all $X \in F^n$, then $A = B$.

Proof. Writing $A = [a_{ij}]$, $B = [b_{ij}]$, and $X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$, $Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$, we have
$\alpha A + \beta B = [\alpha a_{ij} + \beta b_{ij}]$ and

$$\alpha X + \beta Y = \begin{bmatrix} \alpha x_1 + \beta y_1 \\ \alpha x_2 + \beta y_2 \\ \vdots \\ \alpha x_n + \beta y_n \end{bmatrix}.$$

a) By definition, the $i$th coordinate of $A(\alpha X + \beta Y)$ is

$$\sum_{j=1}^{n} a_{ij}(\alpha x_j + \beta y_j) = \alpha \sum_{j=1}^{n} a_{ij} x_j + \beta \sum_{j=1}^{n} a_{ij} y_j.$$

The right-hand side is the $i$th coordinate of $\alpha AX + \beta AY$, giving the desired
result.

b) The argument is identical: the equality is equivalent to

$$\sum_{j=1}^{n} (\alpha a_{ij} + \beta b_{ij}) x_j = \alpha \sum_{j=1}^{n} a_{ij} x_j + \beta \sum_{j=1}^{n} b_{ij} x_j,$$

which is clear.

c) By hypothesis we have $Ae_i = Be_i$, where $e_1, \ldots, e_n$ is the canonical basis of
$F^n$. Then Remark 1.19 shows that the $i$th column of $A$ equals the $i$th column of
$B$ for $1 \le i \le n$, which is enough to conclude that $A = B$. □
We obtain therefore an injective map $A \mapsto (X \mapsto AX)$ from $M_{m,n}(F)$ to the set
of maps $\varphi : F^n \to F^m$ which satisfy

$$\varphi(\alpha X + \beta Y) = \alpha \varphi(X) + \beta \varphi(Y)$$

for all $X, Y \in F^n$ and $\alpha, \beta \in F$. Such a map $\varphi : F^n \to F^m$ is called linear. Note
that a linear map necessarily satisfies $\varphi(0) = 0$ (take $\alpha = \beta = 0$ in the previous
relation), hence this notion is different from the convention used in some other areas
of mathematics (in linear algebra a map $\varphi(X) = aX + b$ is usually referred to as
an affine map).

The following result shows that we obtain all linear maps by the previous
procedure:
Theorem 1.21. Let $\varphi : F^n \to F^m$ be a linear map. There is a unique matrix
$A \in M_{m,n}(F)$ such that $\varphi(X) = AX$ for all $X \in F^n$.

Proof. The uniqueness assertion is exactly part c) of the previous theorem, so let us
focus on the existence issue. Let $\varphi : F^n \to F^m$ be a linear map and let $e_1, \ldots, e_n$ be
the canonical basis of $F^n$. Consider the matrix $A$ whose $i$th column $C_i$ equals the
vector $\varphi(e_i) \in F^m$. By Remark 1.19 we have $Ae_i = C_i = \varphi(e_i)$ for all $1 \le i \le n$.
If $X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in F^n$ is an arbitrary vector, then $X = x_1 e_1 + \ldots + x_n e_n$; thus, since
$\varphi$ is linear, we have

$$\varphi(X) = \varphi(x_1 e_1 + \ldots + x_n e_n) = x_1 \varphi(e_1) + \ldots + x_n \varphi(e_n) = x_1 C_1 + \ldots + x_n C_n = AX,$$

the last equality being again a consequence of Remark 1.19. Thus $\varphi(X) = AX$ for
all $X \in F^n$ and the theorem is proved. □
We obtain therefore a bijection between matrices in $M_{m,n}(F)$ and linear
maps $F^n \to F^m$.
Example 1.22. Let us consider the map $f : \mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$f(x, y, z, t) = (x - 2y + z,\ 2x - 3z + t,\ t - x).$$

What is the matrix $A \in M_{3,4}(\mathbb{R})$ corresponding to this linear map? By Remark 1.19,
we must have $f(e_i) = C_i$, where $e_1, e_2, e_3, e_4$ is the canonical basis of $\mathbb{R}^4$ and
$C_1, C_2, C_3, C_4$ are the successive columns of $A$. Thus, in order to find $A$, it suffices
to compute the vectors $f(e_1), \ldots, f(e_4)$. We have

$$f(e_1) = f(1, 0, 0, 0) = (1, 2, -1), \quad f(e_2) = f(0, 1, 0, 0) = (-2, 0, 0),$$
$$f(e_3) = f(0, 0, 1, 0) = (1, -3, 0), \quad f(e_4) = f(0, 0, 0, 1) = (0, 1, 1).$$

Hence

$$A = \begin{bmatrix} 1 & -2 & 1 & 0 \\ 2 & 0 & -3 & 1 \\ -1 & 0 & 0 & 1 \end{bmatrix}.$$

In practice, one can avoid computing $f(e_1), \ldots, f(e_4)$ as we did before: we look
at the first coordinate of the vector $f(x, y, z, t)$, that is $x - 2y + z$. We write it
as $1 \cdot x + (-2) \cdot y + 1 \cdot z + 0 \cdot t$, and this gives us the first row of $A$, namely
$[1\ {-2}\ 1\ 0]$. Next, we look at the second coordinate of $f(x, y, z, t)$ and write it as
$2 \cdot x + 0 \cdot y + (-3) \cdot z + 1 \cdot t$, which gives the second row $[2\ 0\ {-3}\ 1]$ of $A$. We
proceed similarly with the last row.
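This recipe translates directly into code: the columns of the matrix are the images of the canonical basis vectors. A short Python sketch (ours, with an illustrative helper name), checked against Example 1.22:

```python
# Recover the matrix of a linear map f: R^n -> R^m by evaluating f on the
# canonical basis; column i of the matrix is f(e_i).
def matrix_of(f, n):
    cols = [f(*(1 if k == i else 0 for k in range(n))) for i in range(n)]
    m = len(cols[0])
    return [[cols[j][i] for j in range(n)] for i in range(m)]

f = lambda x, y, z, t: (x - 2 * y + z, 2 * x - 3 * z + t, t - x)
A = matrix_of(f, 4)
assert A == [[1, -2, 1, 0], [2, 0, -3, 1], [-1, 0, 0, 1]]
```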
1.2.1 Problems for Practice

1. Describe the linear maps associated with the matrices

$$\begin{bmatrix} 1 & 3 & 2 & 0 \\ 2 & 1 & 4 & 1 \\ 1 & 5 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 1 & 0 & 0 \\ 2 & 3 & 2 & 5 \end{bmatrix}.$$

2. Consider the map $f : \mathbb{R}^3 \to \mathbb{R}^4$ defined by

$$f(x, y, z) = (x - 2y + 2z,\ y - z + x,\ x,\ z).$$

Prove that $f$ is linear and describe the matrix associated with $f$.

3. a) Consider the map $f : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$f(x, y) = (x^2, y^2).$$

Is this map linear?
   b) Answer the same question with the field $\mathbb{R}$ replaced with $\mathbb{F}_2$.

4. Consider the map $f : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$f(x, y) = (x + 2y,\ x + y - 1).$$

Is the map $f$ linear?

5. Consider the matrix $A = \begin{bmatrix} 1 & 2 & 2 & 0 \\ 2 & 0 & 4 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}$. Describe the image of the vector $v = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 2 \end{bmatrix}$ through the linear map attached to $A$.

6. Give an example of a map $f : \mathbb{R}^2 \to \mathbb{R}$ which is not linear and for which

$$f(av) = af(v)$$

for all $a \in \mathbb{R}$ and all $v \in \mathbb{R}^2$.
1.3 Matrix Multiplication

Let us consider now three positive integers $m, n, p$ and matrices $A \in M_{m,n}(F)$, $B \in M_{n,p}(F)$. We insist on the fact that the number of columns $n$ of $A$ equals the
number of rows $n$ of $B$. We saw in the previous section that $A$ and $B$ define natural
maps

$$\varphi_A : F^n \to F^m, \qquad \varphi_B : F^p \to F^n,$$

sending $X \in F^n$ to $AX \in F^m$ and $Y \in F^p$ to $BY \in F^n$.

Let us consider the composite map

$$\varphi_A \circ \varphi_B : F^p \to F^m, \qquad (\varphi_A \circ \varphi_B)(X) = \varphi_A(\varphi_B(X)).$$

Since $\varphi_A$ and $\varphi_B$ are linear, it is not difficult to see that $\varphi_A \circ \varphi_B$ is also linear. Thus
by Theorem 1.21 there is a unique matrix $C \in M_{m,p}(F)$ such that

$$\varphi_A \circ \varphi_B = \varphi_C.$$

Let us summarize this discussion in the following fundamental definition:
Definition 1.23. The product of two matrices $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$
(such that the number of columns $n$ of $A$ equals the number of rows $n$ of $B$) is the
unique matrix $AB \in M_{m,p}(F)$ such that

$$A(BX) = (AB)X$$

for all $X \in F^p$.
Remark 1.24. Here is a funny thing, which shows that the theory developed so far is
coherent: consider a matrix $A \in M_{m,n}(F)$ and a vector $X \in F^n$, written in column
form. As we said, we can think of $X$ as a matrix with one column, i.e., a matrix
$\widetilde{X} \in M_{n,1}(F)$. Then we can consider the product $A\widetilde{X} \in M_{m,1}(F)$. Identifying again
$M_{m,1}(F)$ with column vectors of length $m$, i.e., with $F^m$, $A\widetilde{X}$ becomes identified
with $AX$, the image of $X$ through the linear map canonically attached to $A$. In
other words, when writing $AX$ we can either think of the image of $X$ through the
canonical map attached to $A$ (and we strongly encourage the reader to do so) or of
the product of the matrix $A$ and a matrix in $M_{n,1}(F)$. The result is the same,
modulo the natural identification between column vectors and matrices with one
column.
The previous definition is a little bit abstract, so let us try to compute explicitly
the entries of $AB$ in terms of the entries $a_{ij}$ of $A$ and $b_{ij}$ of $B$. Let $e_1, \ldots, e_p$
be the canonical basis of $F^p$. Then $(AB)e_j$ is the $j$th column of $AB$ by
Remark 1.19. Let $C_1(A), \ldots, C_n(A)$ and $C_1(B), \ldots, C_p(B)$ be the columns of $A$
and $B$, respectively. Using again Remark 1.19, we can write

$$A(Be_j) = AC_j(B) = b_{1j} C_1(A) + b_{2j} C_2(A) + \ldots + b_{nj} C_n(A).$$

Since by definition $A(Be_j) = (AB)e_j = C_j(AB)$, we obtain

$$C_j(AB) = b_{1j} C_1(A) + b_{2j} C_2(A) + \ldots + b_{nj} C_n(A) \tag{1.1}$$

We conclude that

$$(AB)_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \ldots + a_{in} b_{nj} \tag{1.2}$$

and so we have established the following

Theorem 1.25 (Product Rule). Let $A = [a_{ij}] \in M_{m,n}(F)$ and $B = [b_{ij}] \in M_{n,p}(F)$. Then the $(i,j)$-entry of the matrix $AB$ is

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$
Of course, one could also take the previous theorem as a definition of the product
of two matrices. But it is definitely not apparent why one should define the product
in such a complicated way: for instance, a very natural way of defining the product
would be component-wise (i.e., the $(i,j)$-entry of the product should be the product
of the $(i,j)$-entries of $A$ and $B$), but this naive definition is not useful for the
purposes of linear algebra. The key point to be kept in mind is that for the purposes
of linear algebra (and not only), matrices should be thought of as linear maps,
and the product should correspond to the composition of linear maps.
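The slogan “the product corresponds to composition” can be tested numerically; in the Python sketch below (ours, not from the text), $(AB)X$ and $A(BX)$ agree for a sample vector:

```python
# Product rule: (AB)_{ij} = sum_k a_{ik} b_{kj}; it encodes composition of maps.
def mat_mul(A, B):
    n = len(B)
    assert len(A[0]) == n, "columns of A must equal rows of B"
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

def apply_matrix(A, X):
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 2, 0], [0, 1, 1]]      # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]    # 3 x 2
X = [5, 7]

# (AB)X equals A(BX): multiplying matrices composes the attached linear maps.
assert apply_matrix(mat_mul(A, B), X) == apply_matrix(A, apply_matrix(B, X))
```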
Example 1.26. a) If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and $B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$ are matrices in $M_2(F)$,
then $AB$ exists and

$$AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}.$$

b) If

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix},$$

then the product $AB$ is defined and it is the $3 \times 2$ matrix

$$AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \\ a_{31}b_{11} + a_{32}b_{21} & a_{31}b_{12} + a_{32}b_{22} \end{bmatrix}.$$

The product $BA$ is not defined, since $B \in M_{2,2}(F)$ and $A \in M_{3,2}(F)$.

c) Considering

$$A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ -1 & 3 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix},$$

we get

$$AB = \begin{bmatrix} -2 & 1 \\ 2 & 4 \\ 8 & 1 \end{bmatrix}.$$

d) Take $A, B \in M_2(\mathbb{C})$, where

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 0 & 0 \\ 2 & 0 \end{bmatrix}.$$

Then both products $AB$ and $BA$ are defined, and we have

$$AB = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = O_2 \qquad \text{and} \qquad BA = \begin{bmatrix} 0 & 0 \\ 2 & 0 \end{bmatrix}.$$
This last example shows two important things:

• Multiplication of matrices (even in $M_2(F)$) is not commutative, i.e., generally
$AB \neq BA$ when $AB$ and $BA$ both make sense (this is the case if $A, B \in M_n(F)$,
for instance).
• There are nonzero matrices $A, B$ whose product is 0: for instance, in this example
we have $A \neq O_2$ and $B \neq O_2$, but $AB = O_2$.
Definition 1.27. Two matrices $A, B \in M_n(F)$ commute if

$$AB = BA.$$

One has to be very careful when using algebraic operations on matrices, since
multiplication is not commutative in general. For instance, one uses quite often
identities such as

$$(a + b)^2 = a^2 + 2ab + b^2, \qquad (a + b)(a - b) = a^2 - b^2$$

for elements of a field $F$. Such identities are (in general) no longer true if $a, b$ are
matrices, and they should be replaced by the following correct identities:

$$(A + B)^2 = A^2 + AB + BA + B^2, \qquad (A + B)(A - B) = A^2 - AB + BA - B^2.$$

We see that the previous identities (which hold for elements of a field) hold for $A$
and $B$ if and only if $A$ and $B$ commute.
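A concrete check (a Python sketch of ours), using the matrices of Example 1.26 d): the corrected identity for $(A+B)^2$ holds, while the naive one fails because $AB \neq BA$:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 0], [0, 0]]
B = [[0, 0], [2, 0]]

S = mat_add(A, B)
lhs = mat_mul(S, S)                                   # (A + B)^2
good = mat_add(mat_add(mat_mul(A, A), mat_mul(A, B)),
               mat_add(mat_mul(B, A), mat_mul(B, B))) # A^2 + AB + BA + B^2
naive = mat_add(mat_add(mat_mul(A, A), mat_mul(B, B)),
                [[2 * x for x in row] for row in mat_mul(A, B)])  # A^2 + 2AB + B^2

assert lhs == good      # the corrected identity holds
assert lhs != naive     # the naive identity fails, since AB != BA here
```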
Matrix multiplication obeys many of the familiar arithmetical laws, apart from
the commutativity property. More precisely, we have the following:

Proposition 1.28. Multiplication of matrices has the following properties:

1) Associativity: we have $(AB)C = A(BC)$ for all matrices $A \in M_{m,n}(F)$, $B \in M_{n,p}(F)$, $C \in M_{p,q}(F)$.
2) Compatibility with scalar multiplication: we have $\alpha(AB) = (\alpha A)B = A(\alpha B)$
if $\alpha \in F$, $A \in M_{m,n}(F)$, and $B \in M_{n,p}(F)$.
3) Distributivity with respect to addition: we have

$$(A + B)C = AC + BC \qquad \text{if } A, B \in M_{m,n}(F) \text{ and } C \in M_{n,p}(F),$$

and

$$D(A + B) = DA + DB \qquad \text{if } A, B \in M_{m,n}(F) \text{ and } D \in M_{p,m}(F).$$

All these properties follow quite easily from Definition 1.23 or Theorem 1.25. Let
us prove for instance the associativity property (which would be the most painful
to check by bare hands if we took Theorem 1.25 as a definition). It suffices (by
Theorem 1.21) to check that for all $X \in F^q$ we have

$$((AB)C)X = (A(BC))X.$$

But by definition of the product we have

$$((AB)C)X = (AB)(CX) = A(B(CX))$$

and

$$(A(BC))X = A((BC)X) = A(B(CX)),$$

and the result follows. One could also use Theorem 1.25 and check by a rather
painful computation that the $(i,j)$-entry of $(AB)C$ equals the $(i,j)$-entry of
$A(BC)$, by showing that they are both equal to

$$\sum_{k,l} a_{ik} b_{kl} c_{lj}.$$

All other properties of multiplication stated in the previous proposition are
proved in exactly the same way, and we leave it to the reader to fill in the details.
Remark 1.29. Because of the associativity property we can simply write $ABCD$
instead of the cumbersome $((AB)C)D$, which also equals $(A(BC))D$ or
$A(B(CD))$. Similarly, we define the product of any number of matrices. When
these matrices are all equal, we use the notation

$$A^n = A \cdot A \cdots A,$$

with $n$ factors on the right-hand side. This is the $n$th power of the matrix $A$. Note
that it only makes sense to define the powers of a square matrix! By construction we
have

$$A^n = A \cdot A^{n-1}.$$

We make the natural convention that $A^0 = I_n$ for any $A \in M_n(F)$. The reader
will have no difficulty in checking that $I_n$ is a unit for matrix multiplication, in the
sense that

$$A \cdot I_n = A \qquad \text{and} \qquad I_m \cdot A = A \quad \text{if } A \in M_{m,n}(F).$$
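A short Python sketch (ours) implementing $A^k = A \cdot A^{k-1}$ with the convention $A^0 = I_n$; the test case anticipates Problem 1.31 and practice problem 9 below:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# A^k = A * A^{k-1}, with A^0 = I_n.
def mat_pow(A, k):
    result = identity(len(A))
    for _ in range(k):
        result = mat_mul(A, result)
    return result

# For a diagonal matrix, the k-th power is diagonal with powered entries.
assert mat_pow([[1, 0, 0], [0, 2, 0], [0, 0, 3]], 4) == \
       [[1, 0, 0], [0, 16, 0], [0, 0, 81]]
```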
We end this section with a long list of problems which illustrate the concepts
introduced so far.
Problem 1.30. Let $A(x) \in M_3(\mathbb{R})$ be the matrix defined by

$$A(x) = \begin{bmatrix} 1 & x & x^2 \\ 0 & 1 & 2x \\ 0 & 0 & 1 \end{bmatrix}.$$

Prove that $A(x_1)A(x_2) = A(x_1 + x_2)$ for all $x_1, x_2 \in \mathbb{R}$.

Solution. Using the product rule given by Theorem 1.25, we obtain

$$A(x_1)A(x_2) = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 0 & 1 & 2x_1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & x_2 & x_2^2 \\ 0 & 1 & 2x_2 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & x_1 + x_2 & x_2^2 + 2x_1x_2 + x_1^2 \\ 0 & 1 & 2x_2 + 2x_1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & x_1 + x_2 & (x_1 + x_2)^2 \\ 0 & 1 & 2(x_1 + x_2) \\ 0 & 0 & 1 \end{bmatrix}.$$

By definition, the last matrix is simply $A(x_1 + x_2)$. □
The result established in the following problem is very useful and constantly used
in practice:

Problem 1.31. a) Prove that the product of two diagonal matrices is a diagonal
matrix.
b) Prove that the product of two upper-triangular matrices is upper-triangular.
c) Prove that in both cases the diagonal entries of the product are the products of the
corresponding diagonal entries.

Solution. a) Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be two diagonal matrices in $M_n(F)$. Let
$i \neq j \in \{1, \ldots, n\}$. Using the product rule, we obtain

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$

We claim that $a_{ik} b_{kj} = 0$ for all $k \in \{1, 2, \ldots, n\}$; thus $(AB)_{ij} = 0$ for all
$i \neq j$ and $AB$ is diagonal. To prove the claim, note that since $i \neq j$, we have
$i \neq k$ or $j \neq k$. Thus either $a_{ik} = 0$ (since $A$ is diagonal) or $b_{kj} = 0$ (since $B$
is diagonal); in all cases $a_{ik} b_{kj} = 0$ and the claim is proved.

b) Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be upper-triangular matrices in $M_n(F)$. We want to
prove that $(AB)_{ij} = 0$ for all $i > j$. By the product rule,

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},$$

thus it suffices to prove that for all $i > j$ and all $k \in \{1, 2, \ldots, n\}$ we have
$a_{ik} b_{kj} = 0$. Fix $i > j$ and $k \in \{1, 2, \ldots, n\}$, and suppose that $a_{ik} b_{kj} \neq 0$; thus
$a_{ik} \neq 0$ and $b_{kj} \neq 0$. Since $A$ and $B$ are upper-triangular, we deduce that $i \le k$
and $k \le j$, thus $i \le j$, a contradiction.

c) Again, using the product rule we compute

$$(AB)_{ii} = \sum_{k=1}^{n} a_{ik} b_{ki}.$$

Assume that $A$ and $B$ are upper-triangular (which includes the case when they
are both diagonal). If $a_{ik} b_{ki}$ is nonzero for some $k \in \{1, 2, \ldots, n\}$, then $i \le k$
and $k \le i$, thus $k = i$. We conclude that

$$(AB)_{ii} = a_{ii} b_{ii}$$

and the result follows. □
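A numerical sanity check of Problem 1.31 (a Python sketch of ours): the product of two upper-triangular matrices is upper-triangular, with diagonal $a_{ii}b_{ii}$.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
B = [[7, 8, 9], [0, 1, 2], [0, 0, 3]]
P = mat_mul(A, B)

n = len(P)
# Below-diagonal entries of the product vanish ...
assert all(P[i][j] == 0 for i in range(n) for j in range(n) if i > j)
# ... and the diagonal of AB is the entrywise product of the diagonals.
assert all(P[i][i] == A[i][i] * B[i][i] for i in range(n))
```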
Problem 1.32. A matrix $A \in M_n(\mathbb{R})$ is called right stochastic if all its entries are
nonnegative real numbers and the sum of the entries in each row equals 1. We
define the concept of left stochastic matrix similarly, by replacing the word row
with column. Finally, a matrix is called doubly stochastic if it is simultaneously
left and right stochastic.

a) Prove that the product of two left stochastic matrices is a left stochastic matrix.
b) Prove that the product of two right stochastic matrices is a right stochastic matrix.
c) Prove that the product of two doubly stochastic matrices is a doubly stochastic
matrix.

Solution. Note that c) is just the combination of a) and b). The argument for proving
b) is identical to the one used to prove a), thus we will only prove part a) and
leave the details of part b) to the reader. Consider thus two left stochastic matrices
$A, B \in M_n(\mathbb{R})$, say $A = [a_{ij}]$ and $B = [b_{ij}]$. Thus $a_{ij} \ge 0$, $b_{ij} \ge 0$ for all
$i, j \in \{1, 2, \ldots, n\}$, and moreover the sum of the entries in each column of $A$ or $B$
is 1, which can be written as

$$\sum_{k=1}^{n} a_{ki} = 1, \qquad \sum_{k=1}^{n} b_{ki} = 1$$

for $i \in \{1, 2, \ldots, n\}$. Note that by the product rule

$$(AB)_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}$$

is nonnegative for all $i, j \in \{1, 2, \ldots, n\}$. Moreover, the sum of the entries of $AB$
in the $i$th column is

$$\sum_{k=1}^{n} (AB)_{ki} = \sum_{k=1}^{n} \left( \sum_{j=1}^{n} a_{kj} b_{ji} \right) = \sum_{j=1}^{n} \left( \sum_{k=1}^{n} a_{kj} \right) b_{ji} = \sum_{j=1}^{n} b_{ji} \cdot 1 = \sum_{j=1}^{n} b_{ji} = 1,$$

where we used once the fact that $A$ is left stochastic (so that $\sum_{k=1}^{n} a_{kj} = 1$ for all
$j$) and once the fact that $B$ is left stochastic (hence $\sum_{j=1}^{n} b_{ji} = 1$ for all $i$). The result
follows. □
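A numerical illustration of Problem 1.32 (a Python sketch of ours): the product of two left stochastic matrices again has nonnegative entries and columns summing to 1.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_left_stochastic(M, tol=1e-12):
    n = len(M)
    nonneg = all(x >= 0 for row in M for x in row)
    cols_sum_1 = all(abs(sum(M[k][i] for k in range(n)) - 1) < tol
                     for i in range(n))
    return nonneg and cols_sum_1

A = [[0.5, 0.2], [0.5, 0.8]]
B = [[0.1, 0.9], [0.9, 0.1]]
assert is_left_stochastic(A) and is_left_stochastic(B)
assert is_left_stochastic(mat_mul(A, B))
```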
Problem 1.33. Let $(E_{ij})_{1 \le i,j \le n}$ be the canonical basis of $M_n(F)$. Prove that if
$i, j, k, l \in \{1, 2, \ldots, n\}$, then

$$E_{ij} E_{kl} = \delta_{jk} E_{il},$$

where $\delta_{jk}$ equals 1 if $j = k$ and 0 otherwise.

Solution. We use the product rule: let $u, v \in \{1, 2, \ldots, n\}$; then

$$(E_{ij} E_{kl})_{uv} = \sum_{w=1}^{n} (E_{ij})_{uw} (E_{kl})_{wv}.$$

Now $(E_{ab})_{cd}$ is zero unless $a = c$ and $b = d$, and it is equal to 1 if the previous
two equalities are satisfied. Thus $(E_{ij})_{uw} (E_{kl})_{wv}$ is zero unless $i = u$, $j = w$ and
$k = w$, $l = v$. The last equalities can never happen if $j \neq k$, so if $j \neq k$, then
$(E_{ij} E_{kl})_{uv} = 0$ for all $u, v \in \{1, 2, \ldots, n\}$. We conclude that $E_{ij} E_{kl} = 0$ when
$j \neq k$.

Assuming now that $j = k$, the previous discussion yields $(E_{ij} E_{kl})_{uv} = 1$ if
$u = i$ and $v = l$, and it equals 0 otherwise. In other words,

$$(E_{ij} E_{kl})_{uv} = (E_{il})_{uv}$$

for all $u, v \in \{1, 2, \ldots, n\}$. Thus $E_{ij} E_{kl} = E_{il}$ in this case, as desired. □
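For small $n$ the identity $E_{ij}E_{kl} = \delta_{jk}E_{il}$ can be verified by brute force; here is a Python sketch (ours) for $n = 3$:

```python
def E(i, j, n=3):
    return [[1 if (r, c) == (i - 1, j - 1) else 0 for c in range(n)]
            for r in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n = 3
zero = [[0] * n for _ in range(n)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            for l in range(1, n + 1):
                expected = E(i, l, n) if j == k else zero
                assert mat_mul(E(i, j, n), E(k, l, n)) == expected
```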
Problem 1.34. Let $(E_{ij})_{1 \le i,j \le n}$ be the canonical basis of $M_n(F)$. Let $i, j \in \{1, 2, \ldots, n\}$ and consider a matrix $A = [a_{ij}] \in M_n(F)$.

a) Prove that

$$AE_{ij} = \begin{bmatrix} 0 & 0 & \ldots & a_{1i} & 0 & \ldots & 0 \\ 0 & 0 & \ldots & a_{2i} & 0 & \ldots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \ldots & a_{ni} & 0 & \ldots & 0 \end{bmatrix},$$

the only possibly nonzero entries being in the $j$th column.

b) Prove that

$$E_{ij} A = \begin{bmatrix} 0 & 0 & \ldots & 0 \\ \vdots & \vdots & & \vdots \\ a_{j1} & a_{j2} & \ldots & a_{jn} \\ 0 & 0 & \ldots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \ldots & 0 \end{bmatrix},$$

the only possibly nonzero entries being in the $i$th row.

Solution. a) Write

$$A = \sum_{k,l=1}^{n} a_{kl} E_{kl}.$$

Using Problem 1.33, we can write

$$AE_{ij} = \sum_{k,l=1}^{n} a_{kl} E_{kl} E_{ij} = \sum_{k,l=1}^{n} a_{kl} \delta_{li} E_{kj} = \sum_{k=1}^{n} a_{ki} E_{kj} = a_{1i} E_{1j} + a_{2i} E_{2j} + \ldots + a_{ni} E_{nj}.$$

Coming back to the definition of the matrices $E_{1j}, \ldots, E_{nj}$, the result follows.

b) The proof is identical and left to the reader. □
Problem 1.35. Prove that a matrix $A \in M_n(F)$ commutes with all matrices in
$M_n(F)$ if and only if $A = cI_n$ for some scalar $c \in F$.

Solution. If $A = cI_n$ for some scalar $c \in F$, then $AB = cB$ and $BA = cB$ for all
$B \in M_n(F)$, hence $AB = BA$ for all matrices $B \in M_n(F)$. Conversely, suppose
that $A$ commutes with all matrices $B \in M_n(F)$. Then $A$ commutes with $E_{ij}$ for all
$i, j \in \{1, 2, \ldots, n\}$. Using Problem 1.34 we obtain the equality

$$\begin{bmatrix} 0 & 0 & \ldots & a_{1i} & 0 & \ldots & 0 \\ 0 & 0 & \ldots & a_{2i} & 0 & \ldots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \ldots & a_{ni} & 0 & \ldots & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & \ldots & 0 \\ \vdots & \vdots & & \vdots \\ a_{j1} & a_{j2} & \ldots & a_{jn} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \ldots & 0 \end{bmatrix}.$$

If $i \neq j$, considering the $(j,j)$-entry of both matrices appearing in the previous
equality yields $a_{ji} = 0$; thus $a_{ij} = 0$ for $i \neq j$ and $A$ is diagonal. Contemplating
again the previous equality yields $a_{ii} = a_{jj}$ for all $i, j$, and so all diagonal entries
of $A$ are equal. We conclude that $A = a_{11} I_n$ and the problem is solved. □
Problem 1.36. Find all matrices $B \in M_3(\mathbb{C})$ which commute with the matrix

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

Solution. Let $B = [b_{ij}]$ be a matrix commuting with $A$. Using the product rule, we
obtain

$$AB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ 2b_{21} & 2b_{22} & 2b_{23} \\ 3b_{31} & 3b_{32} & 3b_{33} \end{bmatrix}$$

and

$$BA = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} b_{11} & 2b_{12} & 3b_{13} \\ b_{21} & 2b_{22} & 3b_{23} \\ b_{31} & 2b_{32} & 3b_{33} \end{bmatrix}.$$

The equality $AB = BA$ yields

$$b_{12} = b_{13} = b_{21} = b_{23} = b_{31} = b_{32} = 0,$$

and conversely, if these equalities are satisfied, then $AB = BA$. We conclude that
the solutions of the problem are the matrices of the form $B = \begin{bmatrix} b_{11} & 0 & 0 \\ 0 & b_{22} & 0 \\ 0 & 0 & b_{33} \end{bmatrix}$, that
is, the diagonal matrices. □
Problem 1.37. A $3 \times 3$ matrix $A \in M_3(\mathbb{R})$ is called circulant if there are real numbers $a, b, c$ such that
$$A = \begin{bmatrix} a & b & c \\ c & a & b \\ b & c & a \end{bmatrix}.$$
a) Prove that the sum and product of two circulant matrices is a circulant matrix.
b) Prove that any two circulant matrices commute.
Solution. Let $A = \begin{bmatrix} a & b & c \\ c & a & b \\ b & c & a \end{bmatrix}$ and $B = \begin{bmatrix} x & y & z \\ z & x & y \\ y & z & x \end{bmatrix}$ be two circulant matrices.
a) Note that
$$A + B = \begin{bmatrix} a+x & b+y & c+z \\ c+z & a+x & b+y \\ b+y & c+z & a+x \end{bmatrix}$$
is a circulant matrix. Using the product rule we compute
$$AB = \begin{bmatrix} u & v & w \\ w & u & v \\ v & w & u \end{bmatrix},$$
where
$$u = ax + bz + cy, \qquad v = ay + bx + cz, \qquad w = az + by + cx.$$
Thus $AB$ is also a circulant matrix.
b) Similarly, using the product rule we check that
$$BA = \begin{bmatrix} u & v & w \\ w & u & v \\ v & w & u \end{bmatrix} = AB. \qquad \square$$
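A quick numerical experiment with both parts of this problem (a Python sketch, assuming numpy):

    import numpy as np

    def circulant(a, b, c):
        # circulant matrix in the sense of Problem 1.37
        return np.array([[a, b, c],
                         [c, a, b],
                         [b, c, a]])

    A = circulant(1.0, 2.0, 3.0)
    B = circulant(-1.0, 0.5, 4.0)
    print(np.allclose(A @ B, B @ A))   # True: circulant matrices commute
    print(A @ B)                       # the product is again circulant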
Problem 1.38. If $A, B \in M_n(\mathbb{C})$ are matrices satisfying
$$A^2 = B^2 = (AB)^2 = I_n,$$
prove that $A$ and $B$ commute.
Solution. Multiplying the relation $ABAB = I_n$ by $A$ on the left and by $B$ on the right, we obtain
$$A^2BAB^2 = AB.$$
By assumption, the left-hand side equals $I_n \cdot BA \cdot I_n = BA$, thus $BA = AB$. □
1.3.1 Problems for Practice
1. Consider the matrices
$$A = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}.$$
Which of the products $AB$ and $BA$ make sense? For each product which makes sense, compute the entries of the product.
2. Consider the matrices
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
in $M_3(F_2)$. Compute $AB$ and $BA$.
3. Consider the matrices
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 2 & 1 \\ 1 & 0 \end{bmatrix}.$$
Which of the products $A^2$, $AB$, $BA$, $B^2$ makes sense? Compute all products that make sense.
4. Let $A = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$.
a) Find all matrices $B \in M_2(\mathbb{C})$ which commute with $A$.
b) Find all matrices $B \in M_2(\mathbb{C})$ for which $AB + BA$ is the zero matrix.
5. Determine all matrices $A \in M_2(\mathbb{R})$ commuting with the matrix
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$
6. Let $G$ be the set of matrices of the form $\frac{1}{\sqrt{1-x^2}}\begin{bmatrix} 1 & x \\ x & 1 \end{bmatrix}$ with $x \in (-1, 1)$. Prove that the product of two elements of $G$ is an element of $G$.
7. (matrix representation of $\mathbb{C}$) Let $G$ be the set of matrices of the form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ with $a, b \in \mathbb{R}$.
a) Prove that the sum and product of two elements of $G$ is in $G$.
b) Consider the map $f : G \to \mathbb{C}$ defined by
$$f\left(\begin{bmatrix} a & -b \\ b & a \end{bmatrix}\right) = a + ib.$$
Prove that $f$ is a bijective map satisfying $f(A + B) = f(A) + f(B)$ and $f(AB) = f(A)f(B)$ for all $A, B \in G$.
c) Use this to compute the $n$th power of the matrix $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$.
8. For any real number $x$ let
$$A(x) = \begin{bmatrix} 1-x & 0 & x \\ 0 & 1 & 0 \\ x & 0 & 1-x \end{bmatrix}.$$
a) Prove that for all real numbers $a, b$ we have
$$A(a)A(b) = A(a + b - 2ab).$$
b) Given a real number $x$, compute $A(x)^n$.
9. Compute $A^{20}$, where
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$
10. a) Give a detailed proof, by induction on $k$, of the binomial formula: if $A, B \in M_n(F)$ commute, then
$$(A + B)^k = \sum_{j=0}^k \binom{k}{j} A^{k-j}B^j.$$
b) Give a counterexample to the binomial formula if we drop the hypothesis that $A$ and $B$ commute.
11. a) Let
$$B = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}.$$
Prove that $B^3 = O_3$.
b) Let $a$ be a real number. Using part a) and the binomial formula, compute $A^n$ where
$$A = \begin{bmatrix} 1 & 0 & -a \\ 0 & 1 & -a \\ -a & a & 1 \end{bmatrix}.$$
12. Let
$$A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.$$
a) Prove that $(A - I_3)^3 = O_3$.
b) Compute $A^n$ for all positive integers $n$, by using part a) and the binomial formula.
13. a) Prove that the matrix
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$
satisfies $(A - I_3)^3 = O_3$.
b) Compute $A^n$ for all positive integers $n$.
14. a) Prove that the matrix
$$A = \begin{bmatrix} 2 & 3 & 4 \\ -4 & 2 & 0 \\ 3 & 0 & 2 \end{bmatrix}$$
satisfies $(A - 2I_3)^3 = O_3$.
b) Compute $A^n$ for all positive integers $n$.
15. Suppose that $A \in M_n(\mathbb{C})$ is a diagonal matrix whose diagonal entries are pairwise distinct. Let $B \in M_n(\mathbb{C})$ be a matrix such that $AB = BA$. Prove that $B$ is diagonal.
16. A matrix $A \in M_n(\mathbb{R})$ is called a permutation matrix if each row and column of $A$ has an entry equal to 1 and all other entries equal to 0. Prove that the product of two permutation matrices is a permutation matrix.
17. Consider a permutation $\sigma$ of $1, 2, \dots, n$, that is, a bijective map
$$\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}.$$
We define the associated permutation matrix $P_\sigma$ as follows: the $(i, j)$-entry of $P_\sigma$ is equal to 1 if $i = \sigma(j)$ and 0 otherwise.
a) Prove that any permutation matrix is of the form $P_\sigma$ for a unique permutation $\sigma$.
b) Deduce that there are $n!$ permutation matrices.
c) Prove that
$$P_{\sigma_1} P_{\sigma_2} = P_{\sigma_1 \circ \sigma_2}$$
for all permutations $\sigma_1, \sigma_2$ (the sketch following this list experiments with this rule).
d) Given a matrix $B \in M_n(F)$, describe the matrices $P_\sigma B$ and $BP_\sigma$ in terms of $B$ and of the permutation $\sigma$.
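The following Python sketch (assuming numpy, with permutations stored 0-indexed as tuples) tests the composition rule of part c) of problem 17:

    import numpy as np

    def perm_matrix(sigma):
        # P_sigma has (i, j)-entry 1 exactly when i = sigma(j)
        n = len(sigma)
        P = np.zeros((n, n), dtype=int)
        for j in range(n):
            P[sigma[j], j] = 1
        return P

    s1 = (1, 2, 0)    # the cycle 0 -> 1 -> 2 -> 0
    s2 = (0, 2, 1)    # the transposition swapping 1 and 2
    compose = tuple(s1[s2[j]] for j in range(3))   # sigma1 o sigma2
    print(np.array_equal(perm_matrix(s1) @ perm_matrix(s2), perm_matrix(compose)))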
1.4 Block Matrices
A sub-matrix of a matrix $A \in M_{m,n}(F)$ is a matrix obtained from $A$ by deleting rows and/or columns of $A$ (note that $A$ itself is a sub-matrix of $A$). A matrix can be partitioned into sub-matrices by drawing horizontal or vertical lines between some of its rows or columns. We call such a matrix a block (or partitioned) matrix and we call the corresponding sub-matrices blocks.
Here are a few examples of partitioned matrices:
$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 3 \end{bmatrix}.$$
We can see a partitioned matrix as a "matrix of matrices": the typical shape of a partitioned matrix $A$ of size $m \times n$ is
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} & A_{l2} & \dots & A_{lk} \end{bmatrix},$$
where $A_{ij}$ is a matrix of size $m_i \times n_j$ for some positive integers $m_1, \dots, m_l$ and $n_1, \dots, n_k$ with $m_1 + m_2 + \dots + m_l = m$ and $n_1 + n_2 + \dots + n_k = n$. If $l = k$, we call the blocks $A_{11}, \dots, A_{kk}$ the diagonal blocks and we say that $A$ is block diagonal if all blocks of $A$ but the diagonal ones are zero. Thus a block diagonal matrix is of the form
$$A = \begin{bmatrix} A_{11} & 0 & \dots & 0 \\ 0 & A_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & A_{kk} \end{bmatrix}.$$
An important advantage is given by the following rules for addition and multiplication of block matrices (which follow directly from the rules of addition and multiplication of matrices; we warn however the reader that the proof of the multiplication rule is quite involved from a notational point of view!):
• If
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} & A_{l2} & \dots & A_{lk} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} B_{11} & B_{12} & \dots & B_{1k} \\ B_{21} & B_{22} & \dots & B_{2k} \\ \vdots & \vdots & & \vdots \\ B_{l1} & B_{l2} & \dots & B_{lk} \end{bmatrix}$$
with $A_{ij}$ and $B_{ij}$ of the same size for all $i, j$ (so the rows and columns of $B$ and $A$ are partitioned in the same way), then
$$A + B = \begin{bmatrix} A_{11} + B_{11} & A_{12} + B_{12} & \dots & A_{1k} + B_{1k} \\ A_{21} + B_{21} & A_{22} + B_{22} & \dots & A_{2k} + B_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} + B_{l1} & A_{l2} + B_{l2} & \dots & A_{lk} + B_{lk} \end{bmatrix}.$$
• If
$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} & A_{l2} & \dots & A_{lk} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{12} & \dots & B_{1r} \\ B_{21} & B_{22} & \dots & B_{2r} \\ \vdots & \vdots & & \vdots \\ B_{k1} & B_{k2} & \dots & B_{kr} \end{bmatrix}$$
are $m \times n$, respectively $n \times p$ partitioned matrices, with $A_{ij}$ of size $m_i \times n_j$ and $B_{ij}$ of size $n_i \times p_j$, then
$$AB = \begin{bmatrix} C_{11} & C_{12} & \dots & C_{1r} \\ C_{21} & C_{22} & \dots & C_{2r} \\ \vdots & \vdots & & \vdots \\ C_{l1} & C_{l2} & \dots & C_{lr} \end{bmatrix}, \quad \text{where} \quad C_{ij} = \sum_{u=1}^k A_{iu}B_{uj}.$$
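The multiplication rule is easy to test numerically. A minimal Python sketch (assuming numpy), with a 2 × 2 grid of blocks on each side:

    import numpy as np

    rng = np.random.default_rng(0)
    # A is 5 x 4, split as (2 + 3) rows x (1 + 3) columns;
    # B is 4 x 6, split as (1 + 3) rows x (2 + 4) columns.
    A = rng.standard_normal((5, 4))
    B = rng.standard_normal((4, 6))
    A11, A12 = A[:2, :1], A[:2, 1:]
    A21, A22 = A[2:, :1], A[2:, 1:]
    B11, B12 = B[:1, :2], B[:1, 2:]
    B21, B22 = B[1:, :2], B[1:, 2:]
    C11 = A11 @ B11 + A12 @ B21   # C_ij = sum_u A_iu B_uj
    C12 = A11 @ B12 + A12 @ B22
    C21 = A21 @ B11 + A22 @ B21
    C22 = A21 @ B12 + A22 @ B22
    C = np.block([[C11, C12], [C21, C22]])
    print(np.allclose(C, A @ B))  # True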
1.4.1 Problems for Practice
If $A = [a_{ij}] \in M_{m_1,n_1}(F)$ and $B \in M_{m_2,n_2}(F)$ are matrices, the Kronecker product or tensor product of $A$ and $B$ is the matrix $A \otimes B \in M_{m_1m_2,\,n_1n_2}(F)$ defined by
$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \dots & a_{1,n_1}B \\ a_{21}B & a_{22}B & \dots & a_{2,n_1}B \\ \vdots & \vdots & & \vdots \\ a_{m_1,1}B & a_{m_1,2}B & \dots & a_{m_1,n_1}B \end{bmatrix}.$$
1. Compute the Kronecker product of the matrices
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$
2. Do we always have $A \otimes B = B \otimes A$?
3. Check that $I_m \otimes I_n = I_{mn}$.
4. Prove that if $A_1 \in M_{m_1,n_1}(F)$, $A_2 \in M_{n_1,r_1}(F)$, $B_1 \in M_{m_2,n_2}(F)$ and $B_2 \in M_{n_2,r_2}(F)$, then
$$(A_1 \otimes B_1)(A_2 \otimes B_2) = (A_1A_2) \otimes (B_1B_2)$$
(see the sketch after this list).
5. Prove that if $A \in M_m(F)$ and $B \in M_n(F)$ then
$$A \otimes B = (A \otimes I_n)(I_m \otimes B).$$
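numpy's kron function implements exactly the block pattern defining $A \otimes B$, which makes the identities of problems 4 and 5 easy to test. A sketch:

    import numpy as np

    A1 = np.array([[1, 2], [3, 4]]); B1 = np.array([[0, 1], [1, 1]])
    A2 = np.array([[1, 0], [2, 1]]); B2 = np.array([[1, 1], [0, 1]])
    # problem 4: (A1 (x) B1)(A2 (x) B2) = (A1 A2) (x) (B1 B2)
    print(np.array_equal(np.kron(A1, B1) @ np.kron(A2, B2),
                         np.kron(A1 @ A2, B1 @ B2)))
    # problem 5: A (x) B = (A (x) I_n)(I_m (x) B)
    m, n = A1.shape[0], B1.shape[0]
    print(np.array_equal(np.kron(A1, B1),
                         np.kron(A1, np.eye(n, dtype=int)) @ np.kron(np.eye(m, dtype=int), B1)))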
1.5 Invertible Matrices
Let $n$ be a positive integer. We say that a matrix $A \in M_n(F)$ is invertible or non-singular if there is a matrix $B \in M_n(F)$ such that
$$AB = BA = I_n.$$
Such a matrix $B$ is then necessarily unique, for if $C$ is another matrix with the same properties, we obtain
$$C = I_nC = (BA)C = B(AC) = BI_n = B.$$
The matrix $B$ is called the inverse of $A$ and denoted $A^{-1}$.
Let us establish a few basic properties of invertible matrices:
Proposition 1.39. a) If $c$ is a nonzero scalar, then $cI_n$ is invertible.
b) If $A$ is invertible, then so is $A^{-1}$, and $(A^{-1})^{-1} = A$.
c) If $A, B \in M_n(F)$ are invertible, then so is $AB$, and
$$(AB)^{-1} = B^{-1}A^{-1}.$$
Proof. a) The matrix $c^{-1}I_n$ is an inverse of the matrix $cI_n$.
b) Let $B = A^{-1}$; then $BA = AB = I_n$, showing that $B$ is invertible, with inverse $A$.
c) By assumption $A^{-1}$ and $B^{-1}$ exist, so the matrix $C = B^{-1}A^{-1}$ makes sense. We compute
$$(AB)C = ABB^{-1}A^{-1} = AI_nA^{-1} = AA^{-1} = I_n$$
and similarly
$$C(AB) = B^{-1}A^{-1}AB = B^{-1}I_nB = B^{-1}B = I_n,$$
showing that $AB$ is invertible, with inverse $C$. □
Remark 1.40. a) One should be careful when computing inverses of products of matrices, for the formula $(AB)^{-1} = A^{-1}B^{-1}$ is not correct, unless $A$ and $B$ commute. We have
$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$
and not $A^{-1}B^{-1}C^{-1}$ in general. Thus the inverse of a product equals the product of the inverses in the reverse order.
b) By the proposition, invertible matrices are stable under product, but they are definitely not stable under addition: the matrices $I_n$ and $-I_n$ are invertible, but their sum $O_n$ is not invertible (as $O_nA = O_n \ne I_n$ for any matrix $A \in M_n(F)$).
The set of invertible matrices plays an extremely important role in linear algebra, so it deserves a definition and a special notation:
Definition 1.41. The set of invertible matrices $A \in M_n(F)$ is called the general linear group and denoted $GL_n(F)$.
Unfortunately, with the tools introduced so far it is illusory to hope to understand the fine properties of the general linear group $GL_n(F)$. Once we develop the theory of linear algebra from the point of view of vector spaces and linear transformations, we will have a much more powerful theory that will make it easier to understand invertible matrices. Just to give a hint of the difficulty of the theory, try to prove by bare hands that if $A, B \in M_n(F)$ satisfy $AB = I_n$, then $A$ is invertible. The key point is proving that this equality forces $BA = I_n$, but this is definitely not trivial simply by coming back to the multiplication of matrices! In subsequent chapters we will develop a theory of determinants which allows a much cleaner characterization of invertible matrices. Also, in subsequent chapters we will describe an algorithm, based on operations on the rows of a matrix, which gives an efficient way of solving the following problem: given a square matrix $A$, decide whether $A$ is invertible and compute its inverse if $A$ is invertible. This problem is not easy to solve with the tools we have introduced so far.
Problem 1.42. Consider the matrix $A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Is the matrix $A$ invertible? If this is the case, compute $A^{-1}$.
Solution. Since we don't have any strong tools at our disposal for the moment, let us use brute force and look for a matrix $\begin{bmatrix} a & b & c \\ x & y & z \\ u & v & w \end{bmatrix}$ such that
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a & b & c \\ x & y & z \\ u & v & w \end{bmatrix} = I_3.$$
The left-hand side equals $\begin{bmatrix} x & y & z \\ a & b & c \\ u & v & w \end{bmatrix}$, so this last matrix should be equal to $I_3$. This gives a unique solution $x = b = w = 1$ and all other variables equal to 0. We conclude that $A$ is invertible and
$$A^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad \square$$
It is clear that the method used to find the inverse of $A$ in the previous problem is not efficient and quickly becomes very painful even for $3 \times 3$ matrices. We will see later on a much more powerful approach, but we would like to present yet another method, which can be fairly useful in some situations (especially when the matrix has some nice symmetry properties or if it has many zeros).
Consider a matrix $A \in M_n(F)$ and a vector $b \in F^n$. Assume that we can always solve the system $AX = b$ with $X \in F^n$ and that it has a unique solution. Then one can prove (we will see this in a later chapter, so we will take it for granted in this chapter) that $A$ is invertible, and so the solution of the system is given by $X = A^{-1}b$ (multiply the relation $AX = b$ by $A^{-1}$). On the other hand, assume that we are able to solve the system by hand; then we have a description of $X$ in terms of the coordinates of $b$. Thus we will know explicitly $A^{-1}b$ for all vectors $b \in F^n$ and this is enough to find $A^{-1}$. In practice, the resolution of the system will show that
$$A^{-1}b = \begin{bmatrix} c_{11}b_1 + c_{12}b_2 + \dots + c_{1n}b_n \\ c_{21}b_1 + c_{22}b_2 + \dots + c_{2n}b_n \\ \vdots \\ c_{n1}b_1 + c_{n2}b_2 + \dots + c_{nn}b_n \end{bmatrix}$$
for some scalars $c_{ij}$, independent of $b_1, \dots, b_n$. Letting $b$ be the $i$th vector of the canonical basis of $F^n$, the left-hand side $A^{-1}b$ is simply the $i$th column of $A^{-1}$, while the right-hand side is the $i$th column of the matrix $[c_{ij}]$. Since the two sides are equal, and this for all $i$, we conclude that
$$A^{-1} = [c_{ij}].$$
Note that once the system is solved, it is very easy to write the matrix $A^{-1}$ directly by looking at the expression of $A^{-1}b$. Namely, if the first coordinate of $A^{-1}b$ is $c_{11}b_1 + c_{12}b_2 + \dots + c_{1n}b_n$, then the first row of $A^{-1}$ is $(c_{11}, c_{12}, \dots, c_{1n})$. Of course, the key part of the argument is the resolution of the linear system $AX = b$, and this will be discussed in a subsequent chapter. We will therefore limit ourselves in this chapter to rather simple systems, which can be solved by hand without any further theory.
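For larger matrices this column-by-column principle is easy to automate. Here is a minimal Python sketch (assuming numpy, and borrowing the matrix of Problem 1.44 below) that recovers $A^{-1}$ one column at a time by solving $AX = e_i$ for the canonical basis vectors $e_i$:

    import numpy as np

    A = np.array([[1., 1., 1., 1.],
                  [0., 1., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 0., 1.]])
    n = A.shape[0]
    Ainv = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0                           # i-th canonical basis vector
        Ainv[:, i] = np.linalg.solve(A, e)   # i-th column of A^{-1}
    print(Ainv)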
Let us see a few concrete examples:
Problem 1.43. Compute the inverse of the matrix $A$ in the previous problem using the method we have just described.
Solution. Given a vector $b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \in F^3$, we try to solve the system $AX = b$, with $X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$. The system can be written as
$$x_2 = b_1, \quad x_1 = b_2, \quad x_3 = b_3,$$
or equivalently
$$x_1 = b_2, \quad x_2 = b_1, \quad x_3 = b_3.$$
Since this system has a solution for all $b \in F^3$, we deduce that $A$ is invertible and that for all $b \in F^3$ we have
$$A^{-1}b = X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b_3 \end{bmatrix}.$$
The first coordinate of $A^{-1}b$ is $b_2$, thus the first row of $A^{-1}$ is $(0, 1, 0)$; the second coordinate of $A^{-1}b$ is $b_1$, so the second row of $A^{-1}$ is $(1, 0, 0)$. Finally, the third coordinate of $A^{-1}b$ is $b_3$, so the third row of $A^{-1}$ is $(0, 0, 1)$. We conclude that
$$A^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad \square$$
Problem 1.44. Consider the matrix
$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Prove that $A$ is invertible and compute $A^{-1}$.
Solution. Again, given a vector $b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} \in F^4$, we try to solve the system $AX = b$ with $X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$. The system can be written
$$\begin{cases} x_1 + x_2 + x_3 + x_4 = b_1 \\ x_2 + x_3 + x_4 = b_2 \\ x_3 + x_4 = b_3 \\ x_4 = b_4 \end{cases}$$
and can be solved rather easily: the last equation gives $x_4 = b_4$. Subtracting the last equation from the third one yields $x_3 = b_3 - b_4$, then subtracting the third equation from the second one yields $x_2 = b_2 - b_3$, and finally $x_1 = b_1 - b_2$. Thus the system always has solutions and so $A$ is invertible, with
$$A^{-1}b = X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_1 - b_2 \\ b_2 - b_3 \\ b_3 - b_4 \\ b_4 \end{bmatrix}.$$
The first coordinate of $A^{-1}b$ being $b_1 - b_2$, we deduce that the first row of $A^{-1}$ is $(1, -1, 0, 0)$. Similarly, the coordinate $b_2 - b_3$ gives the second row of $A^{-1}$, namely $(0, 1, -1, 0)$, and so on. We end up with
$$A^{-1} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \qquad \square$$
Problem 1.45. Let $n$ be a positive integer. Find the inverse of the matrix
$$\begin{bmatrix} 1 & 2 & 3 & \dots & n \\ 0 & 1 & 2 & \dots & n-1 \\ 0 & 0 & 1 & \dots & n-2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}.$$
Solution. Let $A$ be the matrix whose inverse we are trying to compute. Given a vector $b \in \mathbb{R}^n$ with coordinates $b_1, b_2, \dots, b_n$, let us try to solve the system $AX = b$. This system is equivalent to
$$\begin{cases} x_1 + 2x_2 + 3x_3 + \dots + nx_n = b_1 \\ x_2 + 2x_3 + \dots + (n-1)x_n = b_2 \\ \quad \vdots \\ x_{n-1} + 2x_n = b_{n-1} \\ x_n = b_n. \end{cases}$$
In principle one can easily solve it by starting with the last equation and successively determining $x_n, x_{n-1}, \dots, x_1$ from the equations of the system. To make our life simpler, we subtract the second equation from the first, the third equation from the second, ..., the $n$th equation from the $(n-1)$st equation. We end up with the equivalent system
$$\begin{cases} x_1 + x_2 + x_3 + \dots + x_n = b_1 - b_2 \\ x_2 + x_3 + \dots + x_n = b_2 - b_3 \\ \quad \vdots \\ x_{n-1} + x_n = b_{n-1} - b_n \\ x_n = b_n. \end{cases}$$
Subtracting again consecutive equations yields
$$x_1 = b_1 - 2b_2 + b_3, \quad x_2 = b_2 - 2b_3 + b_4, \quad \dots, \quad x_{n-1} = b_{n-1} - 2b_n, \quad x_n = b_n.$$
Since the system $AX = b$ always has solutions, we deduce that $A$ is invertible. Moreover, the system is equivalent to $A^{-1}b = X$ and the expressions of $x_1, x_2, \dots, x_n$ give us the rows of $A^{-1}$: $x_1 = b_1 - 2b_2 + b_3$ shows that the first row of $A^{-1}$ equals $(1, -2, 1, 0, \dots, 0)$, ..., $x_{n-1} = b_{n-1} - 2b_n$ shows that the next-to-last row is $(0, 0, \dots, 0, 1, -2)$, and finally the last row of $A^{-1}$ is $(0, 0, \dots, 0, 1)$. Thus
$$A^{-1} = \begin{bmatrix} 1 & -2 & 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & -2 & 1 & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \dots & 1 & -2 \\ 0 & 0 & 0 & 0 & \dots & 0 & 1 \end{bmatrix}. \qquad \square$$
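A quick numerical sanity check of this banded inverse (a Python sketch assuming numpy, for n = 6):

    import numpy as np

    n = 6
    A = np.array([[max(j - i + 1, 0) for j in range(n)] for i in range(n)], dtype=float)
    # rows of A are (1, 2, ..., n), (0, 1, ..., n-1), ... as in Problem 1.45
    Ainv = np.round(np.linalg.inv(A)).astype(int)
    print(Ainv)   # rows (1, -2, 1, 0, ...), shifted right, ending with (0, ..., 0, 1)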
Problem 1.46. Let $A, B \in M_n(F)$ be matrices such that $AB = BA$. Prove that if $A$ is invertible, then $A^{-1}B = BA^{-1}$.
Solution. Multiply the relation $AB = BA$ on the left by $A^{-1}$ and on the right by $A^{-1}$. We obtain
$$A^{-1}ABA^{-1} = A^{-1}BAA^{-1}.$$
Since $A^{-1}A = I_n$, the left-hand side equals $BA^{-1}$. Since $AA^{-1} = I_n$, the right-hand side equals $A^{-1}B$. Thus $A^{-1}B = BA^{-1}$, as desired. □
Problem 1.47. Prove that a diagonal matrix $A \in M_n(F)$ is invertible if and only if all its diagonal entries are nonzero. Moreover, if this is the case, then $A^{-1}$ is also diagonal.
Solution. Let $A = [a_{ij}] \in M_n(F)$ be a diagonal matrix. If $B = [b_{ij}] \in M_n(F)$ is an arbitrary matrix, let us compute $AB$. Using the product rule, we have
$$(AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}.$$
We have $a_{ik} = 0$ for $k \ne i$, since $A$ is diagonal. Thus
$$(AB)_{ij} = a_{ii}b_{ij}$$
and similarly
$$(BA)_{ij} = a_{jj}b_{ij}.$$
Suppose now that $a_{ii} \ne 0$ for all $i \in \{1, 2, \dots, n\}$ and consider the diagonal matrix $B$ with diagonal entries $b_{ii} = \frac{1}{a_{ii}}$. Then the formulae in the previous paragraph yield $AB = BA = I_n$, thus $A$ is invertible and $A^{-1} = B$ is diagonal.
Conversely, suppose that $A$ is invertible and diagonal, thus we can find a matrix $B$ such that $AB = BA = I_n$. Thus for all $i \in \{1, \dots, n\}$ we have
$$1 = (I_n)_{ii} = (AB)_{ii} = a_{ii}b_{ii},$$
hence $a_{ii} \ne 0$ for all $i$ and so all diagonal entries of $A$ are nonzero. □
Sometimes it can be very easy to prove that a matrix $A$ is invertible and to compute its inverse, if we know that $A$ satisfies some algebraic equation. For instance, imagine that we knew that
$$A^3 + 3A + I_n = O_n.$$
Then $A^3 + 3A = -I_n$, which can be written as
$$A \cdot (-A^2 - 3I_n) = I_n.$$
On the other hand, we also have
$$(-A^2 - 3I_n) \cdot A = -A^3 - 3A = I_n,$$
thus $A$ is invertible and $A^{-1} = -A^2 - 3I_n$. In general, a similar argument shows that if $A \in M_n(\mathbb{C})$ satisfies an equation of the form
$$a_dA^d + a_{d-1}A^{d-1} + \dots + a_0I_n = 0$$
for some complex numbers $a_0, \dots, a_d$ with $a_0 \ne 0$, then $A$ is invertible and
$$A^{-1} = -\frac{a_d}{a_0}A^{d-1} - \frac{a_{d-1}}{a_0}A^{d-2} - \dots - \frac{a_1}{a_0}I_n.$$
Of course, it is totally mysterious how to find such an algebraic equation satisfied by $A$ in general, but we will see much later how to naturally create such equations (this will already require a lot of nontrivial theory!).
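As a small sketch of this recipe in Python (assuming numpy, and borrowing the matrix and the relation verified in Problem 1.48 below):

    import numpy as np

    A = np.array([[1., 2., 1.],
                  [2., 1., 3.],
                  [3., 0., -1.]])
    I = np.eye(3)
    # A satisfies A^3 - A^2 - 8A - 18I = 0 (Problem 1.48), so a_0 = -18 != 0
    # and the recipe above gives A^{-1} = (A^2 - A - 8I)/18.
    Ainv = (A @ A - A - 8 * I) / 18
    print(np.allclose(A @ Ainv, I))   # True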
We discuss below two more such examples.
Problem 1.48. Consider the matrix
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{bmatrix}.$$
a) Check that
$$A^3 - A^2 - 8A - 18I_3 = O_3.$$
b) Deduce that $A$ is invertible and compute $A^{-1}$.
Solution. a) We compute brutally, using the product rule:
$$A^2 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 8 & 4 & 6 \\ 13 & 5 & 2 \\ 0 & 6 & 4 \end{bmatrix}$$
and
$$A^3 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{bmatrix}\begin{bmatrix} 8 & 4 & 6 \\ 13 & 5 & 2 \\ 0 & 6 & 4 \end{bmatrix} = \begin{bmatrix} 34 & 20 & 14 \\ 29 & 31 & 26 \\ 24 & 6 & 14 \end{bmatrix}.$$
It follows that
$$A^3 - A^2 - 18I_3 = \begin{bmatrix} 8 & 16 & 8 \\ 16 & 8 & 24 \\ 24 & 0 & -8 \end{bmatrix} = 8A$$
and the result follows.
b) We can write the identity in a) as follows:
$$A(A^2 - A - 8I_3) = 18I_3,$$
or equivalently
$$A \cdot \frac{1}{18}(A^2 - A - 8I_3) = I_3.$$
Similarly, we obtain
$$\frac{1}{18}(A^2 - A - 8I_3) \cdot A = I_3,$$
and this shows that $A$ is invertible and
$$A^{-1} = \frac{1}{18}(A^2 - A - 8I_3) = \frac{1}{18}\begin{bmatrix} -1 & 2 & 5 \\ 11 & -4 & -1 \\ -3 & 6 & -3 \end{bmatrix}. \qquad \square$$
Problem 1.49. Let $n > 1$ be an integer and let
$$\omega = e^{\frac{2i\pi}{n}} = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}.$$
Let $F_n$ be the Fourier matrix of order $n$, whose $(j, k)$-entry is $\omega^{(j-1)(k-1)}$ for $1 \le j, k \le n$.
a) Let $\overline{F_n}$ be the matrix whose $(j, k)$-entry is the complex conjugate of the $(j, k)$-entry of $F_n$. Prove that
$$F_n\overline{F_n} = \overline{F_n}F_n = nI_n.$$
b) Deduce that $F_n$ is invertible and compute its inverse.
Solution. a) Let $j, k \in \{1, 2, \dots, n\}$ and let us use the product rule to compute
$$(F_n\overline{F_n})_{jk} = \sum_{l=1}^n (F_n)_{jl}(\overline{F_n})_{lk} = \sum_{l=1}^n \omega^{(j-1)(l-1)}\,\overline{\omega}^{(l-1)(k-1)} = \sum_{l=1}^n \omega^{(j-1)(l-1)-(l-1)(k-1)},$$
the last equality being a consequence of the fact that $\overline{\omega} = \omega^{-1}$. Thus
$$(F_n\overline{F_n})_{jk} = \sum_{l=1}^n \omega^{(l-1)(j-k)} = \sum_{l=0}^{n-1} (\omega^{j-k})^l.$$
The last sum is the sum of a geometric progression with ratio $\omega^{j-k}$. If $j = k$, then $\omega^{j-k} = 1$, so the sum equals $n$, since each term is equal to 1 in this case. If $j \ne k$, then $\omega^{j-k} \ne 1$ and we have
$$\sum_{l=0}^{n-1} (\omega^{j-k})^l = \frac{1 - (\omega^{j-k})^n}{1 - \omega^{j-k}} = \frac{1 - (\omega^n)^{j-k}}{1 - \omega^{j-k}} = 0,$$
the last equality being a consequence of the formula $\omega^n = 1$. We conclude that $(F_n\overline{F_n})_{jk}$ equals $n$ when $j = k$ and equals 0 otherwise. It follows that
$$F_n\overline{F_n} = nI_n.$$
The equality $\overline{F_n}F_n = nI_n$ is proved in exactly the same way and we leave the details to the reader.
b) By part a) we can write
$$F_n \cdot \frac{1}{n}\overline{F_n} = \frac{1}{n}\overline{F_n} \cdot F_n = I_n,$$
which plainly shows that $F_n$ is invertible and
$$F_n^{-1} = \frac{1}{n}\overline{F_n}. \qquad \square$$
1.5.1 Problems for Practice
1. Find the inverse of the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$
2. For which real numbers $x$ is the matrix
$$A = \begin{bmatrix} 1 & x \\ 2 & 3 \end{bmatrix}$$
invertible? For each such $x$, compute the inverse of $A$.
3. Is the matrix
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \in M_3(F_2)$$
invertible? If so, compute its inverse.
4. Same problem with the matrix
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix} \in M_3(F_2).$$
5. Consider the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \in M_5(\mathbb{R}).$$
Prove that $A$ is invertible and compute its inverse.
6. Consider the matrix
$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$
a) Compute the inverse of $A$ by solving, for each $b \in \mathbb{R}^4$, the system $AX = b$.
b) Prove that $A^2 = 3I_4 + 2A$. Deduce a new way of computing $A^{-1}$.
7. Let $A$ be the matrix
$$A = \begin{bmatrix} 3 & -1 & -2 \\ 5 & -2 & -3 \\ 1 & 0 & -1 \end{bmatrix}.$$
a) Check that $A^3 = O_3$.
b) Compute $(I_3 + A)^{-1}$.
8. Let $n$ be a positive integer. Find the inverse of the $n \times n$ matrix $A$ whose entries on or above the main diagonal are equal to 1 and whose entries (strictly) below the main diagonal are zero.
9. Consider the matrices
$$A = \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{bmatrix}$$
and
$$C = \begin{bmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix},$$
and let $H$ be the set of all matrices of the form $aA + bB + cC + dI_4$, with $a, b, c, d$ real numbers.
a) Prove that $A^2 = B^2 = C^2 = -I_4$ and
$$BC = -CB = A, \qquad CA = -AC = B, \qquad AB = -BA = C.$$
b) Prove that the sum and product of two elements of $H$ is an element of $H$.
c) Prove that all nonzero elements of $H$ are invertible.
10. Let $A, B \in M_n(\mathbb{R})$ be such that
$$A + B = I_n \quad \text{and} \quad A^2 + B^2 = O_n.$$
Prove that $A$ and $B$ are invertible and that
$$(A^{-1} + B^{-1})^n = 2^nI_n$$
for all positive integers $n$.
11. Let $A \in M_n(\mathbb{R})$ be an invertible matrix such that
$$A^{-1} = I_n - A.$$
Prove that
$$A^6 - I_n = O_n.$$
12. Let $A \in M_n(\mathbb{R})$ be a matrix such that $A^2 = \lambda A$, where $\lambda$ is a real number with $\lambda \ne -1$. Prove that
$$(I_n + A)^{-1} = I_n - \frac{1}{\lambda + 1}A.$$
13. Recall that a permutation matrix is a matrix $A \in M_n(\mathbb{C})$ such that every row and column of $A$ contains one entry equal to 1 and all other entries equal to 0. Prove that a permutation matrix is invertible and that its inverse is also a permutation matrix.
14. Suppose that an upper-triangular matrix $A \in M_n(\mathbb{C})$ is invertible. Prove that $A^{-1}$ is also upper-triangular.
15. Let $a, b, c$ be positive real numbers, not all of them equal, and consider the matrix
$$A = \begin{bmatrix} a & 0 & b & 0 & c & 0 \\ 0 & a & 0 & c & 0 & b \\ c & 0 & a & 0 & b & 0 \\ 0 & b & 0 & a & 0 & c \\ b & 0 & c & 0 & a & 0 \\ 0 & c & 0 & b & 0 & a \end{bmatrix}.$$
Prove that $A$ is invertible. Hint: $A^{-1}$ is a matrix of the same form as $A$ (with $a, b, c$ replaced by suitable real numbers $x, y, z$).
1.6 The Transpose of a Matrix
Let $A \in M_{m,n}(F)$ be an $m \times n$ matrix. The transpose of $A$ is the matrix $^tA$ (also denoted $A^T$) obtained by interchanging the rows and columns of $A$. Consequently $^tA$ is an $n \times m$ matrix, i.e., $^tA \in M_{n,m}(F)$. It is clear that $^tI_n = I_n$. Note that if $A = [a_{ij}]$, then $^tA = [a_{ji}]$, that is,
$$({}^tA)_{ij} = A_{ji}. \tag{1.3}$$
Example 1.50. a) The transpose of the matrix $\begin{bmatrix} 1 & -2 & -3 \\ 0 & -1 & 2 \\ -3 & -4 & 5 \end{bmatrix}$ is the matrix $\begin{bmatrix} 1 & 0 & -3 \\ -2 & -1 & -4 \\ -3 & 2 & 5 \end{bmatrix}$.
b) The transpose of the matrix $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \end{bmatrix}$ is the matrix $\begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 2 \\ 4 & 3 \end{bmatrix}$.
The following proposition summarizes the basic properties of the operation $A \mapsto {}^tA$ on $M_{m,n}(F)$.
Proposition 1.51. The transpose operation has the following properties:
1) $^t({}^tA) = A$ for all $A \in M_{m,n}(F)$;
2) $^t(A + B) = {}^tA + {}^tB$ for all $A, B \in M_{m,n}(F)$;
3) $^t(cA) = c\,{}^tA$ if $c \in F$ is a scalar and $A \in M_{m,n}(F)$;
4) $^t(AB) = {}^tB\,{}^tA$ if $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$;
5) $^t(A^k) = ({}^tA)^k$ if $A \in M_n(F)$ and $k$ is a positive integer;
6) if the matrix $A$ is invertible, then $^tA$ is also invertible and
$$({}^tA)^{-1} = {}^t(A^{-1}).$$
Proof. Properties 1), 2), and 3) are immediate from the definition of the transpose of a matrix (more precisely from relation (1.3)). Let us prove 4). First, note that $^tB \in M_{p,n}(F)$ and $^tA \in M_{n,m}(F)$, thus $^tB\,{}^tA$ makes sense. Next, if $A = [a_{ij}]$ and $B = [b_{jk}]$, we have
$$({}^t(AB))_{ki} = (AB)_{ik} = \sum_{j=1}^n a_{ij}b_{jk} = \sum_{j=1}^n ({}^tB)_{kj}({}^tA)_{ji} = ({}^tB\,{}^tA)_{ki},$$
thus $^t(AB) = {}^tB\,{}^tA$.
Property 5) follows by induction on $k$, using property 4). Finally, property 6) also follows from 4), since
$$I_n = {}^tI_n = {}^t(AA^{-1}) = {}^t(A^{-1})\,{}^tA$$
and similarly $^tA\,{}^t(A^{-1}) = I_n$. □
It follows from the previous proposition that the transpose operation leaves the general linear group $GL_n(F)$ invariant, that is, $^tA \in GL_n(F)$ whenever $A \in GL_n(F)$.
Problem 1.52. Let $X \in F^n$ be a vector with coordinates $x_1, \dots, x_n$, considered as a matrix in $M_{n,1}(F)$. Prove that for any matrix $A \in M_n(F)$ we have
$$^tX({}^tA\,A)X = \sum_{i=1}^n (a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n)^2.$$
Solution. First of all, we use Proposition 1.51 to obtain
$$^tX({}^tA\,A)X = {}^tX\,{}^tA\,AX = {}^t(AX)\,AX.$$
Write now
$$Y = AX = \begin{bmatrix} a_{11}x_1 + \dots + a_{1n}x_n \\ a_{21}x_1 + \dots + a_{2n}x_n \\ \vdots \\ a_{n1}x_1 + \dots + a_{nn}x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
Then
$$^tY\,Y = \begin{bmatrix} y_1 & y_2 & \dots & y_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
and using the product rule for matrices, we obtain that the last quantity equals $y_1^2 + \dots + y_n^2$. We conclude that
$$^tX({}^tA\,A)X = {}^tY\,Y = y_1^2 + \dots + y_n^2 = \sum_{i=1}^n (a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n)^2. \qquad \square$$
There are three types of special matrices that play a fundamental role in linear algebra and that are related to the transpose operation:
• The symmetric matrices. These are matrices $A \in M_n(F)$ for which $^tA = A$, or equivalently $a_{ij} = a_{ji}$ for all $i, j$. They play a crucial role in the theory of quadratic forms and euclidean spaces (for the latter one chooses $F = \mathbb{R}$), and a whole chapter will be devoted to their subtle properties. For example, all symmetric matrices of order 2 and 3 are of the form
$$\begin{bmatrix} a & b \\ b & c \end{bmatrix}, \quad a, b, c \in F \qquad \text{and} \qquad \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}, \quad a, b, c, d, e, f \in F.$$
• The orthogonal matrices. These are matrices $A \in GL_n(F)$ for which
$$A^{-1} = {}^tA.$$
They also play a fundamental role in the theory of euclidean spaces, since these matrices correspond to isometries of such spaces. They will also be extensively studied in the last chapter of the book.
• The skew-symmetric (or antisymmetric) matrices. These are matrices for which
$$A + {}^tA = O_n,$$
that is, $^tA = -A$. These matrices are related to alternating forms. They satisfy $a_{ij} = -a_{ji}$ for all $i, j$. Thus $2a_{ii} = 0$. If $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$, then this last equality forces $a_{ii} = 0$ for all $i$. Thus the diagonal elements of a skew-symmetric matrix are in this case equal to 0. On the other hand, over a field $F$ such as $F_2$ (the field with two elements), the condition $2a_{ii} = 0$ does not give any information about the element $a_{ii}$, since for any $x \in F_2$ we have $2x = 0$. Actually, over such a field there is no difference between symmetric and skew-symmetric matrices! All skew-symmetric matrices of order 2 and 3 over the field $\mathbb{C}$ of complex numbers are of the following form:
$$\begin{bmatrix} 0 & -a \\ a & 0 \end{bmatrix}, \quad a \in \mathbb{C} \qquad \text{and} \qquad \begin{bmatrix} 0 & -a & -b \\ a & 0 & -c \\ b & c & 0 \end{bmatrix}, \quad a, b, c \in \mathbb{C}.$$
Proposition 1.53. All matrices in the following statements are square matrices of the same size. Prove that:
1) The sum of a matrix and its transpose is a symmetric matrix. The difference of a matrix and its transpose is a skew-symmetric matrix.
2) The product of a matrix and its transpose is a symmetric matrix.
3) Any power of a symmetric matrix is symmetric.
4) An even power of a skew-symmetric matrix is symmetric. An odd power of a skew-symmetric matrix is skew-symmetric.
5) If $A$ is invertible and symmetric, then $A^{-1}$ is symmetric.
6) If $A$ is invertible and skew-symmetric, then $A^{-1}$ is skew-symmetric.
Proof. 1) If $A$ is a matrix, then $^t(A + {}^tA) = {}^tA + {}^t({}^tA) = {}^tA + A = A + {}^tA$, thus $A + {}^tA$ is symmetric. Similarly, $^t(A - {}^tA) = {}^tA - A = -(A - {}^tA)$, thus $A - {}^tA$ is skew-symmetric.
2) We have $^t(A\,{}^tA) = {}^t({}^tA)\,{}^tA = A\,{}^tA$, thus $A\,{}^tA$ is symmetric.
3) and 4) follow from the equality $({}^tA)^n = {}^t(A^n)$, valid for any matrix $A$ and any $n \ge 1$.
5) and 6) follow from the equality $^t(A^{-1}) = ({}^tA)^{-1}$, valid for any invertible matrix $A$. □
We end this section with a rather long list of problems that illustrate the ideas
introduced in this section.
Problem 1.54. Describe the symmetric matrices $A \in M_n(F)$ which are simultaneously upper-triangular.
Solution. Let $A = [a_{ij}]$ be a symmetric and upper-triangular matrix. By definition $a_{ij} = 0$ whenever $i > j$ (since $A$ is upper-triangular) and moreover $a_{ij} = a_{ji}$ for all $i, j \in \{1, 2, \dots, n\}$. We conclude that $a_{ij} = 0$ whenever $i \ne j$, that is, $A$ is diagonal. Conversely, any diagonal matrix is clearly symmetric and upper-triangular. Thus the answer of the problem is: the diagonal matrices. □
Problem 1.55. How many symmetric matrices are there in $M_n(F_2)$?
Solution. By definition, each entry of a matrix $A = [a_{ij}] \in M_n(F_2)$ is equal to 0 or 1, and $A$ is symmetric if and only if $a_{ij} = a_{ji}$ for all $i, j \in \{1, 2, \dots, n\}$. Thus a symmetric matrix $A$ is entirely determined by the choice of the entries above or on the main diagonal, that is, the entries $a_{ij}$ with $1 \le i \le j \le n$. Moreover, for any choice of these entries, we can construct a symmetric matrix by defining $a_{ij} = a_{ji}$ for $i > j$. For each pair $(i, j)$ with $1 \le i \le j \le n$ we have two choices for the entry $a_{ij}$ (either 0 or 1). Since there are $n + \binom{n}{2} = \frac{n(n+1)}{2}$ such pairs $(i, j)$ ($n$ pairs with $i = j$ and $\binom{n}{2} = \frac{n(n-1)}{2}$ pairs with $i < j$) and since the choices are independent, we deduce that the number of symmetric matrices in $M_n(F_2)$ is $2^{\frac{n(n+1)}{2}}$. □
Problem 1.56. a) Describe the diagonal matrices $A \in M_n(\mathbb{R})$ which are skew-symmetric.
b) Same question, but replacing $\mathbb{R}$ with $F_2$.
Solution. a) Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a diagonal skew-symmetric matrix. Since $A$ is diagonal, all entries away from the main diagonal are zero. Also, since $A + {}^tA = 0$, we have
$$a_{ii} + a_{ii} = 0$$
for all $i \in \{1, 2, \dots, n\}$, by noticing that $A$ and $^tA$ have the same diagonal entries. We conclude that $2a_{ii} = 0$ and so $a_{ii} = 0$ for all $i \in \{1, 2, \dots, n\}$. Thus $A = O_n$ is the unique diagonal skew-symmetric matrix in $M_n(\mathbb{R})$.
b) As in part a), a matrix $A = [a_{ij}] \in M_n(F_2)$ is diagonal and skew-symmetric if and only if it is diagonal and its diagonal entries $a_{ii}$ (for $1 \le i \le n$) satisfy $2a_{ii} = 0$. However, any element $x$ of $F_2$ satisfies $2x = 0$, thus the condition $2a_{ii} = 0$ is automatic. We conclude that any diagonal matrix $A \in M_n(F_2)$ is skew-symmetric! □
Problem 1.57. A matrix $A \in M_n(\mathbb{R})$ has a unique nonzero entry in each row and column, and that entry equals 1 or $-1$. Prove that $A$ is orthogonal.
Solution. Let $A = [a_{ij}]$. We need to prove that $A^{-1} = {}^tA$, that is, $A\,{}^tA = I_n$ and $^tA\,A = I_n$. Fix $i, j \in \{1, 2, \dots, n\}$. Then the $(i, j)$-entry of $A\,{}^tA$ is
$$(A\,{}^tA)_{ij} = \sum_{k=1}^n a_{ik}a_{jk}.$$
Assume that $a_{ik}a_{jk}$ is nonzero for some $k \in \{1, 2, \dots, n\}$, thus both $a_{ik}$ and $a_{jk}$ are nonzero. If $i \ne j$, this means that $A$ has at least two nonzero entries in column $k$, which is impossible. Thus if $i \ne j$, then $a_{ik}a_{jk} = 0$ for all $k \in \{1, 2, \dots, n\}$ and consequently the $(i, j)$-entry of $A\,{}^tA$ is 0.
On the other hand, if $i = j$, then
$$(A\,{}^tA)_{ii} = \sum_{k=1}^n a_{ik}^2.$$
Now, by assumption the $i$th row of $A$ consists of one entry equal to 1 or $-1$, and all other entries are 0. Since $\sum_{k=1}^n a_{ik}^2$ is simply the sum of squares of the entries in the $i$th row, we deduce that $\sum_{k=1}^n a_{ik}^2 = 1$ and so $(A\,{}^tA)_{ij} = 1$ for $i = j$. We conclude that $A\,{}^tA = I_n$. The reader will have no problem adapting this argument in order to prove the equality $^tA\,A = I_n$. □
Remark 1.58. In particular all such matrices are invertible, a fact which is definitely
not obvious. Moreover, it is very easy to compute the inverse of such a matrix:
simply take its transpose!
Problem 1.59. Prove that any matrix $A \in M_n(\mathbb{C})$ can be expressed in a unique way as $B + C$, where $B$ is symmetric and $C$ is skew-symmetric.
Solution. If $A = B + C$ with $B$ symmetric and $C$ skew-symmetric, then necessarily $^tA = B - C$, thus
$$B = \frac{1}{2}(A + {}^tA) \quad \text{and} \quad C = \frac{1}{2}(A - {}^tA).$$
Conversely, choosing $B$ and $C$ as in the previous relation, they are symmetric, respectively skew-symmetric (by the previous proposition) and they add up to $A$. □
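In Python the decomposition is a one-liner in each part; a small sketch (assuming numpy):

    import numpy as np

    A = np.array([[1., 3.],
                  [2., 2.]])
    B = (A + A.T) / 2   # symmetric part
    C = (A - A.T) / 2   # skew-symmetric part
    print(np.array_equal(B, B.T), np.array_equal(C, -C.T), np.allclose(B + C, A))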
Problem 1.60. The matrix $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ is the difference of a symmetric matrix $B$ and of a skew-symmetric matrix $C$. Find $B$.
Solution. By assumption we have $A = B - C$ with $^tB = B$ and $^tC = -C$. Thus
$$^tA = {}^t(B - C) = {}^tB - {}^tC = B + C.$$
We conclude that
$$A + {}^tA = (B - C) + (B + C) = 2B$$
and so
$$B = \frac{1}{2}(A + {}^tA) = \frac{1}{2}\left(\begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 2 & 5 \\ 5 & 4 \end{bmatrix} = \begin{bmatrix} 1 & \frac{5}{2} \\ \frac{5}{2} & 2 \end{bmatrix}. \qquad \square$$
Problem 1.61. a) Let $A \in M_n(\mathbb{R})$ be a matrix such that $^tA\,A = O_n$. Prove that $A = O_n$.
b) Does the result in part a) hold if we replace $\mathbb{R}$ with $\mathbb{C}$?
Solution. a) Let $A = [A_{ij}]$. By the product rule for matrices, the $(i, i)$-entry of $^tA\,A$ is
$$({}^tA\,A)_{ii} = \sum_{k=1}^n ({}^tA)_{ik}A_{ki} = \sum_{k=1}^n A_{ki}^2.$$
Since $^tA\,A = O_n$, we conclude that for all $i \in \{1, 2, \dots, n\}$ we have
$$\sum_{k=1}^n A_{ki}^2 = 0.$$
Since the square of a real number is nonnegative, the last equality forces $A_{ki} = 0$ for all $k \in \{1, 2, \dots, n\}$. Since $i$ was arbitrary, we conclude that $A = O_n$.
b) The result no longer holds. Let us look for a symmetric matrix $A \in M_2(\mathbb{C})$ such that $^tA\,A = O_2$, that is, $A^2 = O_2$. Since $A$ is symmetric, we can write
$$A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$$
for some complex numbers $a, b, d$. Now
$$A^2 = \begin{bmatrix} a & b \\ b & d \end{bmatrix}\begin{bmatrix} a & b \\ b & d \end{bmatrix} = \begin{bmatrix} a^2 + b^2 & b(a + d) \\ b(a + d) & b^2 + d^2 \end{bmatrix}.$$
So we look for complex numbers $a, b, d$ which are not all equal to 0 and for which
$$a^2 + b^2 = 0, \qquad b(a + d) = 0, \qquad b^2 + d^2 = 0.$$
It suffices to ensure that $a + d = 0$ and $a^2 + b^2 = 0$. For instance, one can take $a = i$, $b = 1$, $d = -i$. □
Remark 1.62. We could have also used Problem 1.52 in order to solve part a). Indeed, for any $X \in \mathbb{R}^n$ we have $^tX({}^tA\,A)X = 0$ and so
$$\sum_{i=1}^n (a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n)^2 = 0$$
for any choice of real numbers $x_1, \dots, x_n$. Since a sum of squares of real numbers equals 0 if and only if all these numbers are equal to 0, we deduce that
$$a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = 0$$
for all $i \in \{1, 2, \dots, n\}$ and all real numbers $x_1, \dots, x_n$. Thus $AX = 0$ for all $X \in \mathbb{R}^n$ and then $A = O_n$.
1.6.1 Problems for Practice
1. Consider the matrices
$$A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}.$$
Compute each of the following matrices:
a) $A\,{}^tB$;
b) $B\,{}^tA$;
c) $(A + 2\,{}^tB)(B + 2\,{}^tA)$.
2. Let $\theta \in \mathbb{R}$ and let
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
a) Prove that $A$ is orthogonal.
b) Find all values of $\theta$ for which $A$ is symmetric.
c) Find all values of $\theta$ for which $A$ is skew-symmetric.
3. Which matrices $A \in M_n(F_2)$ are the sum of a symmetric matrix and of a skew-symmetric matrix?
4. Write the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}$$
as the sum of a symmetric matrix and of a skew-symmetric matrix with real entries.
5. All matrices in the following statements are square matrices of the same size. Prove that:
a) the product of two symmetric matrices is a symmetric matrix if and only if the two matrices commute;
b) the product of two antisymmetric matrices is a symmetric matrix if and only if the two matrices commute;
c) the product of a symmetric and a skew-symmetric matrix is skew-symmetric if and only if the two matrices commute.
6. We have seen that the square of a symmetric matrix $A \in M_n(\mathbb{R})$ is symmetric. Is it true that if the square of a matrix $A \in M_n(\mathbb{R})$ is symmetric, then $A$ is symmetric?
7. Consider the map $\varphi : M_3(\mathbb{R}) \to M_3(\mathbb{R})$ defined by
$$\varphi(A) = {}^tA + 2A.$$
Prove that $\varphi$ is linear, that is,
$$\varphi(cA + B) = c\varphi(A) + \varphi(B)$$
for all $A, B \in M_3(\mathbb{R})$ and all $c \in \mathbb{R}$.
8. Let $A \in M_n(\mathbb{R})$ be a matrix such that $A\,{}^tA = O_n$. Prove that $A = O_n$.
9. Find the skew-symmetric matrices $A \in M_n(\mathbb{R})$ such that $A^2 = O_n$.
10. Let $A_1, \dots, A_k \in M_n(\mathbb{R})$ be matrices such that
$$^tA_1A_1 + \dots + {}^tA_kA_k = O_n.$$
Prove that $A_1 = \dots = A_k = O_n$.
11. a) Let $A \in M_3(\mathbb{R})$ be a skew-symmetric matrix. Prove that there exists a nonzero vector $X \in \mathbb{R}^3$ such that $AX = 0$.
b) Does the result in part a) remain true if we replace $M_3(\mathbb{R})$ with $M_2(\mathbb{R})$?
12. Describe all upper-triangular matrices $A \in M_n(\mathbb{R})$ such that
$$A\,{}^tA = {}^tA\,A.$$
Chapter 2
Square Matrices of Order 2
Abstract The main topic of this chapter is a detailed study of 2 × 2 matrices and their applications, for instance to linear recursive sequences and Pell's equations. The key ingredient is the Cayley–Hamilton theorem, which is systematically used in analyzing the properties of these matrices. Many of these properties will be proved in subsequent chapters by more advanced methods.
Keywords Cayley–Hamilton • Binomial equation • Trace • Determinant • Pell's equation
In this chapter we will study some specific problems involving matrices of order two and, to make things even more concrete, we will work exclusively with matrices whose entries are real or complex numbers. The reason for doing this is that in this case one can actually perform explicit computations which might help the reader become more familiar with the material introduced in the previous chapter. Also, many of the results discussed in this chapter in a very special context will later on be generalized (very often with completely different methods and tools!). We should however warn the reader from the very beginning: studying square matrices of order 2 is very far from being trivial, even though it might be tempting to believe the contrary.
A matrix $A \in M_2(\mathbb{C})$ is scalar if it is of the form $zI_2$ for some complex number $z$. One can define the notion of scalar matrix in full generality: if $F$ is a field and $n \ge 1$, the scalar matrices are precisely the matrices of the form $cI_n$, where $c \in F$ is a scalar.
2.1 The Trace and the Determinant Maps
We introduce now two fundamental invariants of a 2 × 2 matrix, which will be generalized and extensively studied in subsequent chapters for n × n matrices:
Definition 2.1. Consider a matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \in M_2(\mathbb{C})$. We define
• the trace of $A$ as
$$\mathrm{Tr}(A) = a_{11} + a_{22};$$
• the determinant of $A$ as
$$\det A = a_{11}a_{22} - a_{12}a_{21}.$$
We also write
$$\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$
for the determinant of $A$.
We obtain in this way two maps
$$\mathrm{Tr}, \det : M_2(\mathbb{C}) \to \mathbb{C}$$
which essentially govern the theory of 2 × 2 matrices. The following proposition summarizes the main properties of the trace map. The second property is absolutely fundamental. Recall that $^tA$ is the transpose of the matrix $A$.
Proposition 2.2. For all matrices $A, B \in M_2(\mathbb{C})$ and all complex numbers $z \in \mathbb{C}$ we have:
(a) $\mathrm{Tr}(A + zB) = \mathrm{Tr}(A) + z\mathrm{Tr}(B)$;
(b) $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$;
(c) $\mathrm{Tr}({}^tA) = \mathrm{Tr}(A)$.
Proof. Properties (a) and (c) are readily checked, so let us focus on property (b). Write
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}$$
and
$$BA = \begin{bmatrix} b_{11}a_{11} + b_{12}a_{21} & b_{11}a_{12} + b_{12}a_{22} \\ b_{21}a_{11} + b_{22}a_{21} & b_{21}a_{12} + b_{22}a_{22} \end{bmatrix}.$$
Thus
$$\mathrm{Tr}(AB) = a_{11}b_{11} + a_{12}b_{21} + a_{21}b_{12} + a_{22}b_{22} = \mathrm{Tr}(BA). \qquad \square$$
Remark 2.3. The map $\mathrm{Tr} : M_2(\mathbb{C}) \to \mathbb{C}$ is not multiplicative, i.e., generally $\mathrm{Tr}(AB) \ne \mathrm{Tr}(A)\mathrm{Tr}(B)$. For instance $\mathrm{Tr}(I_2 \cdot I_2) = \mathrm{Tr}(I_2) = 2$, while $\mathrm{Tr}(I_2)\cdot\mathrm{Tr}(I_2) = 2 \cdot 2 = 4 \ne 2$.
Let us turn now to properties of the determinant map:
Proposition 2.4. For all matrices $A, B \in M_2(\mathbb{C})$ and all complex numbers $\alpha$ we have:
(1) $\det(AB) = \det A \cdot \det B$;
(2) $\det {}^tA = \det A$;
(3) $\det(\alpha A) = \alpha^2 \det A$.
Proof. Properties (2) and (3) follow readily from the definition of the determinant. Property (1) will be checked by a painful direct computation. Let
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad B = \begin{bmatrix} x & y \\ z & t \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} ax + bz & ay + bt \\ cx + dz & cy + dt \end{bmatrix}$$
and so
$$\det(AB) = (ax + bz)(cy + dt) - (cx + dz)(ay + bt)$$
$$= acxy + adxt + bcyz + bdzt - acxy - bcxt - adyz - bdzt$$
$$= xt(ad - bc) - yz(ad - bc) = (ad - bc)(xt - yz) = \det A \cdot \det B,$$
as desired. □
Problem 2.5. Let $A \in M_2(\mathbb{R})$ be such that
$$\det(A + 2I_2) = \det(A - I_2).$$
Prove that
$$\det(A + I_2) = \det A.$$
Solution. Write $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The condition becomes
$$\begin{vmatrix} a+2 & b \\ c & d+2 \end{vmatrix} = \begin{vmatrix} a-1 & b \\ c & d-1 \end{vmatrix},$$
or equivalently
$$(a+2)(d+2) - bc = (a-1)(d-1) - bc.$$
Expanding and canceling similar terms, we obtain the equivalent relation $a + d = -1$. Using similar arguments, the equality $\det(A + I_2) = \det A$ is equivalent to $(a+1)(d+1) - bc = ad - bc$, or $a + d = -1$. The result follows. □
2.1.1 Problems for Practice
1. Compute the trace and the determinant of the following matrices:
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \qquad A = \begin{bmatrix} 3 & -6 \\ 2 & -4 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.$$
2. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Compute the determinant of $A^7$.
3. The trace of $A \in M_2(\mathbb{C})$ equals 0. Prove that the trace of $A^3$ also equals 0.
4. Prove that for all matrices $A \in M_2(\mathbb{C})$ we have
$$\det A = \frac{(\mathrm{Tr}(A))^2 - \mathrm{Tr}(A^2)}{2}.$$
5. Prove that for all matrices $A, B \in M_2(\mathbb{C})$ we have
$$\det(A + B) = \det A + \det B + \mathrm{Tr}(A)\,\mathrm{Tr}(B) - \mathrm{Tr}(AB).$$
6. Let $f : M_2(\mathbb{C}) \to \mathbb{C}$ be a map with the property that for all matrices $A, B \in M_2(\mathbb{C})$ and all complex numbers $z$ we have
$$f(A + zB) = f(A) + zf(B) \quad \text{and} \quad f(AB) = f(BA).$$
(a) Consider the matrices
$$E_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad E_{12} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad E_{21} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad E_{22} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$
and define $x_{ij} = f(E_{ij})$. Check that $E_{12}E_{21} = E_{11}$ and $E_{21}E_{12} = E_{22}$, and deduce that $x_{11} = x_{22}$.
(b) Check that $E_{11}E_{12} = E_{12}$ and $E_{12}E_{11} = O_2$, and deduce that $x_{12} = 0$. Using a similar argument, prove that $x_{21} = 0$.
(c) Conclude that there is a complex number $c$ such that
$$f(A) = c\,\mathrm{Tr}(A)$$
for all matrices $A$.
2.2 The Characteristic Polynomial and the Cayley–Hamilton Theorem
Let $A \in M_2(\mathbb{C})$. The characteristic polynomial of $A$ is by definition the polynomial denoted $\det(XI_2 - A)$ and defined by
$$\det(XI_2 - A) = X^2 - \mathrm{Tr}(A)X + \det A.$$
We note straight away that $AB$ and $BA$ have the same characteristic polynomial for all matrices $A, B \in M_2(\mathbb{C})$, since $AB$ and $BA$ have the same trace and the same determinant, by results established in the previous section. In particular, if $P$ is invertible, then $A$ and $PAP^{-1}$ have the same characteristic polynomial.
The notation $\det(XI_2 - A)$ is rather suggestive, and it is indeed coherent, in the sense that for any complex number $z$, if we evaluate the characteristic polynomial of $A$ at $z$, we obtain precisely the determinant of the matrix $zI_2 - A$. More generally, we have the following very useful:
Problem 2.6. For any two matrices $A, B \in M_2(\mathbb{C})$ there is a complex number $u$ such that
$$\det(A + zB) = \det A + uz + (\det B)z^2$$
for all complex numbers $z$. If $A, B$ have integer/rational/real entries, then $u$ is integer/rational/real.
Solution. Write $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $B = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}$. Then
$$\det(A + zB) = \begin{vmatrix} a + z\alpha & b + z\beta \\ c + z\gamma & d + z\delta \end{vmatrix} = (a + z\alpha)(d + z\delta) - (b + z\beta)(c + z\gamma)$$
$$= z^2(\alpha\delta - \beta\gamma) + z(a\delta + d\alpha - b\gamma - c\beta) + ad - bc.$$
Since $\alpha\delta - \beta\gamma = \det B$ and $ad - bc = \det A$, the result follows.
In other words, for any two matrices $A, B \in M_2(\mathbb{C})$ we can define a quadratic polynomial $\det(A + XB)$ which, evaluated at any complex number $z$, gives $\det(A + zB)$. Moreover, $\det(A + XB)$ has constant term $\det A$ and leading coefficient $\det B$, and if $A, B$ have rational/integer/real entries, then this polynomial has rational/integer/real coefficients. Before moving on, let us practice some problems to better digest these ideas.
Problem 2.7. Let $U, V \in M_2(\mathbb{R})$. Using the polynomial $\det(U + XV)$, prove that
$$\det(U + V) + \det(U - V) = 2\det U + 2\det V.$$
Solution. Write
$$f(X) = \det(XV + U) = (\det V)X^2 + mX + \det U$$
for some $m \in \mathbb{R}$. Then
$$\det(U + V) + \det(U - V) = f(1) + f(-1) = (\det V + m + \det U) + (\det V - m + \det U) = 2(\det U + \det V).$$
Problem 2.8. Let $A, B \in M_2(\mathbb{R})$. Using the previous problem, prove that
$$\det(A^2 + B^2) + \det(AB + BA) \ge 0.$$
Solution. As suggested, we use the identity
$$\det(U + V) + \det(U - V) = 2\det U + 2\det V$$
from Problem 2.7, and take $U = A^2 + B^2$, $V = AB + BA$. Thus
$$\det(A^2 + B^2 + AB + BA) + \det(A^2 + B^2 - AB - BA) = 2\det(A^2 + B^2) + 2\det(AB + BA).$$
As $A^2 + B^2 + AB + BA = (A + B)^2$ and $A^2 + B^2 - AB - BA = (A - B)^2$, we obtain
$$2\det(A^2 + B^2) + 2\det(AB + BA) = \det(A + B)^2 + \det(A - B)^2 \ge 0.$$
Problem 2.9. Let $A, B \in M_2(\mathbb{R})$. Using the polynomial
$$f(X) = \det(I_2 + AB + X(BA - AB)),$$
prove that
$$\det\left(I_2 + \frac{2AB + 3BA}{5}\right) = \det\left(I_2 + \frac{3AB + 2BA}{5}\right).$$
Solution. As suggested, consider the polynomial of degree at most 2
$$f(X) = \det(I_2 + AB + X(BA - AB)).$$
Since $I_2 + \frac{2AB+3BA}{5} = I_2 + AB + \frac{3}{5}(BA - AB)$ and $I_2 + \frac{3AB+2BA}{5} = I_2 + AB + \frac{2}{5}(BA - AB)$, we need to prove that $f\left(\frac{3}{5}\right) = f\left(\frac{2}{5}\right)$. We claim that $f(X) = f(1 - X)$, which clearly implies the desired result. The polynomial $g(X) = f(X) - f(1 - X)$ has degree at most 1 and satisfies $g(0) = g(1) = 0$. Indeed, we have
$$g(0) = f(0) - f(1) = \det(I_2 + AB) - \det(I_2 + BA) = 0,$$
since $AB$ and $BA$ have the same characteristic polynomial. Also, $g(1) = f(1) - f(0) = 0$. Thus $g$ must be the zero polynomial and the result follows.
We introduce now another crucial tool in the theory of matrices, which will be vastly generalized in subsequent chapters to n × n matrices (using completely different ideas and techniques).
Definition 2.10. The eigenvalues of a matrix $A \in M_2(\mathbb{C})$ are the roots of its characteristic polynomial; in other words, they are the complex solutions $\lambda_1, \lambda_2$ of the equation
$$\det(tI_2 - A) = t^2 - \mathrm{Tr}(A)t + \det A = 0.$$
Note that
$$\lambda_1 + \lambda_2 = \mathrm{Tr}(A) \quad \text{and} \quad \lambda_1\lambda_2 = \det A,$$
i.e., the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues. Indeed, by definition of $\lambda_1$ and $\lambda_2$ the characteristic polynomial is $(X - \lambda_1)(X - \lambda_2)$, and identifying the coefficients of $X$ and $X^0 = 1$ yields the desired relations.
The following result is absolutely fundamental for the study of square
matrices of order 2.
Theorem 2.11 (Cayley–Hamilton). For any $A \in M_2(\mathbb{C})$ we have
$$A^2 - \mathrm{Tr}(A)\,A + (\det A)\,I_2 = O_2.$$
Proof. Write $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$; then a direct computation shows that
$$A^2 = \begin{bmatrix} a^2 + bc & b(a + d) \\ c(a + d) & d^2 + bc \end{bmatrix}.$$
Letting $x = \mathrm{Tr}(A)$, we obtain
$$A^2 - \mathrm{Tr}(A)\,A + (\det A)\,I_2 = \begin{bmatrix} a^2 + bc - ax & b(a+d) - bx \\ c(a+d) - cx & d^2 + bc - dx \end{bmatrix} + \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}$$
$$= \begin{bmatrix} a^2 + ad - ax & b(a + d - x) \\ c(a + d - x) & d^2 + ad - dx \end{bmatrix} = O_2,$$
since $a^2 + ad - ax = a(a + d - x) = 0$, $b(a + d - x) = 0$, $c(a + d - x) = 0$, and similarly $d^2 + ad - dx = d(a + d - x) = 0$. □
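A quick random test of the theorem (a Python sketch, assuming numpy):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 2))
    tr, det = np.trace(A), np.linalg.det(A)
    # Cayley-Hamilton: A^2 - Tr(A) A + (det A) I_2 = O_2
    print(np.allclose(A @ A - tr * A + det * np.eye(2), 0))   # True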
Remark 2.12. (a) In other words, the matrix $A$ is a solution of the characteristic equation
$$\det(tI_2 - A) = t^2 - \mathrm{Tr}(A)t + \det A = 0.$$
(b) Expressed in terms of the eigenvalues $\lambda_1$ and $\lambda_2$ of $A$, the Cayley–Hamilton theorem can be written either
$$A^2 - (\lambda_1 + \lambda_2)A + \lambda_1\lambda_2 I_2 = O_2 \tag{2.1}$$
or equivalently
$$(A - \lambda_1 I_2)(A - \lambda_2 I_2) = O_2. \tag{2.2}$$
Both relations are extremely useful when dealing with square matrices of order 2, and we will see many applications in subsequent sections.
Problem 2.13. Let $A \in M_2(\mathbb{C})$ have eigenvalues $\lambda_1$ and $\lambda_2$. Prove that for all $n \ge 1$ we have
$$\mathrm{Tr}(A^n) = \lambda_1^n + \lambda_2^n.$$
Deduce that $\lambda_1^n$ and $\lambda_2^n$ are the eigenvalues of $A^n$.
Solution. Let $x_n = \mathrm{Tr}(A^n)$. Multiplying relation (2.1) by $A^n$ and taking the trace yields
$$x_{n+2} - (\lambda_1 + \lambda_2)x_{n+1} + \lambda_1\lambda_2 x_n = 0.$$
Since $x_0 = 2$ and $x_1 = \mathrm{Tr}(A) = \lambda_1 + \lambda_2$, an immediate induction shows that $x_n = \lambda_1^n + \lambda_2^n$ for all $n$.
For the second part, let $z_1, z_2$ be the eigenvalues of $A^n$. By definition, they are the solutions of the equation $t^2 - \mathrm{Tr}(A^n)t + \det(A^n) = 0$. Since $\det(A^n) = (\det A)^n = \lambda_1^n\lambda_2^n$ and $\mathrm{Tr}(A^n) = \lambda_1^n + \lambda_2^n$, the previous equation is equivalent to
$$t^2 - (\lambda_1^n + \lambda_2^n)t + \lambda_1^n\lambda_2^n = 0$$
or
$$(t - \lambda_1^n)(t - \lambda_2^n) = 0.$$
The result follows.
Problem 2.14. Let $A \in M_2(\mathbb{C})$ be a matrix with $\mathrm{Tr}(A) \ne 0$. Prove that a matrix $B \in M_2(\mathbb{C})$ commutes with $A$ if and only if $B$ commutes with $A^2$.
Solution. Clearly, if $BA = AB$, then $BA^2 = A^2B$, so assume conversely that $BA^2 = A^2B$. Using the Cayley–Hamilton theorem, we can write this relation as
$$B(\mathrm{Tr}(A)A - \det A \cdot I_2) = (\mathrm{Tr}(A)A - \det A \cdot I_2)B,$$
or
$$\mathrm{Tr}(A)(BA - AB) = O_2.$$
Since $\mathrm{Tr}(A) \ne 0$, we obtain $BA = AB$, as desired.
Problem 2.15. Prove that for any matrices $A, B \in M_2(\mathbb{R})$ there is a real number $\alpha$ such that $(AB - BA)^2 = \alpha I_2$.
Solution. Let $X = AB - BA$. Since $\mathrm{Tr}(X) = \mathrm{Tr}(AB) - \mathrm{Tr}(BA) = 0$, the Cayley–Hamilton theorem yields $X^2 = -(\det X)I_2$ and so we can take $\alpha = -\det X$.
Problem 2.16. Let $X \in M_2(\mathbb{R})$ be a matrix such that $\det(X^2 + I_2) = 0$. Prove that $X^2 + I_2 = O_2$.
Solution. We have $\det(X + iI_2) = 0$ or $\det(X - iI_2) = 0$, and since $\det(X - iI_2) = \overline{\det(X + iI_2)}$, we deduce that $\det(X + iI_2) = 0 = \det(X - iI_2)$. If $X = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the relation $\det(X + iI_2) = 0$ is equivalent to $(a + i)(d + i) - bc = 0$, i.e., $ad - bc = 1$ and $a + d = 0$. Thus $\det X = 1$ and $\mathrm{Tr}(X) = 0$, and we conclude using the Cayley–Hamilton theorem.
An important consequence of the Cayley–Hamilton theorem is the following result (which can of course be proved directly by hand).
Theorem 2.17. A matrix $A \in M_2(\mathbb{C})$ is invertible if and only if $\det A \ne 0$. If this is the case, then
$$A^{-1} = \frac{1}{\det A}(\mathrm{Tr}(A)\,I_2 - A).$$
Proof. Suppose that $A$ is invertible. Then taking the determinant in the equality $AA^{-1} = I_2$ we obtain
$$\det A \cdot \det A^{-1} = \det I_2 = 1,$$
thus $\det A \ne 0$.
Conversely, suppose that $\det A \ne 0$ and define
$$B = \frac{1}{\det A}(\mathrm{Tr}(A)\,I_2 - A).$$
Then using the Cayley–Hamilton theorem we obtain
$$AB = \frac{1}{\det A}(\mathrm{Tr}(A)\,A - A^2) = \frac{1}{\det A}(\det A)\,I_2 = I_2$$
and similarly $BA = I_2$. Thus $A$ is invertible and $A^{-1} = B$. □
Remark 2.18. One can also check directly that if $\det A \ne 0$, then $A$ is invertible, its inverse being given by
$$A^{-1} = \frac{1}{\det A}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.$$
Problem 2.19. Let $A, B \in M_2(\mathbb{C})$ be two matrices such that $AB = I_2$. Then $A$ is invertible and $B = A^{-1}$. In particular, we have $BA = I_2$.
Solution. Since $AB = I_2$, we have $\det A \cdot \det B = \det(AB) = 1$, thus $\det A \ne 0$. The previous theorem shows that $A$ is invertible. Multiplying the equality $AB = I_2$ by $A^{-1}$ on the left, we obtain $B = A^{-1}$. Finally, $BA = A^{-1}A = I_2$.
A very important consequence of the previous theorem is the following characterization of eigenvalues:
Theorem 2.20. If $A \in M_2(\mathbb{C})$ and $z \in \mathbb{C}$, then the following assertions are equivalent:
(a) $z$ is an eigenvalue of $A$;
(b) $\det(zI_2 - A) = 0$;
(c) there is a nonzero vector $v \in \mathbb{C}^2$ such that $Av = zv$.
Proof. By definition of eigenvalues, (a) implies (b). Suppose that (b) holds and let $B = A - zI_2$. The assumption implies that $\det B = 0$ and so $b_{11}b_{22} = b_{12}b_{21}$. We need to prove that we can find $x_1, x_2 \in \mathbb{C}$, not both zero, such that
$$b_{11}x_1 + b_{12}x_2 = 0 \quad \text{and} \quad b_{21}x_1 + b_{22}x_2 = 0.$$
If $b_{11} \ne 0$ or $b_{12} \ne 0$, choose $x_1 = -b_{12}$ and $x_2 = b_{11}$, so suppose that $b_{11} = 0 = b_{12}$. If one of $b_{21}, b_{22}$ is nonzero, choose $x_1 = b_{22}$ and $x_2 = -b_{21}$; otherwise choose $x_1 = x_2 = 1$. Thus (b) implies (c).
Suppose now that (c) holds. Then $A^2v = zAv = z^2v$ and using relation (2.1) we obtain
$$(z^2 - \mathrm{Tr}(A)z + \det A)v = 0.$$
Since $v \ne 0$, this forces $z^2 - \mathrm{Tr}(A)z + \det A = 0$ and so $z$ is an eigenvalue of $A$. Thus (c) implies (a) and the theorem is proved. □
Problem 2.21. Let $A \in M_2(\mathbb{C})$ have two distinct eigenvalues $\lambda_1, \lambda_2$. Prove that we can find an invertible matrix $P \in GL_2(\mathbb{C})$ such that
$$A = P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}P^{-1}.$$
Solution. By the previous theorem, we can find two nonzero vectors $X_1 = \begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix}$ and $X_2 = \begin{bmatrix} x_{12} \\ x_{22} \end{bmatrix}$ such that $AX_i = \lambda_iX_i$.
Consider the matrix $P = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$ whose columns are $X_1, X_2$. A simple computation shows that the columns of $AP$ are $\lambda_1X_1$ and $\lambda_2X_2$, which are the columns of $P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$, thus $AP = P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$. It remains to see that if $\lambda_1 \ne \lambda_2$, then $P$ is invertible (we haven't used so far the hypothesis $\lambda_1 \ne \lambda_2$).
Suppose that $\det P = 0$, thus $x_{11}x_{22} = x_{21}x_{12}$. This easily implies that the columns of $P$ are proportional, say the second column $X_2$ is $\alpha$ times the first column $X_1$. Thus $X_2 = \alpha X_1$. Then
$$\lambda_2X_2 = AX_2 = \alpha AX_1 = \alpha\lambda_1X_1 = \lambda_1X_2,$$
forcing $(\lambda_1 - \lambda_2)X_2 = 0$. This is impossible as both $\lambda_1 - \lambda_2$ and $X_2$ are nonzero. The problem is solved.
Problem 2.22. Solve in $M_2(\mathbb{C})$ the following equations:
(a) $A^2 = O_2$;
(b) $A^2 = I_2$;
(c) $A^2 = A$.
Solution. (a) Let $A$ be a solution of the problem. Then $\det A = 0$ and the Cayley–Hamilton relation reduces to $\mathrm{Tr}(A)A = 0$. Taking the trace yields $\mathrm{Tr}(A)^2 = 0$, thus $\mathrm{Tr}(A) = 0$. Conversely, if $\det A = 0$ and $\mathrm{Tr}(A) = 0$, then the Cayley–Hamilton theorem shows that $A^2 = O_2$. Thus the solutions of the problem are the matrices
$$A = \begin{bmatrix} a & b \\ c & -a \end{bmatrix}, \quad \text{with } a, b, c \in \mathbb{C} \text{ and } a^2 + bc = 0.$$
(b) We must have $\det A = \pm 1$ and, by the Cayley–Hamilton theorem, $I_2 - \mathrm{Tr}(A)A + (\det A)I_2 = O_2$. If $\det A = 1$, then $\mathrm{Tr}(A)A = 2I_2$ and taking the trace yields $\mathrm{Tr}(A)^2 = 4$, thus $\mathrm{Tr}(A) = \pm 2$. This yields two solutions, $A = \pm I_2$. Suppose that $\det A = -1$. Then $\mathrm{Tr}(A)A = O_2$ and taking the trace gives $\mathrm{Tr}(A) = 0$. Conversely, any matrix $A$ with $\mathrm{Tr}(A) = 0$ and $\det A = -1$ is a solution of the problem (again by Cayley–Hamilton). Thus the solutions of the equation are
$$\pm I_2 \quad \text{and} \quad A = \begin{bmatrix} a & b \\ c & -a \end{bmatrix}, \quad a, b, c \in \mathbb{C}, \quad a^2 + bc = 1.$$
(c) If $\det A \ne 0$, then multiplying by $A^{-1}$ yields $A = I_2$. So suppose that $\det A = 0$. The Cayley–Hamilton theorem yields $A - \mathrm{Tr}(A)A = O_2$. If $\mathrm{Tr}(A) \ne 1$, this forces $A = O_2$, which is a solution of the problem. Thus if $A \ne O_2, I_2$, then $\det A = 0$ and $\mathrm{Tr}(A) = 1$. Conversely, all such matrices are solutions of the problem (again by Cayley–Hamilton). Thus the solutions of the problem are
$$O_2, \quad I_2 \quad \text{and} \quad A = \begin{bmatrix} a & b \\ c & 1-a \end{bmatrix}, \quad a, b, c \in \mathbb{C}, \quad a^2 + bc = a.$$
Problem 2.23. Let $A \in M_2(\mathbb{C})$ be a matrix. Prove that the following statements are equivalent:
(a) $\mathrm{Tr}(A) = \det A = 0$;
(b) $A^2 = O_2$;
(c) $\mathrm{Tr}(A) = \mathrm{Tr}(A^2) = 0$;
(d) there exists $n \ge 2$ such that $A^n = O_2$.
Solution. Taking the trace in the Cayley–Hamilton relation, we see that $\mathrm{Tr}(A^2) = \mathrm{Tr}(A)^2 - 2\det A$. From this it is clear that (a) and (c) are equivalent.
The implication (a) implies (b) is just an application of the Cayley–Hamilton theorem. The implication (b) implies (d) is obvious. Thus we need only show that (d) implies (a). If $A^n = O_2$ for some $n \ge 2$, then clearly $\det A = 0$. Thus the Cayley–Hamilton theorem reads $A^2 = \mathrm{Tr}(A)A$. Iterating this, an immediate induction gives $A^n = \mathrm{Tr}(A)^{n-1}A$, hence $O_2 = \mathrm{Tr}(A)^{n-1}A$. Taking the trace of this identity gives $0 = \mathrm{Tr}(A)^n$ and hence $\mathrm{Tr}(A) = 0$.
Problem 2.24. Find all matrices $X \in M_2(\mathbb{R})$ such that $X^3 = I_2$.
Solution. We must have $(\det X)^3 = 1$ and so $\det X = 1$ (since $\det X \in \mathbb{R}$). Letting $t = \mathrm{Tr}(X)$, the Cayley–Hamilton theorem and the given equation yield
$$I_2 = X^3 = X(tX - I_2) = t(tX - I_2) - X = (t^2 - 1)X - tI_2.$$
If $t^2 \ne 1$, then the previous relation shows that $X$ is scalar, and since $X^3 = I_2$, we must have $X = I_2$. If $t^2 = 1$, then the previous relation gives $t = -1$. Conversely, any matrix $X \in M_2(\mathbb{R})$ with $\mathrm{Tr}(X) = -1$ and $\det X = 1$ satisfies $X^2 + X + I_2 = O_2$ and so also $X^3 = I_2$. We conclude that the solutions of the problem are
$$I_2 \quad \text{and} \quad \begin{bmatrix} a & b \\ c & -1-a \end{bmatrix}, \quad a, b, c \in \mathbb{R}, \quad a^2 + a + bc = -1.$$
2.2.1 Problems for Practice
1. Let $A, B \in M_2(\mathbb{R})$ be commuting matrices. Prove that
$$\det(A^2 + B^2) \ge 0.$$
Hint: check that $A^2 + B^2 = (A + iB)(A - iB)$.
2. Let $A, B \in M_2(\mathbb{R})$ be such that $AB = BA$ and $\det(A^2 + B^2) = 0$. Prove that $\det A = \det B$. Hint: use the hint of the previous problem and consider the polynomial $\det(A + XB)$.
3. Let $A, B, C \in M_2(\mathbb{R})$ be pairwise commuting matrices and let
$$f(X) = \det(A^2 + B^2 + C^2 + X(AB + BC + CA)).$$
(a) Prove that $f(2) \ge 0$. Hint: check that
$$A^2 + B^2 + C^2 + 2(AB + BC + CA) = (A + B + C)^2.$$
(b) Prove that $f(-1) \ge 0$. Hint: denote $X = A - B$ and $Y = B - C$ and check that
$$A^2 + B^2 + C^2 - (AB + BC + CA) = \left(X + \frac{Y}{2}\right)^2 + \left(\frac{\sqrt{3}}{2}Y\right)^2.$$
Next use the first problem.
(c) Deduce that
$$\det(A^2 + B^2 + C^2) + 2\det(AB + BC + CA) \ge 0.$$
4. Let $A, B \in M_2(\mathbb{C})$ be matrices with $\mathrm{Tr}(AB) = 0$. Prove that $(AB)^2 = (BA)^2$. Hint: use the Cayley–Hamilton theorem.
5. Let $A$ be a 2 × 2 matrix with rational entries with the property that
$$\det(A^2 - 2I_2) = 0.$$
Prove that $A^2 = 2I_2$ and $\det A = -2$. Hint: use the fact that $A^2 - 2I_2 = (A - \sqrt{2}I_2)(A + \sqrt{2}I_2)$ and consider the characteristic polynomial of $A$.
6. Let $x$ be a positive real number and let $A \in M_2(\mathbb{R})$ be a matrix such that $\det(A^2 + xI_2) = 0$. Prove that
$$\det(A^2 + A + xI_2) = x.$$
7. Let $A, B \in M_2(\mathbb{R})$ be such that $\det(AB - BA) \le 0$. Consider the polynomial
$$f(X) = \det(I_2 + (1 - X)AB + X \cdot BA).$$
(a) Prove that $f(0) = f(1)$.
(b) Deduce that
$$\det(I_2 + AB) \le \det\left(I_2 + \frac{1}{2}(AB + BA)\right).$$
8. Let $n \ge 3$ be an integer. Let $X \in M_2(\mathbb{R})$ be such that
$$X^n + X^{n-2} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.$$
(a) Prove that $\det X = 0$. Hint: show that $\det(X^2 + I_2)$ cannot vanish (use Problem 2.16).
(b) Let $t$ be the trace of $X$. Prove that
$$t^n + t^{n-2} = 2.$$
(c) Find all possible matrices $X$ satisfying the original equation.
9. Let $n \ge 2$ be a positive integer and let $A, B \in M_2(\mathbb{C})$ be two matrices such that $AB \ne BA$ and $(AB)^n = (BA)^n$. Prove that $(AB)^n = \alpha I_2$ for some complex number $\alpha$.
10. Let $A, B \in M_2(\mathbb{R})$ and let $n \ge 1$ be an integer such that $C^n = I_2$, where $C = AB - BA$. Prove that $n$ is even and $C^4 = I_2$. Hint: use Problem 2.15.
2.3 The Powers of a Square Matrix of Order 2
In this section we will use the Cayley–Hamilton theorem to compute the powers of a given matrix A ∈ M₂(ℂ). Let λ₁ and λ₂ be the eigenvalues of A. The discussion and the final result will be very different according to whether λ₁ and λ₂ are different or not.

Let us start with the case λ₁ = λ₂ and consider the matrix B = A − λ₁I₂. Then the Cayley–Hamilton theorem in the form of relation (2.2) yields B² = O₂, thus Bᵏ = O₂ for k ≥ 2. Using the binomial formula we obtain

Aⁿ = (B + λ₁I₂)ⁿ = \sum_{k=0}^{n} \binom{n}{k} λ₁ⁿ⁻ᵏBᵏ = λ₁ⁿI₂ + nλ₁ⁿ⁻¹B.
Let us assume now that λ₁ ≠ λ₂ and consider the matrices

B = A − λ₁I₂ and C = A − λ₂I₂.

Relation (2.2) becomes BC = O₂, or equivalently B(A − λ₂I₂) = O₂. Thus BA = λ₂B, which yields BA² = λ₂BA = λ₂²B and by an immediate induction BAⁿ = λ₂ⁿB for all n. Similarly, the relation CB = O₂ yields CAⁿ = λ₁ⁿC for all n. Taking advantage of the relation C − B = (λ₁ − λ₂)I₂, we obtain

(λ₁ − λ₂)Aⁿ = (C − B)Aⁿ = CAⁿ − BAⁿ = λ₁ⁿC − λ₂ⁿB.

Thus

Aⁿ = \frac{1}{λ₁ − λ₂}(λ₁ⁿC − λ₂ⁿB).
All in all, we proved the following useful result, in which we change notations:
Theorem 2.25. Let A ∈ M₂(ℂ) and let λ₁, λ₂ be its eigenvalues.

(a) If λ₁ ≠ λ₂, then for all n ≥ 1 we have Aⁿ = λ₁ⁿB + λ₂ⁿC, where

B = \frac{1}{λ₁ − λ₂}(A − λ₂I₂) and C = \frac{1}{λ₂ − λ₁}(A − λ₁I₂).

(b) If λ₁ = λ₂, then for all n ≥ 1 we have Aⁿ = λ₁ⁿB + nλ₁ⁿ⁻¹C, where B = I₂ and C = A − λ₁I₂.
Problem 2.26. Compute Aⁿ, where A = \begin{pmatrix} 1 & 3 \\ −3 & −5 \end{pmatrix}.

Solution. As Tr(A) = −4 and det A = 4, the eigenvalues of A are solutions of the equation t² + 4t + 4 = 0, thus λ₁ = λ₂ = −2 are the eigenvalues of A. Using the previous theorem, we conclude that for any n ≥ 1 we have

Aⁿ = (−2)ⁿI₂ + n(−2)ⁿ⁻¹(A + 2I₂) = (−2)ⁿ⁻¹ \begin{pmatrix} 3n − 2 & 3n \\ −3n & −3n − 2 \end{pmatrix}.
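Readers who like to experiment can check such closed forms numerically. The following Python sketch (an illustration added here, not part of the original text) verifies the formula of Problem 2.26 for small n using NumPy's matrix power:

import numpy as np

A = np.array([[1, 3], [-3, -5]])

for n in range(1, 8):
    # closed form from Problem 2.26: A^n = (-2)^(n-1) * [[3n-2, 3n], [-3n, -3n-2]]
    closed_form = (-2) ** (n - 1) * np.array([[3 * n - 2, 3 * n],
                                              [-3 * n, -3 * n - 2]])
    assert np.array_equal(np.linalg.matrix_power(A, n), closed_form)
print("closed form matches A^n for n = 1, ..., 7")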
Though the exact statement of the previous theorem is a little cumbersome, the basic idea is very simple. If one learns this idea, then one can compute Aⁿ easily. Keep in mind that when computing powers of a 2 × 2 matrix, one starts by computing the eigenvalues of the matrix (this comes down to solving the quadratic equation t² − Tr(A)t + det A = 0). If the eigenvalues are equal, say both equal to λ, then B := A − λI₂ satisfies B² = O₂ and so one computes Aⁿ by writing A = B + λI₂ and using the binomial formula. On the other hand, if the eigenvalues are different, say λ₁ and λ₂, then there are two matrices B, C such that for all n we have

Aⁿ = λ₁ⁿB + λ₂ⁿC.
One can easily find these matrices without having to learn the formulae by heart:
if the previous relation holds for all n ≥ 0, then it certainly holds for n = 0 and n = 1. Thus

I₂ = B + C, A = λ₁B + λ₂C.

This immediately yields the matrices B and C in terms of I₂, A and λ₁, λ₂. Moreover, we see that they are of the form xI₂ + yA for some complex numbers x, y. Combining this observation with Theorem 2.25 yields the following useful
Corollary 2.27. For any matrix A ∈ M₂(ℂ) there are sequences (xₙ)ₙ≥₀, (yₙ)ₙ≥₀ of complex numbers such that

Aⁿ = xₙA + yₙI₂

for all n ≥ 0.
One has to be careful: the sequences (xₙ)ₙ≥₀ and (yₙ)ₙ≥₀ in the previous corollary are definitely not always characterized by the equality Aⁿ = xₙA + yₙI₂ (this is however the case if A is not scalar). On the other hand, Theorem 2.25 shows that we can take

xₙ = \frac{λ₁ⁿ − λ₂ⁿ}{λ₁ − λ₂} and yₙ = \frac{λ₁λ₂ⁿ − λ₂λ₁ⁿ}{λ₁ − λ₂}

when λ₁ ≠ λ₂, and, when λ₁ = λ₂,

xₙ = nλ₁ⁿ⁻¹ and yₙ = −(n − 1)λ₁ⁿ.
Problem 2.28. Let m, n be positive integers and let A, B ∈ M₂(ℂ) be two matrices such that AᵐBⁿ = BⁿAᵐ. If Aᵐ and Bⁿ are not scalar, prove that AB = BA.

Solution. From Corollary 2.27 we have

Aᵏ = xₖA + yₖI₂ and Bᵏ = uₖB + vₖI₂, k ≥ 0,

where (xₖ)ₖ≥₀, (yₖ)ₖ≥₀, (uₖ)ₖ≥₀, (vₖ)ₖ≥₀ are sequences of complex numbers. Since Aᵐ and Bⁿ are not scalar matrices, it follows that xₘ ≠ 0 and uₙ ≠ 0. The relation AᵐBⁿ = BⁿAᵐ is equivalent to

(xₘA + yₘI₂)(uₙB + vₙI₂) = (uₙB + vₙI₂)(xₘA + yₘI₂),

i.e.

xₘuₙ(AB − BA) = O₂.

Hence AB = BA.
Problem 2.29. Let t ∈ ℝ and let

Aₜ = \begin{pmatrix} \cos t & −\sin t \\ \sin t & \cos t \end{pmatrix}.

Compute Aₜⁿ for n ≥ 1.

Solution. We offer three ways to solve this problem. The first is to follow the usual procedure: compute the eigenvalues of Aₜ and then use the general Theorem 2.25. Here the eigenvalues are e^{it} and e^{−it} and it is not difficult to deduce that

Aₜⁿ = \begin{pmatrix} \cos nt & −\sin nt \\ \sin nt & \cos nt \end{pmatrix}.

Another argument is as follows: an explicit computation shows that A_{t₁+t₂} = A_{t₁}A_{t₂}, thus

Aₜⁿ = AₜAₜ ⋯ Aₜ = A_{t+t+⋯+t} = A_{nt}.

Finally, one can also argue geometrically: Aₜ is the matrix of a rotation of angle t, thus Aₜⁿ is the matrix of a rotation of angle nt.
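The rotation identity is also easy to confirm numerically; here is a small Python sketch (an added illustration, not part of the original text):

import numpy as np

def rotation(t):
    # the matrix A_t of Problem 2.29
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

t, n = 0.7, 5
assert np.allclose(np.linalg.matrix_power(rotation(t), n), rotation(n * t))
print("A_t^n equals A_{nt} up to floating-point error")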
2.3.1 Problems for Practice
1. Consider the matrix

A = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}.

(a) Let n be a positive integer. Prove the existence of a unique pair of integers (xₙ, yₙ) such that

Aⁿ = xₙA + yₙI₂.

(b) Compute lim_{n→∞} xₙ/yₙ.

2. Given a positive integer n, compute the nth power of the matrix

A = \begin{pmatrix} 1 & −1 \\ 1 & 1 \end{pmatrix}.

3. Let a, b be real numbers and let n be a positive integer. Compute the nth power of the matrix \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}.

4. Let x be a real number and let

A = \begin{pmatrix} \cos x + \sin x & 2\sin x \\ −\sin x & \cos x − \sin x \end{pmatrix}.

Compute Aⁿ for all positive integers n.
2.4 Application to Linear Recurrences
In this section we present two classical applications of the theory developed in
the previous section. Let a, b, c, d, x₀, y₀ be complex numbers and consider two sequences (xₙ)ₙ≥₀ and (yₙ)ₙ≥₀ recursively defined by

xₙ₊₁ = axₙ + byₙ
yₙ₊₁ = cxₙ + dyₙ, n ≥ 0.  (2.3)
We would like to find the general terms of the two sequences in terms of the initial data a, b, c, d, x₀, y₀ and n.

The key observation is that the system can be written in matrix form as follows:

\begin{pmatrix} xₙ₊₁ \\ yₙ₊₁ \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} xₙ \\ yₙ \end{pmatrix}, i.e. \begin{pmatrix} xₙ₊₁ \\ yₙ₊₁ \end{pmatrix} = A \begin{pmatrix} xₙ \\ yₙ \end{pmatrix}, n ≥ 0,

where A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} is the matrix of coefficients. An immediate induction yields

\begin{pmatrix} xₙ \\ yₙ \end{pmatrix} = Aⁿ \begin{pmatrix} x₀ \\ y₀ \end{pmatrix}, n ≥ 0,  (2.4)

and so the problem is reduced to the computation of Aⁿ, which is solved by Theorem 2.25.
Let us consider a slightly different problem, also very classical. It concerns second order linear recurrences with constant coefficients. More precisely, we fix complex numbers a, b, x₀, x₁ and look for the general term of the sequence (xₙ)ₙ≥₀ defined recursively by

xₙ₊₁ = axₙ + bxₙ₋₁, n ≥ 1.  (2.5)

We can easily reduce this problem to the previous one by setting yₙ = xₙ₋₁ for n ≥ 1 and y₀ = \frac{1}{b}(x₁ − ax₀) if b ≠ 0 (which we will assume from now on, since otherwise the problem is very simple from the very beginning). Indeed, relation (2.5) is equivalent to the following system:

xₙ₊₁ = axₙ + byₙ
yₙ₊₁ = xₙ, n ≥ 0.

As we have already seen, finding xₙ and yₙ (or equivalently xₙ) comes down to computing the powers of the matrix of coefficients

A = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix}.
The characteristic equation of this matrix is λ² − aλ − b = 0. If λ₁ and λ₂ are the roots of this equation, then Theorem 2.25 yields the following:

If λ₁ ≠ λ₂, then we can find constants u, v such that

xₙ = uλ₁ⁿ + vλ₂ⁿ

for all n. These two constants are easily determined by imposing

x₀ = u + v and x₁ = uλ₁ + vλ₂

and solving this linear system in the unknowns u, v.

If λ₁ = λ₂, then we can find constants u, v such that for all n ≥ 0

xₙ = (un + v)λ₁ⁿ,

and u and v are found from the initial conditions by solving x₀ = v and x₁ = (u + v)λ₁.
Problem 2.30. Find the general terms of (xₙ)ₙ≥₀, (yₙ)ₙ≥₀ if

xₙ₊₁ = xₙ + 2yₙ
yₙ₊₁ = −2xₙ + 5yₙ, n ≥ 0,

and x₀ = 1, y₀ = 2.

Solution. The matrix of coefficients is A = \begin{pmatrix} 1 & 2 \\ −2 & 5 \end{pmatrix}, with characteristic equation λ² − 6λ + 9 = 0 and solutions λ₁ = λ₂ = 3. Theorem 2.25 yields (after a simple computation)

Aⁿ = 3ⁿI₂ + n3ⁿ⁻¹ \begin{pmatrix} −2 & 2 \\ −2 & 2 \end{pmatrix} = \begin{pmatrix} (3 − 2n)3ⁿ⁻¹ & 2n·3ⁿ⁻¹ \\ −2n·3ⁿ⁻¹ & (3 + 2n)3ⁿ⁻¹ \end{pmatrix}.

Combined with x₀ = 1 and y₀ = 2, we obtain

xₙ = (2n + 3)3ⁿ⁻¹ and yₙ = 2(n + 3)3ⁿ⁻¹, n ≥ 0.
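Such formulas are easy to sanity-check by iterating the recurrence directly; the following Python sketch (an added illustration, not part of the original text) does so with exact integer arithmetic:

# x, y hold (x_n, y_n); Python evaluates the right-hand tuple before assigning,
# so both updates use the old values, as the recurrence requires.
x, y = 1, 2
for n in range(12):
    if n >= 1:   # the closed forms involve 3**(n-1), an integer only for n >= 1
        assert x == (2 * n + 3) * 3 ** (n - 1)
        assert y == 2 * (n + 3) * 3 ** (n - 1)
    x, y = x + 2 * y, -2 * x + 5 * y
print("closed forms confirmed for n = 1, ..., 11")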
Problem 2.31. Find the limits of the sequences (xₙ)ₙ≥₀ and (yₙ)ₙ≥₀, where

xₙ₊₁ = (1 − α)xₙ + αyₙ
yₙ₊₁ = βxₙ + (1 − β)yₙ,

and α, β are complex numbers with |1 − α − β| < 1.

Solution. The matrix of coefficients is

A = \begin{pmatrix} 1 − α & α \\ β & 1 − β \end{pmatrix}

and one easily checks that its eigenvalues are λ₁ = 1 and λ₂ = 1 − α − β. Note that |λ₂| < 1, in particular λ₂ ≠ 1. Letting

B = \frac{1}{λ₁ − λ₂}(A − λ₂I₂) = \frac{1}{α + β} \begin{pmatrix} β & α \\ β & α \end{pmatrix},

Theorem 2.25 gives the existence of an explicit matrix C such that

Aⁿ = λ₁ⁿB + λ₂ⁿC = B + λ₂ⁿC.

Since |λ₂| < 1, we have lim_{n→∞} λ₂ⁿ = 0 and the previous relation shows that lim_{n→∞} Aⁿ = B. Since

\begin{pmatrix} xₙ \\ yₙ \end{pmatrix} = Aⁿ \begin{pmatrix} x₀ \\ y₀ \end{pmatrix},

we conclude that xₙ and yₙ are convergent sequences, and if l₁, l₂ are their limits, then

\begin{pmatrix} l₁ \\ l₂ \end{pmatrix} = B \begin{pmatrix} x₀ \\ y₀ \end{pmatrix}.

Taking into account the explicit form of B, we obtain

lim_{n→∞} xₙ = lim_{n→∞} yₙ = \frac{βx₀ + αy₀}{α + β}.
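A quick numeric experiment makes the convergence visible. In the Python sketch below (an added illustration, not part of the original text), the values of α, β, x₀, y₀ are arbitrary sample choices satisfying |1 − α − β| < 1:

alpha, beta = 0.3, 0.5
x0, y0 = 2.0, -1.0
x, y = x0, y0
for _ in range(200):
    x, y = (1 - alpha) * x + alpha * y, beta * x + (1 - beta) * y
limit = (beta * x0 + alpha * y0) / (alpha + beta)
assert abs(x - limit) < 1e-12 and abs(y - limit) < 1e-12
print("both sequences converge to", limit)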
2.4.1 Problems for Practice
1. Find the general term of the sequence (xₙ)ₙ≥₁ defined by x₁ = 1, x₂ = 0 and for all n ≥ 1

xₙ₊₂ = 4xₙ₊₁ − xₙ.

2. Consider the sequence (xₙ)ₙ≥₀ defined by x₀ = 1, x₁ = 2 and for all n ≥ 0

xₙ₊₂ = xₙ₊₁ − xₙ.

Is this sequence periodic? If so, find its minimal period.

3. Find the general terms of the sequences (xₙ)ₙ≥₀ and (yₙ)ₙ≥₀ satisfying x₀ = y₀ = 1, x₁ = 1, y₁ = 2 and, for n ≥ 1,

xₙ₊₁ = \frac{2xₙ + 3yₙ}{5}, yₙ₊₁ = \frac{2yₙ + 3xₙ}{5}.

4. A sequence (xₙ)ₙ≥₀ satisfies x₀ = 2, x₁ = 3 and for all n ≥ 1

xₙ₊₁ = \sqrt{xₙ₋₁xₙ}.

Find the general term of this sequence (hint: take the logarithm of the recurrence relation).

5. Consider a map f : (0, ∞) → (0, ∞) such that

f(f(x)) = 6x − f(x)

for all x > 0. Let x > 0 and define a sequence (zₙ)ₙ≥₀ by z₀ = x and zₙ₊₁ = f(zₙ) for n ≥ 0.

(a) Prove that

zₙ₊₂ + zₙ₊₁ − 6zₙ = 0

for n ≥ 0.

(b) Deduce the existence of real numbers a, b such that

zₙ = a·2ⁿ + b·(−3)ⁿ

for all n ≥ 0.

(c) Using the fact that zₙ > 0 for all n, prove that b = 0 and conclude that f(x) = 2x for all x > 0.
2.5 Solving the Equation Xⁿ = A
Consider a matrix A ∈ M₂(ℂ) and an integer n > 1. In this section we will explain how to solve the equation Xⁿ = A, with X ∈ M₂(ℂ).

A first key observation is that for any solution X of the equation we have

AX = XA.

Indeed, this is simply a consequence of the fact that Xⁿ·X = X·Xⁿ. We will need the following very useful:
Proposition 2.32. Let A ∈ M₂(ℂ) be a non-scalar matrix. If X ∈ M₂(ℂ) commutes with A, then X = αI₂ + βA for some complex numbers α, β.

Proof. Write A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} and X = \begin{pmatrix} x & y \\ z & t \end{pmatrix}. The equality AX = XA is equivalent, after an elementary computation, to

bz = cy, ay + bt = bx + dy, cx + dz = az + ct,

or

bz = cy, (a − d)y = b(x − t), c(x − t) = z(a − d).

If a ≠ d, set β = \frac{x − t}{a − d}. Then z = cβ, y = bβ and x − βa = t − βd. We deduce that X = αI₂ + βA, where α = x − βa = t − βd.

Suppose that a = d. If x ≠ t, the previous relations yield b = c = 0 and so A is scalar, a contradiction. Thus x = t and bz = cy. Moreover, one of b, c is nonzero (as A is not scalar), say b (the argument when c ≠ 0 is identical). Setting β = y/b and α = x − βa yields X = αI₂ + βA.
Let us come back to our original problem, solving the equation Xⁿ = A. Let λ₁ and λ₂ be the eigenvalues of A. We will discuss several cases, each of them having a very different behavior.

Let us start with the case λ₁ ≠ λ₂. By Problem 2.21, we can then write A = P \begin{pmatrix} λ₁ & 0 \\ 0 & λ₂ \end{pmatrix} P⁻¹ for some P ∈ GL₂(ℂ). Since AX = XA and A is not scalar, by Proposition 2.32 there are complex numbers a, b such that X = aI₂ + bA. Thus

X = P \begin{pmatrix} a + bλ₁ & 0 \\ 0 & a + bλ₂ \end{pmatrix} P⁻¹.

The equation Xⁿ = A is then equivalent to

\begin{pmatrix} (a + bλ₁)ⁿ & 0 \\ 0 & (a + bλ₂)ⁿ \end{pmatrix} = \begin{pmatrix} λ₁ & 0 \\ 0 & λ₂ \end{pmatrix}.

It follows that a + bλ₁ = z₁ and a + bλ₂ = z₂, where z₁ⁿ = λ₁ and z₂ⁿ = λ₂, and X = P \begin{pmatrix} z₁ & 0 \\ 0 & z₂ \end{pmatrix} P⁻¹. Hence
Proposition 2.33. Let A ∈ M₂(ℂ) be a matrix with distinct eigenvalues λ₁, λ₂. Let P ∈ GL₂(ℂ) be a matrix such that A = P \begin{pmatrix} λ₁ & 0 \\ 0 & λ₂ \end{pmatrix} P⁻¹. Then the solutions of the equation Xⁿ = A are given by X = P \begin{pmatrix} z₁ & 0 \\ 0 & z₂ \end{pmatrix} P⁻¹, where z₁ and z₂ are solutions of the equations tⁿ = λ₁ and tⁿ = λ₂ respectively.
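For readers who want to see this recipe in action, here is a small Python/NumPy sketch (an added illustration, not part of the original text; the matrix is an arbitrary sample with distinct eigenvalues 4 and −1):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 2.0]])   # sample matrix, eigenvalues 4 and -1
n = 3
eigvals, P = np.linalg.eig(A)            # A = P diag(eigvals) P^{-1}
eigvals = eigvals.astype(complex)        # complex powers give all n-th roots
Pinv = np.linalg.inv(P)

count = 0
for k1 in range(n):
    for k2 in range(n):
        z1 = eigvals[0] ** (1 / n) * np.exp(2j * np.pi * k1 / n)
        z2 = eigvals[1] ** (1 / n) * np.exp(2j * np.pi * k2 / n)
        X = P @ np.diag([z1, z2]) @ Pinv
        assert np.allclose(np.linalg.matrix_power(X, n), A)
        count += 1
print(count, "cube roots found")         # n^2 = 9 solutions, as the proposition predicts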
Let us deal now with the case in which A is not scalar, but has equal eigenvalues, say both eigenvalues equal λ. Then the matrix B = A − λI₂ satisfies B² = O₂ (by the Cayley–Hamilton theorem) and we have A = B + λI₂. Now, since AX = XA and A is not scalar, we can write X = cI₂ + dA for some complex numbers c, d (Proposition 2.32). Since A = B + λI₂, it follows that we can also write X = aI₂ + bB for some complex numbers a, b. Since B² = O₂, the binomial formula and the given equation yield

A = Xⁿ = (aI₂ + bB)ⁿ = aⁿI₂ + naⁿ⁻¹bB.

Since A = B + λI₂, we obtain

B + λI₂ = naⁿ⁻¹bB + aⁿI₂.

Since B is not scalar (as A itself is not scalar), the previous relation is equivalent to

1 = naⁿ⁻¹b and λ = aⁿ.

This already shows that λ ≠ 0 (as the first equation shows that a ≠ 0), so if λ = 0 (which corresponds to A² = O₂) then the equation has no solution. On the other hand, if λ ≠ 0, then the equation aⁿ = λ has n complex solutions, and for each of them we obtain a unique value of b, namely b = \frac{1}{naⁿ⁻¹}. We have just proved the following
Proposition 2.34. Suppose that A ∈ M₂(ℂ) is not scalar, but both eigenvalues of A are equal to some complex number λ. Then

(a) If λ = 0, the equation Xⁿ = A has no solutions for n > 1, and the only solution is X = A for n = 1.
(b) If λ ≠ 0, then the solutions of the equation Xⁿ = A are given by

X = aI₂ + \frac{1}{naⁿ⁻¹}(A − λI₂),

where a runs over the n solutions of the equation zⁿ = λ.
Finally, let us deal with the case when A is scalar, say A = cI₂ for some complex number c. If c = 0, then the equation Xⁿ = O₂ has already been solved, so let us assume that c ≠ 0. Consider a solution X of the equation Xⁿ = cI₂ and let λ₁, λ₂ be the eigenvalues of X. Then λ₁ⁿ = λ₂ⁿ = c. We have two possibilities:

• Either λ₁ ≠ λ₂, in which case X has distinct eigenvalues and so (Problem 2.21) we can write X = P \begin{pmatrix} λ₁ & 0 \\ 0 & λ₂ \end{pmatrix} P⁻¹ for some invertible matrix P. Then Xⁿ = P \begin{pmatrix} λ₁ⁿ & 0 \\ 0 & λ₂ⁿ \end{pmatrix} P⁻¹ and this equals cI₂ since λ₁ⁿ = λ₂ⁿ = c. The conclusion is that for each pair (λ₁, λ₂) of distinct solutions of the equation tⁿ = c we obtain a whole family of solutions, namely the matrices X = P \begin{pmatrix} λ₁ & 0 \\ 0 & λ₂ \end{pmatrix} P⁻¹ for some invertible matrix P.

• Suppose that λ₁ = λ₂ and let Y = X − λ₁I₂; then Y² = O₂ and the equation Xⁿ = cI₂ is equivalent to (Y + λ₁I₂)ⁿ = cI₂. Using again the binomial formula and the equality Y² = O₂, we can rewrite this equation as

λ₁ⁿI₂ + nλ₁ⁿ⁻¹Y = cI₂.

Since λ₁ⁿ = c and λ₁ ≠ 0 (as c ≠ 0), we deduce that necessarily Y = O₂ and so X = λ₁I₂, with λ₁ one of the n complex solutions of the equation tⁿ = c. Thus we obtain n more solutions this way.
We can unify the previous two possibilities and obtain
Proposition 2.35. If c ≠ 0 is a complex number, the solutions in M₂(ℂ) of the equation Xⁿ = cI₂ are given by

X = P \begin{pmatrix} x & 0 \\ 0 & y \end{pmatrix} P⁻¹,  (2.6)

where x, y are solutions (not necessarily distinct) of the equation zⁿ = c, and P ∈ GL₂(ℂ) is arbitrary.
Problem 2.36. Let t ∈ (0, π) be a real number and let n > 1 be an integer. Find all matrices X ∈ M₂(ℝ) such that

Xⁿ = \begin{pmatrix} \cos t & −\sin t \\ \sin t & \cos t \end{pmatrix}.

Solution. With the notations of Problem 2.29, we need to solve the equation Xⁿ = Aₜ. Let X be a solution; then XAₜ = AₜX = Xⁿ⁺¹. Writing X = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, the relation XAₜ = AₜX yields b sin t = −c sin t and a sin t = d sin t, thus a = d and c = −b. Hence X = \begin{pmatrix} a & b \\ −b & a \end{pmatrix}. Next, since Xⁿ = Aₜ, we have

(det X)ⁿ = det Xⁿ = det Aₜ = 1,

and since det X = a² + b² ≥ 0, we deduce that a² + b² = 1. Thus we can write a = cos x and b = −sin x for some real number x. Then X = Aₓ and the equation Xⁿ = Aₜ is equivalent (thanks to Problem 2.29) to A_{nx} = Aₜ. This is further equivalent to nx = t + 2kπ for some integer k. It is enough to restrict to k ∈ {0, 1, …, n − 1}. We conclude that the solutions of the problem are the matrices

Xₖ = \begin{pmatrix} \cos tₖ & −\sin tₖ \\ \sin tₖ & \cos tₖ \end{pmatrix}, where tₖ = \frac{t + 2kπ}{n}, k = 0, 1, …, n − 1.
Problem 2.37. Let A = \begin{pmatrix} a & −b \\ b & a \end{pmatrix} ∈ M₂(ℝ). Prove that the following statements are equivalent:

(1) Aⁿ = I₂ for some positive integer n;
(2) a = cos rπ, b = sin rπ for some rational number r.

Solution. If a = cos(kπ/n) and b = sin(kπ/n) for some n ≥ 1 and k ∈ ℤ, then Problem 2.29 yields A²ⁿ = I₂, thus (2) implies (1).

Assume now that (1) holds. Then (det A)ⁿ = det Aⁿ = 1 and since det A = a² + b² ≥ 0, we must have det A = 1, that is a² + b² = 1. Thus we can find t ∈ ℝ such that a = cos t and b = sin t. Then A = Aₜ and by Problem 2.29 we have I₂ = Aⁿ = A_{nt}. This forces cos(nt) = 1 and so t is a rational multiple of π. The problem is solved.
2.5.1 Problems for Practice
1. Let n > 1 be an integer. Prove that the equation

Xⁿ = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

has no solutions in M₂(ℂ).

2. Solve in M₂(ℂ) the binomial equation

X⁴ = \begin{pmatrix} −1 & −2 \\ 1 & 2 \end{pmatrix}.

3. Let n > 1 be an integer. Prove that the equation

Xⁿ = \begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}

has no solutions in M₂(ℚ).

4. Find all matrices X ∈ M₂(ℝ) such that

X³ = \begin{pmatrix} −4 & −3 \\ 3 & 2 \end{pmatrix}.

5. Find all matrices A, B ∈ M₂(ℂ) such that

AB = O₂ and A⁵ + B⁵ = O₂.

6. Solve in M₂(ℝ) the equation

Xⁿ = \begin{pmatrix} −7 & −5 \\ 15 & 12 \end{pmatrix}.

7. Solve in M₂(ℝ) the equation

Xⁿ = \begin{pmatrix} 6 & 2 \\ 21 & 7 \end{pmatrix}.
2.6 Application to Pell’s Equations
Let D > 1 be an integer which is not a perfect square. The diophantine equation

x² − Dy² = 1,  (2.7)

called Pell's equation, has an obvious solution (1, 0) in nonnegative integers. A well-known but nontrivial result (which we take for granted) is that this equation also has nontrivial solutions (i.e., different from (1, 0)).

In this section we explain how the theory developed so far allows finding all solutions of the Pell equation once we know the smallest nontrivial solution. Let S_D be the set of all solutions in positive integers to Eq. (2.7) and let (x₁, y₁) be the fundamental solution, i.e., the solution in S_D for which the first component x₁ is minimal among the first components of the elements of S_D.
If x, y are positive integers, consider the matrix

A_{(x,y)} = \begin{pmatrix} x & Dy \\ y & x \end{pmatrix},

so that (x, y) ∈ S_D if and only if det A_{(x,y)} = 1. Elementary computations yield the fundamental relation

A_{(x,y)}A_{(u,v)} = A_{(xu+Dyv, xv+yu)}.  (2.8)
Passing to determinants in (2.8) we obtain the multiplication principle:

if (x, y), (u, v) ∈ S_D, then (xu + Dyv, xv + yu) ∈ S_D.

It follows from the multiplication principle that if we write

Aⁿ_{(x₁,y₁)} = \begin{pmatrix} xₙ & Dyₙ \\ yₙ & xₙ \end{pmatrix}, n ≥ 1,

then (xₙ, yₙ) ∈ S_D for all n. The sequences xₙ and yₙ are described by the recursive system

xₙ₊₁ = x₁xₙ + Dy₁yₙ
yₙ₊₁ = y₁xₙ + x₁yₙ, n ≥ 1,  (2.9)

a consequence of the equality Aⁿ⁺¹_{(x₁,y₁)} = A_{(x₁,y₁)}Aⁿ_{(x₁,y₁)}. Moreover, Theorem 2.25 gives explicit formulae for xₙ and yₙ in terms of x₁, y₁, n: the characteristic equation of the matrix A_{(x₁,y₁)} is

λ² − 2x₁λ + 1 = 0

with λ_{1,2} = x₁ ± \sqrt{x₁² − 1} = x₁ ± y₁\sqrt{D}, and Theorem 2.25 yields, after an elementary computation of the matrices B, C involved in that theorem,

xₙ = \frac{1}{2}[(x₁ + y₁\sqrt{D})ⁿ + (x₁ − y₁\sqrt{D})ⁿ]
yₙ = \frac{1}{2\sqrt{D}}[(x₁ + y₁\sqrt{D})ⁿ − (x₁ − y₁\sqrt{D})ⁿ], n ≥ 1.  (2.10)

Note that relation (2.10) also makes sense for n = 0, in which case it gives the trivial solution (x₀, y₀) = (1, 0).
Theorem 2.38. All solutions in positive integers of the Pell equation x² − Dy² = 1 are described by formula (2.10), where (x₁, y₁) is the fundamental solution of the equation.

Proof. Suppose that there are elements in S_D which are not covered by formula (2.10), and among them choose one, (x, y), for which x is minimal. Using the multiplication principle, we observe that the matrix A_{(x,y)}A⁻¹_{(x₁,y₁)} generates a solution in integers (x′, y′), where

x′ = x₁x − Dy₁y
y′ = −y₁x + x₁y.

We claim that x′, y′ are positive integers. This is clear for x′, as x > \sqrt{D}·y and x₁ > \sqrt{D}·y₁, thus x₁x > Dy₁y. Also, x₁y > y₁x is equivalent to x₁²(x² − 1) > x²(x₁² − 1), or x > x₁, which holds because (x₁, y₁) is a fundamental solution and (x, y) is not described by relation (2.10) (while (x₁, y₁) is described by this relation, with n = 1). Moreover, since A_{(x′,y′)}A_{(x₁,y₁)} = A_{(x,y)}, we have x = x′x₁ + Dy′y₁ > x′ and y = x′y₁ + y′x₁ > y′. By minimality, (x′, y′) must be of the form (2.10), i.e., A_{(x,y)}A⁻¹_{(x₁,y₁)} = Aᵏ_{(x₁,y₁)} for some positive integer k. Therefore A_{(x,y)} = Aᵏ⁺¹_{(x₁,y₁)}, i.e., (x, y) is of the form (2.10), a contradiction.
Problem 2.39. Find all solutions in positive integers to Pell's equation

x² − 2y² = 1.

Solution. The fundamental solution is (x₁, y₁) = (3, 2) and the associated matrix is

A_{(3,2)} = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}.

The solutions (xₙ, yₙ)ₙ≥₁ are given by Aⁿ_{(3,2)}, i.e.

xₙ = \frac{1}{2}[(3 + 2\sqrt{2})ⁿ + (3 − 2\sqrt{2})ⁿ]
yₙ = \frac{1}{2\sqrt{2}}[(3 + 2\sqrt{2})ⁿ − (3 − 2\sqrt{2})ⁿ].
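The multiplication principle also gives a convenient way to list solutions exactly; the following Python sketch (an added illustration, not part of the original text) produces the first few solutions of x² − 2y² = 1:

D = 2
x, y = 3, 2                               # fundamental solution (x_1, y_1)
for n in range(1, 8):
    assert x * x - D * y * y == 1         # (x, y) really solves the Pell equation
    print(f"(x_{n}, y_{n}) = ({x}, {y})")
    x, y = 3 * x + 4 * y, 2 * x + 3 * y   # multiply the matrix A_(x,y) by A_(3,2)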
We can extend slightly the study of the Pell equation by considering the more general equation

ax² − by² = 1,  (2.11)

where we assume that ab is not a perfect square (it is not difficult to see that if ab is a square, then the equation has only trivial solutions). Contrary to the Pell equation, Eq. (2.11) does not always have solutions (the reader can check that the equation 3x² − y² = 1 has no solutions in integers by working modulo 3).

Define the Pell resolvent of (2.11) by

u² − ab·v² = 1  (2.12)

and let S_{a,b} be the set of solutions in positive integers of Eq. (2.11). Thus S_{1,ab} is the set denoted S_{ab} when considering the Pell equation (it is the set of solutions of the Pell resolvent). If x, y, u, v are positive integers, consider the matrices

B_{(x,y)} = \begin{pmatrix} x & by \\ y & ax \end{pmatrix}, A_{(u,v)} = \begin{pmatrix} u & abv \\ v & u \end{pmatrix},

the second matrix being the matrix associated with the Pell resolvent equation.

An elementary computation shows that

B_{(x,y)}A_{(u,v)} = B_{(xu+byv, axv+yu)}.

Passing to determinants in the above relation and noting that (x, y) ∈ S_{a,b} if and only if det B_{(x,y)} = 1, we obtain the multiplication principle:

if (x, y) ∈ S_{a,b} and (u, v) ∈ S_{ab}, then (xu + byv, axv + yu) ∈ S_{a,b},

i.e., the product B_{(x,y)}A_{(u,v)} generates the solution (xu + byv, axv + yu) of (2.11).
Using the previous theorem and the multiplication principle, one easily obtains the
following result, whose formal proof is left to the reader.
Theorem 2.40. Assume that Eq. (2.11) is solvable in positive integers, and let (x₀, y₀) be its minimal solution (i.e., x₀ is minimal). Let (u₁, v₁) be the fundamental solution of the Pell resolvent equation (2.12). Then all solutions (xₙ, yₙ) in positive integers of Eq. (2.11) are generated by

B_{(xₙ,yₙ)} = B_{(x₀,y₀)}Aⁿ_{(u₁,v₁)}, n ≥ 0.  (2.13)

It follows easily from (2.13) that

xₙ = x₀uₙ + by₀vₙ
yₙ = y₀uₙ + ax₀vₙ, n ≥ 0,  (2.14)

where (uₙ, vₙ)ₙ≥₁ is the general solution of the Pell resolvent equation.
Problem 2.41. Solve in positive integers the equation

6x² − 5y² = 1.

Solution. This equation is solvable and its minimal solution is (x₀, y₀) = (1, 1). The Pell resolvent equation is u² − 30v² = 1, with fundamental solution (u₁, v₁) = (11, 2). Using formulae (2.14) and then (2.10), we deduce that the solutions in positive integers are (xₙ, yₙ)ₙ≥₀, where

xₙ = \frac{6 + \sqrt{30}}{12}(11 + 2\sqrt{30})ⁿ + \frac{6 − \sqrt{30}}{12}(11 − 2\sqrt{30})ⁿ
yₙ = \frac{5 + \sqrt{30}}{12}(11 + 2\sqrt{30})ⁿ + \frac{5 − \sqrt{30}}{12}(11 − 2\sqrt{30})ⁿ.
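Again the solutions can be generated exactly by the multiplication principle; here is a short Python sketch (an added illustration, not part of the original text):

a, b = 6, 5
x, y = 1, 1                  # minimal solution (x_0, y_0) of 6x^2 - 5y^2 = 1
u1, v1 = 11, 2               # fundamental solution of the resolvent u^2 - 30v^2 = 1
for n in range(6):
    assert a * x * x - b * y * y == 1
    print(f"(x_{n}, y_{n}) = ({x}, {y})")
    x, y = x * u1 + b * y * v1, y * u1 + a * x * v1   # multiply by A_(11,2)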
2.6.1 Problems for Practice
1. A triangular number is a number of the form 1 + 2 + ⋯ + n for some positive integer n. Find all triangular numbers which are perfect squares.
2. Find all positive integers n such that n + 1 and 3n + 1 are simultaneously perfect squares.
3. Find all integers a, b such that a² + b² = 1 + 4ab.
4. The difference of two consecutive cubes equals n² for some positive integer n. Prove that 2n − 1 is a perfect square.
5. Find all triangles whose side lengths are consecutive integers and whose area is an integer.
Chapter 3
Matrices and Linear Equations
Abstract This chapter introduces and studies the reduced row-echelon form of
a matrix, and applies it to the resolution of linear systems of equations and the
computation of the inverse of a matrix. The approach is algorithmic.
Keywords Linear systems • Homogeneous Systems • Row-echelon form •
Gaussian reduction
The resolution of linear systems of equations is definitely one of the key motivations
of linear algebra. In this chapter we explain an algorithmic procedure which allows
the resolution of linear systems of equations, by performing some simple operations
on matrices. We consider this problem as a motivation for the introduction of
basic operations on the rows (or columns) of matrices. A much deeper study of
these objects will be done in later chapters, using a more abstract (and much more
powerful) setup. We will fix a field F in the following discussion, which the reader
might take to be R or C.
3.1 Linear Systems: The Basic Vocabulary
A linear equation in the variables x₁, …, xₙ is an equation of the form

a₁x₁ + ⋯ + aₙxₙ = b,

where a₁, …, aₙ, b ∈ F are given scalars and n is a given positive integer. The unknowns x₁, …, xₙ are supposed to be elements of F.
A linear system in the variables x₁, …, xₙ is a family of linear equations, usually written as

a₁₁x₁ + a₁₂x₂ + ⋯ + a₁ₙxₙ = b₁
a₂₁x₁ + a₂₂x₂ + ⋯ + a₂ₙxₙ = b₂
⋯
aₘ₁x₁ + aₘ₂x₂ + ⋯ + aₘₙxₙ = bₘ  (3.1)
Here a₁₁, a₁₂, …, aₘₙ and b₁, …, bₘ are given scalars. There is a much shorter notation for the previous system, using matrices and vectors: denoting by X the (column) vector with coordinates x₁, …, xₙ, by A the matrix [aᵢⱼ]_{1≤i≤m, 1≤j≤n}, and by b the (column) vector whose coordinates are b₁, …, bₘ, the system can be rewritten as

AX = b.  (3.2)

Finally, we can rewrite the system in terms of vectors: if C₁, …, Cₙ are the columns of the matrix A, seen as vectors in Fᵐ (written in column form), the system is equivalent to

x₁C₁ + x₂C₂ + ⋯ + xₙCₙ = b.  (3.3)
Definition 3.1. (a) The linear system (3.1) is called homogeneous if b₁ = ⋯ = bₘ = 0.
(b) The homogeneous linear system associated with the system (3.2) is the system AX = 0.

Thus a homogeneous system is one of the form AX = 0 for some matrix A. For the resolution of linear systems, homogeneous systems play a crucial role, thanks to the following proposition, which shows that solving a general linear system reduces to finding one solution and then solving a homogeneous linear system.
Proposition 3.2 (Superposition Principle). Let A ∈ M_{m,n}(F) and b ∈ Fᵐ. Let S ⊂ Fⁿ be the set of solutions of the homogeneous linear system AX = 0. If the system AX = b has a solution X₀, then the set of solutions of this system is X₀ + S.

Proof. By assumption AX₀ = b. Now the relation AX = b is equivalent to AX = AX₀, or A(X − X₀) = 0. Thus a vector X is a solution of the system AX = b if and only if X − X₀ is a solution of the homogeneous system AY = 0, i.e., X − X₀ ∈ S. This is equivalent to X ∈ X₀ + S.
Definition 3.3. A linear system is called consistent if it has at least one solution. It is called inconsistent if it is not consistent, i.e., it has no solution.

Let us introduce a final definition for this section:

Definition 3.4. (a) Two linear systems are equivalent if they have exactly the same set of solutions.
(b) Let A, B be matrices of the same size. If the systems AX = 0 and BX = 0 are equivalent, we write A ∼ B.
Remark 3.5. (a) Typical examples of inconsistent linear systems are

x₁ = 0
x₁ = 1

or

x₁ − 2x₂ = 1
2x₂ − x₁ = 0.
(b) Note that homogeneous systems are always consistent: any homogeneous system has an obvious solution, namely the vector whose coordinates are all equal to 0. We will call this the trivial solution. It follows from Proposition 3.2 that if the system AX = b is consistent, then it has a unique solution if and only if the associated homogeneous system AX = 0 has only the trivial solution.
3.1.1 Problems for Practice
1. For which real numbers a is the system

x₁ + 2x₂ = 1
3x₁ + 6x₂ = a

consistent? Solve the system in this case.

2. Find all real numbers a and b for which the systems

x₁ + 2x₂ = 3
x₁ + 3x₂ = 1

and

x₁ + ax₂ = 2
x₁ + 2x₂ = b

are equivalent.

3. Let a, b be real numbers, not both equal to 0.

(a) Prove that the system

ax₁ + bx₂ = 0
−bx₁ + ax₂ = 0

has only the trivial solution.

(b) Prove that for all real numbers c, d the system

ax₁ + bx₂ = c
−bx₁ + ax₂ = d

has a unique solution, and find this solution in terms of a, b, c, d.

4. Let A ∈ M₂(ℂ) be a matrix and consider the homogeneous system AX = 0. Prove that the following statements are equivalent:

(a) This system has only the trivial solution.
(b) A is invertible.

5. Let A and B be n × n matrices such that the system ABX = 0 has only the trivial solution. Show that the system BX = 0 also has only the trivial solution.

6. Let C and D be n × n matrices such that the system CDX = b is consistent for every choice of a vector b in ℝⁿ. Show that the system CY = b is consistent for every choice of a vector b in ℝⁿ.

7. Let A ∈ Mₙ(F) be an invertible matrix with entries in a field F. Prove that for all b ∈ Fⁿ the system AX = b is consistent (the converse holds, but the proof is much harder; see Theorem 3.25).
3.2 The Reduced Row-Echelon Form and Its Relevance to Linear Systems
Consider a matrix A with entries in a field F. If R is a row of A, we say that R is zero if all entries of R are equal to 0. If R is nonzero, the leading entry of R, or the pivot of R, is the first nonzero entry in that row. We say that A is in reduced row-echelon form if A has the following properties:
(1) All zero rows of A are at the bottom of A (so no nonzero row can lie below a
zero row).
(2) The pivot in a nonzero row is strictly to the right of the pivot in the row above.
(3) In any nonzero row, the pivot equals 1 and it is the only nonzero element in its
column.
For instance, the matrix Iₙ is in reduced row-echelon form, and so is the matrix Oₙ. The matrix

A = \begin{pmatrix} 1 & −2 & 0 & −1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (3.4)

is in reduced row-echelon form, but the slightly different matrix

B = \begin{pmatrix} 1 & −2 & 1 & −1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}

is not in reduced row-echelon form, as the pivot of the second row is not the only nonzero entry in its column.
What is the relevance of this very special form of matrices with respect to the original problem, consisting in solving linear systems of equations? We will see in the next sections that any matrix can be put (in an algorithmic way) into reduced row-echelon form and that this form is unique. Also, we will see that if A_ref is the reduced row-echelon form of A, then the systems AX = 0 and A_ref X = 0 are equivalent. Moreover, it is very easy to solve the system A_ref X = 0 since A_ref is in reduced row-echelon form.
Example 3.6. Let us solve the system AX = 0, where A is the reduced row-echelon matrix given in relation (3.4). The system is

x₁ − 2x₂ − x₄ = 0
x₃ + x₄ = 0.

We can simply express x₃ = −x₄ and x₁ = 2x₂ + x₄, thus the general solution of the system is

(2a + b, a, −b, b)

with a, b ∈ F.
In general, consider a matrix A which is in reduced row-echelon form and let us see how to solve the system AX = 0. The only meaningful equations are those given by the nonzero rows of A (recall that all zero rows of A are at the bottom). Suppose that the ith row of A is nonzero for some i and let the pivot of that row be in column j, so that the pivot is aᵢⱼ = 1. The ith equation of the linear system is then of the form

xⱼ + \sum_{k=j+1}^{n} aᵢₖxₖ = 0.

We call xⱼ the pivot variable of the row Lᵢ. So to each nonzero row we associate a unique pivot variable. All the other variables of the system are called free variables. One solves the system starting from the bottom, by successively expressing the pivot
variables in terms of free variables. This yields the general solution of the system, in terms of the free variables, which can take any value in F. If y₁, …, yₛ are the free variables, then the solutions of the system will be of the form

X = \begin{pmatrix} b₁₁y₁ + b₁₂y₂ + ⋯ + b₁ₛyₛ \\ b₂₁y₁ + b₂₂y₂ + ⋯ + b₂ₛyₛ \\ ⋮ \\ bₙ₁y₁ + bₙ₂y₂ + ⋯ + bₙₛyₛ \end{pmatrix}

for some scalars bᵢⱼ. This can also be written as

X = y₁ \begin{pmatrix} b₁₁ \\ b₂₁ \\ ⋮ \\ bₙ₁ \end{pmatrix} + ⋯ + yₛ \begin{pmatrix} b₁ₛ \\ b₂ₛ \\ ⋮ \\ bₙₛ \end{pmatrix}.

We call

Y₁ = \begin{pmatrix} b₁₁ \\ b₂₁ \\ ⋮ \\ bₙ₁ \end{pmatrix}, …, Yₛ = \begin{pmatrix} b₁ₛ \\ b₂ₛ \\ ⋮ \\ bₙₛ \end{pmatrix}

the fundamental solutions of the system AX = 0. The motivation for their name is easy to understand: Y₁, …, Yₛ are solutions of the system AX = 0 which "generate" all other solutions, in the sense that all solutions of the system AX = 0 are obtained by all possible linear combinations of Y₁, …, Yₛ (corresponding to all possible values that the free variables y₁, …, yₛ can take).
Example 3.7. Let us consider the matrix in reduced row-echelon form

A = \begin{pmatrix} 1 & 1 & 0 & 0 & −1 & 0 & 2 \\ 0 & 0 & 1 & 0 & 3 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & −1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

and the associated homogeneous linear system AX = 0. This can be written as

x₁ + x₂ − x₅ + 2x₇ = 0
x₃ + 3x₅ + x₇ = 0
x₄ − x₇ = 0
x₆ = 0

The pivot variables are x₁, x₃, x₄, x₆, as the pivots appear in columns 1, 3, 4, 6. So the free variables are x₂, x₅, x₇. Next, we solve the system starting with the last equation and going up, at each step expressing the pivot variables in terms of free variables. The last equation gives x₆ = 0. Next, we obtain x₄ = x₇, then x₃ = −3x₅ − x₇ and x₁ = −x₂ + x₅ − 2x₇. Thus

X = \begin{pmatrix} −x₂ + x₅ − 2x₇ \\ x₂ \\ −3x₅ − x₇ \\ x₇ \\ x₅ \\ 0 \\ x₇ \end{pmatrix} = x₂ \begin{pmatrix} −1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + x₅ \begin{pmatrix} 1 \\ 0 \\ −3 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x₇ \begin{pmatrix} −2 \\ 0 \\ −1 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}.

The three column vectors appearing in the right-hand side are the fundamental solutions of the system AX = 0. All solutions of the system are given by all possible linear combinations of the three fundamental solutions.
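For comparison, computer algebra systems return exactly such generators; the following Python/SymPy sketch (an added illustration, not part of the original text) recovers the fundamental solutions of Example 3.7:

from sympy import Matrix

A = Matrix([
    [1, 1, 0, 0, -1, 0,  2],
    [0, 0, 1, 0,  3, 0,  1],
    [0, 0, 0, 1,  0, 0, -1],
    [0, 0, 0, 0,  0, 1,  0],
    [0, 0, 0, 0,  0, 0,  0],
])
for Y in A.nullspace():          # one basis vector per free variable x2, x5, x7
    assert A * Y == Matrix([0, 0, 0, 0, 0])
    print(Y.T)                   # printed transposed, for compactness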
The number of fundamental solutions of the system AX = 0 is the total number of variables minus the number of pivot variables. We deduce that the system AX = 0 has the unique solution X = 0 if and only if there are no free variables, or equivalently every variable is a pivot variable. This is the same as saying that the number of pivot variables equals the number of columns of A. Combining these observations with the superposition principle (Proposition 3.2) we obtain the very important:

Theorem 3.8. (a) A homogeneous linear system having more variables than equations has nontrivial solutions. If the field containing the coefficients of the equations is infinite (for instance ℝ or ℂ), then the system has infinitely many solutions.

(b) A consistent linear system AX = b having more variables than equations has at least 2 solutions and, if the field F is infinite (for instance F = ℝ or F = ℂ), then it has infinitely many solutions.
We turn now to the fundamental problem of transforming a matrix into a reduced
row-echelon form matrix. In order to solve this problem we introduce three types of
simple operations that can be applied to the rows of a matrix. We will see that one
can use these operations to transform any matrix into a reduced row-echelon form
matrix. These operations have a very simple motivation from the point of view of
linear systems: the most natural operations that one would do in order to solve a
linear system are:
• multiplying an equation by a nonzero scalar;
• adding a multiple of an equation to a second (and different) equation;
• interchanging two equations.
Note that these operations are reversible: for example, the inverse operation of
multiplication of an equation by a nonzero scalar a is multiplication of that equation
by the inverse of a. It is therefore clear that by performing any finite sequence
of such operations on a linear system we obtain a new linear system which has
precisely the same set of solutions as the original one (i.e., a new linear system
which is equivalent to the original one). These operations on equations of the system
can be seen as operations on the matrix associated with the system. More precisely:
Definition 3.9. An elementary operation on the rows of a matrix A (or elementary row operation) is an operation of one of the following types:
(1) row swaps: interchanging two rows of the matrix A.
(2) row scaling: multiplying a row of A by a nonzero scalar.
(3) transvection: replacing a row L by L + cL′ for some scalar c and some row L′ of A, different from L.
The previous discussion shows that if A is a matrix and B is obtained from A by a sequence of elementary row operations, then A ∼ B, where we recall (Definition 3.4) that this simply means that the systems AX = 0 and BX = 0 are equivalent.
Corresponding to these operations, we define elementary matrices:
Definition 3.10. A matrix A ∈ Mₙ(F) is called an elementary matrix if it is obtained from Iₙ by performing exactly one elementary row operation.

Note that elementary matrices have the same number of rows and columns. There are three types of elementary matrices:

(1) Transposition matrices: those obtained from Iₙ by interchanging two of its rows.
(2) Dilation matrices: those obtained from Iₙ by multiplying one of its rows by a nonzero scalar.
(3) Transvection matrices: those obtained from Iₙ by adding to a row a multiple of a second (and different) row.
A simple, but absolutely crucial observation is the following:
Proposition 3.11. Let A ∈ M_{m,n}(F) be a matrix. Performing an elementary row operation on A is equivalent to multiplying A on the left by the elementary matrix corresponding to that operation.

Proof. If E is any m × m matrix and A ∈ M_{m,n}(F), then the ith row of EA is eᵢ₁L₁ + eᵢ₂L₂ + ⋯ + eᵢₘLₘ, where L₁, …, Lₘ are the rows of A and eᵢⱼ are the entries of E. The result follows readily from the definitions.
We now reach the most important theorem of this chapter: it is one of the
most important theorems in linear algebra, since using it we will obtain algorithmic
ways of solving many practical problems, concerning linear systems, invertibility of
matrices, linear independence of vectors, etc.
Theorem 3.12. Any matrix A ∈ M_{m,n}(F) can be put into reduced row-echelon form by performing elementary row operations on its rows.
Proof. The proof is algorithmic. Start with any matrix A and consider its first column. If it is zero, then pass directly to the next column. Suppose that the first column C₁ is nonzero and consider its first nonzero entry, say aᵢ₁. Then interchange rows L₁ and Lᵢ (if i = 1 we skip this operation), so in the new matrix we have a nonzero entry x in position (1, 1). Multiply the first row by 1/x to obtain a new matrix, in which the entry in position (1, 1) is 1. Using transvections, we can make all entries in the first column below the (1, 1) entry equal to 0: for i ≥ 2 subtract bᵢ₁ times the first row from row i, where bᵢ₁ is the entry in position (i, 1). Thus after some elementary row operations we end up with a matrix B whose first column is either 0 or has a pivot in position (1, 1) and zeros elsewhere.

Next, we move to the second column C₂ of this new matrix B. If every entry below b₁₂ is zero, go directly to the third column of B. Suppose that some entry below b₁₂ is nonzero. By possibly swapping the second row and a suitable other row (corresponding to the first nonzero entry below b₁₂), we may assume that b₂₂ ≠ 0. Multiply the second row by 1/b₂₂ so that the entry in position (2, 2) becomes 1. Now make the other entries in the second column zero by transvections. We now have pivots equal to 1 in the first and second columns. Needless to say, we continue this process with each subsequent column and we end up with a matrix in reduced row-echelon form.
Remark 3.13. The algorithm used in the proof of the previous theorem is called
Gaussian reduction or row-reduction.
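To make the algorithm concrete, here is a compact Python implementation (an added illustration, not part of the original text) that follows the proof of Theorem 3.12 step by step, using exact rational arithmetic:

from fractions import Fraction

def rref(rows):
    # Reduced row-echelon form of a matrix given as a list of lists.
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0                                   # row that will receive the next pivot
    for j in range(n):                      # walk through the columns
        i = next((k for k in range(r, m) if A[k][j] != 0), None)
        if i is None:
            continue                        # no nonzero entry at or below row r
        A[r], A[i] = A[i], A[r]             # row swap
        pivot = A[r][j]
        A[r] = [x / pivot for x in A[r]]    # row scaling: the pivot becomes 1
        for k in range(m):                  # transvections: clear column j elsewhere
            if k != r and A[k][j] != 0:
                c = A[k][j]
                A[k] = [u - c * v for u, v in zip(A[k], A[r])]
        r += 1
        if r == m:
            break
    return A

By the uniqueness theorem proved later in this chapter, running rref on the matrix of Example 3.16 below reproduces the matrix A₅ obtained there by hand.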
By combining the Gaussian reduction theorem (Theorem 3.12) and Proposition 3.11 we obtain the following result, which will be constantly used in the next
section:
Proposition 3.14. For any matrix A ∈ M_{m,n}(F) we can find a matrix B ∈ Mₘ(F) which is a product of elementary matrices, such that A_ref = BA.

Remark 3.15. In order to find the matrix B in practice, the best way is to row-reduce the matrix [A|Iₘ] if A is m × n. The row-reduction will yield the matrix [A_ref|B], as the reader can check.
Example 3.16. Let us perform the Gaussian reduction on the matrix

A = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ −1 & 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 1 & 1 \\ 3 & 1 & −1 & 0 & 2 \end{pmatrix} ∈ M_{4,5}(ℝ).

The first nonzero entry in column C₁ appears in position (2, 1) and equals −1, so we swap the first and second rows, then we multiply the new first row by −1 to get a pivot equal to 1 in the first row. We end up with the matrix

A₁ = \begin{pmatrix} 1 & 0 & −1 & −2 & −3 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 & 1 \\ 3 & 1 & −1 & 0 & 2 \end{pmatrix}.

We make zeros elsewhere in the first column, by subtracting three times the first row from the last row. The new matrix is

A₂ = \begin{pmatrix} 1 & 0 & −1 & −2 & −3 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 6 & 11 \end{pmatrix}.

Since we are done with the first column, we go on to the second one. The entry in position (2, 2) is already equal to 1, so we don't need to swap rows or to scale them. Thus we directly make zeros elsewhere in the second column, so that the only nonzero entry is the 1 in position (2, 2). For this, we subtract the second row from the third and the fourth. The new matrix is

A₃ = \begin{pmatrix} 1 & 0 & −1 & −2 & −3 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & −1 & −2 & −3 \\ 0 & 0 & 0 & 3 & 7 \end{pmatrix}.

We next consider the third column. The first nonzero entry below the entry in position (2, 3) is −1, so we multiply the third row by −1 and then make the 1 in position (3, 3) the only nonzero entry in that column by transvections. We end up with

A₄ = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & −1 & −2 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 3 & 7 \end{pmatrix}.

We repeat the procedure with the fourth column: we multiply the last row by 1/3 (so that the first nonzero entry below the one in position (3, 4) becomes 1, our pivot) and then make the entry in position (4, 4) the only nonzero entry in its column by transvections. The final matrix is the reduced row-echelon form of A, namely

A₅ = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1/3 \\ 0 & 0 & 1 & 0 & −5/3 \\ 0 & 0 & 0 & 1 & 7/3 \end{pmatrix}.
Problem 3.17. Solve the homogeneous linear system AX = 0, where A is the matrix from the previous example.

Solution. The systems AX = 0 and A_ref X = 0 being equivalent, it suffices to solve the latter system. The pivot variables are x₁, x₂, x₃, x₄ and the free variable is x₅. The system A_ref X = 0 is given by

x₁ = 0
x₂ + (1/3)x₅ = 0
x₃ − (5/3)x₅ = 0
x₄ + (7/3)x₅ = 0

The resolution is then immediate and gives the solutions

(0, −\frac{1}{3}t, \frac{5}{3}t, −\frac{7}{3}t, t), t ∈ ℝ.
3.2.1 Problems for Practice
1. Find the reduced row-echelon form of the matrix with real entries

A = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 4 & 5 & 6 \\ 3 & 4 & 5 & 6 & 7 \end{pmatrix}.

2. Implement the Gaussian reduction algorithm on the matrix

A = \begin{pmatrix} 0 & 2 & 1 & 1 & 2 \\ 1 & −1 & 0 & 2 & 1 \\ 3 & 1 & −1 & 0 & 2 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix}.

3. Determine the fundamental solutions of the homogeneous linear system of equations AX = 0, where A is the matrix

A = \begin{pmatrix} 1 & 2 & −1 & 0 \\ 2 & 4 & 0 & 2 \\ −1 & −2 & 1 & 2 \end{pmatrix}.

4. (a) Write the solutions of the homogeneous system of equations AX = 0 in parametric vector form, where

A = \begin{pmatrix} 1 & 0 & −3 \\ 0 & 1 & 1 \\ −1 & 1 & 4 \end{pmatrix}.

(b) Find a solution to the system for which the sum of the first two coordinates is 1.

5. Solve the homogeneous system

x + 2y − 3z = 0
2x + 5y + 2z = 0
3x − y − 4z = 0

6. Show that the homogeneous system of equations AX = 0 has nontrivial solutions, where

A = \begin{pmatrix} 2 & −1 & 3 & 1 \\ 1 & 0 & 2 & −2 \\ 3 & 1 & 7 & 0 \\ 1 & 2 & 4 & −1 \end{pmatrix}.

Then determine a matrix B of size 4 × 3 obtained from A by erasing one of its columns such that the system BY = 0 has only the trivial solution.

7. Let n > 2 be an integer. Solve in real numbers the linear system

x₂ = \frac{x₁ + x₃}{2}, x₃ = \frac{x₂ + x₄}{2}, …, xₙ₋₁ = \frac{xₙ₋₂ + xₙ}{2}.
3.3 Solving the System AX = b
Consider a linear system AX = b with A ∈ M_{m,n}(F) and b ∈ Fᵐ, in the variables x₁, …, xₙ, which are the coordinates of the vector X ∈ Fⁿ. In order to solve this system, we consider the augmented matrix (A|b) obtained by adding to the matrix A a new column (at the right), given by the coordinates of the vector b. Elementary row operations on the equations of the system come down to elementary row operations on the augmented matrix, thus in order to solve the system we can first transform (A|b) into its reduced row-echelon form by the Gaussian reduction algorithm, then solve the new (much easier) linear system. The key point is the following easy but important observation:
Proposition 3.18. Consider the linear system AX = b. Suppose that the matrix (A′|b′) is obtained from the augmented matrix (A|b) by a sequence of elementary row operations. Then the systems AX = b and A′X = b′ are equivalent, i.e., they have exactly the same set of solutions.

Proof. As we have already noticed, performing elementary row operations on (A|b) comes down to performing elementary operations on the equations of the system AX = b, and these do not change the set of solutions, as they are reversible.
We now reach the second fundamental theorem of this chapter, the existence and
uniqueness theorem.
Theorem 3.19. Assume that (A|b) has been brought to a reduced row-echelon form (A′|b′) by elementary row operations.

(a) The system AX = b is consistent if and only if (A′|b′) does not have a pivot in the last column.
(b) If the system is consistent, then it has a unique solution if and only if A′ has pivots in every column.

Proof. (a) Assume that (A′|b′) has a pivot in the last column. If the pivot appears in row i, then the ith row of (A′|b′) is of the form (0, …, 0, 1). Thus among the equations of the system A′X = b′ we have the equation 0x₁ + ⋯ + 0xₙ = 1, which has no solution. Thus the system A′X = b′ has no solution and so the system AX = b is not consistent.

Conversely, suppose that (A′|b′) does not have a pivot in the last column. Say A′ has pivots in columns j₁ < ⋯ < jₖ ≤ n and call x_{j₁}, …, x_{jₖ} the pivot variables, and all other variables the free variables. Give the value 0 to all free variables, getting in this way a system in the variables x_{j₁}, …, x_{jₖ}. This system is triangular and can be solved successively from the bottom, by first finding x_{jₖ}, then x_{jₖ₋₁}, …, then x_{j₁}. In particular, the system has a solution and so the system AX = b is consistent.

(b) Since we can give any value to the free variables, the argument in the second paragraph of the proof of (a) shows that the solution is unique if and only if there are no free variables, or equivalently if and only if A′ has a pivot in every column.
For simplicity, assume that F = ℝ, i.e., the coefficients of the equations of the linear system AX = b are real numbers. In order to find the number of solutions of the system, we proceed as follows. First, we consider the augmented matrix [A|b] and perform the Gaussian reduction on it to reach a matrix [A′|b′]. If this new matrix has a row of the form (0, 0, …, 0 | c) for some nonzero real number c, then the system is inconsistent. If this is not the case, then we check whether every column of A′ has a pivot. If this is the case, then the system has a unique solution. If not, then the system has infinitely many solutions.
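This decision procedure is mechanical enough to automate; the following Python/SymPy sketch (an added illustration, not part of the original text) classifies a system from the reduced row-echelon form of its augmented matrix:

from sympy import Matrix

def classify(A, b):
    m, n = A.shape
    _, pivots = Matrix.hstack(A, b).rref()   # pivots: indices of pivot columns
    if n in pivots:                          # pivot in the augmented column
        return "inconsistent"
    return "unique solution" if len(pivots) == n else "infinitely many solutions"

print(classify(Matrix([[1, 2], [3, 6]]), Matrix([1, 3])))   # infinitely many solutions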
Problem 3.20. Let us consider the matrix

A = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 2 & 4 & 4 \end{pmatrix}.

Given a vector b ∈ ℝ³, find a necessary and sufficient condition in terms of the coordinates of b such that the system AX = b is consistent.

Solution. The augmented matrix of the system is

[A|b] = \begin{pmatrix} 1 & 2 & 2 & b₁ \\ 0 & 1 & 1 & b₂ \\ 2 & 4 & 4 & b₃ \end{pmatrix}.

In order to obtain its row-reduction, we subtract twice the first row from the third one, and in the new matrix we subtract twice the second row from the first one. We end up with

[A|b] ∼ \begin{pmatrix} 1 & 0 & 0 & b₁ − 2b₂ \\ 0 & 1 & 1 & b₂ \\ 0 & 0 & 0 & b₃ − 2b₁ \end{pmatrix}.

By the previous theorem, the system AX = b is consistent if and only if this last matrix has no pivot in the last column, which is equivalent to b₃ = 2b₁.
Using the fact that for two matrices A, B differing by a sequence of elementary row operations the systems AX = 0 and BX = 0 are equivalent, we can give a proof of the uniqueness of the reduced row-echelon form of a matrix. The following simple and elegant proof of this nontrivial theorem is due to Thomas Yuster.¹

Theorem 3.21. The reduced row-echelon form of a matrix is unique.

Proof. The proof goes by induction on the number n of columns of the matrix A ∈ M_{m,n}(F). The result being clear for n = 1, assume that n > 1 and that the result holds for n − 1. Let A ∈ M_{m,n}(F) and let A′ be the matrix obtained from A by deleting the nth column. Suppose that B and C are two distinct reduced row-echelon forms of A. Since any sequence of elementary row operations bringing A to a reduced row-echelon form also brings A′ to a reduced row-echelon form, by applying the induction hypothesis we know that B and C differ in the nth column only.

Let j be such that bⱼₙ ≠ cⱼₙ (such j exists by the previous paragraph and the assumption that B ≠ C). If X is a vector such that BX = 0, then CX = 0 (as the systems BX = 0 and CX = 0 are equivalent to the system AX = 0), so that (B − C)X = 0. Since B and C differ in the nth column only, the jth equation of the system (B − C)X = 0 reads (bⱼₙ − cⱼₙ)xₙ = 0 and so xₙ = 0 whenever BX = 0 or CX = 0. It follows that xₙ is not a free variable for B and C, and thus B and C must have a pivot in the nth column. Again, since B and C only differ in the last column and since they are in reduced row-echelon form, the row in which the pivot in the last column appears is the same for B and C. Since all other entries in the last column of B and C are equal to 0 (as B and C are in reduced row-echelon form), we conclude that B and C have the same nth column, contradicting the fact that bⱼₙ ≠ cⱼₙ. Thus B = C and the inductive step is completed, proving the desired result.

¹ See the article "The reduced row-echelon form of a matrix is unique: a simple proof", Math. Magazine, Vol. 57, No. 2, March 1984, pp. 93–94.
3.3.1 Problems for Practice
1. Write down the solution set of the linear system

x₁ − 3x₂ − 2x₃ = 5
x₂ − x₃ = 4
−2x₁ + 3x₂ + 7x₃ = 2

in parametric vector form.

2. Let A be a matrix of size m × n and let b and c be two vectors in ℝᵐ such that the system AX = b has a unique solution and the system AX = c has no solution. Explain why m > n must hold.

3. Find a necessary and sufficient condition on the coordinates of the vector b ∈ ℝ⁴ for the system AX = b to be consistent, where

A = \begin{pmatrix} 3 & 6 & 2 & 1 \\ 2 & 4 & 1 & 3 \\ 0 & 0 & 1 & −1 \\ 1 & 2 & 1 & 0 \end{pmatrix}.

4. Find x, y, z and w so that

\begin{pmatrix} x & 3 \\ y & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ z & w \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

Find one solution with x positive and one with x negative.
5. Explain why a linear system of 10 equations in 11 variables cannot have a unique
solution.
6. Find all possible values for h and k such that the system with augmented matrix

\begin{pmatrix} 1 & 2 & | & h \\ −2 & k & | & 12 \end{pmatrix}

has

(a) a unique solution.
(b) infinitely many solutions.
(c) no solution.

7. For what value of s is the vector v₁ = (s, 7, 6) a linear combination of the vectors v₂ = (1, 0, 1) and v₃ = (1, 7, 4)?

8. Let a, b be real numbers. Solve in real numbers the system

x + y = a
y + z = b
z + t = a
t + x = b
3.4 Computing the Inverse of a Matrix
Recall that a matrix A ∈ Mₙ(F) is invertible if there is a matrix B such that AB = BA = Iₙ. Such a matrix B is then unique; it is called the inverse of A and denoted A⁻¹. A fundamental observation is that elementary matrices are invertible, which follows immediately from the fact that elementary row operations on matrices are reversible (this also shows that the inverse of an elementary matrix is still an elementary matrix). For instance, if a matrix E is obtained from Iₙ by exchanging rows i and j, then E⁻¹ is obtained from Iₙ by doing the same operation, that is, E⁻¹ = E. Also, if E is obtained by adding λ times row j to row i in Iₙ, then E⁻¹ is obtained by adding −λ times row j to row i in Iₙ. Due to its importance, let us state this as a proposition:
Proposition 3.22. Elementary matrices are invertible and their inverses are also
elementary matrices.
Here is an important consequence of the previous proposition and
Proposition 3.14.
Theorem 3.23. For a matrix A ∈ Mₙ(F) the following statements are equivalent:

(a) A is invertible.
(b) A_ref = Iₙ.
(c) A is a product of elementary matrices.
Proof. First, let us note that any product of elementary matrices is invertible, since any elementary matrix is invertible and since invertible matrices are stable under product. This already proves that (c) implies (a). Assume that (a) holds. By Proposition 3.14 and our initial observation, we can find an invertible matrix B such that A_ref = BA. Since A is invertible, so is BA and so A_ref is invertible. In particular, all rows of A_ref are nonzero (it is easy to see that if A_ref has an entire row consisting of zeros, then A_ref·C is never equal to Iₙ) and so A_ref has n pivots, one in each column. Since moreover A_ref is in reduced row-echelon form, we must have A_ref = Iₙ. Thus (b) holds.

Finally, if (b) holds, then by Proposition 3.14 we can find a matrix B which is a product of elementary matrices such that BA = Iₙ. By the previous proposition B is invertible and B⁻¹ is a product of elementary matrices. Since BA = Iₙ, we have A = B⁻¹BA = B⁻¹ and so A is a product of elementary matrices. Thus (b) implies (c) and the theorem is proved.
The following proposition expresses the solutions of the system AX = b when A is an invertible matrix. Of course, in order to make this effective, one should have an algorithm allowing one to compute A⁻¹. We will see such an algorithm (based again on row-reduction) later on (see the discussion following Corollary 3.26).
Proposition 3.24. If A ∈ Mₙ(F) is invertible, then for all b ∈ Fⁿ the system AX = b has a unique solution, namely X = A⁻¹b.

Proof. Let X be a solution of the system. Multiplying the equality AX = b on the left by A⁻¹ yields A⁻¹(AX) = A⁻¹b. Since

A⁻¹(AX) = (A⁻¹A)X = IₙX = X,

we conclude that X = A⁻¹b, thus the system has at most one solution. To see that this is indeed a solution, we compute

A(A⁻¹b) = (AA⁻¹)b = Iₙb = b.
It turns out that the converse is equally true, but much trickier. In fact, we have
the fundamental:
Theorem 3.25. Let A ∈ Mₙ(F) be a matrix. The following statements are equivalent:

(a) A is invertible.
(b) For all b ∈ Fⁿ the system AX = b has a unique solution X ∈ Fⁿ.
(c) For all b ∈ Fⁿ the system AX = b is consistent.
Proof. We have already proved that (a) implies (b). It is clear that (b) implies (c), so let us assume that (c) holds. Let A_ref be the reduced row-echelon form of A. By Proposition 3.14 we can find a matrix B which is a product of elementary matrices (thus invertible) such that A_ref = BA. We deduce that the system A_ref X = Bb has at least one solution for all b ∈ Fⁿ (indeed, if AX = b, then A_ref X = BAX = Bb). Now, for any b′ ∈ Fⁿ we can find b such that b′ = Bb, by taking b = B⁻¹b′. We conclude that the system A_ref X = b is consistent for every b ∈ Fⁿ. But then every row of A_ref must be nonzero (if row i is zero, then choosing any vector b with the ith coordinate equal to 1 yields an inconsistent system) and, as in the first paragraph of the proof of Theorem 3.23, we obtain A_ref = Iₙ. Using Theorem 3.23 we conclude that A is invertible and so (a) holds. The theorem is proved.
Here is a nice and nontrivial consequence of the previous theorem:
Corollary 3.26. Let A, B ∈ Mₙ(F) be matrices.

(a) If AB = Iₙ, then A is invertible and B = A⁻¹.
(b) If BA = Iₙ, then A is invertible and B = A⁻¹.

Proof. (a) For any b ∈ Fⁿ the vector X = Bb satisfies AX = A(Bb) = (AB)b = b, thus the system AX = b is consistent for every b ∈ Fⁿ. By the previous theorem, A is invertible. Multiplying the equality AB = Iₙ on the left by A⁻¹ we obtain B = A⁻¹AB = A⁻¹, thus B = A⁻¹.

(b) By part (a), we know that B is invertible and A = B⁻¹. But then A itself is invertible and A⁻¹ = B, since by definition B·B⁻¹ = B⁻¹·B = Iₙ.
The previous corollary gives us a practical way of deciding whether a square matrix A is invertible and, if this is the case, of computing its inverse. Indeed, A is invertible if and only if we can find a matrix X such that AX = I_n, as then X = A^{-1}. The equation AX = I_n is equivalent to n linear systems: AX_1 = e_1, AX_2 = e_2, ..., AX_n = e_n, where e_i is the ith column of I_n and X_i denotes the ith column of X. We already know how to solve linear systems, using the reduced row-echelon form, so this gives us a practical way of computing X (if at least one of these systems is inconsistent, then A is not invertible).

In practice, one can avoid solving n linear systems by the following trick: instead of considering n augmented matrices [A|e_i], consider only one augmented matrix [A|I_n], in which we add the matrix I_n to the right of A (thus [A|I_n] has 2n columns). Thus we solve simultaneously the n linear systems we are interested in by enlarging the augmented matrix! Now find the reduced row-echelon form [A'|X] of this n × 2n matrix [A|I_n]. If A' is different from I_n, then A is not invertible. If A' = I_n, then the inverse of A is simply the matrix X.
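For readers who wish to experiment, here is a minimal computational sketch of this trick (an illustration, not part of the original algorithm description; it assumes Python with the standard fractions module, and the function name is ours). It row-reduces [A|I_n] with exact rational arithmetic and reads off the inverse, or reports that A is not invertible.

from fractions import Fraction

def inverse_via_row_reduction(A):
    """Row-reduce [A | I_n]; if the left block becomes I_n, return the right block."""
    n = len(A)
    # Build the augmented matrix [A | I_n] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(1 if i == j else 0) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a pivot in this column at or below the current row.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # the left block cannot be reduced to I_n, so A is not invertible
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row, then clear the column everywhere else.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Applied to the matrix of Example 3.27 below, this prints an inverse with
# -5/7 on the diagonal and 2/7 everywhere else.
print(inverse_via_row_reduction([[1, 2, 2, 2], [2, 1, 2, 2], [2, 2, 1, 2], [2, 2, 2, 1]]))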
Example 3.27. Consider the matrix

    [1 2 2 2]
A = [2 1 2 2]  ∈ M_4(R).
    [2 2 1 2]
    [2 2 2 1]
We will try to see whether the matrix A is invertible and, if this is the case, compute its inverse. Consider the augmented matrix B = [A|I_4] and let us find its reduced
row-echelon form using Gaussian reduction. Subtracting twice the first row from each of the second, third, and fourth rows of B, we end up with

     [1  2  2  2 |  1 0 0 0]
B1 = [0 -3 -2 -2 | -2 1 0 0]
     [0 -2 -3 -2 | -2 0 1 0]
     [0 -2 -2 -3 | -2 0 0 1]
Multiply the second row by -1/3. In the new matrix, add twice the second row to the third and fourth rows. We end up with

     [1 2   2    2  |   1     0   0 0]
B2 = [0 1  2/3  2/3 |  2/3  -1/3  0 0]
     [0 0 -5/3 -2/3 | -2/3  -2/3  1 0]
     [0 0 -2/3 -5/3 | -2/3  -2/3  0 1]
Multiply the third row by -3/5. In the new matrix add 2/3 times the third row to the fourth one, then multiply the fourth row by -5/7. Continuing the Gaussian reduction in the usual way, we end up (after quite a few steps which are left to the reader) with the matrix
[1 0 0 0 | -5/7  2/7  2/7  2/7]
[0 1 0 0 |  2/7 -5/7  2/7  2/7]
[0 0 1 0 |  2/7  2/7 -5/7  2/7]
[0 0 0 1 |  2/7  2/7  2/7 -5/7]
This shows that A is invertible and

         [-5/7  2/7  2/7  2/7]
A^{-1} = [ 2/7 -5/7  2/7  2/7]
         [ 2/7  2/7 -5/7  2/7]
         [ 2/7  2/7  2/7 -5/7]
Let us take a closer look at this example, with another proof (this proof works in general when the coefficients of the matrix have sufficient symmetry). Let us consider solving the system AX = Y. This can be written

x1 + 2x2 + 2x3 + 2x4 = y1
2x1 + x2 + 2x3 + 2x4 = y2
2x1 + 2x2 + x3 + 2x4 = y3
2x1 + 2x2 + 2x3 + x4 = y4
We can easily solve this system by introducing

S = x1 + x2 + x3 + x4.

Then the equations become

2S - xi = yi,   1 ≤ i ≤ 4.

Thus xi = 2S - yi. Taking into account that

S = x1 + x2 + x3 + x4 = (2S - y1) + (2S - y2) + (2S - y3) + (2S - y4) = 8S - (y1 + y2 + y3 + y4),

we deduce that

S = (y1 + y2 + y3 + y4)/7

and so

x1 = -(5/7)y1 + (2/7)y2 + (2/7)y3 + (2/7)y4
and similarly for x2, x3, x4. This shows that for any choice of Y ∈ R^4 the system AX = Y is consistent. Thus A is invertible and the solution of the system is given by X = A^{-1}Y. If the first row of A^{-1} is (a, b, c, d), then

x1 = ay1 + by2 + cy3 + dy4.

But since we know that

x1 = -(5/7)y1 + (2/7)y2 + (2/7)y3 + (2/7)y4

and since y1, y2, y3, y4 are arbitrary, we deduce that

a = -5/7,   b = c = d = 2/7.
In this way we can find the matrix A^{-1} and, of course, we obtain the same result as before (but the reader will have noticed that we obtain this result with much less effort!).
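As a quick sanity check (an illustration of ours, again assuming Python with the fractions module), one can verify that the matrix found by either method really is the inverse, by multiplying it back against A:

from fractions import Fraction

A = [[1, 2, 2, 2], [2, 1, 2, 2], [2, 2, 1, 2], [2, 2, 2, 1]]
# Candidate inverse: -5/7 on the diagonal, 2/7 everywhere else.
B = [[Fraction(-5, 7) if i == j else Fraction(2, 7) for j in range(4)] for i in range(4)]
AB = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
assert AB == [[1 if i == j else 0 for j in range(4)] for i in range(4)]  # A * B = I_4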
3.4.1 Problems for Practice
1. Is the matrix

       [1 2 3]
   A = [1 2 4]
       [0 1 1]

   invertible? If so, compute its inverse.
2. For which real numbers x is the matrix

       [1 x 1]
   A = [0 1 x]
       [1 0 1]

   invertible? For any such number x, compute the inverse of A.
3. Let x, y, z be real numbers. Compute the inverse of the matrix

       [1 x y]
   A = [0 1 z]
       [0 0 1]
4. Determine the inverse of the matrix

       [n 1 1 ... 1]
   A = [1 n 1 ... 1]  ∈ M_n(R).
       [. . .  .  .]
       [1 1 1 ... n]
5. Let a be a real number. Determine the inverse of the matrix

       [1        0        0        ... 0 0]
       [a        1        0        ... 0 0]
   A = [a^2      a        1        ... 0 0]  ∈ M_n(R).
       [.        .        .        ... . .]
       [a^{n-2}  a^{n-3}  a^{n-4}  ... 1 0]
       [a^{n-1}  a^{n-2}  a^{n-3}  ... a 1]
Chapter 4
Vector Spaces and Subspaces
Abstract In this chapter we formalize and generalize many of the ideas encountered in the previous chapters, by introducing the key notion of vector space. The central focus is a good theory of dimension for vector spaces spanned by finitely many vectors. This requires a detailed study of spanning and linearly independent families of vectors in a vector space.

Keywords Vector space • Vector subspace • Span • Linearly independent set • Dimension • Direct sum • Basis
In this chapter we formalize and generalize many of the ideas encountered in the previous chapters, by introducing the key notion of vector space. It turns out that many familiar spaces of functions are vector spaces, and developing an abstract theory of vector spaces has the advantage of being applicable to all these familiar spaces simultaneously. A good deal of work is required in order to define a good notion of dimension for vector spaces, but once the theory is developed, a whole family of nontrivial tools is at our disposal and can be used for a deeper study of vector spaces.

Throughout this chapter we fix a field F ∈ {Q, R, C, F_2}, which the reader might want to take to be R or C, for simplicity. The elements of F are called scalars.
4.1 Vector Spaces: Definition, Basic Properties and Examples
We refer the reader to the appendix on algebraic preliminaries for the notion of group and commutative group (we will recall below everything we need, anyway). Let us simply recall that a commutative group (V, +) is a set V endowed with an addition rule + : V × V → V, denoted (v, w) ↦ v + w, and satisfying natural identities (which are supposed to mimic the properties of addition on integers, rational numbers, real numbers, etc.). We are now ready to introduce a fundamental definition, that of a vector space over a field F. The prototype example to keep in mind is F^n (n being any positive integer), which has already been introduced in the first chapter.
Definition 4.1. A vector space over F, or an F-vector space, is a commutative group (V, +) endowed with a map F × V → V, called scalar multiplication and denoted (a, v) ↦ a·v, such that for all a, b ∈ F and v, w ∈ V we have

a) a·(v + w) = a·v + a·w and (a + b)·v = a·v + b·v.
b) 1·v = v.
c) (ab)·v = a·(b·v).

The elements of V are called vectors.
Remark 4.2. 1) We usually write av instead of a·v.
2) By definition, a vector space over F is nonempty!
Before giving quite a few examples of vector spaces we will make the definition more explicit and then try to explain different ways of understanding a vector space.

Thus a vector space over F is a set V, whose elements are called vectors, in which two operations can be performed:

• addition, taking two vectors v, w and returning a vector v + w;
• scalar multiplication, taking a scalar c ∈ F and a vector v ∈ V, and returning the vector cv.
Moreover, the following properties/rules should hold:

1) Addition is commutative: v + w = w + v for all vectors v, w ∈ V.
2) Addition is associative: (u + v) + w = u + (v + w) for all vectors u, v, w ∈ V.
3) Addition has an identity: there is a vector 0 ∈ V such that 0 + v = v + 0 = v for all v ∈ V.
4) There are additive inverses: for all v ∈ V there is a vector w ∈ V such that v + w = 0.
5) We have 1v = v for all v ∈ V.
6) For all scalars a, b ∈ F and all v ∈ V we have (ab)v = a(bv).
7) Scalar multiplication is additive: for all scalars a ∈ F and all v, w ∈ V we have a(v + w) = av + aw.
8) Scalar multiplication distributes over addition: for all scalars a, b ∈ F and all v ∈ V we have (a + b)v = av + bv.
One can hardly come up with a longer definition of a mathematical object, but one has to understand that most of the imposed conditions are natural and fairly easy to check. Actually, most of the time we will not even bother checking these conditions, since they will be apparent from the description of the space V and its operations. The key point is that we simply want to add vectors and multiply them by scalars without having too many difficulties.
Remark 4.3. An important observation is that the addition + : V × V → V is an internal operation, while the scalar multiplication · : F × V → V is an external operation.
Let us make a few simple, but important remarks concerning the previous rules. First of all, one should be careful to distinguish the scalar 0 ∈ F and the vector
0 ∈ V which is the identity for the addition rule. Of course, they are denoted in exactly the same way, but they live in quite different worlds, so there should not be any risk of confusion. Next, since addition is associative, we will not bother writing ((u + v) + w) + z, but simply u + v + w + z.
Now let us focus a little bit on property 4. So let us start with any vector v ∈ V. Property 4 ensures the existence of a vector w ∈ V for which v + w = 0. A natural question is whether such a vector is unique. The answer is positive (and this holds in any group): suppose that w' is another such vector. Then using properties 2 (associativity) and 3 we obtain

w = w + 0 = w + (v + w') = (w + v) + w' = 0 + w' = w'.

Thus w is uniquely determined by v, and we will denote it -v.
Another natural question is whether this vector -v coincides with the vector (-1)v obtained by multiplying v by the scalar -1. Since mathematical definitions are (usually) coherent, one expects that the answer is again positive, which is the case. Indeed, on the one hand properties 5 and 8 yield

(-1)v + v = (-1)v + 1v = (-1 + 1)v = 0v,

and on the other hand property 8 gives

0v + 0v = (0 + 0)v = 0v.

Adding -0v to the previous relation we obtain 0v = 0, thus

0v = 0,   (-1)v = -v

for all v ∈ V. There are a lot of such formulae which can be obtained by simple algebraic manipulations straight from the definitions. Again, we will simplify notations and write v - w for v + (-w).

In the proof that 0v = 0 we used a trick which deserves to be glorified since it is very useful:
Proposition 4.4 (Cancellation law). Let V be a vector space over F.

a) If v + u = w + u for some u, v, w ∈ V, then v = w.
b) If au = av for some u, v ∈ V and some nonzero a ∈ F, then u = v.

Proof. a) We have

v = v + 0 = v + (u - u) = (v + u) - u = (w + u) - u = w + (u - u) = w + 0 = w,

hence v = w, as desired.
b) Similarly, we have

u = 1·u = (a^{-1}a)u = a^{-1}(au) = a^{-1}(av) = (a^{-1}a)v = 1·v = v.
It is now time to see some concrete vector spaces. We have already encountered quite a few in previous chapters. Let us explain why. Let us fix a field F.

First, the field F itself is a vector space over F. Indeed, addition and multiplication on F satisfy all properties 1–8 by definition of a field! Note here that scalar multiplication coincides with the multiplication in F. The zero vector 0 in F coincides with the natural unit for addition in F.
Another very important class of vector spaces over F occurs as follows: let K be a field containing F. Then K is a vector space over F, for essentially the same reasons as in the previous paragraph. Important examples of this situation are Q ⊆ R, R ⊆ C, Q ⊆ C. Thus R is a vector space over Q, C is a vector space over R, and C is a vector space over Q.
Next, consider a positive integer n and recall that F^n is the set of n-tuples of elements of F, written in column form:

    [x1]
X = [x2]
    [..]
    [xn]

We add two such vectors component-wise and we re-scale them by scalars in F component-wise:

[x1]   [y1]   [x1 + y1]
[x2] + [y2] = [x2 + y2]
[..]   [..]   [  ...  ]
[xn]   [yn]   [xn + yn]

and

  [x1]   [cx1]
c [x2] = [cx2]
  [..]   [ ..]
  [xn]   [cxn]

It is not difficult to check that properties 1–8 are all satisfied: they all follow from the corresponding properties of addition and multiplication in F, since all operations are defined component-wise. Thus F^n is a vector space for these two operations. Its zero vector 0 is the column vector having all coordinates equal to 0.
Consider next the set V = M_{m,n}(F) of m × n matrices with entries in F, where m, n are given positive integers. Recall that addition and scalar multiplication on V are defined component-wise by

[a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}]   and   c[a_{ij}] = [ca_{ij}]

for matrices [a_{ij}], [b_{ij}] ∈ V and scalars c ∈ F. Again, all properties 1–8 follow from the corresponding properties of the operations in F. The zero vector in V is the matrix O_{m,n}, all of whose entries are equal to 0.
We consider now function spaces. In complete generality, let X be any nonempty set and consider the set V = F^X of functions f : X → F. We can define addition and scalar multiplication on V by the rules

(f + g)(x) = f(x) + g(x)   and   (cf)(x) = c·f(x)
[Fig. 4.1 The functions f(x) = x^2, g(x) = 1 - 2x and their sum (f + g)(x) = x^2 - 2x + 1]
for c ∈ F and x ∈ X. Then V is a vector space over F, again thanks to the fact that all operations are induced directly from the corresponding operations in F. The zero vector 0 of V is the map 0 : X → F sending every x ∈ X to 0 ∈ F. Note that for X = {1, 2, ..., n} we recover the space F^n: giving a map f : {1, 2, ..., n} → F is the same as giving an n-tuple of elements of F (namely the images of 1, 2, ..., n), that is, an element of F^n.
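To make the pointwise operations concrete, here is a small illustrative sketch (an addition of ours, in plain Python): functions are added and re-scaled by evaluating them pointwise, exactly as in the two rules above.

def add(f, g):
    # (f + g)(x) = f(x) + g(x): the sum is again a function on the same domain.
    return lambda x: f(x) + g(x)

def scale(c, f):
    # (c f)(x) = c * f(x)
    return lambda x: c * f(x)

f = lambda x: x ** 2
g = lambda x: 1 - 2 * x
h = add(f, g)                  # h(x) = x^2 - 2x + 1, cf. Fig. 4.1
print(h(0.5), scale(3, f)(2))  # prints 0.25 and 12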
One can impose further natural properties on the maps f : X → F and still get vector spaces, contained in F^X. For instance, consider the set C[0,1] of real-valued continuous functions on the interval [0,1]. Thus an element of C[0,1] is a continuous map f : [0,1] → R. The addition and scalar multiplication are inherited from those on the vector space R^{[0,1]} of all real-valued maps on [0,1]. For example, if f(x) = x^2 and g(x) = 1 - 2x, then (f + g)(x) = x^2 - 2x + 1 for all x in the interval [0,1] (see Fig. 4.1).
As another example, the function f given by f(x) = sin 5x and its re-scaling (3/2)f are depicted in Fig. 4.2.
The key point is that the sum of two continuous maps is still a continuous map, and if f is continuous and c is a real number, then cf is also continuous. This ensures that the addition and scalar multiplication laws are well defined on C[0,1]. They satisfy all properties 1–8, since these properties are already satisfied on the larger space R^{[0,1]}. Thus C[0,1] is itself a vector space over R, contained in R^{[0,1]}. This is an example of a vector subspace of a vector space, a crucial notion which will be introduced and studied at length in the sequel.
There is nothing special about the interval [0,1]: for each interval I we obtain a vector space of continuous real-valued maps on I. If the real numbers are replaced with complex numbers, we obtain a corresponding vector space of complex-valued continuous maps on I.
[Fig. 4.2 The function f(x) = sin 5x and its re-scaling by a factor of 3/2]
In fact, there are many other function spaces: we could consider the vector space of piecewise continuous functions, differentiable functions, bounded functions, integrable functions, etc., as long as any two functions in such a space add up to another function in the same space and any re-scaling of a function in the space is another function in the space. The possibilities are endless.
Let us consider now another very important class of vector spaces, namely spaces of polynomials. Consider the set R[X] of polynomials in one variable having real coefficients. This set is a vector space over R. Recall that the addition and re-scaling of polynomials are done coefficient-wise, so the fact that R[X] is a vector space over R follows directly from the fact that R itself is a field. The zero vector in R[X] is the zero polynomial (i.e., the polynomial all of whose coefficients are 0).

The vector space R[X] contains a whole bunch of other vector spaces over R: for each nonnegative integer n consider the set R_n[X] of polynomials in R[X] whose degree does not exceed n. For example, the polynomials
1 + X^2,   (3/4)X + X^2,   1 - X - X^3
are all in R_3[X], only the first two are in R_2[X], and none of them is in R_1[X]. Since the sum of two polynomials of degree at most n is a polynomial of degree at most n, and since deg(cP) ≤ n for any real number c and any P ∈ R_n[X], we deduce that R_n[X] is stable under the addition and scalar multiplication defined on R[X], thus it is itself a vector space over R. To be completely explicit, any polynomial in R_n[X] can be written in the form

a0 + a1 X + a2 X^2 + ... + an X^n,

where the ai, 0 ≤ i ≤ n, are real numbers, and then

(a0 + a1 X + ... + an X^n) + (b0 + b1 X + ... + bn X^n) = (a0 + b0) + (a1 + b1)X + ... + (an + bn)X^n

and for c ∈ R

c(a0 + a1 X + ... + an X^n) = (ca0) + (ca1)X + ... + (can)X^n.
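These coefficient-wise operations are easy to mirror computationally; the following small sketch (an addition of ours, plain Python) represents a polynomial a0 + a1 X + ... + an X^n as the coefficient list [a0, a1, ..., an]:

from itertools import zip_longest

def poly_add(P, Q):
    # Add coefficient-wise, padding the shorter polynomial with zeros.
    return [a + b for a, b in zip_longest(P, Q, fillvalue=0)]

def poly_scale(c, P):
    # Re-scale every coefficient by c.
    return [c * a for a in P]

P = [1, 0, 1]            # 1 + X^2, an element of R_2[X]
Q = [1, -1, 0, -1]       # 1 - X - X^3, an element of R_3[X]
print(poly_add(P, Q))    # [2, -1, 1, -1], i.e. 2 - X + X^2 - X^3
print(poly_scale(3, P))  # [3, 0, 3], i.e. 3 + 3X^2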
4.1.1 Problems for Practice
1. Consider the set V = R^2 endowed with an addition rule defined by

(x, y) + (x', y') = (x + x', y + y')

and with a multiplication rule by elements λ of R as follows:

λ·(x, y) = (2λx, 0).

Is V endowed with these operations a vector space over R?
2. Define an operation + on (0, ∞) by

a + b = ab

for a, b ∈ (0, ∞), and an external multiplication by real numbers as follows:

a · b = b^a

for a ∈ R, b ∈ (0, ∞). Does (0, ∞) endowed with this new addition and scalar multiplication become a vector space over R?
3. (Complexification of a real vector space) Let V be a vector space over R. Let V_C be the product V × V (this is the set of ordered pairs (x, y) of elements of V) endowed with the following addition rule:

(x, y) + (x', y') = (x + x', y + y').

Also, for each complex number z = a + ib, consider the "multiplication by z" rule

z·(x, y) := (ax - by, ay + bx)

on V_C. Prove that V_C endowed with these operations becomes a C-vector space (this space is called the complexification of the vector space V).
4.2 Subspaces
We have already seen in the previous section a lot of subspaces of concrete vector spaces. In this section we formalize the concept of a vector subspace of a given vector space and then study some of its basic properties.

Definition 4.5. Let V be an F-vector space. A subspace of V is a nonempty subset W of V which is stable under the operations of addition and scalar multiplication: v + w ∈ W and cv ∈ W for all v, w ∈ W and c ∈ F.
Example 4.6. Let V be the vector space over R of all maps f : R → R. Then the following sets V1, V2, V3, V4 are subspaces of V:

I) V1 = {f ∈ V | f is a continuous function on R}.
II) V2 = {f ∈ V | f is a differentiable function on R}.
III) V3 = {f ∈ V | f is an integrable function on the interval [a, b]}, where a, b ∈ R.
IV) V4 = {f ∈ V | there exists M ∈ R such that |f(x)| ≤ M for all x ∈ R}.
The previous definition invites a whole series of easy observations, which are however very useful in practice.

Remark 4.7. 1. First, note that a vector subspace of a vector space must contain the zero vector. Indeed, say W is a vector subspace of V. Since W is nonempty, there is some v ∈ W, but then 0 = 0v ∈ W. Thus if a subset W of a vector space V does not contain the zero vector, then W has no chance of being a vector subspace of V.
2. Next, a key observation is that if W is a subspace of V, then W becomes itself an F-vector space, by restricting the operations in V to W. Indeed, since properties 1–8 in the definition of a vector space are satisfied in V, they are automatically satisfied in the subset W of V. This was implicitly used (or briefly explained) in the previous section, where many examples of vector spaces were constructed as vector subspaces of some standard vector spaces.

3. In practice, one can avoid checking two conditions (stability under addition and under scalar multiplication) by checking only one: v + cw ∈ W whenever v, w ∈ W and c ∈ F. More generally, a nonempty subset W of V is a vector subspace if and only if

av + bw ∈ W

for all a, b ∈ F and v, w ∈ W. We deduce by induction on n that if W is a vector subspace of V and w1, ..., wn ∈ W and c1, ..., cn ∈ F, then c1 w1 + ... + cn wn ∈ W.
4. Another very important observation is the stability under arbitrary intersections of vector subspaces. More precisely, if (Wi)_{i∈I} is a family of subspaces of V, then

W := ∩_{i∈I} Wi

is again a subspace of V. Indeed, W is nonempty because it contains 0 (as any subspace of V contains 0), and clearly W is stable under addition and scalar multiplication, since each Wi has this property.
Problem 4.8. Consider the vector space V = R^3 over R and the subsets V1, V2 defined by

V1 = {(x, y, z) ∈ R^3 | x + y + z = 1},
V2 = {(x, y, z) ∈ R^3 | x + 2y + z > √2}.

Which (if either) of these is a subspace of V?

Solution. Neither V1 nor V2 contains (0, 0, 0), thus they are not subspaces of V.

Problem 4.9. Let V = R^3 and
U = {(x, y, z) ∈ R^3 | x^2 + y^2 + z^2 ≤ 1}.

Is U a subspace of V?

Solution. U is not a subspace of V, since the vector u = (1, 0, 0) belongs to U, but the vector 2u = (2, 0, 0) does not belong to U.
Problem 4.10. Determine if W is a subspace of V, where

(a) V = C[0,1] and W consists of those functions f in V for which f(0) = 0.
(b) V = C[0,1] and W consists of those functions f in V for which f(1) = 1.
(c) V = C[0,1] and W consists of those functions f in V for which

∫_0^1 f(x) dx = 0.

(d) V = C[0,1] and W consists of those functions f in V for which

∫_0^1 f(x) dx = 1.

(e) V is the space of three times differentiable functions on [0,1] and W consists of those functions in V whose third derivative is 0.
Solution. a) If f(0) = 0 and g(0) = 0, then (f + cg)(0) = 0 for all c ∈ R, and f + cg is a continuous map. Thus W is a subspace of V.
b) W does not contain the zero element of V (which is the constant map equal to 0), thus W is not a subspace of V.
c) If f, g ∈ W, then for all c ∈ R the map f + cg is continuous and

∫_0^1 (f + cg)(x) dx = ∫_0^1 f(x) dx + c ∫_0^1 g(x) dx = 0,

thus f + cg ∈ W. It follows that W is a subspace of V.
d) W does not contain the zero map of V, thus it is not a subspace of V.
e) If f, g are three times differentiable and their third derivatives are 0, then f + cg has the same property for all real numbers c, since (f + cg)^(3) = f^(3) + cg^(3). Thus W is a subspace of V (consisting actually of polynomial functions of degree at most 2).
Problem 4.11. Let U and V be the sets of vectors

U = {(x1, x2) | x1, x2 ≥ 0}   and   V = {(x1, x2) | x1 x2 ≥ 0}

in R^2.

(a) Show that U is closed under addition.
(b) Show that V is closed under re-scaling.
(c) Show that neither U nor V is a subspace of R^2.

Solution. It is clear that U is stable under addition, since nonnegative real numbers are closed under addition. To see that V is closed under re-scaling, consider a scalar c and v = (x1, x2) in V. Then cv = (cx1, cx2) and (cx1)(cx2) = c^2 x1 x2 ≥ 0 because c^2 ≥ 0 and x1 x2 ≥ 0.

U is not a subspace of R^2, as v = (1, 1) ∈ U but -v = (-1)v ∉ U. V is not a subspace of R^2, since v1 = (-2, -2) ∈ V and v2 = (1, 3) ∈ V, but v1 + v2 = (-1, 1) ∉ V.
The union of two subspaces W1, W2 of V is almost never a subspace of V, as the following problem shows.

Problem 4.12. Let V be a vector space over a field F and let V1, V2 be subspaces of V. Prove that the union of V1, V2 is a subspace of V if and only if

V1 ⊆ V2   or   V2 ⊆ V1.

Solution. If V1 ⊆ V2 (resp. V2 ⊆ V1), then V1 ∪ V2 = V2 (resp. V1 ∪ V2 = V1). Therefore in both cases V1 ∪ V2 is a subspace of V.

Conversely, suppose that V1 ∪ V2 is a subspace of V. If V1 ⊆ V2, then we are done, so suppose that this is not the case. Thus we can find v ∈ V1 which does not belong to V2. We will prove that V2 ⊆ V1.

Take any vector x ∈ V2. Since V1 ∪ V2 is a subspace of V containing x and v, it contains their sum x + v. Thus x + v ∈ V1 or x + v ∈ V2. If x + v ∈ V2, then v = (x + v) - x ∈ V2, since V2 is a subspace of V. This contradicts the choice of v, thus we must have x + v ∈ V1. Since v ∈ V1, we also have -v ∈ V1 and so x = (x + v) - v ∈ V1. Thus any element of V2 belongs to V1 and we have V2 ⊆ V1, as desired.
We now define a very important operation on subspaces of an F-vector space:

Definition 4.13. Let W1, W2, ..., Wn be subspaces of a vector space V. Their sum W1 + W2 + ... + Wn is the subset of V consisting of all vectors w1 + w2 + ... + wn with w1 ∈ W1, ..., wn ∈ Wn.

One could extend the previous definition to an arbitrary family (Wi)_{i∈I} of subspaces of V. In this case Σ_{i∈I} Wi consists of all sums Σ_{i∈I} wi with wi ∈ Wi for all i ∈ I and all but finitely many of the vectors wi equal to zero, so that the sum Σ_{i∈I} wi has only finitely many nonzero terms and thus makes sense, even if I is infinite. In practice we will however deal with finite collections of subspaces. The following result also holds for infinite families of vector subspaces, but in the sequel we prefer to focus on finite families, for simplicity.
Proposition 4.14. If W1, W2, ..., Wn are subspaces of a vector space V, then W1 + W2 + ... + Wn is a subspace of V.

Proof. Let us denote for simplicity S = W1 + W2 + ... + Wn. Let s, s' ∈ S and let c be a scalar. It suffices to prove that s + cs' ∈ S. By definition, we can find w1, ..., wn and w1', ..., wn' such that wi, wi' ∈ Wi for 1 ≤ i ≤ n and

s = w1 + w2 + ... + wn,   s' = w1' + w2' + ... + wn'.

Then

s + cs' = w1 + w2 + ... + wn + c(w1' + w2' + ... + wn') = w1 + w2 + ... + wn + cw1' + cw2' + ... + cwn' = (w1 + cw1') + ... + (wn + cwn').
Since Wi is a subspace of V and since wi, wi' ∈ Wi, it follows that wi + cwi' ∈ Wi for all 1 ≤ i ≤ n. The previous displayed formula therefore expresses s + cs' as a sum of vectors in W1, ..., Wn and shows that s + cs' ∈ S. This finishes the proof that S is a subspace of V.
Problem 4.15. Prove that W1 + W2 + ... + Wn is the smallest subspace of V containing all the subspaces W1, ..., Wn.

Solution. It is clear that W1 + ... + Wn contains W1, W2, ..., Wn, since each vector wi of Wi can be written as 0 + 0 + ... + 0 + wi + 0 + ... + 0 and 0 ∈ W1 ∩ ... ∩ Wn. We need to prove that if W is any subspace of V which contains each of the subspaces W1, ..., Wn, then W contains W1 + W2 + ... + Wn. Take any vector v of W1 + ... + Wn. By definition, we can write v = w1 + w2 + ... + wn for some vectors wi ∈ Wi. Since W contains W1, ..., Wn, it contains each of the vectors w1, ..., wn. And since W is a subspace of V, it must contain their sum, which is v. We proved that any element of W1 + ... + Wn belongs to W, thus W1 + ... + Wn ⊆ W and the result follows.
We now introduce a second crucial notion, that of the direct sum of subspaces:

Definition 4.16. Let W1, W2, ..., Wn be subspaces of a vector space V. We say that W1, W2, ..., Wn are in direct sum position if the equality

w1 + w2 + ... + wn = 0

with w1 ∈ W1, ..., wn ∈ Wn forces w1 = w2 = ... = wn = 0.

There are quite a few different ways of expressing this condition. Here is one of them:

Proposition 4.17. Subspaces W1, ..., Wn of a vector space V are in direct sum position if and only if every element of W1 + W2 + ... + Wn can be uniquely written as a sum w1 + ... + wn with w1 ∈ W1, ..., wn ∈ Wn.
Proof. Suppose that W1, ..., Wn are in direct sum position and take an element v of W1 + ... + Wn. By definition we can express v = w1 + ... + wn with wi ∈ Wi for all 1 ≤ i ≤ n. Suppose that we can also write v = w1' + ... + wn' with wi' ∈ Wi. We need to prove that wi = wi' for all 1 ≤ i ≤ n. Subtracting the two relations yields

0 = v - v = (w1 - w1') + (w2 - w2') + ... + (wn - wn').

Let ui = wi - wi'. Since Wi is a subspace of V, we have ui ∈ Wi. Moreover, u1 + ... + un = 0. Since W1, ..., Wn are in direct sum position, it follows that u1 = ... = un = 0, and so wi = wi' for all 1 ≤ i ≤ n, which is what we needed.

Conversely, suppose every element of W1 + W2 + ... + Wn can be written uniquely as a sum of elements of W1, ..., Wn. Then 0 = 0 + 0 + ... + 0 must be the unique decomposition of 0. Thus whenever w1 ∈ W1, w2 ∈ W2, ..., wn ∈ Wn satisfy w1 + w2 + ... + wn = 0, we have w1 = w2 = ... = wn = 0. Thus W1, ..., Wn are in direct sum position.
Finally, we make another key definition:

Definition 4.18. a) We say that a vector space V is the direct sum of its subspaces W1, W2, ..., Wn, and write

V = W1 ⊕ W2 ⊕ ... ⊕ Wn,

if W1, W2, ..., Wn are in direct sum position and V = W1 + W2 + ... + Wn.
b) If V1, V2 are subspaces of a vector space V, we say that V2 is a complement (or complementary subspace) of V1 if V1 ⊕ V2 = V.

By the previous results, V = W1 ⊕ ... ⊕ Wn if and only if every vector v ∈ V can be uniquely written as a sum w1 + w2 + ... + wn, with wi ∈ Wi for all i. Hence, if V1, V2 are subspaces of V, then V2 is a complement of V1 if and only if every vector v ∈ V can be uniquely expressed as v = v1 + v2 with v1 ∈ V1 and v2 ∈ V2.
The result of the following problem is extremely useful in practice.

Problem 4.19. Prove that V2 is a complement of V1 if and only if V1 + V2 = V and V1 ∩ V2 = {0}.

Solution. Assume that V2 is a complement of V1, thus V = V1 ⊕ V2 and each v ∈ V can be uniquely written as the sum of an element of V1 and an element of V2. This clearly implies that V1 + V2 = V. If v ∈ V1 ∩ V2, then we can write v = v + 0 = 0 + v and by uniqueness v = 0, thus V1 ∩ V2 = {0}.

Conversely, assume that V1 ∩ V2 = {0} and V1 + V2 = V. The second relation implies that each vector of V is the sum of a vector in V1 and one in V2. Assume that v ∈ V can be written both v1 + v2 and v1' + v2' with v1, v1' ∈ V1 and v2, v2' ∈ V2. Then v1 - v1' = v2' - v2. Now the left-hand side belongs to V1 while the right-hand side belongs to V2, thus they both belong to V1 ∩ V2 = {0} and so v1 = v1' and v2 = v2', giving the desired uniqueness result.
Example 4.20. 1. The vector space V = R^2 is the direct sum of its subspaces V1 = {(x, 0) | x ∈ R} and V2 = {(0, y) | y ∈ R}. Indeed, any (x, y) ∈ R^2 can be uniquely written in the form (a, 0) + (0, b), via a = x and b = y.

2. Let V = M_n(R) be the vector space of n × n matrices with real entries. If V1 and V2 are the subspaces of symmetric, respectively skew-symmetric, matrices, then V = V1 ⊕ V2. Indeed, any matrix A ∈ V can be uniquely written as the sum of a symmetric matrix and a skew-symmetric matrix: the only way to have A = B + C with B symmetric and C skew-symmetric is via B = (1/2)(A + ᵗA) and C = (1/2)(A - ᵗA).

3. Let V be the vector space of all real-valued maps on R. Let V1 (respectively V2) be the subspace of V consisting of even (respectively odd) functions. Recall that a map f : R → R is even (respectively odd) if f(-x) = f(x) for all x (respectively f(-x) = -f(x) for all x). Then V = V1 ⊕ V2. Indeed, for any map f, the only way to write f = g + h with g even and h odd is via

g(x) = (f(x) + f(-x))/2   and   h(x) = (f(x) - f(-x))/2.
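The decomposition in part 2 is easy to check numerically; here is a short illustration (an addition of ours, assuming the numpy package is available):

import numpy as np

A = np.array([[1.0, 2.0], [5.0, 3.0]])
B = (A + A.T) / 2   # symmetric part of A
C = (A - A.T) / 2   # skew-symmetric part of A
assert np.allclose(B, B.T) and np.allclose(C, -C.T)
assert np.allclose(A, B + C)   # the unique decomposition A = B + C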
Problem 4.21. Let V be the space of continuous real-valued maps on [-1, 1], let

V1 = {f ∈ V | ∫_{-1}^{1} f(t) dt = 0},

and let V2 be the subset of V consisting of constant functions.

a) Prove that V1, V2 are subspaces of V.
b) Prove that V = V1 ⊕ V2.
Solution. a) If f1, f2 are in V1 and c ∈ R, then cf1 + f2 is continuous and

∫_{-1}^{1} (cf1 + f2)(t) dt = c ∫_{-1}^{1} f1(t) dt + ∫_{-1}^{1} f2(t) dt = 0,

thus cf1 + f2 ∈ V1 and V1 is a subspace of V. It is clear that V2 is a subspace of V.

b) By the previous problem, we need to check that V1 ∩ V2 = {0} and V = V1 + V2. Assume that f ∈ V1 ∩ V2, thus f is constant and ∫_{-1}^{1} f(t) dt = 0. Say f(t) = c for all t ∈ [-1, 1]; then

0 = ∫_{-1}^{1} f(t) dt = 2c,

thus c = 0 and f = 0. This shows that V1 ∩ V2 = {0}.
In order to prove that V = V1 + V2, let f ∈ V and let us try to write f = c + g with c a constant and g ∈ V1. We need to ensure that

∫_{-1}^{1} g(t) dt = 0,

that is,

∫_{-1}^{1} (f(t) - c) dt = 0.

It suffices therefore to take

c = (1/2) ∫_{-1}^{1} f(t) dt

and g = f - c.
4.2.1 Problems for Practice
1. Show that none of the following sets of vectors is a subspace of R^3:

(a) The set U of vectors x = (x1, x2, x3) such that x1^2 + x2^2 + x3^2 = 1.
(b) The set V of vectors in R^3 all of whose coordinates are integers.
(c) The set W of vectors in R^3 that have at least one coordinate equal to 0.

2. Determine if U is a subspace of M_2(R), where

(a) U is the set of 2 × 2 matrices such that the sum of the entries in the first column is 0.
(b) U is the set of 2 × 2 matrices such that the product of the entries in the first column is 0.

3. Is R a subspace of the C-vector space C?
4. Let V be the set of all periodic sequences of real numbers. Is V a subspace of the space of all sequences of real numbers?
5. Let V be the set of vectors (x, y, z) ∈ R^3 such that x(y^2 + z^2) = 0. Is V a subspace of R^3?
6. Let V be the set of twice differentiable functions f : R → R such that for all x we have

f''(x) + x^2 f'(x) - 3f(x) = 0.

Is V a subspace of the space of all maps f : R → R?

7. Let V be the set of differentiable functions f : R → R such that for all x we have

f'(x) - f(x)^2 = x.

Is V a subspace of the space of all maps f : R → R?

8. a) Is the set of bounded sequences of real numbers a vector subspace of the space of all sequences of real numbers?
b) Answer the same question if instead of bounded sequences we consider monotonic sequences.

9. Let V be the set of all sequences (x_n)_{n≥0} of real numbers such that

x_{n+2} + n x_{n+1} - (n - 1) x_n = 0

for all n ≥ 0. Prove that V is a subspace of the space of all sequences of real numbers.
10. Let V be the space of all real-valued maps on R and let W be the subset of V consisting of maps f such that f(0) + f(1) = 0.

a) Check that W is a subspace of V.
b) Find a subspace S of V such that V = W ⊕ S.
11. Let V be the space of continuously differentiable maps f : R → R and let W be the subspace of those maps f for which f(0) = f'(0) = 0. Let Z be the subspace of V consisting of the maps x ↦ ax + b, with a, b ∈ R. Prove that V = W ⊕ Z.
12. Let V be the space of convergent sequences of real numbers. Let W be the subset of V consisting of sequences converging to 0 and let Z be the subset of V consisting of constant sequences. Prove or disprove that W, Z are subspaces of V and W ⊕ Z = V.
13. (Quotient space) Let V be a vector space over F and let W ⊆ V be a subspace. For a vector v ∈ V, let [v] = {v + w : w ∈ W}. Note that [v1] = [v2] if v1 - v2 ∈ W. Define the quotient space V/W to be {[v] : v ∈ V}. Define an addition and scalar multiplication on V/W by [u] + [v] = [u + v] and a[v] = [av]. Prove that the addition and multiplication above are well defined and that V/W equipped with these operations is a vector space.
14. Let F ∈ {R, C} and let V be a nonzero vector space over F. Suppose that V is the union of finitely many subspaces of V. Prove that one of these subspaces is V.
4.3 Linear Combinations and Span
Let V be a vector space over a field F and let v1, v2, ..., vn be vectors in V. By definition, V contains all vectors c1 v1 + ... + cn vn, with c1, ..., cn ∈ F. The collection of all these vectors plays a very important role in the sequel and so deserves a formal definition:

Definition 4.22. Let v1, v2, ..., vn be vectors in a vector space V over F.

a) A vector v ∈ V is a linear combination of v1, v2, ..., vn if there are scalars c1, c2, ..., cn ∈ F such that

v = c1 v1 + c2 v2 + ... + cn vn.    (4.1)

b) The span of v1, ..., vn is the subset of V consisting of all linear combinations of v1, v2, ..., vn. It is denoted Span(v1, v2, ..., vn).
Example 4.23. 1) The span Span(v) of a single vector v in R^m consists of all re-scaled copies of v (we also say all scalar multiples of v). Using the geometric interpretation of vectors in R^2 (or R^3), if v ≠ 0 then Span(v) is represented by the line through the origin in the direction of the vector v.

2) Let e1 = (1, 0, 0) and e2 = (0, 1, 0). Then

x1 e1 + x2 e2 = (x1, x2, 0).

Since x1 and x2 are arbitrary, we see that Span(e1, e2) consists of all vectors in R^3 whose third coordinate is 0. This is the x1x2-plane in R^3. In general, if two vectors v1 and v2 in R^3 are not collinear, then their span is the unique plane through the origin that contains them.
Problem 4.24. Show that the vector (1, 1, 1) cannot be expressed as a linear combination of the vectors

v1 = (1, 0, 0),   v2 = (0, 1, 0)   and   v3 = (1, 1, 0).

Solution. An arbitrary linear combination

x1 v1 + x2 v2 + x3 v3 = (x1 + x3, x2 + x3, 0)

of v1, v2 and v3 has 0 as its third coordinate, and so cannot be equal to (1, 1, 1).

More generally, let us consider the following practical problem: given a family
of vectors v1, v2, ..., vk in F^n and a vector v ∈ F^n, decide whether this vector is a linear combination of v1, ..., vk, that is, whether v ∈ Span(v1, ..., vk). Consider the n × k matrix A whose columns are v1, ..., vk. Saying that v ∈ Span(v1, ..., vk) is the same as saying that we can find x1, ..., xk ∈ F such that v = x1 v1 + ... + xk vk, or equivalently that the system AX = v is consistent (and then x1, ..., xk are given by the coordinates of X). Since we have a practical way of deciding whether this system is consistent (via row-reduction of the augmented matrix [A|v]), we see that we have an algorithmic solution to the previous problem. Of course, we can solve the previous problem via this method, too.
Problem 4.25. Consider the vectors v1 = (1, 0, 1, 2), v2 = (3, 4, 2, 1) and v3 = (5, 8, 3, 0). Is the vector v = (1, 0, 0, 0) in the span of {v1, v2, v3}? What about the vector w = (4, 4, 3, 3)?

Solution. In order to solve this problem, we use the method described above. Namely, we consider the matrix

    [1 3 5]
A = [0 4 8]
    [1 2 3]
    [2 1 0]

We want to know if the system AX = v is consistent. The row-reduction of the augmented matrix [A|v] is

[1 0 -1 | 0]
[0 1  2 | 0]
[0 0  0 | 1]
[0 0  0 | 0]
Looking at the third row of the reduced matrix, we see that the system is not consistent, thus v is not in the span of {v1, v2, v3}.

For the vector w, we use the same method. The row-reduction of the augmented matrix [A|w] is now

[1 0 -1 | 1]
[0 1  2 | 1]
[0 0  0 | 0]
[0 0  0 | 0]

which shows that the system is consistent, and so w is in the span of {v1, v2, v3}. If we want to explicitly find the linear combination of v1, v2, v3 giving w, all we need is to solve the system

[1 0 -1]        [1]
[0 1  2] [x1]   [1]
[0 0  0] [x2] = [0]
[0 0  0] [x3]   [0]

This yields without any problem x1 = x3 + 1 and x2 = 1 - 2x3. Thus we can write

w = (1 + x3)v1 + (1 - 2x3)v2 + x3 v3,

and this for any choice of x3. We can take for instance x3 = 0 and obtain w = v1 + v2.
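The consistency test used in this solution is easy to automate; here is an illustrative sketch (an addition of ours, assuming the sympy package): a vector b lies in the span of the columns of A exactly when appending b as an extra column does not change the rank.

import sympy as sp

# Columns are v1, v2, v3 from Problem 4.25.
A = sp.Matrix([[1, 3, 5], [0, 4, 8], [1, 2, 3], [2, 1, 0]])

def in_span(A, b):
    # The system AX = b is consistent iff rank[A | b] = rank A.
    return sp.Matrix.hstack(A, b).rank() == A.rank()

print(in_span(A, sp.Matrix([1, 0, 0, 0])))  # False: v is not in the span
print(in_span(A, sp.Matrix([4, 4, 3, 3])))  # True: w = v1 + v2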
The following result is easily proved, but explains the importance of the notion of span:

Proposition 4.26. Let V be a vector space over F and let v1, v2, ..., vn ∈ V. Then

a) Span(v1, v2, ..., vn) is the intersection of all subspaces of V which contain v1, v2, ..., vn.
b) Span(v1, v2, ..., vn) is the smallest vector subspace of V which contains v1, v2, ..., vn.

Proof. Since an arbitrary intersection of vector subspaces is a vector subspace, part a) implies part b), so we will focus on the proof of part a).

First, let us prove that Span(v1, v2, ..., vn) is contained in every vector subspace W of V that contains v1, v2, ..., vn. This will imply that Span(v1, v2, ..., vn) is contained in the intersection of all such subspaces W. Now, since W is a subspace of V and since v1, v2, ..., vn ∈ W, we have c1 v1 + c2 v2 + ... + cn vn ∈ W for all scalars c1, c2, ..., cn ∈ F. Thus W contains all linear combinations of v1, v2, ..., vn, i.e., it contains Span(v1, v2, ..., vn).
It remains to see that Span(v1, v2, ..., vn) is a vector subspace of V (as it contains v1, v2, ..., vn, this will imply that it contains the intersection of the vector subspaces containing v1, v2, ..., vn). So let x, y ∈ Span(v1, v2, ..., vn) and let c ∈ F be a scalar. Since x, y are linear combinations of v1, v2, ..., vn, we can write x = a1 v1 + a2 v2 + ... + an vn and y = b1 v1 + b2 v2 + ... + bn vn for some scalars a1, ..., an and b1, ..., bn. Then

x + cy = (a1 + cb1)v1 + (a2 + cb2)v2 + ... + (an + cbn)vn

is also a linear combination of v1, v2, ..., vn, thus it belongs to Span(v1, v2, ..., vn). The result follows.
Remark 4.27. It follows from the previous proposition and Problem 4.15 that

Span(v1, v2, ..., vn) = Σ_{i=1}^{n} F vi,

where F vi is the subspace of V consisting of all multiples cvi of vi (equivalently, F vi = Span(vi)).
We can extend slightly the previous definition and results by considering arbitrary subsets of V:

Definition 4.28. Let S be a subset of V.

a) Span(S) is the subset of V consisting of all linear combinations c1 v1 + c2 v2 + ... + cn vn, where {v1, v2, ..., vn} is a finite subset of S and c1, c2, ..., cn are scalars.
b) We say that S is a spanning set or generating set for V if Span(S) = V.
Example 4.29. 1) Consider the space V = F^n and the canonical basis

     [1]        [0]              [0]
e1 = [0],  e2 = [1],  ...,  en = [0]
     [..]       [..]             [..]
     [0]        [0]              [1]

Then e1, ..., en is a spanning set for F^n, since any vector X with coordinates x1, x2, ..., xn can be written X = x1 e1 + x2 e2 + ... + xn en.
2) Similarly, consider the space V = M_{m,n}(F) of m × n matrices with entries in F. If E_{ij} is the matrix in V having the (i, j)-entry equal to 1 and all other entries 0, then the family (E_{ij})_{1≤i≤m, 1≤j≤n} is a spanning family for V, since any matrix A = [a_{ij}] can be written

A = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij} E_{ij}.
3) In the space R_n[X] of polynomials with real coefficients and degree bounded by n, the family 1, X, ..., X^n is spanning.

Similarly, one can prove (or deduce from the previous proposition) that for an arbitrary subset S of V, the set Span(S) is the smallest vector subspace of V which contains S. Note the very useful fact that

if S1 ⊆ S2 then Span(S1) ⊆ Span(S2).    (4.2)

Indeed, Span(S2) is a vector subspace containing S2, thus also S1, hence it contains Span(S1). Alternatively, this follows from the fact that any linear combination of finitely many elements of S1 is also a linear combination of finitely many elements of S2. It follows from relation (4.2) that any subset of V containing a spanning set for V is a spanning set for V.
Row-reduction is also very useful in understanding Span(v1, ..., vk) when v1, ..., vk ∈ F^n. Indeed, consider the k × n matrix A whose rows are the coordinates of the vectors v1, ..., vk in the canonical basis of F^n. Performing elementary operations on the rows of A does not affect the span of the set of its rows, hence Span(v1, ..., vk) is precisely the span of the rows of A_ref, where we recall that A_ref is the reduced row-echelon form of A (of course, it suffices to consider only the nonzero rows of A_ref). This gives in practice a quite manageable form of Span(v1, ..., vk).
Example 4.30. Consider the vectors v1 = (1, 2, 3, 4), v2 = (3, 1, 2, 1) and v3 = (1, 2, 1, 2) in R^4. We would like to obtain a simple description of V = Span(v1, v2, v3).

Consider the matrix

    [1 2 3 4]
A = [3 1 2 1]
    [1 2 1 2]

whose first row is given by the coordinates 1, 2, 3, 4 of v1 with respect to the canonical basis of R^4, and similarly for the second and third rows (replacing v1 with v2 and v3, respectively). Row-reduction yields

        [1 0 0 -3/5]
A_ref = [0 1 0  4/5]
        [0 0 1   1 ]

Thus

V = Span((1, 0, 0, -3/5), (0, 1, 0, 4/5), (0, 0, 1, 1)),

and this is the same as the set of vectors

w = (a, b, c, -(3/5)a + (4/5)b + c)

with a, b, c ∈ R.
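The row-reduction in this example can be reproduced directly (an illustration of ours, again assuming sympy); the nonzero rows of the reduced row-echelon form span the same subspace as the original rows:

import sympy as sp

A = sp.Matrix([[1, 2, 3, 4], [3, 1, 2, 1], [1, 2, 1, 2]])
A_ref, pivots = A.rref()
print(A_ref)
# Matrix([[1, 0, 0, -3/5], [0, 1, 0, 4/5], [0, 0, 1, 1]])
# Its rows span the same subspace V as v1, v2, v3.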
4.3.1 Problems for Practice
1. Find at least three different ways to express the matrix

   B = [2 1]
       [2 1]

   as a linear combination of the matrices

   A1 = [1 1]   A2 = [1 -1]   A3 = [1 0]
        [1 1]        [1 -1]        [1 0]
2. Show that the vector (1, 1, 1) cannot be expressed as a linear combination of

   a1 = (1, -1, 0),   a2 = (1, 0, -1)   and   a3 = (0, 1, -1).
3. Let W be the subset of R^n consisting of those vectors whose sum of coordinates equals 0. Let Z be the span of (1, 1, ..., 1) in R^n. Prove or disprove that W ⊕ Z = R^n.
4. Let P be the span of (1, 1, 1) and (1, -1, 1) in R^3, and let D be the span of (0, 1, -1). Is it true that P ⊕ D = R^3?
5. One of the vectors b1 = (3, 7, 6) and b2 = (0, 2, 4) is in the plane spanned by the vectors v1 = (1, 0, 1) and v2 = (1, 7, 4). Determine which one and write it as a linear combination of the vectors v1 and v2. Also, prove that the other vector is not in the plane spanned by v1 and v2.
6. Let V be the vector space of real-valued maps on R and let fn (respectively gn) be the map sending x to cos(nx) (respectively cos^n(x)). Prove or disprove that

   Span({fn | n ≥ 0}) = Span({gn | n ≥ 0}).
4.4 Linear Independence
Consider some vectors v1, v2, ..., vn in a vector space V over F, and a vector v in Span(v1, v2, ..., vn). By definition, there are scalars c1, c2, ..., cn such that

v = c1 v1 + c2 v2 + ... + cn vn.

There is nothing in the definition of the span that requires the scalars c1, c2, ..., cn in relation (4.1) to be unique.

Problem 4.31. Let v1, v2, v3 be three vectors in R^n such that 3v1 + v2 + v3 = 0 and let v = v1 + v2 - 2v3. Find infinitely many different ways to write v as a linear combination of v1, v2, v3.
Solution. Let α be an arbitrary real number. Re-scaling both sides of the equality 3v1 + v2 + v3 = 0 by α and adding the corresponding relation to the equality v = v1 + v2 - 2v3 yields

v = (3α + 1)v1 + (α + 1)v2 + (α - 2)v3.

Thus each value of α provides a different way to write v as a linear combination of v1, v2, v3.
Suppose now that a vector v can be written as v = a1 v1 + a2 v2 + ... + an vn. If b1, b2, ..., bn are scalars such that we also have v = b1 v1 + b2 v2 + ... + bn vn, then subtracting the two relations we obtain

0 = (a1 - b1)v1 + (a2 - b2)v2 + ... + (an - bn)vn.

Thus we would be able to conclude that a1, a2, ..., an are unique if the equation (with z1, ..., zn ∈ F)

z1 v1 + z2 v2 + ... + zn vn = 0

forced z1 = ... = zn = 0. As we said above, this is not always the case: take for instance n = 1 and v1 = 0; then a1 v1 = 0 for any choice of the scalar a1. On the other hand, vectors v1, ..., vn having the uniqueness property play a fundamental role in linear algebra and they also deserve a formal definition:
Definition 4.32. a) Vectors v1, v2, ..., vn in some vector space V are linearly dependent if there is a relation

c1 v1 + c2 v2 + ... + cn vn = 0

in which at least one of the scalars c1, c2, ..., cn is nonzero.

b) Vectors v1, v2, ..., vn in the vector space V are linearly independent if whenever we have scalars a1, a2, ..., an with a1 v1 + a2 v2 + ... + an vn = 0, then a1 = a2 = ... = an = 0.
Example 4.33. In all situations considered in Example 4.29, the corresponding generating family is also linearly independent.
Before going on to more abstract things, let us consider the following very concrete problem: given some vectors v1, ..., vk in F^n (take F = R for simplicity), decide whether they are linearly independent. We claim that this problem can be solved algorithmically in a fairly simple way. Indeed, we need to know whether we can find x1, ..., xk ∈ F, not all equal to 0, such that

x1 v1 + ... + xk vk = 0.

Let A be the n × k matrix whose columns are given by the coordinates of v1, ..., vk with respect to the canonical basis of F^n. Then the previous relation is equivalent to AX = 0, where X is the column vector with coordinates x1, ..., xk. Thus v1, ..., vk are linearly dependent if and only if the homogeneous linear system AX = 0 has a nontrivial solution. We know that this problem can be solved algorithmically, via the row-reduction algorithm: let A_ref be the reduced row-echelon form of A. If there is a pivot in every column of A_ref, then v1, ..., vk are linearly independent; otherwise they are not. Thus the original problem can also be solved algorithmically. Also, note that since every homogeneous linear system with more variables than equations has a nontrivial solution, we deduce that if we have more than n vectors in F^n, then they are never linearly independent! Thus sometimes we can solve the original problem with absolutely no effort, simply by counting the number of vectors we are given!
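Here is a small sketch of this test (an addition of ours, assuming sympy): stack the vectors as the columns of A and check whether every column carries a pivot, i.e. whether the rank equals the number of vectors.

import sympy as sp

def linearly_independent(vectors):
    # Columns of A are the given vectors; independence <=> a pivot in every column.
    A = sp.Matrix.hstack(*[sp.Matrix(v) for v in vectors])
    return A.rank() == len(vectors)

# The four vectors of Problem 4.34 below are dependent (2v1 - 2v2 - v3 + v4 = 0).
print(linearly_independent([(1, 2, 3, 4, 5), (2, 3, 4, 5, 1),
                            (1, 3, 5, 7, 9), (3, 5, 7, 9, 1)]))  # False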
Problem 4.34. Consider the vectors v1 = (1, 2, 3, 4, 5), v2 = (2, 3, 4, 5, 1), v3 = (1, 3, 5, 7, 9), v4 = (3, 5, 7, 9, 1) in R^5. Are these vectors linearly independent? If the answer is negative, give a nontrivial linear dependency relation between these vectors.

Solution. We consider the matrix

    [1 2 1 3]
    [2 3 3 5]
A = [3 4 5 7]
    [4 5 7 9]
    [5 1 9 1]
Row-reduction yields

        [1 0 0 -2]
        [0 1 0  2]
A_ref = [0 0 1  1]
        [0 0 0  0]
        [0 0 0  0]

Since there is no pivot in the last column, the vectors v1, v2, v3, v4 are linearly dependent.

To find a nontrivial linear dependency relation, we solve the system AX = 0, which is equivalent to the system A_ref X = 0. This system is further equivalent to

x1 = 2x4,   x2 = -2x4,   x3 = -x4.

Taking x4 = 1 (we can take any nonzero value we like), we obtain the dependency relation

2v1 - 2v2 - v3 + v4 = 0.
Problem 4.35. Show that the four vectors

v1 = (2, 1, 3, 1),   v2 = (-1, 0, 1, 2),   v3 = (3, 2, 7, 4),   v4 = (1, 2, 0, 1)

are linearly dependent, and find three of them that are linearly independent.

Solution. Row-reduction yields

[2 -1 3 1]    [1 0 2 0]
[1  0 2 2] →  [0 1 1 0]
[3  1 7 0]    [0 0 0 1]
[1  2 4 1]    [0 0 0 0]

Thus the four vectors are linearly dependent. Eliminating the vector v3 (the one that does not have a pivot in its column) yields the linearly independent set of vectors {v1, v2, v4}.
One may argue that the above definition is a little bit restrictive in the sense that it only deals with finite families of vectors. If we had an infinite family (vi)_{i∈I} of vectors of V, we would not be able to give a meaning to the infinite sum Σ_{i∈I} ci vi for an arbitrary choice of the scalars ci. However, if all but finitely many of the scalars ci were 0, then the previous sum would be a finite sum and would thus make sense. So one can extend the previous definition by saying that the family (vi)_{i∈I} is linearly dependent if one can find scalars (ci)_{i∈I} such that all but finitely many are 0, not all of them are 0, and Σ_{i∈I} ci vi = 0. Equivalently, and perhaps easier to understand, an arbitrary family is linearly dependent if there is a finite subfamily which is linearly dependent. A family of vectors is linearly independent if every finite subfamily is linearly independent. Thus, a (possibly infinite) set L is linearly independent if whenever we have distinct elements l1, ..., ln ∈ L and scalars a1, a2, ..., an with a1 l1 + a2 l2 + ... + an ln = 0, then a1 = a2 = ... = an = 0.
Remark 4.36. We note the following simple but extremely useful facts:

a) A subfamily of a linearly independent family is linearly independent. Indeed, let (vi)_{i∈I} be a linearly independent family and let J be a subset of I. Assume that (vi)_{i∈J} is linearly dependent; thus (by definition) we can find a finite linearly dependent subfamily v_{i1}, ..., v_{in} with i1, ..., in ∈ J. But i1, ..., in ∈ I, thus v_{i1}, ..., v_{in} is a finite linearly dependent subfamily of the linearly independent family (vi)_{i∈I}, a contradiction.

b) If two vectors in a family of vectors are equal, then the family is automatically linearly dependent. Indeed, say a vector v appears at least twice in the family (vi)_{i∈I}. If the family were linearly independent, then by part a) the subfamily v, v would be linearly independent. But this is absurd, since an obvious nontrivial linear dependency relation is 1·v + (-1)·v = 0.
Problem 4.37. Let V be the vector space of all real-valued maps on R. Prove that the maps x ↦ |x - 1|, x ↦ |x - 2|, ..., x ↦ |x - 10| are linearly independent.

Solution. Let fi(x) = |x - i| for 1 ≤ i ≤ 10 and suppose that

a1 f1 + a2 f2 + ... + a10 f10 = 0

for some real numbers a1, ..., a10. Suppose that some ai is nonzero. Dividing by ai, we obtain that fi is a linear combination of f1, ..., f_{i-1}, f_{i+1}, ..., f10. But f1, ..., f_{i-1}, f_{i+1}, ..., f10 are all differentiable at i, hence fi is also differentiable at i. This is obviously wrong, hence ai = 0 for all 1 ≤ i ≤ 10, and the result follows.
One can relate the notion of span and that of linear dependence, as the following proposition shows. It essentially says that a set v1, v2, ..., vn is linearly dependent if and only if one of the vectors v1, ..., vn is a linear combination of the other vectors. Note that we used the word set and not family, that is, in the statement below we assume that v1, ..., vn are pairwise distinct (as we observed at the end of the previous paragraph, if two vectors among v1, ..., vn are equal, then the family v1, ..., vn is automatically linearly dependent).
Proposition 4.38. Let S be a set of vectors in some vector space V. Then S is linearly dependent if and only if there is v ∈ S such that v ∈ Span(S \ {v}).

Proof. We deal separately with each implication. First, suppose that S is linearly dependent. By definition, this means that we can find finitely many vectors v1, v2, ..., vn ∈ S and some scalars a1, a2, ..., an, not all 0, such that

a1 v1 + a2 v2 + ... + an vn = 0.

Note that v1, ..., vn are pairwise distinct, since the elements of S are assumed to be pairwise distinct.
Since not all scalars are 0, there is i ∈ {1, 2, ..., n} such that ai ≠ 0. Dividing the previous equality by ai, we obtain

(a1/ai)v1 + ... + (a_{i-1}/ai)v_{i-1} + vi + (a_{i+1}/ai)v_{i+1} + ... + (an/ai)vn = 0,

hence

vi = -(a1/ai)v1 - ... - (a_{i-1}/ai)v_{i-1} - (a_{i+1}/ai)v_{i+1} - ... - (an/ai)vn.

We deduce that vi belongs to the span of v1, ..., v_{i-1}, v_{i+1}, ..., vn, which is contained in the span of S \ {vi}, as {v1, ..., v_{i-1}, v_{i+1}, ..., vn} ⊆ S \ {vi}. This proves one implication.
Next, suppose that there is v ∈ S such that v ∈ Span(S \ {v}). That means that we can find v1, v2, ..., vn ∈ S \ {v} and scalars a1, a2, ..., an such that

v = a1 v1 + a2 v2 + ... + an vn.

But then

1·v + (-a1)v1 + ... + (-an)vn = 0

and the vectors v, v1, ..., vn are linearly dependent. Since v ∉ {v1, ..., vn}, it follows that S has a finite subset which is linearly dependent, and so S is linearly dependent. The result follows.
The following rather technical and subtle result (the Steinitz exchange lemma) is the fundamental theorem in the basic theory of vector spaces. We will deduce from it a lot of very nontrivial results, which will help in building the theory of finite-dimensional vector spaces.

Theorem 4.39 (Exchange lemma). Let L = {v1, v2, ..., vn} and S = {w1, w2, ..., wm} be two finite subsets of a vector space V, with L linearly independent and S a spanning set. Then n ≤ m and we can find vectors s1, ..., s_{m-n} in S such that L ∪ {s1, s2, ..., s_{m-n}} is a spanning set.
Proof. The result will be proved by induction on n. There is nothing to be proved when n = 0, so assume that the result holds for n and let us prove it for n + 1. Since v1, v2, ..., v_{n+1} are linearly independent, so are v1, v2, ..., vn by Remark 4.36. Thus by the inductive hypothesis we already have n ≤ m and the existence of vectors s1, ..., s_{m-n} such that {v1, ..., vn, s1, ..., s_{m-n}} is a spanning set. In particular, we can express v_{n+1} as a linear combination

v_{n+1} = a1 v1 + ... + an vn + b1 s1 + ... + b_{m-n} s_{m-n}.
If m = n, then the previous relation can be written

v_{n+1} = a1 v1 + ... + an vn

and contradicts the hypothesis that v1, v2, ..., v_{n+1} are linearly independent. Thus m ≠ n and since n ≤ m, we must have n + 1 ≤ m. The same argument also proves that at least one of b1, b2, ..., b_{m-n} is nonzero. Permuting the vectors s1, ..., s_{m-n}, we may assume that b1 ≠ 0. Dividing the relation

v_{n+1} = a1 v1 + ... + an vn + b1 s1 + ... + b_{m-n} s_{m-n}

by b1 and rearranging terms yields

s1 = -(a1/b1)v1 - ... - (an/b1)vn + (1/b1)v_{n+1} - (b2/b1)s2 - ... - (b_{m-n}/b1)s_{m-n},

which shows that s1 ∈ Span(v1, ..., vn, v_{n+1}, s2, ..., s_{m-n}). Thus

V = Span(v1, ..., vn, s1, ..., s_{m-n}) ⊆ Span(v1, ..., vn, v_{n+1}, s2, ..., s_{m-n})

and L ∪ {s2, ..., s_{m-n}} is a spanning set, which is exactly what we needed.
Remark 4.40. One can slightly refine the previous theorem by no longer assuming that L is finite (but still assuming that S is finite). Indeed, any finite subset of L is still linearly independent. Hence Theorem 4.39 shows that any finite subset of L has size at most m, and hence L is finite and has size n ≤ m.
4.4.1 Problems for Practice
1. Are the vectors

   v1 = (1, 2, 1),   v2 = (3, 4, 5),   v3 = (0, 2, 3)

   linearly independent in R^3?
2. Consider the vectors

   v1 = (1, 2, 1, 3),   v2 = (1, -1, 1, -1),   v3 = (3, 0, 3, 1)

   in R^4.

   a) Prove that v1, v2, v3 are not linearly independent.
   b) Express one of these vectors as a linear combination of the two other vectors.
3. Let $V$ be the vector space of polynomials with real coefficients whose degree does not exceed 3. Are the vectors
$$1 + 3X + X^2, \quad X^3 - 3X + 1, \quad 3X^3 - X^2 - X - 1$$
linearly independent in $V$?
4. Let $V$ be the space of all real-valued maps on $\mathbb{R}$.
a) If $a_1 < \dots < a_n$ are real numbers, compute
$$\lim_{x \to \infty} \sum_{i=1}^n e^{(a_i - a_n)x}.$$
b) Prove that the family of maps $(x \mapsto e^{ax})_{a \in \mathbb{R}}$ is linearly independent in $V$.
5. Let $V$ be the space of all maps $\varphi : [0, \infty) \to \mathbb{R}$. For each $a \in (0, \infty)$ consider the map $f_a \in V$ defined by
$$f_a(x) = \frac{1}{x + a}.$$
a) Let $a_1 < \dots < a_n$ be positive real numbers and suppose that $\alpha_1, \dots, \alpha_n$ are real numbers such that
$$\sum_{i=1}^n \alpha_i f_{a_i}(x) = 0$$
for all $x \ge 0$. Prove that for all real numbers $x$ we have
$$\sum_{i=1}^n \alpha_i \prod_{j \neq i} (x + a_j) = 0.$$
By making suitable choices of $x$, deduce that $\alpha_1 = \dots = \alpha_n = 0$.
b) Prove that the family $(f_a)_{a > 0}$ is linearly independent in $V$.
6. Consider $V = \mathbb{R}$, seen as a vector space over $F = \mathbb{Q}$.
a) Prove that $1, \sqrt{2}, \sqrt{3}$ is a linearly independent set in $V$. Hint: if $a, b, c$ are rational numbers such that $a + b\sqrt{2} + c\sqrt{3} = 0$, check that $a^2 + 2ab\sqrt{2} + 2b^2 = 3c^2$.
b) Prove that the set of numbers $\ln p$, where $p$ runs over the prime numbers, is linearly independent in $V$.
7. a) If $m, n$ are nonnegative integers, compute
$$\int_0^{2\pi} \cos(mx)\cos(nx)\,dx.$$
b) Deduce that the maps $x \mapsto \cos(nx)$, with $n$ a nonnegative integer, form a linearly independent set in the space of all real-valued maps on $\mathbb{R}$.
8. Let $v_1, v_2, \dots, v_n$ be linearly independent vectors in $\mathbb{R}^n$. Is it always the case that $v_1, v_1 + v_2, \dots, v_1 + v_2 + \dots + v_n$ are linearly independent?
4.5 Dimension Theory
We are now ready to develop the dimension theory of vector spaces. For general vector spaces this is rather subtle, but we will stick to finite dimensional vector spaces, for which the arguments are elementary consequences of the exchange lemma proved in the last section. We fix a field $F$; all vector spaces in this section will be over $F$.
Definition 4.41. A vector space $V$ is called finite dimensional if it has a finite spanning set.
Thus $V$ is finite dimensional if we can find a finite family of vectors $v_1, v_2, \dots, v_n \in V$ such that all vectors in $V$ are linear combinations of $v_1, v_2, \dots, v_n$. For instance, the spaces $F^n$, $M_{m,n}(F)$ and $\mathbb{R}_n[X]$ are finite dimensional, by Example 4.29. However, not all vector spaces are finite dimensional (actually most of them are not).
Problem 4.42. Prove that the vector space $V$ of all polynomials with real coefficients is not a finite dimensional $\mathbb{R}$-vector space.
Proof. Suppose that $V$ has a finite spanning set, so there are polynomials $P_1, \dots, P_n \in V$ such that $V = \operatorname{Span}(P_1, \dots, P_n)$. Let $d$ be the maximum of $\deg(P_1), \dots, \deg(P_n)$. Since all $P_i$ have degree at most $d$, so does any linear combination of $P_1, \dots, P_n$. It follows that any vector in $V$ has degree at most $d$, which is certainly absurd since $X^{d+1}$ has degree greater than $d$.
We would like to define the dimension of a finite dimensional vector space. This should be an invariant of the vector space and should correspond to the geometric picture (you might prefer to take $F = \mathbb{R}$ for better geometric intuition): a line (namely $F$) should have dimension 1, a plane (i.e., $F^2$) should have dimension 2, and in general $F^n$ should have dimension $n$. Before stating and proving the main result, let us introduce a crucial definition and practice some problems to get a better feeling for it.
Definition 4.43. A basis of a vector space V is a subset of V which is linearly
independent and spanning.
For instance, the generating families appearing in Example 4.29 are all bases of the corresponding vector spaces (this explains why we called them canonical bases in previous chapters!).
Problem 4.44. Given the matrix
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} \in M_2(\mathbb{R}),$$
find a basis of the subspace $U$ of $M_2(\mathbb{R})$ defined by
$$U = \{X \in M_2(\mathbb{R}) \mid XA = AX\}.$$
Solution. Consider a square matrix $X = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}$. Then $X \in U$ if and only if $XA = AX$, which can be rewritten as
$$\begin{pmatrix} 2a_1 & 3a_2 \\ 2a_3 & 3a_4 \end{pmatrix} = \begin{pmatrix} 2a_1 & 2a_2 \\ 3a_3 & 3a_4 \end{pmatrix}.$$
This equality is equivalent to $a_2 = a_3 = 0$. Thus
$$U = \left\{ \begin{pmatrix} a_1 & 0 \\ 0 & a_4 \end{pmatrix} \,\middle|\, a_1, a_4 \in \mathbb{R} \right\},$$
and so a basis of $U$ is given by the matrices $X_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $X_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ (it is not difficult to check that $X_1$ and $X_2$ are linearly independent).
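The solution can be double-checked mechanically. A small NumPy sketch (ours, not the book's; the names X1, X2 mirror the solution above):

```python
import numpy as np

A = np.array([[2, 0], [0, 3]])
X1 = np.array([[1, 0], [0, 0]])
X2 = np.array([[0, 0], [0, 1]])

# Both candidate basis matrices commute with A ...
for X in (X1, X2):
    assert np.array_equal(X @ A, A @ X)

# ... and they are linearly independent: flatten each matrix to a
# vector and check that the resulting 4x2 matrix has rank 2.
M = np.column_stack([X1.ravel(), X2.ravel()])
assert np.linalg.matrix_rank(M) == 2
print("X1, X2 form a linearly independent commuting family")
```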
Problem 4.45. Determine a basis of the subspace $U$ of $\mathbb{R}^4$, where
$$U = \{(a, b, c, d) \in \mathbb{R}^4 \mid a + b = 0,\ c = 2d\}.$$
Solution. Since $b = -a$ and $c = 2d$, we can write
$$U = \{(a, -a, 2d, d) \mid a, d \in \mathbb{R}\} = \{av_1 + dv_2 \mid a, d \in \mathbb{R}\},$$
where $v_1 = (1, -1, 0, 0)$ and $v_2 = (0, 0, 2, 1)$. Thus $v_1, v_2$ form a generating family for $U$. Moreover, they are linearly independent, since the relation $av_1 + dv_2 = 0$ is equivalent to $(a, -a, 2d, d) = (0, 0, 0, 0)$ and forces $a = d = 0$. We conclude that a basis of $U$ is given by $v_1$ and $v_2$.
Problem 4.46. Consider the subspaces $U, V$ of $\mathbb{R}^4$ defined by
$$U = \{(x, y, z, w) \in \mathbb{R}^4 \mid y + z + w = 0\}$$
and
$$V = \{(x, y, z, w) \in \mathbb{R}^4 \mid x = y,\ z = 2w\}.$$
Find a basis for each of the subspaces $U$, $V$ and $U \cap V$ of $\mathbb{R}^4$.
Solution. Expressing $w$ in terms of $y$ and $z$, we obtain
$$U = \{(x, y, z, -y - z) \mid x, y, z \in \mathbb{R}\} = \{xu_1 + yu_2 + zu_3 \mid x, y, z \in \mathbb{R}\},$$
where $u_1 = (1, 0, 0, 0)$, $u_2 = (0, 1, 0, -1)$ and $u_3 = (0, 0, 1, -1)$. Let us see whether $u_1, u_2, u_3$ are linearly independent. The equality $xu_1 + yu_2 + zu_3 = 0$ is equivalent to $(x, y, z, -y - z) = (0, 0, 0, 0)$ and forces $x = y = z = 0$. Thus $u_1, u_2, u_3$ are linearly independent and therefore they form a basis of $U$.
Let us deal now with $V$. Clearly
$$V = \{(y, y, 2w, w) \mid y, w \in \mathbb{R}\} = \{yv_1 + wv_2 \mid y, w \in \mathbb{R}\},$$
where $v_1 = (1, 1, 0, 0)$ and $v_2 = (0, 0, 2, 1)$. As above, $v_1$ and $v_2$ are linearly independent, since the relation $yv_1 + wv_2 = 0$ is equivalent to $(y, y, 2w, w) = (0, 0, 0, 0)$ and forces $y = w = 0$. Thus $v_1, v_2$ form a basis of $V$.
Finally, a vector $(x, y, z, w) \in \mathbb{R}^4$ belongs to $U \cap V$ if and only if
$$x = y, \quad z = 2w, \quad y + z + w = 0.$$
This is equivalent to $x = -3w$, $z = 2w$ and $y = -3w$, or
$$(x, y, z, w) = (-3w, -3w, 2w, w) = w(-3, -3, 2, 1).$$
Thus $(-3, -3, 2, 1)$ forms a basis of $U \cap V$.
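Each of these subspaces is the null space of the matrix of its defining equations, so the whole computation can be checked mechanically. A SymPy sketch (ours, not the book's method; the returned bases may differ from the ones above by an invertible change of variables, but span the same spaces):

```python
from sympy import Matrix

# U: y + z + w = 0  -> null space of a 1x4 matrix in variables (x, y, z, w)
U_eqs = Matrix([[0, 1, 1, 1]])
# V: x - y = 0 and z - 2w = 0 -> null space of a 2x4 matrix
V_eqs = Matrix([[1, -1, 0, 0], [0, 0, 1, -2]])

print(U_eqs.nullspace())   # three basis vectors of U
print(V_eqs.nullspace())   # two basis vectors of V
# U ∩ V: impose all three equations at once
print(Matrix.vstack(U_eqs, V_eqs).nullspace())  # one vector, proportional to (-3,-3,2,1)
```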
Problem 4.47. Consider the space $V$ of functions $f : \mathbb{R} \to \mathbb{R}$ spanned by the functions in $B = \{1,\ x \mapsto \sin(2x),\ x \mapsto \cos(2x)\}$.
a) Prove that $B$ forms a basis of $V$.
b) Prove that $x \mapsto \sin^2(x)$ is a function in $V$ and write it as a linear combination of elements of $B$.
Solution. a) We need to prove that the vectors in $B$ are linearly independent. In other words, we need to prove that if $a, b, c$ are real numbers such that
$$a + b\sin(2x) + c\cos(2x) = 0$$
for all real numbers $x$, then $a = b = c = 0$. Taking $x = 0$ we obtain $a + c = 0$, then taking $x = \pi/2$ yields $a - c = 0$. Thus $a = c = 0$. Finally, taking $x = \pi/4$ yields $b = 0$.
b) For all $x \in \mathbb{R}$ we have
$$\cos(2x) = 2\cos^2(x) - 1 = 2(1 - \sin^2(x)) - 1 = 1 - 2\sin^2(x),$$
thus
$$\sin^2(x) = \frac{1 - \cos(2x)}{2}.$$
We deduce that $x \mapsto \sin^2(x)$ is in $V$ and the previous formula expresses it as a linear combination
$$\sin^2(x) = \frac{1}{2} \cdot 1 + 0 \cdot \sin(2x) - \frac{1}{2}\cos(2x).$$
Let us prove now the first fundamental result regarding dimension theory of
vector spaces.
Theorem 4.48. Let V be a finite dimensional vector space. Then
a) V contains a basis with finitely many elements.
b) Any two bases of V have the same number of elements (in particular any basis
has finitely many elements).
Proof. a) Among all finite spanning sets of $V$ (we know that there is at least one such set), consider a set $B$ with the smallest possible number of elements. We will prove that $B$ is a basis. By our choice, $B$ is a spanning set, so all we need to prove is that $B$ is linearly independent. If this is not the case, then Proposition 4.38 yields the existence of a vector $v \in B$ such that $v \in \operatorname{Span}(B \setminus \{v\})$. It follows that $B \setminus \{v\}$ is also a spanning set. This contradicts the minimality of $B$ and shows that $B$ is indeed linearly independent.
b) Let $B$ be a basis with finitely many elements, say $n$. Let $B'$ be another basis of $V$. Then $B'$ is a linearly independent set and $B$ is a spanning set with $n$ elements, thus by Remark 4.40 $B'$ is finite, with at most $n$ elements. This shows that any basis has at most $n$ elements. But now we can play the following game: say $B'$ has $d$ elements. We saw that $d \le n$. We exchange $B$ and $B'$ in the previous argument to get that any basis has at most $d$ elements, thus $n \le d$. It follows that $n = d$ and so all bases have the same number of elements.
The previous theorem allows us to make the following:
Definition 4.49. Let V be a finite dimensional vector space. The dimension dim V
of V is the number of elements of any basis of V .
Example 4.50. a) Consider the vector space $F^n$. Its canonical basis $e_1, \dots, e_n$ is a basis of $F^n$ with $n$ elements, thus $\dim F^n = n$.
b) Consider the space $F[X]_n$ of polynomials with coefficients in $F$ whose degree does not exceed $n$. A basis of $F[X]_n$ is given by $1, X, \dots, X^n$, thus
$$\dim F[X]_n = n + 1.$$
c) Consider the vector space $M_{m,n}(F)$ of $m \times n$ matrices with entries in $F$. A basis of this vector space is given by the elementary matrices $E_{ij}$ with $1 \le i \le m$ and $1 \le j \le n$ (the canonical basis of $M_{m,n}(F)$). It follows that
$$\dim M_{m,n}(F) = mn.$$
Problem 4.51. Find a basis as well as the dimension of the subspace
$$V = \{(a, 2a) \mid a \in \mathbb{R}\} \subset \mathbb{R}^2.$$
Solution. By definition $V$ is the linear span of the vector $(1, 2)$, since $(a, 2a) = a(1, 2)$. Since $(1, 2) \neq (0, 0)$, we deduce that a basis of $V$ is given by $(1, 2)$ and $\dim V = 1$.
The second fundamental theorem concerning dimension theory is the following:
Theorem 4.52. Let $V$ be a vector space of dimension $n < \infty$. Then
a) Any linearly independent set in $V$ has at most $n$ elements.
b) Any spanning set in $V$ has at least $n$ elements.
c) If $S$ is a subset of $V$ with $n$ elements, then the following assertions are equivalent:
i) $S$ is linearly independent;
ii) $S$ is a spanning set;
iii) $S$ is a basis of $V$.
Proof. Fix a basis $B$ of $V$. By definition, $B$ has $n$ elements.
a) Since $B$ is a spanning set with $n$ elements, the result follows directly from Remark 4.40.
b) Let $S$ be a spanning set and suppose that $S$ has $d < n$ elements. Since $B$ is linearly independent, Theorem 4.39 yields $n \le d$, a contradiction.
c) Clearly iii) implies i) and ii). It suffices therefore to prove that each of i) and ii) implies iii). Suppose that $S$ is linearly independent. By Theorem 4.39 we can add $n - n = 0$ vectors to $S$ so that the new set is a spanning set. Clearly the new set is nothing more than $S$, so $S$ is a spanning set and thus a basis (since by assumption $S$ is linearly independent).
Now suppose that $S$ is a spanning set and that $S$ is not linearly independent. By Proposition 4.38 we can find $v \in S$ such that $v \in \operatorname{Span}(S \setminus \{v\})$. Then $S \setminus \{v\}$ is a spanning set with $n - 1$ elements, contradicting part b). Thus $S$ is linearly independent and a basis of $V$.
The following problems are all applications of the previous theorem.
Problem 4.53. Prove that the set $U$, where
$$U = \{(1, 1, 1), (1, 2, 1), (2, 1, 1)\},$$
is a basis of $\mathbb{R}^3$.
Solution. Let $v_1 = (1, 1, 1)$, $v_2 = (1, 2, 1)$ and $v_3 = (2, 1, 1)$. Since $\dim \mathbb{R}^3 = 3$, it suffices to prove that $v_1, v_2, v_3$ are linearly independent. Suppose that $x, y, z \in \mathbb{R}$ satisfy
$$xv_1 + yv_2 + zv_3 = 0.$$
This can be written as
$$\begin{cases} x + y + 2z = 0 \\ x + 2y + z = 0 \\ x + y + z = 0 \end{cases}$$
Combining the first and the last equation yields $z = 0$, and similarly, combining the second and the last equation yields $y = 0$. Coming back to the first equation, we also find $x = 0$, and the result follows.
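For $n$ vectors in $\mathbb{R}^n$, linear independence can also be tested with a determinant. A one-line check of this problem (our sketch, with SymPy):

```python
from sympy import Matrix

# rows v1, v2, v3, then transpose so the vectors become columns
M = Matrix([[1, 1, 1], [1, 2, 1], [2, 1, 1]]).T
print(M.det())   # -1, nonzero, so v1, v2, v3 form a basis of R^3
```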
Problem 4.54. Determine a basis of $\mathbb{R}^3$ that includes the vector $v = (2, 1, 1)$.
Solution. Let $e_1, e_2, e_3$ be the canonical basis of $\mathbb{R}^3$. Then $v = 2e_1 + e_2 + e_3$. It follows that $e_3$ belongs to the span of $v, e_1, e_2$, thus the span of $v, e_1, e_2$ is $\mathbb{R}^3$. Thus $v, e_1, e_2$ form a basis of $\mathbb{R}^3$, since $\dim \mathbb{R}^3 = 3$ (of course, one can also check directly that $v, e_1, e_2$ are linearly independent).
Problem 4.55. Let $\mathbb{R}_n[X]$ be the vector space of polynomials with real coefficients whose degree does not exceed $n$. Prove that if $P_0, P_1, \dots, P_n \in \mathbb{R}_n[X]$ satisfy $\deg P_k = k$ for $0 \le k \le n$, then $P_0, P_1, \dots, P_n$ form a basis of $\mathbb{R}_n[X]$.
Solution. Since $\dim \mathbb{R}_n[X] = n + 1$, it suffices to prove that $P_0, P_1, \dots, P_n$ are linearly independent. Suppose that $a_0, a_1, \dots, a_n \in \mathbb{R}$ are not all zero and
$$a_0P_0 + a_1P_1 + \dots + a_nP_n = 0.$$
Let $j$ be the largest index for which $a_j \neq 0$. Then by hypothesis $a_0P_0 + a_1P_1 + \dots + a_jP_j$ has degree exactly $j$, which contradicts the fact that this polynomial is the zero polynomial (since $a_{j+1} = \dots = a_n = 0$ and $a_0P_0 + \dots + a_nP_n = 0$) and that the zero polynomial has degree $-\infty$.
Problem 4.56. Let $P \in \mathbb{R}[X]$ be a polynomial. Prove that the following assertions are equivalent:
a) $P(n)$ is an integer for all integers $n$.
b) There are integers $n$ and $a_0, \dots, a_n$ such that
$$P(X) = \sum_{k=0}^n a_k \frac{X(X-1)\cdots(X-k+1)}{k!},$$
with the convention that the first term in the sum equals $a_0$.
Solution. Let $P_k = \frac{X(X-1)\cdots(X-k+1)}{k!}$, with $P_0 = 1$. It is not difficult to see that $P_k(\mathbb{Z}) \subset \mathbb{Z}$ (as the values of $P_k$ at all integers are, up to a sign, binomial coefficients). This makes it clear that b) implies a).
Suppose that a) holds and let $d = \deg P$. Since $\deg P_k = k$ for $0 \le k \le d$, Problem 4.55 yields real numbers $a_0, a_1, \dots, a_d$ such that $P = a_0P_0 + a_1P_1 + \dots + a_dP_d$. We need to prove that $a_0, \dots, a_d$ are actually integers. But by hypothesis the numbers
$$P(m) = a_0 + \binom{m}{1}a_1 + \binom{m}{2}a_2 + \dots + \binom{m}{m-1}a_{m-1} + a_m$$
are integers, for $m = 0, \dots, d$. Using the relation
$$a_m = P(m) - \left(a_0 + \binom{m}{1}a_1 + \binom{m}{2}a_2 + \dots + \binom{m}{m-1}a_{m-1}\right)$$
it is easy to prove by induction on $j$ that $a_0, \dots, a_j$ are integers for $0 \le j \le d$. Thus $a_0, \dots, a_d$ are all integers and the problem is solved.
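The coefficients can also be computed mechanically: since $\Delta P_k = P_{k-1}$ for the forward difference $(\Delta P)(X) = P(X+1) - P(X)$, one gets $a_k = (\Delta^k P)(0)$. A small SymPy sketch (our illustration, not part of the text) computes them this way:

```python
from sympy import symbols, expand

X = symbols('X')

def binomial_coeffs(P, d):
    """Coefficients a_0..a_d of P in the basis P_k = binomial(X, k),
    computed as iterated forward differences of P evaluated at 0."""
    coeffs = []
    Q = P
    for _ in range(d + 1):
        coeffs.append(Q.subs(X, 0))
        Q = expand(Q.subs(X, X + 1) - Q)   # forward difference operator
    return coeffs

P = X**3 - X                      # integer-valued polynomial
print(binomial_coeffs(P, 3))      # [0, 0, 6, 6]: P = 6*P_2 + 6*P_3
```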
Before moving on to another fundamental theorem, let us stop and try to explain how to solve a few practical problems. First, consider some vectors $v_1, \dots, v_k$ in $\mathbb{R}^n$ and consider the problem of deciding whether they form a basis of $\mathbb{R}^n$. By the previous results, this is the case if and only if $k = n$ and $v_1, \dots, v_k$ are linearly independent. Letting $A$ be the matrix whose rows are the coordinates of $v_1, \dots, v_k$ with respect to the canonical basis, this is equivalent to saying that $k = n$ and $A_{\mathrm{ref}} = I_n$, where $A_{\mathrm{ref}}$ is the reduced row-echelon form of $A$. We see that we have an algorithmic solution for our problem.
Consider now the problem: given $v_1, \dots, v_k$ in $\mathbb{R}^n$, decide whether they span $\mathbb{R}^n$. To solve this problem, we again consider the matrix $A$ whose rows are given by the coordinates of the vectors $v_1, \dots, v_k$ with respect to the canonical basis of $\mathbb{R}^n$. We row-reduce $A$ and obtain its reduced echelon form $A_{\mathrm{ref}}$. Then $v_1, \dots, v_k$ span $\mathbb{R}^n$ if and only if the rows of $A_{\mathrm{ref}}$ span $\mathbb{R}^n$. This is the case if and only if $A_{\mathrm{ref}}$ has a pivot in every column.
Next, consider the following trickier problem: given some vectors $v_1, \dots, v_k$ in $\mathbb{R}^n$, find a subset of $\{v_1, \dots, v_k\}$ which forms a basis of $\mathbb{R}^n$. Of course, if $v_1, \dots, v_k$ do not span $\mathbb{R}^n$, then the problem has no solution (and we can test this using the procedure described in the previous paragraph). Assume now that $v_1, \dots, v_k$ span $\mathbb{R}^n$. Let $A$ be the matrix whose columns are given by the coordinates of $v_1, \dots, v_k$ in the canonical basis of $\mathbb{R}^n$. We leave it to the reader to convince himself that those vectors $v_i$ corresponding to columns of $A$ containing a pivot form a basis of $\mathbb{R}^n$; a sketch of this procedure in code follows.
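Here is the promised sketch of the basis-extraction procedure (our code, using SymPy's reduced row-echelon form; the helper name is our own):

```python
from sympy import Matrix

def basis_from_spanning(vectors):
    """Given vectors spanning R^n (as tuples), return the sublist whose
    columns contain the pivots of the rref: a basis of R^n."""
    A = Matrix([list(v) for v in vectors]).T   # vectors as columns
    _, pivot_cols = A.rref()
    if len(pivot_cols) < A.rows:
        raise ValueError("the vectors do not span R^n")
    return [vectors[j] for j in pivot_cols]

print(basis_from_spanning([(1, 0, 0), (1, 1, 0), (2, 2, 0), (0, 0, 1)]))
# [(1, 0, 0), (1, 1, 0), (0, 0, 1)]
```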
Example 4.57. Consider the vectors $v_1 = (1, 0, 1, 0)$, $v_2 = (0, 1, 1, 1)$, $v_3 = (2, 3, 12, 1)$, $v_4 = (1, 1, 1, 1)$, $v_5 = (1, 1, 0, 1)$. We would like to find a subset of these vectors which gives a basis of $\mathbb{R}^4$. Let us check first whether they span $\mathbb{R}^4$. For that, we consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 2 & 3 & 12 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}.$$
The row-reduction is
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
and it has pivots in every column, thus $v_1, \dots, v_5$ span $\mathbb{R}^4$.
Now, to solve the original problem, we consider the matrix
$$A' = \begin{pmatrix} 1 & 0 & 2 & 1 & 1 \\ 0 & 1 & 3 & 1 & 1 \\ 1 & 1 & 12 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix}$$
whose columns are the coordinates of $v_1, v_2, v_3, v_4, v_5$. Its row-reduction is
$$A'_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 & \frac{1}{3} \\ 0 & 1 & 0 & 0 & \frac{2}{3} \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & \frac{1}{3} \end{pmatrix}.$$
The columns containing pivots are the first four, so $v_1, v_2, v_3, v_4$ form a basis of $\mathbb{R}^4$.
Note that we could have read off whether $v_1, \dots, v_5$ span $\mathbb{R}^4$ directly from $A'$, without the need to introduce the matrix $A$. Indeed, it suffices to check that $A'$ has a pivot in every row, which is the case.
Problem 4.58. Let $S$ be the set
$$S = \left\{ \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 6 \\ 8 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \right\}.$$
a) Show that $S$ spans the space $\mathbb{R}^3$ and find a basis for $\mathbb{R}^3$ contained in $S$.
b) What are the coordinates of the vector $c = (1, -1, 1)$ with respect to the basis found in a)?
Solution. a) Consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 5 & 1 \\ 2 & -1 & 6 & 2 \\ 0 & 2 & 8 & 1 \end{pmatrix}.$$
Its row-reduction is
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 5 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Since $A$ has a pivot in each row, the columns of $A$ span $\mathbb{R}^3$, thus $S$ spans $\mathbb{R}^3$. Considering the pivot columns of $A$, we also deduce that a subset of $S$ that forms a basis of $\mathbb{R}^3$ consists of $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ -1 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$.
b) Since
$$\begin{pmatrix} 1 & 0 & 1 & \big| & 1 \\ 2 & -1 & 2 & \big| & -1 \\ 0 & 2 & 1 & \big| & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 0 & \big| & 6 \\ 0 & 1 & 0 & \big| & 3 \\ 0 & 0 & 1 & \big| & -5 \end{pmatrix},$$
the coordinates of $c$ with respect to this basis are given by the last column of the matrix above, namely $6, 3, -5$.
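Part b) is just a linear system; a short check of the computation with SymPy (our sketch, not the book's method):

```python
from sympy import Matrix

B = Matrix([[1, 0, 1], [2, -1, 2], [0, 2, 1]])  # basis vectors as columns
c = Matrix([1, -1, 1])
print(B.solve(c))   # Matrix([[6], [3], [-5]]): the coordinates of c
```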
Theorem 4.59. Let $V$ be a finite dimensional vector space and let $W$ be a subspace of $V$. Then
a) $W$ is finite dimensional and $\dim W \le \dim V$. Moreover, we have equality if and only if $W = V$.
b) Any basis of $W$ can be extended to a basis of $V$.
Proof. Let $n = \dim V$.
a) If $S$ is any linearly independent set in $W$, then $S$ is a linearly independent set in $V$ and so $S$ has at most $n$ elements by part a) of Theorem 4.52. Note that if we manage to prove that $W$ is finite dimensional, then the previous observation automatically implies that $\dim W \le n$ (as any basis of $W$ is a linearly independent set in $W$). Suppose that $W$ is not finite dimensional. Since $W$ is nonzero, we can choose a nonzero $w_1 \in W$. Since $\{w_1\}$ is not a spanning set for $W$, we can choose $w_2 \in W$ not in the span of $w_1$. Assuming that we have constructed $w_1, \dots, w_k$, simply choose any vector $w_{k+1}$ in $W$ but not in the span of $w_1, \dots, w_k$. Such a vector exists, since by assumption the finite set $\{w_1, \dots, w_k\}$ is not a spanning set for $W$. By construction, $w_1, \dots, w_k$ are linearly independent for all $k$. Thus $w_1, \dots, w_{n+1}$ is a linearly independent set with more than $n$ elements in $W$, which is absurd. Thus $W$ is finite dimensional and $\dim W \le n$.
We still have to prove that $\dim W = n$ implies $W = V$. Let $B$ be a basis of $W$. Then $B$ has $n$ elements and is a linearly independent set in $V$. By part c) of Theorem 4.52, $B$ is a spanning set for $V$, and since it is contained in $W$, we deduce that $W = V$.
b) Let $d = \dim W \le n$ and let $B$ be a basis of $W$. Let $B'$ be a basis of $V$. By Theorem 4.39 applied to the linearly independent set $B$ in $V$ and to the spanning set $B'$ in $V$, we can add $n - d$ elements to $B$ to make it a spanning set. This set has $n$ elements and is a spanning set, thus it is a basis of $V$ (part c) of Theorem 4.52) and contains $B$. This is exactly what we needed.
The following result is very handy when estimating the dimension of a sum of
subspaces of a given vector space.
Theorem 4.60 (Grassmann’s formula). If W1 ; W2 are subspaces of a finite dimensional vector space V , then
dim W1 C dim W2 D dim.W1 C W2 / C dim.W1 \ W2 /:
Proof. Let $m = \dim W_1$, $n = \dim W_2$ and $k = \dim(W_1 \cap W_2)$. Let $B = \{v_1, \dots, v_k\}$ be a basis of $W_1 \cap W_2$. Since $W_1 \cap W_2$ is a subspace of both $W_1$ and $W_2$, Theorem 4.59 yields bases $B_1, B_2$ of $W_1$ and $W_2$ which contain $B$. Say $B_1 = \{v_1, \dots, v_k, u_1, \dots, u_{m-k}\}$ and $B_2 = \{v_1, \dots, v_k, w_1, \dots, w_{n-k}\}$. We will prove that the family
$$S = \{v_1, \dots, v_k, u_1, \dots, u_{m-k}, w_1, \dots, w_{n-k}\}$$
is a basis of $W_1 + W_2$, and so
$$\dim(W_1 + W_2) = k + (m - k) + (n - k) = m + n - k,$$
as desired.
We start by proving that $S$ is a spanning set for $W_1 + W_2$. Let $x$ be any vector in $W_1 + W_2$. By definition we can write $x = x_1 + x_2$ with $x_1 \in W_1$ and $x_2 \in W_2$. Since $B_1$ and $B_2$ are spanning sets for $W_1$ and $W_2$, we can write
$$x_1 = a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k}$$
and
$$x_2 = c_1v_1 + \dots + c_kv_k + d_1w_1 + \dots + d_{n-k}w_{n-k}$$
for some scalars $a_i, b_j, c_l, d_r$. Then
$$x = (a_1 + c_1)v_1 + \dots + (a_k + c_k)v_k + b_1u_1 + \dots + b_{m-k}u_{m-k} + d_1w_1 + \dots + d_{n-k}w_{n-k}$$
is in the span of $S$. Since $x$ was arbitrary in $W_1 + W_2$, it follows that $S$ spans $W_1 + W_2$.
Finally, let us prove that $S$ is a linearly independent set in $W_1 + W_2$. Suppose that
$$a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k} + c_1w_1 + \dots + c_{n-k}w_{n-k} = 0$$
for some scalars $a_i, b_j, c_l$. Then
$$a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k} = -(c_1w_1 + \dots + c_{n-k}w_{n-k}).$$
The left-hand side belongs to $W_1$ and the right-hand side belongs to $W_2$, hence both sides belong to $W_1 \cap W_2$, and so they are linear combinations of $v_1, \dots, v_k$. Thus we can write
$$a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k} = d_1v_1 + \dots + d_kv_k$$
for some scalars $d_1, \dots, d_k$. Writing the previous relation as
$$(a_1 - d_1)v_1 + \dots + (a_k - d_k)v_k + b_1u_1 + \dots + b_{m-k}u_{m-k} = 0$$
and using the fact that $v_1, \dots, v_k, u_1, \dots, u_{m-k}$ are linearly independent, it follows that $a_1 = d_1, \dots, a_k = d_k$ and $b_1 = \dots = b_{m-k} = 0$. By symmetry we also obtain $c_1 = \dots = c_{n-k} = 0$. Then $a_1v_1 + \dots + a_kv_k = 0$ and since $v_1, \dots, v_k$ are linearly independent, we conclude that $a_1 = \dots = a_k = 0$. Thus all scalars $a_i, b_j, c_l$ are 0 and $S$ is a linearly independent set. This finishes the proof of the theorem.
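Grassmann's formula is easy to test numerically when subspaces are given as column spans. The following sketch (ours, with NumPy; the random integer data is our assumption) computes all four dimensions and checks the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 7, 3, 4
W1 = rng.integers(-5, 6, size=(n, d1))   # columns spanning W1
W2 = rng.integers(-5, 6, size=(n, d2))   # columns spanning W2

dim1 = np.linalg.matrix_rank(W1)
dim2 = np.linalg.matrix_rank(W2)
dim_sum = np.linalg.matrix_rank(np.hstack([W1, W2]))

# dim(W1 ∩ W2): pairs (x, y) with W1 @ x = W2 @ y form the null space of
# [W1 | -W2]; when the columns of W1 and W2 are independent (true here
# with overwhelming probability), x -> W1 @ x identifies it with W1 ∩ W2.
dim_int = (d1 + d2) - np.linalg.matrix_rank(np.hstack([W1, -W2]))

assert dim1 + dim2 == dim_sum + dim_int   # Grassmann's formula
print(dim1, dim2, dim_sum, dim_int)
```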
Remark 4.61. Suppose that $W_1, W_2$ are subspaces of a finite dimensional vector space $V$ such that $V = W_1 \oplus W_2$. If $B_1$ and $B_2$ are bases for $W_1$ and $W_2$, then $B_1 \cup B_2$ is a basis for $V$. This follows from the proof of the previous theorem, or it can simply be checked by unwinding definitions. More generally, if a vector space $V$ is the direct sum of subspaces $W_1, \dots, W_n$ and $B_i$ is a basis for $W_i$ ($1 \le i \le n$), then $B_1 \cup \dots \cup B_n$ is a basis for $V$. We leave this as an easy exercise for the reader.
Problem 4.62. Let $V_1, V_2, \dots, V_k$ be subspaces of a finite dimensional vector space $V$. Prove that
$$\dim(V_1 + V_2 + \dots + V_k) \le \dim V_1 + \dim V_2 + \dots + \dim V_k.$$
Solution. It suffices to prove the result for $k = 2$, as then an immediate induction on $k$ yields the result in general (noting that $V_1 + V_2 + \dots + V_k = (V_1 + V_2) + V_3 + \dots + V_k$ for $k \ge 3$). But for $k = 2$ this follows from
$$\dim(V_1 + V_2) = \dim V_1 + \dim V_2 - \dim(V_1 \cap V_2) \le \dim V_1 + \dim V_2.$$
Problem 4.63. Let $V$ be a finite dimensional vector space over a field $F$. Let $U, W$ be subspaces of $V$. Prove that $V = U \oplus W$ if and only if $V = U + W$ and
$$\dim V = \dim U + \dim W.$$
Solution. If $V = U \oplus W$, then clearly $V = U + W$ and we can obtain a basis of $V$ by patching together a basis of $U$ and one of $W$, so $\dim V = \dim U + \dim W$. Suppose now that $V = U + W$ and $\dim V = \dim U + \dim W$. We need to prove that $U \cap W = \{0\}$. But
$$\dim(U \cap W) = \dim U + \dim W - \dim(U + W) = \dim V - \dim V = 0,$$
thus $U \cap W = \{0\}$.
4.5.1 Problems for Practice
1. Do the following two sets of vectors span the same subspace of $\mathbb{R}^3$?
$$X = \{(1, 1, 0), (3, 2, 2)\} \quad \text{and} \quad Y = \{(7, 3, 8), (1, 0, 2), (8, 3, 10)\}$$
2. The set $S$ consists of the following 5 matrices:
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$
(a) Determine a basis $B$ of $M_2(\mathbb{R})$ included in $S$.
(b) Write $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ as a linear combination of elements of $B$.
3. Let $e_1, e_2, e_3, e_4$ be the canonical basis of $\mathbb{R}^4$ and consider the vectors
$$v_1 = e_1 + e_4, \quad v_2 = e_3, \quad v_3 = e_2, \quad v_4 = e_2 + e_4.$$
a) Are the subspaces $\operatorname{Span}(v_1, v_2)$ and $\operatorname{Span}(v_3, v_4)$ in direct sum position?
b) Are the subspaces $\operatorname{Span}(v_1, v_2, v_3)$ and $\operatorname{Span}(v_4)$ in direct sum position?
4. Let $V$ be the set of polynomials $f$ with real coefficients of degree not exceeding 4 and such that $f(1) = f(-1) = 0$.
a) Prove that $V$ is a subspace of the space of all polynomials with real coefficients.
b) Find a basis of $V$ and its dimension.
5. Let $V$ be the set of vectors $(x, y, z, t) \in \mathbb{R}^4$ such that $x = z$ and $y = t$.
a) Prove that $V$ is a subspace of $\mathbb{R}^4$.
b) Give a basis and the dimension of $V$.
c) Complete the basis found in b) to a basis of $\mathbb{R}^4$.
6. Consider the set $V$ of vectors $(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$ such that
$$x_1 + x_3 = 0 \quad \text{and} \quad x_2 + x_4 = 0.$$
a) Prove that $V$ is a subspace of $\mathbb{R}^4$.
b) Give a basis and the dimension of $V$.
c) Let $W$ be the span of the vectors $(1, 1, 1, 1)$, $(1, -1, 1, -1)$ and $(1, 0, 1, 0)$. Give a basis of $W$ and find $V + W$ and $V \cap W$ (you are asked to give a basis for each of these spaces).
7. A set of three linearly independent vectors can be chosen among
$$u = (1, 0, 1), \quad v = (2, 1, 1), \quad w = (4, 1, 1) \quad \text{and} \quad x = (1, 1, 1).$$
(a) Determine such a set and show that it is indeed linearly independent.
(b) Determine a nontrivial dependence relation among the four given vectors.
8. Exactly one of the vectors $b_1 = (7, 2, 5)$ and $b_2 = (7, 2, -5)$ can be written as a linear combination of the column vectors of the matrix
$$A = \begin{pmatrix} 1 & 0 & 3 \\ 1 & 1 & 4 \\ 0 & 1 & 1 \end{pmatrix}.$$
Determine which one and express it as a linear combination of the column vectors of $A$.
9. Let $V$ be the set of matrices $A \in M_n(\mathbb{C})$ for which $a_{ij} = 0$ whenever $i - j$ is odd.
a) Prove that $V$ is a subspace of $M_n(\mathbb{C})$ and that the product of two matrices in $V$ belongs to $V$.
b) Find the dimension of $V$ as a $\mathbb{C}$-vector space.
10. Let $V$ be the set of matrices $A \in M_n(\mathbb{R})$ such that
$$a_{n+1-i,\,n+1-j} = a_{ij}$$
for $i, j \in [1, n]$.
a) Prove that $V$ is a subspace of $M_n(\mathbb{R})$.
b) Find the dimension of $V$ as an $\mathbb{R}$-vector space.
11. Find all real numbers $x$ for which the vectors
$$v_1 = (1, 0, x), \quad v_2 = (1, 1, x), \quad v_3 = (x, 0, 1)$$
form a basis of $\mathbb{R}^3$.
12. Let $P_k = X^k(1 - X)^{n-k}$. Prove that $P_0, \dots, P_n$ is a basis of the space of polynomials with real coefficients whose degree does not exceed $n$.
13. Let $V$ be a vector space of dimension $n$ over $F_2$. Prove that for all $d \in [1, n]$ the following assertions hold:
a) There are $(2^n - 1)(2^n - 2)\cdots(2^n - 2^{d-1})$ $d$-tuples $(v_1, \dots, v_d)$ in $V^d$ such that the family $v_1, \dots, v_d$ is linearly independent.
b) There are $(2^n - 1)(2^n - 2)\cdots(2^n - 2^{n-1})$ invertible matrices in $M_n(F_2)$.
c) There are
$$\frac{(2^n - 1)(2^{n-1} - 1)\cdots(2^{n-d+1} - 1)}{(2^d - 1)(2^{d-1} - 1)\cdots(2 - 1)}$$
subspaces of dimension $d$ in $V$.
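For small $n$ and $d$, the count in a) can be verified by brute force. A sketch (ours, in plain Python) enumerates vectors over $F_2$ and extends independent families one vector at a time:

```python
from itertools import product

def count_independent_tuples(n, d):
    """Count d-tuples of linearly independent vectors in F_2^n."""
    def extend(basis):
        if len(basis) == d:
            return 1
        # compute the span of the current family over F_2
        span = set()
        for coeffs in product([0, 1], repeat=len(basis)):
            v = tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2
                      for i in range(n))
            span.add(v)
        # any vector outside the span extends the independent family
        return sum(extend(basis + [v])
                   for v in product([0, 1], repeat=n) if v not in span)
    return extend([])

def formula(n, d):
    r = 1
    for k in range(d):
        r *= 2**n - 2**k
    return r

for n in range(1, 4):
    for d in range(1, n + 1):
        assert count_independent_tuples(n, d) == formula(n, d)
print("counts match the formula for n <= 3")
```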
Chapter 5
Linear Transformations
Abstract While the previous chapter dealt with individual vector spaces, in this
chapter we focus on the interaction between two vector spaces by studying linear
maps between them. Using the representation of linear maps in terms of matrices,
we obtain some rather surprising results concerning matrices, which would be
difficult to prove otherwise.
Keywords Linear maps • Kernel • Image • Projection • Symmetry • Stable subspace • Change of basis • Matrix • Rank
The goal of this chapter is to develop the theory of linear maps between vector
spaces. In other words, while the previous chapter dealt with basic properties of
individual vector spaces, in this chapter we are interested in the interactions between
two vector spaces. We will see that one can understand linear maps between finite
dimensional vector spaces in terms of matrices and, more importantly and perhaps
surprisingly at first sight, that we can study properties of matrices using linear maps
and properties of vector spaces that were established in the previous chapter.
5.1 Definitions and Objects Canonically Attached to a Linear Map
Unless stated otherwise, all vector spaces will be over a field $F$, which the reader can take to be $\mathbb{R}$ or $\mathbb{C}$. In the previous chapter we defined and studied the basic properties of vector spaces. In this chapter we will deal with maps between vector spaces. We will not consider all maps, but only those which are compatible with the algebraic structures on vector spaces, namely addition and scalar multiplication. More precisely:
Definition 5.1. Let $V, W$ be vector spaces over $F$. A linear map (or linear transformation or homomorphism) between $V$ and $W$ is a map $T : V \to W$ satisfying the following two properties:
1) $T(v_1 + v_2) = T(v_1) + T(v_2)$ for all vectors $v_1, v_2 \in V$, and
2) $T(cv) = cT(v)$ for all $v \in V$ and all scalars $c \in F$.
The reader will notice the difference between this definition and the definition of linear maps in other parts of mathematics: very often in elementary algebra or in real analysis, when we refer to a linear map we mean a map $f : \mathbb{R} \to \mathbb{R}$ of the form $f(x) = ax + b$ for some real numbers $a, b$. Such a map is a linear map from the point of view of linear algebra if and only if $b = 0$ (we refer to the more general maps $x \mapsto ax + b$ as affine maps in linear algebra).
In practice, instead of checking separately that $T$ respects addition and scalar multiplication, it may be advantageous to prove directly that
$$T(v_1 + cv_2) = T(v_1) + cT(v_2)$$
for all vectors $v_1, v_2 \in V$ and all scalars $c \in F$.
Problem 5.2. If $T : V \to W$ is a linear transformation, then $T(0) = 0$ and $T(-v) = -T(v)$ for all $v \in V$.
Solution. Since $T$ is linear, we have
$$T(0) = T(0 + 0) = T(0) + T(0),$$
thus $T(0) = 0$. Similarly,
$$T(-v) = T((-1)v) = (-1)T(v) = -T(v).$$
Example 5.3. a) If $V$ is a vector space over $F$ and $c \in F$ is a scalar, then the map $T : V \to V$ sending $v$ to $cv$ is linear (this follows from the definition of a vector space). For $c = 0$ we obtain the zero map, which we simply denote 0, while for $c = 1$ we obtain the identity map, denoted $\mathrm{id}$. In general, linear maps of the form $v \mapsto cv$ for some scalar $c \in F$ are called scalar linear maps.
b) Consider the vector space $V = \mathbb{R}[X]$ of polynomials with real coefficients (we could allow coefficients in any field). The map $T : V \to V$ sending $P$ to its derivative $P'$ is linear, as follows immediately from its definition. Note that if $\deg P \le n$, then $\deg P' \le n$, thus the map $T$ restricts to a linear map $T : \mathbb{R}_n[X] \to \mathbb{R}_n[X]$ for all $n$ (recall that $\mathbb{R}_n[X]$ is the vector subspace of $V$ consisting of polynomials whose degree does not exceed $n$).
c) The map $T : \mathbb{R}^2 \to \mathbb{R}$ defined by $T(x, y) = xy + 1$ is not linear, since $T(0, 0) = 1 \neq 0$ (by Problem 5.2). Similarly, the map $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x, y) = (x, y + 1)$ is not linear.
d) Consider the vector space $V$ of continuous real-valued maps on $[0, 1]$. Then the map $T : V \to \mathbb{R}$ sending $f \in V$ to $\int_0^1 f(x)\,dx$ is linear. This follows from properties of integration.
e) Consider the trace map $\mathrm{Tr} : M_n(F) \to F$ defined by
$$\mathrm{Tr}(A) = a_{11} + a_{22} + \dots + a_{nn} \quad \text{if } A = [a_{ij}].$$
By definition of the operations in $M_n(F)$, this map is linear. It has the extremely important property that
$$\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$$
for all matrices $A, B$. Indeed, one checks using the product rule that both terms are equal to $\sum_{i,j=1}^n a_{ij}b_{ji}$ (a quick numerical check of this identity appears after this example).
f) In the chapter devoted to basic properties of matrices, we saw that any matrix $A \in M_{m,n}(F)$ defines a linear map $F^n \to F^m$ via $X \mapsto AX$. We also proved in that chapter that any linear map $T : F^n \to F^m$ comes from a unique matrix $A \in M_{m,n}(F)$. For example, the map $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(x_1, x_2, x_3) = (x_1, x_2)$ for all $x_1, x_2, x_3 \in \mathbb{R}$ is linear and associated with the matrix $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$. The linear maps $T : F^n \to F^m$ are exactly the maps
$$T(x_1, \dots, x_n) = (a_{11}x_1 + \dots + a_{1n}x_n,\ \dots,\ a_{m1}x_1 + \dots + a_{mn}x_n)$$
with $a_{ij} \in F$.
g) We introduce now a fundamental class of linear transformations: projections onto subspaces. Suppose that $V$ is a vector space over a field $F$ and that $W_1, W_2$ are subspaces of $V$ such that $V = W_1 \oplus W_2$. The projection onto $W_1$ along $W_2$ is the map $p : V \to W_1$ defined as follows: for each $v \in V$, $p(v)$ is the unique vector in $W_1$ for which $v - p(v) \in W_2$. This makes sense, since by assumption $v$ can be uniquely written as $v_1 + v_2$ with $v_1 \in W_1$ and $v_2 \in W_2$, and so necessarily $p(v) = v_1$. It may not be immediately clear that the map $p$ is linear, but this is actually not difficult: assume that $v, v' \in V$ and let $w = p(v)$ and $w' = p(v')$. Then $w, w' \in W_1$, so $w + w' \in W_1$, and
$$(v + v') - (w + w') = (v - w) + (v' - w') \in W_2,$$
so by definition
$$p(v + v') = w + w' = p(v) + p(v').$$
We leave it to the reader to check that $p(av) = ap(v)$ for $v \in V$ and $a \in F$, using a similar argument. Note that $p(v) = v$ for all $v \in W_1$, but $p(v) = 0$ for all $v \in W_2$. In general, we call a linear map $T : V \to V$ a projection if there is a decomposition $V = W_1 \oplus W_2$ such that $T$ is the projection onto $W_1$ along $W_2$.
h) Assume that we are in the situation described in g). We will define a second fundamental class of linear maps, namely symmetries with respect to subspaces. More precisely, for any decomposition $V = W_1 \oplus W_2$ of $V$ into the direct sum of two subspaces $W_1, W_2$ we define the symmetry $s : V \to V$ with respect to $W_1$ along $W_2$ as follows: take a vector $v \in V$, write it $v = w_1 + w_2$ with $w_1 \in W_1$ and $w_2 \in W_2$, and set
$$s(v) = w_1 - w_2.$$
Again, it is not difficult to check that $s$ is a linear map. Note that $s(v) = v$ whenever $v \in W_1$ and $s(v) = -v$ whenever $v \in W_2$. Note also that if $F = F_2$, then $s$ is the identity map, since $v = -v$ for all $v \in V$ if $V$ is a vector space over $F_2$. In general, a linear map $T : V \to V$ is called a symmetry if there is a decomposition $V = W_1 \oplus W_2$ such that $T$ is the symmetry with respect to $W_1$ along $W_2$.
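Here is the numerical check of the trace identity promised in e): a short sketch of ours using NumPy with random integer matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-9, 10, size=(4, 4))
B = rng.integers(-9, 10, size=(4, 4))

# Tr(AB) = Tr(BA), even though AB != BA in general
assert np.trace(A @ B) == np.trace(B @ A)
print(np.trace(A @ B), np.trace(B @ A))
```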
Suppose that $T : V \to V$ is a linear transformation. If $W$ is a subspace of $V$, there is no reason to have $T(W) \subset W$. However, the subspaces $W$ with this property play an absolutely crucial role in the study of linear maps and deserve a special name and a definition. They will be extensively used in subsequent chapters dealing with deeper properties of linear maps.
Definition 5.4. Let $T : V \to V$ be a linear map on a vector space $V$. A subspace $W$ of $V$ is called stable under $T$ (or $T$-stable) if $T(W) \subset W$.
Problem 5.5. Consider the map $T : \mathbb{R}^2 \to \mathbb{R}^2$ sending $(x_1, x_2)$ to $(-x_2, x_1)$. Find all subspaces of $\mathbb{R}^2$ which are stable under $T$.
Solution. Let $W$ be a subspace of $\mathbb{R}^2$ which is stable under $T$. Since $\mathbb{R}^2$ and $\{0\}$ are obviously stable under $T$, let us assume that $W \neq \{0\}, \mathbb{R}^2$. Then necessarily $\dim W = 1$, that is, $W = \mathbb{R}v$ for some nonzero vector $v = (x_1, x_2)$. Since $W$ is stable under $T$, there is a scalar $c \in \mathbb{R}$ such that $T(v) = cv$, that is, $(-x_2, x_1) = (cx_1, cx_2)$. We deduce that $x_2 = -cx_1$ and $x_1 = cx_2 = -c^2x_1$. Thus $(c^2 + 1)x_1 = 0$ and since $c \in \mathbb{R}$, we must have $x_1 = 0$ and then $x_2 = 0$, thus $v = 0$, a contradiction. This shows that the only subspaces stable under $T$ are $\mathbb{R}^2$ and $\{0\}$.
Remark 5.6. The result of the previous problem is no longer the same if we replace $\mathbb{R}$ by $\mathbb{C}$. In this new situation the line spanned by $(1, i) \in \mathbb{C}^2$ is stable under $T$.
If $W$ is a stable subspace, then $T$ restricts to a linear map $T : W \to W$. For instance, one-dimensional stable subspaces (i.e., lines in $V$ stable under $T$) will be fundamental objects associated with linear transformations on finite dimensional vector spaces. The following exercise studies those linear maps $T$ for which every line is a stable subspace.
Problem 5.7. Let $V$ be a vector space over some field $F$ and let $T : V \to V$ be a linear transformation. Suppose that all lines in $V$ are stable subspaces under $T$. Prove that there is a scalar $c \in F$ such that $T(x) = cx$ for all $x \in V$.
Solution. Let $x \in V$ be nonzero and consider the line $L = Fx$ spanned by $x$. By hypothesis $T(L) \subset L$, thus we can find a scalar $c_x$ such that $T(x) = c_xx$. We want to prove that we can choose $c_x$ independently of $x$.
Suppose that $x$ and $y$ are linearly independent (in particular nonzero). Then $x + y \neq 0$ and the equality $T(x + y) = T(x) + T(y)$ can be written
$$c_{x+y}(x + y) = c_xx + c_yy,$$
or equivalently
$$(c_{x+y} - c_x)x + (c_{x+y} - c_y)y = 0.$$
This forces $c_{x+y} = c_x = c_y$. Next, suppose that $x$ and $y$ are nonzero and linearly dependent. Thus $y = ax$ for some nonzero scalar $a$. Then $T(y) = aT(x)$ can be written $c_yy = ac_xx$, or equivalently $c_yy = c_xy$, and this forces $c_x = c_y$.
Thus as long as $x, y$ are nonzero vectors of $V$, we have $c_x = c_y$. Letting $c$ be the common value of $c_x$ (when $x$ varies over the nonzero vectors in $V$) yields the desired result.
Let $V$ and $W$ be vector spaces over $F$ and let us denote by $\operatorname{Hom}(V, W)$ the set of linear transformations between $V$ and $W$. It is a subset of the vector space $M(V, W)$ of all maps $f : V \to W$. Recall that the addition and scalar multiplication in $M(V, W)$ are defined by
$$(f + g)(v) = f(v) + g(v), \quad (cf)(v) = cf(v)$$
for $f, g \in M(V, W)$, $c \in F$ and $v \in V$.
Proposition 5.8. Let $V, W$ be vector spaces. The set $\operatorname{Hom}(V, W)$ of linear transformations between $V$ and $W$ is a subspace of $M(V, W)$.
Proof. We need to prove that the sum of two linear transformations is a linear transformation and that $cT$ is a linear transformation whenever $c$ is a scalar and $T$ is a linear transformation. Both assertions follow straight from the definition of a linear transformation.
We introduce now a fundamental definition:
Definition 5.9. The kernel (or null space) of a linear transformation $T : V \to W$ is
$$\ker T = \{v \in V \mid T(v) = 0\}.$$
The image (or range) $\operatorname{Im}(T)$ of $T$ is the set
$$\operatorname{Im}(T) = \{T(v) \mid v \in V\} \subset W.$$
The following criterion for injectivity is extremely useful and constantly used when dealing with linear maps.
Proposition 5.10. If $T : V \to W$ is a linear transformation, then $T$ is injective if and only if $\ker T = \{0\}$.
Proof. Since $T(0) = 0$, it is clear that $\ker T = \{0\}$ if $T$ is injective. Conversely, assume that $\ker T = \{0\}$. If $T(v_1) = T(v_2)$, then
$$T(v_1 - v_2) = T(v_1) - T(v_2) = 0,$$
thus $v_1 - v_2 \in \ker T$ and so $v_1 = v_2$. Thus $T$ is injective.
Problem 5.11. Find the dimension of the kernel of the linear map determined by the matrix
$$A = \begin{pmatrix} 1 & -2 & 1 & 0 \\ -1 & 2 & 1 & 2 \\ 2 & -4 & 0 & -2 \end{pmatrix} \in M_{3,4}(\mathbb{R}).$$
Solution. Let $T$ be the corresponding linear map, so that
$$T(x_1, x_2, x_3, x_4) = (x_1 - 2x_2 + x_3,\ -x_1 + 2x_2 + x_3 + 2x_4,\ 2x_1 - 4x_2 - 2x_4).$$
A vector $x = (x_1, \dots, x_4)$ belongs to $\ker(T)$ if and only if
$$\begin{cases} x_1 - 2x_2 + x_3 = 0 \\ -x_1 + 2x_2 + x_3 + 2x_4 = 0 \\ 2x_1 - 4x_2 - 2x_4 = 0 \end{cases}$$
The matrix associated with this system is $A$ and row-reduction yields
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & -2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Thus the previous system is equivalent to
$$x_1 - 2x_2 - x_4 = 0, \quad x_3 + x_4 = 0.$$
We conclude that
$$\ker(T) = \{(x_1, x_2, 2x_2 - x_1, x_1 - 2x_2) \mid x_1, x_2 \in \mathbb{R}\}.$$
The last space is the span of the vectors $v_1 = (1, 0, -1, 1)$ and $v_2 = (0, 1, 2, -2)$, and since they are linearly independent (as $x_1v_1 + x_2v_2 = 0$ is equivalent to $(x_1, x_2, 2x_2 - x_1, x_1 - 2x_2) = (0, 0, 0, 0)$ and so to $x_1 = x_2 = 0$), it follows that $\dim \ker T = 2$.
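The same kernel can be produced directly with SymPy (a sketch of ours; the returned basis may differ from $v_1, v_2$ by an invertible change of variables, but spans the same space):

```python
from sympy import Matrix

A = Matrix([[1, -2, 1, 0], [-1, 2, 1, 2], [2, -4, 0, -2]])
for v in A.nullspace():
    print(v.T)   # two basis vectors of ker(T): (2,1,0,0) and (1,0,-1,1)
```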
Problem 5.12. Give a basis for the kernel of the linear map $T : \mathbb{R}^3 \to \mathbb{R}^4$ given by
$$T(x, y, z) = (x - 2y + z,\ 2x - 3y + z,\ x + y - 2z,\ 3x - y - 2z).$$
Solution. We need to find those $(x, y, z)$ for which $T(x, y, z) = 0$; in other words, we need to solve the system
$$\begin{cases} x - 2y + z = 0 \\ 2x - 3y + z = 0 \\ x + y - 2z = 0 \\ 3x - y - 2z = 0 \end{cases}$$
The matrix of this homogeneous system is
$$A = \begin{pmatrix} 1 & -2 & 1 \\ 2 & -3 & 1 \\ 1 & 1 & -2 \\ 3 & -1 & -2 \end{pmatrix}$$
and row-reduction yields
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Thus the system is equivalent to
$$x - z = 0, \quad y - z = 0,$$
and its solutions are given by $(x, x, x)$ with $x \in \mathbb{R}$. In other words,
$$\ker(T) = \{(x, x, x) \mid x \in \mathbb{R}\}$$
and a basis is given by the vector $(1, 1, 1)$.
Problem 5.13. Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ and let $F : M_2(\mathbb{R}) \to M_2(\mathbb{R})$ be the map defined by
$$F(X) = AX.$$
(a) Prove that $F$ is a linear transformation.
(b) Find the dimension of $\ker(F)$ and a basis for $\ker(F)$.
Solution. (a) For any two matrices $X$ and $Y$ in $M_2(\mathbb{R})$ and any scalar $c$ we have
$$F(X + cY) = A(X + cY) = AX + cAY = F(X) + cF(Y),$$
thus $F$ is a linear transformation.
(b) We need to find the dimension and a basis of the space of matrices that are solutions of the matrix equation $AX = 0$. This equation is equivalent to
$$\begin{pmatrix} x_{11} + x_{21} & x_{12} + x_{22} \\ x_{11} + x_{21} & x_{12} + x_{22} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
or $x_{21} = -x_{11}$ and $x_{22} = -x_{12}$. Thus
$$\ker(F) = \left\{ \begin{pmatrix} x_{11} & x_{12} \\ -x_{11} & -x_{12} \end{pmatrix} \,\middle|\, x_{11}, x_{12} \in \mathbb{R} \right\}.$$
This last space is clearly two-dimensional, with a basis given by
$$\begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix}.$$
Proposition 5.14. If $T : V \to W$ is a linear transformation, then $\ker T$ and $\operatorname{Im}(T)$ are subspaces of $V$ and $W$, respectively. Moreover, $\ker T$ is stable under $T$, and if $V = W$ then $\operatorname{Im}(T)$ is stable under $T$.
Proof. Let $v_1, v_2 \in \ker T$ and let $c \in F$. We need to prove that $v_1 + cv_2 \in \ker T$. Indeed,
$$T(v_1 + cv_2) = T(v_1) + cT(v_2) = 0 + c \cdot 0 = 0.$$
Similarly, if $w_1, w_2 \in \operatorname{Im}(T)$, then we can write $w_1 = T(v_1)$ and $w_2 = T(v_2)$ for some $v_1, v_2 \in V$. Then
$$w_1 + cw_2 = T(v_1) + cT(v_2) = T(v_1 + cv_2) \in \operatorname{Im}(T)$$
for all scalars $c \in F$, thus $\operatorname{Im}(T)$ is a subspace of $W$.
It is clear that $\operatorname{Im}(T)$ is stable under $T$ if $V = W$. To see that $\ker T$ is stable under $T$, take $v \in \ker T$, so that $T(v) = 0$. Then $T(T(v)) = T(0) = 0$, thus $T(v) \in \ker T$ and so $\ker T$ is stable.
The following problem gives a characterization of projections as those linear maps $T$ for which $T \circ T = T$.
Problem 5.15. Let $V$ be a vector space over a field $F$ and let $T : V \to V$ be a linear map on $V$. Prove that the following statements are equivalent:
a) $T$ is a projection;
b) We have $T \circ T = T$.
Moreover, if this is the case, then $\ker T \oplus \operatorname{Im}(T) = V$.
Solution. Assume that a) holds and let us prove b). Assume that $T$ is the projection onto $W_1$ along $W_2$ for some decomposition $V = W_1 \oplus W_2$. Take $v \in V$ and write $v = w_1 + w_2$ with $w_1 \in W_1$ and $w_2 \in W_2$. Then $T(v) = w_1$ and $T(T(v)) = T(w_1) = w_1$, hence $T(T(v)) = T(v)$ for all $v \in V$, and so $T \circ T = T$ and b) holds.
Assume now that $T \circ T = T$ and let us prove a). We start by proving that $\ker T \oplus \operatorname{Im}(T) = V$. Suppose that $v \in \ker T \cap \operatorname{Im}(T)$, so that $v = T(w)$ for some $w \in V$, and $T(v) = 0$. We deduce that
$$0 = T(v) = T(T(w)) = T(w),$$
hence $v = T(w) = 0$ and $\ker T \cap \operatorname{Im}(T) = \{0\}$. Next, let $v \in V$ and put $v_1 = v - T(v)$ and $v_2 = T(v)$. Clearly $v = v_1 + v_2$ and $v_2 \in \operatorname{Im}(T)$. Moreover,
$$T(v_1) = T(v - T(v)) = T(v) - T(T(v)) = 0,$$
and so $v_1 \in \ker T$. Hence $v \in \ker T + \operatorname{Im}(T)$ and $\ker T \oplus \operatorname{Im}(T) = V$ holds.
Set $W_1 = \operatorname{Im}(T)$ and $W_2 = \ker T$. By the above, $V = W_1 \oplus W_2$ and $T(v) \in W_1$ for all $v \in V$. It suffices therefore to prove that $v - T(v) \in W_2$ for all $v \in V$, as this implies that $T$ is the projection onto $W_1$ along $W_2$. But $v - T(v) \in W_2$ if and only if $T(v - T(v)) = 0$, that is $T(v) = T^2(v)$, which follows from our assumption that b) holds. Note that the last statement of the problem has already been proved.
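Over $F = \mathbb{R}$ the equivalence can be explored concretely: any matrix $P$ with $P^2 = P$ should satisfy $\ker P \oplus \operatorname{Im} P = \mathbb{R}^n$. A small NumPy sketch (ours; the particular projection matrix is an arbitrary choice):

```python
import numpy as np

# The projection onto the line spanned by (1, 1) along the line spanned
# by (1, -1); one checks directly that P @ P == P.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
assert np.allclose(P @ P, P)

im_basis = P[:, [0]]                 # P has rank 1, so one column spans im(P)
_, s, vt = np.linalg.svd(P)
ker_basis = vt[np.isclose(s, 0)].T   # right singular vectors with sigma = 0

# The union of the two bases must be a basis of R^2:
assert np.linalg.matrix_rank(np.hstack([im_basis, ker_basis])) == 2
print("ker(P) and im(P) are in direct sum position")
```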
Remark 5.16. We have a similar statement for symmetries, assuming that $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$ (so we exclude $F = F_2$). Namely, if $V$ is a vector space over $F$ and $T : V \to V$ is a linear map, then the following statements are equivalent:
a) $T$ is a symmetry.
b) $T \circ T = \mathrm{id}$, the identity map of $V$ (sending every vector of $V$ to itself).
Moreover, if this is the case, then $V = \ker(T - \mathrm{id}) \oplus \ker(T + \mathrm{id})$.
5.1.1 Problems for Practice
In the next problems $F$ is a field.
1. Let $f : \mathbb{C} \to \mathbb{C}$ be an $\mathbb{R}$-linear map. Prove the existence of complex numbers $a, b$ such that $f(z) = az + b\overline{z}$ for all $z \in \mathbb{C}$.
2. Consider the map $f : \mathbb{R}^4 \to \mathbb{R}^3$ defined by
$$f(x_1, x_2, x_3, x_4) = (x_1 + x_2 + x_3 + x_4,\ 2x_1 + x_2 - x_3 + x_4,\ x_1 - x_2 + x_3 - x_4).$$
a) Prove that $f$ is a linear map.
b) Give a basis for the kernel of $f$.
3. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed 3, and let the map $f : V \to \mathbb{R}^4$ be defined by
$$f(P) = (P(0), P(1), P(-1), P(2)).$$
a) Prove that $f$ is a linear map.
b) Is $f$ injective?
4. Let $n$ be a positive integer and let $V$ be the space of real polynomials whose degree does not exceed $n$. Consider the map
$$f : V \to V, \quad f(P(X)) = P(X) + (1 - X)P'(X),$$
where $P'(X)$ is the derivative of $P$.
a) Explain why $f$ is a well-defined linear map.
b) Give a basis for the kernel of $f$.
5. Find all subspaces of $\mathbb{R}^2$ which are stable under the linear transformation
$$T : \mathbb{R}^2 \to \mathbb{R}^2, \quad T(x, y) = (x + y, x + 2y).$$
6. Let V be the space of polynomials with real coefficients whose degree does not
exceed n. Let T be the linear transformation on V sending P to its derivative.
Find all subspaces of V which are stable under T .
7. Let $T : \mathbb{R}[X] \to \mathbb{R}[X]$ be the map defined by
$$T(P(X)) = P(X) - 2(X^2 - 1)P''(X).$$
a) Prove that $T$ is a linear map.
b) Prove that for all $n \ge 0$, the space of polynomials with real coefficients whose degree does not exceed $n$ is stable under $T$.
8. Let $V$ be a vector space over a field $F$ and let $T_1, \dots, T_n : V \to V$ be linear transformations. Prove that
$$\bigcap_{i=1}^n \ker T_i \subset \ker\left(\sum_{i=1}^n T_i\right).$$
9. Let $V$ be a vector space over a field $F$ and let $T_1, T_2 : V \to V$ be linear transformations such that $T_1 \circ T_2 = T_1$ and $T_2 \circ T_1 = T_2$. Prove that $\ker T_1 = \ker T_2$.
10. Let $V$ be a vector space over $F$ and let $T : V \to V$ be a linear transformation such that
$$\ker T = \ker T^2 \quad \text{and} \quad \operatorname{Im} T = \operatorname{Im} T^2.$$
Prove that
$$V = \ker T \oplus \operatorname{Im} T.$$
11. For each of the following maps $T : \mathbb{R}^3 \to \mathbb{R}^3$, check that $T$ is linear and then check whether $\ker(T)$ and $\operatorname{Im}(T)$ are in direct sum position.
a) $T(x, y, z) = (x - 2y + z,\ x - z,\ x - 2y + z)$.
b) $T(x, y, z) = (3(x + y + z),\ 0,\ x + y + z)$.
12. Let $f : \mathbb{R} \to \mathbb{R}$ be a map such that $f(x + y) = f(x) + f(y)$ for all real numbers $x, y$. Prove that $f$ is a linear map of $\mathbb{Q}$-vector spaces between $\mathbb{R}$ and itself.
13. (Quotient space) Let $V$ be a finite dimensional vector space over $F$ and let $W \subset V$ be a subspace. For a vector $v \in V$, let
$$[v] = \{v + w : w \in W\}.$$
Note that $[v_1] = [v_2]$ if $v_1 - v_2 \in W$. Define the quotient space $V/W$ to be $\{[v] : v \in V\}$. Define an addition and scalar multiplication on $V/W$ by $[u] + [v] = [u + v]$ and $a[v] = [av]$. We recall that the addition and multiplication above are well defined and that $V/W$ equipped with these operations is a vector space.
a) Show that the map $\pi : V \to V/W$ defined by $\pi(v) = [v]$ is linear with kernel $W$.
b) Show that $\dim(W) + \dim(V/W) = \dim(V)$.
c) Suppose $U \subset V$ is any subspace with $W \oplus U = V$. Show that $\pi|_U : U \to V/W$ is an isomorphism, i.e., a bijective linear map.
d) Let $T : V \to U$ be a linear map, let $W \subset \ker T$ be a subspace of $V$, and let $\pi : V \to V/W$ be the projection onto the quotient space. Show that there is a unique linear map $S : V/W \to U$ such that $T = S \circ \pi$.
5.2 Linear Maps and Linearly Independent Sets
The following result relates linear maps to notions introduced in the previous chapter: spanning sets and linearly independent sets. In general, if $T : V \to W$ is linear, it is not true that the image of a linearly independent set in $V$ is linearly independent in $W$ (think about the zero linear map). However, if $T$ is injective, then this is the case, as the following proposition shows (dealing also with the analogous result for spanning sets).
Proposition 5.17. Let $T : V \to W$ be a linear transformation.
a) If $T$ is injective and if $L$ is a linearly independent set in $V$, then $T(L) := \{T(l) \mid l \in L\}$ is a linearly independent set in $W$.
b) If $T$ is surjective and if $S$ is a spanning set in $V$, then $T(S)$ is a spanning set in $W$.
c) If $T$ is bijective and if $B$ is a basis in $V$, then $T(B)$ is a basis in $W$.
Proof. Part c) is simply a combination of a) and b), which we prove separately.
a) Suppose we have
$$c_1T(l_1) + \dots + c_nT(l_n) = 0$$
for some scalars $c_1, \dots, c_n$ and some $l_1, \dots, l_n \in L$. The previous relation can be written as $T(c_1l_1 + \dots + c_nl_n) = 0$, thus $c_1l_1 + \dots + c_nl_n \in \ker T$. Since $T$ is injective, we deduce that $c_1l_1 + \dots + c_nl_n = 0$. Hence, as $L$ is linearly independent, $c_1 = c_2 = \dots = c_n = 0$. Thus $T(L)$ is linearly independent.
b) Let $w \in W$. Since $T$ is surjective, there is $v \in V$ such that $T(v) = w$. Since $S$ is a spanning set in $V$, we can write $v$ as a linear combination of some elements $s_1, \dots, s_n$ of $S$, say $v = c_1s_1 + \dots + c_ns_n$ for some scalars $c_1, \dots, c_n$. Then
$$w = T(v) = T(c_1s_1 + \dots + c_ns_n) = c_1T(s_1) + \dots + c_nT(s_n).$$
Thus $w$ is in the span of $T(s_1), \dots, T(s_n)$, hence in the span of $T(S)$. Since $w \in W$ was arbitrary, the result follows.
The following corollary is absolutely fundamental (especially part c)). It follows easily from the previous proposition and the rather subtle properties of finite dimensional vector spaces discussed in the previous chapter.
Corollary 5.18. Let $V$ and $W$ be finite dimensional vector spaces and let $T : V \to W$ be a linear transformation.
a) If $T$ is injective, then $\dim V \le \dim W$.
b) If $T$ is surjective, then $\dim V \ge \dim W$.
c) If $T$ is bijective, then $\dim V = \dim W$.
Proof. Again, part c) is a consequence of a) and b). For a), let $B$ be a basis of $V$ and let $v_1, \dots, v_n$ be its elements. By Proposition 5.17, $T(v_1), \dots, T(v_n)$ are linearly independent vectors in $W$. Thus $\dim W \ge n = \dim V$.
The argument for b) is similar, since Proposition 5.17 implies that the vectors $T(v_1), \dots, T(v_n)$ form a spanning set for $W$, thus $n \ge \dim W$.
We can sometimes prove the existence of a linear map $T$ without having to explicitly write down the value of $T(x)$ for each vector $x$ in its domain: if the domain is finite dimensional (this hypothesis is actually unnecessary), it suffices to give the images of the elements in a basis of the domain. More precisely:
Proposition 5.19. Let $V, U$ be vector spaces over a field $F$. Let $\{v_1, v_2, \dots, v_n\}$ be a basis of $V$ and let $\{u_1, u_2, \dots, u_n\}$ be any set of vectors in $U$. Then there is a unique linear transformation $T : V \to U$ such that
$$T(v_1) = u_1,\ T(v_2) = u_2,\ \dots,\ T(v_n) = u_n.$$
Proof. We start by proving the uniqueness. Suppose that we have two linear transformations $T, T' : V \to U$ such that $T(v_i) = T'(v_i) = u_i$ for $1 \le i \le n$. Then $T - T'$ is a linear transformation which vanishes at $v_1, \dots, v_n$. Thus $\ker(T - T')$, which is a subspace of $V$, contains the span of $v_1, \dots, v_n$, which is $V$. It follows that $T = T'$.
Let us prove existence now. Take any vector $v \in V$. Since $v_1, \dots, v_n$ form a basis of $V$, we can uniquely express $v$ as a linear combination $v = a_1v_1 + \dots + a_nv_n$ for some scalars $a_1, \dots, a_n \in F$. Define $T(v) = a_1u_1 + \dots + a_nu_n$. By definition $T(v_i) = u_i$ for all $i$, and it remains to check that $T$ is a linear transformation. Let $v, v' \in V$ and let $c$ be a scalar. Write $v = a_1v_1 + \dots + a_nv_n$ and $v' = b_1v_1 + \dots + b_nv_n$ for some scalars $a_i, b_j \in F$. Then
$$v + cv' = (a_1 + cb_1)v_1 + \dots + (a_n + cb_n)v_n,$$
thus
$$T(v + cv') = (a_1 + cb_1)u_1 + \dots + (a_n + cb_n)u_n = (a_1u_1 + \dots + a_nu_n) + c(b_1u_1 + \dots + b_nu_n) = T(v) + cT(v'),$$
which proves the linearity of $T$ and finishes the proof.
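In coordinates, the unique $T$ of Proposition 5.19 is easy to produce: if $B$ is the matrix with the $v_i$ as columns and $U$ the matrix with the $u_i$ as columns, then the matrix of $T$ is $UB^{-1}$, since $T(Bx) = Ux$ for all $x$. A NumPy sketch of ours (the sample data is taken from Problem 5.21 below):

```python
import numpy as np

# Basis v1, v2, v3 of R^3 and prescribed images u1, u2, u3 in R^2.
B = np.array([[1, 1, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)      # columns v1, v2, v3
U = np.array([[3, -1, 0],
              [2,  2, 1]], dtype=float)     # columns u1, u2, u3

M = U @ np.linalg.inv(B)    # the matrix of the unique such T
assert np.allclose(M @ B, U)
print(M @ np.array([5, 3, 1]))   # T(5,3,1), cf. Problem 5.21: (4, 9)
```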
Problem 5.20. Find a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^4$ whose image is the linear span of the set of vectors
$$\{(1, 2, 1, 1), (3, 1, 5, 2)\}.$$
Solution. Let $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$ be the standard basis of $\mathbb{R}^3$. Let $v_1 = (1, 2, 1, 1)$ and $v_2 = (3, 1, 5, 2)$. By Proposition 5.19 there is a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^4$ such that
$$T(e_1) = v_1, \quad T(e_2) = v_2, \quad T(e_3) = 0.$$
The image of $T$ is precisely the set of linear combinations of $T(e_1)$, $T(e_2)$ and $T(e_3)$, and this is clearly the span of $v_1, v_2$.
We note that $T$ is very far from being unique: we could have taken $T(e_3) = v_2$ for instance (there are actually lots of linear maps with the desired property).
Problem 5.21. Let
$$v_1 = (1, 0, 0), \quad v_2 = (1, 1, 0), \quad v_3 = (1, 1, 1)$$
and let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be a linear transformation such that
$$T(v_1) = (3, 2), \quad T(v_2) = (-1, 2), \quad T(v_3) = (0, 1).$$
Compute the value of $T(5, 3, 1)$.
Solution. We look for scalars $a, b, c$ such that
$$(5, 3, 1) = av_1 + bv_2 + cv_3,$$
as then, by linearity,
$$T(5, 3, 1) = aT(v_1) + bT(v_2) + cT(v_3).$$
The equality $(5, 3, 1) = av_1 + bv_2 + cv_3$ is equivalent to
$$(5, 3, 1) = (a, 0, 0) + (b, b, 0) + (c, c, c) = (a + b + c, b + c, c).$$
Thus $c = 1$, $b + c = 3$ and $a + b + c = 5$, which gives
$$c = 1, \quad b = 2, \quad a = 2.$$
It follows that
$$T(5, 3, 1) = 2T(v_1) + 2T(v_2) + T(v_3) = (6, 4) + (-2, 4) + (0, 1) = (4, 9).$$
Remark 5.22. One can easily check that $v_1, v_2, v_3$ form a basis of $\mathbb{R}^3$, thus such a map exists by Proposition 5.19.
Problem 5.23. Determine the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ such that
$$T(1, 0, 1) = (1, 0, 0), \quad T(0, 1, 1) = (0, 1, 0), \quad T(0, 0, 1) = (1, 1, 1).$$
Solution. We start with an arbitrary vector $v = (x_1, x_2, x_3)$ and look for scalars $k_1, k_2, k_3$ such that
$$v = k_1(1, 0, 1) + k_2(0, 1, 1) + k_3(0, 0, 1).$$
If we find such scalars, then
$$T(v) = k_1T(1, 0, 1) + k_2T(0, 1, 1) + k_3T(0, 0, 1) = (k_1, 0, 0) + (0, k_2, 0) + (k_3, k_3, k_3) = (k_1 + k_3, k_2 + k_3, k_3).$$
The equality
$$v = k_1(1, 0, 1) + k_2(0, 1, 1) + k_3(0, 0, 1)$$
is equivalent to
$$(x_1, x_2, x_3) = (k_1, k_2, k_1 + k_2 + k_3),$$
or
$$k_1 = x_1, \quad k_2 = x_2, \quad k_3 = x_3 - x_1 - x_2.$$
Thus for all $(x_1, x_2, x_3) \in \mathbb{R}^3$,
$$T(x_1, x_2, x_3) = (k_1 + k_3, k_2 + k_3, k_3) = (x_3 - x_2,\ x_3 - x_1,\ x_3 - x_1 - x_2).$$
5.2.1 Problems for Practice
1. Describe the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ such that
$$T(0, 1, 1) = (1, 2, 3), \quad T(1, 0, 1) = (1, 1, 2), \quad T(1, 1, 0) = (1, 1, 1).$$
2. Is there a linear map $T : \mathbb{R}^2 \to \mathbb{R}^2$ such that
$$T(1, 1) = (1, 2), \quad T(1, -1) = (1, 2), \quad T(2, 3) = (1, 2)?$$
3. Find all real numbers $x$ for which there is a linear map $T : \mathbb{R}^3 \to \mathbb{R}^3$ such that
$$T(1, 1, 1) = (1, x, 1), \quad T(1, 0, 1) = (1, 0, 1), \quad T(1, 1, 0) = (1, 2, 3), \quad T(1, -1, 1) = (1, x, 2).$$
4. Find a linear map $T : \mathbb{R}^4 \to \mathbb{R}^3$ whose image is the span of the vectors $(1, 1, 1)$ and $(1, 2, 3)$.
5. a) Let $V$ be the space of polynomials with real coefficients whose degree does not exceed 3. Find all positive integers $n$ for which there is a bijective linear map between $V$ and $M_n(\mathbb{R})$.
b) Answer the same question if the word bijective is replaced with injective.
c) Answer the same question if the word bijective is replaced with surjective.
5.3 Matrix Representation of Linear Transformations
We have already seen in the chapter devoted to matrices that all linear maps $T : F^n \to F^m$ are described by matrices $A \in M_{m,n}(F)$. We will try to extend this result and describe linear maps $T : V \to W$ between finite dimensional vector spaces $V, W$ in terms of matrices. The description will not be canonical: we will need to fix bases in $V$ and $W$. All vector spaces in this section are finite dimensional over $F$.
It will be convenient to introduce the following definition:
Definition 5.24. A linear transformation $T : V \to W$ is called an isomorphism of vector spaces, or invertible, if it is bijective. In this case we write $V \simeq W$ (the map $T$ being understood).
Problem 5.25. Let $T : V \to W$ be an isomorphism of vector spaces. Prove that its inverse $T^{-1} : W \to V$ is an isomorphism of vector spaces.
Solution. The map $T^{-1}$ is clearly bijective, with inverse $T$. We only need to check that $T^{-1}$ is linear, i.e.
$$T^{-1}(w_1 + cw_2) = T^{-1}(w_1) + cT^{-1}(w_2)$$
for all vectors $w_1, w_2 \in W$ and all scalars $c \in F$. Let $v_1 = T^{-1}(w_1)$ and $v_2 = T^{-1}(w_2)$. Then $T(v_1) = w_1$ and $T(v_2) = w_2$, thus
$$T^{-1}(w_1 + cw_2) = T^{-1}(T(v_1) + cT(v_2)) = T^{-1}(T(v_1 + cv_2)) = v_1 + cv_2,$$
as needed.
It turns out that we can completely classify finite dimensional nonzero vector spaces up to isomorphism: for each positive integer $n$, all vector spaces of dimension $n$ are isomorphic to $F^n$. More precisely:
Theorem 5.26. Let $n$ be a positive integer and let $V$ be a vector space of dimension $n$ over $F$. If $B = (e_1, \dots, e_n)$ is a basis, then the map $i_B : F^n \to V$ sending $(x_1, \dots, x_n)$ to $x_1e_1 + x_2e_2 + \dots + x_ne_n$ is an isomorphism of vector spaces.
Proof. It is clear that $i_B$ is linear, and by the definition of a basis it is bijective. The result follows.
Remark 5.27. Conversely, if $T : V \to W$ is an isomorphism of vector spaces, then necessarily $\dim V = \dim W$. This is Corollary 5.18 (recall that we only work with finite dimensional vector spaces).
Thus the choice of a basis in a vector space of dimension $n$ allows us to identify it with $F^n$. Consider now a linear map $T : V \to W$ and suppose that $\dim V = n$ and $\dim W = m$. Choose bases $B_V = (v_1, \dots, v_n)$ and $B_W = (w_1, \dots, w_m)$ in $V$ and $W$, respectively. By the previous theorem we have isomorphisms
$$i_{B_V} : F^n \to V, \quad i_{B_W} : F^m \to W.$$
We produce a linear map $\varphi_T$ by composing the maps $i_{B_V} : F^n \to V$, $T : V \to W$ and $i_{B_W}^{-1} : W \to F^m$:
$$\varphi_T : F^n \to F^m, \quad \varphi_T = i_{B_W}^{-1} \circ T \circ i_{B_V}.$$
Since $\varphi_T$ is a linear map between $F^n$ and $F^m$, it is uniquely described by a matrix $A \in M_{m,n}(F)$. This is the matrix of $T$ with respect to the bases $B_V$ and $B_W$. It highly depends on the two bases, so we prefer to denote it $\mathrm{Mat}_{B_W,B_V}(T)$. We put $B_W$ (i.e., the basis on the target of $T$) before $B_V$ (the basis at the source of $T$) in the notation of the matrix for reasons which will become clear later on. When $V = W$ and we fix a basis $B$ of $V$, we write $\mathrm{Mat}_B(T)$ instead of $\mathrm{Mat}_{B,B}(T)$, the matrix of $T$ with respect to the basis $B$ both at the source and target of $T$.
The previous construction looks rather complicated, but it is very natural: we have a parametrization of linear maps between $F^n$ and $F^m$ by matrices, and we can extend it to a description of linear maps between $V$ and $W$ by identifying $V$ with $F^n$ and $W$ with $F^m$, via the choice of bases in $V$ and $W$. Note the fundamental relation
$$i_{B_W}(AX) = T(i_{B_V}(X)) \quad \text{if } X \in F^n \text{ and } A = \mathrm{Mat}_{B_W,B_V}(T). \tag{5.1}$$
Taking for $X$ a vector in the canonical basis of $F^n$, we can make everything completely explicit: let $e_1, \dots, e_n$ be the canonical basis of $F^n$ and $f_1, \dots, f_m$ the canonical basis of $F^m$. If $A = [a_{ij}]$, then by definition $Ae_i = a_{1i}f_1 + \dots + a_{mi}f_m$, thus for $X = e_i$ we have
$$i_{B_W}(AX) = i_{B_W}(a_{1i}f_1 + a_{2i}f_2 + \dots + a_{mi}f_m) = a_{1i}w_1 + a_{2i}w_2 + \dots + a_{mi}w_m.$$
On the other hand, $i_{B_V}(e_i) = v_i$, thus relation (5.1) is equivalent to the fundamental and more concrete relation
$$T(v_i) = a_{1i}w_1 + a_{2i}w_2 + \dots + a_{mi}w_m. \tag{5.2}$$
In other words:
In other words:
Proposition 5.28. Let $T \colon V \to W$ be a linear transformation and let $B_V = (v_1, \ldots, v_n)$, $B_W = (w_1, \ldots, w_m)$ be bases in $V$ and $W$. Then column $i$ of $\mathrm{Mat}_{B_W,B_V}(T) \in M_{m,n}(F)$ consists of the coordinates of $T(v_i)$ with respect to the basis $B_W$. In other words, if $\mathrm{Mat}_{B_W,B_V}(T) = [a_{ij}]$, then for all $1 \le i \le n$ we have
$$T(v_i) = \sum_{j=1}^{m} a_{ji}w_j.$$
Problem 5.29. Find the matrix representation of the linear transformation $T \colon \mathbb{R}^3 \to \mathbb{R}^3$ defined by
$$T(x,y,z) = (x + 2y - z,\; y + z,\; x + y - 2z)$$
with respect to the standard basis of $\mathbb{R}^3$.

Solution. Let $e_1 = (1,0,0)$, $e_2 = (0,1,0)$, $e_3 = (0,0,1)$ be the standard basis of $\mathbb{R}^3$. Then
$$T(e_1) = T(1,0,0) = (1,0,1) = 1e_1 + 0e_2 + 1e_3,$$
$$T(e_2) = T(0,1,0) = (2,1,1) = 2e_1 + 1e_2 + 1e_3,$$
$$T(e_3) = T(0,0,1) = (-1,1,-2) = -1e_1 + 1e_2 - 2e_3.$$
Thus the matrix representation of $T$ with respect to the standard basis is
$$\begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 1 & 1 & -2 \end{pmatrix}.$$
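The computation above is easy to double-check on a machine. The following sketch (our own illustration, not part of the text) assembles the matrix column by column from $T(e_1), T(e_2), T(e_3)$, exactly as Proposition 5.28 prescribes:

```python
import numpy as np

def T(v):
    # the map of Problem 5.29
    x, y, z = v
    return np.array([x + 2*y - z, y + z, x + y - 2*z])

# column i of the matrix is T(e_i), by Proposition 5.28
M = np.column_stack([T(e) for e in np.eye(3)])
print(M)  # [[ 1.  2. -1.] [ 0.  1.  1.] [ 1.  1. -2.]]

v = np.array([2.0, -1.0, 3.0])
assert np.allclose(M @ v, T(v))  # M acts as T on coordinates
```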
Problem 5.30. Let $P_n$ be the vector space of polynomials with real coefficients, of degree less than $n$. A linear transformation $T \colon P_3 \to P_5$ is given by
$$T(P(X)) = P(X) + X^2P(X).$$
a) Find the matrix of this transformation with respect to the basis $B = \{1, X+1, X^2+1\}$ of $P_3$ and the standard basis $C = \{1, X, X^2, X^3, X^4\}$ of $P_5$.
b) Show that $T$ is not onto and that it is injective.

Solution. a) We need to find the coordinates of $T(1)$, $T(X+1)$ and $T(X^2+1)$ with respect to the basis $C$. We have
$$T(1) = 1 + X^2 = 1 \cdot 1 + 0 \cdot X + 1 \cdot X^2 + 0 \cdot X^3 + 0 \cdot X^4,$$
$$T(X+1) = X + 1 + X^2(X+1) = 1 + X + X^2 + X^3 = 1 \cdot 1 + 1 \cdot X + 1 \cdot X^2 + 1 \cdot X^3 + 0 \cdot X^4,$$
$$T(X^2+1) = X^2 + 1 + X^2(X^2+1) = 1 + 2X^2 + X^4 = 1 \cdot 1 + 0 \cdot X + 2 \cdot X^2 + 0 \cdot X^3 + 1 \cdot X^4.$$
It follows that the required matrix is
$$\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
b) Since $\dim P_5 = 5 > \dim P_3 = 3$, $T$ cannot be onto. To prove that $T$ is injective, it suffices to check that $\ker(T) = 0$. But if $P \in \ker(T)$, then $P(X) + X^2P(X) = 0$, thus $(1 + X^2)P(X) = 0$. Since $1 + X^2 \neq 0$, it follows that $P(X) = 0$ and so $\ker(T) = 0$.
Problem 5.31. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$, a fixed positive integer. Consider the map
$$T \colon V \to V, \qquad T(P(X)) = P(X+1).$$
a) Prove that $T$ is an invertible linear transformation.
b) What are the matrices of $T$ and $T^{-1}$ with respect to the basis $1, X, \ldots, X^n$ of $V$?

Solution. a) It is not difficult to see that $T$ is a linear transformation, for if $P_1, P_2$ are vectors in $V$ and $c$ is a scalar, we have
$$T((P_1 + cP_2)(X)) = (P_1 + cP_2)(X+1) = P_1(X+1) + cP_2(X+1) = T(P_1(X)) + cT(P_2(X)).$$
Next, to see that $T$ is invertible it suffices to prove that $T$ is bijective. We can easily find the inverse of $T$ by solving the equation $P(X+1) = Q(X)$. This is equivalent to $P(X) = Q(X-1)$, thus the inverse of $T$ is given by $T^{-1}(P(X)) = P(X-1)$.
b) For $0 \le j \le n$ the binomial formula yields
$$T(X^j) = (X+1)^j = \sum_{i=0}^{j} \binom{j}{i}X^i$$
and
$$T^{-1}(X^j) = (X-1)^j = \sum_{i=0}^{j} (-1)^{j-i}\binom{j}{i}X^i.$$
Thus if $A = [a_{ij}]$ and $B = [b_{ij}]$ are the matrices of $T$, respectively $T^{-1}$, with respect to the basis $1, X, \ldots, X^n$, we have (with the standard convention that $\binom{n}{k} = 0$ for $n < k$)
$$a_{ij} = \binom{j}{i}, \qquad b_{ij} = (-1)^{j-i}\binom{j}{i}.$$
Remark 5.32. Since $T$ and $T^{-1}$ are inverse to each other, the product of the matrices $A$ and $B$ is the identity matrix $I_n$. We invite the reader to use this in order to prove the following combinatorial identity:
$$\sum_{j=0}^{n} (-1)^{k-j}\binom{j}{i}\binom{k}{j} = \delta_{i,k},$$
where the right-hand side equals $1$ if $i = k$ and $0$ otherwise.
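The identity is also easy to test numerically. The small script below (an illustration we add here, not from the text) forms the matrices $A = [\binom{j}{i}]$ and $B = [(-1)^{j-i}\binom{j}{i}]$ of Problem 5.31 and confirms that $AB = I_{n+1}$, which is precisely the identity above written entrywise:

```python
import numpy as np
from math import comb

n = 7  # any modest size will do; indices run over 0..n
A = np.array([[comb(j, i) for j in range(n + 1)] for i in range(n + 1)])
B = np.array([[(-1) ** ((j - i) % 2) * comb(j, i) for j in range(n + 1)]
              for i in range(n + 1)])

# (A @ B)[i, k] = sum_j (-1)^(k-j) C(j, i) C(k, j) = delta_{ik}
assert np.array_equal(A @ B, np.eye(n + 1, dtype=int))
```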
The next result follows formally from the fundamental bijection between linear maps $F^n \to F^m$ and matrices in $M_{m,n}(F)$. Recall that $\mathrm{Hom}(V,W)$ is the vector space of linear maps $T \colon V \to W$.

Theorem 5.33. Let $B_V, B_W$ be bases in two (finite-dimensional) vector spaces $V, W$ of dimensions $n$ and $m$. The map $T \mapsto \mathrm{Mat}_{B_W,B_V}(T)$ sending a linear transformation $T \colon V \to W$ to its matrix with respect to $B_V$ and $B_W$ is an isomorphism of vector spaces
$$\mathrm{Hom}(V,W) \simeq M_{m,n}(F).$$
Proof. Let $\varphi(T) = \mathrm{Mat}_{B_W,B_V}(T)$. It is clear from Proposition 5.28 that $\varphi$ is a linear map from $\mathrm{Hom}(V,W)$ to $M_{m,n}(F)$. It is moreover injective, since if $\varphi(T) = 0$, Proposition 5.28 yields $T(v_i) = 0$ for all $i$, thus $\ker T$ contains $\mathrm{Span}(v_1, \ldots, v_n) = V$ and $T = 0$. To see that $\varphi$ is surjective, start with any matrix $A = [a_{ij}] \in M_{m,n}(F)$. It induces a linear transformation $\varphi_A \colon F^n \to F^m$ defined by $X \mapsto AX$. By construction, the linear transformation $T = i_{B_W} \circ \varphi_A \circ i_{B_V}^{-1}$ satisfies $\varphi(T) = A$. More concretely, since $v_1, \ldots, v_n$ is a basis of $V$, there is a unique linear map $T \colon V \to W$ such that
$$T(v_i) = \sum_{j=1}^{m} a_{ji}w_j$$
for all $1 \le i \le n$ (Proposition 5.19). By Proposition 5.28 we have $\mathrm{Mat}_{B_W,B_V}(T) = A$ and we are done.

Recall that $\dim M_{m,n}(F) = mn$, a basis being given by the canonical basis $(E_{ij})_{1 \le i \le m, 1 \le j \le n}$. The theorem and Remark 5.27 yield
$$\dim \mathrm{Hom}(V,W) = \dim V \cdot \dim W.$$
We conclude this section with some rather technical issues, which are nevertheless absolutely fundamental in the theory of linear transformations. First, we want to understand the link between the matrix of a composition $S \circ T$ of linear maps and the matrices of $T$ and $S$. More precisely, fix two linear maps $T \colon V \to W$ and $S \colon W \to U$ and set $m = \dim V$, $n = \dim W$, $p = \dim U$. Also, fix three bases $B_U, B_V$ and $B_W$ in $U, V, W$ respectively. Let us write for simplicity
$$A = \mathrm{Mat}_{B_U,B_W}(S) \quad \text{and} \quad B = \mathrm{Mat}_{B_W,B_V}(T).$$
Corresponding to $B_U, B_V, B_W$ we have isomorphisms
$$i_{B_V} \colon F^m \to V, \qquad i_{B_W} \colon F^n \to W, \qquad i_{B_U} \colon F^p \to U,$$
and by definition of $A, B$ we have (relation (5.1))
$$i_{B_W}(BX) = T(i_{B_V}(X)), \quad X \in F^m, \qquad i_{B_U}(AY) = S(i_{B_W}(Y)), \quad Y \in F^n.$$
Applying $S$ to the first relation and then using the second one (with $Y = BX$), we obtain for $X \in F^m$
$$S \circ T(i_{B_V}(X)) = S(i_{B_W}(BX)) = i_{B_U}(ABX).$$
This last relation and the definition of $\mathrm{Mat}_{B_U,B_V}(S \circ T)$ show that
$$\mathrm{Mat}_{B_U,B_V}(S \circ T) = A \cdot B.$$
In other words, composition of linear transformations comes down to multiplication of matrices. Formally:

Theorem 5.34. Let $T \colon V \to W$ and $S \colon W \to U$ be linear transformations between finite-dimensional vector spaces and let $B_U, B_V, B_W$ be bases of $U, V$ and $W$, respectively. Then
$$\mathrm{Mat}_{B_U,B_V}(S \circ T) = \mathrm{Mat}_{B_U,B_W}(S) \cdot \mathrm{Mat}_{B_W,B_V}(T).$$
A less technical corollary which will be constantly used is the following:

Corollary 5.35. Let $T_1, T_2 \colon V \to V$ be linear transformations on a finite dimensional vector space $V$ and let $B$ be a basis of $V$. Then
$$\mathrm{Mat}_B(T_1 \circ T_2) = \mathrm{Mat}_B(T_1) \cdot \mathrm{Mat}_B(T_2).$$
Problem 5.36. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed 2. Consider the maps
$$T \colon \mathbb{R}^3 \to V, \qquad T(a,b,c) = a + 2bX + 3cX^2$$
and
$$S \colon V \to M_2(\mathbb{R}), \qquad S(a + bX + cX^2) = \begin{pmatrix} a & a+b \\ a-c & b \end{pmatrix}.$$
We consider the basis $B_1 = (1, X, X^2)$ of $V$, the canonical basis $B_2$ of $\mathbb{R}^3$ and the canonical basis $B_3 = (E_{11}, E_{12}, E_{21}, E_{22})$ of $M_2(\mathbb{R})$.
a) Check that $T$ and $S$ are linear maps.
b) Write down the matrices of $T$ and $S$ with respect to the previous bases.
c) Find the matrix of the composition $S \circ T$ with respect to the previous bases.
d) Compute explicitly $S \circ T$, then find directly its matrix with respect to the previous bases and check that you obtain the same result as in c).

Solution. a) Let $u$ be a real number and let $(a,b,c)$ and $(a',b',c')$ be vectors in $\mathbb{R}^3$. Then
$$T(u(a,b,c) + (a',b',c')) = T(au + a', bu + b', cu + c') = (au + a') + 2(bu + b')X + 3(cu + c')X^2$$
$$= u(a + 2bX + 3cX^2) + (a' + 2b'X + 3c'X^2) = uT(a,b,c) + T(a',b',c'),$$
thus $T$ is linear. One checks similarly that $S$ is linear.
b) We start by computing the matrix $\mathrm{Mat}_{B_1,B_2}(T)$ of $T$ with respect to $B_1$ and $B_2$. Let $B_2 = (e_1, e_2, e_3)$ be the canonical basis of $\mathbb{R}^3$; then
$$T(e_1) = T(1,0,0) = 1 = 1 \cdot 1 + 0 \cdot X + 0 \cdot X^2,$$
$$T(e_2) = T(0,1,0) = 2X = 0 \cdot 1 + 2 \cdot X + 0 \cdot X^2,$$
$$T(e_3) = T(0,0,1) = 3X^2 = 0 \cdot 1 + 0 \cdot X + 3 \cdot X^2,$$
hence
$$\mathrm{Mat}_{B_1,B_2}(T) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
Similarly, we compute
$$S(1) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = 1 \cdot E_{11} + 1 \cdot E_{12} + 1 \cdot E_{21} + 0 \cdot E_{22},$$
$$S(X) = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} = 0 \cdot E_{11} + 1 \cdot E_{12} + 0 \cdot E_{21} + 1 \cdot E_{22},$$
$$S(X^2) = \begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix} = 0 \cdot E_{11} + 0 \cdot E_{12} + (-1) \cdot E_{21} + 0 \cdot E_{22},$$
hence
$$\mathrm{Mat}_{B_3,B_1}(S) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$
c) Using the theorem, we obtain
$$\mathrm{Mat}_{B_3,B_2}(S \circ T) = \mathrm{Mat}_{B_3,B_1}(S) \cdot \mathrm{Mat}_{B_1,B_2}(T) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 0 & -3 \\ 0 & 2 & 0 \end{pmatrix}.$$
d) We compute
$$(S \circ T)(a,b,c) = S(T(a,b,c)) = S(a + 2bX + 3cX^2) = \begin{pmatrix} a & a + 2b \\ a - 3c & 2b \end{pmatrix}.$$
Next,
$$(S \circ T)(e_1) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = 1 \cdot E_{11} + 1 \cdot E_{12} + 1 \cdot E_{21} + 0 \cdot E_{22},$$
$$(S \circ T)(e_2) = \begin{pmatrix} 0 & 2 \\ 0 & 2 \end{pmatrix} = 0 \cdot E_{11} + 2 \cdot E_{12} + 0 \cdot E_{21} + 2 \cdot E_{22}$$
and
$$(S \circ T)(e_3) = \begin{pmatrix} 0 & 0 \\ -3 & 0 \end{pmatrix} = 0 \cdot E_{11} + 0 \cdot E_{12} + (-3) \cdot E_{21} + 0 \cdot E_{22},$$
and so the matrix of $S \circ T$ is
$$\mathrm{Mat}_{B_3,B_2}(S \circ T) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 0 & -3 \\ 0 & 2 & 0 \end{pmatrix},$$
which coincides of course with the one obtained in part c).
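Theorem 5.34 can be sanity-checked on this very example. Below is a small numerical verification (our own sketch, not part of the text): we multiply the two matrices from part b) and compare with the matrix of the composition computed directly as in part d):

```python
import numpy as np

Mat_T = np.array([[1, 0, 0],
                  [0, 2, 0],
                  [0, 0, 3]])          # Mat_{B1,B2}(T)
Mat_S = np.array([[1, 0,  0],
                  [1, 1,  0],
                  [1, 0, -1],
                  [0, 1,  0]])         # Mat_{B3,B1}(S)

def ST(a, b, c):
    # (S o T)(a,b,c) written in the basis (E11, E12, E21, E22)
    return np.array([a, a + 2*b, a - 3*c, 2*b])

# the product of the matrices equals the matrix of the composition
assert np.array_equal(Mat_S @ Mat_T,
                      np.column_stack([ST(*e) for e in np.eye(3)]))
```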
Problem 5.37. Let $A \in M_n(F)$ and let $T \colon F^n \to F^n$ be the linear map sending $X$ to $AX$. Prove that $A$ is invertible if and only if $T$ is bijective.

Solution. If $A$ is invertible, let $B \in M_n(F)$ be such that $AB = BA = I_n$. Let $S \colon F^n \to F^n$ be the map $X \mapsto BX$. Then $S \circ T$ has associated matrix (with respect to the canonical basis in $F^n$) $BA = I_n$, thus $S \circ T = \mathrm{id}$. Similarly, $T \circ S = \mathrm{id}$, thus $T$ is bijective.
Next, suppose that $T$ is bijective and let $B$ be the matrix of $T^{-1}$ with respect to the canonical basis. Then $AB$ is the matrix of $T \circ T^{-1} = \mathrm{id}$ with respect to the canonical basis, thus $AB = I_n$. Similarly $BA = I_n$ and $A$ is invertible with inverse $B$.
Next, suppose that we have a linear map $T \colon V \to W$, with a given matrix $A$ with respect to two bases $B_1, C_1$ of $V, W$ respectively. Let us choose two new bases $B_2, C_2$ of $V, W$ respectively. We would like to understand the matrix of $T$ with respect to the new bases. To answer this, we need to introduce an important object:

Definition 5.38. Let $V$ be a vector space and let $B = (v_1, \ldots, v_n)$ and $B' = (v_1', \ldots, v_n')$ be two bases of $V$. The change of basis matrix from $B$ to $B'$ is the matrix $P = [p_{ij}]$ whose columns are the coordinates of the vectors $v_1', \ldots, v_n'$ when expressed in the basis $B$. Thus
$$v_j' = p_{1j}v_1 + \ldots + p_{nj}v_n$$
for $1 \le j \le n$.

Problem 5.39. Consider the vectors
$$v_1 = (1,2), \qquad v_2 = (1,3).$$
a) Prove that $B' = (v_1, v_2)$ is a basis of $\mathbb{R}^2$.
b) Find the change of basis matrix from $B'$ to the canonical basis of $\mathbb{R}^2$.

Solution. a) It suffices to check that $v_1$ and $v_2$ are linearly independent. If $av_1 + bv_2 = 0$ for some real numbers $a, b$, then
$$(a, 2a) + (b, 3b) = (0,0),$$
thus
$$a + b = 0, \qquad 2a + 3b = 0.$$
Replacing $b = -a$ in the second equation yields $a = b = 0$.
b) Let $B = (e_1, e_2)$ be the canonical basis. We need to express $e_1, e_2$ in terms of $v_1, v_2$. Let us look for $a, b$ such that
$$e_1 = av_1 + bv_2.$$
Equivalently, we want
$$a + b = 1, \qquad 2a + 3b = 0.$$
This has the unique solution $a = 3$, $b = -2$, thus
$$e_1 = 3v_1 - 2v_2.$$
Similarly, we obtain
$$e_2 = -v_1 + v_2.$$
The coordinates $3, -2$ of $e_1$ when written in the basis $B'$ yield the first column of the change of basis matrix, and the coordinates $-1, 1$ of $e_2$ when written in the basis $B'$ yield the second column, thus the change of basis matrix is
$$P = \begin{pmatrix} 3 & -1 \\ -2 & 1 \end{pmatrix}.$$
Problem 5.40. Let $V, B, B', P$ be as above. Consider a vector $v \in V$ and let $X$ and $X'$ be the column vectors representing the coordinates of $v$ with respect to the bases $B$ and $B'$. Prove that $X = PX'$.

Solution. Indeed, by definition we have
$$v = x_1v_1 + \ldots + x_nv_n = x_1'v_1' + \ldots + x_n'v_n',$$
thus
$$\sum_{k=1}^{n} x_kv_k = \sum_{j=1}^{n} x_j'v_j' = \sum_{j=1}^{n} x_j'\sum_{k=1}^{n} p_{kj}v_k = \sum_{k=1}^{n}\Big(\sum_{j=1}^{n} p_{kj}x_j'\Big)v_k = \sum_{k=1}^{n} (PX')_kv_k,$$
and since $v_1, \ldots, v_n$ are linearly independent, it follows that $X = PX'$.
Remark 5.41. The previous definition and problem are always a source of confusion and trouble, so let us insist on the following issue: the change of basis matrix from $B$ to $B'$ expresses $B'$ in terms of $B$; however (and this is very important in practice), as the problem shows, the change of basis matrix takes coordinates with respect to $B'$ to coordinates with respect to $B$. Thus we have a change of direction.

We also write $\mathrm{Mat}_B(B')$ for the change of basis matrix from $B$ to $B'$. A simple but very important observation is that
$$\mathrm{Mat}_B(B') = \mathrm{Mat}_{B,B'}(\mathrm{id}_V),$$
as follows directly from Proposition 5.28. Using this observation and Theorem 5.34, we deduce that for any bases $B, B', B''$ of $V$ we have
$$\mathrm{Mat}_B(B') \cdot \mathrm{Mat}_{B'}(B'') = \mathrm{Mat}_B(B'').$$
Since $\mathrm{Mat}_B(B) = I_n$ for any basis $B$, we deduce that
$$\mathrm{Mat}_B(B') \cdot \mathrm{Mat}_{B'}(B) = I_n.$$
Thus the change of basis matrix is invertible and its inverse is simply the change of basis matrix for the bases $B', B$.
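The direction of the change of basis matrix is exactly what trips people up, so a tiny numerical experiment is worth having. The sketch below (an added illustration, not from the text) uses the basis of Problem 5.39:

```python
import numpy as np

# B' = (v1, v2) from Problem 5.39, B = canonical basis of R^2
Q = np.array([[1, 1],
              [2, 3]])   # change of basis matrix from B to B': columns are v1, v2

v = np.array([5, 12])              # coordinates X of some vector w.r.t. B
X_prime = np.linalg.solve(Q, v)    # coordinates of the SAME vector w.r.t. B'
assert np.allclose(Q @ X_prime, v)  # X = Q X': Q converts B'-coords to B-coords

# the matrix P of Problem 5.39 (from B' to the canonical basis) is Q^{-1}
P = np.array([[3, -1],
              [-2, 1]])
assert np.allclose(P, np.linalg.inv(Q))
```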
Problem 5.42. Consider the families of vectors $B = (v_1, v_2, v_3)$, $B' = (w_1, w_2, w_3)$, where
$$v_1 = (0,1,1), \qquad v_2 = (1,0,1), \qquad v_3 = (1,1,0)$$
and
$$w_1 = (1,1,-1), \qquad w_2 = (1,0,-1), \qquad w_3 = (1,1,0).$$
a) Prove that $B$ and $B'$ are bases of $\mathbb{R}^3$.
b) Find the change of basis matrix $P$ from $B$ to $B'$ going back to the definition of $P$.
c) Find the change of basis matrix $P$ using the canonical basis of $\mathbb{R}^3$ and the previous theorem.
Solution. a) To prove that $B$ is a basis, we find the reduced row-echelon form of the matrix
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
using row-reduction. This yields $A_{\mathrm{ref}} = I_3$, and so $v_1, v_2, v_3$ are linearly independent, hence a basis of $\mathbb{R}^3$. We proceed similarly with $w_1, w_2, w_3$.
b) First, we use the definition of $P$: the columns of $P$ are the coordinates of $w_1, w_2, w_3$ when expressed in the basis $B$. We start by trying to express
$$w_1 = av_1 + bv_2 + cv_3 = (b+c,\; a+c,\; a+b),$$
which gives
$$b + c = 1, \qquad a + c = 1, \qquad a + b = -1,$$
with the solution
$$a = -\frac{1}{2} = b, \qquad c = \frac{3}{2}.$$
Thus
$$w_1 = -\frac{1}{2}v_1 - \frac{1}{2}v_2 + \frac{3}{2}v_3$$
and the first column of $P$ is $\begin{pmatrix} -1/2 \\ -1/2 \\ 3/2 \end{pmatrix}$. Similar arguments yield
$$w_2 = -v_1 + v_3,$$
hence the second column of $P$ is $\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$, and finally $w_3 = v_3$, thus the third column of $P$ is $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. We conclude that
$$P = \begin{pmatrix} -1/2 & -1 & 0 \\ -1/2 & 0 & 0 \\ 3/2 & 1 & 1 \end{pmatrix}.$$
c) Let $B'' = (e_1, e_2, e_3)$ be the canonical basis of $\mathbb{R}^3$. We want to find $\mathrm{Mat}_B(B')$ and we write it as
$$\mathrm{Mat}_B(B') = \mathrm{Mat}_B(B'') \cdot \mathrm{Mat}_{B''}(B') = (\mathrm{Mat}_{B''}(B))^{-1} \cdot \mathrm{Mat}_{B''}(B').$$
Next, by definition
$$\mathrm{Mat}_{B''}(B) = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \qquad \mathrm{Mat}_{B''}(B') = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ -1 & -1 & 0 \end{pmatrix}.$$
Next, using either the row-reduction algorithm or by solving the system $\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}X = b$, one computes
$$\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} -1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 1/2 & 1/2 & -1/2 \end{pmatrix}$$
and finally
$$P = \begin{pmatrix} -1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 1/2 & 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ -1 & -1 & 0 \end{pmatrix} = \begin{pmatrix} -1/2 & -1 & 0 \\ -1/2 & 0 & 0 \\ 3/2 & 1 & 1 \end{pmatrix}.$$
Without any miracle, we obtain the same result as in part b)!
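For readers who want to see part c) carried out by machine, here is a short check (our illustration, not the book's):

```python
import numpy as np

V = np.column_stack([(0, 1, 1), (1, 0, 1), (1, 1, 0)])    # Mat_{B''}(B)
W = np.column_stack([(1, 1, -1), (1, 0, -1), (1, 1, 0)])  # Mat_{B''}(B')

# change of basis matrix from B to B'
P = np.linalg.inv(V) @ W
print(P)   # [[-0.5 -1.  0. ] [-0.5  0.  0. ] [ 1.5  1.  1. ]]

# sanity check: each w_j equals the P-combination of the v_i
assert np.allclose(V @ P, W)
```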
Similar arguments give the following fundamental result:
Theorem 5.43. Let $T \colon V \to W$ be a linear map and let $B_1, B_2$ be two bases of $V$, and $C_1, C_2$ two bases of $W$. If $P = \mathrm{Mat}_{C_1}(C_2)$ and $Q = \mathrm{Mat}_{B_1}(B_2)$ are the change of basis matrices, then
$$\mathrm{Mat}_{C_2,B_2}(T) = P^{-1}\,\mathrm{Mat}_{C_1,B_1}(T)\,Q.$$
Proof. By Theorem 5.34 we have
$$P \cdot \mathrm{Mat}_{C_2,B_2}(T) = \mathrm{Mat}_{C_1,C_2}(\mathrm{id}_W) \cdot \mathrm{Mat}_{C_2,B_2}(T) = \mathrm{Mat}_{C_1,B_2}(T)$$
and similarly
$$\mathrm{Mat}_{C_1,B_1}(T) \cdot Q = \mathrm{Mat}_{C_1,B_1}(T) \cdot \mathrm{Mat}_{B_1,B_2}(\mathrm{id}_V) = \mathrm{Mat}_{C_1,B_2}(T).$$
Thus
$$P \cdot \mathrm{Mat}_{C_2,B_2}(T) = \mathrm{Mat}_{C_1,B_1}(T) \cdot Q$$
and the result follows by multiplying on the left by the inverse of the invertible matrix $P$.

Here is a different proof, which has the advantage that it also shows us how to recover the rather complicated formula in the previous theorem (experience shows that it is almost impossible to learn this formula by heart). It assumes familiarity with the result of Problem 5.40 (which is however much easier to remember!).
Write $A_1$ for the matrix of $T$ with respect to $B_1, C_1$ and $A_2$ for the matrix of $T$ with respect to $B_2, C_2$. Start with a vector $v$ in $V$ and write $X_1, X_2$ for its coordinates with respect to $B_1$ and $B_2$ respectively. By Problem 5.40 we have $X_1 = QX_2$. Let $Y_1, Y_2$ be the coordinates of $T(v)$ with respect to $C_1$ and $C_2$ respectively. Again by Problem 5.40 we have $Y_1 = PY_2$. On the other hand, by definition of $A_1$ and $A_2$ we have $A_1X_1 = Y_1$ and $A_2X_2 = Y_2$. Since $P$ and $Q$ are invertible, we obtain $X_2 = Q^{-1}X_1$ and so
$$A_1X_1 = Y_1 = PY_2 = PA_2X_2 = PA_2Q^{-1}X_1.$$
Since this holds for every $v \in V$ (equivalently, for any $X_1$), we deduce that $A_1 = PA_2Q^{-1}$ and so $A_2 = P^{-1}A_1Q$.
While the previous results are quite a pain in the neck to state and remember, the following special case is absolutely fundamental and rather easy to remember (or reprove):

Corollary 5.44. Let $T \colon V \to V$ be a linear transformation on a finite dimensional vector space $V$ and let $B, B'$ be bases of $V$. If $P$ is the change of basis matrix from $B$ to $B'$, then
$$\mathrm{Mat}_{B'}(T) = P^{-1}\,\mathrm{Mat}_B(T)\,P.$$
Here is how one should recover this result in case of doubt: write $X_v$, $X_v'$ for the column vectors representing the coordinates of $v \in V$ with respect to $B, B'$. Then
$$X_{T(v)} = \mathrm{Mat}_B(T)X_v, \qquad X'_{T(v)} = \mathrm{Mat}_{B'}(T)X'_v,$$
and by Problem 5.40 we have $X_v = PX'_v$ and $X_{T(v)} = PX'_{T(v)}$. Combining these relations yields
$$P \cdot \mathrm{Mat}_{B'}(T)X'_v = \mathrm{Mat}_B(T)PX'_v,$$
both sides being equal to $PX'_{T(v)} = X_{T(v)}$. Since this holds for all $X'_v$, we get $P \cdot \mathrm{Mat}_{B'}(T) = \mathrm{Mat}_B(T)P$, and multiplying by $P^{-1}$ yields the desired result.
Problem 5.45. Consider the matrix
$$A = \begin{pmatrix} 2 & -1 & 0 \\ -2 & 1 & -2 \\ 1 & 1 & 3 \end{pmatrix}$$
and let $T \colon \mathbb{R}^3 \to \mathbb{R}^3$ be the associated linear transformation, thus $T(X) = AX$ for all $X \in \mathbb{R}^3$. Consider the vectors
$$v_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}.$$
a) Prove that $v_1, v_2, v_3$ form a basis of $\mathbb{R}^3$ and compute the matrix of $T$ with respect to this basis.
b) Find the change of basis matrix from the canonical basis to the basis $(v_1, v_2, v_3)$.
c) Compute $A^n$ for all positive integers $n$.
Solution. a) It suffices to check that $v_1, v_2, v_3$ are linearly independent. If $a, b, c$ are real numbers such that $av_1 + bv_2 + cv_3 = 0$, we obtain
$$a + b + c = 0, \qquad a - c = 0, \qquad -a - b = 0.$$
The first and third equations yield $c = 0$, then the second one gives $a = 0$ and finally $b = 0$. Thus $v_1, v_2, v_3$ are linearly independent and hence they form a basis. Another method for proving this is as follows: consider the matrix whose columns are the vectors $v_1, v_2, v_3$ and use row-reduction to bring this matrix to its reduced row-echelon form. We end up with $I_3$ and this shows that $v_1, v_2, v_3$ is a basis of $\mathbb{R}^3$.
To compute the matrix of $T$ with respect to the new basis, we simply express each of the vectors $T(v_1), T(v_2), T(v_3)$ in terms of $v_1, v_2, v_3$. We have
$$T(v_1) = Av_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = v_1,$$
then
$$T(v_2) = Av_2 = \begin{pmatrix} 2 \\ 0 \\ -2 \end{pmatrix} = 2v_2$$
and
$$T(v_3) = Av_3 = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix} = 3v_3.$$
We conclude that the matrix of $T$ with respect to the basis $(v_1, v_2, v_3)$ is
$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
b) Call the change of basis matrix $P$. By definition, the columns of $P$ consist of the coordinates of $v_1, v_2, v_3$ with respect to the canonical basis of $\mathbb{R}^3$. Thus
$$P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}.$$
c) The matrix of $T$ with respect to $(v_1, v_2, v_3)$ is, thanks to the change of basis formula, equal to $P^{-1}AP$. Combining this observation with part a), we deduce that
$$P^{-1}AP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
Raising this equality to the $n$th power and taking into account that $(P^{-1}AP)^n = P^{-1}A^nP$ (this follows easily by induction on $n$) yields
$$P^{-1}A^nP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{pmatrix}.$$
It follows that
$$A^n = P\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{pmatrix}P^{-1}.$$
We can easily compute $P^{-1}$, either by expressing the vectors of the canonical basis in terms of $v_1, v_2, v_3$, or by solving the system $PX = b$. We end up with
$$P^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -2 \\ 1 & 0 & 1 \end{pmatrix}.$$
Finally,
$$A^n = P\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{pmatrix}P^{-1} = \begin{pmatrix} 1 - 2^n + 3^n & 1 - 2^n & 1 - 2^{n+1} + 3^n \\ 1 - 3^n & 1 & 1 - 3^n \\ 2^n - 1 & 2^n - 1 & 2^{n+1} - 1 \end{pmatrix}.$$
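A closed formula of this kind should always be tested for a few values of $n$. A quick check (added here as an illustration):

```python
import numpy as np

A = np.array([[ 2, -1,  0],
              [-2,  1, -2],
              [ 1,  1,  3]])
P = np.array([[ 1,  1,  1],
              [ 1,  0, -1],
              [-1, -1,  0]])

def A_power_formula(n):
    # the closed formula obtained in Problem 5.45
    return np.array([[1 - 2**n + 3**n, 1 - 2**n, 1 - 2**(n+1) + 3**n],
                     [1 - 3**n,        1,        1 - 3**n],
                     [2**n - 1,        2**n - 1, 2**(n+1) - 1]])

for n in range(1, 8):
    assert np.array_equal(np.linalg.matrix_power(A, n), A_power_formula(n))

# and P^{-1} A P is indeed diag(1, 2, 3)
assert np.allclose(np.linalg.inv(P) @ A @ P, np.diag([1.0, 2.0, 3.0]))
```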
Problem 5.46. Let $T \colon \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map defined by
$$T(x,y,z) = (2x + y - z,\; y,\; x + y).$$
Let $e_1, e_2, e_3$ be the canonical basis of $\mathbb{R}^3$ and let
$$v_1 = e_1 + e_3, \qquad v_2 = -e_1 + e_2, \qquad v_3 = e_1 + e_2 + e_3.$$
a) Prove that $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$.
b) Find the matrix of $T$ with respect to this basis.

Solution. a) In order to prove that $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$, it suffices to check that they are linearly independent. If
$$av_1 + bv_2 + cv_3 = 0$$
for some real numbers $a, b, c$, then
$$(a - b + c)e_1 + (b + c)e_2 + (a + c)e_3 = 0.$$
Since $e_1, e_2, e_3$ are linearly independent, this forces
$$a - b + c = 0, \qquad b + c = 0, \qquad a + c = 0.$$
The first and third equations yield $b = 0$, then $c = 0$ and $a = 0$. Thus $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$. Another method for proving this is as follows: consider the matrix $A$ whose columns are the coordinates of $v_1, v_2, v_3$ when expressed in terms of the canonical basis, that is
$$A = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$
Row-reduction yields $A_{\mathrm{ref}} = I_3$ and the result follows.
b) We compute
$$T(v_1) = T(1,0,1) = (1,0,1) = v_1,$$
then
$$T(v_2) = T(-1,1,0) = (-1,1,0) = v_2$$
and finally
$$T(v_3) = T(1,1,1) = (2,1,2).$$
To conclude, we need to express the vector $(2,1,2)$ in terms of $v_1, v_2, v_3$. We look therefore for $a, b, c$ such that
$$(2,1,2) = av_1 + bv_2 + cv_3,$$
or equivalently
$$(2,1,2) = (a - b + c,\; b + c,\; a + c).$$
Solving the corresponding linear system yields
$$a = 1, \qquad b = 0, \qquad c = 1.$$
Thus $T(v_3) = v_1 + v_3$ and so the matrix of $T$ with respect to $(v_1, v_2, v_3)$ is
$$B = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Motivated by the previous corollary, we introduce the following fundamental definition:

Definition 5.47. Two matrices $A, B \in M_n(F)$ are called similar or conjugate if there exists $P \in GL_n(F)$ such that $B = P^{-1}AP$. Equivalently, they are similar if they represent the same linear transformation of $V = F^n$ in possibly two different bases.

It is an easy exercise for the reader to prove that similarity is an equivalence relation on $M_n(F)$, that is:
• any matrix $A$ is similar to itself;
• if $A$ is similar to $B$, then $B$ is similar to $A$;
• if $A$ is similar to $B$ and $B$ is similar to $C$, then $A$ is similar to $C$.
One of the most fundamental problems in linear algebra is the classification of
matrices up to similarity. In fact, the main goal of the next chapters is to prove that
suitable matrices are similar to rather simple matrices: we dedicate a whole chapter
to matrices similar to diagonal and upper-triangular ones, and we will see in the last
chapter that any symmetric matrix with real entries is similar to a diagonal matrix.
5.3.1 Problems for practice

1. Let $B = (e_1, e_2)$ be the canonical basis of $\mathbb{R}^2$ and let $B' = (f_1, f_2)$, where
$$f_1 = e_1 + e_2, \qquad f_2 = e_1 + 2e_2.$$
a) Prove that $B'$ is a basis of $\mathbb{R}^2$.
b) Find the change of basis matrix $P$ from $B$ to $B'$, as well as its inverse.
c) Let $T$ be the linear transformation on $\mathbb{R}^2$ whose matrix with respect to the basis $B$ (both on the source and target of $\mathbb{R}^2$) is $A = \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix}$. Find the matrix of $T$ with respect to the bases $B'$ on the target and $B$ on the source.
2. Consider the matrix
$$A = \begin{pmatrix} 17 & -28 & 4 \\ 12 & -20 & 3 \\ 16 & -28 & 5 \end{pmatrix}$$
and the associated linear map $T \colon \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T(X) = AX$.
a) Find a basis $B_1$ of the kernel of $T$.
b) Let $V$ be the kernel of $T - \mathrm{id}$, where $\mathrm{id}$ is the identity map on $\mathbb{R}^3$. Give a basis $B_2$ of $V$.
c) Prove that $V \oplus \ker(T) = \mathbb{R}^3$.
d) Find the matrix of $T$ with respect to the basis $B_1 \cup B_2$ of $\mathbb{R}^3$.
3. Let $B = (v_1, v_2, v_3)$, where
$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},$$
and let $B' = (w_1, w_2, w_3)$, where
$$w_1 = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 3 \\ 2 \\ 4 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}.$$
a) Prove that $B$ and $B'$ are both bases of $\mathbb{R}^3$.
b) Find the change of basis matrix $P$ from $B$ to $B'$, as well as its inverse $P^{-1}$.
c) Consider the linear transformation $T \colon \mathbb{R}^3 \to \mathbb{R}^3$ whose matrix with respect to the basis $B$ (both on the source and target of $T$) is $\begin{pmatrix} 1 & 0 & 4 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix}$. Find the matrix of $T$ with respect to $B'$ (both on the source and target of $T$).
4. Let $V$ be a vector space over a field $F$, of dimension $n$. Let $T \colon V \to V$ be a projection (recall that this is a linear map such that $T \circ T = T$).
a) Prove that $V = \ker(T) \oplus \mathrm{Im}(T)$.
b) Prove that there is a basis of $V$ in which the matrix of $T$ is $\begin{pmatrix} I_i & 0 \\ 0 & O_{n-i} \end{pmatrix}$ for some $i \in \{0, 1, \ldots, n\}$.
5. Let $V$ be a vector space over $\mathbb{C}$ or $\mathbb{R}$, of dimension $n$. Let $T \colon V \to V$ be a symmetry (that is, a linear transformation such that $T \circ T = \mathrm{id}$, the identity map of $V$).
a) Prove that $V = \ker(T - \mathrm{id}) \oplus \ker(T + \mathrm{id})$.
b) Deduce that there is $i \in \{0, 1, \ldots, n\}$ and a basis of $V$ such that the matrix of $T$ with respect to this basis is $\begin{pmatrix} I_i & 0 \\ 0 & -I_{n-i} \end{pmatrix}$.
6. Let $T$ be the linear transformation on $\mathbb{R}^3$ whose matrix with respect to the canonical basis is
$$A = \begin{pmatrix} 1 & 1 & 1 \\ -6 & -4 & -2 \\ 3 & 1 & -1 \end{pmatrix}.$$
a) Check that $A^2 = -2A$.
b) Deduce that $T(v) = -2v$ for all $v \in \mathrm{Im}(T)$.
c) Prove that $\ker(T)$ and $\mathrm{Im}(T)$ are in direct sum position in $\mathbb{R}^3$.
d) Give bases for $\ker(T)$ and $\mathrm{Im}(T)$, and write the matrix of $T$ with respect to the basis of $\mathbb{R}^3$ deduced by patching the two bases of $\ker(T)$ and $\mathrm{Im}(T)$ respectively.
7. Let $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ and consider the map $T \colon M_2(\mathbb{C}) \to M_2(\mathbb{C})$ defined by
$$T(B) = AB - BA.$$
a) Prove that $T$ is linear.
b) Find the matrix of $T$ with respect to the canonical basis of $M_2(\mathbb{C})$.
8. Let $V$ be the vector space of polynomials with complex coefficients whose degree does not exceed 3. Let $T \colon V \to V$ be the map defined by $T(P) = P + P'$. Prove that $T$ is linear and find the matrix of $T$ with respect to the basis $1, X, X^2, X^3$ of $V$.
9. a) Find the matrix with respect to the canonical basis of the map which projects a vector $v \in \mathbb{R}^3$ to the $xy$-plane.
b) Find the matrix with respect to the canonical basis of the map which sends a vector $v \in \mathbb{R}^3$ to its reflection with respect to the $xy$-plane.
c) Let $\theta \in \mathbb{R}$. Find the matrix with respect to the canonical basis of the map which sends a vector $v \in \mathbb{R}^2$ to its rotation through an angle $\theta$, counterclockwise.
10. Let $V$ be a vector space of dimension $n$ over $F$. A flag in $V$ is a family of subspaces $V_0 \subset V_1 \subset \ldots \subset V_n$ such that $\dim V_i = i$ for all $i \in \{0, 1, \ldots, n\}$. Let $T \colon V \to V$ be a linear transformation. Prove that the following statements are equivalent:
a) There is a flag $V_0 \subset \ldots \subset V_n$ in $V$ such that $T(V_i) \subset V_i$ for all $i \in \{0, 1, \ldots, n\}$.
b) There is a basis of $V$ with respect to which the matrix of $T$ is upper-triangular.
11. Prove that the matrices
$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
are similar.
5.4 Rank of a Linear Map and Rank of a Matrix

In this section we discuss a very important numerical invariant associated with a linear transformation and with a matrix: its rank. All vector spaces over the field $F$ will be assumed to be finite dimensional in this section.

Definition 5.48. Let $V, W$ be finite dimensional vector spaces over $F$. The rank of a linear map $T \colon V \to W$ is the integer
$$\mathrm{rank}(T) = \dim \mathrm{Im}(T).$$
Let us try to understand more concretely the previous definition. Let $T \colon V \to W$ be a linear transformation and let $v_1, \ldots, v_n$ be a basis of $V$. Then the elements of $\mathrm{Im}(T)$ are of the form $T(v)$ with $v \in V$. Since $v_1, \ldots, v_n$ span $V$, each $v \in V$ can be written $v = x_1v_1 + \ldots + x_nv_n$ with $x_i \in F$, and
$$T(v) = T(x_1v_1 + \ldots + x_nv_n) = x_1T(v_1) + \ldots + x_nT(v_n).$$
Thus $T(v_1), \ldots, T(v_n)$ is a spanning set for $\mathrm{Im}(T)$ and
$$\mathrm{rank}(T) = \dim \mathrm{Span}(T(v_1), \ldots, T(v_n)).$$
Since we have already seen an algorithmic way of computing the span of a finite family of vectors (using row-reduction, see the discussion preceding Example 4.30), this gives an algorithmic way of computing the rank of a linear transformation. More precisely, pick a basis $w_1, \ldots, w_m$ of $W$ and express each of the vectors $T(v_1), \ldots, T(v_n)$ as linear combinations of $w_1, \ldots, w_m$. Consider the matrix $A$ whose rows are the coordinates of $T(v_1), \ldots, T(v_n)$ when expressed in the basis $w_1, \ldots, w_m$ of $W$. Performing elementary operations on the rows of $A$ does not change the span of $T(v_1), \ldots, T(v_n)$, so that $\mathrm{rank}(T)$ is the dimension of the span of the rows of $A_{\mathrm{ref}}$, the reduced row-echelon form of $A$. On the other hand, it is very easy to compute this last dimension: by definition of the reduced row-echelon form, the dimension of the span of the rows of $A_{\mathrm{ref}}$ is precisely the number of nonzero rows in $A_{\mathrm{ref}}$ or, equivalently, the number of pivots in $A_{\mathrm{ref}}$. Thus
$$\mathrm{rank}(T) = \text{number of nonzero rows of } A_{\mathrm{ref}} = \text{number of pivots in } A_{\mathrm{ref}}.$$
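In practice one can delegate the row-reduction to a library. A minimal sketch (our addition; it relies on numpy's `numpy.linalg.matrix_rank`, which computes the rank from singular values rather than from an explicit $A_{\mathrm{ref}}$, but returns the same number):

```python
import numpy as np

def rank_of_map(images):
    # rows of A are the coordinates of T(v_1), ..., T(v_n) in a basis of W;
    # matrix_rank counts (numerically) the pivots of the row-echelon form
    return np.linalg.matrix_rank(np.array(images))

# example: T(x, y, z) = (x + y, y + z) from R^3 to R^2
# T(e1) = (1, 0), T(e2) = (1, 1), T(e3) = (0, 1)
print(rank_of_map([(1, 0), (1, 1), (0, 1)]))  # 2
```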
Let us see two concrete examples:
Problem 5.49. Compute the rank of the linear map $T \colon \mathbb{R}^3 \to \mathbb{R}^4$ defined by
$$T(x,y,z) = (x + y + z,\; x - y,\; y - z,\; z - x).$$
Solution. We let $v_1, v_2, v_3$ be the canonical basis of $\mathbb{R}^3$ and compute
$$T(v_1) = T(1,0,0) = (1,1,0,-1),$$
thus the first row of the matrix $A$ in the previous discussion is $(1,1,0,-1)$. We do the same with $v_2, v_3$ and we obtain
$$A = \begin{pmatrix} 1 & 1 & 0 & -1 \\ 1 & -1 & 1 & 0 \\ 1 & 0 & -1 & 1 \end{pmatrix}.$$
Using row-reduction we compute
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}$$
and we deduce that
$$\mathrm{rank}(T) = 3.$$
Problem 5.50. Let $V$ be the space of polynomials with real coefficients of degree not exceeding 3, and let $T \colon V \to V$ be the linear map defined by
$$T(P(X)) = P(X+1) - P(X).$$
Find $\mathrm{rank}(T)$.

Solution. We start by choosing the canonical basis $1, X, X^2, X^3$ of $V$ and computing
$$T(1) = 0, \qquad T(X) = X + 1 - X = 1, \qquad T(X^2) = (X+1)^2 - X^2 = 1 + 2X$$
and
$$T(X^3) = (X+1)^3 - X^3 = 1 + 3X + 3X^2.$$
The matrix $A$ in the previous discussion is
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 1 & 3 & 3 & 0 \end{pmatrix}$$
and row-reduction yields
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
There are three pivots, thus
$$\mathrm{rank}(T) = 3.$$
We turn now to a series of more theoretical exercises, which establish some other important properties of the rank of a linear map. In all problems below we assume that the vector spaces appearing in the statements are finite dimensional.

Problem 5.51. Let $T \colon V \to W$ be a linear map. Prove that
$$\mathrm{rank}(T) \le \min(\dim V, \dim W).$$
Solution. Since $\mathrm{Im}(T) \subset W$, we have $\mathrm{rank}(T) \le \dim W$. As we have already seen, if $v_1, \ldots, v_n$ is a basis of $V$, then $\mathrm{Im}(T)$ is spanned by $T(v_1), \ldots, T(v_n)$, thus
$$\mathrm{rank}(T) \le n = \dim V.$$
The result follows by combining the two inequalities.
Problem 5.52. Let $T_1 \colon U \to V$ and $T_2 \colon V \to W$ be linear maps. Prove that
$$\mathrm{rank}(T_2 \circ T_1) \le \min(\mathrm{rank}(T_1), \mathrm{rank}(T_2)).$$
Solution. The image of $T_2 \circ T_1$ is included in that of $T_2$, thus $\mathrm{rank}(T_2 \circ T_1) \le \mathrm{rank}(T_2)$. Next, we consider the restriction $T_2'$ of $T_2$ to $\mathrm{Im}(T_1)$, obtaining a linear map $T_2' \colon \mathrm{Im}(T_1) \to W$ whose image clearly equals that of $T_2 \circ T_1$. Applying Problem 5.51 to $T_2'$ we obtain
$$\mathrm{rank}(T_2 \circ T_1) = \mathrm{rank}(T_2') \le \dim(\mathrm{Im}(T_1)) = \mathrm{rank}(T_1),$$
and the result follows.

Problem 5.53. Let $T_1, T_2 \colon V \to W$ be linear transformations. Prove that
$$|\mathrm{rank}(T_1) - \mathrm{rank}(T_2)| \le \mathrm{rank}(T_1 + T_2) \le \mathrm{rank}(T_1) + \mathrm{rank}(T_2).$$
Solution. We have $\mathrm{Im}(T_1 + T_2) \subset \mathrm{Im}(T_1) + \mathrm{Im}(T_2)$ and so
$$\mathrm{rank}(T_1 + T_2) \le \dim(\mathrm{Im}(T_1) + \mathrm{Im}(T_2)) \le \dim \mathrm{Im}(T_1) + \dim \mathrm{Im}(T_2) = \mathrm{rank}(T_1) + \mathrm{rank}(T_2),$$
establishing the inequality on the right. On the other hand, we clearly have $\mathrm{Im}(-T_2) = \mathrm{Im}(T_2)$, thus $\mathrm{rank}(-T_2) = \mathrm{rank}(T_2)$. Applying what we have already proved, we obtain
$$\mathrm{rank}(T_1 + T_2) + \mathrm{rank}(T_2) = \mathrm{rank}(T_1 + T_2) + \mathrm{rank}(-T_2) \ge \mathrm{rank}(T_1),$$
thus $\mathrm{rank}(T_1 + T_2) \ge \mathrm{rank}(T_1) - \mathrm{rank}(T_2)$. We conclude using the symmetry in $T_1$ and $T_2$.

Problem 5.54. Prove that if $S_1 \colon U \to V$, $T \colon V \to W$ and $S_2 \colon W \to Z$ are linear maps such that $S_1, S_2$ are bijective, then
$$\mathrm{rank}(S_2 \circ T \circ S_1) = \mathrm{rank}(T).$$
Solution. Since $S_1$ is bijective, we have
$$(T \circ S_1)(U) = T(S_1(U)) = T(V) = \mathrm{Im}(T).$$
Since $S_2$ is bijective, it realizes an isomorphism between $(T \circ S_1)(U)$ and $S_2((T \circ S_1)(U))$, thus these two spaces have the same dimension. We conclude that
$$\mathrm{rank}(T) = \dim \mathrm{Im}(T) = \dim(T \circ S_1)(U) = \dim S_2((T \circ S_1)(U)) = \dim(S_2 \circ T \circ S_1)(U) = \mathrm{rank}(S_2 \circ T \circ S_1).$$
Note that we only used the surjectivity of $S_1$ and the injectivity of $S_2$.
We will now prove the first fundamental theorem concerning the rank of a linear map:

Theorem 5.55 (The Rank-Nullity Theorem). Let $V, W$ be vector spaces over a field $F$ and let $T \colon V \to W$ be a linear transformation. If $V$ is finite-dimensional, then
$$\dim \ker T + \mathrm{rank}(T) = \dim V. \tag{5.3}$$
Proof. Let $n = \dim V$ and let $r = \dim \ker T$. Since $\ker T$ is a subspace of $V$, we have $r \le n$, in particular $r < \infty$. We need to prove that $\dim \mathrm{Im}\,T = n - r$.
Let $v_1, \ldots, v_r$ be a basis of $\ker T$ and extend it to a basis $v_1, \ldots, v_n$ of $V$. We will prove that $T(v_{r+1}), \ldots, T(v_n)$ form a basis of $\mathrm{Im}(T)$, which will yield the desired result.
Let us start by proving that $T(v_{r+1}), \ldots, T(v_n)$ are linearly independent. Suppose that $a_{r+1}, \ldots, a_n$ are scalars in $F$ such that
$$a_{r+1}T(v_{r+1}) + \ldots + a_nT(v_n) = 0.$$
This equality can be written as $T(a_{r+1}v_{r+1} + \ldots + a_nv_n) = 0$, or equivalently $a_{r+1}v_{r+1} + \ldots + a_nv_n \in \ker T$. We can therefore write
$$a_{r+1}v_{r+1} + \ldots + a_nv_n = b_1v_1 + \ldots + b_rv_r$$
for some scalars $b_1, \ldots, b_r \in F$. But since $v_1, \ldots, v_n$ form a basis of $V$, the last relation forces $a_{r+1} = \ldots = a_n = 0$ and $b_1 = \ldots = b_r = 0$, proving that $T(v_{r+1}), \ldots, T(v_n)$ are linearly independent.
Next, we prove that $T(v_{r+1}), \ldots, T(v_n)$ span $\mathrm{Im}(T)$. Let $x \in \mathrm{Im}(T)$. By definition, there is $v \in V$ such that $x = T(v)$. Since $v_1, \ldots, v_n$ span $V$, we can find scalars $a_1, \ldots, a_n \in F$ such that $v = a_1v_1 + \ldots + a_nv_n$. Since $v_1, \ldots, v_r \in \ker T$, we obtain
$$x = T(v) = \sum_{i=1}^{n} a_iT(v_i) = \sum_{i=r+1}^{n} a_iT(v_i) \in \mathrm{Span}(T(v_{r+1}), \ldots, T(v_n)).$$
This finishes the proof of the theorem.
Corollary 5.56. Let $V$ be a finite-dimensional vector space over a field $F$ and let $T \colon V \to V$ be a linear transformation. Then the following assertions are equivalent:
a) $T$ is injective.
b) $T$ is surjective.
c) $T$ is bijective.

Proof. Suppose that a) holds. Then the rank-nullity theorem and the fact that $\ker T = 0$ yield $\dim \mathrm{Im}(T) = \dim V$. Since $\mathrm{Im}(T)$ is a subspace of the finite-dimensional vector space $V$ and $\dim \mathrm{Im}(T) = \dim V$, we deduce that $\mathrm{Im}(T) = V$ and so $T$ is surjective, thus b) holds.
Suppose now that b) holds, thus $\dim \mathrm{Im}(T) = \dim V$. The rank-nullity theorem yields $\dim \ker T = 0$, thus $\ker T = 0$ and then $T$ is injective. Since it is also surjective by assumption, it follows that c) holds. Since c) clearly implies a), the result follows.
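Here is a tiny numerical illustration of the rank-nullity theorem (our addition, not the book's): for the matrix below one can exhibit the kernel by hand and let the computer supply the rank.

```python
import numpy as np

# T : R^4 -> R^3 given by the matrix A; note row3 = row1 + row2
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 1, 1]])

rank = np.linalg.matrix_rank(A)            # 2
# two independent kernel vectors, found by inspection
K = np.array([[1, 0, -1, 0],
              [0, 1, 0, -1]]).T
assert np.allclose(A @ K, 0)
assert rank + K.shape[1] == A.shape[1]     # rank T + dim ker T = dim V = 4
```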
Remark 5.57. Without the assumption that $V$ is finite dimensional, the previous result no longer holds: one can find linear transformations $T \colon V \to V$ which are injective and not surjective, and linear transformations which are surjective and not injective. Indeed, let $V$ be the space of all sequences $(x_n)_{n \ge 0}$ of real numbers and define two maps $T_1, T_2 \colon V \to V$ by
$$T_1(x_0, x_1, \ldots) = (x_1, x_2, \ldots), \qquad T_2(x_0, x_1, \ldots) = (0, x_0, x_1, \ldots).$$
Then $T_1$ is surjective but not injective, and $T_2$ is injective but not surjective.
Problem 5.58. Let $A$ and $B$ be $n \times n$ matrices such that $AB$ is invertible. Show that both $A$ and $B$ are invertible.

Solution. Let $T_1 \colon F^n \to F^n$ and $T_2 \colon F^n \to F^n$ be the linear maps associated with $A$ and $B$ respectively (so $T_1(X) = AX$ and $T_2(X) = BX$). Then $AB$ is the matrix of the linear map $T_1 \circ T_2$ with respect to the canonical basis of $F^n$ (both on the source and on the target). Since $AB$ is invertible, we deduce that $T_1 \circ T_2$ is bijective, hence $T_2$ is injective and $T_1$ is surjective. But an injective or surjective linear transformation on a finite dimensional vector space is automatically bijective. Thus $T_1$ and $T_2$ are both bijective and the result follows from Problem 5.37.

Problem 5.59. Let $A, B \in M_n(\mathbb{C})$ satisfy $AB = I_n$. Prove that $BA = I_n$.

Solution. By the previous problem, $A$ and $B$ are invertible. Multiplying the relation $AB = I_n$ on the right by $A^{-1}$ yields $B = A^{-1}$. Thus $BA = A^{-1}A = I_n$.

Problem 5.60. Show that if $A$ and $B$ are square matrices in $M_n(\mathbb{C})$ with $AB = A + B$, then $AB = BA$.

Solution. The condition $AB = A + B$ implies $(A - I_n)(B - I_n) = I_n$. Therefore $A - I_n$ and $B - I_n$ are mutually inverse and $(B - I_n)(A - I_n) = I_n$, which implies $BA = A + B = AB$.
Problem 5.61. Let $T \colon \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(x,y,z) = (x - y,\; 2x - y - z,\; x - 2y + z).$$
Find the kernel of $T$ and the rank of $T$.

Solution. In order to find the kernel of $T$, we need to find those $(x,y,z) \in \mathbb{R}^3$ such that
$$x - y = 0, \qquad 2x - y - z = 0, \qquad x - 2y + z = 0.$$
The first equation gives $x = y$, the second one $z = x$, and so $x = y = z$, which satisfies all three equations. It follows that the kernel of $T$ is the subspace $\{(x,x,x) \mid x \in \mathbb{R}\}$, which is precisely the line spanned by the vector $(1,1,1)$.
Next, the rank of $T$ can be determined from the rank-nullity theorem:
$$3 = \dim \mathbb{R}^3 = \dim \ker T + \mathrm{rank}(T) = 1 + \mathrm{rank}(T),$$
thus $\mathrm{rank}(T) = 2$.
We turn now to the analogous concept for matrices.

Definition 5.62. Let $A \in M_{m,n}(F)$. The rank of $A$ is the integer $\mathrm{rank}(A)$ defined as the rank of the linear map $F^n \to F^m$ sending $X$ to $AX$ (i.e., the canonical linear map attached to $A$).

Remark 5.63. We can restate the results established in Problems 5.51, 5.52, 5.53, and 5.54 in terms of matrices as follows:
a) $\mathrm{rank}(A) \le \min(m,n)$ if $A \in M_{m,n}(F)$.
b) $|\mathrm{rank}(A) - \mathrm{rank}(B)| \le \mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$ for all $A, B \in M_{m,n}(F)$.
c) $\mathrm{rank}(PAQ) = \mathrm{rank}(A)$ for all $P \in GL_m(F)$, $A \in M_{m,n}(F)$ and $Q \in GL_n(F)$. That is, the rank of a matrix does not change if we multiply it (on the left or on the right) by invertible matrices.
d) $\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$ for $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$.

Of course, we can also make the definition very concrete: let $A \in M_{m,n}(F)$ and let $e_1, e_2, \ldots, e_n$ be the canonical basis of $F^n$. Write $\varphi \colon F^n \to F^m$ for the linear map $X \mapsto AX$ canonically attached to $A$. By the previous discussion $\mathrm{Im}(\varphi)$ is the span of $\varphi(e_1), \ldots, \varphi(e_n)$. Now, if $C_1, \ldots, C_n$ are the columns of $A$, seen as column vectors in $F^m$, then by definition $\varphi(e_i) = C_i$ for all $i$. We conclude that the image of $\varphi$ is the span of $C_1, \ldots, C_n$.
Let us summarize the previous discussion in an important theorem:

Theorem 5.64. Let $A \in M_{m,n}(F)$ and let $C_1, C_2, \ldots, C_n \in F^m$ be its columns. Then
$$\mathrm{rank}(A) = \dim \mathrm{Span}(C_1, C_2, \ldots, C_n).$$
So, following the previous discussion, we obtain the following algorithm for computing the rank of $A$: consider the transpose ${}^tA$ of $A$ (thus the columns of $A$ become rows in the new matrix) and bring it to its reduced row-echelon form. Then count the number of nonzero rows or, equivalently, the number of pivots. This is the rank of $A$. We will see later on (see Problem 5.70) that the trick of considering the transpose of $A$ is actually not necessary: $A$ and ${}^tA$ have the same rank. Of course, we can also avoid considering the transpose matrix and instead use column operations on $A$.
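The equality of row rank and column rank promised in Problem 5.70 can be probed numerically before it is proved. A quick random experiment (our sketch, not the book's):

```python
import numpy as np

rng = np.random.default_rng(42)
for _ in range(1000):
    m, n = rng.integers(1, 7, size=2)
    A = rng.integers(-2, 3, size=(m, n))
    # rank via columns (A) and via rows (its transpose) agree
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
```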
Problem 5.65. Compute the rank of the matrix
$$A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 2 & 2 & 1 & 0 \\ 0 & 2 & 1 & 1 \\ 1 & 1 & 1 & 2 \\ 1 & 3 & 1 & 1 \end{pmatrix}.$$
Solution. Following the previous discussion, we bring the matrix
$${}^tA = \begin{pmatrix} 1 & 2 & 0 & 1 & 1 \\ 1 & 2 & 2 & 1 & 3 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 & 1 \end{pmatrix}$$
to its reduced row-echelon form by row-reduction:
$$({}^tA)_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}.$$
Since there are 4 nonzero rows, we deduce that
$$\mathrm{rank}(A) = 4.$$
Problem 5.66 (Sylvester's Inequality). Prove that for all $A, B \in M_n(F)$ we have
$$\mathrm{rank}(AB) \ge \mathrm{rank}(A) + \mathrm{rank}(B) - n.$$
Solution. Consider $V = F^n$ and the linear transformations $T_1, T_2 \colon V \to V$ sending $X$ to $AX$, respectively $BX$. We need to prove that
$$\mathrm{rank}(T_1 \circ T_2) \ge \mathrm{rank}(T_1) + \mathrm{rank}(T_2) - \dim V.$$
By the rank-nullity theorem we know that
$$\mathrm{rank}(T_1) - \dim V = -\dim \ker T_1,$$
thus it suffices to prove that
$$\mathrm{rank}(T_2) - \mathrm{rank}(T_1 \circ T_2) \le \dim \ker T_1.$$
Let $W = T_2(V) = \mathrm{Im}(T_2)$ and let $T_1' \colon W \to V$ be the restriction of $T_1$ to $W$. Then, using again the rank-nullity theorem, we obtain
$$\mathrm{rank}(T_1 \circ T_2) = \dim T_1(W) = \mathrm{rank}(T_1') = \dim W - \dim \ker T_1'.$$
Now $\dim W = \mathrm{rank}(T_2)$, so we are reduced to proving that
$$\dim \ker T_1' \le \dim \ker T_1.$$
This is clear, as $\ker T_1' = \ker T_1 \cap W \subset \ker T_1$.
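Sylvester's inequality, together with the upper bound of Remark 5.63 d), pins $\mathrm{rank}(AB)$ into a small window; a random test (our illustration) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
for _ in range(500):
    A = rng.integers(-2, 3, size=(n, n))
    B = rng.integers(-2, 3, size=(n, n))
    rA, rB = np.linalg.matrix_rank(A), np.linalg.matrix_rank(B)
    rAB = np.linalg.matrix_rank(A @ B)
    assert rA + rB - n <= rAB <= min(rA, rB)   # Sylvester and Problem 5.52
```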
Problem 5.67. Let $A \in M_{3,2}(\mathbb{R})$ and $B \in M_{2,3}(\mathbb{R})$ be matrices such that
$$AB = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ 1 & 1 & 2 \end{pmatrix}.$$
a) Check that $(AB)^2 = AB$ and that $AB$ has rank 2.
b) Prove that $BA$ is invertible.
c) Prove that $(BA)^3 = (BA)^2$ and deduce that $BA = I_2$.

Solution. a) One checks using the product rule that the matrix
$$X = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ 1 & 1 & 2 \end{pmatrix}$$
satisfies $X^2 = X$. Next, one computes the rank of $X$ by computing the reduced row-echelon form of ${}^tX$:
$$({}^tX)_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}.$$
Since there are two pivots, $AB = X$ has rank 2.
b) Using Remark 5.63, we obtain
$$\mathrm{rank}(BA) \ge \mathrm{rank}(A(BA)B) = \mathrm{rank}((AB)^2) = \mathrm{rank}(AB) = 2.$$
On the other hand, $BA$ is a $2 \times 2$ matrix, thus necessarily $\mathrm{rank}(BA) = 2$ and so $BA$ is invertible.
c) Since $(AB)^2 = AB$, we have
$$B(AB)^2A = B(AB)A = (BA)^2.$$
The left-hand side equals $(BA)^3$, and so $(BA)^3 = (BA)^2$. Since $BA$ is invertible, it follows that $BA = I_2$ and the problem is solved.
The second fundamental theorem concerning rank is the following:

Theorem 5.68. Let $A \in M_{m,n}(F)$ and let $0 \le r \le \min(m,n)$. Then $\mathrm{rank}(A) = r$ if and only if there are matrices $P \in GL_m(F)$ and $Q \in GL_n(F)$ such that $A = PJ_rQ$, where
$$J_r = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in M_{m,n}(F).$$
Proof. If $A = PJ_rQ$, then by part c) of Remark 5.63 we have $\mathrm{rank}(A) = \mathrm{rank}(J_r)$. The linear map associated with $J_r$ is $(x_1, \ldots, x_n) \mapsto (x_1, \ldots, x_r, 0, \ldots, 0)$, and its image is $F^r \times \{0\}$, which has dimension $r$, thus $\mathrm{rank}(J_r) = r$. This proves one implication.
Assume now that $\mathrm{rank}(A) = r$ and let $T \colon F^n \to F^m$ be the linear map sending $X$ to $AX$, so that $A$ is the matrix of $T$ with respect to the canonical bases of $F^n$ and $F^m$. Thanks to Theorem 5.43, it suffices to prove that we can find two bases $B_1, B_2$ of $F^n, F^m$ respectively such that the matrix of $T$ with respect to $B_1, B_2$ is $J_r$. In order to construct $B_1$ and $B_2$, we start with a basis $e_1, \ldots, e_{n-r}$ of $\ker T$ (note that $\dim \ker T = n - r$ by the rank-nullity theorem) and we complete it to a basis $e_1, \ldots, e_n$ of $F^n$. Let $f_i = T(e_{n-r+i})$ for $1 \le i \le r$. We claim that $f_1, \ldots, f_r$ is a basis of $\mathrm{Im}(T)$. Since $\dim \mathrm{Im}(T) = r$, it suffices to see that $f_1, \ldots, f_r$ span $\mathrm{Im}(T)$. But any $x \in \mathrm{Im}(T)$ can be written $x = T(a_1e_1 + \ldots + a_ne_n)$, and since $T(e_j) = 0$ for $1 \le j \le n - r$, we have
$$x = a_{n-r+1}f_1 + \ldots + a_nf_r \in \mathrm{Span}(f_1, \ldots, f_r),$$
proving the claim (this argument has already been used in the last paragraph of the proof of the rank-nullity theorem).
Complete now $f_1, \ldots, f_r$ to a basis $f_1, \ldots, f_m$ of $F^m$. Set $B_1 = (e_{n-r+1}, \ldots, e_n, e_1, \ldots, e_{n-r})$ and $B_2 = (f_1, \ldots, f_m)$. Then by construction the matrix of $T$ with respect to $B_1$, $B_2$ is $J_r$ and the theorem is proved.
Corollary 5.69. Let $A, B \in M_{m,n}(F)$. Then $\mathrm{rank}(A) = \mathrm{rank}(B)$ if and only if there are matrices $P \in GL_m(F)$ and $Q \in GL_n(F)$ such that $B = PAQ$.

Proof. If $B = PAQ$ with $P, Q$ invertible, then the result follows from part c) of Remark 5.63. Assume that $\mathrm{rank}(A) = \mathrm{rank}(B) = r$; then by the previous theorem we can write $A = P_1J_rQ_1$ and $B = P_2J_rQ_2$ for invertible matrices $P_1, P_2, Q_1, Q_2$. Setting $P = P_2P_1^{-1}$ and $Q = Q_1^{-1}Q_2$, we obtain $B = PAQ$.
Problem 5.70. Prove that for all $A \in M_{m,n}(F)$ we have
$$\mathrm{rank}(A) = \mathrm{rank}({}^tA).$$
Solution. Say $A$ has rank $r$ and write $A = PJ_rQ$ with $P \in GL_m(F)$ and $Q \in GL_n(F)$. Then ${}^tA = {}^tQ\,{}^tJ_r\,{}^tP$ and since ${}^tP, {}^tQ$ are invertible, we have $\mathrm{rank}({}^tA) = \mathrm{rank}({}^tJ_r)$. Since ${}^tJ_r = J_r$ (viewed in $M_{n,m}(F)$), we conclude that $\mathrm{rank}({}^tA) = \mathrm{rank}(A) = r$.

Problem 5.71. Let $A \in M_n(\mathbb{C})$. Find, as a function of $A$, the smallest integer $r$ such that $A$ can be written as a sum of $r$ matrices of rank 1.

Solution. For all matrices $X, Y \in M_n(\mathbb{C})$ we have
$$\mathrm{rank}(X + Y) \le \mathrm{rank}(X) + \mathrm{rank}(Y),$$
thus if $A = A_1 + \cdots + A_s$ with $\mathrm{rank}(A_i) = 1$, then
$$\mathrm{rank}(A) = \mathrm{rank}\Big(\sum_{i=1}^{s} A_i\Big) \le \sum_{i=1}^{s} \mathrm{rank}(A_i) = s.$$
We will prove that we can write $A$ as a sum of $\mathrm{rank}(A)$ matrices of rank 1, which will imply that the answer to the problem is $\mathrm{rank}(A)$. Indeed, if $A$ has rank $k$, write $A = PJ_kQ$ for some $P, Q \in GL_n(\mathbb{C})$. Thus $A = A_1 + A_2 + \cdots + A_k$, where $A_i = PE_{ii}Q$ and $E_{ii}$ is the matrix having all entries 0 except for entry $(i,i)$, which is 1. Clearly $A_i$ has rank 1 (since $P, Q$ are invertible and $E_{ii}$ has rank 1).
Problem 5.72. Let $A \in M_n(F)$ have rank $r \in [1, n-1]$. Prove that there exist $B \in M_{n,r}(F)$, $C \in M_{r,n}(F)$ with
$$\mathrm{rank}(B) = \mathrm{rank}(C) = r,$$
such that $A = BC$.

Solution. Write $A = PJ_rQ$, where $P, Q$ are invertible $n \times n$ matrices. Note that choosing $B_1 = \begin{pmatrix} I_r \\ 0 \end{pmatrix} \in M_{n,r}(F)$ and $C_1 = \begin{pmatrix} I_r & 0 \end{pmatrix} \in M_{r,n}(F)$, we have $J_r = B_1C_1$ and $B_1, C_1$ both have rank $r$. But then
$$A = PJ_rQ = (PB_1)(C_1Q),$$
and $B = PB_1 \in M_{n,r}(F)$, $C = C_1Q \in M_{r,n}(F)$ both have rank $r$, since $P, Q$ are invertible (Remark 5.63). The problem is solved.
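A full-rank factorization $A = BC$ can be produced in many ways; the proof above uses $J_r$, while numerically it is convenient to extract one from the singular value decomposition. A hedged sketch (our own construction, not the book's method):

```python
import numpy as np

def rank_factorization(A, tol=1e-10):
    # returns B (n x r) and C (r x n) of rank r with A = B C,
    # built from the SVD instead of the J_r decomposition used in the proof
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))
    B = U[:, :r] * s[:r]          # n x r, full column rank
    C = Vt[:r, :]                 # r x n, full row rank
    return B, C

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 1., 1.]])      # rank 2
B, C = rank_factorization(A)
assert B.shape == (3, 2) and C.shape == (2, 3)
assert np.allclose(B @ C, A)
```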
Problem 5.73. Let $A = [a_{ij}] \in M_n(\mathbb{C})$ be a matrix of rank 1. Prove that there exist complex numbers $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$ such that $a_{ij} = x_iy_j$ for all integers $1 \le i, j \le n$.
Solution. According to the previous problem there exist two matrices $B \in M_{n,1}(\mathbb{C})$, $C \in M_{1,n}(\mathbb{C})$ so that $A = BC$. If
$$B = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad C = \begin{pmatrix} y_1 & y_2 & \ldots & y_n \end{pmatrix},$$
then
$$A = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}\begin{pmatrix} y_1 & y_2 & \ldots & y_n \end{pmatrix} = \begin{pmatrix} x_1y_1 & x_1y_2 & \ldots & x_1y_n \\ x_2y_1 & x_2y_2 & \ldots & x_2y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_ny_1 & x_ny_2 & \ldots & x_ny_n \end{pmatrix}.$$
5.4.1 Problems for practice

1. a) Find the rank of the linear transformation
$$T \colon \mathbb{R}^3 \to \mathbb{R}^3, \qquad T(x,y,z) = (x - y,\; y - z,\; z - x).$$
b) Answer the same question with $\mathbb{R}$ replaced with $F_2$.
2. Let $T$ be the linear transformation on $\mathbb{R}^3$ whose matrix with respect to the canonical basis is
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Find a basis of $\mathrm{Im}(T)$ and $\ker(T)$, and compute the rank of $T$.
3. Compute the rank of the matrices
$$A = \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 1 & 3 & 4 \\ 2 & 2 & 2 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 & 1 & 3 \\ 1 & 1 & 1 & 1 \\ 2 & 1 & 1 & 4 \\ 2 & 2 & 2 & 2 \\ 3 & 2 & 2 & 3 \end{pmatrix}.$$
4. Let $A, B \in M_3(F)$ be two matrices such that $AB = O_3$. Prove that
$$\min(\mathrm{rank}(A), \mathrm{rank}(B)) \le 1.$$
5. Let $A \in M_3(\mathbb{C})$ be a matrix such that $A^2 = O_3$.
a) Prove that $A$ has rank 0 or 1.
b) Deduce the general form of all matrices $A \in M_3(\mathbb{C})$ such that $A^2 = O_3$.
6. Find the rank of the matrix $A = [\cos(i-j)]_{1 \le i,j \le n}$.
7. a) Let $V$ be an $n$-dimensional vector space over $F$ and let $T \colon V \to V$ be a linear transformation. Let $T^j$ be the $j$-fold iterate of $T$ (so $T^2 = T \circ T$, $T^3 = T \circ T \circ T$, etc.). Prove that
$$\mathrm{Im}(T^n) = \mathrm{Im}(T^{n+1}).$$
Hint: check that if $\mathrm{Im}(T^j) = \mathrm{Im}(T^{j+1})$ for some $j$, then $\mathrm{Im}(T^k) = \mathrm{Im}(T^{k+1})$ for $k \ge j$.
b) Let $A \in M_n(\mathbb{C})$ be a matrix. Prove that $A^n$ and $A^{n+1}$ have the same rank.
8. Let $A \in M_n(F)$ be a matrix of rank 1. Prove that
$$A^2 = \mathrm{Tr}(A)A.$$
9. Let $A \in M_m(F)$ and $B \in M_n(F)$. Prove that
$$\mathrm{rank}\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} = \mathrm{rank}(A) + \mathrm{rank}(B).$$
10. Prove that for any matrices $A \in M_{n,m}(F)$ and $B \in M_m(F)$ we have
$$\mathrm{rank}\begin{pmatrix} I_n & A \\ 0 & B \end{pmatrix} = n + \mathrm{rank}(B).$$
11. Let $n > 2$ and let $A = [a_{ij}] \in M_n(\mathbb{C})$ be a matrix of rank 2. Prove the existence of complex numbers $x_i, y_i, z_i, t_i$ for $1 \le i \le n$ such that for all $i, j \in \{1, 2, \ldots, n\}$ we have
$$a_{ij} = x_iy_j + z_it_j.$$
12. Let $A = [a_{ij}]_{1 \le i,j \le n}$, $B = [b_{ij}]_{1 \le i,j \le n}$ be complex matrices such that
$$a_{ij} = 2^{i-j}b_{ij}$$
for all integers $1 \le i, j \le n$. Prove that $\mathrm{rank}\,A = \mathrm{rank}\,B$.
13. Let $A \in M_n(\mathbb{C})$ be a matrix such that $A^2 = A$, i.e., $A$ is the matrix of a projection. Prove that
$$\mathrm{rank}(A) + \mathrm{rank}(I_n - A) = n.$$
14. Let $n > k$ and let $A_1, \ldots, A_k \in M_n(\mathbb{R})$ be matrices of rank $n - 1$. Prove that $A_1A_2 \ldots A_k$ is nonzero. Hint: using Sylvester's inequality, prove that $\mathrm{rank}(A_1 \ldots A_j) \ge n - j$ for $1 \le j \le k$.
15. Let $A \in M_n(\mathbb{C})$ be a matrix of rank at least $n - 1$. Prove that $\mathrm{rank}(A^k) \ge n - k$ for $1 \le k \le n$. Hint: use Sylvester's inequality.
16. a) Prove that for any matrix $A \in M_n(\mathbb{R})$ we have
$$\mathrm{rank}(A) = \mathrm{rank}({}^tAA).$$
Hint: if $X \in \mathbb{R}^n$ is a column vector such that ${}^tAAX = 0$, write ${}^tX\,{}^tAAX = 0$ and express the left-hand side as a sum of squares.
b) Let $A = \begin{pmatrix} 1 & i \\ i & -1 \end{pmatrix}$. Find the rank of $A$ and ${}^tAA$ and conclude that part a) of the problem is no longer true if $\mathbb{R}$ is replaced with $\mathbb{C}$.
17. Let $A$ be an $m \times n$ matrix with rank $r$. Prove that there is an $m \times m$ matrix $B$ with rank $m - r$ such that $BA = O_{m,n}$.
18. (Generalized inverses) Let $A \in M_{m,n}(F)$. A generalized inverse of $A$ is a matrix $X \in M_{n,m}(F)$ such that $AXA = A$.
a) If $m = n$ and $A$ is invertible, show that the only generalized inverse of $A$ is $A^{-1}$.
b) Show that a generalized inverse of $A$ always exists.
c) Give an example to show that the generalized inverse need not be unique.
Chapter 6
Duality
Abstract After an in-depth study of duality for finite dimensional vector spaces,
we prove Jordan’s classification result of nilpotent transformations on a finite
dimensional vector space. We also explain how to describe vector subspaces by
equations using hyperplanes.
Keywords Duality • Dual basis • Linear form • Hyperplane • Orthogonal
This chapter focuses on a restricted class of linear maps between vector spaces,
namely linear maps between a vector space and the field of scalars (seen as a vector
space of dimension 1 over itself). Such linear maps are called linear forms on the
vector space. Even though the whole chapter might look rather formal at first sight,
the study of linear forms (known as duality) on finite dimensional vector spaces
is very important and yields a lot of surprising properties. For instance, we will
use duality to prove a famous result due to Jordan which completely classifies the
nilpotent linear transformations on a finite dimensional vector space. This is one
of the most important results in linear algebra! We will also use duality in the last
chapter, in a more geometric context.
6.1 The Dual Basis
We fix a field $F$ in the sequel. The reader may take $F \in \{\mathbb{R}, \mathbb{C}\}$ if he/she prefers.
Definition 6.1. The dual $V^*$ of a vector space $V$ over $F$ is the set of linear maps $l \colon V \to F$, endowed with the structure of a vector space over $F$ by defining
$$(l_1 + l_2)(v) = l_1(v) + l_2(v) \quad \text{and} \quad (cl)(v) = c\,l(v)$$
for $l_1, l_2, l \in V^*$, $v \in V$ and $c \in F$.
We leave to the reader the immediate verification of the axioms of a vector space, which show that $V^*$ is indeed a vector space over $F$ when endowed with the previous operations. An element $l$ of $V^*$ is called a linear form on $V$. These objects
are not very mysterious: assume for instance that $V = F^n$ and let $e_1, \ldots, e_n$ be the canonical basis. Then for all $(x_1, \ldots, x_n) \in V$ we have
$$l(x_1, \ldots, x_n) = l(x_1e_1 + \ldots + x_ne_n) = x_1l(e_1) + \ldots + x_nl(e_n) = a_1x_1 + \ldots + a_nx_n,$$
where $a_i = l(e_i) \in F$. Conversely, any map of the form $(x_1, \ldots, x_n) \mapsto a_1x_1 + \ldots + a_nx_n$ is a linear form on $F^n$. In general, if $V$ is a finite dimensional vector space and $e_1, \ldots, e_n$ is a basis of $V$, then the linear forms on $V$ are precisely those maps $l \colon V \to F$ of the form
$$l(x_1e_1 + \ldots + x_ne_n) = a_1x_1 + \ldots + a_nx_n$$
with $a_1, \ldots, a_n \in F$.
By definition we have a canonical map
$$V^* \times V \to F, \qquad (l, v) \mapsto l(v).$$
We also denote this map as $(l, v) \mapsto \langle l, v\rangle$ and call it the canonical pairing between $V$ and its dual. Unwinding definitions, we obtain the useful formulae
$$\langle cl_1 + l_2, v\rangle = c\langle l_1, v\rangle + \langle l_2, v\rangle$$
and
$$\langle l, cv_1 + v_2\rangle = c\langle l, v_1\rangle + \langle l, v_2\rangle.$$
The canonical pairing is a key example of a bilinear form, a topic which will be studied in much greater depth in subsequent chapters.
Each vector $v \in V$ gives rise to a natural linear form
$$\mathrm{ev}_v \colon V^* \to F, \qquad l \mapsto l(v)$$
on $V^*$, obtained by evaluating linear forms at $v$. We obtain therefore a map
$$\iota \colon V \to V^{**}, \qquad \iota(v) = \mathrm{ev}_v,$$
called the canonical biduality map. Note that by definition
$$\langle \iota(v), l\rangle = \langle l, v\rangle$$
for all linear forms $l$ on $V$ and all vectors $v \in V$. A fundamental property of the biduality map is that it is always injective. In other words, if $v$ is a nonzero vector in $V$, then we can always find a linear form $l$ on $V$ such that $l(v) \neq 0$. The proof of this rather innocent-looking statement uses the existence of bases for general vector spaces, so we prefer to take the following theorem for granted: we will see in a short time that the biduality map is an isomorphism when $V$ is finite dimensional, with an easy proof, and this is all we will need in this book.
Theorem 6.2. For any vector space $V$ over $F$, the canonical biduality map $\iota \colon V \to V^{**}$ is injective.

Before moving on, let us introduce a useful and classical notation, called the Kronecker symbol:

Definition 6.3. If $i, j$ are integers, we let $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$.

Let us assume now that $V$ is finite dimensional, of dimension $n \ge 1$, and let us consider a basis $e_1, e_2, \ldots, e_n$ of $V$. If $v$ is a vector in $V$, then we can write $v = x_1e_1 + \ldots + x_ne_n$ for some scalars $x_1, \ldots, x_n$ which are uniquely determined. Define the $i$th coordinate form by
$$e_i^* \colon V \to F, \qquad e_i^*(v) = x_i \quad \text{if} \quad v = x_1e_1 + \ldots + x_ne_n.$$
Thus by definition for all $v \in V$ we have
$$v = \sum_{i=1}^{n} e_i^*(v)e_i,$$
or equivalently
$$v = \sum_{i=1}^{n} \langle e_i^*, v\rangle e_i.$$
Note that for all $1 \le i, j \le n$ we have
$$e_i^*(e_j) = \delta_{ij}.$$
We are now ready to state and prove the first fundamental result of this chapter:

Theorem 6.4. Let $V$ be a vector space of dimension $n \ge 1$ and let $e_1, \ldots, e_n$ be a basis of $V$. Then the coordinate forms $e_1^*, \ldots, e_n^*$ form a basis of $V^*$ as a vector space over $F$.

Proof. Let us check first that $e_i^*$ is an element of $V^*$, i.e., that $e_i^*$ is linear. But if $x = x_1e_1 + \ldots + x_ne_n$ and $y = y_1e_1 + \ldots + y_ne_n$, and if $c \in F$ is a scalar, then
$$x + cy = (x_1 + cy_1)e_1 + \ldots + (x_n + cy_n)e_n,$$
thus
$$e_i^*(x + cy) = x_i + cy_i = e_i^*(x) + ce_i^*(y),$$
so $e_i^*$ is linear.
Next, let us prove that $e_1^*, \ldots, e_n^*$ are linearly independent. Suppose that $c_1, \ldots, c_n \in F$ are scalars such that
$$c_1e_1^* + \ldots + c_ne_n^* = 0.$$
Evaluating at $e_i$ yields
$$c_1\langle e_1^*, e_i\rangle + \ldots + c_n\langle e_n^*, e_i\rangle = 0.$$
The left-hand side equals
$$\sum_{j=1}^{n} c_j\langle e_j^*, e_i\rangle = \sum_{j=1}^{n} c_j\delta_{ij} = c_i.$$
Thus $c_i = 0$ for all $i$ and so $e_1^*, \ldots, e_n^*$ are linearly independent.
Finally, let us prove that $e_1^*, \ldots, e_n^*$ form a generating family for $V^*$. Let $l \in V^*$ be an arbitrary linear form. If $v = x_1e_1 + \ldots + x_ne_n$ is a vector in $V$, then linearity of $l$ gives
$$\langle l, v\rangle = x_1\langle l, e_1\rangle + \ldots + x_n\langle l, e_n\rangle = \langle l, e_1\rangle\langle e_1^*, v\rangle + \ldots + \langle l, e_n\rangle\langle e_n^*, v\rangle = \big\langle \langle l, e_1\rangle e_1^* + \langle l, e_2\rangle e_2^* + \ldots + \langle l, e_n\rangle e_n^*,\; v\big\rangle,$$
showing that
$$l = \langle l, e_1\rangle e_1^* + \langle l, e_2\rangle e_2^* + \ldots + \langle l, e_n\rangle e_n^*.$$
Thus $l$ belongs to the span of $e_1^*, \ldots, e_n^*$, which finishes the proof of the theorem.

Remark 6.5. The proof shows that for any $l \in V^*$ we have the useful relation
$$l = \langle l, e_1\rangle e_1^* + \langle l, e_2\rangle e_2^* + \ldots + \langle l, e_n\rangle e_n^*.$$
This is the “dual” relation of the tautological relation
$$v = \sum_{i=1}^{n} \langle e_i^*, v\rangle e_i,$$
valid for all $v \in V$.
The previous theorem explains the following:

Definition 6.6. If $e_1, \ldots, e_n$ is a basis of a vector space $V$ over $F$, we call $e_1^*, \ldots, e_n^*$ the dual basis of $e_1, \ldots, e_n$. It is uniquely characterized by the property that
$$e_i^*(e_j) = \delta_{ij} \quad \text{for all} \quad 1 \le i, j \le n.$$
A crucial consequence of the previous theorem is the following:
Corollary 6.7. For all finite dimensional vector spaces $V$ over $F$ we have
$$\dim V = \dim V^*.$$
Moreover, the canonical biduality map $\iota \colon V \to V^{**}$ is an isomorphism of vector spaces over $F$.

Proof. The first part is clear from the previous theorem and the fact that all bases in a finite dimensional vector space have the same number of elements, namely the dimension of the space. To prove that $\iota$ is an isomorphism, it suffices to prove that $\iota$ is injective, since
$$\dim V = \dim V^* = \dim V^{**},$$
as follows from what we have already proved.
So suppose that $\iota(v) = 0$, which means that $\langle l, v\rangle = 0$ for all $l \in V^*$. Let $e_1, \ldots, e_n$ be a basis of $V$. Then $\langle e_i^*, v\rangle = 0$ for all $1 \le i \le n$, and since
$$v = \sum_{i=1}^{n} \langle e_i^*, v\rangle e_i,$$
we obtain $v = 0$, establishing therefore the injectivity of $\iota$.

Remark 6.8. Conversely, one can prove that if the biduality map is an isomorphism, then $V$ is finite dimensional. In other words, the biduality map is never an isomorphism for an infinite dimensional vector space!
Recall that $\mathbb{R}_n[X]$ is the vector space of polynomials with real coefficients whose degree does not exceed $n$.

Problem 6.9. Let $V = \mathbb{R}_n[X]$. It is easy to see that the maps $P \mapsto P^{(k)}(0)$ (where $P^{(i)}$ is the $i$th derivative of $P$) are elements of $V^*$. Express the dual basis of $1, X, \ldots, X^n$ in terms of these maps.

Solution. Let $e_i = X^i \in V$ and let $e_0^*, \ldots, e_n^*$ be the dual basis. By definition $e_i^*(e_j) = \delta_{ij}$. Thus for all $P = a_0 + a_1X + \ldots + a_nX^n \in V$ we have
$$e_i^*(P) = a_i = \frac{1}{i!}P^{(i)}(0).$$
Thus $e_i^*$ is the linear form given by $P \mapsto \frac{1}{i!}P^{(i)}(0)$.
The following problem gives a beautiful and classical application of the ideas developed so far:

Problem 6.10 (Lagrange Interpolation). Let $V = \mathbb{R}_n[X]$ and let $x_0, \ldots, x_n$ be pairwise distinct real numbers. For $0 \le i \le n$ define
$$L_i(X) = \prod_{\substack{0 \le j \le n \\ j \neq i}} \frac{X - x_j}{x_i - x_j}.$$
a) Show that
$$L_i(x_j) = \delta_{ij} \quad \text{for all} \quad 0 \le i, j \le n.$$
b) Prove that $L_0, \ldots, L_n$ form a basis of $V$.
c) Describe the dual basis of $L_0, \ldots, L_n$.
d) Prove Lagrange's interpolation formula: for all $P \in V$ we have
$$P = \sum_{i=0}^{n} P(x_i)L_i.$$
e) Prove that for any $b_0, \ldots, b_n \in \mathbb{R}$ we can find a unique polynomial $P \in V$ with $P(x_i) = b_i$ for $0 \le i \le n$. This polynomial $P$ is called the Lagrange interpolation polynomial associated with $b_0, \ldots, b_n$.

Solution. a) By construction we have $L_i(x_j) = 0$ for $j \neq i$. On the other hand,
$$L_i(x_i) = \prod_{\substack{0 \le j \le n \\ j \neq i}} \frac{x_i - x_j}{x_i - x_j} = 1,$$
thus
$$L_i(x_j) = \delta_{ij}.$$
b) Since $\dim V = n + 1$ (a basis being given by $1, X, \ldots, X^n$), it suffices to check that $L_0, \ldots, L_n$ are linearly independent. Suppose that $a_0L_0 + \ldots + a_nL_n = 0$ for some scalars $a_0, \ldots, a_n$. Evaluating this equality at $x_i$ and using part a) yields
$$0 = \sum_{j=0}^{n} a_jL_j(x_i) = \sum_{j=0}^{n} a_j\delta_{ij} = a_i$$
for all $0 \le i \le n$, thus $L_0, \ldots, L_n$ are linearly independent.
c) By definition of the dual basis and by part a), we have
$$L_i^*(L_j) = \delta_{ij} = \delta_{ji} = L_j(x_i)$$
for all $i, j$. Fix $i \in \{0, \ldots, n\}$. Since $L_i^*(L_j) = L_j(x_i)$ for all $0 \le j \le n$ and since $L_0, \ldots, L_n$ span $V$, we deduce that
$$L_i^*(P) = P(x_i) \quad \text{for all} \quad P \in V.$$
d) By definition of the dual basis,
$$P = \sum_{i=0}^{n} \langle L_i^*, P\rangle L_i.$$
By part c) we have $\langle L_i^*, P\rangle = P(x_i)$, which yields the desired result.
6.1 The Dual Basis
203
Pn
e) It suffices to take P D
iD0 bi Li in order to prove the existence part. For
uniqueness, if Q 2 V also satisfies Q.xi / D bi for 0 i n, it follows that
P Q is a polynomial whose degree does not exceed n and which has at least
n C 1 distinct roots, thus P Q D 0 and P D Q.
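Part e) is easy to turn into a small program; the sketch below (an added illustration, with made-up data) evaluates the interpolation polynomial directly from the formula $P = \sum_i b_i L_i$:

```python
# Evaluate the Lagrange interpolation polynomial P = sum_i b_i L_i at t.
def lagrange_eval(xs, bs, t):
    total = 0.0
    for i, (xi, bi) in enumerate(zip(xs, bs)):
        Li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                Li *= (t - xj) / (xi - xj)   # builds L_i(t)
        total += bi * Li
    return total

xs, bs = [0.0, 1.0, 2.0], [1.0, 3.0, 7.0]    # values of 1 + x + x^2
assert abs(lagrange_eval(xs, bs, 3.0) - 13.0) < 1e-9
```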
Problem 6.11. Let $x_0, \dots, x_n \in [0, 1]$ be pairwise distinct and let $V = \mathbb{R}_n[X]$.
a) Prove that the map $l : V \to \mathbb{R}$ defined by
\[ l(P) = \int_0^1 P(x)\,dx \]
is a linear form on $V$.
b) Using the previous problem, prove that there is a unique $(n+1)$-tuple $(a_0, \dots, a_n)$ of real numbers such that
\[ \int_0^1 P(x)\,dx = \sum_{i=0}^n a_i P(x_i) \]
for all $P \in V$.

Solution. a) This is a direct consequence of basic properties of integral calculus.
b) We use the result and notations of the previous problem, which establishes that $L_0, \dots, L_n$ is a basis of $V$, and $L_i^*(P) = P(x_i)$ for all $P \in V$. Thus saying that
\[ \int_0^1 P(x)\,dx = \sum_{i=0}^n a_i P(x_i) \]
for all $P \in V$ is equivalent to saying that
\[ l(P) = \sum_{i=0}^n a_i L_i^*(P) \]
for all $P \in V$, in other words
\[ l = \sum_{i=0}^n a_i L_i^* \]
as elements of $V^*$. Since $L_0^*, \dots, L_n^*$ is a basis of $V^*$, the existence and uniqueness of $a_0, \dots, a_n$ is clear.
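Applying $l = \sum_j a_j L_j^*$ to $L_i$ shows that necessarily $a_i = \int_0^1 L_i(x)\,dx$, so the weights can be computed exactly. Here is a SymPy sketch for the (arbitrarily chosen) nodes $0, \frac{1}{2}, 1$:

```python
# Quadrature weights a_i = integral_0^1 L_i(x) dx for nodes 0, 1/2, 1  (sketch)
import sympy as sp

x = sp.symbols('x')
nodes = [sp.Integer(0), sp.Rational(1, 2), sp.Integer(1)]

weights = []
for i, xi in enumerate(nodes):
    Li = sp.prod([(x - xj) / (xi - xj) for j, xj in enumerate(nodes) if j != i])
    weights.append(sp.integrate(Li, (x, 0, 1)))

print(weights)   # [1/6, 2/3, 1/6]: compare Simpson's formula in the practice problems
```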
Let us consider now the following practical problem: given a basis $v_1, \dots, v_n$ of $\mathbb{R}^n$, express the dual basis $v_1^*, \dots, v_n^*$ in terms of the dual basis $e_1^*, \dots, e_n^*$ of the canonical basis of $\mathbb{R}^n$. To do so, write
\[ v_i = \sum_{j=1}^n a_{ji} e_j \quad \text{and} \quad v_i^* = \sum_{j=1}^n b_{ji} e_j^*. \]
Note that in practice we have easy access to the matrix $A = [a_{ij}]$: its columns are precisely the coordinates of $v_1, \dots, v_n$ with respect to the canonical basis $e_1, \dots, e_n$ of $\mathbb{R}^n$. We are interested in finding $B = [b_{ij}]$. Using the identity $v_i^*(v_j) = \delta_{ij}$, we obtain
\[ \delta_{ij} = v_i^*(v_j) = \sum_{k=1}^n b_{ki} e_k^*(v_j) = \sum_{k=1}^n b_{ki} \sum_{l=1}^n a_{lj} e_k^*(e_l) = \sum_{k=1}^n \sum_{l=1}^n a_{lj} b_{ki} \delta_{kl} = \sum_{k=1}^n a_{kj} b_{ki} = ({}^t B A)_{ij}. \]
Since this holds for all $i, j$, we deduce that
\[ {}^t B A = I_n, \quad \text{i.e.,} \quad B = {}^t A^{-1}. \]
Thus in practice we need to compute $A^{-1}$ (via row-reduction on the matrix $(A|I_n)$) and take the transpose!
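In code this recipe is a one-liner; the NumPy sketch below (with a made-up basis of $\mathbb{R}^3$) computes $B = {}^t A^{-1}$ and checks the defining relation ${}^t B A = I_n$:

```python
# Dual basis via B = (A^{-1})^T  (NumPy sketch with an arbitrary basis)
import numpy as np

A = np.array([[1.0, 2.0, 0.0],    # columns = coordinates of v_1, v_2, v_3
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
B = np.linalg.inv(A).T            # columns = coordinates of v_1^*, v_2^*, v_3^*
assert np.allclose(B.T @ A, np.eye(3))   # v_i^*(v_j) = delta_ij
```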
Problem 6.12. Let $e_1^*, e_2^*, e_3^*$ be the dual basis of the canonical basis of $\mathbb{R}^3$. Express in terms of $e_1^*, e_2^*, e_3^*$ the dual basis of the basis of $\mathbb{R}^3$ consisting of
\[ v_1 = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ -2 \\ 3 \end{bmatrix}. \]

Solution. We leave to the reader to check that $v_1, v_2, v_3$ form a basis of $\mathbb{R}^3$, using row-reduction on the matrix $A$ whose columns are $v_1, v_2, v_3$. Using row-reduction on $(A|I_3)$, one obtains
\[ A^{-1} = \frac{1}{7} \begin{bmatrix} 5 & -3 & -2 \\ -8 & 9 & 6 \\ 1 & -2 & 1 \end{bmatrix}. \]
With the above notations
\[ B = {}^t A^{-1} = \frac{1}{7} \begin{bmatrix} 5 & -8 & 1 \\ -3 & 9 & -2 \\ -2 & 6 & 1 \end{bmatrix} \]
and then
\[ v_1^* = \frac{5}{7} e_1^* - \frac{3}{7} e_2^* - \frac{2}{7} e_3^*, \quad v_2^* = -\frac{8}{7} e_1^* + \frac{9}{7} e_2^* + \frac{6}{7} e_3^*, \quad v_3^* = \frac{1}{7} e_1^* - \frac{2}{7} e_2^* + \frac{1}{7} e_3^*. \]
Consider now the following inverse problem: given a basis $f_1, \dots, f_n$ of $V^*$, is there always a basis $e_1, \dots, e_n$ of $V$ whose dual basis is $f_1, \dots, f_n$? If so, how can we find such a basis?

Let us start with any basis $v_1, \dots, v_n$ of $V$ (we know that $\dim V = n$ since we know that $\dim V^* = n$). Of course, in practice the choice of $v_1, \dots, v_n$ will be the natural one (for instance if $V = \mathbb{R}^n$ then we will take for $v_1, \dots, v_n$ the canonical basis, if $V = \mathbb{R}_{n-1}[X]$, we will take for $v_1, \dots, v_n$ the basis $1, X, \dots, X^{n-1}$, etc.). Define a matrix
\[ A = [a_{ij}], \quad a_{ij} = f_i(v_j). \]
This will be known in practice. On the other hand, we are looking for a basis $e_1, \dots, e_n$ of $V$ such that $e_i^* = f_i$, that is
\[ f_i(e_j) = \delta_{ij} \]
for $1 \le i, j \le n$. We are therefore looking for an invertible matrix $B$ such that, setting
\[ e_i = \sum_{j=1}^n b_{ji} v_j, \]
these vectors satisfy the previous relations. Well, these relations are equivalent to
\[ \delta_{ij} = f_i(e_j) = \sum_{k=1}^n b_{kj} f_i(v_k) = \sum_{k=1}^n b_{kj} a_{ik} = (AB)_{ij}, \]
that is
\[ AB = I_n. \]
In other words, $e_1, \dots, e_n$ exist if and only if the matrix $A$ is invertible, and then $e_1, \dots, e_n$ are uniquely determined by
\[ B = A^{-1}. \]
It is not clear, however, that the matrix $A$ is invertible. This is indeed the case, as the following theorem shows:

Theorem 6.13. Let $v_1, \dots, v_n$ be a basis of $V$ and let $f_1, \dots, f_n$ be a basis of $V^*$. The matrix $A = [a_{ij}]$ with $a_{ij} = f_i(v_j)$ is invertible. Consequently (thanks to the above discussion) for any basis $f_1, \dots, f_n$ of $V^*$ there is a unique basis $e_1, \dots, e_n$ of $V$ whose dual basis is $f_1, \dots, f_n$.
Proof. Assume that $A$ is not invertible. We can thus find a nonzero vector $X \in F^n$ with coordinates $x_1, \dots, x_n$ such that $AX = 0$. Thus for all $j \in \{1, 2, \dots, n\}$ we have
\[ 0 = \sum_{i=1}^n a_{ji} x_i = \sum_{i=1}^n f_j(v_i) x_i = f_j\Big(\sum_{i=1}^n x_i v_i\Big). \]
The vector $v = x_1 v_1 + \dots + x_n v_n$ is therefore nonzero (since $v_1, \dots, v_n$ are linearly independent and $X \ne 0$) and we have
\[ f_1(v) = \dots = f_n(v) = 0. \]
Since $f_1, \dots, f_n$ is a spanning set for $V^*$, we deduce that $l(v) = 0$ for all $l \in V^*$. Thus $v$ is a nonzero vector in the kernel of the biduality map $V \to V^{**}$, which was however shown to be an isomorphism. This contradiction shows that $A$ is invertible and finishes the proof.

In practice, it is helpful to know that a converse of the theorem holds:

Theorem 6.14. Let $V$ be a vector space of dimension $n$ over a field $F$. If the matrix $A = [a_{ij}]$ with $a_{ij} = f_i(v_j)$ is invertible for some $v_1, \dots, v_n \in V$ and some $f_1, \dots, f_n \in V^*$, then $v_1, \dots, v_n$ form a basis of $V$ and $f_1, \dots, f_n$ form a basis of $V^*$.

Proof. Suppose that $v_1, \dots, v_n$ are linearly dependent, say $x_1 v_1 + \dots + x_n v_n = 0$ for some $x_1, \dots, x_n \in F$, not all equal to $0$. Applying $f_j$ to this relation, we obtain
\[ 0 = f_j(x_1 v_1 + \dots + x_n v_n) = a_{j1} x_1 + \dots + a_{jn} x_n \]
for all $j \in \{1, 2, \dots, n\}$, thus $AX = 0$, where $X \in F^n$ has coordinates $x_1, \dots, x_n$, contradicting that $A$ is invertible. Thus $v_1, \dots, v_n$ are linearly independent and since $\dim V = n$, they form a basis of $V$.

Similarly, if $f_1, \dots, f_n$ were linearly dependent, we could find a nontrivial dependency relation $x_1 f_1 + \dots + x_n f_n = 0$, which evaluated at each $v_j$ would yield
\[ \sum_{i=1}^n a_{ij} x_i = 0, \]
that is ${}^t A X = 0$, and ${}^t A$ would not be invertible, a contradiction.
Problem 6.15. Consider the following linear forms on $\mathbb{R}^3$:
\[ l_1(x, y, z) = x + 2y + 3z, \quad l_2(x, y, z) = 2x + 3y + z, \quad l_3(x, y, z) = 3x + y + 2z. \]
a) Prove that $l_1, l_2, l_3$ form a basis of the dual of $\mathbb{R}^3$.
b) Find the basis of $\mathbb{R}^3$ whose dual basis is $l_1, l_2, l_3$.

Solution. a) Consider the canonical basis $e_1, e_2, e_3$ of $\mathbb{R}^3$ and the matrix
\[ A = [l_i(e_j)] = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{bmatrix}. \]
This matrix is invertible, as one easily shows using row-reduction. It follows from the previous theorem that $l_1, l_2, l_3$ form a basis of the dual of $\mathbb{R}^3$.
b) We compute the inverse of $A$ using row-reduction. We obtain
\[ A^{-1} = \frac{1}{18} \begin{bmatrix} -5 & 1 & 7 \\ 1 & 7 & -5 \\ 7 & -5 & 1 \end{bmatrix}. \]
Using the previous discussion, we read the desired basis $v_1, v_2, v_3$ of $\mathbb{R}^3$ in the columns of $A^{-1}$:
\[ v_1 = \frac{1}{18}\begin{bmatrix} -5 \\ 1 \\ 7 \end{bmatrix}, \quad v_2 = \frac{1}{18}\begin{bmatrix} 1 \\ 7 \\ -5 \end{bmatrix}, \quad v_3 = \frac{1}{18}\begin{bmatrix} 7 \\ -5 \\ 1 \end{bmatrix}. \]
Problem 6.16. Let $V = \mathbb{R}_2[X]$ and, for $P \in V$, set
\[ l_1(P) = P(1), \quad l_2(P) = P'(1), \quad l_3(P) = \int_0^1 P(x)\,dx. \]
a) Prove that $l_1, l_2, l_3$ is a basis of $V^*$.
b) Find a basis $e_1, e_2, e_3$ of $V$ whose dual basis is $l_1, l_2, l_3$.

Solution. a) It is not difficult to check that $l_1, l_2, l_3$ are linear forms on $V$. In order to prove that they form a basis of $V^*$, we will use the previous theorem. Namely, we consider the canonical basis $v_1 = 1$, $v_2 = X$, $v_3 = X^2$ of $V$ and the matrix
\[ A = [l_i(v_j)]. \]
Noting that if $P = aX^2 + bX + c$ then
\[ l_1(P) = a + b + c, \quad l_2(P) = 2a + b, \quad l_3(P) = \frac{a}{3} + \frac{b}{2} + c, \]
we deduce that
\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 1 & \frac{1}{2} & \frac{1}{3} \end{bmatrix}. \]
One easily checks using row-reduction that $A$ is invertible and by the previous theorem $l_1, l_2, l_3$ form a basis of $V^*$.
b) Using the method discussed before Theorem 6.13, we see that we have to compute the matrix $B = A^{-1}$. Row-reduction yields
\[ B = A^{-1} = \begin{bmatrix} -2 & \frac{1}{2} & 3 \\ 6 & -2 & -6 \\ -3 & \frac{3}{2} & 3 \end{bmatrix}. \]
Moreover, using that method we deduce that
\[ e_1 = -2v_1 + 6v_2 - 3v_3 = -2 + 6X - 3X^2, \]
\[ e_2 = \frac{1}{2} v_1 - 2v_2 + \frac{3}{2} v_3 = \frac{1}{2} - 2X + \frac{3}{2} X^2, \]
\[ e_3 = 3v_1 - 6v_2 + 3v_3 = 3 - 6X + 3X^2. \]
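The result can be verified mechanically; the following SymPy sketch evaluates $l_1, l_2, l_3$ on $e_1, e_2, e_3$ and confirms that the matrix $[l_i(e_j)]$ is the identity:

```python
# Check l_i(e_j) = delta_ij for the basis found above  (SymPy sketch)
import sympy as sp

X = sp.symbols('X')
basis = [-2 + 6*X - 3*X**2,
         sp.Rational(1, 2) - 2*X + sp.Rational(3, 2)*X**2,
         3 - 6*X + 3*X**2]
forms = [lambda P: P.subs(X, 1),                 # l_1(P) = P(1)
         lambda P: sp.diff(P, X).subs(X, 1),     # l_2(P) = P'(1)
         lambda P: sp.integrate(P, (X, 0, 1))]   # l_3(P) = integral of P on [0, 1]
assert sp.Matrix([[l(e) for e in basis] for l in forms]) == sp.eye(3)
```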
6.1.1 Problems for Practice
In the following problems we let $\mathbb{R}_n[X]$ be the space of polynomials with real coefficients whose degree does not exceed $n$.
1. Find the dual basis of the basis of $\mathbb{R}^3$ consisting of
\[ v_1 = (1, -1, 0), \quad v_2 = (0, 0, 1), \quad v_3 = (1, 1, 1). \]
2. Consider the linear forms on $\mathbb{R}^3$
\[ l_1(x, y, z) = 2x + 4y + z, \quad l_2(x, y, z) = 4x + 2y + 3z, \quad l_3(x, y, z) = x + y. \]
a) Prove that $l_1, l_2, l_3$ form a basis of the dual of $\mathbb{R}^3$.
b) Find the basis of $\mathbb{R}^3$ whose dual basis is $l_1, l_2, l_3$.
3. Let $V$ be a finite dimensional vector space over a field $F$. Prove that for all $x \ne y \in V$ we can find a linear form $l$ on $V$ such that $l(x) \ne l(y)$.
4. Define $P_0 = 1$ and, for $k \ge 1$,
\[ P_k(X) = X(X - 1) \dots (X - k + 1). \]
Also, let $f_k : \mathbb{R}_n[X] \to \mathbb{R}$ be the map defined by $f_k(P) = P(k)$.
a) Prove that $P_0, \dots, P_n$ is a basis of $\mathbb{R}_n[X]$.
b) Prove that $f_0, \dots, f_n$ is a basis of $\mathbb{R}_n[X]^*$.
c) Let $(P_0^*, \dots, P_n^*)$ be the dual basis of $(P_0, \dots, P_n)$. Express $P_k^*$ in terms of $f_0, \dots, f_n$.
5. Let $a \ne b$ be real numbers and for $k \in \{0, 1, 2\}$ set
\[ P_k(X) = (X - a)^k (X - b)^{2-k}. \]
a) Prove that $P_0, P_1, P_2$ form a basis of $\mathbb{R}_2[X]$.
b) Let $c = \frac{a+b}{2}$ and, for $\alpha \in \{a, b, c\}$, let $f_\alpha : \mathbb{R}_2[X] \to \mathbb{R}$ be the map defined by $f_\alpha(P) = P(\alpha)$. Prove that $f_a, f_b, f_c$ form a basis of $\mathbb{R}_2[X]^*$.
c) Express the dual basis $P_0^*, P_1^*, P_2^*$ in terms of the basis $f_a, f_b, f_c$.
6. For $i \ge 0$ let $f_i : \mathbb{R}_2[X] \to \mathbb{R}$ be the map defined by
\[ f_i(P) = \int_0^1 x^i P(x)\,dx. \]
a) Prove that $f_0, f_1, f_2$ form a basis of $\mathbb{R}_2[X]^*$.
b) Find a basis of $\mathbb{R}_2[X]$ whose dual basis is $f_0, f_1, f_2$.
7. Let $V$ be the vector space of all sequences $(x_n)_{n \ge 0}$ of real numbers such that
\[ x_{n+2} = x_{n+1} + x_n \quad \text{for all} \quad n \ge 0. \]
a) Prove that $V$ has dimension 2.
b) Let $l_0, l_1 : V \to \mathbb{R}$ be the linear forms sending a sequence $(x_n)_{n \ge 0}$ to $x_0$, respectively $x_1$. Find the basis $e_0, e_1$ of $V$ whose dual basis is $l_0, l_1$.
8. Let $X$ be a finite set and let $V$ be the space of all maps $\varphi : X \to F$. For each $x \in X$, consider the map $l_x : V \to F$ sending $\varphi$ to $\varphi(x)$. Prove that the family $(l_x)_{x \in X}$ is a basis of $V^*$.
9. Let $l$ be a linear form on $\mathbb{R}_n[X]$ and let $k \in [0, n]$ be an integer. Prove that the following statements are equivalent:
a) We have $l(X^k P) = 0$ for all polynomials $P \in \mathbb{R}_{n-k}[X]$.
b) There are real numbers $\alpha_0, \dots, \alpha_{k-1}$ such that for all $P \in \mathbb{R}_n[X]$
\[ l(P) = \sum_{i=0}^{k-1} \alpha_i P^{(i)}(0). \]
10. a) Let $a_0, \dots, a_n$ be pairwise distinct real numbers. Prove that there is a unique $(n+1)$-tuple of real numbers $(b_0, \dots, b_n)$ such that for any $P \in \mathbb{R}_n[X]$ we have
\[ P(0) + P'(0) = \sum_{k=0}^n b_k P(a_k). \]
b) Find such numbers $b_0, \dots, b_n$ for $n = 2$, $a_0 = 1$, $a_1 = 2$ and $a_2 = 3$.
11. Prove Simpson's formula: for all $P \in \mathbb{R}_2[X]$
\[ \int_a^b P(x)\,dx = \frac{b-a}{6}\left( P(a) + 4P\Big(\frac{a+b}{2}\Big) + P(b) \right). \]
12. a) Let $l_1, l_2$ be nonzero linear forms on some nonzero vector space $V$ over $\mathbb{R}$. Prove that we can find $v \in V$ such that $l_1(v) l_2(v)$ is nonzero.
b) Generalize this to any finite number of nonzero linear forms.
13. Let $V, W$ be vector spaces. Prove that $(V \times W)^*$ is isomorphic to $V^* \times W^*$.
6.2 Orthogonality and Equations for Subspaces
Let $V$ be a vector space over a field $F$, let $l$ be a linear form on $V$ and let $v \in V$. We say that $l$ and $v$ are orthogonal if
\[ \langle l, v \rangle = 0, \quad \text{i.e.,} \quad l(v) = 0, \quad \text{or equivalently} \quad v \in \ker l. \]
If $S$ is any subset of $V$, we let
\[ S^\perp = \{ l \in V^* \mid \langle l, v \rangle = 0 \ \forall v \in S \} \]
be the orthogonal of $S$. These are the linear forms on $V$ which vanish on $S$, or equivalently on the span of $S$ (by linearity). Thus
\[ S^\perp = \mathrm{Span}(S)^\perp. \]
Note that $S^\perp$ is a subspace of $V^*$, since if $l_1$ and $l_2$ vanish on $S$, then so does $l_1 + c l_2$ for all scalars $c \in F$.

Similarly, if $S$ is a subset of $V^*$, we let
\[ S^\perp = \{ v \in V \mid \langle l, v \rangle = 0 \ \forall l \in S \} \]
be the orthogonal of $S$. The elements of $S^\perp$ are the vectors killed by all linear forms in $S$, thus
\[ S^\perp = \bigcap_{l \in S} \ker l. \]
This makes it clear that $S^\perp$ is a subspace of $V$, as an intersection of the subspaces $(\ker l)_{l \in S}$ of $V$. Again, by linearity we have
\[ S^\perp = (\mathrm{Span}(S))^\perp \]
for all $S \subset V^*$.

In practice, finding the orthogonal of a subset of a finite dimensional vector space or of its dual comes down to solving linear systems, a problem which can easily be solved using row-reduction, for instance. Indeed, let $V$ be a finite dimensional vector space over $F$ and let $S$ be a set of vectors in $V$. Finding $S^\perp$ comes down to finding those linear forms $l$ on $V$ vanishing on each element of $S$. Let $e_1, \dots, e_n$ be a basis of $V$; then a linear form $l$ on $V$ is of the form
\[ l(x_1 e_1 + \dots + x_n e_n) = a_1 x_1 + \dots + a_n x_n \]
for some $a_1, \dots, a_n \in F$. Writing each element $s \in S$ with respect to the basis $e_1, \dots, e_n$ yields
\[ s = \alpha_{s1} e_1 + \dots + \alpha_{sn} e_n \]
for some scalars $\alpha_{si}$. Then $l \in S^\perp$ if and only if
\[ a_1 \alpha_{s1} + \dots + a_n \alpha_{sn} = 0 \]
for all $s \in S$. This is a linear system in $a_1, \dots, a_n$, but the reader will probably be worried that it may have infinitely many equations (if $S$ is infinite). This is not a problem, since as we have already seen $S^\perp = (\mathrm{Span}(S))^\perp$ and $\mathrm{Span}(S)$ is finite dimensional (being a subspace of $V$); thus by choosing a basis of $\mathrm{Span}(S)$, say $s_1, \dots, s_k$, we reduce the problem to solving the system
\[ a_1 \alpha_{s_j 1} + \dots + a_n \alpha_{s_j n} = 0 \]
for $1 \le j \le k$. The discussion is similar if we want to compute the orthogonal of a subset of $V^*$.
Let us see some concrete examples:
Problem 6.17. Consider the subspace $W$ of $\mathbb{R}^3$ defined by
\[ W = \{(x, y, z) \in \mathbb{R}^3 \mid x + y + z = 0\}. \]
Give a basis of the orthogonal $W^\perp$ of $W$.

Solution. By definition, a linear form $l$ on $\mathbb{R}^3$ belongs to $W^\perp$ if and only if $l(x, y, z) = 0$ whenever $x + y + z = 0$. In other words,
\[ l(x, y, -x - y) = 0 \quad \text{for all} \quad x, y \in \mathbb{R}, \]
which can be written as
\[ x\,l(1, 0, -1) + y\,l(0, 1, -1) = 0. \]
Thus $l \in W^\perp$ if and only if
\[ l(1, 0, -1) = l(0, 1, -1) = 0. \]
Now, a linear form $l$ on $\mathbb{R}^3$ is of the form
\[ l(x, y, z) = ax + by + cz, \]
where $a, b, c$ are real numbers. Thus $l \in W^\perp$ if and only if
\[ a - c = 0, \quad b - c = 0, \]
or equivalently $a = b = c$. It follows that the linear form
\[ l_0(x, y, z) = x + y + z \]
is a basis of $W^\perp$.
Problem 6.18. Let $S = \{v_1, v_2, v_3\} \subset \mathbb{R}^4$, where
\[ v_1 = (1, 0, 1, 0), \quad v_2 = (0, 1, 1, 0), \quad v_3 = (1, -1, 0, 1). \]
Describe $S^\perp$ by giving a basis of this space.

Solution. A linear form $l$ on $\mathbb{R}^4$ is of the form
\[ l(x, y, z, t) = ax + by + cz + dt, \]
where $a, b, c, d$ are real numbers. The condition $l \in S^\perp$ is equivalent to
\[ l(v_1) = l(v_2) = l(v_3) = 0. \]
Thus $l \in S^\perp$ if and only if $a, b, c, d$ are solutions of the system
\[ \begin{cases} a + c = 0 \\ b + c = 0 \\ a - b + d = 0 \end{cases} \]
This system can be solved without difficulty: the first and second equations give $a = b = -c$ and the third equation yields $d = 0$, thus the solutions of the system are $\{(u, u, -u, 0) \mid u \in \mathbb{R}\}$. The corresponding linear forms are
\[ l_u(x, y, z, t) = u(x + y - z), \]
hence a basis of $S^\perp$ is given by
\[ l_1(x, y, z, t) = x + y - z. \]

Problem 6.19. Consider the set $S = \{l_1, l_2\}$ where
\[ l_1(x, y, z) = 2x + 3y - z, \quad l_2(x, y, z) = x - 2y + z. \]
Find a basis for $S^\perp$.

Solution. A vector $(x, y, z)$ is in $S^\perp$ if and only if
\[ l_1(x, y, z) = l_2(x, y, z) = 0, \]
that is
\[ \begin{cases} 2x + 3y - z = 0 \\ x - 2y + z = 0 \end{cases} \]
Solving the system yields
\[ y = -3x, \quad z = -7x. \]
Thus a basis of $S^\perp$ is given by $(1, -3, -7)$.
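Concretely, $S^\perp$ is just the null space of the matrix whose rows are the coefficient vectors of $l_1$ and $l_2$, so the computation can be delegated to SymPy (a sketch):

```python
# S^perp as a null space  (SymPy sketch for the problem above)
import sympy as sp

M = sp.Matrix([[2, 3, -1],     # l_1(x, y, z) = 2x + 3y - z
               [1, -2, 1]])    # l_2(x, y, z) = x - 2y + z
print(M.nullspace())           # one basis vector, proportional to (1, -3, -7)
```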
Let us continue with an easy, but important theoretical exercise.
Problem 6.20. a) If $S_1 \subset S_2$ are subsets of $V$ or of $V^*$, then $S_2^\perp \subset S_1^\perp$.
b) If $S$ is a subset of $V$ or $V^*$, then $S \subset (S^\perp)^\perp$.

Solution. a) Suppose that $S_1, S_2$ are subsets of $V$. If $l \in S_2^\perp$, then $l$ vanishes on $S_2$. Since $S_1 \subset S_2$, it follows that $l$ vanishes on $S_1$ and so $l \in S_1^\perp$. Thus $S_2^\perp \subset S_1^\perp$.
Suppose that $S_1, S_2$ are subsets of $V^*$. If $v \in S_2^\perp$, then all elements of $S_2$ vanish at $v$. Since $S_1 \subset S_2$, it follows that all elements of $S_1$ vanish at $v$ and so $v \in S_1^\perp$. The result follows.
b) Suppose that $S \subset V$ and let $v \in S$. We need to prove that if $l \in S^\perp$, then $\langle l, v \rangle = 0$, which is clear by definition! Similarly, if $S \subset V^*$ and $l \in S$, we need to prove that $\langle l, v \rangle = 0$ for all $v \in S^\perp$, which is again clear.

Remark 6.21. While it is tempting to believe that the inclusion in part b) of the problem is actually an equality, this is completely false: $(S^\perp)^\perp$ is a subspace of $V$ or $V^*$, while $S$ has no reason to be a subspace of $V$ or $V^*$ (it was an arbitrary subset). Actually, we will see that the inclusion is an equality if $S$ is a subspace of $V$ or $V^*$ when $V$ is finite dimensional.
The fundamental theorem concerning duality of vector spaces is the following:

Theorem 6.22. Let $V$ be a finite dimensional vector space over $F$. Then for all subspaces $W$ of $V$ or $V^*$ we have
\[ \dim W + \dim W^\perp = \dim V. \]

Proof. Let $n = \dim V$. Let $W$ be a subspace of $V$, of dimension $m \le n$, and let $e_1, \dots, e_m$ be a basis of $W$, completed to a basis $e_1, \dots, e_n$ of $V$. We need to prove that $\dim W^\perp = n - m$. Let $e_1^*, \dots, e_n^*$ be the dual basis of $V^*$ associated with $e_1, \dots, e_n$. We will prove that $e_{m+1}^*, \dots, e_n^*$ is a basis of $W^\perp$, which will prove the equality $\dim W^\perp = n - m$. First, notice that $e_{m+1}^*, \dots, e_n^*$ belong to $W^\perp$, since $e_j^*$ vanishes at $e_1, \dots, e_m$ for all $m < j \le n$, thus it vanishes on $W = \mathrm{Span}(e_1, \dots, e_m)$.
Since $e_{m+1}^*, \dots, e_n^*$ form a subfamily of the linearly independent family $e_1^*, \dots, e_n^*$, it suffices to prove that they span $W^\perp$. Let $l \in W^\perp$, so that $l$ vanishes on $W$. Using Remark 6.5, we obtain
\[ l = \sum_{i=m+1}^n \langle l, e_i \rangle e_i^* \in \mathrm{Span}(e_{m+1}^*, \dots, e_n^*) \]
and the proof of the equality $\dim W^\perp = n - m$ is finished.
Suppose now that $W$ is a subspace of $V^*$. By definition $W^\perp$ consists of vectors $v \in V$ such that $\langle l, v \rangle = 0$ for all $l \in W$. Let $\iota : V \to V^{**}$ be the canonical biduality map. The equality $\langle l, v \rangle = 0$ is equivalent to $\langle \iota(v), l \rangle = 0$. Thus $v \in W^\perp$ if and only if $\iota(v) \in (V^*)^*$ vanishes on $W$. Since $\iota$ is an isomorphism and since the space of $g \in (V^*)^*$ which vanish on $W$ has dimension $\dim V^* - \dim W = \dim V - \dim W$ by the first paragraph, we conclude that $\dim W^\perp = \dim V - \dim W$, finishing the proof of the theorem.
Let us also mention the following very important consequence of the previous theorem: we can recover a subspace of a finite dimensional vector space (or of its dual) from its orthogonal:

Corollary 6.23. Let $V$ be a finite dimensional vector space over $F$ and let $W$ be a subspace of $V$ or $V^*$. Then $(W^\perp)^\perp = W$.

Proof. By Problem 6.20 we have an inclusion $W \subset (W^\perp)^\perp$. By the previous theorem
\[ \dim (W^\perp)^\perp = \dim V - \dim W^\perp = \dim W. \]
Thus we must have $(W^\perp)^\perp = W$.

The previous result allows us to give equations for a subspace $W$ of a finite dimensional vector space $V$ over $F$. Indeed, let $n = \dim V$ and $p = \dim W$, thus $\dim W^\perp = n - p$ by the previous theorem. Let $l_1, \dots, l_{n-p}$ be a basis of $W^\perp$. Then by the previous corollary
\[ W = (W^\perp)^\perp = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\}. \]
If $e_1, \dots, e_n$ is a fixed basis of $V$, the linear form $l_i$ is of the form
\[ l_i(x_1 e_1 + \dots + x_n e_n) = a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n \]
for some $a_{ij} \in F$. We deduce that
\[ W = \{x_1 e_1 + \dots + x_n e_n \in V \mid a_{i1} x_1 + \dots + a_{in} x_n = 0 \ \text{for all} \ 1 \le i \le n - p\}; \]
in other words, $W$ can be defined by $n - p$ equations, which are linearly independent (since $l_1, \dots, l_{n-p}$ form a basis of $W^\perp$ and thus are linearly independent). Moreover, one can actually write down these equations explicitly if we know the coefficients $a_{ij}$, in other words if we can find a basis of $W^\perp$. But if $W$ is given, then we have already explained how to compute $W^\perp$, and we also know how to compute a basis of a given vector space, thus all the previous steps can actually be implemented in practice (we will see a concrete example in a few moments).
Conversely, if $l_1, \dots, l_{n-p}$ are linearly independent linear forms on $V$, then
\[ Z = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\} \]
is a vector subspace of $V$ of dimension $p$, since
\[ Z = (\mathrm{Span}(l_1, \dots, l_{n-p}))^\perp, \]
thus by Theorem 6.22
\[ \dim Z = n - \dim \mathrm{Span}(l_1, \dots, l_{n-p}) = n - (n - p) = p. \]
We can summarize the previous discussion in the following fundamental result:

Theorem 6.24. Let $V$ be a vector space of dimension $n$ over a field.
a) If $W$ is a subspace of $V$ of dimension $p$, then we can find linearly independent linear forms $l_1, \dots, l_{n-p}$ on $V$ such that
\[ W = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\}. \]
We say that $l_1(v) = \dots = l_{n-p}(v) = 0$ are equations of $W$ (of course, there are many possible equations for $W$!).
b) Conversely, if $l_1, \dots, l_{n-p}$ are linearly independent linear forms on $V$, then
\[ W = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\} \]
is a subspace of dimension $p$ of $V$.
With the above notations, the case $p = n - 1$ is particularly important and deserves a definition:

Definition 6.25. Let $V$ be a finite dimensional vector space over $F$. A subspace $W$ of $V$ is called a hyperplane if
\[ \dim W = \dim V - 1. \]

For instance, the hyperplanes in $\mathbb{R}^2$ are the subspaces of dimension 1, i.e., the lines. On the other hand, the hyperplanes in $\mathbb{R}^3$ are the subspaces of dimension 2, i.e., planes spanned by two linearly independent vectors (this really corresponds to the geometric intuition). There are several possible definitions of a hyperplane and actually the previous one, though motivated by the previous theorem, is not the most natural one since it does not say anything about the case of infinite dimensional vector spaces. The most general and useful definition of a hyperplane in a (not necessarily finite dimensional) vector space $V$ over $F$ is that of a subspace $W$ of $V$ of the form $\ker l$, where $l$ is a nonzero linear form on $V$. In other words, hyperplanes are precisely the kernels of nonzero linear forms. Of course, this new definition is equivalent to the previous one in the case of finite dimensional vector spaces (for instance, by the rank-nullity theorem or by the previous theorem). It also shows that the hyperplanes in $F^n$ are precisely the subspaces of the form
\[ H = \{(x_1, \dots, x_n) \in F^n \mid a_1 x_1 + \dots + a_n x_n = 0\} \]
for some nonzero vector $(a_1, \dots, a_n) \in F^n$. In general, if $e_1, \dots, e_n$ is a basis of $V$, then the hyperplanes in $V$ are precisely the subspaces of the form
\[ H = \{v = x_1 e_1 + \dots + x_n e_n \in V \mid a_1 x_1 + \dots + a_n x_n = 0\}. \]
Notice that if $H$ is a hyperplane in a finite dimensional vector space, then $H^\perp$ has dimension 1, thus it is a line in $V^*$.

We say that hyperplanes $H_1, \dots, H_p$ are linearly independent if they are the kernels of a linearly independent family of linear forms. The previous theorem can be rewritten as:

Theorem 6.26. a) Any subspace of dimension $p$ in a vector space $V$ of dimension $n$ is the intersection of $n - p$ linearly independent hyperplanes of $V$.
b) Conversely, the intersection of $n - p$ linearly independent hyperplanes in a vector space of dimension $n$ is a subspace of dimension $p$.
We end this section with two concrete problems:

Problem 6.27. Let $W$ be the subspace of $\mathbb{R}^4$ spanned by the vectors
\[ v_1 = (1, 1, -1, 0) \quad \text{and} \quad v_2 = (-1, 2, -1, 1). \]
Find equations for $W$.

Solution. Here $V = \mathbb{R}^4$ and $e_1, e_2, e_3, e_4$ is the canonical basis of $V$. As the discussion above shows, the problem comes down to finding a basis of $W^\perp$. Now $W^\perp$ consists of those linear forms
\[ l(x, y, z, t) = ax + by + cz + dt \]
which vanish on $v_1$ and $v_2$, i.e., such that
\[ a + b - c = 0, \quad -a + 2b - c + d = 0. \]
We obtain
\[ c = a + b, \quad d = a + c - 2b = 2a - b \]
and so
\[ l(x, y, z, t) = ax + by + (a + b)z + (2a - b)t = a(x + z + 2t) + b(y + z - t). \]
We deduce that a basis of $W^\perp$ is given by
\[ l_1(x, y, z, t) = x + z + 2t \quad \text{and} \quad l_2(x, y, z, t) = y + z - t. \]
As we have already seen above, we have
\[ W = \{v \in V \mid l_1(v) = l_2(v) = 0\} = \{(x, y, z, t) \in \mathbb{R}^4 \mid x + z + 2t = y + z - t = 0\} \]
and $l_1(v) = l_2(v) = 0$ are equations for $W$.
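Dually to the earlier sketch, the coefficient vectors of $l_1, l_2$ span the null space of the matrix whose rows are $v_1$ and $v_2$, so a SymPy one-liner recovers them (up to a change of basis of $W^\perp$):

```python
# Equations for W = Span(v_1, v_2)  (SymPy sketch for the problem above)
import sympy as sp

V = sp.Matrix([[1, 1, -1, 0],     # v_1
               [-1, 2, -1, 1]])   # v_2
for l in V.nullspace():           # each vector = coefficients of one equation
    print(l.T)                    # spans the same space as l_1, l_2 above
```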
Problem 6.28. Let $V = \mathbb{R}_3[X]$. Write the subspace $W$ of $V$ spanned by $1 + X$ and $1 - X + X^3$ as the intersection of 2 linearly independent hyperplanes.

Solution. Consider the canonical basis
\[ e_1 = 1, \quad e_2 = X, \quad e_3 = X^2, \quad e_4 = X^3 \]
of $V$ and
\[ v_1 = 1 + X = e_1 + e_2, \quad v_2 = 1 - X + X^3 = e_1 - e_2 + e_4. \]
Writing $W = \mathrm{Span}(v_1, v_2)$ as the intersection of 2 linearly independent hyperplanes is equivalent to finding two equations defining $W$, say $l_1(v) = l_2(v) = 0$, as then
\[ W = H_1 \cap H_2, \quad \text{where} \quad H_i = \ker l_i. \]
Thus we are reduced to finding a basis $l_1, l_2$ of $W^\perp$. A linear form $l$ on $V$ is of the form
\[ l(x_1 e_1 + x_2 e_2 + x_3 e_3 + x_4 e_4) = a x_1 + b x_2 + c x_3 + d x_4 \]
for some real numbers $a, b, c, d$. This linear form belongs to $W^\perp$ if and only if $l(v_1) = l(v_2) = 0$, which is equivalent to
\[ a + b = a - b + d = 0. \]
This gives $b = -a$ and $d = -2a$, that is
\[ l(x_1 e_1 + \dots + x_4 e_4) = a(x_1 - x_2 - 2x_4) + c x_3. \]
We deduce that a basis $l_1, l_2$ of $W^\perp$ is given by
\[ l_1(x_1 e_1 + \dots + x_4 e_4) = x_1 - x_2 - 2x_4, \quad l_2(x_1 e_1 + \dots + x_4 e_4) = x_3, \]
and so $W$ is the intersection of the two linearly independent hyperplanes
\[ H_1 = \ker l_1 = \{a + bX + cX^2 + dX^3 \in V \mid a - b - 2d = 0\} \]
and
\[ H_2 = \ker l_2 = \{a + bX + cX^2 + dX^3 \in V \mid c = 0\}. \]
6.2.1 Problems for Practice
1. Consider the linear forms
\[ l_1(x, y) = x - 2y, \quad l_2(x, y) = 2x + 3y \]
on $\mathbb{R}^2$. Give a basis of $S^\perp$, where $S = \{l_1, l_2\}$.
2. Give a basis of $S^\perp$, where $S$ consists of the linear forms
\[ l_1(x, y, z) = x + y - z, \quad l_2(x, y, z) = 2x - 3y + z, \quad l_3(x, y, z) = 3x - 2y \]
on $\mathbb{R}^3$.
3. Find a basis of $W^\perp$, where
\[ W = \{(x, y, z, t) \in \mathbb{R}^4 \mid x + 2y + z - t = 0\}. \]
4. Let $S = \{v_1, v_2, v_3\}$, where
\[ v_1 = (0, 1, 1), \quad v_2 = (1, 1, 0), \quad v_3 = (3, 5, 2). \]
Describe $S^\perp$.
5. Give equations for the subspace of $\mathbb{R}^4$ spanned by
\[ v_1 = (1, 2, 2, 1), \quad v_2 = (1, 0, 4, 2). \]
6. a) Find the dimension $p$ of the subspace $W$ of $\mathbb{R}^4$ spanned by
\[ v_1 = (1, 2, 2, 1), \quad v_2 = (1, -2, 0, 3), \quad v_3 = (0, 4, 2, -2). \]
b) Write $W$ as an intersection of $4 - p$ linearly independent hyperplanes.
c) Can we write $W$ as the intersection of $3 - p$ hyperplanes?
7. Let $V = M_n(\mathbb{R})$ and for each $A \in V$ consider the map
\[ l_A : V \to \mathbb{R}, \quad l_A(B) = \mathrm{Tr}(AB). \]
a) Prove that $l_A \in V^*$ for all $A \in V$.
b) Prove that the map
\[ V \to V^*, \quad A \mapsto l_A \]
is a bijective linear map (thus an isomorphism of vector spaces).
c) Let $S_n$ and $A_n$ be the subspaces of $V$ consisting of symmetric, respectively skew-symmetric matrices. Prove that
\[ S_n^\perp = \{l_A \mid A \in A_n\} \quad \text{and} \quad A_n^\perp = \{l_A \mid A \in S_n\}. \]
8. Let $V$ be the space of polynomials with real coefficients and let $W$ be the subspace of $V^*$ spanned by the linear forms $(l_n)_{n \ge 0}$, where $l_n(P) = P^{(n)}(0)$. Prove that $W^\perp = \{0\}$, but $W \ne V^*$. Thus if $W$ is a subspace of $V^*$, we do not always have $(W^\perp)^\perp = W$ (this is the case if $V$ is finite dimensional, or, more generally, if $W$ is finite dimensional).
9. Let $l$ be a linear form on $M_n(\mathbb{R})$ such that
\[ l(AB) = l(BA) \]
for all $A, B \in M_n(\mathbb{R})$. Let $(E_{ij})_{1 \le i, j \le n}$ be the canonical basis of $M_n(\mathbb{R})$.
a) Prove that $l(E_{11}) = \dots = l(E_{nn})$. Hint: for $i \ne j$, $E_{ij} E_{ji} = E_{ii}$ and $E_{ji} E_{ij} = E_{jj}$.
b) Prove that $l(E_{ij}) = 0$ for $i \ne j$. Hint: $E_{ii} E_{ij} = E_{ij}$ and $E_{ij} E_{ii} = O_n$.
c) Deduce that there is a real number $c$ such that
\[ l(A) = c\,\mathrm{Tr}(A) \quad \text{for all} \quad A \in M_n(\mathbb{R}). \]
10. Using the previous problem, determine the span of the set of matrices of the form $AB - BA$, with $A, B \in M_n(\mathbb{R})$ (hint: consider the orthogonal of the span).
11. Let $V$ be a vector space and let $W_1, W_2$ be subspaces of $V$ or $V^*$. Prove that
\[ (W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp. \]
12. Let $V$ be a finite dimensional vector space and let $W_1$ and $W_2$ be subspaces of $V$. Prove that
\[ (W_1 \cap W_2)^\perp = W_1^\perp + W_2^\perp. \]
Hint: use the previous problem and Corollary 6.23.
13. Let $W_1, W_2$ be complementary subspaces in a finite dimensional vector space $V$ over a field $F$. Prove that $W_1^\perp$ and $W_2^\perp$ are complementary subspaces in $V^*$.
14. Let $H_1, H_2$ be distinct hyperplanes in a vector space $V$ of dimension $n \ge 2$ over $\mathbb{R}$. Find $\dim(H_1 \cap H_2)$.
15. Prove that a nonzero finite dimensional vector space over $\mathbb{R}$ is not the union of finitely many hyperplanes.
16. Prove that the hyperplanes in $M_n(\mathbb{R})$ are precisely the subspaces of the form
\[ \{X \in M_n(\mathbb{R}) \mid \mathrm{Tr}(AX) = 0\} \]
for some nonzero matrix $A \in M_n(\mathbb{R})$.
17. Let $W$ be a subspace of dimension $p$ in a vector space $V$ of dimension $n$. Prove that the minimal number of hyperplanes whose intersection is $W$ is $n - p$.
18. Let $V$ be a finite dimensional vector space and let $l, l_1, \dots, l_n \in V^*$ be linear forms. Prove that $l \in \mathrm{Span}(l_1, \dots, l_n)$ if and only if $\cap_{i=1}^n \ker l_i \subset \ker l$.
6.3 The Transpose of a Linear Transformation
Let $V, W$ be vector spaces over a field $F$ and let $T : V \to W$ be a linear transformation. For each $l \in W^*$ we can consider the composite $l \circ T : V \to F$, which is a linear form on $V$. We obtain therefore a map
\[ {}^t T : W^* \to V^*, \quad {}^t T(l) = l \circ T. \]
In terms of the canonical pairing between $V$ and $V^*$, and between $W$ and $W^*$, we have
\[ \langle {}^t T(l), v \rangle = \langle l, T(v) \rangle \]
for all $l \in W^*$ and $v \in V$. We call ${}^t T$ the transpose of the linear transformation $T$.
If $V$ and $W$ are finite dimensional, the following theorem completely elucidates the map ${}^t T$:

Theorem 6.29. Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces and let $\mathcal{B}$ and $\mathcal{B}'$ be two bases of $V$ and $W$ respectively. If $A$ is the matrix of $T$ with respect to $\mathcal{B}$ and $\mathcal{B}'$, then the matrix of ${}^t T : W^* \to V^*$ with respect to the dual bases of $\mathcal{B}'$ and $\mathcal{B}$ is ${}^t A$.
Proof. Let $\mathcal{B} = (v_1, \dots, v_n)$ and $\mathcal{B}' = (w_1, \dots, w_m)$. Write $A = [a_{ij}]$ and let $B = [b_{ij}]$ be the matrix of ${}^t T$ with respect to the bases $w_1^*, \dots, w_m^*$ and $v_1^*, \dots, v_n^*$. By definition we have
\[ T(v_i) = \sum_{j=1}^m a_{ji} w_j, \quad \forall\, 1 \le i \le n \]
and
\[ {}^t T(w_i^*) = \sum_{k=1}^n b_{ki} v_k^*, \quad \forall\, 1 \le i \le m. \]
Fix $1 \le i \le m$ and write the last equality as
\[ \sum_{k=1}^n b_{ki} v_k^* = w_i^* \circ T. \]
Evaluating at $v_j$, with $j \in [1, n]$ arbitrary, we obtain
\[ \sum_{k=1}^n b_{ki} v_k^*(v_j) = \sum_{k=1}^n b_{ki} \delta_{kj} = b_{ji} \]
and
\[ w_i^*(T(v_j)) = w_i^*\Big(\sum_{l=1}^m a_{lj} w_l\Big) = \sum_{l=1}^m a_{lj} \delta_{il} = a_{ij}. \]
Comparing the two expressions yields
\[ a_{ij} = b_{ji} \quad \text{for all} \quad i, j, \]
which is exactly saying that $B = {}^t A$.
The following problems establish basic properties of the correspondence $T \mapsto {}^t T$. For linear maps between finite dimensional vector spaces, they follow immediately from the previous theorem and properties of the transpose map on matrices that we have already established in the first chapter. If we want to deal with arbitrary vector spaces, we cannot use these results. Fortunately, the results are still rather easy to establish in full generality.

Problem 6.30. Prove that for all linear transformations $T_1, T_2 : V \to W$ and all scalars $c \in F$ we have
\[ {}^t(T_1 + cT_2) = {}^t T_1 + c\,{}^t T_2. \]

Solution. We need to prove that if $l$ is a linear form on $W$, then
\[ l \circ (T_1 + cT_2) = l \circ T_1 + c\,l \circ T_2. \]
This follows from the fact that $l$ is linear.
Problem 6.31. a) Let $T_1 : V_1 \to V_2$ and $T_2 : V_2 \to V_3$ be linear transformations. Prove that
\[ {}^t(T_2 \circ T_1) = {}^t T_1 \circ {}^t T_2. \]
b) Deduce that if $T : V \to V$ is an isomorphism, then so is ${}^t T : V^* \to V^*$, and $({}^t T)^{-1} = {}^t(T^{-1})$.

Solution. a) Let $l$ be a linear form on $V_3$. Then
\[ {}^t(T_2 \circ T_1)(l) = l \circ (T_2 \circ T_1) = (l \circ T_2) \circ T_1 = {}^t T_1(l \circ T_2) = {}^t T_1({}^t T_2(l)) = ({}^t T_1 \circ {}^t T_2)(l). \]
The result follows.
b) Since $T$ is an isomorphism, there is a linear transformation $T^{-1}$ such that $T \circ T^{-1} = T^{-1} \circ T = \mathrm{id}$. Using part a) and the obvious equality ${}^t \mathrm{id} = \mathrm{id}$, we obtain
\[ {}^t T \circ {}^t(T^{-1}) = \mathrm{id} = {}^t(T^{-1}) \circ {}^t T, \]
from where the result follows.
Problem 6.32. Let $T : V \to W$ be a linear transformation and let $\iota_V : V \to V^{**}$, $\iota_W : W \to W^{**}$ be the canonical biduality maps. Prove that
\[ \iota_W \circ T = {}^t({}^t T) \circ \iota_V. \]

Solution. Let $v \in V$; then
\[ {}^t({}^t T) \circ \iota_V(v) = {}^t({}^t T)(\mathrm{ev}_v) = \mathrm{ev}_v \circ {}^t T. \]
The last map sends $l \in W^*$ to
\[ \mathrm{ev}_v \circ {}^t T(l) = \mathrm{ev}_v(l \circ T) = (l \circ T)(v) = l(T(v)) = \mathrm{ev}_{T(v)}(l) = \iota_W(T(v))(l). \]
Thus
\[ {}^t({}^t T) \circ \iota_V(v) = \mathrm{ev}_v \circ {}^t T = \iota_W(T(v)) \]
for all $v \in V$, which is exactly the desired equality.
The following technical but very important result makes the link between the
transpose operation and orthogonality. This allows us to use the powerful results
established in the previous section.
Theorem 6.33. Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces. We have
\[ \ker({}^t T) = (\mathrm{Im}(T))^\perp, \quad \ker T = (\mathrm{Im}({}^t T))^\perp \]
and
\[ \mathrm{Im}({}^t T) = (\ker T)^\perp, \quad \mathrm{Im}(T) = (\ker({}^t T))^\perp. \]

Proof. By definition we have
\[ \ker({}^t T) = \{l \in W^* \mid l \circ T = 0\} = \{l \in W^* \mid l(T(v)) = 0 \ \forall v \in V\} = \{l \in W^* \mid l(w) = 0 \ \forall w \in \mathrm{Im}(T)\} = (\mathrm{Im}(T))^\perp. \]
Similarly, we have
\[ (\mathrm{Im}({}^t T))^\perp = \{v \in V \mid {}^t T(l)(v) = 0 \ \forall l \in W^*\} = \{v \in V \mid l(T(v)) = 0 \ \forall l \in W^*\} = \{v \in V \mid T(v) = 0\} = \ker T. \]
Note that we could have also deduced this second result by using the already established equality $\ker({}^t T) = (\mathrm{Im}(T))^\perp$, applying it to ${}^t T$ and using the previous problem (and the fact that $\iota_V$ and $\iota_W$ are isomorphisms).
Using what we have already established and the fact that our spaces are finite dimensional (thus we can use Corollary 6.23), we obtain
\[ (\ker T)^\perp = ((\mathrm{Im}({}^t T))^\perp)^\perp = \mathrm{Im}({}^t T). \]
We proceed similarly for the equality $\mathrm{Im}(T) = (\ker({}^t T))^\perp$.

The previous theorem allows us to give a new proof of the classical but nontrivial result that a matrix and its transpose have the same rank:

Problem 6.34. a) Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces. Prove that $T$ and ${}^t T$ have the same rank.
b) Prove that if $A \in M_{m,n}(F)$, then $A$ and its transpose have the same rank.
Solution. Using Theorem 6.29, we see that b) is simply the matrix translation of a). In order to prove part a), we use Theorem 6.33, which yields
\[ \mathrm{rank}({}^t T) = \dim(\mathrm{Im}({}^t T)) = \dim(\ker T)^\perp. \]
By Theorem 6.22 and the rank-nullity theorem, the last expression equals
\[ \dim V - \dim \ker T = \dim \mathrm{Im}(T) = \mathrm{rank}(T). \]
The result follows.
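A quick numerical illustration of part b), as a sketch with a random rectangular matrix:

```python
# rank(A) == rank(A^T) for a random 5 x 8 matrix  (NumPy sketch)
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 8))
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
```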
6.3.1 Problems for Practice
In the next problems we fix a field F .
1. Consider the linear map
\[ T : \mathbb{R}^3 \to \mathbb{R}^2, \quad T(x, y, z) = (x - 2y + 3z, x - y + z). \]
Let $e_1^*, e_2^*$ be the dual basis of the canonical basis of $\mathbb{R}^2$. Find the coordinates of the vectors ${}^t T(e_1^* - e_2^*)$ and ${}^t T(e_1^* + e_2^*)$ with respect to the dual basis of the canonical basis of $\mathbb{R}^3$.
2. Find the matrix of ${}^t T$ with respect to the dual basis of the canonical basis of $\mathbb{R}^3$, knowing that
\[ T(x, y, z) = (x - 2y + 3z, 2y - z, x - 4y + 3z). \]
3. Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces over $F$. Prove that
a) $T$ is injective if and only if ${}^t T$ is surjective.
b) $T$ is surjective if and only if ${}^t T$ is injective.
4. Let $T : V \to V$ be a linear transformation on a finite dimensional vector space $V$ over $F$, and let $W$ be a subspace of $V$. Prove that $W$ is stable under $T$ if and only if $W^\perp$ is stable under ${}^t T$.
5. Find all planes of $\mathbb{R}^3$ which are invariant under the linear transformation
\[ T : \mathbb{R}^3 \to \mathbb{R}^3, \quad T(x, y, z) = (x - 2y + z, 0, x + y + z). \]
6. Let $V$ be a finite dimensional vector space and let $T : V \to V$ be a linear transformation such that any hyperplane of $V$ is stable under $T$. Prove that $T$ is a scalar times the identity (hint: prove that any line in $V^*$ is stable under ${}^t T$).
6.4 Application to the Classification of Nilpotent Matrices
In this section we will use the results established in the previous sections to give a simple proof of a beautiful and extremely important theorem of Jordan. This will be used later on to completely classify matrices in $M_n(\mathbb{C})$ up to similarity. Actually, the proof will be split into a series of (relatively) easy exercises, many of them having their own interest. We will work over an arbitrary field $F$ in this section, but the reader may assume that $F = \mathbb{R}$ or $\mathbb{C}$ if he/she wants.
We have seen several important classes of matrices so far: diagonal, upper-triangular, symmetric, orthogonal, etc. It is time to introduce another fundamental class of matrices and linear transformations:
Definition 6.35. a) Let $V$ be a vector space over $F$ and let $T : V \to V$ be a linear transformation. We say that $T$ is nilpotent if $T^k = 0$ for some $k \ge 1$, where $T^k = T \circ T \circ \dots \circ T$ ($k$ times). The smallest such positive integer $k$ is called the index of $T$. Thus if $k$ is the index of $T$, then $T^k = 0$ but $T^{k-1} \ne 0$.
b) A matrix $A \in M_n(F)$ is called nilpotent if $A^k = O_n$ for some $k \ge 1$. The smallest such positive integer $k$ is called the index of $A$.

If $V$ is a finite dimensional vector space over $F$, if $\mathcal{B}$ is a basis of $V$ and if $T : V \to V$ is a linear transformation whose matrix with respect to $\mathcal{B}$ is $A \in M_n(F)$, then the matrix of $T^k$ with respect to $\mathcal{B}$ is $A^k$. It follows that $T$ is nilpotent if and only if $A$ is nilpotent, and in this case the index of $T$ equals the index of $A$.
In particular, any matrix similar to a nilpotent matrix is nilpotent and has the same index. This can also be proved directly using matrix manipulations: if $A$ is nilpotent, $P$ is invertible, and $B = PAP^{-1}$, then an easy induction shows that
\[ B^k = P A^k P^{-1} \]
for all $k \ge 1$, thus $B^k = O_n$ if and only if $A^k = O_n$, establishing the previous statement.
Problem 6.36. Let $T_1, T_2$ be two linear transformations on a vector space $V$ and assume that $T_1 \circ T_2 = T_2 \circ T_1$. If $T_1, T_2$ are nilpotent, then so are $T_1 \circ T_2$ and $T_1 + T_2$.

Solution. Say $T_1^{k_1} = 0$ and $T_2^{k_2} = 0$ for some $k_1, k_2 \ge 1$. Then $T_1^k = T_2^k = 0$ where $k = k_1 + k_2$. Since $T_1$ and $T_2$ commute, we obtain
\[ (T_1 \circ T_2)^k = T_1^k \circ T_2^k = 0 \]
and
\[ (T_1 + T_2)^{2k} = \sum_{i=0}^{2k} \binom{2k}{i} T_1^{2k-i} T_2^i. \]
For each $0 \le i \le k$ we have $T_1^{2k-i} = 0$ and for each $i \in [k+1, 2k]$ we have $T_2^i = 0$. Thus $T_1^{2k-i} T_2^i = 0$ for all $0 \le i \le 2k$ and so $(T_1 + T_2)^{2k} = 0$, establishing that $T_1 + T_2$ is nilpotent.
Remark 6.37. 1) Similarly (and actually a consequence of the problem), the sum/product of two nilpotent commuting matrices is a nilpotent matrix.
2) The result of the previous problem is no longer true if we don't assume that $T_1$ and $T_2$ commute: the matrices $\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ are nilpotent, but their sum is not nilpotent; also the matrices $\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ are nilpotent, but their product is not nilpotent.
3) It follows from 2) that the nilpotent matrices in $M_n(F)$ do not form a vector subspace of $M_n(F)$. A rather challenging exercise for the reader is to prove that the vector subspace of $M_n(F)$ spanned by the nilpotent matrices is precisely the set of matrices of trace 0.
The result established in the following problem is very important:

Problem 6.38. a) Let $T : V \to V$ be a nilpotent transformation of index $k$ and let $v \in V$ be a vector such that $T^{k-1}(v) \ne 0$. Prove that the family $(v, T(v), \dots, T^{k-1}(v))$ is linearly independent in $V$.
b) Deduce that if $V$ is finite dimensional then the index of any nilpotent transformation on $V$ does not exceed $\dim V$.
c) Prove that if $A \in M_n(F)$ is nilpotent, then its index does not exceed $n$.

Solution. a) Suppose that
\[ a_0 v + a_1 T(v) + \dots + a_{k-1} T^{k-1}(v) = 0 \tag{6.1} \]
for some scalars $a_0, \dots, a_{k-1}$. Applying $T^{k-1}$ to this relation and taking into account that $T^j = 0$ for $j \ge k$ yields
\[ a_0 T^{k-1}(v) + 0 + \dots + 0 = 0, \]
and since $T^{k-1}(v) \ne 0$, we obtain $a_0 = 0$. Applying now $T^{k-2}$ to relation (6.1) gives $a_1 T^{k-1}(v) = 0$ and then $a_1 = 0$. Continuing by induction yields $a_0 = \dots = a_{k-1} = 0$ and the result follows.
b) Suppose that $T$ is nilpotent on $V$, of index $k$. Part a) shows that $V$ contains a linearly independent family with $k$ elements, thus $\dim V \ge k$ and we are done.
c) This follows from b) applied to $V = F^n$ and the linear map $T : V \to V$ sending $X$ to $AX$ (using the discussion preceding the problem, which shows that $A$ and $T$ have the same index).
Using the previous problem, we are ready to introduce a fundamental kind of
nilpotent matrix: Jordan blocks. This is the goal of the next problem:
Problem 6.39. Let $T : V \to V$ be a nilpotent linear transformation of index $k$ on a vector space $V$, let $v \in V$ and let
\[ W = \mathrm{Span}(v, T(v), \dots, T^{k-1}(v)). \]
a) Prove that $W$ is stable under $T$.
b) Prove that if $T^{k-1}(v) \ne 0$, then $T^{k-1}(v), T^{k-2}(v), \dots, T(v), v$ form a basis of $W$ (thus $\dim W = k$) and the matrix of the linear transformation $T : W \to W$ with respect to this basis is
\[ J_k = \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}. \]
This matrix is called a Jordan block of size $k$ (note that $J_1 = O_1$, the $1 \times 1$ matrix with one entry equal to 0).

Solution. a) Any element of $W$ is of the form
\[ w = a_0 v + a_1 T(v) + \dots + a_{k-1} T^{k-1}(v). \]
Since $T^k(v) = 0$, we have
\[ T(w) = a_0 T(v) + \dots + a_{k-2} T^{k-1}(v) \in W, \]
thus $W$ is stable under $T$.
b) If $T^{k-1}(v) \ne 0$, part a) of the previous problem shows that $T^{k-1}(v), \dots, T(v), v$ is a linearly independent family and since it also spans $W$, it is a basis of $W$. Moreover, since $T^k(v) = 0$ and
\[ T(T^i(v)) = T^{i+1}(v) \]
for $k - 2 \ge i \ge 0$, it is clear that the matrix of $T : W \to W$ with respect to this basis is $J_k$.
The main theorem concerning nilpotent linear transformations on finite dimensional vector spaces is the following beautiful result:

Theorem 6.40 (Jordan). Let $V$ be a finite dimensional vector space over a field $F$ and let $T : V \to V$ be a nilpotent linear transformation. Then there is a basis of $V$ with respect to which the matrix of $T$ is of the form
\[ A = \begin{bmatrix} J_{k_1} & 0 & \dots & 0 \\ 0 & J_{k_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_{k_d} \end{bmatrix} \]
for some sequence of positive integers $k_1 \ge k_2 \ge \dots \ge k_d$ with
\[ k_1 + \dots + k_d = n. \]
Moreover, the sequence $(k_1, \dots, k_d)$ is uniquely determined.

We can restate the previous theorem in terms of matrices:

Theorem 6.41 (Jordan). Any nilpotent matrix $A \in M_n(F)$ is similar to a block-diagonal matrix
\[ \begin{bmatrix} J_{k_1} & 0 & \dots & 0 \\ 0 & J_{k_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_{k_d} \end{bmatrix} \]
for a unique sequence of positive integers $(k_1, \dots, k_d)$ with $k_1 \ge k_2 \ge \dots \ge k_d$ and
\[ k_1 + k_2 + \dots + k_d = n. \]
This block-diagonal matrix is called the Jordan normal (or canonical) form of $A$ (or of $T$).

The next series of problems is devoted to the proof of this theorem. We will start with the uniqueness of the sequence $(k_1, \dots, k_d)$. The proof, given in the next three problems, will also show how to compute these integers explicitly and therefore how to find in practice the Jordan normal form of a nilpotent matrix.
Problem 6.42. Let $T$ be the linear transformation on $F^n$ associated with the Jordan block $J_n$. Prove that for all $1 \le k \le n - 1$ we have
\[ \mathrm{rank}(T^k) = n - k \]
and deduce that
\[ \mathrm{rank}(J_n^k) = n - k \]
for $1 \le k \le n - 1$.

Solution. If $e_1, \dots, e_n$ is the canonical basis of $F^n$, then
\[ T(e_1) = 0, \quad T(e_2) = e_1, \quad T(e_3) = e_2, \quad \dots, \quad T(e_n) = e_{n-1}. \]
In other words, $T(e_i) = e_{i-1}$ for $1 \le i \le n$, with the convention that $e_0 = 0$. We deduce that $T^2(e_i) = T(e_{i-1}) = e_{i-2}$ for $1 \le i \le n$, with $e_{-1} = 0$. An immediate induction yields
\[ T^j(e_i) = e_{i-j} \]
for $1 \le j \le n - 1$ and $1 \le i \le n$, with $e_r = 0$ for $r \le 0$. Thus
\[ \mathrm{Im}(T^j) = \mathrm{Span}(e_1, e_2, \dots, e_{n-j}) \]
and this space has dimension $n - j$, which yields
\[ \mathrm{rank}(T^k) = n - k \]
for $1 \le k \le n - 1$. The second part is an immediate consequence of the first part.
Problem 6.43. Suppose that $A \in M_n(F)$ is similar to
\[ \begin{bmatrix} J_{k_1} & 0 & \dots & 0 \\ 0 & J_{k_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_{k_d} \end{bmatrix}. \]
Let $N_j$ be the number of terms equal to $j$ in the sequence $(k_1, \dots, k_d)$. Prove that for all $1 \le j \le n$
\[ \mathrm{rank}(A^j) = N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n. \]
Solution. If $A_1, \dots, A_d$ are square matrices, then
\[ \mathrm{rank}\begin{bmatrix} A_1 & 0 & \dots & 0 \\ 0 & A_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & A_d \end{bmatrix} = \mathrm{rank}(A_1) + \dots + \mathrm{rank}(A_d), \]
as the reader can easily check by using the fact that the rank of a matrix is the dimension of the span of its column set. Since similar matrices have the same rank, we deduce that for all $j \ge 1$ we have
\[ \mathrm{rank}(A^j) = \mathrm{rank}\begin{bmatrix} J_{k_1}^j & 0 & \dots & 0 \\ 0 & J_{k_2}^j & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_{k_d}^j \end{bmatrix} = \sum_{i=1}^d \mathrm{rank}(J_{k_i}^j). \]
By the previous problem, $\mathrm{rank}(J_{k_i}^j)$ equals $k_i - j$ if $j \le k_i$ and 0 otherwise. Thus, since $N_t$ is the number of indices $i$ for which $k_i = t$, we have
\[ \sum_{i=1}^d \mathrm{rank}(J_{k_i}^j) = \sum_{t \ge j} \sum_{k_i = t} \mathrm{rank}(J_t^j) = \sum_{t \ge j} N_t (t - j) = N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n. \]
Problem 6.44. Prove that if $k_1 \ge \dots \ge k_d$ and $k_1' \ge \dots \ge k_{d'}'$ are sequences of positive integers adding up to $n$ and such that
\[ A = \begin{bmatrix} J_{k_1} & 0 & \dots & 0 \\ 0 & J_{k_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_{k_d} \end{bmatrix} \quad \text{is similar to} \quad B = \begin{bmatrix} J_{k_1'} & 0 & \dots & 0 \\ 0 & J_{k_2'} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_{k_{d'}'} \end{bmatrix}, \]
then these sequences are equal. This is the uniqueness part of Jordan's theorem.

Solution. Let $N_j$ be the number of terms equal to $j$ in the sequence $(k_1, \dots, k_d)$, and define similarly $N_j'$ for the sequence $(k_1', \dots, k_{d'}')$. We are asked to prove that $N_j = N_j'$ for $1 \le j \le n$.
Since $A$ and $B$ are similar, $A^j$ and $B^j$ are similar for all $j \ge 1$, thus they have the same rank. Using the previous problem, we deduce that
\[ N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n = N_{j+1}' + 2N_{j+2}' + \dots + (n - j)N_n' \]
for $j \ge 1$. Setting $j = n - 1$ gives $N_n = N_n'$, then setting $j = n - 2$ and using $N_n = N_n'$ gives $N_{n-1} = N_{n-1}'$. Continuing this way yields $N_j = N_j'$ for $2 \le j \le n$. We still need to prove that $N_1 = N_1'$, but this follows from
\[ N_1 + 2N_2 + \dots + nN_n = N_1' + 2N_2' + \dots + nN_n' = n, \]
since
\[ k_1 + \dots + k_d = k_1' + \dots + k_{d'}' = n. \]
Remark 6.45. The previous two problems show how to compute the sequence $(k_1, \dots, k_d)$ in practice. Namely, we are reduced to computing $N_1, \dots, N_n$. For this, we use the relations
\[ \mathrm{rank}(A^j) = N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n \]
for $1 \le j \le n$ (it suffices to take $j \le k$ if $A$ has index $k$, noting that the previous relation for $j = k$ already yields $N_{k+1} = \dots = N_n = 0$). These determine completely $N_2, \dots, N_n$. To find $N_1$, we use the relation
\[ N_1 + 2N_2 + \dots + nN_n = n. \]
Example 6.46. As a concrete example, consider the matrix
\[ A = \begin{bmatrix} 1 & 1 & 1 & -2 \\ -1 & -1 & -1 & 2 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & -1 & -2 \end{bmatrix}. \]
One can easily check that this matrix is nilpotent: we compute using the product rule
\[ A^2 = \begin{bmatrix} 0 & 0 & 4 & 8 \\ 0 & 0 & -4 & -8 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
and then $A^3 = O_4$, using again the product rule. Thus $A$ is nilpotent of index $k = 3$. It follows that $N_4 = 0$ and
\[ N_1 + 2N_2 + 3N_3 = 4. \]
Next, it is easy to see that the rank of $A$ is 2: the second row is a multiple of the first, the last row is a multiple of the third, and the first and third rows are linearly independent. Thus
\[ 2 = \mathrm{rank}(A) = N_2 + 2N_3 + 3N_4 = N_2 + 2N_3. \]
Next, it is clear that $A^2$ has rank 1, thus
\[ 1 = \mathrm{rank}(A^2) = N_3 + 2N_4 = N_3. \]
It follows that
\[ N_1 = 1, \quad N_2 = 0, \quad N_3 = 1, \quad N_4 = 0, \]
and so the Jordan normal form of $A$ is
\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
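The whole procedure of Remark 6.45 is mechanical; here is a NumPy sketch that recovers $N_1, \dots, N_4$ for the matrix of this example from the ranks of its powers:

```python
# Jordan data N_1, ..., N_n from rank(A^j)  (sketch for the example above)
import numpy as np

A = np.array([[1, 1, 1, -2],
              [-1, -1, -1, 2],
              [0, 0, 2, 4],
              [0, 0, -1, -2]], dtype=float)
n = 4
ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(A, j))
         for j in range(1, n + 1)]           # [2, 1, 0, 0]

# rank(A^j) = sum_{t > j} (t - j) N_t: solve from the top down,
# then use N_1 + 2 N_2 + ... + n N_n = n for N_1
N = [0] * (n + 1)
for j in range(n - 1, 0, -1):
    N[j + 1] = ranks[j - 1] - sum((t - j) * N[t] for t in range(j + 2, n + 1))
N[1] = n - sum(t * N[t] for t in range(2, n + 1))
print(N[1:])                                 # [1, 0, 1, 0]
```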
The uniqueness part of Jordan's theorem being proved, it remains to prove the existence part, which is much harder. The basic idea is however not very surprising: we work by strong induction on $\dim V$, the case $\dim V = 1$ being clear (as then $T = 0$). Assume that the result holds for $\dim V < n$ and let us consider the case $\dim V = n$. We may assume that $T \ne 0$, otherwise we are done. Let $k_1 = k$ be the index of $T$ and let $v \in V$ be such that $T^{k-1}(v) \ne 0$. By Problem 6.39, the subspace
\[ W = \mathrm{Span}(v, T(v), \dots, T^{k-1}(v)) \]
is invariant under $T$, which acts on it as the matrix $J_k$ on $F^k$. Moreover, $\dim W = k$. If $k = n$, then we are done. If not, we look for a complementary subspace $W'$ of $W$ which is stable under $T$. If we could find such a space $W'$, then we could apply the inductive hypothesis to the map $T : W' \to W'$ (note that its index does not exceed $k_1$) and find a basis of $W'$ in which the matrix of $T$ has the desired form. Patching together the basis $T^{k-1}(v), \dots, T(v), v$ and this basis of $W'$ would yield the desired basis of $V$ and would finish the inductive proof. The key difficulty is proving the existence of $W'$. This will be done in the two problems below.
Problem 6.47. a) Prove that if $A \in M_n(F)$ is nilpotent, then ${}^t A$ is nilpotent and has the same index as $A$.
b) Suppose that $V$ is a finite dimensional vector space over $F$. Prove that if $T : V \to V$ is nilpotent, then ${}^t T : V^* \to V^*$ is also nilpotent and has the same index as $T$.

Solution. a) For all $k \ge 1$ we have
\[ ({}^t A)^k = {}^t(A^k), \]
thus $({}^t A)^k = O_n$ if and only if $A^k = O_n$. The result follows.
b) Let $\mathcal{B}$ be a basis of $V$ and let $\mathcal{B}^*$ be the dual basis of $\mathcal{B}$. If $A$ is the matrix of $T$ with respect to $\mathcal{B}$, then the matrix of ${}^t T$ with respect to $\mathcal{B}^*$ is ${}^t A$, by Theorem 6.29. The result follows now from part a).
We can also prove this directly as follows: if $k \ge 1$, then $({}^t T)^k = 0$ if and only if $({}^t T)^k(l) = 0$ for all $l \in V^*$, equivalently $l \circ T^k = 0$ for all $l \in V^*$. This can be written as: for all $v \in V$ and all $l \in V^*$ we have $l(T^k(v)) = 0$. Now, the assertion that $l(T^k(v)) = 0$ for all $l \in V^*$ is equivalent to $T^k(v) = 0$, by injectivity of the biduality map $V \to V^{**}$. Thus $({}^t T)^k = 0$ if and only if $T^k = 0$, and this even when $V$ is infinite dimensional. In other words, part b) holds in all generality (but the proof requires the injectivity of the map $V \to V^{**}$, which is difficult and was not given for infinite dimensional vector spaces).

Problem 6.48. Let $T : V \to V$ be a nilpotent transformation of index $k$ on a finite dimensional vector space $V$ and let $v \in V$ be such that $T^{k-1}(v) \ne 0$. We denote for simplicity $S = {}^t T : V^* \to V^*$ and we recall that $S$ is nilpotent of index $k$ by the previous problem.
a) Explain why we can find a linear form $l \in V^*$ such that
\[ l(T^{k-1}(v)) \ne 0. \]
b) Prove that the orthogonal $W'$ of
\[ Z = \mathrm{Span}(l, S(l), \dots, S^{k-1}(l)) \subset V^* \]
is stable under $T$.
c) Prove that $\dim W' + \dim W = \dim V$.
d) Deduce that $W' \oplus W = V$, thus $W'$ is a complementary subspace of $W$, stable under $T$. This finishes the proof of Jordan's theorem!

Solution. a) This is a direct consequence of the injectivity (and actually bijectivity, since our space is finite dimensional) of the biduality map $V \to V^{**}$.
b) Let us try to understand concretely the space $Z^\perp$. A vector $x$ is in $Z^\perp$ if and only if $S^j(l)(x) = 0$ for $0 \le j \le k - 1$. Since $S = {}^t T$, we have
\[ S^j(l)(x) = (l \circ T^j)(x) = l(T^j(x)), \]
thus
\[ Z^\perp = \{x \in V \mid l(T^j(x)) = 0 \ \text{for all} \ 0 \le j \le k - 1\}. \]
Now let $x \in Z^\perp$ and let us prove that $T(x) \in Z^\perp$, i.e., that
\[ l(T^j(T(x))) = 0 \]
for $0 \le j \le k - 1$, or equivalently $l(T^j(x)) = 0$ for $1 \le j \le k$. This is clear for $1 \le j \le k - 1$, since $x \in Z^\perp$, and it is true for $j = k$ since by assumption $T^k = 0$.
c) By Theorem 6.22 we have
\[ \dim W' = \dim Z^\perp = \dim V^* - \dim Z = \dim V - \dim Z. \]
It suffices therefore to prove that $\dim Z = \dim W$. Now $\dim W = k$ by Problem 6.39, and $\dim Z = k$ by the same problem applied to $V^*$, $S$ (which is nilpotent of index $k$) and $l$ (note that $S^{k-1}(l) = l \circ T^{k-1} \ne 0$ since $l(T^{k-1}(v)) \ne 0$). Thus $\dim W' + \dim W = \dim V$.
d) By part c) it suffices to prove that $W' \cap W = \{0\}$. Let $w \in W$ and write
\[ w = a_0 v + a_1 T(v) + \dots + a_{k-1} T^{k-1}(v) \]
for some scalars $a_0, \dots, a_{k-1}$. Suppose that $w \in W'$, thus $w \in Z^\perp$, that is $l(T^j(w)) = 0$ for $0 \le j \le k - 1$. Taking $j = k - 1$ and using the fact that $T^m = 0$ for $m \ge k$ yields
\[ a_0 l(T^{k-1}(v)) = 0. \]
Since $l(T^{k-1}(v)) \ne 0$, we must have $a_0 = 0$. Taking $j = k - 2$ gives similarly $a_1 l(T^{k-1}(v)) = 0$ and so $a_1 = 0$. Continuing like this we obtain $a_0 = \dots = a_{k-1} = 0$ and so $w = 0$. This finishes the solution of the problem.
6.4.1 Problems for Practice
In the problems below F is a field.
1. Let $T : V \to V$ be a linear transformation on a finite dimensional vector space such that for all $v \in V$ there is a positive integer $k$ such that $T^k(v) = 0$. Prove that $T$ is nilpotent.
2. Let $V$ be the space of polynomials with real coefficients and let $T : V \to V$ be the map sending a polynomial to its derivative. Prove that for all $v \in V$ there is a positive integer $k$ such that $T^k(v) = 0$, but $T$ is not nilpotent.
3. Describe the possible Jordan normal forms for a nilpotent matrix $A \in M_4(F)$.
4. Find, up to similarity, all nilpotent $3 \times 3$ matrices with real entries.
5. A nilpotent matrix $A \in M_5(\mathbb{C})$ satisfies $\mathrm{rank}(A) = 3$ and $\mathrm{rank}(A^2) = 1$. Find its Jordan normal form.
6. a) Prove that the matrix
\[ A = \begin{bmatrix} 3 & 1 & -3 \\ 2 & 0 & -2 \\ 3 & 1 & -3 \end{bmatrix} \]
is nilpotent and find its index.
b) Find the Jordan normal form of $A$.
7. Find the Jordan normal form of the matrix
\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 2 \\ -1 & -1 & 0 \end{bmatrix}. \]
8. Consider the matrix
\[ A = \begin{bmatrix} 3 & 1 & 1 & 7 \\ -9 & -3 & -7 & -1 \\ 0 & 0 & 4 & 8 \\ 0 & 0 & -2 & -4 \end{bmatrix}. \]
a) Prove that $A$ is nilpotent.
b) Find its Jordan normal form.
9. Describe up to similarity all matrices $A \in M_n(F)$ such that $A^2 = O_n$.
10. Let $A \in M_n(F)$ be a nilpotent matrix. Prove that $A$ has index $n$ if and only if $\mathrm{rank}(A) = n - 1$.
11. Let $A \in M_n(F)$ be a nilpotent matrix, say $A^k = O_n$ for some $k \ge 1$. Prove that $I_n + xA$ is invertible for all $x \in F$ and
\[ (I_n + xA)^{-1} = I_n - xA + x^2 A^2 - \dots + (-1)^{k-1} x^{k-1} A^{k-1}. \]
12. (Fitting decomposition) Let $V$ be a finite dimensional vector space over a field $F$ and let $T : V \to V$ be a linear transformation. Write
\[ N = \bigcup_{k \ge 1} \ker T^k, \quad I = \bigcap_{k \ge 1} \mathrm{Im}(T^k). \]
a) Prove that $N$ and $I$ are subspaces of $V$, stable under $T$.
b) Prove that there exists $n$ such that $N = \ker T^n$ and $I = \mathrm{Im}(T^n)$.
c) Deduce that $V = N \oplus I$.
d) Prove that the restriction of $T$ to $N$ is nilpotent and the restriction of $T$ to $I$ is invertible. We call the decomposition $V = N \oplus I$ the Fitting decomposition of $T$.
e) Prove that if $V = V_1 \oplus V_2$ is a decomposition of $V$ into subspaces stable under $T$ and such that $T|_{V_1}$ is nilpotent and $T|_{V_2}$ is invertible, then $V_1 = N$ and $V_2 = I$.
13. Find the Fitting decomposition of the matrix
\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}. \]
Do the same with the matrix
\[ A = \begin{bmatrix} 2 & 1 \\ -4 & -2 \end{bmatrix}. \]
Chapter 7
Determinants
Abstract This rather technical chapter is devoted to the study of determinants
of matrices and linear transformations. These are introduced and studied via
multilinear maps. The present chapter is rich in examples, both numerical and
theoretical.
Keywords Determinant • Multilinear map • Laplace expansion • Cofactor
This rather technical chapter is devoted to the study of determinants of matrices and linear transformations. We have already seen in the chapter devoted to square matrices of order 2 that determinants are absolutely fundamental in the study of matrices. The advantage in that case is that many key properties of the determinant can be checked by easy computations, while this is no longer the case for general matrices: it is actually not even clear what the analogue of the determinant should be for $n \times n$ matrices.
The definition of the determinant of a matrix is rather miraculous at first sight, so we spend a large part of this chapter explaining why this definition is natural and motivated by the study of multilinear forms (which will also play a key role in the last chapter of this book). Once the machinery is developed, the proofs of the main properties of the determinants are rather formal, while they would be very painful if one had to manipulate the brutal definition of a determinant as a polynomial expression of the entries of the matrix.
Permutations play a key role in this chapter, so the reader not familiar with them should start by reading the corresponding section in the appendix dealing with algebraic preliminaries. The most important thing for us is that the set $S_n$ of permutations of $\{1, 2, \dots, n\}$ is a group of order $n!$ with respect to the composition of permutations, and there is a nontrivial homomorphism $\varepsilon : S_n \to \{-1, 1\}$, the signature.
7.1 Multilinear Maps
Let V1 ; V2 ; : : :; Vd and W be vector spaces over a field F (the reader might prefer to
take R or C in the sequel).
Definition 7.1. A map f W V1 : : : Vd ! W is called multilinear if for all
i 2 f1; 2; : : :; d g and all v1 2 V1 ; : : :; vi1 2 Vi1 ; vi C1 2 Vi C1 ; : : :; vd 2 Vd the
map
Vi ! W;
vi 7! f .v1 ; v2 ; : : :; vd /
is linear.
Let us see what the condition really says in a few simple cases. First, if d D 1,
then it simply says that the map f W V1 ! W is linear. Secondly, if d D 2, the
condition is that x 7! f .a; x/ and x 7! f .x; b/ are linear for all a 2 V1 and b 2
V2 . Such maps are also called bilinear and they will be studied rather extensively
in the last chapter of the book. If d D 3, the condition is that x 7! f .a; b; x/,
x 7! f .a; x; c/ and x 7! f .x; b; c/ should be linear for all a 2 V1 ; b 2 V2 and
c 2 V3 .
There is a catch with the previous definition: one might naively believe that a
multilinear map is the same as a linear map f W V1 : : : Vd ! W . This is
definitely not the case: consider the map f W R2 ! R sending .x; y/ to xy. It is
bilinear since for all a the map x 7! ax is linear, but the map f is not linear, since
f ..1; 0// C f ..0; 1// D 0 ¤ f ..1; 0/ C .0; 1// D 1:
One can develop a whole theory (of tensor products) based on this observation, and
the reader will find the basic results of this theory in a series of exercises at the end
of this section (see the problems for practice section).
Though one can develop a whole theory in the general setting introduced before, we will specialize to the case $V_1 = V_2 = \dots = V_d$ and we will simply call this space $V$. Multilinear maps $f : V^d \to W$ will also be called $d$-linear maps. The next problem gives an important recipe which yields $d$-linear forms from linear forms.
Problem 7.2. Let $f_1, f_2, \dots, f_d : V \to F$ be linear forms and consider the map
$$f : V^d \to F, \quad (x_1, \dots, x_d) \mapsto f_1(x_1) \cdots f_d(x_d).$$
Prove that $f$ is $d$-linear.
Solution. If $i \in \{1, \dots, d\}$ and $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d \in V$ are fixed, then the map $x_i \mapsto f(x_1, \dots, x_d)$ is simply the map $x_i \mapsto a f_i(x_i)$, where $a = \prod_{j \neq i} f_j(x_j)$ is a scalar. Since $f_i$ is a linear form, so is $a f_i$, thus $x_i \mapsto a f_i(x_i)$ is a linear map and the result follows.
Not all d -linear forms are just products of linear forms:
Problem 7.3. Prove that the map $f : (\mathbb{R}^2)^2 \to \mathbb{R}$ given by
$$f((x_1, x_2), (y_1, y_2)) = x_1 y_1 + x_2 y_2$$
is 2-linear, but is not a product of two linear maps, i.e., we cannot find linear maps $l_1, l_2 : \mathbb{R}^2 \to \mathbb{R}$ such that $f(x, y) = l_1(x) l_2(y)$ for all $x, y \in \mathbb{R}^2$.
Solution. If $x_1$ and $x_2$ are fixed, then it is not difficult to see that the map $(y_1, y_2) \mapsto x_1 y_1 + x_2 y_2$ is linear. Similarly, if $y_1, y_2$ are fixed, then the map $(x_1, x_2) \mapsto x_1 y_1 + x_2 y_2$ is linear. Thus $f$ is 2-linear. Assume by contradiction that $f(x, y) = l_1(x) l_2(y)$ for two linear maps $l_1, l_2 : \mathbb{R}^2 \to \mathbb{R}$ and for all $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$. It follows that we can find real numbers $a = l_1(1, 0)$, $b = l_1(0, 1)$, $c = l_2(1, 0)$ and $d = l_2(0, 1)$ such that
$$x_1 y_1 + x_2 y_2 = (a x_1 + b x_2)(c y_1 + d y_2)$$
for all $x_1, y_1, x_2, y_2 \in \mathbb{R}$. We cannot have $(a, b) = (0, 0)$, so assume without loss of generality that $b \neq 0$. Taking $x_2 = -\frac{a x_1}{b}$ we obtain
$$x_1 y_1 = \frac{a x_1}{b}\, y_2$$
for all real numbers $x_1, y_1, y_2$. This is plainly absurd and the result follows.
Let us now consider a $d$-linear map $f : V^d \to W$ and a permutation $\sigma \in S_d$. We define a new map $\sigma(f) : V^d \to W$ by
$$\sigma(f)(x_1, \dots, x_d) = f(x_{\sigma(1)}, \dots, x_{\sigma(d)}).$$
It follows easily from the definition of $d$-linear maps that $\sigma(f)$ is also a $d$-linear map. Moreover, for all $\sigma, \tau \in S_d$ and all $d$-linear maps $f$ we have the crucial relation (we say that the symmetric group $S_d$ acts on the space of $d$-linear forms)
$$(\sigma\tau)(f) = \sigma(\tau(f)). \tag{7.1}$$
Indeed, by definition we have
$$(\sigma\tau)(f)(x_1, \dots, x_d) = f(x_{\sigma\tau(1)}, \dots, x_{\sigma\tau(d)}),$$
while¹
$$\sigma(\tau(f))(x_1, \dots, x_d) = \tau(f)(x_{\sigma(1)}, \dots, x_{\sigma(d)}) = f(x_{\sigma(\tau(1))}, \dots, x_{\sigma(\tau(d))}).$$
¹ Note that setting $y_i = x_{\sigma(i)}$, we have $y_{\tau(i)} = x_{\sigma(\tau(i))}$.
Recall (see the appendix on algebraic preliminaries for more details on permutations) that there is a special map $\varepsilon : S_d \to \{-1, 1\}$, the signature. The precise definition is
$$\varepsilon(\sigma) = \prod_{1 \le i < j \le d} \frac{\sigma(i) - \sigma(j)}{i - j}.$$
This map $\varepsilon$ is multiplicative, that is,
$$\varepsilon(\sigma\tau) = \varepsilon(\sigma)\, \varepsilon(\tau)$$
for all $\sigma, \tau \in S_d$. Recall that a transposition is a permutation $\sigma$ for which there are integers $i \neq j \in \{1, 2, \dots, d\}$ such that $\sigma(i) = j$, $\sigma(j) = i$ and $\sigma(k) = k$ for all $k \neq i, j$. In this case we write $\sigma = (i, j)$. We note that $\varepsilon(\sigma) = -1$ for any transposition $\sigma$. We also recall that any permutation is a product of transpositions.
We now introduce two fundamental classes of $d$-linear maps:
Definition 7.4. Let $f : V^d \to W$ be a $d$-linear map.
a) We say that $f$ is antisymmetric if $\sigma(f) = \varepsilon(\sigma) f$ for all $\sigma \in S_d$.
b) We say that $f$ is alternating if $f(x_1, x_2, \dots, x_d) = 0$ whenever $x_1, x_2, \dots, x_d \in V$ are not pairwise distinct.
The two definitions look quite different, but most of the time they are equivalent. There are, however, some subtleties related to the field $F$, as the following problems show. Nevertheless, the reader should keep in mind that over fields such as the real, rational, or complex numbers there is no difference between alternating and antisymmetric $d$-linear maps.
Problem 7.5. Prove that an alternating $d$-linear map $f : V^d \to W$ is antisymmetric.
Solution. Since any permutation is a product of transpositions and since $\varepsilon$ is multiplicative, relation (7.1) reduces the problem to proving that $\sigma(f) = -f$ for any transposition $\sigma = (i, j)$, with $i < j$. Consider arbitrary vectors $x_1, x_2, \dots, x_d$ and note that
$$f(x_1, \dots, x_{i-1}, x_i + x_j, x_{i+1}, \dots, x_{j-1}, x_i + x_j, x_{j+1}, \dots, x_d) = 0,$$
since $f$ is alternating. Using the $d$-linearity of $f$, the previous relation can be written
$$f(x_1, \dots, x_i, \dots, x_i, \dots, x_d) + f(x_1, \dots, x_j, \dots, x_j, \dots, x_d) + f(x_1, \dots, x_j, \dots, x_i, \dots, x_d) + f(x_1, \dots, x_d) = 0.$$
Using again the fact that $f$ is alternating, it follows that the first two terms in the above sum are zero, and we obtain the desired result by noting that the third term is $\sigma(f)(x_1, \dots, x_d)$.
Problem 7.6. Suppose that $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$. Prove that an antisymmetric $d$-linear map $f : V^d \to W$ (with $V, W$ vector spaces over $F$) is alternating. Thus over such a field $F$ there is no difference between antisymmetric and alternating $d$-linear maps.
Solution. Suppose that $x_1, \dots, x_d$ are not pairwise distinct, say $x_i = x_j$ for some $i < j$. Consider the transposition $\tau = (i, j)$. Since $f$ is antisymmetric and $\varepsilon(\tau) = -1$, we deduce that $\tau(f) = -f$. Evaluating this equality at $(x_1, \dots, x_d)$ yields
$$f(x_1, \dots, x_i, \dots, x_j, \dots, x_d) = -f(x_1, \dots, x_j, \dots, x_i, \dots, x_d).$$
But since $x_i = x_j$, the previous relation can be written
$$2 f(x_1, \dots, x_d) = 0.$$
Since $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$, the previous relation yields $f(x_1, \dots, x_d) = 0$ (note that this would be completely wrong if we had $F = \mathbb{F}_2$; see also the example below). Thus $f$ is alternating.
Example 7.7. Bad things happen when $F = \mathbb{F}_2$. Let $f : F^2 \to F$ be the multiplication map, that is, $f(x, y) = xy$. It is clearly bilinear and it is not alternating, since $f(1, 1) = 1 \neq 0$. On the other hand, $f$ is antisymmetric. Indeed, we only need to check that $f(x, y) = -f(y, x)$, or equivalently $2xy = 0$. This holds since $2 = 1 + 1 = 0$.
A natural question is: how does one construct antisymmetric or alternating $d$-linear maps? The following problem shows that starting with any $d$-linear map $f$ we can obtain an antisymmetric one by taking a weighted average of the maps $\sigma(f)$. This will play a crucial role in the next section, when defining the determinant of a family of vectors.
Problem 7.8. Let $f : V^d \to W$ be a $d$-linear map. Prove that
$$A(f) := \sum_{\sigma \in S_d} \varepsilon(\sigma)\, \sigma(f)$$
is an antisymmetric $d$-linear map.
Solution. It is clear that $A(f)$ is a $d$-linear map, since it is a linear combination of $d$-linear maps. Let $\tau \in S_d$ and let us prove that $\tau(A(f)) = \varepsilon(\tau) A(f)$. Note that by relation (7.1) we have
$$\tau(A(f)) = \sum_{\sigma \in S_d} \varepsilon(\sigma)\, \tau(\sigma(f)) = \sum_{\sigma \in S_d} \varepsilon(\sigma)\, (\tau\sigma)(f).$$
Thus, using the fact that $\varepsilon(\tau)\varepsilon(\sigma) = \varepsilon(\tau\sigma)$, we obtain
$$\varepsilon(\tau)\, \tau(A(f)) = \sum_{\sigma \in S_d} \varepsilon(\tau\sigma)\, (\tau\sigma)(f).$$
Note that the map $\sigma \mapsto \tau\sigma$ is a permutation of $S_d$ (its inverse being simply $\sigma \mapsto \tau^{-1}\sigma$), thus the last sum equals $\sum_{\sigma \in S_d} \varepsilon(\sigma)\, \sigma(f) = A(f)$. We conclude that
$$\varepsilon(\tau)\, \tau(A(f)) = A(f)$$
and the result follows, since $\varepsilon(\tau)^{-1} = \varepsilon(\tau)$.
A crucial property of alternating $d$-linear forms, which actually characterizes them, is
Theorem 7.9. Let $f : V^d \to W$ be an alternating $d$-linear form. If $x_1, x_2, \dots, x_d \in V$ are linearly dependent, then $f(x_1, x_2, \dots, x_d) = 0$.
Proof. Since $x_1, \dots, x_d$ are linearly dependent, some $x_i$ lies in the span of $(x_j)_{j \neq i}$, say
$$x_i = \sum_{j \neq i} a_j x_j$$
for some scalars $a_j$. Then using the $d$-linearity of $f$, we obtain
$$f(x_1, \dots, x_d) = \sum_{j \neq i} a_j\, f(x_1, \dots, x_{i-1}, x_j, x_{i+1}, \dots, x_d).$$
As $f$ is alternating, each of the terms $f(x_1, \dots, x_{i-1}, x_j, x_{i+1}, \dots, x_d)$ is zero, since $x_1, \dots, x_{i-1}, x_j, x_{i+1}, \dots, x_d$ are not pairwise distinct. Thus $f(x_1, \dots, x_d) = 0$.
7.1.1 Problems for Practice
Let $F$ be a field and let $V_1, \dots, V_d$ be finite dimensional vector spaces over $F$. We define the tensor product $V_1 \otimes \dots \otimes V_d$ of $V_1, \dots, V_d$ as the set of multilinear maps $f : V_1^* \times \dots \times V_d^* \to F$, where $V_i^*$ is the dual of $V_i$.
1. Check that $V_1 \otimes \dots \otimes V_d$ is a vector subspace of the vector space of all maps $f : V_1^* \times \dots \times V_d^* \to F$.
2. If $v_i \in V_i$ for all $1 \le i \le d$, define a map $v_1 \otimes \dots \otimes v_d : V_1^* \times \dots \times V_d^* \to F$ by
$$(v_1 \otimes \dots \otimes v_d)(f_1, \dots, f_d) = f_1(v_1) f_2(v_2) \cdots f_d(v_d)$$
for $f_i \in V_i^*$.
a) Prove that $v_1 \otimes \dots \otimes v_d \in V_1 \otimes \dots \otimes V_d$ (elements of $V_1 \otimes \dots \otimes V_d$ of the form $v_1 \otimes \dots \otimes v_d$ are called pure tensors).
b) Is the map $V_1 \times \dots \times V_d \to V_1 \otimes \dots \otimes V_d$ sending $(v_1, \dots, v_d)$ to $v_1 \otimes \dots \otimes v_d$ linear? Is it multilinear?
c) Is every element of $V_1 \otimes \dots \otimes V_d$ a pure tensor?
3. For each $1 \le i \le d$ let $(e_{i,j})_{1 \le j \le n_i}$ be a basis of $V_i$. Let $(e_{i,j}^*)_{1 \le j \le n_i}$ be the associated dual basis of $V_i^*$.
a) Prove that for any $f \in V_1 \otimes \dots \otimes V_d$ we have
$$f = \sum_{j_1 = 1}^{n_1} \cdots \sum_{j_d = 1}^{n_d} f(e_{1,j_1}^*, \dots, e_{d,j_d}^*)\; e_{1,j_1} \otimes \dots \otimes e_{d,j_d}.$$
b) Prove that the family of pure tensors $e_{1,j_1} \otimes \dots \otimes e_{d,j_d}$, where $1 \le j_1 \le n_1$, ..., $1 \le j_d \le n_d$, forms a basis of $V_1 \otimes \dots \otimes V_d$.
c) Prove that
$$\dim(V_1 \otimes \dots \otimes V_d) = \dim V_1 \cdots \dim V_d.$$
4. Prove that $V_1 \otimes \dots \otimes V_d$ has the following universal property: for any vector space $W$ over $F$ and any multilinear map $f : V_1 \times \dots \times V_d \to W$ there is a unique linear map $g : V_1 \otimes \dots \otimes V_d \to W$ such that
$$g(v_1 \otimes \dots \otimes v_d) = f(v_1, \dots, v_d)$$
for all $v_i \in V_i$, $1 \le i \le d$.
5. Prove that there is an isomorphism $(V_1 \otimes V_2) \otimes V_3 \to V_1 \otimes V_2 \otimes V_3$ sending $(v_1 \otimes v_2) \otimes v_3$ to $v_1 \otimes v_2 \otimes v_3$ for all $v_1 \in V_1$, $v_2 \in V_2$, $v_3 \in V_3$.
6. Prove that there is an isomorphism $V_1^* \otimes V_2^* \to (V_1 \otimes V_2)^*$.
7. Prove that there is an isomorphism $V_1^* \otimes V_2 \to \mathrm{Hom}(V_1, V_2)$ sending $f_1 \otimes v_2$ to the map $v_1 \mapsto f_1(v_1)\, v_2$ for all $f_1 \in V_1^*$ and $v_2 \in V_2$. We recall that $\mathrm{Hom}(V_1, V_2)$ is the vector space of linear maps between $V_1$ and $V_2$.
7.2 Determinant of a Family of Vectors, of a Matrix,
and of a Linear Transformation
Let $V$ be a vector space over $F$, of dimension $n \ge 1$. Let $(v_1, v_2, \dots, v_n)$ be an $n$-tuple of vectors in $V$ forming a basis of $V$. The order of $v_1, v_2, \dots, v_n$ will be very important in the sequel, so one should consider not only the set $\{v_1, \dots, v_n\}$, but the $n$-tuple $(v_1, \dots, v_n)$.
Consider the dual basis $v_1^*, \dots, v_n^*$ of the dual space $V^*$. Recall that $v_i^*$ is the linear form on $V$ such that
$$v_i^*(x_1 v_1 + \dots + x_n v_n) = x_i$$
for all $x_1, \dots, x_n \in F$. That is, $v_i^*(v)$ is the $i$th coordinate of $v$ when expressed as a linear combination of $v_1, \dots, v_n$.
By Problem 7.2 the map
$$f : V^n \to F, \quad (x_1, \dots, x_n) \mapsto v_1^*(x_1) \cdots v_n^*(x_n)$$
is an $n$-linear form. By Problem 7.8, the map
$$A(f) : V^n \to F, \quad A(f)(x_1, \dots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, f(x_{\sigma(1)}, \dots, x_{\sigma(n)})$$
is an antisymmetric $n$-linear form.
Definition 7.10. Let $f$ be as above and let $x_1, \dots, x_n \in V$. We call $A(f)(x_1, \dots, x_n)$ the determinant of $x_1, \dots, x_n$ with respect to $(v_1, \dots, v_n)$ and denote it $\det_{(v_1, \dots, v_n)}(x_1, \dots, x_n)$.
Remark 7.11. 1) By definition we have
$$\det{}_{(v_1, \dots, v_n)}(x_1, \dots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, v_1^*(x_{\sigma(1)}) \cdots v_n^*(x_{\sigma(n)}). \tag{7.2}$$
In other words, if we write
$$x_i = \sum_{j=1}^{n} a_{ji} v_j$$
for some scalars $a_{ji} \in F$ (which we can always do, since $v_1, \dots, v_n$ is a basis of $V$), then
$$\det{}_{(v_1, \dots, v_n)}(x_1, \dots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$
2) We claim that $\det_{(v_1, \dots, v_n)}(v_1, \dots, v_n) = 1$. Indeed, suppose that $v_1^*(v_{\sigma(1)}) \cdots v_n^*(v_{\sigma(n)})$ is a nonzero term appearing in the right-hand side of relation (7.2). Then $v_i^*(v_{\sigma(i)})$ is nonzero for all $i \in [1, n]$, which forces $\sigma(i) = i$ for all $i$. Thus the only nonzero term appearing in the right-hand side of (7.2) is the one corresponding to $\sigma = \mathrm{id}$, which is clearly equal to 1. This proves the claim.
3) The geometric interpretation of the determinant is as follows: consider $F = \mathbb{R}$ and let $e_1, e_2, \dots, e_n$ be the canonical basis of $\mathbb{R}^n$. If $x_1, x_2, \dots, x_n$ are vectors in $\mathbb{R}^n$, we write $\det(x_1, x_2, \dots, x_n)$ instead of $\det_{(e_1, e_2, \dots, e_n)}(x_1, x_2, \dots, x_n)$. We can associate to the vectors $x_1, x_2, \dots, x_n$ the parallelepiped
$$P(x_1, x_2, \dots, x_n) = \{a_1 x_1 + a_2 x_2 + \dots + a_n x_n \mid a_1, \dots, a_n \in [0, 1]\}.$$
For instance, if $x_i = e_i$ for all $i$, then the associated parallelepiped is the hypercube $[0, 1]^n$. The geometric interpretation of $\det(x_1, x_2, \dots, x_n)$ is given by the fundamental equality
$$|\det(x_1, x_2, \dots, x_n)| = \mathrm{vol}(P(x_1, x_2, \dots, x_n)),$$
the volume being taken here with respect to the Lebesgue measure on $\mathbb{R}^n$ (this is the usual area/volume when $n = 2$ or $n = 3$).
Example 7.12. Consider the vector space $V = F^2$ over $F$ and let $e_1, e_2$ be the canonical basis of $V$. For any vectors $x_1 = \begin{pmatrix} a \\ b \end{pmatrix}$ and $x_2 = \begin{pmatrix} c \\ d \end{pmatrix}$ in $V$ we have
$$\det{}_{(e_1, e_2)}(x_1, x_2) = ad - bc.$$
For instance,
$$\det{}_{(e_1, e_2)}\left( \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 4 \end{pmatrix} \right) = 4 - 2 \cdot 3 = -2.$$
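As a quick numerical sanity check, this computation is easy to reproduce in a few lines of code (a minimal sketch using NumPy; the vectors are placed as the columns of the matrix):

```python
import numpy as np

# Columns are the vectors x1 = (1, 2) and x2 = (3, 4) in the canonical basis.
A = np.array([[1, 3],
              [2, 4]])

# ad - bc = 1*4 - 2*3 = -2
print(np.linalg.det(A))  # -2.0 (up to floating-point rounding)
```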
Here is the first big theorem concerning determinants:
Theorem 7.13. Let $v_1, \dots, v_n$ be a basis of a vector space $V$ over $F$. The determinant map $\det_{(v_1, \dots, v_n)} : V^n \to F$ with respect to this basis is $n$-linear and alternating.
Proof. Denote $f = \det_{(v_1, \dots, v_n)}$. By Definition 7.10 and the discussion preceding it we know that $f$ is $n$-linear and antisymmetric. If $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$, then Problem 7.6 shows that $f$ is alternating. Let us give a proof which works for any field $F$ (the reader interested only in fields such as $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Q}$ may skip the following technical proof).
Let $x_1, \dots, x_n \in V$ and suppose that they are not pairwise distinct, say $x_i = x_j$ for some $i < j$. Let $\tau = (i, j)$, a transposition, and let $A_n$ be the set of even permutations in $S_n$, that is, those permutations $\sigma$ for which $\varepsilon(\sigma) = 1$. Since $\varepsilon(\tau\sigma) = \varepsilon(\tau)\varepsilon(\sigma) = -\varepsilon(\sigma)$ for all $\sigma \in S_n$, we deduce that $S_n = A_n \cup \tau A_n$ (disjoint union) and using formula (7.2) we can write
$$f(x_1, \dots, x_n) = \sum_{\sigma \in A_n} v_1^*(x_{\sigma(1)}) \cdots v_n^*(x_{\sigma(n)}) - \sum_{\sigma \in A_n} v_1^*(x_{\tau\sigma(1)}) \cdots v_n^*(x_{\tau\sigma(n)}).$$
We claim that $x_{\tau\sigma(k)} = x_{\sigma(k)}$ for all $k$ and all $\sigma \in A_n$, which clearly shows that $f(x_1, \dots, x_n) = 0$. The claim is clear if $\sigma(k) \notin \{i, j\}$, as then $\tau\sigma(k) = \sigma(k)$. Suppose that $\sigma(k) = i$; then $\tau\sigma(k) = j$ and the claim comes down to $x_j = x_i$, which holds by assumption. The argument being similar for $\sigma(k) = j$, the theorem is proved.
The second big theorem in the theory of determinants and multilinear maps is the following:
Theorem 7.14. Let $(v_1, \dots, v_n)$ be an $n$-tuple of vectors of $V$, forming a basis of $V$. If $f : V^n \to F$ is any alternating $n$-linear form, then
$$f = f(v_1, \dots, v_n) \cdot \det{}_{(v_1, \dots, v_n)}.$$
Proof. Let $x_1, \dots, x_n$ be vectors in $V$ and write
$$x_i = a_{i1} v_1 + a_{i2} v_2 + \dots + a_{in} v_n$$
for some scalars $a_{ij}$. By part 1) of Remark 7.11 we have
$$\det{}_{(v_1, \dots, v_n)}(x_1, \dots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$
On the other hand, repeatedly using the $n$-linearity of $f$, we can write
$$f(x_1, \dots, x_n) = f(a_{11} v_1 + \dots + a_{1n} v_n, x_2, \dots, x_n) = \sum_{i=1}^{n} a_{1i}\, f(v_i, x_2, \dots, x_n)$$
$$= \sum_{i,j=1}^{n} a_{1i} a_{2j}\, f(v_i, v_j, x_3, \dots, x_n) = \dots = \sum_{i_1, \dots, i_n = 1}^{n} a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}\, f(v_{i_1}, \dots, v_{i_n}).$$
Now, since $f$ is alternating, we have $f(v_{i_1}, \dots, v_{i_n}) = 0$ unless $i_1, \dots, i_n$ are pairwise distinct, i.e., unless there is a permutation $\sigma \in S_n$ such that $\sigma(k) = i_k$ for $1 \le k \le n$. We conclude that
$$f(x_1, \dots, x_n) = \sum_{\sigma \in S_n} a_{1\sigma(1)} \cdots a_{n\sigma(n)}\, f(v_{\sigma(1)}, \dots, v_{\sigma(n)}).$$
Since $f$ is antisymmetric (by Problem 7.5 and the alternating property of $f$), we can further rewrite the last equality as
$$f(x_1, \dots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}\, f(v_1, \dots, v_n) = \det{}_{(v_1, \dots, v_n)}(x_1, \dots, x_n)\, f(v_1, \dots, v_n),$$
and the result follows.
Let us record two important consequences of Theorem 7.14.
Corollary 7.15. Let $V$ be a vector space of dimension $n \ge 1$. The vector space of $n$-linear alternating forms $f : V^n \to F$ has dimension 1.
Proof. Consider a basis $v_1, \dots, v_n$ of $V$ and let $f = \det_{(v_1, \dots, v_n)}$. By Theorem 7.13, the map $f$ is an alternating $n$-linear form. By part 2) of Remark 7.11 we have $f(v_1, \dots, v_n) = 1$, thus $f$ is nonzero. On the other hand, Theorem 7.14 shows that any alternating $n$-linear form differs by a scalar from $\det_{(v_1, \dots, v_n)}$. The result follows.
Corollary 7.16. Given a basis $v_1, v_2, \dots, v_n$ of a vector space $V$ over $F$, there is a unique $n$-linear alternating form $f : V^n \to F$ such that $f(v_1, v_2, \dots, v_n) = 1$. This form is given by $f = \det_{(v_1, v_2, \dots, v_n)}$.
Proof. Uniqueness follows directly from Theorem 7.14. The existence has already been established during the proof of the last corollary.
Theorem 7.14 can also be used to establish a criterion for deciding when a family of vectors forms a basis of a finite dimensional vector space: it all comes down to computing determinants, and we will see quite a few methods for computing them in the next sections (in practice, however, the method explained before Problem 4.34 and based on row-reduction is the most efficient one).
Corollary 7.17. Let $V$ be a vector space of dimension $n$ over $F$ and let $x_1, x_2, \dots, x_n \in V$. The following assertions are equivalent:
a) $x_1, x_2, \dots, x_n$ form a basis of $V$ (or, equivalently, they are linearly independent).
b) For any basis $v_1, v_2, \dots, v_n$ we have
$$\det{}_{(v_1, v_2, \dots, v_n)}(x_1, x_2, \dots, x_n) \neq 0.$$
c) There is a basis $v_1, v_2, \dots, v_n$ such that
$$\det{}_{(v_1, v_2, \dots, v_n)}(x_1, x_2, \dots, x_n) \neq 0.$$
Proof. Suppose that a) holds and let $v_1, \dots, v_n$ be a basis of $V$. By Theorem 7.14 applied to $f = \det_{(x_1, \dots, x_n)}$ we have
$$\det{}_{(x_1, \dots, x_n)}(x_1, \dots, x_n) = \det{}_{(x_1, \dots, x_n)}(v_1, \dots, v_n) \cdot \det{}_{(v_1, \dots, v_n)}(x_1, \dots, x_n).$$
By Remark 7.11 the left-hand side is 1, thus both factors in the right-hand side are nonzero, establishing b). It is clear that b) implies c), so assume that c) holds and let us prove a). Since $\dim V = n$, it suffices to check that $x_1, x_2, \dots, x_n$ are linearly independent. If this is not the case, we deduce from Theorems 7.14 and 7.9 that $\det_{(v_1, v_2, \dots, v_n)}(x_1, x_2, \dots, x_n) = 0$, a contradiction.
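In concrete terms, part b) gives a one-line test for linear independence (a minimal sketch using NumPy; as remarked above, row reduction is usually the more efficient route in practice):

```python
import numpy as np

x1, x2, x3 = [1, 0, 2], [0, 1, 1], [1, 1, 3]
M = np.column_stack([x1, x2, x3])   # coordinates w.r.t. the canonical basis

# Here x3 = x1 + x2, so the family is dependent and the determinant vanishes.
print(np.linalg.det(M))             # 0.0
```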
Problem 7.18. Let $V$ be a finite dimensional $F$-vector space, let $e_1, \dots, e_n$ be a basis of $V$ and let $T : V \to V$ be a linear transformation. Prove that for all $v_1, \dots, v_n \in V$ we have
$$\sum_{i=1}^{n} \det(v_1, \dots, v_{i-1}, T(v_i), v_{i+1}, \dots, v_n) = \mathrm{Tr}(T)\, \det(v_1, \dots, v_n),$$
where all determinants are computed with respect to the basis $e_1, \dots, e_n$ and where $\mathrm{Tr}(T)$ is the trace of the matrix of $T$ with respect to the basis $e_1, \dots, e_n$.
Solution. Consider the map
$$\varphi : V^n \to F, \quad \varphi(v_1, \dots, v_n) = \sum_{i=1}^{n} \det(v_1, \dots, v_{i-1}, T(v_i), v_{i+1}, \dots, v_n).$$
This map is a sum of $n$-linear maps, thus it is $n$-linear. Moreover, it is alternating. Indeed, assume for example that $v_1 = v_2$. Then $\det(v_1, \dots, v_{i-1}, T(v_i), v_{i+1}, \dots, v_n) = 0$ for $i > 2$ and
$$\det(T(v_1), v_2, \dots, v_n) + \det(v_1, T(v_2), \dots, v_n) = \det(T(v_1), v_1, v_3, \dots, v_n) + \det(v_1, T(v_1), v_3, \dots, v_n) = 0,$$
since the determinant is antisymmetric.
Since the space of $n$-linear alternating forms on $V$ is one-dimensional, it follows that we can find a scalar $\alpha \in F$ such that
$$\varphi(v_1, \dots, v_n) = \alpha\, \det(v_1, \dots, v_n)$$
for all $v_1, \dots, v_n$. Choose $v_1 = e_1, \dots, v_n = e_n$ and let $A = [a_{ij}]$ be the matrix of $T$ with respect to $e_1, \dots, e_n$. Then the right-hand side equals $\alpha$, while the left-hand side equals
$$\sum_{i=1}^{n} \det\Big(e_1, \dots, e_{i-1}, \sum_{j=1}^{n} a_{ji} e_j, e_{i+1}, \dots, e_n\Big) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ji}\, \det(e_1, \dots, e_{i-1}, e_j, e_{i+1}, \dots, e_n) = \sum_{i=1}^{n} a_{ii},$$
the last equality being a consequence of the fact that the determinant map is alternating. Since $\sum_{i=1}^{n} a_{ii} = \mathrm{Tr}(T)$, we conclude that $\alpha = \mathrm{Tr}(T)$ and we are done.
Remark 7.19. $\mathrm{Tr}(T)$ is actually independent of the choice of the basis $e_1, \dots, e_n$, and it is called the trace of $T$. To prove the independence with respect to the choice of the basis, we need to prove that for all $A \in M_n(F)$ and all $P \in GL_n(F)$ we have
$$\mathrm{Tr}(A) = \mathrm{Tr}(PAP^{-1}).$$
By a fundamental property of the trace map (which the reader can check without any difficulty) we have
$$\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$$
for all matrices $A, B \in M_n(F)$. Thus
$$\mathrm{Tr}(PAP^{-1}) = \mathrm{Tr}((PA)P^{-1}) = \mathrm{Tr}(P^{-1}(PA)) = \mathrm{Tr}((P^{-1}P)A) = \mathrm{Tr}(A).$$
Consider a vector space $V$ of dimension $n \ge 1$ and a linear transformation $T : V \to V$. If $f : V^n \to F$ is an $n$-linear form, then one can easily check that
• the map
$$Tf : V^n \to F, \quad (x_1, \dots, x_n) \mapsto f(T(x_1), \dots, T(x_n))$$
is also an $n$-linear form;
• if $f$ is alternating, then so is $Tf$.
Using these observations, we will prove the following fundamental theorem:
Theorem 7.20. Let $V$ be a vector space of dimension $n \ge 1$ over $F$. For any linear transformation $T : V \to V$ there is a unique scalar $\det T \in F$ such that
$$f(T(x_1), T(x_2), \dots, T(x_n)) = \det T \cdot f(x_1, x_2, \dots, x_n) \tag{7.3}$$
for all $n$-linear alternating forms $f : V^n \to F$ and all $x_1, x_2, \dots, x_n \in V$.
Proof. Fix a basis $v_1, v_2, \dots, v_n$ of $V$ and denote $f_0 = \det_{(v_1, \dots, v_n)}$. By Theorem 7.13 and Remark 7.11, $f_0$ is $n$-linear and alternating, and we have $f_0(v_1, \dots, v_n) = 1$. Since $(x_1, \dots, x_n) \mapsto f_0(T(x_1), \dots, T(x_n))$ is $n$-linear and alternating, it must be a scalar multiple of $f_0$, thus we can find $\det T \in F$ such that
$$f_0(T(x_1), \dots, T(x_n)) = \det T \cdot f_0(x_1, \dots, x_n)$$
for all $x_1, \dots, x_n \in V$. Since any $n$-linear alternating form $f$ is a scalar multiple of $f_0$ (Corollary 7.15), it follows that relation (7.3) holds for any such map $f$ (since by definition of $\det T$ it holds for $f_0$), which establishes the existence part of the theorem. Uniqueness is much easier: if relation (7.3) holds for all $f$ and all $x_1, \dots, x_n$, choosing $f = f_0$ and $x_i = v_i$ for all $i$ yields
$$\det T = f_0(T(v_1), \dots, T(v_n)),$$
which clearly shows that $\det T$ is unique.
Definition 7.21. The scalar $\det T$ is called the determinant of the linear transformation $T$.
Note that the end of the proof of Theorem 7.20 gives an explicit formula
$$\det T = \det{}_{(v_1, \dots, v_n)}(T(v_1), \dots, T(v_n)), \tag{7.4}$$
and this for any choice of the basis $v_1, \dots, v_n$ of $V$. In particular, the right-hand side is independent of the choice of the basis!
Moreover, this allows us to express $\det T$ in terms of the matrix $A_T$ of $T$ with respect to the basis $v_1, \dots, v_n$. Recall that $A_T = [a_{ij}]$ with
$$T(v_i) = \sum_{j=1}^{n} a_{ji} v_j.$$
Following the proof of Theorem 7.14 (i.e., using the fact that $\det_{(v_1, \dots, v_n)}$ is $n$-linear and alternating, thus antisymmetric), we obtain
$$\det{}_{(v_1, \dots, v_n)}(T(v_1), \dots, T(v_n)) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$
The right-hand side is expressed purely in terms of the matrix $A_T$, which motivates the following:
Definition 7.22. If $A = [a_{ij}] \in M_n(F)$, we define its determinant by
$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}. \tag{7.5}$$
We also write $\det A$ as
$$\begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}.$$
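The defining formula (7.5) translates directly into code. Here is a minimal sketch in Python (it enumerates all $n!$ permutations, so it is only practical for small $n$; the Gaussian elimination method described in Sect. 7.4 is what one would use in practice):

```python
from itertools import permutations

def sign(perm):
    # Signature of a permutation (a tuple of 0-based indices),
    # computed by counting inversions.
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    # Determinant as the sum over all permutations, formula (7.5).
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(det_leibniz([[1, 3], [2, 4]]))  # -2, matching Example 7.12
```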
Problem 7.23. Prove that the determinant of a diagonal matrix is the product of the diagonal entries of that matrix. In particular, $\det I_n = 1$.
Solution. Let $A = [a_{ij}]$ be a diagonal $n \times n$ matrix. Then
$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$
Consider a nonzero term in the previous sum, corresponding to a permutation $\sigma$. We have $a_{i\sigma(i)} \neq 0$ for all $i \in \{1, 2, \dots, n\}$, and since $A$ is diagonal, this forces $\sigma(i) = i$ for all $i \in \{1, 2, \dots, n\}$. It follows that the only possibly nonzero term in the above sum is the one corresponding to the identity permutation, which equals $a_{11} \cdots a_{nn}$, hence
$$\det A = a_{11} \cdots a_{nn},$$
as desired.
Let us come back to our original situation: we have a vector space $V$ over $F$, a linear transformation $T : V \to V$, a basis $v_1, \dots, v_n$ of $V$ and the matrix $A_T$ of $T$ with respect to this basis. The previous discussion gives
$$\det T = \det A_T. \tag{7.6}$$
Note that the left-hand side is completely intrinsic to $T$ by Theorem 7.20; in particular, it is independent of the choice of $v_1, \dots, v_n$. On the other hand, the matrix $A_T$ certainly depends on the choice of the basis $v_1, \dots, v_n$. The miracle is that while $A_T$ depends on choices, its determinant does not! Let us glorify this observation, since this is a very important result:
Theorem 7.24. If $A \in M_n(F)$, then $\det A = \det(PAP^{-1})$ for any invertible matrix $P \in GL_n(F)$. In other words, similar matrices have the same determinant.
We can turn this discussion upside down: start now with any matrix $A \in M_n(F)$ and let $T : F^n \to F^n$ be the linear transformation sending $X \in F^n$ to $AX$. Then $A$ is the matrix of $T$ with respect to the canonical basis $e_1, \dots, e_n$ of $F^n$, and the previous discussion shows that $\det A = \det T$. We deduce from Theorem 7.20 that
$$f(AX_1, AX_2, \dots, AX_n) = \det A \cdot f(X_1, \dots, X_n)$$
for all $n$-linear alternating forms $f : (F^n)^n \to F$.
7.2.1 Problems for Practice
1. Check that the general definition of the determinant of a matrix matches the definition of the determinant of a matrix $A \in M_2(\mathbb{C})$ as seen in the chapter concerned with square matrices of order 2.
2. Recall that a permutation matrix is a matrix $A \in M_n(\mathbb{R})$ having precisely one nonzero entry in each row and column, and this nonzero entry is equal to 1. Prove that the determinant of a permutation matrix is equal to 1 or $-1$.
3. Let $A = [a_{ij}] \in M_n(\mathbb{C})$ and let $B = [(-1)^{i+j} a_{ij}] \in M_n(\mathbb{C})$. Compare $\det A$ and $\det B$.
4. Generalize the previous problem as follows: let $z$ be a complex number and let $A = [a_{ij}] \in M_n(\mathbb{C})$ and $B = [z^{i+j} a_{ij}] \in M_n(\mathbb{C})$. Express $\det B$ in terms of $\det A$ and $z$.
5. (The Wronskian) Let $f_1, f_2, \dots, f_n$ be real-valued maps on some open interval $I$ of $\mathbb{R}$. Assume that each of these maps is differentiable at least $n - 1$ times. For $x \in I$ let $W(f_1, \dots, f_n)(x)$ be the determinant of the matrix $A = [f_i^{(j-1)}(x)]_{1 \le i, j \le n}$, where $f_i^{(j)}$ is the $j$th derivative of $f_i$ (with the convention that $f_i^{(0)} = f_i$). The map $x \mapsto W(f_1, \dots, f_n)(x)$ is called the Wronskian of $f_1, \dots, f_n$.
a) Take $n = 2$ and $f_1(x) = e^{ax}$, $f_2(x) = e^{bx}$ for two real numbers $a, b$. Compute the Wronskian of $f_1, f_2$.
b) Prove that if $f_1, \dots, f_n$ are linearly dependent, then
$$W(f_1, \dots, f_n) = 0.$$
6. Consider a matrix-valued map $A : I \to M_n(\mathbb{R})$, $A(t) = [a_{ij}(t)]$, where the $a_{ij} : I \to \mathbb{R}$ are differentiable maps on some open interval $I$ of $\mathbb{R}$. Let $B_k$ be the matrix obtained by replacing all entries in the $k$th row of $A$ by their derivatives. Prove that for all $t \in I$
$$\frac{d}{dt}\, \det(A(t)) = \sum_{k=1}^{n} \det(B_k(t)).$$
In the next problems $V$ is a vector space of dimension $n \ge 1$ over a field $F \in \{\mathbb{R}, \mathbb{C}\}$. If $p$ is a nonnegative integer, we let $\wedge^p V$ be the vector space of all $p$-linear alternating forms $\omega : V^p = V \times \dots \times V \to F$, with the convention that $\wedge^0 V = F$.
7. Prove that $\wedge^p V = 0$ for $p > n$.
8. Prove that if $W$ is a finite dimensional vector space over $F$ and if $f : V \to W$ is a linear map, then $f$ induces a linear map $f^* : \wedge^p W \to \wedge^p V$ defined by
$$f^*(\omega)(v_1, \dots, v_p) = \omega(f(v_1), \dots, f(v_p)).$$
9. Prove that if $g : W \to Z$ is a linear map from $W$ to another finite dimensional vector space $Z$ over $F$, then
$$(g \circ f)^* = f^* \circ g^*$$
as maps $\wedge^p Z \to \wedge^p V$.
If $\omega \in \wedge^p V$ and $\eta \in \wedge^q V$, we define the exterior product $\omega \wedge \eta$ of $\omega$ and $\eta$ as the map $\omega \wedge \eta : V^{p+q} \to F$ defined by
$$(\omega \wedge \eta)(v_1, \dots, v_{p+q}) = \frac{1}{p!\, q!} \sum_{\sigma \in S_{p+q}} \varepsilon(\sigma)\, \omega(v_{\sigma(1)}, \dots, v_{\sigma(p)})\, \eta(v_{\sigma(p+1)}, \dots, v_{\sigma(p+q)}).$$
10. Prove that $\omega \wedge \eta \in \wedge^{p+q} V$.
11. Prove that
$$\omega \wedge \eta = (-1)^{pq}\, \eta \wedge \omega.$$
12. Check that for all $\omega_1 \in \wedge^p V$, $\omega_2 \in \wedge^q V$ and $\omega_3 \in \wedge^r V$ we have
$$(\omega_1 \wedge \omega_2) \wedge \omega_3 = \omega_1 \wedge (\omega_2 \wedge \omega_3).$$
We define $\omega_1 \wedge \omega_2 \wedge \dots \wedge \omega_r$ by induction on $r$ as follows:
$$\omega_1 \wedge \dots \wedge \omega_r = (\omega_1 \wedge \omega_2 \wedge \dots \wedge \omega_{r-1}) \wedge \omega_r.$$
13. Check that for all $\omega_1, \dots, \omega_p \in V^* = \wedge^1 V$ and all $v_1, \dots, v_p \in V$ we have
$$(\omega_1 \wedge \dots \wedge \omega_p)(v_1, \dots, v_p) = \det(\omega_i(v_j)).$$
The right-hand side is by definition the determinant of the matrix $A = [\omega_i(v_j)] \in M_p(F)$.
14. Prove that $\omega_1, \dots, \omega_p \in V^* = \wedge^1 V$ are linearly independent if and only if
$$\omega_1 \wedge \omega_2 \wedge \dots \wedge \omega_p \neq 0.$$
15. Let $\omega_1, \dots, \omega_n$ be a basis of $V^*$. Prove that the family $(\omega_{i_1} \wedge \dots \wedge \omega_{i_p})_{1 \le i_1 < \dots < i_p \le n}$ forms a basis of $\wedge^p V$ and deduce that
$$\dim \wedge^p V = \binom{n}{p} := \frac{n!}{p!\,(n-p)!}.$$
7.3 Main Properties of the Determinant of a Matrix
We now reach the heart of this chapter: establishing the main properties of the determinant map introduced in the previous section. Fortunately, we have developed all the necessary theory to give clean proofs of all the important properties of determinants.
A first very important result is the homogeneity of the determinant map: if we multiply all entries of a matrix $A \in M_n(F)$ by a scalar $\lambda \in F$, then the determinant gets multiplied by $\lambda^n$.
Proposition 7.25. We have $\det(\lambda A) = \lambda^n \det A$ for all $A \in M_n(F)$ and all $\lambda \in F$.
Proof. Write $A = [a_{ij}]$; then $\lambda A = [\lambda a_{ij}]$, hence by definition
$$\det(\lambda A) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, (\lambda a_{1\sigma(1)}) \cdots (\lambda a_{n\sigma(n)}) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \lambda^n\, a_{1\sigma(1)} \cdots a_{n\sigma(n)} = \lambda^n \det A,$$
as desired.
Problem 7.26. Prove that for any $A \in M_n(\mathbb{C})$ we have
$$\det(\overline{A}) = \overline{\det A},$$
where the entries of $\overline{A}$ are the complex conjugates of the entries of $A$.
Solution. Let $A = [a_{ij}]$; then $\overline{A} = [\overline{a_{ij}}]$ and so
$$\det(\overline{A}) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \overline{a_{1\sigma(1)}} \cdots \overline{a_{n\sigma(n)}} = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \overline{a_{1\sigma(1)} \cdots a_{n\sigma(n)}} = \overline{\sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}} = \overline{\det A}.$$
The main property of the determinant is its multiplicative character.
Theorem 7.27. For all linear transformations $T_1, T_2$ on a finite dimensional vector space $V$ we have
$$\det(T_1 \circ T_2) = \det T_1 \cdot \det T_2.$$
Proof. Let $v_1, \dots, v_n$ be a basis of $V$. By Theorem 7.20 we have
$$\det(T_1 \circ T_2) = \det{}_{(v_1, \dots, v_n)}(T_1(T_2(v_1)), \dots, T_1(T_2(v_n))) = \det T_1 \cdot \det{}_{(v_1, \dots, v_n)}(T_2(v_1), \dots, T_2(v_n)).$$
Relation (7.4) shows that
$$\det{}_{(v_1, \dots, v_n)}(T_2(v_1), \dots, T_2(v_n)) = \det T_2.$$
Combining these two equalities yields the desired result.
Combining the previous theorem and relation (7.6) we obtain the following fundamental theorem, which would be quite a pain in the neck to prove directly from the defining relation (7.5).
Theorem 7.28. For all matrices $A, B \in M_n(F)$ we have
$$\det(AB) = \det A \cdot \det B.$$
Proof. Let $V = F^n$ and let $T_1 : V \to V$ be the linear transformation sending $X \in V$ to $AX$. Define $T_2$ similarly, replacing $A$ by $B$. If $S$ is a linear transformation on $V$, let $A_S$ be the matrix of $S$ with respect to the canonical basis of $V = F^n$. Then $A = A_{T_1}$, $B = A_{T_2}$ and $AB = A_{T_1 \circ T_2}$. The result follows directly from the previous theorem and relation (7.6).
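As a quick numerical illustration of Theorem 7.28 (a sketch using NumPy; random integer matrices keep the numbers small):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)

# det(AB) and det(A) det(B) agree up to floating-point rounding.
print(np.linalg.det(A @ B))
print(np.linalg.det(A) * np.linalg.det(B))
```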
Problem 7.29. An invertible matrix $A \in M_n(\mathbb{R})$ has the property that both $A$ and $A^{-1}$ have integer entries. Prove that $\det A \in \{-1, 1\}$.
Solution. We have $A \cdot A^{-1} = I_n$, so using the fact that the determinant is multiplicative and that $\det I_n = 1$ (which follows straight from the definition of the determinant of a matrix), we obtain
$$1 = \det I_n = \det(A \cdot A^{-1}) = \det A \cdot \det(A^{-1}).$$
Next, recalling the definition of the determinant of a matrix, we notice that if all entries of a matrix are integers, then its determinant is an integer (since it is obtained by taking sums and differences of products of the entries of the matrix). Since $A$ and $A^{-1}$ have integer entries by hypothesis, it follows that $\det A$ and $\det A^{-1}$ are two integers whose product equals 1. Thus $\det A$ is a divisor of 1 and necessarily $\det A \in \{-1, 1\}$.
Remark 7.30. A much more remarkable result is the following kind of converse: suppose that $A \in M_n(\mathbb{R})$ is a matrix with integer entries. If $\det A \in \{-1, 1\}$, then $A^{-1}$ has integer entries. This is fairly difficult to prove with the tools we have introduced so far! The reader might try the case $n = 2$, which is not so difficult.
We can use the previous theorem and Corollary 7.17 to obtain a beautiful characterization of invertible matrices. The result is stunningly simple to state.
Theorem 7.31. A matrix $A \in M_n(F)$ is invertible if and only if $\det A \neq 0$.
Proof. Suppose that $A$ is invertible, so there is a matrix $B \in M_n(F)$ such that $AB = BA = I_n$. Taking the determinant yields $\det A \cdot \det B = 1$, thus $\det A \neq 0$. Conversely, suppose that $\det A \neq 0$ and let $e_1, \dots, e_n$ be the canonical basis of $F^n$, and $C_1, \dots, C_n \in F^n$ the columns of $A$. Then $\det A = \det_{(e_1, \dots, e_n)}(C_1, \dots, C_n)$ is nonzero, thus by Corollary 7.17 the vectors $C_1, \dots, C_n$ are linearly independent. This means that the linear map $\varphi : F^n \to F^n$ sending $X$ to $AX$ is injective, and so invertible. Let $\psi$ be its inverse and let $B$ be the matrix of $\psi$ in the canonical basis of $F^n$. The equalities $\varphi \circ \psi = \psi \circ \varphi = \mathrm{id}$ yield $AB = BA = I_n$, thus $A$ is invertible.
Problem 7.32. Let $A$ and $B$ be invertible $n \times n$ matrices with real entries, where $n$ is an odd positive integer. Show that $AB + BA$ is nonzero.
Solution. Suppose that $AB + BA = O_n$, thus $AB = -BA$. Taking the determinant, we deduce that
$$\det(AB) = (-1)^n \det(BA) = -\det(BA).$$
On the other hand, $\det(AB) = \det A \cdot \det B = \det(BA)$, thus the previous equality yields $2 \det A \cdot \det B = 0$. This contradicts the hypothesis that $A$ and $B$ are invertible.
Problem 7.33. Let $A$ and $B$ be two square matrices with real coefficients. If $A$ and $B$ commute, prove that
$$\det(A^2 + B^2) \ge 0.$$
Solution. Since $A$ and $B$ commute, we have
$$A^2 + B^2 = (A + iB)(A - iB).$$
Thus
$$\det(A^2 + B^2) = \det(A + iB) \cdot \det(A - iB).$$
But using Problem 7.26 we obtain
$$\det(A - iB) = \det(\overline{A + iB}) = \overline{\det(A + iB)},$$
thus
$$\det(A^2 + B^2) = |\det(A + iB)|^2 \ge 0,$$
as desired.
Problem 7.34. Let $n$ be an odd integer and let $A, B \in M_n(\mathbb{R})$ be matrices such that $A^2 + B^2 = O_n$. Prove that $AB - BA$ is not invertible.
Solution. Consider the equality
$$(A + iB)(A - iB) = A^2 + B^2 + i(BA - AB) = i(BA - AB).$$
Taking the determinant yields
$$\det(A + iB) \cdot \det(A - iB) = i^n \det(BA - AB).$$
Suppose that $\det(AB - BA) \neq 0$ and note that since $A, B$ have real entries, we have by Problem 7.26
$$\det(A - iB) = \det(\overline{A + iB}) = \overline{\det(A + iB)}$$
and so
$$|\det(A + iB)|^2 = i^n \det(BA - AB).$$
Since $\det(AB - BA)$ is a nonzero real number, we deduce that $i^n$ is real, contradicting the hypothesis that $n$ is odd. Thus
$$\det(AB - BA) = 0$$
and the result follows.
Remark 7.35. An alternate solution goes as follows. Note that we have
$$(A + iB)(A - iB) = A^2 + B^2 + i(BA - AB) = i(BA - AB)$$
and
$$(A - iB)(A + iB) = A^2 + B^2 - i(BA - AB) = -i(BA - AB).$$
Since
$$\det((A + iB)(A - iB)) = \det(A + iB) \cdot \det(A - iB) = \det((A - iB)(A + iB)),$$
and $n$ is odd, we conclude that
$$i^n \det(BA - AB) = (-i)^n \det(BA - AB) = -i^n \det(BA - AB)$$
and hence $\det(BA - AB) = 0$.
Problem 7.36. Let $p, q$ be real numbers such that the equation $x^2 + px + q = 0$ has no real solutions. Prove that if $n$ is odd, then the equation $X^2 + pX + qI_n = O_n$ has no solution in $M_n(\mathbb{R})$.
Solution. Suppose that $X^2 + pX + qI_n = O_n$ for some $X \in M_n(\mathbb{R})$. We can write this equation as
$$\Big(X + \frac{p}{2}\, I_n\Big)^2 = \frac{p^2 - 4q}{4}\, I_n.$$
Taking the determinant, we deduce that
$$\Big(\frac{p^2 - 4q}{4}\Big)^n = \det\Big(X + \frac{p}{2}\, I_n\Big)^2 \ge 0.$$
This is impossible, since by assumption $p^2 < 4q$ and $n$ is odd.
Another important property of the determinant of a matrix is its behavior with respect to the transpose operation. Recall that if $A = [a_{ij}] \in M_n(F)$, then its transpose ${}^t A$ is the matrix defined by ${}^t A = [a_{ji}]$.
Theorem 7.37. For all matrices $A \in M_n(F)$ we have
$$\det A = \det({}^t A).$$
Proof. By formula (7.5) applied to ${}^t A$ we have
$$\det({}^t A) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{\sigma(1)1} \cdots a_{\sigma(n)n}.$$
For any permutation $\sigma$ we have
$$a_{\sigma(1)1} \cdots a_{\sigma(n)n} = a_{1\sigma^{-1}(1)} \cdots a_{n\sigma^{-1}(n)},$$
since $a_{i\sigma^{-1}(i)} = a_{\sigma(j)j}$ with $j = \sigma^{-1}(i)$ (and when $i$ runs over $\{1, 2, \dots, n\}$, so does $j$). Using this relation and the observation that $\varepsilon(\sigma^{-1}) = \varepsilon(\sigma)^{-1} = \varepsilon(\sigma)$, we obtain (note that when $\sigma$ runs over $S_n$, so does $\sigma^{-1}$)
$$\det({}^t A) = \sum_{\sigma \in S_n} \varepsilon(\sigma^{-1})\, a_{1\sigma^{-1}(1)} \cdots a_{n\sigma^{-1}(n)} = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)} = \det A.$$
The result follows.
Problem 7.38. Let $A$ be a skew-symmetric matrix (recall that this means that $A + {}^t A = O_n$) of odd order, with real or complex coefficients. Prove that $\det(A) = 0$.
Solution. By hypothesis we have ${}^t A = -A$. Since $\det(A) = \det({}^t A)$, it follows that
$$\det(A) = \det({}^t A) = \det(-A) = (-1)^n \det(A) = -\det(A).$$
Thus $\det(A)$ must be 0.
Problem 7.39. Let $A$ be a matrix of odd order. Show that
$$\det(A - {}^t A) = 0.$$
Solution. We have
$${}^t(A - {}^t A) = {}^t A - {}^t({}^t A) = {}^t A - A = -(A - {}^t A),$$
thus the matrix $A - {}^t A$ is skew-symmetric and its determinant must be 0 by Problem 7.38.
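Both facts are easy to probe numerically (a quick sketch; note that for even order the determinant of a skew-symmetric matrix need not vanish):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M - M.T                 # skew-symmetric of odd order 5: A + tA = 0

print(np.linalg.det(A))     # 0 up to floating-point rounding
```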
Problem 7.40. Let $a_1, \dots, a_n$ and $b_1, \dots, b_n$ be complex numbers. Compute the determinant
$$\begin{vmatrix} a_1 + b_1 & b_1 & \dots & b_1 \\ b_2 & a_2 + b_2 & \dots & b_2 \\ \dots & \dots & \dots & \dots \\ b_n & b_n & \dots & a_n + b_n \end{vmatrix}.$$
Solution. Let $A$ be the matrix whose determinant we need to evaluate. We have $\det A = \det({}^t A)$, and the columns of ${}^t A$ are the vectors $a_1 e_1 + b_1 v, \dots, a_n e_n + b_n v$, where $e_1, \dots, e_n$ is the canonical basis of $\mathbb{C}^n$ and $v$ is the vector all of whose coordinates are equal to 1. We deduce that (we simply write $\det$ instead of $\det_{(e_1, \dots, e_n)}$)
$$\det({}^t A) = \det(a_1 e_1 + b_1 v, \dots, a_n e_n + b_n v).$$
Using the fact that the determinant map is multilinear and alternating, we obtain
$$\det({}^t A) = \det(a_1 e_1, \dots, a_n e_n) + \sum_{i=1}^{n} \det(a_1 e_1, \dots, a_{i-1} e_{i-1}, b_i v, a_{i+1} e_{i+1}, \dots, a_n e_n).$$
Indeed, note that $\det(x_1, \dots, x_n) = 0$ if at least two of the vectors $x_1, \dots, x_n$ are multiples of $v$. We conclude that
$$\det({}^t A) = a_1 \cdots a_n + \sum_{i=1}^{n} a_1 \cdots a_{i-1}\, b_i\, a_{i+1} \cdots a_n\, \det(e_1, \dots, e_{i-1}, v, e_{i+1}, \dots, e_n).$$
Since $v = e_1 + \dots + e_n$ and the determinant map is multilinear and alternating, we have
$$\det(e_1, \dots, e_{i-1}, v, e_{i+1}, \dots, e_n) = \det(e_1, \dots, e_n) = 1$$
for all $i$. We conclude that
$$\det A = a_1 \cdots a_n + \sum_{i=1}^{n} b_i \prod_{k \neq i} a_k.$$
Recall that a matrix $A \in M_n(F)$ is called upper-triangular if all entries of $A$ below the main diagonal are 0, that is, $a_{ij} = 0$ whenever $i > j$. Similarly, $A$ is called lower-triangular if all entries above the main diagonal are 0, that is, $a_{ij} = 0$ whenever $i < j$. The result of the following computation is absolutely crucial: the determinant of an upper-triangular or lower-triangular square matrix is simply the product of the diagonal entries. One can hardly overestimate the power of this innocent-looking statement.
Theorem 7.41. If $A = [a_{ij}] \in M_n(F)$ is upper-triangular or lower-triangular, then
$$\det A = \prod_{i=1}^{n} a_{ii}.$$
Proof. The argument being identical in the lower-triangular case, let us assume that $A$ is upper-triangular. Consider a nonzero term $\varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}$ appearing in the right-hand side of formula (7.5). Then each $a_{i\sigma(i)}$ is nonzero and so necessarily $i \le \sigma(i)$ for all $i$. But since $\sum_{i=1}^{n} i = \sum_{i=1}^{n} \sigma(i)$ (as $\sigma$ is a permutation), all the previous inequalities must be equalities. Thus $\sigma$ is the identity permutation and the corresponding term is $a_{11} \cdots a_{nn}$. Since all other terms are 0, the theorem is proved.
Problem 7.42. For $1 \le i, j \le n$ we let $a_{ij}$ be the number of common positive divisors of $i$ and $j$, and we let $b_{ij} = 1$ if $j$ divides $i$, and $b_{ij} = 0$ otherwise.
a) Prove that $A = B\, {}^t B$, where $A = [a_{ij}]$ and $B = [b_{ij}]$.
b) What can you say about the shape of the matrix $B$?
c) Compute $\det A$.
Solution. a) Let us fix $i, j \in \{1, 2, \dots, n\}$ and compute, using the product rule,
$$(B\, {}^t B)_{ij} = \sum_{k=1}^{n} b_{ik} b_{jk}.$$
Consider a nonzero term $b_{ik} b_{jk}$ in the previous sum. Since $b_{ik}$ and $b_{jk}$ are nonzero, $k$ must divide both $i$ and $j$, that is, $k$ is a common positive divisor of $i$ and $j$. Conversely, if $k$ is a common positive divisor of $i$ and $j$, then $b_{ik} = b_{jk} = 1$. We deduce that the only nonzero terms in the sum are those corresponding to common positive divisors of $i$ and $j$, and each such nonzero term equals 1. Thus $(B\, {}^t B)_{ij}$ is the number of common positive divisors of $i$ and $j$, which by definition of $A$ is simply $a_{ij}$. Since $i, j$ were arbitrary, we deduce that $A = B\, {}^t B$.
b) If $i < j$ are between 1 and $n$, then certainly $j$ cannot divide $i$ and so $b_{ij} = 0$. Thus $b_{ij} = 0$ whenever $i < j$, which means that $B$ is lower-triangular. We can say a little more: since $i$ divides $i$ for all $i \in \{1, 2, \dots, n\}$, we have $b_{ii} = 1$, thus all diagonal entries of $B$ are equal to 1.
c) Since the determinant is multiplicative and since $\det B = \det({}^t B)$, we have (using part a))
$$\det A = \det(B\, {}^t B) = (\det B)^2.$$
We can now use part b) and the previous theorem to conclude that $\det B = 1$ and so
$$\det A = 1.$$
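This construction is easy to check numerically; here is a small sketch (using NumPy for convenience):

```python
import numpy as np

n = 8
# B[i][j] = 1 if (j+1) divides (i+1), else 0 (indices are 0-based).
B = np.array([[1 if (i + 1) % (j + 1) == 0 else 0 for j in range(n)]
              for i in range(n)])
# A[i][j] = number of common positive divisors of i+1 and j+1.
A = np.array([[sum(1 for d in range(1, n + 1)
                   if (i + 1) % d == 0 and (j + 1) % d == 0)
               for j in range(n)] for i in range(n)])

assert (A == B @ B.T).all()        # A = B tB, as in part a)
print(round(np.linalg.det(A)))     # 1, as in part c)
```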
The next theorem (which can be very useful in practice) would also be quite
painful to prove directly by manipulating the complicated expression defining the
determinant of a matrix. The theory of alternating multilinear forms makes the proof
completely transparent.
Theorem 7.43 (Block-Determinants). Let $A \in M_n(F)$ be a matrix given in block form
$$A = \begin{pmatrix} B & D \\ O_{q,p} & C \end{pmatrix},$$
where $B \in M_p(F)$, $C \in M_q(F)$ (with $p + q = n$) and $D \in M_{p,q}(F)$. Then
$$\det A = \det B \cdot \det C.$$
Proof. Consider the map
$$\varphi : (F^p)^p \to F, \quad \varphi(X_1, \dots, X_p) = \begin{vmatrix} X & D \\ O_{q,p} & C \end{vmatrix},$$
where $X \in M_p(F)$ is the matrix with columns $X_1, \dots, X_p$. The determinant map on $(F^n)^n$ (with respect to the canonical basis of $F^n$) being linear with respect to each variable, $\varphi$ is $p$-linear. Moreover, $\varphi$ is alternating: if $X_i = X_j$ for some $i \neq j$, then columns $i$ and $j$ in the matrix $\begin{pmatrix} X & D \\ O_{q,p} & C \end{pmatrix}$ are equal and so this matrix has determinant 0.
Now, applying Theorem 7.14 to the canonical basis of $F^p$, we obtain
$$\varphi(X) = \begin{vmatrix} X & D \\ O_{q,p} & C \end{vmatrix} = \det X \cdot \begin{vmatrix} I_p & D \\ O_{q,p} & C \end{vmatrix}$$
for all $X \in M_p(F)$. The same game played with the $q$-linear alternating form
$$Y \mapsto \begin{vmatrix} I_p & D \\ O_{q,p} & Y \end{vmatrix}$$
yields
$$\begin{vmatrix} I_p & D \\ O_{q,p} & Y \end{vmatrix} = \det Y \cdot \begin{vmatrix} I_p & D \\ O_{q,p} & I_q \end{vmatrix}.$$
Thus
$$\det A = \det B \cdot \det C \cdot \begin{vmatrix} I_p & D \\ O_{q,p} & I_q \end{vmatrix} = \det B \cdot \det C,$$
the last equality being a consequence of the fact that the matrix $\begin{pmatrix} I_p & D \\ O_{q,p} & I_q \end{pmatrix}$ is upper-triangular with diagonal entries equal to 1, thus its determinant equals 1.
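A quick numerical sanity check of the block formula (a sketch; the block sizes are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 3
B = rng.standard_normal((p, p))
C = rng.standard_normal((q, q))
D = rng.standard_normal((p, q))

# Assemble the block upper-triangular matrix [[B, D], [0, C]].
A = np.block([[B, D], [np.zeros((q, p)), C]])

print(np.linalg.det(A))                     # equals...
print(np.linalg.det(B) * np.linalg.det(C))  # ...this, up to rounding
```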
Problem 7.44. Let $A \in M_n(\mathbb{C})$ and let $T : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ be the map defined by $T(X) = AX$. Find the determinant of $T$.
Solution. Let $(E_{ij})_{1 \le i, j \le n}$ be the canonical basis of $M_n(\mathbb{C})$. Note that
$$T(E_{ij}) = A E_{ij} = \sum_{k=1}^{n} a_{ki} E_{kj},$$
as a direct inspection of the product $A E_{ij}$ shows. We deduce that the matrix of $T$ with respect to the basis $E_{11}, \dots, E_{n1}, E_{12}, \dots, E_{n2}, \dots, E_{1n}, \dots, E_{nn}$ is a block-diagonal matrix with $n$ diagonal blocks equal to $A$. It follows from Theorem 7.43 that
$$\det T = (\det A)^n.$$
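In the basis ordered as above, the matrix of $T$ is the Kronecker product $I_n \otimes A$, which makes the result easy to verify numerically (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))

# Matrix of X -> AX in the basis E11,...,En1, E12,...,En2, ...:
# block-diagonal with n copies of A, i.e. the Kronecker product I_n (x) A.
T = np.kron(np.eye(n), A)

print(np.linalg.det(T))       # equals...
print(np.linalg.det(A) ** n)  # ...(det A)^n, up to rounding
```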
7.3.1 Problems for Practice
1. A $5 \times 5$ matrix $A$ with real entries has determinant 2. Compute the determinants of $2A$, $3A$, $A^2$, $A^3$ and $({}^t A)^2$.
2. Prove that the determinant of an orthogonal matrix $A \in M_n(\mathbb{R})$ equals 1 or $-1$. We recall that $A$ is orthogonal if $A\, {}^t A = I_n$.
3. a) A matrix $A \in M_n(\mathbb{R})$ satisfies $A^3 = I_n$. What are the possible values of $\det A$?
b) Answer the same question with $\mathbb{R}$ replaced by $\mathbb{C}$.
c) Answer the same question with $\mathbb{R}$ replaced by $\mathbb{F}_2$.
4. Prove that for all $A \in M_n(\mathbb{R})$ we have
$$\det(A\, {}^t A) \ge 0.$$
5. If $A = [a_{ij}] \in M_n(\mathbb{C})$, define $A^* = [\overline{a_{ji}}] \in M_n(\mathbb{C})$.
a) Express $\det(A^*)$ in terms of $\det A$.
b) Prove that $\det(A A^*) \ge 0$.
6. Let $T$ be a linear transformation on a finite dimensional vector space $V$. Suppose that $V = V_1 \oplus V_2$ for some subspaces $V_1, V_2$ which are stable under $T$. Let $T_1, T_2$ be the restrictions of $T$ to $V_1, V_2$. Prove that
$$\det T = \det T_1 \cdot \det T_2.$$
7. The entries of a matrix $A \in M_n(\mathbb{R})$ are equal to 1 or $-1$. Prove that $2^{n-1}$ divides $\det A$.
8. Prove that for any matrix $A \in M_n(\mathbb{R})$ we have
a) $\det(A^2 + I_n) \ge 0$;
b) $\det(A^2 + A + I_n) \ge 0$.
9. Prove that if $A, B \in M_n(\mathbb{R})$ are matrices which commute, then
$$\det(A^2 + AB + B^2) \ge 0.$$
10. Let $A, B, C \in M_n(\mathbb{R})$ be pairwise commuting matrices. Prove that
$$\det(A^2 + B^2 + C^2 - AB - BC - CA) \ge 0.$$
Hint: express $A^2 + B^2 + C^2 - AB - BC - CA$ simply in terms of the matrices $X = A - B$ and $Y = B - C$.
11. Let $A \in M_n(\mathbb{C})$ and consider the matrix
$$B = \begin{pmatrix} I_n & A \\ A & I_n \end{pmatrix}.$$
a) Prove that $\det B = \det(I_n - A) \cdot \det(I_n + A)$. Hint: start by proving the equality
$$\begin{pmatrix} I_n & A \\ A & I_n \end{pmatrix} = \begin{pmatrix} I_n & O_n \\ A & I_n \end{pmatrix} \begin{pmatrix} I_n & A \\ O_n & I_n - A^2 \end{pmatrix}.$$
b) If $B$ is invertible, prove that $I_n - A^2$ is invertible and compute the inverse of $B$ in terms of $A$ and the inverse of $I_n - A^2$.
12. Prove that for all matrices $A, B \in M_n(\mathbb{R})$ we have
$$\begin{vmatrix} A & -B \\ B & A \end{vmatrix} = |\det(A + iB)|^2.$$
13. Let $A, B \in M_n(\mathbb{R})$ be matrices such that $A^2 + B^2 = AB$ and $AB - BA$ is invertible.
a) Let $j = e^{\frac{2i\pi}{3}}$, so that $j^2 + j + 1 = 0$. Check that
$$(A + jB)(A + j^{-1}B) = j(BA - AB).$$
b) Prove that $n$ is a multiple of 3.
14. (Matrices differing by a rank one matrix) Let $A \in GL_n(F)$ be an invertible matrix and $v, w \in M_{n,1}(F)$ be vectors, thought of as $n \times 1$ matrices.
a) Show that
$$\det(A - v\, {}^t w) = \det(A)\,(1 - {}^t w A^{-1} v),$$
where we think of the $1 \times 1$ matrix ${}^t w A^{-1} v$ as a scalar.
Hint. One way to prove this formula is to justify the block matrix identities
$$\begin{pmatrix} 1 & {}^t w \\ v & A \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ v & I_n \end{pmatrix} \begin{pmatrix} 1 & {}^t w \\ 0 & A - v\, {}^t w \end{pmatrix} = \begin{pmatrix} 1 - {}^t w A^{-1} v & {}^t w \\ 0 & A \end{pmatrix} \begin{pmatrix} 1 & 0 \\ A^{-1} v & I_n \end{pmatrix}.$$
b) If furthermore ${}^t w A^{-1} v \neq 1$, show that
$$(A - v\, {}^t w)^{-1} = A^{-1} + \frac{1}{1 - {}^t w A^{-1} v}\, A^{-1} v\, {}^t w A^{-1}.$$
15. (Determinants of block matrices) Let $X \in M_n(F)$ be a matrix given in block form
$$X = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$
where $A \in M_p(F)$, $B \in M_{p,q}(F)$, $C \in M_{q,p}(F)$, $D \in M_q(F)$, and $p + q = n$. If $A$ is invertible, show that
$$\det(X) = \det(A) \cdot \det(D - C A^{-1} B).$$
Hint. Generalize the block matrix formula from the preceding problem.
16. (Smith's determinant) For $1 \le i, j \le n$ let $x_{ij}$ be the greatest common divisor of $i$ and $j$. The goal of this problem is to compute $\det X$, where $X = [x_{ij}]_{1 \le i, j \le n}$. Let $\varphi$ be Euler's totient function (i.e., $\varphi(1) = 1$ and, for $n \ge 2$, $\varphi(n)$ is the number of positive integers less than $n$ and relatively prime to $n$). Define $y_{ij} = \varphi(j)$ if $j$ divides $i$, and $y_{ij} = 0$ otherwise. Also, let $b_{ij} = 1$ if $j$ divides $i$ and 0 otherwise.
a) Prove that $X = Y\, {}^t B$, where $Y = [y_{ij}]$ and $B = [b_{ij}]$.
b) Prove that
$$\det X = \varphi(1)\varphi(2)\cdots\varphi(n).$$
7.4 Computing Determinants in Practice
If $n = 2$, then $S_2$ contains only the permutations $\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}$ and $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$, hence we get
$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21},$$
a formula which we have extensively used in the chapter devoted to square matrices of order 2. The value of the determinant of a matrix of order two may be remembered by a simple array: the product along the diagonal arrow running downward to the right ($a_{11} a_{22}$) is taken with sign $+$, and the product along the arrow running upward to the right ($a_{21} a_{12}$) is taken with sign $-$.
If $n = 3$, then $S_3$ contains six permutations:
$$\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}.$$
The first three permutations are even and the last three are odd. In this case we get
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} a_{22} a_{33} + a_{13} a_{21} a_{32} + a_{12} a_{23} a_{31} - a_{13} a_{22} a_{31} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33}.$$
The value of a determinant of order three may be remembered using a particular scheme similar to that used for determinants of order two: the first two columns of the determinant are repeated at its right,
$$\begin{matrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{matrix} \qquad \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{matrix}$$
and the products of the three elements along the arrows running downward and to the right are taken with sign $+$, while the products of the three elements along the arrows running upward and to the right are taken with sign $-$:
$$a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33}.$$
The algebraic sum of these six products is the value of the determinant.
For example, applying this scheme to a concrete $3 \times 3$ matrix reduces the computation to writing down these six products; the next problem works out such an example in full.
Problem 7.45. Compute the determinant
$$\begin{vmatrix} 1 & -2 & -3 \\ 2 & 3 & -5 \\ 3 & -1 & 2 \end{vmatrix}.$$
Solution. Using the rule described above, we obtain
$$\begin{vmatrix} 1 & -2 & -3 \\ 2 & 3 & -5 \\ 3 & -1 & 2 \end{vmatrix} = 6 + 30 + 6 + 27 - 5 + 8 = 72.$$
Problem 7.46. Consider the invertible matrix
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$
Find the determinant of the inverse of $A$.
Solution. It is useless to compute $A^{-1}$ explicitly in order to solve the problem. Indeed, since $A \cdot A^{-1} = I_3$, we have $\det A \cdot \det(A^{-1}) = 1$ and so
$$\det(A^{-1}) = \frac{1}{\det A}.$$
It therefore suffices to compute $\det A$. Using the previous rule, we obtain
$$\det A = 4 + 1 + 1 - 1 - 2 - 2 = 1.$$
Thus $\det(A^{-1}) = 1$.
No such easy rules exist for matrices of size at least 4. In practice, the following properties of the determinant allow one to compute a large number of determinants (the elements $R_1, R_2, \dots, R_n$ below are the rows of the matrix whose determinant we are asked to compute or study).
1. If every element of a row of a determinant of order $n$ is multiplied by the scalar $\lambda$, then the value of the determinant is multiplied by $\lambda$.
2. If two rows of a determinant are interchanged, then the determinant gets multiplied by $-1$. More generally, we have the following formula, where $\sigma$ is any permutation in $S_n$:
$$\det \begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{pmatrix} = \varepsilon(\sigma)\, \det \begin{pmatrix} R_{\sigma(1)} \\ R_{\sigma(2)} \\ \vdots \\ R_{\sigma(n)} \end{pmatrix}.$$
3. Adding a scalar multiple of a row of a determinant to another row does not change the value of the determinant: for $j \neq k$ and $\lambda \in F$ we have
$$\det \begin{pmatrix} R_1 \\ \vdots \\ R_j + \lambda R_k \\ \vdots \\ R_n \end{pmatrix} = \det \begin{pmatrix} R_1 \\ \vdots \\ R_j \\ \vdots \\ R_n \end{pmatrix}.$$
4. A very useful property is that the determinant of an upper (or lower) triangular
matrix is simply the product of its diagonal entries.
Note that the operations involved are the elementary row operations studied in Chap. 3. This gives us a practical way of computing determinants, known as the Gaussian elimination algorithm for determinants: start with the matrix $A$ whose determinant we want to evaluate and perform Gaussian reduction in order to bring it to its reduced row-echelon form. This will require several elementary operations on the rows of the matrix $A$, which come down to multiplying $A$ by a sequence of elementary matrices $E_1, \dots, E_k$ on the left. Thus at the end we obtain
$$E_1 E_2 \cdots E_k A = A_{\mathrm{ref}},$$
where $A_{\mathrm{ref}}$ is the reduced row-echelon form of $A$. Taking determinants gives
$$\det(E_1)\, \det(E_2) \cdots \det(E_k)\, \det A = \det A_{\mathrm{ref}}.$$
Since $A_{\mathrm{ref}}$ is upper-triangular, its determinant is simply the product of its diagonal entries (in particular, if some diagonal entry equals 0, then $\det A = 0$). Also, the
previous rules 1–3 allow us to compute very easily each of the factors $\det E_1$, $\det E_2$, ..., $\det E_k$. We can neglect those matrices $E_i$ which correspond to transvections, as their determinant is 1. Next, if $E_i$ corresponds to multiplication of a row by a scalar $\lambda$, then $\det E_i = \lambda$, and if $E_i$ corresponds to a permutation of two rows, then $\det E_i = -1$. Thus, in practice we simply follow the Gaussian reduction to bring the matrix to its reduced row-echelon form, and keep in mind to multiply at each step the value of the determinant by the corresponding constant, which depends on the operation performed, as explained before.
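Here is a minimal sketch of this algorithm in Python (plain lists, pivoting on the first nonzero entry; as remarked below, one only needs to reach triangular form, not the full reduced row-echelon form):

```python
def det_gauss(A):
    # Determinant by Gaussian elimination, tracking row swaps.
    A = [row[:] for row in A]          # work on a copy
    n = len(A)
    det = 1.0
    for col in range(n):
        # Find a row with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return 0.0                 # zero column: determinant vanishes
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det                 # a row swap flips the sign
        det *= A[col][col]
        # Clear the entries below the pivot (transvections leave det unchanged).
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det

print(det_gauss([[1, -2, -3], [2, 3, -5], [3, -1, 2]]))  # 72.0 (Problem 7.45)
```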
Remark 7.47. a) Note that since $\det({}^t A) = \det A$ for all $A \in M_n(F)$, all previous properties referring to rows of a matrix (or determinant) still hold when the word row is replaced with the word column.
b) For any particular problem an intelligent human being can probably do better than naive Gaussian elimination, most likely by being opportunistic and producing more zeroes in the matrix with carefully placed row and/or column operations. Two more systematic shortcuts are:
• If there is some linear dependence among the columns (or rows), then the determinant vanishes, which gives an early exit to the algorithm. Note that this is the case if a column (or row) consists entirely of zeros, or if there are two equal columns or two equal rows.
• Since we can easily compute the determinant of an upper-triangular matrix, we do not need to fully reduce; it is enough to get down to a triangular matrix.
c) There are some extra rules one could exploit (they are however more useful in theoretical questions):
• If a column is decomposed as the sum of two column vectors, then the determinant is the sum of the corresponding two determinants, i.e.,
$$\det\big(c_1\; c_2\; \dots\; c_k' + c_k''\; \dots\; c_n\big) = \det\big(c_1\; c_2\; \dots\; c_k'\; \dots\; c_n\big) + \det\big(c_1\; c_2\; \dots\; c_k''\; \dots\; c_n\big).$$
A similar statement applies to rows.
• If $A \in M_n(\mathbb{C})$, then the determinant of the conjugate of $A$ equals the conjugate of the determinant of $A$, i.e.,
$$\det \overline{A} = \overline{\det A}.$$
• For $A, B \in M_n(F)$ we have
$$\det(A \cdot B) = \det A \cdot \det B.$$
• If $A \in M_n(F)$ and $\lambda \in F$, then
$$\det(\lambda A) = \lambda^n \det A.$$
Problem 7.48. Prove that for all real numbers $a, b, c$ we have
$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ b + c & c + a & a + b \end{vmatrix} = 0.$$
Solution. Adding the second row to the third row yields
$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ b + c & c + a & a + b \end{vmatrix} = \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a + b + c & a + b + c & a + b + c \end{vmatrix}.$$
Since the third row is proportional to the first row, this last determinant vanishes.
Problem 7.49. Let $a, b, c$ be complex numbers. By computing the determinant of
the matrix
$$A = \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix}$$
in two different ways, prove that
$$a^3 + b^3 + c^3 - 3abc = (a + b + c)(a^2 + b^2 + c^2 - ab - bc - ca).$$
Solution. First, we can compute $\det A$ using the rule described at the beginning of this section. We end up with
$$\det A = abc + abc + abc - a^3 - b^3 - c^3 = -(a^3 + b^3 + c^3 - 3abc).$$
On the other hand, we can add all columns to the first column and obtain
$$\det A = \begin{vmatrix} a + b + c & b & c \\ a + b + c & c & a \\ a + b + c & a & b \end{vmatrix} = (a + b + c) \begin{vmatrix} 1 & b & c \\ 1 & c & a \\ 1 & a & b \end{vmatrix}.$$
The last determinant can be computed using the rule described at the beginning of the section. We obtain
$$\begin{vmatrix} 1 & b & c \\ 1 & c & a \\ 1 & a & b \end{vmatrix} = bc + ab + ca - c^2 - a^2 - b^2 = -(a^2 + b^2 + c^2 - ab - bc - ca).$$
Thus
$$\det A = -(a + b + c)(a^2 + b^2 + c^2 - ab - bc - ca).$$
Comparing the two expressions for $\det A$ yields the desired result.
Problem 7.50. Prove that
$$\begin{vmatrix} b_1 + c_1 & c_1 + a_1 & a_1 + b_1 \\ b_2 + c_2 & c_2 + a_2 & a_2 + b_2 \\ b_3 + c_3 & c_3 + a_3 & a_3 + b_3 \end{vmatrix} = 2 \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$$
for all real numbers $a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2, c_3$.
Solution. Performing column operations on the corresponding matrices, we have the following chain of equalities (the operation performed is indicated at each step):
$$\begin{vmatrix} b_1 + c_1 & c_1 + a_1 & a_1 + b_1 \\ b_2 + c_2 & c_2 + a_2 & a_2 + b_2 \\ b_3 + c_3 & c_3 + a_3 & a_3 + b_3 \end{vmatrix} \overset{C_1 \to C_1 - C_2}{=} \begin{vmatrix} b_1 - a_1 & c_1 + a_1 & a_1 + b_1 \\ b_2 - a_2 & c_2 + a_2 & a_2 + b_2 \\ b_3 - a_3 & c_3 + a_3 & a_3 + b_3 \end{vmatrix}$$
$$\overset{C_1 \to C_1 - C_3}{=} \begin{vmatrix} -2a_1 & c_1 + a_1 & a_1 + b_1 \\ -2a_2 & c_2 + a_2 & a_2 + b_2 \\ -2a_3 & c_3 + a_3 & a_3 + b_3 \end{vmatrix} = -2 \begin{vmatrix} a_1 & c_1 + a_1 & a_1 + b_1 \\ a_2 & c_2 + a_2 & a_2 + b_2 \\ a_3 & c_3 + a_3 & a_3 + b_3 \end{vmatrix}$$
$$\overset{\substack{C_2 \to C_2 - C_1 \\ C_3 \to C_3 - C_1}}{=} -2 \begin{vmatrix} a_1 & c_1 & b_1 \\ a_2 & c_2 & b_2 \\ a_3 & c_3 & b_3 \end{vmatrix} = 2 \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix},$$
the last equality coming from interchanging the last two columns. The result follows.
Remark 7.51. An alternate and shorter solution to the previous problem is to note that
$$\begin{pmatrix} b_1 + c_1 & c_1 + a_1 & a_1 + b_1 \\ b_2 + c_2 & c_2 + a_2 & a_2 + b_2 \\ b_3 + c_3 & c_3 + a_3 & a_3 + b_3 \end{pmatrix} = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},$$
and the last matrix has determinant 2.
Problem 7.52. Compute $\det A$, where $x_1, \dots, x_n$ are real numbers and
$$A = \begin{pmatrix} 1 + x_1 & x_2 & x_3 & \dots & x_n \\ x_1 & 1 + x_2 & x_3 & \dots & x_n \\ \dots & & & & \\ x_1 & x_2 & x_3 & \dots & 1 + x_n \end{pmatrix}.$$
Solution. We start by adding all the other columns to the first column, and factoring out $1 + x_1 + \dots + x_n$. We obtain
$$\det A = (1 + x_1 + \dots + x_n) \begin{vmatrix} 1 & x_2 & \dots & x_n \\ 1 & 1 + x_2 & \dots & x_n \\ \dots & \dots & \dots & \dots \\ 1 & x_2 & \dots & 1 + x_n \end{vmatrix}.$$
In this new determinant, we subtract the first row from each of the subsequent rows. We end up with
$$\det A = (1 + x_1 + x_2 + \dots + x_n) \begin{vmatrix} 1 & x_2 & \dots & x_n \\ 0 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & 1 \end{vmatrix}.$$
The last determinant is that of an upper-triangular matrix with diagonal entries equal to 1, thus it equals 1. We conclude that
$$\det A = 1 + x_1 + x_2 + \dots + x_n.$$
Problem 7.53. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be the matrix defined by
$$a_{ij} = \begin{cases} n + 1, & \text{if } i = j, \\ 1, & \text{if } i \neq j. \end{cases}$$
Compute $\det A$.
Solution. The matrix $A$ can be written in the form
$$A = \begin{pmatrix} n + 1 & 1 & \dots & 1 \\ 1 & n + 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & n + 1 \end{pmatrix}.$$
Adding all columns of $A$ to the first column, we obtain
$$\det A = \begin{vmatrix} 2n & 1 & \dots & 1 \\ 2n & n + 1 & \dots & 1 \\ \vdots & \vdots & & \vdots \\ 2n & 1 & \dots & n + 1 \end{vmatrix} = 2n \begin{vmatrix} 1 & 1 & \dots & 1 \\ 1 & n + 1 & \dots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \dots & n + 1 \end{vmatrix}.$$
The last determinant can be computed by subtracting the first column from each of the other columns, and noting that the resulting matrix is lower-triangular. We obtain
$$\det A = 2n \begin{vmatrix} 1 & 0 & \dots & 0 \\ 1 & n & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \dots & n \end{vmatrix} = 2n \cdot (1 \cdot n \cdots n) = 2n \cdot n^{n-1} = 2n^n.$$
Remark 7.54. The previous two problems are special cases of Problem 7.40.
Problem 7.55. Prove that for all real numbers $a, b, c$
$$\begin{vmatrix} \cos^2 a & \sin^2 a & \cos 2a \\ \cos^2 b & \sin^2 b & \cos 2b \\ \cos^2 c & \sin^2 c & \cos 2c \end{vmatrix} = 0.$$
Solution. Subtracting the second column from the first one and using the identity $\cos^2 x - \sin^2 x = \cos 2x$, we have
$$\begin{vmatrix} \cos^2 a & \sin^2 a & \cos 2a \\ \cos^2 b & \sin^2 b & \cos 2b \\ \cos^2 c & \sin^2 c & \cos 2c \end{vmatrix} = \begin{vmatrix} \cos^2 a - \sin^2 a & \sin^2 a & \cos 2a \\ \cos^2 b - \sin^2 b & \sin^2 b & \cos 2b \\ \cos^2 c - \sin^2 c & \sin^2 c & \cos 2c \end{vmatrix} = \begin{vmatrix} \cos 2a & \sin^2 a & \cos 2a \\ \cos 2b & \sin^2 b & \cos 2b \\ \cos 2c & \sin^2 c & \cos 2c \end{vmatrix} = 0,$$
the last determinant having two equal columns. The result follows.
Problem 7.56. Let $A \in M_n(\mathbb{R})$.
a) Show that if $n^2 - n + 1$ entries of $A$ are equal to 0, then $\det(A) = 0$.
b) Show that one can choose $A$ such that $\det A \neq 0$ and $A$ has $n^2 - n + 1$ equal entries.
c) Show that if $n^2 - n + 2$ entries of $A$ are equal, then $\det(A) = 0$.
Solution. a) We claim that the matrix $A$ has a column consisting entirely of zeros, which implies $\det A = 0$. Indeed, if each column of $A$ had at most $n - 1$ zeros, then $A$ would have at most $n(n - 1) = n^2 - n$ zero entries in total, contradicting the hypothesis.
b) Consider the matrix $A$ whose entries off the main diagonal are equal to 1 and whose diagonal entries are $1, 2, \dots, n$. Then $n^2 - n + 1$ entries are equal to 1, but $\det A \neq 0$. Indeed, subtracting the first row from each subsequent row yields an upper-triangular matrix with nonzero diagonal entries, which is thus invertible.
c) If $n^2 - n + 2$ entries of $A$ are equal (say to some number $a$), then there are at most $n - 2$ entries of $A$ that are not equal to $a$. Thus at most $n - 2$ columns of $A$ contain an entry which is not equal to $a$. Said differently, at least 2 columns of $A$ have all entries equal to $a$. But then $A$ has two equal columns and $\det(A) = 0$.
Problem 7.57 (The Vandermonde Determinant). Let $a_1, a_2, \dots, a_n$ be complex numbers. Prove that the determinant of the matrix $A = [a_i^{j-1}]_{1 \le i, j \le n}$ (where if necessary we interpret $0^0$ as being 1) equals
$$\det(A) = \prod_{1 \le i < j \le n} (a_j - a_i).$$
Solution. Starting with the rightmost column and working left, subtract $a_1$ times each column from the column to its right. This gives
$$\det A = \begin{vmatrix} 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & a_2-a_1 & (a_2-a_1)a_2 & \ldots & (a_2-a_1)a_2^{n-3} & (a_2-a_1)a_2^{n-2} \\ 1 & a_3-a_1 & (a_3-a_1)a_3 & \ldots & (a_3-a_1)a_3^{n-3} & (a_3-a_1)a_3^{n-2} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & a_n-a_1 & (a_n-a_1)a_n & \ldots & (a_n-a_1)a_n^{n-3} & (a_n-a_1)a_n^{n-2} \end{vmatrix}$$
$$= \begin{vmatrix} a_2-a_1 & (a_2-a_1)a_2 & \ldots & (a_2-a_1)a_2^{n-3} & (a_2-a_1)a_2^{n-2} \\ a_3-a_1 & (a_3-a_1)a_3 & \ldots & (a_3-a_1)a_3^{n-3} & (a_3-a_1)a_3^{n-2} \\ \vdots & \vdots & & \vdots & \vdots \\ a_n-a_1 & (a_n-a_1)a_n & \ldots & (a_n-a_1)a_n^{n-3} & (a_n-a_1)a_n^{n-2} \end{vmatrix},$$
the second equality coming from the expansion with respect to the first row.
Factoring out $a_k - a_1$ from row $k$ gives
$$\det A = \prod_{j=2}^n (a_j - a_1)\begin{vmatrix} 1 & a_2 & \ldots & a_2^{n-3} & a_2^{n-2} \\ 1 & a_3 & \ldots & a_3^{n-3} & a_3^{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & a_n & \ldots & a_n^{n-3} & a_n^{n-2} \end{vmatrix}.$$
This last determinant is of the same form as the original one, but of a smaller size, hence we are done by an easy induction on $n$.
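Here is a quick numerical confirmation of the Vandermonde formula (an aside, not from the text, assuming `numpy`):

```python
import numpy as np
from itertools import combinations

# det [a_i^(j-1)] should equal prod_{i<j} (a_j - a_i).
a = np.array([2.0, -1.0, 0.5, 3.0, 1.5])
V = np.vander(a, increasing=True)        # columns: 1, a_i, a_i^2, ...
expected = np.prod([a[j] - a[i] for i, j in combinations(range(len(a)), 2)])
assert np.isclose(np.linalg.det(V), expected)
```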
Problem 7.58 (The Cauchy Determinant). Let $a_1,\ldots,a_n$ and $b_1,\ldots,b_n$ be complex numbers such that $a_i + b_j \neq 0$ for $1 \le i, j \le n$. Prove that the determinant of the matrix $A = \left[\frac{1}{a_i+b_j}\right]$ equals
$$\det A = \frac{\prod_{1\le i<j\le n}(a_j-a_i)(b_j-b_i)}{\prod_{i,j=1}^n (a_i+b_j)}.$$
Solution. Subtracting the last column from each of the first $n-1$ columns and using the identity
$$\frac{1}{a_i+b_j} - \frac{1}{a_i+b_n} = \frac{b_n-b_j}{(a_i+b_j)(a_i+b_n)}$$
to factor a $b_n - b_j$ out of the $j$-th column and a $\frac{1}{a_i+b_n}$ out of the $i$-th row yields
$$\det A = \frac{(b_n-b_1)\cdots(b_n-b_{n-1})}{(a_1+b_n)\cdots(a_n+b_n)}\begin{vmatrix} \frac{1}{a_1+b_1} & \frac{1}{a_1+b_2} & \ldots & \frac{1}{a_1+b_{n-1}} & 1 \\ \frac{1}{a_2+b_1} & \frac{1}{a_2+b_2} & \ldots & \frac{1}{a_2+b_{n-1}} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \frac{1}{a_n+b_1} & \frac{1}{a_n+b_2} & \ldots & \frac{1}{a_n+b_{n-1}} & 1 \end{vmatrix}.$$
Similarly, subtracting the last row from each of the first $n-1$ rows in the matrix appearing in the last equality and pulling out common factors, we obtain
$$\det A = \frac{\prod_{i=1}^{n-1}(b_n-b_i)(a_n-a_i)}{\prod_{i=1}^{n}(a_i+b_n)\prod_{i=1}^{n-1}(a_n+b_i)}\begin{vmatrix} \frac{1}{a_1+b_1} & \frac{1}{a_1+b_2} & \ldots & \frac{1}{a_1+b_{n-1}} & 0 \\ \frac{1}{a_2+b_1} & \frac{1}{a_2+b_2} & \ldots & \frac{1}{a_2+b_{n-1}} & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \frac{1}{a_{n-1}+b_1} & \frac{1}{a_{n-1}+b_2} & \ldots & \frac{1}{a_{n-1}+b_{n-1}} & 0 \\ 1 & 1 & \ldots & 1 & 1 \end{vmatrix}$$
Hence
$$\det A = \frac{\prod_{i=1}^{n-1}(b_n-b_i)(a_n-a_i)}{\prod_{i=1}^{n}(a_i+b_n)\prod_{i=1}^{n-1}(a_n+b_i)}\begin{vmatrix} \frac{1}{a_1+b_1} & \frac{1}{a_1+b_2} & \ldots & \frac{1}{a_1+b_{n-1}} \\ \frac{1}{a_2+b_1} & \frac{1}{a_2+b_2} & \ldots & \frac{1}{a_2+b_{n-1}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{a_{n-1}+b_1} & \frac{1}{a_{n-1}+b_2} & \ldots & \frac{1}{a_{n-1}+b_{n-1}} \end{vmatrix},$$
which allows us to conclude by induction on n, the last determinant being of the
same form, but of smaller dimension.
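The Cauchy formula can be checked numerically as well (an aside, assuming `numpy`):

```python
import numpy as np
from itertools import combinations

# For A = [1/(a_i + b_j)], Problem 7.58 claims
# det A = prod_{i<j} (a_j - a_i)(b_j - b_i) / prod_{i,j} (a_i + b_j).
a = np.array([1.0, 2.0, 4.0, 7.0])
b = np.array([0.5, 3.0, 5.0, 6.0])
A = 1.0 / (a[:, None] + b[None, :])
num = np.prod([(a[j] - a[i]) * (b[j] - b[i])
               for i, j in combinations(range(len(a)), 2)])
den = np.prod(a[:, None] + b[None, :])
assert np.isclose(np.linalg.det(A), num / den)
```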
Another useful tool for computing determinants is the Laplace expansion. Consider a matrix $A \in M_n(F)$ with entries $a_{ij}$. The minor of $a_{ij}$ is the determinant $M_{ij}$ of the matrix obtained from $A$ by deleting row $i$ and column $j$. The cofactor of $a_{ij}$ is $C_{ij} = (-1)^{i+j}M_{ij}$.
Example 7.59. The minor of $a_{23}$ in
$$\begin{vmatrix} 2 & -1 & 0 \\ 1 & 2 & 3 \\ 4 & 0 & -2 \end{vmatrix}$$
is
$$M_{23} = \begin{vmatrix} 2 & -1 \\ 4 & 0 \end{vmatrix} = 4$$
and the cofactor of $a_{23}$ is $C_{23} = (-1)^{2+3}M_{23} = -4$.
The cofactors play a key role, thanks to the following theorem, which shows that the computation of a determinant of order $n$ may be reduced to the computation of $n$ determinants of order $n-1$. Moreover, if we first use properties 1)–11) of determinants to create some zeros in the $k$th row, then we only need the cofactors corresponding to the nonzero entries of that row; combining the two methods in this way can greatly reduce the volume of computation.
Theorem 7.60 (Laplace Expansion). Let $A = [a_{ij}] \in M_n(F)$ be a matrix and let $C_{ij}$ be the cofactor of $a_{ij}$.
a) (expansion with respect to column $j$) For each $j \in \{1, 2, \ldots, n\}$ we have
$$\det A = \sum_{i=1}^n a_{ij}C_{ij}.$$
b) (expansion with respect to row $i$) For each $i \in \{1, 2, \ldots, n\}$ we have
$$\det A = \sum_{j=1}^n a_{ij}C_{ij}.$$
Proof. We will prove only part a), the argument being similar for part b) (alternatively, this follows from a) using that the determinant of a matrix equals the determinant of its transpose). Fix $j \in \{1, 2, \ldots, n\}$, let $B = (e_1,\ldots,e_n)$ be the canonical basis of $F^n$ and let $C_1,\ldots,C_n \in F^n$ be the columns of $A$, so that $C_k = \sum_{i=1}^n a_{ik}e_i$ for all $k$. We deduce that
$$\det A = \det\nolimits_B(C_1,\ldots,C_n) = \det\nolimits_B\Big(C_1,\ldots,C_{j-1},\sum_{i=1}^n a_{ij}e_i, C_{j+1},\ldots,C_n\Big) = \sum_{i=1}^n a_{ij}\det\nolimits_B(C_1,\ldots,C_{j-1},e_i,C_{j+1},\ldots,C_n).$$
It remains to see that $X_{ij} := \det_B(C_1,\ldots,C_{j-1},e_i,C_{j+1},\ldots,C_n)$ equals $C_{ij}$, the cofactor of $a_{ij}$. By a series of $n-j$ column interchanges, we can put the $j$th column of the determinant $X_{ij}$ in the last position, and by a sequence of $n-i$ row interchanges we can put the $i$th row in the last position. We end up with
$$X_{ij} = (-1)^{n-i+n-j}\begin{vmatrix} a_{11} & \ldots & a_{1,j-1} & a_{1,j+1} & \ldots & a_{1n} & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & \ldots & a_{n,j-1} & a_{n,j+1} & \ldots & a_{nn} & 0 \\ a_{i1} & \ldots & a_{i,j-1} & a_{i,j+1} & \ldots & a_{i,n} & 1 \end{vmatrix}.$$
The last determinant is precisely the minor $M_{ij}$, thanks to Theorem 7.43. The result follows, since $(-1)^{n-i+n-j} = (-1)^{i+j}$, so that $X_{ij} = (-1)^{i+j}M_{ij} = C_{ij}$.
Example 7.61. Expanding with respect to the first row, we obtain
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$
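For readers who like to experiment, the expansion of Theorem 7.60 translates directly into a short (and deliberately naive) recursive program. The following Python sketch is an aside, not part of the original text:

```python
def det_laplace(A):
    """Determinant by Laplace expansion along the first row.
    Runs in factorial time -- for illustration only."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(minor)
    return total

# the 3x3 pattern of Example 7.61 on a concrete matrix:
print(det_laplace([[2, -1, 0], [1, 2, 3], [4, 0, -2]]))   # -22
```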
Problem 7.62. Let
$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & -3 \\ 2 & 2 & -2 & 2 \\ 3 & -1 & -1 & 1 \end{bmatrix}.$$
Compute
(a) $\det(A)$. (b) $\det(A\,{}^tA)$. (c) $\det(A+A)$. (d) $\det(A^{-1})$.
Solution. (a) Subtracting the second row from the first and expanding with respect to the first row yields
$$\det A = \begin{vmatrix} 0 & 0 & 0 & 4 \\ 1 & 1 & 1 & -3 \\ 2 & 2 & -2 & 2 \\ 3 & -1 & -1 & 1 \end{vmatrix} = -4\begin{vmatrix} 1 & 1 & 1 \\ 2 & 2 & -2 \\ 3 & -1 & -1 \end{vmatrix}.$$
In the new determinant, subtract twice the first row from the second one, and add the first row to the last one. We obtain
$$\det A = -4\begin{vmatrix} 1 & 1 & 1 \\ 0 & 0 & -4 \\ 4 & 0 & 0 \end{vmatrix}.$$
Expanding with respect to the last row yields
$$\det A = -4 \cdot 4 \cdot (-4) = 64.$$
(b) Since the determinant map is multiplicative and $\det A = \det({}^tA)$, we obtain
$$\det(A\,{}^tA) = \det A \cdot \det({}^tA) = (\det A)^2 = 64^2 = 4096.$$
(c) We have
$$\det(A+A) = \det(2A) = 2^4\det(A) = 16 \cdot 64 = 1024.$$
(d) Finally,
$$\det(A^{-1}) = \frac{1}{\det(A)} = \frac{1}{64}.$$
Problem 7.63. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a matrix with nonnegative entries such that the sum of the entries in each row does not exceed 1. Prove that $|\det A| \le 1$.

Solution. We will prove the result by induction on $n$, the case $n = 1$ being clear. Assume that the result holds for $n-1$ and let $A$ be a matrix as in the statement of the problem. For $1 \le i \le n$ let $A_i$ be the matrix obtained by deleting the first row and column $i$ from $A$. Then
$$\det A = \sum_{i=1}^n (-1)^{i+1}a_{1i}\det A_i,$$
and by the inductive hypothesis applied to each $A_i$ we have $|\det A_i| \le 1$. We deduce that
$$|\det A| \le \sum_{i=1}^n a_{1i}|\det A_i| \le \sum_{i=1}^n a_{1i} \le 1$$
and the result follows.
We have already seen that a matrix $A \in M_n(F)$ is invertible if and only if $\det A \neq 0$. It turns out that we can actually compute the inverse of the matrix $A$ by computing certain determinants. Before doing that, let us introduce a fundamental definition:

Definition 7.64. Let $A \in M_n(F)$ be a square matrix with entries in $F$. The adjugate matrix $\mathrm{adj}(A)$ is the matrix whose $(i,j)$-entry is the cofactor $C_{ji}$ of $a_{ji}$. Thus $\mathrm{adj}(A)$ is the transpose of the matrix whose $(i,j)$-entry is the cofactor $C_{ij}$ of $a_{ij}$.
We have the fundamental result:
Theorem 7.65. If $A \in M_n(F)$ has nonzero determinant, then
$$A^{-1} = \frac{1}{\det A}\,\mathrm{adj}(A).$$

Proof. It suffices to prove that $A \cdot \mathrm{adj}(A) = \det A \cdot I_n$. Using the multiplication rule, this comes down to checking that
$$\sum_{j=1}^n C_{k,j}a_{i,j} = \det A \cdot \delta_{ik}$$
for all $1 \le i, k \le n$, where $\delta_{ik}$ equals 1 if $i = k$ and 0 otherwise.

If $k = i$, this follows by Laplace expansion of $\det A$ with respect to the $i$th row, so suppose that $k \neq i$ and consider the matrix $A'$ obtained from $A$ by replacing its $k$th row with a copy of the $i$th row, so that rows $i$ and $k$ in $A'$ coincide, forcing $\det A' = 0$. Using the Laplace expansion in $A'$ with respect to the $k$th row and taking into account that the cofactors involved in the expression do not change when going from $A$ to $A'$ (as only the $k$th row of $A$ is modified), we obtain
$$0 = \det A' = \sum_{j=1}^n a_{ij}C_{k,j}$$
and the result follows.
The previous theorem does not give a practical way of computing the inverse of a matrix (this involves computing too many determinants), but it is very important from a theoretical point of view: for instance, it says that the entries of $A^{-1}$ are rational functions of the entries of $A$ (in particular they are continuous functions of the entries of $A$ if $A$ has real or complex coefficients). The practical way of computing the inverse of a matrix has already been presented in the chapter concerning linear systems and operations on matrices, so we will not repeat the discussion here.
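As an aside (not part of the original text), Definition 7.64 and Theorem 7.65 translate into a few lines of Python, assuming `numpy`:

```python
import numpy as np

def adjugate(A):
    """adj(A): the (i, j) entry is the cofactor C_ji of a_ji (Definition 7.64)."""
    n = A.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 1.0], [5.0, 3.0]])
# Theorem 7.65: A^{-1} = adj(A) / det A
assert np.allclose(np.linalg.inv(A), adjugate(A) / np.linalg.det(A))
```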
7.4.1 Problems for Practice
1. Let $x$ be a real number. Compute in two different ways the determinant
$$\begin{vmatrix} x & 1 & 1 \\ 1 & x & 1 \\ 1 & 1 & x \end{vmatrix}.$$
2. Let $a, b, c$ be real numbers. Compute the determinant
$$\begin{vmatrix} a-b-c & 2a & 2a \\ 2b & b-c-a & 2b \\ 2c & 2c & c-a-b \end{vmatrix}.$$
3. Let $x$ be a real number. Compute the determinant
$$\begin{vmatrix} \cos x & 0 & \sin x \\ 0 & 1 & 0 \\ -\sin x & 0 & \cos x \end{vmatrix}.$$
4. Let $a, b, c$ be real numbers. Compute the determinant
$$\begin{vmatrix} a+1 & b+1 & c+1 \\ b+c & a+c & a+b \\ 1 & 1 & 1 \end{vmatrix}.$$
5. Let $a, b, c$ be real numbers. Find a necessary and sufficient condition for the vanishing of the following determinant:
$$\begin{vmatrix} (a+b)^2 & a^2 & b^2 \\ a^2 & (a+c)^2 & c^2 \\ b^2 & c^2 & (b+c)^2 \end{vmatrix}.$$
6. Let $x, y, z$ be real numbers. By considering the matrices $\begin{bmatrix} 0 & y & z \\ z & x & 0 \\ y & 0 & x \end{bmatrix}$ and $\begin{bmatrix} 0 & z & y \\ y & x & 0 \\ z & 0 & x \end{bmatrix}$, compute the determinant
$$\begin{vmatrix} y^2+z^2 & xy & zx \\ xy & z^2+x^2 & yz \\ xz & yz & x^2+y^2 \end{vmatrix}.$$
7. Let $x, y, z$ be real numbers. Compute
$$\begin{vmatrix} 1 & \cos x & \sin x \\ 1 & \cos(x+y) & \sin(x+y) \\ 1 & \cos(x+z) & \sin(x+z) \end{vmatrix}.$$
8. Compute $\det(A)$, where $A$ is the $n \times n$ matrix with entries 1 on and above the main diagonal and $-1$ below it:
$$A = \begin{bmatrix} 1 & 1 & 1 & \ldots & 1 \\ -1 & 1 & 1 & \ldots & 1 \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ -1 & -1 & -1 & \ldots & 1 \end{bmatrix}.$$
9. Let $a, b, c, d$ be real numbers and consider the matrices
$$A = \begin{bmatrix} a & b & c & d \\ b & a & d & c \\ c & d & a & b \\ d & c & b & a \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}.$$
a) Compute $\det B$.
b) By considering the matrix $AB$, compute $\det A$.
10. Let $a$ be a real number. Prove that for $n \ge 3$ we have $D_n = aD_{n-1} - D_{n-2}$, where
$$D_n = \begin{vmatrix} a & 1 & 0 & 0 & \ldots & \ldots & \ldots & 0 \\ 1 & a & 1 & 0 & \ldots & \ldots & \ldots & 0 \\ 0 & 1 & a & 1 & \ldots & \ldots & \ldots & 0 \\ 0 & 0 & 1 & a & \ldots & \ldots & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & a & 1 \\ 0 & 0 & 0 & 0 & \ldots & 0 & 1 & a \end{vmatrix}$$
is the determinant of an $n \times n$ matrix.
11. Let $a, b, c \in \mathbb{R}$ and $x = a^2+b^2+c^2$, $y = ab+bc+ca$. Prove that
$$\begin{vmatrix} 1 & x & x & x \\ 1 & x+y & x+y & 2x \\ 1 & 2x & x+y & x+y \\ 1 & x+y & 2x & x+y \end{vmatrix} = (a^3+b^3+c^3-3abc)^2.$$
12. Compute $\det A$ in each of the following cases:
(a) $a_{i,j} = \min(i,j)$, for $i, j = 1,\ldots,n$.
(b) $a_{i,j} = \max(i,j)$, for $i, j = 1,\ldots,n$.
(c) $a_{i,j} = |i-j|$, for $i, j = 1,\ldots,n$.
13. Let $n \ge 2$ and let $A = [a_{ij}] \in M_n(\mathbb{R})$ be the matrix defined by
$$a_{ij} = \begin{cases} 0, & \text{if } i = j \\ 1, & \text{if } i \neq j. \end{cases}$$
a) Compute $\det A$.
b) Prove that
$$A^{-1} = \frac{2-n}{n-1}I_n + \frac{1}{n-1}A.$$
14. Let $n \ge 3$ and let $A$ be the $n \times n$ matrix whose $(i,j)$-entry is $a_{ij} = \cos\frac{2(i+j)\pi}{n}$ for $i, j \in [1,n]$. Find $\det(I_n + A)$.
15. Let $A$ be a matrix of order 3.
(a) If all the entries of $A$ are 1 or $-1$, show that $\det(A)$ must be an even integer and determine the largest possible value of $\det(A)$.
(b) If all the entries of $A$ are 1 or 0, determine the largest possible value of $\det(A)$.
16. Let $n > 2$ and let $x_1,\ldots,x_n$ be real numbers. Compute the determinant of the matrix whose entries are $\sin(x_i + x_j)$ for $1 \le i, j \le n$.
17. Let $A \in M_n(\mathbb{R})$ be the matrix whose $(i,j)$-entry is $a_{ij} = \frac{1}{i+j}$. Prove that
$$\det A = \frac{(1!\,2!\cdots n!)^4}{(n!)^2 \cdot 1!\,2!\cdots(2n)!}.$$
Hint: use the Cauchy determinant.
18. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$. Compute the determinant of the linear transformation $T$ sending $P \in V$ to $P + P'$.
19. Prove that any matrix $A \in M_n(\mathbb{R})$ with determinant 1 is a product of matrices of the form $I_n + \lambda E_{ij}$, with $i \neq j \in [1,n]$ and $\lambda \in \mathbb{R}$. Hint: use elementary operations on rows and columns.
20. Let $A$ be an invertible matrix with integer entries. Prove that $A^{-1}$ has integer entries if and only if $\det A \in \{-1, 1\}$.
21. Let $A, B \in M_n(\mathbb{R})$ be matrices with integer entries such that $A, A+B, \ldots, A+2nB$ are invertible and their inverses have integer entries. Prove that $A + (2n+1)B$ has the same property. Hint: prove first the existence of a polynomial $P$ with integer coefficients such that $P(x) = \det(A + xB)$ for all $x \in \mathbb{R}$.
22. Let $A, B \in M_n(\mathbb{C})$ be matrices which commute. We want to prove that the adjugate matrices $\mathrm{adj}(A)$ and $\mathrm{adj}(B)$ also commute.
a) Prove the desired result when $A$ and $B$ are invertible.
b) By considering the matrices $A + \frac{1}{k}I_n$ and $B + \frac{1}{k}I_n$ for $k \to \infty$, prove the desired result in all cases.
23. Let $A \in M_n(\mathbb{C})$, with $n \ge 2$. Prove that
$$\det(\mathrm{adj}(A)) = (\det A)^{n-1}.$$
Hint: start by proving the result when $A$ is invertible, then in order to prove the general case consider the matrices $A + \frac{1}{k}I_n$ for $k \to \infty$.
24. (Dodgson condensation) Consider a $3 \times 3$ matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$
View this matrix as being composed of four $2 \times 2$ matrices (overlapping at $a_{22}$). Form a $2 \times 2$ matrix $\begin{bmatrix} NW & NE \\ SW & SE \end{bmatrix}$ by taking the determinants of these four $2 \times 2$ matrices. Show that
$$\begin{vmatrix} NW & NE \\ SW & SE \end{vmatrix} = a_{22}\det(A).$$
25. (Dodgson condensation, continued) Choose your favorite $4 \times 4$ matrix $A = [a_{ij}]$ with $a_{22}, a_{23}, a_{32}, a_{33}$, and $a_{22}a_{33} - a_{23}a_{32}$ all nonzero.
a) Compute $\det(A)$.
b) View $A$ as being composed of a $3 \times 3$ array of overlapping $2 \times 2$ matrices and compute the determinants of these 9 matrices. Write them in a $3 \times 3$ matrix $B$. View this $3 \times 3$ matrix as being composed of four overlapping $2 \times 2$ matrices. For each compute the determinant and divide by the entry of $A$ the four had in common. Write the results in a $2 \times 2$ matrix $C$. Take the determinant of $C$ and divide it by the central entry of $B$ (the one common to the four determinants that make up $C$). Compare your result to the result of part (a).
Remark. This method of computing a determinant, due to Charles Dodgson, a.k.a. Lewis Carroll, extends to higher dimensions. It is best visualized if you imagine filling a pyramidal array of numbers. We start with an $(n+1) \times (n+1)$ array of all ones and we lay the $n \times n$ matrix whose determinant we want in the layer above, with each entry of $A$ sitting between four of the ones. At each stage, we fill the next layer by computing the determinant of the $2 \times 2$ matrix formed by the four touching cells in the layer below and dividing by the entry two layers down, directly below the cell. (Thus at the first stage there is a division by 1 that we neglected to mention above.) When you are done, the entry at the top of the pyramid will be the determinant. (There is a slight complication here. Following this procedure naively might result in dividing by zero. This is fixable, but makes the algorithm less pretty.)
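The pyramidal scheme described in this remark is easy to prototype. The following Python sketch is an aside (assuming `numpy`, and assuming no interior zeros appear during the condensation):

```python
import numpy as np

def dodgson_det(M):
    """Dodgson condensation; fails if an interior entry of a layer is zero."""
    A = np.array(M, dtype=float)
    B = np.ones((A.shape[0] + 1, A.shape[0] + 1))   # layer of ones below A
    while A.shape[0] > 1:
        n = A.shape[0]
        # 2x2 connected minors of A, divided by the interior of the layer below
        C = (A[:-1, :-1] * A[1:, 1:] - A[:-1, 1:] * A[1:, :-1]) / B[1:n, 1:n]
        B, A = A, C
    return A[0, 0]

M = [[1, 2, 0], [3, 1, 4], [2, 2, 5]]
assert np.isclose(dodgson_det(M), np.linalg.det(np.array(M)))
```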
7.5 The Vandermonde Determinant
If $A \in M_n(F)$, by definition
$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}$$
is a polynomial expression in the entries of the matrix $A$.
is a polynomial expression in the entries of the matrix A. This suggests using
properties of polynomials (such as degree, finiteness of the number of roots. . . ) for
studying determinants. This is a very fruitful idea and we will sketch in this section
how it works.
We start with an absolutely fundamental computation, that of Vandermonde
determinants. These determinants play a crucial role in almost all aspects of
mathematics. We have already given a proof of the next theorem in Problem 7.57.
Here we give a different proof.
Theorem 7.66. Let $F$ be a field and let $x_1,\ldots,x_n \in F$. Then
$$\begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-1} \\ 1 & x_2 & \ldots & x_2^{n-1} \\ \ldots & \ldots & \ldots & \ldots \\ 1 & x_n & \ldots & x_n^{n-1} \end{vmatrix} = \prod_{1\le i<j\le n}(x_j - x_i).$$
Proof. We will prove the statement by induction on $n$, the cases $n = 1$ and $n = 2$ being left to the reader. Assume that the result holds for $n-1$ (and for any choice of $x_1,\ldots,x_{n-1}$) and let $x_1,\ldots,x_n \in F$. If two of these elements are equal, the result is clear: the determinant we want to compute has two equal columns, so must vanish. So assume that $x_1,\ldots,x_n$ are pairwise distinct and consider the polynomial
$$P(X) = \begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-1} \\ 1 & x_2 & \ldots & x_2^{n-1} \\ \ldots & \ldots & \ldots & \ldots \\ 1 & x_{n-1} & \ldots & x_{n-1}^{n-1} \\ 1 & X & \ldots & X^{n-1} \end{vmatrix}.$$
Expanding with respect to the last row, we see that
$$P(X) = X^{n-1}\begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-2} \\ 1 & x_2 & \ldots & x_2^{n-2} \\ \ldots & \ldots & \ldots & \ldots \\ 1 & x_{n-1} & \ldots & x_{n-1}^{n-2} \end{vmatrix} + a_{n-2}X^{n-2} + \ldots$$
for some $a_{n-2},\ldots,a_0 \in F$. Thus by the inductive hypothesis the leading coefficient of $P$ is $\prod_{1\le i<j\le n-1}(x_j - x_i) \neq 0$.
Let $i \in [1, n-1]$. Taking $X = x_i$, we obtain a determinant with two equal rows, which must therefore vanish. It follows that $P(x_1) = \ldots = P(x_{n-1}) = 0$. Since $P$ has degree $n-1$, leading coefficient $\prod_{1\le i<j\le n-1}(x_j - x_i)$ and vanishes at $n-1$ distinct points $x_1,\ldots,x_{n-1}$, we deduce that
$$P(X) = \prod_{1\le i<j\le n-1}(x_j - x_i)\prod_{i=1}^{n-1}(X - x_i).$$
Plugging in $X = x_n$ yields the desired result.
Remark 7.67. We call the determinant $\begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-1} \\ 1 & x_2 & \ldots & x_2^{n-1} \\ \ldots & \ldots & \ldots & \ldots \\ 1 & x_n & \ldots & x_n^{n-1} \end{vmatrix}$ the Vandermonde determinant associated with $x_1,\ldots,x_n$. It follows from the previous theorem that the Vandermonde determinant associated with $x_1,\ldots,x_n$ is nonzero if and only if $x_1,\ldots,x_n$ are pairwise distinct. Vandermonde determinants are ubiquitous in mathematics and are closely related to the following fundamental problem: "for distinct complex numbers $x_1,\ldots,x_n$ and arbitrary complex numbers $b_1,\ldots,b_n$, find a polynomial $P(X)$ of degree at most $n-1$ such that $P(x_i) = b_i$." Written out as a linear system for the coefficients $a_i$ of $P$, this yields the equation $Va = b$, where $b$ is the column vector whose coordinates are the $b_i$'s and $V$ is the Vandermonde matrix associated with $x_1,\ldots,x_n$ (thus $\det V$ is the Vandermonde determinant associated with $x_1,\ldots,x_n$). The fact that the Vandermonde determinant is nonzero is equivalent to this problem having a unique solution. The unique solution of this problem (known as Lagrange interpolation) is given by
$$P(X) = \sum_{i=1}^n b_i \prod_{j \neq i}\frac{X - x_j}{x_i - x_j}.$$
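In computational terms, the interpolation problem of this remark is just the linear system $Va = b$; a minimal sketch in Python (an aside, assuming `numpy`):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0])    # pairwise distinct nodes
b = np.array([1.0, 3.0, -2.0, 5.0])   # prescribed values P(x_i)
V = np.vander(x, increasing=True)     # Vandermonde matrix: V[i, j] = x_i**j
a = np.linalg.solve(V, b)             # coefficients of P, of degree <= n-1
assert np.allclose(np.polyval(a[::-1], x), b)
```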
Problem 7.68. Let $a, b, c$ be nonzero real numbers. Prove that
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = (a^2-bc)(b^2-ca)(c^2-ab).$$
Solution. Dividing all entries of the first column by $a^2$, all entries of the second column by $b^2$, and all entries of the third column by $c^2$, we obtain
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = (abc)^2\begin{vmatrix} 1 & 1 & 1 \\ \left(\frac{c}{a}\right)^2 & \left(\frac{a}{b}\right)^2 & \left(\frac{b}{c}\right)^2 \\ \frac{c}{a} & \frac{a}{b} & \frac{b}{c} \end{vmatrix} = -(abc)^2\begin{vmatrix} 1 & 1 & 1 \\ \frac{c}{a} & \frac{a}{b} & \frac{b}{c} \\ \left(\frac{c}{a}\right)^2 & \left(\frac{a}{b}\right)^2 & \left(\frac{b}{c}\right)^2 \end{vmatrix},$$
the sign change coming from interchanging the last two rows. We recognize (the transpose of) a Vandermonde determinant associated with $\frac{c}{a}, \frac{a}{b}, \frac{b}{c}$, thus we can further write
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = -(abc)^2\left(\frac{a}{b}-\frac{c}{a}\right)\left(\frac{b}{c}-\frac{c}{a}\right)\left(\frac{b}{c}-\frac{a}{b}\right).$$
We have
$$\frac{b}{c}-\frac{c}{a} = -\frac{c^2-ab}{ac}$$
and the similar identities $\frac{a}{b}-\frac{c}{a} = \frac{a^2-bc}{ab}$ and $\frac{b}{c}-\frac{a}{b} = \frac{b^2-ca}{bc}$ obtained by permuting $a, b, c$. We conclude that
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = (a^2-bc)(b^2-ca)(c^2-ab).$$
Problem 7.69. Let $F$ be a field and let $x_1,\ldots,x_n \in F$. Compute
$$\begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-2} & x_1^n \\ 1 & x_2 & \ldots & x_2^{n-2} & x_2^n \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ 1 & x_n & \ldots & x_n^{n-2} & x_n^n \end{vmatrix}.$$
Solution. Write
$$P(X) = (X-x_1)\cdots(X-x_n) = X^n + a_{n-1}X^{n-1} + \ldots + a_1X + a_0$$
for some scalars $a_0,\ldots,a_{n-1} \in F$, with $a_{n-1} = -(x_1 + \ldots + x_n)$. Next, add to the last column the first column multiplied by $a_0$, the second column multiplied by $a_1$, \ldots, the $(n-1)$th column multiplied by $a_{n-2}$. The value of the determinant does not change, and since
$$x_i^n + a_{n-2}x_i^{n-2} + \ldots + a_0 = -a_{n-1}x_i^{n-1},$$
we deduce that
$$\begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-2} & x_1^n \\ 1 & x_2 & \ldots & x_2^{n-2} & x_2^n \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ 1 & x_n & \ldots & x_n^{n-2} & x_n^n \end{vmatrix} = -a_{n-1}\begin{vmatrix} 1 & x_1 & \ldots & x_1^{n-2} & x_1^{n-1} \\ 1 & x_2 & \ldots & x_2^{n-2} & x_2^{n-1} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ 1 & x_n & \ldots & x_n^{n-2} & x_n^{n-1} \end{vmatrix} = \Big(\sum_{i=1}^n x_i\Big)\prod_{1\le i<j\le n}(x_j - x_i),$$
the last equality being a consequence of Theorem 7.66.
Remark 7.70. An alternate solution is to remark that the desired determinant is, up to sign, the coefficient of $X^{n-1}$ in the Vandermonde determinant
$$\begin{vmatrix} 1 & x_1 & \ldots & x_1^n \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \ldots & x_n^n \\ 1 & X & \ldots & X^n \end{vmatrix} = \prod_{1\le i<j\le n}(x_j - x_i)\prod_{k=1}^n (X - x_k):$$
expanding along the last row shows that this coefficient equals the negative of the desired determinant.
Problem 7.71. Let $P_0,\ldots,P_{n-1}$ be monic polynomials with complex coefficients such that $\deg P_i = i$ for $0 \le i \le n-1$ (thus $P_0 = 1$). If $x_1,\ldots,x_n \in \mathbb{C}$, compute
$$\begin{vmatrix} P_0(x_1) & P_1(x_1) & \ldots & P_{n-1}(x_1) \\ P_0(x_2) & P_1(x_2) & \ldots & P_{n-1}(x_2) \\ \ldots & \ldots & \ldots & \ldots \\ P_0(x_n) & P_1(x_n) & \ldots & P_{n-1}(x_n) \end{vmatrix}.$$

Solution. Let $A$ be the matrix whose determinant we want to compute and let us write
$$P_i(X) = X^i + c_{i,i-1}X^{i-1} + \ldots + c_{i,0}$$
for some complex numbers $c_{ij}$. The matrix $A$ is then equal to
$$\begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \ldots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^{n-1} \end{bmatrix}\begin{bmatrix} 1 & c_{1,0} & c_{2,0} & \ldots & c_{n-1,0} \\ 0 & 1 & c_{2,1} & \ldots & c_{n-1,1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix}.$$
Since the second matrix is upper-triangular with diagonal entries equal to 1, its determinant equals 1. Using Theorem 7.66, we deduce that
$$\begin{vmatrix} P_0(x_1) & P_1(x_1) & \ldots & P_{n-1}(x_1) \\ P_0(x_2) & P_1(x_2) & \ldots & P_{n-1}(x_2) \\ \ldots & \ldots & \ldots & \ldots \\ P_0(x_n) & P_1(x_n) & \ldots & P_{n-1}(x_n) \end{vmatrix} = \prod_{1\le i<j\le n}(x_j - x_i).$$
Problem 7.72. For $0 < k < n$ compute $\det A$, where
$$A = \begin{bmatrix} 1^k & 2^k & 3^k & \ldots & (n+1)^k \\ 2^k & 3^k & 4^k & \ldots & (n+2)^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ (n+1)^k & (n+2)^k & (n+3)^k & \ldots & (2n+1)^k \end{bmatrix}.$$
Solution. Consider the matrix
$$A_x = \begin{bmatrix} 1^k & 2^k & 3^k & \ldots & n^k & (x+1)^k \\ 2^k & 3^k & 4^k & \ldots & (n+1)^k & (x+2)^k \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ (n+1)^k & (n+2)^k & (n+3)^k & \ldots & (2n)^k & (x+n+1)^k \end{bmatrix}$$
obtained from $A$ by modifying its last column. Then $p(x) = \det(A_x)$ is a polynomial in the variable $x$, whose degree is at most $k < n$. Indeed, expanding the determinant of $A_x$ with respect to the last column shows that $p(x)$ is a linear combination of $(x+1)^k,\ldots,(x+n+1)^k$, each of which has degree $k$.

Next, observe that $p$ vanishes at $0, 1, \ldots, n-1$, since when $x \in \{0, 1, \ldots, n-1\}$ the matrix $A_x$ has two equal columns, thus $\det(A_x) = 0$. Since $\deg p < n$ and $p$ has at least $n$ distinct roots, it follows that $p$ is the zero polynomial. In particular $p(n) = 0$, hence the determinant to be evaluated is 0.
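The conclusion of Problem 7.72 is easy to confirm numerically (an aside, assuming `numpy`):

```python
import numpy as np

# For 0 < k < n the (n+1) x (n+1) matrix with entries (i + j - 1)^k is singular.
n, k = 5, 3
idx = np.arange(1, n + 2)
A = (idx[:, None] + idx[None, :] - 1.0) ** k
assert np.linalg.matrix_rank(A) < n + 1   # rank deficient, so det A = 0
```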
7.5.1 Problems for Practice
1. Given real numbers $a, b, c$, compute the determinant
$$\begin{vmatrix} b+c & a+c & a+b \\ b^2+c^2 & a^2+c^2 & a^2+b^2 \\ b^3+c^3 & a^3+c^3 & a^3+b^3 \end{vmatrix}.$$
2. Let $a, b, c$ be real numbers. Compute
$$\begin{vmatrix} a+b & ab & a^2+b^2 \\ b+c & bc & b^2+c^2 \\ c+a & ca & c^2+a^2 \end{vmatrix}.$$
3. Let $z_1,\ldots,z_n$ be pairwise distinct complex numbers. Let $f_i : \mathbb{R} \to \mathbb{C}$ be the map $x \mapsto e^{z_ix}$. Prove that $f_1,\ldots,f_n$ are linearly independent over $\mathbb{C}$. Hint: if $\alpha_1f_1 + \ldots + \alpha_nf_n = 0$, take successive derivatives of this relation and evaluate at $x = 0$.
4. a) Prove that for any positive integer $n$ there is a polynomial $T_n$ of degree $n$ such that
$$T_n(\cos x) = \cos nx$$
for all real numbers $x$. This polynomial $T_n$ is called the $n$th Chebyshev polynomial. For instance, $T_1(X) = X$, $T_2(X) = 2X^2 - 1$.
b) Let $x_1,\ldots,x_n$ be real numbers. Using part a), compute the determinant
$$\begin{vmatrix} 1 & \cos(x_1) & \cos(2x_1) & \ldots & \cos((n-1)x_1) \\ 1 & \cos(x_2) & \cos(2x_2) & \ldots & \cos((n-1)x_2) \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ 1 & \cos(x_n) & \cos(2x_n) & \ldots & \cos((n-1)x_n) \end{vmatrix}.$$
5. Let $x_1,\ldots,x_n, y_1,\ldots,y_n$ be complex numbers and let $k \in [0, n-1]$. Compute the determinant of the $n \times n$ matrix whose $(i,j)$-entry is $(x_i + y_j)^k$. Hint: use the binomial formula and write the matrix as the product of two simpler matrices, of Vandermonde type.
6. Let $a_0, a_1,\ldots,a_{n-1}$ be complex numbers and consider the matrix
$$A = \begin{bmatrix} a_0 & a_1 & a_2 & \ldots & a_{n-1} \\ a_{n-1} & a_0 & a_1 & \ldots & a_{n-2} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ a_1 & a_2 & a_3 & \ldots & a_0 \end{bmatrix}$$
obtained by cyclic permutations of the first row.
a) Let $z = e^{\frac{2i\pi}{n}}$ and consider the matrix $B = [z^{(i-1)(j-1)}]_{1\le i,j\le n}$. Compute the matrix $AB$.
b) Deduce that
$$\det A = \prod_{k=1}^n\Big(\sum_{j=0}^{n-1} a_jz^{j(k-1)}\Big).$$
7. Consider the "curve" $C = \{(1, t, t^2,\ldots,t^{n-1}) \mid t \in \mathbb{C}\}$ in $\mathbb{C}^n$, where $n$ is a positive integer. Prove that any $n$ pairwise distinct points of $C$ form a basis of $\mathbb{C}^n$.
8. Using Vandermonde's determinant, prove that we cannot find finitely many maps $f_i, g_i : \mathbb{R} \to \mathbb{R}$ such that
$$e^{xy} = \sum_{i=1}^n f_i(x)g_i(y)$$
for all $x, y \in \mathbb{R}$.
9. Let $z_1, z_2,\ldots,z_n$ be complex numbers such that
$$z_1 + z_2 + \ldots + z_n = z_1^2 + z_2^2 + \ldots + z_n^2 = \ldots = z_1^n + z_2^n + \ldots + z_n^n = 0.$$
Prove that $z_1 = z_2 = \ldots = z_n = 0$.
10. Prove that there exists an infinite set of points
$$\ldots, P_{-3}, P_{-2}, P_{-1}, P_0, P_1, P_2, P_3, \ldots$$
in the plane with the following property: for any three distinct integers $a, b$, and $c$, the points $P_a$, $P_b$, and $P_c$ are collinear if and only if $a + b + c = 2014$. Hint: let $P_n$ be the point with coordinates $(x, x^3)$, where $x = n - \frac{2014}{3}$.
7.6 Linear Systems and Determinants
In this section we will use determinants to make a more refined study of linear systems. Before doing that, we will show that the computation of the rank of a matrix $A \in M_{m,n}(F)$ can be reduced to the computation of a certain number of determinants. This will be very important for applications to linear systems, but is not very useful in practice, since it is more practical to compute the rank of a matrix by computing its reduced echelon form (by definition of this form, the rank of $A$ is simply the number of pivots).

Let $A \in M_{m,n}(F)$ be a matrix. Recall that a sub-matrix of $A$ is a matrix obtained by deleting a certain number of rows and columns of $A$.
Theorem 7.73. Let $A \in M_{m,n}(F)$ be a matrix of rank $r$. Then
a) There is an invertible $r \times r$ sub-matrix of $A$.
b) For $k > r$, any $k \times k$ sub-matrix of $A$ is not invertible.
In other words, the rank of $A$ is the largest size of an invertible sub-matrix of $A$.
Proof. Let $A = [a_{ij}]$ and let $C_1,\ldots,C_n$ be the columns of $A$, so that
$$r = \dim \mathrm{Span}(C_1,\ldots,C_n).$$
Let $d$ be the largest size of an invertible sub-matrix of $A$. We will prove separately the inequalities $d \le r$ and $r \le d$.

We start by proving the inequality $d \le r$. Let $B$ be an invertible $d \times d$ sub-matrix of $A$. Permuting the rows and columns of $A$ (which does not change its rank), we may assume that $B$ consists of the first $d$ rows and columns of $A$. Then $C_1,\ldots,C_d$ are linearly independent (as any nontrivial linear relation between $C_1,\ldots,C_d$ would induce a nontrivial relation between the columns of $B$, contradicting the fact that $B$ is invertible). But then
$$r = \dim \mathrm{Span}(C_1,\ldots,C_n) \ge \dim \mathrm{Span}(C_1,\ldots,C_d) = d.$$
Let us prove now that $r \le d$. By definition of $r$, we know that we can find $r$ columns of $A$ which form a basis of the space generated by the columns of $A$. Let $B$ be the $m \times r$ matrix obtained by deleting all other columns of $A$ except for these $r$. Then $B$ has rank $r$. But then ${}^tB$ also has rank $r$ (because a matrix and its transpose have the same rank), thus the space generated by the rows of $B$ has dimension $r$. In particular, we can find $r$ rows of $B$ which are linearly independent. The sub-matrix obtained from $B$ by deleting all other rows except for these $r$ is an invertible $r \times r$ sub-matrix of $A$, thus $d \ge r$.
Problem 7.74. Let $v_1,\ldots,v_p \in F^n$ be vectors and let $A \in M_{n,p}(F)$ be the matrix whose columns are $v_1,\ldots,v_p$. Prove that $v_1,\ldots,v_p$ are linearly independent if and only if $A$ has a $p \times p$ invertible sub-matrix.

Solution. $v_1,\ldots,v_p$ are linearly independent if and only if they form a basis of $\mathrm{Span}(v_1,\ldots,v_p)$, or equivalently if $\dim \mathrm{Span}(v_1,\ldots,v_p) = p$. Finally, this is further equivalent to $\mathrm{rank}(A) = p$. The result then follows directly from the previous theorem.
Problem 7.75. Consider the vectors
$$v_1 = (1, x, 0, 1), \quad v_2 = (0, 1, 2, 1), \quad v_3 = (1, 1, 1, 1) \in \mathbb{R}^4.$$
Prove that for any choice of $x \in \mathbb{R}$ the vectors $v_1, v_2, v_3$ are linearly independent.
Solution. The matrix whose columns are $v_1, v_2, v_3$ is
$$A = \begin{bmatrix} 1 & 0 & 1 \\ x & 1 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$
By Problem 7.74, $v_1, v_2, v_3$ are linearly independent if and only if $A$ has a $3 \times 3$ invertible sub-matrix. Such a matrix is obtained by deleting one row of $A$. Deleting the second row yields a sub-matrix whose determinant is
$$\begin{vmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{vmatrix} = -1,$$
thus the corresponding sub-matrix is invertible and the result follows.
Thanks to the previous results, we can make a detailed study of linear systems. Consider the linear system
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n = b_2 \\ \ldots \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n = b_m \end{cases}$$
with $A = [a_{ij}] \in M_{m,n}(F)$, $b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \in F^m$ a given vector and unknowns $x_1,\ldots,x_n$. Let $X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ and let $C_1,\ldots,C_n$ be the columns of $A$. Then the system can be written as
$$AX = b \qquad \text{or} \qquad x_1C_1 + \ldots + x_nC_n = b.$$
The first fundamental theorem of linear systems is the following:
Theorem 7.76 (Rouché–Capelli). Consider the linear system above and let $[A, b] \in M_{m,n+1}(F)$ be the matrix obtained by adding a rightmost column to $A$, equal to $b$. Then
a) The system is consistent⁴ if and only if $\mathrm{rank}(A) = \mathrm{rank}[A, b]$.
b) Assume that the system is consistent and let $X_0$ be a solution of it. Let $S_h$ be the set of solutions of the associated homogeneous system.⁵ Then the set of solutions of the original system is $\{X_0 + X \mid X \in S_h\}$ and $S_h$ is a vector space of dimension $n - \mathrm{rank}(A)$ over $F$.

Proof. a) The system being equivalent to $b = x_1C_1 + \ldots + x_nC_n$, it is consistent if and only if $b$ is a linear combination of $C_1,\ldots,C_n$, which is equivalent to $b \in \mathrm{Span}(C_1,\ldots,C_n)$. This is further equivalent to $\mathrm{Span}(C_1,\ldots,C_n) = \mathrm{Span}(C_1,\ldots,C_n,b)$ and finally to
$$\dim \mathrm{Span}(C_1,\ldots,C_n) = \dim \mathrm{Span}(C_1,\ldots,C_n,b).$$
By definition, the left-hand side equals $\mathrm{rank}(A)$ and the right-hand side equals $\mathrm{rank}[A,b]$. The result follows.
b) By Proposition 3.2 we know that the set of solutions of the system is $\{X_0 + X \mid X \in S_h\}$. It remains to prove that $S_h$ is of dimension $n - \mathrm{rank}(A)$. But the corresponding homogeneous system can be written $AX = 0$, thus its set of solutions is the kernel of the map $T$ sending $X \in F^n$ to $AX \in F^m$. By the rank-nullity theorem we deduce that
$$\dim S_h = n - \dim \mathrm{Im}(T) = n - \mathrm{rank}(A)$$
and the theorem is proved.
Let us take for simplicity $F = \mathbb{R}$ or $F = \mathbb{C}$ (the same argument will apply to any infinite field). It follows from the previous theorem that we have the following possibilities:
• the system has no solution. This happens precisely when $A$ and $[A, b]$ do not have the same rank.
• the system has exactly one solution, which happens if and only if it is consistent and $A$ has rank exactly $n$, or equivalently its columns are linearly independent.
• the system has more than one solution, and then it has infinitely many solutions. More precisely, the solutions depend on $n - \mathrm{rank}(A)$ parameters.
Here is an important consequence of the previous results:
Theorem 7.77. Let $A \in M_{m,n}(F)$ and let $F_1$ be a field containing $F$. Consider the linear system $AX = 0$. If it has a nontrivial solution in $F_1^n$, then it has a nontrivial solution in $F^n$.

Proof. Since the system has nontrivial solutions in $F_1^n$, $A$ has rank $r < n$ seen as an element of $M_{m,n}(F_1)$. But Theorem 7.73 shows that the rank of $A$ seen as an element of $M_{m,n}(F_1)$ or of $M_{m,n}(F)$ is the same, thus using again the previous discussion we deduce that the system has nontrivial solutions in $F^n$.
⁴Recall that this simply means that the system has at least one solution.
⁵This is the system $AX = 0$, i.e., it has the same unknowns, but $b$ is equal to 0.
To make a deeper study of the linear system $AX = b$, assume that it is consistent and that $A$ has rank $r$ (therefore $[A, b]$ also has rank $r$). By Theorem 7.73 the matrix $A$ has an invertible $r \times r$ sub-matrix. By permuting the equations and the unknowns of the system, we may assume that the sub-matrix consisting of the first $r$ rows and columns of $A$ is invertible. Then $x_1,\ldots,x_r$ are called the principal unknowns and the first $r$ equations of the system are called the principal equations. All other equations can be deduced from the first $r$, so separating the principal and non-principal unknowns yields the equivalent system
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1r}x_r = b_1 - a_{1,r+1}x_{r+1} - \ldots - a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2r}x_r = b_2 - a_{2,r+1}x_{r+1} - \ldots - a_{2n}x_n \\ \ldots \\ a_{r1}x_1 + a_{r2}x_2 + \ldots + a_{rr}x_r = b_r - a_{r,r+1}x_{r+1} - \ldots - a_{rn}x_n \end{cases}$$
This is a Cramer system, that is, the number of unknowns (which are $x_1,\ldots,x_r$) equals the number of equations and the matrix of the system (which is $[a_{ij}]_{1\le i,j\le r}$) is invertible. This kind of system has a unique solution, which can be expressed in terms of some determinants, as the following theorem shows:
Theorem 7.78. Let $A = [a_{ij}]_{1\le i,j\le n}$ be an invertible matrix in $M_n(F)$, let $b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \in F^n$ be a given vector and consider the system $AX = b$ with the unknowns $x_1,\ldots,x_n$. Then the system has a unique solution
$$X = A^{-1}b$$
and we have for all $i \in [1,n]$
$$x_i = \frac{\Delta_i}{\Delta},$$
where $\Delta = \det A$ and $\Delta_i$ is the determinant of the matrix obtained from $A$ by replacing the $i$th column with the vector $b$.

Proof. It is clear that the system $AX = b$ is equivalent to $X = A^{-1}b$ and so it has a unique solution. To prove the second part, let $e_1,\ldots,e_n$ be the canonical basis of $F^n$ and write $\det$ instead of $\det_{(e_1,\ldots,e_n)}$. If $C_1,\ldots,C_n$ are the columns of $A$, then by definition
$$\Delta_i = \det(C_1,\ldots,C_{i-1},b,C_{i+1},\ldots,C_n).$$
Since $AX = b$, we have $x_1C_1 + \ldots + x_nC_n = b$. Since $\det$ is multilinear and alternating, we obtain
$$\Delta_i = \det\Big(C_1,\ldots,C_{i-1},\sum_{j=1}^n x_jC_j,C_{i+1},\ldots,C_n\Big) = \sum_{j=1}^n x_j\det(C_1,\ldots,C_{i-1},C_j,C_{i+1},\ldots,C_n) = x_i\det A = x_i\Delta.$$
The result follows.
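Theorem 7.78 turns into a very short solver. The sketch below (an aside, assuming `numpy`) is illustrative only, since it evaluates $n+1$ determinants where Gaussian elimination would solve the system directly:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b via Cramer's formulae; assumes det A != 0."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d      # x_i = Delta_i / Delta
    return x

A = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]])
b = np.array([1.0, 1.0, 1.0])
assert np.allclose(cramer_solve(A, b), np.linalg.solve(A, b))
```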
Finally, we want to give another criterion for consistency. Recall that $A \in M_{m,n}(F)$ is the matrix of the system, that we assume $\mathrm{rank}(A) = r$, and (by permuting the unknowns and the equations) that the $r \times r$ sub-matrix of $A$ consisting of the first $r$ rows and columns of $A$ is invertible.

Theorem 7.79. Under the previous hypotheses, the system $AX = b \in F^m$ is consistent if and only if for all $k \in [r+1, m]$ we have
$$\Delta_k = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1r} & b_1 \\ a_{21} & a_{22} & \ldots & a_{2r} & b_2 \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ a_{r1} & a_{r2} & \ldots & a_{rr} & b_r \\ a_{k1} & a_{k2} & \ldots & a_{kr} & b_k \end{vmatrix} = 0.$$

Proof. If the system is consistent, then $b$ is a linear combination of $C_1,\ldots,C_n$, so the last column of the matrix defining $\Delta_k$ is a linear combination of the other columns, thus $\Delta_k = 0$ for all $k \in [r+1, m]$.

Conversely, assume that all determinants $\Delta_k$ are 0. Note that it makes sense to define $\Delta_k$ for $k \le r$ by the same formula, and it is clear that we still have $\Delta_k = 0$ for $k \le r$ (as the corresponding matrix has two equal rows). Expanding $\Delta_k$ with respect to its last row and denoting by $\Gamma_{r+1,1},\ldots,\Gamma_{r+1,r}$ the corresponding cofactors (which are independent of $k$) we obtain
$$\Gamma_{r+1,1}a_{k1} + \ldots + \Gamma_{r+1,r}a_{kr} + \det(a_{ij})_{1\le i,j\le r} \cdot b_k = 0$$
for all $k$ and so
$$\Gamma_{r+1,1}C_1 + \ldots + \Gamma_{r+1,r}C_r + \det(a_{ij})_{1\le i,j\le r} \cdot b = 0.$$
Since $\det(a_{ij})_{1\le i,j\le r} \neq 0$ by assumption, this shows that $b \in \mathrm{Span}(C_1,\ldots,C_r)$ and so $b$ is a linear combination of the columns of $A$, which means that the system is consistent.
Let us see a few examples of explicit resolutions of linear systems using the above ideas. Actually, for the first one the method of reduced echelon form is much more practical than the following approach. We strongly suggest that the reader compare the two methods.
Problem 7.80. a) Solve in real numbers the system
$$\begin{cases} x_1 + 3x_2 - 5x_3 = 4 \\ x_1 + 4x_2 - 8x_3 = 5 \\ -3x_1 - 7x_2 + 9x_3 = -6 \end{cases}$$
b) Solve the system
$$\begin{cases} x_1 + 3x_2 - 5x_3 = 4 \\ x_1 + 4x_2 - 8x_3 = 7 \\ -3x_1 - 7x_2 + 9x_3 = -6 \end{cases}$$

Solution. a) The matrix of the system is
$$A = \begin{bmatrix} 1 & 3 & -5 \\ 1 & 4 & -8 \\ -3 & -7 & 9 \end{bmatrix}$$
and one easily computes $\det A = 0$. Thus the system is not a Cramer system. Looking at the sub-matrix of $A$ consisting of the first two rows and columns, we see that it is invertible. It follows that $A$ has rank 2. The system is consistent if and only if
$$\mathrm{rank}\begin{bmatrix} 1 & 3 & -5 & 4 \\ 1 & 4 & -8 & 5 \\ -3 & -7 & 9 & -6 \end{bmatrix} = 2.$$
This means that all matrices obtained from this matrix by deleting one column have determinant 0. But one easily checks that the matrix obtained by deleting the second column is invertible, thus the system is not consistent and thus it has no solution.
b) The matrix of the system is the same. The system is consistent if and only if all matrices obtained by deleting one column from $\begin{bmatrix} 1 & 3 & -5 & 4 \\ 1 & 4 & -8 & 7 \\ -3 & -7 & 9 & -6 \end{bmatrix}$ have determinant 0. One easily checks that this is the case, thus the system will have infinitely many solutions. The principal unknowns are $x_1, x_2$ and the system is equivalent to
$$x_1 + 3x_2 = 5x_3 + 4, \qquad x_1 + 4x_2 = 8x_3 + 7,$$
and can be solved using Cramer's formulae or directly. One finds
$$x_2 = 3x_3 + 3, \qquad x_1 = -4x_3 - 5.$$
We conclude that the solutions of the system are given by $(-4t-5, 3t+3, t)$ with $t \in \mathbb{R}$.
Problem 7.81. Let $a, b, c$ be given real numbers. Solve the linear system
$$\begin{cases} (b+c)x + by + cz = 1 \\ ax + (c+a)y + cz = 1 \\ ax + by + (a+b)z = 1 \end{cases}$$

Solution. The matrix of the system is
$$A = \begin{bmatrix} b+c & b & c \\ a & a+c & c \\ a & b & a+b \end{bmatrix}$$
and a brutal computation left to the reader shows that
$$\det A = 4abc.$$
We consider therefore two cases.
If $abc \neq 0$ the system is a Cramer system with a unique solution given by Cramer's formulae
$$x = \frac{\begin{vmatrix} 1 & b & c \\ 1 & a+c & c \\ 1 & b & a+b \end{vmatrix}}{4abc}, \qquad y = \frac{\begin{vmatrix} b+c & 1 & c \\ a & 1 & c \\ a & 1 & a+b \end{vmatrix}}{4abc}, \qquad z = \frac{\begin{vmatrix} b+c & b & 1 \\ a & a+c & 1 \\ a & b & 1 \end{vmatrix}}{4abc}.$$
In order to compute $x$ explicitly, we subtract $b$ times the first column from the second one, and $c$ times the first column from the third one, ending up with
$$x = \frac{\begin{vmatrix} 1 & 0 & 0 \\ 1 & a+c-b & 0 \\ 1 & 0 & a+b-c \end{vmatrix}}{4abc} = \frac{(a+c-b)(a+b-c)}{4abc}$$
and we similarly obtain
$$y = \frac{(b+c-a)(b+a-c)}{4abc}, \qquad z = \frac{(a+c-b)(b+c-a)}{4abc}.$$
In the second case we have $abc = 0$, that is, $A$ is not invertible. The system is consistent if and only if
$$\mathrm{rank}(A) = \mathrm{rank}\begin{bmatrix} b+c & b & c & 1 \\ a & a+c & c & 1 \\ a & b & a+b & 1 \end{bmatrix}.$$
While one can follow the discussion given in this chapter, it is much easier to deal with this case as follows: say without loss of generality that $a = 0$. The system becomes
$$\begin{cases} (b+c)x + by + cz = 1 \\ c(y+z) = 1 \\ b(y+z) = 1 \end{cases}$$
It is clear from the second and third equations that if the system is consistent, then necessarily $b = c$ and $b$ is nonzero. So if $b \neq c$ or $bc = 0$, then the system has no solution in this case. Assume therefore that $b = c$ is nonzero. The system is equivalent to
$$b(2x + y + z) = 1, \qquad b(y+z) = 1,$$
making it clear that $x = 0$ and $y + z = \frac{1}{b}$. In this case the solutions of the system are given by $(0, y, \frac{1}{b} - y)$ with $y \in \mathbb{R}$.
Problem 7.82. Let $n$ be an integer greater than 1. Solve the linear system
$$\begin{cases} x_1 + x_2 + \ldots + x_n = 1 \\ x_1 + 2x_2 + \ldots + nx_n = 0 \\ \ldots \\ x_1 + 2^{n-1}x_2 + \ldots + n^{n-1}x_n = 0 \end{cases}$$
Solution. The matrix of the system is
$$A = \begin{bmatrix} 1 & 1 & 1 & \ldots & 1 & 1 \\ 1 & 2 & 3 & \ldots & n-1 & n \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 2^{n-1} & 3^{n-1} & \ldots & (n-1)^{n-1} & n^{n-1} \end{bmatrix}$$
and it is invertible as its determinant is a Vandermonde determinant.
Therefore the system is a Cramer system and we have
$$x_i = \frac{\begin{vmatrix} 1 & 1 & \ldots & 1 & 1 & 1 & \ldots & 1 \\ 1 & 2 & \ldots & i-1 & 0 & i+1 & \ldots & n \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 1 & 2^{n-1} & \ldots & (i-1)^{n-1} & 0 & (i+1)^{n-1} & \ldots & n^{n-1} \end{vmatrix}}{\det A}.$$
The numerator of the previous fraction is the Vandermonde determinant attached to $1, 2, \ldots, i-1, 0, i+1, \ldots, n$, while the denominator is the Vandermonde determinant attached to $1, 2, \ldots, i-1, i, i+1, \ldots, n$. Recalling that the Vandermonde determinant attached to $x_1,\ldots,x_n$ equals $\prod_{1\le i<j\le n}(x_j - x_i)$ and canceling similar factors in the numerator and denominator, we end up with
$$x_i = \frac{\prod_{j=1}^{i-1}(0-j)\prod_{k=i+1}^n k}{\prod_{j=1}^{i-1}(i-j)\prod_{k=i+1}^n (k-i)} = (-1)^{i-1}\frac{(i-1)!\,n!}{(i-1)!\,i!\,(n-i)!} = (-1)^{i-1}\frac{n!}{i!\,(n-i)!}$$
and so for $i \in [1,n]$
$$x_i = (-1)^{i-1}\binom{n}{i}.$$
Remark 7.83. An alternate solution goes as follows: write $P(T) = \sum_{k=1}^n x_kT^k$. Then the first equation reads $P(1) = 1$ and the rest read $P^{(k)}(1) = 0$ for $k = 1,\ldots,n-1$. Since we also have $P(0) = 0$ by construction, we see that the unique solution is $P(T) = 1 - (1-T)^n$, from which we read off the coefficients $x_k = (-1)^{k-1}\binom{n}{k}$.
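One can confirm the answer numerically (an aside, assuming `numpy`; `math.comb` requires Python 3.8+):

```python
import numpy as np
from math import comb

n = 6
j = np.arange(1, n + 1)
A = j[None, :] ** np.arange(n)[:, None]   # row k holds j**k for k = 0..n-1
b = np.zeros(n); b[0] = 1.0
x = np.array([(-1) ** (i - 1) * comb(n, i) for i in j])
assert np.allclose(A @ x, b)              # x_i = (-1)^(i-1) C(n, i) solves it
```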
Problem 7.84. Let $a_1,\ldots,a_n, b_1,\ldots,b_n$ be pairwise distinct complex numbers such that $a_i + b_j \neq 0$ for all $i, j \in [1,n]$. Find all complex numbers $x_1,\ldots,x_n$ such that for all $i \in [1,n]$ we have
$$\sum_{j=1}^n \frac{x_j}{a_i + b_j} = 1.$$

Solution. The determinant of the associated matrix is a Cauchy determinant and equals (by Problem 7.58)
$$\det A = \frac{\prod_{1\le i<j\le n}(a_j-a_i)(b_j-b_i)}{\prod_{i,j\in[1,n]}(a_i+b_j)} \neq 0.$$
Thus the system has at most one solution. One could in principle argue as in the previous problem to find this solution, but there is a much more elegant (and very useful) technique that we prefer to present. Consider the rational function
$$F(X) = \sum_{j=1}^n \frac{x_j}{X + b_j}.$$
The system is equivalent to $F(a_1) = \ldots = F(a_n) = 1$. Write
$$F(X) = \frac{P(X)}{Q(X)}, \qquad Q(X) = (X+b_1)\cdots(X+b_n)$$
for some polynomial $P$, and notice that $\deg P \le n-1$. The system becomes $P(a_j) = Q(a_j)$ for $1 \le j \le n$. Since $Q - P$ is a monic polynomial of degree $n$ vanishing at $a_1,\ldots,a_n$ and since $a_1,\ldots,a_n$ are pairwise distinct, we deduce that the system is equivalent to
$$Q(X) - P(X) = \prod_{k=1}^n (X - a_k).$$
The conclusion is that $x_1,\ldots,x_n$ is a solution of the system if and only if
$$\sum_{j=1}^n \frac{x_j}{X+b_j} = \frac{\prod_{k=1}^n (X+b_k) - \prod_{k=1}^n (X-a_k)}{\prod_{k=1}^n (X+b_k)}.$$
In order to find each $x_j$, we multiply the previous relation by $X + b_j$ and then make $X$ tend to $-b_j$. We deduce that
$$x_j = \lim_{X\to -b_j}\frac{\prod_{k=1}^n (X+b_k) - \prod_{k=1}^n (X-a_k)}{\prod_{k\neq j}(X+b_k)} = (-1)^{n-1}\frac{\prod_{k=1}^n (a_k+b_j)}{\prod_{k\neq j}(b_k-b_j)}.$$
It follows that the system has a unique solution, given by
$$x_j = (-1)^{n-1}\frac{\prod_{k=1}^n (a_k+b_j)}{\prod_{k\neq j}(b_k-b_j)}.$$
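Again a quick numerical check of the closed form (an aside, assuming `numpy`):

```python
import numpy as np

a = np.array([1.0, 2.0, 4.0, 7.0])
b = np.array([0.5, 3.0, 5.0, 6.5])
n = len(a)
x = np.array([(-1) ** (n - 1) * np.prod(a + b[j])
              / np.prod(np.delete(b, j) - b[j]) for j in range(n)])
A = 1.0 / (a[:, None] + b[None, :])
assert np.allclose(A @ x, np.ones(n))     # every equation evaluates to 1
```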
7.6.1 Problems for Practice
1. Let $A \in M_n(\mathbb{C})$ and let $B = \mathrm{adj}(A)$ be the adjugate matrix.
a) Prove that if $A$ is invertible, then so is $B$.
b) Prove that if $A$ has rank $n-1$, then $B$ has rank 1.
c) Prove that if $A$ has rank at most $n-2$, then $B = O_n$.
2. Using the previous result, find all matrices $A \in M_n(\mathbb{C})$ which are equal to their adjugate matrix. Hint: the case $n = 2$ is special.
3. Let $a, b$ be complex numbers and consider the matrix $A \in M_n(\mathbb{C})$ whose diagonal entries are all equal to $a$, and such that all other entries of $A$ are equal to $b$.
a) Compute $\det A$.
b) Find the rank of $A$ (you will need to distinguish several cases, according to the values of $a$ and $b$).
4. Find real numbers $a, b, c$ such that for all polynomials $P$ with real coefficients whose degree does not exceed 2 we have
$$\int_0^1 P(x)\,dx = aP(0) + bP\Big(\frac{1}{2}\Big) + cP(1).$$
5. Given real numbers $a, b, c, u, v, w$, solve the linear system
$$\begin{cases} ax - by = u \\ by - cz = v \\ cz - ax = w \end{cases}$$
6. Given a real number $a$, solve the linear system
$$\begin{cases} \frac{x}{1+a} + \frac{y}{1+2a} + \frac{z}{1+3a} = 1 \\ \frac{x}{2+a} + \frac{y}{2+2a} + \frac{z}{2+3a} = 1 \\ \frac{x}{3+a} + \frac{y}{3+2a} + \frac{z}{3+3a} = 1 \end{cases}$$
7. Let $S_a$ be the linear system
$$\begin{cases} x - 2y + z = 1 \\ 3x + 2y - 2z = 2 \\ 2x - y + az = 3 \end{cases}$$
a) Find all real numbers $a$ for which the system has no solution.
b) Find all real numbers $a$ for which the system has a unique solution.
c) Find all real numbers $a$ for which the system has infinitely many solutions.
8. Given real numbers $a, b, c, d$, solve the linear system
$$\begin{cases} x + y + z = 1 \\ ax + by + cz = d \\ a^2x + b^2y + c^2z = d^2 \end{cases}$$
9. Given real numbers $a, b, c, d, \alpha$, solve the linear system
$$\begin{cases} (1+\alpha)x + y + z + t = a \\ x + (1+\alpha)y + z + t = b \\ x + y + (1+\alpha)z + t = c \\ x + y + z + (1+\alpha)t = d \end{cases}$$
10. Find the necessary and sufficient condition satisfied by real numbers $a, b, c$ for the system
$$\begin{cases} x - a(y+z) = 0 \\ y - b(x+z) = 0 \\ z - c(x+y) = 0 \end{cases}$$
to have a nontrivial solution.
11. Let $a, b, c$ be pairwise distinct real numbers. Solve the system
$$\begin{cases} x + ay + a^2z = a^3 \\ x + by + b^2z = b^3 \\ x + cy + c^2z = c^3 \end{cases}$$
12. Let $a, b$ be complex numbers. Solve the linear system
$$\begin{cases} ax + y + z + t = 1 \\ x + ay + z + t = b \\ x + y + az + t = b^2 \\ x + y + z + at = b^3. \end{cases}$$
Chapter 8
Polynomial Expressions of Linear
Transformations and Matrices
Abstract From a theoretical point of view, this chapter is the heart of the book.
It uses essentially all results established before to prove a great deal of surprising
results concerning matrices. This chapter makes heavy use of basic properties of
polynomials which are used to study the eigenvalues and eigenvectors of matrices.
Keywords Minimal polynomial • Characteristic polynomial • Eigenvalue
• Eigenvectors
From a theoretical point of view, we reach now the heart of the book. In this chapter
we will use everything we have developed so far to study linear maps and matrices.
To each matrix (or linear transformation of a finite dimensional vector space) we
will associate two polynomials, the minimal and the characteristic polynomial. They
are not enough to characterize the matrix up to similarity, but they give lots of
nontrivial information about the matrix. We also associate a collection of scalars
called eigenvalues of the matrix (if the field of scalars is C, the eigenvalues are
simply the roots of the characteristic polynomial) and a collection of subspaces
indexed by the eigenvalues and called eigenspaces. An in-depth study of these
objects yields many deep theorems and properties of matrices.
In this chapter we will make heavy use of basic properties of polynomials. We
recalled them in the appendix concerning algebraic prerequisites, and we strongly
advise the reader to make sure that he is familiar with these properties before starting to read this chapter. We fix a field $F$ (the reader will not lose anything by assuming that $F$ is either $\mathbb{R}$ or $\mathbb{C}$).
8.1 Some Basic Constructions
Let $V$ be a vector space over a field $F$, and let $T : V \to V$ be a linear transformation. We define a sequence $(T^i)_{i\ge 0}$ of linear transformations of $V$ by the rule
$$T^0 = \mathrm{id}, \qquad T^{i+1} = T \circ T^i$$
for $i \ge 0$, where $\mathrm{id}$ denotes the identity map (sending every vector $v$ to $v$). In other words, $T^i$ is the $i$th iterate of $T$; for instance
$$T^3(v) = T(T(T(v)))$$
for all $v \in V$.

If $P = a_0 + a_1X + \ldots + a_nX^n \in F[X]$ we define a linear transformation $P(T)$ of $V$ by
$$P(T) := a_0T^0 + a_1T^1 + \ldots + a_nT^n.$$
The following result follows easily by unwinding definitions. We will use it constantly from now on, without further reference, thus the reader may want to take a break and check that he can actually prove it.
Proposition 8.1. If $P_1, P_2 \in F[X]$ and $T$ is a linear transformation of $V$, then
a) $P_1(T) + P_2(T) = (P_1 + P_2)(T)$.
b) $P_1(T) \circ P_2(T) = (P_1P_2)(T)$.

We warn the reader that we do not have $P(T_1) + P(T_2) = P(T_1 + T_2)$ and $P(T_1) \circ P(T_2) = P(T_1 \circ T_2)$ in general. For instance, take $P(X) = X^2$ and $T_1 = T_2 = \mathrm{id}$; then
$$P(T_1) + P(T_2) = 2\,\mathrm{id} \neq 4\,\mathrm{id} = (T_1 + T_2)^2.$$
We invite the reader to find a counterexample for the equality $P(T_1) \circ P(T_2) = P(T_1 \circ T_2)$.
Definition 8.2. The $F$-algebra generated by the linear transformation $T$ is the set
$$F[T] = \{P(T);\ P \in F[X]\}.$$

The following result follows directly from the previous proposition:

Proposition 8.3. For all $x, y \in F[T]$ and $c \in F$ we have $x + cy \in F[T]$ and $x \circ y \in F[T]$. Thus $F[T]$ is a subspace of the space of linear transformations on $V$, which is stable by composition of linear transformations.

Actually, the reader can easily check that $F[T]$ is the smallest subspace of the space of linear transformations on $V$ which contains $\mathrm{id}$ and $T$ and is closed under composition of linear transformations.
All previous constructions and results have analogues for matrices. Namely, if $A \in M_n(F)$ is a square matrix of order $n$ with coefficients in $F$, we have the sequence $(A^i)_{i\ge 0}$ of successive powers of $A$, and we define for $P = a_0 + a_1X + \ldots + a_nX^n \in F[X]$
$$P(A) := a_0I_n + a_1A + \ldots + a_nA^n.$$
We have $P(A) \cdot Q(A) = (PQ)(A)$ for all polynomials $P, Q$ and all matrices $A$. The algebra generated by $A$ is defined by
$$F[A] = \{P(A);\ P \in F[X]\}.$$
It is a subspace of $M_n(F)$ which is stable under multiplication of matrices.
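As an aside (not part of the original text), evaluating $P(A)$ is conveniently done with Horner's scheme; a minimal Python sketch, assuming `numpy`:

```python
import numpy as np

def poly_at_matrix(coeffs, A):
    """Evaluate P(A) = a_0 I + a_1 A + ... + a_n A^n; coeffs = [a_0, ..., a_n]."""
    n = A.shape[0]
    result = np.zeros_like(A, dtype=float)
    for a in reversed(coeffs):             # Horner: ((a_n A + a_{n-1}) A + ...)
        result = result @ A + a * np.eye(n)
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])    # satisfies A^2 = -I
print(poly_at_matrix([1.0, 0.0, 1.0], A))  # P(X) = X^2 + 1, so P(A) = 0
```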
Remark 8.4. If $A$ is the matrix of some linear transformation $T$ of $V$ in some basis of $V$, then $P(A)$ is the matrix of $P(T)$ in that basis.
Problem 8.5. a) Let $A, B \in M_n(F)$ be matrices, with $B$ invertible. Prove that for any $P \in F[X]$ we have
$$P(BAB^{-1}) = BP(A)B^{-1}.$$
b) Prove that if $A, B \in M_n(F)$ are similar matrices, then $P(A)$ and $P(B)$ are similar matrices for all $P \in F[X]$.

Solution. a) Suppose first that $P(X) = X^k$ for some $k \ge 1$; we need to prove that $(BAB^{-1})^k = BA^kB^{-1}$. But using that $B^{-1}B = I_n$ several times, we obtain
$$(BAB^{-1})^k = BAB^{-1} \cdot BAB^{-1}\cdots BAB^{-1} = BA^2B^{-1} \cdot BAB^{-1}\cdots BAB^{-1} = \ldots = BA^kB^{-1}.$$
In general, write $P(X) = a_0 + a_1X + \ldots + a_kX^k$; then
$$P(BAB^{-1}) = \sum_{i=0}^k a_i(BAB^{-1})^i = \sum_{i=0}^k a_iBA^iB^{-1} = B\Big(\sum_{i=0}^k a_iA^i\Big)B^{-1} = BP(A)B^{-1}$$
and the problem is solved.
b) Write $A = CBC^{-1}$ for some invertible matrix $C$. Then by part a)
$$P(A) = P(CBC^{-1}) = CP(B)C^{-1},$$
thus $P(A)$ and $P(B)$ are similar.
8.1.1 Problems for Practice

1. Prove Proposition 8.1.
2. Let
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 2 & 3 \end{bmatrix}.$$
Compute $P(A)$, where $P(X) = X^3 - X + 1$.
3. Let $a, b, c$ be real numbers and let
$$A = \begin{bmatrix} 0 & b & -c \\ -a & 0 & c \\ a & -b & 0 \end{bmatrix}.$$
Compute $P(A)$, where
$$P(X) = X(X^2 + ab + bc + ca).$$
4. Prove that the matrix
$$A = \begin{bmatrix} 2 & 0 & -1 \\ 4 & 0 & -2 \\ 4 & 0 & -2 \end{bmatrix}$$
is nilpotent.
5. Let $A \in M_n(F)$ be a symmetric matrix. Prove that for all $P \in F[X]$ the matrix $P(A)$ is symmetric.
6. Let $A \in M_n(F)$ be a diagonal matrix. Prove that for all $P \in F[X]$ the matrix $P(A)$ is diagonal.
7. Let $A \in M_n(F)$ be an upper-triangular matrix. Prove that for all $P \in F[X]$ the matrix $P(A)$ is upper-triangular.
8. Let $V$ be the vector space of smooth functions $f : \mathbb{R} \to \mathbb{R}$ and let $T : V \to V$ be the linear transformation sending $f \in V$ to its derivative $f'$. Can we find a nonzero polynomial $P \in \mathbb{R}[X]$ such that $P(T) = 0$?
8.2 The Minimal Polynomial of a Linear Transformation or Matrix
Let $V$ be a finite dimensional vector space over $F$, say of dimension $n \ge 1$. We will be concerned with the following problem: given a linear transformation $T$ of $V$, describe the polynomials $P \in F[X]$ for which $P(T) = 0$. Note that we can also ask the dual question: given a polynomial $P \in F[X]$, describe the linear transformations $T$ for which $P(T) = 0$. This is more difficult to answer, and solving this problem actually requires the resolution of the first problem.

So let us start with a linear transformation $T$ of $V$ and consider the set
$$I(T) = \{P \in F[X];\ P(T) = 0\}.$$
A key observation is that $I(T)$ is not reduced to $\{0\}$. Indeed, the space of linear transformations on $V$ has dimension $n^2$, thus the linear transformations $\mathrm{id}, T, T^2, \ldots, T^{n^2}$ cannot be linearly independent. Thus we can find $a_0,\ldots,a_{n^2}$ not all 0 such that
$$a_0\,\mathrm{id} + a_1T + \ldots + a_{n^2}T^{n^2} = 0$$
and then $a_0 + a_1X + \ldots + a_{n^2}X^{n^2}$ is a nonzero element of $I(T)$.
Theorem 8.6. There is a unique monic (nonzero) polynomial $\mu_T \in I(T)$ such that $I(T)$ is the set of multiples of $\mu_T$ in $F[X]$, i.e.
$$I(T) = \mu_TF[X].$$

Proof. Proposition 8.1 implies that $I(T)$ is a subspace of $F[X]$ and that $PQ \in I(T)$ whenever $P \in I(T)$ and $Q \in F[X]$. Indeed,
$$(PQ)(T) = P(T) \circ Q(T) = 0 \circ Q(T) = 0.$$
The discussion preceding Theorem 8.6 shows that $I(T) \neq 0$. Let $P$ be a nonzero polynomial of smallest degree in $I(T)$. Dividing $P$ by its leading coefficient (the new polynomial is still in $I(T)$ and has the same degree as $P$), we may assume that $P$ is monic. By the first paragraph, all multiples of $P$ belong to $I(T)$. Conversely, let $S$ be an element of $I(T)$ and write $S = QP + R$ with $Q, R \in F[X]$ and $\deg R < \deg P$. Note that $R = S - QP \in I(T)$ since $I(T)$ is a subspace of $F[X]$ and $S, QP \in I(T)$. If $R \neq 0$, then since $\deg R < \deg P$ we obtain a contradiction with the minimality of $P$. Thus $R = 0$ and $P$ divides $S$. It follows that $I(T)$ is precisely the set of multiples of $P$ and so we can take $\mu_T = P$.

Finally, we need to prove that $\mu_T$ is unique. If $S$ had the same properties, then $S$ would be a multiple of $\mu_T$ and $\mu_T$ would be a multiple of $S$. Since they are both monic, they must be equal.
Definition 8.7. The polynomial $\mu_T$ is called the minimal polynomial of $T$.

Due to its importance, let us stress again the properties of the minimal polynomial $\mu_T$:
• it is monic and satisfies $\mu_T(T) = 0$.
• For any polynomial $P \in F[X]$, we have $P(T) = 0$ if and only if $\mu_T$ divides $P$.

The whole theory developed above applies verbatim to matrices: if $A \in M_n(F)$, there is a unique monic polynomial $\mu_A \in F[X]$ with the following properties:
• $\mu_A(A) = O_n$, and
• if $P \in F[X]$, then $P(A) = O_n$ if and only if $\mu_A$ divides $P$.

Remark 8.8. If $P$ is a polynomial and $A$ is a matrix (or a linear transformation) satisfying $P(A) = O_n$, we will sometimes say that $P$ kills $A$ or that $A$ is killed by $P$. Thus the polynomials killing $A$ are precisely the multiples of the minimal polynomial of $A$.
The discussion preceding Theorem 8.6 shows that we can find a nonzero polynomial $P$ of degree not exceeding $n^2$ such that $P(T) = 0$. Since $\mu_T$ divides $P$, it follows that $\deg \mu_T \le n^2$. This bound is fairly weak and the goal of the next sections is to introduce a second polynomial canonically associated with $T$, the characteristic polynomial of $T$. This will be monic of degree $n$ and will also vanish when evaluated at $T$. This will yield the inequality $\deg \mu_T \le n$, which is essentially optimal.
Let us give a few examples of computations of minimal polynomials:

Example 8.9. Let $F$ be a field. All matrices below are supposed to have entries in $F$.
a) The minimal polynomial of the matrix $O_n$ is clearly $\mu_{O_n} = X$. More generally, the minimal polynomial of the scalar matrix $cI_n$ is $X - c$.
b) Consider some elements $d_1,\ldots,d_n \in F$ and a diagonal matrix $A = [a_{ij}]$, with $a_{ii} = d_i$. The elements $d_1,\ldots,d_n$ are not necessarily pairwise distinct, so let us assume that $d_{i_1},\ldots,d_{i_k}$ is the largest collection of pairwise distinct elements among $d_1,\ldots,d_n$. Note that for any polynomial $Q \in F[X]$ the matrix $Q(A)$ is simply the diagonal matrix with diagonal entries $Q(d_1),\ldots,Q(d_n)$. Thus $Q(A) = O_n$ if and only if $Q(d_i) = 0$ for $1 \le i \le n$. This happens if and only if $Q(d_{i_1}) = \ldots = Q(d_{i_k}) = 0$. Since $d_{i_1},\ldots,d_{i_k}$ are pairwise distinct, this is further equivalent to $(X - d_{i_1})\cdots(X - d_{i_k}) \mid Q$. Thus the minimal polynomial of $A$ is
$$\mu_A(X) = (X - d_{i_1})\cdots(X - d_{i_k}).$$
In particular, we see that $d_1,\ldots,d_n$ are pairwise distinct if and only if $\mu_A$ has degree $n$, in which case $\mu_A = \prod_{i=1}^n (X - d_i)$.
c) Suppose that $F = \mathbb{R}$ and that a matrix $A \in M_n(F)$ satisfies $A^2 + I_n = O_n$. What is the minimal polynomial $\mu_A$ of $A$? For sure it divides $X^2 + 1$, since $X^2 + 1$ vanishes at $A$. The only monic nonconstant divisor of $X^2 + 1$ in $\mathbb{R}[X]$ is $X^2 + 1$ itself, thus necessarily $\mu_A = X^2 + 1$.
d) With the tools introduced so far it is not easy at all to compute the minimal polynomial of a given matrix. We will introduce in the next sections another polynomial (called the characteristic polynomial of the matrix) which can be directly computed (via the computation of a determinant) from the matrix and which is always a multiple of the minimal polynomial. This makes the computation of the minimal polynomial much easier: one computes the characteristic polynomial $P$ of the matrix, then looks at all possible monic divisors $Q$ of $P$ and checks which one kills the matrix and has the smallest degree. We will see later on that one does not really need to check all possible divisors, which makes the computation even more rapid.
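As an aside (not part of the original text), the minimal polynomial of an explicit matrix can also be found by looking for the first linear dependence among $I, A, A^2, \ldots$; a sketch in exact arithmetic, assuming the `sympy` library:

```python
import sympy as sp

def minimal_polynomial(A, x=sp.Symbol('x')):
    """Monic polynomial of least degree d with A^d in the span of lower powers."""
    n = A.shape[0]
    powers = [sp.eye(n)]                   # I, A, A^2, ... as we go
    while True:
        powers.append(powers[-1] * A)
        M = sp.Matrix([list(P) for P in powers[:-1]]).T   # flattened powers
        try:
            c, _ = M.gauss_jordan_solve(sp.Matrix(list(powers[-1])))
        except ValueError:                 # no dependence yet; add another power
            continue
        d = len(powers) - 1
        return sp.expand(x**d - sum(c[i] * x**i for i in range(d)))

A = sp.Matrix([[2, 0, 0], [0, 2, 0], [0, 0, 3]])
print(minimal_polynomial(A))               # x**2 - 5*x + 6, i.e. (x-2)(x-3)
```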
Problem 8.10. Let $T$ be a linear transformation on a finite dimensional $F$-vector space $V$ and let $V = V_1 \oplus V_2$ be a decomposition of $V$ into subspaces which are stable under $T$. Let $P, P_1, P_2$ be the minimal polynomials of $T$, $T|_{V_1}$ and $T|_{V_2}$ respectively. Prove that $P$ is the least common multiple of $P_1$ and $P_2$.

Solution. Let $Q$ be the least common multiple of $P_1$ and $P_2$. Since $P$ kills $T$, it also kills $T|_{V_1}$ and $T|_{V_2}$, thus it is a multiple of $P_1$ and $P_2$. It follows that $Q$ divides $P$. In order to prove that $P$ divides $Q$, it suffices to prove that $Q$ kills $T$. But since $Q$ is a multiple of $P_1$ and $P_1$ kills $T|_{V_1}$, it follows that $Q$ kills $T|_{V_1}$. Similarly, $Q$ kills $T|_{V_2}$. Since $V = V_1 \oplus V_2$, we deduce that $Q$ kills $T$ and the result follows.
A natural and rather subtle problem is the following: suppose that A ∈ M_n(Q) is a matrix with rational entries. We can definitely see A as a matrix with real entries, i.e., as an element of M_n(R), or as a matrix with complex entries, i.e., as an element of M_n(C). Thus we can attach to A three minimal polynomials! Fortunately, the following theorem shows that the three polynomials are actually one and the same: the minimal polynomial of a matrix does not depend on the field containing the entries of the matrix:

Theorem 8.11. Let F_1 ⊂ F_2 be two fields and let A ∈ M_n(F_1). Then the minimal polynomial of A seen as an element of M_n(F_1) and that of A seen as an element of M_n(F_2) coincide.

Proof. Let μ_1 be the minimal polynomial of A ∈ M_n(F_1) and μ_2 that of A ∈ M_n(F_2). Since F_1[X] ⊂ F_2[X], the polynomial μ_1 belongs to F_2[X] and kills A, thus it must be a multiple of μ_2. In other words, μ_2 divides μ_1. Let d_i = deg μ_i. It suffices to prove that d_2 ≥ d_1, and for this it suffices to prove that there is a nonzero polynomial P ∈ F_1[X] of degree at most d_2 which vanishes at A (as such a polynomial is necessarily a multiple of μ_1). By hypothesis, we know that we have a relation

a_0I_n + a_1A + ⋯ + a_{d_2}A^{d_2} = O_n

with a_i ∈ F_2 (namely the coefficients of μ_2). This is equivalent to n² linear homogeneous equations in the unknowns a_0, ..., a_{d_2}. The coefficients of these equations are entries of the matrices I_n, A, ..., A^{d_2}, so they belong to F_1. So we have a linear homogeneous system with coefficients in F_1 and having a nontrivial solution in F_2. Then it automatically has a nontrivial solution in F_1 (by Theorem 7.77), giving the desired polynomial P.
We end this section with a series of problems related to the pointwise minimal polynomial. Let V be a finite dimensional vector space over a field F and let T : V → V be a linear transformation with minimal polynomial μ_T. For x ∈ V, consider

I_x = {P ∈ F[X] | P(T)(x) = 0}.

Note that the sum and difference of two elements of I_x is still in I_x.

Problem 8.12. Prove that there is a unique monic polynomial μ_x ∈ F[X] such that I_x is the set of multiples of μ_x in F[X]. Moreover, μ_x divides μ_T.

Solution. We may assume that x ≠ 0. Note that μ_T ∈ I_x, since μ_T(T) = 0. Let μ_x be the monic polynomial of smallest degree which belongs to I_x. We will prove that I_x = μ_x·F[X].

First, if P ∈ μ_x·F[X], then P = μ_xQ for some Q ∈ F[X] and

P(T)(x) = Q(T)(μ_x(T)(x)) = Q(T)(0) = 0,

thus P ∈ I_x. This shows that μ_x·F[X] ⊂ I_x.
Conversely, let P ∈ I_x and, using the division algorithm, let P = Qμ_x + R for some polynomials Q, R ∈ F[X] such that deg R < deg μ_x. Assume that R ≠ 0. Since P and Qμ_x belong to I_x (the second one by the previous paragraph), we deduce that R ∈ I_x. Let a be the leading coefficient of R; then a^{−1}R is a monic polynomial belonging to I_x and with degree less than that of μ_x, a contradiction. Thus R = 0 and μ_x divides P, finishing the solution.
Problem 8.13. Let T be a linear transformation on a finite dimensional vector space V over F, where F is an arbitrary field.

a) Prove that if μ_T = P^kQ with k ≥ 1, P ∈ F[X] irreducible and Q relatively prime to P, then we can find x ∈ V such that μ_x = P^k.
b) Prove that if x_1, x_2 ∈ V are such that μ_{x_1} and μ_{x_2} are relatively prime, then μ_{x_1+x_2} = μ_{x_1}·μ_{x_2}.
c) Conclude that there is always a vector x ∈ V such that μ_x = μ_T.

Solution. a) Suppose on the contrary that μ_x ≠ P^k for all x ∈ V. Let x ∈ V. Then by hypothesis (P^kQ)(T)(x) = 0. Hence v = Q(T)(x) lies in the kernel of P^k(T) and so μ_v divides P^k. Since μ_v ≠ P^k and P is irreducible, μ_v divides P^{k−1} and so P^{k−1}(T)(v) = 0, that is (P^{k−1}Q)(T)(x) = P^{k−1}(T)(v) = 0. But since x was arbitrary, this means μ_T | P^{k−1}Q, a contradiction.

b) Let P_1 = μ_{x_1} and P_2 = μ_{x_2}, and let P = P_1P_2. Since P is a multiple of both P_1 and P_2, we deduce that P(T) vanishes at both x_1 and x_2, thus it vanishes at x_1 + x_2 and so μ_{x_1+x_2} | P.

On the other hand, μ_{x_1+x_2}(T)(x_1 + x_2) = 0, thus

(P_1·μ_{x_1+x_2})(T)(x_1) + (P_1·μ_{x_1+x_2})(T)(x_2) = 0.

The first term in the sum is 0, since P_1(T)(x_1) = 0, thus the second term must be 0, which means that μ_{x_2} = P_2 | P_1·μ_{x_1+x_2}. Since P_1 and P_2 are relatively prime, it follows that P_2 divides μ_{x_1+x_2}, and by symmetry P_1 also divides μ_{x_1+x_2}. Using again that P_1 and P_2 are relatively prime, we conclude that P = P_1P_2 divides μ_{x_1+x_2}. Combining this with the divisibility μ_{x_1+x_2} | P and using that μ_{x_1+x_2} and P are both monic, the result follows.

c) Consider the decomposition μ_T = P_1^{k_1}⋯P_r^{k_r} of μ_T into irreducible polynomials. Here P_1, ..., P_r are pairwise relatively prime irreducible polynomials and the k_i are positive integers. By part a) we can find x_i ∈ V such that μ_{x_i} = P_i^{k_i}. Applying successively part b), we obtain

μ_{x_1+⋯+x_r} = μ_{x_1}⋯μ_{x_r} = P_1^{k_1}⋯P_r^{k_r} = μ_T

and so we can take x = x_1 + ⋯ + x_r.
Problem 8.14. Let V_x be the span of x, T(x), T²(x), .... Prove that V_x is a subspace of V of dimension deg μ_x, stable under T.
Solution. It is clear that V_x is a subspace of V, stable under T. Let d = deg μ_x. We will prove that x, T(x), ..., T^{d−1}(x) form a basis of V_x, which will yield the desired result.

Suppose first that a_0x + a_1T(x) + ⋯ + a_{d−1}T^{d−1}(x) = 0 for some scalars a_0, ..., a_{d−1}, not all of them equal to 0. The polynomial P = a_0 + a_1X + ⋯ + a_{d−1}X^{d−1} is then nonzero and belongs to I_x (i.e., P(T)(x) = 0). Thus P is a multiple of μ_x, which is impossible as it is nonzero and its degree is less than that of μ_x. Thus x, T(x), ..., T^{d−1}(x) are linearly independent over F.

Let W be the span of x, T(x), ..., T^{d−1}(x). We claim that W is stable under T. It suffices to check that T^d(x) belongs to W. But since μ_x(T)(x) = 0 and μ_x is monic of degree d, we know that there are scalars b_0, ..., b_{d−1} such that

T^d(x) + b_{d−1}T^{d−1}(x) + ⋯ + b_0x = 0.

This relation shows that T^d(x) is a linear combination of x, T(x), ..., T^{d−1}(x) and so it belongs to W.

Now, since W is stable under T and contains x, it must contain all T^k(x) for k ≥ 0, thus W must also contain V_x. It follows that x, T(x), ..., T^{d−1}(x) is a generating subset of V_x and the proof is finished, since we have already shown that this set is linearly independent.
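The proof above also gives a practical recipe: deg μ_x = dim V_x is the first m for which T^m(x) depends linearly on x, T(x), ..., T^{m−1}(x). A small numerical sketch (assuming NumPy; the function name is ours, and a floating-point rank test is only a heuristic):

```python
import numpy as np

def pointwise_degree(T, x, tol=1e-10):
    """Return deg mu_x = dim Span(x, Tx, T^2 x, ...) for the matrix T."""
    vectors = [x]
    while True:
        candidate = T @ vectors[-1]
        M = np.column_stack(vectors + [candidate])
        if np.linalg.matrix_rank(M, tol) == len(vectors):  # dependence found
            return len(vectors)
        vectors.append(candidate)

T = np.array([[2., 0., 0.], [0., 2., 0.], [0., 0., 3.]])
print(pointwise_degree(T, np.array([1., 0., 1.])))  # 2: mu_x = (X-2)(X-3)
print(pointwise_degree(T, np.array([1., 0., 0.])))  # 1: mu_x = X-2
```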
8.2.1 Problems for Practice
1. Compute the minimal polynomial of the following matrices:

$$A = \begin{pmatrix} 2 & 3 \\ 4 & 2 \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.$$

2. Compute the minimal polynomial of the matrix A ∈ M_n(R) all of whose entries are equal to 1.
3. Let A ∈ M_n(C). Prove that

dim Span(I_n, A, A², ...) = deg μ_A.

4. Find a matrix A ∈ M_2(R) whose minimal polynomial is
a) X² − 3X + 2.
b) X².
c) X² + 1.
5. Let V be a finite dimensional vector space over F and let T : V → V be an invertible linear transformation. Prove that T^{−1} ∈ F[T].
6. For which positive integers n can we find a matrix A ∈ M_n(R) whose minimal polynomial is X² + 1?
7. Compute the minimal polynomial of the projection/symmetry of Cⁿ onto a subspace along a complementary subspace.
8. Let T : M_n(C) → M_n(C) be the map sending a matrix to its transpose. Find the minimal polynomial of T.
9. Let T : M_n(C) → M_n(C) be the map sending a matrix A = [a_ij] to the matrix Ā = [ā_ij], where z̄ is the complex conjugate of z. Find the minimal polynomial of T.
10. Describe the minimal polynomial of a matrix A ∈ M_n(C) of rank 1.
8.3 Eigenvectors and Eigenvalues
Let V be a vector space over a field F and let T be a linear transformation of V. In this section we will be interested in those λ ∈ F for which λ·id − T is not invertible. The following definition is fundamental.

Definition 8.15. An eigenvalue of T is a scalar λ ∈ F such that λ·id − T is not invertible. An eigenvector of T corresponding to the eigenvalue λ (or λ-eigenvector) is any nonzero element of the space ker(λ·id − T), which is called the eigenspace corresponding to λ (or the λ-eigenspace).

Thus a λ-eigenvector v is by definition nonzero and satisfies

T(v) = λv,

and the λ-eigenspace consists of the vector 0 and all λ-eigenvectors.
We have the analogous definition for matrices:

Definition 8.16. Let A ∈ M_n(F) be a square matrix. A scalar λ ∈ F is called an eigenvalue of A if there is a nonzero vector X ∈ Fⁿ such that AX = λX. In this case, the subspace

ker(λI_n − A) := {X ∈ Fⁿ | AX = λX}

is called the λ-eigenspace of A.

It is an easy but important exercise for the reader to check that the two definitions are compatible, in the following sense: let V be finite dimensional and let T : V → V be a linear transformation. Choose any basis of V and let A be the matrix of T with respect to this basis. Then the eigenvalues of T are exactly the eigenvalues of A.
Example 8.17. Consider the matrix

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

Let us find the eigenvalues and the eigenspaces of A, if we consider A as a matrix with complex entries. Let λ be an eigenvalue and let X be a nonzero vector such that AX = λX. If x_1, x_2 are the coordinates of X, the condition AX = λX is equivalent to the equations

−x_2 = λx_1,  x_1 = λx_2.

We deduce that

−x_2 = λ²x_2.

If x_2 = 0, then x_1 = 0 and X = 0, a contradiction. Thus x_2 ≠ 0 and necessarily λ² = −1, that is λ ∈ {i, −i}. Conversely, i and −i are both eigenvalues, since we can choose x_2 = 1 and x_1 = λ as a solution of the previous system. Actually the λ-eigenspace is given by

ker(λI_2 − A) = {(λx_2, x_2) | x_2 ∈ C}

and it is the line spanned by v = (λ, 1) ∈ C². Thus, seen as a complex matrix, A has two eigenvalues ±i, and the eigenspaces are the lines spanned by (i, 1) and (−i, 1).

We now see A as a matrix with real entries and ask the same question. Letting λ ∈ R be an eigenvalue and X an eigenvector as above, the same computations yield

(λ² + 1)x_2 = 0.

Since λ is real, λ² + 1 is nonzero and so x_2 = 0, then x_1 = 0 and X = 0. The conclusion is that, seen as a matrix with real entries, A has no eigenvalue, thus no eigenspace. This example shows that eigenvalues and eigenspaces are very sensitive to the field of scalars.
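A quick machine check of this example (assuming SymPy; eigenvects returns triples of the form (eigenvalue, algebraic multiplicity, basis of the eigenspace)):

```python
import sympy as sp

A = sp.Matrix([[0, -1], [1, 0]])
# over C: the eigenvalues come out as +-i, each with a one-dimensional eigenspace
for lam, mult, vectors in A.eigenvects():
    print(lam, mult, vectors)
# over R there is no eigenvalue: X**2 + 1 has no real root
print(A.charpoly(sp.symbols('X')).as_expr())   # X**2 + 1
```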
Given a matrix A ∈ M_n(F), how can we find its eigenvalues and its eigenspaces? The first part is much harder than the second one. Indeed, finding eigenspaces is equivalent to solving linear systems of the form AX = λX, which is not (too) difficult. On the other hand, finding eigenvalues comes down to solving polynomial equations, which is quite hard (but can be done approximately with the help of a computer as long as we are not interested in exact formulae). In practice (and for reasonably sized matrices) we use the following fundamental observation in order to compute eigenvalues:

Proposition 8.18. A scalar λ ∈ F is an eigenvalue of A ∈ M_n(F) if and only if

det(λI_n − A) = 0.

Proof. λI_n − A is not invertible if and only if its determinant vanishes. The result follows.
Let us come back to our problem of computing the eigenvalues of a matrix. If we know that

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

where a_ij ∈ F for i, j = 1, 2, ..., n, then the proposition says that we can find the eigenvalues of A by solving the polynomial equation

$$\begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix} = 0$$

in F. This is a polynomial equation of degree n. If the degree is greater than 4, there is no general solution in terms of radicals (of course, there are instances in which one can solve the equations in terms of radicals, but most of the time this will not happen).
Example 8.19. Let us find the eigenvalues of

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$

We start by simplifying the equation

$$\begin{vmatrix} \lambda - 1 & 0 & 0 \\ 0 & \lambda & 1 \\ 0 & -1 & \lambda \end{vmatrix} = 0.$$

Expanding with respect to the first column and doing the computations, we obtain the equivalent equation

(λ − 1)(λ² + 1) = 0.

Next, we recall that eigenvalues are sensitive to the field of scalars. Since nothing was said about the field of scalars in this problem, we consider two cases. If we take the field of scalars to be C, then the eigenvalues are 1, ±i, which are the complex solutions of the equation (λ − 1)(λ² + 1) = 0. If the field of scalars is R, then the only eigenvalue of A is 1.
Remark 8.20. Let us mention two important and interesting consequences of Proposition 8.18 and the discussion following it:

• For any matrix A ∈ M_n(F), A and its transpose ᵗA have the same eigenvalues. Indeed, for λ ∈ F we have

det(λI_n − ᵗA) = det(ᵗ(λI_n − A)) = det(λI_n − A),

thus det(λI_n − A) = 0 if and only if det(λI_n − ᵗA) = 0.

• Any matrix A ∈ M_n(F) has finitely many eigenvalues, since they are all solutions of a polynomial equation of degree n, namely det(λI_n − A) = 0. Actually, since a polynomial of degree n has at most n distinct roots, we deduce that any matrix A ∈ M_n(F) has at most n eigenvalues.
We can restate part of the previous remark in terms of linear transformations:

Corollary 8.21. Let V be a finite dimensional vector space over F and let T : V → V be a linear transformation. Then T has only finitely many (actually at most dim V) distinct eigenvalues.

Remark 8.22. On the other hand, a linear transformation on an infinite dimensional vector space may very well have infinitely many eigenvalues. Consider for instance the space V of all smooth functions f : R → R, and consider the map T : V → V sending f to its derivative. Then f_a : x ↦ e^{ax} is an eigenvector with eigenvalue a for all a ∈ R, thus any real number is an eigenvalue for T.
The following important problem shows that it is very easy to describe the eigenvalues of an upper-triangular matrix:

Problem 8.23. Let A = [a_ij] be an upper-triangular matrix in M_n(F). Prove that the eigenvalues of A are precisely its diagonal elements.

Solution. By definition, λ ∈ F is an eigenvalue of A if and only if λI_n − A is not invertible. The matrix λI_n − A is also upper-triangular, with diagonal elements λ − a_ii. But an upper-triangular matrix is invertible if and only if its diagonal entries are nonzero (because its determinant equals the product of the diagonal entries by Theorem 7.41). The result follows.
Problem 8.24. Find the eigenvalues of A⁶, where

$$A = \begin{pmatrix} 1 & 3 & 5 & 7 \\ 0 & \tfrac{1}{10} & 3 & 6 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 2 \end{pmatrix} \in M_4(\mathbb{R}).$$

Solution. It is useless to compute A⁶ explicitly: by the product rule for matrices, the product of two upper-triangular matrices A = [a_ij] and B = [b_ij] is an upper-triangular matrix with diagonal entries a_ii·b_ii. It follows that A⁶ is an upper-triangular matrix with diagonal entries 1, 1/10⁶, 0, 64. By the previous problem, these are also the eigenvalues of A⁶.
The next important result says that eigenvectors corresponding to different eigenvalues are linearly independent.

Theorem 8.25. Let λ_1, ..., λ_k be pairwise distinct eigenvalues of a linear transformation T. Then the λ_i-eigenspaces of T are in direct sum position.

Proof. By definition, we need to prove that if T(v_i) = λ_iv_i and v_1 + ⋯ + v_k = 0, then v_1 = ⋯ = v_k = 0. We will prove this by induction on k. The result is clear when k = 1, so assume that it holds for k − 1 and let us prove it for k. We have

0 = T(v_1 + ⋯ + v_k) = T(v_1) + ⋯ + T(v_k) = λ_1v_1 + ⋯ + λ_kv_k,

which combined with the relation λ_kv_1 + ⋯ + λ_kv_k = 0 yields

0 = (λ_k − λ_1)v_1 + ⋯ + (λ_k − λ_{k−1})v_{k−1}.

The inductive hypothesis implies that (λ_k − λ_i)v_i = 0 for 1 ≤ i < k. Since λ_k ≠ λ_i, this forces v_i = 0 for 1 ≤ i < k. But then automatically v_k = 0, since v_1 + ⋯ + v_k = 0. The inductive step being proved, the problem is solved.
Problem 8.26. Let λ be an eigenvalue of a linear map T : V → V, where V is a vector space over F, and let P be a polynomial with coefficients in F. Prove that P(λ) is an eigenvalue of P(T).

Solution. The hypothesis yields the existence of a nonzero vector v ∈ V such that T(v) = λv. By induction, we obtain T^k(v) = λ^kv for k ≥ 1. Indeed, if T^k(v) = λ^kv, then

T^{k+1}(v) = T(T^k(v)) = T(λ^kv) = λ^kT(v) = λ^{k+1}v.

We deduce that if P(X) = a_nXⁿ + ⋯ + a_1X + a_0, then

P(T)(v) = a_nTⁿ(v) + ⋯ + a_1T(v) + a_0v = a_nλⁿv + ⋯ + a_0v = P(λ)v

and so P(λ) is an eigenvalue of P(T).

The following consequence of the previous problem is very useful in practice:

Problem 8.27. Let A ∈ M_n(C) be a matrix and let P ∈ C[X] be a polynomial such that P(A) = O_n. Prove that any eigenvalue λ of A satisfies P(λ) = 0.

Solution. By the previous problem, P(λ) is an eigenvalue of P(A) = O_n. Since 0 is the only eigenvalue of O_n, we deduce that P(λ) = 0.
In particular, we obtain the following:

Theorem 8.28. Let T : V → V be a linear transformation on a finite-dimensional vector space V over F. Then the eigenvalues of T are precisely the roots in F of the minimal polynomial μ_T of T.

Proof. Since μ_T(T) = 0, the previous problem shows that all eigenvalues of T are roots of μ_T. Conversely, let λ ∈ F be a root of μ_T and assume that λ is not an eigenvalue of T. Thus T − λ·id is invertible. Since μ_T(λ) = 0, we can write μ_T(X) = (X − λ)Q(X) for some Q ∈ F[X]. Since μ_T(T) = 0, we deduce that

(T − λ·id) ∘ Q(T) = 0.

As T − λ·id is invertible, the last relation is equivalent to Q(T) = 0. Hence μ_T divides Q, which is absurd. The problem is solved.
The next problem is a classical result, which gives rather interesting bounds on the eigenvalues of a matrix.

Problem 8.29 (Gershgorin Discs). Let A = [a_ij] ∈ M_n(C) be a matrix and let

R_i = Σ_{1≤j≤n, j≠i} |a_ij|.

a) Prove that if |a_ii| > R_i for all i, then A is invertible.
b) Deduce that any eigenvalue λ of A belongs to the set

⋃_{i=1}^n {z ∈ C : |z − a_ii| ≤ R_i}.

c) Give a geometric interpretation of the result established in part b).

Solution. a) Suppose that A is not invertible, thus we can find a nonzero vector X ∈ Cⁿ, with coordinates x_1, x_2, ..., x_n, such that AX = 0. Let i be an index such that

|x_i| = max_{1≤j≤n} |x_j|.

The i-th equation of the linear system AX = 0 reads

a_{i1}x_1 + a_{i2}x_2 + ⋯ + a_{in}x_n = 0,

or equivalently

a_{ii}x_i = −Σ_{j≠i} a_{ij}x_j.

Using the triangle inequality (i.e., |z_1 + ⋯ + z_n| ≤ |z_1| + ⋯ + |z_n|, valid for all complex numbers z_1, ..., z_n), we deduce that

|a_{ii}||x_i| = |Σ_{j≠i} a_{ij}x_j| ≤ Σ_{j≠i} |a_{ij}||x_j|.

Since |x_j| ≤ |x_i| for all j, we can further write

|a_{ii}||x_i| ≤ Σ_{j≠i} |a_{ij}||x_i| = R_i|x_i|.

Note that x_i ≠ 0, since otherwise |x_j| ≤ |x_i| = 0 for all j, thus x_j = 0 for all j, contradicting the fact that X ≠ 0. Thus we can divide the previous inequality by |x_i| and obtain

|a_{ii}| ≤ R_i,

which contradicts the assumption of the problem. Hence A is invertible.

b) Let λ be an eigenvalue of A and let B = A − λI_n. Write B = [b_ij], with b_ij = a_ij when i ≠ j and b_ii = a_ii − λ. Since B is not invertible, part a) ensures the existence of an index i such that |b_ii| ≤ Σ_{j≠i} |b_ij|. This can also be written as

|a_ii − λ| ≤ R_i

and shows that

λ ∈ ⋃_{i=1}^n {z ∈ C : |z − a_ii| ≤ R_i}.

c) The set {z ∈ C : |z − a_ii| ≤ R_i} is the closed disc centered at a_ii and having radius R_i. Thus part b) says that the eigenvalues of A are located in a union of discs centered at the diagonal entries of A and whose radii are R_1, ..., R_n.

Remark 8.30. Consider

C_i = Σ_{j≠i} |a_ji|.

Applying the result established before to ᵗA (which has the same eigenvalues as A) we obtain that the eigenvalues of A are also located in

⋃_{i=1}^n {z ∈ C : |z − a_ii| ≤ C_i}.
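A quick numerical illustration of part b) (assuming NumPy; the random matrix is our example, not the book's): every eigenvalue lies in at least one Gershgorin disc.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # R_i: row sums without |a_ii|

for lam in np.linalg.eigvals(A):
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
print("all eigenvalues lie in the union of the Gershgorin discs")
```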
8.3.1 Problems for Practice
1. Find the eigenvalues and the eigenvectors of the matrix

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix} \in M_3(\mathbb{C}).$$

2. Let V be the set of matrices A ∈ M_2(C) with the property that the vector (1, 2) is an eigenvector of A. Prove that V is a vector subspace of M_2(C) and give a basis for V.
3. Let e_1, e_2, e_3, e_4 be the standard basis of C⁴ and consider the set V of those matrices A ∈ M_4(C) with the property that e_1, e_2 are both eigenvectors of A. Prove that V is a vector subspace of M_4(C) and compute its dimension.
4. Find all matrices A ∈ M_3(C) for which the vector (1, 2, 3) is an eigenvector with eigenvalue 2.
5. Find the eigenvalues of the matrix A ∈ M_n(R) all of whose entries are equal to 2.
6. Find all real numbers x for which the matrix A = \begin{pmatrix} 1 & x \\ 2 & 1 \end{pmatrix} ∈ M_2(R) has
a) two distinct eigenvalues.
b) no eigenvalue.
7. Let V be the space of all polynomials with real coefficients. Let T be the linear transformation on V sending P(X) to P(1 − X). Describe the eigenvalues of T. Hint: what is T ∘ T?
8. A matrix A ∈ M_n(R) is called stochastic if a_ij ≥ 0 for all i, j ∈ [1, n] and Σ_{j=1}^n a_ij = 1 for all i ∈ [1, n].
a) Prove that 1 is an eigenvalue of any stochastic matrix.
b) Prove that any complex eigenvalue λ of a stochastic matrix satisfies |λ| ≤ 1.
9. Consider the map T : R[X] → R[X] sending a polynomial P(X) to P(3X).
a) Prove that T is a bijective linear transformation, thus its inverse T^{−1} exists and is linear.
b) Find the eigenvalues of T.
c) Deduce that there is no polynomial P ∈ R[X] such that

T^{−1} = P(T).

10. Let A, B ∈ M_n(C) be matrices such that

AB − BA = B.

a) Prove that AB^k − B^kA = kB^k for all k ≥ 1.
b) Deduce that B is nilpotent. Hint: consider the eigenvalues of the map T : M_n(C) → M_n(C) given by T(X) = AX − XA.
11. Let V be the space of continuous real-valued maps on [0, 1]. Define a map T : V → V by

T(f)(x) = ∫₀¹ min(x, t)·f(t) dt

for f ∈ V.
a) Justify that T is well defined and a linear transformation on V.
b) Is V finite dimensional?
c) Find the eigenvalues and describe the corresponding eigenspaces of T.
12. Let V be the space of polynomials with real coefficients whose degree does not exceed n and let T : V → V be the map defined by

T(P) = P(X) − (1 + X)P′(X).

a) Explain why T is a linear transformation on V.
b) Find the eigenvalues of T.
13. Let V be the space of all sequences (x_n)_{n≥1} of real numbers. Let T be the map which associates to a sequence (x_n)_{n≥1} the sequence whose general term is (x_1 + 2x_2 + ⋯ + nx_n)/n² (for n ≥ 1).
a) Prove that T is a linear transformation on V.
b) Find the eigenvalues and the corresponding eigenspaces of T.
14. Let V be the vector space of polynomials with real coefficients and let T : V → V be the map sending a polynomial P to

T(P) = (X² − 1)P″(X) + XP′(X).

a) Prove that T is a linear map.
b) What are the eigenvalues of T?
15. a) Let A ∈ M_n(C) be a matrix with complex entries, let P ∈ C[X] be a nonconstant polynomial and let μ be an eigenvalue of P(A). Prove that there is an eigenvalue λ of A such that P(λ) = μ (this gives a converse of the result proved in Problem 8.26 for matrices with complex entries). Hint: factor the polynomial P(X) − μ as c·∏_{i=1}^d (X − z_i) for some nonzero constant c and some complex numbers z_1, ..., z_d, and prove that at least one of the matrices A − z_1I_n, ..., A − z_dI_n is not invertible.
b) By considering the matrix A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, prove that the result established in part a) is false if we replace C with R.
c) Suppose that a positive real number λ is an eigenvalue of A², where A ∈ M_n(R) is a matrix. Prove that √λ or −√λ is an eigenvalue of A.
16. Let A ∈ M_n(R) be a matrix and let

B = \begin{pmatrix} 0 & A \\ A & 2A \end{pmatrix}.

Express the eigenvalues of B in terms of those of A.
17. Consider the matrix

$$A = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & 0 & \cdots & 0 & -1 & 2 \end{pmatrix} \in M_n(\mathbb{C}).$$

Prove that the eigenvalues of A are 4·sin²(jπ/(2n + 2)) for 1 ≤ j ≤ n. (A quick numerical check of this formula is sketched right after the list.)
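As promised in problem 17, here is a quick numerical check of that eigenvalue formula (assuming NumPy; the matrix-building one-liner is ours):

```python
import numpy as np

n = 6
# tridiagonal matrix with 2 on the diagonal and -1 just above and below it
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
computed = np.sort(np.linalg.eigvalsh(A))
predicted = np.sort(4 * np.sin(np.arange(1, n + 1) * np.pi / (2 * n + 2)) ** 2)
print(np.allclose(computed, predicted))   # True
```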
8.4 The Characteristic Polynomial
We saw in the previous section that finding the eigenvalues of a matrix A ∈ M_n(F) comes down to solving the polynomial equation

det(λI_n − A) = 0

in F. In this section we will study in greater detail the polynomial giving rise to this equation.

By construction, the determinant of a matrix is a polynomial expression with integer coefficients in the entries of that matrix. The following theorem refines this observation a little bit.

Theorem 8.31. Consider two matrices A, B ∈ M_n(F). There is a polynomial P ∈ F[X] such that for all x ∈ F we have

P(x) = det(xA + B).

Denoting this polynomial P(X) = det(XA + B), we have

det(XA + B) = det(A)·Xⁿ + α_{n−1}X^{n−1} + ⋯ + α_1X + det B

for some polynomial expressions α_1, ..., α_{n−1} with integer coefficients in the entries of A and B.
Proof. Define P by

$$P(X) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\,(a_{1\sigma(1)}X + b_{1\sigma(1)})\cdots(a_{n\sigma(n)}X + b_{n\sigma(n)}).$$

It is clear from the definition that P is a polynomial whose coefficients are polynomial expressions with integer coefficients in the entries of A and B. It is also clear that P(x) = det(xA + B) for x ∈ F. The constant term is given by plugging in X = 0 and thus equals det B. Moreover, for each σ ∈ S_n we have

$$\varepsilon(\sigma)\,(a_{1\sigma(1)}X + b_{1\sigma(1)})\cdots(a_{n\sigma(n)}X + b_{n\sigma(n)}) = \varepsilon(\sigma)\,a_{1\sigma(1)}\cdots a_{n\sigma(n)}X^n + \cdots,$$

all terms but the first in the right-hand side having degree at most n − 1. Taking the sum over σ, we see that P(X) starts with det A·Xⁿ, all other terms having degree at most n − 1. The result follows.

It follows from the theorem that if A, B have integer (respectively rational) entries, then det(XA + B) has integer (respectively rational) coefficients.
Armed with the previous results, we introduce the following

Definition 8.32. The characteristic polynomial of the matrix A ∈ M_n(F) is the polynomial χ_A ∈ F[X] defined by

χ_A(X) = det(XI_n − A).
Problem 8.33. Find the characteristic polynomial and the eigenvalues of the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ 0 & -7 & 0 & 6 \\ 0 & 0 & 3 & 0 \end{pmatrix} \in M_4(\mathbb{R}).$$

Solution. We compute using Laplace expansion with respect to the first row:

$$\chi_A(X) = \det(XI_4 - A) = \begin{vmatrix} X & -1 & 0 & 0 \\ -2 & X & -1 & 0 \\ 0 & 7 & X & -6 \\ 0 & 0 & -3 & X \end{vmatrix} = X\begin{vmatrix} X & -1 & 0 \\ 7 & X & -6 \\ 0 & -3 & X \end{vmatrix} + \begin{vmatrix} -2 & -1 & 0 \\ 0 & X & -6 \\ 0 & -3 & X \end{vmatrix}$$

$$= X(X^3 - 11X) - 2(X^2 - 18) = X^4 - 13X^2 + 36.$$

In order to find the eigenvalues of A, we need to find the real solutions of the equation

x⁴ − 13x² + 36 = 0.

Letting y = x² we obtain the quadratic equation

y² − 13y + 36 = 0

with solutions y_1 = 4 and y_2 = 9. Solving the equations x² = 4 and x² = 9 yields the eigenvalues ±2, ±3 of A.
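A sanity check of this computation (assuming SymPy; the signs of the entries are as reconstructed above):

```python
import sympy as sp

X = sp.symbols('X')
A = sp.Matrix([[0, 1, 0, 0], [2, 0, 1, 0], [0, -7, 0, 6], [0, 0, 3, 0]])
print(A.charpoly(X).as_expr())            # X**4 - 13*X**2 + 36
print(sp.roots(A.charpoly(X).as_expr()))  # roots 2, -2, 3, -3, each simple
```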
Problem 8.34. Find the characteristic polynomial and the eigenvalues of the matrix

$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \in M_3(\mathbb{F}_2).$$

Solution. We will constantly use that −1 = 1 in F_2. We obtain

$$\chi_A(X) = \det(XI_3 - A) = \det(XI_3 + A) = \begin{vmatrix} X & 1 & 1 \\ 1 & X & 1 \\ 1 & 1 & X+1 \end{vmatrix} = \begin{vmatrix} 1+X & 0 & 1 \\ 1+X & X+1 & 1 \\ 0 & X & X+1 \end{vmatrix},$$

the equality being obtained by adding the second column to the first one and the third column to the second one. Now

$$\begin{vmatrix} 1+X & 0 & 1 \\ 1+X & X+1 & 1 \\ 0 & X & X+1 \end{vmatrix} = (X+1)\begin{vmatrix} 1 & 0 & 1 \\ 1 & X+1 & 1 \\ 0 & X & X+1 \end{vmatrix} = (X+1)(X+1)^2 = (X+1)^3.$$

Thus

χ_A(X) = (X + 1)³

and consequently the unique eigenvalue of A is 1.
In the following more theoretical exercises, we will

• compute the characteristic polynomial for a rather large class of matrices: upper-triangular, nilpotent, companion matrices, etc.;
• establish a few basic properties of the characteristic polynomial which turn out to be important in practice or in theoretical problems.
For upper-triangular matrices the characteristic polynomial can be read off directly from the diagonal entries:

Problem 8.35. Let A = [a_ij] be an upper-triangular matrix (so that a_ij = 0 whenever i > j). Prove that

χ_A(X) = ∏_{i=1}^n (X − a_ii).

Solution. The matrix XI_n − A is again upper-triangular, with diagonal entries equal to X − a_ii. The result follows directly from Theorem 7.41.
Recall that ᵗA is the transpose of the matrix A.

Problem 8.36. Prove that A and ᵗA have the same characteristic polynomial when A ∈ M_n(F).

Solution. Indeed, ᵗ(XI_n − A) = XI_n − ᵗA. Since a matrix and its transpose have the same determinant (Theorem 7.37), we have

χ_A(X) = det(XI_n − A) = det(ᵗ(XI_n − A)) = det(XI_n − ᵗA) = χ_{ᵗA}(X),

as desired.
Problem 8.37. Prove that the characteristic polynomial χ_A of A is of the form

χ_A(X) = Xⁿ − Tr(A)·X^{n−1} + ⋯ + (−1)ⁿ·det A.

Solution. Let us come back to the definition

$$\det(XI_n - A) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\,(X\delta_{1\sigma(1)} - a_{1\sigma(1)})\cdots(X\delta_{n\sigma(n)} - a_{n\sigma(n)}).$$

A brutal expansion shows that

$$(X\delta_{1\sigma(1)} - a_{1\sigma(1)})\cdots(X\delta_{n\sigma(n)} - a_{n\sigma(n)}) = X^n\prod_{i=1}^n \delta_{i\sigma(i)} - X^{n-1}\sum_{j=1}^n \Big(\prod_{k\neq j}\delta_{k\sigma(k)}\Big)a_{j\sigma(j)} + \cdots.$$

Note that ∏_{i=1}^n δ_{iσ(i)} is nonzero only for the identity permutation, in which case it equals 1. This already shows that χ_A(X) is monic of degree n. It is clear that its constant term is χ_A(0) = det(−A) = (−1)ⁿ·det A (all these results also follow straight from Theorem 8.31).

Next, if j ∈ {1, 2, ..., n}, then ∏_{k≠j} δ_{kσ(k)} is nonzero only when σ(k) = k for all k ≠ j, but then automatically σ(j) = j (as σ is a permutation) and so σ is the identity permutation. Thus the coefficient of X^{n−1} is nonzero in (Xδ_{1σ(1)} − a_{1σ(1)})⋯(Xδ_{nσ(n)} − a_{nσ(n)}) if and only if σ is the identity permutation, in which case this coefficient equals −Σ_{j=1}^n a_jj = −Tr(A). This shows that the coefficient of X^{n−1} in χ_A(X) is −Tr(A).
Problem 8.38. Let A ∈ M_n(F) be a nilpotent matrix.

a) Prove that

χ_A(X) = Xⁿ.

b) Prove that Tr(A^k) = 0 for all k ≥ 1.

Solution. a) Note that by definition there is a positive integer k such that A^k = O_n. Then

X^kI_n = X^kI_n − A^k = (XI_n − A)(X^{k−1}I_n + X^{k−2}A + ⋯ + A^{k−1}).

Taking determinants yields

X^{nk} = χ_A(X)·det(X^{k−1}I_n + ⋯ + A^{k−1}).

The right-hand side is the product of two polynomials (again by the polynomial nature of the determinant). We deduce that χ_A(X) divides the monomial X^{nk}. Since moreover χ_A(X) is monic of degree n (by Problem 8.37), it follows that χ_A(X) = Xⁿ.

b) Replacing A with A^k (which is also nilpotent), we may assume that k = 1. We need to prove that Tr(A) = 0. But by part a) χ_A(X) = Xⁿ, thus the coefficient of X^{n−1} in χ_A(X) is 0. By the previous problem, this coefficient equals −Tr(A), thus Tr(A) = 0.
The following computation will play a fundamental role in the next section, which deals with the Cayley–Hamilton theorem. It also shows that any monic polynomial of degree n with coefficients in F is the characteristic polynomial of some matrix in M_n(F).

Problem 8.39. Let a_0, a_1, ..., a_{n−1} ∈ F and let

$$A = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & a_0 \\ 1 & 0 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & 0 & \cdots & 0 & a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & a_{n-1} \end{pmatrix}.$$

Prove that

χ_A = Xⁿ − a_{n−1}X^{n−1} − ⋯ − a_0.

Solution. Let P = Xⁿ − a_{n−1}X^{n−1} − ⋯ − a_1X − a_0. Consider the matrix

$$B = XI_n - A = \begin{pmatrix} X & 0 & 0 & \cdots & 0 & -a_0 \\ -1 & X & 0 & \cdots & 0 & -a_1 \\ 0 & -1 & X & \cdots & 0 & -a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & X - a_{n-1} \end{pmatrix}.$$

Adding to the first row of B the second row multiplied by X, the third row multiplied by X², ..., the nth row multiplied by X^{n−1}, we obtain the matrix

$$C = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & P \\ -1 & X & 0 & \cdots & 0 & -a_1 \\ 0 & -1 & X & \cdots & 0 & -a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & X - a_{n-1} \end{pmatrix}.$$

We have χ_A = det B = det C and, expanding det C with respect to the first row, we obtain

$$\det C = (-1)^{n+1}P\begin{vmatrix} -1 & X & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 \end{vmatrix} = (-1)^{n+1}P\,(-1)^{n-1} = P,$$

observing that the matrix whose determinant we need to evaluate is upper-triangular with diagonal entries −1. The result follows.
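This result is easy to experiment with. A short sketch (assuming SymPy; the particular coefficients are our example) builds the matrix of Problem 8.39 for a given list a_0, ..., a_{n−1} and confirms the characteristic polynomial:

```python
import sympy as sp

X = sp.symbols('X')
a = [6, 5, 2]            # a_0, a_1, a_2, so P = X**3 - 2*X**2 - 5*X - 6
n = len(a)
A = sp.zeros(n)
for i in range(1, n):
    A[i, i - 1] = 1      # subdiagonal of ones
for i in range(n):
    A[i, n - 1] = a[i]   # last column carries a_0, ..., a_{n-1}
print(A.charpoly(X).as_expr())   # X**3 - 2*X**2 - 5*X - 6, as the problem predicts
```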
Recall that two matrices A, B ∈ M_n(F) are called similar if they represent the same linear transformation of Fⁿ in possibly different bases of this F-vector space. Equivalently, A and B are similar if there is P ∈ GL_n(F) such that B = PAP^{−1}, i.e., they are conjugated by an invertible matrix. A fundamental property is that the characteristic polynomial is invariant under similarity of matrices. More precisely:

Theorem 8.40. Two similar matrices have the same characteristic polynomial.

Proof. Suppose that A and B are similar, thus we can find an invertible matrix P ∈ M_n(F) such that B = PAP^{−1}. Note that

XI_n − B = XPP^{−1} − PAP^{−1} = P(XI_n − A)P^{−1}.

Now, we will take for granted that the determinant is still defined and multiplicative for matrices with entries in F[X] (recall that F[X] is the set of polynomials in one variable with coefficients in F). The existence is easy, since one can simply define in the usual way

det A = Σ_{σ∈S_n} ε(σ)·a_{1σ(1)}⋯a_{nσ(n)}

for a matrix A = [a_ij] with entries in F[X]. The fact that the determinant is multiplicative is trickier (the hardest case being the case when F is a finite field) and we will take it for granted.

Consider then P, XI_n − A, XI_n − B as matrices with entries in F[X]. The inverse of P in M_n(F) is also an inverse of P in M_n(F[X]), thus P is invertible considered as a matrix in M_n(F[X]). The multiplicative character of the determinant map yields

χ_B(X) = det(XI_n − B) = det(P)·det(XI_n − A)·det(P)^{−1} = det(XI_n − A) = χ_A(X),

as desired.
Problem 8.41. Prove that if A, B ∈ M_n(F), then AB and BA have the same characteristic polynomial. You may assume for simplicity that F = R or F = C.

Solution. If A is invertible, then AB and BA are similar, as

AB = ABAA^{−1} = A(BA)A^{−1}.

The previous theorem yields the result in this case.

Suppose now that A is not invertible. As A has only finitely many eigenvalues (Corollary 8.21) and since F is infinite, there are infinitely many λ ∈ F such that A_λ := A − λI_n is invertible. By the first paragraph, for all such λ we have

χ_{A_λB}(X) = χ_{BA_λ}(X).

This can be written as

det(XI_n − AB + λB) = det(XI_n − BA + λB).

Both sides are polynomials in λ. Since they agree on infinitely many values of λ, these polynomials are equal. In particular, they agree on λ = 0, which is exactly the desired result.

Remark 8.42. The previous proof crucially uses the fact that F is infinite. The same result is true if F = F_2 (or more generally any field), but the proof requires more tools from algebra.
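A small check of Problem 8.41 with a singular A (assuming SymPy; the matrices are our example):

```python
import sympy as sp

X = sp.symbols('X')
A = sp.Matrix([[1, 2], [2, 4]])   # singular: det A = 0
B = sp.Matrix([[0, 1], [3, 5]])
print((A * B).charpoly(X).as_expr())  # X**2 - 28*X
print((B * A).charpoly(X).as_expr())  # X**2 - 28*X, the same polynomial
```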
The previous theorem shows that the following definition makes sense.

Definition 8.43. Let V be a finite dimensional F-vector space. The characteristic polynomial χ_T of the linear transformation T of V is the characteristic polynomial of the matrix of T in any basis.

Problem 8.44. Let T : R³ → R³ be the linear transformation defined by

T(x_1, x_2, x_3) = (x_1 − 2x_2 + x_3, x_2 − x_3, x_1).

Compute the characteristic polynomial of T.

Solution. The matrix of T with respect to the canonical basis is

$$A = \begin{pmatrix} 1 & -2 & 1 \\ 0 & 1 & -1 \\ 1 & 0 & 0 \end{pmatrix}.$$

Thus

$$\chi_T(X) = \chi_A(X) = \begin{vmatrix} X-1 & 2 & -1 \\ 0 & X-1 & 1 \\ -1 & 0 & X \end{vmatrix} = (X-1)\begin{vmatrix} X-1 & 1 \\ 0 & X \end{vmatrix} - \begin{vmatrix} 2 & -1 \\ X-1 & 1 \end{vmatrix} = X^3 - 2X^2 - 1.$$
Problem 8.45. Let T : V → V be a linear transformation on a finite dimensional vector space and let W be a subspace of V which is stable under T. Let T_1 be the restriction of T to W. Prove that χ_{T_1} divides χ_T.

Solution. Choose a basis w_1, ..., w_k of W and complete it to a basis w_1, ..., w_k, v_{k+1}, ..., v_n of V. Since W is stable under T, the matrix of T with respect to the basis w_1, ..., w_k, v_{k+1}, ..., v_n is of the form

$$\begin{pmatrix} A & \star \\ 0 & B \end{pmatrix},$$

where A ∈ M_k(F) is the matrix of T_1 with respect to w_1, ..., w_k. Using properties of block-determinants (more precisely Theorem 7.43) we obtain

χ_T(X) = χ_A(X)·χ_B(X)

and the result follows.
The previous problem allows us to make the precise link between characteristic polynomial and eigenspaces: by construction the eigenvalues of a matrix can be recovered as the roots in F of the characteristic polynomial, but it is not clear how to deal with their possible multiplicities. Actually, there are two different (and important) notions of multiplicity:

Definition 8.46. Let T : V → V be a linear transformation on a finite dimensional vector space V over F and let λ ∈ F be an eigenvalue of T.

a) The geometric multiplicity of λ is the dimension of the F-vector space Ker(λ·id − T).
b) The algebraic multiplicity of λ is the multiplicity of λ as a root of the characteristic polynomial χ_T of T (i.e., the largest integer j such that (X − λ)^j divides χ_T(X)).

Of course, we have similar definitions for the multiplicities of an eigenvalue of a matrix: if A ∈ M_n(F) and λ ∈ F is an eigenvalue of A, the algebraic multiplicity of λ is the multiplicity of λ as a root of χ_A, while the geometric multiplicity of λ is dim Ker(λI_n − A). A good exercise for the reader is to convince himself that if A is the matrix of a linear transformation T with respect to any basis of V, then the corresponding multiplicities of λ for A and for T are the same.

Remark 8.47. The algebraic multiplicity and the geometric multiplicity are not always equal: consider the matrix

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

It has 0 as an eigenvalue with geometric multiplicity 1: indeed the system AX = 0 is equivalent to x_2 = 0, thus Ker(A) is the line spanned by the first vector of the canonical basis of F². On the other hand, the characteristic polynomial of A is χ_A(X) = X², thus the algebraic multiplicity of 0 is 2. If the algebraic multiplicity of an eigenvalue λ coincides with its geometric multiplicity, we will simply refer to this common value as the multiplicity of λ.
Problem 8.48. Consider the matrix

$$A = \begin{pmatrix} 8 & -1 & -5 \\ -2 & 3 & 1 \\ 4 & -1 & -1 \end{pmatrix} \in M_3(\mathbb{R}).$$

a) Find the characteristic polynomial and the eigenvalues of A.
b) For each eigenvalue λ of A, find the algebraic and the geometric multiplicity of λ.

Solution. a) Adding the second and third columns to the first one yields

$$\chi_A(X) = \begin{vmatrix} X-8 & 1 & 5 \\ 2 & X-3 & -1 \\ -4 & 1 & X+1 \end{vmatrix} = \begin{vmatrix} X-2 & 1 & 5 \\ X-2 & X-3 & -1 \\ X-2 & 1 & X+1 \end{vmatrix} = (X-2)\begin{vmatrix} 1 & 1 & 5 \\ 1 & X-3 & -1 \\ 1 & 1 & X+1 \end{vmatrix}.$$

To compute the last determinant, subtract the first row from the second and the third row, then expand with respect to the first column. We obtain in the end

χ_A(X) = (X − 2)(X − 4)².

The eigenvalues of A are the real roots of χ_A, thus they are 2 and 4.

b) Since χ_A(X) = (X − 2)(X − 4)², it follows that 2 has algebraic multiplicity 1 and 4 has algebraic multiplicity 2. To find the geometric multiplicity of 2, we determine the 2-eigenspace by solving the system AX = 2X. The reader will check without difficulty that the system is equivalent to x = y = z (where x, y, z are the coordinates of X), thus the 2-eigenspace is one-dimensional and the geometric multiplicity of the eigenvalue 2 is 1 (we could have done this without any computation if we knew the theorem below). For the eigenvalue 4, we proceed similarly by solving the system AX = 4X. An easy computation shows that the system is equivalent to y = −x and z = x, thus the 4-eigenspace is also one-dimensional and so the geometric multiplicity of the eigenvalue 4 is also 1.
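The whole of Problem 8.48 can be verified mechanically (assuming SymPy, and the matrix entries as reconstructed above). Here eigenvects returns, for each eigenvalue, its algebraic multiplicity and a basis of its eigenspace, whose size is the geometric multiplicity:

```python
import sympy as sp

X = sp.symbols('X')
A = sp.Matrix([[8, -1, -5], [-2, 3, 1], [4, -1, -1]])
print(sp.factor(A.charpoly(X).as_expr()))   # (X - 2)*(X - 4)**2
for lam, alg_mult, basis in A.eigenvects():
    print(lam, alg_mult, len(basis))        # eigenvalue, algebraic, geometric
```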
As we have already seen, algebraic multiplicity and geometric multiplicity are not the same thing. The next result gives however precious information concerning the link between the two notions.

Theorem 8.49. Let A ∈ M_n(F) and let λ ∈ F be an eigenvalue of A. Then the geometric multiplicity of λ does not exceed its algebraic multiplicity. In particular, if the algebraic multiplicity of λ is 1, then its geometric multiplicity equals 1.

Proof. Let V = Fⁿ and let T be the linear map on V attached to A. Let W = ker(λI_n − A) = ker(λ·id − T). Then W is stable under T, thus by Problem 8.45 (and letting T|_W be the restriction of T to W) χ_{T|_W} divides χ_T. On the other hand, T|_W is simply multiplication by λ on W, thus

χ_{T|_W}(X) = (X − λ)^{dim W}.

It follows that (X − λ)^{dim W} divides χ_A(X) = χ_T(X) and the result follows.
The result established in the next problem is very important in applications:

Problem 8.50. Let A ∈ M_n(C) be a matrix with complex entries. Let Sp(A) be the set of eigenvalues of A (we call Sp(A) the spectrum of A) and, for λ ∈ Sp(A), let m_λ be the algebraic multiplicity of λ.

a) Explain the equality of polynomials

χ_A(X) = ∏_{λ∈Sp(A)} (X − λ)^{m_λ}.

b) Prove that

Tr(A) = Σ_{λ∈Sp(A)} m_λ·λ.

In other words, the trace of a complex matrix is the sum of its eigenvalues, counted with their algebraic multiplicities.

c) Prove that

det A = ∏_{λ∈Sp(A)} λ^{m_λ},

that is, the determinant of a matrix is the product of its eigenvalues, counted with their algebraic multiplicities.

Solution. a) It is clear by definition of algebraic multiplicities that ∏_{λ∈Sp(A)} (X − λ)^{m_λ} divides χ_A(X) (this holds for a matrix with coefficients in any field). To prove the opposite divisibility (which allows us to conclude, since both polynomials are monic), we will crucially exploit the fact that the matrix has complex entries and that C is algebraically closed. In particular, we know that χ_A splits in C[X] into a product of linear factors X − z. Any such z is an eigenvalue of A, since det(zI_n − A) = 0. Hence z ∈ Sp(A) and by definition its multiplicity as a root of χ_A(X) is m_z. The result follows.

b) The coefficient of X^{n−1} in ∏_{λ∈Sp(A)} (X − λ)^{m_λ} is −Σ_{λ∈Sp(A)} m_λ·λ. On the other hand, the coefficient of X^{n−1} in χ_A equals −Tr(A) by Problem 8.37. The result follows from a).

c) Taking X = 0 in the equality established in a) and using the fact that χ_A(0) = (−1)ⁿ·det A and that Σ_{λ∈Sp(A)} m_λ = n, we obtain

(−1)ⁿ·det A = χ_A(0) = ∏_{λ∈Sp(A)} (−λ)^{m_λ} = (−1)ⁿ·∏_{λ∈Sp(A)} λ^{m_λ}.

The result follows by dividing by (−1)ⁿ.
Remark 8.51. If we replace C with R or Q the result is completely false: it may even happen that Sp(A) is empty! Indeed, consider for instance the matrix

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
Here is a nice application of the previous problem.

Problem 8.52. a) Let A ∈ M_n(R) be a matrix such that

A² − 3A + 2I_n = 0.

Prove that det A ∈ {1, 2, 4, ..., 2ⁿ}.

b) Let k ∈ {1, 2, 4, ..., 2ⁿ}. Construct a matrix A ∈ M_n(R) such that A² − 3A + 2I_n = 0 and det A = k.

Solution. a) By Problem 8.27, for any complex eigenvalue λ of A we have λ² − 3λ + 2 = 0, that is (λ − 1)(λ − 2) = 0. It follows that each complex eigenvalue of A is either 1 or 2. Since det A is the product of all complex eigenvalues of A (counted with their algebraic multiplicities), the result follows.

b) Write k = 2^p with p ∈ {0, 1, ..., n}. Then a diagonal matrix A having p diagonal entries equal to 2 and the other diagonal entries equal to 1 is a solution of the problem.
8.4.1 Problems for Practice
1. Find the characteristic polynomial and the eigenvalues of the matrix

$$A = \begin{pmatrix} 3 & 0 & 1 \\ 2 & 4 & 2 \\ 1 & 0 & 3 \end{pmatrix} \in M_3(\mathbb{R}).$$

2. Find the characteristic polynomial and the eigenvalues of the matrix

$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \in M_4(\mathbb{F}_2).$$

3. a) Give an example of a matrix A ∈ M_4(R) whose characteristic polynomial equals X⁴ − X³ + 1.
b) Is there a matrix A ∈ M_3(Q) whose characteristic polynomial equals X³ − √2? Give an example of such a matrix in M_3(R).
4. For each of the matrices below, compute its characteristic and minimal polynomial:

a) $$A = \begin{pmatrix} 1 & 3 \\ 2 & 1 \end{pmatrix}$$

b) $$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & 0 & 3 \end{pmatrix}$$

Then find their eigenvalues and the corresponding eigenspaces, by considering these matrices as matrices with rational entries. Then do the same by considering these matrices as matrices with real (and finally with complex) entries.
5. Let n ≥ 2 and let

$$A = \begin{pmatrix} 1 & 2 & 2 & \cdots & 2 & 2 \\ 2 & 1 & 2 & \cdots & 2 & 2 \\ 2 & 2 & 1 & \cdots & 2 & 2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 2 & 2 & 2 & \cdots & 2 & 1 \end{pmatrix} \in M_n(\mathbb{R}).$$

a) Compute the minimal polynomial and the characteristic polynomial of A.
b) Describe the eigenvalues of A and the corresponding eigenspaces.
6. a) Let A ∈ M_n(R) be the matrix associated with the projection of Rⁿ onto a subspace W along a complementary subspace of W. Compute the characteristic polynomial of A in terms of n and dim W.
b) Answer the same question assuming that A is the matrix associated with the symmetry with respect to a subspace W along a complementary subspace of W.
7. Consider the following three 5 × 5 nilpotent matrices:

$$A = \begin{pmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \end{pmatrix}, \quad B = \begin{pmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \end{pmatrix}, \quad C = \begin{pmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&1 \\ 0&0&0&0&0 \end{pmatrix}.$$

Since these matrices are nilpotent, they all have characteristic polynomial χ_A(X) = χ_B(X) = χ_C(X) = X⁵.

a) Compute the minimal polynomials of these matrices and use them to show that A is not similar to either B or C.
b) Compute the dimensions of the kernels of these matrices and use them to show that B is not similar to A or C.

8. Let A ∈ M_n(R) be a matrix such that A³ + I_n = 0. Prove that Tr(A) is an integer.
9. Prove that any matrix A ∈ M_n(R) is the sum of two invertible matrices.
10. Let A ∈ M_n(C) be an invertible matrix. Prove that for all x ≠ 0 we have

χ_{A^{−1}}(x) = (xⁿ/χ_A(0))·χ_A(1/x).

11. Let A = [a_ij] ∈ M_n(C) and let Ā = [ā_ij] be the matrix whose entries are the complex conjugates of the entries of A. Prove that the characteristic polynomial of AĀ has real coefficients. Hint: use the fact that AĀ and ĀA have the same characteristic polynomial.
12. Let A ∈ M_{n,p}(C) and B ∈ M_{p,n}(C).

a) Prove the following identities for x ∈ C:

$$\begin{pmatrix} xI_n & A \\ B & I_p \end{pmatrix}\begin{pmatrix} I_n & O_{n,p} \\ -B & I_p \end{pmatrix} = \begin{pmatrix} xI_n - AB & A \\ O_{p,n} & I_p \end{pmatrix}$$

and

$$\begin{pmatrix} I_n & O_{n,p} \\ -B & xI_p \end{pmatrix}\begin{pmatrix} xI_n & A \\ B & I_p \end{pmatrix} = \begin{pmatrix} xI_n & A \\ O_{p,n} & xI_p - BA \end{pmatrix}.$$

b) Deduce that

X^p·χ_{AB}(X) = Xⁿ·χ_{BA}(X).
13. Let A and B be matrices in M_3(C). Show that

det(AB − BA) = (1/3)·Tr[(AB − BA)³].

Hint: if a, b, c are the eigenvalues of AB − BA, prove that a + b + c = 0 and then that

a³ + b³ + c³ = 3abc.

14. Prove that for all A, B ∈ M_n(C)

deg(det(XA + B)) ≤ rank(A).

Hint: if r is the rank of A, start by reducing the problem to the case A = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} ∈ M_n(C).
15. Let A, B, C and D be square matrices in M_n(C) and let

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \in M_{2n}(\mathbb{C}).$$

a) Assume that DC = CD and that D is invertible. Check the identity

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} D & O_n \\ -C & I_n \end{pmatrix} = \begin{pmatrix} AD - BC & B \\ O_n & D \end{pmatrix}$$

and deduce that

det M = det(AD − BC).

b) Assume that DC = CD, but not necessarily that D is invertible. Prove that

det M = det(AD − BC).

Hint: consider the matrix D_x = xI_n + D with x ∈ C.

c) By considering the matrices

$$A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

prove that the result in part b) no longer holds if we drop the hypothesis CD = DC.
16. a) Find two matrices A, B ∈ M_4(R) with the same characteristic and minimal polynomial, but which are not similar.
b) Can we find two such matrices in M_2(R)?
17. Let A = [a_ij] ∈ M_n(C) and let s_k be the sum of all k × k principal minors of A (thus s_1 is the sum of the diagonal entries of A, that is Tr(A), while s_n is det A). Prove that

χ_A(X) = Xⁿ − s_1X^{n−1} + s_2X^{n−2} − ⋯ + (−1)ⁿ·s_n.

Hint: use the multilinear character of the determinant map.
18. Let V = M_n(R) and consider the linear transformation T : V → V defined by

T(A) = A + Tr(A)·I_n.

a) Prove that V is the direct sum of the eigenspaces of T.
b) Compute the characteristic polynomial of T.
19. Let V = M_n(R) and consider the linear transformation T : V → V sending A to ᵗA. Find the characteristic polynomial of T. Hint: what is T ∘ T?
8.5 The Cayley–Hamilton Theorem
We now reach a truly beautiful result: any matrix is killed by its characteristic polynomial. Recall that χ_A denotes the characteristic polynomial of A ∈ M_n(F).

Theorem 8.53 (Cayley–Hamilton). For all matrices A ∈ M_n(F) we have

χ_A(A) = O_n.

In other words, if χ_A(X) = Xⁿ + a_{n−1}X^{n−1} + ⋯ + a_0, then

Aⁿ + a_{n−1}A^{n−1} + ⋯ + a_1A + a_0I_n = O_n.

There are quite a few (at least 30...) different proofs of this result, none of them straightforward. The reader should therefore start by finding the error in the following classical, but unfortunately wrong argument: since χ_A(X) = det(XI_n − A), we have

χ_A(A) = det(AI_n − A) = det(A − A) = det(O_n) = 0.
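Before the proofs, a numerical illustration of the theorem on a concrete matrix (assuming SymPy; the matrix is our example). We evaluate χ_A at A by Horner's scheme and obtain the zero matrix:

```python
import sympy as sp

X = sp.symbols('X')
A = sp.Matrix([[1, 2, 0], [3, -1, 4], [0, 5, 2]])
chi = A.charpoly(X)
M = sp.zeros(3)
for c in chi.all_coeffs():       # leading coefficient first: Horner evaluation at A
    M = M * A + c * sp.eye(3)
print(M)                         # Matrix([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
```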
Before moving to the rather technical proofs of the previous theorem, we take a break and focus on some applications:

Problem 8.54. Let A ∈ M_n(F). Prove that the minimal polynomial of A divides the characteristic polynomial of A.

Solution. Since χ_A annihilates A by the Cayley–Hamilton theorem, it follows that μ_A divides χ_A.
Problem 8.55. Let A ∈ M_n(F) be an invertible matrix. Prove that there are scalars a_0, ..., a_{n−1} ∈ F such that

A^{−1} = a_0I_n + a_1A + ⋯ + a_{n−1}A^{n−1}.

Solution. The characteristic polynomial of A is of the form Xⁿ + b_{n−1}X^{n−1} + ⋯ + b_1X + b_0, with b_0 = (−1)ⁿ·det A nonzero. By the Cayley–Hamilton theorem

Aⁿ + b_{n−1}A^{n−1} + ⋯ + b_1A + b_0I_n = O_n.

Multiplying by (1/b_0)·A^{−1} we obtain

(1/b_0)A^{n−1} + (b_{n−1}/b_0)A^{n−2} + ⋯ + (b_1/b_0)I_n + A^{−1} = O_n.

Thus we can take

a_0 = −b_1/b_0, a_1 = −b_2/b_0, ..., a_{n−1} = −1/b_0.
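Problem 8.55 translates directly into an inversion-free recipe for A^{−1}. A minimal sketch (assuming SymPy; the example matrix is ours):

```python
import sympy as sp

X = sp.symbols('X')
A = sp.Matrix([[2, 1], [1, 1]])
b = A.charpoly(X).all_coeffs()   # [1, b_{n-1}, ..., b_1, b_0], with b_0 != 0 here
acc = sp.zeros(A.rows)
for c in b[:-1]:                 # Horner scheme for A**(n-1) + b_{n-1}A**(n-2) + ... + b_1*I
    acc = acc * A + c * sp.eye(A.rows)
A_inv = -acc / b[-1]             # multiply the Cayley-Hamilton relation by A**(-1)/b_0
print(A_inv == A.inv(), A_inv)   # True, Matrix([[1, -1], [-1, 2]])
```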
Problem 8.56. Let A ∈ M_n(C) be a matrix. Prove that the following statements are equivalent:

a) A is nilpotent (recall that this means that A^k = O_n for some k ≥ 1).
b) The characteristic polynomial of A is Xⁿ.
c) Aⁿ = O_n.
d) The minimal polynomial of A is of the form X^k for some k ≥ 1.

Solution. The fact that a) implies b) follows directly from part a) of Problem 8.38. That b) implies c) is a direct consequence of the Cayley–Hamilton theorem. If c) holds, then Xⁿ kills A, thus the minimal polynomial of A divides Xⁿ and is monic, thus necessarily of the form X^k for some k ≥ 1, proving that c) implies d). Finally, since the minimal polynomial of A kills A, it is clear that d) implies a).
We will give two proofs of the Cayley–Hamilton theorem in this section. Neither of them really explains clearly what is happening (the second one does a much better job than the first from this point of view), but with the technology we have developed so far, we cannot do any better. We will see later on a much better proof (which unfortunately works only when F ⊂ C, even though one can actually deduce the general theorem from this case), reducing the theorem via a subtle but very useful density argument to the case of diagonal matrices (which is immediate).
Let us now give the first proof of the Cayley–Hamilton theorem. Let A ∈ M_n(F) and let B = XI_n − A ∈ M_n(K), where K = F(X) is the field of rational fractions in the variable X, with coefficients in F (an element of K is a quotient A/B, where A, B ∈ F[X] and B ≠ 0). Consider the adjugate matrix C = adj(B) of B. Its entries are given by determinants of (n − 1) × (n − 1) matrices whose entries are polynomials of degree at most 1 in X. Thus each entry of C is a polynomial of degree at most n − 1 in X, with coefficients in F. Let

c_ij = c_ij^{(0)} + c_ij^{(1)}X + ⋯ + c_ij^{(n−1)}X^{n−1}

be the (i, j)-entry of C, with c_ij^{(0)}, ..., c_ij^{(n−1)} ∈ F. Let C^{(k)} be the matrix whose entries are the c_ij^{(k)}. Then

C = C^{(0)} + C^{(1)}X + ⋯ + C^{(n−1)}X^{n−1}.

Next, recall that

B·C = B·adj(B) = det B·I_n = χ_A(X)·I_n.

Thus we have

(XI_n − A)(C^{(0)} + C^{(1)}X + ⋯ + C^{(n−1)}X^{n−1}) = χ_A(X)·I_n.
Writing χ_A(X) = Xⁿ + u_{n−1}X^{n−1} + ⋯ + u_0 ∈ F[X], the previous equality becomes

−AC^{(0)} + (C^{(0)} − AC^{(1)})X + (C^{(1)} − AC^{(2)})X² + ⋯ + (C^{(n−2)} − AC^{(n−1)})X^{n−1} + C^{(n−1)}Xⁿ = u_0I_n + u_1I_nX + ⋯ + u_{n−1}I_nX^{n−1} + I_nXⁿ.

Identifying coefficients yields

−AC^{(0)} = u_0I_n,  C^{(0)} − AC^{(1)} = u_1I_n, ...,  C^{(n−2)} − AC^{(n−1)} = u_{n−1}I_n,  C^{(n−1)} = I_n.

Dealing with these relations by starting with the last one and working backwards yields

C^{(n−1)} = I_n,  C^{(n−2)} = A + u_{n−1}I_n,  C^{(n−3)} = A² + u_{n−1}A + u_{n−2}I_n,

and an easy induction gives

C^{(n−j−1)} = A^j + u_{n−1}A^{j−1} + ⋯ + u_{n−j}I_n.

In particular

C^{(0)} = A^{n−1} + u_{n−1}A^{n−2} + ⋯ + u_1I_n.

Combining this with the relation −AC^{(0)} = u_0I_n finally yields

Aⁿ + u_{n−1}A^{n−1} + ⋯ + u_0I_n = O_n,

that is χ_A(A) = O_n.

As the reader can easily observe, though rather long, the proof is fairly elementary and based on very simple manipulations. It is not very satisfactory, however, since it does not really show why the theorem holds.
We turn now to the second proof of the Cayley–Hamilton theorem. We will actually prove the following result, which is clearly equivalent (via the choice of a basis) to the Cayley–Hamilton theorem.

Theorem 8.57. Let V be a finite dimensional vector space over F and let T : V → V be a linear map. Then χ_T(T) = 0.

Proof. The idea is to reduce the problem to linear maps for which we can compute χ_T easily. The details are a little bit more complicated than this might suggest...

Fix x ∈ V. If m is a nonnegative integer, let

W_m = Span(T⁰(x), T¹(x), ..., T^m(x)).

Note that W_0 ⊂ W_1 ⊂ ⋯ ⊂ V and that dim W_m ≤ dim W_{m+1} ≤ dim V for all m ≥ 0. Hence there must be some least m such that dim W_{m−1} = dim W_m. Since W_{m−1} ⊂ W_m, we must have W_{m−1} = W_m; in other words, T^m(x) lies in the subspace W_{m−1} and we can write T^m(x) as a linear combination of the T^k(x) for 0 ≤ k < m, say

T^m(x) = Σ_{k=0}^{m−1} a_kT^k(x).

Note that this implies that W_{m−1} is stable under T. Since m is minimal, the vectors T⁰(x), ..., T^{m−1}(x) must be linearly independent (a linear dependence among them would express a lower iterate as a linear combination of earlier iterates). Therefore they form a basis of W_{m−1}, and with respect to this basis the matrix of T_1 = T|_{W_{m−1}} is

$$A = \begin{pmatrix} 0 & 0 & \cdots & 0 & a_0 \\ 1 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & \cdots & 0 & a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & a_{m-1} \end{pmatrix}.$$

The characteristic polynomial of this matrix was computed in Problem 8.39 and it equals X^m − a_{m−1}X^{m−1} − ⋯ − a_0. Hence

χ_{T_1}(T)(x) = T^m(x) − Σ_{k=0}^{m−1} a_kT^k(x) = 0.

By Problem 8.45, since W_{m−1} is T-stable, the characteristic polynomial χ_{T_1} of T restricted to W_{m−1} divides χ_T. Therefore χ_T(T)(x) = 0. Since x was arbitrary, we conclude that χ_T(T) vanishes when applied to any vector, that is, it is the zero linear map.
8.5.1 Problems for Practice
1. Prove that for any A = [a_ij] ∈ M_3(C) we have

A³ − Tr(A)·A² + Tr(adj A)·A − (det A)·I_3 = 0.

2. Let A ∈ M_3(R) be a matrix such that

Tr(A) = Tr(A²) = 0.

Prove that A³ = αI_3 for some real number α.
3. Let A, B ∈ M_3(C) be matrices such that the traces of AB and (AB)² are both 0. Prove that (AB)³ = (BA)³.
4. Let A, B, C ∈ M_n(C) be matrices such that AC = CB and C ≠ O_n.
a) Prove that for all polynomials P ∈ C[X] we have

P(A)·C = C·P(B).

b) By choosing a suitable polynomial P and using the Cayley–Hamilton theorem, deduce that A and B have a common eigenvalue.
5. Let A, B ∈ M_n(C) be matrices such that (AB)ⁿ = O_n. Prove that (BA)ⁿ = O_n. Hint: prove first that (BA)^{n+1} = O_n, then use Problem 8.56.
6. Let A ∈ M_n(C) be a matrix such that A and 3A are similar. Prove that Aⁿ = O_n. Hint: similar matrices have the same characteristic polynomial. Also use Problem 8.56.
7. Let A ∈ M_n(C). Prove that Aⁿ = O_n if and only if Tr(A^k) = 0 for all k ≥ 1. Hint: to establish the harder direction, prove that all eigenvalues of A must be 0 and use Problem 8.56.
8. Let V be a vector space of dimension n over a field F and let T : V → V be a linear transformation. The goal of this problem is to prove that the following assertions are equivalent:

i) There exists a vector x ∈ V such that x, T(x), ..., T^{n−1}(x) forms a basis of V.
ii) The minimal polynomial and the characteristic polynomial of T coincide.

a) Assume that i) holds. Use Problem 8.14 to prove that deg μ_T ≥ n and conclude that ii) holds using the Cayley–Hamilton theorem.
b) Assume that ii) holds. Using Problems 8.13 and 8.14, explain why we can find x ∈ V such that x, T(x), T²(x), ... span V. Conclude that i) holds.
9. Let n ≥ 1 and let A, B ∈ M_n(Z) be matrices with integer entries. Suppose that det A and det B are relatively prime. Prove that we can find matrices U, V ∈ M_n(Z) such that AU + BV = I_n.
Chapter 9
Diagonalizability

Abstract The main focus is on diagonalizable matrices, that is, matrices similar to a diagonal one. We completely characterize these matrices and use this to complete the proof of Jordan's classification theorem for arbitrary matrices with complex entries. Along the way, we prove that diagonalizable matrices with complex entries are dense and use this to give a clean proof of the Cayley–Hamilton theorem.

Keywords Diagonalizable • Trigonalizable • Jordan block • Jordan's classification
In this chapter we will apply the results obtained in the previous chapter to study matrices which are as close as possible to diagonal ones. The diagonal matrices are fairly easy to understand, and so are matrices similar to diagonal matrices. These are called diagonalizable matrices and play a fundamental role in linear algebra. For instance, we will prove that diagonalizable matrices form a dense subset of M_n(C) (i.e., any matrix in M_n(C) can be approximated to arbitrary precision by a diagonalizable matrix) and we will use this result to give a very simple proof of the Cayley–Hamilton theorem over C, by reducing it to the case of diagonal matrices (which is trivial). Also, we will prove that any matrix A ∈ M_n(C) is the commuting sum of a nilpotent and of a diagonalizable matrix, showing once more the importance of diagonalizable (and nilpotent) matrices. We then use the classification of nilpotent matrices obtained in the chapter concerned with duality to prove the general form of Jordan's theorem, classifying all matrices in M_n(C) up to similarity. Along the way, we give applications to the resolution of linear differential equations (of any order) with constant coefficients, as well as to linear recurrence sequences.

A large part of the chapter is devoted to finding intrinsic properties and characterizations of diagonalizable matrices. In this chapter F will be a field, but the reader will not lose anything by assuming that F is either R or C.
9.1 Upper-Triangular Matrices, Once Again
Recall that a matrix $A = [a_{ij}] \in M_n(F)$ is called upper-triangular if $a_{ij} = 0$ whenever $i > j$, that is, all entries of $A$ below the main diagonal are zero. We have already established quite a few results about upper-triangular matrices, which make this class of matrices rather easy to understand. For instance, we have already seen that the upper-triangular matrices form a vector subspace of $M_n(F)$ which is closed under multiplication. Moreover, it is easy to compute the eigenvalues of an upper-triangular matrix: simply look at the diagonal entries! It is therefore easy to compute the characteristic polynomial of such a matrix: if $A = [a_{ij}]$ is an upper-triangular matrix, then its characteristic polynomial is
\[ \chi_A(X) = \prod_{i=1}^{n}(X - a_{ii}). \]
Before dealing with diagonalizable matrices, we will focus on the trigonalizable ones, i.e., matrices $A \in M_n(F)$ which are similar to an upper-triangular matrix. We will need an important definition:
Definition 9.1. A polynomial $P \in F[X]$ is split over $F$ if it is of the form
\[ P(X) = c(X - a_1)\cdots(X - a_n) \]
for some scalars $c, a_1, \ldots, a_n \in F$ (not necessarily distinct).
For instance, $X^2 + 1$ is not split over $\mathbb{R}$ since it has no real root, but it is split over $\mathbb{C}$, since $X^2 + 1 = (X + i)(X - i)$. On the other hand, the polynomial $X^2 - 3X + 2$ is split over $\mathbb{R}$, since it factors as $(X-1)(X-2)$. It is pointless to look for a polynomial in $\mathbb{C}[X]$ which is not split, due to the following amazing theorem of Gauss:
Theorem 9.2 (The Fundamental Theorem of Algebra). Any polynomial $P \in \mathbb{C}[X]$ is split over $\mathbb{C}$.
This theorem is usually stated as: $\mathbb{C}$ is an algebraically closed field, that is, any nonconstant polynomial equation with complex coefficients has at least one complex
solution. The previous theorem is actually equivalent to this usual version of Gauss’
theorem (and it is a good exercise for the reader to prove the equivalence of these
two statements).
By the previous discussion, the characteristic polynomial of an upper-triangular
matrix is split over F . Since the characteristic polynomials of two similar matrices
are equal, we deduce that the characteristic polynomial of any trigonalizable matrix
is split over F .
Problem 9.3. Give an example of a matrix $A \in M_2(\mathbb{R})$ which is not trigonalizable in $M_2(\mathbb{R})$.
Solution. Since the characteristic polynomial of a trigonalizable matrix is split over $\mathbb{R}$, it suffices to find a matrix $A \in M_2(\mathbb{R})$ whose characteristic polynomial is not split over $\mathbb{R}$. Consider the matrix
\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]
Its characteristic polynomial is $X^2 + 1$, which is not split over $\mathbb{R}$. Thus $A$ is not trigonalizable in $M_2(\mathbb{R})$. $\square$
The following fundamental theorem gives an intrinsic characterization of trigonalizable matrices.
Theorem 9.4. Let $A \in M_n(F)$ be a matrix. Then the following assertions are equivalent:
a) The characteristic polynomial of $A$ is split over $F$.
b) $A$ is similar to an upper-triangular matrix in $M_n(F)$.
Proof. The discussion preceding the theorem shows that b) implies a). We will prove the converse by induction on $n$. It is clearly true for $n = 1$, so assume that $n \geq 2$ and that the statement holds for $n-1$.
Choose a root $\lambda \in F$ of the characteristic polynomial $\chi_A$ of $A$ (we can do it thanks to the hypothesis that $\chi_A$ is split over $F$), and choose a nonzero vector $v_1 \in F^n$ such that $Av_1 = \lambda v_1$. Since $v_1 \neq 0$, we can complete $v_1$ to a basis $v_1, \ldots, v_n$ of $V = F^n$. The matrix of the linear transformation $T$ attached to $A$ with respect to the basis $v_1, \ldots, v_n$ is of the form
\[ \begin{pmatrix} \lambda & \star \\ 0 & B \end{pmatrix} \]
for some $B \in M_{n-1}(F)$. Thus we can find an invertible matrix $P_1$ such that
\[ P_1AP_1^{-1} = \begin{pmatrix} \lambda & \star \\ 0 & B \end{pmatrix}. \]
Since similar matrices have the same characteristic polynomial, we obtain
\[ \chi_A(X) = \chi_{P_1AP_1^{-1}}(X) = (X - \lambda)\,\chi_B(X), \]
the last equality being a consequence of Theorem 7.43. It follows that $\chi_B$ is split over $F$. Since $B \in M_{n-1}(F)$, we can apply the inductive hypothesis and find an invertible matrix $Q \in M_{n-1}(F)$ such that $QBQ^{-1}$ is upper-triangular. Let
\[ P_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix}; \]
then $P_2 \in M_n(F)$ is invertible (again by Theorem 7.43 we have $\det P_2 = \det Q \neq 0$) and
\[ P_2(P_1AP_1^{-1})P_2^{-1} = \begin{pmatrix} \lambda & \star \\ 0 & QBQ^{-1} \end{pmatrix} \]
is upper-triangular. Setting $P = P_2P_1$, the matrix $PAP^{-1}$ is upper-triangular, as desired. $\square$
Combining the previous two theorems, we obtain the following very important result:
Corollary 9.5. For any matrix $A \in M_n(\mathbb{C})$ we can find an invertible matrix $P \in M_n(\mathbb{C})$ and an upper-triangular matrix $T \in M_n(\mathbb{C})$ such that $A = PTP^{-1}$. Thus any matrix $A \in M_n(\mathbb{C})$ is trigonalizable in $M_n(\mathbb{C})$.
Proof. By Gauss' theorem the characteristic polynomial $\chi_A$ of $A$ is split over $\mathbb{C}$. The result follows from Theorem 9.4. $\square$
As a beautiful application of Corollary 9.5, let us give yet another proof of the Cayley–Hamilton theorem for matrices in $M_n(\mathbb{C})$ (the result applies of course to matrices in $M_n(\mathbb{Q})$ or $M_n(\mathbb{R})$). Recall that this theorem says that $\chi_A(A) = O_n$ for any matrix $A \in M_n(\mathbb{C})$, where $\chi_A$ is the characteristic polynomial of $A$. We will prove this in two steps: first, we reduce to the case when $A$ is upper-triangular; then we prove the theorem in this case by a straightforward argument.
Let $A \in M_n(\mathbb{C})$ be a matrix and let $P$ be an invertible matrix such that the matrix $T = PAP^{-1}$ is upper-triangular. We want to prove that $\chi_A(A) = O_n$, but
\[ \chi_A(A) = \chi_A(P^{-1}TP) = P^{-1}\chi_A(T)P = P^{-1}\chi_T(T)P, \]
the last equality being a consequence of the fact that $A$ and $T$ are similar, thus have the same characteristic polynomial. Hence it suffices to prove that $\chi_T(T) = O_n$; in other words, we may and will assume that $A$ is upper-triangular.
Let $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{C}^n$ and consider the polynomials
\[ Q_k(X) = \prod_{i=1}^{k}(X - a_{ii}), \]
so that $Q_n = \chi_A$ (since $A$ is upper-triangular). We claim that $Q_k(A)e_i = 0$ for $1 \leq i \leq k$ and for all $1 \leq k \leq n$. Accepting this for a moment, it follows that $Q_n(A)e_i = 0$ for all $1 \leq i \leq n$, that is $\chi_A(A)e_i = 0$ for all $1 \leq i \leq n$, which is exactly saying that $\chi_A(A) = O_n$.
It remains to prove the claim, and we will do this by induction on $k$. If $k = 1$, we need to check that $Q_1(A)e_1 = 0$, that is $(A - a_{11}I_n)e_1 = 0$, or equivalently that the first column of $A - a_{11}I_n$ is zero, which is clear since $A$ is upper-triangular. Assume now that $Q_k(A)e_i = 0$ for $1 \leq i \leq k$, and let us prove that $Q_{k+1}(A)e_i = 0$ for $1 \leq i \leq k+1$. If $1 \leq i \leq k$, then $Q_k(A)e_i = 0$ yields
\[ Q_{k+1}(A)e_i = (A - a_{k+1,k+1}I_n)Q_k(A)e_i = 0. \]
If $i = k+1$, then
\[ Q_{k+1}(A)e_{k+1} = Q_k(A)(A - a_{k+1,k+1}I_n)e_{k+1} = Q_k(A)\big(Ae_{k+1} - a_{k+1,k+1}e_{k+1}\big) = \sum_{i=1}^{k} a_{i,k+1}Q_k(A)e_i = 0, \]
since $Q_k(A)e_i = 0$ for $1 \leq i \leq k$. The inductive step is established and the claim is proved.
Problem 9.6. Let $A \in M_n(\mathbb{C})$ and let $Q \in \mathbb{C}[X]$ be a polynomial. If the characteristic polynomial of $A$ equals $\prod_{i=1}^{n}(X - \lambda_i)$, prove that the characteristic polynomial of $Q(A)$ equals $\prod_{i=1}^{n}(X - Q(\lambda_i))$.
Solution. By the previous corollary we can write $A = PTP^{-1}$ for some $P \in GL_n(\mathbb{C})$ and some upper-triangular matrix $T$. The characteristic polynomial of $T$ is the same as that of $A$, and it is also equal to $\prod_{i=1}^{n}(X - t_{ii})$ if $T = [t_{ij}]$. Thus the diagonal entries of $T$ are $\lambda_1, \ldots, \lambda_n$ (up to a permutation). Next, $Q(A) = PQ(T)P^{-1}$ and the characteristic polynomial of $Q(A)$ is the same as that of $Q(T)$. But $Q(T)$ is again upper-triangular, with diagonal entries $Q(\lambda_1), \ldots, Q(\lambda_n)$, so
\[ \chi_{Q(A)}(X) = \chi_{Q(T)}(X) = \prod_{i=1}^{n}(X - Q(\lambda_i)). \quad \square \]
Problem 9.7. Let $A \in M_n(\mathbb{C})$ have eigenvalues $\lambda_1, \ldots, \lambda_n$ (counted with their algebraic multiplicities). Prove that for all $Q \in \mathbb{C}[X]$ we have
\[ \det Q(A) = \prod_{i=1}^{n} Q(\lambda_i), \qquad \operatorname{Tr}(Q(A)) = \sum_{i=1}^{n} Q(\lambda_i). \]
Solution. Simply combine the previous problem with Problem 8.50. $\square$
9.1.1 Problems for Practice
1. For each of the following matrices decide whether $A$ is trigonalizable over $\mathbb{R}$ or not:
a) $A = \begin{pmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 0 & 1 & 1 \end{pmatrix}$
b) $A = \begin{pmatrix} 1 & 4 \\ 2 & 5 \end{pmatrix}$
2. Find all real numbers $x$ for which the matrix $A = \begin{pmatrix} x & 1 \\ x-1 & 2+x \end{pmatrix}$ is trigonalizable in $M_2(\mathbb{R})$.
3. Find an upper-triangular matrix which is similar to the matrix $\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}$.
4. Find an upper-triangular matrix which is similar to the matrix $\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix}$.
5. A matrix $A \in M_3(\mathbb{C})$ has eigenvalues $1$, $2$, $-1$. Find the trace and the determinant of $A^3 + 2A + I_3$.
6. Let $A \in M_n(F)$ be a matrix. Prove that $A$ is nilpotent if and only if $A$ is similar to an upper-triangular matrix all of whose diagonal entries are $0$.
7. Let $A, B \in M_n(\mathbb{C})$ be matrices such that $AB = BA$.
a) Prove that each eigenspace of $B$ is stable under the linear transformation attached to $A$.
b) Deduce that $A$ and $B$ have a common eigenvector.
c) Prove by induction on $n$ that there is an invertible matrix $P$ such that $PAP^{-1}$ and $PBP^{-1}$ are both upper-triangular.
8. Let $A, B \in M_n(\mathbb{C})$ be two matrices. Recall that the Kronecker or tensor product of $A$ and $B$ is the matrix $A \otimes B \in M_{n^2}(\mathbb{C})$ defined by
\[ A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \ldots & a_{1n}B \\ a_{21}B & a_{22}B & \ldots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}B & a_{n2}B & \ldots & a_{nn}B \end{pmatrix}. \]
We recall that
\[ (A \otimes B)(A' \otimes B') = (AA') \otimes (BB') \]
for all matrices $A, A', B, B' \in M_n(\mathbb{C})$.
a) Consider two invertible matrices $P, Q$ such that $P^{-1}AP$ and $Q^{-1}BQ$ are upper-triangular. Prove that $(P \otimes Q)^{-1}(A \otimes B)(P \otimes Q)$ is also upper-triangular and describe its diagonal entries in terms of the eigenvalues of $A$ and $B$.
b) Deduce that if
\[ \chi_A(X) = \prod_{i=1}^{n}(X - \lambda_i) \quad \text{and} \quad \chi_B(X) = \prod_{i=1}^{n}(X - \mu_i), \]
then
\[ \chi_{A \otimes B}(X) = \prod_{i=1}^{n}\prod_{j=1}^{n}(X - \lambda_i\mu_j). \]
9.2 Diagonalizable Matrices and Linear Transformations
Diagonal matrices are fairly easy to understand and study. In this section we study
those matrices which are as close as possible to being diagonal: the matrices
which are similar to a diagonal matrix. We fix a field F . All vector spaces will
be considered over F and will be finite-dimensional.
Definition 9.8. a) A matrix $A \in M_n(F)$ is called diagonalizable if it is similar to a diagonal matrix in $M_n(F)$.
b) A linear transformation $T: V \to V$ on a vector space $V$ is called diagonalizable if its matrix in some basis of $V$ is diagonal.
Thus a matrix $A \in M_n(F)$ is diagonalizable if and only if we can write
\[ A = PDP^{-1} \]
for some invertible matrix $P \in M_n(F)$ and some diagonal matrix $D = [d_{ij}] \in M_n(F)$.
Note that any matrix which is similar to a diagonalizable matrix is itself
diagonalizable. In particular, if T is a diagonalizable linear transformation, then
the matrix of T with respect to any basis of V is still diagonalizable (but not
diagonal in general).
We can give a completely intrinsic characterization of diagonalizable linear
transformations, with no reference to a choice of basis or to matrices:
Theorem 9.9. A linear transformation $T: V \to V$ on a vector space $V$ is diagonalizable if and only if there is a basis of $V$ consisting of eigenvectors of $T$.
Proof. Suppose that $T$ is diagonalizable. Thus there is a basis $v_1, \ldots, v_n$ of $V$ such that the matrix $A$ of $T$ with respect to this basis is diagonal. If $(a_{ii})_{1 \leq i \leq n}$ are the diagonal entries of $A$, then by definition $T(v_i) = a_{ii}v_i$ for all $1 \leq i \leq n$, thus $v_1, \ldots, v_n$ is a basis of $V$ consisting of eigenvectors for $T$.
Conversely, suppose that there is a basis $v_1, \ldots, v_n$ of $V$ consisting of eigenvectors for $T$. If $T(v_i) = d_iv_i$, then the matrix of $T$ with respect to $v_1, \ldots, v_n$ is diagonal, thus $T$ is diagonalizable. $\square$
Remark 9.10. One can use these ideas to find an explicit way to diagonalize a matrix $A$. If $A \in M_n(F)$ is diagonalizable, then we find a basis of $V = F^n$ consisting of eigenvectors and we let $P$ be the matrix whose columns are these basis vectors. Then $P^{-1}AP = D$ is diagonal and $A = PDP^{-1}$.
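For readers who like to experiment, here is a minimal numerical sketch of this recipe (a hedged illustration using Python's NumPy; the matrix below is our own arbitrary example, not one from the text):

```python
import numpy as np

# An arbitrary 2x2 example with distinct eigenvalues (2 and 5),
# hence diagonalizable.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues w and a matrix P whose
# columns are corresponding eigenvectors.
w, P = np.linalg.eig(A)
D = np.diag(w)

# Verify the factorization A = P D P^{-1}.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```

For a matrix that is not diagonalizable (such as the one in Problem 9.12 below with $a \neq 0$), the matrix of numerical eigenvectors is singular or severely ill-conditioned and this factorization check breaks down.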
Remark 9.11. Suppose that $A$ is diagonalizable and write $A = PDP^{-1}$ for some diagonal matrix $D$ and some invertible matrix $P$.
a) The characteristic polynomials of $A$ and $D$ are the same, since $A$ and $D$ are similar. We deduce that
\[ \prod_{i=1}^{n}(X - d_{ii}) = \chi_A(X). \]
In particular, the diagonal entries of $D$ are (up to a permutation) the eigenvalues of $A$ (counted with algebraic multiplicities). This is very useful in practice.
b) Let $\lambda$ be an eigenvalue of $A$. Then the algebraic multiplicity of $\lambda$ equals the number of indices $i \in [1,n]$ for which $d_{ii} = \lambda$ (this follows from a)). On the other hand, the geometric multiplicity of $\lambda$ as eigenvalue of $A$ or $D$ is the same (since $X \mapsto P^{-1}X$ induces an isomorphism between $\ker(\lambda I_n - A)$ and $\ker(\lambda I_n - D)$, thus these two spaces have the same dimension). But it is not difficult to see that the geometric multiplicity of $\lambda$ as eigenvalue of $D$ is the number of indices $i \in [1,n]$ for which $d_{ii} = \lambda$, since the system $DX = \lambda X$ is equivalent to the equations $(d_{ii} - \lambda)x_i = 0$ for $1 \leq i \leq n$. We conclude that for a diagonalizable matrix, the algebraic multiplicity of any eigenvalue equals its geometric multiplicity.
Problem 9.12. Show that
\[ A = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \]
is not diagonalizable when $a \neq 0$.
Solution. Suppose that $A$ is diagonalizable and write $A = PDP^{-1}$ with $P$ invertible and $D$ diagonal. Since $A$ is upper-triangular with diagonal entries equal to $1$, we deduce that the eigenvalues of $A$ are equal to $1$. By the previous remark the diagonal entries of $D$ must all be equal to $1$ and so $D = I_2$. But then $A = PI_2P^{-1} = I_2$, a contradiction. $\square$
Problem 9.13. Prove that the only nilpotent and diagonalizable matrix $A \in M_n(F)$ is the zero matrix.
Solution. Suppose that $A$ is diagonalizable and nilpotent, and write $A = PDP^{-1}$. By Problem 8.38 and the previous remark we obtain
\[ X^n = \chi_A(X) = \prod_{i=1}^{n}(X - d_{ii}). \]
Thus $d_{ii} = 0$ for all $i$, and then $D = O_n$ and $A = PO_nP^{-1} = O_n$. $\square$
The study of diagonalizable matrices is more involved than that of trigonalizable
ones. Before proving the main theorem characterizing diagonalizable matrices, we
will prove a technical result, which is extremely useful in other situations as well
(the reader will find two more beautiful applications of this result in the next
section).
Let $k > 1$ be an integer and let $P_1, \ldots, P_k$ be pairwise relatively prime polynomials in $F[X]$. Denote by $P = P_1\cdots P_k$ the product of these $k$ polynomials.
Problem 9.14. Let $Q_i = P/P_i$. Prove that $Q_1, \ldots, Q_k$ are relatively prime, i.e., there is no nonconstant polynomial $Q$ dividing all of $Q_1, \ldots, Q_k$.
Solution. Suppose there is an irreducible polynomial $Q$ that divides $Q_i$ for all $i$. Since $Q \mid Q_1 = P_2\cdots P_k$, we deduce that $Q$ divides $P_j$ for some $j \in \{2, \ldots, k\}$. But since $Q$ divides $Q_j$, it also divides $P_i$ for some $i \neq j$, contradicting that $P_i$ and $P_j$ are relatively prime. $\square$
Note that it is definitely not true that $Q_1, \ldots, Q_k$ are themselves pairwise relatively prime: if $k > 2$, then both $Q_1$ and $Q_2$ are multiples of $P_k$.
The technical result we need is the following:
Theorem 9.15. Suppose that $T$ is a linear transformation on some $F$-vector space $V$ (not necessarily finite dimensional). Then for any pairwise relatively prime polynomials $P_1, \ldots, P_k \in F[X]$ we have
\[ \ker P(T) = \bigoplus_{i=1}^{k} \ker P_i(T), \]
where $P = P_1P_2\cdots P_k$.
Proof. Consider the polynomials $Q_i = P/P_i$ as in the previous problem. Since they are relatively prime, Bezout's lemma¹ yields the existence of polynomials $R_1, \ldots, R_k$ such that
\[ Q_1R_1 + \ldots + Q_kR_k = 1. \tag{9.1} \]
Since $P_i$ divides $P$, it follows that $\ker P_i(T) \subset \ker P(T)$ for all $i \in [1,k]$. On the other hand, take $x \in \ker P(T)$ and let $x_i = (Q_iR_i)(T)(x)$. Then relation (9.1) shows that
\[ x = x_1 + x_2 + \ldots + x_k. \]
¹ This lemma says that if $A, B \in F[X]$ are relatively prime polynomials, then we can find polynomials $U, V \in F[X]$ such that $AU + BV = 1$. This easily yields the following more general statement: if $P_1, \ldots, P_k$ are polynomials whose greatest common divisor is $1$, then we can find polynomials $U_1, \ldots, U_k$ such that $U_1P_1 + \ldots + U_kP_k = 1$.
Moreover, $P_i(T)(x_i) = (P_iQ_iR_i)(T)(x)$ and $P_iQ_iR_i$ is a multiple of $P$. Since $x \in \ker P(T) \subset \ker(P_iQ_iR_i)(T)$, it follows that $x_i \in \ker P_i(T)$, and since $x = x_1 + \ldots + x_k$, we conclude that
\[ \ker P(T) = \sum_{i=1}^{k} \ker P_i(T). \]
It remains to prove that if $x_i \in \ker P_i(T)$ and $x_1 + \ldots + x_k = 0$, then $x_i = 0$ for all $i \in [1,k]$. We have
\[ Q_1(T)(x_1) + Q_1(T)(x_2) + \ldots + Q_1(T)(x_k) = 0. \]
But $Q_1(T)(x_2) = \ldots = Q_1(T)(x_k) = 0$, since $Q_1$ is a multiple of $P_2, \ldots, P_k$ and $P_2(T)(x_2) = \ldots = P_k(T)(x_k) = 0$. Thus $Q_1(T)(x_1) = 0$ and similarly $Q_j(T)(x_j) = 0$ for $1 \leq j \leq k$. But then
\[ x_1 = (R_1Q_1)(T)(x_1) + \ldots + (R_kQ_k)(T)(x_1) = 0 \]
and similarly we obtain $x_2 = \ldots = x_k = 0$. The theorem is proved. $\square$
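The Bezout cofactors $R_1, \ldots, R_k$ in (9.1) can be computed explicitly by the extended Euclidean algorithm for polynomials. A small sketch using SymPy's `gcdex` (the coprime pair $P_1 = X-1$, $P_2 = X+1$ is our own illustrative choice):

```python
from sympy import symbols, gcdex, simplify

X = symbols('X')

# Two relatively prime polynomials.
P1, P2 = X - 1, X + 1

# gcdex returns (U, V, g) with U*P1 + V*P2 = g = gcd(P1, P2);
# here g = 1 since P1 and P2 are coprime.
U, V, g = gcdex(P1, P2, X)
print(U, V, g)                             # -1/2, 1/2, 1
print(simplify(U * P1 + V * P2 - g) == 0)  # True
```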
We are now ready to prove the fundamental theorem concerning diagonalizable linear transformations.
Theorem 9.16. Let $V$ be a finite dimensional vector space over $F$ and let $T: V \to V$ be a linear transformation. The following assertions are equivalent:
a) $T$ is diagonalizable.
b) There is a polynomial $P \in F[X]$ which splits over $F$ and has pairwise distinct roots, such that $P(T) = 0$.
c) The minimal polynomial $\mu_T$ of $T$ splits over $F$ and has pairwise distinct roots.
d) Let $\operatorname{Sp}(T) \subset F$ be the set of eigenvalues of $T$. Then
\[ \bigoplus_{\lambda \in \operatorname{Sp}(T)} \ker(T - \lambda\operatorname{id}) = V. \]
Proof. We start by proving that a) implies b). Choose a basis in which $T$ is represented by the diagonal matrix $D$. Let $P$ be the polynomial whose roots are the distinct diagonal entries of $D$. Then $P(T)$ is represented by the diagonal matrix $P(D)$ with entries $P(d_{ii}) = 0$. Thus $P(T) = 0$.
That b) implies c) is clear since the minimal polynomial of $T$ will divide $P$ and hence it splits over $F$, with distinct roots.
That c) implies d) is just Theorem 9.15 applied to $P$ the minimal polynomial of $T$ and $P_i$ its linear factors.
Finally, to see that d) implies a), write $\operatorname{Sp}(T) = \{\lambda_1, \ldots, \lambda_k\}$ and choose a basis $v_1, \ldots, v_n$ of $V$ obtained by patching together a basis of $\ker(T - \lambda_1\operatorname{id})$, followed by a basis of $\ker(T - \lambda_2\operatorname{id})$, ..., followed by a basis of $\ker(T - \lambda_k\operatorname{id})$. Then $v_1, \ldots, v_n$ form a basis of eigenvectors of $T$, thus a) holds by Theorem 9.9. $\square$
Remark 9.17. a) If $T$ is a diagonalizable linear transformation, then Example 8.9 shows that the minimal polynomial of $T$ is
\[ \mu_T(X) = \prod_{\lambda \in \operatorname{Sp}(T)}(X - \lambda), \]
the product being taken over all eigenvalues of $T$, counted without multiplicities. Taking the same product, but counting multiplicities (algebraic or geometric, they are the same) of eigenvalues this time, we obtain the characteristic polynomial of $T$.
b) If $T$ is any linear transformation on a finite dimensional vector space $V$, then $T$ is diagonalizable if and only if the sum of the dimensions of the eigenspaces of $T$ equals $\dim V$, i.e.,
\[ \sum_{\lambda \in \operatorname{Sp}(T)} \dim\ker(T - \lambda\operatorname{id}) = \dim V. \]
Indeed, this follows from the theorem and the fact that the subspaces $\ker(T - \lambda\operatorname{id})$ are always in direct sum position.
c) Suppose that $T$ is diagonalizable. For each $\lambda \in \operatorname{Sp}(T)$ let $\pi_\lambda$ be the projection on the subspace $\ker(T - \lambda\operatorname{id})$. Then
\[ T = \sum_{\lambda \in \operatorname{Sp}(T)} \lambda\pi_\lambda. \]
This follows from $\bigoplus_{\lambda \in \operatorname{Sp}(T)} \ker(T - \lambda\operatorname{id}) = V$ and the fact that if
\[ v = \sum_{\lambda \in \operatorname{Sp}(T)} v_\lambda \quad \text{with} \quad v_\lambda \in \ker(T - \lambda\operatorname{id}), \]
then
\[ T(v) = \sum_{\lambda \in \operatorname{Sp}(T)} T(v_\lambda) = \sum_{\lambda \in \operatorname{Sp}(T)} \lambda v_\lambda = \sum_{\lambda \in \operatorname{Sp}(T)} \lambda\pi_\lambda(v). \]
Due to its importance, we will restate the previous theorem in terms of matrices:
Theorem 9.18. Let $A \in M_n(F)$. Then the following assertions are equivalent:
a) $A$ is diagonalizable in $M_n(F)$.
b) If $\operatorname{Sp}(A)$ is the set of eigenvalues of $A$, then
\[ \bigoplus_{\lambda \in \operatorname{Sp}(A)} \ker(\lambda I_n - A) = F^n. \]
c) The minimal polynomial $\mu_A$ of $A$ is split over $F$, with pairwise distinct roots.
d) There is a polynomial $P \in F[X]$ which is split over $F$, with pairwise distinct roots and such that $P(A) = O_n$.
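These criteria are also convenient for machine verification. A hedged sketch using SymPy (the two $2 \times 2$ test matrices are our own illustrative choices):

```python
from sympy import Matrix

# Distinct eigenvalues 1 and 2: diagonalizable.
A = Matrix([[1, 1],
            [0, 2]])
print(A.is_diagonalizable())  # True

# A nontrivial Jordan block for the eigenvalue 1: not diagonalizable.
B = Matrix([[1, 1],
            [0, 1]])
print(B.is_diagonalizable())  # False

# In the diagonalizable case we can recover P and D with A = P*D*P^{-1}.
P, D = A.diagonalize()
print(P * D * P.inv() == A)   # True
```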
In the following problems the reader will have the opportunity to check their comprehension of the various statements involved in the previous theorem.
Problem 9.19. Explain why the matrix $A$ with real entries is diagonalizable in each of the following two cases.
(a) The matrix $A$ has characteristic polynomial
\[ X^3 - 3X^2 + 2X. \]
(b)
\[ A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 & 0 \\ 0 & 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 1 & 4 \end{pmatrix}. \]
Solution. (a) We have
\[ X^3 - 3X^2 + 2X = X(X^2 - 3X + 2) = X(X-1)(X-2), \]
which is split, with distinct roots. Since this polynomial kills $A$ (by the Cayley–Hamilton theorem), the result follows from the implication "b) implies a)" in Theorem 9.16. We can also argue directly, as follows: if $v_1, v_2, v_3$ are eigenvectors corresponding to the eigenvalues $0, 1, 2$, then $v_1, v_2, v_3$ are linearly independent (since the eigenvalues are distinct) and thus must form a basis of $\mathbb{R}^3$. Thus $A$ is diagonalizable (by Theorem 9.9 and the discussion preceding it).
(b) We have
\[ \chi_A(X) = \det(XI_5 - A) = (X-1)(X-3)(X-4)\big[(X-3)(X-4) - 2\big] \]
\[ = (X-1)(X-3)(X-4)(X^2 - 7X + 10) = (X-1)(X-2)(X-3)(X-4)(X-5). \]
This polynomial is split with distinct roots, so the same argument as in part (a) yields the result. $\square$
Problem 9.20. Consider the matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}. \]
a) Is $A$ diagonalizable in $M_3(\mathbb{C})$?
b) Is $A$ diagonalizable in $M_3(\mathbb{R})$?
Solution. One easily finds that the characteristic polynomial of $A$ is $\chi_A(X) = X^3 - 1$. This polynomial is split with distinct roots in $\mathbb{C}[X]$, thus $A$ is diagonalizable in $M_3(\mathbb{C})$. On the other hand, $A$ is not diagonalizable in $M_3(\mathbb{R})$, since its characteristic polynomial does not split in $\mathbb{R}[X]$. $\square$
Problem 9.21. Let
\[ A = \begin{pmatrix} -7 & -16 & 4 \\ 6 & 13 & -2 \\ 12 & 16 & 1 \end{pmatrix} \in M_3(\mathbb{R}). \]
(a) Prove that $\lambda = 5$ is an eigenvalue of $A$.
(b) Diagonalize $A$, if possible.
Solution. (a) We have
\[ A - 5I_3 = \begin{pmatrix} -12 & -16 & 4 \\ 6 & 8 & -2 \\ 12 & 16 & -4 \end{pmatrix} \]
and the last row is the opposite of the first row. Thus $A - 5I_3$ is not invertible and $5$ is an eigenvalue of $A$.
(b) We take advantage of part (a) and study the $5$-eigenspace of $A$. This is described by the system of equations
\[ \begin{cases} -12x - 16y + 4z = 0 \\ 6x + 8y - 2z = 0 \\ 12x + 16y - 4z = 0 \end{cases} \]
As we have already remarked in part (a), the first and the third equations are equivalent. The system is then equivalent (after dividing the first equation by $4$ and the second one by $2$) to
\[ \begin{cases} -3x - 4y + z = 0 \\ 3x + 4y - z = 0 \end{cases} \]
Again, the first and second equations are equivalent. Thus the $5$-eigenspace is
\[ \ker(A - 5I_3) = \{(x, y, 3x + 4y) \mid x, y \in \mathbb{R}\} \]
and this is a two-dimensional vector space, with a basis given by
\[ v_1 = (1, 0, 3), \qquad v_2 = (0, 1, 4). \]
We deduce that $5$ has algebraic multiplicity at least $2$. Since the sum of the complex eigenvalues of $A$ equals the trace of $A$, which is $-7 + 13 + 1 = 7$, we deduce that $-3$ is another eigenvalue of $A$, and the corresponding eigenspace is a line. Solving the system $AX = -3X$ yields the solution $(-2, 1, 2)$. We deduce that a diagonalization of $A$ is given by
\[ A = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 1 \\ 3 & 4 & 2 \end{pmatrix} \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -3 \end{pmatrix} \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 1 \\ 3 & 4 & 2 \end{pmatrix}^{-1}. \quad \square \]
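Such a computation is easy to verify numerically; here is a short NumPy check of the diagonalization just found:

```python
import numpy as np

A = np.array([[-7.0, -16.0,  4.0],
              [ 6.0,  13.0, -2.0],
              [12.0,  16.0,  1.0]])
# Columns of P are the eigenvectors (1,0,3), (0,1,4) and (-2,1,2) found above.
P = np.array([[1.0, 0.0, -2.0],
              [0.0, 1.0,  1.0],
              [3.0, 4.0,  2.0]])
D = np.diag([5.0, 5.0, -3.0])

print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```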
Problem 9.22. Consider the matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 \\ -4 & 4 & 0 \\ -2 & 1 & 2 \end{pmatrix} \in M_3(\mathbb{R}). \]
Is this matrix diagonalizable?
Solution. We start by computing the characteristic polynomial
\[ \chi_A(X) = \begin{vmatrix} X & -1 & 0 \\ 4 & X-4 & 0 \\ 2 & -1 & X-2 \end{vmatrix} = (X-2)\begin{vmatrix} X & -1 \\ 4 & X-4 \end{vmatrix} = (X-2)(X^2 - 4X + 4) = (X-2)^3. \]
Thus $2$ is an eigenvalue of $A$ with algebraic multiplicity $3$. If $A$ were diagonalizable, then $2$ would have geometric multiplicity $3$, that is, $\ker(A - 2I_3)$ would be three-dimensional and $A = 2I_3$. Since this is certainly not the case, it follows that $A$ is not diagonalizable. $\square$
Problem 9.23. Find all values of $a \in \mathbb{R}$ for which the matrix
\[ A = \begin{pmatrix} 2 & -1 & -2 \\ 1 & a & -1 \\ 1 & -1 & -1 \end{pmatrix} \in M_3(\mathbb{R}) \]
is diagonalizable.
Solution. As usual, we start by computing the characteristic polynomial $\chi_A(X)$ of $A$. Adding the first column to the third one, then subtracting the first row from the third one, we obtain
\[ \chi_A(X) = \begin{vmatrix} X-2 & 1 & 2 \\ -1 & X-a & 1 \\ -1 & 1 & X+1 \end{vmatrix} = \begin{vmatrix} X-2 & 1 & X \\ -1 & X-a & 0 \\ 1-X & 0 & 0 \end{vmatrix} = X(X-1)(X-a). \]
If $a \notin \{0, 1\}$, then $\chi_A(X)$ is split with distinct roots and since it kills $A$ (by the Cayley–Hamilton theorem), we deduce that $A$ is diagonalizable.
Suppose that $a = 0$, so that $0$ is an eigenvalue of $A$ with algebraic multiplicity $2$. Let us find its geometric multiplicity, which comes down to solving the system $AX = 0$. This system reads
\[ \begin{cases} 2x_1 - x_2 - 2x_3 = 0 \\ x_1 - x_3 = 0 \\ x_1 - x_2 - x_3 = 0 \end{cases} \]
and its solutions are $(x_1, 0, x_1)$ with $x_1 \in \mathbb{R}$. As this space is one-dimensional, we deduce that the geometric multiplicity of $0$ is $1$ and so $A$ is not diagonalizable.
If $a = 1$, a similar argument shows that $1$ is an eigenvalue with algebraic multiplicity $2$ and geometric multiplicity $1$, thus $A$ is not diagonalizable. All in all, the answer to the problem is: all $a \in \mathbb{R} \setminus \{0, 1\}$. $\square$
Problem 9.24. Diagonalize, if possible, the matrix
\[ A = \begin{pmatrix} 4 & 0 & -2 \\ 2 & 5 & 4 \\ 0 & 0 & 5 \end{pmatrix} \in M_3(\mathbb{R}). \]
Solution. We start by computing the characteristic polynomial of $A$:
\[ \chi_A(X) = \begin{vmatrix} X-4 & 0 & 2 \\ -2 & X-5 & -4 \\ 0 & 0 & X-5 \end{vmatrix} = (X-5)\begin{vmatrix} X-4 & 0 \\ -2 & X-5 \end{vmatrix} = (X-4)(X-5)^2. \]
We deduce that $A$ has two eigenvalues, namely $4$ with algebraic multiplicity $1$ and $5$ with algebraic multiplicity $2$. Next, we study separately the corresponding eigenspaces. Since $4$ has algebraic multiplicity $1$, we already know that the $4$-eigenspace will be a line. To find it, we write the condition $AX = 4X$ as the system
\[ \begin{cases} 4x - 2z = 4x \\ 2x + 5y + 4z = 4y \\ 5z = 4z \end{cases} \]
This system can easily be solved: the last equation gives $z = 0$, the first one becomes tautological and the second one gives $y = -2x$. Thus the $4$-eigenspace is the line spanned by $v_1 = (1, -2, 0)$.
Next, we study the $5$-eigenspace. Write the equation $AX = 5X$ as the system
\[ \begin{cases} 4x - 2z = 5x \\ 2x + 5y + 4z = 5y \\ 5z = 5z \end{cases} \]
The last equation is tautological. The first equation gives $x = -2z$ and the second one then becomes tautological. Thus the solutions of the system are $(-2z, y, z) = y(0, 1, 0) + z(-2, 0, 1)$ with $y, z \in \mathbb{R}$. We deduce that the $5$-eigenspace is two-dimensional, with a basis given by $v_2 = (0, 1, 0)$ and $v_3 = (-2, 0, 1)$.
Since the sum of the dimensions of the eigenspaces equals $3 = \dim \mathbb{R}^3$, we deduce that $A$ is diagonalizable and $v_1, v_2, v_3$ form a basis of eigenvectors. The matrix $P$ whose columns are the coordinates of $v_1, v_2, v_3$ with respect to the canonical basis is
\[ P = \begin{pmatrix} 1 & 0 & -2 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
We have
\[ A = PDP^{-1}, \qquad \text{with} \quad D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}. \quad \square \]
We end this section with some more theoretical exercises.
Problem 9.25. Let $T$ be a diagonalizable linear transformation on a finite dimensional vector space $V$ over a field $F$. Let $W$ be a subspace of $V$ which is stable under $T$. Prove that $T|_W: W \to W$ is diagonalizable.
Solution. Since $T: V \to V$ is diagonalizable, there is a polynomial $P \in F[X]$ of the form $P = (X - \lambda_1)\cdots(X - \lambda_k)$ with $\lambda_1, \ldots, \lambda_k \in F$ pairwise distinct, such that $P(T) = 0$. Since $P(T)(v) = 0$ for all $v \in V$, we have $P(T)(w) = 0$ for all $w \in W$. Thus $P(T|_W) = 0$ and so $T|_W$ is diagonalizable by Theorem 9.16. $\square$
The result established in the next problem is very useful in many situations.
Problem 9.26. Let $V$ be a finite dimensional vector space over a field $F$ and let $T_1, T_2: V \to V$ be linear transformations of $V$. Prove that if $T_1$ and $T_2$ commute, then any eigenspace of $T_2$ is stable under $T_1$.
Solution. Let $\lambda \in F$ be an eigenvalue of $T_2$ and let $E_\lambda = \ker(\lambda\operatorname{id} - T_2)$ be the corresponding eigenspace. If $v \in E_\lambda$, then $T_2(v) = \lambda v$, thus
\[ T_2(T_1(v)) = T_1(T_2(v)) = T_1(\lambda v) = \lambda T_1(v) \]
and so $T_1(v) \in E_\lambda$. The result follows. $\square$
Problem 9.27. Let $V$ be a finite dimensional vector space over a field $F$ and let $T_1, T_2: V \to V$ be diagonalizable linear transformations of $V$. Prove that $T_1$ and $T_2$ commute if and only if they are simultaneously diagonalizable.
Solution. Suppose first that $T_1$ and $T_2$ are simultaneously diagonalizable. Thus there is a basis $B$ of $V$ in which the matrices of $T_1$ and $T_2$ are diagonal, say $D_1$ and $D_2$. We clearly have $D_1D_2 = D_2D_1$, thus the matrices of $T_1 \circ T_2$ and $T_2 \circ T_1$ coincide in this basis and so $T_1 \circ T_2 = T_2 \circ T_1$.
Conversely, suppose that $T_1$ and $T_2$ commute. Let $\lambda_1, \ldots, \lambda_k$ be the distinct eigenvalues of $T_1$ and let $W_i = \ker(T_1 - \lambda_i\operatorname{id})$ be the corresponding eigenspaces. Since $T_1$ is diagonalizable, we have $V = W_1 \oplus \ldots \oplus W_k$. Since $T_1$ and $T_2$ commute, $T_2$ leaves each $W_i$ invariant by Problem 9.26. Since $T_2$ is diagonalizable, so is its restriction to $W_i$, by Problem 9.25. Thus there is a basis $B_i$ of $W_i$ consisting of eigenvectors for $T_2|_{W_i}$. Consider the basis $B'$ consisting of all vectors in $B_1 \cup \ldots \cup B_k$. Then $B'$ consists of eigenvectors for both $T_1$ and $T_2$ (this is clear for $T_2$, and holds for $T_1$ since $T_1$ acts on $W_i$ by the scalar $\lambda_i$). Thus the matrices of $T_1$ and $T_2$ in the basis $B'$ are both diagonal and the result follows. $\square$
Problem 9.28. Let $A$ be an invertible matrix with complex coefficients and let $d \geq 1$. Prove that $A$ is diagonalizable if and only if $A^d$ is diagonalizable. What happens if we don't assume that $A$ is invertible?
Solution. Suppose that $A$ is diagonalizable, thus there is an invertible matrix $P$ such that $PAP^{-1}$ is a diagonal matrix. Then $(PAP^{-1})^d = PA^dP^{-1}$ is also a diagonal matrix, hence $A^d$ is diagonalizable. This implication does not require the hypothesis that $A$ is invertible.
Suppose now that $A^d$ is diagonalizable and that $A$ is invertible. Since $A^d$ is diagonalizable and invertible, its minimal polynomial is of the form $(X - \lambda_1)\cdots(X - \lambda_k)$ with $\lambda_1, \ldots, \lambda_k$ pairwise distinct and nonzero. Consider the polynomial $P(X) = (X^d - \lambda_1)\cdots(X^d - \lambda_k)$. Since each of the polynomials $X^d - \lambda_1, \ldots, X^d - \lambda_k$ has pairwise distinct roots and since these polynomials are pairwise relatively prime, their product $P$ has pairwise distinct roots. Since $P(A) = 0$, we deduce that $A$ is diagonalizable.
Finally, if we only assume that $A^d$ is diagonalizable and $A$ is not invertible, then one of the eigenvalues of $A$ is $0$. Hence one of the factors of the polynomial $P(X)$ above becomes $X^d$. Since this does not have distinct roots, the proof breaks down. Indeed, $A$ need not be diagonalizable in this case. For instance, consider the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]
This matrix satisfies $A^2 = 0$, thus $A^2$ is certainly diagonalizable. However, $A$ is not diagonalizable. Indeed, if this were the case, then $A$ would necessarily be the zero matrix, since its eigenvalues are all $0$. Hence for the more difficult implication one cannot drop the hypothesis that $A$ is invertible. $\square$
Problem 9.29. Let $A$ be a matrix with real entries such that $A^3 = A^2$.
a) Prove that $A^2$ is diagonalizable.
b) Find $A$ if its trace equals the number of columns of $A$.
Solution. a) The hypothesis yields $A^4 = A^3 = A^2$, thus $(A^2)^2 = A^2$. It follows that $A^2$ is killed by the polynomial $X(X-1)$, which has pairwise distinct and real roots. Thus $A^2$ is diagonalizable.
b) Let $n$ be the number of columns of $A$. The trace of $A$ equals $n$, and this is also the sum of the complex eigenvalues of $A$, counted with their algebraic multiplicities. By hypothesis each eigenvalue $\lambda$ satisfies $\lambda^3 = \lambda^2$, thus $\lambda \in \{0, 1\}$. Since the $n$ eigenvalues add up to $n$, it follows that all of them are equal to $1$. Thus all eigenvalues of $A^2$ are $1$ and using part a) we deduce that $A^2 = I_n$. Combining this with the hypothesis yields $A \cdot I_n = I_n$ and then $A = I_n$, which is the unique solution of the problem. $\square$
Problem 9.30. Let $A_1 \in M_p(F)$ and $A_2 \in M_q(F)$ and let
\[ A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \in M_{p+q}(F). \]
Prove that $A$ is diagonalizable if and only if $A_1$ and $A_2$ are diagonalizable.
Solution. If $P \in F[X]$ is a polynomial, then
\[ P(A) = \begin{pmatrix} P(A_1) & 0 \\ 0 & P(A_2) \end{pmatrix}. \]
If $A$ is diagonalizable, then we can find a polynomial $P$ which splits over $F$ into a product of distinct linear factors and which kills $A$. By the previous formula, $P$ also kills $A_1$ and $A_2$, which must therefore be diagonalizable.
Suppose now that $A_1$ and $A_2$ are diagonalizable, thus we can find polynomials $P_1$, $P_2$ which split over $F$ into products of distinct linear factors and which kill $A_1$ and $A_2$ respectively. Let $P$ be the least common multiple of $P_1$ and $P_2$. Then $P$ splits into a product of distinct linear factors and kills $A$, which is therefore diagonalizable.
An alternative solution is based on the study of the eigenspaces of $A$. Namely, it is not difficult to see that for any $\lambda \in F$ we have
\[ \ker(A - \lambda I_{p+q}) = \ker(A_1 - \lambda I_p) \oplus \ker(A_2 - \lambda I_q). \]
Now a matrix $X \in M_n(F)$ is diagonalizable if and only if $\bigoplus_{\lambda \in F} \ker(X - \lambda I_n) = F^n$, from which the result follows easily. $\square$
9.2.1 Problems for Practice
1. a) Diagonalize the matrix
\[ A = \begin{pmatrix} 1 & -2 \\ 4 & -1 \end{pmatrix} \]
in $M_2(\mathbb{C})$.
b) Do the same by considering $A$ as an element of $M_2(\mathbb{R})$.
2. For each matrix $A$ below, decide if $A$ is diagonalizable. Explain your reasoning. If $A$ is diagonalizable, find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
(a)
\[ A = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 1 & 0 & 5 \end{pmatrix} \in M_3(\mathbb{R}) \]
(b)
\[ A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 5 & 0 \\ 1 & 0 & 5 \end{pmatrix} \in M_3(\mathbb{R}) \]
3. a) Let $a_1, \ldots, a_n$ be complex numbers and let $A = [a_ia_j]_{1 \leq i,j \leq n} \in M_n(\mathbb{C})$. When is $A$ diagonalizable?
b) Let $a_1, \ldots, a_n$ be real numbers and let $A = [a_ia_j]_{1 \leq i,j \leq n} \in M_n(\mathbb{R})$. When is $A$ diagonalizable?
4. Let $A$ be the $n \times n$ matrix all of whose entries are equal to $1$. Prove that $A \in M_n(\mathbb{R})$ is diagonalizable and find its eigenvalues.
5. Compute the $n$th power of the matrix
\[ A = \begin{pmatrix} 1 & 3 & 3 \\ 3 & 1 & 3 \\ 3 & 3 & 1 \end{pmatrix}. \]
Hint: diagonalize $A$.
6. Find all differentiable maps $x, y, z: \mathbb{R} \to \mathbb{R}$ such that $x(0) = 1$, $y(0) = 0$, $z(0) = 0$ and
\[ x' = -y + z, \qquad y' = x + z, \qquad z' = x - 3y + 4z. \]
Hint: the matrix $A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & 1 \\ 1 & -3 & 4 \end{pmatrix}$ has an eigenvalue equal to $1$. Use this to diagonalize $A$. How is this related to the original problem?
7. Let $V$ be a finite dimensional vector space over $\mathbb{C}$ and let $T: V \to V$ be a linear transformation.
a) Prove that if $T$ is diagonalizable, then $T^2$ is diagonalizable and $\ker T = \ker T^2$.
b) Prove that if $T^2$ is diagonalizable and $\ker T = \ker T^2$, then $T$ is diagonalizable.
8. Let $A, B \in M_n(F)$ be matrices such that $A$ is invertible and $AB$ is diagonalizable. Prove that $BA$ is also diagonalizable. What happens if we don't assume that $A$ is invertible?
9. Find all matrices $A \in M_3(\mathbb{R})$ such that
\[ A^2 = \begin{pmatrix} 9 & 0 & 0 \\ 1 & 4 & 0 \\ 1 & 1 & 1 \end{pmatrix}. \]
Hint: start by diagonalizing the matrix $\begin{pmatrix} 9 & 0 & 0 \\ 1 & 4 & 0 \\ 1 & 1 & 1 \end{pmatrix}$ and prove that any solution of the problem is diagonalizable and commutes with this matrix.
10. Let $A \in M_n(\mathbb{C})$ be a matrix such that $A^d = I_n$ for some positive integer $d$. Prove that
a) $A$ is diagonalizable, with eigenvalues $d$th roots of unity.
b) Deduce that
\[ \dim\ker(A - I_n) = \frac{1}{d}\sum_{i=1}^{d}\operatorname{Tr}(A^i). \]
11. Let $\mathcal{F}$ be an arbitrary family of diagonalizable matrices in $M_n(\mathbb{C})$. Suppose that $AB = BA$ for all $A, B \in \mathcal{F}$. Prove that there is an invertible matrix $P$ such that $PAP^{-1}$ is diagonal for all $A \in \mathcal{F}$. Hint: proceed by induction on $n$ and use Problem 9.26 and the arguments in the solution of Problem 9.27.
12. (Functions of a diagonalizable matrix) Let $A \in M_n(F)$ be a diagonalizable matrix with $A = PDP^{-1}$ and $D$ diagonal with diagonal entries $d_{ii}$. Let $f: F \to F$ be any function and let $f(D)$ be the diagonal matrix with $(i,i)$ entry $f(d_{ii})$. Define $f(A) = Pf(D)P^{-1}$.
a) Prove that $f(A)$ is well defined. That is, if we diagonalize $A$ in a different way, we will get the same matrix $f(A)$. (Hint: there is a polynomial $p$ with $p(d_{ii}) = f(d_{ii})$.)
b) Prove that if $A$ is diagonalizable over $F = \mathbb{R}$ and $m$ is odd, then there is a diagonalizable matrix $B$ with $B^m = A$.
13. Let $A, B \in M_n(\mathbb{R})$ be diagonalizable matrices such that
\[ AB^5 = B^5A. \]
Write $B = PDP^{-1}$ with $P$ invertible and $D$ diagonal.
a) Let $C = P^{-1}AP$. Prove that $CD^5 = D^5C$.
b) Prove that $CD = DC$. Hint: use the injectivity of the map $x \mapsto x^5$ ($x \in \mathbb{R}$).
c) Deduce that $AB = BA$.
14. Let $A \in M_n(\mathbb{C})$ and let $B = \begin{pmatrix} A & A \\ 0 & A \end{pmatrix} \in M_{2n}(\mathbb{C})$.
a) Prove that for all $P \in \mathbb{C}[X]$ we have
\[ P(B) = \begin{pmatrix} P(A) & AP'(A) \\ 0 & P(A) \end{pmatrix}. \]
b) Deduce that $B$ is diagonalizable if and only if $A = O_n$.
15. Find all matrices $A \in M_n(\mathbb{R})$ such that $A^5 = A^2$ and the trace of $A$ equals $n$. Hint: prove that all complex eigenvalues of $A$ are equal to $1$ and then that $A$ is diagonalizable in $M_n(\mathbb{C})$.
16. Let $A, B \in M_n(\mathbb{R})$ be diagonalizable matrices such that $A^5 = B^5$. Prove that $A = B$. Hint: use Problems 13 and 9.27.
17. Let $V$ be a finite dimensional $\mathbb{C}$-vector space and let $T: V \to V$ be a linear transformation such that any subspace of $V$ which is stable under $T$ has a complement which is stable under $T$. Prove that $T$ is diagonalizable.
18. Let $V$ be a finite dimensional vector space over some field $F$ and let $T: V \to V$ be a diagonalizable linear transformation on $V$. Let $C(T)$ be the set of linear transformations $S: V \to V$ such that $S \circ T = T \circ S$.
a) Prove that a linear transformation $S: V \to V$ belongs to $C(T)$ if and only if $S$ leaves invariant each eigenspace of $T$.
b) Let $m_\lambda$ be the algebraic multiplicity of the eigenvalue $\lambda$ of $T$. Prove that $C(T)$ is an $F$-vector space of dimension $\sum_\lambda m_\lambda^2$, the sum being taken over all eigenvalues of $T$.
c) Suppose that the eigenvalues of $T$ are pairwise distinct. Prove that $\operatorname{id}, T, T^2, \ldots, T^{n-1}$ form a basis of $C(T)$ as an $F$-vector space.
9.3 Some Applications of the Previous Ideas
In this section we would like to come back to the technical result given by Theorem 9.15 and give some further nice applications of it. First of all, we will apply it to the resolution of linear differential equations with constant coefficients.
Consider the following classical problem in real analysis: given complex numbers $a_0, a_1, \ldots, a_{n-1}$ and an open interval $I$ in $\mathbb{R}$, find all smooth functions $f: I \to \mathbb{C}$ such that
\[ f^{(n)}(x) + a_{n-1}f^{(n-1)}(x) + \ldots + a_1f'(x) + a_0f(x) = 0 \tag{9.2} \]
for all $x \in I$. Here $f^{(i)}$ is the $i$th derivative of $f$.
It follows from elementary calculus that any solution of Eq. (9.2) is smooth, i.e., infinitely differentiable. Let $V$ be the space of smooth functions $f: I \to \mathbb{C}$.
Note that $V$ is an infinite dimensional vector space over $\mathbb{C}$, but we had no finiteness assumption in Theorem 9.15, so we can use it for this vector space. Consider the linear transformation $T$ sending a map $f$ in $V$ to its derivative:
\[ T: V \to V, \qquad T(f) = f'. \]
Then $T^k(f) = f^{(k)}$ for all $k \geq 0$, thus solving Eq. (9.2) is equivalent to finding $\ker P(T)$, where
\[ P(X) = X^n + a_{n-1}X^{n-1} + \ldots + a_0 \]
is the characteristic polynomial of Eq. (9.2). Since we work over the complex numbers, we can factor
\[ P(X) = \prod_{i=1}^{d}(X - z_i)^{k_i} \]
for some positive integers $k_1, \ldots, k_d$ and some pairwise distinct complex numbers $z_1, \ldots, z_d$. By Theorem 9.15 we have
\[ \ker P(T) = \bigoplus_{i=1}^{d} \ker(T - z_i\operatorname{id})^{k_i}, \]
so it suffices to understand $\ker(T - z\operatorname{id})^k$, where $z$ is a complex number and $k$ is a positive integer. Let $g \in V$ be the map
\[ g(x) = e^{zx}, \]
so that $g' = zg$. Then for any $f \in V$ we have
\[ (T - z\operatorname{id})(fg) = (fg)' - zfg = f'g, \]
thus by an immediate induction
\[ (T - z\operatorname{id})^k(fg) = f^{(k)}g. \]
Take $h \in \ker(T - z\operatorname{id})^k$ and let $f = h/g$ (note that $g$ has no complex zero). Then the previous computation gives $f^{(k)} = 0$, that is, $f$ is a polynomial map of degree less than $k$. Conversely, the same computation shows that any such $f$ gives rise to an element of $\ker(T - z\operatorname{id})^k$ (if multiplied by $g$). We conclude that $\ker(T - z\operatorname{id})^k$ consists of the maps $x \mapsto g(x)P(x)$, with $P$ a polynomial of degree at most $k-1$ with complex coefficients, a basis of $\ker(T - z\operatorname{id})^k$ being given by the maps $x \mapsto x^je^{zx}$, where $0 \leq j \leq k-1$.
Putting everything together, we obtain the following:
Theorem 9.31. Let $a_0, \ldots, a_{n-1}$ be complex numbers and write
\[ X^n + a_{n-1}X^{n-1} + \ldots + a_0 = \prod_{i=1}^{d}(X - z_i)^{k_i}. \]
a) The complex-valued solutions of the differential equation
\[ f^{(n)} + a_{n-1}f^{(n-1)} + \ldots + a_1f' + a_0f = 0 \]
are the maps of the form
\[ x \mapsto f(x) = \sum_{i=1}^{d} e^{z_ix}P_i(x), \]
where $P_i$ is a polynomial with complex coefficients whose degree does not exceed $k_i - 1$.
b) The set of complex-valued solutions of the previous differential equation is a vector space of dimension $n = k_1 + \ldots + k_d$ over $\mathbb{C}$, a basis being given by the maps $x \mapsto x^je^{z_ix}$, where $1 \leq i \leq d$ and $0 \leq j < k_i$.
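As a concrete (and entirely illustrative) instance of the theorem, take the equation $f''' - 3f'' + 3f' - f = 0$: its characteristic polynomial is $(X-1)^3$, so by part a) the solutions are exactly the maps $x \mapsto (a + bx + cx^2)e^x$. A short NumPy sketch checking one basis solution:

```python
import numpy as np

# Characteristic polynomial X^3 - 3X^2 + 3X - 1 = (X - 1)^3.
print(np.round(np.roots([1, -3, 3, -1]), 6))  # three roots (numerically) equal to 1

# Check that f(x) = x^2 e^x solves f''' - 3f'' + 3f' - f = 0 at a sample point.
x = 0.7
f  = x**2 * np.exp(x)
f1 = (x**2 + 2*x) * np.exp(x)      # f'
f2 = (x**2 + 4*x + 2) * np.exp(x)  # f''
f3 = (x**2 + 6*x + 6) * np.exp(x)  # f'''
print(np.isclose(f3 - 3*f2 + 3*f1 - f, 0.0))  # True
```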
We consider now the discrete analogue of the previous problem, namely linear recurrence sequences. Let $a_0, \ldots, a_{d-1}$ be complex numbers and consider the set $S$ of sequences $(x_n)_{n \geq 0}$ of complex numbers such that
\[ x_{n+d} = a_0x_n + a_1x_{n+1} + \ldots + a_{d-1}x_{n+d-1} \]
for all $n \geq 0$.
First of all, it is clear that an element of $S$ is uniquely determined by its first $d$ terms $x_0, \ldots, x_{d-1}$. In other words, the map
\[ S \to \mathbb{C}^d, \qquad (x_n)_{n \geq 0} \mapsto (x_0, x_1, \ldots, x_{d-1}), \]
which is clearly linear, is bijective and so an isomorphism of vector spaces. We deduce that
\[ \dim S = d. \]
We would like to describe explicitly the elements of $S$. We proceed as above, by working in the big space $V$ of all sequences $(x_n)_{n \geq 0}$ of complex numbers and considering the shift map
\[ T: V \to V, \qquad T((x_n)_{n \geq 0}) = (x_{n+1})_{n \geq 0}. \]
Note that $T$ is clearly a linear map and that
\[ T^k((x_n)_{n \geq 0}) = (x_{n+k})_{n \geq 0}. \]
It follows that
\[ S = \ker P(T), \qquad \text{where} \qquad P(X) = X^d - a_{d-1}X^{d-1} - \ldots - a_0 \]
is the characteristic polynomial of the recurrence relation. As before, factoring
\[ P(X) = \prod_{i=1}^{p}(X - z_i)^{k_i}, \]
we obtain using Theorem 9.15 that
\[ S = \bigoplus_{i=1}^{p} \ker(T - z_i\operatorname{id})^{k_i} \]
and so the problem is reduced to understanding the space $\ker(T - z\operatorname{id})^k$ where $z$ is a complex number and $k$ is a positive integer.
Let us start with the case $z = 0$, i.e., understanding $\ker T^k$. We have $T^k((x_n)_{n \geq 0}) = 0$ if and only if $x_{n+k} = 0$ for all $n \geq 0$, i.e., the sequence $x_0, x_1, \ldots$ becomes the zero sequence starting with index $k$. A basis of $\ker T^k$ is given by the sequences $x^{(0)}, \ldots, x^{(k-1)}$, where $x^{(j)}$ is the sequence whose $j$th term is $1$ and all other terms $0$.
Assume now that $z \neq 0$. Let $x = (x_n)_{n \geq 0}$ be any sequence in $V$ and define a new sequence $y = (y_n)_{n \geq 0}$ by
\[ y_n = \frac{x_n}{z^n} \]
for $n \geq 0$.
for n 0. One can easily check by induction on j that
.T z id/j .x/ D .znCj .T id/j .y/n /n0 ;
where .T id/j .y/n is the nth component of the sequence .T id/j .y/. It follows
that
x 2 ker.T z id/k
if and only if
y 2 ker.T id/k :
We are therefore reduced to understanding Ker.T id/k . If k D 1, a sequence
x D .xn /n0 is in Ker.T id/k if and only if xnC1 xn D 0 for n 0, i.e., x is a
constant sequence. If k D 2, a sequence x D .xn /n0 is in Ker.T id/k if and only
9.3 Some Applications of the Previous Ideas
363
if .T id/.x/ is a constant sequence, i.e., the sequence .xnC1 xn /n0 is constant,
that is x is an arithmetic sequence or equivalently
xn D an C b
for some complex numbers a; b. In general, we have
Proposition 9.32. If k is a positive integer, then ker.T id/k is the set of sequences
of the form
xn D a0 C a1 n C : : : C ak1 nk1
with a0 ; : : : ; ak1 2 C, a basis of it being given by the sequences
x .j / D .nj /n0
for 0 j k 1 (with the convention that 00 D 1).
Proof. It suffices to prove that the sequences $x^{(0)}, \ldots, x^{(k-1)}$ form a basis of $\ker(T - \operatorname{id})^k$.
First, we prove that $x^{(0)}, \ldots, x^{(k-1)}$ are linearly independent. Indeed, if not, then we can find complex numbers $u_0, \ldots, u_{k-1}$, not all $0$, such that for all $n \geq 0$
\[ u_0 + u_1n + \ldots + u_{k-1}n^{k-1} = 0. \]
The polynomial $u_0 + u_1X + \ldots + u_{k-1}X^{k-1}$ is then nonzero and has infinitely many roots, a contradiction.
Next, we prove that $x^{(j)} \in \ker(T - \operatorname{id})^k$ for $0 \leq j \leq k-1$, by induction on $k$. This is clear for $k = 1$, and assuming that it holds for $k-1$, it suffices (thanks to the inductive hypothesis and the inclusion $\ker(T - \operatorname{id})^{k-1} \subset \ker(T - \operatorname{id})^k$) to check that $x^{(k-1)} \in \ker(T - \operatorname{id})^k$, or equivalently that
\[ (T - \operatorname{id})x^{(k-1)} = \big((n+1)^{k-1} - n^{k-1}\big)_{n \geq 0} \in \ker(T - \operatorname{id})^{k-1}. \]
But the binomial theorem shows that $\big((n+1)^{k-1} - n^{k-1}\big)_{n \geq 0}$ is a linear combination of $x^{(0)}, \ldots, x^{(k-2)}$, which all belong to $\ker(T - \operatorname{id})^{k-1}$ by the inductive hypothesis, hence the inductive step is proved.
To conclude, it suffices to prove that $\dim\ker(T - \operatorname{id})^k \leq k$ for all $k$, which we do again by induction on $k$. This has already been seen for $k = 1$, and if it holds for $k-1$, then the rank-nullity theorem applied to the map
\[ T - \operatorname{id}: \ker(T - \operatorname{id})^k \to \ker(T - \operatorname{id})^{k-1} \]
yields
\[ \dim\ker(T - \operatorname{id})^k \leq \dim\ker(T - \operatorname{id}) + \dim\ker(T - \operatorname{id})^{k-1}. \]
Now $\dim\ker(T - \operatorname{id}) = 1$ and by induction $\dim\ker(T - \operatorname{id})^{k-1} \leq k-1$, hence $\dim\ker(T - \operatorname{id})^k \leq k$ and the inductive step is completed. The proposition is finally proved. $\square$
We can now put everything together; the previous discussion yields the following beautiful result:
Theorem 9.33. Let $a_0, \ldots, a_{d-1}$ be complex numbers. Consider the polynomial
\[ P(X) = X^d - a_{d-1}X^{d-1} - \ldots - a_0 = \prod_{i=1}^{p}(X - z_i)^{k_i} \]
and assume for simplicity that $a_0 \neq 0$, so that all $z_i$ are nonzero.
Let $S$ be the set of sequences $(x_n)_{n \geq 0}$ of complex numbers such that
\[ x_{n+d} = a_0x_n + a_1x_{n+1} + \ldots + a_{d-1}x_{n+d-1} \]
for all $n \geq 0$.
a) A sequence $(x_n)_{n \geq 0}$ is in $S$ if and only if there are polynomials $Q_i$ with complex coefficients, of degree not exceeding $k_i - 1$, such that for all $n$
\[ x_n = Q_1(n)z_1^n + \ldots + Q_p(n)z_p^n. \]
b) $S$ is a vector space of dimension $d$ over $\mathbb{C}$, a basis being given by the sequences $(z_i^nn^j)_{1 \leq i \leq p,\; 0 \leq j < k_i}$.
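As an illustration (the Fibonacci recurrence below is our own choice of example, not taken from the text), one can compare the closed form provided by Theorem 9.33 with direct iteration:

```python
import numpy as np

# x_{n+2} = x_{n+1} + x_n with x_0 = 0, x_1 = 1 (Fibonacci).
# The characteristic polynomial X^2 - X - 1 has distinct roots phi, psi,
# so Theorem 9.33 gives x_n = c1*phi^n + c2*psi^n with constants c1, c2.
phi = (1 + np.sqrt(5)) / 2
psi = (1 - np.sqrt(5)) / 2

def closed_form(n):
    # c1 = -c2 = 1/sqrt(5) is forced by the initial terms x_0 = 0, x_1 = 1.
    return (phi**n - psi**n) / np.sqrt(5)

x = [0, 1]
for _ in range(18):
    x.append(x[-1] + x[-2])

print(all(np.isclose(closed_form(n), x[n]) for n in range(20)))  # True
```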
We promised that we would use the ideas developed in this chapter to give a very natural and simple proof of the Cayley–Hamilton theorem for matrices with complex entries. It is now time to honor our promise! We will need some topological preliminaries, however.
A sequence of matrices $(A_k)_{k \geq 0}$ in $M_n(\mathbb{C})$ converges to a matrix $A \in M_n(\mathbb{C})$ (which we denote by $A_k \to A$) if for all $i, j \in [1,n]$ the sequence with general term the $(i,j)$-entry of $A_k$ converges (as a sequence of complex numbers) to the $(i,j)$-entry of $A$. Equivalently, the sequence $(A_k)_{k \geq 0}$ converges to $A$ if for all $\varepsilon > 0$ we have
\[ \max_{1 \leq i,j \leq n} |(A_k)_{ij} - A_{ij}| < \varepsilon \]
for all $k$ large enough (depending on $\varepsilon$). We leave it to the reader to check that if $A_k \to A$ and $B_k \to B$, then $A_k + B_k \to A + B$ and $A_kB_k \to AB$. Finally, a subset $S$ of $M_n(\mathbb{C})$ is dense in $M_n(\mathbb{C})$ if for any matrix $A \in M_n(\mathbb{C})$ there is a sequence of elements of $S$ which converges to $A$. That is, any matrix in $M_n(\mathbb{C})$ is the limit of a suitable sequence of matrices in $S$.
The following fundamental result makes the importance of diagonalizable
matrices fairly clear.
Theorem 9.34. The set of diagonalizable matrices in $M_n(\mathbb{C})$ is dense in $M_n(\mathbb{C})$. In other words, any matrix $A \in M_n(\mathbb{C})$ is the limit of a sequence of diagonalizable matrices.
Proof. Suppose first that $T$ is an upper-triangular matrix with entries $t_{ij}$. Consider the sequence $(T_k)$ of matrices, where all entries of $T_k$ except the diagonal ones are equal to the corresponding entries in $T$, and for which the diagonal entries of $T_k$ are $t_{ii} + \frac{1}{k^i}$. Clearly, $\lim_{k \to \infty} T_k = T$. We claim that if $k$ is large enough, then $T_k$ is diagonalizable. It suffices to check that the eigenvalues of $T_k$ are pairwise distinct. But since $T_k$ is upper-triangular, its eigenvalues are the diagonal entries. Thus it suffices to check that for $k$ large enough the numbers $t_{11} + \frac{1}{k}, t_{22} + \frac{1}{k^2}, \ldots, t_{nn} + \frac{1}{k^n}$ are pairwise distinct, which is clear.
Now let $A \in M_n(\mathbb{C})$ be an arbitrary matrix. By Corollary 9.5 we can write $A = PTP^{-1}$ for some invertible matrix $P$ and some upper-triangular matrix $T$. By the previous paragraph, there is a sequence $(D_k)$ of diagonalizable matrices which converges to $T$. Then $D_k' := PD_kP^{-1}$ is a sequence of diagonalizable matrices which converges to $A$. Thus any matrix is a limit of diagonalizable matrices and the theorem is proved. $\square$
Remark 9.35. a) We can restate the theorem as follows: given any matrix $A = [a_{ij}] \in M_n(\mathbb{C})$ and any $\varepsilon > 0$, we can find a diagonalizable matrix $B = [b_{ij}] \in M_n(\mathbb{C})$ such that
\[ \max_{1 \leq i,j \leq n} |a_{ij} - b_{ij}| < \varepsilon. \]
b) This result is completely false over the real numbers: the diagonalizable matrices in $M_n(\mathbb{R})$ are not dense in $M_n(\mathbb{R})$. The reason is that the characteristic polynomial of a diagonalizable matrix is split. One can prove that if $\lim_{n \to \infty} A_n = A$ and $A_n$ is diagonalizable for all $n$, then the characteristic polynomial of $A$ is split. Conversely, if this happens, then $A$ is trigonalizable in $M_n(\mathbb{R})$ and the proof of the previous theorem easily yields that $A$ is a limit of diagonalizable matrices in $M_n(\mathbb{R})$. We deduce that the trigonalizable matrices in $M_n(\mathbb{R})$ are precisely the limits of sequences of diagonalizable matrices in $M_n(\mathbb{R})$. In other words, a matrix $A \in M_n(\mathbb{R})$ is trigonalizable if and only if it can be approximated to any precision by a diagonalizable matrix in $M_n(\mathbb{R})$.
Using the previous theorem, we can give a very simple and natural proof of the Cayley–Hamilton theorem.
Theorem 9.36 (Cayley–Hamilton). For any matrix $A \in M_n(\mathbb{C})$ we have $\chi_A(A) = O_n$, that is, $A$ is annihilated by its characteristic polynomial.
Proof. If $A$ is diagonal, the result is clear: if $a_1, \ldots, a_n$ are the diagonal entries of $A$, then $\chi_A(X) = (X - a_1)\cdots(X - a_n)$ and clearly this polynomial annihilates $A$ (since it vanishes at $a_1, \ldots, a_n$).
Next, suppose that $A$ is diagonalizable. Thus we can write $A = PDP^{-1}$ for an invertible matrix $P$ and a diagonal matrix $D$. Since $A$ and $D$ are similar, we have $\chi_A = \chi_D$. So we need to check that $\chi_D(A) = O_n$. But
\[ \chi_D(A) = \chi_D(PDP^{-1}) = P\chi_D(D)P^{-1} = O_n, \]
the last equality being a consequence of the first paragraph, and the equality $\chi_D(PDP^{-1}) = P\chi_D(D)P^{-1}$ being a consequence of linearity and of the equality $(PDP^{-1})^k = PD^kP^{-1}$ for all $k \geq 0$.
Finally, let $A \in M_n(\mathbb{C})$ be arbitrary. By Theorem 9.34 there is a sequence $(A_k)_{k \geq 1}$ of diagonalizable matrices such that $A_k \to A$. The coefficients of $\chi_{A_k}$ are polynomial expressions in the entries of $A_k$ and since $\lim_{k \to \infty} A_k = A$, it follows that the coefficient of $X^d$ in $\chi_{A_k}$ converges to the coefficient of $X^d$ in $\chi_A$ for all $d \leq n$. Now write
\[ \chi_{A_k}(X) = a_0(k) + a_1(k)X + \ldots + a_n(k)X^n, \qquad \chi_A(X) = a_0 + a_1X + \ldots + a_nX^n. \]
By the previous paragraph we know that
\[ a_0(k)I_n + a_1(k)A_k + \ldots + a_n(k)A_k^n = O_n \]
for all $k$. Passing to the limit and using the fact that $A_k^i \to A^i$ and $a_i(k) \to a_i$ for all $i \geq 0$, we deduce that
\[ \chi_A(A) = a_0I_n + a_1A + \ldots + a_nA^n = \lim_{k \to \infty}\big(a_0(k)I_n + a_1(k)A_k + \ldots + a_n(k)A_k^n\big) = O_n, \]
finishing the proof of the theorem. $\square$
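The theorem is also easy to test numerically. A short sanity check on a random matrix (NumPy's `np.poly` returns the coefficients of the characteristic polynomial of a square array, highest degree first):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Coefficients of chi_A, then evaluate chi_A at A term by term
# (k = 0 pairs the constant coefficient with A^0 = I).
coeffs = np.poly(A)
chi_A_of_A = sum(c * np.linalg.matrix_power(A, k)
                 for k, c in enumerate(reversed(coeffs)))
print(np.allclose(chi_A_of_A, np.zeros_like(A)))  # True
```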
Remark 9.37. a) The second half of the proof of the previous theorem essentially proves that if a polynomial equation on $\mathbb{C}^{n^2}$ holds on a dense subset, then it holds everywhere. The reader is strongly advised to convince himself that he can adapt the argument to prove this very useful result.
b) In fact, by using some deep facts from algebra one can show that the Cayley–Hamilton theorem for the field $\mathbb{C}$ just proven implies the Cayley–Hamilton theorem over an arbitrary field. One first needs to know that one can choose $n^2$ elements $x_{ij}$ of $\mathbb{C}$ such that there is no polynomial equation with integer coefficients (in $n^2$ variables) satisfied by the $x_{ij}$ (this is a generalization of a transcendental number, which is a number satisfying no polynomial equation with integer coefficients). Then, since the Cayley–Hamilton theorem holds for the matrix with entries $x_{ij}$, each entry of the resulting matrix identity gives a polynomial identity in $n^2$ indeterminates which holds over the integers. Second, one needs to know that for any field $F$ and any $n^2$ elements $a_{ij}$ of $F$ there is a morphism (a map respecting addition and multiplication) from $\mathbb{Z}[x_{11}, \ldots, x_{nn}]$ to $F$ taking $x_{ij}$ to $a_{ij}$. Thus each entry also vanishes in $F$ for the matrix $A = (a_{ij})$, and the Cayley–Hamilton theorem holds over $F$.
We will end this chapter by explaining how we can combine the ideas seen so far with Jordan's Theorems 6.40 and 6.41 to obtain a classification up to similarity of all matrices $A \in M_n(\mathbb{C})$ (Theorems 6.40 and 6.41 classified nilpotent matrices up to similarity).
Suppose that $V$ is a finite dimensional vector space over a field $F$ and that $T: V \to V$ is a trigonalizable linear transformation on $V$. Recall that this is equivalent to saying that the characteristic polynomial of $T$ is split over $F$. For instance, if $F = \mathbb{C}$, then any linear transformation on $V$ is trigonalizable. Let
\[ \chi_T(X) = \prod_{i=1}^{d}(X - \lambda_i)^{k_i} \]
be the factorization of the characteristic polynomial of $T$, with $\lambda_1, \ldots, \lambda_d \in F$ pairwise distinct and $k_1, \ldots, k_d$ positive integers. Thus $k_i$ is the algebraic multiplicity of the eigenvalue $\lambda_i$.
By the Cayley–Hamilton theorem $\chi_T(T) = 0$, thus Theorem 9.15 yields
\[ V = \bigoplus_{i=1}^{d} \ker(T - \lambda_i\operatorname{id})^{k_i}. \]
iD1
We call the subspace
Ci D ker.T i id/
the characteristic subspace of i . Note that the
and that the previous relation can be written as
V D
d
M
ki
i -eigenspace is a subspace of Ci
Ci :
iD1
Since T commutes with .T i id/ki , T leaves invariant Ci D ker.T i id/ki ,
thus each characteristic subspace Ci is stable under T .
Let $T_i$ be the restriction of $T - \lambda_i\operatorname{id}$ to $C_i$. By definition, $T_i^{k_i} = 0$, thus $T_i$ is a nilpotent transformation on $C_i$, of index not exceeding $k_i$. Thus $T_i$ is classified up to similarity by a Jordan matrix; that is, there is a basis $B_i$ of $C_i$ in which the matrix of $T_i$ is
\[ \begin{pmatrix} J_{k_{1,i}} & 0 & \ldots & 0 \\ 0 & J_{k_{2,i}} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & J_{k_{r_i,i}} \end{pmatrix} \]
for a sequence $k_{1,i} \geq \ldots \geq k_{r_i,i}$ of positive integers adding up to $\dim C_i$. We recall that
\[ J_k = \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ 0 & 0 & 0 & \ldots & 0 \end{pmatrix} \]
is the Jordan block of size $k$.
Definition 9.38. If $\lambda \in F$, we let
\[ J_n(\lambda) = \lambda I_n + J_n \in M_n(F) \]
be the Jordan block of size $n$ associated with $\lambda$.
The previous discussion naturally leads to:
Theorem 9.39 (Jordan). Let $T: V \to V$ be a trigonalizable linear transformation on a finite dimensional vector space. Then there is a basis of $V$ in which the matrix of $T$ is of the form
\[ \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \ldots & 0 \\ 0 & J_{k_2}(\lambda_2) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & J_{k_d}(\lambda_d) \end{pmatrix} \]
for some positive integers $k_1, \ldots, k_d$ adding up to $n$ and some $\lambda_1, \ldots, \lambda_d \in F$.
Proof. With notations as above, we found a basis $B_i$ of $C_i$ in which the matrix of the restriction of $T$ to $C_i$ is block-diagonal with diagonal blocks $J_{k_{1,i}}(\lambda_i), \ldots, J_{k_{r_i,i}}(\lambda_i)$. Patching these bases $B_i$ together yields a basis of $V$ in which the matrix of $T$ has the desired form. $\square$
We can restate the previous theorem in terms of matrices:
Theorem 9.40 (Jordan). Any trigonalizable matrix $A \in M_n(F)$ is similar to a matrix of the form
\[ \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \ldots & 0 \\ 0 & J_{k_2}(\lambda_2) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & J_{k_d}(\lambda_d) \end{pmatrix} \]
for some positive integers $k_1, \ldots, k_d$ adding up to $n$ and some $\lambda_1, \ldots, \lambda_d \in F$.
The story isn’t quite finished: we would like to know when two block-diagonal
matrices as in the theorem are similar, in other words we would like to know if
1 ; : : : ; d and k1 ; : : : ; kd are determined by the similarity class of the matrix
3
2
0
:::
0
Jk1 . 1 /
7
6 0
Jk2 . 2 / : : :
0
7
6
(?)
7:
6
::
::
:
:
:
:
5
4
:
:
:
:
0
0
: : : Jkd . d /
Suppose that A is a matrix similar to the matrix (?). Then the characteristic
polynomial of A is
A .X / D
d
Y
Jki . i /.X /:
iD1
Now, since Jn is nilpotent we have Jn .X / D X n and so
Jn . / .X / D .X /n :
It follows that
A .X / D
d
Y
.X i/
ki
iD1
and so necessarily 1 ; : : : ; d are all eigenvalues of A. Note that we did not assume
that 1 ; : : : ; d are pairwise distinct, thus we cannot conclude from the previous
equality that k1 ; : : : ; kd are the algebraic multiplicities of the eigenvalues of A. This
is not true in general: several Jordan blocks corresponding to a given eigenvalue
may appear. The problem of uniqueness is completely solved by the following:
Theorem 9.41. Suppose that a matrix $A \in M_n(F)$ is similar to
\[ \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \ldots & 0 \\ 0 & J_{k_2}(\lambda_2) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & J_{k_d}(\lambda_d) \end{pmatrix} \]
for some positive integers $k_1, \ldots, k_d$ adding up to $n$ and some $\lambda_1, \ldots, \lambda_d \in F$. Then
a) Each $\lambda_i$ is an eigenvalue of $A$.
b) For each eigenvalue $\lambda$ of $A$ and each positive integer $m$, the number of Jordan blocks $J_m(\lambda)$ among $J_{k_1}(\lambda_1), \ldots, J_{k_d}(\lambda_d)$ is
\[ N_m(\lambda) = \operatorname{rank}(A - \lambda I_n)^{m+1} - 2\operatorname{rank}(A - \lambda I_n)^m + \operatorname{rank}(A - \lambda I_n)^{m-1} \]
and depends only on the similarity class of $A$.
Proof. We have already seen part a). The proof of part b) is very similar to the solution of Problem 6.43. More precisely, let $B = A - \lambda I_n$ and observe that $B^m$ is similar to
\[ \begin{pmatrix} (J_{k_1}(\lambda_1) - \lambda I_{k_1})^m & 0 & \ldots & 0 \\ 0 & (J_{k_2}(\lambda_2) - \lambda I_{k_2})^m & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & (J_{k_d}(\lambda_d) - \lambda I_{k_d})^m \end{pmatrix}, \]
thus
\[ \operatorname{rank}(B^m) = \sum_{i=1}^{d} \operatorname{rank}(J_{k_i}(\lambda_i) - \lambda I_{k_i})^m. \]
Now, the rank of $(J_n(\mu) - \lambda I_n)^m$ is
• $n$ if $\mu \neq \lambda$, as in this case
\[ J_n(\mu) - \lambda I_n = J_n + (\mu - \lambda)I_n \]
is invertible;
• $n - m$ if $\mu = \lambda$ and $m \leq n$, as follows from Problem 6.42;
• $0$ if $\mu = \lambda$ and $m > n$, as $J_n^n = O_n$.
Hence, if $N_m(\lambda)$ is the number of Jordan blocks $J_m(\lambda)$ among $J_{k_1}(\lambda_1), \ldots, J_{k_d}(\lambda_d)$, then
\[ \operatorname{rank}(B^m) = \sum_{\substack{\lambda_i = \lambda \\ k_i \geq m}}(k_i - m) + \sum_{\lambda_i \neq \lambda} k_i. \]
Subtracting these relations written for $m-1$ and $m$ yields
\[ \operatorname{rank}(B^{m-1}) - \operatorname{rank}(B^m) = \sum_{\substack{\lambda_i = \lambda \\ k_i \geq m}} 1 \]
and finally
\[ \operatorname{rank}(B^{m-1}) - 2\operatorname{rank}(B^m) + \operatorname{rank}(B^{m+1}) = \big(\operatorname{rank}(B^{m-1}) - \operatorname{rank}(B^m)\big) - \big(\operatorname{rank}(B^m) - \operatorname{rank}(B^{m+1})\big) = \sum_{\substack{\lambda_i = \lambda \\ k_i = m}} 1 = N_m(\lambda), \]
as desired. $\square$
Note that if an eigenvalue $\lambda$ has algebraic multiplicity $1$, then there is a single Jordan block attached to $\lambda$, and it has size $1$.
Example 9.42. Consider the matrix
3
1 000 2
6 0 010 0 7
7
6
7
6
A D 6 0 0 0 0 0 7:
7
6
4 0 100 0 5
1 0 0 0 2
2
We compute $\chi_A(X)$ by expanding $\det(XI_5 - A)$ with respect to the third row and obtain (using again an expansion with respect to the second row in the new determinant)
\[
\chi_A(X) = X\begin{vmatrix}
X-1 & 0 & 0 & 2\\
0 & X & 0 & 0\\
0 & -1 & X & 0\\
-1 & 0 & 0 & X+2
\end{vmatrix}
= X^2\begin{vmatrix}
X-1 & 0 & 2\\
0 & X & 0\\
-1 & 0 & X+2
\end{vmatrix}
= X^3\begin{vmatrix}
X-1 & 2\\
-1 & X+2
\end{vmatrix} = X^4(X+1).
\]
The eigenvalue $-1$ has algebraic multiplicity 1, thus there is a single Jordan block associated with this eigenvalue, of size 1. Let us deal now with the eigenvalue 0, which has algebraic multiplicity 4. Let $N_m$ be the number of Jordan blocks of size $m$ associated with this eigenvalue. By the previous theorem
\[ N_1 = \operatorname{rank}(A^2) - 2\operatorname{rank}(A) + 5, \qquad N_2 = \operatorname{rank}(A^3) - 2\operatorname{rank}(A^2) + \operatorname{rank}(A) \]
and so on. One easily checks that $A$ has rank 3. Next, one computes
\[
A^2 = \begin{pmatrix}
-1 & 0 & 0 & 0 & 2\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
-1 & 0 & 0 & 0 & 2
\end{pmatrix}, \qquad
A^3 = \begin{pmatrix}
1 & 0 & 0 & 0 & -2\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & -2
\end{pmatrix}.
\]
Note that $A^2$ has rank 2 (it is apparent that a basis of the space spanned by its rows is given by the first and fourth row) and $A^3$ has rank 1. Thus
\[ N_1 = 2 - 2 \cdot 3 + 5 = 1, \]
thus there is one Jordan block of size 1, and
\[ N_2 = 1 - 2 \cdot 2 + 3 = 0, \]
thus there is no Jordan block of size 2. Since the sum of the sizes of the Jordan blocks associated with the eigenvalue 0 is 4, and since we already know that there is a block of size 1 and no block of size 2, we deduce that there is one block of size 3, and so the Jordan canonical form of $A$ is
\[
\begin{pmatrix}
-1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
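The rank computations in this example are easy to verify by machine. Here is a short sketch (illustrative only, not part of the book) that applies the formula of Theorem 9.41 to the matrix of Example 9.42 using NumPy:

```python
# Count Jordan blocks of size m for eigenvalue lam via
# N_m = rank(B^(m+1)) - 2 rank(B^m) + rank(B^(m-1)), with B = A - lam*I.
import numpy as np

A = np.array([[1, 0, 0, 0, -2],
              [0, 0, 1, 0,  0],
              [0, 0, 0, 0,  0],
              [0, 1, 0, 0,  0],
              [1, 0, 0, 0, -2]], dtype=float)

def N(A, lam, m):
    B = A - lam * np.eye(A.shape[0])
    rank = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
    return rank(m + 1) - 2 * rank(m) + rank(m - 1)

print([np.linalg.matrix_rank(np.linalg.matrix_power(A, k)) for k in (1, 2, 3)])
# [3, 2, 1], matching the ranks computed above
print([N(A, 0, m) for m in (1, 2, 3)], N(A, -1, 1))
# [1, 0, 1] 1: one block of size 1 and one of size 3 for 0, one of size 1 for -1
```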
9.3.1 Problems for Practice
1. Given a real number $\omega$ and two real numbers $a, b$, find all twice differentiable functions $f: \mathbb{R} \to \mathbb{R}$ satisfying $f(0) = a$, $f'(0) = b$ and
\[ f'' + \omega^2 f = 0. \]
2. Find all smooth functions $f: \mathbb{R} \to \mathbb{R}$ such that $f(0) = 1$, $f'(0) = 0$, $f''(0) = 0$ and
\[ f''' + f'' + f' + f = 0. \]
3. Let $V$ be a finite dimensional $F$-vector space and let $T: V \to V$ be a linear transformation such that $T^3 = \mathrm{id}$.
a) Prove that $V = \operatorname{Ker}(T - \mathrm{id}) \oplus \operatorname{Ker}(T^2 + T + \mathrm{id})$.
b) Prove that
\[ \operatorname{rank}(T - \mathrm{id}) = \dim \operatorname{Ker}(T^2 + T + \mathrm{id}). \]
c) Deduce that
\[ V = \operatorname{Ker}(T - \mathrm{id}) \oplus \operatorname{Im}(T - \mathrm{id}). \]
4. Describe the sequences $(x_n)_{n \ge 0}$ of complex numbers such that
\[ x_{n+4} + x_{n+3} - x_{n+1} - x_n = 0 \]
for all $n \ge 0$.
5. Find the Jordan canonical form of the matrix
\[ A = \begin{pmatrix} 1 & 0 & -3\\ 1 & -1 & -6\\ -1 & 2 & 5 \end{pmatrix}. \]
6. Compute the Jordan canonical form of the matrix
\[ A = \begin{pmatrix} 1 & 1 & 0 & 0\\ 0 & 1 & 2 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 2 \end{pmatrix}. \]
7. Consider the matrix
\[ A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 2 & 1 & 0\\ 0 & 0 & 0 & 0 & 1 & 2\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \]
a) Prove that $A^3 = O_6$ and find the characteristic polynomial of $A$.
b) Find the Jordan canonical form of $A$.
8. What are the possible Jordan forms of a matrix whose characteristic polynomial is $(X-1)(X-2)^2$?
9. Consider a matrix $A \in M_6(\mathbb{C})$ of rank 4 whose minimal polynomial is $X(X-1)(X-2)^2$.
a) What are the eigenvalues of $A$?
b) Is $A$ diagonalizable?
c) What are the possible forms of the Jordan canonical form of $A$?
10. Prove that any matrix similar to a matrix of the form
\[
\begin{pmatrix}
J_{k_1}(\lambda_1) & 0 & \cdots & 0\\
0 & J_{k_2}(\lambda_2) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & J_{k_d}(\lambda_d)
\end{pmatrix}
\]
is trigonalizable (this is a converse to Jordan's theorem).
11. a) What is the minimal polynomial of $J_n(\lambda)$ when $\lambda \in \mathbb{C}$ and $n \ge 1$?
b) Explain how we can compute the minimal polynomial of a matrix in terms of its Jordan canonical form.
12. Prove that two matrices $A, B \in M_n(\mathbb{C})$ are similar if and only if $P(A)$ and $P(B)$ have the same rank for all polynomials $P \in \mathbb{C}[X]$.
13. Use Jordan's theorem to prove that any matrix $A \in M_n(\mathbb{C})$ is similar to its transpose.
14. a) Prove that if $A \in M_n(\mathbb{C})$ is similar to $2A$, then $A$ is nilpotent.
b) Use Jordan's theorem to prove that if $A \in M_n(\mathbb{C})$ is nilpotent, then $A$ is similar to $2A$.
15. Let $T: V \to V$ be a trigonalizable linear transformation on a finite dimensional vector space $V$ over a field $F$. Let
\[ \chi_T(X) = \prod_{i=1}^d (X - \lambda_i)^{k_i} \]
be the factorization of its characteristic polynomial and let $C_i = \ker(T - \lambda_i \mathrm{id})^{k_i}$ be the characteristic subspace of $\lambda_i$.
a) Prove that $\ker(T - \lambda_i \mathrm{id})^k = C_i$ for all $k \ge k_i$. Hint: use Theorem 9.15 to show that $V = \ker(T - \lambda_i \mathrm{id})^k \oplus \bigoplus_{j \ne i} C_j$, then take dimensions.
b) Prove that
\[ \dim C_i = k_i. \]
Hint: consider the matrix of $T$ with respect to a basis of $V$ obtained by patching a basis of $C_i$ and a basis of a complementary subspace of $C_i$. What is its characteristic polynomial?
c) Prove that the smallest positive integer $k$ for which
\[ \ker(T - \lambda_i \mathrm{id})^k = C_i \]
is the multiplicity of $\lambda_i$ as a root of the minimal polynomial of $T$.
16. (The Dunford–Jordan decomposition)
a) Using Jordan's theorem, prove that any trigonalizable linear transformation $T: V \to V$ on a finite dimensional vector space is the sum of a diagonalizable and of a nilpotent transformation, the two transformations commuting with each other.
b) State the result obtained in a) in terms of matrices.
c) Conversely, prove that the sum of a nilpotent and of a diagonalizable transformation which commute with each other is trigonalizable.
17. (More on the Dunford–Jordan decomposition) Let $T: V \to V$ be a trigonalizable linear transformation with
\[ \chi_T(X) = \prod_{i=1}^d (X - \lambda_i)^{k_i} \]
as in Problem 15. Let $C_i$ be the characteristic subspace of the eigenvalue $\lambda_i$. We define the $\lambda_i$-spectral projection $\pi_i$ as the projection of $V$ onto $C_i$ along $\bigoplus_{j \ne i} C_j$. Thus by definition, if $v \in V$ is written as $v_1 + \dots + v_d$ with $v_i \in C_i$, then
\[ \pi_i(v) = v_i. \]
a) Use the proof of Theorem 9.15 to show that $\pi_i \in F[T]$.
b) Let
\[ D = \sum_{i=1}^d \lambda_i \pi_i. \]
Prove that $D$ is a diagonalizable linear transformation on $V$, that $N = T - D$ is nilpotent and $N \circ D = D \circ N$. Thus $D, N$ give a Dunford–Jordan decomposition of $T$.
c) Prove that $D$ and $N$ are in $F[T]$.
d) Deduce from part c) that if $D'$ is diagonalizable, $N'$ is nilpotent, $D'$ and $N'$ commute and $D' + N' = T$, then $D' = D$ and $N' = N$. In other words, the pair $(D, N)$ in the Dunford–Jordan decomposition is unique.
e) Find the Dunford–Jordan decomposition of the matrices
\[ A = \begin{pmatrix} -1 & 1 & 0\\ 0 & -1 & 1\\ 0 & 0 & -1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 0 & -3\\ 1 & -1 & -6\\ -1 & 2 & 5 \end{pmatrix}. \]
Chapter 10
Forms
Abstract This chapter has a strong geometrical flavor. It starts with a discussion
of bilinear and quadratic forms and uses this to introduce Euclidean spaces and
establish their main geometric properties. This is in turn applied to linear algebra,
leading to a classification of symmetric and orthogonal matrices with real entries.
Keywords Quadratic form • Bilinear form • Polar form • Euclidean space
• Inner-product • Positive-definite matrix • Orthogonal projection
• Gram–Schmidt algorithm
The goal of this last chapter is to make a rather detailed study of Euclidean spaces over the real numbers. Euclidean spaces make the link between linear algebra, geometry and analysis. They are therefore of fundamental importance. The geometric insight they offer also reveals unexpected and deep properties of symmetric and orthogonal matrices. Thus, on the one hand, proving the fundamental theorems concerning Euclidean spaces will use essentially everything we have developed so far, so this is also an opportunity to see real applications of linear algebra; on the other hand, the geometry of Euclidean spaces helps in discovering and proving many interesting properties of matrices! Among the important topics discussed in this chapter, we mention: basic properties of bilinear and quadratic forms, orthogonality and inequalities in Euclidean spaces, orthogonal projections and their applications to minimization problems, orthogonal bases and their applications, for instance to Fourier analysis, the classification of isometries (i.e., linear transformations preserving distances) of an Euclidean space, the classification of symmetric matrices, and its applications to matrix inequalities, norms, etc. Throughout this chapter we work with the field $F = \mathbb{R}$ of real numbers. Many exercises (left to the reader) are devoted to the analogous theory over the field of complex numbers.
10.1 Bilinear and Quadratic Forms
We have already introduced the notion of $d$-linear form on a vector space in the chapter devoted to determinants. We will be concerned with a special case of this notion, and for the reader's convenience we will give the detailed definition in this special case:
Definition 10.1. Let $V$ be a vector space over $\mathbb{R}$. A bilinear form on $V$ is a map $b: V \times V \to \mathbb{R}$ such that
• For all $x \in V$ the map $b(x, \cdot): V \to \mathbb{R}$ sending $v$ to $b(x, v)$ is linear.
• For all $y \in V$ the map $b(\cdot, y): V \to \mathbb{R}$ sending $v$ to $b(v, y)$ is linear.
The bilinear form $b$ is called symmetric if $b(x, y) = b(y, x)$ for all $x, y \in V$.
Remark 10.2. If $x_1, \dots, x_n \in V$, $y_1, \dots, y_m \in V$ and $a_1, \dots, a_n, c_1, \dots, c_m \in \mathbb{R}$, then for any bilinear form $b$ on $V$ we have
\[ b\Big(\sum_{i=1}^n a_i x_i,\ \sum_{j=1}^m c_j y_j\Big) = \sum_{i=1}^n \sum_{j=1}^m a_i c_j\, b(x_i, y_j). \tag{10.1} \]
In particular, if $V$ is finite dimensional and if $e_1, \dots, e_n$ is a basis of $V$, then $b$ is uniquely determined by its values at the pairs $(e_i, e_j)$ with $1 \le i, j \le n$ (i.e., if $b, b'$ are bilinear forms on $V$ and $b(e_i, e_j) = b'(e_i, e_j)$ for all $1 \le i, j \le n$, then $b = b'$).
Example 10.3. a) If $a_1, \dots, a_n$ are real numbers and $V = \mathbb{R}^n$, then setting, for $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$,
\[ b(x, y) = a_1 x_1 y_1 + \dots + a_n x_n y_n \]
yields a symmetric bilinear form on $V$. The choice $a_1 = \dots = a_n = 1$ is particularly important and in this case we call $b$ the canonical inner product on $\mathbb{R}^n$.
b) Consider the space $V$ of continuous, real-valued functions on $[-1, 1]$. Then
\[ b(f, g) = \int_{-1}^1 f(x) g(x)\, dx \]
is a symmetric bilinear form on $V$, as the reader can easily check.
c) Let $V$ be the space $M_n(\mathbb{R})$ of $n \times n$ matrices with real entries, and consider the map $b: V \times V \to \mathbb{R}$ defined by
\[ b(A, B) = \operatorname{Tr}(AB). \]
Then $b$ is a symmetric bilinear form on $V$. A slightly different and more commonly used variant is
\[ b'(A, B) = \operatorname{Tr}(A\, {}^t B). \]
The reason why $b'$ is preferred to $b$ is that one can easily check that if $A = [a_{ij}]$ and $B = [b_{ij}]$, then
\[ b'(A, B) = \sum_{i,j=1}^n a_{ij} b_{ij}, \]
that is, if we identify $V = \mathbb{R}^{n^2}$ via the canonical basis of $V$, then $b'$ becomes identified with the canonical inner product on $\mathbb{R}^{n^2}$.
d) Let $V$ be the space of sequences $(x_n)_{n \ge 1}$ of real numbers for which $\sum_{n \ge 1} x_n^2$ is a convergent series. Define
\[ b(x, y) = \sum_{n \ge 1} x_n y_n \]
for $x = (x_n)_{n \ge 1}$ and $y = (y_n)_{n \ge 1}$ in $V$. Note that the series $\sum_{n \ge 1} x_n y_n$ converges since it converges absolutely. Indeed, we have $(|x_n| - |y_n|)^2 \ge 0$, which can be written as
\[ |x_n y_n| \le \frac{x_n^2 + y_n^2}{2}, \]
and by assumption the series with general term $\frac{x_n^2 + y_n^2}{2}$ converges. One can easily check that $b$ is a symmetric bilinear form on $V$.
e) Let $V$ be the space of polynomials with real coefficients and, for $P, Q \in V$, define
\[ b(P, Q) = \sum_{n \ge 1} \frac{P(n) Q(n)}{2^n}. \]
Note that the series converges absolutely, since $n^k/2^n = O(1/n^2)$ for all $k \ge 1$. Then $b$ is a symmetric bilinear form.
It follows easily by unwinding definitions that the set of all bilinear forms on $V$ is naturally a vector subspace of the vector space of all maps $V \times V \to \mathbb{R}$. Moreover, the subset of symmetric bilinear forms is a subspace of the space of all bilinear forms on $V$. To any bilinear form $b$ one can attach a map of one variable
\[ q: V \to \mathbb{R}, \qquad q(x) = b(x, x). \]
This is called the quadratic form attached to $b$. Let us formally define quadratic forms:
Definition 10.4. A quadratic form on $V$ is a map $q: V \to \mathbb{R}$ for which there is a bilinear form $b: V \times V \to \mathbb{R}$ such that $q(x) = b(x, x)$ for all $x \in V$.
A natural question is whether the bilinear form $b$ attached to a quadratic form as in the previous definition is uniquely determined by the quadratic form. So the question is whether we can have two different bilinear forms $b_1, b_2$ such that
\[ b_1(x, x) = b_2(x, x) \]
for all $x$. Stated differently, is there a nonzero bilinear form $b$ such that $b(x, x) = 0$ for all $x \in V$? The answer is yes: consider the bilinear form $b: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ defined by
\[ b((x_1, y_1), (x_2, y_2)) = x_1 y_2 - x_2 y_1. \]
Clearly this is a nonzero bilinear form and $b(x, x) = 0$ for all $x$. On the other hand, if we further impose that $b$ should be symmetric, then we have uniqueness, as shown by the following fundamental result:
Theorem 10.5. For any quadratic form $q: V \to \mathbb{R}$ there is a unique symmetric bilinear form $b: V \times V \to \mathbb{R}$ such that $q(x) = b(x, x)$ for all $x \in V$. It is determined by the polarization identity
\[ b(x, y) = \frac{q(x+y) - q(x) - q(y)}{2}. \]
Proof. Fix a quadratic form $q: V \to \mathbb{R}$. By hypothesis we can find a bilinear (but not necessarily symmetric) form $B$ such that $q(x) = B(x, x)$ for all $x \in V$. Define a map $b: V \times V \to \mathbb{R}$ by
\[ b(x, y) = \frac{q(x+y) - q(x) - q(y)}{2}. \]
We claim that $b$ is a symmetric bilinear form and $b(x, x) = q(x)$. By definition, we have
\[ b(x, y) = \frac{B(x+y, x+y) - B(x, x) - B(y, y)}{2}. \]
Since $B$ is bilinear, we can write
\[ B(x+y, x+y) = B(x, x) + B(x, y) + B(y, x) + B(y, y). \]
Thus
\[ b(x, y) = \frac{B(x, y) + B(y, x)}{2}. \]
This makes it clear that $b(x, x) = B(x, x) = q(x)$ and that $b(x, y) = b(y, x)$ for all $x, y \in V$. It remains to see that $b$ is bilinear. But for fixed $x$ the maps $B(x, \cdot)$ and $B(\cdot, x)$ are linear (since $B$ is bilinear), thus so is the map
\[ b(x, \cdot) = \frac{B(x, \cdot) + B(\cdot, x)}{2}. \]
Similarly, $b(\cdot, x)$ is linear for all $x \in V$, establishing that $b$ is bilinear and proving the claim.
Let us now show that $b$ is unique. If $b'$ is another symmetric bilinear form such that $b'(x, x) = q(x)$ for all $x$, then a computation as in the previous paragraph gives
\[ q(x+y) = b'(x+y, x+y) = b'(x, x) + 2b'(x, y) + b'(y, y) = q(x) + q(y) + 2b'(x, y), \]
thus necessarily $b'(x, y) = b(x, y)$ for all $x, y$, that is $b' = b$. $\square$
Definition 10.6. If $b$ is attached to $q$ as in the previous theorem, we call $b$ the polar form of $q$.
Example 10.7. a) Consider the space $V = \mathbb{R}^n$ and the map $q: \mathbb{R}^n \to \mathbb{R}$ defined by
\[ q(x_1, \dots, x_n) = x_1^2 + \dots + x_n^2. \]
Then $q$ is a quadratic form and its polar form is
\[ b((x_1, \dots, x_n), (y_1, \dots, y_n)) = x_1 y_1 + \dots + x_n y_n. \]
Indeed, let us compute for $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$:
\[ \frac{q(x+y) - q(x) - q(y)}{2} = \frac{\sum_{i=1}^n (x_i + y_i)^2 - \sum_{i=1}^n x_i^2 - \sum_{i=1}^n y_i^2}{2} = \sum_{i=1}^n x_i y_i. \]
The map $(x, y) \mapsto \sum_{i=1}^n x_i y_i$ being bilinear and symmetric, it follows on the one hand that $q$ is a quadratic form and on the other hand that $b$ is its polar form.
b) Consider the space $V$ of continuous real-valued maps on $[0, 1]$ and define $q: V \to \mathbb{R}$ by
\[ q(f) = \int_0^1 f(x)^2\, dx. \]
To see that $q$ is a quadratic form and to find its polar form, we compute
\[ \frac{q(f+g) - q(f) - q(g)}{2} = \frac{\int_0^1 (f(x)+g(x))^2 dx - \int_0^1 f(x)^2 dx - \int_0^1 g(x)^2 dx}{2} = \int_0^1 f(x) g(x)\, dx. \]
Since the map $b$ defined by $b(f, g) = \int_0^1 f(x) g(x)\, dx$ is bilinear and symmetric, it follows that $q$ is a quadratic form with polar form $b$.
c) As a counter-example, consider the map $q: \mathbb{R}^2 \to \mathbb{R}$ defined by
\[ q(x, y) = x^2 + 2y^2 + 3x. \]
We claim that $q$ is not a quadratic form. Indeed, otherwise, letting $b$ be its polar form, we would have
\[ b((x, y), (x, y)) = x^2 + 2y^2 + 3x \]
for all $(x, y) \in \mathbb{R}^2$. Replacing $x$ by $-x$ and $y$ by $-y$ and taking into account that $b$ is bilinear, we obtain
\[ x^2 + 2y^2 - 3x = b((-x, -y), (-x, -y)) = b((x, y), (x, y)) = x^2 + 2y^2 + 3x, \]
thus $6x = 0$, and this for all $x \in \mathbb{R}$, which is plainly absurd.
The previous theorem therefore establishes a bijection between quadratic forms and symmetric bilinear forms: any symmetric bilinear form $b$ determines a quadratic form $x \mapsto b(x, x)$, and any quadratic form determines a symmetric bilinear form, namely its polar form.
Problem 10.8. Let $q$ be a quadratic form on $V$, with polar form $b$.
a) Prove that for all $x, y \in V$
\[ b(x, y) = \frac{q(x+y) - q(x-y)}{4}. \]
b) (Parallelogram law) Prove that for all $x, y \in V$
\[ q(x+y) + q(x-y) = 2(q(x) + q(y)). \]
c) (Pythagorean theorem) Prove that for all $x, y \in V$ we have $b(x, y) = 0$ if and only if
\[ q(x+y) = q(x) + q(y). \]
Solution. a) By the polarization identity we have
\[ q(x+y) = q(x) + q(y) + 2b(x, y) \]
and (noting that $q(-y) = q(y)$ and $b(x, -y) = -b(x, y)$)
\[ q(x-y) = q(x) + q(y) - 2b(x, y). \]
Subtracting the two previous relations yields the desired result.
b) It suffices to add the two relations established in the proof of part a).
c) This follows directly from the polarization identity. $\square$
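Both polarization formulas are easy to check numerically. Here is a minimal sketch (illustrative only, not from the book), using the quadratic form of Example 10.7 a), whose polar form is the canonical inner product:

```python
# Check b(x,y) = (q(x+y)-q(x)-q(y))/2 = (q(x+y)-q(x-y))/4 for q(x) = sum x_i^2.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)
q = lambda v: np.dot(v, v)

b1 = (q(x + y) - q(x) - q(y)) / 2   # polarization identity of Theorem 10.5
b2 = (q(x + y) - q(x - y)) / 4      # the variant of Problem 10.8 a)
print(np.allclose(b1, x @ y), np.allclose(b2, x @ y))  # True True
```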
Let us try to understand the quadratic forms on $\mathbb{R}^n$. If $q$ is a quadratic form on $\mathbb{R}^n$ with polar form $b$, and if $e_1, \dots, e_n$ is the canonical basis of $\mathbb{R}^n$, then for all $x = x_1 e_1 + \dots + x_n e_n \in \mathbb{R}^n$ we have, using Remark 10.2,
\[ q(x_1, \dots, x_n) = b(x_1 e_1 + \dots + x_n e_n,\ x_1 e_1 + \dots + x_n e_n) = \sum_{i,j=1}^n b(e_i, e_j) x_i x_j = \sum_{i,j=1}^n a_{ij} x_i x_j, \]
with $a_{ij} = b(e_i, e_j)$. Notice that since $b(e_i, e_j) = b(e_j, e_i)$, we have $a_{ij} = a_{ji}$, thus any quadratic form $q$ on $\mathbb{R}^n$ can be written
\[ q(x_1, \dots, x_n) = \sum_{i,j=1}^n a_{ij} x_i x_j = \sum_{i=1}^n a_{ii} x_i^2 + 2\sum_{1 \le i < j \le n} a_{ij} x_i x_j, \]
with $A = [a_{ij}]$ a symmetric matrix.
Conversely, if $A = [a_{ij}]$ is any matrix in $M_n(\mathbb{R})$ (not necessarily symmetric), then the map
\[ q: \mathbb{R}^n \to \mathbb{R}, \qquad q(x_1, \dots, x_n) = \sum_{i,j=1}^n a_{ij} x_i x_j \]
is a quadratic form on $\mathbb{R}^n$, with polar form
\[ b((x_1, \dots, x_n), (x_1', \dots, x_n')) = \sum_{i,j=1}^n \frac{a_{ij}(x_i x_j' + x_j x_i')}{2}. \]
We leave this as an easy exercise for the reader. Notice that
\[ q(x) = \sum_{i,j=1}^n a_{ij} x_i x_j = \sum_{i,j=1}^n b_{ij} x_i x_j, \qquad \text{with} \qquad b_{ij} = \frac{a_{ij} + a_{ji}}{2}, \]
and the matrix $B = [b_{ij}]$ is symmetric.
There is another natural way of constructing quadratic forms on $\mathbb{R}^n$: pick real numbers $\alpha_1, \dots, \alpha_r$ and linear forms $l_1, \dots, l_r$ on $\mathbb{R}^n$, and set
\[ q(x) = \alpha_1 l_1(x)^2 + \dots + \alpha_r l_r(x)^2. \]
Then $q$ is a quadratic form on $\mathbb{R}^n$, with associated polar form given by
\[ b(x, y) = \sum_{i=1}^r \alpha_i l_i(x) l_i(y), \]
as the reader can easily check. The following amazing result due to Gauss says that we obtain in this way all quadratic forms on $\mathbb{R}^n$. Moreover, Gauss described an algorithm which allows us to write a given quadratic form $q$ in the form
\[ q = \alpha_1 l_1^2 + \dots + \alpha_r l_r^2, \]
with $l_1, \dots, l_r$ linearly independent linear forms. This algorithm will be described in the (long) proof of the following theorem.
Theorem 10.9 (Gauss). Let $q$ be a quadratic form on $V = \mathbb{R}^n$. There are real numbers $\alpha_1, \dots, \alpha_r$ and linearly independent linear forms $l_1, \dots, l_r \in V^*$ such that for all $x \in V$
\[ q(x) = \alpha_1 l_1(x)^2 + \dots + \alpha_r l_r(x)^2. \]
Before giving the proof of the theorem, let us make some further remarks on the statement. Of course, we may assume that $\alpha_i \ne 0$ for $1 \le i \le r$, otherwise simply delete the corresponding term $\alpha_i l_i^2$. Let $I$ be the set of those $i$ for which $\alpha_i > 0$ and let $J$ be the set of those $i$ for which $\alpha_i < 0$. Then
\[ q(x) = \sum_{i \in I} (\sqrt{\alpha_i}\, l_i)^2(x) - \sum_{i \in J} (\sqrt{-\alpha_i}\, l_i)^2(x) \]
and defining
\[ L_i = \sqrt{\alpha_i}\, l_i \ \text{ if } i \in I \qquad \text{and} \qquad L_i = \sqrt{-\alpha_i}\, l_i \ \text{ if } i \in J, \]
we obtain
\[ q = \sum_{i \in I} L_i^2 - \sum_{i \in J} L_i^2. \]
Moreover, since $l_1, \dots, l_r$ are linearly independent, so are $L_1, \dots, L_r$. In other words, we can refine the previous theorem by asking that $\alpha_i \in \{-1, 1\}$ for all $i$. One can prove that the number of elements of $I$ and $J$, as well as the number $r$, are uniquely determined by $q$ (this is Sylvester's inertia theorem). The pair $(|I|, |J|)$ consisting of the number of elements of $I$ and $J$ is called the signature of $q$. We call $|I| + |J| = r$ the rank of $q$ (we will see another interpretation of $r$ later on, which will also explain its name).
We now start the algorithmic proof of Theorem 10.9, by induction on $n$. For $n = 1$ we can write $q(x_1) = \alpha_1 x_1^2$, where $x_1 \in \mathbb{R}$ and $\alpha_1 = q(1) \in \mathbb{R}$, so the result holds.
Assume now that the result holds for $n - 1$. We can write
\[ q(x_1, \dots, x_n) = \sum_{i=1}^n a_{ii} x_i^2 + 2\sum_{1 \le i < j \le n} a_{ij} x_i x_j \]
for some scalars $a_{ij} \in \mathbb{R}$. We will discuss two cases:
• There is $i \in \{1, 2, \dots, n\}$ such that $a_{ii} \ne 0$. Without loss of generality, we may assume that $a_{nn} \ne 0$. We consider $q(x_1, \dots, x_n)$ as a quadratic polynomial in the variable $x_n$ and complete the square, to obtain
\[ q(x_1, \dots, x_n) = a_{nn}\left(x_n^2 + 2\sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i x_n\right) + \sum_{i=1}^{n-1} a_{ii} x_i^2 + 2\sum_{1 \le i < j \le n-1} a_{ij} x_i x_j \]
\[ = a_{nn}\left(x_n + \sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i\right)^2 + q'(x_1, \dots, x_{n-1}), \]
where $q'$ is a quadratic form on $\mathbb{R}^{n-1}$. By induction, we can write
\[ q'(x_1, \dots, x_{n-1}) = \sum_{i=1}^r \alpha_i L_i(x_1, \dots, x_{n-1})^2 \]
for some linearly independent linear forms $L_i$ in $x_1, \dots, x_{n-1}$. Defining
\[ l_{r+1}(x_1, \dots, x_n) = x_n + \sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i, \qquad \alpha_{r+1} = a_{nn} \]
and
\[ l_i(x_1, \dots, x_n) = L_i(x_1, \dots, x_{n-1}) \]
for $1 \le i \le r$, we obtain
\[ q(x) = \sum_{i=1}^{r+1} \alpha_i l_i(x)^2 \]
for all $x \in V$ and the inductive step is finished (we leave it to the reader to check that $l_1, \dots, l_{r+1}$ are linearly independent).
• All $a_{ii} = 0$. If all $a_{ij} = 0$, then $q = 0$ and the result is clear. If not, without loss of generality we may assume that $a_{n-1,n} \ne 0$. We use the identity
\[ axy + bx + cy = a\left(xy + \frac{b}{a}x + \frac{c}{a}y\right) = a\left(x + \frac{c}{a}\right)\left(y + \frac{b}{a}\right) - \frac{bc}{a} \]
to rewrite
\[ q(x_1, \dots, x_n) = 2a_{n-1,n} x_{n-1} x_n + 2\sum_{i=1}^{n-2} a_{i,n-1} x_i x_{n-1} + 2\sum_{i=1}^{n-2} a_{in} x_i x_n + 2\sum_{1 \le i < j \le n-2} a_{ij} x_i x_j \]
\[ = 2a_{n-1,n}\left(x_{n-1} + \sum_{i=1}^{n-2} \frac{a_{in}}{a_{n-1,n}} x_i\right)\left(x_n + \sum_{i=1}^{n-2} \frac{a_{i,n-1}}{a_{n-1,n}} x_i\right) + q'(x_1, \dots, x_{n-2}) \]
for some quadratic form $q'$ on $\mathbb{R}^{n-2}$. Applying the inductive hypothesis, we can write
\[ q'(x_1, \dots, x_{n-2}) = \sum_{i=1}^r \alpha_i L_i(x_1, \dots, x_{n-2})^2 \]
for some linearly independent linear forms $L_i$ of $x_1, \dots, x_{n-2}$, as well as some scalars $\alpha_i$. Using the identity
\[ ab = \frac{(a+b)^2 - (a-b)^2}{4}, \]
we obtain
\[ 2a_{n-1,n}\left(x_{n-1} + \sum_{i=1}^{n-2} \frac{a_{in}}{a_{n-1,n}} x_i\right)\left(x_n + \sum_{i=1}^{n-2} \frac{a_{i,n-1}}{a_{n-1,n}} x_i\right) = \frac{a_{n-1,n}}{2}\big(l_{r+1}(x_1, \dots, x_n)^2 - l_{r+2}(x_1, \dots, x_n)^2\big), \]
where
\[ l_{r+1}(x_1, \dots, x_n) = x_{n-1} + x_n + \sum_{i=1}^{n-2} \frac{a_{in} + a_{i,n-1}}{a_{n-1,n}} x_i \]
and
\[ l_{r+2}(x_1, \dots, x_n) = x_{n-1} - x_n + \sum_{i=1}^{n-2} \frac{a_{in} - a_{i,n-1}}{a_{n-1,n}} x_i. \]
All in all, setting
\[ \alpha_{r+1} = -\alpha_{r+2} = \frac{a_{n-1,n}}{2}, \]
we have
\[ q(x) = \sum_{i=1}^{r+2} \alpha_i l_i(x)^2. \]
We leave it to the reader to check that $l_1, \dots, l_{r+2}$ are linearly independent. This finishes the proof of Theorem 10.9.
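Since the proof is entirely algorithmic, it can be carried out by a computer algebra system. The following compact sketch (an illustration under the conventions above, not code from the book; it assumes SymPy and that the input is a genuine quadratic form in the listed variables) completes the square when a square term is present, and otherwise removes a cross term $xy$ through the substitution $x = u + v$, $y = u - v$, a variant of the identity $ab = \frac{(a+b)^2 - (a-b)^2}{4}$ used in the second case:

```python
# Sketch of Gauss's reduction: returns pairs (alpha_i, l_i) with
# q == sum of alpha_i * l_i**2, the l_i linearly independent linear forms.
import sympy as sp

def gauss_reduce(q, xs):
    q = sp.expand(q)
    if q == 0:
        return []
    # Case 1: some variable has a nonzero square coefficient.
    for i, x in enumerate(xs):
        a = q.coeff(x, 2)
        if a != 0:
            l = x + q.coeff(x, 1) / (2 * a)          # complete the square in x
            rest = xs[:i] + xs[i + 1:]
            return [(a, sp.expand(l))] + gauss_reduce(q - a * l**2, rest)
    # Case 2: all squares vanish; pick a cross term x*y and substitute
    # x = u + v, y = u - v, so that x*y becomes u**2 - v**2.
    x, y = next((x, y) for x in xs for y in xs
                if x != y and q.coeff(x, 1).coeff(y, 1) != 0)
    u, v = sp.Dummy('u'), sp.Dummy('v')
    rest = [z for z in xs if z not in (x, y)]
    res = gauss_reduce(sp.expand(q.subs({x: u + v, y: u - v})), [u, v] + rest)
    back = {u: (x + y) / 2, v: (x - y) / 2}
    return [(a, sp.expand(l.subs(back))) for a, l in res]

x, y, z = sp.symbols('x y z')
print(gauss_reduce(x*y + y*z + z*x, [x, y, z]))
# [(1, x/2 + y/2 + z), (-1, x/2 - y/2), (-1, z)]
```

On $q = xy + yz + zx$ this yields $\left(\frac{x+y}{2}+z\right)^2 - \left(\frac{x-y}{2}\right)^2 - z^2$, a decomposition of the same shape as (though not identical to) the one found in Problem 10.10 a) below; such decompositions are not unique, only the signature is.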
Problem 10.10. Implement the algorithm described in the previous proof in each of the following cases:
a) $q$ is the quadratic form on $\mathbb{R}^3$ defined by $q(x, y, z) = xy + yz + zx$.
b) $q$ is the quadratic form on $\mathbb{R}^3$ defined by $q(x, y, z) = (x-y)^2 + (y-z)^2 + (z-x)^2$.
Solution. a) With the notations of the previous proof, we have $a_{ii} = 0$ for $1 \le i \le 3$, thus we are in the second case of the above proof. We focus on the nonzero term $yz$ and we write
\[ q(x, y, z) = (y + x)(z + x) - x^2. \]
Next, using the identity
\[ ab = \frac{(a+b)^2 - (a-b)^2}{4} \]
we obtain
\[ (y + x)(z + x) = \frac{(2x + y + z)^2 - (y - z)^2}{4}. \]
We conclude that
\[ q(x, y, z) = \frac{1}{4}(2x + y + z)^2 - \frac{1}{4}(y - z)^2 - x^2 \]
and one easily checks that the linear forms $2x + y + z$, $y - z$ and $x$ are linearly independent.
b) It is tempting to say that $q$ is already written in the desired form, but the problem is that the linear forms $x - y$, $y - z$, and $z - x$ are not linearly independent (they add up to 0). Therefore we write (by brutal expansion)
\[ q(x, y, z) = (x-y)^2 + (y-z)^2 + (z-x)^2 = 2(x^2 + y^2 + z^2 - xy - yz - zx). \]
We are in the first case of the previous proof, so we focus on the term $x^2$ and try to complete the square:
\[ q(x, y, z) = 2\left(x - \frac{y+z}{2}\right)^2 - \frac{(y+z)^2}{2} + 2y^2 + 2z^2 - 2yz = 2\left(x - \frac{y+z}{2}\right)^2 + \frac{3y^2 + 3z^2 - 6yz}{2} = 2\left(x - \frac{y+z}{2}\right)^2 + \frac{3}{2}(y - z)^2 \]
and we easily check that the linear forms $x - \frac{y+z}{2}$ and $y - z$ are linearly independent. $\square$
10.1.1 Problems for Practice
1. Prove that the map
\[ b: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}, \qquad b((x, y), (z, t)) = xt - yz \]
is a bilinear form on $\mathbb{R}^2$. Describe the associated quadratic form.
2. Consider the map $q: \mathbb{R}^4 \to \mathbb{R}$,
\[ q(x, y, z, t) = xy + 2z^2 + tx - t^2. \]
a) Prove that $q$ is a quadratic form and find its polar form.
b) Implement Gauss' algorithm and write $q$ in the form $\sum_{i=1}^r \alpha_i l_i^2$ with real numbers $\alpha_i$ and linearly independent linear forms $l_i$.
c) What is the signature of $q$?
3. Use Gauss' algorithm to write each of the following quadratic forms as $\sum_{i=1}^r \alpha_i l_i^2$ with linearly independent linear forms $l_1, \dots, l_r$ and scalars $\alpha_1, \dots, \alpha_r$.
a) $q(x, y, z) = (x - 2y + z)^2 - (x - y)^2 + z^2$.
b) $q(x, y, z) = (x - 2y + z)^2 + (y - 2z + x)^2 - (z - 2x + y)^2$.
c) $q(x, y, z, t) = xy + yz + zt + tx$.
d) $q(x, y, z) = x^2 + xy + yz + zx$.
For each of these quadratic forms, find its signature and its rank.
4. a) If $q$ is a quadratic form on $\mathbb{R}^n$, is it true that $\{x \in \mathbb{R}^n \mid q(x) = 0\}$ is a vector subspace of $\mathbb{R}^n$?
b) Describe geometrically $\{x \in \mathbb{R}^n \mid q(x) = 0\}$ if $q(x, y) = x^2 - 2y^2$, if $q(x, y) = x^2 + y^2$ and finally if $q(x, y, z) = x^2 + y^2 - z^2$.
5. Which of the following maps are quadratic forms:
a) $q: \mathbb{R}^3 \to \mathbb{R}$, $q(x, y, z) = x^2 + y^3 + z^2$.
b) $q: \mathbb{R}^4 \to \mathbb{R}$, $q(x, y, z, t) = xt - z^2 + zt - y$.
c) $q: \mathbb{R}^4 \to \mathbb{R}$, $q(x, y, z, t) = (x + z)(y + t)$?
6. Let $V$ be the space of continuous real-valued maps on $[-1, 1]$ and consider the map $b: V \times V \to \mathbb{R}$ defined by
\[ b(f, g) = \int_{-1}^1 (1 - t^2) f(t) g(t)\, dt + f'(1) g'(1). \]
a) Prove that $b$ is a symmetric bilinear form on $V$.
b) If $q$ is the associated quadratic form, find those $f \in V$ for which $q(f) = 0$.
7. Let $b$ be a bilinear form on a vector space $V$ over $\mathbb{R}$. The kernel of $b$ is the set $\ker b$ defined by
\[ \ker b = \{x \in V \mid b(x, y) = 0 \ \ \forall y \in V\}. \]
a) Prove that $\ker b$ is a vector subspace of $V$.
b) Find the kernel of the polar form of the quadratic form $q(x, y, z) = xy + yz + zx$ on $\mathbb{R}^3$.
8. If $b$ is a bilinear form on a vector space $V$ over $\mathbb{R}$, is it true that $\{(x, y) \in V \times V \mid b(x, y) = 0\}$ is a vector subspace of $V \times V$?
9. Let $V = M_n(\mathbb{R})$ and consider the map $q: V \to \mathbb{R}$ defined by
\[ q(A) = \operatorname{Tr}({}^t\!A A) + (\operatorname{Tr}(A))^2. \]
Prove that $q$ is a quadratic form on $V$ and describe its polar form.
One can define bilinear forms over $\mathbb{C}$, but they do not have all the properties one desires. Instead it is standard to take sesquilinear forms (sesqui- meaning one-and-a-half).
Definition. Let $V$ be a vector space over $\mathbb{C}$. A sesquilinear form on $V$ is a map $\varphi: V \times V \to \mathbb{C}$ such that
i) For all $x \in V$ the map $\varphi(x, \cdot): V \to \mathbb{C}$ sending $y$ to $\varphi(x, y)$ is linear.
ii) For all $y \in V$ the map $V \to \mathbb{C}$ sending $x$ to the complex conjugate $\overline{\varphi(x, y)}$ of $\varphi(x, y) \in \mathbb{C}$ is linear.
The sesquilinear form $\varphi$ is called conjugate symmetric or hermitian if $\varphi(x, y) = \overline{\varphi(y, x)}$ for all $x, y \in V$.
In the next problems $V$ is a $\mathbb{C}$-vector space.
10. Prove that the set $S(V)$ of sesquilinear forms on $V$ is a vector subspace of the $\mathbb{C}$-vector space of all maps $V \times V \to \mathbb{C}$.
11. Prove that the set $H(V)$ of hermitian sesquilinear forms on $V$ is a vector subspace of the $\mathbb{R}$-vector space $S(V)$. Is $H(V)$ a $\mathbb{C}$-vector subspace of $S(V)$?
12. Prove that we have a direct-sum decomposition of $\mathbb{R}$-vector spaces
\[ S(V) = H(V) \oplus i H(V). \]
13. Let $\varphi$ be a hermitian sesquilinear form on $V$ and consider the map $\Phi: V \to \mathbb{C}$ defined by
\[ \Phi(x) = \varphi(x, x). \]
A map $\Phi: V \to \mathbb{C}$ of this form is called a hermitian quadratic form, and if $\Phi(x) = \varphi(x, x)$ for all $x \in V$, we call the hermitian sesquilinear form $\varphi$ the polar form of $\Phi$.
a) Prove that $\Phi(x) \in \mathbb{R}$ for all $x \in V$.
b) Prove that $\Phi(ax) = |a|^2 \Phi(x)$ for all $a \in \mathbb{C}$ and $x \in V$.
c) Prove that for all $x, y \in V$ we have
\[ \Phi(x+y) = \Phi(x) + \Phi(y) + 2\operatorname{Re}(\varphi(x, y)). \]
d) Deduce the polarization identity
\[ \Phi(x+y) - \Phi(x-y) + i\big(\Phi(x+iy) - \Phi(x-iy)\big) = 4\varphi(y, x). \]
Conclude that the polar form of a hermitian quadratic form is unique.
e) Prove the parallelogram law
\[ \Phi(x+y) + \Phi(x-y) = 2(\Phi(x) + \Phi(y)). \]
14. Let $V = \mathbb{C}^n$ and consider the map $\Phi: V \to \mathbb{R}$ defined by
\[ \Phi(x_1, \dots, x_n) = |x_1|^2 + |x_2|^2 + \dots + |x_n|^2 \]
for all $(x_1, \dots, x_n) \in \mathbb{C}^n$. Prove that $\Phi$ is a hermitian quadratic form and find its polar form.
15. Let $V$ be the space of continuous maps $f: [0, 1] \to \mathbb{C}$. Answer the same questions as in the previous problem for the map $\Phi: V \to \mathbb{R}$ defined by
\[ \Phi(f) = \int_0^1 |f(t)|^2\, dt. \]
16. Prove the complex analogue of Gauss' theorem: if $\Phi$ is a hermitian quadratic form on $\mathbb{C}^n$, then we can find $\alpha_1, \dots, \alpha_r \in \{-1, 1\}$ and linearly independent linear forms $l_1, \dots, l_r$ on $\mathbb{C}^n$ such that for all $x \in \mathbb{C}^n$
\[ \Phi(x_1, \dots, x_n) = \sum_{i=1}^r \alpha_i |l_i(x)|^2. \]
10.2 Positivity, Inner Products, and the Cauchy–Schwarz Inequality
A fundamental notion in the theory of bilinear and quadratic forms is that of positivity:
Definition 10.11. a) A symmetric bilinear form $b: V \times V \to \mathbb{R}$ is called positive if $b(x, x) \ge 0$ for all $x \in V$. We say that $b$ is positive definite if $b(x, x) > 0$ for all nonzero vectors $x \in V$.
b) A quadratic form $q$ on $V$ is called positive (or positive definite) if its polar form is positive (or positive definite). Thus $q$ is positive if $q(x) \ge 0$ for all $x \in V$, and positive definite if moreover we have equality only for the zero vector.
Problem 10.12. Which of the following quadratic forms are positive? Which ones are positive definite?
a) $q(x, y, z) = xy + yz + zx$.
b) $q(x, y, z) = x^2 + 2(y - z)^2 + 3(z - x)^2$.
c) $q(x, y, z) = x^2 + y^2 + z^2 - xy - yz - zx$.
Solution. a) We have to check whether $xy + yz + zx \ge 0$ for all real numbers $x, y, z$. This is definitely not the case, since taking $z = 0$ we would have $xy \ge 0$ for all $x, y \in \mathbb{R}$, which is definitely absurd. Thus $q$ is not positive, and thus not positive definite either.
b) It is clear that $q(x, y, z) \ge 0$ for all $x, y, z \in \mathbb{R}$, since $q(x, y, z)$ is a sum of squares of real numbers. Thus $q$ is positive. To see whether $q$ is positive definite, we need to investigate when $q(x, y, z) = 0$. This forces
\[ x = y - z = z - x = 0 \]
and then $x = y = z = 0$. Thus $q$ is positive definite.
c) We observe that
\[ q(x, y, z) = \frac{(x-y)^2 + (y-z)^2 + (z-x)^2}{2} \ge 0 \]
for all $x, y, z \in \mathbb{R}$, thus $q$ is positive. Notice that $q$ is not positive definite, since $q(1, 1, 1) = 0$ but $(1, 1, 1) \ne (0, 0, 0)$. $\square$
We now introduce another fundamental concept, which will be constantly used in the sequel:
Definition 10.13. a) An inner product on an $\mathbb{R}$-vector space $V$ is a symmetric positive definite bilinear form on $V$.
b) An Euclidean space is a finite dimensional $\mathbb{R}$-vector space $V$ endowed with an inner product.
We warn the reader that some authors do not impose that an Euclidean space is finite dimensional. When dealing with inner products and Euclidean spaces, the notation $\langle x, y\rangle$ is preferred to $b(x, y)$ (where $b$ is the inner product on $V$). If $\langle\ ,\ \rangle$ is an inner product on $V$, we let
\[ \|x\| = \sqrt{\langle x, x\rangle} \]
and we call $\|x\|$ the norm of $x$ (the reason for this name will be given a bit later).
Remark 10.14. a) If $V$ is an Euclidean space, then any subspace $W$ of $V$ is naturally an Euclidean space when endowed with the restriction of the inner product on $V$ to $W$: note that this restriction is still an inner product on $W$, by definition.
b) $\mathbb{R}^n$ endowed with the canonical inner product
\[ \langle (x_1, \dots, x_n), (y_1, \dots, y_n)\rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n \]
is an Euclidean space. We leave it to the reader to check this assertion.
Problem 10.15. Let $n$ be a positive integer and let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$. Prove that
\[ \langle P, Q\rangle = \sum_{i=0}^n P(i) Q(i) \]
defines an inner product on $V$.
Solution. First, it is clear that for any $P$ the map $Q \mapsto \langle P, Q\rangle$ is linear, and similarly for any $Q$ the map $P \mapsto \langle P, Q\rangle$ is linear. Next, we have
\[ \langle P, P\rangle = \sum_{i=0}^n P(i)^2 \]
and the last quantity is clearly nonnegative. Finally, assume that $\langle P, P\rangle = 0$ for some $P \in V$. Then $\sum_{i=0}^n P(i)^2 = 0$, which forces $P(i) = 0$ for all $0 \le i \le n$. Thus $P$ has at least $n+1$ distinct roots and since $\deg P \le n$, we deduce that $P = 0$. The result follows. $\square$
Problem 10.16. Let $V$ be the space of continuous real-valued maps on $[a, b]$ (where $a < b$ are fixed real numbers). Prove that the map $\langle\ ,\ \rangle$ defined by
\[ \langle f, g\rangle = \int_a^b f(x) g(x)\, dx \]
is an inner product on $V$.
Solution. It is easy to see that $\langle\ ,\ \rangle$ is a symmetric bilinear form; it remains to check that it is positive definite. Since $f^2$ is a continuous nonnegative map, we have
\[ \langle f, f\rangle = \int_a^b f(x)^2\, dx \ge 0. \]
Suppose that $\langle f, f\rangle = 0$ and that $f$ is nonzero. Then there is $x_0 \in (a, b)$ such that $f(x_0) \ne 0$. By continuity we can find $\varepsilon > 0$ such that $(x_0 - \varepsilon, x_0 + \varepsilon) \subset [a, b]$ and $|f(x)| \ge \varepsilon$ for $x \in (x_0 - \varepsilon, x_0 + \varepsilon)$. It follows that
\[ \langle f, f\rangle \ge \int_{x_0 - \varepsilon}^{x_0 + \varepsilon} \varepsilon^2\, dx = 2\varepsilon^3 > 0 \]
and the result follows. $\square$
Problem 10.17. Let $V$ be the space of smooth functions $f: [0, 1] \to \mathbb{R}$ such that $f(0) = f(1) = 0$. Prove that
\[ \langle f, g\rangle = -\int_0^1 \big(f(x) g''(x) + f''(x) g(x)\big)\, dx \]
defines an inner product on $V$.
Solution. Using integration by parts, we obtain
\[ \langle f, g\rangle = -(f g' + f' g)\Big|_0^1 + 2\int_0^1 f'(x) g'(x)\, dx = 2\int_0^1 f'(x) g'(x)\, dx. \]
The last formula makes it clear that $\langle\ ,\ \rangle$ is a symmetric bilinear form on $V$. It remains to see that it is positive definite. We have
\[ \langle f, f\rangle = 2\int_0^1 (f'(x))^2\, dx \ge 0, \]
with equality if and only if (by the previous problem) $f'(x) = 0$ for all $x$. This last condition is equivalent to saying that $f$ is constant. But since $f$ vanishes by assumption at 0, if $f$ is constant then it must be the zero map. Thus $\langle f, f\rangle = 0$ implies $f = 0$, which yields the desired result. $\square$
The fundamental result concerning positive symmetric bilinear forms is the following:
Theorem 10.18 (Cauchy–Schwarz Inequality). Let $b: V \times V \to \mathbb{R}$ be a symmetric bilinear form and let $q$ be its associated quadratic form.
a) If $b$ is positive, then for all $x, y \in V$ we have
\[ b(x, y)^2 \le q(x) q(y). \]
b) If moreover $b$ is positive definite and if $b(x, y)^2 = q(x) q(y)$ for some $x, y \in V$, then $x$ and $y$ are linearly dependent.
Proof. a) Consider the map $F: \mathbb{R} \to \mathbb{R}$ given by
\[ F(t) = q(x + ty). \]
Note that since $b$ is bilinear and symmetric, we have
\[ F(t) = b(x + ty, x + ty) = b(x, x) + b(x, ty) + b(ty, x) + b(ty, ty) = q(x) + 2t\, b(x, y) + t^2 q(y). \]
Thus $F(t)$ is a quadratic polynomial function with leading coefficient $q(y) \ge 0$. Moreover, since $b$ is positive, we have $F(t) \ge 0$ for all $t \in \mathbb{R}$. It follows that the discriminant of $F$ is nonpositive, that is
\[ 4b(x, y)^2 - 4q(x) q(y) \le 0. \]
But this is precisely the desired inequality (after division by 4).
b) Suppose that $b$ is positive definite and that $b(x, y)^2 = q(x) q(y)$. We may assume that $y \ne 0$, so that $q(y) > 0$. Thus, with the notations used in the proof of part a), the discriminant of $F$ is 0. It follows that $F$ has a unique real root, say $t$. Then $q(x + ty) = 0$ and since $q$ is positive definite, this can only happen if $x + ty = 0$. Thus $x$ and $y$ are linearly dependent. $\square$
The following result is a direct consequence of the previous theorem, but it is of fundamental importance:
Corollary 10.19. If $V$ is a vector space over $\mathbb{R}$ endowed with an inner product $\langle\ ,\ \rangle$, then for all $x, y \in V$ we have
\[ |\langle x, y\rangle| \le \|x\| \cdot \|y\|. \]
Example 10.20. a) Let $V = \mathbb{R}^n$ be endowed with the canonical inner product. The inequality $|\langle x, y\rangle| \le \|x\| \cdot \|y\|$ can be rewritten (after squaring) as
\[ (x_1 y_1 + \dots + x_n y_n)^2 \le (x_1^2 + \dots + x_n^2)(y_1^2 + \dots + y_n^2). \]
b) Let $V$ be the space of continuous real-valued maps on $[a, b]$, where $a < b$ are real numbers. The map $\langle\ ,\ \rangle: V \times V \to \mathbb{R}$ defined by
\[ \langle f, g\rangle = \int_a^b f(x) g(x)\, dx \]
is an inner product on $V$ (see Problem 10.16) and the inequality in the corollary becomes (after squaring)
\[ \left(\int_a^b f(x) g(x)\, dx\right)^2 \le \left(\int_a^b f(x)^2\, dx\right)\left(\int_a^b g(x)^2\, dx\right). \]
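A quick numerical illustration of the corollary (an informal sketch assuming NumPy, not from the book):

```python
# |<x,y>| <= ||x|| * ||y|| for the canonical inner product on R^n.
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(10), rng.standard_normal(10)
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))  # True
# equality would require x and y to be linearly dependent (Theorem 10.18 b))
```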
Let $V$ be a vector space over $\mathbb{R}$, endowed with an inner product $\langle\ ,\ \rangle$. By the previous corollary we have
\[ -1 \le \frac{\langle u, v\rangle}{\|u\|\,\|v\|} \le 1 \qquad \text{for all } u, v \in V \setminus \{0\}. \]
Thus there exists a unique angle $\theta \in [0, \pi]$ satisfying
\[ \cos\theta = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}. \]
We define this angle to be the angle between the vectors $u, v$.
An important consequence of the Cauchy–Schwarz inequality is
Theorem 10.21 (Minkowski's Inequality). Let $V$ be a vector space over $\mathbb{R}$ and let $q$ be a positive quadratic form on $V$. Then for all $x, y \in V$ we have
\[ \sqrt{q(x+y)} \le \sqrt{q(x)} + \sqrt{q(y)}. \]
Proof. Squaring, the inequality is equivalent to
\[ q(x+y) \le q(x) + q(y) + 2\sqrt{q(x) q(y)}. \]
Letting $b$ be the polar form of $q$, the polarization identity yields
\[ q(x+y) = q(x) + q(y) + 2b(x, y). \]
Comparing this equality and the previous inequality, we obtain the equivalent form
\[ b(x, y) \le \sqrt{q(x) q(y)}, \]
which, squared, is exactly the Cauchy–Schwarz inequality. $\square$
Consider now an inner product $\langle\ ,\ \rangle$ on some $\mathbb{R}$-vector space $V$. Recall that we defined
\[ \|\cdot\|: V \to \mathbb{R}, \qquad \|x\| = \sqrt{\langle x, x\rangle}. \]
Since an inner product is positive definite, we see that $\|x\| \ge 0$ for all $x$, with equality if and only if $x = 0$. Also, since an inner product is a bilinear form, we have $\|ax\| = |a|\,\|x\|$ for all $a \in \mathbb{R}$. Finally, Minkowski's inequality yields
\[ \|x + y\| \le \|x\| + \|y\| \]
for all $x, y \in V$. We call this inequality the triangle inequality.
A map $\|\cdot\|: V \to \mathbb{R}$ satisfying the following properties:
• $\|v\| \ge 0$ for all $v \in V$, with equality if and only if $v = 0$;
• $\|av\| = |a|\,\|v\|$ for all $v \in V$ and $a \in \mathbb{R}$;
• $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$
is called a norm on $V$. This explains why we called $\|x\|$ the norm of $x$. Minkowski's inequality shows that any inner product on a vector space $V$ over $\mathbb{R}$ naturally endows the space $V$ with a norm $\|\cdot\|$. We can use this norm to define a distance $d: V \times V \to \mathbb{R}_+$ by
\[ d(u, v) = \|u - v\|. \]
One can check (see the exercise section) that for all $u, v, w \in V$ we have
\[ d(u, v) + d(v, w) \ge d(u, w). \]
This construction is of fundamental importance, since it allows us to do analysis on $V$ as we do it on $\mathbb{R}$. Note that if $V = \mathbb{R}^n$ with $n \le 3$, endowed with its canonical inner product, then the distance obtained as above is really the Euclidean distance that we are used to on the line, in the plane and in three-dimensional space. For instance, the distance between the points $(1, 1)$ and $(2, 3)$ is
\[ d((1, 1), (2, 3)) = \sqrt{(1-2)^2 + (1-3)^2} = \sqrt{5}, \]
and this really corresponds to the geometric distance between these two points in the plane.
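These formulas translate directly into code; the following brief sketch (illustrative, not from the book) computes the distance of the example above, together with the angle defined after Corollary 10.19:

```python
# Norm, distance and angle for the canonical inner product on R^2.
import numpy as np

u, v = np.array([1.0, 1.0]), np.array([2.0, 3.0])
print(np.linalg.norm(u - v))   # sqrt(5) = 2.2360..., the distance d(u, v)
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.arccos(cos_theta))    # the unique angle in [0, pi] between u and v
```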
10.2.1 Practice Problems
In the following problems, whenever the inner product on $\mathbb{R}^n$ is not specified, it is implicitly assumed that we consider the canonical inner product on $\mathbb{R}^n$, defined by
\[ \langle (x_1, \dots, x_n), (y_1, \dots, y_n)\rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n. \]
1. Let $V$ be an $\mathbb{R}$-vector space endowed with an inner product $\langle\ ,\ \rangle$. Recall that the distance between two points $x, y \in V$ is
\[ d(x, y) = \sqrt{\langle x - y, x - y\rangle}. \]
Prove the triangle inequality
\[ d(x, y) + d(y, z) \ge d(x, z) \]
for all $x, y, z \in V$.
2. a) What is the distance between the vectors $u = (1, 1, 1)$ and $v = (1, 2, 3)$ in $\mathbb{R}^3$?
b) What are their respective norms?
c) What is the angle between $u$ and $v$?
3. Find the angle between the vectors $u = (1, 2)$ and $v = (1, 1)$ in $\mathbb{R}^2$.
4. Find the vectors $v$ of norm 1 in $\mathbb{R}^3$ such that the angle between $v$ and $(1, 1, 0)$ is $\frac{\pi}{4}$.
5. Among all vectors of the form $(1, x, 2, 1)$ with $x \in \mathbb{R}$, which vector is at smallest distance from $(0, 1, 1, 1)$?
6. Find all values of $\alpha \in \mathbb{R}$ for which the map $\langle\ ,\ \rangle: \mathbb{R}^4 \times \mathbb{R}^4 \to \mathbb{R}$ defined by
\[ \langle (x_1, x_2, x_3, x_4), (y_1, y_2, y_3, y_4)\rangle = \alpha x_1 y_1 + 2x_2 y_2 + (1 - \alpha) x_3 y_3 + x_4 y_4 \]
is an inner product.
7. Prove that if $f: [a, b] \to \mathbb{R}$ is a continuous map, then
\[ \left(\int_a^b f(t)\, dt\right)^2 \le (b - a)\int_a^b f(t)^2\, dt. \]
8. a) Prove that if $x_1, \dots, x_n$ are positive real numbers, then
\[ (x_1 + x_2 + \dots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}\right) \ge n^2. \]
When do we have equality?
b) Prove that if $f: [a, b] \to (0, \infty)$ is a continuous map, then
\[ \int_a^b f(t)\, dt \cdot \int_a^b \frac{1}{f(t)}\, dt \ge (b - a)^2. \]
When do we have equality?
9. Let $f: [0, 1] \to \mathbb{R}_+$ be a continuous map taking nonnegative values and let
\[ x_n = \int_0^1 t^n f(t)\, dt. \]
Prove that for all $n, p \ge 0$
\[ x_{n+p} \le \sqrt{x_{2n}}\,\sqrt{x_{2p}}. \]
10. Let $V$ be a $\mathbb{C}$-vector space and let $\Phi$ be a hermitian quadratic form on $V$. Assume that $\Phi$ is positive definite, i.e., $\Phi(x) > 0$ for all nonzero vectors $x \in V$. Let $\varphi$ be the polar form of $\Phi$.
a) Prove the Cauchy–Schwarz inequality: for all $x, y \in V$ we have
\[ |\varphi(x, y)|^2 \le \Phi(x)\Phi(y), \]
with equality if and only if $x, y$ are linearly dependent.
b) Prove Minkowski's inequality: for all $x, y \in V$ we have
\[ \sqrt{\Phi(x+y)} \le \sqrt{\Phi(x)} + \sqrt{\Phi(y)}. \]
11. Prove that there is no inner product $\langle\ ,\ \rangle$ on the space $V$ of continuous real-valued maps on $[0, 1]$ such that for all $f \in V$ we have
\[ \langle f, f\rangle = \sup_{x \in [0, 1]} f(x)^2. \]
Hint: the parallelogram law.
10.3 Bilinear Forms and Matrices
From now on we will focus only on finite dimensional vector spaces $V$ over $\mathbb{R}$. We have already seen that we can describe linear transformations on $V$ in terms of matrices. We would like to have a similar description for bilinear forms.
Definition 10.22. a) Consider a basis $e_1, \dots, e_n$ of $V$, and let $b$ be a symmetric bilinear form on $V$. The matrix of $b$ with respect to the basis $e_1, \dots, e_n$ is the matrix $(b(e_i, e_j))_{1 \le i, j \le n}$.
b) If $q$ is a quadratic form on $V$, the matrix of $q$ with respect to the basis $e_1, \dots, e_n$ is the matrix of its polar form with respect to $e_1, \dots, e_n$.
Theorem 10.23. Let $V$ be a finite dimensional vector space and let $e_1, \dots, e_n$ be a basis of $V$. Sending a symmetric bilinear form to its matrix with respect to $e_1, \dots, e_n$ establishes an isomorphism of $\mathbb{R}$-vector spaces between the vector space of symmetric bilinear forms on $V$ and the vector space of symmetric matrices in $M_n(\mathbb{R})$.
Proof. It is clear that if $A$ is the matrix of $b$ and $A'$ is the matrix of $b'$, then $cA + A'$ is the matrix of $cb + b'$ for all scalars $c \in \mathbb{R}$. Also, since $b$ is symmetric, we have $b(e_i, e_j) = b(e_j, e_i)$, thus the matrix of $b$ is symmetric. Thus sending a symmetric bilinear form $b$ to its matrix $A$ with respect to $e_1, \dots, e_n$ induces a linear map $\varphi$ from symmetric bilinear forms on $V$ to symmetric matrices $A \in M_n(\mathbb{R})$.
Injectivity of the map $\varphi$ follows directly from Remark 10.2, so it remains to prove that $\varphi$ is surjective. Start with any symmetric matrix $A = [a_{ij}]$. If $x = x_1 e_1 + \dots + x_n e_n$ and $y = y_1 e_1 + \dots + y_n e_n$ are vectors in $V$, define
\[ b(x, y) = \sum_{i,j=1}^n a_{ij} x_i y_j. \]
It is easy to see that $b$ is a symmetric bilinear form whose matrix in the basis $e_1, \dots, e_n$ is precisely $A$. $\square$
A natural question is the following: what, explicitly, is the inverse of the isomorphism given by the previous theorem? Fortunately, this has already been answered during the proof: it is the map sending a symmetric matrix $A = [a_{ij}]$ to the bilinear form $b$ defined by
\[ b(x_1 e_1 + \dots + x_n e_n,\ y_1 e_1 + \dots + y_n e_n) = \sum_{i,j=1}^n a_{ij} x_i y_j. \]
This formula does not come out of nowhere: it is imposed by Remark 10.2. Also, note that the right-hand side of the previous equality can be written as ${}^t X A Y$, where $X, Y$ are the column vectors whose coordinates are $x_1, \dots, x_n$, respectively $y_1, \dots, y_n$. Here we think of ${}^t X$ as a $1 \times n$ matrix and of $Y$ as an $n \times 1$ matrix, so that ${}^t X A Y$ is a $1 \times 1$ matrix, that is a real number. We obtain the following characterization of the matrix of $b$ with respect to the basis $e_1, \dots, e_n$.
Theorem 10.24. Let $e_1, e_2, \dots, e_n$ be a basis of $V$ and let $b$ be a symmetric bilinear form on $V$. The matrix of $b$ with respect to $e_1, \dots, e_n$ is the unique symmetric matrix $A \in M_n(\mathbb{R})$ such that
\[ b(x, y) = {}^t X A Y \]
for all vectors $x, y \in V$ (where $X, Y$ are the column vectors whose coordinates are those of $x, y$ with respect to $e_1, \dots, e_n$).
Remark 10.25. Keep the hypotheses and notations of the previous theorem and discussion. If $q$ is the quadratic form attached to $b$, then
\[ q(x_1 e_1 + \dots + x_n e_n) = {}^t X A X = \sum_{i,j=1}^n a_{ij} x_i x_j = \sum_{i=1}^n a_{ii} x_i^2 + 2\sum_{1 \le i < j \le n} a_{ij} x_i x_j, \]
the last equality being a consequence of the equality $a_{ij} = a_{ji}$ for $i < j$. The presence of the factor 2 is quite often a source of errors when dealing with the link between quadratic forms and matrices. Indeed, it is quite tempting (and this happens quite often!) to say that the quadratic form associated with the matrix $A = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}$ is
\[ q(x_1, x_2) = x_1 x_2, \]
which is wrong: the quadratic form associated with $A$ is
\[ q(x_1, x_2) = 2x_1 x_2. \]
An even more common mistake is to say that the matrix associated with the quadratic form
\[ q(x, y, z) = xy + yz + zx \]
on $\mathbb{R}^3$ is $\begin{pmatrix} 0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 0 \end{pmatrix}$. Actually, the correct matrix is $\begin{pmatrix} 0 & \frac12 & \frac12\\ \frac12 & 0 & \frac12\\ \frac12 & \frac12 & 0 \end{pmatrix}$, since the polar form of $q$ is the bilinear form $b$ defined by
\[ b((x, y, z), (x', y', z')) = \frac{x'y + y'x + x'z + z'x + y'z + z'y}{2}. \]
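The factor of 2 is easy to get wrong, so it is worth checking symbolically; a small sketch (assuming SymPy, not from the book):

```python
# t(X) A X for A = [[0,1],[1,0]] is 2*x1*x2, not x1*x2; the matrix of
# q(x,y,z) = xy + yz + zx has entries 1/2 off the diagonal.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
A = sp.Matrix([[0, 1], [1, 0]])
print(sp.expand((sp.Matrix([x1, x2]).T * A * sp.Matrix([x1, x2]))[0]))  # 2*x1*x2

x, y, z = sp.symbols('x y z')
B = sp.Matrix(3, 3, lambda i, j: 0 if i == j else sp.Rational(1, 2))
print(sp.expand((sp.Matrix([x, y, z]).T * B * sp.Matrix([x, y, z]))[0]))
# x*y + x*z + y*z
```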
Armed with Theorem 10.24, it is not difficult to understand how the matrix of a bilinear form varies when we vary the basis. More precisely, consider two bases $e_1, \dots, e_n$ and $e_1', \dots, e_n'$ of $V$ and let $A, A'$ be the matrices of a symmetric bilinear form $b$ with respect to these bases. If $x = x_1 e_1 + \dots + x_n e_n = x_1' e_1' + \dots + x_n' e_n'$ is a vector in $V$, let $X$ (respectively $X'$) be the column vector whose coordinates are $x_1, \dots, x_n$ (respectively $x_1', \dots, x_n'$), and similarly for a vector $y$ with column vectors $Y, Y'$. Then
\[ b(x, y) = {}^t X A Y = {}^t X' A' Y'. \]
Letting $P$ be the change of basis matrix from $e_1, \dots, e_n$ to $e_1', \dots, e_n'$ (recall that the columns of $P$ are the coordinates of $e_1', \dots, e_n'$ when expressed in terms of $e_1, \dots, e_n$), we have
\[ X = P X', \qquad Y = P Y'. \]
It follows that
\[ {}^t X' A' Y' = b(x, y) = {}^t X A Y = {}^t(P X') A\, P Y' = {}^t X'\, {}^t P A P\, Y' \]
and we obtain the following
Theorem 10.26. Suppose that a symmetric bilinear form $b$ has matrix $A$ with respect to a basis $e_1, \dots, e_n$ of $V$. Let $e_1', \dots, e_n'$ be another basis of $V$ and let $P$ be the change of basis matrix from $e_1, \dots, e_n$ to $e_1', \dots, e_n'$. Then the matrix of $b$ with respect to $e_1', \dots, e_n'$ is
\[ A' = {}^t P A P. \]
The previous theorem suggests the following
Definition 10.27. Two symmetric matrices $A, B \in M_n(\mathbb{R})$ are called congruent if they are the matrices of some symmetric bilinear form in two bases of $\mathbb{R}^n$. Equivalently, $A$ and $B$ are congruent if there is an invertible matrix $P \in M_n(\mathbb{R})$ such that $B = {}^t P A P$.
By definition, the congruence relation is an equivalence relation on the set of symmetric matrices $A \in M_n(\mathbb{R})$, that is:
• any matrix $A$ is congruent to itself (this is clear);
• if $A$ is congruent to $B$, then $B$ is congruent to $A$: indeed, if $B = {}^t P A P$, then $A = {}^t(P^{-1}) B P^{-1}$;
• if $A$ is congruent to $B$ and $B$ is congruent to $C$, then $A$ is congruent to $C$. Indeed, if $B = {}^t P A P$ and $C = {}^t Q B Q$, then $C = {}^t(PQ)\, A\, (PQ)$.
Note that two congruent matrices have the same rank. This allows us to define the rank of a symmetric bilinear form as the rank of its matrix in any basis of the surrounding space (the previous discussion shows that it is independent of the choice of the basis). Note that we cannot define the determinant of a symmetric bilinear form in a similar way: if $A$ and $B$ are congruent matrices, then it is not true that $\det A = \det B$. All we can say is that if $B = {}^t P A P$, then
\[ \det B = \det({}^t P) \det A \det P = \det A\, (\det P)^2, \]
thus $\det A$ and $\det B$ differ by the square of a nonzero real number. In particular, they have the same sign. The discriminant of a symmetric bilinear form is defined to be the sign of the determinant of its matrix in a basis of the surrounding space (it is independent of the choice of the basis).
The fundamental theorem concerning the congruence relation is the following consequence of Theorem 10.9:
Theorem 10.28 (Gauss). Any symmetric matrix $A \in M_n(\mathbb{R})$ is congruent to a diagonal matrix.
Proof. Consider the associated quadratic form $q$ on $V = \mathbb{R}^n$,
\[ q(X) = {}^t X A X, \quad \text{i.e.,} \quad q(x_1, \dots, x_n) = \sum_{i,j=1}^n a_{ij} x_i x_j. \]
By Theorem 10.26, it suffices to prove the existence of a basis of $\mathbb{R}^n$ with respect to which the matrix of $q$ is diagonal (as then $A$ will be congruent to the corresponding diagonal matrix).
By Theorem 10.9 we know that we can find real numbers $\alpha_1, \dots, \alpha_r$ and linearly independent linear forms $l_1, \dots, l_r \in V^*$ such that
\[ q(X) = \sum_{i=1}^r \alpha_i l_i(X)^2 \]
for all $X \in V$. Complete the family $(l_1, \dots, l_r)$ to a basis $(l_1, \dots, l_n)$ of $V^*$. By Theorem 6.13 there is a basis $(e_1, \dots, e_n)$ of $V$ whose dual basis is $(l_1, \dots, l_n)$. Thus, if $X = x_1 e_1 + \dots + x_n e_n \in V$, then
\[ q(X) = \sum_{i=1}^r \alpha_i l_i(X)^2 = \sum_{i=1}^r \alpha_i x_i^2, \]
so that the matrix of $q$ with respect to the basis $(e_1, \dots, e_n)$ is the diagonal matrix $D$ with diagonal entries $\alpha_1, \dots, \alpha_r$ (and zeros elsewhere). $\square$
Remark 10.29. The proof also shows the following interesting fact: if $q$ is a quadratic form on $\mathbb{R}^n$ with polar form $b$, then we can find a basis $f_1, \dots, f_n$ of $\mathbb{R}^n$ such that
\[ b(f_i, f_j) = 0 \qquad \text{for all } 1 \le i \ne j \le n. \]
Such a basis is called a $q$-orthogonal basis of $\mathbb{R}^n$. We can even impose that $q(f_i) \in \{-1, 0, 1\}$ for all $1 \le i \le n$. Indeed, as in the above proof we can write
\[ q(X) = \sum_{i=1}^r \alpha_i l_i(X)^2 \]
and by the discussion following Theorem 10.9 we can even ensure that $\alpha_i \in \{-1, 1\}$ for all $1 \le i \le r$. If $e_1, \dots, e_n$ is a basis as in the above proof, then
\[ q(X) = \sum_{i=1}^r \alpha_i x_i^2 \]
for $X = x_1 e_1 + \dots + x_n e_n$, thus
\[ b(X, Y) = \sum_{i=1}^n \alpha_i x_i y_i \]
with $\alpha_i = 0$ for $r < i \le n$. It follows that we can take $f_i = e_i$ for $1 \le i \le n$.
We introduce one more definition before ending this section:
Definition 10.30. A symmetric matrix $A \in M_n(\mathbb{R})$ is called positive if ${}^t X A X \ge 0$ for all $X \in \mathbb{R}^n$. It is called positive definite if ${}^t X A X > 0$ for all nonzero vectors $X \in \mathbb{R}^n$.
In other words, $A = [a_{ij}]$ is positive (respectively positive definite) if and only if the quadratic form associated with $A$, namely $(x_1, \dots, x_n) \mapsto \sum_{i,j=1}^n a_{ij} x_i x_j$, is positive (respectively positive definite). Any symmetric positive definite matrix $A$ gives rise to an inner product $\langle\ ,\ \rangle_A$ on $\mathbb{R}^n$, defined by
\[ \langle X, Y\rangle_A = \langle X, AY\rangle = {}^t X A Y, \]
where $\langle\ ,\ \rangle$ is the canonical inner product on $\mathbb{R}^n$.
Note that if $A$ is positive then, letting $e_1, \dots, e_n$ be the canonical basis of $\mathbb{R}^n$, we have
\[ a_{ii} = {}^t e_i A e_i \ge 0, \]
and if $A$ is positive definite then the inequality is strict. Also, note that any matrix congruent to a positive (respectively positive definite) symmetric matrix is itself positive (respectively positive definite), since
\[ {}^t X({}^t P A P) X = {}^t(P X) A (P X). \]
Problem 10.31. Let $A \in M_n(\mathbb{R})$ be any matrix.
a) Prove that ${}^t A A$ is symmetric and positive.
b) Prove that ${}^t A A$ is positive definite if and only if $A$ is invertible.
Solution. Note that
\[ {}^t({}^t A A) = {}^t A\, {}^t({}^t A) = {}^t A A, \]
thus ${}^t A A$ is symmetric. Next, for all $X \in \mathbb{R}^n$ we have
\[ {}^t X({}^t A A) X = {}^t(A X)(A X) = \|A X\|^2 \ge 0, \]
with equality if and only if $A X = 0$. Both a) and b) follow from these observations (and the fact that $A$ is invertible if and only if $A X = 0$ implies $X = 0$). $\square$
Remark 10.32. The same result holds with $A\, {}^t A$ instead of ${}^t A A$.
Remarkably, the converse of the result established in the previous problem holds:
Theorem 10.33. Any symmetric positive matrix $A \in M_n(\mathbb{R})$ can be written as ${}^t B B$ for some matrix $B \in M_n(\mathbb{R})$.
Proof. We use Gauss' Theorem 10.28. By that theorem, there is an invertible matrix $P$ such that ${}^t P A P = D$ is a diagonal matrix. By the discussion preceding Problem 10.31 we know that $D$ itself is positive and its diagonal coefficients $d_{ii}$ are nonnegative. Hence we can write $D = {}^t D_1 D_1$ for a diagonal matrix $D_1$ whose diagonal entries are $\sqrt{d_{ii}}$. But then
\[ A = {}^t P^{-1} D P^{-1} = {}^t P^{-1}\, {}^t D_1 D_1 P^{-1} = {}^t B B, \]
with $B = D_1 P^{-1}$. $\square$
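In floating-point practice, the factorization $A = {}^t B B$ of a positive definite matrix is usually obtained from the Cholesky decomposition rather than through the congruence of Theorem 10.28; a brief sketch (assuming NumPy, not from the book):

```python
# numpy's Cholesky routine returns a lower triangular L with A = L * t(L),
# so B = t(L) gives A = t(B) B (this requires A positive definite).
import numpy as np

M = np.array([[2.0, 1.0], [0.0, 3.0]])
A = M.T @ M                      # symmetric positive definite by Problem 10.31
B = np.linalg.cholesky(A).T
print(np.allclose(B.T @ B, A))   # True
```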
Problem 10.34. Let $(V, \langle\ ,\ \rangle)$ be an Euclidean space and let $v_1, \dots, v_n$ be a family of vectors in $V$. Let $A \in M_n(\mathbb{R})$ be the Gram matrix of the family, i.e., the matrix whose $(i, j)$-entry is $\langle v_i, v_j\rangle$.
a) Prove that $A$ is symmetric positive.
b) Prove that $A$ is positive definite if and only if $v_1, \dots, v_n$ are linearly independent.
Solution. Again, it is clear that $A$ is symmetric. For all $x_1, \dots, x_n \in \mathbb{R}$ we have
\[ \sum_{i,j=1}^n a_{ij} x_i x_j = \sum_{i,j=1}^n x_i x_j \langle v_i, v_j\rangle = \Big\langle \sum_{i=1}^n x_i v_i,\ \sum_{j=1}^n x_j v_j\Big\rangle = \Big\|\sum_{i=1}^n x_i v_i\Big\|^2 \ge 0, \]
with equality if and only if $\sum_{i=1}^n x_i v_i = 0$. The result follows. $\square$
Problem 10.35. Let $n \ge 1$ and let $A = [a_{ij}] \in M_n(\mathbb{R})$ be defined by $a_{ij} = \min(i, j)$. Prove that $A$ is symmetric and positive definite.
Solution. It is clear that the matrix is symmetric. Note that we can write
\[ \min(i, j) = \sum_{k \le i,\ k \le j} 1. \]
Doing so and interchanging orders of summation, we see that
\[ \sum_{i=1}^n \sum_{j=1}^n \min(i, j)\, x_i x_j = \sum_{k=1}^n \sum_{i=k}^n \sum_{j=k}^n x_i x_j = \sum_{k=1}^n \Big(\sum_{i=k}^n x_i\Big)^2. \]
This last expression is clearly nonnegative, and it equals 0 if and only if
\[ x_1 + \dots + x_n = 0,\quad x_2 + \dots + x_n = 0,\quad \dots,\quad x_n = 0. \]
Subtracting each equation from the one before it, we see that the unique solution is $x_1 = x_2 = \dots = x_n = 0$, which shows that the matrix is positive definite.
An alternative argument is to note that
\[ \min(i, j) = \int_0^\infty f_i(x) f_j(x)\, dx, \]
where $f_i(x) = 1$ for $x \in [0, i]$ and $f_i(x) = 0$ for $x > i$ (i.e., $f_i$ is the characteristic function of the interval $[0, i]$). It follows that $A$ is the Gram matrix of the family $f_1, \dots, f_n$, thus it is symmetric and positive. Since $f_1, \dots, f_n$ are linearly independent (in the space of integrable functions on $[0, \infty)$), it follows that $A$ is positive definite (all this uses Problem 10.34).
Yet another argument is to note that $A = {}^t B B$ where $B$ is the upper triangular matrix all of whose nonzero coefficients are equal to 1. Since $B$ is invertible, we deduce that $A$ is positive definite by Problem 10.31. $\square$
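The third argument is particularly easy to test; a one-line check (illustrative, not from the book) with NumPy:

```python
# With B upper triangular and all nonzero entries equal to 1, t(B) B is the
# matrix with entries min(i, j) (indices shifted by one since NumPy counts from 0).
import numpy as np

n = 5
B = np.triu(np.ones((n, n)))
M = B.T @ B
print(all(M[i, j] == min(i, j) + 1 for i in range(n) for j in range(n)))  # True
```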
10.3.1 Problems for Practice
1. Consider the map $q: \mathbb{R}^3 \to \mathbb{R}$ defined by
\[ q(x, y, z) = (x + 2y + 3z)^2 + (y + z)^2 - (y - z)^2. \]
a) Prove that $q$ is a quadratic form and find its polar form $b$.
b) Is $q$ positive definite?
c) Give the matrix of $q$ with respect to the canonical basis of $\mathbb{R}^3$.
d) Consider the vectors
\[ v_1 = (2, 0, 0), \qquad v_2 = (5, -1, -1), \qquad v_3 = (1, 1, -1). \]
Prove that $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$ and find the matrix of $b$ with respect to this basis.
2. Consider the map $q: \mathbb{R}^3 \to \mathbb{R}$ defined by
\[ q(x, y, z) = x(x - y + z) - 2y(y + z). \]
a) Prove that $q$ is a quadratic form and find its polar form $b$.
b) Find the matrix of $q$ with respect to the canonical basis of $\mathbb{R}^3$.
c) Find those vectors $v \in \mathbb{R}^3$ such that $b(v, w) = 0$ for all vectors $w \in \mathbb{R}^3$.
3. Consider the quadratic form $q$ on $\mathbb{R}^3$ defined by
\[ q(x, y, z) = 2x(x + y - z) + y^2 + z^2. \]
a) Find the matrix of $q$ with respect to the canonical basis of $\mathbb{R}^3$.
b) Write $q$ in the form $\sum_{i=1}^r \alpha_i l_i^2$ with $l_1, \dots, l_r$ linearly independent linear forms.
c) Find the rank, signature, and discriminant of $q$.
d) Find a $q$-orthogonal basis of $\mathbb{R}^3$ and give the matrix of $q$ with respect to this basis.
4. Is the matrix $A = [a_{ij}] \in M_n(\mathbb{R})$ with $a_{ij} = ij$ positive? Is it positive definite?
5. a) Prove that a symmetric positive definite matrix is invertible.
b) Prove that a symmetric positive matrix is positive definite if and only if it is invertible.
6. All entries but the diagonal ones of the matrix $A \in M_n(\mathbb{R})$ are equal to $-1$, while all diagonal entries are equal to $n - 1$. Is $A$ positive? Is it positive definite?
7. Prove that any symmetric positive matrix $A \in M_n(\mathbb{R})$ is the Gram matrix of a family of vectors $v_1, \dots, v_n \in \mathbb{R}^n$. Hint: use Theorem 10.33.
8. Let $V$ be an $\mathbb{R}$-vector space endowed with an inner product $\langle\ ,\ \rangle$ and let $x_1, \dots, x_n$ be vectors in $V$. The Gram determinant of $x_1, \dots, x_n$, denoted $G(x_1, \dots, x_n)$,
is by definition the determinant of the Gram matrix $[\langle x_i, x_j\rangle]_{1 \le i,j \le n}$. Prove that $x_1, \dots, x_n$ are linearly independent if and only if $G(x_1, \dots, x_n) \ne 0$.
9. Compute the Gram determinant of the vectors
\[ x_1 = (1, 2, 1), \qquad x_2 = (1, 1, 2), \qquad x_3 = (1, 0, 1) \]
in $\mathbb{R}^3$. Are they linearly independent?
10. Prove that the matrix $A = \left[\frac{1}{i+j}\right]_{1 \le i,j \le n}$ is symmetric and positive (hint: what is $\int_0^1 t^{i+j-1}\, dt$?).
11. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a matrix such that $a_{ij} = 1$ for $i \ne j$, and $a_{ii} > 1$ for all $i \in [1, n]$. Prove that $A$ is symmetric and positive definite.
12. Let $n$ be a positive integer. Consider the space $V$ of polynomials of degree at most $n$ with real coefficients. Define a map
\[ b: V \times V \to \mathbb{R}, \qquad b(P, Q) = \int_0^1 t P(t) Q'(t)\, dt, \]
where $Q'$ is the derivative of $Q$.
a) Prove that $b$ is a bilinear form on $V$. Is it symmetric?
b) Let $q$ be the quadratic form attached to $b$, so that $q(x) = b(x, x)$. Find the matrix of $q$ with respect to the basis $1, X, \dots, X^n$ of $V$.
13. In this long problem we establish the link between sesquilinear maps and matrices, thus extending the results of this section to finite dimensional $\mathbb{C}$-vector spaces. Let $V$ be a finite dimensional $\mathbb{C}$-vector space and let $B = (e_1, \dots, e_n)$ be a basis of $V$. Recall that $S(V)$ is the set of sesquilinear forms on $V$.
a) Let $\varphi \in S(V)$ be a sesquilinear form on $V$. The matrix of $\varphi$ with respect to $B$ is the matrix $A = [a_{ij}] \in M_n(\mathbb{C})$ where $a_{ij} = \varphi(e_i, e_j)$ for $1 \le i, j \le n$. Prove that for all vectors $x, y \in V$ we have
\[ \varphi(x, y) = X^* A Y, \]
where $X, Y$ are the column vectors whose coordinates are the coordinates of $x, y$ with respect to $B$, and $X^* = {}^t\overline{X}$ is the row vector whose coordinates are the complex conjugates of the coordinates of $x$ with respect to $B$.
b) Prove that $A$ is the unique matrix having the property stated in a).
c) Prove that the map sending $\varphi \in S(V)$ to its matrix with respect to $B$ is an isomorphism of $\mathbb{C}$-vector spaces between $S(V)$ and $M_n(\mathbb{C})$.
d) Let $\varphi \in S(V)$ and let $A$ be its matrix with respect to $B$. Prove that $\varphi$ is hermitian if and only if $A$ satisfies $\overline{A} = {}^t A$. Such a matrix is called conjugate symmetric or hermitian. We usually write $A^*$ instead of ${}^t\overline{A}$, so a matrix $A$ is hermitian if and only if $A = A^*$.
e) Let $B'$ be another basis of $V$ and let $P$ be the change of basis matrix from $B$ to $B'$. If $x, y \in V$, let $X, Y$ (respectively $X', Y'$) be the column vectors whose coordinates are the coordinates of $x, y$ with respect to $B$ (respectively $B'$). Prove that if $A$ (respectively $A'$) is the matrix of $\varphi$ with respect to $B$ (respectively $B'$), then
\[ A' = P^* A P. \]
f) A hermitian matrix $A \in M_n(\mathbb{C})$ is called positive (respectively positive definite) if $X^* A X \ge 0$ for all $X \in \mathbb{C}^n$ (respectively if moreover $X^* A X \ne 0$ for $X \ne 0$). Prove that for any matrix $B \in M_n(\mathbb{C})$ the matrices $B^* B$ and $B B^*$ are hermitian positive, and they are hermitian positive definite if and only if $B$ is invertible.
g) Prove that any hermitian positive matrix $A$ can be written as $B^* B$ for some matrix $B \in M_n(\mathbb{C})$.
10.4 Duality and Orthogonality
Let $b$ be a symmetric bilinear form on a vector space $V$ over $\mathbb{R}$ (for now we don't assume that $V$ is finite dimensional). For each $y \in V$ the map $x \mapsto b(x,y)$ is by definition linear, thus it is a linear form on $V$. Letting $V^*$ be the dual of $V$, we therefore obtain a map
$$\varphi_b : V \to V^*, \qquad \varphi_b(y)(x) = b(x,y).$$
Since for all $x \in V$ the map $y \mapsto b(x,y)$ is linear, it follows that $\varphi_b$ is a linear map.

Problem 10.36. Suppose that $V$ is finite dimensional and let $e_1,\dots,e_n$ be a basis of $V$. Let $e_1^*,\dots,e_n^*$ be the dual basis of $e_1,\dots,e_n$ (recall that $e_i^*(e_j) = 1_{i=j}$, where $1_{i=j}$ equals 1 if $i = j$ and 0 otherwise). Prove that the matrix of $\varphi_b$ with respect to the bases $e_1,\dots,e_n$ and $e_1^*,\dots,e_n^*$ is the matrix of $b$ in the basis $e_1,\dots,e_n$.

Solution. For $x = x_1e_1 + \dots + x_ne_n \in V$ we have
$$\varphi_b(e_i)(x) = b(x,e_i) = x_1 b(e_1,e_i) + \dots + x_n b(e_n,e_i) = b(e_1,e_i)e_1^*(x) + \dots + b(e_n,e_i)e_n^*(x).$$
Thus
$$\varphi_b(e_i) = b(e_1,e_i)e_1^* + \dots + b(e_n,e_i)e_n^*.$$
The result follows. $\square$
The following result, though very simple, is very useful:

Theorem 10.37 (Riesz's Representation Theorem). If $V$ is a Euclidean (thus finite dimensional) space with inner product $\langle\,,\,\rangle$, then the map $\varphi_{\langle\,,\,\rangle} : V \to V^*$ is an isomorphism. In other words, for any linear map $l : V \to \mathbb{R}$ there is a unique vector $v \in V$ such that $l(x) = \langle v,x\rangle$ for all $x \in V$.

Proof. Since $\dim V = \dim V^*$, it suffices to prove that $\varphi_{\langle\,,\,\rangle}$ is injective. Assume that $\varphi_{\langle\,,\,\rangle}(x) = 0$ for some $x \in V$. Then by definition $\langle x,y\rangle = 0$ for all $y \in V$, in particular $\langle x,x\rangle = 0$. But then $\|x\|^2 = 0$, where $\|\cdot\|$ is the norm associated with the inner product, and so $x = 0$. $\square$
Let $V$ again be an arbitrary vector space over $\mathbb{R}$ and let $b$ be a symmetric bilinear form on $V$.

Definition 10.38. a) Two vectors $x, y \in V$ are called orthogonal (with respect to $b$) if $b(x,y) = 0$.
b) The orthogonal $S^\perp$ of a subset $S$ of $V$ is the set of vectors $v \in V$ which are orthogonal to each element of $S$.
c) Two subsets $S, T$ of $V$ are called orthogonal if $S \subset T^\perp$, that is, any element of $S$ is orthogonal to any element of $T$.

Remark 10.39. Suppose that $\langle\,,\,\rangle$ is an inner product on $V$, with associated norm $\|\cdot\|$ (thus $\|x\| = \sqrt{\langle x,x\rangle}$). Then vectors $x, y \in V$ are orthogonal if and only if
$$\|x+y\|^2 = \|x\|^2 + \|y\|^2.$$
This is the Pythagorean theorem and follows directly from the polarization identity
$$\|x+y\|^2 = \|x\|^2 + 2\langle x,y\rangle + \|y\|^2.$$
Coming back to the general case, note that $b(x,y) = 0$ is equivalent to $\varphi_b(y)(x) = 0$, i.e., $x$ and the linear form $\varphi_b(y)$ are orthogonal in the sense of duality. This allows us to use the results we have already established in the chapter concerned with duality to obtain information about symmetric bilinear forms. As a consequence, we obtain the following fundamental result:

Theorem 10.40. Let $V$ be a Euclidean space and let $W$ be a subspace of $V$. Then $W^\perp \oplus W = V$, in particular
$$\dim W^\perp + \dim W = \dim V.$$
Moreover, $(W^\perp)^\perp = W$.

We can slightly refine the previous theorem by allowing infinite dimensional ambient spaces and finite dimensional subspaces therein.
Theorem 10.41. Let $V$ be any $\mathbb{R}$-vector space endowed with an inner product and let $W$ be a finite dimensional subspace of $V$. Then
$$W \oplus W^\perp = V \quad\text{and}\quad W^{\perp\perp} = W.$$

Proof. Let $\langle\,,\,\rangle$ be the inner product on $V$, with associated norm $\|\cdot\|$. We start by proving that $W \oplus W^\perp = V$. If $x \in W \cap W^\perp$, then $\langle x,x\rangle = 0$, that is $\|x\|^2 = 0$, and so $x = 0$. Thus $W \cap W^\perp = \{0\}$. We still need to prove that $W + W^\perp = V$, so let $x \in V$ be arbitrary and consider the map $f : W \to \mathbb{R}$ defined by $f(y) = \langle x,y\rangle$. Then $f$ is a linear form on $W$. Since $W$ is a Euclidean space (for the inner product inherited from the one on $V$), by Theorem 10.37 there is a unique $z \in W$ such that $f(y) = \langle z,y\rangle$ for all $y \in W$. We deduce that $\langle x-z, y\rangle = 0$ for all $y \in W$, thus $x - z \in W^\perp$. Since $z \in W$, we conclude that $x \in W + W^\perp$ and the result follows.
It remains to prove that $W^{\perp\perp} = W$. By definition $W$ is contained in $W^{\perp\perp}$, so let $x \in W^{\perp\perp}$. By the result established in the previous paragraph we can write $x = y + z$ with $y \in W$ and $z \in W^\perp$. Since $x \in W^{\perp\perp}$, we have $\langle x,z\rangle = 0$, thus $\langle y,z\rangle + \|z\|^2 = 0$. But $y \in W$ and $z \in W^\perp$, thus $\langle y,z\rangle = 0$ and so $\|z\|^2 = 0$, then $z = 0$ and finally $x = y \in W$. $\square$
Remark 10.42. The hypothesis that $W$ is finite dimensional is crucial in the previous theorem. Indeed, consider the following situation: $V$ is the space of continuous real-valued maps on $[0,1]$ and $W$ is the subspace consisting of maps $f$ such that $f(0) = 0$. Endow $V$ with the inner product given by
$$\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx.$$
Then the orthogonal $W^\perp$ of $W$ is reduced to $\{0\}$, thus we do not have $W \oplus W^\perp = V$ or $W^{\perp\perp} = W$. To prove that $W^\perp = \{0\}$, let $f \in W^\perp$. Note that the map $g$ defined by $g(x) = xf(x)$ belongs to $W$ and so $\langle f,g\rangle = 0$. This can be written as
$$\int_0^1 xf(x)^2\,dx = 0.$$
Since the map $x \mapsto xf(x)^2$ is continuous, nonnegative, and with average 0, it is the zero map and so $f(x) = 0$ for all $x \in (0,1]$. By continuity, we deduce that $f = 0$.
The previous theorem has quite a lot of important applications in analysis, in particular concerning minimization problems. We will see a few examples in the sequel, but before that we introduce two very important definitions.

Definition 10.43. Let $V$ be an $\mathbb{R}$-vector space endowed with an inner product. Let $W$ be a finite dimensional subspace of $V$. By Theorem 10.41 we have $V = W \oplus W^\perp$. The orthogonal projection onto $W$ is the projection $p_W : V \to W$ onto $W$ along $W^\perp$. In other words, for $x \in V$, $p_W(x)$ is the unique vector in $W$ such that $x - p_W(x) \in W^\perp$.
Remark 10.44. Simply by definition we have
$$p_W(x) + p_{W^\perp}(x) = x$$
for all $x \in V$ and all subspaces $W$ of $V$. This can be very useful in practice, since it might be easier to compute the orthogonal projection onto $W^\perp$ than that onto $W$.

Example 10.45. Endow $\mathbb{R}^3$ with the canonical inner product. Let $W = \{(0,0,a_3) \mid a_3 \in \mathbb{R}\}$. Then the orthogonal complement $W^\perp$ is
$$W^\perp = \{(a_1,a_2,0) \mid a_1, a_2 \in \mathbb{R}\}.$$
Note that $W$ is the Cartesian $z$-axis and $W^\perp$ is the Cartesian $xy$-plane. The orthogonal projection $p_W$ of $\mathbb{R}^3$ onto $W$ is the map
$$p_W : \mathbb{R}^3 \to \mathbb{R}^3, \qquad p_W(x,y,z) = (0,0,z).$$
Problem 10.46. Let
$$v_1 = (1,1,0,0) \quad\text{and}\quad v_2 = (1,0,1,0).$$
Find the matrix of the orthogonal projection of $\mathbb{R}^4$ onto $W = \operatorname{Span}(v_1,v_2)$.

Solution. Let $v \in \mathbb{R}^4$ and write $p_W(v) = av_1 + bv_2$ for some real numbers $a, b$. Since $v - p_W(v)$ is orthogonal to $W$, we have
$$\langle v - (av_1+bv_2), v_1\rangle = \langle v - (av_1+bv_2), v_2\rangle = 0,$$
which can also be written, taking into account that
$$\|v_1\|^2 = 2, \qquad \|v_2\|^2 = 2, \qquad \langle v_1,v_2\rangle = 1,$$
as
$$2a + b = \langle v,v_1\rangle, \qquad a + 2b = \langle v,v_2\rangle.$$
Solving the system yields
$$a = \left\langle v, \frac{2v_1-v_2}{3}\right\rangle, \qquad b = \left\langle v, \frac{2v_2-v_1}{3}\right\rangle.$$
Since
$$\frac{2v_1-v_2}{3} = \left(\frac13, \frac23, -\frac13, 0\right), \qquad \frac{2v_2-v_1}{3} = \left(\frac13, -\frac13, \frac23, 0\right),$$
we can easily compute the values $p_W(e_1),\dots,p_W(e_4)$, where $e_1,\dots,e_4$ is the canonical basis of $\mathbb{R}^4$. More precisely, we obtain for $v = e_1$ that $a = \frac13$ and $b = \frac13$, thus
$$p_W(e_1) = \frac{v_1+v_2}{3} = \left(\frac23, \frac13, \frac13, 0\right),$$
and similar arguments yield
$$p_W(e_2) = \frac{2v_1-v_2}{3} = \left(\frac13, \frac23, -\frac13, 0\right), \qquad p_W(e_3) = \frac{2v_2-v_1}{3} = \left(\frac13, -\frac13, \frac23, 0\right)$$
and finally
$$p_W(e_4) = 0\cdot v_1 + 0\cdot v_2 = (0,0,0,0).$$
We conclude that the desired matrix is
$$A = \frac13\begin{pmatrix} 2 & 1 & 1 & 0\\ 1 & 2 & -1 & 0\\ 1 & -1 & 2 & 0\\ 0 & 0 & 0 & 0\end{pmatrix}. \qquad\square$$
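In coordinates one can bypass the linear system: if the columns of a matrix $V$ form a basis of $W$, the projection matrix is $V({}^tVV)^{-1}\,{}^tV$. The following NumPy sketch (ours, not part of the original text) checks the matrix $A$ computed above.

```python
import numpy as np

# Columns are the spanning vectors v1, v2 of W.
V = np.array([[1., 1.],
              [1., 0.],
              [0., 1.],
              [0., 0.]])

# Projection matrix onto the column space: P = V (V^t V)^{-1} V^t.
P = V @ np.linalg.inv(V.T @ V) @ V.T
print(3 * P)                      # integer entries: the matrix 3A above
assert np.allclose(P @ P, P)      # a projection is idempotent
assert np.allclose(P, P.T)        # orthogonal projections are symmetric
```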
Definition 10.47. Let $V$ be a Euclidean space. A linear map $p : V \to V$ is called an orthogonal projection if there is a subspace $W$ of $V$ such that $p$ is the orthogonal projection onto $W$.

The next theorem describes orthogonal projections as solutions to minimization problems. The result is absolutely fundamental:

Theorem 10.48. Let $V$ be an $\mathbb{R}$-vector space with an inner product $\langle\,,\,\rangle$ and with associated norm $\|\cdot\|$. Let $W$ be a finite dimensional subspace of $V$ and let $v \in V$. Then $p_W(v)$ is the element of $W$ at smallest distance from $v$, i.e.
$$\|v - p_W(v)\| = \min_{x\in W}\|x - v\|.$$
Moreover, $p_W(v)$ is the unique element of $W$ with this property.

Proof. Let $x \in W$ and apply the Pythagorean theorem (Remark 10.39; observe that $x - p_W(v) \in W$ and $v - p_W(v) \in W^\perp$) to obtain
$$\|x-v\|^2 = \|(x - p_W(v)) + (p_W(v) - v)\|^2 = \|x - p_W(v)\|^2 + \|p_W(v) - v\|^2 \ge \|p_W(v) - v\|^2.$$
This shows that $\|x - v\| \ge \|p_W(v) - v\|$ for all $x \in W$ and yields the first part of the theorem. For the second part, note that equality holds in the displayed inequality if and only if $x = p_W(v)$. This finishes the proof of the theorem. $\square$

Definition 10.49. With the notations of Theorem 10.48, we define the distance from $v$ to $W$ as
$$d(v,W) = \|v - p_W(v)\| = \min_{x\in W}\|x - v\|.$$
Problem 10.50. Let $V$ be an $\mathbb{R}$-vector space endowed with an inner product $\langle\,,\,\rangle$ and let $W$ be a finite dimensional subspace of $V$. Let $x_1,\dots,x_n$ be a basis of $W$ and let $v \in V$. Prove that
$$d(v,W)^2 = \frac{G(v,x_1,\dots,x_n)}{G(x_1,\dots,x_n)},$$
where $G(x_1,\dots,x_n) = \det(\langle x_i,x_j\rangle)$ is the Gram determinant of the family $x_1,\dots,x_n$.

Solution. Write $p_W(v) = a_1x_1 + \dots + a_nx_n$ for some real numbers $a_1,\dots,a_n$. By definition
$$d(v,W)^2 = \|v - p_W(v)\|^2 = \langle v - p_W(v), v\rangle = \|v\|^2 - \langle v, p_W(v)\rangle,$$
thus
$$d(v,W)^2 + a_1\langle v,x_1\rangle + \dots + a_n\langle v,x_n\rangle = \|v\|^2.$$
Since
$$\langle v,x_i\rangle = \langle v - p_W(v), x_i\rangle + \langle p_W(v), x_i\rangle = \langle p_W(v), x_i\rangle = a_1\langle x_1,x_i\rangle + a_2\langle x_2,x_i\rangle + \dots + a_n\langle x_n,x_i\rangle,$$
we deduce that $d(v,W)^2$ and $a_1,\dots,a_n$ are solutions of the linear system in the unknowns $t_0,\dots,t_n$
$$\begin{cases} t_0 + t_1\langle v,x_1\rangle + \dots + t_n\langle v,x_n\rangle = \|v\|^2\\ t_1\langle x_1,x_1\rangle + t_2\langle x_1,x_2\rangle + \dots + t_n\langle x_1,x_n\rangle = \langle v,x_1\rangle\\ \qquad\vdots\\ t_1\langle x_1,x_n\rangle + t_2\langle x_2,x_n\rangle + \dots + t_n\langle x_n,x_n\rangle = \langle v,x_n\rangle \end{cases}$$
The result then follows straight from Cramer's rule: the determinant of the system is $G(x_1,\dots,x_n)$ (expand along the first column), while replacing the first column by the right-hand side produces exactly the Gram matrix of $(v,x_1,\dots,x_n)$, so $t_0 = G(v,x_1,\dots,x_n)/G(x_1,\dots,x_n)$. $\square$
Problem 10.51. Consider the vectors
$$v_1 = (3,1,-1,1), \qquad v_2 = (1,-1,1,-1) \qquad\text{and}\qquad b = (3,1,5,1)$$
in $\mathbb{R}^4$. Find the distance from $b$ to the space $W = \operatorname{Span}(v_1,v_2)$.

Solution. We start by finding the orthogonal projection of $b$ on $W$ by writing
$$b = p_W(b) + u$$
with $\langle u,v_1\rangle = \langle u,v_2\rangle = 0$. Writing $p_W(b) = \alpha v_1 + \beta v_2$ we obtain
$$\alpha\|v_1\|^2 + \beta\langle v_1,v_2\rangle = \langle b,v_1\rangle \quad\text{and}\quad \alpha\langle v_1,v_2\rangle + \beta\|v_2\|^2 = \langle b,v_2\rangle,$$
which reduces to
$$12\alpha = 6, \qquad 4\beta = 6.$$
We deduce that
$$p_W(b) = \frac12 v_1 + \frac32 v_2 = (3,-1,1,-1)$$
and so
$$d(b,W) = \|b - p_W(b)\| = \sqrt{(3-3)^2 + (1-(-1))^2 + (5-1)^2 + (1-(-1))^2} = \sqrt{24}.$$
Of course, one can also use the previous exercise, by computing (exercise left to the reader) $G(v_1,v_2) = 48$ and $G(b,v_1,v_2) = 32\cdot 36$, then
$$d(b,W) = \sqrt{\frac{G(b,v_1,v_2)}{G(v_1,v_2)}} = \sqrt{24}.$$
However, we strongly advise the reader to redo every time the argument explained in the first part of the solution. $\square$
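A quick numerical confirmation of the Gram-determinant formula of Problem 10.50, applied to the data above (a NumPy sketch of ours, not part of the original text):

```python
import numpy as np

v1 = np.array([3., 1., -1., 1.])
v2 = np.array([1., -1., 1., -1.])
b  = np.array([3., 1., 5., 1.])

def gram_det(*vectors):
    """Determinant of the Gram matrix [<x_i, x_j>]."""
    M = np.array(vectors)
    return np.linalg.det(M @ M.T)

d2 = gram_det(b, v1, v2) / gram_det(v1, v2)
print(d2)   # 24.0, hence d(b, W) = sqrt(24)
```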
Problem 10.52. Let $n$ be a positive integer and let $V$ be the vector space of polynomials with real coefficients whose degree does not exceed $n$. For $P, Q \in V$ define
$$\langle P,Q\rangle = \int_0^\infty P(x)Q(x)e^{-x}\,dx.$$
a) Explain why $\langle\,,\,\rangle$ is a well-defined inner product on $V$.
b) Find
$$\min_{a_1,\dots,a_n\in\mathbb{R}} \int_0^\infty (1 + a_1x + \dots + a_nx^n)^2 e^{-x}\,dx.$$

Solution. a) The definition makes sense, since $x^ke^{-x}$ is integrable on $[0,\infty)$ for all $k \ge 0$. More precisely, we have the following classical result, which follows easily by induction on $k$ combined with integration by parts:
$$\int_0^\infty e^{-x}x^k\,dx = k!.$$
It is easy to see that $\langle\,,\,\rangle$ is indeed an inner product: it is clearly symmetric and bilinear, and we have
$$\langle P,P\rangle = \int_0^\infty P(x)^2e^{-x}\,dx \ge 0.$$
If the last quantity equals 0, then so does $\int_0^1 e^{-x}P(x)^2\,dx \le \int_0^\infty e^{-x}P(x)^2\,dx$. Since $x \mapsto e^{-x}P(x)^2$ is continuous, nonnegative, and with average value 0 on $[0,1]$, it must be the zero map, thus $P$ vanishes on $[0,1]$ and so $P = 0$ (because $P$ is a polynomial). This proves the claim that $\langle\,,\,\rangle$ is an inner product.
b) Let $W$ be the span of $X, X^2, \dots, X^n$, then $W$ is a subspace of $V$ and the problem asks us to find
$$\inf_{P\in W}\|1 + P\|^2 = d(1,W)^2.$$
We know that the minimum value is attained when $-P$ is the orthogonal projection of 1 on $W$. This is characterized by $\langle P+1, Q\rangle = 0$ for all $Q \in W$, or equivalently $\langle P+1, X^k\rangle = 0$ for all $1 \le k \le n$. Using the identity
$$\int_0^\infty e^{-x}x^k\,dx = k!$$
and writing $P = a_1X + \dots + a_nX^n$, we can rewrite the condition as
$$k! + \sum_{i=1}^n a_i(k+i)! = 0 \quad\text{or}\quad 1 + \sum_{i=1}^n a_i(k+1)\cdots(k+i) = 0.$$
Thus the polynomial $Q = 1 + \sum_{i=1}^n a_i(X+1)\cdots(X+i)$ vanishes at $1,2,\dots,n$ and since it has degree $n$ and leading coefficient $a_n$, we must have
$$Q = a_n(X-1)\cdots(X-n).$$
We need to evaluate
$$d(1,W)^2 = \|1+P\|^2 = \langle 1+P, 1\rangle = 1 + \sum_{i=1}^n a_i\,i! = Q(0) = (-1)^n n!\,a_n.$$
Taking $X = -1$ in the equality
$$1 + \sum_{i=1}^n a_i(X+1)\cdots(X+i) = a_n(X-1)\cdots(X-n)$$
yields
$$1 = a_n(-1)^n(n+1)!.$$
We conclude that the answer of the problem is
$$(-1)^n n!\,a_n = n!\cdot\frac{1}{(n+1)!} = \frac{1}{n+1}. \qquad\square$$
10.4.1 Problems for Practice
Whenever it is not specified, the inner product on $\mathbb{R}^n$ is the canonical one.
1. Let
$$x_1 = (1,3,2), \qquad x_2 = (6,4,2) \qquad\text{and}\qquad b = (4,2,3).$$
Find the distance from $b$ to the plane spanned by $x_1$ and $x_2$.
2. Determine the orthogonal projection of $b = (2,1,1)$ on the subspace spanned by $(1,1,1)$ and $(1,0,1)$.
3. Let $W$ be the subspace of $\mathbb{R}^4$ spanned by $w = (1,1,1,1)$. Find the orthogonal projection of $b = (3,0,3,2)$ on the orthogonal complement $W^\perp$.
4. Consider the vector space $V$ of continuous real-valued maps on $[0,1]$, with the inner product defined by
$$\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx.$$
Determine which of the functions $f(x) = x$ and $g(x) = x^3$ is closer (with respect to the distance associated with the norm induced by the inner product) to the function $h(x) = x^2$.
5. a) Let $V$ be a Euclidean space and let $T_1, T_2$ be orthogonal projections such that $T_1\circ T_2$ is a projection. Prove that $T_1\circ T_2 = T_2\circ T_1$. Hint: use Problem 14.
b) Does this result remain true if we no longer assume that $T_1$ and $T_2$ are orthogonal projections, but only projections?
6. Let $a_1,\dots,a_n$ be real numbers, not all of them equal to 0. Let $H$ be the set of vectors $(x_1,\dots,x_n) \in \mathbb{R}^n$ such that $a_1x_1 + \dots + a_nx_n = 0$. Find the matrix of the orthogonal projection onto $H$ with respect to the canonical basis of $\mathbb{R}^n$.
7. Let $V$ be the vector space of polynomials with real coefficients whose degree does not exceed $n$. If $P = \sum_{i=0}^n a_iX^i \in V$ and $Q = \sum_{i=0}^n b_iX^i \in V$, define
$$\langle P,Q\rangle = \sum_{i=0}^n a_ib_i.$$
Let $H$ be the subspace of polynomials in $V$ vanishing at 1. Compute $d(X,H)$.
8. Let $V$ be the set of polynomials with real coefficients and degree not exceeding 3. Find
$$\min_{P\in V}\int|P(x) - \sin x|^2\,dx.$$
9. Find the vector in $\operatorname{Span}((1,2,1),(1,3,4))$ which is closest (with respect to the Euclidean norm) to the vector $(1,1,1)$.
10. Let $v_1 = (0,1,1,0)$, $v_2 = (1,1,1,1)$ in $\mathbb{R}^4$. Let $W$ be the span of $v_1, v_2$.
a) Find the matrix of the orthogonal projection of $\mathbb{R}^4$ onto $W$ with respect to the canonical basis of $\mathbb{R}^4$.
b) Compute the distance from $(1,1,1,1)$ to $W$.
11. Let $V$ be the space of smooth real-valued maps on $[0,1]$, endowed with
$$\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx + \int_0^1 f'(x)g'(x)\,dx.$$
a) Prove that $\langle\,,\,\rangle$ is an inner product on $V$.
b) Let $W_1$ be the subspace of $V$ consisting of maps $f$ vanishing at 0 and 1. Let $W_2$ be the subspace of $V$ consisting of maps $f$ such that $f'' = f$. Prove that $W_1 \oplus W_2 = V$ and that $W_1$ and $W_2$ are orthogonal to each other.
c) Describe the orthogonal projection of $V$ onto $W_2$.
12. Let $(V, \langle\,,\,\rangle)$ be a Euclidean space and let $f : V \to V$ be a map such that $\langle f(x),y\rangle = \langle x,f(y)\rangle$ for all $x, y \in V$. Prove that $f$ is linear.
13. Let $V$ be a Euclidean space and let $T : V \to V$ be a linear map such that $T^2 = T$, i.e., $T$ is a projection. Prove that the following statements are equivalent:
a) $T$ is an orthogonal projection.
b) For all $x, y \in V$ we have $\langle T(x),y\rangle = \langle x,T(y)\rangle$.
14. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$ such that $T^2 = T$, i.e., $T$ is a projection.
a) Suppose that $T$ is an orthogonal projection. Using the Pythagorean theorem, prove that for all $x \in V$ we have
$$\|T(x)\| \le \|x\|.$$
b) Conversely, suppose that $\|T(x)\| \le \|x\|$ for all $x \in V$. Prove that $\langle x,y\rangle = 0$ for $x \in \ker T$ and $y \in \operatorname{Im}(T)$ (hint: use that $\|T(x+cy)\|^2 \le \|x+cy\|^2$ for all real numbers $c$) and deduce that $T$ is an orthogonal projection.
10.5 Orthogonal Bases
Let $V$ be a vector space over $\mathbb{R}$ endowed with an inner product $(x,y) \mapsto \langle x,y\rangle$, with associated norm $\|\cdot\|$ (recall that $\|x\| = \sqrt{\langle x,x\rangle}$ for all $x \in V$).
Definition 10.53. a) A family $(v_i)_{i\in I}$ of vectors in $V$ is called orthogonal if
$$\langle v_i,v_j\rangle = 0 \quad\text{for all } i \ne j \in I.$$
It is called orthonormal if moreover $\|v_i\| = 1$ for all $i \in I$. Thus the vectors in an orthonormal family of $V$ have norm 1 and are pairwise orthogonal.
b) An orthogonal basis of $V$ is a basis of $V$ which is an orthogonal family.
c) An orthonormal basis of $V$ is a basis which is an orthonormal family.

Note that the canonical basis of $\mathbb{R}^n$ is an orthonormal basis of $\mathbb{R}^n$ with respect to the canonical inner product on $\mathbb{R}^n$. In the following two exercises the reader will find two other very important examples of orthonormal families.
Problem 10.54. Let $x_0,\dots,x_n$ be pairwise distinct real numbers and consider the space $V$ of polynomials with real coefficients and degree not exceeding $n$, endowed with
$$\langle P,Q\rangle = \sum_{i=0}^n P(x_i)Q(x_i).$$
Prove that $\langle\,,\,\rangle$ is an inner product on $V$ and that the family $(L_i)_{0\le i\le n}$, where
$$L_i(X) = \prod_{\substack{0\le k\le n\\ k\ne i}} \frac{X-x_k}{x_i-x_k},$$
is an orthonormal family in $V$.

Solution. Clearly $\langle\,,\,\rangle$ is a symmetric bilinear form on $V$ and for all $P \in V$ we have
$$\langle P,P\rangle = \sum_{i=0}^n P(x_i)^2 \ge 0,$$
with equality if and only if $P(x_i) = 0$ for $0 \le i \le n$. Since $x_0,\dots,x_n$ are pairwise distinct and since $\deg P \le n$, it follows that necessarily $P = 0$ and so $\langle\,,\,\rangle$ is an inner product on $V$.
To prove the second assertion, let $i \ne j \in \{0,\dots,n\}$ and let us compute
$$\langle L_i,L_j\rangle = \sum_{k=0}^n L_i(x_k)L_j(x_k).$$
Now, by construction we have
$$L_i(x_j) = \delta_{ij},$$
where $\delta_{ij} = 0$ if $i \ne j$ and 1 otherwise. Thus
$$\langle L_i,L_j\rangle = \sum_{k=0}^n \delta_{ik}\delta_{jk}.$$
If $i \ne j$, then $\delta_{ik}\delta_{jk} = 0$ for $0 \le k \le n$, thus $\langle L_i,L_j\rangle = 0$. If $i = j$, then $\delta_{ik}\delta_{jk} = 0$ for $k \ne i$ and 1 for $k = i$, thus $\langle L_i,L_i\rangle = 1$ and the result follows. $\square$
Problem 10.55. Let $V$ be the space of continuous $2\pi$-periodic maps $f : \mathbb{R} \to \mathbb{R}$, endowed with the inner product
$$\langle f,g\rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx.$$
Let $c_n, s_n \in V$ be the maps defined by
$$c_n(x) = \cos(nx), \qquad s_n(x) = \sin(nx).$$
Prove that the family
$$\mathcal{F} = \left\{\frac{1}{\sqrt{2\pi}}\right\} \cup \left\{\frac{1}{\sqrt{\pi}}c_n \,\Big|\, n \ge 1\right\} \cup \left\{\frac{1}{\sqrt{\pi}}s_n \,\Big|\, n \ge 1\right\}$$
is an orthonormal family in $V$.

Solution. To simplify notations a little bit, let $C_0 = \frac{1}{\sqrt{2\pi}}$ and for $n \ge 1$
$$C_n = \frac{1}{\sqrt{\pi}}c_n, \qquad S_n = \frac{1}{\sqrt{\pi}}s_n.$$
Clearly
$$\|C_0\|^2 = \int_{-\pi}^{\pi}\frac{1}{2\pi}\,dx = 1.$$
Next,
$$\|C_n\|^2 = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos^2(nx)\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi}\frac{1+\cos(2nx)}{2}\,dx = 1,$$
since
$$\int_{-\pi}^{\pi}\cos(px)\,dx = 0 \qquad \forall\,p\in\mathbb{Z}^* \tag{10.2}$$
(a primitive of $\cos(px)$ is $\frac1p\sin(px)$ and this vanishes at $\pi$ and $-\pi$). Similarly, we obtain that $\|S_n\| = 1$, by using the identity
$$\sin^2(nx) = \frac{1-\cos(2nx)}{2}.$$
Thus $\|v\| = 1$ for all $v \in \mathcal{F}$.
It remains to check that elements of $\mathcal{F}$ are pairwise orthogonal. That $C_0$ is orthogonal to $C_n$ and $S_n$ for all $n \ge 1$ follows from relation (10.2) and its analogue
$$\int_{-\pi}^{\pi}\sin(px)\,dx = 0 \qquad \forall\,p\in\mathbb{Z}. \tag{10.3}$$
Next, we check that $C_n$ and $C_m$ are orthogonal for $m \ne n$. This follows from the identity
$$\cos(nx)\cos(mx) = \frac{\cos((m-n)x) + \cos((m+n)x)}{2}$$
and relation (10.2). Similarly, the fact that $S_n$ and $S_m$ are orthogonal for $n \ne m$ is a consequence of the relation
$$\sin(nx)\sin(mx) = \frac{\cos((m-n)x) - \cos((m+n)x)}{2}$$
and (10.2). Finally, the fact that $S_n$ and $C_m$ are orthogonal for $n, m \ge 1$ follows from the relation
$$\sin(nx)\cos(mx) = \frac{\sin((m+n)x) + \sin((n-m)x)}{2}$$
and (10.3). $\square$
A fundamental property of orthonormal families is that they are linearly independent. More precisely:

Proposition 10.56. Let $V$ be a vector space over $\mathbb{R}$ endowed with an inner product. Then any orthogonal family $(v_i)_{i\in I}$ of nonzero vectors in $V$ is linearly independent.

Proof. Suppose that $\sum_{i\in I} a_iv_i = 0$ for some scalars $a_i \in \mathbb{R}$, such that all but finitely many of them are 0. For $j \in I$ we have
$$\left\langle v_j, \sum_{i\in I} a_iv_i\right\rangle = 0.$$
By bilinearity, the left-hand side equals
$$\sum_{i\in I} a_i\langle v_j,v_i\rangle = a_j\|v_j\|^2,$$
the last equality being a consequence of the fact that $(v_i)_{i\in I}$ is orthogonal. We deduce (thanks to the hypothesis that $v_j \ne 0$ for all $j$) that $a_j = 0$ for all $j \in I$ and the result follows. $\square$

The following result is a direct consequence of the previous proposition:

Corollary 10.57. An orthogonal family of nonzero vectors in a Euclidean space of dimension $n$ has at most $n$ elements. Moreover, it has $n$ elements if and only if it is an orthogonal basis. In particular, an orthonormal family of $n$ vectors in an $n$-dimensional Euclidean space is automatically an orthonormal basis of that space.
When we have an orthonormal basis $e_1,\dots,e_n$ of a Euclidean space $V$, it is rather easy to write down the coordinates of a vector $v \in V$ with respect to this basis: these coordinates are simply $\langle v,e_i\rangle$ for $1 \le i \le n$. More precisely, we have the very important formula
$$v = \sum_{i=1}^n \langle v,e_i\rangle e_i \tag{10.4}$$
called the Fourier decomposition of $v$ with respect to the orthonormal basis $e_1,\dots,e_n$. In order to prove formula (10.4), write
$$v = \sum_{i=1}^n x_ie_i$$
for some real numbers $x_i$ and observe that
$$\langle v,e_j\rangle = \sum_{i=1}^n \langle x_ie_i, e_j\rangle = \sum_{i=1}^n x_i\langle e_i,e_j\rangle = x_j$$
for all $1 \le j \le n$.
Let us come back for a moment to the setup and notations of Problem 10.54. Recall that we proved in that problem that the polynomials
$$L_i(X) = \prod_{\substack{0\le k\le n\\ k\ne i}} \frac{X-x_k}{x_i-x_k}$$
for $0 \le i \le n$ form an orthonormal family in the space $V$ of polynomials with real coefficients and degree at most $n$, endowed with the inner product defined by
$$\langle P,Q\rangle = \sum_{i=0}^n P(x_i)Q(x_i).$$
Since $\dim V = n+1$ and since the family $(L_i)_{0\le i\le n}$ is orthonormal (Problem 10.54) and has $n+1$ elements, Corollary 10.57 shows that this family is an orthonormal basis of $V$. Moreover, for each $P \in V$ the Fourier decomposition of $P$ becomes
$$P = \sum_{i=0}^n \langle P,L_i\rangle L_i.$$
Note that
$$\langle P,L_i\rangle = \sum_{k=0}^n P(x_k)L_i(x_k) = P(x_i),$$
since $L_i(x_k) = 0$ for $i \ne k$ and 1 for $i = k$. We obtain in this way Lagrange's interpolation formula: for all polynomials $P$ of degree at most $n$ we have
$$P = \sum_{i=0}^n P(x_i)L_i = \sum_{i=0}^n P(x_i)\prod_{\substack{0\le k\le n\\ k\ne i}} \frac{X-x_k}{x_i-x_k}.$$
Let us now do the same thing starting with Problem 10.55. Let $\mathcal{T}_n$ be the space of trigonometric polynomials of degree at most $n$. By definition,
$$\mathcal{T}_n = \operatorname{Span}(c_0,c_1,\dots,c_n,s_1,\dots,s_n),$$
where we recall that
$$c_k(x) = \cos(kx), \qquad s_k(x) = \sin(kx).$$
Thus an element of $\mathcal{T}_n$ is a continuous $2\pi$-periodic map of the form
$$x \mapsto a_0 + \sum_{k=1}^n (a_k\cos(kx) + b_k\sin(kx))$$
with $a_k, b_k \in \mathbb{R}$. By Problem 10.55 the family
$$\mathcal{F}_n = \left\{\frac{1}{\sqrt{2\pi}}\right\} \cup \left\{\frac{1}{\sqrt{\pi}}c_k \,\Big|\, 1\le k\le n\right\} \cup \left\{\frac{1}{\sqrt{\pi}}s_k \,\Big|\, 1\le k\le n\right\}$$
is orthonormal with respect to the inner product
$$\langle f,g\rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx$$
on $\mathcal{T}_n$. This family is therefore linearly independent in $\mathcal{T}_n$ and by definition it spans $\mathcal{T}_n$, hence it is an orthonormal basis of $\mathcal{T}_n$. If $f : \mathbb{R} \to \mathbb{R}$ is any continuous $2\pi$-periodic map, we call the sum
$$S_n(f) = \sum_{g\in\mathcal{F}_n} \langle f,g\rangle g$$
the $n$th partial Fourier series of $f$. A small calculation shows that we can also write
$$S_n(f)(x) = \frac{a_0(f)}{2} + \sum_{k=1}^n (a_k(f)\cos(kx) + b_k(f)\sin(kx)),$$
where
$$a_m(f) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(mx)\,dx, \qquad b_m(f) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(mx)\,dx$$
are called the $m$th Fourier coefficients of $f$.
We can also rewrite the previous results in terms of the complex Fourier coefficients
$$c_m(f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-imx}\,dx$$
of $f$. These are usually referred to simply as the Fourier coefficients of $f$. A nice exercise for the reader consists in checking that the partial Fourier series can also be expressed as
$$S_n(f)(x) = \sum_{k=-n}^n c_k(f)e^{ikx},$$
by first checking that for $m \ge 0$
$$c_m(f) = \frac{a_m(f) - ib_m(f)}{2}.$$
Note that relation (10.4) says that
$$f = S_n(f) \quad\text{if}\quad f \in \mathcal{T}_n,$$
but of course we do not have $f = S_n(f)$ for an arbitrary continuous $2\pi$-periodic function $f$. One may wonder what the actual relationship is between $f$ and the partial Fourier series of $f$. The naive guess would be that
$$\lim_{n\to\infty} S_n(f)(x) = f(x)$$
for every continuous $2\pi$-periodic map $f : \mathbb{R} \to \mathbb{R}$. This is not true, but there are many situations in which it does hold: a deep theorem in Fourier analysis due to Dirichlet says that if $f$ and its derivative are piecewise continuous, then for all $x$ we have
$$\lim_{n\to\infty} S_n(f)(x) = \frac{f(x+) + f(x-)}{2},$$
where $f(x+)$ and $f(x-)$ are the one-sided limits of $f$ at $x$. Thus if moreover $f$ is continuous, we do have
$$\lim_{n\to\infty} S_n(f)(x) = f(x),$$
which we can write as
$$f(x) = \sum_{k\in\mathbb{Z}} c_k(f)e^{ikx} = \frac{a_0(f)}{2} + \sum_{k=1}^{\infty} (a_k(f)\cos(kx) + b_k(f)\sin(kx)).$$
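The coefficients $a_m(f)$ and $b_m(f)$ are easy to approximate numerically, which makes Dirichlet's theorem pleasant to watch in action. Here is a small NumPy sketch of ours (not from the original text), approximating the integrals by the trapezoidal rule:

```python
import numpy as np

xs = np.linspace(-np.pi, np.pi, 20001)   # quadrature grid on [-pi, pi]

def a(f, m):
    """a_m(f) = (1/pi) * integral of f(x) cos(mx) over [-pi, pi]."""
    return np.trapz(f(xs) * np.cos(m * xs), xs) / np.pi

def b(f, m):
    """b_m(f) = (1/pi) * integral of f(x) sin(mx) over [-pi, pi]."""
    return np.trapz(f(xs) * np.sin(m * xs), xs) / np.pi

def S(f, n, x):
    """n-th partial Fourier series of f, evaluated at x."""
    total = a(f, 0) / 2 * np.ones_like(x)
    for k in range(1, n + 1):
        total += a(f, k) * np.cos(k * x) + b(f, k) * np.sin(k * x)
    return total

f = lambda x: x**2        # restriction to [-pi, pi] of a continuous 2*pi-periodic map
pts = np.array([0.0, 1.0, 2.0])
print(S(f, 50, pts), f(pts))   # the partial sums approach f, as Dirichlet predicts
```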
Orthogonal bases are extremely useful in practice, as we can easily compute orthogonal projections and distances once we have at our disposal an orthogonal basis of the space we are interested in. More precisely, we have the following very useful result:

Theorem 10.58. Let $V$ be a vector space over $\mathbb{R}$ endowed with an inner product $\langle\,,\,\rangle$ and let $W$ be a finite dimensional subspace of $V$. Let $v_1,\dots,v_n$ be an orthogonal basis of $W$. Then for all $v \in V$ we have
$$p_W(v) = \sum_{i=1}^n \frac{\langle v,v_i\rangle}{\|v_i\|^2}v_i.$$

Proof. Let us write $v = p_W(v) + u$, with $u \in W^\perp$, that is $\langle u,v_i\rangle = 0$ for all $i \in [1,n]$. Letting $p_W(v) = \alpha_1v_1 + \dots + \alpha_nv_n$ and using the fact that $v_1,\dots,v_n$ is an orthogonal family, we obtain
$$0 = \langle u,v_i\rangle = \langle v,v_i\rangle - \langle p_W(v),v_i\rangle = \langle v,v_i\rangle - \sum_{j=1}^n \alpha_j\langle v_j,v_i\rangle = \langle v,v_i\rangle - \alpha_i\|v_i\|^2.$$
It follows that
$$\alpha_i = \frac{\langle v,v_i\rangle}{\|v_i\|^2}$$
and the theorem is proved. $\square$
We can say quite a bit more. The inequality in the theorem below is called Bessel's inequality.

Theorem 10.59. Let $V$ be a vector space over $\mathbb{R}$ endowed with an inner product $\langle\,,\,\rangle$, and let $W$ be a finite dimensional subspace of $V$. If $v_1,\dots,v_n$ is an orthonormal basis of $W$, then for all $v \in V$ we have
$$p_W(v) = \sum_{i=1}^n \langle v,v_i\rangle v_i$$
and
$$d(v,W)^2 = \left\|v - \sum_{i=1}^n \langle v,v_i\rangle v_i\right\|^2 = \|v\|^2 - \sum_{i=1}^n \langle v,v_i\rangle^2.$$
In particular we have
$$\sum_{i=1}^n \langle v,v_i\rangle^2 \le \|v\|^2.$$

Proof. The formula for $p_W(v)$ is a direct consequence of the previous theorem. Next, using the Pythagorean theorem,
$$\|v\|^2 = \|v - p_W(v)\|^2 + \|p_W(v)\|^2.$$
On the other hand, since $v_1,\dots,v_n$ is an orthonormal basis, we have
$$\|p_W(v)\|^2 = \left\|\sum_{i=1}^n \langle v,v_i\rangle v_i\right\|^2 = \sum_{i,j=1}^n \langle\langle v,v_i\rangle v_i, \langle v,v_j\rangle v_j\rangle = \sum_{i,j=1}^n \langle v,v_i\rangle\langle v,v_j\rangle\langle v_i,v_j\rangle = \sum_{i,j=1}^n \delta_{ij}\langle v,v_i\rangle\langle v,v_j\rangle = \sum_{i=1}^n \langle v,v_i\rangle^2.$$
Combining these equalities yields
$$d(v,W)^2 = \left\|v - \sum_{i=1}^n \langle v,v_i\rangle v_i\right\|^2 = \|v\|^2 - \sum_{i=1}^n \langle v,v_i\rangle^2.$$
Finally, since $d(v,W)^2 \ge 0$, the last inequality is a direct consequence of the previous formula. $\square$
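Both formulas become one-liners once an orthonormal basis is available as the columns of a matrix. A short NumPy sketch of ours (an illustration, not part of the original text), where the orthonormal columns come from a QR factorization of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # columns: orthonormal basis of W
v = rng.standard_normal(5)

c = Q.T @ v                   # the coefficients <v, v_i>
p = Q @ c                     # p_W(v) = sum <v, v_i> v_i
print(np.sum(c**2) <= v @ v)  # True: Bessel's inequality
print(np.isclose((v - p) @ (v - p), v @ v - np.sum(c**2)))  # d(v, W)^2 formula
```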
Remark 10.60. Let $V$ be a vector space over $\mathbb{R}$ endowed with an inner product and let $(v_i)_{i\in I}$ be an orthogonal family. If $(a_i)_{i\in I}$ are real numbers, all but finitely many being equal to 0, then
$$\left\|\sum_{i\in I} a_iv_i\right\|^2 = \sum_{i\in I} a_i^2\|v_i\|^2.$$
In particular, if $(v_i)_{i\in I}$ is an orthonormal family, then
$$\left\|\sum_{i\in I} a_iv_i\right\|^2 = \sum_{i\in I} a_i^2.$$
This can be proved in the same way as the previous theorem: we have
$$\left\|\sum_{i\in I} a_iv_i\right\|^2 = \left\langle\sum_{i\in I} a_iv_i, \sum_{j\in I} a_jv_j\right\rangle = \sum_{i,j\in I} a_ia_j\langle v_i,v_j\rangle = \sum_{i\in I} a_i^2\langle v_i,v_i\rangle = \sum_{i\in I} a_i^2\|v_i\|^2,$$
since the family is orthogonal. Note that the algebraic operations are allowed since we assumed that all but finitely many of the $a_i$'s are zero, thus we never manipulate infinite sums.
Remark 10.61. Let us come back to the discussion preceding the previous theorem. If $f : \mathbb{R} \to \mathbb{R}$ is a continuous $2\pi$-periodic map, we deduce from that discussion and the previous theorem that $S_n(f)$ (the $n$th partial Fourier series of $f$) is the orthogonal projection of $f$ on the space $\mathcal{T}_n$ of trigonometric polynomials of degree at most $n$ and that
$$\sum_{g\in\mathcal{F}_n} \langle f,g\rangle^2 \le \|f\|^2 = \int_{-\pi}^{\pi} f(x)^2\,dx.$$
This can be rewritten in terms of the Fourier coefficients $a_m(f), b_m(f)$ of $f$ as
$$\frac{a_0(f)^2}{2} + \sum_{k=1}^n (a_k(f)^2 + b_k(f)^2) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx.$$
Since this holds for all $n$, we deduce that the series
$$\sum_{k\ge 1} (a_k(f)^2 + b_k(f)^2)$$
converges and
$$\frac{a_0(f)^2}{2} + \sum_{k=1}^{\infty} (a_k(f)^2 + b_k(f)^2) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx.$$
The convergence of the previous series yields
$$\lim_{n\to\infty} a_n(f) = \lim_{n\to\infty} b_n(f) = 0,$$
a nontrivial result known as the Riemann–Lebesgue theorem. On the other hand, one can prove (this is the famous Plancherel theorem) that the previous inequality is actually an equality, that is for all continuous $2\pi$-periodic maps $f$ we have
$$\frac{a_0(f)^2}{2} + \sum_{k=1}^{\infty} (a_k(f)^2 + b_k(f)^2) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx.$$
The proof is beyond the scope of this book. A good exercise for the reader is to convince himself that Plancherel's theorem can be rewritten as
$$\sum_{k\in\mathbb{Z}} |c_k(f)|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)^2\,dx,$$
where we recall that
$$c_k(f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx.$$
Plancherel's theorem also holds for functions $f : \mathbb{R} \to \mathbb{R}$ which are piecewise continuous and $2\pi$-periodic.
Problem 10.62. a) Determine an orthogonal basis of $\mathbb{R}^3$ containing the vector $w = (1,2,-1)$.
b) Let $W$ be the subspace of $\mathbb{R}^3$ spanned by $w$. Find the projection of $v = (1,2,1)$ onto the orthogonal complement of $W$.

Solution. a) We look for an orthogonal basis $w, v_1, v_2$ of $\mathbb{R}^3$. In particular $v_1, v_2$ should be an orthogonal basis of $(\mathbb{R}w)^\perp$. A vector $v = (x,y,z)$ belongs to $(\mathbb{R}w)^\perp$ if and only if
$$0 = \langle v,w\rangle = x + 2y - z.$$
Thus we must have
$$v_1 = (x_1, y_1, x_1+2y_1), \qquad v_2 = (x_2, y_2, x_2+2y_2)$$
for some real numbers $x_1, x_2, y_1, y_2$. Moreover, we should have $\langle v_1,v_2\rangle = 0$ and $v_1, v_2$ should be nonzero: this automatically implies that $v_1, v_2$ are linearly independent (because $\langle v_1,v_2\rangle = 0$) and so they form a basis of $(\mathbb{R}w)^\perp$.
The condition $\langle v_1,v_2\rangle = 0$ is equivalent to
$$x_1x_2 + y_1y_2 + (x_1+2y_1)(x_2+2y_2) = 0.$$
We see that we have lots of choices: to keep things simple we choose $x_1+2y_1 = 0$, for instance $y_1 = 1$ and $x_1 = -2$. Then the condition becomes $-2x_2 + y_2 = 0$, so we choose $x_2 = 1$ and $y_2 = 2$. This gives
$$v_1 = (-2,1,0), \qquad v_2 = (1,2,5).$$
We insist that this is only one of the many possible answers of the problem.
b) As we have already seen in part a), the orthogonal complement of $W$ is exactly $\operatorname{Span}(v_1,v_2)$ and an orthogonal basis of $W^\perp$ is given by $v_1, v_2$. Applying the previous theorem yields
$$p_{W^\perp}(v) = \frac{\langle v,v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle v,v_2\rangle}{\|v_2\|^2}v_2 = \frac13 v_2 = \left(\frac13, \frac23, \frac53\right),$$
since $\langle v,v_1\rangle = 0$ and $\langle v,v_2\rangle = 10$, while $\|v_2\|^2 = 30$.
We could have done this in a much easier way as follows: instead of computing $p_{W^\perp}(v)$ we compute first $p_W(v)$. Now an orthogonal basis of $W$ is given by $w$, thus
$$p_W(v) = \frac{\langle v,w\rangle}{\|w\|^2}w = \frac46 w = \left(\frac23, \frac43, -\frac23\right).$$
Next, we have
$$p_{W^\perp}(v) = v - p_W(v) = \left(\frac13, \frac23, \frac53\right). \qquad\square$$
The previous results concerning orthonormal bases show rather clearly the crucial role played by these objects. Yet we avoided a natural and very important question: can we always find an orthonormal basis? The answer is given by the following fundamental theorem. We do not give its proof right now, since we will prove a much stronger result in just a few moments.

Theorem 10.63. Any Euclidean space has an orthonormal basis.

The following theorem refines Theorem 10.63 and gives an algorithmic construction of an orthonormal basis of a Euclidean space starting with an arbitrary basis of the corresponding vector space. It is absolutely fundamental:
Theorem 10.64 (Gram–Schmidt). Let $v_1,\dots,v_d$ be linearly independent vectors in a vector space $V$ over $\mathbb{R}$ (not necessarily finite dimensional), endowed with an inner product $\langle\,,\,\rangle$. Then there is a unique orthonormal family $e_1,\dots,e_d$ in $V$ with the property that for all $k \in [1,d]$ we have
$$\operatorname{Span}(e_1,\dots,e_k) = \operatorname{Span}(v_1,\dots,v_k) \quad\text{and}\quad \langle e_k,v_k\rangle > 0.$$

Proof. We will prove the theorem by induction on $d$. Let us start with the case $d = 1$. Suppose that $e_1$ is a vector satisfying the conditions imposed by the theorem. Since $e_1 \in \mathbb{R}v_1$, we can write $e_1 = \lambda v_1$ for some real number $\lambda$. Then $\langle e_1,v_1\rangle = \lambda\|v_1\|^2$ is positive, thus $\lambda > 0$. Next, $\|e_1\| = 1$, thus $|\lambda| = \frac{1}{\|v_1\|}$ and so necessarily $\lambda = \frac{1}{\|v_1\|}$ and $e_1 = \frac{v_1}{\|v_1\|}$. Conversely, this vector satisfies the desired properties, which proves the theorem when $d = 1$.
Assume now that $d \ge 2$ and that the theorem holds for $d-1$. Let $v_1,\dots,v_d$ be linearly independent vectors in $V$. By the inductive hypothesis we know that there is a unique orthonormal family $e_1,\dots,e_{d-1}$ satisfying the conditions of the theorem with respect to the family $v_1,\dots,v_{d-1}$. It suffices therefore to prove that there is a unique vector $e_d$ such that $e_1,\dots,e_d$ satisfies the conditions of the theorem with respect to $v_1,\dots,v_d$, that is such that
$$\|e_d\| = 1, \qquad \langle e_d,e_i\rangle = 0 \quad \forall\,1\le i\le d-1,$$
and
$$\operatorname{Span}(e_1,\dots,e_d) = \operatorname{Span}(v_1,\dots,v_d).$$
Assume first that $e_d$ is such a vector. Then
$$e_d \in \operatorname{Span}(e_1,\dots,e_d) = \operatorname{Span}(v_1,\dots,v_d) = \mathbb{R}v_d + \operatorname{Span}(v_1,\dots,v_{d-1}) = \mathbb{R}v_d + \operatorname{Span}(e_1,\dots,e_{d-1}).$$
Thus we can write
$$e_d = \lambda v_d + \sum_{i=1}^{d-1} a_ie_i$$
for some real numbers $\lambda, a_1,\dots,a_{d-1}$. Then for all $i \in [1,d-1]$ we have (since $e_1,\dots,e_{d-1}$ is an orthonormal family)
$$0 = \langle e_d,e_i\rangle = \lambda\langle v_d,e_i\rangle + \sum_{j=1}^{d-1} a_j\langle e_j,e_i\rangle = \lambda\langle v_d,e_i\rangle + a_i,$$
thus the $a_i = -\lambda\langle v_d,e_i\rangle$ are uniquely determined if $\lambda$ is so. Next, we have
$$e_d = \lambda\left(v_d - \sum_{i=1}^{d-1}\langle v_d,e_i\rangle e_i\right).$$
Note that $z := v_d - \sum_{i=1}^{d-1}\langle v_d,e_i\rangle e_i$ is nonzero, since otherwise $v_d \in \operatorname{Span}(e_1,\dots,e_{d-1}) = \operatorname{Span}(v_1,\dots,v_{d-1})$, contradicting the hypothesis that $v_1,\dots,v_d$ are linearly independent. Now $\|e_d\| = 1$ forces $|\lambda| = \frac{1}{\|z\|}$, and the condition $\langle e_d,v_d\rangle > 0$ shows that the sign of $\lambda$ is uniquely determined and is actually positive:
$$\langle e_d,v_d\rangle = \left\langle e_d, \frac{e_d}{\lambda} + \sum_{i=1}^{d-1}\langle v_d,e_i\rangle e_i\right\rangle = \frac{1}{\lambda}.$$
We deduce that $\lambda$ is uniquely determined by $\lambda = \frac{1}{\|z\|}$ and the uniqueness follows.
Conversely, we can define $\lambda = \frac{1}{\|z\|}$ and $e_d = \lambda z$. The previous computations show that $e_d$ satisfies all required properties; this proves the existence part and finishes the proof of the inductive step. $\square$
Remark 10.65. a) Let us try to understand the proof geometrically (i.e., let us give a less computational and more conceptual proof of the theorem). Assuming that we constructed $e_1,\dots,e_{d-1}$, we would like to understand how to construct $e_d$. This vector $e_d$ must be orthogonal to $e_1,\dots,e_{d-1}$ and it must belong to $W = \operatorname{Span}(v_1,\dots,v_d)$. It follows that $e_d$ must be in the orthogonal complement of $\operatorname{Span}(e_1,\dots,e_{d-1}) = \operatorname{Span}(v_1,\dots,v_{d-1})$ inside $W$. However, this orthogonal complement has dimension
$$\dim\operatorname{Span}(v_1,\dots,v_d) - \dim\operatorname{Span}(v_1,\dots,v_{d-1}) = d - (d-1) = 1,$$
thus $e_d$ is uniquely determined up to a scalar. Since we further want $e_d$ to be of norm 1, this pins down $e_d$ up to a sign. Finally, the condition that $\langle e_d,v_d\rangle > 0$ determines the sign uniquely and so determines $e_d$ uniquely.
b) Part a) (and the proof of the theorem also) gives the following algorithm, known as the Gram–Schmidt process, which constructs $e_1,\dots,e_d$ starting from $v_1,\dots,v_d$. Set $f_1 = v_1$ and $e_1 = \frac{f_1}{\|f_1\|}$; then, assuming that we constructed $f_1,\dots,f_{k-1}$ and $e_1,\dots,e_{k-1}$, let
$$f_k = v_k - \sum_{i=1}^{k-1}\langle v_k,e_i\rangle e_i \quad\text{and}\quad e_k = \frac{f_k}{\|f_k\|}.$$
That is, at each step we subtract from $v_k$ its orthogonal projection $\sum_{i=1}^{k-1}\langle v_k,e_i\rangle e_i$ onto $\operatorname{Span}(e_1,\dots,e_{k-1})$ and obtain in this way $f_k$. Then we normalize $f_k$ to get $e_k$. Note that in practice it can be very useful to observe that we can compute $\|f_k\|$ via
$$\|f_k\|^2 = \langle f_k,v_k\rangle.$$
This formula follows from the fact that $v_k = f_k + \sum_{i=1}^{k-1}\langle v_k,e_i\rangle e_i$ and $e_i$ is orthogonal to $f_k$ for $1 \le i \le k-1$.
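The process translates line by line into code. The following Python sketch (ours, not from the original text) runs it on the vectors of Example 10.66 below and reproduces the basis computed there by hand.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors: f_k = v_k - sum <v_k, e_i> e_i,
    then e_k = f_k / ||f_k||, exactly as in the process described above."""
    es = []
    for v in vectors:
        f = v - sum(np.dot(v, e) * e for e in es)
        es.append(f / np.linalg.norm(f))
    return es

basis = gram_schmidt([np.array([1., 1., 1.]),
                      np.array([0., 2., 1.]),
                      np.array([3., 1., 3.])])
for e in basis:
    print(e)   # (1,1,1)/sqrt(3), (-1,1,0)/sqrt(2), (-1,-1,2)/sqrt(6)
```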
Example 10.66. Let us consider the vectors
$$v_1 = (1,1,1), \qquad v_2 = (0,2,1), \qquad v_3 = (3,1,3) \in \mathbb{R}^3.$$
An easy computation shows that the determinant of the matrix whose columns are $v_1, v_2, v_3$ is nonzero, thus $v_1, v_2, v_3$ are linearly independent. Let us follow the Gram–Schmidt process and find the corresponding orthonormal basis of $\mathbb{R}^3$. We set
$$f_1 = v_1, \qquad e_1 = \frac{v_1}{\|v_1\|} = \frac{v_1}{\sqrt{3}} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right).$$
Next, set
$$f_2 = v_2 - \langle v_2,e_1\rangle e_1 = v_2 - \sqrt{3}e_1 = v_2 - (1,1,1) = (-1,1,0)$$
and
$$e_2 = \frac{f_2}{\|f_2\|} = \frac{f_2}{\sqrt{2}} = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right).$$
Finally, set
$$f_3 = v_3 - \langle v_3,e_1\rangle e_1 - \langle v_3,e_2\rangle e_2 = v_3 - \frac{7}{\sqrt{3}}e_1 + \sqrt{2}e_2 = (3,1,3) - \left(\frac73,\frac73,\frac73\right) + (-1,1,0) = \left(-\frac13,-\frac13,\frac23\right)$$
and
$$e_3 = \frac{f_3}{\|f_3\|} = \frac{1}{\sqrt{6}}(-1,-1,2).$$
Problem 10.67. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed 2, endowed with the inner product defined by
$$\langle P,Q\rangle = \int_{-1}^1 P(x)Q(x)\,dx.$$
Find the orthonormal basis of $V$ obtained by applying the Gram–Schmidt process to the basis $1, X, X^2$ of $V$.

Solution. We start with $v_1 = 1$, $v_2 = X$ and $v_3 = X^2$ and apply the Gram–Schmidt process. We obtain
$$\|v_1\| = \sqrt{2}, \qquad e_1 = \frac{1}{\sqrt{2}},$$
then
$$f_2 = v_2 - \left\langle v_2, \frac{1}{\sqrt{2}}\right\rangle\frac{1}{\sqrt{2}} = X - \frac12\int_{-1}^1 x\,dx = X$$
and
$$\|f_2\|^2 = \langle f_2,v_2\rangle = \int_{-1}^1 x^2\,dx = \frac23,$$
thus
$$e_2 = \frac{f_2}{\|f_2\|} = \sqrt{\frac32}\,X.$$
Finally,
$$f_3 = v_3 - \left\langle v_3, \frac{1}{\sqrt{2}}\right\rangle\frac{1}{\sqrt{2}} - \left\langle v_3, \sqrt{\frac32}\,X\right\rangle\sqrt{\frac32}\,X = X^2 - \frac12\int_{-1}^1 x^2\,dx - \left(\frac32\int_{-1}^1 x^3\,dx\right)X = X^2 - \frac13$$
and
$$\|f_3\|^2 = \langle f_3,v_3\rangle = \int_{-1}^1 x^2\left(x^2 - \frac13\right)dx = \frac{8}{45}.$$
Hence
$$e_3 = \frac{f_3}{\|f_3\|} = \frac{3X^2-1}{2}\sqrt{\frac52}.$$
Hence the answer is
$$\frac{1}{\sqrt{2}}, \qquad \sqrt{\frac32}\,X, \qquad \frac{3X^2-1}{2}\sqrt{\frac52}. \qquad\square$$
The following problem is a generalization of the previous one. It is much more challenging and represents an introduction to the beautiful theory of orthogonal polynomials.

Problem 10.68 (Legendre's Polynomials). Let $n \ge 1$ and let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$, endowed with the inner product
$$\langle P,Q\rangle = \int_{-1}^1 P(x)Q(x)\,dx.$$
Let $L_n$ be the $n$th derivative of $(X^2-1)^n$.
a) Prove that $L_0,\dots,L_n$ is an orthogonal basis of $V$.
b) Compute $\|L_k\|$.
c) What is the orthonormal basis of $V$ obtained by applying the Gram–Schmidt process to the canonical basis $1, X, \dots, X^n$ of $V$?

Solution. For $j \in [0,n]$ let $P_j = (X^2-1)^j$ and note that 1 and $-1$ are roots with multiplicity $j$ of $P_j$. It follows that for $k \in [0,j]$, 1 and $-1$ are roots with multiplicity $j-k$ of $P_j^{(k)}$ (the $k$th derivative of $P_j$). If $P \in V$, we deduce from this observation and integration by parts that for $j \ge 1$
$$\langle L_j,P\rangle = \int_{-1}^1 P_j^{(j)}(x)P(x)\,dx = \left[P_j^{(j-1)}(x)P(x)\right]_{-1}^1 - \int_{-1}^1 P_j^{(j-1)}(x)P'(x)\,dx = -\int_{-1}^1 P_j^{(j-1)}(x)P'(x)\,dx,$$
and repeating this argument gives
$$\langle L_j,P\rangle = (-1)^k\int_{-1}^1 P_j^{(j-k)}(x)P^{(k)}(x)\,dx$$
for $k \in [0,j]$ and $P \in V$. Taking $j = k$ yields the fundamental relation
$$\langle L_k,P\rangle = (-1)^k\int_{-1}^1 (x^2-1)^kP^{(k)}(x)\,dx = \int_{-1}^1 (1-x^2)^kP^{(k)}(x)\,dx \tag{10.5}$$
for $k \in [0,n]$ and $P \in V$.
It is now rather easy to deal with the problem.
a) For $j < k$ we have by relation (10.5)
$$\langle L_k,L_j\rangle = \int_{-1}^1 (1-x^2)^kL_j^{(k)}(x)\,dx.$$
Since $\deg L_j = j < k$, we have $L_j^{(k)} = 0$ and so $\langle L_k,L_j\rangle = 0$, proving that $L_0,\dots,L_n$ is an orthogonal family.
b) By definition $L_n$ has degree $n$ and
$$L_n^{(n)}(X) = ((X^2-1)^n)^{(2n)} = (X^{2n})^{(2n)} = 2n(2n-1)\cdots 1 = (2n)!.$$
We deduce from relation (10.5) that
$$\langle L_n,L_n\rangle = (2n)!\int_{-1}^1 (1-x^2)^n\,dx = 2(2n)!\int_0^1 (1-x^2)^n\,dx.$$
Let
$$I_n = \int_0^1 (1-x^2)^n\,dx$$
and observe that an integration by parts yields
$$I_n = \left[x(1-x^2)^n\right]_0^1 + 2n\int_0^1 x^2(1-x^2)^{n-1}\,dx = 2n\int_0^1 (1-(1-x^2))(1-x^2)^{n-1}\,dx = 2n(I_{n-1} - I_n),$$
thus
$$(2n+1)I_n = 2nI_{n-1}.$$
Taking into account that $I_0 = 1$ we obtain
$$I_n = \prod_{i=1}^n \frac{I_i}{I_{i-1}} = \prod_{i=1}^n \frac{2i}{2i+1} = \frac{2^n n!}{1\cdot 3\cdots(2n+1)} = \frac{4^n n!^2}{(2n+1)!}.$$
Finally
$$\|L_n\|^2 = \langle L_n,L_n\rangle = 2(2n)!\,I_n = \frac{2^{2n+1}n!^2}{2n+1}$$
and
$$\|L_n\| = \sqrt{\frac{2}{2n+1}}\,2^n n!.$$
c) Let $Q_k = \frac{L_k}{\|L_k\|}$. Then by part a) the family $Q_0,\dots,Q_n$ is orthonormal in $V$ and since $\dim V = n+1$, it follows that $Q_0,\dots,Q_n$ is an orthonormal basis of $V$. Moreover, we have $\deg Q_k = \deg L_k = k$ for $k \in [0,n]$, which easily implies that
$$\operatorname{Span}(Q_0,\dots,Q_k) = \operatorname{Span}(X^0,\dots,X^k)$$
for $k \in [0,n]$. Finally, by relation (10.5),
$$\langle X^k,Q_k\rangle = \frac{\langle L_k,X^k\rangle}{\|L_k\|} = \frac{\int_{-1}^1 (1-x^2)^k(X^k)^{(k)}\,dx}{\|L_k\|} = \frac{k!\int_{-1}^1 (1-x^2)^k\,dx}{\|L_k\|} > 0,$$
since $(X^k)^{(k)} = k!$ is a positive constant. We conclude that $Q_0,\dots,Q_n$ is obtained from $1, X, \dots, X^n$ by applying the Gram–Schmidt process. $\square$
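The orthogonality and the value of $\|L_n\|^2$ are easy to confirm with exact polynomial arithmetic. Here is a short sketch of ours using numpy.polynomial (an illustration, not part of the original text):

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

def L(n):
    """Coefficient array of L_n, the n-th derivative of (X^2 - 1)^n."""
    q = np.array([1.0])
    for _ in range(n):
        q = P.polymul(q, np.array([-1.0, 0.0, 1.0]))   # multiply by X^2 - 1
    return P.polyder(q, n)

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]."""
    r = P.polyint(P.polymul(p, q))
    return P.polyval(1.0, r) - P.polyval(-1.0, r)

print(inner(L(2), L(3)))    # ~0: the L_k are pairwise orthogonal
n = 3
print(inner(L(n), L(n)), 2**(2*n + 1) * math.factorial(n)**2 / (2*n + 1))
```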
10.5.1 Problems for Practice
1. Apply the Gram–Schmidt algorithm to the vectors
$$v_1 = (1,2,2), \qquad v_2 = (0,1,2), \qquad v_3 = (1,3,1).$$
2. Consider the vector space $V$ of polynomials with real coefficients and degree not exceeding 2, endowed with the inner product defined by
$$\langle P,Q\rangle = \int_0^1 xP(x)Q(x)\,dx.$$
Apply the Gram–Schmidt algorithm to the vectors $1, X, X^2$.
3. Consider the map $\langle\,,\,\rangle : \mathbb{R}^3\times\mathbb{R}^3 \to \mathbb{R}$ defined by
$$\langle(x_1,x_2,x_3),(y_1,y_2,y_3)\rangle = (x_1+x_2+x_3)(y_1+y_2+y_3) + (x_2+x_3)(y_2+y_3) + x_3y_3.$$
a) Check that this defines an inner product on $\mathbb{R}^3$.
b) Applying the Gram–Schmidt algorithm to the canonical basis of $\mathbb{R}^3$, give an orthonormal basis for $\mathbb{R}^3$ endowed with this inner product.
4. Find an orthogonal basis of $\mathbb{R}^4$ containing the vector $(1,2,1,2)$.
5. (The QR factorization) Let $A \in M_{m,n}(\mathbb{R})$ be a matrix with linearly independent columns $C_1,\dots,C_n$. Let $W$ be the span of $C_1,\dots,C_n$, a subspace of $\mathbb{R}^m$.
a) Prove that there is a matrix $Q \in M_{m,n}(\mathbb{R})$ whose columns are an orthonormal basis of $W$, and there is an upper-triangular matrix $R \in M_n(\mathbb{R})$ with positive diagonal entries such that
$$A = QR.$$
Hint: the columns of $Q$ are the result of applying the Gram–Schmidt process to the columns of $A$ (a numerical sketch follows this problem list).
b) Prove that the factorization $A = QR$ with $Q, R$ matrices as in part a) is unique.
6. Using the Gram–Schmidt process, find the QR factorization of the matrix
$$A = \begin{pmatrix} 2 & 3 & 5\\ 0 & 4 & 6\\ 0 & 0 & 7\end{pmatrix}.$$
7. Find the QR factorization of the matrix
$$A = \begin{pmatrix} 1 & 2\\ 2 & 1\\ 1 & 3\end{pmatrix}.$$
8. Describe the QR factorization of an upper-triangular matrix $A \in M_n(\mathbb{R})$.
9. If $f : \mathbb{R} \to \mathbb{R}$ is a continuous $2\pi$-periodic function, we denote
$$c_n(f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx.$$
a) Prove that if $f$ is continuously differentiable, then for all $n \in \mathbb{Z}$ we have
$$c_n(f') = in\cdot c_n(f).$$
b) Deduce that under the assumptions of a) we have
$$\lim_{n\to\infty} nc_n(f) = 0 \quad\text{and}\quad \sum_{n\in\mathbb{Z}} n^2|c_n(f)|^2 < \infty.$$
Hint: use the Riemann–Lebesgue theorem and Bessel's inequality, as well as part a).
c) Prove that if $f, g : \mathbb{R} \to \mathbb{R}$ are continuous $2\pi$-periodic maps such that $c_n(f) = c_n(g)$ for all $n \in \mathbb{Z}$, then $f = g$. Hint: use Plancherel's theorem for the function $f - g$.
10. Consider the $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{R}$ such that $f(0) = f(\pi) = 0$, $f(t) = 1$ for $t \in (0,\pi)$ and $f$ is odd, i.e., $f(-x) = -f(x)$ for all $x$.
a) Explain why such a map exists, plot its graph and show that it is piecewise continuous.
b) Compute its Fourier coefficients $a_m(f)$ and $b_m(f)$ for all $m \ge 0$.
c) Using Plancherel's theorem, deduce Euler's famous identity
$$\sum_{n\ge 0} \frac{1}{(2n+1)^2} = \frac{\pi^2}{8}.$$
d) Deduce from part c) the even more famous Euler identity
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
11. Consider the $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{R}$ such that $f(t) = t^2$ for $t \in [-\pi,\pi]$.
a) Compute the Fourier coefficients of $f$.
b) Using Plancherel's theorem, prove the following identity:
$$\sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}.$$
12. Let $E$ be a Euclidean space, let $e_1,\dots,e_n$ be an orthonormal basis of $E$ and let $T$ be a linear transformation on $E$. Prove that
$$\operatorname{Tr}(T) = \sum_{i=1}^n \langle T(e_i),e_i\rangle.$$
13. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$ such that $\operatorname{Tr}(T) = 0$. Let $e_1,\dots,e_n$ be an orthonormal basis of $V$.
a) Prove that one can find $i, j \in \{1,2,\dots,n\}$ such that $\langle T(e_i),e_i\rangle$ and $\langle T(e_j),e_j\rangle$ have opposite signs. Hint: use Problem 12.
b) Check that the map $f : [0,1] \to \mathbb{R}$ defined by
$$f(t) = \langle T(te_i + (1-t)e_j), te_i + (1-t)e_j\rangle$$
is continuous and that $f(0)f(1) \le 0$.
c) Conclude that there is a nonzero vector $x \in V$ such that
$$\langle T(x),x\rangle = 0.$$
d) Finally, prove by induction on $n$ that there is an orthogonal basis of $V$ in which the diagonal entries of the matrix of $T$ are all equal to 0.
14. Let $V$ be a Euclidean space of dimension $n$, let $e_1,\dots,e_n$ be an orthonormal basis of $V$ and let $T : V \to V$ be an orthogonal projection. Show that
$$\operatorname{rank}(T) = \sum_{i=1}^n \|T(e_i)\|^2.$$
15. Let $V$ be a Euclidean space of dimension $n$ and let $e_1,\dots,e_n$ be nonzero vectors in $V$ such that for all $x \in V$ we have
$$\sum_{k=1}^n \langle e_k,x\rangle^2 = \|x\|^2.$$
a) Compute the orthogonal of $\operatorname{Span}(e_1,\dots,e_n)$ and deduce that $e_1,\dots,e_n$ is a basis of $V$.
b) By choosing $x = e_i$, prove that $\|e_i\| \le 1$ for all $1 \le i \le n$.
c) By choosing $x \in \operatorname{Span}(e_1,\dots,e_{i-1},e_{i+1},\dots,e_n)^\perp$, prove that $\|e_i\| = 1$ for all $1 \le i \le n$.
d) Conclude that $e_1,\dots,e_n$ is an orthonormal basis of $V$.
16. (Hermite's polynomials) Let $n$ be a positive integer and let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$, endowed with
$$\langle P,Q\rangle = \int_0^\infty P(t)Q(t)e^{-t}\,dt.$$
a) Explain why $\langle\,,\,\rangle$ is well defined and an inner product on $V$.
b) Define $h_k = (X^ke^{-X})^{(k)}e^X$ for $k \ge 0$. What are the coefficients of $h_k$?
c) Prove that for all $k \in [0,n]$ and all $P \in V$ we have
$$\langle P,h_k\rangle = (-1)^k\int_0^\infty P^{(k)}(t)\,t^ke^{-t}\,dt.$$
d) Prove that $h_0,\dots,h_n$ is an orthogonal basis of $V$.
e) Prove that $\|h_k\| = k!$ for $k \in [0,n]$.
17. (Chebyshev's polynomials) Let $n$ be a positive integer and let $V$ be the space of real polynomials with degree not exceeding $n$, endowed with
$$\langle P,Q\rangle = \int_{-1}^1 \frac{P(t)Q(t)}{\sqrt{1-t^2}}\,dt.$$
a) Explain why $\langle\,,\,\rangle$ makes sense and defines an inner product on $V$.
b) Prove that for each $k \ge 0$ there is a unique polynomial $T_k$ (the $k$th Chebyshev polynomial) such that $T_k(\cos x) = \cos kx$ for all $x \in \mathbb{R}$.
c) Prove that $T_0,\dots,T_n$ is an orthogonal basis of $V$.
d) Find $\|T_k\|$ for $k \in [0,n]$.
18. (Cross-product) Let $V$ be a Euclidean space of dimension $n \ge 3$ and let $(e_1,\dots,e_n)$ be a fixed orthonormal basis. If $v_1,\dots,v_n \in V$, write $\det(v_1,\dots,v_n)$ instead of $\det_{(e_1,\dots,e_n)}(v_1,\dots,v_n)$.
a) Let $v_1,\dots,v_{n-1} \in V$. Prove the existence of a unique vector $v_1\wedge\dots\wedge v_{n-1} \in V$ such that for all $v \in V$
$$\det(v_1,\dots,v_{n-1},v) = \langle v, v_1\wedge\dots\wedge v_{n-1}\rangle. \tag{10.6}$$
We call this vector $v_1\wedge\dots\wedge v_{n-1}$ the cross-product of $v_1,\dots,v_{n-1}$.
b) Prove that $v_1\wedge\dots\wedge v_{n-1}$ is orthogonal to $v_1,\dots,v_{n-1}$.
c) Prove that $v_1,\dots,v_{n-1}$ are linearly dependent if and only if
$$v_1\wedge\dots\wedge v_{n-1} = 0.$$
d) Let $v_j = \sum_{i=1}^n a_{ij}e_i$. By choosing $v = e_i$ in (10.6) prove that
$$v_1\wedge\dots\wedge v_{n-1} = \sum_{i=1}^n (-1)^{n-i}\Delta_ie_i,$$
where $\Delta_i$ is the determinant of the $(n-1)\times(n-1)$ matrix obtained from $[a_{ij}]$ (i.e., the matrix whose columns are $v_1,\dots,v_{n-1}$) by deleting the $i$th row. In particular, if $n = 3$ check that
$$\begin{pmatrix}u_1\\u_2\\u_3\end{pmatrix}\wedge\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = \begin{pmatrix}u_2v_3-u_3v_2\\u_3v_1-u_1v_3\\u_1v_2-u_2v_1\end{pmatrix}.$$
e) Prove that if $f_1,\dots,f_n$ is an orthonormal basis of $V$, then $\det(f_1,\dots,f_n) \in \{-1,1\}$. We say that $f_1,\dots,f_n$ is positive or positively oriented (with respect to $(e_1,\dots,e_n)$) if $\det(f_1,\dots,f_n) = 1$.
f) Prove that if $v_1,\dots,v_{n-1}$ is an orthonormal family, then $v_1,\dots,v_{n-1},v_1\wedge\dots\wedge v_{n-1}$ is a positive orthonormal basis.
19. Let $v_1,\dots,v_{n-1}$ be linearly independent vectors in a Euclidean space $V$ of dimension $n \ge 3$. Let $H$ be the hyperplane spanned by $v_1,\dots,v_{n-1}$.
a) Prove that for all $v \in V$ we have
$$p_H(v) = v - \frac{\langle v_1\wedge\dots\wedge v_{n-1}, v\rangle}{\|v_1\wedge\dots\wedge v_{n-1}\|^2}(v_1\wedge\dots\wedge v_{n-1})$$
and
$$d(v,H) = \frac{|\langle v, v_1\wedge\dots\wedge v_{n-1}\rangle|}{\|v_1\wedge\dots\wedge v_{n-1}\|}.$$
b) Prove that
$$H = \{v \in V \mid \langle v, v_1\wedge\dots\wedge v_{n-1}\rangle = 0\}.$$
20. In this problem $V$ is a Euclidean space of dimension 3.
a) (Lagrange's formula) Prove that for all $x, y \in V$ we have
$$\langle x,y\rangle^2 + \|x\wedge y\|^2 = \|x\|^2\|y\|^2.$$
b) Prove that if $\theta$ is the angle between $x$ and $y$, then
$$\|x\wedge y\| = \|x\|\cdot\|y\|\cdot|\sin\theta|.$$
21. This exercise develops the theory of orthogonal bases over the complex numbers. Let $V$ be a finite dimensional vector space over $\mathbb{C}$, endowed with a hermitian inner product $\langle\,,\,\rangle$, i.e., a hermitian sesquilinear form $\langle\,,\,\rangle : V\times V \to \mathbb{C}$ such that $\langle x,x\rangle > 0$ for all nonzero vectors $x \in V$. Such a space is called a hermitian space. Two vectors $x, y \in V$ are orthogonal if $\langle x,y\rangle = 0$. Starting with this definition, one defines the notion of orthogonal/orthonormal family and orthogonal/orthonormal basis as in the case of vector spaces over $\mathbb{R}$.
a) Prove that an orthogonal family consisting of nonzero vectors is linearly independent, and deduce that if $\dim V = n$, then an orthonormal family consisting of $n$ vectors is an orthonormal basis of $V$.
b) Let $e_1,\dots,e_n$ be an orthonormal basis of $V$ and let $x = x_1e_1 + \dots + x_ne_n$ and $y = y_1e_1 + \dots + y_ne_n$ be two vectors in $V$. Prove that
$$\langle x,y\rangle = \overline{x_1}y_1 + \dots + \overline{x_n}y_n \quad\text{and}\quad \|x\|^2 = |x_1|^2 + \dots + |x_n|^2.$$
c) State and prove a version of the Gram–Schmidt process in this context.
d) Prove that there is an orthonormal basis of $V$.
e) Prove that any orthonormal family in $V$ can be completed to an orthonormal basis of $V$.
f) Let $W$ be a subspace of $V$ and let $w_1,\dots,w_k$ be an orthonormal basis of $W$.
i) Prove that $W \oplus W^\perp = V$ and $(W^\perp)^\perp = W$.
ii) The orthogonal projection $p_W$ of $V$ onto $W$ is the projection of $V$ onto $W$ along $W^\perp$. Prove that for all $v \in V$
$$p_W(v) = \sum_{i=1}^k \langle w_i,v\rangle w_i \quad\text{and}\quad \|v - p_W(v)\| = \min_{w\in W}\|v - w\|.$$
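As promised in Problem 5, here is a numerical sketch of the QR factorization via the Gram–Schmidt process (our illustration, not part of the original text); run on the matrix of Problem 6 it recovers $A = QR$ with orthonormal columns in $Q$ and a positive upper-triangular $R$.

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization of a matrix with linearly independent columns,
    computed column by column with the Gram-Schmidt process."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        f = A[:, k].copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]   # <C_k, e_i>
            f -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(f)       # ||f_k|| > 0 by independence
        Q[:, k] = f / R[k, k]
    return Q, R

A = np.array([[2., 3., 5.],
              [0., 4., 6.],
              [0., 0., 7.]])              # the matrix of Problem 6
Q, R = qr_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))
```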
10.6 The Adjoint of a Linear Transformation
Let $(V, \langle\,,\,\rangle)$ be a Euclidean space (the condition that $V$ is finite dimensional will be crucial in this section, so we insist on it). Let $T : V \to V$ be a linear transformation. For all $y \in V$, the map $x \mapsto \langle T(x),y\rangle$ is a linear form on $V$. It follows from Theorem 10.37 that there is a unique vector $T^*(y) \in V$ such that
$$\langle T(x),y\rangle = \langle T^*(y),x\rangle = \langle x,T^*(y)\rangle$$
for all $x \in V$. We obtain in this way a map $T^* : V \to V$, uniquely characterized by the condition
$$\langle T(x),y\rangle = \langle x,T^*(y)\rangle$$
for all $x, y \in V$. It is easy to see that $T^*$ is itself linear and we call $T^*$ the adjoint of $T$. All in all, we obtain the following:

Theorem 10.69. Let $(V, \langle\,,\,\rangle)$ be a Euclidean space. For each linear transformation $T : V \to V$ there is a unique linear transformation $T^* : V \to V$, called the adjoint of $T$, such that for all $x, y \in V$
$$\langle T(x),y\rangle = \langle x,T^*(y)\rangle.$$
As the following problem shows, the previous result fails rather badly if we don't assume that $V$ is finite dimensional:

Problem 10.70. Let $V$ be the space of continuous real-valued maps on $[0,1]$, endowed with the inner product
$$\langle f,g\rangle = \int_0^1 f(t)g(t)\,dt.$$
Prove that the linear transformation $T$ sending $f$ to the constant map equal to $f(0)$ has no adjoint.

Solution. Suppose that $T$ has some adjoint $T^*$. Let $W = \ker T$, that is the subspace of maps $f$ with $f(0) = 0$. Fix $g \in V$. Since
$$\langle T(f),g\rangle = \langle f,T^*(g)\rangle$$
for all $f, g \in V$, we deduce that $\langle T^*(g),f\rangle = 0$ for all $f \in W$. Applying this to the function $f$ given by $x \mapsto xT^*(g)(x)$, which is in $W$, we conclude that
$$\langle T^*(g),f\rangle = \int_0^1 x(T^*(g)(x))^2\,dx = 0.$$
Since $x \mapsto x(T^*(g)(x))^2$ is continuous, nonnegative, and with average equal to 0, it is the zero map, thus $T^*(g)$ vanishes on $(0,1]$ and then on $[0,1]$ by continuity. We conclude that $T^*(g) = 0$ for all $g \in V$, thus $\langle T(f),g\rangle = 0$ for all $f, g \in V$ and finally $T(f) = 0$ for all $f \in V$. Since this is clearly absurd, the problem is solved. $\square$

Note that for all $x, y \in V$ we have
$$\langle y,T(x)\rangle = \langle T(x),y\rangle = \langle x,T^*(y)\rangle = \langle T^*(y),x\rangle = \langle y,(T^*)^*(x)\rangle.$$
It follows that $T(x) - (T^*)^*(x) = 0$ and so
$$(T^*)^* = T,$$
which we write as
$$T^{**} = T.$$
Thus the map $T \mapsto T^*$ is an involution of the space of linear transformations on $V$. The fixed points of this involution are called symmetric or self-adjoint linear transformations. They will play a fundamental role in this chapter. More precisely, we introduce the following definitions:

Definition 10.71. Let $(V, \langle\,,\,\rangle)$ be a Euclidean space. A linear transformation $T : V \to V$ is called symmetric or self-adjoint if $T^* = T$, and alternating or skew-symmetric if $T^* = -T$.

In the next problems the reader will have the opportunity to find quite a few different characterizations and/or properties of self-adjoint and alternating linear transformations.

Problem 10.72. Let $(V, \langle\,,\,\rangle)$ be a Euclidean space and let $T : V \to V$ be a linear transformation. Let $e_1,\dots,e_n$ be an orthonormal basis and let $A$ be the matrix of $T$ with respect to $e_1,\dots,e_n$. Prove that the matrix of $T^*$ with respect to $e_1,\dots,e_n$ is ${}^tA$. Thus $T$ is symmetric if and only if $A$ is symmetric, and $T$ is alternating if and only if $A$ is skew-symmetric (be careful: the hypothesis that $e_1,\dots,e_n$ is an orthonormal basis, not just any basis, is essential!).
Solution. Let $B = [b_{ij}]$ be the matrix of $T^*$ with respect to $e_1,\dots,e_n$, thus for all $i \in [1,n]$ we have
$$T^*(e_i) = \sum_{k=1}^n b_{ki}e_k.$$
Since
$$\langle T(e_i),e_j\rangle = \langle e_i,T^*(e_j)\rangle$$
and $T(e_i) = \sum_{k=1}^n a_{ki}e_k$, and since the basis is orthonormal, we obtain
$$\langle T(e_i),e_j\rangle = \sum_{k=1}^n a_{ki}\langle e_k,e_j\rangle = a_{ji}$$
and
$$\langle e_i,T^*(e_j)\rangle = \sum_{k=1}^n b_{kj}\langle e_i,e_k\rangle = b_{ij}.$$
We conclude that $b_{ij} = a_{ji}$ for all $i, j \in [1,n]$, and the result follows. $\square$
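Numerically, this is the familiar identity $\langle Ax, y\rangle = \langle x, {}^tAy\rangle$ for the standard inner product on $\mathbb{R}^n$; a two-line NumPy check of ours (not from the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))   # matrix of T in an orthonormal basis
x, y = rng.standard_normal(4), rng.standard_normal(4)

# <T(x), y> equals <x, T*(y)>, with T* represented by the transpose:
print(np.isclose((A @ x) @ y, x @ (A.T @ y)))   # True
```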
Problem 10.73. Prove that any two distinct eigenspaces of a symmetric linear transformation are orthogonal.

Solution. Let $\lambda_1, \lambda_2$ be different eigenvalues of $T$ and let $x, y$ be nonzero vectors in $V$ such that $T(x) = \lambda_1x$ and $T(y) = \lambda_2y$. Since $T$ is symmetric we have
$$\langle T(x),y\rangle = \langle x,T(y)\rangle.$$
The left-hand side equals $\lambda_1\langle x,y\rangle$, while the right-hand side equals $\lambda_2\langle x,y\rangle$. Since $\lambda_1 \ne \lambda_2$, it follows that $\langle x,y\rangle = 0$ and the result follows. $\square$
Problem 10.74. Let $n$ be a positive integer and let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$, endowed with the inner product defined by
$$\langle P,Q\rangle = \int_{-1}^1 P(x)Q(x)\,dx.$$
Prove that the linear transformation $T : V \to V$ sending $P$ to $2XP'(X) + (X^2-1)P''(X)$ is symmetric.

Solution. Since the maps $P \mapsto P'$ and $P \mapsto P''$ are linear, it follows that $T$ is a linear transformation (note that $T(P)$ belongs to $V$ since $\deg XP' \le \deg P \le n$ and $\deg(X^2-1)P'' \le \deg P \le n$). In order to prove that $T$ is symmetric, we need to prove that
$$\langle T(P),Q\rangle = \langle P,T(Q)\rangle$$
for all polynomials $P, Q \in V$. Note that using the product rule for derivatives, we can write
$$T(P) = ((X^2-1)P')'.$$
Hence integration by parts gives
$$\langle T(P),Q\rangle = \int_{-1}^1 ((x^2-1)P'(x))'Q(x)\,dx = \int_{-1}^1 (1-x^2)P'(x)Q'(x)\,dx.$$
Note that this last expression is symmetric in $P$ and $Q$, so it also equals $\langle T(Q),P\rangle = \langle P,T(Q)\rangle$. Thus $T$ is symmetric. $\square$
Problem 10.75. Let $V$ be a Euclidean space and let $T : V \to V$ be a linear transformation.
a) Prove that $T$ is alternating if and only if $\langle T(x),x\rangle = 0$ for all $x \in V$.
b) Prove that if this is the case, then the only possible real root of the characteristic polynomial of $T$ is 0.

Solution. a) Suppose that $T$ is alternating, thus $T + T^* = 0$. Then for all $x \in V$ we have
$$\langle T(x),x\rangle = \langle x,T^*(x)\rangle = -\langle x,T(x)\rangle = -\langle T(x),x\rangle,$$
thus $\langle T(x),x\rangle = 0$.
Conversely, suppose that $\langle T(x),x\rangle = 0$ for all $x \in V$. Then for all $x, y \in V$ we have
$$0 = \langle T(x+y),x+y\rangle = \langle T(x)+T(y),x+y\rangle = \langle T(x),x\rangle + \langle T(x),y\rangle + \langle T(y),x\rangle + \langle T(y),y\rangle = \langle T(x),y\rangle + \langle T^*(x),y\rangle = \langle(T+T^*)(x),y\rangle.$$
Thus $(T+T^*)(x)$ is orthogonal to $V$ and thus it equals 0, and this holds for all $x \in V$. It follows that $T$ is alternating.
b) Suppose that $\lambda$ is a real root of the characteristic polynomial of $T$. Then there is a nonzero vector $x \in V$ such that $T(x) = \lambda x$. Then
$$\lambda\|x\|^2 = \langle\lambda x,x\rangle = \langle T(x),x\rangle = 0,$$
and so $\lambda = 0$. $\square$
Problem 10.76. Let $V$ be a Euclidean space and let $e_1,\dots,e_n$ be a basis of $V$. Prove that the map $T : V \to V$ defined by
$$T(x) = \sum_{k=1}^n \langle e_k,x\rangle e_k$$
is a symmetric linear transformation on $V$. Is $T$ positive? Is it positive definite?

Solution. Note that $x \mapsto \langle e_k,x\rangle$ is a linear map for all $1 \le k \le n$ (by definition of an inner product). It follows that $T$ itself is a linear transformation of $V$. In order to check that $T$ is symmetric, we need to prove that
$$\langle T(x),y\rangle = \langle x,T(y)\rangle$$
for all $x, y \in V$. Using the bilinearity of $\langle\,,\,\rangle$, we obtain
$$\langle T(x),y\rangle = \left\langle\sum_{k=1}^n \langle e_k,x\rangle e_k, y\right\rangle = \sum_{k=1}^n \langle e_k,x\rangle\langle e_k,y\rangle.$$
A similar computation yields
$$\langle x,T(y)\rangle = \left\langle x, \sum_{k=1}^n \langle e_k,y\rangle e_k\right\rangle = \sum_{k=1}^n \langle e_k,x\rangle\langle e_k,y\rangle,$$
establishing the desired equality and proving that $T$ is symmetric.
Notice that the previous computations also give
$$\langle T(x),x\rangle = \sum_{k=1}^n \langle e_k,x\rangle^2.$$
The last sum is nonnegative since it is a sum of squares of real numbers. It follows that $T$ is positive. Moreover, if $\langle T(x),x\rangle = 0$, then the previous argument yields $\langle e_k,x\rangle = 0$ for all $1 \le k \le n$. Thus $x$ is orthogonal to $\operatorname{Span}(e_1,\dots,e_n) = V$ and so $x = 0$. It follows that $T$ is positive definite. $\square$
Problem 10.77. Let $T$ be a linear transformation on a Euclidean space $V$. Prove that the following statements are equivalent:
a) For all $x \in V$ we have $\|T(x)\| = \|T^*(x)\|$.
b) For all $x, y \in V$ we have $\langle T(x),T(y)\rangle = \langle T^*(x),T^*(y)\rangle$.
c) $T$ and $T^*$ commute.
Such a linear transformation $T$ is called normal. Note that symmetric as well as alternating linear transformations are normal.

Solution. Suppose that a) holds. Using the polarization identity twice and the linearity of $T$ and $T^*$, we obtain
$$\langle T(x),T(y)\rangle = \frac{\|T(x+y)\|^2 - \|T(x)\|^2 - \|T(y)\|^2}{2} = \frac{\|T^*(x+y)\|^2 - \|T^*(x)\|^2 - \|T^*(y)\|^2}{2} = \langle T^*(x),T^*(y)\rangle.$$
Thus b) holds.
Suppose now that b) holds. For all $x, y \in V$ we have
$$\langle(T^*\circ T - T\circ T^*)(x),y\rangle = \langle T^*(T(x)),y\rangle - \langle T(T^*(x)),y\rangle = \langle T(x),T(y)\rangle - \langle T^*(x),T^*(y)\rangle = 0.$$
Thus $(T^*\circ T - T\circ T^*)(x) = 0$ for all $x \in V$, that is $T$ and $T^*$ commute and so c) holds.
Finally, suppose that c) holds. Then
$$\|T(x)\|^2 = \langle T(x),T(x)\rangle = \langle x,T^*(T(x))\rangle = \langle x,T(T^*(x))\rangle = \langle T^*(x),T^*(x)\rangle = \|T^*(x)\|^2,$$
thus $\|T(x)\| = \|T^*(x)\|$ for all $x \in V$ and so a) holds. The problem is solved. $\square$
Problem 10.78. Let $T$ be a normal linear transformation on a Euclidean space $V$. Prove that if $V_1$ is a subspace of $V$ which is stable under $T$, then $V_1^\perp$ is also stable under $T$.

Solution. The result is clear if $V_1 = 0$ or $V_1 = V$, so assume that this is not the case. Choose an orthonormal basis $e_1,\dots,e_n$ of $V$ obtained by patching an orthonormal basis of $V_1$ and an orthonormal basis of $V_1^\perp$. Since $V_1$ is stable under $T$, the matrix of $T$ with respect to $e_1,\dots,e_n$ is of the form $M = \begin{pmatrix} A & B\\ 0 & C\end{pmatrix}$ for some matrices $A, B, C$. Since $T$ and $T^*$ commute, we have
$$\begin{pmatrix} A & B\\ 0 & C\end{pmatrix}\begin{pmatrix} {}^tA & 0\\ {}^tB & {}^tC\end{pmatrix} = \begin{pmatrix} {}^tA & 0\\ {}^tB & {}^tC\end{pmatrix}\begin{pmatrix} A & B\\ 0 & C\end{pmatrix}.$$
In particular, we must have $C\,{}^tC = {}^tBB + {}^tCC$. Thus
$$\operatorname{Tr}({}^tBB) = \operatorname{Tr}(C\,{}^tC) - \operatorname{Tr}({}^tCC) = 0,$$
which can be written as $\sum_{i,j} b_{ij}^2 = 0$, where $B = [b_{ij}]$. We deduce that $b_{ij} = 0$ for all $i, j$, that is $B = 0$. But then it is clear that $V_1^\perp$ is stable under $T$. $\square$
10.6.1 Problems for Practice

1. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$. Prove that $\ker T^*\circ T = \ker T$. Hint: if $x \in \ker T^*\circ T$, compute $\|T(x)\|^2$.
2. Let $T$ be a symmetric linear transformation of a Euclidean space $V$. Prove that $V = \mathrm{Im}(T) \oplus \ker T$ and that $\mathrm{Im}(T)$ and $\ker T$ are orthogonal.
3. Prove that if $T$ is a normal endomorphism of a Euclidean space $V$, then
\[
\ker T = \ker T^*.
\]
4. Prove that if $T$ is a linear transformation on a Euclidean space $V$, then
\[
\det T = \det T^*.
\]
5. Prove that if $T$ is a linear transformation on a Euclidean space $V$, then
\[
\ker(T^*) = \mathrm{Im}(T)^\perp, \qquad \mathrm{Im}(T^*) = (\ker T)^\perp.
\]
6. Let $V$ be a Euclidean space and let $v \in V$ be a vector with $\|v\| = 1$. Prove that if $k$ is a real number, then the map
\[
T: V \to V, \qquad T(x) = x + k\langle x, v\rangle v
\]
is a symmetric linear transformation on $V$.
7. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$ and consider the map $\langle\,,\,\rangle: V \times V \to \mathbb{R}$ defined by
\[
\langle P, Q\rangle = \int_{-1}^{1} \sqrt{\frac{1-x}{1+x}}\, P(x)Q(x)\,dx.
\]
a) Explain why $\langle\,,\,\rangle$ is well defined and an inner product on $V$.
b) Prove that the map $T: V \to V$ defined by
\[
T(P(X)) = (X^2 - 1)P''(X) + (2X + 1)P'(X)
\]
is a self-adjoint linear transformation on $V$.
8. Prove that if $a, b$ are real numbers, then the linear transformation
\[
T: \mathbb{R}^2 \to \mathbb{R}^2, \qquad T(x, y) = (ax + by,\ -bx + ay)
\]
is normal.
9. Let $V$ be a Euclidean space of dimension 2 and let $T: V \to V$ be a normal linear transformation. Let $A$ be the matrix of $T$ with respect to an orthonormal basis of $V$. Prove that either $T$ is symmetric or
\[
A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}
\]
for some real numbers $a, b$.
10. Let $P \in GL_n(\mathbb{R})$ be an invertible matrix and let $E = M_n(\mathbb{R})$ be endowed with the inner product given by
\[
\langle A, B\rangle = \mathrm{Tr}(A\,{}^tB).
\]
Find the adjoint of the linear transformation $T: E \to E$ sending $A$ to $PAP^{-1}$.
11. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$ such that $\|T(x)\| \le \|x\|$ for all $x \in V$.
a) Prove that $\|T^*(x)\| \le \|x\|$ for all $x \in V$.
b) Prove that $\ker(T - \mathrm{id}) = \ker(T^* - \mathrm{id})$.
c) Deduce that $V$ is the orthogonal direct sum of $\ker(T - \mathrm{id})$ and $\mathrm{Im}(T - \mathrm{id})$.
12. Let $V$ be a hermitian space, that is a finite dimensional vector space over $\mathbb{C}$ endowed with a hermitian inner product $\langle\,,\,\rangle: V \times V \to \mathbb{C}$.
a) Prove that for any linear transformation $T: V \to V$ there is a unique linear transformation $T^*: V \to V$ (called the adjoint of $T$) such that for all $x, y \in V$
\[
\langle x, T(y)\rangle = \langle T^*(x), y\rangle.
\]
Be careful: the left-hand side is no longer equal to $\langle T(y), x\rangle$, but rather to $\overline{\langle T(y), x\rangle}$.
b) Prove that the map $T \mapsto T^*$ is a conjugate-linear involution on the space of linear transformations on $V$, such that for all $S, T$
\[
(S\circ T)^* = T^*\circ S^*.
\]
c) Prove that $T$ is invertible if and only if $T^*$ is invertible, and that in this case
\[
(T^*)^{-1} = (T^{-1})^*.
\]
d) If $e_1, \ldots, e_n$ is an orthonormal basis of $V$ and if $A$ is the matrix of $T$ with respect to this basis, prove that the matrix of $T^*$ is $A^* := \overline{{}^tA}$. We say that $T$ is self-adjoint or hermitian if $T = T^*$.
e) Prove that any orthogonal projection is a hermitian linear transformation.
f) Prove that $\ker T^* = (\mathrm{Im}(T))^\perp$ and $\mathrm{Im}(T^*) = (\ker T)^\perp$.
g) Prove that if $T$ is hermitian, then the orthogonal complement of a subspace stable under $T$ is also stable under $T$.
10.7 The Orthogonal Group

Let $V_1, V_2$ be Euclidean spaces with inner products $\langle\,,\,\rangle_1$ and $\langle\,,\,\rangle_2$, and with corresponding norms $\|\cdot\|_1$ and $\|\cdot\|_2$.

Definition 10.79. An isometry (or isomorphism of Euclidean spaces) between $V_1$ and $V_2$ is an isomorphism of $\mathbb{R}$-vector spaces $T: V_1 \to V_2$ such that for all $x, y \in V_1$
\[
\langle T(x), T(y)\rangle_2 = \langle x, y\rangle_1.
\]
Thus an isometry is a bijective linear map which is compatible with the inner products on $V_1$ and $V_2$. The following exercise gives an equivalent formulation of this compatibility:

Problem 10.80. Let $V_1$ and $V_2$ be as above and let $T: V_1 \to V_2$ be a linear transformation. Prove that the following statements are equivalent:
a) For all $x, y \in V_1$ we have $\langle T(x), T(y)\rangle_2 = \langle x, y\rangle_1$.
b) For all $x \in V_1$ we have $\|T(x)\|_2 = \|x\|_1$.
Solution. If a) holds, then taking $y = x$ we obtain $\|T(x)\|_2^2 = \|x\|_1^2$ and so $\|T(x)\|_2 = \|x\|_1$, showing that b) holds.
If b) holds, then the polarization identity and the linearity of $T$ yield
\[
\langle T(x), T(y)\rangle_2 = \frac{\|T(x) + T(y)\|_2^2 - \|T(x)\|_2^2 - \|T(y)\|_2^2}{2} = \frac{\|T(x+y)\|_2^2 - \|T(x)\|_2^2 - \|T(y)\|_2^2}{2}
\]
\[
= \frac{\|x+y\|_1^2 - \|x\|_1^2 - \|y\|_1^2}{2} = \langle x, y\rangle_1,
\]
finishing the solution. □

Remark 10.81. If $T$ is a linear transformation as in the previous problem, then $T$ is automatically injective: if $T(x) = 0$, then $\|T(x)\|_2 = 0$, thus $\|x\|_1 = 0$ and then $x = 0$.
Definition 10.82. a) Let $V$ be a Euclidean space. A linear transformation $T: V \to V$ is called orthogonal if $T$ is an isometry between $V$ and $V$. In other words, $T$ is orthogonal if $T$ is bijective and for all $x, y \in V$
\[
\langle T(x), T(y)\rangle = \langle x, y\rangle.
\]
Note that the bijectivity of $T$ is a consequence of the last relation, thanks to the previous remark. Thus $T$ is orthogonal if and only if $T$ preserves the inner product.
b) A matrix $A \in M_n(\mathbb{R})$ is called orthogonal if
\[
A\,{}^tA = I_n.
\]
The equivalence between the first and last points in the following problem implies the following compatibility of the previous definitions: let $A \in M_n(\mathbb{R})$ and endow $\mathbb{R}^n$ with its canonical inner product. Then $A$ is orthogonal if and only if the linear transformation $X \mapsto AX$ on $\mathbb{R}^n$ is orthogonal. Also, by the previous problem a linear map $T$ on $V$ is orthogonal if and only if $\|T(x)\| = \|x\|$ for all $x \in V$. Hence $A$ is orthogonal if and only if
\[
\|AX\| = \|X\|
\]
for all $X \in \mathbb{R}^n$, where $\|\cdot\|$ is the norm associated with the canonical inner product on $\mathbb{R}^n$, that is
\[
\|(x_1, \ldots, x_n)\| = \sqrt{x_1^2 + \ldots + x_n^2}.
\]
Example 10.83. A very important class of orthogonal transformations/matrices is given by orthogonal symmetries. Namely, consider a Euclidean space $V$ and a subspace $W$. Then $V = W \oplus W^\perp$, so we can define the symmetry $s_W$ with respect to $W$ along $W^\perp$. Recall that if $v \in V$ is written as $v = w + w^\perp$ with $w \in W$ and $w^\perp \in W^\perp$, then
\[
s_W(v) = w - w^\perp,
\]
so that $s_W$ fixes $W$ pointwise and acts as $-\mathrm{id}$ on $W^\perp$.
In order to see that $s_W$ is an orthogonal transformation, it suffices to check that $\|s_W(v)\| = \|v\|$ for all $v \in V$, or equivalently
\[
\|w - w^\perp\| = \|w + w^\perp\|
\]
for all $(w, w^\perp) \in W \times W^\perp$. But by the Pythagorean theorem the squares of both sides are equal to $\|w\|^2 + \|w^\perp\|^2$, whence the result.
Orthogonal symmetries can be easily recognized among orthogonal maps: they are precisely the self-adjoint orthogonal transformations, that is, their matrices in an orthonormal basis of the space are simultaneously symmetric and orthogonal. The point is that an orthogonal matrix $A$ is symmetric if and only if $A^2 = I_n$, since $A\,{}^tA = I_n$.
Let us come back to the general context of an orthogonal matrix $A \in M_n(\mathbb{R})$ and analyze a little the relation
\[
A\,{}^tA = I_n.
\]
Using the product rule and denoting by $R_1, \ldots, R_n$ the rows of $A$, we see that the previous equality is equivalent to
\[
\langle R_i, R_j\rangle = 0 \ \text{ if } i \ne j, \qquad \|R_i\|^2 = 1, \quad 1 \le i \le n;
\]
in other words, $A$ is orthogonal if and only if its rows $R_1, \ldots, R_n$ form an orthonormal basis of $\mathbb{R}^n$. Also, notice that $A$ is orthogonal if and only if ${}^tA$ is orthogonal, thus we have just proved the following:
Theorem 10.84. Let $A \in M_n(\mathbb{R})$ be a matrix and endow $\mathbb{R}^n$ with the canonical inner product, with associated norm $\|\cdot\|$. The following statements are equivalent:
a) $A$ is orthogonal.
b) The rows of $A$ form an orthonormal basis of $\mathbb{R}^n$.
c) The columns of $A$ form an orthonormal basis of $\mathbb{R}^n$.
d) For all $X \in \mathbb{R}^n$ we have $\|AX\| = \|X\|$.
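These criteria are straightforward to test numerically. A minimal sketch, assuming NumPy is available (the matrix $A$ below is just an illustrative rotation):

```python
import numpy as np

# An illustrative orthogonal matrix: rotation by 60 degrees.
t = np.pi / 3
A = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

# a)/b) A * A^t = I_n, i.e., the rows are orthonormal.
print(np.allclose(A @ A.T, np.eye(2)))          # True
# c) the columns are orthonormal.
print(np.allclose(A.T @ A, np.eye(2)))          # True
# d) ||AX|| = ||X|| for random vectors X.
X = np.random.randn(2, 5)
print(np.allclose(np.linalg.norm(A @ X, axis=0),
                  np.linalg.norm(X, axis=0)))   # True
```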
Problem 10.85. Let $V$ be a Euclidean space and let $T: V \to V$ be a linear transformation. Prove that the following assertions are equivalent:
a) $T$ is orthogonal.
b) We have $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$.
c) For all $x \in V$ we have $\|T(x)\| = \|x\|$.
d) $T^*\circ T = \mathrm{Id}$.

Solution. By definition a) implies b), which is equivalent to c) by Problem 10.80. If b) holds, then
\[
\langle (T^*\circ T)(x) - x, y\rangle = \langle T^*(T(x)), y\rangle - \langle x, y\rangle = \langle T(x), T(y)\rangle - \langle x, y\rangle = 0
\]
for all $x, y \in V$, thus $T^*(T(x)) = x$ for all $x \in V$ and d) follows. It remains to see that d) implies a). It already implies that $T$ is bijective, with inverse $T^*$, so it suffices to see that b) holds. Since b) is equivalent to c) by Problem 10.80, it suffices to check that c) holds. Indeed,
\[
\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle x, T^*(T(x))\rangle = \langle x, x\rangle = \|x\|^2
\]
for all $x \in V$, which yields c). □
We can also characterize orthogonal linear transformations in terms of their effect on orthonormal bases, as the following problem shows:

Problem 10.86. Let $V$ be a Euclidean space and let $T: V \to V$ be a linear transformation. Then the following statements are equivalent:
a) $T$ is orthogonal.
b) For any orthonormal basis $e_1, \ldots, e_n$ of $V$, the vectors $T(e_1), \ldots, T(e_n)$ form an orthonormal basis of $V$.
c) There is an orthonormal basis $e_1, \ldots, e_n$ of $V$ such that $T(e_1), \ldots, T(e_n)$ is an orthonormal basis of $V$.

Solution. Suppose that a) holds and let $e_1, \ldots, e_n$ be an orthonormal basis of $V$. Then for all $i, j \in [1, n]$ we have
\[
\langle T(e_i), T(e_j)\rangle = \langle e_i, e_j\rangle = 1_{i=j}.
\]
It follows that $T(e_1), \ldots, T(e_n)$ is an orthonormal family, and since it has $n = \dim V$ elements, we deduce that it is an orthonormal basis of $V$. Thus a) implies b), which clearly implies c).
Suppose that c) holds. Let $x \in V$ and write $x = x_1 e_1 + \ldots + x_n e_n$. Since $T(e_1), \ldots, T(e_n)$ and $e_1, \ldots, e_n$ are orthonormal bases of $V$, we have
\[
\|T(x)\|^2 = \|x_1 T(e_1) + \ldots + x_n T(e_n)\|^2 = x_1^2 + \ldots + x_n^2 = \|x\|^2.
\]
Thus $\|T(x)\| = \|x\|$ for all $x \in V$, and $T$ is orthogonal (by the previous problem). □
Remark 10.87. The previous problem has the following very useful consequence: let $e_1, \ldots, e_n$ be an orthonormal basis of $V$ and let $e_1', \ldots, e_n'$ be another basis of $V$. Let $P$ be the change of basis matrix from $e_1, \ldots, e_n$ to $e_1', \ldots, e_n'$. Then $e_1', \ldots, e_n'$ is orthonormal if and only if $P$ is orthogonal. We leave the details of the proof to the reader.

Theorem 10.88. The set of orthogonal linear transformations on a Euclidean space $V$ forms a group under composition. In more concrete terms, the composition of two orthogonal transformations is an orthogonal transformation, and the inverse of an orthogonal transformation is an orthogonal transformation.

Proof. If $T_1, T_2$ are orthogonal linear transformations, then $T_1\circ T_2$ is a linear transformation and
\[
\|T_1\circ T_2(x)\| = \|T_1(T_2(x))\| = \|T_2(x)\| = \|x\|
\]
for all $x \in V$, thus $T_1\circ T_2$ is an orthogonal linear transformation on $V$ by Problem 10.85. Similarly, we prove that the inverse of an orthogonal transformation is an orthogonal transformation. The result follows. □

The group $O(V)$ of orthogonal transformations (or isometries) of $V$ is called the orthogonal group of $V$. It is the group of automorphisms of the Euclidean space $V$ and plays a crucial role in understanding the space $V$.
Problem 10.89. Let $V$ be a Euclidean space and let $T$ be an orthogonal linear transformation on $V$. Let $W$ be a subspace of $V$ which is stable under $T$.
a) Prove that $T(W) = W$ and $T(W^\perp) = W^\perp$.
b) Prove that the restriction of $T$ to $W$ (respectively $W^\perp$) is an orthogonal linear transformation on $W$ (respectively $W^\perp$).

Solution. a) This follows easily from Problems 10.85 and 10.78, but for the reader's convenience we give a direct argument. Since $T$ maps $W$ into $W$ by assumption and since $T|_W$ is injective (because $T$ is injective on $V$), it follows that $T|_W: W \to W$ is surjective, thus $T(W) = W$. The same argument reduces the proof of the equality $T(W^\perp) = W^\perp$ to that of the inclusion $T(W^\perp) \subset W^\perp$. Let $x \in W^\perp$ and $y \in W$. We want to prove that $\langle T(x), y\rangle = 0$. But $T$ is orthogonal, so $T^* = T^{-1}$ (Problem 10.85) and so
\[
\langle T(x), y\rangle = \langle x, T^{-1}(y)\rangle.
\]
Since $W$ is stable under $T^{-1}$, we obtain $T^{-1}(y) \in W$, and since $x \in W^\perp$, we must have $\langle x, T^{-1}(y)\rangle = 0$. Thus $\langle T(x), y\rangle = 0$ and we are done.
b) Let $T_1$ be the restriction of $T$ to $W$. Using Problem 10.85 we obtain, for all $x \in W$,
\[
\|T_1(x)\| = \|T(x)\| = \|x\|,
\]
thus using Problem 10.85 again we obtain that $T_1$ is an orthogonal linear map on $W$. The argument for $W^\perp$ being identical, the problem is solved. □
We will now classify the orthogonal transformations of a Euclidean space in terms of simple transformations. The proof requires two preliminary results, which are themselves of independent interest.

Lemma 10.90. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$. Then there is a line or a plane in $V$ which is stable under $T$.

Proof. The minimal polynomial of $T$ is a polynomial $P$ with real coefficients. If it has a real root, it follows that $T$ has an eigenvalue, and so the line spanned by an eigenvector for that eigenvalue is stable under $T$. Suppose that $P$ has no real root. Let $z$ be a complex root of $P$. Then since $P$ has real coefficients, $\overline{z}$ is also a root of $P$ and so $Q = (X - z)(X - \overline{z})$ divides $P$. Moreover, $Q(T)$ is not invertible, otherwise $P/Q$ would be a polynomial of smaller degree killing $T$. Thus there is a nonzero vector $x \in V$ such that $Q(T)(x) = 0$. This can be written as $T^2(x) + aT(x) + bx = 0$ for some real numbers $a, b$. It follows that the space generated by $x$ and $T(x)$ is a plane which is stable under $T$, and the lemma is proved. □
Lemma 10.91. Let $V$ be a two-dimensional Euclidean space and let $T$ be an orthogonal transformation on $V$ with no real eigenvalue. Then there is an orthonormal basis of $V$ with respect to which the matrix of $T$ is of the form
\[
R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]

Proof. Let $e_1, e_2$ be an arbitrary orthonormal basis of $V$ and write $T(e_1) = a e_1 + b e_2$ for some real numbers $a, b$. Since
\[
a^2 + b^2 = \|T(e_1)\|^2 = \|e_1\|^2 = 1,
\]
we can find a real number $\theta$ such that $a = \cos\theta$ and $b = \sin\theta$. The orthogonal complement of $T(e_1)$ is the line $\mathbb{R}(-\sin\theta\, e_1 + \cos\theta\, e_2)$. Since $\langle T(e_1), T(e_2)\rangle = \langle e_1, e_2\rangle = 0$, we deduce that $T(e_2) \in \mathbb{R}(-\sin\theta\, e_1 + \cos\theta\, e_2)$ and so
\[
T(e_2) = c(-\sin\theta\, e_1 + \cos\theta\, e_2)
\]
for some real number $c$. Since $\|T(e_2)\| = \|e_2\| = 1$, we deduce that $|c| = 1$ and so $c \in \{-1, 1\}$. It remains to exclude the case $c = -1$. But if $c = -1$, then the matrix of $T$ with respect to $e_1, e_2$ is
\[
A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}
\]
and one can easily check that its characteristic polynomial is $X^2 - 1$, which has real roots. It follows that if $c = -1$, then $T$ has a real eigenvalue, a contradiction. The result follows. □
We are now ready for the proof of the fundamental theorem classifying orthogonal linear transformations on a Euclidean space:

Theorem 10.92. Let $V$ be a Euclidean space and let $T$ be an orthogonal transformation on $V$. Then we can find an orthonormal basis of $V$ with respect to which the matrix of $T$ is of the form
\[
A = \begin{pmatrix}
I_p & 0 & \cdots & 0 & 0 \\
0 & -I_q & \cdots & 0 & 0 \\
0 & 0 & R_{\theta_1} & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & R_{\theta_k}
\end{pmatrix}
\]
where $\theta_1, \ldots, \theta_k$ are real numbers and
\[
R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
sin cos Proof. We will prove the result by induction on dim V . If dim V D 1, then
everything is clear, since we must have T D ˙id. Assume now that dim V D n 2
and that the result is known in dimension at most n 1.
Suppose that T has a real eigenvalue and let e1 be an eigenvector. Then
j jjje1 jj D jj e1 jj D jjT .e1 /jj D jje1 jj;
thus 2 f1; 1g. Let W D Re1 , then W is stable under T , hence W ? is stable
under T (because T is orthogonal). Moreover, the restriction of T to W ? is still
an orthogonal transformation, since we have jjT .x/jj D jjxjj for all x 2 V ,
thus also for all x 2 W ? . By the inductive hypothesis, W ? has an orthonormal
basis e2 ; : : : ; en with respect to which the matrix of T restricted to W ? is of the
right shape (i.e., as in the statement of the theorem). Adding the vector jjee11 jj and
possibly permuting the resulting orthonormal basis jjee11 jj ; e2 ; : : : ; en of V yields an
orthonormal basis with respect to which the matrix of T has the desired shape.
Assume now that T has no real eigenvalue. By Lemma 10.90 we can find
two dimensional subspace V of T stable under T . Since T is orthogonal, the
space W ? is also stable under T , and the restrictions of T to W and W ? are
orthogonal transformations on these spaces. By the inductive hypothesis W ? has
an orthonormal basis e3 ; : : : ; en with respect to which the matrix of T jW ? is
block-diagonal, with blocks of the form Ri . By Lemma 10.91 the space W has an
10.7 The Orthogonal Group
457
orthonormal basis e1 ; e2 with respect to which the matrix of T jW is of the form R .
Then the matrix of T with respect to e1 ; : : : ; en has the desired shape. The theorem
is proved.
t
u
We can also rewrite the previous theorem purely in terms of matrices:

Corollary 10.93. Let $A \in M_n(\mathbb{R})$ be an orthogonal matrix. There are an orthogonal matrix $P \in M_n(\mathbb{R})$, integers $p, q, k \ge 0$ such that $p + q + 2k = n$, and real numbers $\theta_1, \ldots, \theta_k$ such that
\[
A = P^{-1}\begin{pmatrix}
I_p & 0 & \cdots & 0 & 0 \\
0 & -I_q & \cdots & 0 & 0 \\
0 & 0 & R_{\theta_1} & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & R_{\theta_k}
\end{pmatrix} P.
\]
Remark 10.94. a) The determinant of the matrix
\[
\begin{pmatrix}
I_p & 0 & \cdots & 0 & 0 \\
0 & -I_q & \cdots & 0 & 0 \\
0 & 0 & R_{\theta_1} & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & R_{\theta_k}
\end{pmatrix}
\]
is $(-1)^q \in \{-1, 1\}$, since $\det R_{\theta_i} = 1$ for $1 \le i \le k$. It follows that
\[
\det T \in \{-1, 1\}
\]
for any orthogonal transformation $T$ on $V$. Equivalently, $\det A \in \{-1, 1\}$ for any orthogonal matrix $A \in M_n(\mathbb{R})$. Of course, we can prove this directly, without using the previous difficult theorem: since $A\,{}^tA = I_n$ and $\det({}^tA) = \det A$, we deduce that
\[
1 = \det(A\,{}^tA) = \det(A)^2,
\]
thus $\det A \in \{-1, 1\}$.
An isometry $T$ with $\det T = 1$ is called a positive isometry, while an isometry $T$ with $\det T = -1$ is called a negative isometry. Geometrically, positive isometries preserve the orientation of the space, while negative ones reverse the orientation.
We can use the previous remark to define the notion of oriented orthonormal basis of $V$. Fix an orthonormal basis $B = (e_1, \ldots, e_n)$ of $V$. If $B' = (f_1, \ldots, f_n)$ is another orthonormal basis of $V$, then the change of basis matrix $P$ from $B$ to $B'$ is orthogonal, thus $\det P \in \{-1, 1\}$. We say that $B'$ is positive or positively oriented (with respect to $B$) if $\det P = 1$, and negative or negatively oriented (with respect to $B$) if $\det P = -1$. If $V = \mathbb{R}^n$ is endowed with the canonical inner product, then we always take for $B$ the canonical basis, so we simply say that an orthonormal basis is positive or negative if it is positive or negative with respect to the canonical basis of $\mathbb{R}^n$.
b) The characteristic polynomial of the matrix
\[
\begin{pmatrix}
I_p & 0 & \cdots & 0 & 0 \\
0 & -I_q & \cdots & 0 & 0 \\
0 & 0 & R_{\theta_1} & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & R_{\theta_k}
\end{pmatrix}
\]
is
\[
(X - 1)^p (X + 1)^q \prod_{i=1}^k (X^2 - 2\cos\theta_i\, X + 1).
\]
Notice that the complex roots of the polynomial $X^2 - 2\cos\theta\, X + 1$ are $e^{i\theta}$ and $e^{-i\theta}$, and they have absolute value 1. We deduce from the previous theorem that if $\lambda$ is a complex root of the characteristic polynomial of an orthogonal matrix, then $|\lambda| = 1$. In other words, all complex eigenvalues of an orthogonal matrix have absolute value 1. This can also be proved directly, but the proof is trickier than the one showing that $\det A \in \{-1, 1\}$ for an orthogonal matrix $A$.
Let us now study the orthogonal group in small dimension, starting with dimension 2. We could use the previous theorem, but we prefer to give direct arguments in this case, since everything can be done by hand in a fairly simple and explicit way. So, let us try to understand orthogonal matrices $A \in M_2(\mathbb{R})$. Consider a matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
satisfying $A\,{}^tA = I_2$. We know by the previous discussion that $\det A \in \{-1, 1\}$ (recall that this is immediate from the relation $A\,{}^tA = I_2$). Therefore, it is natural to consider two cases:

• $\det A = 1$. In this case the inverse of $A$ is simply
\[
A^{-1} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]
and since $A$ is orthogonal we have $A^{-1} = {}^tA$, giving $a = d$ and $b = -c$, that is
\[
A = \begin{pmatrix} a & -c \\ c & a \end{pmatrix}.
\]
Moreover, we have $a^2 + c^2 = 1$, thus there is a unique real number $\theta \in (-\pi, \pi]$ such that $a = \cos\theta$ and $c = \sin\theta$. Therefore
\[
A = R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
The corresponding linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ (sending $X$ to $AX$) is given by
\[
T(x, y) = (\cos\theta\, x - \sin\theta\, y,\ \sin\theta\, x + \cos\theta\, y)
\]
and geometrically this is the rotation of angle $\theta$. A simple computation shows that
\[
R_{\theta_1} R_{\theta_2} = R_{\theta_1 + \theta_2} = R_{\theta_2} R_{\theta_1} \qquad (10.7)
\]
for all real numbers $\theta_1, \theta_2$. In particular, all rotations commute with each other.
An important consequence of this observation is that the matrix of $T$ with respect to any positive orthonormal basis of $\mathbb{R}^2$ is still $R_\theta$ (since the change of basis matrix from the canonical basis to this new positive orthonormal basis is itself a rotation, thus it commutes with $R_\theta$). Similarly, one checks that the matrix of $T$ with respect to any negative orthonormal basis of $\mathbb{R}^2$ is $R_{-\theta}$. The formula (10.7) also shows that it is very easy to find the angle of the composite of two rotations: simply add their angles and subtract a suitable multiple of $2\pi$ to bring this angle into the interval $(-\pi, \pi]$.
• $\det A = -1$. Now the inverse of $A$ is
\[
A^{-1} = \begin{pmatrix} -d & b \\ c & -a \end{pmatrix},
\]
thus the condition $A^{-1} = {}^tA$ yields $d = -a$ and $b = c$. Also, we have $a^2 + b^2 = 1$, thus there is a unique real number $\theta \in (-\pi, \pi]$ such that $a = \cos\theta$ and $b = \sin\theta$. Then
\[
A = S_\theta := \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.
\]
Note that $S_\theta$ is symmetric and orthogonal, thus $S_\theta^2 = I_2$, and the corresponding transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$,
\[
T(x, y) = (\cos\theta\, x + \sin\theta\, y,\ \sin\theta\, x - \cos\theta\, y),
\]
is an orthogonal symmetry. In order to find the line with respect to which $T$ is an orthogonal symmetry, it suffices to solve the system $AX = X$. An easy computation left to the reader shows that the system is equivalent to
\[
\sin\frac{\theta}{2}\, x = \cos\frac{\theta}{2}\, y
\]
and so the line $AX = X$ is spanned by the vector
\[
e_1 = \Big(\cos\frac{\theta}{2},\ \sin\frac{\theta}{2}\Big).
\]
Note that the orthogonal complement of this line is spanned by the vector
\[
e_2 = \Big(-\sin\frac{\theta}{2},\ \cos\frac{\theta}{2}\Big),
\]
and the vectors $e_1, e_2$ form a positive orthonormal basis of $\mathbb{R}^2$ in which the matrix of $T$ is $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.
One can easily check that
\[
S_{\theta_1} S_{\theta_2} = R_{\theta_1 - \theta_2},
\]
thus the composite of two orthogonal symmetries is a rotation (this was actually clear from the beginning, since the product of two matrices of determinant $-1$ is a matrix with determinant 1). Similarly, one checks that
\[
S_{\theta_1} R_{\theta_2} = S_{\theta_1 - \theta_2}, \qquad R_{\theta_1} S_{\theta_2} = S_{\theta_1 + \theta_2},
\]
thus the composite of a rotation and an orthogonal symmetry is an orthogonal symmetry (this was also clear for determinant reasons).
All in all, the previous discussion gives:

Theorem 10.95. Let $A \in M_2(\mathbb{R})$ be an orthogonal matrix.
a) If $\det A = 1$, then
\[
A = R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]
for a unique real number $\theta \in (-\pi, \pi]$, and the corresponding linear transformation $T$ on $\mathbb{R}^2$ is the rotation of angle $\theta$. Any two such matrices commute, and the matrix of $T$ in any positive orthonormal basis of $\mathbb{R}^2$ is $R_\theta$.
b) If $\det A = -1$, then
\[
A = S_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}
\]
for a unique real number $\theta \in (-\pi, \pi]$. The matrix $A$ is symmetric and the corresponding linear transformation on $\mathbb{R}^2$ is the orthogonal symmetry with respect to the line spanned by $(\cos\frac{\theta}{2}, \sin\frac{\theta}{2})$.
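This dichotomy is easy to carry out numerically. A minimal sketch, assuming NumPy (the helper name `classify` is our own, not a library routine), decides whether a $2\times 2$ orthogonal matrix is a rotation or a reflection and recovers $\theta$ as in Theorem 10.95:

```python
import numpy as np

def classify(A, tol=1e-9):
    """Classify a 2x2 orthogonal matrix as in Theorem 10.95."""
    assert np.allclose(A @ A.T, np.eye(2), atol=tol)
    if np.isclose(np.linalg.det(A), 1.0, atol=tol):
        # Rotation A = R_theta: theta is read off the first column.
        return "rotation", np.arctan2(A[1, 0], A[0, 0])
    # Reflection A = S_theta: fixed line spanned by (cos t/2, sin t/2).
    return "reflection", np.arctan2(A[0, 1], A[0, 0])

print(classify(np.array([[0.0, -1.0], [1.0, 0.0]])))  # rotation, pi/2
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # reflection, 0
```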
Let us now consider the more complicated case $\dim V = 3$. Here it is no longer easy to do explicit computations, so we will use Theorem 10.92 and our understanding of the case $\dim V = 2$ in order to understand the case $\dim V = 3$. Recall the integers $p, q, k$ from Theorem 10.92. Since
\[
p + q + 2k = 3,
\]
we see that necessarily $p \ne 0$ or $q \ne 0$. We can also prove this directly, observing that the characteristic polynomial of $T$ has degree 3, thus it has a real root, and so $T$ has a real eigenvalue, which is necessarily equal to 1 or $-1$ since it has absolute value 1.
Replacing $T$ with $-T$, we exchange the roles of $p$ and $q$. For simplicity, let us assume that $p \ge 1$, i.e., $T$ has at least one fixed point $v$. Then $T$ fixes the line $D$ spanned by $v$, and induces an isometry on the plane $P$ orthogonal to $D$. This isometry is classified by Theorem 10.95, which deals with isometries of a plane. Thus we have reduced the case $\dim V = 3$ to the case $\dim V = 2$. We can be a little more explicit, by discussing the following cases:

• Either $T$ or $-T$ is the identity map. This case is not very interesting.
• We have $\dim\ker(T - \mathrm{id}) = 2$. If $e_2, e_3$ is an orthonormal basis of the plane $\ker(T - \mathrm{id})$, completed to an orthonormal basis $e_1, e_2, e_3$ of $V$, then $T$ fixes $\mathrm{Span}(e_2, e_3)$ pointwise and leaves invariant the line spanned by $e_1$. Thus the matrix of $T$ with respect to $e_1, e_2, e_3$ is of the form
\[
\begin{pmatrix} \lambda & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
for some real number $\lambda$, which is necessarily $-1$ (it must be 1 or $-1$ since the matrix must be orthogonal, and it cannot be 1 as otherwise $T = \mathrm{id}$). We deduce that $T$ is the orthogonal symmetry with respect to the plane $\ker(T - \mathrm{id})$. Notice that $\det T = -1$ in this case (i.e., $T$ is a negative isometry).
• We have $\dim\ker(T - \mathrm{id}) = 1$, thus $\ker(T - \mathrm{id})$ is the line spanned by some vector $e_1$ of norm 1. Complete $e_1$ to a positive orthonormal basis $e_1, e_2, e_3$ of $V = \mathbb{R}^3$. For instance, one can simply find a vector $e_2$ of norm 1 orthogonal to $e_1$, and if
\[
e_1 = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \quad\text{and}\quad e_2 = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix},
\]
set
\[
e_3 = e_1 \wedge e_2 := \begin{pmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{pmatrix}.
\]
The isometry that $T$ induces on $\mathrm{Span}(e_2, e_3)$ has no fixed point (since all fixed points of $T$ are on the line spanned by $e_1$), thus it is a rotation of angle $\theta$ for a unique real number $\theta \in (-\pi, \pi]$. The matrix of $T$ with respect to $e_1, e_2, e_3$ is
\[
R_\theta' := \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.
\]
We say that $T$ is the rotation of angle $\theta$ around the axis $\mathbb{R}e_1$. Note that $\det T = 1$, that is, $T$ is a positive isometry. Also, note that the angle $\theta$ satisfies
\[
1 + 2\cos\theta = \mathrm{Tr}(A),
\]
but this relation does not uniquely characterize the angle $\theta$ (since $-\theta$ is also a solution of that equation). In order to find $\theta$, it remains to find $\sin\theta$. In order to do that, one checks that
\[
\det{}_{(e_1, e_2, e_3)}(e_1, e_2, T(e_2)) = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & \cos\theta \\ 0 & 0 & \sin\theta \end{vmatrix} = \sin\theta.
\]
• Finally, assume that $\ker(T - \mathrm{id}) = \{0\}$. One possibility is that $T = -\mathrm{id}$. Assume that $T \ne -\mathrm{id}$. Since either $T$ or $-T$ has a fixed point (this follows from the fact that $p$ or $q$ is nonzero, i.e., that $T$ has a real eigenvalue, which must be $\pm 1$) and since $T$ has no fixed point, it follows that $-T$ has a fixed point. Let $e_1$ be a vector of norm 1 which is fixed by $-T$, thus $T(e_1) = -e_1$. Complete $e_1$ to a positive orthonormal basis $e_1, e_2, e_3$ of $V$; then, arguing as in the previous case, we deduce that the matrix of $T$ with respect to $e_1, e_2, e_3$ is
\[
\begin{pmatrix} -1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} = R_\theta' \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
for some $\theta \in (-\pi, \pi]$. Thus $T$ is the composite of a rotation of angle $\theta$ and of an orthogonal symmetry with respect to the orthogonal complement of the axis of the rotation. Also, notice that $\det T = -1$, thus $T$ is a negative isometry.
We can also slightly change the point of view and discuss the situation in terms of matrices. Consider an orthogonal matrix $A \in M_3(\mathbb{R})$ and the associated linear transformation $T: V \to V$ sending $X$ to $AX$, where $V = \mathbb{R}^3$ is endowed with the canonical inner product. We exclude the trivial cases $A = \pm I_3$.
In order to study the isometry $T$, we first check whether $T$ is a positive or negative isometry by computing $\det T = \det A$.
Assume first that $T$ is positive, i.e., $\det A = 1$. We then check whether $A$ is symmetric, i.e., $A = {}^tA$. Let us consider two cases:

• If $A$ is symmetric, then $A^2 = I_3$ (since $A$ is orthogonal and symmetric) and so $T$ is an orthogonal symmetry. We claim that $T$ is the orthogonal symmetry with respect to a line. Indeed, since $A^2 = I_3$, all eigenvalues of $A$ are 1 or $-1$. Moreover, they are not all equal since we excluded the cases $A = \pm I_3$, and their product is 1, since $\det A = 1$. Thus one eigenvalue is 1 and the other two are equal to $-1$. It follows that the matrix of $T$ with respect to some orthonormal basis $e_1, e_2, e_3$ of $\mathbb{R}^3$ is
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\]
and $T$ is the orthogonal symmetry with respect to the line spanned by $e_1$. To find this line, we compute $\ker(A - I_3)$ by solving the system $AX = X$. A basis $v$ of the space of solutions of this system will span the line we are looking for.
• If $A$ is not symmetric, then $A$ is a rotation of angle $\theta$ for a unique $\theta \in (-\pi, \pi]$. We find the axis of the rotation by solving the system $AX = X$: if $Ae_1 = e_1$ for some unit vector $e_1$, then the axis of the rotation is spanned by $e_1$. To find the angle $\theta$ of the rotation, we start by using the relation
\[
1 + 2\cos\theta = \mathrm{Tr}(A), \qquad (\star)
\]
which pins down $\theta$ up to a sign. Next, we choose any vector $e_2$ of norm 1 orthogonal to $e_1$ and we set $e_3 = e_1 \wedge e_2$. Then $e_1, e_2, e_3$ is a positive orthonormal basis of $\mathbb{R}^3$ and $\det{}_{(e_1,e_2,e_3)}(e_1, e_2, Ae_2)$ gives $\sin\theta$, which then determines $\theta$ uniquely. Notice that in practice it suffices to find the sign of the determinant of the vectors $e_1, e_2, Ae_2$ with respect to the canonical basis of $\mathbb{R}^3$, as this sign gives the sign of $\sin\theta$, which in turn determines $\theta$ uniquely thanks to relation $(\star)$.

Assume now that $T$ is negative, i.e., $\det A = -1$. Then $-T$ is positive, thus the previous discussion applies to $-T$.
Let us see two concrete examples:
Problem 10.96. a) Prove that
\[
A = \frac{1}{3}\begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix}
\]
is an orthogonal matrix.
b) Describe the isometry of $\mathbb{R}^3$ defined by $A$, i.e., the map $T: \mathbb{R}^3 \to \mathbb{R}^3$ given by $T(X) = AX$.

Solution. a) Using the product rule, one easily checks that $A\,{}^tA = I_3$, thus $A$ is orthogonal. Alternatively, one checks that the columns (or rows) of $A$ form an orthonormal family.
b) First, we check whether $T$ is a positive or negative isometry by computing $\det A$. An easy computation shows that $\det A = 1$, so $T$ is a positive isometry. Since $A$ is symmetric, we deduce from the above discussion that $A$ is the orthogonal symmetry with respect to a line. To find this line, we solve the system $AX = X$. If $x, y, z$ are the coordinates of $X$, the system is equivalent to
\[
\begin{cases} -x + 2y + 2z = 3x \\ 2x - y + 2z = 3y \\ 2x + 2y - z = 3z \end{cases}
\]
and its solutions satisfy $x = y = z$. Thus $T$ is the orthogonal symmetry with respect to the line spanned by $(1, 1, 1)$. □
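The whole analysis can be double-checked numerically. A minimal sketch, assuming NumPy (the eigenvalues $-1, -1, 1$ confirm an orthogonal symmetry with respect to a line):

```python
import numpy as np

A = np.array([[-1, 2, 2], [2, -1, 2], [2, 2, -1]]) / 3

print(np.allclose(A @ A.T, np.eye(3)))   # orthogonal: True
print(np.isclose(np.linalg.det(A), 1))   # positive isometry: True
print(np.allclose(A, A.T))               # symmetric: True

# Fixed line = eigenspace for eigenvalue 1, spanned by (1,1,1).
w, V = np.linalg.eigh(A)                 # eigenvalues -1, -1, 1
axis = V[:, np.isclose(w, 1)].ravel()
print(axis / axis[0])                    # [1. 1. 1.]
```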
Problem 10.97. Prove that the matrix
\[
A = \frac{1}{3}\begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{pmatrix}
\]
is orthogonal and study the associated isometry of $\mathbb{R}^3$.

Solution. One easily checks either that $A\,{}^tA = I_3$ or that the rows of $A$ form an orthonormal family. Next, one computes $\det A = 1$, thus the associated isometry $T$ is positive. Since $A$ is not symmetric, it follows that $T$ is a rotation. To find its axis, we solve the system $AX = X$, which is equivalent to
\[
\begin{cases} 2x + 2y + z = 3x \\ -2x + y + 2z = 3y \\ x - 2y + 2z = 3z \end{cases}
\]
and then to
\[
x = z, \qquad y = 0.
\]
Thus the axis of the rotation is spanned by the vector $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$. We normalize it to make it have norm 1, thus we consider instead the vector
\[
e_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},
\]
which spans the axis of $T$.
Let $\theta$ be the angle of the rotation, so that
\[
1 + 2\cos\theta = \mathrm{Tr}(A) = \frac{5}{3},
\]
thus
\[
\cos\theta = \frac{1}{3}.
\]
It remains to find the sign of $\sin\theta$. For that, we choose a unit vector orthogonal to $e_1$, say
\[
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},
\]
and compute the sign of
\[
\det(e_1, e_2, Ae_2) = \frac{1}{3\sqrt{2}}\begin{vmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 1 & 0 & -2 \end{vmatrix} = \frac{-4}{3\sqrt{2}} < 0,
\]
thus $\sin\theta < 0$ and finally
\[
\theta = -\arccos\frac{1}{3}.
\]
□
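The axis and the signed angle can likewise be extracted numerically. A minimal sketch, assuming NumPy (note that flipping the sign of the axis vector also flips the sign of $\theta$; the pair describes the same rotation):

```python
import numpy as np

A = np.array([[2, 2, 1], [-2, 1, 2], [1, -2, 2]]) / 3

# Axis: unit vector spanning ker(A - I), i.e., the eigenvector for eigenvalue 1.
w, V = np.linalg.eig(A)
e1 = np.real(V[:, np.argmin(np.abs(w - 1))])
e1 /= np.linalg.norm(e1)                       # ~ (1, 0, 1)/sqrt(2) up to sign

# Angle: cos(theta) from the trace, sign of sin(theta) from a determinant.
cos_t = (np.trace(A) - 1) / 2                  # = 1/3
e2 = np.array([0.0, 1.0, 0.0])                 # orthogonal to e1 in this example
sign = np.sign(np.linalg.det(np.column_stack([e1, e2, A @ e2])))
theta = sign * np.arccos(cos_t)
print(e1, theta)                               # axis and theta = -arccos(1/3)
```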
10.7.1 Problems for Practice

1. Prove the result stated in Remark 10.87.
2. a) Prove that the matrix
\[
A = \begin{pmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{pmatrix}
\]
is orthogonal.
b) Describe the isometry $T: \mathbb{R}^2 \to \mathbb{R}^2$ sending $X$ to $AX$: is it positive or negative? If it is a rotation, describe the angle; if it is a symmetry, describe the line with respect to which $T$ is the orthogonal symmetry.
3. a) Prove that each of the following matrices is orthogonal:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \frac{1}{3}\begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix}.
\]
b) If $A$ is one of these matrices, describe the isometry $T: \mathbb{R}^3 \to \mathbb{R}^3$ sending $X$ to $AX$ (for instance, if $T$ is a rotation then you will need to find the axis and the angle of the corresponding rotation).
4. Prove that the matrix
\[
A = \frac{1}{7}\begin{pmatrix} 2 & 6 & 3 \\ 6 & -3 & 2 \\ 3 & 2 & -6 \end{pmatrix}
\]
is orthogonal and study the associated isometry of $\mathbb{R}^3$.
5. Find the matrix of the rotation of angle $\frac{\pi}{3}$ around the line spanned by $(1, 1, 1)$.
6. Let $V$ be a Euclidean space and let $T: V \to V$ be a linear map. Prove that $T$ is orthogonal if and only if $\|T(x)\| = 1$ whenever $\|x\| = 1$.
7. a) Describe all orthogonal matrices $A \in M_n(\mathbb{R})$ having integer entries.
b) How many such matrices are there?
8. a) Describe the matrices in $M_n(\mathbb{R})$ which are simultaneously diagonal and orthogonal.
b) Describe the matrices in $M_n(\mathbb{R})$ which are simultaneously upper-triangular and orthogonal.
9. Let $V$ be a Euclidean space. Recall that if $W$ is a subspace of $V$, then $s_W$ denotes the orthogonal symmetry with respect to $W$, that is the symmetry with respect to $W$ along $W^\perp$.
a) Let $v$ be a vector in $V$ with $\|v\| = 1$ and let $H = (\mathbb{R}v)^\perp$ be its orthogonal complement. Prove that for all $x \in V$ we have
\[
s_H(x) = x - 2\langle v, x\rangle v.
\]
b) Let $v_1, v_2 \in V$ be vectors in $V$ with the same norm. Prove that there is a hyperplane $H$ of $V$ such that $s_H(v_1) = v_2$.
10. Find the matrix (in the canonical basis of $\mathbb{R}^3$) of the orthogonal symmetry of $\mathbb{R}^3$ with respect to the line spanned by $(1, 2, 3)$.
11. Find the matrix (in the canonical basis of $\mathbb{R}^3$) of the orthogonal symmetry of $\mathbb{R}^3$ with respect to the plane spanned by $(1, 1, 1)$ and $(0, 1, 0)$.
12. Let $V$ be a three-dimensional Euclidean space, let $r$ be a rotation on $V$ and let $s$ be an orthogonal symmetry. Prove that $s \circ r \circ s$ is a rotation and describe its axis and its angle in terms of those of $r$.
13. Let $V$ be a three-dimensional Euclidean space. When does a rotation of $V$ commute with an orthogonal symmetry of $V$?
14. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be an orthogonal matrix. Prove that
\[
n \le \sum_{i,j=1}^n |a_{ij}| \le n\sqrt{n}.
\]
Hint: the sum of squares of the elements in each row is 1. For the inequality on the right use the Cauchy–Schwarz inequality.
15. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be an orthogonal matrix.
a) Let $X$ be the vector in $\mathbb{R}^n$ all of whose coordinates are equal to 1. Compute $\langle X, AX\rangle$, where $\langle\,,\,\rangle$ is the standard inner product on $\mathbb{R}^n$.
b) Prove that
\[
\Big|\sum_{i,j=1}^n a_{ij}\Big| \le n.
\]
16. Let $v \in \mathbb{R}^n$ be a nonzero vector. Find all real numbers $k$ for which the linear map $T: \mathbb{R}^n \to \mathbb{R}^n$ defined by
\[
T(x) = x + k\langle x, v\rangle v
\]
is an isometry.
17. Let $V$ be a Euclidean space and let $T: V \to V$ be a linear transformation such that $\langle T(x), T(y)\rangle = 0$ whenever $\langle x, y\rangle = 0$.
a) Let $x, y$ be vectors of norm 1 in $V$. Compute $\langle x + y, x - y\rangle$.
b) Prove that there is a nonnegative real number $k$ such that for all $x \in V$
\[
\|T(x)\| = k\|x\|.
\]
Hint: if $\|x\| = \|y\| = 1$, show that $\|T(x)\| = \|T(y)\|$ using part a) and the hypothesis.
c) Prove that there is an orthogonal transformation $S$ on $V$ such that $T = kS$.
18. Let $V = M_n(\mathbb{R})$ be endowed with the inner product
\[
\langle A, B\rangle = \mathrm{Tr}({}^tA\,B).
\]
Let $A \in V$. Prove that the following statements are equivalent:
a) $A$ is orthogonal;
b) the linear transformation $T: V \to V$ sending $B$ to $AB$ is orthogonal.
19. (Cayley transform)
a) Let $A \in M_n(\mathbb{R})$ be a skew-symmetric matrix. Prove that $I_n + A$ is invertible. Hint: if $AX = -X$, compute $\langle AX, X\rangle$ in two different ways.
b) Prove that if $A \in M_n(\mathbb{R})$ is skew-symmetric, then $(I_n - A)(I_n + A)^{-1}$ is an orthogonal matrix which does not have $-1$ as an eigenvalue.
c) Conversely, prove that if $B$ is an orthogonal matrix not having $-1$ as an eigenvalue, then we can find a skew-symmetric matrix $A$ such that $B = (I_n - A)(I_n + A)^{-1}$.
d) Prove that the map $A \mapsto (I_n - A)(I_n + A)^{-1}$ induces a bijection between the skew-symmetric matrices in $M_n(\mathbb{R})$ and the orthogonal matrices in $M_n(\mathbb{R})$ for which $-1$ is not an eigenvalue.
20. (Compactness of the orthogonal group) Let $(A_k)_{k\ge 1}$ be a sequence of orthogonal matrices in $M_n(\mathbb{R})$. Let $a_{i,j}^{(k)}$ be the $(i,j)$-entry of $A_k$. Prove that there exists a sequence of integers $k_1 < k_2 < \ldots$ such that for all $i, j \in \{1, 2, \ldots, n\}$ the sequence $(a_{i,j}^{(k_l)})_{l\ge 1}$ converges to some real number $x_{ij}$, and such that the matrix $X = [x_{ij}]$ is an orthogonal matrix. Hint: use the classical fact from real analysis that each sequence in $[-1, 1]$ has a convergent subsequence.
21. Let $A \in M_n(\mathbb{R})$ be a skew-symmetric matrix and let $T: \mathbb{R}^n \to \mathbb{R}^n$ be the map $X \mapsto AX$. Prove that there is an orthonormal basis of $\mathbb{R}^n$ with respect to which the matrix of $T$ is a block-diagonal matrix, in which each block is either the zero matrix or a matrix of the form
\[
\begin{pmatrix} 0 & -a \\ a & 0 \end{pmatrix}
\]
for some real number $a$. Hint: use induction on $n$ and Lemma 10.90, and argue as in the proof of Theorem 10.92.
22. Prove that if $A \in M_n(\mathbb{R})$ is a skew-symmetric matrix, then $\det A \ge 0$ and the rank of $A$ is even.

In the following problems we consider a finite dimensional vector space $V$ over $\mathbb{C}$ endowed with a positive definite hermitian product $\langle\,,\,\rangle$ and associated norm $\|\cdot\|$. A linear map $T: V \to V$ is called unitary or an isometry if
\[
\langle T(x), T(y)\rangle = \langle x, y\rangle
\]
for all $x, y \in V$. A matrix $A \in M_n(\mathbb{C})$ is called unitary if the associated linear map $\mathbb{C}^n \to \mathbb{C}^n$ sending $X$ to $AX$ is unitary (where $\mathbb{C}^n$ is endowed with its standard hermitian product).

23. Prove that for a linear map $T: V \to V$ the following assertions are equivalent:
a) $T$ is unitary;
b) we have $\|T(x)\| = \|x\|$ for all $x \in V$;
c) $T$ maps unit vectors (i.e., vectors of norm 1) to unit vectors.
24. Prove that a matrix $A \in M_n(\mathbb{C})$ is unitary if and only if $A^* A = I_n$, where $A^* = \overline{{}^tA}$ is the conjugate transpose of $A$ (thus if $A = [a_{ij}]$ then $A^* = [\overline{a_{ji}}]$).
25. Prove that the inverse of a unitary matrix is a unitary matrix, and that the product of two unitary matrices is a unitary matrix.
26. Prove that if $A$ is a unitary matrix, then $|\det A| = 1$.
27. Describe the diagonal and unitary matrices in $M_n(\mathbb{C})$.
28. Prove that for a matrix $A \in M_n(\mathbb{C})$ the following assertions are equivalent:
a) $A$ is unitary.
b) There is an orthonormal basis $X_1, \ldots, X_n$ of $\mathbb{C}^n$ (endowed with its standard hermitian product) such that $AX_1, \ldots, AX_n$ is an orthonormal basis of $\mathbb{C}^n$.
c) For any orthonormal basis $X_1, \ldots, X_n$ of $\mathbb{C}^n$ the vectors $AX_1, \ldots, AX_n$ form an orthonormal basis of $\mathbb{C}^n$.
29. Let $T: V \to V$ be a unitary linear transformation on $V$. Prove that there is an orthonormal basis of $V$ consisting of eigenvectors of $T$.
10.8 The Spectral Theorem for Symmetric Linear Transformations and Matrices

In this section we will prove the fundamental theorem concerning real symmetric matrices or linear transformations. It classifies the symmetric linear transformations on a Euclidean space in the same way as Theorem 10.92 classifies orthogonal transformations. We will then use this theorem to prove the rather amazing result that any matrix $A \in M_n(\mathbb{R})$ is the product of a symmetric positive matrix and of an orthogonal matrix. This result, called the polar decomposition, is the matrix analogue of the classical result saying that any complex number can be written as the product of a nonnegative real number and of a complex number of magnitude 1.
We start by establishing a first fundamental property of real symmetric matrices: their complex eigenvalues are actually real.

Theorem 10.98. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. Then all roots of the characteristic polynomial of $A$ are real.

Proof. Let $\lambda$ be a root of the characteristic polynomial of $A$, and let us see $A$ as a matrix in $M_n(\mathbb{C})$. Since $\det(\lambda I_n - A) = 0$, there exists a nonzero $X \in \mathbb{C}^n$ such that $AX = \lambda X$. Write $X = Y + iZ$ for two vectors $Y, Z \in \mathbb{R}^n$ and write $\lambda = a + ib$ for some real numbers $a, b$. The equality $AX = \lambda X$ becomes
\[
AY + iAZ = (a + ib)(Y + iZ) = aY - bZ + i(aZ + bY),
\]
and taking real and imaginary parts yields
\[
AY = aY - bZ, \qquad AZ = aZ + bY. \qquad (10.8)
\]
Since $A$ is symmetric, we have
\[
\langle AY, Z\rangle = \langle Y, AZ\rangle. \qquad (10.9)
\]
By relation (10.8), the left-hand side of relation (10.9) is equal to $a\langle Y, Z\rangle - b\|Z\|^2$, while the right-hand side is equal to $a\langle Y, Z\rangle + b\|Y\|^2$. We deduce that
\[
b(\|Y\|^2 + \|Z\|^2) = 0,
\]
and since at least one of $Y, Z$ is nonzero (otherwise $X = 0$, a contradiction), we deduce that $b = 0$ and $\lambda$ is real. □
We need one further preliminary remark before proving the fundamental theorem:

Lemma 10.99. Let $V$ be a Euclidean space and let $T: V \to V$ be a symmetric linear transformation on $V$. Let $W$ be a subspace of $V$ which is stable under $T$. Then
a) $W^\perp$ is also stable under $T$;
b) the restrictions of $T$ to $W$ and $W^\perp$ are symmetric linear transformations on these spaces.

Proof. This follows fairly easily from Problem 10.78, but we prefer to give a straightforward argument.
a) Let $x \in W^\perp$ and $y \in W$. Then
\[
\langle T(x), y\rangle = \langle x, T(y)\rangle.
\]
Now $x \in W^\perp$ and $T(y) \in T(W) \subset W$, thus $\langle x, T(y)\rangle = 0$ and so $T(W^\perp) \subset W^\perp$, which yields the desired result.
b) Let $T_1$ be the restriction of $T$ to $W$. For $x, y \in W$ we have
\[
\langle T_1(x), y\rangle = \langle T(x), y\rangle = \langle x, T(y)\rangle = \langle x, T_1(y)\rangle,
\]
thus $T_1$ is symmetric as a linear map on $W$. The argument being identical for $W^\perp$, the lemma is proved. □
We are finally in good shape for the fundamental theorem of the theory of symmetric linear transformations (or matrices), which shows that all such transformations are diagonalizable in an orthonormal basis:

Theorem 10.100 (Spectral Theorem). Let $V$ be a Euclidean space and let $T: V \to V$ be a symmetric linear transformation. Then there is an orthonormal basis of $V$ consisting of eigenvectors for $T$.

Proof. We will prove the theorem by strong induction on $n = \dim V$. Everything being clear when $n = 1$, suppose that the statement holds up to $n - 1$ and let us prove it for $n$. So let $V$ be Euclidean with $\dim V = n$ and let $T$ be a symmetric linear transformation on $V$. Let $e_1, \ldots, e_n$ be an orthonormal basis of $V$. The matrix $A$ of $T$ in this basis is symmetric, hence it has a real eigenvalue $\lambda$ by Theorem 10.98 (and the fact that any matrix with real or complex entries has a complex eigenvalue). Let $W = \ker(\lambda\,\mathrm{id} - T)$ be the $\lambda$-eigenspace of $T$. If $W = V$, then $T = \lambda\,\mathrm{id}$ and so $e_1, \ldots, e_n$ is an orthonormal basis consisting of eigenvectors for $T$. So assume that $\dim W < n$. We have $V = W \oplus W^\perp$ and $T$ leaves $W^\perp$ stable, inducing a symmetric linear transformation on this subspace (Lemma 10.99). Applying the inductive hypothesis to the restriction of $T$ to $W^\perp$, we find an orthonormal basis $f_1^\perp, \ldots, f_k^\perp$ of $W^\perp$ consisting of eigenvectors for $T$. Choosing any orthonormal basis $f_1, \ldots, f_s$ of $W$ (consisting automatically of eigenvectors for $T$), we obtain an orthonormal basis $f_1, \ldots, f_s, f_1^\perp, \ldots, f_k^\perp$ of $V = W \oplus W^\perp$ consisting of eigenvectors for $T$. This finishes the proof of the theorem. □

If $A \in M_n(\mathbb{R})$ is a symmetric matrix, then the linear transformation $T: X \mapsto AX$ on $V = \mathbb{R}^n$ is symmetric. Applying the previous theorem, we can find an orthonormal basis of $V$ with respect to which the matrix of $T$ is diagonal. Since the canonical basis of $V$ is orthonormal and since the change of basis matrix between two orthonormal bases is orthogonal (Remark 10.87), we obtain the following all-important result:

Theorem 10.101. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. There exists an orthogonal matrix $P \in M_n(\mathbb{R})$ such that $PAP^{-1}$ is diagonal (in particular, $A$ is diagonalizable). In other words, there is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
positive definite) symmetric matrices:
Theorem 10.102. Let A 2 Mn .R/ be a symmetric matrix. Then the following
statements are equivalent:
a) A is positive
b) All eigenvalues of A are nonnegative.
c) A D B 2 for some symmetric matrix B 2 Mn .R/.
d) A D t B B for some matrix B 2 Mn .R/.
Proof. Suppose that A is positive and that
v. Since Av D v, we obtain
is an eigenvalue of A, with eigenvector
jjvjj2 D hv; Avi D t vAv 0;
thus 0. It follows that a) implies b).
Assume that b) holds and let 1 ; : : : ; n be all eigenvalues of A, counted with
multiplicities. By assumption i 0 for all i 2 Œ1; n. Moreover, by the spectral
theorem we can find an orthogonal matrix P such that PAP 1 D D, where D is the
diagonal
1 ; : : : ; n . Let D1 be the diagonal matrix with entries
p matrix with entries
1
i D
and
let
B
D
P
D
i
1 P . Then B is symmetric, since P is orthogonal and
D1 is symmetric:
t
B D t PD1 t P 1 D P 1 D1 P:
Moreover, by construction B 2 D P 1 D12 P D P 1 DP D A. Thus c) holds.
It is clear that c) implies d). Finally, if d) holds, then for all X 2 Rn we have
t
and so A is positive.
XAX D jjBX jj2 0
t
u
The reader is invited to state and prove the corresponding theorem for positive
definite matrices.
After this hard work, we will take a break and see some nice applications of the
above theorems. The result established in the next problem is very important.
Problem 10.103. a) Let $T$ be a symmetric positive definite linear transformation on a Euclidean space $V$. Prove that for all $d \ge 2$ there is a unique symmetric positive definite linear transformation $T_d$ such that $T_d^d = T$. Moreover, prove that there is a polynomial $P_d \in \mathbb{R}[X]$ such that $T_d = P_d(T)$.
b) Let $A \in M_n(\mathbb{R})$ be a symmetric positive definite matrix. Prove that for all $d \ge 2$ there is a unique symmetric positive definite matrix $A_d$ such that $A_d^d = A$. Moreover, there is a polynomial $P_d \in \mathbb{R}[X]$ such that $A_d = P_d(A)$.

Solution. Clearly part b) is a consequence of part a), so we focus on part a) only. Let us establish the existence part first. Since $T$ is symmetric and positive definite, there are positive real numbers $\lambda_1, \ldots, \lambda_n$ and an orthonormal basis $e_1, \ldots, e_n$ of $V$ such that $T(e_i) = \lambda_i e_i$ for $1 \le i \le n$. Define $T_d: V \to V$ by $T_d(e_i) = \sqrt[d]{\lambda_i}\,e_i$ for $1 \le i \le n$ and extend it by linearity. Then $T_d^d(e_i) = (\sqrt[d]{\lambda_i})^d e_i = \lambda_i e_i = T(e_i)$ for $1 \le i \le n$, thus $T_d^d = T$. Moreover, $T_d$ is symmetric and positive definite: indeed, in the orthonormal basis $e_1, \ldots, e_n$ the matrix of $T_d$ is diagonal with positive entries.
Next, we prove that $T_d$ is a polynomial in $T$. It suffices to prove that there is a polynomial $P$ such that $P(\lambda_i) = \sqrt[d]{\lambda_i}$ for $1 \le i \le n$, as then
\[
P(T)(e_i) = P(\lambda_i) e_i = \sqrt[d]{\lambda_i}\,e_i = T_d(e_i),
\]
thus $P(T) = T_d$. In order to prove the existence of $P$, let us assume without loss of generality that the distinct numbers appearing in the list $\lambda_1, \ldots, \lambda_n$ are $\lambda_1, \ldots, \lambda_k$ for some $1 \le k \le n$. It is enough to construct a polynomial $P$ such that $P(\lambda_i) = \sqrt[d]{\lambda_i}$ for $1 \le i \le k$. Simply take the Lagrange interpolation polynomial associated with the data $(\lambda_1, \ldots, \lambda_k)$ and $(\sqrt[d]{\lambda_1}, \ldots, \sqrt[d]{\lambda_k})$.
Let us prove now that $T_d$ is unique. Let $S$ be a symmetric positive definite linear transformation such that $S^d = T$. Then $S$ commutes with $T = S^d$, thus it also commutes with any polynomial in $T$. It follows from the previous paragraph that $S$ commutes with $T_d$. Since $S$ and $T_d$ are diagonalizable and they commute, there is a basis $f_1, \ldots, f_n$ of $V$ in which the matrices of $S$ and $T_d$ are both diagonal, say $D_1$ and $D_2$. Note that the entries $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ of $D_1$, respectively $D_2$, are positive (since they are the eigenvalues of $S$ and $T_d$) and they satisfy $a_i^d = b_i^d$ for $1 \le i \le n$ (since $S^d = T_d^d = T$). It follows that $a_i = b_i$ for $1 \le i \le n$, and then $D_1 = D_2$ and $S = T_d$. Thus $T_d$ is unique. The problem is solved. □
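For matrices, the construction in the proof is directly computable from the spectral decomposition. A minimal sketch, assuming NumPy (the helper `dth_root` is our own name, not a library routine):

```python
import numpy as np

def dth_root(A, d):
    """d-th root of a symmetric positive definite matrix, via Theorem 10.101."""
    w, P = np.linalg.eigh(A)                 # A = P diag(w) P^t with w > 0
    return P @ np.diag(w ** (1.0 / d)) @ P.T

A = np.array([[2.0, 1.0], [1.0, 2.0]])       # symmetric positive definite
B = dth_root(A, 3)
print(np.allclose(B @ B @ B, A))             # B^3 = A: True
```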
Remark 10.104. a) As the proof shows, the same result applies to symmetric positive (but not necessarily positive definite) linear transformations and matrices (of course, the resulting transformation $T_d$, respectively matrix $A_d$, will also be symmetric positive, but not necessarily positive definite).
b) We will simply write $\sqrt[d]{T}$, respectively $\sqrt[d]{A}$, for the linear transformation $T_d$, respectively the matrix $A_d$, in the previous problem.

Consider now a matrix $A \in M_n(\mathbb{R})$. The matrix ${}^tA\,A$ is then symmetric and positive. By the previous problem (and the remark following it), there is a unique
symmetric positive matrix $S = \sqrt{{}^tA\,A}$ such that $S^2 = {}^tA\,A$. Suppose now that $A$ is invertible; then $S$ is invertible (because ${}^tA\,A = S^2$ is invertible) and so we can define
\[
U = AS^{-1}.
\]
Then, taking into account that $S$ is symmetric, we obtain
\[
{}^tU\,U = {}^tS^{-1}\,{}^tA\,A\,S^{-1} = S^{-1} S^2 S^{-1} = I_n,
\]
that is, $U$ is orthogonal. We have just obtained half of the following important

Theorem 10.105 (Polar Decomposition, Invertible Case). Let $A \in M_n(\mathbb{R})$ be an invertible matrix. There is a unique pair $(S, U)$ with $S$ a symmetric positive definite matrix and $U$ an orthogonal matrix such that $A = US$.

Proof. The existence part follows from the previous discussion; it remains to establish the uniqueness of $U$ and $S$. Suppose that $A = US$ with $U$ orthogonal and $S$ symmetric positive definite. Then
\[
{}^tA\,A = S\,{}^tU\,U\,S = S^2,
\]
and by the uniqueness part in Problem 10.103 we deduce that $S = \sqrt{{}^tA\,A}$, and then $U = AS^{-1}$. Hence $U$ and $S$ are unique. □
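Combining the square root above with the definition $U = AS^{-1}$ gives a small polar-decomposition sketch (again assuming NumPy and an invertible $A$; `polar` is our own helper, not a library routine):

```python
import numpy as np

def polar(A):
    """A = U S with U orthogonal, S symmetric positive definite (A invertible)."""
    w, P = np.linalg.eigh(A.T @ A)           # ^tA A = P diag(w) P^t
    S = P @ np.diag(np.sqrt(w)) @ P.T        # S = sqrt(^tA A)
    U = A @ np.linalg.inv(S)
    return U, S

A = np.array([[3.0, 1.0], [0.0, 2.0]])
U, S = polar(A)
print(np.allclose(U @ U.T, np.eye(2)),       # U orthogonal: True
      np.allclose(U @ S, A))                 # A = U S: True
```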
One may wonder what happens when $A = [a_{ij}]$ is no longer invertible. We will prove that we still have a decomposition $A = US$ with $U$ orthogonal and $S$ symmetric positive (not necessarily positive definite). The pair $(S, U)$ is, however, no longer unique (if $A = O_n$, then $A = UO_n$ for any orthogonal matrix $U$). The existence of the decomposition in the case when $A$ is no longer invertible is rather tricky. We will consider the matrices $A_k = A + \frac{1}{k} I_n$. There exists $k_0$ such that for all $k > k_0$ the matrix $A_k$ is invertible (because $A$ has only finitely many eigenvalues). By the previous theorem applied to $A_k$ we can find an orthogonal matrix $U_k$ and a symmetric positive definite matrix $S_k$ such that
\[
A_k = U_k S_k.
\]
Write $U_k = [u_{ij}^{(k)}]$ and $S_k = [s_{ij}^{(k)}]$. Since $U_k$ is orthogonal, the sum of squares of the elements in each column of $U_k$ equals 1, thus $u_{ij}^{(k)} \in [-1, 1]$ for all $i, j \in \{1, \ldots, n\}$ and all $k > k_0$. By a classical result in real analysis, any sequence of numbers between $-1$ and 1 has a convergent subsequence (this is saying that the interval $[-1, 1]$ is compact). Applying this result $n^2$ times (for each pair $i, j \in \{1, 2, \ldots, n\}$) we deduce the existence of a sequence $k_0 < k_1 < k_2 < \ldots$ such that
\[
u_{ij} := \lim_{l\to\infty} u_{ij}^{(k_l)}
\]
exists for all $i, j \in \{1, 2, \ldots, n\}$. We claim that the matrix $U = [u_{ij}]$ is orthogonal. Indeed, passing to the limit in each entry of the equality ${}^tU_{k_l} U_{k_l} = I_n$ yields ${}^tU\,U = I_n$. Moreover, since
\[
S_{k_l} = U_{k_l}^{-1} A_{k_l} = {}^tU_{k_l} A_{k_l},
\]
and since each $(i,j)$-entry of ${}^tU_{k_l}$ converges (when $l \to \infty$) to $u_{ji}$ and each $(i,j)$-entry of $A_{k_l}$ converges (when $l \to \infty$) to $a_{ij}$, we deduce that for all $i, j \in \{1, 2, \ldots, n\}$ the sequence $(s_{ij}^{(k_l)})_l$ converges to some $s_{ij}$, the matrix $S = [s_{ij}]$ is symmetric, and
\[
S = {}^tU\,A,
\]
that is, $A = US$. It remains to check that $S$ is positive, but if $X \in \mathbb{R}^n$, then passing to the limit in the inequality ${}^tX S_{k_l} X \ge 0$ yields ${}^tX S X \ge 0$, thus $S$ is positive. All in all, we have just proved the following:

Theorem 10.106 (Polar Decomposition, The General Case). Any matrix $A \in M_n(\mathbb{R})$ can be written as the product of an orthogonal matrix and of a symmetric positive matrix.

Note that if $A = US$, then necessarily
\[
{}^tA\,A = S^2
\]
and so $S = \sqrt{{}^tA\,A}$ is uniquely determined. We call the eigenvalues of $S$ the singular values of $A$. For more information about these, see the problems section.
Problem 10.107. Let V be an Euclidean space and let T be a symmetric linear
transformation on V . Let 1 ; : : : ; n be the eigenvalues of T . Prove that
jjT .x/jj
D max j i j:
1in
x2V f0g jjxjj
sup
Solution. By renumbering the eigenvalues, we may assume that maxi j i j D j n j.
Let e1 ; : : : ; en be an orthonormal basis of V in which T .ei / D i ei for 1 i n.
If x 2 V f0g, we can write x D x1 e1 C : : : C xn en for some real numbers xi , and
we have
T .x/ D
n
X
iD1
i xi ei :
10.8 The Spectral Theorem for Symmetric Linear Transformations and Matrices
Thus
jjT .x/jj
D
jjxjj
since
2 2
i xi sP
n
475
2 2
i xi
j nj
2
iD1 xi
PiD1
n
2 2
n xi for 1 i n. We conclude that
jjT .x/jj
j n j:
x2V f0g jjxjj
sup
Since
jjT .en /jj
D j n j;
jjen jj
we deduce that the previous inequality is actually an equality, which yields the
desired result.
t
u
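A quick numerical illustration, assuming NumPy: the eigenvalue computation gives the exact value of the supremum, while random vectors only probe it from below:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, -3.0]])       # symmetric
w = np.linalg.eigvalsh(A)
print(max(abs(w)))                            # max |lambda_i|

X = np.random.randn(2, 100000)                # random nonzero vectors
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.max())                           # approaches max |lambda_i| from below
```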
Problem 10.108. Find all nilpotent symmetric matrices $A \in M_n(\mathbb{R})$.

Solution. If $A$ is nilpotent, then all eigenvalues of $A$ are 0. If $A$ is moreover symmetric, then it is diagonalizable and so it must be $O_n$. Thus only the zero matrix is simultaneously symmetric and nilpotent. □

Problem 10.109. Let $A$ be a symmetric matrix with real entries and suppose that $A^k = I_n$ for some positive integer $k$. Prove that $A^2 = I_n$.

Solution. Since $A$ is symmetric and has real entries, its complex eigenvalues are actually real. Since they are moreover $k$th roots of unity, they must be $\pm 1$. Thus all eigenvalues of $A^2$ are equal to 1. Since $A^2$ is symmetric, it is diagonalizable, and since all of its eigenvalues are 1, we must have $A^2 = I_n$. □
Problem 10.110. Let $A \in M_n(\mathbb{R})$ be a symmetric positive matrix. Prove that
\[
\sqrt[n]{\det A} \le \frac{1}{n}\,\mathrm{Tr}(A).
\]

Solution. $\det A$ and $\mathrm{Tr}(A)$ do not change if we replace $A$ with any matrix similar to it. Using the spectral theorem, we may therefore assume that $A$ is diagonal. Since $A$ is positive, its diagonal entries $a_i := a_{ii}$ are nonnegative numbers. It suffices therefore to prove that
\[
\sqrt[n]{a_1 a_2 \ldots a_n} \le \frac{a_1 + a_2 + \ldots + a_n}{n}
\]
for all nonnegative real numbers $a_1, \ldots, a_n$. This is the AM-GM inequality. Let us recall the proof: the inequality is clear if one of the $a_i$'s is 0. If all $a_i$ are positive, the inequality is a consequence of the convexity of $x \mapsto e^x$ (more precisely, of Jensen's inequality applied to $\ln a_1, \ldots, \ln a_n$). □
Problem 10.111. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a symmetric positive matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$. Prove that if $f: [0, \infty) \to \mathbb{R}$ is a convex function, then
\[
f(a_{11}) + f(a_{22}) + \ldots + f(a_{nn}) \le f(\lambda_1) + \ldots + f(\lambda_n).
\]

Solution. Since $A$ is symmetric and positive, there is an orthogonal matrix $P$ such that $A = PDP^{-1}$, where $D$ is the diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. Let $P = [p_{ij}]$; then the equality $A = PD\,{}^tP$ yields
\[
a_{ij} = \sum_{k=1}^n p_{ik}\,\lambda_k\,p_{jk}.
\]
Since $P$ is orthogonal, we have $\sum_{k=1}^n p_{ik}^2 = 1$ for all $i$, and since $f$ is convex, we deduce that
\[
f(a_{ii}) = f\Big(\sum_{k=1}^n p_{ik}^2 \lambda_k\Big) \le \sum_{k=1}^n p_{ik}^2\, f(\lambda_k).
\]
Adding up these inequalities yields
\[
\sum_{i=1}^n f(a_{ii}) \le \sum_{i=1}^n \sum_{k=1}^n p_{ik}^2\, f(\lambda_k) = \sum_{k=1}^n f(\lambda_k) \sum_{i=1}^n p_{ik}^2 = \sum_{k=1}^n f(\lambda_k),
\]
the last equality being again a consequence of the fact that $P$ is orthogonal. The result follows. □
Problem 10.112. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a symmetric positive matrix. Prove that
\[
\det A \le a_{11} a_{22} \ldots a_{nn}.
\]

Solution. If $\det A = 0$, then everything is clear, since $a_{ii} = {}^te_i A e_i \ge 0$ for all $i$, where $e_1, \ldots, e_n$ is the canonical basis of $\mathbb{R}^n$. So suppose that $\det A > 0$; then $A$ is positive definite and $a_{ii} > 0$, since $e_i \ne 0$. If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$, then $\det A = \lambda_1 \ldots \lambda_n$, thus the inequality is equivalent to
\[
\sum_{k=1}^n \log \lambda_k \le \sum_{k=1}^n \log a_{kk}.
\]
This follows from Problem 10.111 applied to the convex function $f(x) = -\log x$. □
Problem 10.113 (Hadamard's Inequality). Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be an arbitrary matrix. Prove that
\[
|\det A|^2 \le \prod_{i=1}^n \Big(\sum_{j=1}^n a_{ij}^2\Big).
\]

Solution. We will apply Problem 10.112 to the matrix $B = A\,{}^tA$, which is symmetric and positive. Note that $\det B = (\det A)^2$ and $b_{ii} = \sum_{j=1}^n a_{ij}^2$ for all $i$. The result follows therefore from Problem 10.112. □
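The inequality is easy to test numerically on random matrices; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.random.randn(4, 4)
lhs = np.linalg.det(A) ** 2
rhs = np.prod(np.sum(A**2, axis=1))          # product of squared row norms
print(lhs <= rhs + 1e-12)                    # True (up to rounding)
```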
10.8.1 Problems for Practice

1. Give an example of a symmetric matrix with complex coefficients which is not diagonalizable.
2. Let $T$ be a linear transformation on a Euclidean space $V$, and suppose that $V$ has an orthonormal basis consisting of eigenvectors of $T$. Prove that $T$ is symmetric (thus the converse of the spectral theorem holds).
3. Consider the matrix
\[
A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \end{pmatrix}.
\]
a) Explain why $A$ is diagonalizable in $M_3(\mathbb{R})$.
b) Find an orthogonal matrix $P$ such that $P^{-1}AP$ is diagonal.
4. Find an orthogonal basis consisting of eigenvectors for the matrix
\[
A = \frac{1}{7}\begin{pmatrix} 2 & 6 & 3 \\ 6 & -3 & 2 \\ 3 & 2 & -6 \end{pmatrix}.
\]
5. Let $A \in M_n(\mathbb{R})$ be a nilpotent matrix such that $A\,{}^tA = {}^tA\,A$. Prove that $A = O_n$. Hint: prove that $B = A\,{}^tA$ is nilpotent.
6. Let $A \in M_n(\mathbb{R})$ be a matrix. Prove that $A\,{}^tA$ and ${}^tA\,A$ are similar (in fact $A$ and ${}^tA$ are always similar matrices, but the proof of this innocent-looking statement is much harder and requires Jordan's classification theorem). Hint: both these matrices are symmetric, hence diagonalizable.
7. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. Prove that
\[
\mathrm{rank}(A) \ge \frac{(\mathrm{Tr}(A))^2}{\mathrm{Tr}(A^2)}.
\]
Hint: consider an orthonormal basis of eigenvectors for $A$.
8. The entries of a matrix A 2 Mn .R/ are between 1 and 1. Prove that
j det Aj nn=2 :
Hint: use Hadamard’s inequality.
9. Let A; B 2 Mn .R/ be matrices such that t AA D t BB. Prove that there is
an orthogonal matrix U 2 Mn .R/ such that B D UA. Hint: use the polar
decomposition.
10. (The Courant–Fischer theorem) Let E be a Euclidean space of dimension n and let $p \in [1, n]$ be an integer. Let T be a symmetric linear transformation on E and let $\lambda_1 \le \dots \le \lambda_n$ be its eigenvalues.
a) Let $e_1, \dots, e_n$ be an orthonormal basis of E such that $T(e_i) = \lambda_i e_i$ for all $1 \le i \le n$ and let $F = \operatorname{Span}(e_1, \dots, e_p)$. Prove that
$$\max_{\substack{x \in F \\ \|x\| = 1}} \langle T(x), x \rangle \le \lambda_p.$$
b) Let F be a subspace of E of dimension p. Prove that $F \cap \operatorname{Span}(e_p, \dots, e_n)$ is nonzero and deduce that
$$\max_{\substack{x \in F \\ \|x\| = 1}} \langle T(x), x \rangle \ge \lambda_p.$$
c) Prove the Courant–Fischer theorem:
$$\lambda_p = \min_{\substack{F \subset E \\ \dim F = p}} \ \max_{\substack{x \in F \\ \|x\| = 1}} \langle T(x), x \rangle,$$
the minimum being taken over all subspaces F of E of dimension p.
11. Find all matrices $A \in M_n(\mathbb{R})$ satisfying $A\,{}^tA\,A = I_n$. Hint: start by proving that any solution of the problem is a symmetric matrix.
12. Find all symmetric matrices $A \in M_n(\mathbb{R})$ such that
$$A + A^3 + A^5 = 3I_n.$$
13. Let $A, B \in M_n(\mathbb{R})$ be symmetric positive matrices.
a) Let $e_1, \dots, e_n$ be an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of B, say $Be_i = \lambda_i e_i$. Let $\mu_i = \langle Ae_i, e_i \rangle$. Explain why $\lambda_i, \mu_i \ge 0$ for all i and why
$$\operatorname{Tr}(A) = \sum_{i=1}^n \mu_i \quad \text{and} \quad \operatorname{Tr}(AB) = \sum_{i=1}^n \lambda_i \mu_i.$$
b) Prove that
$$\operatorname{Tr}(AB) \le \operatorname{Tr}(A) \operatorname{Tr}(B).$$
14. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a symmetric matrix and let $\lambda_1, \dots, \lambda_n$ be its eigenvalues (counted with multiplicities). Prove that
$$\sum_{i,j=1}^n a_{ij}^2 = \sum_{i=1}^n \lambda_i^2.$$
15. (Cholesky's decomposition) Let A be a symmetric positive definite matrix in $M_n(\mathbb{R})$. Prove that there is a unique upper-triangular matrix $T \in M_n(\mathbb{R})$ with positive diagonal entries such that
$$A = {}^tT\,T.$$
Hint: for the existence part, consider the inner product $\langle x, y \rangle_1 = \langle Ax, y \rangle$ on $\mathbb{R}^n$ (with $\langle\ ,\ \rangle$ the canonical inner product on $\mathbb{R}^n$), apply the Gram–Schmidt process to the canonical basis B of $\mathbb{R}^n$ and to the inner product $\langle\ ,\ \rangle_1$, and consider the change of basis matrix from B to the basis given by the Gram–Schmidt process.
16. a) Let V be a Euclidean space and let T be a linear transformation on V. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of ${}^tT \circ T$. Prove that
$$\sup_{x \in V \setminus \{0\}} \frac{\|T(x)\|}{\|x\|} = \max_{1 \le i \le n} \sqrt{\lambda_i}.$$
b) Let V be a Euclidean space and let T be a symmetric linear transformation on V. Let $\lambda_1 \le \dots \le \lambda_n$ be the eigenvalues of T. Prove that
$$\sup_{x \in V \setminus \{0\}} \frac{\langle T(x), x \rangle}{\|x\|^2} = \lambda_n.$$
17. Let $A, B \in M_n(\mathbb{R})$ be symmetric matrices. Define a map $f : \mathbb{R} \to \mathbb{R}$ by letting $f(t)$ be the largest eigenvalue of $A + tB$. Prove that f is a convex function. Hint: use Problem 16.
18. Let T be a diagonalizable linear transformation on a Euclidean space V. Prove that if T and ${}^tT$ commute, then T is symmetric.
19. Let V be the vector space of polynomials with real coefficients whose degree does not exceed n, endowed with the inner product
$$\langle P, Q \rangle = \int_0^1 P(x)Q(x)\,dx.$$
Consider the map $T : V \to V$ defined by
$$T(P)(X) = \int_0^1 (X + t)^n P(t)\,dt.$$
a) Give a precise meaning to $T(P)(X)$ and prove that T is a symmetric linear transformation on V.
b) Let $P_0, \dots, P_n$ be an orthonormal basis of V consisting of eigenvectors for T, with corresponding eigenvalues $\lambda_0, \dots, \lambda_n$. Prove that for all $x, y \in \mathbb{R}$ we have
$$(x + y)^n = \sum_{k=0}^n \lambda_k P_k(x) P_k(y).$$
20. Prove that if A, B are symmetric positive matrices in $M_n(\mathbb{R})$, then
$$\det(A + B) \ge \det A + \det B.$$
21. a) Prove that if $x_1, \dots, x_n$ are real numbers and $\lambda_1, \dots, \lambda_n$ are positive real numbers, then
$$\left( \sum_{i=1}^n \lambda_i x_i^2 \right) \left( \sum_{i=1}^n \lambda_i^{-1} x_i^2 \right) \ge \left( \sum_{i=1}^n x_i^2 \right)^2.$$
b) Prove that if T is a symmetric and positive definite linear transformation on a Euclidean space V, then for all $x \in V$ we have
$$\langle T(x), x \rangle \cdot \langle T^{-1}(x), x \rangle \ge \|x\|^4.$$
22. a) Prove that if $\lambda_1, \dots, \lambda_n$ are nonnegative real numbers, then
$$\sqrt[n]{(1 + \lambda_1) \cdots (1 + \lambda_n)} \ge 1 + \sqrt[n]{\lambda_1 \cdots \lambda_n}.$$
Hint: check that the map $f(x) = \ln(1 + e^x)$ is convex on $[0, \infty)$ and use Jensen's inequality.
b) Let $A \in M_n(\mathbb{R})$ be a symmetric positive definite matrix. Prove that
$$\sqrt[n]{\det(I_n + A)} \ge 1 + \sqrt[n]{\det A}.$$
23. (Singular value decomposition) Let $\mu_1, \dots, \mu_n$ be the singular values of $A \in M_n(\mathbb{R})$, counted with multiplicities (algebraic or geometric, it does not matter since S is diagonalizable). A numerical illustration appears after this list of problems.
a) Prove the existence of orthonormal bases $e_1, \dots, e_n$ and $f_1, \dots, f_n$ of $\mathbb{R}^n$ such that $Ae_i = \mu_i f_i$ for $1 \le i \le n$. Hint: let $A = US$ be the polar decomposition of A. Pick an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that $Se_i = \mu_i e_i$ and set $f_i = Ue_i$.
b) Prove that if $e_1, \dots, e_n$ and $f_1, \dots, f_n$ are bases as in a), then for all $X \in \mathbb{R}^n$ we have
$$AX = \sum_{i=1}^n \mu_i \langle X, e_i \rangle f_i.$$
We call this the singular value decomposition of A.
c) Let $e_1, \dots, e_n$ and $f_1, \dots, f_n$ be orthonormal bases of $\mathbb{R}^n$ giving a singular value decomposition of an invertible matrix A. Prove that the singular value decomposition of $A^{-1}$ is given by
$$A^{-1}X = \sum_{j=1}^n \frac{1}{\mu_j} \langle X, f_j \rangle e_j.$$
d) Prove that two matrices $A_1, A_2 \in M_n(\mathbb{R})$ have the same singular values if and only if there are orthogonal matrices $U_1, U_2$ such that
$$A_2 = U_1 A_1 U_2.$$
e) Prove that A is invertible if and only if 0 is not a singular value of A.
f) Compute the rank of A in terms of the singular values of A.
g) Prove that A is an orthogonal matrix if and only if all of its singular values are equal to 1.
24. The goal of this long exercise is to establish the analogues of the main results of this section for hermitian spaces.
Let V be a hermitian space, that is, a finite dimensional $\mathbb{C}$-vector space endowed with a hermitian inner product $\langle\ ,\ \rangle$. A linear transformation $T : V \to V$ is called hermitian if $\langle T(x), y \rangle = \langle x, T(y) \rangle$ for all $x, y \in V$.
a) Let $e_1, \dots, e_n$ be an orthonormal basis of V. Prove that T is hermitian if and only if the matrix A of T with respect to $e_1, \dots, e_n$ is hermitian, that is $A = A^*$ (recall that $A^* = {}^t\overline{A}$).
From now on, until part e), we let T be a hermitian linear transformation on V.
b) Prove that the eigenvalues of T are real numbers.
c) Prove that if W is a subspace of V stable under T, then $W^\perp$ is also stable under T, and the restrictions of T to W and $W^\perp$ are hermitian linear transformations on these subspaces.
d) Prove that there is an orthonormal basis of V consisting of eigenvectors of T.
e) Conversely, prove that if V has an orthonormal basis consisting of eigenvectors of T with real eigenvalues, then T is hermitian.
f) Prove that for any hermitian matrix $A \in M_n(\mathbb{C})$ we can find a unitary matrix P and a diagonal matrix D with real entries such that $A = P^{-1}DP$.
g) Let $T : V \to V$ be any invertible linear transformation. Prove that there is a unique pair $(H, U)$ of linear transformations on V such that H is hermitian positive (i.e., H is hermitian and its eigenvalues are positive), U is unitary and $T = U \circ H$.
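The following sketch (an editorial illustration, not from the original text) spot-checks Problem 23 with numpy: np.linalg.svd returns factors with $A = U \operatorname{diag}(s)\,V^t$, so the rows of $V^t$ and the columns of $U$ play the roles of the bases $(e_i)$ and $(f_i)$; the random test matrix is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(A)                  # A = U @ diag(s) @ Vt

for i in range(4):
    e_i, f_i = Vt[i], U[:, i]                # orthonormal bases of R^n
    assert np.allclose(A @ e_i, s[i] * f_i)  # A e_i = mu_i f_i, as in part a)

print("singular values:", s)
print("rank:", np.sum(s > 1e-12))            # part f): rank = number of nonzero mu_i
```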
Chapter 11
Appendix: Algebraic Prerequisites
Abstract This appendix recalls the basic algebraic structures that are needed in the study of linear algebra, with special emphasis on permutations and polynomials. Even though the main objects of this book are vector spaces and linear maps between them, groups and polynomials naturally appear at several key moments in the development of linear algebra. In this brief chapter we define these objects and state the main properties that will be needed in the sequel. The reader is advised to skip this chapter on a first reading and to return to it whenever it is referenced.
11.1 Groups
Morally, a group is just a set in which one can multiply objects of the set (staying in that set) according to some rather natural rules. Formally, we have the following definition.

Definition 11.1. A group is a nonempty set G endowed with a map $* : G \times G \to G$ satisfying the following properties:
a) (associativity) For all $a, b, c \in G$ we have $(a * b) * c = a * (b * c)$.
b) (identity) There is an element $e \in G$ such that $a * e = e * a = a$ for all $a \in G$.
c) (existence of inverses) For all $a \in G$ there is $a^{-1} \in G$ such that $a * a^{-1} = a^{-1} * a = e$.
If moreover $a * b = b * a$ for all $a, b \in G$, we say that the group G is commutative or abelian.

Note that the element e of G is unique. Indeed, if $e'$ is another element with the same properties, then $e' = e' * e = e * e' = e$. We call e the identity element of G. Secondly, the element $a^{-1}$ is also unique, for if x is another element with the same properties, then
$$x = x * e = x * (a * a^{-1}) = (x * a) * a^{-1} = e * a^{-1} = a^{-1}.$$
We call $a^{-1}$ the inverse of a.
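As a small computational aside (an editorial addition, not from the text), the group axioms can be verified by brute force for a finite example such as $\mathbb{Z}/5\mathbb{Z}$ under addition mod 5:

```python
# Brute-force check of the group axioms for Z/5Z under addition mod 5
n = 5
G = range(n)
op = lambda a, b: (a + b) % n
assert all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)
assert all(op(a, 0) == op(0, a) == a for a in G)            # 0 is the identity
assert all(any(op(a, b) == 0 for b in G) for a in G)        # inverses exist
assert all(op(a, b) == op(b, a) for a in G for b in G)      # the group is abelian
print("Z/5Z is an abelian group")
```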
We will usually write ab instead of $a * b$. Moreover, if the group G is abelian, we will usually prefer the additive notation $a + b$ instead of ab and write 0 instead of e, and $-a$ instead of $a^{-1}$.

Since the definition of a group is not restrictive, there is a huge number of interesting groups. For instance, all vector spaces (which we haven't properly defined yet, but which are the main actors of this book) are examples of commutative groups. There are many other groups, which we will see in action further on: groups of permutations of a set, groups of invertible linear transformations of a vector space, the group of positive real numbers or the group of integers, etc.
11.2 Permutations
11.2.1 The Symmetric Group Sn
A bijective map $\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$ is called a permutation of degree n. We usually describe a permutation by a table
$$\sigma = \begin{pmatrix} 1 & 2 & \dots & n \\ \sigma(1) & \sigma(2) & \dots & \sigma(n) \end{pmatrix},$$
where the second line represents the images of $1, 2, \dots, n$ by $\sigma$.

The set of all permutations of degree n is denoted by $S_n$. It is not difficult to see that $S_n$ has $n!$ elements: we have n choices for $\sigma(1)$, $n - 1$ choices for $\sigma(2)$ (as it can be any element different from $\sigma(1)$), ..., one choice for $\sigma(n)$, thus $n(n-1) \cdots 1 = n!$ choices in total.

We denote by e the identity map sending k to k for $1 \le k \le n$, thus
$$e = \begin{pmatrix} 1 & 2 & \dots & n \\ 1 & 2 & \dots & n \end{pmatrix}.$$
The product of two permutations $\sigma, \tau \in S_n$ is defined as the composition $\sigma \circ \tau$. Thus for all $1 \le k \le n$
$$(\sigma\tau)(k) = \sigma(\tau(k)).$$
Example 11.2. Let $\sigma, \tau \in S_4$ be the permutations given by
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix} \quad \text{and} \quad \tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix}.$$
Then
$$\sigma\tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 1 & 3 \end{pmatrix}$$
and
$$\tau\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 2 & 3 \end{pmatrix}.$$

Since $\sigma$ and $\tau$ are bijections, so is their composition, and so $\sigma\tau \in S_n$. The easy proof of the following theorem is left to the reader.
Theorem 11.3. Endowed with the previously defined multiplication, $S_n$ is a group with $n!$ elements.

Note that the inverse of a permutation $\sigma$ with respect to multiplication is simply its inverse as a bijective map (i.e., $\sigma^{-1}$ is the unique map such that $\sigma^{-1}(x) = y$ whenever $\sigma(y) = x$). For example, the inverse of the permutation
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 4 & 5 & 1 & 3 \end{pmatrix}$$
is the permutation
$$\sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 1 & 5 & 2 & 3 \end{pmatrix}.$$

The previous Example 11.2 shows that we generally have $\sigma\tau \ne \tau\sigma$, thus $S_n$ is a noncommutative group in general (actually for all $n \ge 3$, the groups $S_1$ and $S_2$ being commutative). The group $S_n$ is called the symmetric group of degree n or the group of permutations of degree n.
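These conventions are easy to experiment with on a computer. The sketch below (an editorial addition, not from the text) represents a permutation as the tuple of its 1-based images and reproduces the computations of Example 11.2 and of the inverse above:

```python
def compose(sigma, tau):
    # (sigma tau)(k) = sigma(tau(k)); permutations are tuples of 1-based images
    return tuple(sigma[tau[k] - 1] for k in range(len(sigma)))

def inverse(sigma):
    inv = [0] * len(sigma)
    for y, x in enumerate(sigma, start=1):   # sigma(y) = x means inverse(x) = y
        inv[x - 1] = y
    return tuple(inv)

sigma = (2, 3, 4, 1)                         # the sigma of Example 11.2
tau   = (3, 1, 4, 2)                         # the tau of Example 11.2
print(compose(sigma, tau))                   # (4, 2, 1, 3), i.e., sigma tau
print(compose(tau, sigma))                   # (1, 4, 2, 3), i.e., tau sigma
print(inverse((2, 4, 5, 1, 3)))              # (4, 1, 5, 2, 3), as computed above
```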
Problem 11.4. Let $\sigma \in S_n$, where $n \ge 3$. Prove that if $\sigma\alpha = \alpha\sigma$ for all permutations $\alpha \in S_n$, then $\sigma = e$.

Solution. Fix $i \in \{1, 2, \dots, n\}$ and choose a permutation $\alpha$ having i as unique fixed point, for instance
$$\alpha = \begin{pmatrix} 1 & 2 & \dots & i-1 & i & i+1 & \dots & n \\ 2 & 3 & \dots & i+1 & i & i+2 & \dots & 1 \end{pmatrix}.$$
Since
$$\sigma(i) = \sigma(\alpha(i)) = \alpha(\sigma(i))$$
and i is the unique fixed point of $\alpha$, we must have $\sigma(i) = i$. As i was arbitrary, the result follows. □
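For small n the claim can also be confirmed by brute force (an editorial check, not part of the original solution):

```python
from itertools import permutations

def compose(sigma, tau):
    # (sigma tau)(k) = sigma(tau(k)), with permutations as tuples of 1-based images
    return tuple(sigma[tau[k] - 1] for k in range(len(sigma)))

S4 = list(permutations(range(1, 5)))
center = [s for s in S4 if all(compose(s, a) == compose(a, s) for a in S4)]
print(center)   # [(1, 2, 3, 4)]: only the identity commutes with all of S_4
```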
11.2.2 Transpositions as Generators of Sn
The group $S_n$ has a special class of elements which have a rather simple structure and which determine the whole group $S_n$, in the sense that any element of $S_n$ is a product of some of these elements. They are called transpositions and are defined as follows.

Definition 11.5. Let $i, j \in \{1, 2, \dots, n\}$ be distinct. The transposition $(ij)$ is the permutation sending k to k for all $k \ne i, j$ and for which $\sigma(i) = j$ and $\sigma(j) = i$. Thus $(ij)$ exchanges i and j, while keeping all the other elements fixed.

It follows straight from the definition that a transposition $\tau$ satisfies $\tau^2 = e$ and so $\tau^{-1} = \tau$. Note also that the set $\{i, j\}$ is uniquely determined by the transposition $(ij)$, since it is exactly the set of those $k \in \{1, 2, \dots, n\}$ for which $(ij)(k) \ne k$. Since there are
$$\binom{n}{2} = \frac{n(n-1)}{2}$$
subsets with two elements of $\{1, 2, \dots, n\}$, it follows that there are $\binom{n}{2}$ transpositions. Let us prove now that the group $S_n$ is generated by transpositions.
Theorem 11.6. Let $n \ge 2$. Any permutation $\sigma \in S_n$ is a product of transpositions.

Proof. For $\sigma \in S_n$ we let $m_\sigma$ be the number of elements $k \in \{1, 2, \dots, n\}$ for which $\sigma(k) \ne k$. We prove the theorem by induction on $m_\sigma$. If $m_\sigma = 0$, then $\sigma = e = (12)^2$ and we are done.

Assume that $m_\sigma > 0$ and that the statement holds for all permutations $\alpha \in S_n$ with $m_\alpha < m_\sigma$. Since $m_\sigma > 0$, there is $i \in \{1, 2, \dots, n\}$ such that $\sigma(i) \ne i$. Let $j = \sigma(i)$, $\tau = (ij)$ and $\alpha = \tau\sigma$. Let $A = \{k : \alpha(k) \ne k\}$ and $B = \{k : \sigma(k) \ne k\}$. Note that if $\sigma(k) = k$, then $k \ne i$ and $k \ne j$, hence
$$\alpha(k) = (\tau\sigma)(k) = \tau(\sigma(k)) = \tau(k) = k.$$
This shows that $A \subset B$. Moreover, we have $A \ne B$, since i belongs to B but not to A: indeed, $\alpha(i) = \tau(\sigma(i)) = \tau(j) = i$. It follows that $m_\alpha < m_\sigma$.

Using the induction hypothesis, we can write $\alpha$ as a product of transpositions. Since $\sigma = \tau^{-1}\alpha = \tau\alpha$, $\sigma$ itself is a product of transpositions and we are done. □
Note that the proof of the theorem also gives an algorithm allowing us to express a given permutation as a product of transpositions. Let us see a concrete example. Let
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 1 & 3 \end{pmatrix}.$$
Since $\sigma(1) = 2$, we compute $\sigma(12)$ in order to create a fixed point:
$$\sigma_1 = \sigma(12) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 3 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 4 & 1 & 3 \end{pmatrix}.$$
Because $\sigma_1(1) = 5$, we compute $\sigma_1(15)$ to create a new fixed point:
$$\sigma_2 = \sigma_1(15) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 4 & 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 3 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 4 & 1 & 5 \end{pmatrix}.$$
Computing $\sigma_2(13)$ we obtain a new fixed point:
$$\sigma_3 = \sigma_2(13) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 2 & 3 & 1 & 5 \end{pmatrix}.$$
Now, observe that $\sigma_3 = (14)$, thus $\sigma_3(14) = e$. We deduce that $\sigma(12)(15)(13)(14) = e$ and so
$$\sigma = (14)(13)(15)(12).$$
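The procedure above is mechanical enough to automate. A minimal sketch (an editorial addition, not from the text) that mirrors the proof of Theorem 11.6, repeatedly right-multiplying by a transposition that creates a new fixed point:

```python
def decompose(sigma):
    """Express sigma (a tuple of 1-based images) as a product of transpositions,
    following the algorithm in the proof of Theorem 11.6."""
    images = list(sigma)
    used = []
    i = 1
    while i <= len(images):
        if images[i - 1] == i:
            i += 1                       # i is already a fixed point, move on
        else:
            j = images[i - 1]
            used.append((i, j))          # right-multiply by (i j) ...
            # ... which swaps the images of i and j, making j a fixed point
            images[i - 1], images[j - 1] = images[j - 1], images[i - 1]
    return list(reversed(used))          # sigma is the product, read left to right

print(decompose((2, 5, 4, 1, 3)))        # [(1, 4), (1, 3), (1, 5), (1, 2)],
                                         # i.e., sigma = (14)(13)(15)(12) as above
```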
11.2.3 The Signature Homomorphism
An inversion of a permutation $\sigma \in S_n$ is a pair $(i, j)$ with $1 \le i < j \le n$ and $\sigma(i) > \sigma(j)$. Let $\operatorname{Inv}(\sigma)$ be the number of inversions of $\sigma$. Note that
$$0 \le \operatorname{Inv}(\sigma) \le \frac{n(n-1)}{2}, \quad \sigma \in S_n,$$
and these inequalities are optimal: $\operatorname{Inv}(e) = 0$ and $\operatorname{Inv}(\sigma) = \frac{n(n-1)}{2}$ for
$$\sigma = \begin{pmatrix} 1 & 2 & \dots & n \\ n & n-1 & \dots & 1 \end{pmatrix}.$$
Example 11.7. The permutation
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 5 & 6 & 2 & 1 & 4 & 3 \end{pmatrix}$$
has $\operatorname{Inv}(\sigma) = 4 + 4 + 1 + 1 = 10$ inversions, since $\sigma(1) > \sigma(3)$, $\sigma(1) > \sigma(4)$, $\sigma(1) > \sigma(5)$, $\sigma(1) > \sigma(6)$, $\sigma(2) > \sigma(3)$, $\sigma(2) > \sigma(4)$, $\sigma(2) > \sigma(5)$, $\sigma(2) > \sigma(6)$, $\sigma(3) > \sigma(4)$, $\sigma(5) > \sigma(6)$.
We introduce now a fundamental map $\varepsilon : S_n \to \{-1, 1\}$, the signature.

Definition 11.8. The sign of a permutation $\sigma \in S_n$ is defined by
$$\varepsilon(\sigma) = (-1)^{\operatorname{Inv}(\sigma)}.$$
If $\varepsilon(\sigma) = 1$, then we say that $\sigma$ is an even permutation, and if $\varepsilon(\sigma) = -1$, then we say that $\sigma$ is an odd permutation. Note that a transposition $\tau = (ij)$ with $i < j$ is an odd permutation, as the number of inversions of $\tau$ is $(j - i) + (j - i - 1) = 2(j - i) - 1$.

Here is the fundamental property of the signature map:

Theorem 11.9. The signature map $\varepsilon : S_n \to \{-1, 1\}$ is a homomorphism of groups, i.e., $\varepsilon(\sigma_1\sigma_2) = \varepsilon(\sigma_1)\varepsilon(\sigma_2)$ for all $\sigma_1, \sigma_2 \in S_n$.

Without giving the formal proof of this theorem, let us mention that the key point is the equality
$$\varepsilon(\sigma) = \prod_{1 \le i < j \le n} \frac{\sigma(i) - \sigma(j)}{i - j}$$
for any $\sigma \in S_n$. This follows rather easily from the definition of $\varepsilon(\sigma)$ and can be used to prove the multiplicative character of $\varepsilon$.
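Both descriptions of the signature are easy to compute, and one can verify that they agree (an editorial sketch, not from the text; the exhaustive check over $S_4$ is an illustrative choice):

```python
from itertools import permutations
from math import prod

def sign_by_inversions(sigma):
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return (-1) ** inv

def sign_by_product(sigma):
    # epsilon(sigma) = prod_{i<j} (sigma(i) - sigma(j)) / (i - j); the quotient
    # of the two integer products below is exactly +1 or -1
    n = len(sigma)
    p = prod(sigma[i] - sigma[j] for i in range(n) for j in range(i + 1, n))
    q = prod(i - j for i in range(n) for j in range(i + 1, n))
    return p // q

print(sign_by_inversions((5, 6, 2, 1, 4, 3)))    # +1, since Inv = 10 (Example 11.7)
assert all(sign_by_inversions(s) == sign_by_product(s)
           for s in permutations(range(1, 5)))   # the two formulas agree on S_4
```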
Remark 11.10. a) The signature is the unique nontrivial homomorphism $S_n \to \{-1, 1\}$. Indeed, let $\varphi : S_n \to \{-1, 1\}$ be a surjective homomorphism of groups. If $\tau_1 = (i, j)$ and $\tau_2 = (k, l)$ are two transpositions, then we can find $\sigma \in S_n$ such that $\tau_2 = \sigma\tau_1\sigma^{-1}$ (indeed, it suffices to impose $\sigma(i) = k$ and $\sigma(j) = l$). Then $\varphi(\tau_2) = \varphi(\sigma)\varphi(\tau_1)\varphi(\sigma)^{-1} = \varphi(\tau_1)$. Thus all transpositions of $S_n$ are sent to the same element of $\{-1, 1\}$, which must be $-1$, as the transpositions generate $S_n$ and $\varphi$ is not the trivial homomorphism. Thus $\varphi(\tau) = -1 = \varepsilon(\tau)$ for all transpositions $\tau$ and, using again that the transpositions generate $S_n$, it follows that $\varphi = \varepsilon$.
b) Let $\sigma \in S_n$ be a permutation, and write $\sigma = \tau_1\tau_2 \cdots \tau_k$, where $\tau_1, \tau_2, \dots, \tau_k$ are transpositions. This decomposition is definitely not unique, but the parity of k is the same in all decompositions. This is definitely not an obvious statement, but it follows easily from the previous theorem: for any such decomposition we must have $\varepsilon(\sigma) = \prod_{i=1}^k \varepsilon(\tau_i) = (-1)^k$, thus the parity of k is independent of the decomposition.
11.3 Polynomials
Let F be a field, for instance $\mathbb{R}$ or $\mathbb{C}$. The set $F[X]$ of polynomials with coefficients in F will play a key role in this chapter. In this section we recall, without proof, a few basic facts about polynomials.

Any element P of $F[X]$ can be uniquely written as a formal expression
$$P = a_0 + a_1X + \dots + a_nX^n$$
with $a_0, \dots, a_n \in F$. If $P \ne 0$, then at least one of the coefficients $a_0, \dots, a_n$ is nonzero, and we may assume that $a_n \ne 0$. We then say that P has degree n (and write $\deg P = n$) and leading coefficient $a_n$. By convention, the degree of the zero polynomial is $-\infty$. A fundamental property of polynomials with coefficients in a field is the equality
$$\deg(PQ) = \deg P + \deg Q$$
for all polynomials $P, Q \in F[X]$. We say that P is unitary or monic if its leading coefficient is 1. Polynomials of degree 0 or $-\infty$ are also called constant polynomials.

Remark 11.11. Sometimes we will write $P(X)$ instead of P for an element of $F[X]$, in order to emphasize that the variable is X.
The first fundamental result is the division algorithm:

Theorem 11.12. Let $A, B \in F[X]$ with $B \ne 0$. There is a unique pair $(Q, R)$ of elements of $F[X]$ such that $A = BQ + R$ and $\deg R < \deg B$.

The polynomials Q and R are called the quotient, respectively the remainder, of A when divided by B. We say that B divides A if $R = 0$. We say that a polynomial $P \in F[X]$ is irreducible if P is not constant, but cannot be written as the product of two nonconstant polynomials. Thus all divisors of an irreducible polynomial are either constant polynomials or constant multiples of the given polynomial. For instance, all polynomials of degree 1 are irreducible. For some special fields, these are the only irreducible polynomials:

Definition 11.13. A field F is called algebraically closed if any irreducible polynomial $P \in F[X]$ has degree 1.

An element $a \in F$ is called a root of a polynomial $P \in F[X]$ if $P(a) = 0$. In this case, the division algorithm implies the existence of a factorization
$$P = (X - a)Q$$
for some polynomial $Q \in F[X]$. Repeating this argument, we deduce that if $a_1, a_2, \dots, a_k \in F$ are pairwise distinct roots of P, then we can write
$$P = (X - a_1)(X - a_2) \cdots (X - a_k)Q$$
for some polynomial $Q \in F[X]$. Taking degrees, we obtain the following:

Theorem 11.14. A nonzero polynomial $P \in F[X]$ of degree n has at most n pairwise distinct roots in F.

Stated otherwise, if a polynomial of degree at most n vanishes at $n + 1$ distinct points of F, then it must be the zero polynomial. The notion of irreducible polynomial can also be expressed in terms of roots:

Theorem 11.15. A field F is algebraically closed if and only if any nonconstant polynomial $P \in F[X]$ has a root in F. If this is the case, then any nonconstant polynomial $P \in F[X]$ can be written as
$$P = c(X - a_1)^{n_1} \cdots (X - a_k)^{n_k}$$
for some nonzero constant $c \in F$, some pairwise distinct elements $a_1, \dots, a_k$ of F and some positive integers $n_1, \dots, n_k$.

We call $n_i$ the multiplicity of the root $a_i$ of P. It is the largest positive integer m for which $(X - a_i)^m$ divides P.

Finally, we state the fundamental theorem of algebra:

Theorem 11.16 (Gauss). The field $\mathbb{C}$ of complex numbers is algebraically closed.
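The division algorithm and root factorization over $\mathbb{R}$ can be illustrated numerically (an editorial sketch, not from the text; the cubic below is an arbitrary example with roots 1, 2, 3):

```python
from numpy.polynomial import Polynomial

A = Polynomial([-6, 11, -6, 1])     # -6 + 11X - 6X^2 + X^3 = (X-1)(X-2)(X-3)
B = Polynomial([-1, 1])             # X - 1
Q, R = divmod(A, B)                 # division algorithm: A = BQ + R, deg R < deg B
print(Q)                            # 6 - 5X + X^2, i.e., (X-2)(X-3)
print(R)                            # 0, so X - 1 divides A
print(A.roots())                    # [1. 2. 3.], the roots a_i of Theorem 11.15
```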