MATH 136: Linear Algebra 1 for Honours Mathematics
Fall 2023 Edition
1st edition 2020 by Conrad Hewitt and Francine Vinette
© Faculty of Mathematics, University of Waterloo
We would like to acknowledge contributions from:
Shane Bauman, Judith Koeller, Anton Mosunov, Graeme Turner, and Owen Woody
Contents

1 Vectors in $\mathbb{R}^n$
1.1 Introduction
1.2 Algebraic and Geometric Representation of Vectors
1.3 Operations on Vectors
1.4 Vectors in $\mathbb{C}^n$
1.5 Dot Product in $\mathbb{R}^n$
1.6 Projection, Components and Perpendicular
1.7 Standard Inner Product in $\mathbb{C}^n$
1.8 Fields
1.9 The Cross Product in $\mathbb{R}^3$

2 Span, Lines and Planes
2.1 Linear Combinations and Span
2.2 Lines in $\mathbb{R}^2$
2.3 Lines in $\mathbb{R}^n$
2.4 The Vector Equation of a Plane in $\mathbb{R}^n$
2.5 Scalar Equation of a Plane in $\mathbb{R}^3$

3 Systems of Linear Equations
3.1 Introduction
3.2 Systems of Linear Equations
3.3 An Approach to Solving Systems of Linear Equations
3.4 Solving Systems of Linear Equations Using Matrices
3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations
3.6 Rank and Nullity
3.7 Homogeneous and Non-Homogeneous Systems, Nullspace
3.8 Solving Systems of Linear Equations over $\mathbb{C}$
3.9 Matrix–Vector Multiplication
3.10 Using a Matrix–Vector Product to Express a System of Linear Equations
3.11 Solution Sets to Systems of Linear Equations

4 Matrices
4.1 The Column and Row Spaces of a Matrix
4.2 Matrix Equality and Multiplication
4.3 Arithmetic Operations on Matrices
4.4 Properties of Square Matrices
4.5 Elementary Matrices
4.6 Matrix Inverse

5 Linear Transformations
5.1 The Function Determined by a Matrix
5.2 Linear Transformations
5.3 The Range of a Linear Transformation and "Onto" Linear Transformations
5.4 The Kernel of a Linear Transformation and "One-to-One" Linear Transformations
5.5 Every Linear Transformation is Determined by a Matrix
5.6 Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection
5.7 Composition of Linear Transformations

6 The Determinant
6.1 The Definition of the Determinant
6.2 Computing the Determinant in Practice: EROs
6.3 The Determinant and Invertibility
6.4 An Expression for $A^{-1}$
6.5 Cramer's Rule
6.6 The Determinant and Geometry

7 Eigenvalues and Diagonalization
7.1 What is an Eigenpair?
7.2 The Characteristic Polynomial and Finding Eigenvalues
7.3 Properties of the Characteristic Polynomial
7.4 Finding Eigenvectors
7.5 Eigenspaces
7.6 Diagonalization

8 Subspaces and Bases
8.1 Subspaces
8.2 Linear Dependence and the Notion of a Basis of a Subspace
8.3 Detecting Linear Dependence and Independence
8.4 Spanning Sets
8.5 Basis
8.6 Bases for Col($A$) and Null($A$)
8.7 Dimension
8.8 Coordinates

9 Diagonalization
9.1 Matrix Representation of a Linear Operator
9.2 Diagonalizability of Linear Operators
9.3 Diagonalizability of Matrices Revisited
9.4 The Diagonalizability Test

10 Vector Spaces
10.1 Definition of a Vector Space
10.2 Span, Linear Independence and Basis
10.3 Linear Operators
Chapter 1  Vectors in $\mathbb{R}^n$

1.1 Introduction
Linear algebra is used widely in the social sciences, business, science, and engineering.
Vectors are used in the sciences for displacement, velocity, acceleration, force, and many
other important physical concepts. Informally, a vector is an object that has both magnitude
and direction. For instance, the velocity of a boat on a river can be represented by a vector
that encodes both its magnitude (speed) and its direction of travel.
In mathematics, a vector is represented by a column of numbers, as in the following definition.
Definition 1.1.1 ($\mathbb{R}^n$, Vector in $\mathbb{R}^n$)
The set $\mathbb{R}^n$ is defined as $\left\{ \vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1, \ldots, x_n \in \mathbb{R} \right\}$.
A vector is an element $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ of $\mathbb{R}^n$.
REMARK
In these notes, a vector is indicated by an arrow accent: $\vec{v}$. Other texts use different notation for vectors, such as $\mathbf{v}$ or $v$.
Example 1.1.2
The following are vectors: $\vec{w} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \in \mathbb{R}^2$, $\vec{v} = \begin{bmatrix} -3 \\ 1 \\ -5 \end{bmatrix} \in \mathbb{R}^3$, and $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{100} \end{bmatrix} \in \mathbb{R}^{100}$.
We will usually write vectors in column notation. It can be more convenient and compact
to write a vector in row notation, which includes the superscript “𝑇 ” (short for transpose)
and clearly separates the components of the vectors with spaces, as follows.
Definition 1.1.3 (Row Notation for a Vector)
The row notation for the vector $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ is $\vec{v} = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}^T$.
REMARK
Many texts do not distinguish between $\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}^T$ and $\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$.
For our purposes, this distinction is important, i.e. $\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \neq \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}^T$.
1.2 Algebraic and Geometric Representation of Vectors

A vector's algebraic representation consists of a column of numbers, e.g. $\vec{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$.
A vector's geometric representation consists of a directed line segment in $\mathbb{R}^n$.
Vectors are not localized; that is, if two vectors have the same magnitude and direction, but different starting points, we consider those vectors to be equal. For convenience, we often choose the starting point of a vector to be the origin, $O$.

Figure 1.2.1: Vectors and the Origin
Relationship between Geometric and Algebraic Representations

Consider the geometric representation for a vector $\vec{v}$ in $\mathbb{R}^n$, which is a directed line segment. We typically consider the initial point of $\vec{v}$ to be the origin $O$ and call its terminal point $V$ (i.e., we use the same letter as the vector). If the point $V$ has coordinates $(v_1, v_2, \ldots, v_n)$, then we write the vector $\vec{v}$ as
\[ \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}. \]
This $n$-tuple is the algebraic representation of $\vec{v}$.
On the other hand, if the algebraic representation of a vector $\vec{v}$ in $\mathbb{R}^n$ is $\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$, then its geometric interpretation is a directed line segment from the initial point, $O$, to the terminal point, $V$, with coordinates $(v_1, v_2, \ldots, v_n)$.

Figure 1.2.2: The geometric representation of $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ in $\mathbb{R}^2$

We are free to move the vector $\vec{v}$ around, as long as we maintain its magnitude and direction; however, moving it will change both the initial and terminal points. We say that the point $V$ is the terminal point associated with the vector $\vec{v}$ (using the initial point $O$), or more simply, we say that $V$ is the point associated with the vector $\vec{v}$.

Example 1.2.1
The point $V = (2, -4, 9)$ is the terminal point associated with the vector $\vec{v} = \begin{bmatrix} 2 \\ -4 \\ 9 \end{bmatrix}$ in $\mathbb{R}^3$.
1.3 Operations on Vectors

Definition 1.3.1 (Equality of Vectors in $\mathbb{R}^n$)
We say that vectors $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ in $\mathbb{R}^n$ are equal if $u_i = v_i$ for all $i = 1, \ldots, n$. We denote this by writing $\vec{u} = \vec{v}$.

The vector equation $\vec{u} = \vec{v}$ is therefore equivalent to a system of $n$ equations, one for each component.
Example 1.3.2
If $\vec{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}$, then $\vec{u} \neq \vec{v}$ because $u_3 \neq v_3$.
Geometrically, two vectors $\vec{u}$ and $\vec{v}$ are said to be equal if they have the same magnitude and the same direction.

Definition 1.3.3 (Addition in $\mathbb{R}^n$)
Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n$. Then
\[ \vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}. \]
Example 1.3.4
We have $\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} + \begin{bmatrix} 3 \\ -5 \\ 7 \end{bmatrix} = \begin{bmatrix} 2+3 \\ 4-5 \\ 6+7 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 13 \end{bmatrix}$.

Geometrically, the vector $\vec{w} = \vec{u} + \vec{v}$ follows the Parallelogram Law, illustrated in the following example with $\vec{v}$ moved so that its initial point is the terminal point of $\vec{u}$.
Example 1.3.5
Add $\vec{u} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$ to $\vec{v} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$ in $\mathbb{R}^2$. Show the result geometrically.
Solution: $\vec{u} + \vec{v} = \begin{bmatrix} -2 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$.
REMARK
In this course, most of the time we will label the axes in $\mathbb{R}^2$ as $x_1, x_2$ instead of $x, y$. Similarly, in $\mathbb{R}^3$ we will label the axes as $x_1, x_2, x_3$ instead of $x, y, z$. We recommend that you follow this practice as well.
Example 1.3.6
Suppose a boat is moving with a velocity of $\vec{u}$ and a person on the boat walks with velocity $\vec{v}$ on (relative to) the boat. An observer on the shore sees the person move with velocity $\vec{w} = \vec{u} + \vec{v}$, and thus sees the combined velocity of the person and the boat.

The following rules of vector addition follow from the properties of addition of real numbers.
Proposition 1.3.7 (Properties of Vector Addition)
Let $\vec{u}, \vec{v}, \vec{w} \in \mathbb{R}^n$.
(a) $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ (symmetry)
(b) $\vec{u} + \vec{v} + \vec{w} = \vec{u} + (\vec{v} + \vec{w}) = (\vec{u} + \vec{v}) + \vec{w}$ (associativity)
(c) There is a zero vector, $\vec{0} = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix}^T$ in $\mathbb{R}^n$, with the property that $\vec{v} + \vec{0} = \vec{0} + \vec{v} = \vec{v}$.

EXERCISE
Prove Proposition 1.3.7 (Properties of Vector Addition).
For many of the results in this chapter, the proof is left as an exercise for you.
Definition 1.3.8 (Additive Inverse)
Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \in \mathbb{R}^n$. The additive inverse of $\vec{u}$, denoted $-\vec{u}$, is defined as
\[ -\vec{u} = \begin{bmatrix} -u_1 \\ -u_2 \\ \vdots \\ -u_n \end{bmatrix}. \]

Example 1.3.9
If $\vec{u} = \begin{bmatrix} 1 \\ -4 \\ 2 \end{bmatrix}$, then $-\vec{u} = \begin{bmatrix} -1 \\ 4 \\ -2 \end{bmatrix}$.
We now notice the key property of the additive inverse.

Proposition 1.3.10 (Additive Inverse Property)
If $\vec{u} \in \mathbb{R}^n$, we have
\[ \vec{u} - \vec{u} = \vec{u} + (-\vec{u}) = (-\vec{u}) + \vec{u} = \vec{0}. \]
Thus $-\vec{u}$ has the effect of "cancelling" the vector $\vec{u}$ when addition is performed (hence the term additive inverse).
REMARK
The vector $-\vec{u}$ has the same magnitude as $\vec{u}$, but the opposite direction.

Definition 1.3.11 (Subtraction in $\mathbb{R}^n$)
Let $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$, $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ and $-\vec{u} = \begin{bmatrix} -u_1 \\ -u_2 \\ \vdots \\ -u_n \end{bmatrix}$ be vectors in $\mathbb{R}^n$. We define
\[ \vec{v} - \vec{u} = \vec{v} + (-\vec{u}) = \begin{bmatrix} v_1 - u_1 \\ v_2 - u_2 \\ \vdots \\ v_n - u_n \end{bmatrix}. \]

Figure 1.3.3: General geometric interpretation of $\vec{v} - \vec{u}$
Example 1.3.12
We have $\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} - \begin{bmatrix} 3 \\ -5 \\ 7 \end{bmatrix} = \begin{bmatrix} 2-3 \\ 4+5 \\ 6-7 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \\ -1 \end{bmatrix}$.
For vectors in R𝑛 with 𝑛 > 1, we do not define a component-wise “vector multiplication”,
as we did for addition and subtraction, since such an operation turns out to be of little use
in linear algebra. There are different types of “vector products” that we do define, however.
These will be discussed later in this chapter.
In the meantime, we will define the scalar multiplication of a vector by a real number 𝑐.
Definition 1.3.13 (Scalar Multiplication)
Let $c \in \mathbb{R}$ and $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n$. We define $c\vec{v} = c\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} cv_1 \\ cv_2 \\ \vdots \\ cv_n \end{bmatrix}$.
We say that the vector $\vec{v}$ is scaled by $c$.

For this reason, real numbers are often referred to as scalars.
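As a quick illustration (with a vector chosen here only for this purpose), scaling $\vec{v} = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}$ by $c = 3$ and by $c = -\tfrac{1}{2}$ gives
\[ 3\vec{v} = \begin{bmatrix} 6 \\ -3 \\ 12 \end{bmatrix} \qquad \text{and} \qquad -\tfrac{1}{2}\vec{v} = \begin{bmatrix} -1 \\ \tfrac{1}{2} \\ -2 \end{bmatrix}. \]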
Figure 1.3.4: Scalar multiplication of $\vec{v}$
REMARKS
• When $c > 0$, the vector $c\vec{v}$ points in the direction of $\vec{v}$ and is $c$ times as long.
• When $c < 0$, then $c\vec{v}$ points in the opposite direction to $\vec{v}$ and is $|c|$ times as long.
• The vector $(-1)\vec{v} = -\vec{v}$ has the same length as $\vec{v}$ but points in the opposite direction.
Proposition 1.3.14 (Properties of Scalar Multiplication)
Let $c, d \in \mathbb{R}$ and $\vec{u}, \vec{v} \in \mathbb{R}^n$.
(a) $(c + d)\vec{v} = c\vec{v} + d\vec{v}$.
(b) $(cd)\vec{v} = c(d\vec{v})$.
(c) $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$.
(d) $0\vec{v} = \vec{0}$.
(e) If $c\vec{v} = \vec{0}$, then $c = 0$ or $\vec{v} = \vec{0}$. (cancellation law)

Property (e) may look especially familiar, because the following result is often used in mathematics when dealing with real numbers:
For all $a, b \in \mathbb{R}$, if $ab = 0$, then $a = 0$ or $b = 0$.
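One way to see property (e), sketched briefly here: if $c\vec{v} = \vec{0}$ and $c \neq 0$, then
\[ \vec{v} = 1\vec{v} = \left(\tfrac{1}{c}\,c\right)\vec{v} = \tfrac{1}{c}\,(c\vec{v}) = \tfrac{1}{c}\,\vec{0} = \vec{0}, \]
using property (b) together with the fact that any scalar multiple of $\vec{0}$ is $\vec{0}$.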
Definition 1.3.15 (Standard Basis for $\mathbb{R}^n$)
In $\mathbb{R}^n$, let $\vec{e}_i$ be the vector whose $i$th component is 1 with all other components 0. The set $\mathcal{E} = \{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is called the standard basis for $\mathbb{R}^n$.

Example 1.3.16
The standard basis for $\mathbb{R}^3$ is $\{\vec{e}_1, \vec{e}_2, \vec{e}_3\} = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$.
Definition 1.3.17 (Components)
If $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots + v_n\vec{e}_n$, then we call $v_1, v_2, \ldots, v_n$ the components of $\vec{v}$.
Example 1.3.18
In $\mathbb{R}^2$, $\vec{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} = 2\vec{e}_1 + 3\vec{e}_2 = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.
Example 1.3.19
If $\vec{v} = \begin{bmatrix} 5 \\ 3 \\ -4 \end{bmatrix} \in \mathbb{R}^3$, then $\vec{v} = 5\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - 4\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 5\vec{e}_1 + 3\vec{e}_2 - 4\vec{e}_3$.
The components of $\vec{v}$ are 5, 3 and $-4$.
1.4 Vectors in $\mathbb{C}^n$

Vectors are also defined in $\mathbb{C}^n$, where each component is a complex number. Equality, addition, subtraction, scalar multiplication, and the cancellation law in $\mathbb{C}^n$ are all defined as in the previous propositions, replacing $\mathbb{R}$ with $\mathbb{C}$.
Definition 1.4.1 ($\mathbb{C}^n$, Vector in $\mathbb{C}^n$)
The set $\mathbb{C}^n$ is defined as $\left\{ \vec{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} : z_1, \ldots, z_n \in \mathbb{C} \right\}$.
A vector is an element $\vec{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}$ of $\mathbb{C}^n$.
Example 1.4.2
We have $\vec{w} = \begin{bmatrix} 1-2i \\ 7 \end{bmatrix} \in \mathbb{C}^2$ and $\vec{z} = \begin{bmatrix} 1+i \\ 2-3i \\ -4+5i \end{bmatrix} \in \mathbb{C}^3$.

Addition, subtraction, and scalar multiplication in $\mathbb{C}^n$ are defined as in $\mathbb{R}^n$.
Example 1.4.3
The expression $3\begin{bmatrix} 1+i \\ 2+i \end{bmatrix} - 2\begin{bmatrix} 2-i \\ 4+i \end{bmatrix}$ evaluates to
\[ 3\begin{bmatrix} 1+i \\ 2+i \end{bmatrix} - 2\begin{bmatrix} 2-i \\ 4+i \end{bmatrix} = \begin{bmatrix} 3+3i \\ 6+3i \end{bmatrix} - \begin{bmatrix} 4-2i \\ 8+2i \end{bmatrix} = \begin{bmatrix} -1+5i \\ -2+i \end{bmatrix}. \]
Example 1.4.4
If $c \in \mathbb{C}$ and $\vec{z}, \vec{w} \in \mathbb{C}^n$ and $(c - 4 - i)(\vec{z} - \vec{w}) = \vec{0}$, then $c = 4 + i$ or $\vec{z} = \vec{w}$.

The standard basis $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ for $\mathbb{C}^n$ is the same as the standard basis for $\mathbb{R}^n$.
Definition 1.4.5 (Standard Basis for $\mathbb{C}^n$)
In $\mathbb{C}^n$, let $\vec{e}_i$ represent the vector whose $i$th component is 1 with all other components 0. The set $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is called the standard basis for $\mathbb{C}^n$.

Example 1.4.6
The standard basis for $\mathbb{C}^4$ is $\{\vec{e}_1, \vec{e}_2, \vec{e}_3, \vec{e}_4\} = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}$.
Definition 1.4.7 (Components)
If $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots + v_n\vec{e}_n$, then we refer to $v_1, v_2, \ldots, v_n$ as the components of $\vec{v}$.
Example 1.4.8
If $\vec{v} = \begin{bmatrix} 5+i \\ 3i \\ -4 \end{bmatrix} \in \mathbb{C}^3$, then $\vec{v} = (5+i)\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 3i\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - 4\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = (5+i)\vec{e}_1 + 3i\,\vec{e}_2 - 4\vec{e}_3$.
The components of $\vec{v}$ are $5+i$, $3i$ and $-4$.
1.5 Dot Product in $\mathbb{R}^n$

The dot product is a function that takes as input two vectors in $\mathbb{R}^n$ and returns a real number. (Later, we will adapt this idea to vectors in $\mathbb{C}^n$ using a different operation.)

Definition 1.5.1 (Dot Product)
Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ be vectors in $\mathbb{R}^n$. We define their dot product by
\[ \vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n. \]
Example 1.5.2
We have $\begin{bmatrix} 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 5 \\ 7 \end{bmatrix} = 2(5) + 3(7) = 31$ and $\begin{bmatrix} 3 \\ -5 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -4 \\ 4 \\ -2 \end{bmatrix} = 3(-4) + (-5)(4) + 2(-2) = -36$.
Proposition 1.5.3 (Properties of the Dot Product)
If $c \in \mathbb{R}$ and $\vec{u}, \vec{v}, \vec{w}$ are vectors in $\mathbb{R}^n$, then:
(a) $\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}$ (symmetry)
(b) $(\vec{u} + \vec{v}) \cdot \vec{w} = \vec{u} \cdot \vec{w} + \vec{v} \cdot \vec{w}$
(c) $(c\vec{u}) \cdot \vec{v} = c(\vec{u} \cdot \vec{v})$
(properties (b) and (c) are collectively known as linearity)
(d) $\vec{v} \cdot \vec{v} \geq 0$, with $\vec{v} \cdot \vec{v} = 0$ if and only if $\vec{v} = \vec{0}$. (non-negativity)

An important use of the dot product is determining the length of a vector.
Definition 1.5.4 (Length in $\mathbb{R}^n$, Norm in $\mathbb{R}^n$)
The length (or norm or magnitude) of the vector $\vec{v} \in \mathbb{R}^n$ is $\lVert \vec{v} \rVert = \sqrt{\vec{v} \cdot \vec{v}}$.
Example 1.5.5
We have $\left\lVert \begin{bmatrix} 2 \\ 4 \end{bmatrix} \right\rVert = \sqrt{\begin{bmatrix} 2 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \end{bmatrix}} = \sqrt{2(2) + 4(4)} = \sqrt{20} = 2\sqrt{5}$.
Example 1.5.6
We have $\left\lVert \begin{bmatrix} 2 \\ -3 \\ 4 \end{bmatrix} \right\rVert = \sqrt{\begin{bmatrix} 2 \\ -3 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -3 \\ 4 \end{bmatrix}} = \sqrt{2(2) + (-3)(-3) + 4(4)} = \sqrt{29}$.
REMARKS
The notation "$\lVert \cdot \rVert$" is used in order to distinguish from the notation for the absolute value of a real number, "$|\cdot|$".
Recall that when we take the square root, we mean the positive square root, so that $\sqrt{4} = 2$ (not $\pm 2$). Notice that it follows from Property (d) of Proposition 1.5.3 (Properties of the Dot Product) that every non-zero vector has length greater than zero, and the zero vector $\vec{0}$ has a length of zero.
Proposition 1.5.7
If $c \in \mathbb{R}$ and $\vec{v} \in \mathbb{R}^n$, then $\lVert c\vec{v} \rVert = |c|\,\lVert \vec{v} \rVert$.
Example 1.5.8
We have $\left\lVert \begin{bmatrix} -3 \\ -6 \end{bmatrix} \right\rVert = \left\lVert -3\begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\rVert = |-3| \left\lVert \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\rVert = 3\sqrt{1(1) + 2(2)} = 3\sqrt{5}$.
Example 1.5.9
We have $\left\lVert \begin{bmatrix} -2 \\ -6 \\ 4 \end{bmatrix} \right\rVert = \left\lVert -2\begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix} \right\rVert = |-2| \left\lVert \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix} \right\rVert = 2\sqrt{1^2 + 3^2 + (-2)^2} = 2\sqrt{14}$.
Definition 1.5.10 (Unit Vector)
We say that $\vec{v} \in \mathbb{R}^n$ is a unit vector if $\lVert \vec{v} \rVert = 1$.
Example 1.5.11
If $\vec{u} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, then $\lVert \vec{u} \rVert = \sqrt{13}$, so $\vec{u}$ is not a unit vector.

Example 1.5.12
If $\vec{v} = \begin{bmatrix} \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{6}} \end{bmatrix}^T = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, then $\lVert \vec{v} \rVert = \frac{1}{\sqrt{6}}\sqrt{1^2 + 1^2 + 2^2} = 1$, so $\vec{v}$ is a unit vector.
Definition 1.5.13 (Normalization)
When $\vec{v} \in \mathbb{R}^n$ is a non-zero vector, we can produce a unit vector
\[ \hat{v} = \frac{\vec{v}}{\lVert \vec{v} \rVert} \]
in the direction of $\vec{v}$ by scaling $\vec{v}$. This process is called normalization.

REMARKS
The vector $\hat{v}$ has the same direction as the vector $\vec{v}$, since $\hat{v} = \frac{\vec{v}}{\lVert \vec{v} \rVert} = \frac{1}{\lVert \vec{v} \rVert}\vec{v}$.
The vector $\hat{v}$ is a unit vector, since $\lVert \hat{v} \rVert = \left\lVert \frac{1}{\lVert \vec{v} \rVert}\vec{v} \right\rVert = \left| \frac{1}{\lVert \vec{v} \rVert} \right| \lVert \vec{v} \rVert = 1$.
Example 1.5.14
The unit vector in the direction of $\vec{v} = \begin{bmatrix} 2 \\ -4 \\ -1 \end{bmatrix}$ is the vector $\hat{v} = \frac{\vec{v}}{\lVert \vec{v} \rVert} = \frac{1}{\sqrt{21}}\begin{bmatrix} 2 \\ -4 \\ -1 \end{bmatrix}$.

The dot product can also be used to determine the angle between two vectors. In $\mathbb{R}^2$ this can be seen as follows.

Figure 1.5.5: The angle $\theta$ between $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^2$
If $\theta$ is the angle between $\vec{u}, \vec{v} \in \mathbb{R}^2$, then the law of cosines says that
\[ \lVert \vec{u} - \vec{v} \rVert^2 = \lVert \vec{u} \rVert^2 + \lVert \vec{v} \rVert^2 - 2\lVert \vec{u} \rVert \lVert \vec{v} \rVert \cos\theta. \]
On the other hand, recalling the definition of the length $\lVert \cdot \rVert$ in terms of the dot product, we have
\[ \lVert \vec{u} - \vec{v} \rVert^2 = (\vec{u} - \vec{v}) \cdot (\vec{u} - \vec{v}) = \vec{u} \cdot \vec{u} - \vec{u} \cdot \vec{v} - \vec{v} \cdot \vec{u} + \vec{v} \cdot \vec{v} = \lVert \vec{u} \rVert^2 - 2\vec{v} \cdot \vec{u} + \lVert \vec{v} \rVert^2. \]
If we combine the above two equations, we arrive at
\[ \vec{u} \cdot \vec{v} = \lVert \vec{u} \rVert \lVert \vec{v} \rVert \cos\theta. \]
This gives us a relationship between the angle $\theta$ between the vectors $\vec{u}, \vec{v} \in \mathbb{R}^2$ and their dot product. This motivates the following definition.
Definition 1.5.15 (Angle Between Vectors)
Let $\vec{u}$ and $\vec{v}$ be non-zero vectors in $\mathbb{R}^n$. The angle $\theta$, in radians ($0 \leq \theta \leq \pi$), between $\vec{u}$ and $\vec{v}$ is such that
\[ \vec{u} \cdot \vec{v} = \lVert \vec{u} \rVert \lVert \vec{v} \rVert \cos\theta, \quad \text{that is,} \quad \theta = \arccos\left( \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \lVert \vec{v} \rVert} \right). \]

Figure 1.5.6: The angle $\theta$ between $\vec{v}$ and $\vec{w}$ is between 0 and $\pi$
REMARK
In order for the above definition to be well-defined, we need to know that $\frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \lVert \vec{v} \rVert}$ lies in the domain of $\arccos$; otherwise $\arccos\left( \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \lVert \vec{v} \rVert} \right)$ is undefined. That is, we need to know that
\[ \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \lVert \vec{v} \rVert} \in [-1, 1], \]
or, equivalently, that
\[ |\vec{u} \cdot \vec{v}| \leq \lVert \vec{u} \rVert \lVert \vec{v} \rVert \quad \text{for all } \vec{u}, \vec{v} \in \mathbb{R}^n. \]
This inequality is true and is known as the Cauchy–Schwarz Inequality. We will take its validity for granted in these notes, and will not provide a proof.
Example 1.5.16
Determine the angle $\theta$ between $\begin{bmatrix} 1 \\ 4 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ 3 \end{bmatrix}$ in $\mathbb{R}^2$, rounding your answer to three decimal places throughout.
Solution:
\[ \theta = \arccos\left( \frac{\begin{bmatrix} 1 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 3 \end{bmatrix}}{\left\lVert \begin{bmatrix} 1 \\ 4 \end{bmatrix} \right\rVert \left\lVert \begin{bmatrix} -2 \\ 3 \end{bmatrix} \right\rVert} \right) = \arccos\left( \frac{10}{\sqrt{17}\sqrt{13}} \right) \approx 0.833. \]
Example 1.5.17
Determine the angle, $\theta$, between $\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -4 \\ 3 \end{bmatrix}$ in $\mathbb{R}^3$, rounding your answer to three decimal places throughout.
Solution:
\[ \theta = \arccos\left( \frac{\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -4 \\ 3 \end{bmatrix}}{\left\lVert \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \right\rVert \left\lVert \begin{bmatrix} 2 \\ -4 \\ 3 \end{bmatrix} \right\rVert} \right) = \arccos\left( \frac{5}{\sqrt{35}\sqrt{29}} \right) \approx 1.413. \]
Definition 1.5.18 (Orthogonal in $\mathbb{R}^n$)
We say that the two vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^n$ are orthogonal (or perpendicular) if $\vec{u} \cdot \vec{v} = 0$.
REMARK
Our definition of the angle between two vectors, Definition 1.5.15, is consistent with our definition of orthogonality. Examining our new dot product formula,
\[ \vec{v} \cdot \vec{w} = \lVert \vec{v} \rVert \lVert \vec{w} \rVert \cos\theta, \]
we see that if two vectors are orthogonal, then their dot product is zero, and thus conclude that the angle $\theta$ between them is $\frac{\pi}{2}$, since $\cos\frac{\pi}{2} = 0$ and $0 \leq \theta \leq \pi$.
Example 1.5.19
The vectors $\vec{u} = \begin{bmatrix} 1 \\ 5 \\ -3 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ are orthogonal, since $\vec{u} \cdot \vec{v} = \begin{bmatrix} 1 \\ 5 \\ -3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = 0$.
Example 1.5.20
The vectors $\vec{w} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$ are not orthogonal, since $\vec{w} \cdot \vec{x} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} = 1 \neq 0$.
REMARK
Every vector $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n$ is orthogonal to $\vec{0}$, as $\vec{0} \cdot \vec{v} = 0v_1 + 0v_2 + \cdots + 0v_n = 0$.

Given a vector $\vec{v}$, there are many ways to produce a non-trivial vector that is orthogonal to $\vec{v}$.
Example 1.5.21
A vector orthogonal to $\begin{bmatrix} a \\ b \end{bmatrix}$ in $\mathbb{R}^2$ is $\begin{bmatrix} -b \\ a \end{bmatrix}$, since $\begin{bmatrix} -b \\ a \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix} = 0$.
Example 1.5.22
A vector orthogonal to $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$ in $\mathbb{R}^3$ is $\begin{bmatrix} -b \\ a \\ 0 \end{bmatrix}$, since $\begin{bmatrix} -b \\ a \\ 0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0$.

EXERCISE
Produce two different vectors that are orthogonal to $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$.
Example 1.5.23
Determine a vector $\vec{u}$ that is orthogonal to both $\vec{v} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
Solution: Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ be such a vector. Then
\[ \vec{u} \cdot \vec{v} = 0 \;\Longrightarrow\; \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = 0 \;\Longrightarrow\; u_1 - u_2 = 0 \;\Longrightarrow\; u_1 = u_2 \quad (*) \]
and
\[ \vec{u} \cdot \vec{w} = 0 \;\Longrightarrow\; \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = 0 \;\Longrightarrow\; u_1 + 2u_2 + 3u_3 = 0 \;\Longrightarrow\; 3u_1 + 3u_3 = 0 \;\text{(substituting from (*))} \;\Longrightarrow\; u_3 = -u_1. \quad (**) \]
From (*) and (**) we have $\vec{u} = \begin{bmatrix} u_1 \\ u_1 \\ -u_1 \end{bmatrix}$. For any choice of $u_1 \in \mathbb{R}$, the vector $\begin{bmatrix} u_1 \\ u_1 \\ -u_1 \end{bmatrix}$ will be orthogonal to both $\vec{v}$ and $\vec{w}$. This is because by choosing different values of $u_1$, we change the length of $\vec{u}$, but the line through the origin which contains $\vec{u}$ remains fixed.
For example, with $u_1 = 4$ we get $\vec{u} = \begin{bmatrix} 4 \\ 4 \\ -4 \end{bmatrix}$, in which case
\[ \vec{u} \cdot \vec{v} = \begin{bmatrix} 4 \\ 4 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = 4 - 4 = 0 \quad \text{and} \quad \vec{u} \cdot \vec{w} = \begin{bmatrix} 4 \\ 4 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = 4 + 8 - 12 = 0. \]
EXERCISE
With $\vec{u} = \begin{bmatrix} u_1 \\ u_1 \\ -u_1 \end{bmatrix}$ from Example 1.5.23, choose different values of $u_1 \in \mathbb{R}$ and convince yourself that the resulting $\vec{u}$ is still orthogonal to both $\vec{v} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
Show that for any choice of $u_1 \in \mathbb{R}$, the resulting $\vec{u}$ is still orthogonal to both $\vec{v}$ and $\vec{w}$.
1.6 Projection, Components and Perpendicular

Definition 1.6.1 (Projection of $\vec{v}$ Onto $\vec{w}$)
Let $\vec{v}, \vec{w} \in \mathbb{R}^n$ with $\vec{w} \neq \vec{0}$. The projection of $\vec{v}$ onto $\vec{w}$ is defined by
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{w} \rVert^2}\,\vec{w} = \frac{\vec{v} \cdot \vec{w}}{\vec{w} \cdot \vec{w}}\,\vec{w}. \]
We also refer to this as the projection of $\vec{v}$ in the $\vec{w}$ direction.

Figure 1.6.7: $\operatorname{proj}_{\vec{w}}(\vec{v})$
Example 1.6.2
Find the projection of $\vec{v} = \begin{bmatrix} 4 \\ -8 \end{bmatrix}$ onto $\vec{w} = \begin{bmatrix} 7 \\ 1 \end{bmatrix}$.
Solution:
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{w} \rVert^2}\,\vec{w} = \frac{\begin{bmatrix} 4 \\ -8 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 1 \end{bmatrix}}{\left\lVert \begin{bmatrix} 7 \\ 1 \end{bmatrix} \right\rVert^2}\begin{bmatrix} 7 \\ 1 \end{bmatrix} = \frac{20}{50}\begin{bmatrix} 7 \\ 1 \end{bmatrix} = \frac{2}{5}\vec{w}. \]
Example 1.6.3
Find the projection of $\vec{v} = \begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}$ onto $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
Solution:
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{w} \rVert^2}\,\vec{w} = \frac{\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}}{\left\lVert \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\rVert^2}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \frac{10}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \frac{5}{7}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}. \]
REMARKS
1. We can write the projection as follows:
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{w} \rVert^2}\,\vec{w} = \left( \vec{v} \cdot \frac{\vec{w}}{\lVert \vec{w} \rVert} \right)\frac{\vec{w}}{\lVert \vec{w} \rVert} = (\vec{v} \cdot \hat{w})\,\hat{w}. \]
Thus, the magnitude of $\vec{w}$ is not relevant to the projection, but the direction of $\vec{w}$ is.
2. Recall from Definition 1.5.15 that the angle $\theta$ between $\vec{v}$ and $\vec{w}$ is given by $\vec{v} \cdot \vec{w} = \lVert \vec{v} \rVert \lVert \vec{w} \rVert \cos\theta$, giving:
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{\lVert \vec{v} \rVert \lVert \vec{w} \rVert \cos\theta}{\lVert \vec{w} \rVert^2}\,\vec{w} = (\lVert \vec{v} \rVert \cos\theta)\,\hat{w}. \]
You can think of the projection of $\vec{v}$ onto $\vec{w}$ as giving that part of the vector $\vec{v}$ that lies in the $\vec{w}$ direction.
Definition 1.6.4 (Component)
Let $\vec{v}, \vec{w} \in \mathbb{R}^n$ with $\vec{w} \neq \vec{0}$. We refer to the quantity
\[ \lVert \vec{v} \rVert \cos\theta = \vec{v} \cdot \hat{w} \]
as the component (or scalar component) of $\vec{v}$ along $\vec{w}$.

If the scalar component is negative, then the projection of $\vec{v}$ along $\vec{w}$ will be in the direction of the vector $-\vec{w}$; we can still think of this projection vector as being in the $\vec{w}$ direction.
Example 1.6.5
Determine the component of $\vec{v} = \begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}$ along $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
Solution: Note that $\lVert \vec{w} \rVert = \sqrt{14}$, so $\hat{w} = \frac{1}{\sqrt{14}}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
The component of $\vec{v}$ along $\vec{w}$ is $\vec{v} \cdot \hat{w} = \begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix} \cdot \frac{1}{\sqrt{14}}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \frac{10}{\sqrt{14}}$.
Definition 1.6.6 (Perpendicular)
Let $\vec{v}, \vec{w} \in \mathbb{R}^n$ with $\vec{w} \neq \vec{0}$. The perpendicular of $\vec{v}$ onto $\vec{w}$ is defined by
\[ \operatorname{perp}_{\vec{w}}(\vec{v}) = \vec{v} - \operatorname{proj}_{\vec{w}}(\vec{v}). \]

Figure 1.6.8: $\operatorname{perp}_{\vec{w}}(\vec{v})$
Example 1.6.7
Determine the perpendicular of $\vec{v} = \begin{bmatrix} 4 \\ -8 \end{bmatrix}$ onto $\vec{w} = \begin{bmatrix} 7 \\ 1 \end{bmatrix}$. Then calculate $\operatorname{proj}_{\vec{w}}(\vec{v}) \cdot \operatorname{perp}_{\vec{w}}(\vec{v})$.
Solution: From Example 1.6.2, we have $\operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{2}{5}\begin{bmatrix} 7 \\ 1 \end{bmatrix}$, so
\[ \operatorname{perp}_{\vec{w}}(\vec{v}) = \vec{v} - \operatorname{proj}_{\vec{w}}(\vec{v}) = \begin{bmatrix} 4 \\ -8 \end{bmatrix} - \frac{2}{5}\begin{bmatrix} 7 \\ 1 \end{bmatrix} = \frac{6}{5}\begin{bmatrix} 1 \\ -7 \end{bmatrix} \]
and
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) \cdot \operatorname{perp}_{\vec{w}}(\vec{v}) = \left( \frac{2}{5}\begin{bmatrix} 7 \\ 1 \end{bmatrix} \right) \cdot \left( \frac{6}{5}\begin{bmatrix} 1 \\ -7 \end{bmatrix} \right) = \frac{12}{25}\bigl(7(1) + 1(-7)\bigr) = 0. \]
Example 1.6.8
Determine the perpendicular of $\vec{v} = \begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}$ onto $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
Solution: From Example 1.6.3, we have $\operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{5}{7}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, so
\[ \operatorname{perp}_{\vec{w}}(\vec{v}) = \vec{v} - \operatorname{proj}_{\vec{w}}(\vec{v}) = \begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix} - \frac{5}{7}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \frac{1}{7}\begin{bmatrix} 16 \\ -38 \\ 20 \end{bmatrix}. \]
Proposition 1.6.9
The projection and the perpendicular of a vector $\vec{v}$ onto $\vec{w} \neq \vec{0}$ are orthogonal; that is,
\[ \operatorname{perp}_{\vec{w}}(\vec{v}) \cdot \operatorname{proj}_{\vec{w}}(\vec{v}) = 0. \]
Proof: First, $\operatorname{perp}_{\vec{w}}(\vec{v})$ is orthogonal to $\vec{w}$, since
\[ \operatorname{perp}_{\vec{w}}(\vec{v}) \cdot \vec{w} = \bigl( \vec{v} - \operatorname{proj}_{\vec{w}}(\vec{v}) \bigr) \cdot \vec{w} = \vec{v} \cdot \vec{w} - \operatorname{proj}_{\vec{w}}(\vec{v}) \cdot \vec{w} = \vec{v} \cdot \vec{w} - \left( \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{w} \rVert^2} \right) \vec{w} \cdot \vec{w} = \vec{v} \cdot \vec{w} - \left( \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{w} \rVert^2} \right) \lVert \vec{w} \rVert^2 = \vec{v} \cdot \vec{w} - \vec{v} \cdot \vec{w} = 0. \]
Since $\operatorname{perp}_{\vec{w}}(\vec{v})$ is orthogonal to $\vec{w}$, it is also orthogonal to $\operatorname{proj}_{\vec{w}}(\vec{v})$.
1.7 Standard Inner Product in $\mathbb{C}^n$

Recall the different ways used to find the magnitude of a real number and the magnitude of a complex number.
If $x \in \mathbb{R}$, then the magnitude (absolute value) of $x$ is $|x| = \sqrt{x(x)}$.
If $z \in \mathbb{C}$, then the magnitude (modulus) of $z$ is $|z| = \sqrt{z(\bar{z})}$.

Example 1.7.1
Determine the magnitude of $z = 2 - 3i$.
Solution: The magnitude of $z$ is
\[ |z| = |2 - 3i| = \sqrt{(2 - 3i)\overline{(2 - 3i)}} = \sqrt{2^2 + (-3)^2} = \sqrt{13}. \]
One of the most important uses of the dot product on $\mathbb{R}^n$ is in finding the lengths of vectors. When we extend this concept to $\mathbb{C}^n$, we will need to use complex conjugates.

Definition 1.7.2 (Standard Inner Product on $\mathbb{C}^n$)
The standard inner product of $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} \in \mathbb{C}^n$ is
\[ \langle \vec{v}, \vec{w} \rangle = v_1\overline{w_1} + v_2\overline{w_2} + \cdots + v_n\overline{w_n}. \]

Notice that this operation is similar to the dot product, in that we multiply components together and add them up. An important difference is the fact that we take the complex conjugate of all of the components that come from the second vector $\vec{w}$.
Example 1.7.3
Let $\vec{v} = \begin{bmatrix} 1+i \\ 1+2i \end{bmatrix}, \vec{w} = \begin{bmatrix} 2+i \\ 1+i \end{bmatrix}$. Evaluate $\langle \vec{v}, \vec{w} \rangle$.
Solution:
\[ \langle \vec{v}, \vec{w} \rangle = (1+i)\overline{(2+i)} + (1+2i)\overline{(1+i)} = (1+i)(2-i) + (1+2i)(1-i) = (2 - i + 2i + 1) + (1 - i + 2i + 2) = (3+i) + (3+i) = 6 + 2i. \]
Example 1.7.4
Evaluate $\left\langle \begin{bmatrix} 2-3i \\ 1-2i \end{bmatrix}, \begin{bmatrix} -3+4i \\ 3+5i \end{bmatrix} \right\rangle$.
Solution:
\[ \left\langle \begin{bmatrix} 2-3i \\ 1-2i \end{bmatrix}, \begin{bmatrix} -3+4i \\ 3+5i \end{bmatrix} \right\rangle = (2-3i)\overline{(-3+4i)} + (1-2i)\overline{(3+5i)} = (2-3i)(-3-4i) + (1-2i)(3-5i) = (-6 - 8i + 9i - 12) + (3 - 5i - 6i - 10) = -25 - 10i. \]
Proposition 1.7.5 (Properties of the Standard Inner Product)
If $c \in \mathbb{C}$ and $\vec{u}, \vec{v}$ and $\vec{w}$ are vectors in $\mathbb{C}^n$, then
(a) $\langle \vec{u}, \vec{v} \rangle = \overline{\langle \vec{v}, \vec{u} \rangle}$ (conjugate symmetry)
(b) $\langle \vec{u} + \vec{v}, \vec{w} \rangle = \langle \vec{u}, \vec{w} \rangle + \langle \vec{v}, \vec{w} \rangle$
(c) $\langle c\vec{u}, \vec{v} \rangle = c\langle \vec{u}, \vec{v} \rangle$
(properties (b) and (c) are collectively known as linearity in the first argument)
(d) $\langle \vec{v}, \vec{v} \rangle \geq 0$, with $\langle \vec{v}, \vec{v} \rangle = 0$ if and only if $\vec{v} = \vec{0}$. (non-negativity)
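Combining (a) and (c) also shows how a scalar comes out of the second argument (a short derivation worth keeping in mind):
\[ \langle \vec{u}, c\vec{v} \rangle = \overline{\langle c\vec{v}, \vec{u} \rangle} = \overline{c\langle \vec{v}, \vec{u} \rangle} = \bar{c}\,\overline{\langle \vec{v}, \vec{u} \rangle} = \bar{c}\,\langle \vec{u}, \vec{v} \rangle. \]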
As in $\mathbb{R}^n$, we can use Property (d) to calculate lengths of vectors in $\mathbb{C}^n$.

Definition 1.7.6 (Length in $\mathbb{C}^n$, Norm in $\mathbb{C}^n$)
The length (or norm or magnitude) of the vector $\vec{v} \in \mathbb{C}^n$ is $\lVert \vec{v} \rVert = \sqrt{\langle \vec{v}, \vec{v} \rangle}$.
Example 1.7.7
Determine the length of the vector $\vec{v} = \begin{bmatrix} 2-i \\ -3+2i \\ -4-5i \end{bmatrix}$.
Solution:
\[ \langle \vec{v}, \vec{v} \rangle = (2-i)\overline{(2-i)} + (-3+2i)\overline{(-3+2i)} + (-4-5i)\overline{(-4-5i)} = 2^2 + 1^2 + 3^2 + 2^2 + 4^2 + 5^2 = 59, \]
so $\lVert \vec{v} \rVert = \sqrt{\langle \vec{v}, \vec{v} \rangle} = \sqrt{59}$.
Note that $\langle \vec{v}, \vec{v} \rangle = \lVert \vec{v} \rVert^2 = |2-i|^2 + |-3+2i|^2 + |-4-5i|^2$.
Proposition 1.7.8 (Properties of the Length)
Let $c \in \mathbb{C}$ and $\vec{v} \in \mathbb{C}^n$. Then
(a) $\lVert c\vec{v} \rVert = |c|\,\lVert \vec{v} \rVert$
(b) $\lVert \vec{v} \rVert \geq 0$, and $\lVert \vec{v} \rVert = 0$ if and only if $\vec{v} = \vec{0}$.
Example 1.7.9
Determine $\left\lVert (-1+2i)\begin{bmatrix} 2-i \\ -3+2i \\ -4-5i \end{bmatrix} \right\rVert$.
Solution: Using Property (a) in the above Proposition and Example 1.7.7,
\[ \left\lVert (-1+2i)\begin{bmatrix} 2-i \\ -3+2i \\ -4-5i \end{bmatrix} \right\rVert = |-1+2i| \left\lVert \begin{bmatrix} 2-i \\ -3+2i \\ -4-5i \end{bmatrix} \right\rVert = \sqrt{5}\,\sqrt{59} = \sqrt{295}. \]
Definition 1.7.10 (Orthogonal in $\mathbb{C}^n$)
We say that the two vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{C}^n$ are orthogonal if $\langle \vec{u}, \vec{v} \rangle = 0$.
Example 1.7.11
Is $\vec{v} = \begin{bmatrix} 1+i \\ 1+2i \end{bmatrix}$ orthogonal to $\vec{w} = \begin{bmatrix} 2+i \\ -1-i \end{bmatrix}$?
Solution: Let's calculate the standard inner product of $\vec{v}$ and $\vec{w}$:
\[ \langle \vec{v}, \vec{w} \rangle = (1+i)\overline{(2+i)} + (1+2i)\overline{(-1-i)} = (1+i)(2-i) + (1+2i)(-1+i) = 2 - i + 2i + 1 - 1 + i - 2i - 2 = 0. \]
Since $\langle \vec{v}, \vec{w} \rangle = 0$, $\vec{v}$ and $\vec{w}$ are orthogonal.
Example 1.7.12
(a) Find a non-zero vector that is orthogonal to $\vec{w} = \begin{bmatrix} 2+i \\ 1+i \end{bmatrix}$.
(b) Find a unit vector that is orthogonal to $\vec{w} = \begin{bmatrix} 2+i \\ 1+i \end{bmatrix}$.
Solution:
(a) Let $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ be orthogonal to $\vec{w}$. Then $\langle \vec{v}, \vec{w} \rangle = 0$, that is,
\[ v_1(2-i) + v_2(1-i) = 0 \;\Longrightarrow\; v_2(1-i) = v_1(i-2) \;\Longrightarrow\; v_2(1-i)(1+i) = v_1(i-2)(1+i) \;\text{(multiplying both sides by } \overline{1-i} = 1+i\text{)} \;\Longrightarrow\; 2v_2 = v_1(-3-i) \;\Longrightarrow\; v_2 = \frac{-3-i}{2}\,v_1. \quad (*) \]
By Proposition 1.7.5 (Properties of the Standard Inner Product), if $\langle \vec{v}, \vec{w} \rangle = 0$, then $\langle c\vec{v}, \vec{w} \rangle = c\langle \vec{v}, \vec{w} \rangle = c(0) = 0$; that is, if $\vec{v}$ is orthogonal to $\vec{w}$, then any scalar multiple $c\vec{v}$ is also orthogonal to $\vec{w}$.
Accordingly, by choosing any non-zero value for $v_1$, we can use (*) to determine a corresponding value for $v_2$, so that the resulting vector $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ is orthogonal to $\vec{w}$.
For example, choosing $v_1 = 2$ gives $v_2 = \frac{-3-i}{2}(2) = -3-i$.
Since
\[ \langle \vec{v}, \vec{w} \rangle = \left\langle \begin{bmatrix} 2 \\ -3-i \end{bmatrix}, \begin{bmatrix} 2+i \\ 1+i \end{bmatrix} \right\rangle = 2\overline{(2+i)} + (-3-i)\overline{(1+i)} = 2(2-i) + (-3-i)(1-i) = 4 - 2i - 3 - i + 3i - 1 = 0, \]
we know that $\vec{v} = \begin{bmatrix} 2 \\ -3-i \end{bmatrix}$ is orthogonal to $\vec{w} = \begin{bmatrix} 2+i \\ 1+i \end{bmatrix}$.
(b) A unit vector in the same direction as $\vec{v} = \begin{bmatrix} 2 \\ -3-i \end{bmatrix}$ is $\hat{v} = \frac{1}{\lVert \vec{v} \rVert}\vec{v}$. We have
\[ \lVert \vec{v} \rVert = \sqrt{\langle \vec{v}, \vec{v} \rangle} = \sqrt{2(2) + (-3-i)\overline{(-3-i)}} = \sqrt{4 + 9 + 1} = \sqrt{14}. \]
Thus, $\hat{v} = \frac{1}{\lVert \vec{v} \rVert}\vec{v} = \frac{1}{\sqrt{14}}\begin{bmatrix} 2 \\ -3-i \end{bmatrix}$ is a unit vector that is orthogonal to $\vec{w}$.
EXERCISE
In (a) of the previous Example, choose a different non-zero value for $v_1$ and use (*) to determine a value for $v_2$. Verify that your resulting vector $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ is orthogonal to $\vec{w} = \begin{bmatrix} 2+i \\ 1+i \end{bmatrix}$.
Definition 1.7.13 (Projection of $\vec{v}$ Onto $\vec{w}$)
Let $\vec{v}, \vec{w} \in \mathbb{C}^n$, with $\vec{w} \neq \vec{0}$. The projection of $\vec{v}$ onto $\vec{w}$ is defined by
\[ \operatorname{proj}_{\vec{w}}(\vec{v}) = \frac{\langle \vec{v}, \vec{w} \rangle}{\lVert \vec{w} \rVert^2}\,\vec{w} = \langle \vec{v}, \hat{w} \rangle\,\hat{w}. \]

Example 1.7.14
Find the projection of $\vec{v} = \begin{bmatrix} 1 \\ i \\ 1+i \end{bmatrix}$ onto $\vec{w} = \begin{bmatrix} 1-i \\ 2-i \\ 3+i \end{bmatrix}$.
Solution:
\[ \langle \vec{v}, \vec{w} \rangle = 1\overline{(1-i)} + i\,\overline{(2-i)} + (1+i)\overline{(3+i)} = 1(1+i) + i(2+i) + (1+i)(3-i) = 1 + i + 2i - 1 + 3 - i + 3i + 1 = 4 + 5i \]
and
\[ \lVert \vec{w} \rVert^2 = |1-i|^2 + |2-i|^2 + |3+i|^2 = 1 + 1 + 4 + 1 + 9 + 1 = 17. \]
Thus, $\operatorname{proj}_{\vec{w}}(\vec{v}) = \dfrac{\langle \vec{v}, \vec{w} \rangle}{\lVert \vec{w} \rVert^2}\,\vec{w} = \dfrac{4+5i}{17}\begin{bmatrix} 1-i \\ 2-i \\ 3+i \end{bmatrix}$.
1.8 Fields

So far, we have used the sets $\mathbb{R}$ and $\mathbb{C}$ to choose the components of vectors, but they will play other roles as well. These sets are examples of a field.
We will not see any other fields in this course beyond $\mathbb{R}$ and $\mathbb{C}$, but for the interested student, we can give some feel for what a field "is".
Loosely speaking, a field is a set where we can add, subtract, multiply and divide. So, the set of integers, $\mathbb{Z}$, is not a field, since although 2 and 3 are both integers, the quotient $2/3$ is not. The set of positive real numbers $\mathbb{R}^+$ is not a field since although 2 and 3 are positive real numbers, the difference $2 - 3$ is not. It turns out (and this is not supposed to be obvious) that the ability to add, subtract, multiply and divide are exactly the properties we need of our scalars in order to do the type of algebra that we want.
In much of what we do, there is no need to make an explicit distinction between $\mathbb{R}$ and $\mathbb{C}$, and so we will refer to our universal set as the field $\mathbb{F}$, with the understanding that $\mathbb{F}$ will either be $\mathbb{R}$ or $\mathbb{C}$, so we will assume that $\mathbb{F} \subseteq \mathbb{C}$. Most of the time, the choice of $\mathbb{F}$ will not matter. There are, however, a few occasions where we need to be explicit about which field we are using.
For example, if we attempt to solve $2x = 5$ over $\mathbb{F}$, the solution is $x = \frac{5}{2}$, whether $\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$. However, if we try to solve $x^2 = -1$ over $\mathbb{F}$, then the field matters. If $\mathbb{F} = \mathbb{R}$, then this equation has no solutions, but if $\mathbb{F} = \mathbb{C}$, then there are two solutions: $x = \pm i$.
We previously saw the definition of the standard inner product in $\mathbb{C}^n$. It will be useful to extend this definition to $\mathbb{F}^n$.
Definition 1.8.1 (Standard Inner Product on $\mathbb{F}^n$)
The standard inner product of $\vec{v}, \vec{w} \in \mathbb{F}^n$ is
\[ \langle \vec{v}, \vec{w} \rangle = v_1\overline{w_1} + v_2\overline{w_2} + \cdots + v_n\overline{w_n}. \]
REMARK
• If $\mathbb{F} = \mathbb{C}$, then this is exactly the standard inner product on $\mathbb{C}^n$, from Definition 1.7.2.
• If $\mathbb{F} = \mathbb{R}$, since each $w_i \in \mathbb{R}$, then $\overline{w_i} = w_i$, and this is just the dot product on $\mathbb{R}^n$, as in Definition 1.5.1.
1.9 The Cross Product in $\mathbb{R}^3$

Given two vectors $\vec{u}, \vec{v} \in \mathbb{R}^3$, the cross product returns a vector in $\mathbb{R}^3$ that is orthogonal to both $\vec{u}$ and $\vec{v}$.

Definition 1.9.1 (Cross Product)
Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \in \mathbb{R}^3$. The cross product of $\vec{u}$ and $\vec{v}$ is defined to be the vector in $\mathbb{R}^3$ given by
\[ \vec{u} \times \vec{v} = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ -(u_1 v_3 - u_3 v_1) \\ u_1 v_2 - u_2 v_1 \end{bmatrix}. \]
Example 1.9.2
We have $\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \times \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} (-3)(4) - (5)(1) \\ -\bigl((2)(4) - (5)(-2)\bigr) \\ (2)(1) - (-3)(-2) \end{bmatrix} = \begin{bmatrix} -17 \\ -18 \\ -4 \end{bmatrix}$.
REMARK
A useful trick for remembering the definition of the cross product will be given in Section
6.6.
Proposition 1.9.3 (Properties of the Cross Product)
Let $\vec{u}, \vec{v} \in \mathbb{R}^3$ and let $\vec{z} = \vec{u} \times \vec{v}$. Then
(a) $\vec{z} \cdot \vec{u} = 0$ and $\vec{z} \cdot \vec{v} = 0$ ($\vec{z}$ is orthogonal to both $\vec{u}$ and $\vec{v}$)
(b) $\vec{v} \times \vec{u} = -\vec{z} = -\vec{u} \times \vec{v}$ (skew-symmetric)
(c) If $\vec{u} \neq \vec{0}$ and $\vec{v} \neq \vec{0}$, then $\lVert \vec{u} \times \vec{v} \rVert = \lVert \vec{u} \rVert \lVert \vec{v} \rVert \sin\theta$, where $\theta$ is the angle between $\vec{u}$ and $\vec{v}$.
REMARK (Right-Hand Rule)
If $\vec{z}$ is orthogonal to both $\vec{u}$ and $\vec{v}$, then $-\vec{z}$ will also be orthogonal to both $\vec{u}$ and $\vec{v}$, and $-\vec{z}$ has opposite direction to $\vec{z}$.
The Right-Hand Rule is a convention used to determine which of these directions $\vec{u} \times \vec{v}$ should have.
If the pointer finger of your right hand points in the direction of $\vec{u}$, and the middle finger of your right hand points in the direction of $\vec{v}$, then your thumb points in the direction of $\vec{u} \times \vec{v}$.
If you consider $\vec{v} \times \vec{u}$ instead, you would have to turn your hand upside-down, which helps demonstrate the skew-symmetric property above!
REMARK (Parallelogram Area via Cross Product)
Consider the parallelogram defined by $\vec{u}$ and $\vec{v}$; its height is $h = \lVert \vec{v} \rVert \sin\theta$. As a result of Property (c) of the above Proposition, its area is
\[ \lVert \vec{u} \rVert h = \lVert \vec{u} \rVert \lVert \vec{v} \rVert \sin\theta = \lVert \vec{u} \times \vec{v} \rVert. \]
Example 1.9.4
With $\vec{u} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}$, verify Proposition 1.9.3 (Properties of the Cross Product).
Solution: From Example 1.9.2, $\vec{z} = \vec{u} \times \vec{v} = \begin{bmatrix} -17 \\ -18 \\ -4 \end{bmatrix}$.
(a) $\vec{z} \cdot \vec{u} = \begin{bmatrix} -17 \\ -18 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} = -17(2) - 18(-3) - 4(5) = 0$ and $\vec{z} \cdot \vec{v} = \begin{bmatrix} -17 \\ -18 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} = -17(-2) - 18(1) - 4(4) = 0$.
(b) $\vec{v} \times \vec{u} = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} \times \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} = \begin{bmatrix} (1)(5) - (4)(-3) \\ -\bigl((-2)(5) - (4)(2)\bigr) \\ (-2)(-3) - (1)(2) \end{bmatrix} = \begin{bmatrix} 17 \\ 18 \\ 4 \end{bmatrix} = -\vec{u} \times \vec{v}$.
(c) Let $\theta$ be the angle between $\vec{u}$ and $\vec{v}$. Working to three decimal places throughout, $\vec{u} \cdot \vec{v}$ can be used to evaluate $\theta$:
\[ \theta = \arccos\left( \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \lVert \vec{v} \rVert} \right) = \arccos\left( \frac{\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}}{\left\lVert \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \right\rVert \left\lVert \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} \right\rVert} \right) = \arccos\left( \frac{13}{\sqrt{798}} \right) \approx 1.093. \]
Thus $\lVert \vec{u} \rVert \lVert \vec{v} \rVert \sin\theta = \left\lVert \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \right\rVert \left\lVert \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} \right\rVert \sin\theta \approx \sqrt{798}\,(0.888) \approx 25.080$.
Since $\vec{u} \times \vec{v} = \begin{bmatrix} -17 \\ -18 \\ -4 \end{bmatrix}$, we have
\[ \lVert \vec{u} \times \vec{v} \rVert = \sqrt{17^2 + 18^2 + 4^2} = \sqrt{629} \approx 25.080 \approx \lVert \vec{u} \rVert \lVert \vec{v} \rVert \sin\theta. \]
Proposition 1.9.5 (Linearity of the Cross Product)
If $c \in \mathbb{R}$ and $\vec{u}, \vec{v}, \vec{w} \in \mathbb{R}^3$, then
(a) $(\vec{u} + \vec{v}) \times \vec{w} = (\vec{u} \times \vec{w}) + (\vec{v} \times \vec{w})$
(b) $(c\vec{u}) \times \vec{v} = c(\vec{u} \times \vec{v})$
(properties (a) and (b) are linearity in the first argument)
(c) $\vec{u} \times (\vec{v} + \vec{w}) = (\vec{u} \times \vec{v}) + (\vec{u} \times \vec{w})$
(d) $\vec{u} \times (c\vec{v}) = c(\vec{u} \times \vec{v})$
(properties (c) and (d) are linearity in the second argument)

Proof: We will prove linearity in the second argument; the proof of linearity in the first argument follows by using the skew-symmetry of the cross product.
(c) We have
\[ \vec{u} \times (\vec{v} + \vec{w}) = \begin{bmatrix} u_2(v_3 + w_3) - u_3(v_2 + w_2) \\ -[u_1(v_3 + w_3) - u_3(v_1 + w_1)] \\ u_1(v_2 + w_2) - u_2(v_1 + w_1) \end{bmatrix} = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ -[u_1 v_3 - u_3 v_1] \\ u_1 v_2 - u_2 v_1 \end{bmatrix} + \begin{bmatrix} u_2 w_3 - u_3 w_2 \\ -[u_1 w_3 - u_3 w_1] \\ u_1 w_2 - u_2 w_1 \end{bmatrix} = (\vec{u} \times \vec{v}) + (\vec{u} \times \vec{w}). \]
(d) We have
\[ \vec{u} \times (c\vec{v}) = \begin{bmatrix} u_2(cv_3) - u_3(cv_2) \\ -[u_1(cv_3) - u_3(cv_1)] \\ u_1(cv_2) - u_2(cv_1) \end{bmatrix} = c\begin{bmatrix} u_2 v_3 - u_3 v_2 \\ -[u_1 v_3 - u_3 v_1] \\ u_1 v_2 - u_2 v_1 \end{bmatrix} = c(\vec{u} \times \vec{v}). \]
REMARK (Why is $\vec{u} \times \vec{v}$ only defined for $\mathbb{R}^3$?)
Given two non-parallel vectors $\vec{u}, \vec{v} \in \mathbb{R}^n$ for $n = 3$, we can notice that there is always exactly one line through the origin that is orthogonal to both $\vec{u}$ and $\vec{v}$. A cross product allows us to compute one non-zero vector on this line. It turns out that any other vector that is orthogonal to both $\vec{u}$ and $\vec{v}$ will always lie on this line, and hence will be a multiple of $\vec{u} \times \vec{v}$. Notice that, when $n \neq 3$, this does not happen.
• In $\mathbb{R}^2$, if $\vec{u} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, then the only vector that is orthogonal to both $\vec{u}$ and $\vec{v}$ is $\vec{0}$, which we can consider a trivial choice.
• For vectors $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $n \geq 4$, there are many different lines through the origin that are orthogonal to both $\vec{u}$ and $\vec{v}$, making it difficult to choose one line out of many.
For example, in $\mathbb{R}^4$, if $\vec{u} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, then $\vec{y} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ and $\vec{z} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$ are each orthogonal to both $\vec{u}$ and $\vec{v}$, but $\vec{y}$ and $\vec{z}$ lie on different lines through the origin.
Chapter 2  Span, Lines and Planes

2.1 Linear Combinations and Span

Definition 2.1.1 (Linear Combination)
Let $c_1, c_2, \ldots, c_k \in \mathbb{F}$ and let $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$ be vectors in $\mathbb{F}^n$. We refer to any vector of the form $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k$ as a linear combination of $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$.
Example 2.1.2
Since $2\begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix} - 5\begin{bmatrix} -1 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 9 \\ -5 \\ -5 \end{bmatrix}$, the vector $\begin{bmatrix} 9 \\ -5 \\ -5 \end{bmatrix}$ is a linear combination of the vectors $\begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \\ 3 \end{bmatrix}$ in $\mathbb{R}^3$.
Example 2.1.3
Since $3i\begin{bmatrix} i \\ 0 \end{bmatrix} + 4\begin{bmatrix} 1+i \\ -4i \end{bmatrix} = \begin{bmatrix} 1+4i \\ -16i \end{bmatrix}$, the vector $\begin{bmatrix} 1+4i \\ -16i \end{bmatrix}$ is a linear combination of $\begin{bmatrix} i \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1+i \\ -4i \end{bmatrix}$ in $\mathbb{C}^2$.
Example 2.1.4
The vector $\begin{bmatrix} -31 \\ 54 \\ -10 \\ 23 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix} - 5\begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix} + 7\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} - 4\begin{bmatrix} 10 \\ -10 \\ 10 \\ 0 \end{bmatrix}$ is a linear combination of the four vectors
\[ \begin{bmatrix} 1 & 0 & 2 & 0 \end{bmatrix}^T, \; \begin{bmatrix} 0 & 0 & -1 & 1 \end{bmatrix}^T, \; \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}^T, \; \begin{bmatrix} 10 & -10 & 10 & 0 \end{bmatrix}^T \]
in $\mathbb{R}^4$.
Example 2.1.5
For any $\vec{v}, \vec{w} \in \mathbb{F}^n$, the zero vector $\vec{0} = 0\vec{v} + 0\vec{w}$ is a linear combination of $\vec{v}$ and $\vec{w}$.

Definition 2.1.6 (Span)
Let $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$ be vectors in $\mathbb{F}^n$. We define the span of $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ to be the set of all linear combinations of $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$. That is,
\[ \operatorname{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\} = \{c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k : c_1, c_2, \ldots, c_k \in \mathbb{F}\}. \]
We refer to $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ as a spanning set for $\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$. We also say that $\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is spanned by $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$.
Example 2.1.7
Consider the vectors $\vec{v}_1 = \begin{bmatrix} 2i \\ 3 \\ i \end{bmatrix}, \vec{v}_2 = \begin{bmatrix} 2+i \\ -5+2i \\ 6i \end{bmatrix}$ in $\mathbb{C}^3$. Then
\[ \operatorname{Span}\left\{ \begin{bmatrix} 2i \\ 3 \\ i \end{bmatrix}, \begin{bmatrix} 2+i \\ -5+2i \\ 6i \end{bmatrix} \right\} = \left\{ c_1\begin{bmatrix} 2i \\ 3 \\ i \end{bmatrix} + c_2\begin{bmatrix} 2+i \\ -5+2i \\ 6i \end{bmatrix} : c_1, c_2 \in \mathbb{C} \right\}. \]
Example 2.1.8
Consider the vectors $\vec{v}_1 = \begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}, \vec{v}_2 = \begin{bmatrix} 2 \\ 5 \\ 7 \end{bmatrix}, \vec{v}_3 = \begin{bmatrix} 2 \\ -8 \\ 6 \end{bmatrix}, \vec{v}_4 = \begin{bmatrix} 5 \\ 7 \\ -4 \end{bmatrix}$ in $\mathbb{F}^3$. Then
\[ \operatorname{Span}\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\} = \left\{ c_1\begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix} + c_2\begin{bmatrix} 2 \\ 5 \\ 7 \end{bmatrix} + c_3\begin{bmatrix} 2 \\ -8 \\ 6 \end{bmatrix} + c_4\begin{bmatrix} 5 \\ 7 \\ -4 \end{bmatrix} : c_1, c_2, c_3, c_4 \in \mathbb{F} \right\}. \]

REMARK
In the above example, $\mathbb{F}$ could be either $\mathbb{R}$ or $\mathbb{C}$. If $\mathbb{F} = \mathbb{R}$, then we would choose $c_1, c_2, c_3, c_4 \in \mathbb{R}$. However, if $\mathbb{F} = \mathbb{C}$, then we would choose $c_1, c_2, c_3, c_4 \in \mathbb{C}$, resulting in more vectors being in $\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$.
Example 2.1.9
Determine whether the vector $\begin{bmatrix} -3 \\ 9 \\ 2 \end{bmatrix}$ is an element of $\operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \right\}$.
Solution: Since $\begin{bmatrix} -3 \\ 9 \\ 2 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - 5\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$, we see that $\begin{bmatrix} -3 \\ 9 \\ 2 \end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \right\}$.
Example 2.1.10
Determine whether $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ is in $\operatorname{Span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix} \right\}$.
Solution: Every vector in $\operatorname{Span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix} \right\}$ has the form
\[ c_1\begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 2c_1 - c_2 \\ 0 \\ 5c_1 + 3c_2 \end{bmatrix} \]
for some $c_1, c_2 \in \mathbb{F}$. Since the equation $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2c_1 - c_2 \\ 0 \\ 5c_1 + 3c_2 \end{bmatrix}$ has no solutions in $c_1, c_2 \in \mathbb{F}$ (the second component would require $1 = 0$), we conclude that $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \notin \operatorname{Span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix} \right\}$.
2.2 Lines in $\mathbb{R}^2$

There is a nice way to write down the equation of a line in $\mathbb{R}^n$ making use of vectors. We begin with $\mathbb{R}^2$ (otherwise known as the $xy$-plane) before generalizing to $\mathbb{R}^n$.
We consider the line $\mathcal{L}$ with points $(x_1, y_1)$ and $(x_2, y_2)$, $y$-intercept at $(0, b)$, and slope $m = \frac{p}{q}$ where $q \neq 0$. We visualize all of this information below.

Figure 2.2.1: Characteristics of a line $\mathcal{L}$ in $\mathbb{R}^2$

The table below includes four different forms of equation(s) for $\mathcal{L}$ in the $xy$-plane, based on given information.
Given slope $m$ and $y$-intercept $b$: $y = mx + b$.
Given two points $(x_1, y_1)$ and $(x_2, y_2)$: $\dfrac{y - y_1}{y_2 - y_1} = \dfrac{x - x_1}{x_2 - x_1}$.
Given a point $(x_1, y_1)$ and slope $m$: $y - y_1 = m(x - x_1)$.
Given a point $(x_1, y_1)$ and slope $\frac{p}{q}$ ($q \neq 0$): $x = x_1 + qt$, $y = y_1 + pt$, $t \in \mathbb{R}$.

In the first 3 forms above, we input a particular value for $x$ to obtain the value of $y$ for the corresponding point on the line.
In the last form above, the variable $t$ is called a parameter. We input a particular value for $t$ to obtain the value of the $x$- and $y$-coordinates for the corresponding point on the line.
Definition 2.2.1 (Parametric Equations of a Line in $\mathbb{R}^2$)
Let $p, q$ be fixed real numbers with $q \neq 0$. Then parametric equations of a line in $\mathbb{R}^2$ through the point $(x_1, y_1)$ with slope $\frac{p}{q}$ are
\[ x = x_1 + qt, \quad y = y_1 + pt, \qquad t \in \mathbb{R}. \]

Each value of $t$ gives a different point on the line. For instance,
• $t = 0$ gives the point $(x, y) = (x_1, y_1)$
• $t = 2$ gives the point $(x, y) = (x_1 + 2q, y_1 + 2p)$
• $t = -5$ gives the point $(x, y) = (x_1 - 5q, y_1 - 5p)$.
As $t$ varies over all real numbers, we generate all the points on the line.

REMARK
Since $q \neq 0$, the expression $\frac{p}{q}$ is always defined. Putting $q = 0$ into the parametric equations of the line gives us $x = x_1$ and $y = y_1 + pt$, a vertical line with undefined slope (we can think of this informally as "infinite slope").
Definition 2.2.2 (Vector Equation of a Line in $\mathbb{R}^2$)
Let $\begin{bmatrix} q \\ p \end{bmatrix}$ be a non-zero vector in $\mathbb{R}^2$. The expression
\[ \vec{\ell} = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + t\begin{bmatrix} q \\ p \end{bmatrix}, \quad t \in \mathbb{R} \]
is a vector equation of the line $\mathcal{L}$ in $\mathbb{R}^2$ through $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ with direction $\begin{bmatrix} q \\ p \end{bmatrix}$.

For any value of $t \in \mathbb{R}$, the expression $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + t\begin{bmatrix} q \\ p \end{bmatrix}$ produces a vector in $\mathbb{R}^2$ whose terminal point has coordinates $L = (x_1 + tq, y_1 + tp)$ and is on $\mathcal{L}$.

REMARKS
If we let $\vec{u} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} q \\ p \end{bmatrix}$, we can write a vector equation of line $\mathcal{L}$ in $\mathbb{R}^2$ as $\vec{\ell} = \vec{u} + t\vec{v}$, for $t \in \mathbb{R}$.
• Letting $t = 0$ gives us that the vector $\vec{u}$ is on the line.
• The line passes through the terminal point $U$ associated with $\vec{u}$. The other points on the line move from $U$ in the $\vec{v}$ direction by scalar multiples of $\vec{v}$.
• We say that $\vec{v}$ is parallel to the line and that $\vec{v}$ is a direction vector to the line.
• The vector $\vec{v}$ is parallel to the line. However, the terminal point $V$ associated with $\vec{v}$ is not usually a point on the line; in fact, $V$ is a point on the line if and only if the vector $\vec{v}$ is a scalar multiple of the vector $\vec{u}$.
Definition 2.2.3 (Line in $\mathbb{R}^2$)
Let $\vec{u}, \vec{v} \in \mathbb{R}^2$ with $\vec{v} \neq \vec{0}$. We refer to the set of vectors
\[ \mathcal{L} = \{\vec{u} + t\vec{v} : t \in \mathbb{R}\} \]
as a line $\mathcal{L}$ in $\mathbb{R}^2$ through $\vec{u}$ with direction $\vec{v}$.

Figure 2.2.2: Line in $\mathbb{R}^2$ through $\vec{u}$ with direction $\vec{v}$
Example 2.2.4
Let $U = (2, 3)$ and $V = (1, 4)$ be two points on a line $\mathcal{L}$ in $\mathbb{R}^2$. Find a vector equation of $\mathcal{L}$.
Solution: We can find a vector parallel to the line by taking the difference $\vec{v} - \vec{u}$, where $\vec{u}$ and $\vec{v}$ are the vector representations of points $U$ and $V$, respectively. This gives $\begin{bmatrix} 1 \\ 4 \end{bmatrix} - \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, and a vector equation of the line is therefore
\[ \vec{\ell} = \vec{u} + t(\vec{v} - \vec{u}) = \begin{bmatrix} 2 \\ 3 \end{bmatrix} + t\begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}. \]
2.3 Lines in $\mathbb{R}^n$

We now consider lines in $\mathbb{R}^n$. Lines in $\mathbb{R}^n$ can be described using the same vector equations as lines in $\mathbb{R}^2$. The only major differences in describing these lines arise from the number of components in the vectors, which in turn affects the number of parametric equations associated with these lines.

Definition 2.3.1 (Vector Equation of a Line in $\mathbb{R}^n$)
Let $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$. The expression
\[ \vec{\ell} = \vec{u} + t\vec{v}, \quad t \in \mathbb{R} \]
is a vector equation of the line $\mathcal{L}$ in $\mathbb{R}^n$ through $\vec{u}$ with direction $\vec{v}$.

If $\vec{\ell}_1$ and $\vec{\ell}_2$ are lines with direction $\vec{v}_1$ and $\vec{v}_2$, respectively, we say that they have the same direction if $c\vec{v}_1 = \vec{v}_2$ for some non-zero $c \in \mathbb{R}$.
For any value of $t \in \mathbb{R}$, the expression $\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + t\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ produces a vector in $\mathbb{R}^n$ whose terminal point has coordinates $L = (u_1 + tv_1, u_2 + tv_2, \ldots, u_n + tv_n)$ and is on line $\mathcal{L}$.

Figure 2.3.3: In $\mathbb{R}^n$, the line $\vec{\ell} = \vec{u} + t\vec{v}$, $t \in \mathbb{R}$.
REMARKS
• Letting $t = 0$ gives us that the vector $\vec{u}$ is on the line.
• The line passes through the terminal point $U$ associated with $\vec{u}$. The other points on the line move from $U$ in the $\vec{v}$ direction by scalar multiples of $\vec{v}$.
• We say that $\vec{v}$ is parallel to the line, and that $\vec{v}$ is a direction vector to the line.
• The vector $\vec{v}$ is parallel to the line; however, the terminal point $V$ associated with the vector $\vec{v}$ is not usually a point on the line. In fact, $V$ is a point on the line if and only if the vector $\vec{v}$ is a multiple of the vector $\vec{u}$.
• There are many different vector equations that could be used to describe the same line $\mathcal{L}$. We'll see this in Example 2.3.5.

Definition 2.3.2 (Parametric Equations of a Line in $\mathbb{R}^n$)
Let $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$. Consider the vector equation of the line $\mathcal{L}$ in $\mathbb{R}^n$ given by $\vec{\ell} = \vec{u} + t\vec{v}$, $t \in \mathbb{R}$. The parametric equations of the line $\mathcal{L}$ in $\mathbb{R}^n$ through $\vec{u}$ with direction $\vec{v}$ are
\[ \ell_1 = u_1 + tv_1, \quad \ell_2 = u_2 + tv_2, \quad \ldots, \quad \ell_n = u_n + tv_n, \qquad t \in \mathbb{R}. \]
Example 2.3.3
Give a vector equation and a set of parametric equations of the line through $U = (4, -3, 5)$ and in the direction of the vector $\vec{v} = \begin{bmatrix} -2 \\ 4 \\ 1 \end{bmatrix}$.
Solution: A vector equation of the line is
\[ \vec{\ell} = \begin{bmatrix} 4 \\ -3 \\ 5 \end{bmatrix} + t\begin{bmatrix} -2 \\ 4 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}. \]
The corresponding parametric equations of the line are:
\[ \ell_1 = 4 - 2t, \quad \ell_2 = -3 + 4t, \quad \ell_3 = 5 + t, \qquad t \in \mathbb{R}. \]
Definition 2.3.4 (Line in $\mathbb{R}^n$)
Let $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$. We refer to the set of vectors
\[ \mathcal{L} = \{\vec{u} + t\vec{v} : t \in \mathbb{R}\} \]
as a line $\mathcal{L}$ in $\mathbb{R}^n$ through $\vec{u}$ with direction $\vec{v}$.
Example 2.3.5
⎫
⎧⎡
⎤
⎤
⎡
−2
⎬
⎨ 4
Let ℒ = ⎣ −3 ⎦ + 𝑡 ⎣ 4 ⎦ : 𝑡 ∈ R , be the line from Example 2.3.3.
⎭
⎩
1
5
⎫
⎧⎡ ⎤
⎤
⎡
6
⎬
⎨ 0
Show that the line ℳ = ⎣ 5 ⎦ + 𝑠 ⎣ −12 ⎦ : 𝑠 ∈ R is identical to ℒ.
⎭
⎩
−3
7
⎡
⎤ ⎡
⎤
⎡ ⎤
⎡
⎤
−2
6
−2
6
Solution: First, −3 ⎣ 4 ⎦ = ⎣ −12 ⎦, so ⎣ 4 ⎦ has the same direction as ⎣ −12 ⎦.
1
−3
1
−3
Thus, ℒ and ℳ are parallel. Two parallel lines are the same if and only if they share a
common point.
⎡ ⎤
⎡ ⎤
⎡ ⎤ ⎡ ⎤
0
4
−2
0
⎣
⎦
⎣
⎦
⎣
⎦
⎣
Taking 𝑠 = 0 shows that 5 is on ℳ. Taking 𝑡 = 2 shows that −3 + 2 4 = 5 ⎦
7
5
1
7
is also on ℒ. Thus, lines ℒ and ℳ are identical.
EXERCISE
Determine two different vector equations for the line $\vec{\ell} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} + t\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, $t \in \mathbb{R}$.
Suppose that we are given two distinct points, $U$ and $Q$, on a line in $\mathbb{R}^n$, with associated vectors, $\vec{u}$ and $\vec{q}$, respectively. We can define $\vec{v} = \vec{q} - \vec{u}$, and this vector gives the direction of the line. Thus, we can still use Definition 2.3.1 for the vector equation of the line:
\[ \vec{\ell} = \vec{u} + t(\vec{q} - \vec{u}), \quad t \in \mathbb{R}. \]

Figure 2.3.4: Line in $\mathbb{R}^n$ with direction $\vec{v} = \vec{q} - \vec{u}$
Example 2.3.6
Determine a vector equation of the line through $U = (2, -3, 5)$ and $Q = (4, -2, 6)$.
Solution: A vector equation of the line through the given points is
\[ \vec{\ell} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} + t\left( \begin{bmatrix} 4 \\ -2 \\ 6 \end{bmatrix} - \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} + t\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}. \]
We can describe lines through the origin as the span of a single vector. If $\vec{v} \in \mathbb{R}^n$, with $\vec{v} \neq \vec{0}$, then $\operatorname{Span}\{\vec{v}\} = \{\vec{0} + t\vec{v} : t \in \mathbb{R}\}$ is the line through the origin $O$ with $\vec{v}$ as a direction vector.

Figure 2.3.5: In $\mathbb{R}^n$, $\operatorname{Span}\{\vec{v}\} = \{\vec{0} + t\vec{v} : t \in \mathbb{R}\}$ is a line through the origin

REMARK
If a line can be expressed as the span of one vector, then the line must pass through the origin.
Example 2.3.7
The set $\operatorname{Span}\left\{ \begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \end{bmatrix} \right\}$ is a line $\mathcal{L}$ in $\mathbb{R}^4$ that goes through the origin and points in the direction of $\begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \end{bmatrix}$. It is shorthand for $\mathcal{L} = \left\{ t\begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \end{bmatrix} : t \in \mathbb{R} \right\}$.
2.4 The Vector Equation of a Plane in $\mathbb{R}^n$

In this section we will be working in $\mathbb{R}^n$ where $n \geq 2$.
Definition 2.4.1 (Plane in $\mathbb{R}^n$ Through the Origin)
Let $\vec{v}, \vec{w}$ be non-zero vectors in $\mathbb{R}^n$ with $\vec{v} \neq c\vec{w}$ for any $c \in \mathbb{R}$. Then
\[ \mathcal{P} = \operatorname{Span}\{\vec{v}, \vec{w}\} = \{s\vec{v} + t\vec{w} : s, t \in \mathbb{R}\} \]
is a plane in $\mathbb{R}^n$ through the origin with direction vectors $\vec{v}$ and $\vec{w}$.

Figure 2.4.6: Plane $\mathcal{P} = \operatorname{Span}\{\vec{v}, \vec{w}\}$ in gray. Three points on $\mathcal{P}$ are illustrated: $\vec{q} = \vec{v} + \vec{w}$, $\vec{r} = 3\vec{v} + \vec{w}$ and $\vec{s} = -2\vec{v} - \vec{w}$.
REMARKS
• $\mathcal{P}$ contains the points $V$ and $W$, which are the terminal points associated with the vectors $\vec{v}$ and $\vec{w}$, respectively.
• If a point $P$ with associated vector $\vec{p}$ lies on the plane, then $\vec{p} = s\vec{v} + t\vec{w}$ for some $s, t \in \mathbb{R}$.
• Any plane defined by the span of two vectors must pass through the origin.
• To form a plane, $\vec{v}, \vec{w}$ must be non-zero vectors with $\vec{w} \neq c\vec{v}$ for any $c \in \mathbb{R}$. If exactly one of $\vec{v}$ and $\vec{w}$ is $\vec{0}$, or if $\vec{w} = c\vec{v}$ for some $c \in \mathbb{R}$, then $\operatorname{Span}\{\vec{v}, \vec{w}\}$ is a line, not a plane. If $\vec{v} = \vec{w} = \vec{0}$, then $\operatorname{Span}\{\vec{v}, \vec{w}\} = \{\vec{0}\}$; this is not a plane, but rather a single point, the origin.
Definition 2.4.2 (Vector Equation of a Plane in $\mathbb{R}^n$ Through the Origin)
Let $\vec{v}, \vec{w}$ be non-zero vectors in $\mathbb{R}^n$ with $\vec{v} \neq c\vec{w}$ for any $c \in \mathbb{R}$. The expression
\[ \vec{p} = s\vec{v} + t\vec{w} \]
is a vector equation of the plane in $\mathbb{R}^n$ through the origin with direction vectors $\vec{v}$ and $\vec{w}$.
Example 2.4.3
Determine a vector equation of the plane in $\mathbb{R}^3$ through the origin with direction vectors $\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix}$.
Solution: A vector equation of this plane is
\[ \vec{p} = s\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} + t\begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
Notice that the points $(2, 4, 6)$ and $(-1, 2, -3)$ are on this plane.
Example 2.4.4
Give a vector equation of the plane in $\mathbb{R}^5$ that passes through the origin with direction vectors $\begin{bmatrix} 5 & 4 & 3 & 2 & 1 \end{bmatrix}^T$ and $\begin{bmatrix} -5 & 4 & -3 & 2 & -1 \end{bmatrix}^T$.
Solution: A vector equation of this plane is
\[ \vec{p} = s\begin{bmatrix} 5 \\ 4 \\ 3 \\ 2 \\ 1 \end{bmatrix} + t\begin{bmatrix} -5 \\ 4 \\ -3 \\ 2 \\ -1 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
Notice that the points $(5, 4, 3, 2, 1)$ and $(-5, 4, -3, 2, -1)$ in $\mathbb{R}^5$ are on this plane.
Definition 2.4.5 (Plane in $\mathbb{R}^n$)
Let $\vec{u} \in \mathbb{R}^n$ and let $\vec{v}$ and $\vec{w}$ be non-zero vectors in $\mathbb{R}^n$ with $\vec{v} \neq c\vec{w}$, for any $c \in \mathbb{R}$. Then
\[ \mathcal{P} = \{\vec{u} + s\vec{v} + t\vec{w} : s, t \in \mathbb{R}\} \]
is a plane in $\mathbb{R}^n$ through $\vec{u}$ with direction vectors $\vec{v}$ and $\vec{w}$. We say that $\vec{v}$ and $\vec{w}$ are parallel to $\mathcal{P}$.
REMARKS
• Letting $s = t = 0$ gives us that the vector $\vec{u}$ is on the plane.
• The plane passes through the terminal point $U$ associated with $\vec{u}$. The other points on the plane move from $U$ in the $\vec{v}$ and $\vec{w}$ directions by linear combinations of $\vec{v}$ and $\vec{w}$.
• The terminal points, $V$ and $W$, associated with the vectors $\vec{v}$ and $\vec{w}$, are not usually on the plane. In fact, $V$ is a point on the plane if and only if $\vec{u} \in \operatorname{Span}\{\vec{v}, \vec{w}\}$; that is, if and only if $U$ lies on the plane through the origin that contains $V$ and $W$.
Example 2.4.6

Since the plane ℳ = {[0 0 1]^T + 𝑠 [2 4 6]^T + 𝑡 [−1 2 −3]^T : 𝑠, 𝑡 ∈ R} has the same direction
vectors as the plane 𝒫 = {𝑠 [2 4 6]^T + 𝑡 [−1 2 −3]^T : 𝑠, 𝑡 ∈ R} from Example 2.4.3, then ℳ is
parallel to 𝒫. It can be shown that ℳ does not pass through the origin.

Definition 2.4.7 (Vector Equation of a Plane)

Let 𝑢 ∈ R𝑛 and let 𝑣 and 𝑤 be non-zero vectors in R𝑛 with 𝑣 ≠ 𝑐𝑤 for any 𝑐 ∈ R. Then

𝑝 = 𝑢 + 𝑠𝑣 + 𝑡𝑤, 𝑠, 𝑡 ∈ R

is a vector equation of the plane in R𝑛 through 𝑢 with direction vectors 𝑣 and 𝑤.
Example 2.4.8

Give a vector equation of a plane in R3 which passes through the point (1, −4, 6), and has
vectors [2 4 6]^T and [−1 2 −3]^T parallel to it.

Solution: A vector equation of this plane is

𝑝 = [1 −4 6]^T + 𝑠 [2 4 6]^T + 𝑡 [−1 2 −3]^T, 𝑠, 𝑡 ∈ R.

Note that the points (2, 4, 6) and (−1, 2, −3) are not on this plane; however, the vectors
[2 4 6]^T and [−1 2 −3]^T are parallel to this plane.
Figure 2.4.7: Plane 𝒫 with equation 𝑝 = 𝑢 + 𝑠𝑣 + 𝑡𝑤 in gray.
Plane 𝒫 includes 𝑞 = 𝑢 + 𝑣 + 𝑤, 𝑟 = 𝑢 + 3𝑣 + 𝑤 and 𝑥 = 𝑢 − 2𝑣 − 𝑤.
Example 2.4.9

Give a vector equation of a plane 𝒫 in R5 that passes through the point (1, −4, 5, −2, 7)
and has direction vectors [2 4 6 8 10]^T and [−1 2 −3 4 −5]^T.

Solution: A vector equation of this plane is

𝑝 = [1 −4 5 −2 7]^T + 𝑠 [2 4 6 8 10]^T + 𝑡 [−1 2 −3 4 −5]^T, 𝑠, 𝑡 ∈ R.

Note that the points (2, 4, 6, 8, 10) and (−1, 2, −3, 4, −5) are not on 𝒫. However, the vectors
[2 4 6 8 10]^T and [−1 2 −3 4 −5]^T are parallel to 𝒫.

Noticing that [2 4 6 8 10]^T = 2 [1 2 3 4 5]^T, a different vector equation for 𝒫 is

𝑝 = [1 −4 5 −2 7]^T + 𝑠 [1 2 3 4 5]^T + 𝑡 [−1 2 −3 4 −5]^T, 𝑠, 𝑡 ∈ R.
REMARK
There are many different vector equations that could be used to describe the same plane 𝒫.
EXERCISE

Determine two different vector equations for the plane
𝑝 = [0 1 0]^T + 𝑠 [1 2 3]^T + 𝑡 [1 −1 −1]^T, 𝑠, 𝑡 ∈ R.
We can also uniquely define a plane using three points. Let 𝑈, 𝑄 and 𝑅 be three non-collinear
points in R𝑛 (that is, three points which do not all lie on the same line), with associated
vectors 𝑢, 𝑞, and 𝑟, respectively. Suppose that we want the equation of the unique plane
containing these three points. The vectors 𝑣 = 𝑞 − 𝑢 and 𝑤 = 𝑟 − 𝑢 are parallel to the
plane, and thus, we can express the equation of the plane as

𝒫 = {𝑢 + 𝑠(𝑞 − 𝑢) + 𝑡(𝑟 − 𝑢) : 𝑠, 𝑡 ∈ R}.
Example 2.4.10
Find a vector equation of a plane in R4 which contains the following points:
𝑈 = (2, −4, 6, −8), 𝑄 = (1, 3, −2, −4), and 𝑅 = (9, 7, 5, 3).
Solution: We need to determine two non-parallel direction vectors. We have:
𝑞 − 𝑢 = [1 3 −2 −4]^T − [2 −4 6 −8]^T = [−1 7 −8 4]^T and
𝑟 − 𝑢 = [9 7 5 3]^T − [2 −4 6 −8]^T = [7 11 −1 11]^T.

A vector equation of this plane is then

𝑝 = 𝑢 + 𝑠(𝑞 − 𝑢) + 𝑡(𝑟 − 𝑢) = [2 −4 6 −8]^T + 𝑠 [−1 7 −8 4]^T + 𝑡 [7 11 −1 11]^T, 𝑠, 𝑡 ∈ R.

Figure 2.4.8: Plane 𝒫 = {𝑢 + 𝑠(𝑞 − 𝑢) + 𝑡(𝑟 − 𝑢) : 𝑠, 𝑡 ∈ R} in gray.
2.5 Scalar Equation of a Plane in R3
In R3 (and only in R3!), there is an additional way to generate an equation of a plane, using
the cross product.

Let 𝑣 and 𝑤 be non-zero vectors in R3 with 𝑣 ≠ 𝑐𝑤 for any 𝑐 ∈ R. We know that
the cross product, 𝑛 = 𝑣 × 𝑤, is orthogonal to any plane to which 𝑣 and 𝑤 are both
direction vectors. The vector 𝑛 is referred to as a normal vector to such a plane. Suppose
that 𝑈 and 𝑃 are points on the plane, with associated vectors 𝑢 and 𝑝, respectively.
Then the vector 𝑝 − 𝑢 is parallel to the plane and is therefore orthogonal to 𝑛; that is,

𝑛 · (𝑝 − 𝑢) = (𝑣 × 𝑤) · (𝑝 − 𝑢) = 0.
Definition 2.5.1 (Normal Form, Scalar Equation of a Plane in R3)

Let 𝒫 be a plane in R3 with direction vectors 𝑣 and 𝑤 and a normal vector 𝑛 = [𝑎 𝑏 𝑐]^T ≠ 0.
Let 𝑢 ∈ 𝒫 and 𝑝 = [𝑥 𝑦 𝑧]^T ∈ 𝒫 where 𝑝 ≠ 𝑢. A normal form of 𝒫 is given by

𝑛 · (𝑝 − 𝑢) = 0.

Expanding this, we arrive at a scalar equation (or general form) of 𝒫,

𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 = 𝑑,

where 𝑑 = 𝑛 · 𝑢.
REMARK

The plane goes through the origin, 𝑂,

• if and only if the vector 0 satisfies this equation
• if and only if (𝑣 × 𝑤) · (0 − 𝑢) = 0
• if and only if 𝑢 = 𝑎𝑣 + 𝑏𝑤 for some 𝑎, 𝑏 ∈ R.

Geometrically, the plane goes through the origin if and only if both 𝑉 and 𝑊 lie on the
plane.
Figure 2.5.9: Let 𝑛 = 𝑣 × 𝑤. The plane in R3 with normal form (𝑣 × 𝑤) · (𝑝 − 𝑢) = 0 is in gray.
It passes through 𝑢 and has direction vectors 𝑣 and 𝑤.
Example 2.5.2

Find a scalar equation of the plane in R3 which passes through the origin and has direction
vectors [2 4 7]^T and [−1 2 −3]^T.

Solution: Let 𝑛 be a normal to the plane and let 𝑝 = [𝑥 𝑦 𝑧]^T be any vector on the plane.
Then

𝑛 = [2 4 7]^T × [−1 2 −3]^T = [−26 −1 8]^T.

A normal form of the plane (going through the origin) is thus given by

[𝑥 𝑦 𝑧]^T · [−26 −1 8]^T = 0.

Expanding this, we obtain a scalar equation of the plane

−26𝑥 − 𝑦 + 8𝑧 = 0.
Note that the two points (2, 4, 7) and (−1, 2, −3) lie on this plane.
Example 2.5.3

Give a scalar equation of the plane in R3 which passes through the point (1, −4, 3) and has
the two vectors [2 4 7]^T and [−1 2 −3]^T parallel to it.

Solution: Let 𝑛 be a normal to the plane and 𝑝 = [𝑥 𝑦 𝑧]^T be any vector on the plane.

𝑛 = [2 4 7]^T × [−1 2 −3]^T = [−26 −1 8]^T.

A scalar equation of the plane (not going through the origin) is then given by

([𝑥 𝑦 𝑧]^T − [1 −4 3]^T) · [−26 −1 8]^T = 0
−26(𝑥 − 1) − (𝑦 + 4) + 8(𝑧 − 3) = 0
−26𝑥 − 𝑦 + 8𝑧 = 2.

Note that the two points (2, 4, 7) and (−1, 2, −3) do not lie on this plane.
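The computations in Examples 2.5.2 and 2.5.3 can be checked numerically. The sketch below is only an illustration; it assumes NumPy is available, and the variable names are ours.

    import numpy as np

    v = np.array([2, 4, 7])
    w = np.array([-1, 2, -3])

    # A normal vector to any plane having v and w as direction vectors.
    n = np.cross(v, w)            # expected: [-26, -1, 8]

    # Plane through the point u = (1, -4, 3): scalar equation n . p = n . u.
    u = np.array([1, -4, 3])
    d = np.dot(n, u)              # expected: 2, giving -26x - y + 8z = 2

    # For the plane through the origin, take u = 0, which gives d = 0.
    print(n, d)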
REMARKS

1. Examining the definition of a scalar equation of a plane above, we see that the components
of a normal vector 𝑛 = [𝑎 𝑏 𝑐]^T are recorded as the coefficients of 𝑥, 𝑦, 𝑧 in a
scalar equation. We can use this information to quickly retrieve a normal form from
a scalar equation by identifying a single vector on the plane.
2. We use the language “a scalar equation”, “a normal vector”, and “a normal form” of
a plane 𝒫 to emphasize that these are not unique.
• Normal vectors of a plane are simply non-zero vectors that are orthogonal to
any vector that is parallel to 𝒫. If 𝑛 is a normal vector of 𝒫, then so is 𝑐𝑛 for
any non-zero scalar 𝑐 ∈ R, so 𝒫 does not have a unique normal vector (and thus
normal form). That said, all normal vectors of 𝒫 lie on the same line.
• A scalar equation of a plane is an equation which all vectors of 𝒫 satisfy. We
know that multiplying an equation by a non-zero 𝑐 ∈ R does not change the
solution set to that equation, so 𝒫 does not have a unique scalar equation.
Example 2.5.4
Let 𝒫 be a plane in R3 with scalar equation 2𝑥 − 𝑦 + 𝑧 = 7. Give a normal form of 𝒫.

Solution: From the scalar equation, we see that a normal vector of 𝒫 is 𝑛 = [2 −1 1]^T. To
identify a vector on 𝒫, we can pick any constants we want for exactly two of 𝑥, 𝑦, 𝑧 and
solve for the third value. Taking 𝑥 = 𝑦 = 0, we have

7 = 2(0) − 0 + 𝑧 = 𝑧,

so 𝑢 = [0 0 7]^T ∈ 𝒫. Therefore, a normal form of 𝒫 is

[2 −1 1]^T · ([𝑥 𝑦 𝑧]^T − [0 0 7]^T) = 0.
Chapter 3
Systems of Linear Equations

3.1 Introduction
We begin with some introductory examples in which linear systems of equations naturally
arise.
Example 3.1.1
At the market, 3 watermelons and 7 kiwis cost $19 and 2 watermelons and 5 kiwis cost $13.
How much does a watermelon cost and how much does a kiwi cost?
Solution: Let 𝑤 denote the price of a watermelon and let 𝑘 denote the price of a kiwi. We
convert the given information into a system of two equations.
3𝑤 + 7𝑘 = 19   (𝑒1)
2𝑤 + 5𝑘 = 13   (𝑒2)
We must find values for 𝑤 and 𝑘 that satisfy the equations (𝑒1 ) and (𝑒2 ) simultaneously.
If we multiply (𝑒1 ) by 2 and subtract from that 3 times (𝑒2 ), we obtain
[2(3) − 3(2)]𝑤 + [2(7) − 3(5)]𝑘 = 2(19) − 3(13).
That is, −𝑘 = −1, which implies that 𝑘 = 1.
Substituting this value for 𝑘 into equation (𝑒1) gives 𝑤 = (19 − 7(1))/3 = 4.
Therefore, the price of a watermelon is $4 and the price of a kiwi is $1.
Substituting these values into the two equations
3(4) + 7(1) = 19
2(4) + 5(1) = 13
shows that they satisfy both equations.
The crucial step in solving this problem is that of taking the combination 2(𝑒1 ) − 3(𝑒2 ).
This step is determined by looking at the coefficients of 𝑤 in the two equations (𝑒1 ) and
(𝑒2 ), which are 3 and 2, respectively. The combination of 2(𝑒1 ) − 3(𝑒2 ) produces a new
equation in which 𝑤 has been eliminated. We then solve this new equation for the other
variable, 𝑘. Once we have a value for 𝑘, we then substitute it into one of the original two
equations and solve for the value of the other variable, 𝑤.
Alternatively, we could have considered the combination 5(𝑒1 ) − 7(𝑒2 ), to eliminate the
variable 𝑘, and then solve for 𝑤. We then could substitute its value into either equation,
(𝑒1 ) or (𝑒2 ) in order to find the value of 𝑘.
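For readers who want to verify this with software, a system this small can also be handed to a numerical solver. The sketch below assumes NumPy is available; it solves the same two equations.

    import numpy as np

    # Coefficient matrix and right-hand side for 3w + 7k = 19, 2w + 5k = 13.
    A = np.array([[3.0, 7.0],
                  [2.0, 5.0]])
    b = np.array([19.0, 13.0])

    w, k = np.linalg.solve(A, b)
    print(w, k)   # expected: 4.0 and 1.0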
Example 3.1.2
Find the equation of the line that passes through the points (2, 3) and (−3, 4).
Solution: The general equation of a line is 𝑦 = 𝑚𝑥 + 𝑏. We must determine the values
of the constants 𝑚 and 𝑏. Since both points lie on the line, they must both satisfy the
equation. Therefore,
3 = 2𝑚 + 𝑏 and 4 = −3𝑚 + 𝑏.
Thus, we have a system with two equations and two variables, 𝑚 and 𝑏:
2𝑚 + 𝑏 = 3
−3𝑚 + 𝑏 = 4
We will not solve this system now.
Example 3.1.3

Determine whether the vector 𝑣 = [1 2 3]^T lies in Span{[−1 2 −3]^T, [1 1 1]^T}.

Solution: The vector 𝑣 will lie in the span of the two given vectors if, and only if, there
exist scalars 𝑎, 𝑏 ∈ R, such that

[1 2 3]^T = 𝑎 [−1 2 −3]^T + 𝑏 [1 1 1]^T,

or equivalently

1 = −1𝑎 + 𝑏
2 = 2𝑎 + 𝑏
3 = −3𝑎 + 𝑏.
We have a system with three equations and two variables. We will not solve this system
now.
3.2 Systems of Linear Equations

Definition 3.2.1 (Linear Equation, Coefficient, Constant Term)
A linear equation in 𝑛 variables (or unknowns) 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 is an equation that can
be written in the form 𝑎1 𝑥1 + 𝑎2 𝑥2 + · · · + 𝑎𝑛 𝑥𝑛 = 𝑏, where 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 , 𝑏 ∈ F. The
scalars 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 are the coefficients of 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 , respectively, and 𝑏 is called the
constant term.
Definition 3.2.2 (System of Linear Equations)

A system of linear equations is a collection of 𝑚 linear equations in 𝑛 variables,
𝑥1, . . . , 𝑥𝑛:

𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 = 𝑏2
  ⋮
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚

We will use the convention that 𝑎𝑖𝑗 is the coefficient of 𝑥𝑗 in the 𝑖th equation.
Definition 3.2.3 (Solve, Solution)

We say that the scalars 𝑦1, 𝑦2, . . . , 𝑦𝑛 in F solve a system of linear equations if, when we
set 𝑥1 = 𝑦1, 𝑥2 = 𝑦2, . . . , 𝑥𝑛 = 𝑦𝑛 in the system, then each of the equations is satisfied:

𝑎11 𝑦1 + 𝑎12 𝑦2 + · · · + 𝑎1𝑛 𝑦𝑛 = 𝑏1
𝑎21 𝑦1 + 𝑎22 𝑦2 + · · · + 𝑎2𝑛 𝑦𝑛 = 𝑏2
  ⋮
𝑎𝑚1 𝑦1 + 𝑎𝑚2 𝑦2 + · · · + 𝑎𝑚𝑛 𝑦𝑛 = 𝑏𝑚

We also say that the vector 𝑦 = [𝑦1 𝑦2 · · · 𝑦𝑛]^T is a solution to the system.

Note that if at least one equation in the system of linear equations is not satisfied by the
entries in 𝑦, then 𝑦 is not a solution.
Example 3.2.4

Consider the following system of linear equations:

𝑥1 − 2𝑥2 + 3𝑥3 = 1
2𝑥1 − 4𝑥2 + 6𝑥3 = 2
3𝑥1 − 𝑥2 + 𝑥3 = 3.

The vector [1 0 0]^T is a solution, as

1(1) − 2(0) + 3(0) = 1
2(1) − 4(0) + 6(0) = 2
3(1) − 1(0) + 1(0) = 3.

However, [−2 0 1]^T is not a solution, because even though 1(−2) − 2(0) + 3(1) = 1 and
2(−2) − 4(0) + 6(1) = 2, we have that 3(−2) − 1(0) + 1(1) = −5 ≠ 3.
Definition 3.2.5 (Solution Set)

The set of all solutions to a system of linear equations is called the solution set to the
system.

Example 3.2.6

Solve 2𝑥 = 4. That is, find the solution set to this system of linear equations (a system
with one equation).

Solution: The solution to this equation is 𝑥 = 2 and thus, the solution set 𝑆 is {2}. We
can verify that 2(2) = 4.
Example 3.2.7

Solve 2𝑥 + 4𝑦 = 6.

Solution: We have only one equation for the two unknowns, 𝑥 and 𝑦. There will be an
infinite number of solutions to this equation. Indeed, given any real number 𝑦, we can solve
for 𝑥 to obtain 𝑥 = 3 − 2𝑦.

In this way we see that 𝑥 depends on 𝑦, but that 𝑦 can be freely chosen to be any real
number. We emphasize this point by writing 𝑦 = 𝑡, where the variable 𝑡 ∈ R is called a
parameter. Then 𝑥 = 3 − 2𝑡, and therefore the solution set, 𝑆, can be expressed as

𝑆 = {[3 − 2𝑡, 𝑡]^T : 𝑡 ∈ R}.

We can verify our solution by setting 𝑦 = 𝑡 and 𝑥 = 3 − 2𝑡:

2𝑥 + 4𝑦 = 2(3 − 2𝑡) + 4𝑡 = 6 − 4𝑡 + 4𝑡 = 6,

which is the right-hand side of the equation.

Note that declaring 𝑦 = 𝑡 as our parameter was an arbitrary choice. We could have just
as well let 𝑥 = 𝑡 and solved for 𝑦 in terms of 𝑡. The important property to note is that a
parameter was needed to describe the solution set 𝑆.
Example 3.2.8
Solve the system of linear equations
2𝑥 + 3𝑦 = 4
2𝑥 + 3𝑦 = 5
Solution: Since both equations have identical left-hand sides, but different right-hand
sides, it will be impossible to find values for 𝑥 and 𝑦 that satisfy both of these equations.
Therefore, the solution set is 𝑆 = ∅.
Theorem 3.2.9
(The Solution Set to a System of Linear Equations)
The solution set to a system of linear equations is exactly one of the following:
(a) empty (there are no solutions),
(b) contains exactly one element (there is a unique solution), or
(c) contains an infinite number of elements (the solution set has one or more parameters).
We have given examples above to illustrate these three outcomes. However, at this time,
we will not give a proof of this result.
Definition 3.2.10
Inconsistent and
Consistent Systems
If the solution set to a system of linear equations is empty, then we say that the system is
inconsistent.
If the solution set has a unique solution or infinitely many solutions, then we say that the
system is consistent.
Example 3.2.11

Solve the system

𝑥 = 1
𝑥 = 2.

Solution: There is no solution and 𝑆 = ∅. This system is inconsistent.
Example 3.2.12

Solve the system

𝑥 + 𝑦 = 2
𝑥 − 𝑦 = 4.

Solution: There is a unique solution, 𝑥 = 3 and 𝑦 = −1. Therefore, 𝑆 = {[3 −1]^T}. This
system is consistent.
Example 3.2.13

Solve the system 𝑥 + 𝑦 = 1.

Solution: Let 𝑦 = 𝑡, where 𝑡 ∈ R. Therefore, 𝑥 = 1 − 𝑡. We obtain infinitely many solutions
and 𝑆 = {[1 − 𝑡, 𝑡]^T : 𝑡 ∈ R}. This system is consistent.

Note that if we had let 𝑥 = 𝑡, where 𝑡 ∈ R, we would obtain 𝑦 = 1 − 𝑡, and
𝑆 = {[𝑡, 1 − 𝑡]^T : 𝑡 ∈ R}.
These are two different representations of the same set 𝑆. Generally there will be many
different ways of expressing any given set.
It is usually not obvious whether a system is inconsistent, will have a unique solution, or
will have infinitely many solutions. The key idea to solving a system of equations is to
manipulate the system into another system of equations which is easier to solve and has
the same solution set as the original system.
Definition 3.2.14 (Equivalent Systems)

We say that two linear systems are equivalent whenever they have the same solution set.
Given a system of linear equations, we will manipulate it into equivalent systems and we
will stop once we have obtained an equivalent system whose solution set is easily obtained.
The question then becomes: what manipulations can we do to a system which will produce
an equivalent system?
Example 3.2.15

Consider the system

𝑥 + 𝑦 = 1
2𝑥 + 3𝑦 = 6.

If we multiply the first equation by zero, we get a new system:

0 = 0
2𝑥 + 3𝑦 = 6.

This system is not equivalent to the original system. For example, 𝑥 = 3 and 𝑦 = 0 is a
solution to the new system, but it is not a solution to the original system. We have lost
information by multiplying the first equation by zero.
Example 3.2.16
Consider the system
𝑥 +𝑦 +𝑧 = 3
2𝑥 +3𝑦 −2𝑧 = 3
We could add an additional equation to get a new system.
𝑥 +𝑦 +𝑧 = 3
2𝑥 +3𝑦 −2𝑧 = 3
𝑥 +2𝑦 −3𝑧 = 5
This new system is not equivalent to the old one. For example, 𝑥 = 1, 𝑦 = 1, 𝑧 = 1 is a
solution to the original system, but it is not a solution to the new system. We have added
an extra restriction on the variables by adding in this new equation. When manipulating a
system, we must be careful that we neither lose information nor add information.
Definition 3.2.17 (Elementary Operations)
Consider a system of 𝑚 linear equations in 𝑛 variables. The equations are ordered and
labelled from 𝑒1 to 𝑒𝑚 . The following three operations are known as elementary operations.
Equation swap elementary operation: interchange two equations.
𝑒𝑖 ↔ 𝑒𝑗
Interchange equations 𝑒𝑖 and 𝑒𝑗 .
Equation scale elementary operation: multiply one equation by a non-zero constant.
𝑒𝑖 → 𝑚𝑒𝑖 , 𝑚 ∈ F∖{0} Replace equation 𝑒𝑖 by 𝑚 times equation 𝑒𝑖 .
Equation addition elementary operation: add a multiple of one equation to another
equation.
𝑒𝑗 → 𝑐𝑒𝑖 + 𝑒𝑗, 𝑖 ≠ 𝑗, 𝑐 ∈ F   Replace equation 𝑒𝑗 by adding 𝑒𝑗 and a multiple of equation 𝑒𝑖.
Some sources refer to these operations as Type I, Type II, and Type III operations,
respectively.
When performing a series of elementary operations, we will use the convention that 𝑒𝑖 and
𝑒𝑗 refer to the current system we are working with. Therefore, 𝑒𝑖 and 𝑒𝑗 will change as the
system is manipulated.
Theorem 3.2.18 (Elementary Operations)
If a single elementary operation of any type is performed on a system of linear equations,
then the system produced will be equivalent to the original system.
In practice, there might be other operations that we want to perform. These operations will
not be elementary operations, instead they will be combinations of elementary operations.
For example,
𝑒𝑗 → 𝑐𝑒𝑖 − 𝑒𝑗
is not an elementary operation, but rather it is a combination of two elementary operations.
Can you determine which ones?
Using these operations will still produce an equivalent system, but they will cause confusion
later. Therefore, when manipulating a system of equations, we will make use of the three
elementary operations only.
Note that if at any point we produce an equation of the form
0 = 𝑎, where 𝑎 ̸= 0,
then the system is inconsistent and we stop at once.
Definition 3.2.19
Trivial Equation
We refer to the equation 0 = 0 as the trivial equation. Any other equation is known as a
non-trivial equation.
The trivial equation is always true and it has no information content. If at any point
when performing elementary operations we produce the trivial equation, then we move this
equation to the end of the system, by performing an equation swap elementary operation.
We might be tempted to ignore such equations and not write them down. In practice, we
keep any (there may be more than one) equations of this form in our new system, so that
we will always have the same number of equations in any of the equivalent systems.
The goal of performing elementary operations is to obtain a system whose solution set is
easier to determine. Let us refer to this system as our “final equivalent system.” The final
equivalent system is not unique and depends on the sequence of elementary operations used.
Example 3.2.20
Solve the following system of three linear equations in three unknowns:
−2𝑥1 −4𝑥2 −6𝑥3 = 4
3𝑥1 +6𝑥2 +10𝑥3 = 6 .
𝑥2 +2𝑥3 = 5
Solution: Suppose that we perform the elementary operation

𝑒1 → −(1/2)𝑒1

to get the new system

𝑥1 + 2𝑥2 + 3𝑥3 = −2
3𝑥1 + 6𝑥2 + 10𝑥3 = 6
𝑥2 + 2𝑥3 = 5.

We will now make use of this new equation 𝑒1 to remove the 𝑥1 term from 𝑒2.

𝑒2 → −3𝑒1 + 𝑒2

gives

𝑥1 + 2𝑥2 + 3𝑥3 = −2
𝑥3 = 12
𝑥2 + 2𝑥3 = 5.

This could be our final equivalent system. From 𝑒2 we obtain that 𝑥3 = 12. We could make
use of this fact and 𝑒3 to get 𝑥2 + 2(12) = 5, so that 𝑥2 = −19. We can put the values for
𝑥2 and 𝑥3 into 𝑒1 to get 𝑥1 + 2(−19) + 3(12) = −2, so that 𝑥1 = 0.

However, instead of doing these substitutions, we may wish to continue applying elementary
operations. We perform the operation 𝑒2 ↔ 𝑒3 to get

𝑥1 + 2𝑥2 + 3𝑥3 = −2
𝑥2 + 2𝑥3 = 5
𝑥3 = 12.

Then we perform 𝑒2 → −2𝑒3 + 𝑒2 to get

𝑥1 + 2𝑥2 + 3𝑥3 = −2
𝑥2 = −19
𝑥3 = 12,

then 𝑒1 → −3𝑒3 + 𝑒1 to get

𝑥1 + 2𝑥2 = −38
𝑥2 = −19
𝑥3 = 12,

and finally 𝑒1 → −2𝑒2 + 𝑒1 to get

𝑥1 = 0
𝑥2 = −19
𝑥3 = 12.

This is our final equivalent system, and its solution set is

𝑆 = {[0 −19 12]^T}.
It is clear from this example that there will be plenty of choices on which elementary
operations to apply at any stage and also that there will be choices as to when we will stop
at our final equivalent system. There is no “best” final equivalent system. However, there
are two particular forms that are preferable for the final equivalent system. We will attempt
to motivate these two forms and discuss how they can be obtained.
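As a quick sanity check on Example 3.2.20, the solution found there can be substituted back into the original equations; the short sketch below (plain Python, with variable names of our own choosing) does exactly that.

    # Solution obtained in Example 3.2.20.
    x1, x2, x3 = 0, -19, 12

    # Each pair is (left-hand side evaluated at the solution, right-hand side).
    checks = [
        (-2 * x1 - 4 * x2 - 6 * x3, 4),
        (3 * x1 + 6 * x2 + 10 * x3, 6),
        (x2 + 2 * x3, 5),
    ]
    print(all(lhs == rhs for lhs, rhs in checks))   # expected: True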
3.3 An Approach to Solving Systems of Linear Equations
Let us consider a system of 𝑚 linear equations and 𝑛 unknowns. We normally try to solve
for the variables of the system in the alphabetical or numerical order in which they appear.
Let us assume that the variables are labelled 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 . We assume that there is at least
one equation with the variable 𝑥1 appearing in it with a non-zero coefficient (by relabeling
the variables, if necessary).
We would like the first equation to tell us about the first variable, 𝑥1 , so that it is of the
form
𝑎1 𝑥1 + · · · = 𝑏1 ,
with 𝑎1 ̸= 0. We will make use of the first equation to remove the variable 𝑥1 from all
the other equations after the first equation by performing equation addition elementary
operations.
We would like the second equation to tell us about the next variable that can be obtained.
Call this 𝑥𝑖 . Often this would be the variable 𝑥2 , so that 𝑖 = 2, but this is not always the
case. The (new) second equation has the form
𝑎𝑖 𝑥𝑖 + · · · = 𝑏2 ,
with 𝑎𝑖 ̸= 0. We will make use of the second equation to remove the variable 𝑥𝑖 from all
the other equations after the second equation by performing equation addition elementary
operations.
We repeat this process, moving down through the equations, performing equation swap and
equation addition elementary operations as needed. We also move any trivial equations to
the end of the system.
We assume that the 𝑟𝑡ℎ equation is the last non-trivial equation. (It is possible that 𝑟 is
equal to 𝑚.) We also assume that this equation tells us about the 𝑘 𝑡ℎ variable, 𝑥𝑘 . Note
that this equation should not involve the variables which were removed from the preceding
equations by the process above. (𝑥1 from equation 1, 𝑥𝑖 from equation 2, etc.) Therefore,
we have an equation of the form
𝑎𝑘 𝑥𝑘 + · · · = 𝑏𝑟 ,
with 𝑎𝑘 ̸= 0.
We illustrate the process up to this point with the following example.
Example 3.3.1

Solve the following system of five linear equations in four unknowns:

𝑥1 + 2𝑥2 = 1
𝑥1 + 2𝑥2 + 3𝑥3 + 𝑥4 = 0
−𝑥1 − 𝑥2 + 𝑥3 + 𝑥4 = −2
𝑥2 + 𝑥3 + 𝑥4 = −1
−𝑥2 + 2𝑥3 = 0.
Solution: We will apply elementary operations, usually just one or two at a time. At each
stage, we will produce an equivalent system. We will stop the process when we produce an
equivalent system whose solution set is relatively easily obtained.
We notice that 𝑒1 has 𝑥1 in it and we will use it to eliminate the variable 𝑥1 from all the
equations below it. (If 𝑒1 did not have any 𝑥1 terms, then we would have switched it with
one that did.)
We perform the elementary operations
𝑒2 → −𝑒1 + 𝑒2
𝑒3 → 𝑒1 + 𝑒3
We notice that 𝑒4 and 𝑒5 do not contain 𝑥1 and so they are not modified at this stage. We
get

𝑥1 + 2𝑥2 = 1
3𝑥3 + 𝑥4 = −1
𝑥2 + 𝑥3 + 𝑥4 = −1
𝑥2 + 𝑥3 + 𝑥4 = −1
−𝑥2 + 2𝑥3 = 0.
We notice that 𝑒2 does not have an 𝑥2 variable, but 𝑒3 does. Therefore, we perform the
elementary operation
𝑒2 ↔ 𝑒3
to get

𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
3𝑥3 + 𝑥4 = −1
𝑥2 + 𝑥3 + 𝑥4 = −1
−𝑥2 + 2𝑥3 = 0.
There is an 𝑥2 in 𝑒2 and we can make use of this fact to remove the variable 𝑥2 from the
equations below it.
Performing the elementary operations
𝑒4 → −𝑒2 + 𝑒4
𝑒5 → 𝑒2 + 𝑒5
yields

𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
3𝑥3 + 𝑥4 = −1
0 = 0
3𝑥3 + 𝑥4 = −1.
There is an 𝑥3 in 𝑒3 and we make use of this fact to remove 𝑥3 from all the equations below
it. In this case, we need only perform
𝑒5 → −𝑒3 + 𝑒5
to obtain

𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
3𝑥3 + 𝑥4 = −1
0 = 0
0 = 0.
Notice that the two trivial equations are at the end.
At this point in the process, the form of the equivalent system of equations has a particularly
simple appearance. We may choose to stop performing elementary operations at this point,
and so this would be our final equivalent system.
Note that a different sequence of elementary operations could produce a different final
equivalent system of a similar form. For example, a final equivalent system could involve
the equation 6𝑥3 + 2𝑥4 = −2.
Proceeding from the original system to the final equivalent system in this form is known as
the forward elimination phase.
Once we have a final equivalent system in this form, we can obtain the solution set using a
process known as back substitution. We use the information given in the 𝑟𝑡ℎ equation for
the variable 𝑥𝑘 to then back substitute for the variable 𝑥𝑘 into all the previous equations
in our final equivalent system. Then we repeat this process moving back up the system.
Our last step is to use the information given in the second equation for the variable 𝑥𝑖 to
back substitute for the variable 𝑥𝑖 into the first equation (if it appears there).
Let us return to our example.
Example 3.3.1
Our final equivalent system was
𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
3𝑥3 + 𝑥4 = −1
0 = 0
0 = 0.
We have completed the forward elimination phase and this system is a good choice for a
final equivalent system. It will have the same solution set as the original system and we
can obtain its solution by back substitution.
We notice that 𝑒4 and 𝑒5 are trivial equations and do not give us any information. We also
notice that 𝑒3 gives us a relationship between variables 𝑥3 and 𝑥4 . One of these variables
can arbitrarily be assigned any real value and the value of the other variable will depend
on this arbitrary value.
We will choose to assign 𝑥4 any real value and to emphasize this fact we will let 𝑥4 = 𝑡,
where 𝑡 ∈ R.
Therefore, 𝑒3 tells us that

𝑥3 = (1/3)(−1 − 𝑡) = −1/3 − (1/3)𝑡.

Then 𝑒2 tells us that

𝑥2 = −1 − 𝑥3 − 𝑥4 = −1 − (1/3)(−1 − 𝑡) − 𝑡 = −2/3 − (2/3)𝑡.

Then 𝑒1 tells us that

𝑥1 = 1 − 2𝑥2 = 1 − 2(−2/3 − (2/3)𝑡) = 7/3 + (4/3)𝑡.
Therefore, the solution set is

𝑆 = {[7/3 + (4/3)𝑡, −2/3 − (2/3)𝑡, −1/3 − (1/3)𝑡, 𝑡]^T : 𝑡 ∈ R}
  = {[7/3, −2/3, −1/3, 0]^T + 𝑡 [4/3, −2/3, −1/3, 1]^T : 𝑡 ∈ R}.
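If a computer algebra system is at hand, this parametric solution can also be reproduced symbolically. The sketch below is only an illustration and assumes SymPy is available; its linsolve routine expresses the basic variables in terms of the free variable 𝑥4.

    from sympy import Matrix, symbols, linsolve

    x1, x2, x3, x4 = symbols("x1 x2 x3 x4")

    # Coefficient matrix and right-hand side of the system in Example 3.3.1.
    A = Matrix([[ 1,  2, 0, 0],
                [ 1,  2, 3, 1],
                [-1, -1, 1, 1],
                [ 0,  1, 1, 1],
                [ 0, -1, 2, 0]])
    b = Matrix([1, 0, -2, -1, 0])

    # Expected: x1 = 7/3 + 4*x4/3, x2 = -2/3 - 2*x4/3, x3 = -1/3 - x4/3, with x4 free.
    print(linsolve((A, b), x1, x2, x3, x4))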
An alternative way to proceed, instead of performing back substitution, is to continue to
simplify the system. The following extension yields a unique final equivalent system, which
is particularly simple.
First, using equation scale elementary operations, scale each of the non-trivial equations of
the current system. Scale the first equation by the factor 1/𝑎1, the second equation by the
factor 1/𝑎𝑖 and so on, scaling the 𝑟th equation by the factor 1/𝑎𝑘.
Using the 𝑟𝑡ℎ equation and equation addition elementary operations, we eliminate the variable 𝑥𝑘 from all the equations above it. We then examine the (𝑟 − 1)𝑠𝑡 equation, which will
be of the form
𝑥𝑗 + · · · = 𝑏𝑟−1 ,
for some natural number 𝑗 < 𝑘. Using this equation and equation addition elementary
operations, we eliminate the variable 𝑥𝑗 from all the equations above it.
We repeat this process, moving up through the equations, and performing equation addition
elementary operations to eliminate variables. The process ends when we use the second
equation to eliminate the variable 𝑥𝑖 from the first equation.
These additional steps, performed after the end of the forward elimination phase, are collectively known as the backward elimination phase. When we are performing these steps
we say that we are doing backward elimination.
For a given system, there are many different possibilities for the equivalent system after
doing forward elimination, depending on the particular elementary operations used. However, regardless of the elementary operations used, the same final equivalent system will
be obtained after performing both forward and backward elimination. We will formally
establish this result in Theorem 3.5.8 (Unique RREF).
Once again we return to our example.
Example 3.3.1
Our final equivalent system after the forward elimination phase was
𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
3𝑥3 + 𝑥4 = −1
0 = 0
0 = 0
and instead of using back substitution, we continue performing elementary operations.
We observe that the last two equations are trivial equations and 𝑒3 is in terms of both
variables 𝑥3 and 𝑥4 . We perform the elementary operation

𝑒3 → (1/3)𝑒3

to get

𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
𝑥3 + (1/3)𝑥4 = −1/3
0 = 0
0 = 0.
We use 𝑒3 to eliminate the variable 𝑥3 from the equations above it, by performing
𝑒2 → −𝑒3 + 𝑒2
to get

𝑥1 + 2𝑥2 = 1
𝑥2 + (2/3)𝑥4 = −2/3
𝑥3 + (1/3)𝑥4 = −1/3
0 = 0
0 = 0.
We use 𝑒2 to eliminate variable 𝑥2 from the equation above it by performing
𝑒1 → −2𝑒2 + 𝑒1
to get

𝑥1 − (4/3)𝑥4 = 7/3
𝑥2 + (2/3)𝑥4 = −2/3
𝑥3 + (1/3)𝑥4 = −1/3
0 = 0
0 = 0.
We have completed the backward elimination phase. We have that 𝑥4 = 𝑡, 𝑡 ∈ R. From 𝑒1
we obtain that 𝑥1 = 7/3 + (4/3)𝑡. From 𝑒2 we obtain that 𝑥2 = −2/3 − (2/3)𝑡. From 𝑒3 we
obtain that 𝑥3 = −1/3 − (1/3)𝑡.
Therefore, the solution set is

𝑆 = {[7/3 + (4/3)𝑡, −2/3 − (2/3)𝑡, −1/3 − (1/3)𝑡, 𝑡]^T : 𝑡 ∈ R}
  = {[7/3, −2/3, −1/3, 0]^T + 𝑡 [4/3, −2/3, −1/3, 1]^T : 𝑡 ∈ R}.

If we use another parameter, 𝑢 = 𝑡/3, then we remove some of the fractions to get

𝑆 = {[7/3 + 4𝑢, −2/3 − 2𝑢, −1/3 − 𝑢, 3𝑢]^T : 𝑢 ∈ R}
  = {[7/3, −2/3, −1/3, 0]^T + 𝑢 [4, −2, −1, 3]^T : 𝑢 ∈ R}.

Alternatively, we could let 𝑠 = (𝑡 − 2)/3 and get an even nicer looking solution set

𝑆 = {[5, −2, −1, 2]^T + 𝑠 [4, −2, −1, 3]^T : 𝑠 ∈ R}.
REMARK

Remember that, when a solution set contains at least one parameter, you can obtain particular
solutions by setting particular values to parameters. In the case of Example 3.3.1,
the solution set of the system of linear equations is given by

𝑆 = {[5, −2, −1, 2]^T + 𝑠 [4, −2, −1, 3]^T : 𝑠 ∈ R}.

By setting 𝑠 = −1, 𝑠 = 0 and 𝑠 = 1, we find that

[1 0 0 −1]^T,   [5 −2 −1 2]^T   and   [9 −4 −2 5]^T

are particular solutions to the given system of linear equations.
3.4 Solving Systems of Linear Equations Using Matrices
When we are solving a system of equations, we manipulate the system using elementary
operations. At each stage, we have an equivalent system of equations. That is, each system
has the same solution set. We can stop at any stage with any equivalent system as soon as
we find that we can write down the solution set.
In the systems of equations that we have solved, notice that we write down the variables
many times in the process, but it turns out they are only really important at the beginning
and the end of the process.
Moving forwards, we will only need to write down the coefficients and the constant terms
in the intermediate steps. To do this, we will make use of matrices.
Definition 3.4.1 (Matrix, Entry)

An 𝑚 × 𝑛 matrix, 𝐴, is a rectangular array of scalars with 𝑚 rows and 𝑛 columns. The
scalar in the 𝑖th row and 𝑗th column is the (𝑖, 𝑗)th entry and is denoted 𝑎𝑖𝑗 or (𝐴)𝑖𝑗. That
is,

𝐴 = [ 𝑎11 𝑎12 · · · 𝑎1𝑛 ]
    [ 𝑎21 𝑎22 · · · 𝑎2𝑛 ]
    [  ⋮    ⋮   ⋱    ⋮  ]
    [ 𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 ]
Definition 3.4.2 (Coefficient Matrix, Augmented Matrix)

For a given system of linear equations,

𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 = 𝑏2
  ⋮
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚,

the coefficient matrix, 𝐴, of the system is the matrix

𝐴 = [ 𝑎11 𝑎12 · · · 𝑎1𝑛 ]
    [ 𝑎21 𝑎22 · · · 𝑎2𝑛 ]
    [  ⋮    ⋮   ⋱    ⋮  ]
    [ 𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 ]

The entry 𝑎𝑖𝑗 is the coefficient of the variable 𝑥𝑗 in the 𝑖th equation.

The augmented matrix, [𝐴 | 𝑏], of the system is

[𝐴 | 𝑏] = [ 𝑎11 𝑎12 · · · 𝑎1𝑛 | 𝑏1 ]
          [ 𝑎21 𝑎22 · · · 𝑎2𝑛 | 𝑏2 ]
          [  ⋮    ⋮   ⋱    ⋮  |  ⋮ ]
          [ 𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 | 𝑏𝑚 ]

where 𝑏 is a column whose entries are the constant terms on the right-hand side of the
equations.

The augmented matrix is the coefficient matrix, 𝐴, with the extra column, 𝑏, added.
In solving a system of equations, we can manipulate the rows of the augmented matrix,
which will correspond to manipulating the equations.
Definition 3.4.3 (Elementary Row Operation (ERO))

Elementary row operations (EROs) are the operations performed on the coefficient
and/or augmented matrix which correspond to the elementary operations performed on the
system of equations.

Elementary operation | Equation                    | Row
Row swap             | 𝑒𝑖 ↔ 𝑒𝑗                     | 𝑅𝑖 ↔ 𝑅𝑗
Row scale            | 𝑒𝑖 → 𝑐𝑒𝑖, 𝑐 ≠ 0             | 𝑅𝑖 → 𝑐𝑅𝑖, 𝑐 ≠ 0
Row addition         | 𝑒𝑖 → 𝑐𝑒𝑗 + 𝑒𝑖, 𝑖 ≠ 𝑗        | 𝑅𝑖 → 𝑐𝑅𝑗 + 𝑅𝑖, 𝑖 ≠ 𝑗

As with the equation operations, these EROs are referred to in some sources as Type I,
Type II, and Type III operations, respectively.
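Each ERO translates directly into a small routine on a list-of-rows representation of an augmented matrix. The following is a minimal sketch (plain Python; the function names are our own, not standard notation), not the only way to encode the operations.

    def row_swap(M, i, j):
        """R_i <-> R_j (rows are 0-indexed here)."""
        M[i], M[j] = M[j], M[i]

    def row_scale(M, i, c):
        """R_i -> c R_i, with c non-zero."""
        M[i] = [c * entry for entry in M[i]]

    def row_add(M, i, j, c):
        """R_i -> c R_j + R_i, with i != j."""
        M[i] = [x + c * y for x, y in zip(M[i], M[j])]

    # A small augmented matrix [A | b] for the system x + y = 1, 2x + 3y = 6.
    M = [[1, 1, 1],
         [2, 3, 6]]
    row_add(M, 1, 0, -2)     # R2 -> -2 R1 + R2
    print(M)                 # expected: [[1, 1, 1], [0, 1, 4]]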
Definition 3.4.4 (Zero Row)

In a matrix, we refer to a row that has all zero entries as a zero row.
Definition 3.4.5
Row Equivalent
If a matrix 𝐵 is obtained from a matrix 𝐴 by a finite number of EROs, then we say that 𝐵
is row equivalent to 𝐴.
At any point in the process of manipulating the rows of an augmented matrix, we could stop
and immediately write down the corresponding system of linear equations. The first column
in the matrix corresponds to the variable 𝑥1 , the second column in the matrix corresponds
to the variable 𝑥2 and so on up to the second-last column in the matrix which corresponds
to the variable 𝑥𝑛 . The last column in the matrix corresponds to the constants on the right
hand side of the system.
The augmented matrix at any stage is row equivalent to the original augmented matrix.
Therefore, the system of equations corresponding to this augmented matrix is equivalent to
the system of equations corresponding to the original augmented matrix.
Example 3.4.6

Solve the following system (the same system as in Example 3.3.1) using augmented matrices
and EROs.

𝑥1 + 2𝑥2 = 1
𝑥1 + 2𝑥2 + 3𝑥3 + 𝑥4 = 0
−𝑥1 − 𝑥2 + 𝑥3 + 𝑥4 = −2
𝑥2 + 𝑥3 + 𝑥4 = −1
−𝑥2 + 2𝑥3 = 0.

Solution: The augmented matrix for this system is

[  1  2 0 0 |  1 ]
[  1  2 3 1 |  0 ]
[ −1 −1 1 1 | −2 ]
[  0  1 1 1 | −1 ]
[  0 −1 2 0 |  0 ]
We begin by eliminating 𝑥1 from the second and third equations by performing the following
EROs
𝑅2 → −𝑅1 + 𝑅2
𝑅3 → 𝑅1 + 𝑅3
to get

[ 1  2 0 0 |  1 ]
[ 0  0 3 1 | −1 ]
[ 0  1 1 1 | −1 ]
[ 0  1 1 1 | −1 ]
[ 0 −1 2 0 |  0 ]
The second row does not contain the variable 𝑥2 , but the third one does and so we swap
these two rows, 𝑅2 ↔ 𝑅3 , to get

[ 1  2 0 0 |  1 ]
[ 0  1 1 1 | −1 ]
[ 0  0 3 1 | −1 ]
[ 0  1 1 1 | −1 ]
[ 0 −1 2 0 |  0 ]
We now make use of the new second row to eliminate 𝑥2 from all rows after the second one,
by performing the following EROs
𝑅4 → −𝑅2 + 𝑅4
𝑅5 → 𝑅2 + 𝑅5
to get

[ 1 2 0 0 |  1 ]
[ 0 1 1 1 | −1 ]
[ 0 0 3 1 | −1 ]
[ 0 0 0 0 |  0 ]
[ 0 0 3 1 | −1 ]
We use the third row to eliminate 𝑥3 from the rows after it, by performing the following
ERO
𝑅5 → −𝑅3 + 𝑅5
to get

[ 1 2 0 0 |  1 ]
[ 0 1 1 1 | −1 ]
[ 0 0 3 1 | −1 ]
[ 0 0 0 0 |  0 ]
[ 0 0 0 0 |  0 ]
At this point, we have completed the forward elimination phase using a sequence of EROs
equivalent to how we manipulated the equations in Example 3.3.1.
We can write down the system of equations that corresponds to the final augmented matrix
that we obtained. This system of equations is equivalent to the original system.
𝑥1 + 2𝑥2 = 1
𝑥2 + 𝑥3 + 𝑥4 = −1
3𝑥3 + 𝑥4 = −1
0 = 0
0 = 0
This system is the same system we obtained after the forward elimination phase when we
were using elementary operations. As before we can use back substitution to obtain the
solution set

𝑆 = {[7/3 + (4/3)𝑡, −2/3 − (2/3)𝑡, −1/3 − (1/3)𝑡, 𝑡]^T : 𝑡 ∈ R}
  = {[7/3, −2/3, −1/3, 0]^T + 𝑡 [4/3, −2/3, −1/3, 1]^T : 𝑡 ∈ R}.
Instead of using back substitution we can continue using EROs to obtain an even simpler
system.
We scale the third row by performing the following ERO

𝑅3 → (1/3)𝑅3

to get

[ 1 2 0  0  |   1  ]
[ 0 1 1  1  |  −1  ]
[ 0 0 1 1/3 | −1/3 ]
[ 0 0 0  0  |   0  ]
[ 0 0 0  0  |   0  ]
Next, we use the third row to eliminate 𝑥3 from the equations above it. We do this by
performing the following ERO
𝑅2 → −𝑅3 + 𝑅2
yielding

[ 1 2 0  0  |   1  ]
[ 0 1 0 2/3 | −2/3 ]
[ 0 0 1 1/3 | −1/3 ]
[ 0 0 0  0  |   0  ]
[ 0 0 0  0  |   0  ]
The last step of the backward elimination phase is to eliminate 𝑥2 from the first equation,
which is achieved by the ERO
𝑅1 → −2𝑅2 + 𝑅1
yielding

[ 1 0 0 −4/3 |  7/3 ]
[ 0 1 0  2/3 | −2/3 ]
[ 0 0 1  1/3 | −1/3 ]
[ 0 0 0   0  |   0  ]
[ 0 0 0   0  |   0  ]
This augmented matrix has a very simple structure. We could argue that it has the simplest
structure of all the augmented matrices we have seen in this example. In the next section
we will introduce terminology to formalize this structure.
The system of equations corresponding to the last augmented matrix above is
𝑥1 − (4/3)𝑥4 = 7/3
𝑥2 + (2/3)𝑥4 = −2/3
𝑥3 + (1/3)𝑥4 = −1/3
0 = 0
0 = 0
This system is the same system we obtained after the forward and backward elimination
phases when we were using elementary operations. Therefore, the solution set is

𝑆 = {[7/3 + (4/3)𝑡, −2/3 − (2/3)𝑡, −1/3 − (1/3)𝑡, 𝑡]^T : 𝑡 ∈ R}
  = {[7/3, −2/3, −1/3, 0]^T + 𝑡 [4/3, −2/3, −1/3, 1]^T : 𝑡 ∈ R}.
REMARK
In an augmented matrix, a zero row corresponds to the equation 0 = 0. In the above
example, the last two rows of the augmented matrix eventually became zero rows. Although
these equations do not give us any information about the solution to the system of equations,
we continued to write them down so that we would not forget that our system started with
five equations.
If at any point we have a zero row in the coefficient matrix and the last entry in the
corresponding row in the augmented matrix is 𝑏 ̸= 0, then we can stop and deduce that our
system is inconsistent, since one of the equations has become 0 = 𝑏, where 𝑏 ̸= 0.
3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations
Consider a system of 𝑚 linear equations in 𝑛 variables (unknowns). We assume the variables
are labelled 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 . The Gauss–Jordan algorithm is the formalization of the strategy
we used in the previous section to solve such a system. In this algorithm, we use EROs to
manipulate the augmented matrix into a simpler form from which it is easier to determine
the solution of the system. The process of performing EROs on the matrix to bring it into
these simpler forms is called row reduction or Gaussian elimination after Carl Friedrich
Gauss (1777-1855) who outlined the process; a similar method for solving systems of linear
equations was known to the Chinese around 250 B.C.
In the previous section, we obtained two simpler forms of the augmented matrix. The
first was obtained after completing the forward elimination process and is called the row
echelon form. The second is obtained after completing both the forward and backward
elimination processes and is called the reduced row echelon form. We now give the
formal definitions of these forms in terms of leading entries and pivots.
Definition 3.5.1 (Leading Entry, Leading One)

The leftmost non-zero entry in any non-zero row of a matrix is called the leading entry
of that row. If the leading entry is a 1, then it is called a leading one.

Definition 3.5.2 (Row Echelon Form)
We say that a matrix is in row echelon form (REF) whenever both of the following two
conditions are satisfied:
1. All zero rows occur as the final rows in the matrix.
2. The leading entry in any non-zero row appears in a column to the right of the columns
containing the leading entries of any of the rows above it.
We say that the matrix 𝑅 is a row echelon form of matrix 𝐴 to mean that 𝑅 is in row
echelon form and that 𝑅 can be obtained from 𝐴 by performing a finite number of EROs
to 𝐴.
Definition 3.5.3 (Pivot, Pivot Position, Pivot Column, Pivot Row)
If a matrix is in REF, then the leading entries are referred to as pivots and their positions
in the matrix are called pivot positions. Any column that contains a pivot position is
called a pivot column. Any row that contains a pivot position is called a pivot row.
Due to the structure of REF, any pivot column will contain exactly one pivot and any pivot
row will contain exactly one pivot.
Example 3.5.4

The matrix

[ 3 2 −3  4 ]
[ 0 0  2  0 ]
[ 0 0  0 −5 ]
[ 0 0  0  0 ]
[ 0 0  0  0 ]

is in REF, with pivots in the (1, 1), (2, 3) and (3, 4) entries. However, the matrix

[ −3 3 5  7 ]
[  0 2 6  0 ]
[  1 0 1 −2 ]

is not in REF because the leading entry in the third row (the 1 in the first column) is not
to the right of the leading entry of one (in fact both of) the rows above it.
Definition 3.5.5 (Reduced Row Echelon Form)
We say that a matrix is in reduced row echelon form (RREF) whenever all of the
following three conditions are satisfied:
1. It is in REF.
2. All its pivots are leading ones.
3. The only non-zero entry in a pivot column is the pivot itself.
REMARK
Item “3.” from the definition of RREF above implies that all entries above and below a
pivot MUST be zero for a matrix in RREF; in other words, any column of a matrix in
RREF with two or more non-zero entries is not a pivot column.
Example 3.5.6

The matrix

[ 3 2 −3  4 ]
[ 0 0  2  0 ]
[ 0 0  0 −5 ]
[ 0 0  0  0 ]
[ 0 0  0  0 ]

from the previous example is in REF, but not in RREF, because its pivots are not leading
ones, and moreover, the third and fourth pivot columns contain non-zero entries that are
not pivots.

The matrix

[ 1 2 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]
[ 0 0 0 0 ]

is in RREF.
Example 3.5.7

List all possible 2 × 2 matrices that are in RREF.

Solution: Since a 2 × 2 matrix has two rows and two columns, then the number of pivots
is either zero, one or two. If there are zero pivots, then all entries in the matrix must be 0.
Therefore, we obtain the matrix

[ 0 0 ]
[ 0 0 ]

If there is one pivot, then this pivot must occur in the first row and the second row must
be a zero row. The pivot could occur in the first column or the second column. If the pivot
occurs in the first column, then the (1, 2) entry could be anything. Therefore, we obtain an
infinite number of matrices of the form

[ 1 𝑎 ]
[ 0 0 ]   with 𝑎 ∈ F.

If the pivot occurs in the second column, then the (1, 1) entry must be 0. Therefore, we
obtain the matrix

[ 0 1 ]
[ 0 0 ]

If there are two pivots, then they must occur in the positions (1, 1) and (2, 2) and the other
entries must be 0. Therefore, we obtain the matrix

[ 1 0 ]
[ 0 1 ]
If 𝐴 is not a matrix of all zeros, then there are an infinite number of matrices which are
REFs of 𝐴. However, the pivot positions of 𝐴 are unique, as is its RREF. At this
point, we do not have the tools to prove this result.
Theorem 3.5.8
(Unique RREF)
Let 𝐴 be a matrix with REFs 𝑅1 and 𝑅2 . Then 𝑅1 and 𝑅2 will have the same set of pivot
positions. Moreover, there is a unique matrix 𝑅 such that 𝑅 is the RREF of 𝐴.
Because the RREF of a given matrix 𝐴 is unique, we can introduce notation to refer to it.
Definition 3.5.9
RREF(𝐴)
We say that the matrix 𝑅 is the reduced row echelon form of matrix 𝐴, and we write
𝑅 = RREF(𝐴), if 𝑅 is in reduced row echelon form and if 𝑅 can be obtained from 𝐴 by
performing a finite number of EROs to 𝐴.
Having laid down the needed definitions, we now outline an algorithm that takes as input a
given matrix 𝐴 and produces a row equivalent matrix 𝑅 in REF. This algorithm corresponds
to the forward elimination process for a consistent system.
ALGORITHM (Obtaining an REF)
1. Consider the leftmost non-zero column of the matrix. Use EROs to obtain a leading
entry in the top position of this column. This entry is now a pivot and this row is
now a pivot row.
2. Use EROs to change all other entries below the pivot in this pivot column to 0.
3. Consider the submatrix formed by covering up the current pivot row and all previous
pivot rows. If there are no more rows or if the only remaining rows are zero rows,
we are finished. Otherwise, repeat steps 1 and 2 on the submatrix. Continue in this
manner, covering up the current pivot row to obtain a matrix with one less row until
no rows remain or we obtain a submatrix with only zero rows.
We can also outline a similar algorithm that takes as input a matrix 𝑅 in REF and produces
the RREF of 𝑅. This will correspond to the backward elimination process.
ALGORITHM (Obtaining the RREF from an REF)
Start with a matrix in REF.
1. Select the rightmost pivot column. If the pivot is not already 1, use EROs to change
it to 1.
2. Use EROs to change all entries above the pivot in this pivot column to 0.
3. Consider the submatrix formed by covering up the current pivot row and all other
rows below it. If there are no more rows, then we are finished. Otherwise, repeat
steps 1 and 2 on the submatrix until no rows remain.
The preceding two algorithms can be run in sequence to take a matrix 𝐴 and produce
RREF(𝐴). This process is known as the Gauss–Jordan1 algorithm.
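As a programming illustration only (not part of the formal development in these notes), here is a minimal Python sketch of the combined forward and backward elimination, using exact fractions so that no rounding occurs; the function name and the representation of a matrix as a list of rows are our own choices.

    from fractions import Fraction

    def rref(rows):
        """Return the reduced row echelon form of a matrix given as a list of rows."""
        M = [[Fraction(x) for x in row] for row in rows]
        m, n = len(M), len(M[0])
        pivot_row = 0
        for col in range(n):
            # Find a row at or below pivot_row with a non-zero entry in this column.
            pivot = next((r for r in range(pivot_row, m) if M[r][col] != 0), None)
            if pivot is None:
                continue                                       # no pivot in this column
            M[pivot_row], M[pivot] = M[pivot], M[pivot_row]    # row swap
            pv = M[pivot_row][col]
            M[pivot_row] = [x / pv for x in M[pivot_row]]      # row scale: make a leading one
            for r in range(m):                                 # row addition: clear the column
                if r != pivot_row and M[r][col] != 0:
                    factor = M[r][col]
                    M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
            pivot_row += 1
            if pivot_row == m:
                break
        return M

    # The augmented matrix from Example 3.4.6; the output matches the RREF found there.
    print(rref([[ 1,  2, 0, 0,  1],
                [ 1,  2, 3, 1,  0],
                [-1, -1, 1, 1, -2],
                [ 0,  1, 1, 1, -1],
                [ 0, -1, 2, 0,  0]]))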
REMARKS
• If the system is inconsistent, then at some point we will obtain a row of the form
[0 · · · 0 | 𝑏], where 𝑏 ≠ 0. This row corresponds to the equation 0 = 𝑏, where
𝑏 ̸= 0. As soon as we obtain such a row, we can stop the algorithm and conclude that
the system is inconsistent.
• We will not always blindly follow these algorithms when finding an REF or the RREF.
There may be other choices along the way that lead to fewer calculations, or we may
want to avoid fractions for as long as possible. When determining an REF we often
try to obtain leading 1s, even though they are not required, because they are easier
to work with.
¹ Named after the aforementioned Gauss and Wilhelm Jordan (1842–1899), a German engineer who popularized this method for solving systems of equations.
We will apply the Gauss–Jordan algorithm to the augmented matrix obtained from a system
of linear equations. This will allow us to easily obtain the solution set to the system. The
following terminology will be helpful in describing these solution sets.
Definition 3.5.10
Basic Variable, Free
Variable
Consider a system of linear equations. Let 𝑅 be an REF of the coefficient matrix of this
system. If the 𝑖𝑡ℎ column of this matrix contains a pivot, then we call 𝑥𝑖 a basic variable.
Otherwise, we call 𝑥𝑖 a free variable.
This terminology reflects the fact that we will be able to assign parameters to the free
variables, and then we will be able to express the basic variables in terms of these parameters.
See the examples below.
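Once the RREF of the coefficient matrix is known, the basic and free variables can be read off from its pivot columns. The sketch below is our own small helper (plain Python); it only scans a matrix that is already in RREF.

    def classify_variables(R):
        """Given a coefficient matrix R in RREF, return (basic, free) variable indices (1-based)."""
        n = len(R[0])
        pivot_cols = set()
        for row in R:
            for j, entry in enumerate(row):
                if entry != 0:              # the leftmost non-zero entry of a row is its pivot
                    pivot_cols.add(j)
                    break
        basic = [j + 1 for j in range(n) if j in pivot_cols]
        free = [j + 1 for j in range(n) if j not in pivot_cols]
        return basic, free

    # RREF of the coefficient matrix from Example 3.5.11 (worked out later in this section).
    R = [[1, 2, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
    print(classify_variables(R))   # expected: ([1, 3, 4], [2]), so x2 is the free variable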
Example 3.5.11

Solve the following system.

−𝑥1 − 2𝑥2 + 2𝑥3 + 4𝑥4 = 3
2𝑥1 + 4𝑥2 − 2𝑥3 − 2𝑥4 = 4
𝑥1 + 2𝑥2 − 𝑥3 − 2𝑥4 = 0
2𝑥1 + 4𝑥2 − 6𝑥3 − 8𝑥4 = −4
Solution: We begin by giving the augmented matrix for the system.

[ −1 −2  2  4 |  3 ]
[  2  4 −2 −2 |  4 ]
[  1  2 −1 −2 |  0 ]
[  2  4 −6 −8 | −4 ]
The leftmost non-zero column is the first column. Therefore, our first goal is to get a
leading entry (which will become a pivot) in the top position of this column. It will make
our calculations easier if this is a leading one, so we will perform the ERO 𝑅1 ←→ 𝑅3 to
get

[  1  2 −1 −2 |  0 ]
[  2  4 −2 −2 |  4 ]
[ −1 −2  2  4 |  3 ]
[  2  4 −6 −8 | −4 ]
Next, we want to change all entries below the pivot in this column to 0. So we perform the
following EROs
𝑅2 → −2𝑅1 + 𝑅2
𝑅3 → 𝑅1 + 𝑅3
𝑅4 → −2𝑅1 + 𝑅4
to get

[ 1 2 −1 −2 |  0 ]
[ 0 0  0  2 |  4 ]
[ 0 0  1  2 |  3 ]
[ 0 0 −4 −4 | −4 ]

Covering up the first row, the leftmost non-zero column is the third column. We need to
get a leading entry (which will become a pivot) in the top position of this column (with the
first row covered) and so we need to get a pivot in the (2,3) position. The simplest way to
do this is to perform the ERO 𝑅2 ←→ 𝑅3 to get

[ 1 2 −1 −2 |  0 ]
[ 0 0  1  2 |  3 ]
[ 0 0  0  2 |  4 ]
[ 0 0 −4 −4 | −4 ]
Next, we want to change all entries below the pivot in this column to 0. So we perform the
following ERO
𝑅4 → 4𝑅2 + 𝑅4
to get

[ 1 2 −1 −2 | 0 ]
[ 0 0  1  2 | 3 ]
[ 0 0  0  2 | 4 ]
[ 0 0  0  4 | 8 ]
Covering up the first two rows, the leftmost non-zero column is the fourth column. We
need to get a leading entry (which will become a pivot) in the top position of this column
(with the first two rows covered) and so we need to get a pivot in the (3,4) position. There
already is a leading entry there and so no EROs are needed.
Next, we want to change all entries below the pivot in this column to 0. So we perform the
following ERO
𝑅4 → −2𝑅3 + 𝑅4
to get

[ 1 2 −1 −2 | 0 ]
[ 0 0  1  2 | 3 ]
[ 0 0  0  2 | 4 ]
[ 0 0  0  0 | 0 ]
Covering up the first three rows, all remaining rows are zero rows and so we now have an
augmented matrix in REF. We write down the system of equations that corresponds to this
augmented matrix and solve this system using back substitution.
The corresponding system of equations is
𝑥1 + 2𝑥2 − 𝑥3 − 2𝑥4 = 0
𝑥3 + 2𝑥4 = 3
2𝑥4 = 4
0 = 0
The third equation tells us that 𝑥4 = 2. Substituting this into the second equation, we
obtain that 𝑥3 = −1. Substituting both of these values into the first equation, we obtain
that 𝑥1 + 2𝑥2 = 3. Using our convention, we set 𝑥1 to be a basic variable and 𝑥2 to be a
free variable. Therefore, 𝑥2 = 𝑡, 𝑡 ∈ R and 𝑥1 = 3 − 2𝑡.
Therefore, the solution set is

𝑆 = {[3 − 2𝑡, 𝑡, −1, 2]^T : 𝑡 ∈ R} = {[3, 0, −1, 2]^T + 𝑡 [−2, 1, 0, 0]^T : 𝑡 ∈ R}.
Instead of stopping at REF and using back substitution, we could continue to do row
reduction until we reached an augmented matrix in RREF (the Gauss-Jordan algorithm).
Beginning with

[ 1 2 −1 −2 | 0 ]
[ 0 0  1  2 | 3 ]
[ 0 0  0  2 | 4 ]
[ 0 0  0  0 | 0 ]

we identify the rightmost pivot column, which is the fourth column. We change the pivot
to 1 by performing the ERO 𝑅3 → (1/2)𝑅3 to get

[ 1 2 −1 −2 | 0 ]
[ 0 0  1  2 | 3 ]
[ 0 0  0  1 | 2 ]
[ 0 0  0  0 | 0 ]
Next, we want to change all entries above this pivot to 0. So we perform the EROs
𝑅1 → 2𝑅3 + 𝑅1
𝑅2 → −2𝑅3 + 𝑅2
to get

[ 1 2 −1 0 |  4 ]
[ 0 0  1 0 | −1 ]
[ 0 0  0 1 |  2 ]
[ 0 0  0 0 |  0 ]
Covering up the bottom two rows, the rightmost pivot column is the third column. The
pivot in this column is already 1 so no ERO is necessary. We want to change all entries
above this pivot to 0 and so we perform the ERO
𝑅1 → 𝑅2 + 𝑅1
to get

[ 1 2 0 0 |  3 ]
[ 0 0 1 0 | −1 ]
[ 0 0 0 1 |  2 ]
[ 0 0 0 0 |  0 ]
Covering up the bottom three rows, the rightmost pivot column is the first column. The
pivot in this column is already 1 so no ERO is necessary.
There are now no more rows to adjust and so our augmented matrix is in RREF.
The corresponding system of equations is
𝑥1 + 2𝑥2 = 3
𝑥3 = −1
𝑥4 = 2
0 = 0
The second and third equations give us values for 𝑥3 and 𝑥4 , respectively. We set 𝑥2 to be a
free variable and thus 𝑥2 = 𝑡, 𝑡 ∈ R, which implies that 𝑥1 = 3 − 2𝑡. Therefore, the solution
set is

𝑆 = {[3 − 2𝑡, 𝑡, −1, 2]^T : 𝑡 ∈ R} = {[3, 0, −1, 2]^T + 𝑡 [−2, 1, 0, 0]^T : 𝑡 ∈ R},

which is the same solution set we obtained earlier.
Example 3.5.12

Solve the following system.

2𝑥1 + 4𝑥2 − 2𝑥3 = 2
3𝑥1 + 7𝑥2 + 𝑥3 = 0
2𝑥1 + 5𝑥2 + 2𝑥3 = 1

Solution: We begin by giving the augmented matrix for the system.

[ 2 4 −2 | 2 ]
[ 3 7  1 | 0 ]
[ 2 5  2 | 1 ]

Our first goal is to get a leading entry (which will become a pivot) in the top position of the
first column, since it is the leftmost non-zero column. It will actually make our calculations
easier if this is a leading one and so we will perform the ERO 𝑅1 → (1/2)𝑅1 to obtain

[ 1 2 −1 | 1 ]
[ 3 7  1 | 0 ]
[ 2 5  2 | 1 ]

Next, we want to change all entries below the pivot in this column to 0. So we perform the
following row addition EROs

𝑅2 → −3𝑅1 + 𝑅2
𝑅3 → −2𝑅1 + 𝑅3

to obtain

[ 1 2 −1 |  1 ]
[ 0 1  4 | −3 ]
[ 0 1  4 | −1 ]

Covering up the first row, the leftmost non-zero column is the second column. We need to
obtain a leading entry (which will become a pivot) in the top position of this column (with
the first row covered) and so we need to get a pivot in the (2,2) position. There already is
a pivot there and so no ERO is necessary. Next, we want to change all entries below this
pivot to 0 and so we perform the ERO 𝑅3 → −𝑅2 + 𝑅3 to obtain

[ 1 2 −1 |  1 ]
[ 0 1  4 | −3 ]
[ 0 0  0 |  2 ]

The third row is of the form [0 0 0 | 𝑏], where 𝑏 ≠ 0 and so we can stop the algorithm here
and declare that this system is inconsistent. (The third row corresponds to the equation
0 = 2.)
We will include one more example in which we give far less commentary. We will simply
outline the EROs that we are using. We will make use of the Gauss–Jordan algorithm
(forward and backward elimination).
Example 3.5.13

Solve the following system.

3𝑥1 + 5𝑥2 + 3𝑥3 = 9
−2𝑥1 − 𝑥2 + 6𝑥3 = 10
4𝑥1 + 10𝑥2 − 3𝑥3 = −2

Solution: The augmented matrix for the system is

[  3  5  3 |  9 ]
[ −2 −1  6 | 10 ]
[  4 10 −3 | −2 ]

We determine an REF.

𝑅1 → 𝑅2 + 𝑅1 gives

[  1  4  9 | 19 ]
[ −2 −1  6 | 10 ]
[  4 10 −3 | −2 ]

𝑅2 → 2𝑅1 + 𝑅2 and 𝑅3 → −4𝑅1 + 𝑅3 give

[ 1  4   9 |  19 ]
[ 0  7  24 |  48 ]
[ 0 −6 −39 | −78 ]

𝑅2 → 𝑅3 + 𝑅2 gives

[ 1  4   9 |  19 ]
[ 0  1 −15 | −30 ]
[ 0 −6 −39 | −78 ]

𝑅3 → 6𝑅2 + 𝑅3 gives

[ 1 4    9 |   19 ]
[ 0 1  −15 |  −30 ]
[ 0 0 −129 | −258 ]

This augmented matrix is in REF. At this point we could use back substitution, but we
will continue on to RREF.

𝑅3 → −(1/129)𝑅3 gives

[ 1 4   9 |  19 ]
[ 0 1 −15 | −30 ]
[ 0 0   1 |   2 ]

𝑅1 → −9𝑅3 + 𝑅1 and 𝑅2 → 15𝑅3 + 𝑅2 give

[ 1 4 0 | 1 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 2 ]

𝑅1 → −4𝑅2 + 𝑅1 gives

[ 1 0 0 | 1 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 2 ]

Given this augmented matrix we can see that the solution is 𝑥1 = 1, 𝑥2 = 0 and 𝑥3 = 2.
Therefore, the solution set is 𝑆 = {[1 0 2]^T}.
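Since this system turned out to have a unique solution, the answer can also be checked with a numerical solver; the sketch below assumes NumPy is available and is only a check, not a replacement for the row reduction above.

    import numpy as np

    A = np.array([[ 3.0,  5.0,  3.0],
                  [-2.0, -1.0,  6.0],
                  [ 4.0, 10.0, -3.0]])
    b = np.array([9.0, 10.0, -2.0])

    print(np.linalg.solve(A, b))   # expected: [1. 0. 2.]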
3.6 Rank and Nullity

Definition 3.6.1 (𝑀𝑚×𝑛(R), 𝑀𝑚×𝑛(C), 𝑀𝑚×𝑛(F))

We use 𝑀𝑚×𝑛(R) to denote the set of all 𝑚 × 𝑛 matrices with real entries.

We use 𝑀𝑚×𝑛(C) to denote the set of all 𝑚 × 𝑛 matrices with complex entries.

If we do not need to distinguish between real and complex entries we will use 𝑀𝑚×𝑛(F).
REMARK
If 𝑚 = 𝑛 it is common to abbreviate 𝑀𝑚×𝑛 (F) as 𝑀𝑛 (F).
Let us consider a system of 𝑚 linear equations in 𝑛 variables. The coefficient matrix
of this system, 𝐴, belongs to 𝑀𝑚×𝑛(F) and the augmented matrix, [𝐴 | 𝑏], belongs to
𝑀𝑚×(𝑛+1)(F). In other words, the augmented matrix has exactly one more column than
the coefficient matrix.

There are three important questions that we would like to consider about this system.

1. Is the system consistent or inconsistent?

2. If the system is consistent, does it have a unique solution?

3. If the system is consistent and does not have a unique solution, how many parameters
are there in the solution set?

We will be able to answer these questions once we have an REF or the RREF of the
augmented matrix [𝐴 | 𝑏], even before we have found the solution set.
Definition 3.6.2
Rank
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) such that RREF(𝐴) has exactly 𝑟 pivots. Then we say that the rank of
𝐴 is 𝑟, and we write rank(𝐴) = 𝑟.
Although there are infinitely many REFs of a matrix 𝐴, Theorem 3.5.8 (Unique RREF)
tells us that they will all have the same pivot positions, hence the same number of pivots.
Since RREF(𝐴) is an REF of 𝐴, all REFs of 𝐴 will have the same number of pivots as
RREF(𝐴). Therefore, we can use any REF of 𝐴 to determine its rank.
Proposition 3.6.3
(Rank Bounds)
If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then rank(𝐴) ≤ min{𝑚, 𝑛}.
Proof: Since there is at most one pivot in each row, then rank(𝐴) ≤ 𝑚. Also, since there
is at most one pivot in each column, then rank(𝐴) ≤ 𝑛.
Comparing the rank of the coefficient matrix of a system of linear equations to the rank of
its augmented matrix will tell us whether the system is consistent or inconsistent.
76
Proposition 3.6.4
Chapter 3
Systems of Linear Equations
(Consistent System Test)
[︁
#» ]︁
Let 𝐴 be the coefficient matrix of a system of linear equations and let 𝐴 | 𝑏
be the augmented
of the system. The system is consistent if and only if
(︁ [︁ matrix
#» ]︁ )︁
rank(𝐴) = rank
𝐴| 𝑏
.
[︁
#» ]︁
Proof: Suppose that we perform a sequence of EROs on 𝐴 | 𝑏 to manipulate it into its
[︁
]︁
RREF, which we will call 𝑅 | #»
𝑐 . If we perform the same sequence of EROs on 𝐴, we will
[︁
]︁
obtain the matrix 𝑅, which is the RREF of 𝐴. The only difference between 𝑅 and 𝑅 | #»
𝑐
[︁
]︁
(︁ [︁
#» ]︁ )︁
is that 𝑅 | #»
𝑐 has an additional column. Therefore, rank(𝐴) ≤ rank
𝐴| 𝑏
.
[︁
]︁
Assume that the system is inconsistent. Then 𝑅 | #»
𝑐 contains a row of the form [0 · · · 0|1]
[︁
]︁
(︁ [︁
#» ]︁ )︁
and so 𝑅 | #»
𝐴| 𝑏
.
𝑐 has a pivot in its final column. Therefore, rank(𝐴) < rank
[︁
]︁
𝑐 does not contain a row of the form
Assume that the system is consistent. Then 𝑅 | #»
(︁ [︁
#» ]︁ )︁
[0 · · · 0|1] and so 𝑅 and [𝑅| #»
𝑐 ] have the same pivots. Therefore, rank(𝐴) = rank
𝐴| 𝑏
.
In summary, the system is consistent if and only if rank(𝐴) = rank
Example 3.6.5
Suppose we have a system of linear equations
matrix to obtain the following REF.
⎡
2 −5 5
⎢0 0 4
⎢
⎣0 0 0
0 0 0
(︁ [︁
#» ]︁ )︁
𝐴| 𝑏
.
and we perform EROs on the augmented
−7
−9
5
0
⎤
4
3⎥
⎥
2⎦
1
Determine whether the system is consistent or inconsistent.
Solution:
The rank of the coefficient matrix is 3 and the rank of the augmented matrix is 4. Therefore,
the system is inconsistent. The fact that the system is inconsistent is also evident from the
fact that the final row of the augmented matrix corresponds to the equation 0 = 1.
Example 3.6.6
Suppose we have a system of linear equations and
matrix to obtain the following REF.
⎡
2 −5 5 −7
⎢0 0 4 −9
⎢
⎣0 0 0 5
0 0 0 0
we perform EROs on the augmented
⎤
4
3⎥
⎥
2⎦
0
Determine whether the system is consistent or inconsistent.
Solution: The ranks of the coefficient matrix and the augmented matrix are both 3.
Therefore, the system is consistent.
Section 3.6
Rank and Nullity
77
[︁
#» ]︁
Comparing the ranks of 𝐴 and 𝐴 | 𝑏 gives us a quick way of determining whether or not
a system of linear equations is consistent.
If the system is consistent, then we usually want to know how many parameters, if any,
appear in the solution set. We can also answer this question using rank(𝐴), as our next
result shows.
Theorem 3.6.7
(System Rank Theorem)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) with rank(𝐴) = 𝑟.
#»
#»
(a) Let 𝑏 ∈ F𝑚 . If the system of linear equations with augmented matrix [𝐴 | 𝑏 ] is consistent, then the solution set to this system will contain 𝑛 − 𝑟 parameters.
[︁
#» ]︁
#»
(b) The system with augmented matrix 𝐴 | 𝑏 is consistent for every 𝑏 ∈ F𝑚 if and only
if 𝑟 = 𝑚.
Proof: (a) Since 𝐴 has 𝑛 columns, the system of equations has 𝑛 variables. Since
rank(𝐴) = 𝑟, RREF(𝐴) will have 𝑟 pivot columns. Therefore, the system will have 𝑟
basic variables and 𝑛−𝑟 free variables. In determining the solution set, we assign each
free variable a parameter. Therefore, the solution set will contain 𝑛 − 𝑟 parameters.
(b) [︁We prove
the contrapositive which says that the system with augmented matrix
#» ]︁
#»
𝐴 | 𝑏 is inconsistent for some 𝑏 ∈ F𝑚 if and only if 𝑟 ̸= 𝑚.
We begin
the forward direction and assume that the system with augmented
[︁ with
#» ]︁
#»
matrix 𝐴 | 𝑏 is inconsistent for some 𝑏 ∈ F𝑚 . Then, as in the proof of Proposition
(︁ [︁
#» ]︁ )︁
3.6. 4 (Consistent System Test), RREF
𝐴| 𝑏
will include a row of the form
[0 · · · 0|1]. Therefore, RREF(𝐴) will include a row of all zeros. Since RREF(𝐴) has 𝑚
rows, it follows that rank(𝐴) < 𝑚.
For the backward direction we assume that 𝑟 ̸= 𝑚. By Proposition 3.6. 3 (Rank
Bounds), we get that 𝑟 < 𝑚. If we let 𝑅 = RREF(𝐴),
[︁
]︁ then 𝑅 must include a
#»
row of all zeros. Consider the augmented matrix 𝑅 | 𝑒 𝑚 , where #»
𝑒 𝑚 is the vector
from the standard basis of F𝑚 whose 𝑚𝑡ℎ component is 1 with all other components
0. It corresponds
[︁
]︁to an inconsistent system. Since all EROs are reversible, we can
#»
apply to 𝑅 | 𝑒 𝑚 the reverse of the EROs needed to row reduce 𝐴 to 𝑅. Applying
[︁
#» ]︁
these reverse EROs will result in an augmented matrix of the form 𝐴 | 𝑏 , for some
[︁
]︁
[︁
#»
#» ]︁
𝑏 ∈ F𝑚 . The matrix is row equivalent to 𝑅 | #»
𝑒 𝑚 . Therefore, 𝐴 | 𝑏 is the
augmented matrix of an inconsistent system.
Definition 3.6.8
Nullity
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) with rank(𝐴) = 𝑟. We define the nullity of 𝐴, written nullity(𝐴), to be
the integer 𝑛 − 𝑟.
78
Chapter 3
Systems of Linear Equations
[︁
#» ]︁
Thus Theorem 3.6. 7 (System Rank Theorem) shows that, if 𝐴 | 𝑏 is the augmented
matrix of a consistent system, the number of parameters in the solution set of this system
is given by nullity(𝐴).
Example 3.6.9
Returning to Example 3.6.6, we have an augmented matrix with the following REF:
⎡
⎤
2 −5 5 −7 4
⎢0 0 4 −9 3⎥
⎢
⎥
⎣0 0 0 5 2⎦ .
0 0 0 0 0
How many parameters will there be in the solution set of the corresponding system?
Solution:
We already determined that this system was consistent. The system has four variables and
three pivots. Using Theorem 3.6.7 (System Rank Theorem), the number of parameters in
the solution set to this system is 4 − 3 = 1 (i.e. the nullity is 1). The variable 𝑥2 is a free
variable because the second column is not a pivot column. All other variables are basic.
3.7
Homogeneous and Non-Homogeneous Systems, Nullspace
We examine a series of examples that will help us state general facts about solving systems
of linear equations.
Example 3.7.1
Solve the following system of linear equations:
𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = 1
2𝑥1 −4𝑥2 +𝑥3
= 5 .
𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 4
Solution:
[︁
#» ]︁
The augmented matrix 𝐴 | 𝑏 for the system is
⎡
⎤
1 −2 −1 3 1
⎣2 −4 1
0 5⎦ .
1 −2 2 −3 4
We perform the following EROs.
{︃
𝑅2 → −2𝑅1 + 𝑅2
⎡
1
⎣
gives 0
𝑅3 → −𝑅1 + 𝑅3
0
⎡
1 −2
𝑅2 → 13 𝑅2 gives ⎣0 0
0 0
⎡
1
𝑅3 → −3𝑅2 + 3𝑅3 gives ⎣0
0
⎤
−2 −1 3 1
0
3 −6 3⎦ .
0
3 −6 3
⎤
−1 3 1
1 −2 1⎦ .
3 −6 3
⎤
−2 −1 3 1
0
1 −2 1⎦ .
0
0
0 0
Section 3.7
Homogeneous and Non-Homogeneous Systems, Nullspace
This matrix is in REF and since rank(𝐴) = rank
this system is consistent.
79
(︁[︁
#» ]︁)︁
𝐴| 𝑏
= 2, then we conclude that
We perform the following ERO.
⎤
⎡
1 −2 0 1 2
𝑅1 → 𝑅1 + 𝑅2 gives ⎣0 0 1 −2 1⎦ .
0 0 0 0 0
This matrix is in RREF. Since there are four variables and rank(𝐴) = 2, the number of
parameters is 4 − 2 = 2. We let the free variables 𝑥2 and 𝑥4 be 𝑠 and 𝑡 respectively, where
𝑠, 𝑡 ∈ R.
Solving for the basic variables, we get 𝑥1 = 2+2𝑠−𝑡 and 𝑥3 = 1+2𝑡, for 𝑠, 𝑡 ∈ R. Therefore,
the solution set is
⎫ ⎧⎡ ⎤
⎧⎡
⎫
⎡ ⎤
⎡ ⎤
⎤
−1
2
2
2 + 2𝑠 − 𝑡
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎬
⎬ ⎨⎢ ⎥
⎨⎢
⎢ 0 ⎥
⎢1⎥
⎥
0
𝑠
⎥
⎢
⎥
⎢
⎥
⎥
⎢
⎢
:
𝑠,
𝑡
∈
R
.
+
𝑡
+
𝑠
:
𝑠,
𝑡
∈
R
=
𝑆= ⎣
⎣ 2 ⎦
⎣0⎦
1 + 2𝑡 ⎦
⎪
⎪
⎪⎣ 1 ⎦
⎪
⎪
⎪
⎪
⎭
⎭ ⎪
⎩
⎩
1
0
0
𝑡
Example 3.7.2
Solve the following system of linear equations:
𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = 1
2𝑥1 −4𝑥2 +𝑥3
= 2 .
𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 3
Solution:
The augmented matrix is:
⎡
⎤
1 −2 −1 3 1
⎣2 −4 1
0 2⎦ .
1 −2 2 −3 3
The coefficient matrix for this system is exactly the same as it was for the previous system.
So we perform the exact same sequence of EROs to obtain the following augmented matrix.
(Only the constant terms will be different.)
⎡
⎤
1 −2 −1 3 1
⎣0 0
1 −2 0⎦ .
0 0
0
0 2
(︁[︁
#» ]︁)︁
This matrix is in REF and since rank(𝐴) = 2 is not equal to rank 𝐴 | 𝑏
= 3, we conclude that this system is inconsistent. (We could also notice that the third row corresponds
to the equation 0 = 2.)
The solution set is 𝑇 = ∅.
80
Example 3.7.3
Chapter 3
Systems of Linear Equations
Solve the following system of linear equations:
𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = −1
2𝑥1 −4𝑥2 +𝑥3
=
4 .
𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 =
5
Solution:
Again we observe that we have the same coefficient matrix and thus, we will perform the
same sequence of EROs as in the two previous examples to obtain the following augmented
matrix.
⎡
⎤
1 −2 −1 3 −1
⎣0 0
1 −2 2 ⎦ .
0 0
0
0
0
This matrix is in REF, and since rank(𝐴) = rank
system is consistent. This time the RREF will be
(︁[︁
#» ]︁)︁
𝐴| 𝑏
= 2, we conclude that this
⎤
⎡
1 −2 0 1 1
⎣0 0 1 −2 2⎦ .
0 0 0 0 0
Since there are four variables and rank(𝐴) = 2, then the number of parameters is 4 − 2 = 2.
We let the free variables 𝑥2 and 𝑥4 be 𝑠 and 𝑡 respectively, where 𝑠, 𝑡 ∈ R.
Solving for the basic variables, we get 𝑥1 = 1+2𝑠−𝑡 and 𝑥3 = 2+2𝑡, for 𝑠, 𝑡 ∈ R. Therefore,
the solution set is
⎧⎡
⎫ ⎧⎡ ⎤
⎫
⎤
⎡ ⎤
⎡ ⎤
1 + 2𝑠 − 𝑡
1
2
−1
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎨⎢
⎬ ⎨⎢ ⎥
⎬
⎥
⎢1⎥
⎢ 0 ⎥
𝑠
0
⎥ : 𝑠, 𝑡 ∈ R = ⎢ ⎥ + 𝑠 ⎢ ⎥ + 𝑡 ⎢ ⎥ : 𝑠, 𝑡 ∈ R .
𝑈= ⎢
⎣ 2 + 2𝑡 ⎦
⎣0⎦
⎣ 2 ⎦
⎪
⎪
⎪⎣ 2 ⎦
⎪
⎪
⎪
⎪
⎩
⎭ ⎪
⎩
⎭
𝑡
0
0
1
Example 3.7.4
Solve the following system of linear equations:
𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = 0
2𝑥1 −4𝑥2 +𝑥3
= 0 .
𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 0
Solution:
Once again, we observe that we have the same coefficient matrix and thus we will be
performing the same sequence of EROs as in the previous three examples. We also note
that this system is guaranteed to be consistent since all of the constant terms are originally
0 and no ERO will change this fact. We can always set all of the variables to 0 to obtain a
solution. The RREF of this system is
Section 3.7
Homogeneous and Non-Homogeneous Systems, Nullspace
81
⎡
⎤
1 −2 0 1 0
⎣0 0 1 −2 0⎦ .
0 0 0 0 0
As before we let the free variables 𝑥2 and 𝑥4 be 𝑠 and 𝑡 respectively, where 𝑠, 𝑡 ∈ R.
Solving for the basic variables, we get 𝑥1 = 2𝑠 − 𝑡 and 𝑥3 = 2𝑡, for 𝑠, 𝑡 ∈ R. Therefore, the
solution set is
⎧⎡
⎫ ⎧ ⎡ ⎤
⎫
⎤
⎡
⎤
2𝑠
−
𝑡
2
−1
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎨⎢
⎬ ⎪
⎨ ⎢ ⎥
⎬
⎥
⎢ 0 ⎥
𝑠
1
⎥ : 𝑠, 𝑡 ∈ R = 𝑠 ⎢ ⎥ + 𝑡 ⎢
⎥ : 𝑠, 𝑡 ∈ R .
𝑉 = ⎢
⎣ 2𝑡 ⎦
⎣ 2 ⎦
⎪
⎪
⎪ ⎣0⎦
⎪
⎪
⎪
⎪
⎩
⎭ ⎪
⎩
⎭
𝑡
0
1
Note that in this final example, the solution set, 𝑉 , can be written as all linear combinations
[︀
]︀𝑇
[︀
]︀𝑇
of the vectors 2 1 0 0 and −1 0 2 1 . Therefore, we can write the solution set as the
span of these two vectors:
⎫
⎧ ⎡ ⎤
⎧⎡ ⎤ ⎡
⎤⎫
⎤
⎡
−1 ⎪
2
−1
2
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎬
⎬
⎨⎢ ⎥ ⎢
⎨ ⎢ ⎥
⎢ 0 ⎥
1⎥ ⎢ 0 ⎥
1⎥
⎥
⎥
⎢
⎢
⎢
,
:
𝑠,
𝑡
∈
R
=
Span
𝑉 = 𝑠⎣ ⎦ + 𝑡⎣
⎣ 0 ⎦ ⎣ 2 ⎦⎪ .
2 ⎦
0
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎭
⎭
⎩
⎩
1
0
1
0
There are several observations we can make by comparing these four systems of linear
equations.
The first observation is that the last system is the simplest to solve because all the constant
terms are zero. We give a name to such a system and its solution set.
Definition 3.7.5
Homogeneous and
Non-homogeneous
Systems
We say that a system of linear equations is homogeneous if all the constant terms on
the right-hand side of the equations are zero. Otherwise we say the system is nonhomogeneous.
[︁
#» ]︁
Thus, if 𝐴 | 𝑏 is the augmented matrix of a system of linear equations, then the system
#»
#»
is homogeneous if 𝑏 = ⃗0, and non-homogeneous if 𝑏 ̸= ⃗0.
As we noted in Example 3.7.4, a homogeneous system will always be consistent. We can set
all of the variables to 0 to obtain a solution. This solution is special in its consideration,
and we formally define it below.
Definition 3.7.6
Trivial Solution
For a homogeneous system with variables 𝑥1 , 𝑥2 , ..., 𝑥𝑛 , the trivial solution is the solution
𝑥1 = 𝑥2 = · · · = 𝑥𝑛 = 0.
Moreover, if we collect all the solutions of a homogeneous system into a single set, that set
has a special relationship with the matrix 𝐴. We define this set formally below.
82
Chapter 3
Definition 3.7.7
Nullspace
Systems of Linear Equations
The solution set of a homogeneous system of linear equations with coefficient matrix 𝐴 is
called the nullspace of 𝐴 and is denoted Null(𝐴).
Consider the solution sets of the three consistent systems in our previous examples.
⎧⎡ ⎤
⎫
⎡ ⎤
⎡ ⎤
2
2
−1
⎪
⎪
⎪
⎪
⎨
⎬
⎢ 0 ⎥
⎢1⎥
⎢0⎥
⎥
⎢
⎥
⎢
⎢
⎥
𝑆 = ⎣ ⎦ + 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R .
0
1
2
⎪
⎪
⎪
⎪
⎩
⎭
0
0
1
⎫
⎧⎡ ⎤
⎡ ⎤
⎡ ⎤
−1
2
1
⎪
⎪
⎪
⎪
⎬
⎨⎢ ⎥
⎢ 0 ⎥
⎢1⎥
0
⎥
⎥
⎢
⎥
⎢
⎢
𝑈 = ⎣ ⎦ + 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R .
2
0
⎪
⎪
⎪
⎪ 2
⎭
⎩
1
0
0
⎧ ⎡ ⎤
⎫
⎧⎡ ⎤ ⎡
⎤
⎡
⎤⎫
−1
2
−1
2
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎨ ⎢ ⎥
⎬
⎨⎢ ⎥ ⎢
⎢ 0 ⎥
⎥⎬
1
0
1
⎥ : 𝑠, 𝑡 ∈ R = Span ⎢ ⎥ , ⎢
⎥ + 𝑡⎢
⎥ .
𝑉 = 𝑠⎢
⎣ 2 ⎦
⎣0⎦
⎣ 0 ⎦ ⎣ 2 ⎦⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎩
⎭
⎩
⎭
1
0
1
0
⎤
1 −2 −1 3
Note that 𝑉 = Null(𝐴), where 𝐴 = ⎣ 2 −4 1 0 ⎦ is the common coefficient matrix of all
1 −2 2 −3
three systems.
⎡
The second observation is that the solution set for the homogeneous system of equations can
be written as the span of a set of vectors. The solution sets for the two non-homogeneous
systems of equations cannot be written as the span of a set of vectors because these solution
sets do not include ⃗0.
The third observation is that the solution sets 𝑆, 𝑈 and 𝑉 are very similar. To be more
precise, the solution sets 𝑆 and 𝑈 of the non-homogeneous systems differ by only a constant
vector and the part which they have in common is 𝑉 . We will return to this observation
#»
#»
𝑥 = 0 and 𝐴 #»
𝑥 = 𝑏 ).
later and formalize it in Theorem 3.11.6 (Solutions to 𝐴 #»
The systems in Examples 3.7.1, 3.7.2, 3.7.3 and 3.7.4 have the same coefficient matrix, 𝐴,
but different right-hand sides:
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
1
1
−1
0
⎣ 5 ⎦ , ⎣ 2 ⎦ , ⎣ 4 ⎦ and ⎣ 0 ⎦ ,
4
3
5
0
respectively. To solve each of the previous four examples,
of EROs. We can solve all four systems at once using
matrix”,
⎡
1 −2 −1 3 1 1 −1
⎣2 −4 1
0 5 2 4
1 −2 2 −3 4 3 5
we performed the same sequence
the following “super-augmented
⎤
0
0⎦ ,
0
Section 3.8
Homogeneous and Non-Homogeneous Systems, Nullspace
which has RREF
83
⎡
⎤
1 −2 0 1 2 1 1 0
⎣0 0 1 −2 1 0 2 0⎦ .
0 0 0 0 0 2 0 0
We use the RREF of the “super-augmented matrix” to solve the four systems. For the first
system, we need the first column after the vertical line.
⎡
⎤
1 −2 0 1 2
⎣0 0 1 −2 1⎦
0 0 0 0 0
This corresponds to the system of equations
𝑥1 −2𝑥2
with solution set
+𝑥4 = 2
𝑥3 −2𝑥4 = 1
0 = 0
⎧⎡ ⎤
⎫
⎡ ⎤
⎡ ⎤
2
2
−1
⎪
⎪
⎪
⎪
⎨⎢ ⎥
⎬
⎢1⎥
⎢ 0 ⎥
0
⎢
⎥
⎢
⎥
⎢
⎥
𝑆 = ⎣ ⎦ + 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R .
0
2
⎪
⎪
⎪ 1
⎪
⎩
⎭
0
0
1
For the second system, we need the second column after the vertical line.
⎤
⎡
1 −2 0 1 1
⎣0 0 1 −2 0⎦
0 0 0 0 2
This is an inconsistent system and has solution set 𝑇 = ∅.
For the third and fourth systems we need the third and fourth columns after the vertical
line, respectively,
⎡
⎤
⎡
⎤
1 −2 0 1 0
1 −2 0 1 1
⎣0 0 1 −2 2⎦ and ⎣0 0 1 −2 0⎦ .
0 0 0 0 0
0 0 0 0 0
These correspond to solution sets
⎫
⎧⎡ ⎤
⎡ ⎤
⎡ ⎤
1
2
−1
⎪
⎪
⎪
⎪
⎨⎢ ⎥
⎬
⎢1⎥
⎢ 0 ⎥
0
⎥ + 𝑎 ⎢ ⎥ + 𝑏 ⎢ ⎥ : 𝑎, 𝑏 ∈ R
𝑈= ⎢
⎣2⎦
⎣0⎦
⎣ 2 ⎦
⎪
⎪
⎪
⎪
⎩
⎭
0
0
1
and
respectively.
⎧ ⎡ ⎤
⎫
⎧⎡ ⎤ ⎡ ⎤⎫
⎡ ⎤
2
−1
2
−1 ⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎨ ⎢ ⎥
⎬
⎨⎢ ⎥ ⎢ ⎥⎪
⎬
⎢ 0 ⎥
1
1
⎥ + 𝑡 ⎢ ⎥ : 𝑠, 𝑡 ∈ R = Span ⎢ ⎥ , ⎢ 0 ⎥ ,
𝑉 = 𝑠⎢
⎣0⎦
⎣ 2 ⎦
⎣ 0 ⎦ ⎣ 2 ⎦⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎩
⎭
⎩
⎭
0
1
0
1
84
Chapter 3
3.8
Systems of Linear Equations
Solving Systems of Linear Equations over C
The Gauss–Jordan Algorithm works just as well over C as it does over R. The only new feature is that the arithmetic is more intricate because of the complex numbers. If parameters
are introduced in the solution, then these parameters take on all complex values instead of
just real values.
REMARK
Recall that for 𝑧 = 𝑎 + 𝑏𝑖 ∈ C, 𝑧 ̸= 0, that
Example 3.8.1
1
𝑧
𝑎 − 𝑏𝑖
=
= 2
.
𝑧
𝑧𝑧
𝑎 + 𝑏2
Solve the following system of linear equations over C.
(1 + 𝑖) 𝑧1 +(−2 − 3𝑖)𝑧2 =
−15𝑖
(1 + 3𝑖) 𝑧1 +(−1 − 8𝑖)𝑧2 = 15 − 30𝑖
Solution:
The augmented matrix is
]︂
1 + 𝑖 −2 − 3𝑖
−15𝑖
.
1 + 3𝑖 −1 − 8𝑖 15 − 30𝑖
[︂
We perform the following EROs.
(︂ (︂
)︂ )︂
[︂
]︂
1 1
1
1
− 52 − 12 𝑖 − 15
− 15
𝑖
2
2
𝑅1 =
− 𝑖 𝑅1 gives
𝑅1 →
.
1 + 3𝑖 −1 − 8𝑖 15 − 30𝑖
1+𝑖
2 2
[︂
1 − 52 − 12 𝑖 − 15
2 −
𝑅2 → −(1 + 3𝑖)𝑅1 + 𝑅2 gives
0
0
0
15
2 𝑖
]︂
.
This matrix is in RREF. The rank of the coefficient matrix and the rank of the augmented
matrix are both equal to 1 and so the system is consistent. The number of parameters in
the solution set is 2 − 1 = 1 parameter in the solution set. We let the
𝑧2 = 𝑡
(︀ 5 free1 variable
)︀
15
for 𝑡 ∈ C. Solving for the basic variable 𝑧1 we get 𝑧1 = − 15
−
𝑖
+
+
𝑖
𝑡.
Therefore,
2
2
2
2
the solution set, 𝑆, is
{︂[︂ (︀ 15 15 )︀ (︀ 5 1 )︀ ]︂
}︂ {︂[︂ 15 15 ]︂ [︂ 5 1 ]︂
}︂
− 2 − 2 𝑖 + 2 + 2𝑖 𝑡
−2 − 2𝑖
+
𝑖
𝑆=
:𝑡∈C =
+ 2 2 𝑡:𝑡∈C
𝑡
0
1
3.9
Matrix–Vector Multiplication
In this section we will define a notion of matrix–vector multiplication that will be very
useful in our study of systems of linear equations. However, before we give the definition,
we will consider some different ways that we can partition a matrix.
Section 3.9
Matrix–Vector Multiplication
Definition 3.9.1
Row Vector
85
A row vector is a matrix with exactly one row. For a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F), we will denote
# »(𝐴). That is,
the 𝑖𝑡ℎ row of 𝐴 by row
𝑖
# »(𝐴) = [︀ 𝑎 𝑎 · · · 𝑎 ]︀ .
row
𝑖1 𝑖2
𝑖𝑛
𝑖
We can write an 𝑚 × 𝑛 matrix 𝐴 as
⎡# »
⎤
row1 (𝐴)
# »(𝐴) ⎥
⎢ row
2
⎢
⎥
𝐴=⎢
⎥.
..
⎣
⎦
.
#row »(𝐴)
𝑚
In other words, we can partition 𝐴 into 𝑚 row vectors, each with 𝑛 components.
]︂
5 −2 0
# »(𝐴) = [︀ 5 −2 0 ]︀ and row
# »(𝐴) = [︀ 2 1 −8 ]︀.
. Then row
Let 𝐴 =
1
2
2 1 −8
[︂
Example 3.9.2
[︀
]︀
#»
𝑚
Similarly, we can think of an 𝑚 × 𝑛 matrix 𝐴 as 𝐴 = 𝑎#»1 𝑎#»2 . . . 𝑎# »
𝑛 , where 𝑎𝑖 ∈ F . In
other words, we can partition 𝐴 into 𝑛 column vectors, each with 𝑚 components. We will
use the convention, demonstrated here, where an upper case letter refers to the matrix and
the corresponding lower case letter with subscripts refers to the columns.
⎡
Example 3.9.3
⎤
⎡ ⎤
⎡ ⎤
1 2
1
2
[︁ #» #» ]︁
#»
#»
Let 𝐵 = ⎣ −3 −4 ⎦. Then 𝐵 = 𝑏1 𝑏2 , where 𝑏 1 = ⎣ −3 ⎦ and 𝑏 2 = ⎣ −4 ⎦.
7 9
7
9
We now define the product of an 𝑚 × 𝑛 matrix and a (column) vector in F𝑛 .
Definition 3.9.4
Matrix–Vector
Multiplication in
Terms of the
Individual Entries
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and
⎡
𝑎11
⎢ 𝑎21
⎢
𝐴 #»
𝑥 =⎢ .
⎣ ..
𝑎𝑚1
#»
𝑥 ∈ F𝑛 . We define the product 𝐴 #»
𝑥 as follows:
⎤
⎤⎡ ⎤ ⎡
𝑎12 . . . 𝑎1𝑛
𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛
𝑥1
⎢ ⎥ ⎢
⎥
𝑎22 . . . 𝑎2𝑛 ⎥
⎥ ⎢ 𝑥2 ⎥ ⎢ 𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 ⎥
⎥.
.. . .
..
.. ⎥ ⎢ .. ⎥ = ⎢
⎦
. . ⎦⎣ . ⎦ ⎣
.
.
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛
𝑎𝑚2 . . . 𝑎𝑚𝑛
𝑥𝑛
REMARK
Notice that we have defined the product of an 𝑚 × 𝑛 matrix 𝐴 with a vector #»
𝑥 only if
#»
𝑥 ∈ F𝑛 . The expression 𝐴 #»
𝑦 is meaningless if #»
𝑦 ∈ F𝑘 with 𝑘 ̸= 𝑛.
In other words, the number of components of #»
𝑥 must be equal to the number of columns
of 𝐴 for the product 𝐴 #»
𝑥 to be defined.
86
Chapter 3
Systems of Linear Equations
The resulting product 𝐴 #»
𝑥 of 𝐴 ∈ 𝑀𝑚×𝑛 (F) and #»
𝑥 ∈ F𝑛 is a vector in F𝑚 . The 𝑖𝑡ℎ
component of 𝐴 #»
𝑥 is the sum
𝑎𝑖1 𝑥1 + 𝑎𝑖2 𝑥2 + · · · + 𝑎𝑖𝑛 𝑥𝑛 =
𝑛
∑︁
𝑎𝑖𝑗 𝑥𝑗 =
𝑗=1
𝑛
∑︁
# »(𝐴)) 𝑥 .
(row
𝑖
𝑗 𝑗
𝑗=1
Informally, we can think of the sum above as something analogous to a “dot product” of
the 𝑖𝑡ℎ row of 𝐴 with the vector #»
𝑥.
⎡
Example 3.9.5
⎤
⎡ ⎤
16 1
1
Compute the product 𝐴 #»
𝑥 where 𝐴 = ⎣ 3 4 5 ⎦ and #»
𝑥 = ⎣ −4 ⎦.
5 2 −3
6
Solution: We have coloured the matrix red and the vector blue to emphasize the role their
entries play in calculating the product.
First, we calculate the components of the product by calculating “dot products” of the rows
of 𝐴 with #»
𝑥.
⎡
⎤⎡ ⎤ ⎡
⎤ ⎡
⎤
16 1
1
(1)(1) + (6)(−4) + (1)(6)
−17
⎣ 3 4 5 ⎦⎣ −4 ⎦ = ⎣ (3)(1) + (4)(−4) + (5)(6) ⎦ = ⎣ 17 ⎦ .
5 2 −3
6
(5)(1) + (2)(−4) + (−3)(6)
−21
We also make the observation that
⎡
𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛
⎢ 𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛
⎢
⎢
..
⎣
.
⎤
⎡
𝑎11
⎥
⎢ 𝑎21
⎥
⎢
⎥ = 𝑥1 ⎢ ..
⎦
⎣ .
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛
𝑎𝑚1
⎤
⎡
𝑎12
⎥
⎢ 𝑎22
⎥
⎢
⎥ + 𝑥2 ⎢ ..
⎦
⎣ .
⎤
⎡
𝑎1𝑛
⎥
⎢ 𝑎2𝑛
⎥
⎢
⎥ + · · · + 𝑥𝑛 ⎢ ..
⎦
⎣ .
𝑎𝑚2
⎤
⎥
⎥
⎥,
⎦
𝑎𝑚𝑛
which is a linear combination of the columns of 𝐴. Therefore, we can re-express matrix–
vector multiplication in terms of the columns of 𝐴.
Proposition 3.9.6
(Matrix–Vector Multiplication in Terms of Columns)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and #»
𝑥 ∈ F𝑛 . Then
⎤
𝑥1
⎥
[︀
]︀ ⎢
⎢ 𝑥2 ⎥
𝑎 1 #»
𝑎 2 · · · #»
𝑎 𝑛 ⎢ . ⎥ = 𝑥1 #»
𝐴 #»
𝑥 = #»
𝑎 1 + 𝑥2 #»
𝑎 2 + · · · + 𝑥𝑛 #»
𝑎 𝑛.
⎣ .. ⎦
⎡
𝑥𝑛
⎡
Example 3.9.7
⎤⎡ ⎤
16 1
1
⎣
⎦
⎣
−4 ⎦ again, this time by thinking of the product as a linear
We will compute 3 4 5
5 2 −3
6
Section 3.9
Matrix–Vector Multiplication
87
combination of the columns of 𝐴:
⎡
⎤⎡
⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
16 1
1
1
6
1
⎣ 3 4 5 ⎦⎣ −4 ⎦ = (1)⎣ 3 ⎦ + (−4)⎣ 4 ⎦ + (6)⎣ 5 ⎦
5 2 −3
6
5
2
−3
⎡ ⎤ ⎡
⎤ ⎡
⎤
1
−24
6
= ⎣ 3 ⎦ + ⎣ −16 ⎦ + ⎣ 30 ⎦
5
−8
−18
⎡
⎤
−17
= ⎣ 17 ⎦ .
−21
Example 3.9.8
⎡
⎤
1
1 + 𝑖 2 + 2𝑖 3 − 𝑖
Compute the product 𝐴 #»
𝑥 where 𝐴 =
and #»
𝑥 = ⎣ 1 − 𝑖 ⎦.
2 + 3𝑖 4 + 𝑖 5 − 2𝑖
2 − 3𝑖
[︂
]︂
Solution: We compute the product in two different ways. We have coloured the matrix
red and the vector blue to emphasize the role their entries play in calculating the product.
First, we calculate the components of the product by calculating “dot products” of the rows
of 𝐴 with #»
𝑥.
⎤
⎡
]︂
[︂
1
1 + 𝑖 2 + 2𝑖 3 − 𝑖 ⎣
1−𝑖 ⎦
𝐴 #»
𝑥 =
2 + 3𝑖 4 + 𝑖 5 − 2𝑖
2 − 3𝑖
]︂
[︂
(1 + 𝑖)(1) + (2 + 2𝑖)(1 − 𝑖) + (3 − 𝑖)(2 − 3𝑖)
=
(2 + 3𝑖)(1) + (4 + 𝑖)(1 − 𝑖) + (5 − 2𝑖)(2 − 3𝑖)
]︂
[︂
8 − 10𝑖
.
=
11 − 19𝑖
Second, we calculate the product by thinking of the product as a linear combination of the
columns of 𝐴.
⎡
⎤
[︂
]︂
1
1 + 𝑖 2 + 2𝑖 3 − 𝑖 ⎣
1−𝑖 ⎦
𝐴 #»
𝑥 =
2 + 3𝑖 4 + 𝑖 5 − 2𝑖
2 − 3𝑖
[︂
]︂
[︂
]︂
[︂
]︂
1+𝑖
2 + 2𝑖
3−𝑖
= (1)
+ (1 − 𝑖)
+ (2 − 3𝑖)
2 + 3𝑖
4+𝑖
5 − 2𝑖
[︂
]︂ [︂
]︂ [︂
]︂ [︂
]︂
1+𝑖
4
3 − 11𝑖
8 − 10𝑖
=
+
+
=
.
2 + 3𝑖
5 − 3𝑖
4 − 19𝑖
11 − 19𝑖
The following useful result can be proved using Proposition 3.9.6 (Matrix–Vector Multiplication in Terms of Columns).
Theorem 3.9.9
(Linearity of Matrix–Vector Multiplication)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). Let #»
𝑥 , #»
𝑦 ∈ F𝑛 and 𝑐 ∈ F. Then
88
Chapter 3
Systems of Linear Equations
(a) 𝐴( #»
𝑥 + #»
𝑦 ) = 𝐴 #»
𝑥 + 𝐴 #»
𝑦.
(b) 𝐴(𝑐 #»
𝑥 ) = 𝑐𝐴 #»
𝑥.
3.10
Using a Matrix–Vector Product to Express a System of
Linear Equations
Consider the system of linear equations
𝑎11 𝑥1
𝑎21 𝑥1
+𝑎12 𝑥2 + · · ·
+𝑎22 𝑥2 + · · ·
..
.
𝑎𝑚1 𝑥1 +𝑎𝑚2 𝑥2 + · · ·
⎡
𝑎11 𝑎12
⎢ 𝑎21 𝑎22
⎢
with coefficient matrix 𝐴 = ⎢ .
..
⎣ ..
.
𝑎𝑚1 𝑎𝑚2
+𝑎1𝑛 𝑥𝑛 =
+𝑎2𝑛 𝑥𝑛 =
𝑏1
𝑏2
+𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚
⎤
· · · 𝑎1𝑛
· · · 𝑎2𝑛 ⎥
⎥
. ⎥.
..
. .. ⎦
· · · 𝑎𝑚𝑛
⎡
⎤
⎡ ⎤
𝑥1
𝑏1
⎢
⎥
⎢
#»
#» ⎢ 𝑏2 ⎥
⎢ 𝑥2 ⎥
⎥
This system can now be represented as 𝐴 #»
𝑥 = 𝑏 , where #»
𝑥 = ⎢ . ⎥ and 𝑏 = ⎢ . ⎥.
⎣ .. ⎦
⎣ .. ⎦
𝑥𝑛
𝑏𝑛
#»
Going forwards, we will also refer to expressions of the form 𝐴 #»
𝑥 = 𝑏 , with 𝐴 ∈ 𝑀𝑚×𝑛 (F),
#»
#»
𝑥 ∈ F𝑚 and 𝑏 ∈ F𝑚 , as “a system of linear equations”, acknowledging the equivalence of
#»
a system of 𝑚 linear equations (as given above) to the expression 𝐴 #»
𝑥 = 𝑏.
Example 3.10.1
The system of equations
5𝑥1 +4𝑥2 −7𝑥3 +2𝑥4 = −1
−6𝑥1 −2𝑥2 +3𝑥3
=
2
+3𝑥2 +5𝑥3 +5𝑥4 =
7
can be represented as 𝐴⃗𝑥 = ⃗𝑏, where
⎡
⎤
⎡ ⎤
5 4 −7 2
−1
𝐴 = ⎣ −6 −2 3 0 ⎦ and ⃗𝑏 = ⎣ 2 ⎦ .
0 3 5 5
7
Using this notation and the linearity of matrix–vector multiplication, we can prove a result
that is essentially a corollary of Theorem 3.6.7 (System Rank Theorem).
Section 3.11
Solution Sets to Systems of Linear Equations
Proposition 3.10.2
89
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). If for every vector 𝑒#»𝑖 in the standard basis of F𝑚 the system of equations
𝐴 #»
𝑥 = 𝑒#»𝑖 is consistent, then rank(𝐴) = 𝑚.
Proof: We begin by assuming that for every vector 𝑒#»𝑖 in the standard basis of F𝑚 , the
system of equations 𝐴 #»
𝑥 = 𝑒#»𝑖 is consistent. Let 𝑥#»𝑖 be a solution to 𝐴 #»
𝑥 = 𝑒#»𝑖 .
#»
#»
We know that for any vector 𝑏 ∈ F𝑚 , we can write 𝑏 as a linear combination of the vectors
in the standard basis for F𝑚 . That is, there exists 𝑐1 , . . . , 𝑐𝑚 ∈ F such that
#»
».
𝑏 = 𝑐1 𝑒#»1 + · · · + 𝑐𝑚 𝑒#𝑚
Using these scalars, the solutions #»
𝑥 𝑖 to 𝐴 #»
𝑥 = 𝑒#»𝑖 and the linearity of matrix-vector multiplication we find that
» = #»
𝐴(𝑐1 #»
𝑥 1 + · · · + 𝑐𝑚 #»
𝑥 𝑚 ) = 𝑐1 𝐴 #»
𝑥 1 + · · · + 𝑐𝑚 𝐴 #»
𝑥 𝑚 = 𝑐1 𝑒#»1 + · · · + 𝑐𝑚 𝑒#𝑚
𝑏.
#»
#»
» is a solution to 𝐴 #»
Therefore, 𝑐1 𝑥#»1 + · · · + 𝑐𝑚 𝑥# 𝑚
𝑥 = 𝑏 . Thus, 𝐴 #»
𝑥 = 𝑏 is consistent
#»
for every 𝑏 ∈ F𝑚 . Therefore, by Part (b) of Theorem 3.6.7 (System Rank Theorem),
rank(𝐴) = 𝑚.
3.11
Solution Sets to Systems of Linear Equations
#»
#»
#»
Let 𝐴 #»
𝑥 = 𝑏 be a system of linear equations. If 𝑏 = 0 , then the system is homogeneous.
#»
#»
If 𝑏 ̸= 0 , then the system is non-homogeneous. We will consider the solution sets of
homogeneous and non-homogeneous systems with the same coefficient matrix.
As we noted in Section 3.7, a homogeneous system of linear equations is always consistent,
since we can always set all of the variables to 0 (the trivial solution). In other words, the
solution set to a homogeneous system always contains the zero vector (the trivial solution),
and thus, it is never empty. The solution set of a non-homogeneous system does not contain
the trivial solution. Non-homogeneous systems may be consistent or inconsistent.
The next result gives another important property of the solution set to a homogeneous
system.
Proposition 3.11.1
#»
Let 𝐴 #»
𝑥 = 0 be a homogeneous system of linear equations with solution set 𝑆. If #»
𝑥 , #»
𝑦 ∈ 𝑆,
#»
#»
#»
and if 𝑐 ∈ F, then 𝑥 + 𝑦 ∈ 𝑆 and 𝑐 𝑥 ∈ 𝑆.
Proof: Assume that #»
𝑥 , #»
𝑦 ∈ 𝑆. Then #»
𝑥 and #»
𝑦 are solutions to the homogeneous system
#»
#»
#»
#»
and 𝐴 𝑥 = 0 and 𝐴 𝑦 = 0 . Using the linearity of matrix multiplication,
#» #» #»
𝐴( #»
𝑥 + #»
𝑦 ) = 𝐴 #»
𝑥 + 𝐴 #»
𝑦 = 0+0 = 0
and
#»
#»
𝐴(𝑐 #»
𝑥 ) = 𝑐(𝐴 #»
𝑥 ) = 𝑐( 0 ) = 0 .
Therefore, #»
𝑥 + #»
𝑦 ∈ 𝑆 and 𝑐 #»
𝑥 ∈ 𝑆.
90
Chapter 3
Systems of Linear Equations
We can combine these two results to state that
If #»
𝑥 , #»
𝑦 ∈ 𝑆 and if 𝑐, 𝑑 ∈ F, then 𝑐 #»
𝑥 + 𝑑 #»
𝑦 ∈ 𝑆.
We can also extend this result inductively to 𝑛 vectors in 𝑆 and 𝑛 scalars in F. In other
words, given 𝑛 solutions to a homogeneous system of linear equations, then any linear
combination of these solutions is also a solution to the homogeneous system. We say that
𝑆 is closed under addition and scalar multiplication. This fact, along with the fact that
𝑆 is non-empty means that 𝑆 forms a type of object called a subspace. We will explore
subspaces later in the course.
Example 3.11.2
Consider the homogeneous system given in Example 3.7.4 of Section 3.7. It has coefficient
matrix
⎡
⎤
1 −2 −1 3
𝐴 = ⎣ 2 −4 1 0 ⎦
1 −2 2 −3
and solution set
⎫
⎧ ⎡ ⎤
⎡ ⎤
−1
2
⎪
⎪
⎪
⎪
⎬
⎨ ⎢ ⎥
⎢ 0 ⎥
1
⎢
⎥
⎥
⎢
𝑉 = 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R .
2
0
⎪
⎪
⎪
⎪
⎭
⎩
1
0
[︀
]︀𝑇
By choosing 𝑠 = 1, 𝑡 = 0 and then 𝑠 = 0, 𝑡 = 1, we get the solutions #»
𝑥 = 2100
[︀
]︀𝑇
𝑧 be the following linear combination of these two
and #»
𝑦 = −1 0 2 1 , respectively. Let #»
solutions:
Then
[︀
]︀𝑇
#»
𝑧 = 2 #»
𝑥 + 3 #»
𝑦 = 1263 .
⎡ ⎤
⎡ ⎤
⎤ 1
0
1 −2 −1 3 ⎢ ⎥
2⎥ ⎣ ⎦
#»
⎢
⎣
⎦
𝐴 𝑧 = 2 −4 1 0 ⎣ ⎦ = 0 .
6
0
1 −2 2 −3
3
⎡
And so #»
𝑧 ∈ 𝑉 as predicted by Proposition 3.11.1.
Definition 3.11.3
Associated
Homogeneous
System
#»
#»
#»
Let 𝐴 #»
𝑥 = 𝑏 , where 𝑏 ̸= 0 , be a non-homogeneous system of linear equations. The
#»
associated homogeneous system to this system is the system 𝐴 #»
𝑥 = 0.
Example 3.11.4
⎡
⎤
⎡ ⎤
2 3
0
−5 7 ⎦ #»
𝑥 = ⎣ 0 ⎦ is a homogeneous system of 3 equations in 3 unknowns.
−4 6
0
⎤
⎡ ⎤
2 3 4
2
#»
⎦
⎣
−5 7 −9 𝑦 = 4 ⎦ is a non-homogeneous system of 3 equations in 4 unknowns,
−4 6 −4
6
⎡
⎤
⎡ ⎤
1 2 3 4
0
with associated homogeneous system of equations ⎣ −4 −5 7 −9 ⎦ #»
𝑦 = ⎣ 0 ⎦.
2 −4 6 −4
0
1
(a) ⎣ −4
2
⎡
1
⎣
(b) −4
2
Section 3.11
Solution Sets to Systems of Linear Equations
Definition 3.11.5
Particular Solution
91
#»
Let 𝐴 #»
𝑥 = 𝑏 be a consistent system of linear equations. We refer to a solution of this
system, 𝑥#»𝑝 , as a particular solution to this system.
#»
Since a consistent system 𝐴 #»
𝑥 = 𝑏 either has a unique solution or it has an infinite number
of solutions, there may be an infinite number of choices for a particular solution.
We can use a particular solution to a non-homogeneous system to find a relationship between
the non-homogeneous system and its associated homogeneous system. Specifically, we can
#»
express the solution set of a consistent, non-homogeneous system 𝐴 #»
𝑥 = 𝑏 in terms of a
particular solution to the system and the solution set of its associated homogeneous system
#»
𝐴 #»
𝑥 = 0 as follows.
Theorem 3.11.6
#»
#»
(Solutions to 𝐴 #»
𝑥 = 0 and 𝐴 #»
𝑥 = 𝑏)
#»
#»
#»
Let 𝐴 #»
𝑥 = 𝑏 , where 𝑏 ̸= 0 , be a consistent non-homogeneous system of linear equations
#»
˜ Let 𝐴 #»
with solution set 𝑆.
𝑥 = 0 be the associated homogeneous system with solution set
˜ then
𝑆. If 𝑥#»𝑝 ∈ 𝑆,
𝑆˜ = {𝑥#»𝑝 + #»
𝑥 : #»
𝑥 ∈ 𝑆}.
˜ Therefore, 𝐴𝑥#»𝑝 = #»
Proof: Let 𝑥#»𝑝 ∈ 𝑆.
𝑏 . Let #»
𝑥 be a solution of its associated homogeneous
#»
#»
system. Therefore, 𝐴 𝑥 = 0 . To show that 𝑆˜ = {𝑥#»𝑝 + #»
𝑥 : #»
𝑥 ∈ 𝑆}, we will show that
#
»
#»
#»
#
»
#»
#»
˜
𝑆˜ ⊆ {𝑥𝑝 + 𝑥 : 𝑥 ∈ 𝑆} and {𝑥𝑝 + 𝑥 : 𝑥 ∈ 𝑆} ⊆ 𝑆.
#»
˜ Therefore, 𝐴 #»
Suppose that #»
𝑦 ∈ 𝑆.
𝑦 = 𝑏 . We can write #»
𝑦 as #»
𝑦 = 𝑥#»𝑝 + #»
𝑦 − 𝑥#»𝑝 . Using the
linearity of matrix-vector multiplication we have that
#» #» #»
𝐴( #»
𝑦 − 𝑥#»𝑝 ) = 𝐴 #»
𝑦 − 𝐴𝑥#»𝑝 = 𝑏 − 𝑏 = 0 ,
which implies that
that 𝑆˜ ⊆ {𝑥#»𝑝 + #»
𝑥 :
#»
𝑦 − 𝑥#»𝑝 ∈ 𝑆. Therefore, #»
𝑦 ∈ {𝑥#»𝑝 + #»
𝑥 : #»
𝑥 ∈ 𝑆}. Thus, we have shown
#»
𝑥 ∈ 𝑆}.
Now, suppose that #»
𝑧 ∈ {𝑥#»𝑝 + #»
𝑥 : #»
𝑥 ∈ 𝑆}. Then #»
𝑧 = 𝑥#»𝑝 + #»
𝑥 1 for some #»
𝑥 1 ∈ 𝑆. Since
#»
⃗𝑥1 ∈ 𝑆, 𝐴⃗𝑥1 = 0 . Again using the linearity of matrix-vector multiplication, we have that
#» #» #»
𝐴 #»
𝑧 = 𝐴(𝑥#»𝑝 + #»
𝑥 1 ) = 𝐴𝑥#»𝑝 + 𝐴 #»
𝑥1 = 𝑏 + 0 = 𝑏 .
˜
˜ Thus, we have shown {𝑥#»𝑝 + #»
Therefore, #»
𝑧 ∈ 𝑆.
𝑥 : #»
𝑥 ∈ 𝑆} ⊆ 𝑆.
The next example illustrates the idea from this theorem.
[︂
Example 3.11.7
]︂
12
Let 𝐴 =
. Solve the following systems and represent their solutions sets graphically.
24
#»
(a) 𝐴 #»
𝑥 = 0
[︂ ]︂
4
#»
(b) 𝐴 𝑥 =
8
92
Chapter 3
Systems of Linear Equations
[︂ ]︂
−2
#»
(c) 𝐴 𝑥 =
−4
Solution: We solve all three systems simultaneously by row-reducing the “super-augmented
matrix”
[︂
]︂
1 2 0 4 −2
.
2 4 0 8 −4
We perform the ERO −2𝑅1 + 𝑅2 to obtain the following matrix which is in RREF:
[︂
]︂
1 2 0 4 −2
.
0 0 0 0 0
Therefore, the solution set to the first system, which is a homogeneous system, is
{︂[︂ ]︂
}︂
−2
𝑆=
𝑡:𝑡∈R .
1
The solution to the second system is
}︂
{︂[︂ ]︂ [︂ ]︂
4
−2
𝑡:𝑡∈R .
𝑇 =
+
1
0
The solution to the third system is
}︂
{︂[︂ ]︂ [︂ ]︂
−2
−2
𝑡:𝑡∈R .
+
𝑈=
1
0
We represent these solution sets graphically. The solution sets are labelled in the diagram.
Section 3.11
Solution Sets to Systems of Linear Equations
93
We see that 𝑆 is a line through the origin, which is to be expected because it is the solution
set to a homogeneous system. The solution sets 𝑇 and 𝑈 are translations of the set 𝑆.
The solution set 𝑇 is obtained
[︂ ]︂ by translating every vector in 𝑆 to the right by 4 units, which
4
is equivalent to adding
, a particular solution to the system in (b), to every vector in 𝑆.
0
Similarly, the solution set 𝑈 is obtained
[︂ by
]︂ translating every vector in 𝑆 to the left by 2
−2
units, which is equivalent to adding
, a particular solution to the system in (c), to
0
every vector in 𝑆.
Let’s consider another example in R3 .
⎡
Example 3.11.8
⎤
123
Let 𝐴 = ⎣ 2 4 6 ⎦. Solve the following systems and represent their solutions sets graphically.
369
#»
(a) 𝐴 #»
𝑥 = 0
⎡
⎤
6
(b) 𝐴 #»
𝑥 = ⎣ 12 ⎦
18
⎡
⎤
−4
(c) 𝐴 #»
𝑥 = ⎣ −8 ⎦
−12
Solution: We solve all three systems simultaneously by row-reducing the “super-augmented
matrix”
⎤
⎡
1 2 3 0 6 −4
⎣ 2 4 6 0 12 −8 ⎦ .
3 6 9 0 18 −12
We perform the EROs −2𝑅1 + 𝑅2 and −3𝑅1 + 𝑅3
in RREF:
⎡
1 2 3 0 6
⎣ 0 0 0 0 0
0 0 0 0 0
to obtain the following matrix which is
⎤
−4
0 ⎦.
0
Therefore, the solution set to the first system, which is a homogeneous system, is
⎧⎡ ⎤
⎫
⎧⎡ ⎤ ⎡
⎡ ⎤
⎤⎫
−3
−3 ⎬
⎨ −2
⎬
⎨ −2
𝑆 = ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R = Span ⎣ 1 ⎦ , ⎣ 0 ⎦ .
⎩
⎭
⎩
⎭
0
1
0
1
The solution to the second system is
⎧⎡ ⎤ ⎡ ⎤
⎫
⎡ ⎤
−2
−3
⎨ 6
⎬
𝑇 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R .
⎭
⎩
0
0
1
94
Chapter 3
Systems of Linear Equations
The solution to the third system is
⎧⎡ ⎤ ⎡ ⎤
⎫
⎡ ⎤
−2
−3
⎨ −4
⎬
𝑈 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R .
⎩
⎭
0
0
1
We represent these solution sets graphically. The solution sets are labelled in the diagram
We see that 𝑆 is a plane through the origin, which is to be expected because it is the solution
set to a homogeneous system.
The solution sets 𝑇 and 𝑈 are translations
of the set 𝑆. For 𝑇 , every vector in 𝑆 has
⎡ ⎤
6
been translated in the direction of ⎣ 0 ⎦, a particular solution to the system in (b). For 𝑈 ,
0
⎡
⎤
−4
every vector in 𝑆 has been translated in the direction of ⎣ 0 ⎦, a particular solution to the
0
system in (c).
In the previous two examples, we see that we can find the solution set 𝑈 by subtracting
from every element in 𝑇 a particular solution of 𝑇 and then adding a particular solution of
𝑈 . Let’s focus on the sets in Example 3.11.8.
⎡
⎤
⎡ ⎤
−4
6
#»
Let #»
𝑧 ∈ 𝑇 , and let #»
𝑢 𝑝 = ⎣ 0 ⎦ and 𝑡 𝑝 = ⎣ 0 ⎦ be particular solutions of 𝑈 and 𝑇 ,
0
0
#»
#»
respectively. From Theorem 3.11.6 (Solutions to 𝐴 #»
𝑥 = 0 and 𝐴 #»
𝑥 = 𝑏 ), as #»
𝑧 ∈ 𝑇 , we
know that
#»
#»
𝑧 = 𝑡 𝑝 + #»
𝑥
Section 3.11
Solution Sets to Systems of Linear Equations
95
for some #»
𝑥 ∈ 𝑆, the solution set of the associated homogeneous system. From above, we
would express
⎡
⎤
⎡
⎤
−2
−3
#»
𝑥 = ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 for some 𝑠, 𝑡 ∈ R.
0
1
#»
#»
As we can certainly write #»
𝑢 𝑝 = #»
𝑢 𝑝 − 𝑡 𝑝 + 𝑡 𝑝 and any vector #»
𝑢 ∈ 𝑈 as #»
𝑢 = #»
𝑢 𝑝 + #»
𝑥 for
#»
#»
#»
#»
#»
some 𝑥 ∈ 𝑆 by Theorem 3.11.6 (Solutions to 𝐴 𝑥 = 0 and 𝐴 𝑥 = 𝑏 ), we have
𝑈 = { #»
𝑢 𝑝 + #»
𝑥 : #»
𝑥 ∈ 𝑆}
{︀(︀ #»
}︀
#»
#» )︀
= 𝑢 𝑝 − 𝑡 𝑝 + 𝑡 𝑝 + #»
𝑥 : #»
𝑥 ∈𝑆
(︀
)︀
#» )︀ (︀ #»
= { #»
𝑢 𝑝 − 𝑡 𝑝 + 𝑡 𝑝 + #»
𝑥 : #»
𝑥 ∈ 𝑆}
⏟
⏞
element of 𝑇
{︀(︀ #»
}︀
#» )︀ #» #»
= 𝑢𝑝 − 𝑡 𝑝 + 𝑧 : 𝑧 ∈ 𝑇 .
This holds in general and provides a nice way to understand the relationships between
parallel lines and between parallel planes. We formalize this discussion below.
Corollary 3.11.9
#»
(Solutions to 𝐴 #»
𝑥 = 𝑏 and 𝐴 #»
𝑥 = #»
𝑐)
Consider the two consistent, non-homogeneous systems
#»
𝐴 #»
𝑥 = 𝑏 , and 𝐴 #»
𝑥 = #»
𝑐,
#»
where 𝑏 ̸= #»
𝑐 . If 𝑆˜𝑏 and 𝑆˜𝑐 are their respective solution sets, with particular solutions 𝑥#»𝑏
and 𝑥#»𝑐 , respectively, then
𝑆˜𝑐 = {(𝑥#»𝑐 − 𝑥#»𝑏 ) + #»
𝑧 : #»
𝑧 ∈ 𝑆˜𝑏 }.
We can use this result to help us find the solution set to a consistent, non-homogeneous
system, provided we have a particular solution to the system and we have the solution set
to another consistent, non-homogeneous system which shares the coefficient matrix. Note
that finding a particular solution of a system is not always a simple task.
Example 3.11.10
𝑥 = #»
𝑐 using the
Using Example 3.11.8, solve the consistent non-homogeneous system 𝐴 #»
#»
#»
solution to 𝐴 𝑥 = 𝑏 , where
⎡
⎤
⎡ ⎤
⎡ ⎤
123
6
1
#»
𝐴 = ⎣ 2 4 6 ⎦ and 𝑏 = ⎣ 12 ⎦ , #»
𝑐 = ⎣2⎦ .
369
18
3
[︀
]︀𝑇
Solution: From Example 3.11.8, we have that the solution set to 𝐴 #»
𝑥 = 6 12 18 is
⎧⎡ ⎤ ⎡
⎫
⎤
⎡
⎤
−2
−3
⎨ 6
⎬
𝑇 = 𝑆˜𝑏 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R .
⎩
⎭
0
0
1
[︀
]︀𝑇
By inspection, we can see that 1 0 0 is a particular solution to 𝐴 #»
𝑥 = #»
𝑐.
96
Chapter 3
Systems of Linear Equations
#»
Therefore, by Corollary 3.11.9 (Solutions to 𝐴 #»
𝑥 = 𝑏 and 𝐴 #»
𝑥 = #»
𝑐 ), the solution set to
#»
#»
𝐴 𝑥 = 𝑐 is
⎡ ⎤
}︃
{︃ ⎛⎡ 1 ⎤ ⎡ 6 ⎤⎞ ⎡ 6 ⎤ ⎡ −2 ⎤
−3
𝑆˜𝑐 = ⎝⎣ 0 ⎦ − ⎣ 0 ⎦⎠ + ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R
0
0
0
0
1
⏟
⏞
#»
𝑧 ∈𝑆˜𝑏
Note that in the final solution above, simplifying the expression in the set gives us
⎧⎡ ⎤ ⎡ ⎤
⎫ ⎧⎡ ⎤
⎫
⎡ ⎤
−2
−3
⎨ 1
⎬ ⎨ 1
⎬
𝑆˜𝑐 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R = ⎣ 0 ⎦ + #»
𝑥 : #»
𝑥 ∈𝑆 ,
⎩
⎭ ⎩
⎭
0
0
1
0
where 𝑆 is the solution set of the associated homogeneous system, which is in alignment
#»
#»
with Theorem 3.11.6 (Solutions to 𝐴 #»
𝑥 = 0 and 𝐴 #»
𝑥 = 𝑏 ).
Chapter 4
Matrices
4.1
The Column and Row Spaces of a Matrix
In the previous chapter, we introduced the concept of a matrix, which was built from the
coefficients of a system of equations. It turns out that matrices are interesting and powerful
mathematical objects all by themselves. In this chapter, we will explore matrices further.
We saw in Proposition 3.9.6 that the matrix–vector product 𝐴 #»
𝑥 is a linear combination of
the columns of 𝐴. We give the set of all linear combinations of the columns of 𝐴 a name.
Definition 4.1.1
Column Space
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the column space of 𝐴, denoted by Col(𝐴), to be the span
of the columns of 𝐴. That is,
Col(𝐴) = Span{𝑎#»1 , 𝑎#»2 , . . . , 𝑎# »
𝑛 }.
Proposition 4.1.2
(Consistent System and Column Space)
#»
#»
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝑏 ∈ F𝑚 . The system of linear equations 𝐴 #»
𝑥 = 𝑏 is consistent
#»
if and only if 𝑏 ∈ Col(𝐴).
Proof: We must prove the statement in both directions. We begin with the forward direc]︀𝑇
[︀
#»
such that 𝐴 #»
𝑦 = 𝑏 . Thus,
tion. Assume that there exists a vector #»
𝑦 = 𝑦1 𝑦2 · · · 𝑦𝑛
]︀𝑇
[︀ #» #»
]︀ [︀
#»
𝑎1 𝑎2 · · · 𝑎# »
𝑦1 𝑦2 · · · 𝑦𝑛 = 𝑏 .
𝑛
Therefore,
#»
𝑦1 𝑎#»1 + 𝑦2 𝑎#»2 + · · · + 𝑦𝑛 𝑎# »
𝑛 = 𝑏.
#»
#»
Thus, 𝑏 is a linear combination of the columns of the matrix 𝐴, and so 𝑏 ∈ Col(𝐴).
#»
Now we prove the backward direction. Assume that 𝑏 ∈ Col(𝐴). Then
#»
𝑏 = 𝑠1 𝑎#»1 + 𝑠2 𝑎#»2 + · · · + 𝑠𝑛 𝑎# »
𝑛
[︀
]︀𝑇
#»
for some 𝑠1 , 𝑠2 , . . . , 𝑠𝑛 ∈ F. Let 𝑠 = 𝑠1 𝑠2 · · · 𝑠𝑛 . Then
#»
𝐴 #»
𝑠 = 𝑠1 𝑎#»1 + 𝑠2 𝑎#»2 + · · · + 𝑠𝑛 𝑎# »
𝑛 = 𝑏.
#»
#»
Thus, 𝐴 #»
𝑠 = 𝑏 , and so #»
𝑠 is a solution to the system. Therefore, 𝐴 #»
𝑥 = 𝑏 is consistent.
97
98
Chapter 4
Matrices
We now begin our examination of a similar space for the rows of the matrix. First, we need
the notion of the transpose of a matrix.
Definition 4.1.3
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the transpose of 𝐴, denoted 𝐴𝑇 , by (𝐴𝑇 )𝑖𝑗 = (𝐴)𝑗𝑖 .
Transpose
If 𝐴 is an 𝑚 × 𝑛 matrix, then 𝐴𝑇 is an 𝑛 × 𝑚 matrix. It is formed by making all the rows
of 𝐴 into the columns of 𝐴𝑇 in the order in which they appear.
This definition is the motivation
behind
]︀𝑇 the notation that we have used to write a vector
[︀
from F𝑛 horizontally: #»
𝑣 = 𝑣1 . . . 𝑣𝑛 .
⎡
⎤
]︂
1 4
1 2 −3
If 𝐴 =
, then 𝐴𝑇 = ⎣ 2 −5 ⎦ .
4 −5 6
−3 6
[︂
Example 4.1.4
In Section 3.9, we introduced the idea of thinking of an 𝑚 × 𝑛 matrix 𝐴 as
⎡# »
⎤
row1 (𝐴)
# »(𝐴) ⎥
⎢ row
2
⎢
⎥
𝐴=⎢
⎥.
..
⎣
⎦
.
#row »(𝐴)
𝑚
In other words, we partition 𝐴 into 𝑚 row vectors, each with 𝑛 components. If we transpose
these row vectors, we get vectors in R𝑛 . We use these vectors to define the row space of 𝐴.
Definition 4.1.5
Row Space
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the row space of 𝐴, denoted by Row(𝐴), to be the span of
the transposed rows of 𝐴. That is,
{︁(︀
}︁
# »(𝐴))︀𝑇 , (︀row
# »(𝐴))︀𝑇 , . . . , (︀row
# »(𝐴))︀𝑇 .
Row(𝐴) = Span row
1
2
𝑚
Notice that, if 𝐴 ∈ 𝑀𝑚×𝑛 (F), Row(𝐴) is a subset of F𝑛 while Col(𝐴) is a subset of F𝑚 .
⎡
Example 4.1.6
1
⎢2
If 𝐴 = ⎢
⎣3
4
4
3
2
1
⎤
⎧⎡ ⎤ ⎡
⎤ ⎡ ⎤ ⎡ ⎤⎫
1
2
3
4 ⎬
⎨ 1
⎥
−1 ⎥
⎣4⎦ , ⎣ 3 ⎦ , ⎣2⎦ , ⎣ 1 ⎦ .
,
then
Row(𝐴)
=
Span
1 ⎦
⎩
⎭
1
−1
1
−1
−1
REMARK
Since the transposed rows of 𝐴 are the columns of 𝐴𝑇 , we see that Row(𝐴) = Col(𝐴𝑇 ).
Applying EROs to a matrix does not affect the row space. We formalize this below.
Section 4.2
Matrix Equality and Multiplication
Proposition 4.1.7
99
Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F). If 𝐵 is row equivalent to 𝐴, then
Row(𝐵) = Row(𝐴).
Proof: Using the remark above, we note that Row(𝐴) = Col(𝐴𝑇 ) and Row(𝐵) = Col(𝐵 𝑇 ).
Therefore, if we can show that Col(𝐵 𝑇 ) = Col(𝐴𝑇 ), then the proposition will be proved.
We begin by considering the case where 𝐵 is obtained from 𝐴 by applying exactly one ERO.
When we perform a single ERO of any type to 𝐴 to obtain 𝐵, each column of 𝐵 𝑇 will be
a linear combination of the columns of 𝐴𝑇 . Therefore, Col(𝐵 𝑇 ) ⊆ Col(𝐴𝑇 ).
As we know from Theorem 3.2. 18 (Elementary Operations), the operations which reverse EROs are themselves EROs. Therefore, we can obtain 𝐴 from 𝐵 by doing a single
ERO. Thus, every column of 𝐴𝑇 is a linear combination of the columns of 𝐵 𝑇 . Therefore,
Col(𝐴𝑇 ) ⊆ Col(𝐵 𝑇 ) and we have that Col(𝐵 𝑇 ) = Col(𝐴𝑇 ).
In the case that 𝐵 is obtained from 𝐴 by applying multiple EROs, we can apply the above
argument multiple times, once for each of the EROs.
REMARK
One might expect that a similar result would hold for the column space. That is, if 𝐴, 𝐵 ∈
𝑀𝑚×𝑛 (F), and if 𝐵 is row equivalent to 𝐴, [︂then ]︂Col(𝐵) = Col(𝐴). However, [︂this ]︂result
00
11
, but
is row equivalent to 𝐴 =
is not true. For example, the matrix 𝐵 =
11
00
{︂[︂ ]︂}︂
{︂[︂ ]︂}︂
1
0
while Col(𝐴) = Span
Col(𝐵) = Span
.
0
1
4.2
Matrix Equality and Multiplication
As a preliminary step toward discussing statements involving algebraic matrix operations,
we establish what it means for matrices to be equal.
Definition 4.2.1
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝐵 ∈ 𝑀𝑝×𝑞 (F). We say that 𝐴 and 𝐵 are equal if
Matrix Equality
1. 𝐴 and 𝐵 have the same size, that is, 𝑚 = 𝑝 and 𝑛 = 𝑞, and
2. The corresponding entries of 𝐴 and 𝐵 are equal, i.e., 𝑎𝑖𝑗 = 𝑏𝑖𝑗 , for all 𝑖 = 1, 2, . . . , 𝑚
and 𝑗 = 1, 2, . . . , 𝑛.
We denote this by writing 𝐴 = 𝐵.
While this definition seems to suggest that matrix equality should always be determined
through exhaustive comparison of matrix entries, we can also uniquely identify matrices
based on their action on other objects. Our next theorem gives a very useful criterion for
100
Chapter 4
Matrices
checking that two given matrices of the same size are equal based on their behaviour in
matrix-vector products. It requires a preliminary lemma, which is important in its own
right.
Lemma 4.2.2
(Column Extraction)
[︀
]︀
#» #»
Let 𝐴 = 𝑎#»1 · · · 𝑎# »
𝑛 ∈ 𝑀𝑚×𝑛 (F). Then 𝐴 𝑒𝑖 = 𝑎𝑖 for all 𝑖 = 1, . . . , 𝑛.
In other words, the matrix-vector multiplication of 𝐴 and the 𝑖𝑡ℎ standard basis vector 𝑒#»𝑖
yields the 𝑖𝑡ℎ column of 𝐴.
Proof: According to the definition of matrix-vector multiplication,
#»
𝐴𝑒#»1 = 1 · 𝑎#»1 + 0 · 𝑎#»2 + 0 · 𝑎#»3 + · · · + 0 · 𝑎# »
𝑛 = 𝑎1 ,
#»
𝐴𝑒#»2 = 0 · 𝑎#»1 + 1 · 𝑎#»2 + 0 · 𝑎#»3 + · · · + 0 · 𝑎# »
𝑛 = 𝑎2 ,
..
.
#»
#»
#»
#»
𝐴𝑒𝑛 = 0 · 𝑎1 + 0 · 𝑎2 + 0 · 𝑎#»3 + · · · + 1 · 𝑎# »
𝑛 = 𝑎𝑛 .
Theorem 4.2.3
(Equality of Matrices)
Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F). Then
𝐴 = 𝐵 if and only if 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for all #»
𝑥 ∈ F𝑛 .
Proof: If 𝐴 = 𝐵, then it is clear that 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for all #»
𝑥 ∈ F𝑛 .
[︁
]︁
[︀
]︀
#»
#»
#»
#»
#»
𝑛
Now, let 𝐴 = 𝑎#»1 · · · 𝑎# »
𝑛 , 𝐵 = 𝑏1 · · · 𝑏𝑛 , and suppose that 𝐴 𝑥 = 𝐵 𝑥 for all 𝑥 ∈ F .
Then we can take #»
𝑥 equal to the 𝑖𝑡ℎ standard basis vector 𝑒#» and conclude that 𝐴𝑒#» = 𝐵 𝑒#»
𝑖
𝑖
𝑖
for all 𝑖 = 1, . . . , 𝑛. But then it follows from Lemma 4.2.2 (Column Extraction) that
#»
⃗𝑎𝑖 = 𝐴𝑒#»𝑖 = 𝐵 𝑒#»𝑖 = 𝑏𝑖 .
Since the columns of 𝐴 and 𝐵 are the same, we conclude that the matrices are equal.
Recall that we can calculate the matrix–vector product 𝐴 #»
𝑥 provided the sizes of 𝐴 and #»
𝑥
#»
𝑛
are compatible: we need 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝑥 ∈ F . We extend this process to multiplying
several column vectors, of the correct size, by the same matrix simultaneously.
Definition 4.2.4
Matrix
Multiplication
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝐵 ∈ 𝑀𝑛×𝑝 (F). We define the matrix product 𝐴𝐵 = 𝐶 to be the
matrix 𝐶 ∈ 𝑀𝑚×𝑝 (F), constructed as follows:
[︁ #» #»
#» ]︁ [︁ #» #»
#» ]︁
𝐶 = 𝐴𝐵 = 𝐴 𝑏1 𝑏2 · · · 𝑏𝑝 = 𝐴𝑏1 𝐴𝑏2 · · · 𝐴𝑏𝑝 .
That is, the 𝑗 𝑡ℎ column of 𝐶, 𝑐#»𝑗 , is obtained by multiplying the matrix 𝐴 by the 𝑗 𝑡ℎ column
of the matrix 𝐵:
#»
𝑐#»𝑗 = 𝐴𝑏𝑗 , for all 𝑗 = 1, . . . , 𝑝.
Section 4.2
Matrix Equality and Multiplication
101
Thus, we can construct the product 𝐴𝐵, column by column, by calculating a sequence of
#»
matrix–vector products 𝐴𝑏𝑗 .
REMARKS
• In calculating the product 𝐴𝐵, the number of columns of the first matrix, 𝐴, must be
equal to the number of rows of the second matrix, 𝐵. If these values do not match,
then the product is undefined.
• Matrix multiplication is generally non-commutative, so the order of multiplication
matters. In other words, for matrices 𝐴 and 𝐵, we are not guaranteed that 𝐴𝐵 = 𝐵𝐴.
• We can immediately connect the definition of matrix multiplication to our understanding of column spaces. As the 𝑗 𝑡ℎ column of 𝐶 = 𝐴𝐵 is obtained by multiplying
the matrix 𝐴 by the 𝑗 𝑡ℎ column of the matrix 𝐵, the 𝑗 𝑡ℎ column of 𝐶 must be a linear
combination of the columns of 𝐴 (by Proposition 3.9.6 (Matrix–Vector Multiplication
in Terms of Columns)), and therefore is a member of Col(𝐴).
⎡
Example 4.2.5
⎤
]︂
[︂
12
−1 3
, calculate the products 𝐴𝐵 and 𝐵𝐴 if possible.
Given 𝐴 = ⎣ 3 5 ⎦ and 𝐵 =
2 −4
87
Solution: First, we note that the number of columns of 𝐴 is equal to the number of rows
of 𝐵 and so the product is defined. Also, since 𝐴 has three rows and 𝐵 has two columns,
then the size of the product will be 3 × 2.
We have coloured the entries of 𝐴 blue and the entries of 𝐵 red to emphasize the role their
entries play in calculating the product.
⎤
⎤
⎤
⎡
⎤
⎡⎡
⎡
]︂
]︂
]︂
1 2 [︂
1 2 [︂
1 2 [︂
⎣3 5⎦ 3 ⎦
⎣ 3 5 ⎦ −1 3 = ⎣⎣ 3 5 ⎦ −1
−4
2
2 −4
87
87
87
⎡
⎤
(1)(−1) + (2)(2) (1)(3) + (2)(−4)
= ⎣ (3)(−1) + (5)(2) (3)(3) + (5)(−4) ⎦
(8)(−1) + (7)(2) (8)(3) + (7)(−4)
⎡
⎤
3 −5
= ⎣ 7 −11 ⎦ .
6 −4
The product 𝐵𝐴 is undefined, since 𝐵 has 2 columns, 𝐴 has 3 rows and 2 ̸= 3.
[︂
Example 4.2.6
2 − 𝑖 1 − 2𝑖
Given 𝐴 =
3 − 2𝑖 1 − 3𝑖
𝐵𝐴 if possible.
]︂
[︂
and 𝐵 =
]︂
1 + 𝑖 −2 + 𝑖
, calculate the products 𝐴𝐵 and
−1 + 𝑖 3 + 2𝑖
Solution: First, we note that the number of columns of 𝐴 is equal to the number of rows
of 𝐵 and so the product is defined. Also, since 𝐴 has two rows and 𝐵 has two columns,
then the size of the product will be 2 × 2.
102
Chapter 4
Matrices
We have coloured the entries of 𝐴 blue and the entries of 𝐵 red to emphasize the role their
entries play in calculating the product.
[︂
]︂[︂
]︂
2 − 𝑖 1 − 2𝑖
1 + 𝑖 −2 + 𝑖
3 − 2𝑖 1 − 3𝑖 −1 + 𝑖 3 + 2𝑖
[︂ [︂
]︂[︂
]︂ [︂
]︂[︂
]︂ ]︂
2 − 𝑖 1 − 2𝑖
1+𝑖
2 − 𝑖 1 − 2𝑖 −2 + 𝑖
=
3 − 2𝑖 1 − 3𝑖 −1 + 𝑖
3 − 2𝑖 1 − 3𝑖 3 + 2𝑖
[︂
]︂
(2 − 𝑖)(1 + 𝑖) + (1 − 2𝑖)(−1 + 𝑖)
(2 − 𝑖)(−2 + 𝑖) + (1 − 2𝑖)(3 + 2𝑖)
=
(3 − 2𝑖)(1 + 𝑖) + (1 − 3𝑖)(−1 + 𝑖) (3 − 2𝑖)(−2 + 𝑖) + (1 − 3𝑖)(3 + 2𝑖)
[︂
]︂
4 + 4𝑖 4
=
.
7 + 5𝑖 5
The product 𝐵𝐴 is also defined since the number of columns of 𝐵 is equal to the number
of rows of 𝐴.
[︂
]︂[︂
]︂
1 + 𝑖 −2 + 𝑖
2 − 𝑖 1 − 2𝑖
−1 + 𝑖 3 + 2𝑖 3 − 2𝑖 1 − 3𝑖
[︂ [︂
]︂[︂
]︂ [︂
]︂[︂
]︂ ]︂
1 + 𝑖 −2 + 𝑖
2−𝑖
1 + 𝑖 −2 + 𝑖 1 − 2𝑖
=
−1 + 𝑖 3 + 2𝑖 3 − 2𝑖
−1 + 𝑖 3 + 2𝑖 1 − 3𝑖
]︂
[︂
(1 + 𝑖)(2 − 𝑖) + (−2 + 𝑖)(3 − 2𝑖)
(1 + 𝑖)(1 − 2𝑖) + (−2 + 𝑖)(1 − 3𝑖)
=
(−1 + 𝑖)(2 − 𝑖) + (3 + 2𝑖)(3 − 2𝑖) (−1 + 𝑖)(1 − 2𝑖) + (3 + 2𝑖)(1 − 3𝑖)
]︂
[︂
−1 + 8𝑖 4 + 6𝑖
.
=
12 + 3𝑖 10 − 4𝑖
Notice that even though 𝐴𝐵 and 𝐵𝐴 are both defined, we have that 𝐴𝐵 ̸= 𝐵𝐴.
In practice, we usually construct the product 𝐴𝐵 = 𝐶 one entry at a time. The entry
(𝐶)𝑖𝑗 is in the 𝑖𝑡ℎ row and the 𝑗 𝑡ℎ column of the product. Therefore, (𝐶)𝑖𝑗 = (𝑐#»𝑗 )𝑖 , where
#»
#»
𝑐#»𝑗 = 𝐴𝑏𝑗 . For 𝐴 ∈ 𝑀𝑚×𝑛 (F), 𝐵 ∈ 𝑀𝑛×𝑝 (F), and 𝐶 ∈ 𝑀𝑚×𝑝 (F), the 𝑖𝑡ℎ entry of 𝐴𝑏𝑗 is
𝑛
∑︁
𝑎𝑖𝑘 𝑏𝑘𝑗 = 𝑎𝑖1 𝑏1𝑗 + 𝑎𝑖2 𝑏2𝑗 + · · · + 𝑎𝑖𝑛 𝑏𝑛𝑗 .
𝑘=1
Thus,
𝑛
∑︁
#»
(𝐶)𝑖𝑗 = (𝐴𝑏𝑗 )𝑖 =
𝑎𝑖𝑘 𝑏𝑘𝑗 .
𝑘=1
Consequently, in order to obtain the (𝑖, 𝑗)𝑡ℎ entry of 𝐴𝐵, we need to multiply the corresponding entries of the 𝑖𝑡ℎ row of 𝐴 and the 𝑗 𝑡ℎ column of 𝐵 and sum them up. We can
re-write this last expression in a more suggestive manner:
⎡
⎤ ⎡
⎤
𝑎𝑖1
𝑏1𝑗
⎢ 𝑎𝑖2 ⎥ ⎢ 𝑏2𝑗 ⎥
⎢
⎥ ⎢
⎥
(𝐶)𝑖𝑗 = ⎢ . ⎥ · ⎢ . ⎥ .
⎣ .. ⎦ ⎣ .. ⎦
𝑎𝑖𝑛
𝑏𝑛𝑗
Section 4.3
Arithmetic Operations on Matrices
103
REMARK
The equation above shows that the (𝑖, 𝑗)𝑡ℎ entry of 𝐴𝐵 may be obtained by taking the dot
product of the (transposed) 𝑖𝑡ℎ row of 𝐴 with the 𝑗 𝑡ℎ column of 𝐵.
Example 4.2.7
Determine the (2, 2)𝑡ℎ entry of the product
⎡
1
⎣
𝐴𝐵 = 𝐶 = 3
8
⎤
]︂
2 [︂
−1 3
⎦
5
.
2 −4
7
Solution: We use the second row of 𝐴 and the second column of 𝐵 to calculate
[︂
]︂ [︂ ]︂ [︂
]︂
[︀
]︀𝑇
3
3
3
𝑐22 = 3 5 ·
=
·
= 9 − 20 = −11.
−4
5
−4
Example 4.2.8
Determine the (2, 1)𝑡ℎ entry of the product
]︂
]︂[︂
[︂
1 + 𝑖 −2 + 𝑖
2 − 𝑖 1 − 2𝑖
.
𝐶𝐷 = 𝐸 =
3 − 2𝑖 1 − 3𝑖 −1 + 𝑖 3 + 2𝑖
Solution: We use the second row of 𝐶 and the first column of 𝐷 to calculate
]︂
[︂
[︀
]︀𝑇
1+𝑖
𝑒21 = 3 − 2𝑖 1 − 3𝑖 ·
−1 + 𝑖
[︂
]︂ [︂
]︂
3 − 2𝑖
1+𝑖
=
·
1 − 3𝑖
−1 + 𝑖
= (3 − 2𝑖)(1 + 𝑖) + (1 − 3𝑖)(−1 + 𝑖)
= 7 + 5𝑖.
4.3
Arithmetic Operations on Matrices
In this section, we will introduce additional operations that can be performed on matrices.
We will also see that, while the sets 𝑀𝑚×𝑛 (R) and 𝑀𝑚×𝑛 (C) are very similar to R𝑛 and
C𝑛 , in certain respects, there also exist quite a few important distinctions between them.
Definition 4.3.1
Sum of Matrices
Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F). We define the matrix sum 𝐴 + 𝐵 = 𝐶 to be the matrix
𝐶 ∈ 𝑀𝑚×𝑛 (F) whose (𝑖, 𝑗)𝑡ℎ entry is 𝑐𝑖𝑗 = 𝑎𝑖𝑗 + 𝑏𝑖𝑗 , for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛.
104
Chapter 4
Matrices
REMARK
Addition is not defined for matrices that do not have the same size.
[︂
Example 4.3.2
]︂
[︂
]︂
[︂
]︂ [︂
]︂ [︂
]︂
02
1 0
02
1 0
1 2
For 𝐴 =
and 𝐵 =
, we have the sum 𝐴 + 𝐵 =
+
=
.
30
0 −4
30
0 −4
3 −4
Some basic properties of matrix addition and matrix multiplication are given in the next
two propositions.
Proposition 4.3.3
(Matrix Addition)
If 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑚×𝑛 (F), then
(a) 𝐴 + 𝐵 = 𝐵 + 𝐴.
(b) 𝐴 + 𝐵 + 𝐶 = (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶).
Proposition 4.3.4
(Matrix Multiplication)
If 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), 𝐶, 𝐷 ∈ 𝑀𝑛×𝑝 (F) and 𝐸 ∈ 𝑀𝑝×𝑞 (F), then
(a) (𝐴 + 𝐵)𝐶 = 𝐴𝐶 + 𝐵𝐶.
(b) 𝐴(𝐶 + 𝐷) = 𝐴𝐶 + 𝐴𝐷.
(c) (𝐴𝐶)𝐸 = 𝐴(𝐶𝐸) = 𝐴𝐶𝐸.
EXERCISE
Prove the previous two Propositions.
Definition 4.3.5
Additive Inverse
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the additive inverse of 𝐴 to be the matrix −𝐴 whose
(𝑖, 𝑗)𝑡ℎ entry is −𝑎𝑖𝑗 for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛.
[︂
Example 4.3.6
Definition 4.3.7
Zero Matrix
]︂
[︂
]︂
1 2
−1 −2
If 𝐴 =
, then the additive inverse of 𝐴 is −𝐴 =
.
3 −4
−3 4
The 𝑚 × 𝑛 zero matrix is the matrix 𝒪𝑚×𝑛 ∈ 𝑀𝑚×𝑛 (F) all of whose entries are 0.
Section 4.3
Arithmetic Operations on Matrices
105
REMARK
1. It is important to distinguish the zero matrix 𝒪𝑚×𝑛 from the zero scalar 0 and the
zero vector ⃗0.
2. Usually, the context in which the zero matrix is used is sufficient for us to determine
its size, in which case we, denote it by 𝒪.
[︂
Example 4.3.8
We have 𝒪2×3 =
]︂
000
and 𝒪1×1 = [0]. Note that 𝒪1×1 is technically different from the
000
zero scalar 0.
On the other hand, 𝒪𝑚×1
Proposition 4.3.9
⎡ ⎤
0
⎢ .. ⎥
= ⎣ . ⎦ is the zero vector in F𝑚 .
0
(Properties of the Additive Inverse and the Zero Matrix)
If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then
(a) 𝒪 + 𝐴 = 𝐴 + 𝒪 = 𝐴.
(b) 𝐴 + (−𝐴) = (−𝐴) + 𝐴 = 𝒪.
EXERCISE
Prove the previous Proposition.
Definition 4.3.10
Multiplication of a
Matrix by a Scalar
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝑐 ∈ F. We define the product of 𝑐 and 𝐴 to be the matrix
𝑐𝐴 ∈ 𝑀𝑚×𝑛 (F) whose (𝑖, 𝑗)𝑡ℎ entry is (𝑐𝐴)𝑖𝑗 = 𝑐𝑎𝑖𝑗 for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛.
Notice that the product (−1)𝐴 (of the scalar −1 and the matrix 𝐴) is −𝐴, the additive
inverse of 𝐴. That is,
(−1)𝐴 = −𝐴.
So, our notational choices are compatible. The following proposition gives some additional
properties of scalar multiplication.
Proposition 4.3.11
(Properties of Multiplication of a Matrix by a Scalar)
If 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), 𝐶 ∈ 𝑀𝑛×𝑘 (F), and 𝑟, 𝑠 ∈ F, then
(a) 𝑠(𝐴 + 𝐵) = 𝑠𝐴 + 𝑠𝐵.
106
Chapter 4
Matrices
(b) (𝑟 + 𝑠)𝐴 = 𝑟𝐴 + 𝑠𝐴.
(c) 𝑟(𝑠𝐴) = (𝑟𝑠)𝐴.
(d) 𝑠(𝐴𝐶) = (𝑠𝐴)𝐶 = 𝐴(𝑠𝐶). Thus, we may write 𝑠𝐴𝐶 for any of these three quantities
unambiguously.
EXERCISE
Prove the previous Proposition.
Our previous results demonstrate that matrix operations behave like their scalar counterparts. However, there are significant differences. For instance, while it is true that 𝑎𝑏 = 𝑏𝑎
for all 𝑎, 𝑏 ∈ F, we have already seen an example of matrices 𝐴 and 𝐵 where 𝐴𝐵 ̸= 𝐵𝐴,
even when both 𝐴𝐵 and 𝐵𝐴 are defined. (See Example 4.2.6.)
Two other results that apply to scalars which students are tempted to apply to matrices
are the following:
1. If 𝑎𝑏 = 𝑎𝑐 and 𝑎 ̸= 0, then 𝑏 = 𝑐 (cancellation law).
2. 𝑎𝑏 = 0 if and only if 𝑎 = 0 or 𝑏 = 0.
These results do not extend to matrix arithmetic, in general.
]︂
]︂
[︂
]︂
[︂
]︂
[︂
34
25
11
01
and 𝐴 ̸= 𝒪, but
. Then 𝐴𝐵 = 𝐴𝐶 =
, and 𝐶 =
,𝐵=
68
34
34
02
𝐵 ̸= 𝐶. Thus, the cancellation law does not apply to matrices.
[︂
]︂
[︂
]︂
01
37
2. Let 𝐴 =
and 𝐵 =
. Then 𝐴𝐵 = 𝒪, but 𝐴 ̸= 𝒪 and 𝐵 ̸= 𝒪.
02
00
[︂
Example 4.3.12
1. Let 𝐴 =
In Section 4.1 we were introduced to the transpose 𝐴𝑇 of a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F). Recall
that (𝐴𝑇 )𝑖𝑗 = (𝐴)𝑗𝑖 . Our next Proposition describes the interplay between transpose and
the other operations on matrices.
Proposition 4.3.13
(Properties of Matrix Transpose)
If 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), 𝐶 ∈ 𝑀𝑛×𝑘 (F), and 𝑠 ∈ F, then
(a) (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇 .
(b) (𝑠𝐴)𝑇 = 𝑠𝐴𝑇 .
(c) (𝐴𝐶)𝑇 = 𝐶 𝑇 𝐴𝑇 .
(d) (𝐴𝑇 )𝑇 = 𝐴.
Section 4.4
Properties of Square Matrices
107
REMARK
Pay close attention to Property (c). The transpose of a product is the product of the
transposes in reverse!
EXERCISE
Prove the previous Proposition.
4.4
Properties of Square Matrices
We turn our attention to matrices that have the same number of rows as columns.
Definition 4.4.1
Square Matrix
A matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) where the number of rows is equal to the number of columns is
called a square matrix .
REMARK
For the case where 𝑚 = 𝑛, it is common to refer to 𝑀𝑛×𝑛 (F) as 𝑀𝑛 (F).
]︂
321
is not a square matrix, since it has 2 rows and 3 columns. The matrix
The matrix
111
]︂
[︂
−2 + 𝑖 3 − 2𝑖
is a square matrix, since it has 2 rows and 2 columns.
2 + 2𝑖 6 − 7𝑖
[︂
Example 4.4.2
Definition 4.4.3
Upper Triangular
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐴 is upper triangular if 𝑎𝑖𝑗 = 0 for 𝑖 > 𝑗 with 𝑖 = 1, . . . , 𝑛
and 𝑗 = 1, . . . , 𝑛.
⎤
⎡
⎤
3 −4 6
𝑖 4 − 𝑖 6 − 4𝑖
1 + 𝑖 −7
The matrices
, ⎣ 0 5 9 ⎦ , and ⎣ 0 1 + 𝑖 7𝑖 ⎦ are upper-triangular.
0 1−𝑖
0 0 7
0 0 2 − 4𝑖
[︂
Example 4.4.4
Definition 4.4.5
Lower Triangular
⎡
Let 𝐵 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐵 is lower triangular if 𝑏𝑖𝑗 = 0 for 𝑖 < 𝑗 with 𝑖 = 1, . . . , 𝑛
and 𝑗 = 1, . . . , 𝑛.
⎤
𝑖
0
0
0
300
⎢ 7𝑖 1 + 𝑖 0
𝑖 0
0 ⎥
⎥ are lower triangular.
The matrices
, ⎣ 3 4 0 ⎦ , and ⎢
⎣
4 1−𝑖
6 2 + 𝑖 2 − 4𝑖 0 ⎦
070
3 − 5𝑖 5
0 4 + 3𝑖
[︂
Example 4.4.6
]︂
]︂
⎡
⎤
⎡
108
Chapter 4
Matrices
REMARKS
• The transpose of an upper (lower) triangular matrix is a lower (upper) triangular
matrix, respectively.
• The product of upper (lower) triangular 𝑛 × 𝑛 matrices is upper (lower) triangular,
respectively.
Definition 4.4.7
Diagonal
We say that an 𝑛 × 𝑛 matrix 𝐴 is diagonal if 𝑎𝑖𝑗 = 0 for 𝑖 ̸= 𝑗 with 𝑖 = 1, . . . , 𝑛 and
𝑗 = 1, . . . , 𝑛. We refer to the entries 𝑎𝑖𝑖 as the diagonal entries of 𝐴, and denote our
matrix 𝐴 by 𝐴 = diag(𝑎11 , 𝑎22 , . . . , 𝑎𝑛𝑛 ).
⎤
𝑖 0
0
0
300
⎢0 1 + 𝑖 0
𝑖 0
0 ⎥
⎥ are diagonal.
The matrices
, ⎣ 0 4 0 ⎦ , and ⎢
⎣
0 1−𝑖
0 0 2 − 4𝑖 0 ⎦
000
0 0
0 4 + 3𝑖
[︂
Example 4.4.8
]︂
⎤
⎡
⎡
⎡
Example 4.4.9
⎤
1 0 0
If 𝐶 = diag(1, −2, 3), then 𝐶 = ⎣ 0 −2 0 ⎦.
0 0 3
REMARKS
• Perhaps the most simple example of a diagonal matrix is the 𝑛 × 𝑛 zero matrix 𝒪𝑛×𝑛 .
• All diagonal matrices are also upper triangular and lower triangular matrices.
Another simple (and important) example is the identity matrix, which we define as follows.
Definition 4.4.10
Identity Matrix
The diagonal matrix diag(1, 1, . . . , 1) is called the identity matrix, and is denoted by 𝐼.
If we wish to indicate that the size of the identity matrix is 𝑛 × 𝑛, we add a subscript 𝑛
and write 𝐼𝑛 .
⎡
⎤
1
100
⎢0
10
𝐼1 = [1], 𝐼2 =
, 𝐼3 = ⎣ 0 1 0 ⎦ , and 𝐼4 = ⎢
⎣0
01
001
0
[︂
Example 4.4.11
]︂
⎡
0
1
0
0
0
0
1
0
⎤
0
0⎥
⎥.
0⎦
1
Section 4.5
Elementary Matrices
109
REMARK
The identity matrix behaves as a multiplicative identity: for any 𝐴 ∈ 𝑀𝑚×𝑛 (F) we have
that
𝐼𝑚 𝐴 = 𝐴 and 𝐴𝐼𝑛 = 𝐴.
In particular, this also holds for vectors in F𝑛 , so 𝐼 #»
𝑥 = #»
𝑥 for all #»
𝑥 ∈ F𝑛 .
𝑛
When the identity matrix is multiplied by a scalar 𝑐 ∈ F, the result is a diagonal matrix
whose diagonal entries are all equal to 𝑐:
𝑐𝐼𝑛 = diag(𝑐, 𝑐, . . . , 𝑐).
[︂
Example 4.4.12
]︂
12
If 𝐴 =
, then 𝐴𝐼2 = 𝐼2 𝐴 = 𝐴.
34
]︂
[︂
123
, then 𝐵𝐼3 = 𝐼2 𝐵 = 𝐵.
If 𝐵 =
340
]︂
𝑐0
.
If 𝑐 ∈ F, then 𝑐𝐼2 =
0𝑐
[︂
Example 4.4.13
4.5
Elementary Matrices
In this section we will show that performing an elementary row operation (ERO) on a
matrix is equivalent to multiplying that matrix on the left by a square matrix of a certain
type.
Definition 4.5.1
Elementary
Matrices
A matrix that can be obtained by performing a single ERO on the identity matrix is called
an elementary matrix.
If we wish to be more precise, we refer to elementary matrices using the same names as we
have for EROs.
Example 4.5.2
The following matrices are examples of elementary matrices:
⎡
⎤
001
• 𝐸1 = ⎣ 0 1 0 ⎦ is a row swap elementary matrix, since performing 𝑅1 ↔ 𝑅3 on 𝐼3 will
100
produce 𝐸1 .
[︂
]︂
50
• 𝐸2 =
is a row scale elementary matrix, since performing 𝑅1 → 5𝑅1 on 𝐼2 will
01
produce 𝐸2 .
110
Chapter 4
⎡
1
• 𝐸3 = ⎣ 0
0
𝑅2 on 𝐼3
Matrices
⎤
0 0
1 −2 ⎦ is a row addition elementary matrix, since performing 𝑅2 → −2𝑅3 +
0 1
will produce 𝐸3 .
Elementary matrices are useful because they can be used to carry out EROs. The next two
results make this statement precise.
Proposition 4.5.3
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and suppose that a single ERO is performed on it to produce matrix 𝐵.
Suppose, also, that we perform the same ERO on the matrix 𝐼𝑚 to produce the elementary
matrix 𝐸. Then
𝐵 = 𝐸𝐴.
Proof: We will prove the result only for row swap elementary matrices and leave the rest
as an exercise. Let 𝐴 be an 𝑚 × 𝑛 matrix with 𝑚 > 1, and consider its transpose 𝐴𝑇 . The
»𝑇 , where 𝑟#» = row
# »(𝐴) is the 𝑖𝑡ℎ row of 𝐴. Thus
columns of 𝐴𝑇 are 𝑟#»1 𝑇 , . . . , 𝑟#𝑚
𝑖
𝑖
[︀
]︀
𝐴𝑇 = 𝑟#»𝑇 · · · 𝑟#»𝑇 · · · 𝑟#»𝑇 · · · 𝑟# »𝑇 .
1
𝑖
𝑗
𝑚
Interchanging rows 𝑖 and 𝑗 of 𝐴 yields the matrix 𝐵 whose transpose is
[︀
»𝑇 ]︀ .
𝐵 𝑇 = 𝑟#»1 𝑇 · · · 𝑟#»𝑗 𝑇 · · · 𝑟#»𝑖 𝑇 · · · 𝑟#𝑚
On the other hand, let
[︀
» ]︀
𝐸 = 𝑒#»1 · · · 𝑒#»𝑗 · · · 𝑒#»𝑖 · · · 𝑒#𝑚
be the elementary matrix that corresponds to the row swap 𝑅𝑖 ↔ 𝑅𝑗 . One can quickly
show that 𝐸 = 𝐸 𝑇 ; it then follows from Lemma 4.2.2 (Column Extraction) that
(𝐸𝐴)𝑇 = 𝐴𝑇 𝐸 𝑇
= 𝐴𝑇 𝐸
[︀
» ]︀
= 𝐴𝑇 𝑒#»1 · · · 𝑒#»𝑗 · · · 𝑒#»𝑖 · · · 𝑒#𝑚
[︀
» ]︀
= 𝐴𝑇 𝑒#»1 · · · 𝐴𝑇 𝑒#»𝑗 · · · 𝐴𝑇 𝑒#»𝑖 · · · 𝐴𝑇 𝑒#𝑚
[︀
»𝑇 ]︀
= 𝑟#»1 𝑇 · · · 𝑟#»𝑗 𝑇 · · · 𝑟#»𝑖 𝑇 · · · 𝑟#𝑚
= 𝐵𝑇
Since 𝐵 𝑇 = (𝐸𝐴)𝑇 , we see that 𝐵 = 𝐸𝐴 by Property (d) in Proposition 4.3.13 (Properties
of Matrix Transpose).
The previous Proposition tells us that we can perform an ERO on 𝐴 by multiplying 𝐴 on
the left by the appropriate elementary matrix 𝐸. We can also carry out a sequence of EROs
by multiplying 𝐴 on the left by a sequence of appropriate elementary matrices.
Corollary 4.5.4
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and suppose that a finite number of EROs, numbered 1 through 𝑘, are
performed on 𝐴 to produce a matrix 𝐵. Let 𝐸𝑖 denote the elementary matrix corresponding
to the 𝑖𝑡ℎ ERO (1 ≤ 𝑖 ≤ 𝑘) applied to 𝐼𝑚 . Then
𝐵 = 𝐸𝑘 . . . 𝐸2 𝐸1 𝐴.
Proof: Use the Principle of Mathematical Induction on 𝑘 and Proposition 4.5.3.
Pay attention to the order in which the elementary matrices appear in this product.
Example 4.5.5
Let us revisit Example 3.7.1 in Section 3.7. The augmented matrix, 𝐵, was given by
𝐵 = [1 −2 −1 3 | 1; 2 −4 1 0 | 5; 1 −2 2 −3 | 4].
We perform the following EROs:
𝑅2 → −2𝑅1 + 𝑅2 and 𝑅3 → −𝑅1 + 𝑅3 give [1 −2 −1 3 | 1; 0 0 3 −6 | 3; 0 0 3 −6 | 3] = 𝐶,
𝑅2 → (1/3)𝑅2 gives [1 −2 −1 3 | 1; 0 0 1 −2 | 1; 0 0 3 −6 | 3] = 𝐷,
𝑅3 → −3𝑅2 + 𝑅3 gives [1 −2 −1 3 | 1; 0 0 1 −2 | 1; 0 0 0 0 | 0] = 𝐹.
For each of the four EROs, we give the corresponding elementary matrix:
𝑅2 → −2𝑅1 + 𝑅2 gives 𝐸1 = [1 0 0; −2 1 0; 0 0 1],
𝑅3 → −𝑅1 + 𝑅3 gives 𝐸2 = [1 0 0; 0 1 0; −1 0 1].
Notice that 𝐶 = 𝐸2𝐸1𝐵. Further,
𝑅2 → (1/3)𝑅2 gives 𝐸3 = [1 0 0; 0 1/3 0; 0 0 1],
𝑅3 → −3𝑅2 + 𝑅3 gives 𝐸4 = [1 0 0; 0 1 0; 0 −3 1].
Notice that 𝐷 = 𝐸3𝐶 and 𝐹 = 𝐸4𝐷.
We can now verify our result by evaluating the product 𝐸4 𝐸3 𝐸2 𝐸1 𝐵 = 𝐸4 (𝐸3 (𝐸2 (𝐸1 𝐵))):
𝐸4𝐸3𝐸2(𝐸1𝐵) = 𝐸4𝐸3𝐸2 [1 −2 −1 3 | 1; 0 0 3 −6 | 3; 1 −2 2 −3 | 4]
            = 𝐸4𝐸3 [1 −2 −1 3 | 1; 0 0 3 −6 | 3; 0 0 3 −6 | 3]
            = 𝐸4 [1 −2 −1 3 | 1; 0 0 1 −2 | 1; 0 0 3 −6 | 3]
            = [1 −2 −1 3 | 1; 0 0 1 −2 | 1; 0 0 0 0 | 0] = 𝐹.
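As a quick computational cross-check of this example, the sketch below (our own illustration, not part of the notes, and assuming the NumPy library is available) builds the four elementary matrices by applying the corresponding EROs to 𝐼3 and confirms that 𝐸4𝐸3𝐸2𝐸1𝐵 reproduces 𝐹.

import numpy as np

# Augmented matrix B from Example 4.5.5.
B = np.array([[1, -2, -1,  3, 1],
              [2, -4,  1,  0, 5],
              [1, -2,  2, -3, 4]], dtype=float)

# Each elementary matrix is obtained by performing a single ERO on I_3.
E1 = np.eye(3); E1[1, 0] = -2      # R2 -> -2R1 + R2
E2 = np.eye(3); E2[2, 0] = -1      # R3 -> -R1 + R3
E3 = np.eye(3); E3[1, 1] = 1/3     # R2 -> (1/3)R2
E4 = np.eye(3); E4[2, 1] = -3      # R3 -> -3R2 + R3

F = E4 @ E3 @ E2 @ E1 @ B          # multiply on the left, in this order
print(F)
# [[ 1. -2. -1.  3.  1.]
#  [ 0.  0.  1. -2.  1.]
#  [ 0.  0.  0.  0.  0.]]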
4.6
Matrix Inverse
Let 𝐴 be an 𝑚 × 𝑛 matrix. In this section, we aim to explore the following two questions:
• Does there exist an 𝑛 × 𝑚 matrix 𝐵 such that, for all 𝑥⃗ ∈ F𝑚, 𝐴𝐵𝑥⃗ = 𝑥⃗?
• Does there exist an 𝑛 × 𝑚 matrix 𝐶 such that, for all 𝑥⃗ ∈ F𝑛, 𝐶𝐴𝑥⃗ = 𝑥⃗?
In the first case, you can think of the matrix 𝐴 as the one that “cancels out” the action
of 𝐵. In the second case, you can think of the matrix 𝐶 as the one that “cancels out” the
action of 𝐴.
In view of Theorem 4.2.3 (Equality of Matrices), let us reformulate our two motivating
questions as follows:
• Given an 𝑚 × 𝑛 matrix 𝐴, does there exist an 𝑛 × 𝑚 matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑚 ?
• Given an 𝑚 × 𝑛 matrix 𝐴, does there exist an 𝑛 × 𝑚 matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛 ?
If the matrices 𝐵 and 𝐶 exist, we refer to them as a right inverse and a left inverse of
𝐴, respectively. Notice that if 𝐵 and 𝐶 exist and all of 𝐴, 𝐵 and 𝐶 are square, then
𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 . This case is the most interesting to us, and it motivates the following
definition.
Definition 4.6.1
Invertible Matrix
We say that an 𝑛 × 𝑛 matrix 𝐴 is invertible if there exist 𝑛 × 𝑛 matrices 𝐵 and 𝐶 such
that 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 .
The following theorem demonstrates that if an 𝑛 × 𝑛 matrix 𝐴 is invertible, then its left
and right inverses are equal.
Proposition 4.6.2
(Equality of Left and Right Inverses)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). If there exist matrices 𝐵 and 𝐶 in 𝑀𝑛×𝑛 (F) such that 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 ,
then 𝐵 = 𝐶.
Proof: We have 𝐵 = 𝐼𝑛 𝐵 = (𝐶𝐴)𝐵 = 𝐶(𝐴𝐵) = 𝐶𝐼𝑛 = 𝐶.
So far, we have learned that if both left and right inverses exist, then they must be equal.
The next result guarantees that for square matrices the left inverse exists if and only if the
right inverse exists.
Theorem 4.6.3
(Left Invertible Iff Right Invertible)
For 𝐴 ∈ 𝑀𝑛×𝑛 (F), there exists an 𝑛 × 𝑛 matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 if and only if there
exists an 𝑛 × 𝑛 matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛 .
Proof: Let 𝐴 be an 𝑛 × 𝑛 matrix.
To prove the forward direction, suppose that there exists an 𝑛 × 𝑛 matrix 𝐵 such that
𝐴𝐵 = 𝐼𝑛 . Consider the homogeneous system
𝐵𝑥⃗ = 0⃗.
Multiplying both sides of this equation by 𝐴 on the left, we have that
𝐴𝐵𝑥⃗ = 𝐴0⃗  ⟹  𝐼𝑛𝑥⃗ = 0⃗  ⟹  𝑥⃗ = 0⃗.
Thus, the homogeneous system 𝐵𝑥⃗ = 0⃗ has the unique solution 𝑥⃗ = 0⃗. Therefore, its solution set has no parameters, and so it follows from Part (a) of Theorem 3.6.7 (System Rank Theorem) that rank(𝐵) = 𝑛. It then follows from Part (b) of Theorem 3.6.7 (System Rank Theorem) that the system 𝐵𝑥⃗ = 𝑏⃗ has a solution for every 𝑏⃗ ∈ F𝑛. Thus, for any 𝑏⃗ ∈ F𝑛, there exists 𝑥⃗ ∈ F𝑛 such that 𝑏⃗ = 𝐵𝑥⃗. Consequently,
(𝐵𝐴)𝑏⃗ = 𝐵𝐴(𝐵𝑥⃗) = 𝐵(𝐴𝐵)𝑥⃗ = 𝐵𝐼𝑛𝑥⃗ = 𝐵𝑥⃗ = 𝑏⃗ = 𝐼𝑛𝑏⃗.
Since the identity (𝐵𝐴)𝑏⃗ = 𝐼𝑛𝑏⃗ holds for all 𝑏⃗ ∈ F𝑛, it follows from Theorem 4.2.3 (Equality of Matrices) that 𝐵𝐴 = 𝐼𝑛. Taking 𝐶 = 𝐵, the result follows.
To prove the backward direction, suppose that there exists an 𝑛 × 𝑛 matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛. Then
𝐼𝑛 = 𝐼𝑛ᵀ = (𝐶𝐴)ᵀ = 𝐴ᵀ𝐶ᵀ.
Thus, 𝐶ᵀ is a right inverse of 𝐴ᵀ, and so it follows from the proof of the forward direction that 𝐶ᵀ is also a left inverse of 𝐴ᵀ, that is,
𝐶ᵀ𝐴ᵀ = 𝐼𝑛.
But then
𝐼𝑛 = 𝐼𝑛ᵀ = (𝐶ᵀ𝐴ᵀ)ᵀ = (𝐴ᵀ)ᵀ(𝐶ᵀ)ᵀ = 𝐴𝐶.
Taking 𝐵 = 𝐶, the result follows.
Notice that if a matrix 𝐴 is invertible, then there exists a unique matrix 𝐵 such that
𝐴𝐵 = 𝐼𝑛 . Indeed, suppose that 𝐴𝐵1 = 𝐼𝑛 and 𝐴𝐵2 = 𝐼𝑛 . Since 𝐴𝐵1 = 𝐴𝐵2 , we see that
𝐵1 𝐴𝐵1 = 𝐵1 𝐴𝐵2 . By Theorem 4.6.3 (Left Invertible Iff Right Invertible), we know that
𝐵1 𝐴 = 𝐼𝑛 , so 𝐵1 = 𝐵2 . In view of this, consider the following definition.
Definition 4.6.4
Inverse of a Matrix
If an 𝑛 × 𝑛 matrix 𝐴 is invertible, we refer to the matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 as the
inverse of 𝐴. We denote the inverse of 𝐴 by 𝐴−1 . The inverse of 𝐴 satisfies
𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼𝑛 .
REMARK
The above results tell us that, in order to verify that the matrix 𝐵 is the inverse of 𝐴, it is
sufficient to verify that 𝐴𝐵 = 𝐼𝑛 . We do not need to also verify that 𝐵𝐴 = 𝐼𝑛 .
Example 4.6.5
The matrix 𝐵 = [1+𝑖 𝑖; 1 𝑖] is the inverse of 𝐴 = [−𝑖 𝑖; 1 −1−𝑖], because
𝐴𝐵 = [−𝑖 𝑖; 1 −1−𝑖][1+𝑖 𝑖; 1 𝑖] = [1 0; 0 1].
Example 4.6.6
The matrix 𝐵 = [1 1; 1 2] is not the inverse of 𝐴 = [2 −1; −2 2], because
𝐴𝐵 = [2 −1; −2 2][1 1; 1 2] = [1 0; 0 2] ≠ [1 0; 0 1].
While there exist numerous criteria for the invertibility of a matrix, for now we will outline
only three of them.
Theorem 4.6.7
(Invertibility Criteria – First Version)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). The following three conditions are equivalent:
(a) 𝐴 is invertible.
(b) rank(𝐴) = 𝑛.
(c) RREF(𝐴) = 𝐼𝑛 .
Proof: ((𝑎) ⇒ (𝑏)):
Suppose that 𝐴 is invertible. Then there exists a matrix 𝐵 such that 𝐵𝐴 = 𝐼𝑛 . We can
show that rank(𝐴) = 𝑛 as in the proof of Theorem 4.6.3 (Left Invertible Iff Right Invertible).
((𝑏) ⇒ (𝑐)):
Suppose that rank(𝐴) = 𝑛. Then every row and every column of the 𝑛×𝑛 matrix RREF(𝐴)
has a pivot. The only matrix that can satisfy this condition is the 𝑛 × 𝑛 identity matrix 𝐼𝑛 .
((𝑐) ⇒ (𝑎)):
Suppose that RREF(𝐴) = 𝐼𝑛. Then the rank of 𝐴 is equal to the number of rows of 𝐴, which means that the system 𝐴𝑥⃗ = 𝑏⃗ is consistent for every 𝑏⃗ ∈ F𝑛. Thus, we let 𝑏⃗𝑖 be a solution to 𝐴𝑥⃗ = 𝑒⃗𝑖 for 𝑖 = 1, ..., 𝑛. If we now define 𝐵 = [𝑏⃗1 ··· 𝑏⃗𝑛], then by construction
𝐴𝐵 = 𝐴[𝑏⃗1 ··· 𝑏⃗𝑛] = [𝐴𝑏⃗1 ··· 𝐴𝑏⃗𝑛] = [𝑒⃗1 ··· 𝑒⃗𝑛] = 𝐼𝑛.
From Theorem 4.6.3 (Left Invertible Iff Right Invertible) it follows that there exists a matrix
𝐶 such that 𝐶𝐴 = 𝐼𝑛 . Since 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 , we conclude that 𝐴 is invertible.
Let us look closely at the last part of the proof, as it describes an algorithm for finding the inverse of an invertible matrix 𝐴. Indeed, if we are able to solve each of the equations
𝐴𝑏⃗1 = 𝑒⃗1,  𝐴𝑏⃗2 = 𝑒⃗2,  ...,  𝐴𝑏⃗𝑛 = 𝑒⃗𝑛
for 𝑏⃗1, ..., 𝑏⃗𝑛 ∈ F𝑛, then 𝐴⁻¹ = [𝑏⃗1 ··· 𝑏⃗𝑛]. Let 𝑅 = RREF(𝐴), and suppose that a sequence of EROs 𝑟1, ..., 𝑟𝑘 is used to row reduce the matrix 𝐴 to 𝑅. Convince yourself that the same sequence of EROs can be used to row reduce any system [𝐴 | 𝑒⃗𝑖] to its RREF, [𝑅 | 𝑏⃗𝑖], for some vector 𝑏⃗𝑖. Consequently, the same sequence of EROs can be used to row reduce the super-augmented matrix
[𝐴 | 𝐼𝑛] = [𝐴  𝑒⃗1 ··· 𝑒⃗𝑛]
into its RREF
[𝑅 | 𝐵] = [𝑅  𝑏⃗1 ··· 𝑏⃗𝑛]
for some 𝑛 × 𝑛 matrix 𝐵 = [𝑏⃗1 ··· 𝑏⃗𝑛]. Notice how, by performing row reduction on a super-augmented matrix, we are solving 𝑛 distinct systems of linear equations simultaneously. Of course, this is possible because the systems 𝐴𝑏⃗𝑖 = 𝑒⃗𝑖 have equal coefficient matrices. By Theorem 3.6.7 (System Rank Theorem), we have that
• If rank(𝐴) ≠ 𝑛 (which is equivalent to 𝑅 ≠ 𝐼𝑛), there exists an index 𝑖 such that the system 𝐴𝑏⃗𝑖 = 𝑒⃗𝑖 is inconsistent (by Proposition 3.10.2). Consequently, it is impossible to find a matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛, which means that 𝐴 is not invertible.
• If rank(𝐴) = 𝑛 (which is equivalent to 𝑅 = 𝐼𝑛), then each system 𝐴𝑏⃗𝑖 = 𝑒⃗𝑖 is consistent, and so the matrix 𝐵 is the inverse of 𝐴.
We summarize our observations in the following proposition.
Proposition 4.6.8
(Algorithm for Checking Invertibility and Finding the Inverse)
The following algorithm allows you to determine whether an 𝑛 × 𝑛 matrix 𝐴 is invertible,
and if it is, the algorithm will provide the inverse of 𝐴.
1. Construct a super-augmented matrix [𝐴 | 𝐼𝑛 ].
2. Find the RREF, [𝑅 | 𝐵], of [𝐴 | 𝐼𝑛 ].
3. If 𝑅 ̸= 𝐼𝑛 , conclude that 𝐴 is not invertible. If 𝑅 = 𝐼𝑛 , conclude that 𝐴 is invertible,
and that 𝐴−1 = 𝐵.
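The algorithm in Proposition 4.6.8 translates directly into code. The following is a minimal sketch (our own illustration, not part of the notes; it assumes NumPy and adds partial pivoting for numerical stability): it row reduces the super-augmented matrix [𝐴 | 𝐼𝑛] and returns either the inverse or None when 𝐴 is not invertible.

import numpy as np

def inverse_via_rref(A, tol=1e-12):
    """Row reduce [A | I_n]; return A^{-1} if RREF(A) = I_n, else None."""
    A = np.asarray(A, dtype=complex)          # works over R or C
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])             # the super-augmented matrix [A | I_n]
    for col in range(n):
        pivot = np.argmax(np.abs(M[col:, col])) + col   # best available pivot row
        if abs(M[pivot, col]) < tol:
            return None                        # rank(A) < n, so A is not invertible
        M[[col, pivot]] = M[[pivot, col]]      # row swap
        M[col] = M[col] / M[col, col]          # row scale so the pivot is 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col] # row addition to clear the column
    return M[:, n:]                            # the right block is now A^{-1}

A = np.array([[1, 2], [3, 4]])
print(inverse_via_rref(A))   # [[-2, 1], [1.5, -0.5]] (shown as complex numbers), as in Example 4.6.9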
Example 4.6.9
Determine whether the matrix
𝐴 = [1 2; 3 4]
is invertible. If it is invertible, find its inverse.
Solution: In order to find the inverse, we need to solve two systems of equations with augmented matrices [𝐴 | 𝑒⃗1] and [𝐴 | 𝑒⃗2]. Instead of solving them separately, we will solve them simultaneously by forming a super-augmented matrix
[𝐴 | 𝐼2] = [𝐴 𝑒⃗1 𝑒⃗2] = [1 2 | 1 0; 3 4 | 0 1]
and row reducing it. We have
𝑅2 → 𝑅2 − 3𝑅1 gives [1 2 | 1 0; 0 −2 | −3 1].
Notice that the matrix that we've obtained is in REF. Since it has two pivots, we have that rank(𝐴) = 2, so it must be the case that 𝐴 is invertible. We continue reducing to RREF.
𝑅2 → −(1/2)𝑅2 gives [1 2 | 1 0; 0 1 | 3/2 −1/2],
𝑅1 → 𝑅1 − 2𝑅2 gives [1 0 | −2 1; 0 1 | 3/2 −1/2].
Thus, we conclude that the inverse of 𝐴 is
𝐴⁻¹ = [−2 1; 3/2 −1/2] = −(1/2)[4 −2; −3 1].
Let us verify our calculations:
𝐴⁻¹𝐴 = −(1/2)[4 −2; −3 1][1 2; 3 4] = −(1/2)[4·1 + (−2)·3  4·2 + (−2)·4; (−3)·1 + 1·3  (−3)·2 + 1·4] = −(1/2)[−2 0; 0 −2] = [1 0; 0 1].
Example 4.6.10
Determine whether the matrix
𝐵 = [1 2 3; 4 5 6; 7 8 9]
is invertible. If it is invertible, find its inverse.
Solution: We consider the super-augmented matrix [𝐵 | 𝐼3 ] and reduce it to a matrix in
REF.
𝑅2 → 𝑅2 − 4𝑅1 and 𝑅3 → 𝑅3 − 7𝑅1 give [1 2 3 | 1 0 0; 0 −3 −6 | −4 1 0; 0 −6 −12 | −7 0 1],
𝑅2 → −(1/3)𝑅2 gives [1 2 3 | 1 0 0; 0 1 2 | 4/3 −1/3 0; 0 −6 −12 | −7 0 1],
𝑅3 → 𝑅3 + 6𝑅2 gives [1 2 3 | 1 0 0; 0 1 2 | 4/3 −1/3 0; 0 0 0 | 1 −2 1].
At this point, we have that rank(𝐵) = 2, and so it is not invertible.
Example 4.6.11
Determine whether the matrix
𝐶 = [1 2 3; 4 5 6; 7 8 10]
is invertible. If it is invertible, find its inverse.
Solution: We consider the super-augmented matrix [𝐶 | 𝐼3 ] and row reduce it.
𝑅2 → 𝑅2 − 4𝑅1 and 𝑅3 → 𝑅3 − 7𝑅1 give [1 2 3 | 1 0 0; 0 −3 −6 | −4 1 0; 0 −6 −11 | −7 0 1],
𝑅2 → −(1/3)𝑅2 gives [1 2 3 | 1 0 0; 0 1 2 | 4/3 −1/3 0; 0 −6 −11 | −7 0 1],
𝑅3 → 𝑅3 + 6𝑅2 gives [1 2 3 | 1 0 0; 0 1 2 | 4/3 −1/3 0; 0 0 1 | 1 −2 1].
At this point, we have that rank(𝐶) = 3, and so it is invertible. We thus continue row reducing to RREF.
𝑅1 → 𝑅1 − 3𝑅3 and 𝑅2 → 𝑅2 − 2𝑅3 give [1 2 0 | −2 6 −3; 0 1 0 | −2/3 11/3 −2; 0 0 1 | 1 −2 1],
𝑅1 → 𝑅1 − 2𝑅2 gives [1 0 0 | −2/3 −4/3 1; 0 1 0 | −2/3 11/3 −2; 0 0 1 | 1 −2 1].
We conclude that the inverse of 𝐶 is
𝐶⁻¹ = [−2/3 −4/3 1; −2/3 11/3 −2; 1 −2 1] = (1/3)[−2 −4 3; −2 11 −6; 3 −6 3].
Let us verify our calculations.
[1 2 3; 4 5 6; 7 8 10] · (1/3)[−2 −4 3; −2 11 −6; 3 −6 3] = (1/3)[3 0 0; 0 3 0; 0 0 3] = 𝐼3.
Example 4.6.12
Determine whether the matrix
𝐷 = [−1−2𝑖 −1−𝑖; 1−𝑖 −𝑖]
is invertible. If it is invertible, find its inverse.
Solution: We consider the super-augmented matrix [𝐷 | 𝐼2] and row reduce it.
[𝐷 | 𝐼2] = [−1−2𝑖 −1−𝑖 | 1 0; 1−𝑖 −𝑖 | 0 1]
𝑅1 → (−1+2𝑖)𝑅1 and 𝑅2 → (1+𝑖)𝑅2 give [5 3−𝑖 | −1+2𝑖 0; 2 1−𝑖 | 0 1+𝑖],
𝑅2 → −2𝑅1 + 5𝑅2 gives [5 3−𝑖 | −1+2𝑖 0; 0 −1−3𝑖 | 2−4𝑖 5+5𝑖].
At this point, we have that rank(𝐷) = 2 and so 𝐷 is invertible. We thus continue row reducing to RREF.
𝑅2 → 𝑖𝑅2 gives [5 3−𝑖 | −1+2𝑖 0; 0 3−𝑖 | 4+2𝑖 −5+5𝑖],
𝑅1 → 𝑅1 − 𝑅2 gives [5 0 | −5 5−5𝑖; 0 3−𝑖 | 4+2𝑖 −5+5𝑖].
As 1/(3−𝑖) = (3+𝑖)/(9+1) = (3+𝑖)/10, we complete our reduction with
𝑅1 → (1/5)𝑅1 and 𝑅2 → (1/10)(3+𝑖)𝑅2, which give [1 0 | −1 1−𝑖; 0 1 | 1+𝑖 −2+𝑖].
Therefore, we conclude that
𝐷⁻¹ = [−1 1−𝑖; 1+𝑖 −2+𝑖].
Let us verify our calculations:
[−1 1−𝑖; 1+𝑖 −2+𝑖][−1−2𝑖 −1−𝑖; 1−𝑖 −𝑖]
= [1+2𝑖 + (1−𝑖)²   1+𝑖 − 𝑖(1−𝑖);  (1+𝑖)(−1−2𝑖) + (−2+𝑖)(1−𝑖)   −(1+𝑖)² − 𝑖(−2+𝑖)]
= [1 0; 0 1].
The inverse of a 2 × 2 matrix may be computed as follows.
Proposition 4.6.13
(Inverse of a 2 × 2 Matrix)
Let 𝐴 = [𝑎 𝑏; 𝑐 𝑑]. Then 𝐴 is invertible if and only if 𝑎𝑑 − 𝑏𝑐 ≠ 0. Furthermore, if 𝑎𝑑 − 𝑏𝑐 ≠ 0, then
𝐴⁻¹ = (1/(𝑎𝑑 − 𝑏𝑐)) [𝑑 −𝑏; −𝑐 𝑎].
Proof: If 𝑎𝑑 − 𝑏𝑐 ≠ 0, then we may define a matrix 𝐵 ∈ 𝑀2×2(F) by
𝐵 = (1/(𝑎𝑑 − 𝑏𝑐)) [𝑑 −𝑏; −𝑐 𝑎].
A straightforward computation shows that 𝐴𝐵 = 𝐵𝐴 = 𝐼2 . Thus, 𝐴 is invertible in this
case and moreover 𝐵 = 𝐴−1 , as claimed.
Conversely, assume that 𝑎𝑑 − 𝑏𝑐 = 0. We will show in this case that 𝐴 is not invertible by
proving that rank(𝐴) ̸= 2. We will have to consider two cases: 𝑑 = 0 and 𝑑 ̸= 0. If 𝑑 = 0
then
𝐴 = [𝑎 𝑏; 𝑐 0]
and 𝑏𝑐 = 𝑎𝑑 = 0 (since 𝑎𝑑 − 𝑏𝑐 = 0 by assumption). So, one of 𝑏 or 𝑐 must be 0. If 𝑐 = 0
then 𝐴 has a row of zeros, so it can have at most one pivot, hence rank(𝐴) ̸= 2, and 𝐴 is
not invertible. A similar argument can be applied if 𝑏 = 0: in that case, there can be at
most one pivot (in the first column).
Next, if 𝑑 ̸= 0, we can perform the following EROs to 𝐴:
𝑅1 → 𝑑𝑅1 gives [𝑎𝑑 𝑏𝑑; 𝑐 𝑑],
𝑅1 → −𝑏𝑅2 + 𝑅1 gives [𝑎𝑑 − 𝑏𝑐 0; 𝑐 𝑑].
Since 𝑎𝑑 − 𝑏𝑐 = 0, we have thus row reduced 𝐴 to a matrix with a row of zeros. Hence, as
above, 𝐴 can have at most one pivot, and therefore, it cannot be invertible.
This completes the analysis of all cases.
REMARK
If 𝐴 = [𝑎 𝑏; 𝑐 𝑑], then the number 𝑎𝑑 − 𝑏𝑐 is called the determinant of 𝐴. We will discuss it in more detail in Chapter 6.
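As a small illustration (our own sketch, not part of the notes, assuming NumPy), the 2 × 2 formula above can be coded directly: compute 𝑎𝑑 − 𝑏𝑐, and if it is non-zero, swap the diagonal entries, negate the off-diagonal entries, and divide by the determinant.

import numpy as np

def inverse_2x2(A):
    """Return the inverse of a 2x2 matrix via Proposition 4.6.13, or None."""
    a, b = A[0, 0], A[0, 1]
    c, d = A[1, 0], A[1, 1]
    det = a * d - b * c
    if det == 0:                       # exact test; fine for exact entries
        return None                    # A is not invertible
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1, 2], [3, 4]])
print(inverse_2x2(A))                  # [[-2.   1. ] [ 1.5 -0.5]]

C = np.array([[-1 - 2j, -1 - 1j], [1 - 1j, -1j]])
print(inverse_2x2(C))                  # [[-1.+0.j  1.-1.j] [ 1.+1.j -2.+1.j]], as in Example 4.6.14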
Example 4.6.14
Find the inverses of the following matrices, if they exist.
𝐴 = [1 2; 3 4],   𝐵 = [1 2; 3 6],   𝐶 = [−1−2𝑖 −1−𝑖; 1−𝑖 −𝑖]
Solution: The determinant of 𝐴 is 1(4) − 2(3) = −2, and so 𝐴 is invertible with
𝐴⁻¹ = −(1/2)[4 −2; −3 1].
The determinant of 𝐵 is 1(6) − 2(3) = 0, and so 𝐵 is not invertible.
The determinant of 𝐶 is
(−1 − 2𝑖)(−𝑖) − (−1 − 𝑖)(1 − 𝑖) = −2 + 𝑖 + 2 = 𝑖,
and so 𝐶 is invertible. As 1/𝑖 = −𝑖, we have
𝐶⁻¹ = −𝑖[−𝑖 1+𝑖; −1+𝑖 −1−2𝑖] = [−1 1−𝑖; 1+𝑖 −2+𝑖].
Chapter 5
Linear Transformations
5.1
The Function Determined by a Matrix
Up to this point in the course, we have considered matrices as arrays of numbers which
contain important information about systems of linear equations through either the coefficient matrix, the augmented matrix, or both. Moving forward, we will discover that there
is much more going on than this. We will make use of multiplication by a matrix to define
functions.
Definition 5.1.1
Function Determined by a Matrix
Let 𝐴 ∈ 𝑀𝑚×𝑛(F). The function determined by the matrix 𝐴 is the function
𝑇𝐴 : F𝑛 → F𝑚
defined by
𝑇𝐴(𝑥⃗) = 𝐴𝑥⃗.
Example 5.1.2
Let 𝐴 = [1 4; −2 −5; 4 6]. If 𝑥⃗ ∈ R², then
𝑇𝐴(𝑥⃗) = [1 4; −2 −5; 4 6][𝑥1; 𝑥2] = [𝑥1 + 4𝑥2; −2𝑥1 − 5𝑥2; 4𝑥1 + 6𝑥2].
Note that 𝑇𝐴(𝑥⃗) ∈ R³.
For instance, if 𝑥⃗ = [−2; 3], then 𝑇𝐴(𝑥⃗) = 𝑇𝐴([−2; 3]) = [10; −11; 10].
The function 𝑇𝐴 takes as input vectors in R² and outputs vectors in R³.
Example 5.1.3
Let 𝐴 = [𝑖 1+2𝑖 3+2𝑖; 2−𝑖 4 2−5𝑖]. If 𝑧⃗ ∈ C³, then
𝑇𝐴(𝑧⃗) = [𝑖 1+2𝑖 3+2𝑖; 2−𝑖 4 2−5𝑖][𝑧1; 𝑧2; 𝑧3] = [𝑖𝑧1 + (1+2𝑖)𝑧2 + (3+2𝑖)𝑧3; (2−𝑖)𝑧1 + 4𝑧2 + (2−5𝑖)𝑧3].
Note that 𝑇𝐴(𝑧⃗) ∈ C².
For instance, if 𝑧⃗ = [3; 2−𝑖; 2𝑖], then 𝑇𝐴(𝑧⃗) = 𝑇𝐴([3; 2−𝑖; 2𝑖]) = [12𝑖; 24 − 3𝑖].
The function 𝑇𝐴 takes as input vectors in C³ and outputs vectors in C².
The function 𝑇𝐴 determined by a matrix 𝐴 ∈ 𝑀𝑚×𝑛(F) has some rather special features. In Theorem 3.9.9 we saw that matrix–vector multiplication is linear: that is, for any 𝑥⃗, 𝑦⃗ ∈ F𝑛 and any 𝑐 ∈ F, the following two properties hold:
𝐴(𝑥⃗ + 𝑦⃗) = 𝐴𝑥⃗ + 𝐴𝑦⃗
𝐴(𝑐𝑥⃗) = 𝑐𝐴𝑥⃗
In view of these properties, the following result holds.
Theorem 5.1.4
(Function Determined by a Matrix is Linear)
Let 𝐴 ∈ 𝑀𝑚×𝑛(F) and let 𝑇𝐴 be the function determined by the matrix 𝐴. Then 𝑇𝐴 is linear; that is, for any 𝑥⃗, 𝑦⃗ ∈ F𝑛 and any 𝑐 ∈ F, the following two properties hold:
(a) 𝑇𝐴(𝑥⃗ + 𝑦⃗) = 𝑇𝐴(𝑥⃗) + 𝑇𝐴(𝑦⃗)
(b) 𝑇𝐴(𝑐𝑥⃗) = 𝑐𝑇𝐴(𝑥⃗)
Proof: We have
𝑇𝐴(𝑥⃗ + 𝑦⃗) = 𝐴(𝑥⃗ + 𝑦⃗)          (by definition)
          = 𝐴𝑥⃗ + 𝐴𝑦⃗           (by Theorem 3.9.9)
          = 𝑇𝐴(𝑥⃗) + 𝑇𝐴(𝑦⃗)     (by definition)
and
𝑇𝐴(𝑐𝑥⃗) = 𝐴(𝑐𝑥⃗)               (by definition)
        = 𝑐𝐴𝑥⃗                 (by Theorem 3.9.9)
        = 𝑐𝑇𝐴(𝑥⃗)              (by definition).

5.2  Linear Transformations
The properties exhibited by 𝑇𝐴 in Theorem 5.1.4 (Function Determined by a Matrix is
Linear) turn out to be of such importance in linear algebra that we give a name to the class
of functions that exhibit these properties.
Definition 5.2.1
Linear Transformation
We say that the function 𝑇 : F𝑛 → F𝑚 is a linear transformation (or linear mapping) if, for any 𝑥⃗, 𝑦⃗ ∈ F𝑛 and any 𝑐 ∈ F, the following two properties hold:
1. 𝑇(𝑥⃗ + 𝑦⃗) = 𝑇(𝑥⃗) + 𝑇(𝑦⃗) (called linearity over addition).
2. 𝑇(𝑐𝑥⃗) = 𝑐𝑇(𝑥⃗) (called linearity over scalar multiplication).
We refer to F𝑛 here as the domain of 𝑇 and F𝑚 as the codomain of 𝑇, as we would for any function.
REMARKS
• In linear algebra, the words transformation or mapping are often used instead of
the word function.
• In high school, you studied a variety of non-linear functions, such as 𝑓(𝑥) = 𝑥² + 1 or 𝑓(𝑥) = 2ˣ. These functions usually had a simple domain and codomain, such as R.
In linear algebra, the domains and codomains are more complicated (such as R𝑛 or
C𝑛 ), but the functions themselves are rather simple — they are linear. If you go on
to take a vector calculus course, you may see more complicated functions and more
complicated domains and codomains occurring together.
REMARK
A fundamental example of a linear transformation is the function 𝑇𝐴 determined by a matrix
𝐴. The linearity of 𝑇𝐴 was established in Theorem 5.1.4 (Function Determined by a Matrix
is Linear).
From this point forward we shall refer to 𝑇𝐴 as the linear transformation determined by 𝐴,
instead of the function determined by 𝐴.
We now consider some features of linear transformations. We begin with the result below
which tells us that, if we want, we can check linearity over addition and linearity over
scalar multiplication simultaneously, instead of checking them separately. We will give a
few examples after the following two results.
Proposition 5.2.2
(Alternate Characterization of a Linear Transformation)
Let 𝑇 : F𝑛 → F𝑚 be a function. Then 𝑇 is a linear transformation if and only if for any 𝑥⃗, 𝑦⃗ ∈ F𝑛 and any 𝑐 ∈ F,
𝑇(𝑐𝑥⃗ + 𝑦⃗) = 𝑐𝑇(𝑥⃗) + 𝑇(𝑦⃗).
Proof: We begin with the forward direction. Suppose that 𝑇 is a linear transformation. Let 𝑥⃗, 𝑦⃗ ∈ F𝑛 and 𝑐 ∈ F. Put 𝑤⃗ = 𝑐𝑥⃗. Since 𝑇 is linear, we have
𝑇(𝑤⃗ + 𝑦⃗) = 𝑇(𝑤⃗) + 𝑇(𝑦⃗)
and
𝑇(𝑤⃗) = 𝑇(𝑐𝑥⃗) = 𝑐𝑇(𝑥⃗).
Thus, we have
𝑇(𝑐𝑥⃗ + 𝑦⃗) = 𝑇(𝑤⃗ + 𝑦⃗) = 𝑇(𝑤⃗) + 𝑇(𝑦⃗) = 𝑐𝑇(𝑥⃗) + 𝑇(𝑦⃗).
Now for the backward direction. Suppose that for all 𝑥⃗, 𝑦⃗ ∈ F𝑛 and for all 𝑐 ∈ F,
𝑇(𝑐𝑥⃗ + 𝑦⃗) = 𝑐𝑇(𝑥⃗) + 𝑇(𝑦⃗).
Taking 𝑐 = 1, we find that for all 𝑥⃗, 𝑦⃗ ∈ F𝑛,
𝑇(𝑥⃗ + 𝑦⃗) = 𝑇(𝑥⃗) + 𝑇(𝑦⃗).
Taking 𝑦⃗ = 0⃗ (and noting that 𝑇(0⃗) = 𝑇(1 · 0⃗ + 0⃗) = 𝑇(0⃗) + 𝑇(0⃗), so 𝑇(0⃗) = 0⃗), we find that for all 𝑥⃗ ∈ F𝑛 and for all 𝑐 ∈ F,
𝑇(𝑐𝑥⃗) = 𝑇(𝑐𝑥⃗ + 0⃗) = 𝑐𝑇(𝑥⃗) + 𝑇(0⃗) = 𝑐𝑇(𝑥⃗).
We conclude that 𝑇 is a linear transformation.
Proposition 5.2.3
(Zero Maps to Zero)
Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. Then
𝑇(0⃗_F𝑛) = 0⃗_F𝑚.
Proof: Notice that 0⃗_F𝑛 = 0 · 0⃗_F𝑛. Since 𝑇 is linear,
𝑇(0⃗_F𝑛) = 𝑇(0 · 0⃗_F𝑛) = 0 · 𝑇(0⃗_F𝑛) = 0⃗_F𝑚,
where the last equality follows from the fact that for any 𝑥⃗ ∈ F𝑚, 0 · 𝑥⃗ = 0⃗_F𝑚.
We will now look at some functions 𝑇 : F𝑛 → F𝑚 and determine whether or not they are
linear. Towards this end, here are a couple of questions to consider:
• Does the zero vector of the domain get mapped to the zero vector of the codomain?
If it does not, then the function is not linear.
• Does the function look linear?
All of its terms must be linear. For example, the terms 2𝑥1, 3𝑥2, 𝜋𝑥3, and −𝑥4 are linear, while terms of the form −3𝑥1𝑥3, 𝑥1², or 𝑥1𝑥2𝑥3 are not linear. If you encounter any such non-linear terms, this indicates that the function is likely not linear. Find a counterexample to definitively prove that the function is not linear.
If the answer to both of these questions is yes, then we use the definition to prove linearity.
Example 5.2.4
Determine whether or not the function 𝑇1 : R³ → R² defined by
𝑇1([𝑥; 𝑦; 𝑧]) = [2𝑥 + 3𝑦 + 4; 6𝑥 − 7𝑧]
is a linear transformation.
Solution: This transformation is not linear, because
𝑇1([0 0 0]ᵀ) = [4 0]ᵀ ≠ [0 0]ᵀ.
Example 5.2.5
Determine whether or not the function 𝑇2 : C³ → C² defined by
𝑇2([𝑧1; 𝑧2; 𝑧3]) = [2𝑧1 + 3𝑧3; 𝑧2𝑧3]
is a linear transformation.
Solution: The zero vector of C3 is mapped to the zero vector of C2 by this function.
However, this function does not look linear because of the product 𝑧2 𝑧3 . We try to construct
a counterexample. The (possibly) troublesome term does not involve 𝑧1 , so we will set that
to zero. Since the image under the function does involve the other two variables let us see
what happens for different values (setting both to 1 or 0 is not usually helpful).
Note that on one hand,
𝑇2([0; 1; 0]) + 𝑇2([0; 0; 1]) = [0; 0] + [3; 0] = [3; 0].
On the other hand,
𝑇2([0; 1; 0] + [0; 0; 1]) = 𝑇2([0; 1; 1]) = [3; 1].
Since
𝑇2([0; 1; 0]) + 𝑇2([0; 0; 1]) ≠ 𝑇2([0; 1; 0] + [0; 0; 1]),
the function 𝑇2 is not a linear transformation.
Example 5.2.6
Determine whether or not the function 𝑇3 : R² → R³ defined by
𝑇3([𝑥1; 𝑥2]) = [2𝑥1 + 3𝑥2; 6𝑥1 − 5𝑥2; 2𝑥1]
is a linear transformation.
Solution: As the zero vector of R2 is mapped to the zero vector of R3 and we do not detect
any non-linear terms, this function could be linear. Let us prove that this is indeed the
case.
Let 𝑥⃗ = [𝑥1; 𝑥2], 𝑦⃗ = [𝑦1; 𝑦2] ∈ R² and 𝑐 ∈ R. Then, using the definition of 𝑇3, we have
𝑇3(𝑥⃗) = [2𝑥1 + 3𝑥2; 6𝑥1 − 5𝑥2; 2𝑥1]
and
𝑇3(𝑦⃗) = [2𝑦1 + 3𝑦2; 6𝑦1 − 5𝑦2; 2𝑦1].
We also have
𝑐𝑥⃗ + 𝑦⃗ = [𝑐𝑥1; 𝑐𝑥2] + [𝑦1; 𝑦2] = [𝑐𝑥1 + 𝑦1; 𝑐𝑥2 + 𝑦2],
so that
𝑇3(𝑐𝑥⃗ + 𝑦⃗) = [2(𝑐𝑥1 + 𝑦1) + 3(𝑐𝑥2 + 𝑦2); 6(𝑐𝑥1 + 𝑦1) − 5(𝑐𝑥2 + 𝑦2); 2(𝑐𝑥1 + 𝑦1)]
           = 𝑐[2𝑥1 + 3𝑥2; 6𝑥1 − 5𝑥2; 2𝑥1] + [2𝑦1 + 3𝑦2; 6𝑦1 − 5𝑦2; 2𝑦1]
           = 𝑐𝑇3(𝑥⃗) + 𝑇3(𝑦⃗).
That is, 𝑇3(𝑐𝑥⃗ + 𝑦⃗) = 𝑐𝑇3(𝑥⃗) + 𝑇3(𝑦⃗). It follows that 𝑇3 is a linear transformation (by Proposition 5.2.2 (Alternate Characterization of a Linear Transformation)).
Example 5.2.7
Determine whether or not the function 𝑇4 : C² → C² defined by
𝑇4([𝑧1; 𝑧2]) = [2𝑖𝑧1 + 3𝑧2; (2−𝑖)𝑧1 − (1+3𝑖)𝑧2]
is a linear transformation.
Solution: As the zero vector of C2 is mapped to the zero vector of C2 and we do not detect
any non-linear terms, this function could be linear. Let us prove that this is indeed the
case.
Let 𝑧⃗ = [𝑧1; 𝑧2], 𝑤⃗ = [𝑤1; 𝑤2] ∈ C², and 𝑐 ∈ C. Then, from the definition of 𝑇4, we have
𝑇4(𝑧⃗) = [2𝑖𝑧1 + 3𝑧2; (2−𝑖)𝑧1 − (1+3𝑖)𝑧2]
and
𝑇4(𝑤⃗) = [2𝑖𝑤1 + 3𝑤2; (2−𝑖)𝑤1 − (1+3𝑖)𝑤2].
We also have
𝑐𝑧⃗ + 𝑤⃗ = [𝑐𝑧1; 𝑐𝑧2] + [𝑤1; 𝑤2] = [𝑐𝑧1 + 𝑤1; 𝑐𝑧2 + 𝑤2],
so that
𝑇4(𝑐𝑧⃗ + 𝑤⃗) = [2𝑖(𝑐𝑧1 + 𝑤1) + 3(𝑐𝑧2 + 𝑤2); (2−𝑖)(𝑐𝑧1 + 𝑤1) − (1+3𝑖)(𝑐𝑧2 + 𝑤2)]
           = 𝑐[2𝑖𝑧1 + 3𝑧2; (2−𝑖)𝑧1 − (1+3𝑖)𝑧2] + [2𝑖𝑤1 + 3𝑤2; (2−𝑖)𝑤1 − (1+3𝑖)𝑤2]
           = 𝑐𝑇4(𝑧⃗) + 𝑇4(𝑤⃗).
That is, 𝑇4(𝑐𝑧⃗ + 𝑤⃗) = 𝑐𝑇4(𝑧⃗) + 𝑇4(𝑤⃗). It follows that 𝑇4 is a linear transformation (by Proposition 5.2.2 (Alternate Characterization of a Linear Transformation)).
5.3
The Range of a Linear Transformation and “Onto” Linear Transformations
A linear transformation is a special kind of function, and, as with many other functions, we
consider the set of all outputs of a linear transformation.
Definition 5.3.1
Range
Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. We define the range of 𝑇 , denoted Range(𝑇 ),
to be the set of all outputs of 𝑇 . That is,
Range(𝑇) = {𝑇(𝑥⃗) : 𝑥⃗ ∈ F𝑛}.
The range of 𝑇 is a subset of F𝑚 .
Since for any linear transformation 𝑇 : F𝑛 → F𝑚 we have 𝑇(0⃗_F𝑛) = 0⃗_F𝑚 (by Proposition 5.2.3 (Zero Maps to Zero)), we have that 0⃗_F𝑚 ∈ Range(𝑇). Therefore, the range of a linear transformation is never empty.
Proposition 5.3.2
(Range of a Linear Transformation)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F), and let 𝑇𝐴 : F𝑛 → F𝑚 be the linear transformation determined by 𝐴.
Then
Range(𝑇𝐴 ) = Col(𝐴).
Proof: We will show that these two subsets of F𝑚 are equal by showing that each set is contained in the other.
First, suppose that 𝑦⃗ ∈ Range(𝑇𝐴). Then there exists 𝑥⃗ ∈ F𝑛 such that 𝑇𝐴(𝑥⃗) = 𝑦⃗. That is, there exists a vector 𝑥⃗ ∈ F𝑛 such that 𝐴𝑥⃗ = 𝑦⃗. Since the system 𝐴𝑥⃗ = 𝑦⃗ is consistent if and only if 𝑦⃗ ∈ Col(𝐴) (see Proposition 4.1.2 (Consistent System and Column Space)), we conclude that 𝑦⃗ ∈ Col(𝐴).
Next, suppose that 𝑧⃗ ∈ Col(𝐴). Then there exist scalars 𝑥1, 𝑥2, ..., 𝑥𝑛 ∈ F such that 𝑧⃗ = 𝑥1𝑎⃗1 + 𝑥2𝑎⃗2 + ··· + 𝑥𝑛𝑎⃗𝑛. If we let 𝑥⃗ = [𝑥1 𝑥2 ··· 𝑥𝑛]ᵀ, then 𝐴𝑥⃗ = 𝑧⃗; that is, 𝑧⃗ = 𝑇𝐴(𝑥⃗), and so 𝑧⃗ ∈ Range(𝑇𝐴).
Thus, we conclude that Range(𝑇𝐴) = Col(𝐴).
REMARK (Connection to Systems of Linear Equations)
We have already seen in Proposition 4.1.2 (Consistent System and Column Space) that the system of linear equations 𝐴𝑥⃗ = 𝑏⃗ has a solution if and only if 𝑏⃗ ∈ Col(𝐴). We can now write:
𝐴𝑥⃗ = 𝑏⃗ is consistent if and only if 𝑏⃗ ∈ Range(𝑇𝐴).
Example 5.3.3
Determine Range(𝑇𝐴), where 𝐴 = [1 4; −2 −5; 4 6].
Solution: The range of 𝑇𝐴 is
Range(𝑇𝐴) = Col(𝐴) = Span{[1; −2; 4], [4; −5; 6]}.
Note that the system of linear equations 𝐴𝑥⃗ = 𝑏⃗ is consistent if and only if 𝑏⃗ ∈ Range(𝑇𝐴).
Example 5.3.4
Determine Range(𝑇𝐴), where 𝐴 = [𝑖 1+2𝑖 3+2𝑖; 2−𝑖 4 2−5𝑖].
Solution: The range of 𝑇𝐴 is
Range(𝑇𝐴) = Span{[𝑖; 2−𝑖], [1+2𝑖; 4], [3+2𝑖; 2−5𝑖]}.
It can be shown that Range(𝑇𝐴) = C², and thus the system of linear equations 𝐴𝑥⃗ = 𝑏⃗ is consistent for all 𝑏⃗ ∈ C².
Linear transformations whose range is equal to the entire codomain deserve a special name.
Definition 5.3.5
Onto
We say that the transformation 𝑇 : F𝑛 → F𝑚 is onto (or surjective) if Range(𝑇 ) = F𝑚 .
Corollary 5.3.6
(Onto Criteria)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 be the linear transformation determined by the matrix 𝐴.
The following statements are equivalent:
(a) 𝑇𝐴 is onto.
(b) Col(𝐴) = F𝑚 .
(c) rank(𝐴) = 𝑚.
Proof: ((𝑎) ⇒ (𝑏)):
Suppose that 𝑇𝐴 is onto. From Proposition 5.3.2 (Range of a Linear Transformation), it
follows that
Col(𝐴) = Range(𝑇𝐴 ) = F𝑚 .
((𝑏) ⇒ (𝑐)):
Suppose that Col(𝐴) = F𝑚. The system 𝐴𝑥⃗ = 𝑏⃗ is consistent if and only if 𝑏⃗ ∈ Col(𝐴) (see Proposition 4.1.2 (Consistent System and Column Space)). Thus we have that 𝐴𝑥⃗ = 𝑏⃗ is consistent for all 𝑏⃗ ∈ F𝑚. From Part (b) of Theorem 3.6.7 (System Rank Theorem), we conclude that rank(𝐴) = 𝑚.
((𝑐) ⇒ (𝑎)):
Suppose that rank(𝐴) = 𝑚. From Part (b) of Theorem 3.6.7 (System Rank Theorem), it follows that the system 𝐴𝑥⃗ = 𝑏⃗ is consistent for all 𝑏⃗ ∈ F𝑚. By Proposition 4.1.2 (Consistent System and Column Space), Col(𝐴) = F𝑚. By Proposition 5.3.2 (Range of a Linear Transformation), Range(𝑇𝐴) = Col(𝐴) = F𝑚, and so 𝑇𝐴 is onto.
Example 5.3.7
Determine whether or not the linear transformation 𝑇𝐴 determined by 𝐴 = [1 4; −2 −5; 4 6] is onto.
Solution: From Example 5.3.3 we know that
Range(𝑇𝐴) = Col(𝐴) = Span{[1; −2; 4], [4; −5; 6]}.
The range of 𝑇𝐴 is only a plane in R³. Therefore, it cannot be equal to all of R³, and so Range(𝑇𝐴) ≠ R³. There will be some vectors 𝑏⃗ in R³ for which the equation 𝑇𝐴(𝑥⃗) = 𝑏⃗ has no solution. This will happen for any vector which does not lie on the plane determined by Col(𝐴). One such vector is [1; 0; 0]. Therefore, the linear transformation 𝑇𝐴 is not onto.
5.4  The Kernel of a Linear Transformation and "One-to-One" Linear Transformations
When analyzing functions, we consider the set of inputs whose output is zero.
Definition 5.4.1
Kernel
Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. We define the kernel of 𝑇 , denoted Ker(𝑇 ),
to be the set of inputs of 𝑇 whose output is zero. That is,
Ker(𝑇) = {𝑥⃗ ∈ F𝑛 : 𝑇(𝑥⃗) = 0⃗_F𝑚}.
The kernel of 𝑇 is a subset of F𝑛 .
Since for any linear transformation 𝑇 : F𝑛 → F𝑚 we have 𝑇(0⃗_F𝑛) = 0⃗_F𝑚 (by Proposition 5.2.3 (Zero Maps to Zero)), we have 0⃗_F𝑛 ∈ Ker(𝑇). Thus, the kernel of a linear transformation is never empty.
Proposition 5.4.2
(Kernel of a Linear Transformation)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 : F𝑛 → F𝑚 be the linear transformation determined by 𝐴.
Then
Ker(𝑇𝐴 ) = Null(𝐴).
Proof: Using the definitions of kernel and nullspace, we find that 𝑥⃗ ∈ Ker(𝑇𝐴) ⟺ 𝑇𝐴(𝑥⃗) = 0⃗_F𝑚 ⟺ 𝐴𝑥⃗ = 0⃗_F𝑚 ⟺ 𝑥⃗ ∈ Null(𝐴).
REMARK
The kernel of 𝑇𝐴 is equal to the solution set of the homogeneous system 𝐴𝑥⃗ = 0⃗.
Definition 5.4.3
One-to-One
We say that the transformation 𝑇 : F𝑛 → F𝑚 is one-to-one (or injective) if, whenever 𝑇(𝑥⃗) = 𝑇(𝑦⃗), then 𝑥⃗ = 𝑦⃗.
REMARK
Notice that the statement
For all 𝑥⃗, 𝑦⃗ ∈ F𝑛, if 𝑇(𝑥⃗) = 𝑇(𝑦⃗) then 𝑥⃗ = 𝑦⃗
is logically equivalent to its contrapositive
For all 𝑥⃗, 𝑦⃗ ∈ F𝑛, if 𝑥⃗ ≠ 𝑦⃗ then 𝑇(𝑥⃗) ≠ 𝑇(𝑦⃗).
Thus, one-to-one linear transformations have the nice property that they map distinct elements of F𝑛 to distinct elements of F𝑚.
There is a simple way to check if a linear transformation is one-to-one.
Proposition 5.4.4
(One-to-One Test)
Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. Then
𝑇 is one-to-one if and only if Ker(𝑇) = {0⃗_F𝑛}.
Proof: Suppose that 𝑇 is one-to-one. We have noted that {0⃗_F𝑛} ⊆ Ker(𝑇), so it remains to show that Ker(𝑇) ⊆ {0⃗_F𝑛}. For this purpose, choose an arbitrary vector 𝑥⃗ in Ker(𝑇), so that 𝑇(𝑥⃗) = 0⃗_F𝑚. Since 𝑇(0⃗_F𝑛) = 0⃗_F𝑚, we have that 𝑇(𝑥⃗) = 𝑇(0⃗_F𝑛). Since 𝑇 is one-to-one, we conclude that 𝑥⃗ = 0⃗_F𝑛. Thus, Ker(𝑇) ⊆ {0⃗_F𝑛}.
Conversely, suppose that Ker(𝑇) = {0⃗_F𝑛}. Given any 𝑥⃗, 𝑦⃗ ∈ F𝑛 that satisfy 𝑇(𝑥⃗) = 𝑇(𝑦⃗), we must show that 𝑥⃗ = 𝑦⃗. To this end, notice that
𝑇(𝑥⃗) = 𝑇(𝑦⃗)  ⟹  𝑇(𝑥⃗) − 𝑇(𝑦⃗) = 0⃗_F𝑚  ⟹  𝑇(𝑥⃗ − 𝑦⃗) = 0⃗_F𝑚,
where the last implication follows from linearity. This shows that 𝑥⃗ − 𝑦⃗ ∈ Ker(𝑇). Since by assumption Ker(𝑇) = {0⃗_F𝑛}, it follows that 𝑥⃗ − 𝑦⃗ = 0⃗_F𝑛, and therefore 𝑥⃗ = 𝑦⃗, as required.
If we know that our linear transformation 𝑇 is of the form 𝑇 = 𝑇𝐴 for some matrix 𝐴, then
we can say a bit more.
Corollary 5.4.5
(One-to-One Criteria)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 be the linear transformation determined by the matrix 𝐴.
The following statements are equivalent.
(a) 𝑇𝐴 is one-to-one.
(b) Null(𝐴) = {⃗0F𝑛 }.
(c) nullity(𝐴) = 0.
(d) rank(𝐴) = 𝑛.
Proof: ((𝑎) ⇒ (𝑏)):
This follows from Proposition 5.4.2 (Kernel of a Linear Transformation) and Proposition
5.4.4 (One-to-One Test).
((𝑏) ⇒ (𝑐)):
Suppose that Null(𝐴) = {0⃗_F𝑛}. Then the consistent system 𝐴𝑥⃗ = 0⃗_F𝑚 has a unique solution, namely 𝑥⃗ = 0⃗_F𝑛. Thus, the solution set of this consistent system has 0 parameters, and so it follows from the definition of nullity that nullity(𝐴) = 0.
((𝑐) ⇒ (𝑑)):
Suppose that nullity(𝐴) = 0. Since nullity(𝐴) = 𝑛 − rank(𝐴) by definition, we see that
rank(𝐴) = 𝑛.
((𝑑) ⇒ (𝑎)):
Suppose that rank(𝐴) = 𝑛 and consider the system 𝐴𝑥⃗ = 0⃗_F𝑚. Note that this system is consistent, because 𝑥⃗ = 0⃗_F𝑛 is a solution. From Part (a) of Theorem 3.6.7 (System Rank Theorem), it follows that the number of parameters in the solution set of this consistent system is 𝑛 − rank(𝐴) = 𝑛 − 𝑛 = 0. We conclude that the solution 𝑥⃗ = 0⃗_F𝑛 to 𝐴𝑥⃗ = 0⃗_F𝑚 is unique.
This shows that Null(𝐴) = {0⃗_F𝑛}. Thus Ker(𝑇𝐴) = {0⃗_F𝑛} by Proposition 5.4.2 (Kernel of a Linear Transformation), and therefore 𝑇𝐴 is one-to-one by Proposition 5.4.4 (One-to-One Test).
Example 5.4.6
Determine whether or not the linear transformation 𝑇𝐴 : R³ → R³ determined by
𝐴 = [2 4 10; 4 −4 −4; 6 8 22]
is one-to-one.
Solution: Let us examine the kernel of 𝑇𝐴, which is the same as looking at the solution set to 𝐴𝑥⃗ = 0⃗. The augmented matrix is
[2 4 10 | 0; 4 −4 −4 | 0; 6 8 22 | 0],
which row reduces to
[1 2 5 | 0; 0 1 2 | 0; 0 0 0 | 0].
Thus rank(𝐴) = 2 < 3 = 𝑛, and it follows that 𝑇𝐴 is not one-to-one by Part (d) of Corollary 5.4.5 (One-to-One Criteria).
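As a quick numerical cross-check (our own illustration, not part of the notes, assuming NumPy), Corollary 5.4.5 says that 𝑇𝐴 is one-to-one exactly when rank(𝐴) equals the number of columns 𝑛; for the matrix of Example 5.4.6 the rank is 2 < 3.

import numpy as np

A = np.array([[2,  4, 10],
              [4, -4, -4],
              [6,  8, 22]])

n = A.shape[1]                                            # number of columns of A
rank = np.linalg.matrix_rank(A)
print(rank, n)                                            # 2 3
print("one-to-one" if rank == n else "not one-to-one")    # not one-to-one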
By combining Corollary 5.3.6 (Onto Criteria) and Corollary 5.4.5 (One-to-One Criteria) we
arrive at the following result concerning square matrices. The proof is left as an exercise.
(Compare with Theorem 4.6.7 (Invertibility Criteria – First Version).)
Theorem 5.4.7
(Invertibility Criteria – Second Version)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) be a square matrix and let 𝑇𝐴 be the linear transformation determined
by the matrix 𝐴. The following statements are equivalent.
(a) 𝐴 is invertible.
(b) 𝑇𝐴 is one-to-one.
(c) 𝑇𝐴 is onto.
(d) Null(𝐴) = {0⃗}. That is, the only solution to the homogeneous system 𝐴𝑥⃗ = 0⃗ is the trivial solution 𝑥⃗ = 0⃗.
(e) Col(𝐴) = F𝑛. That is, for every 𝑏⃗ ∈ F𝑛, the system 𝐴𝑥⃗ = 𝑏⃗ is consistent.
(f) nullity(𝐴) = 0.
(g) rank(𝐴) = 𝑛.
(h) RREF(𝐴) = 𝐼𝑛 .
5.5
Every Linear Transformation is Determined by a Matrix
In Section 5.1 we saw that every matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F) determines a linear transformation
𝑇𝐴 : F𝑛 → F𝑚 . We will show in this section that, conversely, every linear transformation
𝑇 : F𝑛 → F𝑚 is actually of the form 𝑇 = 𝑇𝐴 for a certain matrix 𝐴 (that depends on 𝑇 ). In
fact, we will show that there is a very natural process that constructs this matrix 𝐴 from
𝑇.
We will make use of the set
ℰ = {𝑒⃗1, 𝑒⃗2, ..., 𝑒⃗𝑛} = {[1; 0; ...; 0], [0; 1; ...; 0], ..., [0; 0; ...; 1]}
of standard basis vectors in F𝑛 (which we had already met in Definition 1.3.15).
Example 5.5.1
Let us examine the consequences of linearity in the special case when F𝑛 = F𝑚 = F². Thus suppose that 𝑇 : F² → F² is a linear mapping and let 𝑥⃗ = [𝑥1; 𝑥2] be a vector in F². Then
𝑇([𝑥1; 𝑥2]) = 𝑇([𝑥1; 0] + [0; 𝑥2])
            = 𝑇(𝑥1[1; 0] + 𝑥2[0; 1])
            = 𝑥1𝑇([1; 0]) + 𝑥2𝑇([0; 1])      (by linearity)
            = [𝑇(𝑒⃗1) 𝑇(𝑒⃗2)][𝑥1; 𝑥2]
            = [𝑇(𝑒⃗1) 𝑇(𝑒⃗2)]𝑥⃗.
This shows us that the actual effect of the linear transformation can be replicated by the introduction of a matrix [𝑇(𝑒⃗1) 𝑇(𝑒⃗2)].
In addition, this matrix [𝑇(𝑒⃗1) 𝑇(𝑒⃗2)] has columns which are constructed by applying 𝑇 to the basis vectors 𝑒⃗1 and 𝑒⃗2 in F². This means that if we know what the linear transformation does to just these two (standard basis) vectors, then we can determine what it does to all vectors in F².
Finally, the actual value of 𝑇(𝑥⃗) can be computed by matrix multiplication of this matrix [𝑇(𝑒⃗1) 𝑇(𝑒⃗2)] by the component vector 𝑥⃗. This result extends to higher dimensions.
Definition 5.5.2
Standard Matrix
Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. We define the standard matrix of 𝑇, denoted by [𝑇]ℰ, to be the 𝑚 × 𝑛 matrix whose columns are the images under 𝑇 of the vectors in the standard basis of F𝑛:
[𝑇]ℰ = [𝑇(𝑒⃗1) 𝑇(𝑒⃗2) ··· 𝑇(𝑒⃗𝑛)].
REMARKS
• There is a 𝑇 to remind us of the linear transformation that we are going to represent.
• The square brackets in [𝑇]ℰ are our way of indicating that we are converting a linear transformation into a matrix.
• The ℰ indicates that the standard basis is being used for both the domain and the
codomain. Later we will investigate the consequences of using different bases in either
or both of these two sets.
Theorem 5.5.3
(Every Linear Transformation Is Determined by a Matrix)
Let 𝑇 : F𝑛 → F𝑚 be a linear transformation and let [𝑇]ℰ be the standard matrix of 𝑇. Then for all 𝑥⃗ ∈ F𝑛,
𝑇(𝑥⃗) = [𝑇]ℰ 𝑥⃗.
That is, 𝑇 = 𝑇[𝑇]ℰ is the linear transformation determined by the matrix [𝑇]ℰ.
Proof: Let 𝑥⃗ ∈ F𝑛 with 𝑥⃗ = [𝑥1 𝑥2 ··· 𝑥𝑛]ᵀ. Then
𝑇(𝑥⃗) = 𝑇([𝑥1; 𝑥2; ...; 𝑥𝑛])
      = 𝑇([𝑥1; 0; ...; 0] + [0; 𝑥2; ...; 0] + ··· + [0; 0; ...; 𝑥𝑛])
      = 𝑇(𝑥1𝑒⃗1 + 𝑥2𝑒⃗2 + ··· + 𝑥𝑛𝑒⃗𝑛)
      = 𝑥1𝑇(𝑒⃗1) + 𝑥2𝑇(𝑒⃗2) + ··· + 𝑥𝑛𝑇(𝑒⃗𝑛)      (using linearity)
      = [𝑇(𝑒⃗1) 𝑇(𝑒⃗2) ··· 𝑇(𝑒⃗𝑛)][𝑥1; 𝑥2; ...; 𝑥𝑛]
      = [𝑇]ℰ 𝑥⃗.
This result is quite remarkable.
First, the matrix representation provides us with a very elegant and compact way of expressing the linear transformation.
Second, the construction of the standard matrix is very simple. We must only find the
images of the 𝑛 standard basis vectors and build the standard matrix column by column
from these images.
Third, once we have this standard matrix, we can find the image of any vector in F𝑛 under
the linear transformation using matrix–vector multiplication.
Thus, if 𝑇 : F𝑛 → F𝑚 is a linear transformation, then once we know 𝑇(𝑒⃗1), ..., 𝑇(𝑒⃗𝑛), we can compute 𝑇(𝑥⃗) for all 𝑥⃗ ∈ F𝑛. This property is incredibly strong and is one of the key features of linear transformations.
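The column-by-column construction of [𝑇]ℰ can be mirrored in code. The sketch below is our own illustration (not part of the notes, assuming NumPy); the sample map is 𝑇3 from Example 5.2.6, and the last two lines check that 𝑇(𝑥⃗) = [𝑇]ℰ𝑥⃗ for a sample vector.

import numpy as np

def standard_matrix(T, n):
    """Build [T]_E column by column from the images of e_1, ..., e_n."""
    columns = [T(e) for e in np.eye(n)]    # the rows of I_n are the standard basis vectors
    return np.column_stack(columns)

# The linear map T_3 from Example 5.2.6: R^2 -> R^3.
def T3(x):
    x1, x2 = x
    return np.array([2*x1 + 3*x2, 6*x1 - 5*x2, 2*x1])

M = standard_matrix(T3, 2)
print(M)                      # [[ 2.  3.] [ 6. -5.] [ 2.  0.]]

x = np.array([4.0, -1.0])
print(T3(x), M @ x)           # both give [ 5. 29.  8.]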
In particular, linear transformations 𝑇 : R → R are very simple. For example, given
𝑔(𝑥) = 𝑚𝑥 (a linear function through the origin), it is possible to determine the value
of 𝑚 if we are provided with any non-zero point 𝑃 that falls on the line. It turns out that
these functions completely categorize linear transformations 𝑇 : R → R, which is our result
below.
Proposition 5.5.4
Let 𝑇 : R → R be a linear transformation. Then there is a real number 𝑚 ∈ R such that
𝑇 (𝑥) = 𝑚𝑥 for all 𝑥 ∈ R.
Proof: We have 𝑇 (𝑥) = [𝑇 ]ℰ 𝑥, where the standard matrix [𝑇 ]ℰ of 𝑇 is the 1 × 1 matrix
[𝑇 (1)]. Thus, if we let 𝑚 = 𝑇 (1), we find that 𝑇 (𝑥) = 𝑚𝑥, as required.
Note that the above property is not true of non-linear transformations. For example, the
(non-linear) function 𝑓 : R → R defined by 𝑓 (𝑥) = sin(𝑥) cannot be determined from
knowing the value of sin(1).
To conclude this section, we summarize what we have learned.
Given a matrix
𝐴 ∈ 𝑀𝑚×𝑛 (F), we can define a linear transformation 𝑇𝐴 : F𝑛 → F𝑚 . Conversely, given
a linear transformation 𝑇 : F𝑛 → F𝑚 , we can construct a matrix [𝑇 ]ℰ ∈ 𝑀𝑚×𝑛 (F). What
happens if we combine these two processes? That is, what can we say about 𝑇[𝑇 ]ℰ and
[𝑇𝐴 ]ℰ ?
We answer both these questions and state a couple of useful consequences in the following
result.
Proposition 5.5.5
(Properties of a Standard Matrix)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F), let 𝑇𝐴 : F𝑛 → F𝑚 be the linear transformation determined by 𝐴, and let
𝑇 : F𝑛 → F𝑚 be a linear transformation. Then
(a) 𝑇[𝑇 ]ℰ = 𝑇 .
(b) [𝑇𝐴 ]ℰ = 𝐴.
(c) 𝑇 is onto if and only if rank([𝑇 ]ℰ ) = 𝑚.
(d) 𝑇 is one-to-one if and only if rank([𝑇 ]ℰ ) = 𝑛.
Proof: (a) Using the definition of the linear transformation determined by a matrix, we have
𝑇[𝑇]ℰ(𝑥⃗) = [𝑇]ℰ 𝑥⃗ = 𝑇(𝑥⃗)   for all 𝑥⃗ ∈ F𝑛,
where the last equality follows from Theorem 5.5.3 (Every Linear Transformation Is Determined by a Matrix). Thus, the two functions 𝑇[𝑇]ℰ and 𝑇 have the same values at each 𝑥⃗ ∈ F𝑛, so they must be equal.
(b) Using the definition of the standard matrix, we have
[𝑇𝐴]ℰ = [𝑇𝐴(𝑒⃗1) ··· 𝑇𝐴(𝑒⃗𝑛)] = [𝐴𝑒⃗1 ··· 𝐴𝑒⃗𝑛] = [𝑎⃗1 ··· 𝑎⃗𝑛]   (by Lemma 4.2.2 (Column Extraction))
      = 𝐴.
(c) This follows from Corollary 5.3.6 (Onto Criteria).
(d) This follows from Corollary 5.4.5 (One-to-One Criteria).
5.6
Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection
In this section, we will revisit the operations of projection and perpendicular discussed in
Chapter 1, prove that both of these types of transformations are linear, and determine their
standard matrices. We will do the same for the operations of rotation and reflection.
Example 5.6.1 (Projection onto a line through the origin)
Let 𝑤⃗ = [𝑤1; 𝑤2] be a non-zero vector in R². Consider the projection transformation proj_𝑤⃗ : R² → R² defined in Section 1.6.
(a) Prove that proj_𝑤⃗ is a linear transformation.
(b) Determine the standard matrix of proj_𝑤⃗.
(c) Determine whether proj_𝑤⃗ is onto.
(d) Determine whether proj_𝑤⃗ is one-to-one.
Solution:
(a) Recall that if 𝑣⃗ is a vector in R², then the projection of 𝑣⃗ onto 𝑤⃗ is defined by
proj_𝑤⃗(𝑣⃗) = ((𝑣⃗ · 𝑤⃗)/‖𝑤⃗‖²) 𝑤⃗.
Now let 𝑢⃗, 𝑣⃗ ∈ R² and let 𝑐1, 𝑐2 ∈ R. By the linearity properties of the dot product,
proj_𝑤⃗(𝑐1𝑢⃗ + 𝑐2𝑣⃗) = 𝑐1((𝑢⃗ · 𝑤⃗)/‖𝑤⃗‖²)𝑤⃗ + 𝑐2((𝑣⃗ · 𝑤⃗)/‖𝑤⃗‖²)𝑤⃗ = 𝑐1 proj_𝑤⃗(𝑢⃗) + 𝑐2 proj_𝑤⃗(𝑣⃗).
We conclude that proj_𝑤⃗ is a linear transformation.
(b) To find the standard matrix [proj_𝑤⃗]ℰ of proj_𝑤⃗, recall that
[proj_𝑤⃗]ℰ = [proj_𝑤⃗(𝑒⃗1) proj_𝑤⃗(𝑒⃗2)].
Since
proj_𝑤⃗(𝑒⃗1) = ((𝑒⃗1 · 𝑤⃗)/‖𝑤⃗‖²)𝑤⃗ = ((𝑤1·1 + 𝑤2·0)/(𝑤1² + 𝑤2²))[𝑤1; 𝑤2] = (1/(𝑤1² + 𝑤2²))[𝑤1²; 𝑤1𝑤2],
proj_𝑤⃗(𝑒⃗2) = ((𝑒⃗2 · 𝑤⃗)/‖𝑤⃗‖²)𝑤⃗ = ((𝑤1·0 + 𝑤2·1)/(𝑤1² + 𝑤2²))[𝑤1; 𝑤2] = (1/(𝑤1² + 𝑤2²))[𝑤1𝑤2; 𝑤2²],
we have
[proj_𝑤⃗]ℰ = (1/(𝑤1² + 𝑤2²))[𝑤1² 𝑤1𝑤2; 𝑤1𝑤2 𝑤2²].
(c) There are two ways in which we can argue why proj_𝑤⃗ is not an onto linear transformation. Thinking geometrically, note that all images of proj_𝑤⃗ lie on the line Span{𝑤⃗} and nowhere else in R². Equivalently,
Range(proj_𝑤⃗) = Span{(𝑤1/(𝑤1² + 𝑤2²))[𝑤1; 𝑤2], (𝑤2/(𝑤1² + 𝑤2²))[𝑤1; 𝑤2]} = Span{[𝑤1; 𝑤2]} ≠ R².
(d) Once again, there are two ways in which we can argue why proj_𝑤⃗ is not a one-to-one linear transformation. Thinking geometrically, note that there are many different vectors which have the same image when they are projected onto the line. Equivalently, the kernel is obtained by solving the system
(1/(𝑤1² + 𝑤2²))[𝑤1² 𝑤1𝑤2; 𝑤1𝑤2 𝑤2²]𝑣⃗ = 0⃗,
which row reduces to
[𝑤1 𝑤2; 0 0]𝑣⃗ = 0⃗.
Since the rank of the coefficient matrix is strictly less than 2, it follows from Corollary 5.4.5 (One-to-One Criteria) that proj_𝑤⃗ is not one-to-one.
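A quick numerical sanity check of part (b) (our own illustration, not part of the notes, assuming NumPy): build [proj_𝑤⃗]ℰ from the formula above, and confirm that it fixes 𝑤⃗ and has rank 1, consistent with parts (c) and (d).

import numpy as np

w = np.array([1.0, -3.0])                  # any non-zero vector in R^2

# [proj_w]_E = (1 / ||w||^2) [[w1^2, w1 w2], [w1 w2, w2^2]]
P = np.outer(w, w) / np.dot(w, w)
print(P)                                   # [[ 0.1 -0.3] [-0.3  0.9]]

print(P @ w)                               # [ 1. -3.]  (w is projected to itself)
print(np.linalg.matrix_rank(P))            # 1, so proj_w is neither onto nor one-to-one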
Note that if you want to project onto a line that does not pass through the origin, then you
can project on the parallel line through the origin and then translate the solution. Note,
however, that this function is not an example of a linear transformation, since 0⃗ would not be mapped to 0⃗.
EXERCISE
Let 𝑤⃗ be a non-zero vector in R𝑛 and consider the perpendicular transformation perp_𝑤⃗ : R𝑛 → R𝑛 defined in Section 1.6 for 𝑣⃗ ∈ R𝑛:
perp_𝑤⃗(𝑣⃗) = 𝑣⃗ − proj_𝑤⃗(𝑣⃗).
(a) Prove that perp_𝑤⃗ is a linear transformation.
(b) Determine the standard matrix of perp_𝑤⃗.
(c) Determine whether perp_𝑤⃗ is onto.
(d) Determine whether perp_𝑤⃗ is one-to-one.
Example 5.6.2
(Projection onto a plane through the origin)
We can define the projection map onto a plane through the origin by using a normal vector of the plane. For a plane 𝑃 with normal vector 𝑛⃗, we know every vector 𝑤⃗ ∈ 𝑃 is orthogonal to 𝑛⃗, so we know that 𝑤⃗ = perp_𝑛⃗(𝑤⃗).
Intuitively, for 𝑣⃗, 𝑤⃗ ∈ R³, we think of proj_𝑤⃗(𝑣⃗) as the 𝑤⃗-component of 𝑣⃗; the same analogy can be made with planes. With this intuition in place, we define the projection of 𝑣⃗ onto the plane 𝑃 as
proj_𝑃(𝑣⃗) = perp_𝑛⃗(𝑣⃗).
With this established, consider the projection map proj𝑃 : R3 → R3 onto a plane 𝑃 , where
𝑃 is defined by the scalar equation 3𝑥 − 4𝑦 + 5𝑧 = 0.
(a) Prove that proj𝑃 is a linear transformation.
(b) Determine the standard matrix of proj𝑃 .
(c) Determine whether proj𝑃 is onto.
(d) Determine whether proj𝑃 is one-to-one.
Solution:
(a) Let 𝑛⃗ = [3 −4 5]ᵀ be a normal to the plane 𝑃. Then proj_𝑃(𝑣⃗) = perp_𝑛⃗(𝑣⃗), and because we know (from the exercise above) that perp_𝑛⃗ is a linear transformation, we conclude that proj_𝑃 is a linear transformation as well.
(b) To find the standard matrix [proj_𝑃]ℰ of proj_𝑃, recall that
[proj_𝑃]ℰ = [proj_𝑃(𝑒⃗1) proj_𝑃(𝑒⃗2) proj_𝑃(𝑒⃗3)].
Since
proj_𝑃(𝑒⃗1) = perp_𝑛⃗(𝑒⃗1) = 𝑒⃗1 − proj_𝑛⃗(𝑒⃗1) = 𝑒⃗1 − ((𝑒⃗1 · 𝑛⃗)/‖𝑛⃗‖²)𝑛⃗
            = [1; 0; 0] − ((1·3 + 0·(−4) + 0·5)/50)[3; −4; 5] = [41/50; 6/25; −3/10],
proj_𝑃(𝑒⃗2) = [0; 1; 0] − ((0·3 + 1·(−4) + 0·5)/50)[3; −4; 5] = [6/25; 17/25; 2/5],
proj_𝑃(𝑒⃗3) = [0; 0; 1] − ((0·3 + 0·(−4) + 1·5)/50)[3; −4; 5] = [−3/10; 2/5; 1/2],
we have that
[proj_𝑃]ℰ = [41/50 6/25 −3/10; 6/25 17/25 2/5; −3/10 2/5 1/2].
(c) There are two ways in which we can argue why proj_𝑃 is not onto. Thinking geometrically, note that all images lie on the plane 3𝑥 − 4𝑦 + 5𝑧 = 0 and nowhere else in R³. Equivalently, we could investigate the range and show that it is not R³. This can be done, for example, by finding one vector 𝑣⃗ ∈ R³ such that the system [proj_𝑃]ℰ 𝑥⃗ = 𝑣⃗ is inconsistent. This can be achieved by taking any vector 𝑣⃗ ∉ 𝑃, such as 𝑣⃗ = 𝑒⃗1 (notice that 1·3 + 0·(−4) + 0·5 ≠ 0). Since 𝑣⃗ ∉ Range(proj_𝑃), we conclude that Range(proj_𝑃) ≠ R³, and so proj_𝑃 is not onto.
(d) Once again, there are two ways in which we can argue why proj_𝑃 is not one-to-one. Thinking geometrically, note that many different points have the same image when they are projected onto the plane. Equivalently, the kernel is obtained by solving
[proj_𝑃]ℰ 𝑥⃗ = 0⃗.
Since the rank of the coefficient matrix is strictly less than 3, it follows from Corollary 5.4.5 (One-to-One Criteria) that proj_𝑃 is not one-to-one.
Example 5.6.3
(Rotation about the origin by an angle 𝜃)
Consider the transformation 𝑅𝜃 : R² → R² that rotates every vector 𝑥⃗ ∈ R² by a fixed angle 𝜃 counter-clockwise about the origin. We will assume 0 < 𝜃 < 2𝜋 so that the rotation action is not trivial. This linear transformation is illustrated in the diagram below. Notice that the variable 𝑟 represents the length of a vector 𝑥⃗.
(a) Prove that 𝑅𝜃 is a linear transformation.
(b) Determine the standard matrix of 𝑅𝜃 .
(c) Determine whether 𝑅𝜃 is onto.
(d) Determine whether 𝑅𝜃 is one-to-one.
Solution:
(a) We will prove that 𝑅𝜃 is a linear transformation by showing that it can be written in the form 𝑅𝜃(𝑥⃗) = 𝐴𝑥⃗ for some matrix 𝐴 ∈ 𝑀2×2(R). Let 𝑥⃗ = [𝑥1; 𝑥2] be an arbitrary vector in R². Recall that we can convert 𝑥⃗ into its polar representation by taking 𝑟 = ‖𝑥⃗‖ (the length of 𝑥⃗) and multiplying by cos 𝜑 to get the 𝑥-coordinate and sin 𝜑 to get the 𝑦-coordinate, where 𝜑 is the angle between 𝑒⃗1 and 𝑥⃗.
Converting 𝑥⃗ into polar coordinates, we find that 𝑥⃗ = [𝑟 cos 𝜑; 𝑟 sin 𝜑]. Since 𝑅𝜃(𝑥⃗) is the result of a counter-clockwise rotation of 𝑥⃗ by an angle 𝜃 about the origin,
𝑅𝜃([𝑟 cos 𝜑; 𝑟 sin 𝜑]) = [𝑟 cos(𝜑 + 𝜃); 𝑟 sin(𝜑 + 𝜃)].
By making use of the trigonometric angle-sum identities
cos(𝜑 + 𝜃) = cos 𝜑 cos 𝜃 − sin 𝜑 sin 𝜃   and   sin(𝜑 + 𝜃) = sin 𝜑 cos 𝜃 + cos 𝜑 sin 𝜃,
we have
𝑅𝜃(𝑥⃗) = [𝑟 cos(𝜑 + 𝜃); 𝑟 sin(𝜑 + 𝜃)]
      = [𝑟(cos 𝜑 cos 𝜃 − sin 𝜑 sin 𝜃); 𝑟(sin 𝜑 cos 𝜃 + cos 𝜑 sin 𝜃)]
      = [cos 𝜃(𝑟 cos 𝜑) − sin 𝜃(𝑟 sin 𝜑); sin 𝜃(𝑟 cos 𝜑) + cos 𝜃(𝑟 sin 𝜑)]
      = [cos 𝜃 −sin 𝜃; sin 𝜃 cos 𝜃][𝑟 cos 𝜑; 𝑟 sin 𝜑]
      = 𝐴𝑥⃗,
where 𝐴 = [cos 𝜃 −sin 𝜃; sin 𝜃 cos 𝜃]. Since we were able to express 𝑅𝜃 in the form of a matrix-vector product, it must be the case that 𝑅𝜃 is a linear transformation.
(b) From part (a) we find that
[𝑅𝜃]ℰ = 𝐴 = [cos 𝜃 −sin 𝜃; sin 𝜃 cos 𝜃].
(c) There are two ways in which we can argue why 𝑅𝜃 is onto. Thinking geometrically, note that any vector in R² can be obtained by a rotation from an appropriate starting vector. Equivalently, as Range(𝑅𝜃) = Col([𝑅𝜃]ℰ) by Proposition 5.3.2 (Range of a Linear Transformation), we have
Range(𝑅𝜃) = Span{[cos 𝜃; sin 𝜃], [−sin 𝜃; cos 𝜃]},
and it can be shown that this is R².
(d) Once again, there are two ways in which we can argue why 𝑅𝜃 is one-to-one. Thinking geometrically, note that any two different vectors have different images when they are rotated. Equivalently, as Ker(𝑅𝜃) = Null([𝑅𝜃]ℰ) by Proposition 5.4.2 (Kernel of a Linear Transformation), we consider the system
[cos 𝜃 −sin 𝜃; sin 𝜃 cos 𝜃]𝑥⃗ = 0⃗.
Given that the angle 𝜃 is fixed here, we view sin 𝜃 and cos 𝜃 as constants, so we can reduce this matrix as normal. We wish to show that the rank of the coefficient matrix is 2, so that the only solution to the system above is the trivial solution, allowing us to conclude that this linear transformation is one-to-one.
First, note that if cos 𝜃 = 0 or sin 𝜃 = 0 (conditions which cannot occur simultaneously), then the matrix is either in REF or a row swap will establish an REF. As such, in these scenarios the rank is 2 and the linear transformation is one-to-one.
In the scenarios where sin 𝜃 ≠ 0 and cos 𝜃 ≠ 0, we have
𝑅1 → (sin 𝜃)𝑅1 and 𝑅2 → (cos 𝜃)𝑅2 give [sin 𝜃 cos 𝜃  −sin² 𝜃; sin 𝜃 cos 𝜃  cos² 𝜃],
𝑅2 → 𝑅2 − 𝑅1 gives [sin 𝜃 cos 𝜃  −sin² 𝜃; 0  cos² 𝜃 + sin² 𝜃] = [sin 𝜃 cos 𝜃  −sin² 𝜃; 0  1].
We know sin 𝜃 cos 𝜃 ≠ 0 as sin 𝜃 ≠ 0 and cos 𝜃 ≠ 0, so we can conclude that [sin 𝜃 cos 𝜃  −sin² 𝜃; 0 1] is an REF of [𝑅𝜃]ℰ, so again the rank is 2. Thus, we can say in all cases that 𝑅𝜃 is one-to-one.
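A small numerical illustration of parts (b)–(d) (our own sketch, not part of the notes, assuming NumPy): for any fixed 𝜃 the standard matrix of 𝑅𝜃 has determinant cos²𝜃 + sin²𝜃 = 1 ≠ 0, so its rank is 2 and 𝑅𝜃 is both onto and one-to-one.

import numpy as np

def rotation_matrix(theta):
    """Standard matrix of R_theta from Example 5.6.3."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R = rotation_matrix(np.pi / 3)
print(R @ np.array([1.0, 0.0]))        # [0.5, 0.866...], i.e. e_1 rotated by 60 degrees
print(np.linalg.det(R))                # 1.0 (up to rounding)
print(np.linalg.matrix_rank(R))        # 2, so R_theta is onto and one-to-one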
Example 5.6.4 (Reflection about a line through the origin)
Let 𝑤⃗ = [𝑤1; 𝑤2] be a non-zero vector in R², and consider the reflection transformation refl_𝑤⃗ : R² → R², which reflects any vector 𝑣⃗ ∈ R² about the line Span{𝑤⃗}. This linear transformation is illustrated in the diagram below.
Using the geometric construction above, one can show that any vector 𝑣⃗ ∈ R² gets reflected about the line Span{𝑤⃗} according to the rule below. This is left as an exercise.
refl_𝑤⃗(𝑣⃗) = 𝑣⃗ − 2 perp_𝑤⃗(𝑣⃗).
(a) Prove that refl_𝑤⃗ is a linear transformation.
(b) Determine the standard matrix of refl_𝑤⃗.
(c) Determine whether refl_𝑤⃗ is onto.
(d) Determine whether refl_𝑤⃗ is one-to-one.
Solution:
(a) Let 𝑢⃗, 𝑣⃗ ∈ R² and let 𝑐1, 𝑐2 ∈ R. Since the perpendicular transformation is linear,
refl_𝑤⃗(𝑐1𝑢⃗ + 𝑐2𝑣⃗) = (𝑐1𝑢⃗ + 𝑐2𝑣⃗) − 2 perp_𝑤⃗(𝑐1𝑢⃗ + 𝑐2𝑣⃗)
                 = 𝑐1𝑢⃗ + 𝑐2𝑣⃗ − 2𝑐1 perp_𝑤⃗(𝑢⃗) − 2𝑐2 perp_𝑤⃗(𝑣⃗)
                 = 𝑐1(𝑢⃗ − 2 perp_𝑤⃗(𝑢⃗)) + 𝑐2(𝑣⃗ − 2 perp_𝑤⃗(𝑣⃗))
                 = 𝑐1 refl_𝑤⃗(𝑢⃗) + 𝑐2 refl_𝑤⃗(𝑣⃗).
We conclude that refl_𝑤⃗ is a linear transformation.
(b) To find the standard matrix [refl_𝑤⃗]ℰ of refl_𝑤⃗, note that
refl_𝑤⃗(𝑣⃗) = 𝑣⃗ − 2 perp_𝑤⃗(𝑣⃗) = −𝑣⃗ + 2 proj_𝑤⃗(𝑣⃗).
Since
refl_𝑤⃗(𝑒⃗1) = −[1; 0] + (2/(𝑤1² + 𝑤2²))[𝑤1²; 𝑤1𝑤2] = (1/(𝑤1² + 𝑤2²))[𝑤1² − 𝑤2²; 2𝑤1𝑤2],
refl_𝑤⃗(𝑒⃗2) = −[0; 1] + (2/(𝑤1² + 𝑤2²))[𝑤1𝑤2; 𝑤2²] = (1/(𝑤1² + 𝑤2²))[2𝑤1𝑤2; 𝑤2² − 𝑤1²],
we have that
[refl_𝑤⃗]ℰ = [refl_𝑤⃗(𝑒⃗1) refl_𝑤⃗(𝑒⃗2)] = (1/(𝑤1² + 𝑤2²))[𝑤1² − 𝑤2²  2𝑤1𝑤2; 2𝑤1𝑤2  𝑤2² − 𝑤1²].
(c) There are two ways in which we can argue why refl_𝑤⃗ is onto. Thinking geometrically, note that any vector in R² can be obtained by a reflection from an appropriate starting vector. Equivalently, the range is
Range(refl_𝑤⃗) = Span{[𝑤1² − 𝑤2²; 2𝑤1𝑤2], [2𝑤1𝑤2; 𝑤2² − 𝑤1²]},
and it can be shown that this is equal to R².
(d) Once again, there are two ways in which we can argue why refl_𝑤⃗ is one-to-one. Thinking geometrically, note that any two different vectors have different images when they are reflected. Equivalently, the kernel is obtained by solving
[refl_𝑤⃗]ℰ 𝑥⃗ = 0⃗.
Since the rank of the coefficient matrix is equal to 2, the only solution is the trivial solution, and so this linear transformation is one-to-one.
5.7
Composition of Linear Transformations
In this section, we will consider the composition of linear transformations and show how it
is related to matrix multiplication.
Definition 5.7.1
Composition of Linear Transformations
Let 𝑇1 : F𝑛 → F𝑚 and 𝑇2 : F𝑚 → F𝑝 be linear transformations. We define the function 𝑇2 ∘ 𝑇1 : F𝑛 → F𝑝 by
(𝑇2 ∘ 𝑇1)(𝑥⃗) = 𝑇2(𝑇1(𝑥⃗)).
The function 𝑇2 ∘ 𝑇1 is called the composite function of 𝑇2 and 𝑇1.
Proposition 5.7.2
(Composition of Linear Transformations Is Linear)
Let 𝑇1 : F𝑛 → F𝑚 and 𝑇2 : F𝑚 → F𝑝 be linear transformations. Then 𝑇2 ∘ 𝑇1 is a linear
transformation.
Proof: Since 𝑇1 is linear, we have that for all 𝑥⃗, 𝑦⃗ ∈ F𝑛 and 𝑐 ∈ F,
𝑇1(𝑐𝑥⃗ + 𝑦⃗) = 𝑐𝑇1(𝑥⃗) + 𝑇1(𝑦⃗).
Since 𝑇2 is linear, we have that for all 𝑤⃗, 𝑧⃗ ∈ F𝑚 and 𝑎 ∈ F,
𝑇2(𝑎𝑤⃗ + 𝑧⃗) = 𝑎𝑇2(𝑤⃗) + 𝑇2(𝑧⃗).
Now, consider the action of the composition on 𝑐𝑥⃗ + 𝑦⃗:
(𝑇2 ∘ 𝑇1)(𝑐𝑥⃗ + 𝑦⃗) = 𝑇2(𝑇1(𝑐𝑥⃗ + 𝑦⃗)) = 𝑇2(𝑐𝑇1(𝑥⃗) + 𝑇1(𝑦⃗)).
We can let 𝑇1(𝑥⃗) = 𝑤⃗ and 𝑇1(𝑦⃗) = 𝑧⃗, so that
(𝑇2 ∘ 𝑇1)(𝑐𝑥⃗ + 𝑦⃗) = 𝑇2(𝑐𝑤⃗ + 𝑧⃗)
                 = 𝑐𝑇2(𝑤⃗) + 𝑇2(𝑧⃗)                    (by linearity of 𝑇2)
                 = 𝑐𝑇2(𝑇1(𝑥⃗)) + 𝑇2(𝑇1(𝑦⃗))
                 = 𝑐(𝑇2 ∘ 𝑇1)(𝑥⃗) + (𝑇2 ∘ 𝑇1)(𝑦⃗).
Therefore, 𝑇2 ∘ 𝑇1 is a linear transformation.
Proposition 5.7.3
(The Standard Matrix of a Composition of Linear Transformations)
Let 𝑇1 : F𝑛 → F𝑚 and 𝑇2 : F𝑚 → F𝑝 be linear transformations. Then the standard matrix
of 𝑇2 ∘ 𝑇1 is equal to the product of standard matrices of 𝑇2 and 𝑇1 . That is,
[𝑇2 ∘ 𝑇1 ]ℰ = [𝑇2 ]ℰ [𝑇1 ]ℰ .
Proof: Let 𝐴 = [𝑇1]ℰ, 𝐵 = [𝑇2]ℰ, and let 𝑇𝐵𝐴 : F𝑛 → F𝑝 denote the linear transformation determined by the matrix 𝐵𝐴. Note that, for all 𝑣⃗ ∈ F𝑛,
(𝑇2 ∘ 𝑇1)(𝑣⃗) = 𝑇2(𝑇1(𝑣⃗)) = 𝑇2(𝐴𝑣⃗) = 𝐵(𝐴𝑣⃗) = (𝐵𝐴)𝑣⃗ = 𝑇𝐵𝐴(𝑣⃗).
We conclude that the linear transformations 𝑇2 ∘ 𝑇1 and 𝑇𝐵𝐴 are identical. Consequently, (𝑇2 ∘ 𝑇1)(𝑒⃗𝑖) = 𝑇𝐵𝐴(𝑒⃗𝑖) for all 𝑖 = 1, ..., 𝑛. But then
[𝑇2 ∘ 𝑇1]ℰ = [(𝑇2 ∘ 𝑇1)(𝑒⃗1) ··· (𝑇2 ∘ 𝑇1)(𝑒⃗𝑛)]
           = [𝑇𝐵𝐴(𝑒⃗1) ··· 𝑇𝐵𝐴(𝑒⃗𝑛)]
           = [(𝐵𝐴)𝑒⃗1 ··· (𝐵𝐴)𝑒⃗𝑛]
           = 𝐵𝐴                                        (by Lemma 4.2.2 (Column Extraction))
           = [𝑇2]ℰ [𝑇1]ℰ.
This result gives us a very efficient way of obtaining the matrix representation of a composite linear transformation. It also explains a motivation behind the definition of matrix
multiplication.
Example 5.7.4
Let $A = \begin{bmatrix} 1+i & -3i & -2 \\ 2 & 1+2i & 4i \end{bmatrix}$ be a matrix in $M_{2\times 3}(\mathbb{C})$ and $B = \begin{bmatrix} 3i & 1+i \\ 1 & 0 \\ 1-i & 3-2i \end{bmatrix}$ be a matrix in $M_{3\times 2}(\mathbb{C})$. Let $T_A$ and $T_B$ be the linear transformations determined by $A$ and $B$, respectively. Determine the standard matrix of their composite linear transformation $T_B \circ T_A$.
Solution: By Proposition 5.7.3 (The Standard Matrix of a Composition of Linear Transformations),
$[T_B \circ T_A]_{\mathcal{E}} = [T_B]_{\mathcal{E}}[T_A]_{\mathcal{E}} = BA.$
Therefore,
$[T_B \circ T_A]_{\mathcal{E}} = \begin{bmatrix} 3i & 1+i \\ 1 & 0 \\ 1-i & 3-2i \end{bmatrix}\begin{bmatrix} 1+i & -3i & -2 \\ 2 & 1+2i & 4i \end{bmatrix} = \begin{bmatrix} -1+5i & 8+3i & -4-2i \\ 1+i & -3i & -2 \\ 8-4i & 4+i & 6+14i \end{bmatrix}.$
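As a quick numerical check of Proposition 5.7.3 (this sketch is ours and is not part of the original notes; it assumes NumPy is available), the following Python snippet confirms, for the matrices of Example 5.7.4, that applying $T_B$ after $T_A$ to each standard basis vector agrees with multiplying by $BA$.

```python
import numpy as np

# Matrices from Example 5.7.4 (complex entries).
A = np.array([[1 + 1j, -3j, -2],
              [2, 1 + 2j, 4j]])          # standard matrix of T_A : C^3 -> C^2
B = np.array([[3j, 1 + 1j],
              [1, 0],
              [1 - 1j, 3 - 2j]])         # standard matrix of T_B : C^2 -> C^3

BA = B @ A                               # standard matrix of T_B o T_A

# (T_B o T_A)(e_j) should equal (BA) e_j for every standard basis vector e_j.
for j in range(3):
    e_j = np.zeros(3, dtype=complex)
    e_j[j] = 1
    assert np.allclose(B @ (A @ e_j), BA @ e_j)

print(BA)
```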
Example 5.7.5
Find the standard matrix of the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ which is defined by $T_1$, a rotation counter-clockwise about the origin by an angle of $\pi/3$ radians, followed by $T_2$, a projection onto the line $y = -3x$.
Solution: From Example 5.6.3 we have that
$[T_1]_{\mathcal{E}} = [R_{\pi/3}]_{\mathcal{E}} = \begin{bmatrix} \cos(\pi/3) & -\sin(\pi/3) \\ \sin(\pi/3) & \cos(\pi/3) \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & -\tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{2} & \tfrac{1}{2} \end{bmatrix}.$
Next, note that the line $y = -3x$ can be written as $\mathrm{Span}\{\vec{w}\}$, where $\vec{w} = \begin{bmatrix} 1 & -3 \end{bmatrix}^T$. From Example 5.6.1 we have that
$[T_2]_{\mathcal{E}} = [\mathrm{proj}_{\vec{w}}]_{\mathcal{E}} = \frac{1}{1 + (-3)^2}\begin{bmatrix} 1 & -3 \\ -3 & 9 \end{bmatrix} = \frac{1}{10}\begin{bmatrix} 1 & -3 \\ -3 & 9 \end{bmatrix}.$
Thus, we have
$[T]_{\mathcal{E}} = [T_2]_{\mathcal{E}}[T_1]_{\mathcal{E}} = \frac{1}{10}\begin{bmatrix} 1 & -3 \\ -3 & 9 \end{bmatrix}\begin{bmatrix} \tfrac{1}{2} & -\tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{2} & \tfrac{1}{2} \end{bmatrix} = \frac{1}{20}\begin{bmatrix} 1 - 3\sqrt{3} & -\sqrt{3} - 3 \\ -3 + 9\sqrt{3} & 3\sqrt{3} + 9 \end{bmatrix}.$
Definition 5.7.6 (Identity Transformation)
The linear transformation $\mathrm{id}_n : \mathbb{F}^n \to \mathbb{F}^n$ such that $\mathrm{id}_n(\vec{x}) = \vec{x}$ for all $\vec{x} \in \mathbb{F}^n$ is called the identity transformation.
EXERCISE
Show that the standard matrix [id𝑛 ]ℰ of id𝑛 is the identity matrix 𝐼𝑛 .
A special case of composition arises when 𝑇 : F𝑛 → F𝑛 has the same domain and codomain,
in which case we can apply the linear transformation 𝑇 more than one time.
Definition 5.7.7 ($T^p$)
Let $T : \mathbb{F}^n \to \mathbb{F}^n$ and let $p \geq 1$ be an integer. We define the $p$-th power of $T$, denoted by $T^p$, inductively by
$T^p = T \circ T^{p-1}.$
We also define $T^0 = \mathrm{id}_n$.
Corollary 5.7.8
Let $T : \mathbb{F}^n \to \mathbb{F}^n$ be a linear transformation and let $p \geq 1$ be an integer. Then the standard matrix of $T^p$ is the $p$-th power of the standard matrix of $T$. That is,
$[T^p]_{\mathcal{E}} = ([T]_{\mathcal{E}})^p.$
Proof: Use induction and Proposition 5.7.3 (The Standard Matrix of a Composition of
Linear Transformations).
Example 5.7.9
Consider the linear transformation defined by a rotation about the origin by an angle of $\theta$ counter-clockwise in the plane, denoted by $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$. In Example 5.6.3 we found that $[R_\theta]_{\mathcal{E}} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Determine the standard matrix of $R_\theta^2$.
Solution: We have
$[R_\theta^2]_{\mathcal{E}} = ([R_\theta]_{\mathcal{E}})^2 = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\cos\theta\sin\theta \\ 2\cos\theta\sin\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}.$
From here, we can make use of the trigonometric double-angle identities,
$\cos(2\theta) = \cos^2\theta - \sin^2\theta \quad\text{and}\quad \sin(2\theta) = 2\cos\theta\sin\theta,$
to obtain
$[R_\theta^2]_{\mathcal{E}} = \begin{bmatrix} \cos(2\theta) & -\sin(2\theta) \\ \sin(2\theta) & \cos(2\theta) \end{bmatrix},$
which is the standard matrix for a rotation of $2\theta$ counter-clockwise.
Chapter 6
The Determinant
6.1
The Definition of the Determinant
When is an $n \times n$ matrix $A$ invertible? If $n = 1$, then $A = \begin{bmatrix} a \end{bmatrix}$ is invertible if and only if $a \neq 0$, in which case $A^{-1} = \begin{bmatrix} \tfrac{1}{a} \end{bmatrix}$. If $n = 2$, we saw in Proposition 4.6.13 (Inverse of a $2 \times 2$ Matrix) that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible if and only if $ad - bc \neq 0$.
Thus, the invertibility of a $1 \times 1$ or $2 \times 2$ matrix can be completely determined by examining a number formed using the entries of the matrix. We give this number a name.
Definition 6.1.1 (Determinant of a $1 \times 1$ and $2 \times 2$ Matrix)
If $A = \begin{bmatrix} a_{11} \end{bmatrix}$ is in $M_{1\times 1}(\mathbb{F})$, then the determinant of $A$, denoted by $\det(A)$, is
$\det(A) = a_{11}.$
If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is in $M_{2\times 2}(\mathbb{F})$, then the determinant of $A$, denoted by $\det(A)$, is
$\det(A) = a_{11}a_{22} - a_{12}a_{21}.$
Example 6.1.2
Let $A = \begin{bmatrix} 4 - i \end{bmatrix}$. Then $\det(A) = 4 - i$.
Example 6.1.3
Let $B = \begin{bmatrix} 6 & 8 \\ -3 & 2 \end{bmatrix}$. Then $\det(B) = 6(2) - 8(-3) = 36$.
In the remainder of this section, we will generalize the above and define the determinant of
an 𝑛 × 𝑛 matrix. One of our main results will be that an 𝑛 × 𝑛 matrix is invertible if and
only if its determinant is non-zero, but it will take some time to get there (see Theorem
6.3.1 (Invertible iff the Determinant is Non-Zero)).
We will need some preliminary definitions.
Definition 6.1.4 ($(i,j)$-th Submatrix, $(i,j)$-th Minor)
Let $A \in M_{n\times n}(\mathbb{F})$. The $(i,j)$-th submatrix of $A$, denoted by $M_{ij}(A)$, is the $(n-1) \times (n-1)$ matrix obtained from $A$ by removing the $i$-th row and the $j$-th column from $A$. The determinant of $M_{ij}(A)$ is known as the $(i,j)$-th minor of $A$.
Example 6.1.5
If $A = \begin{bmatrix} 4 & 6 & 8 \\ 6 & -3 & 2 \\ 5 & 7 & 9 \end{bmatrix}$, then $M_{22}(A) = \begin{bmatrix} 4 & 8 \\ 5 & 9 \end{bmatrix}$ and $M_{31}(A) = \begin{bmatrix} 6 & 8 \\ -3 & 2 \end{bmatrix}$.
Definition 6.1.6 (Determinant of an $n \times n$ Matrix)
Let $A \in M_{n\times n}(\mathbb{F})$ for $n \geq 2$. We define the determinant function, $\det : M_{n\times n}(\mathbb{F}) \to \mathbb{F}$, by
$\det(A) = \sum_{j=1}^{n} a_{1j}(-1)^{1+j}\det(M_{1j}(A)).$
Example 6.1.7
If $n = 2$, so that $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, the above definition gives
$\det(A) = a_{11}(-1)^{1+1}\det(\begin{bmatrix} a_{22} \end{bmatrix}) + a_{12}(-1)^{1+2}\det(\begin{bmatrix} a_{21} \end{bmatrix}) = a_{11}a_{22} - a_{12}a_{21}.$
Thus, we recover the same expression given in Definition 6.1.1.
The expression for $\det(A)$ in Definition 6.1.6 is called the expansion along the first row of the determinant. It is recursive in nature: we must multiply the entries in the first row of the matrix $A$ by $\pm 1$ and then by the determinant of a $(1,j)$-th submatrix.
For a $3 \times 3$ matrix, this process will involve the evaluation of three determinants of $2 \times 2$ matrices.
For a $4 \times 4$ matrix, this process will involve the evaluation of four determinants of $3 \times 3$ matrices, each of which involves the evaluation of three determinants of $2 \times 2$ matrices.
This pattern continues as we increase $n$.
In the next section we will learn techniques that will permit us to occasionally avoid too
many tedious computations.
Example 6.1.8
Find the determinant of the matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$.
Solution: Using the definition, we have
$\det(A) = \sum_{j=1}^{3} a_{1j}(-1)^{1+j}\det(M_{1j}(A))$
$= a_{11}(-1)^{1+1}\det(M_{11}(A)) + a_{12}(-1)^{1+2}\det(M_{12}(A)) + a_{13}(-1)^{1+3}\det(M_{13}(A))$
$= (1)(1)\det\begin{bmatrix} 5 & 6 \\ 8 & 10 \end{bmatrix} + (2)(-1)\det\begin{bmatrix} 4 & 6 \\ 7 & 10 \end{bmatrix} + (3)(1)\det\begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix}$
$= 1(50 - 48) - 2(40 - 42) + 3(32 - 35) = -3.$
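Definition 6.1.6 translates directly into a short recursive routine. The following Python sketch is ours (it is not part of the notes); it implements the first-row expansion for exact integer entries and reproduces the value $-3$ from Example 6.1.8. It is meant purely as an illustration of the recursion, since this approach takes on the order of $n!$ steps for an $n \times n$ matrix.

```python
def det_first_row(A):
    """Determinant via expansion along the first row (Definition 6.1.6).

    A is a square list-of-lists with numeric entries.  Illustrative only:
    the recursion mirrors the definition and is very slow for large n.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # M_1j: delete row 1 (index 0) and column j+1 (index j).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1) ** j equals (-1) ** (1 + (j + 1)) in the 1-indexed notation.
        total += A[0][j] * (-1) ** j * det_first_row(minor)
    return total

# The matrix from Example 6.1.8; the result should be -3.
print(det_first_row([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
```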
Example 6.1.9
Find the determinant of the matrix $B = \begin{bmatrix} 1 & 2i & 3 \\ 4-i & 5+2i & 6 \\ 7+3i & 8-2i & 10 \end{bmatrix}$.
Solution: Using the definition, we have
$\det(B) = \sum_{j=1}^{3} b_{1j}(-1)^{1+j}\det(M_{1j}(B))$
$= b_{11}(-1)^{1+1}\det(M_{11}(B)) + b_{12}(-1)^{1+2}\det(M_{12}(B)) + b_{13}(-1)^{1+3}\det(M_{13}(B))$
$= (1)(1)\det\begin{bmatrix} 5+2i & 6 \\ 8-2i & 10 \end{bmatrix} + (2i)(-1)\det\begin{bmatrix} 4-i & 6 \\ 7+3i & 10 \end{bmatrix} + (3)(1)\det\begin{bmatrix} 4-i & 5+2i \\ 7+3i & 8-2i \end{bmatrix}$
$= 1(50 + 20i - (48 - 12i)) - 2i(40 - 10i - (42 + 18i)) + 3(30 - 16i - (29 + 29i))$
$= -51 - 99i.$
One of the remarkable features of the determinant is that it may be evaluated by performing
an expansion along any row, not just the first row. We state this result below. The proof
of this result is quite lengthy and is omitted.
Proposition 6.1.10 ($i$-th Row Expansion of the Determinant)
Let $A \in M_{n\times n}(\mathbb{F})$ with $n \geq 2$ and let $i \in \{1, \ldots, n\}$. Then
$\det(A) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(M_{ij}(A)).$
Notice that if 𝑖 = 1 then the above expression reduces to the definition of det(𝐴).
Example 6.1.11
Evaluate the determinant of the matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$ using an expansion along the second row.
Solution: We have
$\det(A) = \sum_{j=1}^{3} a_{2j}(-1)^{2+j}\det(M_{2j}(A))$
$= a_{21}(-1)^{2+1}\det(M_{21}(A)) + a_{22}(-1)^{2+2}\det(M_{22}(A)) + a_{23}(-1)^{2+3}\det(M_{23}(A))$
$= (4)(-1)\det\begin{bmatrix} 2 & 3 \\ 8 & 10 \end{bmatrix} + (5)(1)\det\begin{bmatrix} 1 & 3 \\ 7 & 10 \end{bmatrix} + (6)(-1)\det\begin{bmatrix} 1 & 2 \\ 7 & 8 \end{bmatrix}$
$= -4(20 - 24) + 5(10 - 21) - 6(8 - 14) = -3,$
just as we found in Example 6.1.8.
A further remarkable feature of the determinant is that it may also be evaluated by expanding along any column.
Proposition 6.1.12 ($j$-th Column Expansion of the Determinant)
Let $A \in M_{n\times n}(\mathbb{F})$ with $n \geq 2$ and let $j \in \{1, \ldots, n\}$. Then
$\det(A) = \sum_{i=1}^{n} a_{ij}(-1)^{i+j}\det(M_{ij}(A)).$
Example 6.1.13
Evaluate the determinant of the matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$ using an expansion along the third column.
Solution: We have
$\det(A) = \sum_{i=1}^{3} a_{i3}(-1)^{i+3}\det(M_{i3}(A))$
$= a_{13}(-1)^{1+3}\det(M_{13}(A)) + a_{23}(-1)^{2+3}\det(M_{23}(A)) + a_{33}(-1)^{3+3}\det(M_{33}(A))$
$= (3)(1)\det\begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} + (6)(-1)\det\begin{bmatrix} 1 & 2 \\ 7 & 8 \end{bmatrix} + (10)(1)\det\begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}$
$= 3(32 - 35) - 6(8 - 14) + 10(5 - 8) = -3.$
Thus, you may evaluate the determinant by expanding along any row or column of your choice. In practice, it is usually easier to choose a row or column with many zeros, since this will reduce the number of determinants of submatrices that need to be computed.
Example 6.1.14
Evaluate the determinant of $A = \begin{bmatrix} 1 & 0 & 0 & 5 \\ 3 & 2 & 0 & 7 \\ 567 & 234 & 14 & 235 \\ 4 & 5 & 0 & 3 \end{bmatrix}$.
Solution: We perform an expansion along the third column.
$\det(A) = \sum_{i=1}^{4} a_{i3}(-1)^{i+3}\det(M_{i3}(A))$
$= 0(1)\det(M_{13}(A)) + 0(-1)\det(M_{23}(A)) + 14(1)\det(M_{33}(A)) + 0(-1)\det(M_{43}(A))$
$= 14\det\begin{bmatrix} 1 & 0 & 5 \\ 3 & 2 & 7 \\ 4 & 5 & 3 \end{bmatrix}.$
From here, we need to proceed with another expansion along a row or column. Row 1 now looks "ideal", with the most zeros of any row. Doing so, we continue our calculation below:
$\det(A) = 14\det\begin{bmatrix} 1 & 0 & 5 \\ 3 & 2 & 7 \\ 4 & 5 & 3 \end{bmatrix} = 14\left[(1)\det\begin{bmatrix} 2 & 7 \\ 5 & 3 \end{bmatrix} + (5)\det\begin{bmatrix} 3 & 2 \\ 4 & 5 \end{bmatrix}\right] = 14[(6 - 35) + 5(15 - 8)] = 14[-29 + 35] = 14(6) = 84.$
The recursive definition of the determinant makes its direct application tedious. There are,
however, certain situations where the determinant is relatively easy to compute.
Proposition 6.1.15 (Easy Determinants)
Let $A \in M_{n\times n}(\mathbb{F})$ be a square matrix.
(a) If $A$ has a row consisting only of zeros, then $\det A = 0$.
(b) If $A$ has a column consisting only of zeros, then $\det A = 0$.
(c) If $A = \begin{bmatrix} a_{11} & * & * & \cdots & * \\ 0 & a_{22} & * & \cdots & * \\ 0 & 0 & a_{33} & \cdots & * \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & a_{nn} \end{bmatrix}$ is upper triangular, then $\det A = a_{11}a_{22}\cdots a_{nn}$.
Proof: For parts (a) and (b), simply perform the expansion along the row/column consisting of 0s.
To prove part (c), we will proceed by induction on $n$. If $n = 1$, then the result follows immediately from the definition of the determinant of a $1 \times 1$ matrix. For the inductive step, we will assume that the result is true when $n = k$ and consider a $(k+1) \times (k+1)$ upper triangular matrix
$A = \begin{bmatrix} a_{11} & * & * & \cdots & * \\ 0 & a_{22} & * & \cdots & * \\ 0 & 0 & a_{33} & \cdots & * \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & a_{k+1,k+1} \end{bmatrix}.$
Using an expansion along the first column, we obtain
$\det(A) = a_{11}\det(M_{11}(A)) = a_{11}\det\begin{bmatrix} a_{22} & * & \cdots & * \\ 0 & a_{33} & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & a_{k+1,k+1} \end{bmatrix}.$
Since $M_{11}(A)$ is a $k \times k$ matrix, we may apply the inductive hypothesis to conclude that $\det M_{11}(A) = a_{22}\cdots a_{k+1,k+1}$. It follows that $\det(A) = a_{11}a_{22}\cdots a_{k+1,k+1}$, as required.
Corollary 6.1.16
The determinant of the 𝑛 × 𝑛 identity matrix is 1, that is, det(𝐼𝑛 ) = 1.
Proof: This follows from part (c) of the previous Proposition.
The determinant of a lower triangular square matrix is also the product of its diagonal
entries. This follows at once from the following result and Proposition 6.1.15 (Easy Determinants).
Proposition 6.1.17
Let $A \in M_{n\times n}(\mathbb{F})$. Then $\det(A) = \det(A^T)$.
Proof: We proceed by induction on $n$. If $n = 1$, then $A = A^T$ and there is nothing to prove. Thus, assume that the result is true for $n = k$ and consider a $(k+1) \times (k+1)$ matrix $A$. Let $B = A^T$. Notice that $b_{ij} = a_{ji}$ and that $M_{ij}(B) = (M_{ji}(A))^T$. Consequently, using an expansion along the first row of $B$, we have
$\det(B) = \sum_{j=1}^{k+1} b_{1j}(-1)^{1+j}\det(M_{1j}(B)) = \sum_{j=1}^{k+1} a_{j1}(-1)^{1+j}\det((M_{j1}(A))^T).$
Now, since $M_{j1}(A)$ is a $k \times k$ matrix, we have $\det((M_{j1}(A))^T) = \det(M_{j1}(A))$ by the inductive hypothesis. So, the above expression for $\det(B) = \det(A^T)$ becomes
$\det(A^T) = \sum_{j=1}^{k+1} a_{j1}(-1)^{1+j}\det(M_{j1}(A)).$
The right side is the determinant expansion along the first column of $A$; that is, it is equal to $\det(A)$. Thus, $\det(A^T) = \det(A)$.
6.2
Computing the Determinant in Practice: EROs
We will now outline a practical method for computing the determinant of a general square
matrix 𝐴. The basic idea is to perform EROs on 𝐴, thereby reducing it to a matrix 𝐵, whose
determinant is more easily computed—for instance, 𝐵 could have a lot of 0 entries. Since
EROs will turn out to have a well-understood effect on the determinant (as we describe
below), we can obtain det(𝐴) from det(𝐵).
Theorem 6.2.1
(Effect of EROs on the Determinant)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F).
(a) (Row swap) If 𝐵 is obtained from 𝐴 by interchanging two rows, then det(𝐵) = − det(𝐴).
(b) (Row scale) If 𝐵 is obtained from 𝐴 by multiplying a row by 𝑚 ̸= 0, then
det(𝐵) = 𝑚 det(𝐴).
(c) (Row addition) If 𝐵 is obtained from 𝐴 by adding a non-zero multiple of one row to
another row, then det(𝐵) = det(𝐴).
REMARK
Since det(𝐴) = det(𝐴𝑇 ), the previous Theorem remains true if we replace every instance of
the word “row” with the word “column”.
WARNING: Do not use such “column operations” when performing calculations other
than a determinant, as column operations disrupt many properties of a matrix.
Example 6.2.2
Let $A = \begin{bmatrix} 2 & -1 & 0 \\ 0 & -3 & 1 \\ 0 & 0 & 3 \end{bmatrix}$. Since $A$ is upper triangular, we have that $\det(A) = (2)(-3)(3) = -18$.
Let $B = \begin{bmatrix} 2 & -1 & 0 \\ 0 & -3 & 1 \\ 0 & -6 & 5 \end{bmatrix}$. Notice that $B$ may be obtained from $A$ by adding 2 times the second row of $A$ to the third row of $A$. Thus, $\det(B) = \det(A) = -18$.
Let $C = \begin{bmatrix} 4 & -2 & 0 \\ 0 & -3 & 1 \\ 0 & -6 & 5 \end{bmatrix}$. Notice that $C$ may be obtained from $B$ by multiplying the first row by 2. Thus, $\det(C) = 2\det(B) = -36$.
Finally, let $D = \begin{bmatrix} 4 & -2 & 0 \\ 0 & -6 & 5 \\ 0 & -3 & 1 \end{bmatrix}$, which is obtained from $C$ by interchanging the second and third rows. Thus, $\det(D) = -\det(C) = 36$.
The proof of Theorem 6.2.1 (Effect of EROs on the Determinant) involves lengthy (but
straightforward) computations. We will not give it in this course. Instead, we will extract
from the Theorem some useful corollaries.
Corollary 6.2.3
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). If 𝐴 has two identical rows (or two identical columns), then det(𝐴) = 0.
Proof: If 𝐴 has two identical rows, we can use a row operation to subtract one of these rows
from the other. The resulting matrix 𝐵 will have a row of zeros. Consequently, det(𝐵) = 0
by Proposition 6.1.15 (Easy Determinants). From Theorem 6.2.1 (Effect of EROs on the
Determinant), we know that row addition EROs do not change the determinant. Thus, we
conclude that det(𝐴) = det(𝐵) = 0.
If 𝐴 has two identical columns, we can apply the previous argument to 𝐴𝑇 .
Corollary 6.2.4
(Determinants of Elementary Matrices)
For each part below, let 𝐸 be an elementary matrix of the indicated type.
(a) (Row swap) det(𝐸) = −1.
(b) (Row scale) det(𝐸) = 𝑚 (if 𝐸 is obtained from 𝐼𝑛 by multiplying a row by 𝑚 ̸= 0).
(c) (Row addition) det(𝐸) = 1.
Proof: Combine the fact that det(𝐼𝑛 ) = 1 with Theorem 6.2.1 (Effect of EROs on the
Determinant).
Corollary 6.2.5
(Determinant After One ERO)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and suppose we perform a single ERO on 𝐴 to produce the matrix 𝐵.
Assume that the corresponding elementary matrix is 𝐸. Then
det(𝐵) = det(𝐸) det(𝐴).
Proof: Combine Theorem 6.2.1 (Effect of EROs on the Determinant) and Corollary 6.2.4
(Determinants of Elementary Matrices).
Corollary 6.2.6
(Determinant After 𝑘 EROs)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and suppose we perform a sequence of 𝑘 EROs on the matrix 𝐴 to obtain
the matrix 𝐵.
Suppose that the elementary matrix corresponding to the 𝑖th ERO is 𝐸𝑖 , so that
𝐵 = 𝐸𝑘 · · · 𝐸2 𝐸1 𝐴.
Then
det(𝐵) = det(𝐸𝑘 · · · 𝐸2 𝐸1 𝐴) = det(𝐸𝑘 ) · · · det(𝐸2 ) det(𝐸1 ) det(𝐴).
Proof: We proceed by induction on 𝑘. If 𝑘 = 1, this is the previous Corollary. Now
assume that the result is true for 𝑘 = ℓ and consider the case where 𝑘 = ℓ + 1. We then
have 𝐵 = 𝐸ℓ+1 𝐸ℓ · · · 𝐸1 𝐴, and therefore,
det(𝐵) = det(𝐸ℓ+1 ) det(𝐸ℓ · · · 𝐸1 𝐴)
(Corollary 6.2.5)
= det(𝐸ℓ+1 ) det(𝐸ℓ ) · · · det(𝐸1 ) det(𝐴)
(inductive hypothesis).
The result now follows.
We will give an example that illustrates how to use the preceding results to obtain the
determinant of an arbitrary matrix. We will attempt to apply EROs to simplify 𝐴, keeping
track of what EROs are being applied. Then we can relate det(𝐴) to det(𝐵) using Corollary
6.2.6 (Determinant After 𝑘 EROs). Notice that row addition EROs do not change the
determinant, so we need only keep track of row swap and row scale EROs.
Example 6.2.7
Evaluate the determinant of $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$.
Solution: We have
$\det(A) = \det\begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{bmatrix}$  (perform $R_2 \to -4R_1 + R_2$ and $R_3 \to -7R_1 + R_3$; determinant unchanged)
$= \det\begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{bmatrix}$  (perform $R_3 \to -2R_2 + R_3$; determinant unchanged)
$= (1)(-3)(1)$  (determinant of an upper triangular matrix)
$= -3.$
Example 6.2.8
The previous example used only row addition EROs. Our next example will use all three types. Notice that if we apply a row scale ERO to $A$ to obtain $B$, then $\det(B) = m\det(A)$. Thus, $\det(A) = \frac{1}{m}\det(B)$.
Evaluate the determinant of $A = \begin{bmatrix} 2 & -2 & 4 \\ 5 & -5 & 1 \\ 3 & 6 & 5 \end{bmatrix}$.
Solution: We have
$\det(A) = 2\det\begin{bmatrix} 1 & -1 & 2 \\ 5 & -5 & 1 \\ 3 & 6 & 5 \end{bmatrix}$  (perform $R_1 \to \tfrac{1}{2}R_1$; determinant scaled by $\tfrac{1}{2}$, and $\tfrac{1}{1/2} = 2$)
$= 2\det\begin{bmatrix} 1 & -1 & 2 \\ 0 & 0 & -9 \\ 0 & 9 & -1 \end{bmatrix}$  (perform $R_2 \to -5R_1 + R_2$ and $R_3 \to -3R_1 + R_3$; determinant unchanged)
$= -2\det\begin{bmatrix} 1 & -1 & 2 \\ 0 & 9 & -1 \\ 0 & 0 & -9 \end{bmatrix}$  (perform $R_2 \leftrightarrow R_3$; sign of determinant changed)
$= (-2)(1)(9)(-9)$  (determinant of an upper triangular matrix)
$= 162.$
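The strategy of Examples 6.2.7 and 6.2.8 can be mechanized: row-reduce to an upper triangular matrix while tracking the effect of each ERO. The following Python sketch is ours and is not part of the notes; it uses the standard-library fractions module for exact arithmetic and avoids row scaling altogether, so only row swaps change the determinant.

```python
from fractions import Fraction

def det_by_eros(A):
    """Determinant by forward elimination, tracking EROs as in Theorem 6.2.1.

    Row additions leave the determinant unchanged, so only row swaps
    (sign changes) need to be recorded."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    sign = 1
    for i in range(n):
        # Find a pivot in column i; swap it into row i if necessary.
        pivot = next((r for r in range(i, n) if A[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)           # the matrix is singular
        if pivot != i:
            A[i], A[pivot] = A[pivot], A[i]
            sign = -sign                 # a row swap flips the sign
        for r in range(i + 1, n):        # row additions: determinant unchanged
            factor = A[r][i] / A[i][i]
            A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    result = Fraction(sign)
    for i in range(n):                   # upper triangular: product of diagonal
        result *= A[i][i]
    return result

print(det_by_eros([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3, as in Example 6.2.7
print(det_by_eros([[2, -2, 4], [5, -5, 1], [3, 6, 5]]))  # 162, as in Example 6.2.8
```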
6.3
The Determinant and Invertibility
In this section, we will use our results on the relationship between EROs and the determinant
to prove the following fundamental result.
Theorem 6.3.1
(Invertible iff the Determinant is Non-Zero)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 𝐴 is invertible if and only if det(𝐴) ̸= 0.
Proof: Let 𝑅 = RREF(𝐴). Suppose we obtain 𝑅 from 𝐴 by applying 𝑘 EROs. Thus,
𝑅 = 𝐸𝑘 · · · 𝐸1 𝐴,
where 𝐸𝑖 is the elementary matrix corresponding to the 𝑖th ERO. Corollary 6.2.6 (Determinant After 𝑘 EROs) then implies that
det(𝑅) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝐴).
By Corollary 6.2.4 (Determinants of Elementary Matrices), since det(𝐸𝑖 ) ̸= 0 for all 𝑖, it
follows that det(𝑅) ̸= 0 if and only if det(𝐴) ̸= 0.
Now, if 𝐴 is invertible, then 𝑅 = 𝐼𝑛 , and therefore det(𝑅) = 1 ̸= 0. So det(𝐴) ̸= 0.
Conversely, if 𝐴 is not invertible, then 𝑅 must have a zero row, and therefore det(𝑅) = 0.
So det(𝐴) = 0 in this case. This completes the proof.
Example 6.3.2
For what value(s) of the constant $k$ is the matrix $A = \begin{bmatrix} 3 & 5 & 7 \\ 6 & k & 14 \\ 2 & 4 & 6 \end{bmatrix}$ invertible?
Solution: Let us evaluate the determinant of $A$.
$\det(A) = \det\begin{bmatrix} 3 & 5 & 7 \\ 0 & k-10 & 0 \\ 2 & 4 & 6 \end{bmatrix}$  ($R_2 \to -2R_1 + R_2$)
$= 2\det\begin{bmatrix} 3 & 5 & 7 \\ 0 & k-10 & 0 \\ 1 & 2 & 3 \end{bmatrix}$  ($R_3 \to \tfrac{1}{2}R_3$)
$= 2\det\begin{bmatrix} 0 & -1 & -2 \\ 0 & k-10 & 0 \\ 1 & 2 & 3 \end{bmatrix}$  ($R_1 \to -3R_3 + R_1$)
$= 4(k - 10)$  (expansion along the second row).
Thus, $\det(A) = 0$ if and only if $k = 10$. We conclude that the matrix $A$ is invertible if and only if $k \neq 10$.
The following result is one of the most important properties of the determinant.
Proposition 6.3.3
(Determinant of a Product)
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Then det(𝐴𝐵) = det(𝐴) det(𝐵).
Proof: As in the proof of Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero), we
may write 𝐴 as
𝐴 = 𝐸𝑘 · · · 𝐸1 𝑅
where 𝑅 = RREF(𝐴) and the 𝐸𝑖 are elementary matrices that correspond to the EROs
needed to operate on 𝑅 to produce 𝐴. Then Corollary 6.2.6 (Determinant After 𝑘 EROs)
gives
det(𝐴) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝑅).
Now, suppose 𝐴 is invertible, so that 𝑅 = 𝐼𝑛 . On the one hand,
det(𝐴) = det(𝐸𝑘 ) · · · det(𝐸1 ),
while on the other hand, 𝐴𝐵 = (𝐸𝑘 · · · 𝐸1 𝑅)𝐵 = 𝐸𝑘 · · · 𝐸1 𝐵 (as 𝑅 = 𝐼𝑛 ) so that
det(𝐴𝐵) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝐵),
again by Corollary 6.2.6. The right-side is det(𝐴) det(𝐵). Thus, the Proposition is true if
𝐴 is invertible.
If 𝐴 is not invertible, then 𝑅 has a zero row, and therefore so does 𝑅𝐵. Since
𝐴𝐵 = 𝐸𝑘 · · · 𝐸1 (𝑅𝐵)
we can apply Corollary 6.2.6 (Determinant After 𝑘 EROs) once more to find that
det(𝐴𝐵) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝑅𝐵) = 0.
Since det(𝐴) is 0 in this case too, we have that det(𝐴𝐵) = det(𝐴) det(𝐵).
REMARK
It is not true that $\det(A + B) = \det(A) + \det(B)$ in general. For example, if
$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},$
then $\det(A) = \det(B) = 0$, so that $\det(A) + \det(B) = 0$. On the other hand, $A + B = I_2$, so $\det(A + B) = 1$. Thus, $\det(A + B) \neq \det(A) + \det(B)$ in this case.
Example 6.3.4
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Prove that 𝐴𝐵 is invertible if and only if 𝐵𝐴 is invertible.
Solution:
𝐴𝐵 is invertible iff det(𝐴𝐵) ̸= 0
(Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero))
iff det(𝐴) det(𝐵) ̸= 0 (Proposition 6.3.3 (Determinant of a Product))
iff det(𝐵) det(𝐴) ̸= 0
iff det(𝐵𝐴) ̸= 0
(Proposition 6.3.3)
iff 𝐵𝐴 is invertible.
(Theorem 6.3.1)
The previous example implicitly contains the following useful observation.
Corollary 6.3.5
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Then det(𝐴𝐵) = det(𝐵𝐴).
Here is another useful observation that can be proved using similar ideas.
Corollary 6.3.6 (Determinant of Inverse)
Let $A \in M_{n\times n}(\mathbb{F})$ be invertible. Then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.
Proof: We have 𝐴−1 𝐴 = 𝐼𝑛 . Using Proposition 6.3.3 (Determinant of a Product) and
taking determinants of both sides, we get det(𝐴−1 ) det(𝐴) = 1. Since 𝐴 is invertible, we
have that det(𝐴) ̸= 0, so we may divide through by det(𝐴) to obtain det(𝐴−1 ) = 1/ det(𝐴),
as desired.
Example 6.3.7
Prove that $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$ is invertible and find $\det(A^{-1})$.
Solution: We saw in Example 6.2.7 that $\det(A) = -3$, so Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero) tells us that $A$ is invertible. We get that $\det(A^{-1}) = -\frac{1}{3}$ by Corollary 6.3.6 (Determinant of Inverse).
6.4
An Expression for $A^{-1}$
We saw in Proposition 4.6.13 (Inverse of a $2 \times 2$ Matrix) that a $2 \times 2$ matrix $A$ is invertible if and only if $\det(A) \neq 0$. In the previous sections, we generalized the notion of the determinant to $n \times n$ matrices and obtained the analogous result: an $n \times n$ matrix $A$ is invertible if and only if $\det(A) \neq 0$ (Theorem 6.3.1).
Proposition 4.6.13 contains one additional result: in the case where $\det(A) \neq 0$, an expression for the inverse of $A$ is given, namely
$A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$
In this section, we will show that there is a similar result for an $n \times n$ matrix (see Corollary 6.4.6 (Inverse by Adjugate)).
We will need some preliminary notation and terminology.
Definition 6.4.1 (Cofactor)
Let $A \in M_{n\times n}(\mathbb{F})$. The $(i,j)$-th cofactor of $A$, denoted by $C_{ij}(A)$, is defined by
$C_{ij}(A) = (-1)^{i+j}\det(M_{ij}(A)).$
Note that we can write the determinant of $A$ in terms of cofactors as either
$\det(A) = \sum_{j=1}^{n} a_{ij}C_{ij}(A)$  (expansion along the $i$-th row)
or
$\det(A) = \sum_{i=1}^{n} a_{ij}C_{ij}(A)$  (expansion along the $j$-th column).
Definition 6.4.2 (Adjugate of a Matrix)
Let $A \in M_{n\times n}(\mathbb{F})$. The adjugate of $A$, denoted by $\mathrm{adj}(A)$, is the $n \times n$ matrix whose $(i,j)$-th entry is
$(\mathrm{adj}(A))_{ij} = C_{ji}(A).$
That is, the adjugate of $A$ is the transpose of the matrix of cofactors of $A$.
Example 6.4.3
If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then the cofactors of $A$ are
$C_{11}(A) = (-1)^{1+1}d = d,\quad C_{12}(A) = (-1)^{1+2}c = -c,\quad C_{21}(A) = (-1)^{2+1}b = -b,\quad C_{22}(A) = (-1)^{2+2}a = a.$
Hence
$\mathrm{adj}(A) = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}^T = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$
Notice that this is the matrix that appeared in the expression for $A^{-1}$ in Proposition 4.6.13 (Inverse of a $2 \times 2$ Matrix).
Example 6.4.4
Find the adjugate of $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$.
Solution: The cofactors of $A$ are given below. (We will write $M_{ij}$ and $C_{ij}$ instead of $M_{ij}(A)$ and $C_{ij}(A)$.)
$M_{11} = \begin{bmatrix} 5 & 6 \\ 8 & 10 \end{bmatrix},\ C_{11} = 2;\quad M_{12} = \begin{bmatrix} 4 & 6 \\ 7 & 10 \end{bmatrix},\ C_{12} = 2;\quad M_{13} = \begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix},\ C_{13} = -3;$
$M_{21} = \begin{bmatrix} 2 & 3 \\ 8 & 10 \end{bmatrix},\ C_{21} = 4;\quad M_{22} = \begin{bmatrix} 1 & 3 \\ 7 & 10 \end{bmatrix},\ C_{22} = -11;\quad M_{23} = \begin{bmatrix} 1 & 2 \\ 7 & 8 \end{bmatrix},\ C_{23} = 6;$
$M_{31} = \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix},\ C_{31} = -3;\quad M_{32} = \begin{bmatrix} 1 & 3 \\ 4 & 6 \end{bmatrix},\ C_{32} = 6;\quad M_{33} = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix},\ C_{33} = -3.$
Consequently,
$\mathrm{adj}(A) = \begin{bmatrix} 2 & 2 & -3 \\ 4 & -11 & 6 \\ -3 & 6 & -3 \end{bmatrix}^T = \begin{bmatrix} 2 & 4 & -3 \\ 2 & -11 & 6 \\ -3 & 6 & -3 \end{bmatrix}.$
Here is the key result concerning the adjugate matrix, which we state without proof.
Theorem 6.4.5
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then
𝐴 adj(𝐴) = adj(𝐴) 𝐴 = det(𝐴)𝐼𝑛 .
Corollary 6.4.6 (Inverse by Adjugate)
Let $A \in M_{n\times n}(\mathbb{F})$. If $\det(A) \neq 0$, then
$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).$
Proof: From Theorem 6.4.5, we know that
$\mathrm{adj}(A)\,A = \det(A)I_n.$
Dividing this equation by $\det(A)$, we obtain
$\left(\frac{1}{\det(A)}\,\mathrm{adj}(A)\right)A = I_n.$
Therefore,
$\frac{1}{\det(A)}\,\mathrm{adj}(A) = A^{-1}.$
Example 6.4.7
If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $\det(A) \neq 0$, then by Example 6.4.3 and Corollary 6.4.6 (Inverse by Adjugate),
$A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$
exactly as in Proposition 4.6.13 (Inverse of a $2 \times 2$ Matrix).
Example 6.4.8
Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$. Determine $A^{-1}$, if it exists.
Solution: From Example 6.2.7, we know
$\det(A) = \det\begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{bmatrix} = 33 - 36 = -3 \neq 0,$
so we have that $A$ is invertible. Having computed in Example 6.4.4 that
$\mathrm{adj}(A) = \begin{bmatrix} 2 & 4 & -3 \\ 2 & -11 & 6 \\ -3 & 6 & -3 \end{bmatrix},$
we conclude that
$A^{-1} = -\frac{1}{3}\begin{bmatrix} 2 & 4 & -3 \\ 2 & -11 & 6 \\ -3 & 6 & -3 \end{bmatrix}.$
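To make the cofactor and adjugate machinery concrete, here is a small Python sketch (ours, not part of the notes; the helper names are our own) that builds $\mathrm{adj}(A)$ from Definition 6.4.2 and then forms $A^{-1} = \frac{1}{\det(A)}\mathrm{adj}(A)$ for the matrix of Example 6.4.8, using exact rational arithmetic.

```python
from fractions import Fraction

def minor(A, i, j):
    """(i, j)-th submatrix: remove row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """First-row cofactor expansion (Definition 6.1.6)."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j)) for j in range(len(A)))

def adjugate(A):
    """Transpose of the matrix of cofactors (Definition 6.4.2)."""
    n = len(A)
    # Entry (row j, column i) of adj(A) is the cofactor C_{ij}(A),
    # so the transpose is built in by the loop order below.
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for i in range(n)]
            for j in range(n)]

A = [[Fraction(x) for x in row] for row in [[1, 2, 3], [4, 5, 6], [7, 8, 10]]]
d = det(A)                               # -3, so A is invertible
A_inv = [[c / d for c in row] for row in adjugate(A)]
print(adjugate(A))                       # [[2, 4, -3], [2, -11, 6], [-3, 6, -3]]
print(A_inv)
```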
REMARK
We do not recommend this method as a way of determining the inverse of a matrix. However,
it may be useful for checking that one or two entries are correct.
In practice, it will usually be more efficient to use the Algorithm involving EROs given in
Proposition 4.6.8 (Algorithm for Checking Invertibility and Finding the Inverse).
6.5
Cramer’s Rule
Cramer’s Rule provides a method for solving systems of 𝑛 linear equations in 𝑛 unknowns
by making use of determinants. It is not usually efficient from a computational point of
view, since calculating determinants is generally computationally intensive.
However, Cramer’s Rule is particularly useful if one is only interested in a single component
of the solution.
Proposition 6.5.1
(Cramer’s Rule)
Let $A \in M_{n\times n}(\mathbb{F})$ and consider the equation $A\vec{x} = \vec{b}$, where $\vec{b} \in \mathbb{F}^n$ and $\det(A) \neq 0$. If we construct $B_j$ from $A$ by replacing the $j$-th column of $A$ by the column vector $\vec{b}$, then the solution $\vec{x}$ to the equation $A\vec{x} = \vec{b}$ is given by
$x_j = \frac{\det(B_j)}{\det(A)},\qquad j = 1, \ldots, n.$
Proof: Since $\det(A) \neq 0$, we have that $A$ is invertible and
$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).$
Thus,
$\vec{x} = A^{-1}\vec{b} = \frac{1}{\det(A)}\,\mathrm{adj}(A)\,\vec{b},$
and so
$x_j = \frac{1}{\det(A)}\sum_{k=1}^{n}(\mathrm{adj}(A))_{jk}\,b_k;$
that is,
$x_j = \frac{1}{\det(A)}\sum_{k=1}^{n} C_{kj}(A)\,b_k.$
We will show that
$\sum_{k=1}^{n} C_{kj}(A)\,b_k = \det(B_j),$
and then Cramer’s Rule will be proven.
We evaluate $\det(B_j)$ by expanding the determinant along the $j$-th column of $B_j$:
$\det(B_j) = \sum_{k=1}^{n} (B_j)_{kj}\,C_{kj}(B_j).$
Recall that $(B_j)_{kj} = b_k$, since the $j$-th column of $B_j$ is just $\vec{b}$. Since the matrices $A$ and $B_j$ only differ (at most) in the $j$-th column, we conclude that $C_{kj}(B_j) = C_{kj}(A)$. Thus,
$\det(B_j) = \sum_{k=1}^{n} C_{kj}(A)\,b_k,$
as required.
Example 6.5.2
Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$. Use Cramer’s Rule to solve
$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}\vec{x} = \begin{bmatrix} -2 \\ 3 \\ -4 \end{bmatrix}.$
Solution: We saw in Example 6.2.7 that
$\det\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix} = -3.$
Now, let us evaluate the determinants of the matrices
$B_1 = \begin{bmatrix} -2 & 2 & 3 \\ 3 & 5 & 6 \\ -4 & 8 & 10 \end{bmatrix},\quad B_2 = \begin{bmatrix} 1 & -2 & 3 \\ 4 & 3 & 6 \\ 7 & -4 & 10 \end{bmatrix},\quad B_3 = \begin{bmatrix} 1 & 2 & -2 \\ 4 & 5 & 3 \\ 7 & 8 & -4 \end{bmatrix}.$
We evaluate the determinants of $B_1$, $B_2$ and $B_3$ using the indicated EROs:
$\det(B_1) = \det\begin{bmatrix} -2 & 2 & 3 \\ 3 & 5 & 6 \\ -4 & 8 & 10 \end{bmatrix} = \det\begin{bmatrix} -2 & 2 & 3 \\ 0 & 8 & \tfrac{21}{2} \\ 0 & 4 & 4 \end{bmatrix} = 20$  ($R_2 \to \tfrac{3}{2}R_1 + R_2$, $R_3 \to -2R_1 + R_3$)
$\det(B_2) = \det\begin{bmatrix} 1 & -2 & 3 \\ 4 & 3 & 6 \\ 7 & -4 & 10 \end{bmatrix} = \det\begin{bmatrix} 1 & -2 & 3 \\ 0 & 11 & -6 \\ 0 & 10 & -11 \end{bmatrix} = -61$  ($R_2 \to -4R_1 + R_2$, $R_3 \to -7R_1 + R_3$)
$\det(B_3) = \det\begin{bmatrix} 1 & 2 & -2 \\ 4 & 5 & 3 \\ 7 & 8 & -4 \end{bmatrix} = \det\begin{bmatrix} 1 & 2 & -2 \\ 0 & -3 & 11 \\ 0 & -6 & 10 \end{bmatrix} = 36$  ($R_2 \to -4R_1 + R_2$, $R_3 \to -7R_1 + R_3$)
Thus,
$\vec{x} = -\frac{1}{3}\begin{bmatrix} 20 \\ -61 \\ 36 \end{bmatrix}.$
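A direct implementation of Cramer’s Rule is short. The following Python/NumPy sketch is ours and not part of the notes (the helper name cramer_solve is hypothetical); it reproduces the solution of Example 6.5.2 in floating point.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b via Cramer's Rule (Proposition 6.5.1).

    Illustrative only: it requires det(A) != 0 and recomputes a determinant
    for every component, so it is not an efficient general-purpose solver."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Bj = A.copy()
        Bj[:, j] = b                 # replace the j-th column of A by b
        x[j] = np.linalg.det(Bj) / d
    return x

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
b = [-2, 3, -4]
print(cramer_solve(A, b))            # approximately [-20/3, 61/3, -12]
```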
6.6
The Determinant and Geometry
In this section, we will give a geometric interpretation of the determinant. In R2 , the
underlying idea is the relationship between the determinant and areas of parallelograms,
which we can calculate using the cross product. Here is the key result.
Proposition 6.6.1
(Area of Parallelogram)
Let $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ be vectors in $\mathbb{R}^2$.
The area of the parallelogram with sides $\vec{v}$ and $\vec{w}$ is $\left|\det\begin{bmatrix} v_1 & w_1 \\ v_2 & w_2 \end{bmatrix}\right|$.
Proof: We shall consider the analogous vectors $\vec{v}_1$ and $\vec{w}_1$ in $\mathbb{R}^3$ with a third component of zero. That is, we let
$\vec{v}_1 = \begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix}\quad\text{and}\quad\vec{w}_1 = \begin{bmatrix} w_1 \\ w_2 \\ 0 \end{bmatrix}.$
As we saw in the Remark (Parallelogram Area via Cross Product) in Section 1.9, the area of the parallelogram with sides $\vec{v}$ and $\vec{w}$ is given by
$\|\vec{v}_1 \times \vec{w}_1\| = \left\|\begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix} \times \begin{bmatrix} w_1 \\ w_2 \\ 0 \end{bmatrix}\right\| = \left\|\begin{bmatrix} 0 \\ 0 \\ v_1w_2 - w_1v_2 \end{bmatrix}\right\| = |v_1w_2 - w_1v_2|.$
The expression on the right is equal to $\left|\det\begin{bmatrix} v_1 & w_1 \\ v_2 & w_2 \end{bmatrix}\right|$, as required.
Example 6.6.2
Determine the area of the parallelogram in $\mathbb{R}^2$ with the vectors $\vec{v} = \begin{bmatrix} 2 & 5 \end{bmatrix}^T$ and $\vec{w} = \begin{bmatrix} 4 & 1 \end{bmatrix}^T$ as its sides.
Solution: The area is $\left|\det\begin{bmatrix} 2 & 4 \\ 5 & 1 \end{bmatrix}\right| = |-18| = 18$ units$^2$.
Notice that it is important to include the absolute value signs in the expression given by
Proposition 6.6.1 (Area of Parallelogram).
REMARK
If we study the proof of Proposition 6.6.1 (Area of Parallelogram), we can obtain a useful trick that helps us compute cross products. Namely, suppose we wish to compute the cross product $\vec{u} \times \vec{v}$ of $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$. Consider the matrix
$\begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$
whose first row consists of the standard basis vectors $\vec{e}_1, \vec{e}_2, \vec{e}_3$ (treated as formal symbols!), and whose second and third rows are the entries of $\vec{u}$ and $\vec{v}$, respectively. Then if we expand along the first row, we obtain
$\det\begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix} = (u_2v_3 - u_3v_2)\vec{e}_1 - (u_1v_3 - u_3v_1)\vec{e}_2 + (u_1v_2 - u_2v_1)\vec{e}_3 = \begin{bmatrix} u_2v_3 - u_3v_2 \\ -(u_1v_3 - u_3v_1) \\ u_1v_2 - u_2v_1 \end{bmatrix}.$
We recognize this last expression as $\vec{u} \times \vec{v}$.
We must emphasize, however, that this is simply a trick that helps us remember the formula for $\vec{u} \times \vec{v}$. We are abusing notation by treating $\vec{e}_1, \vec{e}_2, \vec{e}_3$ as symbols in the left side of the equation, but as vectors in the right side of the equation.
Example 6.6.3
Compute $\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \times \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}$.
Solution: Using the trick above, we have
$\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} \times \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} = \det\begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \\ 2 & -3 & 5 \\ -2 & 1 & 4 \end{bmatrix} = (-17)\vec{e}_1 - (18)\vec{e}_2 + (-4)\vec{e}_3 = \begin{bmatrix} -17 \\ -18 \\ -4 \end{bmatrix},$
just as we had computed in Example 1.9.2.
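The first-row expansion in the Remark above is exactly how one can code the cross product by hand. The sketch below is ours (not part of the notes); NumPy is used only to compare the result against its built-in np.cross, and it reproduces Example 6.6.3.

```python
import numpy as np

def cross_via_cofactors(u, v):
    """Cross product from the 'determinant trick': expand the symbolic 3x3
    determinant along its first row, which yields the three 2x2 minors."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return np.array([u2 * v3 - u3 * v2,
                     -(u1 * v3 - u3 * v1),
                     u1 * v2 - u2 * v1])

u = np.array([2, -3, 5])
v = np.array([-2, 1, 4])
print(cross_via_cofactors(u, v))          # [-17 -18  -4]
print(np.cross(u, v))                     # the same result from NumPy's built-in
```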
Here is another useful re-interpretation of Proposition 6.6.1 (Area of Parallelogram). Let $A \in M_{2\times 2}(\mathbb{R})$ and consider the unit square in $\mathbb{R}^2$. Multiply the sides $\vec{e}_1$ and $\vec{e}_2$ of the unit square by $A$ to obtain the vectors $A\vec{e}_1$ and $A\vec{e}_2$. In this way, we may view $A$ as having transformed the unit square into a parallelogram with sides $A\vec{e}_1$ and $A\vec{e}_2$.
Since $A\vec{e}_1$ and $A\vec{e}_2$ are the columns of $A$ (by Lemma 4.2.2 (Column Extraction)), Proposition 6.6.1 tells us that the area of the resulting parallelogram is $|\det(A)|$. Since the area of the unit square is 1, we conclude that $A$ has scaled the area of the unit square by a factor of $|\det(A)|$.
The determinant of a $2 \times 2$ real matrix may thus be interpreted as a (signed) scaling factor of areas. In particular, notice that if the determinant is 0, then the area gets scaled by 0. This intuitively aligns with the fact that such a matrix is not invertible, since such a scaling cannot be reversed.
It is possible to give a similar interpretation of the 𝑛 × 𝑛 determinant of a real matrix as
a (signed) scaling factor of volumes in R𝑛 , but we will not pursue this idea further in this
course.
Figure 6.6.1: Effect of $A$ on the unit square.
Chapter 7
Eigenvalues and Diagonalization
7.1
What is an Eigenpair?
Let $A \in M_{n\times n}(\mathbb{R})$ be the standard matrix for the linear transformation $T_A$. We want to consider the effect that multiplication by $A$ has on the vector $\vec{x} \in \mathbb{R}^n$; that is, we will compare the length and the direction of an input vector $\vec{x} \in \mathbb{R}^n$ to those of its image vector $\vec{y} = A\vec{x} \in \mathbb{R}^n$.
Usually, when $\vec{x}$ is multiplied by the matrix $A$ to produce $\vec{y} = A\vec{x}$, there will be changes to both the length and the direction. Exceptions to this pattern turn out to be particularly interesting, and will become the focus of the majority of this chapter.
Example 7.1.1
Let $A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ and $\vec{g} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then
$\vec{h} = T_A(\vec{g}) = A\vec{g} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$
Since $\|\vec{g}\| = 1$ and the length of the image is $\|\vec{h}\| = \sqrt{10}$, the image $\vec{h}$ has been scaled by a factor of $\sqrt{10}$ relative to the length of the input vector $\vec{g}$.
The image $\vec{h}$ has a different direction than the input $\vec{g}$; the angle between $\vec{g}$ and $\vec{h}$ is
$\theta = \arccos\left(\frac{\vec{g}\cdot\vec{h}}{\|\vec{g}\|\,\|\vec{h}\|}\right) = \arccos\left(\frac{1}{\sqrt{10}}\right) \approx 1.249\ \text{radians}$
counter-clockwise.
Example 7.1.2
Keeping $A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ from the previous example, with a new input vector $\vec{u} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$, we have
$\vec{v} = T_A(\vec{u}) = A\vec{u} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \end{bmatrix}.$
Since $\|\vec{u}\| = \sqrt{5}$ and the length of the image is $\|\vec{v}\| = \sqrt{26}$, the image $\vec{v}$ has been scaled by a factor of $\sqrt{\tfrac{26}{5}}$ relative to the length of the input vector $\vec{u}$.
The image $\vec{v}$ has a different direction than the input $\vec{u}$; the angle between $\vec{u}$ and $\vec{v}$ is
$\theta = \arccos\left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|\,\|\vec{v}\|}\right) = \arccos\left(\frac{-7}{\sqrt{5}\sqrt{26}}\right) \approx 2.232\ \text{radians}$
counter-clockwise.
We now ask, for a given matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F), are there any input vectors whose direction
is not changed when they are multiplied by 𝐴?
Example 7.1.3
Keeping $A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ from the previous examples, with a new input vector $\vec{z} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, we have
$\vec{w} = T_A(\vec{z}) = A\vec{z} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 4\vec{z}.$
The image $\vec{w}$ has been scaled by a factor of 4 relative to the length of the input vector $\vec{z}$.
The image $\vec{w}$ has the same direction as the original vector $\vec{z}$; that is, multiplying $\vec{z}$ by $A$ did not change the direction.
We include within our scope of interest image vectors that are in the opposite direction of
the input vector.
Example 7.1.4
Keeping $A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ from the previous examples, with a new input vector $\vec{x} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, we have
$\vec{y} = T_A(\vec{x}) = A\vec{x} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix} = -2\begin{bmatrix} 1 \\ -1 \end{bmatrix} = -2\vec{x}.$
The image $\vec{y}$ has been scaled by a factor of $-2$ relative to the length of the input vector $\vec{x}$.
The image $\vec{y}$ has the opposite direction to the original vector $\vec{x}$.
In the previous two examples, we found vectors $\vec{z} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ that "fit" with the matrix $A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ in a special way, such that $A\vec{z}$ and $A\vec{x}$ are scalar multiples of $\vec{z}$ and $\vec{x}$, respectively.
REMARK
Given any matrix $A \in M_{n\times n}(\mathbb{F})$, we will seek vectors $\vec{x}$ such that $A\vec{x}$ is a scalar multiple of $\vec{x}$. For any such matrix $A$, $\vec{x} = \vec{0}$ is such a vector, because $A\vec{0} = \vec{0} = c\vec{0}$ for any constant $c$; that is, $A\vec{0}$ is always a scalar multiple of $\vec{0}$. This fact is so trivial that we focus our attention on vectors $\vec{x} \neq \vec{0}$ such that $A\vec{x}$ is a scalar multiple of $\vec{x}$.
We now arrive at our main definitions.
Definition 7.1.5 (Eigenvector, Eigenvalue and Eigenpair)
Let $A \in M_{n\times n}(\mathbb{F})$. A non-zero vector $\vec{x}$ is an eigenvector of $A$ over $\mathbb{F}$ if there exists a scalar $\lambda \in \mathbb{F}$ such that
$A\vec{x} = \lambda\vec{x}.$
The scalar $\lambda$ is then called an eigenvalue of $A$ over $\mathbb{F}$, and the pair $(\lambda, \vec{x})$ is an eigenpair of $A$ over $\mathbb{F}$.
Example 7.1.6
In Examples 7.1.3 and 7.1.4, we saw that $A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ has eigenpairs $\left(4, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ and $\left(-2, \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$ over $\mathbb{R}$.
Interestingly enough, a square matrix with real entries may have eigenpairs with non-real
eigenvalues.
Example 7.1.7
If $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then $(\lambda, \vec{x}) = \left(i, \begin{bmatrix} i \\ 1 \end{bmatrix}\right)$ is an eigenpair of $B$ over $\mathbb{C}$, because
$B\vec{x} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ i \end{bmatrix} = i\begin{bmatrix} i \\ 1 \end{bmatrix} = \lambda\vec{x}.$
You can verify that $\left(-i, \begin{bmatrix} -i \\ 1 \end{bmatrix}\right)$ is an eigenpair of $B$ over $\mathbb{C}$ in a similar way. With the techniques that will be developed in this chapter you will be able to show that, even though $B$ has entries in $\mathbb{R}$, it does not have any eigenpairs over $\mathbb{R}$.
7.2
The Characteristic Polynomial and Finding Eigenvalues
When looking for eigenpairs of a matrix $A$ over $\mathbb{F}$, consider this list of equivalent equations, which we solve over $\mathbb{F}$ for both $\vec{x}$ and $\lambda$. We denote $I_n$ by $I$.
$A\vec{x} = \lambda\vec{x} \;\Leftrightarrow\; A\vec{x} - \lambda\vec{x} = \vec{0} \;\Leftrightarrow\; A\vec{x} - \lambda I\vec{x} = \vec{0} \;\Leftrightarrow\; (A - \lambda I)\vec{x} = \vec{0}$
Thus, solving the equation $A\vec{x} = \lambda\vec{x}$ is equivalent to solving $(A - \lambda I)\vec{x} = \vec{0}$.
Definition 7.2.1 (Eigenvalue Equation or Eigenvalue Problem)
Let $A \in M_{n\times n}(\mathbb{F})$. We refer to the equation
$A\vec{x} = \lambda\vec{x}\qquad\text{or}\qquad (A - \lambda I)\vec{x} = \vec{0}$
as the eigenvalue equation for the matrix $A$ over $\mathbb{F}$. It is also sometimes referred to as the eigenvalue problem.
REMARKS
• This is an unusual equation to solve, since we want to solve it for both the vector $\vec{x} \in \mathbb{F}^n$ and the scalar $\lambda \in \mathbb{F}$. We will approach the problem by first identifying eligible values of $\lambda$. We can then determine corresponding sets of vectors $\vec{x}$ that solve the equation for each $\lambda$ we identify.
• As mentioned in the Remark following Example 7.1.4, a trivial solution to this equation is $\vec{x} = \vec{0}$. We seek to obtain a non-trivial ($\vec{x} \neq \vec{0}$) solution to the eigenvalue equation. This is possible if and only if the RREF of the matrix $A - \lambda I$ has fewer than $n$ pivots, which occurs if and only if $A - \lambda I$ is not invertible; that is, if and only if $\det(A - \lambda I) = 0$.
• Computing
$\det(A - \lambda I) = \det\begin{bmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22}-\lambda & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}-\lambda \end{bmatrix}$
by expanding along the first row, we sum up $n$ terms, each of which is a product of entries of $A - \lambda I$. Since each entry is either a constant or a linear term in the variable $\lambda$, and all products present in the expanded determinant calculation contain $n$ factors, we can infer that $\det(A - \lambda I)$ is a polynomial in $\lambda$ of degree $n$. See Proposition 7.3.4 (Features of the Characteristic Polynomial) for a more precise statement and justification.
Definition 7.2.2 (Characteristic Polynomial and Characteristic Equation)
Let $A \in M_{n\times n}(\mathbb{F})$ and $\lambda \in \mathbb{F}$. The characteristic polynomial of $A$, denoted by $C_A(\lambda)$, is
$C_A(\lambda) = \det(A - \lambda I).$
The characteristic equation of $A$ is $C_A(\lambda) = 0$.
REMARK
The eigenvalues of 𝐴 over F are the roots of the characteristic polynomial of 𝐴 in F.
Example 7.2.3
Find the eigenvalues of $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ over $\mathbb{R}$.
Solution: Since $A - \lambda I = \begin{bmatrix} 1-\lambda & 2 \\ 2 & 1-\lambda \end{bmatrix}$, the characteristic polynomial of $A$ is
$C_A(\lambda) = \det\begin{bmatrix} 1-\lambda & 2 \\ 2 & 1-\lambda \end{bmatrix} = (1-\lambda)^2 - 2^2 = \lambda^2 - 2\lambda - 3$
and the characteristic equation of $A$ is
$C_A(\lambda) = \lambda^2 - 2\lambda - 3 = (\lambda - 3)(\lambda + 1) = 0.$
The eigenvalues of $A$ over $\mathbb{R}$ are the real roots of $C_A(\lambda)$, that is, $\lambda_1 = 3$ and $\lambda_2 = -1$.
Example 7.2.4
Find the eigenvalues of $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ over $\mathbb{R}$.
Solution: Since $B - \lambda I = \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix}$, the characteristic polynomial of $B$ is
$C_B(\lambda) = \det\begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1$
and the characteristic equation of $B$ is
$C_B(\lambda) = \lambda^2 + 1 = 0.$
Since $C_B(\lambda)$ has no real roots, $B$ has no eigenvalues and no eigenpairs over $\mathbb{R}$.
REMARK
Note that we can interpret $B$ geometrically by observing that $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ for $\theta = \tfrac{\pi}{2}$.
Therefore, from a geometric perspective, $B$ is the matrix which performs a rotation by $\tfrac{\pi}{2}$ radians counter-clockwise. Since $B\vec{x}$ can never be a scalar multiple of a non-zero vector $\vec{x} \in \mathbb{R}^2$ (because the transformation will always change the vector's direction), it makes sense that $B$ would have no real eigenvalues or eigenvectors.
Example 7.2.5
Keeping $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ from the previous example, find the eigenvalues of $B$ over $\mathbb{C}$.
Solution: As in the previous example, $B - \lambda I = \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix}$ and
$C_B(\lambda) = \det\begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1.$
We can factor $C_B(\lambda)$ fully because we are now working in $\mathbb{C}$; thus the characteristic equation of $B$ is
$C_B(\lambda) = \lambda^2 + 1 = (\lambda - i)(\lambda + i) = 0.$
The complex roots of $C_B(\lambda)$ are the eigenvalues of $B$ over $\mathbb{C}$. Thus, $\lambda_1 = i$ and $\lambda_2 = -i$.
Example 7.2.6
Find the eigenvalues of $G = \begin{bmatrix} 3i & -4 \\ 2 & i \end{bmatrix}$ over $\mathbb{C}$.
Solution: Since $G - \lambda I = \begin{bmatrix} 3i-\lambda & -4 \\ 2 & i-\lambda \end{bmatrix}$, the characteristic polynomial of $G$ is
$C_G(\lambda) = \det\begin{bmatrix} 3i-\lambda & -4 \\ 2 & i-\lambda \end{bmatrix} = (3i-\lambda)(i-\lambda) + 8 = \lambda^2 - 4i\lambda + 5$
and the characteristic equation is
$C_G(\lambda) = \lambda^2 - 4i\lambda + 5 = (\lambda - 5i)(\lambda + i) = 0.$
The complex roots of $C_G(\lambda)$ are the eigenvalues of $G$ over $\mathbb{C}$. Thus, $\lambda_1 = 5i$ and $\lambda_2 = -i$.
Example 7.2.7
Find the eigenvalues of $H = \begin{bmatrix} 4 & 2 & -6 \\ 1 & -2 & 1 \\ -6 & 2 & 4 \end{bmatrix}$ over $\mathbb{R}$.
Solution: We find the characteristic polynomial of $H$ by calculating the determinant of
$H - \lambda I = \begin{bmatrix} 4-\lambda & 2 & -6 \\ 1 & -2-\lambda & 1 \\ -6 & 2 & 4-\lambda \end{bmatrix},$
using an expansion along the first row:
$C_H(\lambda) = (4-\lambda)[(-2-\lambda)(4-\lambda) - 2] - 2[1(4-\lambda) - 1(-6)] - 6[1(2) - (-2-\lambda)(-6)] = -\lambda^3 + 6\lambda^2 + 40\lambda.$
The characteristic equation is
$C_H(\lambda) = -\lambda^3 + 6\lambda^2 + 40\lambda = -\lambda(\lambda^2 - 6\lambda - 40) = -\lambda(\lambda - 10)(\lambda + 4) = 0.$
The real roots are the eigenvalues of $H$ over $\mathbb{R}$. Thus, $\lambda_1 = 10$, $\lambda_2 = 0$ and $\lambda_3 = -4$.
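For larger matrices, the characteristic polynomial and its roots are usually computed numerically. The following NumPy sketch is ours (not part of the notes); it recovers the coefficients of $C_H(\lambda)$ and the eigenvalues for the matrix $H$ of Example 7.2.7. Note that np.poly returns the coefficients of $\det(\lambda I - H)$, which differs from $C_H(\lambda)$ by a factor of $(-1)^n$.

```python
import numpy as np

# The matrix from Example 7.2.7.
H = np.array([[4, 2, -6],
              [1, -2, 1],
              [-6, 2, 4]], dtype=float)

# np.poly(H) gives the coefficients of det(lambda*I - H) = (-1)^n * C_H(lambda);
# here n = 3, so negating converts to C_H(lambda) = det(H - lambda*I).
coeffs = -np.poly(H)
print(coeffs)                      # approximately [-1, 6, 40, 0]

# The eigenvalues over C are the roots of the characteristic polynomial.
print(np.linalg.eigvals(H))        # approximately 10, -4, 0 (in some order)
```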
7.3
Properties of the Characteristic Polynomial
There is a lot of arithmetic involved in finding eigenpairs and it is wise to make sure that
our characteristic polynomial is actually correct before we begin factoring it. This section
explores properties of the characteristic polynomial that will allow us to verify our work.
We begin with the following result.
Proposition 7.3.1
Let $A \in M_{n\times n}(\mathbb{F})$. Then $A$ is invertible if and only if $\lambda = 0$ is not an eigenvalue of $A$.
Proof: $A$ is invertible iff $\det(A) \neq 0$
iff $\det(A - 0\,I_n) \neq 0$
iff 0 is not a root of the characteristic polynomial
iff 0 is not an eigenvalue of the matrix $A$.
Definition 7.3.2 (Trace)
Let $A \in M_{n\times n}(\mathbb{F})$. We define the trace of $A$ by
$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.$
That is, the trace of a square matrix is the sum of its diagonal entries.
Example 7.3.3
Let $A = \begin{bmatrix} 1 & 2 & 7 \\ 3 & 4 & -5 \\ 4 & 0 & 6 \end{bmatrix}$. Then $\mathrm{tr}(A) = 1 + 4 + 6 = 11$.
Proposition 7.3.4
(Features of the Characteristic Polynomial)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) have characteristic polynomial 𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼). Then 𝐶𝐴 (𝜆) is a
degree 𝑛 polynomial in 𝜆 of the form
𝐶𝐴 (𝜆) = 𝑐𝑛 𝜆𝑛 + 𝑐𝑛−1 𝜆(𝑛−1) + · · · + 𝑐1 𝜆 + 𝑐0 ,
where
(a) 𝑐𝑛 = (−1)𝑛 ,
(b) 𝑐𝑛−1 = (−1)(𝑛−1) tr(𝐴), and
(c) 𝑐0 = det(𝐴).
Proof: By performing cofactor expansion along the first row of 𝐴 and along the first row
of every subsequent (1, 1)-submatrix, we obtain an expression of the form
𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼) = (𝑎11 − 𝜆)(𝑎22 − 𝜆) · · · (𝑎𝑛𝑛 − 𝜆) + 𝑓 (𝜆),
where 𝑓 is the polynomial that is the sum of the other terms of the determinant. In the
above, we have singled out the contribution of the product of the diagonal entries, and have
grouped together all other terms. This latter grouping 𝑓 (𝜆) is a polynomial of degree at
most 𝑛 − 2. To see this, note that unless the cofactor calculation is done with respect to
a diagonal entry, it will use an entry $(i, j)$ that corresponds to deleting the entry $(a_{ii} - \lambda)$ from the $i$-th row and the entry $(a_{jj} - \lambda)$ from the $j$-th column. This leaves us with at most $n - 2$ entries that contain a $\lambda$. We illustrate this below using the $(1, 2)$-entry as an example.
Figure 7.3.1: Visualization of taking the $(1, 2)$-submatrix of $A - \lambda I$
Returning to our expression for 𝐶𝐴 (𝜆), we can expand out the first term in the expression
to obtain
𝐶𝐴 (𝜆) = 𝑎11 · · · 𝑎𝑛𝑛 + · · · + (−1)𝑛−1 (𝑎11 + · · · + 𝑎𝑛𝑛 )𝜆𝑛−1 + (−1)𝑛 𝜆𝑛 + 𝑓 (𝜆).
From this we see that 𝐶𝐴 (𝜆) is a polynomial of degree 𝑛 in 𝜆 (because we’ve shown that this
degree is not exceeded by the polynomial 𝑓 , which is of degree at most 𝑛 − 2). Moreover,
we obtain the following results:
(a) The coefficient of 𝜆𝑛 is 𝑐𝑛 = (−1)𝑛 .
(b) The coefficient of 𝜆𝑛−1 is
𝑐𝑛−1 = (−1)𝑛−1 (𝑎11 + · · · + 𝑎𝑛𝑛 ) = (−1)𝑛−1 tr(𝐴).
(c) From the definition of 𝐶𝐴 (𝜆), we immediately have that the constant term is
𝑐0 = 𝐶𝐴 (0) = det(𝐴 − 0𝐼) = det(𝐴).
Example 7.3.5
Determine the characteristic polynomial of $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$ and verify the properties outlined in Proposition 7.3.4 (Features of the Characteristic Polynomial).
Solution: The characteristic polynomial of $A$ is
$C_A(\lambda) = \det\begin{bmatrix} 1-\lambda & 2 & 3 \\ 4 & 5-\lambda & 6 \\ 7 & 8 & 10-\lambda \end{bmatrix}$
$= (1-\lambda)[(5-\lambda)(10-\lambda) - 48] - 2[4(10-\lambda) - 42] + 3[32 - 7(5-\lambda)]$
$= -\lambda^3 + 16\lambda^2 + 12\lambda - 3.$
Observe that $C_A(\lambda)$ has degree 3, that $\mathrm{tr}(A) = 1 + 5 + 10 = 16$, and that $\det(A) = -3$, as we determined in Example 6.1.8.
We now verify our calculation of $C_A(\lambda)$ using each part of Proposition 7.3.4 (Features of the Characteristic Polynomial):
(a) The coefficient of 𝜆3 is 𝑐3 = −1 = (−1)3 .
(b) The coefficient of 𝜆2 is 𝑐2 = 16 = (−1)2 tr(𝐴).
(c) The constant term is 𝑐0 = −3 = det(𝐴).
We can now be more confident that our characteristic polynomial was computed correctly.
Note that this does not prove that our calculation was correct; this is simply a nice check
for “red flags”.
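The same "red flag" check can be automated. The short NumPy sketch below (ours, not part of the notes) verifies parts (a), (b) and (c) of Proposition 7.3.4 for the matrix of Example 7.3.5.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]], dtype=float)
n = A.shape[0]

# Coefficients of C_A(lambda) = det(A - lambda*I), highest degree first.
# np.poly(A) gives det(lambda*I - A), so multiply by (-1)^n to convert.
c = (-1) ** n * np.poly(A)                                # [c_n, ..., c_0]

print(np.isclose(c[0], (-1) ** n))                        # (a) c_n = (-1)^n
print(np.isclose(c[1], (-1) ** (n - 1) * np.trace(A)))    # (b) c_{n-1} = (-1)^{n-1} tr(A)
print(np.isclose(c[-1], np.linalg.det(A)))                # (c) c_0 = det(A)
```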
The characteristic polynomial 𝐶𝐴 (𝜆) of an 𝑛 × 𝑛 matrix 𝐴 is a degree 𝑛 polynomial. By
the Fundamental Theorem of Algebra, the characteristic polynomial 𝐶𝐴 (𝜆) is guaranteed
to have 𝑛 complex roots (though some may be repeated) as we assume F ⊆ C. Therefore,
𝐴 is guaranteed to have 𝑛 eigenvalues in C. Some of these eigenvalues may be repeated, in
which case we list each eigenvalue as many times as its corresponding linear factor appears
in the characteristic polynomial 𝐶𝐴 (𝜆); Example 7.3.12 will address the case of repeated
eigenvalues.
The following result establishes a connection between the characteristic polynomial of a
matrix and its 𝑛 complex eigenvalues over C.
Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C)
Let $A \in M_{n\times n}(\mathbb{F})$ have characteristic polynomial
$C_A(\lambda) = c_n\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0,$
and $n$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ (possibly repeated) in $\mathbb{C}$. Then
(a) $c_{n-1} = (-1)^{n-1}\sum_{i=1}^{n}\lambda_i$, and
(b) $c_0 = \prod_{i=1}^{n}\lambda_i$.
Note that if $A$ has repeated eigenvalues over $\mathbb{C}$, then we include each eigenvalue in the list $\lambda_1, \lambda_2, \ldots, \lambda_n$ as many times as its corresponding linear factor appears in the characteristic polynomial $C_A(\lambda)$.
Proof: The eigenvalues of $A$ over $\mathbb{C}$ are the $n$ complex roots of the characteristic polynomial, so its characteristic polynomial has the form
$C_A(\lambda) = k(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)\quad\text{for some } k \in \mathbb{C}.$
By part (a) of Proposition 7.3.4 (Features of the Characteristic Polynomial), the coefficient of $\lambda^n$ is $c_n = (-1)^n$. It must be that $k = (-1)^n$, so
$C_A(\lambda) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$
(a) Consider $c_{n-1}$, the coefficient of $\lambda^{n-1}$ in $C_A(\lambda)$. By expanding out the expression for $C_A(\lambda)$, we find that there are $n$ terms involving $\lambda^{n-1}$, each of which is the product of $(-1)^n$ with one of the constants $-\lambda_1, -\lambda_2, \ldots, -\lambda_n$ and with $\lambda$ from each of the other $n-1$ linear factors. By taking the sum of these terms, we find that
$c_{n-1}\lambda^{n-1} = (-1)^n\left[-\lambda_1\lambda^{n-1} - \lambda_2\lambda^{n-1} - \cdots - \lambda_n\lambda^{n-1}\right] = (-1)^{n-1}\left(\sum_{i=1}^{n}\lambda_i\right)\lambda^{n-1};$
that is, $c_{n-1} = (-1)^{n-1}\sum_{i=1}^{n}\lambda_i$.
(b) Expanding the terms of $C_A(\lambda)$, its constant term must be $(-1)^n$ times the product of the constant terms in each of the $n$ linear factors, which are $-\lambda_1, -\lambda_2, \ldots, -\lambda_n$. Therefore,
$c_0 = (-1)^n\prod_{i=1}^{n}(-\lambda_i) = (-1)^{2n}\prod_{i=1}^{n}\lambda_i = \prod_{i=1}^{n}\lambda_i.$
Propositions 7.3.4 and 7.3.6 combine to form the following corollary, the proof of which we
leave as an exercise.
Corollary 7.3.7 (Eigenvalues and Trace/Determinant)
Let $A \in M_{n\times n}(\mathbb{F})$ have $n$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ (possibly repeated) in $\mathbb{C}$. Then:
(a) $\sum_{i=1}^{n}\lambda_i = \mathrm{tr}(A)$.
(b) $\prod_{i=1}^{n}\lambda_i = \det(A)$.
In the following examples, we confirm our previous calculations using Proposition 7.3.6
(Characteristic Polynomial and Eigenvalues over C).
Example 7.3.8
In Example 7.2.5, the eigenvalues of $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ over $\mathbb{C}$ are $\lambda_1 = i$ and $\lambda_2 = -i$, and $C_B(\lambda) = \lambda^2 + 1$ is a quadratic (degree 2) polynomial. Proposition 7.3.6 and Corollary 7.3.7 confirm that
(a) the coefficient of $\lambda$ is $(-1)^1\sum_{i=1}^{2}\lambda_i = -(\lambda_1 + \lambda_2) = -(i - i) = 0 = c_1$, and $\mathrm{tr}(B) = \lambda_1 + \lambda_2 = 0$.
(b) the constant term is $\prod_{i=1}^{2}\lambda_i = \lambda_1\lambda_2 = (i)(-i) = -i^2 = 1 = c_0$, and $\det(B) = 1$.
Example 7.3.9
In Example 7.2.6, the eigenvalues of $G = \begin{bmatrix} 3i & -4 \\ 2 & i \end{bmatrix}$ over $\mathbb{C}$ are $\lambda_1 = 5i$ and $\lambda_2 = -i$, and $C_G(\lambda) = \lambda^2 - 4i\lambda + 5$ is a degree 2 polynomial. Proposition 7.3.6 and Corollary 7.3.7 confirm that
(a) the coefficient of $\lambda$ is $(-1)^1\sum_{i=1}^{2}\lambda_i = -(\lambda_1 + \lambda_2) = -(5i - i) = -4i = c_1$, and $\mathrm{tr}(G) = \lambda_1 + \lambda_2 = 4i$.
(b) the constant term is $\prod_{i=1}^{2}\lambda_i = \lambda_1\lambda_2 = 5i(-i) = 5 = c_0$, and $\det(G) = 5$.
In the next two examples, we will look at the eigenvalues of a matrix 𝐴 over R, which
are the real roots of the characteristic polynomial. Since R is a subset of C, any real root
of a polynomial can also be considered as a complex root of the polynomial; thus, any
real eigenvalue of a matrix 𝐴 can be considered to be a complex eigenvalue. As a result,
Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C) applies in these cases.
Example 7.3.10
In Example 7.2.3, $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ has a degree two characteristic polynomial $C_A(\lambda) = \lambda^2 - 2\lambda - 3 = (\lambda - 3)(\lambda + 1)$, with two real roots $\lambda_1 = 3$ and $\lambda_2 = -1$. These are the eigenvalues of $A$ over $\mathbb{C}$ (as well as over $\mathbb{R}$). Proposition 7.3.6 confirms that
(a) the coefficient of $\lambda$ is $(-1)^1\sum_{i=1}^{2}\lambda_i = -(\lambda_1 + \lambda_2) = -(3 - 1) = -2 = c_1$.
(b) the constant term is $\prod_{i=1}^{2}\lambda_i = \lambda_1\lambda_2 = (3)(-1) = -3 = c_0$.
Be careful to note that a matrix 𝐴 can have both real and non-real eigenvalues. In applying
Proposition 7.3.6, we must use all of the eigenvalues of 𝐴.
Example 7.3.11
The characteristic polynomial of $J = \begin{bmatrix} -4 & 0 & 0 \\ 0 & 0 & 4 \\ 0 & -4 & 0 \end{bmatrix}$ is
$C_J(\lambda) = \det\begin{bmatrix} -4-\lambda & 0 & 0 \\ 0 & -\lambda & 4 \\ 0 & -4 & -\lambda \end{bmatrix} = -(\lambda + 4)(\lambda^2 + 16) = -\lambda^3 - 4\lambda^2 - 16\lambda - 64.$
Its degree is 3, and it has one real root $\lambda_1 = -4$ and two non-real complex roots $\lambda_2 = 4i$ and $\lambda_3 = -4i$. These are the eigenvalues of $J$ over $\mathbb{C}$. Proposition 7.3.6 confirms that
(a) the coefficient of $\lambda^2$ is $(-1)^2\sum_{i=1}^{3}\lambda_i = \lambda_1 + \lambda_2 + \lambda_3 = -4 + 4i - 4i = -4 = c_2$.
(b) the constant term is $\prod_{i=1}^{3}\lambda_i = \lambda_1\lambda_2\lambda_3 = (-4)(4i)(-4i) = -64 = c_0$.
For our following example, the characteristic polynomial includes a repeated linear factor.
In order to use Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C), we
include each eigenvalue in our list 𝜆1 , 𝜆2 , ..., 𝜆𝑛 as many times as its corresponding linear
factor appears in the characteristic polynomial 𝐶𝐴 (𝜆). We will address this concept later
in terms of multiplicity.
Example 7.3.12
The characteristic polynomial of $Z = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 3 \end{bmatrix}$ is
$C_Z(\lambda) = \det\begin{bmatrix} 3-\lambda & 1 & 1 \\ 1 & 3-\lambda & 1 \\ 1 & 1 & 3-\lambda \end{bmatrix} = (3-\lambda)[(3-\lambda)^2 - 1] - 1[(3-\lambda) - 1] + 1[1 - (3-\lambda)] = -\lambda^3 + 9\lambda^2 - 24\lambda + 20 = -(\lambda - 5)(\lambda - 2)^2.$
Since $n = 3$, we must carefully write down the three eigenvalues corresponding to the roots of $C_Z(\lambda)$, allowing repeats. Since the linear factor $\lambda - 2$ occurs twice in $C_Z(\lambda)$, we list the eigenvalues as $\lambda_1 = 5$, $\lambda_2 = 2$, and $\lambda_3 = 2$. Proposition 7.3.6 confirms that
(a) the coefficient of $\lambda^2$ is $(-1)^2\sum_{i=1}^{3}\lambda_i = \lambda_1 + \lambda_2 + \lambda_3 = 5 + 2 + 2 = 9 = c_2$.
(b) the constant term is $\prod_{i=1}^{3}\lambda_i = \lambda_1\lambda_2\lambda_3 = 5(2)(2) = 20 = c_0$.
7.4
Finding Eigenvectors
Once we have found an eigenvalue $\lambda$ of a matrix $A$ over $\mathbb{F}$, we can examine the eigenvalue equation in the form $(A - \lambda I)\vec{x} = \vec{0}$ in order to obtain a corresponding eigenvector $\vec{x}$.
Example 7.4.1
For each eigenvalue of $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ over $\mathbb{R}$, find a corresponding eigenpair.
Solution: From Example 7.2.3, the eigenvalues of $A$ over $\mathbb{R}$ are $\lambda_1 = 3$ and $\lambda_2 = -1$. For each eigenvalue $\lambda_i$, we examine the eigenvalue equation to determine an eigenvector.
• When $\lambda_1 = 3$, we examine $(A - 3I)\vec{x} = \vec{0}$, which is equivalent to
$\begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}$
and row-reduces to
$\begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}.$
The general solution is $\left\{ s\begin{bmatrix} 1 \\ 1 \end{bmatrix} : s \in \mathbb{R} \right\}$ and an eigenpair is $\left(3, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$.
• When $\lambda_2 = -1$, we examine $(A - (-1)I)\vec{x} = \vec{0}$, which is equivalent to
$\begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}$
and row-reduces to
$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}.$
The general solution is $\left\{ t\begin{bmatrix} 1 \\ -1 \end{bmatrix} : t \in \mathbb{R} \right\}$ and an eigenpair is $\left(-1, \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$.
Checking these eigenpairs, we have that
$A\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}$
and
$A\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = -1\begin{bmatrix} 1 \\ -1 \end{bmatrix}.$
REMARK
While $\left(3, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ is an eigenpair of $A$, we can get other eigenpairs by pairing non-zero scalar multiples of the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ with the eigenvalue $\lambda = 3$. For example, $\left(3, \begin{bmatrix} 2 \\ 2 \end{bmatrix}\right)$ and $\left(3, \begin{bmatrix} -4 \\ -4 \end{bmatrix}\right)$ are eigenpairs of $A$ over $\mathbb{R}$, since
$A\begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \end{bmatrix} = 3\begin{bmatrix} 2 \\ 2 \end{bmatrix}$
and
$A\begin{bmatrix} -4 \\ -4 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} -4 \\ -4 \end{bmatrix} = \begin{bmatrix} -12 \\ -12 \end{bmatrix} = 3\begin{bmatrix} -4 \\ -4 \end{bmatrix}.$
We formalize this idea later in Proposition 7.5.1 (Linear Combinations of Eigenvectors).
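The procedure of Example 7.4.1 (solve $(A - \lambda I)\vec{x} = \vec{0}$ for each eigenvalue) can be carried out with exact arithmetic using SymPy. The sketch below is ours and is not part of the notes; it recovers eigenvectors for $\lambda = 3$ and $\lambda = -1$ and checks the eigenvalue equation.

```python
from sympy import Matrix, eye

# The matrix from Example 7.4.1.
A = Matrix([[1, 2],
            [2, 1]])

# For each eigenvalue lambda, an eigenvector is any non-zero vector in
# Null(A - lambda*I); nullspace() returns a basis for that solution space.
for lam in [3, -1]:
    basis = (A - lam * eye(2)).nullspace()
    print(lam, [list(v) for v in basis])
    for v in basis:
        assert A * v == lam * v        # the eigenvalue equation A x = lambda x

# Expected (up to scaling): eigenvalue 3 with [1, 1], eigenvalue -1 with [-1, 1].
```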
Example 7.4.2
For each eigenvalue of $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ over $\mathbb{C}$, find a corresponding eigenpair.
Solution: From Example 7.2.4, the eigenvalues of $B$ over $\mathbb{C}$ are $\lambda_1 = i$ and $\lambda_2 = -i$. For each eigenvalue $\lambda_i$, we examine the eigenvalue equation to determine an eigenvector.
• When $\lambda_1 = i$, we examine $(B - iI)\vec{x} = \vec{0}$, which is equivalent to
$\begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}$
and row-reduces to
$\begin{bmatrix} 1 & -i \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}.$
The general solution is $\left\{ s\begin{bmatrix} i \\ 1 \end{bmatrix} : s \in \mathbb{C} \right\}$ and an eigenpair is $\left(i, \begin{bmatrix} i \\ 1 \end{bmatrix}\right)$.
• When $\lambda_2 = -i$, we examine $(B - (-i)I)\vec{x} = \vec{0}$, which is equivalent to
$\begin{bmatrix} i & -1 \\ 1 & i \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}$
and row-reduces to
$\begin{bmatrix} 1 & i \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \vec{0}.$
The general solution is $\left\{ t\begin{bmatrix} -i \\ 1 \end{bmatrix} : t \in \mathbb{C} \right\}$ and an eigenpair is $\left(-i, \begin{bmatrix} -i \\ 1 \end{bmatrix}\right)$.
Checking these eigenpairs, we have that
$B\begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ i \end{bmatrix} = i\begin{bmatrix} i \\ 1 \end{bmatrix}$
and
$B\begin{bmatrix} -i \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} -i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ -i \end{bmatrix} = -i\begin{bmatrix} -i \\ 1 \end{bmatrix}.$
Example 7.4.3
For each eigenvalue of $Z = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 3 \end{bmatrix}$ over $\mathbb{R}$, find a corresponding eigenpair.
Solution: From Example 7.3.12, the characteristic polynomial of $Z$ is $C_Z(\lambda) = -(\lambda - 5)(\lambda - 2)^2$. Therefore, the eigenvalues of $Z$ over $\mathbb{R}$ are $\lambda_1 = 5$, $\lambda_2 = 2$ and $\lambda_3 = 2$. (Note that the eigenvalue 2 occurs twice.)
For each eigenvalue $\lambda_i$, we examine the eigenvalue equation to determine an eigenvector.
• When $\lambda_1 = 5$, we solve the system $(Z - 5I)\vec{x} = \vec{0}$, which is equivalent to
$\begin{bmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \vec{0}$
and row-reduces to
$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \vec{0}.$
The solution set is $\left\{ s\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} : s \in \mathbb{R} \right\}$ and an eigenpair is $\left(5, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\right)$.
• When $\lambda_2 = 2$, we solve the system $(Z - 2I)\vec{x} = \vec{0}$, which is equivalent to
$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \vec{0}$
and row-reduces to
$\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \vec{0}.$
The solution set is $\left\{ t\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + u\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} : t, u \in \mathbb{R} \right\}$ and an eigenpair is $\left(2, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}\right)$.
• No new computation is needed for $\lambda_3 = 2$. However, note that we can obtain another eigenpair $\left(2, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right)$, where the eigenvector $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$ is not a scalar multiple of the eigenvector $\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$ obtained in the previous step. We are able to do this because the solution set to the eigenvalue equation has two parameters in it. This feature of the solution set is related to the fact that the eigenvalue $\lambda_2 = \lambda_3 = 2$ is a repeated root of the characteristic polynomial. We will study this phenomenon in more detail later.
7.5
Eigenspaces
After Example 7.4.1, we noted that $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ has an eigenpair $\left(3, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$. By taking scalar multiples of the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, we generated $\left(3, \begin{bmatrix} 2 \\ 2 \end{bmatrix}\right)$ and $\left(3, \begin{bmatrix} -4 \\ -4 \end{bmatrix}\right)$, which are also eigenpairs of $A$ over $\mathbb{R}$ with the same eigenvalue $\lambda = 3$. More generally, we have the following result.
Proposition 7.5.1 (Linear Combinations of Eigenvectors)
Let $c, d \in \mathbb{F}$ and suppose that $(\lambda_1, \vec{x})$ and $(\lambda_1, \vec{y})$ are eigenpairs of a matrix $A$ over $\mathbb{F}$ with the same eigenvalue $\lambda_1$. If $c\vec{x} + d\vec{y} \neq \vec{0}$, then $(\lambda_1, c\vec{x} + d\vec{y})$ is also an eigenpair for $A$ with eigenvalue $\lambda_1$.
Proof: Observe that
$A(c\vec{x} + d\vec{y}) = c(A\vec{x}) + d(A\vec{y}) = c(\lambda_1\vec{x}) + d(\lambda_1\vec{y}) = \lambda_1(c\vec{x} + d\vec{y}).$
Thus, since $c\vec{x} + d\vec{y} \neq \vec{0}$, it follows that $c\vec{x} + d\vec{y}$ is an eigenvector of $A$ with eigenvalue $\lambda_1$. That is, $(\lambda_1, c\vec{x} + d\vec{y})$ is an eigenpair for $A$.
Example 7.5.2
If we take $d = 0$ and $c \neq 0$ in Proposition 7.5.1, we see that every non-zero scalar multiple of an eigenvector is an eigenvector (with the same eigenvalue).
For instance, since $\left(3, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ is an eigenpair for $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, then so is $\left(3, \begin{bmatrix} c \\ c \end{bmatrix}\right)$ for any $c \neq 0$ in $\mathbb{F}$.
Since every non-zero linear combination of eigenvectors with respect to a fixed eigenvalue $\lambda$ is also an eigenvector with respect to $\lambda$, this suggests collecting all possible eigenvectors of $A$ with respect to a value $\lambda$ into a set.
Definition 7.5.3 (Eigenspace)
Let $A \in M_{n\times n}(\mathbb{F})$ and let $\lambda \in \mathbb{F}$. The eigenspace of $A$ associated with $\lambda$, denoted by $E_\lambda(A)$, is the solution set to the system $(A - \lambda I)\vec{x} = \vec{0}$ over $\mathbb{F}$. That is,
$E_\lambda(A) = \mathrm{Null}(A - \lambda I).$
If the choice of $A$ is clear, we abbreviate this as $E_\lambda$.
REMARKS
#»
#»
• Notice that 0 is always in 𝐸𝜆 . However, by definition 0 is not an eigenvector. Thus
the eigenspace 𝐸𝜆 consists of all the eigenvectors that have eigenvalue 𝜆 together with
the zero vector.
#»
• 𝜆 is an eigenvalue for 𝐴 if and only if 𝐸𝜆 ̸= { 0 }.
• 𝐸0 (𝐴) = Null(𝐴 − 0𝐼) = Null(𝐴).
Section 7.6
Diagonalization
Example 7.5.4
183
(︂ [︂ ]︂)︂
(︂
[︂
]︂)︂
[︂
]︂
1
1
12
In Example 7.4.1, we found that 3,
and −1,
are eigenpairs of 𝐴 =
1
−1
21
over R. Our{︂[︂work
in
that
Example
in
fact
shows
that
the
eigenspaces
of
𝐴
are
]︂}︂
{︂[︂ ]︂}︂
1
1
𝐸3 = Span
and 𝐸−1 = Span
.
1
−1
[︂
Example 7.5.5
]︂
0 −1
Our work in Example 7.4.2 shows that the eigenspaces of 𝐵 =
over C are
1 0
{︂[︂ ]︂}︂
{︂[︂ ]︂}︂
𝑖
−𝑖
𝐸𝑖 = Span
and 𝐸−𝑖 = Span
.
1
1
⎡
Example 7.5.6
⎤
311
In Example 7.4.3, the eigenspaces of 𝑍 = ⎣ 1 3 1 ⎦ over R are
113
⎫
⎧⎡ ⎤⎫
⎧ ⎡ ⎤
⎬
⎨ 1 ⎬
⎨ 1
⎦
⎣
𝐸5 = 𝑠 1 : 𝑠 ∈ R = Span ⎣ 1 ⎦
⎭
⎭
⎩
⎩
1
1
and
7.6
⎧ ⎡
⎫
⎧⎡ ⎤ ⎡
⎤
⎡ ⎤
⎤⎫
−1
−1 ⎬
⎨ −1
⎬
⎨ −1
𝐸2 = 𝑡 ⎣ 1 ⎦ + 𝑢 ⎣ 0 ⎦ : 𝑡, 𝑢 ∈ R = Span ⎣ 1 ⎦ , ⎣ 0 ⎦ .
⎩
⎭
⎩
⎭
0
1
0
1
Diagonalization
In this section we will demonstrate one practical use of eigenvalues and eigenvectors. (There
are many more that you can learn about in your future courses!)
In certain applications of linear algebra, it is necessary to compute powers 𝐴𝑘 of a square
matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) efficiently. It would be ideal if we could compute 𝐴𝑘 without first
having to compute each of 𝐴2 , 𝐴3 , 𝐴4 , . . . , 𝐴𝑘−1 . In some special situations, this is possible.
For instance, if 𝐴 is a diagonal matrix, then computing 𝐴𝑘 is particularly straightforward:
simply take the 𝑘 𝑡ℎ powers of the diagonal entries.
EXERCISE
Prove that if 𝐷 = diag(𝑑1 , . . . , 𝑑𝑛 ) ∈ 𝑀𝑛×𝑛 (F), then 𝐷𝑘 = diag(𝑑𝑘1 , . . . , 𝑑𝑘𝑛 ) for all 𝑘 ∈ N.
As the following example demonstrates, computing powers of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) is
simpler not only when 𝐴 is diagonal, but also when 𝐴 is of the form 𝐴 = 𝑃 𝐷𝑃 −1 , where 𝑃
is invertible and 𝐷 is diagonal.
184
Chapter 7
Eigenvalues and Diagonalization
[︂
Example 7.6.1
]︂
[︂
]︂
1 1
3 0
Consider the matrices 𝑃 =
and 𝐷 =
. (Observe that 𝐷 is a diagonal
1 −1
0 −1
[︂
]︂
12
matrix.) Let 𝐴 = 𝑃 𝐷𝑃 −1 =
. Then,
21
𝐴2 = 𝐴𝐴 = (𝑃 𝐷𝑃 −1 )(𝑃 𝐷𝑃 −1 ) = 𝑃 𝐷(𝑃 −1 𝑃 )𝐷𝑃 −1 = 𝑃 𝐷2 𝑃 −1 ,
and therefore
𝐴3 = 𝐴𝐴2 = (𝑃 𝐷𝑃 −1 )(𝑃 𝐷2 𝑃 −1 ) = 𝑃 𝐷3 𝑃 −1 .
One can show using induction (exercise: do this!) that we have the following general formula
for 𝐴𝑘 :
𝐴𝑘 = 𝑃 𝐷𝑘 𝑃 −1 for all 𝑘 ∈ N.
Now, the key observation is: since 𝐷 is a diagonal matrix, we have
[︂ 𝑘
]︂
3
0
𝑘
𝐷 =
.
0 (−1)𝑘
Consequently,
𝑘
𝑘
𝐴 = 𝑃𝐷 𝑃
−1
]︂−1
]︂ [︂ 𝑘
]︂ [︂
3
0
1 1
1 1
=
1 −1
1 −1
0 (−1)𝑘
]︂ [︂ 1 1 ]︂
]︂ [︂ 𝑘
[︂
3
0
1 1
2 2
=
1
1
1 −1
0 (−1)𝑘
2 −2
]︂
[︂ 𝑘
1 3 + (−1)𝑘 3𝑘 − (−1)𝑘
.
=
2 3𝑘 − (−1)𝑘 3𝑘 + (−1)𝑘
[︂
In the previous Example, the relationship between 𝐴 and 𝐷 given by the equation
𝐴 = 𝑃 𝐷𝑃 −1 made the computation of 𝐴𝑘 a rather straightforward matter. We shall
give such a relationship a name. Of course, the key problem is how to find such a pair of
matrices 𝑃 and 𝐷 given a matrix 𝐴, if this is even possible. We will return to this problem
shortly.
Definition 7.6.2
Similar
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐴 is similar to 𝐵 over F if there exists an invertible
matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F) such that 𝐴 = 𝑃 𝐵𝑃 −1 .
REMARKS
• Notice that if 𝐴 = 𝑃 𝐵𝑃 −1 then multiplying this equation on the left by 𝑃 −1 and
on the right by 𝑃 gives 𝑃 −1 𝐴𝑃 = 𝐵, or equivalently, 𝑄𝐴𝑄−1 = 𝐵 where 𝑄 = 𝑃 −1 .
Thus if 𝐴 is similar to 𝐵, then 𝐵 is similar to 𝐴 and we can just say that 𝐴 and 𝐵
are similar to each other.
Section 7.6
Diagonalization
185
• The choice of “similar” as terminology for the relationship in Definition 7.6.2 might
seem strange at first sight. There is in fact good reason for this choice, as we will
later learn. (See the Remark on page 234.) In the Exercise below, you are asked to
show that similar matrices share a variety of features. Specifically, they have the same
determinant, trace, eigenvalues and characteristic polynomial.
[︂
Example 7.6.3
]︂
[︂
]︂
12
3 0
Example 7.6.1 shows that 𝐴 =
is similar to 𝐷 =
over R.
21
0 −1
[︂
Example 7.6.4
]︂
[︂
]︂
[︂
]︂
1 6 −2
12
12
−1
Let 𝐴 =
. Consider 𝑃 =
, with 𝑃 = −
, and let
34
46
2 −4 1
𝐵 = 𝑃 −1 𝐴𝑃 =
[︂
12
46
]︂−1 [︂
12
34
]︂ [︂
12
46
]︂
=
[︂
]︂
1 16 24
.
2 −17 −26
]︂
]︂
[︂
1 16 24
12
is similar to 𝐵 =
over R.
Then 𝐴 =
34
2 −17 −26
[︂
EXERCISE
Let 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (F). Show that:
(a) 𝐴 is similar to 𝐴 over F.
(b) If 𝐴 is similar to 𝐵 over F and 𝐵 is similar to 𝐶 over F, then 𝐴 is similar to 𝐶 over F.
EXERCISE
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Show that if 𝐴 is similar to 𝐵 over F, then:
(a) For all 𝑘 ∈ N, 𝐴𝑘 is similar to 𝐵 𝑘 over F.
(b) 𝐶𝐴 (𝜆) = 𝐶𝐵 (𝜆).
(c) 𝐴 and 𝐵 have the same eigenvalues.
(d) tr(𝐴) = tr(𝐵) and det(𝐴) = det(𝐵).
Let us return to the problem of computing 𝐴𝑘 . In view of Example 7.6.1, the problem
becomes much easier if we can show that 𝐴 is similar to a diagonal matrix 𝐷. We will
single out such matrices.
186
Definition 7.6.5
Diagonalizable
Matrix
Chapter 7
Eigenvalues and Diagonalization
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐴 is diagonalizable over F if it is similar over F to a
diagonal matrix 𝐷 ∈ 𝑀𝑛×𝑛 (F); that is, if there exists an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F)
such that 𝑃 −1 𝐴𝑃 = 𝐷. We say that the matrix 𝑃 diagonalizes 𝐴.
REMARKS
• Although we have motivated the above Definition by considering the problem of computing powers of matrices, the notion of diagonalizablity is fundamental for several
other reasons, both theoretical and practical. We will see some of them in due course.
• It is important to pay attention to what field F we are working over. It is possible
for a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) to be diagonalizable over C but not over R. (See Example
9.4.1.)
[︂
Example 7.6.6
Example 7.6.1 shows that 𝐴 =
]︂
12
is diagonalizable over R.
21
The problem we must now address is: determine if a given matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) is diagonalizable over F, and if it is, find the matrix 𝑃 that diagonalizes it.
We will eventually be able to completely solve this problem. For now, we will give a partial
solution.
Proposition 7.6.7
(𝑛 Distinct Eigenvalues =⇒ Diagonalizable)
If 𝐴 ∈ 𝑀𝑛×𝑛 (F) has 𝑛 distinct eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 in F, then 𝐴 is diagonalizable over
F.
More specifically, if we let (𝜆1 , 𝑣#»1 ), (𝜆2 , 𝑣#»2 ), . . . , (𝜆𝑛 , 𝑣#»
𝑛 ) be eigenpairs of 𝐴 over F, and if we
#»
#»
#
»
let 𝑃 = [𝑣 𝑣 · · · 𝑣 ] be the matrix whose columns are eigenvectors corresponding to the
1 2
𝑛
distinct eigenvalues, then
(a) 𝑃 is invertible, and
(b) 𝑃 −1 𝐴𝑃 = 𝐷 = diag(𝜆1 , 𝜆2 , · · · , 𝜆𝑛 ).
REMARKS
• WARNING: The converse of the above Proposition is false. That is, if 𝐴 ∈ 𝑀𝑛×𝑛 (F)
is diagonalizable over F, then it is not necessarily true that 𝐴 has 𝑛 distinct eigenvalues. Consider, for instance, 𝒪2×2 = diag(0, 0) which is diagonalizable (it is diagonal!)
with two repeated eigenvalues 𝜆1 = 𝜆2 = 0.
Section 7.6
Diagonalization
187
• We will generalize this result in Chapter 9, and we will address precisely what happens
when 𝐴 does not have 𝑛 distinct eigenvalues. We will prove the generalized result at
that time. (See page 242 for the proof of Proposition 7.6.7.)
• Notice that the columns of 𝑃 are eigenvectors of 𝐴 and the diagonal entries of 𝐷
are eigenvalues of 𝐴. Moreover, the order of the eigenvalues down the diagonal of
𝐷 corresponds to the order the eigenvectors occur as columns in 𝑃 . That is, the
𝑖𝑡ℎ diagonal entry in 𝐷 is the eigenvalue corresponding to the 𝑖𝑡ℎ column of 𝑃 . The
Proposition remains true if we list the eigenvectors as columns in 𝑃 in any order we
want, provided we adjust the diagonal entries of 𝐷 accordingly.
Although we do not have a general proof of Proposition 7.6.7 at this time, we can directly
verify that it works in specific examples as we illustrate below.
[︂
Example 7.6.8
]︂
(︂ [︂ ]︂)︂
(︂
[︂ ]︂)︂
12
1
1
From Example 7.4.1, 𝐴 =
has eigenpairs 3,
and −1,
over R.
−1
21
1
]︂
[︂
]︂
[︂
1 1
1 −1 −1
−1
from the eigenvectors, with inverse 𝑃 = − 2
We construct 𝑃 =
.
1 −1
−1 1
Now, if we compute
𝐷=𝑃
−1
[︂
1 −1
𝐴𝑃 = −
2 −1
[︂
1 −1
=−
2 −1
]︂
[︂
3 0
=
0 −1
]︂ [︂
−1
1
]︂ [︂
12
21
−1
1
]︂ [︂
3 −1
3 1
1 1
1 −1
]︂
]︂
= diag(3, −1),
we find that it is a diagonal matrix whose diagonal entries are the eigenvalues of 𝐴, exactly
as asserted by Proposition 7.6.7.
Example 7.6.9
[︂
]︂
(︂ [︂ ]︂)︂
(︂
[︂ ]︂)︂
0 −1
𝑖
−𝑖
From Example 7.4.2, 𝐵 =
has eigenpairs 𝑖,
and −𝑖,
over C. We
1 0
1
1
[︂
]︂
[︂
]︂
1 −𝑖 1
𝑖 −𝑖
construct 𝑃 =
from the eigenvectors, with inverse 𝑃 −1 =
.
1 1
2 𝑖 1
Then
𝐷 = 𝑃 −1 𝐵𝑃
[︂
]︂ [︂
]︂ [︂
]︂
1 −𝑖 1 0 −1
𝑖 −𝑖
=
1 1
2 𝑖 1 1 0
[︂
]︂ [︂
]︂
1 −𝑖 1 −1 −1
=
𝑖 −𝑖
2 𝑖 1
[︂
]︂
𝑖 0
=
0 −𝑖
188
Chapter 7
Eigenvalues and Diagonalization
= diag(𝑖, −𝑖).
⎡
Example 7.6.10
⎤
4 2 −6
From Example 7.2.7, 𝐻 = ⎣ 1 −2 1 ⎦ has eigenalues 𝜆1 = 10, 𝜆2 = 0 and 𝜆3 = −4.
−6 2 4
⎛ ⎡ ⎤⎞ ⎛ ⎡ ⎤⎞
⎛
⎡
⎤⎞
1
1
1
We leave it as an exercise to verify that ⎝10, ⎣ 0 ⎦⎠ , ⎝0, ⎣ 1 ⎦⎠ and ⎝−4, ⎣ −1 ⎦⎠ are
−1
1
1
⎡
⎤
1 1 1
⎣
0 1 −1 ⎦ from the eigenvectors, with inverse
eigenpairs over R. We construct 𝑃 =
−1 1 1
⎡
⎤
2 0 −2
1
𝑃 −1 = ⎣ 1 2 1 ⎦ .
4
1 −2 1
Then
⎤
⎤⎡
⎤⎡
1 1 1
4 2 −6
2 0 −2
1
𝐷 = 𝑃 −1 𝐻𝑃 = ⎣ 1 2 1 ⎦ ⎣ 1 −2 1 ⎦ ⎣ 0 1 −1 ⎦
4
−1 1 1
−6 2 4
1 −2 1
⎡
⎤⎡
⎤
2 0 −2
10 0 −4
1
= ⎣1 2 1 ⎦ ⎣ 0 0 4 ⎦
4
1 −2 1
−10 0 −4
⎡
⎤
10 0 0
=⎣ 0 0 0 ⎦
0 0 −4
⎡
= diag(10, 0, −4).
EXERCISE
⎡
⎤
3 −5 0
The matrix 𝐴 = ⎣ 2 −3 0 ⎦ has the following eigenpairs:
−2 3 −1
⎛
⎡ ⎤⎞ ⎛ ⎡
⎤⎞
⎛
⎡
⎤⎞
0
2−𝑖
2+𝑖
⎝−1, ⎣ 0 ⎦⎠ , ⎝𝑖, ⎣ 1 − 𝑖 ⎦⎠ , and ⎝−𝑖, ⎣ 1 + 𝑖 ⎦⎠ .
1
−1
−1
Find an invertible matrix 𝑃 and a diagonal matrix 𝐷 such that 𝐴 = 𝑃 𝐷𝑃 −1 .
We close this Chapter with an example of a quick computation of powers of a diagonalizable
matrix.
Section 7.6
Diagonalization
189
⎡
Example 7.6.11
⎤
4 2 −6
Let 𝐻 = ⎣ 1 −2 1 ⎦. Find 𝐻 10 .
−6 2 4
Solution:
In the previous Example we found matrices 𝑃 and 𝐷 so that 𝐻 = 𝑃 𝐷𝑃 −1 . By what we
have learned from Example 7.6.1 (see also the Exercise below), we find that for all 𝑘 ∈ N
𝐻 𝑘 = (𝑃 𝐷𝑃 −1 )𝑘 = 𝑃 𝐷𝑘 𝑃 −1
⎡
⎤⎡
⎤𝑘 ⎡
⎤
1 1 1
10 0 0
2 0 −2
1
⎣1 2 1 ⎦
= ⎣ 0 1 −1 ⎦ ⎣ 0 0 0 ⎦
4
−1 1 1
0 0 −4
1 −2 1
⎡
⎤
⎤⎡ 𝑘
⎤ ⎡
10 0
0
2 0 −2
1 1 1
1
= ⎣ 0 1 −1 ⎦ ⎣ 0 0𝑘 0 ⎦ ⎣ 1 2 1 ⎦
4
1 −2 1
−1 1 1
0 0 (−4)𝑘
⎡
⎤
𝑘
𝑘
𝑘
2 · 10 + (−4)
−2(−4)
−2(10𝑘 )
1
⎦.
−(−4)𝑘
2(−4)𝑘
−(−4)𝑘
= ⎣
4
𝑘
𝑘
𝑘
𝑘
𝑘
−2(10 ) + (−4)
−2(−4)
2(10 ) + (−4)
If we plug in 𝑘 = 10 and simplify, we arrive at
⎤
⎡
9766137 −1024 −976513
1024
−512 ⎦ .
𝐻 10 = 512 ⎣ −512
−9765113 −1024 9766137
Chapter 8
Subspaces and Bases
8.1
Subspaces
Throughout this course we have been dealing with a variety of subsets of F𝑛 —for example,
solutions sets of systems of equations, lines and planes in R3 , and ranges of linear transformations. Many of these subsets have a special structure that makes them particularly
interesting from the point of view of linear algebra.
The next definition identifies the key features of this structure.
Definition 8.1.1
Subspace
A subset 𝑉 of F𝑛 is called a subspace of F𝑛 if the following properties are all satisfied.
#»
1. 0 ∈ 𝑉 .
2. For all #»
𝑥 , #»
𝑦 ∈ 𝑉, #»
𝑥 + #»
𝑦 ∈ 𝑉 (closure under addition).
3. For all #»
𝑥 ∈ 𝑉 and 𝑐 ∈ F, 𝑐 #»
𝑥 ∈ 𝑉 (closure under scalar multiplication).
Here are some important examples of subspaces.
Proposition 8.1.2
(Examples of Subspaces)
#»
(a) { 0 } and F𝑛 are subspaces of F𝑛 .
(b) If {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is a subset of F𝑛 , then Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is a subspace of F𝑛 .
#»
(c) If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then the solution set to the homogeneous system 𝐴 #»
𝑥 = 0 is a
subspace of F𝑛 . (Equivalently, Null(𝐴) is a subspace of F𝑛 .)
Proof: (a) This is immediate from the definition.
190
Section 8.1
Subspaces
191
#»
#»
(b) We have 0 = 0𝑣#»1 + · · · + 0𝑣#»𝑘 , so 0 ∈ Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }.
Next, we consider #»
𝑥 , #»
𝑦 ∈ Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } and 𝑐 ∈ F. Then there exist scalars
𝑎1 , 𝑎2 , . . . , 𝑎𝑘 ∈ F such that #»
𝑥 = 𝑎1 𝑣#»1 + 𝑎2 𝑣#»2 + · · · + 𝑎𝑘 𝑣#»𝑘 , and there exist scalars
𝑏 , 𝑏 , . . . , 𝑏 ∈ F such that #»
𝑦 = 𝑏 𝑣#» + 𝑏 𝑣#» + . . . + 𝑏 𝑣#». We have
1
2
1 1
𝑘
2 2
𝑘 𝑘
#»
𝑥 + #»
𝑦 = (𝑎1 𝑣#»1 + 𝑎2 𝑣#»2 + · · · + 𝑎𝑘 𝑣#»𝑘 ) + (𝑏1 𝑣#»1 + 𝑏2 𝑣#»2 + · · · + 𝑏𝑘 𝑣#»𝑘 )
= (𝑎 + 𝑏 )𝑣#» + (𝑎 + 𝑏 )𝑣#» + · · · + (𝑎 + 𝑏 )𝑣#».
1
1
1
2
2
2
𝑘
𝑘
𝑘
This shows that #»
𝑥 + #»
𝑦 is an element of Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. Similarly, since
𝑐 #»
𝑥 = 𝑐(𝑎1 𝑣#»1 + 𝑎2 𝑣#»2 + · · · + 𝑎𝑘 𝑣#»𝑘 ) = 𝑐𝑎1 𝑣#»1 + 𝑐𝑎2 𝑣#»2 + · · · + 𝑐𝑎𝑘 𝑣#»𝑘
we also have that 𝑐 #»
𝑥 ∈ Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. This proves that Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is a
subspace of F𝑛 .
#»
#» #»
(c) Let 𝑆 = { #»
𝑥 ∈ F𝑛 : 𝐴 #»
𝑥 = 0 }. We must show that 𝑆 is a subspace of F𝑛 . As 𝐴 0 = 0 ,
#»
we know that 0 ∈ 𝑆. Next, let #»
𝑥 , #»
𝑦 ∈ 𝑆 and 𝑐 ∈ F. Then since
#» #» #»
𝐴( #»
𝑥 + #»
𝑦 ) = 𝐴 #»
𝑥 + 𝐴 #»
𝑦 = 0 + 0 = 0,
#» #»
we have that #»
𝑥 + #»
𝑦 ∈ 𝑆. And since 𝐴(𝑐 #»
𝑥 ) = 𝑐(𝐴 #»
𝑥 ) = 𝑐 0 = 0 , we have that 𝑐 #»
𝑥 ∈ 𝑆.
This proves that 𝑆 is a subspace of F𝑛 .
Many commonly encountered sets in linear algebra can be shown to be subspaces by realizing
that they are either of the form Span 𝑆, for some finite subset 𝑆 of F𝑛 , or realizing that
they are the solution set of a homogeneous linear system of equations, and then appealing
to Proposition 8.1.2 above. Here are some key examples of this.
Proposition 8.1.3
(More Examples of Subspaces)
(a) If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then Col(𝐴) is a subspace of F𝑚 .
(b) If 𝑇 : F𝑛 → F𝑚 is a linear transformation, then the range of 𝑇 , Range(𝑇 ), is a subspace
of F𝑚 .
(c) If 𝑇 : F𝑛 → F𝑚 is a linear transformation, then the kernel of 𝑇 , Ker(𝑇 ), is a subspace
of F𝑛 .
(d) If 𝐴 ∈ 𝑀𝑛×𝑛 (F) and if 𝜆 ∈ F, then the eigenspace 𝐸𝜆 is a subspace of F𝑛 .
#»
𝑡ℎ column of 𝐴, so this is a
Proof: (a) Col(𝐴) = Span{𝑎#»1 , 𝑎#»2 , . . . , 𝑎# »
𝑛 }, where 𝑎𝑖 is the 𝑖
special case of Proposition 8.1.2 (b).
(b) Range(𝑇 ) = Col([𝑇 ]ℰ ), so this is a special case of Part (a).
#»
(c) Null(𝑇 ) is equal to the solution set of the homogeneous linear system [𝑇 ]ℰ #»
𝑥 = 0 , so
this follows from Proposition 8.1.2 (c).
192
Chapter 8
Subspaces and Bases
#»
(d) 𝐸𝜆 is the solution set of the homogeneous system (𝐴 − 𝜆𝐼) #»
𝑥 = 0 , so this is a special
case of Proposition 8.1.2 (c).
The following proposition can simplify the process needed to check if a subset of F𝑛 is a
subspace.
Proposition 8.1.4
(Subspace Test)
Let 𝑉 be a subset of F𝑛 . Then 𝑉 is a subspace of F𝑛 if and only if
(a) 𝑉 is non-empty, and
(b) for all #»
𝑥 , #»
𝑦 ∈ 𝑉 and 𝑐 ∈ F, 𝑐 #»
𝑥 + #»
𝑦 ∈𝑉.
Proof: (⇒):
#»
Assume 𝑉 is a subspace of F𝑛 . Then 0 ∈ 𝑉 , so 𝑉 is non-empty. Also for all #»
𝑥 ∈ 𝑉 and
#»
𝑐 ∈ F, 𝑐 𝑥 ∈ 𝑉 , since 𝑉 is closed under scalar multiplication. But then, for all #»
𝑦 ∈ 𝑉,
𝑐 #»
𝑥 + #»
𝑦 is also in 𝑉 , since 𝑉 is closed under addition. So 𝑉 satisfies (a) and (b).
(⇐):
Conversely, assume that 𝑉 is a subset of F𝑛 that satisfies (a) and (b). By (a), there exists
#»
an #»
𝑥 ∈ 𝑉 . Then, by (b) with 𝑐 = −1 and #»
𝑦 = #»
𝑥 , we find that 0 = − #»
𝑥 + #»
𝑥 is in 𝑉 . Also,
#»
#»
#»
#»
with 𝑐 = 1, and any 𝑥 and 𝑦 in 𝑉 , we find that 𝑥 + 𝑦 ∈ 𝑉 , so 𝑉 is closed under addition.
#»
#»
Finally, since we have shown that 𝑉 contains 0 , then (b) with #»
𝑦 = 0 shows that 𝑉 is
closed under scalar multiplication. So 𝑉 is a subspace of F𝑛 .
EXERCISE
Establish that the examples referred to in Propositions 8.1.2 and 8.1.3 above are subspaces
using Proposition 8.1.4 (Subspace Test).
Example 8.1.5
Let 𝑊1 = {[𝑥 𝑦 𝑧]𝑇 ∈ R3 : 𝑥 + 2𝑦 + 3𝑧 = 4} and 𝑊2 = {[𝑥 𝑦 𝑧]𝑇 ∈ R3 : 𝑥2 − 𝑧 2 = 0} be
subsets of R3 . Determine whether either of them is a subspace of R3 .
Solution:
𝑊1 is not a subspace of R3 , since [0 0 0]𝑇 ∈
/ 𝑊1 .
Although [0 0 0]𝑇 ∈ 𝑊2 , this subset does not look “linear.” Let us check closure under
addition.
Notice that [2 0 2]𝑇 ∈ 𝑊2 , as 22 − 22 = 0, and [−2 0 2]𝑇 ∈ 𝑊2 , as (−22 ) − 22 = 0. However,
if we add these two vectors together, we get [2 0 2]𝑇 + [−2 0 2]𝑇 = [0 0 4]𝑇 , and since
(02 ) − 42 = −16 ̸= 0, then [0 0 4]𝑇 ∈
/ 𝑊2 .
This means that 𝑊2 is not closed under addition and is therefore not a subspace of R3 .
Section 8.2
Linear Dependence and the Notion of a Basis of a Subspace
193
REMARK
The preceding example illustrates a useful check: solution sets to non-homogeneous systems
#»
(such as 𝑊1 ) are never subspaces, since they do not contain 0 . Likewise, solution sets
to nonlinear systems (such as 𝑊2 ) tend to not be subspaces, since they are usually not
closed under addition or scalar multiplication (or both). On the other hand, we have from
Proposition 8.1.2 (Examples of Subspaces) that solution sets to homogeneous linear systems
are always subspaces.
8.2
Linear Dependence and the Notion of a Basis of a Subspace
We will now tackle the issue of how to describe a subspace in an efficient way. We have
already seen in Proposition 8.1.2 (Examples of Subspaces) that, given any finite subset 𝑆 of
vectors in F𝑛 , their span Span 𝑆 is a subspace of F𝑛 . We will eventually see that, conversely,
every subspace 𝑉 of F𝑛 may be expressed in the form of 𝑉 = Span 𝑆 where 𝑆 is a finite
subset of vectors in 𝑉 .
An important matter here is that this set 𝑆 is not uniquely determined by 𝑉 , in the sense
that a given subspace 𝑉 may be expressed as the spanning set of many different subsets 𝑆.
Example 8.2.1
We have
{︂[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂}︂
{︂[︂ ]︂ [︂ ]︂ [︂ ]︂}︂
{︂[︂ ]︂ [︂ ]︂}︂
−1
1
0
1
1
0
1
0
1
,
,
,
,
= Span
,
,
= Span
,
R = Span
−1
2
1
0
2
1
0
1
0
2
[︂ ]︂
𝑥
∈ R2 , we can express this as
as for
𝑦
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
[︂ ]︂
−1
1
0
1
1
0
1
0
1
𝑥
.
+0
+0
+𝑦
=𝑥
+0
+𝑦
=𝑥
+𝑦
=𝑥
−1
2
1
0
2
1
0
1
0
𝑦
In the preceding example, it is apparent that the second and third spanning sets contain
some redundancies. Our goal in this section is to be able to mathematically quantify this
type of redundancy and demonstrate that sets without such redundancies are preferable.
Below is a more involved example to illustrate this point.
Example 8.2.2
Consider the following subset of R3 :
⎧⎡ ⎤ ⎡
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
2
0
−2
1 ⎬
⎨ 2
⎦
⎣
⎣
⎦
⎣
⎦
⎣
⎦
⎣
2 , −3 , 5 , 13 , 1 ⎦ .
𝑆=
⎭
⎩
−8
1
2
4
−2
Let 𝑉 = Span (𝑆). At this point we cannot say much about 𝑉 . For instance, it could be the
entirety of R3 , or it could be some smaller subspace, such as a line or a plane through the
194
Chapter 8
Subspaces and Bases
origin. We can at least be sure that 𝑉 cannot be a line through the origin, since 𝑆 contains
two non-zero, non-parallel vectors. (We will see below that 𝑉 is in fact a plane.)
To get a better understanding of 𝑉 , we would like to remove any “redundant” vectors from
𝑆 and provide a smaller subset, let us call it 𝑆1 , such that 𝑉 = Span (𝑆1 ) . How do we
know which vectors are redundant and ought to be removed from 𝑆? This leads us to the
important idea of linear dependence.
Informally, we say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent if at least one of
them, 𝑣#»𝑗 say, can be written as a linear combination of some of the other vectors. Thus, the
vector 𝑣#»𝑗 depends linearly on the other vectors. (The formal definition is given in Definition
8.2.3 below.)
For example, the vectors in the set 𝑆 above are linearly dependent because
⎡ ⎤
⎡ ⎤
⎡ ⎤
−2
2
2
⎣ 13 ⎦ = 2 ⎣ 2 ⎦ − 3 ⎣ −3 ⎦ .
−8
2
4
[︀
]︀𝑇
Thus, we can remove the vector −2 13 −8 from 𝑆 and obtain a more efficient description
of 𝑉 :
⎧⎡ ⎤ ⎡ ⎤ ⎡
⎤ ⎡ ⎤ ⎡ ⎤⎫
2
0
−2
1 ⎬
⎨ 2
𝑉 = Span ⎣ 2 ⎦ , ⎣ −3 ⎦ , ⎣ 5 ⎦ , ⎣ 13 ⎦ , ⎣ 1 ⎦
⎩
⎭
2
4
−2
−8
1
⎧⎡ ⎤ ⎡ ⎤ ⎡
⎤ ⎡ ⎤⎫
1 ⎬
0
2
⎨ 2
⎦
⎣
⎦
⎣
⎦
⎣
⎣
2 , −3 , 5 , 1 ⎦ .
= Span
⎩
⎭
1
−2
4
2
[︀
]︀𝑇
However, note that the choice of singling out the vector −2 13 −8 is somewhat arbitrary.
For example, we could also write
⎡ ⎤
⎡ ⎤
⎡
⎤
2
−2
2
⎣ 2 ⎦ = 1 ⎣ 13 ⎦ + 3 ⎣ −3 ⎦
2
2
2
−8
4
or
⎡ ⎤
⎡
⎤
⎤
2
−2
2
⎣ −3 ⎦ = 2 ⎣ 2 ⎦ − 1 ⎣ 13 ⎦
3
3
4
2
−8
⎡
and then remove one of these vectors instead.
It is therefore more reasonable and balanced if we put everything on one side of our dependence equation and re-write it as
⎡ ⎤
⎡ ⎤
⎡ ⎤ ⎡ ⎤
−2
2
2
0
⎣ 13 ⎦ − 2 ⎣ 2 ⎦ + 3 ⎣ −3 ⎦ = ⎣ 0 ⎦ .
−8
2
4
0
The point is that the redundancy in the set 𝑆 is identified by the feature that we can take
some linear combination of some of the vectors in 𝑆 to produce the zero vector.
Section 8.2
Linear Dependence and the Notion of a Basis of a Subspace
195
We can always form the zero vector by taking a trivial linear combination (that is, one
where all the scalar coefficients are 0). For example,
⎡ ⎤
⎡ ⎤
⎡
⎤ ⎡ ⎤
−2
2
2
0
0 ⎣ 13 ⎦ + 0 ⎣ 2 ⎦ + 0 ⎣ −3 ⎦ = ⎣ 0 ⎦ .
−8
2
4
0
It is worth noting the scenarios where this is the only linear combination that can produce the zero vector, which forms the premise of our definitions of linear dependence and
independence below.
Definition 8.2.3
Linear Dependence
We say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 ∈ F𝑛 are linearly dependent if there exist scalars
#»
𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F, not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 .
If 𝑈 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }, then we say that the set 𝑈 is a linearly dependent set (or
simply that 𝑈 is linearly dependent) to mean that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly
dependent.
If a list or set of vectors is not linearly dependent, we will say that it is linearly independent.
Here is the formal definition.
Definition 8.2.4
Linear
Independence,
Trivial Solution
We say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 ∈ F𝑛 are linearly independent if there do not exist
#»
scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F, not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + . . . + 𝑐𝑘 𝑣#»𝑘 = 0 .
Equivalently we say that 𝑣#», 𝑣#», . . . , 𝑣#» ∈ F𝑛 are linearly independent if the only solution
1
to the equation
2
𝑘
#»
𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + . . . + 𝑐𝑘 𝑣#»𝑘 = 0
is the trivial solution 𝑐1 = 𝑐2 = · · · = 𝑐𝑘 = 0.
If 𝑈 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }, then we say that the set 𝑈 is a linearly independent set (or
simply that 𝑈 is linearly independent) to mean that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are
linearly independent.
REMARK
As a matter of convention, the empty set ∅ is considered to be linearly independent. We
consider the condition for linear independence in the above definition to be vacuously satisfied by ∅.
The notion of linear dependence and independence is one of the most important ideas in
linear algebra. We will examine it more deeply in the following sections. For now, let’s
return to the previous example.
196
Example 8.2.5
Chapter 8
Subspaces and Bases
We have as before the set
⎧⎡ ⎤ ⎡
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
2
0
−2
1 ⎬
⎨ 2
⎣
⎦
⎣
⎦
⎣
⎦
⎣
⎦
⎣
2 , −3 , 5 , 13 , 1 ⎦
𝑉 = Span (𝑆) = Span
⎭
⎩
2
4
−2
−8
1
which is a subspace of R3 . We have shown that 𝑆 is linearly dependent, and specifically
that [−2 13 − 8]𝑇 can be written as a linear combination of [2 2 2]𝑇 and [2 − 3 4]𝑇 . We
choose to remove the vector [−2 13 − 8]𝑇 from 𝑆 to produce the set 𝑆1 . Then
⎧⎡ ⎤ ⎡ ⎤ ⎡
⎤ ⎡ ⎤⎫
2
0
1 ⎬
⎨ 2
⎣
⎦
⎣
⎦
⎣
⎦
⎣
2 , −3 , 5 , 1 ⎦ .
𝑉 = Span (𝑆1 ) = Span
⎩
⎭
2
4
−2
1
However 𝑆1 is linearly dependent. For example,
⎡ ⎤
⎡ ⎤ ⎡ ⎤
2
1
0
⎣2⎦ − 2 ⎣1⎦ = ⎣0⎦ .
2
1
0
We choose to remove the vector [2 2 2]𝑇 from 𝑆1 to produce the set 𝑆2 . Then
⎧⎡
⎤ ⎡ ⎤ ⎡ ⎤⎫
0
1 ⎬
⎨ 2
𝑉 = Span (𝑆2 ) = Span ⎣ −3 ⎦ , ⎣ 5 ⎦ , ⎣ 1 ⎦ .
⎩
⎭
4
−2
1
The set 𝑆2 is still linearly dependent! For example,
⎤ ⎡ ⎤ ⎡ ⎤
⎡ ⎤ ⎡
0
0
2
1
2 ⎣ 1 ⎦ − ⎣ −3 ⎦ − ⎣ 5 ⎦ = ⎣ 0 ⎦ .
0
−2
4
1
We choose to remove the vector [0 5 − 2]𝑇 from 𝑆2 to produce the set 𝑆3 . Then
⎧⎡ ⎤ ⎡ ⎤⎫
1 ⎬
⎨ 2
𝑉 = Span (𝑆3 ) = Span ⎣ −3 ⎦ , ⎣ 1 ⎦ .
⎩
⎭
4
1
The set 𝑆3 , however, is linearly independent. Indeed, if we have a linear combination
⎡ ⎤
⎡ ⎤ ⎡ ⎤
2
1
0
𝑐1 ⎣ −3 ⎦ + 𝑐2 ⎣ 1 ⎦ = ⎣ 0 ⎦ ,
4
1
0
then 𝑐2 = −2𝑐1 and 𝑐2 = 3𝑐1 , so −2𝑐1 = 3𝑐1 . Therefore, 𝑐1 = 0 and consequently 𝑐2 = 0.
That is, we cannot form the zero vector as a non-trivial linear combination of vectors in 𝑆3 .
The set 𝑆3 is an optimal way of producing the subspace 𝑉 . This optimal set incidentally
shows that 𝑉 is equal to the span of two non-parallel vectors in R3 , and so it is a plane
through the origin.
Section 8.3
Detecting Linear Dependence and Independence
197
The previous example illustrates an important phenomenon. Given a subspace 𝑉 of F𝑛 , say
one of the form 𝑉 = Span (𝑆) for some finite set 𝑆 (we will later see that all subspaces are
of this form), it will be desirable to remove any redundant vectors from 𝑆, reducing it to
a smaller linearly independent subset ℬ such that 𝑉 = Span(ℬ). Such a set ℬ is, in some
sense, a basic set of building blocks for the construction of 𝑉 .
Definition 8.2.6
Basis
Let 𝑉 be a subspace of F𝑛 and let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a finite set of vectors contained
in 𝑉 . We say that ℬ is a basis for 𝑉 if
1. ℬ is linearly independent, and
2. 𝑉 = Span(ℬ).
REMARK
We shall
{︁ #»}︁adopt the convention that the empty set ∅ is a basis for the zero subspace
𝑉 = 0 . Recall that we had earlier adopted the convention of considering the empty
set to be linearly
{︁ #»}︁ independent. (See the remark following Definition 8.2.4.) The statement
Span (∅) = 0 can be made plausible if we agree that a “linear combination of no vectors”
gives the zero vector.
The notion of a basis is central to much of what follows in the course. We will spend the
next two sections carefully examining both conditions (1) (linear independence) and (2)
(spanning) in the above definition.
Example 8.2.7
⎧⎡
⎤ ⎡ ⎤⎫
1 ⎬
⎨ 2
⎦
⎣
⎣
−3 , 1 ⎦ is a basis for
Our work in the previous example shows that ℬ =
⎭
⎩
1
4
⎧⎡ ⎤ ⎡
⎤ ⎡
⎤ ⎡ ⎤ ⎡ ⎤⎫
2
0
−2
1 ⎬
⎨ 2
𝑉 = Span ⎣ 2 ⎦ , ⎣ −3 ⎦ , ⎣ 5 ⎦ , ⎣ 13 ⎦ , ⎣ 1 ⎦ .
⎩
⎭
2
4
−2
−8
1
8.3
Detecting Linear Dependence and Independence
We begin by recalling the definitions of linear dependence and independence given in Definitions 8.2.3 and 8.2.4. A set of vectors {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is said to be linearly dependent
#»
if there exist scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 , not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + . . . + 𝑐𝑘 𝑣#»𝑘 = 0 ; if
not, the set is said to be linearly independent.
The next proposition offers reformulations of these definitions that can be easier to use in
practice.
198
Proposition 8.3.1
Chapter 8
Subspaces and Bases
(Linear Dependence Check)
(a) The vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent if and only if one of the vectors can be
written as a linear combination of some of the other vectors.
(b) The vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly independent if and only if
#»
𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0
(𝑐𝑖 ∈ F)
implies
𝑐1 = · · · = 𝑐𝑘 = 0.
Proof: (a) Suppose that 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent. Then there exist scalars
#»
𝑐1 , . . . , 𝑐𝑘 , not all zero, such that 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 . Suppose that 𝑐𝑗 is nonzero.
Then by moving the 𝑐𝑗 𝑣#»𝑗 term to the right side of the previous equation, we get
∑︁
−𝑐𝑗 𝑣#»𝑗 =
𝑐𝑖 #»
𝑣 𝑖.
𝑖̸=𝑗
Dividing through by −𝑐𝑗 (which is permissible, since 𝑐𝑗 ̸= 0), we get
∑︁ 𝑐𝑖
#»
𝑣 𝑖.
𝑣#»𝑗 =
−𝑐𝑗
𝑖̸=𝑗
Thus, we have expressed 𝑣#»𝑗 as a linear combination of the other vectors.
Conversely, if 𝑣#» is a linear combination of the other vectors, say
𝑗
𝑣#»𝑗 =
∑︁
𝑎𝑖 #»
𝑣 𝑖,
𝑖̸=𝑗
then
(−1)𝑣#»𝑗 +
∑︁
#»
𝑎𝑖 #»
𝑣𝑖 = 0
𝑖̸=𝑗
#»
is an expression of 0 as a nontrivial linear combination of the vectors 𝑣#»1 , . . . , 𝑣#»𝑘 (nontrivial because the coefficient of 𝑣#»𝑗 is nonzero). This proves that 𝑣#»1 , . . . , 𝑣#»𝑘 are linearly
dependent.
(b) This result is a rephrasing of the equivalent condition for linear independence in Definition 8.2.4. Indeed, the contrapositive of the given statement is: A list of vectors
𝑣#»1 , . . . , 𝑣#»𝑘 is linearly dependent if and only if there exists scalars 𝑐1 , . . . , 𝑐𝑘 not all zero
#»
such that 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 . This is the definition of linear dependence. So the
statement given in (b) must be true as well.
Now we turn to the problem of determining whether a given set is or is not linearly dependent. When the set contains at most two vectors, this is particularly straightforward.
Proposition 8.3.2
Let 𝑆 ⊆ F𝑛 .
#»
(a) If 0 ∈ 𝑆, then 𝑆 is linearly dependent.
Section 8.3
Detecting Linear Dependence and Independence
199
#»
(b) If 𝑆 = { #»
𝑥 } contains only one vector, then 𝑆 is linearly dependent if and only if #»
𝑥 = 0.
(c) If 𝑆 = { #»
𝑥 , #»
𝑦 } contains only two vectors, then 𝑆 is linearly dependent if and only if
one of the vectors is a scalar multiple of the other.
#»
#»
Proof: (a) Since 1 is a non-zero scalar, and 1( 0 ) = 0 , we can thus take a non-trivial
linear combination of some vectors (in this case, only one vector) in 𝑆 to produce the
zero vector.
#»
(b) We know from (a) that if #»
𝑥 = 0 , then 𝑆 is linearly dependent.
#»
Conversely, we know from the properties of the zero vector that if 𝑐 #»
𝑥 = 0 , then either
#»
#»
𝑥 = 0 or 𝑐 = 0.
#»
#»
So if #»
𝑥 ̸= 0 and 𝑐 #»
𝑥 = 0 , then 𝑐 must be zero. That is, only the trivial combination
#»
of #»
𝑥 ̸= 0 will produce the zero vector. Thus, 𝑆 is linearly independent.
(c) If one of the vectors is a multiple of the other, we may assume without loss of generality
that #»
𝑦 = 𝑚 #»
𝑥 for some 𝑚 ∈ F. Then
#»
#»
𝑦 − 𝑚 #»
𝑥 = 0.
Since the coefficient of #»
𝑦 is 1 ̸= 0, we have demonstrated linear dependence.
#»
Conversely, if 𝑆 = { 𝑥 , #»
𝑦 } is linearly dependent, then there exist 𝑎, 𝑏 ∈ F, not both
#»
zero, such that 𝑎 #»
𝑦 + 𝑏 #»
𝑥 = 0 . We may assume without loss of generality that 𝑎 ̸= 0.
We then have that
𝑏
𝑏
#»
#»
𝑦 + #»
𝑥 = 0 , which gives #»
𝑦 = − #»
𝑥.
𝑎
𝑎
Thus, one of the vectors, #»
𝑦 , is a multiple of the other, #»
𝑥.
{︂[︂
Example 8.3.3
Is 𝑆 =
]︂}︂
]︂ [︂
2
6
linearly dependent?
,
−4
−12
Solution:
[︂
]︂
[︂ ]︂
6
2
Yes, as
=3
.
−12
−4
{︂[︂
Example 8.3.4
Is 𝑆 =
]︂ [︂
]︂}︂
2
6
,
linearly dependent?
−4
−14
Solution:
No. The second vector is not a multiple of the first (and the first is not a multiple of the
second). Looking at the first components, 6 is 3 times 2; however, looking at the second
components, −14 is not 3 times −4.
200
Example 8.3.5
Chapter 8
Subspaces and Bases
[︂
]︂
[︂
]︂
2 + 3𝑖
12 + 5𝑖
#»
#»
#»
#»
Is 𝑆 = {𝑧1 , 𝑧2 } , with 𝑧1 =
and 𝑧2 =
, linearly dependent?
4+𝑖
14 − 5𝑖
Solution:
Because we are working over C, the answer to this question is not so obvious.
#»
We will use Proposition 8.3.2 (b). Consider the equation 𝑐1 𝑧#»1 + 𝑐2 𝑧#»2 = 0 . The Proposition
states that {𝑧#», 𝑧#»} is linearly independent precisely when this equation has only the trivial
1
2
solution 𝑐1 = 𝑐2 = 0.
The corresponding coefficient matrix of this equation is
[︂
]︂
2 + 3𝑖 12 + 5𝑖
4 + 𝑖 14 − 5𝑖
which row reduces to
[︂
]︂
1 3 − 2𝑖
.
0 0
We deduce from this reduced matrix that there are (infinitely many) non-trivial solutions
to the system of equations. One solution is 𝑐1 = −3 + 2𝑖 and 𝑐2 = 1.
Thus, the set 𝑆 is linearly dependent.
How do we deal with a set that contains more than two vectors? The ideas in the previous
examples can be expanded as follows. Let 𝑆 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 .
Consider the following questions about 𝑆:
#»
1. Is 0 ∈ 𝑆?
2. Is one vector in 𝑆 a scalar multiple of another vector in 𝑆?
3. Can we easily observe that we can write one of the vectors in 𝑆 as a linear combination
of some of the other vectors in 𝑆?
If the answer to any of these questions is “yes”, then the set 𝑆 is linearly dependent.
In particular, (2) and (3) above essentially ask “is it obvious that this set is linearly
dependent?”
If the answer to all of them is “no”, we cannot immediately conclude whether the set is
linearly independent or dependent. Therefore, we must examine the expression
#»
𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 .
This is a homogeneous linear system of 𝑛 equations (obtained by equating the entries of the
vectors on both sides of the above expression in 𝑘 unknowns (the coefficients 𝑐𝑖 ). Since it
is a homogeneous system, we have that one solution is the trivial solution, and we wish to
know whether or not there are any other solutions. We know that 𝑆 is linearly dependent
if and only if there are non-trivial solutions from Proposition 8.3.1 (Linear Dependence
Check).
This is all succinctly captured by the rank of the coefficient matrix of the system associated
with the expression above.
Section 8.3
Detecting Linear Dependence and Independence
Proposition 8.3.6
201
(Pivots and Linear Independence)
[︀
]︀
Let 𝑆 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 . Let 𝐴 = 𝑣#»1 𝑣#»2 · · · 𝑣#»𝑘 be the 𝑛 × 𝑘
matrix whose columns are the vectors in 𝑆.
Suppose that rank(𝐴) = 𝑟 and 𝐴 has pivots in columns 𝑞1 , 𝑞2 , . . . , 𝑞𝑟 .
Let 𝑈 = {𝑣# 𝑞»1 , 𝑣# 𝑞»2 , . . . , 𝑣# 𝑞»𝑟 }, the set of columns of 𝐴 that correspond to the pivot columns
labelled above. Then
(a) 𝑆 is linearly independent if and only if 𝑟 = 𝑘.
(b) 𝑈 is linearly independent.
𝑣 } is linearly dependent.
(c) If #»
𝑣 is in 𝑆 but not in 𝑈 then the set {𝑣# 𝑞»1 , ..., 𝑣# 𝑞»𝑟 , #»
(d) Span (𝑈 ) = Span (𝑆).
REMARK
This proposition makes the identification of linear dependence/independence relatively
straightforward. Additionally, it identifies “redundant” vectors in a given set that may be
removed to obtain a linearly independent subset whose span is the same as the span of the
original set. This gives a framework for the ad-hoc process carried out in Proposition 8.1.2.
Proof: (a) The matrix 𝐴 is the coefficient matrix for the homogeneous system of linear
equations
#»
𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 .
There will be no parameters in the solution set if and only if there is a pivot in each of
the columns of 𝐴, that is, if and only if 𝑟 = 𝑘. So 𝑆 is linearly independent if and only
if 𝑟 = 𝑘.
(b) We must examine the linear dependence of the vectors 𝑣# 𝑞»1 , 𝑣# 𝑞»2 , . . . , 𝑣# 𝑞»𝑟 corresponding
to the pivot columns in 𝐴. Thus, we must consider the homogeneous system of linear
equations
#»
𝑑1 𝑣# 𝑞»1 + 𝑑2 𝑣# 𝑞»2 + · · · + 𝑑𝑟 𝑣# 𝑞»𝑟 = 0 .
This system is a system of 𝑟 equations in 𝑟 unknowns. Its coefficient matrix has rank
𝑟 because it consists of the pivot columns in 𝐴. Thus, the only solution to this system
is the trivial solution. Consequently, 𝑈 must be linearly independent.
(c) Suppose that 𝑟 < 𝑘, so that there is at least one non-pivot column in 𝐴 — the 𝑗th
column, say. Add the vector 𝑣#»𝑗 to 𝑈 and check this new set for linear dependence. This
amounts to examining the system
#»
𝑑1 𝑣# 𝑞»1 + 𝑑2 𝑣# 𝑞»2 + · · · + 𝑑𝑟 𝑣# 𝑞»𝑟 + 𝛼 𝑣#»𝑗 = 0 .
By construction, the coefficient matrix of this system has 𝑟 + 1 columns and rank 𝑟.
Thus, there must be 𝑟 + 1 − 𝑟 = 1 free parameter in its solution set, and therefore, there
are nontrivial solutions. It follows that {𝑣# 𝑞»1 , 𝑣# 𝑞»2 , . . . , 𝑣# 𝑞»𝑟 , 𝑣#»𝑗 } is linearly dependent.
202
Chapter 8
Subspaces and Bases
(d) Note that 𝑈 ⊆ 𝑆, so Span (𝑈 ) ⊆ Span (𝑆). We will prove that Span (𝑆) ⊆ Span (𝑈 ).
If 𝑈 = 𝑆, there is nothing to prove. So assume that 𝑆 ̸= 𝑈 , and let 𝑣#» be a vector in 𝑆,
𝑗
but not in 𝑈 . Our proof in part (3) shows that the system
#»
𝑑1 𝑣# 𝑞»1 + 𝑑2 𝑣# 𝑞»2 + · · · + 𝑑𝑟 𝑣# 𝑞»𝑟 + 𝛼 𝑣#»𝑗 = 0
has a nontrivial solution. Such a solution must have 𝛼 ̸= 0, because of the following
argument. If 𝛼 = 0, then
#»
𝑑1 𝑣# 𝑞»1 + 𝑑2 𝑣# 𝑞»2 + · · · + 𝑑𝑟 𝑣# 𝑞»𝑟 = 0 ,
and consequently 𝑑1 = . . . = 𝑑𝑟 = 0 by part (b) above and by Part (b) of Proposition
8.3.1 (Linear Dependence Check). So, if 𝛼 = 0, we get the trivial solution, which implies
that this set of vectors is linearly independent and this contradicts (c).
Therefore, since 𝛼 ̸= 0, we can write 𝑣#» as a linear combination of the vectors in 𝑈 ,
𝑗
namely
𝑑1 # »
𝑑2 # »
𝑑𝑟 # »
𝑣#»𝑗 =
𝑣𝑞1 +
𝑣𝑞2 + · · · +
𝑣𝑞 .
−𝛼
−𝛼
−𝛼 𝑟
This shows that 𝑣#»𝑗 ∈ Span (𝑈 ). Since 𝑣#»𝑗 was an arbitrary vector in 𝑆 but not in 𝑈 , it
follows that Span (𝑆) ⊆ Span (𝑈 ), as desired.
Here is a very useful application of the previous Proposition.
Corollary 8.3.7
(Bound on Number of Linearly Independent Vectors)
Let 𝑆 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 . If 𝑛 < 𝑘, then 𝑆 is linearly dependent.
Proof: As in Proposition 8.3.6 (Pivots and Linear Independence), let 𝐴 be the matrix
whose columns are the vectors in 𝑆, and let 𝑟 = rank(𝐴). Then 𝑟 ≤ 𝑛, since 𝐴 has 𝑛 rows.
Hence if 𝑛 < 𝑘, it follows that 𝑟 ≤ 𝑛 < 𝑘. Therefore, 𝑆 is linearly dependent, by Proposition
8.3.6 (a).
We now turn to some computational applications of Proposition 8.3.6 (Pivots and Linear
Independence).
Example 8.3.8
Consider the following vectors in R6 :
[︀
]︀𝑇
𝑣#»1 = 1 −2 3 −4 5 −6 ,
[︀
]︀𝑇
𝑣#»3 = −5 −10 −7 −16 −9 −22 ,
[︀
]︀𝑇
𝑣#»5 = 1 1 1 1 1 1 ,
Let 𝑆 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 , 𝑣#»5 , 𝑣#»6 } and 𝑉 = Span 𝑆.
(a) Show that 𝑆 is linearly dependent.
(b) Find a subset of 𝑆 that is a basis for 𝑉 .
[︀
]︀𝑇
𝑣#»2 = 3 4 5 6 7 8 ,
[︀
]︀𝑇
𝑣#»4 = −6 −4 −2 0 2 4 ,
[︀
]︀𝑇
𝑣#»6 = −3 3 −5 5 −1 1 .
Section 8.3
Detecting Linear Dependence and Independence
203
Solution: (a) Form matrix 𝐴, which has the vectors in 𝑆 as its columns.
3 −5 −6 1 −3
4 −10 −4 1
3
5 −7 −2 1 −5
6 −16
0 1
5
7 −9
2 1 −1
8 −22
4 1
1
⎡
1
⎢ −2
⎢
⎢ 3
𝐴=⎢
⎢ −4
⎢
⎣ 5
−6
An REF of 𝐴 is
⎡
1 3 −5 −6 1 −3
⎢
⎢0
⎢
⎢0
⎢
⎢
⎢0
⎢
⎣0
0
1 −2
0 0
⎤
⎥
⎥
⎥
⎥
⎥
⎥
⎦
⎤
−8 3 −3
5 10 10
7
1 −1
12 24
0 0
0 0
0 0
0
0
0
0
0
0
⎥
⎥
⎥
⎥
⎥.
⎥
1 ⎥
⎥
0 ⎦
0
We find that 𝐴 has rank 4 with pivots in columns 1, 2, 4, and 6. Since 4 < 6, we conclude that
𝑆 is linearly dependent (by Part (a) of Proposition 8.3.6 (Pivots and Linear Independence)).
(b) Parts (b) and (d) of Proposition 8.3.6 (Pivots and Linear Independence) tell us that the
vectors 𝑣#»1 , 𝑣#»2 , 𝑣#»4 and 𝑣#»6 are linearly independent, and that
Span{𝑣#»1 , 𝑣#»2 , 𝑣#»4 , 𝑣#»6 } = Span 𝑆 = 𝑉.
We conclude that {𝑣#»1 , 𝑣#»2 , 𝑣#»4 , 𝑣#»6 } is a basis for 𝑉 .
Example 8.3.9
Consider the following vectors in C5 :
#»
𝑣1
#»
𝑣2
#»
𝑣3
#»
𝑣4
#»
𝑣
5
= [1 + 𝑖, −2 + 𝑖, 3, −4 − 𝑖, 5𝑖]𝑇 ,
= [−1 + 5𝑖, −7 − 4𝑖, 6 + 9𝑖, −5 − 14𝑖, −15 + 10𝑖]𝑇 ,
= [3𝑖, 4 + 2𝑖, 5 + 2𝑖, 6 − 4𝑖, 7 − 2𝑖]𝑇 ,
= [−1 + 2𝑖, −3 − 6𝑖, 11 + 9𝑖, 1 − 18𝑖, −8 + 10𝑖]𝑇 ,
= [−6𝑖, −4𝑖, −2𝑖, 0, 2𝑖]𝑇 .
Let 𝑆 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 , 𝑣#»5 } and 𝑉 = Span 𝑆.
(a) Show that 𝑆 is linearly dependent.
(b) Find a subset of 𝑆 that is a basis for 𝑉 .
Solution: (a) Form matrix 𝐴, which has the vectors in 𝑆 as its columns.
⎡
⎤
1+𝑖
−1 + 5𝑖
3𝑖 −1 + 2𝑖 −6𝑖
⎢ −2 + 𝑖
−7 − 4𝑖 4 + 2𝑖 −3 − 6𝑖 −4𝑖 ⎥
⎢
⎥
⎢
3
6 + 9𝑖 5 + 2𝑖
11 + 9𝑖 −2𝑖 ⎥
𝐴=⎢
⎥.
⎣ −4 − 𝑖 −5 − 14𝑖 6 − 4𝑖
1 − 18𝑖
0 ⎦
5𝑖 −15 + 10𝑖 7 − 2𝑖 −8 + 10𝑖
2𝑖
204
Chapter 8
Subspaces and Bases
The RREF of 𝐴 is
⎡
1 2 + 3𝑖
⎢0 0
⎢
⎢0 0
⎢
⎣0 0
0 0
0
1
0
0
0
⎤
0 −2 − 3𝑖
0 −1 ⎥
⎥
⎥.
1
1
⎥
⎦
0
0
0
0
We find that 𝐴 has rank 3 with pivots in columns 1, 3 and 4. Since 3 < 5, we conclude that
𝑆 is linearly dependent (by Part (a) of Proposition 8.3.6 (Pivots and Linear Independence)).
(b) Parts (b) and (d) of Proposition 8.3.6 (Pivots and Linear Independence) tell us that the
vectors 𝑣#»1 , 𝑣#»3 , and 𝑣#»4 are linearly independent, and that
Span{𝑣#»1 , 𝑣#»2 , 𝑣#»4 } = Span 𝑆 = 𝑉.
We conclude that {𝑣#»1 , 𝑣#»2 , 𝑣#»4 } is a basis for 𝑉 .
8.4
Spanning Sets
The problem we wish to tackle in this section is the following: given a subspace 𝑉 of F𝑛 ,
find a finite set 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } of vectors in 𝑉 such that 𝑉 = Span{𝑣#»1 , . . . , 𝑣#»𝑘 }. We begin
by showing that this is always possible.
Theorem 8.4.1
(Every Subspace Has a Spanning Set)
Let 𝑉 be a subspace of F𝑛 . Then there exist vectors 𝑣#»1 , . . . , 𝑣#»𝑘 ∈ 𝑉 such that
𝑉 = Span{𝑣#»1 , . . . , 𝑣#»𝑘 }.
#»
#»
#»
Proof: If 𝑉 = { 0 }, then 𝑉 = Span{ 0 }. So we may assume that 𝑉 ̸= { 0 }. Therefore,
there must be some nonzero vector #»
𝑣 1 in 𝑉 .
If 𝑉 = Span{𝑣#»1 }, we are done. Otherwise, there exists a 𝑣#»2 ∈ 𝑉 that is not in Span{𝑣#»1 }. By
Part (a) of Proposition 8.3.1 (Linear Dependence Check), the set {𝑣#», 𝑣#»} must be linearly
1
2
independent.
If 𝑉 = Span{𝑣#»1 , 𝑣#»2 }, we are done. Otherwise, by continuously repeating this process, we
generate a set {𝑣#»1 , . . . , 𝑣#»𝑘 } of 𝑘 linearly independent vectors in 𝑉 . By Corollary 8.3.7 (Bound
on Number of Linearly Independent Vectors), we must have 𝑘 ≤ 𝑛. Thus, the process must
terminate after at most 𝑛 steps, at which point we have 𝑉 = Span{𝑣#»1 , . . . , 𝑣#»𝑘 }.
REMARK
The above proof actually shows more than is stated in the Theorem. It shows that if 𝑉 is
a nonzero subspace of F𝑛 , then we can find a linearly independent spanning set (that is, a
basis!) for 𝑉 . Moreover, we can find a basis with at most 𝑛 elements. We will return to
Section 8.4
Spanning Sets
205
these points in Theorem 8.5.1 (Every Subspace Has a Basis) and Proposition 8.7.5 (Bound
on Dimension of Subspace).
The proof of the previous Theorem can be refined into an algorithm that produces a spanning set for a given subspace 𝑉 of F𝑛 . We will not go into the full details of that here;
instead, we will only consider one of the key issues that must be addressed in order to
make this process practical. Namely, given a finite subset 𝑆 of 𝑉 , how can we determine if
Span (𝑆) = 𝑉 ?
Proposition 8.4.2
(Span of Subset)
Let 𝑉 be a subspace of F𝑛 and let 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } ⊆ 𝑉 . Then Span (𝑆) ⊆ 𝑉 .
Proof: This result follows immediately from the fact that 𝑉 is closed under addition and
scalar multiplication.
Thus, given a finite subset 𝑆 of 𝑉 , to show that Span 𝑆 = 𝑉 , we need only show that
𝑉 ⊆ Span (𝑆), since the other containment is always true by the previous Proposition. In
order to do this, we need to show that given #»
𝑣 ∈ 𝑉 , there exist scalars 𝑐1 , . . . , 𝑐𝑘 ∈ F such
that
#»
𝑣 = 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 .
[︀
]︀
If we [︀let 𝐴 = 𝑣#»1 · ·]︀· 𝑣#»𝑘 be the coefficient matrix of this system of equations, and if we let
𝑣 be the augmented matrix of this system of equations, then we need to
𝐵 = 𝑣#»1 · · · 𝑣#»𝑘 | #»
show that rank(𝐴) = rank(𝐵) for every #»
𝑣 ∈ 𝑉. This will depend on both 𝑆 and 𝑉 .
Example 8.4.3
Consider the plane 𝑃 in R3 given by the scalar equation 2𝑥 − 6𝑦 + 4𝑧 = 0. Determine a
basis for 𝑃 .
Solution: If we let 𝑧 = 𝑡 and 𝑦 = 𝑠, where 𝑠, 𝑡 ∈ R, then 𝑥 = 3𝑠 − 2𝑡, and we can express
the plane as
⎧⎡
⎫ ⎧⎡ ⎤
⎫
⎤
⎡
⎤
−2
⎨ 3𝑠 − 2𝑡
⎬ ⎨ 3
⎬
𝑃 = ⎣ 𝑠 ⎦ : 𝑠, 𝑡 ∈ R = ⎣ 1 ⎦ 𝑠 + ⎣ 0 ⎦ 𝑡 : 𝑠, 𝑡 ∈ R .
⎩
⎭ ⎩
⎭
𝑡
0
1
Thus,
⎧⎡ ⎤ ⎡ ⎤⎫
−2 ⎬
⎨ 3
⎣
⎦
⎣
1 , 0 ⎦ .
𝑃 = Span
⎩
⎭
0
1
⎡ ⎤
⎡
⎤
3
−2
Since the vectors ⎣ 1 ⎦ and ⎣ 0 ⎦ are not multiples of each other, the set
0
1
⎧⎡ ⎤ ⎡ ⎤⎫
−2 ⎬
⎨ 3
𝑆 = ⎣1⎦ , ⎣ 0 ⎦
⎩
⎭
0
1
is linearly independent, hence is a basis for 𝑃 .
206
Example 8.4.4
Chapter 8
Subspaces and Bases
⎧⎡
⎤ ⎡ ⎤⎫
−4 ⎬
⎨ 2
Continue with 𝑃 as above, and consider 𝑆1 = ⎣ 0 ⎦ , ⎣ 0 ⎦ . Is Span (𝑆1 ) = 𝑃 ?
⎭
⎩
−1
2
(Note that 𝑆1 ⊆ 𝑃 .)
⎤
3𝑠1 − 2𝑡1
⎦ for some 𝑠1 , 𝑡1 ∈ R. Consider
𝑠1
Solution: Let #»
𝑣 ∈ 𝑃. Then we can write #»
𝑣 =⎣
𝑡1
the system
⎤
⎡
⎤
⎡ ⎤
⎡
2
−4
3𝑠1 − 2𝑡1
#»
⎦
⎣
⎦
⎣
⎣
𝑠1
= 𝑎 0 + 𝑏 0 ⎦ , where 𝑎, 𝑏 ∈ R.
𝑣 =
−1
2
𝑡1
⎡
The augmented matrix for this system is
⎡
⎤
2 −4 3𝑠1 − 2𝑡1
⎣ 0
⎦,
0
𝑠1
𝑡1
−1 2
which row reduces to
⎡
⎤
2 −4 3𝑠1 − 2𝑡1
⎣ 0 0
⎦.
𝑠1
0 0
0
The last column of the augmented matrix is a pivot column when 𝑠1 ̸= 0, and thus, we
cannot always solve this system. It is also quite clear from the form of the two vectors in
𝑆1 , which both have a second component of zero, that we will be unable to build all the
vectors in 𝑃 from them, as some of the vectors in 𝑃 have a nonzero second component.
Thus Span 𝑆1 ̸= 𝑃 .
Example 8.4.5
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
3 ⎬
−4
⎨ 2
Again use 𝑃 from above. Consider 𝑆2 = ⎣ 0 ⎦ , ⎣ 0 ⎦ , ⎣ 1 ⎦ . Is Span (𝑆2 ) = 𝑃 ?
⎭
⎩
−1
2
0
⎡
⎤
3𝑠1 − 2𝑡1
⎦ ∈ 𝑃 and consider the system
𝑠1
Solution: Let #»
𝑣 =⎣
𝑡1
⎡
⎤
⎡
⎤
⎡ ⎤
⎡ ⎤
3𝑠1 − 2𝑡1
2
−4
3
#»
⎣
⎦
⎣
⎦
⎣
⎦
⎣
𝑠1
𝑣 =
= 𝑎 0 + 𝑏 0 + 𝑐 1 ⎦ , where 𝑎, 𝑏, 𝑐 ∈ R.
𝑡1
−1
2
0
The augmented matrix for this system is
⎡
⎤
2 −4 3 3𝑠1 − 2𝑡1
⎣ 0
⎦,
0 1
𝑠1
−1 2 0
𝑡1
which row reduces to
⎡
⎤
1 −2 0 𝑡1
⎣ 0 0 1 𝑠1 ⎦ .
0 0 0 0
Section 8.4
Spanning Sets
207
Notice that the rank of the coefficient matrix and the rank of the augmented matrix are
both equal to 2, and thus, the system will be consistent for all #»
𝑣 ∈ 𝑃 . Consequently,
Span 𝑆2 = 𝑃.
In addition, we can solve the system to get 𝑐 = 𝑠1 , 𝑏 = 𝑢, and 𝑎 = 2𝑢 − 𝑡1 , for 𝑢 ∈ R. Note
that the fact that there is a parameter 𝑢 in this solution, tells us that we have redundancy.
Thus, the set 𝑆2 is not linearly independent.
In the special case where 𝑉 = F𝑛 , we have the following useful criterion.
Proposition 8.4.6
(Spans F𝑛 iff rank is 𝑛)
[︀
]︀
Let 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 and let 𝐴 = 𝑣#»1 · · · 𝑣#»𝑘 be the matrix
whose columns are the vectors in 𝑆. Then
Span (𝑆) = F𝑛
if and only if
rank(𝐴) = 𝑛.
Proof: We have Span (𝑆) = F𝑛 if and only if for every #»
𝑣 ∈ F𝑛 , the system of equations
#»
𝑣 = 𝑐 𝑣#» + 𝑐 𝑣#» + · · · + 𝑐 𝑣#»
1 1
2 2
𝑘 𝑘
has a solution.
[︀
]︀
𝑣 be the corresponding augmented
The coefficient matrix of this system is 𝐴. Let 𝐵 = 𝐴 | #»
matrix. Then the above system has a solution if and only if rank(𝐴) = rank(𝐵).
We want this system to have a solution for every #»
𝑣 ∈ F𝑛 ; in particular, this holds for every
standard basis vector. This happens if and only if the last column of 𝐵 is never a pivot
column, which in turn happens if and only if every row of 𝐴 has a pivot in it. Since 𝐴 has
𝑛 rows, this last statement is equivalent to having rank(𝐴) = 𝑛.
[︀
]︀
Note that if rank 𝑣#»1 · · · 𝑣#»𝑘 = 𝑛, then 𝑘 ≥ 𝑛. Therefore, we need at least 𝑛 vectors in 𝑆
for Span (𝑆) to be equal to F𝑛 , which is very reasonable. Of course, the condition for 𝑘 ≥ 𝑛
is not sufficient on its own — the vectors in 𝑆 must also form a spanning set.
Example 8.4.7
Determine which (if any) of the following sets span R4 .
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
1
2
2 ⎪
⎪
⎪
⎬
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
1
−2
⎥ , ⎢ ⎥ , ⎢4⎥ .
(a) 𝑆1 = ⎢
⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦⎪
⎪
⎪
⎪
⎭
⎩
−3
8
1
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
1
2
2
5 ⎪
⎪
⎪
⎬
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
1
−2
4
⎥,⎢ ⎥,⎢ ⎥,⎢ 3 ⎥ .
(b) 𝑆2 = ⎢
⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦ ⎣ 10 ⎦⎪
⎪
⎪
⎪
⎩
⎭
1
−3
8
6
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
1
2
2
5
3 ⎪
⎪
⎪
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
⎬
1
−2
4
3
⎥,⎢ ⎥,⎢ ⎥,⎢ ⎥,⎢ 5 ⎥ .
(c) 𝑆3 = ⎢
⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦ ⎣ 10 ⎦ ⎣ 6 ⎦⎪
⎪
⎪
⎪
⎩
⎭
1
−3
8
6
10
208
Chapter 8
Subspaces and Bases
Solution:
(a) Span (𝑆1 ) ̸= R4 , since we need at least four vectors to span R4 .
(b) There are at least four vectors in 𝑆2 , so perhaps it spans R4 . Let us place the vectors
in a matrix 𝐴 and check if rank(𝐴) = 4. We have
⎡
⎤
1 2 2 5
⎢ 1 −2 4 3 ⎥
⎥
𝐴=⎢
⎣ 1 3 6 10 ⎦
1 −3 8 6
with
⎡
0
1
0
0
2
4
6
8
5
3
10
6
1
⎢0
RREF(𝐴) = ⎢
⎣0
0
0
0
1
0
⎤
1
1⎥
⎥.
1⎦
0
Thus rank(𝐴) = 3 ̸= 4. So Span (𝑆2 ) ̸= R4 .
(c) We again have at least four vectors. Let
⎡
1 2
⎢ 1 −2
𝐴=⎢
⎣1 3
1 −3
Then
⎡
1
⎢0
RREF(𝐴) = ⎢
⎣0
0
0
1
0
0
⎤
3
5⎥
⎥.
6⎦
10
0
0
1
0
1
1
1
0
⎤
0
0⎥
⎥.
0⎦
1
Thus rank(𝐴) = 4, and so in this case Span (𝑆3 ) = R4 .
8.5
Basis
In Definition 8.2.6, we defined a basis for a subspace 𝑉 of F𝑛 to be a subset ℬ ⊆ 𝑉 that is
linearly independent and spans 𝑉 ; that is, Span (ℬ) = 𝑉 . In the previous two sections, we
examined both aspects of this definition. In this section and the next, we will put everything
together to showcase the utility of bases.
We begin with the following result which we had already noted in passing.
Theorem 8.5.1
(Every Subspace Has a Basis)
Let 𝑉 be a subspace of F𝑛 . Then 𝑉 has a basis.
Section 8.5
Basis
209
Proof: If 𝑉 is not the zero subspace, then the result is given by the proof of Theorem 8.4.1
(Every Subspace Has a Spanning Set). As discussed
following Definition 8.2.6,
}︁
{︁ #»immediately
we know that ∅ is a basis of the trivial subspace 0 .
Theorem 8.5.1 (Every Subspace Has a Basis), as it is stated, is merely an existence result. It
does not supply us with a method for obtaining a basis for 𝑉 . (Although, as mentioned in
the previous section, the proof of Theorem 8.4.1 (Every Subspace Has a Spanning Set) can
be refined into an algorithm.) Nonetheless, in some cases it’s possible to explicitly exhibit
examples of bases. Here is a special—but very important—example.
Definition 8.5.2
Standard Basis for
F𝑛
In F𝑛 , let #»
𝑒 𝑖 represent the vector whose 𝑖𝑡ℎ component is 1 with all other components 0.
𝑛
The set ℰ = {𝑒#»1 , 𝑒#»2 , ..., 𝑒#»
𝑛 } is called the standard basis for F .
Using Proposition 8.3.6 (Pivots and Linear Independence), we can verify that the standard
basis ℰ is linearly independent. It also spans F𝑛 . Indeed, if #»
𝑥 ∈ F𝑛 , then
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
𝑥1
1
0
0
⎢ 𝑥2 ⎥
⎢0⎥
⎢1⎥
⎢0⎥
⎢ ⎥
⎢ ⎥
⎢ ⎥
⎢ ⎥
#»
𝑥 = ⎢ . ⎥ = 𝑥1 ⎢ . ⎥ + 𝑥2 ⎢ . ⎥ + · · · + 𝑥𝑛 ⎢ . ⎥
⎣ .. ⎦
⎣ .. ⎦
⎣ .. ⎦
⎣ .. ⎦
𝑥𝑛
0
0
1
= 𝑥1 𝑒#»1 + 𝑥2 𝑒#»2 + · · · 𝑥𝑛 𝑒#»
𝑛.
Therefore, #»
𝑥 ∈ Span (ℰ). Thus, ℰ is a basis for F𝑛 .
In general, determining if a set 𝑆 is a basis for F𝑛 is a straightforward matter, as the next
two propositions show.
Proposition 8.5.3
(Size of Basis for F𝑛 )
Let 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 . If 𝑆 is a basis for F𝑛 , then 𝑘 = 𝑛.
Proof: For a subset 𝑆 of F𝑛 to be a basis of F𝑛 , we must have that
1. Span (𝑆) = F𝑛 , which by Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛) means that 𝑘 ≥ 𝑛,
and
2. 𝑆 is linearly independent, which by Corollary 8.3.7 (Bound on Number of Linearly
Independent Vectors) means that 𝑘 ≤ 𝑛.
We conclude that 𝑘 = 𝑛.
Thus, if a subset 𝑆 of F𝑛 contains fewer than 𝑛 vectors, it does not contain enough vectors
to span F𝑛 and is thus not a spanning set of F𝑛 . On the other hand, if 𝑆 contains more
than 𝑛 vectors in F𝑛 , then it must be linearly dependent.
If 𝑆 contains exactly 𝑛 vectors, then it may or may not be a basis for F𝑛 . We still have to
check that 𝑆 is linearly independent and that it spans F𝑛 . In fact, as the next result shows,
we only need to check one of these conditions!
210
Proposition 8.5.4
Chapter 8
Subspaces and Bases
(𝑛 Vectors in F𝑛 Span iff Independent)
𝑛
Let 𝑆 = {𝑣#»1 , . . . , 𝑣#»
𝑛 } be a set of 𝑛 vectors in F . Then 𝑆 is linearly independent if and only if
𝑛
Span (𝑆) = F .
]︀
[︀
Proof: Let 𝐴 = 𝑣#»1 · · · 𝑣#»
𝑛 be the 𝑛 × 𝑛 matrix whose columns are the vectors in 𝑆. By
Part (a) of Proposition 8.3.6 (Pivots and Linear Independence), 𝑆 is linearly independent
if and only if rank(𝐴) = 𝑛.
On the other hand, from Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛) we have that Span (𝑆) =
F𝑛 if and only if rank(𝐴) = 𝑛.
Combining both of these results completes the proof.
Thus, when we are checking to see whether or not a subset 𝑆 of F𝑛 is a basis of F𝑛 , below
are three options to check this:
1. Check 𝑆 for linear independence and check 𝑆 for spanning, or
2. Count the vectors in 𝑆 and, if 𝑆 contains exactly 𝑛 vectors, then check 𝑆 for spanning
using Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛), or
3. Count the vectors in 𝑆 and, if 𝑆 contains exactly 𝑛 vectors, then check 𝑆 for linear
independence using Proposition 8.3.6 (Pivots and Linear Independence).
In practice, the last option above is usually the quickest.
Example 8.5.5
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
1
−1
4
1
4
⎥
⎢
⎥
⎢
⎥
⎢
⎥
⎢
⎥
⎢
−3
1
3
2
2
⎥ #» ⎢ ⎥ #» ⎢ ⎥ #» ⎢ ⎥ #» ⎢ ⎥
Let 𝑣#»1 = ⎢
⎣ 3 ⎦ , 𝑣2 = ⎣ −3 ⎦ , 𝑣3 = ⎣ 2 ⎦ , 𝑣4 = ⎣ 1 ⎦ , 𝑣5 = ⎣ 2 ⎦ .
−1
1
1
4
4
Which of the following subsets of R4 is a basis for R4 ?
𝑆1 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 } , 𝑆2 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 , 𝑣#»5 } ,
𝑆3 = {𝑣#»1 , 𝑣#»2 , #»
𝑣 3 , 𝑣#»4 } , 𝑆4 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»5 } .
Solution: Sets 𝑆1 and 𝑆2 fail immediately as they have the wrong number of vectors in
them.
Sets 𝑆3 and 𝑆4 have the correct number of vectors in them so we need to investigate them
further. Let us check whether they are linear independent.
The matrix whose columns are the vectors in 𝑆3 is
⎡
⎤
1 −1 4 1
⎢2 2 3 1⎥
⎥
𝐴=⎢
⎣ 3 −3 2 1 ⎦ ,
4 4 11
Section 8.5
Basis
211
which row reduces to
⎡
1
⎢0
⎢
⎣0
0
0
1
0
0
⎤
0 15
0 0⎥
⎥.
1 15 ⎦
0 0
Thus, rank(𝐴) = 3. Since rank(𝐴) ̸= 4, 𝑆3 is not linearly independent. Consequently, it is
not a basis either.
For set 𝑆4 , we consider the matrix
−1
2
−3
4
4
3
2
1
⎡
0
1
0
0
⎤
0
0⎥
⎥.
0⎦
1
1
⎢2
𝐵=⎢
⎣3
4
which row reduces to
⎤
4
−3 ⎥
⎥,
2 ⎦
−1
⎡
1
⎢0
⎢
⎣0
0
0
0
1
0
Since rank(𝐵) = 4, it must be the case that 𝑆4 is linearly independent. Since 𝑆4 contains
exactly four elements, it is a basis for R4 by Proposition 8.5.4 (𝑛 Vectors in F𝑛 Span iff
Independent).
REMARK
There are two problems we might encounter when trying to obtain a basis for F𝑛 :
(a) We might have a set of vectors 𝑆 ⊆ F𝑛 with the property that Span (𝑆) = F𝑛 , but the
set contains more than 𝑛 vectors, which is too many to be a basis. In this case, 𝑆 will be
linearly dependent. We may apply Proposition 8.3.6 (Pivots and Linear Independence)
to produce a subset of 𝑆 that is linearly independent, but still spans F𝑛 . This subset
will be a basis for F𝑛 .
(b) We might have a set of vectors 𝑆 ⊆ F𝑛 that is linearly independent, but that contains
fewer than 𝑛 vectors, which is too few to be a basis. In this case, Span (𝑆) ̸= F𝑛 . The
problem here is to figure out which vectors to add to 𝑆 to make it span F𝑛 . One possible
approach is to add all 𝑛 standard basis vectors to 𝑆, obtaining a larger set 𝑆 ′ . Then
certainly Span (𝑆 ′ ) = F𝑛 , but now 𝑆 ′ is too large to be a basis. This brings us back to
(a).
We summarize observations stated in the above remark in Theorem 8.5.6.
Theorem 8.5.6
(Basis From a Spanning Set or Linearly Independent Set)
Let 𝑆 = { #»
𝑣 1 , . . . , #»
𝑣 𝑘 } be a subset of F𝑛 .
212
Chapter 8
Subspaces and Bases
(a) If Span(𝑆) = F𝑛 , then there exists a subset ℬ of 𝑆 which is a basis for F𝑛 .
(b) If Span(𝑆) ̸= F𝑛 and 𝑆 is linearly independent, then there exist vectors #»
𝑣 𝑘+1 , . . . , #»
𝑣𝑛
#»
#»
#»
#»
𝑛
𝑛
in F such that ℬ = { 𝑣 1 , . . . , 𝑣 𝑘 , 𝑣 𝑘+1 , . . . , 𝑣 𝑛 } is a basis for F .
Example 8.5.7
Find a basis ℬ of R3 that contains the set
⎧⎡ ⎤ ⎡
⎤⎫
−3 ⎬
⎨ 1
𝑆 = ⎣ 2 ⎦ , ⎣ −4 ⎦ .
⎭
⎩
3
−6
Solution: Notice that 𝑆 is linearly independent, since the two vectors it contains are not
multiples of one another. We would like to extend 𝑆 to a basis for R3 . We begin by adding
the standard basis vectors to the set to obtain the set
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
0 ⎬
0
1
−3
⎨ 1
𝑆 ′ = ⎣ 2 ⎦ , ⎣ −4 ⎦ , ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ .
⎭
⎩
1
0
0
−6
3
Now Span 𝑆 ′ = R3 and we would like to extract a linearly independent subset from 𝑆 ′ using
the method of Proposition 8.3.6 (Pivots and Linear Independence). The matrix whose
columns are the vectors in 𝑆 ′ is
⎡
⎤
1 −3 1 0 0
𝐴 = ⎣ 2 −4 0 1 0 ⎦ ,
3 −6 0 0 1
which row reduces to
⎡
⎤
1 0 −2 0 1
⎣ 0 1 −1 0 1 ⎦ .
3
0 0 0 1 − 23
There are pivots in columns 1, 2 and 4. Thus, columns 1, 2 and 4 of 𝐴 are a basis for R3 :
⎧⎡ ⎤ ⎡
⎤ ⎡ ⎤⎫
−3
0 ⎬
⎨ 1
ℬ = ⎣ 2 ⎦ , ⎣ −4 ⎦ , ⎣ 1 ⎦ .
⎩
⎭
3
−6
0
In the above, we only considered bases for the whole space F𝑛 . In the next section we will
discuss examples of bases for special subspaces of F𝑛 . In Chapter 9 we will look at bases
for eigenspaces and we will discover a connection to the problem of diagonalizability that
we had met in Section 7.6.
8.6
Bases for Col(𝐴) and Null(𝐴)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). In Propositions 8.1.3 and 8.1.2, we saw that Col(𝐴) is a subspace of F𝑚
and that Null(𝐴) is a subspace of F𝑛 .
Section 8.6
Bases for Col(𝐴) and Null(𝐴)
Proposition 8.6.1
213
(Basis for Col(𝐴))
[︀
]︀
Let 𝐴 = 𝑎#»1 · · · 𝑎# »
∈ 𝑀𝑚×𝑛 (F) and suppose that RREF(𝐴) has pivots in columns
𝑛
𝑞1 , . . . , 𝑞𝑟 , where 𝑟 = rank(𝐴). Then {𝑎# 𝑞»1 , . . . , 𝑎# 𝑞»𝑟 } is a basis for Col(𝐴).
Proof: This follows from Proposition 8.3.6 (Pivots and Linear Independence).
REMARK
It is important to note that the columns of 𝐴—and not those of RREF(𝐴)—are the elements
of the basis given in the previous proposition. In general, the columns in RREF(𝐴) are not
necessarily in Col(𝐴).
⎡
Example 8.6.2
1
⎢ −2
Let 𝐴 = ⎢
⎣ 2
3
3
−2
3
4
3
2
0
−1
2
−8
7
11
⎤
−9
2 ⎥
⎥. Find a basis for Col(𝐴).
1 ⎦
−8
Solution: Row reduce 𝐴 to
⎡
1
⎢0
RREF(𝐴) = ⎢
⎣0
0
0
1
0
0
−3
2
0
0
5
−1
0
0
⎤
0
0⎥
⎥.
1⎦
0
Since there are pivots in columns 1, 2 and 5, we conclude that
⎧⎡ ⎤ ⎡ ⎤ ⎡
⎤⎫
−9 ⎪
3
1
⎪
⎪
⎪
⎬
⎨⎢ ⎥ ⎢ ⎥ ⎢
−2 ⎥ ⎢ −2 ⎥ ⎢ 2 ⎥
⎥
⎢
ℬ = ⎣ ⎦,⎣ ⎦,⎣
1 ⎦⎪
3
2
⎪
⎪
⎪
⎭
⎩
−8
4
3
is a basis for Col(𝐴).
Notice that in this example it would be incorrect to say that the set of pivot columns of
RREF(𝐴) is a basis for Col(𝐴). Indeed, they all have 0 in the fourth component, so they
cannot possibly span Col(𝐴). We can also show that
⎡ ⎤
0
⎢0⎥
⎢ ⎥ ̸∈ Col(𝐴).
⎣1⎦
0
Next we turn our attention to Null(𝐴). Recall that we can view this subspace as the solution
#»
set to the homogeneous system 𝐴 #»
𝑥 = 0 . If we re-examine the Gauss–Jordan Algorithm,
we find that it provides us with a basis for Null(𝐴). Indeed, the solutions corresponding to
the free parameters are by design a basis for Null(𝐴).
214
Chapter 8
Subspaces and Bases
⎡
Example 8.6.3
⎤
1 −2 −1 3
Let 𝐴 = ⎣ 2 −4 1 0 ⎦. Find a basis for Null(𝐴).
1 −2 2 −3
Solution: The matrix 𝐴 is the
equations
𝑥1
2𝑥1
𝑥1
coefficient matrix of the homogeneous system of linear
−2𝑥2 −𝑥3 +3𝑥4 = 0
−4𝑥2 +𝑥3
= 0 .
−2𝑥2 +2𝑥3 −3𝑥4 = 0
Thus, Null(𝐴) is the solution set to this system. In Example 3.7.4, we applied the Gauss–
Jordan Algorithm to find that the solution to this system is
⎫ ⎧ ⎡ ⎤
⎫
⎧⎡
⎤
⎡ ⎤
2𝑠 − 𝑡
−1
2
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎬ ⎨ ⎢ ⎥
⎬
⎨⎢
⎥
⎢ 0 ⎥
𝑠
1
⎥
⎥
⎢
⎥
⎢
⎢
:
𝑠,
𝑡
∈
F
:
𝑠,
𝑡
∈
F
Null(𝐴) = ⎣
=
+
𝑡
.
𝑠
⎣ 2 ⎦
2𝑡 ⎦
⎪
⎪
⎪
⎪ ⎣0⎦
⎪
⎪
⎪
⎭ ⎪
⎭
⎩
⎩
𝑡
1
0
Notice that the Gauss–Jordan Algorithm immediately supplies us with the spanning set
⎧⎡ ⎤ ⎡ ⎤⎫
−1 ⎪
2
⎪
⎪
⎬
⎨⎢ ⎥ ⎢ ⎥⎪
1⎥ ⎢ 0 ⎥
⎢
ℬ = ⎣ ⎦,⎣ ⎦ .
2 ⎪
0
⎪
⎪
⎪
⎭
⎩
1
0
Moreover, observe that this spanning set is linearly independent since the two vectors it
contains are not multiples of each other. Thus, ℬ is a basis for Null(𝐴).
Digging a bit deeper, the real reason the set ℬ in the previous example is linearly independent is because it contains the solution vectors that were constructed using the free
parameters 𝑥2 and 𝑥4 . (Refer back to Example 3.7.4.) This forces the first vector in the
spanning set to have a 1 in the second component and forces the second vector to have
a 0 in the second component. (It also forces the second vector to have a 1 in the fourth
component and the first vector to have a 0 in the fourth component.)
⎡
Example 8.6.4
⎤
−1 −1 −2 3 1
Let 𝐴 = ⎣ −9 5 −4 −1 −5 ⎦. Find a basis for Null(𝐴).
7 −5 2 3 5
Solution:
The matrix 𝐴 is the coefficient matrix of the homogeneous system of linear equations
−𝑥1 −2𝑥2 −2𝑥3 +3𝑥4 +𝑥5 = 0
−9𝑥1
5𝑥2 −4𝑥3 −𝑥4 −5𝑥5 = 0 .
7𝑥1 −5𝑥2 +2𝑥3 +3𝑥4 +5𝑥5 = 0
Thus, Null(𝐴) is the solution set to this system.
⎡
10
RREF(𝐴) = ⎣ 0 1
00
We have
⎤
1 −1 0
1 −2 −1 ⎦ .
0 0 0
Section 8.6
Bases for Col(𝐴) and Null(𝐴)
215
There are pivots in columns 1 and 2, so we may take 𝑥3 = 𝑠, 𝑥4 = 𝑡 and 𝑥5 = 𝑟 as free
parameters. Then 𝑥1 = −𝑥3 + 𝑥4 = −𝑠 + 𝑡 and 𝑥2 = −𝑥3 + 2𝑥4 + 𝑥5 = −𝑠 + 2𝑡 + 𝑟, and
therefore, the solution set is given by
⎫
⎧⎡
⎫ ⎧ ⎡
⎤
⎡ ⎤
⎤
⎡ ⎤
0
−𝑠 + 𝑡
1
−1
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪⎢ −𝑠 + 2𝑡 + 𝑟 ⎥
⎪ ⎪
⎢1⎥
⎪
⎢2⎥
⎢ −1 ⎥
⎬
⎨
⎬
⎨
⎥
⎢ ⎥
⎢
⎥
⎢ ⎥
⎢
⎥
⎥
⎥
⎢
⎢
⎥
⎢
⎢
𝑠
Null(𝐴) = ⎢
⎥ : 𝑠, 𝑡, 𝑟 ∈ F⎪ = ⎪𝑠 ⎢ 1 ⎥ + 𝑡 ⎢ 0 ⎥ + 𝑟 ⎢ 0 ⎥ : 𝑠, 𝑡, 𝑟 ∈ F⎪ .
⎪
⎪
⎪
⎪
⎪
⎦
⎣0⎦
⎣
⎣1⎦
𝑡
⎪
⎪
⎪
⎪ ⎣ 0 ⎦
⎪
⎪
⎪
⎭
⎩
⎭ ⎪
⎩
1
𝑟
0
0
Consequently, the set
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
−1
1
0 ⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎥
⎥
⎢
⎢
⎢
⎪
−1
2
⎨⎢ ⎥ ⎢ ⎥ ⎢ 1 ⎥
⎥⎬
⎥
⎥
⎥
⎢
⎢
⎢
ℬ = ⎢ 1 ⎥ , ⎢0⎥ , ⎢0⎥
⎪
⎪
⎪
⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦⎪
⎪
⎪
⎪
⎪
⎩
⎭
0
0
1
is a spanning set for Null(𝐴). It is also linearly independent. Indeed, if some linear combination of the vectors in the set is zero, that is,
⎡ ⎤
⎡ ⎤
⎡ ⎤ ⎡ ⎤
−1
1
0
0
⎢ −1 ⎥
⎢2⎥
⎢1⎥ ⎢0⎥
⎢ ⎥
⎢ ⎥
⎢ ⎥ ⎢ ⎥
⎥ + 𝑡 ⎢0⎥ + 𝑟 ⎢0⎥ = ⎢0⎥ ,
1
𝑠⎢
⎢ ⎥
⎢ ⎥
⎢ ⎥ ⎢ ⎥
⎣ 0 ⎦
⎣1⎦
⎣0⎦ ⎣0⎦
0
0
1
0
then
⎡
⎤ ⎡ ⎤
−𝑠 + 𝑡
0
⎢ −𝑠 + 2𝑡 + 𝑟 ⎥ ⎢ 0 ⎥
⎢
⎥ ⎢ ⎥
⎢
⎥ = ⎢0⎥
𝑠
⎢
⎥ ⎢ ⎥
⎣
⎦ ⎣0⎦
𝑡
𝑟
0
and hence 𝑠 = 𝑡 = 𝑟 = 0 by equating the third, fourth and fifth entries. So the only
possible such linear combination is the trivial linear combination. Therefore, ℬ is linearly
independent.
Notice that the third, fourth and fifth entries correspond to the free parameters. This is no
coincidence. Indeed, in the set ℬ, each vector has a nonzero entry in the row corresponding
to its free parameter while all other vectors in ℬ have a 0 in that same row.
The results in the previous two examples may be generalized as follows.
Proposition 8.6.5
(Basis for Null(𝐴))
#»
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and consider the homogeneous linear system 𝐴 #»
𝑥 = 0 . Suppose that,
after applying the Gauss–Jordan Algorithm, we obtain 𝑘 free parameters so that the solution
set to this system is given by
Null(𝐴) = {𝑡1 𝑥#»1 + · · · + 𝑡𝑘 𝑥# »𝑘 : 𝑡1 , . . . , 𝑡𝑘 ∈ F}.
216
Chapter 8
Subspaces and Bases
Here 𝑘 = nullity(𝐴) = 𝑛 − rank(𝐴) and the parameters 𝑡𝑖 and the vectors 𝑥#»𝑖 for 1 ≤ 𝑖 ≤ 𝑘
are obtained using the method outlined in Section 3.7.
Then {𝑥#», . . . , 𝑥# »} is a basis for Null(𝐴).
1
𝑘
Proof: Since Null(𝐴) = {𝑡1 𝑥#»1 + · · · + 𝑡𝑘 𝑥# »𝑘 : 𝑡1 , . . . , 𝑡𝑘 ∈ F}, we see that ℬ = {𝑥#»1 , . . . , 𝑥# »𝑘 } is
a spanning set. The fact that it is linearly independent follows from the way the solution
vectors 𝑥#»𝑖 were constructed such that only 𝑥#»𝑖 has a nonzero entry in the row corresponding
to the 𝑖𝑡ℎ free variable. So if there are scalars 𝑐𝑖 ∈ F so that
#»
𝑐1 𝑥#»1 + · · · + 𝑐𝑘 𝑥# »𝑘 = 0 ,
then by comparing 𝑖𝑡ℎ coefficients of both sides of the equation, we get that 𝑐𝑖 = 0 for all 𝑖.
Thus, ℬ is a basis for Null(𝐴), as claimed.
8.7
Dimension
A nonzero subspace 𝑉 of F𝑛 will have infinitely many different bases. If ℬ = {𝑣#»1 , . . . , 𝑣#»𝑘 } is
a basis for 𝑉 , then so is ℬ ′ = {𝑐𝑣#»1 , . . . , 𝑣#»𝑘 } for any nonzero scalar 𝑐 ∈ F. More drastically,
two different bases can contain substantially different vectors, and not just ones that are
scalar multiples of each other.
Example 8.7.1
In Section 8.5, we saw two bases for R4 : the standard basis
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
0 ⎪
0
0
1
⎪
⎪
⎬
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
0
1
0
⎥ , ⎢ ⎥ , ⎢ ⎥ , ⎢0⎥ ,
𝒮 = {𝑒#»1 , 𝑒#»2 , 𝑒#»3 , 𝑒#»4 } = ⎢
⎣ 0 ⎦ ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦⎪
⎪
⎪
⎪
⎭
⎩
1
0
0
0
and the basis
⎧⎡ ⎤ ⎡
⎤ ⎡ ⎤ ⎡ ⎤⎫
4 ⎪
4
−1
1
⎪
⎪
⎪
⎬
⎨⎢ ⎥ ⎢
⎢
⎥
⎢
⎥
2 ⎥ ⎢ 2 ⎥ ⎢ 3 ⎥ ⎢ −3 ⎥
⎥
⎢
,
,
𝑆4 = ⎣ ⎦ , ⎣
−3 ⎦ ⎣ 2 ⎦ ⎣ 2 ⎦⎪
3
⎪
⎪
⎪
⎭
⎩
−1
1
4
4
given in Example 8.5.5. From our work in Example 8.4.7(b), we can obtain yet another
basis:
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
1
2
2
5 ⎪
⎪
⎪
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
⎬
−2
4
1
⎥,⎢ ⎥,⎢ ⎥,⎢ 3 ⎥ .
𝑆2 = ⎢
⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦ ⎣ 10 ⎦⎪
⎪
⎪
⎪
⎩
⎭
1
−3
8
6
Although the three sets in the above example contain different vectors, they share one thing
in common. They each contain precisely four vectors. This agrees with Proposition 8.5.3
(Size of Basis for F𝑛 ), which states that a basis for F𝑛 must contain precisely 𝑛 vectors.
The same kind of result is true for a subspace 𝑉 of F𝑛 . Even though 𝑉 may have many
different bases, any two bases will have the same number of elements.
Section 8.7
Dimension
Theorem 8.7.2
217
(Dimension is Well-Defined)
# », . . . , 𝑤
# »} are bases for 𝑉 , then
Let 𝑉 be a subspace of F𝑛 . If ℬ = {𝑣#»1 , . . . , 𝑣#»𝑘 } and 𝒞 = {𝑤
1
ℓ
𝑘 = ℓ.
Proof: Since ℬ and 𝒞 are bases for 𝑉 , we have that 𝑉 = Span (ℬ) = Span (𝒞). This means,
# » ∈ Span (ℬ). Thus there are scalars 𝑐 , . . . , 𝑐 such that
in particular, that 𝑤
1
1
𝑘
# » = 𝑐 𝑣#» + 𝑐 𝑣#» + · · · + 𝑐 𝑣#».
𝑤
1
1 1
2 2
𝑘 𝑘
# » would be
At least one of these scalars must be nonzero, for if they were all zero then 𝑤
1
#»
equal to 0 , which would contradict the fact that 𝒞 is linearly independent (by Proposition
8.3.2(a)).
We may assume, without loss of generality, that 𝑐1 ̸= 0. Then we can write
1 #»
#»
#»
𝑣#»1 = (𝑤
1 − 𝑐2 𝑣2 − · · · − 𝑐𝑘 𝑣𝑘 ).
𝑐1
# », 𝑣#», . . . , 𝑣#»}. Thus, we have effectively replaced
This shows that Span{𝑣#»1 , . . . , 𝑣#»𝑘 } = Span{𝑤
1 2
𝑘
# ». By repeating this same argument with 𝑤
# », 𝑣#», . . . , 𝑣#» and 𝑤
# », we can show
𝑣#»1 with 𝑤
1
1 2
2
𝑘
#»
#»
that we can replace some other 𝑣𝑖 , say without loss of generality 𝑣2 , while still having
# », 𝑤
# », 𝑣#», . . . , 𝑣#»}. We can continue this replacement procedure
Span{𝑣#»1 , . . . , 𝑣#»𝑘 } = Span{𝑤
1
2 3
𝑘
#
»
for each 𝑤 in 𝒞. This is only possible if ℓ ≤ 𝑘.
𝑖
Using a similar argument where the roles of ℬ and 𝒞 are swapped, we must also have that
𝑘 ≤ ℓ. Thus, 𝑘 = ℓ.
Definition 8.7.3
Dimension
The number of elements in a basis for a subspace 𝑉 of F𝑛 is called the dimension of 𝑉 .
We denote this number by dim(𝑉 ).
Since every subspace 𝑉 of F𝑛 has a basis (by Theorem 8.5.1 (Every Subspace Has a Basis)),
and since we just proved in Theorem 8.7.2 (Dimension is Well-Defined) that any two bases for
𝑉 have the same number of elements, we see that dim(𝑉 ) is well-defined (or unambiguous).
We can evaluate dim(𝑉 ) by counting the number of vectors in any basis for 𝑉 .
Example 8.7.4
𝑛
𝑛
The standard basis ℰ = {𝑒#»1 , . . . , 𝑒#»
𝑛 } of F consist of 𝑛 vectors. Thus, dim(F ) = 𝑛. This
aligns with our intuition of F𝑛 being “𝑛-dimensional”.
How large can the dimension of a subspace of F𝑛 be?
Proposition 8.7.5
(Bound on Dimension of Subspace)
Let 𝑉 be a subspace of F𝑛 . Then dim(𝑉 ) ≤ 𝑛.
Proof: The proof of Theorem 8.4.1 (Every Subspace Has a Spanning Set) gave us that every
subspace has a basis, and established that the basis generated in the proof was guaranteed
to have at most 𝑛 elements.
218
Chapter 8
Subspaces and Bases
EXERCISE
Let 𝑉 be a subspace of F𝑛 . Show that 𝑉 = F𝑛 if and only if dim(𝑉 ) = 𝑛.
Example 8.7.6
#» #»
Let 𝐿 be the line through the origin in R𝑛 with direction vector 𝑑 ̸= 0 . That is,
{︁ #»}︁
#»
𝐿 = {𝑡 𝑑 : 𝑡 ∈ R} = Span 𝑑 .
{︁ #»}︁
#»
Since 𝑑 ̸= 0, the set 𝑑 is linearly independent, and by the above it spans 𝐿. Thus,
{︁ #»}︁
ℬ = 𝑑 is a basis for 𝐿. Consequently, dim(𝐿) = 1.
That is, lines through the origin are 1-dimensional subspaces of R𝑛 .
Example 8.7.7
Let 𝑃 be the plane in R3 given by the scalar equation 2𝑥 − 6𝑦 + 4𝑧 = 0. In Example 8.4.3,
we saw that
⎧⎡ ⎤ ⎡ ⎤⎫
−2 ⎬
⎨ 3
⎦
⎣
⎣
1 , 0 ⎦
𝑆=
⎭
⎩
1
0
is a basis for 𝑃 . Thus, dim(𝑃 ) = 2.
More generally, if 𝑃 is any plane through the origin in R3 , then we can show that any two
linearly independent vectors in 𝑃 will form a basis for 𝑃 . Thus, dim(𝑃 ) = 2 in all cases.
That is, planes through the origin are 2-dimensional subspaces of R𝑛 .
The next two examples of dimension computations are important enough to warrant being
put in a proposition.
Proposition 8.7.8
(Rank and Nullity as Dimensions)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). Then
(a) rank(𝐴) = dim(Col(𝐴)), and
(b) nullity(𝐴) = dim(Null(𝐴)).
Proof: (a) follows from Proposition 8.6.1 (Basis for Col(𝐴)).
(b) follows from Proposition 8.6.5 (Basis for Null(𝐴)).
Using the definition of nullity (Definition 3.6.8), we arrive at the following important result.
Section 8.8
Coordinates
Theorem 8.7.9
219
(Rank–Nullity Theorem)
Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). Then
𝑛 = rank(𝐴) + nullity(𝐴)
= dim(Col(𝐴)) + dim(Null(𝐴)).
Proof: This result follows immediately from the fact that nullity(𝐴) = 𝑛 − rank(𝐴), together with Proposition 8.7.8 (Rank and Nullity as Dimensions).
This relationship between rank and nullity is one of the central results of linear algebra.
Although the above proof seems short, it contains a significant amount of content.
8.8
Coordinates
In this section, we discuss one of the most important results about a basis. This result will
naturally lead us to the idea of coordinates with respect to bases.
Theorem 8.8.1
(Unique Representation Theorem)
#»
𝑛
𝑛
Let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } be a basis for F . Then, for every vector 𝑣 ∈ F , there exist unique
scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 ∈ F such that
#»
𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#»
𝑛.
Proof: First we prove that such scalars exist. Since ℬ is a basis for F𝑛 , Span (ℬ) = F𝑛 ,
and so each vector #»
𝑣 can be written as a linear combination of elements of ℬ, that is, there
exist scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 ∈ F such that #»
𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#»
𝑛.
Next, we show that the representation is unique. Suppose that we can also express #»
𝑣 as
#»
𝑣 = 𝑑1 𝑣#»1 + 𝑑2 𝑣#»2 + · · · + 𝑑𝑛 𝑣#»
𝑛 , for some scalars 𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ∈ F.
If we subtract these two expressions for #»
𝑣 , then we have
#» #» #»
#»
#»
#»
0 = 𝑣 − 𝑣 = (𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#»
𝑛 ) − (𝑑1 𝑣1 + 𝑑2 𝑣2 + · · · + 𝑑𝑛 𝑣𝑛 )
= (𝑐 − 𝑑 )𝑣#» + (𝑐 − 𝑑 )𝑣#» + · · · + (𝑐 − 𝑑 )𝑣#».
1
1
1
2
2
2
𝑛
𝑛
𝑛
Since ℬ is linearly independent, we have that (𝑐1 − 𝑑1 ) = 0, (𝑐2 − 𝑑2 ) = 0, . . . , (𝑐𝑛 − 𝑑𝑛 ) = 0.
That is, 𝑐1 = 𝑑1 , 𝑐2 = 𝑑2 , . . . , 𝑐𝑛 = 𝑑𝑛 , and thus, the representation for #»
𝑣 is unique.
⎤
𝑥1
#» ⎢ . ⎥
𝑛
𝑛
Let ℰ = {𝑒#»1 , . . . , 𝑒#»
𝑛 } be the standard basis for F . Given 𝑥 = ⎣ .. ⎦ ∈ F , we have
𝑥𝑛
⎡
Example 8.8.2
#»
𝑥 = 𝑥1 𝑒#»1 + · · · + 𝑥𝑛 𝑒#»
𝑛.
220
Chapter 8
Subspaces and Bases
Thus, the unique scalars in the representation of #»
𝑥 in terms of the standard basis are the
coordinates (or components) of #»
𝑥.
The previous Example motivates our next definition.
Definition 8.8.3
#»
𝑛
𝑛
Let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } be a basis for F . Let the vector 𝑣 ∈ F have representation
Coordinates and
Components
#»
𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#»
𝑛
(𝑐𝑖 ∈ F).
We call the scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 the coordinates (or components) of #»
𝑣 with respect
#»
to ℬ, or the ℬ-coordinates of 𝑣 .
⎡
⎤
𝑐1
⎢ ⎥
We would like to use this definition to create a column vector ⎣ ... ⎦ using the coordinates
𝑐𝑛
𝑐1 , . . . , 𝑐𝑛 of #»
𝑣 with respect to ℬ. However, the ordering of the vectors in a basis (that
is, the assignment of labels 1, 2, ..., 𝑛 for 𝑣#»1 , 𝑣#»2 , ..., 𝑣#»
𝑛 ) is arbitrary, and permutations of the
ordering result in the same basis. Unfortunately, each of these permutations can result in
different representations of the vector under ℬ, which could be a source of confusion.
The next definition helps us address this subtlety.
Definition 8.8.4
Ordered Basis
𝑛
An ordered basis for F𝑛 is a basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } for F together with a fixed
ordering.
REMARK
#»
When we refer to the set {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } as being ordered, we are indicating that 𝑣1 is the
#»
first element in the ordering, that 𝑣2 is the second, and so on.
Thus even though {𝑣#», 𝑣#»} and {𝑣#», 𝑣#»} are the same set, they are different from the point
1
2
2
1
of view of orderings.
A given basis {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } gives rise to 𝑛! ordered bases, one for each possible ordering
(permutation) of the entries.
Example 8.8.5
{︂[︂ ]︂ [︂ ]︂}︂
1
5
The set ℬ =
,
is a basis for R2 . The two vectors in ℬ are not multiples of each
2
6
other, so ℬ is linearly independent. Since ℬ contains precisely two vectors, it must also
span R2 , by Proposition 8.5.4 (𝑛 Vectors in F𝑛 Span iff Independent). Hence ℬ is a basis
for R2 .
The basis ℬ gives rise to two different ordered bases:
{︂[︂ ]︂ [︂ ]︂}︂
{︂[︂ ]︂ [︂ ]︂}︂
1
5
5
1
,
and
,
.
2
6
6
2
Section 8.8
Coordinates
Definition 8.8.6
Coordinate Vector
221
#»
𝑛
𝑛
Let ℬ = {𝑣#»1 , . . . , 𝑣#»
𝑛 } be an ordered basis for F . Let 𝑣 ∈ F have coordinates 𝑐1 , . . . , 𝑐𝑛
with respect to ℬ, where the ordering of the scalars 𝑐𝑖 matches the ordering in ℬ, that is,
#»
𝑣 =
𝑛
∑︁
𝑐𝑖 𝑣#»𝑖 .
𝑖=1
𝑣 with respect to ℬ (or the ℬ-coordinate vector of
Then the coordinate vector of #»
#»
𝑛
𝑣 ) is the column vector in F
⎡ ⎤
𝑐1
⎢ 𝑐2 ⎥
⎢ ⎥
[ #»
𝑣 ]ℬ = ⎢ . ⎥ .
⎣ .. ⎦
𝑐𝑛
If we choose a different ordering on the basis ℬ then the entries of [ #»
𝑣 ]ℬ will be permuted
accordingly.
Example 8.8.7
Consider the ordered basis ℬ =
have
[︂ ]︂
{︂[︂ ]︂ [︂ ]︂}︂
7
5
1
. By inspection, we
for R2 and let #»
𝑣 =
,
10
6
2
[︂ ]︂
[︂ ]︂
5
1
#»
.
+ (1)
𝑣 = (2)
6
2
So the ℬ-coordinates of #»
𝑣 are 2 and 1. Therefore,
[︂ ]︂
2
#»
.
[ 𝑣 ]ℬ =
1
{︂[︂ ]︂ [︂ ]︂}︂
1
5
, we would have
,
If we instead use the ordered basis 𝒞 =
2
6
[︂ ]︂
1
#»
[ 𝑣 ]𝒞 =
.
2
In using coordinates, it is therefore important to indicate clearly which ordered basis you
are using.
REMARK (Standard Ordering)
#»
𝑛
Let ℰ = {𝑒#»1 , . . . , 𝑒#»
𝑛 } be the standard basis for F , ordered so that 𝑒𝑖 is the 𝑖th vector. We
call this the standard ordering. Unless explicitly stated otherwise, we always assume
that ℰ is given the standard ordering.
In the early part of the course, we had been implicitly using coordinate vectors with respect
222
Chapter 8
Subspaces and Bases
to ℰ. Indeed, whenever we had written
⎤
𝑣1
⎢ 𝑣2 ⎥
⎢ ⎥
#»
𝑣 =⎢ . ⎥
⎣ .. ⎦
⎡
𝑣𝑛
we were implicitly writing down [ #»
𝑣 ]ℰ .
Moving forwards, when we attempt to describe some vector in F𝑛 , we still have to represent
it somehow. Unless we have some other explicit arrangement, we will do this as above, i.e.,
by making use of the standard basis (with its standard ordering). We will often suppress
the notation [ #»
𝑣 ]ℰ when doing so, and instead simply write #»
𝑣 as we have been doing so far.
REMARK
The notation of [ #»
𝑣 ]ℰ (and more generally [ #»
𝑣 ]ℬ ) is reminiscent of the notation [𝑇 ]ℰ for the
standard matrix representation of the linear transformation 𝑇 . In both cases, the notation
conveys the idea that we are taking an object (a vector or linear transformation) and
representing it as an array of numbers (a column vector or matrix) by using a basis to
obtain this representation.
There is another connection between these two pieces of notation, as we will see later.
#»
#»
𝑛
Note that for an ordered basis ℬ = {𝑣#»1 , . . . , 𝑣#»
𝑛 } of F , the relationship between 𝑣 and [ 𝑣 ]ℬ
is a two-way relationship:
⎡ ⎤
𝑐1
⎢ .. ⎥
#»
#»
#
»
#»
𝑣 = 𝑐1 𝑣1 + · · · + 𝑐𝑛 𝑣𝑛 ⇐⇒ [ 𝑣 ]ℬ = ⎣ . ⎦ .
𝑐𝑛
Thus, if we have #»
𝑣 , we can obtain [ #»
𝑣 ]ℬ , and vice versa. We leave the proof of the following
result as an exercise.
Theorem 8.8.8
(Linearity of Taking Coordinates)
𝑛
Let ℬ = {𝑣#»1 , . . . , 𝑣#»
𝑛 } be an ordered basis for F . Then the function [
#»
#»
by sending 𝑣 to [ 𝑣 ]ℬ is linear :
]ℬ : F𝑛 → F𝑛 defined
(a) For all #»
𝑢 , #»
𝑣 ∈ F𝑛 , [ #»
𝑢 + #»
𝑣 ]ℬ = [ #»
𝑢 ]ℬ + [ #»
𝑣 ]ℬ .
(b) For all #»
𝑣 ∈ F𝑛 and 𝑐 ∈ F, [𝑐 #»
𝑣 ]ℬ = 𝑐[ #»
𝑣 ]ℬ .
We now provide some examples of computing coordinates with respect to bases for F𝑛 .
Section 8.8
Coordinates
Example 8.8.9
223
{︂[︂ ]︂ [︂ ]︂}︂
[︂ ]︂
1
5
3
#»
2
Consider the ordered basis ℬ =
,
for R . Let 𝑣 =
. (By this we mean [ #»
𝑣 ]ℰ ,
2
6
4
as explained at the end of the above remark on the standard ordering.)
(a) Find [ #»
𝑣 ]ℬ .
[︂ ]︂
−3
#»
#» (i.e., find [ 𝑤]
#» ).
(b) If [ 𝑤]ℬ =
, find 𝑤
ℰ
2
Solution:
(a) To find [ #»
𝑣 ]ℬ , we need to find the coordinates of #»
𝑣 with respect to ℬ. Thus, we want
to find 𝑎, 𝑏 ∈ F so that
[︂ ]︂
[︂ ]︂ [︂ ]︂
1
5
3
𝑎
+𝑏
=
.
2
6
4
This leads us to the system
[︂
15
26
]︂ [︂ ]︂ [︂ ]︂
𝑎
3
=
𝑏
4
with augmented matrix
[︂
which reduces to
Thus, 𝑎 =
1
2
[︃
1 5 3
2 6 4
]︂
1
2
1
2
]︃
1 0
0 1
,
.
and 𝑏 = 12 . These are the coordinates of #»
𝑣 with respect of ℬ. Therefore,
[︃ ]︃
[︃ ]︃
1
3
= 21 .
4
2
ℬ
[︂ ]︂
−3
#»
. This means that
(b) We are given that [ 𝑤]ℬ =
2
[︂ ]︂
[︂ ]︂ [︂ ]︂
1
5
7
#»
𝑤 = (−3)
+2
=
.
2
6
6
That is,
[︂ ]︂
#» = 𝑤
#» = 7 .
[ 𝑤]
ℰ
6
Example 8.8.10
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎡ ⎤
1
1 ⎬
−3
⎨ 1
#»
⎣
⎦
⎣
⎦
⎣
⎦
⎣
1 , 0 , −2
Let ℬ =
and let 𝑣 = 2 ⎦.
⎩
⎭
1
3
4
−6
(a) Show that ℬ is a basis for R3 .
224
Chapter 8
Subspaces and Bases
(b) Viewing ℬ as an ordered basis in the given order, find [ #»
𝑣 ]ℬ .
⎡
⎤
3
#» = ⎣ −4 ⎦, find [ 𝑤]
#» .
(c) If [ 𝑤]
ℬ
ℰ
5
Solution:
(a) By Proposition 8.5.4 (𝑛 Vectors in F𝑛 Span iff Independent), since we have three vectors
in ℬ, it suffices to check that they span R3 .
By Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛), this can be done by evaluating the rank
of
⎡
⎤
11 1
𝐴 = ⎣ 1 0 −2 ⎦ .
13 4
⎡
⎤
111
Row reduction yields the matrix 𝑅 = ⎣ 0 1 3 ⎦ , which has rank 3. Thus, we have a
003
basis.
(b) We need to find 𝑎, 𝑏, 𝑐 ∈ R so that
⎤ ⎡ ⎤
⎡
⎡ ⎤
⎡ ⎤
−3
1
1
1
⎦
⎣
⎦
⎣
⎦
⎣
⎣
𝑎 1 + 𝑏 0 + 𝑐 −2 = 2 ⎦ .
−6
4
3
1
Consider the system
⎡
⎤⎡ ⎤ ⎡
⎤
11 1
𝑎
−3
⎣ 1 0 −2 ⎦ ⎣ 𝑏 ⎦ = ⎣ 2 ⎦ ,
13 4
𝑐
−6
which has augmented matrix
⎡
⎤
1 1 1 −3
⎣ 1 0 −2 2 ⎦ ,
1 3 4 −6
which reduces to
⎡
1 0 0
⎢
⎣ 0 1 0
0 0 1
−8
3
⎤
⎥
2 ⎦.
−7
3
Thus, 𝑎 = − 83 , 𝑏 = 2 and 𝑐 = − 73 , and we have
⎡
⎤
− 83
⎢
⎥
[ #»
𝑣 ]ℬ = ⎣ 2 ⎦ .
− 37
Section 8.8
Coordinates
225
(c) If
⎡
⎤
3
#» = ⎣ −4 ⎦ ,
[ 𝑤]
ℬ
5
then
⎡ ⎤
⎡ ⎤
⎡
⎤ ⎡ ⎤
1
1
1
4
#» = (3) ⎣ 1 ⎦ + (−4) ⎣ 0 ⎦ + (5) ⎣ −2 ⎦ = ⎣ −7 ⎦ ,
𝑤
1
3
4
11
that is,
⎡
⎤
4
#» = 𝑤
#» = ⎣ −7 ⎦ .
[ 𝑤]
ℰ
11
In the previous two examples, we saw that in order to obtain the coordinate vector representation [ #»
𝑣 ]ℬ of a given vector #»
𝑣 , we must solve a certain system of equations. Must
we do this each time we want to find coordinate vectors with respect to ℬ? The answer is
“no”; there is a more efficient approach.
Example 8.8.11
Let us first consider the final calculation of Example 8.8.9 (b), where we had
[︂ ]︂
[︂ ]︂
#» = 𝑤
#» = 7 .
#» = −3 , and then [ 𝑤]
[ 𝑤]
ℬ
ℰ
6
2
This was completed by performing the calculation
[︂ ]︂
[︂ ]︂
5
1
#»
= 𝑤.
+2
(−3)
6
2
The system above can be expressed as
[︂
15
26
]︂ [︂
]︂
−3
.
2
[︂
]︂
15
Thus, there is a matrix at work here, namely 𝑀 =
, such that multiplication by 𝑀
26
allows us to go from the new coordinate system (with respect to the basis ℬ) to the standard
system (basis ℰ).
[︂
]︂
4
For instance, if we are given a vector #»
𝑧 with [ #»
𝑧 ]ℬ =
, then
−3
[︂
]︂ [︂ ]︂ [︂
]︂
15
4
−11
#»
#»
[ 𝑧 ]ℰ = 𝑀 [ 𝑧 ]ℬ =
=
.
2 6 −3
−10
Notice that 𝑀 is very special. Its columns are the coordinate vectors of ℬ expressed in the
standard basis.
226
Chapter 8
Subspaces and Bases
What about going the other way,
[︂ from
]︂ ℰ to ℬ? Let us re-examine what we did in Example
3
8.8.9 (a). We wanted to obtain
, which we found amounted to solving the system
4 ℬ
[︂
15
26
]︂ [︂ ]︂ [︂ ]︂
𝑎
3
=
.
𝑏
4
The coefficient matrix here is the same 𝑀 above. It is invertible! (This is a general
phenomenon, as we will soon see.) Thus,
[︂ ]︂ [︂
]︂−1 [︂ ]︂
[︂
]︂ [︂ ]︂ [︃ 1 ]︃
1 6 −5 3
𝑎
15
3
2
=
=−
= 1 .
𝑏
26
4
4
4 −2 1
2
From this, we get
⎡1⎤
[︂ ]︂
2
3
= ⎣ 1 ⎦.
4 ℬ
2
]︂
6 −5
. It will change
−2 1
coordinates from the standard system (basis ℰ) to the new system (basis ℬ), and it will do
this for any vector whose components we already have in ℰ.
[︂ ]︂
−2
, then
For instance, given a vector #»
𝑢 with [ #»
𝑢 ]ℰ =
5
Once again there is a matrix at work here, namely 𝑀 −1 = − 14
[︂
]︂ [︂ ]︂
]︂
[︂
[︂
1 6 −5 −2
1 37
−1 #»
#»
[ 𝑢 ]ℬ = 𝑀 [ 𝑢 ]ℰ = −
=
.
5
4 −2 1
4 −9
To obtain 𝑀 −1 , we invert the matrix whose columns are the coordinates of the new basis
vector in ℬ expressed in the standard basis.
With some work, you can show that the columns of 𝑀 −1 are in fact the coordinate vectors
of the standard basis vectors with respect to ℬ.
The idea of using a matrix like 𝑀 above to change coordinates from one basis to another
is in fact always possible, as we now explain.
Definition 8.8.12
Change-of-Basis
Matrix, Change-ofCoordinate
Matrix
#»
#»
𝑛
Let ℬ = {𝑣#»1 , . . . , 𝑣#»
𝑛 } and 𝒞 = {𝑤1 , . . . , 𝑤𝑛 } be ordered bases for F .
The change-of-basis (or change-of-coordinates) matrix from ℬ-coordinates to 𝒞coordinates is the 𝑛 × 𝑛 matrix
[︀
]︀
[𝐼] = [𝑣#»] . . . [𝑣#»]
𝒞
ℬ
1 𝒞
𝑛 𝒞
whose columns are the 𝒞-coordinates of the vectors 𝑣#»𝑖 in ℬ.
Similarly, the change-of-basis (or change-of-coordinates) matrix from 𝒞-coordinates
to ℬ-coordinates is the 𝑛 × 𝑛 matrix
[︀ # »
# »] ]︀
[𝐼] = [𝑤
] . . . [𝑤
ℬ
𝒞
1 ℬ
𝑛 ℬ
#» in 𝒞.
whose columns are the ℬ-coordinates of the vectors 𝑤
𝑖
Section 8.8
Coordinates
227
REMARKS
• The reason for the notation will become apparent later. The resemblance to the
notation [𝑇 ]ℰ for the standard matrix of the linear transformation 𝑇 is intentional.
• Other sources may notate 𝒞 [𝐼]ℬ as 𝒞 𝑃ℬ or 𝑃𝒞←ℬ .
Example 8.8.13
The matrix 𝑀 in Example 8.8.9 is none other than ℰ [𝐼]ℬ , while the matrix 𝑀 −1 is ℬ [𝐼]ℰ .
(This is not a coincidence. See Corollary 8.8.16 (Inverse of Change-of-Basis Matrix).)
The role of the change-of-basis matrix is exactly as its name suggests. It changes coordinates
from one basis to another.
Proposition 8.8.14
(Changing a Basis)
#»
#»
𝑛
Let ℬ = {𝑣#»1 , . . . , 𝑣#»
𝑛 } and 𝒞 = {𝑤1 , . . . , 𝑤𝑛 } be ordered bases for F .
Then [ #»
𝑥 ] = [𝐼] [ #»
𝑥 ] and [ #»
𝑥 ] = [𝐼] [ #»
𝑥 ] for all #»
𝑥 ∈ F𝑛 .
𝒞
𝒞
ℬ
ℬ
ℬ
ℬ
𝒞
𝒞
⎡
⎤
𝑎1
⎢ ⎥
Proof: Suppose that [ #»
𝑥 ]ℬ = ⎣ ... ⎦ . Then #»
𝑥 = 𝑎1 𝑣#»1 + · · · + 𝑎𝑛 𝑣#»
𝑛 . As the coordinate vector
𝑎𝑛
map is linear by Theorem 8.8.8 (Linearity of Taking Coordinates), we have
[ #»
𝑥 ]𝒞 = [𝑎1 𝑣#»1 + · · · + 𝑎𝑛 𝑣#»
𝑛 ]𝒞
#»
= 𝑎 [𝑣 ] + · · · + 𝑎 [𝑣#»]
1
1 𝒞
𝑛
𝑛 𝒞
⎡
⎤
𝑎1
⎥
[︀
]︀ ⎢
⎢ 𝑎2 ⎥
= [𝑣#»1 ]𝒞 [𝑣#»2 ]𝒞 · · · [𝑣#»
]
𝑛 𝒞 ⎢ .. ⎥
⎣ . ⎦
= 𝒞 [𝐼]ℬ [ #»
𝑥 ]ℬ .
𝑎𝑛
The proof going from 𝒞 to ℬ is identical, with 𝒞 switched with ℬ.
⎡
Corollary 8.8.15
⎤
𝑥1
⎢ ⎥
Let #»
𝑥 = [ #»
𝑥 ]ℰ = ⎣ ... ⎦ be a vector in F𝑛 , where ℰ is the standard basis for F𝑛 . If 𝒞 is any
𝑥𝑛
ordered basis for F𝑛 , then
[ #»
𝑥 ]𝒞 = 𝒞 [𝐼]ℰ [ #»
𝑥 ]ℰ .
Proof: Take ℬ = ℰ in Proposition 8.8.14 (Changing a Basis).
228
Chapter 8
Subspaces and Bases
The matrix ℰ [𝐼]𝒞 is relatively easily obtained by simply inserting the standard coordinates
of the vectors in 𝒞 into the columns of a matrix. However, we usually want to go the other
way (as in the above Corollary), and thus, we would like to have 𝒞 [𝐼]ℰ .
The next result tells us that once we have one of these two matrices, then we can obtain
the other rather quickly as they are inverses of each other.
Corollary 8.8.16
(Inverse of Change-of-Basis Matrix)
Let ℬ and 𝒞 be two ordered bases of F𝑛 . Then
ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ
= 𝐼𝑛
and
𝒞 [𝐼]ℬ ℬ [𝐼]𝒞
= 𝐼𝑛 .
In other words, ℬ [𝐼]𝒞 = ( 𝒞 [𝐼]ℬ )−1 .
Proof: Suppose we change coordinates twice. We start using basis ℬ, then change the
basis to 𝒞, and then back to ℬ.
We then have, for any #»
𝑥 ∈ F𝑛 , [ #»
𝑥 ]𝒞 = 𝒞 [𝐼]ℬ [ #»
𝑥 ]ℬ and therefore,
[ #»
𝑥 ]ℬ = ℬ [𝐼]𝒞 [ #»
𝑥 ]𝒞 = ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ [ #»
𝑥 ]ℬ .
We can write this as
(︀
)︀
#»
𝐼𝑛 − ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ [ #»
𝑥 ]ℬ = 0 ,
for all #»
𝑥 ∈ F𝑛 .
Thus, by Theorem 4.2.3 (Equality of Matrices), we conclude that
(𝐼𝑛 −ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ ) = 𝒪,
where 𝒪 is the 𝑛 × 𝑛 zero matrix, and therefore ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ = 𝐼𝑛 . So ℬ [𝐼]𝒞 and 𝒞 [𝐼]ℬ are
inverses of each other.
Example 8.8.17
If we revisit Example 8.8.10 with this new notation and information, we find that
⎡
⎤
11 1
ℰ [𝐼]ℬ = ⎣ 1 0 −2 ⎦ .
13 4
The inverse of this matrix is
and so
⎡
⎡
⎤
6 −1 −2
1⎣
−6 3 3 ⎦
ℬ [𝐼]ℰ =
3
3 −2 −1
⎤
⎡
⎤⎡
⎤
⎡
⎤
−3
6 −1 −2
−3
−8
1
1
⎣ 2 ⎦ = ⎣ −6 3 3 ⎦ ⎣ 2 ⎦ = ⎣ 6 ⎦ ,
3
3
−6 ℬ
3 −2 −1
−6
−7
just as we had computed.
Section 8.8
Coordinates
229
⎡ ⎤
3
Furthermore, if we now want, for example, ⎣ 5 ⎦ , then we can quickly find it:
9 ℬ
⎡ ⎤
⎡
⎤⎡ ⎤
⎡
⎤
3
6 −1 −2
3
−5
1
1
⎣ 5 ⎦ = ⎣ −6 3 3 ⎦ ⎣ 5 ⎦ = ⎣ 24 ⎦ .
3
3
9 ℬ
3 −2 −1
9
−10
Example 8.8.18
#» #»
2
Suppose
[︂ we are
]︂ in C and
[︂ we ]︂wish to work with the ordered basis 𝒞 = {𝑣1 , 𝑣2 } , where
1 + 2𝑖
1+𝑖
𝑣#»1 =
and 𝑣#»2 =
.
−1 + 𝑖
𝑖
[︂
]︂
10 + 𝑖
#»
(a) Let 𝑧 =
. Find [ #»
𝑧 ]𝒞 .
2 + 6𝑖
#» ∈ C2 with [ 𝑤]
#» =
(b) Consider the vector 𝑤
𝒞
[︂
]︂
2
#» .
. Find [ 𝑤]
ℰ
3𝑖
Solution:
Note that, as usual, all the vectors have been given with respect to the standard basis.
We have
[︂
1 + 2𝑖 1 + 𝑖
ℰ [𝐼]𝒞 =
−1 + 𝑖 𝑖
]︂
and therefore, we can compute
[︂
1 + 2𝑖 1 + 𝑖
𝒞 [𝐼]ℰ =
−1 + 𝑖 𝑖
]︂−1
[︂
]︂
[︂
]︂ [︂
]︂
1
𝑖 −1 − 𝑖
𝑖 −1 − 𝑖
1
−1 + 𝑖
=
= −𝑖
=
.
1 − 𝑖 1 + 2𝑖
−1 − 𝑖 2 − 𝑖
𝑖 1 − 𝑖 1 + 2𝑖
(a) We can obtain [ #»
𝑧 ]𝒞 by
[ #»
𝑧 ]𝒞 = 𝒞 [𝐼]ℰ [ #»
𝑧 ]ℰ =
[︂
1
−1 + 𝑖
−1 − 𝑖 2 − 𝑖
]︂ [︂
]︂ [︂
]︂
10 + 𝑖
2 − 3𝑖
=
.
2 + 6𝑖
1−𝑖
[︂
]︂ [︂
]︂ [︂
]︂
2
−1 + 7𝑖
=
.
3𝑖
−5 + 2𝑖
#» by
(b) We can obtain [ 𝑤]
ℰ
#» = [𝐼] [ 𝑤]
#» =
[ 𝑤]
ℰ
ℰ
𝒞
𝒞
1 + 2𝑖 1 + 𝑖
−1 + 𝑖 𝑖
Chapter 9
Diagonalization
9.1
Matrix Representation of a Linear Operator
In this chapter we will focus on linear transformations 𝑇 : F𝑛 → F𝑛 whose domain and
codomain are the same set. Among other notable properties, these transformations will
have the ability to be composed with themselves; that is, functions such as 𝑇 ∘ 𝑇 will be
well-defined when the domain and codomain of 𝑇 are the same.
Definition 9.1.1
A linear transformation 𝑇 : F𝑛 → F𝑚 where 𝑛 = 𝑚 is called a linear operator.
Linear Operator
In Section 5.5, we learned how to find the standard matrix [𝑇 ]ℰ of a linear operator 𝑇 .
In this section we’ll learn how to find matrix representations of a linear operator 𝑇 with
respect to bases for F𝑛 other than the standard basis.
Definition 9.1.2
ℬ-Matrix of T
Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } be an ordered basis for
F𝑛 . We define the ℬ-matrix of 𝑇 to be the matrix [𝑇 ]ℬ constructed as follows:
[︁
]︁
[𝑇 ]ℬ = [𝑇 (𝑣#»1 )]ℬ [𝑇 (𝑣#»2 )]ℬ · · · [𝑇 (𝑣#»
)]
𝑛 ℬ .
That is, after applying the action of 𝑇 to each member of ℬ, we take the ℬ-coordinate
vectors of each of these images to create the columns of [𝑇 ]ℬ .
Similar to the way we used [𝑇 ]ℰ to find the image of a vector in F𝑛 , we can use [𝑇 ]ℬ to
find the coordinate vector with respect to ℬ of the image of any vector in F𝑛 by performing
matrix–vector multiplication.
Proposition 9.1.3
Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } be an ordered basis for
F𝑛 . If #»
𝑣 ∈ F𝑛 , then
[𝑇 ( #»
𝑣 )]ℬ = [𝑇 ]ℬ [ #»
𝑣 ]ℬ .
230
Section 9.1
Matrix Representation of a Linear Operator
231
Proof: Since ℬ is a basis for F𝑛 and #»
𝑣 ∈ F𝑛 , then there exists 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 ∈ F such that
#»
𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#»
𝑛.
We can use the linearity of 𝑇 to expand 𝑇 ( #»
𝑣 ). We get
#»
#»
#»
𝑇 ( #»
𝑣 ) = 𝑇 (𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#»
𝑛 ) = 𝑐1 𝑇 (𝑣1 ) + 𝑐2 𝑇 (𝑣2 ) + · · · + 𝑐𝑛 𝑇 (𝑣𝑛 ).
Taking coordinates is a linear operation, so
[𝑇 ( #»
𝑣 )]ℬ = [𝑐1 𝑇 (𝑣#»1 ) + 𝑐2 𝑇 (𝑣#»2 ) + · · · + 𝑐𝑛 𝑇 (𝑣#»
𝑛 )]ℬ
#»
#»
= 𝑐1 [𝑇 (𝑣1 )]ℬ + 𝑐2 [𝑇 (𝑣2 )]ℬ + · · · + 𝑐𝑛 [𝑇 (𝑣#»
𝑛 )]ℬ
⎡ ⎤
𝑐1
[︁
]︁ ⎢ 𝑐2 ⎥
⎢ ⎥
= [𝑇 (𝑣#»1 )]ℬ [𝑇 (𝑣#»2 )]ℬ · · · [𝑇 (𝑣#»
𝑛 )]ℬ ⎢ .. ⎥
⎣ . ⎦
𝑐𝑛
= [𝑇 ]ℬ [ #»
𝑣 ]ℬ ,
as required.
There are two important reasons why we may want to use another basis. The first is that
we may have some geometrically or physically preferred vectors that naturally arise in our
problem; for example, the direction vector of a line through which we are reflecting vectors.
The second is that we may wish to simplify the matrix representation; for example, it would
simplify things if [𝑇 ]ℬ were diagonal. These two reasons are often connected.
We will now revisit some of the linear transformations that we considered in Section 5.6.
For each transformation, we will choose a basis ℬ such that the ℬ-matrix will be particularly
simple: it will be diagonal.
Example 9.1.4
[︂
]︂
𝑤1
be a non-zero vector in R2 . Consider the projection transformation
𝑤2
proj 𝑤#» : R2 → R2 . Find a basis ℬ such that [proj 𝑤#» ]ℬ is diagonal.
#» =
Let 𝑤
Solution:
Consider some vectors for which it is relatively easy to determine their images under
[︂ proj
]︂ 𝑤#» .
#» onto itself is itself. Therefore, proj #» ( 𝑤)
#» = 𝑤.
#» The vector 𝑤2 is
The projection of 𝑤
𝑤
−𝑤1
(︂[︂
]︂)︂ [︂ ]︂
𝑤2
0
#» Therefore, proj #»
orthogonal to 𝑤.
=
.
𝑤
−𝑤1
0
{︂[︂ ]︂ [︂
]︂}︂
𝑤1
𝑤2
Let ℬ =
,
. This set consists of two vectors that are not scalar multiples of
𝑤2
−𝑤1
each other, and therefore it is linearly independent. So ℬ is a basis for R2 .
We have that
(︂[︂
proj 𝑤#»
and
(︂[︂
proj 𝑤#»
𝑤1
𝑤2
]︂)︂
𝑤2
−𝑤1
]︂
[︂ ]︂
[︂
]︂
𝑤1
𝑤1
𝑤2
=
=1
+0
𝑤2
𝑤2
−𝑤1
]︂)︂
[︂
[︂ ]︂
[︂ ]︂
[︂
]︂
0
𝑤1
𝑤2
=
=0
+0
0
𝑤2
−𝑤1 .
232
Chapter 9
Diagonalization
Consequently,
[︂
]︂
10
.
[proj 𝑤#» ]ℬ =
00
Notice that this is a diagonal matrix.
Example 9.1.5
]︂
𝑤1
be a non-zero vector in R2 . Consider the reflection transformation
𝑤2
#»
refl 𝑤#» : R2 → R2 , which reflects any vector in R2 about the line Span{ 𝑤}.
Find a basis
ℬ such that [refl 𝑤#» ]ℬ is diagonal.
#» =
Let 𝑤
[︂
Solution: Consider some vectors for which it is relatively easy to determine their images
#»
#»
#» #»
#»
under
[︂
]︂refl 𝑤 . The reflection of 𝑤 about itself is itself. Therefore, refl 𝑤 ( 𝑤) = 𝑤. The vector
𝑤2
#» and so its reflection about 𝑤
#» is its additive inverse. Therefore,
is orthogonal to 𝑤
−𝑤1
(︂[︂
]︂)︂ [︂
]︂
𝑤2
−𝑤2
#»
refl 𝑤
=
.
−𝑤1
𝑤1
{︂[︂ ]︂ [︂
]︂}︂
𝑤1
𝑤2
Let ℬ =
,
. This set consists of two vectors that are not scalar multiples of
𝑤2
−𝑤1
each other, and therefore it is linearly independent. So ℬ is a basis for R2 .
We have that
refl 𝑤#»
and
refl 𝑤#»
(︂[︂
(︂[︂
𝑤1
𝑤2
𝑤2
−𝑤1
]︂)︂
]︂)︂
[︂
]︂
[︂ ]︂
[︂
]︂
𝑤1
𝑤1
𝑤2
=
=1
+0
𝑤2
𝑤2
−𝑤1
[︂
]︂
[︂ ]︂
[︂
]︂
−𝑤2
𝑤1
𝑤2
=
=0
+ (−1)
𝑤1
𝑤2
−𝑤1 .
Consequently, we get the following diagonal matrix for [refl 𝑤#» ]ℬ :
]︂
[︂
1 0
.
[refl 𝑤#» ]ℬ =
0 −1
EXERCISE
In Examples{︂[︂
9.1.4]︂and
]ℬ and [refl 𝑤#» ]ℬ if we
[︂ 9.1.5,
]︂}︂ what happens to the matrices
{︂[︂
]︂[proj
[︂ 𝑤#»]︂}︂
𝑤1
𝑤2
𝑤2
𝑤1
replace ℬ =
,
with the ordered basis
,
?
𝑤2
−𝑤1
−𝑤1
𝑤2
REMARK
Finding a basis ℬ such that [𝑇 ]ℬ is diagonal is not usually done by inspection. Moreover,
it is not always possible to choose a basis ℬ such that [𝑇 ]ℬ is diagonal.
the
(︂[︂ For
]︂)︂example,
[︂
]︂
𝑥
𝑥
+
𝑦
2
2
operator known as the “shear operator” 𝑇 : R → R , defined by 𝑇
=
,
𝑦
𝑦
is an example of a linear operator for which no basis ℬ exists such that [𝑇 ]ℬ is diagonal.
Throughout this chapter we will develop criteria to determine whether or not we will be
able to find such a basis.
Section 9.1
Matrix Representation of a Linear Operator
233
Given a linear operator 𝑇 : F𝑛 → F𝑛 , we can create many matrices [𝑇 ]ℬ by choosing different
bases ℬ for F𝑛 . A natural question to ask is: How are these various matrices related? The
answer is that they are similar (see Definition 7.6.2).
Proposition 9.1.6
(Similarity of Matrix Representations)
Let 𝑇 : F𝑛 → F𝑛 be a linear operator. Let ℬ and 𝒞 be ordered bases for F𝑛 . Then
[𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ ℬ [𝐼]𝒞 = (ℬ [𝐼]𝒞 )−1 [𝑇 ]ℬ ℬ [𝐼]𝒞
and
[𝑇 ]ℬ = ℬ [𝐼]𝒞 [𝑇 ]𝒞 𝒞 [𝐼]ℬ = (𝒞 [𝐼]ℬ )−1 [𝑇 ]𝒞 𝒞 [𝐼]ℬ .
That is, the matrices [𝑇 ]ℬ and [𝑇 ]𝒞 are similar over F.
#» #»
#»
Proof: Let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } and let 𝒞 = {𝑤1 , 𝑤2 , . . . , 𝑤𝑛 }. Then using the definition of
[𝑇 ]𝒞 and Proposition 8.8.14 (Changing a Basis) gives
[︁
]︁
# »)] [𝑇 (𝑤
# »)] · · · [𝑇 (𝑤
# »)]
[𝑇 ]𝒞 = [𝑇 (𝑤
1 𝒞
2 𝒞
𝑛 𝒞
[︁
]︁
# »)]
#»
#»
= 𝒞 [𝐼]ℬ [𝑇 (𝑤
1 ℬ 𝒞 [𝐼]ℬ [𝑇 (𝑤2 )]ℬ · · · 𝒞 [𝐼]ℬ [𝑇 (𝑤𝑛 )]ℬ .
Using the definition of matrix multiplication, we can write this matrix as the product
[︁
]︁
# »)]
# »)] · · · [𝑇 (𝑤
# »)] .
[𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 (𝑤
[𝑇
(
𝑤
1 ℬ
2 ℬ
𝑛 ℬ
Using Proposition 9.1.3, we get
[︁
# »]
[𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ [𝑤
1 ℬ
]︁
# »] · · · [𝑇 ] [𝑤
# »] .
[𝑇 ]ℬ [𝑤
2 ℬ
ℬ 𝑛 ℬ
Again using the definition of matrix multiplication, we obtain
[︁
]︁
# »]
#»
#»
[𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ [𝑤
1 ℬ [𝑤2 ]ℬ · · · [𝑤𝑛 ]ℬ .
By definition,
[︁
# »]
[𝐼]
=
[𝑤
1 ℬ
ℬ
𝒞
]︁
# »] · · · [𝑤
# »] .
[𝑤
2 ℬ
𝑛 ℬ
Since
𝒞 [𝐼]ℬ
= (ℬ [𝐼]𝒞 )−1 ,
so we obtain that
[𝑇 ]𝒞 = (ℬ [𝐼]𝒞 )−1 [𝑇 ]ℬ
ℬ [𝐼]𝒞 .
By swapping the roles played by ℬ and 𝒞 in the above argument, we obtain that
[𝑇 ]ℬ = ℬ [𝐼]𝒞 [𝑇 ]𝒞 𝒞 [𝐼]ℬ = (𝒞 [𝐼]ℬ )−1 [𝑇 ]𝒞
𝒞 [𝐼]ℬ .
234
Corollary 9.1.7
Chapter 9
Diagonalization
(Finding the Standard Matrix)
Let 𝑇 : F𝑛 → F𝑛 be a linear operator. Let ℬ be a basis for F𝑛 and let ℰ be the standard
basis for F𝑛 . Then
[𝑇 ]ℰ = ℰ [𝐼]ℬ [𝑇 ]ℬ
ℬ [𝐼]ℰ
= (ℬ [𝐼]ℰ )−1 [𝑇 ]ℬ
[𝑇 ]ℬ = ℬ [𝐼]ℰ [𝑇 ]ℰ
ℰ [𝐼]ℬ
= (ℰ [𝐼]ℬ )−1 [𝑇 ]ℰ
ℬ [𝐼]ℰ
and
ℰ [𝐼]ℬ .
Proof: This follows from the previous Proposition, with 𝒞 replaced by ℰ.
EXERCISE
In this exercise you will show that if 𝐴 and 𝐵 are similar matrices in 𝑀𝑛×𝑛 (F), then they
are different representations of the same linear operator.
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Suppose that 𝐴 is similar to 𝐵, say 𝐴 = 𝑃 𝐵𝑃 −1 for some invertible
matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F). Let ℬ be the ordered basis for F𝑛 consisting of the columns of 𝑃
(ordered in the way they appear in 𝑃 ). Show that [𝑇𝐴 ]ℰ = 𝐴 and [𝑇𝐴 ]ℬ = 𝐵.
REMARK
The previous Exercise justifies the use of the word “similar” to describe the relationship
in Definition 7.6.2. If matrices 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F) are similar over F, then they are both
ℬ-matrices of the same linear operator 𝑇 : F𝑛 → F𝑛 . That is, similar matrices are just
different representations of the same operator.
Example 9.1.8
[︂
]︂
𝑤1
be a non-zero vector in R2 . Consider the projection transformation
𝑤2
proj 𝑤#» : R2 → R2 . Determine [proj 𝑤#» ]ℰ using the solution to Example 9.1.4 and Corollary
9.1.7 (Finding the Standard Matrix).
[︂
]︂
10
Solution: In Example 9.1.4, we found the matrix representation [proj 𝑤#» ]ℬ =
with
00
{︂[︂ ]︂ [︂
]︂}︂
𝑤1
𝑤2
respect to the basis ℬ =
,
.
𝑤2
−𝑤1
#» =
Let 𝑤
We also have that
[︂
𝑤1 𝑤2
ℰ [𝐼]ℬ =
𝑤2 −𝑤1
and
ℬ [𝐼]ℰ
= (ℰ [𝐼]ℬ )
−1
]︂
[︂
]︂
[︂
]︂
1
1
−𝑤1 −𝑤2
𝑤1 𝑤2
= 2
.
=
−𝑤12 − 𝑤22 −𝑤2 𝑤1
𝑤1 + 𝑤22 𝑤2 −𝑤1
Section 9.2
Diagonalizability of Linear Operators
235
Therefore, using Corollary 9.1.7 (Finding the Standard Matrix), we have
[proj 𝑤#» ]ℰ = ℰ [𝐼]ℬ [proj 𝑤#» ]ℬ ℬ [𝐼]ℰ
[︂
]︂ [︂
]︂
[︂
]︂
1
𝑤1 𝑤2
10
𝑤1 𝑤2
=
𝑤2 −𝑤1 0 0 𝑤12 + 𝑤22 𝑤2 −𝑤1
[︂
]︂ [︂
]︂
1
𝑤1 0 𝑤1 𝑤2
= 2
𝑤1 + 𝑤22 𝑤2 0 𝑤2 −𝑤1
[︂ 2
]︂
1
𝑤1 𝑤1 𝑤2
= 2
,
𝑤1 + 𝑤22 𝑤1 𝑤2 𝑤22
which matches what we found in Example 5.6.1.
Example 9.1.9
]︂
𝑤1
be a non-zero vector in R2 . Consider the reflection transformation
𝑤2
#»
refl 𝑤#» : R2 → R2 , which reflects any vector #»
𝑣 ∈ R2 about the line Span{ 𝑤}.
Determine
[refl 𝑤#» ]ℰ using the solution to Example 9.1.5 and Corollary 9.1.7 (Finding the Standard
Matrix).
]︂
[︂
1 0
with
Solution: In Example 9.1.5, we found the matrix representation [refl 𝑤#» ]ℬ =
0 −1
{︂[︂ ]︂ [︂
]︂}︂
𝑤1
𝑤2
respect to the basis ℬ =
,
.
𝑤2
−𝑤1
#» =
Let 𝑤
[︂
We also have that
[︂
ℰ [𝐼]ℬ =
and
ℬ [𝐼]ℰ
= (ℰ [𝐼]ℬ )
−1
𝑤1 𝑤2
𝑤2 −𝑤1
]︂
[︂
[︂
]︂
]︂
1
1
−𝑤1 −𝑤2
𝑤1 𝑤2
= 2
.
=
−𝑤12 − 𝑤22 −𝑤2 𝑤1
𝑤1 + 𝑤22 𝑤2 −𝑤1
Therefore, using Corollary 9.1.7 (Finding the Standard Matrix), we have
[refl 𝑤#» ]ℰ = ℰ [𝐼]ℬ [refl 𝑤#» ]ℬ ℬ [𝐼]ℰ
[︂
]︂ [︂
]︂
[︂
]︂
1
𝑤1 𝑤2
𝑤1 𝑤2
1 0
=
𝑤2 −𝑤1 0 −1 𝑤12 + 𝑤22 𝑤2 −𝑤1
[︂
]︂ [︂
]︂
1
𝑤1 −𝑤2 𝑤1 𝑤2
= 2
𝑤2 −𝑤1
𝑤1 + 𝑤22 𝑤2 𝑤1
[︂ 2
]︂
1
𝑤1 − 𝑤22 2𝑤1 𝑤2
= 2
,
𝑤1 + 𝑤22 2𝑤1 𝑤2 𝑤22 − 𝑤12
which matches what we found in Example 5.6.4.
9.2
Diagonalizability of Linear Operators
Of all the ℬ-matrices of a fixed linear operator 𝑇 : F𝑛 → F𝑛 , some might be simpler than
others. For instance, in Examples 9.1.4 and 9.1.5, we were able to find an ordered basis
ℬ for F𝑛 such that the ℬ-matrix of the given linear operator 𝑇 was especially simple: it
236
Chapter 9
Diagonalization
was a diagonal matrix. This can allow us to perform computations with or make certain
observations about the nature of 𝑇 in a more efficient manner.
Example 9.2.1
[︂ ]︂
1
#»
Let 𝑤 =
∈ R2 and consider the linear operator 𝑇 = refl 𝑤#» . Then:
2
[︂
]︂
1 −3 4
1. Using the standard basis ℰ, we have [𝑇 ]ℰ =
.
5 4 3
{︂[︂ ]︂ [︂ ]︂}︂
[︂
]︂
1
2
1 0
2. Using the basis ℬ =
,
from Example 9.1.5, we have [𝑇 ]ℬ =
.
0 −1
2
−1
{︂[︂ ]︂ [︂ ]︂}︂
[︂
]︂
−1 7 24
1
2
3. Using the basis 𝒞 =
, it can be shown that [𝑇 ]𝒞 =
.
−2 1
25 24 −25
We can see immediately that the matrix [𝑇 ]ℬ is invertible. This is also true of the ℰ- and
𝒞-matrix representations, but it is not as obvious.
Moreover, we can also see at once that ([𝑇 ]ℬ )2 = 𝐼2 . This is a manifestation of the geometric
fact that refl 𝑤#» ∘ refl 𝑤#» is the identity transformation. You can check that ([𝑇 ]ℰ )2 and ([𝑇 ]𝒞 )2
are also both equal to the 2 × 2 identity matrix, but this requires a bit of a computation.
This example demonstrates some of the ways in which a diagonal matrix representation,
such as [𝑇 ]ℬ , can be superior to other matrix representations.
Definition 9.2.2
Diagonalizable
Let 𝑇 : F𝑛 → F𝑛 be a linear operator. We say that 𝑇 is diagonalizable over F if there
exists an ordered basis ℬ for F𝑛 such that [𝑇 ]ℬ is a diagonal matrix.
Like matrices, not all linear operators are diagonalizable. We’ll see this later as we explore
the connection between the diagonalizability of matrices and linear operators.
Example 9.2.3
#» ∈ R2 be a non-zero vector. Then Example 9.1.4 shows that 𝑇 = proj #» is a diagonalLet 𝑤
𝑤
izable operator and Example 9.1.5 shows that 𝑇 = refl 𝑤#» is a diagonalizable operator.
We were able to accomplish this because we could choose basis vectors 𝑣#»1 and 𝑣#»2 such that
𝑇 (𝑣#») is a scalar multiple of 𝑣#».
𝑖
𝑖
The observation at the end of the preceding Example is the key to diagonalizability. It also
prompts the following (hopefully familiar!) definition.
Definition 9.2.4
Eigenvector,
Eigenvalue and
Eigenpair of a
Linear Operator
Let 𝑇 : F𝑛 → F𝑛 be a linear operator. We say that the non-zero vector #»
𝑥 ∈ F𝑛 is an
eigenvector of 𝑇 if there exists a scalar 𝜆 ∈ F such that
𝑇 ( #»
𝑥 ) = 𝜆 #»
𝑥.
The scalar 𝜆 is called an eigenvalue of 𝑇 over F and the pair (𝜆, #»
𝑥 ) is called an eigenpair
of 𝑇 over F.
Section 9.2
Diagonalizability of Linear Operators
237
This should remind you of the analogous definitions for matrices (Definition 7.1.5). These
definitions are connected as follows.
Proposition 9.2.5
(Eigenpairs of 𝑇 and [𝑇 ]ℬ )
Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ be an ordered basis for F𝑛 . Then (𝜆, #»
𝑥 ) is
an eigenpair of 𝑇 if and only if (𝜆, [ #»
𝑥 ]ℬ ) is an eigenpair of the matrix [𝑇 ]ℬ .
Proof: Take the coordinates of both sides with respect to ℬ of the eigenvalue problem and
use the fact that taking coordinates is a linear operation:
𝑇 ( #»
𝑥 ) = 𝜆 #»
𝑥 ⇔ [𝑇 ( #»
𝑥 )] = [𝜆 #»
𝑥]
ℬ
ℬ
⇔ [𝑇 ]ℬ [ #»
𝑥 ]ℬ = 𝜆 [ #»
𝑥 ]ℬ
EXERCISE
]︂
𝑤1
be a non-zero vector in R2 . Consider the projection transformation
𝑤2
proj 𝑤#» : R2 → R2 . Find the eigenvalues of proj 𝑤#» and, for each eigenvalue, find a corresponding eigenvector.
#» =
Let 𝑤
[︂
[Hint: The hard way is to use the standard basis and [proj 𝑤#» ]ℰ which we computed in
Example 5.6.1. The easy way is to use the basis ℬ from Example 9.1.4.]
Now we can give a criterion for the diagonalizability of a linear operator 𝑇 .
Proposition 9.2.6
(Eigenvector Basis Criterion for Diagonalizability)
Let 𝑇 : F𝑛 → F𝑛 be a linear operator. Then 𝑇 is diagonalizable over F if and only if there
𝑛
exists an ordered basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } for F consisting of eigenvectors of 𝑇 .
Proof: Begin with the forward direction. Assume that 𝑇 is diagonalizable over F. Then
𝑛
there exists an ordered basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } of F such that
[𝑇 ]ℬ = diag(𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ).
Therefore,
[𝑇 (𝑣#»𝑖 )]ℬ = [𝑇 ]ℬ [𝑣#»𝑖 ]ℬ
= diag(𝑑1 , . . . , 𝑑𝑛 )𝑒#»𝑖
(by Proposition 9.1.3)
= the 𝑖𝑡ℎ column of diag(𝑑1 , . . . , 𝑑𝑛 ) (by Lemma 4.2.2 (Column Extraction))
= 𝑑 𝑒#»
𝑖 𝑖
= 𝑑𝑖 [𝑣#»𝑖 ]ℬ .
Therefore, for each 𝑖 = 1, 2, . . . , 𝑛, 𝑣#»𝑖 is an eigenvector of 𝑇 with eigenvalue 𝑑𝑖 .
For the backward direction, assume that there exists an ordered basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛}
for F𝑛 consisting of eigenvectors of 𝑇 . Then, for all 𝑣#» ∈ ℬ, 𝑇 (𝑣#») = 𝜆 𝑣#» for some 𝜆 ∈ F.
𝑖
𝑖
𝑖 𝑖
So [𝑇 ]ℬ = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ), a diagonal matrix. Thus, 𝑇 is diagonalizable over F.
𝑖
238
Example 9.2.7
Chapter 9
Diagonalization
[︂ ]︂
𝑤1
#»
Let 𝑤 =
be a non-zero vector in R2 . In Example 9.1.4, we essentially showed that
𝑤2
{︂[︂ ]︂ [︂
]︂}︂
𝑤1
𝑤2
ℬ =
,
is a basis for R2 consisting of eigenvectors of proj 𝑤#» . So proj 𝑤#» is
𝑤2
−𝑤1
diagonalizable over R by Proposition 9.2.6.
[︂ ]︂
[︂
]︂
𝑤1
𝑤2
Moreover, we showed that the eigenvalues corresponding to
and
are 1 and
𝑤2
−𝑤1
#» = diag(1, 0). This is
0, respectively, so the proof of Proposition 9.2.6 asserts that [proj 𝑤]
ℬ
precisely what we had found in Example 9.1.4.
EXERCISE
Apply the same analysis to refl 𝑤#» .
The previous Example and Exercise make use of eigenpairs that were determined through a
geometric argument. Not all transformations will have intuitive geometric interpretations,
so this strategy won’t always be practical. We will tackle this problem in the next two
sections.
9.3
Diagonalizability of Matrices Revisited
In this section we will describe a strategy for determining when a linear operator is diagonalizable while simultaneously connecting this notion of diagonalizability to the notion of
diagonalizability of matrices that we learned about in Section 7.6.
Proposition 9.3.1
(𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable)
Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ be an ordered basis of F𝑛 . Then 𝑇 is
diagonalizable over F if and only if the matrix [𝑇 ]ℬ is diagonalizable over F.
Proof: (⇒):
Assume that 𝑇 is diagonalizable over F. Then, by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability), there exists an ordered basis 𝒞 of F𝑛 consisting of eigenvectors
of 𝑇 . In the proof of Proposition 9.2.6, we saw that [𝑇 ]𝒞 = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ), where the
𝜆𝑖 ’s are the eigenvalues of 𝑇.
By Corollary 9.1.7 (Finding the Standard Matrix),
[𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ ℬ [𝐼]𝒞 = (ℬ [𝐼]𝒞 )−1 [𝑇 ]ℬ ℬ [𝐼]𝒞 .
Since [𝑇 ]𝒞 is diagonal, then [𝑇 ]ℬ is diagonalizable over F and the matrix 𝑃 = ℬ [𝐼]𝒞 diagonalizes [𝑇 ]ℬ .
Section 9.3
Diagonalizability of Matrices Revisited
239
(⇐):
Assume
[︀ that [𝑇 ]ℬ]︀ is diagonalizable over F.
𝑃 = 𝑝#»1 𝑝#»2 · · · 𝑝#»
𝑛 such that
Then there exists an invertible matrix
𝑃 −1 [𝑇 ]ℬ 𝑃 = 𝐷 = diag(𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ).
#»
#»
𝑡ℎ column of 𝑃 .
We define vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 such that for 𝑖 = 1, . . . , 𝑛, [𝑣𝑖 ]ℬ = 𝑝 𝑖 , the 𝑖
#»
#»
We see this is possible by defining 𝑣𝑖 = ℰ [𝐼]ℬ 𝑝𝑖 , so that
𝑝#»𝑖 = (ℰ [𝐼]ℬ )−1 𝑣#»𝑖 = ℬ [𝐼]ℰ 𝑣#»𝑖 = [𝑣#»𝑖 ]ℬ .
Therefore,
[︁
]︁
𝑃 = [𝑣#»1 ]ℬ [𝑣#»2 ]ℬ · · · [𝑣#»
]
𝑛 ℬ .
𝑛
We will show that the set 𝒮 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»
𝑛 } is a basis for F . By Proposition 8.5.4 (𝑛
𝑛
Vectors in F Span iff Independent), since it contains 𝑛 vectors in F𝑛 , then it will be enough
to show that the set 𝒮 is linearly independent. Consider the equation
#»
𝑐1 𝑣#»1 + · · · + 𝑐𝑛 𝑣#»
𝑛 = 0.
Taking the coordinates with respect to ℬ of both sides of the equation above, as finding
#»
#»
coordinates with respect to a basis is a linear process and [ 0 ]ℬ = 0 , we obtain the equation
#»
𝑐1 [𝑣#»1 ]ℬ + · · · + 𝑐𝑛 [𝑣#»
𝑛 ]ℬ = 0 .
The set {[𝑣#»1 ]ℬ , . . . , [𝑣#»
𝑛 ]ℬ } consists of the columns of 𝑃 . Since 𝑃 is invertible, then by
Theorem 4.6.7 (Invertibility Criteria – First Version), rank(𝑃 ) = 𝑛. It then follows from
Proposition 8.3.6 (Pivots and Linear Independence) that {[𝑣#»1 ]ℬ , . . . , [𝑣#»
𝑛 ]ℬ } is linearly independent. Therefore, 𝑐1 = · · · = 𝑐𝑛 = 0 and thus, 𝒮 is linearly independent and a basis for
F𝑛 .
Next, we will show that the vectors in 𝒮 are also eigenvectors of 𝑇 . Using the fact that
𝑃 −1 [𝑇 ]ℬ 𝑃 = 𝐷 = diag(𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ), we can multiply both sides by 𝑃 to obtain that
[𝑇 ]ℬ 𝑃 = 𝑃 𝐷. Comparing the 𝑖𝑡ℎ column of [𝑇 ]ℬ 𝑃 with the 𝑖𝑡ℎ column of 𝑃 𝐷 we obtain
the equation
[𝑇 ]ℬ [𝑣#»𝑖 ]ℬ = 𝑑𝑖 [𝑣#»𝑖 ]ℬ .
Therefore, by Proposition 9.1.3 and the linearity of finding coordinates,
[𝑇 (𝑣#»𝑖 )]ℬ = [𝑑𝑖 𝑣#»𝑖 ]ℬ .
Thus, 𝑇 (𝑣#»𝑖 ) = 𝑑𝑖 𝑣#»𝑖 by Theorem 8.8.1 (Unique Representation Theorem). We know that
#»
𝑣#»𝑖 ̸= 0 , since 𝑣#»𝑖 belongs to a linearly independent set, and so 𝑣#»𝑖 is an eigenvector of 𝑇 .
Consequently, we conclude that 𝒮 is a basis of eigenvectors of 𝑇 , and 𝑇 is diagonalizable
over F by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability).
Using this result, we can translate our criterion for diagonalizability of operators from
Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability) to a criterion for diagonalizability of matrices.
240
Corollary 9.3.2
Chapter 9
Diagonalization
(Eigenvector Basis Criterion for Diagonalizability – Matrix Version)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 𝐴 is diagonalizable over F if and only if there exists a basis of F𝑛
consisting of eigenvectors of 𝐴.
Proof: Consider the linear operator 𝑇𝐴 : F𝑛 → F𝑛 such that 𝑇𝐴 ( #»
𝑥 ) = 𝐴 #»
𝑥 . Then [𝑇𝐴 ]ℰ = 𝐴
and
𝐴 is diagonalizable over F
⇐⇒ [𝑇𝐴 ]ℰ is diagonalizable over F
(because [𝑇𝐴 ]ℰ = 𝐴)
⇐⇒ 𝑇𝐴 is diagonalizable over F
(by Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable))
⇐⇒ There exists a basis ℬ of F𝑛 of eigenvectors of 𝑇𝐴
(by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability))
By Proposition 9.2.5 (Eigenpairs of 𝑇 and [𝑇 ]ℬ ), #»
𝑥 is an eigenvector of 𝑇𝐴 if and only if #»
𝑥
𝑛
is an eigenvector of [𝑇 ]ℰ = 𝐴. Therefore, there exists a basis ℬ of F of eigenvectors of 𝑇𝐴
if and only if there exists a basis ℬ of F𝑛 of eigenvectors of 𝐴.
REMARKS
The previous two results contain several important pieces of information.
1. To determine whether a linear operator 𝑇 is diagonalizable over F, we can start by
obtaining the matrix of 𝑇 with respect to some basis ℬ for F𝑛 , and then test whether
this matrix is diagonalizable over F. The most natural initial choice for ℬ is the
standard basis, ℰ. Thus, moving forward, we can focus our discussion on the
diagonalization of matrices.
2. If 𝐴 is diagonalizable over F and 𝑃 −1 𝐴𝑃 = 𝐷 is a diagonal matrix, then
(a) the entries of 𝐷 are the eigenvalues of 𝐴 in F,
(b) the matrix 𝑃 is the change of basis matrix from an ordered basis ℬ for F𝑛
comprised of eigenvectors of 𝐴 to the standard basis ℰ for F𝑛 (so 𝑃 = ℰ [𝐼]ℬ ),
and
(c) the columns of 𝑃 are eigenvectors of 𝐴.
Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix Version) tells us
that determining whether a matrix 𝐴 is diagonalizable over F boils down to whether or not
we can find 𝑛 linearly independent eigenvectors of 𝐴 in F𝑛 .
Section 9.3
Diagonalizability of Matrices Revisited
241
⎡
Example 9.3.3
⎤
3 4 −2
Let 𝑇 : R3 → R3 be a linear operator with [𝑇 ]ℰ = ⎣ 3 8 −3 ⎦ . We will show that 𝑇 is
6 14 −5
diagonalizable over R by finding three linearly independent eigenvectors of 𝐴 = [𝑇 ]ℰ . In
order to do this, let us begin by finding all the possible eigenvectors of 𝐴.
The characteristic polynomial of 𝐴 is
⎛⎡
⎤⎞
3−𝜆 4
−2
𝐶𝐴 (𝜆) = det ⎝⎣ 3 8 − 𝜆 −3 ⎦⎠ = −𝜆3 + 6𝜆2 − 11𝜆 + 6 = −(𝜆 − 3)(𝜆 − 2)(𝜆 − 1).
6
14 −5 − 𝜆
The eigenvalues of 𝐴 are thus 𝜆1 = 1, 𝜆2 = 2 and 𝜆3 = 3. We leave it as an exercise to show
that the corresponding eigenspaces are given by
⎧⎡ ⎤⎫
⎧⎡ ⎤⎫
⎧⎡ ⎤⎫
⎨ 1 ⎬
⎨ 0 ⎬
⎨ 1 ⎬
𝐸1 = Span ⎣ 0 ⎦ , 𝐸2 = Span ⎣ 1 ⎦ and 𝐸3 = Span ⎣ 3 ⎦ .
⎩
⎭
⎩
⎭
⎩
⎭
1
2
6
Therefore, there is an obvious choice for our desired three linearly independent eigenvectors:
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
1 ⎬
0
⎨ 1
⎦
⎣
⎦
⎣
⎣
0 , 1 , 3⎦ .
ℬ=
⎩
⎭
6
2
1
Of course, we have to check that this set is in fact linearly independent. We can easily do
this using Proposition 8.3.6 (Pivots and Linear Independence), for instance. Thus ℬ is a
basis for R3 consisting of eigenvectors of 𝐴.
Therefore, 𝐴 is diagonalizable over F, by Corollary 9.3.2 (Eigenvector Basis Criterion for
Diagonalizability – Matrix Version), and hence 𝑇 is diagonalizable over F too, by Proposition
9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable).
In the previous Example we could have invoked Proposition 7.6.7 (𝑛 Distinct Eigenvalues
=⇒ Diagonalizable) to deduce that 𝐴 is diagonalizable. Notice that Proposition 7.6.7 suggests the same strategy that we followed above: namely, take an eigenvector corresponding
to each one of the eigenvalues of 𝐴, and form a basis using these eigenvectors. For this strategy to actually produce a basis, we need to be sure that the resulting set of eigenvectors is
linearly independent.
Proposition 9.3.4
(Eigenvectors Corresponding to Distinct Eigenvalues are Linearly Independent)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) have eigenpairs (𝜆1 , 𝑣#»1 ), (𝜆2 , 𝑣#»2 ), . . . , (𝜆𝑘 , 𝑣#»𝑘 ) over F.
If the eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑘 are all distinct, then the set of eigenvectors {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }
is linearly independent.
Proof: Proceed by induction on 𝑘. If 𝑘 = 1, then since eigenvectors are non-zero we have
that {𝑣#»1 } is linearly independent.
242
Chapter 9
Diagonalization
Assume the statement holds for 𝑘 = 𝑗, that is, we assume that the set {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑗 } is
linearly independent.
»}. We must show that this set is linearly independent
Consider the set {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑗 , 𝑣# 𝑗+1
and so we consider the equation
» = #»
𝑐1 𝑣#»1 + · · · + 𝑐𝑗+1 𝑣# 𝑗+1
0 . (*)
We’ll manipulate (*) in two ways. First, multiply both sides of (*) through by 𝐴 to obtain
» = #»
𝑐1 𝜆1 𝑣#»1 + · · · + 𝑐𝑗 𝜆𝑗 𝑣#»𝑗 + 𝑐𝑗+1 𝜆𝑗+1 𝑣# 𝑗+1
0 . (**)
Next, multiply both sides of (*) through by 𝜆𝑗+1 to obtain
» = #»
𝑐1 𝜆𝑗+1 𝑣#»1 + · · · + 𝑐𝑗 𝜆𝑗+1 𝑣#»𝑗 + 𝑐𝑗+1 𝜆𝑗+1 𝑣# 𝑗+1
0 . (* * *)
Then subtract (* * *) from (**) to get
#»
𝑐1 (𝜆1 − 𝜆𝑗+1 )𝑣#»1 + · · · + 𝑐𝑗 (𝜆𝑗 − 𝜆𝑗+1 )𝑣#»𝑗 = 0 .
By the inductive hypothesis, {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑗 } is linearly independent so for all 𝑖, 1 ≤ 𝑖 ≤ 𝑗,
𝑐𝑖 (𝜆𝑖 − 𝜆𝑗+1 ) = 0.
However, 𝜆𝑖 ̸= 𝜆𝑗+1 and so 𝑐𝑖 = 0. Thus, 𝑐1 = · · · = 𝑐𝑗 = 0 and our original equation (*)
becomes
» = #»
𝑐𝑗+1 𝑣# 𝑗+1
0.
» is nonzero since it is an eigenvector, so we conclude that 𝑐
The vector 𝑣# 𝑗+1
𝑗+1 = 0. Conse#»
#»
#»
#
»
quently, the set {𝑣1 , 𝑣2 , . . . , 𝑣𝑗 , 𝑣𝑗+1 } is linearly independent. This establishes the inductive
step and completes the proof.
We can now provide a proof of Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable) which states the following:
If 𝐴 ∈ 𝑀𝑛×𝑛 (F) has 𝑛 distinct eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 in F, then 𝐴 is diagonalizable over
F.
More specifically, if we let (𝜆1 , 𝑣#»1 ), . . . , (𝜆𝑛 , 𝑣#»
𝑛 ) be eigenpairs of 𝐴 over F, and if we let
𝑃 = [𝑣#» · · · 𝑣#»] be the matrix whose columns are eigenvectors corresponding to the distinct
1
𝑛
eigenvalues, then
(a) 𝑃 is invertible, and
(b) 𝑃 −1 𝐴𝑃 = 𝐷 = diag(𝜆1 , 𝜆2 , · · · , 𝜆𝑛 ).
Proof: Since we have 𝑛 eigenvectors corresponding to 𝑛 distinct eigenvalues, then the set
{𝑣#»1 , . . . , 𝑣#»
𝑛 } is linearly independent by Proposition 9.3.4 (Eigenvectors Corresponding to
Distinct Eigenvalues are Linearly Independent). Further, the set {𝑣#»1 , . . . , 𝑣#»
𝑛 } is a basis of
eigenvectors for F𝑛 . Therefore, 𝐴 is diagonalizable over F by Proposition 9.2.6 (Eigenvector
Basis Criterion for Diagonalizability).
Section 9.3
Diagonalizability of Matrices Revisited
243
#»
#»
Let 𝑃 = [𝑣#»1 · · · 𝑣#»
𝑛 ]. Since {𝑣1 , . . . , 𝑣𝑛 } is a basis, then it is linearly independent, and
so rank(𝑃 ) = 𝑛 by Proposition 8.3.6 (Pivots and Linear Independence). Therefore, by
Theorem 4.6.7 (Invertibility Criteria – First Version), 𝑃 is invertible. Now since 𝑣#»1 , . . . , 𝑣#»
𝑛
are eigenvectors of 𝐴, then using Lemma 4.2.2 (Column Extraction),
𝐴𝑃 = 𝐴[𝑣#» · · · 𝑣#»] = [𝐴𝑣#» · · · 𝐴𝑣#»] = [𝜆 𝑣#» · · · 𝜆 𝑣#»].
1
𝑛
𝑖
𝑛
1 1
𝑛 𝑛
Let 𝐷 = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ) = [𝜆1 𝑒#»1 · · · 𝜆𝑛 𝑒#»
𝑛 ]. Therefore, using Lemma 4.2.2 again,
#»
#»
#»
#»
𝑃 𝐷 = 𝑃 [𝜆1 𝑒#»1 · · · 𝜆𝑛 𝑒#»
𝑛 ] = [𝜆1 𝑃 𝑒1 · · · 𝜆𝑛 𝑃 𝑒𝑛 ] = [𝜆1 𝑣1 · · · 𝜆𝑛 𝑣𝑛 ].
Therefore, 𝐴𝑃 = 𝑃 𝐷 and so
𝑃 −1 𝐴𝑃 = 𝐷 = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ).
⎡
Example 9.3.5
⎤
5 2 −1
Suppose 𝑇 : R3 → R3 is a linear operator with [𝑇 ]ℰ = ⎣ 8 1 −2 ⎦ .
16 0 −3
(a) Prove that 𝑇 is diagonalizable over R.
(b) Find a basis ℬ of R3 such that [𝑇 ]ℬ is a diagonal matrix.
(c) Determine an invertible matrix 𝑃 such that 𝑃 −1 [𝑇 ]ℰ 𝑃 = [𝑇 ]ℬ .
Solution:
(a) By Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable), it suffices to prove that
𝐴 = [𝑇 ]ℰ is diagonalizable over R. The characteristic polynomial of 𝐴 is
⎤⎞
⎛⎡
5−𝜆 2
−1
𝐶𝐴 (𝜆) = det ⎝⎣ 8 1 − 𝜆 −2 ⎦⎠ = −𝜆3 + 3𝜆2 + 13𝜆 − 15.
16
0 −3 − 𝜆
Since
−𝜆3 + 3𝜆2 + 13𝜆 − 15 = −(𝜆 − 5)(𝜆 − 1)(𝜆 + 3),
the eigenvalues of 𝑇 are 𝜆1 = 5, 𝜆2 = 1, and 𝜆3 = −3. There are three distinct eigenvalues, and therefore 𝐴 is diagonalizable over R by Proposition 7.6.7 (𝑛 Distinct Eigenvalues
=⇒ Diagonalizable).
(b) We want a basis ℬ for R3 consisting of eigenvectors of 𝐴. Since we have three distinct
eigenvalues, we need to select one eigenvector from each eigenspace. We leave it as an
exercise to determine that the eigenspaces of 𝐴 are given by
⎧⎡ ⎤⎫
⎧⎡ ⎤⎫
⎧⎡ ⎤⎫
⎨ 1 ⎬
⎨ 1 ⎬
⎨ 0 ⎬
⎣
⎦
⎣
⎦
1
0
𝐸5 = Span
, 𝐸1 = Span
and 𝐸−3 = Span ⎣ 1 ⎦ .
⎩
⎭
⎩
⎭
⎩
⎭
2
4
2
Thus,
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
1
0 ⎬
⎨ 1
⎣
⎦
⎣
⎦
⎣
1 , 0 , 1⎦
ℬ=
⎩
⎭
2
4
2
is a basis for R3 for which [𝑇 ]ℬ = diag(5, 1, −3).
244
Chapter 9
Diagonalization
(c) The columns of the desired matrix 𝑃 are the eigenvectors in ℬ. Thus,
⎡
⎤
110
𝑃 = ⎣1 0 1⎦ .
242
REMARK
We repeat for emphasis that the matrix 𝑃 in part (c) that diagonalizes 𝐴 is the change of
basis matrix from the basis of eigenvectors of 𝐴 to the standard basis, ℰ [𝐼]ℬ .
9.4
The Diagonalizability Test
For 𝐴 ∈ 𝑀𝑛×𝑛 (F) to be diagonalizable over F, it must have 𝑛 linearly independent eigenvectors in F𝑛 . Let us consider some examples where 𝑛 linearly independent eigenvectors do
not exist.
]︂
0 1
. Then 𝐶𝐴 (𝜆) = 𝜆2 + 1. This matrix has no real eigenvalues and therefore
−1 0
no real eigenvectors. So 𝐴 is not diagonalizable over R.
[︂
Example 9.4.1
Let 𝐴 =
However, we note that 𝐴 does have two distinct complex eigenvalues, 𝜆1 = 𝑖 and 𝜆2 = −𝑖,
and is therefore diagonalizable over C.
⎤
2 0 0
Let 𝐴 = ⎣ 0 0 1 ⎦ . Then 𝐶𝐴 (𝜆) = −(𝜆 − 2)(𝜆2 + 1).
0 −1 0
⎡
Example 9.4.2
This matrix has only one real eigenvalue 𝜆1 = 2. We can show that all eigenvectors associated
[︀
]︀𝑇
with this eigenvalue are scalar multiples of 1 0 0 . Therefore, 𝐴 is not diagonalizable
over R.
However, we note that 𝐴 does have three distinct complex eigenvalues, 𝜆1 = 2, 𝜆2 = 𝑖, and
𝜆3 = −𝑖, and is therefore diagonalizable over C.
[︂
Example 9.4.3
]︂
01
Let 𝐴 =
. Then 𝐶𝐴 (𝜆) = 𝜆2 = (𝜆 − 0)2 .
00
This matrix has two real eigenvalues, but they are both 0. We can show that all eigenvec[︀
]︀𝑇
tors associated with the eigenvalue 0 are scalar multiples of 1 0 . Therefore, 𝐴 is not
diagonalizable over R or C.
We will now introduce some theory that will help us determine exactly when a basis of
eigenvectors exists.
Section 9.4
The Diagonalizability Test
Definition 9.4.4
Algebraic
Multiplicity
245
Let 𝜆𝑖 be an eigenvalue of 𝐴 ∈ 𝑀𝑛×𝑛 (F). The algebraic multiplicity of 𝜆𝑖 , denoted by
𝑎𝜆𝑖 , is the largest positive integer such that (𝜆 − 𝜆𝑖 )𝑎𝜆𝑖 divides the characteristic polynomial
𝐶𝐴 (𝜆).
In other words, 𝑎𝜆𝑖 gives the number of times that (𝜆−𝜆𝑖 ) terms occur in the fully factorized
form of 𝐶𝐴 (𝜆).
⎡
Example 9.4.5
⎤
122
Let 𝐴 = ⎣ 2 1 2 ⎦ . Then 𝐶𝐴 (𝜆) = −(𝜆 − 5)(𝜆 + 1)2 .
221
The matrix 𝐴 has two distinct eigenvalues: 𝜆1 = 5, with algebraic multiplicity 𝑎𝜆1 = 1, and
𝜆2 = −1, with algebraic multiplicity 𝑎𝜆2 = 2.
We can also say that 𝐴 has three eigenvalues with one of them being repeated twice. So its
eigenvalues are 5, −1, and −1.
Definition 9.4.6
Geometric
Multiplicity
Let 𝜆𝑖 be an eigenvalue of 𝐴 ∈ 𝑀𝑛×𝑛 (F). The geometric multiplicity of 𝜆𝑖 , denoted by
𝑔𝜆𝑖 , is the dimension of the eigenspace 𝐸𝜆𝑖 . That is, 𝑔𝜆𝑖 = dim(𝐸𝜆𝑖 ).
⎡
Example 9.4.7
⎤
122
Let 𝐴 ∈ 𝑀3×3 (R) with 𝐴 = ⎣ 2 1 2 ⎦ . Determine the geometric multiplicities of all eigen221
values of 𝐴.
Solution:
The eigenvalues of 𝐴 are 𝜆1 = 5 and 𝜆2 = −1.
#»
The eigenspace 𝐸𝜆1 is the solution set to (𝐴 − 5𝐼) #»
𝑥 = 0 , which is equivalent to
⎤
⎡
−4 2 2
#»
⎣ 2 −4 2 ⎦ #»
𝑥 = 0
2 2 −4
and row reduces to
A basis for 𝐸5 is thus
⎡
⎤
1 0 −1
#»
⎣ 0 1 −1 ⎦ #»
𝑥 = 0.
00 0
⎧⎡ ⎤⎫
⎨ 1 ⎬
⎣1⎦ .
⎩
⎭
1
Therefore, dim(𝐸𝜆1 ) = 1 and so 𝑔𝜆1 = 1.
#»
The eigenspace 𝐸𝜆2 is the solution set to (𝐴 + 1𝐼) #»
𝑥 = 0 , which is equivalent to
⎡
⎤
222
#»
⎣ 2 2 2 ⎦ #»
𝑥 = 0
222
246
Chapter 9
and row reduces to
Diagonalization
⎡
⎤
111
#»
⎣ 0 0 0 ⎦ #»
𝑥 = 0.
000
A basis for 𝐸𝜆2 is thus
⎧⎡
⎤ ⎡ ⎤⎫
1 ⎬
⎨ 1
⎣ −1 ⎦ , ⎣ 0 ⎦ .
⎩
⎭
0
−1
Therefore, dim(𝐸𝜆2 ) = 2 and so 𝑔𝜆2 = 2.
The geometric multiplicity of an eigenvalue tells us the maximum number of linearly independent eigenvectors we can obtain from the eigenspace corresponding to that eigenvalue.
The two multiplicities are connected through the following result.
Proposition 9.4.8
(Geometric and Algebraic Multiplicities)
Let 𝜆𝑖 be an eigenvalue of the matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then
1 ≤ 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 .
REMARK
In the proof below, we will describe a matrix by using what is called a block matrix; that
is, denoting some or all parts of a matrix as matrices themselves.
For example, given the matrix
⎡
1
⎢3
⎢
𝐴=⎢
⎢ 11
⎣ 13
15
2
4
12
14
16
5
8
17
20
23
6
9
18
21
24
⎤
7
10 ⎥
⎥
19 ⎥
⎥
22 ⎦
25
we can describe 𝐴 as
[︂
𝐴1 𝐴2
𝐴=
𝐵1 𝐵2
]︂
where
[︂
]︂
12
𝐴1 =
,
34
[︂
]︂
56 7
𝐴2 =
,
8 9 10
⎡
⎤
11 12
𝐵1 = ⎣ 13 14 ⎦ ,
15 16
⎡
⎤
17 18 19
𝐵2 = ⎣ 20 21 22 ⎦ .
23 24 25
Alternatively, with 𝐴2 , 𝐵1 , and 𝐵2 defined as above, we can re-write our matrix 𝐴 as
⎡
⎤
1 2
𝐴2 ⎦
𝐴=⎣ 3 4
.
𝐵1
𝐵2
Of course, there are many other ways we could slice 𝐴 into blocks to give different block
representations of 𝐴.
Proof: By definition, if 𝜆𝑖 is an eigenvalue of 𝐴, then there is a non-trivial solution to 𝐴 #»𝑣 = 𝜆𝑖 #»𝑣 . Therefore, each eigenspace contains at least one non-zero vector and so its dimension will be at least one. Therefore, 1 ≤ 𝑔𝜆𝑖 .
Suppose that {𝑣#»1 , . . . , 𝑣#»𝑘 } is a basis for 𝐸𝜆𝑖 . Therefore, 𝑔𝜆𝑖 = 𝑘. We can extend this basis for 𝐸𝜆𝑖 to a basis ℬ for F𝑛 (for a demonstration of how this can be done, see Example 8.5.7, as well as the remark preceding it). Therefore, ℬ = {𝑣#»1 , . . . , 𝑣#»𝑘 , 𝑤#»𝑘+1 , . . . , 𝑤#»𝑛 }.
We let 𝑇𝐴 : F𝑛 → F𝑛 be defined as 𝑇𝐴 ( #»𝑣 ) = 𝐴 #»𝑣 .
For 𝑗 = 1, . . . , 𝑘, it is the case that 𝑣#»𝑗 ∈ 𝐸𝜆𝑖 and therefore, 𝐴𝑣#»𝑗 = 𝜆𝑖 𝑣#»𝑗 . Therefore, [𝑇𝐴 (𝑣#»𝑗 )]ℬ = 𝜆𝑖 𝑒#»𝑗 . For 𝑗 = 𝑘 + 1, . . . , 𝑛, [𝑇𝐴 (𝑤#»𝑗 )]ℬ will be some vector in F𝑛 . It now follows that [𝑇𝐴 ]ℬ has the block structure
$$[T_A]_{\mathcal{B}} = \begin{bmatrix} \lambda_i I_k & M_1 \\ \mathcal{O}_{(n-k)\times k} & M_2 \end{bmatrix}$$
(the upper-left block is the 𝑘 × 𝑘 diagonal matrix with every diagonal entry equal to 𝜆𝑖 ), for some 𝑀1 ∈ 𝑀𝑘×(𝑛−𝑘) (F) and some 𝑀2 ∈ 𝑀(𝑛−𝑘)×(𝑛−𝑘) (F).
The characteristic polynomial of 𝐴 is equal to the characteristic polynomial of [𝑇𝐴 ]ℬ since these two matrices are similar (exercise).
It can be shown by induction that
$$C_A(\lambda) = (\lambda_i - \lambda)^k\, C_{M_2}(\lambda) = (\lambda_i - \lambda)^{g_{\lambda_i}}\, C_{M_2}(\lambda).$$
Since 𝑎𝜆𝑖 is the largest positive integer such that (𝜆 − 𝜆𝑖 ) raised to the power 𝑎𝜆𝑖 divides 𝐶𝐴 (𝜆), it follows that 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 .
What happens if we take the union of the bases of all the eigenspaces of a given matrix? It
turns out that this set is linearly independent.
Proposition 9.4.9
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) with distinct eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑘 . If their corresponding
eigenspaces, 𝐸𝜆1 , 𝐸𝜆2 , . . . , 𝐸𝜆𝑘 have bases ℬ1 , ℬ2 , . . . , ℬ𝑘 , then ℬ = ℬ1 ∪ ℬ2 ∪ · · · ∪ ℬ𝑘 is
linearly independent.
Proof: To make things easier to read, let 𝑔𝑖 = 𝑔𝜆𝑖 for 𝑖 = 1, . . . , 𝑘.
Let ℬ1 = {𝑣#»11 , . . . , 𝑣#»1𝑔1 }, ℬ2 = {𝑣#»21 , . . . , 𝑣#»2𝑔2 }, and so on up to ℬ𝑘 = {𝑣#»𝑘1 , . . . , 𝑣#»𝑘𝑔𝑘 }. Take note of the double subscripts here, with the first subscript corresponding to the numbering of the basis (ℬ1 , ℬ2 , and so on) and the second subscript counting the vector's position within that basis.
We wish to perform the linear dependence check on ℬ = ℬ1 ∪ ℬ2 ∪ · · · ∪ ℬ𝑘 . We consider the following equation with scalars in F:
$$\sum_{i=1}^{g_1} c_{1i}\vec{v}_{1i} + \sum_{i=1}^{g_2} c_{2i}\vec{v}_{2i} + \cdots + \sum_{i=1}^{g_k} c_{ki}\vec{v}_{ki} = \vec{0} \qquad (*).$$
Let $\vec{w}_1 = \sum_{i=1}^{g_1} c_{1i}\vec{v}_{1i}$, which is a vector from 𝐸𝜆1 . Let $\vec{w}_2 = \sum_{i=1}^{g_2} c_{2i}\vec{v}_{2i}$, which is a vector from 𝐸𝜆2 . Continuing in this manner, let $\vec{w}_k = \sum_{i=1}^{g_k} c_{ki}\vec{v}_{ki}$, which is a vector from 𝐸𝜆𝑘 .
Therefore, our equation (*) has become
$$\vec{w}_1 + \vec{w}_2 + \cdots + \vec{w}_k = \vec{0} \qquad (**).$$
Each of the 𝑤#»𝑗 is from a different eigenspace. The vectors in a given eigenspace are either eigenvectors or the zero vector. If any of the 𝑤#»𝑗 are not the zero vector, then from (**) we get a non-trivial linear combination of eigenvectors, each chosen from a different eigenspace, equalling the zero vector. This contradicts Proposition 9.3.4 (Eigenvectors Corresponding to Distinct Eigenvalues are Linearly Independent), which tells us that a set of eigenvectors chosen in such a way that every eigenvector is from a different eigenspace is linearly independent. Therefore, it must be the case that 𝑤#»1 = · · · = 𝑤#»𝑘 = #»0 .
But $\vec{w}_j = \sum_{i=1}^{g_j} c_{ji}\vec{v}_{ji} = \vec{0}$ implies that 𝑐𝑗1 = · · · = 𝑐𝑗𝑔𝑗 = 0 because ℬ𝑗 is a basis for 𝐸𝜆𝑗 .
Therefore, all of the scalars in (*) are 0 and the set ℬ is linearly independent.
We have now reached the stage where we can give a test to determine whether a matrix is
diagonalizable over F.
Theorem 9.4.10
(Diagonalizability Test)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Suppose that the complete factorization of the characteristic polynomial of 𝐴 into irreducible factors over F is given by
$$C_A(\lambda) = (\lambda - \lambda_1)^{a_{\lambda_1}} \cdots (\lambda - \lambda_k)^{a_{\lambda_k}}\, h(\lambda),$$
where 𝜆1 , . . . , 𝜆𝑘 are all of the distinct eigenvalues of 𝐴 in F with corresponding algebraic multiplicities 𝑎𝜆1 , . . . , 𝑎𝜆𝑘 , and ℎ(𝜆) is a polynomial in 𝜆 with no roots in F (the product of the remaining irreducible factors). Then 𝐴 is diagonalizable over F if and only if ℎ(𝜆) is a constant polynomial and 𝑎𝜆𝑖 = 𝑔𝜆𝑖 , for each 𝑖 = 1, . . . , 𝑘.
Proof: We begin with the forward direction. Thus, assume that 𝐴 is diagonalizable over
F. Then, by Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix
Version), 𝐴 has 𝑛 linearly independent eigenvectors in F𝑛 . Each of these eigenvectors must
lie in one of the eigenspaces 𝐸𝜆𝑖 of 𝐴, and no eigenvector can lie in two different eigenspaces.
Since there can be at most 𝑔𝜆𝑖 linearly independent vectors in 𝐸𝜆𝑖 , we conclude that
𝑛 ≤ 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 .
On the other hand, by Proposition 9.4.8 (Geometric and Algebraic Multiplicities), we know that 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 , and so
𝑔𝜆1 + · · · + 𝑔𝜆𝑘 ≤ 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 .
We also know that the sum of algebraic multiplicities 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 is at most the degree of 𝐶𝐴 (𝜆), which is 𝑛 by Proposition 7.3.4 (Features of the Characteristic Polynomial). Therefore,
𝑛 ≤ 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 ≤ 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 ≤ 𝑛.
This implies that
𝑔𝜆1 + · · · + 𝑔𝜆𝑘 = 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 = 𝑛 = deg(𝐶𝐴 (𝜆)).
From this we immediately conclude that deg(ℎ(𝜆)) = 0, i.e., that ℎ(𝜆) is a constant polynomial. Since 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 for all 𝑖 = 1, . . . , 𝑘, we also conclude from this that 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for all 𝑖 = 1, . . . , 𝑘. This completes the proof of the forward direction.
Conversely, assume that ℎ(𝜆) is a constant polynomial and that 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for all 𝑖 = 1, . . . , 𝑘. This implies that
𝑔𝜆1 + · · · + 𝑔𝜆𝑘 = 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 = 𝑛.
Thus if we let ℬ𝑖 be a basis for 𝐸𝜆𝑖 for 𝑖 = 1, . . . , 𝑘, and if we let ℬ = ℬ1 ∪ · · · ∪ ℬ𝑘 be the union of these bases, then by Proposition 9.4.9, ℬ will be linearly independent, and by the above, ℬ will contain 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 = 𝑛 eigenvectors. Thus ℬ is a linearly independent
subset of F𝑛 that contains 𝑛 vectors, so ℬ must be a basis for F𝑛 . Therefore, we have
found a basis for F𝑛 consisting of eigenvectors of 𝐴, so 𝐴 must be diagonalizable over F, by
Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix Version).
REMARK
If F = C, then the polynomial ℎ(𝜆) in Theorem 9.4.10 (Diagonalizability Test) is always
constant (in fact, it is equal to (−1)𝑛 .) This is because every degree 𝑛 polynomial factors
over C into linear factors. Therefore ℎ(𝜆) only matters if F = R. It measures the failure of
𝐶𝐴 (𝜆) to factor completely into linear factors, and so it measures, in a sense, a deficit of
real eigenvalues.
Example 9.4.11
Consider the matrix $A = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}$ from Example 9.4.7. Determine whether 𝐴 is diagonalizable over R and/or C.
Solution: In Example 9.4.7, we found the characteristic polynomial to be
𝐶𝐴 (𝜆) = −(𝜆 − 5)(𝜆 + 1)2 .
So ℎ(𝜆) = −1, a constant polynomial. The eigenvalues are 𝜆1 = 5 and 𝜆2 = −1 and we
determined that 𝑎𝜆1 = 𝑔𝜆1 = 1 and 𝑎𝜆2 = 𝑔𝜆2 = 2. Therefore, 𝐴 is diagonalizable over both
R and C, using Theorem 9.4.10 (Diagonalizability Test).
Example 9.4.12
Determine whether $A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 2 & 1 \end{bmatrix}$ is diagonalizable over R and/or C.
Solution: The characteristic polynomial is 𝐶𝐴 (𝜆) = (𝜆 − 3)2 (𝜆 + 1)2 . Therefore, ℎ(𝜆) = 1,
a constant polynomial.
The eigenvalues are: 𝜆1 = 3, with 𝑎𝜆1 = 2, and 𝜆2 = −1, with 𝑎𝜆2 = 2.
The eigenspace 𝐸𝜆1 is the solution set to (𝐴 − 3𝐼) #»𝑣 = #»0 , which is equivalent to
$$\begin{bmatrix} -2 & 2 & 0 & 0 \\ 2 & -2 & 0 & 0 \\ 0 & 0 & -2 & 2 \\ 0 & 0 & 2 & -2 \end{bmatrix} \vec{v} = \vec{0}$$
and row reduces to
$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \vec{v} = \vec{0}.$$
Since the rank of the coefficient matrix is 2, then there will be 4 − 2 = 2 parameters in the
solution set. Therefore, 𝑔𝜆1 = dim(𝐸𝜆1 ) = 2.
The eigenspace 𝐸𝜆2 is the solution set to (𝐴 + 1𝐼) #»𝑣 = #»0 , which is equivalent to
$$\begin{bmatrix} 2 & 2 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 \end{bmatrix} \vec{v} = \vec{0}$$
and row reduces to
$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \vec{v} = \vec{0}.$$
Since the rank of the coefficient matrix is 2, then there will be 4 − 2 = 2 parameters in the
solution set. Therefore, 𝑔𝜆2 = dim(𝐸𝜆2 ) = 2.
Since 𝑎𝜆1 = 𝑔𝜆1 = 2, and 𝑎𝜆2 = 𝑔𝜆2 = 2, 𝐴 is diagonalizable over both R and C by Theorem
9.4.10 (Diagonalizability Test).
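For readers who want to experiment, the comparison of algebraic and geometric multiplicities in Example 9.4.12 can be verified with SymPy; this is only an illustrative check, not a replacement for the argument above.

    import sympy as sp

    A = sp.Matrix([[1, 2, 0, 0],
                   [2, 1, 0, 0],
                   [0, 0, 1, 2],
                   [0, 0, 2, 1]])

    # eigenvects() returns (eigenvalue, algebraic multiplicity, list of eigenvectors)
    for eigenvalue, alg_mult, vectors in A.eigenvects():
        print(eigenvalue, alg_mult, len(vectors))   # expect 3 2 2 and -1 2 2

    print(A.is_diagonalizable())                    # True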
Example 9.4.13
Determine whether $A = \begin{bmatrix} 5 & 2 & 0 & 1 \\ -2 & 1 & 0 & -1 \\ 4 & 4 & 3 & 2 \\ 16 & 0 & -8 & -5 \end{bmatrix}$ is diagonalizable over R and/or C.
Solution: The characteristic polynomial 𝐶𝐴 (𝜆) = (𝜆 − 3)3 (𝜆 + 5). Therefore, ℎ(𝜆) = 1, a
constant polynomial.
The eigenvalues are: 𝜆1 = 3, with 𝑎𝜆1 = 3, and 𝜆2 = −5, with 𝑎𝜆2 = 1.
The eigenspace 𝐸𝜆1 is the solution set to (𝐴 − 3𝐼) #»𝑣 = #»0 , which is equivalent to
$$\begin{bmatrix} 2 & 2 & 0 & 1 \\ -2 & -2 & 0 & -1 \\ 4 & 4 & 0 & 2 \\ 16 & 0 & -8 & -8 \end{bmatrix} \vec{v} = \vec{0}$$
and row reduces to
$$\begin{bmatrix} 2 & 2 & 0 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \vec{v} = \vec{0}.$$
Since the rank of the coefficient matrix is 2, then there will be 4 − 2 = 2 parameters in the
solution set. Therefore, dim(𝐸𝜆1 ) = 2.
Since 𝑎𝜆1 = 3 and 𝑔𝜆1 = 2, 𝐴 is not diagonalizable over either R or C, by Theorem 9.4.10 (Diagonalizability Test).
Example 9.4.14
Determine whether $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ is diagonalizable over R and/or C.
Solution: The characteristic polynomial 𝐶𝐴 (𝜆) = 𝜆2 − 2𝜆 + 2 = (𝜆 − 1 − 𝑖)(𝜆 − 1 + 𝑖). Over
R, ℎ(𝜆) = 𝜆2 −2𝜆+2 which is not a constant polynomial. Therefore, 𝐴 is not diagonalizable
over R, by Theorem 9.4.10 (Diagonalizability Test).
However, over C, ℎ(𝜆) = 1, which is a constant polynomial.
The eigenvalues are: 𝜆1 = 1 + 𝑖, with 𝑎𝜆1 = 1 and 𝜆2 = 1 − 𝑖, with 𝑎𝜆2 = 1. Since
𝐴 has two distinct eigenvalues, we can conclude that 𝐴 is diagonalizable over C by
Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable).
If we wanted to use Theorem 9.4.10, we could note that 1 ≤ 𝑔𝑖 ≤ 𝑎𝑖 for any eigenvalue. In
this case, 𝑎𝜆1 = 𝑎𝜆2 = 1 and so 𝑔𝜆1 = 𝑔𝜆2 = 1. Therefore, by the Diagonalizability Test, 𝐴
is diagonalizable over C.
Theorem 9.4.10 (Diagonalizability Test) is useful when we are determining whether a matrix
is diagonalizable over F. However, if an 𝑛 × 𝑛 matrix 𝐴 is diagonalizable over F, we often
want to find an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F) such that 𝑃 −1 𝐴𝑃 = 𝐷, where 𝐷 is a diagonal
matrix.
Theorem 9.4.10 and the proof of Proposition 9.4.9 give us that if 𝐴 is diagonalizable over
F, then the union of the bases of all of the eigenspaces of 𝐴 will be a basis of eigenvectors of
𝐴 for F𝑛 . Therefore, we let 𝑃 be the matrix whose columns are this basis of eigenvectors.
The proof of Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable) gives us that
𝑃 −1 𝐴𝑃 = 𝐷, where 𝐷 is a diagonal matrix and the entries on the diagonal of 𝐷 will be
the eigenvalues of 𝐴 in the order corresponding to the order in which the eigenvectors are
used as the columns of 𝑃 . We illustrate this with an example.
Example 9.4.15
We know from Example 9.4.12 that the matrix $A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 2 & 1 \end{bmatrix}$ is diagonalizable over R. Determine an invertible matrix 𝑃 such that 𝑃 −1 𝐴𝑃 = 𝐷, where 𝐷 is a diagonal matrix.
Solution: In Example 9.4.12, we determined that 𝐴 had two eigenvalues, 𝜆1 = 3 and
𝜆2 = −1.
We determined that the eigenspace 𝐸𝜆1 was the solution set to
$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \vec{v} = \vec{0}.$$
A basis for 𝐸𝜆1 is
$$\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \right\}.$$
We determined that the eigenspace 𝐸𝜆2 was the solution set to
$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \vec{v} = \vec{0}.$$
A basis for 𝐸𝜆2 is
$$\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}.$$
Therefore, we let
$$P = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 1 & 0 & 1 \end{bmatrix},$$
which will give us that
$$P^{-1}AP = D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}.$$
If we reorder the columns of 𝑃 such that
$$P = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 1 \end{bmatrix},$$
then we will obtain
$$P^{-1}AP = D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}.$$
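It is easy to check a proposed diagonalization numerically. The NumPy sketch below (an aside, using the first matrix 𝑃 found above) verifies that 𝑃 −1 𝐴𝑃 is indeed the diagonal matrix with entries 3, 3, −1, −1.

    import numpy as np

    A = np.array([[1, 2, 0, 0],
                  [2, 1, 0, 0],
                  [0, 0, 1, 2],
                  [0, 0, 2, 1]])

    # Columns of P are the eigenvectors found in Example 9.4.15
    P = np.array([[1, 0, -1,  0],
                  [1, 0,  1,  0],
                  [0, 1,  0, -1],
                  [0, 1,  0,  1]])

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D))   # diagonal matrix with entries 3, 3, -1, -1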
EXERCISE
Using Example 9.4.15, find all diagonal matrices which are similar to $A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 2 & 1 \end{bmatrix}$.
Chapter 10
Vector Spaces
10.1 Definition of a Vector Space
So far, all of our discussion has taken place within R𝑛 and C𝑛 or within various subspaces
of these spaces. An underlying fact that we have made use of is that these sets are closed
under addition and scalar multiplication. We have taken it for granted that any linear
combination of vectors in R𝑛 (or C𝑛 ) will produce a vector in R𝑛 (or C𝑛 ). This assumption
is fundamental to linear algebra.
We can consider other sets of objects along with two operations. One operation, called
addition, combines two objects in the set to produce another object in the set. The other
operation, called scalar multiplication, combines an object in the set and a scalar from
the field F to produce another object in the set. These operations may be defined in familiar
ways, like the addition and scalar multiplication of vectors in F𝑛 . However, they may also
be newly defined operations.
Example 10.1.1
Consider the set of all real polynomials of degree two or less, which we denote by 𝑃2 (R):
𝑃2 (R) = {𝑝0 + 𝑝1 𝑥 + 𝑝2 𝑥2 : 𝑝0 , 𝑝1 , 𝑝2 ∈ R}.
Given two polynomials in this set,
𝑝(𝑥) = 𝑝0 + 𝑝1 𝑥 + 𝑝2 𝑥2 and 𝑞(𝑥) = 𝑞0 + 𝑞1 𝑥 + 𝑞2 𝑥2 ,
their sum is
𝑝(𝑥) + 𝑞(𝑥) = 𝑝0 + 𝑝1 𝑥 + 𝑝2 𝑥2 + 𝑞0 + 𝑞1 𝑥 + 𝑞2 𝑥2 = 𝑟0 + 𝑟1 𝑥 + 𝑟2 𝑥2 ,
where 𝑟0 = 𝑝0 + 𝑞0 , 𝑟1 = 𝑝1 + 𝑞1 and 𝑟2 = 𝑝2 + 𝑞2 .
The resulting polynomial is another polynomial in 𝑃2 (R), and thus we say that 𝑃2 (R) is
closed under addition.
If we multiply 𝑝(𝑥) by 𝑐 ∈ R we get
𝑐𝑝(𝑥) = 𝑐(𝑝0 + 𝑝1 𝑥 + 𝑝2 𝑥2 ) = 𝑐𝑝0 + 𝑐𝑝1 𝑥 + 𝑐𝑝2 𝑥2 = 𝑠0 + 𝑠1 𝑥 + 𝑠2 𝑥2 ,
where 𝑠0 = 𝑐𝑝0 , 𝑠1 = 𝑐𝑝1 and 𝑠2 = 𝑐𝑝2 . The resulting polynomial is another polynomial in
𝑃2 (R) and thus, we say that 𝑃2 (R) is closed under scalar multiplication.
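One concrete way to see these closure properties is to represent a polynomial in 𝑃2 (R) by its coefficient triple (𝑝0 , 𝑝1 , 𝑝2 ). The short Python sketch below is only an illustration; the helper names add and scale are hypothetical, not anything defined in the notes.

    def add(p, q):
        """Add two polynomials in P2(R), each stored as a coefficient triple (p0, p1, p2)."""
        return tuple(pi + qi for pi, qi in zip(p, q))

    def scale(c, p):
        """Multiply a polynomial in P2(R) by a real scalar c."""
        return tuple(c * pi for pi in p)

    p = (1.0, 2.0, 3.0)    # 1 + 2x + 3x^2
    q = (4.0, 0.0, -1.0)   # 4 - x^2
    print(add(p, q))       # (5.0, 2.0, 2.0): still a coefficient triple, so still in P2(R)
    print(scale(2.0, p))   # (2.0, 4.0, 6.0)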
We can think of linear algebra as operating in a world with four components:
1. a non-empty set of objects V;
2. a field F;
3. an operation called addition, that combines two objects from V, which we denote by
⊕; and
4. an operation called scalar multiplication, which combines an object from V and a
scalar from F, which we denote by ⊙.
Note that addition (⊕) and scalar multiplication (⊙) are used in place of + and · to
denote that these operations may have definitions that differ from the “standard operations”
of addition/multiplication on scalars or coordinate vectors. Many other definitions of these
operations are possible, as long as they satisfy the vector space axioms below.
Definition 10.1.2
Vector Space
A non-empty set of objects, V, is a vector space over a field, F, under the operations
of addition, ⊕, and scalar multiplication, ⊙, provided the following set of ten axioms
are met.
C1. For all #»𝑥 , #»𝑦 ∈ V, #»𝑥 ⊕ #»𝑦 ∈ V. (Closure Under Addition)
C2. For all #»𝑥 ∈ V and all 𝑐 ∈ F, 𝑐 ⊙ #»𝑥 ∈ V. (Closure Under Scalar Multiplication)
V1. For all #»𝑥 , #»𝑦 ∈ V, #»𝑥 ⊕ #»𝑦 = #»𝑦 ⊕ #»𝑥 . (Addition is Commutative)
V2. For all #»𝑥 , #»𝑦 , #»𝑧 ∈ V, ( #»𝑥 ⊕ #»𝑦 ) ⊕ #»𝑧 = #»𝑥 ⊕ ( #»𝑦 ⊕ #»𝑧 ) = #»𝑥 ⊕ #»𝑦 ⊕ #»𝑧 . (Addition is Associative)
V3. There exists a vector #»0 ∈ V such that for all #»𝑥 ∈ V, #»𝑥 ⊕ #»0 = #»0 ⊕ #»𝑥 = #»𝑥 . (Additive Identity)
V4. For all #»𝑥 ∈ V, there exists a vector − #»𝑥 ∈ V such that #»𝑥 ⊕ (− #»𝑥 ) = (− #»𝑥 ) ⊕ #»𝑥 = #»0 . (Additive Inverse)
V5. For all #»𝑥 , #»𝑦 ∈ V and for all 𝑐 ∈ F, 𝑐 ⊙ ( #»𝑥 ⊕ #»𝑦 ) = (𝑐 ⊙ #»𝑥 ) ⊕ (𝑐 ⊙ #»𝑦 ). (Vector Addition Distributive Law)
V6. For all #»𝑥 ∈ V and for all 𝑐, 𝑑 ∈ F, (𝑐 + 𝑑) ⊙ #»𝑥 = (𝑐 ⊙ #»𝑥 ) ⊕ (𝑑 ⊙ #»𝑥 ). (Scalar Addition Distributive Law)
V7. For all #»𝑥 ∈ V and for all 𝑐, 𝑑 ∈ F, (𝑐𝑑) ⊙ #»𝑥 = 𝑐 ⊙ (𝑑 ⊙ #»𝑥 ). (Scalar Multiplication is Associative)
V8. For all #»𝑥 ∈ V, 1 ⊙ #»𝑥 = #»𝑥 . (Multiplicative Identity)
Definition 10.1.3 (Vector)
A vector is an element of a vector space.
REMARKS
1. The sum 𝑐 + 𝑑 in V6 (scalar addition distributive law) is the sum of scalars in F.
2. The product 𝑐𝑑 in V7 (scalar associativity) is the product of scalars in F.
3. In Abstract Algebra, the combination of set, field, addition, and scalar multiplication
is sometimes referenced as (V, F, ⊕, ⊙).
4. If the “standard operations” for a given vector space are being used, then we will
usually default to the standard + symbol for addition and juxtaposition for scalar
multiplication.
5. One of the defining properties of a field that we will use is the fact that every nonzero
𝑎 ∈ F has a multiplicative inverse, which we denote by 𝑎−1 .
That is to say, for every nonzero 𝑎 ∈ F, there exists 𝑎−1 ∈ F such that
𝑎𝑎−1 = 𝑎−1 𝑎 = 1.
Example 10.1.4
Consider R𝑛 over R and C𝑛 over C with the standard vector addition and standard scalar
multiplication, respectively. We can show that the ten vector space axioms are satisfied
here, so that both of these sets are vector spaces over their respective fields.
Perhaps the simplest vector space V that one can think of is the vector space consisting of a single vector, namely the zero vector #»0 , with addition and scalar multiplication defined in an obvious way.
Example 10.1.5
Consider the set V = { #»0 } with the operations #»0 + #»0 = #»0 and 𝑐 #»0 = #»0 , where 𝑐 ∈ F. It can be shown that these operations satisfy the ten vector space axioms. Thus, V = { #»0 } is a vector space. It is called the zero vector space or the zero space.
The zero space is not the most interesting object to study. Here are two more interesting
examples of vector spaces.
Example 10.1.6
Consider 𝑀𝑚×𝑛 (F) over F equipped with the standard matrix addition and scalar multiplication, respectively. We can show that the ten vector space axioms are satisfied, with the
zero vector in this space being the zero matrix. Therefore, 𝑀𝑚×𝑛 (F) is a vector space over
F.
Example 10.1.7
The set 𝑃𝑛 (F) is the set of all polynomials of degree at most 𝑛 with coefficients in F. Using
the field F, we define addition and scalar multiplication in the following way:
(𝑎0 + 𝑎1 𝑥 + · · · + 𝑎𝑛 𝑥𝑛 ) + (𝑏0 + 𝑏1 𝑥 + · · · + 𝑏𝑛 𝑥𝑛 ) = (𝑎0 + 𝑏0 ) + (𝑎1 + 𝑏1 )𝑥 + · · · + (𝑎𝑛 + 𝑏𝑛 )𝑥𝑛 ,
𝑐(𝑎0 + 𝑎1 𝑥 + · · · + 𝑎𝑛 𝑥𝑛 ) = (𝑐𝑎0 ) + (𝑐𝑎1 )𝑥 + · · · + (𝑐𝑎𝑛 )𝑥𝑛 .
We can show that these operations satisfy the ten vector space axioms, with the zero vector
being the zero polynomial in 𝑃𝑛 (F), i.e.,
𝑝(𝑥) = 0 + 0𝑥 + · · · + 0𝑥𝑛 .
Therefore, this is a vector space over F. We give the formal definition below.
Note that a vector in 𝑀𝑚×𝑛 (F) is by definition an 𝑚 × 𝑛 matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F). Likewise,
a vector in 𝑃𝑛 (F) is a polynomial in 𝑃𝑛 (F).
Definition 10.1.8
𝑃𝑛 (F)
We use 𝑃𝑛 (F) to denote the vector space over F comprised of the set of all polynomials of
degree at most 𝑛 with coefficients in F, with addition and scalar multiplication defined as
follows:
(𝑎0 + 𝑎1 𝑥 + · · · + 𝑎𝑛 𝑥𝑛 ) + (𝑏0 + 𝑏1 𝑥 + · · · + 𝑏𝑛 𝑥𝑛 ) = (𝑎0 + 𝑏0 ) + (𝑎1 + 𝑏1 )𝑥 + · · · + (𝑎𝑛 + 𝑏𝑛 )𝑥𝑛 ,
𝑐(𝑎0 + 𝑎1 𝑥 + · · · + 𝑎𝑛 𝑥𝑛 ) = (𝑐𝑎0 ) + (𝑐𝑎1 )𝑥 + · · · + (𝑐𝑎𝑛 )𝑥𝑛 .
Of course, our discussion about vector spaces would not be complete without showing
examples of objects (V, F, ⊕, ⊙) that are not vector spaces. This happens when at least
one of the ten vector space axioms fails. In order to prove that (V, F, ⊕, ⊙) is not a vector
space, it suffices to disprove V3 by showing that V has no additive identity or to disprove
any other vector space axiom by providing an explicit counterexample that violates it.
Example 10.1.9
For a positive integer 𝑛, let
$$\mathrm{V} = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1 > 0, \ldots, x_n > 0 \right\}$$
be a subset of R𝑛 , and consider V equipped with the standard operations of addition and
scalar multiplication of vectors in R𝑛 . Show that V is not a vector space.
Solution: Since V does not contain the zero vector, we see that the vector space axiom
V3 (Additive Identity) is violated, which means that V is not a vector space.
Example 10.1.10
For a positive integer 𝑛, let
$$\mathrm{V} = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1 \ge 0, \ldots, x_n \ge 0 \right\}$$
be a subset of R𝑛 , and consider V equipped with the standard operations of addition and
scalar multiplication of vectors in R𝑛 . Show that V is not a vector space over R.
Solution: Unlike in the previous example, this time V does contain the zero vector, so
the axiom V3 (Additive Identity) is satisfied. Furthermore, one can verify that V is closed under addition,
so the axiom C1 is satisfied. However, V is not closed under scalar multiplication. As a
counterexample, take the first standard basis vector #»𝑥 = #»𝑒 1 and 𝑐 = −1. Then
$$c\,\vec{x} = (-1)\cdot\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \notin \mathrm{V},$$
which means that the vector space axiom C2 is violated. This proves that V is not a vector
space over R. As an exercise try listing all vector space axioms that do and do not hold for
V.
EXERCISE
Let
$$\mathrm{V} = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1, \ldots, x_n \in \mathbb{R} \right\}$$
be a subset of C𝑛 , and consider V equipped with the standard operations of addition and
scalar multiplication of vectors in C𝑛 . Show that V is not a vector space over C. List all
vector space axioms that do and do not hold for V.
10.2 Span, Linear Independence and Basis
Just like for F𝑛 , we can introduce the notions of a span, linear dependence, linear independence and basis for a vector space (V, F, + , · ). In what follows, we will primarily be
interested in vector spaces 𝑃𝑛 (F) and 𝑀𝑚×𝑛 (F).
Definition 10.2.1
Span
Let 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 be vectors in V. We define the span of {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } to be the set of all
linear combinations of 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 . That is,
Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } = {𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 : 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F}.
We refer to {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } as a spanning set for Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. We also say that
Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is spanned by {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }.
We have not yet developed an analogous result for Proposition 8.4.6 (Spans F𝑛 iff rank is
𝑛) in the general vector space context, so in order to show a set ℬ is a spanning set of
a vector space V we will need to apply a more “first principles” approach. To establish
Span(ℬ) = V, we will need to show Span(ℬ) ⊆ V and V ⊆ Span(ℬ).
Example 10.2.2
Show that the set
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}$$
is a spanning set for 𝑀2×2 (F).
Solution: Our goal is to show that
$$\operatorname{Span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} = M_{2\times 2}(\mathbb{F}).$$
Since ℬ is a subset of 𝑀2×2 (F), it must be the case that any linear combination of the
elements of ℬ is also in 𝑀2×2 (F) by the closure axioms for vector spaces. Therefore,
Span(ℬ) ⊆ 𝑀2×2 (F).
It remains to show that 𝑀2×2 (F) ⊆ Span(ℬ). Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ be a matrix in 𝑀2×2 (F). We claim that there exist 𝑐1 , 𝑐2 , 𝑐3 , 𝑐4 ∈ F such that
$$c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$
To see that this is the case, let us simplify the left-hand side of the above equality:
$$\begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$
Equating the entries of the matrices on both sides of the above equality, we obtain the following system of four equations in four unknowns:
$$c_1 = a_{11}, \quad c_3 + c_4 = a_{12}, \quad c_3 - c_4 = a_{21}, \quad c_2 = a_{22}.$$
Solving this system for 𝑐1 , 𝑐2 , 𝑐3 , and 𝑐4 , we find that
$$c_1 = a_{11}, \quad c_2 = a_{22}, \quad c_3 = \frac{a_{12} + a_{21}}{2}, \quad \text{and} \quad c_4 = \frac{a_{12} - a_{21}}{2}.$$
Thus,
$$A = a_{11} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_{22} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \frac{a_{12} + a_{21}}{2} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \frac{a_{12} - a_{21}}{2} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$
so 𝐴 ∈ Span(ℬ). This means that 𝑀2×2 (F) ⊆ Span(ℬ), and since it is also the case that
Span(ℬ) ⊆ 𝑀2×2 (F), we conclude that ℬ is a spanning set for 𝑀2×2 (F).
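The small linear system for 𝑐1 , 𝑐2 , 𝑐3 , 𝑐4 can also be solved mechanically. The following SymPy sketch (illustrative only, with symbolic entries 𝑎11 , 𝑎12 , 𝑎21 , 𝑎22 ) reproduces the formulas obtained above.

    import sympy as sp

    a11, a12, a21, a22 = sp.symbols('a11 a12 a21 a22')
    c1, c2, c3, c4 = sp.symbols('c1 c2 c3 c4')

    equations = [sp.Eq(c1, a11),
                 sp.Eq(c3 + c4, a12),
                 sp.Eq(c3 - c4, a21),
                 sp.Eq(c2, a22)]

    print(sp.solve(equations, [c1, c2, c3, c4]))
    # {c1: a11, c2: a22, c3: a12/2 + a21/2, c4: a12/2 - a21/2}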
EXERCISE
Show that the set
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\}$$
is a spanning set for 𝑀2×2 (F).
Example 10.2.3
Show that the set
ℬ = {1 − 𝑥2 , −1 + 𝑖𝑥, −1 − 𝑖𝑥}
is a spanning set for 𝑃2 (C).
Solution: Our goal is to show that
Span{1 − 𝑥2 , −1 + 𝑖𝑥, −1 − 𝑖𝑥} = 𝑃2 (C).
Since ℬ is a subset of 𝑃2 (C), it must be the case that any linear combination of the elements
of ℬ is also in 𝑃2 (C). Therefore, Span(ℬ) ⊆ 𝑃2 (C).
It remains to show that 𝑃2 (C) ⊆ Span(ℬ). Let 𝑝(𝑥) = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 be a polynomial in
𝑃2 (C). We claim that there exist 𝑐1 , 𝑐2 , 𝑐3 ∈ C such that
𝑐1 (1 − 𝑥2 ) + 𝑐2 (−1 + 𝑖𝑥) + 𝑐3 (−1 − 𝑖𝑥) = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 .
To see that this is the case, let us simplify the left-hand side of the above equality:
(𝑐1 − 𝑐2 − 𝑐3 ) + (𝑖𝑐2 − 𝑖𝑐3 )𝑥 − 𝑐1 𝑥2 = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 .
Equating the corresponding coefficients of the polynomials on both sides of the above equality, we obtain the following system of three equations in three unknowns:
𝑐1 − 𝑐2 − 𝑐3 = 𝑎0
𝑖𝑐2 − 𝑖𝑐3 = 𝑎1
−𝑐1 = 𝑎2 .
Solving this system for 𝑐1 , 𝑐2 , and 𝑐3 , we find that
$$c_1 = -a_2, \quad c_2 = \frac{-a_0 - ia_1 - a_2}{2}, \quad \text{and} \quad c_3 = \frac{-a_0 + ia_1 - a_2}{2}.$$
Thus,
$$p(x) = (-a_2)(1 - x^2) + \frac{-a_0 - ia_1 - a_2}{2}(-1 + ix) + \frac{-a_0 + ia_1 - a_2}{2}(-1 - ix),$$
so 𝑝(𝑥) ∈ Span(ℬ). This means that 𝑃2 (C) ⊆ Span(ℬ), and since it is also the case that
Span(ℬ) ⊆ 𝑃2 (C), we conclude that ℬ is a spanning set for 𝑃2 (C).
EXERCISE
Let 𝑛 be a non-negative integer. Show that the set
ℬ = {1, 2𝑥, 4𝑥2 , . . . , 2𝑛 𝑥𝑛 }
is a spanning set for 𝑃𝑛 (F).
As with the definition of the span of a set, the definitions of linear dependence and linear
independence in the general vector space context will look nearly identical to what was used
for linear dependence and linear independence in F𝑛 in Definitions 8.2.3 and 8.2.4.
Definition 10.2.4 (Linear Dependence, Linear Independence)
We say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 ∈ V are linearly dependent if there exist scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F, not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = #»0 .
We say that 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 ∈ V are linearly independent if the only solution to the equation 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = #»0 is the trivial solution 𝑐1 = 𝑐2 = · · · = 𝑐𝑘 = 0.
If 𝑈 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }, then we say that the set 𝑈 is linearly dependent (resp. linearly
independent) to mean that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent (resp. linearly
independent).
Since we do not yet have an analogous result for Proposition 8.3.6 (Pivots and Linear
Independence) in the general vector space context, we will need to determine whether or
not a set ℬ is linearly independent directly from the definition above.
Example 10.2.5
Determine whether the subset
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}$$
of 𝑀2×2 (F) is linearly dependent or linearly independent.
Solution: In order to determine whether the set ℬ is linearly dependent or linearly independent, we examine the equation
$$c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Simplifying the left-hand side of the above equality, we find that
$$\begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Equating the entries of the matrices on both sides of the above equality, we obtain the following system of four equations in four unknowns:
$$c_1 = 0, \quad c_2 = 0, \quad c_3 + c_4 = 0, \quad c_3 - c_4 = 0.$$
Solving this system for 𝑐1 , 𝑐2 , 𝑐3 and 𝑐4 , we find that 𝑐1 = 𝑐2 = 𝑐3 = 𝑐4 = 0. Therefore, ℬ
is linearly independent.
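As an aside, each 2 × 2 matrix can be flattened into a vector in F4 (this is the idea of coordinate vectors, made precise later in the notes), after which the independence check above becomes a rank computation. The NumPy sketch below is only an illustration of that connection, not a technique used in these notes.

    import numpy as np

    B = [np.array([[1, 0], [0, 0]]),
         np.array([[0, 0], [0, 1]]),
         np.array([[0, 1], [1, 0]]),
         np.array([[0, 1], [-1, 0]])]

    # Flatten each matrix into a length-4 vector and use the vectors as columns
    M = np.column_stack([X.flatten() for X in B])
    print(np.linalg.matrix_rank(M))   # 4, so the four matrices are linearly independent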
EXERCISE
Determine whether the subset
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\}$$
of 𝑀2×2 (F) is linearly dependent or linearly independent.
Example 10.2.6
Determine whether the subset
ℬ = {1 − 𝑥2 , −1 + 𝑖𝑥, −1 − 𝑖𝑥}
of 𝑃2 (C) is linearly dependent or linearly independent.
Solution: In order to determine whether the set ℬ is linearly dependent or linearly independent, we examine the equation
𝑐1 (1 − 𝑥2 ) + 𝑐2 (−1 + 𝑖𝑥) + 𝑐3 (−1 − 𝑖𝑥) = 0.
Simplifying the left-hand side of the above equality, we find that
(𝑐1 − 𝑐2 − 𝑐3 ) + (𝑖𝑐2 − 𝑖𝑐3 )𝑥 − 𝑐1 𝑥2 = 0.
Equating the corresponding coefficients of the polynomials on both sides of the above equality, we obtain the following system of three equations in three unknowns:
𝑐1 − 𝑐2 − 𝑐3 = 0
𝑖𝑐2 − 𝑖𝑐3 = 0
−𝑐1 = 0.
Solving this system for 𝑐1 , 𝑐2 and 𝑐3 , we find that 𝑐1 = 𝑐2 = 𝑐3 = 0. Therefore, ℬ is linearly
independent.
EXERCISE
Let 𝑛 be a non-negative integer. Determine whether the subset
ℬ = {1, 2𝑥, 4𝑥2 , . . . , 2𝑛 𝑥𝑛 }
of 𝑃𝑛 (F) is linearly dependent or linearly independent.
Finally, we extend the notion of basis to the general vector space context, with a definition
that is, yet again, nearly identical to what was used previously for a basis of a subspace of
F𝑛 in Definition 8.2.6.
Definition 10.2.7 (Basis)
We say that a subset ℬ of a nonzero vector space V is a basis for V if
1. ℬ is linearly independent, and
2. V = Span(ℬ).
REMARK
You should be cautioned that our definition requires a basis to have finitely many vectors in
it. This may not always be possible. For example, the set of all polynomials with coefficients
in F is a vector space over F with the usual operations of addition and scalar multiplication.
However, this vector space does not admit a “basis” according to our definition. The study
of such “infinite-dimensional” vector spaces has important applications in mathematics and
physics, and is taken up in more advanced courses.
Example 10.2.8
Determine whether or not the set
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}$$
is a basis for 𝑀2×2 (F).
Solution: It was demonstrated in Example 10.2.5 that ℬ is linearly independent. It
was also demonstrated in Example 10.2.2 that 𝑀2×2 (F) = Span(ℬ). Since ℬ is a linearly
independent spanning set for 𝑀2×2 (F), it is a basis for 𝑀2×2 (F).
EXERCISE
Show that the set
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\}$$
is a basis for 𝑀2×2 (F).
Example 10.2.9
Show that the set
ℬ = {1 − 𝑥2 , −1 + 𝑖𝑥, −1 − 𝑖𝑥}
is a basis for 𝑃2 (C).
Solution: It was demonstrated in Example 10.2.6 that ℬ is linearly independent. It was also
demonstrated in Example 10.2.3 that 𝑃2 (C) = Span(ℬ). Since ℬ is a linearly independent
spanning set for 𝑃2 (C), it is a basis for 𝑃2 (C).
EXERCISE
Let 𝑛 be a non-negative integer. Determine whether or not the set
ℬ = {1, 2𝑥, 4𝑥2 , . . . , 2𝑛 𝑥𝑛 }
is a basis for 𝑃𝑛 (F).
With the definitions of the span of a set, linear independence, linear dependence, and basis
being so similar to what were used previously in the context of F𝑛 , it should come as no
surprise that much of what we developed in that context holds for general vector spaces
as well. In particular, analogous results to Theorem 8.7.2 (Dimension is Well-Defined) and
Theorem 8.8.1 (Unique Representation Theorem) exist, allowing us to define the notions of
dimension and coordinate vectors. With these tools, we can translate much of the work of
general vector spaces back to familiar territory in the context of F𝑛 !
10.3 Linear Operators
Just like we were able to generalize the notions of a vector space, linear dependence, linear
independence, and basis, we can generalize the notion of a linear transformation.
Definition 10.3.1 (Generic Linear Transformation)
Let (V, F, + , · ) and (W, F, ⊕, ⊙) be vector spaces over the same field F. We say that the function 𝑇 : V → W is a linear transformation (or linear mapping) if, for any #»𝑥 , #»𝑦 ∈ V and any 𝑐 ∈ F, the following two properties hold:
1. 𝑇 ( #»𝑥 + #»𝑦 ) = 𝑇 ( #»𝑥 ) ⊕ 𝑇 ( #»𝑦 ) (called linearity over addition).
2. 𝑇 (𝑐 · #»𝑥 ) = 𝑐 ⊙ 𝑇 ( #»𝑥 ) (called linearity over scalar multiplication).
We refer to V here as the domain of 𝑇 and W as the codomain of 𝑇 , as we would for any
function.
When studying functions between vector spaces, we will restrict our attention only to linear
transformations whose domains and codomains are F𝑛 , 𝑃𝑛 (F) and 𝑀𝑚×𝑛 (F), equipped with
the standard operations of addition and scalar multiplication. In view of this, we will write
V instead of (V, F, + , · ) for brevity.
Example 10.3.2
For a positive integer 𝑛, consider the function
𝐷 : 𝑃𝑛 (F) → 𝑃𝑛 (F)
defined by
𝐷(𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 + · · · + 𝑎𝑛 𝑥𝑛 ) = 𝑎1 + (2𝑎2 )𝑥 + · · · + (𝑛𝑎𝑛 )𝑥𝑛−1 .
Show that 𝐷 is a linear transformation.
Solution: Notice that, for all polynomials
𝑓 (𝑥) = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 + · · · + 𝑎𝑛 𝑥𝑛
and 𝑔(𝑥) = 𝑏0 + 𝑏1 𝑥 + 𝑏2 𝑥2 + · · · + 𝑏𝑛 𝑥𝑛 ,
we have
𝐷(𝑓 (𝑥) + 𝑔(𝑥)) = 𝐷((𝑎0 + 𝑏0 ) + (𝑎1 + 𝑏1 )𝑥 + (𝑎2 + 𝑏2 )𝑥2 + · · · + (𝑎𝑛 + 𝑏𝑛 )𝑥𝑛 )
= (𝑎1 + 𝑏1 ) + 2(𝑎2 + 𝑏2 )𝑥 + · · · + 𝑛(𝑎𝑛 + 𝑏𝑛 )𝑥𝑛−1
= (𝑎1 + 2𝑎2 𝑥 + · · · + 𝑛𝑎𝑛 𝑥𝑛−1 ) + (𝑏1 + 2𝑏2 𝑥 + · · · + 𝑛𝑏𝑛 𝑥𝑛−1 )
= 𝐷(𝑓 (𝑥)) + 𝐷(𝑔(𝑥)),
and for all 𝑐 ∈ F we have
𝐷(𝑐𝑓 (𝑥)) = 𝐷(𝑐𝑎0 + 𝑐𝑎1 𝑥 + 𝑐𝑎2 𝑥2 + · · · + 𝑐𝑎𝑛 𝑥𝑛 )
= 𝑐𝑎1 + 𝑐(2𝑎2 )𝑥 + · · · + 𝑐(𝑛𝑎𝑛 )𝑥𝑛−1
= 𝑐(𝑎1 + 2𝑎2 𝑥 + · · · + 𝑛𝑎𝑛 𝑥𝑛−1 )
= 𝑐𝐷(𝑓 (𝑥)).
Thus, 𝐷 is a linear transformation. If you studied calculus, then you may recognize that 𝐷
is the differentiation operator; that is, for any polynomial 𝑓 (𝑥) ∈ 𝑃𝑛 (F), 𝐷(𝑓 (𝑥)) = 𝑓 ′ (𝑥),
where 𝑓 ′ (𝑥) is the derivative of 𝑓 (𝑥).
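A coefficient-list implementation of 𝐷 makes the two linearity properties easy to experiment with. The helper below is a sketch with a hypothetical name, not something defined in the notes; it keeps the output list the same length so that polynomials in 𝑃𝑛 (R) stay in 𝑃𝑛 (R).

    def D(coeffs):
        """Differentiate a0 + a1*x + ... + an*x^n, stored as the list [a0, a1, ..., an]."""
        return [k * a for k, a in enumerate(coeffs)][1:] + [0]

    f = [1, 2, 3, 4]       # 1 + 2x + 3x^2 + 4x^3
    g = [5, 0, -1, 2]
    c = 7

    # Linearity over addition: D(f + g) == D(f) + D(g)
    print(D([a + b for a, b in zip(f, g)]) == [a + b for a, b in zip(D(f), D(g))])   # True
    # Linearity over scalar multiplication: D(c*f) == c*D(f)
    print(D([c * a for a in f]) == [c * a for a in D(f)])                            # True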
EXERCISE
Show that the function 𝑇 : 𝑃𝑛 (F) → F defined by 𝑇 (𝑓 (𝑥)) = 𝑓 (0) is a linear transformation.
Example 10.3.3
For positive integers 𝑚 and 𝑛, consider the function
𝑆 : 𝑀𝑚×𝑛 (F) → 𝑀𝑛×𝑚 (F)
defined by
𝑆(𝐴) = 𝐴𝑇 .
That is, 𝑆(𝐴) is equal to the transpose of the 𝑚 × 𝑛 matrix 𝐴. Show that 𝑆 is a linear
transformation.
Solution: In order to prove that 𝑆 is a linear transformation we will use Proposition 4.3.13 (Properties of Matrix Transpose).
Notice that, for all 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), we have
𝑆(𝐴 + 𝐵) = (𝐴 + 𝐵)𝑇
= 𝐴𝑇 + 𝐵 𝑇
by Proposition 4.3.13
= 𝑆(𝐴) + 𝑆(𝐵).
Moreover, for all 𝑐 ∈ F, we have
𝑆(𝑐𝐴) = (𝑐𝐴)𝑇
= 𝑐𝐴𝑇
by Proposition 4.3.13
= 𝑐𝑆(𝐴).
Thus, 𝑆 is a linear transformation.
EXERCISE
Recall from Definition 7.3.2 that the trace tr(𝐴) of a square matrix 𝐴 is defined as the sum
of diagonal entries of 𝐴. Show that the function tr : 𝑀𝑛×𝑛 (F) → F is a linear transformation.
As in Chapter 9, the linear transformations 𝑇 : V → V whose domain and codomain are the
same set are especially interesting to us.
Definition 10.3.4 (Linear Operator)
A linear transformation 𝑇 : V → V is called a linear operator.
Just like linear operators on F𝑛 that we studied in Chapter 9, linear operators 𝑇 : V → V
have many nice properties, such as the property that the composition of 𝑇 with itself, 𝑇 ∘ 𝑇 ,
is well-defined, or that we can naturally introduce eigenvalues, eigenvectors, and eigenpairs
for 𝑇 . We will explore this second property in detail.
Definition 10.3.5 (Eigenvector, Eigenvalue and Eigenpair of a Linear Operator)
Let 𝑇 : V → V be a linear operator. We say that the nonzero vector #»𝑥 ∈ V is an eigenvector of 𝑇 if there exists a scalar 𝜆 ∈ F such that 𝑇 ( #»𝑥 ) = 𝜆 #»𝑥 .
The scalar 𝜆 is called an eigenvalue of 𝑇 over F and the pair (𝜆, #»𝑥 ) is called an eigenpair of 𝑇 over F.
Thus, if (𝜆, #»𝑥 ) is an eigenpair of 𝑇 over F, then the action of 𝑇 on #»𝑥 is especially simple, i.e., it is the result of scaling the vector #»𝑥 by a factor 𝜆. In fact, this observation applies not only to #»𝑥 , but to any vector #»𝑦 = 𝑐 #»𝑥 with 𝑐 ∈ F that is a scalar multiple of #»𝑥 , because
𝑇 ( #»𝑦 ) = 𝑇 (𝑐 #»𝑥 ) = 𝑐𝑇 ( #»𝑥 ) = 𝑐(𝜆 #»𝑥 ) = 𝜆(𝑐 #»𝑥 ) = 𝜆 #»𝑦 .
Example 10.3.6
Consider the differentiation operator 𝐷 : 𝑃𝑛 (F) → 𝑃𝑛 (F) from Example 10.3.2. Then the
constant polynomial 𝑝(𝑥) = 1 satisfies
𝐷(𝑝(𝑥)) = 𝐷(1) = 0 = 0 · 𝑝(𝑥).
Since 𝑝(𝑥) is not the zero polynomial, we see that, for 𝜆 = 0, (𝜆, 𝑝(𝑥)) is an eigenpair of 𝐷
over F.
Example 10.3.7
If in Example 10.3.3 we assume that 𝑚 = 𝑛, then 𝑆 : 𝑀𝑛×𝑛 (F) → 𝑀𝑛×𝑛 (F) is a linear
operator. Since the 𝑛×𝑛 identity matrix 𝐼𝑛 is diagonal, it remains fixed under the operation
of matrix transpose, and so
𝑆(𝐼𝑛 ) = 𝐼𝑛𝑇 = 𝐼𝑛 = 1 · 𝐼𝑛 .
Since 𝐼𝑛 is not the zero matrix, we see that, for 𝜆 = 1, (𝜆, 𝐼𝑛 ) is an eigenpair of 𝑆 over F.
As we learned in Chapter 9, given an arbitrary linear operator 𝑇 : F𝑛 → F𝑛 , it may be quite difficult to understand the nature of its action. However, when 𝑇 has a basis of eigenvectors { #»𝑣 1 , . . . , #»𝑣 𝑛 }, then it follows from Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability) that 𝑇 is diagonalizable, and so the action of 𝑇 is well understood in this case. We can use this criterion to introduce the notion of diagonalizability for linear operators V → V.
Definition 10.3.8 (Diagonalizable)
Let 𝑇 : V → V be a linear operator. We say that 𝑇 is diagonalizable over F if there exists
a basis ℬ of V comprised of eigenvectors of 𝑇 .
When 𝑇 is diagonalizable, the action of 𝑇 on an arbitrary vector #»𝑥 ∈ V can be interpreted as scaling the vector #»𝑥 by a factor of 𝜆𝑖 in the direction of an eigenvector #»𝑣 𝑖 of 𝑇 . We summarize this observation in the following theorem, whose proof we leave as an exercise.
Theorem 10.3.9
(Diagonalizable Operators Viewed As Scalings)
Let 𝑇 : V → V be a diagonalizable linear operator. Let ℬ = { #»𝑣 1 , . . . , #»𝑣 𝑛 } be a basis of V comprised of eigenvectors of 𝑇 and, for 𝑖 = 1, . . . , 𝑛, let 𝜆𝑖 denote the eigenvalue of 𝑇 corresponding to #»𝑣 𝑖 . Then, for any #»𝑥 ∈ V, if
#»𝑥 = 𝑐1 #»𝑣 1 + · · · + 𝑐𝑛 #»𝑣 𝑛 ,   𝑐1 , . . . , 𝑐𝑛 ∈ F,
we have that
𝑇 ( #»𝑥 ) = 𝑇 (𝑐1 #»𝑣 1 + · · · + 𝑐𝑛 #»𝑣 𝑛 ) = 𝜆1 𝑐1 #»𝑣 1 + · · · + 𝜆𝑛 𝑐𝑛 #»𝑣 𝑛 .
That is, the action of 𝑇 on #»𝑥 can be viewed as the result of scaling #»𝑥 by a factor of 𝜆𝑖 in the direction of #»𝑣 𝑖 for each 𝑖.
Example 10.3.10
Show that the linear operator 𝑆 : 𝑀2×2 (F) → 𝑀2×2 (F) given by 𝑆(𝐴) = 𝐴𝑇 is diagonalizable.
Solution: By observation, we find that
$$S\left(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\right) = 1 \cdot \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad S\left(\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right) = 1 \cdot \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad S\left(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right) = 1 \cdot \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad S\left(\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\right) = (-1) \cdot \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$
Thus,
$$\left(1, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\right), \quad \left(1, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right), \quad \left(1, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right), \quad \text{and} \quad \left(-1, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\right)$$
are the eigenpairs of 𝑆. As we have seen in Example 10.2.8, the set
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}$$
is a basis of 𝑀2×2 (F). Thus, ℬ is a basis of eigenvectors of 𝑆, which means that 𝑆 is diagonalizable. By Theorem 10.3.9 (Diagonalizable Operators Viewed As Scalings), for any matrix 𝐴 ∈ 𝑀2×2 (F) such that
$$A = c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix},$$
we have
$$S(A) = S\left(\begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix}\right) = S\left(c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\right)$$
$$= c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} - c_4 \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} c_1 & c_3 - c_4 \\ c_3 + c_4 & c_2 \end{bmatrix}.$$
EXERCISE
Recall Definition 6.4.2, where the adjugate of a square matrix was introduced. Show that
the linear operator adj : 𝑀2×2 (F) → 𝑀2×2 (F) given by
$$\operatorname{adj}\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
is diagonalizable over F.
REMARK
Notice that the function adj : 𝑀𝑛×𝑛 (F) → 𝑀𝑛×𝑛 (F) is not linear when 𝑛 ≥ 3.
EXERCISE
Show that, for any positive integer 𝑛, the linear operator 𝑇 : 𝑃𝑛 (F) → 𝑃𝑛 (F) given by
𝑇 (𝑝(𝑥)) = 𝑝(2𝑥) is diagonalizable over F.
Just like Example 10.3.10, the exercises above can be solved by observation by finding the
most “natural” basis of eigenvectors. However, in general, this method does not work,
as sometimes a linear operator 𝑇 : V → V may not be diagonalizable, and even if it is
diagonalizable the action of 𝑇 may be so complicated that a basis of eigenvectors may not
be obvious.
Example 10.3.11
Show that the linear operator 𝑇 : 𝑃2 (C) → 𝑃2 (C) given by
𝑇 (𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 ) = (−𝑎1 − 𝑎2 ) + (𝑎0 + 𝑎2 )𝑥 + 𝑎2 𝑥2
is diagonalizable over C.
Solution: As an exercise, show that
𝑇 (1 − 𝑥2 ) = 1 − 𝑥2 , 𝑇 (−1 + 𝑖𝑥) = −𝑖 − 𝑥 = 𝑖(−1 + 𝑖𝑥), and 𝑇 (−1 − 𝑖𝑥) = 𝑖 − 𝑥 = −𝑖(−1 − 𝑖𝑥),
so that we may conclude
(1, 1 − 𝑥2 ), (𝑖, −1 + 𝑖𝑥), and (−𝑖, −1 − 𝑖𝑥)
are eigenpairs of 𝑇 . As we have seen in Example 10.2.9, the set
ℬ = {1 − 𝑥2 , −1 + 𝑖𝑥, −1 − 𝑖𝑥}
is a basis of 𝑃2 (C). Thus, ℬ is a basis of 𝑃2 (C) comprised of eigenvectors of 𝑇 , which
means that 𝑇 is diagonalizable. By Theorem 10.3.9 (Diagonalizable Operators Viewed As
Scalings), for any polynomial 𝑝(𝑥) ∈ 𝑃2 (C) such that
𝑝(𝑥) = 𝑐1 (1 − 𝑥2 ) + 𝑐2 (−1 + 𝑖𝑥) + 𝑐3 (−1 − 𝑖𝑥)
= (𝑐1 − 𝑐2 − 𝑐3 ) + (𝑖𝑐2 − 𝑖𝑐3 )𝑥 − 𝑐1 𝑥2
we have
𝑇 (𝑝(𝑥)) = 𝑇 ((𝑐1 − 𝑐2 − 𝑐3 ) + (𝑖𝑐2 − 𝑖𝑐3 )𝑥 − 𝑐1 𝑥2 )
= 𝑇 (𝑐1 (1 − 𝑥2 ) + 𝑐2 (−1 + 𝑖𝑥) + 𝑐3 (−1 − 𝑖𝑥))
= 𝑐1 (1 − 𝑥2 ) + 𝑖𝑐2 (−1 + 𝑖𝑥) − 𝑖𝑐3 (−1 − 𝑖𝑥)
= (𝑐1 − 𝑖𝑐2 + 𝑖𝑐3 ) + (−𝑐2 − 𝑐3 )𝑥 − 𝑐1 𝑥2 .
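One systematic way to search for such a basis (foreshadowing the ℬ-matrix machinery mentioned below) is to write down the matrix of 𝑇 relative to the standard basis {1, 𝑥, 𝑥2 } and compute its eigenvectors numerically. The NumPy sketch below is only an illustration of that idea, not part of the course material.

    import numpy as np

    # Matrix of T relative to the standard basis {1, x, x^2}: the columns are the
    # coordinate vectors of T(1) = x, T(x) = -1, and T(x^2) = -1 + x + x^2.
    M = np.array([[0, -1, -1],
                  [1,  0,  1],
                  [0,  0,  1]], dtype=complex)

    eigenvalues, eigenvectors = np.linalg.eig(M)
    print(np.round(eigenvalues, 6))   # approximately 1, i and -i (in some order)
    # Each column of `eigenvectors` is the coordinate vector of an eigen-polynomial;
    # for the eigenvalue 1 it is a multiple of (1, 0, -1), i.e. of 1 - x^2.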
In the solution to Example 10.3.11 a basis of eigenvectors ℬ was provided to you, but had we not provided it, finding ℬ would be a non-trivial task. Furthermore, with the
techniques that we currently have you would not be able to show that the linear operator
𝑇 from Example 10.3.11 is not diagonalizable over R, or that the operator 𝐷 from Example
10.3.2 is not diagonalizable over F. This requires further generalization of the diagonalization
theory, which includes the introduction of notions such as the kernel of a linear operator,
ℬ-matrix of a linear operator, geometric and algebraic multiplicities, etc. This theory,
along with many other fascinating topics, is studied in detail in a subsequent linear algebra
course.
Index
Basic Variable, 70
Basis, 197, 262
Block Matrix, 246
(𝑖, 𝑗)𝑡ℎ Minor, 148
(𝑖, 𝑗)𝑡ℎ Submatrix, 148
𝑀𝑚×𝑛 (C), 75
𝑀𝑚×𝑛 (F), 75
𝑀𝑚×𝑛 (R), 75
𝑃𝑛 (F), 257
C𝑛
addition, 12
components, 13
definition, 12
projection of #»𝑧 onto #»𝑤 , 26
scalar multiplication, 12
standard basis, 13
standard inner product, 23
subtraction, 12
F𝑛
standard basis, 209
subspace, 190
F𝑛
standard inner product, 27
R𝑛
addition, 8
cross product, 28
definition, 5
dot product, 13
properties of addition, 9
scalar multiplication, 10
standard basis, 11
subtraction, 10
ℬ-Matrix, 230
Change-of-Basis Matrix, 226
Characteristic Equation, 170
Characteristic Polynomial, 170
Coefficient, 49
Coefficient Matrix, 61
Cofactor, 159
Column Space, 97
Components Of The Standard Basis, 11
Composite linear transformation, 144
Consistent Systems, 52
Constant Term, 49
Coordinate Vector, 221
Coordinates, 220
Cross Product, 28
Determinant, 147, 148
Diagonal, 108
Diagonalizable, 236, 267
Diagonalizable Matrix, 186
Dimension, 217
dot product, 13
Eigenpair, 169, 266
Eigenspace, 182
Eigenvalue, 169, 266
algebraic multiplicity, 245
geometric multiplicity, 245
Eigenvalue Equation, 170
Eigenvector, 169, 266
Elementary matrix, 109
Elementary Operations, 53
Equality, 99
Equality Of Vectors, 7
Equivalent Systems, 52
Addition of vectors, 8
Additive Inverse, 9
Additive inverse, 104
Adjugate, 159
Algebraic Multiplicity, 245
Angle Between Vectors, 16
Associated Homogeneous System, 90
Augmented Matrix, 61
Free Variable, 70
Function determined by the matrix, 121
Generic Linear Transformation, 264
Geometric Multiplicity, 245
Homogeneous Systems, 81
Identity, 108, 146
Inconsistent Systems, 52
Independence, 195, 261
Injective, 130
Inverse, 114
Invertible, 112
Kernel, 130
Leading Entry, 66
Length, 14, 24
Line, 36, 38
parametric equations, 34, 37
vector equation, 35, 36
Linear Combination, 32
Linear Dependence, 195, 261
Linear Equation, 49
associated homogeneous system, 90
coefficient, 49
consistent systems, 52
constant term, 49
elementary operations, 53
equivalent systems, 52
homogeneous systems, 81
inconsistent systems, 52
non-homogeneous systems, 81
nullspace, 82
particular solution, 91
solution, 50
solution set, 51
solve, 50
system of linear equations, 50
trivial equation, 54
trivial solution, 81
Linear Operator, 230, 266
diagonalizable, 236, 267
eigenpair, 266
eigenvalue, 266
eigenvector, 266
Linear Transformation, 123
composition, 144
identity, 146
kernel, 130
linear operator, 230, 266
one-to-one, 130
onto, 128
power, 146
range, 127
standard matrix, 134
Lower-triangular, 107
Matrix, 61
(𝑖, 𝑗)𝑡ℎ minor, 148
(𝑖, 𝑗)𝑡ℎ submatrix, 148
additive inverse, 104
adjugate, 159
augmented matrix, 61
basic variable, 70
block matrix, 246
change-of-basis matrix, 226
characteristic equation, 170
characteristic polynomial, 170
coefficient matrix, 61
cofactor, 159
column space, 97
determinant, 147, 148
diagonal, 108
diagonalizable matrix, 186
eigenpair, 169
eigenspace, 182
eigenvalue, 169
eigenvalue equation, 170
eigenvector, 169
elementary, 109
entry, 61
equality, 99
free variable, 70
identity, 108
inverse, 114
invertible, 112
leading entry, 66
leading one, 66
lower-triangular, 107
multiplication, 100
multiplication in terms of columns,
86
multiplication in terms of the
individual entries, 85
nullity, 77
pivot, 66
pivot column, 66
pivot position, 66
pivot row, 66
rank, 75
reduced row echelon form, 67
row echelon form, 66
row equivalent, 62
row operation (ERO), 62
row space, 98
row vector, 85
RREF, 68
scalar multiplication, 105
similar, 184
square, 107
sum, 103
trace, 173
transpose, 98
upper-triangular, 107
zero matrix, 104
zero row, 62
Matrix Product, 100
Multiplication, 100
Multiplication in Terms of Columns, 86
Multiplication in Terms of the Individual
Entries, 85
Non-Homogeneous Systems, 81
Norm, 14, 24
Normalization, 15
Nullity, 77
Nullspace, 82
One-to-One, 130
Onto, 128
Ordered Basis, 220
Orthogonal, 17, 25
Parametric Equations, 37
parametric equations, 34
Particular Solution, 91
Pivot, 66
Pivot Column, 66
Pivot Position, 66
Pivot Row, 66
Plane, 40, 41
scalar equation, 44
Vector Equation, 40
vector equation, 42
Power, 146
Projection, 19
Range, 127
Rank, 75
Row Echelon Form, 66
Row Equivalent, 62
Row notation, 6
Row Operation (ERO), 62
Row Space, 98
Row Vector, 85
RREF, 68
Scalar Equation, 44
Scalar Multiplication, 10
Scalar multiplication, 105
Similar, 184
Solution, 50
Solution Set, 51
Solve, 50
Span, 32, 258
spanned by, 32, 258
Spanning Set, 32, 258
Square, 107
Standard Basis, 11, 209
Standard Inner Product, 23, 27
Standard Matrix, 134
Subspace, 190
Subtraction, 10
Sum, 103
Surjective, 128
T
ℬ-matrix, 230
Trace, 173
Transpose, 98
Trivial Equation, 54
Trivial Solution, 81, 195, 261
Unit Vector, 15
Upper-triangular, 107
Vector, 5, 256
addition, 8
additive inverse, 9
angle between, 16
basis, 197, 262
components, 11
coordinate, 220
coordinate vector, 221
cross product, 28
dimension, 217
direction vector, 35, 36
dot product, 13
equality, 7
independence, 195, 261
length, 14, 24
linear combination, 32
linear dependence, 195, 261
normalization, 15
ordered basis, 220
orthogonal, 17, 25
projection, 19
properties of addition, 9
row notation, 6
scalar multiplication, 10
span, 32, 258
standard inner product, 23
subtraction, 10
unit vector, 15
vector space, 255
Vector Equation, 35, 36, 40, 42
Vector Space, 255
Zero Matrix, 104
Zero Row, 62