Math 340: Summer 2020
Course Notes
Robert Won
These are course notes from Math 340 (Abstract Linear Algebra) at the University of Washington
taught during Summer 2020. The textbook was Linear Algebra Done Right by Sheldon Axler. I
also referenced Lucas Braune’s notes from when he taught Math 340.
To my students: If you find errors or typos, please let me know! I can correct them.
1. Monday 6/22: Vector Spaces (1.A–B) and Overview of the Course
Course overview
• Q: What is the point of this course?
This course serves as a second (or perhaps third) exposure to linear algebra. In Math
308, when you first learned linear algebra, you focused on Rn , matrices, and computations.
Depending on your instructor, you likely saw a few proofs, because understanding a proof
of a result deepens your understanding of the result. You also probably saw some of the
powerful applications of linear algebra (if you took Math 380, you saw even more). Linear
algebra is central to Google PageRank, sabermetrics, machine learning, principal component analysis, and graph theory. It is also central to pure mathematics (every mathematics
professor uses linear algebra in their research). The more deeply you understand the concepts, the better. So the goal of this course is to understand linear algebra deeply, which
means that this course will be a proof-based course.
• Logistical details: The textbook is important, and you can get access to an electronic
version through the UW library. I will post three lectures on Canvas (in the Panopto tab)
a week, which you can watch asynchronously. But I highly suggest that you keep up with
lectures. Getting behind in a math class makes things a lot less fun. Both your TA and I
will have office hours on Zoom, with the exact timing TBA.
I will post the slides from lecture to Canvas, as well as regularly updating these typed
up notes (I’m writing them while teaching this summer, so as the quarter progresses, this
document will continue growing).
You will have weekly problem sets which will be posted to Canvas. You will submit
completed assignments on Gradescope. There will be one midterm exam, and a final exam.
These will be timed remote exams, also submitted on Gradescope. I will give you more
information as we get closer to the first midterm.
Rn and Cn (1.A)
In your previous linear algebra courses, you likely focused mostly on the vector space Rn . The
next vector space to keep in mind is Cn . So first, we should make sure that we’re all somewhat
comfortable with complex numbers. Section 1.A in your textbook has more details, which you
should review carefully if you’re feeling less comfortable.
Definition 1. A complex number is a pair (a, b) where a, b ∈ R. We usually write a + bi instead of
(a, b).
The set of all complex numbers is denoted by C:
C = {a + bi | a, b ∈ R}.
The set C is equipped with operations addition and multiplication:
(a + bi) + (c + di) = (a + c) + (b + d)i
(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Using this multiplication rule, you can calculate that i2 = −1. So you should not memorize the
multiplication rule, but just carry out the arithmetic using the fact that i2 = −1.
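For example, multiplying out and using i^2 = −1:
(2 + 3i)(4 + 5i) = 8 + 10i + 12i + 15i^2 = (8 − 15) + (10 + 12)i = −7 + 22i,
which matches the rule above, since ac − bd = 2 · 4 − 3 · 5 = −7 and ad + bc = 2 · 5 + 3 · 4 = 22.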
Remark. Let z = a + bi ∈ C \ {0}. Then (a + bi)(a − bi) = a^2 + b^2. Using this fact, you can show
that
z \cdot \left( \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i \right) = 1.
In other words, every nonzero complex number has a multiplicative inverse. We write
z^{-1} = \frac{1}{z} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i.
The triple (C, +, ·) is an example of a field (a set with two operations satisfying certain axioms,
including that every nonzero element has a multiplicative inverse). Other examples of fields include:
R, Q, Z/(p) where p is a prime number. It turns out that actually you can do linear algebra over
any field.
Notation. In this course, we use F to denote either R or C. Elements of F are called scalars.
Just as you are familiar with
R3 = {(x, y, z) | x, y, z ∈ R}
we make the following definition.
Definition 2. The n-dimensional Euclidean space over F is defined to be
Fn = {(x1 , x2 , . . . , xn ) | xj ∈ F for j = 1, . . . , n}.
We say that xj is the jth coordinate of (x1 , . . . , xn ) ∈ Fn .
If you recall all of the things you did in Math 308 and Math 380, the important fact about Rn is
that you can add vectors and scalar multiply (when taking linear combinations, spans, null spaces,
etc., these are the important operations). When row reducing, it was important to be able to divide
by any nonzero scalar. These operations also work in Fn (where F = R or C or any field).
Definition 3. Addition in Fn is defined by adding corresponding coordinates
(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ).
Scalar multiplication in Fn is defined as follows. If λ ∈ F and (x1 , . . . , xn ) ∈ Fn then
λ(x1 , . . . , xn ) = (λx1 , . . . , λxn ).
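For example, in C^2 we have
(1 + i, 2) + (3, −i) = (4 + i, 2 − i) and i(1 + i, 2) = (−1 + i, 2i).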
Vector Spaces (1.B)
Since all we need to do linear algebra are addition and scalar multiplication by elements of a field
F, in the abstract, we define vector spaces in terms of having these operations.
Definition 4. A vector space (over F) is a set V equipped with two operations: an addition that
assigns an element u + v ∈ V for every pair of elements u, v ∈ V and scalar multiplication that
assigns an element λv ∈ V to each λ ∈ F and each v ∈ V . The operations are required to satisfy
the following axioms.
(1) (commutativity of addition) u + v = v + u for all u, v ∈ V ,
(2) (associativity of addition) (u + v) + w = u + (v + w) for all u, v, w ∈ V ,
(3) (additive identity) there exists an element 0 ∈ V such that 0 + v = v for all v ∈ V ,
(4) (additive inverse) for every v ∈ V , there exists w ∈ V such that v + w = 0,
(5) (associativity of scalar multiplication) (ab)v = a(bv) for all a, b ∈ F and v ∈ V ,
(6) (multiplicative identity) 1v = v for all v ∈ V ,
(7) (distributivity) a(u + v) = au + av and (a + b)v = av + bv for all a, b ∈ F and u, v ∈ V .
Elements of a vector space are called vectors or sometimes points.
The value in defining vector spaces abstractly is that many different sets (other than just Rn )
satisfy these axioms. So if we prove any results using only these axioms, the result will hold for
any vector space, including Rn . Here are some examples of some vector spaces.
Example 5. Let
F∞ = {(x1 , x2 , x3 , . . . ) | xj ∈ F for all j ∈ N}.
Define addition and scalar multiplication
(x1 , x2 , . . . ) + (y1 , y2 , . . . ) = (x1 + y1 , x2 + y2 , . . . )
λ(x1 , x2 , . . . ) = (λx1 , λx2 , . . . ).
One can check that these operations satisfy the axioms above.
Example 6. Let P(R) denote the set of all polynomials in the variable x with coefficients in
R. For example, x2 − 3, πx + e, x3 ∈ P(R). As usual, you can add two polynomials and obtain
another polynomial
(x2 − 3) + (πx + e) = x2 + πx + e − 3
and you can multiply a polynomial by any real number λ ∈ R to obtain another
e(x2 − 3) = ex2 − 3e
and you can check that these operations make P(R) into a vector space (over R).
Example 7. Let S be any set. Let
FS = {all functions f : S → F}.
For any f, g ∈ FS , define addition by
(f + g) : S → F
(f + g)(x) = f (x) + g(x)
for all x ∈ S. For any λ ∈ F and f ∈ FS , define
(λf ) : S → F
(λf )(x) = λf (x)
for all x ∈ S. Then this makes FS into a vector space.
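If it helps to see these operations concretely, here is a minimal Python sketch (a computational aside, not needed for the course) modeling elements of R^R as ordinary Python functions, with addition and scalar multiplication defined pointwise exactly as above:

def add(f, g):
    """The sum f + g, defined by (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(lam, f):
    """The scalar multiple lam * f, defined by (lam f)(x) = lam * f(x)."""
    return lambda x: lam * f(x)

f = lambda x: x ** 2       # an element of R^R
g = lambda x: 3 * x + 1    # another element of R^R

h = add(f, scale(2.0, g))  # the vector f + 2g
print(h(1.0))              # f(1) + 2 * g(1) = 1 + 8 = 9.0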
So now when we prove a result for vector spaces (using just the axioms), it will hold for Rn , Cn ,
these examples, and all other vector spaces. Let’s see some examples (which may be familiar from
Math 300).
Notation. Throughout these notes and the textbook, V will denote a vector space over F.
Proposition 8. A vector space has a unique additive identity.
Remark. Note that the axioms of a vector space only asserted the existence of an additive identity,
we didn’t assume that there was a unique vector with that property. In fact, it is unique, and we
can prove it.
Proof. Suppose that 0, 0′ ∈ V are both additive identities. Then
0′ = 0 + 0′ = 0′ + 0 = 0
so 0′ = 0. Hence, the additive identity is unique.
Proposition 9. Every element in a vector space has a unique additive inverse.
Proof. Let v ∈ V . We wish to show that v has a unique additive inverse. So suppose that w, w′
are additive inverses of v. Then
w = w + 0 = w + (v + w′ ) = (w + v) + w′ = 0 + w′ = w′
so w = w′ , as desired.
Notation. Let v, w ∈ V . We use the notation −v for the (unique) additive inverse of v and define
w − v = w + (−v).
Proposition 10. 0v = 0 for every v ∈ V
Proof. For v ∈ V , we have
0v = (0 + 0)v = 0v + 0v
subtract 0v from both sides (that is, add the additive inverse of 0v).
Proposition 11. For every a ∈ F, a0 = 0.
Proof. For any a ∈ F
a0 = a(0 + 0) = a0 + a0
subtract a0 from both sides.
Note the difference in the two above propositions. The first says that the product of the scalar
0 with any vector v is the zero vector. The second says that product of any scalar with the zero
vector is the zero vector. The proofs look similar, but you have to pay careful attention to what is
a vector and what is a scalar. Oftentimes we abuse notation by denoting both the scalar 0 and the
zero vector 0 with the same symbol, because usually it is clear from context. You should, though,
keep clear in your mind whether you are working with a scalar or a vector.
Proposition 12. For every v ∈ V , (−1)v = −v.
This proposition says that if you take the scalar −1 and multiply it by any vector v, you obtain
the additive inverse of v. Try to prove it! Then check it against the book’s proof afterward.
2. Wednesday 6/24: Subspaces (1.C)
As always, let V denote a vector space over F.
Definition 13. A subset U ⊆ V is a subspace if U is also a vector space (using the same addition
and scalar multiplication as on V ).
Proposition 14. A subset U ⊆ V is a subspace if and only if it satisfies the following three
conditions:
(1) 0 ∈ U ,
(2) u, v ∈ U implies u + v ∈ U ,
(3) a ∈ F and u ∈ U implies au ∈ U .
Proof. If U is a subspace of V , then by definition it is a vector space using the same addition and
scalar multiplication as V . So (2) and (3) are automatic by the definition of a vector space. We
should check that 0U = 0 (that is, the additive identity of U is the same as the additive identity of
V ). Note that
0 = 0 · 0U ∈ U
because U is closed under scalar multiplication. And since the additive identity of a vector space
is unique, this shows that 0U = 0.
For the converse, suppose that U satisfies conditions (1)–(3). If u ∈ U then
−u = (−1) · u ∈ U
by condition (3). Since V is a vector space and the addition and scalar multiplication of U are
inherited from V , all other axioms for a vector space hold in U . (For example, if u, v ∈ U then
u, v ∈ V so u + v = v + u, etc.)
Example 15.
• If b ∈ F, then
{(x1 , x2 , x3 ) ∈ F3 | x1 − 3x2 = b}
is a subspace of F3 if and only if b = 0.
• The set of continuous real-valued functions f : [0, 1] → R is a subspace of R[0,1] (the
set of all functions [0, 1] → R). This is because the sum of two continuous functions is
continuous, and the scalar multiple of any continuous function is continuous.
• The set of continuous real-valued functions f : [0, 1] → R satisfying f (0) = b is a
subspace of R[0,1] if and only if b = 0.
You should check for yourself that the three examples above are subspaces of their respective vector
spaces. Remember that this comes down to checking that the 0 vector is in each of them (the zero
vector in R[0,1] is the function that is identically 0) and that the subset is closed under addition
and scalar multiplication.
Proposition 16. Let {Uλ }λ∈Λ be a family of subspaces of V . Then
⋂λ∈Λ Uλ
is a subspace of V .
Proof. You will prove this on your problem set.
Sums of subspaces.
We now discuss sums of subspaces. The idea is the following. Given two (or more) subspaces U
and V , what is the correct notion of combining U and V into a bigger subspace? In particular,
we would like this bigger subspace to actually be a subspace, not just a subset. So for example,
taking U ∪ V does not always produce a subspace. The correct notion is to take the sum of U and
V , which we will define below. Note that the definition actually holds for any collection of subsets,
not just subspaces, but we will mostly be interested in taking sums of subspaces.
Definition 17. Let U1 , . . . , Um be subsets of V . The sum of U1 , . . . , Um is
U1 + U2 + · · · + Um = {u1 + · · · + um | ui ∈ Ui for all 1 ≤ i ≤ m},
that is, the collection of all possible sums of elements of U1 , . . . , Um .
Example 18. Let V = R2 , U1 = {(x, 0) | x ∈ R} the x-axis, and U2 = {(0, y) | y ∈ R}, the
y-axis. Then U1 ∪U2 is not a subspace. On the other hand, check for yourself that U1 +U2 = R2 ,
which is a subspace. In fact, it is the smallest subspace of R2 containing both U1 and U2 . This
is true in general.
Proposition 19. Let U1 , . . . , Um be subspaces of V . Then U1 + · · · + Um is the smallest subspace
of V containing U1 , . . . , Um .
Proof. First, we show that U1 + · · · + Um is indeed a subspace. Since each of the Ui are subspaces,
each of them contains 0. Further,
0 = 0 + · · · + 0 ∈ U1 + · · · + Um
so 0 ∈ U1 + · · · + Um (you can write 0 as a sum of m things, one from each of the Ui ’s).
Suppose v, w ∈ U1 + · · · + Um . By definition, this means that v = u1 + · · · + um and w = t1 + · · · + tm
where ui , ti ∈ Ui for each 1 ≤ i ≤ m. Then
v + w = (u1 + t1 ) + · · · + (um + tm )
and since each Ui is a subspace, ui + ti ∈ Ui . A similar proof works to show that U1 + · · · + Um is
closed under scalar multiplication.
Now that we know that U1 +· · ·+Um is a subspace, how do we prove that it is the smallest subspace
containing U1 , . . . , Um ? Well, first of all, note that if u ∈ Ui , then u = 0 + · · · + 0 + ui + 0 + · · · + 0 so
u ∈ U1 +· · ·+Um . Hence, the sum contains each Ui . On the other hand, if W is any subspace which
contains U1 , . . . , Um , then since W is closed under addition, it must contain all sums of elements of
U1 , . . . , Um . Hence, W must contain U1 + · · · + Um .
Definition 20. Suppose U1 , . . . , Um are subspaces of V . The sum U1 + · · · + Um ⊆ V is called a
direct sum if each v ∈ U1 + · · · + Um can be written in a unique way as v = u1 + · · · + um with
ui ∈ Ui for all i. In this case, we use the notation
U1 ⊕ · · · ⊕ Um = U1 + · · · + Um .
Example 21. Consider the following subspaces of R3 : U = {(x, 0, 0) | x ∈ R}, V = {(0, y, 0) |
y ∈ R} and W = {(x, x, 0) | x ∈ R}.
Then U + V + W is a subspace (the xy-plane in R3 ). However, for example, (1, 1, 0) can
be written as (1, 0, 0) + (0, 1, 0) + (0, 0, 0) or as (0, 0, 0) + (0, 0, 0) + (1, 1, 0), or in fact as
(2, 0, 0) + (0, 2, 0) − (1, 1, 0). Hence, this is not a direct sum.
You can check for yourself that U + V is a direct sum and U + V + W = U + V . Hence, we can
denote U + V by U ⊕ V . In fact, U ⊕ V = U ⊕ W = V ⊕ W = U + V + W .
A priori, it seems like the condition for a sum to be a direct sum involves checking infinitely many
things, since you have to check whether every vector has a unique representation as a sum. It turns
out that you only need to check whether 0 can be written uniquely as a sum.
Theorem 22. Suppose U1 , . . . , Um are subspaces of V . Then U1 + · · · + Um is a direct sum if and
only if the only way to write 0 as a sum u1 + · · · + um where each ui ∈ Ui is by taking each ui to
be the zero vector.
Proof sketch. By the definition of the direct sum, there is only one way to write 0 as a sum, and
certainly 0 + · · · + 0 = 0, so this takes care of the ⇒ direction.
Conversely, suppose that the unique way to write 0 as a sum of elements from the Ui is 0 = 0+· · ·+0.
Suppose that there is a v ∈ U1 + · · · + Um that you can write as a sum in two ways, say u1 + · · · + um
and v1 + · · · + vm . Then subtract to get (u1 − v1 ) + · · · + (um − vm ) = 0. Since there is only one
way to write 0 as a sum, and each ui − vi ∈ Ui , we must have ui − vi = 0 for all i. Hence, the two
ways to write v were in fact the same.
For the sum of two subspaces, there is an even easier criterion to check.
Proposition 23. If U and W are subspaces of V , then U + W is a direct sum if and only if
U ∩ W = {0}.
Proof. [⇒]. Suppose U + W is a direct sum. Let v ∈ U ∩ W . Then v = v + 0 = 0 + v ∈ U + W .
By uniqueness, we must have that v = 0.
[⇐]. Suppose that U ∩ W = {0}. We can use the above theorem and prove that there is only one
way to write 0 as a sum. Suppose 0 = u + w where u ∈ U and w ∈ W . Then u = −w ∈ U ∩ W .
Hence, u = w = 0.
3. Friday 6/26: Span and Linear Independence (2.A)
Given a collection of vectors in V , we can scalar multiply them and add them to form new vectors.
Definition 24. A linear combination of v1 , . . . , vm ∈ V is a vector of the form
a1 v1 + · · · + am vm
where a1 , . . . , am ∈ F.
Definition 25. The set of all linear combinations of v1 , . . . , vm ∈ V is called the span of v1 , . . . , vm
and is denoted span(v1 , . . . , vm ). That is,
span(v1 , . . . , vm ) = {a1 v1 + · · · + am vm | a1 , . . . , am ∈ F}.
By convention, we take span(∅) = {0}.
Example 26.
• If v ∈ R2 is nonzero, then span(v) = {av | a ∈ R} is the line through v
in R2 . Of course, if v = 0 then span(v) = {0}.
• If v, w ∈ R3 are nonzero and they are not scalar multiples of one another, then span(v, w)
is a plane in R3 .
Proposition 27. The set span(v1 , . . . , vm ) is the smallest subspace of V containing v1 , . . . , vm .
Proof. This is asserting two things. First of all that S = span(v1 , . . . , vm ) is a subspace. Second,
that it is the smallest subspace containing the vectors.
To show that S is a subspace, we should show that it contains 0 and is closed under addition and
scalar multiplication. First, 0 ∈ S since 0 = 0v1 + · · · + 0vm . Now suppose u, w ∈ S. Then
u = a1 v1 + · · · + am vm
w = b1 v1 + · · · + bm vm
for some a1 , . . . , am , b1 , . . . , bm ∈ F. Then
u + w = (a1 + b1 )v1 + · · · + (am + bm )vm ∈ S.
Check for yourself that S is closed under scalar multiplication to complete the proof that S is a
subspace.
Now it is clear that S contains each vi , since vi = 0v1 +· · ·+1vi +· · ·+0vm so vi is a linear combination
of v1 , . . . , vm . Now if T is any subspace which contains v1 , . . . , vm , then since T is closed under
addition and scalar multiplication, it must contain every linear combination of v1 , . . . , vm and hence
S ⊆ T . This shows that S is the smallest subspace which contains v1 , . . . , vm .
Definition 28. If span(v1 , . . . , vm ) equals V , we say that v1 , . . . , vm spans V .
Note the difference between the noun “the span of v1 , . . . , vm ” and the verb “v1 , . . . , vm spans V ”.
Example 29. The vectors e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), ... en = (0, 0, . . . , 0, 1) span
Fn since span(e1 , . . . , en ) = Fn .
Definition 30. Let P(F) denote the set of all polynomials in one variable z with coefficients in F.
That is, P(F) consists of all polynomials of the form
a0 + a1 z + · · · + am z m
for some a0 , . . . , am ∈ F.
If m is a non-negative integer, let Pm (F) denote the subset of P(F) consisting of polynomials of
degree at most m. This is a subspace of P(F).
Example 31. The monomials 1, z, z 2 , . . . span the vector space P(F), since every polynomial
can be written as a (finite) linear combination of these monomials.
The m + 1 monomials 1, z, z 2 , . . . , z m span the subspace Pm (F).
Definition 32. A vector space is finite-dimensional if it is spanned by finitely many vectors.
Otherwise, we say it is infinite-dimensional.
So P(F) is infinite-dimensional while Pm (F) is finite-dimensional. F∞ is infinite-dimensional and
Fn is finite-dimensional.
Remember your intuition from 308 that if {v1 , . . . , vn } spans V , then {v1 , . . . , vn } “points in every
direction in V ”. The other notion from 308 is the notion of linear independence. Remember
the intuition is that a linearly independent set “points in different directions” or perhaps “isn’t
redundant”.
Definition 33. A subset {v1 , . . . , vn } ⊆ V is linearly independent if
a1 v1 + · · · + an vn = 0
implies that a1 = · · · = an = 0. By convention, the empty set {} is said to be linearly independent.
Otherwise, {v1 , . . . , vn } is linearly dependent. Another way to say this is that {v1 , . . . , vn } is linearly
dependent if there exist scalars a1 , . . . , an ∈ F not all 0 such that a1 v1 + · · · + an vn = 0.
Example 34.
• {v} ⊆ V is linearly independent if and only if v ≠ 0.
• {v, w} ⊆ V is linearly independent if and only if v ≠ λw and w ≠ λv for all λ ∈ F.
• {1, z, z 2 , z 3 } is a linearly independent subset of P(F) or Pm (F) (for m ≥ 3). If a0 +
a1 z + a2 z 2 + a3 z 3 = 0 as a function of z, then a0 = a1 = a2 = a3 = 0 since a nonzero
polynomial of degree 3 has at most 3 roots, while our polynomial has infinitely many
roots.
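As a computational aside (not needed for this course): for finitely many vectors in R^n or C^n, linear independence can be checked with a rank computation, since the vectors are linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. A minimal sketch using numpy:

import numpy as np

vectors = [np.array([1.0, 0.0, 2.0]),
           np.array([0.0, 1.0, 1.0]),
           np.array([1.0, 1.0, 3.0])]   # third = first + second

A = np.column_stack(vectors)
independent = np.linalg.matrix_rank(A) == len(vectors)
print(independent)   # False, since the third vector is a linear combination of the others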
The next proposition again captures the intuition that a linearly independent set is one with “no
redundancy.”
Proposition 35. A subset {v1 , . . . , vn } ⊆ V is linearly independent if and only if none of the
vectors is a linear combination of the others.
Proof. [⇒]. We prove the contrapositive. If one of the vectors is a linear combination of others,
say v1 = a2 v2 + · · · + an vn , then
(−1)v1 + a2 v2 + · · · + an vn = 0
so we have a nontrivial linear combination which is 0 and hence the set is linearly dependent.
[⇐]. Conversely, suppose that {v1 , . . . , vn } is linearly dependent. Then there exist a1 , . . . , an ∈ F
not all zero such that a1 v1 + · · · + an vn = 0. Without loss of generality, suppose that a1 ≠ 0. Then
v_1 = -\frac{a_2}{a_1} v_2 - \cdots - \frac{a_n}{a_1} v_n
so one of the vectors is a linear combination of the others.
We can say a bit more actually. Not only is one of the vectors a linear combination of the others,
but you can remove it from the list without affecting the span. The lemma as stated in your book
is as follows:
Lemma 36 (Linear Dependence Lemma). Suppose v1 , . . . , vm is a linearly dependent list in V .
Then there exists j ∈ {1, 2, . . . , m} such that the following hold
(1) vj ∈ span(v1 , . . . , vj−1 );
(2) if the jth term is removed, the span of the remaining list equals span(v1 , . . . , vm ).
Proof. We have essentially proved the first part of this lemma above. If the set is linearly dependent,
then there exist scalars not all zero such that a1 v1 + · · · + am vm = 0. Simply take the largest j
such that aj ≠ 0. Then we actually have a1 v1 + · · · + aj vj = 0 (since the rest of the coefficients are
zero). Subtract and divide by aj .
For the second claim we need to show that the span is not affected when you remove the vector vj .
We need to prove that if you can write a vector as a linear combination of all of the vi ’s, then you
can in fact write it as a linear combination without using vj . So take some vector
a1 v1 + a2 v2 + · · · + aj−1 vj−1 + aj vj + aj+1 vj+1 + · · · + am vm ∈ span(v1 , . . . , vm ).
We know that vj ∈ span(v1 , . . . , vj−1 ), so
vj = b1 v1 + · · · + bj−1 vj−1
for some b1 , · · · bj−1 ∈ F. Now simply substitute this expression for vj , to see that you can write
the original vector as a linear combination without using vj .
The preceding lemma is what a mathematician might call a “technical” lemma. In particular, the
second part of the lemma seems like something we can use to prove other results.
4. Monday 6/29: More on Span and Linear Independence (2.A) and Bases (2.B)
Last time we ended with the Linear Dependence Lemma, which was a technical result, meaning
that we can use it to prove other results. We illustrate this by using it to prove two results,
which should agree with your intuition. For example, think about how this result agrees with your
intuition behind spanning and linear independence.
Proposition 37. Suppose u1 , . . . , um is a linearly independent set and w1 , . . . , wn spans V . Then
m ≤ n.
Proof. The proof uses an interesting technique. Since w1 , . . . , wn spans V , if we add any vector to
the list, it will result in a linearly dependent list (since that vector is already in the span of the
others). In particular,
u1 , w1 , . . . , wn
must be linearly dependent. Hence, by the Linear Dependence Lemma, we can remove one of the
wi ’s so that the new list still spans V . (Here, there is a technical point that we will remove the
furthest-right vector from the list that we can, as we did in the proof of the Linear Dependence
Lemma, so we will be removing a wi rather than u1 . If the coefficients on all of the wi ’s are zero,
then this implies that u1 = 0, which is impossible since the u’s are linearly independent).
Now we have a list of n vectors u1 , w1 , . . . wj−1 , wj+1 , . . . , wn which still spans V . Hence, u2 is in
the span of this list, so
u1 , u2 , w1 , . . . , wj−1 , wj+1 , . . . , wn
is linearly dependent. By the Linear Dependence Lemma, we can remove one of the wi ’s so that
this new list still spans V .
Repeat this process m times. At each step, we added a ui and the Linear Dependence Lemma
implies that there is a wj to remove. Hence, there are at least as many wj as ui , so m ≤ n.
This proposition gives a very quick way to show if some set of vectors is linearly dependent. For
example, we know that e1 , e2 , e3 span F3 . So if we have a list of four or more vectors in F3 , they
are automatically linearly dependent.
Our last result in the section also seems intuitively clear, and can be proved using the technical
Linear Dependence Lemma.
Proposition 38. Every subspace of a finite-dimensional vector space is finite-dimensional.
Recall that by finite-dimensional, we mean that a vector space that can be spanned by finitely
many vectors.
Proof. Suppose V is finite-dimensional and let U be a subspace. We want to prove that U is
finite-dimensional.
If U = {0}, then U is spanned by the empty list by convention, so U is finite-dimensional.
If U 6= {0}, then U contains some nonzero vector, say v1 . If U = span(v1 ), then U is spanned by
finitely many vectors so is finite-dimensional. If not, then we can choose a vector v2 ∉ span(v1 ).
Continue this process.
At each step, we have constructed a linearly independent list, since by the Linear Dependence
Lemma, none of the vj ’s is in the span of v1 , . . . , vj−1 . But by the previous proposition, any
linearly independent list in V cannot be longer than any spanning list of V . Since V is spanned
by finitely many vectors, this process must eventually stop, so U is spanned by finitely many
vectors.
Bases (2.B)
In the last section, we discussed linearly independence and the notion of spanning V . A set of
vectors which is both linearly independent and spans V is called a basis of V .
Definition 39. A basis for V is a set of vectors B ⊆ V such that B spans V and B is linearly
independent. The plural of basis is bases.
Example 40. For each of these examples, convince yourself that the claimed basis both spans
and is linearly independent.
(1) The list {e1 , e2 , . . . , en } is a basis of Fn called the standard basis of Fn .
(2) {1, z, . . . , z m } is a basis of Pm (F).
(3) {1, z, z 2 , . . . } is a basis of P(F).
Example 41. The list of vectors {e1 = (1, 0, 0, . . . ), e2 = (0, 1, 0, . . . ), e3 = (0, 0, 1, 0, . . . ), . . . }
does not form a basis for F∞ , since it is not possible to write (1, 1, 1, . . . ) as a linear combination
of these vectors. (By definition, a linear combination is a finite sum).
The difference between this and P(F) is that polynomials are defined to be finite sums of
monomials. The analogue is that {1, z, z 2 , . . . } is not a basis for the vector space of all power
series centered at z = 0.
Here’s an extremely useful property of bases.
Proposition 42. The list B = {v1 , . . . , vn } is a basis of V if and only if every v ∈ V can be written
uniquely in the form
v = a1 v1 + · · · + an vn
where the ai ∈ F.
Proof. Suppose that B is a basis of V and let v ∈ V . Since B spans V , there are scalars
a1 , . . . , an ∈ F such that
v = a1 v1 + · · · + an vn .
To show that this representation is unique, suppose that we also have scalars b1 , . . . , bn ∈ F such
that
v = b1 v1 + · · · + bn vn .
Subtract these two equations to obtain
0 = (a1 − b1 )v1 + · · · + (an − bn )vn .
Since B is linearly independent, this implies that each ai − bi = 0, and hence we have uniqueness.
Conversely, suppose every vector v ∈ V can be written uniquely in the specified form. Since every
vector in V is a linear combination of the vectors in B, this means that B spans V . Further, we
must have that 0 can be written uniquely as a linear combination of vectors in B. But also, we
know that
0 = 0v1 + · · · + 0vn
so this must be the unique representation of 0. Hence, B is also linearly independent so B is a basis
of V .
Definition 43. If B = {v1 , . . . , vn } is a basis of V and v ∈ V , write
v = a1 v1 + · · · + an vn .
By the above proposition, this representation is unique. The scalars a1 , . . . , an ∈ F are called the
coordinates of v with respect to the basis B.
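For example, take V = R^2 and B = {(1, 1), (1, −1)} (check for yourself that B is a basis). Since
(3, 1) = 2(1, 1) + 1(1, −1),
the coordinates of (3, 1) with respect to B are 2 and 1, while its coordinates with respect to the standard basis are 3 and 1.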
Now it seems like a basis is a very special thing, since it both spans and is linearly independent. In
some sense it sort of is special, since it is somehow balanced between “pointing in every direction”
while still “avoiding redundancy.” So it is more special than simply a spanning set (which “points
in every direction” but might have some “redundancy”) or than a linearly independent set (which
does “not have redundant vectors” but doesn’t “point everywhere”). On the other hand, the next
results will show that a basis isn’t that special.
If you have a spanning set, you can always remove some vectors until you get a basis. Similarly, if
you have a linearly independent set, it is always possible to add some vectors until you get a basis.
Proposition 44. Every spanning list in a vector space can be reduced to a basis of the vector
space.
Proof. Suppose we have a list of vectors v1 , . . . , vn which spans V . We want to delete some of the
vectors so that the list still spans but the shorter list is linearly independent. We do this in a
multi-step process. Start with B = {v1 , . . . , vn }.
Step 1. If v1 = 0, delete v1 from B. If not, do nothing.
Step j. If vj ∈ span(v1 , . . . , vj−1 ), then we don’t need vj anyway so delete it from B. If not, then
leave B unchanged.
This algorithm terminates after n steps, leaving a list B. Since we have only thrown out vectors
that have already been in the span of the previous ones, at no point did we change the span of B.
On the other hand, by the Linear Dependence Lemma, since our new list has the property that
none of the vectors is in the span of any of the previous ones, it is also linearly independent.
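The algorithm in this proof can even be carried out numerically for vectors in R^n: the condition vj ∈ span(v1 , . . . , vj−1 ) holds exactly when appending vj does not increase the rank of the matrix whose columns are the vectors kept so far. A minimal numpy sketch (a computational aside, not needed for the course):

import numpy as np

def reduce_to_basis(vectors):
    """Keep a vector only if it is not in the span of the vectors kept so far."""
    basis = []
    for v in vectors:
        candidate = basis + [v]
        A = np.column_stack(candidate)
        if np.linalg.matrix_rank(A) == len(candidate):
            basis.append(v)    # v is not in the span of the previous kept vectors
    return basis

spanning_list = [np.array([1.0, 0.0]), np.array([2.0, 0.0]),
                 np.array([0.0, 1.0]), np.array([1.0, 1.0])]
print(reduce_to_basis(spanning_list))   # keeps (1, 0) and (0, 1)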
Before we prove the “linearly independent set” version of the lemma, we have a nice corollary.
Corollary 45. Every finite-dimensional vector space has a basis.
Proof. By definition, a finite-dimensional vector space is spanned by finitely many vectors. Use the
previous proposition to cut a spanning set down to a basis.
5. Wednesday 7/1: Bases (2.B) and Dimension (2.C)
Proposition 46. Every linearly independent list of vectors in a finite-dimensional vector space
can be extended to a basis of the vector space.
Proof. Suppose that u1 , . . . , um is a linearly independent set in a finite-dimensional vector space V .
Since V is finite-dimensional, by the previous corollary, V has a basis. Let w1 , . . . , wn be a basis of
V . Then
u1 , . . . , um , w1 , . . . wn
is a spanning set (since the basis already spanned). Now simply run the algorithm from the proof
of the previous proposition. In the first m steps of the algorithm, we never throw any of the ui out,
since they are linearly independent.
We are left with a basis of V that contains u1 , . . . , um , and some of the wj ’s.
Proposition 47. Suppose V is finite-dimensional and U is a subspace of V . Then there is a
subspace W of V such that V = U ⊕ W .
Proof. Since V is finite-dimensional, we know that U is finite-dimensional. So U has a basis
u1 , . . . um . But this is a linearly independent set in V , so we can extend to a basis
u1 , . . . , um , w1 , . . . , wn
of V . Let W = span(w1 , . . . , wn ). We would like to show that V = U ⊕ W , and by Proposition 23,
this means we need to show that V = U + W and U ∩ W = {0}.
For any v ∈ V , since the list u1 , . . . , um , w1 , . . . wn spans V , this means that we can write
v = (a1 u1 + · · · + am um ) + (b1 w1 + · · · + bn wn ).
Since the first sum is in U and the second sum is in W , this shows that v ∈ U + W so V = U + W .
To show that U ∩ W = {0}, suppose v ∈ U ∩ W . This means that there are scalars such that
v = a1 u1 + · · · + am um = b1 w1 + · · · + bn wn .
But then, subtracting we have
(a1 u1 + · · · + am um ) − (b1 w1 + · · · + bn wn ) = 0
but since the u’s and w’s are a basis of V , they are linearly independent, so all of the constants
must be 0. Hence, v = 0, as desired.
Dimension (2.C)
Recall that we have defined (and studied a bit) the notion of a finite-dimensional vector space;
namely, a vector space that is spanned by finitely many vectors. But we haven’t yet defined the
dimension of a vector space. Whatever our notion of dimension, the correct definition should mean
that Fn has dimension n (since there are “n directions” in Fn ). The definition of the dimension of
V (as you may recall from Math 308) will be the number of vectors in a basis for V . Of course, in
order for this to be a reasonable definition, every basis should contain the same number of vectors.
Luckily, this is true.
Proposition 48. Any two bases of a finite-dimensional vector space V contain the same number
of vectors.
Proof. We will basically use Proposition 37 (2.27 in your textbook) twice. If B1 and B2 are two
bases, then since B1 is linearly independent and B2 spans V , by Proposition 37, B1 contains at
most as many vectors as B2 . But then reversing the roles of the two bases, B2 contains at most as
many vectors as B1 . Hence, B1 and B2 contain the same number of vectors.
Since this is true, we are now allowed to define dimension as we wanted to.
Definition 49. The dimension of a finite-dimensional vector space V is the length of any basis of
V . The dimension of V is denoted dim V .
Example 50. (1) The standard basis of Fn contains n vectors (hence any basis of Fn does),
so dim Fn = n.
(2) We saw that {1, z, z 2 , . . . , z m } is a basis of Pm (F) so dim Pm (F) = m + 1.
(3) In general, the dimension of a vector space depends on the field. So, for example, you can
view C as a 1-dimensional vector space over C. Any nonzero complex number serves as
a basis. So {1} is a basis, because you can get any complex number by taking C-linear
combinations.
However, you can also view C as a vector space over R, with basis given by {1, i}. Every
complex number is an R-linear combination of 1 and i, and they are linearly independent
over R. Hence, C is a 2-dimensional vector space over R.
In this class, generally we work with a fixed field F (that could be R or C), so when we
say dimension we mean dimension over F. If you wanted to be very specific, you could use
a subscript and say
dimC C = 1
and
dimR C = 2.
The notion of dimension is really a notion of the “size” of a vector space. Most vector spaces contain
infinitely many elements (think of R or Rn ), so we don’t want to measure the size of a vector space
naively by the number of vectors in it. Nevertheless, dimension should obey our intuition for a
notion of size, and we should be able to prove the results that we strongly suspect are true.
Proposition 51. If V is finite-dimensional and U is a subspace of V , then dim U ≤ dim V .
Proof. Recall that we already showed that every subspace of a finite-dimensional vector space is
finite-dimensional, so it makes sense to talk about dim U .
Suppose that B is a basis of U and C is a basis of V . Then B is a linearly independent list of vectors
in U , but we can also view them as vectors in V . Hence, B is a linearly independent list in V and
C spans V . So again by Proposition 37, we have that the length of B is less than or equal to the
length of C. By the definition of dimension, we have dim U ≤ dim V , as desired.
Here’s another indication that bases aren’t that special. If you know that V has dimension n (that
is, every basis has length n), then it turns out that if you have a linearly independent list of size n,
it is already a basis. The same result is true for n spanning vectors.
Proposition 52. Suppose V is finite-dimensional. Then every linearly independent list of vectors
in V with length dim V is a basis of V .
Proof. This follows from the theorem that every linearly independent list can be extended to a
basis. If dim V = n and v1 , . . . , vn is a linearly independent list, then it can be extended to a basis
of V by the theorem. However, every basis has length n, so v1 , . . . , vn must already be a basis.

Of course, since we know that every spanning set can be cut down to a basis, the spanning set
analogue of the previous result is also true.
Proposition 53. Suppose V is finite-dimensional. Then every spanning list of vectors in V with
length dim V is a basis of V .
There is a nice example in your textbook of using these theorems. It is interesting enough that I
think it’s worth talking about in detail in class.
Example 54. Let U = {p ∈ P3 (R) | p′ (5) = 0}. This is a subset of P3 (R), since it consists of
some of the polynomials but not all of them (for example, p(z) = z does not satisfy p′ (5) = 0).
It is a subspace since the 0 polynomial satisfies the given property, if p′ (5) = 0 and q ′ (5) = 0
then (p + q)′ (5) = p′ (5) + q ′ (5) = 0, and if α ∈ R then (αp)′ (5) = αp′ (5) = 0.
We will show that 1, (z − 5)2 , (z − 5)3 is a basis of U .
First, note that these polynomials are in U since the derivative of 1 is identically 0, the derivative
of (z − 5)2 is 2(z − 5) and the derivative of (z − 5)3 is 3(z − 5)2 , so all of these derivatives vanish
at 5.
Now we can check that these three polynomials are linearly independent. Suppose a, b, c ∈ R
satisfy
a + b(z − 5)2 + c(z − 5)3 = 0.
Comparing the z 3 coefficient on both sides yields that c = 0. Then comparing the z 2 coefficient
yields b = 0. Finally, comparing the constant terms yields a = 0. So these three polynomials are
linearly independent.
Hence dim U ≥ 3. On the other hand, U ⊆ P3 (R) gives dim U ≤ 4, and since U ≠ P3 (R), we cannot
have dim U = 4 (a 4-dimensional subspace of the 4-dimensional space P3 (R) would have a basis
that, by Proposition 52, is also a basis of P3 (R), forcing the subspace to be all of P3 (R)). So
dim U = 3, and therefore 1, (z − 5)2 , (z − 5)3 is a linearly independent list of length dim U , hence
a basis of U by Proposition 52.
The last result in the section, which we won’t prove in lecture, is on the dimension of a sum. Notice
the similarity between this formula and the formula for the size of a union of finite sets
|U ∪ V | = |U | + |V | − |U ∩ V |.
Also note that in particular, this says that if the sum is direct, dim(U1 ⊕ U2 ) = dim U1 + dim U2 .
Proposition 55. If U1 and U2 are subspaces of a finite-dimensional vector space then
dim(U1 + U2 ) = dim U1 + dim U2 − dim(U1 ∩ U2 ).
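For example, let U1 = span(e1 , e2 ) and U2 = span(e2 , e3 ) in R3 (two planes through the origin). Then U1 ∩ U2 = span(e2 ), and the formula gives
dim(U1 + U2 ) = 2 + 2 − 1 = 3,
which is consistent with the fact that U1 + U2 = R3 .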
6. Monday 7/6: Linear Maps (3.A)
A nice quote from your book to start the chapter: “So far our attention has focused on vector spaces.
No one gets excited about vector spaces. The interesting part of linear algebra is the subject to
which we now turn—linear maps.” This is true. Vector spaces have tons of nice properties, which
means that they are frankly pretty boring. But a very deep mathematical fact is that often you
learn more about objects by studying the functions between them than by just studying the objects
themselves. You will see this in your future mathematics classes as well, so over your mathematical
career you will understand this idea more and more.
Generally in math, you learn about certain structures (e.g., vector spaces), and then you consider
functions which preserve the structures. So to have a vector space, you must have an addition and
a scalar multiplication. The kinds of functions we should consider are the ones that play nice with
addition and scalar multiplication.
Definition 56. Let V and W be vector spaces. A linear map from V to W is a function T : V → W
such that
(1) T (u + v) = T u + T v for all u, v ∈ V ;
(2) T (λv) = λ(T v) for all λ ∈ F and all v ∈ V .
In Math 308 these were called linear transformations, but the words “transformation” and “map”
and “function” all mean the same thing anyway, and the latter two are more common.
Notation. Here, T v = T · v = T (v) is the image of v ∈ V under the function T : V → W .
Example 57. (1) The zero map 0 : V → W defined by 0(v) = 0 for all v ∈ V . We can check
0(v + w) = 0 = 0 + 0 = 0v + 0w and 0(αv) = 0 = α0 = α0(v). (Note the difference here
between which of the 0’s are functions and which are scalars).
(2) The identity map I : V → V defined by Iv = v for all v ∈ V is a linear map. We can check
I(v + w) = v + w = Iv + Iw and I(αv) = αv = α(Iv).
(3) Differentiation D : P(R) → P(R) defined by Dp = p′ .
(4) Integration J : P(R) → R defined by Jp = \int_0^1 p(t)\, dt.
(5) The shift maps L, R : F∞ → F∞ :
L(x1 , x2 , . . . ) = (x2 , x3 , . . . )
R(x1 , x2 , . . . ) = (0, x1 , x2 , . . . ).
Here is an easy but useful result.
Proposition 58. If T : V → W is a linear map, then T (0) = 0.
Proof. We have T (0) = T (0 + 0) = T (0) + T (0) and hence T (0) is the additive identity of W , as
desired.
Definition 59. The set of all linear maps from V to W is denoted L(V, W ).
Proposition 60. Suppose v1 , . . . , vn is a basis of V and w1 , . . . , wn ∈ W . Then there exists a
unique linear map T : V → W such that T vj = wj for all j.
Proof. Let v ∈ V . Then since the vi ’s are a basis, we have
v = a1 v1 + · · · + an vn
for uniquely determined scalars ai ∈ F. We will define the linear map T by
T v = a1 w1 + · · · + an wn .
Certainly this defines a map V → W . We need to check that it is linear. Suppose u ∈ V and write
u = b1 v1 + · · · + bn vn
for scalars bi ∈ F. Then
T u = b1 w1 + · · · + bn wn .
On the other hand
T (v + u) = T ((a1 + b1 )v1 + · · · + (an + bn )vn ) = (a1 + b1 )w1 + · · · + (an + bn )wn
which is the same as T v + T u. Similarly, you can check that T (λv) = λT (v) for all λ ∈ F so T is
linear.
To show that T is unique, suppose S : V → W is any linear map such that Svj = wj for all j.
Then let v ∈ V and again write v = a1 v1 + · · · + an vn . Then, since S is linear
Sv = S(a1 v1 + · · · + an vn ) = a1 S(v1 ) + · · · + an S(vn ) = a1 w1 + · · · + an wn = T v
so S = T .
This says that you can send a basis anywhere with a linear map, and once you choose where to
send your basis, the entire linear map is determined.
Example 61. In 308, you learned that every linear map T : Fn → Fm can be represented by
an m × n matrix. Simply see where T sends the standard basis vectors:
T e_j = \begin{pmatrix} a_{1,j} \\ \vdots \\ a_{m,j} \end{pmatrix}
for all 1 ≤ j ≤ n. This should determine the entire linear transformation. Indeed, if
v = x1 e1 + · · · + xn en
then
T v = x_1 \begin{pmatrix} a_{1,1} \\ \vdots \\ a_{m,1} \end{pmatrix} + \cdots + x_n \begin{pmatrix} a_{1,n} \\ \vdots \\ a_{m,n} \end{pmatrix}
= \begin{pmatrix} a_{1,1} x_1 + \cdots + a_{1,n} x_n \\ \vdots \\ a_{m,1} x_1 + \cdots + a_{m,n} x_n \end{pmatrix}
= \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
So you can think of the linear map T as multiplication by the matrix.
Of course, you can add two m × n matrices to get another m × n matrix. You can also scalar
multiply a matrix to get a matrix of the same size. This suggests that all of the m × n matrices
might themselves form a vector space! But the more fundamental viewpoint is to consider the
linear maps rather than the matrices (which in some way are basis-dependent).
Definition 62. Suppose S, T ∈ L(V, W ) and λ ∈ F. The sum S + T is the function defined by
(S + T )(v) = Sv + T v
and the product (of λ and T ) is the function λT defined by
(λT )(v) = λ(T v)
for all v ∈ V . These are both linear maps.
It is not obvious that the sum of two linear maps is a linear map! It is a thing that needs to be
checked. You need to check that the map that we just defined as S + T satisfies additivity and
homogeneity, that is (S + T )(v + w) = (S + T )(v) + (S + T )(w) and (S + T )(λv) = λ(S + T )(v).
You should check this.
Now just because L(V, W ) is closed under two operations, addition and scalar multiplication, does
not necessarily mean that it is a vector space. After all, these operations need to satisfy all the
axioms of being a vector space! It turns out this is true, and can be checked using the definitions
above.
Proposition 63. With operations defined as above, L(V, W ) is a vector space.
What is the additive identity of L(V, W )? (Answer: the zero map 0 : V → W .) What is the
additive inverse of an element S ∈ L(V, W )? (Answer: the map −S defined by (−S)(v) = −S(v)
for all v ∈ V .)
It turns out that there is actually even a bit more structure on L(V, W ). Generally in a vector
space, you can add vectors and multiply vectors by scalars, but there is no notion of multiplying a
vector by another vector. But if we have maps T : U → V and S : V → W , then we can take the
composition ST : U → W . Note that it is again not obvious that ST : U → W is a linear map.
You should prove this! In fact I think it is so worth doing that I will put it on your next problem
set (if I remember to).
Of course, as vectors T ∈ L(U, V ), S ∈ L(V, W ) and ST ∈ L(U, W ), so they all live in different
vector spaces (unless U = V = W ). So this is not a product on L(V, W ), in general.
Definition 64. If T ∈ L(U, V ) and S ∈ L(V, W ), then the product ST ∈ L(U, W ) is the function
defined by
(ST )(u) = S(T u)
for u ∈ U .
This product has some nice properties, but not necessarily all of the nice properties you could
possibly dream of. In particular, in general ST ≠ T S. In fact, in many cases if ST is defined, then
T S may not be (since the codomain of the first function needs to match the domain of the other).
Even in the case that both ST and T S are defined (i.e., when U = V = W ), they may not be equal.
Nevertheless, we do have the following nice properties.
Proposition 65. The product of linear maps satisfies the following properties:
• Associativity: (ST )U = S(T U ) whenever S, T, U are linear maps such that the products
make sense.
• Identity: T I = IT = T for all T ∈ L(V, W ). (Note that the two identity functions
appearing here are identities of different vector spaces.)
• Distributivity: (S1 + S2 )T = S1 T + S2 T and S(T1 + T2 ) = ST1 + ST2 for all T, T1 , T2 ∈
L(U, V ) and S, S1 , S2 ∈ L(V, W ).
Null Spaces and Ranges (3.B)
We continue our study of linear maps (with the philosophy that we can learn a lot about vector
spaces by looking at the structure-preserving maps between them). For each linear map, there are
two subspaces that we can naturally look at, the null space (or kernel ) and range (or image).
Definition 66. For T ∈ L(V, W ), the null space (or kernel ) of T , denoted null T , is the subset of
V consisting of those vectors that T maps to 0:
null T = {v ∈ V | T v = 0}.
Proposition 67. If T ∈ L(V, W ), then null T is a subspace of V (the domain).
Proof. We showed that for any linear map T , T (0) = 0, so 0 ∈ null T .
If u, v ∈ null T , then this means T (u) = 0 and T (v) = 0. Since T is linear, we have T (u + v) =
T u + T v = 0 + 0 = 0. So u + v ∈ null T .
If u ∈ null T and α ∈ F, then again since T is linear, T (αu) = αT u = α · 0 = 0 so αu ∈ null T .
Example 68.
• Consider the zero map T : V → W . Then T v = 0 for every v ∈ V . So
null T = V .
• Consider the identity map I : V → V . Then Iv = v for every v ∈ V . So if Iv = 0, this
means v = 0. So null I = {0}.
• Let D ∈ L(P(R), P(R)) be the differentiation map. Then D(f ) = 0 if and only if the
derivative of f is the 0 polynomial. The polynomials with derivative 0 are exactly the
constant functions. So null D = {constant polynomials}.
• Let L : F∞ → F∞ be the left shift map
L(x1 , x2 , . . . ) = (x2 , x3 , . . . ).
Then null L = {(a, 0, 0, . . . ) | a ∈ F}. On the other hand null R consists of only the zero
vector.
• Let T : F3 → F2 be given by T (x, y, z) = (x + y, 2z). Then
null T = {(x, y, z) ∈ F3 | x + y = 0 and 2z = 0} = {(a, −a, 0) | a ∈ F}.
So the null space is one-dimensional, spanned by (1, −1, 0). In Math 308, you learned
how to compute this by taking the matrix representation for T , row reducing, and
reading off a basis for the null space.
7. Wednesday 7/8: Null Spaces and Ranges (3.B)
So in some of the examples above, lots of vectors got mapped to 0. In other examples, only the 0
vector itself got mapped to 0. This is related to the injectivity of a linear map.
Definition 69. A function T : V → W is called injective if T u = T v implies u = v.
This can be restated as: if u ≠ v, then T u ≠ T v.
In other words, injective functions map distinct inputs to distinct outputs. They never map two
different vectors to the same vector. In Math 308, you learned the term “one-to-one” instead of
“injective”. The next result is a very useful result which tells you when a linear map is injective.
Proposition 70. Let T ∈ L(V, W ). Then T is injective if and only if null T = {0}.
Remark. This result says that being injective is equivalent to null T = {0}, or in other words, the
only vector that maps to 0 is 0. Of course, if T is going to be injective, it is only allowed to map
one vector to 0, and it has to map the 0 vector there.
For a general function, to check injectivity, you need to check that every single output only comes
from one input. Since linear maps are so nice, you only need to check if the output 0 only comes
from one input.
Proof. Suppose T is injective. We know that since T is linear, T (0) = 0, and so {0} ⊆ null T . We
need to show that null T ⊆ {0}. So suppose v ∈ null T . Then
T (v) = 0 = T (0)
and since T is injective, v = 0, as desired.
Conversely, suppose null T = {0}. We want to show that T is injective. So suppose T u = T v. Then
0 = T u − T v = T (u − v)
so u − v ∈ null T and hence u − v = 0. Therefore, u = v so T is injective.
The other subspace we associate to a linear map is the range or image.
Definition 71. The range (or image) of a function T : V → W is the set
range(T ) = {w ∈ W | w = T v for some v ∈ V }
Proposition 72. If T ∈ L(V, W ), then range T is a subspace of W (the codomain).
Proof. Since T (0) = 0, we have 0 ∈ range T .
Suppose w1 , w2 ∈ range(T ). Then there exist v1 , v2 ∈ V such that T v1 = w1 and T v2 = w2 .
Therefore
w1 + w2 = T v1 + T v2 = T (v1 + v2 )
so w1 + w2 ∈ range(T ). Finish the proof that range T is closed under scalar multiplication on your
own.
Example 73.
• Consider the zero map T : V → W . Then T v = 0 for every v ∈ V . So
range T = {0} ⊆ W .
• Consider the identity map I : V → V . Then for every v ∈ V , we have Iv = v. Therefore,
range I = V .
• Let D ∈ L(P(R), P(R)) be the differentiation map. Every polynomial occurs as the
derivative of some polynomial (its antiderivative). Specifically,
a_0 + a_1 z + \cdots + a_n z^n = D\left( a_0 z + \frac{a_1}{2} z^2 + \cdots + \frac{a_n}{n+1} z^{n+1} \right),
so range D = P(R).
• Let L : F∞ → F∞ be the left shift map
L(x1 , x2 , . . . ) = (x2 , x3 , . . . ).
Then range L = F∞ . On the other hand range R = {(0, a1 , a2 , . . . ) | a1 , a2 , · · · ∈ F}.
• Let T : F3 → F2 be given by T (x, y, z) = (x + y, 2z). Then
range T = {(x + y, 2z) | x, y, z ∈ F}.
But every vector can be obtained this way. Indeed,
(a, b) = T (a, 0, b/2).
So range T = F2 .
Again, in some of the examples above, the range T is the entire codomain. In some of the examples,
it is a strict subspace.
Definition 74. A function T : V → W is called surjective if range T = W .
In Math 308, you used the term “onto” rather than “surjective”.
The null space and the range are very closely related. The basic idea is this: the more vectors you
send to 0, the fewer vectors you get in the image. The fewer vectors you send to 0, the more vectors
you can get in the image. The precise statement is the following.
Theorem 75 (The Fundamental Theorem of Linear Maps or the Rank–Nullity Theorem). Suppose
V is finite-dimensional and T ∈ L(V, W ). Then range T is finite-dimensional and
dim V = dim null T + dim range T
In Math 308, one way to prove this is to count the number of pivots and columns in matrices.
However, this proof is distasteful because it relies on having a matrix representation (which is less
intrinsic than the map itself). Also note that that proof only works for maps Fn → Fm . Here, V
can be any finite-dimensional vector space, and W need not even be finite-dimensional. If W is
infinite-dimensional, then there is really no hope of writing down a matrix. So we provide a much
better proof.
Proof. Since null T is a subspace of a finite-dimensional vector space, it is finite-dimensional and
hence has a basis u1 , . . . , um . By a previous theorem, we can extend this to a basis of all of V , say
u1 , . . . , um , v1 , . . . , vn
where dim V = m+n. We simply need to show that range T is finite-dimensional and dim range T =
n.
Where can we possibly get a basis of range T that has length n? Well the natural thing to try is
to show that T v1 , . . . , T vn is a basis of range T (at least these are n vectors that live in range T ).
So let’s show that these vectors span range T and are linearly independent.
For any vector v ∈ V , we can write
v = a1 u1 + · · · + am um + b1 v1 + · · · + bn vn
since the u’s and v’s formed a basis of V . After applying T , since each of the u’s is in the kernel,
we have
T v = b1 T v1 + · · · + bn T vn .
This means that every vector in range T is indeed in the span of the T v1 , . . . T vn , so these vectors
span range T .
To show that they are linearly independent, suppose that
c1 (T v1 ) + · · · + cn (T vn ) = 0.
Then
T (c1 v1 + · · · + cn vn ) = 0
so
c1 v1 + · · · + cn vn ∈ null T.
Since this vector is in null T , we can write it as a linear combination of the u’s
c1 v1 + · · · + cn vn = d1 u1 + · · · + dm um ,
but subtracting and using linear independence of the u’s and v’s, we can conclude that all of
the scalars are 0. Since all of the c’s are zero, the list T v1 , . . . , T vn is linearly independent, as
desired.
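As a quick check of the formula, recall the map T : F3 → F2 given by T (x, y, z) = (x + y, 2z) from the earlier examples. We computed null T = {(a, −a, 0) | a ∈ F}, which is 1-dimensional, and range T = F2 , which is 2-dimensional, and indeed
dim F3 = 3 = 1 + 2 = dim null T + dim range T.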
Recall that in Math 300 you learned that if |A| > |B| then a function f : A → B is never injective.
Similarly, if |A| < |B| then a function f : A → B is never surjective. For vector spaces, the correct
notion of size is dimension, so we have the following theorems (which you should also remember
from Math 300). We can prove these results using the Rank–Nullity Theorem.
Proposition 76. Suppose V and W are finite-dimensional vector spaces such that dim V > dim W .
Then no linear map V → W is injective.
Proof. Let T ∈ L(V, W ). Then
dim null T = dim V − dim range T ≥ dim V − dim W > 0.
Since null T has positive dimension, it must contain vectors other than 0, so by Proposition 70, T
is not injective.
A very similar proof is used to prove:
Proposition 77. Suppose V and W are finite-dimensional vector spaces such that dim V < dim W .
Then no linear map V → W is surjective.
A reprise of a Math 308 hit: bases of the null space and range.
One of the nicest computational tools in linear algebra is the use of matrices. In the abstract, it
is generally easier to work with a linear map. For a specific computation, you often want to work
with a matrix. The next section (3.C) is specifically on the matrix of a linear map, but I wanted
to do an example to jog your memories.
Two ways that we can describe a subspace of Fn are:
• Implicitly, e.g., as the set of solutions x ∈ R4 to the system of equations
x1 + 3x2 + 3x3 + 2x4 = 0
2x1 + 6x2 + 9x3 + 7x4 = 0
−x1 − 3x2 + 3x3 + 4x4 = 0.
• Explicitly, e.g., the span of
\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 \\ 6 \\ -3 \end{pmatrix}, \begin{pmatrix} 3 \\ 9 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ 7 \\ 4 \end{pmatrix} \in \mathbb{R}^3.
Both of these can be expressed as the null space and range of the linear map T : R4 → R3 with the
matrix
A = \begin{pmatrix} 1 & 3 & 3 & 2 \\ 2 & 6 & 9 & 7 \\ -1 & -3 & 3 & 4 \end{pmatrix}.
Here the columns of A are given by the images T e1 , T e2 , T e3 , T e4 . Note that for every x ∈ R4 , we
can write
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = x_1 e_1 + x_2 e_2 + x_3 e_3 + x_4 e_4
and
Ax = x_1 \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + x_2 \begin{pmatrix} 3 \\ 6 \\ -3 \end{pmatrix} + x_3 \begin{pmatrix} 3 \\ 9 \\ 3 \end{pmatrix} + x_4 \begin{pmatrix} 2 \\ 7 \\ 4 \end{pmatrix}.
The key tool is to use Gaussian elimination:
A = \begin{pmatrix} 1 & 3 & 3 & 2 \\ 2 & 6 & 9 & 7 \\ -1 & -3 & 3 & 4 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 3 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 6 & 6 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 3 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} =: U.
The matrix U is the reduced row echelon form of A, and you can use it to determine a basis of
null T and range T . First,
null T = null A = {x | Ax = 0} = {x | U x = 0}.
But from U , x2 and x4 are free variables, so every solution to U x = 0 can be written as
x = \begin{pmatrix} -3x_2 + x_4 \\ x_2 \\ -x_4 \\ x_4 \end{pmatrix}
= x_2 \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix}.
So these two vectors form a basis of the null space.
On the other hand, the range of T ends up being the same as the column space of A, col A. For a
basis for the column space, you take the pivot columns, so a basis of range T is
\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 \\ 9 \\ 3 \end{pmatrix}.
Note that the dimensions are correct and satisfy the Rank–Nullity Theorem.
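As a computational aside, a calculation like this can be checked exactly with sympy (which may scale or order its basis vectors differently, but they span the same subspaces):

from sympy import Matrix

A = Matrix([[1, 3, 3, 2],
            [2, 6, 9, 7],
            [-1, -3, 3, 4]])

U, pivots = A.rref()        # reduced row echelon form and pivot column indices
print(U)                    # matches the matrix U computed above
print(pivots)               # (0, 2): the first and third columns are the pivot columns
print(A.nullspace())        # two vectors spanning null T
print(A.columnspace())      # the pivot columns of A, a basis of range T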
8. Friday 7/10: Matrices (3.C)
Let V and W be finite-dimensional vector spaces with bases BV = {v1 , . . . vn } and BW = {w1 , . . . , wm },
respectively. Now for any linear map T : V → W , we know that T is determined entirely by
T v1 , . . . , T vn (i.e., what T does to a basis of V ). Since the wj ’s are a basis of W , we can write each
T vi uniquely as a linear combination of the wj ’s. This should contain all of the information about
T.
Definition 78. An m × n matrix A is a rectangular array of elements of F with m rows and n
columns
$$A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{bmatrix}.$$
Note that Aj,k is the entry in the jth row and kth column of A.
We will use matrices to record the data of the linear map T . Note that this of course depends on
our choice of bases for V and W .
Definition 79. Suppose T ∈ L(V, W ) and retain the bases BV and BW above. The matrix of T
with respect to these bases is the m × n matrix M(T ) whose entries Aj,k are defined by
T vk = A1,k w1 + · · · + Am,k wm
(that is, the kth column of M(T ) records the coefficients for writing T vk as a linear combination
of the wj ’s).
If the bases are not clear from context, then the more cumbersome notation M(T, (v1 , . . . , vn ), (w1 , . . . , wm )) is used.
Generally, if no bases are given and T : Fn → Fm , we use the standard bases of Fn and Fm . Indeed,
these are the matrices you have been using since Math 308.
Example 80. Suppose T ∈ L(F3 , F2 ) is defined by
T (x, y, z) = (3x − y, 2x + 2z).
To find the matrix of T with respect to the standard bases, we just need to understand
T e1 , T e2 , T e3 , and write these with respect to the standard basis of F2 . Of course, everything is
already written with respect to the standard basis. Since T (1, 0, 0) = (3, 2), T (0, 1, 0) = (−1, 0)
and T (0, 0, 1) = (0, 2) we have
$$M(T) = \begin{bmatrix} 3 & -1 & 0 \\ 2 & 0 & 2 \end{bmatrix}.$$
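As a quick computational illustration of Example 80 (using numpy, which is my own choice of tool), you can build M(T ) column by column by applying T to the standard basis vectors, exactly as in Definition 79.

```python
import numpy as np

# The map T(x, y, z) = (3x - y, 2x + 2z) from Example 80.
def T(v):
    x, y, z = v
    return np.array([3*x - y, 2*x + 2*z])

# The columns of M(T) are T(e_1), T(e_2), T(e_3) written in the standard basis.
M = np.column_stack([T(e) for e in np.eye(3)])
print(M)
# [[ 3. -1.  0.]
#  [ 2.  0.  2.]]

# Sanity check: M @ v agrees with T(v).
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(M @ v, T(v))
```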
In Math 308, you learned that there is a correspondence between m × n matrices and linear maps
T : Fn → Fm . Every matrix gives a linear map, and every linear map can be recorded by a matrix.
The only extra wrinkle here is that we allow arbitrary finite-dimensional vector spaces, and the
matrices depend on a choice of basis for the vector space.
You can add two matrices of the same size and you can multiply a matrix by a scalar. We also saw
that you can add two linear maps V → W and you can scalar multiply a linear map by a scalar.
Do these two notions coincide? That is, suppose S, T : V → W . From these (with a fixed choice of
basis), you obtain two matrices M(S) and M(T ). You can also form the linear map S +T : V → W
and the matrix M(S + T ). How are M(S), M(T ), and M(S + T ) related?
Since this is linear algebra, the nicest possible subject, the answer is: the way you want them to
be.
Proposition 81. Suppose S, T ∈ L(V, W ). Then M(S + T ) = M(S) + M(T ).
So addition of linear maps and addition of their corresponding matrices correspond. The same holds
for scalar multiplication.
Proposition 82. Suppose T ∈ L(V, W ) and λ ∈ F. Then M(λT ) = λM(T ).
Yet again, since we now have a notion of addition for matrices of the same size and a notion of
scalar multiplication, this suggests that perhaps the set of all matrices of a fixed size form a vector
space. Again, it is not obvious (but it is true), that these operations satisfy the axioms of a vector
space.
Definition 83. For m and n positive integers, the set of all m × n matrices with entries in F is
denoted by Fm,n .
Fm,n is a vector space of dimension mn. A basis is given by the elementary matrices Ei,j which
have a 1 in the ith row and jth column and 0s everywhere else.
Recall that we said that there is even more structure to the set of linear maps—given linear maps
T : U → V and S : V → W , it is possible to compose them to obtain ST : U → W . On the
matrix side, if U , V , and W are all finite-dimensional with some choice of fixed bases, then we obtain
M(S), M(T ) and M(ST ). We will define a multiplication on matrices (of the correct size) so that
it corresponds to function composition. That is, M(S)M(T ) = M(ST ).
Definition 84. Let A ∈ Fm,n and C ∈ Fn,p . Then the product AC is defined to be the m × p
matrix whose entry in row j and column k is given by
$$(AC)_{j,k} = \sum_{r=1}^{n} A_{j,r} C_{r,k}.$$
In other words, the (j, k) entry is obtained by taking the dot product of the jth row of A with the
kth column of C.
Note that just as function composition is only defined when the domain of T matches the codomain
of S, matrix multiplication is only defined when the number of columns of A is the same as the
number of rows of C.
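The formula in Definition 84 translates directly into code. Here is a small sketch in plain Python (with numpy used only as a cross-check; both tool choices are mine, not the course's).

```python
import numpy as np

def matmul(A, C):
    """Product of an m x n matrix A and an n x p matrix C via
    (AC)_{j,k} = sum_r A_{j,r} C_{r,k}."""
    m, n = len(A), len(A[0])
    n2, p = len(C), len(C[0])
    assert n == n2, "number of columns of A must equal number of rows of C"
    return [[sum(A[j][r] * C[r][k] for r in range(n)) for k in range(p)]
            for j in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
C = [[1, 0],
     [0, 1],
     [2, 2]]             # 3 x 2

print(matmul(A, C))      # [[7, 8], [16, 17]]
assert np.allclose(matmul(A, C), np.array(A) @ np.array(C))
```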
Proposition 85. Let T : U → V and S : V → W be linear maps. Fix bases {u1 , . . . , up },
{v1 , . . . , vn } and {w1 , . . . , wm } of U , V , and W , respectively. Let M(T ), M(S), and M(ST ) be
the matrices of T , S, and ST with respect to the given bases. Then M(S)M(T ) = M(ST ).
Proof sketch. Really the proof is just a computation where you’re careful with indices. Let A =
M(S) and C = M(T ). To compute the matrix of ST , we simply need to know the image (ST )(uk )
for each 1 ≤ k ≤ p. What happens to uk under these maps?
$$(ST)(u_k) = S\left(\sum_{r=1}^{n} C_{r,k} v_r\right) = \sum_{r=1}^{n} C_{r,k}(S v_r) = \sum_{r=1}^{n} C_{r,k} \sum_{j=1}^{m} A_{j,r} w_j = \sum_{j=1}^{m} \left(\sum_{r=1}^{n} A_{j,r} C_{r,k}\right) w_j.$$
So this means that $\sum_{r=1}^{n} A_{j,r} C_{r,k}$ should be the (j, k)th entry of M(ST ), which is exactly how we defined AC.
Invertibility and Isomorphic Vector Spaces (3.D)
Definition 86. A linear map T ∈ L(V, W ) is called invertible if there exists a linear map S ∈
L(W, V ) such that ST = IV ∈ L(V, V ) and T S = IW ∈ L(W, W ).
A linear map S ∈ L(W, V ) satisfying ST = I and T S = I is called an inverse of T .
Notice that in the definition it says that S is an inverse of T , not the inverse of T . That’s because
it is not obvious that an invertible map has only one inverse. It is true though.
Proposition 87. An invertible linear map has a unique inverse.
Proof. Suppose T ∈ L(V, W ) is invertible. Suppose that S1 and S2 are inverses of T . Then consider
S1 T S2 . On the one hand,
S1 T S2 = (S1 T )S2 = IS2 = S2
but also
S1 T S2 = S1 (T S2 ) = S1 I = S1
so we conclude that S1 = S2 .
So in fact inverses are unique, so we can talk about the inverse of an invertible linear map. If
T ∈ L(V, W ) is invertible, we denote its inverse by T −1 ∈ L(W, V ). But the question remains: how
can we tell if a linear map is invertible? It turns out this question has a very clean answer.
Proposition 88. A linear map is invertible if and only if it is injective and surjective (i.e., if and
only if it is bijective).
9. Monday 7/13: Invertibility and Isomorphic Vector Spaces (3.D)
We begin by proving the theorem that we stated at the end of last lecture.
Proposition 89. A linear map is invertible if and only if it is injective and surjective (i.e., if and
only if it is bijective).
Proof. Suppose T ∈ L(V, W ).
[⇒]. Suppose T is invertible. We want to show that T is injective and surjective. First, we show T
is injective. To this end, suppose u, v ∈ V such that T u = T v. Now apply the inverse T −1 : W → V
to both of these vectors to see that
u = T −1 (T u) = T −1 (T v) = v
so u = v and hence T is injective.
Now we want to show that T is surjective. So suppose w ∈ W . Which vector in V maps over to
w? Again, use the inverse map T −1 . The vector T −1 w ∈ V and
T (T −1 w) = w
so w ∈ range T and so T is surjective.
[⇐]. Now suppose that T is bijective. We want to construct a map S ∈ L(W, V ) which is its
inverse. So for each w ∈ W , let
Sw ∈ V be the unique vector in V such that T (Sw) = w.
Why does such a vector exist? First, there is a vector that maps over to w since T is surjective.
And there is only one such vector since T is injective.
We need to show that this map S is linear and the inverse of T . First we show S is linear. Let
w1 , w2 ∈ W . Then
T (Sw1 + Sw2 ) = T (Sw1 ) + T (Sw2 ) = w1 + w2
and so Sw1 + Sw2 is the unique element of V which maps over to w1 + w2 . But this is the definition
of S(w1 + w2 ) so we have
S(w1 + w2 ) = Sw1 + Sw2
which shows that S is additive. The proof for homogeneity is similar.
Finally, we need to show that ST = IV and T S = IW . The latter of these is clear by the definition
of S: for any w ∈ W , we defined Sw to be exactly the vector such that T (Sw) = w. So T S = IW . Now
let v ∈ V . Then
T (ST (v)) = (T S)(T v) = I(T v) = T v.
But since T is injective, this shows that ST (v) = v. Therefore ST = IV , and we have completed
the proof.
Isomorphic Vector Spaces.
If there is an invertible linear map between two vector spaces V and W , then by the previous result
it is injective and surjective, in which case these two vector spaces are essentially “the same”.
Example 90. Let V be the xy-plane in R3 , that is
V = {(x, y, 0) | x, y ∈ R} = span((1, 0, 0), (0, 1, 0)).
This seems like it is basically the same as R2 . Of course, they are not the same. The vector
(1, 0) is in R2 , and it is definitely not in V .
But they do behave in basically exactly the same way. There is a map T : V → R2 defined
by T (x, y, 0) = (x, y) which you should check is an invertible linear map. This means that
the two spaces have “the same size” (since T is bijective) and that T preserves addition and scalar
multiplication.
So even if two vector spaces aren’t exactly the same, if there’s an invertible linear map between
them, they behave in the same way. We would not say that R2 is equal to V , instead, we say that
they are isomorphic.
Definition 91. An isomorphism is an invertible linear map.
If there exists an isomorphism V → W , then we say that V and W are isomorphic.
Note that this is sort of a redundant definition: a linear map T is invertible if and only if it is an
isomorphism. You use the word isomorphic when you really want to stress that two vector spaces
are essentially the same.
One nice fact is that over any field, there is basically “only one” vector space of each finite dimension.
So we saw that the xy-plane was isomorphic to R2 , but actually every two-dimensional subspace
of R3 is isomorphic to R2 . And every two-dimensional subspace of Rn . And even the vector space
P1 (R) of linear polynomials is isomorphic to R2 .
Theorem 92. Let V and W be finite-dimensional vector spaces over F. Then V and W are
isomorphic if and only if dim V = dim W .
Proof. [⇒]. Suppose that V and W are isomorphic, so there exists an isomorphism T : V → W .
Since T is injective, null T = {0} and since T is surjective, range T = W . Hence, by the Rank–
Nullity Theorem,
dim V = dim null T + dim range T = 0 + dim W = dim W.
[⇐]. Conversely, suppose that dim V = dim W = d. Then there exist bases {v1 , . . . , vd } and
{w1 , . . . , wd } of V and W , respectively. We want to show that there is an isomorphism T : V → W .
But by Proposition 60, there exists a unique linear map T such that T vj = wj for all j. By the
previous result, we simply need to show that T is both injective and surjective.
Since {w1 , . . . , wd } is a basis for W , every vector w ∈ W can be written as
w = c1 w1 + · · · + cd wd .
But then
T (c1 v1 + · · · + cd vd ) = c1 T v1 + · · · + cd T vd = c1 w1 + · · · + cd wd = w
so T is surjective.
Now to show that T is injective, it suffices to show that null T = 0. So suppose T v = 0. Write
v = c1 v1 + · · · + cd vd .
Then
0 = T v = c1 T v1 + · · · + cd T vd = c1 w1 + · · · + cd wd
and since {w1 , . . . , wd } is linearly independent, this implies that cj = 0 for all j and so v = 0, as
desired.
Since the concept of isomorphic vector spaces captures vector spaces that are “the same”, the fact
that there is a correspondence between m × n matrices and linear maps from Fn to Fm should be
captured by some isomorphism.
Proposition 93. Suppose v1 , . . . , vn is a basis of V and w1 , . . . , wm is a basis of W . For each
T ∈ L(V, W ), let M(T ) ∈ Fm,n be the m × n matrix of T with respect to these bases.
Then the function M : L(V, W ) → Fm,n defined by T 7→ M(T ) is an isomorphism.
Proof. Propositions 81 and 82 show that M is linear. We need to prove that M is injective and
surjective.
First, we prove injectivity. Suppose T ∈ null M, that is, T ∈ L(V, W ) such that M(T ) = 0. Then by
the definition of M(T ), we have that T vk = 0 for all k. Since v1 , . . . , vn is a basis of V , this implies
that T = 0 so M is injective.
Next, surjectivity. Suppose A ∈ Fm,n . We must find a T ∈ L(V, W ) such that M(T ) = A. Define
T : V → W by
$$T v_k = \sum_{j=1}^{m} A_{j,k} w_j$$
for all k. Then A = M(T ), so M is surjective, as desired.
Since we know that Fm,n has dimension mn, we have the following corollary.
Corollary 94. Suppose V and W are finite-dimensional. Then L(V, W ) is finite-dimensional and
dim L(V, W ) = (dim V )(dim W ).
The last definition we will introduce in this section is that of an operator, which is just a linear
map from a vector space to itself.
Definition 95. A linear map from a vector space V to itself is called an operator. We use the
notation L(V ) = L(V, V ) for the set of all operators on V .
Remember the first result we proved today says that a linear map is invertible if and only if it is
bijective. The intuition here is that isomorphic vector spaces are “the same” so they should
have the same size. If you have a surjective map V → W , this says that dim V ≥ dim W and if you
have an injective map V → W , this says that dim V ≤ dim W .
On the other hand, when we are talking about an operator V → V , do we actually need to check
both things? In Math 300, if we had an injective map {1, 2, . . . , n} → {1, 2, . . . , n}, then since we
send each input to a distinct output, this injection was automatically surjective. Similarly, if we
have an injective operator V → V , is it automatically surjective?
Well if we think about just set-theoretic functions, we can have injections Z → Z that are not
surjective, for example the map x 7→ 2x. The issue here is that Z has infinite cardinality so weird
things can happen. If V is infinite-dimensional, we can similarly have operators on V that are
injective but not surjective or surjective but not injective.
Example 96.
• The left shift operator L : F∞ → F∞ is not injective, since the null space
contains all scalar multiples of (1, 0, 0, . . . ). However, it is surjective.
• The right shift operator R : F∞ → F∞ is not surjective, since all of the vectors in the
range start with 0. However, it is injective.
• The multiplication by z operator on P(R) is injective since the only polynomial p such
that zp = 0 is the polynomial p = 0. However, it is not surjective, as its image does not
contain any polynomials with nonzero constant term.
However, as long as the vector space in question is finite-dimensional, injectivity and surjectivity
of an operator are equivalent.
Proposition 97. Suppose V is finite-dimensional and T ∈ L(V ). Then the following are equivalent:
(1) T is invertible,
(2) T is injective,
(3) T is surjective.
Proof. By Proposition 89, (1) if and only if (2) and (3). It suffices to prove that (2) and (3) are
equivalent, since then if we have (2), we also have (3), and together (2) and (3) would imply (1).
[(2) ⇒ (3)]. Suppose that T is injective. Then null T = {0} by Proposition 70. So by the Rank–
Nullity Theorem,
dim range T = dim V − dim null T = dim V
so range T = V , and so T is surjective.
[(3) ⇒ (2)]. Similarly, if T is surjective, then
dim null T = dim V − dim range T = dim V − dim V = 0
so null T = {0} and so T is injective.
I think your book gives a pretty cool example of the power of this seemingly simple theorem.
Example 98. Show that for each polynomial q ∈ P(R), there exists a polynomial p ∈ P(R)
such that ((x2 + 5x + 7)p)00 = q.
In theory you could (try to) prove this after Calculus I. You simply write q as a general polynomial with variables as coefficients, then integrate twice (now with two new variables from
the constants of integration that you have control over), then try to choose those integration
constants such that you can divide by x2 + 5x + 7 and solve for the coefficients of p. But this
seems painful to me.
You can also use the previous proposition, although it doesn’t seem (at first) to apply to this,
since P(R) is infinite-dimensional. However, for each polynomial q, since q has finite degree m,
q ∈ Pm (R) which is finite dimensional.
So suppose q ∈ Pm (R). Define an operator T : Pm (R) → Pm (R) by
T (p) = ((x2 + 5x + 7)p)00
Since multiplying p by a degree-two polynomial raises the degree by two and then differentiating twice lowers it by two, T (p) ∈ Pm (R). So T is an operator on Pm (R).
Suppose T (p) = 0. The only polynomials whose second derivatives are zero are the linear polynomials ax + b. So this means that (x2 + 5x + 7)p = ax + b for some a, b ∈ R. But if p ≠ 0, then (x2 + 5x + 7)p has degree at least 2, so this forces p = 0. So null T = {0} and hence T is injective, and thus surjective. Therefore every
polynomial q ∈ Pm (R) is in the image of T .
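For a concrete instance of Example 98, here is a sympy sketch (sympy and the particular choice q = x^2 + 1 are my own, purely for illustration) that finds p with ((x^2 + 5x + 7)p)'' = q by matching coefficients; that a solution exists is exactly the surjectivity argument above.

```python
import sympy as sp

x = sp.symbols('x')
a0, a1, a2 = sp.symbols('a0 a1 a2')

# Hypothetical target polynomial q of degree 2 (my choice, for illustration).
q = x**2 + 1

# General p in P_2(R); the operator T(p) = ((x^2 + 5x + 7)p)'' maps P_2(R) to P_2(R).
p = a0 + a1*x + a2*x**2
Tp = sp.expand(sp.diff((x**2 + 5*x + 7)*p, x, 2))

# Solve T(p) = q by matching coefficients of the two polynomials.
sol = sp.solve(sp.Poly(Tp - q, x).all_coeffs(), [a0, a1, a2])
p_sol = sp.expand(p.subs(sol))
print(p_sol)                                              # x**2/12 - 5*x/12 + 2
print(sp.expand(sp.diff((x**2 + 5*x + 7)*p_sol, x, 2)))   # x**2 + 1, as required
```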
10. Wednesday 7/15: Products of Vector Spaces (3.E)
Section 3.E of the textbook covers products and quotients of vector spaces. I think quotients are a
bit more difficult to grasp, so we will not be covering them in this class, although you can read that
part in your textbook. Quotient spaces will be important in your abstract algebra and topology
classes, so I’ll let your professors in those classes handle that subject.
The notion of a product of vector spaces is just a way to combine two or more different vector
spaces into a larger vector space.
Definition 99. Suppose V1 , . . . , Vm are vector spaces over F.
The product V1 × · · · × Vm is, as a set,
V1 × · · · × Vm = {(v1 , . . . , vm ) | vi ∈ Vi for all i}
with addition defined by
(u1 , . . . , um ) + (v1 , . . . , vm ) = (u1 + v1 , . . . , um + vm )
and scalar multiplication defined by
λ(v1 , . . . , vm ) = (λv1 , . . . , λvm ).
This makes the product into a vector space over F.
This is basically the most straightforward way to put two vector spaces together. You basically
just write elements of the individual vector spaces next to each other.
Example 100. The product P3 (R) × R2 contains vectors that look like (2x3 − 4x + 5, (3, 2)).
Addition just happens component-wise so
(x − 3, (0, 1)) + (x2 + 2x − 4, (2, −2)) = (x2 + 3x − 7, (2, −1)).
Note the difference between the sum of two subspaces and the product of two vector spaces. To
take the sum U + W , both U and W need to be subspaces of the same vector space V . Then
U + W = {u + w | u ∈ U, w ∈ W }. In order for this definition to make sense, you need to be able
to add an element of U with an element of W . On the other hand, you can take the product of
any two vector spaces you want.
Example 101. Consider the product R2 × R3 . Elements of this product look like (v, w) where
v ∈ R2 and w ∈ R3 , so like ((v1 , v2 ), (w1 , w2 , w3 )). Is this equal to R5 ? Well, almost. You view
an element of R2 × R3 as an ordered pair, where one element is in R2 and one is in R3 while
you view an element of R5 as a 5-tuple of real numbers. So they are not equal as sets.
On the other hand, they behave essentially identically, with the exception of sticking in some
parentheses. The correct notion is that of an isomorphism, as we’ve discussed. The linear map
T : R2 × R3 → R5 which maps ((v1 , v2 ), (w1 , w2 , w3 )) to (v1 , v2 , w1 , w2 , w3 ) is bijective so these
two vector spaces are isomorphic.
Once you’ve gained some mathematical maturity, you might be allowed to say that “R2 × R3 =
R5 ”, but this is informal so in this class we ought to be more precise.
What is the dimension of V1 × · · · × Vm ? Well if we think about the example above, we might guess
that the dimension of the product is the sum of the dimensions.
Proposition 102. Suppose V1 , . . . , Vm are finite-dimensional vector spaces. Then
dim(V1 × · · · × Vm ) = dim V1 + · · · + dim Vm .
In particular, the product of finitely many finite-dimensional vector spaces is finite-dimensional.
Proof sketch. Pick bases for each of the Vi ’s. If v is a basis element of some Vi , then consider the
element
(0, 0, . . . , 0, v, 0, . . . , 0)
where v appears in the ith slot. The collection of all such vectors as we range over the bases of the
Vi ’s will form a basis of the product. And there are clearly dim V1 + · · · + dim Vm of them.
We discussed that you can take the product of any finite set of vector spaces. You can only take the
sum of subspaces inside a common vector space. But if you have several subspaces of a common
vector space, you can take their product, too (since they’re each vector spaces). What is the
relationship between the sum and product of a collection of subspaces?
Proposition 103. Suppose U1 , . . . , Um are subspaces of a vector space V . Define a linear map
Γ : U1 × · · · × Um → U1 + · · · + Um
Γ(u1 , . . . , um ) = u1 + · · · + um .
Then U1 + · · · + Um is a direct sum if and only if Γ is injective.
Proof. You should make sure that you believe that Γ is a linear map.
Once we know it is linear, Γ is injective if and only if null Γ = {0}. In other words, the only way
to write 0 as a sum u1 + · · · + um where each ui ∈ Ui is by taking all of the ui = 0.
But by Theorem 22, this happens if and only if the sum is a direct sum.
Note that the map Γ defined above is always surjective. Hence, if the sum is a direct sum, then
the product is isomorphic to the sum.
The previous result did not have any finite-dimensional hypothesis; it holds for any subspaces of
any vector space. On the other hand, in the finite-dimensional case, we can use dimension to detect
direct sums.
Proposition 104. Suppose V is finite-dimensional and U1 , . . . , Um are subspaces of V . Then
U1 + · · · + Um is a direct sum if and only if
dim(U1 + · · · + Um ) = dim U1 + · · · + dim Um
Proof. As we just stated, the map Γ is always surjective. So by the Rank–Nullity Theorem
dim(U1 × · · · × Um ) = dim null Γ + dim range Γ = dim null Γ + dim(U1 + · · · + Um ).
We know that the sum is direct if and only if Γ is injective, if and only if null Γ = {0}. Since dim(U1 × · · · × Um ) = dim U1 + · · · + dim Um by Proposition 102, the sum is direct if and only if
dim(U1 + · · · + Um ) = dim U1 + · · · + dim Um .
Duality (3.F)
Let V be a vector space. The linear maps V → V were important enough to warrant a special
name—L(V, V ) is the set of operators on V . Another extremely important set of linear maps are
the linear maps V → F.
Definition 105. A linear functional on V is a linear map from V → F, i.e., an element of L(V, F).
We use the notation
V 0 = V ∗ = L(V, F)
and call this the dual space of V .
Your book uses the notation V 0 , but you should be aware that V ∗ is also a very common notation.
Example 106.
• Fix an element (c1 , . . . , cn ) ∈ Fn . Then we have a linear functional
φ : Fn → F defined by φ(x1 , . . . , xn ) = c1 x1 + · · · + cn xn .
To be even more concrete, we can let (3, −2) ∈ R2 . Then the functional φ : R2 → R
sends the vector (x, y) to the number 3x − 2y.
• The linear map J : P(R) → R given by $J(p) = \int_0^1 p(z)\,dz$ is a linear functional.
• Another linear functional you are already familiar with is the map, say ev3 : P(R) → R
that evaluates every polynomial at 3. That is, ev3 (p) = p(3).
These last two examples should be seen as evidence as to how commonly linear
functionals appear in mathematics.
Proposition 107. Let V be a finite-dimensional vector space. Then V 0 is finite dimensional and
dim V 0 = dim V .
Proof. We already proved that if V and W are finite-dimensional then dim L(V, W ) = (dim V )(dim W ).
For the dual space, W = F is one-dimensional.
So if dim V = n, then dim V 0 = n, as well. A natural question to ask is: is there a nice basis for
V 0?
Definition 108. Suppose v1 , . . . , vn is a basis of V . Then the dual basis of v1 , . . . , vn is the list
φ1 , . . . , φn of elements in V 0 . Each φj is the linear functional on V such that
$$\varphi_j(v_k) = \begin{cases} 1 & \text{if } k = j, \\ 0 & \text{if } k \neq j. \end{cases}$$
Make sure that you understand that each φj really is a linear map from V to F. This follows
because to determine a linear map, you only need to specify the images of a basis of the domain.
And the images are certainly in the field F.
Example 109. If we consider the standard basis e1 , . . . , en of Fn , then the dual basis φ1 , . . . , φn
is defined by
$$\varphi_j(e_k) = \begin{cases} 1 & \text{if } k = j, \\ 0 & \text{if } k \neq j. \end{cases}$$
What then does φj do to a general vector in Fn ? Well if x = (x1 , . . . , xn ) ∈ Fn then x =
x1 e1 + · · · + xn en . So
φj (x) = x1 φj (e1 ) + · · · + xn φj (en ) = xj φj (ej ) = xj .
In other words, φj returns the jth coordinate of x.
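A quick computational way to see Example 109: the dual basis vectors of the standard basis are just the coordinate-extraction functions. A tiny plain-Python sketch (my own illustration, not part of the course):

```python
def dual_basis(n):
    """Return the dual basis phi_1, ..., phi_n of (F^n)', where
    phi_j picks out the jth coordinate of a vector."""
    return [lambda x, j=j: x[j] for j in range(n)]

phis = dual_basis(3)
e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # standard basis of F^3

# phi_j(e_k) is 1 when j = k and 0 otherwise.
print([[phi(ek) for ek in e] for phi in phis])   # identity pattern

# phi_2 applied to a general vector returns its second coordinate.
print(phis[1]((5, -2, 7)))   # -2
```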
We called this the dual basis without ever checking that it actually is a basis of V 0 . We need to fix this
immediately.
Proposition 110. If v1 , . . . , vn is a basis of V , then the dual basis φ1 , . . . , φn is a basis of V 0 .
Proof. Since we know that dim V 0 = n, it is enough to verify that the dual basis is linearly independent (since by Proposition 52, any n linearly independent elements in a vector space of dimension
n form a basis).
So suppose that a linear combination
a1 φ1 + · · · + an φn = 0
where 0 denotes the function V → F that is identically zero. Hence, when we apply this function
to any of the vj , we get 0. However,
(a1 φ1 + · · · + an φn )(vj ) = a1 φ1 (vj ) + · · · + an φn (vj ) = aj φj (vj ) = aj .
Hence, for each j, we have aj = 0. Therefore, the dual basis is in fact linearly independent and
therefore a basis of V 0 .
An interesting feature of the dual space is that if T : V → W is a linear map, then there is a natural
way to define a linear map W 0 → V 0 .
Definition 111. If T ∈ L(V, W ), then the dual map of T is the linear map T 0 ∈ L(W 0 , V 0 ) defined
by
T 0 (φ) = φ ◦ T
for φ ∈ W 0 .
Note that φ : W → F is a linear functional on W . Hence, T 0 (φ) = φ ◦ T : V → W → F is a linear
functional on V (it is linear since it is a composition of linear maps). This function composition
also explains why the arrow “flipped” when we took duals.3
Example 112. In these examples, I will use the ∗ notation for dual spaces instead of the prime notation, because I want to reserve the prime for differentiation.
Consider the linear map D : P(R) → P(R) given by differentiation Dp = p0 . We would like to
understand the dual map D∗ : (P(R))∗ → (P(R))∗ .
• Let ev3 ∈ (P(R))∗ be the linear functional which evaluates a polynomial at 3. What is
D∗ (ev3 )? Well, by definition, this is a linear functional on P(R) defined by:
(D∗ (ev3 ))(p) = (ev3 ◦D)(p) = ev3 (Dp) = ev3 (p0 ) = p0 (3).
So D∗ (ev3 ) is the linear functional that maps p to p0 (3).
• We also considered the linear functional J ∈ (P(R))∗ which takes the definite integral of a polynomial on the interval [0, 1], that is, $J(p) = \int_0^1 p(z)\,dz$. What is D∗ (J)? This is the element of (P(R))∗ defined by
$$(D^*(J))(p) = (J \circ D)(p) = J(p') = \int_0^1 p'(z)\,dz = p(1) - p(0).$$
So D∗ (J) is the linear functional that maps p to p(1) − p(0).
3You can think of all of the vector spaces over F as living in some world of vector spaces with linear maps between
them. Then for every vector space and every linear map, you can “take duals”. Each resulting dual vector space has
the same dimension as the original vector space, but all the arrows have flipped direction. It’s like the whole world
has mirrored. And like the kids in Stranger Things, we’re trying to understand the Upside-Down world of duals.
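Here is a short sympy check (my own sketch, with a sample polynomial chosen by me) of the two computations in Example 112: D∗ (ev3 ) sends p to p'(3), and D∗ (J) sends p to p(1) − p(0).

```python
import sympy as sp

z = sp.symbols('z')

D   = lambda p: sp.diff(p, z)                 # differentiation operator
ev3 = lambda p: p.subs(z, 3)                  # evaluation at 3
J   = lambda p: sp.integrate(p, (z, 0, 1))    # integral over [0, 1]

# Dual maps: D*(phi) = phi o D.
Dstar_ev3 = lambda p: ev3(D(p))
Dstar_J   = lambda p: J(D(p))

p = 2*z**3 - z + 4                            # a sample polynomial (my choice)
print(Dstar_ev3(p), D(p).subs(z, 3))          # both equal p'(3) = 53
print(Dstar_J(p), p.subs(z, 1) - p.subs(z, 0))  # both equal p(1) - p(0) = 1
```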
11. Friday 7/17: Duality (3.F)
We know that taking the dual of a vector space V returns a vector space V 0 of the same dimension.
We also know that when you take the dual of a linear map T : V → W you get a linear map
T 0 : W 0 → V 0 . What properties does this process have? For example, what if you take the dual of
the sum of two linear maps S + T ? What about the composition of two linear maps U → V and
V → W ? If you know properties about T , what does that say about T 0 ? Can you understand the
null space and range of T 0 in terms of information about T ? So many questions, so little time.
Proposition 113. The following properties hold:
(1) (S + T )0 = S 0 + T 0 for all S, T ∈ L(V, W ).
(2) (λT )0 = λT 0 for all T ∈ L(V, W ) and λ ∈ F.
(3) (ST )0 = T 0 S 0 for all T ∈ L(U, V ) and S ∈ L(V, W ).
Proof. (1) The first two claims are simply computations that should be checked. Suppose S, T ∈
L(V, W ). Then we get maps S 0 , T 0 ∈ L(W 0 , V 0 ). For each φ ∈ W 0 , we have
S 0 (φ) = φ ◦ S : V → F
T 0 (φ) = φ ◦ T : V → F
and we can add these two linear maps
S 0 (φ) + T 0 (φ) = φ ◦ S + φ ◦ T : V → F
using the usual definition of addition of linear maps. On the other hand, we could add S + T ∈
L(V, W ) and then
(S + T )0 (φ) = φ ◦ (S + T ) : V → F.
Are these two functions S 0 (φ) + T 0 (φ) and (S + T )0 (φ) the same function V → F? Well if v ∈ V ,
then
φ ◦ (S + T )(v) = φ(S(v) + T (v)) = φ ◦ S(v) + φ ◦ T (v)
so they are indeed the same function. Since this holds for all φ ∈ W 0 , S 0 + T 0 and (S + T )0
are the same function W 0 → V 0 .
The hard thing about this proof is understanding which things are maps, what the domains and
ranges of the maps are, and the definition of addition and equality of functions. So the difficulty is
in the abstraction, not in the computation.
(2) You should check the second claim.
(3) First note that ST : U → W so (ST )0 : W 0 → U 0 . Further, S 0 : W 0 → V 0 and T 0 : V 0 → U 0 . So
both (ST )0 and T 0 S 0 are maps W 0 → U 0 . Are they equal as maps?
Well let φ ∈ W 0 . Then
(ST )0 (φ) = φ ◦ (ST ) = (φ ◦ S) ◦ T = T 0 (φ ◦ S) = T 0 (S 0 (φ)) = (T 0 S 0 )(φ).
So they are indeed equal.
The Null Space and Range of the Dual of a Linear Map.
We now turn our attention to null T 0 and range T 0 . We know that if T : V → W then T 0 : W 0 → V 0
so null T 0 will be a subspace of W 0 and range T 0 will be a subspace of V 0 . But how are they related
to the original map T ? In order to answer this question, we first need the following definition.
Definition 114. Let U ⊆ V . The annihilator of U is
U 0 = {φ ∈ V 0 | φ(u) = 0 for all u ∈ U }.
That is, the annihilator is all of the linear functionals which vanish on every vector in U .
Note that this definition holds for any subset U of V . We do not require that U is a subspace.
Note that U 0 is a subset of the dual space U 0 ⊆ V 0 . It will turn out that it is always a subspace.
Example 115. Let U be the subspace of P(R) consisting of all multiples of x2 . What are
some linear functionals that are in the annihilator of U ?
One example is the evaluation functional ev0 : P(R) → R. Any multiple of x2 is equal to 0 when evaluated
at 0. Another example is the linear functional φ : P(R) → R given by φ(p) = p0 (0).
So both ev0 , φ ∈ U 0 .
Proposition 116. Suppose U ⊆ V . Then U 0 is a subspace of V 0 .
Proof. To show that U 0 is subspace, we need to show it contains 0 (here, the zero vector of V 0 is
the zero functional V → F that sends every vector to 0) and that it is closed under addition and
scalar multiplication.
It is clear that 0 ∈ U 0 , since the zero functional applied to any vector in V is 0. So this is certainly
true for every vector in U .
Suppose now that φ, ψ ∈ U 0 .4 This means that for all u ∈ U , we have φ(u) = 0 and ψ(u) = 0.
Therefore, for every u ∈ U ,
(φ + ψ)(u) = φ(u) + ψ(u) = 0
and so φ + ψ ∈ U 0 . The proof for scalar multiplication follows similarly.
Example 117. Let’s consider R5 with its standard basis e1 , . . . , e5 . Take the dual basis
φ1 , . . . , φ5 of (R5 )0 .
If we let U = span(e1 , e2 ) (i.e., the xy-plane in R5 ), then what is U 0 ?
Well we certainly know three functionals on R5 that vanish on U , namely φ3 , φ4 , φ5 (φj is
defined to be 1 on ej and 0 on the other standard basis vectors). So φ3 , φ4 , φ5 ∈ U 0 . We claim
that in fact
U 0 = span(φ3 , φ4 , φ5 )
First, we show that U 0 ⊇ span(φ3 , φ4 , φ5 ). Every element of the right-hand side can be written
c3 φ3 + c4 φ4 + c5 φ5 . But then
(c3 φ3 + c4 φ4 + c5 φ5 )(a1 e1 + a2 e2 ) = 0
so every such functional vanishes on all of U .
Conversely, suppose that ψ ∈ U 0 . Because the dual basis is a basis of (R5 )∗ , then
ψ = c1 φ1 + c2 φ2 + c3 φ3 + c4 φ4 + c5 φ5 .
Then since ψ vanishes on U , it certainly vanishes on e1 . That is
0 = ψ(e1 ) = (c1 φ1 + c2 φ2 + c3 φ3 + c4 φ4 + c5 φ5 )(e1 ) = c1
so this implies that c1 = 0. Similarly, since ψ vanishes on e2 , we get that c2 = 0. Therefore,
ψ ∈ span(φ3 , φ4 , φ5 ), as desired.
So for any subset, the annihilator is a subspace. What if U itself is a subspace? Then U and U 0
are both subspaces. How are they related?
Proposition 118. Suppose V is finite-dimensional and U is a subspace of V . Then
dim U + dim U 0 = dim V.
4Note that we use the letters φ, ψ for vectors in a dual space to remind ourselves that they are linear maps.
Proof. Consider the inclusion map i ∈ L(U, V ) defined by i(u) = u for all u ∈ U . This is a
linear map which is defined since U is a subspace of V . Then, taking duals, we get a dual map
i0 ∈ L(V 0 , U 0 ). So by the Rank–Nullity Theorem applied to i0 , we have
dim range i0 + dim null i0 = dim V 0 .
But what is null i0 ? Well by definition i0 : V 0 → U 0 takes a linear functional φ ∈ V 0 to φ ◦ i ∈ U 0 .
So null i0 consists of all of those linear functionals φ such that φ ◦ i = 0 as a function U → F. But
since i is simply the inclusion map, null i0 consists of all those functionals φ ∈ V 0 such that φ(u) = 0
for every u ∈ U . And this is exactly the definition of U 0 .
Further, we know that dim V 0 = dim V , so we get
dim range i0 + dim U 0 = dim V.
Now, what is range i0 ? Again, by the definition of dualizing, range i0 consists of every functional
ψ ∈ U 0 such that ψ = φ ◦ i for some φ ∈ V 0 . In other words, range i0 is all of those maps in U 0 that
can be extended to a linear functional on V . But every linear functional ψ on U can be extended
to a linear functional φ on V , so range i0 = U 0 , which has the same dimension as U . Therefore, we
obtain the desired equality.
The next result explicitly characterizes the null space of the dual map. The null space of the dual
of T is the annihilator of the range of T .
Proposition 119. Suppose T ∈ L(V, W ). Then null T 0 = (range T )0 .
If, further, V and W are finite-dimensional, then
dim null T 0 = dim null T + dim W − dim V.
Proof. For the first claim, we need to prove two inclusions. So first, we show that null T 0 ⊆
(range T )0 . Suppose φ ∈ null T 0 . Then
0 = T 0 (φ) = φ ◦ T.
To say that this is the 0 functional is to say that for every v ∈ V ,
0 = (φ ◦ T )(v) = φ(T v).
This means that φ vanishes on every vector in range T (since T v ranges over all vectors in range T
as you take every vector v ∈ V ). Hence, φ ∈ range(T )0 .
To prove the reverse inclusion null T 0 ⊇ (range T )0 , suppose that φ ∈ (range T )0 . This means that
for every w ∈ range T , we have φ(w) = 0. But every vector w ∈ range T can be written as T v for
some vector(s) v ∈ V . Hence, for every v ∈ V , we have φ(T v) = 0. Then
0 = φ ◦ T = T 0 (φ)
so φ ∈ null T 0 . Combining this with the previous paragraph we get null T 0 = (range T )0 .
If both V and W are finite-dimensional, then we have
dim null T 0 = dim(range T )0                   (since null T 0 = (range T )0 )
            = dim W − dim range T               (Prop 118)
            = dim W − (dim V − dim null T )     (Rank–Nullity)
            = dim null T + dim W − dim V,
as desired.
Material for your midterm exam ends here. That is to say, your midterm (on
Wednesday 7/22) covers up through the middle of 3.F. On Monday, we will finish
up 3.F. On Friday we will cover chapter 4 and start chapter 5.
12. Monday 7/20: Duality (3.F)
As a corollary of the last result from last time, we prove the following pithy theorem.
Corollary 120. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then T is surjective
if and only if T 0 is injective.
Proof. We know T ∈ L(V, W ) is surjective if and only if range T = W .
Further, range T = W if and only if (range T )0 = {0}. Why is this true? Well if range T = W ,
then (range T )0 = W 0 and W 0 consists of all those linear functionals which vanish on all of W .
This is of course only the zero functional, so (range T )0 = {0}. Conversely, suppose range T ≠ W .
Take a basis v1 , . . . , vn of range T and extend to a basis v1 , . . . , vn , w1 , . . . , wm of all of W . Since
range T ≠ W , we have at least one wi . We can then construct a linear functional W → F which
vanishes on v1 , . . . , vn but does not vanish on the w’s. This linear functional is a nonzero element
of (range T )0 . [This is the step in the proof that uses finite-dimensionality, since we need to choose
a basis.]
Therefore, by the previous theorem null T 0 = (range T )0 = {0}. But this is equivalent to T 0 being
injective.
Just as we were able to characterize the null space of a dual map, we have similar results for the
range of a dual map.
Proposition 121. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then range T 0 =
(null T )0 .
Further, dim range T 0 = dim range T .
Proof. We will prove the second statement first, and then use it in the proof of the first statement.
We have
dim range T 0 = dim W 0 − dim null T 0        (Rank–Nullity)
             = dim W − dim(range T )0         (Prop 119 and 107)
             = dim range T .                  (Prop 118)
For the first statement, we show range T 0 ⊆ (null T )0 . Suppose φ ∈ range T 0 . This means that there
exists ψ ∈ W 0 such that φ = T 0 (ψ). Now if v ∈ null T , then
φ(v) = (T 0 (ψ))(v) = (ψ ◦ T )(v) = ψ(T v) = ψ(0) = 0
and so φ ∈ (null T )0 .
But now, since range T 0 ⊆ (null T )0 , if we show that they have the same dimension, then they must
actually be equal. But note that, by the computation above,
dim range T 0 = dim range T
             = dim V − dim null T        (Rank–Nullity)
             = dim(null T )0 .           (Prop 118)
We also have the analogue of the theorem that said that surjectivity of T is equivalent to injectivity
of T 0 .
Proposition 122. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then T is injective
if and only if T 0 is surjective.
Proof. The map T ∈ L(V, W ) is injective if and only if null T = {0}.
But null T = {0} if and only if (null T )0 = V 0 . Why is this? Well if null T = {0}, then every linear
functional on V vanishes on null T . Hence, (null T )0 = V 0 . On the other hand, if (null T )0 = V 0 ,
this means that every linear functional on V vanishes on null T . If null T is nontrivial, then it
contains a nonzero vector v. Extend to a basis of V and define a linear functional which maps v
to 1 and vanishes on the rest of the basis. Then this linear functional does not vanish on null T .
Hence, null T must be trivial.
Now by Proposition 121, this happens if and only if range T 0 = V 0 , which is the definition of T 0
being surjective.
Finally, suppose V and W are finite-dimensional vector spaces. If T : V → W is a linear map with
dual T 0 : W 0 → V 0 , we can ask about the matrix of T 0 . How is it related to the matrix of T ?
Well we can take bases {v1 , . . . , vn } and {w1 , . . . , wm } of V and W . Then M(T ) is an m×n matrix.
We can also take dual bases {φ1 , . . . , φn } and {ψ1 , . . . , ψm } of V 0 and W 0 . Since T 0 : W 0 → V 0 , its
matrix will be n × m. It turns out that with respect to the dual bases, M(T 0 ), the matrix of T 0 is
the transpose of M(T ). For a matrix A, we denote its transpose At .
Proposition 123. Assume the set-up above. Then M(T 0 ) = M(T )t .
Proof. This is yet again an exercise in definitions and notation. Let A = M(T ) and C = M(T 0 ).
The entries Cl,k of C are determined by what T 0 does to the ψk ’s in terms of the φl ’s. In
particular, for each 1 ≤ k ≤ m,
$$T'(\psi_k) = \sum_{l=1}^{n} C_{l,k} \varphi_l.$$
Now T 0 (ψk ) is an element of V 0 , so we can evaluate it on vj to see
$$T'(\psi_k)(v_j) = \left(\sum_{l=1}^{n} C_{l,k} \varphi_l\right)(v_j) = C_{j,k}.$$
Meanwhile, we also have
$$T'(\psi_k)(v_j) = (\psi_k \circ T)(v_j) = \psi_k\left(\sum_{l=1}^{m} A_{l,j} w_l\right) = A_{k,j}.$$
Therefore the (j, k) entry of C is given by the (k, j) entry of A, so the two matrices are transpose
to one another.
The Rank of a Matrix.
Now that we have built up all of this technology with duality, we apply it to something concrete.
You may remember in Math 308 learning that given a matrix A, you can consider the column
space col(A), row space row(A), and null space null A. The matrix A is also related to a linear
transformation T and range(T ) = col(A) while null(T ) = null(A). But curiously, you didn’t talk
much about the row space. How is the row space related to everything? Well we know that
row(A) = col(At ). And we just saw that At is the matrix of the dual map T 0 . So the row space is
the range of the dual of T .
Definition 124. Suppose A ∈ Fm,n is an m × n matrix. The row rank of A is the dimension of the
span of the rows of A in F1,n . The column rank of A is the dimension of the span of the columns
of A in Fm,1 .
The next result is one from Math 308, and you can read the abstract proof in the book.
Proposition 125. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then dim range T
is equal to the column rank of M(T ).
Using what we proved about duality, we can show that the row rank and column rank of a matrix
are equal.
Proposition 126. Suppose A ∈ Fm,n . Then the row rank of A equals the column rank of A.
Proof. We define a map T : Fn → Fm given by T (x) = Ax so that M(T ) = A. Then the column
rank of A is equal to dim range T .
Note that the row rank of A is equal to the column rank of At . So the row rank of A is equal to
the column rank of M(T )t = M(T 0 ), which equals dim range T 0 .
But we proved that dim range T = dim range T 0 , so the column rank and row rank of A are equal.
Since the row rank of a matrix always equals its column rank, we can define the rank of a matrix
to just be either of them.
Definition 127. The rank of a matrix A ∈ Fm,n is the column rank of A.
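Proposition 126 is easy to spot-check numerically; for instance (using numpy, my own choice of tool), with the matrix from the Gaussian elimination example earlier:

```python
import numpy as np

A = np.array([[1, 3, 3, 2],
              [2, 6, 9, 7],
              [-1, -3, 3, 4]])

# Column rank of A and column rank of A^t (= row rank of A) agree.
print(np.linalg.matrix_rank(A))     # 2
print(np.linalg.matrix_rank(A.T))   # 2
```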
Wednesday 7/22: You will have your first midterm which you will both download from and
submit to Gradescope.
Friday 7/24: We will quickly cover some necessary results about polynomials in Chapter 4 and
then talk about polynomials applied to operators in 5.B.
Next week: 5.A, 5.B, 5.C.
13. Wednesday 7/22: Midterm Exam
Today you took your midterm exam.
14. Friday 7/24: Polynomials (Chapter 4) and Operators (5.B)
Chapter 4 in your textbook is on properties of polynomials, starting from the basic definitions and
covering the fundamental facts about roots, factorization, polynomial division, etc. It is good that
this is all in your textbook, in case you want to brush up on how to work with polynomials, but I
will assume that you know most of the basic tools in working with polynomials.
I will cover only some of the facts and tools that we will need that are a bit more advanced.
Theorem 128 (The Division Algorithm). Given polynomials p, s ∈ P(F) with s 6= 0, there exist
unique polynomials q, r ∈ P(F) such that
p = qs + r
and deg r < deg s.
Remark. By convention, we say that deg(0) = −∞, so it has smaller degree than every polynomial.
This is called the division algorithm because it says that you can divide p by s. The quotient is
q, and the remainder is r. In Math 300, you learned this statement for integers. The difference is
that for the integers, we had r < s. For polynomials, we have deg r < deg s.
Example 129. Let’s suppose that p = 2x3 +3x2 −x+1 and s = x−1. The method you learned
in pre-calculus for long-division of polynomials yields the quotient and remainder guaranteed
by the theorem. If you carry out the computation, the quotient is q = 2x2 + 5x + 4 and the
remainder is r = 5. You can then verify that
2x3 + 3x2 − x + 1 = (2x2 + 5x + 4)(x − 1) + 5.
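If you would rather not do the long division by hand, sympy (my own choice of tool) will produce the quotient and remainder from Example 129:

```python
import sympy as sp

x = sp.symbols('x')
p = 2*x**3 + 3*x**2 - x + 1
s = x - 1

q, r = sp.div(p, s, x)          # polynomial division with remainder
print(q, r)                     # 2*x**2 + 5*x + 4    5
assert sp.expand(q*s + r - p) == 0
```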
Proof. Let deg p = n and deg s = m.
If n < m, then if p = qs + r, we must have that q = 0 and hence r = p. These are the unique
polynomials satisfying the property that we want.
So suppose n ≥ m. Define a linear map
T : Pn−m (F) × Pm−1 (F) → Pn (F)
by
T (q, r) = qs + r.
You should check that this map is linear. We will in fact show that T is an isomorphism.
If (q, r) ∈ null T then this means that qs + r = 0. But this implies that q = 0 since if q ≠ 0 then
deg(qs) ≥ deg(s) = m > deg(r). Since r has strictly smaller degree than qs, there is no way for
qs + r = 0 except to have q = 0. But this also forces r = 0. So null T = {0}. Hence T is injective.
Further,
dim (Pn−m (F) × Pm−1 (F)) = dim(Pn−m (F))+dim(Pm−1 (F)) = (n−m+1)+m = n+1 = dim Pn (F).
So T is an injection between two finite-dimensional vector spaces of the same dimension, which
means it must be an isomorphism.
We also have the Fundamental Theorem of Algebra5, which you have hopefully seen before.
Theorem 130 (Fundamental Theorem of Algebra). Let p ∈ P(C) be a nonconstant polynomial.
Then there exists λ ∈ C such that p(λ) = 0.
In other words, every nonconstant polynomial has a complex root.
This has an important corollary.
Corollary 131. Given p(z) ∈ P(C) of degree d > 0, there exist c, λ1 , . . . , λd ∈ C such that
p(z) = c(z − λ1 ) . . . (z − λd )
So λ1 , . . . , λd are the roots of p.
Proof. The result is clear if deg(p) = 1, for every linear polynomial is of the form αz + β =
α(z + β/α). So c = α and λ1 = −β/α.
Suppose deg(p) ≥ 2. By the Fundamental Theorem of Algebra, p(z) has a complex root λ1 . By
the Division Algorithm, we can write
p(z) = (z − λ1 )q(z) + r(z)
where deg(r(z)) < deg(z − λ1 ) = 1. So in fact r(z) is a constant polynomial. But since p(λ1 ) = 0,
this means that r(z) = 0.
Now p(z) = (z − λ1 )q(z) and q is a polynomial of degree d − 1. Use induction.
Remark. The factorization given in the corollary is unique up to reordering the factors.
5I am a professional algebraist, and I’m not sure that this deserves to be called the Fundamental Theorem of Algebra.
To me this theorem has a pretty analytic flavor, although the roots of modern group theory can be traced back to
understanding the behavior of roots of polynomials.
Polynomials Applied to Operators (part of 5.B)
One new way that we will use polynomials in this class is that we will apply them to operators.
Normally, if you have a polynomial like p(z) = z 2 + 3, you apply the polynomial to a number so,
e.g., p(4) = 42 + 3 = 19. However, we will be plugging operators into polynomials.
Recall that L(V ) denotes the set of linear operators on a vector space V .
Definition 132. Suppose T ∈ L(V ) and m is a positive integer.
• We define T m = T ◦ · · · ◦ T (the composition of T with itself m times). This makes sense since T : V → V , so you can compose T with itself.
• T 0 is defined to be the identity operator I on V .
• If T is invertible with inverse T −1 , then T −m is defined to be T −m = (T −1 )m .
It is clear that if T is an operator then T m T n = T m+n and (T m )n = T mn . Since we have a
well-behaved notion of the “power” of an operator, we can substitute operators into polynomials.
Definition 133. Suppose T ∈ L(V ) and p ∈ P(F) is a polynomial given by
p(z) = a0 + a1 z + · · · + am z m .
Then p(T ) is the operator defined by
p(T ) = a0 I + a1 T + · · · + am T m .
Again, here we are taking powers of T (which also give operators) and taking a linear combination
(and a linear combination of operators is an operator). Hence, p(T ) is a well-defined operator.
Example 134. Let D ∈ L(P(R)) denote the differentiation operator (as always) so D(q) = q 0 .
Let p ∈ P(R) be the polynomial defined by p(x) = 3 + 2x2 − x3 . Then p(D) is a new operator
p(D) = 3I + 2D2 − D3 .
What does this operator do to a polynomial?
p(D)(q) = 3q + 2q 00 − q 000
for all q ∈ P(R).
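Here is a small sympy sketch (my own, with a sample polynomial q chosen for illustration) of Example 134, applying p(D) = 3I + 2D^2 − D^3 to a polynomial:

```python
import sympy as sp

x = sp.symbols('x')
D = lambda q: sp.diff(q, x)                  # differentiation operator on P(R)

# p(D) = 3I + 2D^2 - D^3 applied to q, i.e. 3q + 2q'' - q'''.
p_of_D = lambda q: 3*q + 2*D(D(q)) - D(D(D(q)))

q = x**4 + x                                 # a sample polynomial (my choice)
print(sp.expand(p_of_D(q)))                  # 3*x**4 + 24*x**2 - 21*x
```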
Proposition 135. Suppose p, q ∈ P(F) and let T ∈ L(V ). Then
(1) (pq)(T ) = p(T )q(T ).
(2) p(T )q(T ) = q(T )p(T ).
Remark. In the first equation, the left-hand side means that you multiply the polynomials p and
q to get the new polynomial pq. Then plug the operator into this new polynomial to get (pq)(T ),
an operator. The right-hand side says to plug T into p and q to get two operators p(T ), q(T ). You
can then compose these to get p(T )q(T ). So this says that if you plug an operator into a product
of polynomials, then it’s the same thing as composing the corresponding operators.
For the second, it says that any operators given by polynomials in T commute. Note that almost
always L(V ) has non-commutative composition. The order in which you perform operators definitely matters. But in the case that your operators are all polynomials in a single operator T , then
the order does not matter.
Proof. Note that (1) will imply (2). If we have (1), then
p(T )q(T ) = (pq)(T ) = (qp)(T ) = q(T )p(T )
since multiplication of polynomials is commutative. Hence, we need only show (1).
To show (1), write $p(z) = \sum_{j=0}^{m} a_j z^j$ and $q(z) = \sum_{k=0}^{n} b_k z^k$. Then
$$(pq)(z) = \sum_{j=0}^{m} \sum_{k=0}^{n} a_j b_k z^{j+k}.$$
Hence,
$$p(T)q(T) = \left(\sum_{j=0}^{m} a_j T^j\right)\left(\sum_{k=0}^{n} b_k T^k\right) = \sum_{j=0}^{m} \sum_{k=0}^{n} a_j b_k T^{j+k} = (pq)(T).$$
We will see that thinking of polynomials applied to operators will be useful in the rest of chapter
5. But next we jump back to section 5.A.
Invariant Subspaces (5.A)
Your book gives a nice little introduction to 5.A. Suppose you want to study an operator T ∈ L(V ).
Further suppose that we can decompose V as a direct sum
V = U1 ⊕ · · · ⊕ Um .
Well then to understand what T does to V , you only need to understand what T does to each
Uj —that is, you only need to understand each restriction T |Uj .
The restriction is defined on a smaller vector space, so should be easier to understand. However,
the restriction T |Uj is now a linear map in L(Uj , V ). It need not send vectors in Uj only to vectors
in Uj . Therefore, even though T |Uj is simpler, it may not be an operator, so we lose the ability to
take powers.
We are thus naturally led to try to understand the notion of a subspace of a vector space that T
actually maps to itself.
Definition 136. Suppose T ∈ L(V ). A subspace U of V is called invariant under T if u ∈ U
implies T u ∈ U .
In other words, U is invariant under T if T |U is an operator on U .
Example 137. Let V = R2 , U1 = span(e1 ) (the x-axis) and U2 = span(e2 ) (the y-axis). Then
V = U1 ⊕ U2 .
"
Consider the operator T on R2 given by the matrix
2 1
0 1
#
. That is, T (e1 ) = (2, 0) and
T (e2 ) = (1, 1). This is a perfectly good operator on R2 .
Note that every vector in U1 is of the form (x, 0). Then T (x, 0) = (2x, 0) ∈ U1 . So U1 is an
invariant subspace of T and T |U1 is an operator on U1 .
On the other hand, every vector in U2 is of the form (0, y) and T (0, y) = (y, y) ∉ U2 . So when
restricted to U2 , T |U2 is just a linear map U2 → V , not an operator.
15. Monday 7/27: Eigenvalues, Eigenvectors, and Invariant Subspaces (5.A–B)
Recall the definition of an invariant subspace.
Definition 138. Suppose T ∈ L(V ). A subspace U of V is called invariant under T if u ∈ U
implies T u ∈ U .
In other words, U is invariant under T if T |U is an operator on U .
Example 139. Suppose T ∈ L(V ). Here are some first basic examples of invariant subspaces.
(1) The zero subspace {0} is invariant under T . This is because T (0) = 0, so T maps every
vector in {0} to a vector in {0}.
(2) The whole space V is invariant under T . For each v ∈ V , we certainly have T v ∈ V , since
T is an operator on V .
(3) The null space of T , null T is invariant under T . Suppose v ∈ null T . Then T v = 0.
Therefore T (T v) = T (0) = 0. So T v ∈ null T , as well.
(4) The range of T , range T is invariant under T . Suppose v ∈ range T . Then since v ∈ V , we
have T v ∈ range T , as well.
Example 140. Consider the differentiation operator D : P(R) → P(R) given by Dp = p0 .
We have that P3 (R) is a subspace of P(R) and if p ∈ P3 (R) then Dp ∈ P3 (R), since if p is
a polynomial of degree ≤ 3 then its derivative is a polynomial of degree ≤ 3, as well. Hence,
P3 (R) is invariant under D.
In fact any Pm (R) is invariant under D.
If U is a subspace of V which is invariant under T ∈ L(V ), then when we restrict T to U , since
T u ∈ U for every u ∈ U , the restriction yields an operator on U . (As mentioned earlier, this
would not be true if U were not invariant, since then the restriction would just yield a linear map
T |U : U → V , which is not an operator.)
Definition 141. Suppose T ∈ L(V ) and U is a subspace of V which is invariant under T .
The restriction operator T |U ∈ L(U ) is defined by
T |U (u) = T u
for u ∈ U .
Note that the restriction of a linear map to a subspace is always defined; it just may not always be
an operator.
Example 142. Let T ∈ L(V ) and let U = span(v) be a one-dimensional subspace of V . Then
U is invariant under T if and only if T v = λv for some λ ∈ F.
This example explains why this section is called eigenvalues, eigenvectors, and invariant subspaces.
Definition 143. Suppose T ∈ L(V ). A scalar λ ∈ F is an eigenvalue of T if there exists v ∈ V
such that v 6= 0 and T v = λv.
Any such vector v is called an eigenvector corresponding to the eigenvalue λ.
Proposition 144. Suppose V is finite-dimensional, T ∈ L(V ) and λ ∈ F. The following are
equivalent:
(1) λ is an eigenvalue of T .
(2) T − λI is not injective.
(3) T − λI is not surjective.
(4) T − λI is not invertible.
Proof. The first two are equivalent since T v = λv is equivalent to
0 = T v − λv = (T − λI)v,
so λ is an eigenvalue of T if and only if there is a nonzero vector in the null space of T − λI, i.e., if
and only if T − λI is not injective.
Since V is finite-dimensional, by Proposition 97 the last three are all equivalent.
For a fixed eigenvalue, we can look at all of the eigenvectors that have that eigenvalue.
Definition 145. Let T ∈ L(V ) and let λ ∈ F be an eigenvalue of T . The eigenspace corresponding
to λ is defined by
E(λ, T ) = null(T − λI).
That is, E(λ, T ) is the set of all eigenvectors of T corresponding to λ, along with the 0 vector.
We next show that if eigenvectors have distinct eigenvalues, then they are linearly independent.
Proposition 146. Let T ∈ L(V ). Suppose λ1 , . . . , λm are distinct eigenvalues of T and v1 , . . . , vm
are corresponding eigenvectors. Then v1 , . . . , vm is linearly independent.
Proof. We argue by contradiction: suppose that v1 , . . . , vm is linearly dependent.
We know that there is some k such that vk ∈ span(v1 , . . . , vk−1 ). Choose the smallest such k. (So
we will have that v1 , . . . , vk−1 is linearly independent, but vk is in their span).
We can therefore write
vk = a1 v1 + · · · + ak−1 vk−1 .
Now apply T to both sides of this equation to get
λk vk = a1 λ1 v1 + · · · + ak−1 λk−1 vk−1 .
But we also have
λk vk = a1 λk v1 + · · · + ak−1 λk vk−1 .
Subtracting these two equations yields
0 = a1 (λk − λ1 )v1 + · · · + ak−1 (λk − λk−1 )vk−1 .
Since v1 , . . . , vk−1 is linearly independent, this forces all of the coefficients aj (λk − λj ) in the previous
equation to be 0. Since the eigenvalues are distinct, each λk − λj ≠ 0, so we must have that a1 = · · · = ak−1 =
0. But then vk = 0, contradicting the hypothesis that vk is an eigenvector.
As a corollary, we can see that an operator on a finite-dimensional vector space can only have so
many eigenvalues.
Corollary 147. Suppose V is finite-dimensional. Then each operator on V has at most dim V
distinct eigenvalues.
Proof. If λ1 , . . . , λm are distinct eigenvalues, then they have corresponding eigenvectors v1 , . . . , vm .
By the previous result, these are linearly independent. So there can be at most dim V of them.
However, not every operator needs to have an eigenvalue.
Remark. Fix an angle 0 < θ < π. Consider the operator T : R2 → R2 given by rotation by angle
θ. You may remember that with respect to the standard basis on R2 ,
$$M(T) = \begin{bmatrix} T e_1 & T e_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
Then by thinking about the geometry, it is clear that no nonzero vector gets scaled by T . So this
operator has no eigenvalues or eigenvectors.
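Numerically, the “missing” eigenvalues show up as non-real complex numbers: the rotation matrix has eigenvalues cos θ ± i sin θ, neither of which is real when 0 < θ < π. A numpy sketch (numpy and the particular angle are my own choices):

```python
import numpy as np

theta = np.pi / 3                     # any angle with 0 < theta < pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigvals = np.linalg.eigvals(R)
print(eigvals)                        # approximately 0.5 + 0.866j and 0.5 - 0.866j
print(np.isreal(eigvals))             # [False False]: no real eigenvalues
```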
So in general, an operator on V can have up to dim V eigenvalues, but it might have none. However,
over C, it is true that every operator has an eigenvalue.
Theorem 148. Let F = C and let V be a finite-dimensional vector space. Then every operator on
V has an eigenvalue.
The traditional proof of this fact that you learn as an undergraduate uses the determinant. However,
Axler feels strongly against using the determinant. A quote from the introduction: “Determinants
are difficult, nonintuitive, and often defined without motivation. [...] This tortuous
path gives students little feeling for why eigenvalues exist.”
Traditional proof. The determinant det(T − xI) is a polynomial in x of degree dim V > 0 and hence
has a root by the Fundamental Theorem of Algebra. If λ ∈ C is such a root, then T − λI is not
invertible, so λ is an eigenvalue of T .
Determinant-free proof. Suppose V is a complex vector space with dimension n > 0 and T ∈ L(V ).
Let v ∈ V be any nonzero vector. Consider the n + 1 vectors
v, T v, . . . , T n v.
Since dim V = n, they are linearly dependent. Then there exist scalars a0 , . . . , an ∈ F not all zero such that
a0 v + a1 T v + · · · + an T n v = 0.
Let p ∈ P(C) be the polynomial
p(z) = a0 + a1 z + · · · + an z n = c(z − λ1 ) . . . (z − λm ),
where the factorization comes from the Fundamental Theorem of Algebra. (Note that p is not constant: if a1 = · · · = an were all zero, then a0 v = 0 with a0 ≠ 0 would force v = 0.) Now we have
0 = (a0 I + a1 T + · · · + an T n )(v) = c(T − λ1 I) . . . (T − λm I)(v).
Since c ≠ 0 and v ≠ 0, this means that for some j, T − λj I is not injective. Hence, there is a nonzero vector w such that
(T − λj I)(w) = 0, so w is an eigenvector of T with eigenvalue λj .
16. Wednesday 7/29: Upper-Triangular Matrices and Diagonal Matrices (5.B–C)
We have already discussed the matrix of a linear map T : V → W when V and W are finite-dimensional with a choice of fixed basis.
Since an operator is just a linear map T : V → V , if V is finite-dimensional, then we can also talk
about the matrix of an operator. But since the domain and codomain are the same, the matrix will
be square. Also, we will choose the same basis of V in the domain and the codomain by convention.
Definition 149. Suppose T ∈ L(V ) and v1 , . . . , vn is a basis of V . The matrix of T with respect
to this basis is the n × n matrix
$$M(T) = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{n,1} & \cdots & A_{n,n} \end{bmatrix}$$
where the entries Aj,k of M(T ) are defined by
T vk = A1,k v1 + · · · + An,k vn .
(The kth column of the matrix tells you how to write T vk in terms of the basis).
As always, if V = Fn and no basis is specified, we assume that the basis being used is the standard
basis.
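As a concrete illustration, here is a small sketch (assuming numpy; the operator T, the helper name matrix_of_operator, and the example basis are all mine, not the book's) of how the columns of M(T ) can be computed when V = Fn : column k holds the coordinates of T (vk ) with respect to the chosen basis.

    import numpy as np

    def matrix_of_operator(T, basis):
        """M(T) with respect to `basis`: column k holds the coordinates of T(v_k)."""
        B = np.column_stack(basis)
        # Solving B a = T(v_k) recovers the coordinates of T(v_k) in the given basis.
        return np.column_stack([np.linalg.solve(B, T(v)) for v in basis])

    # Example: T(x, y) = (y, x) with respect to the basis (1, 1), (1, -1).
    T = lambda v: np.array([v[1], v[0]])
    print(matrix_of_operator(T, [np.array([1.0, 1.0]), np.array([1.0, -1.0])]))

Since (1, 1) and (1, −1) are eigenvectors of this T with eigenvalues 1 and −1, the matrix comes out diagonal, which previews the discussion of diagonalization below.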
Definition 150. The diagonal (or main diagonal ) of a square matrix consists of the entries Ai,i ,
that is, the entries along the line from the upper-left corner to the bottom-right corner.
A matrix is called upper-triangular if all the entries below the main diagonal equal 0. A matrix is
called lower-triangular if all the entries above the main diagonal equal 0.
A matrix is called diagonal if its only non-zero entries occur along the main diagonal (if and only
if it is both upper-triangular and lower-triangular).
Here, sometimes we specify main diagonal to distinguish this from the diagonal line from the
upper-right to the bottom-left. This other diagonal is sometimes called the antidiagonal.
A key idea: as you change the basis of V , the matrix of the linear operator changes. We might
want to choose a basis of V so that the matrix of T has a particularly nice or simple form. This
was the idea behind diagonalization in Math 308, although you maybe didn’t think about it this
way at the time.
Definition 151. We say T ∈ L(V ) is diagonalizable if there exists a basis of V such that M(T ) is
diagonal with respect to this basis.
Imagine that v is an eigenvector T with eigenvalue λ. Then we can choose a basis that has v as its
first vector. What is the matrix of M(T )? Well the first column will be determined by T v, written
with respect to our basis. But since T v = λv and v is in our basis, the first column with respect
to this basis will be (λ, 0, . . . , 0)t . Similarly, if w is not a scalar multiple of v and is an eigenvector
with eigenvalue µ, then if we choose w to be the second basis element, then the second column will
be (0, µ, 0, . . . , 0)t . So, if we could find a basis of linearly independent eigenvectors, then M(T )
with respect to this basis would be diagonal! This idea is captured by the following result.
Proposition 152. Suppose T ∈ L(V ) and v1 , . . . , vn is a basis of V . Then the following are
equivalent:
(1) the matrix of T with respect to v1 , . . . , vn is diagonal;
(2) vj is an eigenvector of T for each 1 ≤ j ≤ n;
Proof. The discussion above is essentially the proof (combined with induction). In more detail:
if M(T ) is diagonal, say
M(T ) = [ λ1             ]
        [      ⋱         ]
        [            λn  ]
(all entries off the main diagonal being 0), then the jth column of M(T ) says that
T vj = 0v1 + · · · + λj vj + · · · + 0vn
and so vj is an eigenvector of T .
Conversely, if each vj is an eigenvector of T , then T (vj ) = λj vj for some λj ∈ F. Then it is clear
that M(T ) with respect to this basis of eigenvectors will be the diagonal matrix with diagonal
entries λ1 , . . . , λn .
This says that T is diagonalizable if and only if V admits a basis consisting of eigenvectors of T .
However, as you may remember from Math 308, not every operator is diagonalizable, even over C.
Example 153. Let T : C2 → C2 be given by T (x, y) = (0, x). Suppose that v = (v1 , v2 ) is an
eigenvector of T with eigenvalue λ. Note that
T v = T (v1 , v2 ) = (0, v1 )
T 2 v = T (0, v1 ) = (0, 0).
So if T v = λv, then T 2 v = λ2 v = 0 so λ = 0. This means that 0 is the only possible eigenvalue
for T . But now
E(0, T ) = null(T − 0I) = null(T ) = span((0, 1)).
which is one-dimensional. Hence, it is not possible to find a basis of C2 consisting of eigenvectors
of T , whence T is not diagonalizable.
We do have a sufficient (though not necessary) condition that ensures diagonalizability.
Proposition 154. Suppose dim V = n. If T ∈ L(V ) has n distinct eigenvalues, then T is diagonalizable.
Proof. We can choose eigenvectors v1 , . . . , vn which correspond to the n distinct eigenvalues. We
proved that these vectors are linearly independent, and hence form a basis of V . Therefore, with
respect to this eigenbasis, M(T ) is diagonal.
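Here is a quick numerical check of Proposition 154 (assuming numpy; the matrix is an arbitrary example, not one from the text): an operator with dim V distinct eigenvalues becomes diagonal in a basis of eigenvectors.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])             # distinct eigenvalues 2 and 3
    eigenvalues, P = np.linalg.eig(A)      # columns of P are eigenvectors
    print(np.linalg.inv(P) @ A @ P)        # ~ diagonal, with the eigenvalues on the diagonal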
Being diagonalizable is a very strong notion. We also have the weaker notion of upper-triangularity6.
What does it mean if the matrix of an operator is upper-triangular?
Proposition 155. Suppose T ∈ L(V ) and v1 , . . . , vn is a basis of V . Then the following are
equivalent:
(1) the matrix of T with respect to v1 , . . . , vn is upper-triangular;
(2) T vj ∈ span(v1 , . . . , vj ) for each 1 ≤ j ≤ n;
(3) span(v1 , . . . , vj ) is invariant under T for each 1 ≤ j ≤ n.
Proof. Parts (1) and (2) are equivalent from the definition of M(T ). The jth column of M(T )
contains the coefficients of T vj when written as a linear combination of the vi ’s. So if the matrix
is upper triangular, this means that the coefficients on vj+1 , . . . , vn are all 0, and so T vj can be
written as a linear combination of v1 , . . . vj . So we need only prove that (2) and (3) are equivalent.
Suppose (3) holds and let 1 ≤ j ≤ n. By hypothesis, span(v1 , . . . , vj ) is invariant under T . This
means that any vector in span(v1 , . . . , vj ) stays in span(v1 , . . . , vj ) after you apply T . In particular,
T vj ∈ span(v1 , . . . , vj ). This holds for all 1 ≤ j ≤ n, so (3) implies (2).
6Upper-triangularness?
Finally, suppose (2) holds and let 1 ≤ k ≤ n. By hypothesis, for every j, we have T vj ∈
span(v1 , . . . , vj ). So
T v1 ∈ span(v1 ) ⊆ span(v1 , . . . , vk )
T v2 ∈ span(v1 , v2 ) ⊆ span(v1 , . . . , vk )
..
.
T vk ∈ span(v1 , v2 , . . . , vk ).
Any v ∈ span(v1 , . . . , vk ) can be written as a linear combination of v1 , . . . , vk so T v ∈ span(v1 , . . . , vk ).
This shows that span(v1 , . . . , vk ) is invariant under T .
We know that not every matrix is diagonalizable. This means that given an operator T ∈ L(V ), it
is not always possible to find a basis of V so that M(T ) is diagonal. However, over C, it is always
possible to find a basis of V so that M(T ) is upper-triangular.
Theorem 156. Suppose V is a finite-dimensional vector space over C and T ∈ L(V ). Then T has
an upper-triangular matrix with respect to some basis of V .
Proof. We prove this by induction on dim V . If V is one-dimensional, then V = span(v) for some
nonzero vector v. And T v ∈ V so T v = λv for some λ ∈ C. The matrix of T with respect to the basis v is just [λ], which is
upper-triangular.
So suppose dim V > 1 and the result holds for all vector spaces of smaller dimension. Let T ∈ L(V ).
By Theorem 148, T has an eigenvalue λ. This means that T − λI is not invertible. Let
U = range(T − λI).
Since T − λI is not invertible, dim U < dim V .
Claim. U is invariant under T . To see this, suppose that u ∈ U . Then
T u = T u − λu + λu = (T − λI)(u) + λu.
We have (T −λI)(u) ∈ range(T −λI) = U and λu ∈ U (since U is closed under scalar multiplication).
Hence, T u ∈ U . This proves the claim.
Since U is invariant under T , the restriction T |U is an operator on U . Since dim U < dim V , by our
inductive hypothesis, there is a basis u1 , . . . , um of U such that M(T |U ) is upper-triangular with
respect to this basis. Therefore, for each 1 ≤ j ≤ m,
T uj = (T |U )(uj ) ∈ span(u1 , . . . , uj ).
We know we can extend our basis of U to a basis u1 , . . . , um , v1 , . . . , vn of V . Now note that for
each 1 ≤ k ≤ n,
T vk = (T − λI)vk + λvk .
Since U = range(T − λI), this shows that T vk ∈ span(u1 , . . . , um , vk ). And therefore, for each k,
T vk ∈ span(u1 , . . . , um , v1 , . . . , vk ).
By the previous proposition, this shows that M(T ) is upper-triangular with respect to this basis.
One benefit of finding a basis of V such that M(T ) has a nice form is that perhaps the matrix
can tell you properties of the linear map. For example, for a general linear map/matrix, it is not
obvious whether the map is invertible. It is also not easy to read off the eigenvalues. However, if
M(T ) is upper-triangular, it becomes easy to see.
Proposition 157. Suppose T ∈ L(V ) has an upper-triangular matrix with respect to some basis
of V . Then T is invertible if and only if all the entries on the diagonal are nonzero.
The eigenvalues of T are the diagonal entries of M(T ).
17. Friday 7/31: Inner Product Spaces (6.A)
Probably the first time you learned about vectors was in a math class or physics class long ago.
Back then, you were young and naive, and your teacher or professor may have defined a vector as
“something with a magnitude and a direction”. They were lying to you.
Now that you’re older and mathematically mature, I can tell you that a vector is... an element of a
vector space7. And a vector space is characterized by its axioms. Certainly in Rn the vectors have
magnitude and direction. But you’re using the Euclidean notion of length that relies on properties
of R. There are lots of other fields out there in the wide world. If F is any field, you can always
consider vector spaces over F. But sometimes there will be no notion of magnitude or direction of
vectors. (After all, magnitude and direction are not built into our axioms of a vector space).
On the other hand, if we are in the realm of Rn , then vectors have such a natural notion of magnitude
and direction that surely studying those properties can teach us more about those spaces. So in
this chapter, we’ll consider things like length and angle, and also generalize the key properties so
that maybe we can apply them to other vector spaces. This can be useful. For example, quantum
physics is really the study of certain Hilbert spaces (which are vector spaces which admit an inner
product + some other properties).
Definition 158. The Euclidean norm (or length) of a vector x = (x1 , . . . , xn ) ∈ Rn is
kxk = √(x1² + · · · + xn²) = √(x · x).
Here, the · means the dot product, since it is the product of two vectors (well, one vector with
itself) in Rn . We haven’t defined the dot product yet, though so:
Definition 159. The dot product of x, y ∈ Rn is
x · y = x1 y1 + · · · + xn yn
where x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ).
Note that the dot product is only defined between two vectors living in the same Rn , and the dot
product of those two vectors is a number, not a vector. Based on the two definitions above, it is
clear that x · x = kxk2 for any x ∈ Rn .
The dot product also satisfies:
7Not the most enlightening definition, but really the only correct definition.
• x · x ≥ 0 for all x ∈ Rn ;
• x · x = 0 if and only if x = 0;
• for a fixed vector y ∈ Rn , the map Rn → R mapping x ∈ Rn to x · y ∈ R is linear;
• x · y = y · x for all x, y ∈ Rn (the dot product is commutative).
The fact that x · x = kxk2 shows us that the dot product is intimately connected to the notion of
length in Rn . Indeed, you may also remember from Calc 3 that x · y = kxk kyk cos θ where θ is the
angle between x and y. So the dot product somehow knows about both length and angles in Rn .
We would like to abstract the properties of a dot product so that we can generalize. There may be
other notions of angle and length that may be useful, and we’d like to study those. In particular,
the notion of dot product is not so useful on infinite-dimensional vector spaces. But first, we should
consider the other case that we’re very comfortable with: Euclidean space over C. We’ll see that
actually there is a subtlety here that we won’t see if we only think about R.
Recall that if λ ∈ C and we write λ = a + bi for a, b ∈ R, then
• the absolute value of λ, denoted |λ|, is defined by |λ| = √(a² + b²);
• the complex conjugate of λ, denoted λ̄, is λ̄ = a − bi;
• |λ|² = λλ̄.
With these basics in place, we can define a norm on Cn . If z = (z1 , . . . , zn ) ∈ Cn then
kzk = √(|z1 |² + · · · + |zn |²).
We need to take absolute values because we want norms to be nonnegative real numbers. Hence,
kzk² = |z1 |² + · · · + |zn |² = z1 z̄1 + · · · + zn z̄n .
So if w, z ∈ Cn , we should define the product to be
w1 z̄1 + · · · + wn z̄n .
This does generalize the notion of the dot product in Rn , but because everything was real there, we
couldn’t see the need to conjugate. Note also that if we take the product of z and w in the other
order, we get
z1 w̄1 + · · · + zn w̄n ,
which is the complex conjugate of w1 z̄1 + · · · + wn z̄n . So we should not expect our notion of product to commute, but only to commute up to conjugation.
We now define an inner product via its abstract properties.
Definition 160. Let F = R or C and let V be a vector space over F (not necessarily finite-dimensional). An inner product on V is a function that takes each ordered pair (u, v) of elements
of V to a scalar hu, vi such that the following properties hold for all u, v, w ∈ V and λ ∈ F:
• (positivity)
hv, vi ≥ 0; (if F = C, then λ ≥ 0 means that λ is a non-negative real number).
• (definiteness)
hv, vi = 0 if and only if v = 0;
• (additivity in first slot)
hu + v, wi = hu, wi + hv, wi;
• (homogeneity in first slot)
hλu, vi = λhu, vi;
• (conjugate symmetry)
hu, vi = hv, ui.
Now for some examples
Example 161.
• The Euclidean inner product on Fn is given by
h(w1 , . . . , wn ), (z1 , . . . , zn )i = w1 z̄1 + · · · + wn z̄n .
When F = R, this is the usual dot product.
• If c1 , . . . , cn are positive real numbers, then we can define a weighted version of the
usual inner product on Fn , namely:
h(w1 , . . . , wn ), (z1 , . . . , zn )i = c1 w1 z̄1 + · · · + cn wn z̄n .
So the usual inner product is an example of this inner product with c1 = · · · = cn = 1.
(Note that the positive hypothesis is necessary. If one of the ci = 0, then it will fail to
be an inner product).
• Let V = R[−1,1] be the vector space of continuous real-valued functions on the interval
[−1, 1]. We can define an inner product on V by
hf, gi = ∫_{−1}^{1} f (x)g(x) dx.
When you take analysis classes, you will see that this is a very important inner product.
If you talk to a functional analyst about an inner product, this is the one they imagine.8
Note that this example is an infinite-dimensional vector space.
Definition 162. An inner product space is a vector space V equipped with an inner product h−, −i
on V .
In the rest of this chapter, V will denote an inner product space. If V = Fn and no inner product
is specified, we assume that we are using the Euclidean inner product.
We have a new definition to play with, which means we should try to understand its basic properties!
This is fun because the proofs cannot involve too much technical machinery—at this point basically
all we have is the definition.
Proposition 163. Let V be an inner product space. Then
(1) For each fixed u ∈ V , the function that takes v to hv, ui is a linear map V → F.
(2) h0, ui = hu, 0i = 0 for every u ∈ V .
(3) hu, v + wi = hu, vi + hu, wi for all u, v, w ∈ V .
(4) hu, λvi = λ̄hu, vi for all λ ∈ F and u, v ∈ V .
Remark. You may have noticed that the definition of an inner product was annoyingly nonsymmetric in the first and second slot of the inner product. Inner products are additive in the first
slot, what about the second? Well this proposition shows that inner products are also additive in
the second slot, but this is a consequence of the definition, so we didn’t need to assume it. There
is also a subtlety about scalar multiplication in the second slot: you need to conjugate when you
pull a scalar out of the second slot.
Proof. (1) We need to prove that the map is linear. That is, we need to prove that it is additive
and homogeneous. The function we are considering is
T :V →F
T (v) = hv, ui
Let v, w ∈ V . Then T (v + w) = hv + w, ui = hv, ui + hw, ui = T (v) + T (w) by the additivity of
inner products in the first slot. The proof for homogeneity is similar.
(2) Since T is linear, we have T (0) = h0, ui = 0. Since u was arbitrary, this proves h0, ui = 0 for all
u ∈ V . Now, by conjugate symmetry, hu, 0i is the complex conjugate of h0, ui = 0, so hu, 0i = 0 as well.
(3) and (4) are simply calculations using conjugate symmetry. For (3), hu, v + wi is the complex
conjugate of hv + w, ui = hv, ui + hw, ui, and conjugating term by term gives hu, vi + hu, wi. Here, we
are using that the conjugate of a sum is the sum of the conjugates and that conjugating twice returns
the original number. These can both be easily checked from the definition of complex conjugation.
Finally, for (4), hu, λvi is the complex conjugate of hλv, ui = λhv, ui, which is λ̄ times the conjugate
of hv, ui, that is, λ̄hu, vi. Here we used that the conjugate of a product is the product of the
conjugates, which you should also check.
Now fix an inner product on V . We saw that the dot product on Rn was closely related to the
notion of length. Namely, the length of a vector v was given by √(v · v). We extend this idea to all
inner products.
Definition 164. For v ∈ V , the norm of v, denoted kvk, is defined by
kvk = √(hv, vi).
Proposition 165. Suppose v ∈ V and λ ∈ F. Then
(1) kvk = 0 if and only if v = 0.
(2) kλvk = |λ| kvk.
Proof. By definiteness of inner products, we have hv, vi = 0 if and only if v = 0. And kvk = √(hv, vi),
which is 0 if and only if hv, vi = 0.
For the second part, we compute
kλvk2 = hλv, λvi = λhv, λvi = λλhv, vi = |λ|2 kvk2
and taking square roots yields the desired result.
Note that (as in the proof above), it is often easier to work with the square of the norm of a vector,
since this has a nice definition as an inner product. Just don’t forget to square or square-root as
appropriate.
18. Monday 8/3: Inner Product Spaces (6.A)
Not only does an inner product know about “lengths” it also knows about “angles”. Recall that in
calculus, you learned that two nonzero vectors u, v ∈ Rn are perpendicular if and only if u · v = 0,
or phrased in terms of inner products, hu, vi = 0. We generalize this to any inner product space.
Definition 166. Two vectors u, v ∈ V are called orthogonal if hu, vi = 0.
Note that since hu, vi = hv, ui, if u and v are orthogonal then
hu, vi = 0 = hv, ui.
So the definition is symmetric in u and v: if u is orthogonal to v, then v is orthogonal to u.
Proposition 167. (1) 0 is orthogonal to every vector in V .
(2) 0 is the only vector that is orthogonal to itself.
Proof. The first part is simply a restatement of Proposition 163(2).
The second part follows from the definition of the inner product. If v is orthogonal to itself, then
hv, vi = 0, but by definiteness, this implies that v = 0.
This takes us to maybe the most famous theorem? Maybe.
Theorem 168 (Pythagorean Theorem). Suppose u and v are orthogonal. Then
ku + vk2 = kuk2 + kvk2 .
Here’s a picture when V = R2 . (Based on how long this took, I need you to be impressed by this.)
[Figure: a right triangle with legs u and v and hypotenuse u + v.]
Proof. This follows by a computation:
ku + vk² = hu + v, u + vi = hu, ui + hu, vi + hv, ui + hv, vi = kuk² + kvk² ,
where the two middle terms vanish because u and v are orthogonal.
When we work with norms and inner products, we should be guided by our geometric intuition
from Rn (but our proofs should work for abstract inner products on abstract vector spaces). For
example, we can draw a picture of the following theorem.
Proposition 169. Let u, v ∈ V with v 6= 0. Then there exists a unique c ∈ F such that w = u − cv
is orthogonal to v.
In particular, c = hu, vi / kvk² .
[Figure: u decomposed as cv plus a vector w orthogonal to v.]
Proof. We want w = u − cv to be orthogonal to v, so we want
0 = hu − cv, vi = hu, vi − chv, vi = hu, vi − c kvk2 .
As long as kvk² ≠ 0 (which is true because v ≠ 0) we can solve for the unique such c, namely
c = hu, vi / kvk² .
We can use this to prove the following very fundamental result, which is important in many fields
of mathematics.
Theorem 170 (Cauchy–Schwarz inequality). If u, v ∈ V , then |hu, vi| ≤ kuk kvk.
Equality holds if and only if one of u, v is a scalar multiple of the other.
Proof. Clearly, if v = 0, then both sides are equal to 0. Also in this case, v is a scalar multiple of u.
Suppose v ≠ 0 and let
w = u − (hu, vi / kvk²) v,
so that w and v are orthogonal. Rewriting this, we have
u = w + (hu, vi / kvk²) v
and w is orthogonal to (hu, vi / kvk²) v. Hence, by the Pythagorean Theorem,
kuk² = kwk² + k(hu, vi / kvk²) vk² = kwk² + |hu, vi|² / kvk² ≥ |hu, vi|² / kvk² .
Now clear the denominator and take square roots to conclude that |hu, vi| ≤ kuk kvk.
Note that we will have equality if and only if kwk² = 0, if and only if w = 0. But this means that
u is a scalar multiple of v.
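Here is a tiny numerical sanity check of the Cauchy–Schwarz inequality (assuming numpy; the vectors are random and purely illustrative), including the equality case for scalar multiples.

    import numpy as np

    rng = np.random.default_rng(2)
    u, v = rng.standard_normal(5), rng.standard_normal(5)
    print(abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v))                   # True
    print(np.isclose(abs(np.dot(u, 2 * u)), np.linalg.norm(u) * np.linalg.norm(2 * u)))  # equality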
As a corollary, we get another very fundamental result, with a nice geometric interpretation.
Theorem 171 (Triangle inequality). Suppose u, v ∈ V . Then
ku + vk ≤ kuk + kvk .
Equality holds if and only if one of u, v is a nonnegative multiple of the other.
[Figure: a triangle with sides u, v, and u + v.]
The picture9 of this result explains why it is called the triangle inequality. It basically says that
the shortest path between two points is a straight line.
9This is the most pictures I’ve ever tried to put into some notes and you should not expect this going forward.
Proof. We have
ku + vk² = hu + v, u + vi
= hu, ui + hv, vi + hu, vi + hv, ui
= kuk² + kvk² + 2 Re hu, vi      (since hv, ui is the complex conjugate of hu, vi)
≤ kuk² + kvk² + 2|hu, vi|
≤ kuk² + kvk² + 2 kuk kvk
= (kuk + kvk)² .
Again, take square roots to conclude the inequality.
Note that equality will hold if and only if Re hu, vi = |hu, vi| = kuk kvk. The latter equality holds if
and only if one of u and v is a scalar multiple of the other. To have the first equality, we also need
hu, vi to be a nonnegative real number.
If u = λv, then hu, vi = hλv, vi = λ kvk², so we need λ to be a nonnegative real number. Similarly,
if v = λu, then hu, vi = hu, λui = λ̄ kuk², so we still need λ to be a nonnegative real number.
Altogether, this means that in order to have equality, we must have that u or v is a nonnegative
multiple of the other. Conversely, if one of them is a nonnegative multiple of the other, it is easy
to see that Rehu, vi = |hu, vi| = kuk kvk.
19. Wednesday 8/5: Orthonormal Bases (6.B)
Orthogonal vectors are nice for many reasons. One thing that makes the standard basis of Fn so
nice is that it consists of vectors that are mutually orthogonal. But they actually satisfy an even
stronger property.
Definition 172. A list of vectors is called orthonormal if each vector in the list has norm 1 and
is orthogonal to all the other vectors in the list.
In other words, e1 , . . . , em is orthonormal if
hej , ek i = 1 if j = k,   and   hej , ek i = 0 if j ≠ k.
Remark. Normally I would reserve ei to denote a standard basis vector of Fn . But in this section
of your textbook, e1 , . . . , em will generally mean an orthonormal set of vectors. This is not so bad
since...
Example 173.
• The prototypical example of an orthonormal list is the standard basis e1 , . . . , en of Fn .
• Also the list
(1/2)(1, 1, 1, 1), (1/2)(1, 1, −1, −1), (1/2)(1, −1, 1, −1), (1/2)(1, −1, −1, 1)
is an orthonormal basis of F4 .
An orthonormal list of vectors satisfies the following very nice property.
Proposition 174. If e1 , . . . , em is an orthonormal list of vectors in V then
ka1 e1 + · · · + am em k2 = |a1 |2 + · · · + |am |2
for all a1 , . . . , am ∈ F.
Proof. As your book mentions, this follows from the Pythagorean Theorem. For simplicity of
illustration, suppose m = 3. Then, since a1 e1 is orthogonal to a2 e2 + a3 e3 , we have
ka1 e1 + a2 e2 + a3 e3 k2 = ka1 e1 k2 + ka2 e2 + a3 e3 k2
= ka1 e1 k2 + ka2 e2 k2 + ka3 e3 k2
= |a1 |2 ke1 k2 + |a2 |2 ke2 k2 + |a3 |2 ke3 k2
and so the result follows since kej k = 1 for all j. You could make this rigorous by doing a proof by
induction.
We also get the following important corollary. Note that this agrees with our geometric intuition,
since perpendicular vectors “point in different directions”. The correct way to say this is:
Corollary 175. Every orthonormal list of vectors is linearly independent.
Hence, if V is finite-dimensional with dim V = n, then any list of n orthonormal vectors in V is a
basis.
Proof. Suppose that e1 , . . . , em is an orthonormal list of vectors in V and suppose
a1 e1 + · · · + am em = 0.
We wish to show that all of the scalars must be 0. But by the above result, we have
0 = ka1 e1 + · · · + am em k2 = |a1 |2 + · · · + |am |2
and so each |aj | = 0.
For the second statement, any list of n orthonormal vectors is linearly independent, and any list of
n linearly independent vectors in a vector space of dimension n is already a basis.
Definition 176. An orthonormal basis of V is a basis of V which is also an orthonormal list.
Here is one of the most favorable properties of an orthonormal basis. If e1 , . . . , en is a basis of V
and w ∈ V , then we know that there must exist scalars such that
w = a1 e1 + · · · + an en .
In general, it is not that easy to figure out the scalars aj . You would have to set up a matrix whose
columns are the ej ’s, augment with w, and solve with row reduction.
However, if we have an orthonormal basis:
Proposition 177. Let w ∈ V and suppose e1 , . . . , en is an orthonormal basis of V . Then
w = a1 e1 + · · · + an en
where, for all 1 ≤ j ≤ n, aj = hw, ej i.
Furthermore, kwk2 = |hw, e1 i|2 + · · · + |hw, en i|2 .
Proof. This is simply a computation. Note that
hw, ej i = ha1 e1 , ej i + · · · + han en , ej i = haj ej , ej i = aj .
The second claim follows from Proposition 174 applied to w.
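Here is Proposition 177 in action (a small sketch assuming numpy; the vector w is arbitrary), using the orthonormal basis of R4 from Example 173: the coordinates are just inner products, and the norm identity holds as well.

    import numpy as np

    E = 0.5 * np.array([[1, 1, 1, 1],
                        [1, 1, -1, -1],
                        [1, -1, 1, -1],
                        [1, -1, -1, 1]], dtype=float)   # rows form an orthonormal basis of R^4
    w = np.array([3.0, 1.0, 4.0, 1.0])
    coords = np.array([np.dot(w, e) for e in E])        # a_j = <w, e_j>
    print(np.allclose(sum(a * e for a, e in zip(coords, E)), w))   # w = sum a_j e_j
    print(np.isclose(np.dot(w, w), np.sum(coords ** 2)))           # ||w||^2 = sum |<w, e_j>|^2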
So orthonormal bases are very nice, but how special are they? Do they always exist? If they do
always exist, how do we get ahold of one?
Maybe in Fn , using the Euclidean inner product, it seems geometrically reasonable that we should
be able to get an orthonormal basis. And it even seems like we can start an orthonormal basis
however we’d like. But what about other inner product spaces that we don’t have as much intuition
for? Remember that we have an inner product on Pm (R) given by hp, qi = ∫_{−1}^{1} p(x)q(x) dx. Can we
get an orthonormal basis with respect to this inner product? If you’re anything like me, I wouldn’t
have any idea how to start to try to get one.
It turns out that there is a deterministic algorithm that takes any linearly independent list and
returns an orthonormal list with the same span.
Proposition 178 (Gram–Schmidt Procedure). Suppose v1 , . . . , vn ∈ V are linearly independent.
Let
e1 = v1 / kv1 k,
e2 = (v2 − hv2 , e1 ie1 ) / kv2 − hv2 , e1 ie1 k,
⋮
en = (vn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 ) / kvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 k.
Then e1 , . . . , en is an orthonormal list and span(e1 , . . . , en ) = span(v1 , . . . , vn ).
The geometric picture here (which I drew in lecture, but won’t attempt to recreate in these notes)
is that at each step you take the next vector and subtract off the projection onto the span of all the
previous ones. This leaves you with a vector that is orthogonal to the span of the previous ones, so
is orthogonal to all of the previous ones. In the end, you will get the same span but your vectors
will all be orthogonal. Also, at each step, since you divide through by the norm, you will ensure
that all of your vectors are unit vectors. Hence, you will be left with an orthonormal set.
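The procedure is also easy to carry out on a computer. Here is a minimal sketch (assuming numpy; the helper name gram_schmidt and the sample vectors are mine, not the book’s) that works for whatever inner product you pass in.

    import numpy as np

    def gram_schmidt(vectors, inner=np.dot):
        """Return an orthonormal list with the same span as the given independent list."""
        ortho = []
        for v in vectors:
            w = v - sum(inner(v, e) * e for e in ortho)   # subtract projections onto earlier e_j
            ortho.append(w / np.sqrt(inner(w, w)))        # normalize
        return ortho

    # Example: two vectors in R^3 with the usual dot product.
    print(gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]))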
Proof. When n = 1, the result is clear, as v1 / kv1 k is an orthonormal list.
So suppose n > 1 and by induction assume that the algorithm works for any collection of n − 1
vectors.
Note that the definition of en makes sense because we are not dividing by 0. This is because since
v1 , . . . , vn is linearly independent, we know that vn 6∈ span(v1 , . . . , vn−1 ) = span(e1 , . . . , en−1 ).
Hence, the vector in the numerator of en must be nonzero. Further,
span(e1 , . . . , en ) = span(v1 , . . . , vn−1 , en ) = span(v1 , . . . , vn ).
It is also clear that ken k = 1. Hence, we need only show that en is orthogonal to all of the previous
vectors. Now note that for any 1 ≤ j < n, we have
hen , ej i = hvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 , ej i / kvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 k.
The numerator equals hvn , ej i − hhvn , ej iej , ej i = hvn , ej i − hvn , ej i = 0, since hei , ej i = 0 for i ≠ j
and hej , ej i = 1. Hence hen , ej i = 0, and so the set of vectors e1 , . . . , en is an orthonormal list which
has the same span as v1 , . . . , vn .

As a corollary, we can now see that we can always find an orthonormal basis.
Corollary 179. Every finite-dimensional inner product space has an orthonormal basis.
Proof. We already know that every finite-dimensional vector space has a basis. Then apply the
Gram–Schmidt Procedure to obtain an orthonormal basis.
Example 180. Let’s use the Gram–Schmidt Procedure to study a non-Euclidean inner product
space. Let’s find an orthonormal basis of P2 (R) with the inner product hp, qi = ∫_{−1}^{1} p(x)q(x) dx.
To start with, we know that 1, x, x² is a basis of P2 (R).
At the first step, we should take 1/k1k, where the norm here is given by the inner product defined above. Now
k1k² = ∫_{−1}^{1} 1² dx = 2,
so k1k = √2. Therefore, e1 = √(1/2).
For the second vector, the numerator is
x − hx, e1 ie1 = x − (∫_{−1}^{1} x·√(1/2) dx)·√(1/2) = x.
We need to divide by its norm. We have
kxk² = ∫_{−1}^{1} x² dx = 2/3,
so kxk = √(2/3) and so e2 = √(3/2)·x.
For the third vector, the numerator is
x² − hx², e1 ie1 − hx², e2 ie2 = x² − (∫_{−1}^{1} x²·√(1/2) dx)·√(1/2) − (∫_{−1}^{1} x²·√(3/2)x dx)·√(3/2)·x = x² − 1/3.
We still need to divide by the norm. We have
kx² − 1/3k² = ∫_{−1}^{1} (x⁴ − (2/3)x² + 1/9) dx = 8/45,
so kx² − 1/3k = √(8/45). Therefore, e3 = √(45/8)·(x² − 1/3).
Therefore, our orthonormal basis is
√(1/2),   √(3/2)·x,   √(45/8)·(x² − 1/3).
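If you want to double-check this computation without redoing the integrals by hand, here is a short sketch (assuming numpy) that treats e1, e2, e3 as polynomials, integrates their products exactly, and confirms that the matrix of inner products is the identity.

    import numpy as np
    from numpy.polynomial import Polynomial as P

    def inner(p, q):
        """<p, q> = integral of p(x) q(x) over [-1, 1]."""
        antideriv = (p * q).integ()
        return antideriv(1.0) - antideriv(-1.0)

    e1 = P([np.sqrt(1 / 2)])                       # sqrt(1/2)
    e2 = P([0.0, np.sqrt(3 / 2)])                  # sqrt(3/2) x
    e3 = np.sqrt(45 / 8) * P([-1 / 3, 0.0, 1.0])   # sqrt(45/8) (x^2 - 1/3)
    basis = [e1, e2, e3]
    print([[round(float(inner(p, q)), 10) for q in basis] for p in basis])  # identity matrix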
20. Friday 8/7: Orthonormal Bases (6.B)
We saw last time that every finite-dimensional inner product space has an orthonormal basis.
Sometimes, in proving something, it may not be enough to know that an orthonormal basis of V
exists, but that we can extend any orthornormal list in V to an orthonormal basis of V . (Recall
that in many proofs in class and in your homework, it was an important technique to take a linearly
independent list and extend it to a basis. Note that an orthonormal list is automatically linearly
independent.)
Proposition 181. Suppose V is finite dimensional. Then every orthonormal list of vectors in V
can be extended to an orthonormal basis of V .
Proof. Suppose e1 , . . . , em is an orthonormal list of vectors in V . Then e1 , . . . , em is linearly independent. Hence, we can extend this list to a basis e1 , . . . , em , v1 , . . . , vn of V . Now to this basis, we
can apply the Gram–Schmidt process to produce an orthonormal list.
This will yield an orthonormal basis e1 , . . . , em , f1 , . . . , fn . Why are the first m vectors unchanged?
Well, in the jth step (for j ≤ m), the procedure replaces ej by
(ej − hej , e1 ie1 − · · · − hej , ej−1 iej−1 ) / kej − hej , e1 ie1 − · · · − hej , ej−1 iej−1 k,
and since the ei ’s are orthonormal, the numerator is ej and the denominator is kej k = 1, so ej is left unchanged.
So to summarize, before, we had the theorem “every linearly independent list can be extended to
a basis” and now, we’ve souped it up to “every orthonormal list can be extended to an orthonormal
basis”. We may now be tempted to go through all of the theorems of linear algebra and add the
word “orthonormal” before “basis” to make all of our theorems stronger. However, the following
example shows that we can’t expect to always be able to do this and yield true statements.
Example 182. If T ∈ L(V ) and T is diagonalizable, then this means that V has a basis
consisting of eigenvectors of T . However, a simple example shows that we cannot expect to
be able to find an orthonormal basis of V consisting of eigenvectors of T . If V = R2 and
the eigenspaces of T are span(1, 0) and span(1, 1), then the only choice we have in finding
eigenvectors is scaling. And no vectors in span(1, 0) are orthogonal to any vectors in span(1, 1),
unless you take one or the other vector to be 0.
Another way to say this is that if there is a basis of V for which T is diagonal, then it is not true
that there is always an orthonormal basis of V for which T is diagonal. However, if we replace
“diagonal” with upper-triangular, then it is true.
Proposition 183. Suppose T ∈ L(V ). If T has an upper-triangular matrix with respect to some
basis of V , then T has an upper-triangular matrix with respect to some orthonormal basis of V .
Proof. Suppose T has an upper-triangular matrix with respect to the basis v1 , . . . , vn of V . Then
by Proposition 155, we have span(v1 , . . . , vj ) is invariant under T for every j.
Now we apply the Gram–Schmidt Procedure to v1 , . . . , vn , which produces an orthonormal basis
e1 , . . . , en of V . But note that we showed that
span(e1 , . . . , ej ) = span(v1 , . . . , vj )
for each j. Hence, span(e1 , . . . , ej ) is invariant under T for each j, so by Proposition 155, T has an
upper-triangular matrix with respect to the orthonormal basis e1 , . . . , en .
If we combine this result with Theorem 156, we obtain a corollary known as Schur’s Theorem.
Corollary 184 (Schur’s Theorem). Suppose V is a finite-dimensional complex vector space and
T ∈ L(V ). Then T has an upper-triangular matrix with respect to some orthonormal basis of V .
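Numerically, Schur’s Theorem corresponds to the complex Schur decomposition A = QTQ*, where the columns of Q are orthonormal and T is upper-triangular. A quick sketch (assuming scipy is available; the matrix is the rotation-by-π/2 example from earlier, which has no real eigenvalues but poses no problem over C):

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0.0, -1.0],
                  [1.0, 0.0]])                       # rotation by pi/2
    T, Q = schur(A, output='complex')
    print(np.allclose(np.tril(T, -1), 0))            # T is upper-triangular
    print(np.allclose(Q.conj().T @ Q, np.eye(2)))    # columns of Q are orthonormal
    print(np.allclose(Q @ T @ Q.conj().T, A))        # A = Q T Q*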
We saw (in Proposition 163) that if you fix a vector u ∈ V , then the function that takes v to
hv, ui is a linear map. This is a function V → F, i.e., a linear functional on V .
Example 185. The function φ : F3 → F defined by
φ(z1 , z2 , z3 ) = 3z1 + 2z2 − z3
is a linear functional on F3 . You can check directly that it is linear, or you can represent it as the
matrix [3 2 −1], for example. However, we can also think of this linear functional as
φ(z) = hz, ui
where u = (3, 2, −1).
One question is, can every linear functional be written in this way? If I have an inner product
h−, −i on V and φ : V → F is any linear map, can I find a vector u ∈ V such that φ(v) = hv, ui?
It seems highly non-obvious to me.10 Here’s an example where it doesn’t seem too clear.
Example 186. Take the inner product on P2 (R) defined by
hp, qi = ∫_{−1}^{1} p(t)q(t) dt
and consider the function φ : P2 (R) → R defined by
φ(p) = ∫_{−1}^{1} p(t) cos(πt) dt,
which is linear (do a quick sanity-check and verify this).
It is not obvious that there exists a u ∈ P2 (R) such that φ(p) = hp, ui. It in fact seems kind of
unlikely! It seems like we should take u = cos(πt), but of course can’t because this is not an
element of P2 (R).
Nevertheless, it turns out that the answer to the question above is yes. This is known as the Riesz
Representation Theorem (because you can represent every linear functional via the inner product).
Theorem 187 (Riesz Representation Theorem). Suppose V is a finite-dimensional inner product
space and φ ∈ V 0 . Then there exists a unique vector u ∈ V such that
φ(v) = hv, ui
for every v ∈ V .
Proof. Let φ ∈ V 0 . First we show existence of such a u, then uniqueness.
Choose an orthonormal basis e1 , . . . , en of V (which we can do by Corollary 179). Then by Proposition 177, we can write
v = hv, e1 ie1 + · · · + hv, en ien .
10It seems like there could be crazy linear maps out there that can’t be written just as taking an inner product with
a fixed vector. On the other hand, there’s dim V worth of vectors in V and V 0 has dimension dim V , so maybe there’s
some hope...
Hence
φ(v) = φ (hv, e1 ie1 + · · · + hv, en ien ) = hv, e1 iφ(e1 ) + · · · + hv, en iφ(en ).
Since λhv, wi = hv, λ̄wi for any scalar λ, writing λj = φ(ej ) this equals
hv, λ̄1 e1 + · · · + λ̄n en i.
Hence, we can let
u = λ̄1 e1 + · · · + λ̄n en
and the above computation shows that φ(v) = hv, ui for every v ∈ V .
Uniqueness is actually easier. Suppose that u1 , u2 ∈ V are vectors such that
φ(v) = hv, u1 i = hv, u2 i
for every v ∈ V . Then
0 = hv, u1 i − hv, u2 i = hv, u1 − u2 i
for every v ∈ V . In particular, we can choose v = u1 − u2 . But then ku1 − u2 k = 0 so u1 − u2 = 0
and hence u1 = u2 , as desired.
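The existence recipe above is easy to carry out numerically for the functional from Example 185. The sketch below (assuming numpy; the test vector is arbitrary) builds u from the standard orthonormal basis of C3 and checks that φ(v) = hv, ui.

    import numpy as np

    inner = lambda x, y: np.sum(x * np.conj(y))      # Euclidean inner product on C^3
    phi = lambda z: 3 * z[0] + 2 * z[1] - z[2]       # the functional from Example 185
    E = np.eye(3, dtype=complex)
    u = sum(np.conj(phi(e)) * e for e in E)          # u = (3, 2, -1)
    v = np.array([1.0 + 2.0j, -1.0j, 4.0])
    print(np.isclose(phi(v), inner(v, u)))           # True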
Remark. If V is a finite-dimensional inner product space, then we can define a map
Φ:V →V0
u 7→ h−, ui.
What the Riesz Representation Theorem is saying is that this map is bijective. Every linear
functional in V 0 can be represented as h−, ui (so Φ is surjective), and this u is unique (so Φ is
injective).
However, in general, Φ is not a linear map, since Φ(λu) is the linear map h−, λui = λ̄h−, ui = λ̄Φ(u).
It is linear over R, but not over C.
Your book works out what the vector is that gives the linear map in Example 186, but I was too
tired to do it in lecture.
21. Monday 8/10: Orthogonal Complements (6.C)
Let V be an inner product space.
Definition 188. If U is a subset of V , then the orthogonal complement of U , denoted U ⊥ is the
set of all vectors in V that are orthogonal to every vector in U . That is,
U ⊥ = {v ∈ V | hv, ui = 0 for every u ∈ U }.
The notation U ⊥ is generally read “U perp”. We can picture orthogonal complements in R2 and
R3 by drawing a picture. This should give us some intuition about what to expect from orthogonal
complements. We first prove some basic properties about orthogonal complements.
Proposition 189. (1) If U is a subset of V , then U ⊥ is a subspace of V .
(2) {0}⊥ = V .
(3) V ⊥ = {0}.
(4) If U is a subset of V , then U ∩ U ⊥ ⊆ {0}
(5) If U and W are subsets of V and U ⊆ W , then W ⊥ ⊆ U ⊥ .
Proof. (1) Note that here we don’t require U to be a subspace, just any subset. If U is any subset,
we want to show that U ⊥ is a subspace. First, certainly h0, ui = 0 for all u ∈ U (in fact for all
u ∈ V , so 0 ∈ U ⊥ ).
Now suppose v, w ∈ U ⊥ . Then for any u ∈ U , we have hv, ui = hw, ui = 0. Hence, for any u ∈ U ,
hv + w, ui = hv, ui + hw, ui = 0 so v + w ∈ U ⊥ . Homogeneity is proved similarly and so U ⊥ is a
subspace of V .
(2) For any v ∈ V , we have hv, 0i = 0 so v ∈ {0}⊥ . Hence, {0}⊥ = V .
(3) Suppose w ∈ V ⊥ . Then w is orthogonal to every vector in V . In particular, w is orthogonal to
w (since w ∈ V ). But then hw, wi = 0 so w = 0. Hence, V ⊥ = {0}.
(4) Suppose U is a subset of V . Let u ∈ U ∩ U ⊥ . Since u ∈ U ⊥ , it is orthogonal to every vector in
U . Since u ∈ U , it is orthogonal to itself. Again, this means u = 0 so U ∩ U ⊥ = {0}.
(5) If U ⊆ W and v ∈ W ⊥ , then hv, wi = 0 for every w ∈ W . But every u ∈ U is also in W so
hv, ui = 0 for all u ∈ U . Hence, v ∈ U ⊥ .
Recall that if V = U ⊕ W is the direct sum of two subspaces, we can write each element of V in
exactly one way as a sum u + w with u ∈ U and w ∈ W . This is like breaking up V into two pieces
that overlap minimally (since we saw that V = U ⊕ W if and only if V = U + W and U ∩ W = {0}).
It turns out that given any subspace U of V , we can write V as the direct sum of U and some other
subspace. In particular:
Proposition 190. If U ⊆ V is a finite-dimensional subspace, then V = U ⊕ U ⊥ .
Proof. We showed in the previous proposition that U ∩ U ⊥ = {0}. Hence, as long as U + U ⊥ = V ,
we will be able to conclude that V = U ⊕ U ⊥ . Clearly, U + U ⊥ ⊆ V . We just need to prove the
reverse inclusion.
To this end, suppose v ∈ V . Choose an orthonormal basis e1 , . . . , em of U . Now we cleverly
rewrite v as
v = (hv, e1 ie1 + · · · + hv, em iem ) + (v − hv, e1 ie1 − · · · − hv, em iem ),
and call the first summand u and the second summand w. Clearly u ∈ U . We aim to show that w ∈ U ⊥ .
Now note that
hw, e1 i = hv − hv, e1 ie1 − · · · − hv, em iem , e1 i = hv, e1 i − hv, e1 ihe1 , e1 i = 0
so w is orthogonal to e1 . Similarly, it is orthogonal to every ej and hence is orthogonal to every
vector in U . Thus, w ∈ U ⊥ so v ∈ U + U ⊥ . Hence, V = U + U ⊥ , as desired.
Using this, we can compute dim U ⊥ .
Proposition 191. Suppose V is finite-dimensional and U is a subspace of V . Then
dim U ⊥ = dim V − dim U
Proof. We already proved that dim(U ⊕ U ⊥ ) = dim U + dim U ⊥ so combined with the above, we
get the desired equality.
Also, if you start with a subspace and take the orthogonal complement twice, then you get back to
the subspace you started with.
Proposition 192. Suppose U is a finite-dimensional subspace of V . Then U = (U ⊥ )⊥ .
Proof. Let u ∈ U . Then by definition, for every v ∈ U ⊥ , hu, vi = 0. But this means that u is
orthogonal to every vector in U ⊥ . Hence, u ∈ (U ⊥ )⊥ . This shows that U ⊆ (U ⊥ )⊥ .
Now suppose v ∈ (U ⊥ )⊥ ⊆ V . By the previous proof, we can write v = u + w where u ∈ U and
w ∈ U ⊥ . Then v − u = w ∈ U ⊥ . Since u ∈ U , we just showed u ∈ (U ⊥ )⊥ . Since v ∈ (U ⊥ )⊥ as
well, we therefore have v − u ∈ (U ⊥ )⊥ .
In summary, we have v − u ∈ U ⊥ ∩ (U ⊥ )⊥ = {0}. Hence, u = v. Therefore, v ∈ U so (U ⊥ )⊥ ⊆ U .

We now define an operator that projects onto a subspace U of V . I drew a picture in lecture, but
don’t have the ambition to try to make a picture for these notes...
Definition 193. Let U be a finite-dimensional subspace of V . The orthogonal projection of V onto
U is the operator PU ∈ L(V ) defined as follows:
For v ∈ V , write v = u + w where u ∈ U and w ∈ U ⊥ . Then PU (v) = u.
We remark that since V = U ⊕U ⊥ , in the above definition, there is a unique way to write v = u+w.
Hence, PU is well-defined.
We now list some basic properties. You should think about the geometric picture that we drew
when thinking about these properties.
Proposition 194. Suppose U is a finite-dimensional subspace of V and v ∈ V . Then
(1) PU ∈ L(V ).
(2) PU (u) = u for all u ∈ U .
(3) PU (w) = 0 for all w ∈ U ⊥ .
(4) range PU = U
(5) null PU = U ⊥
(6) v − PU (v) ∈ U ⊥ .
(7) PU2 = PU .
(8) kPU (v)k ≤ kvk.
(9) For every orthonormal basis e1 , . . . , em of U ,
PU (v) = hv, e1 ie1 + · · · + hv, em iem .
Proof. (1) To show that PU is a linear map on V , we need to show that it is additive and homogeneous. Suppose v1 , v2 ∈ V . Then write
v1 = u1 + w1
and
v2 = u2 + w2
with u1 , u2 ∈ U and w1 , w2 ∈ U ⊥ (there is a unique way to do this). Then PU (v1 ) = u1 and
PU (v2 ) = u2 . Now notice
v1 + v2 = (u1 + u2 ) + (w1 + w2 )
where u1 + u2 ∈ U and w1 + w2 ∈ U ⊥ . Hence,
PU (v1 + v2 ) = u1 + u2 = PU (v1 ) + PU (v2 ).
Homogeneity is proved similarly.
(2) Let u ∈ U . Since we can write u = u + 0 where u ∈ U and 0 ∈ U ⊥ , therefore PU (u) = u.
(3) Similarly, if w ∈ U ⊥ then we can write w = 0 + w where 0 ∈ U and w ∈ U ⊥ . Hence, PU (w) = 0.
(4) Since we defined PU (v) = u where we write v = u + w with u ∈ U and w ∈ U ⊥ , therefore
range PU ⊆ U . Conversely, if u ∈ U then PU (u) = u so u ∈ range PU . Hence, U = range PU .
(5) By part (3), we have U ⊥ ⊆ null PU . Now suppose v ∈ null PU . Since PU (v) = 0, this means
that v = 0 + v where 0 ∈ U and v ∈ U ⊥ . Hence, v ∈ U ⊥ , which proves the reverse inclusion.
(6) Write v = u + w with u ∈ U and w ∈ U ⊥ . Then
v − PU (v) = u + w − u = w ∈ U ⊥ .
(7) Let v ∈ V and write v = u + w with u ∈ U and w ∈ U ⊥ . Then
(PU )2 (v) = PU (PU (v)) = PU (u) = u = PU (v).
(8) Again, write v = u + w with u ∈ U and w ∈ U ⊥ . Then
kPU (v)k2 = kuk2 ≤ kuk2 + kwk2 = ku + wk2 = kvk2
where the penultimate equality follows from the Pythagorean Theorem, since u and w are orthogonal.
(9) We saw this in the proof of Proposition 190.
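Part (9) gives a formula you can compute with directly. Here is a small sketch (assuming numpy; the subspace and vector are made up for the example): project onto a plane in R3 and check that the leftover piece is orthogonal to the subspace.

    import numpy as np

    def project(v, ortho_basis):
        """P_U(v) = sum <v, e_j> e_j for an orthonormal basis e_1, ..., e_m of U."""
        return sum(np.dot(v, e) * e for e in ortho_basis)

    E = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]   # orthonormal basis of the xy-plane
    v = np.array([1.0, 2.0, 3.0])
    p = project(v, E)
    print(p)                                          # [1. 2. 0.]
    print(np.dot(v - p, E[0]), np.dot(v - p, E[1]))   # both 0: v - P_U(v) lies in U-perp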
22. Wednesday 8/12: Minimization Problems (6.C)
Linear algebra is one of the most useful areas of mathematics (with uses outside of mathematics). The better you understand the concepts behind an application, the more you’ll be able to
understand why things work and perhaps adapt the techniques to new settings. I’ll give three
applications in this lecture, one of which is in your textbook.
Frequently, in real life11, you want to find a vector that is as close as possible to a subspace. A
classical example is linear regression.
Example 195. Suppose you have five data points (x1 , y1 ), . . . , (x5 , y5 ) and you are trying to
approximate the relationship between xi and yi . Maybe you are interested in how square footage
affects housing prices, so you have the data of five houses where xi is the square footage of the
ith house and yi is the price.
This means you want to find scalars α, β such that yi = αxi + β for every 1 ≤ i ≤ 5. This is
the same thing as wanting to solve the matrix equation
[ x1  1 ]            [ y1 ]
[  ⋮  ⋮ ]  [ α ]  =  [  ⋮ ]
[ x5  1 ]  [ β ]     [ y5 ]
(write A for the 5 × 2 coefficient matrix, z = (α, β) for the unknown vector, and y = (y1 , . . . , y5 ) for the right-hand side)
for α and β. Of course, in general, there will be no solution to this equation, since the points
are unlikely to lie on a single line. In other words, (y1 , . . . , y5 ) is not likely to be in the range
of the linear transformation given by the matrix A. Indeed, the range of the matrix A will be
a two-dimensional subspace of R5 .
The goal in linear regression is to adjust the vector y to ỹ = (ỹ1 , . . . , ỹ5 ) so that the equation
Az = ỹ is solvable, and so that ỹ is “close” to y. In other words, you want to minimize kỹ − yk
such that ỹ is in the range of A. Luckily, linear algebra will show us that we just need to
project y onto the range of A to find ỹ, then we can solve the matrix equation for z to find the
parameters for our linear regression.
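Here is the whole pipeline on made-up numbers (a sketch assuming numpy; the data values are invented purely for illustration): np.linalg.lstsq finds the z minimizing kAz − yk, which is exactly the projection picture described above.

    import numpy as np

    x = np.array([1000.0, 1500.0, 1700.0, 2100.0, 2500.0])   # hypothetical square footages
    y = np.array([200.0, 280.0, 310.0, 400.0, 450.0])        # hypothetical prices
    A = np.column_stack([x, np.ones_like(x)])
    z, *_ = np.linalg.lstsq(A, y, rcond=None)                # z = (alpha, beta)
    y_tilde = A @ z                                          # the projection of y onto range(A)
    print(z, np.linalg.norm(y - y_tilde))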
Proposition 196. Suppose U is a finite-dimensional subspace of V and v ∈ V . Then
kv − PU (v)k ≤ kv − uk
for all u ∈ U , with equality if and only if u = PU (v).
11As a mathematician, to me “in real life” means in engineering, statistics, computer science, physics, etc.
In other words, PU (v) is the closest vector in U to the vector v.
Proof. We compute
kv − PU (v)k² ≤ kv − PU (v)k² + kPU (v) − uk²        (adding a nonnegative quantity)
             = k(v − PU (v)) + (PU (v) − u)k²        (Pythagorean Theorem)
             = kv − uk² .
The Pythagorean Theorem applies since v − PU (v) ∈ U ⊥ and PU (v) − u ∈ U , so these two vectors
are orthogonal. Taking square roots gives the desired inequality.
Note that equality holds if and only if the first inequality is actually equality which is if and only
if kPU (v) − uk = 0 if and only if PU (v) = u.
The picture in your book with V = R2 , U a line, v a vector not on that line is a good picture of
this theorem.
Now if we think about this “projection as minimization” away from our comfortable geometric
home of Fn to a more exotic vector space of functions on some intervals, this takes us....
Approximating Functions.
Let V = CR [−π, π] denote the space of continuous real-valued functions on [−π, π]. All kinds of
continuous functions live in this vector space: polynomials, sines, cosines, exponentials, rational
functions with no roots in [−π, π], etc. There are also lots of other crazy functions that live in V :
anything you could draw without picking your pencil up between −π and π.
One thing you might want to do is approximate a continuous function f ∈ V by taking a linear
combination of some nice functions. For example, you learned in calculus that you can approximate
an infinitely differentiable function f by its Taylor series. If you truncate at, say, degree 5, then
you get an approximation to f that is a degree 5 polynomial.
Of course, V is infinite-dimensional, and the space of degree 5 (or smaller) polynomials has dimension
6. Call this subspace U . Another way to get an approximation to f (even if f is not differentiable!)
is to project onto U .
Let’s try this!
Example 197. Let’s approximate the function sin(x) on the interval [−π, π] by a degree 5
polynomial. We will use the inner product on V given by
hf, gi = ∫_{−π}^{π} f (x)g(x) dx.
If we use orthogonal projection to project sin(x) onto the subspace of degree 5 polynomials,
this should yield a polynomial that is “close” to sin(x). In other words, we want to find a u ∈ U
which minimizes ksin(x) − uk. Note that this means minimizing
ksin(x) − uk² = hsin(x) − u, sin(x) − ui = ∫_{−π}^{π} | sin(x) − u(x)|² dx.
It makes sense that if this integral is small, then u is close to sin(x) along the entire interval.
Now unless you enjoy pain, I wouldn’t try doing this by hand. But we can take the basis
1, x, . . . , x⁵ of U , use Gram–Schmidt to get an orthonormal basis, and then compute the projection onto U . I didn’t do this, because your book tells me that
u = 0.987862x − 0.155271x³ + 0.00564312x⁵
where the coefficients have been approximated by decimals.
Your book has this very impressive figure which shows how close u is to sin(x) on this interval.
On the other hand, the Taylor series approximation to sin(x) near 0 is given by v = x − x³/3! +
x⁵/5!. This is another degree 5 polynomial that is “close” to sin(x). So v ∈ U , but v is further
away (in this norm) from sin(x) than u is.
[Figure: graphs of sin(x), the projection u, and the Taylor polynomial v on [−π, π].]
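If you’d rather make the computer suffer instead, here is a sketch (assuming scipy and numpy are available) that runs Gram–Schmidt on 1, x, . . . , x⁵ with this inner product and projects sin onto the resulting orthonormal basis; the odd-degree coefficients come out close to the ones quoted above.

    import numpy as np
    from scipy.integrate import quad
    from numpy.polynomial import Polynomial

    def inner(f, g):
        return quad(lambda x: f(x) * g(x), -np.pi, np.pi)[0]

    monomials = [Polynomial.basis(k) for k in range(6)]        # 1, x, ..., x^5
    ortho = []
    for p in monomials:
        for e in ortho:
            p = p - inner(p, e) * e        # subtract projections onto earlier basis vectors
        ortho.append(p / np.sqrt(inner(p, p)))

    proj = sum(inner(np.sin, e) * e for e in ortho)            # P_U(sin)
    print(proj)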
Fourier Series.
Another very useful way to approximate a function in V = CR [−π, π] is to use Fourier series. Here,
we will use a scaled version of the previous inner product on V , namely
hf, gi = (1/π) ∫_{−π}^{π} f (x)g(x) dx.
Again, V is an infinite-dimensional inner product space, but we would like to be able to approximate
functions in V by projecting onto some smallish subspace.
It turns out that the functions
1/√2, sin(x), sin(2x), sin(3x), . . . , cos(x), cos(2x), cos(3x), . . .
are orthonormal vectors with respect to this inner product.13 So you can take 1/√2, sin(x), . . . , sin(nx),
cos(x), . . . , cos(nx) and get a (2n + 1)-dimensional subspace of V . Taking the orthogonal projection
of a function f ∈ V onto this subspace is the same thing as giving the first few terms of its Fourier
series.
So the real reason why Fourier series make good approximations to functions is because they are
given by projection onto a nice subspace of V .
13You will prove this on your homework.
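Computing these projections is just computing inner products. Here is a sketch (assuming scipy and numpy; the function x² is an arbitrary test case) that recovers the first few Fourier coefficients this way.

    import numpy as np
    from scipy.integrate import quad

    def inner(f, g):
        return quad(lambda x: f(x) * g(x), -np.pi, np.pi)[0] / np.pi

    f = lambda x: x ** 2
    n = 3
    a0 = inner(f, lambda x: 1 / np.sqrt(2))                         # coefficient of 1/sqrt(2)
    a = [inner(f, lambda x, k=k: np.cos(k * x)) for k in range(1, n + 1)]
    b = [inner(f, lambda x, k=k: np.sin(k * x)) for k in range(1, n + 1)]
    print(a0, a, b)   # for x^2 the sine coefficients vanish and the cosine ones are 4(-1)^k / k^2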
23. Friday 8/14: Operators on Inner Product Spaces (7.A)
The general outline of this class has been:
• Study vector spaces and subspaces (Chapters 1 and 2).
• Study linear maps between vector spaces, V → W (Chapter 3).
• Study operators on a vector space, V → V (Chapter 5).
• Study inner product spaces (Chapter 6).
• ???
• Profit.
What clearly must go in the “???” is to study linear maps between inner product spaces and
operators on inner product spaces. This is what we now endeavor to do. So throughout this
chapter, V and W will always denote finite-dimensional inner product spaces over F.
Definition 198. Suppose T ∈ L(V, W ). The adjoint of T is the function T ∗ : W → V such that
hT v, wi = hv, T ∗ wi
for every v ∈ V and every w ∈ W .
Why should such a function T ∗ exist? Well, suppose T ∈ L(V, W ) and fix a vector w ∈ W . Then
consider the linear functional V → F which maps v ∈ V to hT v, wi. (This is linear because it is
the composition of two linear maps v 7→ T v 7→ hT v, wi). This is a linear functional that depends
on T and w.
By the Riesz Representation Theorem, there is a unique vector u ∈ V such that this map is the
same thing as v 7→ hv, ui. This is the vector that we will call T ∗ w, so that hT v, wi = hv, T ∗ wi for all
v ∈ V .
Example 199. Let T : R4 → R2 be defined by
T (x1 , x2 , x3 , x4 ) = (x1 − x3 , x2 ).
Let’s compute the adjoint T ∗ : R2 → R4 . Fix a point (y1 , y2 ) ∈ R2 . By the definition of the
adjoint, we have
h(x1 , x2 , x3 , x4 ), T ∗ (y1 , y2 )i = hT (x1 , x2 , x3 , x4 ), (y1 , y2 )i = h(x1 − x3 , x2 ), (y1 , y2 )i
= x1 y1 − x3 y1 + x2 y2 = h(x1 , x2 , x3 , x4 ), (y1 , y2 , −y1 , 0)i
and so
T ∗ (y1 , y2 ) = (y1 , y2 , −y1 , 0).
Note that in the above example, T ∗ is a linear map (which we didn’t assume by definition of T ∗ ).
This is true in general.
Proposition 200. If T ∈ L(V, W ), then T ∗ ∈ L(W, V ).
Proof. Suppose T ∈ L(V, W ) and fix w1 , w2 ∈ W . We first want to show T ∗ (w1 + w2 ) = T ∗ (w1 ) +
T ∗ (w2 ). Note that for every v ∈ V , we have
hv, T ∗ (w1 + w2 )i = hT v, w1 + w2 i = hT v, w1 i + hT v, w2 i
= hv, T ∗ w1 i + hv, T ∗ w2 i = hv, T ∗ w1 + T ∗ w2 i.
Since this is true for every v ∈ V , and by the Riesz Representation Theorem, if h−, u1 i = h−, u2 i
then u1 = u2 , we conclude that T ∗ (w1 + w2 ) = T ∗ w1 + T ∗ w2 .
Homogeneity is proved similarly (and in your book).
So we have a shiny new linear map, the adjoint, to understand. What should we do next? First,
we should understand basic properties.
Proposition 201. (1) (S + T )∗ = S ∗ + T ∗ for all S, T ∈ L(V, W );
(2) (λT )∗ = λ̄T ∗ for all λ ∈ F and T ∈ L(V, W );
(3) (T ∗ )∗ = T for all T ∈ L(V, W );
(4) I ∗ = I where I is the identity operator on V ;
(5) (ST )∗ = T ∗ S ∗ for all T ∈ L(V, W ) and S ∈ L(W, U ).
Proof. (1) Suppose S, T ∈ L(V, W ). If v ∈ V and w ∈ W , then
hv, (S + T )∗ wi = h(S + T )v, wi = hSv, wi + hT v, wi
= hv, S ∗ wi + hv, T ∗ wi = hv, S ∗ w + T ∗ wi.
Again, since this is true for every v ∈ V , we have (S + T )∗ w = S ∗ w + T ∗ w for all w ∈ W , giving
our desired equality of functions.
(2) Suppose T ∈ L(V, W ) and λ ∈ F. Then
hv, (λT )∗ wi = hλT v, wi = λhT v, wi = λhv, T ∗ wi = hv, λ̄T ∗ wi
so (λT )∗ = λ̄T ∗ .
(3) Suppose T ∈ L(V, W ). If v ∈ V and w ∈ W , then
hw, (T ∗ )∗ vi = hT ∗ w, vi, which is the complex conjugate of hv, T ∗ wi = hT v, wi; and the conjugate
of hT v, wi is hw, T vi. Since this holds for all v and w, (T ∗ )∗ = T .
(4) If v, u ∈ V then
hv, I ∗ ui = hIv, ui = hv, ui
so I ∗ u = u. Since this is true for all u ∈ V , I ∗ is the identity on V .
(5) Suppose T ∈ L(V, W ) and S ∈ L(W, U ). If v ∈ V and u ∈ U then
hv, (ST )∗ ui = hST v, ui = hT v, S ∗ ui = hv, T ∗ (S ∗ u)i.
Hence, (ST )∗ u = T ∗ (S ∗ u), as desired.
The next question you should ask is... what about the null space and range of T ∗ ?14
Proposition 202. Suppose T ∈ L(V, W ). Then
(1) null T ∗ = (range T )⊥ ;
(2) range T ∗ = (null T )⊥ ;
(3) null T = (range T ∗ )⊥ ;
(4) range T = (null T ∗ )⊥ .
Proof. The nice thing about this proof is that we only need to prove (1) by hand, and then we will
cleverly conclude (2)–(4).
Suppose w ∈ W . Then
w ∈ null T ∗ ⇐⇒ T ∗ w = 0
⇐⇒ hv, T ∗ wi = 0 for all v ∈ V
⇐⇒ hT v, wi = 0 for all v ∈ V
⇐⇒ w ∈ (range T )⊥ .
Therefore, null T ∗ = (range T )⊥ .
14Thanks for asking!
Now take orthogonal complements of both sides to get that
(null T ∗ )⊥ = ((range T )⊥ )⊥ = range T,
which is (4).
Now since we proved (1) for any linear map, we know
null T = null(T ∗ )∗ = (range T ∗ )⊥
which is (3). Again, taking orthogonal complements yields (2).
And the next natural question might be... what is the matrix of T ∗ ? As sometimes happens in
math classes, I’m going to give you a definition which betrays the answer before we prove the
answer.15
Definition 203. The conjugate transpose of an m × n matrix is the n × m matrix obtained by
taking the transpose and then taking the complex conjugate of each entry.
If A is a matrix, then its conjugate transpose is sometimes denoted AH (for Hermitian transpose).
This is not notation that your book uses, however.
Example 204. If F = R, then the conjugate transpose is just the transpose (since the complex
conjugate of a real number is itself).
But suppose
A = [ 2      3 − i   5i ]
    [ 1 + i  2       7  ].
Then the conjugate transpose is
AH = [ 2      1 − i ]
     [ 3 + i  2     ]
     [ −5i    7     ].
So our question is, what is the matrix of T ∗ ? The answer to that question is, of course, “it depends
on the basis of V and basis of W ”. As you change the bases, you change the matrix. In the
following result, note that we are assuming that we have taken orthonormal bases of V and W . If
you drop the word “orthonormal”, then the result is not true.
When you have fixed orthonormal bases of V and W , then the matrices of T and T ∗ will be
conjugate transpose to one another. The fact that this is only true for some choices of bases is a
good example of why the abstract “basis-free” approach to linear algebra is much stronger. You
can define the adjoint of a linear map regardless of what basis you use. We would not want to
15Act surprised anyway.
define the adjoint of a linear map as “the linear map associated to the conjugate transpose of the
matrix”, since this will only be true for some choices of basis.
Proposition 205. Let T ∈ L(V, W ).
Suppose e1 , . . . , en is an orthonormal basis of V and
f1 , . . . , fm is an orthonormal basis of W . Then
M(T ∗ , (f1 , . . . , fm ), (e1 , . . . , en ))
is the conjugate transpose of
M(T, (e1 , . . . , en ), (f1 , . . . , fm )).
Proof. For ease of notation, we fix the bases above and write simply M(T ∗ ) and M(T ).
To get the kth column of the matrix M(T ), we write T ek as a linear combination of the fj ’s. But
since the fj ’s form an orthonormal basis, we know that
T ek = hT ek , f1 if1 + · · · + hT ek , fm ifm .
Therefore the (j, k) entry of M(T ) is hT ek , fj i.
By the same reasoning, we see that the (j, k) entry of M(T ∗ ) is hT ∗ fk , ej i. By the definition of
the adjoint, this is equal to hfk , T ej i, the complex conjugate of hT ej , fk i. But hT ej , fk i is exactly the (k, j)
entry of M(T ). Hence, with respect to this choice of orthonormal bases of V and W , M(T ∗ ) is the
conjugate transpose of M(T ).
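This is easy to confirm numerically with the standard (orthonormal) basis of Cn , where M(T ) is just a complex matrix A and M(T ∗ ) should be its conjugate transpose. A quick sketch (assuming numpy; the matrix and vectors are random):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    inner = lambda x, y: np.sum(x * np.conj(y))                   # Euclidean inner product on C^3
    print(np.isclose(inner(A @ v, w), inner(v, A.conj().T @ w)))  # <Tv, w> = <v, T*w>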
24. Monday 8/17: Self-Adjoint and Normal Operators (7.A)
If we consider an operator T : V → V , then since the adjoint of T goes from the codomain of T to
the domain of T , we have that T ∗ : V → V will also be an operator on V . In general, these two
maps are related as in the definition of the adjoint, namely hT v, wi = hv, T ∗ wi for all v, w ∈ V .
However, if T = T ∗ , we call such an operator self-adjoint. We will see that these are very important
and very special operators.
Definition 206. An operator T ∈ L(V ) is called self-adjoint if T = T ∗ . In other words an operator
T ∈ L(V ) is self-adjoint if and only if
hT v, wi = hv, T wi
for all v, w ∈ V .
Example 207. If we fix the standard basis of Fn and T ∈ L(Fn ), then we can write the n × n
matrix M(T ) with respect to the standard basis. Then T will be self-adjoint if and only if this
matrix is equal to its conjugate transpose. For example, the matrix
[ 1     2i      3     ]
[ −2i   4       1 − i ]
[ 3     1 + i   5     ]
will give a self-adjoint operator C3 → C3 . Note that in particular, the diagonal entries must be
real numbers.
If F = R, then the conjugate transpose is just the transpose, so an operator is self-adjoint if
and only if its matrix with respect to the standard basis is symmetric.
This example shows that, by analogy, we can think of taking the adjoint on L(V ) as similar to
taking complex conjugation in C. By this analogy, self-adjoint operators are analogous to real
numbers.
A slightly weaker notion than a self-adjoint operator is a normal operator.
Definition 208. An operator on an inner product space is called normal if it commutes with its
adjoint. That is, T ∈ L(V ) is normal if
T T ∗ = T ∗ T.
This is a weaker notion than being self-adjoint, because of course if T is self-adjoint, then T ∗ = T ,
so certainly T T ∗ = T ∗ T .
Example 209. Let T : C2 → C2 be the operator whose matrix with respect to the standard
basis is
⎡ 2i    3 ⎤
⎣ −3   −i ⎦ .
Then the matrix of T ∗ is
⎡ −2i  −3 ⎤
⎣  3    i ⎦ ,
so T is not self-adjoint. However,
⎡ 2i    3 ⎤ ⎡ −2i  −3 ⎤   ⎡ 13  −3i ⎤   ⎡ −2i  −3 ⎤ ⎡ 2i    3 ⎤
⎣ −3   −i ⎦ ⎣  3    i ⎦ = ⎣ 3i   10 ⎦ = ⎣  3    i ⎦ ⎣ −3   −i ⎦ ,
so T and T ∗ commute. Hence, T is normal.
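Side note: the computation in Example 209 is easy to verify numerically. A quick Python/numpy sketch, just for illustration:

import numpy as np

A = np.array([[2j,  3 ],
              [-3, -1j]])       # M(T) from Example 209
A_star = A.conj().T             # M(T*)

print(np.allclose(A, A_star))                 # False: T is not self-adjoint
print(np.allclose(A @ A_star, A_star @ A))    # True:  T T* = T* T, so T is normal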
It turns out that self-adjoint operators and normal operators are some of the nicest possible operators on an inner product space.
We first dig into studying self-adjoint operators.
Proposition 210. If S, T ∈ L(V ) are self-adjoint, then S + T is self-adjoint. If λ ∈ R and T is
self-adjoint, then λT is self-adjoint.
Proof. This follows from Proposition 201. If S, T are self-adjoint then S ∗ = S and T ∗ = T . Hence,
(S + T )∗ = S ∗ + T ∗ = S + T
so S + T is self-adjoint. Also,
(λT )∗ = λ̄T ∗ = λ̄T,
which equals λT exactly when λ is real; since we assumed λ ∈ R, this shows λT is self-adjoint.
Remark. Note that this result says that over C, the set of self-adjoint operators is not a subspace
of L(V ). The set is closed under addition but not closed under scalar multiplication. Over R, the
set of self-adjoint operators is a subspace of L(V ).
But even with this first basic result, we can see an instance of the analogy “{self-adjoint operators} ⊆
L(V ) is like R ⊆ C”. The set R inside of C is closed under addition, but not closed under scalar
multiplication. In other words, R ⊆ C is not a subspace of C (as a C-vector space). In fact, R is
just a small slice of the one-dimensional vector space C. Similarly, the set of self-adjoint operators
is a small (nice) slice of L(V ).
Here’s another result which says that self-adjoint operators are somehow analogous to real numbers.
Proposition 211. Every eigenvalue of a self-adjoint operator is real.
Proof. Suppose T is a self-adjoint operator and let λ be an eigenvalue. We want to show λ is real.
Since T is self-adjoint, we know that hT v, vi = hv, T vi for every v ∈ V . Choose an eigenvector v
with eigenvalue λ and work outward from both sides of this equation:
λ kvk2 = hλv, vi = hT v, vi = hv, T vi = hv, λvi = λ̄ kvk2 .
Since v is an eigenvector, it is nonzero, so kvk2 6= 0 and hence λ = λ̄, so λ is real.
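Side note: here is a quick numerical illustration of Proposition 211 in Python/numpy (the random seed and the size 4 are arbitrary). We build a self-adjoint matrix by adding a random matrix to its conjugate transpose and then check that its eigenvalues have no imaginary part.

import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                        # self-adjoint by construction

eigenvalues = np.linalg.eigvals(A)        # generic eigenvalue routine, no Hermitian shortcut
print(np.allclose(eigenvalues.imag, 0))   # True: every eigenvalue is (numerically) real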
This next result is not true over R: consider the operator on R2 that rotates by an angle of π/2.
It sends each vector v to a vector orthogonal to v, so hT v, vi = 0 for every v, even though T 6= 0.
Lemma 212. Suppose V is an inner product space over C and T ∈ L(V ). Suppose
hT v, vi = 0
for all v ∈ V . Then T = 0.
Proof. Verify that for all u, w ∈ V ,
hT u, wi = (1/4) hT (u + w), u + wi − (1/4) hT (u − w), u − wi
+ (i/4) hT (u + iw), u + iwi − (i/4) hT (u − iw), u − iwi.
[Just expand out the right-hand side and cancel.] Now notice every inner product on the right-hand
side is of the form hT v, vi for some vector v. So by hypothesis, these are all 0. Then hT u, wi = 0
for all u, w ∈ V .
Now fix u and take w = T u. Then hT u, T ui = kT uk2 = 0 so T u = 0. Since u was arbitrary, this
shows that T = 0.
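Side note: the identity used in this proof is easy to test numerically if you do not feel like expanding it by hand. A Python/numpy sketch, with an arbitrary random operator and vectors:

import numpy as np

rng = np.random.default_rng(2)
n = 5
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def ip(x, y):
    # the standard inner product <x, y> = sum_j x_j conj(y_j)
    return np.vdot(y, x)

lhs = ip(T @ u, w)
rhs = ((ip(T @ (u + w), u + w) - ip(T @ (u - w), u - w)) / 4
       + 1j * (ip(T @ (u + 1j * w), u + 1j * w) - ip(T @ (u - 1j * w), u - 1j * w)) / 4)
print(np.isclose(lhs, rhs))   # True, up to round-off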
We said that taking adjoints can be thought of as analogous to complex conjugation, and self-adjoint
operators can be thought of as analogous to real numbers (inside the space of all operators). The
next result is another instance of this analogy. In general, for a vector v ∈ V , the scalar hT v, vi will
be some complex number that varies as you vary v. However, if T is self-adjoint, then this scalar
will always be real.
Proposition 213. Suppose V is a complex inner product space and T ∈ L(V ). Then T is self-adjoint if and only if
hT v, vi ∈ R
for every v ∈ V .
Proof. Let v ∈ V . Recall that the complex conjugate of hT v, vi is hv, T vi. Then
hT v, vi − hv, T vi = hT v, vi − hT ∗ v, vi = h(T − T ∗ )v, vi.
A scalar is real if and only if it equals its own complex conjugate. So hT v, vi ∈ R for every v ∈ V
if and only if h(T − T ∗ )v, vi = 0 for every v ∈ V , which by Lemma 212 happens if and only if
T − T ∗ = 0 (we only proved one direction of Lemma 212, but the converse is obviously true).
Hence, hT v, vi ∈ R for every v ∈ V if and only if T is self-adjoint.
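Side note: a two-line numerical illustration of Proposition 213 (Python/numpy, arbitrary random data). For a self-adjoint A the number hAv, vi has zero imaginary part; for a generic non-self-adjoint B it usually does not.

import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))   # generic operator
A = B + B.conj().T                                                   # self-adjoint operator
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

print(np.vdot(v, A @ v).imag)   # essentially 0: <Av, v> is real
print(np.vdot(v, B @ v).imag)   # typically far from 0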
We saw that Lemma 212 is not necessarily true over R, but only true over C. However, if we restrict
our attention to self-adjoint operators, the statement is true over R.
Proposition 214. Suppose T is a self-adjoint operator on V such that
hT v, vi = 0
for all v ∈ V . Then T = 0.
Proof. We showed that this was true over C even without assuming T is self-adjoint. So we may
assume that V is a real inner product space. Now if u, w ∈ V then
( hT (u + w), u + wi − hT (u − w), u − wi ) / 4
= (1/4) ( hT u, ui + hT u, wi + hT w, ui + hT w, wi − hT u, ui + hT u, wi + hT w, ui − hT w, wi )
= (1/2) ( hT u, wi + hT w, ui ) .
Now since T is self-adjoint, we have hT w, ui = hw, T ui, and hw, T ui is the complex conjugate of
hT u, wi, which is just hT u, wi since we are working over R. Hence, the first expression simply
becomes hT u, wi.
But each term in the numerator of the first expression is of the form hT v, vi for some v, which
means the first expression is equal to 0 so hT u, wi = 0 for all u, w ∈ V . In particular, taking u
arbitrary and w = T u, we have hT u, T ui = kT uk2 = 0 for all u ∈ V , whence T u = 0 for all u ∈ V .
Hence T = 0.
Proposition 215. An operator T ∈ L(V ) is normal if and only if
kT vk = kT ∗ vk
for all v ∈ V .
Proof. Let T ∈ L(V ). Note that the operator T ∗ T − T T ∗ is self-adjoint since
(T ∗ T − T T ∗ )∗ = (T ∗ T )∗ − (T T ∗ )∗ = T ∗ (T ∗ )∗ − (T ∗ )∗ T ∗ = T ∗ T − T T ∗
(by using several parts of Proposition 201). Therefore, we have
T is normal ⇐⇒ T ∗ T − T T ∗ = 0
⇐⇒ h(T ∗ T − T T ∗ )v, vi = 0 for all v ∈ V (by Proposition 214)
⇐⇒ hT ∗ T v, vi = hT T ∗ v, vi for all v ∈ V .
Now by the definition of the adjoint, we have hT ∗ T v, vi = hT v, (T ∗ )∗ vi = hT v, T vi. So the left-hand
side of the last line is kT vk2 and similarly the right-hand side is kT ∗ vk2 . Hence, we continue the
chain of equivalences
⇐⇒ kT vk2 = kT ∗ vk2 for all v ∈ V
⇐⇒ kT vk = kT ∗ vk for all v ∈ V ,
which proves the desired equivalence.
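Side note: Proposition 215 is also easy to see numerically for the normal operator of Example 209 (Python/numpy, arbitrary test vector):

import numpy as np

A = np.array([[2j,  3 ],
              [-3, -1j]])                 # the normal operator from Example 209

rng = np.random.default_rng(4)
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)

print(np.linalg.norm(A @ v))              # ||T v||
print(np.linalg.norm(A.conj().T @ v))     # ||T* v||: the same number, up to round-off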
The idea is that normal operators are very closely related to their adjoints (although not as closely
as self-adjoint operators, naturally).
25. Wednesday 8/19: The Spectral Theorem (7.B)
The goal of today’s lecture is to state two very important theorems, the Spectral Theorems.
In order to talk about them, we need to know a bit more about normal operators. Recall that
we were in the midst of showing that even though normal operators aren’t as strongly related to
their adjoints as self-adjoint operators are, they are still very closely related. For example, if T is
normal, then T and T ∗ share all of their eigenvectors.
Proposition 216. Suppose T ∈ L(V ) is normal and v ∈ V is an eigenvector of T with eigenvalue
λ. Then v is also an eigenvector of T ∗ with eigenvalue λ̄.
Proof. First, we know that if T is normal then for any λ ∈ F, the operator T − λI is normal. This
is because
(T − λI)(T − λI)∗ = (T − λI)(T ∗ − λ̄I) = T T ∗ − λ̄T − λT ∗ + λλ̄I
= T ∗ T − λT ∗ − λ̄T + λλ̄I = (T ∗ − λ̄I)(T − λI) = (T − λI)∗ (T − λI),
where (T − λI)∗ = T ∗ − λ̄I by Proposition 201, T commutes with T ∗ since T is normal, and every
operator commutes with the identity operator.
Hence, if v is an eigenvector of T with eigenvalue λ, then we have
0 = k(T − λI)vk = k(T − λI)∗ vk = k(T ∗ − λ̄I)vk,
where the middle equality holds by Proposition 215 since T − λI is normal. So (T ∗ − λ̄I)v = 0,
and v is an eigenvector of T ∗ with eigenvalue λ̄.
Our last result in this section is a hint as to the importance of normal operators. Their “eigenspaces
are orthogonal” in a sense we make precise below.
Proposition 217. Suppose T ∈ L(V ) is normal. If u is an eigenvector with eigenvalue α and v is
an eigenvector with eigenvalue β and α 6= β, then u and v are orthogonal.
Proof. We have T u = αu, T v = βv, and also T ∗ v = β̄v by Proposition 216. Therefore,
(α − β)hu, vi = hαu, vi − hu, β̄vi = hT u, vi − hu, T ∗ vi = 0.
But since α − β 6= 0, this means hu, vi = 0, as desired.
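Side note: for the normal operator of Example 209 you can watch Proposition 217 happen numerically. Its two eigenvalues are distinct, so the computed eigenvectors should be (numerically) orthogonal. A Python/numpy sketch:

import numpy as np

A = np.array([[2j,  3 ],
              [-3, -1j]])            # normal, from Example 209

eigenvalues, eigenvectors = np.linalg.eig(A)
u, v = eigenvectors[:, 0], eigenvectors[:, 1]

print(eigenvalues)                   # two distinct (purely imaginary) eigenvalues
print(abs(np.vdot(v, u)))            # |<u, v>| is essentially 0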
Recall that an operator T ∈ L(V ) has a diagonal matrix with respect to a basis of V if and only if
the basis consists of eigenvectors of T . We already saw (in Example 182) that even if V does have
such a basis, it might not have an orthonormal basis consisting of eigenvectors of T .
The nicest operators will be the ones such that there is an orthonormal basis of eigenvectors. The
big goal of this section is to prove the Spectral Theorem, which characterizes these operators. The
answer is different over C vs. over R, so your book proves two versions of the Spectral Theorem.
Theorem 218 (The Complex Spectral Theorem). Suppose F = C and T ∈ L(V ). Then the
following are equivalent:
(1) T is normal.
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
Proof. The equivalence of (2) and (3) follows from Proposition 152. So it suffices to prove the
equivalence of (1) and (3).
[(3) ⇒ (1)]. Suppose T has a diagonal matrix with respect to some orthonormal basis of V . Then
by Proposition 205, with respect to the same basis, the matrix of T ∗ is the conjugate transpose of
the matrix of T . Therefore, both T and T ∗ have diagonal matrices with respect to this basis. Now
note that
⎡ a1        ⎤ ⎡ b1        ⎤   ⎡ a1 b1         ⎤   ⎡ b1        ⎤ ⎡ a1        ⎤
⎢     ⋱     ⎥ ⎢     ⋱     ⎥ = ⎢       ⋱       ⎥ = ⎢     ⋱     ⎥ ⎢     ⋱     ⎥
⎣        an ⎦ ⎣        bn ⎦   ⎣         an bn ⎦   ⎣        bn ⎦ ⎣        an ⎦
(that is, any two diagonal matrices commute). Hence, T T ∗ and T ∗ T have the same matrix with
respect to this basis, so they are the same linear map. So T T ∗ = T ∗ T and so T is normal.
[(1) ⇒ (3)]. Suppose T is normal. By Schur’s Theorem, there is an orthonormal basis e1 , . . . , en of
V with respect to which the matrix of T is upper-triangular. So
M(T, (e1 , . . . , en )) =
⎡ a1,1  · · ·  a1,n ⎤
⎢         ⋱     ⋮   ⎥
⎣   0          an,n ⎦
and
M(T ∗ , (e1 , . . . , en )) =
⎡ ā1,1          0   ⎤
⎢   ⋮     ⋱         ⎥
⎣ ā1,n  · · ·  ān,n ⎦ .
We will show that actually these matrices are diagonal.
Note that from the matrices above we see that
kT e1 k2 = |a1,1 |2
and
kT ∗ e1 k2 = |ā1,1 |2 + · · · + |ā1,n |2 = |a1,1 |2 + · · · + |a1,n |2 .
But since T is normal, by Proposition 215, kT e1 k = kT ∗ e1 k and so a1,2 = · · · = a1,n = 0.
Now play the same game with e2 . Because we know a1,2 = 0, we have T e2 = a2,2 e2 , so
kT e2 k2 = |a2,2 |2 .
We also have
kT ∗ e2 k2 = |a2,2 |2 + · · · + |a2,n |2 ,
and again since T is normal, we can conclude that a2,3 = · · · = a2,n = 0.
Continue in this fashion (okay actually you should do an induction) to see that all non-diagonal
entries in M(T ) are zero.
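Side note: the proof of (1) ⇒ (3) is exactly what a numerical Schur decomposition lets you watch. The sketch below uses scipy (assuming it is available; scipy is not part of this course): scipy.linalg.schur returns a unitary Q and an upper-triangular T_mat with A = Q T_mat Q∗, and for a normal A the triangular factor comes out (numerically) diagonal.

import numpy as np
from scipy.linalg import schur

A = np.array([[2j,  3 ],
              [-3, -1j]])                            # normal (Example 209)

T_mat, Q = schur(A, output='complex')                # A = Q @ T_mat @ Q.conj().T
print(np.allclose(T_mat, np.diag(np.diag(T_mat))))   # True: the Schur form is diagonal
print(np.allclose(Q.conj().T @ Q, np.eye(2)))        # True: the columns of Q are orthonormal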
In lecture, we did not have time to prove the Real Spectral Theorem, so I simply stated it. Full
details of the proof are of course found in your book.
Theorem 219 (The Real Spectral Theorem). Suppose F = R and T ∈ L(V ). Then the following
are equivalent:
(1) T is self-adjoint.
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
The Real Spectral Theorem (real as in R, not real as in “actual”).
In the real case, things are a bit different (e.g., Schur’s Theorem only applies to complex vector
spaces). So we need to do a bit more setup.
The first observation is an analogue of the following fact. For any real number x,
x2 + bx + c = (x + b/2)2 + c − b2 /4
and so as long as c − b2 /4 > 0 ⇐⇒ b2 < 4c, we will have that x2 + bx + c > 0. Every nonzero real
number is invertible, so this means that if b2 < 4c, then for every real number x, x2 + bx + c is
invertible.
Proposition 220. Suppose T ∈ L(V ) is a self-adjoint operator and b, c ∈ R such that b2 < 4c.
Then
T 2 + bT + cI
is an invertible operator.
Proof. Let v be any nonzero vector in V . If we can show that (T 2 + bT + cI)v 6= 0, then T 2 + bT + cI
will have a trivial null space, so will be invertible. Now observe that
h(T 2 + bT + cI)v, vi = hT 2 v, vi + bhT v, vi + chv, vi
= hT v, T vi + bhT v, vi + c kvk2 (since T is self-adjoint)
≥ kT vk2 − |b| kT vk kvk + c kvk2 (Cauchy–Schwarz)
= kT vk2 − |b| kT vk kvk + |b|2 kvk2 /4 + c kvk2 − |b|2 kvk2 /4
= (kT vk − |b| kvk /2)2 + (c − b2 /4) kvk2 > 0.
Since this inner product is nonzero, we have that (T 2 + bT + cI)v 6= 0, as desired.
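Side note: a numerical illustration of Proposition 220 over R (Python/numpy, with an arbitrary random symmetric T and a choice of b, c satisfying b2 < 4c):

import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
T = B + B.T                         # real symmetric, i.e. self-adjoint over R
b, c = 1.0, 3.0                     # b^2 = 1 < 12 = 4c

M = T @ T + b * T + c * np.eye(4)
print(np.linalg.eigvalsh(M))        # all eigenvalues positive, so M is invertible

v = rng.standard_normal(4)
print(v @ M @ v > 0)                # <(T^2 + bT + cI)v, v> > 0, as in the proof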
We saw that every operator on a complex vector space has an eigenvalue, but that this was not true
for real vector spaces (e.g., rotation by an angle 0 < θ < π). However, every self-adjoint operator
on a real vector space does have an eigenvalue.
Proposition 221. Suppose V 6= {0} and T ∈ L(V ) is a self-adjoint operator. Then T has an
eigenvalue.
Proof. Every operator on a complex vector space has an eigenvalue, regardless of whether it is
self-adjoint or not. So we need only prove this over R.
Assume V is a real inner product space and let n = dim V . Choose v ∈ V nonzero. Then
v, T v, T 2 v, . . . , T n v
is a linearly dependent list (since there are n + 1 vectors in an n-dimensional space). Hence, there
exist a0 , a1 , . . . , an ∈ R, not all 0, such that
0 = a0 v + a1 T v + · · · + an T n v.
Consider the polynomial a0 + a1 x + · · · + an xn . Even though this polynomial may not factor
completely into linear factors over R, it does factor completely into linear and quadratic factors
(since the non-real roots of a real polynomial come in conjugate pairs, you can pair them up and
get a real quadratic factor). Hence, we can write
a0 + a1 x + · · · + an xn = c(x2 + b1 x + c1 ) · · · (x2 + bM x + cM )(x − λ1 ) · · · (x − λm )
where c 6= 0, the bi , ci , λi are real, and 2M + m equals the degree of the polynomial. We did not
cover this (it is Theorem 4.17 in your book), but we can also assume b2i < 4ci for all i.
Then we can write
0 = a0 v + a1 T v + · · · + an T n v
= (a0 I + a1 T + · · · + an T n )v
= c(T 2 + b1 T + c1 I) · · · (T 2 + bM T + cM I)(T − λ1 I) · · · (T − λm I)v.
By Proposition 220, each of the T 2 + bj T + cj I is invertible, and c 6= 0. Applying the inverses of
these factors (and dividing by c), we see that
0 = (T − λ1 I) · · · (T − λm I)v.
Note that m ≥ 1: otherwise this equation would read 0 = v, contradicting v 6= 0. Hence, for some
j, T − λj I is not injective, and for this j, λj will be an eigenvalue of T .
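Side note: here is the contrast Proposition 221 is about, done numerically in Python/numpy. A rotation of R2 has no real eigenvalue, while a symmetric matrix always does (the specific matrices below are just illustrative).

import numpy as np

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation of R^2: not self-adjoint
S = np.array([[2., 1.],
              [1., 3.]])                          # symmetric, hence self-adjoint

print(np.linalg.eigvals(R))   # a non-real conjugate pair: no real eigenvalue
print(np.linalg.eigvals(S))   # real eigenvalues, as Proposition 221 guarantees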
One last lemma before we’re ready to prove the Real Spectral Theorem.
Proposition 222. Suppose T ∈ L(V ) is self-adjoint and U is a subspace of V that is invariant
under T . Then
(1) U ⊥ is invariant under T ;
(2) T |U ∈ L(U ) is self-adjoint;
(3) T |U ⊥ ∈ L(U ⊥ ) is self-adjoint.
Proof. (1) Suppose that v ∈ U ⊥ . We want to show that T v ∈ U ⊥ . For any u ∈ U , we have
hT v, ui = hv, T ui
since T is self-adjoint. Since U is invariant under T , we also have that T u ∈ U and hence hv, T ui = 0.
Therefore, hT v, ui = 0 and since u ∈ U was arbitrary, we conclude that T v ∈ U ⊥ .
(2) Let u, v ∈ U . Then
hT |U (u), vi = hT u, vi = hu, T vi = hu, T |U (v)i
and hence T |U (which is an operator on U since U is invariant under T ) is a self-adjoint operator.
(3) Since we proved in part (1) that U ⊥ is invariant under T , this has the same proof as part (2)
with U ⊥ substituted in for U .
This takes us to the last theorem of the class.
Theorem 223 (The Real Spectral Theorem). Suppose F = R and T ∈ L(V ). Then the following
are equivalent:
(1) T is self-adjoint.
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
Proof. As we noted in the proof of the Complex Spectral Theorem, (2) and (3) are equivalent by
Proposition 152. We will prove (3) ⇒ (1) and (1) ⇒ (2).
[(3) ⇒ (1)]. Suppose T has a diagonal matrix with respect to some orthonormal basis of V . The
matrix of T ∗ with respect to this basis is given by the conjugate transpose. But a real diagonal
matrix is equal to its conjugate transpose, so T and T ∗ have the same matrix with respect to this
basis. Hence, T = T ∗ and so T is self-adjoint.
[(1) ⇒ (2)]. We prove this by induction on dim V . For the base case, if dim V = 1, then (1) ⇒ (2),
because every linear map V → V is represented by a 1 × 1 matrix [a], and so every nonzero vector
is an eigenvector of T . So any single vector of norm 1 gives an orthonormal basis of V consisting
of eigenvectors of T .
Now suppose dim V > 1 and assume (1) ⇒ (2) for all real inner product spaces of smaller dimension.
Suppose T is self-adjoint. By Proposition 221, T has an eigenvalue and thus has a (non-zero)
eigenvector u. We can scale this eigenvector by dividing by its norm and assume that kuk = 1.
Then U = span(u) is a 1-dimensional subspace of V which is invariant under T . By the previous
result, T |U ⊥ ∈ L(U ⊥ ) is a self-adjoint operator.
Now note that dim U ⊥ = dim V − dim U = dim V − 1, so by the induction hypothesis, there is an
orthonormal basis of U ⊥ consisting of eigenvectors of T |U ⊥ . Adjoining u to this orthonormal basis
of U ⊥ gives an orthonormal basis of V consisting of eigenvectors of T .
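Side note: the conclusion of the Real Spectral Theorem is exactly what numpy's eigh routine hands you for a symmetric matrix: an orthonormal basis of eigenvectors. A quick sketch with an arbitrary random symmetric matrix:

import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
S = B + B.T                             # real symmetric = self-adjoint over R

eigenvalues, Q = np.linalg.eigh(S)      # columns of Q are eigenvectors of S
print(np.allclose(Q.T @ Q, np.eye(5)))                   # True: they form an orthonormal basis
print(np.allclose(Q.T @ S @ Q, np.diag(eigenvalues)))    # True: S is diagonal in that basis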