Math 340: Summer 2020 Course Notes Robert Won These are course notes from Math 340 (Abstract Linear Algebra) at the University of Washington taught during Summer 2020. The textbook was Linear Algebra Done Right by Sheldon Axler. I also referenced Lucas Braune’s notes from when he taught Math 340. To my students: If you find errors or typos, please let me know! I can correct them. 1. Monday 6/22: Vector Spaces (1.A–B) and Overview of the Course Course overview • Q: What is the point of this course? This course serves as a second (or perhaps third) exposure to linear algebra. In Math 308, when you first learned linear algebra, you focused on Rn , matrices, and computations. Depending on your instructor, you likely saw a few proofs, because understanding a proof of a result deepens your understanding of the result. You also probably saw some of the powerful applications of linear algebra (if you took Math 380, you saw even more). Linear algebra is central to Google PageRank, sabermetrics, machine learning, principal component analysis, and graph theory. It is also central to pure mathematics (every mathematics professor uses linear algebra in their research). The more deeply you understand the concepts, the better. So the goal of this course is to understand linear algebra deeply, which means that this course will be a proof-based course. • Logistical details: The textbook is important, and you can get access to an electronic version through the UW library. I will post three lectures on Canvas (in the Panopto tab) a week, which you can watch asynchronously. But I highly suggest that you keep up with lectures. Getting behind in a math class makes things a lot less fun. Both your TA and I will have office hours on Zoom, with the exact timing TBA. I will post the slides from lecture to Canvas, as well as regularly updating these typed up notes (I’m writing them while teaching this summer, so as the quarter progresses, this document will continue growing). You will have weekly problem sets which will be posted to Canvas. You will submit completed assignments on Gradescope. There will be one midterm exam, and a final exam. These will be timed remote exams, also submitted on Gradescope. I will give you more information as we get closer to the first midterm. Rn and Cn (1.A) In your previous linear algebra courses, you likely focused mostly on the vector space Rn . The next vector space to keep in mind is Cn . So first, we should make sure that we’re all somewhat comfortable with complex numbers. Section 1.A in your textbook has more details, which you should review carefully if you’re feeling less comfortable. Definition 1. A complex number is a pair (a, b) where a, b ∈ R. We usually write a + bi instead of (a, b). The set of all complex numbers is denoted by C: C = {a + bi | a, b ∈ R}. The set C is equipped with operations addition and multiplication: (a + bi) + (c + di) = (a + c) + (b + d)i (a + bi)(c + di) = (ac − bd) + (ad + bc)i. Using this multiplication rule, you can calculate that i2 = −1. So you should not memorize the multiplication rule, but just carry out the arithmetic using the fact that i2 = −1. Remark. Let z = a + bi ∈ C \ {0}. Then (a + bi)(a − bi) = a2 + b2 . Using this fact, you can show that b a − i = 1. z· a2 + b2 a2 + b2 In other words, every nonzero complex number has a multiplicative inverse. We write z −1 = 1 a b = 2 − 2 i. 
2 z a +b a + b2 The triple (C, +, ·) is an example of a field (a set with two operations satisfying certain axioms, including that every nonzero element has a multiplicative inverse). Other examples of fields include: R, Q, Z/(p) where p is a prime number. It turns out that actually you can do linear algebra over any field. Notation. in this course, we use F to denote either R or C. elements of F are called scalars. Just as you are familiar with R3 = {(x, y, z) | x, y, z ∈ R} we make the following definition. Definition 2. The n-dimensional Euclidean space over F is defined to be Fn = {(x1 , x2 , . . . , xn ) | xj ∈ F for j = 1, . . . , n}. We say that xj is the jth coordinate of (x1 , . . . , xn ) ∈ Fn . If you recall all of the things you did in Math 308 and Math 380, the important fact about Rn is that you can add vectors and scalar multiply (when taking linear combinations, spans, null spaces, etc., these are the important operations). When row reducing, it was important to be able to divide by any nonzero scalar. These operations also work in Fn (where F = R or C or any field). Definition 3. Addition in Fn is defined by adding corresponding coordinates (x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ). Scalar multiplication in Fn is defined as follows. If λ ∈ F and (x1 , . . . , xn ) ∈ Fn then λ(x1 , . . . , xn ) = (λx1 , . . . , λxn ). Vector Spaces (1.B) Since all we need to do linear algebra are addition and scalar multiplication by elements of a field F, in the abstract, we define vector spaces in terms of having these operations. Definition 4. A vector space (over F) is a set V equipped with two operations: an addition that assigns an element u + v ∈ V for every pair of elements u, v ∈ V and scalar multiplication that assigns an element λv ∈ V to each λ ∈ F and each v ∈ V . The operations are required to satisfy the following axioms. (1) (commutativity of addition) u + v = v + u for all u, v ∈ V , (2) (associativity of addition) (u + v) + w = u + (v + w) for all u, v, w ∈ V , (3) (additive identity) there exists an element 0 ∈ V such that 0 + v = v for all v ∈ V , (4) (additive inverse) for every v ∈ V , there exists w ∈ V such that v + w = 0, (5) (associativity of scalar multiplication) (ab)v = a(bv) for all a, b ∈ F and v ∈ V , (6) (multiplicative identity) 1v = v for all v ∈ V , (7) (distributivity) a(u + v) = au + av and (a + b)v = av + bv for all a, b ∈ F and u, v ∈ V . Elements of a vector space are called vectors or sometimes points. The value in defining vector spaces abstractly is that many different sets (other than just Rn ) satisfy these axioms. So if we prove any results using only these axioms, the result will hold for any vector space, including Rn . Here are some examples of some vector spaces. Example 5. Let F∞ = {(x1 , x2 , x3 , . . . ) | xj ∈ F for all j ∈ N}. Define addition and scalar multiplication (x1 , x2 , . . . ) + (y1 , y2 , . . . ) = (x1 + y1 , x2 + y2 , . . . ) λ(x1 , x2 , . . . ) = (λx1 , λx2 , . . . ). One can check that these operations satisfy the axioms above. Example 6. Let P(R) denote the set of all polynomials in the variable x with coefficients in R. For example, x2 − 3, πx + e, x3 ∈ P(R). As usual, you can add two polynomials and obtain another polynomial (x2 − 3) + (πx + e) = x2 + πx + e − 3 and you can multiply a polynomial by any real number λ ∈ R to obtain another e(x2 − 3) = ex2 − 3e and you can check that these operations make P(R) into a vector space (over R). Example 7. Let S be any set. 
Let FS = {all functions f : S → F}. For any f, g ∈ FS , define addition by (f + g) : S → F (f + g)(x) = f (x) + g(x) for all x ∈ S. For any λ ∈ F and f ∈ FS , define (λf ) : S → F (λf )(x) = λf (x) for all x ∈ S. Then this makes FS into a vector space. So now when we prove a result for vector spaces (using just the axioms), it will hold for Rn , Cn , these examples, and all other vector spaces. Let’s see some examples (which may be familiar from Math 300). Notation. Throughout these notes and the textbook, V will denote a vector space over F. Proposition 8. A vector space has a unique additive identity. Remark. Note that the axioms of a vector space only asserted the existence of an additive identity, we didn’t assume that there was a unique vector with that property. In fact, it is unique, and we can prove it. Proof. Suppose that 0, 00 ∈ V are both additive identities. Then 00 = 0 + 00 = 00 + 0 = 0 so 00 = 0. Hence, the additive identity is unique. Proposition 9. Every element in a vector space has a unique additive inverse. Proof. Let v ∈ V . We wish to show that v has a unique additive inverse. So suppose that w, w0 are additive inverses of v. Then w = w + 0 = w + (v + w0 ) = (w + v) + w0 = 0 + w0 = w0 so w = w0 , as desired. Notation. Let v, w ∈ V . We use the notation −v for the (unique) additive inverse of v and define w − v = w + (−v). Proposition 10. 0v = 0 for every v ∈ V Proof. For v ∈ V , we have 0v = (0 + 0)v = 0v + 0v subtract 0v from both sides (that is, add the additive inverse of 0v). Proposition 11. For every a ∈ F, a0 = 0. Proof. For any a ∈ F a0 = a(0 + 0) = a0 + a0 subtract a0 from both sides. Note the difference in the two above propositions. The first says that the product of the scalar 0 with any vector v is the zero vector. The second says that product of any scalar with the zero vector is the zero vector. The proofs look similar, but you have to pay careful attention to what is a vector and what is a scalar. Oftentimes we abuse notation by denoting both the scalar 0 and the zero vector 0 with the same symbol, because usually it is clear from context. You should, though. keep clear in your mind whether you are working with a scalar or a vector. Proposition 12. For every v ∈ V , (−1)v = −v. This proposition says that if you take the scalar −1 and multiply it by any vector v, you obtain the additive inverse of v. Try to prove it! Then check it against the book’s proof afterward. 2. Wednesday 6/24: Subspaces (1.C) As always, let V denote a vector space over F. Definition 13. A subset U ⊆ V is a subspace if U is also a vector space (using the same addition and scalar multiplication as on V ). Proposition 14. A subset U ⊆ V is a subspace if and only if it satisfies the following three conditions: (1) 0 ∈ U , (2) u, v ∈ U implies u + v ∈ U , (3) a ∈ F and u ∈ U implies au ∈ U . Proof. If U is a subspace of V , then by definition it is a vector space using the same addition and scalar multiplication as V . So (2) and (3) are automatic by the definition of a vector space. We should check that 0U = 0 (that is, the additive identity of U is the same as the additive identity of V ). Note that 0 = 0 · 0U ∈ U because U is closed under scalar multiplication. And since the additive identity of a vector space is unique, this shows that 0U = 0. For the converse, suppose that U satisfies conditions (1)–(3). If u ∈ U then −u = (−1) · u ∈ U by condition (3). 
Since V is a vector space and the addition and scalar multiplication of U are inherited from V , all other axioms for a vector space hold in U . (For example, if u, v ∈ U then u, v ∈ V so u + v = v + u, etc.) Example 15. • If b ∈ F, then {(x1 , x2 , x3 ) ∈ F4 | x1 − 3x2 = b} is a subspace of F3 if and only if b = 0. • The set of continuous real-valued functions f : [0, 1] → R is a subspace of R[0,1] (the set of all functions [0, 1] → R). This is because the sum of two continuous functions is continuous, and the scalar multiple of any continuous function is continuous. • The set of continuous real-valued functions f : [0, 1] → R satisfying f (0) = b is a subspace of R[0,1] if and only if b = 0. You should check for yourself that the three examples above are subspaces of their respective vector spaces. Remember that this come down to checking that the 0 vector is in each of them (the zero vector in R[0,1] is the function that is identically 0) and that the subset is closed under addition and scalar multiplication. Proposition 16. Let {Uλ }λ∈Λ be a family of subspaces of V . Then T λ∈Λ Uλ is a subspace of V . Proof. You will prove this on your problem set. Sums of subspaces. We now discuss sums of subspaces. The idea is the following. Given two (or more) subspaces U and V , what is the correct notion of combining U and V into a bigger subspace? In particular, we would like this bigger subspace to actually be a subspace, not just a subset. So for example, taking U ∪ V does not always produce a subspace. The correct notion is to take the sum of U and V , which we will define below. Note that the definition actually holds for any collection of subsets, not just subspaces, but we will mostly be interested in taking sums of subspaces. Definition 17. Let U1 , . . . , Um be subsets of V . The sum of U1 , . . . , Um is U1 + U2 + · · · + Um = {u1 + · · · + um | ui ∈ Ui for all 1 ≤ i ≤ m}, that is, the collection of all possible sums of elements of U1 , . . . , Um . Example 18. Let V = R2 , U1 = {(x, 0) | x ∈ R} the x-axis, and U2 = {(0, y) | y ∈ R}, the y-axis. Then U1 ∪U2 is not a subspace. On the other hand, check for yourself that U1 +U2 = R2 , which is a subspace. In fact, it is the smallest subspace of R2 containing both U1 and U2 . This is true in general. Proposition 19. Let U1 , . . . , Um be subspaces of V . Then U1 + · · · + Um is the smallest subspace of V containing U1 , . . . , Um . Proof. First, we show that U1 + · · · + Um is indeed a subspace. Since each of the Ui are subspaces, each of them contains 0. Further, 0 = 0 + · · · + 0 ∈ U1 + · · · + Um so 0 ∈ U1 + · · · + Um (you can write 0 as a sum of m things, one from each of the Ui ’s). Suppose v, w ∈ U1 + · · · + Um . By definition, this means that v = u1 + · · · + um and w = t1 + · · · + tm where ui , ti ∈ Ui for each 1 ≤ i ≤ m. Then v + w = (u1 + t1 ) + · · · + (um + tm ) and since each Ui is a subspace, ui + ti ∈ Ui . A similar proof works to show that U1 + · · · + Um is closed under scalar multiplication. Now that we know that U1 +· · ·+Um is a subspace, how do we prove that it is the smallest subspace containing U1 , . . . , Um ? Well, first of all, note that if u ∈ Ui , then u = 0 + · · · + 0 + ui + 0 + · · · + 0 so u ∈ U1 +· · ·+Um . Hence, the sum contains each Ui . On the other hand, if W is any subspace which contains U1 , . . . , Um , then since W is closed under addition, it must contain all sums of elements of U1 , . . . , Um . Hence, W must contain U1 + · · · + Um . Definition 20. Suppose U1 , . . . 
, Um are subspaces of V . The sum U1 + · · · + Um ⊆ V is called a direct sum if each v ∈ U1 + · · · + Um can be written in a unique way as v = u1 + · · · + um with ui ∈ Ui for all i. In this case, we use the notation U1 ⊕ · · · ⊕ Um = U1 + · · · + Um . Example 21. Consider the following subspaces of R3 : U = {(x, 0, 0) | x ∈ R}, V = {(0, y, 0) | y ∈ R} and W = {(x, x, 0) | x ∈ R}. Then U + V + W is a subspace (the xy-plane in R3 ). However, for example, (1, 1, 0) can be written as (1, 0, 0) + (0, 1, 0) + (0, 0, 0) or as (0, 0, 0) + (0, 0, 0) + (1, 1, 0), or in fact as (2, 0, 0) + (0, 2, 0) − (1, 1, 0). Hence, this is not a direct sum. You can check for yourself that U + V is a direct sum and U + V + W = U + V . Hence,we can denote U + V by U ⊕ V . In fact, U ⊕ V = U ⊕ W = V ⊕ W = U + V + W . A priori, it seems like the condition for a sum to be a direct sum involves checking infinitely many things, since you have to check whether every vector has a unique representation as a sum. It turns out that you only need to check whether 0 can be written uniquely as a sum. Theorem 22. Suppose U1 , . . . , Um are subspaces of V . Then U1 + · · · + Um is a direct sum if and only if the only way to write 0 as a sum u1 + · · · + um where each ui ∈ Ui is by taking each ui to be the zero vector. Proof sketch. By the definition of the direct sum, there is only one way to write 0 as a sum, and certainly 0 + · · · + 0 = 0, so this takes care of the ⇒ direction. Conversely, suppose that the unique way to write 0 as a sum of elements from the Ui is 0 = 0+· · ·+0. Suppose that there is a v ∈ U1 + · · · + Um that you can write as a sum in two ways, say u1 + · · · + um and v1 + · · · + vm . Then subtract to get (u1 − v1 ) + · · · + (um − vm ) = 0. Since there is only one way to write 0 as a sum, and each ui − vi ∈ Ui , we must have ui − vi = 0 for all i. Hence, the two ways to write v were in fact the same. For the sum of two subspaces, there is an even easier criterion to check. Proposition 23. If U and W are subspaces of V , then U + W is a direct sum if and only if U ∩ W = {0}. Proof. [⇒]. Suppose U + W is a direct sum. Let v ∈ U ∩ W . Then v = v + 0 = 0 + v ∈ U + W . By uniqueness, we must have that v = 0. [⇐]. Suppose that U ∩ W = {0}. We can use the above theorem and prove that there is only one way to write 0 as a sum. Suppose 0 = u + w where u ∈ U and w ∈ W . Then u = −w ∈ U ∩ W . Hence, u = w = 0. 3. Friday 6/26: Span and Linear Independence (2.A) Given a collection of vectors in V , we can scalar multiply them and add them to form new vectors. Definition 24. A linear combination of v1 , . . . , vm ∈ V is a vector of the form a1 v1 + · · · + am vm where a1 , . . . , am ∈ F. Definition 25. The set of all linear combinations of v1 , . . . , vm ∈ V is called the span of v1 , . . . , vm and is denoted span(v1 , . . . , vm ). That is, span(v1 , . . . , vm ) = {a1 v1 + · · · + am vm | a1 , . . . , am ∈ F}. By convention, we take span(∅) = {0}. Example 26. • If v ∈ R2 is nonzero, then span(v) = {av | a ∈ R} is the line through v in R2 . Of course, if v = 0 then span(v) = {0}. • If v, w ∈ R3 are nonzero and they are not scalar multiples of one another, then span(v, w) is a plane in R3 . Proposition 27. The set span(v1 , . . . , vm ) is the smallest subspace of V containing v1 , . . . , vm . Proof. This is asserting two things. First of all that S = span(v1 , . . . , vm ) is a subspace. Second, that it is the smallest subspace containing the vectors. 
To show that S is a subspace, we should show that it contains 0 and is closed under addition and scalar multiplication. First, 0 ∈ S since 0 = 0v1 + · · · + 0vm . Now suppose u, w ∈ S. Then u = a1 v1 + · · · + am vm w = b1 v1 + · · · + bm vm for some a1 , . . . , am , b1 , . . . , bm ∈ F. Then u + w = (a1 + b1 )v1 + · · · + (am + bm )vm ∈ S. Check for yourself that S is closed under scalar multiplication to complete the proof that S is a subspace. Now it is clear that S contains each vi , since vi = 0v1 +· · ·+1vi +· · ·+0vm so vi is a linear combination of v1 , . . . , vm . Now if T is any subspace which contains v1 , . . . , vm , then since T is closed under addition and scalar multiplication, it must contain every linear combination of v1 , . . . , vm and hence S ⊆ T . This shows that S is the smallest subspace which contains v1 , . . . , vm . Definition 28. If span(v1 , . . . , vm ) equals V , we say that v1 , . . . , vm spans V . Note the difference between the noun “the span of v1 , . . . , vm ” and the verb “v1 , . . . , vm spans V ”. Example 29. The vectors e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), ... en = (0, 0, . . . , 0, 1) span Fn since span(e1 , . . . , en ) = Fn . Definition 30. Let P(F) denote the set of all polynomials in one variable z with coefficients in F. That is, P(F) consists of all polynomials of the form a0 + a1 z + · · · + am z m for some a0 , . . . , am ∈ F. If m is a non-negative integer, let Pm (F) denote the subset of P(F) consisting of polynomials of degree at most m. This is a subspace of P(F). Example 31. The monomials 1, z, z 2 , . . . span the vector space P(F), since every polynomial can be written as a (finite) linear combination of these monomials. The m + 1 monomials 1, z, z 2 , . . . , z m spans the subspace Pm (F). Definition 32. A vector space is finite-dimensional if it is spanned by finitely many vectors. Otherwise, we say it is infinite-dimensional. So P(F) is infinite-dimensional while Pm (F) is finite-dimensional. F∞ is infinite-dimensional and Fn is finite-dimensional. Remember your intuition from 308 that if {v1 , . . . , vn } spans V , then {v1 , . . . , vn } “points in every direction in V ”. The other notion from 308 is the notion of linear independence. Remember the intuition is that a linearly independent set “points in different directions” or perhaps “isn’t redundant”. Definition 33. A subset {v1 , . . . , vn } ⊆ V is linearly independent if a1 v1 + · · · + an vn = 0 implies that a1 = · · · = an = 0. By convention, the empty set {} is said to be linearly independent. Otherwise, {v1 , . . . , vn } is linearly dependent. Another way to say this is that {v1 , . . . , vn } is linearly dependent if there exist scalars a1 , . . . , an ∈ F not all 0 such that a1 v1 + · · · + an vn = 0. Example 34. • {v} ⊆ V is linearly independent if and only if v 6= 0. • {v, w} ⊆ V is linearly independent if and only if v 6= λw and w 6= λv for all λ ∈ F. • {1, z, z 2 , z 3 } is a linearly independent subset of P(F) or Pm (F) (for m ≥ 3). If a0 + a1 z + a2 z 2 + a3 z 3 = 0 as a function of z, then a0 = a1 = a2 = a3 = 0 since a nonzero polynomial of degree 3 has at most 3 roots, while our polynomial has infinitely many roots. The next proposition again captures the intuition that a linearly independent set is one with “no redundancy.” Proposition 35. A subset {v1 , . . . , vn } ⊆ V is linearly independent if and only if none of the vectors is a linear combination of the others. Proof. [⇒]. We prove the contrapositive. 
If one of the vectors is a linear combination of others, say v1 = a2 v2 + · · · + an vn , then (−1)v1 + a2 v2 + · · · + an vn = 0 so we have a nontrivial linear combination which is 0 and hence the set is linearly dependent. [⇐]. Conversely, suppose that {v1 , . . . , vn } is linearly dependent. Then there exist a1 , . . . , an ∈ F not all zero such that a1 v1 + · · · + an vn = 0. Without loss of generality, suppose that a1 6= 0. Then v1 = − an a2 v2 − · · · − vn a1 a1 so one of the vectors is a linear combination of the others. We can say a bit more actually. Not only is one of the vectors a linear combination of the others, but you can remove it from the list without affecting the span. The lemma as stated in your book is as follows: Lemma 36 (Linear Dependence Lemma). Suppose v1 , . . . , vm is a linearly dependent list in V . Then there exists j ∈ {1, 2, . . . , m} such that the following hold (1) vj ∈ span(v1 , . . . , vj−1 ); (2) if the jth term is removed, the span of the remaining list equals span(v1 , . . . , vm ). Proof. We have essentially proved the first part of this lemma above. If the set is linearly dependent, then there exist scalars not all zero such that a1 v1 + · · · + am vm = 0. Simply take the largest j such that aj 6= 0. Then we actually have a1 v1 + · · · + aj vj = 0 (since the rest of the coefficients are zero). Subtract and divide by aj . For the second claim we need to show that the span is not affected when you remove the vector vj . We need to prove that if you can write a vector as a linear combination of all of the vi ’s, then you can fact write it as a linear combination without using vj . So take some vector a1 v1 + a2 v2 + · · · + aj−1 vj−1 + aj vj + aj+1 vj+1 + · · · + am vm ∈ span(v1 , . . . , vm ). We know that vj ∈ span(v1 , . . . , vj−1 ), so vj = b1 v1 + · · · + bj−1 vj−1 for some b1 , · · · bj−1 ∈ F. Now simply substitute this expression for vj , to see that you can write the original vector as a linear combination without using vj . The preceding lemma is what a mathematician might call a “technical” lemma. In particular, the second part of the lemma seems like something we can use to prove other results. 4. Monday 6/29: More on Span and Linear Independence (2.A) and Bases (2.B) Last time we ended with the Linear Dependence Lemma, which was a technical result, meaning that we can use it to prove other results. We illustrate this by using it to prove two results, which should agree with your intuition. For example, think about how this result agrees with your intuition behind spanning and linear independence. Proposition 37. Suppose u1 , . . . , um is a linearly independent set and w1 , . . . , wn spans V . Then m ≤ n. Proof. The proof uses an interesting technique. Since w1 , . . . , wn spans V , if we add any vector to the list, it will result in a linearly dependent list (since that vector is already in the span of the others). In particular, u1 , w1 , . . . , wn must be linearly dependent. Hence, by the Linear Dependence Lemma, we can remove one of the wi ’s so that the new list still spans V . (Here, there is a technical point that we will remove the furthest-right vector from the list that we can, as we did in the proof of the Linear Dependence Lemma, so we will be removing a wi rather than u1 . If the coefficients on all of the wi ’s are zero, then this implies that u1 = 0, which is impossible since the u’s are linearly independent). Now we have a list of n vectors u1 , w1 , . . . wj−1 , wj+1 , . . . , wn which still spans V . 
Hence, u2 is in the span of this list, so u1 , u2 , w1 , . . . , wj−1 , wj+1 , . . . , wn is linearly dependent. By the Linear Dependence Lemma, we can remove one of the wi ’s so that this new list still spans V . Repeat this process m times. At each step, we added a ui and the Linear Dependence Lemma implies that there is a wj to remove. Hence, there are at least as many wj as ui , so m ≤ n. This proposition gives a very quick way to show if some set of vectors is linearly dependent. For example, we know that e1 , e2 , e3 span F3 . So if we have a list of four or more vectors in F3 , they are automatically linearly dependent. Our last result in the section also seems intuitively clear, and can be proved using the technical Linear Dependence Lemma. Proposition 38. Every subspace of a finite-dimensional vector space is finite-dimensional. Recall that by finite-dimensional, we mean that a vector space that can be spanned by finitely many vectors. Proof. Suppose V is finite-dimensional and let U be a subspace. We want to prove that U is finite-dimensional. If U = {0}, then U is spanned by the empty list by convention, so U is finite-dimensional. If U 6= {0}, then U contains some nonzero vector, say v1 . If U = span(v1 ), then U is spanned by finitely many vectors so is finite-dimensional. If not, then we can choose a vector v2 6∈ span(v1 ). Continue this process. At each step, we have constructed a linearly independent list, since by the Linear Dependence Lemma, none of the vj ’s is in the span of v1 , . . . , vj−1 . But by the previous proposition, any linearly independent list in V cannot be longer than any spanning list of V . Since V is spanned by finitely many vectors, this process must eventually stop, so U is spanned by finitely many vectors. Bases (2.B) In the last section, we discussed linearly independence and the notion of spanning V . A set of vectors which is both linearly independent and spans V is called a basis of V . Definition 39. A basis for V is a set of vectors B ⊆ V such that B spans V and B is linearly independent. The plural of basis is bases. Example 40. For each of these examples, convince yourself that the claimed basis both spans and is linearly independent. (1) The list {e1 , e2 , . . . , en } is a basis of Fn called the standard basis of Fn . (2) {1, z, . . . , z m } is a basis of Pm (F). (3) {1, z, z 2 , . . . } is a basis of P(F). Example 41. The list of vectors {e1 = (1, 0, 0, . . . ), e2 = (0, 1, 0, . . . ), e3 = (0, 0, 1, 0, . . . ), . . . } do not form a basis for F∞ , since it is not possible to write (1, 1, 1, . . . ) as a linear combination of these vectors. (By definition, a linear combination is a finite sum). The difference between this and P(F) is that polynomials are defined to be finite sums of monomials. The analogue is that {1, z, z 2 , . . . } is not a basis for the vector space of all power series centered at z = 0. Here’s an extremely useful property of bases. Proposition 42. The list B = {v1 , . . . , vn } is a basis of V if and only if every v ∈ V can be written uniquely in the form v = a1 v1 + · · · + an vn where the ai ∈ F. Proof. Suppose that B is a basis of V and let v ∈ V . Since B spans V , therefore there are scalars a1 , . . . , an ∈ F such that v = a1 v1 + · · · + an vn . To show that this representation is unique, suppose that we also have scalars b1 , . . . , bn ∈ F such that v = b1 v1 + · · · + bn vn . Subtract these two equations to obtain 0 = (a1 − b1 )v1 + · · · + (an − bn )vn . 
Since B is linearly independent, this implies that each ai − bi = 0, and hence we have uniqueness. Conversely, suppose every vector v ∈ V can be written uniquely in the specified form. Since every vector in V is a linear combination of the vectors in B, this means that B spans V . Further, we must have that 0 can be written uniquely as a linear combination of vectors in B. But also, we know that 0 = 0v1 + · · · + 0vn so this must be the unique representation of 0. Hence, B is also linearly independent so B is a basis of V . Definition 43. If B = {v1 , . . . , vn } is a basis of V and v ∈ V , write v = a1 v1 + · · · + an vn . By the above proposition, this representation is unique. The scalars a1 , . . . , an ∈ F are called the coordinates of v with respect to the basis B. Now it seems like a basis is a very special thing, since it both spans and is linearly independent. In some sense it sort of is special, since it is somehow balanced between “pointing in every direction” while still “avoiding redundancy.” So it is more special than simply a spanning set (which “points in every direction” but might have some “redundancy”) or than a linearly independent set (which does “not have redundant vectors” but doesn’t “point everywhere”). On the other hand, the next results will show that a basis isn’t that special. If you have a spanning set, you can always remove some vectors until you get a basis. Similarly, if you have a linearly independent set, it is always possible to add some vectors until you get a basis. Proposition 44. Every spanning list in a vector space can be reduced to a basis of the vector space. Proof. Suppose we have a list of vectors v1 , . . . , vn which spans V . We want to delete some of the vectors so that the list still spans but the shorter list is linearly independent. We do this in a multi-step process. Start with B = {v1 , . . . , vn }. Step 1. If v1 = 0, delete v1 from B. If not, do nothing. Step j. If vj ∈ span(v1 , ,̇vj−1 ), then we don’t need vj anyway so delete it from B. If not, then leave B unchanged. This algorithm terminates after n steps, leaving a list B. Since we have only thrown out vectors that have already been in the span of the previous ones, at no point did we change the span of B. On the other hand, by the Linear Dependence Lemma, since our new list has the property that none of the vectors is in the span of any of the previous ones, it is also linearly independent. Before we prove the “linearly independent set” version of the lemma, we have a nice corollary. Corollary 45. Every finite-dimensional vector space has a basis. Proof. By definition, a finite-dimensional vector space is spanned by finitely many vectors. Use the previous proposition to cut a spanning set down to a basis. 5. Wednesday 7/1: Bases (2.B) and Dimension (2.C) Proposition 46. Every linearly independent list of vectors in a finite-dimensional vector space can be extended to a basis of the vector space. Proof. Suppose that u1 , . . . , um is a linearly independent set in a finite-dimensional vector space V . Since V is finite-dimensional, by the previous corollary, V has a basis. Let w1 , . . . , wn be a basis of V . Then u1 , . . . , um , w1 , . . . wn is a spanning set (since the basis already spanned). Now simply run the algorithm from the proof of the previous proposition. In the first m steps of the algorithm, we never throw any of the ui out, since they are linearly independent. We are left with a basis of V that contains u1 , . . . , um , and some of the wj ’s. Proposition 47. 
Suppose V is finite-dimensional and U is a subspace of V . Then there is a subspace W of V such that V = U ⊕ W . Proof. Since V is finite-dimensional, we know that U is finite-dimensional. So U has a basis u1 , . . . um . But this is a linearly independent set in V , so we can extend to a basis u1 , . . . , um , w1 , . . . , wn of V . Let W = span(w1 , . . . , wn ). We would like to show that V = U ⊕ W , and by Proposition 23, this means we need to show that V = U + W and U ∩ W = {0}. For any v ∈ V , since the list u1 , . . . , um , w1 , . . . wn spans V , this means that we can write v = (a1 u1 + · · · + am um ) + (b1 w1 + · · · + bn wn ). Since the first sum is in U and the second sum is in W , this shows that v ∈ U + W so V = U + W . To show that U ∩ W = {0}, suppose v ∈ U ∩ W . This means that there are scalars such that v = a1 u1 + · · · + am um = b1 v1 + · · · + bn vn . But then, subtracting we have (a1 u1 + · · · + am um ) − (b1 v1 + · · · + bn vn ) = 0 but since the u’s and w’s are a basis of V , they are linearly independent, so all of the constants must be 0. Hence, v = 0, as desired. Dimension (2.C) Recall that we have defined (and studied a bit) the notion of a finite-dimensional vector space; namely, a vector space that is spanned by finitely many vectors. But we haven’t yet defined the dimension of a vector space. Whatever our notion of dimension, the correct definition should mean that Fn has dimension n (since there are “n directions” in Fn ). The definition of the dimension of V (as you may recall from Math 308) will be the number of vectors in a basis for V . Of course, in order for this to be a reasonable definition, every basis should contain the same number of vectors. Luckily, this is true. Proposition 48. Any two bases of a finite-dimensional vector space V contain the same number of vectors. Proof. We will basically use Proposition 37 (2.27 in your textbook) twice. If B1 and B2 are two bases, then since B1 is linearly independent and B2 spans V , by Proposition 37, B1 must contain less than or equal to the number of vectors than B2 . But then reversing the roles of the two bases, B2 must contain less than or equal to the number of vectors in B1 . Hence, B1 and B2 contain the same number of vectors. Since this is true, we are now allowed to define dimension as we wanted to. Definition 49. The dimension of a finite-dimensional vector space V is the length of any basis of V . The dimension of V is denoted dim V . Example 50. (1) The standard basis of Fn contains n vectors (hence any basis of Fn does), so dim Fn = n. (2) We saw that {1, z, z 2 , . . . , z m } is a basis of Pm (F) so dim Pm (F) = m + 1. (3) In general, the dimension of a vector space depends on the field. So, for example, you can view C as a 1-dimensional vector space over C. Any nonzero complex number serves as a basis. So {1} is a basis, because you can get any complex number by taking C-linear combinations. However, you can also view C as a vector space over R, with basis given by {1, i}. Every complex number is an R-linear combination of 1 and i, and they are linearly independent over R. Hence, C is a 2-dimensional vector space over R. In this class, generally we work with a fixed field F (that could be R or C), so when we say dimension we mean dimension over F. If you wanted to be very specific, you could use a subscript and say dimC C = 1 and dimR C = 2. The notion of dimension is really a notion of the “size” of a vector space. 
Most vector spaces contain infinitely many elements (think of R or Rn ), so we don’t want to measure the size of a vector space naively by the number of vectors in it. Nevertheless, dimension should obey our intuition for a notion of size, and we should be able to prove the results that we strongly suspect are true. Proposition 51. If V is finite-dimensional and U is a subspace of V , then dim U ≤ dim V . Proof. Recall that we already showed that every subspace of a finite-dimensional vector space is finite-dimensional, so it makes sense to talk about dim U . Suppose that B is a basis of U and C is a basis of V . Then B is a linearly independent list of vectors in U , but we can also view them as vectors in V . Hence, B is a linearly independent list in V and C spans V . So again by Proposition 37, we have that the length of B is less than or equal to the length of C. By the definition of dimension, we have dim U ≤ dim V , as desired. Here’s another indication that bases aren’t that special. If you know that V has dimension n (that is, every basis has length n), then it turns out that if you have a linearly independent list of size n, it is already a basis. The same result is true for n spanning vectors. Proposition 52. Suppose V is finite-dimensional. Then every linearly independent list of vectors in V with length dim V is a basis of V . Proof. This follows from the theorem that every linearly independent list can be extended to a basis. If dim V = n and v1 , . . . , vn is a linearly independent list, then it can be extended to a basis of V by the theorem. However, every basis has length n, so v1 , . . . , vn must already be a basis. Of course, since we know that every spanning set can be cut down to a basis, the spanning set analogue of the previous result is also true. Proposition 53. Suppose V is finite-dimensional. Then every spanning list of vectors in V with length dim V is a basis of V . There is a nice example in your textbook of using these theorems. It is interesting enough that I think it’s worth talking about in detail in class. Example 54. Let U = {p ∈ P3 (R) | p0 (5) = 0}. This is a subset of P3 (R), since it consists of some of the polynomials but not all of them (for example, p(z) = z does not satisfy p0 (5) = 0). It is a subspace since the 0 polynomial satisfies the given proprety, if p0 (5) = 0 and q 0 (5) = 0 then (p + q)0 (5) = p0 (5) + q 0 (5) = 0, and if α ∈ R then (αp)0 (5) = αp0 (5) = 0. We will show that 1, (z − 5)2 , (z − 5)3 is a basis of U . First, note that these polynomials are in U since the derivative of 1 is identically 0, the derivative of (z − 5)2 is 2(z − 5) and the derivative of (z − 5)3 = 3(z − 5)2 , so all of these derivatives vanish at 5. Now we can check that these three polynomials are linearly independent. Suppose a, b, c ∈ R satisfy a + b(z − 5)2 + c(z − 5)3 = 0 Comparing the z 3 coefficient on both sides yields that c = 0. Then comparing the z 2 coefficient yields b = 0. Finally, this means b = 0. So these three polynomials are linearly independent. Hence dim U ≥ 3. On the other hand, U 6= P3 (R), dim U 6= 4, so we must have dim U = 3. The last result in the section, which we won’t prove in lecture, is on the dimension of a sum. Notice the similarity between this formula and the formula for the size of a union of finite sets |U ∪ V | = |U | + |V | − |U ∩ V |. Also note that in particular, this says that if the sum is direct, dim(U1 ⊕ U2 ) = dim U1 + dim U2 . Proposition 55. 
If U1 and U2 are subspaces of a finite-dimensional vector space then dim(U1 + U2 ) = dim U1 + dim U2 − dim(U1 ∩ U2 ). 6. Monday 7/6: Linear Maps (3.A) A nice quote from your book to start the chapter: “So far our attention has focused on vector spaces. No one gets excited about vector spaces. The interesting part of linear algebra is the subject to which we now turn—linear maps.” This is true. Vector spaces have tons of nice properties, which means that they are frankly pretty boring. But a very deep mathematical fact is that often you learn more about objects by studying the functions between them than by just studying the objects themselves. You will see this in your future mathematics classes as well, so over your mathematical career you will understand this idea more and more. Generally in math, you learn about certain structures (e.g., vector spaces), and then you consider functions which preserve the structures. So to have a vector space, you must have an addition and a scalar multiplication. The kinds of functions we should consider are the ones that play nice with addition and scalar multiplication. Definition 56. Let V and W be vector spaces. A linear map from V to W is a function T : V → W such that (1) T (u + v) = T u + T v for all v, w ∈ V ; (2) T (λv) = λ(T v) for all λ ∈ F and all v ∈ V . In Math 308 these were called linear transformations, but the words “transformation” and “map” and “function” all mean the same thing anyway, and the latter two are more common. Notation. Here, T v = T · v = T (v) is the image of v ∈ V under the function T : V → W . Example 57. (1) The zero map 0 : V → W defined by 0(v) = 0 for all v ∈ V . We can check 0(v + w) = 0 = 0 + 0 = 0v + 0w and 0(αv) = 0 = α0 = α0(v). (Note the difference here between which of the 0’s are functions and which are scalars). (2) The identity map I : V → V defined by Iv = v for all v ∈ V is a linear map. We can check I(v + w) = v + w = Iv + Iw and I(αv) = αv = α(Iv). (3) Differentiation D : P(R) → P(R) defined by Dp = p0 . R1 (4) Integration J : P(R) → R defined by Jp = 0 p(t) dt. (5) The shift maps L, R : F∞ → F∞ : L(x1 , x2 , . . . ) = (x2 , x3 , . . . ) R(x1 , x2 , . . . ) = (0, x1 , x2 , . . . ). Here is an easy but useful result. Proposition 58. If T : V → W is a linear map, then T (0) = 0. Proof. We have T (0) = T (0 + 0) = T (0) + T (0) and hence T (0) is the additive identity of W , as desired. Definition 59. The set of all linear maps from V to W is denoted L(V, W ). Proposition 60. Suppose v1 , . . . , vn is a basis of V and w1 , . . . , wn ∈ W . Then there exists a unique linear map T : V → W such that T vj = wj for all j. Proof. Let v ∈ V . Then since the vi ’s are a basis, we have v = a1 v1 + · · · + an vn for uniquely determined scalars ai ∈ F. We will define the linear map T by T v = a 1 w1 + · · · + a n wn . Certainly this defines a map V → W . We need to check that it is linear. Suppose u ∈ V and write u = b1 v1 + · · · + bn vn for scalars bi ∈ F. Then T u = b1 w1 + · · · + bn wn . On the other hand T (v + u) = T ((a1 + b1 )v1 + · · · + (an + bn )vn ) = (a1 + b1 )w1 + · · · + (an + bn )wn which is the same as T v + T u. Similarly, you can check that T (λv) = λT (v) for all λ ∈ F so T is linear. To show that T is unique, suppose S : V → W is any linear map such that Svj = wj for all j. Then let v ∈ V and again write v = a1 v1 + · · · + an vn . Then, since S is linear Sv = S(a1 v1 + · · · + an vn ) = a1 S(v1 ) + · · · + an S(vn ) = a1 w1 + · · · + an wn = T v so S = T . 
This says that you can send a basis anywhere with a linear map, and once you choose where to send your basis, the entire linear map is determined.

Example 61. In 308, you learned that every linear map T : Fn → Fm can be represented by an m × n matrix. Simply see where T sends the standard basis vectors: T ej = (a1,j , . . . , am,j ), written as a column vector, for all 1 ≤ j ≤ n. This should determine the entire linear transformation. Indeed, if v = x1 e1 + · · · + xn en then
T v = x1 (a1,1 , . . . , am,1 ) + · · · + xn (a1,n , . . . , am,n ) = (a1,1 x1 + · · · + a1,n xn , . . . , am,1 x1 + · · · + am,n xn ),
which is exactly the product of the matrix with entries aj,k and the column vector (x1 , . . . , xn ). So you can think of the linear map T as multiplication by the matrix.

Of course, you can add two m × n matrices to get another m × n matrix. You can also scalar multiply a matrix to get a matrix of the same size. This suggests that all of the m × n matrices might themselves form a vector space! But the more fundamental viewpoint is to consider the linear maps rather than the matrices (which in some way are basis-dependent).

Definition 62. Suppose S, T ∈ L(V, W) and λ ∈ F. The sum S + T is the function defined by (S + T)(v) = Sv + T v and the product (of λ and T) is the function λT defined by (λT)(v) = λ(T v) for all v ∈ V.

These are both linear maps. It is not obvious that the sum of two linear maps is a linear map! It is a thing that needs to be checked. You need to check that the map that we just defined as S + T satisfies additivity and homogeneity, that is (S + T)(v + w) = (S + T)(v) + (S + T)(w) and (S + T)(λv) = λ(S + T)(v). You should check this.

Now just because L(V, W) is closed under two operations, addition and scalar multiplication, does not necessarily mean that it is a vector space. After all, these operations need to satisfy all the axioms of being a vector space! It turns out this is true, and can be checked using the definitions above.

Proposition 63. With operations defined as above, L(V, W) is a vector space.

What is the additive identity of L(V, W)? (Answer: the zero map 0 : V → W.) What is the additive inverse of an element S ∈ L(V, W)? (Answer: the map −S defined by (−S)(v) = −S(v) for all v ∈ V.)

It turns out that there is actually even a bit more structure on L(V, W). Generally in a vector space, you can add vectors and multiply vectors by scalars, but there is no notion of multiplying a vector by another vector. But if we have maps T : U → V and S : V → W, then we can take the composition ST : U → W. Note that it is again not obvious that ST : U → W is a linear map. You should prove this! In fact I think it is so worth doing that I will put it on your next problem set (if I remember to). Of course, as vectors T ∈ L(U, V), S ∈ L(V, W) and ST ∈ L(U, W), so they all live in different vector spaces (unless U = V = W). So this is not a product on L(V, W), in general.

Definition 64. If T ∈ L(U, V) and S ∈ L(V, W), then the product ST ∈ L(U, W) is the function defined by (ST)(u) = S(T u) for u ∈ U.

This product has some nice properties, but not necessarily all of the nice properties you could possibly dream of. In particular, in general ST ≠ T S. In fact, in many cases if ST is defined, then T S may not be (since the codomain of the first function needs to match the domain of the other). Even in the case that both ST and T S are defined (i.e., when U = V = W), they may not be equal. Nevertheless, we do have the following nice properties.

Proposition 65. The product of linear maps satisfies the following properties:
• Associativity: (ST)U = S(T U) whenever S, T, U are linear maps such that the products make sense.
• Identity: T I = IT = T for all T ∈ L(V, W). (Note that the two identity functions appearing here are identities of different vector spaces.)
• Distributivity: (S1 + S2)T = S1 T + S2 T and S(T1 + T2) = ST1 + ST2 for all T, T1 , T2 ∈ L(U, V) and S, S1 , S2 ∈ L(V, W).
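As a concrete (and optional) illustration of Definition 64 and the warning that ST ≠ T S in general, here is a short Python/NumPy sketch using two maps R2 → R2 given by matrices acting on column vectors, as in Example 61; the two maps themselves are invented just for this demonstration.

    import numpy as np

    # Two linear maps R^2 -> R^2, given by matrices acting on column vectors.
    S = lambda v: np.array([[0., 1.], [0., 0.]]) @ v    # S(x, y) = (y, 0)
    T = lambda v: np.array([[0., 0.], [1., 0.]]) @ v    # T(x, y) = (0, x)

    def product(S, T):
        # The product of Definition 64 is just composition: (ST)(u) = S(T u).
        return lambda u: S(T(u))

    u = np.array([1., 2.])
    print(product(S, T)(u))    # ST(1, 2) = (1, 0)
    print(product(T, S)(u))    # TS(1, 2) = (0, 2), so ST and TS are different maps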
Null Spaces and Ranges (3.B)

We continue our study of linear maps (with the philosophy that we can learn a lot about vector spaces by looking at the structure-preserving maps between them). For each linear map, there are two subspaces that we can naturally look at, the null space (or kernel) and range (or image).

Definition 66. For T ∈ L(V, W), the null space (or kernel) of T, denoted null T, is the subset of V consisting of those vectors that T maps to 0: null T = {v ∈ V | T v = 0}.

Proposition 67. If T ∈ L(V, W), then null T is a subspace of V (the domain).

Proof. We showed that for any linear map T, T(0) = 0, so 0 ∈ null T. If u, v ∈ null T, then this means T(u) = 0 and T(v) = 0. Since T is linear, we have T(u + v) = T u + T v = 0 + 0 = 0. So u + v ∈ null T. If u ∈ null T and α ∈ F, then again since T is linear, T(αu) = αT u = α · 0 = 0 so αu ∈ null T.

Example 68.
• Consider the zero map T : V → W. Then T v = 0 for every v ∈ V. So null T = V.
• Consider the identity map I : V → V. Then Iv = v for every v ∈ V. So if Iv = 0, this means v = 0. So null I = {0}.
• Let D ∈ L(P(R), P(R)) be the differentiation map. Then D(f) = 0 if and only if the derivative of f is the zero polynomial. The polynomials with derivative 0 are exactly the constant functions. So null D = {constant polynomials}.
• Let L : F∞ → F∞ be the left shift map L(x1 , x2 , . . . ) = (x2 , x3 , . . . ). Then null L = {(a, 0, 0, . . . ) | a ∈ F}. On the other hand, null R consists of only the zero vector.
• Let T : F3 → F2 be given by T(x, y, z) = (x + y, 2z). Then null T = {(x, y, z) ∈ F3 | x + y = 0 and 2z = 0} = {(a, −a, 0) | a ∈ F}. So the null space is one-dimensional, spanned by (1, −1, 0). In Math 308, you learned how to compute this by taking the matrix representation for T, row reducing, and reading off a basis for the null space.

7. Wednesday 7/8: Null Spaces and Ranges (3.B)

So in some of the examples above, lots of vectors got mapped to 0. In other examples, only the 0 vector itself got mapped to 0. This is related to the injectivity of a linear map.

Definition 69. A function T : V → W is called injective if T u = T v implies u = v. This can be restated as: if u ≠ v then T u ≠ T v.

In other words, injective functions map distinct inputs to distinct outputs. They never map two different vectors to the same vector. In Math 308, you learned the term “one-to-one” instead of “injective”. The next result is a very useful result which tells you when a linear map is injective.

Proposition 70. Let T ∈ L(V, W). Then T is injective if and only if null T = {0}.

Remark. This result says that being injective is equivalent to null T = {0}, or in other words, the only vector that maps to 0 is 0. Of course, if T is going to be injective, it is only allowed to map one vector to 0, and it has to map the 0 vector there. For a general function, to check injectivity, you need to check that every single output only comes from one input. Since linear maps are so nice, you only need to check whether the output 0 only comes from one input.

Proof. Suppose T is injective. We know that since T is linear, T(0) = 0, and so {0} ⊆ null T.
We need to show that null T ⊆ {0}. So suppose v ∈ null T . Then T (v) = 0 = T (0) and since T is injective, v = 0, as desired. Conversely, suppose null T = {0}. We want to show that T is injective. So suppose T u = T v. Then 0 = T u − T v = T (u − v) so u − v ∈ null T and hence u − v = 0. Therefore, u = v so T is injective. The other subspace we associate to a linear map is the range or image. Definition 71. The range (or image) of a function T : V → W is the set range(T ) = {w ∈ W | w = T v for some v ∈ V } Proposition 72. If T ∈ L(V, W ), then range T is a subspace of W (the codomain). Proof. Since T (0) = 0, we have 0 ∈ range T . Suppose w1 , w2 ∈ range(T ). Then there exist v1 , v2 ∈ V such that T v1 = w1 and T v2 = w2 . Therefore w1 + w2 = T v1 + T v2 = T (v1 + v2 ) so w1 + w2 ∈ range(T ). Finish the proof that range T is closed under scalar multiplication on your own. Example 73. • Consider the zero map T : V → W . Then T v = 0 for every v ∈ V . So range T = {0} ⊆ W . • Consider the identity map I : V → V . Then for every v ∈ V , we have Iv = v. Therefore, range I = V . • Let D ∈ L(P(R), P(R) be the differentiation map. Every polynomial occurs as the derivative of some polynomial (its antiderivative). Specifically, a1 2 an n+1 n a0 + a1 z + · · · + an z = D a0 z + z + · · · + z , 2 n+1 so range D = P(R). • Let L : F∞ → F∞ be the left shift map L(x1 , x2 , . . . ) = (x2 , x3 , . . . ). Then range L = F∞ . On the other hand range R = {(0, a1 , a2 , . . . ) | a1 , a2 , · · · ∈ F}. • Let T : F3 → F2 be given by T (x, y, z) = (x + y, 2z). Then range T = {(x + y, 2z) | x, y, z ∈ F}. But every vector can be obtained this way. Indeed, (a, b) = T (a, 0, b/2). So range T = F2 . Again, in some of the examples above, the range T is the entire codomain. In some of the examples, it is a strict subspace. Definition 74. A function T : V → W is called surjective if range T = W . In Math 308, you used the term “onto” rather than “surjective”. The null space and the range are very closely related. The basic idea is this: the more vectors you send to 0, the fewer vectors you get in the image. The fewer vectors you send to 0, the more vectors you can get in the image. The precise statement is the following Theorem 75 (The Fundamental Theorem of Linear Maps or the Rank–Nullity Theorem). Suppose V is finite-dimensional and T ∈ L(V, W ). Then range T is finite-dimensional and dim V = dim null T + dim range T In Math 308, one way to prove this is to count the number of pivots and columns in matrices. However, this proof is distasteful because it relies on having a matrix representation (which is less intrinsic than the map itself). Also note that that proof only works for maps Fn → Fm . Here, V can be any finite-dimensional vector space, and W need not even be finite-dimensional. If W is infinite-dimensional, then there is really no hope of writing down a matrix. So we provide a much better proof. Proof. Since null T is a subspace of a finite-dimensional vector space, it is finite-dimensional and hence has a basis u1 , . . . , um . By a previous theorem, we can extend this to a basis of all of V , say u1 , . . . , um , v1 , . . . , vn where dim V = m+n. We simply need to show that range T is finite-dimensional and dim range T = n. Where can we possibly get a basis of range T that has length n? Well the natural thing to try is to show that T v1 , . . . , T vn is a basis of range T (at least these are n vectors that live in range T ). So let’s show that these vectors span range T and are linearly independent. 
For any vector v ∈ V, we can write v = a1 u1 + · · · + am um + b1 v1 + · · · + bn vn since the u’s and v’s formed a basis of V. After applying T, since each of the u’s is in the kernel, we have T v = b1 T v1 + · · · + bn T vn. This means that every vector in range T is indeed in the span of T v1 , . . . , T vn, so these vectors span range T. To show that they are linearly independent, suppose that c1 (T v1 ) + · · · + cn (T vn ) = 0. Then T(c1 v1 + · · · + cn vn ) = 0 so c1 v1 + · · · + cn vn ∈ null T. Since this vector is in null T, we can write it as a linear combination of the u’s: c1 v1 + · · · + cn vn = d1 u1 + · · · + dm um , but subtracting and using linear independence of the u’s and v’s, we can conclude that all of the scalars are 0. Since all of the c’s are zero, the list T v1 , . . . , T vn is linearly independent, as desired.

Recall that in Math 300 you learned that if |A| > |B| then a function f : A → B is never injective. Similarly, if |A| < |B| then a function f : A → B is never surjective. For vector spaces, the correct notion of size is dimension, so we have the following theorems (which you should also remember from Math 300). We can prove these results using the Rank–Nullity Theorem.

Proposition 76. Suppose V and W are finite-dimensional vector spaces such that dim V > dim W. Then no linear map V → W is injective.

Proof. Let T ∈ L(V, W). Then dim null T = dim V − dim range T ≥ dim V − dim W > 0. Since null T has positive dimension, it must contain vectors other than 0, so by Proposition 70, T is not injective.

A very similar proof is used to prove:

Proposition 77. Suppose V and W are finite-dimensional vector spaces such that dim V < dim W. Then no linear map V → W is surjective.

A reprise of a Math 308 hit: bases of the null space and range. One of the nicest computational tools in linear algebra is the use of matrices. In the abstract, it is generally easier to work with a linear map. For a specific computation, you often want to work with a matrix. The next section (3.C) is specifically on the matrix of a linear map, but I wanted to do an example to jog your memories. Two ways that we can describe a subspace of Fn are:
• Implicitly, e.g., as the set of solutions x ∈ R4 to the system of equations
x1 + 3x2 + 3x3 + 2x4 = 0
2x1 + 6x2 + 9x3 + 7x4 = 0
−x1 − 3x2 + 3x3 + 4x4 = 0.
• Explicitly, e.g., the span of the column vectors (1, 2, −1), (3, 6, −3), (3, 9, 3), (2, 7, 4) ∈ R3.
Both of these can be expressed as the null space and range of the linear map T : R4 → R3 with the matrix
    [  1   3   3   2 ]
A = [  2   6   9   7 ]
    [ −1  −3   3   4 ].
Here the columns of A are given by the images T e1 , T e2 , T e3 , T e4. Note that for every x ∈ R4, we can write x = (x1 , x2 , x3 , x4 ) = x1 e1 + x2 e2 + x3 e3 + x4 e4 and Ax = x1 (1, 2, −1) + x2 (3, 6, −3) + x3 (3, 9, 3) + x4 (2, 7, 4). The key tool is to use Gaussian elimination:
    [  1   3   3   2 ]      [ 1  3  3  2 ]      [ 1  3  0  −1 ]
A = [  2   6   9   7 ]  →   [ 0  0  3  3 ]  →   [ 0  0  1   1 ]  =: U.
    [ −1  −3   3   4 ]      [ 0  0  6  6 ]      [ 0  0  0   0 ]
The matrix U is the reduced row echelon form of A, and you can use it to determine a basis of null T and range T. First, null T = null A = {x | Ax = 0} = {x | U x = 0}. But from U, x2 and x4 are free variables, so every solution to U x = 0 can be written as
(−3x2 + x4 , x2 , −x4 , x4 ) = x2 (−3, 1, 0, 0) + x4 (1, 0, −1, 1).
So the two vectors (−3, 1, 0, 0) and (1, 0, −1, 1) form a basis of the null space. On the other hand, the range of T ends up being the same as the column space of A, col A. For a basis for the column space, you take the pivot columns, so a basis of range T is (1, 2, −1), (3, 9, 3). Note that the dimensions are correct and satisfy the Rank–Nullity Theorem.
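If you want to double-check the example above by machine, here is a short Python/NumPy sketch (not part of the course material) that verifies the two claimed null space vectors and the dimension count from the Rank–Nullity Theorem.

    import numpy as np

    A = np.array([[ 1,  3, 3, 2],
                  [ 2,  6, 9, 7],
                  [-1, -3, 3, 4]], dtype=float)

    # The two vectors read off from U really do solve Ax = 0.
    N = np.array([[-3., 1.,  0., 0.],
                  [ 1., 0., -1., 1.]]).T    # columns: the claimed basis of null A
    print(A @ N)                            # both columns come out as the zero vector

    # Rank-Nullity: dim R^4 = dim null T + dim range T, i.e., 4 = 2 + 2.
    rank = np.linalg.matrix_rank(A)         # dim range T = dim col A
    print(rank, 4 - rank)                   # prints 2 and 2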
8. Friday 7/10: Matrices (3.C)

Let V and W be finite-dimensional vector spaces with bases BV = {v1 , . . . , vn } and BW = {w1 , . . . , wm }, respectively. Now for any linear map T : V → W, we know that T is determined entirely by T v1 , . . . , T vn (i.e., what T does to a basis of V). Since the wj ’s are a basis of W, we can write each T vi uniquely as a linear combination of the wj ’s. This should contain all of the information about T.

Definition 78. An m × n matrix A is a rectangular array of elements of F with m rows and n columns
    [ A1,1  · · ·  A1,n ]
A = [  ...          ... ]
    [ Am,1  · · ·  Am,n ].
Note that Aj,k is the entry in the jth row and kth column of A.

We will use matrices to record the data of the linear map T. Note that this of course depends on our choice of bases for V and W.

Definition 79. Suppose T ∈ L(V, W) and retain the bases BV and BW above. The matrix of T with respect to these bases is the m × n matrix M(T) whose entries Aj,k are defined by T vk = A1,k w1 + · · · + Am,k wm (that is, the kth column of M(T) records the coefficients for writing T vk as a linear combination of the wj ’s). If the bases are not clear from context, then the cumbersome notation M(T, (v1 , . . . , vn ), (w1 , . . . , wm )) is used.

Generally, if no bases are given and T : Fn → Fm , we use the standard bases of Fn and Fm . Indeed, these are the matrices you have been using since Math 308.

Example 80. Suppose T ∈ L(F3 , F2 ) is defined by T(x, y, z) = (3x − y, 2x + 2z). To find the matrix of T with respect to the standard bases, we just need to understand T e1 , T e2 , T e3 , and write these with respect to the standard basis of F2 . Of course, everything is already written with respect to the standard basis. Since T(1, 0, 0) = (3, 2), T(0, 1, 0) = (−1, 0) and T(0, 0, 1) = (0, 2) we have
M(T) = [ 3  −1  0 ]
       [ 2   0  2 ].

In Math 308, you learned that there is a correspondence between m × n matrices and linear maps T : Fn → Fm . Every matrix gives a linear map, every linear map can be recorded by a matrix. The only extra wrinkle here is that we allow arbitrary finite-dimensional vector spaces, and the matrices depend on a choice of basis for the vector space.

You can add two matrices of the same size and you can multiply a matrix by a scalar. We also saw that you can add two linear maps V → W and you can scalar multiply a linear map by a scalar. Do these two notions coincide? That is, suppose S, T : V → W. From these (with a fixed choice of basis), you obtain two matrices M(S) and M(T). You can also form the linear map S + T : V → W and the matrix M(S + T). How are M(S), M(T), and M(S + T) related? Since this is linear algebra, the nicest possible subject, the answer is: the way you want them to be.

Proposition 81. Suppose S, T ∈ L(V, W). Then M(S + T) = M(S) + M(T).

So addition of linear maps and addition of their corresponding matrices correspond. The same holds for scalar multiplication.

Proposition 82. Suppose T ∈ L(V, W) and λ ∈ F. Then M(λT) = λM(T).

Yet again, since we now have a notion of addition for matrices of the same size and a notion of scalar multiplication, this suggests that perhaps the set of all matrices of a fixed size forms a vector space. Again, it is not obvious (but it is true) that these operations satisfy the axioms of a vector space.

Definition 83. For m and n positive integers, the set of all m × n matrices with entries in F is denoted by Fm,n .

Fm,n is a vector space of dimension mn. A basis is given by the elementary matrices Ei,j which have a 1 in the ith row and jth column and 0s everywhere else.
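Here is one more optional Python/NumPy sketch: it computes M(T) for the map of Example 80 by applying T to the standard basis vectors (Definition 79), and checks Proposition 81 against a second map S, which is simply invented for the test.

    import numpy as np

    def T(x, y, z): return np.array([3*x - y, 2*x + 2*z])    # the map of Example 80
    def S(x, y, z): return np.array([x + z, 5*y])             # a made-up second map F^3 -> F^2

    def matrix_of(f):
        # Columns are the images of the standard basis vectors e1, e2, e3 (Definition 79).
        e = np.eye(3)
        return np.column_stack([f(*e[:, j]) for j in range(3)])

    print(matrix_of(T))     # [[ 3. -1.  0.]
                            #  [ 2.  0.  2.]], matching Example 80

    sum_map = lambda x, y, z: S(x, y, z) + T(x, y, z)         # the linear map S + T
    print(np.array_equal(matrix_of(sum_map), matrix_of(S) + matrix_of(T)))   # True (Proposition 81)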
Recall that we said that there is even more structure to the set of linear maps—given linear maps T : U → V and S : V → W , it is possible to compose them to obtain ST : U → W . On the matrix side, if U , V , and W are all finite-dimensional with some choice of fixed bases, we obtain M(S), M(T ), and M(ST ). We will define a multiplication on matrices (of the correct sizes) so that it corresponds to function composition. That is, M(S)M(T ) = M(ST ).

Definition 84. Let A ∈ Fm,n and C ∈ Fn,p . Then the product AC is defined to be the m × p matrix whose entry in row j and column k is given by
(AC)j,k = Σ_{r=1}^{n} Aj,r Cr,k = Aj,1 C1,k + · · · + Aj,n Cn,k .
In other words, the (j, k) entry is obtained by taking the dot product of the jth row of A with the kth column of C. Note that just as the composition ST is only defined when the codomain of T matches the domain of S, matrix multiplication is only defined when the number of columns of A is the same as the number of rows of C.

Proposition 85. Let T : U → V and S : V → W be linear maps. Fix bases {u1 , . . . , up }, {v1 , . . . , vn } and {w1 , . . . , wm } of U , V , and W , respectively. Let M(T ), M(S), and M(ST ) be the matrices of T , S, and ST with respect to the given bases. Then M(S)M(T ) = M(ST ).

Proof sketch. Really the proof is just a computation where you’re careful with indices. Let A = M(S) and C = M(T ). To compute the matrix of ST , we simply need to know the image (ST )(uk ) for each 1 ≤ k ≤ p. What happens to uk under these maps?
(ST )(uk ) = S( Σ_{r=1}^{n} Cr,k vr )
          = Σ_{r=1}^{n} Cr,k (S vr )
          = Σ_{r=1}^{n} Cr,k ( Σ_{j=1}^{m} Aj,r wj )
          = Σ_{j=1}^{m} ( Σ_{r=1}^{n} Aj,r Cr,k ) wj .
So this means that Σ_{r=1}^{n} Aj,r Cr,k should be the (j, k)th entry of M(ST ), which is exactly how we defined AC.

Invertibility and Isomorphic Vector Spaces (3.D)

Definition 86. A linear map T ∈ L(V, W ) is called invertible if there exists a linear map S ∈ L(W, V ) such that ST = IV ∈ L(V, V ) and T S = IW ∈ L(W, W ). A linear map S ∈ L(W, V ) satisfying ST = I and T S = I is called an inverse of T .

Notice that in the definition it says that S is an inverse of T , not the inverse of T . That’s because it is not obvious that an invertible map has only one inverse. It is true though.

Proposition 87. An invertible linear map has a unique inverse.

Proof. Suppose T ∈ L(V, W ) is invertible. Suppose that S1 and S2 are inverses of T . Then consider S1 T S2 . On the one hand,
S1 T S2 = (S1 T )S2 = IS2 = S2
but also
S1 T S2 = S1 (T S2 ) = S1 I = S1 ,
so we conclude that S1 = S2 .

So in fact inverses are unique, so we can talk about the inverse of an invertible linear map. If T ∈ L(V, W ) is invertible, we denote its inverse by T −1 ∈ L(W, V ). But the question remains: how can we tell if a linear map is invertible? It turns out this question has a very clean answer.

Proposition 88. A linear map is invertible if and only if it is injective and surjective (i.e., if and only if it is bijective).

9. Monday 7/13: Invertibility and Isomorphic Vector Spaces (3.D)

We begin by proving the theorem that we stated at the end of last lecture.

Proposition 89. A linear map is invertible if and only if it is injective and surjective (i.e., if and only if it is bijective).

Proof. Suppose T ∈ L(V, W ).
[⇒]. Suppose T is invertible. We want to show that T is injective and surjective. First, we show T is injective. To this end, suppose u, v ∈ V such that T u = T v.
Now apply the inverse T −1 : W → V to both of these vectors to see that u = T −1 (T u) = T −1 (T v) = v so u = v and hence T is injective. Now we want to show that T is surjective. So suppose w ∈ W . Which vector in V maps over to w? Again, use the inverse map T −1 . The vector T −1 w ∈ V and T (T −1 w) = w so w ∈ range T and so T is surjective. [⇐]. Now suppose that T is bijective. We want to construct a map S ∈ L(V, W ) which is its inverse. So for each w ∈ W , let Sw ∈ V be the unique vector in V such that T (Sw) = w. Why does such a vector exist? First, there is a vector that maps over to w since T is surjective. And there is only one such vector since T is injective. We need to show that this map S is linear and the inverse of T . First we show S is linear. Let w1 , w2 ∈ W . Then T (Sw1 + Sw2 ) = T (Sw1 ) + T (Sw2 ) = w1 + w2 and so Sw1 + Sw2 is the unique element of V which maps over to w1 + w2 . But this is the definition of S(w1 + w2 ) so we have S(w1 + w2 ) = Sw1 + Sw2 which shows that S is additive. The proof for homogeneity is similar. Finally, we need to show that ST = IV and T S = IW . The latter of these is clear by the definition of S. For any w ∈ W , we defined Sw to be exactly the vector that T (Sw) = w. So T S = IW . Now let v ∈ V . Then T (ST (v)) = (T S)(T v) = I(T v) = T v. But since T is injective, this shows that ST (v) = v. Therefore ST = IV , and we have completed the proof. Isomorphic Vector Spaces. If there is an invertible linear map between two vector spaces V and W , then by the previous result it is injective and surjective. In which case these two vector spaces are essentially “the same”. Example 90. Let V be the xy-plane in R3 , that is V = {(x, y, 0) | x, y ∈ R} = span((1, 0, 0), (0, 1, 0)). This seems like it is basically the same as R2 . Of course, they are not the same. The vector (1, 0) is in R2 , and it is definitely not in V . But they do behave in basically exactly the same way. There is a map T : V → R2 defined by T (x, y, 0) = (x, y) which you should check is an invertible linear map. This means that the two spaces have “the same size” (since T is bijective) and preserves addition and scalar multiplication. So even if two vector spaces aren’t exactly the same, if there’s an invertible linear map between them, they behave in the same way. We would not say that R2 is equal to V , instead, we say that they are isomorphic. Definition 91. An isomorphism is an invertible linear map. If there exists an isomorphism V → W , then we say that V and W are isomorphic. Note that this is sort of a redundant definition, a linear map T is invertible if and only if it is an isomorphism. You use the word isomorphic when you really want to stress that two vector spaces are essentially the same. One nice fact is that over any field, basically there is “only one” vector space of each finite-dimension. So we saw that the xy-plane was isomorphic to R2 , but actually every two-dimensional subspace of R3 is isomorphic to R2 . And every two-dimensional subspace of Rn . And even the vector space P1 (R) of linear polynomials is isomorphic to R2 . Theorem 92. Let V and W be finite-dimensional vector spaces over F. Then V and W are isomorphic if and only if dim V = dim W . Proof. [⇒]. Suppose that V and W are isomorphic, so there exits an isomorphism T : V → W . Since T is injective, null T = {0} and since T is surjective, range T = W . Hence, by the Rank– Nullity Theorem, dim V = dim null T + dim range T = 0 + dim W = dim W. [⇐]. Conversely, suppose that dim V = dim W = d. 
Then there exist bases {v1 , . . . , vd } and {w1 , . . . , wd } of V and W , respectively. We want to show that there is an isomorphism T : V → W . By Proposition 60, there exists a unique linear map T such that T vj = wj for all j. By the previous result, we simply need to show that T is both injective and surjective.

Since {w1 , . . . , wd } is a basis for W , every vector w ∈ W can be written as w = c1 w1 + · · · + cd wd . But then
T (c1 v1 + · · · + cd vd ) = c1 T v1 + · · · + cd T vd = c1 w1 + · · · + cd wd = w,
so T is surjective. Now to show that T is injective, it suffices to show that null T = {0}. So suppose T v = 0. Write v = c1 v1 + · · · + cd vd . Then
0 = T v = c1 T v1 + · · · + cd T vd = c1 w1 + · · · + cd wd ,
and since {w1 , . . . , wd } is linearly independent, this implies that cj = 0 for all j and so v = 0, as desired.

Since the concept of isomorphic vector spaces captures vector spaces that are “the same”, the fact that there is a correspondence between m × n matrices and linear maps Fn → Fm should be captured by some isomorphism.

Proposition 93. Suppose v1 , . . . , vn is a basis of V and w1 , . . . , wm is a basis of W . For each T ∈ L(V, W ), let M(T ) ∈ Fm,n be the m × n matrix of T with respect to these bases. Then the function M : L(V, W ) → Fm,n defined by T ↦ M(T ) is an isomorphism.

Proof. Propositions 81 and 82 show that M is linear. We need to prove that M is injective and surjective.

First, we prove injectivity. Suppose T ∈ null M, that is, T ∈ L(V, W ) such that M(T ) = 0. Then by the definition of M(T ), we have that T vk = 0 for all k. Since v1 , . . . , vn is a basis of V , this implies that T = 0, so M is injective.

Next, surjectivity. Suppose A ∈ Fm,n . We must find a T ∈ L(V, W ) such that M(T ) = A. Define T : V → W by
T vk = Σ_{j=1}^{m} Aj,k wj = A1,k w1 + · · · + Am,k wm
for all k. Then A = M(T ), so M is surjective, as desired.

Since we know that Fm,n has dimension mn, we have the following corollary.

Corollary 94. Suppose V and W are finite-dimensional. Then L(V, W ) is finite-dimensional and dim L(V, W ) = (dim V )(dim W ).

The last definition we will introduce in this section is that of an operator, which is just a linear map from a vector space to itself.

Definition 95. A linear map from a vector space V to itself is called an operator. We use the notation L(V ) = L(V, V ) for the set of all operators on V .

Remember the first result we proved today says that a linear map is invertible if and only if it is bijective. The intuition here is that isomorphic vector spaces are “the same”, so they should have the same size. If you have a surjective map V → W , this says that dim V ≥ dim W , and if you have an injective map V → W , this says that dim V ≤ dim W . On the other hand, when we are talking about an operator V → V , do we actually need to check both things? In Math 300, if we had an injective map {1, 2, . . . , n} → {1, 2, . . . , n}, then since we send each input to a distinct output, this injection was automatically surjective. Similarly, if we have an injective operator V → V , is it automatically surjective? Well if we think about just set-theoretic functions, we can have injections Z → Z that are not surjective, for example the map x ↦ 2x. The issue here is that Z has infinite cardinality, so weird things can happen. If V is infinite-dimensional, we can similarly have operators on V that are injective but not surjective or surjective but not injective.

Example 96.
• The left shift operator L : F∞ → F∞ is not injective, since the null space contains all scalar multiples of (1, 0, 0, . . . ). However, it is surjective. • The right shift opeator R : F∞ → F∞ is not surjective, since all of the vectors in the range start with 0. However, it is injective. • The multiplication by z operator on P(R) is injective since the only polynomial p such that zp = 0 is the polynomial p = 0. However, it is not surjective, as its image does not contain any polynomials with nonzero constant term. However, as long as the vector space in question is finite-dimensional, injectivity and surjectivity of an operator are equivalent. Proposition 97. Suppose V is finite-dimensional and T ∈ L(V ). Then the following are equivalent: (1) T is invertible, (2) T is injective, (3) T is surjective. Proof. By Proposition 89, (1) if and only if (2) and (3). It suffices to prove that (2) and (3) are equivalent, since then if we have (2), we also have (3), and together (2) and (3) would imply (1). [(2) ⇒ (3)]. Suppose that T is injective. Then null T = {0} by Proposition 70. So by the Rank– Nullity Theorem, dim range T = dim V − dim null T = dim V so range T = V , and so T is surjective. [(3) ⇒ (2)]. Similarly, if T is surjective, then dim null T = dim V − dim range T = dim V − dim V = 0 so null T = {0} and so T is injective. I think your book gives a pretty cool example of the power of this seemingly simple theorem. Example 98. Show that for each polynomial q ∈ P(R), there exists a polynomial p ∈ P(R) such that ((x2 + 5x + 7)p)00 = q. In theory you could (try) prove this after Calculus I. You simply write q as a general polynomial with variables as coefficients, then integrate twice (now with two new variables from the constants of integration that you have control over), then try to choose those integration constants such that you can divide by x2 + 5x + 7 and solve for the coefficients of p. But this seems painful to me. You can also use the previous proposition, although it doesn’t seem (at first) to apply to this, since P(R) is infinite-dimensional. However, for each polynomial q, since q has finite degree m, q ∈ Pm (R) which is finite dimensional. So suppose q ∈ Pm (R). Define an operator T : Pm (R) → Pm (R) by T (p) = ((x2 + 5x + 7)p)00 Since multiplying by p by a degree two polynomial then differentiating twice preserves the degree, T (p) ∈ Pm (R). So T is an operator on Pm (R). Suppose T (p) = 0. The only polynomials whose second derivatives are zero are linear polynomials ax + b. So this means that (x2 + 5x + 7)p = ax + b for some a, b ∈ R. But this means that p = 0. So null T = {0} and hence T is injective, and thus surjective. Therefore every polynomial q ∈ Pm (R) is in the image of T . 10. Wednesday 7/15: Products of Vector Spaces (3.E) Section 3.E of the textbook covers products and quotients of vector spaces. I think quotients are a bit more difficult to grasp, so we will not be covering them in this class, although you can read that part in your textbook. Quotient spaces will be important in your abstract algebra and topology classes, so I’ll let your professors in those classes handle that subject. The notion of a product of vector spaces is just a way to combine two or more different vector spaces into a larger vector space. Definition 99. Suppose V1 , . . . , Vm are vector spaces over F. The product V1 × · · · × Vm is, as a set V1 × · · · × Vm {(v1 , . . . , vm ) | vi ∈ Vi for all i} with addition defined by (u1 , . . . , um ) + (v1 , . . . , vm ) = (u1 + v1 , . . . 
, um + vm ) and scalar multiplication defined by λ(v1 , . . . , vm ) = (λv1 , . . . , λvm ). This makes the product into a vector space over F. This is basically the most straightforward way to put two vector spaces together. You basically just write elements of the individual vector spaces next to each other. Example 100. The product P3 (R) × R2 contains vectors that look like (2x3 − 4x + 5, (3, 2)). Addition just happens component-wise so (x − 3, (0, 1)) + (x2 + 2x − 4, (2, −2)) = (x2 + 3x − 7, (2, −1)). Note the difference between the sum of two subspaces and the product of two vector spaces. To take the sum U + W , both U and W need to be subspaces of the same vector space V . Then U + W = {u + w | u ∈ U, w ∈ W }. In order for this definition to make sense, you need to be able to add an element of U with an element of W . On the other hand, you can take the product of any two vector spaces you want. Example 101. Consider the product R2 × R3 . Elements of this product look like (v, w) where v ∈ R2 and w ∈ R3 , so like ((v1 , v2 ), (w1 , w2 , w3 )). Is this equal to R5 ? Well, almost. You view an element of R2 × R3 as an ordered pair, where one element is in R2 and one is in R3 while you view an element of R5 as a 5-tuple of real numbers. So they are not equal as sets. On the other hand, they behave essentially identically, with the exception of sticking in some parentheses. The correct notion is that of an isomorphism, as we’ve discussed. The linear map T : R2 × R3 → R5 which maps ((v1 , v2 ), (w1 , w2 , w3 )) to (v1 , v2 , w1 , w2 , w3 ) is bijective so these two vector spaces are isomorphic. Once you’ve gained some mathematical maturity, you might be allowed to say that “R2 × R3 = R5 ”, but this is informal so in this class we ought to be more precise. What is the dimension of V1 × · · · × Vm ? Well if we think about the example above, we might guess that the dimension of the product is the sum of the dimensions. Proposition 102. Suppose V1 , . . . , Vm are finite-dimensional vector spaces. Then dim(V1 × · · · × Vm ) = dim V1 + · · · + dim Vm . In particular, the product of finitely many finite-dimensional vector spaces is finite-dimensional. Proof sketch. Pick bases for each of the Vi ’s. If v is a basis element of some Vi , then consider the element (0, 0, . . . , 0, v, 0, . . . , 0) where v appears in the ith slot. The collection of all such vectors as we range over the bases of the Vi ’s will form a basis of the product. And there are clearly dim V1 + · · · + dim Vm of them. We discussed that you can take the product of any finite set of vector spaces. You can only take the sum of subspaces inside a common vector space. But if you have several subspaces of a common vector space, you can take their product, too (since they’re each vector spaces). What is the relationship between the sum and product of a collection of subspaces? Proposition 103. Suppose U1 , . . . , Um are subspaces of a vector space V . Define a linear map Γ : U1 × · · · × Um → U1 + · · · + Um Γ(u1 , . . . , um ) = u1 + · · · + um . Then U1 + · · · + Um is a direct sum if and only if Γ is injective. Proof. You should make sure that you believe that Γ is a linear map. Once we know it is linear, Γ is injective if and only if null Γ = {0}. In other words, the only way to write 0 as a sum u1 + · · · + um where each ui ∈ Ui is by taking all of the ui = 0. But by Theorem 22, this happens if and only if the sum is a direct sum. Note that the map Γ defined above is always surjective. 
Hence, if the sum is a direct sum, then the product is isomorphic to the sum. The previous result did not have any finite-dimensional hypothesis; it holds for any subspaces of any vector space. On the other hand, in the finite-dimensional case, we can use dimension to detect direct sums.

Proposition 104. Suppose V is finite-dimensional and U1 , . . . , Um are subspaces of V . Then U1 + · · · + Um is a direct sum if and only if
dim(U1 + · · · + Um ) = dim U1 + · · · + dim Um .

Proof. As we just stated, the map Γ is always surjective. So by the Rank–Nullity Theorem,
dim(U1 × · · · × Um ) = dim null Γ + dim range Γ = dim null Γ + dim(U1 + · · · + Um ).
We know that the sum is direct if and only if Γ is injective, if and only if null Γ = {0}. And so the sum is direct if and only if
dim(U1 × · · · × Um ) = dim(U1 + · · · + Um ),
which, since dim(U1 × · · · × Um ) = dim U1 + · · · + dim Um by Proposition 102, happens if and only if
dim(U1 + · · · + Um ) = dim U1 + · · · + dim Um .

Duality (3.F)

Let V be a vector space. The linear maps V → V were important enough to warrant a special name—L(V, V ) is the set of operators on V . Another extremely important set of linear maps are the linear maps V → F.

Definition 105. A linear functional on V is a linear map V → F, i.e., an element of L(V, F). We use the notation V 0 = V ∗ = L(V, F) and call this the dual space of V . Your book uses the notation V 0 , but you should be aware that V ∗ is also a very common notation.

Example 106.
• Fix an element (c1 , . . . , cn ) ∈ Fn . Then we have a linear functional φ : Fn → F defined by φ(x1 , . . . , xn ) = c1 x1 + · · · + cn xn . To be even more concrete, we can let (3, −2) ∈ R2 . Then the functional φ : R2 → R sends the vector (x, y) to the number 3x − 2y.
• The linear map J : P(R) → R given by J(p) = ∫₀¹ p(z) dz is a linear functional.
• Another linear functional you are already familiar with is the map, say ev3 : P(R) → R, that evaluates every polynomial at 3. That is, ev3 (p) = p(3).
These last two examples should be seen as evidence as to how commonly linear functionals appear in mathematics.

Proposition 107. Let V be a finite-dimensional vector space. Then V 0 is finite-dimensional and dim V 0 = dim V .

Proof. We already proved that if V and W are finite-dimensional then dim L(V, W ) = (dim V )(dim W ). For the dual space, W = F is one-dimensional. So if dim V = n, then dim V 0 = n, as well.

A natural question to ask is: is there a nice basis for V 0 ?

Definition 108. Suppose v1 , . . . , vn is a basis of V . Then the dual basis of v1 , . . . , vn is the list φ1 , . . . , φn of elements in V 0 , where each φj is the linear functional on V such that
φj (vk ) = 1 if k = j, and φj (vk ) = 0 if k ≠ j.
Make sure that you understand that each φj really is a linear map from V to F. This follows because to determine a linear map, you only need to specify the images of a basis of the domain. And the images are certainly in the field F.

Example 109. If we consider the standard basis e1 , . . . , en of Fn , then the dual basis φ1 , . . . , φn is defined by φj (ek ) = 1 if k = j and φj (ek ) = 0 if k ≠ j. What then does φj do to a general vector in Fn ? Well if x = (x1 , . . . , xn ) ∈ Fn then x = x1 e1 + · · · + xn en . So
φj (x) = x1 φj (e1 ) + · · · + xn φj (en ) = xj φj (ej ) = xj .
In other words, φj returns the jth coordinate of x.
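Here is one concrete way to picture the dual basis when V = Fn, sketched in NumPy (my own aside, with an assumed example matrix B): if the basis vectors v1 , . . . , vn are the columns of an invertible matrix B, then the dual basis functionals φj are given by the rows of B⁻¹, since (B⁻¹B)j,k is 1 when j = k and 0 otherwise.

```python
import numpy as np

# Columns of B are a (non-standard) basis v1, v2, v3 of R^3; B is invertible.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
Phi = np.linalg.inv(B)      # row j of Phi plays the role of the functional phi_j

for j in range(3):
    for k in range(3):
        value = Phi[j, :] @ B[:, k]          # phi_j(v_k)
        expected = 1.0 if j == k else 0.0
        assert np.isclose(value, expected)
```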
We called this the dual basis without ever checking that it actually is a basis of V 0 . We need to fix this immediately.

Proposition 110. If v1 , . . . , vn is a basis of V , then the dual basis φ1 , . . . , φn is a basis of V 0 .

Proof. Since we know that dim V 0 = n, it is enough to verify that the dual basis is linearly independent (since by Proposition 52, any n linearly independent elements in a vector space of dimension n form a basis). So suppose that
a1 φ1 + · · · + an φn = 0,
where 0 denotes the function V → F that is identically zero. Hence, when we apply this function to any of the vj , we get 0. However,
(a1 φ1 + · · · + an φn )(vj ) = a1 φ1 (vj ) + · · · + an φn (vj ) = aj φj (vj ) = aj .
Hence, for each j, we have aj = 0. Therefore, the dual basis is in fact linearly independent and therefore a basis of V 0 .

An interesting feature of the dual space is that if T : V → W is a linear map, then there is a natural way to define a linear map W 0 → V 0 .

Definition 111. If T ∈ L(V, W ), then the dual map of T is the linear map T 0 ∈ L(W 0 , V 0 ) defined by T 0 (φ) = φ ◦ T for φ ∈ W 0 .

Note that φ : W → F is a linear functional on W . Hence, T 0 (φ) = φ ◦ T : V → W → F is a linear functional on V (it is linear since it is a composition of linear maps). This function composition also explains why the arrow “flipped” when we took duals.³

Example 112. In these examples, I will use ∗ notation instead of 0 notation because I want to reserve 0 for differentiation. Consider the linear map D : P(R) → P(R) given by differentiation, Dp = p 0 . We would like to understand the dual map D∗ : (P(R))∗ → (P(R))∗ .
• Let ev3 ∈ (P(R))∗ be the linear functional which evaluates a polynomial at 3. What is D∗ (ev3 )? Well, by definition, this is a linear functional on P(R) defined by
(D∗ (ev3 ))(p) = (ev3 ◦ D)(p) = ev3 (Dp) = ev3 (p 0 ) = p 0 (3).
So D∗ (ev3 ) is the linear functional that maps p to p 0 (3).
• We also considered the linear functional J ∈ (P(R))∗ which takes the definite integral of a polynomial on the interval [0, 1], that is, J(p) = ∫₀¹ p(z) dz. What is D∗ (J)? This is the element of (P(R))∗ defined by
(D∗ (J))(p) = (J ◦ D)(p) = J(p 0 ) = ∫₀¹ p 0 (z) dz = p(1) − p(0).
So D∗ (J) is the linear functional that maps p to p(1) − p(0).

³You can think of all of the vector spaces over F as living in some world of vector spaces with linear maps between them. Then for every vector space and every linear map, you can “take duals”. Each resulting dual vector space has the same dimension as the original vector space, but all the arrows have flipped direction. It’s like the whole world has mirrored. And like the kids in Stranger Things, we’re trying to understand the Upside-Down world of duals.

11. Friday 7/17: Duality (3.F)

We know that taking the dual of a vector space V returns a vector space V 0 of the same dimension. We also know that when you take the dual of a linear map T : V → W you get a linear map T 0 : W 0 → V 0 . What properties does this process have? For example, what if you take the dual of the sum of two linear maps S + T ? What about the composition of two linear maps U → V and V → W ? If you know properties about T , what does that say about T 0 ? Can you understand the null space and range of T 0 in terms of information about T ? So many questions, so little time.

Proposition 113. The following properties hold:
(1) (S + T )0 = S 0 + T 0 for all S, T ∈ L(V, W ).
(2) (λT )0 = λT 0 for all T ∈ L(V, W ) and λ ∈ F.
(3) (ST )0 = T 0 S 0 for all T ∈ L(U, V ) and S ∈ L(V, W ).

Proof. (1) The first two claims are simply computations that should be checked. Suppose S, T ∈ L(V, W ). Then we get maps S 0 , T 0 ∈ L(W 0 , V 0 ).
For each φ ∈ W 0 , we have S 0 (φ) = φ ◦ S : V → F T 0 (φ) = φ ◦ T : V → F and we can add these two linear maps S 0 (φ) + T 0 (φ) = φ ◦ S + φ ◦ T : V → F using the usual definition of addition of linear maps. On the other hand, we could add S + T ∈ L(V, W ) and then (S + T )0 (φ) = φ ◦ (S + T ) : V → F. Are these two functions S 0 (φ) + T 0 (φ) and (S + T )0 (φ) the same function V → F? Well if v ∈ V , then φ ◦ (S + T )(v) = φ(S(v) + T (v)) = φ ◦ S(v) + φ ◦ T (v) so they are indeed the same function. Since this holds for all φ ∈ W 0 , therefore S 0 + T 0 and (S + T )0 are the same function W 0 → V 0 . The hard thing about this proof is understanding which things are maps, what the domains and ranges of the maps are, and the definition of addition and equality of functions. So the difficulty is in the abstraction, not in the computation. (2) You should check the second claim. (3) First note that ST : U → W so (ST )0 : W 0 → U 0 . Further, S 0 : W 0 → V 0 and T : V 0 → U 0 . So both (ST )0 and T 0 S 0 are maps W 0 → U 0 . Are they equal as maps? Well let φ ∈ W 0 . Then (ST )0 (φ) = φ ◦ (ST ) = (φ ◦ S) ◦ T = T 0 (φ ◦ S) = T 0 (S 0 (φ)) = (T 0 S 0 )(φ). So they are indeed equal. The Null Space and Range of the Dual of a Linear Map. We now turn our attention to null T 0 and range T 0 . We know that if T : V → W then T 0 : W 0 → V 0 so null T 0 will be a subspace of W 0 and range T 0 will be a subspace of V 0 . But how are they related to the original map T ? In order to answer this question, we first need the following definition. Definition 114. Let U ⊆ V . The annihilator of U is U 0 = {φ ∈ V 0 | φ(u) = 0 for all u ∈ U }. That is, the annihilator is all of the linear functionals which vanish on every vector in U . Note that this definition holds for any subset U of V . We do not require that U is a subspace. Note that U 0 is a subset of the dual space U 0 ⊆ V 0 . It will turn out that it is always a subspace. Example 115. Let U be the subspace of P(R) consisting of all multiples of x2 . What are some linear functionals that are in the annihilator of U ? One example is the polynomial ev0 : P(R) → R. Any multiple of x2 is equal to 0 when evaluated at 0. Another example is the linear functional φ : P(R) → R given by φ(p) = p0 (0). So both ev0 , φ ∈ U 0 . Proposition 116. Suppose U ⊆ V . Then U 0 is a subspace of V 0 . Proof. To show that U 0 is subspace, we need to show it contains 0 (here, the zero vector of V 0 is the zero functional V → F that sends every vector to 0) and that it is closed under addition and scalar multiplication. It is clear that 0 ∈ U 0 , since the zero functional applied to any vector in V is 0. So this is certainly true for every vector in U . Suppose now that φ, ψ ∈ U 0 .4 This means that for all u ∈ U , we have φ(u) = 0 and ψ(u) = 0. Therefore, for every u ∈ U . (φ + ψ)(u) = φ(u) + ψ(u) = 0 and so φ + ψ ∈ U 0 . The proof for scalar multiplication follows similarly. Example 117. Let’s consider R5 with its standard basis e1 , . . . , e5 . Take the dual basis φ1 , . . . , φ5 of (R5 )0 . If we let U = span(e1 , e2 ) (i.e., the xy-plane in R5 ), then what is U 0 ? Well we certainly know three functionals on R5 that vanish on U , namely φ3 , φ4 , φ5 (φj is defined to be 1 on ej and 0 on the other standard basis vectors). So φ3 , φ4 , φ5 ∈ U 0 . We claim that in fact U 0 = span(φ3 , φ4 , φ5 ) First, we show that U 0 ⊇ span(φ3 , φ4 , φ5 ). Every element of the right-hand side can be written c3 φ3 + c4 φ4 + c5 φ5 . 
But then
(c3 φ3 + c4 φ4 + c5 φ5 )(a1 e1 + a2 e2 ) = 0,
so every such functional vanishes on all of U . Conversely, suppose that ψ ∈ U 0 . Because the dual basis is a basis of (R5 )0 , we can write ψ = c1 φ1 + c2 φ2 + c3 φ3 + c4 φ4 + c5 φ5 . Then since ψ vanishes on U , it certainly vanishes on e1 . That is,
0 = ψ(e1 ) = (c1 φ1 + c2 φ2 + c3 φ3 + c4 φ4 + c5 φ5 )(e1 ) = c1 ,
so this implies that c1 = 0. Similarly, since ψ vanishes on e2 , we get that c2 = 0. Therefore, ψ ∈ span(φ3 , φ4 , φ5 ), as desired.

So for any subset, the annihilator is a subspace. What if U itself is a subspace? Then U and U 0 are both subspaces. How are they related?

Proposition 118. Suppose V is finite-dimensional and U is a subspace of V . Then
dim U + dim U 0 = dim V.

⁴Note that we use the letters φ, ψ for vectors in a dual space to remind ourselves that they are linear maps.

Proof. Consider the inclusion map i ∈ L(U, V ) defined by i(u) = u for all u ∈ U . This is a linear map which is defined since U is a subspace of V . Then, taking duals, we get a dual map i 0 ∈ L(V 0 , U 0 ). So by the Rank–Nullity Theorem applied to i 0 , we have
dim range i 0 + dim null i 0 = dim V 0 .
But what is null i 0 ? Well by definition i 0 : V 0 → U 0 takes a linear functional φ ∈ V 0 to φ ◦ i ∈ U 0 . So null i 0 consists of all of those linear functionals φ ∈ V 0 such that φ ◦ i = 0 as a function U → F. But since i is simply the inclusion map, this says that null i 0 consists of all those functionals φ such that φ(u) = 0 for every u ∈ U . And this is the definition of the annihilator U 0 . Further, we know that dim V 0 = dim V , so we get
dim range i 0 + dim U 0 = dim V.
Now, what is range i 0 ? Again, by the definition of dualizing, range i 0 consists of every functional ψ on U such that ψ = φ ◦ i for some φ ∈ V 0 . In other words, range i 0 is all of those functionals on U that can be extended to a linear functional on V . But every linear functional ψ on U can be extended to a linear functional φ on V (extend a basis of U to a basis of V , and define φ to agree with ψ on the basis vectors from U and to be 0 on the new basis vectors). So range i 0 is the entire dual space of U , which has the same dimension as U . Therefore, we obtain the desired equality.

The next result explicitly characterizes the null space of the dual map. The null space of the dual of T is the annihilator of the range of T .

Proposition 119. Suppose T ∈ L(V, W ). Then null T 0 = (range T )0 . If, further, V and W are finite-dimensional, then
dim null T 0 = dim null T + dim W − dim V.

Proof. For the first claim, we need to prove two inclusions. So first, we show that null T 0 ⊆ (range T )0 . Suppose φ ∈ null T 0 . Then 0 = T 0 (φ) = φ ◦ T . To say that this is the 0 functional is to say that for every v ∈ V ,
0 = (φ ◦ T )(v) = φ(T v).
This means that φ vanishes on every vector in range T (since T v ranges over all vectors in range T as you take every vector v ∈ V ). Hence, φ ∈ (range T )0 .

To prove the reverse inclusion null T 0 ⊇ (range T )0 , suppose that φ ∈ (range T )0 . This means that for every w ∈ range T , we have φ(w) = 0. But every vector w ∈ range T can be written as T v for some vector(s) v ∈ V . Hence, for every v ∈ V , we have φ(T v) = 0. Then 0 = φ ◦ T = T 0 (φ), so φ ∈ null T 0 . Combining this with the previous paragraph we get null T 0 = (range T )0 .

If both V and W are finite-dimensional, then we have
dim null T 0 = dim (range T )0                    (since null T 0 = (range T )0 )
            = dim W − dim range T                (Proposition 118)
            = dim W − (dim V − dim null T )      (Rank–Nullity)
            = dim null T + dim W − dim V,
as desired.

Material for your midterm exam ends here. That is to say, your midterm (on Wednesday 7/22) covers up through the middle of 3.F. On Monday, we will finish up 3.F. On Friday we will cover chapter 4 and start chapter 5.
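As an optional computational aside on Proposition 119 (my own illustration; SymPy and the identification of functionals on Fm with row vectors are assumptions I am adding): for T (x) = Ax, a functional given by the row vector y lies in null T 0 exactly when yA = 0, and it lies in (range T )0 exactly when y is orthogonal to every column of A.

```python
from sympy import Matrix

A = Matrix([[1, 3, 3, 2],
            [2, 6, 9, 7],
            [-1, -3, 3, 4]])     # the matrix from the earlier example, T : R^4 -> R^3

left_null = A.T.nullspace()      # column vectors y with y^t A = 0, i.e. null T 0
for y in left_null:
    # each such functional vanishes on every column of A, i.e. on range T
    for col in A.columnspace():
        assert (y.T * col)[0] == 0

# The dimension formula: dim null T 0 = dim null T + dim W - dim V
assert len(left_null) == len(A.nullspace()) + A.rows - A.cols
```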
12. Monday 7/20: Duality (3.F)

As a corollary of the last result from last time, we prove the following pithy theorem.

Corollary 120. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then T is surjective if and only if T 0 is injective.

Proof. We know T ∈ L(V, W ) is surjective if and only if range T = W . Further, range T = W if and only if (range T )0 = {0}. Why is this true? Well if range T = W , then (range T )0 consists of all those linear functionals on W which vanish on all of W . This is of course only the zero functional, so (range T )0 = {0}. Conversely, suppose range T ≠ W . Take a basis v1 , . . . , vn of range T and extend to a basis v1 , . . . , vn , w1 , . . . , wm of all of W . Since range T ≠ W , we have at least one wi . We can then construct a linear functional W → F which vanishes on v1 , . . . , vn but does not vanish on the w’s. This linear functional is a nonzero element of (range T )0 . [This is the step in the proof that uses finite-dimensionality, since we need to choose a basis.] Therefore, by the previous theorem, null T 0 = (range T )0 , so T is surjective if and only if null T 0 = {0}. But this is equivalent to T 0 being injective.

Just as we were able to characterize the null space of a dual map, we have similar results for the range of a dual map.

Proposition 121. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then range T 0 = (null T )0 . Further, dim range T 0 = dim range T .

Proof. We will prove the second statement first, and then use it in the proof of the first statement. We have
dim range T 0 = dim W 0 − dim null T 0       (Rank–Nullity applied to T 0 )
             = dim W − dim (range T )0      (Propositions 119 and 107)
             = dim range T                  (Proposition 118).
For the first statement, we show range T 0 ⊆ (null T )0 . Suppose φ ∈ range T 0 . This means that there exists ψ ∈ W 0 such that φ = T 0 (ψ). Now if v ∈ null T , then
φ(v) = (T 0 (ψ))(v) = (ψ ◦ T )(v) = ψ(T v) = ψ(0) = 0,
and so φ ∈ (null T )0 . But now, since range T 0 ⊆ (null T )0 , if we show that they have the same dimension, then they must actually be equal. But note that by the above,
dim range T 0 = dim range T
             = dim V − dim null T     (Rank–Nullity)
             = dim (null T )0         (Proposition 118).

We also have the analogue of the theorem that said that surjectivity of T is equivalent to injectivity of T 0 .

Proposition 122. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then T is injective if and only if T 0 is surjective.

Proof. The map T ∈ L(V, W ) is injective if and only if null T = {0}. But null T = {0} if and only if (null T )0 = V 0 . Why is this? Well if null T = {0}, then every linear functional on V vanishes on null T . Hence, (null T )0 = V 0 . On the other hand, if (null T )0 = V 0 , this means that every linear functional on V vanishes on null T . If null T is nontrivial, then it contains a nonzero vector v. Extend to a basis of V and define a linear functional which maps v to 1 and vanishes on the rest of the basis. Then this linear functional does not vanish on null T . Hence, null T must be trivial. Now by Proposition 121, (null T )0 = V 0 happens if and only if range T 0 = V 0 , which is the definition of T 0 being surjective.
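In the same coordinate picture (functionals on Fm as row vectors, with the dual map acting by y ↦ yA), Corollary 120 and Proposition 122 can be sanity-checked numerically. This is my own sketch with an assumed example matrix, not anything you need for the course.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])       # T : R^2 -> R^3, an injective but non-surjective map

n = A.shape[1]                   # n = dim V = 2, m = dim W = 3
rank = np.linalg.matrix_rank(A)

# T injective  <=>  rank A = n; T 0 surjective  <=>  the rows of A span R^{1,n},
# which is the same rank condition, so Proposition 122 checks out for this example.
assert rank == n and np.linalg.matrix_rank(A.T) == n

# T is not surjective (rank < 3); by Corollary 120, T 0 should fail to be injective:
# indeed y = (6, -3, 1) (found by hand) is a nonzero row vector with yA = 0.
y = np.array([[6.0, -3.0, 1.0]])
assert np.allclose(y @ A, 0)
```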
Finally, suppose V and W are finite-dimensional vector spaces. If T : V → W is a linear map with dual T 0 : W 0 → V 0 , we can ask about the matrix of T 0 . How is it related to the matrix of T ? Well we can take bases {v1 , . . . , vn } and {w1 , . . . , wm } of V and W . Then M(T ) is an m × n matrix. We can also take dual bases {φ1 , . . . , φn } and {ψ1 , . . . , ψm } of V 0 and W 0 . Since T 0 : W 0 → V 0 , its matrix will be n × m. It turns out that with respect to the dual bases, the matrix M(T 0 ) of T 0 is the transpose of M(T ). For a matrix A, we denote its transpose At .

Proposition 123. Assume the set-up above. Then M(T 0 ) = M(T )t .

Proof. This is yet again an exercise in definitions and notation. Let A = M(T ) and C = M(T 0 ). The entries Cl,k of C are determined by what T 0 does to the ψk ’s, written in terms of the φl ’s. In particular, for each 1 ≤ k ≤ m,
T 0 (ψk ) = Σ_{l=1}^{n} Cl,k φl .
Now T 0 (ψk ) is an element of V 0 , so we can evaluate it on vj to see
T 0 (ψk )(vj ) = ( Σ_{l=1}^{n} Cl,k φl )(vj ) = Cj,k .
Meanwhile, we also have
T 0 (ψk )(vj ) = (ψk ◦ T )(vj ) = ψk ( Σ_{l=1}^{m} Al,j wl ) = Ak,j .
Therefore the (j, k) entry of C is given by the (k, j) entry of A, so the two matrices are transposes of one another.

The Rank of a Matrix. Now that we have built up all of this technology with duality, we apply it to something concrete. You may remember in Math 308 learning that given a matrix A, you can consider the column space col(A), row space row(A), and null space null A. The matrix A is also related to a linear transformation T , and range(T ) = col(A) while null(T ) = null(A). But curiously, you didn’t talk much about the row space. How is the row space related to everything? Well we know that row(A) = col(At ). And we just saw that At is the matrix of the dual map T 0 . So the row space is the range of the dual of T , written in coordinates.

Definition 124. Suppose A ∈ Fm,n is an m × n matrix. The row rank of A is the dimension of the span of the rows of A in F1,n . The column rank of A is the dimension of the span of the columns of A in Fm,1 .

The next result is one from Math 308, and you can read the abstract proof in the book.

Proposition 125. Suppose V and W are finite-dimensional and T ∈ L(V, W ). Then dim range T is equal to the column rank of M(T ).

Using what we proved about duality, we can show that the row rank and column rank of a matrix are equal.

Proposition 126. Suppose A ∈ Fm,n . Then the row rank of A equals the column rank of A.

Proof. We define a map T : Fn → Fm given by T (x) = Ax, so that M(T ) = A. Then the column rank of A is equal to dim range T . Note that the row rank of A is equal to the column rank of At . So the row rank of A is equal to the column rank of M(T )t = M(T 0 ), which by Proposition 125 equals dim range T 0 . But we proved that dim range T = dim range T 0 , so the column rank and row rank of A are equal.

Since the row rank of a matrix always equals its column rank, we can define the rank of a matrix to just be either of them.

Definition 127. The rank of a matrix A ∈ Fm,n is the column rank of A.

Wednesday 7/22: You will have your first midterm, which you will both download from and submit to Gradescope.
Friday 7/24: We will quickly cover some necessary results about polynomials in Chapter 4 and then talk about polynomials applied to operators in 5.B.
Next week: 5.A, 5.B, 5.C.

13. Wednesday 7/22: Midterm Exam

Today you took your midterm exam.

14. Friday 7/24: Polynomials (Chapter 4) and Operators (5.B)

Chapter 4 in your textbook is on properties of polynomials, starting from the basic definitions and covering the fundamental facts about roots, factorization, polynomial division, etc. It is good that this is all in your textbook, in case you want to brush up on how to work with polynomials, but I will assume that you know most of the basic tools for working with polynomials. I will cover only some of the facts and tools that we will need that are a bit more advanced.

Theorem 128 (The Division Algorithm).
Given polynomials p, s ∈ P(F) with s ≠ 0, there exist unique polynomials q, r ∈ P(F) such that
p = qs + r and deg r < deg s.

Remark. By convention, we say that deg(0) = −∞, so it has smaller degree than every polynomial. This is called the division algorithm because it says that you can divide p by s. The quotient is q, and the remainder is r. In Math 300, you learned this statement for integers. The difference is that for the integers, we had 0 ≤ r < s. For polynomials, we have deg r < deg s.

Example 129. Let’s suppose that p = 2x3 + 3x2 − x + 1 and s = x − 1. The method you learned in pre-calculus for long division of polynomials yields the quotient and remainder guaranteed by the theorem. If you carry out the computation, the quotient is q = 2x2 + 5x + 4 and the remainder is r = 5. You can then verify that
2x3 + 3x2 − x + 1 = (2x2 + 5x + 4)(x − 1) + 5.

Proof. Let deg p = n and deg s = m. If n < m, then if p = qs + r, we must have that q = 0 and hence r = p. These are the unique polynomials satisfying the property that we want. So suppose n ≥ m. Define a linear map
T : Pn−m (F) × Pm−1 (F) → Pn (F) by T (q, r) = qs + r.
You should check that this map is linear. We will in fact show that T is an isomorphism. If (q, r) ∈ null T , then this means that qs + r = 0. But this implies that q = 0, since if q ≠ 0 then deg(qs) ≥ deg(s) = m > deg(r); since r has strictly smaller degree than qs, there is no way for qs + r = 0 except to have q = 0. But this also forces r = 0. So null T = {0}. Hence T is injective. Further,
dim (Pn−m (F) × Pm−1 (F)) = dim Pn−m (F) + dim Pm−1 (F) = (n − m + 1) + m = n + 1 = dim Pn (F).
So T is an injection between two finite-dimensional vector spaces of the same dimension, which means it must be an isomorphism. Surjectivity of T says that the desired q and r exist, and injectivity says that they are unique.

We also have the Fundamental Theorem of Algebra⁵, which you have hopefully seen before.

Theorem 130 (Fundamental Theorem of Algebra). Let p ∈ P(C) be a nonconstant polynomial. Then there exists λ ∈ C such that p(λ) = 0. In other words, every nonconstant polynomial has a complex root.

This has an important corollary.

Corollary 131. Given p(z) ∈ P(C) of degree d > 0, there exist c, λ1 , . . . , λd ∈ C such that
p(z) = c(z − λ1 ) · · · (z − λd ).
So λ1 , . . . , λd are the roots of p.

Proof. The result is clear if deg(p) = 1, for every linear polynomial is of the form αz + β = α(z + β/α). So c = α and λ1 = −β/α. Suppose deg(p) ≥ 2. By the Fundamental Theorem of Algebra, p(z) has a complex root λ1 . By the Division Algorithm, we can write p(z) = (z − λ1 )q(z) + r(z), where deg(r(z)) < deg(z − λ1 ) = 1. So in fact r(z) is a constant polynomial. But since p(λ1 ) = 0, this means that r(z) = 0. Now p(z) = (z − λ1 )q(z) and q is a polynomial of degree d − 1. Use induction.

Remark. The factorization given in the corollary is unique up to reordering the factors.

⁵I am a professional algebraist, and I’m not sure that this deserves to be called the Fundamental Theorem of Algebra. To me this theorem has a pretty analytic flavor, although the roots of modern group theory can be traced back to understanding the behavior of roots of polynomials.
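If you want to check long divisions like Example 129 by computer, SymPy (an assumption of mine, not part of the course) implements the computation in the Division Algorithm directly:

```python
from sympy import symbols, div, expand

x = symbols('x')
p = 2*x**3 + 3*x**2 - x + 1
s = x - 1

q, r = div(p, s, x)          # quotient and remainder from the Division Algorithm
print(q, r)                  # 2*x**2 + 5*x + 4   and   5
assert expand(q*s + r - p) == 0
```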
Polynomials Applied to Operators (part of 5.B)

One new way that we will use polynomials in this class is that we will apply them to operators. Normally, if you have a polynomial like p(z) = z 2 + 3, you apply the polynomial to a number, so, e.g., p(4) = 4² + 3 = 19. However, we will be plugging operators into polynomials. Recall that L(V ) denotes the set of linear operators on a vector space V .

Definition 132. Suppose T ∈ L(V ) and m is a positive integer.
• We define T m = T ◦ T ◦ · · · ◦ T (m times). This makes sense since T : V → V , so you can compose T with itself.
• T 0 is defined to be the identity operator I on V .
• If T is invertible with inverse T −1 , then T −m is defined to be T −m = (T −1 )m .

It is clear that if T is an operator then T m T n = T m+n and (T m )n = T mn . Since we have a well-behaved notion of the “power” of an operator, we can substitute operators into polynomials.

Definition 133. Suppose T ∈ L(V ) and p ∈ P(F) is a polynomial given by
p(z) = a0 + a1 z + · · · + am z m .
Then p(T ) is the operator defined by
p(T ) = a0 I + a1 T + · · · + am T m .

Again, here we are taking powers of T (which also give operators) and taking a linear combination (and a linear combination of operators is an operator). Hence, p(T ) is a well-defined operator.

Example 134. Let D ∈ L(P(R)) denote the differentiation operator (as always), so D(q) = q 0 . Let p ∈ P(R) be the polynomial defined by p(x) = 3 + 2x2 − x3 . Then p(D) is a new operator
p(D) = 3I + 2D2 − D3 .
What does this operator do to a polynomial? p(D)(q) = 3q + 2q 00 − q 000 for all q ∈ P(R).

Proposition 135. Suppose p, q ∈ P(F) and let T ∈ L(V ). Then
(1) (pq)(T ) = p(T )q(T ).
(2) p(T )q(T ) = q(T )p(T ).

Remark. In the first equation, the left-hand side means that you multiply the polynomials p and q to get the new polynomial pq. Then plug the operator into this new polynomial to get (pq)(T ), an operator. The right-hand side says to plug T into p and q to get two operators p(T ), q(T ). You can then compose these to get p(T )q(T ). So this says that if you plug an operator into a product of polynomials, then it’s the same thing as composition. For the second, it says that any operators given by polynomials in T commute. Note that almost always L(V ) has non-commutative composition. The order in which you perform operators definitely matters. But in the case that your operators are all polynomials in a single operator T , then the order does not matter.

Proof. Note that (1) will imply (2). If we have (1), then
p(T )q(T ) = (pq)(T ) = (qp)(T ) = q(T )p(T )
since multiplication of polynomials is commutative. Hence, we need only show (1). To show (1), write p(z) = Σ_{j=0}^{m} aj z j and q(z) = Σ_{k=0}^{n} bk z k . Then
(pq)(z) = Σ_{j=0}^{m} Σ_{k=0}^{n} aj bk z j+k .
Hence,
p(T )q(T ) = ( Σ_{j=0}^{m} aj T j )( Σ_{k=0}^{n} bk T k ) = Σ_{j=0}^{m} Σ_{k=0}^{n} aj bk T j+k = (pq)(T ).

We will see that thinking of polynomials applied to operators will be useful in the rest of chapter 5. But next we jump back to section 5.A.
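Before we do, here is a small numerical illustration of Definition 133 and Proposition 135, with a matrix standing in for the operator T (NumPy and the specific polynomials are my own choices).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))      # a matrix standing in for the operator T
I = np.eye(4)

def p_of(A):                         # p(z) = 3 + 2z^2 - z^3, as in Example 134
    return 3*I + 2*(A @ A) - A @ A @ A

def q_of(A):                         # q(z) = 1 + z
    return I + A

def pq_of(A):                        # (pq)(z) = 3 + 3z + 2z^2 + z^3 - z^4
    A2 = A @ A
    return 3*I + 3*A + 2*A2 + A2 @ A - A2 @ A2

assert np.allclose(p_of(A) @ q_of(A), pq_of(A))            # (pq)(T) = p(T) q(T)
assert np.allclose(p_of(A) @ q_of(A), q_of(A) @ p_of(A))   # polynomials in T commute
```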
Invariant Subspaces (5.A)

Your book gives a nice little introduction to 5.A. Suppose you want to study an operator T ∈ L(V ). Further suppose that we can decompose V as a direct sum
V = U1 ⊕ · · · ⊕ Um .
Well then to understand what T does to V , you only need to understand what T does to each Uj —that is, you only need to understand each restriction T |Uj . The restriction is defined on a smaller vector space, so should be easier to understand. However, the restriction T |Uj is now a linear map in L(Uj , V ). It need not send vectors in Uj only to vectors in Uj . Therefore, even though T |Uj is simpler, it may not be an operator, so we lose the ability to take powers. We are thus naturally led to try to understand the notion of a subspace of a vector space that T actually maps to itself.

Definition 136. Suppose T ∈ L(V ). A subspace U of V is called invariant under T if u ∈ U implies T u ∈ U . In other words, U is invariant under T if T |U is an operator on U .

Example 137. Let V = R2 , U1 = span(e1 ) (the x-axis) and U2 = span(e2 ) (the y-axis). Then V = U1 ⊕ U2 . Consider the operator T on R2 given by the matrix
[ 2  1 ]
[ 0  1 ].
That is, T (e1 ) = (2, 0) and T (e2 ) = (1, 1). This is a perfectly good operator on R2 . Note that every vector in U1 is of the form (x, 0). Then T (x, 0) = (2x, 0) ∈ U1 . So U1 is an invariant subspace of T and T |U1 is an operator on U1 . On the other hand, every vector in U2 is of the form (0, y), and for y ≠ 0 we have T (0, y) = (y, y) ∉ U2 . So when restricted to U2 , T |U2 is just a linear map U2 → V , not an operator.

15. Monday 7/27: Eigenvalues, Eigenvectors, and Invariant Subspaces (5.A–B)

Recall the definition of an invariant subspace.

Definition 138. Suppose T ∈ L(V ). A subspace U of V is called invariant under T if u ∈ U implies T u ∈ U . In other words, U is invariant under T if T |U is an operator on U .

Example 139. Suppose T ∈ L(V ). Here are some first basic examples of invariant subspaces.
(1) The zero subspace {0} is invariant under T . This is because T (0) = 0, so T maps every vector in {0} to a vector in {0}.
(2) The whole space V is invariant under T . For each v ∈ V , we certainly have T v ∈ V , since T is an operator on V .
(3) The null space of T , null T , is invariant under T . Suppose v ∈ null T . Then T v = 0. Therefore T (T v) = T (0) = 0. So T v ∈ null T , as well.
(4) The range of T , range T , is invariant under T . Suppose v ∈ range T . Then since v ∈ V , we have T v ∈ range T , as well.

Example 140. Consider the differentiation operator D : P(R) → P(R) given by Dp = p 0 . We have that P3 (R) is a subspace of P(R), and if p ∈ P3 (R) then Dp ∈ P3 (R), since if p is a polynomial of degree ≤ 3 then its derivative is a polynomial of degree ≤ 3, as well. Hence, P3 (R) is invariant under D. In fact any Pm (R) is invariant under D.

If U is a subspace of V which is invariant under T ∈ L(V ), then when we restrict T to U , since T u ∈ U for every u ∈ U , the restriction yields an operator on U . (As mentioned earlier, this would not be true if U were not invariant, since then the restriction would just yield a linear map T |U : U → V , which is not an operator.)

Definition 141. Suppose T ∈ L(V ) and U is a subspace of V which is invariant under T . The restriction operator T |U ∈ L(U ) is defined by T |U (u) = T u for u ∈ U .

Note that the restriction of a linear map to a subspace is always defined; it just may not always be an operator.

Example 142. Let T ∈ L(V ) and let U = span(v) be a one-dimensional subspace of V . Then U is invariant under T if and only if T v = λv for some λ ∈ F. This example explains why this section is called eigenvalues, eigenvectors, and invariant subspaces.

Definition 143. Suppose T ∈ L(V ). A scalar λ ∈ F is an eigenvalue of T if there exists v ∈ V such that v ≠ 0 and T v = λv. Any such vector v is called an eigenvector corresponding to the eigenvalue λ.

Proposition 144. Suppose V is finite-dimensional, T ∈ L(V ) and λ ∈ F. The following are equivalent:
(1) λ is an eigenvalue of T .
(2) T − λI is not injective.
(3) T − λI is not surjective.
(4) T − λI is not invertible.

Proof. The first two are equivalent since T v = λv is equivalent to 0 = T v − λv = (T − λI)v, so λ is an eigenvalue of T if and only if there is a nonzero vector in the null space of T − λI, that is, if and only if T − λI is not injective. Since V is finite-dimensional, by Proposition 97, the last three are all equivalent.
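Here is a quick NumPy check (my own aside) tying Example 137 and Example 142 to Definition 143: the invariant line U1 = span(e1 ) corresponds to the eigenvector e1 with eigenvalue 2, and Proposition 144 is reflected in the fact that the matrix of T − 2I is singular.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])           # the matrix of T from Example 137
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

assert np.allclose(A @ e1, 2 * e1)   # T e1 = 2 e1, so span(e1) is invariant
# T e2 = (1, 1) is not a multiple of e2, so span(e2) is not invariant:
assert abs(np.linalg.det(np.column_stack([A @ e2, e2]))) > 0
# Proposition 144 in action: 2 is an eigenvalue, so T - 2I is not invertible.
assert np.linalg.matrix_rank(A - 2 * np.eye(2)) < 2
```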
For a fixed eigenvalue, we can look at all of the eigenvectors that have that eigenvalue.

Definition 145. Let T ∈ L(V ) and let λ ∈ F be an eigenvalue of T . The eigenspace corresponding to λ is defined by
E(λ, T ) = null(T − λI).
That is, E(λ, T ) is the set of all eigenvectors of T corresponding to λ, along with the 0 vector.

We next show that if eigenvectors have distinct eigenvalues, then they are linearly independent.

Proposition 146. Let T ∈ L(V ). Suppose λ1 , . . . , λm are distinct eigenvalues of T and v1 , . . . , vm are corresponding eigenvectors. Then v1 , . . . , vm is linearly independent.

Proof. We prove by contradiction. Suppose for contradiction that v1 , . . . , vm is linearly dependent. We know that there is some k such that vk ∈ span(v1 , . . . , vk−1 ). Choose the smallest such k. (So we will have that v1 , . . . , vk−1 is linearly independent, but vk is in their span.) We can therefore write
vk = a1 v1 + · · · + ak−1 vk−1 .
Now apply T to both sides of this equation to get
λk vk = a1 λ1 v1 + · · · + ak−1 λk−1 vk−1 .
But, multiplying the original equation by λk , we also have
λk vk = a1 λk v1 + · · · + ak−1 λk vk−1 .
Subtracting these two equations yields
0 = a1 (λk − λ1 )v1 + · · · + ak−1 (λk − λk−1 )vk−1 .
Since v1 , . . . , vk−1 is linearly independent, this forces all of the coefficients in the previous equation to be 0. Since the eigenvalues are distinct, λk − λj ≠ 0 for each j, so we must have that a1 = · · · = ak−1 = 0. But then vk = 0, contradicting the hypothesis that vk is an eigenvector.

As a corollary, we can see that an operator on a finite-dimensional vector space can only have so many eigenvalues.

Corollary 147. Suppose V is finite-dimensional. Then each operator on V has at most dim V distinct eigenvalues.

Proof. If λ1 , . . . , λm are distinct eigenvalues, then they have corresponding eigenvectors v1 , . . . , vm . By the previous result, these are linearly independent. So there can be at most dim V of them.

However, not every operator needs to have an eigenvalue.

Remark. Fix an angle 0 < θ < π. Consider the operator T : R2 → R2 given by rotation by angle θ. You may remember that with respect to the standard basis on R2 ,
M(T ) = [ T e1  T e2 ] =
[ cos θ   −sin θ ]
[ sin θ    cos θ ].
Then by thinking about the geometry, it is clear that no nonzero vector gets scaled by T . So this operator has no eigenvalues or eigenvectors. So in general, an operator on V can have up to dim V eigenvalues, but it might have none.
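Numerically, the rotation example looks like this (NumPy is my own addition): if you ask a computer for the eigenvalues of the rotation matrix, it happily reports the complex numbers cos θ ± i sin θ. There are no real eigenvalues, but there are complex ones, which is exactly the phenomenon the next theorem explains.

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigvals = np.linalg.eigvals(R)
print(eigvals)      # approximately cos(theta) + i sin(theta) and cos(theta) - i sin(theta)
assert np.allclose(sorted(eigvals.imag), [-np.sin(theta), np.sin(theta)])
assert np.allclose(eigvals.real, [np.cos(theta), np.cos(theta)])
```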
However, over C, it is true that every operator has an eigenvalue.

Theorem 148. Let F = C and let V be a finite-dimensional vector space. Then every operator on V has an eigenvalue.

The traditional proof of this fact that you learn as an undergraduate uses the determinant. However, Axler feels strongly against using the determinant. A quote from the introduction: “Determinants are difficult, nonintuitive, and often defined without motivation. [...] This tortuous (torturous?) path gives students little feeling for why eigenvalues exist.”

Traditional proof. The determinant det(T − xI) is a polynomial in x of degree dim V > 0 and hence has a root by the Fundamental Theorem of Algebra. If λ ∈ C is such a root, then T − λI is not invertible, so λ is an eigenvalue of T .

Determinant-free proof. Suppose V is a complex vector space with dimension n > 0 and T ∈ L(V ). Let v ∈ V be any nonzero vector. The n + 1 vectors v, T v, . . . , T n v are linearly dependent (there are more of them than dim V ). Then there exist scalars a0 , . . . , an ∈ F, not all zero, such that
a0 v + a1 T v + · · · + an T n v = 0.
Let p be the polynomial p(z) = a0 + a1 z + · · · + an z n , and write
p(z) = c(z − λ1 ) · · · (z − λm )
(which you can do by the Fundamental Theorem of Algebra; note that p is nonconstant, since if a1 = · · · = an were all 0, then a0 v = 0 would force a0 = 0 as well). Now we have
0 = (a0 I + a1 T + · · · + an T n )(v) = c(T − λ1 I) · · · (T − λm I)(v).
This means that for some j, T − λj I is not injective. And hence, there is a nonzero vector w such that (T − λj I)(w) = 0, so w is an eigenvector of T with eigenvalue λj .

16. Wednesday 7/29: Upper-Triangular Matrices and Diagonal Matrices (5.B–C)

We have already discussed the matrix of a linear map T : V → W when V and W are finite-dimensional with a choice of fixed bases. Since an operator is just a linear map T : V → V , if V is finite-dimensional, then we can also talk about the matrix of an operator. But since the domain and codomain are the same, the matrix will be square. Also, we will choose the same basis of V in the domain and the codomain by convention.

Definition 149. Suppose T ∈ L(V ) and v1 , . . . , vn is a basis of V . The matrix of T with respect to this basis is the n × n matrix
M(T ) =
[ A1,1  · · ·  A1,n ]
[   ⋮            ⋮  ]
[ An,1  · · ·  An,n ]
where the entries Aj,k of M(T ) are defined by
T vk = A1,k v1 + · · · + An,k vn .
(The kth column of the matrix tells you how to write T vk in terms of the basis.) As always, if V = Fn and no basis is specified, we assume that the basis being used is the standard basis.

Definition 150. The diagonal (or main diagonal) of a square matrix consists of the entries Ai,i , that is, the entries along the line from the upper-left corner to the bottom-right corner. A matrix is called upper-triangular if all the entries below the main diagonal equal 0. A matrix is called lower-triangular if all the entries above the main diagonal equal 0. A matrix is called diagonal if its only nonzero entries occur along the main diagonal (equivalently, if and only if it is both upper-triangular and lower-triangular).

Here, sometimes we specify main diagonal to distinguish this from the diagonal line from the upper-right to the bottom-left. This other diagonal is sometimes called the antidiagonal.

A key idea: as you change the basis of V , the matrix of the operator changes. We might want to choose a basis of V so that the matrix of T has a particularly nice or simple form. This was the idea behind diagonalization in Math 308, although you maybe didn’t think about it this way at the time.

Definition 151. We say T ∈ L(V ) is diagonalizable if there exists a basis of V such that M(T ) is diagonal with respect to this basis.

Imagine that v is an eigenvector of T with eigenvalue λ. Then we can choose a basis that has v as its first vector. What is the matrix M(T )? Well the first column will be determined by T v, written with respect to our basis. But since T v = λv and v is in our basis, the first column with respect to this basis will be (λ, 0, . . . , 0)t . Similarly, if w is not a scalar multiple of v and is an eigenvector with eigenvalue µ, then if we choose w to be the second basis element, then the second column will be (0, µ, 0, . . . , 0)t . So, if we could find a basis of linearly independent eigenvectors, then M(T ) with respect to this basis would be diagonal! This idea is captured by the following result.

Proposition 152. Suppose T ∈ L(V ) and v1 , . . . , vn is a basis of V . Then the following are equivalent:
(1) the matrix of T with respect to v1 , . . . , vn is diagonal;
(2) vj is an eigenvector of T for each 1 ≤ j ≤ n.

Proof. The discussion above is essentially the proof (combined with induction). If M(T ) is diagonal, say with diagonal entries λ1 , . . . , λn , then the jth column of M(T ) says that
T vj = 0v1 + · · · + λj vj + · · · + 0vn ,
and so vj is an eigenvector of T . Conversely, if each vj is an eigenvector of T , then T (vj ) = λj vj for some λj ∈ F. Then it is clear that M(T ) with respect to this basis of eigenvectors will be the diagonal matrix with diagonal entries λ1 , . . . , λn .

This says that T is diagonalizable if and only if V admits a basis consisting of eigenvectors of T .
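Here is Proposition 152 in computational form (NumPy and the example matrix are my own choices): for the matrix of Example 137, a basis of eigenvectors exists, and changing to that basis produces a diagonal matrix with the eigenvalues on the diagonal.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])               # the operator from Example 137; eigenvalues 2 and 1
eigvals, P = np.linalg.eig(A)            # columns of P are eigenvectors of A

D = np.linalg.inv(P) @ A @ P             # the matrix of T with respect to the eigenbasis
assert np.allclose(D, np.diag(eigvals))  # diagonal, with the eigenvalues on the diagonal
print(np.round(D, 10))
```

The eigenvalues here happen to be distinct, which (as we will see shortly) is enough to guarantee diagonalizability.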
However, as you may remember from Math 308, not every operator is diagonalizable, even over C.

Example 153. Let T : C2 → C2 be given by T (x, y) = (0, x). Suppose that v = (v1 , v2 ) is an eigenvector of T with eigenvalue λ. Note that
T v = T (v1 , v2 ) = (0, v1 ) and T 2 v = T (0, v1 ) = (0, 0).
So if T v = λv, then T 2 v = λ2 v = 0, so λ = 0. This means that 0 is the only possible eigenvalue for T . But now
E(0, T ) = null(T − 0I) = null(T ) = span((0, 1)),
which is one-dimensional. Hence, it is not possible to find a basis of C2 consisting of eigenvectors of T , whence T is not diagonalizable.

We do have a sufficient (though not necessary) condition that ensures diagonalizability.

Proposition 154. Suppose dim V = n. If T ∈ L(V ) has n distinct eigenvalues, then T is diagonalizable.

Proof. We can choose eigenvectors v1 , . . . , vn which correspond to the n distinct eigenvalues. We proved that these vectors are linearly independent, and hence they form a basis of V . Therefore, with respect to this eigenbasis, M(T ) is diagonal.

Being diagonalizable is a very strong notion. We also have the weaker notion of upper-triangularity⁶. What does it mean if the matrix of an operator is upper-triangular?

⁶Upper-triangularness?

Proposition 155. Suppose T ∈ L(V ) and v1 , . . . , vn is a basis of V . Then the following are equivalent:
(1) the matrix of T with respect to v1 , . . . , vn is upper-triangular;
(2) T vj ∈ span(v1 , . . . , vj ) for each 1 ≤ j ≤ n;
(3) span(v1 , . . . , vj ) is invariant under T for each 1 ≤ j ≤ n.

Proof. Parts (1) and (2) are equivalent from the definition of M(T ). The jth column of M(T ) contains the coefficients of T vj when written as a linear combination of the vi ’s. So if the matrix is upper-triangular, this means that the coefficients on vj+1 , . . . , vn are all 0, and so T vj can be written as a linear combination of v1 , . . . , vj . So we need only prove that (2) and (3) are equivalent.

Suppose (3) holds and let 1 ≤ j ≤ n. By hypothesis, span(v1 , . . . , vj ) is invariant under T . This means that any vector in span(v1 , . . . , vj ) stays in span(v1 , . . . , vj ) after you apply T . In particular, T vj ∈ span(v1 , . . . , vj ). This holds for all 1 ≤ j ≤ n, so (3) implies (2).

Finally, suppose (2) holds and let 1 ≤ k ≤ n. By hypothesis, for every j, we have T vj ∈ span(v1 , . . . , vj ). So
T v1 ∈ span(v1 ) ⊆ span(v1 , . . . , vk ),
T v2 ∈ span(v1 , v2 ) ⊆ span(v1 , . . . , vk ),
. . .
T vk ∈ span(v1 , v2 , . . . , vk ).
Any v ∈ span(v1 , . . . , vk ) can be written as a linear combination of v1 , . . . , vk , so T v ∈ span(v1 , . . . , vk ). This shows that span(v1 , . . . , vk ) is invariant under T .

We know that not every matrix is diagonalizable. This means that given an operator T ∈ L(V ), it is not always possible to find a basis of V so that M(T ) is diagonal. However, over C, it is always possible to find a basis of V so that M(T ) is upper-triangular.

Theorem 156. Suppose V is a finite-dimensional vector space over C and T ∈ L(V ).
Then T has an upper-triangular matrix with respect to some basis of V . Proof. We prove by induction on dim V . If V is one-dimensional, then it is spanned by some vector V = span(v). And T v ∈ V so T v = λv for some λ ∈ C. The matrix of T is just [λ], which is upper-triangular. So suppose dim V > 1 and the result holds for all vector spaces of smaller dimension. Let T ∈ L(V ). By Theorem 148, T has an eigenvalue λ. This means that T − λI is not invertible. Let U = range(T − λI). Since T − λI is not invertible, dim U < dim V . Claim. U is invariant under T . To see this, suppose that u ∈ U . Then T u = T u − λu + λu = (T − λI)(u) + λu. We have (T −λI)(u) ∈ range(T −λI) = U and λu ∈ U (since U is closed under scalar multiplication). Hence, T u ∈ U . This proves the claim. Since U is invariant under T , the restriction T |U is an operator on U . Since dim U < dim V , by our inductive hypothesis, there is a basis u1 , . . . , um of U such that M(T |U ) is upper-triangular with respect to this basis. Therefore, for each 1 ≤ j ≤ m, T uj = (T |U )(uj ) ∈ span(u1 , . . . , uj ). We know we can extend our basis of U to a basis u1 , . . . , um , v1 , . . . , vn of V . Now note that for each 1 ≤ k ≤ n, T vk = (T − λI)vk + λvk . Since U = range(T − λI), this shows that T vk ∈ span(u1 , . . . , um , vk ). And therefore, for each k, T vk ∈ span(u1 , . . . , um , v1 , . . . , vk ). By the previous proposition, this shows that M(T ) is upper-triangular with respect to this basis. One benefit of finding a basis of V such that M(T ) has a nice form is that perhaps the matrix can tell you properties of the linear map. For example, for a general linear map/matrix, it is not obvious whether the map is invertible. It is also not easy to read off the eigenvalues. However, if M(T ) is upper-triangular, it becomes easy to see. Proposition 157. Suppose T ∈ L(V ) has an upper-triangular matrix with respect to some basis of V . Then T is invertible if and only if all the entries on the diagonal are nonzero. The eigenvalues of T are the diagonal entries of M(T ). 17. Friday 7/31: Inner Product Spaces (6.A) Probably the first time you learned about vectors was in a math class or physics class long ago. Back then, you were young and naive, and your teacher or professor may have defined a vector as “something with a magnitude and a direction”. They were lying to you. Now that you’re older and mathematically mature, I can tell you that a vector is... an element of a vector space7. And a vector space is characterized by its axioms. Certainly in Rn the vectors have magnitude and direction. But you’re using the Euclidean notion of length that relies on properties of R. There are lots of other fields out there in the wide world. If F is any field, you can always consider vector spaces over F. But sometimes there will be no notion of magnitude or direction of vectors. (After all, magnitude and direction are not built into our axioms of a vector space). On the other hand, if we are in the realm of Rn , then vectors have such a natural notion of magnitude and direction that surely studying those properties can teach us more about those spaces. So in this chapter, we’ll consider things like length and angle, and also generalize the key properties so that maybe we can apply them to other vector spaces. This can be useful. For example, quantum physics is really the study of certain Hilbert spaces (which are vector spaces which admit an inner product + some other properties). Definition 158. 
The Euclidean norm (or length) of a vector x = (x1 , . . . , xn ) ∈ Rn is q √ kxk = x21 + · · · + x2n = x · x. Here, the · means the dot product, since it is the product of two vectors (well, one vector with itself) in Rn . We haven’t defined the dot product yet, though so: Definition 159. The dot product of x, y ∈ Rn is x · y = x1 y1 + · · · + xn yn where x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ). Note that the dot product is only defined between two vectors living in the same Rn , and the dot product of those two vectors is a number, not a vector. Based on the two definitions above, it is clear that x · x = kxk2 for any x ∈ Rn . The dot product also satisfies: 7Not the most enlightening definition, but really the only correct definition. • x · x ≥ 0 for all x ∈ Rn ; • x · x = 0 if and only if x = 0; • for a fixed vector y ∈ Rn , the map Rn → R mapping x ∈ Rn to x · y ∈ R is linear; • x · y = y · x for all x, y ∈ Rn (the dot product is commutative). The fact that x · x = kxk2 shows us that the dot product is intimately connected to the notion of length in Rn . Indeed, you may also remember from Calc 3 that x · y = kxk kyk cos θ where θ is the angle between x and y. So the dot product somehow knows about both length and angles in Rn . We would like to abstract the properties of a dot product so that we can generalize. There may be other notions of angle and length that may be useful, and we’d like to study those. In particular, the notion of dot product is not so useful on infinite-dimensional vector spaces. But first, we should consider the other case that we’re very comfortable with: Euclidean space over C. We’ll see that actually there is a subtlety here that we won’t see if we only think about R. Recall that if λ ∈ C and we write λ = a + bi for a, b ∈ R, then • the absolute value of λ, denoted |λ|, is defined by |λ| = √ a2 + b2 ; • the complex conjugate of λ, denoted λ is λ = a − bi; • |λ|2 = λλ. With these basics in place, we can define a norm in Cn . If z = (z1 , . . . , zn ) ∈ Cn then p kzk = |z1 |2 + · · · + |zn |2 . We need to take absolute values because we want norms to be nonnegative real numbers. Hence, kzk2 = |z1 |2 + · · · + |zn |2 = z1 z1 + · · · + zn zn . So if w, z ∈ Cn , we should define the product to be w1 z 1 + · · · + wn z n . This does generalize the notion of the dot product in Rn , but because everything was real there, we couldn’t see the need to conjugate. Note also that if we take the product of z and w in the other order, we get z 1 w1 + · · · + z n wn = w1 z 1 + · · · + wn z n , so actually we should not expect our notion of product to commute, but commute up to conjugation. We now define an inner product via its abstract properties. Definition 160. Let F = R or C and let V be a vector space over F (not necessarily finitedimensional). An inner product on V is a function that takes each ordered pair (u, v) of elements of V to a scalar hu, vi such that the following properties hold for all u, v, w ∈ V and λ ∈ F: • (positivity) hv, vi ≥ 0; (if F = C, then λ ≥ 0 means that λ is a non-negative real number). • (definiteness) hv, vi = 0 if and only if v = 0; • (additivity in first slot) hu + v, wi = hu, wi + hv, wi; • (homogeneity in first slot) hλu, vi = λhu, vi; • (conjugate symmetry) hu, vi = hv, ui. Now for some examples Example 161. • The Euclidean inner product on Fn is given by h(w1 , . . . , wn ), (z1 , . . . , zn )i = w1 z1 + · · · + wn zn . When F = R, this is the usual dot product. • If c1 , . . . 
, cn are positive real numbers, then we can define a weighted version of the usual inner product on Fn , namely: h(w1 , . . . , wn ), (z1 , . . . , zn )i = c1 w1 z1 + · · · + cn wn zn . So the usual inner product is an example of this inner product with c1 = · · · = cn = 1. (Note that the positive hypothesis is necessary. If one of the ci = 0, then it will fail to be an inner product). • Let V = R[−1,1] be the vector space of continuous real-valued functions on the interval [−1, 1]. We can define an inner product on V by Z 1 hf, gi = f (x)g(x) dx. −1 When you take analysis classes, you will see that this is a very important inner product. If you talk to a functional analyst about an inner product, this is the one they imagine.8 Note that this example is an infinite-dimensional vector space. Definition 162. An inner product space is a vector space V equipped with an inner product h−, −i on V . In the rest of this chapter, V will denote an inner product space. If V = Fn and no inner product is specified, we assume that we are using the Euclidean inner product. We have a new definition to play with, which means we should try to understand its basic properties! This is fun because the proofs cannot involve too much technical machinery—at this point basically all we have is the definition. Proposition 163. Let V be an inner product space. Then (1) For each fixed u ∈ V , the function that takes v to hv, ui is a linear map V → F. (2) h0, ui = hu, 0i = 0 for every u ∈ V . (3) hu, v + wi = hu, vi + hu, wi for all u, v, w ∈ V . (4) hu, λvi = λhu, vi for all λ ∈ F and u, v ∈ V . Remark. You may have noticed that the definition of an inner product was annoyingly nonsymmetric in the first and second slot of the inner product. Inner products are additive in the first slot, what about the second? Well this proposition shows that inner products are also additive in the second slot, but this is a consequence of the definition, so we didn’t need to assume it. There is also a subtlety about scalar multiplication in the second slot: you need to conjugate when you pull a scalar out of the second slot. Proof. (1) We need to prove that the map is linear. That is, we need to prove that it is additive and homogeneous. The function we are considering is T :V →F T (v) = hv, ui Let v, w ∈ V . Then T (v + w) = hv + w, ui = hv, ui + hw, ui = T (v) + T (w) by the additivity of inner products in the first slot. The proof for homogeneity is similar. (2) Since T is linear, we have T (0) = h0, ui = 0. Since u was arbitrary, this proves h0, ui = 0 for all u ∈ V . Now, by conjugate symmetry: hu, 0i = h0, ui = 0 = 0. (3) and (4) are simply calculations using conjugate symmetry. So, hu, v + wi = hv + w, ui = hv, ui + hw, ui = hv, ui + hw, ui = hu, vi + hu, wi. Here, we are using that if z, w are complex numbers then z + w = z + w and also z = z. These can both be easily checked from the definition of complex conjugation. Finally, hu, λvi = hλv, ui = λhv, ui = λhv, ui = λhu, vi where here we used zw = zw which you should also check. Now fix an inner product on V . We saw that the dot product on Rn was closely related to the √ notion of length. Namely, the length of a vector v was given by v · v. We extend this idea to all inner products. Definition 164. For v ∈ V , the norm of v, denoted kvk is defined by p kvk = hv, vi. Proposition 165. Suppose v ∈ V and λ ∈ F.Then (1) kvk = 0 if and only if v = 0. (2) kλvk = |λ| kvk. Proof. By definiteness of inner products, we have hv, vi = 0 if and only if v = 0. 
And kvk2 = hv, vi, so kvk = 0 if and only if hv, vi = 0. For the second part, we compute

kλvk2 = hλv, λvi = λhv, λvi = λλ̄hv, vi = |λ|2 kvk2

(pulling λ out of the second slot introduces the complex conjugate), and taking square roots yields the desired result.

Note that (as in the proof above) it is often easier to work with the square of the norm of a vector, since this has a nice definition as an inner product. Just don’t forget to square or square-root as appropriate.

18. Monday 8/3: Inner Product Spaces (6.A)

Not only does an inner product know about “lengths”, it also knows about “angles”. Recall that in calculus, you learned that two nonzero vectors u, v ∈ Rn are perpendicular if and only if u · v = 0, or phrased in terms of inner products, hu, vi = 0. We generalize this to any inner product space.

Definition 166. Two vectors u, v ∈ V are called orthogonal if hu, vi = 0.

Note that by conjugate symmetry, if u and v are orthogonal then hu, vi = 0 = hv, ui. So the definition is symmetric in u and v: if u is orthogonal to v, then v is orthogonal to u.

Proposition 167. (1) 0 is orthogonal to every vector in V . (2) 0 is the only vector that is orthogonal to itself.

Proof. The first part is simply a restatement of Proposition 163(2). The second part follows from the definition of the inner product. If v is orthogonal to itself, then hv, vi = 0, but by definiteness, this implies that v = 0.

This takes us to maybe the most famous theorem? Maybe.

Theorem 168 (Pythagorean Theorem). Suppose u and v are orthogonal. Then ku + vk2 = kuk2 + kvk2 .

Here’s a picture when V = R2 . (Based on how long this took, I need you to be impressed by this.) [Picture: a right triangle with legs u and v and hypotenuse u + v.]

Proof. This follows by a computation. Using additivity,

ku + vk2 = hu + v, u + vi = hu, ui + hu, vi + hv, ui + hv, vi = kuk2 + kvk2 ,

since hu, vi = hv, ui = 0.

When we work with norms and inner products, we should be guided by our geometric intuition from Rn (but our proofs should work for abstract inner products on abstract vector spaces). For example, we can draw a picture of the following theorem.

Proposition 169. Let u, v ∈ V with v 6= 0. Then there exists a unique c ∈ F such that w = u − cv is orthogonal to v. In particular, c = hu, vi / kvk2 . [Picture: u decomposed as cv plus a vector w orthogonal to v.]

Proof. We want w = u − cv to be orthogonal to v, so we want

0 = hu − cv, vi = hu, vi − chv, vi = hu, vi − c kvk2 .

As long as kvk2 6= 0 (which is true because v 6= 0) we can solve for the unique such c, namely c = hu, vi / kvk2 .

We can use this to prove the following very fundamental result, which is important in many fields of mathematics.

Theorem 170 (Cauchy–Schwarz inequality). If u, v ∈ V , then |hu, vi| ≤ kuk kvk. Equality holds if and only if one of u, v is a scalar multiple of the other.

Proof. Clearly, if v = 0, then both sides are equal to 0. Also in this case, v is a scalar multiple of u. Suppose v 6= 0 and let w = u − (hu, vi / kvk2 )v, so that w and v are orthogonal. Rewriting this, we have

u = w + (hu, vi / kvk2 )v

and w is orthogonal to (hu, vi / kvk2 )v. Hence, by the Pythagorean Theorem,

kuk2 = kwk2 + k(hu, vi / kvk2 )vk2 = kwk2 + |hu, vi|2 / kvk2 ≥ |hu, vi|2 / kvk2 .

Now clear the denominator and take square roots to conclude that |hu, vi| ≤ kuk kvk. Note that we will have equality if and only if kwk2 = 0, if and only if w = 0. But this means that u = (hu, vi / kvk2 )v, so u is a scalar multiple of v.

As a corollary, we get another very fundamental result, with a nice geometric interpretation.

Theorem 171 (Triangle inequality). Suppose u, v ∈ V . Then ku + vk ≤ kuk + kvk . Equality holds if and only if one of u, v is a nonnegative multiple of the other.
[Picture: a triangle with sides u, v, and u + v.] The picture9 of this result explains why it is called the triangle inequality. It basically says that the shortest path between two points is a straight line.

9This is the most pictures I’ve ever tried to put into some notes and you should not expect this going forward.

Proof. We have

ku + vk2 = hu + v, u + vi
= hu, ui + hv, vi + hu, vi + hv, ui
= kuk2 + kvk2 + 2 Rehu, vi      (since hv, ui is the complex conjugate of hu, vi)
≤ kuk2 + kvk2 + 2|hu, vi|
≤ kuk2 + kvk2 + 2 kuk kvk       (Cauchy–Schwarz)
= (kuk + kvk)2 .

Again, take square roots to conclude the inequality. Note that equality will hold if and only if Rehu, vi = |hu, vi| = kuk kvk. The latter equality holds if and only if one of u and v is a scalar multiple of the other. To have the first equality, we also need hu, vi to be a nonnegative real number. If u = λv, then hu, vi = hλv, vi = λ kvk2 , so we need λ to be a nonnegative real number. Similarly, if v = λu, then hu, vi = hu, λui = λ̄ kuk2 , so we still need λ to be a nonnegative real number. Altogether, this means that in order to have equality, we must have that u or v is a nonnegative multiple of the other. Conversely, if one of them is a nonnegative multiple of the other, it is easy to see that Rehu, vi = |hu, vi| = kuk kvk.

19. Wednesday 8/5: Orthonormal Bases (6.B)

Orthogonal vectors are nice for many reasons. One thing that makes the standard basis of Fn so nice is that it consists of vectors that are mutually orthogonal. But they actually satisfy an even stronger property.

Definition 172. A list of vectors is called orthonormal if each vector in the list has norm 1 and is orthogonal to all the other vectors in the list. In other words, e1 , . . . , em is orthonormal if

hej , ek i = 1 if j = k, and hej , ek i = 0 if j 6= k.

Remark. Normally I would reserve ei to denote a standard basis vector of Fn . But in this section of your textbook, e1 , . . . , em will generally mean an orthonormal set of vectors. This is not so bad since...

Example 173.
• The prototypical example of an orthonormal list is the standard basis e1 , . . . , en of Fn .
• Also the list

(1/2)(1, 1, 1, 1), (1/2)(1, 1, −1, −1), (1/2)(1, −1, 1, −1), (1/2)(1, −1, −1, 1)

is an orthonormal basis of F4 .

An orthonormal list of vectors satisfies the following very nice property.

Proposition 174. If e1 , . . . , em is an orthonormal list of vectors in V then

ka1 e1 + · · · + am em k2 = |a1 |2 + · · · + |am |2

for all a1 , . . . , am ∈ F.

Proof. As your book mentions, this follows from the Pythagorean Theorem. For simplicity of illustration, suppose m = 3. Then, since a1 e1 is orthogonal to a2 e2 + a3 e3 , we have

ka1 e1 + a2 e2 + a3 e3 k2 = ka1 e1 k2 + ka2 e2 + a3 e3 k2
= ka1 e1 k2 + ka2 e2 k2 + ka3 e3 k2
= |a1 |2 ke1 k2 + |a2 |2 ke2 k2 + |a3 |2 ke3 k2

and so the result follows since kej k = 1 for all j. You could make this rigorous by doing a proof by induction.

We also get the following important corollary. Note that this agrees with our geometric intuition, since perpendicular vectors “point in different directions”. The correct way to say this is:

Corollary 175. Every orthonormal list of vectors is linearly independent. Hence, if V is finite-dimensional with dim V = n, then any list of n orthonormal vectors in V is a basis.

Proof. Suppose that e1 , . . . , em is an orthonormal list of vectors in V and suppose a1 e1 + · · · + am em = 0. We wish to show that all of the scalars must be 0. But by the above result, we have

0 = ka1 e1 + · · · + am em k2 = |a1 |2 + · · · + |am |2

and so each |aj | = 0, which means each aj = 0.
For the second statement, any list of n orthonormal vectors is linearly independent, and any list of n linearly independent vectors in a vector space of dimension n is already a basis.

Definition 176. An orthonormal basis of V is a basis of V which is also an orthonormal list.

Here is one of the most favorable properties of an orthonormal basis. If e1 , . . . , en is a basis of V and w ∈ V , then we know that there must exist scalars such that w = a1 e1 + · · · + an en . In general, it is not that easy to figure out the scalars aj . You would have to set up a matrix whose columns are the ej ’s, augment with w, and solve with row reduction. However, if we have an orthonormal basis:

Proposition 177. Let w ∈ V and suppose e1 , . . . , en is an orthonormal basis of V . Then

w = a1 e1 + · · · + an en where, for all 1 ≤ j ≤ n, aj = hw, ej i.

Furthermore, kwk2 = |hw, e1 i|2 + · · · + |hw, en i|2 .

Proof. This is simply a computation. Note that

hw, ej i = ha1 e1 , ej i + · · · + han en , ej i = haj ej , ej i = aj ,

since hei , ej i = 0 when i 6= j and hej , ej i = 1. The second claim follows from Proposition 174 applied to w.

So orthonormal bases are very nice, but how special are they? Do they always exist? If they do always exist, how do we get ahold of one? Maybe in Fn , using the Euclidean inner product, it seems geometrically reasonable that we should be able to get an orthonormal basis. And it even seems like we can start an orthonormal basis however we’d like. But what about other inner product spaces that we don’t have as much intuition for? Remember that we have an inner product on Pm (R) given by hp, qi = ∫_{-1}^{1} p(x)q(x) dx. Can we get an orthonormal basis with respect to this inner product? If you’re anything like me, I wouldn’t have any idea how to start to try to get one.

It turns out that there is a deterministic algorithm that takes any linearly independent list and returns an orthonormal list with the same span.

Proposition 178 (Gram–Schmidt Procedure). Suppose v1 , . . . , vn ∈ V are linearly independent. Let

e1 = v1 / kv1 k
e2 = (v2 − hv2 , e1 ie1 ) / kv2 − hv2 , e1 ie1 k
. . .
en = (vn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 ) / kvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 k .

Then e1 , . . . , en is an orthonormal list and span(e1 , . . . , en ) = span(v1 , . . . , vn ).

The geometric picture here (which I drew in lecture, but won’t attempt to recreate in these notes) is that at each step you take the next vector and subtract off the projection onto the span of all the previous ones. This leaves you with a vector that is orthogonal to the span of the previous ones, so is orthogonal to all of the previous ones. In the end, you will get the same span but your vectors will all be orthogonal. Also, at each step, since you divide through by the norm, you will ensure that all of your vectors are unit vectors. Hence, you will be left with an orthonormal set.

Proof. When n = 1, the result is clear, as v1 / kv1 k is an orthonormal list. So suppose n > 1 and by induction assume that the algorithm works for any collection of n − 1 vectors. Note that the definition of en makes sense because we are not dividing by 0. Indeed, since v1 , . . . , vn is linearly independent, we know that vn 6∈ span(v1 , . . . , vn−1 ) = span(e1 , . . . , en−1 ). Hence, the vector in the numerator of en must be nonzero. Further,

span(e1 , . . . , en ) = span(v1 , . . . , vn−1 , en ) = span(v1 , . . . , vn ).

It is also clear that ken k = 1. Hence, we need only show that en is orthogonal to all of the previous vectors.
Now note that for any 1 ≤ j < n, we have hvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 , ej i kvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 k hvn − hvn , e1 ie1 − · · · − hvn , en−1 ien−1 , ej i = denom hvn , ej i − hhvn , ej iej , ej i hvn , ej i − hvn , ej i = = =0 denom denom hen , ej i = and so the set of vectors e1 , . . . , en is an orthonormal list which has the same span as v1 , . . . , vn . As a corollary, we can now see that we can always find an orthonormal basis. Corollary 179. Every finite-dimensional inner product space has an orthonormal basis. Proof. We already know that every finite-dimensional vector space has a basis. Then apply the Gram–Schmidt Procedure to obtain an orthonormal basis. Example 180. Let’s use the Gram–Schmidt Procedure to study a non-Euclidean inner product R1 space. Let’s find an orthonormal basis of P2 (R) with the inner product hp, qi = −1 p(x)q(x) dx. To start with, we know that 1, x, x2 is a basis of P2 (R). At the first step, we should take 1/ k1k where the norm here is given by the inner product defined above. Now 2 Z 1 12 dx = 2 k1k = −1 so k1k = √ 2. Therefore, e1 = q 1 2. For the second vector, the numerator is Z r 1 x − hx, e1 ie1 = x − x −1 1 dx 2 !r 1 = x. 2 r 3 x dx 2 We need to divide by its norm. We have kxk2 = Z 1 x2 dx = −1 so kxk = q 2 3 and so e2 = q 2 3 3 2 x. For the third vector, the numerator is x2 − hx2 , e1 ie1 − hx2 , e2 ie2 !r Z 1 r 1 1 = x2 − x2 dx − 2 2 −1 Z 1 x2 −1 1 = x2 − . 3 We still need to divide by the norm. We have Z 1 1 2 2 2 1 8 2 4 x − = x − x + dx = 3 3 9 45 −1 q q 8 1 2 so x2 − 31 = 45 . Therefore, e3 = 45 8 (x − 3 ). Therefore, our orthonormal basis is r 1 , 2 r 3 x, 2 r 45 8 1 2 x − . 3 !r 3 x 2 20. Friday 8/7: Orthonormal Bases (6.B) We saw last time that every finite-dimensional vector space has an orthonormal basis. Sometimes, in proving something, it may not be enough to know that an orthonormal basis of V exists, but that we can extend any orthornormal list in V to an orthonormal basis of V . (Recall that in many proofs in class and in your homework, it was an important technique to take a linearly independent list and extend it to a basis. Note that an orthonormal list is automatically linearly independent.) Proposition 181. Suppose V is finite dimensional. Then every orthonormal list of vectors in V can be extended to an orthonormal basis of V . Proof. Suppose e1 , . . . , em is an orthonormal list of vectors in V . Then e1 , . . . , em is linearly independent. Hence, we can extend this list to a basis e1 , . . . , em , v1 , . . . , vn of V . Now to this basis, we can apply the Gram–Schmidt process to produce an orthonormal list. This will yield an orthonormal basis e1 , . . . , em , f1 , . . . , fn . Why are the first m vectors unchanged? Well, in the jth step, we have ej = ej − hej , e1 ie1 − · · · − hej , ej−1 iej−1 kej − hej , e1 ie1 − · · · − hej , ej−1 iej−1 k and since all of the ei ’s are orthogonal, the numerator is ej and the denominator is kej k = 1. So to summarize, before, we have the theorem “every linearly independent list can be extended to a basis” and now, we’ve suped it up to “every orthonormal list can be extended to an orthonormal basis”. We may now be tempted to go through all of the theorems of linear algebra and add the word “orthonormal” before “basis” to make all of our theorems stronger. However, the following example shows that we can’t expect to always be able to do this and yield true statements. Example 182. 
If T ∈ L(V ) and T is diagonalizable, then this means that V has a basis consisting of eigenvectors of T . However, a simple example shows that we cannot expect to be able to find an orthonormal basis of V consisting of eigenvectors of T . If V = R2 and the eigenspaces of T are span((1, 0)) and span((1, 1)), then the only choice we have in finding eigenvectors is scaling. And no nonzero vector in span((1, 0)) is orthogonal to any nonzero vector in span((1, 1)).

Another way to say this is that even if there is a basis of V with respect to which the matrix of T is diagonal, there need not be an orthonormal basis of V with respect to which the matrix of T is diagonal. However, if we replace “diagonal” with “upper-triangular”, then it is true.

Proposition 183. Suppose T ∈ L(V ). If T has an upper-triangular matrix with respect to some basis of V , then T has an upper-triangular matrix with respect to some orthonormal basis of V .

Proof. Suppose T has an upper-triangular matrix with respect to the basis v1 , . . . , vn of V . Then by Proposition 155, span(v1 , . . . , vj ) is invariant under T for every j. Now we apply the Gram–Schmidt Procedure to v1 , . . . , vn , which produces an orthonormal basis e1 , . . . , en of V . But note that the Gram–Schmidt Procedure gives span(e1 , . . . , ej ) = span(v1 , . . . , vj ) for each j. Hence, span(e1 , . . . , ej ) is invariant under T for each j, so by Proposition 155, the matrix of T is upper-triangular with respect to the orthonormal basis e1 , . . . , en .

If we combine this result with Theorem 156, we obtain a corollary known as Schur’s Theorem.

Corollary 184 (Schur’s Theorem). Suppose V is a finite-dimensional complex vector space and T ∈ L(V ). Then T has an upper-triangular matrix with respect to some orthonormal basis of V .

We saw (in Proposition 163) that if you fix a vector u ∈ V , then the function that takes v to hv, ui is a linear map. This is a function V → F, i.e., a linear functional on V .

Example 185. The function φ : F3 → F defined by φ(z1 , z2 , z3 ) = 3z1 + 2z2 − z3 is a linear functional on F3 . You can check directly that it is linear, or you can represent it by the matrix [3 2 −1], for example. However, we can also think of this linear functional as φ(z) = hz, ui where u = (3, 2, −1).

One question is, can every linear functional be written in this way? If I have an inner product h−, −i on V and φ : V → F is any linear functional, can I find a vector u ∈ V such that φ(v) = hv, ui? It seems highly non-obvious to me.10 Here’s an example where it doesn’t seem too clear.

Example 186. Take the inner product on P2 (R) defined by hp, qi = ∫_{-1}^{1} p(t)q(t) dt and consider the function φ : P2 (R) → R defined by φ(p) = ∫_{-1}^{1} p(t) cos(πt) dt, which is linear (do a quick sanity-check and verify this). It is not obvious that there exists a u ∈ P2 (R) such that φ(p) = hp, ui. It in fact seems kind of unlikely! It seems like we should take u = cos(πt), but of course we can’t, because this is not an element of P2 (R).

Nevertheless, it turns out that the answer to the question above is yes. This is known as the Riesz Representation Theorem (because you can represent every linear functional via the inner product).

Theorem 187 (Riesz Representation Theorem). Suppose V is a finite-dimensional inner product space and φ ∈ V 0 . Then there exists a unique vector u ∈ V such that φ(v) = hv, ui for every v ∈ V .

Proof. Let φ ∈ V 0 . First we show existence of such a u, then uniqueness. Choose an orthonormal basis e1 , . . . , en of V (which we can do by Corollary 179).
Then by Proposition 177, we can write v = hv, e1 ie1 + · · · + hv, en ien . 10It seems like there could be crazy linear maps out there that can’t be written just as taking an inner product with a fixed vector. On the other hand, there’s dim V worth of vectors in V and V 0 has dimension dim V , so maybe there’s some hope... Hence φ(v) = φ (hv, e1 ie1 + · · · + hv, en ien ) = hv, e1 iφ(e1 ) + · · · + hv, en iφ(en ) = hv, φ(e1 )e1 i + · · · + hv, φ(en )en i = hv, φ(e1 )e1 + · · · + φ(en )en i Hence, we can let u = φ(e1 )e1 + · · · + φ(en )en and the above computation shows that φ(v) = hv, ui for every v ∈ V . Uniqueness is actually easier. Suppose that u1 , u2 ∈ V are vectors such that φ(v) = hv, u1 i = hv, u2 i for every v ∈ V . Then 0 = hv, u1 i − hv, u2 i = hv, u1 − u2 i for every v ∈ V . In particular, we can choose v = u1 − u2 . But then ku1 − u2 k = 0 so u1 − u2 = 0 and hence u1 = u2 , as desired. Remark. If V is a finite-dimensional inner product space, then we can define a map Φ:V →V0 u 7→ h−, ui. What the Riesz Representation Theorem is saying is that this map is bijective. Every linear functional in V 0 can be represented as h−, ui (so Φ is surjective), and this u is unique (so Φ is injective). However, in general, Φ is not a linear map, since Φ(λu) is the linear map h−, λui = λh−, ui = λΦ(u). It is linear over R, but not over C. Your book works out what the vector is that gives the linear map in Example 186, but I was too tired to do it in lecture. 21. Monday 8/10: Orthogonal Complements (6.C) Let V be an inner product space. Definition 188. If U is a subset of V , then the orthogonal complement of U , denoted U ⊥ is the set of all vectors in V that are orthogonal to every vector in U . That is, U ⊥ = {v ∈ V | hv, ui = 0 for every u ∈ U }. The notation U ⊥ is generally read “U perp”. We can picture orthogonal complements in R2 and R3 , by drawing a picture. This should give us some intution about what to expect from orthogonal complements. We first prove some basic properties about orthogonal complements. Proposition 189. (1) If U is a subset of V , then U ⊥ is a subspace of V . (2) {0}⊥ = V . (3) V ⊥ = {0}. (4) If U is a subset of V , then U ∩ U ⊥ ⊆ {0} (5) If U and W are subsets of V and U ⊆ W , then W ⊥ ⊆ U ⊥ . Proof. (1) Note that here we don’t require U to be a subspace, just any subset. If U is any subset, we want to show that U ⊥ is a subspace. First, certainly h0, ui = 0 for all u ∈ U (in fact for all u ∈ V , so 0 ∈ U ⊥ ). Now suppose v, w ∈ U ⊥ . Then for any u ∈ U , we have hv, ui = hw, ui = 0. Hence, for any u ∈ U , hv + w, ui = hv, ui + hw, ui = 0 so v + w ∈ U ⊥ . Homogeneity is proved similarly and so U ⊥ is a subspace of V . (2) For any v ∈ V , we have hv, 0i = 0 so v ∈ {0}⊥ . Hence, {0}⊥ = V . (3) Suppose w ∈ V ⊥ . Then w is orthogonal to every vector in V . In particular, w is orthogonal to w (since w ∈ V ). But then hw, wi = 0 so w = 0. Hence, V ⊥ = {0}. (4) Suppose U is a subset of V . Let u ∈ U ∩ U ⊥ . Since u ∈ U ⊥ , it is orthogonal to every vector in U . Since u ∈ U , it is orthogonal to itself. Again, this means u = 0 so U ∩ U ⊥ = {0}. (5) If U ⊆ W and v ∈ W ⊥ , then hv, wi = 0 for every w ∈ W . But every u ∈ U is also in W so hv, ui = 0 for all u ∈ U . Hence, v ∈ U ⊥ . Recall that if V = U ⊕ W is the direct sum of two subspaces, we can write each element of V in exactly one way as a sum u + w with u ∈ U and w ∈ W . This is like breaking up V into two pieces that overlap minimally (since we saw that V = U ⊕ W if and only if V = U + W and U ∩ W = {0}). 
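If you want to experiment with orthogonal complements concretely, they are easy to compute in Fn . Here is a minimal numerical sketch in Python with NumPy (the matrix B and its rows are made up for illustration; they are not an example from these notes): we take U ⊆ R5 to be the span of the rows of B, so that U ⊥ is exactly the null space of B, and we read off an orthonormal basis of that null space from the singular value decomposition.

    import numpy as np

    # U is the span of the rows of B (a made-up example, just for illustration).
    B = np.array([[1., 0., 2., 0., 1.],
                  [0., 1., 1., 1., 0.]])

    # v is orthogonal to every vector in U exactly when Bv = 0, so U-perp is
    # the null space of B.  The last rows of V^t in the SVD of B give an
    # orthonormal basis of that null space.
    rank = np.linalg.matrix_rank(B)
    _, _, Vt = np.linalg.svd(B)
    perp = Vt[rank:]                      # rows form an orthonormal basis of U-perp

    print(rank + perp.shape[0])           # 5, i.e. dim U + dim U-perp = dim R^5
    print(np.allclose(B @ perp.T, 0))     # True: every basis vector lies in U-perp

The first printed line anticipates Proposition 191 below, and the second checks the defining property of U ⊥ .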
It turns out that given any subspace U of V , we can write V as the direct sum of U and some other subspace. In particular: Proposition 190. If U ⊆ V is a finite-dimensional subspace, then V = U ⊕ U ⊥ . Proof. We showed in the previous proposition that U ∩ U ⊥ = {0}. Hence, as long as U + U ⊥ = V , we will be able to conclude that V = U ⊕ U ⊥ . Clearly, U + U ⊥ ⊆ V . We just need to prove the reverse inclusion. To this end,d suppose v ∈ V . Choose an orthonormal basis e1 , . . . , em of U . Now we cleverly rewrite v as: v = hv, e1 ie1 + · · · + hv, em iem + v − hv, e1 ie1 − · · · − hv, em iem . | {z } | {z } u w Clearly u ∈ U . We aim to show that w ∈ U ⊥ . Now note that hw, e1 i = hv − hv, e1 ie1 − · · · − hv, em iem , e1 i = hv, e1 i − hv, e1 ihe1 , e1 i = 0 so w is orthogonal to e1 . Similarly, it is orthogonal to every ej and hence is orthogonal to every vector in U . Thus, w ∈ U ⊥ so v ∈ U + U ⊥ . Hence, V = U + U ⊥ , as desired. Using this, we can compute dim U ⊥ . Proposition 191. Suppose V is finite-dimensional and U is a subspace of V . Then dim U ⊥ = dim V − dim U Proof. We already proved that dim(U ⊕ U ⊥ ) = dim U + dim U ⊥ so combined with the above, we get the desired equality. Also, if you start with a subspace and take the orthogonal complement twice, then you get back to the subspace you started with. Proposition 192. Suppose U is a finite-dimensional subspace of V . Then U = (U ⊥ )⊥ . Proof. Let u ∈ U . Then by definition, for every v ∈ U ⊥ , hu, vi = 0. But this means that u is orthogonal to every vector in U ⊥ . Hence, u ∈ (U ⊥ )⊥ . This shows that U ⊆ (U ⊥ )⊥ . Now suppose v ∈ (U ⊥ )⊥ ⊆ V . By the previous proof, we can write v = u + w where u ∈ U and w ∈ U ⊥ . Then v − u = w ∈ U ⊥ . Since u ∈ U , we just showed u ∈ (U ⊥ )⊥ . Since v ∈ (U ⊥ )⊥ as well, we therefore have v − u ∈ (U ⊥ )⊥ . In summary, we have v−u ∈ U ⊥ ∩(U ⊥ )⊥ = {0}. Hence, u = v. Therefore, v ∈ U so (U ⊥ )⊥ ⊆ U . We now define an operator that projects onto a subspace U of V . I drew a picture in lecture, but don’t have the ambition to try to make a picture for these notes... Definition 193. Let U be a finite-dimensional subspace of V . The orthogonal projection of V onto U is the operator PU ∈ L(V ) defined as follows: For v ∈ V , write v = u + w where u ∈ U and w ∈ U ⊥ . Then PU (v) = u. We remark that since V = U ⊕U ⊥ , in the above definition, there is a unique way to write v = u+w. Hence, PU is well-defined. We now list some basic properties. You should think about the geometric picture that we drew when thinking about these properties. Proposition 194. Suppose U is a finite-dimensional subspace of V and v ∈ V . Then (1) PU ∈ L(V ). (2) PU (u) = u for all u ∈ U . (3) PU (w) = 0 for all w ∈ U ⊥ . (4) range PU = U (5) null PU = U ⊥ (6) v − PU (v) ∈ U ⊥ . (7) PU2 = PU . (8) kPU (v)k ≤ kvk. (9) For every orthonormal basis e1 , . . . , em of U , PU (v) = hv, e1 ie1 + · · · + hv, em iem . Proof. (1) To show that PU is a linear map on V , we need to show that it is additive and homogeneous. Suppose v1 , v2 ∈ V . Then write v1 = u1 + w1 and v2 = u2 + w2 with u1 , u2 ∈ U and w1 , w2 ∈ U ⊥ (there is a unique way to do this). Then PU (v1 ) = u1 and PU (v2 ) = u2 . Now notice v1 + v2 = (u1 + u2 ) + (w1 + w2 ) where u1 + u2 ∈ U and w1 + w2 ∈ U ⊥ . Hence, PU (v1 + v2 ) = u1 + u2 = PU (v1 ) + PU (v2 ). Homogeneity is proved similarly. (2) Let u ∈ U . Since we can write u = u + 0 where u ∈ U and 0 ∈ U ⊥ , therefore PU (u) = u. (3) Similarly, if w ∈ U ⊥ then we can write w = 0 + w where 0 ∈ U and w ∈ U ⊥ . 
Hence, PU (w) = 0.
(4) Since we defined PU (v) = u, where v = u + w with u ∈ U and w ∈ U ⊥ , we have range PU ⊆ U . Conversely, if u ∈ U then PU (u) = u so u ∈ range PU . Hence, U = range PU .
(5) By part (3), we have U ⊥ ⊆ null PU . Now suppose v ∈ null PU . Since PU (v) = 0, this means that v = 0 + v where 0 ∈ U and v ∈ U ⊥ . Hence, v ∈ U ⊥ , which proves the reverse inclusion.
(6) Write v = u + w with u ∈ U and w ∈ U ⊥ . Then v − PU (v) = u + w − u = w ∈ U ⊥ .
(7) Let v ∈ V and write v = u + w with u ∈ U and w ∈ U ⊥ . Then (PU )2 (v) = PU (PU (v)) = PU (u) = u = PU (v).
(8) Again, write v = u + w with u ∈ U and w ∈ U ⊥ . Then

kPU (v)k2 = kuk2 ≤ kuk2 + kwk2 = ku + wk2 = kvk2

where the penultimate equality follows from the Pythagorean Theorem, since u and w are orthogonal.
(9) We saw this in the proof of Proposition 190.

22. Wednesday 8/12: Minimization Problems (6.C)

Linear algebra is one of the most useful areas of mathematics (with uses outside of mathematics). The better you understand the concepts behind an application, the more you’ll be able to understand why things work and perhaps adapt the techniques to new settings. I’ll give three applications in this lecture, one of which is in your textbook.

Frequently, in real life11, you want to find a vector that is as close as possible to a subspace. A classical example is linear regression.

Example 195. Suppose you have five data points (x1 , y1 ), . . . , (x5 , y5 ) and you are trying to approximate the relationship between xi and yi . Maybe you are interested in how square footage affects housing prices, so you have the data of five houses where xi is the square footage of the ith house and yi is the price. This means you want to find scalars α, β such that yi = αxi + β for every 1 ≤ i ≤ 5. This is the same thing as wanting to solve the matrix equation

Az = y, where A is the 5 × 2 matrix whose ith row is (xi , 1), z = (α, β)t , and y = (y1 , . . . , y5 )t ,

for α and β. Of course, in general, there will be no solution to this equation, since the points are unlikely to lie on a single line. In other words, (y1 , . . . , y5 ) is not likely to be in the range of the linear transformation given by the matrix A. Indeed, the range of the matrix A will be a two-dimensional subspace of R5 (as long as the xi are not all equal). The goal in linear regression is to adjust the vector y to ỹ = (ỹ1 , . . . , ỹ5 ) so that the equation Az = ỹ is solvable, and so that ỹ is “close” to y. In other words, you want to minimize kỹ − yk such that ỹ is in the range of A. Luckily, linear algebra will show us that we just need to project y onto the range of A to find ỹ, then we can solve the matrix equation for z to find the parameters for our linear regression.

Proposition 196. Suppose U is a finite-dimensional subspace of V and v ∈ V . Then

kv − PU (v)k ≤ kv − uk

for all u ∈ U , with equality if and only if u = PU (v).

11As a mathematician, to me “in real life” means in engineering, statistics, computer science, physics, etc.

In other words, PU (v) is the closest vector in U to the vector v.

Proof. We compute

kv − PU (v)k2 ≤ kv − PU (v)k2 + kPU (v) − uk2     (adding a nonnegative term)
= k(v − PU (v)) + (PU (v) − u)k2                  (Pythagorean Theorem)
= kv − uk2 .

The Pythagorean Theorem applies since v − PU (v) ∈ U ⊥ and PU (v) − u ∈ U , so these two vectors are orthogonal. Taking square roots gives the desired inequality. Note that equality holds if and only if the first inequality is actually equality, which is if and only if kPU (v) − uk = 0, if and only if PU (v) = u.
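To make Example 195 and Proposition 196 concrete, here is a short numerical sketch in Python with NumPy. The five data points are invented for illustration (they are not from the notes); np.linalg.lstsq returns the z minimizing kAz − yk, which by Proposition 196 comes from projecting y onto range A.

    import numpy as np

    # Five made-up data points (square footage, price), as in Example 195.
    x = np.array([ 800., 1100., 1500., 2000., 2400.])
    y = np.array([ 150.,  210.,  260.,  330.,  400.])

    # A is the 5 x 2 matrix whose ith row is (x_i, 1).
    A = np.column_stack([x, np.ones_like(x)])

    # Least squares: find z = (alpha, beta) minimizing ||Az - y||.
    z, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_tilde = A @ z                       # the projection of y onto range(A)

    # The residual y - y_tilde lies in (range A)-perp, which is exactly what
    # makes y_tilde the closest point of range(A) to y.
    print(np.allclose(A.T @ (y - y_tilde), 0))   # True
    print(z)                                      # the fitted slope alpha and intercept beta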
The picture in your book with V = R2 , U a line, v a vector not on that line is a good picture of this theorem. Now if we think about this “projection as minimization” away from our comfortable geometric home of Fn to a more exotic vector space of functions on some intervals, this takes us.... Approximating Functions. Let V = CR [−π, π] denote the space of continuous real-valued functions on [−π, π]. All kinds of continuous functions live in this vector space: polynomials, sines, cosines, exponentials, rational functions with no roots in [−π, π], etc. There are also lots of other crazy functions that live in V : anything you could draw without picking your pencil up between −π and π. One thing you might want to do is approximate a continuous function f ∈ V by taking a linear combination of some nice functions. For example, you learned in calculus that you can approximate an infinitely differentiable function f by its Taylor series. If you truncate at, say, degree 5, then you get an approximation to f that is a degree 5 polynomial. Of course, V is infinite-dimensional, and the space of degree 5 (or smaller) polynomials is dimension 6. Call this subspace U . Another way to get an approximation to f (even if f is not differentiable!) is to project onto U . Let’s try this! Example 197. Let’s approximate the function sin(x) on the interval [−π, π] by a degree 5 polynomial. We will use the inner product on V given by Z π hf, gi = f (x)g(x) dx. −π If we use orthogonal projection to project sin(x) onto the subspace of degree 5 polynomials, this should yield a polynomial that is “close” to sin(x). In other words, we want to find a u ∈ U which minimizes ksin(x) − uk. Note that this means minimizing Z π 2 | sin(x) − u|2 dx. ksin(x) − uk = hsin(x) − u, sin(x) − ui = −π It makes sense that if this integral is small, then u is close sin(x) along the entire interval. Now unless you enjoy pain, I wouldn’t try doing this by hand. But we can take the basis 1, x, . . . , x5 of U , use Gram–Schmidt to get an orthonormal basis, and then compute the projection onto U . I didn’t do this, because your book tells me that u = 0.987862x − 0.155271x3 + 0.00564312x5 where the coefficients have been approximated by decimals. Your book has this very impressive figure which shows how close u is to sin(x) on this interval. On the other hand, the Taylor series approximation to sin(x) near 0 is given by v = x − x3 /3! + x5 /5!. This is another degree 5 polynomial that is “close” to sin(x). So v ∈ U , but v is further away (in this norm) from sin(x) than u is. Here is the graph.12 Fourier Series. Another very useful way to approximate a function in V = CR [−π, π] is to use Fourier series. Here, we will use a scaled version of the previous inner product on V , namely Z 1 π f (x)g(x) dx. hf, gi = π −π Again, V is an infinite-dimensional inner product space, but we would like to be able to approximate functions in V by projecting onto some smallish subspace. It turns out that the functions 1 √ , sin(x), sin(2x), sin(3x), ... cos(x), cos(2x), cos(3x), ... 2 are orthonormal vectors with respect to this inner product.13 So you can take √1 , 2 sin(x), . . . , sin(nx), cos(x), . . . , cos(nx) and get a (2n + 1)-dimensional subspace of V . Taking the orthogonal projection of a function f ∈ V onto this subspace is the same thing as giving the first few terms of its Fourier series. So the real reason why Fourier series make good approximations to functions is because they are given by projection onto a nice subspace of V . 
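If you would like to see Example 197 happen on a computer, here is a rough numerical sketch in Python with NumPy. The inner product ∫_{-π}^{π} f (x)g(x) dx is approximated by a Riemann sum on a fine grid (so this is only an approximation of the computation described above, not the book’s exact answer), Gram–Schmidt is applied to the monomials 1, x, . . . , x5 , and then sin is projected onto their span as in Proposition 194(9). The same code works for the Fourier subspace: just swap in the list of sines and cosines.

    import numpy as np

    # Approximate <f, g> = integral of f(x)g(x) over [-pi, pi] by a Riemann sum.
    xs = np.linspace(-np.pi, np.pi, 200001)
    dx = xs[1] - xs[0]

    def ip(f, g):
        return np.sum(f * g) * dx

    # Gram-Schmidt (Proposition 178) on the monomials 1, x, ..., x^5,
    # represented by their values on the grid.
    basis = []
    for k in range(6):
        v = xs ** k
        for e in basis:
            v = v - ip(v, e) * e          # subtract the projection onto e
        basis.append(v / np.sqrt(ip(v, v)))

    # Orthogonal projection of sin onto U = span(1, x, ..., x^5).
    f = np.sin(xs)
    proj = sum(ip(f, e) * e for e in basis)

    # Compare with the degree-5 Taylor polynomial x - x^3/3! + x^5/5!.
    taylor = xs - xs**3 / 6 + xs**5 / 120
    print(np.max(np.abs(f - proj)))       # small: the two graphs are nearly indistinguishable
    print(np.max(np.abs(f - taylor)))     # about 0.5, with the worst error at the endpoints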
13You will prove this on your homework.

23. Friday 8/14: Operators on Inner Product Spaces (7.A)

The general outline of this class has been:
• Study vector spaces and subspaces (Chapters 1 and 2).
• Study linear maps between vector spaces, V → W (Chapter 3).
• Study operators on a vector space, V → V (Chapter 5).
• Study inner product spaces (Chapter 6).
• ???
• Profit.
What clearly must go in the “???” is to study linear maps between inner product spaces and operators on inner product spaces. This is what we now endeavor to do. So throughout this chapter, V and W will always denote finite-dimensional inner product spaces over F.

Definition 198. Suppose T ∈ L(V, W ). The adjoint of T is the function T ∗ : W → V such that hT v, wi = hv, T ∗ wi for every v ∈ V and every w ∈ W .

Why should such a function T ∗ exist? Well, suppose T ∈ L(V, W ) and fix a vector w ∈ W . Then consider the linear functional V → F which maps v ∈ V to hT v, wi. (This is linear because it is the composition of two linear maps v 7→ T v 7→ hT v, wi.) This is a linear functional that depends on T and w. By the Riesz Representation Theorem, there is a unique vector u ∈ V such that this functional is v 7→ hv, ui. This is the vector that we will call T ∗ w, so that hT v, wi = hv, T ∗ wi for all v ∈ V .

Example 199. Let T : R4 → R2 be defined by T (x1 , x2 , x3 , x4 ) = (x1 − x3 , x2 ). Let’s compute the adjoint T ∗ : R2 → R4 . Fix a point (y1 , y2 ) ∈ R2 . By the definition of the adjoint, we have

h(x1 , x2 , x3 , x4 ), T ∗ (y1 , y2 )i = hT (x1 , x2 , x3 , x4 ), (y1 , y2 )i
= h(x1 − x3 , x2 ), (y1 , y2 )i
= x1 y1 − x3 y1 + x2 y2
= h(x1 , x2 , x3 , x4 ), (y1 , y2 , −y1 , 0)i

and so T ∗ (y1 , y2 ) = (y1 , y2 , −y1 , 0).

Note that in the above example, T ∗ is a linear map (which we didn’t assume by definition of T ∗ ). This is true in general.

Proposition 200. If T ∈ L(V, W ), then T ∗ ∈ L(W, V ).

Proof. Suppose T ∈ L(V, W ) and fix w1 , w2 ∈ W . We first want to show T ∗ (w1 + w2 ) = T ∗ (w1 ) + T ∗ (w2 ). Note that for every v ∈ V , we have

hv, T ∗ (w1 + w2 )i = hT v, w1 + w2 i = hT v, w1 i + hT v, w2 i = hv, T ∗ w1 i + hv, T ∗ w2 i = hv, T ∗ w1 + T ∗ w2 i.

Since this is true for every v ∈ V , and by the Riesz Representation Theorem, if h−, u1 i = h−, u2 i then u1 = u2 , we conclude that T ∗ (w1 + w2 ) = T ∗ w1 + T ∗ w2 . Homogeneity is proved similarly (and in your book).

So we have a shiny new linear map, the adjoint, to understand. What should we do next? First, we should understand basic properties.

Proposition 201.
(1) (S + T )∗ = S ∗ + T ∗ for all S, T ∈ L(V, W );
(2) (λT )∗ = λ̄T ∗ for all λ ∈ F and T ∈ L(V, W );
(3) (T ∗ )∗ = T for all T ∈ L(V, W );
(4) I ∗ = I where I is the identity operator on V ;
(5) (ST )∗ = T ∗ S ∗ for all T ∈ L(V, W ) and S ∈ L(W, U ).

Proof. (1) Suppose S, T ∈ L(V, W ). If v ∈ V and w ∈ W , then

hv, (S + T )∗ wi = h(S + T )v, wi = hSv, wi + hT v, wi = hv, S ∗ wi + hv, T ∗ wi = hv, S ∗ w + T ∗ wi.

Again, since this is true for every v ∈ V , we have (S + T )∗ w = S ∗ w + T ∗ w for all w ∈ W , giving our desired equality of functions.
(2) Suppose T ∈ L(V, W ) and λ ∈ F. Then

hv, (λT )∗ wi = hλT v, wi = λhT v, wi = λhv, T ∗ wi = hv, λ̄T ∗ wi

so (λT )∗ = λ̄T ∗ .
(3) Suppose T ∈ L(V, W ). If v ∈ V and w ∈ W , then hw, (T ∗ )∗ vi = hT ∗ w, vi, which is the complex conjugate of hv, T ∗ wi = hT v, wi, whose complex conjugate is hw, T vi. Hence hw, (T ∗ )∗ vi = hw, T vi for all v and w, so (T ∗ )∗ = T .
(4) If v, u ∈ V then hv, I ∗ ui = hIv, ui = hv, ui so I ∗ u = u. Since this is true for all u ∈ V , I ∗ is the identity on V .
(5) Suppose T ∈ L(V, W ) and S ∈ L(W, U ).
If v ∈ V and u ∈ U then hv, (ST )∗ ui = hST v, ui = hT v, S ∗ ui = hv, T ∗ (S ∗ u)i. Hence, (ST )∗ u = T ∗ (S ∗ u), as desired. The next question you should ask is... what about the null space and range of T ∗ ?14 Proposition 202. Suppose T ∈ L(V, W ). Then (1) null T ∗ = (range T )⊥ ; (2) range T ∗ = (null T )⊥ ; (3) null T = (range T ∗ )⊥ ; (4) range T = (null T ∗ )⊥ . Proof. The nice thing about this proof is that we only need to prove (1) by hand, and then we will cleverly conclude (2)–(4). Suppose w ∈ W . Then w ∈ null T ∗ ⇐⇒ T ∗ w = 0 ⇐⇒ hv, T ∗ wi = 0 for all v ∈ V ⇐⇒ hT v, wi = 0 for all v ∈ V ⇐⇒ w ∈ (range T )⊥ . Therefore, null T ∗ = (range T )⊥ . 14Thanks for asking! Now take orthogonal complements of both sides to get that (null T ∗ )⊥ = ((range T )⊥ )⊥ = range T, which is (4). Now since we proved (1) for any linear map, we know null T = null(T ∗ )∗ = (range T ∗ )⊥ which is (3). Again, taking orthogonal complements yields (2). And the next natural question might be... what is the matrix of T ∗ ? As sometimes happens in math classes, I’m going to give you a definition which betrays the answer before we prove the answer.15 Definition 203. The conjugate transpose of an m × n matrix is the n × m matrix obtained by taking the transpose followed by taking the complex conjugat eof each entry. If A is a matrix, then its conjugate transpose is sometimes denoted AH (for Hermitian transpose). This is not notation that your book uses, however. Example 204. If F = R, then the conjugate transpose is just the transpose (since the complex conjugate of a real number is itself). " But suppose A = 2 1+i # 3 − i 5i 2 7 1−i = 3 + i 2 . . Then the conjugate transpose AH 2 −5i 7 So our question is, what is the matrix of T ∗ ? The answer to that question is, of course, “it depends on the basis of V and basis of W ”. As you change the bases, you change the matrix. In the following result, note that we are assuming that we have taken orthonormal bases of V and W . If you drop the word “orthonormal”, then the result is not true. When you have fixed orthonormal bases of V and W , then the matrices of T and T ∗ will be conjugate transpose to one another. The fact that this is only true for some choices of bases is a good example of why the abstract “basis-free” approach to linear algebra is much stronger. You can define the adjoint of a linear map regardless of what basis you use. We would not want to 15Act surprised anyway. define the adjoint of a linear map as “the linear map associated to the conjugate transpose of the matrix”, since this will only be true for some choices of basis. Proposition 205. Let T ∈ L(V, W ). Suppose e1 , . . . , en is an orthonormal basis of V and f1 , . . . , fm is an orthonormal basis of W . Then M(T ∗ , (f1 , . . . , fm ), (e1 , . . . , en )) is the conjugate transpose of M(T, (e1 , . . . , en ), (f1 , . . . , fm )). Proof. For ease of notation, we fix the bases above and write simply M(T ∗ ) and M(T ). To get the kth column of the matrix M(T ), we write T ek as a linear combination of the fj ’s. But since the fj ’s form an orthonormal basis, we know that T ek = hT ek , f1 if1 + · · · + hT ek , fm ifm . Therefore the (j, k) entry of M(T ) Is hT ek , fj i. By the same reasoning, we say that the (j, k) entry of M(T ∗ ) is hT ∗ fk , ej i. By the definition of the adjoint, this is equal to hfk , T ej i = hT ej , fk i. But this is the complex conjugate of the (k, j) entry of M(T ). 
Hence, with respect to this choice of orthonormal bases of V and W , M(T ∗ ) is the conjugate transpose of M(T ). 24. Monday 8/17: Self-Adjoint and Normal Operators (7.A) If we consider an operator T : V → V , then since the adjoint of T goes from the codomain of T to the domain of T , we have that T ∗ : V → V will also be an operator on V . In general, these two maps are related as in the definition of the adjoint, namely hT v, wi = hv, T ∗ wi for all v, w ∈ V . However, if T = T ∗ , we call such an operator self-adjoint. We will see that these are very important and very special operators. Definition 206. An operator T ∈ L(V ) is called self-adjoint if T = T ∗ . In other words an operator T ∈ L(V ) is self-adjoint if and only if hT v, wi = hv, T wi for all v, w ∈ V . Example 207. If we fix the standard basis of Fn and T ∈ L(Fn ), then we can write the n × n matrix M(T ) with respect to the standard basis. Then T will be self-adjoint if and only if this matrix is equal to its conjugate transpose. For example, 1 2i 3 4 1 + i −2i 3 1−i 5 will give a self-adjoint operator C3 → C3 . Note that in particular, the diagonal entries must be real numbers. If F = R, then the conjugate transpose is just the transpose, so an operator is self-adjoint if and only if its matrix with respect to the standard basis is symmetric. This example shows that, by analogy, we can think of taking the adjoint on L(V ) as simliar to taking complex conjugation in C. By this analogy, self-adjoint operators are analogous to real numbers. A slightly weaker notion than a self-adjoint operator is a normal operator. Definition 208. An operator on an inner product space is called normal if it commutes with its adjoint. That is, T ∈ L(V ) is normal if T T ∗ = T ∗ T. This is a weaker notion than being self-adjoint, because of course if T is self adjoint, then T ∗ = T , so certainly T T ∗ = T ∗ T . Example 209. Let T : C2 → C2 be the operator whose matrix with respect to the standard basis is " 2i 3 # −3 −i " Then the matrix of T∗ is −2i −3 . # , so T is not self-adjoint. However, 3 i " #" # " # " #" # 2i 3 −2i −3 13 −3i −2i −3 2i 3 = = −3 −i 3 i 3i 10 3 i −3 −i so T and T ∗ commute. Hence, T is normal. It turns out that self-adjoint operators and normal operators are some of the nicest possible operators on an inner product space. We first dig into studying self-adjoint operators. Proposition 210. If S, T ∈ L(V ) are self-adjoint, then S + T is self-adjoint. If λ ∈ R and T is self-adjoint, then λT is self-adjoint. Proof. This follows from Proposition 201. If S, T are self-adjoint then S ∗ = S and T ∗ = T . Hence, (S + T )∗ = S ∗ + T ∗ = S + T so S + T is self adjoint. Also, (λT )∗ = λT ∗ = λT which is equal to λT if and only if λ is actually real. Remark. Note that this result says that over C, the set of self-adjoint operators is not a subspace of L(V ). The set is closed under addition but not closed under scalar multiplication. Over R, the set of self-adjoint operators is a subspace of L(V ). But even with this first basic result, we can see an instance of the analogy “{self-adjoint operators} ⊆ L(V ) is like R ⊆ C”. The set R inside of C is closed under addition, but not closed under scalar multiplication. In other words, R ⊆ C is not a subspace of C (as a C-vector space). In fact, R is just a small slice of the one-dimensional vector space C. Similarly, the set of self-adjoint operators is a small (nice) slice of L(V ). Here’s another result which says that self-adjoint operators are somehow analogous to real numbers. 
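Before the next result, here is a quick numerical sanity check of Examples 207 and 209 in Python with NumPy. This is only a sketch: A below is one self-adjoint arrangement of the entries displayed in Example 207 (the exact arrangement does not matter for this check), and B is the matrix of Example 209.

    import numpy as np

    # A: a self-adjoint (Hermitian) matrix built from the entries of Example 207;
    # B: the normal-but-not-self-adjoint matrix of Example 209.
    A = np.array([[1,    2j,    3   ],
                  [-2j,  4,     1+1j],
                  [3,    1-1j,  5   ]])
    B = np.array([[2j, 3], [-3, -1j]])

    def adjoint(M):
        return M.conj().T                 # conjugate transpose

    print(np.allclose(A, adjoint(A)))                    # True: A is self-adjoint
    print(np.allclose(B, adjoint(B)))                    # False: B is not self-adjoint
    print(np.allclose(B @ adjoint(B), adjoint(B) @ B))   # True: B is normal

    # A preview of the next proposition: the eigenvalues of the self-adjoint
    # matrix A come out real.
    print(np.linalg.eigvals(A))           # eigenvalues with (numerically) zero imaginary part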
Proposition 211. Every eigenvalue of a self-adjoint operator is real. Proof. Suppose T is a self-adjoint operator and let λ be an eigenvalue. We want to show λ is real. Since T is self-adjoint, we know that hT v, vi = hv, T vi for every v ∈ V . Choose an eigenvector with eigenvalue λ and work outward from both sides of this equation λ kvk2 = hλv, vi = hT v, vi = hv, T vi = hv, λvi = λ kvk2 . Since v is an eigenvector, it is nonzero, so kvk2 6= 0 and hence λ = λ, so λ is real. This next result is not true over R, because you can consider the operator on R2 that rotates by an angle of π/2. Lemma 212. Suppose V is an inner product space over C and T ∈ L(V ). Suppose hT v, vi = 0 for all v ∈ V . Then T = 0. Proof. Verify that for all u, w ∈ V , 1 1 hT u, wi = hT (u + w), u + wi − hT (u − w), u − wi 4 4 i i + hT (u + iw), u + iwi − hT (u − iw), u − iwi. 4 4 [Just expand out the right-hand side and cancel.] Now notice every inner product on the right-hand side is of the form hT v, vi for some vector v. So by hypothesis, these are all 0. Then hT u, wi = 0 for all u, w ∈ V . Now fix u and take w = T u. Then hT u, T ui = kT uk2 = 0 so T u = 0. Since u was arbitrary, this shows that T = 0. We said that taking adjoints can be thought of as analogous to complex conjugation, and self-adjoint operators can be thought of as analogous to real numbers (inside the space of all operators). The next result is another instance of this analogy. In general, for a vector v ∈ V , the scalar hT v, vi will be some complex number that varies as you vary v. However, if T is self-adjoint, then this scalar will always be real. Proposition 213. Suppose V is a complex inner product space and T ∈ L(V ). Then T is selfadjoint if and only if hT v, vi ∈ R for every v ∈ V . Proof. Let v ∈ V . Then hT v, vi − hT v, vi = hT v, vi − hv, T vi = hT v, vi − hT ∗ v, vi = h(T − T ∗ )v, vi. A scalar λ ∈ F is real if and only if λ − λ = 0. So if hT v, vi ∈ R for every v ∈ V if and only if h(T − T ∗ )v, vi = 0 for every v ∈ V , which by the previous proposition happens if and only if T − T ∗ = 0.16 Hence, T = T ∗ and T is self-adjoint. We saw that Lemma 212 is not necessarily true over R, but only true over C. However, if we restrict our attention to self-adjoint operators, the statement is true over R. Proposition 214. Suppose T is a self-adjoint operator on V such that hT v, vi = 0 for all v ∈ V . Then T = 0. Proof. We showed that this was true over C even without assuming T is self-adjoint. So we may assume that V is a real inner product space. Now if u, w ∈ V then hT (u + w), u + wi − hT (u − w), u − wi 4 1 hT u, ui + hT u, wi + hT w, ui + hT w, wi − hT u, ui + hT u, wi + hT w, ui − hT w, wi = 4 1 = hT u, wi + hT w, ui . 2 Now since T is self adjoint, we have hT w, ui = hw, T ui = hT u, wi = hT u, wi where the last equality holds since we are working over R. Hence, the first expression simply becomes hT u, wi. 16We only proved one direction in the previous theorem, but the converse to the previous theorem is obviously true. But each term in the numerator of the first expression is of the form hT v, vi for some v, which means the first expression is equal to 0 so hT u, wi = 0 for all u, w ∈ V . In particular, taking u arbitrary and w = T u, we have hT u, T ui = 0 for all u ∈ U whence T u = 0 for all u ∈ U . Hence T = 0. Proposition 215. An operator T ∈ L(V ) is normal if and only if kT vk = kT ∗ vk for all v ∈ V . Proof. Let T ∈ L(V ). 
Note that the operator T ∗ T − T T ∗ is self-adjoint since (T ∗ T − T T ∗ )∗ = (T ∗ T )∗ − (T T ∗ )∗ = T ∗ (T ∗ )∗ − (T ∗ )∗ T ∗ = T ∗ T − T T ∗ (by using several parts of Proposition 201). Therefore, we have T is normal ⇐⇒ T ∗ T − T T ∗ = 0 Prop 214 ⇐⇒ h(T ∗ T − T T ∗ )v, vi = 0 for all v ∈ V ⇐⇒ hT ∗ T v, vi = hT T ∗ v, vi for all v ∈ V . Now by the definition of the adjoint, we have hT ∗ T v, vi = hT v, (T ∗ )∗ vi = hT v, T vi. So the left-hand side of the last line is kT vk2 and similarly the right-hand side is kT ∗ vk2 . Hence, we continue the chain of equivalences ⇐⇒ kT vk2 = kT ∗ vk2 for all v ∈ V ⇐⇒ kT vk = kT ∗ vk for all v ∈ V , which proves the desired equivalence. The idea is that normal operators are very closely related to their adjoints (although not as closely as self-adjoint operators, naturally). 25. Wednesday 8/19: The Spectral Theorem (7.B) The goal of today’s lecture is to state two very important theorems, the Spectral Theorems. In order to talk about them, we need to know a bit more about normal operators. Recall that we were in the midst of showing that even though normal operators aren’t as strongly related to their adjoints as self-adjoint operators are, they are still very closely related. For example, if T is normal, then T and T ∗ share all of their eigenvectors. Proposition 216. Suppose T ∈ L(V ) is normal and v ∈ V is an eigenvector of T with eigenvalue λ. Then v is also an eigenvector of T ∗ with eigenvalue λ. Proof. First, we know that if T is normal then for any λ ∈ F, the operator T − λI is normal. This is because (T − λI)(T − λI)∗ Prop 201 = (T − λI)(T ∗ − λI) = T T ∗ − λIT ∗ − T λI + λλI = T ∗ T − λT ∗ I − λIT + λ2 I = (T ∗ − λI)(T − λI) = (T − λI)∗ (T − λI) where T commutes with T ∗ since T is normal, and every operator commutes with the identity operator. Hence, v is an eigenvector of T with eigenvalue λ, then we have 0 = k(T − λI)vk = k(T − λI)∗ vk = (T ∗ − λI)v and so v is an eigenvector of T ∗ with eigenvalue λ. Our last result in this section is a hint as to the importance of normal operators. Their “eigenspaces are orthogonal” in a sense we make precise below. Proposition 217. Suppose T ∈ L(V ) is normal. If u is an eigenvector with eigenvalue α and v is an eigenvector with eigenvalue β and α 6= β, then u and v are orthogonal. Proof. We have T u = αu, T v = βv and also T ∗ v = βv. Therefore, (α − β)hu, vi = hαu, vi − hu, βvi = hT u, vi − hu, T ∗ vi = 0. But since α − β 6= 0, this means hu, vi = 0, as desired. Recall that an operator T ∈ L(V ) has a diagonal matrix with respect to a basis of V if and only if the basis consists of eigenvectors of T . We already saw (in Example 182), that if V does have such a basis, it might not necessarily have an orthonormal basis of eigenvectors of T . The nicest operators will be the ones such that there is an orthonormal basis of eigenvectors. The big goal of this section is to prove the Spectral Theorem, which characterizes these operators. The answer is different over C vs. over R, so your book proves two versions of the Spectral Theorem. Theorem 218 (The Complex Spectral Theorem). Suppose F = C and T ∈ L(V ). Then the following are equivalent: (1) T is normal. (2) V has an orthonormal basis consisting of eigenvectors of T . (3) T has a diagonal matrix with respect to some orthonormal basis of V . Proof. The equivalence of (2) and (3) follows from Proposition 152. So it suffces to prove the equivalence of (1) and (3). [(3) ⇒ (1)]. Suppose T has a diagonal matrix with respect to some orthonormal basis of V . 
Recall that an operator T ∈ L(V) has a diagonal matrix with respect to a basis of V if and only if the basis consists of eigenvectors of T. We already saw (in Example 182) that even if V does have such a basis, it might not have an orthonormal basis of eigenvectors of T. The nicest operators are the ones for which there is an orthonormal basis of eigenvectors. The big goal of this section is to prove the Spectral Theorem, which characterizes these operators. The answer is different over C vs. over R, so your book proves two versions of the Spectral Theorem.

Theorem 218 (The Complex Spectral Theorem). Suppose F = C and T ∈ L(V). Then the following are equivalent:
(1) T is normal.
(2) V has an orthonormal basis consisting of eigenvectors of T.
(3) T has a diagonal matrix with respect to some orthonormal basis of V.

Proof. The equivalence of (2) and (3) follows from Proposition 152. So it suffices to prove the equivalence of (1) and (3).

[(3) ⇒ (1)]. Suppose T has a diagonal matrix with respect to some orthonormal basis of V. Then by Proposition 205, with respect to the same basis, the matrix of T* is the conjugate transpose of the matrix of T. Therefore, both T and T* have diagonal matrices with respect to this basis. Now note that the product of two diagonal matrices, in either order, is the diagonal matrix whose entries are the products of the corresponding diagonal entries:

diag(a1, . . . , an) diag(b1, . . . , bn) = diag(a1b1, . . . , anbn) = diag(b1, . . . , bn) diag(a1, . . . , an);

that is, any two diagonal matrices commute. Hence, TT* and T*T have the same matrix with respect to this basis, so they are the same linear map. So TT* = T*T and so T is normal.

[(1) ⇒ (3)]. Suppose T is normal. By Schur's Theorem, there is an orthonormal basis e1, . . . , en of V with respect to which the matrix of T is upper-triangular, say with entries aj,k (so aj,k = 0 whenever j > k). Then M(T*, (e1, . . . , en)) is the conjugate transpose of M(T, (e1, . . . , en)), which is lower-triangular with (j, k) entry āk,j. We will show that these matrices are actually diagonal. Reading off the first columns of these two matrices, we see that

‖Te1‖² = |a1,1|²  and  ‖T*e1‖² = |ā1,1|² + |ā1,2|² + · · · + |ā1,n|² = |a1,1|² + |a1,2|² + · · · + |a1,n|².

But since T is normal, by Proposition 215, ‖Te1‖ = ‖T*e1‖, and so a1,2 = · · · = a1,n = 0. Now play the same game with e2. Because we know a1,2 = 0, we have Te2 = a2,2 e2, so ‖Te2‖² = |a2,2|². We also have ‖T*e2‖² = |a2,2|² + |a2,3|² + · · · + |a2,n|², and again since T is normal, we can conclude that a2,3 = · · · = a2,n = 0. Continue in this fashion (okay, actually you should do an induction) to see that all off-diagonal entries in M(T) are zero.

In lecture, we did not have time to prove the Real Spectral Theorem, so I simply stated it. Full details of the proof are of course found in your book.

Theorem 219 (The Real Spectral Theorem). Suppose F = R and T ∈ L(V). Then the following are equivalent:
(1) T is self-adjoint.
(2) V has an orthonormal basis consisting of eigenvectors of T.
(3) T has a diagonal matrix with respect to some orthonormal basis of V.

The Real¹⁷ Spectral Theorem. In the real case, things are a bit different (e.g., Schur's Theorem only applies to complex vector spaces), so we need to do a bit more setup. The first observation is an analogue of the following fact. For any real number x,

x² + bx + c = (x + b/2)² + c − b²/4,

and so as long as c − b²/4 > 0, equivalently b² < 4c, we have x² + bx + c > 0. Every nonzero real number is invertible, so this means that if b² < 4c, then for every real number x, the number x² + bx + c is invertible.

¹⁷Real as in R, not real as in “actual”.

Proposition 220. Suppose T ∈ L(V) is a self-adjoint operator and b, c ∈ R are such that b² < 4c. Then T² + bT + cI is an invertible operator.

Proof. Let v be any nonzero vector in V. If we can show that (T² + bT + cI)v ≠ 0, then T² + bT + cI will have a trivial null space, hence will be injective and therefore invertible. Now observe that

⟨(T² + bT + cI)v, v⟩ = ⟨T²v, v⟩ + b⟨Tv, v⟩ + c⟨v, v⟩
= ⟨Tv, Tv⟩ + b⟨Tv, v⟩ + c‖v‖²    [T self-adjoint]
≥ ‖Tv‖² − |b| ‖Tv‖ ‖v‖ + c‖v‖²    [Cauchy–Schwarz]
= ‖Tv‖² − |b| ‖Tv‖ ‖v‖ + b²‖v‖²/4 + (c − b²/4)‖v‖²
= (‖Tv‖ − |b| ‖v‖/2)² + (c − b²/4)‖v‖²
> 0.

Since this inner product is nonzero, we have that (T² + bT + cI)v ≠ 0, as desired.
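Here is an optional numerical illustration of Proposition 220 (my own example, not from the book). It also shows why the self-adjointness hypothesis matters: the rotation R of R² by π/2 satisfies R² + I = 0, so T² + bT + cI can fail to be invertible for a non-self-adjoint T even when b² < 4c.

import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
T = B + B.T                            # a self-adjoint (symmetric) operator on R^4
b, c = 1.0, 1.0                        # b^2 = 1 < 4 = 4c
M = T @ T + b * T + c * np.eye(4)
print(np.linalg.matrix_rank(M))        # 4: T^2 + bT + cI is invertible

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])            # rotation by pi/2, not self-adjoint
print(R @ R + np.eye(2))               # the zero matrix: R^2 + 0*R + 1*I = 0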
We saw that every operator on a complex vector space has an eigenvalue, but that this is not true for real vector spaces (e.g., rotation by an angle 0 < θ < π). However, every self-adjoint operator on a real vector space does have an eigenvalue.

Proposition 221. Suppose V ≠ {0} and T ∈ L(V) is a self-adjoint operator. Then T has an eigenvalue.

Proof. Every operator on a nonzero complex vector space has an eigenvalue, regardless of whether it is self-adjoint or not. So we need only prove this over R. Assume V is a real inner product space and let n = dim V. Choose v ∈ V nonzero. Then

v, Tv, T²v, . . . , Tⁿv

is a linearly dependent list (since there are n + 1 vectors in an n-dimensional space). Hence, there exist a0, a1, . . . , an ∈ R, not all 0, such that

0 = a0v + a1Tv + · · · + anTⁿv.

Consider the polynomial a0 + a1x + · · · + anxⁿ. Even though this polynomial need not factor completely into linear factors over R, it does factor completely into linear and quadratic factors (since the non-real roots of a real polynomial come in conjugate pairs, you can pair them up and get real quadratic factors). Hence, we can write

a0 + a1x + · · · + anxⁿ = c(x² + b1x + c1) · · · (x² + bMx + cM)(x − λ1) · · · (x − λm),

where c ≠ 0, the bj, cj, λj are real, and 2M + m ≤ n. We did not cover Theorem 4.17 in your book, but by that result we may also assume that bj² < 4cj for all j. Then we can write

0 = a0v + a1Tv + · · · + anTⁿv = (a0I + a1T + · · · + anTⁿ)v = c(T² + b1T + c1I) · · · (T² + bMT + cMI)(T − λ1I) · · · (T − λmI)v.

By the previous result, each T² + bjT + cjI is invertible, and c ≠ 0, so we may multiply by 1/c and apply the inverses of these quadratic factors to get

0 = (T − λ1I) · · · (T − λmI)v.

(Note that m ≥ 1: if there were no linear factors at all, the same step would give 0 = v, contradicting v ≠ 0.) Since v ≠ 0, the operator (T − λ1I) · · · (T − λmI) is not injective, so for some j, T − λjI is not injective, and for this j, λj is an eigenvalue of T.

One last lemma before we're ready to prove the Real Spectral Theorem.

Proposition 222. Suppose T ∈ L(V) is self-adjoint and U is a subspace of V that is invariant under T. Then
(1) U⊥ is invariant under T;
(2) T|U ∈ L(U) is self-adjoint;
(3) T|U⊥ ∈ L(U⊥) is self-adjoint.

Proof. (1) Suppose that v ∈ U⊥. We want to show that Tv ∈ U⊥. For any u ∈ U, we have ⟨Tv, u⟩ = ⟨v, Tu⟩ since T is self-adjoint. Since U is invariant under T, we also have that Tu ∈ U and hence ⟨v, Tu⟩ = 0. Therefore, ⟨Tv, u⟩ = 0, and since u ∈ U was arbitrary, we conclude that Tv ∈ U⊥.

(2) Let u, v ∈ U. Then

⟨T|U(u), v⟩ = ⟨Tu, v⟩ = ⟨u, Tv⟩ = ⟨u, T|U(v)⟩,

and hence T|U (which is an operator on U since U is invariant under T) is a self-adjoint operator.

(3) Since we proved in part (1) that U⊥ is invariant under T, this has the same proof as part (2) with U⊥ substituted in for U.
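The two ingredients we are about to feed into the induction (Propositions 221 and 222, with U = span(u) for an eigenvector u) can also be seen numerically. The sketch below is my own illustration, not from the book: it checks that a random real symmetric matrix has real eigenvalues, and that T sends vectors orthogonal to an eigenvector u to vectors that are still orthogonal to u.

import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
T = B + B.T                                  # self-adjoint on R^4
vals, vecs = np.linalg.eig(T)
print(vals)                                  # all real (Prop 221)

u = vecs[:, 0]                               # an eigenvector of T
w = rng.standard_normal(4)
w = w - (w @ u) * u / (u @ u)                # project w into span(u)^perp
print(w @ u, (T @ w) @ u)                    # both ~0: T w stays in span(u)^perp (Prop 222)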
This takes us to the last theorem of the class.

Theorem 223 (The Real Spectral Theorem). Suppose F = R and T ∈ L(V). Then the following are equivalent:
(1) T is self-adjoint.
(2) V has an orthonormal basis consisting of eigenvectors of T.
(3) T has a diagonal matrix with respect to some orthonormal basis of V.

Proof. As we noted in the proof of the Complex Spectral Theorem, (2) and (3) are equivalent by Proposition 152. We will prove (3) ⇒ (1) and (1) ⇒ (2).

[(3) ⇒ (1)]. Suppose T has a diagonal matrix with respect to some orthonormal basis of V. The matrix of T* with respect to this basis is given by the conjugate transpose. But a real diagonal matrix is equal to its conjugate transpose, so T and T* have the same matrix with respect to this basis. Hence, T = T* and so T is self-adjoint.

[(1) ⇒ (2)]. We prove this by induction on dim V. For the base case, if dim V = 1, then every linear map V → V is represented by a 1 × 1 matrix [a], and so every nonzero vector of V is an eigenvector of T. Hence any single vector of norm 1 forms an orthonormal basis of V consisting of eigenvectors of T.

Now suppose dim V > 1 and assume (1) ⇒ (2) for all real inner product spaces of smaller dimension. Suppose T is self-adjoint. By Proposition 221, T has an eigenvalue and thus has a (nonzero) eigenvector u. We can scale this eigenvector by dividing by its norm and assume that ‖u‖ = 1. Then U = span(u) is a 1-dimensional subspace of V which is invariant under T. By the previous result, T|U⊥ ∈ L(U⊥) is a self-adjoint operator. Now note that dim U⊥ = dim V − dim U = dim V − 1, so by the induction hypothesis, there is an orthonormal basis of U⊥ consisting of eigenvectors of T|U⊥ (each of which is also an eigenvector of T). Adjoining u to this orthonormal basis of U⊥ gives an orthonormal basis of V consisting of eigenvectors of T.
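To close, here is an optional NumPy illustration of the Real Spectral Theorem itself (again my own sketch, not part of the course material): numpy.linalg.eigh assumes its input is symmetric and returns an orthonormal basis of eigenvectors as the columns of a matrix Q, so that Qᵀ T Q is diagonal.

import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
T = B + B.T                                    # self-adjoint on R^4

vals, Q = np.linalg.eigh(T)                    # columns of Q: orthonormal eigenvectors
print(np.allclose(Q.T @ Q, np.eye(4)))         # True: the basis is orthonormal
print(np.allclose(Q.T @ T @ Q, np.diag(vals))) # True: T is diagonal in this basis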