MATH 146 Linear Algebra 1, Lecture Notes by Stephen New 1 Chapter 1. Concrete Vector Spaces and Affine Spaces Rings and Fields 1.1 Definition: For a set S we write S × S = {(a, b)|a ∈ S, b ∈ S}. A binary operation on S is a map ∗ : S × S → S, where for a, b ∈ S we usually write ∗(a, b) as a ∗ b. 1.2 Definition: A ring (with identity) is a set R together with two binary operations + and · (called addition and multiplication), where for a, b ∈ R we usually write a · b as ab, and two distinct elements 0 and 1, such that (1) + is associative: (a + b) + c = a + (b + c) for all a, b, c ∈ R, (2) + is commutative: a + b = b + a for all a, b ∈ R, (3) 0 is an additive identity: 0 + a = a for all a ∈ R, (4) every element has an additive inverse: for every a ∈ R there exists b ∈ R with a+b = 0, (5) · is associative: (ab)c = a(bc) for all a, b, c ∈ R, (6) 1 is a multiplicative identity: 1 · a = a for all a ∈ R, and (7) · is distributive over +: a(b + c) = ab + ac for all a, b, c ∈ R, A ring R is called commutative when (8) · is commutative: ab = ba for all a, b ∈ R. For a ∈ R, we say that a is invertible (or that a has an inverse) when there exists an element b ∈ R such that ab = 1 = ba. A field is a commutative ring F such that (9) every non-zero element has a multiplicative inverse: for every a ∈ F with a 6= 0 there exists b ∈ F such that ab = 1. An element in a field F is called a number or a scalar. 1.3 Example: The set of integers Z is a commutative ring, but it is not a field because it does not satisfy Property (9). The set of positive integers Z+ = {1, 2, 3, · · ·} is not a ring because 0 ∈ / Z+ and Z+ does not satisfy Properties (3) and (4). The set of natural numbers N = {0, 1, 2, · · ·} is not a ring because it does not satisfy Property (4). The set of rational numbers Q, the set of real numbers R and the set of complex numbers C are all fields. For 2 ≤ n ∈ Z, the set Zn = {0, 1, · · · , n − 1} of integers modulo n is a commutative ring, and Zn is a field if and only if n is prime (in Z1 = {0} we have 0 = 1, so Z1 is not a ring). 1.4 Remark: In a field, we can perform all of the usual arithmetical operations. The next few theorems illustrate this. 1.5 Theorem: (Uniqueness of Inverse) Let R be a field. Let a ∈ R. Then (1) the additive inverse of a is unique: if a + b = 0 = a + c then b = c, (2) if a has an inverse then it is unique: if ab = 1 = ac then b = c. Proof: To prove (1), suppose that a + b = 0 = a + c. Then b = 0 + b = (a + c) + b = b + (a + c) = (b + a) + c = (a + b) + c = 0 + c = c . To prove (2), suppose that a 6= 0 and that ab = 1 = ac. Then b = 1 · b = (ac)b = b(ac) = (ba)c = (ab)c = 1 · c = c . 2 1.6 Definition: Let R be a ring and let a, b ∈ R. We write the (unique) additive inverse of a as −a, and we write b − a = b + (−a). If a has a multiplicative inverse, we write the (unique) multiplicative inverse of a as a−1 . When R is commutative, we also write a−1 as b 1 1 a , and we write a = b · a . 1.7 Theorem: (Cancellation) Let R be a field. Then for all a, b, c ∈ R, we have (1) if a + b = a + c then b = c, (2) if a + b = a then b = 0, and (3) if a + b = 0 then b = −a. Let F be a field. Then for all a, b, c ∈ F we have (4) (5) (6) (7) if if if if ab = ac then either a = 0 or b = c. ab = a then either a = 0 or b = 1, ab = 1 then b = a−1 , and ab = 0 then either a = 0 or b = 0. Proof: To prove (1), suppose that a + b = a + c. Then we have b = 0 + b = −a + a + b = −a + a + c = 0 + c = c . 
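As a small computational illustration of Example 1.3 (a minimal Python sketch; the helper name inverses_mod is ours, not part of the notes), one can search for multiplicative inverses in Zn by brute force: every non-zero element of Z5 has an inverse, while 2, 3 and 4 have none in Z6, consistent with Zn being a field if and only if n is prime.

```python
def inverses_mod(n):
    """Map each invertible a in Z_n to a multiplicative inverse, found by brute force."""
    inv = {}
    for a in range(1, n):
        for b in range(1, n):
            if (a * b) % n == 1:
                inv[a] = b
                break
    return inv

for n in (5, 6):
    inv = inverses_mod(n)
    non_invertible = [a for a in range(1, n) if a not in inv]
    print(f"Z_{n}: inverses {inv}, non-invertible non-zero elements {non_invertible}")
# Z_5: every non-zero element is invertible (5 is prime).
# Z_6: 2, 3 and 4 have no inverse, so Z_6 is not a field.
```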
Part (2) follows from part (1) since if a + b = a then a + b = a + 0, and part (3) follows from part (1) since if a + b = 0 then a + b = a + (−a). To prove part (4), suppose that ab = ac and a 6= 0. Then we have b = 1 · b = a−1 ab = a−1 ac = 1 · c = c . Note that parts (5), (6) and (7) all follow from part (4). 1.8 Remark: In the above proof, we used associativity and commutativity implicitly. If we wished to be explicit then the proof of part (1) would be as follows. Suppose that a + b = a + c. Then we have b = 0+b = (a−a)+b = (−a+a)+b = −a+(a+b) = −a+(a+c) = (−a+a)+c = 0+c = c. In the future, we shall often use associativity and commutativity implicitly in our calculations. 1.9 Theorem: (Multiplication by 0 and −1) Let R be a ring and let a ∈ R. Then (1) 0 · a = 0, and (2) (−1)a = −a. Proof: We have 0a = (0 + 0)a = 0a + 0a. Subtracting 0a from both sides (using part 2 of the Cancellation Theorem) gives 0 = 0a. Also, we have a + (−1)a = (1)a + (−1)a = (1 + (−1))a = 0a = 0, and subtracting a from both sides (part 3 of the Cancellation Theorem) gives (−1)a = −a. 3 The Standard Vector Space 1.10 Definition: Let S be a set. An n-tuple on S is a function a : {1, 2, · · · , n} → S. Given an n-tuple a on S, for k ∈ {1, 2, · · · , n} we write ak = a(k). The set {1, 2, · · · , n} is called the index set, an element k ∈ {1, 2, · · · , n} is called an index, and the element ak ∈ S is called the k th entry of a. We sometimes write a = (a1 , a2 , · · · , an ) but we more often write a1 a2 a = (a1 , a2 , · · · , an )T = ... an to indicate that a is the n-tuple with entries a1 , a2 , · · · , an . The set of all n-tuples on S is denoted by S n , so we have n o n T S = a = (a1 , a2 , · · · , an ) each ai ∈ S . 1.11 Definition: For a ring R, we define the zero element 0 ∈ Rn to be 0 = (0, 0, 0, · · · , 0)T or equivalently we define 0 ∈ Rn to be the element with entries 0i = 0 for all i. We define the standard basis elements e1 , e2 , · · · , en ∈ Rn to be e1 = (1, 0, 0, 0 · · · , 0)T , e2 = (0, 1, 0, 0, · · · , 0T , e3 = (0, 0, 1, 0, · · · , 0)T , .. . en = (0, 0, 0, · · · , 0, 1)T . ( Equivalently, for each index k we define ek ∈ Rn to be given by (ek )i = δki = 1 if k = i, 0 if k 6= i. 1.12 Definition: Given t ∈ R, x = (x1 , x2 , · · · , xn )T ∈ Rn and y = (y1 , y2 , · · · , yn )T ∈ Rn , where R is a ring, we define the product tx and the sum x + y by x1 t x1 x2 t x2 tx = t ... = ... , xn t xn y1 x1 + y1 x1 x2 y2 x2 + y2 . x+y = .. ... + .. = . . xn yn xn + yn Equivalently, we can define tx to be the element with entries (tx)i = t xi for all i, and we can define x + y to be the element with entries (x + y)i = xi + yi for all i. 4 1.13 Note: For x1 , x2 , · · · , xn ∈ R, notice that n P xi ei = x1 e1 + x2 e2 + · · · + xn en . (x1 , x2 , · · · , xn )T = i=1 1.14 Theorem: (Basic Properties of Rn ) Let R be a ring. Then (1) + is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ Rn , (2) + is commutative: x + y = y + x for all x, y ∈ Rn , (3) 0 ∈ Rn is an additive identity 0 + x = x for all x ∈ Rn , (4) every x ∈ Rn has an additive inverse: for all x ∈ Rn there exists y ∈ Rn with x + y = 0, (5) · is associative: (st)x = s(tx) for all s, t ∈ R and all x ∈ Rn , (7) · distributes over addition in R: (s + t)x = sx + tx, for all s, t ∈ R and x ∈ Rn , (8) · distributes over addition in Rn : t(x + y) = tx + ty for all t ∈ R and x, y ∈ Rn , and (9) 1 ∈ R acts a multiplicative identity: 1x = x for all x ∈ Rn . Proof: To prove part (4), let x ∈ Rn and choose y = (−1)x. 
Then for all indices i we have yi = ((−1)x)i = −xi and so (x + y)i = xi − xi = 0. Since (x + y)i = 0 for all i, we have x + y = 0, as required. To prove part (8), let t ∈ R and let x, y ∈ Rn. Then for all i we have (t(x + y))i = t(x + y)i = t(xi + yi) = t xi + t yi = (tx + ty)i. Since (t(x + y))i = (tx + ty)i for all i, we have t(x + y) = tx + ty. The other parts can be proven similarly.

1.15 Definition: When F is a field, F n is called the standard n-dimensional vector space over F , and an element of F n is called a point or a vector.

1.16 Example: Let n = 2 or 3 and let u, v ∈ Rn. If u ≠ 0 then the set {tu | t ∈ R} is the line in Rn through the points 0 and u, and the set {tu | 0 ≤ t ≤ 1} is the line segment in Rn between 0 and u. If u ≠ 0 and v does not lie on the line through 0 and u, then the points 0, u, v and u + v are the vertices of a parallelogram P in Rn, the set {su + tv | s ∈ R, t ∈ R} is the plane which contains P (in the case that n = 2, this plane is the entire set R2), and the set {su + tv | 0 ≤ s ≤ 1, 0 ≤ t ≤ 1} is the set of points inside (and on the edges of) P. As an exercise, describe the sets {su + tv | s + t = 1} and {su + tv | s ≥ 0, t ≥ 0, s + t ≤ 1}.

Vector Spaces and Affine Spaces in F n

1.17 Definition: Let F be a field. Given a point p ∈ F n and a non-zero vector u ∈ F n, we define the line in F n through p in the direction of u to be the set L = {p + tu | t ∈ F}. Given a point p ∈ F n and two vectors u, v ∈ F n with u ≠ 0 and v ≠ tu for any t ∈ F , we define the plane in F n through p in the direction of u and v to be the set P = {p + su + tv | s, t ∈ F}.

1.18 Remark: We wish to generalize the above definitions by defining higher dimensional versions of lines and planes.

1.19 Note: For a finite set S, the cardinality of S, denoted by |S|, is the number of elements in S. When we write S = {a1, a2, · · · , am}, we shall always tacitly assume that the elements ai ∈ S are all distinct so that |S| = m, unless we explicitly indicate otherwise.

1.20 Definition: Let R be a ring and let A = {u1, u2, · · · , um} ⊆ Rn. A linear combination on A (over R) is an element x ∈ Rn of the form
x = Σ_{i=1}^m ti ui = t1 u1 + t2 u2 + · · · + tm um with each ti ∈ R.
The span of A (over R) (also called the submodule of Rn spanned by A over R) is the set of all linear combinations on A. We denote the span of A by Span A (or by Span_R A), so we have
Span A = Span_R A = { Σ_{i=1}^m ti ui | each ti ∈ R }.
For convenience, we also define Span ∅ = {0}, where ∅ is the empty set. Given an element p ∈ Rn, we write
p + Span A = { p + u | u ∈ Span A } = { p + Σ_{i=1}^m ti ui | each ti ∈ R }.

1.21 Definition: Let F be a field. For a finite set A ⊆ F n, the set U = Span A is called the vector space in F n (or the subspace of F n) spanned by A (over F ). A vector space in F n (or a subspace of F n) is a subset U ⊆ F n of the form U = Span A for some finite subset A ⊆ F n. Given a point p ∈ F n and a finite set A ⊆ F n, the set P = p + Span A is called the affine space in F n (or the affine subspace of F n) through p in the direction of the vectors in A. An affine space in F n (or an affine subspace of F n) is a subset P ⊆ F n of the form P = p + U for some point p ∈ F n and some vector space U in F n. An element in a subspace of F n can be called a point or a vector. An element in an affine subspace of F n is usually called a point.

1.22 Theorem: (Closure under Addition and Multiplication) Let R be a ring, let A be a finite subset of Rn, and let U = Span A.
Then (1) U is closed under addition: for all x, y ∈ U we have x + y ∈ U , and (2) U is closed under multiplication: for all t ∈ R and all x ∈ U we have tx ∈ U . Proof: Let A = {u1 , u2 , · · · , um }, let t ∈ R and let x, y ∈ U = Span A, say x = and y = m P i=1 ti ui . Then x + y = m P (si + ti )ui ∈ U and tx = i=1 m P i=1 6 m P i=1 (tsi )ui ∈ U . si ui 1.23 Theorem: Let R be a ring, let p, q ∈ Rn , let A and B be finite subsets of Rn , and let U = Span A and V = Span B. Then (1) p + U ⊆ q + V if and only if U ⊆ V and p − q ∈ V , and (2) p + U = q + V if and only if U = V and p − q ∈ U . Proof: Suppose that p + U ⊆ q + V . Since p = p + 0 ∈ p + U , we also have p ∈ q + V , say p = q + v where v ∈ V . Then p − q = v ∈ V . Let u ∈ U . Then we have p + u ∈ p + U and so p+u ∈ q+V , say p+u = q+w where w ∈ V . Then u = w−(p−q) = w−v = w+(−1)v ∈ V by closure. Conversely, suppose that U ⊆ V and p−q ∈ V , say p−q = v ∈ V . Let a ∈ p+U , say a = p + u where u ∈ U . Then we have a = p + u = (q + v) + u = q + (u + v) ∈ q + V by closure, since u, v ∈ V . This proves Part (1), from which Part (2) immediately follows. 1.24 Theorem: Let A = {u1 , u2 , · · · , ul } ⊆ Rn and let B = {v1 , v2 , · · · , vm } ⊆ Rn , where R is a ring. Then (1) Span A ⊆ Span B if and only if each uj ∈ Span B, and (2) Span A = Span B if and only if each uj ∈ Span B and each vj ∈ Span A. Proof: Note that each uj ∈ Span A because we can write uj as a linear combination on A, indeed we have l P uj = 0u1 + 0u2 + · · · + 0uj−1 + 1uj + 0uj+1 + · · · + 0ul = ti ui with ti = δij . i=1 It follows that if Span A ⊆ Span B then we have each uj ∈ Span B. Suppose, conversely, m l P P that each uj ∈ Span B, say uj = sji vi . Let x ∈ Span A, say x = tj uj . Then i=1 x= l P j=1 tj uj = l P j=1 tj j=1 m P sji vi = i=1 m P l P i=1 tj sji vi ∈ Span B. j=1 This Proves part (1), and Part (2) follows immediately from part (1). 7 Linear Independence, Bases and Dimension 1.25 Definition: Let R be a ring. For A = {u1 , u2 , · · · , um } ⊆ Rn , we say that A is m P linearly independent (over R) when for all t1 , t2 , · · · , tm ∈ R, if ti ui = 0 then each i=1 ti = 0, and otherwise we say that A is linearly dependent. For convenience, we also say that the empty set ∅ is linearly independent. For a finite set A ⊆ F n , when A is linearly independent and U = Span A, we say that A is a basis for U . 1.26 Example: Let F be a field. The empty set ∅ is linearly independent and Span ∅ = {0} and so ∅ is a basis for the vector space {0} in F n . If 0 6= u ∈ F n then {u} is linearly independent and so {u} is a basis for Span {u}. As an exercise, verify that for u, v ∈ F n , the set {u, v} is linearly independent if and only if u 6= 0 and for all t ∈ F we have v 6= tu. 1.27 Example: Verify that the set {e1 , e2 , · · · , en } is a basis for F n . We call it the standard basis for F n . 1.28 Theorem: Let F be a field and let A = {u1 , u2 , · · · , um } ⊆ F n . Then (1) for 1 ≤ k ≤ m, we have uk ∈ Span A \ {uk } if and only if Span A \ {uk } = Span A, (2) A is linearly dependent if and only if uk ∈ Span A \ {um } for some index k. Proof: Note that if Span A \ {uk } = Span A then uk ∈ Span A = Span A \ {uk } . P Suppose, conversely, that uk ∈ Span A \ {uk } , say uk = si ui where each si ∈ F . i6=k Since A \ {uk } ⊆ A it is clear that Span A \ {uk } ⊆ Span A. Let x ∈ Span A, say m P P P P x= ti ui . Then we have x = tk uk + ti ui = tk si ui + ti ui ∈ Span A \ {uk } . i=1 i6=k i6=k i6=k This proves Part (1). 
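Definition 1.25 can be tested directly over a small finite field. The following minimal Python sketch (an illustration of ours, working in Z_5^3 with Z_5 a field since 5 is prime) checks linear independence exactly as in the definition, by trying every possible tuple of coefficients.

```python
from itertools import product

p = 5  # work over the field Z_5 (5 is prime)

def is_linearly_independent(vectors, p):
    """Brute-force check of Definition 1.25: only the trivial combination sums to 0."""
    n = len(vectors[0])
    for coeffs in product(range(p), repeat=len(vectors)):
        if all(c == 0 for c in coeffs):
            continue
        combo = [sum(c * v[i] for c, v in zip(coeffs, vectors)) % p for i in range(n)]
        if all(x == 0 for x in combo):
            return False  # a non-trivial combination gives 0, so the set is dependent
    return True

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
u = (1, 2, 3)
print(is_linearly_independent([e1, e2, e3], p))    # True: the standard basis (Example 1.27)
print(is_linearly_independent([u, (2, 4, 6)], p))  # False: the second vector is 2u
```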
Note that since A \ {uk } ⊆ A, we have Span A \ {uk } ⊆ Span A. Suppose that A is m P linearly dependent. Choose coefficients si ∈ F , not all equal to zero, so that si ui = 0. i=1 P P si Choose an index k so that sk 6= 0. Since 0 = sk uk + si ui we have uk = − sk ui . i6=k m P P i6=k P s P i ti ui ∈ Span A we have x = tk uk + ti ui = −tk ti ui ∈ sk ui + i=1 i6=k i6=k i6=k Span A\{uk } . This shows that if A is linearly dependent then Span A\{uk } = Span A. For x = 1.29 Theorem: Let F be a field, let A = {u1 , u2 , · · · , um } ⊆ F n , and let U = Span A. Then A contains a basis for U . Proof: If A is linearly independent, then A is a basis for U . Suppose that A is linearly dependent. Then for some index k we have uk ∈ Span (A \ {uk }). Reordering the vectors if necessary, let us assume that um ∈ Span (A \ {um }). Then we have Span {u1 , u2 , · · · , um−1 } = Span {u1 , u2 , · · · , um } = U . If {u1 , u2 , · · · , um−1 } is linearly independent then it is a basis for U . Otherwise, as above, we can reorder u1 , u2 , · · · , um−1 if necessary so that Span {u1 , u2 , · · · , um−2 } = Span {u1 , u2 , · · · , um−1 } = U . Repeating this procedure we will eventually obtain a linearly independent subset {u1 , u2 , · · · , uk } ⊆ A with Span {u1 , u2 , · · · , uk } = U (if the procedure continues until no vectors are left then we have k = 0 and {u1 , · · · , uk } = ∅, which is linearly independent). 1.30 Corollary: For a field F , every subspace of F n has a basis. 8 1.31 Theorem: Let F be a field, let A = {u1 , u2 , · · · , um } ⊆ F n , let a1 , a2 , · · · , am ∈ F m P with ak 6= 0, and let B = {v1 , v2 , · · · , vm } where vi = ui for i 6= k and vk = ai ui . Then i=1 (1) Span A = Span B and (2) A is linearly independent if and only if B is linearly independent. m m P P P P ti vi ∈ Span B, we have x = tk vk + ti v i = tk ai ui + ti ui and Proof: For x = i=1 i=1 i6=k i6=k so x ∈ Span A. This shows that Span B ⊆ Span A. m P Suppose A is linearly independent. Suppose ti vi = 0 where each ti ∈ F . Then i=1 0 = t k vk + P ti vi = tk m P ai ui + i=1 i6=k P ti ui = tk ak uk + i6=k P (tk ai + ti )ui . i6=k Since A is linearly independent, all of the coefficients in the above linear combination on A must be equal to zero, so we have tk ak = 0 and tk ai + ti = 0 for i 6= k. Since tk ak = 0 and ak 6= 0 we have tk = 0 and hence 0 = tk ai + ti = ti for all i 6= k. This shows that B is linearly independent. m P P Finally note that since vk = ai ui = ak uk + ai vi with ak 6= 0, it follows that i=1 uk = m P bi vi where bk = i=1 1 ak i6=k 6= 0 and bi = − aaki for i 6= k. Hence the same arguments used in the previous two paragraphs, with the rôles of A and B interchanged, show that Span A ⊆ Span B and that if B is linearly independent then so is A. 1.32 Theorem: Let F be a field, let U be a subspace of F n and let A = {u1 , u2 , · · · , um } be a basis for U . Let B = {v1 , v2 , · · · , vl } ⊆ U . Suppose that B is linearly independent. Then l ≤ m, if l = m then B is a basis for U , and if l < m then there exist m − l vectors in A which, after possibly reordering the vectors ui we can take to be ul+1 , ul+2 , · · · , um , such that the set {v1 , v2 , · · · , vl , ul+1 , ul+2 , · · · , um } is a basis for U . Proof: When l = 0 so that B = ∅, we have m − l = m and we use all m of the vectors in A to obtain the basis {u1 , u2 , · · · , um }. 
Let l ≥ 1 and suppose, inductively, that for every set B0 = {v1, v2, · · · , vl−1} ⊆ U , if B0 is linearly independent then we have l − 1 ≤ m and we can reorder the vectors ui so that {v1, v2, · · · , vl−1, ul, ul+1, · · · , um} is a basis for U . Let B = {v1, v2, · · · , vl} ⊆ U . Suppose B is linearly independent. Let B0 = {v1, v2, · · · , vl−1}. Note that B0 is linearly independent but B0 does not span U because vl ∈ U but vl ∉ Span B0. By the induction hypothesis, we have l − 1 ≤ m and we can reorder the vectors ui so that {v1, v2, · · · , vl−1, ul, ul+1, · · · , um} is a basis for U ; since B0 does not span U , this basis must contain at least one of the vectors ui, so in fact l − 1 < m. Since vl ∈ U we can write vl in the form vl = Σ_{i=1}^{l−1} ti vi + Σ_{j=l}^m sj uj. Note that the coefficients sj cannot all be equal to zero since vl ∉ Span B0. After reordering the vectors uj we can suppose that sl ≠ 0. By Theorem 1.31, the set {v1, v2, · · · , vl−1, vl, ul+1, · · · , um} is a basis for U (in the case that l = m this basis is the set B).

1.33 Corollary: For a vector space U in F n, any two bases for U have the same number of elements.

1.34 Definition: For a vector space U in F n, we define the dimension of U , denoted by dim U , to be the number of elements in any basis for U . For an affine space P = p + U in F n, we define the dimension of P to be dim P = dim U .

Chapter 2. Solving Systems of Linear Equations

Systems of Linear Equations

2.1 Definition: Let R be a ring and let n ∈ Z+. A linear equation in Rn (or a linear equation in the variables x1, x2, · · · , xn over R) is an equation of the form a1 x1 + a2 x2 + · · · + an xn = b with a = (a1, a2, · · · , an)T ∈ Rn and b ∈ R. The numbers ai are called the coefficients of the equation. A solution to the above equation is a point x = (x1, x2, · · · , xn)T ∈ Rn for which the equation holds. The solution set of the equation is the set of all solutions. Note that x = 0 is a solution if and only if b = 0, and in this case we say that the equation is homogeneous.

2.2 Example: Let F be a field, let a = (a1, a2, · · · , an)T ∈ F n, let b ∈ F , and let S be the solution set of the linear equation a1 x1 + a2 x2 + · · · + an xn = b. Show that either S = ∅ (the empty set) or S is an affine space in F n.

Solution: If a = 0 and b ≠ 0 then the equation has no solution, so we have S = ∅. If a = 0 and b = 0 then every x ∈ F n is a solution, so S = F n. If a ≠ 0 then we can choose an index k such that ak ≠ 0, and then we have
Σ_{i=1}^n ai xi = b ⇐⇒ xk = b/ak − Σ_{i≠k} (ai/ak) xi
⇐⇒ x = (x1, x2, · · · , xk−1, b/ak − Σ_{i≠k} (ai/ak) xi, xk+1, · · · , xn)T
⇐⇒ x = p + Σ_{i≠k} xi ui
where p = (b/ak) ek and ui = ei − (ai/ak) ek, so that S = p + Span { ui | i ≠ k }.

2.3 Definition: Let R be a ring and let n, m ∈ Z+. A system of m linear equations in Rn (or in n variables over R) is a set of m equations of the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm
where each aij ∈ R and each bi ∈ R. The numbers aij are called the coefficients of the system. A solution to the above system is a point x = (x1, x2, · · · , xn)T ∈ Rn for which all of the m equations hold. The solution set of the system is the set of all solutions. Note that x = 0 is a solution if and only if b = 0, and in this case we call the system of linear equations homogeneous.

2.4 Remark: We shall describe an algorithmic method for solving a given system of linear equations over a field F . We shall see that if the solution set is not empty then it is an affine space in F n.
The algorithm involves performing the following operations on the equations in the system. These operations do not change the solution set. (1) Ei ↔ Ej : interchange the ith and j th equations, (2) Ei 7→ t Ej : multiply (both sides of) the ith equation by t where 0 6= t ∈ F , and (3) Ei 7→ Ei + t Ej : add t times (each side of) the j th equation to (the same side of) the ith equation Ei , where t ∈ F . For now, we illustrate the algorithm in a particular example. 2.5 Example: Consider the system of linear equations over the field Q. 2x1 − x2 + 3x3 = 7 x1 + 0x2 + 2x3 = 5 3x1 − 4x2 + 2x3 = 3 Show that the solution set is an affine space in Q3 . Solution: Performing operations of the above three types, we have E1 ↔ E2 x1 + 0x2 + 2x3 = 5 x1 + 0x2 + 2x3 = 5 E2 7→ E2 − 2E1 0x1 − x2 − x3 = −3 2x1 − x2 + 3x3 = 7 E3 → 7 E3 − 3E1 0x1 − 4x2 − 4x3 = −12 3x1 − 4x2 + 2x3 = 3 E2 7→ −E2 x1 + 0x2 + 2x3 = 5 x1 + 0x2 + 2x3 = 5 0x1 + x2 + x3 = 3 E3 7→ E3 + 4E2 0x1 + x2 + x3 = 3 0x1 − 4x2 − 4x3 = −12 0x1 + 0x2 + 0x3 = 0 Thus the original system of 3 equations has the same solution set as the system x1 + 2x3 = 5 x2 + x3 = 3. If we let x3 = t, then x = (x1 , x2 , x3 )T is a solution when x1 = 5 − 2t, x2 = 3 − t and x3 = t, that is when x = (5, 3, 0)T +t(−2, −1, 1)T . Thus the solution set is the line through p = (5, 3, 0)T in the direction of u = (−2, −1, 1)T . 11 Matrix Notation 2.6 Remark: We wish to introduce some notation which will simplify our discussion of systems of linear equations. In fact the objects that we introduce will turn out to be of interest in their own right. 2.7 Definition: Let F be a field. An m × n matrix over F is an array of the form a11 a12 · · · a1n a21 a22 · · · a2n A= .. .. ... . . am1 am2 ··· amn where each aij ∈ F . The number aij is called the (i, j) entry of the matrix A, and we write Aij = aij . The set of all m × n matrices over F is denoted by Mm×n (F ). The set of n × n matrices is also denoted by Mn (F ). Note that F n = Mn×1 (F ) , Mn (F ) = Mn×n (F ) , and F = M1 (F ) . The j th column of the above matrix A ∈ Mm×n (F ) is the vector (a1j , a2j , · · · , amj )T ∈ F m . The j th row of A is the vector (aj1 , aj2 , · · · , ajn )T ∈ F n . Given vectors u1 , u2 , · · · , un ∈ F m , the matrix with columns u1 , u2 , · · · , un is the matrix A = u1 , u2 , · · · , un ∈ Mm×n (F ) and given vectors v1 , v2 , · · · , vm ∈ F n , the matrix with rows v1 , v2 , · · · , vm is the matrix T v1 v2 T A= ... ∈ Mm×n (F ) . vm T Given a matrix A ∈ Mm×n (F ), the transpose of A is the matrix AT ∈ Mn×m (F ) with entries (AT )ij = Aji = aji , that is a11 a21 · · · am1 a12 a22 · · · am2 AT = . .. .. ... . . a1n a2n ··· amn The m × n zero matrix is the matrix 0 ∈ Mm×n (F ) whose entries are all equal to 0. The n × n identity matrix is the matrix I ∈ Mn (F ) with columns e1 , e2 , · · · , en . Given a matrix a = (a1 , a2 , · · · , an ) ∈ M1×n (F ) and a vector x = (x1 , x2 , · · · , xn )T ∈ F n , we define the product ax ∈ F to be x1 n x2 P ax = a1 , a2 , · · · , an = ... i=1 ai xi = a1 x1 + a2 x2 + · · · + an xn . xn 12 More generally, given a matrix A ∈ Mm×n (F ) with entries Aij = aij , and a vector x ∈ F n with entries xi ∈ F , we define the product Ax ∈ F m to be a11 x1 + a12 x2 + · · · + a1n xn x1 a11 a12 · · · a1n a21 a22 · · · a2n x2 a21 x1 + a22 x2 + · · · + a2n xn . . = Ax = .. .. .. ... . . . .. am1 am2 ··· am1 x1 + am2 x2 + · · · + amn xn xn amn Equivalently, we define Ax to be the vector in F m with entries n P (Ax)j = aji xi . 
i=1 Using this notation, notice that for a = (a1 , a2 , · · · , an ) ∈ M1×n (F ) and b ∈ F , the single linear equation a1 x1 + a2 x2 + · · · an xn = b can be written simply as ax = b, and for A ∈ Mm×n (F ) with entries Aij = aij and b = (b1 , b2 , · · · , bn )T ∈ F n , the system of equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. . am1 x1 + am2 x2 + · · · + amn xn = bk can be written simply as Ax = b. 2.8 Note: For vectors v1 , v2 , · · · , vm ∈ F n and for x ∈ F n , if A is the matrix with rows v1 , v2 , · · · , vm then the product Ax is defined so that we have T T v1 x1 v1 x 1 v2 T x 2 v2 T x 2 . Ax = .. ... ... = . vm T vm T x n xn For vectors u1 , u2 , · · · , un ∈ F m and x ∈ F n , if A is the matrix with columns u1 , u2 , · · · , un , notice that we have x1 x2 Ax = u1 , u2 , · · · , un ... = x1 u1 + x2 u2 + · · · + xn un ∈ Span {u1 , u2 , · · · , un }. xn 2.9 Note: For 0 ∈ Mm×n (F ), I ∈ Mn (F ) and x ∈ F n we have 0x = 0 and Ix = x. 2.10 Theorem: (Linearity) For A ∈ Mm×n (F ), we have (1) A(tx) = t Ax for all t ∈ F and x ∈ F n , and (2) A(x + y) = Ax + Ay for all x, y ∈ F n . Proof: For t ∈ F and for x, y ∈ F n we have A(tx) j = n P Aji (tx)i = t i=1 n P i=1 Aji xi = tAx j n n n n P P P P and A(x + y) j = Aji (x + y)i = Aji (xi + yi ) = Aji xi + Aji yi = Ax + Ay j . i=1 i=1 i=1 13 i=1 Row Equivalence and Reduced Row Echelon Form 2.11 Definition: Given a matrix A ∈ Mm×n (F ) with entries Aij = aij and a vector b ∈ F n , we obtain the system of linear equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. . am1 x1 + am2 x2 + · · · + amn xn = bn which we write simply as Ax = b. The matrix A is called the coefficient matrix of the system, and the matrix A | b ∈ Mk×n+1 (F ) is called the augmented matrix of the system. The solution set of the equation Ax = b is the set x ∈ F n Ax = b . 2.12 Definition: We noted earlier that we can perform three kinds of operations on the equations in the system without changing the solution set. These correspond to the following three kinds of operations that we can perform on the rows of the augmented matrix without changing the solution set. (1) Ri ↔ Rj : interchange rows i and j, (2) Ri → 7 t Ri : multiply the ith row by t, where 0 6= t ∈ F , and (3) Ri → 7 Ri + t Rj : add t times the j th row to the ith row, where t ∈ F . These three kinds of operations will be called elementary row operations. Given matrices A, B ∈ Mm×n (F ), we say that A and B are row equivalent, and we write A ∼ B, when B can be obtained by applying a finite sequence of elementary row operations to A. 2.13 Remark: In the next section, we describe an algorithm for finding the solution set to a given matrix equation Ax = b. We shall use elementary row operations to construct a sequence of augmented matrices (A|b) = (A0 |b0 ) ∼ (A1 |b1 ) ∼ (A2 |b2 ) ∼ · · · ∼ (Al , bl ) = (R, c) such that each equation Ak x = bk has the same solution set as the original equation Ax = b, and such that the final matrix Al = R is in a particularly nice form so that we can easily determine the solution set. We shall find that when the solution set is non-empty, we can write the solutions in the form x = p + t1 u1 + t2 u2 + · · · + tr ur . 
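The three elementary row operations of Definition 2.12 are easy to mechanize. The following minimal Python sketch (the helper names are ours; exact fractions are used so no rounding occurs) applies the same sequence of operations used in Example 2.5 to the augmented matrix of that system and reproduces the reduced system found there.

```python
from fractions import Fraction

def swap(M, i, j):              # R_i <-> R_j
    M[i], M[j] = M[j], M[i]

def scale(M, i, t):             # R_i -> t R_i, with t != 0
    M[i] = [t * x for x in M[i]]

def add_multiple(M, i, j, t):   # R_i -> R_i + t R_j
    M[i] = [x + t * y for x, y in zip(M[i], M[j])]

# Augmented matrix (A|b) of the system in Example 2.5.
M = [[Fraction(v) for v in row] for row in
     [[2, -1, 3, 7],
      [1,  0, 2, 5],
      [3, -4, 2, 3]]]

swap(M, 0, 1)                 # E1 <-> E2
add_multiple(M, 1, 0, -2)     # E2 -> E2 - 2 E1
add_multiple(M, 2, 0, -3)     # E3 -> E3 - 3 E1
scale(M, 1, -1)               # E2 -> -E2
add_multiple(M, 2, 1, 4)      # E3 -> E3 + 4 E2

for row in M:
    print([str(x) for x in row])
# Rows: [1, 0, 2, 5], [0, 1, 1, 3], [0, 0, 0, 0],
# i.e. the reduced system x1 + 2x3 = 5, x2 + x3 = 3 from Example 2.5.
```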
2.14 Definition: A matrix A ∈ Mm×n (F ) is said to be in reduced row echelon form when A = 0 or there exist column indices 1 ≤ j1 ≤ j2 ≤ · · · ≤ jr ≤ n, where 1 ≤ r ≤ n, such that for each row index k with 1 ≤ k ≤ m we have (1) (2) (3) (4) Ak,jk = 1, for each j < jk we have Akj = 0, for each i < k we have Ai,jk = 0, and for all i > r and all j we have Aij = 0. The entries Ak,jk = 1 are called the pivots, the positions (k, jk ) where they occur are called the pivot positions, the columns j1 , j2 , · · · , jr of A are called the pivot columns, and the number of pivots r is called the rank of the reduced row echelon matrix A. 14 2.15 Example: Consider the augmented matrix 1 −2 1 0 −3 0 2 0 0 0 1 1 0 −1 (A|b) = 0 0 0 0 0 1 −2 0 0 0 0 0 0 0 4 1 . 3 0 Note that the matrix A is in reduced row echelon form with the pivots in positions (1, 1), (2, 4) and (3, 6). The corresponding system of equations is x1 − 2x2 + x3 − 3x5 + 2x7 = 4 x4 + x5 − x7 = 1 x6 − 2x7 = 3 The solutions x can be obtained by letting x2 = t1 , x3 = t2 , x5 = t3 and x7 = t4 , with t1 , t2 , t3 , t4 ∈ F arbitrary, and then solving for x1 , x4 and x6 to get x1 = 4 + 2t1 − t2 + 3t3 − 2t4 x4 = 1 − t3 + t4 x6 = 3 + 2t4 Thus the solution set is the set of points of the form x = p + t1 u1 + t2 u2 + t3 u3 + t4 u4 where −2 3 −1 2 4 0 0 0 1 0 0 0 1 0 0 p = 1 , u1 = 0 , u2 = 0 , u3 = −1 , u4 = 1 . 0 1 0 0 0 2 0 0 0 3 1 0 0 0 0 We can also write this as x = p + Bt where B is the matrix with columns u1 , u2 , u3 , u4 . 2.16 Note: In general, suppose that A ∈ Mm×n (F ) is in reduced row echelon form with pivot column indices 1 ≤ j1 ≤ j2 ≤ · · · ≤ jr ≤ n. Let 1 ≤ l1 ≤ l2 ≤ · · · ≤ ln−r ≤ n be the non-pivot column indices. Write R c A|b = 0 d where R ∈ Mr×n (F ) is the matrix whose rows are the non-zero rows of A and where c ∈ F r and d ∈ F m−r . Then the equation Ax = b has a solution if and only if d = 0, and in this case, as in the above example, the solution is given by x = p + Bt where p ∈ F n and B ∈ Mn×m−r (F ) are given by pJ = c , pL = 0 , BJ = −AL , and BL = I where pJ = (pj1 , pj2 , · · · , pjr )T , pL = (pl1 , pl2 , · · · , plm−r )T , BJ is the matrix whose rows are the rows j1 , j2 , · · · , jr of B, BL is the matrix whose rows are the rows l1 , l2 , · · · , lm−r of B, and AL is the matrix whose columns are the columns l1 , l2 , · · · , lm−r of A. 15 Gauss-Jordan Elimination 2.17 Theorem: (Gauss-Jordan Elimination) Let A ∈ Mm×n (F ) and let b ∈ F m , and consider the equation Ax = b. If A = 0 and b = 0 then every x ∈ F n is a solution so the solution set is Fn . If A = 0 and b 6= 0 then there is no solution, so the solution set is ∅. If A 6= 0 then we can perform a series of elementary row operations to obtain a sequence of augmented matrices R c (A|b) = (A0 |b0 ) ∼ (A1 |b1 ) ∼ (A2 |b2 ) ∼ · · · ∼ (Al |bl ) = 0 d where Al is in reduced row echelon form and R is the matrix whose rows are the non-zero rows of Al . If d 6= 0 then the solution set is empty and if d = 0 then the solution set is the affine space x = p + Bt, as described in Note 2.16 above. Proof: Suppose that A 6= 0. We describe an algorithm to obtain the required sequence of augmented matrices. Step 1: choose the smallest index j = j1 such that the j th column of A is not zero, then choose the smallest index i = i1 such that aij 6= 0. Perform the row operations Ri 7→ a1ij Ri then R1 ↔ Ri to obtain a new augmented matrix (A0 |b0 ) whose first j − 1 columns are all zero, with A0 1j = 1. 
For each i > 1 perform the row operation Ri 7→ Ri − A0 ij R1 to obtain a new augmented matrix (A1 |b1 ) whose first j − 1 columns are zero and whose j th column is e1 . Step s + 1: suppose that we have performed the first s steps in the algorithm and have obtained an augmented matrix (As , bs ) which is row-equivalent to (A|b) and which has the property that there exist column indices 1 ≤ j1 ≤ j2 ≤ · · · ≤ js ≤ n such that for each row index k with 1 ≤ k ≤ s we have (1) (2) (3) (4) (As )k,jk = 1, for each j < jk we have (As )kj = 0, for each i < k we have (As )i,jk = 0, and for all i > s and all j ≤ js we have (As )ij = 0. If s = m or if js = n then we are done because the matrix As is already in reduced row echelon form. Suppose that s < m and js < n. If for all i > s and all j > js we have (As )ij = 0 then we are done because the matrix As is already in row echelon form. Suppose that Aij 6= 0 for some i > s and some j > js . Let j = js+1 be the smallest index such that (As )ij 6= 0 for some i > s, then apply elementary row operations, involving only rows i with i > s, to the augmented matrix (As |bs ) to obtain a matrix (A0s |b0s ) with (A0s )s+1,j = 1; this can be done for example by choosing an index i > s such that (As )ij 6= 0 and then performing the row operation Ri 7→ (As1)ij Ri then (if i 6= s + 1) the operation Rs+1 ↔ Ri . Then for each i 6= s+1 perform the row operation Ri 7→ Ri −(A0s )ij Rs+1 to the augmented matrix (A0s |bs ) to obtain the row-equivalent augmented matrix (As+1 |bs+1 ). Verify that the new matrix satisfies the above 4 properties with s replaced by s + 1. 2.18 Definition: The algorithm for solving the system Ax = b described in the proof of the above theorem is called Gauss-Jordan elimination. 16 2.19 Example: Solve the system 3x + 2y + z = 4, 2x + y − z = 3, x + 2y + 4z = 1 in Q3 . Solution: We form the augmented matrix for the system then perform row operations to reduce to reduced row echelon form. 1 4 R → 7 R − R 1 1 2 3 2 1 1 1 3 2 1 −1 3 R2 7→ R2 − 2R1 A b = 2 1 −1 3 R3 7→ R3 − R1 1 2 4 1 1 2 4 1 R1 7→ R1 − R2 1 0 −3 2 1 1 2 1 0 1 5 −1 0 1 5 −1 R3 7→ − 31 R3 R3 7→ R3 − R2 0 0 −3 1 0 1 2 0 1 R1 7→ R1 − 3R3 1 0 0 1 0 −3 2 2 0 1 5 −1 R2 7→ R2 + 5R3 0 1 0 3 0 0 1 −1 0 0 1 − 13 3 From the final reduced matrix we see that the solution is given by (x, y, z) = 1, 32 , − 13 . 2.20 Remark: Note that in the above solution, at the first step we used the row operation R1 7→ R1 − R3 to obtain a pivot in position (1, 1), but we could have achieved this in many different ways. For example, we could have used the row operation R1 ↔ R3 or we could have used R1 7→ 13 R1 . 2.21 Example: Solve the system 2x + y + 3z = 1, 3x + y + 5z = 2, x − y + 3z = 0 in Q3 . Solution: Using Gauss-Jordan elimination, we have 1 1 2 0 1 2 1 3 1 (A|b) = 3 1 5 2 ∼ 3 1 5 2 ∼ 0 0 1 −1 3 0 1 −1 3 0 1 0 2 −1 1 2 0 1 ∼ 0 1 −1 1 ∼ 0 1 −1 1 0 0 0 −2 0 3 −3 1 1 1 1 2 1 5 −1 2 0 From the reduced matrix, we see that there is no solution. 2.22 Example: Solve the system x1 + 2x2 + x3 + 3x4 = 2, 2x1 + 3x2 + x3 + 4x4 + x5 = 5, x1 + 3x2 + 2x3 + 4x4 + x5 = 3 in Q5 . Solution: Using Gauss-Jordan elimination gives 1 2 1 3 0 2 1 2 (A|b) = 2 3 1 4 1 5 ∼ 0 1 1 3 2 4 2 3 0 1 1 0 −1 −1 2 4 1 ∼ 0 1 1 2 −1 −1 ∼ 0 0 0 0 1 −3 −2 0 1 1 1 3 0 2 2 −1 −1 1 2 1 0 −1 1 1 0 0 0 −1 2 0 5 3 1 −3 −2 From the reduced matrix we see that the solution set is the plane given by x = p + su + tv where p = (2, 3, 0, −2, 0)T , u = (1, −1, 1, 0, 0)T and v = (1, −5, 0, 3, 1)T . 17 Chapter 3. 
Matrices and Concrete Linear Maps The Row Space, Column Space and Null Space of a Matrix 3.1 Definition: Let R be a ring. For a matrix A ∈ Mm×n (R), the row span of A, denoted by Row(A), is the span of the rows of A, the column span of A, denoted by Col(A), is the span of the columns of A, and the null set of A, is the set Null(A) = x ∈ F n Ax = 0 . When F is a field, Row(A) and Col(A) are also called the row space and column space of A, and we define the rank of A and the nullity of A are the dimensions rank(A) = dim ColA and nullity(A) = dim NullA . n P 3.2 Note: For A = (u1 , u2 , · · · , un ) ∈ Mm×n (R) and t ∈ Rn we have At = ti ui , so i=1 Col(A) = At t ∈ Rn . 3.3 Theorem: Let F be a field, let A ∈ Mm×n (F ) and let b ∈ F m . If x = p is a solution to the equation Ax = b then x ∈ F n Ax = b = p + Null(A). Proof: If Ap = b then for x ∈ F n we have Ax = b ⇐⇒ Ax = Ap ⇐⇒ A(x − p) = 0 ⇐⇒ (x − p) ∈ NullA ⇐⇒ x ∈ p + Null(A). 3.4 Note: For A = {u1 , u2 , · · · , un } ⊆ F m and A = (u1 , u2 , · · · , un ) ∈ Mm×n (F ), A is linearly independent ⇐⇒ for all t1 , t2 , · · · , tn ∈ F , if n P ti ui = 0 then each ti = 0 i=1 ⇐⇒ for all t ∈ F n , if At = 0 then t = 0 ⇐⇒ Null(A) = {0} ⇐⇒ Null(R) = {0} I ⇐⇒ R has a pivot in every column ⇐⇒ R is of the form R = , and 0 A spans F m ⇐⇒ Col(A) = F m ⇐⇒ for every x ∈ F m there exists t ∈ F n with At = x ⇐⇒ for every y ∈ F m there exists t ∈ F n with Rt = y ⇐⇒ R has a pivot in every row. 3.5 Theorem: Let F be a field, let A = (u1 , u2 , · · · , un ) ∈ Mm×n (F ), and suppose A ∼ R where R is in reduced row echelon form with pivots in columns 1 ≤ j1 < j2 < · · · < jr ≤ n. Then (1) the non-zero rows of R form a basis for Row(A), (2) the set {uj1 , uj2 , · · · , ujr } is a basis for Col(A), and (3) when we solve Ax = b using Gauss-Jordan elimination and write the solution as x = p + Bt as in Note 2.16, the columns of B form a basis for Null(A). 18 Proof: First we prove Part (1). By Theorem 1.31, when we perform an elementary row operation on a matrix, the span of the rows is unchanged, and so we have Row(A) = Row(R). The nonzero rows of R span Row(R), so it suffices to show that the nonzero rows of R are linearly independent. Let 1 ≤ j1 < j2 < · · · < jr be the indices of the pivot columns in R. Let v1 , v2 , · · · , vr be the nonzero rows of R. Because R is in reduced row echelon form, for 1 ≤ i ≤ r and 1 ≤ k ≤ r we have (vi )jk = δi,k . It follows that r P {v1 , v2 , · · · , vr } is linearly independent because if ti vi = 0 with each ti ∈ F then for all i=1 k with 1 ≤ k ≤ r we have 0= r P i=1 t i vi jk = r P ti (vi )jk = i=1 r P ti δi,k = tk . i=1 To prove Part (2), let 1 ≤ l1 < l2 < · · · , ln−r ≤ n be the indices of the non-pivot columns. Let v1 , v2 , · · · , vn ∈ F m be the columns of R and note that we have vji = ei for 1 ≤ i ≤ r. When we use row operations to reduce A to R, the same row operations I reduce AJ = (uj1 , · · · , ujr ) to RJ = (vj1 , · · · , vjt ) = (e1 , · · · , er ) = . This shows 0 that {uj1 , · · · , ujr } is linearly independent. When we use row operations to reduce A to R, the same row operations will reduce (A|uk ) to (R|vk ), and so the equation Ax = uk has the same solutions as the equation Rx = vk . Since only the first r columns of R r r P P are nonzero, each column vk can be written as vk = (vk )i ei = (vk )i vji = Rt where i=1 i=1 t ∈ Rn is given by tJ = vk and tL = 0. Since Ax = uk and Rx = vk have the same r P solutions, we also have uk = At = (vk )i uji ∈ Span {uj1 , uj2 , · · · , ujr }. 
This shows that i=1 Col(A) = Span {u1 , u2 , · · · , un } = Span {uj1 , · · · , ujr }. Since the solution set to the equation Ax = b is the set {x ∈ Rn |Ax = b} = p + Col(B) = p + Null(A) we must have Col(B) = Null(A). Since (as in Note 2.16) we have BL = I, it follows that the columns of B are linearly independent using the same argument that we used in Part (1) to show that the nonzero rows of R are linearly independent. This proves Part (3). 3.6 Corollary: Let F be a field, let A ∈ Mm×n (F ), suppose that A is row equivalent to a reduced row echelon matrix which has r pivots. Then rank(A) = dim(RowA) = dim(ColA) = r , and nullity(A) = dim(NullA) = n − r. 3.7 Corollary: Let F be a field, let A ∈ Mm×n (F ) and suppose that A is row equivalent to a row reduced echelon matrix R. (1) The rows of A are linearly independent ⇐⇒ the columns of A span F m rank(A) = m ⇐⇒ R has a pivot in every row. ⇐⇒ (2) The rows of A span F n ⇐⇒ the columns of A are linearly independent ⇐⇒ I rank(A) = n ⇐⇒ R has a pivot in every column ⇐⇒ R is of the form R = . 0 (3) The rows of A form a basis for Rn ⇐⇒ the columns of A form a basis for F m ⇐⇒ rank(A) = m = n ⇐⇒ R = I. 19 Matrices and Linear Maps 3.8 Definition: Let R be a ring. A map L : Rn → Rm is called linear when (1) L(x + y) = L(x) + L(y) for all x, y ∈ Rn , and (2) L(tx) = t L(x) for all x ∈ Rn and all t ∈ R. 3.9 Note: Given a matrix A ∈ Mm×n (R), the map L : Rn → Rm given by L(x) = Ax is linear. 3.10 Theorem: Let L : Rn → Rm be linear. There exists a unique matrix A ∈ Mm×n (R) n such that L(x) = Ax for all x ∈ R , namely the matrix A = L(e1 ), L(e2 ), · · · , L(en ) . Proof: Let L : Rn → Rm and let A = (u1 , u2 , · · · , un ) ∈ Mm×n (R). If L(x) = Ax for all x ∈ R then for each index k we have uk = Aek = L(ek ). Conversely, suppose that uk = L(ek ) for every index k. Then for all x ∈ Rn we have n n n P P P L(x) = L xi ei = xi L(ei ) = xi ui = Ax . i=1 i=1 i=1 3.11 Notation: Often, we shall not make a notational distinction between the matrix A ∈ Mm×n (R) and its corresponding linear map A : Rn → Rm given by A(x) = Ax. When we do wish to make a distinction, we shall use the following notation. Given a matrix A ∈ Mm×n (R) we let LA : Rn → Rn be the linear map given by LA (x) = Ax for all x ∈ Rn and given a linear map L : Rn → Rm we let [L] be the corresponding matrix given by [L] = L(e1 ), L(e2 ), · · · , L(en ) ∈ Mm×n (R). 3.12 Definition: For a linear map L : Rn → Rm , the kernel (or the null set of L is the set Ker(L) = Null(L) = L−1 (0) = {x ∈ Rn |L(x) = 0} and the image (or the range of L) is the set Image(L) = Range(L) = L(Rn ) = {L(x)|x ∈ Rn }. We also use the same terminology for a matrix A ∈ Mm×n (R) when we think of the matrix as a linear map, so when A = [L] we have Ker(L) = Null(L) = Ker(A) = Null(A) and Image(L) = Range(L) = Image(A) = Range(A) = Col(A). When F is a field and L : F n → F m is linear, we define the rank and the nullity of L to be the dimensions rank(L) = dim Range(L) and nullity(L) = dim Null(L) . 3.13 Theorem: Let R be a ring and let L : Rn → Rm be a linear map. Then (1) L is surjective if and only if Range(L) = F m , and (2) L is injective if and only if Null(L) = {0}. Proof: Part (1) is obvious, so we only prove Part (2). Note that since L is linear we have L(0) = L(0 · 0) = 0 L(0) = 0 and so 0 ∈ Null(L). Suppose that L is injective. Then for x ∈ Rn we have x ∈ Null(L) =⇒ L(x) = 0 =⇒ L(x) = L(0) =⇒ x = 0 so Null(L) = {0}. Conversely, suppose that Null(L) = {0}. 
Then for x, y ∈ Rn we have L(x) = L(y) =⇒ L(x − y) = 0 =⇒ (x − y) ∈ Null(L) = {0} =⇒ x − y = 0 =⇒ x = y and so L is injective. 20 3.14 Example: The identity map on Rn is the map I : Rn → Rn given by I(x) = x for all x ∈ Rn , and it corresponds to the identity matrix I ∈ Mn (R) with entries Ii,j = δi,j . The zero map O : Rn → Rm given by O(x) = 0 for all x ∈ Rn corresponds to the zero matrix O ∈ Mm×n (R) with entries Oi,j = 0 for all i, j. 3.15 Note: Given linear maps L, M : Rn → Rm and K : Rm → Rl and given t ∈ R, the maps (L + M ) : Rn → Rn , tL and KL : Rn → Rl given by (L + M )(x) = L(x) + M (x), (tL)(x) = t L(x) and KL : Rn → Rl are all linear. For example, to see that KL is linear, note that for x, y ∈ Rn and t ∈ R we have (KL)(x + y) = K L(x + y) = K L(x) + L(y) = K L(x) + K L(y) = (KL)(x) + (KL)(y) , and KL(tx) = K L(tx) = K t L(x) = t K L(x) = t(KL)(x). 3.16 Definition: Given A, B ∈ Mm×n (R) we define A + B ∈ Mm×n (R) to be the matrix such that (A + B)(x) = Ax + Bx for all x ∈ Rn . Given A ∈ Mm×n (R) and t ∈ R, we define tA ∈ Mm×n (R) to be the matrix such that (tA)(x) = t Ax for all x ∈ Rn . Given A ∈ Ml×m (R) and B ∈ Mm×n (R) we define AB ∈ Ml×n (R) to be the matrix such that (AB)x = A(Bx) for all x ∈ Rn . 3.17 Note: From the above definitions, it follows immediately that for all matrices A, B, C of appropriate sizes and for all s, t ∈ R, we have (1) (A + B) + C = A + (B + C), (2) A + B = B + A, (3) O + A = A = A + O, (4) A + (−A) = 0, (5) (AB)C = A(BC), (6) IA = A = AI, (7) OA = O and AO = O, (8) (A + B)C = AC + BC and A(B + C) = AB + AC, (9) s(tA) = (st)A, (10) if R is commutative then A(tB) = t(AB), (11) (s + t)A = sA + tA and t(A + B) = tA + tB, and (12) 0A = O, 1A = A and (−1)A = −A. In particular, the set Mn (R) is a ring under addition and multiplication of matrices. 3.18 Theorem: For A, B ∈ Mm×n (R) and t ∈ R, the matrices A + B and tA are given by (A + B)i,j = Ai,j + Bi,j and (tA)i,j = t Ai,j . For A = (u1 , u2 , · · · , ul )T ∈ Ml×m (R) and B = (v1 , v2 , · · · , vn ) ∈ Mm×n (R), the matrix AB is given by (AB)j,k = vj T uk = m P Aj,i Bi,k . i=1 Proof: For A, B ∈ Mm×n (R), the k th column of (A + B) is equal to (A + B)ek = Aek + Bek which is the sum of the k th columns of A and B. It follows that (A + B)j,k = Aj,k + Bj,k for all j, k. Similarly for t ∈ R, the k th column of tA is equal to (tA)ek = t Aek which is t times the k th column of A. Now let A = (u1 , · · · , ul )T ∈ Ml×m (R) and B = (v1 , · · · , vn ) ∈ Mm×n (R). The k th column of (AB) is equal to (AB)ek = A(Bek ) = Avk , so the (j, k) entry of AB is equal to (AB)j,k = vj T uk = Aj,1 , Aj,2 , · · · , Aj,m 21 The Transpose and the Inverse 3.19 Definition: For a linear map L : Rn → Rm the transpose of L is the map LT : Rm → Rn such that [LT ] = [L]T . 3.20 Note: When R is a ring, for A ∈ Mm×n (R) we have Row(A) = Col(AT ) and Col(A) = Row(AT ). When F is a field, for A ∈ Mm×n (F ) we have rank(A) = rank(AT ) and for a linear map L : Rn → Rm we have rank(L) = rank(LT ). 3.21 Definition: For linear maps L : Rn → Rm and M : Rm → Rn , when LM = I where I : Rm → Rm we say that L is a left inverse of M and that M is a right inverse of L, and when LM = I and M L = I we say that L and M are (two-sided) inverses of each other. When L : Rn → Rm has a (two-sided) inverse M : Rm → Rn we say that L is invertible. We use the same terminology for matrices A ∈ Mm×n (R) and B ∈ Mn×m (R). 3.22 Theorem: Let R be a ring, let A ∈ Mm×n (R) and B ∈ Mn×m (R). If B is a left inverse of A and C is a right inverse of A then B = C. 
A similar result holds for linear maps L : Rn → Rm and K, M : Rm → Rn . Proof: Suppose that BA = I and that AC = I. Then B = BI = B(AC) = (BA)C = IC = C . 3.23 Theorem: Let R be a commutative ring. (1) For A, B ∈ Mm×n (R) and t ∈ R we have (AT )T = A , (A + B)T = AT + B T and (tA)T = t AT . A similar result holds for linear maps L, M : Rn → Rm . (2) If A ∈ Ml×m (R) and B ∈ Mm×n (R) then (AB)T = B TAT . A similar result holds for linear maps L : Rl → Rm and M : Rm → Rn . (3) For invertible matrices A, B ∈ Mn (R) and for an invertible element t ∈ R we have (A−1 )−1 , (tA)−1 = 1 t A−1 and (AB)−1 = B −1 A−1 . A similar result holds for invertible linear maps L, M : Rn → Rn . Proof: We leave the proof of Part (1) as an exercise. To prove Part (2), suppose that R is commutative and let A ∈ Ml×m (R) and B ∈ Mm×n (R). Then for all indices j, k we have (AB)T j,k = (AB)k,j = m P i=1 Ak,i Bi,j = m P Bi,j Ak,i = i=1 m P B T j, iAT i,k = (B TAT )j,k . i=1 To prove Part (3), let A, B ∈ Mn (R) be invertible matrices and let t ∈ R be an invertible element. Because AA−1 = I and A−1A = I, it follows that (A−1 )−1 = A. Because (tA) 1t A = t · 1t AA−1 = 1 · I = I and similarly 1t A tA = I, it follows that (tA)−1 = 1t A. Because (AB)(B −1A−1 ) = A(BB −1 )A−1 = AA−1 = I and similarly B −1A−1 )(AB) = I, it follows that (AB)−1 = B −1A−1 . 22 3.24 Theorem: Let F be a field and let A ∈ Mm×n (F ), (1) (2) (3) (4) A is surjective ⇐⇒ A has a right inverse matrix, A is injective ⇐⇒ A has a left inverse matrix, if A is bijective then n = m and A has a (two-sided) inverse matrix, and when n = m, A is bijective ⇐⇒ A is surjective ⇐⇒ A is injective. A similar result holds for a linear map L : Rn → Rm . Proof: We prove Part (1). Suppose first that A has a right inverse matrix, say AB = I with B ∈ Mn×n (F ). Then given y ∈ F m we can choose x ∈ F n to get Ax = A(By) = (AB)y = Iy = y. Thus A is surjective. Conversely, suppose that A is surjective. For each index k ∈ {1, 2, · · · , m}, choose uk ∈ F n so that Auk = ek , and then let B = (u1 , u2 , · · · , um ) ∈ Mn×m (F ). Then we have AB = A(u1 , u2 , · · · , um ) = Au1 , Au2 , · · · , Aum = (e1 , e2 , · · · , em ) = I . To prove Part (2), suppose first that A has a left inverse matrix, say BA = I with B ∈ Mn×n (F ). Then for x ∈ F n we have Ax = 0 =⇒ B(Ax) = 0 =⇒ (BA)x = 0 =⇒ Ix = 0 =⇒ x = 0 and so Null(A) = {0}. Thus A is injective. Conversely, suppose that A is injective. Then Null(A) = {}, so the columns of A are linearly independent, hence the rows of A span F n , equivalently the columns of AT span F n , hence Range(AT ) = F n and so AT is surjective. Since AT is surjective, we can choose C ∈ Mm×n (F ) so that AT C = I. Let B = C T so that ATB T = I. Transpose both sides to get BA = I T = I. Thus the matrix B is a left inverse of A. Parts (3) and (4) follow easily from Parts (1) and (2) together with previous results (namely Note 3.4, Corollary 3.7 and Theorems 3.13 and 3.22). 3.25 Note: To obtain a right inverse of a given matrix A ∈ Mm×n (F ) using the method described in the proof of Part (1) of the above theorem, we can find vectors u1 , u2 , · · · , um ∈ F n such that Auk = ek for each index k by reducing each of the augmented matrices (A|ek ). 
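As a computational illustration of this method (a minimal Python sketch of ours, using a hypothetical invertible matrix over Q rather than one from the notes, and assuming exact arithmetic with fractions), we solve A uk = ek for each index k by Gaussian elimination and take the uk as the columns of B, then check that AB = I.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b for a square invertible A by elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)  # assumes A invertible
        M[col], M[pivot_row] = M[pivot_row], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                t = M[r][col]
                M[r] = [x - t * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

A = [[1, 2, 0],
     [0, 1, 1],
     [1, 0, 1]]   # a hypothetical invertible matrix over Q
n = len(A)
# Solve A u_k = e_k for each k and use the u_k as the columns of B.
columns = [solve(A, [1 if i == k else 0 for i in range(n)]) for k in range(n)]
B = [[columns[k][i] for k in range(n)] for i in range(n)]
AB = [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)] for i in range(n)]
print(AB == [[1 if i == k else 0 for k in range(n)] for i in range(n)])  # True: AB = I
```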
Since the same row operations which are used to reduce (A|e1 ) to the form (R|u1 ), (with R in reduced echelon form) will also reduce each of the augmented matrices (A|ek ) to the form (R|ek ), we can solve all of the equations Auk = ek simultaneously by reducing the matrix (A|I) = (A|e1 , e2 , · · · , em ) to the form (R|u1 , u2 , · · · , um ). 1 3 2 3.26 Example: Let A = 2 4 1 ∈ M3 (Q). Find A−1 . 1 1 0 Solution: We have 1 (A|I) = 2 1 1 ∼ 0 0 0 1 3 0 ∼ 0 −2 0 −2 1 1 1 0 0 3 2 1 3 1 −2 0 ∼ 0 1 2 −2 −2 −1 0 1 0 3 4 1 2 1 0 1 0 0 0 1 0 2 1 0 0 −3 −2 1 0 −2 −1 0 1 1 5 0 0 2 −1 2 1 0 − 12 1 − 23 0 1 1 −1 1 and so A−1 is equal to the matrix whichh appears on the right of the final matrix above. 23 Chapter 4. Determinants Permutations 4.1 Definition: A group is a set G together with an element e ∈ G, called the identity element, and a binary operation ∗ : G × G → G, where for a, b ∈ G we write ∗(a, b) as a ∗ b or often simply as ab, such that (1) ∗ is associative: (ab)c = a(bc) for all a, b, c ∈ G, (2) e is an identity: ae = a = ea for all a ∈ G, and (3) every a ∈ G has an inverse: for every a ∈ G there exists b ∈ G with ab = e = ba. A group G is called abelian when (4) ∗ is commutative: ab = ba for all a, b ∈ G. 4.2 Note: Let G be a group. Note that the identity element e ∈ G is the unique element that satisfies Axiom (2) in the above definition because if u ∈ G has the property that ua = a = au for all a ∈ G, then in the case that a = e we obtain e = ue = e. Also note given a ∈ G the element b which satisfies Axiom (3) above is unique because if ab = e and ca = e then we have b = eb = (ca)b = c(ab) = ce = e. 4.3 Definition: Let G be a group. Given a ∈ G, the unique element b ∈ G such that ab = e = ba is called the inverse of a and is denoted by a−1 (unless the operation in G is addition denoted by +, in which case the inverse of a is also called the negative of a and is denoted by −a). We write a0 = e and for k ∈ Z+ we write ak = aa · · · a (where the product involves k copies of a) and a−k = (ak )−1 . 4.4 Note: In a group G, we have the cancellation property: for all a, b, c ∈ G, if ab = ac (or if ca = ba) then b = c. Indeed, if ab = ac then b = eb = (a−1 a)b = a−1 (ab) = a−1 (ac) = (a−1 a)c = ec = c. 4.5 Example: If R is a ring under addition and multiplication then R is also an abelian group under addition. The identity element is 0 and the inverse of a ∈ R is −a. For example Zn , Z, Q, R and C are all abelian groups under addition. 4.6 Example: If R is a ring under addition and multiplication then the set R∗ = a ∈ R a is invertible is a group under multiplication. The identity element is 1 and the inverse of a ∈ R is a−1 . For example Z∗ = {1, −1}, Q∗ = Q \ {0}, R∗ = R \ {0} and C∗ = C \ {0} are all abelian groups under multiplication. For n ∈ Z+ , the group of units modulo n is the group Un = Zn ∗ = a ∈ Zn gcd(a, n) = 1 . The group of units Un is an abelian group under multiplication modulo n. When R is a ring (usually commutative), the general linear group GLn (R) is the group GLn (R) = A ∈ Mn (R) A is invertible . When n ≥ 2, the general linear group GLn (R) is a non-abelian group under matrix multiplication. 24 4.7 Definition: Let X be a set. The group of permutations of X is the group Perm (X) = f : X → X f is bijective under composition. The identity element is the identity map I : X → X given by I(x) = x for all x ∈ X. For n ∈ Z+ , the nth symmetric group is the group Sn = Perm {1, 2, · · · , n} . 
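The symmetric group of Definition 4.7 can be explored concretely. The following minimal Python sketch (our own representation: an element α of Sn is stored as the tuple (α(1), · · · , α(n))) composes and inverts permutations in S3; composition is written so that (αβ)(k) = α(β(k)), the convention used in these notes.

```python
from itertools import permutations

def compose(a, b):
    """(a b)(k) = a(b(k)): apply b first, then a."""
    return tuple(a[b[k] - 1] for k in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for k, image in enumerate(a, start=1):
        inv[image - 1] = k
    return tuple(inv)

identity = (1, 2, 3)
alpha = (2, 1, 3)   # swaps 1 and 2
beta  = (1, 3, 2)   # swaps 2 and 3

print(compose(alpha, beta))            # (2, 3, 1): sends 1 -> 2, 2 -> 3, 3 -> 1
print(compose(beta, alpha))            # (3, 1, 2): different, so S_3 is not abelian
print(compose(alpha, inverse(alpha)))  # (1, 2, 3): the identity
print(len(list(permutations(range(1, 4)))))  # 6 = 3! elements in S_3
```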
4.8 Definition: When a1 , a2 , · · · , al are distinct elements in {1, 2, · · · , n} we write α = (a1 , a2 , · · · , al ) for the permutation α ∈ Sn given by α(a1 ) = a2 , α(a2 ) = a3 , · · · , α(al−1 ) = al , α(al ) = a1 α(k) = k for all k ∈ / {a1 , a2 , · · · , al } . Such a permutation is called a cycle of length l or an l-cycle. 4.9 Note: We make several remarks. (1) We have e = (1) = (2) = · · · = (n). (2) We have (a1 , a2 , · · · , al ) = (a2 , a3 , · · · , al , a1 ) = (a3 , a4 , · · · , al , a1 , a2 ) = · · ·. (3) An l-cycle with l ≥ 2 can be expressed uniquely in the form α = (a1 , a2 , · · · , al ) with a1 = min{a1 , a2 , · · · , al }. (4) If α = (a1 , a2 , · · · , al ) then α−1 = (al , al−1 , · · · , a2 , a1 ) = (a1 , al , · · · , a3 , a2 ). (5) If n ≥ 3 then we have (12)(23) = (123) and (23)(12) = (132) so Sn is not abelian. 4.10 Definition: In Sn , given cycles αi with αi = (ai,1 , ai,2 , · · · , ai,li ), we say that the cycles αi are disjoint when all the elements ai,j ∈ {1, 2, · · · , n} are distinct. 4.11 Theorem: (Cycle Notation) Every α ∈ Sn can be written as a product of disjoint cycles. Indeed every α 6= e can be written uniquely in the form α = (a1,1 , · · · , a1,l1 )(a2,1 , · · · , a2,l2 ) · · · (am,1 , · · · , am,lm ) with m ≥ 1, each li ≥ 2, the elements ai,j all distinct, each ai,1 = min{ai,1 , ai,2 , · · · , ai,li } and a1,1 < a2,1 < · · · < am,1 . Proof: Let e 6= α ∈ Sn where n ≥ 2. To write α in the given form, we must take a1,1 to be the smallest element k ∈ {1, 2, · · · , n} with α(k) 6= k. Then we must have a1,2 = α(a1,1 ), a1,3 = α(a1,2 ) = α2 (a1,1 ), and so on. Eventually we must reach l1 such that a1,1 = αl1 (a1,1 ), indeed since {1, 2, · · · , n} is finite, eventually we find αi (a1,1 ) = αj (a1,1 ) for some 1 ≤ i < j and then a1,1 = α−i αi (a1,1 ) = α−i αj (a1,1 ) = αj−i (a1,1 ). For the smallest such l1 the elements a1,1 , · · · , a1,l1 will be disjoint since if we had a1,i = a1,j for some 1 ≤ i < j ≤ l1 then, as above, we would have αj−i (a11 ) = a11 with 1 ≤ j − i < l1 . This gives us the first cycle α1 = (a1,1 , a1,2 , · · · , a1,l1 ). If we have α = α1 we are done. Otherwise there must be some k ∈ {1, 2, · · · , n} with k∈ / {a1,1 , a1,2 , · · · , a1,l1 } such that α(k) 6= k, and we must choose a2,1 to be the smallest such k. As above we obtain the second cycle α2 = (a2,1 , a2,2 , · · · , a2,l2 ). Note that α2 must be disjoint from α1 because if we had αi (a2,1 ) = αj (a1,1 ) for some i, j then we would have a2,1 = α−i αi (a2,1 ) = α−i αj (a1,1 ) = αj−i (a1,1 ) ∈ {a1,1 , · · · , a1,l1 }. At this stage, if α = α1 α2 we are done, and otherwise we continue the procedure. 4.12 Definition: When a permutation e 6= α ∈ Sn is written in the unique form of the above theorem, we say that α is written in cycle notation. We usually write e as e = (1). 25 4.13 Theorem: (Even and Odd Permutations) In Sn , with n ≥ 2, (1) every α ∈ Sn is a product of 2-cycles, (2) if e = (a1 , b1 )(a2 , b2 ) · · · (al , bl ) then l is even, that is l = 0mod 2, and (3) if α = (a1 , b1 )(a2 , b2 ) · · · (al , bl ) = (c1 , d1 )(c2 , d2 ) · · · (cm , dm ) then l = m mod 2. Solution: To prove part (1), note that given α ∈ Sn we can write α as a product of cycles, and we have (a1 , a2 , · · · , al ) = (a1 , al )(a1 , al−1 ) · · · (a1 , a2 ) . We shall prove part (2) by induction. First note that we cannot write e as a single 2-cycle, but we can write e as a product of two 2-cycles, for example e = (1, 2)(1, 2). 
Fix l ≥ 3 and suppose, inductively, that for all k < l, if we can write e as a product of k 2-cycles the k must be even. Suppose that e can be written as a product of l 2-cycles, say e = (a1 , b1 )(a2 , b2 ) · · · (al , bl ). Let a = a1 . Of all the ways we can write e as a product of l 2-cycles, in the form e = (x1 , y1 )(x2 , y2 ) · · · (xl , yl ), with xi = a for some i, choose one way, say e = (r1 , s1 )(r2 , s2 ) · · · (rl , sl ) with rm = a and ri , si 6= a for all i < m, with m being as large as possible. Note that m 6= l since for α = (r1 , s1 ) · · · (rl , sl ) with rl = a and ri , si 6= a for i < l we have α(sl ) = a 6= sl and so α 6= e. Consider the product (rm , sm )(rm+1 , sm+1 ). This product must be (after possibly interchanging rm+1 and sm+1 ) of one of the forms (a, b)(a, b) , (a, b)(a, c) , (a, b)(b, c) , (a, b)(c, d) where a, b, c, d are distinct. Note that (a, b)(a, c) = (a, c, b) = (b, c)(a, b), (a, b)(b, c) = (a, b, c) = (b, c)(a, c), and (a, b)(c, d) = (c, d)(a, b) , and so in each of these three cases we could rewrite e as a product of l 2-cycles with the first occurrence of a being farther to the right, contradicting the fact that we chose m to be as large as possible. Thus the product (rm , sm )(rm+1 , sm+1 ) is of the form (a, b)(a, b). By cancelling these two terms, we can write e as a product of (l − 2) 2-cycles. By the induction hypothesis, (l − 2) is even, and so l is even. Finally, to prove part (3), suppose that α = (a1 , b1 ) · · · (al , bl ) = (c1 , d1 ) · · · (cm , dm ). Then we have e = α α−1 = (a1 , b1 ) · · · (al , bl )(cm , dm ) · · · (c1 , d1 ). By part (2), l + m is even, and so l = m mod 2. 4.14 Definition: For n ≥ 2, a permutation α ∈ Sn is called even if it can be written as a product of an even number of 2-cycles. Otherwise α can be written as a product of an odd number of 2-cycles, and then it is called odd. We define the sign (or the parity) of α ∈ Sn to be ( 1 if α is even, (−1)α = −1 if α is odd. 4.15 Note: Note that (−1)e = 1 and that for α, β ∈ Sn , we have (−1)αβ = (−1)α (−1)β −1 and (−1)α = (−1)α . Also note that when α is an l-cycle we have (−1)α = (−1)l−1 because (a1 , a2 , · · · , al ) = (a1 , a2 )(a2 , a3 ) · · · (al−1 , al ). 26 Multilinear Maps 4.16 Notation: Let R be a commutative ring. For positive integers n1 , n2 , · · · , nk , let k Y Rni = (u1 , u2 , · · · , uk ) each ui ∈ Rni . i=1 Note that Mn×k (R) = k Y Rn = (u1 , u2 , · · · , uk ) each ui ∈ Rn . i=1 4.17 Definition: For a map L : k Q Rni → Rm , we say that L is k-linear when for each i=1 index j ∈ {1, 2, · · · , k} and for all ui , v, w ∈ Rnj and all t ∈ R we have L(u1 , · · · , uj−1 , v + w, uj+1 , · · · , un ) = L(u1 , · · · , uj−1 , v, uj+1 , · · · , un ) + L(u1 , · · · , uj−1 , w, uj+1 , · · · , un ) , and L(u1 , · · · , uj−1 , tv, uj+1 , · · · , un ) = t L(u1 , · · · , uj−1 , uj , uj+1 , · · · , un ). 
For a k-linear map L : Mn×k (R) = k Q Rn → Rm we say that L is symmetric when for i=1 each index j ∈ {1, 2, · · · , k − 1} and for all ui , v, w ∈ Rn we have L(u1 , · · · , uj−1 , v, w, uj+2 , · · · , un ) = L(u1 , · · · , uj−1 , w, v, uj+2 , · · · , un ) or equivalently when for every permutation σ ∈ Sk and all ui ∈ Rn we have L(u1 , u2 , · · · , uk ) = L(uσ(1) , uσ(2) , · · · , uσ(n) ), and we say that L is skew-symmetric when for each index j ∈ {1, 2, · · · , k − 1} and for all ui , v, w ∈ Rn we have L(u1 , · · · , uj−1 , v, w, uj+2 , · · · , un ) = − L(u1 , · · · , uj−1 , w, v, uj+2 , · · · , un ) or equivalently when for every permutation σ ∈ Sk and all ui ∈ Rn we have L(u1 , u2 , · · · , uk ) = (−1)σ L(uσ(1) , uσ(2) , · · · , uσ(n) ), and we say that L is alternating when for each index j ∈ {1, 2, · · · , k − 1} and for all ui , v ∈ Rn we have L(u1 , · · · , uj−1 , v, v, uj+2 , · · · , un ) = 0. 4.18 Example: As an exercise, show that for every matrix A ∈ Mm×n (R), the map L : Rn × Rm → R given by L(x, y) = y TA x is 2-linear and, conversely, that given any 2-linear map L : Rn × Rm → R there exists a unique matrix A ∈ Mm×n (R) such that L(x, y) = y TA x for all x ∈ Rn and y ∈ Rm . 4.19 Theorem: Let R be a commutative ring. Let L : Mn×k = k Q Rn → Rm be k-linear. i=1 Then (1) if L is alternating then L is skew-symmetric, (2) if L is alternating then for all indices i, j ∈ {1, 2, · · · , k} with i < j and for all ui , v ∈ Rn we have L(u1 , · · · , ui−1 , v, ui+1 , · · · , uj−1 , v, uj+1 , · · · , un ) = 0, and (3) if 2 ∈ R∗ and L is skew-symmetric then L is alternating. 27 Proof: To prove Part (1), we suppose that L is alternating. Then for j ∈ {1, 2, · · · , k − 1} and ui , v, w ∈ Rn we have 0 = L(u1 , · · · , uj−1 , v + w, v + w, uj+2 , · · · , un ) = L(u1 , · · · , v, v, · · · , un ) + L(u1 , · · · , v, w, · · · , un ) + L(u1 , · · · , w, v, · · · , un ) + L(u1 , · · · , w, w, · · · , un ) = L(u1 , · · · , v, w, · · · , un ) + L(u1 , · · · , w, v, · · · , un ) and so L(u1 , · · · , v, w, · · · , un ) = − L(u1 , · · · , w, v, · · · , un ), hence L is skew-symmetric. To prove Part (2) we again suppose that L is alternating. Then, as shown immediately above, L is also skew-symmetric and so for indices i, j ∈ {1, 2, · · · , k} with i < j and for ui , v ∈ Rn , in the case that j > i + 1 we have L(u1 , · · · , ui−1 ,v, w, ui+2 , · · · , uj−1 , v, uj+1 , · · · , un ) = − L(u1 , · · · , ui−1 , v, v, ui+2 , · · · , uj−1 , w, uj+1 · · · , un ) = 0. Finally, to prove Part (3), suppose that 2 ∈ R∗ and that L is skew-symmetric. Then for an index j ∈ {1, 2, · · · , k − 1} and for ui , v ∈ Rn we have L(u1 , · · · , uj−1 , v, v, uj+2 , · · · , un ) = − L(u1 , · · · , uj−1 , v, v, uj+2 , · · · , un ) and so 2 L(u1 , · · · , uj−1 , v, v, uj+2 , · · · , un ) = 0. Since 2 ∈ R∗ we can multiply both sides by 2−1 to get L(u1 , · · · , uj−1 , v, v, uj+2 , · · · , uk ) = 0. 4.20 Theorem: Let R be a commutative ring. Given c ∈ R there exists a unique n Q alternating n-linear map L : Mn (R) = Rn → R such that L(I) = L(e1 , e2 , · · · , en ) = c. i=1 This unique map L is given by L(A) = c · X (−1)σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n , that is σ∈Sn L(u1 , u2 , · · · , un ) = c · X (−1)σ (u1 )σ(1) (u2 )σ(2) · · · (un )σ(n) . σ∈Sn Proof: First we prove uniqueness. Suppose that L : Mn (R) = and n-linear with L(I) = c. 
Then for all ui ∈ Rn we have L(u1 , u2 , · · · , un ) = L = P n (u1 )i1 ei1 , i1 =1 n X n P n Q (u2 )i2 ei2 , · · · , i2 =1 Rn → R is alternating i=1 n P (un )in ein in =1 (u1 )i1 (u2 )i2 · · · (un )in L(ei1 , ei2 , · · · , ein ). i1 ,i2 ,···,in =1 Note that because L is alternating, whenever we have eij = eik for some j 6= k, we obtain L(ei1 , ei2 , · · · , ein ) = 0, and so the only nonzero terms in the above sum occur when 28 i1 , i2 , · · · , in are distinct, so there is a permutation σ ∈ Sn with ij = σ(j) for all j. Thus X L(u1 , u2 , · · · , un ) = (u1 )σ(1) (u2 )σ(2) · · · (un )σ(n) L(eσ(1) , eσ(2) , · · · , eσ(n) ) σ∈Sn X = (u1 )σ(1) (u2 )σ(2) · · · (un )σ(n) (−1)σ L(e1 , e2 , · · · , en ) σ∈Sn =c· X (−1)σ (u1 )σ(1) (u2 )σ(2) · · · (un )σ(n) σ∈Sn This proves that there is a unique such map L and that it is given by the required formula. n Q To prove existence, it suffices to show that the map L : Mn (R) = Rn → R given i=1 by the formula X L(u1 , u2 , · · · , un ) = c · (−1)σ (u1 )σ(1) (u2 )σ(2) · · · (un )σ(n) . σ∈Sn is n-linear and alternating with L(I) = c. Note that this map L is n-linear because X L(u1 , · · · , v + w, · · · , un ) = c · (−1)σ (u1 )σ(1) · · · (v + w)σ(j) , · · · (un )σ(n) σ∈Sn =c· X σ (−1) (u1 )σ(1) · · · vσ(j) · · · (un )σ(n) + c σ∈Sn X (−1)σ (u1 )σ(1) · · · wσ(j) · · · (un )σ(n) σ∈Sn = L(u1 , · · · , v, · · · , un ) + L(u1 , · · · , w, · · · , un ) and similarly L(u1 , · · · , tv, · · · , un ) = t L(u1 , · · · , v, · · · , un ). Note that L is alternating because, given indices i, j ∈ {1, 2, · · · , n} with i < j, when ui = uj = v we have X L(u1 , · · · ,v, · · · , v, · · · , un ) = c · (−1)σ (u1 )σ(1) · · · vσ(i) · · · vσ(j) · · · (un )σ(n) σ∈Sn =c· X σ (−1) (u1 )σ(1) · · · vσ(i) · · · vσ(j) · · · (un )σ(n) σ∈Sn ,σ(i)<σ(j) + c· X (−1)τ (u1 )τ (1) · · · vτ (i) · · · vτ (j) · · · (un )τ (n) . τ ∈Sn ,τ (i)>τ (j) This is equal to 0 because the term in the first sum labeled by σ ∈ Sn with σ(i) < σ(j) can be paired with the term in the second sum labeled by τ = σ(i, j) (where σ(i, j) denotes the composite of σ with the 2-cycle (i, j)), and then the sum of the two terms in each pair is equal to 0 because (−1)τ = −(−1)σ . Finally note that X L(e1 , e2 , · · · , en ) = c · (−1)σ (e1 )σ(1) (e2 )σ(2) · · · (en )σ(n) σ∈Sn =c· X (−1)σ δ1,σ(1) δ2,σ(2) · · · δn,σ(n) = c σ∈Sn because the only nonzero term in the sum occurs when σ = e. 29 The Determinant 4.21 Definition: Let R be a commutative ring. The unique alternating n-linear map det : Mn (R) → R with det(I) = 1 is called the determinant map. For A ∈ Mn (R), the determinant of A, denoted by |A| or by det(A), is given by X |A| = det(A) = (−1)σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n . σ∈Sn 4.22 Example: As an exercise, find an explicit formula for the determinant of a 2 × 2 matrix and for the determinant of a 3 × 3 matrix. 4.23 Note: Given c ∈ R, according to the above theorem, the unique alternating n-linear map L : Mn (R) → R with L(I) = c is given by L(A) = c |A|. 4.24 Theorem: Let R be a commutative ring and let A, B ∈ Mn (R). Then (1) |AT | = |A|, and (2) |AB| = |A| |B|. Proof: To prove Part (1) note that X |A| = (−1)σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n σ∈Sn = X (−1)σ A1,σ−1 (1) A2,σ−1 (2) · · · An,σ−1 (n) σ∈Sn = X (−1)τ A1,τ (1) A2,τ (2) · · · An,τ (n) τ ∈Sn = X (−1)τ (AT )τ (1),1 (AT )τ (2),2 · · · (AT )τ (n),n = |AT |. τ ∈Sn To prove Part (2), fix a matrix A ∈ Mn (R) and define L : Mr (R) → R by L(B) = |AB|. 
Note that L is n-linear because L(u1 , · · · , v + w, · · · , un ) = A(u1 , · · · , v + w, · · · , un ) = (Au1 , · · · , A(v + w), · · · , Aun ) = (Au1 , · · · , Av + Aw, · · · , Aun ) = (Au1 , · · · , Av, · · · , Aun ) + (Au1 , · · · , Aw, · · · , Aun ) = A(u1 , · · · , v, · · · , un ) + A(u1 , · · · , w, · · · , un ) = L(u1 , · · · , v, · · · , un ) + L(u1 , · · · , w, · · · , un ). and similarly L(u1 , · · · , tv, · · · , un ) = t L(u1 , · · · , v, · · · , un ). Note that L is alternating because L(u1 , · · · , v, v, · · · , un ) = A(u1 , · · · , v, v, · · · , un ) = (Au1 , · · · , Av, Av, · · · , Aun ) = 0. Note that L(I) = AI = |A|. Thus, by Theorem 4.20 (see Note 4.23) it follows that L(B) = L(I)|B| = |A| |A|. 4.25 Definition: Let R be a commutative ring and let A ∈ Mn (R). We say that A is upper triangular when Aj,k = 0 for all j > k, and we say that A is lower-triangular when Aj,k = 0 for all j < k. 30 4.26 Theorem: Let R be a commutative ring and let A, B ∈ Mn (R). (1) If B is obtained from A by performing an elementary column operation then |B| is obtained from |A| as follows. (a) if we use Ck ↔ Cl with k 6= l then |B| = −|A|, (b) if we use Ck 7→ t Ck with t ∈ R then |B| = t |A|, and (c) if we use Ck 7→ Ck + tCl with t ∈ R and k 6= l then |B| = |A|. The same rules apply when B is obtained from A using an elementary row operation. n Q (2) If A is either upper-triangular or lower-triangular then |A| = Ai,i . i=1 Proof: If B is obtained from A using the column operation Ck ↔ Cl with k 6= l then |B| = −|A| because the determinant map is skew-symmetric. If B is obtained from A using Ck 7→ t Ck with t ∈ R then |B| = t|A| because the determinant map is linear. Suppose B is obtained from A using Ck 7→ Ck + t Cl where t ∈ R and k 6= l. Write A = (u1 , u2 · · · , un ) with each ui ∈ Rn . Then since the determinant map is n-linear and alternating we have |B| = (u1 , · · · , uk + tul , · · · , ul , · · · , un ) = (u1 , · · · , uk , · · · , ul , · · · , un ) + t (u1 , · · · , ul , · · · , ul , · · · , un ) = |A| + t · 0 = |A|. This proves Part (1) in the case of column operations. The same rules apply when using row operations because |AT | = |A|. To prove Part (2), suppose that A is upper-triangular (the case that A is lowertriangular is similar). We claim that for every σ ∈ Sn with σ 6= e we have σ(i) > i, hence Aσ(i),i = 0, for some i ∈ {1, 2, · · · , n}. Suppose, for a contradiction, that σ 6= e and σ(i) ≤ i for all indices i. Let k be the largest index for which σ(k) < k. Then we have σ(i) = i > k for all i > k and σ(k) < k and σ(i) ≤ i < k for all i < k. This implies that there is no index i for which σ(i) = k, but this is not possible since σ is surjective. This n P Q proves the claim. Thus |A| = (−1)σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n = Ai,i because the i=1 σ∈Sn only nonzero term in the above sum occurs when σ = e. 4.27 Example: The above theorem gives us a method that we can use to calculate determinants. For example, using only row operations of the form Rk → Rk + t Rl we have 1 2 3 1 3 4 5 1 2 1 4 5 4 1 3 2 2 0 −2 −3 = 1 0 −4 −2 3 0 −2 3 1 3 2 0 2 −1 = 0 0 2 0 0 2 4 1 3 2 4 1 3 2 4 −6 0 2 −1 5 0 2 −1 5 = = −11 0 −4 −2 −11 0 0 −4 −1 −1 0 −2 3 −1 0 0 2 4 4 1 3 2 4 1 0 2 −1 5 = = −28. 11 0 0 2 11 4 0 0 0 −7 4.28 Definition: Let R be a commutative ring and let A ∈ Mn (R) with n ≥ 2. We write A(j,k) to denote the (n − 1) × (n − 1) matrix which is obtained by removing the j th row and the k th column of A. 
The cofactor matrix of A is the matrix Cof(A) ∈ Mn (R) with entries Cof(A)k,l = (−1)k+l A(l,k) . 31 4.29 Theorem: Let R be a commutative ring and let A ∈ Mn (R) with n ≥ 2. (1) For each k ∈ {1, 2, · · · , n} we have |A| = n X (−1)j+k Aj,k A(j,k) = j=1 n X (−1)k+j Ak, j A(k, j) . j=1 (2) We have A · Cof(A) = |A| · I = Cof(A) · A. (3) A is invertible in Mn (R) if and only if |A| is invertible in R, and in this case we have |A−1 | = |A|−1 and 1 A−1 = |A| Cof(A). (4) If A is invertible then the unique solution to the equation Ax = b is the element x ∈ Rn with entries |Bk | xk = |A| where Bk is the matrix obtained by replacing the k th column of A by b. Proof: We have X |A| = (−1)σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n = σ∈Sn n X X (−1)σ Aσ(1),1 · · · Aσ(k−1),k−1 Aj,k Aσ(k+1),k+1 · · · Aσ(n),n j=1 σ∈Sn ,σ(k)=j = n X j=1 Aj,k X (−1)σ A(j,k) τ (1),1 · · · A(j,k) τ (k−1),k−1 A(j,k) τ (k),k · · · A(j,k) τ (k−1),k−1 σ∈Sn ,σ(k)=j where τ = τ (σ) ∈ Sn−1 is the permutation defined as follows: ( ( σ(i) if σ(i) < j, σ(i) if σ(i) < j, and if i > k τ (i − 1) = if i < k τ (i) = σ(i) − 1 if σ(i) > j, σ(i) − 1 if σ(i) > j, or equivalently, τ is the composite τ = (n, n − 1, · · · , j + 1, j) σ (k, k + 1, · · · , n − 1, n). Note that (−1)τ = (−1)n−j (−1)σ (−1)n−k and so we have (−1)σ = (−1)j+k (−1)τ . Thus |A| = n X Aj,k = (−1)j+k (−1)τ A(j,k) τ (1),1 · · · A(j,k) τ (n−1),n−1 τ ∈Sn−1 j=1 n X X (−1)j+k Aj,k A(j,k) . j=1 The proof that |A| = n P (−1)k+j Ak,j A(k,j) is similar (or it follows from the formula j=1 |AT | = |A|). This completes the proof of Part (1). 32 To prove Part (2) we note that n n X X Cof(A) · A = Cof(A)k,j Aj,l = (−i)k+j Aj,l A(j,k) . k,l j=1 j=1 By Part (1), the sum on the right is equal to the determinant of the matrix B (k,l) ∈ Mn (R) which is obtained from A by replacing the k th column of A by a copy of its lth column. Since B (k,l) is equal to A when k = l, and B (k,l) has two equal columns when k 6= l we have ( ) |A| if k = l, Cof(A) · A = B (k,l) = = |A| · δk,l . k,l 0 if k 6= l This proves that Cof(A) · A = |A| · I. A similar proof shows that A · Cof(A) = |A| · I. If A is invertible in Mn (R), then we have |A| |A−1 | = |A · A−1 | = |I| = 1 and similarly −1 |A | |A| = 1 and so |A| and |A−1 | are invertible in R with |A−1 | = |A|. Conversely, the formulas in Part (2) show that if |A| is invertible in R then A is invertible in Mn (R) with 1 Cof(A). This proves Part (3). A−1 = |A| Part (4) now follows from Parts (1) and (3). Indeed if A is invertible then the solution to Ax = b is given by x = A−1 b and so n P 1 1 xk = A−1 b k = |A| Cof(A) b k = |A| Cof(A)k,j bj j=1 = 1 |A| n P (−1)k+j bj A(j,k) = j=1 1 |A| |Bk | where Bk is the matrix obtained by replacing the k th column of A by b. 4.30 Definition: For a matrix A ∈ Mn (R), the first of the two sums in Part (1) of the above theorem is called the cofactor expansion of |A| along the k th column of A, and the second sum is called the cofactor expansion of |A| along the k th row of A. 4.31 Example: Using row operations of the form Rk 7→ Rk + t Rl , together with cofactor expansions along various columns, we have 2 5 1 2 4 1 2 0 0 3 3 4 2 1 5 4 3 3 2 6 2 2 1 1 1 = 1 0 2 2 −2 4 =− 2 2 1 3 4 2 1 −2 −5 −3 0 −2 −5 −3 1 2 3 1 0 2 3 1 =− =− 2 1 2 0 0 1 2 0 −2 −4 −6 −4 0 −4 −6 −4 4 1 4 4 −4 2 =− 2 6 −6 4 1 2 2 4 2 1 4 4 3 2 6 0 1 0 0 0 −4 −4 −4 8 0 1 2 =− =− = 16. −6 −2 −6 −2 0 −2 4.32 Example: From the formula in Part (2), if ad − bc 6= 0 then we have −1 1 d b a b . 
= c d ad − bc −c a As an exercise, find a similar formula for the inverse of a 3 × 3 matrix. 4.33 Corollary: Let R be a commutative ring and let A ∈ Mn (R). If A is invertible in Mn (R) then |A| is invertible in R and we have |A−1 | = |A|−1 . 33 Chapter 5. The Dot and Cross Products in Rn 5.1 Definition: Let F be a field. For vectors x, y ∈ F n we define the dot product of x and y to be n P x y = yT x = xi yi ∈ F . . i=1 5.2 Theorem: (Properties of the Dot Product) For all x, y, z ∈ Rn and all t ∈ R we have . . . . . . . . (1) (Bilinearity) (x + y) z = x z + y z , (tx) y = t(x y) x (y + z) = x y + x z , x (ty) = t(x y), (2) (Symmetry) x y = y x, and (3) (Positive Definiteness) x x ≥ 0 with x x = 0 if and only if x = 0. . . . . . . Proof: The proof is left as an exercise. 5.3 Definition: For a vector x ∈ Rn , we define the length (or norm) of x to be s n √ P xi 2 . |x| = x x = . i=1 We say that x is a unit vector when |x| = 1. 5.4 Theorem: (Properties of Length) Let x, y ∈ Rn and let t ∈ R. Then (1) (Positive Definiteness) |x| ≥ 0 with |x| = 0 if and only if x = 0, (2) (Scaling) |tx| = |t||x|, (3) |x ± y|2 = |x|2 ± 2(x y) + |y|2 . (4) (The Polarization Identities) x y = 12 |x + y|2 − |x|2 − |y|2 = 14 |x + y|2 − |x − y|2 , (5) (The Cachy-Schwarz Inequality) |x y| ≤ |x| |y| with |x y| = |x| |y| if and only if the set {x, y} is linearly dependent, and (6) (The Triangle Inequality) |x + y| ≤ |x| + |y|. . . . . Proof: We leave the proofs of Parts (1), (2) and (3) as an exercise, and we note that (4) follows immediately from (3). To prove part (5), suppose first that {x, y} is linearly dependent. Then one of x and y is a multiple of the other, say y = tx with t ∈ R. Then . . . |x y| = |x (tx)| = |t(x x)| = |t| |x|2 = |x| |tx| = |x| |y|. Suppose next that {x, y} is linearly independent. Then for all t ∈ R we have x + ty 6= 0 and so 0 6= |x + ty|2 = (x + ty) (x + ty) = |x|2 + 2t(x y) + t2 |y|2 . . . Since the quadratic on the right is non-zero for all t ∈ R, it follows that the discriminant of the quadratic must be negative, that is . and hence |x . y| < |x| |y|. This proves part (5). 4(x y)2 − 4|x|2 |y|2 < 0. . Thus (x y)2 < |x|2 |y|2 Using part (5) note that . . |x + y|2 = |x|2 + 2(x y) + |y|2 ≤ |x + y|2 + 2|x y| + |y|2 ≤ |x|2 + 2|x| |y| + |y|2 = |x| + |y| and so |x + y| ≤ |x| + |y|, which proves part (6). 34 2 5.5 Definition: For points a, b ∈ Rn , we define the distance between a and b to be dist(a, b) = |b − a|. 5.6 Theorem: (Properties of Distance) Let a, b, c ∈ Rn . Then (1) (Positive Definiteness) dist(a, b) ≥ 0 with dist(a, b) = 0 if and only if a = b, (2) (Symmetry) dist(a, b) = dist(b, a), and (3) (The Triangle Inequality) dist(a, c) ≤ dist(a, b) + dist(b, c). Proof: The proof is left as an exercise. 5.7 Definition: For nonzero vectors 0 6= x, y ∈ Rn , we define the angle between x and y to be x y −1 θ(x, y) = cos ∈ [0, π]. |x| |y| . . Note that θ(x, y) = π2 if and only if x y = 0. For vectors x, y ∈ Rn , we say that x and y are orthogonal when x y = 0. . 5.8 Theorem: (Properties of Angle) Let 0 6= x, y ∈ Rn . 
Then ( θ(x, y) = 0 if and only if y = tx for some t > 0, and (1) θ(x, y) ∈ [0, π] with θ(x, y) = π if and only if y = tx for some t < 0, (2) (Symmetry) θ(x, y) = θ(y, x), ( θ(x, y) if 0 < t ∈ R, (3) (Scaling) θ(tx, y) = θ(x, ty) = π − θ(x, y) if 0 > t ∈ R, (4) (The Law of Cosines) |y − x|2 = |x|2 + |y|2 − 2|x| |y| cos θ(x, y), (5) (Pythagoras’ Theorem) θ(x, y) = π2 if and only if |y − x|2 = |x|2 + |y|2 , and (6) (Trigonometric Ratios) if (y − x) x = 0 then cos θ(x, y) = |x| |y| and sin θ(x, y) = . . |y−x| |y| . Proof: The Law of Cosines follows from the identity |y − x|2 = |y|2 − 2(y x) + |x|2 and the definition of θ(x, y). Pythagoras’ Theorem is a special case of the Law of Cosines. We Prove Part (6). Let 0 6= x, y ∈ Rn and write θ = θ(x, y). Suppose that (y − x) x = 0. Then we have y x − x x = 0 so that x y = |x|2 , and so we have . . cos θ = . . . x y |x|2 |x| = = . |x| |y| |x| |y| |y| Also, by Pythagoras’ Theorem we have |x|2 + |y − x|2 = |y|2 so that |y|2 − |x|2 = |y − x|2 , and so |x|2 |y|2 − |x|2 |y − x|2 sin2 θ = 1 − cos2 θ = 1 − 2 = = . |y| |y|2 |y|2 Since θ ∈ [0, π] we have sin θ ≥ 0, and so taking the square root on both sides gives sin θ = |y − x| . |y| 5.9 Definition: For points a, b, c ∈ Rn with a 6= b and b 6= c we define 6 abc = θ(b − a, c − b). 35 Orthogonal Complement and Orthogonal Projection in Rn 5.10 Definition: Let F be a field and let U , V and W be subspaces of F n . Recall that U + V = u + v u ∈ U, v ∈ V is a subspace of F n . We say that W is the internal direct sum of U with V , and we write W = U ⊕ V , when W = U + V and U ∩ V = {0}. As an exercise, show that W = U ⊕ V if and only if for every x ∈ W there exist unique vectors u ∈ U and v ∈ V with x = u + v. 5.11 Definition: Let U ⊆ Rn be a subspace. We define the orthogonal complement of U in Rn to be U ⊥ = x ∈ Rn x u = 0 for all u ∈ U . . 5.12 Theorem: (Properties of the Orthogonal Complement) Let U ⊆ Rn be a subspace, let S ⊆ U and let A ∈ Mk×n (R). Then (1) If U = Span (S) then U ⊥ = x ∈ Rn x · u = 0 for all u ∈ S , (2) (RowA)T = NullA. (3) U ⊥ is a vector space, (4) dim(U ) + dim(U ⊥ ) = n (5) U ⊕ U ⊥ = Rn , (6) (U ⊥ )⊥ = U , (7) (NullA)⊥ = RowA. Proof: To prove part (1), let T = x ∈ Rn x u = 0 for all u ∈ S . Note that U ⊥ ⊆ T . n P Let x ∈ T . Let u ∈ U = Span (S), say u = ti ui with each ti ∈ R and each ui ∈ S. . . . n P n P i=1 . ti (x ui ) = 0. Thus x ∈ U ⊥ and so we have T ⊆ U ⊥ . i=1 i=1 x v1 . To prove part (2), let v1 , v2 , · · · , vn be the rows of A. Note that Ax = .. so x vn ⊥ we have x ∈ NullA ⇐⇒ x vi = 0 for all i ⇐⇒ x ∈ Span {v1 , v2 , · · · , vk } = (RowA)⊥ by part (1). Part (3) follows from Part (2) since we can choose the matrix A so that U = Row(A) and then we have U ⊥ = Null(A) which is a vector space in Rn . Part (4) also follows from part (2) since if we choose A so that RowA = U then we have dim(U ) + dim(U ⊥ ) = dim RowA + dim(RowA)⊥ = dim RowA + dim NullA = n. To prove part (5), in light of part (4), it suffices to show that U ∩ U ⊥ = {0}. Let x ∈ U ∩ U ⊥ . Since x ∈ U ⊥ we have x u = 0 for all u ∈ U . In particular, since x ∈ U we have x x = 0, and hence x = 0. Thus U ∩ U ⊥ = {0} and so U ⊕ U ⊥ = Rn . To prove part (6), let x ∈ U . By the definition of U ⊥ we have x v = 0 for all v ∈ U ⊥ . By the definition of (U ⊥ )⊥ we see that x ∈ (U ⊥ )⊥ . Thus U ⊆ (U ⊥ )⊥ . By part (4) we know that dim U + dim U ⊥ = n and also that dim U ⊥ + dim(U ⊥ )⊥ = n. It follows that dim U = n − dim U ⊥ = dim(U ⊥ )⊥ . Since U ⊆ (U ⊥ )⊥ and dim U = dim(U ⊥ )⊥ we have U = (U ⊥ )⊥ , as required. 
⊥ By parts (3) and (6) we have (NullA)⊥ = (RowA)⊥ = RowA, proving part (7). Then x u = x ti ui = . . . . . . 36 5.13 Definition: For a subspace U ⊆ Rn and a vector x ∈ Rn , we define the orthogonal projection of x onto U , denoted by ProjU (x), as follows. Since Rn = U ⊕ U ⊥ , we can choose unique vectors u, v ∈ Rn with u ∈ U , v ∈ U ⊥ and x = u + v. We then define ProjU (x) = u. Note that since U = (U ⊥ )⊥ , for u and v as above we have ProjU ⊥ (x) = v. When y ∈ Rn and U = Span {y}, we also write Projy (x) = ProjU (x) and Projy⊥ (x) = ProjU ⊥ (x). 5.14 Theorem: Let U ⊆ Rn be a subspace and let x ∈ Rn . Then ProjU (x) is the unique point in U which is nearest to x. Proof: Let u, v ∈ Rn with u ∈ U , v ∈ V and u + v = x so that ProjU (x) = u. Let w ∈ U with w 6= u. Since v ∈ U ⊥ and u, w ∈ U we have v u = v w = 0 and so v (w − u) = v w − v u = 0. Thus we have |x − w|2 = |u + v − w|2 = |v − (w − u)|2 = v − (w − u) v − (w − u) . . . . . . . = |v|2 − 2 v (w − u) + |w − u|2 = |v|2 + |w − u|2 = |x − u|2 + |w − u|2 . Since w 6= u we have |w − u| > 0 and so |x − w|2 > |x − u|2 . Thus |x − w| > |x − u|, that is dist(x, w) > dist(x, u), so u is the vector in U nearest to x, as required. 5.15 Theorem: For any matrix A ∈ Mn×l (R) we have Null(ATA) = Null(A) and Col(AT A) = Col(AT ) so that nullity(AT A) = nullity(A) and rank(AT A) = rank(A). Proof: If x ∈ Null(A) then Ax = 0 so ATAx = 0 hence x ∈ Null(ATA). This shows that Null(A) ⊆ Null(ATA). If x ∈ Null(ATA) then we have ATAx = 0 which implies that |Ax|2 = (Ax)T (Ax) = xTATAx = 0 and so Ax = 0. This shows that Null(AT A) ⊆ Null(A). Thus we have Null(ATA) = Null(A). It then follows that Col(AT ) = Row(A) = Null(A)⊥ = Null(ATA)⊥ = Row(ATA) = Col (ATA)T = Col(ATA). 5.16 Theorem: Let A ∈ Mn×l (R), let U = Col(A) and let x ∈ Rn . Then (1) the matrix equation ATA t = AT x has a solution t ∈ Rl , and for any solution t we have ProjU (x) = At, (2) if rank(A) = l then AT A is invertible and ProjU (x) = A(ATA)−1 AT x. Proof: Note that U ⊥ = (ColA)⊥ = Row(AT )⊥ = Null(AT ). Let u, v ∈ Rn with u ∈ U , v ∈ U ⊥ and u + v = x so that ProjU (x) = u. Since u ∈ U = ColA we can choose t ∈ Rl so that u = At. Then we have x = u + v = At + v. Multiply by AT to get AT = ATAt + AT v. Since v ∈ U ⊥ = Null(AT ) we have AT v = 0 so ATA t = AT x. Thus the matrix equation ATA t = AT x does have a solution t ∈ Rl . Now let t ∈ Rl be any solution to ATA t = At x. Let u = At and v = x − u. Note that x = u + v, u = At ∈ Col(A) = U , and AT v = AT (x − u) = AT (x − At) = AT x − ATA t = 0 so that v ∈ Null(AT ) = U ⊥ . Thus ProjU (x) = u = At, proving part (1). Now suppose that rank(A) = l. Since ATA ∈ Ml×l (R) with rank(ATA) = rank(A) = l, the matrix ATA is invertible. Since ATA is invertible, the unique solution t ∈ Rl to the matrix equation ATA t = AT x is the vector t = (ATA)−1 AT x, and so from Part (1) we have ProjU (x) = At = A(ATA)−1 AT x, proving Part (2). 37 The Volume of a Parallelotope 5.17 Definition: Given vectors u1 , u2 , · · · , uk ∈ Rn , we define the parallelotope on u1 , · · · , uk to be the set nP o k P (u1 , · · · , uk ) = ti ui 0 ≤ ti ≤ 1 for all i . j=1 We define the volume of this parallelotope, denoted by V (u1 , · · · , uk ), recursively by V (u1 ) = |u1 | and V (u1 , · · · , uk ) = V (u1 , · · · , uk−1 ) ProjU ⊥ (uk ) where U = Span {u1 , · · · , uk−1 }. 5.18 Theorem: Let u1 , · · · , uk ∈ Rn and let A = (u1 , · · · , uk ) ∈ Mn×k (R). Then q V (u1 , · · · , un ) = det(ATA). Proof: We prove the theorem by induction on k. 
Note that = 1, u1 ∈ Rn and p when k √ √ T A = u1 ∈ Mn×1 (R), we have V (u1 ) = |u1 | = u1 u1 = u1 u1 = ATA, as required. Let k ≥ 2 and suppose, inductively, that p when A = (u1 , · · · , uk−1 ) ∈ Mn×k−1 we have det(ATA) > 0 and V (u1 , · · · , uk−1 ) = det(ATA). Let B = (u1 , · · · , uk ) = (A, uk ). Let U = Span {u1 , · · · , uk−1 } = Col(A). Let v = ProjU (uk ) and w = ProjU ⊥ (uk ). Note that v ∈ U = Col(A) and w ∈ U ⊥ = Null(AT ). Then we have uk = v + w so that B = (A, v + w). Since v ∈ Col(A), the matrix B can be obtained from the matrix (A, w) by performing elementary column operations of the type Ck 7→ Ck + tCi . Let E be the product of the elementary matrices corresponding to these column operations, and note that B = (A, v + w) = (A, w)E. Since the row operations Ck 7→ Ck + tCi do not alter the determinant, E is a product of elementary matrices of determinant 1, so we have det(E) = 1. Since det(E) = 1 and w ∈ Null(AT ) we have T A T T T det(B B) = det E (A, w) (A, w)E = det Aw wT T T A A ATw A A 0 = det = = det(ATA) |w|2 . wTA wTw 0 |w|2 . By the induction hypothesis, we can take the square root on both sides to get q q det(B TB) = det(ATA) |w| = V (u1 , · · · , uk−1 ) |w| = V (u1 , · · · , uk ). 38 The Cross Product in Rn 5.19 Definition: Let F be a field. For n ≥ 2 we define the cross product X: n−1 Q Fn → Fn k=1 as follows. Given vectors u1 , u2 , · · · , un−1 ∈ F n , we define X(u1 , u2 , · · · , un−1 ) ∈ F n to be the vector with entries X(u1 , u2 , · · · , un−1 )j = (−1)n+j |A(j) | where A(j) ∈ Mn−1 (F ) is the matrix obtained from A = (u1 , u2 , · · · , un−1 ) ∈ Mn×n−1 (F ) by removing the j th row. Given a vector u ∈ F 2 we write X(u) as u× , and given two vectors u, v ∈ F 3 we write X(u, v) as u × v. 5.20 Example: Given u ∈ F 2 we have × u1 −u2 × u = = . u2 u1 Given u, v ∈ F 3 we have u1 v1 u × v = u2 × v2 = − u3 v3 u2 u3 u1 u3 u1 u2 v2 v3 v1 v3 v1 v2 u2 v3 − u3 v2 = u3 v1 − u1 v3 . u1 v2 − u2 v1 5.21 Note: Because the determinant is n-linear, alternating and skew-symmetric, it follows that the cross product is (n − 1)-linear, alternating and skew-symmetric. Thus for ui , v, w ∈ F n and t ∈ F we have (1) X(u1 , · · · , v + w, · · · , un−1 ) = X(u1 , · · · , v, · · · , un−1 ) + X(u1 , · · · , w, · · · , un−1 ), (2) X(u1 , · · · , t uk , · · · , un−1 ) = t X(u1 , · · · , uk , · · · , un−1 ), (3) X(u1 , · · · , uk , · · · , ul , · · · , un−1 ) = −X(u1 , · · · , ul , · · · , uk , · · · , un−1 ). 5.22 Definition: Recall that for u1 , · · · , un ∈ Rn , the set {u1 , · · · , un } is a basis for Rn if and only if det(u1 , · · · , un ) 6= 0. For an ordered basis A = (u1 , · · · , un ), we say that A is positively oriented when det(u1 , · · · , un ) > 0 and we say that A is negatively oriented when det(u1 , · · · , un ) < 0. 5.23 Theorem: Let u1 , · · · , un−1 , v1 , · · · , vn−1 , w ∈ Rn . Then . (1) X(u1 , · · · , un−1 ) w = det(u1 , · · · , un−1 , w), (2) X(u1 , · · · , un−1 ) = 0 if and only if {u1 , · · · , un−1 } is linearly dependent. (3) When w = X(u1 , · · · , un−1 ) 6= 0 we have det(u1 , · · · , un−1 , w) > 0 so that the n-tuple (u1 , · · · , un−1 , w) is a positively oriented basis for Rn , (4) X(u1 , · · · , un−1 ) X(v1 , · · · , vn−1 ) = det(ATB) where A = (u1 , · · · , un−1 ) ∈ Mn×n−1 (R) and B = (v1 , · · · , vn−1 ) ∈ Mn×n−1 (R), and (5) X(u1 , · · · , un−1 ) is equal to the volume of the parallelipiped on u1 , · · · , un−1 . . Proof: I may include a proof later. 39 Chapter 6. Vector Spaces and Modules 6.1 Definition: Let R be a commutative ring. 
A module over R (or an R-module) is a set U together with an element 0 ∈ U and two operations + : U × U → U and ∗ : R × U → U, where we write +(x, y) as x + y and ∗(t, x) as t ∗ x, t · x or as tx, such that
(1) + is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ U,
(2) + is commutative: x + y = y + x for all x, y ∈ U,
(3) 0 is an additive identity: x + 0 = x for all x ∈ U,
(4) every element has an additive inverse: for all x ∈ U there exists y ∈ U with x + y = 0,
(5) ∗ is associative: (st)x = s(tx) for all s, t ∈ R and all x ∈ U,
(6) 1 is a multiplicative identity: 1 · x = x for all x ∈ U,
(7) ∗ is distributive over + in R: (s + t)x = sx + tx for all s, t ∈ R and all x ∈ U, and
(8) ∗ is distributive over + in U: t(x + y) = tx + ty for all t ∈ R and all x, y ∈ U.
When F is a field, a module over F is also called a vector space over F.

6.2 Note: In an R-module U, the zero element is unique, the additive inverse of a given element x ∈ U is unique and we denote it by −x, and we have additive cancellation.

6.3 Definition: Let R be a commutative ring and let W be an R-module. A submodule of W over R is a subset U ⊆ W which is also an R-module using (the restrictions of) the same operations used in W. Note that for a subset U ⊆ W, the operations on W restrict to well-defined operations on U if and only if
(1) U is closed under +: for all x, y ∈ U we have x + y ∈ U, and
(2) U is closed under ∗: for all t ∈ R and all x ∈ U we have tx ∈ U.
When the operations do restrict as above, U is a submodule of W if and only if
(3) U contains the zero element: 0 ∈ U, and
(4) U is closed under negation: for all x ∈ U we have −x ∈ U.
When F is a field and W is a vector space over F, a submodule of W is also called a subspace of W over F.

6.4 Example: Let R be a ring. Then Rn, Rω and R∞ are all R-modules, where
Rn = { f : {1, 2, · · · , n} → R } = { (a1, a2, · · · , an) | each ak ∈ R },
Rω = { f : Z+ → R } = { (a1, a2, a3, · · ·) | each ak ∈ R }, and
R∞ = { f : Z+ → R | f(k) = 0 for all but finitely many k ∈ Z+ }
   = { (a1, a2, a3, · · ·) | each ak ∈ R with ak = 0 for all but finitely many k ∈ Z+ }.

6.5 Example: When R is a ring, Mm×n(R) is an R-module.

6.6 Example: For two sets A and B, we denote the set of all functions f : A → B by B^A or by Func(A, B). When A is a set and R is a ring, we define operations on Func(A, R) by (tf)(x) = t f(x), (f + g)(x) = f(x) + g(x) and (fg)(x) = f(x)g(x) for all x ∈ A. The set Func(A, R) is a ring under addition and multiplication and also an R-module under addition and multiplication by t ∈ R.

6.7 Example: Let R be a ring. Recall that a polynomial, with coefficients in R, is an expression of the form f(x) = Σ_{k=0}^n ak x^k = a0 + a1 x + a2 x^2 + · · · + an x^n where n ∈ N and each ak ∈ R. We denote the set of all such polynomials by R[x] or by P(R). We consider two polynomials to be equal when their coefficients are all equal, so for f(x) = Σ_{k=0}^n ak x^k and g(x) = Σ_{k=0}^n bk x^k we have f = g when ak = bk for all k. In particular, we have f = 0 when ak = 0 for all indices k. When 0 ≠ f ∈ R[x], the degree of f, denoted by deg(f), is the largest n ∈ N for which an ≠ 0. The degree of the zero polynomial is −∞. For n ∈ N, we denote the set of all polynomials of degree at most n by Pn(R). A (formal) power series, with coefficients in R, is an expression of the form f(x) = Σ_{k=0}^∞ ck x^k = c0 + c1 x + c2 x^2 + · · ·. We denote the set of all such power series by R[[x]].
Thus we have n o n P k Pn (R) = f (x) = ck x each ck ∈ R , k=0 R[x] = P (R) = n o n P Pn (R) = f (x) = ck xk n ∈ N and each ck ∈ R and S n∈N R[[x]] = nP ∞ k=0 o ck xk each ck ∈ R . k=0 We define operations on these sets as follows. Given f (x) = ak xk and g(x) = P ikge0 P bk x k k≥0 (where the sums may be finite or infinite) we define tf , f + g and f g by P P (tf )(x) = t ak xk , (f + g)(x) = (ak + bk )xk , and k≥o (f g)(x) = P i,j≥0 ikge0 (ai bj )xi+j = P k≥0 ck xk with ck = k P ai bk−i . i=0 The sets Pn (R), R[x] = P (R) and R[[x]] are all rings under addition and multiplication, and they are all R-modules under addition and multiplication by elements t ∈ R. 6.8 Definition: Let R be a commutative ring. An algebra over R (or an R-algebra) is a set U with an element 0 ∈ U together with three operations + : U × U → U , ∗ : U × U → U and ∗ : R × U → U such that U is an abelian group under +, such that the two multiplication operations satisfy (xy)z = x(yz), (x + y)z = xz + yz, x(y + z) = xy + xz and t(xy) = (tx)y for all t ∈ R and all x, y, z ∈ U . 6.9 Example: When R is a commutative ring and A is a set, Rn , R∞ , Rω , Mn (R), Func(A, R), Pn (R), R[x], R[[x]] and RA are all R-algebras using their usual operations. 41 6.10 Theorem: Let R be a ring, and let W be an R-module. Let A be a set, and for T each α ∈ A let Uα be a submodule of W over R. Then Uα is an R-module. α∈A Proof: I may include a proof later. 6.11 Definition: Let R be a ring, let W be an R-module, and let S ⊆ U . A linear combination of the set S (or of the elements in S) (over R) is an element w ∈ W of the n P form w = ti ui for some n ∈ N, ti ∈ R and ui ∈ U (we allow the case n = 0 to include i=1 the empty sum, which we take to be equal to 0). The span of S (over R), denoted by Span (S) or by Span R (S), is the set of all such linear combinations, that is nP o n Span (S) = ti ui n ∈ N, ti ∈ R, ui ∈ S . i=1 When U = Span (S), we also say that S spans U or that U is spanned by S. The submodule of W generated by S, denoted by hSi, is the smallest submodule of W containing S, that is \ hSi = U ⊆ W U is a submodule of W with S ⊆ U . 6.12 Theorem: Let R be a ring, let W be an R-module, and let S ⊆ W . Then hSi = Span (S). Proof: I may include a proof later. 6.13 Definition: Let R be a ring, let U be an R-module, and let S ⊆ U . We say that S is linearly independent (over R) when for all n ∈ Z+ , for all ti ∈ R, and for all distinct n P elements ui ∈ S, if ti ui = 0 then ti = 0 for all indices i. Otherwise we say that S is i=1 linearly dependent. We say that S is a basis for U when S spans U and S is linearly independent. 6.14 Note: A set S is a basis for an R-module U if and only if every element x ∈ U can n P be written uniquely (up to the order of the terms in the sum) in the form x = ti ui i=1 where n ∈ N, 0 6= ti ∈ F and u1 , u2 , · · · , un are distinct elements in S. 6.15 Example: For ek ∈ Rn given by (ek )i = δk,i , the set S = {e1 , e2 , · · · , en } is a basis for Rn , and we call it the standard basis for Rn . For ek ∈ R∞ given by (ek )i = δk,i , the set S = {e1 , e2 , e3 , · · ·} is a basis for R∞ , which we call the standard basis for R∞ . It is not immediately obvious whether the R-module Rω has a basis. 6.16 Example: For each k, l ∈ {1, 2, · · · , n}, let Ek,l ∈ Mn (R) denote the matrix with (Ek,l )i,j = δk,i δl,j (so the (k, l) entry is equal to 1 and all other entries are equal to 0). Then the set S = Ek,l k, l ∈ {1, 2, · · · , n} is a basis for Mm×n (R), which we call the standard basis for Mm×n (R). 
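As an illustrative aside on the product formula in Example 6.7, here is a minimal Python sketch (taking R = Z and representing a polynomial by its list of coefficients; the helper name poly_mult is ad hoc) of the convolution rule ck = Σ_{i=0}^k ai bk−i.

# Multiply two polynomials given as coefficient lists (constant term first).
def poly_mult(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj     # contributes a_i b_j to the coefficient of x^(i+j)
    return c

# (1 + x)(1 - x + x^2) = 1 + x^3
assert poly_mult([1, 1], [1, -1, 1]) == [1, 0, 0, 1]

The same loop works verbatim over any commutative ring R whose addition and multiplication are available, since only ring operations on the coefficients are used.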
6.17 Example: For a ring R, the set S = Sn = 1, x, x2 , · · · , xn is the standard basis for Pn (R), and the set S = 1, x, x2 , x3 , · · · is the standard basis for Pn (R) = R[x]. It is not immediately obvious whether the R-module R[[x]] has a basis. 42 Ordered Bases and the Coordinate Map 6.18 Definition: Let R be a ring and let W be an R-module. Let A = (u1 , u2 , · · ·) be an ordered n-tuple of elements in W . A linear combination of A (over R) is an element n P x ∈ W of the form x = ti ui with each ti ∈ R. The span of U (over R) is the set i=1 Span (A) = nP n o ti ui t ∈ Rn . i=1 When U = Span (A) we say that A spans U or that U is spanned by A. We say that A n P is linearly independent (over R) when for all t ∈ Rn , if ti ui = 0 then t = 0. We say i=1 that A is an ordered basis for U when A is linearly independent and spans U . 6.19 Note: Let R be a ring, let W be an R-module, and let u1 , u2 , · · · , un ∈ W . Then the span of the n-tuple (u1 , u2 , · · · , un ) is equal to the span of there set {u1 , u2 , · · · , un }, and the n-tuple (u1 , u2 , · · · , un ) is linearly independent if and only of the elements u1 , u2 , · · · , un are distinct and the set {u1 , u2 , · · · , un } is linearly independent. For a submodule U ⊆ W , the n-tuple (u1 , · · · , un ) is a basis for U if and only if the elements ui are distinct and the set {u1 , · · · , un } is a basis for U . 6.20 Definition: Let R be a ring, let U be an R-module, and let A = (u1 , u2 , · · · , un ) be an ordered basis for U . Given an element x ∈ U , since A spans U we can write x as n P a linear combination x = ti ui with t ∈ Rn , and since A is linearly independent the i=1 n element t ∈ R is unique. We denote the unique element t ∈ Rn such that x = n P ti ui by i=1 [x]A . Thus for x ∈ U and t ∈ Rn we have t = [x]A ⇐⇒ x = n P ti ui . i=1 The elements t1 , t2 , · · · , tn ∈ R are called the coordinates of x with respect to the ordered basis A. In the case that R is a field, the vector t = [x]A is called the coordinate vector of x with respect to A. The bijective map φA : U → Rn given by φA (x) = [x]A is called the coordinate map from U to Rn induced by the ordered basis A. 6.21 Theorem: Let R be a ring, let U be a free R-module, and let A be a finite ordered basis for U . Then coordinate map φA : U → Rn is bijective and linear. That is φA (x + y) = φA (x) + φA (y) and φA (tx) = r φA (x) for all x, y ∈ U and all r ∈ R. Proof: I may include a proof later. 43 The Dimension of Finite Dimensional Vector Spaces 6.22 Note: Let R be a ring and let U be a free R-module. If A and B are bases for U with A ⊆ B, then we must have A = B because if we had A ⊂ 6 B then we could choose = n P v ∈ B \ A then write v as a linear combination v = ti ui with each ui ∈ A, but then we would would have 0 = 1 · v − n P i=1 ti ui which is a linear combination of elements in B with i=1 coefficients not all equal to zero. 6.23 Theorem: Let R be a ring and let U be a free R-module. Let A and B be two bases for U over R. Then A and B are either both finite or both infinite. Proof: If U = {0} then A = B = ∅. Suppose that U 6= {0}. Suppose that one of the mi P two bases is finite, say A = {u1 , u2 , · · · , un }. For each index i, write ui = ti,j vi,j with j=1 ti,j ∈ R and vi,j ∈ B. Let C = vi,j 1 ≤ i ≤ n , 1 ≤ j ≤ mi . Note that C is finite, and C is linearly independent because C ⊆ B and B is linearly independent, and C spans U because mi n n P P P given x ∈ U we can write x = ti ui and then we have x = (ti si,j )vi,j ∈ Span (C). 
i=1 i=1 j=1 Since B and C are both bases for U and C ⊆ B we have C = B, so B is finite. 6.24 Theorem: Let F be a field and let U be a vector space over F . Let A and B be finite bases for U over F . Then |A| = |B|. Proof: If U = {0} then A = B = ∅. Suppose that U 6= {0}. Let n = |A| and say A = {u1 , u2 , · · · , un }. Replace the set A by the ordered n-tuple A = (u1 , u2 , · · · , un ). Let n φ = φA : U → F and consider the set φ(B) = φ(v) v ∈ B ⊆ F n . Note that φ(B) n P spans F n because given t ∈ F n we can let x = ti ui so that [x]A = t, then we can i=1 P m m P write x = si vi with si ∈ R and vi ∈ B, and then we have t = φ(x) = φ si vi = m P i=1 i=1 si φ(vi ) ∈ Span φ(B) . Also, we claim that φ(B) is linearly independent in F n . Suppose i=1 that m P si yi = 0 where the yi are distinct elements in F n and each si ∈ F . Choose elements i=1 xi ∈ B so that φ(xi ) = yi , and note that the elements xi will be distinct because φ is P m m m P P bijective. Then we have 0 = si yi = si φ(xi ) = φ si xi . Since φ is injective it follows that m P i=1 i=1 i=1 si xi = 0. Since the elements xi are distinct elements in B and B is linearly i=1 independent, it follows that every si = 0. Thus φ(B) is linearly independent, as claimed. Since φ(B) spans F n it follows that φ(B) ≥ n, and since φ(B) is linearly independent it follows that φ(B) ≤ n, and so we have φ(B) = n = |A|. Since φ is bijective we have |B| = φ(B) = |A|. 6.25 Definition: Let U be a vector space over a field F . When U has a finite basis, we say that U is finite dimensional (over F ) and we define the dimension of U to be dim(U ) = |A| where A is any basis for U . 44 The Existence of a Basis for a Vector Space 6.26 Definition: Let S be a nonempty set of sets. A chain in S is a nonempty subset T ⊆ S with the property that for all A, B ∈ T , either A ⊆ B or B ⊆ A. For a subset T ⊆ S, an upper bound for T in S is an element B ∈ S such that A ⊆ B for all A ∈ T . A maximal element in S is an element B ∈ S such that there is no A ∈ S with B ⊆ A. 6.27 Theorem: (Zorn’s Lemma) Let S be a nonempty set. Suppose that every chain in S has an upper bound in S. Then S has a maximal element. Proof: We take Zorn’s Lemma to be an axiom, which means that we accept it as true without proof. 6.28 Note: Let F be a field and let U be a vector space over F . Let A be a linearly independent subset of U and let v ∈ U . Then A ∪ {v} is linearly independent if and only if v ∈ / Span (A) We leave the proof of this result as an exercise. Note that the analogous result does not hold when U is a module over a ring R. 6.29 Theorem: Every vector space has a basis. Indeed, every linearly independent set in a vector space is contained in a basis for the vector space. Proof: Let F be a field and let U be a vector space over F . Let A be a subset of U which is linearly independent over F . Let S be the collection of all linearly independent sets B ⊆ U with A ⊆ B. Note that S 6= ∅ because A ∈ S. Let T be a chain in S. We claim S that S T ∈ S. Since T 6= ∅ we can choose an element B0 ∈ TS and then we have A ⊆ B0 ⊆ T . Since for every B ∈ T we have B ⊆ U it follows that T ⊆ U . It remains to show that n S P T is linearly independent. Suppose that ti ui = 0 where the ui are distinct elements i=1 S S in T and ti ∈ F . For each index i, since ui ∈ T we can choose Bi ∈ T with ui ∈ Bi . Since T is a chain, for all indices i and j, either Bi ⊆ Bj or B| ⊆ Bi . It follows that we can choose an index k so that Bi ⊆ Bk for all indices i. Then we have ui ∈ Bk for all i. 
Since n P the ui are distinct elements in Bk with ti ui = 0 and since Bk is linearly independent i=1 S it S follows that ti = 0 for every S i. This shows that T is linearly independent, and so T ∈ S, as claimed. Since S T ∈ S it follows that T has an upper bound in S since for every B ∈ T we have B ⊆ T . By Zorn’s Lemma, it follows that S has a maximal element. Let B be a maximal element in S. We claim that B is a basis for U . Since B ∈ S we know that A ⊆ B ⊆ U and that B is linearly independent. Note also that B spans U because if we had Span (B) ⊂ / Span (B) and then B ∪ {w} 6= U then we could choose w ∈ U with w ∈ would be linearly independent by the above Note, but then B would not be maximal in S. 6.30 Example: When F is a field and A is any set, the vector spaces F ω , F [[x]] and Func(A, F ) all have bases. It is not easy to construct an explicit basis for any of these vector spaces. 6.31 Example: There exists a basis for R as a vector space over Q, but it is not easy to construct an explicit basis. 45 Some Cardinal Arithmetic 6.32 Definition: Let S be a set of nonempty sets. A choice function on S is a function S f : S → S such that f (A) ∈ A for every A ∈ S. 6.33 Theorem: (The Axiom of Choice) Every set of nonempty sets has a choice function. Proof: A proof can be found in Ehsaan Hossain’s Tutorial Lecture Notes. 6.34 Corollary: Let A beSa set. For each α ∈ A, let Xα be a nonempty set. Then there exists a function f : A → Xα with f (α) ∈ Xα for all α ∈ A. α∈A Proof: Let S = Xα α ∈ A . Note that S S = S Xα . Let g : S → S S be a choice S function for S, so we have g(Xα ) ∈ Xα for all α ∈ A. Define the map f : A → Xα by α∈A α∈A f (α) = g(Xα ) to obtain f (α) ∈ Xα for all α ∈ A, as required.. 6.35 Theorem: Let A and B be nonempty sets and let f : A → B. Then (1) f is injective if and only if f has a left inverse, and (2) f is surjective if and only if f has a right inverse. Proof: The proofs can be found in last term’s MATH 147 Lecture Notes. We remark that the proof of Part (1) does not require the Axiom of Choice but the proof of Part (2) does. 6.36 Definition: For sets A and B, we say that A and B are equipotent (or that A and B have the same cardinality), and we write |A| = |B| when there exists a bijection f : A → B. We say that the cardinality of A is less than or equal to the cardinality of B, and we write |A| ≤ |B|, when there exists an injective map f : A → B. Note that by the above theorem, we have |A| ≤ |B| if and only if there exists a surjective map f gB → A. 6.37 Note: It follows immediately from the above definitions that for all sets A, B and C we have (1) |A| = |A|, (2) if |A| = |B| then |B| = |A|, and (3) if |A| = |B| and |B| = |C| then |A| = |C|, and also (4) |A| ≤ |A|, and (5) if |A| ≤ |B| and |B| ≤ |C| then |A| ≤ |C|. Properties 1, 2 and 3 imply that equipotence is an equivalence relation on the class of all sets. Properties 4 and 5 are two of the 4 properties which appear in the definition a total order. The other 2 properties also hold, but they require proof. The third property is known as the Cantor-Schroeder-Bernstein Theorem, and we state it below. After that, we state and prove the fourth property. 6.38 Theorem: (The Cantor-Schroeder-Bernstein Theorem) Let A and B be sets. If |A| ≤ |B| and |B| ≤ |A| then |A| = |B|. Proof: A proof can be found in last term’s MATH 147 Lecture Notes (we remark that it does not require the Axiom of Choice). 46 6.39 Theorem: Let A and B be sets. Then either |A| ≤ |B| or |B| ≤ |A|. Proof: If A = ∅ we have |A| ≤ |B|. 
If B = ∅ we have |B| ≤ |A|. Suppose that |A| = 6 ∅ and B 6= ∅. Let S be the set of all (graphs of) injective functions f : X → B with X ⊆ A. Note that S 6= ∅ since we can choose S a ∈ A and b ∈ B and define f : {a} → B by f (a) = b. Let T be a chain in S. Note that T ∈ S, as in the proof of the Axiom of Choice found in Ehsaan Hossain’s notes, and so T has an upper bound in S. By Zorn’s Lemma, it follows that S has a maximal element. Let (the graph of) g : X → B be a maximal element ⊂ in S. Note that either X = A or g(X) = B since if we had X ⊂ 6= A and g(X) 6= B then we could choose an element a ∈ A \ X and an element b ∈ B \ f (A) and then extend g to the injective map h : X ∪ {a} → B defined by h(x) = g(x) for x ∈ X and h(a) = b, contradicting the maximality of g. In the case that X = A, since the map g : A → B is injective we have |A| ≤ |B. In the case that f (X) = B, the map g : X → B is surjective so we have |B| ≤ |X| ≤ |A|. 6.40 Note: Recall that for sets X and Y , the set of all functions f : Y → X is denoted by X Y . As an exercise, verify that given sets A1 , A2 , B1 and B2 with |A1 | = |A2 | and |B1 | = |B2 |, we have (1) |A1 × B1 | = |A2 × B2 |, (2) A1 B1 = A2 B2 , and (3) if A1 ∩ B1 = ∅ and A2 ∩ B2 = ∅ then |A1 ∪ B1 | = |A2 ∪ B2 |. 6.41 Definition: (Cardinal Arithmetic) For sets A, B and X, we write |X| = |A| |B| when |X| = |A × B|, |X| = |A||B| when |X| = |AB |, and |X| = |A| + |B| when |X| = |A0 ∪ B 0 | for disjoint sets A0 and B 0 with |A0 | = |A| and |B 0 | = |B| (for example the sets A0 = A × {1} and B 0 = B × {2}). 6.42 Note: Let B be an infinite set and let n ∈ Z+ , and let Sn = {1, 2, · · · , n}. As an exercise, show that there are injective maps f : B → B × Sn and g : B × Sn → B × B so that |B| ≤ |B × Sn | ≤ |B × B|, then use the Cantor-Schroeder-Bernstein Theorem to show that if |B| = |B × B| then we have |B × Sn | = |B|. 47 6.43 Theorem: Let A be an infinite set. Then |A × A| = |A|. Proof: Let S be the set of all (graphs of) bijective functions f : X × X → X where X is an infinite subset of A. Note that S 6= ∅ because, as proven in last term’s MATH 147 Lecture notes, we can choose a countable subset X ∈ A and a bijection f : X × X → X. S Let T be a chain in S. Note that T ∈ S as in the proof of the Axiom of Choice found in Ehsaan Hossain’s notes, and so T has an upper bound in S. By Zorn’s Lemma, S has a maximal element. Let (the graph of) g : B × B → B be a maximal element in S. Note that since g : B × B → B is bijective we have |B × B| = |B|. By the previous theorem, either |B| ≤ |A \ B| or |A \ B| ≤ |B|. We claim that |A \ B| ≤ |B|. Suppose, for a contradiction, that |B| ≤ |A \ B|. Choose C ⊆ A \ B with |C| = |B|. Note that the set (B ∪ C) × (B ∪ C) is the disjoint union (B ∪ C) × (B ∪ C) = (B × B) ∪ (B × C) ∪ (C × B) ∪ (C × C) and we have (B × C) ∪ (C × B) ∪ (C × C) = |B × C| + |C × B| + |C × C| = |B × B| + |B × B| + |B × B| = |B| + |B| + |B| = |B × {1, 2, 3}| = |B| = |C| and so the maximal bijective map g : B × B → B can be extended to a bijective map h : (B ∪ C) × (B ∪ C) → (B ∪ C) contradicting the maximality of g. Thus the case in which |B| ≤ |A \ B| cannot arise, and so we must have |A \ B| ≤ |B|, as claimed. Since |A \ B| ≤ |B| we have |A| = (A \ B) ∪ B = |A \ B| + |B| ≤ |B| + |B| = |B × {1, 2}| = |B| and hence |A × A| = |B × B| = |B| = |A|. 6.44 Corollary: Let A and B be sets. (1) If A is nonempty and B is infinite and |A| ≤ |B|, then |A| |B| = |B|. (2) If B is infinite and |A| ≤ |B| then |A| + |B| = |B|. Proof: The proof is left as an exercise. 
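As an aside, the countable case invoked at the start of the proof of Theorem 6.43 can be made explicit: the Cantor pairing function p(i, j) = (i + j)(i + j + 1)/2 + j is a bijection from N × N to N. The following Python sketch (not part of the notes) checks this numerically on an initial segment.

# The Cantor pairing function: a bijection N x N -> N.
def cantor_pair(i, j):
    return (i + j) * (i + j + 1) // 2 + j

values = {cantor_pair(i, j) for i in range(50) for j in range(50)}
assert len(values) == 50 * 50           # no two pairs collide (injectivity on this block)
assert set(range(50 * 25)) <= values    # an initial segment of N is covered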
6.45 Corollary: Let A be an infinite set. For each u ∈ A let Bu be a finite set. Then |∪_{u∈A} Bu| ≤ |A|.
Proof: For each u ∈ A, choose a surjective map fu : A → Bu. Define g : A × A → ∪_{u∈A} Bu by g(u, v) = fu(v). Note that g is surjective and so we have |∪_{u∈A} Bu| ≤ |A × A| = |A|.

The Dimension of an Infinite Dimensional Vector Space

6.46 Theorem: Let R be a ring and let U be a free R-module. Then any two infinite bases for U have the same cardinality.
Proof: Let A and B be infinite bases for U. For each u ∈ A, let c = c(u) : B → R be the (unique) function, with c(u)v = 0 for all but finitely many elements v ∈ B, such that u = Σ_{v∈B} c(u)v · v, and then let Bu be the set of all elements v ∈ B for which c(u)v ≠ 0. Note that each set Bu is a nonempty finite subset of B. Let C = ∪_{u∈A} Bu. Note that C spans U because given any x ∈ U we can write x = Σ_{i=1}^n ti ui with each ui ∈ A, and then for each index i we can write ui = Σ_{j=1}^{mi} si,j vi,j with each vi,j ∈ Bui, and then we have x = Σ_{i,j} (ti si,j) vi,j ∈ Span(∪_{i=1}^n Bui) ⊆ Span(C). Since B is linearly independent and C ⊆ B, it follows that C is linearly independent. Since C is linearly independent and spans U, it is a basis for U. Since C and B are bases for U with C ⊆ B it follows that C = B, because if we had C ⊊ B then we could choose v ∈ B \ C, then write v as a linear combination v = Σ_{i=1}^n ti vi with each vi ∈ C, but then we would have 0 = 1 · v − Σ_{i=1}^n ti vi, which is a linear combination of elements in B with not all coefficients equal to 0. By the above theorem, we have
|B| = |C| = |∪_{u∈A} Bu| ≤ |A|.
By interchanging the rôles of A and B in the above proof, we see that |A| ≤ |B|. Thus we have |A| = |B| by the Cantor-Schroeder-Bernstein Theorem.

6.47 Definition: Let R be a ring and let U be a free R-module with an infinite basis. We define the rank of U to be rank(U) = |A| where A is any basis for U. When F is a field and U is a vector space over F which has an infinite basis, the rank of U is also called the dimension of U.

Chapter 7. Module Homomorphisms and Linear Maps

7.1 Definition: Let R be a ring and let U and V be R-modules. An (R-module) homomorphism from U to V is a map L : U → V such that L(x + y) = L(x) + L(y) and L(tx) = t L(x) for all x, y ∈ U and all t ∈ R. A bijective homomorphism from U to V is called an isomorphism from U to V, a homomorphism from U to U is called an endomorphism of U, and an isomorphism from U to U is called an automorphism of U. We say that U is isomorphic to V, and we write U ≅ V, when there exists an isomorphism L : U → V. We use the following notation:
Hom(U, V) = HomR(U, V) = { L : U → V | L is a homomorphism },
Iso(U, V) = IsoR(U, V) = { L : U → V | L is an isomorphism },
End(U) = EndR(U) = { L : U → U | L is an endomorphism },
Aut(U) = AutR(U) = { L : U → U | L is an automorphism }.
For L, M ∈ Hom(U, V) and t ∈ R we define L + M and tL by (L + M)(x) = L(x) + M(x) and (tL)(x) = t L(x). Using these operations, if R is commutative then the set Hom(U, V) is an R-module. For L ∈ Hom(U, V), the image (or range) of L and the kernel (or null set) of L are the sets
Image(L) = Range(L) = L(U) = { L(x) | x ∈ U } and
Ker(L) = Null(L) = L^{−1}(0) = { x ∈ U | L(x) = 0 }.
When F is a field and U and V are vector spaces over F, an F-module homomorphism from U to V is also called a linear map from U to V.

7.2 Note: For an R-module homomorphism L : U → V and for x ∈ U we have L(0) = 0 and L(−x) = −L(x), and for ti ∈ R and xi ∈ U we have L(Σ_{i=1}^n ti xi) = Σ_{i=1}^n ti L(xi).

7.3 Definition: When G and H are groups, a map L : G → H is called a group homomorphism when L(xy) = L(x)L(y) for all x, y ∈ G. A group isomorphism is a bijective group homomorphism. When R and S are rings, a map L : R → S is called a ring homomorphism when L(x + y) = L(x) + L(y) and L(xy) = L(x)L(y) for all x, y ∈ R. A ring isomorphism is a bijective ring homomorphism. When R is a ring and U and V are R-algebras, a map L : U → V is called an R-algebra homomorphism when L(x + y) = L(x) + L(y), L(xy) = L(x)L(y) and L(tx) = t L(x) for all x, y ∈ U and all t ∈ R. An R-algebra isomorphism is a bijective R-algebra homomorphism.

7.4 Theorem: Let R be a ring and let U, V and W be R-modules.
(1) If L : U → V and M : V → W are homomorphisms then so is the composite M L : U → W.
(2) If L : U → V is an isomorphism, then so is the inverse L^{−1} : V → U.
Proof: Suppose that L : U → V and M : V → W are R-module homomorphisms. Then for all x, y ∈ U and all t ∈ R we have M(L(x + y)) = M(L(x) + L(y)) = M(L(x)) + M(L(y)) and M(L(tx)) = M(t L(x)) = t M(L(x)). Suppose that L : U → V is an isomorphism. Then given u, v ∈ V and t ∈ R, if we let x = L^{−1}(u) and y = L^{−1}(v) then we have L^{−1}(u + v) = L^{−1}(L(x) + L(y)) = L^{−1}(L(x + y)) = x + y = L^{−1}(u) + L^{−1}(v) and L^{−1}(tu) = L^{−1}(t L(x)) = L^{−1}(L(tx)) = tx = t L^{−1}(u).

7.5 Corollary: Let R be a ring. Then isomorphism is an equivalence relation on the class of all R-modules. This means that for all R-modules U, V and W we have
(1) U ≅ U,
(2) if U ≅ V then V ≅ U, and
(3) if U ≅ V and V ≅ W then U ≅ W.

7.6 Corollary: When R is a commutative ring and U is an R-module, End(U) is a ring under addition and composition, hence also an R-algebra, and Aut(U) is a group under composition.

7.7 Theorem: Let L : U → V be an R-module homomorphism.
(1) If U0 is a submodule of U then L(U0) is a submodule of V. In particular, the image of L is a submodule of V.
(2) If V0 is a submodule of V then L^{−1}(V0) is a submodule of U. In particular, the kernel of L is a submodule of U.
Proof: To prove Part (1), let U0 be a submodule of U. Let u, v ∈ L(U0) and let t ∈ R. Choose x, y ∈ U0 with L(x) = u and L(y) = v. Since x + y ∈ U0 and L(x + y) = L(x) + L(y) = u + v, it follows that u + v ∈ L(U0). Since tx ∈ U0 and L(tx) = t L(x) = tu, it follows that tu ∈ L(U0). Thus L(U0) is closed under the module operations and so it is a submodule of V.
To prove Part (2), let V0 be a submodule of V. Let x, y ∈ L^{−1}(V0) and let t ∈ R. Let u = L(x) ∈ V0 and v = L(y) ∈ V0. Since L(x + y) = L(x) + L(y) = u + v ∈ V0 it follows that x + y ∈ L^{−1}(V0). Since L(tx) = t L(x) = tu ∈ V0 it follows that tx ∈ L^{−1}(V0). Thus L^{−1}(V0) is closed under the module operations and so it is a submodule of U.

7.8 Theorem: Let L : U → V be an R-module homomorphism. Then
(1) L is surjective if and only if Range(L) = V, and
(2) L is injective if and only if Ker(L) = {0}.
Proof: Part (1) is simply a restatement of the definition of surjectivity and does not require proof. To prove Part (2), we begin by remarking that since L(0) = 0 we have {0} ⊆ Ker(L). Suppose L is injective. Then x ∈ Ker(L) =⇒ L(x) = 0 =⇒ L(x) = L(0) =⇒ x = 0 and so Ker(L) = {0}. Suppose, conversely, that Ker(L) = {0}. Then L(x) = L(y) =⇒ L(x) − L(y) = 0 =⇒ L(x − y) = 0 =⇒ x − y ∈ Ker(L) = {0} =⇒ x − y = 0 =⇒ x = y and so L is injective.
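As a concrete illustration of Theorem 7.8(2), consider the endomorphism LA of R^2 given by LA(x) = Ax for a matrix A with linearly dependent columns. The Python sketch below (using numpy, with ad hoc choices of A, x and y) exhibits a nonzero kernel element and two distinct inputs with the same image, so LA is not injective.

import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])       # columns are linearly dependent
x = np.array([1.0, -1.0])        # a nonzero vector in the kernel: A x = 0
assert np.allclose(A @ x, 0)

# Since Ker(L_A) != {0}, L_A is not injective: two different inputs share an output.
y = np.array([3.0, 5.0])
assert np.allclose(A @ y, A @ (y + x)) and not np.allclose(y, y + x)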
51 7.9 Example: The maps n P L : Pn (R) → Rn+1 given by L ai xi = (a0 , a1 , · · · , an ) i=0 L : R[x] → R∞ given by L n P L : R[[x]] → Rω given by L i=0 ∞ P ai xi = (a0 , a1 , · · · , an , 0, 0, · · ·) ai xi = (a0 , a1 , a2 , · · ·) i=0 are all R-algebra isomorphisms, so we have Pn (R) ∼ = Rn+1 , R[x] ∼ = R∞ and R[[x]] ∼ = Rω . 7.10 Example: The map L : Mm×n (R) → Rm·n given by a1,1 a1,2 · · · a1,n .. .. = a , a , · · · , a , a , · · · , a , · · · , a , · · · , a L ... 1,1 1,2 1,n 2,1 2,n m,1 m,n . . am,1 am,2 ··· am,n is an R-module isomorphism, so we have Mm×n (R) ∼ = Rm·n . 7.11 Example: Let A and B be sets with |A| = |B|, and let g : A → B be a bijection. Then the map L : RA → RB given by L(f )(b) = f (g −1 (b)), that is by L(f ) = f g −1 , is an R-module isomorphism, and so we have RA ∼ = RB . In particular, if |A| = n then we have {1,2,···,n} {1,2,3,···} RA ∼ = Rn , and if |A| = ℵ0 then we have RA ∼ = Rω . =R =R 7.12 Example: Let A = (u1 , u2 , · · · , un ) be a finite ordered basis for a free R-module U . Then the map φA : U → Rn given by φA (x) = [x]A is an R-module isomorphism, so we have U ∼ = Rn . If B = (v1 , v2 , · · · , vn ) is a finite ordered basis for another free R-module V , then the ∼ map φ−1 B φA : U → V is an R-molule isomorphism, so we have U = V . 7.13 Example: Let R be a commutative ring. Let φ : Hom(Rn , Rm ) → Mm×n (R) be the map given by φ(L) = [L] = L(e1 ), L(e2 ), · · · , L(en ) ∈ Mm×n (R). Recall that the inverse of φ is the map ψ : Mm×n (R) → Hom(Rn , Rm ) given by ψ(A) = LA where LA (a) = Ax. Note that ψ preserves the R-module operations because φ(A + B) = LA+B = LA + LB = ψ(A) + ψ(B) and ψ(tA) = LtA = t LA = t ψ(A). Thus φ and ψ are R-module isomorphisms and we have Hom(Rn , Rm ) ∼ = Mm×n (R). In the case that m = n, we also have ψ(AB) = LAB = LA LB = ψ(A)ψ(B) and so the maps φ and ψ are in fact R-algebra isomorphisms so we have End(Rn ) ∼ = Mn (R) as R-algebras. By restricting φ and ψ to the invertible elements, we also obtain a group isomorphism Aut(Rn ) ∼ = GLn (R). 52 7.14 Theorem: Let R be a ring and let U and V be free R-modules. Then U ∼ = V if and only if there exists a basis A for U and a basis B for V with |A| = |B|. ∼ Proof: Suppose that U = V . Let A be a basis for U and let L : U → V be an isomorphism. Let B = L(A) = L(u) u ∈ A . Since L is bijective we have |A| = |B|. Note that B spans n P V because given y ∈ V we can choose x ∈ U with L(x) = y, then write x = ti ui with i=1 ti ∈ R and ui ∈ A, and then we have P P n n y = L(x) = L ti ui = ti L(ui ) ∈ Span (B). i=1 i=1 It remains to show that B is linearly independent. Suppose that n P ti vi = 0 where ti ∈ R i=1 and the vi are distinct elements in B. For each index i, choose ui ∈ A with L(ui ) = vi , and note that the elements ui are distinct because L is bijective. We have P n n n P P ti ui . 0= t i vi = ti L(ui ) = L i=1 i=1 i=1 Because L is injective, it follows that n P ti ui = 0 and then, because A is linearly indepen- i=1 dent, it follows that each ti = 0. Thus B is linearly independent, as required. Conversely, suppose that A is a basis for U and B is a basis for V with |A| = |B|. Let g : A → B be a bijection. Define a map L : U → V as follows. Given x ∈ U , n P write x = ti ui where ti ∈ R and the ui are distinct elements in A, and then define L(x) = x= n P n P i=1 ti g(ui ). 
7.14 Theorem: Let R be a ring and let U and V be free R-modules. Then U ≅ V if and only if there exists a basis A for U and a basis B for V with |A| = |B|.

Proof: Suppose that U ≅ V. Let A be a basis for U and let L : U → V be an isomorphism. Let B = L(A) = { L(u) | u ∈ A }. Since L is bijective we have |A| = |B|. Note that B spans V because given y ∈ V we can choose x ∈ U with L(x) = y, then write $x = \sum_{i=1}^{n} t_i u_i$ with tᵢ ∈ R and uᵢ ∈ A, and then we have
$$y = L(x) = L\Big(\sum_{i=1}^{n} t_i u_i\Big) = \sum_{i=1}^{n} t_i L(u_i) \in \mathrm{Span}(B).$$
It remains to show that B is linearly independent. Suppose that $\sum_{i=1}^{n} t_i v_i = 0$ where tᵢ ∈ R and the vᵢ are distinct elements in B. For each index i, choose uᵢ ∈ A with L(uᵢ) = vᵢ, and note that the elements uᵢ are distinct because L is bijective. We have
$$0 = \sum_{i=1}^{n} t_i v_i = \sum_{i=1}^{n} t_i L(u_i) = L\Big(\sum_{i=1}^{n} t_i u_i\Big).$$
Because L is injective, it follows that $\sum_{i=1}^{n} t_i u_i = 0$ and then, because A is linearly independent, it follows that each tᵢ = 0. Thus B is linearly independent, as required.
Conversely, suppose that A is a basis for U and B is a basis for V with |A| = |B|. Let g : A → B be a bijection. Define a map L : U → V as follows. Given x ∈ U, write $x = \sum_{i=1}^{n} t_i u_i$ where tᵢ ∈ R and the uᵢ are distinct elements in A, and then define $L(x) = \sum_{i=1}^{n} t_i\, g(u_i)$. Note that L is an R-module homomorphism because for r ∈ R and for $x = \sum_{i=1}^{n} s_i u_i$ and $y = \sum_{i=1}^{n} t_i u_i$ (where we are using the same elements uᵢ in both sums, with some of the coefficients equal to zero), we have
$$L(rx) = L\Big(\sum_{i=1}^{n} (r s_i) u_i\Big) = \sum_{i=1}^{n} r s_i\, g(u_i) = r \sum_{i=1}^{n} s_i\, g(u_i) = r\,L(x), \ \text{ and}$$
$$L(x + y) = L\Big(\sum_{i=1}^{n} (s_i + t_i) u_i\Big) = \sum_{i=1}^{n} (s_i + t_i)\, g(u_i) = \sum_{i=1}^{n} s_i\, g(u_i) + \sum_{i=1}^{n} t_i\, g(u_i) = L(x) + L(y).$$
Also note that L is bijective with inverse M : V → U given by $M\big(\sum_{i=1}^{n} t_i v_i\big) = \sum_{i=1}^{n} t_i\, g^{-1}(v_i)$, where tᵢ ∈ R and the vᵢ are distinct elements in B.

7.15 Corollary: Let F be a field and let U and V be vector spaces over F. Then U ≅ V ⟺ dim(U) = dim(V).

7.16 Remark: When U and V are free modules over a commutative ring R, we have U ≅ V ⟺ rank(U) = rank(V), but we have not built up enough machinery to prove this result.

7.17 Theorem: Let R be a ring, let U be a free R-module, and let V be any R-module. Let A be a basis for U and, for each u ∈ A, let v_u ∈ V. Then there exists a unique R-module homomorphism L : U → V with L(u) = v_u for all u ∈ A.

Proof: Note that if L : U → V is an R-module homomorphism with L(u) = v_u for all u ∈ A, then for tᵢ ∈ R and uᵢ ∈ A we have
$$L\Big(\sum_{i=1}^{n} t_i u_i\Big) = \sum_{i=1}^{n} t_i L(u_i) = \sum_{i=1}^{n} t_i\, v_{u_i}.$$
This shows that the map L is unique and must be given by the above formula. To prove existence, we define L : U → V by $L\big(\sum_{i=1}^{n} t_i u_i\big) = \sum_{i=1}^{n} t_i\, v_{u_i}$ where tᵢ ∈ R and the uᵢ are distinct elements in A, and we note that L is an R-module homomorphism because for $x = \sum_{i=1}^{n} s_i u_i$ and $y = \sum_{i=1}^{n} t_i u_i$ (using the same elements uᵢ in both sums) and r ∈ R we have
$$L(rx) = L\Big(\sum_{i=1}^{n} r s_i u_i\Big) = \sum_{i=1}^{n} r s_i\, v_{u_i} = r \sum_{i=1}^{n} s_i\, v_{u_i} = r\,L(x), \ \text{ and}$$
$$L(x + y) = L\Big(\sum_{i=1}^{n} (s_i + t_i) u_i\Big) = \sum_{i=1}^{n} (s_i + t_i)\, v_{u_i} = \sum_{i=1}^{n} s_i\, v_{u_i} + \sum_{i=1}^{n} t_i\, v_{u_i} = L(x) + L(y).$$

7.18 Corollary: Let R be a commutative ring, let U be a free R-module with basis A and let V be an R-module. Then the map φ : Hom(U, V) → V^A, given by φ(L)(u) = L(u) for all u ∈ A, is an R-module isomorphism, and so we have Hom(U, V) ≅ V^A.

Proof: The above theorem shows that the map φ is bijective, and we note that φ is an R-module homomorphism because for L, M ∈ Hom(U, V) and t ∈ R we have φ(L + M)(u) = (L + M)(u) = L(u) + M(u) = φ(L)(u) + φ(M)(u), and φ(tL)(u) = (tL)(u) = t L(u) = t φ(L)(u), for all u ∈ A, hence φ(L + M) = φ(L) + φ(M) and φ(tL) = t φ(L).

7.19 Example: For a module U over a commutative ring R, the dual module of U is the R-module U* = Hom(U, R). By the above corollary, when U is a free module with basis A, we have U* ≅ R^A, and if |A| = n then we have U* ≅ Rⁿ ≅ U. When U is a vector space over a field F, U* is called the dual vector space of U.

7.20 Definition: Let R be a commutative ring and let U and V be free R-modules with finite bases. Let A and B be finite ordered bases for U and V respectively, with |A| = n and |B| = m. For L ∈ Hom(U, V), we define the matrix of L with respect to A and B to be the matrix
$$[L]^A_B = \varphi_B\, L\, \varphi_A^{-1} \in M_{m\times n}(R).$$
When L ∈ End(U) we write [L]_A for the matrix [L]^A_A ∈ Mₙ(R).
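For the special case U = V = Q² with L = L_M for a concrete matrix M, Definition 7.20 can be computed directly: if the vectors of the ordered basis A are the columns of an invertible matrix Q_A (written in the standard basis), then φ_A(x) = Q_A⁻¹x, so [L]^A_B corresponds to Q_B⁻¹ M Q_A. The sketch below uses sympy (an outside library); M, Q_A and Q_B are arbitrary illustrative choices, and the last line checks the identity [L]^A_B [u]_A = [L(u)]_B stated in the theorem that follows.

```python
import sympy as sp

M  = sp.Matrix([[2, 1], [0, 3]])                    # L = L_M : Q^2 -> Q^2
QA = sp.Matrix([[1, 1], [0, 1]])                    # columns of QA form the basis A
QB = sp.Matrix([[1, 0], [1, 1]])                    # columns of QB form the basis B
L_AB = QB.inv() * M * QA                            # the matrix [L]^A_B

u = sp.Matrix([4, -2])                              # a sample vector
print(L_AB * (QA.inv() * u) == QB.inv() * (M * u))  # True: [L]^A_B [u]_A = [L(u)]_B
```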
7.21 Theorem: Let R be a commutative ring. Let A and B be finite ordered bases for free R-modules U and V, respectively, with |A| = n and |B| = m. Let L ∈ Hom(U, V). Then
(1) [L]^A_B is the matrix such that [L]^A_B [u]_A = [L(u)]_B for all u ∈ U, and
(2) if A = (u₁, u₂, ···, uₙ) and B = (v₁, v₂, ···, vₘ) then $[L]^A_B = \big([L(u_1)]_B, [L(u_2)]_B, \cdots, [L(u_n)]_B\big) \in M_{m\times n}(R)$.

Proof: Part (1) holds because for u ∈ U and x = φ_A(u) = [u]_A we have
$$[L]^A_B\,[u]_A = \varphi_B\, L\, \varphi_A^{-1}\big(\varphi_A(u)\big) = \varphi_B\big(L(u)\big) = [L(u)]_B.$$
Part (2) follows from Part (1) because for each index k, the k-th column of [L]^A_B is
$$[L]^A_B\, e_k = [L]^A_B\,[u_k]_A = [L(u_k)]_B.$$

7.22 Theorem: Let R be a commutative ring. Let A, B and C be finite ordered bases for free R-modules U, V and W, respectively.
(1) For L, M ∈ Hom(U, V) and t ∈ R we have [L + M]^A_B = [L]^A_B + [M]^A_B and [tL]^A_B = t[L]^A_B.
(2) For L ∈ Hom(U, V) and M ∈ Hom(V, W) we have [ML]^A_C = [M]^B_C [L]^A_B.

Proof: We prove Part (2), leaving the (similar) proof of Part (1) as an exercise. Let L ∈ Hom(U, V) and let M ∈ Hom(V, W). Say |A| = n, |B| = m and |C| = l. Let x ∈ Rⁿ. Choose u ∈ U with [u]_A = φ_A(u) = x. Then
$$[ML]^A_C\, x = [ML]^A_C\,[u]_A = [M(L(u))]_C = [M]^B_C\,[L(u)]_B = [M]^B_C\,[L]^A_B\,[u]_A = [M]^B_C\,[L]^A_B\, x.$$
Since [ML]^A_C x = [M]^B_C [L]^A_B x for all x ∈ Rⁿ, it follows that [ML]^A_C = [M]^B_C [L]^A_B.

7.23 Corollary: Let R be a commutative ring. Let A and B be finite ordered bases for free R-modules U and V, with |A| = n and |B| = m. Then the map φ^A_B : Hom(U, V) → M_{m×n}(R) given by φ^A_B(L) = [L]^A_B is an R-module isomorphism, and the map φ_A : End(U) → Mₙ(R) given by φ_A(L) = [L]_A is an R-algebra isomorphism which restricts to a group isomorphism φ_A : Aut(U) → GLₙ(R).

7.24 Corollary: (Change of Basis) Let R be a commutative ring. Let U and V be free R-modules. Let A and C be two ordered bases for U with |A| = |C| = n and let B and D be two ordered bases for V with |B| = |D| = m. For L ∈ Hom(U, V) and u ∈ U we have
$$[u]_C = [I_U]^A_C\,[u]_A \quad\text{and}\quad [L]^C_D = [I_V]^B_D\,[L]^A_B\,[I_U]^C_A$$
where I_U and I_V are the identity maps on U and V.

Proof: By Part (1) of Theorem 7.21 we have [u]_C = [I_U(u)]_C = [I_U]^A_C [u]_A, and by Part (2) of Theorem 7.22 we have
$$[L]^C_D = [I_V\, L\, I_U]^C_D = [I_V]^B_D\,[L]^A_B\,[I_U]^C_A.$$

7.25 Definition: Let A = (u₁, u₂, ···, uₙ) and B = (v₁, v₂, ···, vₙ) be two finite ordered bases for a module U over a commutative ring R. The matrix
$$[I]^A_B = \big([u_1]_B, [u_2]_B, \cdots, [u_n]_B\big) \in M_n(R)$$
is called the change of basis matrix from A to B. Note that [I]^A_B is invertible with ([I]^A_B)⁻¹ = [I]^B_A.

7.26 Note: Let A and B be two finite ordered bases, with |A| = |B|, for a free module U over a commutative ring R. For L ∈ End(U), the Change of Basis Theorem gives
$$[L]_B = [L]^B_B = [I]^A_B\,[L]^A_A\,[I]^B_A = [I]^A_B\,[L]_A\,\big([I]^A_B\big)^{-1}.$$
If we let A = [L]_A and B = [L]_B and P = [I]^A_B then the formula becomes B = PAP⁻¹.

7.27 Note: Given a finite ordered basis B = (v₁, v₂, ···, vₙ) for a free module U over a commutative ring R, and given an invertible matrix P ∈ GLₙ(R), if we choose u_k ∈ U with [u_k]_B = P e_k, then A = (u₁, u₂, ···, uₙ) is an ordered basis for U such that [I]^A_B = P. Thus every invertible matrix P ∈ GLₙ(R) is equal to a change of basis matrix.

7.28 Definition: Let R be a commutative ring. For A, B ∈ Mₙ(R), we say that A and B are similar, and we write A ∼ B, when B = PAP⁻¹ for some P ∈ GLₙ(R).

7.29 Note: Let R be a commutative ring, and let A, B ∈ Mₙ(R) with A ∼ B. Choose P ∈ GLₙ(R) so that B = PAP⁻¹. Then we have
$$\det(B) = \det(PAP^{-1}) = \det(P)\det(A)\det(P)^{-1} = \det(A).$$
Thus similar matrices have the same determinant.

7.30 Definition: Let F be a field and let U be a finite dimensional vector space over F. For L ∈ End(U), we define the determinant of L to be det(L) = det([L]_A) where A is any ordered basis for U. By Notes 7.26 and 7.29, this value does not depend on the choice of the basis A.
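A quick computational check of Note 7.29 (and hence of the well-definedness of det(L) in Definition 7.30): conjugating a matrix by any invertible matrix leaves the determinant unchanged. The sketch uses sympy (an outside library); A and P are arbitrary illustrative choices.

```python
import sympy as sp

A = sp.Matrix([[1, 2], [3, 4]])
P = sp.Matrix([[2, 1], [1, 1]])             # invertible: det(P) = 1
B = P * A * P.inv()                         # B is similar to A
print(sp.simplify(B.det()) == A.det())      # True: similar matrices have equal determinant
```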
Chapter 8. Eigenvalues, Eigenvectors and Diagonalization

8.1 Definition: For a square matrix D ∈ Mₙ(R) with entries in a ring R, we say that D is a diagonal matrix when D_{k,l} = 0 whenever k ≠ l. For λ₁, λ₂, ···, λₙ ∈ R, we write D = diag(λ₁, λ₂, ···, λₙ) for the diagonal matrix D with D_{k,k} = λ_k for all indices k.

8.2 Definition: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. We say that L is diagonalizable when there exists an ordered basis A for U such that [L]_A is diagonal.

8.3 Note: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. When A = (u₁, u₂, ···, uₙ) is an ordered basis for U and λ₁, λ₂, ···, λₙ ∈ F, we have
$$[L]_A = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n) \iff [L(u_k)]_A = \lambda_k e_k \text{ for all } k \iff L(u_k) = \lambda_k u_k \text{ for all } k.$$
Thus L is diagonalizable if and only if there exists an ordered basis A = (u₁, u₂, ···, uₙ) for U and there exist λ₁, λ₂, ···, λₙ ∈ F such that L(u_k) = λ_k u_k for all k.

8.4 Definition: Let L ∈ End(U) where U is a vector space over a field F. For λ ∈ F, we say that λ is an eigenvalue (or a characteristic value) of L when there exists a nonzero vector 0 ≠ u ∈ U such that L(u) = λu. Such a vector 0 ≠ u ∈ U is called an eigenvector (or characteristic vector) of L for λ. The spectrum of L is the set
$$\mathrm{Spec}(L) = \{\, \lambda \in F \mid \lambda \text{ is an eigenvalue of } L \,\}.$$
For λ ∈ F, the eigenspace of L for λ is the subspace
$$E_\lambda = \{\, u \in U \mid L(u) = \lambda u \,\} = \{\, u \in U \mid (L - \lambda I)u = 0 \,\} = \mathrm{Ker}(L - \lambda I) \subseteq U.$$
Note that E_λ consists of the eigenvectors for λ together with the zero vector.

8.5 Note: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. For λ ∈ F,
$$\lambda \text{ is an eigenvalue of } L \iff \text{there exists } 0 \ne u \in \mathrm{Ker}(L - \lambda I) \iff (L - \lambda I) \text{ is not invertible} \iff \det(L - \lambda I) = 0 \iff \lambda \text{ is a root of } f(x) = \det(L - xI).$$
Note that when A is any ordered basis for U, we have f(x) = det(L − xI) = det([L − xI]_A) = det([L]_A − xI) ∈ Pₙ(F).

8.6 Definition: Let L ∈ End(U) where U is an n-dimensional vector space over a field F. The characteristic polynomial of L is the polynomial f_L(x) = det(L − xI) ∈ Pₙ(F). Note that Spec(L) is the set of roots of f_L(x).

8.7 Note: Let L ∈ End(U) where U is an n-dimensional vector space over a field F. Recall that L is diagonalizable if and only if there exists an ordered basis A = (u₁, u₂, ···, uₙ) for U such that each u_k is an eigenvector for some eigenvalue λ_k. The eigenvalues of L are the roots of f_L(x), so there are at most n possible distinct eigenvalues. For each eigenvalue λ, the largest number of linearly independent eigenvectors for λ is equal to the dimension of E_λ. We can try to diagonalize L by finding all the eigenvalues λ of L, then finding a basis for each eigenspace E_λ, then selecting an ordered basis A from the union of the bases of the eigenspaces. In particular, note that if $\sum_{\lambda \in \mathrm{Spec}(L)} \dim(E_\lambda) < n$ then L cannot be diagonalizable.

8.8 Definition: Let F be a field and let A ∈ Mₙ(F). By identifying A with the linear map L = L_A ∈ End(Fⁿ) given by L(x) = Ax, all of the above definitions and remarks may be applied to the matrix A. The matrix A is diagonalizable when there exists an invertible matrix P and a diagonal matrix D such that A = PDP⁻¹ (or equivalently P⁻¹AP = D). An eigenvalue for A is an element λ ∈ F for which there exists 0 ≠ x ∈ Fⁿ such that Ax = λx, and then such a vector x is called an eigenvector of A for λ. The set of eigenvalues of A, denoted by Spec(A), is called the spectrum of A. For λ ∈ F, the eigenspace for λ is the vector space E_λ = Null(A − λI). The characteristic polynomial of A is the polynomial f_A(x) = det(A − xI) ∈ Pₙ(F).
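The eigenvalue computations in the examples that follow can be checked with a computer algebra system. The sketch below uses sympy (an outside library, not part of these notes), applied to the matrix from the next example; note that sympy's charpoly computes det(xI − A), which differs from f_A(x) = det(A − xI) by at most a sign.

```python
import sympy as sp

A = sp.Matrix([[3, -1], [4, -1]])              # the matrix of the next example
x = sp.symbols('x')
print(sp.factor(A.charpoly(x).as_expr()))      # (x - 1)**2, so the only eigenvalue is 1
print(A.eigenvects())                          # eigenvalue 1 with a single eigenvector direction
print(A.is_diagonalizable())                   # False
```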
8.9 Example: Let $A = \begin{pmatrix} 3 & -1 \\ 4 & -1 \end{pmatrix} \in M_2(F)$ where F = R or C. The characteristic polynomial of A is
$$f_A(x) = \det(A - xI) = \begin{vmatrix} 3-x & -1 \\ 4 & -1-x \end{vmatrix} = (x-3)(x+1) + 4 = x^2 - 2x + 1 = (x-1)^2$$
so the only eigenvalue of A is λ = 1. When λ = 1 we have
$$(A - \lambda I) = (A - I) = \begin{pmatrix} 2 & -1 \\ 4 & -2 \end{pmatrix} \sim \begin{pmatrix} 2 & -1 \\ 0 & 0 \end{pmatrix}$$
so the eigenspace E₁ = Null(A − I) has basis {u} where u = (1, 2)ᵀ. Since
$$\sum_{\lambda \in \mathrm{Spec}(A)} \dim(E_\lambda) = \dim(E_1) = 1 < 2,$$
we see that A is not diagonalizable.

8.10 Example: Let $A = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \in M_2(F)$ where F = R or C. The characteristic polynomial of A is
$$f_A(x) = \begin{vmatrix} 1-x & -2 \\ 2 & 1-x \end{vmatrix} = (x-1)^2 + 4 = x^2 - 2x + 5.$$
For x ∈ C, we have $f_A(x) = 0 \iff x = \frac{2 \pm \sqrt{4-20}}{2} = 1 \pm 2i$. When F = R, A has no eigenvalues (in R) and so A is not diagonalizable. When F = C, the eigenvalues of A are λ₁ = 1 + 2i and λ₂ = 1 − 2i. As an exercise, show that when λ = λ₁ the eigenspace E_{λ₁} has basis {u₁} where u₁ = (i, 1)ᵀ, and when λ = λ₂ the eigenspace E_{λ₂} has basis {u₂} where u₂ = (−i, 1)ᵀ, then verify that the matrix P = (u₁, u₂) ∈ M₂(C) is invertible and that P⁻¹AP = diag(λ₁, λ₂), thus showing that A is diagonalizable.

8.11 Theorem: Let L ∈ End(U) where U is a vector space over a field F. Let λ₁, λ₂, ···, λ_ℓ ∈ F be distinct eigenvalues of L. For each index k, let 0 ≠ u_k ∈ U be an eigenvector of L for λ_k. Then {u₁, u₂, ···, u_ℓ} is linearly independent.

Proof: Since u₁ ≠ 0 the set {u₁} is linearly independent. Suppose, inductively, that the set {u₁, u₂, ···, u_{ℓ−1}} is linearly independent. Suppose that $\sum_{i=1}^{\ell} t_i u_i = 0$ with tᵢ ∈ F. Note that
$$0 = (L - \lambda_\ell I)\Big(\sum_{i=1}^{\ell} t_i u_i\Big) = \sum_{i=1}^{\ell} t_i\big(L(u_i) - \lambda_\ell u_i\big) = \sum_{i=1}^{\ell} t_i(\lambda_i - \lambda_\ell)u_i = \sum_{i=1}^{\ell-1} t_i(\lambda_i - \lambda_\ell)u_i$$
and so tᵢ = 0 for 1 ≤ i < ℓ since {u₁, u₂, ···, u_{ℓ−1}} is linearly independent and λᵢ − λ_ℓ ≠ 0. Since tᵢ = 0 for 1 ≤ i ≤ ℓ − 1 and $\sum_{i=1}^{\ell} t_i u_i = 0$, we also have t_ℓ = 0 because u_ℓ ≠ 0. Thus {u₁, u₂, ···, u_ℓ} is linearly independent.

8.12 Corollary: Let L ∈ End(U) where U is a vector space over a field F. Let λ₁, λ₂, ···, λ_ℓ be distinct eigenvalues of L. For each index k, let A_k be a linearly independent set of eigenvectors for λ_k. Then $\bigcup_{k=1}^{\ell} A_k$ is linearly independent.

Proof: Suppose that $\sum_{k=1}^{\ell}\sum_{i=1}^{m_k} t_{k,i}\, u_{k,i} = 0$ where each t_{k,i} ∈ F and, for each k, the vectors u_{k,i} are distinct vectors in A_k. Then we have $\sum_{k=1}^{\ell} u_k = 0$ where $u_k = \sum_{i=1}^{m_k} t_{k,i}\, u_{k,i} \in E_{\lambda_k}$. From the above theorem, it follows that u_k = 0 for all k, because if we had u_k ≠ 0 for some values of k, say the values k₁, k₂, ···, k_r, then {u_{k₁}, u_{k₂}, ···, u_{k_r}} would be linearly independent but $\sum_{j=1}^{r} u_{k_j} = 0$, which is impossible. Since for each index k we have $0 = u_k = \sum_{i=1}^{m_k} t_{k,i}\, u_{k,i}$, it follows that each t_{k,i} = 0 because A_k is linearly independent.

8.13 Corollary: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. Then L is diagonalizable if and only if
$$\sum_{\lambda \in \mathrm{Spec}(L)} \dim(E_\lambda) = \dim U.$$
In this case, if Spec(L) = {λ₁, λ₂, ···, λ_ℓ} and, for each k, A_k = (u_{k,1}, u_{k,2}, ···, u_{k,m_k}) is an ordered basis for E_{λ_k}, then
$$A = \bigcup_{k=1}^{\ell} A_k = \big(u_{1,1}, u_{1,2}, \cdots, u_{1,m_1},\, u_{2,1}, u_{2,2}, \cdots, u_{2,m_2},\, \cdots,\, u_{\ell,1}, u_{\ell,2}, \cdots, u_{\ell,m_\ell}\big)$$
is an ordered basis for U such that [L]_A is diagonal.

8.14 Definition: Let F be a field. For f ∈ F[x] and a ∈ F, the multiplicity of a as a root of f, denoted by mult(a, f(x)), is the largest m ∈ N such that (x − a)^m is a factor of f(x). Note that a is a root of f if and only if mult(a, f) > 0. For a non-constant polynomial f ∈ F[x], we say that f splits (over F) when f factors into a product of linear factors in F[x], that is, when f is of the form $f(x) = c\prod_{i=1}^{n}(x - a_i)$ for some c, a₁, ···, aₙ ∈ F.
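Definition 8.14 can be related back to Example 8.10 computationally: the characteristic polynomial x² − 2x + 5 does not split over Q (or R) but does split over C, and sympy can also report root multiplicities. This is only a sketch using sympy (an outside library); the last polynomial is an arbitrary illustration of multiplicities.

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 - 2*x + 5                      # characteristic polynomial from Example 8.10
print(sp.factor(f))                     # stays x**2 - 2*x + 5: irreducible, so f does not split over Q or R
print(sp.factor(f, extension=sp.I))     # (x - 1 - 2*I)*(x - 1 + 2*I): f splits over C
print(sp.roots((x - 1)**2 * (x - 4)))   # {1: 2, 4: 1}: roots listed with their multiplicities
```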
8.15 Theorem: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. Let λ ∈ Spec(L) and let m_λ = mult(λ, f_L(x)). Then 1 ≤ dim(E_λ) ≤ m_λ.

Proof: Since λ is an eigenvalue of L we have E_λ ≠ {0}, so dim(E_λ) ≥ 1. Let m = dim(E_λ) and let A = (u₁, u₂, ···, uₘ) be an ordered basis for E_λ. Extend A to an ordered basis B = (u₁, ···, uₘ, ···, uₙ) for U. Since L(uᵢ) = λuᵢ for 1 ≤ i ≤ m, the matrix [L]_B is of the form
$$[L]_B = \begin{pmatrix} \lambda I & C \\ 0 & D \end{pmatrix} \in M_n(F)$$
where I ∈ Mₘ(F). The characteristic polynomial of L is
$$f_L(x) = \begin{vmatrix} (\lambda - x)I & C \\ 0 & D - xI \end{vmatrix} = (\lambda - x)^m f_D(x).$$
Thus (x − λ)^m is a factor of f_L(x) and so m_λ = mult(λ, f_L(x)) ≥ m.

8.16 Corollary: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. Then L is diagonalizable if and only if f_L(x) splits and dim(E_λ) = mult(λ, f_L(x)) for every λ ∈ Spec(L).

Proof: Suppose that L is diagonalizable. Choose an ordered basis A so that [L]_A is diagonal, say [L]_A = D = diag(λ₁, ···, λₙ). Note that $f_L(x) = f_D(x) = \prod_{k=1}^{n}(\lambda_k - x)$, and so f_L(x) splits. For each λ ∈ Spec(L), let m_λ = mult(λ, f_L(x)). Then, by the above theorem together with Corollary 8.13, we have
$$n = \dim(U) = \sum_{\lambda \in \mathrm{Spec}(L)} \dim(E_\lambda) \le \sum_{\lambda \in \mathrm{Spec}(L)} m_\lambda = \deg(f_L) = n$$
which implies that dim(E_λ) = m_λ for all λ. Conversely, if f_L splits and dim(E_λ) = m_λ for all λ, then
$$\sum_{\lambda \in \mathrm{Spec}(L)} \dim(E_\lambda) = \sum_{\lambda \in \mathrm{Spec}(L)} m_\lambda = \deg(f_L) = n = \dim(U)$$
and so L is diagonalizable.

8.17 Corollary: Let A ∈ Mₙ(F) where F is a field. Then A is diagonalizable if and only if f_A(x) splits and dim(E_λ) = mult(λ, f_A(x)) for all λ ∈ Spec(A).

8.18 Note: To summarize the above results, given a matrix A ∈ Mₙ(F), where F is a field, we can determine whether A is diagonalizable as follows. We find the characteristic polynomial f_A(x) = det(A − xI). We factor f_A(x) to find the eigenvalues of A and the multiplicity of each eigenvalue. If f_A(x) does not split then A is not diagonalizable. If f_A(x) does split, then for each eigenvalue λ with multiplicity m_λ ≥ 2, we calculate dim(E_λ). If we find one eigenvalue λ for which dim(E_λ) < m_λ then A is not diagonalizable. Otherwise A is diagonalizable. In particular we remark that if f_A(x) splits and has n distinct roots (so the eigenvalues all have multiplicity 1) then A is diagonalizable.
In the case that A is diagonalizable and $f_A(x) = (-1)^n \prod_{k=1}^{\ell} (x - \lambda_k)^{m_k}$, if we find an ordered basis A_k = (u_{k,1}, u_{k,2}, ···, u_{k,m_k}) for each eigenspace, then we have P⁻¹AP = D with
$$P = \big(u_{1,1}, u_{1,2}, \cdots, u_{1,m_1},\, u_{2,1}, u_{2,2}, \cdots, u_{2,m_2},\, \cdots,\, u_{\ell,1}, u_{\ell,2}, \cdots, u_{\ell,m_\ell}\big) \quad\text{and}\quad D = \mathrm{diag}\big(\lambda_1, \cdots, \lambda_1,\, \lambda_2, \cdots, \lambda_2,\, \cdots,\, \lambda_\ell, \cdots, \lambda_\ell\big)$$
where each λ_k is repeated m_k times.

8.19 Example: Let $A = \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix} \in M_3(Q)$. Determine whether A is diagonalizable and, if so, find an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D.

Solution: The characteristic polynomial of A is
$$f_A(x) = \det(A - xI) = \begin{vmatrix} 3-x & 1 & 1 \\ 2 & 4-x & 2 \\ -1 & -1 & 1-x \end{vmatrix} = -(x-3)(x-4)(x-1) - 2 - 2 - 2(x-3) + 2(x-1) - (x-4)$$
$$= -(x^3 - 8x^2 + 19x - 12) - x + 4 = -(x^3 - 8x^2 + 20x - 16) = -(x-2)(x^2 - 6x + 8) = -(x-2)^2(x-4)$$
so the eigenvalues are λ₁ = 2 and λ₂ = 4 with multiplicities m₁ = 2 and m₂ = 1. When λ = λ₁ = 2 we have
$$A - \lambda I = A - 2I = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ -1 & -1 & -1 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
so the eigenspace E₂ has basis {u₁, u₂} with u₁ = (−1, 0, 1)ᵀ and u₂ = (−1, 1, 0)ᵀ. When λ = λ₂ = 4 we have
$$A - \lambda I = A - 4I = \begin{pmatrix} -1 & 1 & 1 \\ 2 & 0 & 2 \\ -1 & -1 & -3 \end{pmatrix} \sim \begin{pmatrix} 1 & -1 & -1 \\ 0 & 2 & 4 \\ 0 & 2 & 4 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$
so the eigenspace E₄ has basis {u₃} where u₃ = (−1, −2, 1)ᵀ. Thus we have P⁻¹AP = D where
$$P = (u_1, u_2, u_3) = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 1 & -2 \\ 1 & 0 & 1 \end{pmatrix} \quad\text{and}\quad D = \mathrm{diag}(\lambda_1, \lambda_1, \lambda_2) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}.$$
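The hand computation in Example 8.19 can be verified with sympy's built-in diagonalization (an outside library; it may return the eigenvector columns in a different order or scaling than the computation above).

```python
import sympy as sp

A = sp.Matrix([[3, 1, 1], [2, 4, 2], [-1, -1, 1]])
P, D = A.diagonalize()                      # returns P, D with A = P D P^{-1}
print(D)                                    # diag(2, 2, 4), up to the order of the eigenvalues
print(sp.simplify(P.inv() * A * P) == D)    # True
```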
8.20 Definition: For a square matrix T ∈ Mₙ(R) with entries in a ring R, we say that T is upper triangular when T_{k,l} = 0 whenever k > l.

8.21 Definition: Let L ∈ End(U) where U is a finite dimensional vector space over a field F. We say that L is (upper) triangularizable when there exists an ordered basis A for U such that [L]_A is upper triangular.

8.22 Definition: For a square matrix A ∈ Mₙ(F), where F is a field, we say that A is (upper) triangularizable when there exists an invertible matrix P ∈ GLₙ(F) such that P⁻¹AP is upper triangular.

8.23 Theorem: (Schur's Theorem) Let F be a field.
(1) Let L ∈ End(U) where U is a finite dimensional vector space over F. Then L is triangularizable if and only if f_L(x) splits, and
(2) let A ∈ Mₙ(F). Then A is triangularizable if and only if f_A(x) splits.

Proof: We shall prove Part (2), and we leave it as an exercise to show that Part (1) holds if and only if Part (2) holds. Suppose first that A is triangularizable. Choose an invertible matrix P and an upper triangular matrix T with P⁻¹AP = T. Then
$$f_A(x) = f_T(x) = \prod_{k=1}^{n}(T_{k,k} - x)$$
and so f_A(x) splits. Conversely, suppose that f_A(x) splits. Choose a root λ₁ of f_A(x) and note that λ₁ is an eigenvalue of A. Choose an eigenvector u₁ for λ₁, so we have Au₁ = λ₁u₁. Since u₁ ≠ 0 the set {u₁} is linearly independent. Extend the set {u₁} to a basis A = {u₁, u₂, ···, uₙ} for Fⁿ. Let Q = (u₁, u₂, ···, uₙ) ∈ Mₙ(F), and note that Q is invertible because A is a basis for Fⁿ. Since Q⁻¹Q = I, the first column of Q⁻¹Q is equal to e₁, that is Q⁻¹u₁ = e₁, so we have
$$Q^{-1}AQ = Q^{-1}A(u_1, u_2, \cdots, u_n) = Q^{-1}(Au_1, Au_2, \cdots, Au_n) = \big(\lambda_1 Q^{-1}u_1,\; Q^{-1}A(u_2, \cdots, u_n)\big) = \big(\lambda_1 e_1,\; Q^{-1}A(u_2, \cdots, u_n)\big) = \begin{pmatrix} \lambda_1 & w^T \\ 0 & B \end{pmatrix}$$
with w ∈ Fⁿ⁻¹ and B ∈ Mₙ₋₁(F). Note that f_A(x) = (λ₁ − x)f_B(x) and so f_B(x) splits. We suppose, inductively, that B is triangularizable. Choose R ∈ GLₙ₋₁(F) so that R⁻¹BR = S with S upper triangular. Let
$$P = Q\begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} \in M_n(F).$$
Then P is invertible with $P^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & R^{-1} \end{pmatrix} Q^{-1}$ and
$$P^{-1}AP = \begin{pmatrix} 1 & 0 \\ 0 & R^{-1} \end{pmatrix} Q^{-1}AQ \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & R^{-1} \end{pmatrix}\begin{pmatrix} \lambda_1 & w^T \\ 0 & B \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & R^{-1} \end{pmatrix}\begin{pmatrix} \lambda_1 & w^T R \\ 0 & BR \end{pmatrix} = \begin{pmatrix} \lambda_1 & w^T R \\ 0 & R^{-1}BR \end{pmatrix} = \begin{pmatrix} \lambda_1 & w^T R \\ 0 & S \end{pmatrix}$$
which is upper triangular.
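The inductive construction in the proof of Schur's Theorem can be turned directly into a computation. The sketch below follows the same steps (pick an eigenvalue, extend an eigenvector to a basis, conjugate, recurse on the smaller block) using sympy (an outside library); it assumes sympy can compute an eigenvalue of the matrix exactly, and the test matrix is the one from Example 8.19.

```python
import sympy as sp

def triangularize(A):
    """Return P with P^{-1} A P upper triangular, following the proof of 8.23.
    A sketch: it assumes the characteristic polynomial splits over the field
    in which sympy finds the eigenvalues."""
    n = A.rows
    if n == 1:
        return sp.eye(1)
    lam, mult, vects = A.eigenvects()[0]     # an eigenvalue and an eigenvector u1 for it
    u1 = vects[0]
    basis = [u1]                             # extend {u1} to a basis of F^n greedily
    for k in range(n):
        e = sp.eye(n)[:, k]
        if sp.Matrix.hstack(*(basis + [e])).rank() == len(basis) + 1:
            basis.append(e)
    Q = sp.Matrix.hstack(*basis)             # Q = (u1, u2, ..., un)
    M = Q.inv() * A * Q                      # block form (lam, w^T; 0, B)
    B = M[1:, 1:]
    R = triangularize(B)                     # triangularize the smaller block
    return Q * sp.diag(1, R)                 # P = Q * [[1, 0], [0, R]]

A = sp.Matrix([[3, 1, 1], [2, 4, 2], [-1, -1, 1]])
P = triangularize(A)
print(sp.simplify(P.inv() * A * P))          # an upper triangular matrix similar to A
```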