MATH 136: Linear Algebra 1 for Honours Mathematics
Fall 2023 Edition
1st edition 2020 by Conrad Hewitt and Francine Vinette
© Faculty of Mathematics, University of Waterloo

We would like to acknowledge contributions from: Shane Bauman, Judith Koeller, Anton Mosunov, Graeme Turner, and Owen Woody

Contents

1 Vectors in Rⁿ
1.1 Introduction
1.2 Algebraic and Geometric Representation of Vectors
1.3 Operations on Vectors
1.4 Vectors in Cⁿ
1.5 Dot Product in Rⁿ
1.6 Projection, Components and Perpendicular
1.7 Standard Inner Product in Cⁿ
1.8 Fields
1.9 The Cross Product in R³

2 Span, Lines and Planes
2.1 Linear Combinations and Span
2.2 Lines in R²
2.3 Lines in Rⁿ
2.4 The Vector Equation of a Plane in Rⁿ
2.5 Scalar Equation of a Plane in R³

3 Systems of Linear Equations
3.1 Introduction
3.2 Systems of Linear Equations
3.3 An Approach to Solving Systems of Linear Equations
3.4 Solving Systems of Linear Equations Using Matrices
3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations
3.6 Rank and Nullity
3.7 Homogeneous and Non-Homogeneous Systems, Nullspace
3.8 Solving Systems of Linear Equations over C
3.9 Matrix–Vector Multiplication
3.10 Using a Matrix–Vector Product to Express a System of Linear Equations
3.11 Solution Sets to Systems of Linear Equations

4 Matrices
4.1 The Column and Row Spaces of a Matrix
4.2 Matrix Equality and Multiplication
4.3 Arithmetic Operations on Matrices
4.4 Properties of Square Matrices
4.5 Elementary Matrices
4.6 Matrix Inverse

5 Linear Transformations
5.1 The Function Determined by a Matrix
5.2 Linear Transformations
5.3 The Range of a Linear Transformation and "Onto" Linear Transformations
5.4 The Kernel of a Linear Transformation and "One-to-One" Linear Transformations
5.5 Every Linear Transformation is Determined by a Matrix
5.6 Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection
5.7 Composition of Linear Transformations

6 The Determinant
6.1 The Definition of the Determinant
6.2 Computing the Determinant in Practice: EROs
6.3 The Determinant and Invertibility
6.4 An Expression for A⁻¹
6.5 Cramer's Rule
6.6 The Determinant and Geometry

7 Eigenvalues and Diagonalization
7.1 What is an Eigenpair?
7.2 The Characteristic Polynomial and Finding Eigenvalues
7.3 Properties of the Characteristic Polynomial
7.4 Finding Eigenvectors
7.5 Eigenspaces
7.6 Diagonalization

8 Subspaces and Bases
8.1 Subspaces
8.2 Linear Dependence and the Notion of a Basis of a Subspace
8.3 Detecting Linear Dependence and Independence
8.4 Spanning Sets
8.5 Basis
8.6 Bases for Col(A) and Null(A)
8.7 Dimension
8.8 Coordinates

9 Diagonalization
9.1 Matrix Representation of a Linear Operator
9.2 Diagonalizability of Linear Operators
9.3 Diagonalizability of Matrices Revisited
9.4 The Diagonalizability Test

10 Vector Spaces
10.1 Definition of a Vector Space
10.2 Span, Linear Independence and Basis
10.3 Linear Operators

Chapter 1: Vectors in Rⁿ

1.1 Introduction

Linear algebra is used widely in the social sciences, business, science, and engineering. Vectors are used in the sciences for displacement, velocity, acceleration, force, and many other important physical concepts. Informally, a vector is an object that has both magnitude and direction. For instance, the velocity of a boat on a river can be represented by a vector that encodes both its magnitude (speed) and its direction of travel. In mathematics, a vector is represented by a column of numbers, as in the following definition.

Definition 1.1.1 (Rⁿ, Vector in Rⁿ)
⎥ #» 𝑛 The set R is defined as 𝑥 = ⎣ . ⎦ : 𝑥1 , . . . , 𝑥𝑛 ∈ R . ⎪ ⎪ ⎩ ⎭ 𝑥𝑛 ⎡ ⎤ 𝑥1 ⎢ .. ⎥ #» A vector is an element 𝑥 = ⎣ . ⎦ of R𝑛 . ⎡ 𝑥𝑛 REMARK In these notes, a vector is indicated by an accented right arrow: #» 𝑣. Other texts use different notation for vectors, such as v or 𝑣. ⎡ Example 1.1.2 [︂ ]︂ 2 #» The following are vectors: 𝑤 = ∈ R2 , 3 ⎡ ⎤ −3 ⎢ ⎢ #» 𝑣 = ⎣ 1 ⎦ ∈ R3 , and #» 𝑢 =⎢ ⎣ −5 5 𝑢1 𝑢2 .. . 𝑢100 ⎤ ⎥ ⎥ ⎥ ∈ R100 . ⎦ 6 Chapter 1 Vectors in R𝑛 We will usually write vectors in column notation. It can be more convenient and compact to write a vector in row notation, which includes the superscript “𝑇 ” (short for transpose) and clearly separates the components of the vectors with spaces, as follows. ⎤ 𝑣1 ⎢ 𝑣2 ⎥ ]︀𝑇 [︀ ⎢ ⎥ 𝑣 = 𝑣1 𝑣2 · · · 𝑣𝑛 . The row notation for the vector #» 𝑣 = ⎢ . ⎥ is #» . ⎣ . ⎦ ⎡ Definition 1.1.3 Row Notation for a Vector 𝑣𝑛 REMARK ]︀𝑇 ]︀ [︀ [︀ Many texts do not distinguish between 𝑣1 𝑣2 · · · 𝑣𝑛 and 𝑣1 𝑣2 · · · 𝑣𝑛 . ]︀𝑇 ]︀ [︀ [︀ For our purposes, this distinction is important, i.e. 𝑣1 𝑣2 . . . 𝑣𝑛 ̸= 𝑣1 𝑣2 . . . 𝑣𝑛 . 1.2 Algebraic and Geometric Representation of Vectors [︂ ]︂ 2 #» . A vector’s algebraic representation consists of a column of numbers, e.g. 𝑣 = 3 A vector’s geometric representation consists of a directed line segment in R𝑛 . Vectors are not localized; that is, if two vectors have the same magnitude and direction, but different starting points, we consider those vectors to be equal. For convenience, we often choose the starting point of a vector to be the origin, 𝑂. Figure 1.2.1: Vectors and the Origin Relationship between Geometric and Algebraic Representations Consider the geometric representation for a vector #» 𝑣 in R𝑛 , which is a directed line segment. #» We typically consider the initial point of 𝑣 to be the origin 𝑂 and call its terminal point V (i.e., we use the same letter as the vector). If the point V has coordinates (𝑣1 , 𝑣2 , . . . , 𝑣𝑛 ), then we write the vector #» 𝑣 as ⎡ ⎤ 𝑣1 ⎢ 𝑣2 ⎥ ⎢ ⎥ #» 𝑣 =⎢ . ⎥ ⎣ .. ⎦ 𝑣𝑛 This 𝑛-tuple is the algebraic representation of #» 𝑣. Section 1.3 Operations on Vectors 7 ⎤ 𝑣1 ⎢ 𝑣2 ⎥ ⎢ ⎥ On the other hand, if the algebraic representation of a vector #» 𝑣 in R𝑛 is ⎢ . ⎥, then its ⎣ .. ⎦ ⎡ 𝑣𝑛 geometric interpretation is a directed line segment from the initial point, 𝑂, to the terminal point, V, with co-ordinates (𝑣1 , 𝑣2 , . . . , 𝑣𝑛 ). [︂ ]︂ 𝑣1 #» Figure 1.2.2: The geometric representation of 𝑣 = in R2 𝑣2 We are free to move the vector #» 𝑣 around, as long as we maintain its magnitude and direction; however, moving it will change both the initial and terminal points. We say that the point 𝑉 is the terminal point associated with the vector #» 𝑣 (using the initial point as 𝑂), or more simply, we say that 𝑉 is the point associated with the vector #» 𝑣. ⎡ Example 1.2.1 ⎤ 2 The point 𝑉 = (2, −4, 9) is the terminal point associated with the vector #» 𝑣 = ⎣ −4 ⎦ in R3 . 9 1.3 Operations on Vectors ⎡ Definition 1.3.1 Equality of Vectors in R𝑛 ⎤ ⎡ ⎤ 𝑢1 𝑣1 ⎢ 𝑢2 ⎥ ⎢ 𝑣2 ⎥ ⎢ ⎥ ⎢ ⎥ We say that vectors #» 𝑢 = ⎢ . ⎥ and #» 𝑣 = ⎢ . ⎥ in R𝑛 are equal if 𝑢𝑖 = 𝑣𝑖 for all 𝑖 = 1, . . . , 𝑛. ⎣ .. ⎦ ⎣ .. ⎦ 𝑢𝑛 𝑣𝑛 We denote this by writing #» 𝑢 = #» 𝑣. The vector equation #» 𝑢 = #» 𝑣 is therefore equivalent to a system of 𝑛 equations, one for each component. Example 1.3.2 ⎡ ⎤ ⎡ ⎤ 1 1 If #» 𝑢 = ⎣ 2 ⎦ and #» 𝑣 = ⎣ 2 ⎦, then #» 𝑢 = ̸ #» 𝑣 because 𝑢3 ̸= 𝑣3 . 3 4 8 Chapter 1 Vectors in R𝑛 Geometrically, two vectors #» 𝑢 and #» 𝑣 are said to be equal if they have the same magnitude and the same direction. 
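For readers who like to experiment on a computer, the algebraic representation and the component-wise test for equality translate directly into a short computation. The following sketch is not part of the original notes; it assumes a Python environment with the NumPy library and simply re-checks Example 1.3.2 numerically.

    import numpy as np

    # The vectors u and v of Example 1.3.2; each array represents a
    # vector in R^3 (typed in row notation purely for convenience).
    u = np.array([1, 2, 3])
    v = np.array([1, 2, 4])

    # Two vectors are equal exactly when every component agrees.
    print(u == v)                 # [ True  True False]
    print(np.array_equal(u, v))   # False, since u_3 != v_3

Here np.array_equal returns True only when all n component equations u_i = v_i hold, which is precisely the condition in Definition 1.3.1.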
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 𝑢1 + 𝑣1 𝑣1 𝑢1 𝑣1 𝑢1 ⎢ 𝑢2 ⎥ ⎢ 𝑣2 ⎥ ⎢ 𝑢2 + 𝑣2 ⎥ ⎢ 𝑣2 ⎥ ⎢ 𝑢2 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ 𝑢 + #» 𝑣 =⎢ . ⎥+⎢ . ⎥=⎢ 𝑣 = ⎢ . ⎥ ∈ R𝑛 . Then #» Let #» 𝑢 = ⎢ . ⎥ , #» ⎥. .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦ ⎣ ⎣ .. ⎦ ⎣ .. ⎦ . 𝑢𝑛 + 𝑣𝑛 𝑣𝑛 𝑢𝑛 𝑣𝑛 𝑢𝑛 ⎡ Definition 1.3.3 Addition in R𝑛 Example 1.3.4 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 3 2+3 5 We have ⎣ 4 ⎦ + ⎣ −5 ⎦ = ⎣ 4 − 5 ⎦ = ⎣ −1 ⎦ . 6 7 6+7 13 #» = #» Geometrically, the vector 𝑤 𝑢 + #» 𝑣 follows the Parallelogram Law, illustrated in the #» following example with 𝑣 moved so that its initial point is the terminal point of #» 𝑢. Example 1.3.5 [︂ ]︂ [︂ ]︂ 5 −2 #» #» in R2 . Show the result geometrically. to 𝑣 = Add 𝑢 = 1 3 [︂ ]︂ [︂ ]︂ [︂ ]︂ 3 5 −2 #» #» . = + Solution: 𝑢 + 𝑣 = 4 1 3 REMARK In this course, most of the time we will be labelling axes in R2 as 𝑥1 , 𝑥2 instead of 𝑥, 𝑦. Similarly, in R3 we will be labelling axes as 𝑥1 , 𝑥2 , 𝑥3 instead of 𝑥, 𝑦, 𝑧. We recommend you to follow that practice as well. Example 1.3.6 Suppose a boat is moving with a velocity of #» 𝑢 and a person on the boat walks with velocity #» 𝑣 on (relative to) the boat. An observer on the shore sees the person move with velocity #» = #» 𝑤 𝑢 + #» 𝑣 , and thus sees the combined velocity of the person and the boat. The following rules of vector addition follow from the properties of addition of real numbers. Section 1.3 Operations on Vectors Proposition 1.3.7 9 (Properties of Vector Addition) #» ∈ R𝑛 . Let #» 𝑢 , #» 𝑣,𝑤 (a) #» 𝑢 + #» 𝑣 = #» 𝑣 + #» 𝑢 (symmetry) #» = #» #» = ( #» #» (associativity) (b) #» 𝑢 + #» 𝑣 +𝑤 𝑢 + ( #» 𝑣 + 𝑤) 𝑢 + #» 𝑣)+ 𝑤 ]︀𝑇 #» [︀ (c) There is a zero vector, 0 = 0 0 · · · 0 in R𝑛 , with the property that #» #» #» 𝑣 + 0 = 0 + #» 𝑣 = #» 𝑣. EXERCISE Prove Proposition 1.3.7 (Properties of Vector Addition). For many of the results in this chapter, the proof is left as an exercise for you. ⎡ Definition 1.3.8 Additive Inverse ⎤ 𝑢1 ⎢ 𝑢2 ⎥ ⎢ ⎥ Let #» 𝑢 = ⎢ . ⎥ ∈ R𝑛 . The additive inverse of #» 𝑢 , denoted − #» 𝑢 , is defined as ⎣ .. ⎦ 𝑢𝑛 ⎡ ⎤ −𝑢1 ⎢ −𝑢2 ⎥ ⎢ ⎥ − #» 𝑢 = ⎢ . ⎥. ⎣ .. ⎦ −𝑢𝑛 ⎡ Example 1.3.9 ⎤ ⎡ ⎤ 1 −1 If #» 𝑢 = ⎣ −4 ⎦, then − #» 𝑢 = ⎣ 4 ⎦. 2 −2 We now notice the key property of the additive inverse. Proposition 1.3.10 (Additive Inverse Property) If #» 𝑢 ∈ R𝑛 , we have #» #» 𝑢 − #» 𝑢 = #» 𝑢 + (− #» 𝑢 ) = (− #» 𝑢 ) + #» 𝑢 = 0. Thus − #» 𝑢 has the effect of “cancelling” the vector #» 𝑢 when addition is performed (hence the term additive inverse). 10 Chapter 1 Vectors in R𝑛 REMARK The vector − #» 𝑢 has the same magnitude as #» 𝑢 , but the opposite direction. ⎤ ⎡ ⎤ ⎡ ⎤ −𝑢1 𝑢1 𝑣1 ⎢ −𝑢2 ⎥ ⎢ 𝑢2 ⎥ ⎢ 𝑣2 ⎥ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ 𝑢 = ⎢ . ⎥ be vectors in R𝑛 . We define 𝑢 = ⎢ . ⎥ and − #» Let #» 𝑣 = ⎢ . ⎥, #» ⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦ −𝑢𝑛 𝑢𝑛 𝑣𝑛 ⎡ Definition 1.3.11 Subtraction in R𝑛 ⎤ 𝑣1 − 𝑢1 ⎢ 𝑣2 − 𝑢2 ⎥ ⎥ ⎢ #» 𝑣 − #» 𝑢 = #» 𝑣 + (− #» 𝑢) = ⎢ ⎥. .. ⎦ ⎣ . ⎡ 𝑣𝑛 − 𝑢𝑛 Figure 1.3.3: General geometric interpretation of #» 𝑣 − #» 𝑢 Example 1.3.12 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ −1 2−3 3 2 We have ⎣ 4 ⎦ − ⎣ −5 ⎦ = ⎣ 4 + 5 ⎦ = ⎣ 9 ⎦ . −1 6−7 7 6 For vectors in R𝑛 with 𝑛 > 1, we do not define a component-wise “vector multiplication”, as we did for addition and subtraction, since such an operation turns out to be of little use in linear algebra. There are different types of “vector products” that we do define, however. These will be discussed later in this chapter. In the meantime, we will define the scalar multiplication of a vector by a real number 𝑐. Definition 1.3.13 Scalar Multiplication ⎤ ⎤ ⎡ ⎤ ⎡ 𝑐𝑣1 𝑣1 𝑣1 ⎢ 𝑣2 ⎥ ⎢ 𝑐𝑣2 ⎥ ⎢ 𝑣2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Let 𝑐 ∈ R and #» 𝑣 = ⎢ . ⎥ ∈ R𝑛 . We define 𝑐 #» 𝑣 = 𝑐 ⎢ . ⎥ = ⎢ . ⎥. ⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. 
⎦ 𝑣𝑛 𝑐𝑣𝑛 𝑣𝑛 We say that the vector #» 𝑣 is scaled by 𝑐. ⎡ For this reason, real numbers are often referred to as scalars. Section 1.3 Operations on Vectors 11 Figure 1.3.4: Scalar multiplication of #» 𝑣 REMARKS • When 𝑐 > 0, the vector 𝑐 #» 𝑣 points in the direction of #» 𝑣 and is 𝑐 times as long. • When 𝑐 < 0, then 𝑐 #» 𝑣 points in the opposite direction to #» 𝑣 and is |𝑐| times as long. • The vector (−1) #» 𝑣 = − #» 𝑣 has the same length as #» 𝑣 but points in the opposite direction. Proposition 1.3.14 (Properties of Scalar Multiplication) Let 𝑐, 𝑑 ∈ R and #» 𝑢 , #» 𝑣 ∈ R𝑛 . (a) (𝑐 + 𝑑) #» 𝑣 = 𝑐 #» 𝑣 + 𝑑 #» 𝑣. (b) (𝑐𝑑) #» 𝑣 = 𝑐(𝑑 #» 𝑣 ). (c) 𝑐( #» 𝑢 + #» 𝑣 ) = 𝑐 #» 𝑢 + 𝑐 #» 𝑣. #» (d) 0 #» 𝑣 = 0. #» #» (e) If 𝑐 #» 𝑣 = 0 , then 𝑐 = 0 or #» 𝑣 = 0. (cancellation law) Property (e) may look especially familiar, because the following result is often used in mathematics when dealing with real numbers: For all 𝑎, 𝑏 ∈ R, if 𝑎𝑏 = 0, then 𝑎 = 0 or 𝑏 = 0. Definition 1.3.15 Standard Basis for R𝑛 Example 1.3.16 In R𝑛 , let 𝑒#»𝑖 be the vector whose 𝑖𝑡ℎ component is 1 with all other components 0. The set 𝑛 ℰ = {𝑒#»1 , 𝑒#»2 , . . . , 𝑒#» 𝑛 } is called the standard basis for R . ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 0 0 ⎬ ⎨ 1 The standard basis for R3 is {𝑒#»1 , 𝑒#»2 , 𝑒#»3 } = ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ . ⎩ ⎭ 0 0 1 ⎡ Definition 1.3.17 Components ⎤ 𝑣1 ⎢ 𝑣2 ⎥ ⎢ ⎥ #» If #» 𝑣 = ⎢ . ⎥ = 𝑣1 𝑒#»1 + 𝑣2 𝑒#»2 + ... + 𝑣𝑛 𝑒#» 𝑛 , then we call 𝑣1 , 𝑣2 , ..., 𝑣𝑛 the components of 𝑣 . ⎣ .. ⎦ 𝑣𝑛 12 Example 1.3.18 Chapter 1 In R2 , Vectors in R𝑛 [︂ ]︂ [︂ ]︂ [︂ ]︂ 2 1 0 #» #» #» 𝑣 = = 2𝑒1 + 3𝑒2 = 2 +3 . 3 0 1 ⎡ Example 1.3.19 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 5 1 0 0 If #» 𝑣 = ⎣ 3 ⎦ ∈ R3 , then #» 𝑣 = 5 ⎣ 0 ⎦ + 3 ⎣ 1 ⎦ − 4 ⎣ 0 ⎦ = 5𝑒#»1 + 3𝑒#»2 − 4𝑒#»3 . −4 0 0 1 The components of #» 𝑣 are 5, 3 and −4. 1.4 Vectors in C𝑛 Vectors are also defined in C𝑛 , where each component is a complex number. Equality, addition, subtraction, scalar multiplication, and the cancellation law in C𝑛 are all defined as in the previous propositions, replacing R with C. Definition 1.4.1 C𝑛 , Vector in C𝑛 ⎧ ⎪ ⎨ ⎫ ⎤ 𝑧1 ⎪ ⎬ ⎢ ⎥ The set C𝑛 is defined as #» 𝑧 = ⎣ ... ⎦ : 𝑧1 , . . . , 𝑧𝑛 ∈ C . ⎪ ⎪ ⎩ ⎭ 𝑧𝑛 ⎡ ⎤ 𝑧1 ⎢ .. ⎥ #» A vector is an element 𝑧 = ⎣ . ⎦ of C𝑛 . ⎡ 𝑧𝑛 ⎤ 1+𝑖 #» = 1 − 2𝑖 ∈ C2 and #» We have 𝑤 𝑧 = ⎣ 2 − 3𝑖 ⎦ ∈ C3 . 7 −4 + 5𝑖 [︂ Example 1.4.2 ]︂ ⎡ Addition, subtraction, and scalar multiplication in C𝑛 are defined as in R𝑛 . [︂ Example 1.4.3 ]︂ [︂ ]︂ 1+𝑖 2−𝑖 The expression 3 −2 evaluates to 2+𝑖 4+𝑖 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 1+𝑖 2−𝑖 3 + 3𝑖 4 − 2𝑖 −1 + 5𝑖 3 −2 = − = . 2+𝑖 4+𝑖 6 + 3𝑖 8 + 2𝑖 −2 + 𝑖 Section 1.5 Dot Product in R𝑛 Example 1.4.4 13 #» ∈ C𝑛 and (𝑐 − 4 − 𝑖)( #» #» = #» #» If 𝑐 ∈ C and #» 𝑧,𝑤 𝑧 − 𝑤) 0 , then 𝑐 = 4 + 𝑖 or #» 𝑧 = 𝑤. 𝑛 𝑛 The standard basis {𝑒#»1 , 𝑒#»2 , ..., 𝑒#» 𝑛 } for C is the same as the standard basis for R . Definition 1.4.5 Standard Basis for C𝑛 Example 1.4.6 In C𝑛 , let 𝑒#»𝑖 represent the vector whose 𝑖𝑡ℎ component is 1 with all other components 0. 𝑛 The set {𝑒#»1 , 𝑒#»2 , ..., 𝑒#» 𝑛 } is called the standard basis for C . ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 0 0 0 ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪ ⎬ 0 1 0 ⎥ , ⎢ ⎥ , ⎢ ⎥ , ⎢0⎥ . The standard basis for C4 is {𝑒#»1 , 𝑒#»2 , 𝑒#»3 , 𝑒#»4 } = ⎢ ⎣ 0 ⎦ ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 0 0 0 1 ⎡ Definition 1.4.7 Components ⎢ ⎢ If #» 𝑣 =⎢ ⎣ 𝑣1 𝑣2 .. . ⎤ ⎥ ⎥ ⎥ = 𝑣1 𝑒#»1 + 𝑣2 𝑒#»2 + ... + 𝑣𝑛 𝑒#» 𝑛 , then we refer to 𝑣1 , 𝑣2 , ..., 𝑣𝑛 as the components ⎦ 𝑣𝑛 of #» 𝑣. ⎡ Example 1.4.8 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 5+𝑖 1 0 0 If #» 𝑣 = ⎣ 3𝑖 ⎦ ∈ C3 , then #» 𝑣 = (5 + 𝑖) ⎣ 0 ⎦ + 3𝑖 ⎣ 1 ⎦ − 4 ⎣ 0 ⎦ = (5 + 𝑖)𝑒#»1 + 3𝑖 𝑒#»2 − 4𝑒#»3 . 
−4 0 0 1 #» The components of 𝑣 are 5 + 𝑖, 3𝑖 and −4. 1.5 Dot Product in R𝑛 The dot product is a function that takes as input two vectors in R𝑛 and returns a real number. (Later, we will adapt this idea to vectors in C𝑛 using a different operation.) ⎡ ⎤ ⎤ 𝑣1 𝑢1 ⎢ 𝑢2 ⎥ ⎢ 𝑣2 ⎥ ⎢ ⎥ ⎢ ⎥ 𝑣 = ⎢ . ⎥ be vectors in R𝑛 . We define their dot product by Let #» 𝑢 = ⎢ . ⎥ and #» ⎣ .. ⎦ ⎣ .. ⎦ 𝑢𝑛 𝑣𝑛 ⎡ Definition 1.5.1 Dot Product #» 𝑢 · #» 𝑣 = 𝑢1 𝑣1 + 𝑢2 𝑣2 + · · · + 𝑢𝑛 𝑣𝑛 . Example 1.5.2 ⎡ ⎤ ⎡ ⎤ [︂ ]︂ [︂ ]︂ 3 −4 2 5 We have · = 2(5) + 3(7) = 31 and ⎣ −5 ⎦ · ⎣ 4 ⎦ = 3(−4) + (−5)(4) + 2(−2) = −36. 3 7 2 −2 14 Chapter 1 Proposition 1.5.3 Vectors in R𝑛 (Properties of the Dot Product) #» are vectors in R𝑛 , then: If 𝑐 ∈ R and #» 𝑢 , #» 𝑣, 𝑤 (a) #» 𝑢 · #» 𝑣 = #» 𝑣 · #» 𝑢 (symmetry) }︃ #» = #» #» + #» #» (b) ( #» 𝑢 + #» 𝑣)· 𝑤 𝑢 ·𝑤 𝑣 ·𝑤 (c) (𝑐 #» 𝑢 ) · #» 𝑣 = 𝑐( #» 𝑢 · #» 𝑣) (collectively known as linearity) #» (d) #» 𝑣 · #» 𝑣 ≥ 0, with #» 𝑣 · #» 𝑣 = 0 if and only if #» 𝑣 = 0. (non-negativity) An important use of the dot product is determining the length of a vector. Definition 1.5.4 √ The length (or norm or magnitude) of the vector #» 𝑣 ∈ R𝑛 is ‖ #» 𝑣 ‖ = #» 𝑣 · #» 𝑣. Length in R𝑛 , Norm in R𝑛 Example 1.5.5 ⃦[︂ ]︂⃦ √︃[︂ ]︂ [︂ ]︂ √︀ ⃦ 2 ⃦ √ √ 2 2 ⃦ = 2(2) + 4(4) = 20 = 2 5. We have ⃦ · ⃦ 4 ⃦= 4 4 Example 1.5.6 ⃦⎡ ⎤⃦ ⎯ ⎡ ⎤ ⎡ ⎤ ⃦ 2 ⃦ ⎸ 2 2 √︀ ⃦ ⃦ ⎸ √ ⎣ −3 ⎦⃦ = ⎸ ⎣ −3 ⎦ · ⎣ −3 ⎦ = 2(2) + (−3)(−3) + 4(4) = 29. We have ⃦ ⎷ ⃦ ⃦ ⃦ 4 ⃦ 4 4 REMARKS The notation “‖” is used in order to distinguish from the notation for the absolute value of a real number, “|”. √ Recall that when we take the square root, we mean the positive square root, so that 4 = 2 (not ±2). Notice that it follows from Property (d) of Proposition 1.5.3 (Properties of the Dot Product) that every non-zero vector has length greater than zero, and the zero vector #» 0 has a length of zero. Proposition 1.5.7 If 𝑐 ∈ R and #» 𝑣 ∈ R𝑛 , then ‖𝑐 #» 𝑣 ‖ = |𝑐| ‖ #» 𝑣 ‖. ⃦[︂ ]︂⃦ √︀ ⃦ 1 ⃦ √ ⃦ ⃦ ⃦ 2 ⃦ = 3 1(1) + 2(2) = 3 5. Example 1.5.8 ⃦[︂ ]︂⃦ ⃦ [︂ ]︂⃦ ⃦ −3 ⃦ ⃦ 1 ⃦ ⃦ ⃦ ⃦ We have ⃦ ⃦ −6 ⃦ = ⃦−3 2 ⃦ = | − 3| Example 1.5.9 ⃦⎡ ⎤⃦ ⃦ ⎡ ⎤⃦ ⃦ −2 ⃦ ⃦ ⃦ 1 ⃦ ⃦ ⃦ ⃦ ⃦ ⃦ ⃦ ⎣ ⎦ ⎣ ⎦ We have ⃦ −6 ⃦ = ⃦−2 3 ⃦ ⃦ = | − 2| ⃦ 4 ⃦ ⃦ −2 ⃦ ⃦⎡ ⎤⃦ ⃦ 1 ⃦ √︀ ⃦ ⃦ √ ⃦⎣ 3 ⎦⃦ = 2 12 + 32 + (−2)2 = 2 14. ⃦ ⃦ ⃦ −2 ⃦ Section 1.5 Dot Product in R𝑛 Definition 1.5.10 15 We say that #» 𝑣 ∈ R𝑛 is a unit vector if ‖ #» 𝑣 ‖ = 1. Unit Vector Example 1.5.11 Example 1.5.12 [︂ ]︂ √ 2 #» If 𝑢 = , then ‖ #» 𝑢 ‖ = 13, so #» 𝑢 is not a unit vector. 3 If #» 𝑣 = [︂ 1 1 2 √ √ √ 6 6 6 ]︂𝑇 ⎡ ⎤ 1 1 ⎣ ⎦ 1 √ 1 , then ‖ #» = √ 𝑣 ‖ = √ 12 + 12 + 22 = 1, so #» 𝑣 is a unit 6 2 6 vector. Definition 1.5.13 When #» 𝑣 ∈ R𝑛 is a non-zero vector, we can produce a unit vector #» 𝑣 𝑣^ = #» ‖𝑣‖ Normalization in the direction of #» 𝑣 by scaling #» 𝑣 . This process is called normalization. REMARKS #» 𝑣 The vector 𝑣^ has the same direction as the vector #» 𝑣 , since 𝑣^ = ‖ #» 𝑣‖ = ⃦ ⃦ ⃒ ⃒ ⃦ 1 #»⃦ ⃒ 1 ⃒ #» The vector 𝑣^ is a unit vector, since ‖^ 𝑣 ‖ = ⃦ ‖ #» 𝑣 ‖ 𝑣 ⃦ = ⃒ ‖ #» 𝑣 ‖ ⃒ ‖ 𝑣 ‖ = 1. 1 ‖ #» 𝑣‖ #» 𝑣. ⎡ Example 1.5.14 ⎤ ⎡ ⎤ 2 2 #» 𝑣 1 ⎣ ⎦ #» ⎣ ⎦ −4 . The unit vector in the direction of 𝑣 = −4 is the vector 𝑣^ = #» = √ ‖𝑣‖ 21 −1 −1 The dot product can also be used to determine the angle between two vectors. In R2 this can be seen as follows. Figure 1.5.5: The angle 𝜃 between #» 𝑢 and #» 𝑣 in R2 16 Chapter 1 Vectors in R𝑛 If 𝜃 is the angle between #» 𝑢 , #» 𝑣 ∈ R2 , then the law of cosines says that ‖ #» 𝑢 − #» 𝑣 ‖2 = ‖ #» 𝑢 ‖2 + ‖ #» 𝑣 ‖2 − 2‖ #» 𝑢 ‖‖ #» 𝑣 ‖ cos 𝜃. 
On the other hand, recalling the definition of the length ‖ · ‖ in terms of the dot product, we have ‖ #» 𝑢 − #» 𝑣 ‖2 = ( #» 𝑢 − #» 𝑣 ) · ( #» 𝑢 − #» 𝑣) #» #» #» #» = 𝑢 · 𝑢 − 𝑢 · 𝑣 − #» 𝑣 · #» 𝑢 + #» 𝑣 · #» 𝑣 2 2 #» #» #» #» = ‖𝑢‖ − 2𝑣 · 𝑢 + ‖𝑣 ‖ . If we combine the above two equations, we arrive at #» 𝑢 · #» 𝑣 = ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ cos 𝜃. This gives us a relationship between the angle 𝜃 between the vectors #» 𝑢 , #» 𝑣 ∈ R2 and their dot product. This motivates the following definition. Definition 1.5.15 Angle Between Vectors Let and #» 𝑢 and #» 𝑣 be non-zero vectors in R𝑛 . The angle 𝜃, in radians (0 ≤ 𝜃 ≤ 𝜋), between #» 𝑢 #» 𝑣 is such that #» 𝑢 · #» 𝑣 = ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ cos 𝜃, that is, 𝜃 = arccos (︂ #» #» )︂ 𝑢·𝑣 . #» ‖ 𝑢 ‖‖ #» 𝑣‖ #» is between 0 and 𝜋 Figure 1.5.6: The angle 𝜃 between #» 𝑣 and 𝑤 REMARK #» #» 𝑢·𝑣 In order for the above definition to be well-defined, we need to know that ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ lies in in (︁ #» #» )︁ 𝑢·𝑣 the domain of arccos—otherwise arccos ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ is undefined. That is, we need to know that #» 𝑢 · #» 𝑣 ∈ [−1, 1], #» ‖ 𝑢 ‖‖ #» 𝑣‖ or, equivalently, that | #» 𝑢 · #» 𝑣 | ≤ ‖ #» 𝑢 ‖ #» 𝑣 ‖ for all #» 𝑢 , #» 𝑣 ∈ R𝑛 . This inequality is true and is known as the Cauchy–Schwarz Inequality. We will take its validity for granted in these notes, and will not provide a proof. Section 1.5 Dot Product in R𝑛 Example 1.5.16 Example 1.5.17 Definition 1.5.18 17 [︂ ]︂ [︂ ]︂ 1 −2 Determine the angle 𝜃 between and in R2 , rounding your answer to three decimal 4 3 places throughout. ⎛ [︂ ]︂ [︂ ]︂ ⎞ 1 −2 (︂ )︂ · ⎟ ⎜ 4 3 ⎟ = arccos √ 10√ ⃦ ⃦ [︂ ]︂⃦ [︂ ]︂⃦ Solution: 𝜃 = arccos ⎜ ≈ 0.833. ⎝ ⃦ 1 ⃦ ⃦ −2 ⃦ ⎠ 17 13 ⃦ ⃦ ⃦⃦ ⃦ 4 ⃦⃦ 3 ⃦ ⎡ ⎤ ⎡ ⎤ 1 2 Determine the angle, 𝜃, between ⎣ 3 ⎦ and ⎣ −4 ⎦ in R3 , rounding your answer to three 5 3 decimal places throughout. ⎛ ⎡ ⎤ ⎡ ⎤ ⎞ 1 2 ⎜ ⎣ 3 ⎦ · ⎣ −4 ⎦ ⎟ ⎜ ⎟ )︂ (︂ ⎜ ⎟ 5 3 5 ⎜ ⎟ ⎤⃦ ⎟ = arccos √ √ ≈ 1.413. Solution: 𝜃 = arccos ⎜ ⃦⎡ ⎤⃦ ⃦⎡ ⎜⃦ 1 ⃦ ⃦ 2 ⃦⎟ 35 29 ⃦⃦ ⃦⎟ ⎜⃦ ⎣ ⎦⃦ ⃦⎣ ⎦⃦ ⎝⃦ ⃦ 3 ⃦ ⃦ −4 ⃦ ⎠ ⃦ 5 ⃦⃦ 3 ⃦ We say that the two vectors #» 𝑢 and #» 𝑣 in R𝑛 are orthogonal (or perpendicular) if #» 𝑢 · #» 𝑣 = 0. Orthogonal in R𝑛 REMARK Our definition of the angle between two vectors, Definition 1.5.15, is consistent with our definition of orthogonality. Examining our new dot product formula, #» #» = ‖ #» #» cos 𝜃, 𝑣 ·𝑤 𝑣 ‖‖ 𝑤‖ we see that if two vectors are orthogonal, then their dot product is zero, and thus conclude that the angle 𝜃 between them is 𝜋2 , since cos 𝜋2 = 0 and 0 ≤ 𝜃 ≤ 𝜋. ⎡ Example 1.5.19 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 1 1 The vectors #» 𝑢 = ⎣ 5 ⎦ and #» 𝑣 = ⎣ 1 ⎦ are orthogonal, since #» 𝑢 · #» 𝑣 = ⎣ 5 ⎦ · ⎣ 1 ⎦ = 0. −3 2 −3 2 ⎡ Example 1.5.20 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 1 2 #» = ⎣ −3 ⎦ and #» #» · #» The vectors 𝑤 𝑥 = ⎣ 2 ⎦ are not orthogonal, since 𝑤 𝑥 = ⎣ −3 ⎦ · ⎣ 2 ⎦ = 5 1 5 1 1 ̸= 0. 18 Chapter 1 Vectors in R𝑛 REMARK ⎤ 𝑣1 ⎢ 𝑣2 ⎥ #» #» ⎢ ⎥ 𝑣 = ⎢ . ⎥ ∈ R𝑛 is orthogonal to 0 , as 0 · #» 𝑣 = 0𝑣1 + 0𝑣2 + · · · + 0𝑣𝑛 = 0. Every vector #» ⎣ .. ⎦ ⎡ 𝑣𝑛 Given a vector #» 𝑣 , there are many ways to produce a non-trivial vector that is orthogonal to #» 𝑣. Example 1.5.21 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 𝑎 −𝑏 −𝑏 𝑎 2 A vector orthogonal to in R is , since · = 0. 𝑎 𝑏 𝑏 𝑎 Example 1.5.22 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 𝑎 −𝑏 −𝑏 𝑎 3 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ A vector orthogonal to 𝑏 in R is 𝑎 , since 𝑎 · 𝑏 ⎦ = 0. 𝑐 0 0 𝑐 EXERCISE ⎡ ⎤ 𝑎 ⎣ Produce two different vectors that are orthogonal to 𝑏 ⎦. 𝑐 ⎡ Example 1.5.23 ⎤ ⎡ ⎤ 1 1 #» #» #» ⎣ ⎦ ⎣ Determine a vector 𝑢 that is orthogonal to both 𝑣 = −1 and 𝑤 = 2 ⎦. 0 3 ⎡ ⎤ 𝑢1 Solution: Let #» 𝑢 = ⎣ 𝑢2 ⎦ be such a vector. 
Then 𝑢3 #» 𝑢 · #» 𝑣⎤ =⎡0 ⎤ ⎡ 𝑢1 1 ⎣ 𝑢2 ⎦ · ⎣ −1 ⎦ = 0 𝑢3 0 𝑢1 − 𝑢2 = 0 𝑢1 = 𝑢2 (*) and ⎡ #» #» = 0 𝑢 ·𝑤 ⎡ ⎤ ⎡ ⎤ 𝑢1 1 ⎣ 𝑢2 ⎦ · ⎣ 2 ⎦ = 0 𝑢3 3 𝑢1 + 2𝑢2 + 3𝑢3 = 0 3𝑢1 + 3𝑢3 = 0 substitute from (*) 𝑢3 = −𝑢1 (**) ⎤ ⎡ ⎤ 𝑢1 𝑢1 From (*) and (**) we have #» 𝑢 = ⎣ 𝑢1 ⎦. For any choice of 𝑢1 ∈ R, the vector ⎣ 𝑢1 ⎦ will −𝑢1 −𝑢1 #» This is because by choosing different values of 𝑢 , we be orthogonal to both #» 𝑣 and 𝑤. 1 Section 1.6 Projection, Components and Perpendicular 19 change the length of #» 𝑢 , but the line through the origin which contains #» 𝑢 remains fixed. ⎡ ⎤ 4 For example, with 𝑢1 = 4 we get #» 𝑢 = ⎣ 4 ⎦, in which case −4 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 4 1 4 1 #» #» #» #» ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ 𝑢 · 𝑣 = 4 · −1 = 4 − 4 = 0 and 𝑢 · 𝑤 = 4 · 2 ⎦ = 4 + 8 − 12 = 0. −4 0 −4 3 EXERCISE ⎡ ⎤ 𝑢1 With #» 𝑢 = ⎣ 𝑢1 ⎦ from Example 1.5.23, choose different values of 𝑢1 ∈ R and convince −𝑢1 ⎡ ⎤ ⎡ ⎤ 1 1 #» = ⎣ −1 ⎦. yourself that the resulting #» 𝑢 is still orthogonal to both #» 𝑣 = ⎣ 2 ⎦ and 𝑤 0 3 #» Show that for any choice of 𝑢1 ∈ R, the resulting #» 𝑢 is still orthogonal to both #» 𝑣 and 𝑤. 1.6 Definition 1.6.1 Projection of #» 𝑣 #» Onto 𝑤 Projection, Components and Perpendicular #» #» ∈ R𝑛 with 𝑤 #» = #» is defined by Let #» 𝑣,𝑤 ̸ 0 . The projection of #» 𝑣 onto 𝑤 #» #» #» ( #» 𝑣 · 𝑤) #» = ( 𝑣 · 𝑤) 𝑤. #» proj 𝑤#» ( #» 𝑣 ) = #» 2 𝑤 #» · 𝑤 #» 𝑤 ‖ 𝑤‖ #» direction. We also refer to this as the projection of #» 𝑣 in the 𝑤 #» #» ( 𝑣 ) Figure 1.6.7: proj 𝑤 20 Chapter 1 Find the projection of #» 𝑣 = [︂ Vectors in R𝑛 ]︂ [︂ ]︂ 4 7 #» onto 𝑤 = . −8 1 Solution: proj 𝑤#» ( #» 𝑣) = Example 1.6.2 = = = #» ( #» 𝑣 · 𝑤) #» #» 2 𝑤 ‖ 𝑤‖ [︂ ]︂ [︂ ]︂ 4 7 [︂ ]︂ · −8 1 7 ⃦[︂ ]︂⃦2 1 ⃦ 7 ⃦ ⃦ ⃦ ⃦ 1 ⃦ [︂ ]︂ 20 7 50 1 2 #» 𝑤. 5 ⎡ Example 1.6.3 ⎤ ⎡ ⎤ 3 1 #» = ⎣ 2 ⎦. Find the projection of #» 𝑣 = ⎣ −4 ⎦ onto 𝑤 5 3 Solution: ⎡ ⎤ ⎡ ⎤ 3 1 ⎣ −4 ⎦ · ⎣ 2 ⎦ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 1 #» #» 5 3 ( 𝑣 · 𝑤) #» 10 ⎣ ⎦ 5 ⎣ ⎦ #» ⎣ ⎦ 2 = 2 = 2 . proj 𝑤#» ( 𝑣 ) = #» 2 𝑤 = ⃦⎡ ⎤⃦2 ‖ 𝑤‖ 14 7 ⃦ 1 ⃦ 3 3 3 ⃦ ⃦ ⃦⎣ 2 ⎦⃦ ⃦ ⃦ ⃦ 3 ⃦ REMARKS 1. We can write the projection as follows: (︂ #» ( #» 𝑣 · 𝑤) #» = #» proj 𝑤#» ( #» 𝑣 ) = #» 2 𝑤 𝑣 · ‖ 𝑤‖ #» )︂ 𝑤 #» 𝑤 #» ^ 𝑤. ^ #» #» = ( 𝑣 · 𝑤) ‖ 𝑤‖ ‖ 𝑤‖ #» is not relevant to the projection, but the direction of 𝑤 #» is. Thus, the magnitude of 𝑤 #» is given by #» #» = 2. Recall from Definition 1.5.15 that the angle 𝜃 between #» 𝑣 and 𝑤 𝑣 ·𝑤 #» cos 𝜃, giving: ‖ #» 𝑣 ‖‖ 𝑤‖ #» cos 𝜃 ‖ #» 𝑣 ‖‖ 𝑤‖ #» = (‖ #» proj 𝑤#» ( #» 𝑣) = 𝑤 𝑣 ‖ cos 𝜃)𝑤. ^ #» ‖ 𝑤‖2 #» as giving that part of the vector #» You can think of the projection of #» 𝑣 onto 𝑤 𝑣 that #» lies in the 𝑤 direction. Section 1.6 Projection, Components and Perpendicular Definition 1.6.4 21 #» #» ∈ R𝑛 with 𝑤 #» = Let #» 𝑣,𝑤 ̸ 0 . We refer to the quantity Component ‖ #» 𝑣 ‖ cos 𝜃 = #» 𝑣 ·𝑤 ^ #» as the component (or scalar component) of #» 𝑣 along 𝑤. #» will be in the direction If the scalar component is negative, then the projection of #» 𝑣 along 𝑤 #» we can still think of this projection vector as being in the 𝑤 #» direction. of the vector − 𝑤; ⎡ Example 1.6.5 Definition 1.6.6 ⎤ ⎡ ⎤ 3 1 #» #» ⎣ ⎦ ⎣ Determine the component of 𝑣 = −4 along 𝑤 = 2 ⎦. 5 3 ⎡ ⎤ 1 1 #» = √14, so 𝑤 ^ = √ ⎣ 2 ⎦. Solution: Note that ‖ 𝑤‖ 14 3 ⎡ ⎤ ⎡ ⎤ 1 3 10 1 #» is #» The component of #» 𝑣 along 𝑤 𝑣 ·𝑤 ^ = ⎣ −4 ⎦ · √ ⎣ 2 ⎦ = √ . 14 3 14 5 #» #» ∈ R𝑛 with 𝑤 #» = #» is defined by Let #» 𝑣,𝑤 ̸ 0 . The perpendicular of #» 𝑣 onto 𝑤 Perpendicular perp 𝑤#» ( #» 𝑣 ) = #» 𝑣 − proj 𝑤#» ( #» 𝑣 ). #» #» ( 𝑣 ) Figure 1.6.8: perp 𝑤 Example 1.6.7 [︂ ]︂ [︂ ]︂ 4 7 #» #» Determine the perpendicular of 𝑣 = onto 𝑤 = . 
−8 1 Then calculate proj #» ( #» 𝑣 ) · perp #» ( #» 𝑣 ). 𝑤 Solution: 𝑤 22 Chapter 1 Vectors in R𝑛 [︂ ]︂ 2 7 #» , so From [Example 1.6.2], we have proj 𝑤#» ( 𝑣 ) = 5 1 perp 𝑤#» ( #» 𝑣) = = = proj 𝑤#» ( #» 𝑣 ) · perp 𝑤#» ( #» 𝑣) = = = #» 𝑣 − proj #» ( #» 𝑣) [︂ ]︂ 𝑤 [︂ ]︂ 2 7 4 − −8 5 1 [︂ ]︂ 6 1 5 −7 (︂ [︂ ]︂)︂ (︂ [︂ ]︂)︂ 2 7 6 1 · 5 1 5 −7 12 (7(1) + 1(−7)) 25 0. ⎤ ⎡ ⎤ 1 3 #» #» ⎦ ⎣ ⎣ Determine the perpendicular of 𝑣 = −4 onto 𝑤 = 2 ⎦. 3 5 ⎡ ⎤ 1 5⎣ ⎦ #» 2 , so Solution: From Example 1.6.3, we have proj 𝑤#» ( 𝑣 ) = 7 3 ⎡ Example 1.6.8 ⎤ ⎤ ⎡ ⎤ ⎡ 3 1 16 5 1 𝑣 ) = #» 𝑣 − proj 𝑤#» ( #» 𝑣 ) = ⎣ −4 ⎦ − ⎣ 2 ⎦ = ⎣ −38 ⎦ . perp 𝑤#» ( #» 7 7 5 3 20 ⎡ Proposition 1.6.9 #» #» = The projection and the perpendicular of a vector #» 𝑣 onto 𝑤 ̸ 0 are orthogonal; that is perp 𝑤#» ( #» 𝑣 ) · proj 𝑤#» ( #» 𝑣 ) = 0. #» since Proof: First, perp 𝑤#» ( #» 𝑣 ) is orthogonal to 𝑤, #» = ( #» #» perp 𝑤#» ( #» 𝑣)· 𝑤 𝑣 − proj 𝑤#» ( #» 𝑣 )) · 𝑤 #» − (proj #» ( #» #» = #» 𝑣 ·𝑤 𝑤 𝑣 )) · 𝑤 (︂ #» #» )︂ #» − ( 𝑣 · 𝑤) 𝑤 #» · 𝑤 #» = #» 𝑣 ·𝑤 #» 2 ‖ 𝑤‖ (︂ #» #» )︂ ( 𝑣 · 𝑤) #» #» #» 2 = 𝑣 ·𝑤− ‖ 𝑤‖ #» 2 ‖ 𝑤‖ #» − #» #» = #» 𝑣 ·𝑤 𝑣 ·𝑤 = 0. #» it is also orthogonal to proj #» ( #» Since perp 𝑤#» ( #» 𝑣 ) is orthogonal to 𝑤, 𝑤 𝑣 ). Section 1.7 Standard Inner Product in C𝑛 1.7 23 Standard Inner Product in C𝑛 Recall the different ways used to find the magnitude of a real number and the magnitude of a complex number. √︀ If 𝑥 ∈ R, then the magnitude (absolute value) of 𝑥 is |𝑥| = 𝑥(𝑥). √︀ If 𝑧 ∈ C, then the magnitude (modulus) of 𝑧 is |𝑧| = 𝑧(¯ 𝑧 ). Example 1.7.1 Determine the magnitude of 𝑧 = 2 − 3𝑖. Solution: The magnitude of 𝑧 is √︁ √︀ √ |𝑧| = |2 − 3𝑖| = (2 − 3𝑖)(2 − 3𝑖) = 22 + (−3)2 = 13. One of the most important uses of the dot product on R𝑛 is in finding the lengths of vectors. When we extend this concept to C𝑛 , we will need to use complex conjugates. ⎡ Definition 1.7.2 Standard Inner Product on C𝑛 ⎤ ⎡ ⎤ 𝑣1 𝑤1 ⎢ 𝑣2 ⎥ ⎢ 𝑤2 ⎥ ⎢ ⎥ #» ⎢ ⎥ The standard inner product of #» 𝑣 = ⎢ . ⎥,𝑤 = ⎢ . ⎥ ∈ C𝑛 is ⎣ .. ⎦ ⎣ .. ⎦ 𝑣𝑛 𝑤𝑛 #» = 𝑣 𝑤 + 𝑣 𝑤 + · · · + 𝑣 𝑤 . ⟨ #» 𝑣 , 𝑤⟩ 2 2 𝑛 𝑛 1 1 Notice that this operation is similar to the dot product, in that we multiply components together and add them up. An important difference is the fact that we take the complex #» conjugate of all of the components that come from the second vector 𝑤. Example 1.7.3 Let #» 𝑣 = [︂ ]︂ [︂ ]︂ 1+𝑖 2+𝑖 #» #» ,𝑤 = . Evaluate ⟨ #» 𝑣 , 𝑤⟩. 1 + 2𝑖 1+𝑖 Solution: #» = (1 + 𝑖)(2 + 𝑖) + (1 + 2𝑖)(1 + 𝑖) ⟨ #» 𝑣 , 𝑤⟩ = (2 − 𝑖 + 2𝑖 + 1) + (1 − 𝑖 + 2𝑖 + 2) = (3 + 𝑖) + (3 + 𝑖) = 6 + 2𝑖. ⟨[︂ Example 1.7.4 Evaluate ]︂ [︂ ]︂⟩ 2 − 3𝑖 −3 + 4𝑖 , . 1 − 2𝑖 3 + 5𝑖 24 Chapter 1 Vectors in R𝑛 Solution: ⟨[︂ 2 − 3𝑖 1 − 2𝑖 ]︂ [︂ ]︂⟩ −3 + 4𝑖 , = (2 − 3𝑖)(−3 + 4𝑖) + (1 − 2𝑖)(3 + 5𝑖) 3 + 5𝑖 = (2 − 3𝑖) (−3 − 4𝑖) + (1 − 2𝑖)(3 − 5𝑖) = (−6 − 8𝑖 + 9𝑖 − 12) + (3 − 5𝑖 − 6𝑖 − 10) = −25 − 10𝑖. Proposition 1.7.5 (Properties of the Standard Inner Product) #» are vectors in C𝑛 , then If 𝑐 ∈ C and #» 𝑢 , #» 𝑣 and 𝑤 (a) ⟨ #» 𝑢 , #» 𝑣 ⟩ = ⟨ #» 𝑣 , #» 𝑢 ⟩ (conjugate symmetry) }︃ (︂ )︂ #» = ⟨ #» #» + ⟨ #» #» (b) ⟨ #» 𝑢 + #» 𝑣 , 𝑤⟩ 𝑢 , 𝑤⟩ 𝑣 , 𝑤⟩ collectively known as linearity in the first argument. (c) ⟨𝑐 #» 𝑢 , #» 𝑣 ⟩ = 𝑐 ⟨ #» 𝑢 , #» 𝑣⟩ #» (d) ⟨ #» 𝑣 , #» 𝑣 ⟩ ≥ 0, with ⟨ #» 𝑣 , #» 𝑣 ⟩ = 0 if and only if #» 𝑣 = 0. (non-negativity). As in R𝑛 , we can use Property (d) to calculate lengths of vectors in C𝑛 . Definition 1.7.6 √︀ The length (or norm or magnitude) of the vector #» 𝑣 ∈ C𝑛 is ‖ #» 𝑣 ‖ = ⟨ #» 𝑣 , #» 𝑣 ⟩. Length in C𝑛 , Norm in C𝑛 ⎡ Example 1.7.7 ⎤ 2−𝑖 Determine the length of the vector #» 𝑣 = ⎣ −3 + 2𝑖 ⎦. 
−4 − 5𝑖 Solution: ⟨⎡ 2 − 𝑖 ⎤ ⎡ 2 − 𝑖 ⎤⟩ ⎣ −3 + 2𝑖 ⎦ , ⎣ −3 + 2𝑖 ⎦ ⟨ #» 𝑣 , #» 𝑣⟩ = −4 − 5𝑖 −4 − 5𝑖 = (2 − 𝑖)(2 − 𝑖) + (−3 + 2𝑖)(−3 + 2𝑖) + (−4 − 5𝑖)(−4 − 5𝑖) = 2 2 + 12 + 32 + 22 + 4 2 + 5 2 = 59 √︀ √ #» ⟨ #» 𝑣 , #» 𝑣 ⟩ = 59. ‖𝑣‖ = Note that ⟨ #» 𝑣 , #» 𝑣 ⟩ = ‖ #» 𝑣 ‖2 = |2 − 𝑖|2 + | − 3 + 2𝑖|2 + | − 4 − 5𝑖|2 . Proposition 1.7.8 (Properties of the Length) Let 𝑐 ∈ C and #» 𝑣 ∈ C𝑛 . Then (a) ‖𝑐 #» 𝑣 ‖ = |𝑐| ‖ #» 𝑣‖ Section 1.7 Standard Inner Product in C𝑛 25 (b) ‖ #» 𝑣 ‖ ≥ 0, and ‖ #» 𝑣 ‖ = 0 if and only if #» 𝑣 = 0. Example 1.7.9 ⃦ ⎡ ⎤⃦ ⃦ 2−𝑖 ⃦ ⃦ ⃦ ⎣ ⎦⃦ Determine ⃦ ⃦(−1 + 2𝑖) −3 + 2𝑖 ⃦ . ⃦ −4 − 5𝑖 ⃦ Solution: Using Property (a) in the above Proposition and Example 1.7.7, ⃦ ⃦⎡ ⎡ ⎤⃦ ⎤⃦ ⃦ ⃦ 2−𝑖 ⃦ 2−𝑖 ⃦ √ ⃦ ⃦ ⃦ ⃦ √ √ ⃦(−1 + 2𝑖) ⎣ −3 + 2𝑖 ⎦⃦ = | − 1 + 2𝑖| ⃦⎣ −3 + 2𝑖 ⎦⃦ = 5 59 = 295. ⃦ ⃦ ⃦ ⃦ ⃦ ⃦ −4 − 5𝑖 ⃦ −4 − 5𝑖 ⃦ Definition 1.7.10 We say that the two vectors #» 𝑢 and #» 𝑣 in C𝑛 are orthogonal if ⟨ #» 𝑢 , #» 𝑣 ⟩ = 0. Orthogonal in C𝑛 Example 1.7.11 Is #» 𝑣 = [︂ ]︂ ]︂ [︂ 2+𝑖 1+𝑖 #» ? orthogonal to 𝑤 = −1 − 𝑖 1 + 2𝑖 #» Solution: Let’s calculate the standard inner product of #» 𝑣 and 𝑤: #» = (1 + 𝑖)(2 + 𝑖) + (1 + 2𝑖)(−1 − 𝑖) ⟨ #» 𝑣 , 𝑤⟩ = 2 − 𝑖 + 2𝑖 + 1 − 1 + 𝑖 − 2𝑖 − 2 = 0. #» = 0, #» #» are orthogonal. Since ⟨ #» 𝑣 , 𝑤⟩ 𝑣 and 𝑤 Example 1.7.12 ]︂ [︂ 2+𝑖 #» . (a) Find a non-zero vector that is orthogonal to 𝑤 = 1+𝑖 ]︂ [︂ #» = 2 + 𝑖 . (b) Find a unit vector that is orthogonal to 𝑤 1+𝑖 Solution: [︂ ]︂ 𝑣 #» Then 𝑣 = 1 be orthogonal to 𝑤. (a) Let #» 𝑣2 #» = 0 ⟨ #» 𝑣 , 𝑤⟩ 𝑣1 (2 − 𝑖) + 𝑣2 (1 − 𝑖) = 0 𝑣2 (1 − 𝑖) = 𝑣1 (𝑖 − 2) 𝑣2 (1 − 𝑖)(1 + 𝑖) = 𝑣1 (𝑖 − 2)(1 + 𝑖) (︀ )︀ multiply by 1 − 𝑖 2𝑣2 = 𝑣1 (−3 − 𝑖) −3 − 𝑖 𝑣2 = 𝑣1 . (*) 2 #» = 0, then By Proposition 1.7. 5 (Properties of the Standard Inner Product), if ⟨ #» 𝑣 , 𝑤⟩ #» #» #» #» #» #» ⟨𝑐 𝑣 , 𝑤⟩ = 𝑐 ⟨ 𝑣 , 𝑤⟩ = 𝑐(0) = 0; that is, if 𝑣 is orthogonal to 𝑤, then any scalar multiple 𝑐 #» 𝑣 #» is also orthogonal to 𝑤. 26 Chapter 1 Vectors in R𝑛 Accordingly, by choosing any non-zero value for 𝑣[︂1 , we ]︂ can use (*) to determine a corre𝑣 #» sponding value for 𝑣2 , so that the resulting vector 1 is orthogonal to 𝑤. 𝑣2 −3 − 𝑖 For example, choosing 𝑣1 = 2 gives 𝑣2 = (2) = −3 − 𝑖. 2 ⟨[︂ ]︂ [︂ ]︂⟩ 2 2+𝑖 #» #» Since ⟨ 𝑣 , 𝑤⟩ = , −3 − 𝑖 1+𝑖 = 2(2 + 𝑖) + (−3 − 𝑖)(1 + 𝑖) = 2(2 − 𝑖) + (−3 − 𝑖)(1 − 𝑖) = 4 − 2𝑖 − 3 − 𝑖 + 3𝑖 − 1 = 0, [︂ ]︂ [︂ ]︂ 2 2+𝑖 #» #» we know that 𝑣 = is orthogonal to 𝑤 = . −3 − 𝑖 1+𝑖 [︂ ]︂ 1 2 (b) A unit vector in the same direction as #» 𝑣 = is 𝑣^ = #» #» 𝑣. −3 − 𝑖 ‖𝑣‖ √︁ √︀ 𝑣 , #» 𝑣⟩ = 2(2) + (−3 − 𝑖)(−3 − 𝑖) ‖ #» 𝑣 ‖ = ⟨ #» √ 4+9+1 = √ = 14 ]︂ [︂ 1 1 2 #» is a unit vector that is orthogonal to 𝑤. Thus, 𝑣^ = #» #» 𝑣 =√ ‖𝑣‖ 14 −3 − 𝑖 EXERCISE In (a) of the previous Example, choose a different non-zero [︂value ]︂ for 𝑣1 and use (*) to 𝑣1 #» = determine a value for 𝑣2 . Verify that your resulting vector is orthogonal to 𝑤 𝑣2 ]︂ [︂ 2+𝑖 . 1+𝑖 Definition 1.7.13 Projection of #» 𝑣 #» Onto 𝑤 #» #» ∈ C𝑛 , with 𝑤 #» = #» is defined by Let #» 𝑣,𝑤 ̸ 0 . The projection of #» 𝑣 onto 𝑤 #» ⟨ #» 𝑣 , 𝑤⟩ #» = ⟨ #» proj 𝑤#» ( #» 𝑣 ) = #» 2 𝑤 𝑣 , 𝑤⟩ ^ 𝑤. ^ ‖ 𝑤‖ ⎡ Example 1.7.14 ⎤ ⎡ ⎤ 1 1−𝑖 #» = ⎣ 2 − 𝑖 ⎦ . Find the projection of #» 𝑣 = ⎣ 𝑖 ⎦ onto 𝑤 1+𝑖 3+𝑖 Solution: Section 1.8 Fields 27 #» = 1(1 − 𝑖) + 𝑖(2 − 𝑖) + (1 + 𝑖)(3 + 𝑖) ⟨ #» 𝑣 , 𝑤⟩ = 1(1 + 𝑖) + 𝑖(2 + 𝑖) + (1 + 𝑖)(3 − 𝑖) = 1 + 𝑖 + 2𝑖 − 1 + 3 − 𝑖 + 3𝑖 + 1 = 4 + 5𝑖 2 #» ‖ 𝑤‖ = |1 − 𝑖|2 + |2 − 𝑖|2 + |3 + 𝑖|2 = 1+1+4+1+9+1 = 17 ⎡ ⎤ 1−𝑖 #» #» ⟨ 𝑣 , 𝑤⟩ #» 4 + 5𝑖 ⎣ 2 − 𝑖⎦ . 
Thus, proj 𝑤#» ( #» 𝑣) = #» 2 𝑤 = 17 ‖ 𝑤‖ 3+𝑖 1.8 Fields So far, we have used the sets R and C to choose the components of vectors, but they will play other roles as well. These sets are examples of a field. We will not see any other fields in this course beyond R and C, but for the interested student, we can give some feel for what a field “is”. Loosely speaking, a field is a set where we can add, subtract, multiply and divide. So, the set of integers, Z, is not a field, since although 2 and 3 are both integers, the quotient 2/3 is not. The set of positive real numbers R+ is not a field since although 2 and 3 are positive real numbers, the difference 2 − 3 is not. It turns out (and this is not supposed to be obvious) that the ability to add, subtract, multiply and divide are exactly the properties we need of our scalars in order to do the type of algebra that we want. In much of what we do, there is no need to make an explicit distinction between R and C, and so we will refer to our universal set as the field F, with the understanding that F will either be R or C, so we will assume that F ⊆ C. Most of the time, the choice of F will not matter. There are however a few occasions where we need to be explicit about which field we are using. For example, if we attempt to solve 2𝑥 = 5 over F, the solution is 𝑥 = 25 , whether F is R or C. However, if we try to solve 𝑥2 = −1 over F, then the field matters. If F = R, then this equation has no solutions, but if F = C, then there are two solutions: 𝑥 = ±𝑖. We previously saw the definition of the standard inner product in C𝑛 . It will be useful to extend this definition to F𝑛 . Definition 1.8.1 Standard Inner Product on F𝑛 #» ∈ F𝑛 is The standard inner product of #» 𝑣,𝑤 #» = 𝑣 𝑤 + 𝑣 𝑤 + · · · + 𝑣 𝑤 . ⟨ #» 𝑣 , 𝑤⟩ 1 1 2 2 𝑛 𝑛 28 Chapter 1 Vectors in R𝑛 REMARK • If F = C, then this is exactly the standard inner product on C𝑛 , from Definition 1.7.2. • If F = R, since each 𝑤𝑖 ∈ R, then 𝑤𝑖 = 𝑤𝑖 , and this is just the dot product on R𝑛 , as in Definition 1.5.1. 1.9 The Cross Product in R3 Given two vectors #» 𝑢 , #» 𝑣 ∈ R3 , the cross product returns a vector in R3 that is orthogonal #» #» to both 𝑢 and 𝑣 . ⎡ Definition 1.9.1 Cross Product ⎤ ⎡ ⎤ 𝑢1 𝑣1 Let #» 𝑢 = ⎣ 𝑢2 ⎦ , #» 𝑣 = ⎣ 𝑣2 ⎦ ∈ R3 . The cross product of #» 𝑢 and #» 𝑣 is defined to be the 𝑢3 𝑣3 vector in R3 given by ⎡ ⎤ 𝑢2 𝑣3 − 𝑢3 𝑣2 #» 𝑢 × #» 𝑣 = ⎣ −(𝑢1 𝑣3 − 𝑢3 𝑣1 ) ⎦ . 𝑢1 𝑣2 − 𝑢2 𝑣1 ⎡ Example 1.9.2 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 −2 (−3)(4) − (5)(1) −17 We have ⎣ −3 ⎦ × ⎣ 1 ⎦ = ⎣ −((2)(4) − (5)(−2)) ⎦ = ⎣ −18 ⎦ . 5 4 (2)(1) − (−3)(−2) −4 REMARK A useful trick for remembering the definition of the cross product will be given in Section 6.6. Proposition 1.9.3 (Properties of the Cross Product) Let #» 𝑢 , #» 𝑣 ∈ R3 and let #» 𝑧 = #» 𝑢 × #» 𝑣 . Then (a) #» 𝑧 · #» 𝑢 = 0 and #» 𝑧 · #» 𝑣 =0 ( #» 𝑧 is orthogonal to both #» 𝑢 and #» 𝑣) (b) #» 𝑣 × #» 𝑢 = − #» 𝑧 = − #» 𝑢 × #» 𝑣 (skew-symmetric) #» #» (c) If #» 𝑢 ̸= 0 and #» 𝑣 ̸= 0 , then ‖ #» 𝑢 × #» 𝑣 ‖ = ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ sin 𝜃, where 𝜃 is the angle between #» 𝑢 #» and 𝑣 . Section 1.9 The Cross Product in R3 29 REMARK (Right-Hand Rule) If #» 𝑧 is orthogonal to both #» 𝑢 and #» 𝑣 , then − #» 𝑧 will #» also be orthogonal to both 𝑢 and #» 𝑣 , and − #» 𝑧 has #» opposite direction to 𝑧 . The Right-Hand Rule is a convention used to determine which of these directions #» 𝑢 × #» 𝑣 should have. 
If the pointer finger of your right hand points in the direction of #» 𝑢 , and the middle finger of your right hand points in the direction of #» 𝑣 , then your thumb points in the direction of #» 𝑢 × #» 𝑣. If you consider #» 𝑣 × #» 𝑢 instead, you would have to turn your hand upside-down, which helps demonstrate the skew-symmetric property above! REMARK (Parallelogram Area via Cross Product) Consider the parallelogram defined by #» 𝑢 and #» 𝑣 ; its height is ℎ = ‖ #» 𝑣 ‖ sin 𝜃. As a result of Property (c) of the above Proposition, its area is ‖ #» 𝑢 ‖ℎ = ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ sin 𝜃 = ‖ #» 𝑢 × #» 𝑣 ‖. ⎡ Example 1.9.4 ⎤ ⎡ ⎤ 2 −2 With #» 𝑢 = ⎣ −3 ⎦ and #» 𝑣 = ⎣ 1 ⎦, verify Proposition 1.9.3 (Properties of the Cross Prod5 4 uct). ⎡ ⎤ −17 Solution: From Example 1.9.2, #» 𝑧 = #» 𝑢 × #» 𝑣 = ⎣ −18 ⎦. −4 ⎡ ⎤ ⎡ ⎤ −17 2 (a) #» 𝑧 · #» 𝑢 = ⎣ −18 ⎦ · ⎣ −3 ⎦ = −17(2) − 18(−3) − 4(5) = 0. −4 5 ⎡ ⎤ ⎡ ⎤ −17 −2 #» 𝑧 · #» 𝑣 = ⎣ −18 ⎦ · ⎣ 1 ⎦ = −17(−2) − 18(1) − 4(4) = 0. −4 4 30 Chapter 1 Vectors in R𝑛 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −2 2 (1)(5) − (4)(−3) 17 (b) #» 𝑣 × #» 𝑢 = ⎣ 1 ⎦ × ⎣ −3 ⎦ = ⎣ −((−2)(5) − (4)(2)) ⎦ = ⎣ 18 ⎦ = − #» 𝑢 × #» 𝑣. 4 5 (−2)(−3) − (1)(2) 4 (c) Let 𝜃 be the angle between #» 𝑢 and #» 𝑣 . Working to three decimal places throughout, #» #» 𝑢 · 𝑣 can be used to evaluate 𝜃. ⎛ ⎡ ⎤ ⎡ ⎤ ⎞ 2 −2 ⎜ ⎣ −3 ⎦ · ⎣ 1 ⎦ ⎟ ⎜ ⎟ (︂ #» #» )︂ )︂ (︂ ⎜ ⎟ 5 4 𝑢·𝑣 13 ⎜ ⎟ ⎤⃦ ⎟ = arccos √ 𝜃 = arccos = arccos ⎜ ⃦⎡ ⎤⃦ ⃦⎡ ≈ 1.093. ⎜ ⃦ 2 ⃦ ⃦ −2 ⃦ ⎟ ‖ #» 𝑢 ‖‖ #» 𝑣‖ 798 ⃦⃦ ⃦⎟ ⎜⃦ ⎣ ⎦⃦ ⃦⎣ ⎦⃦ ⎝⃦ ⃦ −3 ⃦ ⃦ 1 ⃦ ⎠ ⃦ 5 ⃦⃦ 4 ⃦ ⃦⎡ ⎤⃦ ⃦⎡ ⎤⃦ ⃦ 2 ⃦ ⃦ −2 ⃦ ⃦ ⃦ ⃦ ⃦ √ ⎣ −3 ⎦⃦ ⃦⎣ 1 ⎦⃦ sin 𝜃 ≈ 798(0.888) ≈ 25.080. Thus ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ sin 𝜃 = ⃦ ⃦ ⃦⃦ ⃦ ⃦ 5 ⃦⃦ 4 ⃦ ⎡ ⎤ −17 Since #» 𝑢 × #» 𝑣 = ⎣ −18 ⎦, thus −4 √ √ ‖ #» 𝑢 × #» 𝑣 ‖ = 172 + 182 + 42 = 629 = 25.080 ≈ ‖ #» 𝑢 ‖‖ #» 𝑣 ‖ sin 𝜃. Proposition 1.9.5 (Linearity of the Cross Product) #» ∈ R3 , then If 𝑐 ∈ R and #» 𝑢 , #» 𝑣,𝑤 }︃ #» = ( #» #» + ( #» #» ( #» 𝑢 + #» 𝑣)× 𝑤 𝑢 × 𝑤) 𝑣 × 𝑤) (𝑏) (𝑐 #» 𝑢 ) × #» 𝑣 = 𝑐( #» 𝑢 × #» 𝑣) (𝑎) (𝑐) (𝑑) #» #» = ( #» #» 𝑢 × ( #» 𝑣 + 𝑤) 𝑢 × #» 𝑣 ) + ( #» 𝑢 × 𝑤) #» 𝑢 × (𝑐 #» 𝑣 ) = 𝑐( #» 𝑢 × #» 𝑣) linearity in the first argument }︃ linearity in the second argument Proof: We will prove linearity in the second argument; the proof of linearity in the first argument follows by using the skew-symmetry of the cross product. (c) We have ⎡ ⎤ 𝑢2 (𝑣3 + 𝑤3 ) − 𝑢3 (𝑣2 + 𝑤2 ) #» #» = ⎣ −[𝑢 (𝑣 + 𝑤 ) − 𝑢 (𝑣 + 𝑤 )] ⎦ 𝑢 × ( #» 𝑣 + 𝑤) 1 3 3 3 1 1 𝑢1 (𝑣2 + 𝑤2 ) − 𝑢2 (𝑣1 + 𝑤1 ) ⎡ ⎤ ⎡ ⎤ 𝑢2 𝑣3 − 𝑢3 𝑣2 𝑢2 𝑤3 − 𝑢3 𝑤2 = ⎣ −[𝑢1 𝑣3 − 𝑢3 𝑣1 ] ⎦ + ⎣ −[𝑢1 𝑤3 − 𝑢3 𝑤1 ] ⎦ 𝑢1 𝑣2 − 𝑢2 𝑣1 𝑢1 𝑤2 − 𝑢2 𝑤1 #» #» #» #» = ( 𝑢 × 𝑣 ) + ( 𝑢 × 𝑤). Section 1.9 The Cross Product in R3 31 (d) We have ⎡ ⎤ 𝑢2 (𝑐𝑣3 ) − 𝑢3 (𝑐𝑣2 ) #» 𝑢 × (𝑐 #» 𝑣 ) = ⎣ −[𝑢1 (𝑐𝑣3 ) − 𝑢3 (𝑐𝑣1 )] ⎦ 𝑢1 (𝑐𝑣2 ) − 𝑢2 (𝑐𝑣1 ) ⎡ ⎤ 𝑢2 𝑣3 − 𝑢3 𝑣2 = 𝑐 ⎣ −[𝑢1 𝑣3 − 𝑢3 𝑣1 ] ⎦ 𝑢1 𝑣2 − 𝑢2 𝑣1 = 𝑐( #» 𝑢 × #» 𝑣 ). REMARK (Why is #» 𝑢 × #» 𝑣 only defined for R3 ?) Given two non-parallel vectors #» 𝑢 , #» 𝑣 ∈ R𝑛 for 𝑛 = 3, we can notice that there is always exactly one line through the origin that is orthogonal to both #» 𝑢 and #» 𝑣 . A cross product allows us to compute one non-zero vector on this line. It turns out that any other vector that is orthogonal to both #» 𝑢 and #» 𝑣 will always lie on this line, and hence will be a multiple #» #» of 𝑢 × 𝑣 . Notice that, when 𝑛 ̸= 3, this does not happen. [︂ ]︂ [︂ ]︂ 0 1 #» #» , then the only vector that is orthogonal to both #» 𝑢 and 𝑣 = • In if 𝑢 = 1 0 #» and #» 𝑣 is 0 , which we can consider a trivial choice. 
R2 , • For vectors #» 𝑢 , #» 𝑣 ∈ R𝑛 with 𝑛 ≥ 4, there are many different lines through the origin that are orthogonal to both #» 𝑢 and #» 𝑣 , making it difficult to choose one line out of many. ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 0 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 0⎥ #» ⎢ 1 ⎥ #» ⎢ 0 ⎥ #» ⎢ 0 ⎥ ⎥ For example, in R4 , if #» 𝑢 =⎢ ⎣ 0 ⎦ and 𝑣 = ⎣ 0 ⎦, then 𝑦 = ⎣ 1 ⎦ and 𝑧 = ⎣ 0 ⎦ are 0 0 0 1 each orthogonal to both #» 𝑢 and #» 𝑣 , but #» 𝑦 and #» 𝑧 lie on different lines through the origin. Chapter 2 Span, Lines and Planes 2.1 Definition 2.1.1 Linear Combination Example 2.1.2 Example 2.1.3 Linear Combinations and Span Let 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F and let 𝑣#»1 , 𝑣#»2 , . . . 𝑣#»𝑘 be vectors in F𝑛 . We refer to any vector of the form 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 as a linear combination of 𝑣#»1 , 𝑣#»2 , . . . 𝑣#»𝑘 . ⎡ ⎤ ⎡ ⎤ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎡ ⎤ −1 2 9 9 −1 2 ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎣ Since 2 0 − 5 1 = −5 , −5 is a linear combination of vectors 0 and 1 ⎦ 3 5 −5 −5 3 5 3 in R . ]︂ [︂ ]︂ [︂ ]︂ ]︂ [︂ ]︂ [︂ [︂ [︂ ]︂ 1+𝑖 𝑖 1 + 4𝑖 1 + 4𝑖 1+𝑖 𝑖 and is a linear combination of , = +4 Since 3𝑖 −4𝑖 0 −16𝑖 −16𝑖 −4𝑖 0 in C2 . ⎡ Example 2.1.4 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −31 1 0 1 10 ⎢ 54 ⎥ ⎢0⎥ ⎢ 0 ⎥ ⎢2⎥ ⎢ −10 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ The vector ⎢ ⎣ −10 ⎦ = 2 ⎣ 2 ⎦ − 5 ⎣ −1 ⎦ + 7 ⎣ 3 ⎦ − 4 ⎣ 10 ⎦ is a linear combination of the 23 0 1 4 0 four vectors [︀ ]︀𝑇 [︀ ]︀𝑇 [︀ ]︀𝑇 [︀ ]︀𝑇 1 0 2 0 , 0 0 −1 1 , 1 2 3 4 , 10 −10 10 0 in R4 . Example 2.1.5 Definition 2.1.6 Span #» ∈ F𝑛 , the zero vector #» #» is a linear combination of #» #» For any #» 𝑣,𝑤 0 = 0 #» 𝑣 + 0𝑤 𝑣 and 𝑤. Let 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 be vectors in F𝑛 . We define the span of {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } to be the set of all linear combinations of 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 . That is, Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } = {𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 : 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F}. 32 Section 2.1 Linear Combinations and Span 33 We refer to {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } as a spanning set for Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. We also say that Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is spanned by {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. ⎡ Example 2.1.7 ⎤ ⎡ ⎤ 2𝑖 2+𝑖 Consider the vectors 𝑣#»1 = ⎣ 3 ⎦ , 𝑣#»2 = ⎣ −5 + 2𝑖 ⎦ in C3 . Then 𝑖 6𝑖 ⎫ ⎧⎡ ⎤ ⎡ ⎡ ⎤ ⎤⎫ ⎧ ⎡ ⎤ 2𝑖 2+𝑖 2+𝑖 ⎬ ⎨ ⎬ ⎨ 2𝑖 Span ⎣ 3 ⎦ , ⎣ −5 + 2𝑖 ⎦ = 𝑐1 ⎣ 3 ⎦ + 𝑐2 ⎣ −5 + 2𝑖 ⎦ : 𝑐1 , 𝑐2 ∈ C . ⎭ ⎭ ⎩ ⎩ 𝑖 6𝑖 𝑖 6𝑖 Example 2.1.8 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 2 5 Consider the vectors 𝑣#»1 = ⎣ 3 ⎦ , 𝑣#»2 = ⎣ 5 ⎦ , 𝑣#»3 = ⎣ −8 ⎦ , 𝑣#»4 = ⎣ 7 ⎦ in F3 . Then 7 7 6 −4 ⎧ ⎡ ⎤ ⎫ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 2 5 ⎨ ⎬ Span {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 } = 𝑐1 ⎣ 3 ⎦ + 𝑐2 ⎣ 5 ⎦ + 𝑐3 ⎣ −8 ⎦ + 𝑐4 ⎣ 7 ⎦ : 𝑐1 , 𝑐2 , 𝑐3 , 𝑐4 ∈ F . ⎩ ⎭ 7 7 6 −4 REMARK In the above example, F could be either R or C. If F = R, then we would choose 𝑐1 , 𝑐2 , 𝑐3 , 𝑐4 ∈ R. However, if F = C, then we would choose 𝑐1 , 𝑐2 , 𝑐3 , 𝑐4 ∈ C, resulting in more vectors being in Span {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 }. ⎧⎡ ⎤ ⎡ ⎤⎫ ⎤ −3 1 ⎬ ⎨ 1 Determine whether the vector ⎣ 9 ⎦ is an element of Span ⎣ 2 ⎦ , ⎣ −1 ⎦ . ⎩ ⎭ 2 1 0 ⎡ Example 2.1.9 Solution: ⎧⎡ ⎤ ⎡ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎤⎫ −3 1 1 −3 1 ⎬ ⎨ 1 Since ⎣ 9 ⎦ = 2 ⎣ 2 ⎦ − 5 ⎣ −1 ⎦, we see that ⎣ 9 ⎦ ∈ Span ⎣ 2 ⎦ , ⎣ −1 ⎦ . ⎩ ⎭ 2 1 0 2 1 0 Example 2.1.10 ⎧⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ 0 −1 ⎬ ⎨ 2 ⎣ ⎦ ⎣ ⎦ ⎣ 0 , 0 ⎦ . Determine whether 1 is in Span ⎩ ⎭ 0 5 3 Solution: ⎧⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 ⎬ 2 −1 2𝑐1 − 𝑐2 ⎨ 2 ⎦ 0 Every vector in Span ⎣ 0 ⎦ , ⎣ 0 ⎦ has the form 𝑐1 ⎣ 0 ⎦ + 𝑐2 ⎣ 0 ⎦ = ⎣ ⎩ ⎭ 5 3 5 3 5𝑐1 + 3𝑐2 34 Chapter 2 Span, Lines and Planes ⎤ ⎡ ⎤ ⎡ 0 2𝑐1 − 𝑐2 ⎦ has no solutions in 𝑐1 , 𝑐2 ∈ F, 0 for some 𝑐1 , 𝑐2 ∈ F. 
Since the equation ⎣ 1 ⎦ = ⎣ 0 5𝑐1 + 3𝑐2 ⎧⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ −1 ⎬ 0 ⎨ 2 we conclude that ⎣ 1 ⎦ ̸∈ Span ⎣ 0 ⎦ , ⎣ 0 ⎦ . ⎭ ⎩ 5 3 0 2.2 Lines in R2 There is a nice way to write down the equation of a line in R𝑛 making use of vectors. We begin with R2 (otherwise known as the 𝑥𝑦-plane) before generalizing to R𝑛 . We consider the line ℒ with points (𝑥1 , 𝑦1 ) and (𝑥2 , 𝑦2 ), 𝑦-intercept at (0, 𝑏), with slope 𝑚 = 𝑝𝑞 where 𝑞 ̸= 0. We visualize all of this information below. Figure 2.2.1: Characteristics of a line ℒ in R2 The table below includes four different forms of equation(s) for ℒ in the 𝑥𝑦-plane, based on given information. Given Information Slope 𝑚 and 𝑦-intercept 𝑏 Two points (𝑥1 , 𝑦1 ) and (𝑥2 , 𝑦2 ) Point (𝑥1 , 𝑦1 ) and slope 𝑚 Point (𝑥1 , 𝑦1 ) and slope 𝑝 𝑞 (𝑞 ̸= 0) Equation of Line 𝑦 = 𝑚𝑥 + 𝑏 𝑦 − 𝑦1 𝑥 − 𝑥1 = 𝑦2 − 𝑦1 𝑥2 − 𝑥1 𝑦 − 𝑦1 = 𝑚(𝑥 − 𝑥1 ) 𝑥 = 𝑥1 + 𝑞𝑡 , 𝑡∈R 𝑦 = 𝑦1 + 𝑝𝑡 In the first 3 forms above, we input a particular value for 𝑥 to obtain the value of 𝑦 for the corresponding point on the line. In the last form above, the variable 𝑡 is called a parameter. We input a particular value for 𝑡 to obtain the value of the 𝑥- and 𝑦-coordinates for the corresponding point on the line. Definition 2.2.1 Parametric Equations of a Line in R2 Let 𝑝, 𝑞 be fixed real numbers and 𝑞 ̸= 0. Then parametric equations of a line in R2 through the point (𝑥1 , 𝑦1 ) with slope 𝑝𝑞 are 𝑥 = 𝑥1 + 𝑞𝑡 , 𝑦 = 𝑦1 + 𝑝𝑡 𝑡 ∈ R. Section 2.2 Lines in R2 35 Each value of 𝑡 gives a different point on the line. For instance, • 𝑡 = 0 gives the point (𝑥, 𝑦) = (𝑥1 , 𝑦1 ) • 𝑡 = 2 gives the point (𝑥, 𝑦) = (𝑥1 + 2𝑞, 𝑦1 + 2𝑝) • 𝑡 = −5 gives the point (𝑥, 𝑦) = (𝑥1 − 5𝑞, 𝑦1 − 5𝑝). As 𝑡 varies over all real numbers, we generate all the points on the line. REMARK Since 𝑞 ̸= 0, the expression 𝑝𝑞 is always defined. Putting 𝑞 = 0 into the parametric equations of the line gives us 𝑥 = 𝑥1 and 𝑦 = 𝑦1 + 𝑝𝑡, a vertical line with undefined slope (we can think of this informally as “infinite slope”). Definition 2.2.2 [︂ ]︂ 𝑞 be a non-zero vector in R2 . The expression Let 𝑝 Vector Equation of a Line in R2 [︂ ]︂ [︂ ]︂ [︂ ]︂ #» 𝑥 𝑥1 𝑞 ℓ = = ,𝑡∈R +𝑡 𝑦 𝑦1 𝑝 is a vector equation of the line ℒ in R2 [︂ through 𝑥1 𝑦1 ]︂ [︂ ]︂ 𝑞 . with direction 𝑝 [︂ ]︂ [︂ ]︂ 𝑥1 𝑞 For any value of 𝑡 ∈ R, the expression produces a vector in R2 whose terminal +𝑡 𝑦1 𝑝 point has coordinates 𝐿 = (𝑥1 + 𝑡𝑞, 𝑦1 + 𝑡𝑝) and is on ℒ. REMARKS [︂ ]︂ [︂ ]︂ 𝑥 𝑞 If we let #» 𝑢 = 1 and #» , we can write a vector equation of line ℒ in R2 as 𝑣 = 𝑦1 𝑝 #» #» ℓ = 𝑢 + 𝑡 #» 𝑣 , for 𝑡 ∈ R. • Letting 𝑡 = 0 gives us that the vector #» 𝑢 is on the line. • The line passes through the terminal point 𝑈 associated with #» 𝑢 . The other points on the line move from 𝑈 in the #» 𝑣 direction by scalar multiples of #» 𝑣. • We say that #» 𝑣 is parallel to the line and that #» 𝑣 is a direction vector to the line. • The vector #» 𝑣 is parallel to the line. However, the terminal point 𝑉 associated with #» 𝑣 is not usually a point on the line; in fact, 𝑉 is a point on the line if and only if the vector #» 𝑣 is a scalar multiple of the vector #» 𝑢. 36 Chapter 2 Definition 2.2.3 Span, Lines and Planes #» Let #» 𝑢 , #» 𝑣 ∈ R2 with #» 𝑣 = ̸ 0 . We refer to the set of vectors Line in R2 ℒ = { #» 𝑢 + 𝑡 #» 𝑣 : 𝑡 ∈ R} as a line ℒ in R2 through #» 𝑢 with direction #» 𝑣. Figure 2.2.2: Line in R2 through #» 𝑢 with direction #» 𝑣 Example 2.2.4 Let 𝑈 = (2, 3) and 𝑉 = (1, 4) be two points on a line ℒ in R2 . Find a vector equation of ℒ. 
Solution: #» #» We can find a vector parallel to the line by taking the difference #» 𝑣 − #» 𝑢 ,[︂ where ]︂ ]︂ [︂ 𝑢 ]︂and[︂𝑣 are −1 2 1 , = − the vector representations of points 𝑈 and 𝑉 , respectively. This gives 1 3 4 [︂ ]︂ [︂ ]︂ #» −1 2 for 𝑡 ∈ R. +𝑡 and a vector equation of the line is therefore ℓ = #» 𝑢 + 𝑡( #» 𝑣 − #» 𝑢) = 1 3 2.3 Lines in R𝑛 We now consider lines in R𝑛 . Lines in R𝑛 can be described using the same vector equations as lines in R2 . The only major differences in describing these lines arise from the implicit number of components in the vectors, which in turn affects the number of parametric equations associated with these lines. Definition 2.3.1 Vector Equation of a Line in R𝑛 #» Let #» 𝑢 , #» 𝑣 ∈ R𝑛 with #» 𝑣 = ̸ 0 . The expression #» #» ℓ = 𝑢 + 𝑡 #» 𝑣, 𝑡∈R is a vector equation of the line ℒ in R𝑛 through #» 𝑢 with direction #» 𝑣. #» #» If ℓ 1 and ℓ 2 are lines with direction #» 𝑣 1 and #» 𝑣 2 , respectively, we say that they have the #» #» same direction if 𝑐 𝑣 1 = 𝑣 2 for some non-zero 𝑐 ∈ R. Section 2.3 Lines in R𝑛 37 ⎤ ⎡ ⎤ 𝑣1 𝑢1 ⎢ 𝑣2 ⎥ ⎢ 𝑢2 ⎥ ⎢ ⎥ ⎢ ⎥ For any value of 𝑡 ∈ R, the expression ⎢ . ⎥ + 𝑡 ⎢ . ⎥ produces a vector in R𝑛 whose . ⎣ .. ⎦ ⎣ . ⎦ ⎡ 𝑣𝑛 𝑢𝑛 terminal point has coordinates 𝐿 = (𝑢1 + 𝑡𝑣1 , 𝑢2 + 𝑡𝑣2 , ..., 𝑢𝑛 + 𝑡𝑣𝑛 ) and is on line ℒ. #» Figure 2.3.3: In R𝑛 , the line ℓ = #» 𝑢 + 𝑡 #» 𝑣, 𝑡 ∈ R. REMARKS • Letting 𝑡 = 0 gives us that the vector #» 𝑢 is on the line. • The line passes through the terminal point 𝑈 associated with #» 𝑢 . The other points on the line move from 𝑈 in the #» 𝑣 direction by scalar multiples of #» 𝑣. • We say that #» 𝑣 is parallel to the line, and that #» 𝑣 is a direction vector to the line. • The vector #» 𝑣 is parallel to the line, however the terminal point 𝑉 associated with the #» vector 𝑣 , is not usually a point on the line. In fact, 𝑉 is a point on the line if and only if the vector #» 𝑣 is a multiple of the vector #» 𝑢. • There are many different vector equations that could be used to describe the same line ℒ. We’ll see this in Example 2.3.5 Definition 2.3.2 Parametric Equations of a Line in R𝑛 #» Let #» 𝑢 , #» 𝑣 ∈ R𝑛 with #» 𝑣 = ̸ 0 . Consider the vector equation of the line ℒ in R𝑛 given by #» #» ℓ = 𝑢 + 𝑡 #» 𝑣, 𝑡 ∈ R. The parametric equations of the line ℒ in R𝑛 through #» 𝑢 with direction #» 𝑣 are ℓ1 = 𝑢1 + 𝑡𝑣1 ℓ2 = 𝑢2 + 𝑡𝑣2 , .. . 𝑡 ∈ R. ℓ𝑛 = 𝑢𝑛 + 𝑡𝑣𝑛 Example 2.3.3 Give a vector equation and a set of parametric equations of the line through ⎡ ⎤ −2 𝑈 = (4, −3, 5) and in the direction of the vector #» 𝑣 = ⎣ 4 ⎦. 1 38 Chapter 2 Span, Lines and Planes Solution: A vector equation of the line is ⎡ ⎤ ⎡ ⎤ 4 −2 #» ⎣ ℓ = −3 ⎦ + 𝑡 ⎣ 4 ⎦ , 𝑡 ∈ R. 5 1 The corresponding parametric equations of the line are: ℓ1 = 4 − 2𝑡 ℓ2 = −3 + 4𝑡 , ℓ3 = 5 + 𝑡 Definition 2.3.4 𝑡 ∈ R. #» Let #» 𝑢 , #» 𝑣 ∈ R𝑛 with #» 𝑣 = ̸ 0 . We refer to the set of vectors Line in R𝑛 ℒ = { #» 𝑢 + 𝑡 #» 𝑣 : 𝑡 ∈ R} as a line ℒ in R𝑛 through #» 𝑢 with direction #» 𝑣. Example 2.3.5 ⎫ ⎧⎡ ⎤ ⎤ ⎡ −2 ⎬ ⎨ 4 Let ℒ = ⎣ −3 ⎦ + 𝑡 ⎣ 4 ⎦ : 𝑡 ∈ R , be the line from Example 2.3.3. ⎭ ⎩ 1 5 ⎫ ⎧⎡ ⎤ ⎤ ⎡ 6 ⎬ ⎨ 0 Show that the line ℳ = ⎣ 5 ⎦ + 𝑠 ⎣ −12 ⎦ : 𝑠 ∈ R is identical to ℒ. ⎭ ⎩ −3 7 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −2 6 −2 6 Solution: First, −3 ⎣ 4 ⎦ = ⎣ −12 ⎦, so ⎣ 4 ⎦ has the same direction as ⎣ −12 ⎦. 1 −3 1 −3 Thus, ℒ and ℳ are parallel. Two parallel lines are the same if and only if they share a common point. ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 4 −2 0 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ Taking 𝑠 = 0 shows that 5 is on ℳ. Taking 𝑡 = 2 shows that −3 + 2 4 = 5 ⎦ 7 5 1 7 is also on ℒ. Thus, lines ℒ and ℳ are identical. 
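The two checks used in Example 2.3.5, parallel direction vectors and a shared point, are easy to verify numerically. The sketch below is not part of the original notes; it assumes Python with the NumPy library and re-examines the lines ℒ and ℳ of that example.

    import numpy as np

    # Line L: u + t*v and line M: p + s*w, from Examples 2.3.3 and 2.3.5.
    u = np.array([4.0, -3.0, 5.0])
    v = np.array([-2.0, 4.0, 1.0])
    p = np.array([0.0, 5.0, 7.0])
    w = np.array([6.0, -12.0, -3.0])

    # Parallel directions: w is a scalar multiple of v, so the cross
    # product of v and w is the zero vector.
    print(np.cross(v, w))              # [0. 0. 0.]

    # Shared point: solve u + t*v = p for t from the first component,
    # then confirm that the same t works for every component.
    t = (p[0] - u[0]) / v[0]           # t = 2
    print(np.allclose(u + t * v, p))   # True, so p lies on L as well

Since the directions are parallel and the point (0, 5, 7) lies on both lines, the computation agrees with the conclusion that ℒ and ℳ are the same line.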
EXERCISE ⎡ ⎤ ⎡ ⎤ 1 0 #» Determine two different vector equations for the line ℓ = ⎣ 2 ⎦ + 𝑡 ⎣ 1 ⎦ , 𝑡 ∈ R. 2 1 Suppose that we are given two distinct points, 𝑈 and 𝑄, on a line in R𝑛 , with associated vectors, #» 𝑢 and #» 𝑞 , respectively. We can define #» 𝑣 = #» 𝑞 − #» 𝑢 , and this vector gives the direction of the line. Thus, we can still use Definition 2.3.1 for the vector equation of the line: #» #» ℓ = 𝑢 + 𝑡( #» 𝑞 − #» 𝑢 ), 𝑡 ∈ R. Section 2.4 Lines in R𝑛 39 Figure 2.3.4: Line in R𝑛 with direction #» 𝑣 = #» 𝑞 − #» 𝑢 Example 2.3.6 Determine a vector equation of the line through 𝑈 = (2, −3, 5) and 𝑄 = (4, −2, 6). Solution: A vector equation of the line through the given points is ⎡ ⎤ ⎛⎡ ⎤ ⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤ 2 4 2 2 2 #» ⎣ ⎦ ⎝ ⎣ ⎦ ⎣ ⎦ ⎠ ⎣ ⎦ ⎣ −2 − −3 = −3 + 𝑡 1 ⎦ , 𝑡 ∈ R. ℓ = −3 + 𝑡 5 6 5 5 1 We can describe lines through the origin as the span of a single vector. If #» 𝑣 ∈ R𝑛 , with #» #» #» #» #» 𝑣 ̸= 0 , then Span{ 𝑣 } = { 0 + 𝑡 𝑣 : 𝑡 ∈ R} is the line through the origin 𝑂 with #» 𝑣 as a direction vector. #» Figure 2.3.5: In R𝑛 , Span{ #» 𝑣 } = { 0 + 𝑡 #» 𝑣 : 𝑡 ∈ R} is a line through the origin REMARK If a line can be expressed as the span of one vector, then the line must pass through the origin. Example 2.3.7 ⎧⎡ ⎤⎫ ⎪ ⎪ 2 ⎪ ⎪ ⎨ ⎢ 4 ⎥⎬ ⎢ ⎥ The set Span ⎣ ⎦ is a line ℒ in R4 that goes through the origin and points in the 6 ⎪ ⎪ ⎪ ⎪ ⎭ ⎩ 8 ⎧ ⎡ ⎤ ⎫ ⎡ ⎤ 2 2 ⎪ ⎪ ⎪ ⎪ ⎨ ⎢ ⎥ ⎬ ⎢4⎥ 4 ⎥. It is shorthand for ℒ = 𝑡 ⎢ ⎥ : 𝑡 ∈ R . direction of ⎢ ⎣6⎦ ⎣6⎦ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 8 8 40 Chapter 2 2.4 Span, Lines and Planes The Vector Equation of a Plane in R𝑛 In this section we will be working in R𝑛 where 𝑛 ≥ 2. Definition 2.4.1 Plane in R𝑛 Through the Origin #» be non-zero vectors in R𝑛 with #» #» for any 𝑐 ∈ R. Then Let #» 𝑣,𝑤 𝑣 = ̸ 𝑐𝑤 #» = {𝑠 #» #» : 𝑠, 𝑡 ∈ R} 𝒫 = Span{ #» 𝑣 , 𝑤} 𝑣 + 𝑡𝑤 #» is a plane in R𝑛 through the origin with direction vectors #» 𝑣 and 𝑤. #» in gray. Figure 2.4.6: Plane 𝒫 = Span{ #» 𝑣 , 𝑤} #» #» #» and #» #» Three points on 𝒫 are illustrated: #» 𝑞 = #» 𝑣 + 𝑤, 𝑟 = 3 #» 𝑣 +𝑤 𝑠 = −2 #» 𝑣 − 𝑤. REMARKS • 𝒫 contains the points 𝑉 and 𝑊 , which are the terminal points associated with the #» respectively. vectors #» 𝑣 and 𝑤 #» for some • If a point 𝑃 with associated vector #» 𝑝 lies on the plane, then #» 𝑝 = 𝑠 #» 𝑣 + 𝑡 𝑤, 𝑠, 𝑡 ∈ R. • Any plane defined by the span of two vectors must pass through the origin. #» must be non-zero vectors with 𝑤 #» ̸= 𝑐 #» • To form a plane, #» 𝑣,𝑤 𝑣 for any 𝑐 ∈ R. If exactly #» #» #» #» #» #» is a line, not a one of 𝑣 and 𝑤 is 0 , or if 𝑤 = 𝑐 𝑣 for some 𝑐 ∈ R, then Span{ #» 𝑣 , 𝑤} #» #» #» #» #» #» plane. If 𝑣 = 𝑤 = 0 , then Span{ 𝑣 , 𝑤} = 0 ; this is not a plane, but rather a single point, the origin. Definition 2.4.2 Vector Equation of a Plane in R𝑛 Through the Origin #» be non-zero vectors in R𝑛 with #» #» for any 𝑐 ∈ R. The expression Let #» 𝑣,𝑤 𝑣 = ̸ 𝑐𝑤 #» #» 𝑝 = 𝑠 #» 𝑣 + 𝑡𝑤 is a vector equation of the plane in R𝑛 through the origin with direction vectors #» #» 𝑣 and 𝑤. Section 2.4 The Vector Equation of a Plane in R𝑛 Example 2.4.3 41 equation of the plane in R3 through the origin with direction vectors ⎡Determine ⎤ ⎡ a vector ⎤ 2 −1 ⎣ 4 ⎦ and ⎣ 2 ⎦. 6 −3 Solution: A vector equation of this plane is ⎡ ⎤ ⎡ ⎤ 2 −1 #» ⎣ ⎦ ⎣ 𝑝 = 𝑠 4 + 𝑡 2 ⎦ , 𝑠, 𝑡 ∈ R. 6 −3 Notice that the points (2, 4, 6) and (−1, 2, −3) are on this plane. Example 2.4.4 Give a vector equation of the plane in R5 that passes through the origin with direction [︀ ]︀𝑇 [︀ ]︀𝑇 vectors 5 4 3 2 1 and −5 4 −3 2 −1 . 
Solution: A vector equation of this plane is ⎡ ⎤ ⎡ ⎤ 5 −5 ⎢4⎥ ⎢ 4 ⎥ ⎢ ⎥ ⎢ ⎥ #» ⎥ ⎢ ⎥ 𝑝 = 𝑠⎢ ⎢ 3 ⎥ + 𝑡 ⎢ −3 ⎥ , 𝑠, 𝑡 ∈ R. ⎣2⎦ ⎣ 2 ⎦ 1 −1 Notice that the points (5, 4, 3, 2, 1) and (−5, 4, −3, 2, −1) in R5 are on this plane. Definition 2.4.5 #» be non-zero vectors in R𝑛 with #» #» for any 𝑐 ∈ R. Then Let #» 𝑢 ∈ R𝑛 and let #» 𝑣 and 𝑤 𝑣 = ̸ 𝑐 𝑤, Plane in R𝑛 #» : 𝑠, 𝑡 ∈ R} 𝒫 = { #» 𝑢 + 𝑠 #» 𝑣 + 𝑡𝑤 #» We say that #» #» is a plane in R𝑛 through #» 𝑢 with direction vectors #» 𝑣 and 𝑤. 𝑣 and 𝑤 are parallel to 𝒫. REMARKS • Letting 𝑠 = 𝑡 = 0 gives us that the vector #» 𝑢 is on the plane. • The plane passes through the terminal point 𝑈 associated with #» 𝑢 . The other points #» directions by linear combinations of #» on the plane move from 𝑈 in the #» 𝑣 and the 𝑤 𝑣 #» and 𝑤. #» are not usually • The terminal points, 𝑉 and 𝑊 , associated with the vectors, #» 𝑣 and 𝑤, #» that on the plane. In fact, 𝑉 is a point on the plane if and only if #» 𝑢 ∈ Span{ #» 𝑣 , 𝑤}; is, if and only if 𝑈 lies on the plane through the origin that contains 𝑉 and 𝑊 . 42 Example 2.4.6 Definition 2.4.7 Vector Equation of a Plane Chapter 2 Span, Lines and Planes ⎫ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 −1 ⎬ ⎨ 0 Since the plane ℳ = ⎣ 0 ⎦ + 𝑠 ⎣ 4 ⎦ + 𝑡 ⎣ 2 ⎦ : 𝑠, 𝑡 ∈ R has the same direction vectors ⎭ ⎩ 1 6 −3 ⎧ ⎡ ⎤ ⎫ ⎡ ⎤ −1 ⎨ 2 ⎬ as the plane 𝒫 = 𝑠 ⎣ 4 ⎦ + 𝑡 ⎣ 2 ⎦ : 𝑠, 𝑡 ∈ R from Example 2.4.3, then ℳ is parallel to ⎩ ⎭ 6 −3 𝒫. It can be shown that ℳ does not pass through the origin. #» be non-zero vectors in R𝑛 with #» #» for any 𝑐 ∈ R. Then Let #» 𝑢 ∈ R𝑛 and let #» 𝑣 and 𝑤 𝑣 = ̸ 𝑐 𝑤, #» #» 𝑝 = #» 𝑢 + 𝑠 #» 𝑣 + 𝑡 𝑤, 𝑠, 𝑡 ∈ R is a vector equation of the plane in R𝑛 through #» 𝑢 with direction vectors #» 𝑣 and #» 𝑤. Example 2.4.8 3 Give a vector ⎡ ⎤ equation ⎡ ⎤of a plane in R which passes through the point (1, −4, 6), and has 2 −1 vectors ⎣ 4 ⎦ and ⎣ 2 ⎦ parallel to it. 6 −3 Solution: A vector equation of this plane is ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 2 1 #» ⎦ ⎣ ⎦ ⎣ ⎣ 𝑝 = −4 + 𝑠 4 + 𝑡 2 ⎦ , 𝑠, 𝑡 ∈ R. −3 6 6 Note points (2, 4, 6) and (−1, 2, −3) are not on this plane; however, the vectors ⎤ that⎡the ⎤ ⎡ −1 2 ⎣ 4 ⎦ and ⎣ 2 ⎦ are parallel to this plane. −3 6 #» in gray. Figure 2.4.7: Plane 𝒫 with equation #» 𝑝 = #» 𝑢 + 𝑠 #» 𝑣 + 𝑡𝑤 #» #» #» #» #» #» #» #» and #» #» Plane 𝒫 includes 𝑞 = 𝑢 + 𝑣 + 𝑤 and 𝑟 = 𝑢 + 3 𝑣 + 𝑤 𝑥 = #» 𝑢 − 2 #» 𝑣 − 𝑤. Section 2.4 The Vector Equation of a Plane in R𝑛 Example 2.4.9 43 Give a vector equation of a plane 𝒫 in R5 that passes through the point (1, −4, 5, −2, 7) [︀ ]︀𝑇 [︀ ]︀𝑇 and has direction vectors 2 4 6 8 10 and −1 2 −3 4 −5 . Solution: A vector equation of this plane is ⎡ ⎤ ⎡ ⎤ ⎤ ⎡ 2 −1 1 ⎢ −4 ⎥ ⎢4⎥ ⎢ 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ #» ⎢ ⎥ ⎢ ⎥ ⎥ 𝑝 = ⎢ 5 ⎥ + 𝑠⎢ 6 ⎥ + 𝑡⎢ ⎢ −3 ⎥ , 𝑠, 𝑡 ∈ R. ⎣ −2 ⎦ ⎣8⎦ ⎣ 4 ⎦ 7 10 −5 Note that the points (2, 4, 6, 8, 10) and (−1, 2, −3, 4, −5) are not on 𝒫. However, the vectors [︀ ]︀𝑇 [︀ ]︀𝑇 2 4 6 8 10 and −1 2 −3 4 −5 are parallel to 𝒫. [︀ ]︀𝑇 [︀ ]︀𝑇 Noticing that 2 4 6 8 10 = 2 1 2 3 4 5 , a different vector equation for 𝒫 is ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 −1 ⎢ −4 ⎥ ⎢2⎥ ⎢ 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ #» ⎥ ⎢ ⎥ ⎢ ⎥ 𝑝 =⎢ ⎢ 5 ⎥ + 𝑠 ⎢ 3 ⎥ + 𝑡 ⎢ −3 ⎥ , 𝑠, 𝑡 ∈ R. ⎣ −2 ⎦ ⎣4⎦ ⎣ 4 ⎦ 7 5 −5 REMARK There are many different vector equations that could be used to describe the same plane 𝒫. EXERCISE ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 1 1 Determine two different vector equations for the plane #» 𝑝 = ⎣ 1 ⎦ +𝑠 ⎣ 2 ⎦ +𝑡 ⎣ −1 ⎦ , 𝑠, 𝑡 ∈ R. 0 3 −1 We can also uniquely define a plane using three points. Let 𝑈, 𝑄 and 𝑅 be three non-colinear points in R𝑛 (that is, three points which do not all lie on the same line), with associated vectors #» 𝑢 , #» 𝑞 , and #» 𝑟 , respectively. 
Suppose that we want the equation of the unique plane #» = #» containing these three points. The vectors #» 𝑣 = #» 𝑞 − #» 𝑢 and 𝑤 𝑟 − #» 𝑢 are parallel to the plane, and thus, we can express the equation of the plane as 𝒫 = { #» 𝑢 + 𝑠( #» 𝑞 − #» 𝑢 ) + 𝑡( #» 𝑟 − #» 𝑢 ) : 𝑠, 𝑡 ∈ R}. Example 2.4.10 Find a vector equation of a plane in R4 which contains the following points: 𝑈 = (2, −4, 6, −8), 𝑄 = (1, 3, −2, −4), and 𝑅 = (9, 7, 5, 3). Solution: We need to determine two non-parallel direction vectors. We have: 44 Chapter 2 Span, Lines and Planes Figure 2.4.8: Plane 𝒫 = { #» 𝑢 + 𝑠( #» 𝑞 − #» 𝑢 ) + 𝑡( #» 𝑟 − #» 𝑢 ) : 𝑠, 𝑡 ∈ R} in gray. ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 −1 9 2 7 ⎢ 3 ⎥ ⎢ −4 ⎥ ⎢ 7 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ #» #» #» ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 7 ⎥ ⎢ −4 ⎥ ⎢ 11 ⎥ 𝑞 − #» 𝑢 =⎢ ⎣ −2 ⎦ − ⎣ 6 ⎦ = ⎣ −8 ⎦ and 𝑟 − 𝑢 = ⎣ 5 ⎦ − ⎣ 6 ⎦ = ⎣ −1 ⎦ . 3 −8 11 −4 −8 4 A vector equation of this plane is then ⎡ ⎤ ⎡ ⎤ ⎤ 7 −1 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 11 7 −4 #» ⎢ ⎥ ⎢ ⎥ ⎥ 𝑝 = #» 𝑢 + 𝑠( #» 𝑞 − #» 𝑢 ) + 𝑡( #» 𝑟 − #» 𝑢) = ⎢ ⎣ 6 ⎦ + 𝑠 ⎣ −8 ⎦ + 𝑡 ⎣ −1 ⎦ , 𝑠, 𝑡 ∈ R. 11 4 −8 ⎡ 2.5 Scalar Equation of a Plane in R3 In R3 (and only in R3 !), there is an additional way to generate an equation of a plane, using the cross product. #» be non-zero vectors in R3 with #» #» for any 𝑐 ∈ R. We know that Let #» 𝑣 and 𝑤 𝑣 ̸= 𝑐 𝑤, #» #» #» #» are both the cross product, 𝑛 = 𝑣 × 𝑤, is orthogonal to any plane to which #» 𝑣 and 𝑤 direction vectors. The vector #» 𝑛 is referred to as a normal vector to such a plane. Suppose that 𝑈 and 𝑃 are points on the plane, with associated vectors, #» 𝑢 and #» 𝑝 , respectively. #» #» Then the vector 𝑝 − 𝑢 is parallel to the plane and is therefore orthogonal to #» 𝑛 ; that is, #» #» · ( #» 𝑛 · ( #» 𝑝 − #» 𝑢 ) = ( #» 𝑣 × 𝑤) 𝑝 − #» 𝑢 ) = 0. Definition 2.5.1 Normal Form, Scalar Equation of a Plane in R3 ⎡ ⎤ 𝑎 #» #» and a normal vector #» Let 𝒫 be a plane in R3 with direction vectors #» 𝑣 and 𝑤 𝑛 = ⎣ 𝑏 ⎦ ̸= 0 . 𝑐 ⎡ ⎤ 𝑥 𝑝 ̸= #» 𝑢 . A normal form of 𝒫 is given by Let #» 𝑢 ∈ 𝒫 and #» 𝑝 = ⎣ 𝑦 ⎦ ∈ 𝒫 where #» 𝑧 #» 𝑛 · ( #» 𝑝 − #» 𝑢 ) = 0. Expanding this, we arrive at a scalar equation (or general form) of 𝒫, 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 = 𝑑, where 𝑑 = #» 𝑛 · #» 𝑢. Section 2.5 Scalar Equation of a Plane in R3 45 REMARK The plane goes through the origin, 𝑂 #» • if and only if the vector 0 satisfies this equation #» · ( #» • if and only if ( #» 𝑣 × 𝑤) 0 − #» 𝑢) = 0 #» for some 𝑎, 𝑏 ∈ R. • if and only if #» 𝑢 = 𝑎 #» 𝑣 + 𝑏 𝑤, Geometrically, the plane goes through the origin if and only if both 𝑉 and 𝑊 lie on the plane. #» The plane in R3 with normal form ( #» #» · ( #» Figure 2.5.9: Let #» 𝑛 = #» 𝑣 × 𝑤. 𝑣 × 𝑤) 𝑝 − #» 𝑢 ) = 0 is in gray. It passes #» through #» 𝑢 and has direction vectors #» 𝑣 and 𝑤. Example 2.5.2 Find a scalar of the plane in R3 which passes through the origin and has direction ⎡ ⎤ equation ⎡ ⎤ 2 −1 vectors ⎣ 4 ⎦ and ⎣ 2 ⎦. 7 −3 ⎡ ⎤ 𝑥 Solution: Let #» 𝑛 be a normal to the plane and let #» 𝑝 = ⎣ 𝑦 ⎦ be any vector on the plane. 𝑧 Then ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 −1 −26 #» 𝑛 = ⎣ 4 ⎦ × ⎣ 2 ⎦ = ⎣ −1 ⎦ . 7 −3 8 A normal form of the plane (going through the origin) is thus given by ⎡ ⎤ ⎡ ⎤ 𝑥 −26 ⎣ 𝑦 ⎦ · ⎣ −1 ⎦ = 0. 𝑧 8 Expanding this, we obtain a scalar equation of the plane −26𝑥 − 𝑦 + 8𝑧 = 0. 46 Chapter 2 Span, Lines and Planes Note that the two points (2, 4, 7) and (−1, 2, −3) lie on this plane. Example 2.5.3 3 Give a scalar equation ⎡ ⎤ of the ⎡ plane ⎤ in R which passes through the point (1, −4, 3) and has 2 −1 the two vectors ⎣ 4 ⎦ and ⎣ 2 ⎦ parallel to it. 7 −3 ⎡ ⎤ 𝑥 #» #» ⎣ Solution: Let 𝑛 be a normal to the plane and 𝑝 = 𝑦 ⎦ be any vector on the plane. 
𝑧 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 −1 −26 #» 𝑛 = ⎣ 4 ⎦ × ⎣ 2 ⎦ = ⎣ −1 ⎦ . 7 −3 8 A scalar equation of the plane (not going through the origin) is then given by ⎤ ⎛⎡ ⎤ ⎡ ⎤⎞ ⎡ −26 1 𝑥 ⎝⎣ 𝑦 ⎦ − ⎣ −4 ⎦⎠ · ⎣ −1 ⎦ = 0 8 3 𝑧 −26(𝑥 − 1) − (𝑦 + 4) + 8(𝑧 − 3) = 0 −26𝑥 − 𝑦 + 8𝑧 = 2. Note that the two points (2, 4, 7) and (−1, 2, −3) do not lie on this plane. REMARKS 1. Examining the definition of a scalar of a plane above, we see that the com⎤ ⎡ equation 𝑎 ponents of a normal vector #» 𝑛 = ⎣ 𝑏 ⎦ are recorded as the coefficients of 𝑥, 𝑦, 𝑧 in a 𝑐 scalar equation. We can use this information to quickly retrieve a normal form from a scalar equation by identifying a single vector on the plane. 2. We use the language “a scalar equation”, “a normal vector”, and “a normal form” of a plane 𝒫 to emphasize that these are not unique. • Normal vectors of a plane are simply non-zero vectors that are orthogonal to any vector that is parallel to 𝒫. If #» 𝑛 is a normal vector of 𝒫, then so is 𝑐 #» 𝑛 for any non-zero scalar 𝑐 ∈ R, so 𝒫 does not have a unique normal vector (and thus normal form). That said, all normal vectors of 𝒫 lie on the same line. • A scalar equation of a plane is an equation which all vectors of 𝒫 satisfy. We know that multiplying an equation by a non-zero 𝑐 ∈ R does not change the solution set to that equation, so 𝒫 does not have a unique scalar equation. Section 2.5 Scalar Equation of a Plane in R3 Example 2.5.4 47 Let 𝒫 be a plane in R3 with scalar equation 2𝑥 − 𝑦 + 𝑧 = 7. Give a normal form of 𝒫. ⎡ ⎤ 2 Solution: From the scalar equation, we see that a normal vector of 𝒫 is #» 𝑛 = ⎣ −1 ⎦. To 1 identify a vector on 𝒫, we can pick any constants we want for exactly two of 𝑥, 𝑦, 𝑧 and solve for the third value. Taking 𝑥 = 𝑦 = 0, we have 7 = 2(0) − 0 + 𝑧 = 𝑧, ⎡ ⎤ 0 so #» 𝑢 = ⎣ 0 ⎦ ∈ 𝒫. Therefore, a normal form of 𝒫 is 7 ⎡ ⎤ ⎛⎡ ⎤ ⎡ ⎤⎞ 2 𝑥 0 ⎣ −1 ⎦ · ⎝⎣ 𝑦 ⎦ − ⎣ 0 ⎦⎠ = 0. 1 𝑧 7 Chapter 3 Systems of Linear Equations 3.1 Introduction We begin with some introductory examples in which linear systems of equations naturally arise. Example 3.1.1 At the market, 3 watermelons and 7 kiwis cost $19 and 2 watermelons and 5 kiwis cost $13. How much does a watermelon cost and how much does a kiwi cost? Solution: Let 𝑤 denote the price of a watermelon and let 𝑘 denote the price of a kiwi. We convert the given information into a system of two equations. 3𝑤 + 7𝑘 = 19 2𝑤 + 5𝑘 = 13 (𝑒1 ) (𝑒2 ) We must find values for 𝑤 and 𝑘 that satisfy the equations (𝑒1 ) and (𝑒2 ) simultaneously. If we multiply (𝑒1 ) by 2 and subtract from that 3 times (𝑒2 ), we obtain [2(3) − 3(2)]𝑤 + [2(7) − 3(5)]𝑘 = 2(19) − 3(13). That is, −𝑘 = −1, which implies that 𝑘 = 1. Substituting this value for 𝑘 into equation (𝑒1 ) gives 𝑤 = 19−7(1) 3 = 4. Therefore, the price of a watermelon is $4 and the price of a kiwi is $1. Substituting these values into the two equations 3(4) + 7(1) = 19 2(4) + 5(1) = 13 shows that they satisfy both equations. The crucial step in solving this problem is that of taking the combination 2(𝑒1 ) − 3(𝑒2 ). This step is determined by looking at the coefficients of 𝑤 in the two equations (𝑒1 ) and (𝑒2 ), which are 3 and 2, respectively. The combination of 2(𝑒1 ) − 3(𝑒2 ) produces a new equation in which 𝑤 has been eliminated. We then solve this new equation for the other 48 Section 3.2 Systems of Linear Equations 49 variable, 𝑘. Once we have a value for 𝑘, we then substitute it into one of the original two equations and solve for the value of the other variable, 𝑤. 
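For readers who would like to check such computations by machine, the elimination step of Example 3.1.1 can be mirrored in a few lines of code. The following Python sketch is illustrative only (the variable names are ours, not part of the course notation); it forms the combination 2(𝑒1) − 3(𝑒2) described above and then back-substitutes, using exact rational arithmetic.

```python
# A small computational check of Example 3.1.1 (illustrative only; the
# variable names below are ours, not part of the course notation).
from fractions import Fraction

a1, b1, c1 = 3, 7, 19   # e1: 3w + 7k = 19
a2, b2, c2 = 2, 5, 13   # e2: 2w + 5k = 13

# The combination 2(e1) - 3(e2) eliminates w, since 2(3) - 3(2) = 0.
k = Fraction(2 * c1 - 3 * c2, 2 * b1 - 3 * b2)   # -k = -1, so k = 1
w = (c1 - b1 * k) / a1                           # substitute k back into e1

print(w, k)                       # 4 1
print(a1 * w + b1 * k == c1)      # True: e1 is satisfied
print(a2 * w + b2 * k == c2)      # True: e2 is satisfied
```

The same idea, applied systematically, is exactly what the elimination procedures later in this chapter formalize.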
Alternatively, we could have considered the combination 5(𝑒1 ) − 7(𝑒2 ), to eliminate the variable 𝑘, and then solve for 𝑤. We then could substitute its value into either equation, (𝑒1 ) or (𝑒2 ) in order to find the value of 𝑘. Example 3.1.2 Find the equation of the line that passes through the points (2, 3) and (−3, 4). Solution: The general equation of a line is 𝑦 = 𝑚𝑥 + 𝑏. We must determine the values of the constants 𝑚 and 𝑏. Since both points lie on the line, they must both satisfy the equation. Therefore, 3 = 2𝑚 + 𝑏 and 4 = −3𝑚 + 𝑏. Thus, we have a system with two equations and two variables, 𝑚 and 𝑏: 2𝑚 + 𝑏 = 3 −3𝑚 + 𝑏 = 4 We will not solve this system now. Example 3.1.3 ⎧⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ 1 1 ⎬ ⎨ −1 Determine whether the vector ⃗𝑣 = ⎣ 2 ⎦ lies in Span ⎣ 2 ⎦ , ⎣ 1 ⎦ . ⎩ ⎭ 3 −3 1 Solution: The vector ⃗𝑣 will lie in the span of the two given vectors if, and only if, there exist scalars 𝑎, 𝑏 ∈ R, such that ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 −1 1 ⎣2⎦ = 𝑎 ⎣ 2 ⎦ + 𝑏 ⎣1⎦ , 3 −3 1 or equivalently 1 = −1𝑎 +𝑏 2 = 2𝑎 +𝑏 . 3 = −3𝑎 +𝑏 We have a system with three equations and two variables. We will not solve this system now. 3.2 Definition 3.2.1 Linear Equation, Coefficient, Constant Term Systems of Linear Equations A linear equation in 𝑛 variables (or unknowns) 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 is an equation that can be written in the form 𝑎1 𝑥1 + 𝑎2 𝑥2 + · · · + 𝑎𝑛 𝑥𝑛 = 𝑏, where 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 , 𝑏 ∈ F. The scalars 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 are the coefficients of 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 , respectively, and 𝑏 is called the constant term. 50 Definition 3.2.2 System of Linear Equations Chapter 3 Systems of Linear Equations A system of linear equations is a collection of 𝑚 linear equations in 𝑛 variables, 𝑥1 , . . . , 𝑥 𝑛 : 𝑎11 𝑥1 +𝑎12 𝑥2 + · · · +𝑎1𝑛 𝑥𝑛 = 𝑏1 𝑎21 𝑥1 +𝑎22 𝑥2 + · · · +𝑎2𝑛 𝑥𝑛 = 𝑏2 .. . 𝑎𝑚1 𝑥1 +𝑎𝑚2 𝑥2 + · · · +𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚 We will use the convention that 𝑎𝑖𝑗 is the coefficient of 𝑥𝑗 in the 𝑖𝑡ℎ equation. Definition 3.2.3 Solve, Solution We say that the scalars 𝑦1 , 𝑦2 , . . . , 𝑦𝑛 in F solve a system of linear equations if, when we set 𝑥1 = 𝑦1 , 𝑥2 = 𝑦2 , . . . , 𝑥𝑛 = 𝑦𝑛 in the system, then each of the equations is satisfied: 𝑎11 𝑦1 𝑎21 𝑦1 +𝑎12 𝑦2 + · · · +𝑎22 𝑦2 + · · · .. . 𝑎𝑚1 𝑦1 +𝑎𝑚2 𝑦2 + · · · +𝑎1𝑛 𝑦𝑛 = +𝑎2𝑛 𝑦𝑛 = 𝑏1 𝑏2 +𝑎𝑚𝑛 𝑦𝑛 = 𝑏𝑚 ⎡ ⎤ 𝑦1 ⎢ 𝑦2 ⎥ ⎢ ⎥ We also say that the vector ⃗𝑦 = ⎢ . ⎥ is a solution to the system. ⎣ .. ⎦ 𝑦𝑛 Note that if at least one equation in the system of linear equations is not satisfied by the entries in #» 𝑦 , then #» 𝑦 is not a solution. Example 3.2.4 Consider the following system of linear equations: 𝑥1 −2𝑥2 +3𝑥3 = 1 2𝑥1 −4𝑥2 +6𝑥3 = 2 . 3𝑥1 −𝑥2 +𝑥3 = 3 ⎡ ⎤ 1 The vector ⎣ 0 ⎦ is a solution, as 0 1(1) − 2(0) + 3(0) = 1 2(1) − 4(0) + 6(0) = 2. 3(1) − 1(0) + 1(0) = 3 ⎡ ⎤ −2 However, ⎣ 0 ⎦ is not a solution, because even though 1(−2) − 2(0) + 3(1) = 1 and 1 2(−2) − 4(0) + 6(1) = 2, we have that 3(−2) − 1(0) + 1(1) = −5 ̸= 3. Section 3.2 Systems of Linear Equations Definition 3.2.5 Solution Set Example 3.2.6 51 The set of all solutions to a system of linear equations is called the solution set to the system. Solve 2𝑥 = 4. That is, find the solution set to this system of linear equations (a system with one equation). Solution: The solution to this equation is 𝑥 = 2 and thus, the solution set 𝑆 is {2}. We can verify that 2(2) = 4. Example 3.2.7 Solve 2𝑥 + 4𝑦 = 6. Solution: We have only one equation for the two unknowns, 𝑥 and 𝑦. There will be an infinite number of solutions to this equation. Indeed, given any real number 𝑦, we can solve for 𝑥 to obtain 𝑥 = 3 − 2𝑦. 
In this way we see that 𝑥 depends on 𝑦, but that 𝑦 can be freely chosen to be any real number. We emphasize this point by writing 𝑦 = 𝑡, where the variable 𝑡 ∈ R is called a parameter. Then 𝑥 = 3 − 2𝑡, and therefore the solution set, 𝑆, can be expressed as {︂[︂ ]︂ }︂ 3 − 2𝑡 𝑆= :𝑡∈R . 𝑡 We can verify our solution by setting 𝑦 = 𝑡 and 𝑥 = 3 − 2𝑡: 2𝑥 + 4𝑦 = 2(3 − 2𝑡) + 4𝑡 = 6 − 4𝑡 + 4𝑡 = 6, which is the right-hand side of the equation. Note that declaring 𝑦 = 𝑡 as our parameter was an arbitrary choice. We could have just as well let 𝑥 = 𝑡 and solved for 𝑦 in terms of 𝑡. The important property to note is that a parameter was needed to describe the solution set 𝑆. Example 3.2.8 Solve the system of linear equations 2𝑥 + 3𝑦 = 4 2𝑥 + 3𝑦 = 5 Solution: Since both equations have identical left-hand sides, but different right-hand sides, it will be impossible to find values for 𝑥 and 𝑦 that satisfy both of these equations. Therefore, the solutions set is 𝑆 = ∅. Theorem 3.2.9 (The Solution Set to a System of Linear Equations) The solution set to a system of linear equations is exactly one of the following: (a) empty (there are no solutions), (b) contains exactly one element (there is a unique solution), or (c) contains an infinite number of elements (the solution set has one or more parameters). 52 Chapter 3 Systems of Linear Equations We have given examples above to illustrate these three outcomes. However, at this time, we will not give a proof of this result. Definition 3.2.10 Inconsistent and Consistent Systems If the solution set to a system of linear equations is empty, then we say that the system is inconsistent. If the solution set has a unique solution or infinitely many solutions, then we say that the system is consistent. Example 3.2.11 Solve the system 𝑥=1 . 𝑥=2 Solution: There is no solution and 𝑆 = ∅. This system is inconsistent. Example 3.2.12 Solve the system 𝑥+𝑦 =2 . 𝑥−𝑦 =4 {︂[︂ Solution: There is a unique solution, 𝑥 = 3 and 𝑦 = −1. Therefore, 𝑆 = 3 −1 ]︂}︂ . This system is consistent. Example 3.2.13 Solve the system 𝑥 + 𝑦 = 1. Solution: ]︂ 𝑡, where }︂ 𝑡 ∈ R. Therefore, 𝑥 = 1−𝑡. We obtain infinitely many solutions {︂[︂Let 𝑦 = 1−𝑡 : 𝑡 ∈ R . This system is consistent. and 𝑆 = 𝑡 Note{︂[︂ that if]︂ we had}︂ let 𝑥 = 𝑡, where 𝑡 ∈ R, we would obtain 𝑦 = 1 − 𝑡, and 𝑡 :𝑡∈R . 𝑆= 1−𝑡 These are two different representations of the same set 𝑆. Generally there will be many different ways of expressing any given set. It is usually not obvious whether a system is inconsistent, will have a unique solution, or will have infinitely many solutions. The key idea to solving a system of equations is to manipulate the system into another system of equations which is easier to solve and has the same solution set as the original system. Definition 3.2.14 We say that two linear systems are equivalent whenever they have the same solution set. Equivalent Systems Given a system of linear equations, we will manipulate it into equivalent systems and we will stop once we have obtained an equivalent system whose solution set is easily obtained. The question then becomes: what manipulations can we do to a system which will produce an equivalent system? Section 3.2 Systems of Linear Equations Example 3.2.15 Consider the system 53 𝑥 +𝑦 = 1 . If we multiply the first equation by zero, we get a 2𝑥 +3𝑦 = 6 new system: 0=0 2𝑥 + 3𝑦 = 6 This system is not equivalent to the original system. For example, 𝑥 = 3 and 𝑦 = 0 is a solution to the new system, but it is not a solution to the original system. 
We have lost information by multiplying the first equation by zero. Example 3.2.16 Consider the system 𝑥 +𝑦 +𝑧 = 3 2𝑥 +3𝑦 −2𝑧 = 3 We could add an additional equation to get a new system. 𝑥 +𝑦 +𝑧 = 3 2𝑥 +3𝑦 −2𝑧 = 3 𝑥 +2𝑦 −3𝑧 = 5 This new system is not equivalent to the old one. For example, 𝑥 = 1, 𝑦 = 1, 𝑧 = 1 is a solution to the original system, but it is not a solution to the new system. We have added an extra restriction on the variables by adding in this new equation. When manipulating a system, we must be careful that we neither lose information nor add information. Definition 3.2.17 Elementary Operations Consider a system of 𝑚 linear equations in 𝑛 variables. The equations are ordered and labelled from 𝑒1 to 𝑒𝑚 . The following three operations are known as elementary operations. Equation swap elementary operation: interchange two equations. 𝑒𝑖 ↔ 𝑒𝑗 Interchange equations 𝑒𝑖 and 𝑒𝑗 . Equation scale elementary operation: multiply one equation by a non-zero constant. 𝑒𝑖 → 𝑚𝑒𝑖 , 𝑚 ∈ F∖{0} Replace equation 𝑒𝑖 by 𝑚 times equation 𝑒𝑖 . Equation addition elementary operation: add a multiple of one equation to another equation. }︃ 𝑒𝑗 → 𝑐𝑒𝑖 + 𝑒𝑗 Replace equation 𝑒𝑗 by adding 𝑒𝑗 and a multiple of equation 𝑒𝑖 . 𝑖 ̸= 𝑗, 𝑐 ∈ F Some sources refer to these operations as Type I, Type II, and Type III operations, respectively. When performing a series of elementary operations, we will use the convention that 𝑒𝑖 and 𝑒𝑗 refer to the current system we are working with. Therefore, 𝑒𝑖 and 𝑒𝑗 will change as the system is manipulated. 54 Theorem 3.2.18 Chapter 3 Systems of Linear Equations (Elementary Operations) If a single elementary operation of any type is performed on a system of linear equations, then the system produced will be equivalent to the original system. In practice, there might be other operations that we want to perform. These operations will not be elementary operations, instead they will be combinations of elementary operations. For example, 𝑒𝑗 → 𝑐𝑒𝑖 − 𝑒𝑗 is not an elementary operation, but rather it is a combination of two elementary operations. Can you determine which ones? Using these operations will still produce an equivalent system, but they will cause confusion later. Therefore, when manipulating a system of equations, we will make use of the three elementary operations only. Note that if at any point we produce an equation of the form 0 = 𝑎, where 𝑎 ̸= 0, then the system is inconsistent and we stop at once. Definition 3.2.19 Trivial Equation We refer to the equation 0 = 0 as the trivial equation. Any other equation is known as a non-trivial equation. The trivial equation is always true and it has no information content. If at any point when performing elementary operations we produce the trivial equation, then we move this equation to the end of the system, by performing an equation swap elementary operation. We might be tempted to ignore such equations and not write them down. In practice, we keep any (there may be more than one) equations of this form in our new system, so that we will always have the same number of equations in any of the equivalent systems. The goal of performing elementary operations is to obtain a system whose solution set is easier to determine. Let us refer to this system as our “final equivalent system.” The final equivalent system is not unique and depends on the sequence of elementary operations used. Example 3.2.20 Solve the following system of three linear equations in three unknowns: −2𝑥1 −4𝑥2 −6𝑥3 = 4 3𝑥1 +6𝑥2 +10𝑥3 = 6 . 
𝑥2 +2𝑥3 = 5 Solution: Suppose that we perform the elementary operation 1 𝑒1 → − 𝑒1 2 to get the new system 𝑥1 +2𝑥2 +3𝑥3 = −2 3𝑥1 +6𝑥2 +10𝑥3 = 6 . 𝑥2 +2𝑥3 = 5 Section 3.3 An Approach to Solving Systems of Linear Equations 55 We will now make use of this new equation 𝑒1 to remove the 𝑥1 term from 𝑒2 . 𝑒2 → −3𝑒1 + 𝑒2 gives 𝑥1 +2𝑥2 +3𝑥3 = −2 𝑥3 = 12 𝑥2 +2𝑥3 = 5 This could be our final equivalent system. From 𝑒2 we obtain that 𝑥3 = 12. We could make use of this fact and 𝑒3 to get 𝑥2 + 2(12) = 5, so that 𝑥2 = −19. We can put the values for 𝑥2 and 𝑥3 into 𝑒1 to get 𝑥1 + 2(−19) + 3(12) = −2, so that 𝑥1 = 0. However, instead of doing these substitutions, we may wish to continue applying elementary operations. We perform the operation 𝑒2 ↔ 𝑒3 to get 𝑥1 +2𝑥2 +3𝑥3 = −2 𝑥2 +2𝑥3 = 5 . 𝑥3 = 12 Then we perform 𝑒2 → −2𝑒3 + 𝑒2 𝑒1 → −3𝑒3 + 𝑒1 to get to get 𝑥1 +2𝑥2 +3𝑥3 = −2 𝑥2 = −19 , 𝑥3 = 12 𝑥1 +2𝑥2 𝑥2 𝑥3 𝑥1 𝑒1 → −2𝑒2 + 𝑒1 to finally get 𝑥2 𝑥3 = −38 = −19 , = 12 = 0 = −19 . = 12 This is our final equivalent system, and its solution set is ⎧⎡ ⎤⎫ ⎨ 0 ⎬ 𝑆 = ⎣ −19 ⎦ . ⎩ ⎭ 12 It is clear from this example that there will be plenty of choices on which elementary operations to apply at any stage and also that there will be choices as to when we will stop at our final equivalent system. There is no “best” final equivalent system. However, there are two particular forms that are preferable for the final equivalent system. We will attempt to motivate these two forms and discuss how they can be obtained. 3.3 An Approach to Solving Systems of Linear Equations Let us consider a system of 𝑚 linear equations and 𝑛 unknowns. We normally try to solve for the variables of the system in the alphabetical or numerical order in which they appear. Let us assume that the variables are labelled 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 . We assume that there is at least one equation with the variable 𝑥1 appearing in it with a non-zero coefficient (by relabeling the variables, if necessary). 56 Chapter 3 Systems of Linear Equations We would like the first equation to tell us about the first variable, 𝑥1 , so that it is of the form 𝑎1 𝑥1 + · · · = 𝑏1 , with 𝑎1 ̸= 0. We will make use of the first equation to remove the variable 𝑥1 from all the other equations after the first equation by performing equation addition elementary operations. We would like the second equation to tell us about the next variable that can be obtained. Call this 𝑥𝑖 . Often this would be the variable 𝑥2 , so that 𝑖 = 2, but this is not always the case. The (new) second equation has the form 𝑎𝑖 𝑥𝑖 + · · · = 𝑏2 , with 𝑎𝑖 ̸= 0. We will make use of the second equation to remove the variable 𝑥𝑖 from all the other equations after the second equation by performing equation addition elementary operations. We repeat this process, moving down through the equations, performing equation swap and equation addition elementary operations as needed. We also move any trivial equations to the end of the system. We assume that the 𝑟𝑡ℎ equation is the last non-trivial equation. (It is possible that 𝑟 is equal to 𝑚.) We also assume that this equation tells us about the 𝑘 𝑡ℎ variable, 𝑥𝑘 . Note that this equation should not involve the variables which were removed from the preceding equations by the process above. (𝑥1 from equation 1, 𝑥𝑖 from equation 2, etc.) Therefore, we have an equation of the form 𝑎𝑘 𝑥𝑘 + · · · = 𝑏𝑟 , with 𝑎𝑘 ̸= 0. We illustrate the process up to this point with the following example. 
Example 3.3.1 Solve the following system of five linear equations in four unknowns: 𝑥1 +2𝑥2 𝑥1 +2𝑥2 +3𝑥3 +𝑥4 −𝑥1 −𝑥2 +𝑥3 +𝑥4 𝑥2 +𝑥3 +𝑥4 −𝑥2 +2𝑥3 = 1 = 0 = −2 . = −1 = 0 Solution: We will apply elementary operations, usually just one or two at a time. At each stage, we will produce an equivalent system. We will stop the process when we produce an equivalent system whose solution set is relatively easily obtained. We notice that 𝑒1 has 𝑥1 in it and we will use it to eliminate the variable 𝑥1 from all the equations below it. (If 𝑒1 did not have any 𝑥1 terms, then we would have switched it with one that did.) We perform the elementary operations 𝑒2 → −𝑒1 + 𝑒2 𝑒3 → 𝑒1 + 𝑒3 Section 3.3 An Approach to Solving Systems of Linear Equations 57 We notice that 𝑒4 and 𝑒5 do not contain 𝑥1 and so they are not modified at this stage. We get 𝑥1 +2𝑥2 = 1 3𝑥3 +𝑥4 = −1 𝑥2 +𝑥3 +𝑥4 = −1 . 𝑥2 +𝑥3 +𝑥4 = −1 −𝑥2 +2𝑥3 = 0 We notice that 𝑒2 does not have an 𝑥2 variable, but 𝑒3 does. Therefore, we perform the elementary operation 𝑒2 ↔ 𝑒3 to get 𝑥1 +2𝑥2 𝑥2 𝑥2 −𝑥2 +𝑥3 +𝑥4 3𝑥3 +𝑥4 +𝑥3 +𝑥4 +2𝑥3 = 1 = −1 = −1 . = −1 = 0 There is an 𝑥2 in 𝑒2 and we can make use of this fact to remove the variable 𝑥2 from the equations below it. Performing the elementary operations 𝑒4 → −𝑒2 + 𝑒4 𝑒5 → 𝑒2 + 𝑒5 yields 𝑥1 +2𝑥2 𝑥2 +𝑥3 +𝑥4 3𝑥3 +𝑥4 0 3𝑥3 +𝑥4 = 1 = −1 = −1 . = 0 = −1 There is an 𝑥3 in 𝑒3 and we make use of this fact to remove 𝑥3 from all the equations below it. In this case, we need only perform 𝑒5 → −𝑒3 + 𝑒5 to obtain 𝑥1 +2𝑥2 𝑥2 +𝑥3 +𝑥4 3𝑥3 +𝑥4 0 0 = 1 = −1 = −1 . = 0 = 0 Notice that the two trivial equations are at the end. At this point in the process, the form of the equivalent system of equations has a particularly simple appearance. We may choose to stop performing elementary operations at this point, and so this would be our final equivalent system. Note that a different sequence of elementary operations could produce a different final 58 Chapter 3 Systems of Linear Equations equivalent system of a similar form. For example, a final equivalent system could involve the equation 6𝑥3 + 2𝑥4 = −2. Proceeding from the original system to the final equivalent system in this form is known as the forward elimination phase. Once we have a final equivalent system in this form, we can obtain the solution set using a process known as back substitution. We use the information given in the 𝑟𝑡ℎ equation for the variable 𝑥𝑘 to then back substitute for the variable 𝑥𝑘 into all the previous equations in our final equivalent system. Then we repeat this process moving back up the system. Our last step is to use the information given in the second equation for the variable 𝑥𝑖 to back substitute for the variable 𝑥𝑖 into the first equation (if it appears there). Let us return to our example. Example 3.3.1 Our final equivalent system was 𝑥1 +2𝑥2 𝑥2 +𝑥3 +𝑥4 3𝑥3 +𝑥4 0 0 = 1 = −1 = −1 . = 0 = 0 We have completed the forward elimination phase and this system is a good choice for a final equivalent system. It will have the same solution set as the original system and we can obtain its solution by back substitution. We notice that 𝑒4 and 𝑒5 are trivial equations and do not give us any information. We also notice that 𝑒3 gives us a relationship between variables 𝑥3 and 𝑥4 . One of these variables can arbitrarily be assigned any real value and the value of the other variable will depend on this arbitrary value. We will choose to assign 𝑥4 any real value and to emphasize this fact we will let 𝑥4 = 𝑡, where 𝑡 ∈ R. Therefore, 𝑒3 tells us that 1 𝑥3 = (−1 − 𝑡). 
3 Then 𝑒2 tells us that 1 2 2𝑡 𝑥2 = −1 − 𝑥3 − 𝑥4 = −1 − (−1 − 𝑡) − 𝑡 = − − . 3 3 3 Then 𝑒1 tells us that (︂ )︂ 2 2𝑡 7 4𝑡 𝑥1 = 1 − 2𝑥2 = 1 − 2 − − = + . 3 3 3 3 Therefore, the solution set is ⎫ ⎧⎡ 7 ⎤ ⎫ ⎧⎡ 7 4 ⎤ ⎡ 4 ⎤ + 𝑡 ⎪ ⎪ ⎪ ⎪ 3 3 3 3 ⎪ ⎪ ⎪ ⎨⎢ 2 2 ⎥ ⎬ ⎪ ⎨⎢ 2 ⎥ ⎬ ⎢−2 ⎥ − − 𝑡 − 3 3 3 3 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 𝑆 = ⎣ 1 1 ⎦ : 𝑡 ∈ R = ⎣ 1 ⎦ + 𝑡⎣ 1 ⎦ : 𝑡 ∈ R . −3 − 3𝑡 −3 ⎪ ⎪ −3 ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎪ ⎩ ⎭ 𝑡 0 1 Section 3.3 An Approach to Solving Systems of Linear Equations 59 An alternative way to proceed, instead of performing back substitution, is to continue to simplify the system. The following extension yields a unique final equivalent system, which is particularly simple. First, using equation scale elementary operations, scale each of the non-trivial equations of the current system. Scale the first equation by the factor 𝑎11 , the second equation by the factor 𝑎1𝑖 and so on, scaling the 𝑟𝑡ℎ equation by the factor 𝑎1𝑘 . Using the 𝑟𝑡ℎ equation and equation addition elementary operations, we eliminate the variable 𝑥𝑘 from all the equations above it. We then examine the (𝑟 − 1)𝑠𝑡 equation, which will be of the form 𝑥𝑗 + · · · = 𝑏𝑟−1 , for some natural number 𝑗 < 𝑘. Using this equation and equation addition elementary operations, we eliminate the variable 𝑥𝑗 from all the equations above it. We repeat this process, moving up through the equations, and performing equation addition elementary operations to eliminate variables. The process ends when we use the second equation to eliminate the variable 𝑥𝑖 from the first equation. These additional steps, performed after the end of the forward elimination phase, are collectively known as the backward elimination phase. When we are performing these steps we say that we are doing backward elimination. For a given system, there are many different possibilities for the equivalent system after doing forward elimination, depending on the particular elementary operations used. However, regardless of the elementary operations used, the same final equivalent system will be obtained after performing both forward and backward elimination. We will formally establish this result in Theorem 3.5.8 (Unique RREF). Once again we return to our example. Example 3.3.1 Our final equivalent system after the forward elimination phase was 𝑥1 +2𝑥2 𝑥2 +𝑥3 +𝑥4 3𝑥3 +𝑥4 0 0 = 1 = −1 = −1 = 0 = 0 and instead of using back substitution, we continue performing elementary operations. We observe that the last two equations are trivial equations and 𝑒3 is in terms of both variables 𝑥3 and 𝑥4 . We perform the elementary operation 1 𝑒3 → 𝑒3 3 to get 𝑥1 +2𝑥2 𝑥2 +𝑥3 +𝑥4 𝑥3 + 31 𝑥4 0 0 = 1 = −1 = − 13 . = 0 = 0 60 Chapter 3 Systems of Linear Equations We use 𝑒3 to eliminate the variable 𝑥3 from the equations above it, by performing 𝑒2 → −𝑒3 + 𝑒2 to get 𝑥1 +2𝑥2 𝑥2 𝑥3 = 1 = − 23 = − 13 . = 0 = 0 + 32 𝑥4 + 31 𝑥4 0 0 We use 𝑒2 to eliminate variable 𝑥2 from the equation above it by performing 𝑒1 → −2𝑒2 + 𝑒1 to get 𝑥1 𝑥2 𝑥3 − 43 𝑥4 + 32 𝑥4 + 31 𝑥4 0 0 7 = 3 = − 23 = − 13 . = 0 = 0 We have completed the backward elimination phase. We have that 𝑥4 = 𝑡, 𝑡 ∈ R. From 𝑒1 we obtain that 7 4 𝑥1 = + 𝑡. 3 3 From 𝑒2 we obtain that 2 2 𝑥2 = − − 𝑡. 3 3 From 𝑒3 we obtain that 1 1 𝑥3 = − − 𝑡. 3 3 Therefore, the solution set is ⎫ ⎧⎡ 7 ⎤ ⎫ ⎧⎡ 7 4 ⎤ ⎡ 4 ⎤ ⎪ ⎪ ⎪ ⎪ 3 + 3𝑡 3 3 ⎪ ⎪ ⎪ ⎪ ⎬ ⎨⎢ 2 ⎥ ⎬ ⎨⎢ 2 2 ⎥ ⎢−2 ⎥ − 𝑡 − − 3 3 3 3 ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 𝑆 = ⎣ 1 1 ⎦ : 𝑡 ∈ R = ⎣ 1 ⎦ + 𝑡⎣ 1 ⎦ : 𝑡 ∈ R . 
−3 − 3𝑡 −3 ⎪ ⎪ ⎪ −3 ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎪ ⎩ ⎭ 𝑡 0 1 𝑡 If we use another parameter, 𝑢 = , then we remove some of the fractions to get 3 ⎧⎡ 7 ⎫ ⎧⎡ 7 ⎤ ⎫ ⎤ ⎡ ⎤ 4 ⎪ ⎪ ⎪ ⎪ 3 + 4𝑢 3 ⎪ ⎪ ⎪ ⎪ ⎨⎢ 2 ⎬ ⎨⎢ 2 ⎥ ⎬ ⎥ ⎢ −2 ⎥ − − 2𝑢 − 3 3 ⎢ ⎥ ⎥ ⎢ ⎥ 𝑆= ⎢ : 𝑢 ∈ R = + 𝑢 : 𝑢 ∈ R . ⎣ −1 − 𝑢 ⎦ ⎣−1 ⎦ ⎣ −1 ⎦ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 3 3 ⎩ ⎭ ⎩ ⎭ 3𝑢 0 3 𝑡−2 and get an even nicer looking solution set 3 ⎧⎡ ⎤ ⎫ ⎡ ⎤ 5 4 ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎬ ⎢ −2 ⎥ −2 ⎥ + 𝑠⎢ ⎥ : 𝑠 ∈ R . 𝑆= ⎢ ⎣ −1 ⎦ ⎣ −1 ⎦ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 2 3 Alternatively, we could let 𝑠 = Section 3.4 Solving Systems of Linear Equations Using Matrices 61 REMARK Remember that, when a solution set contains at least one parameter, you can obtain particular solutions by setting particular values to parameters. In the case of Example 3.3.1, the solution set of a system of linear equations is given by ⎧⎡ ⎤ ⎫ ⎡ ⎤ 5 4 ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎬ ⎢ −2 ⎥ −2 ⎢ ⎥ ⎢ ⎥ 𝑆 = ⎣ ⎦ + 𝑠⎣ ⎦ : 𝑠 ∈ R . −1 −1 ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 2 3 By setting 𝑠 = −1, 𝑠 = 0 and 𝑠 = 1, we find that ⎡ ⎤ ⎡ ⎤ 1 5 ⎢ 0 ⎥ ⎢ −2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ 0 ⎦ , ⎣ −1 ⎦ and −1 2 ⎡ ⎤ 9 ⎢ −4 ⎥ ⎢ ⎥ ⎣ −2 ⎦ 5 are particular solutions to the given system of linear equations. 3.4 Solving Systems of Linear Equations Using Matrices When we are solving a system of equations, we manipulate the system using elementary operations. At each stage, we have an equivalent system of equations. That is, each system has the same solution set. We can stop at any stage with any equivalent system as soon as we find that we can write down the solution set. In the systems of equations that we have solved, notice that we write down the variables many times in the process, but it turns out they are only really important at the beginning and the end of the process. Moving forwards, we will only need to write down the coefficients and the constant terms in the intermediate steps. To do this, we will make use of matrices. Definition 3.4.1 Matrix, Entry An 𝑚 × 𝑛 matrix, 𝐴, is a rectangular array of scalars with 𝑚 rows and 𝑛 columns. The scalar in the 𝑖𝑡ℎ row and 𝑗 𝑡ℎ column is the (𝑖, 𝑗)𝑡ℎ entry and is denoted 𝑎𝑖𝑗 or (𝐴)𝑖𝑗 . That is, ⎡ ⎤ 𝑎11 𝑎12 · · · 𝑎1𝑛 ⎢ 𝑎21 𝑎22 · · · 𝑎2𝑛 ⎥ ⎢ ⎥ 𝐴=⎢ . .. . . .. ⎥ . ⎣ .. . . ⎦ . 𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 Definition 3.4.2 Coefficient Matrix, Augmented Matrix For a given system of linear equations, 𝑎11 𝑥1 𝑎21 𝑥1 +𝑎12 𝑥2 + · · · +𝑎22 𝑥2 + · · · .. . 𝑎𝑚1 𝑥1 +𝑎𝑚2 𝑥2 + · · · +𝑎1𝑛 𝑥𝑛 = +𝑎2𝑛 𝑥𝑛 = 𝑏1 𝑏2 +𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚 , 62 Chapter 3 Systems of Linear Equations the coefficient matrix, 𝐴, of the system is the matrix ⎤ ⎡ 𝑎11 𝑎12 · · · 𝑎1𝑛 ⎢ 𝑎21 𝑎22 · · · 𝑎2𝑛 ⎥ ⎥ ⎢ 𝐴=⎢ . .. . . .. ⎥ . . ⎣ . . . ⎦ . 𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 The entry 𝑎𝑖𝑗 is the coefficient of the variable 𝑥𝑗 in the 𝑖𝑡ℎ #» The augmented matrix, [𝐴| 𝑏 ], of the system is ⎡ 𝑎11 𝑎12 · · · 𝑎1𝑛 ⎢ #» ⎢ 𝑎21 𝑎22 · · · 𝑎2𝑛 [𝐴| 𝑏 ] = ⎢ . .. .. .. ⎣ .. . . . 𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 equation. ⎤ 𝑏1 𝑏2 ⎥ ⎥ .. ⎥ , . ⎦ 𝑏𝑚 #» where 𝑏 is a column whose entries are the constant terms on the right-hand side of the equations. #» The augmented matrix is the coefficient matrix, 𝐴, with the extra column, 𝑏 , added. In solving a system of equations, we can manipulate the rows of the augmented matrix, which will correspond to manipulating the equations. Definition 3.4.3 Elementary Row Operation (ERO) Elementary row operations (EROs) are the operations performed on the coefficient and/or augmented matrix which correspond to the elementary operations performed on the system of equations. 
Elementary operation Row swap Row scale Row addition Equation 𝑒𝑖 ↔ 𝑒𝑗 𝑒𝑖 → 𝑐𝑒𝑖 , 𝑐 ̸= 0 𝑒𝑖 → 𝑐𝑒𝑗 + 𝑒𝑖 , 𝑖 ̸= 𝑗 Row 𝑅𝑖 ↔ 𝑅𝑗 𝑅𝑖 → 𝑐𝑅𝑖 , 𝑐 ̸= 0 𝑅𝑖 → 𝑐𝑅𝑗 + 𝑅𝑖 , 𝑖 ̸= 𝑗 As with the equation operations, these EROs are referred to in some sources as Type I, Type II, and Type III operations, respectively. Definition 3.4.4 In a matrix, we refer to a row that has all zero entries as a zero row. Zero Row Definition 3.4.5 Row Equivalent If a matrix 𝐵 is obtained from a matrix 𝐴 by a finite number of EROs, then we say that 𝐵 is row equivalent to 𝐴. At any point in the process of manipulating the rows of an augmented matrix, we could stop and immediately write down the corresponding system of linear equations. The first column in the matrix corresponds to the variable 𝑥1 , the second column in the matrix corresponds to the variable 𝑥2 and so on up to the second-last column in the matrix which corresponds Section 3.4 Solving Systems of Linear Equations Using Matrices 63 to the variable 𝑥𝑛 . The last column in the matrix corresponds to the constants on the right hand side of the system. The augmented matrix at any stage is row equivalent to the original augmented matrix. Therefore, the system of equations corresponding to this augmented matrix is equivalent to the system of equations corresponding to the original augmented matrix. Example 3.4.6 Solve the following system (the same system as in Example 3.3.1) using augmented matrices and EROs. 𝑥1 +2𝑥2 = 1 𝑥1 +2𝑥2 +3𝑥3 +𝑥4 = 0 −𝑥1 −𝑥2 +𝑥3 +𝑥4 = −2 . 𝑥2 +𝑥3 +𝑥4 = −1 −𝑥2 +2𝑥3 = 0 Solution: The augmented matrix for this system is ⎡ ⎤ 1 2 0 0 1 ⎢1 2 3 1 0⎥ ⎢ ⎥ ⎢−1 −1 1 1 −2⎥ . ⎢ ⎥ ⎣0 1 1 1 −1⎦ 0 −1 2 0 0 We begin by eliminating 𝑥1 from the second and third equations by performing the following EROs 𝑅2 → −𝑅1 + 𝑅2 𝑅3 → 𝑅1 + 𝑅3 to get ⎡ 1 2 ⎢0 0 ⎢ ⎢0 1 ⎢ ⎣0 1 0 −1 0 3 1 1 2 ⎤ 0 1 1 −1⎥ ⎥ 1 −1⎥ ⎥. 1 −1⎦ 0 0 The second row does not contain the variable 𝑥2 , but the third one does and so we swap these two rows, 𝑅2 ↔ 𝑅3 , to get ⎤ ⎡ 1 2 0 0 1 ⎢0 1 1 1 −1⎥ ⎢ ⎥ ⎢0 0 3 1 −1⎥ . ⎥ ⎢ ⎣0 1 1 1 −1⎦ 0 −1 2 0 0 We now make use of the new second row to eliminate 𝑥2 from all rows after the second one, by performing the following EROs 𝑅4 → −𝑅2 + 𝑅4 𝑅5 → 𝑅2 + 𝑅5 to get ⎡ 1 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 2 1 0 0 0 0 1 3 0 3 ⎤ 0 1 1 −1⎥ ⎥ 1 −1⎥ ⎥. 0 0⎦ 1 −1 64 Chapter 3 Systems of Linear Equations We use the third row to eliminate 𝑥3 from the rows after it, by performing the following ERO 𝑅5 → −𝑅3 + 𝑅5 to get ⎡ 1 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 2 1 0 0 0 0 1 3 0 0 ⎤ 0 1 1 −1⎥ ⎥ 1 −1⎥ ⎥. 0 0⎦ 0 0 At this point, we have completed the forward elimination phase using a sequence of EROs equivalent to how we manipulated the equations in Example 3.3.1. We can write down the system of equations that corresponds to the final augmented matrix that we obtained. This system of equations is equivalent to the original system. 𝑥1 +2𝑥2 𝑥2 +𝑥3 +𝑥4 3𝑥3 +𝑥4 0 0 = 1 = −1 = −1 = 0 = 0 This system is the same system we obtained after the forward elimination phase when we were using elementary operations. As before we can use back substitution to obtain the solution set ⎧⎡ 7 4 ⎤ ⎫ ⎧⎡ 7 ⎤ ⎫ ⎡ 4 ⎤ ⎪ ⎪ ⎪ ⎪ 3 + 3𝑡 3 3 ⎪ ⎪ ⎪ ⎪ ⎨⎢ 2 2 ⎥ ⎬ ⎨⎢ 2 ⎥ ⎬ ⎢−2 ⎥ − − 𝑡 − 3 3 ⎥:𝑡∈R 3 ⎥ + 𝑡⎢ 3 ⎥ : 𝑡 ∈ R . ⎢ 𝑆= ⎢ = ⎣−1 ⎦ ⎣−1 − 1𝑡⎦ ⎪ ⎪ ⎪ ⎪⎣ − 13 ⎦ ⎪ ⎪ ⎪ 3 3 3 ⎩ ⎭ ⎪ ⎭ ⎩ 𝑡 0 1 Instead of using back substitution we can continue using EROs to obtain an even simpler system. We scale the third row by performing the following ERO 1 𝑅3 → 𝑅3 3 to get ⎡ ⎤ 1 2 0 0 −1 ⎢0 1 1 1 −1 ⎥ ⎢ ⎥ ⎢0 0 1 1 − 1 ⎥ . 
3 3⎥ ⎢ ⎣0 0 0 0 0 ⎦ 0 0 0 0 0 Next, we use the third row to eliminate 𝑥3 from the equations above it. We do this by performing the following ERO 𝑅2 → −𝑅3 + 𝑅2 yielding ⎡ 1 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 2 1 0 0 0 ⎤ 0 0 −1 0 23 − 23 ⎥ ⎥ 1 13 − 13 ⎥ ⎥. 0 0 0 ⎦ 0 0 0 Section 3.5 Solving Systems of Linear Equations Using Matrices 65 The last step of the backward elimination phase is to eliminate 𝑥2 from the first equation, which is achieved by the ERO 𝑅1 → −2𝑅2 + 𝑅1 yielding ⎡ 1 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 0 1 0 0 0 0 − 34 0 23 1 13 0 0 0 0 7 3 ⎤ − 23 ⎥ ⎥ − 13 ⎥ ⎥. 0 ⎦ 0 This augmented matrix has a very simple structure. We could argue that it has the simplest structure of all the augmented matrices we have seen in this example. In the next section we will introduce terminology to formalize this structure. The system of equations corresponding to the last augmented matrix above is 𝑥1 𝑥2 𝑥3 − 43 𝑥4 + 32 𝑥4 + 31 𝑥4 0 0 7 = 3 = − 23 = − 13 = 0 = 0 This system is the same system we obtained after the forward and backward elimination phases when we were using elementary operations. Therefore, the solution set is ⎧⎡ 7 4 ⎤ ⎫ ⎧⎡ 7 ⎤ ⎫ ⎡ 4 ⎤ ⎪ ⎪ ⎪ ⎪ 3 + 3𝑡 3 3 ⎪ ⎪ ⎪ ⎪ ⎨⎢ 2 2 ⎥ ⎬ ⎨⎢ 2 ⎥ ⎬ 2⎥ ⎢ − − − − 𝑡 3 3 3 3 ⎥ ⎥ ⎢ ⎥ ⎢ 𝑆= ⎢ : 𝑡 ∈ R + 𝑡 : 𝑡 ∈ R = . ⎣−1 ⎦ ⎣−1 − 1𝑡⎦ ⎣−1 ⎦ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 3 3 3 3 ⎩ ⎭ ⎩ ⎭ 𝑡 0 1 REMARK In an augmented matrix, a zero row corresponds to the equation 0 = 0. In the above example, the last two rows of the augmented matrix eventually became zero rows. Although these equations do not give us any information about the solution to the system of equations, we continued to write them down so that we would not forget that our system started with five equations. If at any point we have a zero row in the coefficient matrix and the last entry in the corresponding row in the augmented matrix is 𝑏 ̸= 0, then we can stop and deduce that our system is inconsistent, since one of the equations has become 0 = 𝑏, where 𝑏 ̸= 0. 66 Chapter 3 3.5 Systems of Linear Equations The Gauss–Jordan Algorithm for Solving Systems of Linear Equations Consider a system of 𝑚 linear equations in 𝑛 variables (unknowns). We assume the variables are labelled 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 . The Gauss–Jordan algorithm is the formalization of the strategy we used in the previous section to solve such a system. In this algorithm, we use EROs to manipulate the augmented matrix into a simpler form from which it is easier to determine the solution of the system. The process of performing EROs on the matrix to bring it into these simpler forms is called row reduction or Gaussian elimination after Carl Friedrich Gauss (1777-1855) who outlined the process; a similar method for solving systems of linear equations was known to the Chinese around 250 B.C. In the previous section, we obtained two simpler forms of the augmented matrix. The first was obtained after completing the forward elimination process and is called the row echelon form. The second is obtained after completing both the forward and backward elimination processes and is called the reduced row echelon form. We now give the formal definitions of these forms in terms of leading entries and pivots. Definition 3.5.1 Leading Entry, Leading One Definition 3.5.2 Row Echelon Form The leftmost non-zero entry in any non-zero row of a matrix is called the leading entry of that row. If the leading entry is a 1, then it is called a leading one. We say that a matrix is in row echelon form (REF) whenever both of the following two conditions are satisfied: 1. All zero rows occur as the final rows in the matrix. 2. 
The leading entry in any non-zero row appears in a column to the right of the columns containing the leading entries of any of the rows above it. We say that the matrix 𝑅 is a row echelon form of matrix 𝐴 to mean that 𝑅 is in row echelon form and that 𝑅 can be obtained from 𝐴 by performing a finite number of EROs to 𝐴. Definition 3.5.3 Pivot, Pivot Position, Pivot Column, Pivot Row If a matrix is in REF, then the leading entries are referred to as pivots and their positions in the matrix are called pivot positions. Any column that contains a pivot position is called a pivot column. Any row that contains a pivot position is called a pivot row. Due to the structure of REF, any pivot column will contain exactly one pivot and any pivot row will contain exactly one pivot. Example 3.5.4 The matrix ⎡ 3 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 2 0 0 0 0 −3 2 0 0 0 ⎤ 4 0 ⎥ ⎥ −5 ⎥ ⎥ 0 ⎦ 0 Section 3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations 67 is in REF, with pivots in the (1, 1), (2, 3) and (3, 4) entries. However, the matrix ⎡ ⎤ −3 3 5 7 ⎣ 0 26 0 ⎦ 1 0 1 −2 is not in REF because the leading entry in the third row (marked in red) is not to the right of the leading entry of one (in fact both of) the rows above it. Definition 3.5.5 Reduced Row Echelon Form We say that a matrix is in reduced row echelon form (RREF) whenever all of the following three conditions are satisfied: 1. It is in REF. 2. All its pivots are leading ones. 3. The only non-zero entry in a pivot column is the pivot itself. REMARK Item “3.” from the definition of RREF above implies that all entries above and below a pivot MUST be zero for a matrix in RREF; in other words, any column of a matrix in RREF with two or more non-zero entries is not a pivot column. Example 3.5.6 The matrix ⎡ 3 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 2 0 0 0 0 −3 2 0 0 0 ⎤ 4 0 ⎥ ⎥ −5 ⎥ ⎥ 0 ⎦ 0 from the previous example is in REF, but not in RREF, because its pivots are not leading ones, and moreover, the third and fourth pivot columns contain non-zero entries that are not pivots. The matrix ⎡ 1 ⎢0 ⎢ ⎢0 ⎢ ⎣0 0 2 0 0 0 0 0 1 0 0 0 ⎤ 0 0⎥ ⎥ 1⎥ ⎥ 0⎦ 0 is in RREF. Example 3.5.7 List all possible 2 × 2 matrices that are in RREF. 68 Chapter 3 Systems of Linear Equations Solution: Since a 2 × 2 matrix has two rows and two columns, then the number of pivots is either zero, one or two. If there are zero pivots, then all entries in the matrix must be 0. Therefore, we obtain the matrix [︂ ]︂ 00 . 00 If there is one pivot, then this pivot must occur in the first row and the second row must be a zero row. The pivot could occur in the first column or the second column. If the pivot occurs in the first column, then the (1, 2) entry could be anything. Therefore, we obtain an infinite number of matrices of the form [︂ ]︂ 1𝑎 , 𝑎 ∈ F. 00 If the pivot occurs in the second column, then the (1, 1) entry must be 0. Therefore, we obtain the matrix [︂ ]︂ 01 . 00 If there are two pivots, then they must occur in the positions (1, 1) and (2, 2) and the other entries must be 0. Therefore, we obtain the matrix [︂ ]︂ 10 . 01 If 𝐴 is not a matrix of all zeros, then there are an infinite number of matrices which are REFs of 𝐴. However, the pivot positions of 𝐴 are unique, as well as is its RREF. At this point, we do not have the tools to prove this result. Theorem 3.5.8 (Unique RREF) Let 𝐴 be a matrix with REFs 𝑅1 and 𝑅2 . Then 𝑅1 and 𝑅2 will have the same set of pivot positions. Moreover, there is a unique matrix 𝑅 such that 𝑅 is the RREF of 𝐴. 
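Because the conditions in Definitions 3.5.2 and 3.5.5 are purely mechanical, they are easy to test in code. The following Python sketch is illustrative only (the function names are ours); it checks the REF conditions and then the additional RREF conditions for a matrix stored as a list of rows, and it is run on the 5 × 4 matrix from Example 3.5.6 that is in RREF.

```python
# A sketch (illustrative only) testing the REF and RREF conditions of
# Definitions 3.5.2 and 3.5.5 for a matrix given as a list of rows.

def leading_index(row):
    """Column index of the leading (leftmost non-zero) entry, or None for a zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_ref(A):
    last_lead = -1
    seen_zero_row = False
    for row in A:
        lead = leading_index(row)
        if lead is None:              # zero row
            seen_zero_row = True
        else:
            if seen_zero_row:         # a non-zero row below a zero row
                return False
            if lead <= last_lead:     # leading entry not strictly to the right
                return False
            last_lead = lead
    return True

def is_rref(A):
    if not is_ref(A):
        return False
    for i, row in enumerate(A):
        lead = leading_index(row)
        if lead is None:
            continue
        if row[lead] != 1:            # every pivot must be a leading one
            return False
        # the pivot must be the only non-zero entry in its column
        if any(A[k][lead] != 0 for k in range(len(A)) if k != i):
            return False
    return True

# The 5 x 4 matrix in RREF from Example 3.5.6:
R = [[1, 2, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
print(is_rref(R))   # True
```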
Because the RREF of a given matrix 𝐴 is unique, we can introduce notation to refer to it. Definition 3.5.9 RREF(𝐴) We say that the matrix 𝑅 is the reduced row echelon form of matrix 𝐴, and we write 𝑅 = RREF(𝐴), if 𝑅 is in reduced row echelon form and if 𝑅 can be obtained from 𝐴 by performing a finite number of EROs to 𝐴. Having laid down the needed definitions, we now outline an algorithm that takes as input a given matrix 𝐴 and produces a row equivalent matrix 𝑅 in REF. This algorithm corresponds to the forward elimination process for a consistent system. ALGORITHM (Obtaining an REF) Section 3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations 69 1. Consider the leftmost non-zero column of the matrix. Use EROs to obtain a leading entry in the top position of this column. This entry is now a pivot and this row is now a pivot row. 2. Use EROs to change all other entries below the pivot in this pivot column to 0. 3. Consider the submatrix formed by covering up the current pivot row and all previous pivot rows. If there are no more rows or if the only remaining rows are zero rows, we are finished. Otherwise, repeat steps 1 and 2 on the submatrix. Continue in this manner, covering up the current pivot row to obtain a matrix with one less row until no rows remain or we obtain a submatrix with only zero rows. We can also outline a similar algorithm that takes as input a matrix 𝑅 in REF and produces the RREF of 𝑅. This will correspond to the backward elimination process. ALGORITHM (Obtaining the RREF from an REF) Start with a matrix in REF. 1. Select the rightmost pivot column. If the pivot is not already 1, use EROs to change it to 1. 2. Use EROs to change all entries above the pivot in this pivot column to 0. 3. Consider the submatrix formed by covering up the current pivot row and all other rows below it. If there are no more rows, then we are finished. Otherwise, repeat steps 1 and 2 on the submatrix until no rows remain. The preceding two algorithms can be run in sequence to take a matrix 𝐴 and produce RREF(𝐴). This process is known as the Gauss–Jordan1 algorithm. REMARKS • If [︀ the system]︀is inconsistent, then at some point we will obtain a row of the form 0 · · · 0 𝑏 , where 𝑏 ̸= 0. This row corresponds to the equation 0 = 𝑏, where 𝑏 ̸= 0. As soon as we obtain such a row, we can stop the algorithm and conclude that the system is inconsistent. • We will not always blindly follow these algorithms when finding an REF or the RREF. There may be other choices along the way that lead to fewer calculations, or we may want to avoid fractions for as long as possible. When determining an REF we often try to obtain leading 1s, even though they are not required, because they are easier to work with. 1 Named after the aforementioned Gauss and Wilhelm Jordan (1842-1899), a German engineer who popularized this method for solving systems of equations. 70 Chapter 3 Systems of Linear Equations We will apply the Gauss–Jordan algorithm to the augmented matrix obtained from a system of linear equations. This will allow us to easily obtain the solution set to the system. The following terminology will be helpful in describing these solution sets. Definition 3.5.10 Basic Variable, Free Variable Consider a system of linear equations. Let 𝑅 be an REF of the coefficient matrix of this system. If the 𝑖𝑡ℎ column of this matrix contains a pivot, then we call 𝑥𝑖 a basic variable. Otherwise, we call 𝑥𝑖 a free variable. 
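The two algorithms above translate naturally into code. The following Python sketch is illustrative only: it assumes exact rational arithmetic via Python's fractions module, makes no attempt at the numerical considerations a practical implementation would need, and is not the unique way to organize the steps. It applies the three EROs column by column to produce the RREF of an augmented matrix; the pivot columns it finds are exactly the columns of the basic variables in Definition 3.5.10. It is run on the augmented matrix of Example 3.3.1.

```python
# A sketch of the Gauss-Jordan algorithm (forward and backward elimination
# combined), using exact rational arithmetic. Illustrative only.
from fractions import Fraction

def rref(A):
    """Return the RREF of A (a list of rows), using only the three EROs."""
    M = [[Fraction(x) for x in row] for row in A]   # work on an exact copy
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for j in range(cols):                           # leftmost columns first
        # find a row at or below pivot_row with a non-zero entry in column j
        k = next((i for i in range(pivot_row, rows) if M[i][j] != 0), None)
        if k is None:
            continue                                # no pivot in this column
        M[pivot_row], M[k] = M[k], M[pivot_row]     # row swap ERO
        pivot = M[pivot_row][j]
        M[pivot_row] = [x / pivot for x in M[pivot_row]]   # row scale ERO: leading one
        for i in range(rows):                       # row addition EROs: clear the column
            if i != pivot_row and M[i][j] != 0:
                c = M[i][j]
                M[i] = [a - c * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

# The augmented matrix of Example 3.3.1 (also used in Example 3.4.6):
aug = [[ 1,  2, 0, 0,  1],
       [ 1,  2, 3, 1,  0],
       [-1, -1, 1, 1, -2],
       [ 0,  1, 1, 1, -1],
       [ 0, -1, 2, 0,  0]]
for row in rref(aug):
    print([str(x) for x in row])
# Expected output matches the RREF obtained by hand in the text:
# rows [1, 0, 0, -4/3 | 7/3], [0, 1, 0, 2/3 | -2/3], [0, 0, 1, 1/3 | -1/3],
# followed by two zero rows.
```

With the RREF in hand, we return to the question of which variables can be solved for directly and which must be assigned parameters.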
This terminology reflects the fact that we will be able to assign parameters to the free variables, and then we will be able to express the basic variables in terms of these parameters. See the examples below. Example 3.5.11 Solve the following system. −𝑥1 2𝑥1 𝑥1 2𝑥1 −2𝑥2 +2𝑥3 +4𝑥2 −2𝑥3 +2𝑥2 −𝑥3 +4𝑥2 −6𝑥3 +4𝑥4 −2𝑥4 −2𝑥4 −8𝑥4 = 3 = 4 = 0 = −4 Solution: We begin by giving the augmented matrix for the system. ⎡ ⎤ −1 −2 2 4 3 ⎢2 4 −2 −2 4 ⎥ ⎢ ⎥ ⎣1 2 −1 −2 0 ⎦ 2 4 −6 −8 −4 The leftmost non-zero column is the first column. Therefore, our first goal is to get a leading entry (which will become a pivot) in the top position of this column. It will make our calculations easier if this is a leading one, so we will perform the ERO 𝑅1 ←→ 𝑅3 to get ⎡ ⎤ 1 2 −1 −2 0 ⎢2 4 −2 −2 4 ⎥ ⎢ ⎥. ⎣−1 −2 2 4 3⎦ 2 4 −6 −8 −4 Next, we want to change all entries below the pivot in this column to 0. So we perform the following EROs 𝑅2 → −2𝑅1 + 𝑅2 𝑅3 → 𝑅1 + 𝑅3 𝑅4 → −2𝑅1 + 𝑅4 to get ⎡ 1 ⎢0 ⎢ ⎣0 0 ⎤ 2 −1 −2 0 0 0 2 4⎥ ⎥. 0 1 2 3⎦ 0 −4 −4 −4 Covering up the first row, the leftmost non-zero column is the third column. We need to get a leading entry (which will become a pivot) in the top position of this column (with the Section 3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations first row covered) and so we need to get a pivot in do this is to perform the ERO 𝑅2 ←→ 𝑅3 to get ⎡ 1 2 −1 −2 ⎢0 0 1 2 ⎢ ⎣0 0 0 2 0 0 −4 −4 71 the (2,3) position. The simplest way to ⎤ 0 3⎥ ⎥. 4⎦ −4 Next, we want to change all entries below the pivot in this column to 0. So we perform the following ERO 𝑅4 → 4𝑅2 + 𝑅4 to get ⎡ 1 ⎢0 ⎢ ⎣0 0 ⎤ 2 −1 −2 0 0 1 2 3⎥ ⎥. 0 0 2 4⎦ 0 0 4 8 Covering up the first two rows, the leftmost non-zero column is the fourth column. We need to get a leading entry (which will become a pivot) in the top position of this column (with the first two rows covered) and so we need to get a pivot in the (3,4) position. There already is a leading entry there and so no EROs are needed. Next, we want to change all entries below the pivot in this column to 0. So we perform the following ERO 𝑅4 → −2𝑅3 + 𝑅4 to get ⎡ 1 ⎢0 ⎢ ⎣0 0 ⎤ 2 −1 −2 0 0 1 2 3⎥ ⎥. 0 0 2 4⎦ 0 0 0 0 Covering up the first three rows, all remaining rows are zero rows and so we now have an augmented matrix in REF. We write down the system of equations that corresponds to this augmented matrix and solve this system using back substitution. The corresponding system of equations is 𝑥1 +2𝑥2 −𝑥3 −2𝑥4 𝑥3 +2𝑥4 2𝑥4 0 = = = = 0 3 . 4 0 The third equation tells us that 𝑥4 = 2. Substituting this into the second equation, we obtain that 𝑥3 = −1. Substituting both of these values into the first equation, we obtain that 𝑥1 + 2𝑥2 = 3. Using our convention, we set 𝑥1 to be a basic variable and 𝑥2 to be a free variable. Therefore, 𝑥2 = 𝑡, 𝑡 ∈ R and 𝑥1 = 3 − 2𝑡. ⎧⎡ ⎫ ⎧⎡ ⎫ ⎤ ⎤ ⎡ ⎤ 3 − 2𝑡 3 −2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎬ ⎨⎢ ⎬ ⎥ ⎥ ⎢ 1 ⎥ 𝑡 0 ⎥:𝑡∈R = ⎢ ⎥ + 𝑡⎢ ⎥ : 𝑡 ∈ R . Therefore, the solution set is 𝑆 = ⎢ ⎣ −1 ⎦ ⎣ 0 ⎦ ⎪ ⎪ ⎪⎣ −1 ⎦ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎪ ⎩ ⎭ 2 2 0 72 Chapter 3 Systems of Linear Equations Instead of stopping at REF and using back substitution, we could continue to do row reduction until we reached an augmented matrix in RREF (the Gauss-Jordan algorithm). Beginning with ⎡ 1 ⎢0 ⎢ ⎣0 0 ⎤ 2 −1 −2 0 0 1 2 3⎥ ⎥ 0 0 2 4⎦ 0 0 0 0 we identify the rightmost pivot column, which to 1 by performing the ERO 𝑅3 → 12 𝑅3 to get ⎡ 1 2 −1 ⎢0 0 1 ⎢ ⎣0 0 0 0 0 0 is the fourth column. We change the pivot −2 2 1 0 ⎤ 0 3⎥ ⎥. 2⎦ 0 Next, we want to change all entries above this pivot to 0. 
So we perform the EROs 𝑅1 → 2𝑅3 + 𝑅1 𝑅2 → −2𝑅3 + 𝑅2 to get ⎡ 1 ⎢0 ⎢ ⎣0 0 ⎤ 2 −1 0 4 0 1 0 −1⎥ ⎥. 0 0 1 2⎦ 0 0 0 0 Covering up the bottom two rows, the rightmost pivot column is the third column. The pivot in this column is already 1 so no ERO is necessary. We want to change all entries above this pivot to 0 and so we perform the ERO 𝑅1 → 𝑅2 + 𝑅1 to get ⎡ 1 ⎢0 ⎢ ⎣0 0 2 0 0 0 0 1 0 0 ⎤ 0 3 0 −1⎥ ⎥. 1 2⎦ 0 0 Covering up the bottom three rows, the rightmost pivot column is the first column. The pivot in this column is already 1 so no ERO is necessary. There are now no more rows to adjust and so our augmented matrix is in RREF. The corresponding system of equations is 𝑥1 +2𝑥2 𝑥3 = 3 = −1 𝑥4 = 2 0 = 0 The second and third equations give us values for 𝑥3 and 𝑥4 , respectively. We set 𝑥2 to be a free variable and thus 𝑥2 = 𝑡, 𝑡 ∈ R, which implies that 𝑥1 = 3 − 2𝑡. Therefore, the solution Section 3.5 The Gauss–Jordan Algorithm for Solving Systems of Linear Equations 73 ⎧⎡ ⎫ ⎧⎡ ⎤ ⎫ ⎤ ⎡ ⎤ 3 − 2𝑡 3 −2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎬ ⎪ ⎨⎢ ⎥ ⎬ ⎥ ⎢ 1 ⎥ 𝑡 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ set is 𝑆 = ⎣ : 𝑡 ∈ R = ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑡 ∈ R which is the same solution −1 ⎦ 0 ⎪ ⎪ ⎪ −1 ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎪ ⎩ ⎭ 2 2 0 set we obtained earlier. Example 3.5.12 Solve the following system. 2𝑥1 +4𝑥2 −2𝑥3 = 2 3𝑥1 +7𝑥2 +𝑥3 = 0 2𝑥1 +5𝑥2 +2𝑥3 = 1 Solution: We begin by giving the augmented matrix for the system. ⎡ ⎤ 2 4 −2 2 ⎣3 7 1 0⎦ 2 5 2 1 Our first goal is to get a leading entry (which will become a pivot) in the top position of the first column, since is it the leftmost non-zero column. It will actually make our calculations easier if this is a leading one and so we will perform the ERO 𝑅1 → 12 𝑅1 to obtain ⎡ ⎤ 1 2 −1 1 ⎣3 7 1 0⎦ . 2 5 2 1 Next, we want to change all entries below the pivot in this column to 0. So we perform the following row addition EROs 𝑅2 → −3𝑅1 + 𝑅2 𝑅3 → −2𝑅1 + 𝑅3 to obtain ⎤ ⎡ 1 2 −1 1 ⎣0 1 4 −3⎦ . 0 1 4 −1 Covering up the first row, the leftmost non-zero column is the second column. We need to obtain a leading entry (which will become a pivot) in the top position of this column (with the first row covered) and so we need to get a pivot in the (2,2) position. There already is a pivot there and so no ERO is necessary. Next, we want to change all entries below this pivot to 0 and so we perform the ERO 𝑅3 → −𝑅2 + 𝑅3 to obtain ⎡ ⎤ 1 2 −1 1 ⎣0 1 4 −3⎦ . 0 0 0 2 [︀ ]︀ The third row is of the form 0 0 0 𝑏 , where 𝑏 ̸= 0 and so we can stop the algorithm here and declare that this system is inconsistent. (The third row corresponds to the equation 0 = 2.) 74 Chapter 3 Systems of Linear Equations We will include one more example in which we give far less commentary. We will simply outline the EROs that we are using. We will make use of the Gauss–Jordan algorithm (forward and backward elimination). Example 3.5.13 Solve the following system. 3𝑥1 +5𝑥2 +3𝑥3 = 9 −2𝑥1 −𝑥2 +6𝑥3 = 10 4𝑥1 +10𝑥2 −3𝑥3 = −2 Solution: The augmented matrix for the system is ⎡ ⎤ 3 5 3 9 ⎣−2 −1 6 10 ⎦ . 4 10 −3 −2 We determine an REF. ⎤ 1 4 9 19 𝑅1 → 𝑅2 + 𝑅1 gives ⎣−2 −1 6 10 ⎦ . 4 10 −3 −2 ⎤ ⎡ {︃ 19 1 4 9 𝑅2 → 2𝑅1 + 𝑅2 24 48 ⎦ . gives ⎣0 7 𝑅3 → −4𝑅1 + 𝑅3 0 −6 −39 −78 ⎡ ⎤ 1 4 9 19 𝑅2 → 𝑅3 + 𝑅2 gives ⎣0 1 −15 −30⎦ . 0 −6 −39 −78 ⎡ ⎤ 1 4 9 19 𝑅3 → 6𝑅2 + 𝑅3 gives ⎣0 1 −15 −30 ⎦ . 0 0 −129 −258 ⎡ This augmented matrix is in REF. At this point we could use back substitution, but we will continue on to RREF. ⎡ ⎤ 1 4 9 19 1 𝑅3 → − 129 𝑅3 gives ⎣0 1 −15 −30⎦ . 0 0 1 2 ⎡ ⎤ {︃ 1 4 0 1 𝑅1 → −9𝑅3 + 𝑅1 gives ⎣0 1 0 0⎦ . 𝑅2 → 15𝑅3 + 𝑅2 0 0 1 2 ⎡ ⎤ 1 0 0 1 𝑅1 → −4𝑅2 + 𝑅1 gives ⎣0 1 0 0⎦ . 
0 0 1 2 Given this augmented matrix we ⎧ can ⎡ see ⎤⎫that the solution is 𝑥1 = 1, 𝑥2 = 0 and 𝑥3 = 2. ⎨ 1 ⎬ Therefore, the solution set is 𝑆 = ⎣ 0 ⎦ . ⎩ ⎭ 2 Section 3.6 Rank and Nullity 3.6 75 Rank and Nullity Definition 3.6.1 We use 𝑀𝑚×𝑛 (R) to denote to the set of all 𝑚 × 𝑛 matrices with real entries. 𝑀𝑚×𝑛 (R), 𝑀𝑚×𝑛 (C), 𝑀𝑚×𝑛 (F) We use 𝑀𝑚×𝑛 (C) to denote to the set of all 𝑚 × 𝑛 matrices with complex entries. If we do not need to distinguish between real and complex entries we will use 𝑀𝑚×𝑛 (F). REMARK If 𝑚 = 𝑛 it is common to abbreviate 𝑀𝑚×𝑛 (F) as 𝑀𝑛 (F). Let us consider a system of 𝑚 linear equations in 𝑛 variables. The matrix [︁ coefficient #» ]︁ of this system, 𝐴, belongs to 𝑀𝑚×𝑛 (F) and the augmented matrix, 𝐴 | 𝑏 , belongs to 𝑀𝑚×(𝑛+1) (F). In other words, the augmented matrix has exactly one more column than the coefficient matrix. There are three important questions that we would like to consider about this system. 1. Is the system consistent or inconsistent? 2. If the system is consistent, does it have a unique solution? 3. If the system is consistent and does not have a unique solution, how many parameters are there in the solution set? We will be able to[︁ answer these questions once we have an REF or the RREF of the #» ]︁ augmented matrix 𝐴 | 𝑏 even before we have found the solution set. Definition 3.6.2 Rank Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) such that RREF(𝐴) has exactly 𝑟 pivots. Then we say that the rank of 𝐴 is 𝑟, and we write rank(𝐴) = 𝑟. Although there are infinitely many REFs of a matrix 𝐴, Theorem 3.5.8 (Unique RREF) tells us that they will all have the same pivot positions, hence the same number of pivots. Since RREF(𝐴) is an REF of 𝐴, all REFs of 𝐴 will have the same number of pivots as RREF(𝐴). Therefore, we can use any REF of 𝐴 to determine its rank. Proposition 3.6.3 (Rank Bounds) If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then rank(𝐴) ≤ min{𝑚, 𝑛}. Proof: Since there is at most one pivot in each row, then rank(𝐴) ≤ 𝑚. Also, since there is at most one pivot in each column, then rank(𝐴) ≤ 𝑛. Comparing the rank of the coefficient matrix of a system of linear equations to the rank of its augmented matrix will tell us whether the system is consistent or inconsistent. 76 Proposition 3.6.4 Chapter 3 Systems of Linear Equations (Consistent System Test) [︁ #» ]︁ Let 𝐴 be the coefficient matrix of a system of linear equations and let 𝐴 | 𝑏 be the augmented of the system. The system is consistent if and only if (︁ [︁ matrix #» ]︁ )︁ rank(𝐴) = rank 𝐴| 𝑏 . [︁ #» ]︁ Proof: Suppose that we perform a sequence of EROs on 𝐴 | 𝑏 to manipulate it into its [︁ ]︁ RREF, which we will call 𝑅 | #» 𝑐 . If we perform the same sequence of EROs on 𝐴, we will [︁ ]︁ obtain the matrix 𝑅, which is the RREF of 𝐴. The only difference between 𝑅 and 𝑅 | #» 𝑐 [︁ ]︁ (︁ [︁ #» ]︁ )︁ is that 𝑅 | #» 𝑐 has an additional column. Therefore, rank(𝐴) ≤ rank 𝐴| 𝑏 . [︁ ]︁ Assume that the system is inconsistent. Then 𝑅 | #» 𝑐 contains a row of the form [0 · · · 0|1] [︁ ]︁ (︁ [︁ #» ]︁ )︁ and so 𝑅 | #» 𝐴| 𝑏 . 𝑐 has a pivot in its final column. Therefore, rank(𝐴) < rank [︁ ]︁ 𝑐 does not contain a row of the form Assume that the system is consistent. Then 𝑅 | #» (︁ [︁ #» ]︁ )︁ [0 · · · 0|1] and so 𝑅 and [𝑅| #» 𝑐 ] have the same pivots. Therefore, rank(𝐴) = rank 𝐴| 𝑏 . In summary, the system is consistent if and only if rank(𝐴) = rank Example 3.6.5 Suppose we have a system of linear equations matrix to obtain the following REF. ⎡ 2 −5 5 ⎢0 0 4 ⎢ ⎣0 0 0 0 0 0 (︁ [︁ #» ]︁ )︁ 𝐴| 𝑏 . 
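Since any REF of 𝐴 has the same pivot positions as RREF(𝐴), the rank of 𝐴 can be read off from any REF simply by counting its non-zero rows, one pivot per such row. The following Python sketch is illustrative only; it performs this count for the 5 × 4 matrix in REF from Example 3.5.4 and confirms the inequality of Proposition 3.6.3 (Rank Bounds). The same count, applied to both the coefficient matrix and the augmented matrix, is what the next proposition compares.

```python
# A sketch (illustrative only): the rank of a matrix can be read off from
# any of its REFs by counting the non-zero rows (each contains one pivot).
def rank_from_ref(R):
    return sum(1 for row in R if any(entry != 0 for entry in row))

# The 5 x 4 matrix in REF from Example 3.5.4:
R = [[3, 2, -3,  4],
     [0, 0,  2,  0],
     [0, 0,  0, -5],
     [0, 0,  0,  0],
     [0, 0,  0,  0]]
print(rank_from_ref(R))                               # 3
print(rank_from_ref(R) <= min(len(R), len(R[0])))     # True, as Rank Bounds guarantees
```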
and we perform EROs on the augmented −7 −9 5 0 ⎤ 4 3⎥ ⎥ 2⎦ 1 Determine whether the system is consistent or inconsistent. Solution: The rank of the coefficient matrix is 3 and the rank of the augmented matrix is 4. Therefore, the system is inconsistent. The fact that the system is inconsistent is also evident from the fact that the final row of the augmented matrix corresponds to the equation 0 = 1. Example 3.6.6 Suppose we have a system of linear equations and matrix to obtain the following REF. ⎡ 2 −5 5 −7 ⎢0 0 4 −9 ⎢ ⎣0 0 0 5 0 0 0 0 we perform EROs on the augmented ⎤ 4 3⎥ ⎥ 2⎦ 0 Determine whether the system is consistent or inconsistent. Solution: The ranks of the coefficient matrix and the augmented matrix are both 3. Therefore, the system is consistent. Section 3.6 Rank and Nullity 77 [︁ #» ]︁ Comparing the ranks of 𝐴 and 𝐴 | 𝑏 gives us a quick way of determining whether or not a system of linear equations is consistent. If the system is consistent, then we usually want to know how many parameters, if any, appear in the solution set. We can also answer this question using rank(𝐴), as our next result shows. Theorem 3.6.7 (System Rank Theorem) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) with rank(𝐴) = 𝑟. #» #» (a) Let 𝑏 ∈ F𝑚 . If the system of linear equations with augmented matrix [𝐴 | 𝑏 ] is consistent, then the solution set to this system will contain 𝑛 − 𝑟 parameters. [︁ #» ]︁ #» (b) The system with augmented matrix 𝐴 | 𝑏 is consistent for every 𝑏 ∈ F𝑚 if and only if 𝑟 = 𝑚. Proof: (a) Since 𝐴 has 𝑛 columns, the system of equations has 𝑛 variables. Since rank(𝐴) = 𝑟, RREF(𝐴) will have 𝑟 pivot columns. Therefore, the system will have 𝑟 basic variables and 𝑛−𝑟 free variables. In determining the solution set, we assign each free variable a parameter. Therefore, the solution set will contain 𝑛 − 𝑟 parameters. (b) [︁We prove the contrapositive which says that the system with augmented matrix #» ]︁ #» 𝐴 | 𝑏 is inconsistent for some 𝑏 ∈ F𝑚 if and only if 𝑟 ̸= 𝑚. We begin the forward direction and assume that the system with augmented [︁ with #» ]︁ #» matrix 𝐴 | 𝑏 is inconsistent for some 𝑏 ∈ F𝑚 . Then, as in the proof of Proposition (︁ [︁ #» ]︁ )︁ 3.6. 4 (Consistent System Test), RREF 𝐴| 𝑏 will include a row of the form [0 · · · 0|1]. Therefore, RREF(𝐴) will include a row of all zeros. Since RREF(𝐴) has 𝑚 rows, it follows that rank(𝐴) < 𝑚. For the backward direction we assume that 𝑟 ̸= 𝑚. By Proposition 3.6. 3 (Rank Bounds), we get that 𝑟 < 𝑚. If we let 𝑅 = RREF(𝐴), [︁ ]︁ then 𝑅 must include a #» row of all zeros. Consider the augmented matrix 𝑅 | 𝑒 𝑚 , where #» 𝑒 𝑚 is the vector from the standard basis of F𝑚 whose 𝑚𝑡ℎ component is 1 with all other components 0. It corresponds [︁ ]︁to an inconsistent system. Since all EROs are reversible, we can #» apply to 𝑅 | 𝑒 𝑚 the reverse of the EROs needed to row reduce 𝐴 to 𝑅. Applying [︁ #» ]︁ these reverse EROs will result in an augmented matrix of the form 𝐴 | 𝑏 , for some [︁ ]︁ [︁ #» #» ]︁ 𝑏 ∈ F𝑚 . The matrix is row equivalent to 𝑅 | #» 𝑒 𝑚 . Therefore, 𝐴 | 𝑏 is the augmented matrix of an inconsistent system. Definition 3.6.8 Nullity Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) with rank(𝐴) = 𝑟. We define the nullity of 𝐴, written nullity(𝐴), to be the integer 𝑛 − 𝑟. 78 Chapter 3 Systems of Linear Equations [︁ #» ]︁ Thus Theorem 3.6. 7 (System Rank Theorem) shows that, if 𝐴 | 𝑏 is the augmented matrix of a consistent system, the number of parameters in the solution set of this system is given by nullity(𝐴). 
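The Consistent System Test is equally easy to apply mechanically once an REF of the augmented matrix is known. The following Python sketch is illustrative only (it repeats the pivot-counting helper from the previous sketch so that it is self-contained); it compares rank(𝐴) with the rank of the augmented matrix for the REFs given in Examples 3.6.5 and 3.6.6.

```python
# A sketch (illustrative only) of the Consistent System Test applied to an
# REF of an augmented matrix: compare rank(A) with rank([A|b]).
def rank_from_ref(R):
    return sum(1 for row in R if any(entry != 0 for entry in row))

def is_consistent(aug_ref):
    coeff_ref = [row[:-1] for row in aug_ref]     # drop the constants column
    return rank_from_ref(coeff_ref) == rank_from_ref(aug_ref)

# REFs of the augmented matrices from Examples 3.6.5 and 3.6.6:
ex_365 = [[2, -5, 5, -7, 4],
          [0,  0, 4, -9, 3],
          [0,  0, 0,  5, 2],
          [0,  0, 0,  0, 1]]
ex_366 = [[2, -5, 5, -7, 4],
          [0,  0, 4, -9, 3],
          [0,  0, 0,  5, 2],
          [0,  0, 0,  0, 0]]
print(is_consistent(ex_365))   # False: ranks are 3 and 4, so the system is inconsistent
print(is_consistent(ex_366))   # True:  both ranks equal 3, so the system is consistent
```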
Example 3.6.9 Returning to Example 3.6.6, we have an augmented matrix with the following REF: ⎡ ⎤ 2 −5 5 −7 4 ⎢0 0 4 −9 3⎥ ⎢ ⎥ ⎣0 0 0 5 2⎦ . 0 0 0 0 0 How many parameters will there be in the solution set of the corresponding system? Solution: We already determined that this system was consistent. The system has four variables and three pivots. Using Theorem 3.6.7 (System Rank Theorem), the number of parameters in the solution set to this system is 4 − 3 = 1 (i.e. the nullity is 1). The variable 𝑥2 is a free variable because the second column is not a pivot column. All other variables are basic. 3.7 Homogeneous and Non-Homogeneous Systems, Nullspace We examine a series of examples that will help us state general facts about solving systems of linear equations. Example 3.7.1 Solve the following system of linear equations: 𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = 1 2𝑥1 −4𝑥2 +𝑥3 = 5 . 𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 4 Solution: [︁ #» ]︁ The augmented matrix 𝐴 | 𝑏 for the system is ⎡ ⎤ 1 −2 −1 3 1 ⎣2 −4 1 0 5⎦ . 1 −2 2 −3 4 We perform the following EROs. {︃ 𝑅2 → −2𝑅1 + 𝑅2 ⎡ 1 ⎣ gives 0 𝑅3 → −𝑅1 + 𝑅3 0 ⎡ 1 −2 𝑅2 → 13 𝑅2 gives ⎣0 0 0 0 ⎡ 1 𝑅3 → −3𝑅2 + 3𝑅3 gives ⎣0 0 ⎤ −2 −1 3 1 0 3 −6 3⎦ . 0 3 −6 3 ⎤ −1 3 1 1 −2 1⎦ . 3 −6 3 ⎤ −2 −1 3 1 0 1 −2 1⎦ . 0 0 0 0 Section 3.7 Homogeneous and Non-Homogeneous Systems, Nullspace This matrix is in REF and since rank(𝐴) = rank this system is consistent. 79 (︁[︁ #» ]︁)︁ 𝐴| 𝑏 = 2, then we conclude that We perform the following ERO. ⎤ ⎡ 1 −2 0 1 2 𝑅1 → 𝑅1 + 𝑅2 gives ⎣0 0 1 −2 1⎦ . 0 0 0 0 0 This matrix is in RREF. Since there are four variables and rank(𝐴) = 2, the number of parameters is 4 − 2 = 2. We let the free variables 𝑥2 and 𝑥4 be 𝑠 and 𝑡 respectively, where 𝑠, 𝑡 ∈ R. Solving for the basic variables, we get 𝑥1 = 2+2𝑠−𝑡 and 𝑥3 = 1+2𝑡, for 𝑠, 𝑡 ∈ R. Therefore, the solution set is ⎫ ⎧⎡ ⎤ ⎧⎡ ⎫ ⎡ ⎤ ⎡ ⎤ ⎤ −1 2 2 2 + 2𝑠 − 𝑡 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎬ ⎬ ⎨⎢ ⎥ ⎨⎢ ⎢ 0 ⎥ ⎢1⎥ ⎥ 0 𝑠 ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ ⎢ : 𝑠, 𝑡 ∈ R . + 𝑡 + 𝑠 : 𝑠, 𝑡 ∈ R = 𝑆= ⎣ ⎣ 2 ⎦ ⎣0⎦ 1 + 2𝑡 ⎦ ⎪ ⎪ ⎪⎣ 1 ⎦ ⎪ ⎪ ⎪ ⎪ ⎭ ⎭ ⎪ ⎩ ⎩ 1 0 0 𝑡 Example 3.7.2 Solve the following system of linear equations: 𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = 1 2𝑥1 −4𝑥2 +𝑥3 = 2 . 𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 3 Solution: The augmented matrix is: ⎡ ⎤ 1 −2 −1 3 1 ⎣2 −4 1 0 2⎦ . 1 −2 2 −3 3 The coefficient matrix for this system is exactly the same as it was for the previous system. So we perform the exact same sequence of EROs to obtain the following augmented matrix. (Only the constant terms will be different.) ⎡ ⎤ 1 −2 −1 3 1 ⎣0 0 1 −2 0⎦ . 0 0 0 0 2 (︁[︁ #» ]︁)︁ This matrix is in REF and since rank(𝐴) = 2 is not equal to rank 𝐴 | 𝑏 = 3, we conclude that this system is inconsistent. (We could also notice that the third row corresponds to the equation 0 = 2.) The solution set is 𝑇 = ∅. 80 Example 3.7.3 Chapter 3 Systems of Linear Equations Solve the following system of linear equations: 𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = −1 2𝑥1 −4𝑥2 +𝑥3 = 4 . 𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 5 Solution: Again we observe that we have the same coefficient matrix and thus, we will perform the same sequence of EROs as in the two previous examples to obtain the following augmented matrix. ⎡ ⎤ 1 −2 −1 3 −1 ⎣0 0 1 −2 2 ⎦ . 0 0 0 0 0 This matrix is in REF, and since rank(𝐴) = rank system is consistent. This time the RREF will be (︁[︁ #» ]︁)︁ 𝐴| 𝑏 = 2, we conclude that this ⎤ ⎡ 1 −2 0 1 1 ⎣0 0 1 −2 2⎦ . 0 0 0 0 0 Since there are four variables and rank(𝐴) = 2, then the number of parameters is 4 − 2 = 2. We let the free variables 𝑥2 and 𝑥4 be 𝑠 and 𝑡 respectively, where 𝑠, 𝑡 ∈ R. Solving for the basic variables, we get 𝑥1 = 1+2𝑠−𝑡 and 𝑥3 = 2+2𝑡, for 𝑠, 𝑡 ∈ R. 
Therefore, the solution set is ⎧⎡ ⎫ ⎧⎡ ⎤ ⎫ ⎤ ⎡ ⎤ ⎡ ⎤ 1 + 2𝑠 − 𝑡 1 2 −1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎬ ⎨⎢ ⎥ ⎬ ⎥ ⎢1⎥ ⎢ 0 ⎥ 𝑠 0 ⎥ : 𝑠, 𝑡 ∈ R = ⎢ ⎥ + 𝑠 ⎢ ⎥ + 𝑡 ⎢ ⎥ : 𝑠, 𝑡 ∈ R . 𝑈= ⎢ ⎣ 2 + 2𝑡 ⎦ ⎣0⎦ ⎣ 2 ⎦ ⎪ ⎪ ⎪⎣ 2 ⎦ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎪ ⎩ ⎭ 𝑡 0 0 1 Example 3.7.4 Solve the following system of linear equations: 𝑥1 −2𝑥2 −𝑥3 +3𝑥4 = 0 2𝑥1 −4𝑥2 +𝑥3 = 0 . 𝑥1 −2𝑥2 +2𝑥3 −3𝑥4 = 0 Solution: Once again, we observe that we have the same coefficient matrix and thus we will be performing the same sequence of EROs as in the previous three examples. We also note that this system is guaranteed to be consistent since all of the constant terms are originally 0 and no ERO will change this fact. We can always set all of the variables to 0 to obtain a solution. The RREF of this system is Section 3.7 Homogeneous and Non-Homogeneous Systems, Nullspace 81 ⎡ ⎤ 1 −2 0 1 0 ⎣0 0 1 −2 0⎦ . 0 0 0 0 0 As before we let the free variables 𝑥2 and 𝑥4 be 𝑠 and 𝑡 respectively, where 𝑠, 𝑡 ∈ R. Solving for the basic variables, we get 𝑥1 = 2𝑠 − 𝑡 and 𝑥3 = 2𝑡, for 𝑠, 𝑡 ∈ R. Therefore, the solution set is ⎧⎡ ⎫ ⎧ ⎡ ⎤ ⎫ ⎤ ⎡ ⎤ 2𝑠 − 𝑡 2 −1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎬ ⎪ ⎨ ⎢ ⎥ ⎬ ⎥ ⎢ 0 ⎥ 𝑠 1 ⎥ : 𝑠, 𝑡 ∈ R = 𝑠 ⎢ ⎥ + 𝑡 ⎢ ⎥ : 𝑠, 𝑡 ∈ R . 𝑉 = ⎢ ⎣ 2𝑡 ⎦ ⎣ 2 ⎦ ⎪ ⎪ ⎪ ⎣0⎦ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎪ ⎩ ⎭ 𝑡 0 1 Note that in this final example, the solution set, 𝑉 , can be written as all linear combinations [︀ ]︀𝑇 [︀ ]︀𝑇 of the vectors 2 1 0 0 and −1 0 2 1 . Therefore, we can write the solution set as the span of these two vectors: ⎫ ⎧ ⎡ ⎤ ⎧⎡ ⎤ ⎡ ⎤⎫ ⎤ ⎡ −1 ⎪ 2 −1 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎬ ⎬ ⎨⎢ ⎥ ⎢ ⎨ ⎢ ⎥ ⎢ 0 ⎥ 1⎥ ⎢ 0 ⎥ 1⎥ ⎥ ⎥ ⎢ ⎢ ⎢ , : 𝑠, 𝑡 ∈ R = Span 𝑉 = 𝑠⎣ ⎦ + 𝑡⎣ ⎣ 0 ⎦ ⎣ 2 ⎦⎪ . 2 ⎦ 0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎭ ⎭ ⎩ ⎩ 1 0 1 0 There are several observations we can make by comparing these four systems of linear equations. The first observation is that the last system is the simplest to solve because all the constant terms are zero. We give a name to such a system and its solution set. Definition 3.7.5 Homogeneous and Non-homogeneous Systems We say that a system of linear equations is homogeneous if all the constant terms on the right-hand side of the equations are zero. Otherwise we say the system is nonhomogeneous. [︁ #» ]︁ Thus, if 𝐴 | 𝑏 is the augmented matrix of a system of linear equations, then the system #» #» is homogeneous if 𝑏 = ⃗0, and non-homogeneous if 𝑏 ̸= ⃗0. As we noted in Example 3.7.4, a homogeneous system will always be consistent. We can set all of the variables to 0 to obtain a solution. This solution is special in its consideration, and we formally define it below. Definition 3.7.6 Trivial Solution For a homogeneous system with variables 𝑥1 , 𝑥2 , ..., 𝑥𝑛 , the trivial solution is the solution 𝑥1 = 𝑥2 = · · · = 𝑥𝑛 = 0. Moreover, if we collect all the solutions of a homogeneous system into a single set, that set has a special relationship with the matrix 𝐴. We define this set formally below. 82 Chapter 3 Definition 3.7.7 Nullspace Systems of Linear Equations The solution set of a homogeneous system of linear equations with coefficient matrix 𝐴 is called the nullspace of 𝐴 and is denoted Null(𝐴). Consider the solution sets of the three consistent systems in our previous examples. ⎧⎡ ⎤ ⎫ ⎡ ⎤ ⎡ ⎤ 2 2 −1 ⎪ ⎪ ⎪ ⎪ ⎨ ⎬ ⎢ 0 ⎥ ⎢1⎥ ⎢0⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ 𝑆 = ⎣ ⎦ + 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R . 0 1 2 ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 0 0 1 ⎫ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 2 1 ⎪ ⎪ ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ 0 ⎥ ⎢1⎥ 0 ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 𝑈 = ⎣ ⎦ + 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R . 2 0 ⎪ ⎪ ⎪ ⎪ 2 ⎭ ⎩ 1 0 0 ⎧ ⎡ ⎤ ⎫ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ −1 2 −1 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎢ ⎥ ⎬ ⎨⎢ ⎥ ⎢ ⎢ 0 ⎥ ⎥⎬ 1 0 1 ⎥ : 𝑠, 𝑡 ∈ R = Span ⎢ ⎥ , ⎢ ⎥ + 𝑡⎢ ⎥ . 
𝑉 = 𝑠⎢ ⎣ 2 ⎦ ⎣0⎦ ⎣ 0 ⎦ ⎣ 2 ⎦⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎩ ⎭ 1 0 1 0 ⎤ 1 −2 −1 3 Note that 𝑉 = Null(𝐴), where 𝐴 = ⎣ 2 −4 1 0 ⎦ is the common coefficient matrix of all 1 −2 2 −3 three systems. ⎡ The second observation is that the solution set for the homogeneous system of equations can be written as the span of a set of vectors. The solution sets for the two non-homogeneous systems of equations cannot be written as the span of a set of vectors because these solution sets do not include ⃗0. The third observation is that the solution sets 𝑆, 𝑈 and 𝑉 are very similar. To be more precise, the solution sets 𝑆 and 𝑈 of the non-homogeneous systems differ by only a constant vector and the part which they have in common is 𝑉 . We will return to this observation #» #» 𝑥 = 0 and 𝐴 #» 𝑥 = 𝑏 ). later and formalize it in Theorem 3.11.6 (Solutions to 𝐴 #» The systems in Examples 3.7.1, 3.7.2, 3.7.3 and 3.7.4 have the same coefficient matrix, 𝐴, but different right-hand sides: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 −1 0 ⎣ 5 ⎦ , ⎣ 2 ⎦ , ⎣ 4 ⎦ and ⎣ 0 ⎦ , 4 3 5 0 respectively. To solve each of the previous four examples, of EROs. We can solve all four systems at once using matrix”, ⎡ 1 −2 −1 3 1 1 −1 ⎣2 −4 1 0 5 2 4 1 −2 2 −3 4 3 5 we performed the same sequence the following “super-augmented ⎤ 0 0⎦ , 0 Section 3.8 Homogeneous and Non-Homogeneous Systems, Nullspace which has RREF 83 ⎡ ⎤ 1 −2 0 1 2 1 1 0 ⎣0 0 1 −2 1 0 2 0⎦ . 0 0 0 0 0 2 0 0 We use the RREF of the “super-augmented matrix” to solve the four systems. For the first system, we need the first column after the vertical line. ⎡ ⎤ 1 −2 0 1 2 ⎣0 0 1 −2 1⎦ 0 0 0 0 0 This corresponds to the system of equations 𝑥1 −2𝑥2 with solution set +𝑥4 = 2 𝑥3 −2𝑥4 = 1 0 = 0 ⎧⎡ ⎤ ⎫ ⎡ ⎤ ⎡ ⎤ 2 2 −1 ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎬ ⎢1⎥ ⎢ 0 ⎥ 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 𝑆 = ⎣ ⎦ + 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R . 0 2 ⎪ ⎪ ⎪ 1 ⎪ ⎩ ⎭ 0 0 1 For the second system, we need the second column after the vertical line. ⎤ ⎡ 1 −2 0 1 1 ⎣0 0 1 −2 0⎦ 0 0 0 0 2 This is an inconsistent system and has solution set 𝑇 = ∅. For the third and fourth systems we need the third and fourth columns after the vertical line, respectively, ⎡ ⎤ ⎡ ⎤ 1 −2 0 1 0 1 −2 0 1 1 ⎣0 0 1 −2 2⎦ and ⎣0 0 1 −2 0⎦ . 0 0 0 0 0 0 0 0 0 0 These correspond to solution sets ⎫ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 −1 ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎬ ⎢1⎥ ⎢ 0 ⎥ 0 ⎥ + 𝑎 ⎢ ⎥ + 𝑏 ⎢ ⎥ : 𝑎, 𝑏 ∈ R 𝑈= ⎢ ⎣2⎦ ⎣0⎦ ⎣ 2 ⎦ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 0 0 1 and respectively. ⎧ ⎡ ⎤ ⎫ ⎧⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ 2 −1 2 −1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎢ ⎥ ⎬ ⎨⎢ ⎥ ⎢ ⎥⎪ ⎬ ⎢ 0 ⎥ 1 1 ⎥ + 𝑡 ⎢ ⎥ : 𝑠, 𝑡 ∈ R = Span ⎢ ⎥ , ⎢ 0 ⎥ , 𝑉 = 𝑠⎢ ⎣0⎦ ⎣ 2 ⎦ ⎣ 0 ⎦ ⎣ 2 ⎦⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ ⎩ ⎭ 0 1 0 1 84 Chapter 3 3.8 Systems of Linear Equations Solving Systems of Linear Equations over C The Gauss–Jordan Algorithm works just as well over C as it does over R. The only new feature is that the arithmetic is more intricate because of the complex numbers. If parameters are introduced in the solution, then these parameters take on all complex values instead of just real values. REMARK Recall that for 𝑧 = 𝑎 + 𝑏𝑖 ∈ C, 𝑧 ̸= 0, that Example 3.8.1 1 𝑧 𝑎 − 𝑏𝑖 = = 2 . 𝑧 𝑧𝑧 𝑎 + 𝑏2 Solve the following system of linear equations over C. (1 + 𝑖) 𝑧1 +(−2 − 3𝑖)𝑧2 = −15𝑖 (1 + 3𝑖) 𝑧1 +(−1 − 8𝑖)𝑧2 = 15 − 30𝑖 Solution: The augmented matrix is ]︂ 1 + 𝑖 −2 − 3𝑖 −15𝑖 . 1 + 3𝑖 −1 − 8𝑖 15 − 30𝑖 [︂ We perform the following EROs. (︂ (︂ )︂ )︂ [︂ ]︂ 1 1 1 1 − 52 − 12 𝑖 − 15 − 15 𝑖 2 2 𝑅1 = − 𝑖 𝑅1 gives 𝑅1 → . 1 + 3𝑖 −1 − 8𝑖 15 − 30𝑖 1+𝑖 2 2 [︂ 1 − 52 − 12 𝑖 − 15 2 − 𝑅2 → −(1 + 3𝑖)𝑅1 + 𝑅2 gives 0 0 0 15 2 𝑖 ]︂ . This matrix is in RREF. 
The rank of the coefficient matrix and the rank of the augmented matrix are both equal to 1 and so the system is consistent. The number of parameters in the solution set is 2 − 1 = 1 parameter in the solution set. We let the 𝑧2 = 𝑡 (︀ 5 free1 variable )︀ 15 for 𝑡 ∈ C. Solving for the basic variable 𝑧1 we get 𝑧1 = − 15 − 𝑖 + + 𝑖 𝑡. Therefore, 2 2 2 2 the solution set, 𝑆, is {︂[︂ (︀ 15 15 )︀ (︀ 5 1 )︀ ]︂ }︂ {︂[︂ 15 15 ]︂ [︂ 5 1 ]︂ }︂ − 2 − 2 𝑖 + 2 + 2𝑖 𝑡 −2 − 2𝑖 + 𝑖 𝑆= :𝑡∈C = + 2 2 𝑡:𝑡∈C 𝑡 0 1 3.9 Matrix–Vector Multiplication In this section we will define a notion of matrix–vector multiplication that will be very useful in our study of systems of linear equations. However, before we give the definition, we will consider some different ways that we can partition a matrix. Section 3.9 Matrix–Vector Multiplication Definition 3.9.1 Row Vector 85 A row vector is a matrix with exactly one row. For a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F), we will denote # »(𝐴). That is, the 𝑖𝑡ℎ row of 𝐴 by row 𝑖 # »(𝐴) = [︀ 𝑎 𝑎 · · · 𝑎 ]︀ . row 𝑖1 𝑖2 𝑖𝑛 𝑖 We can write an 𝑚 × 𝑛 matrix 𝐴 as ⎡# » ⎤ row1 (𝐴) # »(𝐴) ⎥ ⎢ row 2 ⎢ ⎥ 𝐴=⎢ ⎥. .. ⎣ ⎦ . #row »(𝐴) 𝑚 In other words, we can partition 𝐴 into 𝑚 row vectors, each with 𝑛 components. ]︂ 5 −2 0 # »(𝐴) = [︀ 5 −2 0 ]︀ and row # »(𝐴) = [︀ 2 1 −8 ]︀. . Then row Let 𝐴 = 1 2 2 1 −8 [︂ Example 3.9.2 [︀ ]︀ #» 𝑚 Similarly, we can think of an 𝑚 × 𝑛 matrix 𝐴 as 𝐴 = 𝑎#»1 𝑎#»2 . . . 𝑎# » 𝑛 , where 𝑎𝑖 ∈ F . In other words, we can partition 𝐴 into 𝑛 column vectors, each with 𝑚 components. We will use the convention, demonstrated here, where an upper case letter refers to the matrix and the corresponding lower case letter with subscripts refers to the columns. ⎡ Example 3.9.3 ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 1 2 [︁ #» #» ]︁ #» #» Let 𝐵 = ⎣ −3 −4 ⎦. Then 𝐵 = 𝑏1 𝑏2 , where 𝑏 1 = ⎣ −3 ⎦ and 𝑏 2 = ⎣ −4 ⎦. 7 9 7 9 We now define the product of an 𝑚 × 𝑛 matrix and a (column) vector in F𝑛 . Definition 3.9.4 Matrix–Vector Multiplication in Terms of the Individual Entries Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and ⎡ 𝑎11 ⎢ 𝑎21 ⎢ 𝐴 #» 𝑥 =⎢ . ⎣ .. 𝑎𝑚1 #» 𝑥 ∈ F𝑛 . We define the product 𝐴 #» 𝑥 as follows: ⎤ ⎤⎡ ⎤ ⎡ 𝑎12 . . . 𝑎1𝑛 𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 𝑥1 ⎢ ⎥ ⎢ ⎥ 𝑎22 . . . 𝑎2𝑛 ⎥ ⎥ ⎢ 𝑥2 ⎥ ⎢ 𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 ⎥ ⎥. .. . . .. .. ⎥ ⎢ .. ⎥ = ⎢ ⎦ . . ⎦⎣ . ⎦ ⎣ . . 𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 𝑎𝑚2 . . . 𝑎𝑚𝑛 𝑥𝑛 REMARK Notice that we have defined the product of an 𝑚 × 𝑛 matrix 𝐴 with a vector #» 𝑥 only if #» 𝑥 ∈ F𝑛 . The expression 𝐴 #» 𝑦 is meaningless if #» 𝑦 ∈ F𝑘 with 𝑘 ̸= 𝑛. In other words, the number of components of #» 𝑥 must be equal to the number of columns of 𝐴 for the product 𝐴 #» 𝑥 to be defined. 86 Chapter 3 Systems of Linear Equations The resulting product 𝐴 #» 𝑥 of 𝐴 ∈ 𝑀𝑚×𝑛 (F) and #» 𝑥 ∈ F𝑛 is a vector in F𝑚 . The 𝑖𝑡ℎ component of 𝐴 #» 𝑥 is the sum 𝑎𝑖1 𝑥1 + 𝑎𝑖2 𝑥2 + · · · + 𝑎𝑖𝑛 𝑥𝑛 = 𝑛 ∑︁ 𝑎𝑖𝑗 𝑥𝑗 = 𝑗=1 𝑛 ∑︁ # »(𝐴)) 𝑥 . (row 𝑖 𝑗 𝑗 𝑗=1 Informally, we can think of the sum above as something analogous to a “dot product” of the 𝑖𝑡ℎ row of 𝐴 with the vector #» 𝑥. ⎡ Example 3.9.5 ⎤ ⎡ ⎤ 16 1 1 Compute the product 𝐴 #» 𝑥 where 𝐴 = ⎣ 3 4 5 ⎦ and #» 𝑥 = ⎣ −4 ⎦. 5 2 −3 6 Solution: We have coloured the matrix red and the vector blue to emphasize the role their entries play in calculating the product. First, we calculate the components of the product by calculating “dot products” of the rows of 𝐴 with #» 𝑥. ⎡ ⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 16 1 1 (1)(1) + (6)(−4) + (1)(6) −17 ⎣ 3 4 5 ⎦⎣ −4 ⎦ = ⎣ (3)(1) + (4)(−4) + (5)(6) ⎦ = ⎣ 17 ⎦ . 
5 2 −3 6 (5)(1) + (2)(−4) + (−3)(6) −21 We also make the observation that ⎡ 𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 ⎢ 𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 ⎢ ⎢ .. ⎣ . ⎤ ⎡ 𝑎11 ⎥ ⎢ 𝑎21 ⎥ ⎢ ⎥ = 𝑥1 ⎢ .. ⎦ ⎣ . 𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 𝑎𝑚1 ⎤ ⎡ 𝑎12 ⎥ ⎢ 𝑎22 ⎥ ⎢ ⎥ + 𝑥2 ⎢ .. ⎦ ⎣ . ⎤ ⎡ 𝑎1𝑛 ⎥ ⎢ 𝑎2𝑛 ⎥ ⎢ ⎥ + · · · + 𝑥𝑛 ⎢ .. ⎦ ⎣ . 𝑎𝑚2 ⎤ ⎥ ⎥ ⎥, ⎦ 𝑎𝑚𝑛 which is a linear combination of the columns of 𝐴. Therefore, we can re-express matrix– vector multiplication in terms of the columns of 𝐴. Proposition 3.9.6 (Matrix–Vector Multiplication in Terms of Columns) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and #» 𝑥 ∈ F𝑛 . Then ⎤ 𝑥1 ⎥ [︀ ]︀ ⎢ ⎢ 𝑥2 ⎥ 𝑎 1 #» 𝑎 2 · · · #» 𝑎 𝑛 ⎢ . ⎥ = 𝑥1 #» 𝐴 #» 𝑥 = #» 𝑎 1 + 𝑥2 #» 𝑎 2 + · · · + 𝑥𝑛 #» 𝑎 𝑛. ⎣ .. ⎦ ⎡ 𝑥𝑛 ⎡ Example 3.9.7 ⎤⎡ ⎤ 16 1 1 ⎣ ⎦ ⎣ −4 ⎦ again, this time by thinking of the product as a linear We will compute 3 4 5 5 2 −3 6 Section 3.9 Matrix–Vector Multiplication 87 combination of the columns of 𝐴: ⎡ ⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 16 1 1 1 6 1 ⎣ 3 4 5 ⎦⎣ −4 ⎦ = (1)⎣ 3 ⎦ + (−4)⎣ 4 ⎦ + (6)⎣ 5 ⎦ 5 2 −3 6 5 2 −3 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 −24 6 = ⎣ 3 ⎦ + ⎣ −16 ⎦ + ⎣ 30 ⎦ 5 −8 −18 ⎡ ⎤ −17 = ⎣ 17 ⎦ . −21 Example 3.9.8 ⎡ ⎤ 1 1 + 𝑖 2 + 2𝑖 3 − 𝑖 Compute the product 𝐴 #» 𝑥 where 𝐴 = and #» 𝑥 = ⎣ 1 − 𝑖 ⎦. 2 + 3𝑖 4 + 𝑖 5 − 2𝑖 2 − 3𝑖 [︂ ]︂ Solution: We compute the product in two different ways. We have coloured the matrix red and the vector blue to emphasize the role their entries play in calculating the product. First, we calculate the components of the product by calculating “dot products” of the rows of 𝐴 with #» 𝑥. ⎤ ⎡ ]︂ [︂ 1 1 + 𝑖 2 + 2𝑖 3 − 𝑖 ⎣ 1−𝑖 ⎦ 𝐴 #» 𝑥 = 2 + 3𝑖 4 + 𝑖 5 − 2𝑖 2 − 3𝑖 ]︂ [︂ (1 + 𝑖)(1) + (2 + 2𝑖)(1 − 𝑖) + (3 − 𝑖)(2 − 3𝑖) = (2 + 3𝑖)(1) + (4 + 𝑖)(1 − 𝑖) + (5 − 2𝑖)(2 − 3𝑖) ]︂ [︂ 8 − 10𝑖 . = 11 − 19𝑖 Second, we calculate the product by thinking of the product as a linear combination of the columns of 𝐴. ⎡ ⎤ [︂ ]︂ 1 1 + 𝑖 2 + 2𝑖 3 − 𝑖 ⎣ 1−𝑖 ⎦ 𝐴 #» 𝑥 = 2 + 3𝑖 4 + 𝑖 5 − 2𝑖 2 − 3𝑖 [︂ ]︂ [︂ ]︂ [︂ ]︂ 1+𝑖 2 + 2𝑖 3−𝑖 = (1) + (1 − 𝑖) + (2 − 3𝑖) 2 + 3𝑖 4+𝑖 5 − 2𝑖 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 1+𝑖 4 3 − 11𝑖 8 − 10𝑖 = + + = . 2 + 3𝑖 5 − 3𝑖 4 − 19𝑖 11 − 19𝑖 The following useful result can be proved using Proposition 3.9.6 (Matrix–Vector Multiplication in Terms of Columns). Theorem 3.9.9 (Linearity of Matrix–Vector Multiplication) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). Let #» 𝑥 , #» 𝑦 ∈ F𝑛 and 𝑐 ∈ F. Then 88 Chapter 3 Systems of Linear Equations (a) 𝐴( #» 𝑥 + #» 𝑦 ) = 𝐴 #» 𝑥 + 𝐴 #» 𝑦. (b) 𝐴(𝑐 #» 𝑥 ) = 𝑐𝐴 #» 𝑥. 3.10 Using a Matrix–Vector Product to Express a System of Linear Equations Consider the system of linear equations 𝑎11 𝑥1 𝑎21 𝑥1 +𝑎12 𝑥2 + · · · +𝑎22 𝑥2 + · · · .. . 𝑎𝑚1 𝑥1 +𝑎𝑚2 𝑥2 + · · · ⎡ 𝑎11 𝑎12 ⎢ 𝑎21 𝑎22 ⎢ with coefficient matrix 𝐴 = ⎢ . .. ⎣ .. . 𝑎𝑚1 𝑎𝑚2 +𝑎1𝑛 𝑥𝑛 = +𝑎2𝑛 𝑥𝑛 = 𝑏1 𝑏2 +𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚 ⎤ · · · 𝑎1𝑛 · · · 𝑎2𝑛 ⎥ ⎥ . ⎥. .. . .. ⎦ · · · 𝑎𝑚𝑛 ⎡ ⎤ ⎡ ⎤ 𝑥1 𝑏1 ⎢ ⎥ ⎢ #» #» ⎢ 𝑏2 ⎥ ⎢ 𝑥2 ⎥ ⎥ This system can now be represented as 𝐴 #» 𝑥 = 𝑏 , where #» 𝑥 = ⎢ . ⎥ and 𝑏 = ⎢ . ⎥. ⎣ .. ⎦ ⎣ .. ⎦ 𝑥𝑛 𝑏𝑛 #» Going forwards, we will also refer to expressions of the form 𝐴 #» 𝑥 = 𝑏 , with 𝐴 ∈ 𝑀𝑚×𝑛 (F), #» #» 𝑥 ∈ F𝑚 and 𝑏 ∈ F𝑚 , as “a system of linear equations”, acknowledging the equivalence of #» a system of 𝑚 linear equations (as given above) to the expression 𝐴 #» 𝑥 = 𝑏. Example 3.10.1 The system of equations 5𝑥1 +4𝑥2 −7𝑥3 +2𝑥4 = −1 −6𝑥1 −2𝑥2 +3𝑥3 = 2 +3𝑥2 +5𝑥3 +5𝑥4 = 7 can be represented as 𝐴⃗𝑥 = ⃗𝑏, where ⎡ ⎤ ⎡ ⎤ 5 4 −7 2 −1 𝐴 = ⎣ −6 −2 3 0 ⎦ and ⃗𝑏 = ⎣ 2 ⎦ . 0 3 5 5 7 Using this notation and the linearity of matrix–vector multiplication, we can prove a result that is essentially a corollary of Theorem 3.6.7 (System Rank Theorem). 
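Before stating that result, the linearity properties of Theorem 3.9.9 can be confirmed on a concrete example. The short Python/NumPy sketch below is an illustration only; the matrix is the coefficient matrix from Example 3.10.1, and the vectors and scalar are arbitrary choices.

```python
import numpy as np

A = np.array([[ 5.,  4., -7., 2.],
              [-6., -2.,  3., 0.],
              [ 0.,  3.,  5., 5.]])   # coefficient matrix from Example 3.10.1
x = np.array([1., 2., 3., 4.])
y = np.array([-1., 0., 2., 5.])
c = 7.0

# Theorem 3.9.9: matrix-vector multiplication is linear
print(np.allclose(A @ (x + y), A @ x + A @ y))   # part (a): True
print(np.allclose(A @ (c * x), c * (A @ x)))     # part (b): True
```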
Section 3.11 Solution Sets to Systems of Linear Equations Proposition 3.10.2 89 Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). If for every vector 𝑒#»𝑖 in the standard basis of F𝑚 the system of equations 𝐴 #» 𝑥 = 𝑒#»𝑖 is consistent, then rank(𝐴) = 𝑚. Proof: We begin by assuming that for every vector 𝑒#»𝑖 in the standard basis of F𝑚 , the system of equations 𝐴 #» 𝑥 = 𝑒#»𝑖 is consistent. Let 𝑥#»𝑖 be a solution to 𝐴 #» 𝑥 = 𝑒#»𝑖 . #» #» We know that for any vector 𝑏 ∈ F𝑚 , we can write 𝑏 as a linear combination of the vectors in the standard basis for F𝑚 . That is, there exists 𝑐1 , . . . , 𝑐𝑚 ∈ F such that #» ». 𝑏 = 𝑐1 𝑒#»1 + · · · + 𝑐𝑚 𝑒#𝑚 Using these scalars, the solutions #» 𝑥 𝑖 to 𝐴 #» 𝑥 = 𝑒#»𝑖 and the linearity of matrix-vector multiplication we find that » = #» 𝐴(𝑐1 #» 𝑥 1 + · · · + 𝑐𝑚 #» 𝑥 𝑚 ) = 𝑐1 𝐴 #» 𝑥 1 + · · · + 𝑐𝑚 𝐴 #» 𝑥 𝑚 = 𝑐1 𝑒#»1 + · · · + 𝑐𝑚 𝑒#𝑚 𝑏. #» #» » is a solution to 𝐴 #» Therefore, 𝑐1 𝑥#»1 + · · · + 𝑐𝑚 𝑥# 𝑚 𝑥 = 𝑏 . Thus, 𝐴 #» 𝑥 = 𝑏 is consistent #» for every 𝑏 ∈ F𝑚 . Therefore, by Part (b) of Theorem 3.6.7 (System Rank Theorem), rank(𝐴) = 𝑚. 3.11 Solution Sets to Systems of Linear Equations #» #» #» Let 𝐴 #» 𝑥 = 𝑏 be a system of linear equations. If 𝑏 = 0 , then the system is homogeneous. #» #» If 𝑏 ̸= 0 , then the system is non-homogeneous. We will consider the solution sets of homogeneous and non-homogeneous systems with the same coefficient matrix. As we noted in Section 3.7, a homogeneous system of linear equations is always consistent, since we can always set all of the variables to 0 (the trivial solution). In other words, the solution set to a homogeneous system always contains the zero vector (the trivial solution), and thus, it is never empty. The solution set of a non-homogeneous system does not contain the trivial solution. Non-homogeneous systems may be consistent or inconsistent. The next result gives another important property of the solution set to a homogeneous system. Proposition 3.11.1 #» Let 𝐴 #» 𝑥 = 0 be a homogeneous system of linear equations with solution set 𝑆. If #» 𝑥 , #» 𝑦 ∈ 𝑆, #» #» #» and if 𝑐 ∈ F, then 𝑥 + 𝑦 ∈ 𝑆 and 𝑐 𝑥 ∈ 𝑆. Proof: Assume that #» 𝑥 , #» 𝑦 ∈ 𝑆. Then #» 𝑥 and #» 𝑦 are solutions to the homogeneous system #» #» #» #» and 𝐴 𝑥 = 0 and 𝐴 𝑦 = 0 . Using the linearity of matrix multiplication, #» #» #» 𝐴( #» 𝑥 + #» 𝑦 ) = 𝐴 #» 𝑥 + 𝐴 #» 𝑦 = 0+0 = 0 and #» #» 𝐴(𝑐 #» 𝑥 ) = 𝑐(𝐴 #» 𝑥 ) = 𝑐( 0 ) = 0 . Therefore, #» 𝑥 + #» 𝑦 ∈ 𝑆 and 𝑐 #» 𝑥 ∈ 𝑆. 90 Chapter 3 Systems of Linear Equations We can combine these two results to state that If #» 𝑥 , #» 𝑦 ∈ 𝑆 and if 𝑐, 𝑑 ∈ F, then 𝑐 #» 𝑥 + 𝑑 #» 𝑦 ∈ 𝑆. We can also extend this result inductively to 𝑛 vectors in 𝑆 and 𝑛 scalars in F. In other words, given 𝑛 solutions to a homogeneous system of linear equations, then any linear combination of these solutions is also a solution to the homogeneous system. We say that 𝑆 is closed under addition and scalar multiplication. This fact, along with the fact that 𝑆 is non-empty means that 𝑆 forms a type of object called a subspace. We will explore subspaces later in the course. Example 3.11.2 Consider the homogeneous system given in Example 3.7.4 of Section 3.7. It has coefficient matrix ⎡ ⎤ 1 −2 −1 3 𝐴 = ⎣ 2 −4 1 0 ⎦ 1 −2 2 −3 and solution set ⎫ ⎧ ⎡ ⎤ ⎡ ⎤ −1 2 ⎪ ⎪ ⎪ ⎪ ⎬ ⎨ ⎢ ⎥ ⎢ 0 ⎥ 1 ⎢ ⎥ ⎥ ⎢ 𝑉 = 𝑠 ⎣ ⎦ + 𝑡 ⎣ ⎦ : 𝑠, 𝑡 ∈ R . 2 0 ⎪ ⎪ ⎪ ⎪ ⎭ ⎩ 1 0 [︀ ]︀𝑇 By choosing 𝑠 = 1, 𝑡 = 0 and then 𝑠 = 0, 𝑡 = 1, we get the solutions #» 𝑥 = 2100 [︀ ]︀𝑇 𝑧 be the following linear combination of these two and #» 𝑦 = −1 0 2 1 , respectively. Let #» solutions: Then [︀ ]︀𝑇 #» 𝑧 = 2 #» 𝑥 + 3 #» 𝑦 = 1263 . 
⎡ ⎤ ⎡ ⎤ ⎤ 1 0 1 −2 −1 3 ⎢ ⎥ 2⎥ ⎣ ⎦ #» ⎢ ⎣ ⎦ 𝐴 𝑧 = 2 −4 1 0 ⎣ ⎦ = 0 . 6 0 1 −2 2 −3 3 ⎡ And so #» 𝑧 ∈ 𝑉 as predicted by Proposition 3.11.1. Definition 3.11.3 Associated Homogeneous System #» #» #» Let 𝐴 #» 𝑥 = 𝑏 , where 𝑏 ̸= 0 , be a non-homogeneous system of linear equations. The #» associated homogeneous system to this system is the system 𝐴 #» 𝑥 = 0. Example 3.11.4 ⎡ ⎤ ⎡ ⎤ 2 3 0 −5 7 ⎦ #» 𝑥 = ⎣ 0 ⎦ is a homogeneous system of 3 equations in 3 unknowns. −4 6 0 ⎤ ⎡ ⎤ 2 3 4 2 #» ⎦ ⎣ −5 7 −9 𝑦 = 4 ⎦ is a non-homogeneous system of 3 equations in 4 unknowns, −4 6 −4 6 ⎡ ⎤ ⎡ ⎤ 1 2 3 4 0 with associated homogeneous system of equations ⎣ −4 −5 7 −9 ⎦ #» 𝑦 = ⎣ 0 ⎦. 2 −4 6 −4 0 1 (a) ⎣ −4 2 ⎡ 1 ⎣ (b) −4 2 Section 3.11 Solution Sets to Systems of Linear Equations Definition 3.11.5 Particular Solution 91 #» Let 𝐴 #» 𝑥 = 𝑏 be a consistent system of linear equations. We refer to a solution of this system, 𝑥#»𝑝 , as a particular solution to this system. #» Since a consistent system 𝐴 #» 𝑥 = 𝑏 either has a unique solution or it has an infinite number of solutions, there may be an infinite number of choices for a particular solution. We can use a particular solution to a non-homogeneous system to find a relationship between the non-homogeneous system and its associated homogeneous system. Specifically, we can #» express the solution set of a consistent, non-homogeneous system 𝐴 #» 𝑥 = 𝑏 in terms of a particular solution to the system and the solution set of its associated homogeneous system #» 𝐴 #» 𝑥 = 0 as follows. Theorem 3.11.6 #» #» (Solutions to 𝐴 #» 𝑥 = 0 and 𝐴 #» 𝑥 = 𝑏) #» #» #» Let 𝐴 #» 𝑥 = 𝑏 , where 𝑏 ̸= 0 , be a consistent non-homogeneous system of linear equations #» ˜ Let 𝐴 #» with solution set 𝑆. 𝑥 = 0 be the associated homogeneous system with solution set ˜ then 𝑆. If 𝑥#»𝑝 ∈ 𝑆, 𝑆˜ = {𝑥#»𝑝 + #» 𝑥 : #» 𝑥 ∈ 𝑆}. ˜ Therefore, 𝐴𝑥#»𝑝 = #» Proof: Let 𝑥#»𝑝 ∈ 𝑆. 𝑏 . Let #» 𝑥 be a solution of its associated homogeneous #» #» system. Therefore, 𝐴 𝑥 = 0 . To show that 𝑆˜ = {𝑥#»𝑝 + #» 𝑥 : #» 𝑥 ∈ 𝑆}, we will show that # » #» #» # » #» #» ˜ 𝑆˜ ⊆ {𝑥𝑝 + 𝑥 : 𝑥 ∈ 𝑆} and {𝑥𝑝 + 𝑥 : 𝑥 ∈ 𝑆} ⊆ 𝑆. #» ˜ Therefore, 𝐴 #» Suppose that #» 𝑦 ∈ 𝑆. 𝑦 = 𝑏 . We can write #» 𝑦 as #» 𝑦 = 𝑥#»𝑝 + #» 𝑦 − 𝑥#»𝑝 . Using the linearity of matrix-vector multiplication we have that #» #» #» 𝐴( #» 𝑦 − 𝑥#»𝑝 ) = 𝐴 #» 𝑦 − 𝐴𝑥#»𝑝 = 𝑏 − 𝑏 = 0 , which implies that that 𝑆˜ ⊆ {𝑥#»𝑝 + #» 𝑥 : #» 𝑦 − 𝑥#»𝑝 ∈ 𝑆. Therefore, #» 𝑦 ∈ {𝑥#»𝑝 + #» 𝑥 : #» 𝑥 ∈ 𝑆}. Thus, we have shown #» 𝑥 ∈ 𝑆}. Now, suppose that #» 𝑧 ∈ {𝑥#»𝑝 + #» 𝑥 : #» 𝑥 ∈ 𝑆}. Then #» 𝑧 = 𝑥#»𝑝 + #» 𝑥 1 for some #» 𝑥 1 ∈ 𝑆. Since #» ⃗𝑥1 ∈ 𝑆, 𝐴⃗𝑥1 = 0 . Again using the linearity of matrix-vector multiplication, we have that #» #» #» 𝐴 #» 𝑧 = 𝐴(𝑥#»𝑝 + #» 𝑥 1 ) = 𝐴𝑥#»𝑝 + 𝐴 #» 𝑥1 = 𝑏 + 0 = 𝑏 . ˜ ˜ Thus, we have shown {𝑥#»𝑝 + #» Therefore, #» 𝑧 ∈ 𝑆. 𝑥 : #» 𝑥 ∈ 𝑆} ⊆ 𝑆. The next example illustrates the idea from this theorem. [︂ Example 3.11.7 ]︂ 12 Let 𝐴 = . Solve the following systems and represent their solutions sets graphically. 24 #» (a) 𝐴 #» 𝑥 = 0 [︂ ]︂ 4 #» (b) 𝐴 𝑥 = 8 92 Chapter 3 Systems of Linear Equations [︂ ]︂ −2 #» (c) 𝐴 𝑥 = −4 Solution: We solve all three systems simultaneously by row-reducing the “super-augmented matrix” [︂ ]︂ 1 2 0 4 −2 . 2 4 0 8 −4 We perform the ERO −2𝑅1 + 𝑅2 to obtain the following matrix which is in RREF: [︂ ]︂ 1 2 0 4 −2 . 0 0 0 0 0 Therefore, the solution set to the first system, which is a homogeneous system, is {︂[︂ ]︂ }︂ −2 𝑆= 𝑡:𝑡∈R . 1 The solution to the second system is }︂ {︂[︂ ]︂ [︂ ]︂ 4 −2 𝑡:𝑡∈R . 
𝑇 = + 1 0 The solution to the third system is }︂ {︂[︂ ]︂ [︂ ]︂ −2 −2 𝑡:𝑡∈R . + 𝑈= 1 0 We represent these solution sets graphically. The solution sets are labelled in the diagram. Section 3.11 Solution Sets to Systems of Linear Equations 93 We see that 𝑆 is a line through the origin, which is to be expected because it is the solution set to a homogeneous system. The solution sets 𝑇 and 𝑈 are translations of the set 𝑆. The solution set 𝑇 is obtained [︂ ]︂ by translating every vector in 𝑆 to the right by 4 units, which 4 is equivalent to adding , a particular solution to the system in (b), to every vector in 𝑆. 0 Similarly, the solution set 𝑈 is obtained [︂ by ]︂ translating every vector in 𝑆 to the left by 2 −2 units, which is equivalent to adding , a particular solution to the system in (c), to 0 every vector in 𝑆. Let’s consider another example in R3 . ⎡ Example 3.11.8 ⎤ 123 Let 𝐴 = ⎣ 2 4 6 ⎦. Solve the following systems and represent their solutions sets graphically. 369 #» (a) 𝐴 #» 𝑥 = 0 ⎡ ⎤ 6 (b) 𝐴 #» 𝑥 = ⎣ 12 ⎦ 18 ⎡ ⎤ −4 (c) 𝐴 #» 𝑥 = ⎣ −8 ⎦ −12 Solution: We solve all three systems simultaneously by row-reducing the “super-augmented matrix” ⎤ ⎡ 1 2 3 0 6 −4 ⎣ 2 4 6 0 12 −8 ⎦ . 3 6 9 0 18 −12 We perform the EROs −2𝑅1 + 𝑅2 and −3𝑅1 + 𝑅3 in RREF: ⎡ 1 2 3 0 6 ⎣ 0 0 0 0 0 0 0 0 0 0 to obtain the following matrix which is ⎤ −4 0 ⎦. 0 Therefore, the solution set to the first system, which is a homogeneous system, is ⎧⎡ ⎤ ⎫ ⎧⎡ ⎤ ⎡ ⎡ ⎤ ⎤⎫ −3 −3 ⎬ ⎨ −2 ⎬ ⎨ −2 𝑆 = ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R = Span ⎣ 1 ⎦ , ⎣ 0 ⎦ . ⎩ ⎭ ⎩ ⎭ 0 1 0 1 The solution to the second system is ⎧⎡ ⎤ ⎡ ⎤ ⎫ ⎡ ⎤ −2 −3 ⎨ 6 ⎬ 𝑇 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R . ⎭ ⎩ 0 0 1 94 Chapter 3 Systems of Linear Equations The solution to the third system is ⎧⎡ ⎤ ⎡ ⎤ ⎫ ⎡ ⎤ −2 −3 ⎨ −4 ⎬ 𝑈 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R . ⎩ ⎭ 0 0 1 We represent these solution sets graphically. The solution sets are labelled in the diagram We see that 𝑆 is a plane through the origin, which is to be expected because it is the solution set to a homogeneous system. The solution sets 𝑇 and 𝑈 are translations of the set 𝑆. For 𝑇 , every vector in 𝑆 has ⎡ ⎤ 6 been translated in the direction of ⎣ 0 ⎦, a particular solution to the system in (b). For 𝑈 , 0 ⎡ ⎤ −4 every vector in 𝑆 has been translated in the direction of ⎣ 0 ⎦, a particular solution to the 0 system in (c). In the previous two examples, we see that we can find the solution set 𝑈 by subtracting from every element in 𝑇 a particular solution of 𝑇 and then adding a particular solution of 𝑈 . Let’s focus on the sets in Example 3.11.8. ⎡ ⎤ ⎡ ⎤ −4 6 #» Let #» 𝑧 ∈ 𝑇 , and let #» 𝑢 𝑝 = ⎣ 0 ⎦ and 𝑡 𝑝 = ⎣ 0 ⎦ be particular solutions of 𝑈 and 𝑇 , 0 0 #» #» respectively. From Theorem 3.11.6 (Solutions to 𝐴 #» 𝑥 = 0 and 𝐴 #» 𝑥 = 𝑏 ), as #» 𝑧 ∈ 𝑇 , we know that #» #» 𝑧 = 𝑡 𝑝 + #» 𝑥 Section 3.11 Solution Sets to Systems of Linear Equations 95 for some #» 𝑥 ∈ 𝑆, the solution set of the associated homogeneous system. From above, we would express ⎡ ⎤ ⎡ ⎤ −2 −3 #» 𝑥 = ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 for some 𝑠, 𝑡 ∈ R. 0 1 #» #» As we can certainly write #» 𝑢 𝑝 = #» 𝑢 𝑝 − 𝑡 𝑝 + 𝑡 𝑝 and any vector #» 𝑢 ∈ 𝑈 as #» 𝑢 = #» 𝑢 𝑝 + #» 𝑥 for #» #» #» #» #» some 𝑥 ∈ 𝑆 by Theorem 3.11.6 (Solutions to 𝐴 𝑥 = 0 and 𝐴 𝑥 = 𝑏 ), we have 𝑈 = { #» 𝑢 𝑝 + #» 𝑥 : #» 𝑥 ∈ 𝑆} {︀(︀ #» }︀ #» #» )︀ = 𝑢 𝑝 − 𝑡 𝑝 + 𝑡 𝑝 + #» 𝑥 : #» 𝑥 ∈𝑆 (︀ )︀ #» )︀ (︀ #» = { #» 𝑢 𝑝 − 𝑡 𝑝 + 𝑡 𝑝 + #» 𝑥 : #» 𝑥 ∈ 𝑆} ⏟ ⏞ element of 𝑇 {︀(︀ #» }︀ #» )︀ #» #» = 𝑢𝑝 − 𝑡 𝑝 + 𝑧 : 𝑧 ∈ 𝑇 . 
This holds in general and provides a nice way to understand the relationships between parallel lines and between parallel planes. We formalize this discussion below. Corollary 3.11.9 #» (Solutions to 𝐴 #» 𝑥 = 𝑏 and 𝐴 #» 𝑥 = #» 𝑐) Consider the two consistent, non-homogeneous systems #» 𝐴 #» 𝑥 = 𝑏 , and 𝐴 #» 𝑥 = #» 𝑐, #» where 𝑏 ̸= #» 𝑐 . If 𝑆˜𝑏 and 𝑆˜𝑐 are their respective solution sets, with particular solutions 𝑥#»𝑏 and 𝑥#»𝑐 , respectively, then 𝑆˜𝑐 = {(𝑥#»𝑐 − 𝑥#»𝑏 ) + #» 𝑧 : #» 𝑧 ∈ 𝑆˜𝑏 }. We can use this result to help us find the solution set to a consistent, non-homogeneous system, provided we have a particular solution to the system and we have the solution set to another consistent, non-homogeneous system which shares the coefficient matrix. Note that finding a particular solution of a system is not always a simple task. Example 3.11.10 𝑥 = #» 𝑐 using the Using Example 3.11.8, solve the consistent non-homogeneous system 𝐴 #» #» #» solution to 𝐴 𝑥 = 𝑏 , where ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 123 6 1 #» 𝐴 = ⎣ 2 4 6 ⎦ and 𝑏 = ⎣ 12 ⎦ , #» 𝑐 = ⎣2⎦ . 369 18 3 [︀ ]︀𝑇 Solution: From Example 3.11.8, we have that the solution set to 𝐴 #» 𝑥 = 6 12 18 is ⎧⎡ ⎤ ⎡ ⎫ ⎤ ⎡ ⎤ −2 −3 ⎨ 6 ⎬ 𝑇 = 𝑆˜𝑏 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R . ⎩ ⎭ 0 0 1 [︀ ]︀𝑇 By inspection, we can see that 1 0 0 is a particular solution to 𝐴 #» 𝑥 = #» 𝑐. 96 Chapter 3 Systems of Linear Equations #» Therefore, by Corollary 3.11.9 (Solutions to 𝐴 #» 𝑥 = 𝑏 and 𝐴 #» 𝑥 = #» 𝑐 ), the solution set to #» #» 𝐴 𝑥 = 𝑐 is ⎡ ⎤ }︃ {︃ ⎛⎡ 1 ⎤ ⎡ 6 ⎤⎞ ⎡ 6 ⎤ ⎡ −2 ⎤ −3 𝑆˜𝑐 = ⎝⎣ 0 ⎦ − ⎣ 0 ⎦⎠ + ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R 0 0 0 0 1 ⏟ ⏞ #» 𝑧 ∈𝑆˜𝑏 Note that in the final solution above, simplifying the expression in the set gives us ⎧⎡ ⎤ ⎡ ⎤ ⎫ ⎧⎡ ⎤ ⎫ ⎡ ⎤ −2 −3 ⎨ 1 ⎬ ⎨ 1 ⎬ 𝑆˜𝑐 = ⎣ 0 ⎦ + ⎣ 1 ⎦ 𝑡 + ⎣ 0 ⎦ 𝑠 : 𝑡, 𝑠 ∈ R = ⎣ 0 ⎦ + #» 𝑥 : #» 𝑥 ∈𝑆 , ⎩ ⎭ ⎩ ⎭ 0 0 1 0 where 𝑆 is the solution set of the associated homogeneous system, which is in alignment #» #» with Theorem 3.11.6 (Solutions to 𝐴 #» 𝑥 = 0 and 𝐴 #» 𝑥 = 𝑏 ). Chapter 4 Matrices 4.1 The Column and Row Spaces of a Matrix In the previous chapter, we introduced the concept of a matrix, which was built from the coefficients of a system of equations. It turns out that matrices are interesting and powerful mathematical objects all by themselves. In this chapter, we will explore matrices further. We saw in Proposition 3.9.6 that the matrix–vector product 𝐴 #» 𝑥 is a linear combination of the columns of 𝐴. We give the set of all linear combinations of the columns of 𝐴 a name. Definition 4.1.1 Column Space Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the column space of 𝐴, denoted by Col(𝐴), to be the span of the columns of 𝐴. That is, Col(𝐴) = Span{𝑎#»1 , 𝑎#»2 , . . . , 𝑎# » 𝑛 }. Proposition 4.1.2 (Consistent System and Column Space) #» #» Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝑏 ∈ F𝑚 . The system of linear equations 𝐴 #» 𝑥 = 𝑏 is consistent #» if and only if 𝑏 ∈ Col(𝐴). Proof: We must prove the statement in both directions. We begin with the forward direc]︀𝑇 [︀ #» such that 𝐴 #» 𝑦 = 𝑏 . Thus, tion. Assume that there exists a vector #» 𝑦 = 𝑦1 𝑦2 · · · 𝑦𝑛 ]︀𝑇 [︀ #» #» ]︀ [︀ #» 𝑎1 𝑎2 · · · 𝑎# » 𝑦1 𝑦2 · · · 𝑦𝑛 = 𝑏 . 𝑛 Therefore, #» 𝑦1 𝑎#»1 + 𝑦2 𝑎#»2 + · · · + 𝑦𝑛 𝑎# » 𝑛 = 𝑏. #» #» Thus, 𝑏 is a linear combination of the columns of the matrix 𝐴, and so 𝑏 ∈ Col(𝐴). #» Now we prove the backward direction. Assume that 𝑏 ∈ Col(𝐴). Then #» 𝑏 = 𝑠1 𝑎#»1 + 𝑠2 𝑎#»2 + · · · + 𝑠𝑛 𝑎# » 𝑛 [︀ ]︀𝑇 #» for some 𝑠1 , 𝑠2 , . . . , 𝑠𝑛 ∈ F. Let 𝑠 = 𝑠1 𝑠2 · · · 𝑠𝑛 . Then #» 𝐴 #» 𝑠 = 𝑠1 𝑎#»1 + 𝑠2 𝑎#»2 + · · · + 𝑠𝑛 𝑎# » 𝑛 = 𝑏. 
#» #» Thus, 𝐴 #» 𝑠 = 𝑏 , and so #» 𝑠 is a solution to the system. Therefore, 𝐴 #» 𝑥 = 𝑏 is consistent. 97 98 Chapter 4 Matrices We now begin our examination of a similar space for the rows of the matrix. First, we need the notion of the transpose of a matrix. Definition 4.1.3 Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the transpose of 𝐴, denoted 𝐴𝑇 , by (𝐴𝑇 )𝑖𝑗 = (𝐴)𝑗𝑖 . Transpose If 𝐴 is an 𝑚 × 𝑛 matrix, then 𝐴𝑇 is an 𝑛 × 𝑚 matrix. It is formed by making all the rows of 𝐴 into the columns of 𝐴𝑇 in the order in which they appear. This definition is the motivation behind ]︀𝑇 the notation that we have used to write a vector [︀ from F𝑛 horizontally: #» 𝑣 = 𝑣1 . . . 𝑣𝑛 . ⎡ ⎤ ]︂ 1 4 1 2 −3 If 𝐴 = , then 𝐴𝑇 = ⎣ 2 −5 ⎦ . 4 −5 6 −3 6 [︂ Example 4.1.4 In Section 3.9, we introduced the idea of thinking of an 𝑚 × 𝑛 matrix 𝐴 as ⎡# » ⎤ row1 (𝐴) # »(𝐴) ⎥ ⎢ row 2 ⎢ ⎥ 𝐴=⎢ ⎥. .. ⎣ ⎦ . #row »(𝐴) 𝑚 In other words, we partition 𝐴 into 𝑚 row vectors, each with 𝑛 components. If we transpose these row vectors, we get vectors in R𝑛 . We use these vectors to define the row space of 𝐴. Definition 4.1.5 Row Space Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the row space of 𝐴, denoted by Row(𝐴), to be the span of the transposed rows of 𝐴. That is, {︁(︀ }︁ # »(𝐴))︀𝑇 , (︀row # »(𝐴))︀𝑇 , . . . , (︀row # »(𝐴))︀𝑇 . Row(𝐴) = Span row 1 2 𝑚 Notice that, if 𝐴 ∈ 𝑀𝑚×𝑛 (F), Row(𝐴) is a subset of F𝑛 while Col(𝐴) is a subset of F𝑚 . ⎡ Example 4.1.6 1 ⎢2 If 𝐴 = ⎢ ⎣3 4 4 3 2 1 ⎤ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 2 3 4 ⎬ ⎨ 1 ⎥ −1 ⎥ ⎣4⎦ , ⎣ 3 ⎦ , ⎣2⎦ , ⎣ 1 ⎦ . , then Row(𝐴) = Span 1 ⎦ ⎩ ⎭ 1 −1 1 −1 −1 REMARK Since the transposed rows of 𝐴 are the columns of 𝐴𝑇 , we see that Row(𝐴) = Col(𝐴𝑇 ). Applying EROs to a matrix does not affect the row space. We formalize this below. Section 4.2 Matrix Equality and Multiplication Proposition 4.1.7 99 Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F). If 𝐵 is row equivalent to 𝐴, then Row(𝐵) = Row(𝐴). Proof: Using the remark above, we note that Row(𝐴) = Col(𝐴𝑇 ) and Row(𝐵) = Col(𝐵 𝑇 ). Therefore, if we can show that Col(𝐵 𝑇 ) = Col(𝐴𝑇 ), then the proposition will be proved. We begin by considering the case where 𝐵 is obtained from 𝐴 by applying exactly one ERO. When we perform a single ERO of any type to 𝐴 to obtain 𝐵, each column of 𝐵 𝑇 will be a linear combination of the columns of 𝐴𝑇 . Therefore, Col(𝐵 𝑇 ) ⊆ Col(𝐴𝑇 ). As we know from Theorem 3.2. 18 (Elementary Operations), the operations which reverse EROs are themselves EROs. Therefore, we can obtain 𝐴 from 𝐵 by doing a single ERO. Thus, every column of 𝐴𝑇 is a linear combination of the columns of 𝐵 𝑇 . Therefore, Col(𝐴𝑇 ) ⊆ Col(𝐵 𝑇 ) and we have that Col(𝐵 𝑇 ) = Col(𝐴𝑇 ). In the case that 𝐵 is obtained from 𝐴 by applying multiple EROs, we can apply the above argument multiple times, once for each of the EROs. REMARK One might expect that a similar result would hold for the column space. That is, if 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), and if 𝐵 is row equivalent to 𝐴, [︂then ]︂Col(𝐵) = Col(𝐴). However, [︂this ]︂result 00 11 , but is row equivalent to 𝐴 = is not true. For example, the matrix 𝐵 = 11 00 {︂[︂ ]︂}︂ {︂[︂ ]︂}︂ 1 0 while Col(𝐴) = Span Col(𝐵) = Span . 0 1 4.2 Matrix Equality and Multiplication As a preliminary step toward discussing statements involving algebraic matrix operations, we establish what it means for matrices to be equal. Definition 4.2.1 Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝐵 ∈ 𝑀𝑝×𝑞 (F). We say that 𝐴 and 𝐵 are equal if Matrix Equality 1. 𝐴 and 𝐵 have the same size, that is, 𝑚 = 𝑝 and 𝑛 = 𝑞, and 2. The corresponding entries of 𝐴 and 𝐵 are equal, i.e., 𝑎𝑖𝑗 = 𝑏𝑖𝑗 , for all 𝑖 = 1, 2, . . . , 𝑚 and 𝑗 = 1, 2, . . . , 𝑛. 
We denote this by writing 𝐴 = 𝐵. While this definition seems to suggest that matrix equality should always be determined through exhaustive comparison of matrix entries, we can also uniquely identify matrices based on their action on other objects. Our next theorem gives a very useful criterion for 100 Chapter 4 Matrices checking that two given matrices of the same size are equal based on their behaviour in matrix-vector products. It requires a preliminary lemma, which is important in its own right. Lemma 4.2.2 (Column Extraction) [︀ ]︀ #» #» Let 𝐴 = 𝑎#»1 · · · 𝑎# » 𝑛 ∈ 𝑀𝑚×𝑛 (F). Then 𝐴 𝑒𝑖 = 𝑎𝑖 for all 𝑖 = 1, . . . , 𝑛. In other words, the matrix-vector multiplication of 𝐴 and the 𝑖𝑡ℎ standard basis vector 𝑒#»𝑖 yields the 𝑖𝑡ℎ column of 𝐴. Proof: According to the definition of matrix-vector multiplication, #» 𝐴𝑒#»1 = 1 · 𝑎#»1 + 0 · 𝑎#»2 + 0 · 𝑎#»3 + · · · + 0 · 𝑎# » 𝑛 = 𝑎1 , #» 𝐴𝑒#»2 = 0 · 𝑎#»1 + 1 · 𝑎#»2 + 0 · 𝑎#»3 + · · · + 0 · 𝑎# » 𝑛 = 𝑎2 , .. . #» #» #» #» 𝐴𝑒𝑛 = 0 · 𝑎1 + 0 · 𝑎2 + 0 · 𝑎#»3 + · · · + 1 · 𝑎# » 𝑛 = 𝑎𝑛 . Theorem 4.2.3 (Equality of Matrices) Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F). Then 𝐴 = 𝐵 if and only if 𝐴 #» 𝑥 = 𝐵 #» 𝑥 for all #» 𝑥 ∈ F𝑛 . Proof: If 𝐴 = 𝐵, then it is clear that 𝐴 #» 𝑥 = 𝐵 #» 𝑥 for all #» 𝑥 ∈ F𝑛 . [︁ ]︁ [︀ ]︀ #» #» #» #» #» 𝑛 Now, let 𝐴 = 𝑎#»1 · · · 𝑎# » 𝑛 , 𝐵 = 𝑏1 · · · 𝑏𝑛 , and suppose that 𝐴 𝑥 = 𝐵 𝑥 for all 𝑥 ∈ F . Then we can take #» 𝑥 equal to the 𝑖𝑡ℎ standard basis vector 𝑒#» and conclude that 𝐴𝑒#» = 𝐵 𝑒#» 𝑖 𝑖 𝑖 for all 𝑖 = 1, . . . , 𝑛. But then it follows from Lemma 4.2.2 (Column Extraction) that #» ⃗𝑎𝑖 = 𝐴𝑒#»𝑖 = 𝐵 𝑒#»𝑖 = 𝑏𝑖 . Since the columns of 𝐴 and 𝐵 are the same, we conclude that the matrices are equal. Recall that we can calculate the matrix–vector product 𝐴 #» 𝑥 provided the sizes of 𝐴 and #» 𝑥 #» 𝑛 are compatible: we need 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝑥 ∈ F . We extend this process to multiplying several column vectors, of the correct size, by the same matrix simultaneously. Definition 4.2.4 Matrix Multiplication Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝐵 ∈ 𝑀𝑛×𝑝 (F). We define the matrix product 𝐴𝐵 = 𝐶 to be the matrix 𝐶 ∈ 𝑀𝑚×𝑝 (F), constructed as follows: [︁ #» #» #» ]︁ [︁ #» #» #» ]︁ 𝐶 = 𝐴𝐵 = 𝐴 𝑏1 𝑏2 · · · 𝑏𝑝 = 𝐴𝑏1 𝐴𝑏2 · · · 𝐴𝑏𝑝 . That is, the 𝑗 𝑡ℎ column of 𝐶, 𝑐#»𝑗 , is obtained by multiplying the matrix 𝐴 by the 𝑗 𝑡ℎ column of the matrix 𝐵: #» 𝑐#»𝑗 = 𝐴𝑏𝑗 , for all 𝑗 = 1, . . . , 𝑝. Section 4.2 Matrix Equality and Multiplication 101 Thus, we can construct the product 𝐴𝐵, column by column, by calculating a sequence of #» matrix–vector products 𝐴𝑏𝑗 . REMARKS • In calculating the product 𝐴𝐵, the number of columns of the first matrix, 𝐴, must be equal to the number of rows of the second matrix, 𝐵. If these values do not match, then the product is undefined. • Matrix multiplication is generally non-commutative, so the order of multiplication matters. In other words, for matrices 𝐴 and 𝐵, we are not guaranteed that 𝐴𝐵 = 𝐵𝐴. • We can immediately connect the definition of matrix multiplication to our understanding of column spaces. As the 𝑗 𝑡ℎ column of 𝐶 = 𝐴𝐵 is obtained by multiplying the matrix 𝐴 by the 𝑗 𝑡ℎ column of the matrix 𝐵, the 𝑗 𝑡ℎ column of 𝐶 must be a linear combination of the columns of 𝐴 (by Proposition 3.9.6 (Matrix–Vector Multiplication in Terms of Columns)), and therefore is a member of Col(𝐴). ⎡ Example 4.2.5 ⎤ ]︂ [︂ 12 −1 3 , calculate the products 𝐴𝐵 and 𝐵𝐴 if possible. Given 𝐴 = ⎣ 3 5 ⎦ and 𝐵 = 2 −4 87 Solution: First, we note that the number of columns of 𝐴 is equal to the number of rows of 𝐵 and so the product is defined. 
Also, since 𝐴 has three rows and 𝐵 has two columns, then the size of the product will be 3 × 2. We have coloured the entries of 𝐴 blue and the entries of 𝐵 red to emphasize the role their entries play in calculating the product. ⎤ ⎤ ⎤ ⎡ ⎤ ⎡⎡ ⎡ ]︂ ]︂ ]︂ 1 2 [︂ 1 2 [︂ 1 2 [︂ ⎣3 5⎦ 3 ⎦ ⎣ 3 5 ⎦ −1 3 = ⎣⎣ 3 5 ⎦ −1 −4 2 2 −4 87 87 87 ⎡ ⎤ (1)(−1) + (2)(2) (1)(3) + (2)(−4) = ⎣ (3)(−1) + (5)(2) (3)(3) + (5)(−4) ⎦ (8)(−1) + (7)(2) (8)(3) + (7)(−4) ⎡ ⎤ 3 −5 = ⎣ 7 −11 ⎦ . 6 −4 The product 𝐵𝐴 is undefined, since 𝐵 has 2 columns, 𝐴 has 3 rows and 2 ̸= 3. [︂ Example 4.2.6 2 − 𝑖 1 − 2𝑖 Given 𝐴 = 3 − 2𝑖 1 − 3𝑖 𝐵𝐴 if possible. ]︂ [︂ and 𝐵 = ]︂ 1 + 𝑖 −2 + 𝑖 , calculate the products 𝐴𝐵 and −1 + 𝑖 3 + 2𝑖 Solution: First, we note that the number of columns of 𝐴 is equal to the number of rows of 𝐵 and so the product is defined. Also, since 𝐴 has two rows and 𝐵 has two columns, then the size of the product will be 2 × 2. 102 Chapter 4 Matrices We have coloured the entries of 𝐴 blue and the entries of 𝐵 red to emphasize the role their entries play in calculating the product. [︂ ]︂[︂ ]︂ 2 − 𝑖 1 − 2𝑖 1 + 𝑖 −2 + 𝑖 3 − 2𝑖 1 − 3𝑖 −1 + 𝑖 3 + 2𝑖 [︂ [︂ ]︂[︂ ]︂ [︂ ]︂[︂ ]︂ ]︂ 2 − 𝑖 1 − 2𝑖 1+𝑖 2 − 𝑖 1 − 2𝑖 −2 + 𝑖 = 3 − 2𝑖 1 − 3𝑖 −1 + 𝑖 3 − 2𝑖 1 − 3𝑖 3 + 2𝑖 [︂ ]︂ (2 − 𝑖)(1 + 𝑖) + (1 − 2𝑖)(−1 + 𝑖) (2 − 𝑖)(−2 + 𝑖) + (1 − 2𝑖)(3 + 2𝑖) = (3 − 2𝑖)(1 + 𝑖) + (1 − 3𝑖)(−1 + 𝑖) (3 − 2𝑖)(−2 + 𝑖) + (1 − 3𝑖)(3 + 2𝑖) [︂ ]︂ 4 + 4𝑖 4 = . 7 + 5𝑖 5 The product 𝐵𝐴 is also defined since the number of columns of 𝐵 is equal to the number of rows of 𝐴. [︂ ]︂[︂ ]︂ 1 + 𝑖 −2 + 𝑖 2 − 𝑖 1 − 2𝑖 −1 + 𝑖 3 + 2𝑖 3 − 2𝑖 1 − 3𝑖 [︂ [︂ ]︂[︂ ]︂ [︂ ]︂[︂ ]︂ ]︂ 1 + 𝑖 −2 + 𝑖 2−𝑖 1 + 𝑖 −2 + 𝑖 1 − 2𝑖 = −1 + 𝑖 3 + 2𝑖 3 − 2𝑖 −1 + 𝑖 3 + 2𝑖 1 − 3𝑖 ]︂ [︂ (1 + 𝑖)(2 − 𝑖) + (−2 + 𝑖)(3 − 2𝑖) (1 + 𝑖)(1 − 2𝑖) + (−2 + 𝑖)(1 − 3𝑖) = (−1 + 𝑖)(2 − 𝑖) + (3 + 2𝑖)(3 − 2𝑖) (−1 + 𝑖)(1 − 2𝑖) + (3 + 2𝑖)(1 − 3𝑖) ]︂ [︂ −1 + 8𝑖 4 + 6𝑖 . = 12 + 3𝑖 10 − 4𝑖 Notice that even though 𝐴𝐵 and 𝐵𝐴 are both defined, we have that 𝐴𝐵 ̸= 𝐵𝐴. In practice, we usually construct the product 𝐴𝐵 = 𝐶 one entry at a time. The entry (𝐶)𝑖𝑗 is in the 𝑖𝑡ℎ row and the 𝑗 𝑡ℎ column of the product. Therefore, (𝐶)𝑖𝑗 = (𝑐#»𝑗 )𝑖 , where #» #» 𝑐#»𝑗 = 𝐴𝑏𝑗 . For 𝐴 ∈ 𝑀𝑚×𝑛 (F), 𝐵 ∈ 𝑀𝑛×𝑝 (F), and 𝐶 ∈ 𝑀𝑚×𝑝 (F), the 𝑖𝑡ℎ entry of 𝐴𝑏𝑗 is 𝑛 ∑︁ 𝑎𝑖𝑘 𝑏𝑘𝑗 = 𝑎𝑖1 𝑏1𝑗 + 𝑎𝑖2 𝑏2𝑗 + · · · + 𝑎𝑖𝑛 𝑏𝑛𝑗 . 𝑘=1 Thus, 𝑛 ∑︁ #» (𝐶)𝑖𝑗 = (𝐴𝑏𝑗 )𝑖 = 𝑎𝑖𝑘 𝑏𝑘𝑗 . 𝑘=1 Consequently, in order to obtain the (𝑖, 𝑗)𝑡ℎ entry of 𝐴𝐵, we need to multiply the corresponding entries of the 𝑖𝑡ℎ row of 𝐴 and the 𝑗 𝑡ℎ column of 𝐵 and sum them up. We can re-write this last expression in a more suggestive manner: ⎡ ⎤ ⎡ ⎤ 𝑎𝑖1 𝑏1𝑗 ⎢ 𝑎𝑖2 ⎥ ⎢ 𝑏2𝑗 ⎥ ⎢ ⎥ ⎢ ⎥ (𝐶)𝑖𝑗 = ⎢ . ⎥ · ⎢ . ⎥ . ⎣ .. ⎦ ⎣ .. ⎦ 𝑎𝑖𝑛 𝑏𝑛𝑗 Section 4.3 Arithmetic Operations on Matrices 103 REMARK The equation above shows that the (𝑖, 𝑗)𝑡ℎ entry of 𝐴𝐵 may be obtained by taking the dot product of the (transposed) 𝑖𝑡ℎ row of 𝐴 with the 𝑗 𝑡ℎ column of 𝐵. Example 4.2.7 Determine the (2, 2)𝑡ℎ entry of the product ⎡ 1 ⎣ 𝐴𝐵 = 𝐶 = 3 8 ⎤ ]︂ 2 [︂ −1 3 ⎦ 5 . 2 −4 7 Solution: We use the second row of 𝐴 and the second column of 𝐵 to calculate [︂ ]︂ [︂ ]︂ [︂ ]︂ [︀ ]︀𝑇 3 3 3 𝑐22 = 3 5 · = · = 9 − 20 = −11. −4 5 −4 Example 4.2.8 Determine the (2, 1)𝑡ℎ entry of the product ]︂ ]︂[︂ [︂ 1 + 𝑖 −2 + 𝑖 2 − 𝑖 1 − 2𝑖 . 𝐶𝐷 = 𝐸 = 3 − 2𝑖 1 − 3𝑖 −1 + 𝑖 3 + 2𝑖 Solution: We use the second row of 𝐶 and the first column of 𝐷 to calculate ]︂ [︂ [︀ ]︀𝑇 1+𝑖 𝑒21 = 3 − 2𝑖 1 − 3𝑖 · −1 + 𝑖 [︂ ]︂ [︂ ]︂ 3 − 2𝑖 1+𝑖 = · 1 − 3𝑖 −1 + 𝑖 = (3 − 2𝑖)(1 + 𝑖) + (1 − 3𝑖)(−1 + 𝑖) = 7 + 5𝑖. 
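The entrywise description above is exactly how the product would be computed by hand or in code. The following Python/NumPy sketch (an illustration only) rebuilds the product from Example 4.2.5 one entry at a time, using the formula for the (i, j)-th entry, and compares the result with the built-in matrix product.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 5.],
              [8., 7.]])
B = np.array([[-1.,  3.],
              [ 2., -4.]])

m, n = A.shape
n2, p = B.shape
assert n == n2                       # columns of A must match rows of B

# Build C = AB one entry at a time: (C)_ij = sum_k a_ik * b_kj
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

print(C)                             # [[ 3. -5.] [ 7. -11.] [ 6. -4.]], as in Example 4.2.5
print(np.allclose(C, A @ B))         # True: agrees with the built-in product
```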
4.3 Arithmetic Operations on Matrices In this section, we will introduce additional operations that can be performed on matrices. We will also see that, while the sets 𝑀𝑚×𝑛 (R) and 𝑀𝑚×𝑛 (C) are very similar to R𝑛 and C𝑛 , in certain respects, there also exist quite a few important distinctions between them. Definition 4.3.1 Sum of Matrices Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F). We define the matrix sum 𝐴 + 𝐵 = 𝐶 to be the matrix 𝐶 ∈ 𝑀𝑚×𝑛 (F) whose (𝑖, 𝑗)𝑡ℎ entry is 𝑐𝑖𝑗 = 𝑎𝑖𝑗 + 𝑏𝑖𝑗 , for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛. 104 Chapter 4 Matrices REMARK Addition is not defined for matrices that do not have the same size. [︂ Example 4.3.2 ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 02 1 0 02 1 0 1 2 For 𝐴 = and 𝐵 = , we have the sum 𝐴 + 𝐵 = + = . 30 0 −4 30 0 −4 3 −4 Some basic properties of matrix addition and matrix multiplication are given in the next two propositions. Proposition 4.3.3 (Matrix Addition) If 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑚×𝑛 (F), then (a) 𝐴 + 𝐵 = 𝐵 + 𝐴. (b) 𝐴 + 𝐵 + 𝐶 = (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶). Proposition 4.3.4 (Matrix Multiplication) If 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), 𝐶, 𝐷 ∈ 𝑀𝑛×𝑝 (F) and 𝐸 ∈ 𝑀𝑝×𝑞 (F), then (a) (𝐴 + 𝐵)𝐶 = 𝐴𝐶 + 𝐵𝐶. (b) 𝐴(𝐶 + 𝐷) = 𝐴𝐶 + 𝐴𝐷. (c) (𝐴𝐶)𝐸 = 𝐴(𝐶𝐸) = 𝐴𝐶𝐸. EXERCISE Prove the previous two Propositions. Definition 4.3.5 Additive Inverse Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). We define the additive inverse of 𝐴 to be the matrix −𝐴 whose (𝑖, 𝑗)𝑡ℎ entry is −𝑎𝑖𝑗 for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛. [︂ Example 4.3.6 Definition 4.3.7 Zero Matrix ]︂ [︂ ]︂ 1 2 −1 −2 If 𝐴 = , then the additive inverse of 𝐴 is −𝐴 = . 3 −4 −3 4 The 𝑚 × 𝑛 zero matrix is the matrix 𝒪𝑚×𝑛 ∈ 𝑀𝑚×𝑛 (F) all of whose entries are 0. Section 4.3 Arithmetic Operations on Matrices 105 REMARK 1. It is important to distinguish the zero matrix 𝒪𝑚×𝑛 from the zero scalar 0 and the zero vector ⃗0. 2. Usually, the context in which the zero matrix is used is sufficient for us to determine its size, in which case we, denote it by 𝒪. [︂ Example 4.3.8 We have 𝒪2×3 = ]︂ 000 and 𝒪1×1 = [0]. Note that 𝒪1×1 is technically different from the 000 zero scalar 0. On the other hand, 𝒪𝑚×1 Proposition 4.3.9 ⎡ ⎤ 0 ⎢ .. ⎥ = ⎣ . ⎦ is the zero vector in F𝑚 . 0 (Properties of the Additive Inverse and the Zero Matrix) If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then (a) 𝒪 + 𝐴 = 𝐴 + 𝒪 = 𝐴. (b) 𝐴 + (−𝐴) = (−𝐴) + 𝐴 = 𝒪. EXERCISE Prove the previous Proposition. Definition 4.3.10 Multiplication of a Matrix by a Scalar Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and 𝑐 ∈ F. We define the product of 𝑐 and 𝐴 to be the matrix 𝑐𝐴 ∈ 𝑀𝑚×𝑛 (F) whose (𝑖, 𝑗)𝑡ℎ entry is (𝑐𝐴)𝑖𝑗 = 𝑐𝑎𝑖𝑗 for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛. Notice that the product (−1)𝐴 (of the scalar −1 and the matrix 𝐴) is −𝐴, the additive inverse of 𝐴. That is, (−1)𝐴 = −𝐴. So, our notational choices are compatible. The following proposition gives some additional properties of scalar multiplication. Proposition 4.3.11 (Properties of Multiplication of a Matrix by a Scalar) If 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), 𝐶 ∈ 𝑀𝑛×𝑘 (F), and 𝑟, 𝑠 ∈ F, then (a) 𝑠(𝐴 + 𝐵) = 𝑠𝐴 + 𝑠𝐵. 106 Chapter 4 Matrices (b) (𝑟 + 𝑠)𝐴 = 𝑟𝐴 + 𝑠𝐴. (c) 𝑟(𝑠𝐴) = (𝑟𝑠)𝐴. (d) 𝑠(𝐴𝐶) = (𝑠𝐴)𝐶 = 𝐴(𝑠𝐶). Thus, we may write 𝑠𝐴𝐶 for any of these three quantities unambiguously. EXERCISE Prove the previous Proposition. Our previous results demonstrate that matrix operations behave like their scalar counterparts. However, there are significant differences. For instance, while it is true that 𝑎𝑏 = 𝑏𝑎 for all 𝑎, 𝑏 ∈ F, we have already seen an example of matrices 𝐴 and 𝐵 where 𝐴𝐵 ̸= 𝐵𝐴, even when both 𝐴𝐵 and 𝐵𝐴 are defined. (See Example 4.2.6.) Two other results that apply to scalars which students are tempted to apply to matrices are the following: 1. 
If 𝑎𝑏 = 𝑎𝑐 and 𝑎 ̸= 0, then 𝑏 = 𝑐 (cancellation law). 2. 𝑎𝑏 = 0 if and only if 𝑎 = 0 or 𝑏 = 0. These results do not extend to matrix arithmetic, in general. ]︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ 34 25 11 01 and 𝐴 ̸= 𝒪, but . Then 𝐴𝐵 = 𝐴𝐶 = , and 𝐶 = ,𝐵= 68 34 34 02 𝐵 ̸= 𝐶. Thus, the cancellation law does not apply to matrices. [︂ ]︂ [︂ ]︂ 01 37 2. Let 𝐴 = and 𝐵 = . Then 𝐴𝐵 = 𝒪, but 𝐴 ̸= 𝒪 and 𝐵 ̸= 𝒪. 02 00 [︂ Example 4.3.12 1. Let 𝐴 = In Section 4.1 we were introduced to the transpose 𝐴𝑇 of a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F). Recall that (𝐴𝑇 )𝑖𝑗 = (𝐴)𝑗𝑖 . Our next Proposition describes the interplay between transpose and the other operations on matrices. Proposition 4.3.13 (Properties of Matrix Transpose) If 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (F), 𝐶 ∈ 𝑀𝑛×𝑘 (F), and 𝑠 ∈ F, then (a) (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇 . (b) (𝑠𝐴)𝑇 = 𝑠𝐴𝑇 . (c) (𝐴𝐶)𝑇 = 𝐶 𝑇 𝐴𝑇 . (d) (𝐴𝑇 )𝑇 = 𝐴. Section 4.4 Properties of Square Matrices 107 REMARK Pay close attention to Property (c). The transpose of a product is the product of the transposes in reverse! EXERCISE Prove the previous Proposition. 4.4 Properties of Square Matrices We turn our attention to matrices that have the same number of rows as columns. Definition 4.4.1 Square Matrix A matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) where the number of rows is equal to the number of columns is called a square matrix . REMARK For the case where 𝑚 = 𝑛, it is common to refer to 𝑀𝑛×𝑛 (F) as 𝑀𝑛 (F). ]︂ 321 is not a square matrix, since it has 2 rows and 3 columns. The matrix The matrix 111 ]︂ [︂ −2 + 𝑖 3 − 2𝑖 is a square matrix, since it has 2 rows and 2 columns. 2 + 2𝑖 6 − 7𝑖 [︂ Example 4.4.2 Definition 4.4.3 Upper Triangular Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐴 is upper triangular if 𝑎𝑖𝑗 = 0 for 𝑖 > 𝑗 with 𝑖 = 1, . . . , 𝑛 and 𝑗 = 1, . . . , 𝑛. ⎤ ⎡ ⎤ 3 −4 6 𝑖 4 − 𝑖 6 − 4𝑖 1 + 𝑖 −7 The matrices , ⎣ 0 5 9 ⎦ , and ⎣ 0 1 + 𝑖 7𝑖 ⎦ are upper-triangular. 0 1−𝑖 0 0 7 0 0 2 − 4𝑖 [︂ Example 4.4.4 Definition 4.4.5 Lower Triangular ⎡ Let 𝐵 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐵 is lower triangular if 𝑏𝑖𝑗 = 0 for 𝑖 < 𝑗 with 𝑖 = 1, . . . , 𝑛 and 𝑗 = 1, . . . , 𝑛. ⎤ 𝑖 0 0 0 300 ⎢ 7𝑖 1 + 𝑖 0 𝑖 0 0 ⎥ ⎥ are lower triangular. The matrices , ⎣ 3 4 0 ⎦ , and ⎢ ⎣ 4 1−𝑖 6 2 + 𝑖 2 − 4𝑖 0 ⎦ 070 3 − 5𝑖 5 0 4 + 3𝑖 [︂ Example 4.4.6 ]︂ ]︂ ⎡ ⎤ ⎡ 108 Chapter 4 Matrices REMARKS • The transpose of an upper (lower) triangular matrix is a lower (upper) triangular matrix, respectively. • The product of upper (lower) triangular 𝑛 × 𝑛 matrices is upper (lower) triangular, respectively. Definition 4.4.7 Diagonal We say that an 𝑛 × 𝑛 matrix 𝐴 is diagonal if 𝑎𝑖𝑗 = 0 for 𝑖 ̸= 𝑗 with 𝑖 = 1, . . . , 𝑛 and 𝑗 = 1, . . . , 𝑛. We refer to the entries 𝑎𝑖𝑖 as the diagonal entries of 𝐴, and denote our matrix 𝐴 by 𝐴 = diag(𝑎11 , 𝑎22 , . . . , 𝑎𝑛𝑛 ). ⎤ 𝑖 0 0 0 300 ⎢0 1 + 𝑖 0 𝑖 0 0 ⎥ ⎥ are diagonal. The matrices , ⎣ 0 4 0 ⎦ , and ⎢ ⎣ 0 1−𝑖 0 0 2 − 4𝑖 0 ⎦ 000 0 0 0 4 + 3𝑖 [︂ Example 4.4.8 ]︂ ⎤ ⎡ ⎡ ⎡ Example 4.4.9 ⎤ 1 0 0 If 𝐶 = diag(1, −2, 3), then 𝐶 = ⎣ 0 −2 0 ⎦. 0 0 3 REMARKS • Perhaps the most simple example of a diagonal matrix is the 𝑛 × 𝑛 zero matrix 𝒪𝑛×𝑛 . • All diagonal matrices are also upper triangular and lower triangular matrices. Another simple (and important) example is the identity matrix, which we define as follows. Definition 4.4.10 Identity Matrix The diagonal matrix diag(1, 1, . . . , 1) is called the identity matrix, and is denoted by 𝐼. If we wish to indicate that the size of the identity matrix is 𝑛 × 𝑛, we add a subscript 𝑛 and write 𝐼𝑛 . ⎡ ⎤ 1 100 ⎢0 10 𝐼1 = [1], 𝐼2 = , 𝐼3 = ⎣ 0 1 0 ⎦ , and 𝐼4 = ⎢ ⎣0 01 001 0 [︂ Example 4.4.11 ]︂ ⎡ 0 1 0 0 0 0 1 0 ⎤ 0 0⎥ ⎥. 
0⎦ 1 Section 4.5 Elementary Matrices 109 REMARK The identity matrix behaves as a multiplicative identity: for any 𝐴 ∈ 𝑀𝑚×𝑛 (F) we have that 𝐼𝑚 𝐴 = 𝐴 and 𝐴𝐼𝑛 = 𝐴. In particular, this also holds for vectors in F𝑛 , so 𝐼 #» 𝑥 = #» 𝑥 for all #» 𝑥 ∈ F𝑛 . 𝑛 When the identity matrix is multiplied by a scalar 𝑐 ∈ F, the result is a diagonal matrix whose diagonal entries are all equal to 𝑐: 𝑐𝐼𝑛 = diag(𝑐, 𝑐, . . . , 𝑐). [︂ Example 4.4.12 ]︂ 12 If 𝐴 = , then 𝐴𝐼2 = 𝐼2 𝐴 = 𝐴. 34 ]︂ [︂ 123 , then 𝐵𝐼3 = 𝐼2 𝐵 = 𝐵. If 𝐵 = 340 ]︂ 𝑐0 . If 𝑐 ∈ F, then 𝑐𝐼2 = 0𝑐 [︂ Example 4.4.13 4.5 Elementary Matrices In this section we will show that performing an elementary row operation (ERO) on a matrix is equivalent to multiplying that matrix on the left by a square matrix of a certain type. Definition 4.5.1 Elementary Matrices A matrix that can be obtained by performing a single ERO on the identity matrix is called an elementary matrix. If we wish to be more precise, we refer to elementary matrices using the same names as we have for EROs. Example 4.5.2 The following matrices are examples of elementary matrices: ⎡ ⎤ 001 • 𝐸1 = ⎣ 0 1 0 ⎦ is a row swap elementary matrix, since performing 𝑅1 ↔ 𝑅3 on 𝐼3 will 100 produce 𝐸1 . [︂ ]︂ 50 • 𝐸2 = is a row scale elementary matrix, since performing 𝑅1 → 5𝑅1 on 𝐼2 will 01 produce 𝐸2 . 110 Chapter 4 ⎡ 1 • 𝐸3 = ⎣ 0 0 𝑅2 on 𝐼3 Matrices ⎤ 0 0 1 −2 ⎦ is a row addition elementary matrix, since performing 𝑅2 → −2𝑅3 + 0 1 will produce 𝐸3 . Elementary matrices are useful because they can be used to carry out EROs. The next two results make this statement precise. Proposition 4.5.3 Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and suppose that a single ERO is performed on it to produce matrix 𝐵. Suppose, also, that we perform the same ERO on the matrix 𝐼𝑚 to produce the elementary matrix 𝐸. Then 𝐵 = 𝐸𝐴. Proof: We will prove the result only for row swap elementary matrices and leave the rest as an exercise. Let 𝐴 be an 𝑚 × 𝑛 matrix with 𝑚 > 1, and consider its transpose 𝐴𝑇 . The »𝑇 , where 𝑟#» = row # »(𝐴) is the 𝑖𝑡ℎ row of 𝐴. Thus columns of 𝐴𝑇 are 𝑟#»1 𝑇 , . . . , 𝑟#𝑚 𝑖 𝑖 [︀ ]︀ 𝐴𝑇 = 𝑟#»𝑇 · · · 𝑟#»𝑇 · · · 𝑟#»𝑇 · · · 𝑟# »𝑇 . 1 𝑖 𝑗 𝑚 Interchanging rows 𝑖 and 𝑗 of 𝐴 yields the matrix 𝐵 whose transpose is [︀ »𝑇 ]︀ . 𝐵 𝑇 = 𝑟#»1 𝑇 · · · 𝑟#»𝑗 𝑇 · · · 𝑟#»𝑖 𝑇 · · · 𝑟#𝑚 On the other hand, let [︀ » ]︀ 𝐸 = 𝑒#»1 · · · 𝑒#»𝑗 · · · 𝑒#»𝑖 · · · 𝑒#𝑚 be the elementary matrix that corresponds to the row swap 𝑅𝑖 ↔ 𝑅𝑗 . One can quickly show that 𝐸 = 𝐸 𝑇 ; it then follows from Lemma 4.2.2 (Column Extraction) that (𝐸𝐴)𝑇 = 𝐴𝑇 𝐸 𝑇 = 𝐴𝑇 𝐸 [︀ » ]︀ = 𝐴𝑇 𝑒#»1 · · · 𝑒#»𝑗 · · · 𝑒#»𝑖 · · · 𝑒#𝑚 [︀ » ]︀ = 𝐴𝑇 𝑒#»1 · · · 𝐴𝑇 𝑒#»𝑗 · · · 𝐴𝑇 𝑒#»𝑖 · · · 𝐴𝑇 𝑒#𝑚 [︀ »𝑇 ]︀ = 𝑟#»1 𝑇 · · · 𝑟#»𝑗 𝑇 · · · 𝑟#»𝑖 𝑇 · · · 𝑟#𝑚 = 𝐵𝑇 Since 𝐵 𝑇 = (𝐸𝐴)𝑇 , we see that 𝐵 = 𝐸𝐴 by Property (d) in Proposition 4.3.13 (Properties of Matrix Transpose). The previous Proposition tells us that we can perform an ERO on 𝐴 by multiplying 𝐴 on the left by the appropriate elementary matrix 𝐸. We can also carry out a sequence of EROs by multiplying 𝐴 on the left by a sequence of appropriate elementary matrices. Corollary 4.5.4 Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and suppose that a finite number of EROs, numbered 1 through 𝑘, are performed on 𝐴 to produce a matrix 𝐵. Let 𝐸𝑖 denote the elementary matrix corresponding to the 𝑖𝑡ℎ ERO (1 ≤ 𝑖 ≤ 𝑘) applied to 𝐼𝑚 . Then 𝐵 = 𝐸𝑘 . . . 𝐸2 𝐸1 𝐴. Section 4.5 Elementary Matrices 111 Proof: Use the Principle of Mathematical Induction on 𝑘 and Proposition 4.5.3. Pay attention to the order in which the elementary matrices appear in this product. 
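Proposition 4.5.3 is easy to spot-check numerically. In the Python/NumPy sketch below (an illustration only), an elementary matrix is produced by performing an ERO on the identity, and multiplying by it on the left is compared with performing the same ERO directly; the matrix used is the augmented matrix revisited in Example 4.5.5 below.

```python
import numpy as np

# The augmented matrix B from Example 3.7.1 (revisited in Example 4.5.5 below)
B = np.array([[1., -2., -1.,  3., 1.],
              [2., -4.,  1.,  0., 5.],
              [1., -2.,  2., -3., 4.]])

# Elementary matrix for R2 -> -2*R1 + R2: perform the ERO on I_3
E1 = np.eye(3)
E1[1, :] += -2 * E1[0, :]

# Performing the ERO directly on B ...
C = B.copy()
C[1, :] += -2 * C[0, :]

# ... agrees with multiplying by E1 on the left (Proposition 4.5.3)
print(np.allclose(C, E1 @ B))        # True
```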
Example 4.5.5 Let us revisit Example 3.7.1 in Section ⎡ 1 𝐵=⎣ 2 1 3.7. The augmented matrix, 𝐵, was given by ⎤ −2 −1 3 1 −4 1 0 5 ⎦. −2 2 −3 4 We perform the following EROs: ⎡ 1 −2 0 gives ⎣ 0 𝑅3 → −𝑅1 + 𝑅3 0 0 ⎡ 1 −2 −1 0 1 𝑅2 → 31 𝑅2 gives ⎣ 0 0 0 3 ⎡ 1 −2 0 𝑅3 → −3𝑅2 + 𝑅3 gives ⎣ 0 0 0 {︃ 𝑅2 → −2𝑅1 + 𝑅2 ⎤ −1 3 1 3 −6 3 ⎦ = 𝐶, 3 −6 3 ⎤ 3 1 −2 1 ⎦ = 𝐷, −6 3 ⎤ −1 3 1 1 −2 1 ⎦ = 𝐹. 0 0 0 For each of the four EROs, we give the corresponding elementary matrix: ⎡ ⎤ 1 00 𝑅2 → −2𝑅1 + 𝑅2 gives ⎣ −2 1 0 ⎦ = 𝐸1 , 0 01 ⎡ ⎤ 1 00 𝑅3 → −𝑅1 + 𝑅3 gives ⎣ 0 1 0 ⎦ = 𝐸2 . −1 0 1 Notice that 𝐶 = 𝐸2 𝐸1 𝐵. Further, ⎡ ⎤ 1 0 0 𝑅2 → 13 𝑅2 gives ⎣ 0 31 0 ⎦ = 𝐸3 , 0 0 1 ⎡ ⎤ 1 0 0 𝑅3 → −3𝑅2 + 𝑅3 gives ⎣ 0 1 0 ⎦ = 𝐸4 . 0 −3 1 Notice that 𝐷 = 𝐸3 𝐶 and 𝐹 = 𝐸4 𝐷. We can now verify our result by evaluating the product 𝐸4 𝐸3 𝐸2 𝐸1 𝐵 = 𝐸4 (𝐸3 (𝐸2 (𝐸1 𝐵))): 112 Chapter 4 Matrices ⎡ ⎤ ⎤⎡ ⎤⎡ ⎤⎡ ⎤⎡ 1 0 0 1 0 0 1 00 1 00 1 −2 −1 3 1 ⎣ 0 1 0 ⎦ ⎣ 0 1 0 ⎦ ⎣ 0 1 0 ⎦ ⎣ −2 1 0 ⎦ ⎣ 2 −4 1 0 5 ⎦ 3 0 −3 1 0 0 1 1 −2 2 −3 4 −1 0 1 0 01 ⎡ ⎤⎡ ⎤ ⎤⎡ ⎤⎡ 1 0 0 1 0 0 1 00 1 −2 −1 3 1 0 3 −6 3 ⎦ = ⎣ 0 1 0 ⎦ ⎣ 0 13 0 ⎦ ⎣ 0 1 0 ⎦ ⎣ 0 0 −3 1 1 −2 2 −3 4 0 0 1 −1 0 1 ⎡ ⎤⎡ ⎤⎡ ⎤ 1 0 0 1 −2 −1 3 1 1 0 0 0 3 −6 3 ⎦ = ⎣ 0 1 0 ⎦ ⎣ 0 13 0 ⎦ ⎣ 0 0 0 1 0 −3 1 0 0 3 −6 3 ⎤ ⎡ ⎤⎡ 1 0 0 1 −2 −1 3 1 0 1 −2 1 ⎦ = ⎣0 1 0⎦ ⎣ 0 0 −3 1 0 0 3 −6 3 ⎡ ⎤ 1 −2 −1 3 1 ⎣ 0 1 −2 1 ⎦ = 𝐹. = 0 0 0 0 0 0 4.6 Matrix Inverse Let 𝐴 be an 𝑚 × 𝑛 matrix. In this section, we aim to explore the following two questions: • Does there exist an 𝑛 × 𝑚 matrix 𝐵 such that, for all #» 𝑥 ∈ F𝑚 , 𝐴𝐵 #» 𝑥 = #» 𝑥? • Does there exist an 𝑛 × 𝑚 matrix 𝐶 such that, for all #» 𝑥 ∈ F𝑛 , 𝐶𝐴 #» 𝑥 = #» 𝑥? In the first case, you can think of the matrix 𝐴 as the one that “cancels out” the action of 𝐵. In the second case, you can think of the matrix 𝐶 as the one that “cancels out” the action of 𝐴. In view of the Theorem 4.2.3 (Equality of Matrices), let us reformulate our two motivating questions as follows: • Given an 𝑚 × 𝑛 matrix 𝐴, does there exist an 𝑛 × 𝑚 matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑚 ? • Given an 𝑚 × 𝑛 matrix 𝐴, does there exist an 𝑛 × 𝑚 matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛 ? If the matrices 𝐵 and 𝐶 exist, we refer to them as a right inverse and a left inverse of 𝐴, respectively. Notice that if 𝐵 and 𝐶 exist and all of 𝐴, 𝐵 and 𝐶 are square, then 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 . This case is the most interesting to us, and it motivates the following definition. Definition 4.6.1 Invertible Matrix We say that an 𝑛 × 𝑛 matrix 𝐴 is invertible if there exist 𝑛 × 𝑛 matrices 𝐵 and 𝐶 such that 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 . Section 4.6 Matrix Inverse 113 The following theorem demonstrates that if an 𝑛 × 𝑛 matrix 𝐴 is invertible, then its left and right inverses are equal. Proposition 4.6.2 (Equality of Left and Right Inverses) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). If there exist matrices 𝐵 and 𝐶 in 𝑀𝑛×𝑛 (F) such that 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 , then 𝐵 = 𝐶. Proof: We have 𝐵 = 𝐼𝑛 𝐵 = (𝐶𝐴)𝐵 = 𝐶(𝐴𝐵) = 𝐶𝐼𝑛 = 𝐶. So far, we have learned that if both left and right inverses exist, then they must be equal. The next result guarantees that for square matrices the left inverse exists if and only if the right inverse exists. Theorem 4.6.3 (Left Invertible Iff Right Invertible) For 𝐴 ∈ 𝑀𝑛×𝑛 (F), there exists an 𝑛 × 𝑛 matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 if and only if there exists an 𝑛 × 𝑛 matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛 . Proof: Let 𝐴 be an 𝑛 × 𝑛 matrix. To prove the forward direction, suppose that there exists an 𝑛 × 𝑛 matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 . Consider the homogeneous system #» 𝐵 #» 𝑥 = 0. Multiplying both sides of this equation by 𝐴 on the left, we have that #» #» #» 𝐴𝐵 #» 𝑥 = 𝐴 0 =⇒ 𝐼𝑛 #» 𝑥 = 0 =⇒ #» 𝑥 = 0. 
#» #» Thus, the homogeneous system 𝐵 #» 𝑥 = 0 has the unique solution #» 𝑥 = 0 . Therefore, its solution set has no parameters, and so it follows from Part (a) of Theorem 3.6.7 (System Rank Theorem) that rank(𝐵) = 𝑛. It then follows from Part (b) of Theorem 3.6.7 (System #» #» Rank Theorem) that the system 𝐵 #» 𝑥 = 𝑏 has a solution for every 𝑏 ∈ F𝑛 . Thus, for any #» #» 𝑏 ∈ F𝑛 , there exists #» 𝑥 ∈ F𝑛 such that 𝑏 = 𝐵 #» 𝑥 . Consequently, #» #» #» (𝐵𝐴) 𝑏 = 𝐵𝐴(𝐵 #» 𝑥 ) = 𝐵(𝐴𝐵) #» 𝑥 = 𝐵𝐼𝑛 #» 𝑥 = 𝐵 #» 𝑥 = 𝑏 = 𝐼𝑛 𝑏 . #» #» #» Since the identity (𝐵𝐴) 𝑏 = 𝐼𝑛 𝑏 holds for all 𝑏 ∈ F𝑛 , it follows from Theorem 4.2.3 (Equality of Matrices) that 𝐵𝐴 = 𝐼𝑛 . Taking 𝐶 = 𝐵, the result follows. To prove the backward direction, suppose that there exists an 𝑛 × 𝑛 matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛 . Then 𝐼𝑛 = 𝐼𝑛𝑇 = (𝐶𝐴)𝑇 = 𝐴𝑇 𝐶 𝑇 . Thus, 𝐶 𝑇 is the right inverse of 𝐴𝑇 , and so it follows from the proof of the forward direction that 𝐶 𝑇 is also the left inverse of 𝐴𝑇 , that is, 𝐶 𝑇 𝐴𝑇 = 𝐼𝑛 . But then 𝐼𝑛 = 𝐼𝑛𝑇 = (𝐶 𝑇 𝐴𝑇 )𝑇 = (𝐴𝑇 )𝑇 (𝐶 𝑇 )𝑇 = 𝐴𝐶. Taking 𝐵 = 𝐶, the result follows. 114 Chapter 4 Matrices Notice that if a matrix 𝐴 is invertible, then there exists a unique matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 . Indeed, suppose that 𝐴𝐵1 = 𝐼𝑛 and 𝐴𝐵2 = 𝐼𝑛 . Since 𝐴𝐵1 = 𝐴𝐵2 , we see that 𝐵1 𝐴𝐵1 = 𝐵1 𝐴𝐵2 . By Theorem 4.6.3 (Left Invertible Iff Right Invertible), we know that 𝐵1 𝐴 = 𝐼𝑛 , so 𝐵1 = 𝐵2 . In view of this, consider the following definition. Definition 4.6.4 Inverse of a Matrix If an 𝑛 × 𝑛 matrix 𝐴 is invertible, we refer to the matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 as the inverse of 𝐴. We denote the inverse of 𝐴 by 𝐴−1 . The inverse of 𝐴 satisfies 𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼𝑛 . REMARK The above results tell us that, in order to verify that the matrix 𝐵 is the inverse of 𝐴, it is sufficient to verify that 𝐴𝐵 = 𝐼𝑛 . We do not need to also verify that 𝐵𝐴 = 𝐼𝑛 . ]︂ ]︂ [︂ −𝑖 𝑖 1+𝑖 𝑖 , because is the inverse of 𝐴 = The matrix 𝐵 = 1 −1 − 𝑖 1 𝑖 [︂ Example 4.6.5 [︂ −𝑖 𝑖 𝐴𝐵 = 1 −1 − 𝑖 ]︂ [︂ ]︂ ]︂ [︂ 10 1+𝑖 𝑖 . = 01 1 𝑖 ]︂ ]︂ [︂ 2 −1 11 , because is not the inverse of 𝐴 = The matrix 𝐵 = −2 2 12 [︂ Example 4.6.6 [︂ 2 −1 𝐴𝐵 = −2 2 ]︂ [︂ ]︂ ]︂ [︂ ]︂ [︂ 10 10 11 . ̸ = = 01 02 12 While there exist numerous criteria for the invertibility of a matrix, for now we will outline only three of them. Theorem 4.6.7 (Invertibility Criteria – First Version) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). The following three conditions are equivalent: (a) 𝐴 is invertible. (b) rank(𝐴) = 𝑛. (c) RREF(𝐴) = 𝐼𝑛 . Proof: ((𝑎) ⇒ (𝑏)): Suppose that 𝐴 is invertible. Then there exists a matrix 𝐵 such that 𝐵𝐴 = 𝐼𝑛 . We can show that rank(𝐴) = 𝑛 as in the proof of Theorem 4.6.3 (Left Invertible Iff Right Invertible). Section 4.6 Matrix Inverse 115 ((𝑏) ⇒ (𝑐)): Suppose that rank(𝐴) = 𝑛. Then every row and every column of the 𝑛×𝑛 matrix RREF(𝐴) has a pivot. The only matrix that can satisfy this condition is the 𝑛 × 𝑛 identity matrix 𝐼𝑛 . ((𝑐) ⇒ (𝑎)): Suppose that RREF(𝐴) = 𝐼𝑛 . Then the rank of 𝐴 is equal to the number of rows of 𝐴, which #» 𝑛 ⃗ means that the system 𝐴 #» 𝑥 = ⃗𝑏 is consistent for every [︁ #» 𝑏 ∈ F#».]︁Thus, we let 𝑏𝑖 be a solution to 𝐴 #» 𝑥 = 𝑒#»𝑖 for 𝑖 = 1, . . . , 𝑛. If we now define 𝐵 = 𝑏1 · · · 𝑏𝑛 , then by construction [︁ #» ]︀ #» ]︁ [︀ #» ]︁ [︁ #» 𝐴𝐵 = 𝐴 𝑏1 · · · 𝑏𝑛 = 𝐴𝑏1 · · · 𝐴𝑏𝑛 = 𝑒#»1 · · · 𝑒#» 𝑛 = 𝐼𝑛 From Theorem 4.6.3 (Left Invertible Iff Right Invertible) it follows that there exists a matrix 𝐶 such that 𝐶𝐴 = 𝐼𝑛 . Since 𝐴𝐵 = 𝐶𝐴 = 𝐼𝑛 , we conclude that 𝐴 is invertible. 
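Criterion (b) gives a quick computational test for invertibility: compute rank(A) and compare it with n. The Python/NumPy sketch below (an illustration only) applies this test to the two 3 × 3 matrices that appear in Examples 4.6.10 and 4.6.11 below.

```python
import numpy as np

def is_invertible(A):
    """Theorem 4.6.7(b): a square matrix A is invertible iff rank(A) = n."""
    return np.linalg.matrix_rank(A) == A.shape[0]

B = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8.,  9.]])   # matrix of Example 4.6.10
C = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])   # matrix of Example 4.6.11

print(is_invertible(B))    # False -- rank(B) = 2
print(is_invertible(C))    # True  -- rank(C) = 3
print(np.linalg.inv(C))    # numerically matches the inverse computed in Example 4.6.11
```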
Let us look closely at the last part of the proof, as it describes an algorithm for finding the inverse of an invertible matrix 𝐴. Indeed, if we are able to solve each of the equations #» #» #» 𝐴𝑏1 = 𝑒#»1 , 𝐴𝑏2 = 𝑒#»2 , . . . , 𝐴𝑏𝑛 = 𝑒#» 𝑛 [︁ #» ]︁ #» #» #» for 𝑏1 , . . . , 𝑏𝑛 ∈ F𝑛 , then 𝐴−1 = 𝑏1 · · · 𝑏𝑛 . Let 𝑅 = RREF(𝐴), and suppose that a sequence of EROs 𝑟1 , . . . , 𝑟𝑘 is used to row reduce the augmented matrix 𝐴 to 𝑅. Convince yourself that the same sequence of EROs can be used to row reduce any system [𝐴 | 𝑒#»𝑖 ] to #» #» its RREF, [𝑅 | 𝑏𝑖 ], for some vector 𝑏𝑖 . Consequently, the same sequence of EROs can be used to row reduce the super-augmented matrix [︀ ]︀ [𝐴 | 𝐼 ] = 𝐴 𝑒#» · · · 𝑒#» 𝑛 1 𝑛 into its RREF [︁ #» #» ]︁ [𝑅 | 𝐵] = 𝑅 𝑏1 · · · 𝑏𝑛 [︁ #» #» ]︁ for some 𝑛×𝑛 matrix 𝐵 = 𝑏1 · · · 𝑏𝑛 . Notice how by performing row reduction on a superaugmented matrix, we are solving 𝑛 distinct systems of linear equations simultaneously. Of #» course, this is possible because these systems 𝐴 𝑏𝑖 = 𝑒#»𝑖 have equal coefficient matrices. By Theorem 3.6.7 (System Rank Theorem), we have that • If rank(𝐴) ̸= 𝑛 (which is equivalent to 𝑅 ̸= 𝐼𝑛 ), there exists an index 𝑖 such that the #» system 𝐴 𝑏𝑖 = 𝑒#»𝑖 is inconsistent (by Proposition 3.10.2). Consequently, it is impossible to find a matrix 𝐵 such that 𝐴𝐵 = 𝐼𝑛 , which means that 𝐴 is not invertible. #» • If rank(𝐴) = 𝑛 (which is equivalent to 𝑅 = 𝐼𝑛 ), then each system 𝐴 𝑏𝑖 = 𝑒#»𝑖 is consistent, and so the matrix 𝐵 is the inverse 𝐴. We summarize our observations in the following proposition. Proposition 4.6.8 (Algorithm for Checking Invertibility and Finding the Inverse) The following algorithm allows you to determine whether an 𝑛 × 𝑛 matrix 𝐴 is invertible, and if it is, the algorithm will provide the inverse of 𝐴. 1. Construct a super-augmented matrix [𝐴 | 𝐼𝑛 ]. 2. Find the RREF, [𝑅 | 𝐵], of [𝐴 | 𝐼𝑛 ]. 3. If 𝑅 ̸= 𝐼𝑛 , conclude that 𝐴 is not invertible. If 𝑅 = 𝐼𝑛 , conclude that 𝐴 is invertible, and that 𝐴−1 = 𝐵. 116 Example 4.6.9 Chapter 4 Determine whether the matrix [︂ 𝐴= 12 34 Matrices ]︂ is invertible. If it is invertible, find its inverse. Solution: In order to find the inverse, we need to solve two systems of equations with augmented matrices [𝐴 | 𝑒#»1 ] and [𝐴 | 𝑒#»2 ]. Instead of solving them separately, we will solve them simultaneously by forming a super-augmented matrix ]︂ [︂ [︀ ]︀ 1 2 1 0 [𝐴 | 𝐼2 ] = 𝐴 𝑒#»1 𝑒#»2 = 3 4 0 1 and row reducing it. We have [︂ 𝑅2 → 𝑅2 − 3𝑅1 gives ]︂ 1 2 1 0 0 −2 −3 1 . Notice that the matrix that we’ve obtained is in REF. Since it has two pivots, we have that rank(𝐴) = 2, so it must be the case that 𝐴 is invertible. We continue reducing to RREF. 𝑅2 → − 21 𝑅2 [︂ gives [︂ 𝑅1 → 𝑅1 − 2𝑅2 gives 1 2 1 0 0 1 32 − 12 ]︂ 1 0 −2 1 0 1 32 − 12 , ]︂ . Thus, we conclude that the inverse of 𝐴 is [︂ ]︂ ]︂ [︂ 1 4 −2 −2 1 −1 𝐴 = 3 . 1 =− 2 −3 1 2 −2 Let us verify our calculations: ]︂ ]︂ [︂ ]︂ [︂ [︂ 1 4 · 1 + (−2) · 3 4 · 2 + (−2) · 4 1 4 −2 1 2 −1 𝐴 𝐴=− =− 34 2 −3 1 2 (−3) · 1 + 1 · 3 (−3) · 2 + 1 · 4 [︂ ]︂ [︂ ]︂ 1 −2 0 10 =− = . 0 −2 01 2 Example 4.6.10 Determine whether the matrix ⎡ ⎤ 123 𝐵 = ⎣4 5 6⎦ 789 is invertible. If it is invertible, find its inverse. Solution: We consider the super-augmented matrix [𝐵 | 𝐼3 ] and reduce it to a matrix in REF. Section 4.6 Matrix Inverse 117 {︃ 𝑅2 → 𝑅2 − 4𝑅1 𝑅3 → 𝑅3 − 7𝑅1 1 𝑅2 → − 𝑅2 3 𝑅3 → 𝑅3 + 6𝑅2 ⎡ 1 2 3 gives ⎣ 0 −3 −6 0 −6 −12 ⎡ 1 2 3 1 2 gives ⎣ 0 0 −6 −12 ⎡ 1 2 3 1 gives ⎣ 0 1 2 43 0 0 0 1 ⎤ 1 0 0 −4 1 0 ⎦ , −7 0 1 ⎤ 1 0 0 4 1 ⎦, 3 −3 0 −7 0 1 ⎤ 0 0 − 31 0 ⎦ . 
−2 1 At this point, we have that rank(𝐵) = 2, and so it is not invertible. Example 4.6.11 Determine whether the matrix ⎡ ⎤ 12 3 𝐶 = ⎣4 5 6 ⎦ 7 8 10 is invertible. If it is invertible, find its inverse. Solution: We consider the super-augmented matrix [𝐶 | 𝐼3 ] and row reduce it. ⎡ {︃ 𝑅2 → 𝑅2 − 4𝑅1 𝑅3 → 𝑅3 − 7𝑅1 gives 1 𝑅2 → − 𝑅2 3 gives 𝑅3 → 𝑅3 + 6𝑅2 gives 1 2 3 ⎣ 0 −3 −6 0 −6 −11 ⎡ 1 2 3 ⎣ 0 1 2 0 −6 −11 ⎡ 1 2 3 1 ⎣ 0 1 2 4 3 0 0 1 1 At this point, we have that rank(𝐶) = 3, and so it reducing it to RREF. ⎡ {︃ 1 2 𝑅1 → 𝑅1 − 3𝑅3 ⎣ 0 1 gives 𝑅2 → 𝑅2 − 2𝑅3 0 0 ⎡ 1 0 ⎣ 0 1 𝑅1 → 𝑅1 − 2𝑅2 gives 0 0 ⎤ 1 0 0 −4 1 0 ⎦ , −7 0 1 ⎤ 1 0 0 4 1 ⎦, 3 −3 0 −7 0 1 ⎤ 0 0 − 13 0 ⎦ . −2 1 is invertible. We thus continue row ⎤ 0 −2 6 −3 0 − 23 11 −2 ⎦ , 3 1 1 −2 1 ⎤ 2 4 0 −3 −3 1 11 0 − 23 −2 ⎦ . 3 1 1 −2 1 We conclude that the inverse of 𝐶 is ⎡ 2 4 ⎤ ⎡ ⎤ −3 −3 1 −2 −4 3 1 ⎢ ⎥ 𝐶 −1 = ⎣ − 32 11 = ⎣ −2 11 −6 ⎦ . 3 −2 ⎦ 3 3 −6 3 1 −2 1 118 Chapter 4 Matrices Let us verify our calculations. ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 2 3 (︂ )︂ −2 −4 3 300 ⎣ 4 5 6 ⎦ 1 ⎣ −2 11 −6 ⎦ = 1 ⎣ 0 3 0 ⎦ = 𝐼3 3 3 7 8 10 3 −6 3 003 Example 4.6.12 Determine whether the matrix [︂ 𝐷= −1 − 2 𝑖 −1 − 𝑖 1−𝑖 −𝑖 ]︂ is invertible. If it is invertible, find its inverse. Solution We consider the super-augmented matrix [𝐷 | 𝐼2 ] and row reduce it. [︂ −1 − 2 𝑖 −1 − 𝑖 1 0 1−𝑖 −𝑖 0 1 [︂ 5 3 − 𝑖 −1 + 2 𝑖 0 2 1−𝑖 0 1+𝑖 [︂ 5 3−𝑖 −1 + 2 𝑖 0 0 −1 − 3 𝑖 2 − 4 𝑖 5 + 5 𝑖 [𝐷 | 𝐼2 ] = {︃ 𝑅1 → (−1 + 2𝑖)𝑅1 𝑅2 → (1 + 𝑖)𝑅2 gives 𝑅2 → −2𝑅1 + 2𝑅2 gives ]︂ ]︂ ]︂ At this point, we have that rank(𝐷) = 2 and so 𝐷 is invertible. We thus continue row reducing to RREF. [︂ 5 3 − 𝑖 −1 + 2 𝑖 0 0 3 − 𝑖 4 + 2 𝑖 −5 + 5 𝑖 [︂ ]︂ 5 0 −5 5 − 5𝑖 0 3 − 𝑖 4 + 2 𝑖 −5 + 5 𝑖 𝑅2 → 𝑖𝑅2 gives 𝑅1 → 𝑅1 − 𝑅2 gives As ]︂ 1 3+𝑖 3+𝑖 = = , we complete our reduction with 3−𝑖 9+1 10 {︃ [︂ ]︂ 𝑅1 → 15 𝑅1 1 0 −1 1−𝑖 gives 1 0 1 1 + 𝑖 −2 + 𝑖 𝑅2 → 10 (3 + 𝑖)𝑅2 Therefore, we conclude that 𝐷 −1 [︂ = −1 1−𝑖 1 + 𝑖 −2 + 𝑖 ]︂ Let us verify our calculations: [︂ ]︂ [︂ ]︂ −1 1−𝑖 −1 − 2 𝑖 −1 − 𝑖 1 + 𝑖 −2 + 𝑖 1−𝑖 −𝑖 [︂ ]︂ [︂ ]︂ 2 1 + 2𝑖 + (1 − 𝑖) 1 + 𝑖 − 𝑖(1 − 𝑖) 10 = = . (1 + 𝑖)(−1 − 2𝑖) + (−2 + 𝑖)(1 − 𝑖) − (1 + 𝑖)2 − 𝑖(−2 + 𝑖) 01 Section 4.6 Matrix Inverse 119 The inverse of a 2 × 2 matrix may be computed as follows. Proposition 4.6.13 (Inverse of a 2 × 2 Matrix) [︂ ]︂ 𝑎 𝑏 Let 𝐴 = . Then 𝐴 is invertible if and only if 𝑎𝑑 − 𝑏𝑐 ̸= 0. Furthermore, if 𝑎𝑑 − 𝑏𝑐 ̸= 0, 𝑐 𝑑 then [︂ ]︂ 1 𝑑 −𝑏 −1 . 𝐴 = 𝑎𝑑 − 𝑏𝑐 −𝑐 𝑎 Proof: If 𝑎𝑑 − 𝑏𝑐 ̸= 0, then we may define a matrix 𝐵 ∈ 𝑀2×2 (F) by [︂ ]︂ 1 𝑑 −𝑏 𝐵= . 𝑎𝑑 − 𝑏𝑐 −𝑐 𝑎 A straightforward computation shows that 𝐴𝐵 = 𝐵𝐴 = 𝐼2 . Thus, 𝐴 is invertible in this case and moreover 𝐵 = 𝐴−1 , as claimed. Conversely, assume that 𝑎𝑑 − 𝑏𝑐 = 0. We will show in this case that 𝐴 is not invertible by proving that rank(𝐴) ̸= 2. We will have to consider two cases: 𝑑 = 0 and 𝑑 ̸= 0. If 𝑑 = 0 then [︂ ]︂ 𝑎 𝑏 𝐴= . 𝑐 0 and 𝑏𝑐 = 𝑎𝑑 = 0 (since 𝑎𝑑 − 𝑏𝑐 = 0 by assumption). So, one of 𝑏 or 𝑐 must be 0. If 𝑐 = 0 then 𝐴 has a row of zeros, so it can have at most one pivot, hence rank(𝐴) ̸= 2, and 𝐴 is not invertible. A similar argument can be applied if 𝑏 = 0: in that case, there can be at most one pivot (in the first column). Next, if 𝑑 ̸= 0, we can perform the following EROs to 𝐴: ]︂ 𝑎𝑑 𝑏𝑑 . 𝑅1 → 𝑑𝑅1 gives 𝑐 𝑑 ]︂ [︂ 𝑎𝑑 − 𝑏𝑐 0 . 𝑅1 → −𝑏𝑅2 + 𝑅1 gives 𝑐 𝑑 [︂ Since 𝑎𝑑 − 𝑏𝑐 = 0, we have thus row reduced 𝐴 to a matrix with a row of zeros. Hence, as above, 𝐴 can have at most one pivot, and therefore, it cannot be invertible. This completes the analysis of all cases. REMARK [︂ ]︂ 𝑎 𝑏 , then the number 𝑎𝑑 − 𝑏𝑐 is called the determinant of 𝐴. 
We will discuss it 𝑐 𝑑 in more detail in Chapter 6. If 𝐴 = 120 Example 4.6.14 Chapter 4 Matrices Find the inverses of the following matrices, if they exist. [︂ ]︂ [︂ ]︂ [︂ ]︂ 12 12 −1 − 2𝑖 −1 − 𝑖 𝐴= 𝐵= 𝐶= 34 36 1−𝑖 −𝑖 Solution: The determinant of 𝐴 is (1(4) − 2(3)) = −2, and so 𝐴 is invertible with [︂ ]︂ 1 4 −2 −1 𝐴 =− . 2 −3 1 The determinant of 𝐵 is (1(6) − 2(3)) = 0, and so 𝐵 is not invertible. The determinant of 𝐶 is (−1 − 2𝑖)(−𝑖) − (−1 − 𝑖)(1 − 𝑖) = −2 + 𝑖 + 2 = 𝑖 1 = −𝑖, we have 𝑖 ]︂ ]︂ [︂ [︂ −1 1 − 𝑖 −𝑖 1+𝑖 −1 = 𝐶 = −𝑖 1 + 𝑖 −2 + 𝑖 −1 + 𝑖 −1 − 2𝑖 and so 𝐶 is invertible. As Chapter 5 Linear Transformations 5.1 The Function Determined by a Matrix Up to this point in the course, we have considered matrices as arrays of numbers which contain important information about systems of linear equations through either the coefficient matrix, the augmented matrix, or both. Moving forward, we will discover that there is much more going on than this. We will make use of multiplication by a matrix to define functions. Definition 5.1.1 Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). The function determined by the matrix 𝐴 is the function Function Determined by a Matrix 𝑇𝐴 : F𝑛 → F𝑚 defined by 𝑇𝐴 ( #» 𝑥 ) = 𝐴 #» 𝑥. ⎤ 1 4 𝑥 ∈ R2 , then Let 𝐴 = ⎣ −2 −5 ⎦. If #» 4 6 ⎡ Example 5.1.2 ⎡ ⎤ ⎡ ⎤ 1 4 [︂ ]︂ 𝑥1 + 4𝑥2 𝑥 𝑇𝐴 ( #» 𝑥 ) = ⎣ −2 −5 ⎦ 1 = ⎣ −2𝑥1 − 5𝑥2 ⎦ . 𝑥2 4 6 4𝑥1 + 6𝑥2 Note that 𝑇𝐴 ( #» 𝑥 ) ∈ R3 . [︂ ]︂ −2 For instance, if #» 𝑥 = , then 𝑇𝐴 ( #» 𝑥 ) = 𝑇𝐴 3 (︂[︂ −2 3 ]︂)︂ ⎡ ⎤ 10 = ⎣ −11 ⎦. 10 The function 𝑇𝐴 takes as input vectors in R2 and outputs vectors in R3 . 121 122 Chapter 5 Linear Transformations [︂ Example 5.1.3 ]︂ 𝑖 1 + 2𝑖 3 + 2𝑖 Let 𝐴 = . If #» 𝑧 ∈ C3 , then 2 − 𝑖 4 2 − 5𝑖 𝑇𝐴 ( #» 𝑧) = [︂ 𝑖 1 + 2𝑖 3 + 2𝑖 2−𝑖 4 2 − 5𝑖 ]︂ ⎡ ⎤ [︂ ]︂ 𝑧1 ⎣ 𝑧2 ⎦ = 𝑖𝑧1 + (1 + 2𝑖)𝑧2 + (3 + 2𝑖)𝑧3 . (2 − 𝑖)𝑧1 + 4𝑧2 + (2 − 5𝑖)𝑧3 𝑧3 Note that 𝑇𝐴 ( #» 𝑧 ) ∈ C2 . ⎡ ⎤ ⎛⎡ ⎤⎞ [︂ ]︂ 3 3 12𝑖 #» #» ⎣ ⎦ ⎝ ⎣ ⎦ ⎠ 2−𝑖 For instance, if 𝑧 = 2 − 𝑖 , then 𝑇𝐴 ( 𝑧 ) = 𝑇𝐴 = . 24 − 3𝑖 2𝑖 2𝑖 The function 𝑇𝐴 takes as input vectors in C3 and outputs vectors in C2 . The function 𝑇𝐴 determined by a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F) has some rather special features. In Theorem 3.9.9 we saw that matrix–vector multiplication is linear : that is, for any #» 𝑥 , #» 𝑦 ∈ F𝑛 and any 𝑐 ∈ F, the following two properties hold: 𝐴( #» 𝑥 + #» 𝑦 ) = 𝐴 #» 𝑥 + 𝐴 #» 𝑦 #» #» 𝐴(𝑐 𝑥 ) = 𝑐𝐴 𝑥 In view of these properties, the following result holds. Theorem 5.1.4 (Function Determined by a Matrix is Linear) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 be the function determined by the matrix 𝐴. Then 𝑇𝐴 is linear; that is, for any #» 𝑥 , #» 𝑦 ∈ F𝑛 and any 𝑐 ∈ F, the following two properties hold: (a) 𝑇𝐴 ( #» 𝑥 + #» 𝑦 ) = 𝑇𝐴 ( #» 𝑥 ) + 𝑇𝐴 ( #» 𝑦) (b) 𝑇𝐴 (𝑐 #» 𝑥 ) = 𝑐 𝑇𝐴 ( #» 𝑥) Proof: 𝑇𝐴 ( #» 𝑥 + #» 𝑦 ) = 𝐴( #» 𝑥 + #» 𝑦) #» = 𝐴( 𝑥 ) + 𝐴( #» 𝑦) and = 𝑇𝐴 ( #» 𝑥 ) + 𝑇𝐴 ( #» 𝑦) 5.2 𝑇𝐴 (𝑐 #» 𝑥 ) = 𝐴(𝑐 #» 𝑥) #» = 𝑐𝐴 𝑥 = 𝑐𝑇𝐴 ( #» 𝑥) (by definition) (by Theorem 3.9.9) (by definition). Linear Transformations The properties exhibited by 𝑇𝐴 in Theorem 5.1.4 (Function Determined by a Matrix is Linear) turn out to be of such importance in linear algebra that we give a name to the class of functions that exhibit these properties. Section 5.2 Linear Transformations Definition 5.2.1 Linear Transformation 123 We say that the function 𝑇 : F𝑛 → F𝑚 is a linear transformation (or linear mapping) if, for any #» 𝑥 , #» 𝑦 ∈ F𝑛 and any 𝑐 ∈ F, the following two properties hold: 1. 𝑇 ( #» 𝑥 + #» 𝑦 ) = 𝑇 ( #» 𝑥 ) + 𝑇 ( #» 𝑦 ) (called linearity over addition). 2. 𝑇 (𝑐 #» 𝑥 ) = 𝑐𝑇 ( #» 𝑥 ) (called linearity over scalar multiplication). 
We refer to F𝑛 here as the domain of 𝑇 and F𝑚 as the codomain of 𝑇 , as we would for any function. REMARKS • In linear algebra, the words transformation or mapping are often used instead of the word function. • In high school, you studied a variety of non-linear functions, such as 𝑓 (𝑥) = 𝑥2 + 1 or 𝑓 (𝑥) = 2𝑥 . These functions usually had a simple domain and codomain, such as R. In linear algebra, the domains and codomains are more complicated (such as R𝑛 or C𝑛 ), but the functions themselves are rather simple — they are linear. If you go on to take a vector calculus course, you may see more complicated functions and more complicated domains and codomains occurring together. REMARK A fundamental example of a linear transformation is the function 𝑇𝐴 determined by a matrix 𝐴. The linearity of 𝑇𝐴 was established in Theorem 5.1.4 (Function Determined by a Matrix is Linear). From this point forward we shall refer to 𝑇𝐴 as the linear transformation determined by 𝐴, instead of the function determined by 𝐴. We now consider some features of linear transformations. We begin with the result below which tells us that, if we want, we can check linearity over addition and linearity over scalar multiplication simultaneously, instead of checking them separately. We will give a few examples after the following two results. Proposition 5.2.2 (Alternate Characterization of a Linear Transformation) Let 𝑇 : F𝑛 → F𝑚 be a function. Then 𝑇 is a linear transformation if and only if for any #» 𝑥 , #» 𝑦 ∈ F𝑛 and any 𝑐 ∈ F, 𝑇 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑐𝑇 ( #» 𝑥 ) + 𝑇 ( #» 𝑦 ). Proof: We begin with the forward direction. Suppose that 𝑇 is a linear transformation. #» = 𝑐 #» Let #» 𝑥 , #» 𝑦 ∈ F𝑛 and 𝑐 ∈ F. Put 𝑤 𝑥 . Since 𝑇 is linear, we have #» + #» #» + 𝑇 ( #» 𝑇 (𝑤 𝑦 ) = 𝑇 ( 𝑤) 𝑦) 124 Chapter 5 and Linear Transformations #» = 𝑇 (𝑐 #» 𝑇 ( 𝑤) 𝑥 ) = 𝑐𝑇 ( #» 𝑥 ). Thus, we have #» + #» 𝑇 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑇 (𝑤 𝑦) #» = 𝑇 ( 𝑤) + 𝑇 ( #» 𝑦) = 𝑐𝑇 ( #» 𝑥 ) + 𝑇 ( #» 𝑦 ). Now for the backward direction. Suppose that for all #» 𝑥 , #» 𝑦 ∈ F𝑛 and for all 𝑐 ∈ F, 𝑇 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑐𝑇 ( #» 𝑥 ) + 𝑇 ( #» 𝑦 ). Taking 𝑐 = 1, we find that for all #» 𝑥 , #» 𝑦 ∈ F𝑛 , 𝑇 ( #» 𝑥 + #» 𝑦 ) = 𝑇 ( #» 𝑥 ) + 𝑇 ( #» 𝑦 ). #» Taking #» 𝑦 = 0 , we find that for all #» 𝑥 ∈ F𝑛 and for all 𝑐 ∈ F, 𝑇 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑇 (𝑐 #» 𝑥 ) = 𝑐𝑇 ( #» 𝑥 ). We conclude that 𝑇 is a linear transformation. Proposition 5.2.3 (Zero Maps to Zero) Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. Then #» #» 𝑇 ( 0 F𝑛 ) = 0 F𝑚 . #» #» Proof: Notice that 0 F𝑛 = 0 · 0 F𝑛 . Since 𝑇 is linear, #» #» 𝑇 ( 0 F𝑛 ) = 𝑇 (0 · 0 F𝑛 ) #» = 0 · 𝑇 ( 0 F𝑛 ) #» = 0 F𝑚 , #» where the last equality follows from the fact that for any #» 𝑥 ∈ F𝑚 , 0 · #» 𝑥 = 0 F𝑚 . We will now look at some functions 𝑇 : F𝑛 → F𝑚 and determine whether or not they are linear. Towards this end, here are a couple of questions to consider: • Does the zero vector of the domain get mapped to the zero vector of the codomain? If it does not, then the function is not linear. • Does the function look linear? √ All of its terms must be linear. For example, the terms 2𝑥1 , 3𝑥2 , 𝜋𝑥3 , and −𝑥4 are linear, while the terms of the form −3𝑥1 𝑥3 , 𝑥21 or 𝑥1 𝑥2 𝑥3 are not linear. If you encounter any such non-linear terms, this indicates that the function is likely not linear. Find a counterexample to definitively prove that the function is not linear. If the answer to both of these questions is yes, then we use the definition to prove linearity. 
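REMARK (Optional computational illustration)
Readers comfortable with Python can spot-check the condition of Proposition 5.2.2 numerically before attempting a proof or a counterexample. The sketch below (not part of the course; the helper name looks_linear is ours) tests the condition 𝑇(𝑐x + y) = 𝑐𝑇(x) + 𝑇(y) on random inputs for the two functions appearing in the upcoming Examples 5.2.4 and 5.2.6. A single failed test disproves linearity, while passing every test only suggests linearity and must still be followed by a proof using the definition.

    import numpy as np

    def T1(v):                          # Example 5.2.4: note the constant term "+4"
        x, y, z = v
        return np.array([2*x + 3*y + 4, 6*x - 7*z])

    def T3(v):                          # Example 5.2.6: every term is linear
        x1, x2 = v
        return np.array([2*x1 + 3*x2, 6*x1 - 5*x2, 2*x1])

    def looks_linear(T, dim, trials=1000):
        rng = np.random.default_rng(0)
        for _ in range(trials):
            x, y = rng.standard_normal(dim), rng.standard_normal(dim)
            c = rng.standard_normal()
            if not np.allclose(T(c*x + y), c*T(x) + T(y)):
                return False            # a concrete counterexample was found
        return True                     # only evidence, not a proof

    print(looks_linear(T1, 3))          # False: the "+4" breaks linearity
    print(looks_linear(T3, 2))          # True on all samples (and provably linear)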
Section 5.2 Linear Transformations Example 5.2.4 125 Determine whether or not the function 𝑇1 : R3 → R2 defined by ⎛⎡ ⎤⎞ [︂ ]︂ 𝑥 2𝑥 + 3𝑦 + 4 ⎝ ⎣ ⎦ ⎠ 𝑦 𝑇1 = 6𝑥 − 7𝑧 𝑧 is a linear transformation. Solution: This transformation is not linear, because [︀ ]︀𝑇 [︀ ]︀𝑇 [︀ ]︀𝑇 00 . 𝑇1 ( 0 0 0 ) = 4 0 = ̸ Example 5.2.5 Determine whether or not the function 𝑇2 : C3 → C2 defined by ⎛⎡ ⎤⎞ [︂ ]︂ 𝑧1 2𝑧1 + 3𝑧3 ⎦ ⎠ ⎝ ⎣ 𝑧2 = 𝑇2 𝑧2 𝑧3 𝑧3 is a linear transformation. Solution: The zero vector of C3 is mapped to the zero vector of C2 by this function. However, this function does not look linear because of the product 𝑧2 𝑧3 . We try to construct a counterexample. The (possibly) troublesome term does not involve 𝑧1 , so we will set that to zero. Since the image under the function does involve the other two variables let us see what happens for different values (setting both to 1 or 0 is not usually helpful). Note that on one hand, ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ [︂ ]︂ [︂ ]︂ [︂ ]︂ 0 0 3 3 0 . = + 𝑇2 ⎝⎣ 1 ⎦⎠ + 𝑇2 ⎝⎣ 0 ⎦⎠ = 0 0 0 1 0 On the other hand, ⎛⎡ ⎤⎞ ⎛⎡ ⎤ ⎡ ⎤⎞ [︂ ]︂ 0 0 0 3 ⎦ ⎣ ⎦ ⎠ ⎝ ⎣ ⎦ ⎠ ⎝ ⎣ 1 1 + 0 = 𝑇2 = . 𝑇2 1 1 1 0 Since ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎛⎡ ⎤ ⎡ ⎤⎞ 0 0 0 0 𝑇2 ⎝⎣ 1 ⎦⎠ + 𝑇2 ⎝⎣ 0 ⎦⎠ ̸= 𝑇2 ⎝⎣ 1 ⎦ + ⎣ 0 ⎦⎠ , 0 1 0 1 the function 𝑇2 is not a linear transformation. Example 5.2.6 Determine whether or not the function 𝑇3 : R2 → R3 defined by ⎡ ⎤ (︂[︂ ]︂)︂ 2𝑥1 + 3𝑥2 𝑥1 𝑇3 = ⎣ 6𝑥1 − 5𝑥2 ⎦ 𝑥2 2𝑥1 is a linear transformation. 126 Chapter 5 Linear Transformations Solution: As the zero vector of R2 is mapped to the zero vector of R3 and we do not detect any non-linear terms, this function could be linear. Let us prove that this is indeed the case. [︂ ]︂ [︂ ]︂ 𝑥1 #» 𝑦 #» Let 𝑥 = , 𝑦 = 1 ∈ R2 and 𝑐 ∈ R. Then, using the definition of 𝑇3 , we have 𝑥2 𝑦2 𝑇3 ( #» 𝑥 ) = 𝑇3 (︂[︂ 𝑥1 𝑥2 𝑇3 ( #» 𝑦 ) = 𝑇3 (︂[︂ 𝑦1 𝑦2 ]︂)︂ and We also have ]︂)︂ ⎤ 2𝑥1 + 3𝑥2 = ⎣ 6𝑥1 − 5𝑥2 ⎦ 2𝑥1 ⎡ ⎡ ⎤ 2𝑦1 + 3𝑦2 = ⎣ 6𝑦1 − 5𝑦2 ⎦ . 2𝑦1 [︂ ]︂ [︂ ]︂ [︂ ]︂ 𝑐𝑥1 𝑦1 𝑐𝑥1 + 𝑦1 #» #» 𝑐𝑥 + 𝑦 = + = , 𝑐𝑥2 𝑦2 𝑐𝑥2 + 𝑦2 so that ⎤ 2(𝑐𝑥1 + 𝑦1 ) + 3(𝑐𝑥2 + 𝑦2 ) 𝑐𝑥1 + 𝑦1 𝑇3 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑇3 = ⎣ 6(𝑐𝑥1 + 𝑦1 ) − 5(𝑐𝑥2 + 𝑦2 ) ⎦ 𝑐𝑥2 + 𝑦2 2(𝑐𝑥1 + 𝑦1 ) ⎡ ⎤ ⎡ ⎤ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 2𝑥1 + 3𝑥2 2𝑦1 + 3𝑦2 𝑦1 𝑥1 ⎣ ⎦ ⎣ ⎦ = 𝑐 6𝑥1 − 5𝑥2 + 6𝑦1 − 5𝑦2 = 𝑐𝑇3 . + 𝑇3 𝑦2 𝑥2 2𝑥1 2𝑦1 (︂[︂ ]︂)︂ ⎡ That is, 𝑇3 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑐𝑇3 ( #» 𝑥 ) + 𝑇3 ( #» 𝑦 ). It follows that 𝑇3 is a linear transformation (by Proposition 5.2.2 (Alternate Characterization of a Linear Transformation)). Example 5.2.7 Determine whether or not the function 𝑇4 : C2 → C2 defined by (︂[︂ ]︂)︂ [︂ ]︂ 𝑧1 2𝑖𝑧1 + 3𝑧2 𝑇4 = . 𝑧2 (2 − 𝑖)𝑧1 − (1 + 3𝑖)𝑧2 is a linear transformation. Solution: As the zero vector of C2 is mapped to the zero vector of C2 and we do not detect any non-linear terms, this function could be linear. Let us prove that this is indeed the case. [︂ ]︂ [︂ ]︂ 𝑧1 #» 𝑤1 #» Let 𝑧 = ,𝑤 = ∈ C2 , and 𝑐 ∈ C. Then, from the definition of 𝑇4 , we have 𝑧2 𝑤2 (︂[︂ ]︂)︂ [︂ ]︂ 𝑧1 2𝑖𝑧1 + 3𝑧2 #» 𝑇4 ( 𝑧 ) = 𝑇4 = 𝑧2 (2 − 𝑖)𝑧1 − (1 + 3𝑖)𝑧2 and #» = 𝑇 𝑇4 ( 𝑤) 4 (︂[︂ 𝑤1 𝑤2 ]︂)︂ [︂ = ]︂ 2𝑖𝑤1 + 3𝑤2 . (2 − 𝑖)𝑤1 − (1 + 3𝑖)𝑤2 Section 5.3 The Range of a Linear Transformation and “Onto” Linear Transformations We also have 127 [︂ ]︂ [︂ ]︂ [︂ ]︂ #» = 𝑐𝑧1 + 𝑤1 = 𝑐𝑧1 + 𝑤1 , 𝑐 #» 𝑧 +𝑤 𝑐𝑧2 𝑤2 𝑐𝑧2 + 𝑤2 so that #» = 𝑇 𝑇4 (𝑐 #» 𝑧 + 𝑤) 4 [︂ =𝑐 (︂[︂ 𝑐𝑧1 + 𝑤1 𝑐𝑧2 + 𝑤2 ]︂)︂ [︂ 2𝑖(𝑐𝑧1 + 𝑤1 ) + 3(𝑐𝑧2 + 𝑤2 ) = (2 − 𝑖)(𝑐𝑧1 + 𝑤1 ) − (1 + 3𝑖)(𝑐𝑧2 + 𝑤2 ) ]︂ [︂ ]︂ 2𝑖𝑧1 + 3𝑧2 2𝑖𝑤1 + 3𝑤2 + (2 − 𝑖)𝑧1 − (1 + 3𝑖)𝑧2 (2 − 𝑖)𝑤1 − (1 + 3𝑖)𝑤2 (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 𝑤1 𝑧1 . + 𝑇4 = 𝑐𝑇4 𝑤2 𝑧2 ]︂ #» = 𝑐𝑇 ( #» #» That is, 𝑇4 (𝑐 #» 𝑧 + 𝑤) 4 𝑧 ) + 𝑇4 ( 𝑤). 
It follows that 𝑇4 is a linear transformation (by Proposition 5.2.2 (Alternate Characterization of a Linear Transformation)). 5.3 The Range of a Linear Transformation and “Onto” Linear Transformations A linear transformation is a special kind of function, and, as with many other functions, we consider the set of all outputs of a linear transformation. Definition 5.3.1 Range Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. We define the range of 𝑇 , denoted Range(𝑇 ), to be the set of all outputs of 𝑇 . That is, Range(𝑇 ) = {𝑇 ( #» 𝑥 ) : #» 𝑥 ∈ F𝑛 } . The range of 𝑇 is a subset of F𝑚 . #» #» Since for any linear transformation 𝑇 : F𝑛 → F𝑚 we have 𝑇 ( 0 F𝑛 ) = 0 F𝑚 (by Proposition #» 5.2.3 (Zero Maps to Zero)), we have that 0 F𝑚 ∈ Range(𝐴). Therefore, the range of a linear transformation is never empty. Proposition 5.3.2 (Range of a Linear Transformation) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F), and let 𝑇𝐴 : F𝑛 → F𝑚 be the linear transformation determined by 𝐴. Then Range(𝑇𝐴 ) = Col(𝐴). Proof: We will show that these two subsets of F𝑚 are equal by showing that each set is contained in the other. First, suppose that #» 𝑦 ∈ Range(𝑇𝐴 ). Then there exists #» 𝑥 ∈ F𝑛 such that 𝑇𝐴 ( #» 𝑥 ) = #» 𝑦 . That is, there exists a vector #» 𝑥 ∈ F𝑛 such that 𝐴 #» 𝑥 = #» 𝑦 . Since the system 𝐴⃗𝑥 = ⃗𝑦 is consistent if and only if #» 𝑦 ∈ Col(𝐴) (see Proposition 4.1.2 (Consistent System and Column Space)), we conclude that #» 𝑦 ∈ Col(𝐴). 128 Chapter 5 Linear Transformations Next, suppose that #» 𝑧 ∈ Col(𝐴). Then there exist scalars 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ∈ F such that ]︀𝑇 [︀ #» #» #» 𝑥 = #» 𝑧 ; that is, 𝑧 = 𝑥1 𝑎 1 + 𝑥2 𝑎 2 + · · · + 𝑥𝑛 #» 𝑎 𝑛 . If we let #» 𝑥 = 𝑥1 𝑥2 · · · 𝑥𝑛 , then 𝐴 #» #» #» #» 𝑧 = 𝑇𝐴 ( 𝑥 ), and so 𝑧 ∈ Range(𝑇𝐴 ). Thus, we conclude that Range(𝑇𝐴 ) = Col(𝐴). REMARK (Connection to Systems of Linear Equations) We have already seen in Proposition 4.1.2 (Consistent System and Column Space) that #» #» the system of linear equations 𝐴 #» 𝑥 = 𝑏 has a solution if and only if 𝑏 ∈ Col(𝐴). We can now write #» #» 𝑥 = 𝑏 is consistent if and only if 𝑏 ∈ Range(𝑇𝐴 ). 𝐴 #» ⎤ 1 4 Determine Range(𝑇𝐴 ), where 𝐴 = ⎣ −2 −5 ⎦. 4 6 ⎡ Example 5.3.3 Solution: The range of 𝑇𝐴 is ⎧⎡ ⎤⎫ ⎤ ⎡ 4 ⎬ 1 ⎨ Range(𝑇𝐴 ) = Col(𝐴) = Span ⎣ −2 ⎦ , ⎣ −5 ⎦ . ⎭ ⎩ 6 4 #» #» Note that the system of linear equations 𝐴 #» 𝑥 = 𝑏 is consistent if and only if 𝑏 ∈ Range(𝑇𝐴 ). ]︂ 𝑖 1 + 2𝑖 3 + 2𝑖 . Determine Range(𝑇𝐴 ), where 𝐴 = 2 − 𝑖 4 2 − 5𝑖 [︂ Example 5.3.4 Solution: The range of 𝑇𝐴 is {︂[︂ Range(𝑇𝐴 ) = Span 𝑖 2−𝑖 ]︂ [︂ ]︂ [︂ ]︂}︂ 1 + 2𝑖 3 + 2𝑖 , , . 4 2 − 5𝑖 #» It can be shown that Range(𝑇𝐴 ) = C2 , and thus the system of linear equations 𝐴 #» 𝑥 = 𝑏 is #» consistent for all 𝑏 ∈ C2 . Linear transformations whose range is equal to the entire codomain deserve a special name. Definition 5.3.5 Onto We say that the transformation 𝑇 : F𝑛 → F𝑚 is onto (or surjective) if Range(𝑇 ) = F𝑚 . Section 5.4 The Range of a Linear Transformation and “Onto” Linear Transformations Corollary 5.3.6 129 (Onto Criteria) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 be the linear transformation determined by the matrix 𝐴. The following statements are equivalent: (a) 𝑇𝐴 is onto. (b) Col(𝐴) = F𝑚 . (c) rank(𝐴) = 𝑚. Proof: ((𝑎) ⇒ (𝑏)): Suppose that 𝑇𝐴 is onto. From Proposition 5.3.2 (Range of a Linear Transformation), it follows that Col(𝐴) = Range(𝑇𝐴 ) = F𝑚 . ((𝑏) ⇒ (𝑐)): #» #» Suppose that Col(𝐴) = F𝑚 . The system 𝐴 #» 𝑥 = 𝑏 is consistent if and only if 𝑏 ∈ Col(𝐴) #» (see Proposition 4.1.2 (Consistent System and Column Space)). Thus we have that 𝐴 #» 𝑥 = 𝑏 is consistent for all ⃗𝑏 ∈ F𝑚 . 
From Part (b) of Theorem 3.6.7 (System Rank Theorem), we conclude that rank(𝐴) = 𝑚. ((𝑐) ⇒ (𝑎)): Suppose that rank(𝐴) = 𝑚. From Part (b) of Theorem 3.6.7 (System Rank Theorem), #» #» it follows that the system 𝐴 #» 𝑥 = 𝑏 is consistent for all 𝑏 ∈ F𝑚 . By Proposition 4.1.2 (Consistent System and Column Space), Col(𝐴) = F𝑚 . By Proposition 5.3.2 (Range of a Linear Transformation), Range(𝑇𝐴 ) = Col(𝐴) = F𝑚 . ⎡ Example 5.3.7 Determine whether or not the linear transformation 𝑇𝐴 ⎤ 1 4 determined by 𝐴 = ⎣ −2 −5 ⎦ is 4 6 onto. Solution: From Example 5.3.3 we know that ⎧⎡ ⎤ ⎡ ⎤⎫ 4 ⎬ ⎨ 1 Range(𝑇𝐴 ) = Col(𝐴) = Span ⎣ −2 ⎦ , ⎣ −5 ⎦ . ⎩ ⎭ 4 6 The range of 𝑇𝐴 is only a plane in R3 . Therefore, it cannot be equal to all of R3 , and so #» #» Range(𝑇𝐴 ) ̸= R3 . There will be some vectors 𝑏 in R3 for which the equation 𝑇𝐴 ( #» 𝑥) = 𝑏 has no solution. This will happen ⎡ for ⎤ any vector which does not lie on the plane determined 1 by Col(𝐴). One such vector is ⎣ 0 ⎦. Therefore, the linear transformation 𝑇𝐴 is not onto. 0 130 Chapter 5 5.4 Linear Transformations The Kernel of a Linear Transformation and “One-toOne” Linear Transformations When analyzing functions, we consider the set of inputs whose output is zero. Definition 5.4.1 Kernel Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. We define the kernel of 𝑇 , denoted Ker(𝑇 ), to be the set of inputs of 𝑇 whose output is zero. That is, {︁ #» }︁ Ker(𝑇 ) = #» 𝑥 ∈ F𝑛 : 𝑇 ( #» 𝑥 ) = 0 F𝑚 . The kernel of 𝑇 is a subset of F𝑛 . #» #» Since for any linear transformation 𝑇 : F𝑛 → F𝑚 we have 𝑇 ( 0 F𝑛 ) = 0 F𝑚 (by Proposition #» 5.2.3 (Zero Maps to Zero)), we have 0 F𝑛 ∈ Ker(𝑇 ). Thus, the kernel of a linear transformation is never empty. Proposition 5.4.2 (Kernel of a Linear Transformation) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 : F𝑛 → F𝑚 be the linear transformation determined by 𝐴. Then Ker(𝑇𝐴 ) = Null(𝐴). Proof: Using the definitions of kernel and nullspace, we find that #» 𝑥 ∈ Ker(𝑇𝐴 ) ⇐⇒ #» #» #» #» #» 𝑇𝐴 ( 𝑥 ) = 0 F𝑚 ⇐⇒ 𝐴 𝑥 = 0 F𝑚 ⇐⇒ 𝑥 ∈ Null(𝐴). REMARK #» The kernel of 𝑇𝐴 is equal to the solution set of the homogeneous system 𝐴 #» 𝑥 = 0. Definition 5.4.3 One-to-One We say that the transformation 𝑇 : F𝑛 → F𝑚 is one-to-one (or injective) if, whenever 𝑇 ( #» 𝑥 ) = 𝑇 ( #» 𝑦 ), then #» 𝑥 = #» 𝑦. REMARK Notice that the statement For all #» 𝑥 , #» 𝑦 ∈ F𝑛 , if 𝑇 ( #» 𝑥 ) = 𝑇 ( #» 𝑦 ) then #» 𝑥 = #» 𝑦 is logically equivalent to its contrapositive For all #» 𝑥 , #» 𝑦 ∈ F𝑛 , if #» 𝑥 ̸= #» 𝑦 then 𝑇 ( #» 𝑥 ) ̸= 𝑇 ( #» 𝑦) Thus, one-to-one linear transformations have the nice property that they map distinct elements of F𝑛 to distinct elements of F𝑚 . Section 5.4 The Kernel of a Linear Transformation and “One-to-One” Linear Transformations 131 There is a simple way to check if a linear transformation is one-to-one. Proposition 5.4.4 (One-to-One Test) Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. Then #» 𝑇 is one-to-one if and only if Ker(𝑇 ) = { 0 F𝑛 }. #» Proof: Suppose that 𝑇 is one-to-one. We have noted that { 0 F𝑛 } ⊆ Ker(𝑇 ), so it remains #» 𝑥 in Ker(𝑇 ), to show that Ker(𝑇 ) ⊆ { 0 F𝑛 }. For this purpose, choose an arbitrary vector #» #» #» #» #» 𝑥 ) = 𝑇 ( 0 F𝑛 ). Since 𝑇 is so that 𝑇 ( #» 𝑥 ) = 0 F𝑚 . Since 𝑇 ( 0 F𝑛 ) = 0 F𝑚 , we have that 𝑇 ( #» #» #» one-to-one, we conclude that #» 𝑥 = 0 F𝑛 . Thus, Ker(𝑇 ) ⊆ { 0 F𝑛 }. #» 𝑥 , #» 𝑦 ∈ F𝑛 that satisfy 𝑇 ( #» 𝑥 ) = 𝑇 ( #» 𝑦 ), Conversely, suppose that Ker(𝑇 ) = { 0 F𝑛 }. Given any #» #» #» we must show that 𝑥 = 𝑦 . 
To this end, notice that #» #» 𝑇 ( #» 𝑥 ) = 𝑇 ( #» 𝑦 ) =⇒ 𝑇 ( #» 𝑥 ) − 𝑇 ( #» 𝑦 ) = 0 F𝑚 =⇒ 𝑇 ( #» 𝑥 − #» 𝑦 ) = 0 F𝑚 , where the last implication follows from linearity. This shows that #» 𝑥 − #» 𝑦 ∈ Ker(𝑇 ). Since #» #» #» #» by assumption Ker(𝑇 ) = { 0 F𝑛 }, it follows that 𝑥 − 𝑦 = 0 F𝑛 , and therefore #» 𝑥 = #» 𝑦 , as required. If we know that our linear transformation 𝑇 is of the form 𝑇 = 𝑇𝐴 for some matrix 𝐴, then we can say a bit more. Corollary 5.4.5 (One-to-One Criteria) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and let 𝑇𝐴 be the linear transformation determined by the matrix 𝐴. The following statements are equivalent. (a) 𝑇𝐴 is one-to-one. (b) Null(𝐴) = {⃗0F𝑛 }. (c) nullity(𝐴) = 0. (d) rank(𝐴) = 𝑛. Proof: ((𝑎) ⇒ (𝑏)): This follows from Proposition 5.4.2 (Kernel of a Linear Transformation) and Proposition 5.4.4 (One-to-One Test). ((𝑏) ⇒ (𝑐)): #» #» Suppose that Null(𝐴) = { 0 F𝑛 }. Then the consistent system 𝐴 #» 𝑥 = 0 F𝑚 has a unique #» solution, namely #» 𝑥 = 0 F𝑛 . Thus, the solution set of this consistent system has 0 parameters, and so it follows from the definition of nullity that nullity(𝐴) = 0. ((𝑐) ⇒ (𝑑)): Suppose that nullity(𝐴) = 0. Since nullity(𝐴) = 𝑛 − rank(𝐴) by definition, we see that rank(𝐴) = 𝑛. 132 Chapter 5 Linear Transformations ((𝑑) ⇒ (𝑎)): #» Suppose that rank(𝐴) = 𝑛 and consider the system 𝐴 #» 𝑥 = 0 F𝑚 . Note that this system is #» consistent, because #» 𝑥 = 0 F𝑛 is a solution. From Part (a) of Theorem 3.6.7 (System Rank Theorem), it follows that number of parameters in the solution set of this consistent system #» #» 𝑥 = 0 F𝑚 is is 𝑛 − rank(𝐴) = 𝑛 − 𝑛 = 0. We conclude that the solution #» 𝑥 = 0 F𝑛 to 𝐴 #» unique. #» #» This shows that Null(𝐴) = { 0 F𝑛 }. Thus Ker(𝑇𝐴 ) = { 0 F𝑛 } by Proposition 5.4.2 (Kernel of a Linear Transformation), and therefore 𝑇𝐴 is one-to-one by Proposition 5.4.4 (One-to-One Test). Example 5.4.6 Determine whether or not the linear transformation 𝑇𝐴 : R3 → R3 determined by ⎡ ⎤ 2 4 10 𝐴 = ⎣ 4 −4 −4 ⎦ is one-to-one. 6 8 22 Solution: Let us examine the kernel of 𝑇𝐴 , which is the same as looking at the solution #» set to 𝐴 #» 𝑥 = 0 . The augmented matrix is ⎡ ⎤ 2 4 10 0 ⎣ 4 −4 −4 0 ⎦ , 6 8 22 0 which row reduces to ⎡ ⎤ 1250 ⎣0 1 2 0⎦ . 0000 Thus rank(𝐴) = 2 < 3 = 𝑛, and it follows that 𝑇𝐴 is not one-to-one by Part (d) of Corollary 5.4.5 (One-to-One Criteria). By combining Corollary 5.3.6 (Onto Criteria) and Corollary 5.4.5 (One-to-One Criteria) we arrive at the following result concerning square matrices. The proof is left as an exercise. (Compare with Theorem 4.6.7 (Invertibility Criteria – First Version).) Theorem 5.4.7 (Invertibility Criteria – Second Version) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) be a square matrix and let 𝑇𝐴 be the linear transformation determined by the matrix 𝐴. The following statements are equivalent. (a) 𝐴 is invertible. (b) 𝑇𝐴 is one-to-one. (c) 𝑇𝐴 is onto. #» #» (d) Null(𝐴) = { 0 }. That is, the only solution to the homogeneous system 𝐴 #» 𝑥 = 0 is the #» trivial solution #» 𝑥 = 0. #» #» (e) Col(𝐴) = F𝑛 . That is, for every 𝑏 ∈ F𝑛 , the system 𝐴 #» 𝑥 = 𝑏 is consistent. (f) nullity(𝐴) = 0. Section 5.5 Every Linear Transformation is Determined by a Matrix 133 (g) rank(𝐴) = 𝑛. (h) RREF(𝐴) = 𝐼𝑛 . 5.5 Every Linear Transformation is Determined by a Matrix In Section 5.1 we saw that every matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F) determines a linear transformation 𝑇𝐴 : F𝑛 → F𝑚 . We will show in this section that, conversely, every linear transformation 𝑇 : F𝑛 → F𝑚 is actually of the form 𝑇 = 𝑇𝐴 for a certain matrix 𝐴 (that depends on 𝑇 ). 
In fact, we will show that there is a very natural process that constructs this matrix 𝐴 from 𝑇. We will make use of the set ⎧⎡ ⎤ ⎡ ⎤ 1 0 ⎪ ⎪ ⎪ ⎨⎢ 0 ⎥ ⎢ 1 ⎥ ⎢ ⎥ ⎢ ⎥ ℰ = ⎢ . ⎥,⎢ . ⎥,··· ⎪ ⎣ .. ⎦ ⎣ .. ⎦ ⎪ ⎪ ⎩ 0 0 ⎡ ⎤⎫ 0 ⎪ ⎪ ⎢ 0 ⎥⎪ ⎬ ⎢ ⎥ , ⎢ . ⎥ = {𝑒#»1 , 𝑒#»2 , . . . , 𝑒#» 𝑛} ⎣ .. ⎦⎪ ⎪ ⎪ ⎭ 1 of standard basis vectors in F𝑛 (which we had already met in Definition 1.3.15). Example 5.5.1 Let us examine the consequences of linearity in the special [︂case]︂when F𝑛 = F𝑚 = F2 . Thus 𝑥 suppose that 𝑇 : F2 → F2 is a linear mapping and let #» 𝑥 = 1 be a vector in F2 . Then 𝑥2 (︂[︂ 𝑇 𝑥1 𝑥2 ]︂)︂ (︂[︂ ]︂ [︂ ]︂)︂ 𝑥1 0 =𝑇 + 𝑥2 0 [︂ ]︂)︂ (︂ [︂ ]︂ 0 1 + 𝑥2 = 𝑇 𝑥1 1 0 (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 0 1 + 𝑥2 𝑇 = 𝑥1 𝑇 0 1 [︂ ]︂ [︀ ]︀ 𝑥1 = 𝑇 (𝑒#»1 ) 𝑇 (𝑒#»2 ) 𝑥2 [︀ #» ]︀ #» #» = 𝑇 (𝑒1 ) 𝑇 (𝑒2 ) 𝑥 . (by linearity) This shows us that the actual effect of]︀ the linear transformation can be replicated by the [︀ introduction of a matrix 𝑇 (𝑒#»1 ) 𝑇 (𝑒#»2 ) . [︀ ]︀ In addition, this matrix 𝑇 (𝑒#»1 ) 𝑇 (𝑒#»2 ) has columns which are constructed by applying 𝑇 to the basis vectors 𝑒#»1 and 𝑒#»2 in F2 . This means that if we know what the linear transformation does to just these two (standard basis) vectors, then we can determine what it does to all vectors in F2 . Finally, value of 𝑇 ( #» 𝑥 ) can be computed by matrix multiplication of this matrix [︀ #» the#»actual ]︀ 𝑇 (𝑒 ) 𝑇 (𝑒 ) by the component vector #» 𝑥 . This result extends to higher dimensions. 1 2 134 Definition 5.5.2 Standard Matrix Chapter 5 Linear Transformations Let 𝑇 : F𝑛 → F𝑚 be a linear transformation. We define the standard matrix of 𝑇 , denoted by [𝑇 ]ℰ , to be 𝑚 × 𝑛 matrix whose columns are the images under 𝑇 of the vectors in the standard basis of F𝑛 : [︀ ]︀ [𝑇 ]ℰ = 𝑇 (𝑒#»1 ) 𝑇 (𝑒#»2 ) · · · 𝑇 (𝑒#» 𝑛) ⎛⎡ ⎤⎞ ⎤ ⎡ ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ 0 0 1 ⎜⎢ 0 ⎥⎟ ⎥ ⎢ ⎜⎢ 0 ⎥⎟ ⎜⎢ 1 ⎥⎟ ⎢ ⎜⎢ ⎥⎟ ⎜⎢ ⎥⎟ ⎜⎢ ⎥⎟ ⎥ = ⎢ 𝑇 ⎜⎢ . ⎥⎟ 𝑇 ⎜⎢ . ⎥⎟ · · · 𝑇 ⎜⎢ . ⎥⎟ ⎥ . ⎝⎣ .. ⎦⎠ ⎦ ⎣ ⎝⎣ .. ⎦⎠ ⎝⎣ .. ⎦⎠ 0 1 0 REMARKS • There is a 𝑇 to remind us of the linear transformation that we are going to represent. • The square brackets in [𝑇 ]ℰ is our way of indicating that we are converting a linear transformation into a matrix. • The ℰ indicates that the standard basis is being used for both the domain and the codomain. Later we will investigate the consequences of using different bases in either or both of these two sets. Theorem 5.5.3 (Every Linear Transformation Is Determined by a Matrix) Let 𝑇 : F𝑛 → F𝑚 be a linear transformation and let [𝑇 ]ℰ be the standard matrix of 𝑇 . Then for all #» 𝑥 ∈ F𝑛 , 𝑇 ( #» 𝑥 ) = [𝑇 ]ℰ #» 𝑥. That is, 𝑇 = 𝑇[𝑇 ]ℰ is the linear transformation determined by the matrix [𝑇 ]ℰ . [︀ ]︀𝑇 Proof: Let #» 𝑥 ∈ F𝑛 with #» 𝑥 = 𝑥1 𝑥2 · · · 𝑥𝑛 . Then ⎤⎞ 𝑥1 ⎜⎢ 𝑥2 ⎥⎟ ⎜⎢ ⎥⎟ 𝑇 ( #» 𝑥 ) = 𝑇 ⎜⎢ . ⎥⎟ ⎝⎣ .. ⎦⎠ ⎛⎡ 𝑥𝑛 ⎛⎡ ⎤ ⎡ ⎤ ⎡ 𝑥1 0 ⎢ ⎜⎢ 0 ⎥ ⎢ 𝑥2 ⎥ ⎢ ⎜⎢ ⎥ ⎢ ⎥ = 𝑇 ⎜⎢ . ⎥ + ⎢ . ⎥ + · · · + ⎢ . . ⎣ ⎝⎣ . ⎦ ⎣ . ⎦ 0 0 .. . ⎤⎞ ⎥⎟ ⎥⎟ ⎥⎟ ⎦⎠ 𝑥𝑛 0 0 = 𝑇 (𝑥1 𝑒#»1 + 𝑥2 𝑒#»2 + · · · + 𝑥𝑛 𝑒#» 𝑛) #» #» = 𝑥1 𝑇 (𝑒1 ) + 𝑥2 𝑇 (𝑒2 ) + · · · + 𝑥𝑛 𝑇 (𝑒#» 𝑛) (using linearity) Section 5.5 Every Linear Transformation is Determined by a Matrix 135 ⎤ 𝑥1 ⎥ [︀ ]︀ ⎢ ⎢ 𝑥2 ⎥ = 𝑇 (𝑒#»1 ) 𝑇 (𝑒#»2 ) · · · 𝑇 (𝑒#» 𝑛 ) ⎢ .. ⎥ ⎣ . ⎦ ⎡ = [𝑇 ]ℰ #» 𝑥. 𝑥𝑛 This result is quite remarkable. First, the matrix representation provides us with a very elegant and compact way of expressing the linear transformation. Second, the construction of the standard matrix is very simple. We must only find the images of the 𝑛 standard basis vectors and build the standard matrix column by column from these images. 
Third, once we have this standard matrix, we can find the image of any vector in F𝑛 under the linear transformation using matrix–vector multiplication. Thus, if 𝑇 : F𝑛 → F𝑚 is a linear transformation, then once we know 𝑇 (𝑒#»1 ), . . . , 𝑇 (𝑒#» 𝑛 ), we can compute 𝑇 ( #» 𝑥 ) for all #» 𝑥 ∈ F𝑛 . This property is incredibly strong and is one of the key features of linear transformations. In particular, linear transformations 𝑇 : R → R are very simple. For example, given 𝑔(𝑥) = 𝑚𝑥 (a linear function through the origin), it is possible to determine the value of 𝑚 if we are provided with any non-zero point 𝑃 that falls on the line. It turns out that these functions completely categorize linear transformations 𝑇 : R → R, which is our result below. Proposition 5.5.4 Let 𝑇 : R → R be a linear transformation. Then there is a real number 𝑚 ∈ R such that 𝑇 (𝑥) = 𝑚𝑥 for all 𝑥 ∈ R. Proof: We have 𝑇 (𝑥) = [𝑇 ]ℰ 𝑥, where the standard matrix [𝑇 ]ℰ of 𝑇 is the 1 × 1 matrix [𝑇 (1)]. Thus, if we let 𝑚 = 𝑇 (1), we find that 𝑇 (𝑥) = 𝑚𝑥, as required. Note that the above property is not true of non-linear transformations. For example, the (non-linear) function 𝑓 : R → R defined by 𝑓 (𝑥) = sin(𝑥) cannot be determined from knowing the value of sin(1). To conclude this section, we summarize what we have learned. Given a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (F), we can define a linear transformation 𝑇𝐴 : F𝑛 → F𝑚 . Conversely, given a linear transformation 𝑇 : F𝑛 → F𝑚 , we can construct a matrix [𝑇 ]ℰ ∈ 𝑀𝑚×𝑛 (F). What happens if we combine these two processes? That is, what can we say about 𝑇[𝑇 ]ℰ and [𝑇𝐴 ]ℰ ? We answer both these questions and state a couple of useful consequences in the following result. 136 Proposition 5.5.5 Chapter 5 Linear Transformations (Properties of a Standard Matrix) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F), let 𝑇𝐴 : F𝑛 → F𝑚 be the linear transformation determined by 𝐴, and let 𝑇 : F𝑛 → F𝑚 be a linear transformation. Then (a) 𝑇[𝑇 ]ℰ = 𝑇 . (b) [𝑇𝐴 ]ℰ = 𝐴. (c) 𝑇 is onto if and only if rank([𝑇 ]ℰ ) = 𝑚. (d) 𝑇 is one-to-one if and only if rank([𝑇 ]ℰ ) = 𝑛. Proof: (a) Using the definition of the linear transformation defined by a matrix, we have 𝑇[𝑇 ]ℰ #» 𝑥 = [𝑇 ]ℰ #» 𝑥 = 𝑇 #» 𝑥 for all #» 𝑥 ∈ F𝑛 , where the last equality follows from Theorem 5.5.3 (Every Linear Transformation Is Determined by a Matrix). Thus, the two functions 𝑇[𝑇 ]ℰ and 𝑇 have the same values at each #» 𝑥 ∈ F𝑛 , so they must be equal. (b) Using the definition of the standard matrix, we have [︀ ]︀ [𝑇𝐴 ]ℰ = 𝑇𝐴 (𝑒#»1 ) · · · 𝑇𝐴 (𝑒#» 𝑛) [︀ ]︀ = 𝐴𝑒#»1 · · · 𝐴𝑒#» 𝑛 [︀ ]︀ = 𝑎#»1 · · · 𝑎# » (by Lemma 4.2.2 (Column Extraction)) 𝑛 = 𝐴. (c) This follows from Corollary 5.3.6 (Onto Criteria). (d) This follows from Corollary 5.4.5 (One-to-One Criteria). 5.6 Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection In this section, we will revisit the operations of projection and perpendicular discussed in Chapter 1, prove that both of these types of transformations are linear, and determine their standard matrices. We will do the same for the operations of rotation and reflection. Example 5.6.1 (Projection onto a line through the origin) ]︂ 𝑤1 be a non-zero vector in R2 . 𝑤2 proj 𝑤#» : R2 → R2 defined in Section 1.6. #» = Let 𝑤 [︂ (a) Prove that proj 𝑤#» is a linear transformation. Consider the projection transformation Section 5.6 Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection 137 (b) Determine the standard matrix of proj 𝑤#» . (c) Determine whether proj 𝑤#» is onto. (d) Determine whether proj 𝑤#» is one-to-one. 
Solution: #» is defined by (a) Recall that if #» 𝑣 is a vector in R2 , then the projection of #» 𝑣 onto 𝑤 #» #» 𝑣 ·𝑤 #» 𝑣 ) = #» 2 𝑤. proj 𝑤#» ( #» ‖ 𝑤‖ Now let #» 𝑢 , #» 𝑣 ∈ R2 and let 𝑐1 , 𝑐2 ∈ R. By the linearity properties of the dot product, #» #» #» #» 𝑢 ·𝑤 #» + 𝑐 𝑣 · 𝑤 𝑤 #» proj 𝑤#» (𝑐1 #» 𝑢 + 𝑐2 #» 𝑣 ) = 𝑐1 #» 2 𝑤 2 #» 2 ‖ 𝑤‖ ‖ 𝑤‖ = 𝑐 proj #» ( #» 𝑢 ) + 𝑐 proj #» ( #» 𝑣 ). 1 𝑤 2 𝑤 We conclude that proj 𝑤#» is a linear transformation. (b) To find the standard matrix [proj 𝑤#» ]ℰ of proj 𝑤#» , recall that ]︀ [︀ [proj 𝑤#» ]ℰ = proj 𝑤#» (𝑒#»1 ) proj 𝑤#» (𝑒#»2 ) . Since [︂ ]︂ [︂ 2 ]︂ #» 𝑒#»1 · 𝑤 𝑤1 · 1 + 𝑤2 · 0 𝑤1 1 𝑤1 #» #» proj 𝑤#» (𝑒1 ) = #» 2 𝑤 = = 2 𝑤2 ‖ 𝑤‖ 𝑤12 + 𝑤22 𝑤1 + 𝑤22 𝑤1 𝑤2 [︂ ]︂ [︂ ]︂ #» 𝑒#»2 · 𝑤 1 𝑤1 𝑤2 #» = 𝑤1 · 0 + 𝑤2 · 1 𝑤1 = proj 𝑤#» (𝑒#»2 ) = #» 2 𝑤 , 𝑤2 𝑤2 ‖ 𝑤‖ 𝑤2 + 𝑤2 𝑤2 + 𝑤2 1 we have 2 1 2 2 [︂ 2 ]︂ 1 𝑤1 𝑤1 𝑤2 [proj 𝑤#» ]ℰ = 2 . 𝑤1 + 𝑤22 𝑤1 𝑤2 𝑤22 (c) There are two ways in which we can argue why proj 𝑤#» is not an onto linear transfor#» mation. Thinking geometrically, note that all images of proj 𝑤#» lie on the line Span{ 𝑤} and nowhere else in R2 . Equivalently, [︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂}︂ {︂ 𝑤2 𝑤1 𝑤1 𝑤1 𝑤1 Range(proj 𝑤#» ) = Span , 2 = Span ̸= R2 . 2 2 2 𝑤 𝑤 𝑤 𝑤1 + 𝑤2 𝑤1 + 𝑤2 2 2 2 (d) Once again, there are two ways in which we can argue why proj 𝑤#» is not a one-toone linear transformation. Thinking geometrically, note that there are many different vectors which have the same image when they are projected onto the line. Equivalently, the kernel is obtained by solving the system [︂ 2 ]︂ 1 𝑤1 𝑤1 𝑤2 #» #» 𝑣 = 0, 𝑤12 + 𝑤22 𝑤1 𝑤2 𝑤22 which row-reduces to [︂ ]︂ 𝑤1 𝑤2 #» #» 𝑣 = 0. 0 0 Since the rank of the coefficient matrix is strictly less than 2, it follows that proj 𝑤#» not one-to-one from Corollary 5.4.5 (One-to-One Criteria). 138 Chapter 5 Linear Transformations Note that if you want to project onto a line that does not pass through the origin, then you can project on the parallel line through the origin and then translate the solution. Note, #» however, that this function is not an example of a linear transformation, since 0 would not #» be mapped to 0 . EXERCISE #» be a non-zero vector in R𝑛 and consider the perpendicular transformation Let 𝑤 perp 𝑤#» : R𝑛 → R𝑛 defined in Section 1.6 for #» 𝑣 ∈ R𝑛 : 𝑣 ). perp 𝑤#» ( #» 𝑣 ) = #» 𝑣 − proj 𝑤#» ( #» (a) Prove that perp 𝑤#» is a linear transformation. (b) Determine the standard matrix of perp 𝑤#» . (c) Determine whether perp 𝑤#» is onto. (d) Determine whether perp 𝑤#» is one-to-one. Example 5.6.2 (Projection onto a plane through the origin) We can define the projection map onto a plane through the origin by using a normal vector #» ∈ 𝑃 is orthogonal of the plane. For a plane 𝑃 with normal vector #» 𝑛 , we know every vector 𝑤 #» #» #» to 𝑛 , so we know that 𝑤 = perp 𝑛#» ( 𝑤). #» ∈ R3 , we think of proj #» ( #» #» #» Intuitively, for #» 𝑣,𝑤 𝑤 𝑣 ) as the 𝑤-component of 𝑣 ; the same analogy can be made with planes. With this intuition in place, we define the projection of #» 𝑣 onto the plane 𝑃 as proj𝑃 ( #» 𝑣 ) = perp 𝑛#» ( #» 𝑣 ). With this established, consider the projection map proj𝑃 : R3 → R3 onto a plane 𝑃 , where 𝑃 is defined by the scalar equation 3𝑥 − 4𝑦 + 5𝑧 = 0. Section 5.6 Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection 139 (a) Prove that proj𝑃 is a linear transformation. (b) Determine the standard matrix of proj𝑃 . (c) Determine whether proj𝑃 is onto. (d) Determine whether proj𝑃 is one-to-one. Solution: [︀ ]︀𝑇 (a) Let #» 𝑛 = 3 −4 5 be a normal to the plane 𝑃 . 
Then proj𝑃 ( #» 𝑣 ) = perp 𝑛#» ( #» 𝑣 ), and because we know (from the exercise above) that perp 𝑛#» is a linear transformation, we conclude that proj𝑃 is a linear transformation as well. (b) To find the standard matrix [proj𝑃 ]ℰ of proj𝑃 , recall that ]︀ [︀ [proj𝑃 ]ℰ = proj𝑃 (𝑒#»1 ) proj𝑃 (𝑒#»2 ) proj𝑃 (𝑒#»3 ) Since proj𝑃 (𝑒#»1 ) = perp 𝑛#» (𝑒#»1 ) = 𝑒#»1 − proj 𝑛#» (𝑒#»1 ) 𝑒#»1 · #» 𝑛 = 𝑒#»1 − #» 2 #» 𝑛 ‖𝑛‖ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 3 41/50 1 1 · 3 + 0 · (−4) + 0 · 5 ⎣ ⎦ ⎣ −4 = 6/25 ⎦ = ⎣0⎦ − 50 5 −3/10 0 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 3 6/25 0 · 3 + 1 · (−4) + 0 · 5 ⎣ −4 ⎦ = ⎣ 17/25 ⎦ proj𝑃 (𝑒#»2 ) = ⎣ 1 ⎦ − 50 0 5 2/5 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 3 −3/10 0 · 3 + 0 · (−4) + 1 · 5 ⎣ −4 ⎦ = ⎣ 2/5 ⎦ proj𝑃 (𝑒#»3 ) = ⎣ 0 ⎦ − 50 1 5 1/2 we have that ⎡ ⎤ 41/50 6/25 −3/10 [proj𝑃 ]ℰ = ⎣ 6/25 17/25 2/5 ⎦ . −3/10 2/5 1/2 (c) There are two ways in which we can argue why proj𝑃 is not onto. Thinking geometrically, note that all images lie on the plane 3𝑥 − 4𝑦 + 5𝑧 = 0 and nowhere else in R3 . Equivalently, we could investigate the range and show that it is not R3 . This can be done, for example, by finding one vector #» 𝑣 ∈ R3 such that the system [proj𝑃 ]ℰ #» 𝑥 = #» 𝑣 #» #» is inconsistent. This can be achieved by taking any vector 𝑣 ∈ / 𝑃 , such as 𝑣 = 𝑒#»1 𝑣 ∈ / Range(proj𝑃 ), we conclude that (notice that 1 · 3 + 0 · (−4) + 0 · 5 ̸= 0). Since #» 3 Range(proj𝑃 ) ̸= R , and so proj𝑃 is not onto. 140 Chapter 5 Linear Transformations (d) Once again, there are two ways in which we can argue why proj𝑃 is not one-to-one. Thinking geometrically, note that many different points have the same image when they are projected onto the plane. Equivalently, the kernel is obtained by solving #» [proj𝑃 ]ℰ #» 𝑥 = 0. Since the rank of the coefficient matrix is strictly less than 3, it follows that proj𝑃 not one-to-one from Corollary 5.4.5 (One-to-One Criteria). Example 5.6.3 (Rotation about the origin by an angle 𝜃) Consider the transformation 𝑅𝜃 : R2 → R2 that rotates every vector #» 𝑥 ∈ R2 by a fixed angle 𝜃 counter-clockwise about the origin. We will assume 0 < 𝜃 < 2𝜋 so that the rotation action is not trivial. This linear transformation is illustrated in the diagram below. Notice that the variable 𝑟 represents the length of a vector #» 𝑥. (a) Prove that 𝑅𝜃 is a linear transformation. (b) Determine the standard matrix of 𝑅𝜃 . (c) Determine whether 𝑅𝜃 is onto. (d) Determine whether 𝑅𝜃 is one-to-one. Solution: (a) We will prove that 𝑅𝜃 is a linear transformation by showing that[︂ it can ]︂ be written in 𝑥1 the form 𝑅𝜃 ( #» 𝑥 ) = 𝐴 #» 𝑥 for some matrix 𝐴 ∈ 𝑀2×2 (R). Let #» 𝑥 = be an arbitrary 𝑥2 vector in R2 . Recall that we can convert #» 𝑥 into its polar representation by taking #» #» 𝑟 = ‖ 𝑥 ‖ (the length of 𝑥 ) and multiplying by cos 𝜑 to get the 𝑥-coordinate and sin 𝜑 to get the 𝑦-coordinate, where 𝜑 is the angle between 𝑒#»1 and #» 𝑥. [︂ ]︂ 𝑟 cos 𝜑 #» #» Converting 𝑥 into polar coordinates, we find that 𝑥 = , Since 𝑅𝜃 ( #» 𝑥 ) is the 𝑟 sin 𝜑 result of a counter-clockwise rotation of #» 𝑥 by an angle 𝜃 about the origin, (︂[︂ 𝑅𝜃 𝑟 cos 𝜑 𝑟 sin 𝜑 ]︂)︂ [︂ ]︂ 𝑟 cos(𝜑 + 𝜃) = . 
𝑟 sin(𝜑 + 𝜃) Section 5.6 Special Linear Transformations: Projection, Perpendicular, Rotation and Reflection 141 By making use of the trigonometric angle-sum identities cos (𝜑 + 𝜃) = cos 𝜑 cos 𝜃 − sin 𝜑 sin 𝜃 and sin (𝜑 + 𝜃) = sin 𝜑 cos 𝜃 + cos 𝜑 sin 𝜃, we have (︂[︂ ]︂)︂ 𝑟 cos 𝜑 𝑅𝜃 ( #» 𝑥 ) = 𝑅𝜃 𝑟 sin 𝜑 [︂ ]︂ 𝑟 cos(𝜑 + 𝜃) = 𝑟 sin(𝜑 + 𝜃) [︂ ]︂ 𝑟(cos 𝜑 cos 𝜃 − sin 𝜑 sin 𝜃) = 𝑟(sin 𝜑 cos 𝜃 + cos 𝜑 sin 𝜃) [︂ ]︂ cos 𝜃(𝑟 cos 𝜑) − sin 𝜃(𝑟 sin 𝜑) = sin 𝜃(𝑟 cos 𝜑) + cos 𝜃(𝑟 sin 𝜑) ]︂ ]︂ [︂ [︂ cos 𝜃 − sin 𝜃 𝑟 cos 𝜑 = 𝑟 sin 𝜑 sin 𝜃 cos 𝜃 = 𝐴 #» 𝑥, ]︂ cos 𝜃 − sin 𝜃 . Since we were able to express 𝑅𝜃 in the form of a matrixwhere 𝐴 = sin 𝜃 cos 𝜃 vector product, it must be the case that 𝑅𝜃 is a linear transformation. [︂ (b) From part (a) we find that ]︂ cos 𝜃 − sin 𝜃 . [𝑅𝜃 ]ℰ = 𝐴 = sin 𝜃 cos 𝜃 [︂ (c) There are two ways in which we can argue why 𝑅𝜃 is onto. Thinking geometrically, note that any vector in R2 can be obtained by a rotation from an appropriate starting vector. Equivalently, as Range(𝑅𝜃 ) = Col([𝑅𝜃 ]ℰ ) by Proposition 5.3.2 (Range of a Linear Transformation), we have {︂[︂ Range(𝑅𝜃 ) = Span ]︂ [︂ ]︂}︂ cos 𝜃 − sin 𝜃 , sin 𝜃 cos 𝜃 and it can be shown that this is R2 . (d) Once again, there are two ways in which we can argue why 𝑅𝜃 is one-to-one. Thinking geometrically, note that any two different vectors have different images when they are rotated. Equivalently, as Ker(𝑅𝜃 ) = Null([𝑅𝜃 ]ℰ ) by Proposition 5.4.2 (Kernel of a Linear Transformation), we have [︂ ]︂ cos 𝜃 − sin 𝜃 #» #» 𝑥 = 0. sin 𝜃 cos 𝜃 Given that the angle 𝜃 is fixed here, we view sin 𝜃 and cos 𝜃 as constants, so we can reduce this matrix as normal. We wish to show that the rank of the coefficient matrix is 2, so that the only solution to the system above is the trivial solution, allowing us to conclude that this linear transformation is one-to-one. 142 Chapter 5 Linear Transformations First, note that if cos(𝜃) = 0 or sin(𝜃) = 0 (conditions which cannot occur simultaneously) then the matrix is either in REF or a row swap will establish an REF. As such, in these scenarios the rank is 2 and the linear transformation is one-to-one. In the scenarios where sin 𝜃 ̸= 0 and cos 𝜃 ̸= 0, we have {︃ [︂ ]︂ 𝑅1 → (sin 𝜃)𝑅1 sin 𝜃 cos 𝜃 − sin2 𝜃 gives sin 𝜃 cos 𝜃 cos2 𝜃 𝑅2 → (cos 𝜃)𝑅2 [︂ ]︂ [︂ ]︂ sin 𝜃 cos 𝜃 − sin2 𝜃 sin 𝜃 cos 𝜃 − sin2 𝜃 𝑅2 → 𝑅2 − 𝑅1 gives = 0 cos2 𝜃 + sin2 𝜃 0 1 We 𝜃]︂ ̸= 0 as sin 𝜃 ̸= 0 and cos 𝜃 ̸= 0, so we can conclude that [︂ know sin 𝜃 cos sin 𝜃 cos 𝜃 − sin2 𝜃 is an REF of [𝑅𝜃 ]ℰ , so again the rank is 2. Thus, we can say 0 1 in all cases that 𝑅𝜃 is one-to-one. Example 5.6.4 (Reflection about a line through the origin) [︂ ]︂ 𝑤1 be a non-zero vector in R2 , and consider the reflection transformation 𝑤2 #» This linear refl 𝑤#» : R2 → R2 , which reflects any vector #» 𝑣 ∈ R2 about the line Span{ 𝑤}. transformation is illustrated in the diagram below. #» = Let 𝑤 Using the geometric construction above, one can show that any vector #» 𝑣 ∈ R2 gets reflected #» according to the rule below. This is left as an exercise. about the line Span{ 𝑤} refl 𝑤#» ( #» 𝑣 ) = #» 𝑣 − 2 perp 𝑤#» ( #» 𝑣 ). (a) Prove that refl 𝑤#» is a linear transformation. (b) Determine the standard matrix of refl 𝑤#» . (c) Determine whether refl 𝑤#» is onto. (d) Determine whether refl 𝑤#» is one-to-one. Solution: Section 5.7 Composition of Linear Transformations 143 (a) Let #» 𝑢 , #» 𝑣 ∈ R2 and let 𝑐1 , 𝑐2 ∈ F. 
Since the perpendicular transformation is linear, refl 𝑤#» (𝑐1 #» 𝑢 + 𝑐2 #» 𝑣 ) = (𝑐1 #» 𝑢 + 𝑐2 #» 𝑣 ) − 2 perp 𝑤#» (𝑐1 #» 𝑢 + 𝑐2 #» 𝑣) #» #» #» = (𝑐 𝑢 + 𝑐 𝑣 ) − 2𝑐 perp #» ( 𝑢 ) − 2𝑐 perp #» ( #» 𝑣) 1 2 1 2 𝑤 𝑤 = 𝑐1 #» 𝑢 + 𝑐2 #» 𝑣 − 2𝑐1 perp 𝑤#» ( #» 𝑢 ) − 2𝑐2 perp 𝑤#» ( #» 𝑣) #» #» #» #» = 𝑐 ( 𝑢 − 2 perp #» ( 𝑢 )) + 𝑐 ( 𝑣 − 2 perp #» ( 𝑣 )) 1 𝑤 2 𝑤 𝑣 ). 𝑢 ) + 𝑐2 refl 𝑤#» ( #» = 𝑐1 refl 𝑤#» ( #» We conclude that refl 𝑤#» is a linear transformation. (b) To find the standard matrix [refl 𝑤#» ]ℰ of refl 𝑤#» , note that 𝑣 ). 𝑣 ) = − #» 𝑣 + 2 proj 𝑤#» ( #» refl 𝑤#» ( #» 𝑣 ) = #» 𝑣 − 2 perp 𝑤#» ( #» Since [︂ ]︂ 2 1 #» refl 𝑤#» (𝑒1 ) = − + 2 0 𝑤1 + 𝑤22 [︂ ]︂ 2 0 + 2 refl 𝑤#» (𝑒#»2 ) = − 1 𝑤1 + 𝑤22 ]︂ 1 𝑤12 = 2 𝑤1 𝑤2 𝑤1 + 𝑤22 ]︂ [︂ 1 𝑤1 𝑤2 = 2 2 𝑤2 𝑤1 + 𝑤22 [︂ ]︂ 𝑤12 − 𝑤22 , 2𝑤1 𝑤2 ]︂ [︂ 2𝑤1 𝑤2 , 𝑤22 − 𝑤12 [︂ we have that [refl 𝑤#» ]ℰ = [︀ refl 𝑤#» (𝑒#»1 ) refl 𝑤#» (𝑒#»2 ) ]︀ [︂ 2 ]︂ 1 𝑤1 − 𝑤22 2𝑤1 𝑤2 = 2 . 𝑤1 + 𝑤22 2𝑤1 𝑤2 𝑤22 − 𝑤12 (c) There are two ways in which we can argue why refl 𝑤#» is onto. Thinking geometrically, note that any vector in R2 can be obtained by a reflection from an appropriate starting vector. Equivalently, the range is ]︂ [︂ ]︂}︂ {︂[︂ 2 2𝑤1 𝑤2 𝑤1 − 𝑤22 #» , Range(refl 𝑤 ) = Span 2𝑤1 𝑤2 𝑤22 − 𝑤12 and it can be shown that this is equal to R2 . (d) Once again, there are two ways in which we can argue why 𝑅𝜃 is one-to-one. Thinking geometrically, note that any two different vectors have different images when they are reflected. Equivalently, the kernel is obtained by solving #» [refl 𝑤#» ]ℰ #» 𝑥 = 0. Since the rank of the coefficient matrix is equal to 2, then the only solution is the trivial solution and so this linear transformation is one-to-one. 5.7 Composition of Linear Transformations In this section, we will consider the composition of linear transformations and show how it is related to matrix multiplication. 144 Definition 5.7.1 Composition of Linear Transformations Chapter 5 Linear Transformations Let 𝑇1 : F𝑛 → F𝑚 and 𝑇2 : F𝑚 → F𝑝 be linear transformations. We define the function 𝑇2 ∘ 𝑇1 : F𝑛 → F𝑝 by (𝑇2 ∘ 𝑇1 )( #» 𝑥 ) = 𝑇2 (𝑇1 ( #» 𝑥 )). The function 𝑇2 ∘ 𝑇1 is called the composite function of 𝑇2 and 𝑇1 . Proposition 5.7.2 (Composition of Linear Transformations Is Linear) Let 𝑇1 : F𝑛 → F𝑚 and 𝑇2 : F𝑚 → F𝑝 be linear transformations. Then 𝑇2 ∘ 𝑇1 is a linear transformation. Proof: Since 𝑇1 is linear, we have that for all #» 𝑥 , #» 𝑦 ∈ F𝑛 , 𝑐 ∈ F, 𝑇1 (𝑐 #» 𝑥 + #» 𝑦 ) = 𝑐𝑇1 ( #» 𝑥 ) + 𝑇1 ( #» 𝑦 ). #» #» Since 𝑇2 is linear, we have that for all 𝑤, 𝑧 ∈ F𝑚 , 𝑎 ∈ F, #» + #» #» + 𝑇 ( #» 𝑇2 (𝑎 𝑤 𝑧 ) = 𝑎𝑇2 ( 𝑤) 2 𝑧 ). Now, consider the action of the composition on 𝑐 #» 𝑥 + #» 𝑦: (𝑇2 ∘ 𝑇1 )(𝑐 #» 𝑥 + #» 𝑦 ) = 𝑇2 (𝑐𝑇1 ( #» 𝑥 ) + 𝑇1 ( #» 𝑦 )) . #» and 𝑇 ( #» #» We can let 𝑇1 ( #» 𝑥) = 𝑤 1 𝑦 ) = 𝑧 , so that #» + #» (𝑇2 ∘ 𝑇1 )(𝑐 #» 𝑥 + #» 𝑥 2 ) = 𝑇2 (𝑐 𝑤 𝑧) #» = 𝑐𝑇 ( 𝑤) + 𝑇 ( #» 𝑧 ), 2 2 = 𝑐𝑇2 (𝑇1 ( #» 𝑥 )) + 𝑇2 (𝑇1 ( #» 𝑦 )) #» = 𝑐(𝑇 ∘ 𝑇 )( 𝑥 ) + (𝑇 ∘ 𝑇 )( #» 𝑦 ). 2 1 2 (by linearity of 𝑇2 ) 1 Therefore, 𝑇2 ∘ 𝑇1 is a linear transformation. Proposition 5.7.3 (The Standard Matrix of a Composition of Linear Transformations) Let 𝑇1 : F𝑛 → F𝑚 and 𝑇2 : F𝑚 → F𝑝 be linear transformations. Then the standard matrix of 𝑇2 ∘ 𝑇1 is equal to the product of standard matrices of 𝑇2 and 𝑇1 . That is, [𝑇2 ∘ 𝑇1 ]ℰ = [𝑇2 ]ℰ [𝑇1 ]ℰ . Proof: Let 𝐴 = [𝑇1 ]ℰ , 𝐵 = [𝑇2 ]ℰ , and let 𝑇𝐵𝐴 : F𝑛 → F𝑝 denote the linear transformation determined by the matrix 𝐵𝐴. Note that, for all #» 𝑣 ∈ F𝑛 , (𝑇2 ∘ 𝑇1 )( #» 𝑣 ) = 𝑇2 (𝑇1 ( #» 𝑣 )) #» = 𝑇 (𝐴 𝑣 ) 2 = 𝐵(𝐴 #» 𝑣) = (𝐵𝐴) #» 𝑣 = 𝑇𝐵𝐴 ( #» 𝑣 ). 
Section 5.7 Composition of Linear Transformations 145 We conclude that the linear transformations 𝑇2 ∘ 𝑇1 and 𝑇𝐵𝐴 are identical. Consequently, (𝑇2 ∘ 𝑇1 )(𝑒#»𝑖 ) = 𝑇𝐵𝐴 (𝑒#»𝑖 ) for all 𝑖 = 1, . . . , 𝑛. But then [︀ ]︀ [𝑇2 ∘ 𝑇1 ]ℰ = (𝑇2 ∘ 𝑇1 )(𝑒#»1 ) · · · (𝑇2 ∘ 𝑇𝑛 )(𝑒#» 𝑛) [︀ ]︀ = 𝑇𝐵𝐴 (𝑒#»1 ) · · · 𝑇𝐵𝐴 (𝑒#» 𝑛) ]︀ [︀ = (𝐵𝐴)𝑒#»1 · · · (𝐵𝐴)𝑒#» 𝑛 = 𝐵𝐴 (by Lemma 4.2.2 (Column Extraction)) = [𝑇2 ]ℰ [𝑇1 ]ℰ . This result gives us a very efficient way of obtaining the matrix representation of a composite linear transformation. It also explains a motivation behind the definition of matrix multiplication. ⎤ 3𝑖 1 + 𝑖 1 + 𝑖 −3𝑖 −2 0 ⎦ be a matrix in Let 𝐴 = be a matrix in 𝑀2×3 (C) and 𝐵 = ⎣ 1 2 1 + 2𝑖 4𝑖 1 − 𝑖 3 − 2𝑖 𝑀3×2 (C). Let 𝑇𝐴 and 𝑇𝐵 be the linear transformations determined by 𝐴 and 𝐵, respectively. Determine the standard matrix of their composite linear transformation 𝑇𝐵 ∘ 𝑇𝐴 . [︂ Example 5.7.4 ⎡ ]︂ Solution: By Proposition 5.7.3 (The Standard Matrix of a Composition of Linear Transformations), [𝑇𝐵 ∘ 𝑇𝐴 ]ℰ = [𝑇𝐵 ]ℰ [𝑇𝐴 ]ℰ = 𝐵𝐴. Therefore, ⎡ ⎤ ⎤ ]︂ −1 + 5𝑖 8 + 3𝑖 −4 − 2𝑖 3𝑖 1 + 𝑖 [︂ 1 + 𝑖 −3𝑖 −2 −2 ⎦ . 0 ⎦ = ⎣ 1 + 𝑖 −3𝑖 [𝑇𝐵 ∘ 𝑇𝐴 ]ℰ = ⎣ 1 2 1 + 2𝑖 4𝑖 8 − 4𝑖 4 + 𝑖 6 + 14𝑖 1 − 𝑖 3 − 2𝑖 ⎡ Example 5.7.5 Find the standard matrix of the linear transformation 𝑇 : R2 → R2 which is defined by 𝑇1 , a rotation counter-clockwise about the origin by an angle of 𝜋3 radians, followed by 𝑇2 , a projection onto the line 𝑦 = −3𝑥. Solution From Example 5.6.3 we have that √ ]︃ [︃ ]︃ [︃ 3 1 cos( 𝜋3 ) − sin( 𝜋3 ) − 2 2 √ [𝑇1 ]ℰ = [𝑅𝜋/3 ]ℰ = = . 3 1 sin( 𝜋3 ) cos( 𝜋3 ) 2 2 [︀ ]︀𝑇 Next, note that the line 𝑦 = −3𝑥 can be written as Span{𝑤}, ⃗ where 𝑤 ⃗ = 1 −3 . From Example 5.6.1 we have that for [︂ ]︂ [︂ ]︂ 1 1 1 −3 1 −3 . [𝑇2 ]ℰ = [proj 𝑤#» ]ℰ = = 1 + (−3)2 −3 9 10 −3 9 Thus, we have √ ]︃ [︃ √ √ ]︃ [︂ ]︂ [︃ 1 3 − 1 − 3 3 −3 − 3 1 1 1 −3 2 2 √ √ √ [𝑇 ]ℰ = [𝑇2 ]ℰ [𝑇1 ]ℰ = = . 3 1 10 −3 9 20 −3 + 9 3 3 3 + 9 2 2 146 Definition 5.7.6 Identity Transformation Chapter 5 Linear Transformations The linear transformation id𝑛 : F𝑛 → F𝑛 such that id𝑛 ( #» 𝑥 ) = #» 𝑥 for all #» 𝑥 ∈ F𝑛 is called the identity transformation. EXERCISE Show that the standard matrix [id𝑛 ]ℰ of id𝑛 is the identity matrix 𝐼𝑛 . A special case of composition arises when 𝑇 : F𝑛 → F𝑛 has the same domain and codomain, in which case we can apply the linear transformation 𝑇 more than one time. Definition 5.7.7 𝑇𝑝 Let 𝑇 : F𝑛 → F𝑛 and let 𝑝 > 1 be an integer. We then define the 𝑝𝑡ℎ power of 𝑇 , denoted by 𝑇 𝑝 , inductively by 𝑇 𝑝 = 𝑇 ∘ 𝑇 𝑝−1 . We also define 𝑇 0 = id𝑛 . Corollary 5.7.8 Let 𝑇 : F𝑛 → F𝑛 be a linear transformation and let 𝑝 > 1 be an integer. Then the standard matrix of 𝑇 𝑝 is the 𝑝𝑡ℎ power of the standard matrix of 𝑇 . That is, [𝑇 𝑝 ]ℰ = ([𝑇 ]ℰ )𝑝 . Proof: Use induction and Proposition 5.7.3 (The Standard Matrix of a Composition of Linear Transformations). Example 5.7.9 Consider the linear transformation defined by a rotation about the origin by an angle of 𝜃 2 2 counter-clockwise in the ]︂ plane, denoted by 𝑅𝜃 : R → R . In Example 5.6.3 we found that [︂ cos 𝜃 − sin 𝜃 . Determine the standard matrix of 𝑅𝜃2 . 
[𝑇𝜃 ]ℰ = sin 𝜃 cos 𝜃 Solution: We have [︂ ]︂ [︂ ]︂ [︀ 2 ]︀ cos(𝜃) − sin(𝜃) cos(𝜃) − sin(𝜃) 2 𝑅𝜃 ℰ = ([𝑅𝜃 ]ℰ ) = sin(𝜃) cos(𝜃) sin(𝜃) cos(𝜃) [︂ = cos2 (𝜃) − sin2 (𝜃) −2 cos(𝜃) sin(𝜃) 2 cos(𝜃) sin(𝜃) cos2 (𝜃) − sin2 (𝜃) ]︂ From here, we can make use of the trigonometric double-angle identities, cos(2𝜃) = cos2 (𝜃) − sin2 (𝜃) and sin(2𝜃) = 2 cos(𝜃) sin(𝜃), to obtain [︂ ]︂ [︀ 2 ]︀ cos(2𝜃) − sin(2𝜃) 𝑅𝜃 ℰ = , sin(2𝜃) cos(2𝜃) which is the standard matrix for a rotation of 2𝜃 counter-clockwise. Chapter 6 The Determinant 6.1 The Definition of the Determinant [︀ ]︀ When is an 𝑛 × 𝑛 matrix 𝐴[︀invertible? If 𝑛 = 1 then 𝐴 = 𝑎 is invertible if and only if ]︀ 𝑎 ̸= 0, in which case 𝐴−1 = 𝑎1 . If 𝑛 = 2, we saw in Proposition 4.6.13 (Inverse of a 2 × 2 ]︂ [︂ 𝑎 𝑏 is invertible if and only if 𝑎𝑑 − 𝑏𝑐 ̸= 0. Matrix) that 𝐴 = 𝑐 𝑑 Thus, the invertibility of a 1 × 1 or 2 × 2 matrix can be completely determined by examining a number formed using the entries of the matrix. We give this number a name. Definition 6.1.1 [︀ ]︀ If 𝐴 = 𝑎11 is in 𝑀1×1 (F), then the determinant of 𝐴, denoted by det(𝐴), is: Determinant of a 1 × 1 and 2 × 2 Matrix det(𝐴) = 𝑎11 . [︂ 𝑎 𝑎 If 𝐴 = 11 12 𝑎21 𝑎22 ]︂ is in 𝑀2×2 (F), then the determinant of 𝐴, denoted by det(𝐴), is: det(𝐴) = 𝑎11 𝑎22 − 𝑎12 𝑎21 . Example 6.1.2 [︀ ]︀ Let 𝐴 = 4 − 𝑖 . Then det(𝐴) = 4 − 𝑖. [︂ Example 6.1.3 ]︂ 6 8 Let 𝐵 = . Then det(𝐵) = 6(2) − 8(−3) = 36. −3 2 In the remainder of this section, we will generalize the above and define the determinant of an 𝑛 × 𝑛 matrix. One of our main results will be that an 𝑛 × 𝑛 matrix is invertible if and only if its determinant is non-zero, but it will take some time to get there (see Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero)). We will need some preliminary definitions. 147 148 Definition 6.1.4 (𝑖, 𝑗)𝑡ℎ Submatrix, (𝑖, 𝑗)𝑡ℎ Minor Chapter 6 The Determinant Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). The (𝑖, 𝑗)𝑡ℎ submatrix of 𝐴, denoted by 𝑀𝑖𝑗 (𝐴), is the (𝑛 − 1) × (𝑛 − 1) matrix obtained from 𝐴 by removing the 𝑖𝑡ℎ row and the 𝑗 𝑡ℎ column from 𝐴. The determinant of 𝑀𝑖𝑗 (𝐴) is known as the (𝑖, 𝑗)𝑡ℎ minor of 𝐴. ⎡ Example 6.1.5 Definition 6.1.6 Determinant of an 𝑛 × 𝑛 matrix ⎤ [︂ ]︂ [︂ ]︂ 4 6 8 48 6 8 If 𝐴 = ⎣ 6 −3 2 ⎦ then 𝑀22 (𝐴) = and 𝑀31 (𝐴) = . 59 −3 2 5 7 9 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) for 𝑛 ≥ 2. We define the determinant function, det : 𝑀𝑛×𝑛 (F) → F, by 𝑛 ∑︁ det(𝐴) = 𝑎1𝑗 (−1)1+𝑗 det(𝑀1𝑗 (𝐴)). 𝑗=1 [︂ Example 6.1.7 ]︂ 𝑎11 𝑎12 If 𝑛 = 2, so that 𝐴 = , the above definition gives 𝑎21 𝑎22 [︀ ]︀ [︀ ]︀ det(𝐴) = 𝑎11 (−1)1+1 det( 𝑎22 ) + 𝑎12 (−1)1+2 det( 𝑎21 ) = 𝑎11 𝑎22 − 𝑎12 𝑎21 . Thus, we recover the same expression given in Definition 6.1.1. The expression for det(𝐴) in Definition 6.1.6 is called the expansion along the first row of the determinant. It is recursive in nature: we must multiply the entries in the first row of the matrix 𝐴 by ±1 and then by the determinant of an (𝑖, 𝑗) submatrix. For a 3 × 3 matrix, this process will involve the evaluation of three determinants of 2 × 2 matrices. For a 4 × 4 matrix, this process will involve the evaluation of four determinants of 3 × 3 matrices, each of which involve the evaluation of three determinants of 2 × 2 matrices. This pattern continues as we increase 𝑛. In the next section we will learn techniques that will permit us to occasionally avoid too many tedious computations. ⎡ Example 6.1.8 ⎤ 12 3 Find the determinant of the matrix 𝐴 = ⎣ 4 5 6 ⎦. 
7 8 10 Section 6.1 The Definition of the Determinant 149 Solution: Using the definition, we have det(𝐴) = 3 ∑︁ 𝑎1𝑗 (−1)1+𝑗 det(𝑀1𝑗 (𝐴)) 𝑗=1 = 𝑎11 (−1)1+1 det(𝑀11 (𝐴)) + 𝑎12 (−1)1+2 det(𝑀12 (𝐴)) + 𝑎13 (−1)1+3 det(𝑀13 (𝐴)) (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 5 6 4 6 45 = (1)(1) det + (2)(−1) det + (3)(1) det 8 10 7 10 78 = 1(50 − 48) − 2(40 − 42) + 3(32 − 35) = −3. ⎡ Example 6.1.9 ⎤ 1 2𝑖 3 Find the determinant of the matrix 𝐵 = ⎣ 4 − 𝑖 5 + 2𝑖 6 ⎦ . 7 + 3𝑖 8 − 2𝑖 10 Solution: Using the definition, we have det(𝐵) = 𝑛 ∑︁ 𝑏1𝑗 (−1)1+𝑗 det(𝑀1𝑗 (𝐵)) 𝑗=1 = 𝑐11 (−1)1+1 det(𝑀11 (𝐶)) + 𝑐12 (−1)1+2 det(𝑀12 (𝐶)) + 𝑐13 (−1)1+3 det(𝑀13 (𝐶)) ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ 5 + 2𝑖 6 4−𝑖 6 = (1)(1) det + (2𝑖)(−1) det 7 + 3𝑖 10 8 − 2𝑖 10 ]︂)︂ (︂[︂ 4 − 𝑖 5 + 2𝑖 + (3)(1) det 7 + 3𝑖 8 − 2𝑖 = 1(50 + 20𝑖 − (48 − 12𝑖)) − 2𝑖(40 − 10𝑖 − (42 + 18𝑖)) + 3(30 − 16𝑖 − (29 + 29𝑖)) = −51 − 99𝑖. One of the remarkable features of the determinant is that it may be evaluated by performing an expansion along any row, not just the first row. We state this result below. The proof of this result is quite lengthy and is omitted. Proposition 6.1.10 (𝑖𝑡ℎ Row Expansion of the Determinant) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) with 𝑛 ≥ 2 and let 𝑖 ∈ {1, . . . , 𝑛}. Then det(𝐴) = 𝑛 ∑︁ 𝑎𝑖𝑗 (−1)𝑖+𝑗 det(𝑀𝑖𝑗 (𝐴)). 𝑗=1 Notice that if 𝑖 = 1 then the above expression reduces to the definition of det(𝐴). 150 Chapter 6 The Determinant ⎡ Example 6.1.11 ⎤ 12 3 Evaluate the determinant of the matrix 𝐴 = ⎣ 4 5 6 ⎦ using an expansion along the second 7 8 10 row. Solution: We have det(𝐴) = 𝑛 ∑︁ 𝑎2𝑗 (−1)2+𝑗 det(𝑀2𝑗 (𝐴)) 𝑗=1 = 𝑎21 (−1)2+1 det(𝑀21 (𝐴)) + 𝑎22 (−1)2+2 det(𝑀22 (𝐴)) + 𝑎23 (−1)2+3 det(𝑀23 (𝐴)) (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 2 3 1 3 12 = (4)(−1) det + (5)(1) det + (6)(−1) det 8 10 7 10 78 = −4(20 − 24) + 5(10 − 21) − 6(8 − 14) = −3, just as we found in Example 6.1.8. A further remarkable feature of the determinant is that it may also be evaluated by expanding along any column. Proposition 6.1.12 (𝑗 𝑡ℎ Column Expansion of the Determinant) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) with 𝑛 ≥ 2 and let 𝑗 ∈ {1, . . . , 𝑛}. Then det(𝐴) = 𝑛 ∑︁ 𝑎𝑖𝑗 (−1)𝑖+𝑗 det(𝑀𝑖𝑗 (𝐴)). 𝑖=1 ⎡ Example 6.1.13 ⎤ 12 3 Evaluate the determinant of the matrix 𝐴 = ⎣ 4 5 6 ⎦ using an expansion along the third 7 8 10 column. Solution: We have det(𝐴) = 3 ∑︁ 𝑎𝑖3 (−1)𝑖+3 det(𝑀𝑖3 (𝐴)) 𝑖=1 = 𝑎13 (−1)1+3 det(𝑀13 (𝐴)) + 𝑎23 (−1)2+3 det(𝑀23 (𝐴)) + 𝑎33 (−1)3+3 det(𝑀33 (𝐴)) (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 45 12 12 = (3)(1) det + (6)(−1) det + (10)(1) det 78 78 45 = 3(32 − 35) − 6(8 − 14) + 10(5 − 8) = −3. Thus, you may evaluate the determinant by expanding along any row or column of your choice. In practice, it is usually easier to choose a row or column with many zeros, since this will reduce the number determinants of submatrices that need to be computed. Section 6.1 The Definition of the Determinant 151 ⎡ Example 6.1.14 1 ⎢ 3 Evaluate the determinant of 𝐴 = ⎢ ⎣ 567 4 0 2 234 5 0 0 14 0 ⎤ 5 7 ⎥ ⎥. 235 ⎦ 3 Solution: We perform an expansion along the third column. det(𝐴) = 4 ∑︁ 𝑎𝑖3 (−1)𝑖+3 det(𝑀𝑖3 (𝐴)) 𝑖=1 = 0(1) det(𝑀13 (𝐴)) + 0(−1) det(𝑀23 (𝐴)) + 14(1) det(𝑀33 (𝐴)) + 0(−1) det(𝑀43 (𝐴)) ⎛⎡ ⎤⎞ 105 = 14 det ⎝⎣ 3 2 7 ⎦⎠ . 453 From here, we need to proceed with another expansion along a row or column. Row 1 now looks “ideal” with the most zeroes of any row. Doing so, we continue our calculation below: ⎛⎡ ⎤⎞ 105 det(𝐴) = 14 det ⎝⎣ 3 2 7 ⎦⎠ 453 ]︂)︂]︂ ]︂)︂ (︂[︂ [︂ (︂[︂ 32 27 + (5) det = 14 (1) det 45 53 = 14 [(6 − 35) + 5(15 − 8)] = 14 [−29 + 35] = 14(6) = 84. The recursive definition of the determinant makes its direct application tedious. 
There are, however, certain situations where the determinant is relatively easy to compute. Proposition 6.1.15 (Easy Determinants) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) be a square matrix. (a) If 𝐴 has a row consisting only of zeros, then det 𝐴 = 0. (b) If 𝐴 has a column consisting only of zeros, then det 𝐴 = 0. ⎡ ⎤ 𝑎11 * * · · · * ⎢ 0 𝑎22 * · · · * ⎥ ⎢ ⎥ ⎢ ⎥ (c) If 𝐴 = ⎢ 0 0 𝑎33 · · · * ⎥ is upper triangular, then det 𝐴 = 𝑎11 𝑎22 · · · 𝑎𝑛𝑛 . ⎢ .. .. . . . . .. ⎥ ⎣ . . . . . ⎦ 0 0 · · · 0 𝑎𝑛𝑛 Proof: For parts (a) and (b), simply perform the expansion along the row/column consisting of 0s. 152 Chapter 6 The Determinant To prove part (c), we will proceed by induction on 𝑛. If 𝑛 = 1, then the result follows immediately from the definition of the determinant of a 1 × 1 matrix. For the inductive step, we will assume that the result is true when 𝑛 = 𝑘 and consider a (𝑘 + 1) × (𝑘 + 1) upper triangular matrix ⎡ ⎤ 𝑎11 * * · · · * ⎢ 0 𝑎22 * · · · ⎥ * ⎥ ⎢ ⎢ ⎥ * 𝐴 = ⎢ 0 0 𝑎33 · · · ⎥. ⎢ .. .. . . . . ⎥ .. ⎣ . . ⎦ . . . 0 0 · · · 0 𝑎𝑘+1,𝑘+1 Using an expansion along the first column, we obtain ⎛⎡ 𝑎22 ⎜⎢ 0 ⎜⎢ det(𝐴) = 𝑎11 det(𝑀11 (𝐴)) = 𝑎11 det ⎜⎢ . ⎝⎣ .. * 𝑎44 .. . ··· ··· .. . * * .. . ⎤⎞ ⎥⎟ ⎥⎟ ⎥⎟ . ⎦⎠ 0 · · · 0 𝑎𝑘+1,𝑘+1 Since 𝑀11 (𝐴) is an 𝑘 × 𝑘 matrix, we may apply the inductive hypothesis to conclude that det 𝑀11 (𝐴) = 𝑎22 · · · 𝑎𝑘+1,𝑘+1 . It follows that det(𝐴) = 𝑎11 𝑎22 · · · 𝑎𝑘+1,𝑘+1 , as required. Corollary 6.1.16 The determinant of the 𝑛 × 𝑛 identity matrix is 1, that is, det(𝐼𝑛 ) = 1. Proof: This follows from part (c) of the previous Proposition. The determinant of a lower triangular square matrix is also the product of its diagonal entries. This follows at once from the following result and Proposition 6.1.15 (Easy Determinants). Proposition 6.1.17 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then det(𝐴) = det(𝐴𝑇 ). Proof: We proceed by induction on 𝑛. If 𝑛 = 1, then 𝐴 = 𝐴𝑇 and there is nothing to prove. Thus, assume that the result is true for 𝑛 = 𝑘 and consider a (𝑘 + 1) × (𝑘 + 1) matrix 𝐴. Let 𝐵 = 𝐴𝑇 . Notice that 𝑏𝑖𝑗 = 𝑎𝑗𝑖 and that 𝑀𝑖𝑗 (𝐵) = (𝑀𝑗𝑖 (𝐴))𝑇 . Consequently, using an expansion along the first row of 𝐵, we have det(𝐵) = 𝑘+1 ∑︁ 𝑏1𝑗 (−1)1+𝑗 det(𝑀1𝑗 (𝐵)) = 𝑗=1 𝑘+1 ∑︁ 𝑎𝑗1 (−1)1+𝑗 det((𝑀𝑗1 (𝐴))𝑇 ). 𝑗=1 Now since 𝑀𝑗1 (𝐴) is a 𝑘 × 𝑘 matrix, we have det((𝑀𝑗1 (𝐴))𝑇 ) = det(𝑀𝑗1 (𝐴)), by the inductive hypothesis. So, the above expression for det(𝐵) = det(𝐴𝑇 ) becomes det(𝐴𝑇 ) = 𝑘+1 ∑︁ 𝑎𝑗1 (−1)1+𝑗 det(𝑀𝑗1 (𝐴)). 𝑗=1 The right-side is the determinant expansion along the first column of 𝐴, that is, it is equal to det(𝐴). Thus, det(𝐴𝑇 ) = det(𝐴). Section 6.2 Computing the Determinant in Practice: EROs 6.2 153 Computing the Determinant in Practice: EROs We will now outline a practical method for computing the determinant of a general square matrix 𝐴. The basic idea is to perform EROs on 𝐴, thereby reducing it to a matrix 𝐵, whose determinant is more easily computed—for instance, 𝐵 could have a lot of 0 entries. Since EROs will turn out to have a well-understood effect on the determinant (as we describe below), we can obtain det(𝐴) from det(𝐵). Theorem 6.2.1 (Effect of EROs on the Determinant) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). (a) (Row swap) If 𝐵 is obtained from 𝐴 by interchanging two rows, then det(𝐵) = − det(𝐴). (b) (Row scale) If 𝐵 is obtained from 𝐴 by multiplying a row by 𝑚 ̸= 0, then det(𝐵) = 𝑚 det(𝐴). (c) (Row addition) If 𝐵 is obtained from 𝐴 by adding a non-zero multiple of one row to another row, then det(𝐵) = det(𝐴). 
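Optional numerical aside (not part of the course): the three effects listed in Theorem 6.2.1 are easy to confirm on a concrete matrix with Python and the NumPy library. The routine np.linalg.det computes determinants in floating point, so the printed values are approximate; the matrix A below is simply our running example with det(A) = −3.

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])
print(np.linalg.det(A))               # approximately -3

# (a) Row swap: interchange rows 1 and 2 -- the determinant changes sign.
B = A[[1, 0, 2], :]
print(np.linalg.det(B))               # approximately  3

# (b) Row scale: multiply row 1 by m = 5 -- the determinant is multiplied by 5.
C = A.copy(); C[0, :] *= 5
print(np.linalg.det(C))               # approximately -15

# (c) Row addition: add 4 times row 1 to row 3 -- the determinant is unchanged.
D = A.copy(); D[2, :] += 4 * A[0, :]
print(np.linalg.det(D))               # approximately -3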
REMARK Since det(𝐴) = det(𝐴𝑇 ), the previous Theorem remains true if we replace every instance of the word “row” with the word “column”. WARNING: Do not use such “column operations” when performing calculations other than a determinant, as column operations disrupt many properties of a matrix. ⎡ Example 6.2.2 ⎤ 2 −1 0 Let 𝐴 = ⎣ 0 −3 1 ⎦. Since 𝐴 is upper triangular, we have that det(𝐴) = (2)(−3)(3) = −18. 0 0 3 ⎡ ⎤ 2 −1 0 Let 𝐵 = ⎣ 0 −3 1 ⎦. Notice that 𝐵 may be obtained from 𝐴 by adding a multiple of 2 times 0 −6 5 the second row of 𝐴 to the third row of 𝐴. Thus, det(𝐵) = det(𝐴) = −18. ⎡ ⎤ 4 −2 0 Let 𝐶 = ⎣ 0 −3 1 ⎦. Notice that 𝐶 may be obtained from 𝐵 by multiplying the first row 0 −6 5 by 2. Thus, det(𝐶) = 2 det(𝐵) = −36. ⎡ ⎤ 4 −2 0 Finally, let 𝐷 = ⎣ 0 −6 5 ⎦, which is obtained from 𝐶 by interchanging the second and 0 −3 1 third rows. Thus, det(𝐷) = − det(𝐶) = 36. The proof of Theorem 6.2.1 (Effect of EROs on the Determinant) involves lengthy (but straightforward) computations. We will not give it in this course. Instead, we will extract from the Theorem some useful corollaries. 154 Corollary 6.2.3 Chapter 6 The Determinant Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). If 𝐴 has two identical rows (or two identical columns), then det(𝐴) = 0. Proof: If 𝐴 has two identical rows, we can use a row operation to subtract one of these rows from the other. The resulting matrix 𝐵 will have a row of zeros. Consequently, det(𝐵) = 0 by Proposition 6.1.15 (Easy Determinants). From Theorem 6.2.1 (Effect of EROs on the Determinant), we know that row addition EROs do not change the determinant. Thus, we conclude that det(𝐴) = det(𝐵) = 0. If 𝐴 has two identical columns, we can apply the previous argument to 𝐴𝑇 . Corollary 6.2.4 (Determinants of Elementary Matrices) For each part below, let 𝐸 be an elementary matrix of the indicated type. (a) (Row swap) det(𝐸) = −1. (b) (Row scale) det(𝐸) = 𝑚 (if 𝐸 is obtained from 𝐼𝑛 by multiplying a row by 𝑚 ̸= 0). (c) (Row addition) det(𝐸) = 1. Proof: Combine the fact that det(𝐼𝑛 ) = 1 with Theorem 6.2.1 (Effect of EROs on the Determinant). Corollary 6.2.5 (Determinant After One ERO) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and suppose we perform a single ERO on 𝐴 to produce the matrix 𝐵. Assume that the corresponding elementary matrix is 𝐸. Then det(𝐵) = det(𝐸) det(𝐴). Proof: Combine Theorem 6.2.1 (Effect of EROs on the Determinant) and Corollary 6.2.4 (Determinants of Elementary Matrices). Corollary 6.2.6 (Determinant After 𝑘 EROs) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and suppose we perform a sequence of 𝑘 EROs on the matrix 𝐴 to obtain the matrix 𝐵. Suppose that the elementary matrix corresponding to the 𝑖th ERO is 𝐸𝑖 , so that 𝐵 = 𝐸𝑘 · · · 𝐸2 𝐸1 𝐴. Then det(𝐵) = det(𝐸𝑘 · · · 𝐸2 𝐸1 𝐴) = det(𝐸𝑘 ) · · · det(𝐸2 ) det(𝐸1 ) det(𝐴). Section 6.2 Computing the Determinant in Practice: EROs 155 Proof: We proceed by induction on 𝑘. If 𝑘 = 1, this is the previous Corollary. Now assume that the result is true for 𝑘 = ℓ and consider the case where 𝑘 = ℓ + 1. We then have 𝐵 = 𝐸ℓ+1 𝐸ℓ · · · 𝐸1 𝐴, and therefore, det(𝐵) = det(𝐸ℓ+1 ) det(𝐸ℓ · · · 𝐸1 𝐴) (Corollary 6.2.5) = det(𝐸ℓ+1 ) det(𝐸ℓ ) · · · det(𝐸1 ) det(𝐴) (inductive hypothesis). The result now follows. We will give an example that illustrates how to use the preceding results to obtain the determinant of an arbitrary matrix. We will attempt to apply EROs to simplify 𝐴, keeping track of what EROs are being applied. Then we can relate det(𝐴) to det(𝐵) using Corollary 6.2.6 (Determinant After 𝑘 EROs). 
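Optional numerical aside (not part of the course): Corollaries 6.2.4–6.2.6 can also be checked by machine. In the Python sketch below (using NumPy; np.linalg.det works in floating point, so the printed values are approximate), three explicit elementary matrices are applied in sequence to a test matrix A, and the determinant of the result agrees with the product det(E3) det(E2) det(E1) det(A).

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])           # det(A) is approximately -3

I3 = np.eye(3)
E1 = I3[[1, 0, 2], :]                   # row swap R1 <-> R2:           det(E1) = -1
E2 = np.diag([3., 1., 1.])              # row scale R1 -> 3 R1:         det(E2) =  3
E3 = I3.copy(); E3[2, 0] = -2.          # row addition R3 -> R3 - 2 R1: det(E3) =  1

B = E3 @ E2 @ E1 @ A                    # the matrix obtained after the three EROs
print(np.linalg.det(B))                 # approximately 9
print(np.linalg.det(E3) * np.linalg.det(E2)
      * np.linalg.det(E1) * np.linalg.det(A))   # approximately 9 as well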
Notice that row addition EROs do not change the determinant, so we need only keep track of row swap and row scale EROs. ⎡ Example 6.2.7 ⎤ 12 3 Evaluate the determinant of 𝐴 = ⎣ 4 5 6 ⎦. 7 8 10 Solution: We have ⎛⎡ 1 2 ⎝ ⎣ 0 −3 det(𝐴) = det 0 −6 ⎛⎡ 1 2 ⎝ ⎣ 0 −3 = det 0 0 ⎤⎞ 3 −6 ⎦⎠ −11 ⎤⎞ 3 −6 ⎦⎠ 1 = (1)(−3)(1) (︃ perform 𝑅2 → −4𝑅1 + 𝑅2 𝑅3 → −7𝑅1 + 𝑅3 )︃ , determinant unchanged (perform 𝑅3 → −2𝑅2 + 𝑅3 , determinant unchanged) (determinant of an upper triangular matrix) = −3. Example 6.2.8 The previous example used only row addition EROs. Our next example will use all three types. Notice that if we apply a row scale ERO to 𝐴 to obtain 𝐵, then det(𝐵) = 𝑚 det(𝐴). 1 det(𝐵). Thus, det(𝐴) = 𝑚 ⎡ ⎤ 2 −2 4 Evaluate the determinant of 𝐴 = ⎣ 5 −5 1 ⎦. 3 6 5 Solution: We have ⎛⎡ ⎤⎞ 1 −1 2 det(𝐴) = 2 det ⎝⎣ 5 −5 1 ⎦⎠ 3 6 5 ⎛⎡ ⎤⎞ 1 −1 2 = 2 det ⎝⎣ 0 0 −9 ⎦⎠ 0 9 −1 ⎛⎡ ⎤⎞ 1 −1 2 = −2 det ⎝⎣ 0 9 −1 ⎦⎠ 0 0 −9 (︁ perform 𝑅1 → 21 𝑅1 , determinant scaled by (︃ perform 𝑅2 → −5𝑅1 + 𝑅2 𝑅3 → −3𝑅1 + 𝑅3 1 1/2 )︁ =2 )︃ , determinant unchanged (perform 𝑅2 ↔ 𝑅3 , sign of determinant changed) 156 Chapter 6 = (−2)(1)(9)(−9) The Determinant (determinant of an upper triangular matrix) = 162. 6.3 The Determinant and Invertibility In this section, we will use our results on the relationship between EROs and the determinant to prove the following fundamental result. Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 𝐴 is invertible if and only if det(𝐴) ̸= 0. Proof: Let 𝑅 = RREF(𝐴). Suppose we obtain 𝑅 from 𝐴 by applying 𝑘 EROs. Thus, 𝑅 = 𝐸𝑘 · · · 𝐸1 𝐴, where 𝐸𝑖 is the elementary matrix corresponding to the 𝑖th ERO. Corollary 6.2.6 (Determinant After 𝑘 EROs) then implies that det(𝑅) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝐴). By Corollary 6.2.4 (Determinants of Elementary Matrices), since det(𝐸𝑖 ) ̸= 0 for all 𝑖, it follows that det(𝑅) ̸= 0 if and only if det(𝐴) ̸= 0. Now, if 𝐴 is invertible, then 𝑅 = 𝐼𝑛 , and therefore det(𝑅) = 1 ̸= 0. So det(𝐴) ̸= 0. Conversely, if 𝐴 is not invertible, then 𝑅 must have a zero row, and therefore det(𝑅) = 0. So det(𝐴) = 0 in this case. This completes the proof. ⎡ Example 6.3.2 ⎤ 35 7 For what value(s) of the constant 𝑘 is the matrix 𝐴 = ⎣ 6 𝑘 14 ⎦ invertible? 24 6 Solution: Let us evaluate the determinant of 𝐴. ⎛⎡ ⎤⎞ 3 5 7 det(𝐴) = det ⎝⎣ 0 𝑘 − 10 0 ⎦⎠ (𝑅2 → −2𝑅1 + 𝑅2 ) 2 4 6 ⎛⎡ ⎤⎞ 3 5 7 = 2 det ⎝⎣ 0 𝑘 − 10 0 ⎦⎠ (𝑅3 → 21 𝑅3 ) 1 2 3 ⎛⎡ ⎤⎞ 0 −1 −2 = 2 det ⎝⎣ 0 𝑘 − 10 0 ⎦⎠ (𝑅1 → −3𝑅3 + 𝑅1 ) 1 2 3 = 4(𝑘 − 10) (expansion along the second row). Thus, det(𝐴) = 0 if and only if 𝑘 = 10. We conclude that the matrix 𝐴 is invertible if and only if 𝑘 ̸= 10. Section 6.3 The Determinant and Invertibility 157 The following result is one of the most important properties of the determinant. Proposition 6.3.3 (Determinant of a Product) Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Then det(𝐴𝐵) = det(𝐴) det(𝐵). Proof: As in the proof of Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero), we may write 𝐴 as 𝐴 = 𝐸𝑘 · · · 𝐸1 𝑅 where 𝑅 = RREF(𝐴) and the 𝐸𝑖 are elementary matrices that correspond to the EROs needed to operate on 𝑅 to produce 𝐴. Then Corollary 6.2.6 (Determinant After 𝑘 EROs) gives det(𝐴) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝑅). Now, suppose 𝐴 is invertible, so that 𝑅 = 𝐼𝑛 . On the one hand, det(𝐴) = det(𝐸𝑘 ) · · · det(𝐸1 ), while on the other hand, 𝐴𝐵 = (𝐸𝑘 · · · 𝐸1 𝑅)𝐵 = 𝐸𝑘 · · · 𝐸1 𝐵 (as 𝑅 = 𝐼𝑛 ) so that det(𝐴𝐵) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝐵), again by Corollary 6.2.6. The right-side is det(𝐴) det(𝐵). Thus, the Proposition is true if 𝐴 is invertible. 
If 𝐴 is not invertible, then 𝑅 has a zero row, and therefore so does 𝑅𝐵. Since 𝐴𝐵 = 𝐸𝑘 · · · 𝐸1 (𝑅𝐵) we can apply Corollary 6.2.6 (Determinant After 𝑘 EROs) once more to find that det(𝐴𝐵) = det(𝐸𝑘 ) · · · det(𝐸1 ) det(𝑅𝐵) = 0. Since det(𝐴) is 0 in this case too, we have that det(𝐴𝐵) = det(𝐴) det(𝐵). REMARK It is not true that det(𝐴 + 𝐵) = det(𝐴) + det(𝐵) in general. For example, if [︂ ]︂ [︂ ]︂ 10 00 𝐴= and 𝐵 = 00 01 then det(𝐴) = det(𝐵) = 0 so that det(𝐴) + det(𝐵) = 0. On the other hand, 𝐴 + 𝐵 = 𝐼2 , so det(𝐴 + 𝐵) = 1. Thus, det(𝐴 + 𝐵) ̸= det(𝐴) + det(𝐵) in this case. 158 Example 6.3.4 Chapter 6 The Determinant Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Prove that 𝐴𝐵 is invertible if and only if 𝐵𝐴 is invertible. Solution: 𝐴𝐵 is invertible iff det(𝐴𝐵) ̸= 0 (Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero)) iff det(𝐴) det(𝐵) ̸= 0 (Proposition 6.3.3 (Determinant of a Product)) iff det(𝐵) det(𝐴) ̸= 0 iff det(𝐵𝐴) ̸= 0 (Proposition 6.3.3) iff 𝐵𝐴 is invertible. (Theorem 6.3.1) The previous example implicitly contains the following useful observation. Corollary 6.3.5 Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Then det(𝐴𝐵) = det(𝐵𝐴). Here is another useful observation that can be proved using similar ideas. Corollary 6.3.6 (Determinant of Inverse) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) be invertible. Then det(𝐴−1 ) = 1 . det(𝐴) Proof: We have 𝐴−1 𝐴 = 𝐼𝑛 . Using Proposition 6.3.3 (Determinant of a Product) and taking determinants of both sides, we get det(𝐴−1 ) det(𝐴) = 1. Since 𝐴 is invertible, we have that det(𝐴) ̸= 0, so we may divide through by det(𝐴) to obtain det(𝐴−1 ) = 1/ det(𝐴), as desired. ⎡ Example 6.3.7 ⎤ 12 3 Prove that 𝐴 = ⎣ 4 5 6 ⎦ is invertible and find det(𝐴−1 ). 7 8 10 Solution: We saw in Example 6.2.7 that det(𝐴) = −3, so Theorem 6.3.1 (Invertible iff the Determinant is Non-Zero)) tells us that 𝐴 is invertible. We get that det(𝐴−1 ) = − 6.4 1 by Corollary 6.3.6 (Determinant of Inverse). 3 An Expression for 𝐴−1 We saw that a 2 × 2 matrix 𝐴 is invertible if and only if det(𝐴) ̸= 0 in Proposition 4.6. 13 (Inverse of a 2 × 2 Matrix). In the previous sections, we generalized the notion of the determinant to 𝑛 × 𝑛 matrices and obtained the analogous result: an 𝑛 × 𝑛 matrix is invertible if and only if det(𝐴) ̸= 0 (Theorem 6.3.1). Section 6.4 An Expression for 𝐴−1 159 Proposition 4.6.13 contains one additional result: in the case where det(𝐴) ̸= 0, an expression for the inverse of 𝐴 is given, namely [︂ ]︂ 1 𝑑 −𝑏 𝐴−1 = . det(𝐴) −𝑐 𝑎 In this section, we will show that there is a similar result for an 𝑛 × 𝑛 matrix (see Corollary 6.4.6 (Inverse by Adjugate)). We will need some preliminary notation and terminology. Definition 6.4.1 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). The (𝑖, 𝑗)𝑡ℎ cofactor of 𝐴, denoted by 𝐶𝑖𝑗 (𝐴), is defined by Cofactor 𝐶𝑖𝑗 (𝐴) = (−1)𝑖+𝑗 det(𝑀𝑖𝑗 (𝐴)). Note that we can write the determinant of 𝐴 in terms of cofactors as either: det(𝐴) = 𝑛 ∑︁ (expansion along the 𝑖𝑡ℎ row) 𝑎𝑖𝑗 𝐶𝑖𝑗 (𝐴) 𝑗=1 or det(𝐴) = 𝑛 ∑︁ 𝑎𝑖𝑗 𝐶𝑖𝑗 (𝐴) (expansion along the 𝑗 𝑡ℎ column). 𝑖=1 Definition 6.4.2 Adjugate of a Matrix Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). The adjugate of 𝐴, denoted by adj(𝐴), is the 𝑛 × 𝑛 matrix whose (𝑖, 𝑗)𝑡ℎ entry is (adj(𝐴))𝑖𝑗 = 𝐶𝑗𝑖 (𝐴). That is, the adjugate of 𝐴 is the transpose of the matrix of cofactors of 𝐴. [︂ Example 6.4.3 ]︂ 𝑎 𝑏 , then the cofactors of 𝐴 are If 𝐴 = 𝑐 𝑑 𝐶11 (𝐴) = (−1)1+1 𝑑 = 𝑑, 𝐶12 (𝐴) = (−1)1+2 𝑐 = −𝑐, 𝐶21 (𝐴) = (−1)2+1 𝑏 = −𝑏, 𝐶22 (𝐴) = (−1)2+2 𝑎 = 𝑎. Hence [︂ 𝑑 −𝑐 adj(𝐴) = −𝑏 𝑎 ]︂𝑇 [︂ ]︂ 𝑑 −𝑏 = . −𝑐 𝑎 Notice that this is the matrix that appeared in the expression for 𝐴−1 in Proposition 4.6. 13 (Inverse of a 2 × 2 Matrix). 
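Optional computational aside (not part of the course): Definitions 6.4.1 and 6.4.2 translate directly into a short procedure for building adj(A). In the Python sketch below (using NumPy), np.delete removes the indicated row or column to form the submatrix M_ij(A), np.linalg.det supplies the minors in floating point, and the helper names cofactor and adjugate are our own. For the 2 × 2 matrix of Example 6.1.3 the output agrees with the formula adj(A) = [[d, −b], [−c, a]] obtained in Example 6.4.3.

import numpy as np

def cofactor(A, i, j):
    # C_ij(A) = (-1)^(i+j) det(M_ij(A)); here i and j are 0-indexed
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(M)

def adjugate(A):
    n = A.shape[0]
    C = np.array([[cofactor(A, i, j) for j in range(n)] for i in range(n)])
    return C.T            # the adjugate is the transpose of the cofactor matrix

A = np.array([[6., 8.],
              [-3., 2.]])
print(adjugate(A))        # [[ 2. -8.]
                          #  [ 3.  6.]], that is, [[d, -b], [-c, a]]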
160 Chapter 6 The Determinant ⎡ Example 6.4.4 ⎤ 12 3 Find the adjugate of 𝐴 = ⎣ 4 5 6 ⎦ . 7 8 10 Solution: The cofactors of 𝐴 are given below. 𝐶𝑖𝑗 (𝐴).) [︂ ]︂ 5 6 𝑀11 = 𝐶11 8 10 [︂ ]︂ 45 𝑀13 = 𝐶13 78 [︂ ]︂ 1 3 𝑀22 = 𝐶22 7 10 [︂ ]︂ 23 𝑀31 = 𝐶31 56 ]︂ [︂ 12 𝐶33 𝑀33 = 45 (We will write 𝑀𝑖𝑗 and 𝐶𝑖𝑗 instead of 𝑀𝑖𝑗 (𝐴) and [︂ =2 𝑀12 = −3 𝑀21 = −11 𝑀23 = −3 𝑀32 ]︂ 4 6 = 7 10 [︂ ]︂ 2 3 = 8 10 [︂ ]︂ 12 = 78 [︂ ]︂ 13 = 46 𝐶12 = 2 𝐶21 = 4 𝐶23 = 6 𝐶32 = 6 = −3 Consequently, ⎤ ⎤𝑇 ⎡ 2 4 −3 2 2 −3 adj(𝐴) = ⎣ 4 −11 6 ⎦ = ⎣ 2 −11 6 ⎦ . −3 6 −3 −3 6 −3 ⎡ Here is the key result concerning the adjugate matrix, which we state without proof. Theorem 6.4.5 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 𝐴 adj(𝐴) = adj(𝐴) 𝐴 = det(𝐴)𝐼𝑛 . Corollary 6.4.6 (Inverse by Adjugate) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). If det(𝐴) ̸= 0, then 𝐴−1 = 1 adj(𝐴). det(𝐴) Proof: From Theorem 6.4.5, we know that adj(𝐴)𝐴 = det(𝐴)𝐼𝑛 . Dividing this equation by det(𝐴), we obtain (︂ )︂ 1 adj(𝐴) 𝐴 = 𝐼𝑛 . det(𝐴) Section 6.5 Cramer’s Rule 161 Therefore, 1 adj(𝐴) = 𝐴−1 . det(𝐴) [︂ Example 6.4.7 𝑎 𝑏 If 𝐴 = 𝑐 𝑑 Adjugate), ]︂ and det(𝐴) ̸= 0, then by Example 6.4.3 and Corollary 6.4.6 (Inverse by −1 𝐴 [︂ ]︂ [︂ ]︂ 1 1 𝑑 −𝑏 𝑑 −𝑏 = = , det(𝐴) −𝑐 𝑎 𝑎𝑑 − 𝑏𝑐 −𝑐 𝑎 exactly as in Proposition 4.6.13 (Inverse of a 2 × 2 Matrix). ⎡ Example 6.4.8 ⎤ 12 3 Let 𝐴 = ⎣ 4 5 6 ⎦ . Determine 𝐴−1 , if it exists. 7 8 10 Solution: From Example 6.2.7, we know ⎤ ⎡ 1 2 3 det(𝐴) = det ⎣ 0 −3 −6 ⎦ = 33 − 36 = −3 ̸= 0, 0 −6 −11 so we have that 𝐴 is invertible. Having computed in Example 6.4.4 that ⎤ ⎡ 2 4 −3 adj(𝐴) = ⎣ 2 −11 6 ⎦ , −3 6 −3 we conclude that ⎡ 𝐴−1 ⎤ 2 4 −3 1 = − ⎣ 2 −11 6 ⎦ . 3 −3 6 −3 REMARK We do not recommend this method as a way of determining the inverse of a matrix. However, it may be useful for checking that one or two entries are correct. In practice, it will usually be more efficient to use the Algorithm involving EROs given in Proposition 4.6.8 (Algorithm for Checking Invertibility and Finding the Inverse). 6.5 Cramer’s Rule Cramer’s Rule provides a method for solving systems of 𝑛 linear equations in 𝑛 unknowns by making use of determinants. It is not usually efficient from a computational point of view, since calculating determinants is generally computationally intensive. However, Cramer’s Rule is particularly useful if one is only interested in a single component of the solution. 162 Proposition 6.5.1 Chapter 6 The Determinant (Cramer’s Rule) #» #» Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and consider the equation 𝐴 #» 𝑥 = 𝑏 , where 𝑏 ∈ F𝑛 and det(𝐴) ̸= 0. #» If we construct 𝐵𝑗 from 𝐴 by replacing the 𝑗 𝑡ℎ column of 𝐴 by the column vector 𝑏 , then the solution #» 𝑥 to the equation #» 𝐴 #» 𝑥 = 𝑏 is given by 𝑥𝑗 = det(𝐵𝑗 ) , det(𝐴) for all 𝑗 = 1, . . . , 𝑛. Proof: Since det(𝐴) ̸= 0, we have that 𝐴 is invertible and 𝐴−1 = Thus, and so 1 adj(𝐴). det(𝐴) #» #» 𝑥 = 𝐴−1 𝑏 = 1 #» adj(𝐴) 𝑏 , det 𝐴 𝑛 ∑︁ 1 (adj(𝐴))𝑗𝑘 𝑏𝑘 ; 𝑥𝑗 = det(𝐴) 𝑘=1 that is, 𝑥𝑗 = 𝑛 ∑︁ 1 𝐶𝑘𝑗 (𝐴) 𝑏𝑘 . det(𝐴) 𝑘=1 We will show that 𝑛 ∑︁ 𝐶𝑘𝑗 (𝐴) 𝑏𝑘 = det(𝐵𝑗 ), 𝑘=1 and then Cramer’s Rule will be proven. We evaluate det(𝐵𝑗 ) by expanding the determinant along the 𝑗 𝑡ℎ column of 𝐵𝑗 , 𝑘 ∑︁ det(𝐵𝑗 ) = (𝐵𝑗 )𝑘𝑗 (𝐶(𝐵𝑗 ))𝑘𝑗 . 𝑖=1 #» Recall that (𝐵𝑗 )𝑘𝑗 = 𝑏𝑘 , since the 𝑗 𝑡ℎ column of 𝐵𝑗 is just 𝑏 . Since the matrices 𝐴 and 𝐵 only differ (at most) in the 𝑗 𝑡ℎ column, we conclude that (𝐶(𝐵𝑗 ))𝑘𝑗 = (𝐶(𝐴))𝑘𝑗 . Thus, det(𝐵𝑗 ) = 𝑛 ∑︁ 𝑘=1 as required. 𝐶𝑘𝑗 (𝐴) 𝑏𝑘 , Section 6.6 The Determinant and Geometry 163 ⎡ Example 6.5.2 ⎤ 12 3 Let 𝐴 = ⎣ 4 5 6 ⎦. Use Cramer’s Rule to solve 7 8 10 ⎡ ⎤ ⎡ ⎤ 12 3 −2 ⎣ 4 5 6 ⎦ #» 𝑥 = ⎣ 3 ⎦. 
7 8 10 −4 Solution: We saw in Example 6.2.7 that ⎛⎡ ⎤⎞ 12 3 det ⎝⎣ 4 5 6 ⎦⎠ = −3. 7 8 10 Now, let us evaluate the determinants of the matrices ⎡ ⎤ ⎡ ⎤ −2 2 3 1 −2 3 𝐵1 = ⎣ 3 5 6 ⎦ , 𝐵2 = ⎣ 4 3 6 ⎦ , −4 8 10 7 −4 10 ⎡ ⎤ 1 2 −2 𝐵3 = ⎣ 4 5 3 ⎦ . 7 8 −4 We evaluate the determinants of 𝐵1 , 𝐵2 and 𝐵3 using the indicated EROs: ⎧ ⎤ ⎤ ⎡ ⎡ −2 2 3 −2 2 3 ⎨ 𝑅2 → 32 𝑅1 + 𝑅2 ⎦ = 20. det(𝐵1 ) = det ⎣ 3 5 6 ⎦ = det ⎣ 0 8 21 2 ⎩ 0 4 4 𝑅3 → −2𝑅1 + 𝑅3 −4 8 10 ⎡ ⎤ ⎡ ⎤ 1 −2 3 1 −2 3 det(𝐵2 ) = det ⎣ 4 3 6 ⎦ = det ⎣ 0 11 −6 ⎦ = −61. 7 −4 10 0 10 −11 ⎡ ⎤ ⎡ ⎤ 1 2 −2 1 2 −2 det(𝐵3 ) = det ⎣ 4 5 3 ⎦ = det ⎣ 0 −3 11 ⎦ = 36. 7 8 −4 0 −6 10 ⎧ ⎨ 𝑅2 → −4𝑅1 + 𝑅2 ⎩ 𝑅3 → −7𝑅1 + 𝑅3 ⎧ ⎨ 𝑅2 → −4𝑅1 + 𝑅2 ⎩ 𝑅3 → −7𝑅1 + 𝑅3 Thus, ⎡ ⎤ 20 1 #» 𝑥 = − ⎣ −61 ⎦ . 3 36 6.6 The Determinant and Geometry In this section, we will give a geometric interpretation of the determinant. In R2 , the underlying idea is the relationship between the determinant and areas of parallelograms, which we can calculate using the cross product. Here is the key result. 164 Proposition 6.6.1 Chapter 6 The Determinant (Area of Parallelogram) [︂ ]︂ [︂ ]︂ 𝑣1 𝑤1 #» #» Let 𝑣 = and 𝑤 = be vectors in R2 . 𝑣2 𝑤2 ⃒ (︂[︂ ]︂)︂⃒ ⃒ 𝑣1 𝑤1 ⃒⃒ #» #» ⃒ The area of the parallelogram with sides 𝑣 and 𝑤 is ⃒det . 𝑣2 𝑤2 ⃒ #» in R3 with a third component Proof: We shall consider the analogous vectors #» 𝑣 1 and 𝑤 1 of zero. That is, we let ⎡ ⎤ ⎡ ⎤ 𝑣1 𝑤1 #» #» = ⎣ 𝑤 ⎦ . 𝑣 1 = ⎣ 𝑣2 ⎦ and 𝑤 2 1 0 0 As we saw in the Remark (Parallelogram Area via Cross Product) in Section 1.9, the area #» is given by of the parallelogram with sides #» 𝑣 and 𝑤 ⃦⎡ ⎤ ⎡ ⎤⃦ ⃦⎡ ⎤⃦ ⃦ 𝑣1 ⃦ ⃦ ⃦ 𝑤 0 1 ⃦ ⃦ ⃦ ⃦ #» #» ⃦ ⃦ ⃦ ⃦ = |𝑣1 𝑤2 − 𝑤1 𝑣2 |. ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 0 ‖ 𝑣 1 × 𝑤 1 ‖ = ⃦ 𝑣 2 × 𝑤2 ⃦ = ⃦ ⃦ ⃦ 0 0 ⃦ ⃦ 𝑣 1 𝑤2 − 𝑤1 𝑣 2 ⃦ ⃒ (︂[︂ ]︂)︂⃒ ⃒ ⃒ 𝑣 𝑤 1 1 ⃒, as required. The expression on the right is equal to ⃒⃒det 𝑣 2 𝑤2 ⃒ Example 6.6.2 [︀ ]︀𝑇 #» = [︀ 4 1 ]︀𝑇 Determine the area of the parallelogram in R2 with the vectors #» 𝑣 = 2 5 and 𝑤 as its sides. ⃒ ]︂)︂⃒ (︂[︂ ⃒ 2 4 ⃒⃒ ⃒ Solution: The area is ⃒det = | − 18| = 18 units2 . 51 ⃒ Notice that it is important to include the absolute value signs in the expression given by Proposition 6.6.1 (Area of Parallelogram). REMARK If we study the proof of Proposition 6.6.1 (Area of Parallelogram), we can obtain a useful trick that helps us compute cross products. Namely, suppose we wish to compute the cross ⎡ ⎤ ⎡ ⎤ 𝑢1 𝑣1 product #» 𝑢 × #» 𝑣 of #» 𝑢 = ⎣ 𝑢2 ⎦ and #» 𝑣 = ⎣ 𝑣2 ⎦. Consider the matrix 𝑢3 𝑣3 ⎡ #» 𝑒1 ⎣ 𝑢1 𝑣1 #» 𝑒2 𝑢2 𝑣2 ⎤ #» 𝑒3 𝑢3 ⎦ 𝑣3 whose first row consists of the standard basis vectors #» 𝑒 1 , #» 𝑒 2 , #» 𝑒 3 (treated as formal sym#» bols!), and whose second and third rows are the entries of 𝑢 and #» 𝑣 , respectively. Then if Section 6.6 The Determinant and Geometry 165 we expand along the first row, we obtain ⎡ #» #» #» ⎤ 𝑒1 𝑒2 𝑒3 𝑒 1 − (𝑢1 𝑣3 − 𝑢3 𝑣1 ) #» 𝑒 2 + (𝑢1 𝑣2 − 𝑢2 𝑣1 ) #» 𝑒3 det ⎣ 𝑢1 𝑢2 𝑢3 ⎦ = (𝑢2 𝑣3 − 𝑢3 𝑣2 ) #» 𝑣1 𝑣2 𝑣3 ⎡ ⎤ 𝑢2 𝑣3 − 𝑢3 𝑣2 = ⎣ −(𝑢1 𝑣3 − 𝑢3 𝑣1 ) ⎦ . 𝑢1 𝑣2 − 𝑢2 𝑣1 We recognize this last expression as #» 𝑢 × #» 𝑣. We must emphasize however that this is simply a trick that helps us remember the formula for #» 𝑢 × #» 𝑣 . We are abusing notation by treating #» 𝑒 1 , #» 𝑒 2 , #» 𝑒 3 as symbols in the left-side of the equation, but as vectors in the right-side of the equation. ⎡ Example 6.6.3 ⎤ ⎡ ⎤ 2 −2 Compute ⎣ −3 ⎦ × ⎣ 1 ⎦. 5 4 Solution: Using the trick above, we have ⎡ ⎤ ⎡ ⎤ ⎡ #» #» #» ⎤ 2 −2 𝑒1 𝑒2 𝑒3 ⎣ −3 ⎦ × ⎣ 1 ⎦ = det ⎣ 2 −3 5 ⎦ 5 4 −2 1 4 = (−17) #» 𝑒 1 − (18) #» 𝑒 2 + (−4) #» 𝑒3 ⎡ ⎤ −17 ⎣ = −18 ⎦ , −4 just as we had computed in Example 1.9.2. 
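Optional numerical aside (not part of the course): both of the computations above are easy to confirm with Python and NumPy. Below, np.linalg.det recovers the parallelogram area of Example 6.6.2 (after taking the absolute value, and up to floating-point rounding), and np.cross, NumPy's built-in cross product, reproduces the vector obtained in Example 6.6.3 by the determinant trick.

import numpy as np

# Example 6.6.2: area of the parallelogram with sides (2, 5) and (4, 1)
M = np.array([[2., 4.],
              [5., 1.]])
print(abs(np.linalg.det(M)))        # approximately 18

# Example 6.6.3: the cross product of (2, -3, 5) and (-2, 1, 4)
u = np.array([2., -3., 5.])
v = np.array([-2., 1., 4.])
print(np.cross(u, v))               # [-17. -18.  -4.]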
Here is another useful re-interpretation of Proposition 6.6.1 (Area of Parallelogram). Let 𝐴 ∈ 𝑀2×2 (R) and consider the unit square in R2 . Multiply the sides 𝑒#»1 and 𝑒#»2 of the unit square by 𝐴 to obtain the vectors 𝐴𝑒#»1 and 𝐴𝑒#»2 . In this way, we may view 𝐴 as having transformed the unit square into a parallelogram with sides 𝐴𝑒#»1 and 𝐴𝑒#»2 . Since 𝐴𝑒#» and 𝐴𝑒#» are the columns of 𝐴 (by Lemma 4.2.2 (Column Extraction)), Proposition 1 2 6.6.1 tells us that the area of the resulting parallelogram is | det(𝐴)|. Since the area of the unit square is 1, we conclude 𝐴 has scaled the area of the unit square by a factor of | det(𝐴)|. The determinant of a 2 × 2 real matrix may thus be interpreted as a (signed) scaling factor of areas. In particular, notice that if the determinant is 0, then the area gets scaled by 0. This intuitively aligns with the fact that such a matrix is not invertible, since such a scaling cannot be reversed. It is possible to give a similar interpretation of the 𝑛 × 𝑛 determinant of a real matrix as a (signed) scaling factor of volumes in R𝑛 , but we will not pursue this idea further in this course. 166 Chapter 6 Figure 6.6.1: Effect of 𝐴 on the unit square. The Determinant Chapter 7 Eigenvalues and Diagonalization 7.1 What is an Eigenpair? Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be the standard matrix for the linear transformation 𝑇𝐴 . We want to consider the effect that multiplication by 𝐴 has on the vector #» 𝑥 ∈ R𝑛 ; that is, we #» will compare the length and the direction of an input vector 𝑥 ∈ R𝑛 to its image vector #» 𝑦 = 𝐴 #» 𝑥 ∈ R𝑛 . Usually, when #» 𝑥 is multiplied by matrix 𝐴 to produce #» 𝑦 = 𝐴 #» 𝑥 , there will be changes to both the length and the direction. Exceptions to this pattern turn out to be particularly interesting, and will become the focus of the majority of this chapter. ]︂ [︂ ]︂ 1 13 #» . Then and 𝑔 = Let 𝐴 = 0 31 [︂ Example 7.1.1 ]︂ [︂ ]︂ [︂ ]︂ [︂ #» 1 13 1 #» #» . = ℎ = 𝑇𝐴 ( 𝑥 ) = 𝐴 𝑔 = 3 31 0 √ #» Since ‖ #» 𝑔 ‖ = 1 and the length of the image √ is ‖ ℎ ‖ = 10, #» the image ℎ has been scaled by a factor of 10 relative to the length of the input vector #» 𝑔. #» The image ℎ has a different direction than the input #» 𝑔; #» #» the angle between and ℎ is (︁ #» #»𝑔 )︁ (︁ )︁ 𝑔 ·ℎ 𝜃 = arccos #» #» = arccos √110 ≈ 1.249 radians ‖𝑔 ‖‖ℎ‖ counter-clockwise. [︂ Example 7.1.2 ]︂ [︂ ]︂ 13 2 #» Keeping 𝐴 = from the previous example with a new input vector 𝑢 = , then 31 −1 167 168 Chapter 7 Eigenvalues and Diagonalization [︂ ]︂ [︂ ]︂ [︂ ]︂ 13 2 −1 #» #» #» 𝑣 = 𝑇𝐴 ( 𝑢 ) = 𝐴 𝑢 = = . 3 1 −1 5 √ √ 𝑣 ‖ = 26, Since ‖ #» 𝑢 ‖ = 5 and the length of the image√︁is ‖ #» the image #» 𝑣 has been scaled by a factor of 26 5 relative to #» the length of the input vector 𝑢 . #» The image #» 𝑣 has a different direction than the 𝑢)︁; (︁ input #» #» #» #» 𝑢·𝑣 the angle between 𝑢 and 𝑣 is 𝜃 = arccos ‖ #» = 𝑢 ‖ ‖ #» 𝑣‖ (︁ )︁ √ arccos √5−7 ≈ 2.232 radians counter-clockwise. 26 We now ask, for a given matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F), are there any input vectors whose direction is not changed when they are multiplied by 𝐴? [︂ Example 7.1.3 ]︂ [︂ ]︂ 13 1 #» Keeping 𝐴 = from the previous examples with a new input vector 𝑧 = , then 31 1 [︂ ]︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ 1 4 13 1 #» = 𝑇 ( #» #» = 4 #» 𝑧. = 4 = 𝑤 𝑧 ) = 𝐴 𝑧 = 𝐴 1 4 31 1 #» has been scaled by a factor of 4 relative to the length of the input vector #» The image 𝑤 𝑧. #» has the same direction as the original vector #» The image 𝑤 𝑧 ; that is, multiplying 𝐴 by #» 𝑧 did not change the direction. 
We include within our scope of interest image vectors that are in the opposite direction of the input vector. [︂ Example 7.1.4 ]︂ [︂ ]︂ 13 1 #» from the previous examples with a new input vector 𝑥 = , Keeping 𝐴 = 31 −1 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 13 1 −2 1 #» #» #» 𝑦 = 𝑇𝐴 ( 𝑥 ) = 𝐴 𝑥 = = = −2 = −2 #» 𝑥. 3 1 −1 2 −1 The image #» 𝑦 has been scaled by a factor of −2 relative to the length of the input vector #» 𝑥. The image #» 𝑦 has the opposite direction as the original vector #» 𝑥. [︂ ]︂ [︂ ]︂ 1 1 #» #» In the previous two examples, we found vectors 𝑧 = and 𝑥 = that “fit” with 1 −1 [︂ ]︂ 13 the matrix 𝐴 = in a special way, such that 𝐴 #» 𝑧 and 𝐴 #» 𝑥 are scalar multiples of #» 𝑧 31 #» and 𝑥 respectively. Section 7.2 The Characteristic Polynomial and Finding Eigenvalues 169 REMARK Given any matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F), we will seek vectors #» 𝑥 such that 𝐴 #» 𝑥 is a scalar multiple #» #» #» #» #» #» of 𝑥 . For any such matrix 𝐴, 𝑥 = 0 is such a vector because 𝐴 0 = 0 = 𝑐 0 for any #» #» constant 𝑐; that is, 𝐴 0 is always a scalar multiple of 0 . This fact is so trivial that we focus #» our attention on vectors #» 𝑥 ̸= 0 such that 𝐴 #» 𝑥 is a scalar multiple of #» 𝑥. We now arrive at our main definitions. Definition 7.1.5 Eigenvector, Eigenvalue and Eigenpair Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). A non-zero vector #» 𝑥 is an eigenvector of 𝐴 over F if there exists a scalar 𝜆 ∈ F such that 𝐴 #» 𝑥 = 𝜆 #» 𝑥. The scalar 𝜆 is then called an eigenvalue of 𝐴 over F, and the pair (𝜆, #» 𝑥 ) is an eigenpair of 𝐴 over F. [︂ Example 7.1.6 In Examples 7.1.3 and 7.1.4, we saw that 𝐴 = (︂ [︂ ]︂)︂ 1 over R. −2, −1 13 31 ]︂ (︂ [︂ ]︂)︂ 1 has eigenpairs 4, and 1 Interestingly enough, a square matrix with real entries may have eigenpairs with non-real eigenvalues. ]︂ (︂ [︂ ]︂)︂ 𝑖 0 −1 #» is an eigenpair of 𝐵 over C, because , then (𝜆, 𝑥 ) = 𝑖, If 𝐵 = 1 1 0 [︂ Example 7.1.7 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 0 −1 𝑖 −1 𝑖 #» 𝐵𝑥 = = =𝑖 = 𝜆 #» 𝑥. 1 0 1 𝑖 1 (︂ ]︂)︂ −𝑖 You can verify that −𝑖, is an eigenpair of 𝐵 over C in a similar way. With the 1 techniques that will be developed in this chapter you will be able to show that, even though 𝐵 has entries in R, it does not have any eigenpairs over R. 7.2 [︂ The Characteristic Polynomial and Finding Eigenvalues When looking for eigenpairs of a matrix 𝐴 over F, consider this list of equivalent equations which we solve over F for both #» 𝑥 and 𝜆. We denote 𝐼𝑛 by 𝐼. 𝐴 #» 𝑥 = 𝜆 #» 𝑥 #» ⇔ 𝐴 #» 𝑥 − 𝜆 #» 𝑥 = 0 #» ⇔ 𝐴 #» 𝑥 − 𝜆𝐼 #» 𝑥 = 0 #» ⇔ (𝐴 − 𝜆𝐼) #» 𝑥 = 0 170 Chapter 7 Eigenvalues and Diagonalization #» Thus, solving the equation 𝐴 #» 𝑥 = 𝜆 #» 𝑥 is equivalent to solving (𝐴 − 𝜆𝐼) #» 𝑥 = 0. Definition 7.2.1 Eigenvalue Equation or Eigenvalue Problem Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). We refer to the equation 𝐴 #» 𝑥 = 𝜆 #» 𝑥 or #» (𝐴 − 𝜆𝐼) #» 𝑥 = 0 as the eigenvalue equation for the matrix 𝐴 over F. It is also sometimes referred to as the eigenvalue problem. REMARKS • This is an unusual equation to solve since we want to solve it for both the vector #» 𝑥 ∈ F𝑛 and the scalar 𝜆 ∈ F. We will approach the problem by first identifying eligible values of 𝜆. We can then determine corresponding sets of vectors #» 𝑥 that solve the equation for each 𝜆 we identify. • As mentioned in the Remark following Example 7.1.4, a trivial solution to this equa#» #» tion is #» 𝑥 = 0 . We seek to obtain a non-trivial ( #» 𝑥 ̸= 0 ) solution to the eigenvalue equation. This is possible if and only if the RREF of matrix 𝐴 − 𝜆𝐼 has fewer than 𝑛 pivots, which occurs if and only if 𝐴 − 𝜆𝐼 is not invertible; that is, if and only if det(𝐴 − 𝜆𝐼) = 0. ⎛⎡ ⎤⎞ 𝑎11 − 𝜆 𝑎12 . 
. . 𝑎1𝑛 ⎜⎢ 𝑎21 𝑎22 − 𝜆 . . . 𝑎2𝑛 ⎥⎟ ⎜⎢ ⎥⎟ • Computing det(𝐴 − 𝜆𝐼) = det ⎜⎢ ⎥⎟ by expanding along .. .. .. . . ⎝⎣ ⎦⎠ . . . . 𝑎𝑛1 𝑎𝑛2 . . . 𝑎𝑛𝑛 − 𝜆 the first row, we sum up 𝑛 terms, each of which is the product of entries in 𝐴 − 𝜆𝐼. Since each entry is either a constant or a linear term in the variable 𝜆, and all products present in the expanded determinant calculation contain 𝑛 terms, we can infer that det(𝐴 − 𝜆𝐼) is a polynomial in 𝜆 of degree 𝑛. See Proposition 7.3.4 (Features of the Characteristic Polynomial) for a more precise statement and justification. Definition 7.2.2 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and 𝜆 ∈ F. The characteristic polynomial of 𝐴, denoted by 𝐶𝐴 (𝜆), is Characteristic Polynomial and Characteristic Equation 𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼). The characteristic equation of 𝐴 is 𝐶𝐴 (𝜆) = 0. REMARK The eigenvalues of 𝐴 over F are the roots of the characteristic polynomial of 𝐴 in F. Section 7.2 The Characteristic Polynomial and Finding Eigenvalues 171 [︂ Example 7.2.3 ]︂ 12 Find the eigenvalues of 𝐴 = over R. 21 [︂ ]︂ 1−𝜆 2 Solution: Since 𝐴 − 𝜆𝐼 = , the characteristic polynomial of 𝐴 is 2 1−𝜆 (︂[︂ 𝐶𝐴 (𝜆) = det 1−𝜆 2 2 1−𝜆 ]︂)︂ = (1 − 𝜆)2 − 22 = 𝜆2 − 2𝜆 − 3 and the characteristic equation of 𝐴 is 𝐶𝐴 (𝜆) = 𝜆2 − 2𝜆 − 3 = (𝜆 − 3)(𝜆 + 1) = 0. The eigenvalues of 𝐴 over R are the real roots of 𝐶𝐴 (𝜆), that is, 𝜆1 = 3 and 𝜆2 = −1. [︂ Example 7.2.4 ]︂ 0 −1 Find the eigenvalues of 𝐵 = over R. 1 0 ]︂ [︂ −𝜆 −1 , the characteristic polynomial of 𝐵 is Solution: Since 𝐵 − 𝜆𝐼 = 1 −𝜆 (︂[︂ 𝐶𝐵 (𝜆) = det −𝜆 −1 1 −𝜆 ]︂)︂ = 𝜆2 + 1 and the characteristic equation of 𝐵 is 𝐶𝐵 (𝜆) = 𝜆2 + 1 = 0. Since 𝐶𝐵 (𝜆) has no real roots, 𝐵 has no eigenvalues and no eigenpairs over R. REMARK ]︂ ]︂ [︂ cos 𝜃 − sin 𝜃 0 −1 Note that we can interpret 𝐵 geometrically by observing that = for sin 𝜃 cos 𝜃 1 0 𝜋 𝜃 = 2. [︂ Therefore, from a geometric perspective, 𝐵 is the matrix which performs a rotation by 𝜋2 radians counter-clockwise. Since 𝐵 #» 𝑥 can never be a scalar multiple of a nonzero vector #» 𝑥 ∈ R2 (because the transformation will always change the vector’s direction), it makes sense that 𝐵 would have no real eigenvalues nor eigenvectors. [︂ Example 7.2.5 ]︂ 0 −1 Keeping 𝐵 = from the previous example, find the eigenvalues of 𝐵 over C. 1 0 [︂ ]︂ −𝜆 −1 Solution: As in the previous example, 𝐵 − 𝜆𝐼 = and 1 −𝜆 (︂[︂ 𝐶𝐵 (𝜆) = det −𝜆 −1 1 −𝜆 ]︂)︂ = 𝜆2 + 1. 172 Chapter 7 Eigenvalues and Diagonalization We can factor 𝐶𝐵 (𝜆) fully because we are now working in C; thus the characteristic equation of 𝐵 is 𝐶𝐵 (𝜆) = 𝜆2 + 1 = (𝜆 − 𝑖)(𝜆 + 𝑖) = 0. The complex roots of 𝐶𝐵 (𝜆) are the eigenvalues of 𝐵 over C. Thus, 𝜆1 = 𝑖 and 𝜆2 = −𝑖. [︂ Example 7.2.6 ]︂ 3𝑖 −4 over C. 2 𝑖 [︂ ]︂ 3𝑖 − 𝜆 −4 Solution: Since 𝐺 − 𝜆𝐼 = , the characteristic polynomial of 𝐺 is 2 𝑖−𝜆 Find the eigenvalues of 𝐺 = (︂[︂ 𝐶𝐺 (𝜆) = det 3𝑖 − 𝜆 −4 2 𝑖−𝜆 ]︂)︂ = (3𝑖 − 𝜆)(𝑖 − 𝜆) + 8 = 𝜆2 − 4𝑖𝜆 + 5 and the characteristic equation is 𝐶𝐺 (𝜆) = 𝜆2 − 4𝑖𝜆 + 5 = (𝜆 − 5𝑖)(𝜆 + 𝑖) = 0. The complex roots of 𝐶𝐺 (𝜆) are the eigenvalues of 𝐺 over C. Thus, 𝜆1 = 5𝑖 and 𝜆2 = −𝑖. ⎡ Example 7.2.7 ⎤ 4 2 −6 Find the eigenvalues of 𝐻 = ⎣ 1 −2 1 ⎦ over R. −6 2 4 Solution: ⎡We find the characteristic polynomial of 𝐻 by calculating the determinant of ⎤ 4−𝜆 2 −6 𝐻 − 𝜆𝐼 = ⎣ 1 −2 − 𝜆 1 ⎦ , using an expansion along the first row: −6 2 4−𝜆 ⎛⎡ ⎤⎞ 4−𝜆 2 −6 𝐶𝐻 (𝜆) = det ⎝⎣ 1 −2 − 𝜆 1 ⎦⎠ −6 2 4−𝜆 = (4 − 𝜆)[(−2 − 𝜆)(4 − 𝜆) − 2] − 2[1(4 − 𝜆) − 1(−6)] − 6[1(2) − (−2 − 𝜆)(−6)] = −𝜆3 + 6𝜆2 + 40𝜆. The characteristic equation is 𝐶𝐻 (𝜆) = −𝜆3 + 6𝜆2 + 40𝜆 = −𝜆(𝜆2 − 6𝜆 − 40) = −𝜆(𝜆 − 10)(𝜆 + 4) = 0. 
The real roots are the eigenvalues of 𝐻 over R. Thus, 𝜆1 = 10, 𝜆2 = 0 and 𝜆3 = −4. 7.3 Properties of the Characteristic Polynomial There is a lot of arithmetic involved in finding eigenpairs and it is wise to make sure that our characteristic polynomial is actually correct before we begin factoring it. This section explores properties of the characteristic polynomial that will allow us to verify our work. We begin with the following result. Section 7.3 Properties of the Characteristic Polynomial Proposition 7.3.1 173 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 𝐴 is invertible if and only if 𝜆 = 0 is not an eigenvalue of 𝐴. 𝐴 is invertible iff det(𝐴) ̸= 0. Proof: iff det(𝐴 − 0 𝐼𝑛 ) ̸= 0 iff 0 is not a root of the characteristic polynomial iff 0 is not an eigenvalue of the matrix A. Definition 7.3.2 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). We define the trace of 𝐴 by Trace tr(𝐴) = 𝑛 ∑︁ 𝑎𝑖𝑖 . 𝑖=1 That is, the trace of a square matrix is the sum of its diagonal entries. ⎡ Example 7.3.3 Proposition 7.3.4 ⎤ 12 7 Let 𝐴 = ⎣ 3 4 −5 ⎦. Then tr(𝐴) = 1 + 4 + 6 = 11. 40 6 (Features of the Characteristic Polynomial) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) have characteristic polynomial 𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼). Then 𝐶𝐴 (𝜆) is a degree 𝑛 polynomial in 𝜆 of the form 𝐶𝐴 (𝜆) = 𝑐𝑛 𝜆𝑛 + 𝑐𝑛−1 𝜆(𝑛−1) + · · · + 𝑐1 𝜆 + 𝑐0 , where (a) 𝑐𝑛 = (−1)𝑛 , (b) 𝑐𝑛−1 = (−1)(𝑛−1) tr(𝐴), and (c) 𝑐0 = det(𝐴). Proof: By performing cofactor expansion along the first row of 𝐴 and along the first row of every subsequent (1, 1)-submatrix, we obtain an expression of the form 𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼) = (𝑎11 − 𝜆)(𝑎22 − 𝜆) · · · (𝑎𝑛𝑛 − 𝜆) + 𝑓 (𝜆), where 𝑓 is the polynomial that is the sum of the other terms of the determinant. In the above, we have singled out the contribution of the product of the diagonal entries, and have grouped together all other terms. This latter grouping 𝑓 (𝜆) is a polynomial of degree at most 𝑛 − 2. To see this, note that unless the cofactor calculation is done with respect to a diagonal entry, it will use an entry (𝑖, 𝑗) that corresponds to deleting the entry (𝑎𝑖𝑖 − 𝜆) 174 Chapter 7 Eigenvalues and Diagonalization Figure 7.3.1: Visualization of taking the (1, 2)-submatrix of 𝐴 − 𝜆𝐼 from the 𝑖𝑡ℎ row and the entry (𝑎𝑗𝑗 − 𝜆) from the 𝑗 𝑡ℎ column. This leaves us with at most 𝑛 − 2 entries that contain a 𝜆. We illustrate this below using the (1, 2)-entry as an example. Returning to our expression for 𝐶𝐴 (𝜆), we can expand out the first term in the expression to obtain 𝐶𝐴 (𝜆) = 𝑎11 · · · 𝑎𝑛𝑛 + · · · + (−1)𝑛−1 (𝑎11 + · · · + 𝑎𝑛𝑛 )𝜆𝑛−1 + (−1)𝑛 𝜆𝑛 + 𝑓 (𝜆). From this we see that 𝐶𝐴 (𝜆) is a polynomial of degree 𝑛 in 𝜆 (because we’ve shown that this degree is not exceeded by the polynomial 𝑓 , which is of degree at most 𝑛 − 2). Moreover, we obtain the following results: (a) The coefficient of 𝜆𝑛 is 𝑐𝑛 = (−1)𝑛 . (b) The coefficient of 𝜆𝑛−1 is 𝑐𝑛−1 = (−1)𝑛−1 (𝑎11 + · · · + 𝑎𝑛𝑛 ) = (−1)𝑛−1 tr(𝐴). (c) From the definition of 𝐶𝐴 (𝜆), we immediately have that the constant term is 𝑐0 = 𝐶𝐴 (0) = det(𝐴 − 0𝐼) = det(𝐴). ⎤ 12 3 Determine the characteristic polynomial of 𝐴 = ⎣ 4 5 6 ⎦ and verify the properties outlined 7 8 10 in Proposition 7.3.4 (Features of the Characteristic Polynomial). ⎡ Example 7.3.5 Solution: The characteristic polynomial of 𝐴 is ⎡ ⎤ 1−𝜆 2 3 6 ⎦ 𝐶𝐴 (𝜆) = det ⎣ 4 5 − 𝜆 7 8 10 − 𝜆 = (1 − 𝜆) [(5 − 𝜆)(10 − 𝜆) − 48] − 2 [4(10 − 𝜆) − 42] + 3 [32 − 7(5 − 𝜆)] = −𝜆3 + 16𝜆2 + 12𝜆 − 3. Observe that 𝐶𝐴 (𝜆) has degree 3, that tr(𝐴) = 1 + 5 + 10 = 16 and that det(𝐴) = −3 as we determined in Example 6.1.8. 
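(Optional numerical aside, not part of the course: the whole polynomial can also be cross-checked by machine. NumPy's np.poly(A) returns the coefficients of the monic polynomial det(λI − A), which differs from C_A(λ) = det(A − λI) only by the factor (−1)^n; negating the output therefore gives the coefficients of C_A(λ), and np.trace and np.linalg.det confirm the trace and determinant quoted above.)

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])

coeffs = np.poly(A)                     # coefficients of det(lambda*I - A), highest power first
print(coeffs)                           # approximately [  1. -16. -12.   3.]
print(-coeffs)                          # coefficients of C_A(lambda): approximately [ -1.  16.  12.  -3.]
print(np.trace(A), np.linalg.det(A))    # 16.0 and approximately -3.0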
We now verify our calculation of 𝐶𝐴 (𝜆) using each part of Proposition 7.3.4 (Features of the Characteristic Polynomial): Section 7.3 Properties of the Characteristic Polynomial 175 (a) The coefficient of 𝜆3 is 𝑐3 = −1 = (−1)3 . (b) The coefficient of 𝜆2 is 𝑐2 = 16 = (−1)2 tr(𝐴). (c) The constant term is 𝑐0 = −3 = det(𝐴). We can now be more confident that our characteristic polynomial was computed correctly. Note that this does not prove that our calculation was correct; this is simply a nice check for “red flags”. The characteristic polynomial 𝐶𝐴 (𝜆) of an 𝑛 × 𝑛 matrix 𝐴 is a degree 𝑛 polynomial. By the Fundamental Theorem of Algebra, the characteristic polynomial 𝐶𝐴 (𝜆) is guaranteed to have 𝑛 complex roots (though some may be repeated) as we assume F ⊆ C. Therefore, 𝐴 is guaranteed to have 𝑛 eigenvalues in C. Some of these eigenvalues may be repeated, in which case we list each eigenvalue as many times as its corresponding linear factor appears in the characteristic polynomial 𝐶𝐴 (𝜆); Example 7.3.12 will address the case of repeated eigenvalues. The following result establishes a connection between the characteristic polynomial of a matrix and its 𝑛 complex eigenvalues over C. Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) have characteristic polynomial 𝐶𝐴 (𝜆) = 𝑐𝑛 𝜆𝑛 + 𝑐𝑛−1 𝜆𝑛−1 + · · · + 𝑐1 𝜆 + 𝑐0 , and 𝑛 eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 (possibly repeated) in C. Then (a) 𝑐𝑛−1 = (−1)(𝑛−1) 𝑛 ∑︁ 𝜆𝑖 , and 𝑖=1 (b) 𝑐0 = 𝑛 ∏︁ 𝜆𝑖 . 𝑖=1 Note that if 𝐴 has repeated eigenvalues over C, then we include each eigenvalue in the list 𝜆1 , 𝜆2 , ..., 𝜆𝑛 as many times as its corresponding linear factor appears in the characteristic polynomial 𝐶𝐴 (𝜆). Proof: The eigenvalues of 𝐴 over C are the 𝑛 complex roots of the characteristic polynomial, so its characteristic polynomial has the form 𝐶𝐴 (𝜆) = 𝑘(𝜆 − 𝜆1 )(𝜆 − 𝜆2 ) · · · (𝜆 − 𝜆𝑛 ) for some 𝑘 ∈ C. By Parts (a) and (b) of Proposition 7.3.4 (Features of the Characteristic Polynomial), the coefficient of 𝜆𝑛 is 𝑐𝑛 = (−1)𝑛 . It must be that 𝑘 = (−1)𝑛 , so 𝐶𝐴 (𝜆) = (−1)𝑛 (𝜆 − 𝜆1 )(𝜆 − 𝜆2 ) · · · (𝜆 − 𝜆𝑛 ). 176 Chapter 7 Eigenvalues and Diagonalization (a) Consider 𝑐𝑛−1 , the coefficient of 𝜆𝑛−1 in 𝐶𝐴 (𝜆). By expanding out the expression for 𝐶𝐴 (𝜆), we find that there are 𝑛 terms involving 𝜆𝑛−1 , each of which is the product of (−1)𝑛 with one of the constants −𝜆1 , −𝜆2 , . . . , −𝜆𝑛 and with 𝜆 from each of the other 𝑛 − 1 linear factors. By taking the sum of these terms, we find that 𝑐𝑛−1 𝜆𝑛−1 = (−1)𝑛 [−𝜆1 (𝜆)𝑛−1 − 𝜆2 (𝜆)𝑛−1 − · · · − 𝜆𝑛 (𝜆)𝑛−1 ] (︃ )︃ 𝑛 ∑︁ (𝑛−1) = (−1) 𝜆𝑖 𝜆𝑛−1 𝑖=1 that is, 𝑐𝑛−1 = (−1)(𝑛−1) 𝑛 ∑︁ 𝜆𝑖 . 𝑖=1 (b) Expanding the terms of 𝐶𝐴 (𝜆), its constant term must be (−1)𝑛 times the product of the constant terms in each of the 𝑛 linear factors, which are −𝜆1 , −𝜆2 , . . . , −𝜆𝑛 . Therefore, 𝑛 𝑛 𝑛 ∏︁ ∏︁ ∏︁ 𝑐0 = (−1)𝑛 (−𝜆𝑖 ) = (−1)2𝑛 𝜆𝑖 = 𝜆𝑖 . 𝑖=1 𝑖=1 𝑖=1 Propositions 7.3.4 and 7.3.6 combine to form the following corollary, the proof of which we leave as an exercise. Corollary 7.3.7 (Eigenvalues and Trace/Determinant) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) have 𝑛 eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 (possibly repeated) in C. Show that: (a) 𝑛 ∑︁ 𝜆𝑖 = tr(𝐴). 𝑖=1 (b) 𝑛 ∏︁ 𝜆𝑖 = det(𝐴). 𝑖=1 In the following examples, we confirm our previous calculations using Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C). [︂ Example 7.3.8 ]︂ 0 −1 In Example 7.2.5, the eigenvalues of 𝐵 = over C are 𝜆1 = 𝑖 and 𝜆2 = −𝑖, and 1 0 𝐶𝐵 (𝜆) = 𝜆2 + 1 is a quadratic (degree 2) polynomial. 
Proposition 7.3.6 and Corollary 7.3.7 confirm that (a) the coefficient of 𝜆 is (−1)1 2 ∑︁ 𝜆𝑖 = −(𝜆1 + 𝜆2 ) = −(𝑖 − 𝑖) = 0 = 𝑐1 and tr(𝐵) = 𝑖=1 𝜆1 + 𝜆2 = 0. (b) the constant term is 2 ∏︁ 𝑖=1 𝜆𝑖 = 𝜆1 𝜆2 = (𝑖)(−𝑖) = −𝑖2 = 1 = 𝑐0 and 1 = det(𝐵). Section 7.3 Properties of the Characteristic Polynomial 177 [︂ Example 7.3.9 ]︂ 3𝑖 −4 In Example 7.2.6, the eigenvalues of 𝐺 = over C are 𝜆1 = 5𝑖 and 𝜆2 = −𝑖 and 2 𝑖 𝐶𝐺 (𝜆) = 𝜆2 − 4𝑖𝜆 + 5 is a degree 2 polynomial. hyperref[prop:complex-eigenvals-coeffs-charpoly]Proposition 7.3.6 and Corollary 7.3.7 confirm that 1 (a) the coefficient of 𝜆 is (−1) 2 ∑︁ 𝜆𝑖 = −(𝜆1 + 𝜆2 ) = −(5𝑖 − 𝑖) = −(4𝑖) = 𝑐1 and tr(𝐺) = 𝑖=1 𝜆1 + 𝜆2 = 4𝑖. (b) the constant term is 2 ∏︁ 𝜆𝑖 = 𝜆1 𝜆2 = 5𝑖(−𝑖) = 5 = 𝑐0 and 5 = det(𝐺). 𝑖=1 In the next two examples, we will look at the eigenvalues of a matrix 𝐴 over R, which are the real roots of the characteristic polynomial. Since R is a subset of C, any real root of a polynomial can also be considered as a complex root of the polynomial; thus, any real eigenvalue of a matrix 𝐴 can be considered to be a complex eigenvalue. As a result, Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C) applies in these cases. ]︂ 12 has a degree two characteristic polynomial In Example 7.2.3, 𝐴 = 21 𝐶𝐴 (𝜆) = 𝜆2 − 2𝜆 − 3 = (𝜆 − 3)(𝜆 + 1), with two real roots 𝜆1 = 3 and 𝜆2 = −1. These are the eigenvalues of 𝐴 over C (as well as over R). Proposition 7.3.6 confirms that [︂ Example 7.3.10 (a) the coefficient of 𝜆 is (−1)1 2 ∑︁ 𝜆𝑖 = −(𝜆1 + 𝜆2 ) = −(3 − 1) = −2 = 𝑐1 . 𝑖=1 (b) the constant term is 2 ∏︁ 𝜆𝑖 = 𝜆1 𝜆2 = (3)(−1) = −3 = 𝑐0 . 𝑖=1 Be careful to note that a matrix 𝐴 can have both real and non-real eigenvalues. In applying Proposition 7.3.6, we must use all of the eigenvalues of 𝐴. ⎡ Example 7.3.11 ⎤ −4 0 0 The characteristic polynomial of 𝐽 = ⎣ 0 0 4 ⎦ is 0 −4 0 ⎛⎡ ⎤⎞ −4 − 𝜆 0 0 −𝜆 4 ⎦⎠ = −(𝜆 + 4)(𝜆2 + 16) = −𝜆3 − 4𝜆2 − 16𝜆 − 64. 𝐶𝐽 (𝜆) = det ⎝⎣ 0 0 −4 −𝜆 Its degree is 3, and it has one real root 𝜆1 = −4 and two non-real complex roots 𝜆2 = 4𝑖 and 𝜆3 = −4𝑖. These are the eigenvalues of 𝐽 over C. Proposition 7.3.6 confirms that 178 Chapter 7 (a) the coefficient of 𝜆2 is (−1)2 3 ∑︁ Eigenvalues and Diagonalization 𝜆𝑖 = 𝜆1 + 𝜆2 + 𝜆3 = −4 + 4𝑖 − 4𝑖 = −4 = 𝑐2 . 𝑖=1 (b) the constant term is 3 ∏︁ 𝜆𝑖 = 𝜆1 𝜆2 𝜆3 = (−4)(4𝑖)(−4𝑖) = −64 = 𝑐0 . 𝑖=1 For our following example, the characteristic polynomial includes a repeated linear factor. In order to use Proposition 7.3.6 (Characteristic Polynomial and Eigenvalues over C), we include each eigenvalue in our list 𝜆1 , 𝜆2 , ..., 𝜆𝑛 as many times as its corresponding linear factor appears in the characteristic polynomial 𝐶𝐴 (𝜆). We will address this concept later in terms of multiplicity. ⎡ Example 7.3.12 ⎤ 311 The characteristic polynomial of 𝑍 = ⎣ 1 3 1 ⎦ is 113 ⎛⎡ ⎤⎞ 3−𝜆 1 1 𝐶𝑍 (𝜆) = det ⎝⎣ 1 3 − 𝜆 1 ⎦⎠ 1 1 3−𝜆 = (3 − 𝜆)[(3 − 𝜆)2 − 1] − 1[(3 − 𝜆) − 1] + 1[1 − (3 − 𝜆)] = −𝜆3 + 9𝜆2 − 24𝜆 + 20 = −(𝜆 − 5)(𝜆 − 2)2 . Since 𝑛 = 3, we must carefully write down the three eigenvalues corresponding to the roots of 𝐶𝑍 (𝜆), allowing repeats. Since the linear factor 𝜆 − 2 occurs twice in 𝐶𝑍 (𝜆), we list the eigenvalues as 𝜆1 = 5, 𝜆2 = 2, and 𝜆3 = 2. Proposition 7.3.6 confirms that (a) the coefficient of 𝜆2 is (−1)2 3 ∑︁ 𝜆𝑖 = 𝜆1 + 𝜆2 + 𝜆3 = 5 + 2 + 2 = 9 = 𝑐2 . 𝑖=1 (b) the constant term is 3 ∏︁ 𝜆𝑖 = 𝜆1 𝜆2 𝜆3 = 5(2)(2) = 20 = 𝑐0 . 
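(Optional numerical aside, not part of the course: the relationships of Corollary 7.3.7 can also be checked by machine. For the matrix Z of Example 7.3.12, NumPy's eigenvalue routine np.linalg.eigvals returns the repeated eigenvalue 2 twice, matching the list λ1 = 5, λ2 = 2, λ3 = 2, and the sum and product of the computed eigenvalues agree with tr(Z) = 9 and det(Z) = 20 up to floating-point rounding.)

import numpy as np

Z = np.array([[3., 1., 1.],
              [1., 3., 1.],
              [1., 1., 3.]])

eigs = np.linalg.eigvals(Z)
print(eigs)                               # approximately [5. 2. 2.] (in some order)
print(eigs.sum(), np.trace(Z))            # both 9: the sum of the eigenvalues is the trace
print(eigs.prod(), np.linalg.det(Z))      # both approximately 20: the product is the determinant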
𝑖=1 7.4 Finding Eigenvectors Once we have found an eigenvalue 𝜆 of a matrix 𝐴 over F, we can examine the eigenvalue #» equation in the form (𝐴 − 𝜆𝐼) #» 𝑥 = 0 in order to obtain a corresponding eigenvector #» 𝑥. [︂ Example 7.4.1 ]︂ 12 For each eigenvalue of 𝐴 = over R, find a corresponding eigenpair. 21 Section 7.4 Finding Eigenvectors 179 Solution: From Example 7.2.3, the eigenvalues of 𝐴 over R are 𝜆1 = 3 and 𝜆2 = −1. For each eigenvalue 𝜆𝑖 , we examine the eigenvalue equation to determine an eigenvector. #» • When 𝜆1 = 3, we examine (𝐴 − 3𝐼) #» 𝑥 = 0 which is equivalent to [︂ ]︂ [︂ ]︂ −2 2 𝑥1 #» = 0 2 −2 𝑥2 and row-reduces to [︂ 1 −1 0 0 ]︂ [︂ ]︂ 𝑥1 #» = 0. 𝑥2 {︂ [︂ ]︂ }︂ (︂ [︂ ]︂)︂ 1 1 The general solution is 𝑠 : 𝑠 ∈ R and an eigenpair is 3, . 1 1 #» • When 𝜆2 = −1, we examine (𝐴 − (−1)𝐼) #» 𝑥 = 0 which is equivalent to ]︂ [︂ ]︂ [︂ 2 2 𝑥1 #» = 0 2 2 𝑥2 and row-reduces to [︂ 11 00 ]︂ [︂ ]︂ 𝑥1 #» = 0. 𝑥2 }︂ (︂ [︂ ]︂)︂ {︂ [︂ ]︂ 1 1 . : 𝑡 ∈ R and an eigenpair is −1, The general solution is 𝑡 −1 −1 Checking these eigenpairs, we have that [︂ ]︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ 1 3 12 1 1 =3 = = 𝐴 1 3 21 1 1 and ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ ]︂ [︂ 1 −1 1 12 1 . = −1 = = 𝐴 1 −1 2 1 −1 −1 [︂ REMARK (︂ [︂ ]︂)︂ 1 While 3, is an eigenpair of 𝐴, we can get other eigenpairs by pairing non-zero scalar 1 [︂ ]︂ (︂ [︂ ]︂)︂ 1 2 multiples of the eigenvector with the eigenvalue 𝜆 = 3. For example, 3, and 1 2 (︂ [︂ ]︂)︂ −4 3, are eigenpairs of 𝐴 over R, since −4 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 2 12 2 6 2 𝐴 = = =3 2 21 2 6 2 and [︂ 𝐴 ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ −4 1 2 −4 −12 −4 = = =3 . −4 2 1 −4 −12 −4 180 Chapter 7 Eigenvalues and Diagonalization We formalize this idea later in Proposition 7.5.1 (Linear Combinations of Eigenvectors). [︂ Example 7.4.2 ]︂ 0 −1 For each eigenvalue of 𝐵 = over C, find a corresponding eigenpair. 1 0 Solution: From Example 7.2.4, the eigenvalues of 𝐵 over C are 𝜆1 = 𝑖 and 𝜆2 = −𝑖. For each eigenvalue 𝜆𝑖 , we examine the eigenvalue equation to determine an eigenvector. #» • When 𝜆1 = 𝑖, we examine (𝐵 − 𝑖𝐼) #» 𝑥 = 0 which is equivalent to [︂ ]︂ [︂ ]︂ −𝑖 −1 𝑥1 #» = 0 1 −𝑖 𝑥2 and row-reduces to [︂ 1 −𝑖 0 0 ]︂ [︂ ]︂ 𝑥1 #» = 0. 𝑥2 }︂ (︂ [︂ ]︂)︂ {︂ [︂ ]︂ 𝑖 𝑖 . : 𝑠 ∈ C and an eigenpair is 𝑖, The general solution is 𝑠 1 1 #» • When 𝜆2 = −𝑖, we examine (𝐵 − (−𝑖)𝐼) #» 𝑥 = 0 which is equivalent to ]︂ [︂ ]︂ [︂ 𝑖 −1 𝑥1 #» = 0 1 𝑖 𝑥2 and row-reduces to [︂ 1 𝑖 00 ]︂ [︂ ]︂ 𝑥1 #» = 0. 𝑥2 }︂ (︂ [︂ ]︂)︂ {︂ [︂ ]︂ −𝑖 −𝑖 . : 𝑡 ∈ C and an eigenpair is −𝑖, The general solution is 𝑡 1 1 Checking these eigenpairs, we have that [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 𝑖 0 −1 𝑖 −1 𝑖 𝐵 = = =𝑖 1 1 0 1 𝑖 1 and [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ −𝑖 0 −1 −𝑖 −1 −𝑖 𝐵 = = = −𝑖 . 1 1 0 1 −𝑖 1 ⎡ Example 7.4.3 ⎤ 311 For each eigenvalue of 𝑍 = ⎣ 1 3 1 ⎦ over R, find a corresponding eigenpair. 113 Solution: From Example 7.3.12, the characteristic polynomial of 𝑍 is 𝐶𝑍 (𝜆) = −(𝜆 − 5)(𝜆 − 2)2 . Therefore, the eigenvalues of 𝑍 over R are 𝜆1 = 5, 𝜆2 = 2 and 𝜆3 = 2. (Note that the eigenvalue 2 occurs twice.) Section 7.5 Eigenspaces 181 For each eigenvalue 𝜆𝑖 , we examine the eigenvalue equation to determine an eigenvector. #» • When 𝜆1 = 5, we solve the system (𝑍 − 5𝐼) #» 𝑥 = 0 which is equivalent to ⎡ ⎤⎡ ⎤ −2 1 1 𝑥1 ⎣ 1 −2 1 ⎦ ⎣ 𝑥2 ⎦ = #» 0 1 1 −2 𝑥3 and row-reduces to ⎤⎡ ⎤ 1 0 −1 𝑥1 ⎣ 0 1 −1 ⎦ ⎣ 𝑥2 ⎦ = #» 0. 00 0 𝑥3 ⎡ ⎧ ⎡ ⎤ ⎫ ⎛ ⎡ ⎤⎞ 1 ⎨ 1 ⎬ The solution set is 𝑠 ⎣ 1 ⎦ : 𝑠 ∈ R and an eigenpair is ⎝5, ⎣ 1 ⎦⎠ . 
⎩ ⎭ 1 1 #» • When 𝜆2 = 2, we solve the system (𝑍 − 2𝐼) #» 𝑥 = 0 which is equivalent to ⎤⎡ ⎤ ⎡ 𝑥1 111 ⎣ 1 1 1 ⎦ ⎣ 𝑥2 ⎦ = #» 0 111 𝑥3 and row-reduces to ⎡ ⎤⎡ ⎤ 111 𝑥1 ⎣ 0 0 0 ⎦ ⎣ 𝑥2 ⎦ = #» 0. 000 𝑥3 ⎧ ⎡ ⎤ ⎫ ⎡ ⎤ ⎛ ⎡ ⎤⎞ −1 −1 ⎨ −1 ⎬ The solution set is 𝑡 ⎣ 1 ⎦ + 𝑢 ⎣ 0 ⎦ : 𝑡, 𝑢 ∈ R and an eigenpair is ⎝2, ⎣ 1 ⎦⎠ . ⎩ ⎭ 0 1 0 • No new computation ⎛ ⎡ ⎤⎞is needed for 𝜆3 = 2. However, ⎡ ⎤note that we can obtain another −1 −1 ⎝ ⎣ ⎦ ⎠ ⎣ 0 ⎦ is not a scalar multiple of the eigenpair 2, 0 where the eigenvector 1 1 ⎡ ⎤ −1 eigenvector ⎣ 1 ⎦ obtained in the previous step. We are able to do this because the 0 solution set to the eigenvalue equation has two parameters in it. This feature of the solution set is related to the fact that the eigenvalue 𝜆2 = 𝜆3 = 2 is a repeated root of the characteristic polynomial. We will study this phenomenon in more detail later. 7.5 Eigenspaces [︂ (︂ [︂ ]︂)︂ 1 After Example 7.2.3, we noted that 𝐴 = has an eigenpair 3, . By taking 1 [︂ ]︂ (︂ [︂ ]︂)︂ (︂ [︂ ]︂)︂ 1 2 −4 scalar multiples of the eigenvector , we generated 3, and 3, that are 1 2 −4 12 21 ]︂ 182 Chapter 7 Eigenvalues and Diagonalization also eigenpairs of 𝐴 over R with the same eigenvalue 𝜆 = 3. More generally, we have the following result. Proposition 7.5.1 (Linear Combinations of Eigenvectors) Let 𝑐, 𝑑 ∈ F and suppose that (𝜆1 , #» 𝑥 ) and (𝜆1 , #» 𝑦 ) are eigenpairs of a matrix 𝐴 over F with #» the same eigenvalue 𝜆1 . If 𝑐 #» 𝑥 + 𝑑 #» 𝑦 ̸= 0 , then (𝜆1 , 𝑐 #» 𝑥 + 𝑑 #» 𝑦 ) is also an eigenpair for 𝐴 with eigenvalue 𝜆1 . Proof: Observe that 𝐴(𝑐 #» 𝑥 + 𝑑 #» 𝑦 ) = 𝑐(𝐴 #» 𝑥 ) + 𝑑(𝐴 #» 𝑦 ) = 𝑐(𝜆1 #» 𝑥 ) + 𝑑(𝜆1 #» 𝑦 ) = 𝜆1 (𝑐 #» 𝑥 + 𝑑 #» 𝑦 ). Thus, since 𝑐 #» 𝑥 + 𝑑 #» 𝑦 ̸= 0, it follows that 𝑐 #» 𝑥 + 𝑑 #» 𝑦 is an eigenvector of 𝐴 with eigenvalue #» #» 𝜆1 . That is, (𝜆1 , 𝑐 𝑥 + 𝑑 𝑦 ) is an eigenpair for 𝐴. Example 7.5.2 If we take 𝑑 = 0 and 𝑐 ̸= 0 in Proposition 7.5.1, we see that every non-zero scalar multiple of an eigenvector is an eigenvector (with the same eigenvalue). ]︂ (︂ [︂ ]︂)︂ [︂ (︂ [︂ ]︂)︂ 𝑐 12 1 for any , then so is 3, is an eigenpair for 𝐴 = For instance, since 3, 𝑐 21 1 𝑐 ̸= 0 in F. Since every non-zero linear combination of eigenvectors with respect to a fixed eigenvalue 𝜆 are also eigenvectors with respect to 𝜆, this suggests collecting all possible eigenvectors of 𝐴 with respect to a value 𝜆 into a set. Definition 7.5.3 Eigenspace Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) and let 𝜆 ∈ F. The eigenspace of 𝐴 associated with 𝜆, denoted by #» 𝐸𝜆 (𝐴), is the solution set to the system (𝐴 − 𝜆𝐼) #» 𝑥 = 0 over F. That is, 𝐸𝜆 (𝐴) = Null(𝐴 − 𝜆 𝐼). If the choice of 𝐴 is clear, we abbreviate this as 𝐸𝜆 . REMARKS #» #» • Notice that 0 is always in 𝐸𝜆 . However, by definition 0 is not an eigenvector. Thus the eigenspace 𝐸𝜆 consists of all the eigenvectors that have eigenvalue 𝜆 together with the zero vector. #» • 𝜆 is an eigenvalue for 𝐴 if and only if 𝐸𝜆 ̸= { 0 }. • 𝐸0 (𝐴) = Null(𝐴 − 0𝐼) = Null(𝐴). Section 7.6 Diagonalization Example 7.5.4 183 (︂ [︂ ]︂)︂ (︂ [︂ ]︂)︂ [︂ ]︂ 1 1 12 In Example 7.4.1, we found that 3, and −1, are eigenpairs of 𝐴 = 1 −1 21 over R. Our{︂[︂work in that Example in fact shows that the eigenspaces of 𝐴 are ]︂}︂ {︂[︂ ]︂}︂ 1 1 𝐸3 = Span and 𝐸−1 = Span . 1 −1 [︂ Example 7.5.5 ]︂ 0 −1 Our work in Example 7.4.2 shows that the eigenspaces of 𝐵 = over C are 1 0 {︂[︂ ]︂}︂ {︂[︂ ]︂}︂ 𝑖 −𝑖 𝐸𝑖 = Span and 𝐸−𝑖 = Span . 
1 1 ⎡ Example 7.5.6 ⎤ 311 In Example 7.4.3, the eigenspaces of 𝑍 = ⎣ 1 3 1 ⎦ over R are 113 ⎫ ⎧⎡ ⎤⎫ ⎧ ⎡ ⎤ ⎬ ⎨ 1 ⎬ ⎨ 1 ⎦ ⎣ 𝐸5 = 𝑠 1 : 𝑠 ∈ R = Span ⎣ 1 ⎦ ⎭ ⎭ ⎩ ⎩ 1 1 and 7.6 ⎧ ⎡ ⎫ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎤⎫ −1 −1 ⎬ ⎨ −1 ⎬ ⎨ −1 𝐸2 = 𝑡 ⎣ 1 ⎦ + 𝑢 ⎣ 0 ⎦ : 𝑡, 𝑢 ∈ R = Span ⎣ 1 ⎦ , ⎣ 0 ⎦ . ⎩ ⎭ ⎩ ⎭ 0 1 0 1 Diagonalization In this section we will demonstrate one practical use of eigenvalues and eigenvectors. (There are many more that you can learn about in your future courses!) In certain applications of linear algebra, it is necessary to compute powers 𝐴𝑘 of a square matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) efficiently. It would be ideal if we could compute 𝐴𝑘 without first having to compute each of 𝐴2 , 𝐴3 , 𝐴4 , . . . , 𝐴𝑘−1 . In some special situations, this is possible. For instance, if 𝐴 is a diagonal matrix, then computing 𝐴𝑘 is particularly straightforward: simply take the 𝑘 𝑡ℎ powers of the diagonal entries. EXERCISE Prove that if 𝐷 = diag(𝑑1 , . . . , 𝑑𝑛 ) ∈ 𝑀𝑛×𝑛 (F), then 𝐷𝑘 = diag(𝑑𝑘1 , . . . , 𝑑𝑘𝑛 ) for all 𝑘 ∈ N. As the following example demonstrates, computing powers of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) is simpler not only when 𝐴 is diagonal, but also when 𝐴 is of the form 𝐴 = 𝑃 𝐷𝑃 −1 , where 𝑃 is invertible and 𝐷 is diagonal. 184 Chapter 7 Eigenvalues and Diagonalization [︂ Example 7.6.1 ]︂ [︂ ]︂ 1 1 3 0 Consider the matrices 𝑃 = and 𝐷 = . (Observe that 𝐷 is a diagonal 1 −1 0 −1 [︂ ]︂ 12 matrix.) Let 𝐴 = 𝑃 𝐷𝑃 −1 = . Then, 21 𝐴2 = 𝐴𝐴 = (𝑃 𝐷𝑃 −1 )(𝑃 𝐷𝑃 −1 ) = 𝑃 𝐷(𝑃 −1 𝑃 )𝐷𝑃 −1 = 𝑃 𝐷2 𝑃 −1 , and therefore 𝐴3 = 𝐴𝐴2 = (𝑃 𝐷𝑃 −1 )(𝑃 𝐷2 𝑃 −1 ) = 𝑃 𝐷3 𝑃 −1 . One can show using induction (exercise: do this!) that we have the following general formula for 𝐴𝑘 : 𝐴𝑘 = 𝑃 𝐷𝑘 𝑃 −1 for all 𝑘 ∈ N. Now, the key observation is: since 𝐷 is a diagonal matrix, we have [︂ 𝑘 ]︂ 3 0 𝑘 𝐷 = . 0 (−1)𝑘 Consequently, 𝑘 𝑘 𝐴 = 𝑃𝐷 𝑃 −1 ]︂−1 ]︂ [︂ 𝑘 ]︂ [︂ 3 0 1 1 1 1 = 1 −1 1 −1 0 (−1)𝑘 ]︂ [︂ 1 1 ]︂ ]︂ [︂ 𝑘 [︂ 3 0 1 1 2 2 = 1 1 1 −1 0 (−1)𝑘 2 −2 ]︂ [︂ 𝑘 1 3 + (−1)𝑘 3𝑘 − (−1)𝑘 . = 2 3𝑘 − (−1)𝑘 3𝑘 + (−1)𝑘 [︂ In the previous Example, the relationship between 𝐴 and 𝐷 given by the equation 𝐴 = 𝑃 𝐷𝑃 −1 made the computation of 𝐴𝑘 a rather straightforward matter. We shall give such a relationship a name. Of course, the key problem is how to find such a pair of matrices 𝑃 and 𝐷 given a matrix 𝐴, if this is even possible. We will return to this problem shortly. Definition 7.6.2 Similar Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐴 is similar to 𝐵 over F if there exists an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F) such that 𝐴 = 𝑃 𝐵𝑃 −1 . REMARKS • Notice that if 𝐴 = 𝑃 𝐵𝑃 −1 then multiplying this equation on the left by 𝑃 −1 and on the right by 𝑃 gives 𝑃 −1 𝐴𝑃 = 𝐵, or equivalently, 𝑄𝐴𝑄−1 = 𝐵 where 𝑄 = 𝑃 −1 . Thus if 𝐴 is similar to 𝐵, then 𝐵 is similar to 𝐴 and we can just say that 𝐴 and 𝐵 are similar to each other. Section 7.6 Diagonalization 185 • The choice of “similar” as terminology for the relationship in Definition 7.6.2 might seem strange at first sight. There is in fact good reason for this choice, as we will later learn. (See the Remark on page 234.) In the Exercise below, you are asked to show that similar matrices share a variety of features. Specifically, they have the same determinant, trace, eigenvalues and characteristic polynomial. [︂ Example 7.6.3 ]︂ [︂ ]︂ 12 3 0 Example 7.6.1 shows that 𝐴 = is similar to 𝐷 = over R. 21 0 −1 [︂ Example 7.6.4 ]︂ [︂ ]︂ [︂ ]︂ 1 6 −2 12 12 −1 Let 𝐴 = . Consider 𝑃 = , with 𝑃 = − , and let 34 46 2 −4 1 𝐵 = 𝑃 −1 𝐴𝑃 = [︂ 12 46 ]︂−1 [︂ 12 34 ]︂ [︂ 12 46 ]︂ = [︂ ]︂ 1 16 24 . 2 −17 −26 ]︂ ]︂ [︂ 1 16 24 12 is similar to 𝐵 = over R. 
Then 𝐴 = 34 2 −17 −26 [︂ EXERCISE Let 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (F). Show that: (a) 𝐴 is similar to 𝐴 over F. (b) If 𝐴 is similar to 𝐵 over F and 𝐵 is similar to 𝐶 over F, then 𝐴 is similar to 𝐶 over F. EXERCISE Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Show that if 𝐴 is similar to 𝐵 over F, then: (a) For all 𝑘 ∈ N, 𝐴𝑘 is similar to 𝐵 𝑘 over F. (b) 𝐶𝐴 (𝜆) = 𝐶𝐵 (𝜆). (c) 𝐴 and 𝐵 have the same eigenvalues. (d) tr(𝐴) = tr(𝐵) and det(𝐴) = det(𝐵). Let us return to the problem of computing 𝐴𝑘 . In view of Example 7.6.1, the problem becomes much easier if we can show that 𝐴 is similar to a diagonal matrix 𝐷. We will single out such matrices. 186 Definition 7.6.5 Diagonalizable Matrix Chapter 7 Eigenvalues and Diagonalization Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). We say that 𝐴 is diagonalizable over F if it is similar over F to a diagonal matrix 𝐷 ∈ 𝑀𝑛×𝑛 (F); that is, if there exists an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F) such that 𝑃 −1 𝐴𝑃 = 𝐷. We say that the matrix 𝑃 diagonalizes 𝐴. REMARKS • Although we have motivated the above Definition by considering the problem of computing powers of matrices, the notion of diagonalizablity is fundamental for several other reasons, both theoretical and practical. We will see some of them in due course. • It is important to pay attention to what field F we are working over. It is possible for a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) to be diagonalizable over C but not over R. (See Example 9.4.1.) [︂ Example 7.6.6 Example 7.6.1 shows that 𝐴 = ]︂ 12 is diagonalizable over R. 21 The problem we must now address is: determine if a given matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F) is diagonalizable over F, and if it is, find the matrix 𝑃 that diagonalizes it. We will eventually be able to completely solve this problem. For now, we will give a partial solution. Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable) If 𝐴 ∈ 𝑀𝑛×𝑛 (F) has 𝑛 distinct eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 in F, then 𝐴 is diagonalizable over F. More specifically, if we let (𝜆1 , 𝑣#»1 ), (𝜆2 , 𝑣#»2 ), . . . , (𝜆𝑛 , 𝑣#» 𝑛 ) be eigenpairs of 𝐴 over F, and if we #» #» # » let 𝑃 = [𝑣 𝑣 · · · 𝑣 ] be the matrix whose columns are eigenvectors corresponding to the 1 2 𝑛 distinct eigenvalues, then (a) 𝑃 is invertible, and (b) 𝑃 −1 𝐴𝑃 = 𝐷 = diag(𝜆1 , 𝜆2 , · · · , 𝜆𝑛 ). REMARKS • WARNING: The converse of the above Proposition is false. That is, if 𝐴 ∈ 𝑀𝑛×𝑛 (F) is diagonalizable over F, then it is not necessarily true that 𝐴 has 𝑛 distinct eigenvalues. Consider, for instance, 𝒪2×2 = diag(0, 0) which is diagonalizable (it is diagonal!) with two repeated eigenvalues 𝜆1 = 𝜆2 = 0. Section 7.6 Diagonalization 187 • We will generalize this result in Chapter 9, and we will address precisely what happens when 𝐴 does not have 𝑛 distinct eigenvalues. We will prove the generalized result at that time. (See page 242 for the proof of Proposition 7.6.7.) • Notice that the columns of 𝑃 are eigenvectors of 𝐴 and the diagonal entries of 𝐷 are eigenvalues of 𝐴. Moreover, the order of the eigenvalues down the diagonal of 𝐷 corresponds to the order the eigenvectors occur as columns in 𝑃 . That is, the 𝑖𝑡ℎ diagonal entry in 𝐷 is the eigenvalue corresponding to the 𝑖𝑡ℎ column of 𝑃 . The Proposition remains true if we list the eigenvectors as columns in 𝑃 in any order we want, provided we adjust the diagonal entries of 𝐷 accordingly. Although we do not have a general proof of Proposition 7.6.7 at this time, we can directly verify that it works in specific examples as we illustrate below. [︂ Example 7.6.8 ]︂ (︂ [︂ ]︂)︂ (︂ [︂ ]︂)︂ 12 1 1 From Example 7.4.1, 𝐴 = has eigenpairs 3, and −1, over R. 
−1 21 1 ]︂ [︂ ]︂ [︂ 1 1 1 −1 −1 −1 from the eigenvectors, with inverse 𝑃 = − 2 We construct 𝑃 = . 1 −1 −1 1 Now, if we compute 𝐷=𝑃 −1 [︂ 1 −1 𝐴𝑃 = − 2 −1 [︂ 1 −1 =− 2 −1 ]︂ [︂ 3 0 = 0 −1 ]︂ [︂ −1 1 ]︂ [︂ 12 21 −1 1 ]︂ [︂ 3 −1 3 1 1 1 1 −1 ]︂ ]︂ = diag(3, −1), we find that it is a diagonal matrix whose diagonal entries are the eigenvalues of 𝐴, exactly as asserted by Proposition 7.6.7. Example 7.6.9 [︂ ]︂ (︂ [︂ ]︂)︂ (︂ [︂ ]︂)︂ 0 −1 𝑖 −𝑖 From Example 7.4.2, 𝐵 = has eigenpairs 𝑖, and −𝑖, over C. We 1 0 1 1 [︂ ]︂ [︂ ]︂ 1 −𝑖 1 𝑖 −𝑖 construct 𝑃 = from the eigenvectors, with inverse 𝑃 −1 = . 1 1 2 𝑖 1 Then 𝐷 = 𝑃 −1 𝐵𝑃 [︂ ]︂ [︂ ]︂ [︂ ]︂ 1 −𝑖 1 0 −1 𝑖 −𝑖 = 1 1 2 𝑖 1 1 0 [︂ ]︂ [︂ ]︂ 1 −𝑖 1 −1 −1 = 𝑖 −𝑖 2 𝑖 1 [︂ ]︂ 𝑖 0 = 0 −𝑖 188 Chapter 7 Eigenvalues and Diagonalization = diag(𝑖, −𝑖). ⎡ Example 7.6.10 ⎤ 4 2 −6 From Example 7.2.7, 𝐻 = ⎣ 1 −2 1 ⎦ has eigenalues 𝜆1 = 10, 𝜆2 = 0 and 𝜆3 = −4. −6 2 4 ⎛ ⎡ ⎤⎞ ⎛ ⎡ ⎤⎞ ⎛ ⎡ ⎤⎞ 1 1 1 We leave it as an exercise to verify that ⎝10, ⎣ 0 ⎦⎠ , ⎝0, ⎣ 1 ⎦⎠ and ⎝−4, ⎣ −1 ⎦⎠ are −1 1 1 ⎡ ⎤ 1 1 1 ⎣ 0 1 −1 ⎦ from the eigenvectors, with inverse eigenpairs over R. We construct 𝑃 = −1 1 1 ⎡ ⎤ 2 0 −2 1 𝑃 −1 = ⎣ 1 2 1 ⎦ . 4 1 −2 1 Then ⎤ ⎤⎡ ⎤⎡ 1 1 1 4 2 −6 2 0 −2 1 𝐷 = 𝑃 −1 𝐻𝑃 = ⎣ 1 2 1 ⎦ ⎣ 1 −2 1 ⎦ ⎣ 0 1 −1 ⎦ 4 −1 1 1 −6 2 4 1 −2 1 ⎡ ⎤⎡ ⎤ 2 0 −2 10 0 −4 1 = ⎣1 2 1 ⎦ ⎣ 0 0 4 ⎦ 4 1 −2 1 −10 0 −4 ⎡ ⎤ 10 0 0 =⎣ 0 0 0 ⎦ 0 0 −4 ⎡ = diag(10, 0, −4). EXERCISE ⎡ ⎤ 3 −5 0 The matrix 𝐴 = ⎣ 2 −3 0 ⎦ has the following eigenpairs: −2 3 −1 ⎛ ⎡ ⎤⎞ ⎛ ⎡ ⎤⎞ ⎛ ⎡ ⎤⎞ 0 2−𝑖 2+𝑖 ⎝−1, ⎣ 0 ⎦⎠ , ⎝𝑖, ⎣ 1 − 𝑖 ⎦⎠ , and ⎝−𝑖, ⎣ 1 + 𝑖 ⎦⎠ . 1 −1 −1 Find an invertible matrix 𝑃 and a diagonal matrix 𝐷 such that 𝐴 = 𝑃 𝐷𝑃 −1 . We close this Chapter with an example of a quick computation of powers of a diagonalizable matrix. Section 7.6 Diagonalization 189 ⎡ Example 7.6.11 ⎤ 4 2 −6 Let 𝐻 = ⎣ 1 −2 1 ⎦. Find 𝐻 10 . −6 2 4 Solution: In the previous Example we found matrices 𝑃 and 𝐷 so that 𝐻 = 𝑃 𝐷𝑃 −1 . By what we have learned from Example 7.6.1 (see also the Exercise below), we find that for all 𝑘 ∈ N 𝐻 𝑘 = (𝑃 𝐷𝑃 −1 )𝑘 = 𝑃 𝐷𝑘 𝑃 −1 ⎡ ⎤⎡ ⎤𝑘 ⎡ ⎤ 1 1 1 10 0 0 2 0 −2 1 ⎣1 2 1 ⎦ = ⎣ 0 1 −1 ⎦ ⎣ 0 0 0 ⎦ 4 −1 1 1 0 0 −4 1 −2 1 ⎡ ⎤ ⎤⎡ 𝑘 ⎤ ⎡ 10 0 0 2 0 −2 1 1 1 1 = ⎣ 0 1 −1 ⎦ ⎣ 0 0𝑘 0 ⎦ ⎣ 1 2 1 ⎦ 4 1 −2 1 −1 1 1 0 0 (−4)𝑘 ⎡ ⎤ 𝑘 𝑘 𝑘 2 · 10 + (−4) −2(−4) −2(10𝑘 ) 1 ⎦. −(−4)𝑘 2(−4)𝑘 −(−4)𝑘 = ⎣ 4 𝑘 𝑘 𝑘 𝑘 𝑘 −2(10 ) + (−4) −2(−4) 2(10 ) + (−4) If we plug in 𝑘 = 10 and simplify, we arrive at ⎤ ⎡ 9766137 −1024 −976513 1024 −512 ⎦ . 𝐻 10 = 512 ⎣ −512 −9765113 −1024 9766137 Chapter 8 Subspaces and Bases 8.1 Subspaces Throughout this course we have been dealing with a variety of subsets of F𝑛 —for example, solutions sets of systems of equations, lines and planes in R3 , and ranges of linear transformations. Many of these subsets have a special structure that makes them particularly interesting from the point of view of linear algebra. The next definition identifies the key features of this structure. Definition 8.1.1 Subspace A subset 𝑉 of F𝑛 is called a subspace of F𝑛 if the following properties are all satisfied. #» 1. 0 ∈ 𝑉 . 2. For all #» 𝑥 , #» 𝑦 ∈ 𝑉, #» 𝑥 + #» 𝑦 ∈ 𝑉 (closure under addition). 3. For all #» 𝑥 ∈ 𝑉 and 𝑐 ∈ F, 𝑐 #» 𝑥 ∈ 𝑉 (closure under scalar multiplication). Here are some important examples of subspaces. Proposition 8.1.2 (Examples of Subspaces) #» (a) { 0 } and F𝑛 are subspaces of F𝑛 . (b) If {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is a subset of F𝑛 , then Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is a subspace of F𝑛 . #» (c) If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then the solution set to the homogeneous system 𝐴 #» 𝑥 = 0 is a subspace of F𝑛 . 
(Equivalently, Null(𝐴) is a subspace of F𝑛 .) Proof: (a) This is immediate from the definition. 190 Section 8.1 Subspaces 191 #» #» (b) We have 0 = 0𝑣#»1 + · · · + 0𝑣#»𝑘 , so 0 ∈ Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. Next, we consider #» 𝑥 , #» 𝑦 ∈ Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } and 𝑐 ∈ F. Then there exist scalars 𝑎1 , 𝑎2 , . . . , 𝑎𝑘 ∈ F such that #» 𝑥 = 𝑎1 𝑣#»1 + 𝑎2 𝑣#»2 + · · · + 𝑎𝑘 𝑣#»𝑘 , and there exist scalars 𝑏 , 𝑏 , . . . , 𝑏 ∈ F such that #» 𝑦 = 𝑏 𝑣#» + 𝑏 𝑣#» + . . . + 𝑏 𝑣#». We have 1 2 1 1 𝑘 2 2 𝑘 𝑘 #» 𝑥 + #» 𝑦 = (𝑎1 𝑣#»1 + 𝑎2 𝑣#»2 + · · · + 𝑎𝑘 𝑣#»𝑘 ) + (𝑏1 𝑣#»1 + 𝑏2 𝑣#»2 + · · · + 𝑏𝑘 𝑣#»𝑘 ) = (𝑎 + 𝑏 )𝑣#» + (𝑎 + 𝑏 )𝑣#» + · · · + (𝑎 + 𝑏 )𝑣#». 1 1 1 2 2 2 𝑘 𝑘 𝑘 This shows that #» 𝑥 + #» 𝑦 is an element of Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. Similarly, since 𝑐 #» 𝑥 = 𝑐(𝑎1 𝑣#»1 + 𝑎2 𝑣#»2 + · · · + 𝑎𝑘 𝑣#»𝑘 ) = 𝑐𝑎1 𝑣#»1 + 𝑐𝑎2 𝑣#»2 + · · · + 𝑐𝑎𝑘 𝑣#»𝑘 we also have that 𝑐 #» 𝑥 ∈ Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }. This proves that Span{𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is a subspace of F𝑛 . #» #» #» (c) Let 𝑆 = { #» 𝑥 ∈ F𝑛 : 𝐴 #» 𝑥 = 0 }. We must show that 𝑆 is a subspace of F𝑛 . As 𝐴 0 = 0 , #» we know that 0 ∈ 𝑆. Next, let #» 𝑥 , #» 𝑦 ∈ 𝑆 and 𝑐 ∈ F. Then since #» #» #» 𝐴( #» 𝑥 + #» 𝑦 ) = 𝐴 #» 𝑥 + 𝐴 #» 𝑦 = 0 + 0 = 0, #» #» we have that #» 𝑥 + #» 𝑦 ∈ 𝑆. And since 𝐴(𝑐 #» 𝑥 ) = 𝑐(𝐴 #» 𝑥 ) = 𝑐 0 = 0 , we have that 𝑐 #» 𝑥 ∈ 𝑆. This proves that 𝑆 is a subspace of F𝑛 . Many commonly encountered sets in linear algebra can be shown to be subspaces by realizing that they are either of the form Span 𝑆, for some finite subset 𝑆 of F𝑛 , or realizing that they are the solution set of a homogeneous linear system of equations, and then appealing to Proposition 8.1.2 above. Here are some key examples of this. Proposition 8.1.3 (More Examples of Subspaces) (a) If 𝐴 ∈ 𝑀𝑚×𝑛 (F), then Col(𝐴) is a subspace of F𝑚 . (b) If 𝑇 : F𝑛 → F𝑚 is a linear transformation, then the range of 𝑇 , Range(𝑇 ), is a subspace of F𝑚 . (c) If 𝑇 : F𝑛 → F𝑚 is a linear transformation, then the kernel of 𝑇 , Ker(𝑇 ), is a subspace of F𝑛 . (d) If 𝐴 ∈ 𝑀𝑛×𝑛 (F) and if 𝜆 ∈ F, then the eigenspace 𝐸𝜆 is a subspace of F𝑛 . #» 𝑡ℎ column of 𝐴, so this is a Proof: (a) Col(𝐴) = Span{𝑎#»1 , 𝑎#»2 , . . . , 𝑎# » 𝑛 }, where 𝑎𝑖 is the 𝑖 special case of Proposition 8.1.2 (b). (b) Range(𝑇 ) = Col([𝑇 ]ℰ ), so this is a special case of Part (a). #» (c) Null(𝑇 ) is equal to the solution set of the homogeneous linear system [𝑇 ]ℰ #» 𝑥 = 0 , so this follows from Proposition 8.1.2 (c). 192 Chapter 8 Subspaces and Bases #» (d) 𝐸𝜆 is the solution set of the homogeneous system (𝐴 − 𝜆𝐼) #» 𝑥 = 0 , so this is a special case of Proposition 8.1.2 (c). The following proposition can simplify the process needed to check if a subset of F𝑛 is a subspace. Proposition 8.1.4 (Subspace Test) Let 𝑉 be a subset of F𝑛 . Then 𝑉 is a subspace of F𝑛 if and only if (a) 𝑉 is non-empty, and (b) for all #» 𝑥 , #» 𝑦 ∈ 𝑉 and 𝑐 ∈ F, 𝑐 #» 𝑥 + #» 𝑦 ∈𝑉. Proof: (⇒): #» Assume 𝑉 is a subspace of F𝑛 . Then 0 ∈ 𝑉 , so 𝑉 is non-empty. Also for all #» 𝑥 ∈ 𝑉 and #» 𝑐 ∈ F, 𝑐 𝑥 ∈ 𝑉 , since 𝑉 is closed under scalar multiplication. But then, for all #» 𝑦 ∈ 𝑉, 𝑐 #» 𝑥 + #» 𝑦 is also in 𝑉 , since 𝑉 is closed under addition. So 𝑉 satisfies (a) and (b). (⇐): Conversely, assume that 𝑉 is a subset of F𝑛 that satisfies (a) and (b). By (a), there exists #» an #» 𝑥 ∈ 𝑉 . Then, by (b) with 𝑐 = −1 and #» 𝑦 = #» 𝑥 , we find that 0 = − #» 𝑥 + #» 𝑥 is in 𝑉 . Also, #» #» #» #» with 𝑐 = 1, and any 𝑥 and 𝑦 in 𝑉 , we find that 𝑥 + 𝑦 ∈ 𝑉 , so 𝑉 is closed under addition. 
#» #» Finally, since we have shown that 𝑉 contains 0 , then (b) with #» 𝑦 = 0 shows that 𝑉 is closed under scalar multiplication. So 𝑉 is a subspace of F𝑛 . EXERCISE Establish that the examples referred to in Propositions 8.1.2 and 8.1.3 above are subspaces using Proposition 8.1.4 (Subspace Test). Example 8.1.5 Let 𝑊1 = {[𝑥 𝑦 𝑧]𝑇 ∈ R3 : 𝑥 + 2𝑦 + 3𝑧 = 4} and 𝑊2 = {[𝑥 𝑦 𝑧]𝑇 ∈ R3 : 𝑥2 − 𝑧 2 = 0} be subsets of R3 . Determine whether either of them is a subspace of R3 . Solution: 𝑊1 is not a subspace of R3 , since [0 0 0]𝑇 ∈ / 𝑊1 . Although [0 0 0]𝑇 ∈ 𝑊2 , this subset does not look “linear.” Let us check closure under addition. Notice that [2 0 2]𝑇 ∈ 𝑊2 , as 22 − 22 = 0, and [−2 0 2]𝑇 ∈ 𝑊2 , as (−22 ) − 22 = 0. However, if we add these two vectors together, we get [2 0 2]𝑇 + [−2 0 2]𝑇 = [0 0 4]𝑇 , and since (02 ) − 42 = −16 ̸= 0, then [0 0 4]𝑇 ∈ / 𝑊2 . This means that 𝑊2 is not closed under addition and is therefore not a subspace of R3 . Section 8.2 Linear Dependence and the Notion of a Basis of a Subspace 193 REMARK The preceding example illustrates a useful check: solution sets to non-homogeneous systems #» (such as 𝑊1 ) are never subspaces, since they do not contain 0 . Likewise, solution sets to nonlinear systems (such as 𝑊2 ) tend to not be subspaces, since they are usually not closed under addition or scalar multiplication (or both). On the other hand, we have from Proposition 8.1.2 (Examples of Subspaces) that solution sets to homogeneous linear systems are always subspaces. 8.2 Linear Dependence and the Notion of a Basis of a Subspace We will now tackle the issue of how to describe a subspace in an efficient way. We have already seen in Proposition 8.1.2 (Examples of Subspaces) that, given any finite subset 𝑆 of vectors in F𝑛 , their span Span 𝑆 is a subspace of F𝑛 . We will eventually see that, conversely, every subspace 𝑉 of F𝑛 may be expressed in the form of 𝑉 = Span 𝑆 where 𝑆 is a finite subset of vectors in 𝑉 . An important matter here is that this set 𝑆 is not uniquely determined by 𝑉 , in the sense that a given subspace 𝑉 may be expressed as the spanning set of many different subsets 𝑆. Example 8.2.1 We have {︂[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂ [︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂ [︂ ]︂}︂ −1 1 0 1 1 0 1 0 1 , , , , = Span , , = Span , R = Span −1 2 1 0 2 1 0 1 0 2 [︂ ]︂ 𝑥 ∈ R2 , we can express this as as for 𝑦 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ −1 1 0 1 1 0 1 0 1 𝑥 . +0 +0 +𝑦 =𝑥 +0 +𝑦 =𝑥 +𝑦 =𝑥 −1 2 1 0 2 1 0 1 0 𝑦 In the preceding example, it is apparent that the second and third spanning sets contain some redundancies. Our goal in this section is to be able to mathematically quantify this type of redundancy and demonstrate that sets without such redundancies are preferable. Below is a more involved example to illustrate this point. Example 8.2.2 Consider the following subset of R3 : ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 2 0 −2 1 ⎬ ⎨ 2 ⎦ ⎣ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ 2 , −3 , 5 , 13 , 1 ⎦ . 𝑆= ⎭ ⎩ −8 1 2 4 −2 Let 𝑉 = Span (𝑆). At this point we cannot say much about 𝑉 . For instance, it could be the entirety of R3 , or it could be some smaller subspace, such as a line or a plane through the 194 Chapter 8 Subspaces and Bases origin. We can at least be sure that 𝑉 cannot be a line through the origin, since 𝑆 contains two non-zero, non-parallel vectors. (We will see below that 𝑉 is in fact a plane.) To get a better understanding of 𝑉 , we would like to remove any “redundant” vectors from 𝑆 and provide a smaller subset, let us call it 𝑆1 , such that 𝑉 = Span (𝑆1 ) . 
How do we know which vectors are redundant and ought to be removed from 𝑆? This leads us to the important idea of linear dependence. Informally, we say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent if at least one of them, 𝑣#»𝑗 say, can be written as a linear combination of some of the other vectors. Thus, the vector 𝑣#»𝑗 depends linearly on the other vectors. (The formal definition is given in Definition 8.2.3 below.) For example, the vectors in the set 𝑆 above are linearly dependent because ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −2 2 2 ⎣ 13 ⎦ = 2 ⎣ 2 ⎦ − 3 ⎣ −3 ⎦ . −8 2 4 [︀ ]︀𝑇 Thus, we can remove the vector −2 13 −8 from 𝑆 and obtain a more efficient description of 𝑉 : ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 2 0 −2 1 ⎬ ⎨ 2 𝑉 = Span ⎣ 2 ⎦ , ⎣ −3 ⎦ , ⎣ 5 ⎦ , ⎣ 13 ⎦ , ⎣ 1 ⎦ ⎩ ⎭ 2 4 −2 −8 1 ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ 0 2 ⎨ 2 ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎣ 2 , −3 , 5 , 1 ⎦ . = Span ⎩ ⎭ 1 −2 4 2 [︀ ]︀𝑇 However, note that the choice of singling out the vector −2 13 −8 is somewhat arbitrary. For example, we could also write ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 −2 2 ⎣ 2 ⎦ = 1 ⎣ 13 ⎦ + 3 ⎣ −3 ⎦ 2 2 2 −8 4 or ⎡ ⎤ ⎡ ⎤ ⎤ 2 −2 2 ⎣ −3 ⎦ = 2 ⎣ 2 ⎦ − 1 ⎣ 13 ⎦ 3 3 4 2 −8 ⎡ and then remove one of these vectors instead. It is therefore more reasonable and balanced if we put everything on one side of our dependence equation and re-write it as ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −2 2 2 0 ⎣ 13 ⎦ − 2 ⎣ 2 ⎦ + 3 ⎣ −3 ⎦ = ⎣ 0 ⎦ . −8 2 4 0 The point is that the redundancy in the set 𝑆 is identified by the feature that we can take some linear combination of some of the vectors in 𝑆 to produce the zero vector. Section 8.2 Linear Dependence and the Notion of a Basis of a Subspace 195 We can always form the zero vector by taking a trivial linear combination (that is, one where all the scalar coefficients are 0). For example, ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −2 2 2 0 0 ⎣ 13 ⎦ + 0 ⎣ 2 ⎦ + 0 ⎣ −3 ⎦ = ⎣ 0 ⎦ . −8 2 4 0 It is worth noting the scenarios where this is the only linear combination that can produce the zero vector, which forms the premise of our definitions of linear dependence and independence below. Definition 8.2.3 Linear Dependence We say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 ∈ F𝑛 are linearly dependent if there exist scalars #» 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F, not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 . If 𝑈 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }, then we say that the set 𝑈 is a linearly dependent set (or simply that 𝑈 is linearly dependent) to mean that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent. If a list or set of vectors is not linearly dependent, we will say that it is linearly independent. Here is the formal definition. Definition 8.2.4 Linear Independence, Trivial Solution We say that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 ∈ F𝑛 are linearly independent if there do not exist #» scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ F, not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + . . . + 𝑐𝑘 𝑣#»𝑘 = 0 . Equivalently we say that 𝑣#», 𝑣#», . . . , 𝑣#» ∈ F𝑛 are linearly independent if the only solution 1 to the equation 2 𝑘 #» 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + . . . + 𝑐𝑘 𝑣#»𝑘 = 0 is the trivial solution 𝑐1 = 𝑐2 = · · · = 𝑐𝑘 = 0. If 𝑈 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 }, then we say that the set 𝑈 is a linearly independent set (or simply that 𝑈 is linearly independent) to mean that the vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly independent. REMARK As a matter of convention, the empty set ∅ is considered to be linearly independent. We consider the condition for linear independence in the above definition to be vacuously satisfied by ∅. 
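The defining equation c1 v1 + · · · + ck vk = 0 is a homogeneous system of linear equations in the unknown scalars c1, . . . , ck, whose coefficient matrix has the vectors v1, . . . , vk as its columns. By the rank and nullity results of Chapter 3, such a system has a nontrivial solution exactly when that coefficient matrix has rank less than k; this observation is made precise later, in Proposition 8.3.6. Purely as an illustration (a sketch of ours, not part of the notes), the test can be scripted in a few lines of Python with NumPy; the function name and the tolerance below are our own choices.

```python
import numpy as np

def is_linearly_independent(vectors, tol=1e-10):
    """Return True if the given vectors (in R^n) are linearly independent.

    By Definition 8.2.4, the vectors are independent exactly when the
    homogeneous system c1*v1 + ... + ck*vk = 0 has only the trivial
    solution, i.e. when the matrix [v1 ... vk] has rank k.
    (A floating-point rank is only trustworthy up to the tolerance used.)
    """
    A = np.column_stack(vectors)              # n x k matrix with the vectors as columns
    k = A.shape[1]
    return np.linalg.matrix_rank(A, tol=tol) == k

# The three vectors appearing in the dependence relation of Example 8.2.2:
v1 = np.array([-2.0, 13.0, -8.0])
v2 = np.array([2.0, 2.0, 2.0])
v3 = np.array([2.0, -3.0, 4.0])

print(is_linearly_independent([v1, v2, v3]))  # False, since v1 = 2*v2 - 3*v3
print(is_linearly_independent([v2, v3]))      # True, the two vectors are not parallel
```

Exact (rational) arithmetic avoids the floating-point tolerance altogether; a sketch along those lines, using a computer algebra system, appears in Section 8.3.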
The notion of linear dependence and independence is one of the most important ideas in linear algebra. We will examine it more deeply in the following sections. For now, let’s return to the previous example. 196 Example 8.2.5 Chapter 8 Subspaces and Bases We have as before the set ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 2 0 −2 1 ⎬ ⎨ 2 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ 2 , −3 , 5 , 13 , 1 ⎦ 𝑉 = Span (𝑆) = Span ⎭ ⎩ 2 4 −2 −8 1 which is a subspace of R3 . We have shown that 𝑆 is linearly dependent, and specifically that [−2 13 − 8]𝑇 can be written as a linear combination of [2 2 2]𝑇 and [2 − 3 4]𝑇 . We choose to remove the vector [−2 13 − 8]𝑇 from 𝑆 to produce the set 𝑆1 . Then ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 2 0 1 ⎬ ⎨ 2 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ 2 , −3 , 5 , 1 ⎦ . 𝑉 = Span (𝑆1 ) = Span ⎩ ⎭ 2 4 −2 1 However 𝑆1 is linearly dependent. For example, ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 1 0 ⎣2⎦ − 2 ⎣1⎦ = ⎣0⎦ . 2 1 0 We choose to remove the vector [2 2 2]𝑇 from 𝑆1 to produce the set 𝑆2 . Then ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 0 1 ⎬ ⎨ 2 𝑉 = Span (𝑆2 ) = Span ⎣ −3 ⎦ , ⎣ 5 ⎦ , ⎣ 1 ⎦ . ⎩ ⎭ 4 −2 1 The set 𝑆2 is still linearly dependent! For example, ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ 0 0 2 1 2 ⎣ 1 ⎦ − ⎣ −3 ⎦ − ⎣ 5 ⎦ = ⎣ 0 ⎦ . 0 −2 4 1 We choose to remove the vector [0 5 − 2]𝑇 from 𝑆2 to produce the set 𝑆3 . Then ⎧⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ ⎨ 2 𝑉 = Span (𝑆3 ) = Span ⎣ −3 ⎦ , ⎣ 1 ⎦ . ⎩ ⎭ 4 1 The set 𝑆3 , however, is linearly independent. Indeed, if we have a linear combination ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 1 0 𝑐1 ⎣ −3 ⎦ + 𝑐2 ⎣ 1 ⎦ = ⎣ 0 ⎦ , 4 1 0 then 𝑐2 = −2𝑐1 and 𝑐2 = 3𝑐1 , so −2𝑐1 = 3𝑐1 . Therefore, 𝑐1 = 0 and consequently 𝑐2 = 0. That is, we cannot form the zero vector as a non-trivial linear combination of vectors in 𝑆3 . The set 𝑆3 is an optimal way of producing the subspace 𝑉 . This optimal set incidentally shows that 𝑉 is equal to the span of two non-parallel vectors in R3 , and so it is a plane through the origin. Section 8.3 Detecting Linear Dependence and Independence 197 The previous example illustrates an important phenomenon. Given a subspace 𝑉 of F𝑛 , say one of the form 𝑉 = Span (𝑆) for some finite set 𝑆 (we will later see that all subspaces are of this form), it will be desirable to remove any redundant vectors from 𝑆, reducing it to a smaller linearly independent subset ℬ such that 𝑉 = Span(ℬ). Such a set ℬ is, in some sense, a basic set of building blocks for the construction of 𝑉 . Definition 8.2.6 Basis Let 𝑉 be a subspace of F𝑛 and let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a finite set of vectors contained in 𝑉 . We say that ℬ is a basis for 𝑉 if 1. ℬ is linearly independent, and 2. 𝑉 = Span(ℬ). REMARK We shall {︁ #»}︁adopt the convention that the empty set ∅ is a basis for the zero subspace 𝑉 = 0 . Recall that we had earlier adopted the convention of considering the empty set to be linearly {︁ #»}︁ independent. (See the remark following Definition 8.2.4.) The statement Span (∅) = 0 can be made plausible if we agree that a “linear combination of no vectors” gives the zero vector. The notion of a basis is central to much of what follows in the course. We will spend the next two sections carefully examining both conditions (1) (linear independence) and (2) (spanning) in the above definition. Example 8.2.7 ⎧⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ ⎨ 2 ⎦ ⎣ ⎣ −3 , 1 ⎦ is a basis for Our work in the previous example shows that ℬ = ⎭ ⎩ 1 4 ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 2 0 −2 1 ⎬ ⎨ 2 𝑉 = Span ⎣ 2 ⎦ , ⎣ −3 ⎦ , ⎣ 5 ⎦ , ⎣ 13 ⎦ , ⎣ 1 ⎦ . ⎩ ⎭ 2 4 −2 −8 1 8.3 Detecting Linear Dependence and Independence We begin by recalling the definitions of linear dependence and independence given in Definitions 8.2.3 and 8.2.4. A set of vectors {𝑣#»1 , 𝑣#»2 , . . . 
, 𝑣#»𝑘 } is said to be linearly dependent #» if there exist scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 , not all zero, such that 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + . . . + 𝑐𝑘 𝑣#»𝑘 = 0 ; if not, the set is said to be linearly independent. The next proposition offers reformulations of these definitions that can be easier to use in practice. 198 Proposition 8.3.1 Chapter 8 Subspaces and Bases (Linear Dependence Check) (a) The vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent if and only if one of the vectors can be written as a linear combination of some of the other vectors. (b) The vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly independent if and only if #» 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 (𝑐𝑖 ∈ F) implies 𝑐1 = · · · = 𝑐𝑘 = 0. Proof: (a) Suppose that 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 are linearly dependent. Then there exist scalars #» 𝑐1 , . . . , 𝑐𝑘 , not all zero, such that 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 . Suppose that 𝑐𝑗 is nonzero. Then by moving the 𝑐𝑗 𝑣#»𝑗 term to the right side of the previous equation, we get ∑︁ −𝑐𝑗 𝑣#»𝑗 = 𝑐𝑖 #» 𝑣 𝑖. 𝑖̸=𝑗 Dividing through by −𝑐𝑗 (which is permissible, since 𝑐𝑗 ̸= 0), we get ∑︁ 𝑐𝑖 #» 𝑣 𝑖. 𝑣#»𝑗 = −𝑐𝑗 𝑖̸=𝑗 Thus, we have expressed 𝑣#»𝑗 as a linear combination of the other vectors. Conversely, if 𝑣#» is a linear combination of the other vectors, say 𝑗 𝑣#»𝑗 = ∑︁ 𝑎𝑖 #» 𝑣 𝑖, 𝑖̸=𝑗 then (−1)𝑣#»𝑗 + ∑︁ #» 𝑎𝑖 #» 𝑣𝑖 = 0 𝑖̸=𝑗 #» is an expression of 0 as a nontrivial linear combination of the vectors 𝑣#»1 , . . . , 𝑣#»𝑘 (nontrivial because the coefficient of 𝑣#»𝑗 is nonzero). This proves that 𝑣#»1 , . . . , 𝑣#»𝑘 are linearly dependent. (b) This result is a rephrasing of the equivalent condition for linear independence in Definition 8.2.4. Indeed, the contrapositive of the given statement is: A list of vectors 𝑣#»1 , . . . , 𝑣#»𝑘 is linearly dependent if and only if there exists scalars 𝑐1 , . . . , 𝑐𝑘 not all zero #» such that 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 . This is the definition of linear dependence. So the statement given in (b) must be true as well. Now we turn to the problem of determining whether a given set is or is not linearly dependent. When the set contains at most two vectors, this is particularly straightforward. Proposition 8.3.2 Let 𝑆 ⊆ F𝑛 . #» (a) If 0 ∈ 𝑆, then 𝑆 is linearly dependent. Section 8.3 Detecting Linear Dependence and Independence 199 #» (b) If 𝑆 = { #» 𝑥 } contains only one vector, then 𝑆 is linearly dependent if and only if #» 𝑥 = 0. (c) If 𝑆 = { #» 𝑥 , #» 𝑦 } contains only two vectors, then 𝑆 is linearly dependent if and only if one of the vectors is a scalar multiple of the other. #» #» Proof: (a) Since 1 is a non-zero scalar, and 1( 0 ) = 0 , we can thus take a non-trivial linear combination of some vectors (in this case, only one vector) in 𝑆 to produce the zero vector. #» (b) We know from (a) that if #» 𝑥 = 0 , then 𝑆 is linearly dependent. #» Conversely, we know from the properties of the zero vector that if 𝑐 #» 𝑥 = 0 , then either #» #» 𝑥 = 0 or 𝑐 = 0. #» #» So if #» 𝑥 ̸= 0 and 𝑐 #» 𝑥 = 0 , then 𝑐 must be zero. That is, only the trivial combination #» of #» 𝑥 ̸= 0 will produce the zero vector. Thus, 𝑆 is linearly independent. (c) If one of the vectors is a multiple of the other, we may assume without loss of generality that #» 𝑦 = 𝑚 #» 𝑥 for some 𝑚 ∈ F. Then #» #» 𝑦 − 𝑚 #» 𝑥 = 0. Since the coefficient of #» 𝑦 is 1 ̸= 0, we have demonstrated linear dependence. #» Conversely, if 𝑆 = { 𝑥 , #» 𝑦 } is linearly dependent, then there exist 𝑎, 𝑏 ∈ F, not both #» zero, such that 𝑎 #» 𝑦 + 𝑏 #» 𝑥 = 0 . We may assume without loss of generality that 𝑎 ̸= 0. 
We then have that 𝑏 𝑏 #» #» 𝑦 + #» 𝑥 = 0 , which gives #» 𝑦 = − #» 𝑥. 𝑎 𝑎 Thus, one of the vectors, #» 𝑦 , is a multiple of the other, #» 𝑥. {︂[︂ Example 8.3.3 Is 𝑆 = ]︂}︂ ]︂ [︂ 2 6 linearly dependent? , −4 −12 Solution: [︂ ]︂ [︂ ]︂ 6 2 Yes, as =3 . −12 −4 {︂[︂ Example 8.3.4 Is 𝑆 = ]︂ [︂ ]︂}︂ 2 6 , linearly dependent? −4 −14 Solution: No. The second vector is not a multiple of the first (and the first is not a multiple of the second). Looking at the first components, 6 is 3 times 2; however, looking at the second components, −14 is not 3 times −4. 200 Example 8.3.5 Chapter 8 Subspaces and Bases [︂ ]︂ [︂ ]︂ 2 + 3𝑖 12 + 5𝑖 #» #» #» #» Is 𝑆 = {𝑧1 , 𝑧2 } , with 𝑧1 = and 𝑧2 = , linearly dependent? 4+𝑖 14 − 5𝑖 Solution: Because we are working over C, the answer to this question is not so obvious. #» We will use Proposition 8.3.2 (b). Consider the equation 𝑐1 𝑧#»1 + 𝑐2 𝑧#»2 = 0 . The Proposition states that {𝑧#», 𝑧#»} is linearly independent precisely when this equation has only the trivial 1 2 solution 𝑐1 = 𝑐2 = 0. The corresponding coefficient matrix of this equation is [︂ ]︂ 2 + 3𝑖 12 + 5𝑖 4 + 𝑖 14 − 5𝑖 which row reduces to [︂ ]︂ 1 3 − 2𝑖 . 0 0 We deduce from this reduced matrix that there are (infinitely many) non-trivial solutions to the system of equations. One solution is 𝑐1 = −3 + 2𝑖 and 𝑐2 = 1. Thus, the set 𝑆 is linearly dependent. How do we deal with a set that contains more than two vectors? The ideas in the previous examples can be expanded as follows. Let 𝑆 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 . Consider the following questions about 𝑆: #» 1. Is 0 ∈ 𝑆? 2. Is one vector in 𝑆 a scalar multiple of another vector in 𝑆? 3. Can we easily observe that we can write one of the vectors in 𝑆 as a linear combination of some of the other vectors in 𝑆? If the answer to any of these questions is “yes”, then the set 𝑆 is linearly dependent. In particular, (2) and (3) above essentially ask “is it obvious that this set is linearly dependent?” If the answer to all of them is “no”, we cannot immediately conclude whether the set is linearly independent or dependent. Therefore, we must examine the expression #» 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑘 𝑣#»𝑘 = 0 . This is a homogeneous linear system of 𝑛 equations (obtained by equating the entries of the vectors on both sides of the above expression in 𝑘 unknowns (the coefficients 𝑐𝑖 ). Since it is a homogeneous system, we have that one solution is the trivial solution, and we wish to know whether or not there are any other solutions. We know that 𝑆 is linearly dependent if and only if there are non-trivial solutions from Proposition 8.3.1 (Linear Dependence Check). This is all succinctly captured by the rank of the coefficient matrix of the system associated with the expression above. Section 8.3 Detecting Linear Dependence and Independence Proposition 8.3.6 201 (Pivots and Linear Independence) [︀ ]︀ Let 𝑆 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 . Let 𝐴 = 𝑣#»1 𝑣#»2 · · · 𝑣#»𝑘 be the 𝑛 × 𝑘 matrix whose columns are the vectors in 𝑆. Suppose that rank(𝐴) = 𝑟 and 𝐴 has pivots in columns 𝑞1 , 𝑞2 , . . . , 𝑞𝑟 . Let 𝑈 = {𝑣# 𝑞»1 , 𝑣# 𝑞»2 , . . . , 𝑣# 𝑞»𝑟 }, the set of columns of 𝐴 that correspond to the pivot columns labelled above. Then (a) 𝑆 is linearly independent if and only if 𝑟 = 𝑘. (b) 𝑈 is linearly independent. 𝑣 } is linearly dependent. (c) If #» 𝑣 is in 𝑆 but not in 𝑈 then the set {𝑣# 𝑞»1 , ..., 𝑣# 𝑞»𝑟 , #» (d) Span (𝑈 ) = Span (𝑆). 
REMARK

This proposition makes the identification of linear dependence/independence relatively straightforward. Additionally, it identifies "redundant" vectors in a given set that may be removed to obtain a linearly independent subset whose span is the same as the span of the original set. This gives a framework for the ad-hoc removal process carried out earlier (see Example 8.2.5).

Proof: (a) The matrix $A$ is the coefficient matrix for the homogeneous system of linear equations
$$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}.$$
There will be no parameters in the solution set if and only if there is a pivot in each of the columns of $A$, that is, if and only if $r = k$. So $S$ is linearly independent if and only if $r = k$.

(b) We must examine the linear dependence of the vectors $\vec{v}_{q_1}, \vec{v}_{q_2}, \ldots, \vec{v}_{q_r}$ corresponding to the pivot columns in $A$. Thus, we must consider the homogeneous system of linear equations
$$d_1\vec{v}_{q_1} + d_2\vec{v}_{q_2} + \cdots + d_r\vec{v}_{q_r} = \vec{0}.$$
This is a system of $n$ equations in the $r$ unknowns $d_1, \ldots, d_r$. Its coefficient matrix is the $n \times r$ matrix consisting of the pivot columns of $A$, and this matrix has rank $r$: the row operations that reduce $A$ also reduce this submatrix, producing a pivot in every one of its $r$ columns. Thus, the only solution to this system is the trivial solution. Consequently, $U$ must be linearly independent.

(c) Suppose that $r < k$, so that there is at least one non-pivot column in $A$, say the $j$th column. Add the vector $\vec{v}_j$ to $U$ and check this new set for linear dependence. This amounts to examining the system
$$d_1\vec{v}_{q_1} + d_2\vec{v}_{q_2} + \cdots + d_r\vec{v}_{q_r} + \alpha\vec{v}_j = \vec{0}.$$
By construction, the coefficient matrix of this system has $r+1$ columns and rank $r$. Thus, there must be $r + 1 - r = 1$ free parameter in its solution set, and therefore there are nontrivial solutions. It follows that $\{\vec{v}_{q_1}, \vec{v}_{q_2}, \ldots, \vec{v}_{q_r}, \vec{v}_j\}$ is linearly dependent.

(d) Note that $U \subseteq S$, so $\operatorname{Span}(U) \subseteq \operatorname{Span}(S)$. We will prove that $\operatorname{Span}(S) \subseteq \operatorname{Span}(U)$. If $U = S$, there is nothing to prove. So assume that $S \neq U$, and let $\vec{v}_j$ be a vector in $S$ but not in $U$. Our proof of part (c) shows that the system
$$d_1\vec{v}_{q_1} + d_2\vec{v}_{q_2} + \cdots + d_r\vec{v}_{q_r} + \alpha\vec{v}_j = \vec{0}$$
has a nontrivial solution. Such a solution must have $\alpha \neq 0$: if $\alpha = 0$, then $d_1\vec{v}_{q_1} + \cdots + d_r\vec{v}_{q_r} = \vec{0}$, and consequently $d_1 = \cdots = d_r = 0$ by part (b) above together with Part (b) of Proposition 8.3.1 (Linear Dependence Check), so the solution would be the trivial one, a contradiction. Therefore, since $\alpha \neq 0$, we can write $\vec{v}_j$ as a linear combination of the vectors in $U$, namely
$$\vec{v}_j = \frac{d_1}{-\alpha}\vec{v}_{q_1} + \frac{d_2}{-\alpha}\vec{v}_{q_2} + \cdots + \frac{d_r}{-\alpha}\vec{v}_{q_r}.$$
This shows that $\vec{v}_j \in \operatorname{Span}(U)$. Since $\vec{v}_j$ was an arbitrary vector in $S$ but not in $U$, it follows that $\operatorname{Span}(S) \subseteq \operatorname{Span}(U)$, as desired.

Here is a very useful application of the previous Proposition.

Corollary 8.3.7 (Bound on Number of Linearly Independent Vectors)

Let $S = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ be a set of $k$ vectors in $\mathbb{F}^n$. If $n < k$, then $S$ is linearly dependent.

Proof: As in Proposition 8.3.6 (Pivots and Linear Independence), let $A$ be the matrix whose columns are the vectors in $S$, and let $r = \operatorname{rank}(A)$. Then $r \leq n$, since $A$ has $n$ rows. Hence if $n < k$, it follows that $r \leq n < k$. Therefore, $S$ is linearly dependent, by Proposition 8.3.6 (a).

We now turn to some computational applications of Proposition 8.3.6 (Pivots and Linear Independence).
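Before the worked examples, here is a short sketch (ours, not part of the notes) of how the pivot test of Proposition 8.3.6 might be carried out with the SymPy library, whose rref method returns both the reduced row echelon form and the indices of the pivot columns. The data are the five vectors of the set $S$ from Example 8.2.2. Note that the independent subset selected by the pivot columns, $\{[2\;2\;2]^T, [2\;-3\;4]^T\}$, is not the same one found by hand in Example 8.2.5, but both are linearly independent and span the same plane $V$.

```python
from sympy import Matrix

# Columns are the five vectors of the set S from Example 8.2.2 (in that order).
A = Matrix([[2,  2,  0, -2, 1],
            [2, -3,  5, 13, 1],
            [2,  4, -2, -8, 1]])

R, pivot_cols = A.rref()           # reduced row echelon form and the pivot column indices
print(pivot_cols)                  # (0, 1): pivots in the first two columns, so rank(A) = 2

print(len(pivot_cols) == A.cols)   # False, so S is linearly dependent (part (a))

# Parts (b) and (d): the columns of A sitting in the pivot positions form a
# linearly independent set whose span equals Span(S).
U = [A.col(j) for j in pivot_cols]
print(U)                           # the vectors [2, 2, 2]^T and [2, -3, 4]^T
```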
Example 8.3.8 Consider the following vectors in R6 : [︀ ]︀𝑇 𝑣#»1 = 1 −2 3 −4 5 −6 , [︀ ]︀𝑇 𝑣#»3 = −5 −10 −7 −16 −9 −22 , [︀ ]︀𝑇 𝑣#»5 = 1 1 1 1 1 1 , Let 𝑆 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 , 𝑣#»5 , 𝑣#»6 } and 𝑉 = Span 𝑆. (a) Show that 𝑆 is linearly dependent. (b) Find a subset of 𝑆 that is a basis for 𝑉 . [︀ ]︀𝑇 𝑣#»2 = 3 4 5 6 7 8 , [︀ ]︀𝑇 𝑣#»4 = −6 −4 −2 0 2 4 , [︀ ]︀𝑇 𝑣#»6 = −3 3 −5 5 −1 1 . Section 8.3 Detecting Linear Dependence and Independence 203 Solution: (a) Form matrix 𝐴, which has the vectors in 𝑆 as its columns. 3 −5 −6 1 −3 4 −10 −4 1 3 5 −7 −2 1 −5 6 −16 0 1 5 7 −9 2 1 −1 8 −22 4 1 1 ⎡ 1 ⎢ −2 ⎢ ⎢ 3 𝐴=⎢ ⎢ −4 ⎢ ⎣ 5 −6 An REF of 𝐴 is ⎡ 1 3 −5 −6 1 −3 ⎢ ⎢0 ⎢ ⎢0 ⎢ ⎢ ⎢0 ⎢ ⎣0 0 1 −2 0 0 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ⎤ −8 3 −3 5 10 10 7 1 −1 12 24 0 0 0 0 0 0 0 0 0 0 0 0 ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ 1 ⎥ ⎥ 0 ⎦ 0 We find that 𝐴 has rank 4 with pivots in columns 1, 2, 4, and 6. Since 4 < 6, we conclude that 𝑆 is linearly dependent (by Part (a) of Proposition 8.3.6 (Pivots and Linear Independence)). (b) Parts (b) and (d) of Proposition 8.3.6 (Pivots and Linear Independence) tell us that the vectors 𝑣#»1 , 𝑣#»2 , 𝑣#»4 and 𝑣#»6 are linearly independent, and that Span{𝑣#»1 , 𝑣#»2 , 𝑣#»4 , 𝑣#»6 } = Span 𝑆 = 𝑉. We conclude that {𝑣#»1 , 𝑣#»2 , 𝑣#»4 , 𝑣#»6 } is a basis for 𝑉 . Example 8.3.9 Consider the following vectors in C5 : #» 𝑣1 #» 𝑣2 #» 𝑣3 #» 𝑣4 #» 𝑣 5 = [1 + 𝑖, −2 + 𝑖, 3, −4 − 𝑖, 5𝑖]𝑇 , = [−1 + 5𝑖, −7 − 4𝑖, 6 + 9𝑖, −5 − 14𝑖, −15 + 10𝑖]𝑇 , = [3𝑖, 4 + 2𝑖, 5 + 2𝑖, 6 − 4𝑖, 7 − 2𝑖]𝑇 , = [−1 + 2𝑖, −3 − 6𝑖, 11 + 9𝑖, 1 − 18𝑖, −8 + 10𝑖]𝑇 , = [−6𝑖, −4𝑖, −2𝑖, 0, 2𝑖]𝑇 . Let 𝑆 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 , 𝑣#»5 } and 𝑉 = Span 𝑆. (a) Show that 𝑆 is linearly dependent. (b) Find a subset of 𝑆 that is a basis for 𝑉 . Solution: (a) Form matrix 𝐴, which has the vectors in 𝑆 as its columns. ⎡ ⎤ 1+𝑖 −1 + 5𝑖 3𝑖 −1 + 2𝑖 −6𝑖 ⎢ −2 + 𝑖 −7 − 4𝑖 4 + 2𝑖 −3 − 6𝑖 −4𝑖 ⎥ ⎢ ⎥ ⎢ 3 6 + 9𝑖 5 + 2𝑖 11 + 9𝑖 −2𝑖 ⎥ 𝐴=⎢ ⎥. ⎣ −4 − 𝑖 −5 − 14𝑖 6 − 4𝑖 1 − 18𝑖 0 ⎦ 5𝑖 −15 + 10𝑖 7 − 2𝑖 −8 + 10𝑖 2𝑖 204 Chapter 8 Subspaces and Bases The RREF of 𝐴 is ⎡ 1 2 + 3𝑖 ⎢0 0 ⎢ ⎢0 0 ⎢ ⎣0 0 0 0 0 1 0 0 0 ⎤ 0 −2 − 3𝑖 0 −1 ⎥ ⎥ ⎥. 1 1 ⎥ ⎦ 0 0 0 0 We find that 𝐴 has rank 3 with pivots in columns 1, 3 and 4. Since 3 < 5, we conclude that 𝑆 is linearly dependent (by Part (a) of Proposition 8.3.6 (Pivots and Linear Independence)). (b) Parts (b) and (d) of Proposition 8.3.6 (Pivots and Linear Independence) tell us that the vectors 𝑣#»1 , 𝑣#»3 , and 𝑣#»4 are linearly independent, and that Span{𝑣#»1 , 𝑣#»2 , 𝑣#»4 } = Span 𝑆 = 𝑉. We conclude that {𝑣#»1 , 𝑣#»2 , 𝑣#»4 } is a basis for 𝑉 . 8.4 Spanning Sets The problem we wish to tackle in this section is the following: given a subspace 𝑉 of F𝑛 , find a finite set 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } of vectors in 𝑉 such that 𝑉 = Span{𝑣#»1 , . . . , 𝑣#»𝑘 }. We begin by showing that this is always possible. Theorem 8.4.1 (Every Subspace Has a Spanning Set) Let 𝑉 be a subspace of F𝑛 . Then there exist vectors 𝑣#»1 , . . . , 𝑣#»𝑘 ∈ 𝑉 such that 𝑉 = Span{𝑣#»1 , . . . , 𝑣#»𝑘 }. #» #» #» Proof: If 𝑉 = { 0 }, then 𝑉 = Span{ 0 }. So we may assume that 𝑉 ̸= { 0 }. Therefore, there must be some nonzero vector #» 𝑣 1 in 𝑉 . If 𝑉 = Span{𝑣#»1 }, we are done. Otherwise, there exists a 𝑣#»2 ∈ 𝑉 that is not in Span{𝑣#»1 }. By Part (a) of Proposition 8.3.1 (Linear Dependence Check), the set {𝑣#», 𝑣#»} must be linearly 1 2 independent. If 𝑉 = Span{𝑣#»1 , 𝑣#»2 }, we are done. Otherwise, by continuously repeating this process, we generate a set {𝑣#»1 , . . . , 𝑣#»𝑘 } of 𝑘 linearly independent vectors in 𝑉 . 
By Corollary 8.3.7 (Bound on Number of Linearly Independent Vectors), we must have 𝑘 ≤ 𝑛. Thus, the process must terminate after at most 𝑛 steps, at which point we have 𝑉 = Span{𝑣#»1 , . . . , 𝑣#»𝑘 }. REMARK The above proof actually shows more than is stated in the Theorem. It shows that if 𝑉 is a nonzero subspace of F𝑛 , then we can find a linearly independent spanning set (that is, a basis!) for 𝑉 . Moreover, we can find a basis with at most 𝑛 elements. We will return to Section 8.4 Spanning Sets 205 these points in Theorem 8.5.1 (Every Subspace Has a Basis) and Proposition 8.7.5 (Bound on Dimension of Subspace). The proof of the previous Theorem can be refined into an algorithm that produces a spanning set for a given subspace 𝑉 of F𝑛 . We will not go into the full details of that here; instead, we will only consider one of the key issues that must be addressed in order to make this process practical. Namely, given a finite subset 𝑆 of 𝑉 , how can we determine if Span (𝑆) = 𝑉 ? Proposition 8.4.2 (Span of Subset) Let 𝑉 be a subspace of F𝑛 and let 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } ⊆ 𝑉 . Then Span (𝑆) ⊆ 𝑉 . Proof: This result follows immediately from the fact that 𝑉 is closed under addition and scalar multiplication. Thus, given a finite subset 𝑆 of 𝑉 , to show that Span 𝑆 = 𝑉 , we need only show that 𝑉 ⊆ Span (𝑆), since the other containment is always true by the previous Proposition. In order to do this, we need to show that given #» 𝑣 ∈ 𝑉 , there exist scalars 𝑐1 , . . . , 𝑐𝑘 ∈ F such that #» 𝑣 = 𝑐1 𝑣#»1 + · · · + 𝑐𝑘 𝑣#»𝑘 . [︀ ]︀ If we [︀let 𝐴 = 𝑣#»1 · ·]︀· 𝑣#»𝑘 be the coefficient matrix of this system of equations, and if we let 𝑣 be the augmented matrix of this system of equations, then we need to 𝐵 = 𝑣#»1 · · · 𝑣#»𝑘 | #» show that rank(𝐴) = rank(𝐵) for every #» 𝑣 ∈ 𝑉. This will depend on both 𝑆 and 𝑉 . Example 8.4.3 Consider the plane 𝑃 in R3 given by the scalar equation 2𝑥 − 6𝑦 + 4𝑧 = 0. Determine a basis for 𝑃 . Solution: If we let 𝑧 = 𝑡 and 𝑦 = 𝑠, where 𝑠, 𝑡 ∈ R, then 𝑥 = 3𝑠 − 2𝑡, and we can express the plane as ⎧⎡ ⎫ ⎧⎡ ⎤ ⎫ ⎤ ⎡ ⎤ −2 ⎨ 3𝑠 − 2𝑡 ⎬ ⎨ 3 ⎬ 𝑃 = ⎣ 𝑠 ⎦ : 𝑠, 𝑡 ∈ R = ⎣ 1 ⎦ 𝑠 + ⎣ 0 ⎦ 𝑡 : 𝑠, 𝑡 ∈ R . ⎩ ⎭ ⎩ ⎭ 𝑡 0 1 Thus, ⎧⎡ ⎤ ⎡ ⎤⎫ −2 ⎬ ⎨ 3 ⎣ ⎦ ⎣ 1 , 0 ⎦ . 𝑃 = Span ⎩ ⎭ 0 1 ⎡ ⎤ ⎡ ⎤ 3 −2 Since the vectors ⎣ 1 ⎦ and ⎣ 0 ⎦ are not multiples of each other, the set 0 1 ⎧⎡ ⎤ ⎡ ⎤⎫ −2 ⎬ ⎨ 3 𝑆 = ⎣1⎦ , ⎣ 0 ⎦ ⎩ ⎭ 0 1 is linearly independent, hence is a basis for 𝑃 . 206 Example 8.4.4 Chapter 8 Subspaces and Bases ⎧⎡ ⎤ ⎡ ⎤⎫ −4 ⎬ ⎨ 2 Continue with 𝑃 as above, and consider 𝑆1 = ⎣ 0 ⎦ , ⎣ 0 ⎦ . Is Span (𝑆1 ) = 𝑃 ? ⎭ ⎩ −1 2 (Note that 𝑆1 ⊆ 𝑃 .) ⎤ 3𝑠1 − 2𝑡1 ⎦ for some 𝑠1 , 𝑡1 ∈ R. Consider 𝑠1 Solution: Let #» 𝑣 ∈ 𝑃. Then we can write #» 𝑣 =⎣ 𝑡1 the system ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ 2 −4 3𝑠1 − 2𝑡1 #» ⎦ ⎣ ⎦ ⎣ ⎣ 𝑠1 = 𝑎 0 + 𝑏 0 ⎦ , where 𝑎, 𝑏 ∈ R. 𝑣 = −1 2 𝑡1 ⎡ The augmented matrix for this system is ⎡ ⎤ 2 −4 3𝑠1 − 2𝑡1 ⎣ 0 ⎦, 0 𝑠1 𝑡1 −1 2 which row reduces to ⎡ ⎤ 2 −4 3𝑠1 − 2𝑡1 ⎣ 0 0 ⎦. 𝑠1 0 0 0 The last column of the augmented matrix is a pivot column when 𝑠1 ̸= 0, and thus, we cannot always solve this system. It is also quite clear from the form of the two vectors in 𝑆1 , which both have a second component of zero, that we will be unable to build all the vectors in 𝑃 from them, as some of the vectors in 𝑃 have a nonzero second component. Thus Span 𝑆1 ̸= 𝑃 . Example 8.4.5 ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 3 ⎬ −4 ⎨ 2 Again use 𝑃 from above. Consider 𝑆2 = ⎣ 0 ⎦ , ⎣ 0 ⎦ , ⎣ 1 ⎦ . Is Span (𝑆2 ) = 𝑃 ? 
⎭ ⎩ −1 2 0 ⎡ ⎤ 3𝑠1 − 2𝑡1 ⎦ ∈ 𝑃 and consider the system 𝑠1 Solution: Let #» 𝑣 =⎣ 𝑡1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 3𝑠1 − 2𝑡1 2 −4 3 #» ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ 𝑠1 𝑣 = = 𝑎 0 + 𝑏 0 + 𝑐 1 ⎦ , where 𝑎, 𝑏, 𝑐 ∈ R. 𝑡1 −1 2 0 The augmented matrix for this system is ⎡ ⎤ 2 −4 3 3𝑠1 − 2𝑡1 ⎣ 0 ⎦, 0 1 𝑠1 −1 2 0 𝑡1 which row reduces to ⎡ ⎤ 1 −2 0 𝑡1 ⎣ 0 0 1 𝑠1 ⎦ . 0 0 0 0 Section 8.4 Spanning Sets 207 Notice that the rank of the coefficient matrix and the rank of the augmented matrix are both equal to 2, and thus, the system will be consistent for all #» 𝑣 ∈ 𝑃 . Consequently, Span 𝑆2 = 𝑃. In addition, we can solve the system to get 𝑐 = 𝑠1 , 𝑏 = 𝑢, and 𝑎 = 2𝑢 − 𝑡1 , for 𝑢 ∈ R. Note that the fact that there is a parameter 𝑢 in this solution, tells us that we have redundancy. Thus, the set 𝑆2 is not linearly independent. In the special case where 𝑉 = F𝑛 , we have the following useful criterion. Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛) [︀ ]︀ Let 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 and let 𝐴 = 𝑣#»1 · · · 𝑣#»𝑘 be the matrix whose columns are the vectors in 𝑆. Then Span (𝑆) = F𝑛 if and only if rank(𝐴) = 𝑛. Proof: We have Span (𝑆) = F𝑛 if and only if for every #» 𝑣 ∈ F𝑛 , the system of equations #» 𝑣 = 𝑐 𝑣#» + 𝑐 𝑣#» + · · · + 𝑐 𝑣#» 1 1 2 2 𝑘 𝑘 has a solution. [︀ ]︀ 𝑣 be the corresponding augmented The coefficient matrix of this system is 𝐴. Let 𝐵 = 𝐴 | #» matrix. Then the above system has a solution if and only if rank(𝐴) = rank(𝐵). We want this system to have a solution for every #» 𝑣 ∈ F𝑛 ; in particular, this holds for every standard basis vector. This happens if and only if the last column of 𝐵 is never a pivot column, which in turn happens if and only if every row of 𝐴 has a pivot in it. Since 𝐴 has 𝑛 rows, this last statement is equivalent to having rank(𝐴) = 𝑛. [︀ ]︀ Note that if rank 𝑣#»1 · · · 𝑣#»𝑘 = 𝑛, then 𝑘 ≥ 𝑛. Therefore, we need at least 𝑛 vectors in 𝑆 for Span (𝑆) to be equal to F𝑛 , which is very reasonable. Of course, the condition for 𝑘 ≥ 𝑛 is not sufficient on its own — the vectors in 𝑆 must also form a spanning set. Example 8.4.7 Determine which (if any) of the following sets span R4 . ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 2 2 ⎪ ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪ 1 −2 ⎥ , ⎢ ⎥ , ⎢4⎥ . (a) 𝑆1 = ⎢ ⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦⎪ ⎪ ⎪ ⎪ ⎭ ⎩ −3 8 1 ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 2 2 5 ⎪ ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪ 1 −2 4 ⎥,⎢ ⎥,⎢ ⎥,⎢ 3 ⎥ . (b) 𝑆2 = ⎢ ⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦ ⎣ 10 ⎦⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 1 −3 8 6 ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 2 2 5 3 ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪ ⎬ 1 −2 4 3 ⎥,⎢ ⎥,⎢ ⎥,⎢ ⎥,⎢ 5 ⎥ . (c) 𝑆3 = ⎢ ⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦ ⎣ 10 ⎦ ⎣ 6 ⎦⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 1 −3 8 6 10 208 Chapter 8 Subspaces and Bases Solution: (a) Span (𝑆1 ) ̸= R4 , since we need at least four vectors to span R4 . (b) There are at least four vectors in 𝑆2 , so perhaps it spans R4 . Let us place the vectors in a matrix 𝐴 and check if rank(𝐴) = 4. We have ⎡ ⎤ 1 2 2 5 ⎢ 1 −2 4 3 ⎥ ⎥ 𝐴=⎢ ⎣ 1 3 6 10 ⎦ 1 −3 8 6 with ⎡ 0 1 0 0 2 4 6 8 5 3 10 6 1 ⎢0 RREF(𝐴) = ⎢ ⎣0 0 0 0 1 0 ⎤ 1 1⎥ ⎥. 1⎦ 0 Thus rank(𝐴) = 3 ̸= 4. So Span (𝑆2 ) ̸= R4 . (c) We again have at least four vectors. Let ⎡ 1 2 ⎢ 1 −2 𝐴=⎢ ⎣1 3 1 −3 Then ⎡ 1 ⎢0 RREF(𝐴) = ⎢ ⎣0 0 0 1 0 0 ⎤ 3 5⎥ ⎥. 6⎦ 10 0 0 1 0 1 1 1 0 ⎤ 0 0⎥ ⎥. 0⎦ 1 Thus rank(𝐴) = 4, and so in this case Span (𝑆3 ) = R4 . 8.5 Basis In Definition 8.2.6, we defined a basis for a subspace 𝑉 of F𝑛 to be a subset ℬ ⊆ 𝑉 that is linearly independent and spans 𝑉 ; that is, Span (ℬ) = 𝑉 . In the previous two sections, we examined both aspects of this definition. In this section and the next, we will put everything together to showcase the utility of bases. 
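As a brief computational aside before we turn to bases (again a sketch of ours, not part of the notes), the spanning criterion of Proposition 8.4.6 is also a single rank computation. The snippet below applies it to the sets $S_2$ and $S_3$ of Example 8.4.7; the helper name spans_Fn is an invented label, and floating-point ranks should only be trusted for well-behaved integer examples such as these.

```python
import numpy as np

def spans_Fn(vectors):
    """Spanning test of Proposition 8.4.6: the vectors span F^n exactly when
    the matrix having them as its columns has rank n."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[0]

# The sets S2 and S3 of Example 8.4.7, as lists of column vectors in R^4.
S2 = [np.array([1, 1, 1, 1]),
      np.array([2, -2, 3, -3]),
      np.array([2, 4, 6, 8]),
      np.array([5, 3, 10, 6])]
S3 = S2 + [np.array([3, 5, 6, 10])]

print(spans_Fn(S2))   # False: the rank is 3 < 4
print(spans_Fn(S3))   # True:  the rank is 4
```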
We begin with the following result which we had already noted in passing. Theorem 8.5.1 (Every Subspace Has a Basis) Let 𝑉 be a subspace of F𝑛 . Then 𝑉 has a basis. Section 8.5 Basis 209 Proof: If 𝑉 is not the zero subspace, then the result is given by the proof of Theorem 8.4.1 (Every Subspace Has a Spanning Set). As discussed following Definition 8.2.6, }︁ {︁ #»immediately we know that ∅ is a basis of the trivial subspace 0 . Theorem 8.5.1 (Every Subspace Has a Basis), as it is stated, is merely an existence result. It does not supply us with a method for obtaining a basis for 𝑉 . (Although, as mentioned in the previous section, the proof of Theorem 8.4.1 (Every Subspace Has a Spanning Set) can be refined into an algorithm.) Nonetheless, in some cases it’s possible to explicitly exhibit examples of bases. Here is a special—but very important—example. Definition 8.5.2 Standard Basis for F𝑛 In F𝑛 , let #» 𝑒 𝑖 represent the vector whose 𝑖𝑡ℎ component is 1 with all other components 0. 𝑛 The set ℰ = {𝑒#»1 , 𝑒#»2 , ..., 𝑒#» 𝑛 } is called the standard basis for F . Using Proposition 8.3.6 (Pivots and Linear Independence), we can verify that the standard basis ℰ is linearly independent. It also spans F𝑛 . Indeed, if #» 𝑥 ∈ F𝑛 , then ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 𝑥1 1 0 0 ⎢ 𝑥2 ⎥ ⎢0⎥ ⎢1⎥ ⎢0⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ #» 𝑥 = ⎢ . ⎥ = 𝑥1 ⎢ . ⎥ + 𝑥2 ⎢ . ⎥ + · · · + 𝑥𝑛 ⎢ . ⎥ ⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦ 𝑥𝑛 0 0 1 = 𝑥1 𝑒#»1 + 𝑥2 𝑒#»2 + · · · 𝑥𝑛 𝑒#» 𝑛. Therefore, #» 𝑥 ∈ Span (ℰ). Thus, ℰ is a basis for F𝑛 . In general, determining if a set 𝑆 is a basis for F𝑛 is a straightforward matter, as the next two propositions show. Proposition 8.5.3 (Size of Basis for F𝑛 ) Let 𝑆 = {𝑣#»1 , . . . , 𝑣#»𝑘 } be a set of 𝑘 vectors in F𝑛 . If 𝑆 is a basis for F𝑛 , then 𝑘 = 𝑛. Proof: For a subset 𝑆 of F𝑛 to be a basis of F𝑛 , we must have that 1. Span (𝑆) = F𝑛 , which by Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛) means that 𝑘 ≥ 𝑛, and 2. 𝑆 is linearly independent, which by Corollary 8.3.7 (Bound on Number of Linearly Independent Vectors) means that 𝑘 ≤ 𝑛. We conclude that 𝑘 = 𝑛. Thus, if a subset 𝑆 of F𝑛 contains fewer than 𝑛 vectors, it does not contain enough vectors to span F𝑛 and is thus not a spanning set of F𝑛 . On the other hand, if 𝑆 contains more than 𝑛 vectors in F𝑛 , then it must be linearly dependent. If 𝑆 contains exactly 𝑛 vectors, then it may or may not be a basis for F𝑛 . We still have to check that 𝑆 is linearly independent and that it spans F𝑛 . In fact, as the next result shows, we only need to check one of these conditions! 210 Proposition 8.5.4 Chapter 8 Subspaces and Bases (𝑛 Vectors in F𝑛 Span iff Independent) 𝑛 Let 𝑆 = {𝑣#»1 , . . . , 𝑣#» 𝑛 } be a set of 𝑛 vectors in F . Then 𝑆 is linearly independent if and only if 𝑛 Span (𝑆) = F . ]︀ [︀ Proof: Let 𝐴 = 𝑣#»1 · · · 𝑣#» 𝑛 be the 𝑛 × 𝑛 matrix whose columns are the vectors in 𝑆. By Part (a) of Proposition 8.3.6 (Pivots and Linear Independence), 𝑆 is linearly independent if and only if rank(𝐴) = 𝑛. On the other hand, from Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛) we have that Span (𝑆) = F𝑛 if and only if rank(𝐴) = 𝑛. Combining both of these results completes the proof. Thus, when we are checking to see whether or not a subset 𝑆 of F𝑛 is a basis of F𝑛 , below are three options to check this: 1. Check 𝑆 for linear independence and check 𝑆 for spanning, or 2. Count the vectors in 𝑆 and, if 𝑆 contains exactly 𝑛 vectors, then check 𝑆 for spanning using Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛), or 3. 
Count the vectors in 𝑆 and, if 𝑆 contains exactly 𝑛 vectors, then check 𝑆 for linear independence using Proposition 8.3.6 (Pivots and Linear Independence). In practice, the last option above is usually the quickest. Example 8.5.5 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 −1 4 1 4 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ −3 1 3 2 2 ⎥ #» ⎢ ⎥ #» ⎢ ⎥ #» ⎢ ⎥ #» ⎢ ⎥ Let 𝑣#»1 = ⎢ ⎣ 3 ⎦ , 𝑣2 = ⎣ −3 ⎦ , 𝑣3 = ⎣ 2 ⎦ , 𝑣4 = ⎣ 1 ⎦ , 𝑣5 = ⎣ 2 ⎦ . −1 1 1 4 4 Which of the following subsets of R4 is a basis for R4 ? 𝑆1 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 } , 𝑆2 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»4 , 𝑣#»5 } , 𝑆3 = {𝑣#»1 , 𝑣#»2 , #» 𝑣 3 , 𝑣#»4 } , 𝑆4 = {𝑣#»1 , 𝑣#»2 , 𝑣#»3 , 𝑣#»5 } . Solution: Sets 𝑆1 and 𝑆2 fail immediately as they have the wrong number of vectors in them. Sets 𝑆3 and 𝑆4 have the correct number of vectors in them so we need to investigate them further. Let us check whether they are linear independent. The matrix whose columns are the vectors in 𝑆3 is ⎡ ⎤ 1 −1 4 1 ⎢2 2 3 1⎥ ⎥ 𝐴=⎢ ⎣ 3 −3 2 1 ⎦ , 4 4 11 Section 8.5 Basis 211 which row reduces to ⎡ 1 ⎢0 ⎢ ⎣0 0 0 1 0 0 ⎤ 0 15 0 0⎥ ⎥. 1 15 ⎦ 0 0 Thus, rank(𝐴) = 3. Since rank(𝐴) ̸= 4, 𝑆3 is not linearly independent. Consequently, it is not a basis either. For set 𝑆4 , we consider the matrix −1 2 −3 4 4 3 2 1 ⎡ 0 1 0 0 ⎤ 0 0⎥ ⎥. 0⎦ 1 1 ⎢2 𝐵=⎢ ⎣3 4 which row reduces to ⎤ 4 −3 ⎥ ⎥, 2 ⎦ −1 ⎡ 1 ⎢0 ⎢ ⎣0 0 0 0 1 0 Since rank(𝐵) = 4, it must be the case that 𝑆4 is linearly independent. Since 𝑆4 contains exactly four elements, it is a basis for R4 by Proposition 8.5.4 (𝑛 Vectors in F𝑛 Span iff Independent). REMARK There are two problems we might encounter when trying to obtain a basis for F𝑛 : (a) We might have a set of vectors 𝑆 ⊆ F𝑛 with the property that Span (𝑆) = F𝑛 , but the set contains more than 𝑛 vectors, which is too many to be a basis. In this case, 𝑆 will be linearly dependent. We may apply Proposition 8.3.6 (Pivots and Linear Independence) to produce a subset of 𝑆 that is linearly independent, but still spans F𝑛 . This subset will be a basis for F𝑛 . (b) We might have a set of vectors 𝑆 ⊆ F𝑛 that is linearly independent, but that contains fewer than 𝑛 vectors, which is too few to be a basis. In this case, Span (𝑆) ̸= F𝑛 . The problem here is to figure out which vectors to add to 𝑆 to make it span F𝑛 . One possible approach is to add all 𝑛 standard basis vectors to 𝑆, obtaining a larger set 𝑆 ′ . Then certainly Span (𝑆 ′ ) = F𝑛 , but now 𝑆 ′ is too large to be a basis. This brings us back to (a). We summarize observations stated in the above remark in Theorem 8.5.6. Theorem 8.5.6 (Basis From a Spanning Set or Linearly Independent Set) Let 𝑆 = { #» 𝑣 1 , . . . , #» 𝑣 𝑘 } be a subset of F𝑛 . 212 Chapter 8 Subspaces and Bases (a) If Span(𝑆) = F𝑛 , then there exists a subset ℬ of 𝑆 which is a basis for F𝑛 . (b) If Span(𝑆) ̸= F𝑛 and 𝑆 is linearly independent, then there exist vectors #» 𝑣 𝑘+1 , . . . , #» 𝑣𝑛 #» #» #» #» 𝑛 𝑛 in F such that ℬ = { 𝑣 1 , . . . , 𝑣 𝑘 , 𝑣 𝑘+1 , . . . , 𝑣 𝑛 } is a basis for F . Example 8.5.7 Find a basis ℬ of R3 that contains the set ⎧⎡ ⎤ ⎡ ⎤⎫ −3 ⎬ ⎨ 1 𝑆 = ⎣ 2 ⎦ , ⎣ −4 ⎦ . ⎭ ⎩ 3 −6 Solution: Notice that 𝑆 is linearly independent, since the two vectors it contains are not multiples of one another. We would like to extend 𝑆 to a basis for R3 . We begin by adding the standard basis vectors to the set to obtain the set ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 0 ⎬ 0 1 −3 ⎨ 1 𝑆 ′ = ⎣ 2 ⎦ , ⎣ −4 ⎦ , ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ . ⎭ ⎩ 1 0 0 −6 3 Now Span 𝑆 ′ = R3 and we would like to extract a linearly independent subset from 𝑆 ′ using the method of Proposition 8.3.6 (Pivots and Linear Independence). 
The matrix whose columns are the vectors in 𝑆 ′ is ⎡ ⎤ 1 −3 1 0 0 𝐴 = ⎣ 2 −4 0 1 0 ⎦ , 3 −6 0 0 1 which row reduces to ⎡ ⎤ 1 0 −2 0 1 ⎣ 0 1 −1 0 1 ⎦ . 3 0 0 0 1 − 23 There are pivots in columns 1, 2 and 4. Thus, columns 1, 2 and 4 of 𝐴 are a basis for R3 : ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ −3 0 ⎬ ⎨ 1 ℬ = ⎣ 2 ⎦ , ⎣ −4 ⎦ , ⎣ 1 ⎦ . ⎩ ⎭ 3 −6 0 In the above, we only considered bases for the whole space F𝑛 . In the next section we will discuss examples of bases for special subspaces of F𝑛 . In Chapter 9 we will look at bases for eigenspaces and we will discover a connection to the problem of diagonalizability that we had met in Section 7.6. 8.6 Bases for Col(𝐴) and Null(𝐴) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). In Propositions 8.1.3 and 8.1.2, we saw that Col(𝐴) is a subspace of F𝑚 and that Null(𝐴) is a subspace of F𝑛 . Section 8.6 Bases for Col(𝐴) and Null(𝐴) Proposition 8.6.1 213 (Basis for Col(𝐴)) [︀ ]︀ Let 𝐴 = 𝑎#»1 · · · 𝑎# » ∈ 𝑀𝑚×𝑛 (F) and suppose that RREF(𝐴) has pivots in columns 𝑛 𝑞1 , . . . , 𝑞𝑟 , where 𝑟 = rank(𝐴). Then {𝑎# 𝑞»1 , . . . , 𝑎# 𝑞»𝑟 } is a basis for Col(𝐴). Proof: This follows from Proposition 8.3.6 (Pivots and Linear Independence). REMARK It is important to note that the columns of 𝐴—and not those of RREF(𝐴)—are the elements of the basis given in the previous proposition. In general, the columns in RREF(𝐴) are not necessarily in Col(𝐴). ⎡ Example 8.6.2 1 ⎢ −2 Let 𝐴 = ⎢ ⎣ 2 3 3 −2 3 4 3 2 0 −1 2 −8 7 11 ⎤ −9 2 ⎥ ⎥. Find a basis for Col(𝐴). 1 ⎦ −8 Solution: Row reduce 𝐴 to ⎡ 1 ⎢0 RREF(𝐴) = ⎢ ⎣0 0 0 1 0 0 −3 2 0 0 5 −1 0 0 ⎤ 0 0⎥ ⎥. 1⎦ 0 Since there are pivots in columns 1, 2 and 5, we conclude that ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ −9 ⎪ 3 1 ⎪ ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ ⎥ ⎢ −2 ⎥ ⎢ −2 ⎥ ⎢ 2 ⎥ ⎥ ⎢ ℬ = ⎣ ⎦,⎣ ⎦,⎣ 1 ⎦⎪ 3 2 ⎪ ⎪ ⎪ ⎭ ⎩ −8 4 3 is a basis for Col(𝐴). Notice that in this example it would be incorrect to say that the set of pivot columns of RREF(𝐴) is a basis for Col(𝐴). Indeed, they all have 0 in the fourth component, so they cannot possibly span Col(𝐴). We can also show that ⎡ ⎤ 0 ⎢0⎥ ⎢ ⎥ ̸∈ Col(𝐴). ⎣1⎦ 0 Next we turn our attention to Null(𝐴). Recall that we can view this subspace as the solution #» set to the homogeneous system 𝐴 #» 𝑥 = 0 . If we re-examine the Gauss–Jordan Algorithm, we find that it provides us with a basis for Null(𝐴). Indeed, the solutions corresponding to the free parameters are by design a basis for Null(𝐴). 214 Chapter 8 Subspaces and Bases ⎡ Example 8.6.3 ⎤ 1 −2 −1 3 Let 𝐴 = ⎣ 2 −4 1 0 ⎦. Find a basis for Null(𝐴). 1 −2 2 −3 Solution: The matrix 𝐴 is the equations 𝑥1 2𝑥1 𝑥1 coefficient matrix of the homogeneous system of linear −2𝑥2 −𝑥3 +3𝑥4 = 0 −4𝑥2 +𝑥3 = 0 . −2𝑥2 +2𝑥3 −3𝑥4 = 0 Thus, Null(𝐴) is the solution set to this system. In Example 3.7.4, we applied the Gauss– Jordan Algorithm to find that the solution to this system is ⎫ ⎧ ⎡ ⎤ ⎫ ⎧⎡ ⎤ ⎡ ⎤ 2𝑠 − 𝑡 −1 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎬ ⎨ ⎢ ⎥ ⎬ ⎨⎢ ⎥ ⎢ 0 ⎥ 𝑠 1 ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ : 𝑠, 𝑡 ∈ F : 𝑠, 𝑡 ∈ F Null(𝐴) = ⎣ = + 𝑡 . 𝑠 ⎣ 2 ⎦ 2𝑡 ⎦ ⎪ ⎪ ⎪ ⎪ ⎣0⎦ ⎪ ⎪ ⎪ ⎭ ⎪ ⎭ ⎩ ⎩ 𝑡 1 0 Notice that the Gauss–Jordan Algorithm immediately supplies us with the spanning set ⎧⎡ ⎤ ⎡ ⎤⎫ −1 ⎪ 2 ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ ⎥⎪ 1⎥ ⎢ 0 ⎥ ⎢ ℬ = ⎣ ⎦,⎣ ⎦ . 2 ⎪ 0 ⎪ ⎪ ⎪ ⎭ ⎩ 1 0 Moreover, observe that this spanning set is linearly independent since the two vectors it contains are not multiples of each other. Thus, ℬ is a basis for Null(𝐴). Digging a bit deeper, the real reason the set ℬ in the previous example is linearly independent is because it contains the solution vectors that were constructed using the free parameters 𝑥2 and 𝑥4 . (Refer back to Example 3.7.4.) 
This forces the first vector in the spanning set to have a 1 in the second component and forces the second vector to have a 0 in the second component. (It also forces the second vector to have a 1 in the fourth component and the first vector to have a 0 in the fourth component.) ⎡ Example 8.6.4 ⎤ −1 −1 −2 3 1 Let 𝐴 = ⎣ −9 5 −4 −1 −5 ⎦. Find a basis for Null(𝐴). 7 −5 2 3 5 Solution: The matrix 𝐴 is the coefficient matrix of the homogeneous system of linear equations −𝑥1 −2𝑥2 −2𝑥3 +3𝑥4 +𝑥5 = 0 −9𝑥1 5𝑥2 −4𝑥3 −𝑥4 −5𝑥5 = 0 . 7𝑥1 −5𝑥2 +2𝑥3 +3𝑥4 +5𝑥5 = 0 Thus, Null(𝐴) is the solution set to this system. ⎡ 10 RREF(𝐴) = ⎣ 0 1 00 We have ⎤ 1 −1 0 1 −2 −1 ⎦ . 0 0 0 Section 8.6 Bases for Col(𝐴) and Null(𝐴) 215 There are pivots in columns 1 and 2, so we may take 𝑥3 = 𝑠, 𝑥4 = 𝑡 and 𝑥5 = 𝑟 as free parameters. Then 𝑥1 = −𝑥3 + 𝑥4 = −𝑠 + 𝑡 and 𝑥2 = −𝑥3 + 2𝑥4 + 𝑥5 = −𝑠 + 2𝑡 + 𝑟, and therefore, the solution set is given by ⎫ ⎧⎡ ⎫ ⎧ ⎡ ⎤ ⎡ ⎤ ⎤ ⎡ ⎤ 0 −𝑠 + 𝑡 1 −1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎢ −𝑠 + 2𝑡 + 𝑟 ⎥ ⎪ ⎪ ⎢1⎥ ⎪ ⎢2⎥ ⎢ −1 ⎥ ⎬ ⎨ ⎬ ⎨ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎥ ⎢ ⎢ ⎥ ⎢ ⎢ 𝑠 Null(𝐴) = ⎢ ⎥ : 𝑠, 𝑡, 𝑟 ∈ F⎪ = ⎪𝑠 ⎢ 1 ⎥ + 𝑡 ⎢ 0 ⎥ + 𝑟 ⎢ 0 ⎥ : 𝑠, 𝑡, 𝑟 ∈ F⎪ . ⎪ ⎪ ⎪ ⎪ ⎪ ⎦ ⎣0⎦ ⎣ ⎣1⎦ 𝑡 ⎪ ⎪ ⎪ ⎪ ⎣ 0 ⎦ ⎪ ⎪ ⎪ ⎭ ⎩ ⎭ ⎪ ⎩ 1 𝑟 0 0 Consequently, the set ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ −1 1 0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎥ ⎥ ⎢ ⎢ ⎢ ⎪ −1 2 ⎨⎢ ⎥ ⎢ ⎥ ⎢ 1 ⎥ ⎥⎬ ⎥ ⎥ ⎥ ⎢ ⎢ ⎢ ℬ = ⎢ 1 ⎥ , ⎢0⎥ , ⎢0⎥ ⎪ ⎪ ⎪ ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 0 0 1 is a spanning set for Null(𝐴). It is also linearly independent. Indeed, if some linear combination of the vectors in the set is zero, that is, ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 1 0 0 ⎢ −1 ⎥ ⎢2⎥ ⎢1⎥ ⎢0⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ + 𝑡 ⎢0⎥ + 𝑟 ⎢0⎥ = ⎢0⎥ , 1 𝑠⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ 0 ⎦ ⎣1⎦ ⎣0⎦ ⎣0⎦ 0 0 1 0 then ⎡ ⎤ ⎡ ⎤ −𝑠 + 𝑡 0 ⎢ −𝑠 + 2𝑡 + 𝑟 ⎥ ⎢ 0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ = ⎢0⎥ 𝑠 ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ ⎣0⎦ 𝑡 𝑟 0 and hence 𝑠 = 𝑡 = 𝑟 = 0 by equating the third, fourth and fifth entries. So the only possible such linear combination is the trivial linear combination. Therefore, ℬ is linearly independent. Notice that the third, fourth and fifth entries correspond to the free parameters. This is no coincidence. Indeed, in the set ℬ, each vector has a nonzero entry in the row corresponding to its free parameter while all other vectors in ℬ have a 0 in that same row. The results in the previous two examples may be generalized as follows. Proposition 8.6.5 (Basis for Null(𝐴)) #» Let 𝐴 ∈ 𝑀𝑚×𝑛 (F) and consider the homogeneous linear system 𝐴 #» 𝑥 = 0 . Suppose that, after applying the Gauss–Jordan Algorithm, we obtain 𝑘 free parameters so that the solution set to this system is given by Null(𝐴) = {𝑡1 𝑥#»1 + · · · + 𝑡𝑘 𝑥# »𝑘 : 𝑡1 , . . . , 𝑡𝑘 ∈ F}. 216 Chapter 8 Subspaces and Bases Here 𝑘 = nullity(𝐴) = 𝑛 − rank(𝐴) and the parameters 𝑡𝑖 and the vectors 𝑥#»𝑖 for 1 ≤ 𝑖 ≤ 𝑘 are obtained using the method outlined in Section 3.7. Then {𝑥#», . . . , 𝑥# »} is a basis for Null(𝐴). 1 𝑘 Proof: Since Null(𝐴) = {𝑡1 𝑥#»1 + · · · + 𝑡𝑘 𝑥# »𝑘 : 𝑡1 , . . . , 𝑡𝑘 ∈ F}, we see that ℬ = {𝑥#»1 , . . . , 𝑥# »𝑘 } is a spanning set. The fact that it is linearly independent follows from the way the solution vectors 𝑥#»𝑖 were constructed such that only 𝑥#»𝑖 has a nonzero entry in the row corresponding to the 𝑖𝑡ℎ free variable. So if there are scalars 𝑐𝑖 ∈ F so that #» 𝑐1 𝑥#»1 + · · · + 𝑐𝑘 𝑥# »𝑘 = 0 , then by comparing 𝑖𝑡ℎ coefficients of both sides of the equation, we get that 𝑐𝑖 = 0 for all 𝑖. Thus, ℬ is a basis for Null(𝐴), as claimed. 8.7 Dimension A nonzero subspace 𝑉 of F𝑛 will have infinitely many different bases. If ℬ = {𝑣#»1 , . . . , 𝑣#»𝑘 } is a basis for 𝑉 , then so is ℬ ′ = {𝑐𝑣#»1 , . . . , 𝑣#»𝑘 } for any nonzero scalar 𝑐 ∈ F. 
More drastically, two different bases can contain substantially different vectors, and not just ones that are scalar multiples of each other. Example 8.7.1 In Section 8.5, we saw two bases for R4 : the standard basis ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 0 ⎪ 0 0 1 ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪ 0 1 0 ⎥ , ⎢ ⎥ , ⎢ ⎥ , ⎢0⎥ , 𝒮 = {𝑒#»1 , 𝑒#»2 , 𝑒#»3 , 𝑒#»4 } = ⎢ ⎣ 0 ⎦ ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦⎪ ⎪ ⎪ ⎪ ⎭ ⎩ 1 0 0 0 and the basis ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 4 ⎪ 4 −1 1 ⎪ ⎪ ⎪ ⎬ ⎨⎢ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ 2 ⎥ ⎢ 2 ⎥ ⎢ 3 ⎥ ⎢ −3 ⎥ ⎥ ⎢ , , 𝑆4 = ⎣ ⎦ , ⎣ −3 ⎦ ⎣ 2 ⎦ ⎣ 2 ⎦⎪ 3 ⎪ ⎪ ⎪ ⎭ ⎩ −1 1 4 4 given in Example 8.5.5. From our work in Example 8.4.7(b), we can obtain yet another basis: ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 2 2 5 ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪ ⎬ −2 4 1 ⎥,⎢ ⎥,⎢ ⎥,⎢ 3 ⎥ . 𝑆2 = ⎢ ⎣ 1 ⎦ ⎣ 3 ⎦ ⎣ 6 ⎦ ⎣ 10 ⎦⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 1 −3 8 6 Although the three sets in the above example contain different vectors, they share one thing in common. They each contain precisely four vectors. This agrees with Proposition 8.5.3 (Size of Basis for F𝑛 ), which states that a basis for F𝑛 must contain precisely 𝑛 vectors. The same kind of result is true for a subspace 𝑉 of F𝑛 . Even though 𝑉 may have many different bases, any two bases will have the same number of elements. Section 8.7 Dimension Theorem 8.7.2 217 (Dimension is Well-Defined) # », . . . , 𝑤 # »} are bases for 𝑉 , then Let 𝑉 be a subspace of F𝑛 . If ℬ = {𝑣#»1 , . . . , 𝑣#»𝑘 } and 𝒞 = {𝑤 1 ℓ 𝑘 = ℓ. Proof: Since ℬ and 𝒞 are bases for 𝑉 , we have that 𝑉 = Span (ℬ) = Span (𝒞). This means, # » ∈ Span (ℬ). Thus there are scalars 𝑐 , . . . , 𝑐 such that in particular, that 𝑤 1 1 𝑘 # » = 𝑐 𝑣#» + 𝑐 𝑣#» + · · · + 𝑐 𝑣#». 𝑤 1 1 1 2 2 𝑘 𝑘 # » would be At least one of these scalars must be nonzero, for if they were all zero then 𝑤 1 #» equal to 0 , which would contradict the fact that 𝒞 is linearly independent (by Proposition 8.3.2(a)). We may assume, without loss of generality, that 𝑐1 ̸= 0. Then we can write 1 #» #» #» 𝑣#»1 = (𝑤 1 − 𝑐2 𝑣2 − · · · − 𝑐𝑘 𝑣𝑘 ). 𝑐1 # », 𝑣#», . . . , 𝑣#»}. Thus, we have effectively replaced This shows that Span{𝑣#»1 , . . . , 𝑣#»𝑘 } = Span{𝑤 1 2 𝑘 # ». By repeating this same argument with 𝑤 # », 𝑣#», . . . , 𝑣#» and 𝑤 # », we can show 𝑣#»1 with 𝑤 1 1 2 2 𝑘 #» #» that we can replace some other 𝑣𝑖 , say without loss of generality 𝑣2 , while still having # », 𝑤 # », 𝑣#», . . . , 𝑣#»}. We can continue this replacement procedure Span{𝑣#»1 , . . . , 𝑣#»𝑘 } = Span{𝑤 1 2 3 𝑘 # » for each 𝑤 in 𝒞. This is only possible if ℓ ≤ 𝑘. 𝑖 Using a similar argument where the roles of ℬ and 𝒞 are swapped, we must also have that 𝑘 ≤ ℓ. Thus, 𝑘 = ℓ. Definition 8.7.3 Dimension The number of elements in a basis for a subspace 𝑉 of F𝑛 is called the dimension of 𝑉 . We denote this number by dim(𝑉 ). Since every subspace 𝑉 of F𝑛 has a basis (by Theorem 8.5.1 (Every Subspace Has a Basis)), and since we just proved in Theorem 8.7.2 (Dimension is Well-Defined) that any two bases for 𝑉 have the same number of elements, we see that dim(𝑉 ) is well-defined (or unambiguous). We can evaluate dim(𝑉 ) by counting the number of vectors in any basis for 𝑉 . Example 8.7.4 𝑛 𝑛 The standard basis ℰ = {𝑒#»1 , . . . , 𝑒#» 𝑛 } of F consist of 𝑛 vectors. Thus, dim(F ) = 𝑛. This aligns with our intuition of F𝑛 being “𝑛-dimensional”. How large can the dimension of a subspace of F𝑛 be? Proposition 8.7.5 (Bound on Dimension of Subspace) Let 𝑉 be a subspace of F𝑛 . Then dim(𝑉 ) ≤ 𝑛. 
Proof: The proof of Theorem 8.4.1 (Every Subspace Has a Spanning Set) gave us that every subspace has a basis, and established that the basis generated in the proof was guaranteed to have at most 𝑛 elements. 218 Chapter 8 Subspaces and Bases EXERCISE Let 𝑉 be a subspace of F𝑛 . Show that 𝑉 = F𝑛 if and only if dim(𝑉 ) = 𝑛. Example 8.7.6 #» #» Let 𝐿 be the line through the origin in R𝑛 with direction vector 𝑑 ̸= 0 . That is, {︁ #»}︁ #» 𝐿 = {𝑡 𝑑 : 𝑡 ∈ R} = Span 𝑑 . {︁ #»}︁ #» Since 𝑑 ̸= 0, the set 𝑑 is linearly independent, and by the above it spans 𝐿. Thus, {︁ #»}︁ ℬ = 𝑑 is a basis for 𝐿. Consequently, dim(𝐿) = 1. That is, lines through the origin are 1-dimensional subspaces of R𝑛 . Example 8.7.7 Let 𝑃 be the plane in R3 given by the scalar equation 2𝑥 − 6𝑦 + 4𝑧 = 0. In Example 8.4.3, we saw that ⎧⎡ ⎤ ⎡ ⎤⎫ −2 ⎬ ⎨ 3 ⎦ ⎣ ⎣ 1 , 0 ⎦ 𝑆= ⎭ ⎩ 1 0 is a basis for 𝑃 . Thus, dim(𝑃 ) = 2. More generally, if 𝑃 is any plane through the origin in R3 , then we can show that any two linearly independent vectors in 𝑃 will form a basis for 𝑃 . Thus, dim(𝑃 ) = 2 in all cases. That is, planes through the origin are 2-dimensional subspaces of R𝑛 . The next two examples of dimension computations are important enough to warrant being put in a proposition. Proposition 8.7.8 (Rank and Nullity as Dimensions) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). Then (a) rank(𝐴) = dim(Col(𝐴)), and (b) nullity(𝐴) = dim(Null(𝐴)). Proof: (a) follows from Proposition 8.6.1 (Basis for Col(𝐴)). (b) follows from Proposition 8.6.5 (Basis for Null(𝐴)). Using the definition of nullity (Definition 3.6.8), we arrive at the following important result. Section 8.8 Coordinates Theorem 8.7.9 219 (Rank–Nullity Theorem) Let 𝐴 ∈ 𝑀𝑚×𝑛 (F). Then 𝑛 = rank(𝐴) + nullity(𝐴) = dim(Col(𝐴)) + dim(Null(𝐴)). Proof: This result follows immediately from the fact that nullity(𝐴) = 𝑛 − rank(𝐴), together with Proposition 8.7.8 (Rank and Nullity as Dimensions). This relationship between rank and nullity is one of the central results of linear algebra. Although the above proof seems short, it contains a significant amount of content. 8.8 Coordinates In this section, we discuss one of the most important results about a basis. This result will naturally lead us to the idea of coordinates with respect to bases. Theorem 8.8.1 (Unique Representation Theorem) #» 𝑛 𝑛 Let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } be a basis for F . Then, for every vector 𝑣 ∈ F , there exist unique scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 ∈ F such that #» 𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#» 𝑛. Proof: First we prove that such scalars exist. Since ℬ is a basis for F𝑛 , Span (ℬ) = F𝑛 , and so each vector #» 𝑣 can be written as a linear combination of elements of ℬ, that is, there exist scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 ∈ F such that #» 𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#» 𝑛. Next, we show that the representation is unique. Suppose that we can also express #» 𝑣 as #» 𝑣 = 𝑑1 𝑣#»1 + 𝑑2 𝑣#»2 + · · · + 𝑑𝑛 𝑣#» 𝑛 , for some scalars 𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ∈ F. If we subtract these two expressions for #» 𝑣 , then we have #» #» #» #» #» #» 0 = 𝑣 − 𝑣 = (𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#» 𝑛 ) − (𝑑1 𝑣1 + 𝑑2 𝑣2 + · · · + 𝑑𝑛 𝑣𝑛 ) = (𝑐 − 𝑑 )𝑣#» + (𝑐 − 𝑑 )𝑣#» + · · · + (𝑐 − 𝑑 )𝑣#». 1 1 1 2 2 2 𝑛 𝑛 𝑛 Since ℬ is linearly independent, we have that (𝑐1 − 𝑑1 ) = 0, (𝑐2 − 𝑑2 ) = 0, . . . , (𝑐𝑛 − 𝑑𝑛 ) = 0. That is, 𝑐1 = 𝑑1 , 𝑐2 = 𝑑2 , . . . , 𝑐𝑛 = 𝑑𝑛 , and thus, the representation for #» 𝑣 is unique. ⎤ 𝑥1 #» ⎢ . ⎥ 𝑛 𝑛 Let ℰ = {𝑒#»1 , . . . , 𝑒#» 𝑛 } be the standard basis for F . Given 𝑥 = ⎣ .. ⎦ ∈ F , we have 𝑥𝑛 ⎡ Example 8.8.2 #» 𝑥 = 𝑥1 𝑒#»1 + · · · + 𝑥𝑛 𝑒#» 𝑛. 
220 Chapter 8 Subspaces and Bases Thus, the unique scalars in the representation of #» 𝑥 in terms of the standard basis are the coordinates (or components) of #» 𝑥. The previous Example motivates our next definition. Definition 8.8.3 #» 𝑛 𝑛 Let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } be a basis for F . Let the vector 𝑣 ∈ F have representation Coordinates and Components #» 𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#» 𝑛 (𝑐𝑖 ∈ F). We call the scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 the coordinates (or components) of #» 𝑣 with respect #» to ℬ, or the ℬ-coordinates of 𝑣 . ⎡ ⎤ 𝑐1 ⎢ ⎥ We would like to use this definition to create a column vector ⎣ ... ⎦ using the coordinates 𝑐𝑛 𝑐1 , . . . , 𝑐𝑛 of #» 𝑣 with respect to ℬ. However, the ordering of the vectors in a basis (that is, the assignment of labels 1, 2, ..., 𝑛 for 𝑣#»1 , 𝑣#»2 , ..., 𝑣#» 𝑛 ) is arbitrary, and permutations of the ordering result in the same basis. Unfortunately, each of these permutations can result in different representations of the vector under ℬ, which could be a source of confusion. The next definition helps us address this subtlety. Definition 8.8.4 Ordered Basis 𝑛 An ordered basis for F𝑛 is a basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } for F together with a fixed ordering. REMARK #» When we refer to the set {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } as being ordered, we are indicating that 𝑣1 is the #» first element in the ordering, that 𝑣2 is the second, and so on. Thus even though {𝑣#», 𝑣#»} and {𝑣#», 𝑣#»} are the same set, they are different from the point 1 2 2 1 of view of orderings. A given basis {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } gives rise to 𝑛! ordered bases, one for each possible ordering (permutation) of the entries. Example 8.8.5 {︂[︂ ]︂ [︂ ]︂}︂ 1 5 The set ℬ = , is a basis for R2 . The two vectors in ℬ are not multiples of each 2 6 other, so ℬ is linearly independent. Since ℬ contains precisely two vectors, it must also span R2 , by Proposition 8.5.4 (𝑛 Vectors in F𝑛 Span iff Independent). Hence ℬ is a basis for R2 . The basis ℬ gives rise to two different ordered bases: {︂[︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂ [︂ ]︂}︂ 1 5 5 1 , and , . 2 6 6 2 Section 8.8 Coordinates Definition 8.8.6 Coordinate Vector 221 #» 𝑛 𝑛 Let ℬ = {𝑣#»1 , . . . , 𝑣#» 𝑛 } be an ordered basis for F . Let 𝑣 ∈ F have coordinates 𝑐1 , . . . , 𝑐𝑛 with respect to ℬ, where the ordering of the scalars 𝑐𝑖 matches the ordering in ℬ, that is, #» 𝑣 = 𝑛 ∑︁ 𝑐𝑖 𝑣#»𝑖 . 𝑖=1 𝑣 with respect to ℬ (or the ℬ-coordinate vector of Then the coordinate vector of #» #» 𝑛 𝑣 ) is the column vector in F ⎡ ⎤ 𝑐1 ⎢ 𝑐2 ⎥ ⎢ ⎥ [ #» 𝑣 ]ℬ = ⎢ . ⎥ . ⎣ .. ⎦ 𝑐𝑛 If we choose a different ordering on the basis ℬ then the entries of [ #» 𝑣 ]ℬ will be permuted accordingly. Example 8.8.7 Consider the ordered basis ℬ = have [︂ ]︂ {︂[︂ ]︂ [︂ ]︂}︂ 7 5 1 . By inspection, we for R2 and let #» 𝑣 = , 10 6 2 [︂ ]︂ [︂ ]︂ 5 1 #» . + (1) 𝑣 = (2) 6 2 So the ℬ-coordinates of #» 𝑣 are 2 and 1. Therefore, [︂ ]︂ 2 #» . [ 𝑣 ]ℬ = 1 {︂[︂ ]︂ [︂ ]︂}︂ 1 5 , we would have , If we instead use the ordered basis 𝒞 = 2 6 [︂ ]︂ 1 #» [ 𝑣 ]𝒞 = . 2 In using coordinates, it is therefore important to indicate clearly which ordered basis you are using. REMARK (Standard Ordering) #» 𝑛 Let ℰ = {𝑒#»1 , . . . , 𝑒#» 𝑛 } be the standard basis for F , ordered so that 𝑒𝑖 is the 𝑖th vector. We call this the standard ordering. Unless explicitly stated otherwise, we always assume that ℰ is given the standard ordering. In the early part of the course, we had been implicitly using coordinate vectors with respect 222 Chapter 8 Subspaces and Bases to ℰ. 
Indeed, whenever we had written ⎤ 𝑣1 ⎢ 𝑣2 ⎥ ⎢ ⎥ #» 𝑣 =⎢ . ⎥ ⎣ .. ⎦ ⎡ 𝑣𝑛 we were implicitly writing down [ #» 𝑣 ]ℰ . Moving forwards, when we attempt to describe some vector in F𝑛 , we still have to represent it somehow. Unless we have some other explicit arrangement, we will do this as above, i.e., by making use of the standard basis (with its standard ordering). We will often suppress the notation [ #» 𝑣 ]ℰ when doing so, and instead simply write #» 𝑣 as we have been doing so far. REMARK The notation of [ #» 𝑣 ]ℰ (and more generally [ #» 𝑣 ]ℬ ) is reminiscent of the notation [𝑇 ]ℰ for the standard matrix representation of the linear transformation 𝑇 . In both cases, the notation conveys the idea that we are taking an object (a vector or linear transformation) and representing it as an array of numbers (a column vector or matrix) by using a basis to obtain this representation. There is another connection between these two pieces of notation, as we will see later. #» #» 𝑛 Note that for an ordered basis ℬ = {𝑣#»1 , . . . , 𝑣#» 𝑛 } of F , the relationship between 𝑣 and [ 𝑣 ]ℬ is a two-way relationship: ⎡ ⎤ 𝑐1 ⎢ .. ⎥ #» #» # » #» 𝑣 = 𝑐1 𝑣1 + · · · + 𝑐𝑛 𝑣𝑛 ⇐⇒ [ 𝑣 ]ℬ = ⎣ . ⎦ . 𝑐𝑛 Thus, if we have #» 𝑣 , we can obtain [ #» 𝑣 ]ℬ , and vice versa. We leave the proof of the following result as an exercise. Theorem 8.8.8 (Linearity of Taking Coordinates) 𝑛 Let ℬ = {𝑣#»1 , . . . , 𝑣#» 𝑛 } be an ordered basis for F . Then the function [ #» #» by sending 𝑣 to [ 𝑣 ]ℬ is linear : ]ℬ : F𝑛 → F𝑛 defined (a) For all #» 𝑢 , #» 𝑣 ∈ F𝑛 , [ #» 𝑢 + #» 𝑣 ]ℬ = [ #» 𝑢 ]ℬ + [ #» 𝑣 ]ℬ . (b) For all #» 𝑣 ∈ F𝑛 and 𝑐 ∈ F, [𝑐 #» 𝑣 ]ℬ = 𝑐[ #» 𝑣 ]ℬ . We now provide some examples of computing coordinates with respect to bases for F𝑛 . Section 8.8 Coordinates Example 8.8.9 223 {︂[︂ ]︂ [︂ ]︂}︂ [︂ ]︂ 1 5 3 #» 2 Consider the ordered basis ℬ = , for R . Let 𝑣 = . (By this we mean [ #» 𝑣 ]ℰ , 2 6 4 as explained at the end of the above remark on the standard ordering.) (a) Find [ #» 𝑣 ]ℬ . [︂ ]︂ −3 #» #» (i.e., find [ 𝑤] #» ). (b) If [ 𝑤]ℬ = , find 𝑤 ℰ 2 Solution: (a) To find [ #» 𝑣 ]ℬ , we need to find the coordinates of #» 𝑣 with respect to ℬ. Thus, we want to find 𝑎, 𝑏 ∈ F so that [︂ ]︂ [︂ ]︂ [︂ ]︂ 1 5 3 𝑎 +𝑏 = . 2 6 4 This leads us to the system [︂ 15 26 ]︂ [︂ ]︂ [︂ ]︂ 𝑎 3 = 𝑏 4 with augmented matrix [︂ which reduces to Thus, 𝑎 = 1 2 [︃ 1 5 3 2 6 4 ]︂ 1 2 1 2 ]︃ 1 0 0 1 , . and 𝑏 = 12 . These are the coordinates of #» 𝑣 with respect of ℬ. Therefore, [︃ ]︃ [︃ ]︃ 1 3 = 21 . 4 2 ℬ [︂ ]︂ −3 #» . This means that (b) We are given that [ 𝑤]ℬ = 2 [︂ ]︂ [︂ ]︂ [︂ ]︂ 1 5 7 #» 𝑤 = (−3) +2 = . 2 6 6 That is, [︂ ]︂ #» = 𝑤 #» = 7 . [ 𝑤] ℰ 6 Example 8.8.10 ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ 1 1 ⎬ −3 ⎨ 1 #» ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ 1 , 0 , −2 Let ℬ = and let 𝑣 = 2 ⎦. ⎩ ⎭ 1 3 4 −6 (a) Show that ℬ is a basis for R3 . 224 Chapter 8 Subspaces and Bases (b) Viewing ℬ as an ordered basis in the given order, find [ #» 𝑣 ]ℬ . ⎡ ⎤ 3 #» = ⎣ −4 ⎦, find [ 𝑤] #» . (c) If [ 𝑤] ℬ ℰ 5 Solution: (a) By Proposition 8.5.4 (𝑛 Vectors in F𝑛 Span iff Independent), since we have three vectors in ℬ, it suffices to check that they span R3 . By Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛), this can be done by evaluating the rank of ⎡ ⎤ 11 1 𝐴 = ⎣ 1 0 −2 ⎦ . 13 4 ⎡ ⎤ 111 Row reduction yields the matrix 𝑅 = ⎣ 0 1 3 ⎦ , which has rank 3. Thus, we have a 003 basis. (b) We need to find 𝑎, 𝑏, 𝑐 ∈ R so that ⎤ ⎡ ⎤ ⎡ ⎡ ⎤ ⎡ ⎤ −3 1 1 1 ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎣ 𝑎 1 + 𝑏 0 + 𝑐 −2 = 2 ⎦ . 
−6 4 3 1 Consider the system ⎡ ⎤⎡ ⎤ ⎡ ⎤ 11 1 𝑎 −3 ⎣ 1 0 −2 ⎦ ⎣ 𝑏 ⎦ = ⎣ 2 ⎦ , 13 4 𝑐 −6 which has augmented matrix ⎡ ⎤ 1 1 1 −3 ⎣ 1 0 −2 2 ⎦ , 1 3 4 −6 which reduces to ⎡ 1 0 0 ⎢ ⎣ 0 1 0 0 0 1 −8 3 ⎤ ⎥ 2 ⎦. −7 3 Thus, 𝑎 = − 83 , 𝑏 = 2 and 𝑐 = − 73 , and we have ⎡ ⎤ − 83 ⎢ ⎥ [ #» 𝑣 ]ℬ = ⎣ 2 ⎦ . − 37 Section 8.8 Coordinates 225 (c) If ⎡ ⎤ 3 #» = ⎣ −4 ⎦ , [ 𝑤] ℬ 5 then ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 1 4 #» = (3) ⎣ 1 ⎦ + (−4) ⎣ 0 ⎦ + (5) ⎣ −2 ⎦ = ⎣ −7 ⎦ , 𝑤 1 3 4 11 that is, ⎡ ⎤ 4 #» = 𝑤 #» = ⎣ −7 ⎦ . [ 𝑤] ℰ 11 In the previous two examples, we saw that in order to obtain the coordinate vector representation [ #» 𝑣 ]ℬ of a given vector #» 𝑣 , we must solve a certain system of equations. Must we do this each time we want to find coordinate vectors with respect to ℬ? The answer is “no”; there is a more efficient approach. Example 8.8.11 Let us first consider the final calculation of Example 8.8.9 (b), where we had [︂ ]︂ [︂ ]︂ #» = 𝑤 #» = 7 . #» = −3 , and then [ 𝑤] [ 𝑤] ℬ ℰ 6 2 This was completed by performing the calculation [︂ ]︂ [︂ ]︂ 5 1 #» = 𝑤. +2 (−3) 6 2 The system above can be expressed as [︂ 15 26 ]︂ [︂ ]︂ −3 . 2 [︂ ]︂ 15 Thus, there is a matrix at work here, namely 𝑀 = , such that multiplication by 𝑀 26 allows us to go from the new coordinate system (with respect to the basis ℬ) to the standard system (basis ℰ). [︂ ]︂ 4 For instance, if we are given a vector #» 𝑧 with [ #» 𝑧 ]ℬ = , then −3 [︂ ]︂ [︂ ]︂ [︂ ]︂ 15 4 −11 #» #» [ 𝑧 ]ℰ = 𝑀 [ 𝑧 ]ℬ = = . 2 6 −3 −10 Notice that 𝑀 is very special. Its columns are the coordinate vectors of ℬ expressed in the standard basis. 226 Chapter 8 Subspaces and Bases What about going the other way, [︂ from ]︂ ℰ to ℬ? Let us re-examine what we did in Example 3 8.8.9 (a). We wanted to obtain , which we found amounted to solving the system 4 ℬ [︂ 15 26 ]︂ [︂ ]︂ [︂ ]︂ 𝑎 3 = . 𝑏 4 The coefficient matrix here is the same 𝑀 above. It is invertible! (This is a general phenomenon, as we will soon see.) Thus, [︂ ]︂ [︂ ]︂−1 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︃ 1 ]︃ 1 6 −5 3 𝑎 15 3 2 = =− = 1 . 𝑏 26 4 4 4 −2 1 2 From this, we get ⎡1⎤ [︂ ]︂ 2 3 = ⎣ 1 ⎦. 4 ℬ 2 ]︂ 6 −5 . It will change −2 1 coordinates from the standard system (basis ℰ) to the new system (basis ℬ), and it will do this for any vector whose components we already have in ℰ. [︂ ]︂ −2 , then For instance, given a vector #» 𝑢 with [ #» 𝑢 ]ℰ = 5 Once again there is a matrix at work here, namely 𝑀 −1 = − 14 [︂ ]︂ [︂ ]︂ ]︂ [︂ [︂ 1 6 −5 −2 1 37 −1 #» #» [ 𝑢 ]ℬ = 𝑀 [ 𝑢 ]ℰ = − = . 5 4 −2 1 4 −9 To obtain 𝑀 −1 , we invert the matrix whose columns are the coordinates of the new basis vector in ℬ expressed in the standard basis. With some work, you can show that the columns of 𝑀 −1 are in fact the coordinate vectors of the standard basis vectors with respect to ℬ. The idea of using a matrix like 𝑀 above to change coordinates from one basis to another is in fact always possible, as we now explain. Definition 8.8.12 Change-of-Basis Matrix, Change-ofCoordinate Matrix #» #» 𝑛 Let ℬ = {𝑣#»1 , . . . , 𝑣#» 𝑛 } and 𝒞 = {𝑤1 , . . . , 𝑤𝑛 } be ordered bases for F . The change-of-basis (or change-of-coordinates) matrix from ℬ-coordinates to 𝒞coordinates is the 𝑛 × 𝑛 matrix [︀ ]︀ [𝐼] = [𝑣#»] . . . [𝑣#»] 𝒞 ℬ 1 𝒞 𝑛 𝒞 whose columns are the 𝒞-coordinates of the vectors 𝑣#»𝑖 in ℬ. Similarly, the change-of-basis (or change-of-coordinates) matrix from 𝒞-coordinates to ℬ-coordinates is the 𝑛 × 𝑛 matrix [︀ # » # »] ]︀ [𝐼] = [𝑤 ] . . . [𝑤 ℬ 𝒞 1 ℬ 𝑛 ℬ #» in 𝒞. 
whose columns are the ℬ-coordinates of the vectors 𝑤 𝑖 Section 8.8 Coordinates 227 REMARKS • The reason for the notation will become apparent later. The resemblance to the notation [𝑇 ]ℰ for the standard matrix of the linear transformation 𝑇 is intentional. • Other sources may notate 𝒞 [𝐼]ℬ as 𝒞 𝑃ℬ or 𝑃𝒞←ℬ . Example 8.8.13 The matrix 𝑀 in Example 8.8.9 is none other than ℰ [𝐼]ℬ , while the matrix 𝑀 −1 is ℬ [𝐼]ℰ . (This is not a coincidence. See Corollary 8.8.16 (Inverse of Change-of-Basis Matrix).) The role of the change-of-basis matrix is exactly as its name suggests. It changes coordinates from one basis to another. Proposition 8.8.14 (Changing a Basis) #» #» 𝑛 Let ℬ = {𝑣#»1 , . . . , 𝑣#» 𝑛 } and 𝒞 = {𝑤1 , . . . , 𝑤𝑛 } be ordered bases for F . Then [ #» 𝑥 ] = [𝐼] [ #» 𝑥 ] and [ #» 𝑥 ] = [𝐼] [ #» 𝑥 ] for all #» 𝑥 ∈ F𝑛 . 𝒞 𝒞 ℬ ℬ ℬ ℬ 𝒞 𝒞 ⎡ ⎤ 𝑎1 ⎢ ⎥ Proof: Suppose that [ #» 𝑥 ]ℬ = ⎣ ... ⎦ . Then #» 𝑥 = 𝑎1 𝑣#»1 + · · · + 𝑎𝑛 𝑣#» 𝑛 . As the coordinate vector 𝑎𝑛 map is linear by Theorem 8.8.8 (Linearity of Taking Coordinates), we have [ #» 𝑥 ]𝒞 = [𝑎1 𝑣#»1 + · · · + 𝑎𝑛 𝑣#» 𝑛 ]𝒞 #» = 𝑎 [𝑣 ] + · · · + 𝑎 [𝑣#»] 1 1 𝒞 𝑛 𝑛 𝒞 ⎡ ⎤ 𝑎1 ⎥ [︀ ]︀ ⎢ ⎢ 𝑎2 ⎥ = [𝑣#»1 ]𝒞 [𝑣#»2 ]𝒞 · · · [𝑣#» ] 𝑛 𝒞 ⎢ .. ⎥ ⎣ . ⎦ = 𝒞 [𝐼]ℬ [ #» 𝑥 ]ℬ . 𝑎𝑛 The proof going from 𝒞 to ℬ is identical, with 𝒞 switched with ℬ. ⎡ Corollary 8.8.15 ⎤ 𝑥1 ⎢ ⎥ Let #» 𝑥 = [ #» 𝑥 ]ℰ = ⎣ ... ⎦ be a vector in F𝑛 , where ℰ is the standard basis for F𝑛 . If 𝒞 is any 𝑥𝑛 ordered basis for F𝑛 , then [ #» 𝑥 ]𝒞 = 𝒞 [𝐼]ℰ [ #» 𝑥 ]ℰ . Proof: Take ℬ = ℰ in Proposition 8.8.14 (Changing a Basis). 228 Chapter 8 Subspaces and Bases The matrix ℰ [𝐼]𝒞 is relatively easily obtained by simply inserting the standard coordinates of the vectors in 𝒞 into the columns of a matrix. However, we usually want to go the other way (as in the above Corollary), and thus, we would like to have 𝒞 [𝐼]ℰ . The next result tells us that once we have one of these two matrices, then we can obtain the other rather quickly as they are inverses of each other. Corollary 8.8.16 (Inverse of Change-of-Basis Matrix) Let ℬ and 𝒞 be two ordered bases of F𝑛 . Then ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ = 𝐼𝑛 and 𝒞 [𝐼]ℬ ℬ [𝐼]𝒞 = 𝐼𝑛 . In other words, ℬ [𝐼]𝒞 = ( 𝒞 [𝐼]ℬ )−1 . Proof: Suppose we change coordinates twice. We start using basis ℬ, then change the basis to 𝒞, and then back to ℬ. We then have, for any #» 𝑥 ∈ F𝑛 , [ #» 𝑥 ]𝒞 = 𝒞 [𝐼]ℬ [ #» 𝑥 ]ℬ and therefore, [ #» 𝑥 ]ℬ = ℬ [𝐼]𝒞 [ #» 𝑥 ]𝒞 = ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ [ #» 𝑥 ]ℬ . We can write this as (︀ )︀ #» 𝐼𝑛 − ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ [ #» 𝑥 ]ℬ = 0 , for all #» 𝑥 ∈ F𝑛 . Thus, by Theorem 4.2.3 (Equality of Matrices), we conclude that (𝐼𝑛 −ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ ) = 𝒪, where 𝒪 is the 𝑛 × 𝑛 zero matrix, and therefore ℬ [𝐼]𝒞 𝒞 [𝐼]ℬ = 𝐼𝑛 . So ℬ [𝐼]𝒞 and 𝒞 [𝐼]ℬ are inverses of each other. Example 8.8.17 If we revisit Example 8.8.10 with this new notation and information, we find that ⎡ ⎤ 11 1 ℰ [𝐼]ℬ = ⎣ 1 0 −2 ⎦ . 13 4 The inverse of this matrix is and so ⎡ ⎡ ⎤ 6 −1 −2 1⎣ −6 3 3 ⎦ ℬ [𝐼]ℰ = 3 3 −2 −1 ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤ −3 6 −1 −2 −3 −8 1 1 ⎣ 2 ⎦ = ⎣ −6 3 3 ⎦ ⎣ 2 ⎦ = ⎣ 6 ⎦ , 3 3 −6 ℬ 3 −2 −1 −6 −7 just as we had computed. Section 8.8 Coordinates 229 ⎡ ⎤ 3 Furthermore, if we now want, for example, ⎣ 5 ⎦ , then we can quickly find it: 9 ℬ ⎡ ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤ 3 6 −1 −2 3 −5 1 1 ⎣ 5 ⎦ = ⎣ −6 3 3 ⎦ ⎣ 5 ⎦ = ⎣ 24 ⎦ . 3 3 9 ℬ 3 −2 −1 9 −10 Example 8.8.18 #» #» 2 Suppose [︂ we are ]︂ in C and [︂ we ]︂wish to work with the ordered basis 𝒞 = {𝑣1 , 𝑣2 } , where 1 + 2𝑖 1+𝑖 𝑣#»1 = and 𝑣#»2 = . −1 + 𝑖 𝑖 [︂ ]︂ 10 + 𝑖 #» (a) Let 𝑧 = . Find [ #» 𝑧 ]𝒞 . 2 + 6𝑖 #» ∈ C2 with [ 𝑤] #» = (b) Consider the vector 𝑤 𝒞 [︂ ]︂ 2 #» . . 
Find [ 𝑤] ℰ 3𝑖 Solution: Note that, as usual, all the vectors have been given with respect to the standard basis. We have [︂ 1 + 2𝑖 1 + 𝑖 ℰ [𝐼]𝒞 = −1 + 𝑖 𝑖 ]︂ and therefore, we can compute [︂ 1 + 2𝑖 1 + 𝑖 𝒞 [𝐼]ℰ = −1 + 𝑖 𝑖 ]︂−1 [︂ ]︂ [︂ ]︂ [︂ ]︂ 1 𝑖 −1 − 𝑖 𝑖 −1 − 𝑖 1 −1 + 𝑖 = = −𝑖 = . 1 − 𝑖 1 + 2𝑖 −1 − 𝑖 2 − 𝑖 𝑖 1 − 𝑖 1 + 2𝑖 (a) We can obtain [ #» 𝑧 ]𝒞 by [ #» 𝑧 ]𝒞 = 𝒞 [𝐼]ℰ [ #» 𝑧 ]ℰ = [︂ 1 −1 + 𝑖 −1 − 𝑖 2 − 𝑖 ]︂ [︂ ]︂ [︂ ]︂ 10 + 𝑖 2 − 3𝑖 = . 2 + 6𝑖 1−𝑖 [︂ ]︂ [︂ ]︂ [︂ ]︂ 2 −1 + 7𝑖 = . 3𝑖 −5 + 2𝑖 #» by (b) We can obtain [ 𝑤] ℰ #» = [𝐼] [ 𝑤] #» = [ 𝑤] ℰ ℰ 𝒞 𝒞 1 + 2𝑖 1 + 𝑖 −1 + 𝑖 𝑖 Chapter 9 Diagonalization 9.1 Matrix Representation of a Linear Operator In this chapter we will focus on linear transformations 𝑇 : F𝑛 → F𝑛 whose domain and codomain are the same set. Among other notable properties, these transformations will have the ability to be composed with themselves; that is, functions such as 𝑇 ∘ 𝑇 will be well-defined when the domain and codomain of 𝑇 are the same. Definition 9.1.1 A linear transformation 𝑇 : F𝑛 → F𝑚 where 𝑛 = 𝑚 is called a linear operator. Linear Operator In Section 5.5, we learned how to find the standard matrix [𝑇 ]ℰ of a linear operator 𝑇 . In this section we’ll learn how to find matrix representations of a linear operator 𝑇 with respect to bases for F𝑛 other than the standard basis. Definition 9.1.2 ℬ-Matrix of T Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } be an ordered basis for F𝑛 . We define the ℬ-matrix of 𝑇 to be the matrix [𝑇 ]ℬ constructed as follows: [︁ ]︁ [𝑇 ]ℬ = [𝑇 (𝑣#»1 )]ℬ [𝑇 (𝑣#»2 )]ℬ · · · [𝑇 (𝑣#» )] 𝑛 ℬ . That is, after applying the action of 𝑇 to each member of ℬ, we take the ℬ-coordinate vectors of each of these images to create the columns of [𝑇 ]ℬ . Similar to the way we used [𝑇 ]ℰ to find the image of a vector in F𝑛 , we can use [𝑇 ]ℬ to find the coordinate vector with respect to ℬ of the image of any vector in F𝑛 by performing matrix–vector multiplication. Proposition 9.1.3 Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } be an ordered basis for F𝑛 . If #» 𝑣 ∈ F𝑛 , then [𝑇 ( #» 𝑣 )]ℬ = [𝑇 ]ℬ [ #» 𝑣 ]ℬ . 230 Section 9.1 Matrix Representation of a Linear Operator 231 Proof: Since ℬ is a basis for F𝑛 and #» 𝑣 ∈ F𝑛 , then there exists 𝑐1 , 𝑐2 , . . . , 𝑐𝑛 ∈ F such that #» 𝑣 = 𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#» 𝑛. We can use the linearity of 𝑇 to expand 𝑇 ( #» 𝑣 ). We get #» #» #» 𝑇 ( #» 𝑣 ) = 𝑇 (𝑐1 𝑣#»1 + 𝑐2 𝑣#»2 + · · · + 𝑐𝑛 𝑣#» 𝑛 ) = 𝑐1 𝑇 (𝑣1 ) + 𝑐2 𝑇 (𝑣2 ) + · · · + 𝑐𝑛 𝑇 (𝑣𝑛 ). Taking coordinates is a linear operation, so [𝑇 ( #» 𝑣 )]ℬ = [𝑐1 𝑇 (𝑣#»1 ) + 𝑐2 𝑇 (𝑣#»2 ) + · · · + 𝑐𝑛 𝑇 (𝑣#» 𝑛 )]ℬ #» #» = 𝑐1 [𝑇 (𝑣1 )]ℬ + 𝑐2 [𝑇 (𝑣2 )]ℬ + · · · + 𝑐𝑛 [𝑇 (𝑣#» 𝑛 )]ℬ ⎡ ⎤ 𝑐1 [︁ ]︁ ⎢ 𝑐2 ⎥ ⎢ ⎥ = [𝑇 (𝑣#»1 )]ℬ [𝑇 (𝑣#»2 )]ℬ · · · [𝑇 (𝑣#» 𝑛 )]ℬ ⎢ .. ⎥ ⎣ . ⎦ 𝑐𝑛 = [𝑇 ]ℬ [ #» 𝑣 ]ℬ , as required. There are two important reasons why we may want to use another basis. The first is that we may have some geometrically or physically preferred vectors that naturally arise in our problem; for example, the direction vector of a line through which we are reflecting vectors. The second is that we may wish to simplify the matrix representation; for example, it would simplify things if [𝑇 ]ℬ were diagonal. These two reasons are often connected. We will now revisit some of the linear transformations that we considered in Section 5.6. For each transformation, we will choose a basis ℬ such that the ℬ-matrix will be particularly simple: it will be diagonal. Example 9.1.4 [︂ ]︂ 𝑤1 be a non-zero vector in R2 . 
Consider the projection transformation 𝑤2 proj 𝑤#» : R2 → R2 . Find a basis ℬ such that [proj 𝑤#» ]ℬ is diagonal. #» = Let 𝑤 Solution: Consider some vectors for which it is relatively easy to determine their images under [︂ proj ]︂ 𝑤#» . #» onto itself is itself. Therefore, proj #» ( 𝑤) #» = 𝑤. #» The vector 𝑤2 is The projection of 𝑤 𝑤 −𝑤1 (︂[︂ ]︂)︂ [︂ ]︂ 𝑤2 0 #» Therefore, proj #» orthogonal to 𝑤. = . 𝑤 −𝑤1 0 {︂[︂ ]︂ [︂ ]︂}︂ 𝑤1 𝑤2 Let ℬ = , . This set consists of two vectors that are not scalar multiples of 𝑤2 −𝑤1 each other, and therefore it is linearly independent. So ℬ is a basis for R2 . We have that (︂[︂ proj 𝑤#» and (︂[︂ proj 𝑤#» 𝑤1 𝑤2 ]︂)︂ 𝑤2 −𝑤1 ]︂ [︂ ]︂ [︂ ]︂ 𝑤1 𝑤1 𝑤2 = =1 +0 𝑤2 𝑤2 −𝑤1 ]︂)︂ [︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 0 𝑤1 𝑤2 = =0 +0 0 𝑤2 −𝑤1 . 232 Chapter 9 Diagonalization Consequently, [︂ ]︂ 10 . [proj 𝑤#» ]ℬ = 00 Notice that this is a diagonal matrix. Example 9.1.5 ]︂ 𝑤1 be a non-zero vector in R2 . Consider the reflection transformation 𝑤2 #» refl 𝑤#» : R2 → R2 , which reflects any vector in R2 about the line Span{ 𝑤}. Find a basis ℬ such that [refl 𝑤#» ]ℬ is diagonal. #» = Let 𝑤 [︂ Solution: Consider some vectors for which it is relatively easy to determine their images #» #» #» #» #» under [︂ ]︂refl 𝑤 . The reflection of 𝑤 about itself is itself. Therefore, refl 𝑤 ( 𝑤) = 𝑤. The vector 𝑤2 #» and so its reflection about 𝑤 #» is its additive inverse. Therefore, is orthogonal to 𝑤 −𝑤1 (︂[︂ ]︂)︂ [︂ ]︂ 𝑤2 −𝑤2 #» refl 𝑤 = . −𝑤1 𝑤1 {︂[︂ ]︂ [︂ ]︂}︂ 𝑤1 𝑤2 Let ℬ = , . This set consists of two vectors that are not scalar multiples of 𝑤2 −𝑤1 each other, and therefore it is linearly independent. So ℬ is a basis for R2 . We have that refl 𝑤#» and refl 𝑤#» (︂[︂ (︂[︂ 𝑤1 𝑤2 𝑤2 −𝑤1 ]︂)︂ ]︂)︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ 𝑤1 𝑤1 𝑤2 = =1 +0 𝑤2 𝑤2 −𝑤1 [︂ ]︂ [︂ ]︂ [︂ ]︂ −𝑤2 𝑤1 𝑤2 = =0 + (−1) 𝑤1 𝑤2 −𝑤1 . Consequently, we get the following diagonal matrix for [refl 𝑤#» ]ℬ : ]︂ [︂ 1 0 . [refl 𝑤#» ]ℬ = 0 −1 EXERCISE In Examples{︂[︂ 9.1.4]︂and ]ℬ and [refl 𝑤#» ]ℬ if we [︂ 9.1.5, ]︂}︂ what happens to the matrices {︂[︂ ]︂[proj [︂ 𝑤#»]︂}︂ 𝑤1 𝑤2 𝑤2 𝑤1 replace ℬ = , with the ordered basis , ? 𝑤2 −𝑤1 −𝑤1 𝑤2 REMARK Finding a basis ℬ such that [𝑇 ]ℬ is diagonal is not usually done by inspection. Moreover, it is not always possible to choose a basis ℬ such that [𝑇 ]ℬ is diagonal. the (︂[︂ For ]︂)︂example, [︂ ]︂ 𝑥 𝑥 + 𝑦 2 2 operator known as the “shear operator” 𝑇 : R → R , defined by 𝑇 = , 𝑦 𝑦 is an example of a linear operator for which no basis ℬ exists such that [𝑇 ]ℬ is diagonal. Throughout this chapter we will develop criteria to determine whether or not we will be able to find such a basis. Section 9.1 Matrix Representation of a Linear Operator 233 Given a linear operator 𝑇 : F𝑛 → F𝑛 , we can create many matrices [𝑇 ]ℬ by choosing different bases ℬ for F𝑛 . A natural question to ask is: How are these various matrices related? The answer is that they are similar (see Definition 7.6.2). Proposition 9.1.6 (Similarity of Matrix Representations) Let 𝑇 : F𝑛 → F𝑛 be a linear operator. Let ℬ and 𝒞 be ordered bases for F𝑛 . Then [𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ ℬ [𝐼]𝒞 = (ℬ [𝐼]𝒞 )−1 [𝑇 ]ℬ ℬ [𝐼]𝒞 and [𝑇 ]ℬ = ℬ [𝐼]𝒞 [𝑇 ]𝒞 𝒞 [𝐼]ℬ = (𝒞 [𝐼]ℬ )−1 [𝑇 ]𝒞 𝒞 [𝐼]ℬ . That is, the matrices [𝑇 ]ℬ and [𝑇 ]𝒞 are similar over F. #» #» #» Proof: Let ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } and let 𝒞 = {𝑤1 , 𝑤2 , . . . , 𝑤𝑛 }. Then using the definition of [𝑇 ]𝒞 and Proposition 8.8.14 (Changing a Basis) gives [︁ ]︁ # »)] [𝑇 (𝑤 # »)] · · · [𝑇 (𝑤 # »)] [𝑇 ]𝒞 = [𝑇 (𝑤 1 𝒞 2 𝒞 𝑛 𝒞 [︁ ]︁ # »)] #» #» = 𝒞 [𝐼]ℬ [𝑇 (𝑤 1 ℬ 𝒞 [𝐼]ℬ [𝑇 (𝑤2 )]ℬ · · · 𝒞 [𝐼]ℬ [𝑇 (𝑤𝑛 )]ℬ . 
Using the definition of matrix multiplication, we can write this matrix as the product [︁ ]︁ # »)] # »)] · · · [𝑇 (𝑤 # »)] . [𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 (𝑤 [𝑇 ( 𝑤 1 ℬ 2 ℬ 𝑛 ℬ Using Proposition 9.1.3, we get [︁ # »] [𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ [𝑤 1 ℬ ]︁ # »] · · · [𝑇 ] [𝑤 # »] . [𝑇 ]ℬ [𝑤 2 ℬ ℬ 𝑛 ℬ Again using the definition of matrix multiplication, we obtain [︁ ]︁ # »] #» #» [𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ [𝑤 1 ℬ [𝑤2 ]ℬ · · · [𝑤𝑛 ]ℬ . By definition, [︁ # »] [𝐼] = [𝑤 1 ℬ ℬ 𝒞 ]︁ # »] · · · [𝑤 # »] . [𝑤 2 ℬ 𝑛 ℬ Since 𝒞 [𝐼]ℬ = (ℬ [𝐼]𝒞 )−1 , so we obtain that [𝑇 ]𝒞 = (ℬ [𝐼]𝒞 )−1 [𝑇 ]ℬ ℬ [𝐼]𝒞 . By swapping the roles played by ℬ and 𝒞 in the above argument, we obtain that [𝑇 ]ℬ = ℬ [𝐼]𝒞 [𝑇 ]𝒞 𝒞 [𝐼]ℬ = (𝒞 [𝐼]ℬ )−1 [𝑇 ]𝒞 𝒞 [𝐼]ℬ . 234 Corollary 9.1.7 Chapter 9 Diagonalization (Finding the Standard Matrix) Let 𝑇 : F𝑛 → F𝑛 be a linear operator. Let ℬ be a basis for F𝑛 and let ℰ be the standard basis for F𝑛 . Then [𝑇 ]ℰ = ℰ [𝐼]ℬ [𝑇 ]ℬ ℬ [𝐼]ℰ = (ℬ [𝐼]ℰ )−1 [𝑇 ]ℬ [𝑇 ]ℬ = ℬ [𝐼]ℰ [𝑇 ]ℰ ℰ [𝐼]ℬ = (ℰ [𝐼]ℬ )−1 [𝑇 ]ℰ ℬ [𝐼]ℰ and ℰ [𝐼]ℬ . Proof: This follows from the previous Proposition, with 𝒞 replaced by ℰ. EXERCISE In this exercise you will show that if 𝐴 and 𝐵 are similar matrices in 𝑀𝑛×𝑛 (F), then they are different representations of the same linear operator. Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F). Suppose that 𝐴 is similar to 𝐵, say 𝐴 = 𝑃 𝐵𝑃 −1 for some invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F). Let ℬ be the ordered basis for F𝑛 consisting of the columns of 𝑃 (ordered in the way they appear in 𝑃 ). Show that [𝑇𝐴 ]ℰ = 𝐴 and [𝑇𝐴 ]ℬ = 𝐵. REMARK The previous Exercise justifies the use of the word “similar” to describe the relationship in Definition 7.6.2. If matrices 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (F) are similar over F, then they are both ℬ-matrices of the same linear operator 𝑇 : F𝑛 → F𝑛 . That is, similar matrices are just different representations of the same operator. Example 9.1.8 [︂ ]︂ 𝑤1 be a non-zero vector in R2 . Consider the projection transformation 𝑤2 proj 𝑤#» : R2 → R2 . Determine [proj 𝑤#» ]ℰ using the solution to Example 9.1.4 and Corollary 9.1.7 (Finding the Standard Matrix). [︂ ]︂ 10 Solution: In Example 9.1.4, we found the matrix representation [proj 𝑤#» ]ℬ = with 00 {︂[︂ ]︂ [︂ ]︂}︂ 𝑤1 𝑤2 respect to the basis ℬ = , . 𝑤2 −𝑤1 #» = Let 𝑤 We also have that [︂ 𝑤1 𝑤2 ℰ [𝐼]ℬ = 𝑤2 −𝑤1 and ℬ [𝐼]ℰ = (ℰ [𝐼]ℬ ) −1 ]︂ [︂ ]︂ [︂ ]︂ 1 1 −𝑤1 −𝑤2 𝑤1 𝑤2 = 2 . = −𝑤12 − 𝑤22 −𝑤2 𝑤1 𝑤1 + 𝑤22 𝑤2 −𝑤1 Section 9.2 Diagonalizability of Linear Operators 235 Therefore, using Corollary 9.1.7 (Finding the Standard Matrix), we have [proj 𝑤#» ]ℰ = ℰ [𝐼]ℬ [proj 𝑤#» ]ℬ ℬ [𝐼]ℰ [︂ ]︂ [︂ ]︂ [︂ ]︂ 1 𝑤1 𝑤2 10 𝑤1 𝑤2 = 𝑤2 −𝑤1 0 0 𝑤12 + 𝑤22 𝑤2 −𝑤1 [︂ ]︂ [︂ ]︂ 1 𝑤1 0 𝑤1 𝑤2 = 2 𝑤1 + 𝑤22 𝑤2 0 𝑤2 −𝑤1 [︂ 2 ]︂ 1 𝑤1 𝑤1 𝑤2 = 2 , 𝑤1 + 𝑤22 𝑤1 𝑤2 𝑤22 which matches what we found in Example 5.6.1. Example 9.1.9 ]︂ 𝑤1 be a non-zero vector in R2 . Consider the reflection transformation 𝑤2 #» refl 𝑤#» : R2 → R2 , which reflects any vector #» 𝑣 ∈ R2 about the line Span{ 𝑤}. Determine [refl 𝑤#» ]ℰ using the solution to Example 9.1.5 and Corollary 9.1.7 (Finding the Standard Matrix). ]︂ [︂ 1 0 with Solution: In Example 9.1.5, we found the matrix representation [refl 𝑤#» ]ℬ = 0 −1 {︂[︂ ]︂ [︂ ]︂}︂ 𝑤1 𝑤2 respect to the basis ℬ = , . 𝑤2 −𝑤1 #» = Let 𝑤 [︂ We also have that [︂ ℰ [𝐼]ℬ = and ℬ [𝐼]ℰ = (ℰ [𝐼]ℬ ) −1 𝑤1 𝑤2 𝑤2 −𝑤1 ]︂ [︂ [︂ ]︂ ]︂ 1 1 −𝑤1 −𝑤2 𝑤1 𝑤2 = 2 . 
= −𝑤12 − 𝑤22 −𝑤2 𝑤1 𝑤1 + 𝑤22 𝑤2 −𝑤1 Therefore, using Corollary 9.1.7 (Finding the Standard Matrix), we have [refl 𝑤#» ]ℰ = ℰ [𝐼]ℬ [refl 𝑤#» ]ℬ ℬ [𝐼]ℰ [︂ ]︂ [︂ ]︂ [︂ ]︂ 1 𝑤1 𝑤2 𝑤1 𝑤2 1 0 = 𝑤2 −𝑤1 0 −1 𝑤12 + 𝑤22 𝑤2 −𝑤1 [︂ ]︂ [︂ ]︂ 1 𝑤1 −𝑤2 𝑤1 𝑤2 = 2 𝑤2 −𝑤1 𝑤1 + 𝑤22 𝑤2 𝑤1 [︂ 2 ]︂ 1 𝑤1 − 𝑤22 2𝑤1 𝑤2 = 2 , 𝑤1 + 𝑤22 2𝑤1 𝑤2 𝑤22 − 𝑤12 which matches what we found in Example 5.6.4. 9.2 Diagonalizability of Linear Operators Of all the ℬ-matrices of a fixed linear operator 𝑇 : F𝑛 → F𝑛 , some might be simpler than others. For instance, in Examples 9.1.4 and 9.1.5, we were able to find an ordered basis ℬ for F𝑛 such that the ℬ-matrix of the given linear operator 𝑇 was especially simple: it 236 Chapter 9 Diagonalization was a diagonal matrix. This can allow us to perform computations with or make certain observations about the nature of 𝑇 in a more efficient manner. Example 9.2.1 [︂ ]︂ 1 #» Let 𝑤 = ∈ R2 and consider the linear operator 𝑇 = refl 𝑤#» . Then: 2 [︂ ]︂ 1 −3 4 1. Using the standard basis ℰ, we have [𝑇 ]ℰ = . 5 4 3 {︂[︂ ]︂ [︂ ]︂}︂ [︂ ]︂ 1 2 1 0 2. Using the basis ℬ = , from Example 9.1.5, we have [𝑇 ]ℬ = . 0 −1 2 −1 {︂[︂ ]︂ [︂ ]︂}︂ [︂ ]︂ −1 7 24 1 2 3. Using the basis 𝒞 = , it can be shown that [𝑇 ]𝒞 = . −2 1 25 24 −25 We can see immediately that the matrix [𝑇 ]ℬ is invertible. This is also true of the ℰ- and 𝒞-matrix representations, but it is not as obvious. Moreover, we can also see at once that ([𝑇 ]ℬ )2 = 𝐼2 . This is a manifestation of the geometric fact that refl 𝑤#» ∘ refl 𝑤#» is the identity transformation. You can check that ([𝑇 ]ℰ )2 and ([𝑇 ]𝒞 )2 are also both equal to the 2 × 2 identity matrix, but this requires a bit of a computation. This example demonstrates some of the ways in which a diagonal matrix representation, such as [𝑇 ]ℬ , can be superior to other matrix representations. Definition 9.2.2 Diagonalizable Let 𝑇 : F𝑛 → F𝑛 be a linear operator. We say that 𝑇 is diagonalizable over F if there exists an ordered basis ℬ for F𝑛 such that [𝑇 ]ℬ is a diagonal matrix. Like matrices, not all linear operators are diagonalizable. We’ll see this later as we explore the connection between the diagonalizability of matrices and linear operators. Example 9.2.3 #» ∈ R2 be a non-zero vector. Then Example 9.1.4 shows that 𝑇 = proj #» is a diagonalLet 𝑤 𝑤 izable operator and Example 9.1.5 shows that 𝑇 = refl 𝑤#» is a diagonalizable operator. We were able to accomplish this because we could choose basis vectors 𝑣#»1 and 𝑣#»2 such that 𝑇 (𝑣#») is a scalar multiple of 𝑣#». 𝑖 𝑖 The observation at the end of the preceding Example is the key to diagonalizability. It also prompts the following (hopefully familiar!) definition. Definition 9.2.4 Eigenvector, Eigenvalue and Eigenpair of a Linear Operator Let 𝑇 : F𝑛 → F𝑛 be a linear operator. We say that the non-zero vector #» 𝑥 ∈ F𝑛 is an eigenvector of 𝑇 if there exists a scalar 𝜆 ∈ F such that 𝑇 ( #» 𝑥 ) = 𝜆 #» 𝑥. The scalar 𝜆 is called an eigenvalue of 𝑇 over F and the pair (𝜆, #» 𝑥 ) is called an eigenpair of 𝑇 over F. Section 9.2 Diagonalizability of Linear Operators 237 This should remind you of the analogous definitions for matrices (Definition 7.1.5). These definitions are connected as follows. Proposition 9.2.5 (Eigenpairs of 𝑇 and [𝑇 ]ℬ ) Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ be an ordered basis for F𝑛 . Then (𝜆, #» 𝑥 ) is an eigenpair of 𝑇 if and only if (𝜆, [ #» 𝑥 ]ℬ ) is an eigenpair of the matrix [𝑇 ]ℬ . 
Proof: Take the coordinates of both sides with respect to ℬ of the eigenvalue problem and use the fact that taking coordinates is a linear operation: 𝑇 ( #» 𝑥 ) = 𝜆 #» 𝑥 ⇔ [𝑇 ( #» 𝑥 )] = [𝜆 #» 𝑥] ℬ ℬ ⇔ [𝑇 ]ℬ [ #» 𝑥 ]ℬ = 𝜆 [ #» 𝑥 ]ℬ EXERCISE ]︂ 𝑤1 be a non-zero vector in R2 . Consider the projection transformation 𝑤2 proj 𝑤#» : R2 → R2 . Find the eigenvalues of proj 𝑤#» and, for each eigenvalue, find a corresponding eigenvector. #» = Let 𝑤 [︂ [Hint: The hard way is to use the standard basis and [proj 𝑤#» ]ℰ which we computed in Example 5.6.1. The easy way is to use the basis ℬ from Example 9.1.4.] Now we can give a criterion for the diagonalizability of a linear operator 𝑇 . Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability) Let 𝑇 : F𝑛 → F𝑛 be a linear operator. Then 𝑇 is diagonalizable over F if and only if there 𝑛 exists an ordered basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } for F consisting of eigenvectors of 𝑇 . Proof: Begin with the forward direction. Assume that 𝑇 is diagonalizable over F. Then 𝑛 there exists an ordered basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } of F such that [𝑇 ]ℬ = diag(𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ). Therefore, [𝑇 (𝑣#»𝑖 )]ℬ = [𝑇 ]ℬ [𝑣#»𝑖 ]ℬ = diag(𝑑1 , . . . , 𝑑𝑛 )𝑒#»𝑖 (by Proposition 9.1.3) = the 𝑖𝑡ℎ column of diag(𝑑1 , . . . , 𝑑𝑛 ) (by Lemma 4.2.2 (Column Extraction)) = 𝑑 𝑒#» 𝑖 𝑖 = 𝑑𝑖 [𝑣#»𝑖 ]ℬ . Therefore, for each 𝑖 = 1, 2, . . . , 𝑛, 𝑣#»𝑖 is an eigenvector of 𝑇 with eigenvalue 𝑑𝑖 . For the backward direction, assume that there exists an ordered basis ℬ = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛} for F𝑛 consisting of eigenvectors of 𝑇 . Then, for all 𝑣#» ∈ ℬ, 𝑇 (𝑣#») = 𝜆 𝑣#» for some 𝜆 ∈ F. 𝑖 𝑖 𝑖 𝑖 So [𝑇 ]ℬ = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ), a diagonal matrix. Thus, 𝑇 is diagonalizable over F. 𝑖 238 Example 9.2.7 Chapter 9 Diagonalization [︂ ]︂ 𝑤1 #» Let 𝑤 = be a non-zero vector in R2 . In Example 9.1.4, we essentially showed that 𝑤2 {︂[︂ ]︂ [︂ ]︂}︂ 𝑤1 𝑤2 ℬ = , is a basis for R2 consisting of eigenvectors of proj 𝑤#» . So proj 𝑤#» is 𝑤2 −𝑤1 diagonalizable over R by Proposition 9.2.6. [︂ ]︂ [︂ ]︂ 𝑤1 𝑤2 Moreover, we showed that the eigenvalues corresponding to and are 1 and 𝑤2 −𝑤1 #» = diag(1, 0). This is 0, respectively, so the proof of Proposition 9.2.6 asserts that [proj 𝑤] ℬ precisely what we had found in Example 9.1.4. EXERCISE Apply the same analysis to refl 𝑤#» . The previous Example and Exercise make use of eigenpairs that were determined through a geometric argument. Not all transformations will have intuitive geometric interpretations, so this strategy won’t always be practical. We will tackle this problem in the next two sections. 9.3 Diagonalizability of Matrices Revisited In this section we will describe a strategy for determining when a linear operator is diagonalizable while simultaneously connecting this notion of diagonalizability to the notion of diagonalizability of matrices that we learned about in Section 7.6. Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable) Let 𝑇 : F𝑛 → F𝑛 be a linear operator and let ℬ be an ordered basis of F𝑛 . Then 𝑇 is diagonalizable over F if and only if the matrix [𝑇 ]ℬ is diagonalizable over F. Proof: (⇒): Assume that 𝑇 is diagonalizable over F. Then, by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability), there exists an ordered basis 𝒞 of F𝑛 consisting of eigenvectors of 𝑇 . In the proof of Proposition 9.2.6, we saw that [𝑇 ]𝒞 = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ), where the 𝜆𝑖 ’s are the eigenvalues of 𝑇. 
By Corollary 9.1.7 (Finding the Standard Matrix), [𝑇 ]𝒞 = 𝒞 [𝐼]ℬ [𝑇 ]ℬ ℬ [𝐼]𝒞 = (ℬ [𝐼]𝒞 )−1 [𝑇 ]ℬ ℬ [𝐼]𝒞 . Since [𝑇 ]𝒞 is diagonal, then [𝑇 ]ℬ is diagonalizable over F and the matrix 𝑃 = ℬ [𝐼]𝒞 diagonalizes [𝑇 ]ℬ . Section 9.3 Diagonalizability of Matrices Revisited 239 (⇐): Assume [︀ that [𝑇 ]ℬ]︀ is diagonalizable over F. 𝑃 = 𝑝#»1 𝑝#»2 · · · 𝑝#» 𝑛 such that Then there exists an invertible matrix 𝑃 −1 [𝑇 ]ℬ 𝑃 = 𝐷 = diag(𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ). #» #» 𝑡ℎ column of 𝑃 . We define vectors 𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 such that for 𝑖 = 1, . . . , 𝑛, [𝑣𝑖 ]ℬ = 𝑝 𝑖 , the 𝑖 #» #» We see this is possible by defining 𝑣𝑖 = ℰ [𝐼]ℬ 𝑝𝑖 , so that 𝑝#»𝑖 = (ℰ [𝐼]ℬ )−1 𝑣#»𝑖 = ℬ [𝐼]ℰ 𝑣#»𝑖 = [𝑣#»𝑖 ]ℬ . Therefore, [︁ ]︁ 𝑃 = [𝑣#»1 ]ℬ [𝑣#»2 ]ℬ · · · [𝑣#» ] 𝑛 ℬ . 𝑛 We will show that the set 𝒮 = {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#» 𝑛 } is a basis for F . By Proposition 8.5.4 (𝑛 𝑛 Vectors in F Span iff Independent), since it contains 𝑛 vectors in F𝑛 , then it will be enough to show that the set 𝒮 is linearly independent. Consider the equation #» 𝑐1 𝑣#»1 + · · · + 𝑐𝑛 𝑣#» 𝑛 = 0. Taking the coordinates with respect to ℬ of both sides of the equation above, as finding #» #» coordinates with respect to a basis is a linear process and [ 0 ]ℬ = 0 , we obtain the equation #» 𝑐1 [𝑣#»1 ]ℬ + · · · + 𝑐𝑛 [𝑣#» 𝑛 ]ℬ = 0 . The set {[𝑣#»1 ]ℬ , . . . , [𝑣#» 𝑛 ]ℬ } consists of the columns of 𝑃 . Since 𝑃 is invertible, then by Theorem 4.6.7 (Invertibility Criteria – First Version), rank(𝑃 ) = 𝑛. It then follows from Proposition 8.3.6 (Pivots and Linear Independence) that {[𝑣#»1 ]ℬ , . . . , [𝑣#» 𝑛 ]ℬ } is linearly independent. Therefore, 𝑐1 = · · · = 𝑐𝑛 = 0 and thus, 𝒮 is linearly independent and a basis for F𝑛 . Next, we will show that the vectors in 𝒮 are also eigenvectors of 𝑇 . Using the fact that 𝑃 −1 [𝑇 ]ℬ 𝑃 = 𝐷 = diag(𝑑1 , 𝑑2 , . . . , 𝑑𝑛 ), we can multiply both sides by 𝑃 to obtain that [𝑇 ]ℬ 𝑃 = 𝑃 𝐷. Comparing the 𝑖𝑡ℎ column of [𝑇 ]ℬ 𝑃 with the 𝑖𝑡ℎ column of 𝑃 𝐷 we obtain the equation [𝑇 ]ℬ [𝑣#»𝑖 ]ℬ = 𝑑𝑖 [𝑣#»𝑖 ]ℬ . Therefore, by Proposition 9.1.3 and the linearity of finding coordinates, [𝑇 (𝑣#»𝑖 )]ℬ = [𝑑𝑖 𝑣#»𝑖 ]ℬ . Thus, 𝑇 (𝑣#»𝑖 ) = 𝑑𝑖 𝑣#»𝑖 by Theorem 8.8.1 (Unique Representation Theorem). We know that #» 𝑣#»𝑖 ̸= 0 , since 𝑣#»𝑖 belongs to a linearly independent set, and so 𝑣#»𝑖 is an eigenvector of 𝑇 . Consequently, we conclude that 𝒮 is a basis of eigenvectors of 𝑇 , and 𝑇 is diagonalizable over F by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability). Using this result, we can translate our criterion for diagonalizability of operators from Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability) to a criterion for diagonalizability of matrices. 240 Corollary 9.3.2 Chapter 9 Diagonalization (Eigenvector Basis Criterion for Diagonalizability – Matrix Version) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 𝐴 is diagonalizable over F if and only if there exists a basis of F𝑛 consisting of eigenvectors of 𝐴. Proof: Consider the linear operator 𝑇𝐴 : F𝑛 → F𝑛 such that 𝑇𝐴 ( #» 𝑥 ) = 𝐴 #» 𝑥 . Then [𝑇𝐴 ]ℰ = 𝐴 and 𝐴 is diagonalizable over F ⇐⇒ [𝑇𝐴 ]ℰ is diagonalizable over F (because [𝑇𝐴 ]ℰ = 𝐴) ⇐⇒ 𝑇𝐴 is diagonalizable over F (by Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable)) ⇐⇒ There exists a basis ℬ of F𝑛 of eigenvectors of 𝑇𝐴 (by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability)) By Proposition 9.2.5 (Eigenpairs of 𝑇 and [𝑇 ]ℬ ), #» 𝑥 is an eigenvector of 𝑇𝐴 if and only if #» 𝑥 𝑛 is an eigenvector of [𝑇 ]ℰ = 𝐴. 
Therefore, there exists a basis ℬ of F of eigenvectors of 𝑇𝐴 if and only if there exists a basis ℬ of F𝑛 of eigenvectors of 𝐴. REMARKS The previous two results contain several important pieces of information. 1. To determine whether a linear operator 𝑇 is diagonalizable over F, we can start by obtaining the matrix of 𝑇 with respect to some basis ℬ for F𝑛 , and then test whether this matrix is diagonalizable over F. The most natural initial choice for ℬ is the standard basis, ℰ. Thus, moving forward, we can focus our discussion on the diagonalization of matrices. 2. If 𝐴 is diagonalizable over F and 𝑃 −1 𝐴𝑃 = 𝐷 is a diagonal matrix, then (a) the entries of 𝐷 are the eigenvalues of 𝐴 in F, (b) the matrix 𝑃 is the change of basis matrix from an ordered basis ℬ for F𝑛 comprised of eigenvectors of 𝐴 to the standard basis ℰ for F𝑛 (so 𝑃 = ℰ [𝐼]ℬ ), and (c) the columns of 𝑃 are eigenvectors of 𝐴. Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix Version) tells us that determining whether a matrix 𝐴 is diagonalizable over F boils down to whether or not we can find 𝑛 linearly independent eigenvectors of 𝐴 in F𝑛 . Section 9.3 Diagonalizability of Matrices Revisited 241 ⎡ Example 9.3.3 ⎤ 3 4 −2 Let 𝑇 : R3 → R3 be a linear operator with [𝑇 ]ℰ = ⎣ 3 8 −3 ⎦ . We will show that 𝑇 is 6 14 −5 diagonalizable over R by finding three linearly independent eigenvectors of 𝐴 = [𝑇 ]ℰ . In order to do this, let us begin by finding all the possible eigenvectors of 𝐴. The characteristic polynomial of 𝐴 is ⎛⎡ ⎤⎞ 3−𝜆 4 −2 𝐶𝐴 (𝜆) = det ⎝⎣ 3 8 − 𝜆 −3 ⎦⎠ = −𝜆3 + 6𝜆2 − 11𝜆 + 6 = −(𝜆 − 3)(𝜆 − 2)(𝜆 − 1). 6 14 −5 − 𝜆 The eigenvalues of 𝐴 are thus 𝜆1 = 1, 𝜆2 = 2 and 𝜆3 = 3. We leave it as an exercise to show that the corresponding eigenspaces are given by ⎧⎡ ⎤⎫ ⎧⎡ ⎤⎫ ⎧⎡ ⎤⎫ ⎨ 1 ⎬ ⎨ 0 ⎬ ⎨ 1 ⎬ 𝐸1 = Span ⎣ 0 ⎦ , 𝐸2 = Span ⎣ 1 ⎦ and 𝐸3 = Span ⎣ 3 ⎦ . ⎩ ⎭ ⎩ ⎭ ⎩ ⎭ 1 2 6 Therefore, there is an obvious choice for our desired three linearly independent eigenvectors: ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ 0 ⎨ 1 ⎦ ⎣ ⎦ ⎣ ⎣ 0 , 1 , 3⎦ . ℬ= ⎩ ⎭ 6 2 1 Of course, we have to check that this set is in fact linearly independent. We can easily do this using Proposition 8.3.6 (Pivots and Linear Independence), for instance. Thus ℬ is a basis for R3 consisting of eigenvectors of 𝐴. Therefore, 𝐴 is diagonalizable over F, by Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix Version), and hence 𝑇 is diagonalizable over F too, by Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable). In the previous Example we could have invoked Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable) to deduce that 𝐴 is diagonalizable. Notice that Proposition 7.6.7 suggests the same strategy that we followed above: namely, take an eigenvector corresponding to each one of the eigenvalues of 𝐴, and form a basis using these eigenvectors. For this strategy to actually produce a basis, we need to be sure that the resulting set of eigenvectors is linearly independent. Proposition 9.3.4 (Eigenvectors Corresponding to Distinct Eigenvalues are Linearly Independent) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) have eigenpairs (𝜆1 , 𝑣#»1 ), (𝜆2 , 𝑣#»2 ), . . . , (𝜆𝑘 , 𝑣#»𝑘 ) over F. If the eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑘 are all distinct, then the set of eigenvectors {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑘 } is linearly independent. Proof: Proceed by induction on 𝑘. If 𝑘 = 1, then since eigenvectors are non-zero we have that {𝑣#»1 } is linearly independent. 242 Chapter 9 Diagonalization Assume the statement holds for 𝑘 = 𝑗, that is, we assume that the set {𝑣#»1 , 𝑣#»2 , . . . 
, 𝑣#»𝑗 } is linearly independent. »}. We must show that this set is linearly independent Consider the set {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑗 , 𝑣# 𝑗+1 and so we consider the equation » = #» 𝑐1 𝑣#»1 + · · · + 𝑐𝑗+1 𝑣# 𝑗+1 0 . (*) We’ll manipulate (*) in two ways. First, multiply both sides of (*) through by 𝐴 to obtain » = #» 𝑐1 𝜆1 𝑣#»1 + · · · + 𝑐𝑗 𝜆𝑗 𝑣#»𝑗 + 𝑐𝑗+1 𝜆𝑗+1 𝑣# 𝑗+1 0 . (**) Next, multiply both sides of (*) through by 𝜆𝑗+1 to obtain » = #» 𝑐1 𝜆𝑗+1 𝑣#»1 + · · · + 𝑐𝑗 𝜆𝑗+1 𝑣#»𝑗 + 𝑐𝑗+1 𝜆𝑗+1 𝑣# 𝑗+1 0 . (* * *) Then subtract (* * *) from (**) to get #» 𝑐1 (𝜆1 − 𝜆𝑗+1 )𝑣#»1 + · · · + 𝑐𝑗 (𝜆𝑗 − 𝜆𝑗+1 )𝑣#»𝑗 = 0 . By the inductive hypothesis, {𝑣#»1 , 𝑣#»2 , . . . , 𝑣#»𝑗 } is linearly independent so for all 𝑖, 1 ≤ 𝑖 ≤ 𝑗, 𝑐𝑖 (𝜆𝑖 − 𝜆𝑗+1 ) = 0. However, 𝜆𝑖 ̸= 𝜆𝑗+1 and so 𝑐𝑖 = 0. Thus, 𝑐1 = · · · = 𝑐𝑗 = 0 and our original equation (*) becomes » = #» 𝑐𝑗+1 𝑣# 𝑗+1 0. » is nonzero since it is an eigenvector, so we conclude that 𝑐 The vector 𝑣# 𝑗+1 𝑗+1 = 0. Conse#» #» #» # » quently, the set {𝑣1 , 𝑣2 , . . . , 𝑣𝑗 , 𝑣𝑗+1 } is linearly independent. This establishes the inductive step and completes the proof. We can now provide a proof of Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable) which states the following: If 𝐴 ∈ 𝑀𝑛×𝑛 (F) has 𝑛 distinct eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 in F, then 𝐴 is diagonalizable over F. More specifically, if we let (𝜆1 , 𝑣#»1 ), . . . , (𝜆𝑛 , 𝑣#» 𝑛 ) be eigenpairs of 𝐴 over F, and if we let 𝑃 = [𝑣#» · · · 𝑣#»] be the matrix whose columns are eigenvectors corresponding to the distinct 1 𝑛 eigenvalues, then (a) 𝑃 is invertible, and (b) 𝑃 −1 𝐴𝑃 = 𝐷 = diag(𝜆1 , 𝜆2 , · · · , 𝜆𝑛 ). Proof: Since we have 𝑛 eigenvectors corresponding to 𝑛 distinct eigenvalues, then the set {𝑣#»1 , . . . , 𝑣#» 𝑛 } is linearly independent by Proposition 9.3.4 (Eigenvectors Corresponding to Distinct Eigenvalues are Linearly Independent). Further, the set {𝑣#»1 , . . . , 𝑣#» 𝑛 } is a basis of eigenvectors for F𝑛 . Therefore, 𝐴 is diagonalizable over F by Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability). Section 9.3 Diagonalizability of Matrices Revisited 243 #» #» Let 𝑃 = [𝑣#»1 · · · 𝑣#» 𝑛 ]. Since {𝑣1 , . . . , 𝑣𝑛 } is a basis, then it is linearly independent, and so rank(𝑃 ) = 𝑛 by Proposition 8.3.6 (Pivots and Linear Independence). Therefore, by Theorem 4.6.7 (Invertibility Criteria – First Version), 𝑃 is invertible. Now since 𝑣#»1 , . . . , 𝑣#» 𝑛 are eigenvectors of 𝐴, then using Lemma 4.2.2 (Column Extraction), 𝐴𝑃 = 𝐴[𝑣#» · · · 𝑣#»] = [𝐴𝑣#» · · · 𝐴𝑣#»] = [𝜆 𝑣#» · · · 𝜆 𝑣#»]. 1 𝑛 𝑖 𝑛 1 1 𝑛 𝑛 Let 𝐷 = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ) = [𝜆1 𝑒#»1 · · · 𝜆𝑛 𝑒#» 𝑛 ]. Therefore, using Lemma 4.2.2 again, #» #» #» #» 𝑃 𝐷 = 𝑃 [𝜆1 𝑒#»1 · · · 𝜆𝑛 𝑒#» 𝑛 ] = [𝜆1 𝑃 𝑒1 · · · 𝜆𝑛 𝑃 𝑒𝑛 ] = [𝜆1 𝑣1 · · · 𝜆𝑛 𝑣𝑛 ]. Therefore, 𝐴𝑃 = 𝑃 𝐷 and so 𝑃 −1 𝐴𝑃 = 𝐷 = diag(𝜆1 , 𝜆2 , . . . , 𝜆𝑛 ). ⎡ Example 9.3.5 ⎤ 5 2 −1 Suppose 𝑇 : R3 → R3 is a linear operator with [𝑇 ]ℰ = ⎣ 8 1 −2 ⎦ . 16 0 −3 (a) Prove that 𝑇 is diagonalizable over R. (b) Find a basis ℬ of R3 such that [𝑇 ]ℬ is a diagonal matrix. (c) Determine an invertible matrix 𝑃 such that 𝑃 −1 [𝑇 ]ℰ 𝑃 = [𝑇 ]ℬ . Solution: (a) By Proposition 9.3.1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable), it suffices to prove that 𝐴 = [𝑇 ]ℰ is diagonalizable over R. The characteristic polynomial of 𝐴 is ⎤⎞ ⎛⎡ 5−𝜆 2 −1 𝐶𝐴 (𝜆) = det ⎝⎣ 8 1 − 𝜆 −2 ⎦⎠ = −𝜆3 + 3𝜆2 + 13𝜆 − 15. 16 0 −3 − 𝜆 Since −𝜆3 + 3𝜆2 + 13𝜆 − 15 = −(𝜆 − 5)(𝜆 − 1)(𝜆 + 3), the eigenvalues of 𝑇 are 𝜆1 = 5, 𝜆2 = 1, and 𝜆3 = −3. 
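As a quick cross-check, the eigenvalues just obtained by factoring the characteristic polynomial can be recomputed numerically. The sketch below is an aside, not part of the original solution; it assumes Python with NumPy is available.

```python
import numpy as np

# Numerical spot-check of the hand computation of the eigenvalues above.
A = np.array([[ 5.0, 2.0, -1.0],
              [ 8.0, 1.0, -2.0],
              [16.0, 0.0, -3.0]])

eigenvalues = np.linalg.eigvals(A)
print(np.sort(eigenvalues.real))   # approx. [-3.  1.  5.], matching -(λ-5)(λ-1)(λ+3)
```

Floating-point routines only approximate the eigenvalues, so the exact factorization of the characteristic polynomial remains the method used in the rest of the solution.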
There are three distinct eigenvalues, and therefore 𝐴 is diagonalizable over R by Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable). (b) We want a basis ℬ for R3 consisting of eigenvectors of 𝐴. Since we have three distinct eigenvalues, we need to select one eigenvector from each eigenspace. We leave it as an exercise to determine that the eigenspaces of 𝐴 are given by ⎧⎡ ⎤⎫ ⎧⎡ ⎤⎫ ⎧⎡ ⎤⎫ ⎨ 1 ⎬ ⎨ 1 ⎬ ⎨ 0 ⎬ ⎣ ⎦ ⎣ ⎦ 1 0 𝐸5 = Span , 𝐸1 = Span and 𝐸−3 = Span ⎣ 1 ⎦ . ⎩ ⎭ ⎩ ⎭ ⎩ ⎭ 2 4 2 Thus, ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 0 ⎬ ⎨ 1 ⎣ ⎦ ⎣ ⎦ ⎣ 1 , 0 , 1⎦ ℬ= ⎩ ⎭ 2 4 2 is a basis for R3 for which [𝑇 ]ℬ = diag(5, 1, −3). 244 Chapter 9 Diagonalization (c) The columns of the desired matrix 𝑃 are the eigenvectors in ℬ. Thus, ⎡ ⎤ 110 𝑃 = ⎣1 0 1⎦ . 242 REMARK We repeat for emphasis that the matrix 𝑃 in part (c) that diagonalizes 𝐴 is the change of basis matrix from the basis of eigenvectors of 𝐴 to the standard basis, ℰ [𝐼]ℬ . 9.4 The Diagonalizability Test For 𝐴 ∈ 𝑀𝑛×𝑛 (F) to be diagonalizable over F, it must have 𝑛 linearly independent eigenvectors in F𝑛 . Let us consider some examples where 𝑛 linearly independent eigenvectors do not exist. ]︂ 0 1 . Then 𝐶𝐴 (𝜆) = 𝜆2 + 1. This matrix has no real eigenvalues and therefore −1 0 no real eigenvectors. So 𝐴 is not diagonalizable over R. [︂ Example 9.4.1 Let 𝐴 = However, we note that 𝐴 does have two distinct complex eigenvalues, 𝜆1 = 𝑖 and 𝜆2 = −𝑖, and is therefore diagonalizable over C. ⎤ 2 0 0 Let 𝐴 = ⎣ 0 0 1 ⎦ . Then 𝐶𝐴 (𝜆) = −(𝜆 − 2)(𝜆2 + 1). 0 −1 0 ⎡ Example 9.4.2 This matrix has only one real eigenvalue 𝜆1 = 2. We can show that all eigenvectors associated [︀ ]︀𝑇 with this eigenvalue are scalar multiples of 1 0 0 . Therefore, 𝐴 is not diagonalizable over R. However, we note that 𝐴 does have three distinct complex eigenvalues, 𝜆1 = 2, 𝜆2 = 𝑖, and 𝜆3 = −𝑖, and is therefore diagonalizable over C. [︂ Example 9.4.3 ]︂ 01 Let 𝐴 = . Then 𝐶𝐴 (𝜆) = 𝜆2 = (𝜆 − 0)2 . 00 This matrix has two real eigenvalues, but they are both 0. We can show that all eigenvec[︀ ]︀𝑇 tors associated with the eigenvalue 0 are scalar multiples of 1 0 . Therefore, 𝐴 is not diagonalizable over R or C. We will now introduce some theory that will help us determine exactly when a basis of eigenvectors exists. Section 9.4 The Diagonalizability Test Definition 9.4.4 Algebraic Multiplicity 245 Let 𝜆𝑖 be an eigenvalue of 𝐴 ∈ 𝑀𝑛×𝑛 (F). The algebraic multiplicity of 𝜆𝑖 , denoted by 𝑎𝜆𝑖 , is the largest positive integer such that (𝜆 − 𝜆𝑖 )𝑎𝜆𝑖 divides the characteristic polynomial 𝐶𝐴 (𝜆). In other words, 𝑎𝜆𝑖 gives the number of times that (𝜆−𝜆𝑖 ) terms occur in the fully factorized form of 𝐶𝐴 (𝜆). ⎡ Example 9.4.5 ⎤ 122 Let 𝐴 = ⎣ 2 1 2 ⎦ . Then 𝐶𝐴 (𝜆) = −(𝜆 − 5)(𝜆 + 1)2 . 221 The matrix 𝐴 has two distinct eigenvalues: 𝜆1 = 5, with algebraic multiplicity 𝑎𝜆1 = 1, and 𝜆2 = −1, with algebraic multiplicity 𝑎𝜆2 = 2. We can also say that 𝐴 has three eigenvalues with one of them being repeated twice. So its eigenvalues are 5, −1, and −1. Definition 9.4.6 Geometric Multiplicity Let 𝜆𝑖 be an eigenvalue of 𝐴 ∈ 𝑀𝑛×𝑛 (F). The geometric multiplicity of 𝜆𝑖 , denoted by 𝑔𝜆𝑖 , is the dimension of the eigenspace 𝐸𝜆𝑖 . That is, 𝑔𝜆𝑖 = dim(𝐸𝜆𝑖 ). ⎡ Example 9.4.7 ⎤ 122 Let 𝐴 ∈ 𝑀3×3 (R) with 𝐴 = ⎣ 2 1 2 ⎦ . Determine the geometric multiplicities of all eigen221 values of 𝐴. Solution: The eigenvalues of 𝐴 are 𝜆1 = 5 and 𝜆2 = −1. #» The eigenspace 𝐸𝜆1 is the solution set to (𝐴 − 5𝐼) #» 𝑥 = 0 , which is equivalent to ⎤ ⎡ −4 2 2 #» ⎣ 2 −4 2 ⎦ #» 𝑥 = 0 2 2 −4 and row reduces to A basis for 𝐸5 is thus ⎡ ⎤ 1 0 −1 #» ⎣ 0 1 −1 ⎦ #» 𝑥 = 0. 
00 0 ⎧⎡ ⎤⎫ ⎨ 1 ⎬ ⎣1⎦ . ⎩ ⎭ 1 Therefore, dim(𝐸𝜆1 ) = 1 and so 𝑔𝜆1 = 1. #» The eigenspace 𝐸𝜆2 is the solution set to (𝐴 + 1𝐼) #» 𝑥 = 0 , which is equivalent to ⎡ ⎤ 222 #» ⎣ 2 2 2 ⎦ #» 𝑥 = 0 222 246 Chapter 9 and row reduces to Diagonalization ⎡ ⎤ 111 #» ⎣ 0 0 0 ⎦ #» 𝑥 = 0. 000 A basis for 𝐸𝜆2 is thus ⎧⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ ⎨ 1 ⎣ −1 ⎦ , ⎣ 0 ⎦ . ⎩ ⎭ 0 −1 Therefore, dim(𝐸𝜆2 ) = 2 and so 𝑔𝜆2 = 2. The geometric multiplicity of an eigenvalue tells us the maximum number of linearly independent eigenvectors we can obtain from the eigenspace corresponding to that eigenvalue. The two multiplicities are connected through the following result. Proposition 9.4.8 (Geometric and Algebraic Multiplicities) Let 𝜆𝑖 be an eigenvalue of the matrix 𝐴 ∈ 𝑀𝑛×𝑛 (F). Then 1 ≤ 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 . REMARK In the proof below, we will describe a matrix by using what is called a block matrix; that is, denoting some or all parts of a matrix as matrices themselves. For example, given the matrix ⎡ 1 ⎢3 ⎢ 𝐴=⎢ ⎢ 11 ⎣ 13 15 2 4 12 14 16 5 8 17 20 23 6 9 18 21 24 ⎤ 7 10 ⎥ ⎥ 19 ⎥ ⎥ 22 ⎦ 25 we can describe 𝐴 as [︂ 𝐴1 𝐴2 𝐴= 𝐵1 𝐵2 ]︂ where [︂ ]︂ 12 𝐴1 = , 34 [︂ ]︂ 56 7 𝐴2 = , 8 9 10 ⎡ ⎤ 11 12 𝐵1 = ⎣ 13 14 ⎦ , 15 16 ⎡ ⎤ 17 18 19 𝐵2 = ⎣ 20 21 22 ⎦ . 23 24 25 Alternatively, with 𝐴2 , 𝐵1 , and 𝐵2 defined as above, we can re-write our matrix 𝐴 as ⎡ ⎤ 1 2 𝐴2 ⎦ 𝐴=⎣ 3 4 . 𝐵1 𝐵2 Of course, there are many other ways we could slice 𝐴 into blocks to give different block representations of 𝐴. Section 9.4 The Diagonalizability Test 247 Proof: By definition, if 𝜆𝑖 is an eigenvalue of 𝐴, then there is a non-trivial solution to 𝐴 #» 𝑣 = 𝜆𝑖 #» 𝑣 . Therefore, each eigenspace contains at least one non-zero vector and so its dimension will be at least one. Therefore, 1 ≤ 𝑔𝜆𝑖 . Suppose that {𝑣#», . . . , 𝑣#»} is a basis for 𝐸 . Therefore, 𝑔 = 𝑘. 1 𝑘 𝜆𝑖 𝜆𝑖 F𝑛 We can extend this basis for 𝐸𝜆𝑖 to a basis ℬ for (for a demonstration of how this can be done, see Example 8.5.7, as well as the remark preceding it). Therefore, ℬ = # », . . . , 𝑤 # »} . {𝑣#»1 , . . . , 𝑣#»𝑘 , 𝑤 𝑛 𝑘+1 𝑛 𝑛 We let 𝑇 : F → F be defined as 𝑇 ( #» 𝑣 ) = 𝐴 #» 𝑣. 𝐴 𝐴 For 𝑗 = 1, . . . , 𝑘, it is the case that 𝑣#»𝑗 ∈ 𝐸𝜆𝑖 and therefore, 𝐴𝑣#»𝑗 = 𝜆𝑖 𝑣#»𝑗 . Therefore, # »)] will be some vector in F𝑛 . It now follows [𝑇𝐴 (𝑣#»𝑗 )]ℬ = 𝜆𝑖 𝑒#»𝑗 . For 𝑗 = 𝑘 + 1, . . . , 𝑛, [𝑇𝐴 (𝑤 𝑗 ℬ that [𝑇𝐴 ]ℬ has the block structure ⎤ ⎡ 𝜆𝑖 0 · · · 0 ⎥ ⎢ 0 𝜆𝑖 · · · 0 ⎢ ⎥ 𝑀 ⎥ ⎢ .. . . 1 .. . . . .. [𝑇𝐴 ]ℬ = ⎢ . ⎥ ⎥ ⎢ ⎦ ⎣ 0 0 · · · 𝜆𝑖 𝒪(𝑛−𝑘)×𝑘 𝑀2 for some 𝑀1 ∈ 𝑀𝑘×(𝑛−𝑘) (F) and some 𝑀2 ∈ 𝑀(𝑛−𝑘)×(𝑛−𝑘) (F). The characteristic polynomial of 𝐴 is equal to the characteristic polynomials of [𝑇𝐴 ]ℬ since these two matrices are similar (exercise). It can be shown by induction that 𝐶𝐴 (𝜆) = (𝜆𝑖 − 𝜆)𝑘 𝐶𝑀2 (𝜆) = (𝜆𝑖 − 𝜆)𝑔𝜆𝑖 𝐶𝑀2 (𝜆). Since 𝑎𝜆𝑖 is the largest positive integer such that (𝜆 − 𝜆𝑖 )𝑎𝜆𝑖 divides 𝐶𝐴 (𝑡), it follows that 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 . What happens if we take the union of the bases of all the eigenspaces of a given matrix? It turns out that this set is linearly independent. Proposition 9.4.9 Let 𝐴 ∈ 𝑀𝑛×𝑛 (F) with distinct eigenvalues 𝜆1 , 𝜆2 , . . . , 𝜆𝑘 . If their corresponding eigenspaces, 𝐸𝜆1 , 𝐸𝜆2 , . . . , 𝐸𝜆𝑘 have bases ℬ1 , ℬ2 , . . . , ℬ𝑘 , then ℬ = ℬ1 ∪ ℬ2 ∪ · · · ∪ ℬ𝑘 is linearly independent. Proof: To make things easier to read, let 𝑔𝑖 = 𝑔𝜆𝑖 for 𝑖 = 1, . . . , 𝑘. Let ℬ1 = {𝑣# 1»1 , . . . , 𝑣# 1 𝑔»1 }, ℬ2 = {𝑣# 2»1 , . . . , 𝑣# 2 𝑔»2 } and so on up to ℬ𝑘 = {𝑣# 𝑘»1 . . . , 𝑣# 𝑘 𝑔»𝑘 }. 
Take note of the double subscripts here, with the first subscript corresponding to the numbering of the basis (ℬ1 , ℬ2 , and so on) and the second subscript counting the vector’s position within that basis. We wish to perform the linear dependence check on ℬ = ℬ1 ∪ ℬ2 ∪ · · · ∪ ℬ𝑘 . We consider the following equation with scalars in F: 𝑔1 ∑︁ 𝑖=1 𝑐1𝑖 𝑣# 1»𝑖 + 𝑔2 ∑︁ 𝑖=1 𝑐2𝑖 𝑣# 2»𝑖 + · · · + 𝑔𝑘 ∑︁ 𝑖=1 #» 𝑐𝑘𝑖 𝑣# 𝑘»𝑖 = 0 (*). 248 Chapter 9 Diagonalization 𝑔1 𝑔2 # » = ∑︀ # » = ∑︀ Let 𝑤 𝑐1𝑖 𝑣# 1»𝑖 , which is a vector from 𝐸𝜆1 . Let 𝑤 𝑐2𝑖 𝑣# 2»𝑖 , which is a vector from 1 2 𝑖=1 𝑖=1 𝑔𝑘 # » = ∑︀ 𝐸𝜆2 . Continuing in this manner, let 𝑤 𝑐𝑘𝑖 𝑣# 𝑘»𝑖 , which is a vector from 𝐸𝜆𝑘 . 𝑘 𝑖=1 Therefore, our equation (*) has become #» + 𝑤 #» + ··· + 𝑤 # » = #» 𝑤 0 1 2 𝑘 (**). # » are from a different eigenspace. The vectors in a given eigenspace are Each of the 𝑤 𝑗 # » are not the zero vector, then from either eigenvectors or the zero vector. If any of the 𝑤 𝑗 (**) we get a non-trivial linear combination of eigenvectors each chosen from a different eigenspace equalling to the zero vector. This contradicts Proposition 9.3.4 (Eigenvectors Corresponding to Distinct Eigenvalues are Linearly Independent), which tells us that a set of eigenvectors chosen in such a way that every eigenvector is from a different eigenspace is #» = ··· = 𝑤 # » = #» linearly independent. Therefore, it must be the case that 𝑤 0. 1 𝑘 𝑔𝑗 # » = ∑︀ 𝑐 𝑣# » = #» But 𝑤 0 implies that 𝑐𝑗1 = · · · = 𝑐𝑗𝑔𝜆𝑗 = 0 because ℬ𝑗 is a basis for 𝐸𝜆𝑗 . 𝑗 𝑗𝑖 𝑗 𝑖 𝑖=1 Therefore, all of the scalars in (*) are 0 and the set ℬ is linearly independent. We have now reached the stage where we can give a test to determine whether a matrix is diagonalizable over F. Theorem 9.4.10 (Diagonalizability Test) Let 𝐴 ∈ 𝑀𝑛×𝑛 (F). Suppose that the complete factorization of the characteristic polynomial of 𝐴 into irreducible factors over F is given by 𝐶𝐴 (𝜆) = (𝜆 − 𝜆1 )𝑎𝜆1 · · · (𝜆 − 𝜆𝑘 )𝑎𝜆𝑘 ℎ(𝜆), where 𝜆1 , . . . 𝜆𝑘 are all of the distinct eigenvalues of 𝐴 in F with corresponding algebraic multiplicities 𝑎𝜆1 . . . 𝑎𝜆𝑘 and ℎ(𝜆) is a polynomial in 𝜆 that is irreducible over F. Then 𝐴 is diagonalizable over F if and only if ℎ(𝜆) is a constant polynomial and 𝑎𝜆𝑖 = 𝑔𝜆𝑖 , for each 𝑖 = 1, . . . , 𝑘. Proof: We begin with the forward direction. Thus, assume that 𝐴 is diagonalizable over F. Then, by Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix Version), 𝐴 has 𝑛 linearly independent eigenvectors in F𝑛 . Each of these eigenvectors must lie in one of the eigenspaces 𝐸𝜆𝑖 of 𝐴, and no eigenvector can lie in two different eigenspaces. Since there can be at most 𝑔𝜆𝑖 linearly independent vectors in 𝐸𝜆𝑖 , we conclude that 𝑛 ≤ 𝑔𝜆𝑖 + . . . + 𝑔𝜆𝑘 . On the other hand, by Proposition 9.4.8 (Geometric and Algebraic Multiplicities), we know that 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 , and so 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 ≤ 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 . We also know that the sum of algebraic multiplicities 𝑎𝜆𝑖 + · · · + 𝑎𝜆𝑘 is at most the degree of 𝐶𝐴 (𝜆), which is 𝑛 by Proposition 7.3. 4 (Features of the Characteristic Polynomial). Therefore, 𝑛 ≤ 𝑔𝜆𝑖 + . . . + 𝑔𝜆𝑘 ≤ 𝑎𝜆𝑖 + · · · + 𝑎𝜆𝑘 ≤ 𝑛. Section 9.4 The Diagonalizability Test 249 This implies that 𝑔𝜆𝑖 + . . . + 𝑔𝜆𝑘 = 𝑎𝜆𝑖 + · · · + 𝑎𝜆𝑘 = 𝑛 = deg(𝐶𝐴 (𝜆)). From this we immediately conclude that deg(ℎ(𝜆)) = 0, i.e., that ℎ(𝜆) is a constant polynomial. Since 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 for all 𝑖 = 1, . . . , 𝑘, we also conclude from this that 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for all 𝑖 = 1, . . . , 𝑘. This completes the proof of the forward direction. 
Conversely, assume that ℎ(𝜆) is a constant polynomial and that 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for all 𝑖 = 1, . . . , 𝑘. This implies that 𝑔𝜆𝑖 + . . . + 𝑔𝜆𝑘 = 𝑎𝜆𝑖 + · · · + 𝑎𝜆𝑘 = 𝑛. Thus if we let ℬ𝑘 be a basis for 𝐸𝜆𝑘 for 𝑖 = 1, . . . , 𝑘, and if we let ℬ = ℬ1 ∪ · · · ∪ ℬ𝑘 be the union of these bases, then by Proposition 9.4.9, ℬ will linearly independent, and by the above, ℬ will contain 𝑔𝜆𝑖 + . . . + 𝑔𝜆𝑘 = 𝑛 eigenvectors. Thus ℬ is a linearly independent subset of F𝑛 that contains 𝑛 vectors, so ℬ must be a basis for F𝑛 . Therefore, we have found a basis for F𝑛 consisting of eigenvectors of 𝐴, so 𝐴 must be diagonalizable over F, by Corollary 9.3.2 (Eigenvector Basis Criterion for Diagonalizability – Matrix Version). REMARK If F = C, then the polynomial ℎ(𝜆) in Theorem 9.4.10 (Diagonalizability Test) is always constant (in fact, it is equal to (−1)𝑛 .) This is because every degree 𝑛 polynomial factors over C into linear factors. Therefore ℎ(𝜆) only matters if F = R. It measures the failure of 𝐶𝐴 (𝜆) to factor completely into linear factors, and so it measures, in a sense, a deficit of real eigenvalues. ⎡ Example 9.4.11 ⎤ 122 Consider the matrix 𝐴 = ⎣ 2 1 2 ⎦ from Example 9.4.7. Determine whether 𝐴 is diagonal221 izable over R and/or C. Solution: In Example 9.4.7, we found the characteristic polynomial to be 𝐶𝐴 (𝜆) = −(𝜆 − 5)(𝜆 + 1)2 . So ℎ(𝜆) = −1, a constant polynomial. The eigenvalues are 𝜆1 = 5 and 𝜆2 = −1 and we determined that 𝑎𝜆1 = 𝑔𝜆1 = 1 and 𝑎𝜆2 = 𝑔𝜆2 = 2. Therefore, 𝐴 is diagonalizable over both R and C, using Theorem 9.4.10 (Diagonalizability Test). ⎡ Example 9.4.12 1 ⎢2 Determine whether 𝐴 = ⎢ ⎣0 0 2 1 0 0 0 0 1 2 ⎤ 0 0⎥ ⎥ is diagonalizable over R and/or C. 2⎦ 1 Solution: The characteristic polynomial is 𝐶𝐴 (𝜆) = (𝜆 − 3)2 (𝜆 + 1)2 . Therefore, ℎ(𝜆) = 1, a constant polynomial. 250 Chapter 9 Diagonalization The eigenvalues are: 𝜆1 = 3, with 𝑎𝜆1 = 2, and 𝜆2 = −1, with 𝑎𝜆2 = 2. #» The eigenspace 𝐸𝜆1 is the solution set to (𝐴 − 3𝐼) #» 𝑣 = 0 , which is equivalent to ⎤ ⎡ −2 2 0 0 ⎢ 2 −2 0 0 ⎥ #» #» ⎥ ⎢ ⎣ 0 0 −2 2 ⎦ 𝑣 = 0 0 0 2 −2 and row reduces to ⎡ 1 ⎢0 ⎢ ⎣0 0 −1 0 0 0 0 1 0 0 ⎤ 0 −1 ⎥ #» ⎥ #» 𝑣 = 0. ⎦ 0 0 Since the rank of the coefficient matrix is 2, then there will be 4 − 2 = 2 parameters in the solution set. Therefore, 𝑔𝜆1 = dim(𝐸𝜆1 ) = 2. #» The eigenspace 𝐸 is the solution set to (𝐴 + 1𝐼) #» 𝑣 = 0 which is equivalent to 𝜆2 and row reduces to ⎡ 2 ⎢2 ⎢ ⎣0 0 2 2 0 0 0 0 2 2 ⎤ 0 0⎥ #» ⎥ #» 𝑣 = 0 2⎦ 2 ⎡ 1 0 0 0 0 1 0 0 ⎤ 0 1⎥ #» ⎥ #» 𝑣 = 0. 0⎦ 0 1 ⎢0 ⎢ ⎣0 0 Since the rank of the coefficient matrix is 2, then there will be 4 − 2 = 2 parameters in the solution set. Therefore, 𝑔𝜆2 = dim(𝐸𝜆2 ) = 2. Since 𝑎𝜆1 = 𝑔𝜆1 = 2, and 𝑎𝜆2 = 𝑔𝜆2 = 2, 𝐴 is diagonalizable over both R and C by Theorem 9.4.10 (Diagonalizability Test). ⎡ Example 9.4.13 5 ⎢ −2 Determine whether 𝐴 = ⎢ ⎣ 4 16 2 1 4 0 0 0 3 −8 ⎤ 1 −1 ⎥ ⎥ is diagonalizable over R and/or C. 2 ⎦ −5 Solution: The characteristic polynomial 𝐶𝐴 (𝜆) = (𝜆 − 3)3 (𝜆 + 5). Therefore, ℎ(𝜆) = 1, a constant polynomial. The eigenvalues are: 𝜆1 = 3, with 𝑎𝜆1 = 3, and 𝜆2 = −5, with 𝑎𝜆2 = 1. #» The eigenspace 𝐸𝜆1 is the solution set to (𝐴 − 3𝐼) #» 𝑣 = 0 , which is equivalent to ⎡ ⎤ 2 2 0 1 ⎢ −2 −2 0 −1 ⎥ #» #» ⎢ ⎥ ⎣ 4 4 0 2 ⎦𝑣 = 0 16 0 −8 −8 Section 9.4 The Diagonalizability Test 251 and row reduces to ⎡ 2 ⎢0 ⎢ ⎣0 0 2 2 0 0 0 1 0 0 ⎤ 1 2⎥ #» ⎥ #» 𝑣 = 0. 0⎦ 0 Since the rank of the coefficient matrix is 2, then there will be 4 − 2 = 2 parameters in the solution set. Therefore, dim(𝐸𝜆1 ) = 2. 
Since 𝑎𝜆1 = 3 and 𝑔𝜆1 = 2, 𝐴 is not diagonalizable over R nor C, by Theorem 9.4.10 (Diagonalizability Test). [︂ Example 9.4.14 Determine whether 𝐴 = ]︂ 1 −1 is diagonalizable over R and/or C. 1 1 Solution: The characteristic polynomial 𝐶𝐴 (𝜆) = 𝜆2 − 2𝜆 + 2 = (𝜆 − 1 − 𝑖)(𝜆 − 1 + 𝑖). Over R, ℎ(𝜆) = 𝜆2 −2𝜆+2 which is not a constant polynomial. Therefore, 𝐴 is not diagonalizable over R, by Theorem 9.4.10 (Diagonalizability Test). However, over C, ℎ(𝜆) = 1, which is a constant polynomial. The eigenvalues are: 𝜆1 = 1 + 𝑖, with 𝑎𝜆1 = 1 and 𝜆2 = 1 − 𝑖, with 𝑎𝜆2 = 1. Since 𝐴 has two distinct eigenvalues, we can conclude that 𝐴 is diagonalizable over C by Proposition 7.6.7 (𝑛 Distinct Eigenvalues =⇒ Diagonalizable). If we wanted to use Theorem 9.4.10, we could note that 1 ≤ 𝑔𝑖 ≤ 𝑎𝑖 for any eigenvalue. In this case, 𝑎𝜆1 = 𝑎𝜆2 = 1 and so 𝑔𝜆1 = 𝑔𝜆2 = 1. Therefore, by the Diagonalizability Test, 𝐴 is diagonalizable over C. Theorem 9.4.10 (Diagonalizability Test) is useful when we are determining whether a matrix is diagonalizable over F. However, if an 𝑛 × 𝑛 matrix 𝐴 is diagonalizable over F, we often want to find an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (F) such 𝑃 −1 𝐴𝑃 = 𝐷, where 𝐷 is a diagonal matrix. Theorem 9.4.10 and the proof of Proposition 9.4.9 give us that if 𝐴 is diagonalizable over F, then the union of the bases of all of the eigenspaces of 𝐴 will be a basis of eigenvectors of 𝐴 for F𝑛 . Therefore, we let 𝑃 be the matrix whose columns are this basis of eigenvectors. The proof of Proposition 9.3. 1 (𝑇 Diagonalizable iff [𝑇 ]ℬ Diagonalizable) gives us that 𝑃 −1 𝐴𝑃 = 𝐷, where 𝐷 is a diagonal matrix and the entries on the diagonal of 𝐷 will be the eigenvalues of 𝐴 in the order corresponding to the order in which the eigenvectors are used as the columns of 𝑃 . We illustrate this with an example. ⎡ Example 9.4.15 ⎤ 1200 ⎢2 1 0 0⎥ ⎥ We know from Example 9.4.12 that matrix 𝐴 = ⎢ ⎣ 0 0 1 2 ⎦ is diagonalizable over R. De0021 termine an invertible matrix 𝑃 such that 𝑃 −1 𝐴𝑃 = 𝐷, where 𝐷 is a diagonal matrix. Solution: In Example 9.4.12, we determined that 𝐴 had two eigenvalues, 𝜆1 = 3 and 𝜆2 = −1. 252 Chapter 9 We determined that the eigenspace 𝐸𝜆1 was the solution set to ⎡ ⎤ 1 −1 0 0 ⎢ 0 0 1 −1 ⎥ #» #» ⎢ ⎥ ⎣0 0 0 0 ⎦ 𝑣 = 0 . 0 0 0 0 A basis for 𝐸𝜆1 ⎧⎡ ⎤ ⎡ ⎤⎫ 1 0 ⎪ ⎪ ⎪ ⎨⎢ ⎥ ⎢ ⎥⎪ ⎬ 1 ⎥ , ⎢0⎥ . is ⎢ ⎣ 0 ⎦ ⎣ 1 ⎦⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 0 1 We determined that the eigenspace 𝐸𝜆2 was the solution set to ⎤ ⎡ 1100 ⎢ 0 0 1 1 ⎥ #» #» ⎥ ⎢ ⎣0 0 0 0⎦ 𝑣 = 0 . 0000 A basis for 𝐸𝜆2 ⎧⎡ ⎤ ⎡ ⎤⎫ 0 ⎪ −1 ⎪ ⎪ ⎪ ⎨ ⎢ 1 ⎥ ⎢ 0 ⎥⎬ ⎥,⎢ ⎥ . is ⎢ ⎣ 0 ⎦ ⎣ −1 ⎦⎪ ⎪ ⎪ ⎪ ⎩ ⎭ 1 0 Therefore, we let ⎡ 1 ⎢1 𝑃 =⎢ ⎣0 0 0 0 1 1 −1 1 0 0 ⎤ 0 0 ⎥ ⎥, −1 ⎦ 1 ⎡ 0 3 0 0 which will give us that 3 ⎢ 0 𝑃 −1 𝐴𝑃 = 𝐷 = ⎢ ⎣0 0 If we reorder the columns of 𝑃 such that ⎡ 1 ⎢1 𝑃 =⎢ ⎣0 0 then we will obtain −1 1 0 0 ⎡ 0 0 1 1 3 ⎢ 0 𝑃 −1 𝐴𝑃 = 𝐷 = ⎢ ⎣0 0 0 0 −1 0 ⎤ 0 0 ⎥ ⎥. 0 ⎦ −1 ⎤ 0 0 ⎥ ⎥, −1 ⎦ 1 0 −1 0 0 0 0 3 0 ⎤ 0 0 ⎥ ⎥. 0 ⎦ −1 Diagonalization Section 9.4 The Diagonalizability Test 253 EXERCISE ⎡ 1 ⎢2 Using Example 9.4.15, find all diagonal matrices which are similar to 𝐴 = ⎢ ⎣0 0 2 1 0 0 0 0 1 2 ⎤ 0 0⎥ ⎥. 2⎦ 1 Chapter 10 Vector Spaces 10.1 Definition of a Vector Space So far, all of our discussion has taken place within R𝑛 and C𝑛 or within various subspaces of these spaces. An underlying fact that we have made use of is that these sets are closed under addition and scalar multiplication. We have taken it for granted that any linear combination of vectors in R𝑛 (or C𝑛 ) will produce a vector in R𝑛 (or C𝑛 ). This assumption is fundamental to linear algebra. 
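Because addition and scalar multiplication in $\mathbb{R}^n$ act entry by entry, this closure under linear combinations can be seen concretely: the result always has $n$ real entries. The sketch below is an illustration only, not part of the notes; it assumes Python with NumPy and uses arbitrary made-up vectors in $\mathbb{R}^4$.

```python
import numpy as np

# Illustration of closure: any linear combination of vectors in R^n is
# again a vector in R^n.  Here n = 4 with arbitrary vectors and scalars.
u = np.array([1.0, -2.0, 0.5, 3.0])
v = np.array([4.0,  1.0, -1.0, 2.0])
c, d = 2.5, -3.0

w = c * u + d * v          # linear combination, computed entrywise
print(w)                   # [-9.5  -8.    4.25  1.5 ]
print(w.shape == (4,))     # True: the result is again an element of R^4
```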
We can consider other sets of objects along with two operations. One operation, called addition, combines two objects in the set to produce another object in the set. The other operation, called scalar multiplication, combines an object in the set and a scalar from the field $\mathbb{F}$ to produce another object in the set. These operations may be defined in familiar ways, like the addition and scalar multiplication of vectors in $\mathbb{F}^n$. However, they may also be newly defined operations.

Example 10.1.1
Consider the set of all real polynomials of degree two or less, which we denote by $P_2(\mathbb{R})$:
$$P_2(\mathbb{R}) = \{\,p_0 + p_1x + p_2x^2 : p_0, p_1, p_2 \in \mathbb{R}\,\}.$$
Given two polynomials in this set, $p(x) = p_0 + p_1x + p_2x^2$ and $q(x) = q_0 + q_1x + q_2x^2$, their sum is
$$p(x) + q(x) = p_0 + p_1x + p_2x^2 + q_0 + q_1x + q_2x^2 = r_0 + r_1x + r_2x^2,$$
where $r_0 = p_0 + q_0$, $r_1 = p_1 + q_1$ and $r_2 = p_2 + q_2$. The resulting polynomial is another polynomial in $P_2(\mathbb{R})$, and thus we say that $P_2(\mathbb{R})$ is closed under addition.

If we multiply $p(x)$ by $c \in \mathbb{R}$ we get
$$cp(x) = c(p_0 + p_1x + p_2x^2) = cp_0 + cp_1x + cp_2x^2 = s_0 + s_1x + s_2x^2,$$
where $s_0 = cp_0$, $s_1 = cp_1$ and $s_2 = cp_2$. The resulting polynomial is another polynomial in $P_2(\mathbb{R})$, and thus we say that $P_2(\mathbb{R})$ is closed under scalar multiplication.

We can think of linear algebra as operating in a world with four components:

1. a non-empty set of objects $\mathbb{V}$;
2. a field $\mathbb{F}$;
3. an operation called addition, which combines two objects from $\mathbb{V}$ and which we denote by $\oplus$; and
4. an operation called scalar multiplication, which combines an object from $\mathbb{V}$ and a scalar from $\mathbb{F}$ and which we denote by $\odot$.

Note that addition ($\oplus$) and scalar multiplication ($\odot$) are written in place of $+$ and $\cdot$ to indicate that these operations may have definitions that differ from the "standard operations" of addition/multiplication on scalars or coordinate vectors. Many other definitions of these operations are possible, as long as they satisfy the vector space axioms below.

Definition 10.1.2 (Vector Space)
A non-empty set of objects, $\mathbb{V}$, is a vector space over a field, $\mathbb{F}$, under the operations of addition, $\oplus$, and scalar multiplication, $\odot$, provided the following ten axioms are met.

C1. For all $\vec{x}, \vec{y} \in \mathbb{V}$, $\vec{x} \oplus \vec{y} \in \mathbb{V}$. (Closure Under Addition)
C2. For all $\vec{x} \in \mathbb{V}$ and all $c \in \mathbb{F}$, $c \odot \vec{x} \in \mathbb{V}$. (Closure Under Scalar Multiplication)
V1. For all $\vec{x}, \vec{y} \in \mathbb{V}$, $\vec{x} \oplus \vec{y} = \vec{y} \oplus \vec{x}$. (Addition is Commutative)
V2. For all $\vec{x}, \vec{y}, \vec{z} \in \mathbb{V}$, $(\vec{x} \oplus \vec{y}) \oplus \vec{z} = \vec{x} \oplus (\vec{y} \oplus \vec{z}) = \vec{x} \oplus \vec{y} \oplus \vec{z}$. (Addition is Associative)
V3. There exists a vector $\vec{0} \in \mathbb{V}$ such that for all $\vec{x} \in \mathbb{V}$, $\vec{x} \oplus \vec{0} = \vec{0} \oplus \vec{x} = \vec{x}$. (Additive Identity)
V4. For all $\vec{x} \in \mathbb{V}$, there exists a vector $-\vec{x} \in \mathbb{V}$ such that $\vec{x} \oplus (-\vec{x}) = (-\vec{x}) \oplus \vec{x} = \vec{0}$. (Additive Inverse)
V5. For all $\vec{x}, \vec{y} \in \mathbb{V}$ and for all $c \in \mathbb{F}$, $c \odot (\vec{x} \oplus \vec{y}) = (c \odot \vec{x}) \oplus (c \odot \vec{y})$. (Vector Addition Distributive Law)
V6. For all $\vec{x} \in \mathbb{V}$ and for all $c, d \in \mathbb{F}$, $(c + d) \odot \vec{x} = (c \odot \vec{x}) \oplus (d \odot \vec{x})$. (Scalar Addition Distributive Law)
V7. For all $\vec{x} \in \mathbb{V}$ and for all $c, d \in \mathbb{F}$, $(cd) \odot \vec{x} = c \odot (d \odot \vec{x})$. (Scalar Multiplication is Associative)
V8. For all $\vec{x} \in \mathbb{V}$, $1 \odot \vec{x} = \vec{x}$. (Multiplicative Identity)

Definition 10.1.3 (Vector)
A vector is an element of a vector space.

REMARKS
1. The sum $c + d$ in V6 (scalar addition distributive law) is the sum of scalars in $\mathbb{F}$.
2. The product $cd$ in V7 (scalar associativity) is the product of scalars in $\mathbb{F}$.
3.
Example 10.1.4
Consider R𝑛 over R and C𝑛 over C with the standard vector addition and standard scalar multiplication, respectively. We can show that the ten vector space axioms are satisfied here, so both of these sets are vector spaces over their respective fields.

Perhaps the simplest vector space V that one can think of is the vector space consisting of a single vector, namely the zero vector 𝟎, with addition and scalar multiplication defined in the obvious way.

Example 10.1.5
Consider the set V = {𝟎} with the operations 𝟎 + 𝟎 = 𝟎 and 𝑐𝟎 = 𝟎, where 𝑐 ∈ F. It can be shown that these operations satisfy the ten vector space axioms. Thus, V = {𝟎} is a vector space. It is called the zero vector space or the zero space.

The zero space is not the most interesting object to study. Here are two more interesting examples of vector spaces.

Example 10.1.6
Consider 𝑀𝑚×𝑛(F) over F, equipped with the standard matrix addition and scalar multiplication. We can show that the ten vector space axioms are satisfied, with the zero vector in this space being the zero matrix. Therefore, 𝑀𝑚×𝑛(F) is a vector space over F.

Example 10.1.7
The set 𝑃𝑛(F) is the set of all polynomials of degree at most 𝑛 with coefficients in F. Using the field F, we define addition and scalar multiplication in the following way:
\[ (a_0 + a_1x + \cdots + a_nx^n) + (b_0 + b_1x + \cdots + b_nx^n) = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n, \]
\[ c(a_0 + a_1x + \cdots + a_nx^n) = (ca_0) + (ca_1)x + \cdots + (ca_n)x^n. \]
We can show that these operations satisfy the ten vector space axioms, with the zero vector being the zero polynomial in 𝑃𝑛(F), i.e., 𝑝(𝑥) = 0 + 0𝑥 + · · · + 0𝑥ⁿ. Therefore, this is a vector space over F. We give the formal definition below.

Note that a vector in 𝑀𝑚×𝑛(F) is by definition an 𝑚 × 𝑛 matrix 𝐴 ∈ 𝑀𝑚×𝑛(F). Likewise, a vector in 𝑃𝑛(F) is a polynomial in 𝑃𝑛(F).

Definition 10.1.8 (𝑃𝑛(F))
We use 𝑃𝑛(F) to denote the vector space over F comprised of the set of all polynomials of degree at most 𝑛 with coefficients in F, with addition and scalar multiplication defined as follows:
\[ (a_0 + a_1x + \cdots + a_nx^n) + (b_0 + b_1x + \cdots + b_nx^n) = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n, \]
\[ c(a_0 + a_1x + \cdots + a_nx^n) = (ca_0) + (ca_1)x + \cdots + (ca_n)x^n. \]

Of course, our discussion about vector spaces would not be complete without showing examples of objects (V, F, ⊕, ⊙) that are not vector spaces. This happens when at least one of the ten vector space axioms fails. In order to prove that (V, F, ⊕, ⊙) is not a vector space, it suffices to disprove V3 by showing that V has no additive identity, or to disprove any other vector space axiom by providing an explicit counterexample that violates it.
Example 10.1.9
For a positive integer 𝑛, let
\[ \mathbb{V} = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1 > 0, \ldots, x_n > 0 \right\} \]
be a subset of R𝑛, and consider V equipped with the standard operations of addition and scalar multiplication of vectors in R𝑛. Show that V is not a vector space.

Solution: Since V does not contain the zero vector, we see that the vector space axiom V3 is violated, which means that V is not a vector space.

Example 10.1.10
For a positive integer 𝑛, let
\[ \mathbb{V} = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1 \ge 0, \ldots, x_n \ge 0 \right\} \]
be a subset of R𝑛, and consider V equipped with the standard operations of addition and scalar multiplication of vectors in R𝑛. Show that V is not a vector space over R.

Solution: Unlike in the previous example, this time V does contain the zero vector, so the axiom V3 is satisfied. Furthermore, one can verify that V is closed under addition, so the axiom C1 is satisfied. However, V is not closed under scalar multiplication. As a counterexample, take the first standard basis vector 𝑥 = 𝑒1 and 𝑐 = −1. Then
\[ c\vec{x} = (-1)\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \notin \mathbb{V}, \]
which means that the vector space axiom C2 is violated. This proves that V is not a vector space over R. As an exercise, try listing all vector space axioms that do and do not hold for V.

EXERCISE
Let
\[ \mathbb{V} = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} : x_1, \ldots, x_n \in \mathbb{R} \right\} \]
be a subset of C𝑛, and consider V equipped with the standard operations of addition and scalar multiplication of vectors in C𝑛. Show that V is not a vector space over C. List all vector space axioms that do and do not hold for V.

10.2 Span, Linear Independence and Basis

Just like for F𝑛, we can introduce the notions of span, linear dependence, linear independence and basis for a vector space (V, F, +, ·). In what follows, we will primarily be interested in the vector spaces 𝑃𝑛(F) and 𝑀𝑚×𝑛(F).

Definition 10.2.1 (Span)
Let 𝑣1, 𝑣2, . . . , 𝑣𝑘 be vectors in V. We define the span of {𝑣1, 𝑣2, . . . , 𝑣𝑘} to be the set of all linear combinations of 𝑣1, 𝑣2, . . . , 𝑣𝑘. That is,
\[ \mathrm{Span}\{v_1, v_2, \ldots, v_k\} = \{c_1v_1 + c_2v_2 + \cdots + c_kv_k : c_1, c_2, \ldots, c_k \in \mathbb{F}\}. \]
We refer to {𝑣1, 𝑣2, . . . , 𝑣𝑘} as a spanning set for Span{𝑣1, 𝑣2, . . . , 𝑣𝑘}. We also say that Span{𝑣1, 𝑣2, . . . , 𝑣𝑘} is spanned by {𝑣1, 𝑣2, . . . , 𝑣𝑘}.

We have not yet developed an analogue of Proposition 8.4.6 (Spans F𝑛 iff rank is 𝑛) in the general vector space context, so in order to show that a set ℬ is a spanning set of a vector space V we will need to apply a more "first principles" approach. To establish Span(ℬ) = V, we will need to show Span(ℬ) ⊆ V and V ⊆ Span(ℬ).

Example 10.2.2
Show that the set
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} \]
is a spanning set for 𝑀2×2(F).

Solution: Our goal is to show that
\[ \mathrm{Span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} = M_{2\times 2}(\mathbb{F}). \]
Since ℬ is a subset of 𝑀2×2(F), it must be the case that any linear combination of the elements of ℬ is also in 𝑀2×2(F), by the closure axioms for vector spaces. Therefore, Span(ℬ) ⊆ 𝑀2×2(F).

It remains to show that 𝑀2×2(F) ⊆ Span(ℬ). Let
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]
be a matrix in 𝑀2×2(F). We claim that there exist 𝑐1, 𝑐2, 𝑐3, 𝑐4 ∈ F such that
\[ c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}. \]
To see that this is the case, let us simplify the left-hand side of the above equality:
\[ \begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}. \]
Equating the entries of the matrices on both sides of the above equality, we obtain the following system of four equations in four unknowns:
\[ \begin{aligned} c_1 &= a_{11} \\ c_3 + c_4 &= a_{12} \\ c_3 - c_4 &= a_{21} \\ c_2 &= a_{22}. \end{aligned} \]
Solving this system for 𝑐1, 𝑐2, 𝑐3, and 𝑐4, we find that
\[ c_1 = a_{11}, \quad c_2 = a_{22}, \quad c_3 = \frac{a_{12} + a_{21}}{2}, \quad \text{and} \quad c_4 = \frac{a_{12} - a_{21}}{2}. \]
Thus,
\[ A = a_{11}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_{22}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \frac{a_{12} + a_{21}}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \frac{a_{12} - a_{21}}{2}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \]
so 𝐴 ∈ Span(ℬ). This means that 𝑀2×2(F) ⊆ Span(ℬ), and since it is also the case that Span(ℬ) ⊆ 𝑀2×2(F), we conclude that ℬ is a spanning set for 𝑀2×2(F).

EXERCISE
Show that the set
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\} \]
is a spanning set for 𝑀2×2(F).

Example 10.2.3
Show that the set ℬ = {1 − 𝑥², −1 + 𝑖𝑥, −1 − 𝑖𝑥} is a spanning set for 𝑃2(C).

Solution: Our goal is to show that
\[ \mathrm{Span}\{1 - x^2,\ -1 + ix,\ -1 - ix\} = P_2(\mathbb{C}). \]
Since ℬ is a subset of 𝑃2(C), it must be the case that any linear combination of the elements of ℬ is also in 𝑃2(C). Therefore, Span(ℬ) ⊆ 𝑃2(C).

It remains to show that 𝑃2(C) ⊆ Span(ℬ). Let 𝑝(𝑥) = 𝑎0 + 𝑎1𝑥 + 𝑎2𝑥² be a polynomial in 𝑃2(C). We claim that there exist 𝑐1, 𝑐2, 𝑐3 ∈ C such that
\[ c_1(1 - x^2) + c_2(-1 + ix) + c_3(-1 - ix) = a_0 + a_1x + a_2x^2. \]
To see that this is the case, let us simplify the left-hand side of the above equality:
\[ (c_1 - c_2 - c_3) + (ic_2 - ic_3)x - c_1x^2 = a_0 + a_1x + a_2x^2. \]
Equating the corresponding coefficients of the polynomials on both sides of the above equality, we obtain the following system of three equations in three unknowns:
\[ \begin{aligned} c_1 - c_2 - c_3 &= a_0 \\ ic_2 - ic_3 &= a_1 \\ -c_1 &= a_2. \end{aligned} \]
Solving this system for 𝑐1, 𝑐2, and 𝑐3, we find that
\[ c_1 = -a_2, \quad c_2 = \frac{-a_0 - ia_1 - a_2}{2}, \quad \text{and} \quad c_3 = \frac{-a_0 + ia_1 - a_2}{2}. \]
Thus,
\[ p(x) = (-a_2)(1 - x^2) + \frac{-a_0 - ia_1 - a_2}{2}(-1 + ix) + \frac{-a_0 + ia_1 - a_2}{2}(-1 - ix), \]
so 𝑝(𝑥) ∈ Span(ℬ). This means that 𝑃2(C) ⊆ Span(ℬ), and since it is also the case that Span(ℬ) ⊆ 𝑃2(C), we conclude that ℬ is a spanning set for 𝑃2(C).

EXERCISE
Let 𝑛 be a non-negative integer. Show that the set ℬ = {1, 2𝑥, 4𝑥², . . . , 2ⁿ𝑥ⁿ} is a spanning set for 𝑃𝑛(F).

As with the definition of the span of a set, the definitions of linear dependence and linear independence in the general vector space context will look nearly identical to what was used for linear dependence and linear independence in F𝑛 in Definitions 8.2.3 and 8.2.4.

Definition 10.2.4 (Linear Dependence, Linear Independence)
We say that the vectors 𝑣1, 𝑣2, . . . , 𝑣𝑘 ∈ V are linearly dependent if there exist scalars 𝑐1, 𝑐2, . . . , 𝑐𝑘 ∈ F, not all zero, such that 𝑐1𝑣1 + 𝑐2𝑣2 + · · · + 𝑐𝑘𝑣𝑘 = 𝟎. We say that 𝑣1, 𝑣2, . . . , 𝑣𝑘 ∈ V are linearly independent if the only solution to the equation 𝑐1𝑣1 + 𝑐2𝑣2 + · · · + 𝑐𝑘𝑣𝑘 = 𝟎 is the trivial solution 𝑐1 = 𝑐2 = · · · = 𝑐𝑘 = 0.

If 𝑈 = {𝑣1, 𝑣2, . . . , 𝑣𝑘}, then we say that the set 𝑈 is linearly dependent (resp. linearly independent) to mean that the vectors 𝑣1, 𝑣2, . . . , 𝑣𝑘 are linearly dependent (resp. linearly independent).

Since we do not yet have an analogous result to Proposition 8.3.6 (Pivots and Linear Independence) in the general vector space context, we will need to determine whether or not a set ℬ is linearly independent directly from the definition above.

Example 10.2.5
Determine whether the subset
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} \]
of 𝑀2×2(F) is linearly dependent or linearly independent.
Solution: In order to determine whether the set ℬ is linearly dependent or linearly independent, we examine the equation
\[ c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]
Simplifying the left-hand side of the above equality, we find that
\[ \begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]
Equating the entries of the matrices on both sides of the above equality, we obtain the following system of four equations in four unknowns:
\[ \begin{aligned} c_1 &= 0 \\ c_3 + c_4 &= 0 \\ c_3 - c_4 &= 0 \\ c_2 &= 0. \end{aligned} \]
Solving this system for 𝑐1, 𝑐2, 𝑐3 and 𝑐4, we find that 𝑐1 = 𝑐2 = 𝑐3 = 𝑐4 = 0. Therefore, ℬ is linearly independent.

EXERCISE
Determine whether the subset
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\} \]
of 𝑀2×2(F) is linearly dependent or linearly independent.

Example 10.2.6
Determine whether the subset ℬ = {1 − 𝑥², −1 + 𝑖𝑥, −1 − 𝑖𝑥} of 𝑃2(C) is linearly dependent or linearly independent.

Solution: In order to determine whether the set ℬ is linearly dependent or linearly independent, we examine the equation
\[ c_1(1 - x^2) + c_2(-1 + ix) + c_3(-1 - ix) = 0. \]
Simplifying the left-hand side of the above equality, we find that
\[ (c_1 - c_2 - c_3) + (ic_2 - ic_3)x - c_1x^2 = 0. \]
Equating the corresponding coefficients of the polynomials on both sides of the above equality, we obtain the following system of three equations in three unknowns:
\[ \begin{aligned} c_1 - c_2 - c_3 &= 0 \\ ic_2 - ic_3 &= 0 \\ -c_1 &= 0. \end{aligned} \]
Solving this system for 𝑐1, 𝑐2 and 𝑐3, we find that 𝑐1 = 𝑐2 = 𝑐3 = 0. Therefore, ℬ is linearly independent.

EXERCISE
Let 𝑛 be a non-negative integer. Determine whether the subset ℬ = {1, 2𝑥, 4𝑥², . . . , 2ⁿ𝑥ⁿ} of 𝑃𝑛(F) is linearly dependent or linearly independent.

Finally, we extend the notion of basis to the general vector space context, with a definition that is, yet again, nearly identical to what was used previously for a basis of a subspace of F𝑛 in Definition 8.2.6.

Definition 10.2.7 (Basis)
We say that a subset ℬ of a nonzero vector space V is a basis for V if

1. ℬ is linearly independent, and
2. V = Span(ℬ).

REMARK
You should be cautioned that our definition requires a basis to have finitely many vectors in it. This may not always be possible. For example, the set of all polynomials with coefficients in F is a vector space over F with the usual operations of addition and scalar multiplication. However, this vector space does not admit a "basis" according to our definition. The study of such "infinite-dimensional" vector spaces has important applications in mathematics and physics, and is taken up in more advanced courses.

Example 10.2.8
Determine whether or not the set
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} \]
is a basis for 𝑀2×2(F).

Solution: It was demonstrated in Example 10.2.5 that ℬ is linearly independent. It was also demonstrated in Example 10.2.2 that 𝑀2×2(F) = Span(ℬ). Since ℬ is a linearly independent spanning set for 𝑀2×2(F), it is a basis for 𝑀2×2(F).

EXERCISE
Show that the set
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\} \]
is a basis for 𝑀2×2(F).

Example 10.2.9
Show that the set ℬ = {1 − 𝑥², −1 + 𝑖𝑥, −1 − 𝑖𝑥} is a basis for 𝑃2(C).

Solution: It was demonstrated in Example 10.2.6 that ℬ is linearly independent. It was also demonstrated in Example 10.2.3 that 𝑃2(C) = Span(ℬ). Since ℬ is a linearly independent spanning set for 𝑃2(C), it is a basis for 𝑃2(C).

EXERCISE
Let 𝑛 be a non-negative integer. Determine whether or not the set ℬ = {1, 2𝑥, 4𝑥², . . . , 2ⁿ𝑥ⁿ} is a basis for 𝑃𝑛(F).
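Each of the computations above reduces to solving a small linear system, so the hand calculations can be cross-checked symbolically. The following sketch (not part of the original text; it assumes Python with SymPy is available) reproduces the systems from Examples 10.2.2 and 10.2.5 for the set ℬ ⊆ 𝑀2×2.

import sympy as sp

# Writing a generic A in M_{2x2} as a combination of the four matrices in B
# reduces to a linear system in c1, ..., c4 (Example 10.2.2).
a11, a12, a21, a22 = sp.symbols('a11 a12 a21 a22')
c1, c2, c3, c4 = sp.symbols('c1 c2 c3 c4')

B = [sp.Matrix([[1, 0], [0, 0]]),
     sp.Matrix([[0, 0], [0, 1]]),
     sp.Matrix([[0, 1], [1, 0]]),
     sp.Matrix([[0, 1], [-1, 0]])]
A = sp.Matrix([[a11, a12], [a21, a22]])

combo = c1*B[0] + c2*B[1] + c3*B[2] + c4*B[3]

# Spanning: a solution exists for every A (and here it is unique):
# c1 = a11, c2 = a22, c3 = (a12 + a21)/2, c4 = (a12 - a21)/2.
print(sp.solve(list(combo - A), [c1, c2, c3, c4], dict=True))

# Independence (Example 10.2.5): with A = 0, only the trivial solution remains.
print(sp.solve(list(combo), [c1, c2, c3, c4], dict=True))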
With the definitions of the span of a set, linear independence, linear dependence, and basis being so similar to what were used previously in the context of F𝑛, it should come as no surprise that much of what we developed in that context holds for general vector spaces as well. In particular, analogous results to Theorem 8.7.2 (Dimension is Well-Defined) and Theorem 8.8.1 (Unique Representation Theorem) exist, allowing us to define the notions of dimension and coordinate vectors. With these tools, we can translate much of the work of general vector spaces back to familiar territory in the context of F𝑛!

10.3 Linear Operators

Just like we were able to generalize the notions of a vector space, linear dependence, linear independence, and basis, we can generalize the notion of a linear transformation.

Definition 10.3.1 (Generic Linear Transformation)
Let (V, F, +, ·) and (W, F, ⊕, ⊙) be vector spaces over the same field F. We say that the function 𝑇 : V → W is a linear transformation (or linear mapping) if, for any 𝑥, 𝑦 ∈ V and any 𝑐 ∈ F, the following two properties hold:

1. 𝑇(𝑥 + 𝑦) = 𝑇(𝑥) ⊕ 𝑇(𝑦) (called linearity over addition).
2. 𝑇(𝑐 · 𝑥) = 𝑐 ⊙ 𝑇(𝑥) (called linearity over scalar multiplication).

We refer to V here as the domain of 𝑇 and W as the codomain of 𝑇, as we would for any function.

When studying functions between vector spaces, we will restrict our attention only to linear transformations whose domains and codomains are F𝑛, 𝑃𝑛(F) and 𝑀𝑚×𝑛(F), equipped with the standard operations of addition and scalar multiplication. In view of this, we will write V instead of (V, F, +, ·) for brevity.

Example 10.3.2
For a positive integer 𝑛, consider the function 𝐷 : 𝑃𝑛(F) → 𝑃𝑛(F) defined by
\[ D(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) = a_1 + (2a_2)x + \cdots + (na_n)x^{n-1}. \]
Show that 𝐷 is a linear transformation.

Solution: Notice that, for all polynomials 𝑓(𝑥) = 𝑎0 + 𝑎1𝑥 + 𝑎2𝑥² + · · · + 𝑎𝑛𝑥ⁿ and 𝑔(𝑥) = 𝑏0 + 𝑏1𝑥 + 𝑏2𝑥² + · · · + 𝑏𝑛𝑥ⁿ, we have
\[ \begin{aligned} D(f(x) + g(x)) &= D\bigl((a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots + (a_n + b_n)x^n\bigr) \\ &= (a_1 + b_1) + 2(a_2 + b_2)x + \cdots + n(a_n + b_n)x^{n-1} \\ &= (a_1 + 2a_2x + \cdots + na_nx^{n-1}) + (b_1 + 2b_2x + \cdots + nb_nx^{n-1}) \\ &= D(f(x)) + D(g(x)), \end{aligned} \]
and for all 𝑐 ∈ F we have
\[ \begin{aligned} D(cf(x)) &= D(ca_0 + ca_1x + ca_2x^2 + \cdots + ca_nx^n) \\ &= ca_1 + c(2a_2)x + \cdots + c(na_n)x^{n-1} \\ &= c(a_1 + 2a_2x + \cdots + na_nx^{n-1}) \\ &= cD(f(x)). \end{aligned} \]
Thus, 𝐷 is a linear transformation. If you studied calculus, then you may recognize that 𝐷 is the differentiation operator; that is, for any polynomial 𝑓(𝑥) ∈ 𝑃𝑛(F), 𝐷(𝑓(𝑥)) = 𝑓′(𝑥), where 𝑓′(𝑥) is the derivative of 𝑓(𝑥).

EXERCISE
Show that the function 𝑇 : 𝑃𝑛(F) → F defined by 𝑇(𝑓(𝑥)) = 𝑓(0) is a linear transformation.

Example 10.3.3
For positive integers 𝑚 and 𝑛, consider the function 𝑆 : 𝑀𝑚×𝑛(F) → 𝑀𝑛×𝑚(F) defined by 𝑆(𝐴) = 𝐴ᵀ. That is, 𝑆(𝐴) is equal to the transpose of the 𝑚 × 𝑛 matrix 𝐴. Show that 𝑆 is a linear transformation.

Solution: In order to prove that 𝑆 is a linear transformation we will use Proposition 4.3.13 (Properties of Matrix Transpose). Notice that, for all 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛(F), we have
\[ S(A + B) = (A + B)^T = A^T + B^T = S(A) + S(B), \]
where the middle equality holds by Proposition 4.3.13. Moreover, for all 𝑐 ∈ F, we have
\[ S(cA) = (cA)^T = cA^T = cS(A), \]
again by Proposition 4.3.13. Thus, 𝑆 is a linear transformation.
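Linearity checks like the two above can also be spot-checked numerically before writing out a proof. The sketch below (not from the original text; it assumes Python with NumPy) represents a polynomial in 𝑃𝑛(R) by its coefficient array and tests the two defining properties of the map 𝐷 from Example 10.3.2 on sample inputs. A check of this kind can catch an error in a proposed formula, although it is of course not a proof.

import numpy as np

def D(coeffs):
    # Differentiation on P_n in coefficient form, as in Example 10.3.2:
    # (a0, a1, ..., an) |-> (a1, 2*a2, ..., n*an, 0), padded to stay in P_n.
    n = len(coeffs) - 1
    return np.array([k * coeffs[k] for k in range(1, n + 1)] + [0.0])

rng = np.random.default_rng(0)
f, g = rng.standard_normal(5), rng.standard_normal(5)   # two polynomials in P_4
c = 2.5

print(np.allclose(D(f + g), D(f) + D(g)))   # additivity: True
print(np.allclose(D(c * f), c * D(f)))      # homogeneity: True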
EXERCISE
Recall from Definition 7.3.2 that the trace tr(𝐴) of a square matrix 𝐴 is defined as the sum of the diagonal entries of 𝐴. Show that the function tr : 𝑀𝑛×𝑛(F) → F is a linear transformation.

As in Chapter 9, the linear transformations 𝑇 : V → V whose domain and codomain are the same set are especially interesting to us.

Definition 10.3.4 (Linear Operator)
A linear transformation 𝑇 : V → V is called a linear operator.

Just like linear operators on F𝑛 that we studied in Chapter 9, linear operators 𝑇 : V → V have many nice properties, such as the property that the composition of 𝑇 with itself, 𝑇 ∘ 𝑇, is well-defined, or that we can naturally introduce eigenvalues, eigenvectors, and eigenpairs for 𝑇. We will explore this second property in detail.

Definition 10.3.5 (Eigenvector, Eigenvalue and Eigenpair of a Linear Operator)
Let 𝑇 : V → V be a linear operator. We say that the nonzero vector 𝑥 ∈ V is an eigenvector of 𝑇 if there exists a scalar 𝜆 ∈ F such that 𝑇(𝑥) = 𝜆𝑥. The scalar 𝜆 is called an eigenvalue of 𝑇 over F, and the pair (𝜆, 𝑥) is called an eigenpair of 𝑇 over F.

Thus, if (𝜆, 𝑥) is an eigenpair of 𝑇 over F, then the action of 𝑇 on 𝑥 is especially simple, i.e., it is the result of scaling the vector 𝑥 by a factor 𝜆. In fact, this observation applies not only to 𝑥, but to any vector 𝑦 = 𝑐𝑥 with 𝑐 ∈ F that is a scalar multiple of 𝑥, because
\[ T(y) = T(cx) = cT(x) = c(\lambda x) = \lambda(cx) = \lambda y. \]

Example 10.3.6
Consider the differentiation operator 𝐷 : 𝑃𝑛(F) → 𝑃𝑛(F) from Example 10.3.2. Then the constant polynomial 𝑝(𝑥) = 1 satisfies
\[ D(p(x)) = D(1) = 0 = 0 \cdot p(x). \]
Since 𝑝(𝑥) is not the zero polynomial, we see that, for 𝜆 = 0, (𝜆, 𝑝(𝑥)) is an eigenpair of 𝐷 over F.

Example 10.3.7
If in Example 10.3.3 we assume that 𝑚 = 𝑛, then 𝑆 : 𝑀𝑛×𝑛(F) → 𝑀𝑛×𝑛(F) is a linear operator. Since the 𝑛 × 𝑛 identity matrix 𝐼𝑛 is diagonal, it remains fixed under the operation of matrix transpose, and so
\[ S(I_n) = I_n^T = I_n = 1 \cdot I_n. \]
Since 𝐼𝑛 is not the zero matrix, we see that, for 𝜆 = 1, (𝜆, 𝐼𝑛) is an eigenpair of 𝑆 over F.

As we learned in Chapter 9, given an arbitrary linear operator 𝑇 : F𝑛 → F𝑛, it may be quite difficult to understand the nature of its action. However, when 𝑇 has a basis of eigenvectors {𝑣1, . . . , 𝑣𝑛}, then it follows from Proposition 9.2.6 (Eigenvector Basis Criterion for Diagonalizability) that 𝑇 is diagonalizable, and so the action of 𝑇 is well understood in this case. We can use this criterion to introduce the notion of diagonalizability for linear operators V → V.

Definition 10.3.8 (Diagonalizable)
Let 𝑇 : V → V be a linear operator. We say that 𝑇 is diagonalizable over F if there exists a basis ℬ of V comprised of eigenvectors of 𝑇.

When 𝑇 is diagonalizable, the action of 𝑇 on an arbitrary vector 𝑥 ∈ V can be interpreted as scaling the vector 𝑥 by a factor of 𝜆𝑖 in the direction of an eigenvector 𝑣𝑖 of 𝑇. We summarize this observation in the following theorem, whose proof we leave as an exercise.

Theorem 10.3.9 (Diagonalizable Operators Viewed As Scalings)
Let 𝑇 : V → V be a diagonalizable linear operator. Let ℬ = {𝑣1, . . . , 𝑣𝑛} be a basis of V comprised of eigenvectors of 𝑇 and, for 𝑖 = 1, . . . , 𝑛, let 𝜆𝑖 denote the eigenvalue of 𝑇 corresponding to 𝑣𝑖. Then, for any 𝑥 ∈ V, if
\[ x = c_1v_1 + \cdots + c_nv_n, \qquad c_1, \ldots, c_n \in \mathbb{F}, \]
we have that
\[ T(x) = T(c_1v_1 + \cdots + c_nv_n) = \lambda_1c_1v_1 + \cdots + \lambda_nc_nv_n. \]
That is, the action of 𝑇 on 𝑥 can be viewed as the result of scaling 𝑥 by a factor of 𝜆𝑖 in the direction of 𝑣𝑖 for each 𝑖.

Example 10.3.10
Show that the linear operator 𝑆 : 𝑀2×2(F) → 𝑀2×2(F) given by 𝑆(𝐴) = 𝐴ᵀ is diagonalizable.

Solution: By observation, we find that
\[ S\!\left(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\right) = 1 \cdot \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad S\!\left(\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right) = 1 \cdot \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \]
\[ S\!\left(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right) = 1 \cdot \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad S\!\left(\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\right) = (-1) \cdot \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. \]
Thus,
\[ \left(1, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\right), \quad \left(1, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right), \quad \left(1, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right), \quad \text{and} \quad \left(-1, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\right) \]
are eigenpairs of 𝑆. As we have seen in Example 10.2.8, the set
\[ \mathcal{B} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} \]
is a basis of 𝑀2×2(F). Thus, ℬ is a basis of eigenvectors of 𝑆, which means that 𝑆 is diagonalizable.

By Theorem 10.3.9 (Diagonalizable Operators Viewed As Scalings), for any matrix 𝐴 ∈ 𝑀2×2(F) such that
\[ A = c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix}, \]
we have
\[ \begin{aligned} S(A) &= S\!\left(\begin{bmatrix} c_1 & c_3 + c_4 \\ c_3 - c_4 & c_2 \end{bmatrix}\right) \\ &= S\!\left(c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\right) \\ &= c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} - c_4\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \\ &= \begin{bmatrix} c_1 & c_3 - c_4 \\ c_3 + c_4 & c_2 \end{bmatrix}. \end{aligned} \]

EXERCISE
Recall Definition 6.4.2, where the adjugate of a square matrix was introduced. Show that the linear operator adj : 𝑀2×2(F) → 𝑀2×2(F) given by
\[ \mathrm{adj}\!\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]
is diagonalizable over F.

REMARK
Notice that the function adj : 𝑀𝑛×𝑛(F) → 𝑀𝑛×𝑛(F) is not linear when 𝑛 ≥ 3.

EXERCISE
Show that, for any positive integer 𝑛, the linear operator 𝑇 : 𝑃𝑛(F) → 𝑃𝑛(F) given by 𝑇(𝑝(𝑥)) = 𝑝(2𝑥) is diagonalizable over F.

Just like Example 10.3.10, the exercises above can be solved by observation, by finding the most "natural" basis of eigenvectors. However, in general, this method does not work: sometimes a linear operator 𝑇 : V → V may not be diagonalizable, and even if it is diagonalizable, the action of 𝑇 may be so complicated that a basis of eigenvectors is not obvious.

Example 10.3.11
Show that the linear operator 𝑇 : 𝑃2(C) → 𝑃2(C) given by
\[ T(a_0 + a_1x + a_2x^2) = (-a_1 - a_2) + (a_0 + a_2)x + a_2x^2 \]
is diagonalizable over C.

Solution: As an exercise, show that
\[ T(1 - x^2) = 1 - x^2, \qquad T(-1 + ix) = -i - x = i(-1 + ix), \qquad T(-1 - ix) = i - x = -i(-1 - ix), \]
so that we may conclude that (1, 1 − 𝑥²), (𝑖, −1 + 𝑖𝑥), and (−𝑖, −1 − 𝑖𝑥) are eigenpairs of 𝑇. As we have seen in Example 10.2.9, the set ℬ = {1 − 𝑥², −1 + 𝑖𝑥, −1 − 𝑖𝑥} is a basis of 𝑃2(C). Thus, ℬ is a basis of 𝑃2(C) comprised of eigenvectors of 𝑇, which means that 𝑇 is diagonalizable.

By Theorem 10.3.9 (Diagonalizable Operators Viewed As Scalings), for any polynomial 𝑝(𝑥) ∈ 𝑃2(C) such that
\[ p(x) = c_1(1 - x^2) + c_2(-1 + ix) + c_3(-1 - ix) = (c_1 - c_2 - c_3) + (ic_2 - ic_3)x - c_1x^2, \]
we have
\[ \begin{aligned} T(p(x)) &= T\bigl((c_1 - c_2 - c_3) + (ic_2 - ic_3)x - c_1x^2\bigr) \\ &= T\bigl(c_1(1 - x^2) + c_2(-1 + ix) + c_3(-1 - ix)\bigr) \\ &= c_1(1 - x^2) + ic_2(-1 + ix) - ic_3(-1 - ix) \\ &= (c_1 - ic_2 + ic_3) + (-c_2 - c_3)x - c_1x^2. \end{aligned} \]

In the solution to Example 10.3.11 a basis of eigenvectors ℬ was provided to you, but had we not provided it, finding ℬ would be a non-trivial task. Furthermore, with the techniques that we currently have, you would not be able to show that the linear operator 𝑇 from Example 10.3.11 is not diagonalizable over R, or that the operator 𝐷 from Example 10.3.2 is not diagonalizable over F.
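To see where such a basis could come from, one can encode 𝑇 from Example 10.3.11 as a 3 × 3 matrix with respect to the ordered basis {1, 𝑥, 𝑥²} of 𝑃2(C) and study that matrix instead. This is only a preview (not part of the original text) of the "ℬ-matrix" idea mentioned below, and the sketch assumes Python with SymPy. Since 𝑇(1) = 𝑥, 𝑇(𝑥) = −1, and 𝑇(𝑥²) = −1 + 𝑥 + 𝑥², the coordinate columns of these images form the matrix used here.

import sympy as sp

# Coordinate matrix of T from Example 10.3.11 relative to the ordered basis
# {1, x, x^2}: the columns are the coordinates of T(1), T(x), T(x^2).
M = sp.Matrix([[0, -1, -1],
               [1,  0,  1],
               [0,  0,  1]])

print(M.eigenvals())   # eigenvalues 1, I, -I, each with multiplicity 1, matching T

for val, mult, vecs in M.eigenvects():
    # e.g. the eigenvalue 1 gives a multiple of (1, 0, -1)^T, which is the
    # coordinate vector of the polynomial 1 - x^2 found in the text.
    print(val, vecs[0].T)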
This requires further generalization of the diagonalization theory, which includes the introduction of notions such as the kernel of a linear operator, the ℬ-matrix of a linear operator, geometric and algebraic multiplicities, and so on. This theory, along with many other fascinating topics, is studied in detail in a subsequent linear algebra course.