Notes on Linear Transformations

November 17, 2014

Recall that a linear transformation is a function $T \colon V \to W$ between vector spaces $V$ and $W$ such that

(i) $T(c\vec{v}) = cT(\vec{v})$ for all $\vec{v}$ in $V$ and all scalars $c$. (Geometrically, $T$ takes lines to lines.)

(ii) $T(\vec{v}_1 + \vec{v}_2) = T(\vec{v}_1) + T(\vec{v}_2)$ for all $\vec{v}_1, \vec{v}_2$ in $V$. (Geometrically, $T$ takes parallelograms to parallelograms.)

In the following, we will always assume that $T \colon V \to W$ is a linear transformation. Here is a summary of what will be discussed below.

(1) $T \colon V \to W$ is completely determined by how it acts on a basis $B = \{\vec{v}_1, \dots, \vec{v}_n\}$ of $V$.

(2) Every $T \colon \mathbb{R}^n \to \mathbb{R}^m$ is multiplication by a matrix $A$.

(3) $B$-coordinates turn $V$ into $\mathbb{R}^n$.

(4) To interpret $T \colon V \to W$ as multiplication by a matrix, we can choose bases $B = \{\vec{v}_1, \dots, \vec{v}_n\}$ of $V$ and $B' = \{\vec{w}_1, \dots, \vec{w}_m\}$ of $W$, which turn $V$ into $\mathbb{R}^n$ and $W$ into $\mathbb{R}^m$ so that multiplying by a matrix makes sense.

(5) An in-depth example: the derivative and integral are linear transformations.

(1) $T \colon V \to W$ is determined by how it acts on $B$.

Let $B = \{\vec{v}_1, \dots, \vec{v}_n\}$ be a basis of $V$. If $\vec{v}$ is any vector in $V$, then there is a unique way to write $\vec{v}$ as a linear combination of the vectors in the basis:
\[
\vec{v} = c_1\vec{v}_1 + \cdots + c_n\vec{v}_n.
\]
The linear combination exists since the basis vectors span $V$, and it is unique since the basis vectors are independent. Then, by linearity of $T$, we compute
\[
T(\vec{v}) = T(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) = c_1T(\vec{v}_1) + \cdots + c_nT(\vec{v}_n),
\]
which shows that every $T(\vec{v})$ is fully determined by the vectors $T(\vec{v}_1), \dots, T(\vec{v}_n)$. In particular, this shows that any two linear transformations that act the same way on all the basis vectors must in fact be the same linear transformation.

(2) Every $T \colon \mathbb{R}^n \to \mathbb{R}^m$ is multiplication by a matrix $A$.

When our vector spaces are $\mathbb{R}^n$ and $\mathbb{R}^m$, there is a matrix $A$ so that $T(\vec{v}) = A\vec{v}$ for all $\vec{v}$ in $\mathbb{R}^n$. This means that $T$ is just multiplication by the matrix $A$. To find this $A$, compute how $T$ acts on the standard basis vectors of $\mathbb{R}^n$ and use the resulting vectors as the columns of $A$. For instance, when $n = 3$, compute the vectors
\[
T\!\left(\begin{bmatrix}1\\0\\0\end{bmatrix}\right), \quad
T\!\left(\begin{bmatrix}0\\1\\0\end{bmatrix}\right), \quad
T\!\left(\begin{bmatrix}0\\0\\1\end{bmatrix}\right),
\]
which are in $\mathbb{R}^m$, and use these three vectors as the columns of $A$. Why does this give the right matrix? By (1), it is enough to check that $T$ and $A$ act the same way on all the standard basis vectors. But since
\[
A\begin{bmatrix}1\\0\\0\end{bmatrix} = \text{first column of } A = T\!\left(\begin{bmatrix}1\\0\\0\end{bmatrix}\right),
\]
and similar equations hold for the second and third columns of $A$, our definition of $A$ was exactly the right one to ensure that $A$ agrees with $T$ on the standard basis vectors.

(3) $B$-coordinates turn $V$ into $\mathbb{R}^n$.

In order to interpret $T \colon V \to W$ as multiplication by a matrix, we first need to make sure that $V$ looks like $\mathbb{R}^n$. This is necessary because matrices multiply column vectors, and the vector space $V$ could consist of vectors that bear no resemblance to column vectors (for instance, $V$ could be a vector space of polynomials). Similarly, we will need to make $W$ look like $\mathbb{R}^m$, because multiplying our matrix by column vectors "in $V$" will yield column vectors that are supposed to be in $W$.

Luckily for us, a basis is exactly the right thing to make a vector space look like $\mathbb{R}^n$. Let $B = \{\vec{v}_1, \dots, \vec{v}_n\}$ be a basis for $V$. As noted above, any vector $\vec{v}$ in $V$ can be uniquely expressed as a linear combination $\vec{v} = c_1\vec{v}_1 + \cdots + c_n\vec{v}_n$. The scalars $c_1, \dots, c_n$ are called the $B$-coordinates of $\vec{v}$, and we put them into a column vector
\[
[\vec{v}]_B = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}.
\]
The notation $[\vec{v}]_B$ means "the $B$-coordinates of $\vec{v}$", which is a vector in $\mathbb{R}^n$. By taking $B$-coordinates of all the vectors in $V$, we effectively turn $V$ into $\mathbb{R}^n$. (More precisely, taking $B$-coordinates is a linear map $[\,\cdot\,]_B \colon V \to \mathbb{R}^n$ that induces a one-to-one correspondence between the vectors in $V$ and the vectors in $\mathbb{R}^n$. Such linear maps are called isomorphisms.)
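To make the recipe in (2) concrete, here is a minimal Python sketch. The particular map $T \colon \mathbb{R}^3 \to \mathbb{R}^2$ below is a made-up illustration (it is not an example from these notes); the point is only the mechanics of building $A$ column by column from $T(\vec{e}_1), T(\vec{e}_2), T(\vec{e}_3)$.

```python
import numpy as np

# A sketch of the recipe in (2), assuming a made-up linear map T : R^3 -> R^2
# (this particular T is an illustrative choice, not one from the notes).
# We apply T to the standard basis vectors of R^3 and use the resulting
# vectors as the columns of A.
def T(v):
    x, y, z = v
    return np.array([2*x + y, y - z])

standard_basis = np.eye(3)                      # columns are e1, e2, e3
A = np.column_stack([T(e) for e in standard_basis.T])

v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A @ v, T(v))                 # A agrees with T on this vector
print(A)
```

Because $A$ and $T$ agree on the standard basis vectors, (1) guarantees that they agree on every vector; the assertion spot-checks this for one choice of $\vec{v}$.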
(4) Using $B$, $B'$ to interpret $T \colon V \to W$ as multiplication by a matrix.

Choose a basis $B = \{\vec{v}_1, \dots, \vec{v}_n\}$ of $V$ and a basis $B' = \{\vec{w}_1, \dots, \vec{w}_m\}$ of $W$. By taking coordinates, we can view any $\vec{v}$ in $V$ as a column vector $[\vec{v}]_B$ in $\mathbb{R}^n$. Similarly, any $\vec{w}$ in $W$ has an associated column vector $[\vec{w}]_{B'}$ in $\mathbb{R}^m$. Now our method in (2) will reveal that $T$ is multiplication by a matrix $A$. Following (2), we need to determine how $T$ acts on the standard basis vectors of $\mathbb{R}^n$. But the standard basis vectors are simply the $B$-coordinates $[\vec{v}_1]_B, \dots, [\vec{v}_n]_B$ of our basis $B$! Thus we're interested in computing $T(\vec{v}_1), \dots, T(\vec{v}_n)$. However, if we want to use the vectors $T(\vec{v}_1), \dots, T(\vec{v}_n)$ as the columns of $A$, we first need to turn them into column vectors; this is achieved by taking $B'$-coordinates. Thus the columns of $A$ are
\[
[T(\vec{v}_1)]_{B'}, \dots, [T(\vec{v}_n)]_{B'}.
\]
This definition of $A$ exactly ensures that
\[
A[\vec{v}_1]_B = [T(\vec{v}_1)]_{B'}, \quad \dots, \quad A[\vec{v}_n]_B = [T(\vec{v}_n)]_{B'},
\]
which is a confusing way of saying that $A$ agrees with $T$ when we turn $V$ and $W$ into $\mathbb{R}^n$ and $\mathbb{R}^m$. One thing to emphasize is that this matrix $A$ depends heavily on the chosen bases $B$ and $B'$. Different bases lead to different matrices.

(5) In-depth example: derivative and integral of polynomials.

Let $V$ be the vector space of polynomials in $x$ of degree at most 3, and let $W$ be the vector space of polynomials in $x$ of degree at most 2. Let $T \colon V \to W$ be the derivative $T(\vec{v}) = \frac{d\vec{v}}{dx}$. The fact that the derivative is linear is one of the basic properties you learned in Calculus I! For instance, linearity says that you can compute
\[
\frac{d}{dx}(1 + 2x - x^3) = \frac{d}{dx}(1) + \frac{d}{dx}(2x) - \frac{d}{dx}(x^3) = 0 + 2 - 3x^2 = 2 - 3x^2
\]
term-by-term in a manner that has hopefully become instinctive for you.

Let's try to view $T$ as multiplication by a matrix. First, we pick bases for $V$ and $W$. The nicest choices are $B = \{1, x, x^2, x^3\}$ for $V$ and $B' = \{1, x, x^2\}$ for $W$. With these nice bases, computing the coordinates of a polynomial $\vec{v}$ simply amounts to building a column vector out of the coefficients of $\vec{v}$. For instance,
\[
[1 + 2x - x^3]_B = \begin{bmatrix} 1 \\ 2 \\ 0 \\ -1 \end{bmatrix}
\quad\text{and}\quad
[2 - 3x^2]_{B'} = \begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix}.
\]
The columns of our matrix $A$ are
\[
[T(1)]_{B'} = [0]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad
[T(x)]_{B'} = [1]_{B'} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
[T(x^2)]_{B'} = [2x]_{B'} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}, \quad
[T(x^3)]_{B'} = [3x^2]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix},
\]
so
\[
A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}.
\]
To see how $A$ acts on a polynomial like $1 + 2x - x^3$, we first compute the $B$-coordinates $[1 + 2x - x^3]_B$ as above and then take the product
\[
A[1 + 2x - x^3]_B
= \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 0 \\ -1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix}
= [2 - 3x^2]_{B'},
\]
which agrees with our earlier calculation of how the derivative acts!

Note that the first column of $A$ is the only free column, so the nullspace of $A$ is
\[
N(A) = \operatorname{span}\!\left(\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}\right) = \operatorname{span}([1]_B).
\]
The column vectors in this nullspace correspond to the constant polynomials, which are indeed the kernel of $T$: the derivative of any constant is 0! Also, since $A$ has rank 3, the column space of $A$ has dimension 3, so the range of $T$ must be all of $W$. This means that every polynomial of degree at most 2 is the derivative of a polynomial of degree at most 3.
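As a quick sanity check, here is a minimal numpy sketch of the computation above; the translation between polynomials and coordinate vectors is done by hand, exactly as in the bases $B$ and $B'$.

```python
import numpy as np

# Derivative matrix A from (5), in the bases B = {1, x, x^2, x^3} and B' = {1, x, x^2}.
# Its columns are [T(1)]_{B'}, [T(x)]_{B'}, [T(x^2)]_{B'}, [T(x^3)]_{B'}.
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]], dtype=float)

v = np.array([1, 2, 0, -1], dtype=float)   # [1 + 2x - x^3]_B
print(A @ v)                               # [2, 0, -3], i.e. [2 - 3x^2]_{B'}
print(np.linalg.matrix_rank(A))            # 3, so the range of T is all of W
```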
Let's apply similar reasoning to the indefinite integral (the antiderivative). Using the same $V$ and $W$, the indefinite integral gives a linear transformation $S \colon W \to V$ defined by $S(\vec{w}) = \int \vec{w}\,dx$. Since the indefinite integral is only defined up to a constant $C$, we must make a choice, namely $C = 0$. Again, the linearity of the integral is one of the basic properties you saw in Calculus I (here it is important that we chose $C = 0$!). For instance, linearity allows us to compute
\[
S(2 - 3x^2) = \int (2 - 3x^2)\,dx = \int 2\,dx - \int 3x^2\,dx = 2x - x^3.
\]
Let's find the matrix $B$ for $S$ in the bases $B'$, $B$. The columns of $B$ are
\[
[S(1)]_B = [x]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
[S(x)]_B = \left[\tfrac{1}{2}x^2\right]_B = \begin{bmatrix} 0 \\ 0 \\ \tfrac{1}{2} \\ 0 \end{bmatrix}, \quad
[S(x^2)]_B = \left[\tfrac{1}{3}x^3\right]_B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix},
\]
so that
\[
B = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix}.
\]
Let's check how $B$ acts on $2 - 3x^2$:
\[
B[2 - 3x^2]_{B'}
= \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix}
\begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 2 \\ 0 \\ -1 \end{bmatrix}
= [2x - x^3]_B,
\]
which agrees with our above calculation.

Note that $B$ has rank 3, so $N(B) = \{\vec{0}\}$, so the kernel of $S$ is $\{0\}$: the only polynomial whose indefinite integral is 0 is the 0 polynomial. Moreover, $C(B)$ consists of all column vectors in $\mathbb{R}^4$ with 0 as their first coordinate, so the range of $S$ is all polynomials with no constant term: we made the choice to use the constant term 0 for our indefinite integrals back when we defined $S$.

One more thing to do is to think about what happens when we act by both $T$ and $S$, in either order. Since $S$ is the antiderivative,
\[
T(S(\vec{w})) = \frac{d}{dx}\int \vec{w}\,dx = \vec{w}
\]
leaves $\vec{w}$ unchanged (the composition $T \circ S$ is the identity transformation on $W$). To see how $T$ and $S$ act in the other order, let $\vec{v} = a + bx + cx^2 + dx^3$ be a general element of $V$. Then
\[
S(T(\vec{v})) = \int \frac{d\vec{v}}{dx}\,dx = \int (b + 2cx + 3dx^2)\,dx = bx + cx^2 + dx^3,
\]
which leaves $\vec{v}$ the same except for killing the constant term (the composition $S \circ T$ is a projection onto $\operatorname{span}(x, x^2, x^3)$).

These properties can be easily seen using the matrices $A$ and $B$. The composition $T \circ S$ corresponds to the matrix product $AB$, which yields
\[
AB = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
the identity transformation. Likewise,
\[
BA = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
is the projection matrix we described above. Although $A$ and $B$ are not inverses (since $A$ and $B$ are not square, there's no chance of them being invertible), $B$ is the pseudoinverse $A^+$ of $A$: the two compositions calculated above give projections onto the column space and row space of $A$. (See Section 7.3 of Strang for more about pseudoinverses.)
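Here is a small numpy sketch, using the matrices computed above, that verifies the composition and pseudoinverse claims numerically.

```python
import numpy as np

# Derivative matrix A and integral matrix B (with C = 0) from the example above.
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]], dtype=float)
B = np.array([[0,   0,   0],
              [1,   0,   0],
              [0, 1/2,   0],
              [0,   0, 1/3]], dtype=float)

print(A @ B)                              # 3x3 identity: T o S is the identity on W
print(B @ A)                              # projection onto span(x, x^2, x^3) in B-coordinates
print(np.allclose(B, np.linalg.pinv(A)))  # True: B is the pseudoinverse A^+ of A
```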