OVERVIEW OF REAL LINEAR ALGEBRA

LARRY SUSANKA

Date: April 13, 2014.

Contents

1. ✸ Opening Remarks
2. Sets and Set Operations
3. ✸ First Steps
4. Rn and Euclidean Space
5. ✸ The Association Between “Arrow-Vectors” and Rn
6. ✸ Position Vectors
7. Parametric and Point-Normal Equations for Lines and Planes
8. Systems of Linear Equations
9. Einstein Summation Convention
10. Matrices
11. The Identity and Elementary Matrices
12. Special Matrices
13. Row Reduction
14. Matrix Form For Systems of Linear Equations
15. Example Solutions
16. Determinants Part One: The Laplace Expansion
17. Determinants Part Two: Everything Else You Need to Know
18. Linear Transformations from Rn to Rm
19. Eigenvalues
20. Real Vector Spaces and Subspaces
21. Basis for a Vector Space
22. The Span of Column Vectors
23. A Basis for the Intersection of Two Vector Subspaces of Rn
24. ✸ Solving Problems in More Advanced Math Classes
25. Dimension is Well Defined
26. Coordinates
27. ✸ Position Vectors and Coordinates
28. Linear Functions Between Vector Spaces
29. Change of Basis
30. Effect of Change of Basis on the Matrix for a Linear Function
31. ✸ Effect of Change of Basis on a Position Vector
32. ✸ Effect of Change of Basis on a Linear Functional
33. Effect of Change of Basis on the Trace and Determinant
34. Example: How To Use Convenient Bases Efficiently
35. Bases Containing Eigenvectors
36. Several Applications
37. Approximate Eigenvalues and Eigenvectors
38. Nullspace, Image, Columnspace and Solutions
39. More on Nullspace and Columnspace
40. Rank Plus Nullity is the Domain Dimension
41. Sum, Intersection and Dimension
42. Direct Sum
43. A Function with Specified Kernel and Image
44. Inner Products
45. The Matrix for an Inner Product
46. Orthogonal Complements
47. Orthogonal and Orthonormal Bases
48. Projection onto Subspaces in an Inner Product Space
49. A Type of “Approximate” Solution
50. Embedding an Inner Product Space in Euclidean Space
51. Effect of Change of Basis on the Matrix for an Inner Product
52. A Few Facts About Complex Matrices and Vectors
53. Real Matrices with Complex Eigenvalues
54. Real Symmetric Matrices
55. Real Skew Symmetric Matrices
56. Orthonormal Eigenbases and the Spectral Theorem
57. The Schur Decomposition
58. Normal Matrices
59. Real Normal Matrices
60. The LU Decomposition
61. The Singular Value Decomposition
62. ✸ Loose Ends and Next Steps
Index

1. ✸ Opening Remarks.

In these notes1 I will give a kind of narrative of those ideas encountered in our one-quarter introductory Linear Algebra class, with additional material that could be used to fill out a semester course. They constitute a synopsis of the pertinent points in a natural order in a conversational but condensed style. This is what I would have liked to have said, and what I wish to have understood by successful students in the class.

To the student: You might use these notes as review for the final, or tick off the sections as we come to them to keep track of our progress. I also give advice here and there about how to think of these things, and how to study them, and tools and techniques you can use to solve the assigned problems.
You are well advised to read over these notes very carefully, at least the sections we actually cover. They tell you what I think is important and will, if nothing else, tip you off as to potential test questions. Mathematics is a foreign language, and you cannot understand that language without a dictionary. This is particularly true for those studying Linear Algebra for the first time. Step one must be to learn vocabulary: I recommend “flash cards” with the technical words you encounter on one side and the definitions (perhaps with one or two key examples) on the other. The notes here can be a guide for the creation of these flash cards. Without memorizing the vocabulary we use you cannot even participate in the discussion. You can’t succeed. If you do memorize the vocabulary, the difficulties become manageable. Really. In this class you are being asked to do things that are different in character from the math you have—most likely—seen before. Calculation is part of the job, but here only part. The real question is to determine which calculation needs to be done, and for what purpose. You will be asked to (gasp) prove2 things. The class is not about calculations alone, though we will learn to do some rather intricate ones, but the ideas which inspire them as well. Linear Algebra is a case study in the incredible power of generalization employed in service of certain types of practical problems. These are situations which are unmanageably complex in their natural context, but when stripped of unnecessary extra features are seen to be special cases of vector spaces, linear functions, inner products and so on. All of our theorems and compact notation then apply. 1Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. 2 You will not, of course, be asked to prove theorems. You will be asked to understand what a few theorems state. You will need to verify that the conditions in the statement of the theorem hold, so that the theorem actually applies to what you are doing. You will then draw a conclusion. 8 LARRY SUSANKA Linear Algebra is not about Kirchhoff’s Laws, analysis of rotating bodies, a model of an economy with n products, sequences defined recursively, representation of forces in the structural members of a building, maximum or minimum of functions with many independent variables, Quantum Mechanics, signal analysis and error correction, the study of curved surfaces, Markov probability models, analysis of n different food items for m different nutrient values, stresses inside a crystal, models of ecosystems or approximations in the study of air-flow over a wing. Instead, it is about techniques absolutely necessary to study all those things efficiently, and many many more, all at once, with unified vocabulary and technique. Your job here is to learn that vocabulary and technique. Of necessity, most of the detail work must be done with hardware assistance, a computer program such as Maple, Mathematica, MATLAB or Mathcad or, preferably (for us) a calculator. Embrace this necessity. You should be able in principle to do everything by hand but the goal is to find ways to do nothing whatsoever by hand, beyond checking here and there to make sure your calculator has not meandered off into the ozone without you. The calculations tend to become far too complex for humans as the size “n” of the problems rise. 
Even if it is possible to do a particular problem by hand (because an idea is being illustrated with a simple or carefully “rigged” example) you should try in every case to find a way to do it also with hardware. Let me be blunt: The correct amount of arithmetic to perform in a calculation in this class is little (scales with n) or none (read off or enter coefficients.) Most other calculations require a number of steps that scales with n2 or n3 (hard but machinery can do it) or n! (essentially impossible except in trivial cases.) If you find yourself wandering into a mess, think again and try another way. 2. Sets and Set Operations. In all human languages, the language of mathematics included, the words are defined in terms of each other. Ultimate meaning is derived from introspection or pointing out the window. Unlike other human languages, modern mathematics tries to keep strict control over this process and reduce the number of undefined “ultimate root” words to a minimum. The “grammar” of mathematics is designed to help us catch inconsistencies and logical error. Two undefined items get mathematics started: the first is the word “set” and the second is the “element of ” binary operation. One is supposed to simply know what they mean, without definition. Virtually all of mathematics is expressed in terms of the theory that comes from these two undefined objects: Set Theory. OVERVIEW OF REAL LINEAR ALGEBRA 9 I think of a set as a bag that can contain things. The things in the bag are called the elements of the bag. A set is completely defined by the things it contains. There are not “red bags” and “yellow bags” that contain the same things but are different. If they have the same elements they are the same set. The statement “a is an element of the set B,” written in math notation “a ∈ B,” will be true if, when you peek into B, you see a in there. The relation A ⊂ B between sets is defined in terms of the “element of” relation. A ⊂ B if every element of A is an element of B. So B contains, at least, all the elements of A and possibly others too. It is read aloud “A is a subset of B.” One can write this in shorthand as A⊂B if and only if a ∈ A ⇒ a ∈ B. The right arrow stands for the word “implies.” The “if and only if” phrase has a couple of uses. The first time you see it in use with some new mathematical object it says “we are about to define something.” After that, it indicates a theorem to be proved. There are two ways we will define sets in these notes. First, we can list their elements as { 1, 2, 3, 4 }. The curly brackets can be thought of as the “paper” of the bag. This set has four elements, the numbers 1, 2, 3 and 4. The next way is to describe the elements in some unambiguous fashion. The notation for that is something like this: { x | x is a whole number between one and four, including one and four. } The x to the left of the vertical bar is a variable letter that represents any potential member of the set. It could be any unused symbol. The vertical bar is read out loud as the words “such that.” After that is the condition you will use to test x for membership in the set. If the statement is true for a given x, that x is “in.” Otherwise it is “not in.” A slight variant of this approach is to give some information about the type of mathematical object you expect x to be to the left of the vertical bar. For instance we let R denote the set of all real numbers and N denote the natural numbers: the positive counting numbers and zero. 
Then { x ∈ R | 1 ≤ x < 5 } is the interval of real numbers [1, 5), closed on the left and open on the right. On the other hand { x ∈ N | 1 ≤ x < 5 } is the set { 1, 2, 3, 4 }. 10 LARRY SUSANKA This can be kind of tedious, but it has virtues. For instance sometimes you see sets given by a notation that sort of “splits the difference” between the two approaches. The notation { 1, 2, . . . } means . . . what exactly? Most people would say it stands for the counting numbers. Others might think it stands for integer powers of two, starting with 20 . Others might even be more creative. Use this last notation at your own risk. It is of necessity somewhat ambiguous. If A and B are two sets we can create two new sets from them, A ∩ B and A ∪ B. They are called the intersection of A and B, and the union of A and B, respectively. They are defined as A ∩ B = { x | x ∈ A and x ∈ B } and A ∪ B = { x | x ∈ A or x ∈ B }. The word “and” in the definition of intersection has its usual English meaning. x is in the intersection when it is in both A and B. But the word “or” in the definition of union means something a bit different from its English usage. In math, “or” is not exclusive. So x is in the union if it is in A or it is in B, or both if that is possible. One interesting set is the “empty set.” It is the empty bag, the bag with nothing in it. It is denoted ∅. The statement “a ∈ ∅” is always false. The statement “∅ ⊂ B” is always true. That is just about all you need to know about sets for now, except for one more thing. We will be talking about “ordered sets” and these have some additional structure. A raw set has no order. { 1, 2, 3, 4 } = { 2, 1, 3, 4 } = { 2, 2, 2, 1, 3, 4 }. So in this class when we talk about an ordered set we will mean a set together with something extra: an ordered labeling of the elements, a unique integer label for each element, which must be specified along with the set. 2.1. Exercise. Find the following sets: T T (i) N { x ∈ R | .5 < x < T 2.5 }. (ii){ x ∈ R | x < 2 } { x ∈ S R | x > 3 }. (iii) { x ∈ N | x/6 ∈ N } { x ∈ R | x < 25 } (iv) { 1, 3, 5 } { 2, 7, 9 } OVERVIEW OF REAL LINEAR ALGEBRA 11 3. ✸ First Steps. Most people taking this class3 have been introduced to vectors at some point—they are essential in the physical sciences and it is customary to discuss them in various earlier math classes too. This class is entirely concerned with them, start to finish. A vector is an object completely characterized by two quantities, which we call magnitude and direction. The physical meaning of these quantities in an application of vectors comes from experience and varies from application to application. There are a number of things in the world with which you are no doubt familiar that are commonly represented as vectors, and you should think a bit about the meaning of “magnitude” and “direction” in each case. • Displacement—a representation of a movement from a starting place to an ending place, with emphasis on the “distance and direction from start-to-end” of the completed movement rather than how or where it occurred. • Velocity—a description of motion, whose magnitude is the speed and whose direction “points the way.” The velocity would be the displacement vector over one time unit, if the motion continued unchanged for the whole time unit. • Forces—these describe “pushes” by one thing against another. A force is the cause of acceleration. If you see changes in the motion of something, it is because there is a force acting on that thing. 
No such changes require that the resultant of all forces have zero magnitude. • A representation of a uniform Wind or Current—in the air or water. This example is tied to velocity. It can be interpreted as the velocity of a dust particle swept up and carried along by an unvarying wind or current. Why should any mathematical construction (in our case, vectors) describe faithfully a category of real-world experiences? I don’t know, really. It is an interesting philosophical puzzle. But many mathematical abstractions do, at least reasonably well. It is only through experience, conjecture bolstered by many experiments, that we (i.e. physicists, engineers, you, me) can decide that a mathematical entity is a reasonable tool to try to describe something we see and want to understand better. We are going to study in this class how vectors behave. Only you can decide if vectors mimic well aspects of the world, such as those on the list above. Mostly we will be working with pretty abstract mathematical objects, ordered triples of real numbers, matrices and so on, and referring to them as vectors and studying their properties. Precise definitions are necessary in 3Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. 12 LARRY SUSANKA mathematics, but we also need at least one way (preferably more than one way) of thinking about these abstract definitions, to guide our calculations and tell us which ideas are important and which are not. Luckily, there are several excellent visualization guides of this kind for vectors and their collective, vector spaces. In this particular section we represent vectors as “arrows,” one of the intuitive notions students usually encounter in earlier presentations, but bear in mind these are not the more rigorous, more abstract, definitions that will come later. They are (and throughout the text continue to be) visual or intuitive aids to help us think about the ideas we encounter. In a sense they are more basic, more primitive, more real than these later ideas. You can draw or even buy an arrow, but an ordered triple of real numbers is hard to find in a store. Your instructor will, no doubt, illuminate the later more abstract discussion with numerous references to the pictures we see in this section. During a discussion like that, the word “displacement” might be used synonymously with “arrow-vector.” An arrow has direction given by its shaft with specified “tail-to-tip” orientation, and magnitude given by its length. We take the point of view that two arrows located anywhere are merely instances of the “same” arrow-vector, so long as each has the same length and direction. So the arrow to the left represents the “same” arrow-vector (or simply “vector,” for short) as the one on the right above, even though it is located at a different place. You have plenty of experience with “fuzzing out” the distinctions among things which are manifestly different but which exhibit similarities upon which we wish to focus. For example, the fractions 3/4 and 6/8 represent different ideas. In the first, you break the “whole” into 4 equal pieces, and you have 3 of them. In the second, you break the “whole” into 8 equal pieces, and you have 6 of them. With these differences, there is something important that is similar about these two fractions: namely, I am just as full if I eat 3/4 of a pizza or if I eat 6/8 of a pizza. We choose to focus on that and we say 3/4 = 6/8. 
We gather together all the fractions “equal” to 3/4 and refer to the entire pile of them by picking any convenient representative, such as that in lowest terms or with some specified denominator. Similarly, when we speak of vectors represented using arrows, the arrows above are different on the page, but by picking one of them we refer to both, and any other arrow with the same magnitude and direction as well! OVERVIEW OF REAL LINEAR ALGEBRA 13 There is a phrase used for this type of thing in mathematics. It is called an equivalence relation. All the fractions equivalent to 3/4 are said to be in the same equivalence class of fractions. The arrows equivalent to a given specific arrow form, all together, an arrow-vector. Two vectors A = and B = are added by finding the copy of B which has its tail on the nose of a copy of A. The sum vector A + B consists of the arrow that starts with its tail at the tail of this copy of A and ends with its tip at the tip of this copy of B, together with all other arrows with the same magnitude and direction. It is important to note the direction of A + B, in this case from left to right. −B is the vector that looks just like B but with tip and tail switched: B= −B = A positive constant k times a vector A is a new vector pointing in the same direction as A but with length stretched (if k > 1) or shrunk (if k < 1) by the factor k. Negative multiples of A are said to have direction opposite to A. 2A = and (1/2)A = A − B is defined to be A + (−B). So A − B is the vector on the left of the picture: 14 LARRY SUSANKA The process of adding two vectors is called vector addition. Multiplying a vector by a constant is called scalar multiplication. The vector with zero magnitude is hard to represent as an arrow: it is called the zero vector, denoted 0. It doesn’t really have a direction—or perhaps it has any direction. You pick. Context helps distinguishes it from the number 0. 3.1. Exercise. You should satisfy yourself that: A + B = B + A, A − A = 0 and that 2A = A + A. On the far left is a picture of the vector sum 2A − 3B. A vector such as this, formed as a sum of multiples of vectors, is often referred to as a resultant vector. It is also called a linear combination of the vectors involved, in this case A and B. 3.2. Exercise. (i) Draw a picture of 3C − 2D and C + 21 D where C and D are given by: (ii) Find the resultant of the linear combination 2V − 3W when V and W are shown below: OVERVIEW OF REAL LINEAR ALGEBRA 15 Now that we know how to add vectors, combine them into a single resultant vector, a natural next step might be to see how we can break them into pieces in various ways. We will think about how to decompose a vector into the sum of two others. One of these will be a multiple of a specified vector and the other perpendicular to that specified vector. This is a very important process in applications. The process of finding the “parallel part” is called projection. In order to create the picture of such a decomposition you need to know only one thing not present in the first part of this section. You must have a concept of perpendicularity, and be able to tell when two arrows are perpendicular to each other by some method. In this section, the old standby “eyeball” method will suffice. A common and very important usage of vector decomposition occurs when we consider force vectors. A classic example would be that of a box sliding down a slanted board. The most obvious force here is the weight of the box. 
But that force is directed straight down and the surface of the board prevents movement in that direction. The right way to handle this is to decompose the force caused by gravity into two perpendicular pieces: the part that is straight into the surface of the board (the source of friction) and the part that points along the line of the board. It is only this last part which makes the box slide. 16 LARRY SUSANKA Whatever the source of the vectors involved may be, we will draw some pictures here to see “how to do it.” We want to learn how to decompose a vector V into the sum of a vector P which is a multiple of some vector W and another vector V − P which is perpendicular to W . We call the second vector V − P because whenever V = P + A it must be that A = V − P, so there is no point in introducing an independent name at this point for the perpendicular part of the decomposition. Find below three different decompositions of this type in picture form. Notice two things about the pictures above: First, in each case P and V − P are perpendicular to each other. Second, P is a multiple of W . In constructing the decomposition, the length of W is irrelevant. The only thing that is important about W is the “slope” of its shaft. To create the decomposition, draw a fresh picture of a V and W pair from one of the pictures above on a sheet of paper. V should be somewhere in the middle and W off to the side for reference. Draw a dotted line through the tail of the copy of V . The dotted line must go along the same direction as W. Extend this dotted line a good bit on either side of V , across the whole paper. Next lay your pencil down on the paper. Put the eraser on this dotted line with the point on the same side as V is on. Make the shaft of the pencil perpendicular to this dotted line. This is where you need to know about perpendicularity. Slide the pencil up or down the dotted line, keeping it perpendicular to the dotted line, till the pencil tip points at the tip of V or the shaft crosses the tip of V . Stop. This gives the decomposition. OVERVIEW OF REAL LINEAR ALGEBRA 17 3.3. Exercise. (i) Draw a picture showing the decomposition as described above for the indicated V and W on the left. (ii) Draw a picture showing the decomposition as described above for the indicated V and W on the left. (iii) Decompose V into the sum of two vectors, one “along the line” of W , and the other perpendicular to W . There are a couple of points I would like to emphasize (or re-emphasize) before getting down to business. First, each instance of a vector in the world actually occurs at some specific spot, and whenever an arrow is drawn it is drawn somewhere specifically. When we think of something in the world as a vector, we take the point of view that any representative refers not only to itself but to all others with the same magnitude and direction too. When you refer to 7/3 you are often making a statement about 21/9 at the same time even without mentioning that second fraction. Second, a given push (a force) is a real thing that exists however we decide to describe it. The wind is just whatever it is and doesn’t need us to tell it that it is 30 miles per hour from the North. A displacement across a room is a real thing, in itself. But in physics and other classes we try to describe things, often using mathematics and numbers. 
This association always involves a huge pile of assumptions including, for example, a choice of a distance unit, a time unit, an “origin,” directions for coordinate axes and methods for measuring lengths and angles and the passage of time and some way of gauging the magnitude of a “push” and on and on. There is also a conceptual framework, frequently generated by the esthetic sensibilities of the creators of the model, which helps us think about the measurements. It is not always clear which among the conceptual underpinnings are necessary, or even if they are consistent. Our description depends not only on the real thing, but on all these choices involved in a representation too.

When we go through this process of assembling a model of something in the world we must never forget that the map is not the territory. A nickname for a thing is not the thing itself. The universe names itself, and whatever shorthand we use to describe part of it leaves out almost everything.

In applications we must always be looking “out the window” to make sure the world is still answering to our nickname for it. It is astounding how often, over the last four centuries, it comes when we call. We must be doing something right.

4. Rn and Euclidean Space.

Now that we have in mind the typical visual representation of vectors, and some important subjects usually studied using vectors, it is time to get more precise.

Rn is defined to be the set of columns or “stacks” of real numbers which are n real numbers high. A column is a type of matrix. (We will talk about matrices of various shapes a lot more later.) The real numbers in a column like this are called entries or coordinates.

You add columns, if they are the same height, by adding corresponding entries. You multiply columns by real numbers (called scalars) by multiplying all the entries by that number. So

    ( x1 )   ( y1 )   ( x1 + y1 )              ( x1 )   ( cx1 )
    ( x2 ) + ( y2 ) = ( x2 + y2 )    and    c  ( x2 ) = ( cx2 ) .
    (  ⋮ )   (  ⋮ )   (    ⋮    )              (  ⋮ )   (  ⋮  )
    ( xn )   ( yn )   ( xn + yn )              ( xn )   ( cxn )

Adding columns of different heights is not defined. As is customary in algebra, subtraction of columns is defined by adding “minus one times” the subtracted column to the first column.

We will refer to members of Rn, in a context where these operations are important, as vectors. The addition operation will then be called vector addition. The multiplication is called scalar multiplication. A sum or difference of scalar multiples of vectors is called a linear combination of the vectors involved. This linear combination, written as a single vector, is sometimes called the resultant of the linear combination.

For purely typographical reasons we often denote members of Rn (columns) by rows

                              ( x1 )
    (x1, x2, . . . , xn) ⇐⇒   ( x2 )
                              (  ⋮ )
                              ( xn )

with intent taken from context. These are not to be mistaken for row matrices, which will come up later, and which have no commas separating entries4.

We define ei by

         ( 0 )
         ( ⋮ )
    ei = ( 1 )  ←− ith row
         ( ⋮ )
         ( 0 )

where the 1 is located in the ith spot, counting down from the top, and all other entries are 0. The set of these ei will be denoted

    En = { e1, . . . , en }.

This set is called the standard basis of Rn.

Note the ambiguity in the notation here: e1 as an element of En is not the same vector as e1 as an element of Ek unless n = k. The zero vector, that is, the vector whose entries are all 0, suffers from the same ambiguity. The zero vector of any size is denoted 0 and usually distinguished from the number 0 by context or the use of bold type.
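If you want to experiment with these operations on a machine, here is a minimal sketch in Python using the numpy library (numpy is my choice here, not a tool these notes rely on; any of the programs mentioned in Section 1 would do as well, and the particular entries are chosen only for illustration). It carries out vector addition, scalar multiplication and the expansion of a member of R3 in the standard basis.

    import numpy as np

    # Two members of R^3, stored as arrays of their entries
    # (entries chosen only for illustration).
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, -1.0, 0.5])

    # Vector addition and scalar multiplication act entry by entry.
    print(x + y)        # [5.  1.  3.5]
    print(2.5 * x)      # [2.5 5.  7.5]

    # The standard basis e_1, e_2, e_3 of R^3: the columns of the identity.
    e = np.eye(3)

    # Any x is the linear combination x1 e_1 + x2 e_2 + x3 e_3.
    rebuilt = x[0] * e[:, 0] + x[1] * e[:, 1] + x[2] * e[:, 2]
    print(np.allclose(rebuilt, x))    # True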
Any vector x can be written as the resultant of one and only one (except for order of summands) linear combination of these standard basis vectors e1, . . . , en:

    x = x1 e1 + x2 e2 + · · · + xn en = Σ_{i=1}^{n} xi ei.

We will usually use bold lower case letters for members of Rn, while the entries will usually be the same letter in normal type, with subscripts or superscripts to indicate position in the column.

Sometimes in R2 or R3 the vectors e1, e2 or e3 are denoted ~i, ~j or ~k. In that case, usually the entries of a vector x are denoted x, y or z rather than x1, x2 or x3.

We will mostly use subscripts or superscripts. That allows us to write down, so far as possible, expressions whose form does not depend on dimension.

4This convention is far from universal, even among works by the same author—though one might hope for consistency within a given text. In my notes on tensors, for instance, we take a different point of view.

We define dot product between members v and w of Rn by

    v · w = v1 w1 + · · · + vn wn = Σ_{i=1}^{n} vi wi.

The dot product has many useful properties. For instance, the dot product ei · ej equals 1 if i = j and 0 otherwise. Dot product is commutative and distributes over vector addition:

    v · w = w · v    and    u · (v + w) = u · v + u · w.

Also if c is a scalar, c(v · w) = (cv) · w = v · (cw).

We define magnitude or norm (synonymous) of a vector u by

    ‖u‖ = √(u · u).

The distance between vectors u and v is defined to be ‖u − v‖. If c is a scalar, ‖cu‖ = |c| ‖u‖.

The norm satisfies the very important Cauchy-Schwarz and triangle inequalities: for all pairs of vectors v and w

    |v · w| ≤ ‖v‖ ‖w‖    and    | ‖v‖ − ‖w‖ | ≤ ‖v + w‖ ≤ ‖v‖ + ‖w‖.

The proof of the second of these follows easily from the first. The proof of the Cauchy-Schwarz inequality is given in the next calculation.

The inequality |v · w| ≤ ‖v‖ ‖w‖ is obviously true when v · w = 0. We assume, then, that this dot product is nonzero so, in particular, neither v nor w is the zero vector.

    0 ≤ ( v − ((v · w)/(w · w)) w ) · ( v − ((v · w)/(w · w)) w )
      = v · v − 2 ((v · w)/(w · w)) (w · v) + ((v · w)/(w · w))² (w · w)
      = v · v − (v · w)²/(w · w).

So

    (v · w)²/(w · w) ≤ v · v

and the Cauchy-Schwarz inequality follows.

The angle θ between members v and w of Rn is defined by the equation

    v · w = ‖v‖ ‖w‖ cos(θ).

In the plane applied to unit vectors (corresponding to points on the unit circle) the proof of this is nothing more than the difference formula for cosines:

    ( cos(β), sin(β) ) · ( cos(α), sin(α) ) = cos(β) cos(α) + sin(β) sin(α) = cos(β − α).

That the angle formula “works” for any pair of nonzero vectors in the plane, not just unit vectors, is an exercise. To justify the formula in Rn when n > 2 requires some thought. An argument can be made using the lengths of the edges of a triangle formed by the two vectors involved. For now you can just regard this as the definition of angle in these cases.

So two vectors are perpendicular exactly when their dot product is zero.

It sometimes is useful to be able to produce vectors perpendicular to a given vector, and this dot product idea makes it easy. For instance to find a vector perpendicular to (3, −4) you switch entries and change one sign. So (4, 3) works. In higher dimensions you kill (i.e. replace by 0) all but two entries and apply this rule to these two. So (4, 3, 0) and (7, 0, −3) and (0, −7, −4) are all perpendicular to (3, −4, 7).
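The dot product facts above are easy to check numerically. Here is a small sketch, again in Python with numpy (an assumed choice of tool, with entries chosen only for illustration), which verifies one of the perpendicular pairs just mentioned, computes norms and a distance, and checks the Cauchy-Schwarz inequality and the angle formula on one example.

    import numpy as np

    v = np.array([3.0, -4.0, 7.0])
    w = np.array([4.0, 3.0, 0.0])

    # Dot product, norm and distance.
    print(np.dot(v, w))              # 0.0, so v and w are perpendicular
    print(np.linalg.norm(v))         # the magnitude of v, here sqrt(74)
    print(np.linalg.norm(v - w))     # the distance between v and w

    # Cauchy-Schwarz: |a . b| <= ||a|| ||b|| for any pair of vectors.
    a = np.array([1.0, 2.0, -1.0])   # entries chosen only for illustration
    b = np.array([0.5, 3.0, 2.0])
    rhs = np.linalg.norm(a) * np.linalg.norm(b)
    print(abs(np.dot(a, b)) <= rhs)  # True

    # The angle between a and b, from  a . b = ||a|| ||b|| cos(theta).
    theta = np.arccos(np.dot(a, b) / rhs)
    print(np.degrees(theta))         # the angle, reported in degrees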
Euclidean Space of dimension n is Rn with the operations of vector addition and scalar multiplication together with this particular notion of distance and angle, given by dot product. A unit vector (that is, a vector one unit long) in Euclidean Space has coordinates ( cos(θ1 ), cos(θ2 ), . . . , cos(θn ) ) where θi is the angle between the vector and the “ei axis.” Use dot product to justify this. The numbers cos(θi ) are called direction cosines. We define projection onto a line through the origin, containing the nonzero vector w to be v·w P rojw (v) = w. w·w Note that P rojw ((P rojw (v))) = P rojw (v) for all v. If v is perpendicular to w then P rojw (v) = 0. If v is a multiple of w then P rojw (v) = v. We use the decomposition v = P rojw (v) + (v − P rojw (v)) for several things. It gives the vector v as the sum of two vectors. One is “along the line of” the vector w. The other is perpendicular to that line. The projection onto the plane5 through the origin perpendicular to the vector w is the function 5 If n > 3 we might call this projection onto a “hyperplane.” In dimension 2, this is projection onto the line perpendicular to w. 22 LARRY SUSANKA CoP rojw (v) = v − P rojw (v). Note that CoP rojw (CoP rojw (v)) = CoP rojw (v) for all v and if v is perpendicular to w then CoP rojw (v) = v. If v is a multiple of w then CoP rojw (v) = 0. We define reflection in a plane perpendicular to vector w to be the function Ref lw (v) = −P rojw (v) + (v − P rojw (v)) = v − 2 P rojw (v). Note that Ref lw (Ref lw (v)) = v for all v and if Ref lw (v) = v then v is perpendicular to w. Also Ref lw (w) = −w. We define cross product between members v and w of R3 by v × w = ( v 2 w3 − v 3 w2 , v 3 w1 − v 1 w3 , v 1 w2 − w1 v 2 ). An easy calculation using dot product shows that v × w is perpendicular to both v and w. Also c(v × w) = (cv) × w = v × (cw) for any scalar c. It is a fact that v × w = −w × v. So cross product is not commutative. Except for special examples (u × v) × w 6= u × (v × w). Cross product is not associative. Cross product does obey both left and right distributive laws with respect to vector addition: u × (v + w) = u × v + u × w and (v + w) × u = v × u + v × w. In general, if the angle between v and w is θ, kv × wk = kvk kwk sin(θ) so the magnitude of v × w is the area of the parallelogram formed using v and w. This last formula is proved via a messy collection of terms to verify the left equation in kvk2 kwk2 − kv × wk2 = (v · w)2 = kvk2 kwk2 cos2 (θ). It is also a fact, useful in Calculus, that the number |u · (v × w)| gives the value of the volume of the “bent box” (i.e. parallelepiped) determined by the three edges u, v and w. The number u · (v × w) is called the triple product of u, v and w. 4.1. Exercise. Find the resultant of the linear combination 2(1, 3) − 3(−2, 1). OVERVIEW OF REAL LINEAR ALGEBRA 23 4.2. Exercise. (i) When will v · w = kvk kwk? (ii) Use the Cauchy-Schwarz inequality to prove the triangle inequality. 4.3. Exercise. (i) Find the angle between (7, 1, 0, 5) and (1, 1, 1, 0). (ii) Find four vectors perpendicular to (1, 3, −8, −2). (iii) Create a unit vector pointing in the same direction as (1, 3, −8, −2). (iv) What is the cosine of the angle between (1, 3, −8, −2) and the e4 axis? 4.4. Exercise. (i) Show that c(v · w) = (cv) · w = v · (cw) and that v · w = w · v for scalar c and vectors v and w. (ii) Show that (v + u) · w = v · w + u · w for vectors v, w and u. 4.5. Exercise. 
Decompose v = (1, 3, 5) into the sum of two vectors: one “along the line of ” w = (−1, 1, 2) and the other perpendicular to that line. 4.6. Exercise. (i) Calculate P rojw (e1 ) for w = (1, −1, 3). (ii) Calculate CoP rojw (e2 ) for w = (1, −1, 3). (iii) Calculate Ref lw (e3 ) for w = (1, −1, 3). 4.7. Exercise. (i) Show that P rojw (v + cu) = P rojw (v) + cP rojw (u) for any nonzero vector w and scalar c and any vectors v and u. (ii) Show P rojw (P rojw (v)) = P rojw (v). 4.8. Exercise. (i) Show that CoP rojw (v+ku) = CoP rojw (v)+kCoP rojw (u) for any nonzero vector w and scalar k and any vectors v and u. (ii) Show CoP rojw (CoP rojw (v)) = CoP rojw (v). 4.9. Exercise. (i) Show that Ref lw (v + cu) = Ref lw (v) + cRef lw (u) for any nonzero vector w and scalar c and any vectors v and u. (ii) Show Ref lw (Ref lw (v)) = v. 4.10. Exercise. In R3 , calculate ~i · ~j × ~k and ~k · ~i × ~j and ~j · ~k × ~i . 4.11. Exercise. (i) Find a vector perpendicular to both (3, 5, 1) and (1, 2, 3). (ii) Find an example to show that (u × v) × w need not equal u × (v × w). (iii) Show that cross product does obey distributive laws with respect to vector addition: u × (v + w) = u × v + u × w. (iv) Show that u · (v × w) = w · (u × v) = v · (w × u). 24 LARRY SUSANKA 4.12. Exercise. (i) Find the area of the parallelogram determined by edges (1, 1, 1) and (1, 3, 2). (ii) Find the area of the parallelogram determined by edges (1, 1, 0) and (1, 3, 0). Why did I ask this question, since it is so similar to the last? (iii) Find the volume of the parallelepiped determined by edges (1, 1, 1), (1, 0, 2) and (3, 6, 7). 5. ✸ The Association Between “Arrow-Vectors” and Rn . It is time to make a connection6 between vectors represented in Rn as in the last section, and our earlier ideas typified by the quasi-physical “arrowvectors” from Section 3. We identify a part of our universe as worthy of study, and call that, below, “our world.” In order to make this work, we need to have quite a bit of certain types of information about our world. We have to be able, for instance, to understand how to move around in our world, and measure properties of specific displacements there. We have to be able to “parallel-transport” one of our specific displacements to an “equivalent” specific displacement that starts where any other particular displacement ends. Thought of as arrows, we have to be able to move one arrow so that its tail is at the tip of any other. Looking at the collections of equivalent displacements, which we will call arrow-vectors, we have to be able to consistently extend an arrow-vector (multiply by a scalar) or to combine several of them into a single arrowvector. And if A = B + C represents arrow-vector A as a sum of two other arrow-vectors then it must be true, for instance, that 2A = 2B + 2C. It is entirely possible that we could be deluding ourselves about our ability to do or know these things. Nevertheless, we suppose that we can do and know them. If we are wrong, we will find out sooner or later, because our model will make predictions that are contrary to what we see7. If we study our world and realize that we cannot make any displacement at all then we are confined to a point. Our world is “zero dimensional” and we could identify our only displacement vector with the set {0}: not a very interesting situation, and it is hard to imagine anyone finding that worthy of study, but there you are. We might find, though, that we can move in the world. We pick a displacement W that is to be our measuring stick. 
W is not very abstract at all. A representative is sitting in front of our nose: our standard movement 6Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. 7That could be really important information, and discoveries like that have led to Nobel Prizes. OVERVIEW OF REAL LINEAR ALGEBRA 25 in one possible direction. As we examine other displacements we might find that every single one is a multiple of W . There is only one independent direction in this world. We say our world is one dimensional, our movements are “confined to a line.” So every displacement A is of the form A = a1 W for some real number a1 and we associate the displacement A with the real number a1 in R = R1 . It is particularly important to note that this association depends explicitly on the yardstick W , so when you use the “map” that associates each A with its own a1 you must explicitly include the “legend.” Without knowing W you cannot know what a1 means. But if you know W and how it fits inside our world (and of course you do know this: that was our starting point) then you know exactly what this number refers to. After some thought, one might conceive of a one dimensional world as a line cutting across the room and extending out into space, infinitely far in both direction, whatever that means. However it might be more like a big ball of string with our yardstick W trapped inside the string. Remember, though, we would have to know (or think we know) how to slide along one arrow onto and past another around the bends of the string, and how to know when one displacement is equivalent to another at a different location along the string. This world is worth thinking about, but there are more interesting cases. Let’s consider the situation that we would be in if we had found there to be more than one direction in our world. If there are displacements that are not multiples of W , pick one. Call it V . We can experiment with displacement after displacement trapped on our world. We might find that every one of them is a combination of these two chosen displacements. After a while we would become convinced that any displacement A in this world can be realized as A = a1 W + a2 V 26 LARRY SUSANKA for certain real numbers a1 and a2 . Because one arrow-vector is not enough to describe all displacements as we explore our world and two arrow-vectors are, we say our world has two dimensions. Our world seems like a tabletop, or a piece of paper. In this case we associate the arrow-vector A = a1 W + a2 V with the point a1 e1 + a2 e2 = (a1 , a2 ) in R2 . Again, to understand what an ordered pair means physically on the tabletop or on the paper we must have W and V in hand, identified. And we have to know enough about our world so we think we understand when two displacements at different locations are equivalent, how to combine displacements, and we must satisfy ourselves that these displacements scale and combine the way they should. Displacements trapped on an “unbounded” sheet of paper, even if it is bent, could be made to work. Displacements sliding around trapped on the surface of a sphere could not. But you would find that out, in the end. Experiments would produce anomalies8 and you would realize that your assumptions lead to contradiction. The way these contradictions reveal themselves could provide very interesting information about the way the world really is. 
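For the record, once the guide displacements W and V (and the displacement A itself) have been recorded in some coordinates, finding the numbers a1 and a2 with A = a1 W + a2 V is a small computation: it amounts to solving a two by two system of linear equations of the kind studied in Section 8. A sketch in Python with numpy (the library and the particular numbers are assumptions chosen only for illustration):

    import numpy as np

    # Hypothetical guide vectors W and V for a two dimensional world,
    # and a displacement A we wish to express as a1*W + a2*V.
    W = np.array([2.0, 1.0])
    V = np.array([-1.0, 1.0])
    A = np.array([3.0, 4.0])

    # Solve the 2x2 linear system  a1*W + a2*V = A  for (a1, a2).
    # The columns of the coefficient matrix are W and V.
    a1, a2 = np.linalg.solve(np.column_stack((W, V)), A)
    print(a1, a2)                              # the coordinates of A

    # Check: rebuild A from the guide vectors.
    print(np.allclose(a1 * W + a2 * V, A))     # True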
Now let’s suppose we find that there are at least three directions we can go, by discovering a displacement which cannot be written in terms of W and V . If U is an example of a displacement like this, and if every displacement can be written as a sum a1 W + a2 V + a3 U then we call our world three dimensional, and associate the displacement A = a1 W + a2 V + a3 U with a1 e1 + a2 e2 + a3 e3 = (a1 , a2 , a3 ) in R3 . It is often far easier to do calculations in Rn than to work with arrowvectors directly. To make sense of answers computed in the model, R3 , we must keep in mind the ordered list of guide arrow-vectors in our world that give the ordered triples meaning. 6. ✸ Position Vectors. Vector operations are so efficient9 that sometimes we want to use them to help us define specific places. This is awkward, because vectors don’t have specific locations. To do this an agreement is required. We simply have to agree on a “base point” for our vectors. With this agreement we can 8For instance when confined to the surface of a sphere a certain multiple of each displacement is the zero displacement. This presents problems. 9Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. OVERVIEW OF REAL LINEAR ALGEBRA 27 create new things called position vectors. Using a vector this way means you must always use the version of the vector, thought of as an arrow, with its tail at this specified base point. Its nose then rests on the location you intend to identify. This agreement, a known choice of base point, is part of the definition of “position vector” and must be specified explicitly or you will not know which point is intended. In the case of our visual assistants, the “arrow-vectors,” using an arrow that had its tail at some particular place in our world might have been convenient to help us calculate if we decide on coordinates. With position vectors, using this copy is required, whether we represent our arrow-vectors in Rn or not. A position vector is not a vector because the location of its tail really does matter. If we decide on a different base point, the position vector to the same point will differ from the earlier position vector for two reasons: first, the base point is different so they are different arrows. But even if we forget that, they will not have the same magnitude and direction. Position vectors (with the same base point) cannot be added to each other to yield a position vector or anything else: the tail of one of these position vectors cannot be at the nose of the other, to use the customary construction of an arrow-vector sum. If you “formally add arrows” by moving one position vector so that its tail is at the nose of the other to attempt this addition you will see that you end up at points corresponding to different places in the world depending on your base point. The same is true with scalar multiplication: if you stretch a position vector by a factor of 2 you end up at different places depending on the base point used. 6.1. Exercise. Draw pictures in a plane to convince yourself of the two assertions in the previous paragraph. If we do decide on a base point then the difference between two position vectors with this base point is a vector. That is because, had we picked a different base point, the difference between the two new position vectors would be the same. The difference between starting and ending position defines a displacement. 
That displacement is a vector even if the start and end position vectors are not. And the sum of any vector added to a position vector can be sensibly interpreted as a position vector with the same base point. 6.2. Exercise. Draw pictures in a plane to convince yourself of the statements in the last two paragraphs. There are two reasons, I think, that people have trouble grasping the difference between vectors and position vectors. First, the situations where you must be aware of the distinction or you will inadvertently do the wrong calculation don’t come up that often in the early examples. One can do the correct calculation by accident. Second, it is customary to start working 28 LARRY SUSANKA with entries rather than pictures of arrows from the very start, so a base point has already been selected by someone, even if the student is not made aware of the choice, and the presumption is that it will not change. Let’s examine a situation where you might be tempted to perform undefined operations and see how to keep it all straight. Let QA denote the position vector of point Q using base point A. It is a specific arrow, with tail at A and nose at Q. Let QB denote the position vector using base point B. We will call AA the base position vector at A and BB the base position vector at B. Both are, of course, the zero position vectors. However they denote different locations. Given some position P, let’s say we want to move twice as far from A along the line from A to P and then move from that spot by displacement vector D. We will call the final position Q and we want a position vector for Q. Using base point A we have PA = AA + (PA − AA), the sum of a position vector and a displacement vector. That displacement does not depend on the base point: PA − AA = PB − AB. The first part of the instruction to locate Q does not imply that we stretch PA by factor 2. Instead, the intent is to multiply the displacement vector PA−AA by 2. It is only vectors that can be multiplied by scalars to produce a result that is independent of choice of base point. Then we add D. With base point A, the position vector we intended to describe is then QA = AA + 2(PA − AA) + D. Since AA is the zero position vector, if we just examine coordinates it seems like we are doing nothing. But by writing the position vector for P in this way, we can translate the intended operations to a position vector with a new base point effortlessly: QB = AB + 2(PB − AB) + D = BB + (AB − BB) + 2(PB − AB) + D. AB is not, of course, the zero position vector in this second description. 6.3. Exercise. Make selections of locations A, B, P and D in a plane and see what I am describing above in that case. OVERVIEW OF REAL LINEAR ALGEBRA 29 7. Parametric and Point-Normal Equations for Lines and Planes. We want to use vector operations to describe lines and planes in space. Since these lines and planes are composed of points that have specific locations, we use position vectors to describe them: we use the copy of a vector with its tail at a base point to “point” to the location we intend. Implicitly, when we use members of R2 or R3 for position vectors we must have agreed upon, in advance, a choice of base point (identified with the origin in R2 or R3 ) and standard direction vectors (in a particular order) of unit length to which the coordinates of these ordered pairs or triples refer. In this section we assume that has been done, and identify vectors and also position vectors with their coordinate representations. 
Since we will not, in this section, make a notational distinction between vectors and position vectors we must take extra care to make completely clear which is which during calculations. When there is mention of perpendicularity, we also assume that we have chosen standard direction vectors to be perpendicular to each other and to have the same length: i.e. we use the same scales on each coordinate axis10 so that perpendicularity in the world matches perpendicularity in Rn . We say a vector lies in or lies along a line or plane if, whenever the tail touches the line or plane the entire “shaft” of the vector, head to tail, is in the line or plane. We say a position vector p points to a line or plane if the nose rests on a point of the line or plane when the tail of p is at the origin. We say a nonzero vector n is normal to a line or plane if n · v = 0 whenever v lies in the plane or line. Suppose a given position vector p points to a line in R2 and vector n is normal to that line. The point-normal form for the position vector x of points in the line in R2 is n · (x − p) = 0. In other words, x is a position vector of a point on the line exactly when it satisfies that equation. Expanding, we have the usual standard equation you have all seen for a line in the plane n1 x1 + n2 x2 = n · p. 10Imagine what would happen to the coordinates of arrow-vectors that are perpendic- ular in the world if we measured in centimeters on one axis and inches along another. 30 LARRY SUSANKA Suppose a given position vector p points to a plane in R3 and vector n is normal to that plane. The point-normal form for the position vector x of points in the plane in R3 is n · (x − p) = 0. Again, this produces the usual standard equation for a plane in space n1 x1 + n2 x2 + n3 x3 = n · p. Parametric formulas for constant velocity motion and parametric formulas for planes or “hyperplanes” in higher dimensions are given by: Q(t) = p + tv Q(s, t) = p + sw + tv Q(r, s, t) = p + ru + sw + tv. Here r, s and t are free parameters. They can be any real numbers. Q is the position vector of a point in the object. u, w and v are vectors that lie in the object. 7.1. Exercise. (i) Find a parametric equation for a line through (1, 6) and (8, 9). (ii) Find a point-normal form for this line. (iii) Find a standard equation for this line. 7.2. Exercise. Find a parametric equation for a line through the points (7, 8, 9) and (−2, −2, 4). 7.3. Exercise. (i) Find a point-normal equation for the plane through the points (7, 8, 9) and (−2, −2, 4) and (−1, 1, 7). (ii) Find a parametric equation for this plane. (iii) Find a standard form for this plane. 7.4. Exercise. (i) Find a point-normal form for the plane 2x1 −3x2 +3x3 = 7. (ii) Find a parametric equation for this plane. OVERVIEW OF REAL LINEAR ALGEBRA 31 8. Systems of Linear Equations. We now set aside (for quite some time) our geometrical considerations and descend into a quagmire of messy algebraic calculations. Once you learn what is going on it is absolutely vital that you acquire methods of dodging these complex calculations, offloading any that remain to a hardware assistant to the extent possible. You then must interpret the answer, no small task in itself, but a proper job for a human. It is essentially impossible for us to perform the raw calculations we will be dealing with using pencil and paper unless the problem has been carefully rigged to allow for this. Artificial problems of this kind virtually never come up in practice. Nature is just not that accommodating. 
So we have to use hardware.

We will learn in this class to express one underlying technique in various ways and for various purposes. That single technique is, simply, to find the solution(s) of a system of k linear equations in n unknowns x1, x2, . . . , xn or know when that system has no solution.

    m1 1 x1 + m1 2 x2 + · · · + m1 n xn = b1
    m2 1 x1 + m2 2 x2 + · · · + m2 n xn = b2
          ⋮                               ⋮
    mk 1 x1 + mk 2 x2 + · · · + mk n xn = bk

If the system of equations is homogeneous—that is, all the bi are zero—the system has one obvious solution: namely choose all xi to be zero too. But there might be other solutions, and if some of the bi are nonzero there might be no solution at all.

From basic algebra we know how to solve such a system by laborious application of elementary operations and how to express a solution set in parametric form when there are many solutions.

These operations come in three types: First, you can multiply one equation by any constant and add it to a different equation. Second you can multiply an equation by any nonzero constant. Third, you can switch the order of two equations. We will call these operations of type one, two and three11, respectively.

11An operation of type three can be produced by a combination of three operations of type one plus one operation (multiply by −1) of type two.

The solution to a system of equations is not changed if you apply any operation of these three types to the system. And any system can be solved (in principle) by applying these three operations by hand to the system of equations, eliminating variables followed by back-substitution.

If, in the course of solving a system, you produce an equation of the form “0 = c” for some nonzero c then the system has no solution and is called inconsistent.

If, in the course of solving the system some of the equations turn into “0 = 0” and there are n remaining equations of the form

    x1 + w1 2 x2 + · · · + w1 n xn = c1
             x2 + · · · + w2 n xn = c2
                        ⋮
                               xn = cn

where each equation has one fewer variable than the one above then there is a unique solution, choosing xn = cn and employing back-substitution to obtain the rest of the xi.

If, finally, the system is consistent (not inconsistent) and you end up with fewer nontrivial equations than unknowns the system can be reduced to one in which neighboring descending pairs of equations look like

    xt + ws t+1 xt+1 + · · · + ws j xj + · · · + ws n xn = cs
                          xj + · · · + ws+1 n xn = cs+1.

In this case some of the variables are the first nonzero term in their row and have a 1 coefficient. These are called pivot variables. When you solve for the pivot variables, starting from the bottom and employing back-substitution as you move up, the other variables become free parameters and there are an infinite number of solutions obtained by arbitrary choices of these free parameter values. We note in particular that a homogeneous system with fewer equations than variables always has (infinitely many) nonzero solutions. Just to have it on record, we note that if there are k pivots in this last case then there will be n − k free parameters.

These are the only three possibilities.

This is all very awkward to do, or even to think about, and the effort to carry this out scales badly as the size of the system grows. But the work is very mechanical: multiplying and adding. Just the thing machinery is good at!

8.1. Exercise.
For each of the three systems below, solve the system and present your answer in parametric form, or as a single solution, or assert that no solution exists, whichever is appropriate.

First System:
    3x1 + 2x2 + 5x3 = 1
    7x1 − 6x2 + x3 = 9
    10x1 − 4x2 + 6x3 = 10

Second System:
    3x1 + 2x2 + 5x3 = 1
    7x1 − 6x2 + x3 = 9
    10x1 − 4x2 + 6x3 = 8

Third System:
    3x1 + 2x2 + 5x3 = 1
    7x1 − 6x2 + x3 = 9
    3x1 − 9x2 + 2x3 = 5

9. Einstein Summation Convention.

In this note we often use the Einstein Summation Convention: a (possibly long) sum a1 s1 + · · · + an sn can be written simply as ai si, with the summation over all possible values of i understood. The convention can be used to compress a sum of indexed products where within each product the indices are repeated exactly once.

The symbol i could be replaced by any (unused) symbol and the same summation would be meant. The index of summation is sometimes called a “dummy index.”

When subscripts or superscripts over which summation is not being taken appear, we assume one instance of the sum is indicated for each possible value of that index.

We list below some examples.

    a1 b1 + a2 b2                                  ⇔  either ai bi or ak bk
    a1 b1 + a2 b2 + c1 d1 + c2 d2                  ⇔  either ai bi + ci di or ai bi + ck dk
    a1 b1 1 + a2 b1 2  and  a1 b2 1 + a2 b2 2      ⇔  ai bk i   (k = 1 or 2 implied)
    a1 b1 1 = 3  and  a2 b2 2 = 3                  ⇔  ai bi i = 3   (Not a sum: i = 1 or 2 implied)

Einstein invented this notation so that messy sums (of exactly the kind we will be working with throughout this course) will be easier to look at, and it can be a bit shocking to see how much this convention cleans up a discussion.

10. Matrices.

A matrix is a rectangular arrangement of numbers, organized into horizontal rows and vertical columns. The size of a matrix is given by a pair m × n where m represents the number of rows and n is the number of columns. The number of rows is always given first. So the members of Rn we were working with before are n × 1 matrices.

The numbers inside a matrix are called entries and the location of an entry is given by specifying its row and column, counting from the upper left corner. Usually, the entries of a generic matrix will be given by lower case letters corresponding to the name of the matrix, with subscripts or superscripts indicating row or column. Thus:

    M = (mi j) = ( m1 1  m1 2  . . .  m1 n )
                 ( m2 1  m2 2  . . .  m2 n )
                 (   ⋮     ⋮             ⋮ )
                 ( mk 1  mk 2  . . .  mk n )

    and

    x = (xi) = ( x1 )
               ( x2 )
               (  ⋮ )
               ( xk )

In notation for entries of a matrix, if there are two superscripts or two subscripts, the first number usually refers to the row number of the entry. If there is one index as a superscript and one subscript, the superscript is (almost always) the row number of the entry.

Two matrices are said to be equal if they are the same size and all entries are equal.

Matrices of the same size can be added by adding corresponding entries. This operation is called, oddly, matrix addition. There is a matrix of each size filled with zeroes that acts as an additive identity matrix. It is called the zero matrix. It is always denoted by “0” even though matrices of different sizes should probably not be denoted by the same symbol. As with zero vectors, this ambiguity doesn’t seem to cause much problem. Context determines the shape.

Any matrix can be multiplied by a number by multiplying every entry in the matrix by that number. This operation is called scalar multiplication.

Matrices of certain sizes can be multiplied by each other in a certain order in an operation called matrix multiplication.
Specifically, an n × m matrix on the left can be multiplied by an m × k matrix on the right. The number of columns of the left matrix must equal the number of rows of the right one. The product matrix will be n × k. If A = (aij ) is m × n and B = (bij ) is n × k then the product matrix C = (cij ) = AB is defined to be the m × k matrix whose entries are cij = ait btj . Note the Einstein summation convention in action. This is mk different equations, one for each row-column combination of the entries of C. On the OVERVIEW OF REAL LINEAR ALGEBRA 35 right side of the equation the index of summation is t. So you are multiplying entries of A against entries of B and adding. You move across the ith row of A, and down the jth column of B, adding these numerical products as you move along. Matrix multiplication is not commutative. For one thing, if AB is defined there is no reason for BA to be defined. This will only happen if A and B are m × n and n × m, respectively. But even then, and even if m = n, it is not typical for these matrices to commute with matrix multiplication. It is quite easy to show for matrices A, B, M and N and number c that A+B =B+A A(M + N ) = AM + AN c(M N ) = (cM )N = M (cN ) (M + N )A = M A + N A (A + B) + M =A + (B + M ) whenever these products and sums are defined. It is a bit messier to show that matrix multiplication is associative: (AB)C = A(BC). To see this let AB = D so dij = ait btj and let BC = M so that muv = bus csv . So the ith row jth column entry of (AB)C is i t w diw cw j = a t bw c j while the ith row jth column entry of A(BC) is i w s aiw mw j = a w bs c j . Since these entries are all the same (except for an exchange of dummy indices) the two matrices are equal. The number of steps (individual multiplications and additions) needed to multiply an m × k matrix by a k × n matrix is exactly mn(2k − 1). We will describe this situation by saying the work required to do this task is “of the order mnk,” neglecting constant factors and additive terms of lower-thanmaximum degree. x 3 1 10.1. Exercise. A = 2 7 T 10.2. Exercise. A = 7 3 1 6 3 and B = 2 7 . Find AB. 8 1 6 and B = 2 . Find AB and BA. 8 36 LARRY SUSANKA 11. The Identity and Elementary Matrices. There are several important square matrices which we now discuss, along with a small shopping cart of operations on matrices. First is the n × n multiplicative identity matrix In . It is the one and only matrix that satisfies In M = M and AIn = A whenever A is any m × n matrix and M is any n × k matrix. (Can you prove there is at most one matrix like this for each n?) It is the matrix In = 1 0 .. . 0 0 ... 1 ... .. . 0 ... 0 0 .. . . 1 Any matrix has what is called a main diagonal: the ordered list of entries with equal row and column number. They are called the diagonal entries of the matrix. The identity matrix is a square matrix with ones along the main diagonal and zeroes off the main diagonal. The entries in the identity matrix are denoted δi j . So if i = j the value of δi j = 1. Otherwise δi j = 0. This function (of its subscripts) has several uses, and is called the Kronecker delta function. If A is any n × n matrix and if there is another n × n matrix B with AB = BA = In then B is called the inverse of A, denoted A−1 . It is easy to show that if A has an inverse at all then it has just one inverse: inverses are unique. A matrix with an inverse is called invertible or nonsingular. Matrices without inverses are called singular. 
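To make these definitions concrete, here is a minimal sketch (again assuming Python with NumPy; the two 2 × 2 matrices are arbitrary examples, not taken from the exercises). It computes a product directly from the entry formula c^i_j = a^i_t b^t_j, compares the result with the built-in product, and illustrates non-commutativity, the identity matrix, and an inverse:

    import numpy as np

    def matmul_from_formula(A, B):
        # C has entries c[i, j] = sum over t of a[i, t] * b[t, j].
        m, n = A.shape
        n2, k = B.shape
        assert n == n2, "columns of A must match rows of B"
        C = np.zeros((m, k))
        for i in range(m):
            for j in range(k):
                for t in range(n):
                    C[i, j] += A[i, t] * B[t, j]
        return C

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[0.0, 1.0], [1.0, 1.0]])

    print(np.allclose(matmul_from_formula(A, B), A @ B))  # True: same product
    print(np.allclose(A @ B, B @ A))                      # False: not commutative
    print(np.allclose(np.eye(2) @ A, A))                  # True: I_2 acts as identity
    print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # True: this A happens to be invertible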
One of the important early tasks will be to find inverses, or know when they do not exist. We note the following useful fact: If A1 , . . . , Aj is a list of two or more n × n matrices which all have inverses then the product A1 · · · Aj has an inverse and −1 (A1 · · · Aj )−1 = A−1 j · · · A1 . In other words, under these conditions the inverse of the product is the product of the inverses of the factors in reverse order. Finally, we come to two types of simple square matrices that do have inverses, the elementary row matrices of the first, second and third kinds. OVERVIEW OF REAL LINEAR ALGEBRA 37 An elementary row matrix of the first kind is obtained when you take the identity matrix and add a multiple of one of its rows to another. An elementary row matrix of the second kind is obtained when you take the identity matrix and multiply one of its rows by a nonzero constant. An elementary row matrix of the third kind is obtained when you take the identity matrix and switch two rows. Elementary row matrices of the third kind, and all products of such matrices, have only one nonzero entry in each row, and only one nonzero entry in each column, and those nonzero entries are all 1. Matrices with this property are called permutation matrices. 11.1. Exercise. Prove that the product of two permutation matrices is also a permutation matrix. Any permutation matrix is the product of elementary row matrices of the third kind. If you multiply any elementary row matrix S by any compatible matrix M , placing S on the left, the result will be as if you had performed the operation that created S on M . For instance, left multiplication by a permutation matrix re-orders the rows of M . 11.2. Exercise. (i) Find a 3 × 3 matrix T that, when multiplied on the left of matrix M , as T M , will add 2 times the third row of M to the second row of M , and leave the first and third rows of M alone. (ii) Find a 3 × 3 matrix S that, when multiplied on the left of matrix M , as SM , will multiply the first row by 2 and leave the second and third rows of M alone. (iii) Find a 3 × 3 matrix U that, when multiplied on the left of matrix M , as SM , re-orders the rows of M so that the first row is sent to the third row, the third row is sent to the second row and the second row is sent to the first row. (iv) What are the inverses of S, T and U ? (v) Describe in words the inverse matrices for elementary row matrices of the three kinds. 5 T 6 3 −1 −1 11.3. Exercise. A = and B = . Find (AB)−1 . W 7 2 7 38 LARRY SUSANKA 12. Special Matrices. Matrices can sometimes profitably be thought of as being composed of “blocks” or “submatrices.” Below we have 1 R m1 1 m1 2 . . . m1 n m2 1 m2 2 . . . m2 n R2 M = (mi,j ) = = C1 C2 . . . Cn .. .. .. = .. . . . . mk 1 mk 2 . . . mk n Rk where the Ri are the rows of M and the Ci are the columns of M . For instance for numbers xi and y i you could write the matrix products 1 1 R y y2 R2 C1 C2 . . . Cn . (x1 x2 . . . xk ) .. = y i Ci . = xi Ri or .. . yn Rk So the sum on the left could be called a “weighted” sum of the rows Ri and the sum on the right a “weighted sum” of the columns Ci . The “weights” xi and y i are a measure of how significant each row or column is in the final sum. There is a matrix operation called “transpose” that can be applied to any matrix. It takes a matrix and switches the row and column designations of the entries. The rows become columns, and the columns become rows. If M is an m × k matrix then its transpose, denoted MT , is a k × m matrix. 
If B = M T then bi j = mj i . 14 T 2 1 1234 = 3 5 . 4157 47 If v and w are vectors in Rn then the dot product v · w can be represented as the matrix product v T w. Transpose is one of those things that is easier to explain by example than by formula. T It is obvious that M T = M . It is also a pretty easy calculation (a test to see if you can use the Einstein summation notation) to show that (AB)T = B T AT . So similarly to the situation with inverses, the transpose of a product is the product of transposes in reverse order. −1 This implies that if A has an inverse then so does AT and AT = T −1 A . OVERVIEW OF REAL LINEAR ALGEBRA 39 An elementary column matrix is the transpose of an elementary row matrix. They act on the right rather than the left to add a multiple of a column of a compatible matrix to a different column, or to multiply a column by a nonzero constant, or to switch two columns. If you wish, you can justify T this by examining R AT where R is an elementary row operation. A matrix is called symmetric if it is its own transpose: M T = M . Such a matrix must be square. Diagonal matrices are matrices having zeroes off the main diagonal. A square diagonal matrix is symmetric. A matrix is called skew symmetric if it is the negative of its own transpose: M T = −M . A matrix like this must be square and have only zeroes on the main diagonal. A matrix that has zeroes below the main diagonal is called upper triangular, and a matrix that has zeroes above the main diagonal is called lower triangular. A matrix that is both upper and lower triangular is, of course, diagonal. 12.1. Exercise. The product of two (compatible) upper triangular square matrices is upper triangular. The product of two (compatible) lower triangular square matrices is lower triangular. There is an important group of invertible matrices called orthogonal matrices. They have the property that their inverse is their transpose: that is, P −1 = P T . If you think of the columns of such a matrix as vectors, the equation P T P = In implies that these vectors are perpendicular to each other, and each has length 1. The same fact is true of the n rows of an orthogonal matrix, stood up as columns. 12.2. Exercise. (i) The product of two orthogonal matrices (of the same size) is itself orthogonal. (ii) Any permutation matrix is orthogonal. We will find occasion later in these notes to consider linear combinations of powers of a square matrix. For an n × n matrix A we define for positive exponent k the matrix Ak to be the obvious product of A by itself k times. We define A0 to be the n × n identity matrix. Given polynomial f (t) = ck tk + ck−1 tk−1 + · · · + c1 t + c0 we then define f (A) by f (A) = ck Ak + ck−1 Ak−1 + · · · + c1 A + c0 In . Last on the list of things we need to define here is trace. That is a function that takes a square matrix and returns the sum of the diagonal entries of that matrix. 762 For instance trace9 3 0 = 10. 530 40 LARRY SUSANKA 5 12.3. Exercise. (i) A = W 5 T 7 6 Find AT . (ii) Two of the following matrices must be equal, assuming the matrices are compatible sizes. Circle these two. (AB)T AT B T B T AT . x y z 1/3 . 12.4. Exercise. A = 2/3 2/3 Find x, y and z so that 2/3 −1/3 −2/3 A is an orthogonal matrix. Then, without doing any further calculation, produce A−1 . 12.5. Exercise. Find f (A) where f is the polynomial f (t) = 3t3 − 2t + 7 and A is the square matrix 5 7 2 A = 7 7 1 . 1 5 6 12.6. Exercise. Find all possible orthogonal matrices of the form 3/5 a A = . b c 1 2 3 T 12.7. Exercise. 
Find trace C C where C = . 4 5 6 13. Row Reduction. Given any matrix m1 1 m1 2 . . . m1 n m2 1 m2 2 . . . m2 n M = (mi,j ) = .. .. .. . . . mk 1 mk 2 . . . mk n we want to perform elementary row operations on this matrix by multiplying it on the left by elementary row matrices, converting it into simpler form. We will be interested in two special forms here. The first of these is called row echelon form. In any matrix, we will call the first nonzero entry (counted from the left) the “leading coefficient” of the row it is in, and the zero entries to the left of a leading coefficient will be called “leading zeroes” of that row. A matrix is in row echelon form if any rows without leading coefficient (that is, a row of zeroes) are on the bottom and each row after the first has more ‘leading zeroes” than the row above it. OVERVIEW OF REAL LINEAR ALGEBRA 41 Any matrix can be “reduced” to this form by consecutive left-multiplications by elementary row matrices of type one only. For many purposes, row reduced echelon form (shorthand rref ) is more useful. A matrix is in that form if it is in row echelon form and each leading coefficient is 1 and every leading coefficient is the only nonzero entry in it’s column. These are called pivot columns. The transformation to row reduced echelon form can be carried out by consecutive left-multiplications by elementary row matrices of type one to create a row reduced matrix with one nonzero entry in each pivot column, followed by elementary row matrices of type two to clean up the pivots. The row reduced echelon form of a matrix A is denoted rref (A). The total number of operations (additions or multiplications) to carry this out is on the order of nk 2 or n2 k, whichever is bigger. A proof of these facts using induction on the number of columns in the original matrix is not too hard to produce. By hand, this procedure is arduous for 4 × 4 matrices and essentially impossible for a human if the matrix is much larger. But these calculations are quite manageable for any calculator. There is a way to keep track of the steps taken by your calculator as it produces the rref of a k × n matrix M . We create a k×(n+k) block matrix ( M Ik ) and reduce this to rref. The result will be a k × (n + k) block matrix ( S P ) where S is the rref for M . The product of the elementary matrices used to accomplish the reduction will accumulate in the right half of this block matrix, as the invertible k × k matrix P . P is the matrix for which P M = S, the rref for M . Let’s look at a special case. If k = n and if the rref of ( M In ) has In as the left block then the right block is M −1 . To reiterate: If the reduction process takes j steps using elementary row matrices L1 , . . . , Lj and if Lj Lj−1 · · · L2 L1 ( M In ) = ( Lj Lj−1 · · · L2 L1 M = ( In Lj Lj−1 · · · L2 L1 In ) Lj Lj−1 · · · L2 L1 ) then Lj Lj−1 · · · L2 L1 is M −1 . We will argue now that if M has an inverse then the rref for M must be the identity. If not, then the rref would have a row of zeroes on the bottom in the first block. But then (0 0 . . . 0 0) = (0 0 . . . 0 1)Lj Lj−1 · · · L2 L1 M. 42 LARRY SUSANKA The matrix A = Lj Lj−1 · · · L2 L1 M is the product of invertible matrices so is itself invertible. This leads to the contradiction (0 0 . . . 0 0) = (0 0 . . . 0 0) A−1 = (0 0 . . . 0 1) AA−1 = (0 0 . . . 0 1) In = (0 0 . . . 0 1). So we have the important conclusion that M is invertible if and only if its rref is the identity matrix and in that case both M and M −1 are the product of elementary matrices. 
The reduction process indicated above for an n × n matrix takes on the order of n3 steps, the same order (surprisingly) as the number of steps needed to multiply two n × n matrices. 13.1. Exercise. 1 4 7 A = 6 1 9 0 4 −3 Find the inverse of A using row reduction on the block matrix ( A I3 ). (You should be aware that what you are doing is left-multiplying by elementary row matrices, which accumulate in the right block.) 13.2. Exercise. The inverse of an invertible lower triangular matrix is lower triangular. The inverse of an invertible upper triangular matrix is upper triangular. 14. Matrix Form For Systems of Linear Equations. The system of equations m1 1 x1 + m1 2 x2 + · · · + m1 n xn m2 1 x1 + m2 2 x2 + · · · + m2 n xn .. .. .. . . . mk 1 x1 + mk 2 x2 + · · · + mk n xn = b1 = b2 .. . = bk can be converted to matrix form m1 1 m1 2 . . . m1 n x1 b1 m2 1 m2 2 . . . m2 n x2 b2 Mx = b ⇐⇒ .. .. .. .. = .. . . . . . . mk 1 mk 2 . . . mk n xn bk Finding the column matrix x that makes the matrix equation true is exactly the same as solving the system above it for x1 , . . . , xn . OVERVIEW OF REAL LINEAR ALGEBRA 43 We also come to an interesting fact, thinking of M as a block matrix with columns C1 , . . . , Cn . There is a solution to the matrix equation above exactly when b is a weighted sum of the columns of M . Mx = (C1 C2 · · · Cn ) x = x1 C1 + x2 C2 + · · · + xn Cn = b. The numbers xi we are looking for are the weights on the columns. If k = n so the matrix M is square and if M has an inverse we can find the unique solution as x = M−1 b. In applications it often happens that the entries in M are determined by fixed elements of the problem, but the “target” b varies. Using this method, you can recycle the work needed to create M−1 : find it once and then apply it to various columns b as required. If M−1 = (H1 H2 · · · Hn ) then x = (H1 H2 · · · Hn ) b = b1 H1 + b2 H2 + · · · + bn Hn . So the solution for any b is given as a weighted sum of the columns of M−1 , with the entries of b as weights. But if M has no inverse we can’t go this route. The augmented matrix for the system is: m1 1 m1 2 . . . m1 n b1 m2 1 m2 2 . . . m2 n b2 .. .. . .. .. . . . . mk 1 mk 2 . . . mk n bk The steps needed to solve the system correspond, in matrix language, to left multiplying the augmented matrix by various elementary matrices until the product matrix is in rref. If any row in this rref matrix is of the form 0 0 . . . 0 k where k is nonzero then that row corresponds to the equation 0 = k so there is no solution: the original system was inconsistent. Any rows without pivots (all-zero rows) correspond to the comforting but uninformative equation 0 = 0 and we can ignore them. This leaves 0 ... 0 . . . . . . . .. 0 ... us to consider rref matrices of the form 1 w 1 i1 . . . 0 w 1 i2 . . . 0 w 1 ik 0 0 ... 1 w 2 i2 . . . 0 w 2 ik .. .. .. .. .. .. . . . . . . .. .. .. .. .. .. . . . . . . 0 0 ... 0 0 ... 1 w 1 ik ... ... ... c1 c2 .. . . .. . ck In the matrix above, the pivot columns have only one nonzero entry. The 1 indicated in a pivot column is the first nonzero entry in its row, and every other entry in its column is zero, and every entry which is to the left and below this 1 is zero. 44 LARRY SUSANKA We can now, if we wish, write down the equations to which this matrix corresponds. xi1 −1 + w1 i1 xii + ··· ··· ··· xi2 −1 + w1 i2 xi2 + ··· ··· · · · =c1 ··· ··· · · · =c2 .. .. .. . . . xik −1 + w1 ik xik + · · · =ck The pivot variables xi1 −1 , xi2 −1 , . . . 
xik −1 occur only once each in the equations above and are the variables corresponding to the pivot columns. The other variables are free parameters, and you can solve for the pivot variables in terms of these. 14.1. Exercise. Linear systems can be readied for solution by matrix methods in two ways. First, as a matrix equation: Ax = b where A is the coefficient matrix, x is the “variable” column, and b is the “constant” column. Second, by creating the augmented matrix (A b). Ready the three systems from Exercise 8.1 for solution by matrix methods in these two ways. 15. Example Solutions. This is awfully messy to look at, but there are ways of keeping things organized. You should play with the two examples found below until you understand everything about them: you must become expert at creating and interpreting solutions like this. We will, repeatedly and for different purposes, use both of the solution methods described. For our first example, we will 1 A = 1 0 let A be the matrix 3 0 5 7 0 5 3 1 11 9 0 12 0 1 6 2 1 16 which corresponds to the augmented matrix for system of equations x1 + 3x2 1 2 + 3 5x4 + 7x5 4 x + 3x + x + 11x + 9x 5 =5 = 12 x3 + 6x4 + 2x5 + x6 = 16 OVERVIEW OF REAL LINEAR ALGEBRA 45 Entering matrix A into your best-Linear-Algebra-friend (that is your calculator) and hitting it with the rref stick produces 1 3 0 5 7 0 5 1 6 2 0 7 rref (A) = 0 0 1 9 0 0 0 0 0 Define the 3 × 6 matrix D and column matrices x, p, C2 , C4 and C5 and C7 : x1 x 2 1 3 x 1 3 0 5 7 0 x x3 D = 0 0 1 6 2 0 x = p = x 4 5 0 0 0 0 0 1 x6 x x6 5 7 5 3 C2 = 0 C4 = 6 C5 = 2 C7 = 7 . 9 0 0 0 The matrix equation Dx = C7 is equivalent to the original system, and the solution, an equation for the pivot variables in terms of free parameters x2 , x4 and x5 can given by Method One p = − x2 C2 − x4 C4 − x5 C5 + C7 (x2 , x4 , x5 are free parameters) 1 2 x x 3 x or, equivalently, = − ( C2 C4 C5 ) x4 + C7 . 6 x x5 46 LARRY SUSANKA A different way to represent the solution is as follows. Let −5 −3 0 1 −6 0 K2 = and K4 = 1 0 0 0 0 0 5 −7 0 0 7 −2 K5 = 0 and K7 = 0 . 0 1 0 9 There will be one of these columns for each non-pivot variable (the free parameters) plus one more (though if C7 had been the zero column, so the original system was homogeneous, K7 would be the zero column.) The general solution is given by Method Two x = x2 K2 + x4 K4 + x5 K5 + K7 or, equivalently, (x2 , x4 , x5 are free parameters) 2 x x = ( K2 K4 K5 ) x4 + K7 . x5 You will note that each non-pivot variable is a free parameter, and is associated with a column Ki that has 1 in a row where all of the different Kj columns, including K7 , have 0. This will be important for us later: we will say that K2 , K4 and K7 are linearly independent as a consequence of this. Let’s look at another example. Here A is the augmented matrix 0 0 A = 0 0 1 8 8 0 2 0 1 8 9 6 3 3 5 9 6 9 1 0 5 9 3 4 1 1 5 5 7 4 OVERVIEW OF REAL LINEAR ALGEBRA 47 for the system of equations x2 + 2x3 + 9x4 + 5x5 + x6 + 3x7 = 5 8x2 + 6x4 + 9x5 + 4x7 = 5 8x2 + x3 + 3x4 + 6x5 + 5x6 + x7 = 7 8x3 + 3x4 + 9x5 + 9x6 + x7 = 4 Delivering the rref encouragement produces 1 0 0 0 0 578/543 −176/543 207/181 0 0 1 0 0 1198/543 −346/543 228/181 . 
rref (A) = 0 0 0 1 0 73/1629 269/1629 349/543 1 −530/543 338/543 −161/181 0 0 0 0 Define the 4 × 7 matrix D and column matrices x, p, C1 , C6 and C7 and C8 : 0 1 0 0 0 578/543 −176/543 0 0 1 0 0 1198/543 −346/543 D = 0 0 0 1 0 73/1629 269/1629 0 0 0 0 1 −530/543 338/543 1 x x2 2 3 x 0 x 4 x3 0 x = p = C1 = x5 x4 0 x 6 0 x5 x x7 578/543 −176/543 207/181 1198/543 −346/543 228/181 C6 = 73/1629 C7 = 269/1629 C8 = 349/543 . −530/543 338/543 −161/181 The matrix equation Dx = C8 is equivalent to the original system, and the solution, an equation for the pivot variables in terms of free parameters x1 , x6 and x7 can given by 48 LARRY SUSANKA Method One p = − x1 C1 − x6 C6 − x7 C7 + C8 (x1 , x6 , x7 are free parameters) 2 1 x x x 3 or, equivalently, 4 = − ( C1 C6 C7 ) x6 + C8 . x x7 x5 A different way to represent the solution is as follows. Let 1 0 0 −578/543 0 −1198/543 K1 = 0 and K6 = −73/1629 0 530/543 0 1 0 0 0 0 207/181 176/543 228/181 346/543 K7 = −269/1629 and K8 = 349/543 . −161/181 −338/543 0 0 0 1 There will be one of these columns for each non-pivot variable (the free parameters) plus one more (though if C8 had been the zero column, so the original system was homogeneous, K8 would be the zero column.) The general solution is given by Method Two x = x1 K1 + x6 K6 + x7 K7 + K8 or, equivalently, (x1 , x6 , x7 are free parameters) 1 x x = ( K 1 K 6 K 7 ) x 6 + K 8 . x7 Each non-pivot variable is a free parameter, and is associated with a column Ki that has 1 in a row where all of the different Kj columns, including K8 , have 0. None of these Ki can, therefore, be written as a linear combination of the others. OVERVIEW OF REAL LINEAR ALGEBRA 49 Now I will grant you that the work above is fairly ugly. But no calculations were done by hand. The work involved nothing more than writing down entries and keeping track of what it meant. 15.1. Exercise. Solve the systems mentioned in Exercise 14.1 and give solutions, for those with free parameters, in the two forms shown in the example solution. 15.2. Exercise. Solve the following systems in the two forms shown in the example solution. (i) First System: 3x1 + 2x2 + 5x3 + 7x4 − 8x5 7x1 − 6x2 + x3 + 7x4 − 8x5 9x1 − 4x2 + 6x3 + x4 − x5 10x1 − 4x2 + 6x3 + 14x4 − 16x5 16x1 − 10x2 + 7x3 + 8x4 − 9x5 x1 − x2 + 7x3 + 8x4 − x5 11x1 − 5x2 + 13x3 + 22x4 − 17x5 = = = = = = = 1 9 10 10 19 1 13 (ii) Second System: 3x1 + 2x2 + 5x3 + 7x4 − 8x5 7x1 − 6x2 + x3 + 7x4 − 8x5 9x1 − 4x2 + 6x3 + x4 − x5 10x1 − 4x2 + 6x3 + 14x4 − 16x5 16x1 − 10x2 + 7x3 + 8x4 − 9x5 x1 − x2 + 7x3 + 8x4 − x5 = = = = = = 1 9 10 10 19 1 (iii) Third System: 3x1 + 2x2 + 5x3 + 7x4 − 8x5 7x1 − 6x2 + x3 + 7x4 − 8x5 9x1 − 4x2 + 6x3 + x4 − x5 10x1 − 4x2 + 6x3 + 14x4 − 16x5 16x1 − 10x2 + 7x3 + 8x4 − 9x5 = = = = = 1 9 10 10 19 (iv) Fourth System: 3x1 + 2x2 + 5x3 + 7x4 − 8x5 7x1 − 6x2 + x3 + 7x4 − 8x5 9x1 − 4x2 + 6x3 + x4 − x5 10x1 − 4x2 + 6x3 + 14x4 − 16x5 = = = = 1 9 10 10 (v) Fifth System: 3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1 7x1 − 6x2 + x3 + 7x4 − 8x5 = 9 9x1 − 4x2 + 6x3 + x4 − x5 = 10 50 LARRY SUSANKA (vi) Sixth System: (vii) Seventh System: 3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1 7x1 − 6x2 + x3 + 7x4 − 8x5 = 9 3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1 16. Determinants Part One: The Laplace Expansion. We are going to discuss a way of creating a number for each square matrix called the determinant of the matrix. First, the determinant of a 2 × 2 matrix M = mij is m11 m22 − m12 m21 . This number is denoted det(M). So, for instance 45 det = 4 · 2 − 5 · 3 = −7. 
32 We reduce the task of calculating one n×n determinant to the calculation of n different (n − 1) × (n − 1) determinants. These smaller determinants are themselves broken up and the process continues until we arrive at 2 × 2 determinants at which point the final answer is calculated as a gigantic weighted sum of many 2 × 2 determinants. The procedure, called Laplace expansion, requires the selection of one row or one column in the n × n matrix. It is by no means obvious that the answer will not depend on which row or column you pick for this step. That the answer will not depend on this choice requires a proof and that proof is pretty involved and would take too much time for us here. So we punt: you may learn the proof, if you wish, in your next Linear Algebra class. I would be happy to direct any student who can’t rest without tieing down this loose end to a readable source. As important as determinants may be in the great scheme of things, they are somewhat of a side issue for us in this particular class. I will spend exactly one class day on this and the next section, outlining the facts about determinants you need to know. I do advise you leave it at that for now. Examine the pattern in the matrix below: + − + − ··· − + − + · · · + − + − · · · .. .. .. .. . . . . ··· When the sum of row and column number is even you have “+” and when that sum is odd you have “−.” OVERVIEW OF REAL LINEAR ALGEBRA by 51 Now suppose we wish to find the determinant of n × n matrix M given m1 2 . . . m1 n m2 2 . . . m2 n .. . .. .. . . . mn 1 mn 2 . . . mn n We pick any row or column. In practice you will look for a row or column with lots of zeroes in it, if there is one. Since we have to pick one, let’s pick row 2. m1 1 m2 1 .. . We look at the first entry in that row. It is in a spot corresponding to a minus sign in the “sign matrix.” We affix the minus sign to m2 1 and multiply that entry by the determinant of the matrix obtained by deleting from M the row and column of m2 1 . We proceed in this way across the row, affixing either a “+” or “−” to the entry there and multiplying by the determinant obtained by deleting from M the row and column of that entry. We then add up these n different (smaller) weighted determinants. Here is an example. 9 2 7 det 3 −6 1 −8 2 3 2 7 9 7 9 2 = (−1)(3) det + (−6) det + (−1)(1) det 2 3 −8 3 −8 2 = (−3)(6 − 14) + (−6)(27 + 56) − (18 + 16) = −508. You just have to find a few 3 × 3 or 4 × 4 determinants to get the idea. We reiterate: doing the analogous calculation, expanding around any other row or any column, will produce the same number for the matrix M . 17. Determinants Part Two: Everything Else You Need to Know. A permutation of the set {1, . . . , n} is a one-to-one and onto function σ : {1, . . . , n} → {1, . . . , n}. You can conceive of them as a “switching around” of the order of these integers. There are a lot of permutations. Actually, there are n! of them. That means, for instance, there are more than 1025 ways of “switching around” the first 25 integers. Each permutation σ can be built in stages by switching a pair of numbers at a time until they are all sent to the right place. It is a fact (fairly hard to show) that if you can build a permutation by an even number of pair-switches then every way it can be built will require an even number 52 LARRY SUSANKA of switches. This implies that if you can build a permutation by an odd number of pair-switches then every way it can be built will require an odd number of switches. 
A permutation is called “even” or “odd” depending on this. We assign the number 1 or −1 to a permutation depending on if the permutation is even or odd. The notation sgn(σ) is used for this assignment, and it is called the “signum” of the permutation σ. The determinant of an n × n matrix A is defined to be X sgn(σ) a1 σ(1) a2 σ(2) · · · an σ(n) . det(A) = all permutations σ This sum is over all possible ways of picking one entry from each row and each column in the matrix A, with a minus sign attached to half of these possible choices. Using the definition directly is not really a feasible means of calculating a determinant, though you might give it a try when n = 3. This definition is good for proving things about determinants. Some of the things you prove are better methods of calculating determinants, or avoiding a calculation entirely. We learned in the last section that an n × n determinant can be “expanded” around any row or any column, given as the “signed sum” of n smaller determinants. The proof that you can do this comes from careful examination of the definition above. It may be proved, for instance, by induction on the size of the determinant. The method, Laplace expansion, let’s us avoid thinking about permutations. But it is no better than the original definition as far as how many steps are required to calculate a determinant is concerned. It takes on the order of n! steps too. So we need a better way, and finding that way has the added benefit of giving us new facts about determinants. Once again, these facts are all proved by careful examination of the original definition of determinant. Fact 1: If A is square, det(AT ) = det(A). Fact 2: If you multiply any row or any column of a square matrix by the number k, the determinant changes by factor k. Fact 3: If you switch two rows or two columns of a square matrix you change the determinant by a factor of −1. So if two rows (or two columns) are multiples of each other, the determinant must be 0. OVERVIEW OF REAL LINEAR ALGEBRA Fact 4: If every sum, a determinant nants: s 1 1 + t1 1 s 2 1 + t2 1 det .. . 53 entry in the first column of a square matrix contains a of that matrix can be found as a sum of two determi- m1 2 . . . m1 n m2 2 . . . m2 n .. .. . . sn 1 + tn 1 mn 2 . . . mn n s1 1 m1 2 . . . m1 n t1 1 m1 2 . . . m1 n s2 1 m2 2 . . . m2 n t2 1 m2 2 . . . m2 n = det .. + det .. .. .. .. .. . . . . . . . sn 1 mn 2 . . . mn n tn 1 mn 2 . . . mn n By looking at transposes and switching another column with the first, we see that the corresponding fact is true if we have any row or column whose entries are written as a sum. Fact 5: If you recall our discussion of row echelon form, we can create that form by elementary row operations solely of type one: “add a multiple of one row to another.” Doing one of these operations on a square matrix does not change its determinant. You can do the same thing (examine transposes) by using elementary column operations of the type “add a multiple of one column to another.” If the matrix is n × n the final result in either case is a triangular matrix that has the same determinant as the original matrix. Fact 6: The determinant of an n × n triangular matrix A is the product of the diagonal entries: only one term in the determinant definition sum is nonzero. That term is a1 1 a2 2 · · · an n . Fact 7: If a square matrix M is broken into blocks and among these blocks are square blocks D1 , . . . 
, Dk arranged along and exactly covering the main diagonal of M , and if all entries beneath each of these blocks down to the nth row are zero then det(M ) = det(D1 ) · · · det(Dk ). This is, of course, a generalization of Fact 6 and is proved using Facts 5 and 6. Fact 8: Any n × n matrix A can be reduced to an upper triangular matrix R by H1 · · · Hk A = R where each Hi is an elementary row operation matrix of type one. Similarly (look at transposes) any n × n matrix B can be reduced to a lower triangular matrix S by BC1 · · · Cm = S where each Ci is an elementary column operation matrix of this simple type. So det(A) = det(H1 · · · Hk A) = det(R) = r1 1 r2 2 · · · rn n det(B) = det(BC1 · · · Cm ) = det(S) = s1 1 s2 2 · · · sn n . and 54 LARRY SUSANKA This reduction process takes on the order of n3 steps, much better than n!. Fact 9: The matrix RS from above is diagonal with nonzero entries r1 1 s 1 1 r2 2 s 2 2 ... rn n s n n running down along its main diagonal. This means det(AB) = det(H1 · · · Hk A BC1 · · · Cm ) = det(RS) = r1 1 s1 1 r2 2 s2 2 · · · rn n sn n . So we conclude that the following very important equation holds for any square (compatible) matrices A and B: det(AB) = det(A) det(B). Fact 10: If A has an inverse then 1 = det(In ) = det(AA−1 ) = det(A) det(A−1 ). So if A has an inverse then det(A) 6= 0 and det(A−1 ) = (det(A))−1 . On the other hand, if det(A) = 0 then A cannot have an inverse. 17.1. Exercise. Calculate −8 2 0 7 9 0 6 7 4 1 det −8w 2w 0 7w 9w . 0 0 0 2 5 8 0 0 1 5 17.2. Exercise. C is the elementary 4 × 4 matrix that adds twice the second row to the third. D is the elementary 4 × 4 matrix that multiplies the second row by 9. Also, for the following 4 × 4 matrices A and B we know that det(A) = 6 and det(B) = 2. For each of the following, calculate the determinant or indicate that there is not enough information to determine the answer. (i) det(AB) = (ii) det(5B) = (iv) det(A−1 ) = (v) det(B T ) = (vii) det(CA) = (viii) det(DB) = (iii) det(A + 7B) = (vi) det(ABA−1 ) = OVERVIEW OF REAL LINEAR ALGEBRA 55 18. Linear Transformations from Rn to Rm . A linear transformations from Rn to Rk is a function f with domain Rn and with range in Rk , indicated by notation f : Rn → Rk , which satisfies: f (u+cv) = f (u)+c f (v) for all members u and v in Rn and any constant c. “Left multiplication by an n × k matrix” is the prototypical linear transformation from Rn to Rk , and (this is really important) any linear transformation f from Rn to Rk , however it might have been presented to you, actually is left multiplication by the k by n matrix M = f (e1 ) · · · f (en ) . This formula f (x) = M x is easily seen to be true when x is one of the ei . The linearity of f then shows that the equation is true for any x in Rn . f (x) = f (xi ei ) = xi f (ei ) = M x. The matrix M is called the matrix of f . A linear transformation is completely determined by what it does to the basis vectors so if two linear transformations agree on the basis vectors they agree on any vector: they are the same linear transformation. That provides a rather straightforward way to see if a function given by some formula is linear or not. Evaluate f (ei ) for i = 1, . . . , n and create matrix M . See if the formula f (x) for generic x is the same as M x. If they are equal, f is linear. If they aren’t, it’s not. A particularly simple (and therefore important) example of a linear transformation is a linear function f : Rn → R. The matrix of a function like that is 1×n, a row matrix. 
Obviously these look an awful lot like members of Rn , which are column matrices. The set of these row matrices is denoted Rn∗ and called the dual of Rn . Members of Rn∗ are called linear functionals when they are thought of as matrices for a linear transformation on Rn . More generally, any function f : Rn → Rm has m different coordinate functions f 1 , . . . , f m given by 1 f (x) f (x) = ... . f m (x) The function f : Rn → Rm is linear exactly when all m coordinate functions are linear functionals. 56 LARRY SUSANKA A linear transformation is nothing more than a higher-dimensional version of direct variation. You may recall that a real variable y is said to vary directly with a real variable x that there is a constant k for which y = kx. If you know a single nonzero point on the graph of this relationship you can pin down k and then you know everything about the variation. It is a straight line through the origin with slope k. For a linear transformation from Rn to Rm each of the m different range variables y 1 , . . . , y m is directly proportional to each of the n different domain variables x1 , . . . , xn . There are mn different variation constants, one for each pair of variables, and these are the entries of the matrix of the linear transformation. 18.1. Exercise. If f : Rk → Rm and g : Rm → Rn are both linear then so is f ◦ g : Rk → Rn . If M f is the matrix of f and M g is the matrix of g then M f M g is the matrix of f ◦ g. Here are some very important linear transformations: (1) One empty column (or row) ina determinant. For in · 2 6 stance given the “mostly filled” matrix · 8 3 we can define · 8 1 3 g : R → R by 1 v 2 6 g(v) = det v 2 8 3 . v3 8 1 (2) Dot product against a fixed vector w. This is the function f : Rn → R given by f (v) = w · v. (3) Projection onto a line through the origin containing a v·w w. vector w. This is given by P rojw (v) = w·w (4) Projection onto a plane (or hyperplane) perpendicular to a vector w. The formula for this is CoP rojw (v) = v − P rojw (v). (5) Reflection in a plane (or hyperplane) perpendicular to a vector w. We calculate this by Ref lw (v) = v − 2 P rojw (v). (6) Inversion given by Inv(v) = −v. (7) Rotation in R2 counterclockwise by angle θ given by Rotθ (v) = ( v 1 cos(θ) − v 2 sin(θ), v 1 sin(θ) + v 2 cos(θ) ). OVERVIEW OF REAL LINEAR ALGEBRA 57 (8) Rotation in R3 by angle θ around the e3 axis given by Rot θ, e3 (v) = ( v 1 cos(θ) − v 2 sin(θ), v 1 sin(θ) + v 2 cos(θ), v 3 ). 18.2. Exercise. Create matrices for the eight functions above, using w = (1, 2, 3) for (2)-(5), and verify that all eight are linear. Note that if w is a unit vector the matrix of P rojw is w wT . More generally, if w is nonzero T the matrix of P rojw is wwTww . 19. Eigenvalues. We define eigenvalues and eigenvectors of a linear transformation f and associated matrix M when range and domain are both Rn . A real number λ is called an eigenvalue for M (or for f if M is the matrix of a linear transformation) if there is a nonzero vector x for which M x = λx. M x = λx is equivalent to λx − M x = (λIn − M ) x = 0. There will be a nonzero solution to this equation exactly when det(λIn − M ) = 0. This determinant is an nth degree polynomial in λ, called the characteristic polynomial, which could have real roots: the real eigenvalues. If n is odd, it will always have at least one. So the first thing to do is find those roots. 
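Here, as a hedged sketch (assuming Python with NumPy; the symmetric 2 × 2 matrix is an arbitrary illustration), is what handing the root-finding over to software can look like:

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [1.0, 2.0]])               # arbitrary example matrix

    coeffs = np.poly(M)                      # coefficients of det(lambda*I - M)
    print(coeffs)                            # [ 1. -4.  3.]  i.e. lambda^2 - 4 lambda + 3
    print(np.roots(coeffs))                  # estimated eigenvalues: 3 and 1

    # Eigenvectors: for each eigenvalue lambda, a nonzero x with M x = lambda x.
    vals, vecs = np.linalg.eig(M)
    for lam, x in zip(vals, vecs.T):
        print(np.allclose(M @ x, lam * x))   # True for each eigenvalue/eigenvector pair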
Since finding the roots of an nth degree polynomial algebraically is an arduous, perhaps impossible, task you will often in practice be forced to estimate these eigenvalues using hardware. Your graphing calculator can get numerical estimates for the roots of this polynomial. If the polynomial has nice rational roots, as many rigged example problems do, graphing this polynomial obtained from the determinant function built into your calculator and seeing where the graph crosses the x axis will enable you to find all of the eigenvalues exactly. For a particular eigenvalue λ, the set of solutions for M x = λx is called the eigenspace for M and the eigenvalue λ. Every vector (except the zero vector) in an eigenspace for eigenvalue λ is an eigenvector for λ. The eigenspace for eigenvalue 0 is a set we have run into before. It is the solution space to the homogeneous system determined by matrix M. More generally, the eigenspace for eigenvalue λ is the solution space to the homogeneous system determined by matrix λIn − M . On an eigenspace, the linear transformation has essentially trivial effect. It acts on any vector there by simply multiplying by a constant. That is a very easy process to understand. 58 LARRY SUSANKA To actually find eigenvectors for a known eigenvalue λ you solve (λIn − M )x = 0. The columns associated with free parameters in our “Method Two” solution are eigenvectors, and any eigenvector for this eigenvalue is a linear combination of these columns. If we are forced to use approximate eigenvalues, these columns will of course only be “approximate” eigenvectors. Thinking about what “approximate” means can be a bit tricky. Practical folk such as Engineers spend a lot of time thinking about this in applied mathematics classes. In the last six examples of the last section, the domain and range of the transformation are the same, so they might have eigenvalues and eigenvectors12. 19.1. Exercise. Find eigenvectors and eigenvalues for the linear functions (3)-(8), whose matrices you created in Exercise 18.2. 19.2. Exercise. Find eigenvectors and eigenvalues for F (x) = Ax and G(x) = Bx and H(x) = Cx where 4 −2 0 −1 6 7 A= and B = and C = . 1 1 1 0 −2 −3 19.3. Exercise. Find eigenvectors and eigenvalues for L(x) N (x) = Bx and K(x) = Cx where −8 2 0 1 2 8 −12 12 14 0 6 7 A = 0 12 0 B = −6 6 7 , C = 0 0 −2 0 0 0 2 1 1 0 0 1 0 0 0 = Ax and 7 4 6 2 2 9 1 9 . 5 5 20. Real Vector Spaces and Subspaces. After becoming familiar with Rm and related linear functions we now make definitions of objects that have features similar to these. Many of the problems you will be working on for a while are designed to see if the definitions we make apply in a given situation or not. Mathematicians often argue about definitions. Good definitions make hard things easy to think about. We make these abstract definitions for the following simple and practical reason. It was observed how useful the theorems about Calculus and Geometry and angles and solving linear equations and so on could be in Rn . It was observed that only a few properties of Rn and dot product were ever used in proving these theorems. It was observed that many of the objects 12If you think carefully about the meaning of eigenvalues and eigenvectors you can determine them without a calculation in each of these cases. Be sure, however, to actually do the calculation for inversion and a variety of reflections, projections and rotations. OVERVIEW OF REAL LINEAR ALGEBRA 59 that kept popping up in disparate applications had all these properties. 
So all those theorems would be valid without change for any of these objects, and not just Rn ! You pay a price “up front” to learn these definitions, which might themselves have subtle aspects, but once these are learned the problems you usually encounter are simpler to think about and solve. We now define a vector space and subspaces of these vector spaces. These are objects that “act like” Rn and lines and planes through the origin. A real vector space is a nonempty set V together with two operations called vector addition and scalar multiplication that satisfy a collection of properties. Vector addition “acts on” pairs of members of V . Scalar multiplication “acts on” a pair one of which is a real number and the other of which is a member of V . We insist that both vector addition and scalar multiplication must be closed in V, and by that we mean that the result of applying these operations on members of V or real numbers always produces a member of V . You cannot leave V by doing these operations to members of V and real numbers. There must be a distinguished member of V , always denoted 0, for which v + 0 = v for all members v of V . You distinguish this member of V from the real number 0 by context. They are different, unless V = R. For each v in V there must be a member u of V for which v + u = 0. u can be denoted −v, and is called the negative or opposite of v. Vector addition must be commutative and associative: that is, v + w = w + v and (v + w) + u = v + (w + u) for any members v, u and w of V . The two distributive laws must hold: (r + s)v = rv + sv and r(v + w) = rv + rw for all real numbers r and s and any members v and w of V . Finally, (rs)v = r(sv) and 1v = v for all real numbers r and s and any v in V . To show that some situation in the world (such as the collection of arrowvectors we examined) “looks like” or “is” a vector space, you must define two operations and show (or just assume and see how that works out) that the ten properties are true. In a more abstract setting, to show a set with two operations is a vector space (or not) requires one to check that all ten requirements are actually true (or not) for these operations. Usually it will be pretty obvious if a condition is true, and counterexamples easy to come by if false. A subspace of a vector space V is a subset W of V which is, itself, a vector space with the operations it inherits from V . 60 LARRY SUSANKA To show a subset W of a vector space V is a subspace you need only show that scalar multiplication and vector addition are closed in W . The other eight conditions, required of the operations on W , are automatically true if you (somehow) already know that V is a vector space. The set containing just the zero vector is a subspace of every vector space. Any one-element vector space is called a trivial vector space. Also (obvious but worth mentioning) every vector space is a subspace of itself. R itself is a very simple real vector space. Suppose D is any nonempty set and let V be any vector space. Define VD to be the set of all functions with domain D and range contained in V . Define pointwise addition and pointwise scalar multiplication on V D in the obvious way: If f, g ∈ V D and c ∈ R define f + g to be the function (f + g)(d) = f (d) + g(d) for all d ∈ D and define (cf )(d) = cf (d) for all d ∈ D. V D is a vector space with pointwise addition and scalar multiplication. 
All of the “abstract” vector spaces you will actually use for something, both in this text and almost anywhere else13, are subspaces of vector spaces of this type. Once you identify a nonempty subset of such a space, all you need to do is verify closure of the two operations and you can conclude your subset is a vector space in its own right. The set of matrices of a specific shape with usual operations forms a vector space. We will often denote the m × n matrices with real entries by Mm×n . Rn = Mn×1 , of course, provides the prototypical example of this type. The diagonal, upper triangular and lower triangular matrices form subspaces of Mm×n . Remember that a matrix is defined by its entries. We choose to visualize those entries in a rectangular array. That helps us organize complex operations such as matrix multiplication or row reduction in an efficient way. But it is the subscripted real entries that define the matrix. So if D = { (i, j) | 1 ≤ i ≤ m and 1 ≤ j ≤ n and i, j ∈ N } a matrix is nothing more than a function m : D → R. The big rectangular symbol used to denote this function is not truly relevant. Matrix addition and scalar multiplication are the pointwise operations defined above on RD . 13Traditionally a student is exposed to one or two others as odd examples or counterexamples with “surprise” value. It will be very rare to encounter any of them in any applications. OVERVIEW OF REAL LINEAR ALGEBRA 61 The set of square n × n matrices is an important special case. The sets of symmetric, skew symmetric and traceless (i.e. trace equals zero) matrices are each subspaces of this vector space. The set of real valued functions whose domain is a specified interval of the real line is an important example. The continuous functions with this domain, the set of differentiable functions, the set of polynomials and the set of polynomials of third degree or less (again, with this domain) are four subspaces of this vector space. We will denote the set of real polynomials in variable t defined on R by P(t). The polynomials of degree at most n will be denoted Pn (t). We use this particular kind of function space a lot here: they are fairly familiar to most people taking this class and provide some examples that are clearly unlike Rn in some ways. A real sequence is a function f : N → R. The set of convergent real sequences is a subspace of RN with pointwise operations. A formal power series in one real variable is a formal sum P (we do not n presume convergence of the sequence of partial sums) of the type ∞ n=0 an t . We add two of these power series and multiply by scalars in the obvious way, by adding like powers and distributing a scalar factor across all terms in a formal series. This example is nothing more than an interpretation of the last example. The power series is determined by the sequence a : N → R used to create it, and addition and scalar multiplication match pointwise operations on the sequences. The set of power series that converge on a fixed symmetric interval around 0 constitute a vector subspace of the formal power series. There are many other examples, libraries full of them, each one requiring its own setup. Just to give you a taste, we examine a rather odd one. Let V be the set of positive real numbers. Define operation ⊕ on V by v ⊕ w = vw. Define a scalar multiplication ⊛ by r ⊛ v = v r for any real number r and member v of V . Then V is a vector space with ⊕ for vector addition and ⊛ as scalar multiplication. 20.1. Exercise. 
Show, by verifying all ten properties explicitly, that the function space V D , where D is a nonempty set and V is a real vector space, is itself a real vector space as claimed above with pointwise addition and scalar multiplication. 20.2. Exercise. Suppose given a continuous function g : [0, ∞) → R. Let RS ∞denote the set of all those real-valued functions f : [0, ∞) → R for which 0 f (t)g(t)dt exists. 20.3. Exercise. Show that the intersection of two subspaces (of the same vector space) is a subspace. 62 LARRY SUSANKA 20.4. Exercise. Decide the exact conditions under which the union of two subspaces (of the same vector space) is a subspace. 20.5. Exercise. We define the sum of two subspaces U and W of the vector space V to be the set { u + w | u ∈ U and w ∈ W }. We denote this set U + W. Show that U + W is a subspace of V . 20.6. Exercise. Let A = { (x, y) ∈ R2 | y = ±x }. Is A a subspace of R2 ? 20.7. Exercise. Let B = { (x, y, z) ∈ R3 | 2x − 3y + 4z = 1 }. Is B a subspace of R3 ? 20.8. Exercise. Let C = { (0, 0) } ∪ { (x, y) ∈ R2 | x2 + y 2 > 1 }. Is C a subspace of R2 ? 20.9. Exercise. Let D = { (x, y) ∈ R2 | x2 + y 2 < 1 }. Is D a subspace of R2 ? 20.10. Exercise. Let F = { (x, y) ∈ R2 | (x, y) · (1, 7) = 0 }. Is F a subspace of R2 ? 20.11. Exercise. Suppose p and v are fixed members of Rn . Let G = { x ∈ Rn | x = p + tv for some t ∈ R }. When is G a subspace of Rn ? 20.12. Exercise. Suppose A is a fixed 3 × 3 matrix. Let Is H a subspace of R3 ? H = { x ∈ R3 | Ax = 7x }. 20.13. Exercise. Suppose A is a fixed 3 × 3 matrix. Let K = { x ∈ R3 | Ax = 0 }. Is K a subspace of R3 ? What does this have to do with a solution set for a homogeneous system? 20.14. Exercise. Let G be the set of polynomials in variable t of even degree. (Constant functions are of degree zero.) Is G a vector space? 20.15. Exercise. Let Is W a subspace of R3 ? W = { (x, y) ∈ R2 | y ≥ 0 }. 20.16. Exercise. Show that the example operations defined at the end of text of Section 20 above make [0, ∞) into a real vector space. 20.17. Exercise. (i) Consider the set R2 together with “scalar multiplication” given by c ⊙ x = 0 for all real c and all vectors x, together with the usual vector addition. Which of the ten properties that define a vector space hold for these two operations? (ii) Now consider the same question when also ordinary vector addition is replaced by the operation v ⊕ w = 0 for all vectors v and w. OVERVIEW OF REAL LINEAR ALGEBRA 63 20.18. Exercise. Consider the set R2 together with ordinary scalar multiplication and “vector addition” given by the operation v⊕w = (v 2 +w1 , w2 +v 1 ) for all vectors v and w. Which of the ten properties that define a vector space hold for these two operation? 21. Basis for a Vector Space. A linear combination of vectors is a finite sum of numerical multiples of vectors, all from the same vector space. The span of a set S of vectors in a vector space V is the set of all vectors that can be written as a finite sum a1 v 1 + · · · + ak v k ai are scalars, vi are in S. The span of a set of vectors S, denoted Span(S), is always a subspace of V . We say that S spans V if Span(S) = V . We define linear dependence and its negation, linear independence. A set of vectors is linearly dependent if at least one of these vectors can be written as a linear combination of the others. Linear dependence can be phrased in terms of the existence of a nontrivial solution to a certain homogeneous system of equations. A set S of vectors is linearly independent if whenever v1 , . . . 
, vk are different members of S the equation x1 v 1 + · · · + xk v k = 0 has only the zero solution x1 = x2 = · · · = xk = 0. S will be linearly dependent if there are distinct (that is, different) members v1 , . . . , vk of S for which the equation above has a nontrivial (that is, not all xi are zero) solution. Of course any set containing the zero vector is linearly dependent. A set containing just a single nonzero vector is linearly independent. Any set containing two vectors where one is a multiple of the other is linearly dependent. But any set containing just two vectors for which neither one is a multiple of the other is linearly independent. In context, the repetitive word “linearly” is often dropped, and we refer, simply, to dependent or independent sets of vectors. We now define a basis of a vector space and the closely related and computationally convenient concept of ordered basis. A basis for a vector space V is a linearly independent set of vectors that spans V. 64 LARRY SUSANKA An ordered basis is a listing of the different members of a basis in a particular order. We will almost always use ordered bases. Sometimes we will refer to an ordered set of vectors as a list. We have a theorem (proved below) that states that if a vector space has a basis, and there is a finite number of vectors in a basis of that vector space, then that number cannot vary. This number is called the dimension of the vector space. It is denoted dim(V). If V is the trivial vector space it is said to be zero dimensional. If V is nontrivial but has no finite basis, V is said to be infinite dimensional. It is convenient to observe that any set of nonzero vectors S (which may be a dependent set) can be trimmed to contain a basis for Span(S)—select independent vectors from S, one at a time, till no independent directions remain. So every nontrivial vector space has a basis, which may be selected from any spanning set. On the other hand, if T is any linearly independent set of vectors in a vector space V then T is contained in a basis for V . You “grow” the basis from T by adding independent vectors from V one at a time until there are no independent directions left in V . So a vector space cannot have dimension exceeding that of any containing space. Any linearly independent set in a vector space can be extended to a basis of the space. Any spanning set contains a basis. These two important theorems remain true for infinite dimensional spaces, even though the somewhat easier proofs we could create in a finite dimensional setting fail. The mathematics required to handle these cases efficiently is beyond the scope of our goals here, and we leave these proofs for later classes. This lacuna is less of a problem than you might suppose: most of the vector spaces you will see can be (or are) given explicitly in terms of a basis from the outset and the benefits of (typically nonconstructive) existence proofs in practice are (usually) minimal. It is very important to note that if S is an independent set of vectors then each member of Span(S) can be written in only one way as a linear combination of members of S, except for order of terms in the sum. If S is a basis of V , so Span(S) = V , then every member of V can be written in one and only one way (except for order) as a linear combination of vectors from S. 21.1. Exercise. Prove the last boxed statement above. OVERVIEW OF REAL LINEAR ALGEBRA 65 21.2. Exercise. Which of the following are linearly independent sets? { (1, 2) }. { (1, 2), (2, 4) }. 
{ (3, 6, 9), (11, 14, 26), (1, 2, 8) }. 21.3. Exercise. Is the span of the following set all of R3 ? { (4, 5, 2), (1, 6, 2), (10, 22, 8) }. 21.4. Exercise. Suppose S is a set of vectors and a1 s1 + · · · + ak sk is a nontrivial linear combination involving members si ∈ S. Let us suppose further that all ai 6= 0. Let T be the set consisting of all members of S except for s1 , which has been replaced by a1 s1 + · · · + ak sk . Show that Span(S) = Span(T ). Note that rref (A) is obtained from matrix A by replacing rows of A, one after another, by combinations of rows of just this type. Conclude that the span of the rows of A is the same as the span of the rows of rref (A). 21.5. Exercise. Let S be the set of vectors S = { (1, 3, 6, 2), (4, 3, 6, 2), (5, 6, 12, 4), (6, 9, 18, 6), (8, 1, 7, 1) }. Find a very nice basis for Span(S), defined (somewhat subjectively, but you know one when you see one) to be a basis whose vectors have lots of zero entries and few fractional entries and which are obviously independent. 21.6. Exercise. Let T be the set of polynomials T = { t2 − 3t, t2 + 3t, 7t }. Is T a basis for Span(T )? If yes, prove it. If not, find a basis for Span(T ). 21.7. Exercise. Find a basis for the vector space { at2 + b(t2 + t) + ct ∈ P (t) | a, b, c, d ∈ R }. 21.8. Exercise. We know that the diagonal matrices, the upper triangular matrices, the symmetric matrices, the skew symmetric matrices and the traceless matrices all form subspaces of Mn×n . What is the dimension of each of these subspaces? 21.9. Exercise. Does { Sin(t), et } span { a Sin(t) + b et + c Sin(t)et | a, b, c ∈ R }? 21.10. Exercise. Let P (t) denote the set of all polynomials in variable t. Let W be the set of polynomials W = { f ∈ P (t) | f (1) = 0. }. Is W a subspace of P (t)? If no, prove it. If yes, find a basis for W . 66 LARRY SUSANKA 21.11. Exercise. Prove that the polynomial tn is not in the span of { 1, t, . . . , tn−1 }. 21.12. Exercise. Suppose B1 , . . . , Bi is a linearly independent list of vectors in Rn and suppose v1 , . . . , vt is a linearly independent list of vectors in Ri . So it must be that t ≤ i ≤ n. Let M be the n × i block matrix given by M = (B1 . . . Bi ) and define vectors S1 , . . . , St by Sj = Mvj for j = 1, . . . , t. Show that S1 , . . . , St is a linearly independent list of vectors. hint: We note two facts. First, the only solution to My = 0 is y = 0. Second, (using the Einstein summation convention here) if xj vj = 0 then xj = 0 for all j. With these two facts in hand, we suppose xj Sj = 0. But then so xj vj = 0 so xj = 0 for all j. 0 = xj Sj = xj Mvj = M xj vj This leads to the desired conclusion. 22. The Span of Column Vectors. In this section we are going to find out how to see if a finite list of column vectors is independent, and how to select a basis from among them if they are dependent. We have now seen other types of vector spaces than Rn . After we discuss coordinates you will see that the approach of this section (and the next) will apply to these other spaces too. Suppose given column vectors C1 , . . . , Cn . Our first goal is to determine dependency among these vectors. Dependency is found through a nontrivial solution to the equation x1 C1 + · · · + xn Cn = 0. This corresponds to the matrix equation ( C1 · · · Cn )x = 0. If we solve the equation by doing row reduction to rref there are two possibilities. First, all columns could be pivot columns, in which case the only solution is the “xi = 0 for all i ” solution, and the columns C1 , . . . 
, Cn form an independent set and so are a basis for Span({ C1 , . . . , Cn }). The other case is where some of the columns are pivot columns but others are not. This divides the variables into two groups: pivot variables OVERVIEW OF REAL LINEAR ALGEBRA 67 xi1 , . . . , xik and free variables xj1 , . . . , xjn−k . The free variables can be chosen arbitrarily, and then the pivot variable values are determined by formulas involving these free variables as we have seen. But the point here is that there are nontrivial solutions to the matrix equation above, so the original list of columns is a dependent list. But we can actually get more out of this. If we have any choice of free variables and determine the pivot variables from them we find, moving free variable terms to the right, that xi1 Ci1 + · · · + xik Cik = − xj1 Cj1 − · · · − xjn−k Cjn−k . By choosing free variable xj1 = −1 and all other free variables to be 0 we find that we can write the “free column” Cj1 in terms of the “pivot columns.” That means it can be deleted from the list of columns without affecting the span of the columns. We can say the same thing about all the free columns: they can all be deleted without affecting the span of the columns. Having done the deletion of all columns associated with free variables and recreating the matrix equation above, we find that all columns in the row reduced matrix are pivot columns, and therefore the reduced list is a basis of the span of all the columns. It is worth noting that the dimension of this vector space is the number of nonzero rows in the rref of the original (as well as the shortened) matrix of column vectors: there is one nonzero row for each pivot. The rref of the shortened matrix is a block matrix with identity matrix Ik on top of a zero block. Let’s apply this setup to a specific situation. Let S be the set of vectors S = { (1, 3, 6, 2), (4, 3, 6, 2), (5, 6, 12, 4), (6, 9, 18, 6), (8, 1, 7, 1) }. This generates matrix equation x1 0 1 4 5 6 8 2 x 3 3 6 9 1 3 0 6 6 12 18 17 x4 = 0 . x 0 2 2 4 6 1 0 x5 Hitting this with the rref stick yields x1 0 1 0 1 2 0 2 x 0 0 1 1 1 0 x3 = 0 . 0 0 0 0 1 x4 0 0 0 0 0 0 0 x5 68 LARRY SUSANKA After deleting the superfluous free columns { (5, 6, 12, 4), (6, 9, 18, 6) } we have a basis for Span(S), given by the pivot columns { (1, 3, 6, 2), (4, 3, 6, 2), (8, 1, 7, 1) }. In case you need an explicit expression of the dependency, we have 1 x −1 −2 x2 = x3 −1 + x4 −1 . 0 0 x5 Choosing x3 = −1 and x4 = 0 we have C3 = C1 + C2 . Choosing x3 = 0 and x4 = −1 we have C4 = 2C1 + C2 . It is worth noting the relation between the entries in the vector to the right of the free variable and the specific combination of pivot vectors generating the dependency of a “free” vector in terms of the “pivot” vectors. 23. A Basis for the Intersection of Two Vector Subspaces of Rn . Suppose { C1 , . . . , Ck } is an ordered basis for a subspace V of Rn and { B1 , . . . , Bi } is an ordered basis for a subspace W of Rn . We know that V ∩ W is a vector subspace of Rn , but how can we determine what it is? How do we represent it? A vector v is in V ∩ W exactly when it can be represented in terms of both bases listed above. Specifically, there are numbers x1 , . . . , xk and y 1 , . . . , y i for which v = x 1 C 1 + · · · + x k C k = y 1 B1 + · · · + y i Bi . So we are seeking all solutions to x1 C1 + · · · + xk Ck − y 1 B1 − · · · − y i Bi = 0. This corresponds to the block matrix equation Mz = (C1 · · · Ck − B1 · · · − Bi )z = 0 where z is the column (x1 , . . . 
, xk , y 1 , . . . , y i ). In rref (M), all of the variables x1 , . . . , xk must be pivot variables because { C1 , . . . , Ck } is a linearly independent set. The row reduction on M is performed one column at a time, moving from left to right. The exact same elementary row operations (left multiplication by elementary matrices) that clean up the first k columns of M will do the same for the block matrix (C1 · · · Ck ). Since all k columns in this last matrix are pivot columns, these identical first k columns must be pivot columns in rref (M) as well. If all of the y 1 , . . . , y i are pivot variables too then the only solution is the all-zero solution and the intersection V ∩ W is {0}. OVERVIEW OF REAL LINEAR ALGEBRA 69 On the other hand if any of the y 1 , . . . , y i are free then V ∩ W will not be trivial. Suppose y j1 , . . . , y jt are the free variables among the y 1 , . . . , y i . The solution coefficient values x1 , . . . , xk and also y 1 , . . . , y i can be written in terms of them. A solution expression y 1 B1 + · · · + y i Bi is a generic member of the intersection. For each q between 1 and t let vq be the vector in Ri whose entries are the solution coefficients y 1 , . . . , y i obtained by choosing y jq = 1 and all the other free variables equal to 0. The pivot variables among the y 1 , . . . , y i are determined by these free variable choices. v1 , . . . , vt is a linearly independent list of vectors: each of these columns has entry value of 1 in a row where all the rest have a 0. 23.1. Exercise. Show that y = a1 v1 + · · · + at vt is the vector containing the list of solution coefficients corresponding to the choices y j1 = a1 , . . . , y jt = at . Appealing to Exercise 21.12 we find that Sq = (B1 · · · Bi ) vq for q = 1, . . . , t is a linearly independent list of t members of the intersection. And any member of the intersection can be written as v = y 1 B1 + · · · + y i Bi = y j 1 S 1 + · · · + y j t S t for choices of the free variables y j1 , . . . , y jt . So Span({S1 , . . . , St }) = V ∩ W , which therefore has dimension t. We have, along the way, proved the following important result for any subspaces V and W of Rn : dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ). Here is an example of the things we built above. We are given two subspaces of R4 as the span of ordered bases: V = Span({ (1, 3, 6, 2), (4, 3, 6, 2), (8, 1, 7, 1) }) and W = Span({ (9, 3, 0, 2), (0, 3, 1, 2), (5, 1, 7, 9) }). We want to determine and, if nontrivial, find a basis for V ∩ W . Create the matrix 1 3 M = 6 2 4 3 6 2 8 −9 0 −5 1 −3 −3 −1 . 7 0 −1 −7 1 −2 −2 −9 70 LARRY SUSANKA This matrix has six columns and four rows, so we know right now, without further work, that there will be at least two free variables and that V ∩ W will be nontrivial. 1 0 0 0 −49/18 −263/3 0 1 0 0 23/9 347/3 . rref (M) = 0 0 1 0 0 −25 1 0 0 0 5/6 20 This means: 5 y 1 = − y 2 − 20y 3 6 2 2 y =y y3 = y3. So a generic member of the intersection can be written in parametric form as v = y 1 B1 + y 2 B2 + y 3 B3 5 0 9 5 2 3 1 2 3 3 3 = − y − 20y + y + y 7 1 0 6 9 2 2 9 0 9 5 5 3 3 3 + + y 3 −20 + 1 = y2 − 6 0 1 0 7 2 2 2 9 175 −45 y2 3 − y 3 59 . (y 2 and y 3 are free parameters.) = −7 6 6 31 2 So the two vectors in the last line form a basis for V ∩ W . Using these two vectors as rows in a 2 × 4 matrix and reducing to rref produces the (very) marginally better basis 636 0 636 0 . , −75 147 349 −5 OVERVIEW OF REAL LINEAR ALGEBRA 71 24. ✸ Solving Problems in More Advanced Math Classes. 
Students in a Linear Algebra class14 typically have a bit of an emotional roller-coaster ride as the course progresses. You are all able calculators, and have been rewarded for that in your past math classes. That is why you are here. And the first few weeks play to that strength. You are learning how to do a bunch of calculations, mostly related to things you have seen before such as vectors and angles and solving systems of linear equations. But now the game is changing. We are introducing more abstract ideas. In many of the problems you are asked if one or more of the definitions apply to an object given in the problem. You might be asked if a set with certain operations is a vector space, a subspace, or if a function is linear. You might be asked if a set of vectors spans a given vector space, or if it is a dependent or independent set of vectors. You might be asked to identify a nullspace, or a columnspace or an eigenspace, or to change bases from one basis to another. All of these require a calculation step . . . but which calculation are you to do? And why? And what if (horrors) there are two different different kinds of steps you must take, each with their own calculation, before you can draw a conclusion? Typically, for the middle weeks in a class like this, there is a lot of angst and frustration. Scores on quizzes and tests might drop. Then (for almost everyone, I hope) there is a sequence of those “AHA” moments we all like so much and students become much more successful and efficient, just in time for the last test and the final. In all math classes, from this point onward, there is a sequence of steps you take in analyzing a problem, assuming that the techniques of the subject apply. In Linear Algebra, for instance, you must identify the vector space of interest and specify unambiguously the linear functions of interest. If there is additional structure, such as a means of determining angles in the space, that needs to be identified or specified. Application subject areas each have their own vocabulary and issues they care about but these refer to common facts about matrices and the like, in disguise. The question you would like to answer is rephrased in terms used in this class. You must clarify: 1) What is the question? If it takes several steps to answer, separate the steps so you don’t get lost. Don’t try to do them all at once. For instance to show a set is a basis, you must show it is linearly independent and spans. These are very different properties. 2) What technique will be used? Often it involves solving some linear system of equations. Identify this technique. Remember that sometimes there are several techniques you could use to show the same thing. Sometimes one way is much easier than another! 3) Carry out the calculation. 4) Draw an explicit conclusion. Remind yourself in writing of what you have just 14Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. 72 LARRY SUSANKA accomplished by the calculation. Sometimes the calculation takes a long time and it is easy to lose track of why you did it. 5) If it is a multi-step question, sum up after you have verified all the parts of the argument and make the concluding assertion. It is all about communication—with others and with yourself. Whew. Lots to do, but believe it or not people can learn to do this. It’s also kind of fun after you get the hang of it. 
And learning to isolate assumptions, identify the question, select a technique, carry out the calculation and, finally, draw a valid conclusion in this rather pristine environment will make you a more powerful problem solver in other areas of study as well. 25. Dimension is Well Defined. The proof of this result, found below, is one of two proofs in these notes that you should think about enough so you thoroughly understand it and could recreate it. Suppose V is a vector space and v1 , . . . , vn is a list of vectors that span V : that is, a list of n different vectors from V so that every vector in V can be written as a linear combination of the vectors on this list. We will show that any list z1 , . . . , zn+1 of n + 1 vectors from V must be dependent. In particular, we will show there are constants a1 , . . . , an+1 Pn+1that j which are not all 0 and for which j=1 a zj = 0. This will imply that no basis for V can have more than n members, and one can deduce (can you?) that if v1 , . . . , vn is a basis (that is, it not only spans V but is an independent list of vectors too) no basis can have fewer than n members either. On with the proof. We don’t use the Einstein summation convention here: keeping track of the length of each sum is part of the discussion. Since v1 , . . . , vn spans V , for j = 1, . . . , n + 1 there are constants cij for P which zj = ni=1 cij vi . The matrix C formed from these 1 c1 c2 C = cij = .1 .. cn1 constants c12 ... 2 c2 ... .. . cn2 ... c1n+1 c2n+1 .. . n cn+1 has more columns than rows: it is an n by n + 1 matrix. This means that the left-multiplication linear transformation f : Rn+1 → Rn defined by f (w) = Cw is not one-to-one. In particular there is a nonzero solution a ∈ Rn+1 to the equation Ca = 0. OVERVIEW OF REAL LINEAR ALGEBRA 73 Upon examining the n entries of Ca, which are all 0, we see that n+1 n+1 n n n+1 n X X X X X X a j zj = aj cij vi = aj cij vi = 0 vi = 0. j=1 j=1 i=1 i=1 j=1 i=1 This was the result we were looking for. 26. Coordinates. If S = { v1 , · · · , vn } is an ordered basis for V and p = a1 v1 + · · · + an vn we define the S-coordinates of p to be a1 a2 [p]S = [ a1 v1 + · · · + an vn ]S = .. = a1 e1 + · · · + an en . . an Each vector in V has unique S-coordinates. The function [·]S : V → Rn is called the S-coordinate map. No matter what the vector space was at the outset all questions bearing on the vector structure on the space can be translated to a question involving these S-coordinates and the standard ordered basis En = { e1 , . . . , en } for Rn , our paradigmatic vector space. After we answer the corresponding question in Rn we transfer the answer back to V using the S-coordinate map. There is a word for this situation in mathematics: isomorphic, meaning “same form.” Any finite dimensional real vector space is isomorphic to Rn for some unique integer n. We also note an obvious fact: if x is in Rn the coordinates for x with basis En , which we could denote [x]En , is just x itself. Up to this point we had only one interpretation of coordinates: those in terms of the only basis we knew about, En . There was no need to specify the basis, the “language we were speaking.” That is no longer the case. When there is more than one basis in sight, you must take explicit note of the basis of V to which the coordinates refer. Otherwise there is no way to interpret the meaning of the coordinates, and that kind of sloppiness is a huge source of confusion. We bring up now a point that bears remembering and which we will use. Suppose v1 , . . 
. , vk are members of V and m1 , . . . , mk are members of Rn and these vectors are related by [vi ]S = mi for a basis S for V and all i. 74 LARRY SUSANKA Then each member of the span of { v1 , . . . , vk } is associated with exactly one member of the span of { m1 , . . . , mk }, and conversely, by the S-coordinate map. Entire subspaces of V can be moved to Rn and back, and subspaces of Rn are associated with subspaces of V . The set { v1 , . . . , vk } is linearly independent exactly when { m1 , . . . , mk } is linearly independent. 26.1. Exercise. (i) Find coordinates [w]T where t et − e−t e + e−t t −t w = 3e − 5e + 1 and T = . , 1, 2 2 (ii) Find coordinates [v]S where 7 2 0 0 0 1 0 1 1 0 v= and S = , , , . −1 3 0 1 1 0 0 1 0 0 27. ✸ Position Vectors and Coordinates. We now consider an example of particular importance to applications, the relationship between coordinates and physical space15. We suppose the arrow-vectors in space form a three dimensional vector space, as they certainly seem to do by our common experience. If someone disagrees it is up to this critic to point out where that assumption leads us to error. If the critique is validated, and only at that point, the rules of our science-game require us to revise our opinion. But until then, and as long as the assumption continues to lead us to verifiable—and verified—conclusions, we will carry on. This assumption has been found to be a (very) useful approximation. When we want to identify the arrow-vectors in space with R3 we commonly select an explicit ordered basis S = { v1 , v2 , v3 } of three arrowvectors which incorporate our sense of three independent directions. These arrow-vectors might be perpendicular to each other in space if that is convenient, though we do not require it. These vectors are to be our “measuring sticks” in three different directions and any arrow-vector can be written as p = p1 v1 + p2 v2 + p3 v3 which we associate to [p]S = (p1 , p2 , p3 ) in R3 . Recall from Section 6 that if we want to use a position vector to describe a specific location A in space we first must identify a base point E in space, the corner of your laboratory or some other convenient place. Let’s suppose the displacement vector from E to A is the vector p from above. 15Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. OVERVIEW OF REAL LINEAR ALGEBRA 75 Then the position vector to A with base point E is a specific arrow that starts at E and ends at A denoted AE = EE + (AE − EE) = EE + p. If someone else has a different idea of where the base point should be located in space, as is bound to occur from time to time, we will need different position vectors. Let’s say their base point is a location B in space, and the displacement vector from B to E is given by EE − BE = EB − BB = c = c1 v1 + c2 v2 + c3 v3 . According to our earlier discussion on page 28, AB = EB + (AB − EB) = BB + (EB − BB) + (AB − EB) = BB + c + p. If we are to associate position vectors in the world with position vectors in Rn we must get the base points involved in an explicit way. We intend to associate base point E in the world with the standard origin in R3 . To make this association explicit in the notation, we use 0E to denote the position vector of (0, 0, 0) with base point (0, 0, 0) in R3 . For base point E in the world, we define for position vector AE [ AE ]S, E = [ EE + (AE − EE) ]S, E = 0E + [ AE − EE ]S = 0E + [p]S . 
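In practice, finding [p]S for a concrete displacement is nothing more than solving a small linear system: write the measuring sticks as the columns of a matrix and solve. Here is a minimal sketch in Python with NumPy (one choice of tool among many; the basis vectors and the displacement below are made up for illustration).

    import numpy as np

    # the three "measuring stick" vectors of S, written as the columns of a matrix
    # (all numbers here are made up for illustration)
    S = np.column_stack([(1., 0., 0.), (1., 1., 0.), (1., 1., 1.)])
    p = np.array([2., 3., 4.])      # a displacement, in standard coordinates

    # [p]_S solves  p = p^1 v_1 + p^2 v_2 + p^3 v_3,  that is,  S [p]_S = p
    p_S = np.linalg.solve(S, p)
    print(p_S)                      # [-1. -1.  4.]

Exactly the same solve recovers the coordinates of any displacement with respect to whatever three independent measuring sticks were chosen.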
We associate the position vector EE in the world with position vector 0E in R3 . For any other position vector AE, we add to that the S-coordinate map applied to the displacement of A from the base point E. It is a straightforward matter to compute the position vector in R3 associated with the position vector of A using a new base point B. [ AB ]S, B = [ BB + (AB − BB) ]S, B = [ BB + (EB − BB) + (AB − EB) ]S, B = 0B + [ c + p ]S = 0B + [c]S + [p]S . We reiterate: it makes no sense to talk about a position vector to a point in the world represented as a position vector in R3 , unless one specifies both the ordered basis S in the world and the base point E in the world. You can do it by repeatedly reminding people of basis and base point in words, or by use of an explicit notation embedded in the calculations where the issue will arise. The function [·]S, E that implements the association using notation is called a position map. 76 LARRY SUSANKA 28. Linear Functions Between Vector Spaces. Suppose f is a function with domain V and range in W , denoted f : V → W . If V and W are vector spaces we say f is linear when: f (u+cv) = f (u)+c f (v) for all members u and v in V and any constant c. As an example, the S-coordinates functions from the last section are linear. If A is an ordered basis of V and x = xi vi is a representation of x as a linear combination of members of A and if f is linear then f (x) = f (xi vi ) = xi f (vi ). We conclude: A linear function is completely determined by what it does to any spanning subset of its domain space. In particular, any two linear functions that agree on a basis are actually the same function. The kernel of f is the set of those v in V for which f (v) = 0. We will denote this set ker(f ). The image of f is the set of those w in W which are f (v) for some v in V . We will denote this set image(f ). The kernel is the set of vectors “killed” by f , while the image is the collection of all “outputs” of f . Both kernel and image are vector subspaces, of V and W respectively. If W = V , so f : V → V , we define eigenvectors and eigenvalues for f . The number λ is called an eigenvalue for f if there is a nonzero vector x for which f (x) = λx. The nonzero vector x is called an eigenvector for λ and f . The set of all x for which f (x) = λx (the set of eigenvectors for λ plus the zero vector) is a subspace of V , called the eigenspace for f and eigenvalue λ. If A is a basis for V and C is a basis for W there is a matrix that corresponds to the effect f has on the A-coordinates of vectors in V in terms of C-coordinates of vectors in W . Specifically if A = { a1 , . . . , an } and W has dimension m we define: MC←A = [f (a1 )]C · · · [f (an )]C (Each [f (ai )]C is an m by 1 column.) MC←A is called the matrix of f with respect to bases A and C. For any x = xi ai in V , MC←A [x]A = [f (x)]C . OVERVIEW OF REAL LINEAR ALGEBRA 77 The proof is, again, nothing deeper than a calculation: 1 x 2 x MC←A [x]A = [f (a1 )]C · · · [f (an )]C .. . xn = xi [f (ai )]C = [f (xi ai )]C = [f (x)]C . A picture illustrating this situation is found below: W ✛ Commuting Square f [·]C (reversible) V [·]A (reversible) ❄ MC←A Rm ✛ ❄ Rn A diagram as above is called a commutative diagram by mathematicians. 
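Here is a small numerical illustration of the shaded relation MC←A [x]A = [f(x)]C, a sketch in Python with NumPy; the spaces, bases and numbers below are chosen only for illustration.

    import numpy as np

    # f = d/dt from V = span{1, t, t^2} to W = span{1, t},
    # with ordered bases A = {1, t, t^2} and C = {1, t} (chosen for illustration).
    # The columns of M_{C<-A} are [f(1)]_C, [f(t)]_C, [f(t^2)]_C.
    M_CA = np.column_stack([(0., 0.), (1., 0.), (0., 2.)])

    x_A = np.array([7., -3., 5.])   # x = 7 - 3t + 5t^2, so [x]_A = (7, -3, 5)
    print(M_CA @ x_A)               # [-3. 10.], the C-coordinates of f(x) = -3 + 10t

Taking A-coordinates first and then multiplying by MC←A gives the same answer as applying f first and then taking C-coordinates, which is exactly what the commuting square above records.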
It means that if you start with a member of a set at the “corner” V and apply the functions on the edges as indicated by the arrows you will end up with the same member of Rm , whether you follow the path across the top and down by [·]C , or if you go down first by [·]A and across by using the matrix MC←A . Diagrams of this kind help us visualize (and so remember) the relationships among functions. If an arrow is indicated to be reversible, we mean that the function it connotes has an inverse function and the modified diagram with the function replaced by its inverse (and arrow reversed at that place) is commutative too. We remark that if any m by n matrix M is thought of as a linear function from Rn to Rm by left multiplication, and domain and range have the standard bases En , then MEm←En = M . There was no need for this notational specificity when the only bases we had were the standard bases. But now we must be explicit about the basis in both domain and range: otherwise there is no way to know what the entries in the matrix mean. Solutions to f (v) = b can be found by solving MS←S x = [b]S and transferring the answer back to V . The image and kernel of f can be found the same way. The characteristic polynomial is defined for f : V → V by using any basis S for both domain and range to create a matrix MS←S for f . The characteristic polynomial for f is then defined to be the characteristic polynomial for MS←S . The eigenvalues for MS←S , and the vectors in V associated with the eigenvectors for this matrix are the eigenvalues and eigenvectors for f . Though the matrix MS←S depends on the choice of basis S, the characteristic polynomial, and the eigenvalues and eigenvectors for f in V found 78 LARRY SUSANKA using this matrix, will not depend on this choice. Some of that is obvious now, but it will be completely clear after we discuss matrices of transition from one basis to another. 28.1. Exercise. Let S be the ordered basis { t, et , e2t } for function space V and let T be ordered basis { et , e2t } for function space W . Consider the d2 linear function dt 2 : V → W . Find MT←S for this function. What is the kernel of this function? What is the image of this function? 28.2. Exercise. Let S be the ordered basis { (1, 2, 3), (1, 0, 1), (0, 1, 5) } for R3 and define the linear function F : R3 → R3 by 33 −35 7 F (x) = Ax where A = −10 −26 10 . −1 −85 41 First give ME3←E3 for this function. Then find MS←S for this function. 28.3. Exercise. (i) Is the function W : P (t) → R given by Z 1 (3 + t)g(t) dt W (g) = 0 linear? Prove it is or show why not. (ii) Is the function K : P (t) → R given by Z 1 2 + g(t) dt K(g) = 0 linear? Prove it is or show why not. 28.4. Exercise. Is the function H : R2 → R2 given by W (x) = (x1 x2 , x1 + x2 ) linear? Prove it is or show why not. 28.5. Exercise. If f : V → V is linear and λ is an eigenvalue for f show that the eigenspace for this eigenvalue is a vector subspace of V . d 28.6. Exercise. Find the matrix MT←S of dt : V → W where V has ordered t t basis S = { te − e , Cos(t) } and W has ordered basis T = { Sin(t), tet }. 29. Change of Basis. Suppose A = { a1 , . . . , an } and B = { b1 , . . . , bn } are two ordered bases for a vector space V . We do not rule out the possibility that V = Rn and A or B is the standard basis. We remind the reader that [x]En = x, where En is the standard basis of Rn . This is a pretty common situation. In any case, we will let PB←A be the n by n matrix PB←A = [a1 ]B . . . [an ]B . 
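When V = Rn and the two bases are handed to you as columns of matrices, each column [aj ]B is the solution of a linear system, so the whole matrix PB←A comes out of a single solve. A sketch in Python with NumPy (the basis vectors are made up for illustration):

    import numpy as np

    # two ordered bases of R^2, written as the columns of matrices
    # (numbers made up for illustration)
    A = np.column_stack([(1., 2.), (1., -1.)])      # a_1, a_2
    B = np.column_stack([(2., 1.), (0., 1.)])       # b_1, b_2

    # column j of P_{B<-A} is [a_j]_B, i.e. the solution of  B y = a_j,
    # so solving for all the columns at once gives the whole matrix
    P_BA = np.linalg.solve(B, A)
    print(P_BA)                     # [[ 0.5  0.5]
                                    #  [ 1.5 -1.5]]
    print(B @ P_BA - A)             # the zero matrix: each a_j is rebuilt from its B-coordinates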
OVERVIEW OF REAL LINEAR ALGEBRA 79 PB←A is called the matrix of transition from A to B. Its columns are the coordinates of the “old” A vectors in terms of the “new” basis B. You can think of it as an “automatic translator” from the A-language to the B-language. It is a fact that [x]B = PB←A [x]A . The proof: Suppose x = xi ai . Then i PB←A [x]A = PB←A [x ai ]A = [a1 ]B x1 2 x . . . [an ]B .. . xn = xi [ai ]B = [xi ai ]B = [x]B . A picture indicating what is happening here is found below: V Commuting Triangle (all arrows reversible) [·]B [·] ❅ A ❘ ❅ ✠ P n ✛ B←A R Rn Since PA←B PB←A [x]A = [x]A , the product PA←B PB←A is the identity matrix. −1 PB←A is the matrix of transition PA←B from B to A, so its columns are the A-coordinates of the vectors in B. One more little time-saver: Sometimes there will be a nice basis A, but two more useful bases B and C. The bases B and C have to be given to you somehow. Usually both are given in terms of some easy basis such as A. If you want to find the matrix of transition PC←B you can proceed via the basis A by −1 PC←B = PC←A PA←B = PA←C PA←B . Generally in applications of this material to specific examples you should be very cautious before launching into some horrifying messy calculation. You want to off-load this drudgery to hardware, where it belongs. Almost always there is a basis in which a linear transformation has an easy-to-compute matrix. Almost always there are easy-to-use coordinates close at hand. Use them! Enter the transition matrices into an electronic assistant and maneuver around using the shaded formulae above and below. 80 LARRY SUSANKA 29.1. Exercise. Consider the polynomial space P2 (t) with ordered bases A = { 3t + 1, 5t, t + t2 } and B = { 1, t, t2 } and C = { 1 − t2 , 1 − t, 1 + t2 }. Find the matrix PA←C that converts C-coordinates to A-coordinates. 30. Effect of Change of Basis on the Matrix for a Linear Function. Finally, we consider changes in the matrix of a linear transformation f : V → W under change of basis. Suppose we wish to change from basis A to basis B in V and from basis C to basis D in W . Let PB←A be the matrix of transition from A to B in V and let PD←C be the matrix of transition from C to D in W . Suppose MC←A and MD←B are the matrices of f with respect to the implied coordinates. These matrices involve the coordinates of members of V and W with respect to certain bases in V and W . Specifically, if x is a generic member of V , we find MC←A [x]A = [f (x)]C and MD←B [x]B = [f (x)]D . So: PD←C MC←A [x]A = PD←C [f (x)]C = [f (x)]D = MD←B [x]B = MD←B PB←A [x]A . This implies PD←C MC←A = MD←B PB←A and so −1 MD←B = PD←C MC←A PB←A = PD←C MC←A PA←B . MD←B = PD←C MC←A PA←B . This shaded equation, applied to coordinates [x]B is MD←B [x]B = PD←C MC←A PA←B [x]B . The internal dialog you use to describe this equation (from far right to left) is: “Change from B-coordinates of x to A-coordinates. Do the f -thing to these coordinates, producing C-coordinates for f (x). Then switch to Dcoordinates. The result is just as if you had used MD←B on [x]B directly.” A picture that could be useful here is the following. 
PA←B Rn ✛ ❦ ◗ ◗ MC←A Commuting Prism (horizontal arrows reversible) ✑ ◗ [·]A ❄ V f ◗ [·]C ✸ ✑ ✑ [·]B ❄✑ W MD←B =PD←C MC←A PA←B ❄ ✲ Rm ✸ ✑ ✑ PD←C Rm ❦ ◗ ◗ Rn [·]D OVERVIEW OF REAL LINEAR ALGEBRA 81 From a different standpoint, any n × n invertible matrix P whatsoever could be construed as a matrix of transition, so if V is an n-dimensional vector space and MA←A is the matrix of a linear function f : V → V in a basis A then the matrix P −1 MA←A P will be the matrix of f in a different basis. Matrices that are related this way are called similar. Similar matrices share many properties. For instance, the characteristic polynomials of similar matrices are identical, so they share eigenvalues. See Exercise 39.2 for more. 30.1. Exercise. Define F : M2×2 → M2×2 by F (X) = AXB where 7 2 1 0 A= and B = . −1 3 3 2 Find the matrix MS←S of F with 0 0 0 S= , 0 1 1 respect to ordered 1 0 1 1 , , 0 0 1 0 basis 0 . 0 Does F have eigenvalues? If so, find bases for each eigenspace. d f . Create 30.2. Exercise. Define K : P2 (t) → P2 (t) by K(f ) = 3f + dt matrices MS←S and MT←T where S = 1, t, t2 and T = 1 − t − t2 , t + 1, 3 − 2t2 . Does K have eigenvalues? If so, find bases for each eigenspace. 31. ✸ Effect of Change of Basis on a Position Vector. We saw in Section 27 what is involved when we create coordinates for position vectors, and how those coordinates change when we adjust the base location. In this section16 we examine how coordinate changes affect position vectors. We also generalize slightly: we suppose the underlying space has displacements which form an n dimensional vector space, rather than the 3 dimensional setting of that earlier section. Let’s get specific. Suppose the displacements in our world have two different ordered bases S = { v1 , v2 , . . . , vn } and T = { w1 , w2 , . . . , wn } We are going to be creating position vectors in our world. We have two different possible base points in mind for our position vectors: E and B. 16Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. 82 LARRY SUSANKA Let p denote the displacement vector from E to location A and let c denote the displacement vector from base point B to base point E. So the position vector to location A relative to base point E is AE = EE + (AE − EE) = EE + p. On the other hand, the position vector to location A relative to base point B is AB = BB + (EB − BB) + (AB − EB) = BB + c + p. As we did in R3 we let 0B denote the position vector of (0, . . . , 0) using (0, . . . , 0) as base point when we intend to represent position vectors in our world with base point B using position vectors in Rn . So comparing two different position maps, one with base point E and basis S, the other with base point B and basis T we have [ AE ]S, E = 0E + [p]S and [ AB ]T, B = 0B + [c]T + [p]T . These two position vectors in Rn represent the same point in our world but they will, of course, generally have different coordinates. Now let’s suppose an astronaut at location E using basis S reports observations of something happening at location A to home base at location B. Home base uses basis T . The astronaut reports the incident by radio, sending, to home base, the displacement [p]S from E to A. The folks there want to interpret what that means. Home base knows where the astronaut is: at position vector 0B + [c]T , using B as base point and T as basis. 
Changing the reported coordinates from basis S to basis T using matrix PT←S gives [p]T = PT←S [p]S . So home base knows the position coordinates of the incident are [ AB ]T, B = 0B + [c]T + [p]T = 0B + [c]T + PT←S [p]S . As a minor embellishment, it is entirely possible that it might be easier for the astronaut to know where she is in relation to base than for base to know where the astronaut is. In that case the astronaut would report the base displacement [−c]S along with [p]S . And then [ AB ]T, B = 0B + PT←S [c]S + PT←S [p]S . OVERVIEW OF REAL LINEAR ALGEBRA 83 32. ✸ Effect of Change of Basis on a Linear Functional. Let’s consider for a moment17 a linear function f : V → R and basis A for V . We noted before that the matrix for f is a row matrix, a member of Rn∗ . ME1←A = ( f (a1 ) f (a2 ) . . . f (an ) ) f (x) = ME1←A [x]A . Changing bases to new basis B for V is done using the change of basis matrix PB←A : f (x) = ME1←A [x]A = ME1←A PA←B (PB←A [x]A ) = ME1←B [x]B . The point is, to change to coordinates in terms of B you left multiply the coordinates of vectors by PB←A but you right multiply a functional, a row matrix, by PA←B . If you take the point of view that functionals are more important than vectors you would call PA←B the matrix of transition from basis A to B, not the matrix of transition from basis B to A as we would do. That vocabulary is indeed used in some texts and it can be confusing, but the usage is not totally illogical. The confusion is exacerbated when we (incorrectly but conveniently) think of functionals as dot product against vectors. As long as we move from one T orthonormal basis to another all is well, because in that case PA←B = PB←A and the erroneous idea is not revealed. But it is still wrong in principle, even if it gives you the answer you want in carefully chosen cases. The problem comes up often in physics and engineering. For instance an electric field is usually given as a vector, but in fact it is a functional. Let’s think of an example that helps illustrate the distinction. The physical scenario we have in mind consists of two types of items in the air in front of your eyes with you seated, perhaps, at a desk. First, we have actual physical displacements, say of various dust particles that you witness. These displacements are what we usually think of as vectors. Second, we have stacks of flat paper, numbered like pages in a book, each stack having its own characteristic uniform “air gap” between pages, which are parallel throughout its depth. We make no restriction about the angle any particular stack must have relative to the desk. We consider these pages as having indeterminate extent, perhaps very large pages, and the stacks to be as deep as required, though of uniform density. The magnitude of a displacement will be indicated by the length of a line segment connecting start to finish, which we can give numerically should we decide on a standard of length. Direction of the displacement is indicated 17Sections marked with the diamond symbol ✸, such as this one, consist of philosophical remarks, advice and points of view. They can be skipped on first reading, or altogether. 84 LARRY SUSANKA by the direction of the segment together with an “arrow head” at the finish point. The magnitude of a stack will be indicated by the density of pages in the stack which we can denote numerically by reference to a “standard stack” if we decide on one. The direction of the stack is in the direction of increasing page number. 
This is the example that matches the electric field. The “pages” are equipotential surfaces, more dense where the field is big. We now define a coordinate system on the space in front of you, measuring distances in centimeters, choosing an origin, axes and so on in some reasonable way with z axis pointing “up” from your desk. Consider the displacement of a dust particle which moves straight up 100 centimeters from your desk, and a stack of pages laying on your desk with density 100 pages per centimeter “up” from your desk. If you decide to measure distances in meters rather than centimeters, the vertical coordinate of displacement drops to 1, decreasing by a factor of 100. The numerical value of the density of the stack, however, increases to 10, 000. When the “measuring stick” in the vertical direction increases in length by a factor of 100, coordinates of displacement drop by that factor and displacement is called contravariant because of this. On the other hand, the stack density coordinate changes in the same way as the basis vector length, so we would describe the stack objects as covariant. Though we have discussed the geometrical procedure for defining scalar multiplication and vector addition of displacements, we haven’t really shown that these stack descriptions can be regarded as vector spaces. There are purely geometrical ways of combining stacks to produce a vector space structure on stacks too: two intersecting stacks create parallelogram “columns” and the sum stack has sheets that extend the diagonals of these columns. But the important point is that if stacks and displacements are to be thought of as occurring in the same physical universe, and if a displacement is to be represented as a member x of R3 , then a stack must be represented as a linear functional, a member M of R3∗ . You cannot represent both (at the same time) as members of R3 . Otherwise they would have to change the same way when you change yardsticks. And yet they don’t. There is a physical meaning associated with the number M x. It is the number of pages of the stack corresponding to M which cross the shaft of the displacement corresponding to x, where this number is positive when the motion was in the direction of increasing “page number.” OVERVIEW OF REAL LINEAR ALGEBRA 85 It is obvious on physical grounds that this number must be invariant: it cannot depend on the vagaries of the coordinate system used to calculate it. That is the meaning of M x = M P −1 P x = M P −1 (P x) and why coordinates of functionals and vectors must change in complementary ways. 33. Effect of Change of Basis on the Trace and Determinant. Any linear transformation f : V → V has a square matrix MS←S with respect to ordered basis S for V . If you use another basis T then there is a square matrix P with MT←T = P MS←S P −1 . It follows det(MT←T ) = det(P MS←S P −1 ) = det(P ) det(MS←S ) det(P −1 ) = det(MS←S ). That means you can define the determinant for f , denoted det(f ), to be the determinant of the matrix of f with respect to any convenient basis. An identical argument shows that the characteristic polynomial for any matrix for f does not depend on S. It is called the characteristic polynomial for f . Finally, let MS←S = (ai j ) and MT←T = (bi j ) and P = (pi j ) and P −1 = (qi j ). Recall the Kronecker delta function δi j and that In = (δi j ). So qi t qt j = δi j . MT←T = P MS←S P −1 so bi j = pi t at s qs j . So trace(MT ) = bi i = pi t at s qs i = qs i pi t at s = δs t at s = at t = trace(MS←S ). 
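If you want a numerical sanity check of these facts, here is a sketch in Python with NumPy, using a matrix and a change of basis made up on the spot:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.integers(-3, 4, size=(4, 4)).astype(float)    # a matrix for f in one basis (made up)
    P = rng.integers(-3, 4, size=(4, 4)).astype(float)    # a change of basis, kept invertible
    while abs(np.linalg.det(P)) < 1e-9:
        P = rng.integers(-3, 4, size=(4, 4)).astype(float)

    N = P @ M @ np.linalg.inv(P)                           # the matrix for f in the other basis
    print(np.trace(M), np.trace(N))                        # equal, up to rounding
    print(np.linalg.det(M), np.linalg.det(N))              # equal, up to rounding
    print(np.sort_complex(np.linalg.eigvals(M)))           # same roots of the characteristic
    print(np.sort_complex(np.linalg.eigvals(N)))           #   polynomial, hence same eigenvalues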
We conclude that trace is also invariant under choice of basis, and define trace for f to be the trace of any matrix for f . Trace, determinant and characteristic polynomial for a linear transformation can be calculated using the matrix for the linear transformation in any basis. Though the matrix used to calculate them will differ from one basis to another, trace and determinant and characteristic polynomial will not. 86 LARRY SUSANKA 34. Example: How To Use Convenient Bases Efficiently. The example below is 2-dimensional, so the calculations done by hand are not too bad. Really, a person would not do this in 2-dimensions, but it illustrates what is going on quite well. And even in this case it is somewhat easier to do everything possible with calculator and matrix methods. As has been emphasized, you should try to organize calculations so you have to do practically nothing by hand. If you think first before calculating that can usually be arranged. Remember even the simplest calculation in 2 dimensions is likely to “scale up” very badly to higher dimensions. In three dimensions and beyond it is so much easier (for us) if we use hardware and matrices that doing it any other way is actually wrong, unless you have a very pertinent reason. Humans are just not good at doing hundreds of arithmetic operations without error. People also forget what they are doing in the middle of a mess like that. That is why we have hardware to do this, and keeping it all straight is exactly why we humans invented and use Linear Algebra. 2 2t Let’s start with a function space V = Span t e , te2t and two ordered bases A = t2 e2t + te2t , t2 e2t − te2t and B = 3t2 e2t + 4te2t , t2 e2t − 6te2t . 2t 2t Define W to be the function space Span te , e with two ordered bases C = 3te2t − e2t , te2t + e2t and D = te2t − e2t , 4te2t + 3e2t . Finally, we consider the linear transformation F : V → W defined by F (x) = d x 2x − . t dt t For instance, choosing (at random) x = 3t2 e2t + 5te2t we have F 3t2 e2t + 5te2t = 2 3te2t + 5e2t − 3e2t + 6te2t + 10e2t = −3e2t . One needs to show F is linear, but also it is not totally obvious that F is into W : that is, F (x) is always a combination of e2t and te2t and never yields an output involving t2 e2t . You should be able to show that too. Anyway, assuming those two tasks are accomplished, let’s determine the matrices of transition PA←B and PC←D and the matrix MD←B for F . The problem is all four bases are messy. But all four are given in terms of nice bases. Remember that V and W are given to you as the spans of ordered lists of independent simple vectors. This will virtually always be the case: there will be some nice easy basis around, even if it is not just given explicitly as it is here. Look for these bases! OVERVIEW OF REAL LINEAR ALGEBRA 87 We let S be the ordered basis { t2 e2t , te2t } for V and T be the ordered basis { te2t , e2t } for W . These are the obvious “nice” bases. They will make our work very easy. PS←A 1 1 = 1 −1 PT←C Also and 3 1 , PS←B = , 4 −6 3 1 1 4 = , PT←D = . −1 1 −1 3 d F (s1 ) = F t2 e2t = 2te2t − te2t = −e2t = −t2 dt d F (s2 ) = F te2t = 2e2t − e2t = 0. dt So in case we care, F has nullity 1, and rank 1. (Why?) 0 The T -coordinates of F (s1 ) are and the T -coordinates of F (s2 ) are −1 0 . 0 0 0 So MT←S = −1 0 and we have everything we need to answer any question involving F and the four bases. Notice I had to do two easy differentiations and a couple of subtractions to create the matrix for F . 
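Those two differentiations are themselves easy to hand off to software. Here is a sketch using Python with the SymPy library (one choice of tool; any symbolic calculator would do) that recomputes F on the nice basis S and repeats the spot check from above:

    from sympy import symbols, exp, diff, simplify

    t = symbols('t')

    def F(x):                         # F(x) = 2x/t - d/dt(x/t)
        return simplify(2*x/t - diff(x/t, t))

    print(F(t**2 * exp(2*t)))         # -exp(2*t),  so the T-coordinates of F(s1) are (0, -1)
    print(F(t * exp(2*t)))            # 0,          so the T-coordinates of F(s2) are (0,  0)
    print(F(3*t**2*exp(2*t) + 5*t*exp(2*t)))    # -3*exp(2*t), agreeing with the earlier check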
And as for these four matrices of transition, I had to do nothing except read off the coefficients! This is typical. It’s true, I don’t have the answers I wanted quite yet. But all subsequent work will be done by matrix multiplication using hardware if I want! Here it is: PA←B PC←D −1 3 1 1 1 =PA←S PS←B = 4 −6 1 −1 7/2 −5/2 = . −1/2 7/2 −1 3 1 1 4 =PC←T PT←D = −1 1 −1 3 1/2 1/4 = . −1/2 13/4 88 LARRY SUSANKA MD←B =PD←T MT←S PS←B = 12/7 4/7 = . −3/7 −1/7 −1 1 4 0 0 3 1 −1 3 −1 0 4 −6 So I guess we are done. But what do these matrices mean, what do they have to do with the functions in V and W , or the calculation given by F ? Let’s track through with the typical function x = 3t2 e2t + 5te2t that we looked at above. We calculated that F (x) = −3e2t . 3 23/22 3 = . [x]S = and [x]B = PB←S 5 −3/22 5 3 23 b1 − 22 b2 . (Check The coordinates on the right mean that x should be 22 this!) 12/7 4/7 23/22 12/7 [F (x)]D = MD←B [x]B = = . −3/7 −1/7 −3/22 −3/7 3 Apparently F (x), which we know to be −3e2t , is 12 7 d1 − 7 d2 . (Once again, you might want to verify this.) We could also check by calculating 1 4 12/7 0 [F (x)]T = PT←D [F (x)]D = = . −1 3 −3/7 −3 We can actually go farther with this and since we have already created all these matrices we might as well see what else they can do for us. MC←A = PC←D MD←B PB←A −1 1/2 1/4 12/7 4/7 1/4 1/4 7/2 −5/2 = = . −1/2 13/4 −3/7 −1/7 −3/4 −3/4 −1/2 7/2 Also, the A-coordinates of x and the C-coordinates of F (x) are (calculating each two ways, just to check that it “works” and everything consistent) [x]A =PA←S [x]S = PA←B [x]B −1 1 1 3 7/2 −5/2 23/22 4 = = = . 1 −1 5 −1/2 7/2 −3/22 −1 [F (x)]C =PC←T [F (x)]T = PC←D [F (x)]D −1 3 1 0 1/2 1/4 12/7 3/4 = = = . −1 1 −3 −1/2 13/4 −3/7 −9/4 Let’s check to see if the results match: 1/4 1/4 4 3/4 MC←A [x]A = = . −3/4 −3/4 −1 −9/4 So they agree! Well of course they do. It looks like a lot of calculation up above. But virtually all of it was just checking to make sure everything was OVERVIEW OF REAL LINEAR ALGEBRA 89 consistent and writing down the entries of matrices. But I already knew how that would all work out: I proved the theorems that said it would be consistent in class. And there was no need to write down the entries beyond those four easy-basis matrices and two F evaluations on page 87. Subsequent multiplication was all done by hardware. 34.1. Exercise. Consider the ordered bases A = { 3t+1, 5t, t+t2 }, B = { t2 , t, 1 } and C = { 3t2 −t+1, t−2, t2 +t−1 } of P2 (t). Find PB←A and PA←B and PA←C . 34.2. Exercise. Consider the function G : R2 → R2 defined by 1 2 G(x) = Ax where A = . 5 7 Let S be the ordered basis S = { (1, 2), (1, −1) } and T = { (−1, 7), (1, 5) }. (i) What are the S-coordinates of (6, 1)? (ii) What is the matrix MS←S of G in basis S? (iii) What is PS←T ? 2 2 The matrix of linear function H : R → R in basis T (in both domain 9 −3 and range) is . 0 1 (iv) What is the matrix of H in basis S? 34.3. Exercise. Use a nicer intermediary basis to simplify the calculations to find the matrix MS←S of Exercise 30.1. 35. Bases Containing Eigenvectors. Suppose f : V → V is linear and B = { b1 , . . . , bn } is a basis for V . If the matrix MB←B for f is diagonal with diagonal entries λ1 , . . . , λn then each bi is an eigenvector for eigenvalue λi . The converse is also true. This is a very convenient type of basis to use if we want to understand f . Such a basis is said to diagonalize the matrix for f and the process of finding such a basis is referred to as diagonalization. 
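Finding such a basis is, once again, a job for software in any dimension beyond two or three. A minimal sketch in Python with NumPy (the matrix is made up for illustration):

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])              # a small diagonalizable matrix (made up)

    evals, P = np.linalg.eig(A)           # the columns of P are eigenvectors
    D = np.linalg.inv(P) @ A @ P          # the matrix of the same map in the eigenbasis
    print(evals)                          # the eigenvalues 3 and 1 (order may vary)
    print(np.round(D, 12))                # diagonal, with the eigenvalues on the diagonal

In the notation we have been using, P plays the role of the matrix of transition from the eigenbasis to the standard basis, and D is the matrix of the transformation with respect to that eigenbasis.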
If x = a1 b1 + · · · + an bn is any member of V then f (x) = λ1 a1 b1 + · · · + λn an bn . It turns out that if the characteristic polynomial for f factors into linear factors and produces n different eigenvalues you are guaranteed to be able to find a basis of eigenvectors. That is because a set of eigenvectors for different eigenvalues cannot be a dependent set, and there must be at least one eigenvector for each eigenvalue. A basis of eigenvectors can be referred to as an eigenbasis. 90 LARRY SUSANKA To prove independence of a set of eigenvectors for different eigenvalues we show that a dependency leads to a contradiction. Therefore no dependency is possible. If there is a dependency, then there would be such a dependency among a minimum number of eigenvectors for different eigenvalues for f . Let a1 y 1 + · · · + ak y k = 0 be such a relation involving the fewest possible eigenvectors yi for different eigenvalues λi . This obviously cannot involve only one eigenvector: eigenvectors are not the zero vector. Nor can it involve just two eigenvectors: the same vector cannot be an eigenvector for two different eigenvalues. So 0 = f (a1 y1 + · · · + ak yk ) = a1 f (y1 ) + · · · + ak f (yk ) = a1 λ1 y1 + · · · + ak λk yk . But dividing this last equation by a nonzero λi and subtracting it from the first equation produces a nontrivial relation among fewer eigenvectors. This is the contradiction we were looking for, and we conclude that any set containing one eigenvector each for different eigenvalues is independent. If there are fewer than n different roots of the characteristic polynomial, then there might or might not be a basis of eigenvectors, but at least we can include in a preferred basis any eigenvectors that we can find, adding other vectors to fill out the rest of the basis. The matrix for f will be relatively simple, even if it is not diagonal, if we do this. Later in these notes we will discuss several types of “nice” bases for a given matrix, and in other classes the issue will be revisited for a variety of purposes. If the characteristic polynomial has a complex root there will definitely not be a basis of real eigenvectors. We will consider what can be done to handle complex eigenvalues later. When the characteristic polynomial has real factors (all real eigenvalues) but some of the factors are repeated there might be a basis of eigenvectors . . . or maybe not. You simply have to find bases for the eigenspaces, put them together into an independent set, and count. If the set has n vectors in it, you have a basis. A linear function that has a basis of eigenvectors is called diagonalizable. That vocabulary refers to the fact that the matrix of such a function using a basis of eigenvectors will be diagonal. 35.1. Exercise. The matrices of Exercise 19.3 are diagonalizable. Find matrices of transition from the standard basis to a new basis which can be used to convert these matrices to diagonal form. OVERVIEW OF REAL LINEAR ALGEBRA 91 35.2. Exercise. V = { a + bt + cet + de2t | a, b, c, d ∈ R }. 2 d d Define D : V → V by D(f ) = dt f . − 2 dt 2 f Find a basis of eigenfunctions, if possible. (Note: an eigenfunction is just an eigenvector that happens to be a function.) Find a basis for the kernel and the image of D. d g. 35.3. Exercise. Define F : P2 (t) → P2 (t) by F (g) = t dt Find a basis of eigenfunctions, if possible. Find a basis for the kernel and the image of F . 36. Several Applications. If A is a square matrix we define A0 = In . 
If n is large An might be difficult to calculate. But powers Dn , where D is a square diagonal matrix, are easy to calculate. These powers are also diagonal matrices, with powers of the eigenvalues λ1 , . . . , λn arrayed along its main diagonal. If matrix A is diagonalizable this allows us to make sense of f (A) for many 1 real-variable functions f . For instance if P AP −1 = D we can define D 3 to 1 1 be the matrix with λ13 , . . . , λn3 arrayed along the main diagonal. But then 3 1 P −1 D 3 P = P −1 DP = A so we have found a cube root of the matrix A as well. More generally, if all of the eigenvalues of diagonalizable A are inside P i then f (D) is the interval of convergence of power series f (t) = ∞ a t i=0 i diagonal with f (λ1 ), . . . , f (λn ) arrayed along the main diagonal. Then we define f (A) = P −1 f (D)P . The entries of the partial sum matrices converge to this matrix. This can be extended to matrices that are not diagonalizable, but the calculations are harder and rely on “canonical forms” for the matrices involved. These will be studied in more detail your next Linear Algebra class, or in a Differential Equations class where they use such matrices to solve systems of differential equations. 1 1 2 A 3 . 36.1. Exercise. Find A , e and Sin(A) where A = 7 6 36.2. Exercise. Suppose B = λ 1 . 0 λ It is easy to show that for variable t, That means eBt λt e teλt = 0 eλt (Bt)n n n λ t nλn−1 tn . = 0 λn tn 92 LARRY SUSANKA 36.3. Exercise. Consider the system of differential equations dx = 3x + y x(0) = 5 dt dy = 3y y(0) = 4 dt This can be converted to the matrix equation d 3 1 5 x = Ax , A = and x(0) = . 0 3 4 dt The solution is then x(t) = eAt x(0). Calculate this solution. We now use matrices to study some sequences with recursive definition, such as the famous Fibonacci sequence. This sequence starts out with two “seed” values f0 and f1 and defines other members of the sequence by fn+1 = fn + fn−1 for n ≥ 1. The behavior of the sequence is somewhat mysterious: as defined here, you must know all of the previous n values before you can calculate fn+1 . Our goal here is to find a formula for fn for any seed values that does not require this. The sequence can be computed as n 1 1 fn+1 f1 for n ≥ 0. = fn f0 1 0 1 1 The matrix A = has characteristic polynomial 1 0 λ − 1 −1 p(λ) = det ( λ I − A ) = det = (λ − 1)λ − 1 = λ2 − λ − 1. −1 λ This can be factored using the quadratic formula, producing two eigenvalues. One is positive and bigger than 1, while the other is negative and less than 1 in magnitude. √ √ 1− 5 1+ 5 and λ2 = . λ1 = 2 2 These two eigenvalues have some interesting and peculiar properties. For instance λ1 λ2 = −1 and also, since both are roots of the characteristic polynomial, λ21 = λ1 + 1 and λ22 = λ2 + 1, facts which might help simplify calculations we have to do. The Cayley-Hamilton Theorem says that every square matrix “satisfies” its characteristic polynomial, and in this context that means 0 = p(A) = A2 − A − I = (λ1 I − A)(λ2 I − A) = (λ2 I − A)(λ1 I − A) where 0 here refers to the zero matrix and I is the 2 by 2 identity matrix. We don’t need to prove the general theorem here: an easy calculation shows OVERVIEW OF REAL LINEAR ALGEBRA 93 that it is, indeed, true that 0 = A2 − A − I. We are going to use this fact to help us produce an eigenvector for each eigenvalue without doing messy arithmetic, a technique you might want to remember in other contexts. 
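Before pressing on, it is easy to confirm numerically that A really does satisfy its characteristic polynomial, and that its powers generate the sequence as claimed. A sketch in Python with NumPy:

    import numpy as np

    A = np.array([[1, 1],
                  [1, 0]])                       # the Fibonacci matrix from above
    I = np.eye(2, dtype=int)

    print(A @ A - A - I)                         # the zero matrix: A^2 - A - I = 0

    seed = np.array([1, 1])                      # f_1 = f_0 = 1
    for n in range(6):
        print(np.linalg.matrix_power(A, n) @ seed)   # (f_{n+1}, f_n): (1,1), (2,1), (3,2), ...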
Since (λ1 I−A)(λ2 I−A) = 0 any vector that is in the range of λ2 I−A must be killed—sent to the zero vector—by λ1 I − A: that is, it is an eigenvector for λ1 . The “output” of λ2 I − A as a linear transformation contains both columns of λ2 I−A. So any nonzero column of λ2 I−A must be an eigenvector for eigenvalue λ1 . By an identical argument, any nonzero column of λ1 I − A must be an eigenvector for eigenvalue λ2 . We calculate these matrices: λ1 − 1 −1 λ1 I − A = = −1 λ1 λ2 − 1 −1 λ2 I − A = = −1 λ2 √ 5−1 2 −1 √ −1− 5 2 −1 −1 √ 1+ 5 2 −1 √ ! 1− 5 2 ! We note here, by way of making mathematical conversation, that we are in a two dimensional space, and eigenvectors for different eigenvalues must be linearly independent, and there must be at least one for each eigenvalue. So for each matrix above the second column has to be a numerical multiple of the first column, even though it doesn’t really look like it. There can be no more than two linearly independent vectors in a two dimensional space such as R2 . I want to pick an eigenvector for each eigenvalue, and fractions are a bit of a nuisance, so I will choose twice the second column as an eigenvector in each case. This gives eigenvectors −2√ −2√ v1 = for λ1 and v2 = for λ2 . 1− 5 1+ 5 It is interesting to note that v1 and v2 are orthogonal: their dot product is 0. Let V be the ordered basis { v1 , v2 } which we will call an eigenbasis because it consists of eigenvectors for A. E2 = { e1 , e2 } is the standard basis of R2 , as usual. The matrices of transition between these two bases are PE2←V and PV ←E2 , √ −1 −2√ −2√ 1 + √5 2 and PV ←E2 = √ . PE2←V = 1− 5 1+ 5 4 5 −1 + 5 −2 The first one of these is obvious, but the second is the inverse matrix of the first and can be calculated using one of the methods we have for that task. Simply checking that it is the inverse matrix is easy enough. 94 LARRY SUSANKA The linear transformation given by matrix multiplication by A in basis E2 is given by a different matrix B = PV ←E2 APE2←V when using coordinates corresponding to basis V . And λ1 0 B = PV ←E2 APE2←V = 0 λ2 which can be shown either by calculating the product of those three matrices if you don’t trust what you have done so far, or simply thinking about the meaning of the eigenbasis V . It then follows (combine the “inside” pairs PE2←V PV ←E2 = I) that n λ1 0 = B n = (PV ←E2 APE2←V )n 0 λn2 = PV ←E2 APE2←V PV ←E2 APE2←V · · · PV ←E2 APE2←V P V ←E2 APE2←V = PV ←E2 An PE2←V λn1 0 From this we get = PE2←V PV ←E2 . 0 λn2 f1 f1 = Now suppose we start out with seed vector . f0 E f0 2 a Let be the coordinates of the seed vector in the eigenbasis: b √ −1 a f1 f1 (1 + √5)f1 + 2f0 = √ = PV ←E2 = . b f0 V f0 4 5 (−1 + 5)f1 − 2f0 An Now comes the payoff for this work. It is worth pointing out that, all the checking and comments aside, up to this point there were just three calculations involving more than addition: we had to factorize the characteristic polynomial, we had to find the inverse matrix PV ←E2 and we just found the eigencoordinates of the seed vector. It now follows that f fn+1 n n f1 =A = PE2←V PV ←E2 A PE2←V PV ←E2 1 f0 fn f0 n n λ1 0 f1 λ1 0 a =PE2←V = PE2←V PV ←E2 0 λn2 f0 0 λn2 b n λ1 a =PE2←V = λn1 av1 + λn2 bv2 . λn2 b There are some interesting things that can be drawn from this. First, the magnitude of λ2 is about 0.6, so for even modestly sized n the −4 last term in the last line above is tiny. For instance λ20 2 < 10 . So unless a = 0 the λ1 term will utterly swamp the λ2 term very quickly. 
OVERVIEW OF REAL LINEAR ALGEBRA So λn1 av1 is a good approximation to 95 fn+1 for even modestly big n. fn 36.4. Exercise. Verify the statements about the Fibonacci sequence from above and conclude that √ √ f0 f1 fn = λn1 a(1 − 5) + λn2 b(1 + 5) = √ (λn1 − λn2 ) + √ λ1n−1 − λ2n−1 . 5 5 Fibonacci enthusiasts call the number λ1 the “golden ratio” and use the symbol φ for this eigenvalue. Note that λ2 = −φ−1 , and we can use that to simplify the formula in various ways. For instance if f1 = f0 = 1 we have Binet’s formula, 1 fn = √ φn+1 − (−1/φ)n+1 . 5 Any recursive real sequence Sn where later terms Sn are linear functions of Sn−1 , . . . , Sn−k (k fixed) can be handled similarly, using k × k matrices rather than the 2 × 2 matrices we used for the Fibonacci sequence. 37. Approximate Eigenvalues and Eigenvectors. You can use your calculator to find approximate eigenvalues for F (x) = Ax where −8 2 3 7 9 9 6 7 4 1 . 1 6 −2 6 9 A= 9 1 −3 2 5 3 2 −7 2 5 Using hardware aid to graph the polynomial y = det(λI5 − A) we see So there are five real eigenvalues. Each will have an eigenvector, so there is a basis of eigenvectors. 10.8399 is an approximation to the biggest eigenvalue λ1 (to the nearest 10−4 ) and the rest (to the same accuracy) are λ2 ≈ −9.8298. λ3 is near 8.2963, λ4 ≈ −4.9077 and λ5 is near −1.3986. 96 LARRY SUSANKA Finding approximate eigenvectors for your approximate eigenvalues is trickier. We can’t just solve Ax − 10.8399x = 0 because the matrix A − 10.8399I5 is nonsingular: 10.8399 is not exactly an eigenvalue. So let’s define our task. We want to find a vector x so that kF (x) − λxk = kAx − λxk < 10−3 kxk, for the approximate eigenvalue λ = 10.8399. We saw before that if vi is an eigenvector for the ith eigenvalue λi , for i = 1, . . . , 5, and if x = a1 v1 + · · · + a5 v5 then Ax = 5 X a i λi v i and more generally i=1 a1 An x = 5 X ai λni vi . i=1 So unless = 0, as you multiply by larger powers of A the part of An x corresponding to multiples of v1 will form a larger and increasing proportion of An x. n n λ2 λ5 An x 1 = a v1 + v2 + · · · + v5 . n λ1 λ1 λ1 If we pick a “randomly chosen” initial x then the expression on the left will converge to a1 v1 , an eigenvector for λ1 . The problem is that we don’t know λ1 exactly. This might not seem like such a big problem: after all, in a practical problem we don’t know the entries of A exactly either. They are also measured or approximated. Still, a nicer formula which also produces a multiple of v1 (that is, unless by colossal misfortune your randomly chosen initial vector has no v1 component) and which does not involve λ1 explicitly, is given by the recursion relation Ayn−1 . y1 = x, yn = kyn−1 k The part of yn which is a multiple of v1 becomes proportionately overwhelming, and the magnitude of yn will converge to λ1 . This process is called power iteration. 9.8298 100 Since 10.8399 ≈ 6×10−5 , we should expect y100 would be good enough as an approximate eigenvector for λ1 , or close to it. In fact after 102 iterations (using e1 as the “seed”) we find that the unit vector 0.146067089516298237 0.896414826917890095 w= 0.405101325701891668 0.0671042007313784328 −0.0805903594151132635 satisfies kAw − 10.8399wk < 10−3 kwk. OVERVIEW OF REAL LINEAR ALGEBRA 97 Though conceptually interesting, the method is grossly inefficient, requiring a very large number of calculation steps unless the ratio λλ21 is small. One way to take advantage of estimates of the eigenvalues to improve this situation is called inverse iteration. 
This relies on the fact that the eigenvectors for Bµ = (A − µIn )−1 are the same as the eigenvectors for A when µ is not an eigenvalue for A. In fact, in this case if λ is an eigenvalue for A then (λ − µ)−1 is an eigenvalue for Bµ and any eigenvector for A for eigenvalue λ is an eigenvector for Bµ for eigenvalue (λ − µ)−1 . The point here is that if λ is close to µ, the largest eigenvalue of Bµ will be huge in comparison to the second largest, so power iteration of Bµ should converge comparatively quickly to an eigenvector for A for eigenvalue λ. Using B10.8 and iterating twice provides an approximate eigenvector about as close to the true eigenvector as 102 iterations produced in the direct calculation, yielding unit vector 0.146128109664179606 0.896389101389033605 . 0.405139215156500421 w1 = 0.0671255800941089886 −0.805576015434486675 The norm of Aw1 is about 10.8404. Again, we emphasize this is achieved after just two iterations. This method also allows for approximation to an eigenvector for each distinct (real) eigenvalue. Choosing µ to be (respectively, one after another) −9.8, 8.3, −4.9 and −1.4 we obtain, after two iterations, unit approximate eigenvectors 0.654318825949867344 −0.376569981063663017 −.447396299161207878 −.921072301620784795 w2 = 0.393306187768773075 , w3 = −.333298382765519541 , −.432930248564862230 0.0960115255988746169 0.172003104722630806 0.172921197953743749 −.345907090362161995 0.0860602631013127478 0.463579909121897504 −0.693437559456339253 w4 = −.492261570378854152 , w5 = 0.314753866089923562 0.484190962465213981 −.756740328294183806 .434372664356520444 0.562191385317338810 and where kAw2 k ≈ 9.8300, kAw3 k ≈ 8.2963, kAw4 k ≈ 4.9077, kAw5 k ≈ 1.3986. 98 LARRY SUSANKA If we let W be the ordered basis of eigenvectors W = { w1 , w2 , w3 , w4 , w5 } we can create the matrix of transition PE5←W = ( w1 w2 w3 w4 w5 ). We find that we can approximately diagonalize matrix A as PW←E5 APE5←W 10.839933 0.000046 0.000005 0.000003 0.000000 −0.000073 −9.829828 −0.000001 −0.000010 −0.000010 8.296279 0.000004 0.000000 = −0.000679 0.000055 . −0.000114 0.000217 −0.000001 −4.907735 −0.000001 −0.000052 0.000045 −0.000001 0.000006 −1.398648 38. Nullspace, Image, Columnspace and Solutions. The nullspace of an m × n matrix M is the set of vectors “killed” by left-multiplication by a matrix. It is the kernel of the linear transformation given by left multiplication by matrix M . It is a vector subspace of Rn , denoted nullspace(M). If M is square and 0 is an eigenvalue of M , the nullspace is the eigenspace of M for eigenvalue 0. You can also think of it as the solution set of the homogeneous system determined by the coefficient matrix M . A nonzero vector is in the nullspace of M exactly when it is perpendicular to all the vectors obtained as the transpose of the rows of M . The columnspace of M is the image of the matrix as a function from Rn to Rm . Writing M as a block matrix with columns C1 , . . . , Cn we have M x = (C1 C2 · · · Cn ) x = xi Ci so the result of multiplying M by x is explicitly a linear combination of the columns of M . The columnspace is the span of the columns of M , denoted colspace(M). If the nullspace of an m by n matrix M is trivial, the linear function formed using matrix M is called “one-to-one.” Solutions to the equation M x = b, if they exist, are unique when the linear function is one-to-one. The columns of a matrix M form a linearly independent set exactly when nullspace(M ) = { 0 } . 
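These conditions are easy to test numerically with ranks: the columns of M are independent (M is one-to-one) exactly when rank(M) = n, and M x = b has a solution exactly when appending b to the columns of M does not raise the rank, i.e. when b is in the columnspace, as discussed next. Here is a small sketch in Python with numpy; the matrix and the two right-hand sides are made-up examples, not from the text.

# Rank tests for "one-to-one" and for solvability of M x = b (numpy).
import numpy as np

M = np.array([[1.0, 3.0],
              [2.0, 0.0],
              [1.0, 1.0]])          # 3 x 2, so a map from R^2 to R^3
b = np.array([4.0, 2.0, 2.0])       # lies in the columnspace (b = C1 + C2)
c = np.array([1.0, 0.0, 0.0])       # does not lie in the columnspace

rank_M = np.linalg.matrix_rank(M)
print(rank_M == M.shape[1])                                       # True: nullspace(M) = {0}, M is one-to-one
print(np.linalg.matrix_rank(np.column_stack([M, b])) == rank_M)   # True: M x = b has a solution
print(np.linalg.matrix_rank(np.column_stack([M, c])) == rank_M)   # False: M x = c has no solution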
If the columns of M span Rm , the function obtained by left multiplication by M is called “onto.” When this function is onto, there is always a solution to M x = b. In any case, onto or not, there is a solution to M x = b exactly when b is in the columnspace of M . Suppose we want the set of solutions to M x = b where M is an m by n matrix. If b is in the columnspace and p is any particular solution and N is the nullspace of M then the solution set is p + N , which we define to be OVERVIEW OF REAL LINEAR ALGEBRA 99 the set of all vectors of the form p + n for n ∈ N . The solution set will not be a subspace unless p is in N , in which case p + N = N. The kernel and image corresponding to any linear function f between finite dimensional spaces V and W can be found by applying the above remarks to the matrix of the function with respect to coordinates. If you find a bases for the nullspace and the columnspace of the matrix, the coordinate maps can be used to find vectors in V and W respectively which comprise bases for ker(f ) and image(f ). The vocabulary “one-to-one” and “onto” is applied to f when the property holds for any (and hence every) matrix for f . Rephrasing the comments from above regarding solution sets to apply to a more general vector space, if p is an element of the vector space V and N is a subspace of V define the set p + N to be the set of all vectors of the form p + w where w is from N . This is the equivalent of a plane or line in space, possibly not through the origin. It will go through the origin only when p is in N , in which case p + N = N . 38.1. Exercise. A function f : V → W is one-to-one and onto exactly when an inverse function f −1 : W → V exists. Show that in this case if f is linear so is f −1 . Show that if f is one-to-one and onto then dim(V ) = dim(W ). 38.2. Exercise. (i) Find the columnspace 1 3 A = 2 0 1 1 and the nullspace of the matrix 4 2 . 2 (ii) Find the image and kernel of the function given by G(x) = Ax. (iii) Can the equation G(x) = (1, 1, −3), with G as above, be solved? (iv) Find the image and kernel of the function F given by the formula F (x) = P rojw (x) for w = (1, 2, 3). 39. More on Nullspace and Columnspace. Identifying a vector subspace explicitly is often accomplished by locating a basis for that subspace. Finding a basis for a nullspace or a columnspace, for instance, can be tedious. Here are some timesaving tips to accomplish these tasks. Convert the m by n matrix A = (ai,j ) to row reduced echelon form R = (ri,j ) by elementary row operations. R = M A where M is the invertible m × m matrix that is the product of the elementary row matrices used to produce R. So A = BR where the m × m matrix B is the product, in reverse order, of the “opposite” of those elementary row operations used to formed R. 100 LARRY SUSANKA If the rows of a matrix are thought of as 1 × n matrices, their span is called the rowspace of the matrix. Left multiplication by an elementary row matrix does not change the rowspace, so A and R have the same rowspace. Let’s suppose that there are exactly k nonzero rows in the echelon matrix R, so the common rowspace of A and R has dimension k. Looking at A = BR as the product of column block matrices we have (A1 . . . Aj . . . An ) = (B1 . . . Bm ) (R1 . . . Rj . . . Rn ) . Focusing attention on the jth column we get Aj = BRj or, expanding, Aj = r1,j B1 + · · · + rm,j Bm = r1,j B1 + · · · + rk,j Bk because ri,j = 0 whenever m ≥ i > k. 
So we have, explicitly, all n columns of A as linear combination of these first k columns of B, so the dimension of the columnspace is some number t which cannot exceed k, the dimension of the rowspace of A. The same fact is true of AT , the transpose of A, whose rows are the columns of A. We know therefore that k cannot exceed t, and conclude that the dimension of the columnspace of A is exactly k. For any matrix, the dimension of the columnspace is the same as the dimension of the rowspace. We now make three important points. First, a more careful examination of the linear combination of columns above shows that the columns in the original matrix A corresponding to the pivot columns in the echelon matrix form a linearly independent set of k columns, which therefore comprise a basis of the columnspace of A. To see this, note that B is invertible so its columns are linearly independent. If Rj is a pivot column of R with single nonzero entry (with value 1 there, of course) located at row s then the above block equation Aj = r1,j B1 + · · · + rs,j Bs + · · · + rm,j Bm identifies Aj as Bs . Second, as noted above, reduction to rref is obtained by elementary operations on the rows of a matrix, which does not alter the rowspace. These rows are in a relatively simple form in comparison to the starting rows. Therefore, if you want a nice basis (defined as having as few nonzero entries as possible) for the span of a set of vectors, take the transpose of the matrix having these vectors as columns and hit it with the rref stick. The rowspace is unchanged by this. The transpose of the nonzero rows will be a basis for the original columnspace. Third, because it involves left-multiplication by invertible matrices, it is easy to show that the process of reduction to rref does not change the OVERVIEW OF REAL LINEAR ALGEBRA 101 nullspace of a matrix. If there are k nonzero rows in the echelon matrix, they can be used to form a homogeneous system of k independent equations in n unknowns satisfied by (and only by) the vectors in the nullspace, which therefore has dimension n − k. This observation shows that the dimension of the nullspace (the nullity of A, also denoted nullity(A)) plus the dimension of the columnspace (the rank of A, also denoted rank(A)) must add to n, the dimension of the domain. 39.1. Exercise. Find a basis for the nullspace and a basis for the columnspace of A, where 4 2 3 9 A = 3 6 9 1 . −1 4 6 −8 39.2. Exercise. Show that similar matrices have the same rank and nullity. 40. Rank Plus Nullity is the Domain Dimension. By analogy with the discussion for matrix transformations, the rank of any linear transformation f (not just one given by a matrix) is the dimension of image(f ). This dimension is denoted rank(f ). The nullity of a linear transformation of f , is the dimension of ker(f ). This dimension is denoted nullity(f ). We will now give another proof of a very important fact, proved for matrices in the last section. This proof here has the advantage of being direct, but also uses many important ideas. For any linear function, the dimension of the kernel plus the dimension of the image is the dimension of the domain. If f : V → W is linear rank(f ) + nullity(f ) = dim(V ). In the course of understanding this proof you must clarify your understanding of the central ideas used in it. I advise you to think about it until you understand it completely and can reproduce it. This is the second (and last) proof which I identify as special in this way in these notes. 
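Before the proof, here is a quick computational illustration of the statement, using the rref-based bookkeeping of the previous section. I use Python with sympy so the row reduction is exact; the matrix is a made-up example.

# rref, a columnspace basis, and rank + nullity = n, checked with sympy.
from sympy import Matrix

A = Matrix([[1, 2, 0,  3],
            [2, 4, 1,  7],
            [3, 6, 1, 10]])           # 3 x 4, third row = first row + second row

R, pivots = A.rref()                  # row reduced echelon form and pivot columns
print(R)                              # two nonzero rows
print(pivots)                         # (0, 2): the first and third columns of A
                                      # form a basis for the columnspace
print(A.rank(), len(A.nullspace()))   # 2 and 2: rank + nullity = 4, the number of columns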
Suppose f : V → W is linear, and V is an n dimensional space. We know that both image(f ) and ker(f ) are subspaces of W and V , respectively. Suppose k = rank(f ) = dim(image(f )) and m = nullity(f ) = dim(ker(f )). We will show that m + k = n. The proof goes as follows. image(f ) and ker(f ) have bases { g1 , . . . , gk } and { w1 , . . . , wm }, respectively. For each i, the vector gi is in image(f ) so there is a vector hi ∈ V 102 LARRY SUSANKA with f (hi ) = gi . We will show that S = { h1 , . . . , hk , w1 , . . . , wm } is a basis for V , and so k + m = n. To do that we must show that S is linearly independent and that S spans V. Suppose ai hi + bk wk = 0. Because the wk are in ker(f ) 0 = f (0) = f (ai hi + bk wk ) = ai f (hi ) + bk f (wk ) = ai f (hi ) = ai gi . Since the gi form a basis for image(f ) they are independent. That means ai = 0 for all i. So the equation given above was really just bk wk = 0, and since the wk are independent we are forced to conclude that bk = 0 for all k too. So S is an independent set of vectors. Now suppose v is a generic member of V . So f (v) ∈ image(f ) and therefore can be written as f (v) = ai gi . So f (v) = ai f (hi ) which implies f (v − ai hi ) = 0. So v − ai hi ∈ ker(f ). So there are constants bk with v − ai hi = bk wk . We conclude that v = ai hi + bk wk and S spans V . The proof is complete. 40.1. Exercise. (i) Suppose H : R4 → R4 is linear and Image(H) = Span{ (1, 3, 6, 0), (1, 2, 1, 2) }. Are there two different vectors v and w with H(v) = H(w) = (1, 3, 6, 0)? (ii) Suppose F : R4 → R2 is linear and Ker(F ) = Span{ (5, 3, 6, 0), (8, 2, 9, 2) }. Must there be a solution to F (v) = (1, 3)? (iii) G : R5 → R2 is linear. What are possible values for nullity(G)? 41. Sum, Intersection and Dimension. If U and W are subspaces of V we defined U + W to be the set of all vectors of the form u + w where u is from U and w is from W . This set is called the sum of U and V . The sum is itself a vector subspace of V . The overlap of the two subspaces, U ∩ W , is also a vector subspace of V . In fact, U ∩ W is a subspace of U and of W and of U + W . We saw that if all spaces are subspaces of Rn that there is an interesting relationship between the dimensions of these various spaces. We assume here only that U + W is finite dimensional. Let S be a basis { s1 , . . . , si } for U ∩ W . Using S as a starter set, we add members { r1 , . . . , rj } of U to create a basis R = { s1 , . . . , si , r1 , . . . , rj } of U . Again using S as a starter we add members { t1 , . . . , tk } of W to create basis T = { s1 , . . . , si , t1 , . . . , tk } of W . OVERVIEW OF REAL LINEAR ALGEBRA 103 So dim(U ∩ W ) = i and dim(U ) = i + j and dim(W ) = i + k. Note (prove, if you wish) the following: if any linear combination bx rx + x y z y is in U ∩ W then all the b must be 0. Similarly, if c sy + d tz is in z U ∩ W then all the scalars d must be 0. cy s We will now show that A = { r1 , . . . , rj , s1 , . . . , si , t1 , . . . , tk } is a basis for U + W so that dim(U + W ) = i + j + k. It is easy to see that A spans U + W . That is because every member of U + W is the sum of a member of U and a member of W . Since each of these members are linear combinations of members of A, so is their sum. It remains only to demonstrate that A is linearly independent. Suppose bx r x + c y s y + d z t z = 0 for certain scalars bx , cy and dz . That means bx rx + cy sy = −dz tz . The left side is in U and the right side is in W . 
So both sides are actually in W and U : that is, bx rx +cy sy ∈ U ∩W . But by our earlier remark this means all the bx are 0. By identical argument, all the dz are zero. So the original equation was cy sy = 0 which, since S is a basis, implies all the cy are 0 too. So A is a basis of U + W . We conclude that: dim(U ) + dim(W ) = dim(U ∩ W ) + dim(U + W ). 41.1. Exercise. Show that the boxed statement above remains true in case any one of the four spaces involved is infinite dimensional, in the sense that both sides of the equation must then involve at least one infinite dimensional space. 41.2. Exercise. (i) Suppose G : R5 → R4 is linear and nullity(G) = 2. Suppose vectors v = (1, 2, 7, 0) and w = (1, 0, 0, 1) are not in image(G). T What is dim( Span({ v, w }) image(G) )? Why? (ii) K : R5 → R2 is linear and rank(K) = 2. Also, { v1 , v2 , v3 } is an independent set of three vectors in R5 , none of which is in ker(K). T What values of dim( Ker(K) Span( { v1 , v2 , v3 } ) are possible? 42. Direct Sum. Suppose given two nontrivial subspaces U and W of vector space V . The overlap of the two subspaces, U ∩ W , is also a vector subspace of V . If this intersection is just {0} we write U ⊕ W for the sum U + W of the two subspaces U and W . The sum is called a direct sum in this case. 104 LARRY SUSANKA The important thing about a direct sum is that every vector in it can be written in exactly one way as the sum of a vector from W and a vector from U. To see this, we suppose a vector v in U ⊕ W can be written as v = w1 + u1 = w2 + u2 where both w1 and w2 are in W and u1 and u2 are in U. But then w1 − w2 = u2 − u1 and so these differences must be in both W and V . Since U ∩ W = {0} we have w1 − w2 = u2 − u1 = 0. This means that w1 = w2 and also u2 = u1 . This uniqueness is important and useful. So we are left with the problem of finding U ∩ W , which might be useful for something even if it is not {0}. We learned how to find a basis for U ∩ W in section 22. 2 2−1 2 42.1. Exercise. V = Span t + 1, t and W = Span t, t 3 and U = Span t + 1, t2 − 1 . Is V + W = V ⊕ W ? Is V + U = V ⊕ U ? Is U + W = U ⊕ W ? More generally, if W1 , W2 , . . . , Wk are nontrivial subspaces of V we can form the subspace W of V consisting of the span of all the vectors in any of the Wi . Every vector in W can be written as a sum w1 + w2 + · · · + wk where wi ∈ Wi for all i. We define the sum W1 + W2 + · · · + Wk to be W . Of particular interest are those sums where the representation of each vector in W is unique. Specifically, we write W = W1 ⊕ W2 ⊕ · · · ⊕ Wk and call the sum a direct sum, when 0 = w1 + w2 + · · · + wk , where wi ∈ Wi for all i, implies wi = 0 for all i. 42.2. Exercise. Show that if W = W1 ⊕ W2 ⊕ · · · ⊕ Wk and if v 1 + v 2 + · · · + v k = w1 + w2 + · · · + wk , where vi and wi are in Wi for all i then vi = wi for all i. If v is any nonzero vector define the one dimensional subspace Rv = { rv | r ∈ R }. 42.3. Exercise. If { v1 , v2 , . . . , vk } is a basis of W then W = Rv1 ⊕ Rv2 ⊕ · · · ⊕ Rvk . OVERVIEW OF REAL LINEAR ALGEBRA 105 43. A Function with Specified Kernel and Image. We now create a linear function f : V → W with specified kernel and image in vector spaces V and W . Throughout this section, we suppose given a basis S = { s1 , . . . , sk , sk+1 , . . . , sn } of V and independent vectors T = { t1 , . . . , tk } in W . Let Y = span({ sk+1 , . . . , sn }) and U = span({ t1 , . . . , tk }). Any x in V can be written in a unique way as x = c1 s1 + · · · + ck sk + ck+1 sk+1 + · · · + cn sn . 
We define f by f (x) = c1 t1 + · · · + ck tk . So f sends the first k terms in the sum to the same combination involving the independent members of T , and sends the last n − k terms to 0. It is easy to show that f is linear. The kernel of f is Y : If f (x) = 0 then independence of the members of T implies xi = 0 for 0 ≤ i ≤ k which implies x ∈ Y . Clearly any member of Y is in the kernel of f . The image of f is, explicitly, U . There are many different functions of this kind whose image is U . They correspond to all the different ways of selecting a basis t1 , . . . , tk for U . There are also many different functions of this kind whose kernel is Y . They correspond to different ways of adding (or prepending, I guess) vectors to sk+1 , . . . , sn to complete a basis for V . 43.1. Exercise. Extend t1 , . . . , tk to a basis T = { t1 , . . . , tk , tk+1 , . . . , tm } for W . Find the matrix MT←S for the function f : V → W described above. 43.2. Exercise. Suppose F : V → W is linear and V = A ⊕ B and W = C ⊕ D. We will also suppose that F (a) ∈ C and F (b) ∈ D for all a ∈ A and b ∈ B. Finally, find bases S = { s1 , . . . , sk , sk+1 , . . . , sn } for V and T = { t1 , . . . , tj , tj+1 , . . . , tm } for W , selected so that { s1 , . . . , sk } is a basis for A and { t1 , . . . , tj } is a basis for C. What does the matrix of F with respect to these bases look like? Suppose V = Rn and W = Rm . Consider the matrices PEn←S = (s1 · · · sn ) and K = (t1 · · · tk ). PS←En x gives coordinates of x in basis S so if J is the matrix formed from the top k rows of PS←En then Jx ∈ Rk will be the vector consisting of the first k S-coordinates of x. 106 LARRY SUSANKA The k rows of the k×n matrix J are linearly independent so the columnspace will have dimension k. Since the range is Rk that means the function obtained by left multiplication by J will be onto Rk . The kernel of this function is obviously span({ sk+1 , . . . , sn }). The columnspace of the m × k matrix K is the span of the independent vectors t1 , . . . , tk of Rm and so has dimension k. Let f : Rn → Rm be given by f (x) = KJx. ker(f ) = span({ sk+1 , . . . , sn }) and image(f ) = span({ t1 , . . . , tk }). 43.3. Exercise. Create the matrix with respect to standard bases in domain and range for a linear transformation H : R5 → R3 for which 1 0 1 2 1 2 ker(H) = span 6 , 6 , 6 and 5 0 5 3 3 1 1 1 0 , 1 . image(H) = 1 4 This involves a judicious choice of intermediary bases in domain and range. Done properly, there will be no calculation done by hand. There are many correct answers. 44. Inner Products. An inner product on a real vector space V is a real valued function, which we will denote h· , ·i, defined on ordered pairs of vectors and for which, for all constants c and vectors v, w and z: 1. hv, wi = hw, vi 2. hcv, wi = chv, wi and hv + w, zi = hv, zi + hw, zi 3. hy, yi > 0 for any nonzero vector y These properties have names. The first property is called symmetry. The function h·, ·i is called linear in its first slot by virtue of the second property. It is called positive definite if the third property holds. An inner product allows you to import the concept of magnitude, distance and angle into your vector space. p The norm of a vector v is denoted kvk and defined to be hv, vi. The distance between vectors v and w is denoted d(v, w) and defined to be kv− wk. The angle θ between two vectors is defined by hv, wi = kvk kwk cos(θ). OVERVIEW OF REAL LINEAR ALGEBRA 107 v and w are called orthogonal (to each other) if hv, wi = 0. 
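Here is a small Python sketch of these definitions. The particular weighted inner product below is a made-up example (any function satisfying the three properties would do); with the dot product in its place you recover the usual Euclidean notions.

# Norm, distance and angle coming from a given inner product (numpy).
import numpy as np

def ip(v, w):
    """A weighted Euclidean inner product on R^3: <v, w> = 2 v1 w1 + v2 w2 + 5 v3 w3."""
    return 2*v[0]*w[0] + v[1]*w[1] + 5*v[2]*w[2]

def norm(v):
    return np.sqrt(ip(v, v))

def dist(v, w):
    return norm(np.asarray(v) - np.asarray(w))

def angle(v, w):
    return np.arccos(ip(v, w) / (norm(v) * norm(w)))

v = np.array([1.0, 3.0, 2.0])
w = np.array([1.0, 1.0, 1.0])
print(norm(v), dist(v, w), angle(v, w))
print(ip(np.array([1.0, -2.0, 0.0]), np.array([1.0, 1.0, 0.0])))   # 0: these two are orthogonal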
With an inner product we have the extremely important concept of projection. For vector v and nonzero vector w define P rojw (v) = hv, wi w. hw, wi Note that for any vector v, the vector v − P rojw (v) is orthogonal to w: when you take away the part of v which lies in the direction of w, the part remaining is orthogonal to w. Among other properties, the norm satisfies the triangle and CauchySchwarz inequalities: for all pairs of vectors v and w | kvk − kwk | ≤ kv + wk ≤ kvk + kwk and |hv, wi| ≤ kvkkwk. A vector space endowed with an inner product is called an inner product space. 45. The Matrix for an Inner Product. Every inner product h·, ·i on Rn can be given as a formula involving a square matrix G. In fact, hx, yi = xT G y where G has entries gi j = hei , ej i and the ei are the standard basis vectors of Rn . This is easy to prove if x and y themselves are standard basis vectors, and the general fact follows by linearity of matrix multiplication. Here are the most important facts about the matrix of an inner product, proved in Sections 52 and 56. The matrix of any inner product must be symmetric with only positive eigenvalues. On the other hand any symmetric matrix is diagonalizable with an orthogonal matrix of transition. So there is a basis of orthonormal eigenvectors for the matrix of any inner product. And if all the eigenvalues of a symmetric matrix are positive it can be used to create an inner product. If the matrix G of an inner product on Rn is diagonal, the inner product is called a weighted Euclidean inner product. The dot product itself corresponds to the identity matrix. If V is any n-dimensional vector space with inner product h·, ·i and S an ordered basis for V then the matrix GS defined by gi j = hsi , sj i can be used to calculate inner products in V . Specifically, 108 LARRY SUSANKA hx, yi = [x]TS GS [y]S and the matrix GS itself corresponds to an inner product on Rn . 45.1. Exercise. Prove the statement in the box above. 45.2. Exercise. None of the matrices listed below could be the matrix of an inner product. Explain why: which of the defining properties of an inner product would be violated? 5 1 0 5 0 0 −2 0 0 1 8 0 0 0 0 0 8 0 1 0 10 0 0 10 0 0 10 45.3. Exercise. The graphs of the characteristic polynomials of three 4 × 4 symmetric matrices are shown below. (Recall that you can create such graphs directly using the determinant function in your calculator.) Decide in each case if the matrix will be or cannot be the matrix of an inner product on R4 . 45.4. Exercise. Decide which of the following functions could be inner products. Find the matrices of those that are. (i) F : R3 × R3 → R , F (x, y) = 7x1 y 1 + x2 y 1 + x1 y 2 + 3x2 y 2 + 4x3 y 3 . (ii) G : R3 × R3 → R , G(x, y) = 7x1 y 1 + 2x2 y 1 + x1 y 2 + 3x2 y 2 + 4x3 y 3 . (iii) K : R3 × R3 → R ,K(x, y) = 2x1 y 1 + 3x2 y 1 + 3x1 y 2 + 2x2 y 2 + 4x3 y 3 . (iv) Find the angle between (1, 3, 2) and (1, 1, 1) with respect to any inner products you found above. OVERVIEW OF REAL LINEAR ALGEBRA 109 45.5. Exercise. At least one of the functions defined below on P2 (t) × P2 (t) is an inner product. Identify which are and calculate the matrix for each of these. Then find the angle between t2 + 5 and t − 1 with respect to each inner product. R1 (i) F (x, y) = 0 x(t)y(t)dt R1 (ii) G(x, y) = 0 (x(t) + 1)y(t)dt R1 (iii) H(x, y) = 0 t x(t)y(t)dt Rt (iv) K(x, y) = 0 x(u)y(u)du 46. Orthogonal Complements. If v is a vector in inner product space V we define v⊥ to be the set of all vectors orthogonal to v. 
This set is called the orthogonal complement of the vector v. More generally, if W is a nonempty subset of an inner product space V we define W ⊥ to be the set of all vectors orthogonal to every member of W . It is called the orthogonal complement of W , and read aloud as “W perp.” 46.1. Exercise. (i) Show that W ⊥ is a subspace of V . (ii) If H ⊂ K ⊂ V and H is nonempty then K ⊥ ⊂ H ⊥ . (iii) If H is a basis of subspace W of V then H ⊥ = W ⊥ . ⊥ (iv) Show that W ⊥ = Span(W )⊥ and Span(W ) = W ⊥ . (v) If v is a nonzero vector, show that V = Rv ⊕ v⊥ . If W and U are two subspaces of V , we define W + U to be the set of all vectors of the form w + u where w is a generic member of W and u an arbitrary member of U . W + U is a subspace of V too. It is obvious that W ∩ W ⊥ = {0} and we will see in section 48 (extend an orthonormal basis of W to an orthonormal basis of all of V ) that V = W + W ⊥ . So the sum is direct. V = W ⊕ W ⊥ when W is a subspace of inner product space V. Now consider a function f : Rn → Rm given by left matrix multiplication f (x) = Ax A is an m × n matrix. The matrix AT produces another function f˜: Rm → Rn given by left matrix multiplication f˜(x) = AT x AT is an n × m matrix. 110 LARRY SUSANKA There is an interesting relationship between the kernels and images of these two functions, important enough that some authors refer to it as the Fundamental Theorem of Linear Algebra. Since the rows of AT are the columns of A, any member of ker f˜ must be perpendicular to every column of A, and so must be perpendicular to ˜ any linear combination of such columns. It follows that v ∈ ker f exactly when v ∈ image(f )⊥ . ⊥ = ker(f ). Swapping A with AT we find, analogously, that image f˜ Remember that image f˜ is the span of the transposed rows of A. Rn =image f˜ ⊕ ker(f ) and Rm = image(f ) ⊕ ker f˜ and the direct summands are orthogonal complements. nullity(f ) = n − r nullity f˜ = m − r. rank(f ) = rank f˜ = r With this result inhand, let S = { s1 , . . . , sr , sr+1 , . . . sn } where { s1 , . . . , sr } is a basis for image f˜ and { sr+1 , . . . sn } is a basis for ker(f ). Let T = { t1 , . . . , tr , tr+1 , . . . tm } where { t1 , . . . , tr } is a basis for image(f ) and { tr+1 , . . . tm } is a basis for ker f˜ . Then the m × n matrix MT←S = PT←Em APEn←S = B O O O where B is an r × r invertible submatrix and O represents zero blocks of the appropriate sizes at each location (if r = n or m two or all three of them will be missing.) 46.2. Exercise. With the situation as above, create an n × m matrix C that acts as a “partial inverse” to A in the sense that Ir O Ir O PS←En PT←Em and CA = PEn←S AC = PEm←T O O O O where the zero blocks are the appropriate sizes to form an m × m matrix on the left and an n × n matrix on the right. Show how to use this to find a solution for Ax = b when b is in image(f ). OVERVIEW OF REAL LINEAR ALGEBRA 111 47. Orthogonal and Orthonormal Bases. Suppose V is any vector space with inner product h·, ·i and B = { b1 , . . . , bn } is an ordered basis. We are going to transform the basis B into a new basis using a procedure with n steps, and will use the superscript found below to indicate where we are in this process. Define v 1 = b1 and v2 = b2 − P rojv1 (b2 ). Generally, for i = 3, . . . , n define vectors vi = bi − P rojv1 (bi ) − P rojv2 (bi ) − · · · − P rojvi−1 (bi ). This produces an ordered basis of vectors vi , . . . , vi of V which are orthogonal to each other. This is called an orthogonal basis. Sometimes we wish to go further. 
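Here is a sketch, in Python with the dot product as the inner product, of the orthogonalization recursion just described; “going further” means normalizing each vi, which is the one extra step described next. The function names and the sample basis are mine.

# The orthogonalization step: v_i = b_i - Proj_{v_1}(b_i) - ... - Proj_{v_{i-1}}(b_i).
import numpy as np

def proj(v, w):
    """Projection of v onto nonzero w: (<v, w> / <w, w>) w."""
    return (np.dot(v, w) / np.dot(w, w)) * w

def orthogonalize(basis):
    """Turn an ordered basis (a list of vectors) into an orthogonal basis."""
    vs = []
    for b in basis:
        v = b - sum(proj(b, w) for w in vs)
        vs.append(v)
    return vs

B = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]       # a made-up basis of R^3
V = orthogonalize(B)
print(np.dot(V[0], V[1]), np.dot(V[0], V[2]), np.dot(V[1], V[2]))   # all (numerically) 0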
Letting ui = kvvii k for each i produces an ordered orthonormal basis: S = { u1 , . . . , un }. Each vector in S has length one and is orthogonal to every other vector in S. The procedure which produces this basis is called the Gram-Schmidt process. Orthonormal bases are very useful. For instance, if S = { u1 , . . . , un } is orthonormal then b = b1 u1 + · · · + bn un = hb, u1 iu1 + · · · + hb, un iun for any b ∈ V. The coefficients bi = hb, ui i on the basis vectors ui are called the Fourier coefficients of the vector b with respect to orthonormal basis S. 47.1. Exercise. (i) Inner product h·, ·i on R2 is given by hx, yi = 7x1 y 1 + 4x2 y 1 + 4x1 y 2 + 3x2 y 2 . Find an orthonormal basis for R2 with respect to this inner product. R1 (ii) Inner product h·, ·i on P2 (t) is given by hx, yi = −1 60x(t)y(t)dt. Find an orthonormal basis for P2 (t) with respect to this inner product. 47.2. Exercise. Show that h·, ·i defined on Mn×m by hA, Bi = trace(AT B) is an inner product. If n = m the set of symmetric matrices, which we denote Symn , and the set of skew symmetric matrices, which we denote Skewn , are subspaces of Mn×n . Note that any matrix A in Mn×n can be written as A= A − AT A + AT + . 2 2 ⊥ Describe Sym⊥ n and Skewn . Find an orthonormal basis for M3×3 containing an orthonormal basis for Sym3 . 112 LARRY SUSANKA 48. Projection onto Subspaces in an Inner Product Space. It is a fact that if you divide the members of an orthonormal basis S into two subsets A and B then Span(A)⊥ = Span(B) and V = Span(A) ⊕ Span(B). If p = v + w where v is in Span(A) and w is in Span(B) then: kpk2 = kvk2 + kwk2 . (The Pythagorean Theorem) Suppose W is any nontrivial subspace of inner product space V . We can find an orthonormal basis { u1 , . . . , um } for W and then extend that to an orthonormal basis { u1 , . . . , um , um+1 , . . . un } for V . Any vector b in V can be written in a unique way as ! ! m n X X b= bi u i + bi ui = p + q i=1 i=m+1 where p is in W , and q is in W ⊥ . The vector p is denoted P rojW (b), and called the projection of b onto the subspace W . P rojW is a linear function from V to V . The image of P rojW is W . The kernel of this map is W ⊥ . The vector P rojW (b) is the unique member of W which is nearest to b. Note P rojW (P rojW (b)) = P rojW (b) and P rojW (b) = 0 exactly when b ∈ W ⊥ and P rojW (b) = b exactly when b ∈ W. With b = p + q as above, we can define the reflection of b in the subspace W to be Ref lW (b) = p − q. Reflection is also a linear function from V to V . Its image is all of V and Ref lW (Ref lW (b)) = b. Don’t forget that any linear transformation, including these two, has a matrix. Creating the matrix only involves calculating the function on a basis. 48.1. Exercise. Consider R4 with the usual inner product. Find a basis B = { b1 , b2 } for W ⊥ where W = Span({ (0, −3, 2, 0), (−1, 1, 0, 1) }). Then convert { (0, −3, 2, 0), (−1, 1, 0, 1), b1 , b2 } (in this order) into an orthogonormal basis using the Gram-Schmidt process. 48.2. Exercise. Using the matrix for item (8) in Exercise 18.1 and a change of basis as intermediary, create a matrix that will rotate points in R3 around axis vector (1, 2, 3) (tail at the origin, as usual) by angle π/3 counterclockwise as seen by an observer looking from the point (1, 2, 3) down onto the plane of rotation perpendicular to (1, 2, 3) through the origin. OVERVIEW OF REAL LINEAR ALGEBRA 113 49. A Type of “Approximate” Solution. 
Now suppose we are attempting to solve the matrix equation M x = b, where M is an m by n matrix, x is in Rn and b is a fixed member of Rm . We know that this equation has a solution only if b is in the columnspace of the matrix M . But what if it isn’t? It may be you would be willing to settle for a solution that puts you as close as possible to b. In that case let W be the columnpace of M , and let p = P rojW (b). You can solve M x = p for this “second best” or “approximate” solution. This approximate solution does depend explicitly on the projection, and therefore the inner product, in use. The solution corresponding to dot product is called the least squares solution. If we want an efficient way to calculate this solution we can proceed as follows. First review the discussion of section 46 regarding the Fundamental Theorem of Linear Algebra. Since Rm = colspace(M ) ⊕ Ker M T there is a unique representation of b as a sum p + c where p ∈ colspace(M ) and c ∈ Ker M T . Since colspace(M )⊥ = Ker M T we have p · c = 0. The vector p is the unique member of colspace(M ) closest to b with respect to the Euclidean norm. If M x = p then M T M x = M T p = M T p + M T c = M T b. On the other hand, if M T Mx = M T b then M x − b ∈ Ker M T so b = M x + k where k ∈ Ker M T and M x ∈ colspace(M ). Since the representation of b is unique, we find p = M x and c = k. The least squares solution to matrix equation M x = b, where x ∈ Rn and b is a fixed member of Rm , is the solution to M T M x = M T b. 50. Embedding an Inner Product Space in Euclidean Space. Suppose S = { uP 1 , . . . , un } is an orthonormal basis for an inner product space V , and q = ni=1 qi ui is a generic member of V . The individual coordinates qi can be easily calculated (assuming you have done the tedious up-front work of creating orthonormal S—you don’t get something for nothing) as qi = hui , qi. P Also if p = ni=1 pi ui then hq, pi = q1 p1 + · · · + qn pn = (q1 , q2 , . . . , qn )S · (p1 , p2 , . . . , pn )S = [q]S · [p]S . That means that angles, lengths, and any other geometrical fact you might want to know about vectors in V can be calculated using dot product on the S-coordinates, which are ordinary members of Rn . To reiterate: we are associating these orthonormal basis vectors ui with the corresponding standard basis vectors ei in Rn . With this association 114 LARRY SUSANKA the inner product on V corresponds to dot product in Rn . The matrix GS is the identity matrix. 51. Effect of Change of Basis on the Matrix for an Inner Product. Suppose given an inner product h·, ·i on a vector space V . Given an ordered basis A = { a1 , . . . , an }, this inner product has a matrix GA for which hx, yi = [x]TA GA [y]A . The matrix GA has ijth entry hai , aj i. This inner product will also have a matrix GB for use on B-coordinates. The question arises as to the relationship between the two matrices. For every pair of vectors x and y we must have hx, yi =[x]TA GA [y]A = [x]TB GB [y]B T = (PB←A [x]A )T GB (PB←A [y]A ) = [x]TA PB←A GB PB←A [y]A T T =[x]A PB←A GB PB←A [y]A . T It follows that these two matrices are related by GA = PB←A GB PB←A or, if you prefer, −1 T −1 T GB = PB←A GA PB←A = PA←B GA PA←B . So matrices used for inner products change a little differently, under change of coordinates, from matrices used to represent linear transformations. There is an interesting observation that can be made here. If both bases are orthonormal, both GB and GA must be the identity matrix. The equality T above then states In = PB←A PB←A . 
−1 So the matrix of transition between two orthonormal bases satisfies PB←A = : the rows of the matrix of transition from orthonormal A to orthonorB←A mal B are the columns of the matrix of transition from B to A. If you recall, we referred to matrices like this as orthogonal. PT 51.1. Exercise. A vector space V has 2 GA = 0 0 with respect to basis A. The matrix of transition PA←B inner product h·, ·i with matrix 0 0 1 0 0 5 2 0 0 is 0 −1 3. Calculate GB . 0 0 5 OVERVIEW OF REAL LINEAR ALGEBRA 115 52. A Few Facts About Complex Matrices and Vectors. In this section we are going to review and prove some facts about the behavior of vectors and matrices with complex numbers as entries. We do this in order to prove the Spectral Theorem in the next section. This section is about existence, but the proof does not provide an efficient way to find the things whose existence is being asserted. Those calculations are done by means you already know. The theorem merely states that the things (eigenvectors with certain properties) exist, so it is worth your time to look for them. Recall that a complex number is an expression “of the form” a + bi were i2 = −1 and the numbers a and b are real. This form is called the standard form of the complex number. Obviously whatever i might be it is not a real number. We perform arithmetic involving this new number exactly as we would if it were any other square root: whenever we see i2 we replace it by −1. If z = a + bi we define the conjugate of z, denoted z, to be a − bi. The magnitude of the complex number z is denoted kzk and defined by p √ kzk = zz = a2 + b2 . It follows that the multiplicative inverse of z is z −1 = z z = . zz kzk2 It is easy to show the following facts for two complex numbers z = a + bi and w = c + di: z+w =z+w Also z + z = 2a is real and and (zw) = (z) (w). i(z − z) = 2b is real. In this section we will fatten up our matrices to (potentially) contain complex number entries, and our vectors will be drawn from Cn , the space of n × 1 column matrices with complex entries. Matrix addition and matrix multiplication and scalar multiplication are defined just as before, with complex numbers in place of real numbers. The concepts of spanning set, linear independence, basis and also the process of finding complex solutions to a system of equations with complex coefficients are unchanged when applied to Cn , where scalars and entries and the coefficients of linear combinations are all drawn from C rather than R. There is one additional operation when entries and scalars are allowed to be complex. If A = (ai j ) is any m × n matrix with complex entries we define A to be the matrix (ai j ). 116 LARRY SUSANKA It is an easy exercise to show that if A, B and C are of compatible shapes (so the following operations make sense) AB + C = ( A ) ( B ) + C A + A is real i(A − A) is real. The norm of the complex vector x ∈ Cn is denoted kxk and defined to be p √ kxk = x · x = xj xj . For any complex number c we find kcxk = kckkxk. Also, kxk = 0 exactly when x = 0. Note that any x ∈ Cn can be written in a unique way as a + bi where a, b ∈ Rn . Then kxk2 = kak2 + kbk2 . T If B is any m × n matrix, the n × m matrix B∗ is defined to be B , the conjugate transpose of B. So, for example, the square norm of vector x is x∗ x. 52.1. Exercise. Suppose x, y ∈ Cn . (i) Show that x∗ y = 0 exactly when y∗ x = 0. If the condition holds, x and y are said to be orthogonal (to each other.) (ii) Show that x∗ y = xT y = yT x. 
(iii) If x 6= 0 show that x kxk If x 6= 0 define Projx (y) = has norm 1. x∗ y x∗ x x and CoProjx (y) = y − P rojx (y). (iv) Show that x∗ CoP rojx (y) = 0. (v) Show how to adapt the Gramm-Schmidt process so that it can be applied to produce, from any (complex) ordered basis { v1 , . . . , vn } of Cn a basis of vectors which are all orthogonal (pairwise) and of norm 1. A square matrix M is called Hermitian or self-adjoint exactly when M∗ = M. A square matrix Q is called unitary exactly when Q∗ = Q−1 , so for these matrices QQ∗ = Q∗ Q = In . The columns and rows of a unitary matrix are of (complex) norm 1, and if Ci is the ith column of unitary Q then C∗i Cj = δi j . The columns of an n × n unitary matrix form a (complex) basis of Cn , and these basis vectors are orthogonal. Unitary matrices are the complex equivalent of orthogonal matrices. Of course, orthogonal matrices are unitary. It is easy to see that the product of (compatible) unitary matrices is unitary. OVERVIEW OF REAL LINEAR ALGEBRA 117 53. Real Matrices with Complex Eigenvalues. Now suppose that A is a real square matrix, and the characteristic polynomial P (x) is defined for A, just as before. By the Fundamental Theorem of algebra, p(x) can be factored in a unique way (except for order) as p(x) = (x − λ1 ) · · · (x − λn ), the product of n linear factors involving the (possibly) complex eigenvalues λ1 , . . . , λn . For each distinct eigenvalue there is at least one eigenvector, this time with potentially complex entries. Since p(x) is a real polynomial, we have p(x) = (x − λ1 ) · · · (x − λn ) = p(x) = (x − λ1 ) · · · (x − λn ). So the complex roots come in conjugate pairs. Also, if Ax = λx then Ax = Ax = λx = λ x so the conjugate of an eigenvector is also an eigenvector, but for the conjugate eigenvalue. Eigenvalues for real square matrices come in conjugate pairs. If x is an eigenvector for complex eigenvalue λ then x is an eigenvector for eigenvalue λ. In particular, x cannot be a real vector if λ is not real. We finish off with a calculation that pertains to real vectors that are linear combinations of conjugate eigenvectors. Let x = xr + ixi where xr and xi are both real: the real and imaginary parts of the eigenvector x. Note that xr 6= kxi for any constant k. That is because if there were such a k, we would have x = kxi + ixi and so and x = kxi − ixi 1 1 x x= k+i k−i so xi would be a nonzero multiple of both eigenvectors and therefore an eigenvector for two different eigenvalues, an impossibility. xi = So xi and xr are both nonzero and an independent pair of real vectors. Suppose Z is a real vector and a complex linear combination of x and x: Z = (a + bi)x + (c + di)x = (a + bi)(xr + ixi ) + (c + di)(xr − ixi ). 118 LARRY SUSANKA Multiplying this out and using the fact that Z is real and also that xi and xr are independent, it is easy to see that c = a and d = −b and so Z = 2axr − 2bxi = (a + bi)x + (a − bi)x. In other words, Z is this real linear combination of real vectors xr and xi . 54. Real Symmetric Matrices. We now consider the situation where A is a real symmetric matrix. If x is any complex vector then xT Ax = xi ai j xj = xi aj i xj = xj aj i xi = xT Ax. Since the only numbers that are their own conjugates are real, we conclude that xT Ax is real. Now suppose that λ is an eigenvalue for A and x is an eigenvector for that eigenvalue. xT Ax = xT λx = λxT x = λ kx1 k2 + · · · + kxn k2 . The numbers on the far left and the far right above are real and nonzero, so λ is real too. 
We conclude that all eigenvalues of a real symmetric matrix are real. There is a real eigenvector for each real eigenvalue. Finally, suppose y1 is a real eigenvector for λ1 and y2 is a real eigenvector for λ2 for real symmetric A. Then λ1 y1T y2 = (A y1 )T y2 = y1T A y2 = y1T (λ2 y2 ) = λ2 y1T y2 . Unless λ1 = λ2 we must have y1T y2 = y1 · y2 = 0. So eigenvectors for different eigenvalues are orthogonal. But we can go a bit further in this direction. Suppose y1 is a normalized eigenvector for λ1 and real symmetric A. Let W = y1⊥ . Let { y2 , . . . , yn } be an ordered orthonormal basis for W and let T be the orthonormal basis { y1 , . . . , yn } of Rn . Using the symmetry of A and for i > 1 y1 · Ayi = y1T Ayi = (Ay1 )T yi = λ1 y1T yi = λ1 0 = 0. That means Aw ∈ W for all w ∈ W . And λ1 0 −1 P AP = 0 B for an (n−1)×(n−1) real symmetric block matrix B, and where the matrix P is orthogonal, whose columns are the basis vectors of T . OVERVIEW OF REAL LINEAR ALGEBRA 119 Suppose that matrix B is orthogonally diagonalizable. That means there is an ordered orthonormal basis C = { c1 , . . . , cn−1 } of Rn−1 so that λ2 0 . . . 0 0 λ3 . . . 0 Q−1 BQ = .. .. .. . . . 0 0 ... λn where the columns of Q are the members of orthonormal basis C. Using block matrix notation define vectors 0 di+1 = ∈ Rn for i = 1, . . . , n − 1 ci and let d1 = e1 . So D = { d1 , d2 , . . . , dn } is a basis for Rn . Let R denote the matrix whose columns are this ordered basis, in order. Then 0 −1 λ1 R R = R−1 P −1 AP R 0 B λ1 0 . . . 0 0 λ2 . . . 0 = .. .. .. . . . . 0 0 ... λn The matrix P R is the product of two orthogonal matrices so is itself orthogonal. Its columns h1 , . . . , hn form an ordered orthonormal basis H of Rn . So λ1 0 . . . 0 0 λ2 . . . 0 A = PEn←H .. .. PH←En .. . . . 0 0 ... λn and H constitutes an orthonormal basis of eigenvectors for A. These are the elements needed to create an induction argument (on dimension18) to prove the facts which prompted our excursion into the realm of complex matrices. These are results, in the end, about real matrices and vectors. All roots of the characteristic polynomial of a real symmetric matrix A are real. Eigenvectors for different eigenvalues are orthogonal. There is a basis of eigenvectors for A. So there is an orthonormal basis S and orthogonal matrix of transition PS←En so that PS←En APEn←S is diagonal. 18The student is encouraged to complete this argument as a challenging exercise. 120 LARRY SUSANKA 55. Real Skew Symmetric Matrices. At this point we abandon symmetric matrices and consider a real skew symmetric matrix A. 55.1. Exercise. Modify the argument from above, where we showed that real symmetric matrices have real eigenvalues, to show instead that all nonzero eigenvalues of a real skew symmetric matrix (i.e. AT = −A) are pure complex. Suppose λi is nonzero and a pure complex eigenvalue for real skew symmetric A, and x = a + bi is an eigenvector for this eigenvalue, where a, b ∈ Rn . A(a + bi) = λi(a + bi) = −λb + λai. So Aa = −λb and Ab = λa. Also λaT b = (λa)T b = (Ab)T b = bT AT b = −bT Ab = −λbT a = −λaT b. That means a · b = 0 so a and b are orthogonal. Similarly, if λ1 i and λ2 i are different nonzero eigenvalues with eigenvectors y1 = a + bi and y2 = c + di, respectively, then λ1 aT y2 = λ1 aT c + λ1 aT di. But also λ1 aT y2 =(Ab)T y2 = bT AT y2 = −bT Ay2 = −bT λ2 iy2 =λ2 bT d − λ2 bT ci. By a very similar calculation we have λ1 bT y2 = λ1 bT c + λ1 bT di. But, again λ1 bT y2 = − (Aa)T y2 = −aT AT y2 = aT Ay2 = aT λ2 iy2 = − λ2 aT d + λ2 aT ci. 
Equating the four real and complex components shows that aT c = aT d = bT c = bT d. So not only are y1 and y2 orthogonal, but the four component vectors { a, b, c, d } form a real orthogonal set of vectors. 55.2. Exercise. This next exercise is quite a challenge, but it is handled in the same way (induction on rank) as the corresponding argument for symmetric matrices. Suppose A is a real skew symmetric n × n matrix. Then there is a real orthogonal matrix of transition PEn←S so that PS←En APEn←S is block diagonal OVERVIEW OF REAL LINEAR ALGEBRA 121 with blocks of two types. First, there are rank(A)/2 blocks of the form 0 λ for various nonzero real λ. The rest are 1 × 1 zero blocks. −λ 0 56. Orthonormal Eigenbases and the Spectral Theorem. If V is a real inner product space with inner product h·, ·i there might be an orthonormal basis of eigenvectors for a linear transformation f : V → V . We are interested in exactly when this can happen. The first thing to do is create a matrix MS←S for f with respect to some orthonormal basis S. If there is an orthonormal basis of eigenvectors T then the matrix of transition PS←T is from one orthonormal basis to another, and so is orthogonal: its transpose is its inverse. The matrix MT←T would be diagonal. MS←S = PS←T MT←T PT←S . The product on the right is its own transpose, so it is necessary, for this orthonormal diagonalization to happen, that MS←S is symmetric. So let’s consider f with symmetric matrix with respect to orthonormal S as above. In our effort to construct an orthonormal eigenbasis then, our first step is to find a (randomly chosen) basis for each eigenspace and use the GramSchmidt process to find an orthonormal basis for each eigenspace. The last result of Section 52 implies that these eigenvectors, all together, will form a basis. It is a fact that when the matrix of a linear transformation f : V → V with respect to any one orthonormal basis is symmetric then its matrix is symmetric with respect to any orthonormal basis and the orthonormal eigenvectors, found as described above, do form a basis for V , so f is diagonalizable. The result is a special case of a theorem called The Spectral Theorem, a beautiful and important result with many generalizations and consequences. In our context, the spectrum of a linear transformation is the set of its eigenvalues. You will learn more about this theorem, and variants, in later courses. This theorem can be regarded as a matrix factorization result, often referred to as a “decomposition” of the matrix. When M is a real symmetric matrix there is an orthogonal matrix P and a diagonal matrix D so that M = P DP T = P DP −1 . This is called an eigenvalue decomposition of M . 122 LARRY SUSANKA All of the mathematics programs such as Maple, Mathematica, MATLAB and so on have efficient means by which these factor matrices can be calculated—and also for the factorizations mentioned in the next sections and many other factorizations too. Still, practical cost-benefits of slight efficiency gains of one implementation over another in special cases make the whole subject very important and active. We won’t deal with these matters here, but only consider the nature of the factorizations these programs produce. 57. The Schur Decomposition. Another factorization result for square matrices is the Schur decomposition. Its proof requires only a little tweaking of the work we did to prove the spectral theorem for real symmetric matrices. 
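As remarked above, the mathematics programs produce this factorization directly. Here is a minimal Python sketch using numpy’s routine for symmetric matrices; the matrix is a made-up example.

# The eigenvalue decomposition M = P D P^T of a real symmetric matrix (numpy).
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])              # real and symmetric

eigenvalues, P = np.linalg.eigh(M)           # eigh is the routine for symmetric (Hermitian) matrices
D = np.diag(eigenvalues)

print(np.allclose(P @ P.T, np.eye(3)))       # True: P is orthogonal
print(np.allclose(M, P @ D @ P.T))           # True: M = P D P^T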
If M is an n × n matrix there is a factorization as M = QU Q∗ = QU Q−1 where Q is a unitary matrix and U is upper triangular with the eigenvalues of M as diagonal entries. If M is real and has all real eigenvalues, then Q and U can be chosen to be real matrices. Even if there are some complex eigenvalues, if M is real the columns of Q corresponding to real diagonal entries of U can be chosen to be real vectors, and U itself can be chosen to have a real column through each real diagonal entry. This is called a Schur decomposition of M . Suppose that any matrix up to size (n − 1) × (n − 1) has been shown to be “upper triangularizable” by a unitary matrix of transition of the prescribed type when real matrices and eigenvalues are involved. Given n × n matrix A, if A is real and has any real eigenvalue λ1 , we pick that real eigenvalue. If A is not real, or has no real eigenvalues, let λ1 be any choice of eigenvalue. Find normalized eigenvector y1 for eigenvalue λ1 . If λ1 and A are real choose y1 to be real and extend to an orthonormal basis Y = { y1 , . . . , yn } of Rn . Otherwise, extend to an orthonormal basis for Cn . Let S be the block matrix S = ( y1 · · · yn ) . OVERVIEW OF REAL LINEAR ALGEBRA 123 We find that λ1 0 S −1 AS = .. . 0 m1 2 . . . m2 2 . . . .. . m2 n ... λ1 m1 n 0 m2 n .. = .. . . K M 0 mn n where K is the last n − 1 entries in the top row: (m1 2 . . . m1 n ). Note that in case A and λ1 are real so is K. The (n − 1) × (n − 1) block on the lower right is indicated by M , which will also be real in this event. By assumption there is (n − 1) × (n − 1) unitary P for which m2 2 . . . −1 −1 .. P MP = P . m2 n . . . m2 n .. P . mn n is upper triangular, where P and the upper triangular product are real under the prescribed conditions. The n × n block matrix 1 0 R = .. . 0 ... P 0 0 is also unitary so the product Q = SR is too, and the latter will triangularize the matrix A. 1 0 ... 0 λ1 K 1 0 ... 0 0 0 0 −1 .. .. .. . . . M P 0 P 0 λ1 0 = . .. K −1 P M 0 1 0 .. . 0 ... P 0 0 λ1 0 = . .. 0 KP −1 P MP 0 As in the case of the Spectral Theorem, an induction argument finishes the proof. 124 LARRY SUSANKA 58. Normal Matrices. We are now in a position to determine exactly which matrices are diagonalizable using unitary matrices of transition. An n × n matrix is called normal if it commutes with its conjugate transpose: M is normal if and only if M ∗ M = M M ∗ . Unitary, Hermitian, symmetric and skew symmetric matrices are all normal, but they are not the only normal matrices. For instance if P is any unitary matrix, P −1 = P ∗ so P P ∗ = P ∗ P = In . But In + P is not unitary, and there is no reason for it to have symmetry properties, yet (In + P )∗ (In + P ) = 2In + P + P ∗ = (In + P )(In + P )∗ . 58.1. Exercise. Suppose M is unitary, Hermitian or normal, and Q is unitary of the same size. Then Q∗ M Q is also, respectively, unitary, Hermitian or normal. Suppose M is real symmetric or real skew symmetric, and Q is orthogonal of the same size. Then Q∗ M Q = QT M Q is also, respectively, symmetric or skew symmetric. Now suppose generic square matrix M is diagonalizable using unitary transition matrix P : that is, the matrix D = P ∗ M P is diagonal. But then D∗ = P ∗ M ∗ P is also diagonal, and DD∗ = D∗ D. So M M ∗ = P P ∗ M P P ∗ M ∗ P P ∗ = P (P ∗ M P ) (P ∗ M ∗ P ) P ∗ = P DD∗ P ∗ = P D∗ DP ∗ = P (P ∗ M ∗ P ) (P ∗ M P ) P ∗ = P P ∗ M ∗ P P ∗ M P P ∗ = M ∗ M. We have shown that any matrix which is diagonalizable by a unitary matrix must be normal. 
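Before the converse, here is a quick numerical check (numpy) of the In + P example above; the particular unitary P, a rotation, is my choice.

# I + P is normal for unitary P, though neither unitary nor Hermitian.
import numpy as np

theta = 0.7
P = np.array([[np.cos(theta) + 0j, -np.sin(theta)],
              [np.sin(theta),       np.cos(theta)]])   # a rotation: orthogonal, hence unitary

def is_normal(M):
    """M is normal exactly when M* M = M M*."""
    return np.allclose(M.conj().T @ M, M @ M.conj().T)

M = np.eye(2) + P
print(np.allclose(P.conj().T @ P, np.eye(2)))   # True: P is unitary
print(np.allclose(M.conj().T @ M, np.eye(2)))   # False: I + P is not unitary
print(np.allclose(M, M.conj().T))               # False: and not Hermitian either
print(is_normal(M))                             # True: but it is normal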
On the other hand, suppose M is a normal matrix. Find the Schur Decomposition M = QU Q∗ for M , where U is upper triangular and Q unitary. Using the normality of M it is easy to see that U is normal too: U ∗U = U U ∗. Since U ∗ is lower triangular, the first row first column entry of U ∗ U is a1 1 a1 1 . On the other hand the first row first column entry of U U ∗ is a1 1 a1 1 + a1 2 a1 2 + · · · + a1 n a1 n . That means all but the first entry of the first row of U must be 0 OVERVIEW OF REAL LINEAR ALGEBRA 125 Similarly, the second row second column entry of U ∗ U is a2 1 a2 2 . But the second row second column entry of U U ∗ is a2 1 a2 1 + a2 2 a2 2 + · · · + a2 n a2 n . That means every entry in the second row of U is 0 except for the second column. We carry on this way for each row and conclude that U is diagonal and M is diagonalizable using unitary matrix of transition Q. We conclude: A square matrix M is diagonalizable with a unitary matrix of transition P P ∗M P = D P unitary, D diagonal if and only if M is normal. If M is a real normal matrix and all the eigenvalues of M are real then P can be chosen to be real. 59. Real Normal Matrices. Let’s carry on a bit further with real normal matrices. We saw in Section 53 that complex eigenvalues of real matrices come in conjugate pairs and eigenvectors for these pairs can be chosen to be conjugate vectors as well. Specifically, we saw there that if λ = s + t i is a complex eigenvalue for real matrix M with eigenvector x then x is also an eigenvector of M but for eigenvalue λ = s − t i. We just found that if M is normal it can be diagonalized as P ∗ M P = D with unitary P , whose columns are eigenvectors for M . Assume that normal real M actually has a complex eigenvalue with unit eigenvector x = xr + ixi where xr and xi are real vectors, and that (by reordering the columns of P and, possibly, multiplying a column by −1) x and x are the first two columns of P . Make sure that any real eigenvalues correspond, in P , to real columns (and not a complex scalar multiple of a real eigenvector.) The first two elements on the diagonal of D are λ and λ. In Section 53 we found that xr and xi are independent vectors, but here more is true. Since P is unitary, the product x∗ x must vanish: 0 = x∗ x = xT x = xr · xr − xi · xi + 2i xr · xi . This means, first, that xr and xi are orthogonal and, second, that xr · xr equals xi · xi . Since x has √unit magnitude, these two orthogonal real vectors each have magnitude 1/ 2. Both xr and xi are linear combinations of x and x so these vectors are orthogonal to all the other columns of P . 126 LARRY SUSANKA Let Q be the unitary matrix √ √ obtained from P by replacing the first two columns of P by 2 xr and 2 xi . The transpose of these real vectors are the first two rows of Q∗ . We will refer to these first two rows as Q∗1 and Q∗2 respectively. The first two rows of P ∗ are xT and xT , and i Q∗ Q∗ xT = xTr − i xTi = √1 − √ 2 2 2 and Q∗ i Q∗ xT = xTr + i xTi = √1 + √ 2 . 2 2 As a matrix equation we have P ∗ = K ∗ Q∗ where K ∗ is the block diagonal unitary matrix 1 −i √ √ ... 0 2 2 √1 i √ . . . 0 2 ∗ 2 K = .. .. . . I 0 0 and I indicates the identity matrix of the appropriate size. But then D = P ∗ M P = K ∗ Q∗ M QK The matrix KDK ∗ is 1 √ √1 ... 2 2 √i −i ... 2 √ 2 .. .. . . 0 0 I 0 λ 0 0 . .. 0 and so KDK ∗ = Q∗ M Q. 0 ... λ ... .. . 0 R √1 0 2 0 √12 .. . 0 −i √ 2 ... 0 √i 2 ... .. . 0 I 0 where R is diagonal, containing the other eigenvalues. Multiplying this out we find that, for λ = s + t i, s t ... 0 −t s . . 
. 0 ∗ = Q∗ M Q KDK = .. .. . . 0 0 R where Q is orthogonal whose first two columns are real vectors √ 2 xi . √ 2 xr and We can proceed in just this way with the other complex eigenvalue conjugate pairs, permuting eigenvectors for them to the front columns, constructing 2 × 2 real diagonal blocks and real orthonormal basis vectors for each. After the complex eigenvalues are exhausted, the remaining eigenvalues are all real, each with real eigenvector. We conclude: OVERVIEW OF REAL LINEAR ALGEBRA 127 If M is a real normal matrix there is an orthogonal matrix Q so that B = Q−1 M Q is block diagonal, with either 2 × 2 blocks or 1 × 1 blocks. Each 2 × 2 block is of the form s t , −t s one block for each complex eigenvalue pair λ = s + i t and √i t. √ λ=s− The two columns in Q corresponding to this block are 2 xr and 2 xi where xr + i xi is a unit eigenvector for λ. The 1×1 blocks contain real eigenvalues and the corresponding columns in Q are real eigenvectors for these eigenvalues. 60. The LU Decomposition. If M is an m × n matrix there is a factorization as P M = LU where L is m × m and lower triangular and U is m × n and upper triangular and P is an m × m permutation matrix. If M is real, L and U are real too. This is called an LU decomposition of M . The calculations needed to produce the decomposition correspond, essentially, to Gaussian elimination. We describe the algorithm for the construction below. If the first column of M is zero, let H1 = Im and proceed to examine the second column. If the first column has nonzero entry, pick i1 so that mi1 1 is a largestmagnitude entry in that column. Use elementary row matrices to clean out (i.e. reduce to zero) all other nonzero entries of the first column of M . This is accomplished by left multiplication by elementary row matrices of the first kind. The product of these is a matrix H1 . If no “clean out” step is required let H1 = Im . In both cases H1 M has at most one nonzero entry in the first column. If it has a nonzero entry, that entry is in row i1 . Define the set of integers A2 to be { i1 } or, if i1 is not defined, the empty set. This set will keep track of rows to be ignored in subsequent calculations. Proceed to examine the second column. Identify a largest-magnitude entry in the second column of H1 M which is not in row j for any j ∈ A2 . If there is no such entry let A3 = A2 and H2 = Im . If there is let A3 = A2 ∪ {i2 } where the specified entry is in 128 LARRY SUSANKA column i2 and let H2 be the product of the elementary row matrices needed to clean out any other nonzero entries in that column for all rows except rows in A2 . Let H2 = Im if no “clean out” step is required. Let k be the lesser of m or n. We carry on as suggested above for k steps. At the jth step for j < k we examine the entries in the jth column of Hj−1 · · · H1 M to see if there are nonzero entries not in a row previously selected and recorded as a member of Aj . If there is no such entry we let Aj+1 = Aj and Hj = Im and move on to the next column. If there is such an entry select largest-magnitude entry in row ij , let Aj+1 = Aj ∪ {ij } and let Hj be the product of the elementary row matrices needed to clean out any other nonzero entries in that column for all rows except those rows recorded as “off-limits” in Aj . Let Hj = Im if no “clean out” is required. This procedure terminates at the completion of the kth step, and the matrix Hk · · · H1 M = V is an m × n matrix which has k leading zeroes in any row designation not in Ak . 
60. The LU Decomposition.

If M is an m × n matrix there is a factorization

    P M = L U

where L is m × m and lower triangular, U is m × n and upper triangular, and P is an m × m permutation matrix. If M is real, L and U are real too. This is called an LU decomposition of M.

The calculations needed to produce the decomposition correspond, essentially, to Gaussian elimination. We describe the algorithm for the construction below.

If the first column of M is zero, let H_1 = I_m and proceed to examine the second column. If the first column has a nonzero entry, pick i_1 so that m_{i_1 1} is a largest-magnitude entry in that column. Use elementary row matrices to clean out (i.e. reduce to zero) all other nonzero entries of the first column of M. This is accomplished by left multiplication by elementary row matrices of the first kind, and the product of these is a matrix H_1. If no "clean out" step is required let H_1 = I_m. In both cases H_1 M has at most one nonzero entry in the first column. If it has a nonzero entry, that entry is in row i_1.

Define the set of integers A_2 to be { i_1 } or, if i_1 is not defined, the empty set. This set will keep track of rows to be ignored in subsequent calculations. Proceed to examine the second column.

Identify a largest-magnitude entry in the second column of H_1 M which is not in row j for any j ∈ A_2. If there is no such entry let A_3 = A_2 and H_2 = I_m. If there is, let A_3 = A_2 ∪ { i_2 }, where the specified entry is in row i_2, and let H_2 be the product of the elementary row matrices needed to clean out any other nonzero entries in that column for all rows except the rows in A_2. Let H_2 = I_m if no "clean out" step is required.

Let k be the lesser of m and n. We carry on as suggested above for k steps. At the jth step we examine the entries in the jth column of H_{j-1} · · · H_1 M to see if there are nonzero entries not in a row previously selected and recorded as a member of A_j. If there is no such entry we let A_{j+1} = A_j and H_j = I_m and move on to the next column. If there is such an entry, select a largest-magnitude one, in row i_j say; let A_{j+1} = A_j ∪ { i_j } and let H_j be the product of the elementary row matrices needed to clean out any other nonzero entries in that column for all rows except those rows recorded as "off-limits" in A_j. Let H_j = I_m if no "clean out" is required.

This procedure terminates at the completion of the kth step, and the matrix H_k · · · H_1 M = V is an m × n matrix which has at least k leading zeroes in any row whose index is not in A_k. On the other hand, if i_j is a row selected for A_k then any row i_t with t > j has strictly more leading zeroes than does row i_j.

In other words, if the rows of V corresponding to the members of A_k are placed in a new matrix in the order of their selection for A_k, and the other rows are placed after them in any order you like (they have k leading zeroes), the re-ordered matrix will be upper triangular. Let P be a permutation matrix that re-orders the rows of V so that P V = U is upper triangular.

Examining the calculations above which created V, applied now to the rows of the matrix P M in their permuted locations, we see that at each step the elementary row operations place zeroes in a column at locations beneath the row selected as containing a largest-magnitude entry in that column. Performing the analogous row operations on the rows of P M in their new locations produces

    Q_k · · · Q_1 P M = U

where each Q_j is invertible and lower triangular. The inverse and the product of lower triangular matrices are lower triangular, so if we let L = Q_1^{-1} · · · Q_k^{-1} we have the decomposition P M = L U, as required.

Implementations that take advantage of the nature of sparse matrices (those with a high proportion of zero entries), or that reorganize the work to cut down on roundoff error in the calculations, are particularly important. You will likely see and use this decomposition more than any other in Engineering and other applied mathematics settings where Linear Algebra is used.

One advertised use of an LU decomposition is to solve an equation of the form M x = b more quickly. You can create a solution by solving, in order, L y = P b and then U x = y. Because of the triangular nature of L and U these can each be solved by substitution. This method is advantageous over direct Gaussian elimination of the augmented matrix when M is a fixed matrix (so you need to find the decomposition only once) but there are many different "target" b-values.
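Here is a brief sketch of that two-stage substitution solve. It assumes Python with the scipy library as a stand-in for your own software; note that scipy's lu routine returns the factorization in the form M = P_s L U, so the permutation P used above corresponds to the transpose of the matrix scipy returns.

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    # A minimal sketch, assuming numpy and scipy.  The matrix and right-hand
    # side are arbitrary examples.
    M = np.array([[2., 1., 1.],
                  [4., 3., 3.],
                  [8., 7., 9.]])
    b = np.array([1., 2., 3.])

    P_s, L, U = lu(M)                        # M = P_s @ L @ U
    P = P_s.T                                # so that P @ M = L @ U
    print(np.allclose(P @ M, L @ U))         # True

    # Solve M x = b by substitution: first L y = P b, then U x = y.
    y = solve_triangular(L, P @ b, lower=True)
    x = solve_triangular(U, y, lower=False)
    print(np.allclose(M @ x, b))             # True

With many different target b-values you would compute the factorization once and repeat only the two triangular solves.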
61. The Singular Value Decomposition.

Next under discussion is a very interesting factorization with many uses.

If M is any real m × n matrix there is an orthonormal basis U = { u_1, . . . , u_m } of R^m, an orthonormal basis V = { v_1, . . . , v_n } of R^n and an m × n diagonal matrix Σ so that

    M = P_{Em←U} Σ P_{V←En} .

Further, Σ can be chosen so that its diagonal entries σ_1, σ_2, . . . , σ_k, where k is the lesser of m and n, satisfy σ_i ≥ σ_{i+1} ≥ 0 for all i = 1, . . . , k - 1. The number t of nonzero entries on the diagonal of Σ is the rank of M. This is the singular value decomposition, or SVD.

The considerations below, in which the bases U and V are found, also produce a slightly different and somewhat more compact factorization:

If M is any real m × n matrix of rank t there is an orthonormal set of vectors { u_1, . . . , u_t } in R^m, an orthonormal set of vectors { v_1, . . . , v_t } in R^n and a t × t diagonal matrix Σ so that M = P Σ Q^T, where P is the matrix (u_1 · · · u_t) and Q is the matrix (v_1 · · · v_t), of sizes m × t and n × t respectively. Σ can be chosen so that its diagonal entries σ_1, σ_2, . . . , σ_t satisfy σ_i ≥ σ_{i+1} > 0 for i = 1, . . . , t - 1. This is the reduced singular value decomposition, or RSVD.

The n × n matrix M^T M is symmetric, so there is an orthogonal n × n matrix P_{En←V} with column blocks (v_1 v_2 · · · v_n) for which

    P_{V←En} M^T M P_{En←V} = P_{En←V}^T M^T M P_{En←V}

is diagonal, with the nonzero entries λ_1, . . . , λ_t arranged first and in nonincreasing order along the diagonal.

If M^T M v = λ v for nonzero λ and v then M^T M M^T M v = λ² v and M M^T (M v) = M λ v = λ (M v). The first equality implies M v ≠ 0 and the second then implies that M v is an eigenvector for M M^T for eigenvalue λ. So each nonzero eigenvalue of M^T M is an eigenvalue for M M^T.

Now v_1, . . . , v_t are linearly independent eigenvectors for the nonzero eigenvalues λ_1, . . . , λ_t of M^T M, and M v_1, . . . , M v_t are eigenvectors for these same eigenvalues for M M^T.

If Σ_{i=1}^t a^i M v_i = 0 then M Σ_{i=1}^t a^i v_i = 0, so Σ_{i=1}^t a^i v_i ∈ nullspace(M). But the span of v_1, . . . , v_t intersects the nullspace of M^T M (and therefore the nullspace of M) only at the zero vector. The linear independence of the v_i then implies that all a^i = 0. So the M v_i are independent, even those associated with equal eigenvalues. This means the span of the eigenspaces for nonzero eigenvalues of M M^T has dimension at least t. Switching the positions of M and M^T in this argument leads us to conclude that the span of these eigenspaces for nonzero eigenvalues of M M^T has dimension exactly t.

Since v_i · v_j = v_i^T v_j = 0 when i ≠ j,

    (M v_i) · (M v_j) = v_i^T M^T M v_j = v_i^T (λ_j v_j) = λ_j v_i · v_j = 0 .

So M v_1, . . . , M v_t is an orthogonal set of eigenvectors for M M^T. Note that

    0 < (M v_i) · (M v_i) = v_i^T M^T M v_i = v_i^T (λ_i v_i) = λ_i v_i · v_i = λ_i .

This means that the λ_i are all positive.

Now let σ_i = √λ_i = ‖M v_i‖ and u_i = σ_i^{-1} M v_i for i = 1, . . . , t. Then extend this orthonormal set to an orthonormal basis U = { u_1, . . . , u_t, u_{t+1}, . . . , u_m } spanning all of R^m. These additional u_i are eigenvectors for M M^T for eigenvalue 0.

For each i = 1, . . . , t we have M v_i = σ_i u_i and also M^T u_i = σ_i v_i. With reference to these properties, the nonzero σ_i are called singular values of M, and the vectors u_i and v_i are called left and right singular vectors, respectively, for σ_i and M. The columnspace of M is Span({u_1, . . . , u_t}), while the kernel of M is trivial if t = n and otherwise is Span({v_{t+1}, . . . , v_n}).

We now define Σ to be the m × n matrix filled with zeroes except for the entries σ_i for 1 ≤ i ≤ t arrayed in order down the first t entries of the main diagonal. We find that Σ = P_{U←Em} M P_{En←V}, so

    M = P_{Em←U} Σ P_{V←En} = (u_1 · · · u_m) Σ (v_1 · · · v_n)^T = σ_1 u_1 v_1^T + · · · + σ_t u_t v_t^T ,

which is the sum of t nonzero m × n matrices. There are t(m + n + 1) numbers to record to reproduce M, which could be small in comparison to mn. The entries of the matrices u_i v_i^T are each small, consisting of all products of the entries of unit vectors. The σ_i are weights that indicate how important each combination is in the sum that forms M.

Some filtering, approximation and data compression schemes rely on discarding the terms associated with certain (such as small) σ_i. Keeping the terms corresponding to the r largest singular values we have an approximation to M given as

    M ≈ σ_1 u_1 v_1^T + · · · + σ_r u_r v_r^T .

The right side can also be interpreted as the matrix "closest" to M of rank r. You must record only r(m + n + 1) numbers to reproduce this approximation, which could represent a considerable compression.
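Here is a brief sketch of the reduced factorization and the rank-r approximation, assuming Python with numpy as a stand-in for your own software; the matrix is random filler standing in for real data, and for a full-rank matrix the "thin" factorization numpy returns coincides with the RSVD above.

    import numpy as np

    # A minimal sketch, assuming numpy.  np.linalg.svd with full_matrices=False
    # returns the factorization M = P @ diag(sigma) @ Q.T used in the text.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 4))

    P, sigma, QT = np.linalg.svd(M, full_matrices=False)   # sigma is nonincreasing
    print(np.allclose(M, P @ np.diag(sigma) @ QT))          # True

    # Rank-r approximation: keep only the terms for the r largest singular values.
    r = 2
    M_r = P[:, :r] @ np.diag(sigma[:r]) @ QT[:r, :]
    print(np.linalg.matrix_rank(M_r))                       # 2

    # Storage for the approximation: r*(m + n + 1) numbers instead of m*n.
    m, n = M.shape
    print(r * (m + n + 1), "numbers versus", m * n)          # the savings grow with size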
62. ✸ Loose Ends and Next Steps.

In your next class you could go in several directions. Most likely you will examine, a bit more closely than we did in Section 52, the effects of using the field of complex numbers rather than the real numbers. Complex inner products are a bit different.

Interesting factorization results will be proved, useful in the study of Differential Equations and other areas, such as the Jordan decomposition, which states:

Any complex matrix M can be factorized as M = Q A Q^{-1} where Q is an invertible matrix and A is block diagonal, with diagonal blocks of the form

    [ λ  1  0  ...  0  0 ]
    [ 0  λ  1  ...  0  0 ]
    [ .  .  .       .  . ]
    [ 0  0  0  ...  λ  1 ]
    [ 0  0  0  ...  0  λ ] .

You will prove the Cayley-Hamilton Theorem, which can be useful when trying to understand the eigenspace structure of V, and which provides an alternative method to find the eigenspaces, among other things. This theorem states that every square matrix M satisfies its characteristic polynomial: if p(x) = x^n + p_{n-1} x^{n-1} + · · · + p_1 x + p_0 is the characteristic polynomial for the n × n matrix M then

    p(M) = M^n + p_{n-1} M^{n-1} + · · · + p_1 M + p_0 I_n = 0 .

(That every square matrix M satisfies some polynomial is easy to show. The list I_n, M, M², . . . , M^{n²} consists of n² + 1 vectors in an n²-dimensional space, so the set of these matrices must be dependent. The Cayley-Hamilton Theorem brings the minimal degree of such a polynomial down to no more than n.)

This theorem is easy to prove if there is a basis of eigenvectors for M, and a direct calculation shows it to be true in dimensions two and three. But it is not so easy to show in general.

Assuming this theorem, suppose p(x) has a factor (x - α) for real α and p(x) = (x - α) q(x), where q is of degree n - 1 and (x - α) is not a factor of q(x). It is quite easy to show (we assume n > 1) that q(M) is not the zero matrix, and any nonzero vector in the columnspace of q(M) must be killed by M - α I_n, so it is an eigenvector for eigenvalue α. Under the condition described above, the columnspace of q(M) is exactly the nullspace of M - α I_n, the eigenspace for eigenvalue α. Any calculator can find q(M) as long as the dimension is not too big.

Unfortunately, if (x - α) is a factor of p(x) more than once and you are looking for eigenspaces, there are a number of possibilities and it is probably most efficient, from a computational standpoint, to go back to the matrix M - α I_n and calculate its nullspace for that eigenvalue directly. In later classes you will examine what else can be done.

Another thing to notice about the Cayley-Hamilton Theorem is that the coefficient p_{n-1} in p(x) = x^n + p_{n-1} x^{n-1} + · · · + p_1 x + p_0 is -trace(M), while p_0 is (-1)^n det(M). We know that det(M) ≠ 0 exactly when M has an inverse. Cayley-Hamilton gives us a formula for it:

    I_n = (-1/p_0) (M^n + p_{n-1} M^{n-1} + · · · + p_1 M) = (-1/p_0) (M^{n-1} + p_{n-1} M^{n-2} + · · · + p_1 I_n) M ,

so

    M^{-1} = (-1/p_0) (M^{n-1} + p_{n-1} M^{n-2} + · · · + p_1 I_n) .
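Here is a brief numerical check of both of these claims, again assuming Python with numpy as a stand-in for your own software; the matrix is an arbitrary invertible example chosen for the illustration.

    import numpy as np

    # A minimal sketch, assuming numpy.  np.poly(M) returns the coefficients of
    # the characteristic polynomial of a square matrix, leading coefficient first.
    M = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])
    n = M.shape[0]
    p = np.poly(M)                     # [1, p_{n-1}, ..., p_1, p_0]

    # Cayley-Hamilton: p(M) should be the zero matrix (up to roundoff).
    p_of_M = sum(c * np.linalg.matrix_power(M, n - i) for i, c in enumerate(p))
    print(np.allclose(p_of_M, np.zeros((n, n))))        # True

    # The inverse formula M^{-1} = (-1/p_0)(M^{n-1} + p_{n-1} M^{n-2} + ... + p_1 I).
    p0 = p[-1]
    M_inv = (-1.0 / p0) * sum(c * np.linalg.matrix_power(M, n - 1 - i)
                              for i, c in enumerate(p[:-1]))
    print(np.allclose(M_inv, np.linalg.inv(M)))         # True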
There might be a nonzero polynomial L(x) of lower degree than n for which L(M) = 0. The polynomial of least degree with leading coefficient 1 for which this is true is called the minimal polynomial for M. The minimal polynomial is a factor of the characteristic polynomial, and in fact has all the same irreducible factors as p(x), though possibly to lower (nonzero) degree. The minimal polynomial is useful for the same kinds of things that the characteristic polynomial is used to accomplish.

Factorizations of these polynomials can be used to break R^n into a chain of subspaces, one inside the other, called invariant subspaces for M. An invariant subspace for M is a subspace V of R^n for which M x ∈ V whenever x ∈ V. The invariant subspaces are very useful in understanding M, and in finding and interpreting matrix factorization results such as those we have created in earlier sections. They are essential in understanding simultaneous diagonalization or simultaneous triangularization results. The basic (easiest) result in this direction is the following fact:

Any collection of square matrices which commute with each other can all be brought to triangular form by the same matrix of transition. If they are all diagonalizable, they can all be brought to diagonal form simultaneously.

Many other topics are natural extensions of what we have learned.

Tensors are a vital subject in many applied areas. You have seen two already: the inner product and the determinant.

Derivatives of functions from R^n to R^m are matrices. You will study them with the tools learned in Linear Algebra.

You may well spend more time on infinite dimensional spaces, using them to study wavelet or Fourier decompositions of a function. The Laplace and Fourier (and other) transforms are linear maps on infinite dimensional spaces.

If you are in Engineering you will almost certainly do a lot of careful work analyzing how errors propagate through matrix calculations, and how to create numerical solutions to differential equations using matrices.

But ... the purpose of these notes is to provide a basis for this later work. The time has come to call things off for the quarter.

Have fun in your future mathematical endeavors!

Index

A ∩ B, 10 A ∪ B, 10 A ⊂ B, 9 CoProj_w, 21, 56 CoProj_x(y) complex vectors, 116 En, 73 GS, 107 M^T, 38 MC←A, 76 Proj_W (onto any subspace), 112 Proj_w (onto a line), 21, 56 Proj_x(y) complex vectors, 116 Refl_W (in any subspace), 112 Refl_w, 22, 56 Span(S), 63 U + W, 62, 102 U ⊕ W, 103 V^D, 60 W⊥, 109 W1 + W2 + · · · + Wk, 104 W1 ⊕ W2 ⊕ · · · ⊕ Wk, 104 QA (position vector), 28 δ_ij, 36 det(f), 85 [·]S, 73 [·]S,E, 75 Cn, 115 Mm×n, 60 N, 9 R, 9 Rv, 104 Rn, 18 Rn∗, 55 0, 14, 19, 34, 59 P(t), 61 Pn(t), 61 ei, 19 p + N, 99 rref(A), 41 v⊥, 109 z, 115 ~i, ~j, ~k, 19 ‖u‖, 20, 106, 116 ‖z‖ (complex number), 115 a ∈ B, 9 colspace(M), 98 dim(V), 64 image(f), 76 ker(f), 76 nullity(A), 101 nullity(f), 101 nullspace(M), 98 rank(A), 101 rank(f), 101 sgn(σ), 52 angle, 20, 106 approximate eigenvalues, 95 eigenvector, 96 solution, 113 approximately diagonalize, 98 arrow, 12 arrow-vector, 12 augmented matrix, 43 base point, 26 position vector, 28 basis, 63 ordered, 63 orthonormal, 111 standard, 19 block, 38 Cauchy-Schwarz inequality, 20, 107 Cayley-Hamilton Theorem, 132 characteristic polynomial, 57, 77, 85 closed operation, 59 columnspace, 98 commutative diagram, 77 complex numbers, 115 conjugate of z, 115 transpose, 116 consistent, 32 coordinate change, 79 functions, 55 map, 73 coordinates general basis, 73 standard basis, 18 cross product, 22 decomposition eigenvalue, 121 LU, 127 reduced singular value, 129 RSVD, 129 Schur, 122 singular value, 129 SVD, 129 determinant, 50, 52, 85 diagonal entries, 36 matrices, 39 diagonalizable, 90 diagonalization, 89 diagonalize, 89 approximately, 98 differential equations, 92 dimension, 64 direct sum, 103 of several subspaces, 104 direction, 11 opposite, 13 same, 13 direction cosines, 21 displacement, 11 distance, 20, 106 dot product, 20 dual, 55 dummy index, 33 eigenbasis, 89 eigenfunction,
91 eigenspace, 57, 76 eigenvalue, 57, 76 approximate, 95 decomposition, 121 eigenvector, 57, 76 approximate, 96 element of, 8 elementary column matrix, 39 operations, 31 row matrix, 37 entries, 18, 34 equivalence relation, 13 Euclidean Space, 21 Fibonacci sequence, 92 force, 11 Fourier coefficients, 111 Fundamental Theorem of Algebra, 117 of Linear Algebra, 110 Gram-Schmidt orthonormalization process, 111 Hermitian, 116 homogeneous system, 31 identity matrix, 36 image, 76 inconsistent, 31 infinite dimensional, 64 inner product, 106 space, 107 instances, 12 intersection, 10 invariant subspaces, 133 inverse iteration, 97 matrix, 36 inversion, 56 invertible, 36 isomorphic, 73 Jordan decomposition, 131 kernel, 76, 98 Kronecker delta function, 36 Laplace expansion, 50 leading coefficient, 40 zeroes, 40 least squares solution, 113 lies in or along, 29 linear (in)dependent, 63 combination, 18, 63 equations, 31 functionals, 55 in the first slot, 106 transformation Rn to Rm, 55 from V to W, 76 linear combination, 14 list, 64 lower triangular, 39 LU decomposition, 127 magnitude, 11 of a complex number, 115 of a vector, 20, 106 main diagonal, 36 matrix, 34 addition, 34 factorization, 121 multiplication, 34 of a linear transformation, 55, 76 of an inner product, 107 of transition, 79 minimal polynomial, 133 multiplication scalar, 14 natural numbers, 9 nonsingular, 36 norm of a complex vector, 116 of a vector, 20, 106 normal matrix, 124 to a line or plane, 29 nullity of a matrix, 101 nullspace, 98 one-to-one, 98 onto, 98 ordered set, 10 orthogonal, 107 complement, 109 complex vectors, 116 matrices, 39, 114 orthonormal basis, 111 parametric formulas for lines, planes etc., 30 permutation, 51 matrices, 37 perpendicular, 15, 21 pivot column, 41 variables, 32 point-normal form, 29 points to, 29 pointwise addition, 60 scalar multiplication, 60 position map, 75 vector, 27 positive definite, 106 power iteration, 96 projection, 15, 107 onto a line, 21, 56 onto a plane, 21, 56 onto a subspace, 112 Pythagorean Theorem, 112 rank of a linear transformation, 101 of a matrix, 101 real numbers, 9 recursive definition, 92 reduced singular value decomposition, 129 reflection in a plane, 22, 56 in a subspace, 112 resultant, 14, 18 rotation in R2, 56 in R3, 57, 112 row echelon form, 40 reduced echelon form, 41 rowspace, 100 rref, 41 RSVD, 129 scalar multiplication, 14 scalar multiplication, 18, 34, 59 scalars, 18 Schur decomposition, 122 self-adjoint, 116 sequence, 92 set, 8 signum, 52 similar, 81 simultaneous diagonalization, 133 triangularization, 133 singular, 36 value decomposition, 129 values, 130 vectors, 130 skew symmetric, 39 solution to a system of linear equations, 31 parametric form, 31 span (verb or noun), 63 Spectral Theorem, 121 spectrum, 121 speed, 11 standard basis, 19, 73 form of a complex number, 115 stick, 45 submatrix, 38 subset, 9 subspace, 59 Span(S), the span of a set S, 63 W⊥, the orthogonal complement of W, 109 columnspace, 98 eigenspace, 57 image, 76, 98 kernel, 76, 98 nullspace, 98 solutions of a homogeneous system, 57 sum of several subspaces, 104 of two vector subspaces, 62, 102 sum of two vectors, 13 SVD, 129 symmetric, 39, 106 system of differential equations, 92 trace, 39, 85 transition matrix, 79 transpose, 38 conjugate, 116 triangle inequality, 20, 107 triangular matrix, 39 triple product, 22 trivial vector space, 60 union, 10 unit vector, 21 unitary matrix, 116 upper triangular, 39 vector, 11 addition, 14, 18, 59 space, 59
velocity, 11 zero dimensional, 24, 64 matrix, 34 vector, 14, 19, 59