GILBERT STRANG

INTRODUCTION TO LINEAR ALGEBRA

WELLESLEY-CAMBRIDGE PRESS

INTRODUCTION TO LINEAR ALGEBRA
Second Edition

GILBERT STRANG
Massachusetts Institute of Technology

WELLESLEY-CAMBRIDGE PRESS
Box 812060  Wellesley MA 02482

Introduction to Linear Algebra, 2nd Edition
Copyright ©1998 by Gilbert Strang

All rights reserved. No part of this work may be reproduced or stored or transmitted by any means, including photocopying, without written permission from Wellesley-Cambridge Press. Translation in any language is strictly prohibited — authorized translations are arranged.

Printed in the United States of America

ISBN 0-9614088-5-5
QA184.S78  1998  512'.5  93-14092

Other texts from Wellesley-Cambridge Press
Wavelets and Filter Banks, Gilbert Strang and Truong Nguyen, ISBN 0-9614088-7-1.
Linear Algebra, Geodesy, and GPS, Gilbert Strang and Kai Borre, ISBN 0-9614088-6-3.
Introduction to Applied Mathematics, Gilbert Strang, ISBN 0-9614088-0-4.
An Analysis of the Finite Element Method, Gilbert Strang and George Fix, ISBN 0-9614088-8-X.
Calculus, Gilbert Strang, ISBN 0-9614088-2-0.

Wellesley-Cambridge Press
Box 812060  Wellesley MA 02482 USA
gs@math.mit.edu    http://www-math.mit.edu/~gs/
(781) 431-8488    fax (617) 253-4358

MATLAB® is a registered trademark of The MathWorks, Inc.
LaTeX production by Amy Hendrickson, TeXnology Inc., http://www.texnology.com

A Solutions Manual is available to instructors by email from the publisher. A syllabus and Teaching Codes and exams for this course are available on the linear algebra web site: http://web.mit.edu/18.06/www

TABLE OF CONTENTS

1  Introduction to Vectors
   1.1  Vectors and Linear Combinations
   1.2  Lengths and Dot Products

2  Solving Linear Equations
   2.1  Vectors and Linear Equations
   2.2  The Idea of Elimination
   2.3  Elimination Using Matrices
   2.4  Rules for Matrix Operations
   2.5  Inverse Matrices
   2.6  Elimination = Factorization: A = LU
   2.7  Transposes and Permutations

3  Vector Spaces and Subspaces
   3.1  Spaces of Vectors
   3.2  The Nullspace of A: Solving Ax = 0
   3.3  The Rank and the Row Reduced Form
   3.4  The Complete Solution to Ax = b
   3.5  Independence, Basis and Dimension
   3.6  Dimensions of the Four Subspaces

4  Orthogonality
   4.1  Orthogonality of the Four Subspaces
   4.2  Projections
   4.3  Least Squares Approximations
   4.4  Orthogonal Bases and Gram-Schmidt

5  Determinants
   5.1  The Properties of Determinants
   5.2  Permutations and Cofactors
   5.3  Cramer's Rule, Inverses, and Volumes

6  Eigenvalues and Eigenvectors
   6.1  Introduction to Eigenvalues
   6.2  Diagonalizing a Matrix
   6.3  Applications to Differential Equations
   6.4  Symmetric Matrices
   6.5  Positive Definite Matrices
   6.6  Similar Matrices
   6.7  Singular Value Decomposition (SVD)

7  Linear Transformations
   7.1  The Idea of a Linear Transformation
   7.2  The Matrix of a Linear Transformation
   7.3  Change of Basis
   7.4  Diagonalization and the Pseudoinverse

8  Applications
   8.1  Graphs and Networks
   8.2  Markov Matrices and Economic Models
   8.3  Linear Programming
   8.4  Fourier Series: Linear Algebra for Functions
   8.5  Computer Graphics

9  Numerical Linear Algebra
   9.1  Gaussian Elimination in Practice
   9.2  Norms and Condition Numbers
   9.3  Iterative Methods for Linear Algebra

10  Complex Vectors and Matrices
   10.1  Complex Numbers
   10.2  Hermitian and Unitary Matrices
   10.3  The Fast Fourier Transform

Solutions to Selected Exercises  447
MIT Course 18.06 Final Exam  490
Matrix Factorizations  492
Index  497
Teaching Codes  503

PREFACE

This is a basic textbook for linear algebra, combining the theory with the applications. The central equations are the linear system Ax = b and the eigenvalue problem Ax = λx. It is simply amazing how much there is to say (and learn) about those two equations. This book comes from years of teaching and organizing and thinking about the linear algebra course—and still this subject feels new and very alive.

I am really happy that the need for linear algebra is widely recognized. It is absolutely as important as calculus. I don't concede anything when I look at how mathematics is actually used. So many applications today are discrete rather than continuous, digital rather than analog, linearizable rather than erratic and chaotic. Then vectors and matrices are the language to know.

The equation Ax = b uses that language right away. The left side has a matrix A times an unknown vector x. Their product Ax is a combination of the columns of A. This is the best way to multiply, and the equation is asking for the combination that produces b. Our solution can come at three levels and they are all important:

1.  Direct solution by forward elimination and back substitution.

2.  Matrix solution by x = A⁻¹b using the inverse matrix A⁻¹.

3.  Vector space solution by finding all combinations of the columns of A and all solutions to Ax = 0. We are looking at the column space and the nullspace.

And there is another possibility: Ax = b may have no solution. The direct approach by elimination may lead to 0 = 1. The matrix approach may fail to find A⁻¹. The vector space approach can look at all combinations of the columns, but b might not lie in that column space. Part of the mathematics is understanding when an equation is solvable and when it's not.

Another part is learning to visualize vectors. A vector v with two components is not hard. The components v1 and v2 tell how far to go across and up—we can draw an arrow. A second vector w may be perpendicular to v (and Chapter 1 tells exactly when). If those vectors have six components, we can't draw them but our imagination keeps trying. We can think of a right angle in six-dimensional space. We can see 2v (twice as far) and −w (in the opposite direction to w). We can almost see a combination like 2v − w.

Most important is the effort to imagine all the combinations of cv with dw. They fill some kind of "two-dimensional plane" inside the six-dimensional space. As I write these words, I am not at all sure that I see this subspace. But linear algebra offers a simple way to work with vectors and matrices of any size. If we have six currents in a network or six forces on a structure or six prices for our products, we are certainly in six dimensions. In linear algebra, a six-dimensional space is pretty small.

Already in this preface, you can see the style of the book and its goal. The style is informal but the goal is absolutely serious. Linear algebra is great mathematics, and I try to explain it as clearly as I can. I certainly hope that the professor in this course learns new things. The author always does. The student will notice that the applications reinforce the ideas. That is the whole point for all of us—to learn how to think.
I hope you will see how this book moves forward, gradually but steadily. Mathematics is continually asking you to look beyond the most specific case, to see the broader pattern. Whether we have pixel intensities on a TV screen or forces on an airplane or flight schedules for the pilots, those are all vectors and they all get multiplied by matrices. Linear algebra is worth doing well. Structure of the Textbook I want to note five points about the organization of the book: 1. Chapter 1 provides a brief introduction to the basic ideas of vectors and matrices and dot products. If the class has met them before, there is no problem to begin with Chapter 2. That chapter solves n by n systems Ax = b. 2. For rectangular matrices, I now use the reduced row echelon form more than before. In MATLAB this is R = rref(A). Reducing A to R produces bases for the row space and column space. Better than that, reducing the combined matrix [A I ] produces total information about all four of the fundamental subspaces. 3. | Those four subspaces are an excellent way to learn about linear independence and dimension and bases. The examples are so natural, and they are genuinely the key to applications. I hate just making up vector spaces when so many important ones are needed. If the class sees plenty of examples of independence and dependence, then the definition is virtually understood in advance. The columns of A are independent when x = 0 is the only solution to Ax = 0. 4. Section 6.1 introduces eigenvalues for 2 by 2 matrices. Many courses want to meet eigenvalues early (to apply them in another subject or to avoid missing them completely). It is absolutely possible to go directly from Chapter 3 to Section 6.1. The determinant is easy for a 2 by 2 matrix, and eigenvalues come through clearly. 5. Every section in Chapters | to 7 ends with a highlighted Review of the Key Ideas. The reader can recapture the main points by going carefully through this review. Preface Vil A one-semester course that moves steadily can reach eigenvalues. The key idea is to diagonalize a matrix. For most square matrices that is S~!AS, using the eigenvector matrix S. For symmetric matrices it is Q’ AQ. When A is rectangular we need U! AV. I do my best to explain that Singular Value Decomposition because it has become extremely useful. I feel very good about this course and the student response to it. Structure of the Course Chapters 1-6 are the heart of a basic course in linear algebra—theory plus applications. The beauty of this subject is in the way those two parts come together. The theory is needed, and the applications are everywhere. I now use the web page to post the syllabus and homeworks and exam solutions: http://web.mit.edu/18.06/www I hope you will find that page helpful. It is coming close to 30,000 visitors. use it freely and suggest how it can be extended and improved. Please Chapter 7 connects linear transformations with matrices. The matrix depends on the choice of basis! We show how vectors and matrices change when the basis changes. And we show the linear transformation behind the matrix. I don’t start the course with that deeper idea, it is better to understand subspaces first. Chapter 8 gives important applications—often I choose Markov matrices for a lecture without an exam. Chapter 9 comes back to numerical linear algebra, to explain how Ax = b and Ax = dx are actually solved. Chapter 10 moves from real to complex numbers, as entries in the vectors and matrices. 
The complete book is appropriate for a two-semester course—it starts gradually and keeps going forward. Computing in Linear Algebra The text gives first place to MATLAB, a beautiful system that was developed specifically for linear algebra. This is the primary language of our Teaching Codes, written by Cleve Moler for the first edition and extended by Steven Lee for this edition. The Teaching Codes are on the web page, with MATLAB homeworks and references and a short primer. The best way to get started is to solve problems! We also provide a similar library of Teaching Codes for Maple and Mathematica. The codes are listed at the end of the book, and they execute the same steps that we teach. Then the reader can see matrix theory both ways—the algebra and the algorithms. Those work together perfectly. This textbook supports a course that includes computing and also a course that doesn’t. There is so much good mathematics to learn and to do. Acknowledgements This book surely comes with a lot of help. Suggestions arrived by email from an army of readers; may I thank you now! Steven Lee visited M.I.T. three times from Oak Ridge National Laboratory, to teach the linear algebra course 18.06 from this textbook. He created the web page http://web.mit.edu/18.06/www and he added new MATLAB Teaching Codes to those written by Cleve Moler for the first edition. (All the Teaching Codes are listed at the end of this book.) I find that these short programs show the essential steps of linear algebra in a very clear way. Please look at the web page for help of every kind. For the creation of the book I express my deepest thanks to five friends. The 1993 first edition was converted to IATRX2e by Kai Borre and Frank Jensen in Denmark. Then came the outstanding work by Sueli Rocha on the new edition. First at M.I.T. and then in Hong Kong, Sueli has shared all the excitement and near-heartbreak and eventual triumph that is part of publishing. Vasily Strela succeeded to make the figures print (by somehow reading the postscript file). And in the final crucial step, Amy Hendrickson has done everything to complete the design. She is a professional as well as a friend. I hope you will like the Review of the Key Ideas (at the end of every section) and the clear boxes inside definitions and theorems. The reviews and the boxes highlight the main points, and then my class remembers them. There is another special part of this book: The front cover. A month ago I had a mysterious email message from Ed Curtis at the University of Washington. He insisted that I buy Great American Quilts: Book 5, without saying why. Perhaps it is not necessary to admit that I have made very few quilts. On page 131 of that book I found an amazing quilt created by Chris Curtis. She had seen the first edition of this textbook (its cover had slanted houses). They show what linear transformations can do, in Section 7.1. She liked the houses and she made them beautiful. Possibly they became nonlinear, but that’s art. I appreciate that Oxmoor House allowed the quilt to appear on this book. The color was the unanimous choice of two people. And I happily thank Tracy Baldwin for designing her third neat cover for Wellesley-Cambridge Press. May I now dedicate this book to my grandchildren. It is a pleasure to name the ones I know so far: Roger, Sophie, Kathryn, Alexander, Scott, Jack, William, and Caroline. I hope that all of you will take this linear algebra course one day. Please pass it, whatever you do. The author is proud of you. 
INTRODUCTION TO VECTORS

The heart of linear algebra is in two operations—both with vectors. We add vectors to get v + w. We multiply by numbers c and d to get cv and dw. Combining those operations gives the linear combination cv + dw. Chapter 1 explains these central ideas, on which everything builds. We start with two-dimensional vectors and three-dimensional vectors, which are reasonable to draw. Then we move into higher dimensions. The really impressive feature of linear algebra is how smoothly it takes that step into n-dimensional space. Your mental picture stays completely correct, even if drawing a ten-dimensional vector is impossible. This is where the book is going (into n-dimensional space), and the first steps are the two operations in Sections 1.1 and 1.2:

1.1  Vector addition v + w and linear combinations cv + dw.
1.2  The dot product v · w and the length ||v|| = √(v · v).

1.1  VECTORS AND LINEAR COMBINATIONS

"You can't add apples and oranges." That sentence might not be news, but it still contains some truth. In a strange way, it is the reason for vectors! If we keep the number of apples separate from the number of oranges, we have a pair of numbers. That pair is a two-dimensional vector v:

        v = [ v1 ]      v1 = number of apples
            [ v2 ]      v2 = number of oranges.

We wrote v as a column vector. The numbers v1 and v2 are its "components." The main point so far is to have a single letter v (in boldface) for this pair of numbers v1 and v2 (in lightface).

Even if we don't add v1 to v2, we do add vectors. The first components stay separate from the second components:

        v = [ v1 ]  and  w = [ w1 ]  add to  v + w = [ v1 + w1 ].
            [ v2 ]           [ w2 ]                  [ v2 + w2 ]

You see the reason. The total number of apples is v1 + w1. The number of oranges is v2 + w2. Vector addition is basic and important. Subtraction of vectors follows the same idea: The components of v − w are v1 − w1 and v2 − w2.

Vectors can be multiplied by 2 or by −1 or by any number c. There are two ways to double a vector. One way is to add v + v. The other way (the usual way) is to multiply each component by 2:

        2v = [ 2v1 ]   and   −v = [ −v1 ].
             [ 2v2 ]              [ −v2 ]

The components of cv are cv1 and cv2. The number c is called a "scalar." Notice that the sum of −v and v is the zero vector. This is 0, which is not the same as the number zero! The vector 0 has components 0 and 0. Forgive me for hammering away at the difference between a vector and its components. Linear algebra is built on these operations v + w and cv—adding vectors and multiplying by scalars.

There is another way to see a vector, that shows all its components at once. The vector v can be represented by an arrow. When v has two components, the arrow is in two-dimensional space (a plane). If the components are v1 and v2, the arrow goes v1 units to the right and v2 units up. This vector is drawn twice in Figure 1.1. First, it starts at the origin (where the axes meet). This is the usual picture. Unless there is a special reason, our vectors will begin at (0, 0). But the second arrow shows the starting point shifted over to A. The arrows OP and AB represent the same vector. One reason for allowing any starting point is to visualize the sum v + w:

Vector addition (head to tail)  At the end of v, place the start of w.

We travel along v and then along w. Or we take the shortcut along v + w. We could also go along w and then v. In other words, w + v gives the same answer as v + w. These are different ways along the parallelogram (in this example it is a rectangle).
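These rules are easy to try on a machine. Here is a small check in MATLAB (the language used in the Note on Computing at the end of this section); the particular vectors are chosen only for illustration:

        v = [4; 2];      % a column vector with components v1 = 4 and v2 = 2
        w = [-1; 2];     % a second column vector
        v + w            % addition a component at a time: (3, 4)
        w + v            % the same answer: the order of addition makes no difference
        2*v              % multiplication by the scalar 2: (8, 4)
        -v               % each component reverses sign: (-4, -2)

The semicolons suppress printing; the lines without semicolons display their results.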
The endpoint in Figure 1.2 is the diagonal v + w which is also w + v. Figure 1.1. The arrow usually starts at the origin (0,0); cv is always parallel to v. 1.1. Vectors and Linear Combinations Figure 1.2 Vector addition v-+w produces the diagonal of a parallelogram. first components and second components separately. Check that by algebra: The first component The order of addition makes no difference: 3 Add the is vj + w; which equals w, + vj. vee (EI 1--E- The zero vector has vj = 0 and v2 = O. It is too short to draw a decent arrow, but you know that v + 0 = v. For 2v we double the length of the arrow. We reverse its direction for —v. This reversing gives a geometric way to subtract vectors. Vector subtraction To draw v — w, along w (Figure 1.3). The components go forward along v and then backward are vj — w; and v2 — w2. We will soon meet a “dot product” of vectors. It is not the vector whose components are vjw, and v2W2. Linear Combinations We have added vectors, subtracted vectors, and multiplied by scalars. The answers v + w, v — w, and cv are computed a component at a time. By combining these operations, we now form “linear combinations” of v and w. Apples still stay separate from oranges—the linear combination in Figure 1.3 is a new vector cv + dw. DEFINITION The sum of cv and dw is a linear combination of v and w. woes i]ea{i]- Lt) 4 Chapter 1 Figure 1.3. Introduction to Vectors Vector subtraction v — w (left). The linear combination 3v + 2w (right). This is the fundamental construction of linear algebra: multiply and add. The sum v+w is a special combination, when c = d = |. The multiple 2v is the particular case with c = 2 and d = 0. Soon you will be looking at all linear combinations of v and w—a whole family of vectors at once. It is this big view, going from two vectors to a “plane of vectors,” that makes the subject work. In the forward direction, a combination of v and w is supremely easy. We are given the multipliers c = 3 and d = 2, so we multiply. Then add 3v+2w. The serious problem is the opposite question, when c and d are “unknowns.” In that case we are only given the answer: cv + dw has components 8 and —1. We look for the right multipliers c and d. The two components give two equations in these two unknowns. When 100 unknowns multiply 100 vectors each with 100 components, the best way to find those unknowns is explained in Chapter 2. Vectors in Three Dimensions Each vector v with two components corresponds to a point in the xy plane. The components of v are the coordinates of the point: x = v; and y = v2. The arrow ends at this point (v1, v2), when it starts from (0,0). Now we allow vectors to have three components. The xy plane is replaced by three-dimensional space. Here are typical vectors (still column vectors but with three components): = l || 2 2 and w= 3 —1 and v+w= OM W — The vector v corresponds to an arrow in 3-space. Usually the arrow starts at the origin, where the xyz axes meet and the coordinates are (0, 0,0). The arrow ends at the point 1.1. Vectors and Linear Combinations 3 >| (1, —2, 2) (Gerd) 3 > ae: 1 5 Zz 2 | 2 ih TeX : Ta =i) Figure 1.4 —_-Vectors [| and [| correspond to points (x, y) and (x, y, z). with coordinates x = 1, y = 2, z = 2. There is a perfect match between the column vector and the arrow from the origin and the point where the arrow ends. Those are three ways to describe the same vector: 1 From now on v= | 2 is also written as |v =(1,2,2). 2 The reason for the column form (in brackets) is to fit next to a matrix. 
The reason for the row form (in parentheses) is to save space. This becomes essential for long vectors. To print (1, 2,2,4,4,6) in a column would waste the environment. Important note v = (1,2, 2) is not a row vector. The row vector [1 2 2] is abso- lutely different, even though it has the same three components. It is the “transpose” of v. A column vector can be printed horizontally (with commas and parentheses). Thus (1, 2,2) is in actuality a column vector. It is just temporarily lying down. In three dimensions, vector addition is still done a component at a time. The result v-+w has components v;+ wy, and v2+w2 and v3-+w3—maybe apples, oranges, and pears. You see already how to add vectors in 4 or 5 or n dimensions. This is now the end of linear algebra for groceries! The addition v + w is represented by arrows in space. When w starts at the end of v, the third side is v + w. parallelogram. Question: When w follows v, we get the other sides of a Do the four sides all lie in the same plane? sum v + w — v — w goes around the parallelogram to produce Yes. And the — A typical linear combination of three vectors in three dimensions is u+4v—2w: Linear combination | 0; 3 1 +4/2)]-2 l 2 3 — = I 2 9 We end with this question: What surface in 3-dimensional space do you get from all the linear combinations of u and v? The surface includes the line through w and the 6 Chapter 1 Introduction to Vectors line through v. It includes the zero vector (which is the combination Ou + Ov). The surface also includes the diagonal line through u + v—and every other combination cu + dv (not using w). This whole surface is a plane (unless u is parallel to v). Note on Computing Suppose the components of v are v(1), ..., u(N) and similarly for w. In a language like FORTRAN, the sum v + w requires a loop to add components separately: DO 10 iN 10 VPLUSW(I) = vil) + wil) MATLAB works directly with vectors and matrices. When v and w have been defined, v + w is immediately understood. It is printed unless the line ends in a semicolon. Input two specific vectors as rows—the prime ’ at the end changes them to columns. Then print v + w and another linear combination: C=(2 03 4 We ee ee 2*v—3x*«w The sum will print with u =. The unnamed combination prints with ans =: i= ans = 3 4 5 =» 1 5 REVIEW OF THE KEY IDEAS #8 1. A vector v in two-dimensional space has two components v; and v2. 2. Vectors are added and subtracted a component at a time. 3. The scalar product is cv = (cv1,cv2). A linear combination of v and w is cv + dw. 4. The linear combinations of two non-parallel vectors v and w fill a plane. Problem Set 1.1 Problems 1 1-9 are about addition of vectors and linear combinations. Draw the vectors v = [4] and w = [~3] and v + w and v — w in a single xy plane. 2 If v+w=[}] and v—w=[34], compute and draw v and w. 1.1. Vectors and Linear Combinations 7. From v = lea and w = [}], find the components of 3v + w and v — 3w and cu+dw. Compute u+v and u+v-+w and 2u+2v+w when 1 | 3 —3 | —2 2 tes —] Every combination of v = (1, —2,1) and w = (0,1, —1) has components add to ___—. Find c and d so that cv + dw = (4, 2, —6). that In the xy plane mark all nine of these linear combinations: [i] +a{7| With C10 eranGene) ele (a) The subtraction v — w goes forward along v and backward 1.3 also shows a second route to v — w. What is it? (b) If you look at all combinations you see? on w. Figure of v and w, what “surface of vectors” do The parallelogram in Figure 1.2 has diagonal v + w. What is its other diagonal? 
What is the sum of the two diagonals? Draw that vector sum. If three corners of a parallelogram are (1, 1), (4,2), and (1,3), what are all the possible fourth corners? Draw two of them. Problems 10-13 involve the length of vectors. 10 = ‘The parallelogram with sides v = (4, 2) and Compute (length of v)? as vt =f ve: w = (—1, 2) is a rectangle (Figure 1.2). Check the Pythagoras formula a? + b* = c* which is for right triangles only: (length of v)* + (length of w)? = (length of v + w)’. 11 In this 90° case, a2 + b? = c? also works for v — w. In Figure 1.2, check that (length of v) + (length of w)? = (length of v — w)’. Give an example of v and w (not at right angles) for which this formula fails. 12 To emphasize that right triangles are special, construct v and w without a 90° angle. Compare (length of v)? + (length of w)* with (length of v + w)?. 13 In Figure 1.2 check that (length of v) + (length of w) is larger than (length of v + w). This “triangle inequality” is true for every triangle, except the absolutely thin triangle when v and w are . Notice that these lengths are not squared. 8 = Chapter 1 Introduction to Vectors 4:00 (1, 0, 0) Problems 14-18 are about special vectors on cubes and clocks. 14 Copy the cube and draw the vector sum of i = (1,0,0) and j = (0,1, 0) and k = (0,0, 1). The addition i + j yields the diagonal of : 15 Three edges of the unit cube are i, j,k. Three corners are (0,0,0), (1,0, 0), (0, 1,0). What are the other five corners and the coordinates of the center point? The center points of the six faces are 16 17 How many corners does a cube have in 4 dimensions? A typical corner is (0, 0, 1, 0). (a) What is the sum V of the twelve vectors that go from the center of a clock 1:00, 2:00, ..., 12:00? (b) If the vector to 4:00 is removed, find the sum of the eleven remaining vectors. (c) Suppose the 1:00 vector is cut in half. Add it to the other eleven vectors. Suppose the twelve vectors start from (0, —1) at the bottom of the clock instead of (0,0) at the center. The vector to 6:00 is zero doubled to (27). Add the new twelve vectors. Problems 19 How edges? to the hours 18 How many faces? many 19-22 go further with linear combinations and the vector to 12:00 is of v and w (see Figure). The figure shows u = sv ae Sw. Mark the points iv + iw and iv + iw and v+ w. 20 Mark the point —v + 2w and one other combination cv + dw with c+d Draw the line of all combinations that have c+ d = 1. = 1. 21 +Locate iv + tw and Ev + Sw. The combinations cv + cw fill out what line? Restricted by c > 0 those combinations with c = d fill what ray? 22 (a) Mark 5v+w and v+5w. Restricted by 0 <c <1 and 0 <d <1, shade in all combinations cv + dw. 1.1. Problems 19—22 9 Problems 23-26 Problems 19-22 in a plane (b) Vectors and Linear Combinations Problems 23-27 in 3-dimensional space Restricted only by 0 < c and 0 < d draw the “cone” cu+dw. of all combinations Problems 23-27 deal with u,v, w in three-dimensional 23 (a) Locate (b) Challenge problem: Under what restrictions on c,d, e, will the collection of points cu + dv + ew fill in the plane containing uw, v, w? 24 25 26 space (see Figure). iu + iv + tw and ly + sw in the figure. The three sides of the dotted triangle are v—u and w—v iS . Draw the head-to-tail (—1, 1) plus (—2, —2). around addition and u—w. Their sum a plane triangle of (3, 1) plus Shade in the pyramid of combinations cu+dv+ew with c > 0, d > 0, e > 0 and c+d+e <1. Mark the vector $(u Se eae w) as inside or outside this pyramid. 
If you look at all combinations of u, v, and w, is there any vector that can’t be produced from cu + dv + ew? DO Which vectors are in the plane of uw and v, and also in the plane of v and w? 28 (a) Draw vectors u, v, w so that their combinations line. (b) Draw vectors u, v, w so that their combinations cu + dv + ew fill only a plane. 29. What combination of the vectors cu + dv + ew B and H produces fill only a al? Express this question as two equations for the coefficients c and d in the linear combination. 10° = Chapter 1 Introduction to Vectors LENGTHS AND DOT PRODUCTS a 1.2 The first section mentioned multiplication of vectors, but it backed off. Now we go forward to define the “dot product” of v and w. This multiplication involves the separate products vjw, and v2w2, but it doesn’t stop there. Those two numbers are added to produce the single number v- w. DEFINITION is the number The dot product or inner product of v = (vj, v2) and w = (wj, w2) v-W=VUVW{ + 122. (1) i}f-aes Example 1 The vectors v = (4,2) and w = (—1, 2) have a zero dot product: In mathematics, zero is always a special number. For dot products, it means that these two vectors are perpendicular. The angle between them is 90°. When we drew them in Figure 1.2, we saw a rectangle (not just any parallelogram). The clearest example of perpendicular vectors is i = (1,0) along the x axis and jy = (0,1) up the y axis. Again the dot product is i- 7 =0+0=0. Those vectors i and j form a right angle. The vectors v = (1,2) and w = (2, 1) are not perpendicular. Their dot product is 4. Soon that will reveal the angle between them (not 90°). Example 2 Put a weight of 4 at the point x = —1 and a weight of 2 at the point x = 2. If the x axis is a see-saw, it will balance on the center point weights balance because the dot product is (4)(—1) + (2)(2) = 0. x = 0. The This example is typical of engineering and science. The vector of weights is (w1, wW2) = (4, 2). The vector of distances from the center is (v;, v2) = (—1, 2). The force times the distance, w; times vj, gives the “moment” of the first weight. The equation for the see-saw to balance is w,vj + w2v2 = 0. The dot product w-+v equals v-w. The order of v and w makes no difference. Example 3 Dot products enter in economics and business. We have five products to buy and sell. Their prices are (p1, p2, p3, p4, ps) for each unit—this is the “price vector’p. The quantities we buy or sell are (q1, g2, 93, 94, 95) positive when we sell, negative when we buy. Selling g; units of the first product at the price p; brings in qipi. The total income is the dot product q - p: Income= (q), q2,---,95)* (Pi, P2,---, Ps) = Gipi + g2p2+--:+495ps. 1.2. Lengths and Dot Products 11 A zero dot product means that “the books balance.” Total sales equal total purchases if q:-p =0. Then p is perpendicular to g (in five-dimensional space). With five products, the vectors are five-dimensional. This is how linear algebra goes quickly into high dimensions. Small note: Spreadsheets have become essential computer software in management. What does a spreadsheet actually do? It computes linear combinations and dot products. What you see on the screen is a matrix. Main point To compute the dot product, multiply each v; times w;. Then add. Lengths and Unit Vectors An important case is the dot product of a vector with itself. In this case the vector is v = (1, 2,3), the dot product with itself is 1 v-v = v = w. 
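As a small numerical check of that income formula, here is a MATLAB sketch. The prices and quantities below are invented for illustration; positive quantities are sales and negative quantities are purchases:

        p = [3; 5; 2; 4; 1];      % price vector: price per unit for five products
        q = [2; -1; 4; 0; -3];    % quantities: sell 2 of product 1, buy 1 of product 2, ...
        income = q' * p           % the dot product q . p = 6 - 5 + 8 + 0 - 3 = 6

The prime ' turns the column q into a row, so q' * p is exactly the sum q1p1 + q2p2 + ... + q5p5.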
When 14: 1 Po)= 1 2 |) o |] 2 || = 3 3 se ah ae) — ale The answer is not zero because v is not perpendicular to itself. Instead of a 90° angle we have 0°. In this special case, the dot product v-v gives the length squared. DEFINITION The length (or norm) of a vector v is the square root of v- v: length = ||v|| = J/v-v. In two dimensions the length is Vur a6 U5. In three dimensions it is Vue aS v3 + US. By the calculation above, the length of v = (1, 2, 3) is |/v|] = “/ 14: We can explain this definition. ||v|| is just the ordinary length of the arrow that represents the vector. In two dimensions, the arrow is in a plane. If the components are | and 2, the arrow is the third side of a right triangle (Figure 1.5). The formula a” +b? = c*, which connects the three sides, is 17 + 2? = ||v|l. For the length of v = (1, 2,3), we used the right triangle formula twice. First, the vector in the base has components 1, 2,0 and length /5. This base vector is perpendicular to the vector (0, 0,3) that goes straight up. So the diagonal of the box has length ||v|] = /5+9= /14. The length of a four-dimensional vector would be Vv? + 0; =f v3 + om Thus (1,1, 1,1) has length Ji? + 124+ 12412 = 2. This is the diagonal through a unit cube in four-dimensional space. The diagonal in n dimensions has length \/n. The word “unit” is always indicating that some measurement equals “one.” The unit price is the price for one item. A unit cube has sides of length one. A unit circle is a circle with radius one. Now we define the idea of a “unit vector.” 12. Chapter 1 DEFINITION Introduction to Vectors A unit vector u is a vector whose length equals one. Then u-u = 1. An example in four dimensions is u = G. , 5, 5). Then u-u is it . AF i AF j =f, We divided the vector v = (1, 1, 1, 1) by its length ||v|| = 2 to get the unit vector. Example 4 The standard unit vectors along the x and y axes are written i and j. In the xy plane, the unit vector that makes an angle “theta” with the x axis is (cos @, sin @): , 1 =|, When and ? 0 rf) 6 = 0, the vector u is the same and as i. When cos0 Baal 0 = 90° (or a radians), uw is the same as J. At any angle, the components cos@ and sin@ produce u -u = 1 because cos? @+sin* 9 = 1. These vectors reach out to the unit circle in Figure 1.6. Thus cos6 and sin@ are simply the coordinates of that point at angle 6 on the unit circle. In three dimensions, the unit vectors along the axes are i, J, and k. Their components are (1,0,0) and (0,1,0) and (0,0, 1). Notice how every three-dimensional vector is a linear combination of i, j, and k. The vector v = (2,2, 1) is equal to 2i+2j +k. Its length is “22 + 2? + 12. This is the square root of 9, so ||v|| = 3. Since (2,2, 1) has length 3, the vector (, z, 5) has length 1. To create a unit vector, just divide v by its length ||v]]. 1A Unit vectors Divide any nonzero vector v by its length. is a unit vector in the same direction as v. Then u = v/|\v|| Lage (1, 2, 3) has length ¥14 | | | | | | (0, 2, 0) N (hy O.0) ye Figure 1.5 Sl ~ (1,2,0) has lengthV5 The length \/v- v of two-dimensional and three-dimensional vectors. 1.2. 7 =, 1) Lengths and Dot Products 13 i+7=(,1) i—( ie Ly i = (1,0) ay Figure 1.6 The coordinate vectors i and j. The unit vector u at angle 45° (left) and the unit vector (cos 6, sin@) at angle 6. In three dimensions we found u = (3, 5, 4). Check that w-u = i,+ 5 + 4 =wilh Then uw reaches out to the “unit sphere” centered at the origin. Unit vectors correspond to points on the sphere of radius one. 
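These length computations are easy to reproduce in MATLAB, where norm(v) returns the length ||v||. This is only a quick sketch with one particular vector:

        v = [1; 2; 3];
        lengthv = norm(v)     % the length sqrt(1 + 4 + 9) = sqrt(14), same as sqrt(v'*v)
        u = v / norm(v)       % divide by the length: the unit vector in the direction of v
        norm(u)               % check that u has length 1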
The Angle Between Two Vectors We stated that perpendicular vectors have v-w = 0. The dot product is zero when the angle is 90°. To give a reason, we have to connect right angles to dot products. Then we show how v- w finds the angle between any two nonzero vectors. 1B Right angles The dot product v-w is zero when v is perpendicular to w. Proof When v and w are perpendicular, they form two sides of a right triangle. The third side (the hypotenuse going across in Figure 1.7) is v—w. So the law a7+ b? = c? for their lengths becomes II v||7 + || w ||? = ||\v— w|l- (for perpendicular vectors only). (2) Writing out the formulas for those lengths in two dimensions, this equation is ie,+ v5 + wr + ws = (vj — w1)° + (v2 — wy). (3) The right side begins with v? —2vjwy + Da. Then ve and wi are on both sides of the 5 . equation and they cancel. —2v2w2. All terms cancel would also be —2v3w3.) . 9) . 4) 2, Similarly (v2 — w2)° contains vy and w 5 and the cross term except —2v,;w,; and —2v2w2. (In three dimensions there The last step is to divide by —2: 0 = —2v,;w) —2v2w2 which leads to vijw, + v2wW2 = 0. (4) 14 = Chapter 1 Introduction to Vectors angle below 90° in this half ieee 2 V5 a v= a V20 + +} oe v-w>0 WwW ++} =e; v-W< angle above 90° in this half Mi Figure 1.7 ; 2 Perpendicular vectors have v-w = 0. The angle is below 90° when v-w > 0. Conclusion Right angles produce v - w = 0. We have proved Theorem 1B. The dot product is zero when the angle is 6 = 90°. Then cos@ = 0. The zero vector v = 0 is perpendicular to every vector w because 0- w is always zero. Now suppose v-w is not zero. It may be positive, it may be negative. The sign of v+ w immediately tells whether we are below or above a right angle. The angle is less than 90° when v- w is positive. The angle is above 90° when v - w is negative. Figure 1.7 shows a typical vector w = (1,3) in the white half-plane, with v- w > 0. The vector W = (—2,(0) in the screened half-plane has v- W = —8. The borderline is where vectors are perpendicular to v. between plus and minus, the dot product is zero. On that dividing line The next page takes one more step with the geometry of dot products. We find the exact angle 6. This is not necessary for linear algebra—you could stop here! Once we have matrices and linear equations, we won’t come back to 6. But while we are on the subject of angles, this is the place for the formula. It will show that the angle 6 in Figure 1.7 is exactly 45°, between v = (4, 2) and w = (1, 3). Start with unit vectors u and U. The sign of uw -U tells whether 6 < 90° or 9 > 90°. Because the vectors have length 1, we learn more than that. The dot product u-U is the cosine of 0. This is true in any member of dimensions! 1C (a) If wu and U are unit vectors then u-U =cosé. (b) If uw and U are unit vectors then |u-U| <1. Statement (b) follows directly from (a). Remember that cos@ is never greater than 1. It is never less than —1. The dot product of unit vectors is between —1 and 1. Figure 1.8 shows this clearly when the vectors are u = (cos6@,sin@) and i = (1,0). The dot product is u-i =cos@. That is the cosine of the angle between them. 1.2. Figure 1.8 Lengths and Dot Products 15 The dot product of unit vectors is the cosine of the angle 0. After rotation through any angle a, these are still unit vectors. The angle between them is still 6. The vectors are u = (cosa,sina) and U = (cos f,sinf). Their dot product is cosacos B+ sina sin B. From trigonometry this is the same as cos(f — @). 
Since 6 —a@ equals 6 we have reached the formula u-U =cos@. Problem 24 proves the inequality |w-U |< 1 directly, without mentioning angles. Problem 22 applies another formula from trigonometry, the “Law of Cosines.” The inequality and the cosine formula u-U = cos@ are true in any number of dimensions. The dot product does not change when vectors are rotated, since 6 stays the same. What if v and w are not unit vectors)? Their dot product is not generally cos 6. We have to divide by their lengths to get unit vectors u = v/||v|| and U = w/||wl. Then the dot product of those rescaled vectors u and U gives cos@. Whatever the angle, this dot product of v/||v|| with w/||w|| never exceeds one. That is the “Schwarz inequality” for dot products—or more correctly the Cauchy-SchwarzBuniakowsky inequality. It was found in France and Germany and Russia (and maybe elsewhere—it is the most important inequality in mathematics). With the extra factor ||v|| ||w|| from rescaling to unit vectors, we have cos @: v-w 1D (a) COSINE FORMULA If v and w are nonzero vectors then Piel = cosd § v|| ||w (b) SCHWARZ INEQUALITY If v and w are any vectors then |v + w| < ||v|j ||w|] . Example 5 Find the angle between v = ('4'] and w = [4] in Figure 1.7b. Solution The dot product is v- w =4+6= 10. The length of v is ||v|| = 20. The length of w is ||w|| = /10. Therefore the cosine of the angle is cos (idee 0= ae lyiiiwl = = 20/10 = V2 : Then en @ a = 45° a The angle that has this cosine is 45°. It is below 90° because v - w = 10 is positive. By the Schwarz inequality, ||v|| ||w|| = /200 is larger than v + w = 10. 16 == Chapter 1 Introduction to Vectors Example 6 The dot product of v = (a,b) and w = (b,a) is 2ab. The Schwarz inequality says that 2ab < a? +b*. For example 2(3)(4) = 24 is less than Bed? = 25, Reason The lengths are ||v|| = ||w|| |v|||}w||= a* +b. = /a2 +62. Then v- w = 2ab never exceeds The difference between a? + b* and 2ab can never be negative: a’ + b* —2ab = (a — b)* > 0. This is more famous if we write x = a? and y = b?. Then the “geometric mean” ./xy is not larger than the “arithmetic mean,’ which is the average of x and y: < s . “+4 be becomes /xy < Xcte 5 _ Computing dot products and lengths and angles It is time for a moment of truth. The dot product v - w is usually seen as a row times a column: 1 Instead of In FORTRAN DO WO 10 VDOTW 3 B : iH we more often see 3 pl 2 | Be we multiply components and add (using a loop): = aN = VDOTW + Vil) * Wil) MATLAB works with whole vectors, vectors then v’ is a row as above: not their components. dot The length of v is already known If v and =v’ xw to MATLAB as norm(v). We could define it our- selves as sqrt (v’ xv), using the square root function—also known. angle (in radians) we do have to define ourselves: cos w are column The cosine and the =v’ * w/(norm(v) * norm(w)); angle = acos(cos) We used the arc cosine (acos) function to find the angle from its cosine. We have not created a new function cos(v, w) for future use. Chapter 2 will show its format. this book. That would become an M-file, and (Quite a few M-files have been created especially for They are listed at the end.) The instructions above will cause the numbers dot and angle to be printed. The cosine will not be printed because of the semicolon. 1.2 = REVIEW OF THE KEY IDEAS The dot product v-w 17 = multiplies each component v; by w; and adds. The length ||v|| is the square root of vThe vector v/||v|| is a unit vector. ee = Lengths and Dot Products v. Its length is 1. 
The dot product is v- w =O when v and w are perpendicular. The cosine of 6 (the angle between v and w) never exceeds v-w A= and therefore I| || ||w || |: |v- w]| < ||v|| ||w]. Problem Set 1.2 Calculate the dot products u-v and u-w Bae Heda a v= and v-w is) ly, and w-v: Pe WS i. Nol: Compute the lengths ||z|| and ||v|| and ||w|| of those vectors. inequalities |u+v| < ||u|| ||v|| and |v- w| < |/v|| ||w|]. Write down unit vectors in the directions Check the Schwarz of v and w in Problem 1. Find the cosine of the angle 6 between them. Find unit vectors u; and uw2 in the directions of v = (3,1) and w = (2, 1,2). Also find unit vectors U; and U> that are perpendicular to v and w. For any unit vectors v and w, show that the angle is 0° or 90° or 180° between (a) v and v (b) w and —w (c) v-+w and v—w. Find the angle @ (from its cosine) between \ @aao— Ra \ and w= ia 2 (©) at — (c) v=| 5] and w=| 751 (a) =| 2 8 and w=|]|-l1 and Heir (a) Describe every vector (w , w2) that is perpendicular to v = (2, —1). (b) In words, describe all vectors that are perpendicular to V = (1, 1, 1). True or false (give a reason if true or a counterexample if false): 18 Chapter 1 Introduction to Vectors (a) If w is perpendicular (in three dimensions) to v and w, then v and w are parallel. (b) If u is perpendicular to v and w, then w is perpendicular to v + 2w. (c) There is always a combination v + cu that is perpendicular to w. The slopes of the arrows from (0,0) to (vj, v2) and (w1, w2) are v2/v, and w2/wy,. If the product of those slopes is v2w2/v1;w, = —1, show that v- w = 0 and the vectors are perpendicular. 10 Draw arrows from (0,0) to the points v = (1,2) and w = (—2, 1). Write down the two slopes and multiply them. That answer is a signal that arrows are v- w = O and the 11 If v-w is negative, what does this say about the angle between v and w? Draw a 3-dimensional vector v (an arrow), and show where to find all w’s with v-w < 0. 12 With v = (1, 1) and w = (1,5) choose a number c so that w—cv is perpendicular to v. Then find the formula that gives this number c for any v and w. 13 Find two vectors v and w that are perpendicular to (1, 1,1) and to each other. 14 Find three vectors u, v, w that are perpendicular to (1, 1, 1, 1) and to each other. 15 The geometric mean of x = 2 and y = 8 is ./xy = 4. The arithmetic mean is larger: 5(x +y)= . This came in Example 6 from the Schwarz inequality Omi 16 How (Umm /8) and w = (V8, ok Find cos@ for this v and w. long is the vector v = (1, 1,..., 1) in 9 dimensions? Find a unit vector u in the same direction as v and a vector w that is perpendicular to v. 17 What are the cosines of the angles a, 8, 0 between the vector (1, 0, —1) and the unit vectors i, j, k along the axes? Check the formula cos” w + cos” B + cos? 6 = 1. Problems 18—24 lead to the main facts about lengths and angles. Never again will we have several proofs in a row. 18 (Rules for dot products) These equations are simple but useful: Q)v-w=w-v (2)u-(v+w)=u-v+u-w (3) (cv)-w=c(v-w) Use Rules (1) and (2) with u = v+w to prove that |v-+w||? = v-v+2v-w+wew. 19 The triangle inequality is (length of v + w) < (length of v) + (length of w). Problem 18 found ||v+ wl? = |jv||* -+2v-w-+ |w||*. Use the Schwarz inequality on v+ w to prove lv + wll? < (lv|] + jw)? 20 or |v + wll < |lvl| + lw]. A right triangle in three dimensions still obeys ||v||* + ||w||? = ||v + w||?.. Show how this leads in Problem 18 to vjw, + v2w2 + v3w3 = 0. 
1.2 Lengths and Dot Products 19 W = (Wj, W>) aN ar Ni 21 The figure shows that cosa = vj,/||v|| and sina = v2/|\v||. Similarly cosB is and sinfB is ____. ~The angle 6 is 8 —a@. Substitute into the formula for cos(B — a) to find cos@ = cos Bcosa@ + sinf sina = v- w/||v|| ||w]. 22 With v and w at angle 6, the Law of Cosines gives the length of the third side: |v — wll? = [vl]? — 2[]v|| lwil cos6+ [wll Compare with (v—w)-(v—w)= 23 _____ to find again the formula for cos @. The Schwarz inequality by algebra instead of trigonometry: (a) (b) Multiply out both sides of (vj w) + v2w2)? < (v7 + v5)(w7 + w3). Show that the difference between those sides equals (v;w2 — v2w )*. This cannot be negative since it is a square—so the inequality is true. 24 One-line proof of the Schwarz inequality |u -U| < 1: If (uy, uz) and (U;, U2) are unit vectors, pick out the step that uses Example 6: pals lial (Ui | ls [Ua us2 + U 2 us2 + Us2 1+1 Put (u1,U2) = (.6, .8) and (U;, U2) = (.8, .6) in that whole line and find cos @. 25 Why is |cos@| never greater than | in the first place? 26 Pick any numbers that add to x ++ y+z=0. Find the angle between your vector v = Challenge (x,y,z) and the vector w = (z,x, y). Suppose ||v|| = 5 and ||w|| = values of ||v — w|| and v- w? 3. What question: Explain why v-+w/|lv||||wi| is always —5. 27 are the smallest and largest possible ~~ So? 64 9 GaAs tp — 1 "6 ne (4) _ 9 te Ghee PLD awe ie en of > i eT SEE on fe Fas. ww a Che Gere he owing: te” Une NeHeale enw, > Sif ia aes th a i sk 1 A a oil Nap eae Alpi ety a hake Oe, st Wd delone tig lanier a hahaai Ginn weds orig segs ._iW <2 : Ly are! =e} cian, vied - 0S eas vhs 5a sae — » fdinae ine is ie 9 er Moral) VilehiGs Ffomntsnumts lee 16 mnrr Ny, pene 2 tom Dane a sea 1D Wotbeader lanier a eormpinnew yenaed ae Bane SpringDPyi dj alsesa qed oe s 7 14 cml = ‘> : ez ula: cad Salad eg > Se Hai @ fi A dynatiber sition» at ee Siliinstaas hte 7 oe heme : “f * Hee Ge GA Pri’ Gare legen Oye ae ve ark im - pp af toull tui a i ees sonnets les «jn os aa hoe et cag MS Tianty Anaad! “pee aah bel> ieeeal e Riu ule Vini@ud: aims Oj attend re iv Se: empress oye? sae atheiy wtp al law wha mnge a elo Pn i AL Sits Vee pee tiwut@& = aie @ ig ro ¥* en = aes ee —Ce ¢ ; SOLVING LINEAR EQUATIONS VECTORS AND LINEAR EQUATIONS ag 2.1 The central problem of linear algebra is to solve a system of equations. Those equations are linear, which means that the unknowns are only multiplied by numbers—we never see x times y. Our first example of a linear system is certainly not big. It has two equations in two unknowns. But you will see how far it leads: Gyo Bhs Sere PA Se eI (1) One way to approach these equations is a row at a time. The first row is x —2y = l. That equation produces a straight line in the xy plane. The point x = 1, y = 0 is on the line because it solves the equation. The point x = 3, y = 1 is also on the line because 3 —2 = 1. If we choose x = 101 we find y = 50. The slope of this particular line is 5. But slopes are important in calculus and this is linear algebra! ag 3x 2y= 11 x-2y=1 Figure 2.1. Row picture: The point where the lines meet is the solution. Figure 2.1 shows that line x — 2y = 1. The other line in the figure comes from the second equation 3x -+2y = 11. You can’t miss the intersection point where the two lines meet. The point x = 3, y = 1 lies on both lines. That point solves both equations at once. This is the solution to our system of linear equations. 
We call Figure 2.1 the "row picture" of the two equations in two unknowns:

R  The row picture shows two lines meeting at a single point.

Turn now to the column picture. I want to recognize the linear system as a "vector equation". Instead of numbers we need to see vectors. If you separate the original system into its columns instead of its rows, you get

        x [ 1 ] + y [ -2 ] = [  1 ].
          [ 3 ]     [  2 ]   [ 11 ]

This has two column vectors on the left side. The problem is to find the combination of those vectors that equals the vector on the right. We are multiplying the first column vector by x and the second column vector by y, and adding. With the right choices x = 3 and y = 1, this linear combination produces the vector with components 1 and 11.

C  The column picture combines the columns on the left side to produce the vector on the right side.

Figure 2.2  Column picture: A combination of columns produces the right side (1, 11).

Figure 2.2 is the "column picture" of the two equations in two unknowns. The first part shows the two separate columns (1, 3) and (−2, 2). It also shows that first column multiplied by 3. This multiplication by a scalar (a number) is one of the two basic operations in linear algebra:

        Scalar multiplication     3 [ 1 ] = [ 3 ].
                                    [ 3 ]   [ 9 ]

If the components of a vector v are v1 and v2, then the components of cv are cv1 and cv2. Notice boldface for vectors and lightface for scalars.

The other basic operation is vector addition. Algebraically we are adding the first components and second components separately:

        Vector addition     [ 3 ] + [ -2 ] = [  1 ].
                            [ 9 ]   [  2 ]   [ 11 ]

The graph in Figure 2.2 shows a parallelogram. Two vectors are placed "head to tail". Their sum is a diagonal of the parallelogram:

        The sides are [ 3 ] and [ -2 ].  The diagonal sum is [  1 ].
                      [ 9 ]     [  2 ]                       [ 11 ]

We have multiplied the original columns by x = 3 and y = 1, and combined them to produce the vector on the right side of the linear equations.

To repeat: The left side of the vector equation is a linear combination of the columns. The problem is to find the right coefficients x = 3 and y = 1. We are combining scalar multiplication and vector addition into one step. That step is crucially important, because it contains both of the basic operations:

        Linear combination     3 [ 1 ] + 1 [ -2 ] = [  1 ].
                                 [ 3 ]     [  2 ]   [ 11 ]

Of course the solution x = 3, y = 1 is the same as in the row picture. I don't know which picture you prefer! I suspect that the two intersecting lines are more familiar at first. You may like the row picture better, but only for one day. My own preference is to combine column vectors. It is a lot easier to see a combination of four vectors in four-dimensional space, than to visualize how four hyperplanes in that space might possibly meet at a point.

The coefficient matrix on the left side of the equations is the 2 by 2 matrix A:

        Coefficient matrix     A = [ 1  -2 ].
                                   [ 3   2 ]

This is very typical of linear algebra, to look at a matrix by rows and by columns. Its rows give the row picture and its columns give the column picture. Same numbers, different pictures, same equations.

We can write those equations as a matrix problem Ax = b:

        Matrix equation     [ 1  -2 ] [ x ] = [  1 ].
                            [ 3   2 ] [ y ]   [ 11 ]

The row picture multiplies on the left side by rows. The column picture combines the columns. The numbers x = 3 and y = 1 go into the solution vector x. Then Ax = b is

        [ 1  -2 ] [ 3 ] = [  1 ].
        [ 3   2 ] [ 1 ]   [ 11 ]

The first row is 1 · 3 − 2 · 1 and this equals 1. The second row is 3 · 3 + 2 · 1 and this agrees with 11. The column picture takes 3 (column 1) + 1 (column 2). It produces the column vector on the right side.
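Both multiplications are easy to carry out in MATLAB. This sketch repeats the check for the same small system, once by rows and once by columns; the colon notation A(1,:) picks out row 1 of A (it reappears in the computing discussion later in this section):

        A = [1 -2; 3 2];  x = [3; 1];
        A(1,:)*x, A(2,:)*x               % dot products of the rows with x: 1 and 11
        A(:,1)*x(1) + A(:,2)*x(2)        % combination of the columns: 3*(1,3) + 1*(-2,2) = (1,11)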
Either way, this very small system is now solved!

Three Equations in Three Unknowns

Let the unknowns be x, y, z, and let the linear equations be

         x + 2y + 3z = 6
        2x + 5y + 2z = 4                                        (3)
        6x − 3y +  z = 2.

We look for numbers x, y, z that solve all three equations at once. Those desired numbers might or might not exist. For this system, they do exist. When the number of unknowns matches the number of equations, there is usually one solution. Before solving the problem, we try to visualize it both ways:

R  The row picture shows three planes meeting at a single point.

C  The column picture combines three columns to produce the fourth column.

In the row picture, each equation is a plane in three-dimensional space. The first plane comes from the first equation x + 2y + 3z = 6. That plane crosses the x and y and z axes at the points (6, 0, 0) and (0, 3, 0) and (0, 0, 2). Those three points solve the equation and they determine the whole plane. But the vector (x, y, z) = (0, 0, 0) does not solve x + 2y + 3z = 6. Therefore the plane in Figure 2.3 does not go through the origin.

Figure 2.3  Row picture of three equations: Three planes meet at a point. The line L is on the first two planes; L meets the third plane at the solution.

For a plane to go through (0, 0, 0), the right side of the equation must be zero. The plane x + 2y + 3z = 0 does pass through the origin, and it is parallel to the original plane. When the right side increases to 6, the plane moves away from the origin. Later we will recognize that the vector (1, 2, 3) is perpendicular to these parallel planes x + 2y + 3z = c.

The second plane is given by the second equation 2x + 5y + 2z = 4. It intersects the first plane in a line L. The usual result of two equations in three unknowns is a line L of solutions. The third equation gives a third plane. It cuts the line L at a single point. That point lies on all three planes and solves all three equations. It is harder to draw this triple intersection point than to imagine it. The three planes meet at the solution (which we haven't found yet).

The column picture starts with the vector form of the equations:

        x [ 1 ] + y [  2 ] + z [ 3 ] = [ 6 ].                   (4)
          [ 2 ]     [  5 ]     [ 2 ]   [ 4 ]
          [ 6 ]     [ -3 ]     [ 1 ]   [ 2 ]

The unknown numbers x, y, z are the coefficients in this linear combination. We want to multiply the three column vectors by the correct numbers x, y, z to produce b = (6, 4, 2).

Figure 2.4  Column picture: The solution is (x, y, z) = (0, 0, 2) because b = (6, 4, 2) equals 2 times column 3.

Figure 2.4 shows this column picture. Linear combinations of those columns can produce all possible right sides! The combination that produces b = (6, 4, 2) is just 2 times the third column. The coefficients we need are x = 0, y = 0, and z = 2. This is also the intersection point of the three planes in the row picture. It is the solution to the linear system:

        0 [ 1 ] + 0 [  2 ] + 2 [ 3 ] = [ 6 ].
          [ 2 ]     [  5 ]     [ 2 ]   [ 4 ]
          [ 6 ]     [ -3 ]     [ 1 ]   [ 2 ]

The Matrix Form of the Equations

We have three rows in the row picture and three columns in the column picture (plus the right side). The three rows and columns contain nine numbers. These nine numbers fill a 3 by 3 matrix. The "coefficient matrix" has the rows and columns that have so far been kept separate:

        The coefficient matrix is     A = [ 1   2   3 ].
                                          [ 2   5   2 ]
                                          [ 6  -3   1 ]

The capital letter A stands for all nine coefficients (in this square array). The letter b denotes the column vector with components 6, 4, 2. The unknown x is also a column vector, with components x, y, z.
(We use boldface because it is a vector, x because it is unknown.) By rows the equations were (3), by columns they were (4), and now by matrices they are (5). The shorthand is Ax = b: 1 D 6° Matrix equation: De 3 ee —3 1 x 6 Sia z 2 e (5) We multiply the matrix A times the unknown vector x to get the right side b. Basic question: What does it mean to “multiply A times x”? We can do it by rows or we can do it by columns. Either way, Ax = b must be a correct representation of the three equations. So there are two ways to multiply the matrix A times the vector x: Multiplication by rows Ax comes from dot products, a row times a column: Ax=| Multiplication by columns Ax =x (row 1) * x Crow Z) «x ( row 3) * x |. (6) Ax is a linear combination of column vectors: (column 1) + y (column 2) +z (column 3). (7) However you do it, all our equations show A times x. When we substitute the solution x = (0,0, 2), the multiplication Ax produces b: 1 2, 6 Zs PS 2 =—3' 1 0 Oo 2 ;=] 6 4 2 2.1 Vectors and Linear Equations 27 In this case, multiplication by columns is simply 2 times column 3. The right side b = (6,4, 2) is proportional to the third column (3, 2, 1). This book sees Ax as a combination of the columns of A. Compute by rows, but please hold on to multiplication by columns. Example 1 Here are matrices A and J with three ones and six zeros: Axia Oma LagO eG. OO 4 Be 6 4 ein | te 4 [Oe OR ia @ OF One Lo 4 + Saha 6 6 In the first example Ax is (4,4,4). If you are a row person, the product of every row (1, 0,0) with (4,5, 6) is 4. If you are a column person, the linear combination is 4 times the first column (1,1, 1). In that matrix A, the second and third columns are zero vectors. The example with /x deserves a careful look, because the matrix / is special. It has ones on the “main diagonal”. Off that diagonal, all the entries are zeros. Whatever vector this matrix multiplies, that vector is not changed. This is like multiplication by 1, but for matrices and vectors. The exceptional matrix in this example is the 3 by 3 identity matrix: —~ I always yields the multiplication Jx =x. a oor oro Matrix Notation The first row of a 2 by 2 matrix contains aj; and aj. The second row contains a2, and a2. The first index gives the row number, so that a;; is an entry in row 7. The second index j gives the column number. But those subscripts are not convenient on a keyboard! Instead of aj; it is easier to type A(i, 7). The entry aj2 = A(1, 2) is in row |, column 2: OM aie? a=[ i ai a 4 VG AC dealee ara2) el For an m by n matrix, the row index 7 goes from 1 to m. The column index j stops at n. There are mn entries in the matrix. Multiplication in MATLAB I want to express A and x and their product Ax using MATLAB commands. This is a first step in learning that language. I begin by defining the matrix A and the vector x. This vector is a 3 by | matrix, with three rows and one column. Enter matrices a row 28 Chapter 2. Solving Linear Equations at a time, and use a semicolon to signal the end of a row: As ll 2 35 be? Se ee One oe sf = |O2 Os 2) Here are three ways to multiply Ax in MATLAB: Matrix multiplication b= Axx In reality, that is the way to do it. MATLAB is a high level language, and matrices are the objects it works with. The symbol x indicates multiplication. We can also pick out the first row of A (as a smaller matrix!). The notation for that submatrix is A(1,:), which keeps row | and all columns. 
Multiplying this row by the column vector x gives the first component b; = 6: Row ata time b=[A(l,:)*x; A(Q2,:)*x; A(G,:)*x] Those are dot products, row times column. The other way to multiply uses the columns of A. The first column is the 3 by 1 submatrix A(:, 1). Now the colon symbol : is keeping all rows of column 1. This column multiplies x(1) and the other columns multiply x(2) and x(3): Column at atime b= A(:,1)*x(1)+A(:, 2) *x(2) + A(:,3) * x(3) I think that matrices are stored by columns. Then multiplying a column at a time will be a little faster. So A * x is actually executed by columns. You can see the same choice in a FORTRAN-type structure, which operates on single entries of A and x. This lower level language needs an outer loop and an inner loop. When the outer loop (in FORTRAN or MATLAB) indicates the row number i, the multiplication is a row at a time: FORTRAN by rows MATLAB by rows DOMOS FEATes fone Sts3 DORLOV 4 153 foray = be 3 10 BU) =BU)+ Ad, J) # XV) b@) =b@) + AG, J) ¥ x) Notice that MATLAB is sensitive to upper case versus lower case (capital letters and small letters). If the matrix is A then its entries are A(i, j) not a(i, j). When the outer loop indicates the column number j, the multiplication is executed a column at a time. The inner loop goes down the column: FORTRAN by columns DO 20) Gi: = 3 DO 20 tit i53 20 BU) =BU)+ACU, J) * XV) MATLAB by columns foray; =i for t= 13 b(i) = bi) + AG, j) *xQ) My hope is that you will prefer the higher level A * x. FORTRAN will not appear again in this book. Maple and Mathematica and graphing calculators also operate at 2.1 Vectors and Linear Equations 29 the higher level. The multiplication is A. x in Mathematica. It is multiply(A, x); or equally evalm(A& * x); in Maple. Those languages allow symbolic entries a,b, c,... and not only real numbers. Like MATLAB’s Symbolic Toolbox, they give the correct symbolic answer. =» REVIEW OF THE KEY IDEAS =& The basic operations on vectors are scalar multiplication cv and vector addition v+ Ww. Together those operations give linear combinations cv + w and cv + dw. Ax = b asks for the combination of the columns of A that produces b. gre ee Matrix-vector multiplication Ax can be executed by rows or by columns. Each separate equation in Ax = b gives a line (n = 2) or a plane (n = 3) or a “hyperplane” (n > 3). They intersect at the solution or solutions. Problem Set 2.1 Problems 1 1-8 are about the row and column pictures of Ax = b. With A = / (the identity matrix) draw the planes in the row picture. Three sides of a box meet at the solution: Ix +0y +0z=2 Ox+ly+0z=3 On Oy iz I| or © or © oor & NS bh RW Draw the vectors in the column picture of Problem 1. Two times column | plus three times column 2 plus four times column 3 equals the right side b. If the equations in Problem | are multiplied by 1, 2,3 they become Ix +0y+0z=2 Ox +2y+0z=6 Ox 4 Oy-+ 32.) 12 or [O00 Oa 0 OPTUS ¥8 yi z fk=41) Z 6 12 Why is the row picture the same? Is the solution the same? The column picture is not the same—draw it. If equation 1 is added to equation 2, which of these are changed: the row picture, the column picture, the coefficient matrix, the solution? The new equations in Problem 1 would be x = 2, x+y =5, z= 4. 30 Chapter 2 Solving Linear Equations Find any point on the line of intersection of the planes x + y +z=4. By trial and error find another point on the line. 
y+ 3z=6 and x — The first of these equations plus the second equals the third: Ka ye = x+2y+ z=3 2x 4-3) 20 =o: The first two planes meet along a line. The third plane contains that line, because if x, y, z satisfy the first two equations then they also . The equations have infinitely many solutions (the whole line). Find three solutions. Move the third plane in Problem 6 to a parallel plane 2x + 3y +2z = 9. Now the three equations have no solution—why not? The first two planes meet along a line, but the third plane doesn’t cross that line (artists please draw). In Problem 6 the columns are (1, 1,2) and (1, 2,3) and (1, 1,2). This is a “singular case” because the third column is . Find two combinations of the three columns that give the right side (2, 3, 5). Problems 9-14 are about multiplying matrices and vectors. 9 Compute each Ax by dot products of the rows with the column vector: (aap 10 223 all lee orca di (b) Vas Maas eg Ulhc O° Or rao Compute each Ax in Problem 9 as a combination of the columns: O(a), becomes) FA = 1 25)—2" —4 |/-F Z 213 1 4 S35 |) = 2 How many separate multiplications for Ax, when the matrix is “3 by 3’? 11 Find the two components of Ax by rows or by columns: GE) ~ & IE) = bap 12 Multiply A times x to find three components of Ax: On On id Onalian® La0: 0 53 y Zz and pie mst ee aos) 2.1 13 (a) A matrix with m rows and n columns multiplies a vector with ponents to produce a vector with 14 Vectors and Linear Equations com- components. (b) The planes from the m equations Ax = b are in The combination of the columns of A is in -dimensional space. -dimensional space. (a) How would you define a linear equation in three unknowns x, y, z? (b) If vo = (x0, yo, Zo) and vy = (x1, y1, 21) both satisfy the linear equation, then so does cvg9 + dv,, provided c+d = (c) All combinations of vg and v; satisfy the equation when the right side is Problems 15-22 ask for matrices that act in special ways on vectors. 15 (a) What is the 2 by 2 identity matrix? (b) What is the 2 by 2 exchange matrix? P times [¥] equals [¥ ]. (a) What 2 by 2 matrix R rotates every vector by 90°? (b) What 2 by 2 matrix rotates every vector by 180°? What 3 by 3 matrix 16 17 31 P permutes x >.¢ / times be equals eae the vector R times [Fal is [XKYI. (x, y, z) to (y,z,x)? What matrix P—! permutes (y,z,x) back to (x, y, z)? 18 What 2 by 2 matrix E subtracts the first component from the second component? What 3 by 3 matrix does the same? 19 What 3 by 3 matrix E multiplies (x, y, z) to give (x, y,z+ x)? E~! multiplies (x, y, z) to give (x, y,z—x)? then multiply by E~!, the two results are ( 20 What matrix If you multiply (3, 4,5) by E and ) and ( F What 2 by 2 matrix P; projects the vector (x, y) onto the x axis to produce (x, 0)? What matrix P2 projects onto the y axis to produce (0, y)? If you multiply (5,7) by P; and then multiply by P2, the two results are ( ) and (sanst4). 21 What 2 by 2 matrix R rotates every vector through 45°? The vector (1,0) goes to (/2/2, /2/2). The vector (0, 1) goes to (—/2/2, /2/2). Those determine the matrix. Draw these particular vectors in the xy plane and find R. 22 Write the dot product of (1, 4,5) and (x, y, z) as a matrix multiplication Ax. The matrix A has one row. The solutions to Ax = 0 lie ona ____. _The columns of A are only in ___-dimensional space. 32 Chapter 2 23 Which code finds the dot product of V with row 1 and then row 2? Which code finds column | times V(1) plus column 2 times V(2)? Read the paragraph just before this problem set. 
If A has 4 rows and 3 columns, what changes are needed in the codes? Solving Linear Equations DO10) h=172 DO 10 jpsh,2 10 B(I) = Bil) + ACJ) * VU) DO#10) Je= 172 DO 10,b= 172 LOLB CO),= B(I) + A(I,J) * VU) 24 In both codes the first step is B(1) = ACJ, 1) x V(1).- Write the other three steps in the order executed by each code. 25 In three lines of MATLAB, enter a matrix and a column vector and multiply them. Questions 26-28 are a review of the row and column pictures. 26 Draw each picture in a plane for the equations 27 For two linear equations in three unknowns x, y, z, the row picture will show (2 or 3) (lines or planes). Those lie in (2 or 3)-dimensional space. The column x —-2y =0,x+y=6. picture is in (2 or 3)-dimensional space. 28 V4, For four linear equations in two unknowns x and y, the row picture shows four . The column picture is in -dimensional space. The equations have no solution unless the vector on the right side is a combination of (Markov matrix) Start with the vector up = (1,0). Multiply again and again by the same matrix A. The next three vectors are w1, U2, U3: Soin 8 ve te 4\lo|=[3| mq = Au) = ____ 43 = Auy = __ What property do you notice for all four vectors ug, wu), U2, U3? 30 With a computer continue Problem 29 as far as the vector u7. Then from vo = (0, 1) compute vj = Avg and v2 = Av; on to v7. Do the same from wo = (.5, .5) as far as w7. What do you notice about u7, v7 and w7? Extra: Plot the results by hand or computer. Here is a MATLAB code—you can use other languages: Ue Yeas Ol Aga |RSS SOAs while length(x) <= 7 UR FAHU end plot(k, x) Xe Xaulls wee F 2.1. 31 Vectors and Linear Equations 33 The u’s and v’s and w’s are approaching a steady state vector s. Guess that vector and check that As = s. This vector is “steady” because if you start with S, you stay with s. 32 This MATLAB code allows you to input the starting column ug = [a; b; 1—a—b] with a mouse click. These three components should be positive. (Click the left button on your point (a, b) or on several points. Add the instruction disp(u’) after the first end to see the steady state in the text window. Right button aborts.) The matrix A is entered by rows—its columns add to 1. Nel Xen oe elie cdl ave ea cil oll Hol axis({(O 1 O 1]); axis(’square’) plot(O, 0); hold on title(/Markov—your name’); xlabel(‘a’); ylabel(’b’); grid butses. while but == [a,b,but] = ginput(1) u = [a; b; 1-a-—b]; der in Taal oe 0 Re Al while length(x) <= 7 UE = Assur x = end plot end hold off 33 [he vill: exe) EXC xi(2)3) 000) Invent a 3 by 3 magic matrix M3 with entries 1,2,...,9. and diagonals add to 15. The first row could be 8,3,4. All rows and columns What is M3 times (1,1, 1)? What is Mg times (1, 1, 1, 1) if this magic matrix has entries 1,..., 16? 34 Chapter 2. Solving Linear Equations THE IDEA OF ELIMINATION go 2.2 This chapter explains a systematic way to solve linear equations. The method is called “elimination”’, and you can see it immediately in our 2 by 2 example. Before elimination, x and y appear in both equations. After elimination, the first unknown x has disappeared from the second equation: Before: SP A ay Ey | ent saa hey Sy =8. The last equation 8y = 8 instantly gives y = 1. Substituting for y in the first equation leaves x — 2 = 1. Therefore x = 3 and the solution (x, y) = (3, 1) is complete. Elimination produces an upper triangular system—this is the goal. The nonzero coefficients 1, —2, 8 form a triangle. To solve this system, x and y are computed in reverse order (bottom to top). 
The last equation 8y = 8 yields the last unknown, and we go upward to x. This quick process is called back substitution. It is used for upper triangular systems of any size, after forward elimination is complete. Important point: The original equations have the same solution x = 3 and y = 1. Figure 2.5 repeats this original system as a pair of lines, intersecting at the solution point (3, 1). After elimination, the lines still meet at the same point! One line is hor- izontal because its equation 8y = 8 does not contain x. Also important: How did we get from the first pair of lines to the second pair? We subtracted 3 times the first equation from the second equation. The step that eliminates x is the fundamental operation in this chapter. We use it so often that we look at it closely: To eliminate x: Subtract a multiple of equation 1 from equation 2. Three times x —2y = 1 gives 3x —6y = 3. When this is subtracted from 3x+2y = 11, the right side becomes 8. The main point is that 3x cancels 3x. What remains on the left side is 2y — (—6y) or 8y. Therefore 8y = 8, and x is eliminated. Before elimination After elimination y Shear Pape il x—2y=1 Figure 2.5 Two lines meet at the solution. So does the new line 8y = 8. 2.2 The Idea of Elimination 35 Ask yourself how that multiplier / = 3 was found. The first equation contains x. The first pivot is | (the coefficient of x). The second equation contains 3x, so the first equation was multiplied by 3. Then subtraction 3x — 3x produced the zero. You will see the general rule if we change the first equation to 5x — 10y = 5. (Same straight line.) The first pivot is now 5. The multiplier is now / = 2. To find the multiplier, divide the coefficient to be eliminated by the pivot: . ; 3 Multiply equation | by s Subtract from equation 2 - = Ix—-10y=5 36 y—el te ad ap 5 5 - Oy —a) Sy = 3: The system is triangular and the last equation still gives y = 1. Back substitution produces 5x — 10 = 5 and 5x = 15 and x = 3. Multiplying the first equation by 5 changed the numbers but not the lines or the solution. Here is the rule to eliminate a coefficient. Pivot = first nonzero in the row that does the elimination Multiplier 3 = (entry to eliminate) divided by (pivot) = 5" The new second equation contains the second pivot, which is 8. It is the coefficient of y. We would use it to eliminate y from the third equation if there were one. To solve n equations we want n pivots. You could have solved those equations for x and y without reading this book. It is an extremely humble problem, but we stay with it a little longer. Elimination might break down and we have to see how. By understanding the possible breakdown (when we can’t find a full set of pivots), you will understand the whole process of elimination. Breakdown of Elimination Normally, elimination produces the pivots that take us to the solution. But failure is possible. The method might go forward up to a certain point, and then ask us to divide by zero. We can’t do it. The process has to stop. There might be a way to adjust and continue—or failure may be unavoidable. Example 1 fails with no solution and Example 2 fails with too many solutions. Example 3 succeeds by exchanging the equations. Example 1 Permanent failure with no solution. Elimination makes this clear: Xe 2 yal 3x —6y = 11 Subtract 3 times eqn. | from eqn. 2 ta 2y ea 1 Oy = 8. The last equation is Oy = 8. There is no solution. Normally we divide the right side 8 by the second pivot, but this system has no second pivot. 
(Zero is never allowed as a pivot!) The row and column pictures of this 2 by 2 system show that failure was unavoidable. If there is no solution, elimination must certainly have trouble. 36 Chapter 2 Solving Linear Equations first 1 column |3 Columns don't combine to give plane second column Figure 2.6 |-—2 |-—6 Row and column pictures for Example 1: elimination fails. The row picture in Figure 2.6 shows parallel lines—which never meet. A solution must lie on both lines. Since there is no meeting point, the equations have no solution. The column picture shows the two columns (1,3) and (—2, —6) in the same di- rection. All combinations of the columns lie in that direction. But the column from the right side is in a different direction (1,11). No combination of the columns can produce this right side—therefore no solution. When we change the right side to (1,3), failure shows as a whole line of solu- tions. Instead of no solution there are infinitely many: Example 2 Permanent failure with infinitely many solutions: xD yr Ih Subtract 3 times 3x —6y =. »eqn; lL from eqn.2 a 2y = 1 Oy = 0. Now the last equation is Oy = 0. Every y satisfies this equation. There is really only one equation, namely x —2y = 1. The unknown y is “free”. After y is freely chosen, then x is determined as x = 1+ 2y. In the row picture, the parallel lines have become the same line. Every point on that line satisfies both equations. We have a whole line of solutions. In the column picture, the right side now lines up with both columns from the left side. We can choose x = | and y = 0—the first column equals the right side (1,3). We can also choose x = 0 and y = —5; the second column times —5 equals the right side. There are infinitely many other solutions. Every (x, y) that solves the row problem also solves the column problem. Elimination can go wrong in a third way—but this time it can be fixed. Suppose the first pivot position contains zero. We refuse to allow zero as a pivot. When the 2.2 The Idea of Elimination 37 right hand side H lies on the line of columns Same line from both equations Solutions all along this line 1 fal Fo (second column) = 2] Figure 2.7. Row and column pictures for Example 2: infinitely many solutions. first equation has no term involving x, we can exchange it with an equation below. With an acceptable pivot the process goes forward: Example 3 Temporary failure but a row exchange produces two pivots: Ox+2y=4 Exchange the 3x —2y=5 __ two equations @0=2ye 2y.= 4. The new system is already triangular. This small example is ready for back substitution. The last equation gives y = 2, and then the first equation gives x = 3. The row picture is normal (two intersecting lines). The column picture is also normal (column vectors not in the same direction). required. The pivots 3 and 2 are normal—but a row exchange was Examples | and 2 are singular—there is no second pivot. Example 3 is nonsingular—there is a full set of pivots. Singular equations have exactly no solution or infinitely many solutions. Nonsingular equations have exactly one solution. Pivots must be nonzero because we have to divide by them. Three Equations in Three Unknowns To understand Gaussian elimination, you have to go beyond 2 by 2 systems. Three by three is enough to see the pattern. For now the matrices are square—an equal number of rows and columns. 
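Before the 3 by 3 system, here is a small MATLAB sketch of the 2 by 2 steps in the examples above. It is only an illustration (the variable names and the message are ours, not part of the text): one elimination step, a row exchange when the first pivot is zero as in Example 3, and a warning when the second pivot is missing as in Examples 1 and 2.

    A = [1 -2; 3 2];  b = [1; 11];      % the system before elimination
    if A(1,1) == 0                      % zero in the first pivot position?
        A([1 2],:) = A([2 1],:);        % exchange the two rows (Example 3)
        b([1 2])   = b([2 1]);
    end
    mult = A(2,1)/A(1,1);               % multiplier = (entry to eliminate)/(pivot)
    A(2,:) = A(2,:) - mult*A(1,:);      % subtract mult times row 1 from row 2
    b(2)   = b(2)   - mult*b(1);
    if A(2,2) == 0                      % no second pivot (Examples 1 and 2)
        disp('singular: no solution or a whole line of solutions')
    else
        y = b(2)/A(2,2);                % back substitution starts at the bottom
        x = (b(1) - A(1,2)*y)/A(1,1);   % and goes upward to x
    end

With this A and b the multiplier is 3, the second pivot is 8, and the sketch returns x = 3 and y = 1.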
Here is a 3 by 3 system, specially constructed so that all steps 38 Chapter 2 Solving Linear Equations lead to whole numbers and not fractions: 2x +4y —2z=2 4x+9y —2x —3z=8 (i) —3y+7z=10 What are the steps? The first pivot is 2 in the upper left corner. Below that pivot we want to create zeros. The first multiplier is the ratio / = 4/2 = 2. Multiply the pivot equation by 2 and subtract. Subtraction removes the 4x from the second equation: 1 = Subtract 2 times equation 1 from equation 2. We also eliminate x from equation 3—still using the first pivot. The quick way is to add equation | to equation 3. Then 2x cancels —2x. We do exactly that, but the rule in this book is to subtract rather than add. The systematic pattern has multiplier —2/2 = —1. Subtracting —1 times an equation is the same as adding that equation: 2 Subtract —1 times equation 1 from equation 3. Now look at the situation. The two new equations involve only y and z: ly+1z=4 ly +-352=.12 We have reached a 2 by 2 system. The final step eliminates y to reach a | by | problem: 3 Subtract equation 2new from equation 3new. Then Ox + Oy + 4z = 8. The original 3 by 3 system has been converted into a triangular 3 by 3 system: 2x+4y—22=2 4x + 9y —3z=8 | 2x +4y—2z Ge has become =—2x —3y + 7z= 10 =2 (2) ly+1z=4 47 =8. The goal is achieved—forward elimination is complete. Notice the pivots 2,1,4 along the diagonal. Those pivots 1 and 4 were hidden in the original system! Elimination brought them out. The triangular system is zero below the pivots—three elimination steps produced three zeros. This triangle is ready for back substitution, which is quick: 4z=8 gives z=2, y+z=4 gives y=2, equation I gives x =—1. The solution is (x, y, z) = (—1, 2, 2). The row picture shows planes, starting with the first equation 2x + 4y —2z = 2. The planes from all our equations go through the solution (—1, 2, 2). The original three planes are sloping, but the very last plane 4z = 8 is horizontal. 2.2 The Idea of Elimination 39 The column picture shows a linear combination of column vectors producing the right side. The coefficients in that combination are —1, 2,2 (the solution): 2 (-1)| 4 4/+2) —2 —2 2 9|+2]—3 | equals —3 7 Sam 10 (3) The numbers x, y,z multiply columns 1, 2, 3 in the original system and also in the triangular system. For a 4 by 4 problem, or an n by n problem, elimination proceeds the same way. Here is the whole idea of forward elimination: 1. Use the first equation to create zeros below the first pivot. 2. Use the new second equation to create zeros below the second pivot. 3. Keep going to find the nth pivot. The result of forward elimination is an upper triangular system. It is nonsingular if there is a full set of n pivots (never zero!). Here is a final example to show the original system, the triangular system, and the solution from back substitution: i = 6 x+y+z=6 otic) x+2y+2z=9 x+2y+3z=10 V2 y+z=3 ye i iw (oF All multipliers are 1. All pivots are 1. All planes meet at the solution (3, 2,1). The original columns combine with coefficients 3,2, 1 to give the (6,9, 10) column. =» REVIEW OF THE KEY IDEAS =& 1. A linear system becomes upper triangular after elimination. 2. The upper triangular system is solved by back substitution (starting at the bottom). 3. Elimination subtracts a multiple @;; of equation j from equation i, to make the (i, 7) entry zero. 4. The multiplier is ¢;; = 5. A zero in the pivot position can be repaired if there is a nonzero below it. 6. Otherwise, many. breakdown entry to eliminate a e —. 
pivot in row 1 is permanent. Pivots can not be zero! The system has no solution or infinitely 40 Chapter 2. Solving Linear Equations Problem Set 2.2 Problems 1-10 are about elimination on 2 by 2 systems. 1 What multiple / of equation 1 should be subtracted from equation 2? ax Sy =a 10x 4-9" aL After this elimination step, write down the upper triangular system and circle the two pivots. The numbers | and 11 have no influence on those pivots. 2 Solve the triangular system of Problem | by back substitution, y before x. Verify that x times (2,10) plus y times (3,9) equals (1, 11). If the right side changes to (4, 44), what is the new solution? 3 What multiple of equation 1 should be subtracted from equation 2? 2x —4y = 6 —x+5y=0. After this elimination step, solve the triangular system. to (—6, 0), what is the new solution? 4 If the right side changes What multiple / of equation 1 should be subtracted from equation 2? ax + by =\f cx+dy =g. The first pivot is a (assumed nonzero). Elimination produces what formula for the second pivot? The second pivot is missing when ad = bc. 5 Choose a right side which gives no solution and another right side which gives infinitely many solutions. What are two of those solutions? ax pp 2y=7 6x+4y 6 = Choose a coefficient b that makes this system singular. Then choose a right side g that makes it solvable. 2x + by = 13 cB Al toh) vi ako For which numbers a does elimination break down (1) permanently (2) temporarily? ak Oy == —3 4x+6y= 6. Solve for x and y after fixing the second breakdown by a row exchange. 2.2 For which three numbers k does elimination number of solutions 0 or 1 or oo? The Idea of Elimination break down? kx+3y= 41 In each case, is the 6 3x +ky = —6. What is the test on b; and b2 to decide whether these two equations have a solution? How many solutions will they have? 3x —2y =D 6x — 4y = bp. 10 In the xy plane, draw the lines x + y = 5 and x + 2y = 6 and the equation ya that comes from elimination. The line 5x —4y = c will go through the solution of these equations if c = Problems 11 11-20 study elimination on 3 by 3 systems (and possible failure). Reduce this system to upper triangular form: 26 Sy eg — 4x +7y +5z = ~— —2y+2z= 6. Circle the pivots. Solve by back substitution for z, y,.x. Two row operations are enough if a zero coefficient appears in which positions? a2 Apply elimination and back substitution to solve 2x —3y 55 Axi Sy EZ) 2x = y—3z=5 Circle the pivots. List the three row operations which subtract a multiple of the pivot row from a lower row. 13 Which number d forces a row exchange, and what is the triangular system for that d? 2x +5y+z=0 4x+dy4+72=2 yz = 3. Which number d makes this system singular (no third pivot)? 42 Chapter 2 14 Which number b leads later to a row exchange? Which b leads to a missing pivot? In that singular case find a nonzero solution x, y, z. Solving Linear Equations x + by = 0 x=2y =7=0 y+z=0. 15 16 (a) Construct a 3 by 3 system that needs two row exchanges to reach a triangular form and a solution. (b) Construct a 3 by 3 system that needs a row exchange to keep going, but breaks down later. If rows 1 and 2 are the same, how far can you get with elimination? If columns 1 and 2 are the same, which pivot is missing? i IP 2x -—y+z=0 24+ 2y +z =0 2x—-—y+z=0 4x+4y+z=0 44 6x yp z = 2 GyePe Construct a 3 by 3 example that has 9 different coefficients on the left side, but rows 2 and 3 become zero in elimination. 
18 Which number g makes this system singular and which right side f gives it infinitely many solutions? Find the solution that has z = 1. x+4y-—2z=1 x+7y —6z=6 3y.+qz=t. 19 20 (Recommended) It is impossible for a system of linear equations to have exactly two solutions. Explain why. (a) If (x, y,z) and (X, Y, Z) are two solutions, what is another one? (b) If three planes meet at two points, where else do they meet? How can three planes fail to have An intersection point, when no two planes are parallel? Draw your best picture. Find a third equation that can’t be solved if x+y+z=0 and x —-y-—z=0. Problems 21-23 move up to 4 by 4 and n by n. 21 Find the pivots and the solution for these four equations: 2x+ y —U Ro ey ae ee =i) Vee = 0 a 2.2 JD. The Idea of Elimination This system has the same pivots and right side as Problem 21. How is the solution different (if it is)? 2x—- y ==) —x+2y—-— — z = ( y+2z-— t=0 ie 23 If you extend Problems 21—22 following the 1,2, 1 pattern or the —1,2,—1 tern, what is the fifth pivot? What is the nth pivot? 24 If elimination leads to these equations, find three possible original matrices A: Vo pat- a y+z=0 Baas 25 For which three numbers a will elimination fail to give three pivots? Ga A=|aa 4 Gag 26 Look for a matrix that has row Nl Ce 4 ae b sums 4 and 8, and column a+b=4 c+d=8 The four equations are solvable only if s = matrices that have the correct row and column sums 2 and s: a+c=2 b+d=s . Then find two different sums. 44 Chapter 2 Solving Linear Equations ELIMINATION USING MATRICES a 2.3 We now combine two ideas—elimination and matrices. The goal is to express all the steps of elimination (and the final result) in the clearest possible way. In a 3 by 3 example, elimination could be described in words. For larger systems, a long list of steps would be hopeless. You will see how to subtract a multiple of one row from another row—using matrices. The matrix form of a linear system is Ax = b. Here are b,x, and A: 1 The vector of right sides is b. 2 The vector of unknowns is x. (The unknowns change from x, y, Z,... x3, ... because we run out of letters before we run out of numbers.) 3 The coefficient matrix is A. to x1, x2, The example in the previous section has the beautifully short form Ax = b: 2x1 + 4x2 — 2x3 =2 4x, + 9x2 — 3x3 =8 is the same as —2x — 3x2 + 7x3 = 10 2 w Aton) ley 4 bs gl il =2 9 3 -3 2 Tis la i e l 2 10 The nine numbers on the left go into the matrix A. That matrix not only sits beside x, it multiplies x. The rule for “A times x” is exactly chosen to yield the three equations. Review of A times x. A matrix times a vector gives a vector. The matrix is square when the number of equations (three) matches the number of unknowns (three). Our matrix is 3 by 3. A general square matrix is n by n. Then the vector x is in ndimensional space. This example is in 3-dimensional space: The unknown is x] x2 x3 x =| —| and the solution is x = ” 2 Key point: Ax = b represents the row form and also the column form of the equations. We can multiply Ax a column at a time: 2 mag 4 3 oe z 7 10 This rule is used so often that we repeat it for emphasis. uc x isa combination of the columns of A. Columns are multiplied by. of x.Then Ax is x; times (column 1) + --- + xp times (column n). : 2.3 Elimination Using Matrices 45 One point to repeat about matrix notation: The entry in row 1, column | (the top left corner) is called aj}. The entry in row 1, column 3 is a;3. The entry in row 3, column | is a3}. (Row number comes before column number.) 
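MATLAB follows the same convention: A(i, j) is the entry in row i, column j. As a quick check with the matrix of this section (the check itself is ours, not part of the text):

    A = [2 4 -2; 4 9 -3; -2 -3 7];      % the coefficient matrix of this section
    x = [-1; 2; 2];  b = [2; 8; 10];
    A(3,1)                              % the entry in row 3, column 1: gives -2
    A(1,:)*x                            % row 1 times x, the first component of b: gives 2
    x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3)   % the columns times x(1), x(2), x(3): gives b

The last line is exactly the column form: x1 times column 1 plus x2 times column 2 plus x3 times column 3 equals b.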
The word “entry” for a matrix corresponds to the word “component” for a vector. in row i, column j of the matrix A is aj;. General rule: The entry Example 1 This matrix has aj; = 2i + j. Then aj; = 3. Also aj2 = 4 and az; = 5. Here is Ax with numbers 3 4 5 SOW 2 and letters: =. 3-2+4-1 ay aio X1 | _ | 411%1 + @12x2 BiisS, s2216-<:1 a2, a22|| x2] ~~ ‘|aayx1 + a22x2 |" For variety we multiplied a row at a time. The first component of Ax is 6+ 4 = 10. That is the product of the row [3 4] with the column product of [aj; a 2] with (x1, x2). The ith component (2,1). of Ax involves row i, which is [ajj Using letters, it is the aj2...@jn]. The short formula uses “sigma notation”: n 2B The ith component of Ax is aj1x; + aj2x2 +---+4jnXn. This is SP es ‘ | The symbol 5° is an instruction to add. Start with with aj;x; and stop with ajyxp.* 7 = 1 and stop with j =n. Start The Matrix Form of One Elimination Step Ax = b is a convenient form for the original equation. What about the elimination steps? The first step in this example subtracts 2 times the first equation from the second equation. On the right side, 2 times the first component of b is subtracted from the second component: pi b=] 8 2 changes to b’=| 10 4 10 We want to do that subtraction with a matrix! The same result is achieved when we multiply an elimination matrix E times b: The elimination matrix is *Binstein addition. shortened this even more 1 E = | —2 Oo Ono 1 0 © l by omitting the }°. The repeated j in ajjX; automatically He also wrote the sum as By xf. Not being Einstein, we include the }°. meant 46 Chapter 2. Solving Linear Equations With numbers and letters, multiplication by E subtracts 2 times row | from row 2: Lay =e) aa (Oi 0 “thin 20 ee p) Sian 10 Z 4 10 T eneae 40. sonal 10 Os £05! by ba) b3 alba by 2 by b3 Notice how 2 and 10 stay the same. Those are b; and b3. The first and third rows of E are the first and third rows of the identity matrix /. That matrix leaves all vectors unchanged. The new second component is the number 4 that appeared after the elimination step. This is b2 — 2b. It is easy to describe the “elementary matrices” or “elimination matrices” like E. Start with the identity matrix 7. Change one of its zeros to the multiplier —/: 2C The identity matrix has 1’s on the diagonal and 0’s everywhere else. Then 1b = b. The elementary matrix or elimination matrix E;; that subtracts a multiple / of row j from row i has the extra nonzero entry —/ in the i, 7 position. Example2 bs O40 ] L=l0.1 O01) Sande ea ORO oO Ome —-l 0O 1 If you multiply 7 times b, you get b again. If you multiply £3; times Bb, then / times the first component is subtracted from the third component. Here we get 9-4 =5: it @© @ ORO OmOn 1 1 3 || == || 3 9 9 and 1 0 —4 ORO 1 0 0O 1 1 3 | =] 9 1 3 5 This is on the right side of Ax = b. What about the left side? The multiplier / = 4 was chosen to produce a zero, by subtracting 4 times the pivot. The purpose of E3, is to create a zero in the (3, 1) position. The notation fits the purpose. Start with A. Apply E’s to produce zeros below the pivots (the first E is E2,). End with a triangular system. We now look in detail at the left side—elimination applied to Ax. First a small point. The vector x stays the same. The solution is not changed by elimination. (That may be more than a small point.) It is the coefficient matrix that is changed! When we start with Ax = b and multiply by E, the result is EAx = Eb. We want the new matrix E A—the result of multiplying E times A. 
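Here is that elimination matrix in MATLAB, built from the identity exactly as described. This is a sketch for checking only; practical elimination codes never form E explicitly.

    b = [2; 8; 10];
    E = eye(3);  E(2,1) = -2;        % the identity with the multiplier -2 in the (2,1) position
    E*b                              % subtracts 2 times b(1) from b(2): gives (2, 4, 10)
    E31 = eye(3);  E31(3,1) = -4;    % the matrix E31 of Example 2
    E31*[1; 3; 9]                    % subtracts 4 times the first component: gives (1, 3, 5)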
Matrix Multiplication The question is: How do we multiply two matrices? When the first matrix is E (an elimination matrix E21), there is already an important clue. We know A, and we know 2.3. what it becomes expect that after the elimination step. ea One eee OF Ri On OU O) ai times Elimination Using Matrices 47 To keep everything right, we hope and 2 4 A= | 4° 9 —2 — -2 —3 ve 2 4 sara et ee -2 1 et Pives = EAC— This step does not change rows | and 3. Those rows are repeated in EA—only row 2 is different. Twice the first row has been subtracted from the second row. Matrix multiplication agrees with elimination—and the new system of equations is EAx = Eb. That is simple but it hides a subtle idea. Multiplying both sides of the original equation gives E(Ax) = Eb. With our proposed multiplication of matrices, this is also (EA)x = Eb. The first was E times Ax, the second is EA times x. They are the same! The parentheses are not needed. We just write EAx = Eb. When multiplying ABC, you can do BC first or you can do AB first. This is the point of an “associative law” like 3 x (4 x 5) = (3 x 4) x 5. We multiply 3 times 20, or we multiply 12 times 5. Both answers are 60. That law seems so obvious that it is hard to imagine it could be false. But the “commutative law” 3 x 4 = 4 x 3 looks even more obvious. For matrices, EA is different from AL. 2D ASSOCIATIVE LAW = A(BC) = (AB)C NOT COMMUTATIVE LAW Often AB 4 BA. There is another requirement on matrix multiplication. We know how to multiply A times x or E times b. The matrix-matrix law should be consistent with the old matrix-vector law. Suppose B has only one column (this column is b). Then the matrix-matrix product EB should agree with Eb. Even more, we should be able to multiply matrices a column at a time: If the matrix B contains several columns by, b2, b3, then the columns of EB are Eby, Eb2, Eb3. This holds true for the matrix multiplication above (where we have the letter A instead of B). If you multiply column 1 by £, you get column | of the answer: tA ered) att ee alee Oia ata I 2 CI —2 (eo 2 een —2 and similarly for columns 2 and 3. This requirement deals with columns, while elimination deals with rows. A third approach (in the next section) describes each individual entry of the product. The beauty of matrix multiplication is that all three approaches (rows, columns, whole matrices) come out right. 48 Chapter 2 Solving Linear Equations The Matrix P;; for a Row Exchange To subtract row j from row i we use E;;. To exchange or “permute” those rows we use another matrix Pj;. Remember the situation when row exchanges are needed: Zero is in the pivot position. Lower down that column is a nonzero. By exchanging the two rows, we have a pivot (nonzero!) and elimination goes forward. What matrix P23 exchanges row 2 with row 3? We can find it by exchanging rows of /: 1 0 Py ie) 0 i GS = 0 This is a row exchange matrix. Multiplying by P23 exchanges components 2 and 3 of any column vector. Therefore it exchanges rows 2 and 3 of any matrix: beOs 10 O05 Oalee0), 1 oj © 5 1 as) 3 and 1,,0,0 OnO Ns) Or 1? 0 2 4) 24 1 O:50e 32)aO 26252 Ons 6495 OnOR 3 On the right, P23 is doing what it was created for. With zero in the second pivot position and “6” below it, the exchange puts 6 into the pivot. Notice how matrices act. They don’t just sit there. We will soon meet other permutation matrices, which can change the order of several rows. Rows 1, 2, 3 can be moved to 3, 1, 2. Our P93 is one particular permutation matrix—it works on rows 2 and 3. 
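In MATLAB a row exchange matrix is just the identity with two rows exchanged. A small check (the 3 by 3 matrix below is our own example with a zero in the second pivot position):

    P23 = [1 0 0; 0 0 1; 0 1 0];     % the identity with rows 2 and 3 exchanged
    P23*[1; 3; 9]                    % exchanges components 2 and 3: gives (1, 9, 3)
    A = [2 4 1; 0 0 3; 0 6 5];       % zero in the second pivot position, 6 below it
    P23*A                            % rows 2 and 3 exchanged: the pivot 6 moves up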
2E Permutation Matrix P;; exchanges rows i and j of the matrix it multiplies. P;; is the identity matrix with rows i and j reversed. To exchange equations 1 and 3 multiply by 001 = P\3 = [°10|. Usually no row exchanges are needed. The odds are good that elimination uses only the £;;. But the P;; are ready if needed, to move a new pivot into position. The Augmented Matrix A’ This book eventually goes far beyond elimination. Matrices have all kinds of practical applications—in which they are multiplied. Our best starting point was a square E times a square A, because we met this in elimination—and we know what answer to expect for EA. The next step is to allow a rectangular matrix A’. It still comes from our original equations, but now it includes the right side b. 2.3 Elimination Using Matrices 49 Key idea: Elimination acts on A and b, and it does the same thing to both. We can include b as an extra column and follow it through elimination. The matrix A is enlarged or “augmented” by b: 2 Augmented matrix A'= [A b] =| 4 =2 SpA aie 9 -3 —3 8 97 “10 A’ contains the whole system (left and right side). Elimination acts on whole rows of A’. The left side and right side are both multiplied by E, to subtract 2 times equation 1 from equation 2. With A’ = [A b| those steps happen together: Len; ae EA’ =|-—2 1 O Ora Oneal 2 alt oe De Zo Ae Deen cee: Mee Mame cee— 7 ae L =2 —3 7 10 —2 =3 7° 2 peut 10 The new second row contains 0,1, 1,4. The new second equation is x2 +x3 = 4. Both requirements on matrix multiplication are obeyed: R_ (by rows): Each row of E acts on A’ to give a row of EA’. C (by columns): E acts on each column of A’ to give a column of EA’. Notice again that word “acts.” This is essential. Matrices do something! The matrix A acts on x to give b. The matrix E operates on A to give EA. The whole process of elimination is a sequence of row operations, alias matrix multiplications. A goes to E2,A which goes to £3; F2,A. Finally £32£3);£2,A is a triangular matrix. The right side is included when we work with A’. Then A’ goes to E2; A’ which goes to £3, E2,A’. Finally E32£3,£2)A’ is a triangular system of equations. We stop for exercises on multiplication by E, before writing down the rules for all matrix multiplications. =» REVIEW OF THE KEYIDEAS = 1. Ax equals x; times (column 1) +---+ Xn times (column 7). 2. Multiplying Ax = b by £2, subtracts a multiple €2; of row number —; is the (2, 1) entry of the matrix F). 3. On the augmented matrix [A 6], that elimination step gives [£2;A 4. When a matrix A multiplies a matrix B, it multiplies each column of B separately. | from row 2. The E215|. 50° Chapter 2 Solving Linear Equations Problem Set 2.3 Problems 1 1-14 are about elimination matrices. Write down the 3 by 3 matrices that produce these elimination steps: (a) £ , subtracts 5 times row | from row 2. (b) £32 subtracts —7 times row 2 from row 3. (c) Piz exchanges rows | and 2. In Problem 1, applying E2; and then £32 to the column b = E32 E2\b = . Applying E32 before E2; gives E2;E32b = is different from F322, because when £32 comes first, row fect from row Which three matrices A= (1,0,0) gives . £2, E32 feels no ef- E;, F3;, E32 put A into triangular form U? 1 4 eal Lee 6 1 e220 a0 and £32E3,;E2,A = U. Multiply those E’s to get one matrix M that does elimination: MA = U. Include b = (1, 0,0) as a fourth column in Problem 3 to produce A’. Carry out the elimination steps on this augmented matrix A’ to solve Ax = b. Suppose a 3 by 3 matrix has a33 = 7 and its third pivot is 2. 
If you change a33 to 11, the third pivot is . If you change a33 to , there is no third pivot. If every column of A is a multiple of (1,1, 1), then Ax is always a multiple of (1,1, 1). Do a 3 by 3 example. How many pivots are produced by elimination? Suppose £3, subtracts 7 times row | from row 3. To reverse that step you should 7 times row to row . The matrix to do this reverse step (the inverse matrix) is R3; = Suppose £3; subtracts 7 times row | from row 3. What matrix R3; is changed into 1? Then E3;R3; = J where Problem 7 has R3; £3; = J. (So £3; commutes with R31.) (a) £2, subtracts row | from row 2 and then P23 exchanges What matrix M = P23E2; does both steps at once? rows P23 exchanges 1 from row (b) rows 2 and 3 and then £3, subtracts row 2 and 3. 3. What matrix M = £3) P23 does both steps at once? Explain why the M’s are the same but the E’s are different. 2.3 10 Create a matrix that has aj; = a22 = negative pivots. (The first pivot is 1.) 11 Multiply these matrices: Ueriulipne she VG 8G 12 a33 = Leo —1 - Elimination Using Matrices 51 1 but elimination produces two OS 1 O 0 | ee Leee3 4 ee yal 0 Explain these facts. If the third column of B is all zero, the third column of EB is all zero (for any E). If the third row of B is all zero, the third row of EB might not be zero. 13 This 4 by 4 matrix will need elimination matrices are those matrices? E2; and £32 and E43. What shoei th eke Fe) Ot ite een O40; a ee 14 Write down the 3 by 3 matrix that has aj; = 21 — 37. This matrix has a32 = 0, but elimination still needs £32 to produce a zero. Which previous step destroys the original zero and what is E32? Problems 15 15-22 are about creating and multiplying matrices. Write these ancient problems in a 2 by 2 matrix form Ax = b: (a) X is twice as old as Y and their ages add to 33. (b) The line C. y=mx-+c goes through (x, y) = (2,5) and (3,7). Find m and 16 The parabola y = a+ bx + cx* goes through the points (x, y) = (1,4) and (2,8) and (3, 14). Find a matrix equation for the unknowns (a, b,c). Solve by elimination. 17 Multiply these matrices in the orders E = EF and FE: (eau) ira le BAe 0a One OSC Ri Also compute E? = EE and F* = FF. 18 Multiply these row exchange matrices in the orders PQ and QP: PS Oe laeo enor Oo ae 8 and) = Also compute P? = PP and (PQ)? = PQPQ. Om 00% We vi 0 1 Ma 52 Chapter 2 19 (a) Solving Linear Equations Suppose all columns of B are the same. Then all columns of EB are the same, because each one is E times (b) Suppose all rows of B are the same. EB are not the same. Show by example that all rows of 20 If E adds row 21 The entries of A and x are a;; and x;. The matrix E2; subtracts row | from row 2. Write a formula for Ze | to row 2 and F adds row 2 to row (a) the third component of Ax (b) the (2, fb) entry of EA (c) the (2,1) component of Ex (d) the first component of EF Ax. 1, does EF equal FE? The elimination matrix EF = [5 :| subtracts 2 times row | of A from row 2 of A. The result is EA. In the opposite order AE, we are subtracting 2 times of A from . (Do an example.) Problems 23-26 include the column b in the augmented matrix A’. 23 Apply elimination to the 2 by 3 augmented matrix A’. What is the triangular system Ux = c? What is the solution x? rele illel=Lal, 24 Apply elimination to the 3 by 4 augmented matrix A’. How do you know this system has no solution? Aides ees Wat Roped, Sie ae! a Sa z 1 ee! 6 Change the last number 6 so that there is a solution. 25 The equations Ax = b and Ax* = b* have the same matrix A. 
(a) What double augmented matrix A” should you use in elimination to solve both equations at once? (b) Solve both of these equations by working on a 2 by 4 matrix A”: 2 albl-[o} 2 lE]-f) 2.3. 26 53 Choose the numbers a,b,c,d in this augmented matrix so that there is (a) no solution (b) infinitely many solutions. AG Which Daf Elimination Using Matrices of the numbers Challenge question: position, S SS a, b, c, or d have no effect on the solvability? E£;; is the 4 by 4 identity matrix with an extra | in the (i, /) i > j. Describe the matrix examples first. YS tOra. Ss Oe Ord E;;E,;. When does it equal Ex; E;;? Try 54 Chapter 2 Solving Linear Equations RULES FOR MATRIX OPERATIONS a 2.4 I will start with basic facts. A matrix is a rectangular array of numbers or “entries.” When A has m rows and n columns, it is an “m by n” matrix. Matrices can be added if their shapes are the same. They can be multiplied by any constant c. Here are examples of A+ B and 2A, for 3 by 2 matrices: Lege 2 gl 3 4 OlncO 9. 9 O49 i Ps 2 4 OF2.0 O- “0 Matrices are added exactly as vectors are—one entry at a time. We could even regard a column vector as a matrix with only one column (so n = 1). The matrix —A comes from multiplication by c = —1 (reversing all the signs). Adding A to —A leaves the zero matrix, with all entries zero. The 3 by 2 zero matrix is different from the 2 by 3 zero matrix. Even zero has a shape (several shapes) for matrices. All this is only common sense. The entry in row i and column j is called a;; or A(i, j). Thus the entry a32 is found in row 3 and column 2. The n entries along the first row are aj1, a@)2,..., @inThe lower left entry in the matrix is a, ; and the lower right is amp. Again, the row number 7 goes from 1 to m. The column number j goes from | to n. Matrix addition is easy. The serious question is matrix multiplication. When can we multiply A times B, and what is the product AB? We cannot multiply A and B above (both 3 by 2). They don’t pass the following test: To multiply AB, the number of columns of A must equal the numbers of rows of B. If A has two columns, B must have two rows. When A is 3 by 2, the matrix B can be 2 by 1 (a vector) or 2 by 2 (square) or 2 by 20. Every column of B is ready to be multiplied by A. Then AB is 3 by 1 or 3 by 2 or 3 by 20. Suppose A is m by n and B isn by p. We can multiply. The product AB is m by p. m rows nrows |_| m rows n columns | |p columns | | p columns | ° A row times a column is an extreme case. Then | by n multiplies n by 1. The result is 1 by 1. That single number is the “dot product.” In every case AB is filled with dot products. For the top corner, the (1, 1) entry of AB is (row | of A) - (column 1 of B). To multiply matrices, take all these dot products: (each row of A) - (each column of B). 2F The entry in row i and column j of AB is (row i of A)-(column j of B). 2.4 * 5 ai Qi2 ais : * * * (AB)ij : * : * bsj A is 4 by 5 Figure 2.8 B is 5 by 6 Here i =2 55 * 2) * Rules for Matrix Operations * * * AB is 4 by 6 and j = 3. Then (AB)3 is (row 2) « (column3) = Larxbx3. Figure 2.8 picks out the second row (i = 2) of a 4 by 5 matrix A. It picks out the third column (j = 3) of a 5 by 6 matrix B. Their dot product goes into row 2 and column 3 of AB. The matrix AB has as many rows as A (4 rows), and as many columns as B. Example 1 Square matrices can be multiplied if and only if they have the same size: > -a][3 a]=[1 a]; The first dot product is 1-2 + 1+3 = 5. Three more dot products give 6,1, and 0. 
Each dot product requires two multiplications—thus eight in all. If A and B are n by n, so is AB. It contains n? dot products, row of A times column of B. Each dot product needs n multiplications, so the computation of AB uses n> separate multiplications. For n = 100 we multiply a million times. For n = 2 we have n? = 8. Mathematicians thought until recently that AB absolutely needed 2? = 8 multiplications. Then somebody found a way to do it with 7 (and extra additions). By breaking n by n matrices into 2 by 2 blocks, this idea also reduced the count for large matrices. Instead of n> it went below n?'8, and the exponent keeps falling.* The best at this moment is n*37°. But the algorithm is so awkward that scientific computing is done the regular way—with n? dot products in AB, and n multiplications for each one. Example 2 Suppose A is a row vector (1 by 3) and B is a column vector (3 by 1). Then AB is 1 by | (only one entry, the dot product). On the other hand B times A (a column times a row) is a full matrix: Column times row: 0 O00 l [1 2 3 | bal plIa cae: 2B 2 4 ON A row times a column is an “inner” product—another name for dot product. A column times a row is an “outer” product. These are extreme cases of matrix multiplication, *Maybe the exponent won’t stop falling before 2. No number in between looks special. 56 Chapter 2 Solving Linear Equations with very thin matrices. They follow the rule for shapes in multiplication: (n by 1) times (1 by n). The product of column times row is n by n. Rows and Columns of AB In the big picture, A multiplies each column of B. The result is a column of AB. In that column, we are combining the columns of A. Each column of AB is a combination of the columns of A. That is the column picture of matrix multiplication: Column of AB is (matrix A) times (column of B). The row picture is reversed. Each row of A multiplies the whole matrix B. The result is a row of AB. It is a combination of the rows of B: [row i of A] We see row Lg Ma Sy YF 3 operations in elimination ag = [row i of AB]. © (E times A). We see columns in A times x. The “row-column picture” has the dot products of rows with columns. Believe it or not, there is also a “column-row picture.” Not everybody knows that columns of A multiply rows of B to give the same answer AB. This is in the next example. The Laws for Matrix Operations May I put on record six laws that matrices do obey, while emphasizing an equation they don’t obey? The matrices can be square or rectangular, and the laws involving A+ B are all simple and all obeyed. Here are three addition laws: A+B=B+A (commutative law) c(A+ B)=cA+cB (distributive law) A+(B+C)=(A+8B)+C (associative law). You know the right sizes for AB: (m by n) multiplies (n by p) to produce (m by p). Three more laws hold for multiplication, but AB = BA is not one of them: ABABA (the commutative “law” is usually broken) C(A+ B)=CA+CB (A+ B)C =AC+ (distributive law from the left) BC A(BC) = (AB)E (distributive law from the right) (associative law)(parentheses not needed). When A and B are not square, AB is a different size from BA. These matrices can’t be equal—even if both multiplications are allowed. For square matrices, almost any example shows that AB is different from BA: eee] core ales apa 2.4 It is true that AJ = JA. All square matrices commute these matrices c/ commute with all other matrices. The law A(B + C) = AB+ Rules for Matrix Operations 57 with J and also with c/. Only AC is proved a column at atime. 
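Those multiplication laws are easy to test numerically. Here is a quick MATLAB check with small matrices of our own choosing (any matrices of matching sizes would do):

    A = [1 1; 2 -1];  B = [2 2; 3 4];  C = [1 0; 5 1];
    A*B                              % rows times columns: the dot products give [5 6; 1 0]
    B*A                              % a different matrix, so the commutative "law" fails
    A*(B*C) - (A*B)*C                % the zero matrix: the associative law holds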
Start with A(b+ cc) = Ab + Ac for the first column. That is the key to everything—linearity. Say no more. The law A(BC) = (AB)C means that you can multiply BC first or AB first. The direct proof is sort of awkward (Problem 16) but this law is extremely useful. We highlighted it above; it is the key to the way we multiply matrices. Look at the special case when A = B = C = square matrix. Then (A times A*) = (A? times A). The product in either order is A*. The matrix powers A” follow the same rules as numbers: A? = AAA---A (p factors) (A?)(A?) = A?T4 (AP)? = AP4, Those are the ordinary laws for exponents. A* times A* is A’ (seven factors). A? to the fourth power is A!* (twelve A’s). When p and gq are zero or negative these rules still hold, provided A has a “—1 power”—which is the inverse matrix A~'. Then A° = / is the identity matrix (no factors). For a number, a! is 1/a. For a matrix, the inverse is written A~!. (It is never I/A, except this is allowed in MATLAB). Every number has an inverse except a = 0. To decide when A has an inverse is a central problem in linear algebra. Section 2.5 will start on the answer. This section is a Bill of Rights for matrices, to say when A and B can be multiplied and how. Block Matrices and Block Multiplication We have to say one more thing about matrices. They can be cut into blocks (which are smaller matrices). This often happens naturally. Here is a 4 by 6 matrix broken into blocks of size 2 by 2—and each block is just /: 1. “Oerecouler ri aL Ae | (ea 0 1 ¢ Ora = k jj 4 Lae ae 00: | eel If B is also 4 by 6 and its block sizes match the block sizes in A, you can add a block at a time. A+B We have seen block matrices before. The right side vector b was placed next to A in the “augmented matrix.” Then A’ =[A _ b| has two blocks Multiplying A’ by an elimination matrix gave [EA blocks times blocks, when their shapes permit: Eb]. of different sizes. No problem to multiply 58 Chapter 2 Solving Linear Equations 2G Block multiplication If the cuts between columns of A match the cuts between rows of B, then block multiplication of AB is allowed: Ai "Ai2 ))Big 4 PA Bi + Aiea al Aa Ag) =. Az By + AB. +: |Baie This equation is the same as if the blocks were numbers (1) (which are | by | blocks). We are careful to keep A’s in front of B’s, because BA can be different. The cuts between rows of A give cuts between rows of AB. Any column cuts in B are also column cuts in AB. Main point When matrices split into blocks, it is often simpler to see how they act. The block matrix of /’s above is much clearer than the original 4 by 6 matrix A. Example 3 (Important) Let the blocks of A be its columns. Let the blocks of B be its rows. Then block multiplication AB is columns times rows: | AB = aj | oo BST iid an | tea : = 2E9) a,b, se -+> + anby, . (2) eRe This is another way to multiply matrices! Compare it with the usual rows times columns. Row | of A times column 1 of B gave the (1,1) entry in AB. Now column 1 of A times row | of B gives a full matrix—not just a single number. Look at this example: elle fo he o|+15 a 3) We stop there so you can see columns multiplying rows. If a 2 by 1 matrix (a column) multiplies a 1 by 2 matrix (a row), the result is 2 by 2. That is what we found. Dot products are “inner products,” these are “outer products.” When you add the two matrices at the end of equation (3), you get the correct answer AB. In the top left corner the answer is 3 +4 = 7. This agrees with the row-column dot product of (1,4) with (3, 1). 
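That column-row multiplication is easy to try in MATLAB. The two matrices below are a small example of our own; the point is that the sum of the outer products equals the usual product:

    A = [1 4; 2 5];  B = [3 2; 1 0];
    A(:,1)*B(1,:) + A(:,2)*B(2,:)    % columns of A times rows of B, added up
    A*B                              % rows times columns: the same matrix [7 2; 11 4]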
Summary The usual way, rows times columns, gives four dot products (8 multiplications). The new way, columns times rows, gives two full matrices (8 multiplications). The eight multiplications and four additions are all the same. You just execute them in a different order. Example 4 (Elimination by blocks) Suppose the first column of A contains 2, 6, 8. To change 6 and 8 to O and 0, multiply the pivot row by 3 and 4 and subtract. Those 2.4 Rules for Matrix Operations 59 steps are really multiplications by elimination matrices E2, and £3}: OF E> = |-3 1 Op 5 Olde 0 and Le 05 aEO 0 1 0 —4 0 1 E3) ae The “block idea” is to do both eliminations with one matrix E. That matrix clears out the whole first column below the pivot a = 2: ie Li —4 COR 0) le) 0 Ose multiplies A to give EA=|0O 1 x Otte Oa x x Block multiplication gives a formula for those x’s in EA. The matrix A has four blocks a, b, c, D: the pivot, the rest of row 1, the rest of column 1, and the rest of the matrix. Watch how FE multiplies A by blocks: kevetallearcleleaneyll 1 |9 a| 5 —c/a |I c |D a | : b 0 | D —cb/a Elimination multiplies a by c/a and subtracts from c, to get zeros in the first column. It multiplies the column vector c/a times the row vector b, and subtracts to get D—cb/a. This is ordinary elimination, a column = at a time—written in blocks. REVIEW OF THE KEY IDEAS is (row 7 of A)-(column =® 1. ‘The (i, j) entry of AB 7 of B). 2. Anm 3. A times BC equals AB times C (surprisingly important). 4. AB 5. Block multiplication is allowed when the block shapes match correctly. by n matrix times an n by p matrix uses mnp separate multiplications. is also the sum of (column j of A)(row j of B). Problem Set 2.4 Problems 1 1-17 are about the laws of matrix multiplication. Suppose A is 3 by 5, B is 5 by 3, C is 5 by 1, and D is 3 by 1. Which of these matrix operations are allowed, and what are the shapes of the results? (a) BA 60 Chapter 2 Solving Linear Equations (b) A(B+C) (c) ABD (d) AG+ BD (ec) ABABD. What rows or columns and what matrices do you multiply to find (a) the third column of AB? (b) the first row of AB? (c) the entry in row 3, column 4 of AB? (d) the entry in row 1, column | of CDE? Compute AB + AC and separately A(B + C) and compare: =. eS | and 0 ae; 2 i and Sia ea, wl al In Problem 3, multiply A times BC. Then multiply AB times C. Compute A? and A?. Make a prediction for A> and A”: IL a=|) a) | and DD Sel al Show that (A + B) is different from A? + 24B + B2, when ded) rel, A and i bee @ alk Write down the correct rule for (A + B)(A + B) = A* + AB + + B?. True or false. Give a specific example when false: (a) If columns | and 3 of B are the same, so are columns (b) If rows 1 and 3 of B are the same, so are rows | and 3 of AB. (c) If rows 1 and 3 of A are the same, so are rows | and 3 of AB. @) (GAB) =A" B*. How is each row of DA and EA JB ae0 B=, ‘| and How is each column of AD 1 and 3 of AB. related to the rows of A, when S (Ove! eal; | and AE and i Flay a=|é gb Be related to the columns of A? 2.4 Row | of A is added to row 2. This gives added to column 2 to produce (EA)F: Ik OU BA=| Na (a) | ae A) | | of EA is b ie] a a+b (EA)F = (EA)|j Se. atte 1 of A to column 1 of AF Compare with (EA)F. tion? What law is or is not obeyed by matrix multiplica- to row 2 by E(AF). Row | of A is again added to row 2 to produce EA. Then F adds row 2 of EA 1. The result is F(EA): es Fea) 2 a 61 2 by AF, then add row to row 11 Bb Do those steps in the opposite order. First add column (b) 10 EA below. 
Then column ile Ae A ue Rules for Matrix Operations i ae =| a b Ae D242" B2b Ard Pala eal (a) Do those steps in the opposite order: first add row 2 to row | by FA, then add row | of FA to row 2. (b) What law is or is not obeyed by matrix multiplication? (3 by 3 matrices) Choose (a) BA=4A (b) BA=4B (c) BA has rows (d) All rows of BA B so that for every matrix A (if possible) | and 3 of A reversed are the same as row Suppose AB = BA and AC = CA oi. a=|* 16) i commutes | of A. for these two particular matrices B and C: f with Fl If (0) i and ® mal jl al Prove that a = d and b = c = 0. Then A is a multiple of 7. The only matrices that commute with B and C and all other 2 by 2 matrices are A = multiple of /. 13 Which of the following matrices are guaranteed to equal (A— B)?: 14 True or false: (B—A)?, A*—2AB+B*, A(A—B)-—B(A-—B), (a) If A is defined then A is necessarily square. (b) If AB (c) If AB and BA are defined then AB and BA are square. (d) If and BA AB= B then are defined then A and B are square. A=T. A — B?, A*—AB-—BA+B?? 62 Chapter 2 15 If A is m by n, how many separate multiplications are involved when 16 Solving Linear Equations (a) A multiplies a vector x with n components? (b) A multiplies an n by p matrix B? (c) A multiplies itself to produce A*? Here m =n. To prove that (AB)C = A(BC), use the column vectors bj,...,b, of B. First suppose that C has only one column c with entries cj,..., Cn: AB has columns Ab),..., Ab, and Bc has one column c,bj +--- + cybn. Then (AB)c = cj Ab, +---+c,Ab, while A(Bc) = A(cjb) +--+: + cnbn). Linearity makes those last two equal: (AB)c = other 17. . The same is true for all of C. Therefore (AB)C = A(BC). For A=[274] and B =[1? (a) column 2 of AB (b) row 2 of AB (c) (d) row 2 of AA = A? row 2 of AAA = A?. 4], compute these answers and nothing more: Problems 18-20 use a;; for the entry in row i, column j of A. 18 19 20 Write down the 3 by 3 matrix A whose entries are (a) aj =it+] (b) ay =) (c) ay =i/j. What words would you use to describe each of these classes of matrices? Give a 3 by 3 example in each class. Which matrix belongs to all four classes? (a) aj; =Oifi xs (b) aj (C) aij = Qji (d) aQjj = =Oifi<j alj- The entries of A are a;;. Assuming that zeros don’t appear, what is (a) the first pivot? (b) the multiplier of row 1 to be subtracted from row 3? (c) the new entry that replaces a32 after that subtraction? (d) the second pivot? 2.4 Rules for Matrix Operations 63 Problems 21-25 involve powers of A. 21 Compute A’, A®, A* and also Av, Av, A>v, A‘v for and v= © oo oOo © © = 22 mh &SN Find all the powers A?, A>,... and AB, (AB)?,... for Hohe Bei i ies E z| pas 23 24 25 0 k il By trial and error find 2 by 2 matrices (of real numbers) such that (a) A? =-I] (b) BC =-—CB (a) Find a nonzero matrix A for which A = 0. (b) Find a matrix that has A? 40 but A? = 0. (not allowing BC = 0). By experiment with n = 2 and n = 3 predict A” for a=) | and a= | and | al Problems 26-34 use column-row multiplication and block multiplication. 26 Multiply AB using columns times rows: 1X0 ap=|2 4]/ > atte Zh + 1 p24) oe cheba 2 = The product of upper triangular matrices is upper triangular: AB = Bae 2% |,0 2x, <x eta: Row times column is dot product ae aeet, OES eee O05 005, x 0 Oa0 (Row 2 of A)-(column | of B) = 0. Which other dot products give zeros? Column times row is full matrix Draw x’s and 0’s in (column 2 of A) (row 2 of B) and in (column 3 of A) (row 3 of B). 
64 Chapter 2 28 Draw the cuts in A (2 by 3) and B (3 by 4) and AB to show how each of the four multiplication rules is really a block multiplication: Solving Linear Equations (1) Matrix A times columns of B. (2) Rows of A times matrix B. (3) Rows of A times columns of B. (4) Columns of A times rows of B. 29 Draw the cuts in A and x to multiply Ax a column at a time: x; times column | aca 30 Which matrices E2,; and E3, produce zeros in the (2,1) and (3, 1) positions of E,,A and E3,;A? A= Find the single matrix 31 In Problem I Ole se ik eb a b Fle g eed: 30, what are c and D and what is D — cb/a? With i* = —1, the product of (A+iB) and (x+iy) is Ax+iBx+iAy—By. Use blocks to separate the real part without i from the imaginary part that multiplies /: Ay te 33 E = £3; F2, that produces both zeros at once. Multiply EA. Block multiplication says in Example 4 that EA 32 AAO =2 Oo 4 S68 Dies SB ie) ethene’ OtPAN Se yo erealepart ? imaginary part Each complex multiplication (a +ib)(c + id) seems to need ac, bd,ad, bc. But notice that ad + bc = (a + b)(c + d) — ac — bd; those 3 multiplications are enough. For matrices A, B,C, D this “3M method” becomes 3n? versus 4n?. Check additions: (a) How many additions for a dot product? For n by n matrix multiplication? (b) (Old method) How many additions for (c) (3M method) How many additions for R and § = (A+ B)(C+D)—AC— BD? For n = 2 the multiplications are 24 versus 32. Additions are 32 versus 24. For n > 2 the 3M method clearly wins. R= AC—BD and § = AD+ BC? 2.4 34 Rules for Matrix Operations 65 Suppose you solve Ax = b for three special right sides b: bes 1 \n0 0 and 0 1 0 0 and 1 If the three solutions x, x2, x3 are the columns of a matrix X, what is A times X? 35 ‘If the three solutions in Question 34 are x; = (1,1, 1) and x2 = (0,1, 1) and x3 = (0,0, 1), solve Ax = b when b = (3,5, 8). Challenge problem: What is A? 66 Chapter 2 Solving Linear Equations INVERSE MATRICES m 2.5 Suppose A is a square matrix. We look for a matrix A~! of the same size, such that A! times A equals I. Whatever A does, A~! undoes. Their product is the identity matrix—which does nothing. But this “inverse matrix” might not exist. What a matrix mostly does is to multiply a vector x. Multiplying Ax = b by A7! gives A~'Ax = A~'b. The left side is just x! The product A~'A is like multiplying by a number and then dividing by that number. An ordinary number has an inverse if it is not zero—matrices are more complicated and more interesting. The matrix Aq! is called “A inverse.” Not all matrices have inverses. This is the first question we ask about a square matrix: Is A invertible? We don’t mean that our first calculation is immediately to find A7!. In most problems we never compute it! The inverse exists if and only if elimination produces n pivots. (This is proved below. We must allow row exchanges to help find those pivots.) Elimination solves Ax = b without knowing or using A~!. But the idea of invertibility is fundamental. DEFINITION The matrix A is invertible if there exists a matrix A~! such that A1A=I and AA =I. (1) Note 1 The matrix A cannot have two different inverses. Suppose BA = / and also AC =I. Then B =C, according to this “proof by parentheses”: BAC) S(BAjC gives BI=IC or B= C. (2) This shows that a left-inverse B (multiplying from the left) and a right-inverse C (multiplying A from the right to give AC = /) must be the same matrix. Note 2 If A is invertible then the one and only solution to Ax = b is x = A~'b: Multiply Ax=b by A! Then x =A7!Ax = A7'b. 
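Note 2 is easy to try numerically. Here is a small MATLAB sketch (the matrix and right side are only an illustration, not taken from the text): the backslash command solves Ax = b by elimination, and the answer agrees with multiplying b by the inverse, even though elimination never forms the inverse.

A = [2 1; 6 8];      % an invertible matrix (ad - bc = 10, not zero)
b = [1; 2];
x1 = A \ b           % elimination solves Ax = b directly, no inverse needed
x2 = inv(A) * b      % the same x from Note 2: x equals the inverse times b
A * x1               % multiply back: this returns b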
Note 3 (Important) Suppose there is a nonzero vector x such that Ax = 0. Then A cannot have an inverse. No matrix can bring 0 back to x. If A is invertible, then Ax = 0 has the one and only solution x = 0. Note 4 A 2 by 2 matrix is invertible if and only if ad — bc is not zero: a bl Ie | 1 Wiese dveb be al (3) 2.5 Inverse Matrices 67 This number ad — bc is the determinant of A. A matrix is invertible if its determinant is not zero (Chapter 5). The test for n pivots is usually decided before the determinant appears. Note 5 A diagonal matrix is invertible when no diagonal entries are zero: dj If A= 1/dy tag then A’)= 4 dn 1/dy Note 6 The 2 by 2 matrix A = [12] is not invertible. It fails the test in Note 4, because ad — bc equals 2 — 2 = 0. It fails the test in Note 3, because Ax = 0 when x = (2,—1). It also fails to have two pivots. Elimination turns the second row of A into a zero row. The Inverse of a Product For two nonzero numbers a and b, the sum a +b might or might not be invertible. The numbers a = 3 and b = —3 have inverses 4 and —i. Their sum a + b = 0 has no inverse. But the product ab = —9 does have an inverse, which is 4 times —f. For two matrices A and B, the situation is similar. It is hard to say much about the invertibility of A+ B. But the product AB has an inverse, whenever the factors A and B are separately invertible (and the same size). The important point is that A7! and B~! come in reverse order: 2H If A and B are invertible then so is AB. (QU The inverse of AB a is (4) To see why the order is reversed, start with AB. Multiplying on the left by A~! leaves B. Multiplying by B~! leaves the identity matrix /: B-'A'AB_ equals B'IB=B'B=I. Similarly AB times B~'A~! equals AIA~'! = AA~! =]. This illustrates a basic rule of mathematics: Inverses come in reverse order. It is also common sense: If you put on socks and then shoes, the first to be taken off are the . The same idea applies to three or more matrices: ARGs =C- BAS (5) 68 Chapter 2 Solving Linear Equations Example 1 If E is the elementary matrix that subtracts 5 times row then E~! adds 5 times row | to row 2: E=|—5 Lo. 0 0) ~1 “0 OQ: oat and 1 from row 2, iP Om) EF'=|5 1 °0 Gorm Multiply EE~! to get the identity matrix 7. Also multiply E~'E to get J. We are adding and subtracting the same 5 times row (this is 1. Whether we add and then subtract EE~') or subtract first and then add (this is E —lE), we are back at the start. For square matrices, an inverse on one side is automatically an inverse on the other side. If AB =/ then automatically BA = /. In that case B is A~!. This is very useful to know but we are not ready to prove it. Example 2 Suppose F subtracts 4 times row 2 from row 3, and F~! adds it back: Fe i 0 Oe 12s Oe beanie Omens ial 00 tie ah Oahu) On Aviel Again FF~!' =] and also F~'F = 1. Now multiply F by the matrix E in Example 1 to find FE. Also multiply E~! times F~! to find (FE)~!. Notice the order of those inverses! FE=| ee —-5 1 8 O| 20 1 -4 isinverted by E-'F-'=|] ee 5 Pe 1 O}. 0 4 (6) 1 This is strange but correct. The product FE contains “20” but its inverse doesn’t. You can check that FE times E~'F—! gives /. There must be a reason why 20 appears in FE but not in its inverse. E subtracts 5 times row | from row 2. Then F subtracts 4 times the new row 2 (which contains —5 times row 1) from row 3. In this order FE, row 3 feels an effect from row 1. In the order E~'F~!, that effect does not happen. row 3. After that, E~! adds 5 times row First F~! adds 4 times row 2 to 1 to row 2. 
There is no 20, because row 3 doesn’t change again. The example makes two points: 1 Usually we cannot find A~! from a quick look at A. 2 When elementary matrices come in their normal order FE, we can find the product of inverses E~'F~! quickly. The multipliers fall into place below the diagonal. This special property of E~'F~! and E~'F~'G7! will be useful in the next section. We will explain it again, more completely. In this section our job is A~!, and we expect some serious work to compute it. Here is a way to organize that computation. 2.5 Inverse Matrices 69 The Calculation of A~! by Gauss-Jordan Elimination I hinted that A~! might not be explicitly needed. The equation Ax = Bb is solved by x = A~'b. But it is not necessary or efficient to compute A~! and multiply it times b. Elimination goes directly to x. In fact elimination is also the way to calculate A~!, as we now show. The idea is to solve AA~! = J a column at a time. A multiplies the first column of A~! (call that x1) to give the first column of / (call that e;). This is our equation Ax, =e . Each column of A~! is multiplied by A to produce a column of /: AA = Alm x2 x3|]=[e1 e2 €3] =F (7) To invert a 3 by 3 matrix A, we have to solve three systems of equations: and Ax7 = e2 and Ax3 = e3. Then x1, x2, x3 are the columns of A~!. Ax; = e; Note This already shows why computing A~! is expensive. We must solve n equations for its n columns. To solve Ax = b directly, we deal only with one column. In defense of A~!, we want to say that its cost is not n times the cost of one system Ax = b. Surprisingly, the cost for n columns is only multiplied by 3. This saving is because the n equations Ax; = e; all involve the same matrix A. When we have many different right sides, elimination only has to be done once on A. Working with the right sides is relatively cheap. The complete A~! needs n> elimination steps, whereas solving for a single x needs n?/3. The next section calculates these costs. The Gauss-Jordan method computes A~! by solving all n equations together. Usually the “augmented matrix” has one extra column b, from the right side of the equations. Now we have three right sides e;, e2, e3 (when A is 3 by 3). They are the columns of I, so the augmented matrix is really just [A J]. Here is a worked-out example when A has 2’s on the main diagonal and —1’s next to the 2’s: 29101 0 eer O 61 ee 0 «(CO Diet 05° 0 [A e; e2 e3|=| = >| We are halfway. 0 5.02 1.. 3 —1 5 1 0 O19) 2-7 0) Oe a moe oe il a () 3 j 1 0 3-1 0 0 i € row 1+ row 2) (3 row 2+ row 3) 3 laee The matrix in the first three columns is now The pivots 2, 3, ; are on its diagonal. Gauss would U (upper triangular). finish by back substitution. The contribution of Jordan is to continue with elimination, which goes all the way to the 70 Chapter 2 Solving Linear Equations “reduced echelon form’. Rows are added to rows above them, to produce zeros above the pivots: 2 >] -1 0 OE 3 0 1 Orr ) Tes 4 3 3 3 ; 1 € row 1 2 (3 row 2+row 1) 20s Ls linOale Sr OG a ee Pe OF ; bAOkewiel ecigdl tae 3+ row 2) nee Now divide each row by its pivot. The new pivots are 1. The result is to produce I in the first half of the matrix. (This reduced echelon form is R = J because A is invertible.) The columns of A~' are in the second half. 3 Lees (divide by 3) 0. sirlelign0 See divide (divide Oe re 41 byby 43) es) 1 (divide by 2) | Starting from the 3 by 6 matrix [A Gauss-Jordan process on one line: Multiply Beak ei 1 |= [fF x1 x2 3]. :1 33 /], we have reached [ / A~!]. 
Here is the whole [A I] by A toget [I Aq']. The elimination steps gradually end up with the inverse matrix. Again, we probably don’t need A~! at all. But for small matrices, it can be very worthwhile to know the inverse. We add three observations about this particular A~! because it is an important example. We introduce the words symmetric, tridiagonal, and determinant: 1 A is symmetric across its main diagonal. So is A7!. 2 The matrix A is tridiagonal (three nonzero diagonals). But A~! is a full matrix with no zeros. That is another reason we don’t often compute A~!. 3 The product of the pivots is 2(3)(4) = 4. This number 4 is the determinant of A. Chapter 5 will show why A7~! involves division by 4: The inverse above is A! 1 = A 4 AsO aHe NO me WwW 3 (8) Example 3 Find A~! by Gauss-Jordan elimination with A = [23]. so, 3: ee 153.61920 be ul=|[5 i. 0 > [a 2 | | ow aaaat l= ©—_= | NN W — sl or =o a | lI | = wNnin mR Dlw — — > | J 2.5 The reduced echelon form of [A determinant 2.7 —3-4=2. J] is [Z Inverse Matrices 71 A~!]. This A7! involves division by the The code for X = inverse(A) has three important lines! I = eye(n, n); R=rref({A /)}); X=RE,n+1:n+n) The last line discards columns | to n, in the left half of R. It picks out X = Aq! from the right half. Of course A must be invertible, or the left half will not be /. Singular versus invertible We come back to the central question. Which matrices have inverses? The start of this section proposed the pivot test: A~! exists exactly when A has a full set of n pivots. (Row exchanges are allowed!) Now we can prove that fact, by Gauss-Jordan elimination: 1 With n pivots, elimination solves all the equations Ax; = e;. The columns x; go into A~'. Then AA~! = J and A™! is at least a right-inverse. Z Elimination is really a long sequence of matrix multiplications: (Drie cha Pe ARE D~! divides by the pivots. The matrices E produce zeros below and above the pivots. The matrices P exchange rows (details in Section 2.7). The product matrix in parentheses is evidently a left-inverse that gives A~'A = J. It equals the right-inverse by Note 1, in this section. With a full set of pivots A is invertible. The converse is also true. Jf A~! exists, then A must have a full set of n pivots. Again the argument has two steps: 1 If A has a whole row of zeros, the inverse cannot exist. Every matrix product AB Z will have that zero row, and cannot be equal to /. If a column has no pivot, elimination will eventually reach a row of zeros: “hy OP 32 Be hwo, OE Oe ey, reduces to ah 3e TO 38 car 0 0 0 We have reached a matrix with no inverse. Since each elimination step is invertible, the original A was not invertible. Without pivots AW! fails. 2I Elimination gives a complete test for A~! to exist. There must be n pivots. 72 Chapter 2. Solving Linear Equations Example 4 If L is lower triangular with 1’s on the diagonal, so is L~!. Use the Gauss-Jordan method to construct L~!. Start by subtracting multiples of pivot rows from rows below. Normally this gets us halfway to the inverse, but for L it gets us all the way. This example is typical: fe 10k oil Mai TRO! hp 1, BOF 4 5 I “On OmG Ie sl ae OT onal ] 0 -—3 eS 1-4 [Zs deta —>10 lO eer QE SOO When SO°°@ yey jr *0 PSSat OL A L goes to J by elimination, elimination matrices E32E3,E2; 20 20 werent) 1 0 0 1 ah ee) =s 1 CER (3 times row (4 times row ON ts J goes to L~!. is L~!. | from row 2) | from row 3) erie In other words, the product of All the pivots are 1’s (a full set). Lis lower triangular. 
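Example 4 can be tried with the three-line inverse code given above. In this sketch the lower triangular matrix is a sample, chosen only to show the pattern; its inverse comes out lower triangular with 1's on the diagonal, exactly as the example predicts.

L = [1 0 0; 3 1 0; 4 5 1];   % a sample L: lower triangular with 1's on the diagonal
n = 3;
R = rref([L eye(n)]);        % Gauss-Jordan on the augmented matrix [L I]
X = R(:, n+1:n+n)            % the right half is the inverse: again lower triangular with 1's
L * X                        % check: the product is the identity matrix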
= REVIEW OF THE KEY IDEAS #8 1. The inverse matrix gives AA~! = J and A~'A=J. De The inverse of AB a If Ax = 0 for a nonzero 4. The Gauss-Jordan method solves AA~! row-reduced to es A \ 5. A is invertible if and only if it has n pivots (row exchanges allowed). is B~!A7!, vector x, then A has no inverse. = J. The augmented matrix [A I ] is Problem Set 2.5 1 Find the inverses (directly or from the 2 by 2 formula) of A, B, C: A, Ones al and an) B=|; | and 3 ory. dl al 2.5. Inverse Matrices 73 For these “permutation matrices” find P~! by trial and error (with 1’s and 0s): Gb) On-O0 i MOIR 0) ik SOR) ands Oe! 40 70 0 at ee (Oe f= Solve for the columns of A7! = [3 Hil: [oo so][¢]=[o] o»# [20 so][e]=[i] Show that Bea has no inverse by trying to solve for the columns mee (a,c) and Pager Find three 2 by 2 matrices (not A = /) that are their own inverses: (a) If A is invertible and AB = AC, prove quickly that (D) elite Ay — & rs find two matrices A? = /. B = C. B 4 C such that AB = AC. (Important) If the 3 by 3 matrix A has row not invertible: | + row 2 = row 3, show that it is (a) Explain why Ax = (1,0,0) cannot have a solution. (b) Which (c) What happens to row 3 in elimination? right sides (5), b2, b3) might allow a solution to Ax = b? Suppose A is invertible and you exchange its first two rows. Is the new matrix B invertible and how would you find B~! from A7!? Find the inverses (in any legal way) of Oin0: 00.2 003 0 NW ee Pees 5 01070 10 11 3200 430 0 lip ms Gun Ture (a) Find invertible matrices (b) Find singular matrices A and B such that A + B is invertible. If the product A and B such that A + B is not invertible. C = AB is invertible (A and B are square), then A itself is in- vertible. Find a formula for A~! that involves C~! and B. iy If the product M = ABC of three square matrices is invertible, then B is invertible. (So are A and C.) Find a formula for B~! that involves M~! and A and C. 74 Chapter 2 13 If you add row 1 of A to row 2 to get B, how do you find B~! from “Aa? Solving Linear Equations The inverse of B= i i A is : 14 Prove that a matrix with a column of zeros cannot have an inverse. 15 Multiply [#5] times [_¢->]. What is the inverse of each matrix if ad 4 bc? Cc 16 (a) (b) What single matrix E has the same effect as these three steps? Subtract row | from row row 3. 2 from 2, subtract row 3, then subtract row What single matrix L has the same effect as these three steps? Add row 2 to row 3, add row Urs | from row | to row 3, then add row If the 3 by 3 matrix A has column not invertible: 1 + column (a) Find a nonzero (b) Does elimination keep column there is no third pivot. | to row 2. 2 = column solution to Ax = 0. 1 + column 2 = column 3? Explain why 18 If B is the inverse of A, show that AB 19 Find the numbers a and b that give the correct inverse: is the inverse of A. =I ES St) 14S) Sy ee eee Se aan eoiy iianad 20 3, show that it is abd: bea bab ee ae: bap bitrate There are sixteen 2 by 2 matrices whose entries are 1’s and 0’s. How many of them are invertible? Questions 21-27 are about the Gauss-Jordan method for calculating A~!. 21 Change / into A! as you reduce A to J (by row operations): [4 l=[,3 01] ™ 4 =[5 3 041 22 Follow the 3 by 3 text example but with plus signs in A. 
Eliminate above and below the pivots to reduce [A epee J] to 2 ee 0 R=[J A™!];: SS = © NR So = 0 0 © or l 2.5 23 Use Gauss-Jordan Me Multiply D-'!Ej2E and 29 ly elt 8) A SIO Oi sO @¢ © i Zl ul hilt 2a aad 1 Shige cA and, 2A Lisl ile 2 2 3 Exchange rows and continue with Gauss-Jordan to find A7!: peste True or false (with a counterexample if false): (a) A 4 by 4 matrix with __ zeros 1s not invertible. (b) Every matrix with 1’s down the main diagonal is invertible. (c) If A is invertible then A~! is invertible. (d) If A is invertible then A? is invertible. For which numbers c is this matrix not invertible, and why not? A= 22a |uCe aC a.C eae 30 @ I @ to find A7!. Cea 28 lL ||M OO Invert these matrices by the Gauss-Jordan method starting with [A / ]: 7M 2a. ae || = What three matrices Ep; and E}2 and D~! reduce A = [}2] to the identity matrix? 26 Bey Find A7! (if it exists) by Gauss-Jordan elimination on [A J ]: 2 a lyk 7 Wachee tea| jE 2 25 75 elimination with J next to A to solve AA~! =7: ab Oia e O @ i 24 Inverse Matrices Prove that this matrix is invertible if a #0 and a £b: 76 Chapter 2 31 This matrix has a remarkable inverse. and multiply AA! to confirm: Solving Linear Equations Find A~! and guess the 5 by 5 inverse ea 2] 32 Use the inverses in Question 31 to solve Ax = (1, 1, 1, 1) and Ax = (1, 1, 1, 1, 1). 33 The Hilbert matrices have aj; = 1/(i + j — 1). Ask MATLAB for the exact inverse invhilb(6). Then ask for inv(hilb(6)). How can these be different, when the computer never makes mistakes? 34 beakeear ear, Find the inverses (assuming they exist) of these block matrices: 2.6 ELIMINATION Elimination = Factorization: A = LU FET = FACTORIZATION: A = LU ao 2.6 Students often say that mathematics courses are too theoretical. Well, not this section. It is almost purely practical. The goal is to describe Gaussian elimination in the most useful way. Many key ideas of linear algebra, when you look at them closely, are really factorizations of a matrix. The original matrix A becomes the product of two or three special matrices. The first factorization—also the most important in practice— comes now from elimination. The factors are triangular matrices. The factorization is A= LU. Start with a 2 by 2 example. The matrix A contains 2, 1,6,8. The number to eliminate is 6. Subtract 3 times row | from row 2. That step is E2; in the forward direction. The return step from U to A is Ey (an addition): Forward to U : Back from U to A: LO De Ena=|_; alle leg | alse EU 4 = A. = E a fel The second line is our factorization. Instead of A = Ea we write A = LU. Move now to larger matrices with many E’s. Then L will include all their inverses. Each step from A to U multiplies by a matrix E;; to produce zero in the (i, /) position. The row exchange matrices P;; move nonzeros up to the pivot position. A long sequence of E’s and P’s multiplies A to produce U. When we put each E~! and P~—' on the other side of the equation, they multiply U and bring back A: A= (Ee POSE Ne: (1) This factorization has too many factors. The matrices E~! and P~! are too simple. We will combine them into a single matrix L which inverts all the elimination steps at once. Another matrix P accounts for all the row exchanges. (It comes in the next section.) Then A is built out of L, P, and U. To keep this clear, we want to start with the most frequent case—when no row exchanges are involved. If A is 3 by 3, we multiply it by £2; and £3; and E37. That produces zero in the (2, 1) and (3, 1) and (3, 2) positions—all below the diagonal. 
end with the upper triangular inverses multiply U: (£32E31E21)A=U becomes We U. Now move those E’s onto the other side, where their A=(E5)'E3)' Ey )U whichis A=LU. (2) The inverses go in opposite order, as they must. That product of three inverses is L. We have reached A = LU. Now we stop to understand it. First point: Every inverse matrix Ey is lower triangular. Its off-diagonal entry is ¢;;, to undo the subtraction in £;; with —€;;. The main diagonals of E and E~! all 1’s. contain 78 Chapter 2 Solving Linear Equations Second point: The product of E’s is still lower triangular. Equation (2) shows a lower triangular matrix (in the first parentheses) multiplying A. It also shows a lower triangular matrix (the product of Ee multiplying U to bring back A. This product of inverses is L. There are two good reasons for working with the inverses. One reason is that we want to factor A, not U. The “inverse form” of equation (2), gives A= LU. second reason is that we get something extra, almost more than we deserve. the third point, which shows that L is exactly right. The This is Third point: Each multiplier /;; goes directly into its 7, 7 position—unchanged—in the product L. Usually a matrix multiplication will mix up all the numbers. Here that doesn’t happen. The order is right for the inverse matrices, to keep the /’s unchanged. The reason is given below. Since each E~! has 1’s down its diagonal, the final good point is that L does too. 2) (A = LU) This is elimination without row exchanges. The upper triangular U has the pivots on its diagonal. The lower triangular L has 1’s on its diagonal. The multipliers /;; are below the diagonal. Example row 1 For the matrix A with 1, 2, 1 on its diagonals, elimination subtracts : times | from row 2. Then /2; = 5. The last step subtracts : times row 2 from row 3: 25m las ge. Cadi Onele ae Ua Are 2 ee, Ble OO een eae Os Irae O° s) 1 }owo The (3, 1) multiplier is zero because the (3, 1) entry in A is zero. No operation needed. Example 2 Change the top left entry from 2 to 1. The multipliers all become pivots are all 1. That pattern continues when A is 4 by 4 or n by n: as bel AK) 1 ae Saas a Fi (pa eed Na paca CC (nme tee. 0 0 1 1. The i? *17'0t6 elle’ alk hes ig al 1 These LU examples are showing something extra, which is very important in practice. Assume no row exchanges. When can we predict zeros in L and U? When a row of A starts with zeros, so does that row of L. When a column of A starts with zeros, so does that column of U. If a row starts with zero, we don’t need an elimination step. The zeros below the diagonal in A gave zeros in L. The multipliers /;; were zero. That saves computer 2.6 Elimination = Factorization: A = LU 79 time. Similarly, zeros at the start of a column survive into U. But please realize: Zeros in the middle of a matrix are likely to be filled in, while elimination sweeps forward. We now explain why L has the multipliers /;; in position, with no mix-up. The key reason why A equals LU: Ask yourself about the rows that are subtracted from lower rows. Are they the original rows of A? No, elimination probably changed them. Are they rows of U? Yes, the pivot rows never get changed again. For the third row of U, we are subtracting multiples of rows 1 and 2 of U (not rows of A!): Row 3 of U = row 3 of A —13;(row 1 of U) — /32(row 2 of U). (3) Rewrite that equation: Row 3 of A = 13; (row 1 of U) + 132(row 2 of U) + l(row 3 of U). (4) The right side shows the row [/3; 1/32 1] multiplying U. This is row 3 of A= LU. 
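Equation (4) can be checked numerically with the teaching code slu that is printed later in this section (a sketch, assuming slu is available; the matrix is the one from Example 1):

A = [2 1 0; 1 2 1; 0 1 2];   % Example 1: the three diagonals hold 1, 2, 1
[L, U] = slu(A);             % L holds the multipliers below its unit diagonal, U holds the pivots
L(3,:) * U                   % the row [l31 l32 1] times U ...
A(3,:)                       % ... reproduces row 3 of A, as equation (4) says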
All rows look like this, whatever the size of A. With no row exchanges, we have Al—re We Remark The LU factorization is “unsymmetric” in one respect. U has the pivots on its diagonal where L has 1’s. This is easy to change. Divide U by a diagonal matrix D that contains the pivots. Then U equals D times a matrix with 1’s on the diagonal: d\ 1 do uy2/d) 1 u43/d) u23/d2 dn l It is convenient (but confusing) to keep the same letter U for this new upper triangular matrix. It has 1’s on the diagonal (like L). Instead of the normal LU, the new form has LDU: Lower triangular times diagonal times upper triangular: The triangular factorization can be written A=LU or A=LDU. Whenever you see LDU, it is understood that U has 1’s on the diagonal. Each row is divided by its first nonzero entry—the pivot. Then L and U are treated evenly. Here is LU and then also LDU: ill s}-B al? slo The pivots 2 and 5 went into D. Dividing the rows by 2 and 5 left the rows [1 and [0 1] in the new U. The multiplier 3 is still in L. 4] 80 Chapter 2 Solving Linear Equations One Square System = Two Triangular Systems We emphasized that L contains our memory of Gaussian elimination. It holds all the numbers that multiplied the pivot rows, before subtracting them from lower rows. When do we need this record and how do we use it? We need to remember L as soon as there is a right side b. The factors L and U were completely decided by the left side (the matrix A). Now work on the right side. Most computer programs for linear equations separate the problem Ax = b this way, into two different subroutines: 1 Factor A (into L and U) Z Solve for x (from L and U and Bb). Up to now, we worked on b while we were working on A. No problem with that—just augment the matrix by an extra column. The Gauss-Jordan method for A7! worked on n right sides at once. But most computer codes keep the two sides separate. The memory of forward elimination is held in L and U, at no extra cost in storage. Then we process b whenever we want to. The User’s Guide to LINPACK remarks that “This situation is so common and the savings are so important that no provision has been made for solving a single system with just one subroutine.” How do we process b? First, apply the forward elimination steps (which are stored in L). This changes b to a new right side c—we are really solving Lc = b. Then back substitution solves Ux = c. The original system Ax = b is factored into two triangular systems: Solve Lc=b_~ and then solve Ux=c. (6) To see that x is correct, multiply Ux = c by L. Then LU equals A and Le equals b. We have Ax = b. To emphasize: There is nothing new about those steps. This is exactly what we have done all along. Forward elimination changed Bb into c on the right side. We were really solving the triangular system Le = b by “forward substitution.’ Then back substitution produces x. An example shows it all: Example 3 Forward elimination ends at Ux = c: ¥ 2 E28 We eer oreo, bees 2u+2v=8 Sts The multiplier was 2. The left side found L, while the right side solved Le = b: = Le=b The lower triangular system 1 O es : ‘|| = | : SIV ESE ps — H : 2.6 Ux =c The upper triangular system Elimination = Factorization: i | | ee B A = LU CIVES sk 81 H : It is satisfying that L and U replace A with no extra storage space. The triangular matrices can take the n* storage locations that originally held A. The /’s fit in below the diagonal. The whole discussion is only looking to see what elimination actually did. 
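The two triangular systems in equation (6) are easy to try in MATLAB. The numbers below are only an illustration (not Example 3); the point is that backslash does forward substitution on the lower triangular L and back substitution on the upper triangular U.

L = [1 0; 2 1];      % unit lower triangular, with the multiplier 2 below the diagonal
U = [2 2; 0 3];      % upper triangular, with the pivots 2 and 3 on the diagonal
b = [8; 22];
c = L \ b            % forward substitution solves Lc = b: c = (8, 6)
x = U \ c            % back substitution solves Ux = c: x = (2, 2)
(L * U) * x          % check: A = LU times x brings back b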
The Cost of Elimination A very practical question is cost—or computing time. Can we solve a thousand equations on a PC? What if n = 10,000? Large systems come up all the time in scientific computing, where a three-dimensional problem can easily lead to a million unknowns. We can let the calculation run overnight, but we can’t leave it for 100 years. It is not hard to estimate how many individual multiplications go into elimination. Concentrate on the left side (the expensive side, where A changes to U). The first stage produces zeros below the first pivot. To change an entry in A requires one multiplication and subtraction. We will count this stage as n* multiplications and n? subtractions. It is actually less, n* —1n, because row | does not change. Now column IMistset. The next stage clears out the second column below the second pivot. The matrix is now of size n—1. Estimate by (n— 1)2 multiplications and subtractions. The matrices are getting smaller as elimination goes forward. The rough count to reach U is the sum n>+(n—1)*+---+274+2. There is an exact formula in(n + y(n + 1) for this sum is large, the 4 and the | are not important. of squares. When n The number that matters is in, and it comes immediately from the square pyramid in Figure 2.9. The base shows the n? steps for stage 1. The height shows the n stages as the working matrix gets smaller. The volume is a good estimate for the total number of steps. The volume of an n by n by n pyramid is ine: Elimination on A _requires about in? multiplications and in subtractions. What about the right side b? The first stage subtracts multiples of b; from the lower components b2,..., bn. This is n — 1 steps. The second stage takes only n — 2 steps, because b, is not involved. The last stage of forward elimination takes one step to produce c. Now start back substitution. Computing x, uses one step (divide by the last pivot). The next unknown uses two steps. The first unknown x; requires n steps (n—1 substitutions of the other unknowns, then division by the first pivot). The total count on the right side—forward to the bottom and back to the top—is exactly a (n— 1) ein— 2) fee PI 1424640 =-) +n =n’, (7) 82 Chapter 2 Solving Linear Equations ~ i + nbyn | tL * —1 by n-1 ma Ds | ls Ii / = 3 \ n f N )(= | | | | = area n” j 1 area (n—1)* \ n—-1 ee ee ee volume 5” Figure 2.9 — Elimination needs about in? multiplications and subtractions. count (n? —n)+---+(?-1) =} 3 aren Exact in. To see that sum, pair off (n—1) with 1 and (n—2) with 2. The pairings leave n terms each equal to n. That makes n*. The right side costs a lot less than the left side! Each right side (from b to c to x) needs n* 2 multiplications and n* subtractions. Here are the MATLAB codes to factor A into LU and to solve Ax = b. The program slu stops if zero appears in a pivot position (that fact is printed). Later the program plu will look down the column for a pivot, and execute a row exchange. These M-files are on the web. function [L, U] = slu(A) % slu Square lu factorization with no row exchanges. [n,n] = size(A); tol = l.e — 6; fork =n if abs(A(k, k)) < tol disp({/Small pivot in column ‘int 2 str(k)]) end % Cannot proceed without a row exchange: stop Lik ky =": fori=k+1:in L(i,k) = A@i,k)/A(k,k); =% The kth pivot is now A(k, k) LORS estkid= ert A(i, j) = A@, J) — LG, k) * AC, J); end end for = kin U(k, j) = Atk, j); % Rename eliminated A to call it U end end 2.6 Elimination = Factorization: A = LU 83 function x = slv(A, b) % Solve Ax = b using L and U from slu(A). 
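% The first loop below is forward substitution: it produces c from Lc = b.
% The second loop is back substitution: it produces x from Ux = c.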
No row exchanges! [L, U] = slu(A); % Forward elimination to solve Le = b. % L is lower triangular with 1’s on the diagonal. [n,n] = size(A); fOr keaton for j=1:k—1 s=s+LKk, j) *c(j); end c(k) = b(k) —s; end % Back substitution to solve fork =n:-—1:1 for Ux = c. % Going backwards to x(n),..., x(1) j=kK+1cn t=t+U{k, j) *x(j); end x(k) = (c(k) — t)/U(K,k); end 5 aes How long does it take to factor A into LU? For a random matrix of order n = 100, we used the MATLAB command t = clock; lu(A); time(clock, t). The elapsed time on a SUN Sparcstation 1 was one second. For n = 200 the elapsed time was eight seconds. This follows the n° rule! The time is multiplied by 8 when n is multiplied by 2. A According to this n> rule, matrices that are 10 times as large (order 1000) will take 10° seconds. Matrices of order 10,000 will take (100)? seconds. This is very ex- pensive but remember that these matrices are full. Most matrices in practice are sparse (many zero entries). In that case A = LU is very much faster. For tridiagonal matrices of order 10,000, solving Ax = b is a breeze. = REVIEW OF THE KEY IDEAS & 1. Gaussian elimination (with no row exchanges) factors A into L times U. 2. The lower triangular L contains the multiples of pivot rows that are subtracted going from A to U. The product LU adds those back to recover A. 84 Chapter 2 Solving Linear Equations On the right side we solve Le = b (forward) and Ux = c (backwards). There are 4(n3 —n) multiplication and subtractions on the left side. There are n* multiplications and subtractions on the right side. Problem Set 2.6 Problems 1-8 compute the factorization 1 A = LU (and also A= LDU). What matrix E puts A into triangular form EA = U? Multiply by factor A into LU: Ze VeRO) Ae ae ee) What two elimination matrices E~! = L to F2; and £32 put A into upper triangular form E32 E>,A = U? Multiply by E3,' and E>,’ to factor A into E5,'£3,'U which is LU. Find L and U for L Ava) 0 Brat nS 4 0 What three elimination matrices E2,, £3;, E32 put A into upper triangular form E32 E3;E2,A = U? Multiply by 12eae ES and Ee to factor A into LU where [= Engh hE ebindn’ and. Ly Ory3d AS 2 2-2 3 eats Suppose A is already lower triangular with 1’s on the diagonal. Then U = /! The elimination matrices E2,, E3;, £32 contain —a then —b then —c. (a) Multiply £323; £2; to find the single matrix E that produces EA = J. (b) Multiply Ee Ban Ex to find the single matrix L that gives A = LU (or L/). 2.6 Elimination = Factorization: A = LU 85 When zero appears in a pivot position, the factorization A = LU is not possible! (We are requiring nonzero pivots in U.) Show directly why these are both impossible: 01 2 men 3/=l1 illo ¢ Jeg Lac) 1 OL Le Ql eee min di 620g fm 1 de i This difficulty is fixed by a row exchange. Which number c leads to zero in the second pivot position? A row exchange is needed and A = LU is not possible. Which c produces zero in the third pivot position? Then a row exchange can’t help and elimination fails: > I ARO a — WN What new are L and D for this matrix A? U in A=LDU? What is U in A = LU and what is the 8 ) Fs OW 1A Z 7 NandAN) 0 A and B are symmetric across the diagonal (because 4 = 4). Find their factor- izations LDU and say how U a=|i 1s related to L: it and, > Ler4oEO =|4oi2) 4 4 0 B I fc Ss Sc 8 ao SG SS Find four conditions on a, b,c,d to get 10 A= LU with four pivots. Find L and U for the nonsymmetric matrix A= Gh a Gh oe ip id 1 dey iP & 1 ik is al Find the four conditions on a,b,c, d,r,s,t to get A= LU with four pivots. 86 Chapter 2. 
Solving Linear Equations Problems 11 11-12 use L and U (without needing A) to solve Ax = b. Solve the triangular system Le = b to find c. Then solve Ux =c 10 nel) i For safety find 12 Dh B=; (a) ih ee ik O@ 15 2 el ths to find x. What was othe a eel OmrOmeal A? ances 4 anda Dias) 6 When you apply the usual elimination steps to L, what matrix do you reach? 1 0 @ |i 1 © Ij, ip I (b) When you apply the same steps to 7, what matrix do you get? (c) When you apply the same steps to LU, what matrix do you get? If A= LDU D= D, and U = U\. “The factors are unique.” (a) Derive the equation jira aay: = D,U,U~!. triangular or diagonal? (b) Show that the main diagonals in that equation give ea) Bee and also A = L; DU, with all factors invertible, then L = L, and Are the two sides lower or upper D = D,. Why does Tridiagonal matrices have zero entries except on the main diagonal and the two adjacent diagonals. Factor these into A = LU and A= LDL!: ag Av— 16 and ik lee) ll L= 14 7A A = LU and solve Ax = Bb as usual. Circle c when you see it. Solve Lc = b to find c. Then solve Ux =c es 13 and to find x: Maas sol ame Oe a and A=|a a 0 a+b b When T is tridiagonal, its L and U factors have only two nonzero diagonals. How would you take advantage of the zeros in T in a computer code for Gaussian elimination? Find L and U. Se tS 1) & wb = or © On oOo kW 2.6 17 Elimination = Factorization: ba Si oy TOT oH tal teh Si OG GY AS) ty Oe Oeee xe After elimination has produced zeros below the first pivot, put x’s to show which blank entries are known in the final L and U: ee Se ee 68 || = Be 19 87 If A and B have nonzeros in the positions marked by x, which zeros (marked by 0) are still zero in their factors L and U? anda 18 A = LU 1 0 1b 0 @ 1 0 0 Suppose you eliminate upwards (almost unheard of). Use the last row to produce zeros in the last column (the pivot is 1). Then use the second row to produce zero above the second pivot. Find the factors in A = UL(!): Ae Nn OO — 20 Collins uses elimination WW OW mem in both directions, meeting at the center. Substitution goes out from the center. After eliminating both 2’s in A, one from above and one from below, what 4 by 4 matrix is left? Solve Ax = b his way. i 0) a0 Sul eotban() A=19 1 3 2| OL Oe 5 8 4 =I 2 21 (Important) If A has pivots 2,7,6 with no row exchanges, what are the pivots for the upper left 2 by 2 submatrix B (without row 3 and column 3)? Explain why. 22 Starting from a 3 by 3 matrix A with pivots 2,7, 6, add a fourth row and column to produce M. What are the first three pivots for M, and why? What fourth row and column are sure to produce 9 as the fourth pivot? 23 MATLAB knows the n by n matrix pascal(n). Find its LU factors for n = 4 and 5 and describe their pattern. Use chol(pascal(n)) or slu(A) above or work by hand. The row exchanges in MATLAB’s doesn’t: A = pascal(4) = | l i l lu code spoil the pattern, but chol 88 Chapter 2 24 (Careful review) For which numbers c is A = LU impossible—with three pivots? Solving Linear Equations hh= NO) AGP Loe oW 0 1 1 25 Change the program slu(A) into sldu(A), so that it produces L, D, and U. 26 Rewrite slu(A) so that the factors L and U appear in the same n? storage locations that held the original A. The extra storage used for L is not required. a7 Explain in words why x(k) is (c(k) —t)/U(k,k) 28 Write a program that multiplies triangular matrices, L times U. Don’t loop from 1 to n when you know there are zeros in the matrices. Somehow L times U should undo the operations in slu. at the end of slv(A, b). 2.7. 
Transposes and Permutations 89 TRANSPOSES AND PERMUTATIONS ag 2.7 We need one more matrix, and fortunately it is much simpler than the inverse. It is the “transpose” of A, which is denoted by A!. Its columns are the rows of A. When A is an m by n matrix, the transpose is n by m: wo then AT=| Be 0 3 4 If 2 = |e You can write the rows of A into the columns of A’. Or you can write the columns of A into the rows of A. The matrix “flips over” its main diagonal. The entry in row i, column j of A! comes from row j, column i of the original A: (A")ij = Aji. The transpose of a lower triangular matrix is upper triangular. is—-. Note MATLAB’s symbol for the transpose is A’. a row and transpose: The transpose of A‘! To enter a column vector, type v =[ 1 2 3 ]’. To enter a matrix with second column w = [4 5 6] you could define M =[v w]. Quicker to enter by rows and then transpose the whole mani: Mo = | 2 334"°5 6)7 The rules for transposes are very direct. We can transpose A+B to get (A+B)!. Or we can transpose A and B separately, and then add A! + B'—same result. The serious questions are about the transpose of a product AB and an inverse A~!: The transpose of A+B _ is Adi-pep ee (1) The transpose of AB is (AB)! = BTA’, (2) The transpose of An a6 Aas (3) = (Aja Notice especially how B'A!T comes in reverse order like B~!'A~!. for the inverse is quick: B~!A~! The proof times AB produces J. To see this reverse order for (AB), start with (Ax)!: Ax combines the columns of A while x TEA combines the rows of A’. It is the same combination of the same vectors! In A they are columns, in A! they are rows. So the transpose of the column Ax is the row x! A!. That fits our formula (Ax)? = x? AT. Now we can prove the formulas for (AB)' and (A~!)?. When columns B = [x; x2] has two columns, apply the same idea to each column. of AB are Ax; and Ax2. Their transposes are eA} and As Those The are the rows of BTA: x} At Transposing AB =| Ax; Ax2 -:: | gives uA | which is B'A!. (4) 90 = Chapter 2. Solving Linear Equations The right answer B! A! comes out a row at a time. You might like our “transparent proof” at the end of the problem set. Maybe numbers are the easiest; in this example: 422|oil| al Slo) etefoul al The rule extends to three or more factors: (ABC)? equals C'BTA?. If The pivot matrix A=LDU then A’ =U'D'L". Now apply this product rule to both sides of A-!A =I. D =D" om is the same. On one side, J" is J. On the other side, we discover that (AeAyE is the inverse of A!: A-'A=I is transposed to A!(A~!)T =J/. (5) Similarly AA~! = J leads to (A~!)'AT = JI. Notice especially: A! is invertible exactly when A is invertible. We can invert first or transpose first, it doesn’t matter. Example 1 The inverse of (Any? A=[19] is A~! =[_19]. The transpose is AT = [4 ¢]. equals E Hl which is also Gia Before leaving these rules, we call attention to dot products. The following statement looks extremely simple, but it actually contains the deep purpose for the transpose. For any vectors x and y, (Ax) y equals xTAly equals xl(Aly). (6) We can put in parentheses or leave them out. Here are three quick applications to engineering and economics. In electrical networks, x is the vector of potentials (voltages at nodes) and y is the vector of currents. Ax gives potential differences and A'y gives the total current into each node. Every vector has a meaning—the dot product is the heat loss. Chapter 8 explains in detail this important application to networks. All we say here is that nature is extremely symmetric! 
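Both the reverse-order rule (2) and the identity (6) are easy to confirm with small matrices. In this MATLAB sketch the matrices and vectors are arbitrary illustrations; the two displayed matrices agree, and the two displayed numbers agree.

A = [1 2 3; 4 5 6];    % 2 by 3
B = [1 0; 0 1; 1 1];   % 3 by 2, so AB is 2 by 2
(A * B)'               % the transpose of AB ...
B' * A'                % ... equals B transpose times A transpose (reverse order)
x = [1; 0; 2];
y = [3; 1];
(A * x)' * y           % (Ax) dot y ...
x' * (A' * y)          % ... is the same number as x dot (A transpose times y)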
The symmetry is between A and A!. In engineering and physics, x can be the displacement of a structure under a load. Then Ax is the stretching (the strain). When y is the internal stress, Aly is the external force. The equation is a statement about dot products and also a balance equation: Internal work (strain - stress) = external work (displacement - force). In economics, x gives the amounts of n outputs. The m inputs to produce those output are Ax (and A is the input-output matrix). The costs per input go into y and the values per output are in Aly: Total input cost is Ax+y (input amounts times cost per input) Total output value is x +A‘ y (output amounts times value per output) When A moves from one side of a dot product to the other side, it becomes A!. 2.7 Transposes and Permutations 91 Symmetric Matrices For a symmetric matrix—these are the most important matrices—transposing A to Ax produces no change. Then A! = A. The matrix is symmetric across the main diagonal. A symmetric matrix is necessarily square. Its (j,i) and (i, j) entries are equal. DEFINITION A symmetric matrix has AT = A. This means that aj; = aj; . Example 2 Ane E | =A! and D= F A =p A is symmetric because of the 2’s on opposite sides of the diagonal. In D those 2’s are zeros. A diagonal matrix is automatically symmetric. The inverse of a symmetric matrix is also symmetric. (We have to add: If A = A! has an inverse.) When A~! is transposed, this gives (A!)~! which is A~!. So the transpose of A~! is A~!, for a symmetric matrix A: ‘ =i) cee 1 0 —-l]_ la. il ane ae =; oa Now we show that multiplying any matrix by its transpose gives a symmetric matrix. Symmetric Products R'R and RR' and LDL™ Choose any matrix R, probably rectangular. R'R is automatically a symmetric matrix: The transpose of R'R Multiply R' times R. Then the product is R'(R")' whichis R'R. (7) That is a quick proof of symmetry for R'R. We could also look at the (i, j) entry of R'R. It is the dot product of row i of R' (column i of R) with column j of R. The (j,i) entry is the same dot product, column j with column 7. So R'R is symmetric. The matrix RR! is also symmetric. (The shapes of R and R! allow multiplication.) But RR® is a different matrix from R'R. In our experience, most scientific problems that start with a rectangular matrix R end up with R'™R or RR! or both. Example 3 R=[1 2] and ees [i and RR! =[5]. The product R'R is n by n. In the opposite order, RR' is m by m. Even if m =n, it is not likely that R'R = RR!. Equality can happen, but it is abnormal. When elimination is applied to a symmetric matrix, we hope that A! = A isan advantage. That depends on the smaller matrices staying symmetric as elimination proceeds—which 92 Chapter 2. Solving Linear Equations they do. It is true that the upper triangular U cannot be symmetric. The symmetry is in LDU. Remember how the diagonal matrix D of pivots can be divided out, to leave 1’s on the diagonal of both L and U: se Je 2 ee i ae8) eile oe a i 2 0.3 (LU misses the symmetry) panlelig. 0 eeu) Lene? (LDU Bae 52 Ons 0 U is the transpose of L. fee 1 captures the symmetry) When A is symmetric, the usual form A = LDU becomes A = LDL!. The final U (with 1’s on the diagonal) is the transpose of L (also with 1’s on the diagonal). The diagonal D—the matrix of pivots—is symmetric by itself. 2K If A = A! can be factored into LDU with no row exchanges, then U = L?. The symmetric factorization is A = LDL", Notice that the transpose of LDL" is automatically (L?)'D'L™ which is LDL™ again. 
We have a symmetric factorization for a symmetric matrix. The work of elimination is cut in half, from n?/3 multiplications to n?/6. The storage is also cut essentially in half. We only keep L and D, not U. Permutation Matrices The transpose plays a special role for a permutation matrix. This matrix P has a single silt in every row and every column. Then P! is also a permutation matrix—maybe the same or maybe different. Any product P; P2 is again a permutation matrix. We now create every P from the identity matrix, by reordering the rows of /. The simplest permutation matrix is P = I (no exchanges). The next simplest are the row exchanges P;;. Those are constructed by exchanging two rows i and j of J. Other permutations exchange three rows. By doing all possible row exchanges to /, we get all possible permutation matrices: DEFINITION An n by n permutation matrix P has the n rows of I in any order. Example 4 There are six 3 by 3 permutation matrices. 1 1 I an Py = P31 = l P32 = —~ Here they are without the zeros: 1 | P39 Pot = 1 P21 P32 = | 1 1 2.7 Transposes and Permutations 93 There are n\ permutation matrices of order n. The symbol n! stands for “n factorial,” the product of the numbers (1)(2)--- (nm). Thus 3! = (1)(2)(3) which is 6. There will be 24 permutation matrices of order 4. There are only two permutation matrices of order 2, namely [}9] and [°4 ae Important: P~' is also a permutation matrix. In the example above, the four matrices on the left are their own inverses. The two matrices on the right are inverses of each other. In all cases, a single row exchange is its own inverse. If we repeat we are back to J. But for P32 P2;, the inverses go in opposite order (of course). inverse is P; P32. The More important: P~! is always the same as P*. The four matrices on the left above are their own transposes and inverses. The two matrices on the right are transposes— and inverses—of each other. When we multiply PP!, the “1” in the first row of P hits the “1” in the first column of P!. It misses the ones in all the other columns. So PPP Another proof of P! = P~! looks at P as a product of row exchanges. A row exchange is its own transpose and its own inverse. P! is the product in the opposite order. So is P~'. So P? and P~! are the same. Symmetric matrices led to A = LDL". Now permutations lead to PA = LU. The LU Factorization with Row Exchanges We hope you remember A = LU. It started with A = Ce Sey laac (8 Every elimination step was carried out by an £;; and inverted by an BE Those inverses were compressed into one matrix L. The lower triangular L has 1’s on the diagonal, and the result is A = LU. This is a great factorization, but it doesn’t always work! Sometimes row exchanges are needed to produce pivots. Then A = (E~!.-. P-!.--E7!...Po!.--)U. Every row exchange is carried out by a P;; and inverted by that Pj;. We now compress those row exchanges into a single permutation matrix P. This gives a factorization for every invertible matrix A—which we naturally want. The main question is where to collect the Pj;’s. There are two good possibilities— do all the exchanges before elimination, or do them after the E;;’s. The first way gives PA = LU. The second way has the permutation in the middle. 1 The row exchanges can be moved to the left side, onto A. We think of them as done in advance. Their product P puts the rows of A in the right order, so that no exchanges are needed for PA. Then PA = LU. 2 We can hold all row exchanges until after elimination. 
This leaves the pivot rows in a strange order. P; puts them in the correct order in U; (upper triangular as usual). PA = LU Then A = Ly P,Uj. is constantly used in almost all computing (and always in MATLAB). will concentrate on this form PA = LU. The factorization A = L; PU, We is the right 94 Chapter 2. Solving Linear Equations one for theoretical algebra—it is more elegant. If we mention both, it is because the difference is not well known. Probably you will not spend a long time on either one. Please don’t. The most important case by far has P = 7, when A equals LU with no exchanges. For this matrix A, exchange rows | and 2 to put the first pivot in its usual place. Then go through elimination: (Oia! 2: RG" Im OF Sy Il AS ee, I 0 0 Pel 1%i1ts-> 3 FW £3; =2 The matrix PA 1h 25 Fl (ORee lana OO} O84 £32 = 3 is in good order, and it factors as usual: Aga i © |e eee By © ee) Al i @ (Om Be i O ik it || s/o mae (8) We started with A and ended with U. The only requirement is invertibility of A. 2L (PA = LU) If A is invertible, a permutation P will put its rows in the right order to factor PA into LU. There is a full set of pivots and Ax = b can be solved. In the MATLAB code, these lines exchange row k with row r below it (where the kth pivot has been found). The notation A([r k], 1:7) picks out the submatrix with these two rows and all columns | to n. AN kal ulienra) AK alpen) Ev Riot kel) ke ry kk 1). Pttr kin) = Pi alleen): sign = —sign The “sign”? of P tells whether the number of row exchanges is even (sign = +1) or odd (sign = —1). At the start, P is J and sign = +1. When there is a row exchange, the sign is reversed. The final value of sign is the determinant of P and it does not depend on the order of the row exchanges. For PA we get back to the familiar LU. This is the usual factorization. But the computer does not always use the first available pivot. An algebraist accepts a small pivot—anything but zero. A computer looks down the column for the largest pivot. (Section 9.1 explains why this reduces the roundoff error. It is called partial pivoting.) P may contain row exchanges that are not algebraically necessary. We still have PA = LU. Our advice is to understand permutations but let MATLAB do the computing. Calculations of A = LU are enough to do by hand, without P. The code splu(A) factors PA = LU and splv(A, b) solves Ax = b for any square invertible A. The program splu stops if no pivot can be found in column k. That fact is printed. 2.7. = 1. Transposes and Permutations REVIEW OF THE KEY IDEAS 95 =& | The transpose puts the rows of A into the columns of A!. Then (A!); j = Aji. 2. The transpose of AB is B'A!. The transpose of A~! is the inverse of A’. 3. The dot product (Ax)! y equals the dot product XECAS y). 4. A symmetric matrix (A? = A) has a symmetric factorization 5. A permutation matrix P has a | in each row and column, and P! = P~!, 6. If A is invertible then a permutation P will reorder its rows for PA = LU. A = LDL?. Problem Set 2.7 Questions 1-7 are about the rules for transpose matrices. 1 Find A? and A~! and (A~!)? and (A‘)~! for = 2 i © 5| and also if as|, al al Verify that (AB)! equals B'A™ but does not equal A! B!: [4] of] 2-8 3 In case AB = BA (not generally true!) how do you prove that B'AT = ATBT? 3 (a) The matrix ((AB)7!)? comes from (A-!y? and (B~')!. (b) If U is upper triangular then (U~'!)" is 4 Show that A? = 0 is possible but AA 5 (a) In what order? triangular. = 0 is not possible (unless A = zero matrix.) The row vector x! 
times A times the column y produces what number? 0 Aes: ne|e E 4] a= x'Ay=[0 0 6 (b) This is the row x'A= __ times the column y = (0, 1, 0). (c) This is the row x' =[0 When you transpose a block matrix M = Fs et the result is M? = 1] times the column Ay= it. Under what conditions on A, B, C, D is M a symmetric matrix? _ . Test 96 Chapter 2 Solving Linear Equations True or false: (a) The block matrix [9 4] is automatically symmetric. (b) If A and B are symmetric then their product AB is symmetric. (c) If A is not symmetric then A~! is not symmetric. (d) When A, B,C are symmetric, the transpose of (ABC)" is CBA. Questions 8-14 are about permutation matrices. 8 Why are there n! permutation matrices of order n? 9 If P, and Pz are permutation matrices, so is P; P23. After two permutations we still have the rows of J in some order. Give examples with P,P, # PP, and Pa P4 = Py Px. 10 There are 12 “even” permutations of (1, 2,3,4), with an even number of exchanges. Two of them are (1, 2,3,4) with no exchanges and (4, 3,2, 1) with two exchanges. List the other ten. Instead of writing each 4 by 4 matrix, use the numbers 4, 3,2, 1 to give the position of the 1 in each row. 11 Which permutation matrix makes PA upper triangular? Which permutations make P, AP» lower triangular? If you multiply A on the right by P, it exchanges the of A. > II CO xt oro OO M W 12 13 (a) Explain why the dot product of x and y equals the dot product of Px and Py. P is any permutation matrix. (b) With x = (1,2,3) and y = (1, 1,2) show that Px -y is not always equal WO ag ole S% (a) If you take powers equal to /? (b) Find a 5 by 5 permutation matrix so that the smallest power to equal / is of a permutation matrix, why is some P* eventually Pe (This is a challenge question. Combine a 2 by 2 block with a 3 by 3 block.) 14 Some permutation matrices are symmetric: Pak (a) If P sends row When (b) P! = P. Then P!P = J becomes 1 to row 4, then P! sends row to row P! = P the row exchanges come in pairs with no overlap. Find a 4 by 4 example with P! = P that moves all four rows. 2.7 Transposes and Permutations 97. Questions 15-20 are about symmetric matrices and their factorizations. 15 16 Find 2 by 2 symmetric matrices (a) A is not invertible. (b) Ai is invertible but cannot be factored into LU. (c) A can be factored into LU but not into LL!. If A=A! and B = B', which of these matrices are certainly symmetric? Gye 17. 18 19 8 (be (c) (d) Ade B)( A= B) ABA ABAB. (a) How many entries of A can be chosen independently, if A = A! is 5 by 5? (b) How (c) How many entries can be chosen if A is skew-symmetric? do L and D (still 5 by 5) give the same number of choices? (APH =A), Suppose R is rectangular (m by n) and A is symmetric (m by m). (a) Prove that R'AR (b) Prove that R'R has no negative numbers on its diagonal. is symmetric. Factor these symmetric matrices into = 20 A = A! with these properties: | im and fel After elimination clears out column 2 matrix that remains: A=|4 Dah 3 9 Ss 10 2 What shape is this matrix? A = LDL!. The pivot matrix D is diagonal: | and 2 |-l 0 A= ~+Find the PA = LU ANC Al—s il i |e Cemee Cee? 2 by € Up: L1,P,Uj. factorizations (and check them) for feat 22 0 —-1 » | below the pivot, find the symmetric Questions 21-29 are about the factorizations PA = LU and A= 21. =) 2 —-1 CO decal 8 AD OF i P| bet rage! Find a 3 by 3 permutation matrix (call it A) that needs two row exchanges to reach the end of elimination. For this matrix, what are its factors P, L, and U? 98 Chapter 2. 
Solving Linear Equations 23 Factor the following matrix into PA = LU. Factor it also into A = L;P,U; (hold the row exchange until 3 times row 1 is subtracted from row 2): A = SS Woe — NO 0O — 24 Write out P after each step of the MATLAB =|; i 25 ATG code splu, when 0 @ il © 5 6 eA— | Write out P and L after each step of the code splu when Omi Avs IO mB 26 Extend the MATLAB 27 What is the matrix L; in AsS Sy a) code splu to a code spldu which factors PA into LDU. A= L,P,U,? heatlect ss 3ehi 2A Wes Lect et POOR eae Or3y6 1 O20 Oa at OeIKO gta Des gligyyl O Pes akG Hin” 2 28 Suppose A is a permutation matrix. Then L = U = J. Explain why P is A? in PA = LU. The way to put the rows of P in the normal order (of /) is to multiply by : 29 Prove that the identity matrix cannot be the product of three row exchanges (or five). It can be the product of two exchanges (or four). 30 (a) Choose E>; to remove the 3 below the first pivot. Then multiply EAE}, to remove both 3’s: Marea) Al | oe 24 0 4 9 (b) Choose £32 to remove is going toward D=|0 PEO 20 2 0 A erat the 4 below the second pivot. Then A is reduced to D by E32E2,AE}, Ej, = D. Invert the E’s to find L in A= LDL". The next questions are about applications of the identity (Ax)'y = x™(ATy). 31. Wires go between Boston, Chicago, and Seattle. Those cities are at voltages xg, xc, x 5. With unit resistances between cities, the currents between cities are in y: y =AXY “As YBC int 0 XB Yossi 0 1 -l XC YBS At OMI ces 2.7. 32 Transposes and Permutations (a) Find the total currents A? y out of the three cities. (b) Verify that (Ax)! y agrees with xt(A! y)—six terms in both. 99 Producing x; trucks and x2 planes needs x; +50x2 tons of steel, 40x; + 1000x2 pounds of rubber, and 2x; + 50x2 months of labor. If the unit costs yj, y2, y3 are $700 per ton, $3 per pound, and $3000 per month, what are the values of a truck and a plane? Those are the components of A! y. 33 Ax gives the amounts of steel, rubber, and labor to produce x in Problem Find A. Then Ax-y is the 32. of inputs while x-ATy is the value of 34 The matrix P that multiplies (x, y, z) to give (z,x, y) is also a rotation matrix. Find P and P?. The rotation axis a = (1, 1, 1) equals Pa. What is the angle of rotation from v = (2,3, —5) to Pv = (—S, 2, 3)? 35 Witter. le 5| as the product EH and a symmetric matrix H. 36 This chapter ends with a great new factorization A = EH. Start from A = LDU, with 1’s on the diagonals of L and U. Insert C and U? to find E and H: Aue (LC)(U! DU) with 1’s on the diagonal of = of an elementary row operation matrix FE (lower triangular E) (symmetric matrix H) E = LC. What is C? Why is LC lower triangular? preseason|2 TA Talg CK ~®)&®®& vvreeseceeseeees * Transparent proof that (AB)' = B'A™. Matrices can be transposed by looking through the page from the other side. Hold up to the light and practice with B 100 = Chapter 2. Solving Linear Equations below. Its column becomes a row in B!. To see better, draw the figure with heavy lines on thin paper and turn it over so the symbol B! is upright. The three matrices are in position for matrix multiplication: the row of A times the column of B gives the entry in AB. Looking from the reverse side, the row of B! times the column of A! gives the correct entry in B'AT = (AB)!. VECTONSE AGES AND SUBSEAGEs SPACES OF VECTORS a 3.1 To a newcomer, matrix calculations involve a lot of numbers. To you, they involve vectors. The columns of Ax and AB are linear combinations of n vectors—the columns of A. 
This chapter moves from numbers and vectors to a third level of understanding (the highest level). Instead of individual columns, we look at “spaces” of vectors. Without seeing vector spaces and especially their subspaces, you haven’t understood everything about Ax = b. Since this chapter goes a little deeper, it may seem a little harder. That is natural. We are looking inside the calculations, to find the mathematics. The author’s job is to make it clear. These pages go to the heart of linear algebra. We begin with the most important vector spaces. They are denoted by R!, R?2, R?, R4, .... Each space R” consists of a whole collection of vectors. R> contains all column vectors with five components. DEFINITION This is called “S-dimensional space.” The space R"” consists of all column vectors v with n components. The components of v are real numbers, which is the reason for the letter R. A vector whose n components are complex numbers lies in the space C”. The vector space R? is represented by the usual xy plane. Each vector v in R? has two components. The word “space” asks us to think of all those vectors—the whole plane. Each vector gives the x and y coordinates of a point in the plane. Similarly the vectors in R? correspond to points (x, y, z) in three-dimensional space. The one-dimensional space R! is a line (like the x axis). As before, we print vectors as a column between brackets, or along a line using commas and parentheses: 101 102 Chapter 3 Vector Spaces and Subspaces 4 0| isin R3, (1,1,0,1,1) isinRD, fies is in C2. The great thing about linear algebra is that it deals easily with five-dimensional space. We don’t draw the vectors, we just need the five numbers (or n numbers). To multiply v by 7, multiply every component by 7. Here 7 is a “scalar.” To add vectors in R°, add them a component at a time. The two essential vector operations go on inside the vector space: We can add any two vectors in R", and we can multiply any vector in R" by any scalar. “Inside the vector space” means that the result stays in the space. If v is the vector in R* with components 1, 0, 0, 1, then 2v is the vector in R‘ with components 2, 0, 0, 2. (In this case 2 is the “scalar.’) A whole series of properties can be verified in R”. The commutative law is v-+ w = w+; the distributive law is c(v-+w) = cv+cw. There is a unique “zero vector” satisfying 0+ v = v. Those are three of the eight conditions listed at the start of the problem set. These eight conditions are required of every vector space. There are vectors other than column vectors, and vector spaces other than R”, and they have to obey the eight reasonable rules. A real vector space is a set of “vectors” together with rules for vector addition and for multiplication by real numbers. The addition and the multiplication must produce vectors that are in the space. And the eight conditions must be satisfied (which is usually no problem). Here are three vector spaces other than R”: M_ The vector space of all real 2 by 2 matrices. F Z_ The vector space of all real functions f (x). The vector space that consists only of a zero vector. In M the “vectors” are really matrices. In F the vectors are functions. In Z the only addition is 0+ 0 = 0. In each case we can add: matrices to matrices, functions to functions, zero vector to zero vector. We can multiply a matrix by 4 or a function by 4 or the zero vector by 4. The result is still in M or F or Z. The eight conditions are all easily checked. 
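Those closure properties are easy to try out. The following MATLAB sketch (with sample vectors and matrices, chosen only for illustration) adds and rescales vectors in R^4 and 2 by 2 matrices in M. The results stay in R^4 and in M, and the commutative and distributive laws check out exactly for these entries.

    % Staying inside R^4 and inside M (the 2 by 2 matrices)
    v = [1; 0; 0; 1];  w = [3; -2; 1; 0];   % two vectors in R^4
    disp((2*v)')                            % 2v = (2, 0, 0, 2), still in R^4
    disp((v + w)')                          % v + w, still in R^4
    disp(norm((v + w) - (w + v)))           % commutative law: prints 0
    disp(norm(3*(v + w) - (3*v + 3*w)))     % distributive law: prints 0
    A = [1 2; 3 4];  B = [0 1; 1 0];        % two "vectors" in the space M
    disp(A + B)                             % the sum is again a 2 by 2 matrix
    disp(4*A)                               % so is every scalar multiple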
The space Z is zero-dimensional (by any reasonable definition of dimension). It is the smallest possible vector space. We hesitate to call it R°, which means no components—you might think there was no vector. The vector space Z contains exactly one vector (zero). No space can do without that zero vector. Each space has its own zero vector—the zero matrix, the zero function, the vector (0,0, 0) in R?. Subspaces At different times, we will ask you to think of matrices and functions as vectors. But at all times, the vectors that we need most are ordinary column vectors. They are vectors 3.1 Spaces of Vectors 103 00 10 e@ 01 bd Figure 3.1. smallest vector space f; “Four-dimensional” matrix space M. The “zero-dimensional” space Z. with n components—but maybe not all of the vectors with n components. important vector spaces inside R". There are Start with the usual three-dimensional space R*?. Choose a plane through the origin (0, 0,0). That plane is a vector space in its own right. If we add two vectors in the plane, their sum is in the plane. If we multiply an in-plane vector by 2 or —5S, it is still in the plane. The plane is not R* (even if it looks like R*). The vectors have three components and they belong to R°?. The plane is a vector space inside R°*. This illustrates one of the most fundamental ideas in linear algebra. is a subspace of the full vector space R°. The plane DEFINITION A subspace of a vector space is a set of vectors (including 0) that satisfies two requirements: If v and w are vectors in the subspace and c is any scalar, then (i) v-+w is inthe subspace and _ (ii) cv is in the subspace. In other words, the set of vectors is “closed” under addition v + w and multiplication cv (and cw). Those operations leave us in the subspace. We can also subtract, because —w is in the subspace and its sum with v is v — w. In short, all linear combinations stay in the subspace. All these operations follow the rules of the host space, so the eight required conditions are automatic. We just have to check the requirements (i) and (ii) for a subspace. First fact: Every subspace contains the zero vector. The plane in R® has to go through (0, 0,0). We mention this separately, for extra emphasis, but it follows directly from rule (ii). Choose c = 0, and the rule requires Ov to be in the subspace. Planes that don’t contain the origin fail those tests. When v is on such a plane, —v and Ov are not on the plane. A plane that misses the origin is not a subspace. Lines through the origin are also subspaces. When we multiply by 5, or add two vectors on the line, we stay on the line. But the line must go through (0, 0, 0). 104 = Chapter 3 Vector Spaces and Subspaces Another space is all of R*. The whole space is a subspace (of itself). Here is a list of all the possible subspaces of R*: (L) Any line through (0, 0, 0) (R>) The whole space (P) Any plane through (0, 0, 0) (Z) The single vector (0, 0, 0) If we try to keep only part of a plane or line, the requirements for a subspace don’t hold. Look at these examples in R?. Example 1 Keep only the vectors (x, y) whose components are positive or zero (this is a quarter-plane). The vector (2,3) is included but (—2, —3) is not. So rule (ii) is violated when we try to multiply by c = —1. The quarter-plane is not a subspace. Example 2 Include also the vectors whose components are both negative. Now we have two quarter-planes. Requirement (ii) is satisfied; we can multiply by any c. But rule (i) now fails. 
The sum of v = (2,3) and w = (—3, —2) is (—1, 1), which is outside the quarter-planes. Two quarter-planes don’t make a subspace. Rules (i) and (ii) involve vector addition v+w and multiplication by scalars like c and d. The rules can be combined into a single requirement—the rule for subspaces: A subspace containing v and w must contain all linear combinations cv + dw. Example 3 Inside the vector space M of all 2 by 2 matrices, here are two subspaces: (U) All upper triangular matrices b i Al (D) All diagonal matrices k a: Add any two matrices in U, and the sum is in U. Add diagonal matrices, and the sum is diagonal. In this case D is also a subspace of U! Of course the zero matrix is in these subspaces, when a, b, and d all equal zero. To find a smaller subspace of diagonal matrices, we could require a = d. The matrices are multiples of the identity matrix 7. The sum 2/ + 3/ is in this subspace, and so is 3 times 4/. It is a “line of matrices” inside M and U and D. Is the matrix J a subspace by itself? Certainly not. Only the zero matrix is. Your mind will invent more subspaces of 2 by 2 matrices—write them down for Problem 5. The Column Space of A The most important subspaces are tied directly to a matrix A. We are trying to solve Ax = b. If A is not invertible, the system is solvable for some b and not solvable for other b. We want to describe the good right sides b—the vectors that can be written as A times some vector x. 3.1 Spaces of Vectors 105 Remember that Ax is a combination of the columns of A. To get every possible b, we use every possible x. So start with the columns of A, and take all their linear combinations. This produces the column space of A. It is a vector space made up of column vectors—not just the n columns of A, but all their combinations Ax. By taking all combinations of the columns, we fill out a vector space. DEFINITION Thecolumn space of A consists of all linear combinations of the columns. The combinations are all possible vectors Ax. To solve Ax = b is to express b as a combination of the columns. The vector b on the right side has to be in the column space produced by A on the left side. 3A The system Ax = Bb is solvable if and only if b is in the column space of A. When Bb is in the column space, it is a combination of the columns. cients in that combination give us a solution x to the system Ax = b. The coeffi- Suppose A is an m by n matrix. Its columns have m components (not 7). So the columns belong to R”. The column space of A is a subspace of R”. The set of all column combinations Ax satisfies rules (i) and (ii): When we add linear combina- tions or multiply by scalars, we still produce combinations of the columns. The word “subspace” is justified. Here is a 3 by 2 matrix A, whose column space is a subspace of R*. It is a plane. Example 4 The column space consists of all combinations of the two columns—any x; times the first column plus any x2 times the second column. Those combinations fill up a plane in R> (Figure 3.2). If the right side b lies on that plane, then it is one of the combinations and (x1, x2) is a solution to Ax = b. The plane has zero thickness, so it is more likely that b is not in the column space. Then there is no solution to our 3 equations in 2 unknowns. Of course (0,0, 0) is in the column space. The plane passes through the origin. There is certainly a solution to Ax = 0. That solution, always available, is x = To repeat, the attainable right sides b are exactly the vectors in the column space. 
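A quick numerical test of 3A, using a stand-in 3 by 2 matrix (the entries below are an assumption for illustration, not the matrix of Example 4): appending b to A leaves the rank unchanged exactly when b lies in the column space, so Ax = b is solvable.

    % Is b in the column space of A?  (stand-in matrix for illustration)
    A = [1 0; 4 3; 2 3];                 % two independent columns: C(A) is a plane in R^3
    b1 = 2*A(:,1) + A(:,2);              % a combination of the columns, so attainable
    b2 = [1; 0; 0];                      % not a combination of these two columns
    disp([rank(A), rank([A b1]), rank([A b2])])   % prints 2 2 3
    % rank([A b]) equals rank(A) exactly when b is in the column space (Ax = b solvable)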
One possibility is the first column itself—take x; = | and x2 = 0. Another combination is the second column—take x; = 0 and x2 = 1. The new level of understanding is to see all combinations—the whole subspace is generated by those two columns. 106 Chapter 3 Vector Spaces and Subspaces Figure 3.2 The column space C(A) is a plane containing the two columns. Ax = Db is solvable when Bb is on that plane. Then b is a combination of the columns. Notation The column space of A is denoted by C(A). take all their linear combinations. Start with the columns and Example 5 Describe the column spaces (they are subspaces of R’) for 0 =; | and 192 a=|) | and | ae} Be; 0 “A Solution The column space of J is the whole space R’. Every vector is a combination of the columns of J. In vector space language, C(/) is R?. The column space of A is only a line. The second column (2,4) is a multiple of the first column (1, 2). Those vectors are different, but our eye is on vector spaces. The column space contains (1,2) and (2,4) and all other vectors (c,2c) along that line. The equation Ax = b is only solvable when B is on the line. The third matrix (with three columns) places no restriction on b. The column space C(B) is all of R?. Every b is attainable. The vector b = (5,4) is column 2 plus column 3, so x can be (0, 1,1). The same vector (5,4) is also 2(column 1) + column 3, so another possible x is (2,0, 1). This matrix has the same column space as I—any b is allowed. But now x has more components and there are more solutions. The next section creates another vector space, to describe all the possible solutions of Ax = 0. This section created the column space, to describe all the attainable right sides b. 3.1. =» 1. REVIEW OF THE KEYIDEAS R” and C” and M Spaces of Vectors 107 =& (matrices) and F (functions) and Z (zero vector alone) are vector spaces. 2. 3. 4. A space containing v and w must contain all thir combinations cv + dw. | The combinations of the columns of A form the column space of A. The system Ax = b is solvable exactly when B is in the column space of A. Problem Set 3.1 The first problems are about vector spaces in general. The vectors in those spaces are not necessarily column vectors. In the definition of a vector space, vector addition x + y and scalar multiplication cx are required to obey the following eight rules: ak 1 ey = Vox @) x+(y+zZ)=@+y)+2z (3) There is a unique “zero vector” such that x +0 =x (4) For each x there is a unique vector —x (5) 1 times x equals x (6) (cic2)x = c1(c2x) (Dy <c(x yy) = cx (8), (eq ica) MCX -1'CQX. for all x such that x + (-x) =0 icy Suppose the sum (x1, x2) + (y1, y2) is defined to be (x; + y2,x2 + y1). With the usual multiplication cx = (cx1,cx2), which of the eight conditions are not satisfied? 2 Suppose the multiplication cx is defined to produce (cx), 0) instead of (cx1, cx2). With the usual addition in R*, are the eight conditions satisfied? 3 (a) Which rules are broken if we keep only the positive numbers x > 0 in R!? Every c must be allowed. The half-line is not a subspace. (b) The positive numbers with x + y and cx redefined to equal the usual xy and x° do satisfy the eight rules. Test rule 7 when c =.3,x = 2;y = (Then x + y = 2 and cx = 8.) Which number is the “zero vector”? vector “—2” is the number _ 4 1. The The matrix A = [273] is a “vector” in the space M of all 2 by 2 matrices. Write down the zero vector in this space, the vector 14, and the vector —A. What matrices are in the smallest subspace containing A? 108 = Chapter 3. 
Vector Spaces and Subspaces 5 (a) Describe a subspace of M that contains (b) If a subspace of M contains A and B, must it contain /? (c) Describe a subspace of M that contains no nonzero diagonal matrices. A=[49] but not B=[9§ _?]. 6 The functions f(x) = x? and g(x) = 5x are “vectors” in F. This is the vector space of all real functions. (The functions are defined for —oo < x < oo.) The combination 3 f(x) — 4g(x) is the function h(x) = 7 Which rule is broken if multiplying f(x) by c gives the function f(cx)? the usual addition f(x) + g(x). 8 If the sum of the “vectors” f(x) and g(x) is defined to be the function f(g(x)), then the “zero vector” is g(x) = x. Keep the usual scalar multiplication c f(x) and find two rules that are broken. Questions 9-18 are about the “subspace requirements”: in the subspace. 9 10 11. Keep x + y and cx must stay One requirement can be met while the other fails. Show this with (a) A set of vectors in R* for which x + y stays in the set but 5x may be outside. (b) A set of vectors in R? (other than two quarter-planes) for which every cx stays in the set but x + y may be outside. Which of the following subsets of R? are actually subspaces? (a) The plane of vectors (b1, b2, b3) with b, = 0. (b) The plane of vectors with bj = 1. (c) The vectors with bj b2b3 = 0. (d) All linear combinations of v = (1,4,0) and w = (2, 2, 2). (e) All vectors that satisfy b; + b2 + b3 = 0. (f) All vectors with b; < bo < bz. Describe the smallest subspace of the matrix space M that contains @ iNet ik a a (b) Eae ol 0 1 i A ) EPO a and ihae| 12 Let P be the plane in R*® with equation x + y — 2z = 4. Find two vectors in P and check that their sum is not in P. 13 Let Po be the plane through (0, 0,0) parallel to the previous plane P. What is the equation for Po? Find two vectors in Po and check that their sum is in Po. 3.1 14 15 Spaces of Vectors 109 The subspaces of R? are planes, lines, R? itself, or Z containing (0, 0, 0). (a) Describe the three types of subspaces of R?. (b) Describe the five types of subspaces of R*. (a) The intersection of two planes through (0, 0,0) is probably a (b) The intersection of a plane through (0, 0,0) with a line through (0, 0, 0) is probably a (c) If S and T are subspaces of R>, prove that S T (the set of vectors in both subspaces) is a subspace of R°. Check the requirements on x + y and 6x 16 Suppose P is a plane through (0,0,0) and L is a line through (0,0,0). smallest vector space containing both P and L is either or 17 (a) Show that the set of invertible matrices in M is not a subspace. (b) Show that the set of singular matrices in M is not a subspace. 18 The True or false (check addition in each case by an example): (a) The symmetric matrices in M (with A! = A) form a subspace. (b) The skew-symmetric (c) The unsymmetric matrices in M (with A! # A) form a subspace. matrices in M (with A! = —A) form a subspace. Questions 19-27 are about column spaces C(A) and the equation Ax = b. 19 Describe the column spaces (lines or planes) of these particular matrices: A= 20 and Bis ho ) 0 OC ands @C= igaele, e200) 0 0 For which right sides b do these systems have solutions? (a) ae DS —1 21 a2 1070 issn 8 -4 Sami edUe b P| EN aay: —2]| | x3 b3 Leesa (b) 2 —1| es 9 j4=| b by -4 Adding row | of A to row 2 produces B. Adding column | to column 2 produces C. A combination of the columns of ____ is also a combination of the columns of A. 
Those two matrices have the same column ee his i and Ibn Bele a and ie ho 3 al 110 = Chapter 3 ZZ Vector Spaces and Subspaces For which vectors ie a | ) ih ol OOM (b;, bz, b3) do these systems have a solution? x] by coy || = |) x3 b3 and 23 itl 0 O ORO hye Oth O70" and rs ll 0 x] b, 53 ||= || ea X32 b3 ae bj 5% || = || lap Sea b3 If we add an extra column Bb to a matrix A, then the column space gets larger unless _____. Give an example where the column space gets larger and an example where it doesn’t. Why is Ax = B solvable exactly when the column spaces are the same for A and [A 6]? 24 25 The columns of AB are combinations of the columns of A. The column space of AB is contained in (possibly equal to) the column space of . Give an example where those two column spaces are not equal. Suppose Ax = b and Ay = b* are both solvable. Then Az = b+ b* is solvable. What is z? This translates into: If b and b* are in the column space C(A), then 26 If A is any 5 by 5 invertible matrix, then its column space is Daa True or false (with a counterexample if false): 28 . Why? (a) The vectors b that are not in the column space C(A) form a subspace. (b) If C(A) contains only the zero vector, then A is the zero matrix. (c) The column space of 2A equals the column space of A. (d) The column space of A — J equals the column space of A. Construct a 3 by 3 matrix whose column space contains (1, 1,0) and (1,0, 1) but not (L511): 3.2 The Nullspace of A: Solving Ax=0 THE NULLSPACE OF A: SOLVING 111 Ax = 0 o 3.2 This section is about the space of solutions to Ax = 0. The matrix A can be square or rectangular. One immediate solution is x = 0—and for invertible matrices this is the only solution. For other matrices, not invertible, there are nonzero solutions to Ax = 0. Each solution x belongs to the nullspace of A. We want to find all solutions and identify this very important subspace. DEFINITION =The nullspace of A consists of all solutions to Ax = 0. These vectors x are in R”. The nullspace containing all solutions x is denoted by N(A). Check that the solution vectors form a subspace. Suppose x and y are in the nullspace (this means Ax = 0 and Ay = 0). The rules of matrix multiplication give A(x + y) = 0+0. The rules also give A(cx) = c0. The right sides are still zero. Therefore x + y and cx are also in the nullspace N(A). Since we can add and multiply without leaving the nullspace, it is a subspace. To repeat: The solution vectors x have n components. They are vectors in R", so the nullspace is a subspace of R". The column space C(A) is a subspace of R’”. If the right side b is not zero, the solutions of Ax = b do not form a subspace. The vector x = 0 is only a solution if b = 0. When the set of solutions does not include x = 9, it cannot be a subspace. Section 3.4 will show how the solutions to Ax = b (if there are any solutions) are shifted away from the origin by one particular solution. Example 1 The equation x + 2y +3z = Ocomes from the 1 by 3 matrix A=[1 2 3]. This equation produces a plane through the origin. The plane is a subspace of R?. It is the nullspace of A. The solutions to x + 2y + 3z = 6 also form a plane, but not a subspace. 2 Example 2 Describe the nullspace of A = E ‘| Solution Apply elimination to the linear equations Ax = 0: pete 3x; + 6x2 = 0 a ier O70 There is really only one equation. The second equation is the first equation multiplied by 3. In the row picture, the line x; + 2x2 = 0 is the same as the line 3x; + 6x2 = 0. That line is the nullspace N(A). 
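A short MATLAB check of that elimination (nothing here beyond the built-in rref): the second row of A is eliminated completely, so only one equation and one pivot survive.

    % Only one independent equation in A = [1 2; 3 6]
    A = [1 2; 3 6];
    disp(rref(A))        % [1 2; 0 0]: one pivot, and the single equation x1 + 2x2 = 0
    disp(rank(A))        % rank 1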
To describe this line of solutions, the efficient way is to give one point on it (one special solution). Then all points on the line are multiples of this one. We choose the second component to be x2 = 1 (a special choice). From the equation x; + 2x2 = 0, the 114 Chapter 3 Vector Spaces and Subspaces combination” or “trivial combination”. The solution is trivial (just x = 0) but the idea is not trivial. This case of a zero nullspace Z is of the greatest importance. It says that the columns of A are independent. No combination of columns gives the zero vector except the zero combination. All columns have pivots and no columns are free. You will see this idea of independence again .... Solving Ax = 0 by Elimination This is important. We solve m equations in n unknowns—and the right sides are all zero. The left sides are simplified by row operations, after which we read off the solution (or solutions). Remember the two stages in solving Ax = 0: 1. Forward elimination from A to a triangular U (or its reduced form R). 2. Back substitution in Ux = 0 or Rx = 0 to find x. You will notice a difference in back substitution, when A and U have fewer than n pivots. We are allowing all matrices in this chapter, not just the nice ones (which are square matrices with inverses). Pivots are still nonzero. The columns below the pivots are still zero. But it might happen that a column has no pivot. In that case, don’t stop the calculation. Go on to the next column. The first example is a 3 by 4 matrix: |Ne ee Sar Ae ge Soa) OG) LOT ls Certainly aj; = | is the first pivot. Clear out the 2 and 3 below that pivot: A>|0 |Wag Wage es | 0 4 4 O--0) “4-54 (subtract 2 x (subtract 3 x row row 1) 1) The second column has a zero in the pivot position. We look below the zero for a nonzero entry, ready to do a row exchange. The entry below that position is also zero. Elimination can do nothing with the second column. This signals trouble, which we expect anyway for a rectangular matrix. There is no reason to quit, and we go on to the third column. The second pivot is 4 (but it is in the third column). Subtracting row 2 from row 3 clears out that column below the pivot. We arrive at Loft U=10 0 (only two pivots) 0 4 4 (the last equation G0 0 0 became 0 = 0) 3.2. The fourth column The Nullspace of A: Solving Ax =0 115 also has a zero in the pivot position—but nothing can be done. There is no row below it to exchange, and forward elimination is complete. The matrix has three rows, four columns, and only two pivots. The original Ax = 0 seemed to involve three different equations, but the third equation is the sum of the first two. It is automatically satisfied (0 = 0) when the first two equations are satisfied. Elimination reveals the inner truth about a system of equations. Now comes back substitution, to find all solutions to Ux = 0. With four unknowns and only two pivots, there are many solutions. The question is how to write them all down. A good method is to separate the pivot variables from the free variables. ls The pivot variables are x; and x3, since columns 1| and 3 contain pivots. F_ The free variables are x2 and x4, because columns 2 and 4 have no pivots. The free variables x2 and x4 can be given any values whatsoever. Then back substitu- tion finds the pivot variables x; and x3. (In Chapter 2 no variables were free. When A is invertible, all variables are pivot variables.) The simplest choices for the free variables are ones and zeros. Those choices give the special solutions. 
Special Solutions

• Set x2 = 1 and x4 = 0. By back substitution x3 = 0 and x1 = −1.
• Set x2 = 0 and x4 = 1. By back substitution x3 = −1 and x1 = −1.

These special solutions solve Ux = 0 and therefore Ax = 0. They are in the nullspace. The good thing is that every solution is a combination of the special solutions.

Complete Solution

            [ −1 ]        [ −1 ]       [ −x2 − x4 ]
    x = x2  [  1 ]  + x4  [  0 ]   =   [    x2    ]        (1)
            [  0 ]        [ −1 ]       [   −x4    ]
            [  0 ]        [  1 ]       [    x4    ]
           special       special        complete

Please look again at that answer. It is the main goal of this section. The vector s1 = (−1, 1, 0, 0) is the special solution when x2 = 1 and x4 = 0. The second special solution has x2 = 0 and x4 = 1. All solutions are linear combinations of s1 and s2. The special solutions are in the nullspace N(A), and their combinations fill out the whole nullspace.

The MATLAB code nulbasis computes these special solutions. They go into the columns of a nullspace matrix N. The complete solution to Ax = 0 is a combination of those columns. Once we have the special solutions, we have the whole nullspace.

There is a special solution for each free variable. If no variables are free (this means there are n pivots), then the only solution to Ux = 0 and Ax = 0 is the trivial
Here is a 4 by 7 echelon matrix with the three pivots highlighted in boldface: = Cox cus So Oss ss oOOs 2 Soo <i So = as yy Question Three pivot variables x1, x2, x6 Four free variables x3, x4, x5, x7 Four special solutions in N(U) What are the column space and the nullspace for this matrix? Answer The columns have four components so they lie in R*. (Not in R?!) The fourth component of every column is zero. Every combination of the columns—every vector in the column space—has fourth component zero. The column space C(U) consists of all vectors of the form (b1, bz, b3, 0). For those vectors we can solve Ux = b by back substitution. These vectors b are all possible combinations of the seven columns. The nullspace N(U) is a subspace of R’. The solutions to Ux = 0 are all the combinations of the four special solutions—one for each free variable: 3.2 The Nullspace of A: Solving Ax=0 1. Columns 2. Set one free variable to 1 and set the other free variables to zero. 3. Solve Ux = 0 for the pivot variables x1, x2, x6. 4. This gives one of the four special solutions in the nullspace matrix N. 117 3, 4,5,7 have no pivots. So the free variables are x3, x4, X5, X7. The nonzero rows of an echelon matrix come first. The pivots are the first nonzero entries in those rows, and they go down in a staircase pattern. The usual row operations (in the Teaching Code plu) produce a column of zeros below every pivot. Counting the pivots leads to an extremely important theorem. Suppose A has more columns than rows. With n > m there is at least one free variable. The system Ax = 0 has at least one special solution. This solution is not zero! 3B If Ax =0 has more unknowns than equations (A has more columns than rows), then it has nonzero solutions. In other words, a short wide matrix (n > m) always has nonzero vectors in its nullspace. There must be at least n — m free variables, since the number of pivots cannot exceed m. (The matrix only has m rows, and a row never has two pivots.) Of course a row might have no pivot—which means an extra free variable. But here is the point: When there is a free variable, it can be set to 1. Then the equation Ax = 0 has a nonzero solution. To repeat: There are at most m pivots. With n > m, the system Ax = 0 has a free variable and a nonzero solution. Actually there are infinitely many solutions, since any multiple cx is also a solution. The nullspace contains at least a line of solutions. With two free variables, there are two special solutions and the nullspace is even larger. The nullspace is a subspace. Its “dimension” is the number of free variables. This central idea—the dimension of a subspace—is defined and explained in this chapter. The Reduced Echelon Matrix R From the echelon matrix U we can go one more step. Continue onward from We can divide the second row by 4. Then both pivots equal 1. We can subtract 2 times this new row [0 0 1 1] from the row above. That produces a zero above the second pivot as well as below. The reduced row echelon matrix is i eenree) Re | Fao LOPa Ha OOo: OexO 118 = Chapter 3 Vector Spaces and Subspaces R has 1’s as pivots. It has 0’s everywhere else in the pivot columns. Zeros above pivots come from upward elimination. If A is invertible, its reduced row echelon form is the identity matrix R = I. This is the ultimate in row reduction. The zeros in R make it easy to find the special solutions (the same as before): 1. Set xo = 1 end x4 =—0. 
Solve Rve= Oy Then) —— Dom aScte = Orandiv——sl Solver hci 0) —!—landicat— sheng» 1 andes, — 0: ee The numbers —1 and 0 are sitting in column 2 of R (with plus signs). The numbers —1 and —1 are sitting in column 4 (with plus signs). By reversing signs we can read off the special solutions from the matrix R. The general solution to Ax = 0 or Ux = 0 or Rx = 0 is a combination N(U) = N(R) contains 18 SSE =i l 0 0 of those two special solutions: The nullspace N(A) = —1 + x4 = (complete solution of Ax = 90). =I) 1 The next section of the book moves firmly from U to R. The MATLAB command [ R, pivcol |= rref(A) produces R and also a list of the pivot columns. » REVIEW OF THE KEY IDEAS 8 1. The nullspace N(A) contains all solutions to Ax = 0. 2. Elimination produces an echelon columns and free columns. 3. Every free column leads to a special solution to Ax = 0. The free variable equals 1 and the other free variables equal 0. 4. The complete solution to Ax = 0 is a combination of the special solutions. 5. Ifn>m then A has at least one column without pivots, giving a special solution. So there are nonzero vectors x in the nullspace of this A. matrix U, or a row reduced R, with pivot Problem Set 3.2 Questions 1-8 are about the matrices in Problems 1 and 5. 1 Reduce these matrices to their ordinary echelon forms U: te (a) he Aa 28 4. 0 BS Os) sie ee Ong? 3.2. The Nullspace of A: Solving Ax =0 (Db) Ac 119 Sor sf op Ny of Which are the free variables and which are the pivot variables? For the matrices in Problem 1, find a special solution for each free variable. the free variable to 1. Set the other free variables to zero.) (Set By combining the special solutions in Problem 2, describe every solution to Ax = 0. The nullspace of A contains only the vector x = 0 when By further row operations on each U in Problem 1, find the reduced echelon form R. The nullspace of R is the nullspace of U. By row operations reduce A to its echelon form U. Write down a 2 by 2 lower triangular L such that A = LU. eh =1 3° 5 eS 6 al ee = 34085 EB 6 aI For the matrices in Problem 5, find the special solutions to Ax = 0. For an m by n matrix, the number of pivot variables plus the number of free variables is In Problem 5, describe the nullspace of each A in two ways. Give the equations for the plane or line N(A), and give all vectors x that satisfy those equations (combinations of the special solutions). Reduce the echelon forms U in Problem 5 to R. For each R draw a box around the identity matrix that is in the pivot rows and pivot columns. Questions 9-17 are about free variables and pivot variables. 9 10 True or false (with reason if true and example if false): (a) A square matrix has no free variables. (b) An invertible matrix has no free variables. (c) An m by n matrix has no more than n pivot variables. (d) An _m by n matrix has no more than m pivot variables. Construct 3 by 3 matrices A to satisfy these requirements (if possible): (a) A has no zero entries but U = /. (b) A has no zero entries but R = /. (c) A has no zero entries but R= (dd) A=U=ZK. U. 120 Chapter 3. Vector Spaces and Subspaces 11 Put 0’s and x’s (for zeros and nonzeros) in a 4 by 7 echelon matrix U so that the pivot variables are (a) 12 2, 4,.5 (D) = Son 7, (c) 4 and 6. Put 0’s and 1’s and x’s (zeros, ones, and nonzeros) in a 4 by 8 reduced echelon matrix R so that the free variables are (a) 2,°4,.5,0 (Dyin Leo Ones 13 Suppose column 4 of a 3 by 5 matrix is all zero. Then xq is certainly a variable. 
The special solution for this variable is the vector x = 14 Suppose the first and last columns of a 3 by 5 matrix are the same (not zero). Then is a free variable. The special solution for this variable is 5 i= 15 Suppose an m by n matrix has r pivots. The number . The nullspace contains only x = 0 when r = is all of R” when r = 16 The nullspace of a 5 by 5 matrix contains only x = 0 when the matrix has pivots. The column space is R°® when there are pivots. Explain why. 17 The equation x — 3y — z = 0 determines a plane in R*. What is the matrix A in this equation? Which are the free variables? The special solutions are (3, 1, 0) and 18 The plane x — 3y — z = 12 is parallel to the plane x — 3y — z = O in Problem 17. One particular point on this plane is (12, 0,0). All points on the plane have of special solutions is . The column space the form (fill in the first components) II SS aa << pa + as) © = &NS 19 If x is in the nullspace of B, prove that x is in the nullspace of AB. This means: If Bx = 0 then . Give an example in which these nullspaces are different. 20 If A is invertible then N(AB) equals N(B). second part: If ABx = 0 then Bx = 0. This means that Ux = 0 whenever that L is triangular but that L is LUx Following Problem 19, prove this = 0 (same nullspace). The key is not Questions 21-28 ask for matrices (if possible) with specific properties. 21 Construct a matrix avoxa! (3), Ile O), I). whose nullspace consists of all combinations of (2, 2, 1, 0) 3.2. The Nullspace of A: Solving Ax=0 121 22 Construct a matrix whose nullspace consists of all multiples of (4, 3, 2, 1). 23 Construct a matrix whose column space contains (1, 1,5) and (0, 3, 1) and whose nullspace contains (1, 1, 2). 24 Construct a matrix whose column space contains (1, 1,0) and (0, 1, 1) and whose nullspace contains 25 (1,0, 1) and (0,0, 1). Construct a matrix whose column space contains (1, 1, 1) and whose nullspace is the line of multiples of (1, 1, 1, 1). 26 Construct a 2 by 2 matrix whose nullspace equals its column space. This is possible. a7 Why does no 3 by 3 matrix have a nullspace that equals its column space? 28 If AB = 0 then the column space of B is contained in the ____—soff A. Give an example. 29 The reduced form R of a 3 by 3 matrix with randomly chosen entries is almost sure to be _. What R is most likely if the random A is 4 by 3? 30 Show by example that these three statements are generally false: 31 (a) A and A! have the same nullspace. (b) A and A! have the same free variables. (c) A and A! have the same pivots. (The matrix may need a row exchange.) A and A! do have the same number of pivots. This will be important. What is the nullspace matrix N (containing the special solutions) for A, B, C? A=[J 1] and bead Boj 0 sinel (C a=Il. Sy If the nullspace of A consists of all multiples of x = (2, 1,0, 1), how many pivots appear in U? 33 If the columns of N are the special solutions to Rx = 0, what are the nonzero rows of R? 34 (a) What are the five 2 by 2 reduced echelon matrices R whose entries are all 0’s and 1’s? (b) What are the eight | by 3 matrices containing only 0’s and 1’s? eight of them reduced echelon matrices? Are all 122 Chapter 3. Vector Spaces and Subspaces THE RANK AND THE ROW REDUCED FORM as 3.3 This section completes the step from A to its reduced row echelon form R. The matrix A is m by n (completely general). 
The matrix R is also m by n, but each pivot column has only one nonzero entry (the pivot which is always 1): R=10 £20 2 0-1 4 0090 -) -3 4 You see zeros above the pivots as well as below. This matrix is the final result of elimination. MATLAB uses the command rref. The Teaching Code elim has rref built into it: MATLAB: [R, pivcol] = rref(A) Teaching Code: [E,R]= elim(A) The extra output pivcol gives the numbers of the pivot columns. They are the same in A and R. The extra output E gives the m by m elimination matrix that puts A into its row reduced form R: EA=R. (1) The square matrix E is the product of elementary matrices E;; and Pj; and D~! that produce the elimination steps. £;; subtracts a multiple of row j from row 1, and P;; exchanges these rows. Then D~! divides the rows by the pivots to produce 1’s in every pivot. We actually compute E by applying row reduction to the longer matrix [A I |: All the elementary matrices that multiply A (to produce R) will also multiply J (to produce E). The whole long matrix is being multiplied by E: E LAr ah] je n columns = m columns RMSE fig | (2) nm This is exactly what “Gauss-Jordan” did in Chapter 2 to compute A~!. When A is square and invertible, its reduced row echelon form is R = I. Then EA = R becomes EA = I. In this invertible case, E is A~'. This chapter is going further, to any (rectangular) matrix A and its reduced form R. The matrix E that multiplies A is still square and invertible, but the best it can do is to produce R. Let us now suppose that R is computed. The work of elimination is done. We want to read off from R the key information it contains. It tells us which columns of A are pivot columns. R also tells us the special solutions in the nullspace. That information is hidden in A, and partly hidden in the upper triangular U. It shows up most clearly when the pivot columns in R are reduced to ones and zeros. 3.3. The Rank and the Row Reduced Form 123 The Rank of a Matrix The numbers m and n give the size of a matrix—but not necessarily the true size of a linear system. An equation like 0 = 0 should not count. If there are two identical rows in A, then the second one will disappear in R. We don’t want to count that row of zeros. Also if row 3 is a combination of rows | and 2, then row 3 will become all zeros in R. That zero row has no pivot. It is not included in determining the rank: DEFINITION The rank of A is the number of pivots. This number is r. The matrix R at the start of this section has rank r = 2. It has two pivots and two pivot columns. So does the unknown matrix A that produced R. This number r = 2 will be crucial to the theory, but its first definition is entirely computational. To execute the command r = rank (A), the computer just counts the pivots. When pivcol gives a list of the pivot columns, the length of that list is r. We know right away that r < m and r <n. The number of pivots can’t be greater than the number of rows. It can’t be greater than the number of columns. The cases r = m and r = n of “full row rank” and “full column rank” will be especially important. We mention them here and come back to them soon: - A has full row rank if every row has a pivot: r =m. No zero rows in R. - A has full column rank if every column has a pivot: r =n. No free variables. A square invertible matrix has r =m =n. Then R is the same as /. Actually the computer has a hard time to decide whether a small number is really zero. When it subtracts 3 times .33---3 from 1, does it obtain zero? 
Our Teaching Codes treat numbers below the tolerance 10~° as zero. But that is not entirely safe. A second definition of rank will soon come at a higher level. It deals with entire rows and entire columns—vectors and not just numbers. The matrices A and U and R have r independent rows (the pivot rows). They also have r independent columns (these are the pivot columns). We have to say exactly what it means for rows or columns to be independent. That crucial idea comes in Section 3.5. A third definition, at the top level of linear algebra, will deal with spaces of vectors. The rank r is the “dimension” of the column space. It is also the dimension of the row space. The great thing is that r reveals the dimensions of all other important subspaces—including the nullspace. Example 1 . 2 :|and i s and B and [6] all have rank 1. 124 Chapter 3° Vector Spaces and Subspaces The reduced row echelon forms R of these matrices have one nonzero row: 13 ki 0 4 0 7 1 | and i ,|and ia and [1]. In a rank one matrix A, all the rows are multiples of one nonzero row. Then the matrix R has only one pivot. Please check each of those examples of R. The Pivot Columns The pivot columns of R have 1’s in the pivots and 0’s everywhere else in the column. The r pivot columns taken together have an r by r identity matrix J. It sits above m —r rows of zeros. Remember that the numbers of the pivot columns are in the list called pivcol. The pivot columns of A are probably not obvious from A itself. But their column numbers are given by the same list pivcol. The r columns of A that eventually have pivots (in R) are the pivot columns. The first matrix R in this section is the row reduced echelon form of A: 13 A=|0 0 Lede 02 =-1 1 4 -—3| vl 26. 4 yields R=| 3 09.0 oe 0 2-1 © 4 =3 #80 0 The last row of A is the sum of rows 1 and 2. Elimination discovers this fact and replaces row 3 by the zero row. That uncovers the pivot columns (1 and 3). Columns | and 3 are also the pivot columns of A. But the columns of R and A are different! The columns of A don’t end with zeros. Our Teaching Codes pick out these r pivot columns by A(:, pivcol). The symbol : means that we take all rows of the pivot columns listed in pivcol. In this example E subtracts rows | and 2 from row 3 (to produce that zero row in R). So the elimination matrix FE and its inverse are Esl 0 0) 0- OL 10 Bes Sf and {0.0 E'=!0 1 0 1d The r pivot columns of A are also the first r columns of E~!. We can check that here: columns | and 3 of A are also columns 1 and 2 of E~!. The reason is that A = E7!R. Each column of A is E~! times a column of R. The 1’s in the pivots of R just pick out the first r columns of E~!. This is multiplication a column at a time (our favorite). One more important fact about the pivot columns. Our definition has been purely computational: the echelon form R indicates the pivot columns of A. Here is a more mathematical description of those pivot columns: 3.3 The Rank and the Row Reduced Form 125 3C The pivot columns of A are not linear combinations of earlier columns of A. This is clearly true for R. Its second pivot column has a one in row 2. All earlier columns have zeros in row 2. So the second pivot column could not be a combination of the earlier columns. Then the same is true for A. The reason is that Ax = 0 exactly when Rx = 0. The solutions don’t change during elimination. The combinations of columns that produce the zero vector are given by x, the same for A and R. Suppose a column of A is a combination of earlier columns of A. 
Then that column of R would be the same combination of earlier columns of R. The pivot columns are not combinations of earlier columns. The free columns are combinations of earlier columns. These combinations are exactly given by the special solutions! We look at those next. The Special Solutions Each special solution to Ax = 0 and Rx = 0 has one free variable equal to 1. The other free variables are all zero. The solutions come directly from the echelon form R: x] Re = [3 0 2 =I )0P209 I) 435 OM TGs (077 07 Gao x2, et X4 heal 0 02s 0 (3) X5 We set the first free variable to x2 = 1 with x4 = x5 = 0. The equations give the pivot variables xj = —3 and x3 = 0. This tells us how column 2 (a free column) is a combination of earlier columns. It is 3 times the first column and the special solution is s; = (—3, 1, 0,0, 0). That is the first of three special solutions in the nullspace. The next special solution has x4 = 1. The other free variables are x2 = O and x5 = 0. The solution is s2 = (—2, 0, —4, 1,0). Notice how the —2 and —4 are in R, with plus signs. s3; = The third special solution has x5 (1,0,3,0, 1). The numbers x; = = 1. With x2 = O and x4 = O we find 1 and x3 = 3 are in column 5 of R, again with opposite signs. This is a general rule as we soon verify. The nullspace matrix N contains the three special solutions in its columns: N= —3 -—2 11] 1. 10% -0-}7 0 —4 31] 0 tO 4 QO 20° I} not free free not free tree free The linear combinations of these three columns give all vectors in the nullspace. This is the complete solution to Ax = 0 (and Rx = 0). Where R had the identity matrix (2 by 2) in its pivot columns, N has the identity matrix (3 by 3) in its free rows. 126 Chapter 3 Vector Spaces and Subspaces There is a special solution for every free variable. Since r columns have pivots, that leaves n —r free variables. This is the key to Ax = 0. 3D The system Ax = 0 has n —, nullspace matrix N. special solutions. These are the columns of the When we introduce the idea of “independent” vectors, we will show that the special solutions are independent. Perhaps you can see in the matrix N that no column is a combination of the other columns. The beautiful thing is that the count is exactly right, and we explain it informally: 1. Ax =0 has n unknowns. 2. There are really r independent equations. 3. So there are n —r independent solutions. To complete this section, look again at the rule for special solutions. Suppose for simplicity that the first r columns are the pivot columns, and the last n —r columns are free (no pivots). Then the reduced row echelon form looks like | fee alt r pivot columns r pivot rows nl m—r n—r —Ff (4) free columns 3E The nullspace matrix containing the n —r | zero rows special solutions is r pivot variables n—r _ free variables : (5) Multiply R times N to get zero. The first block row is (J times —F) + (F times /). The columns of N do solve the equation Rx = 0 (and Ax = 0). They follow the rule that was discovered in the example: The pivot variables come by changing signs (F to —F) in the free columns of R. This is because the equations Rx = 0 are so simple. When the free part moves to the right side, the only nonzero part on the left side is the identity matrix: pivot : Laer |. a free leet : (6) In each special solution, the free variables are a column of 7. Then the pivot variables are a column of —F. That is the nullspace matrix N. 3.3. The Rank and the Row Reduced Form 127, The idea is true whether or not the pivot columns all come before the free columns. 
If they do come first, then J comes before F as in equation (1.5). If the pivot columns are mixed in with the free columns, then the J and F are mixed together. You can still see J and you can still see F. Here is an example where J = [1] comes first and F =[2 3] comes last. Example 2 The special solutions of x; + 2x2 + 3x3 =0 are the columns of oe Ll 0 Opal Ni The coefficient matrix is [1 2 3] =[J F']. Its rank is one. So there are n —r = 3 — 1 special solutions in the matrix N. Their first components are —F = [—2 —3]. Their other components (free variables) come from /. Final Note How can I write confidently about R when I don’t know which steps MATLAB will take? The matrix A could be reduced to R in different ways. Two different elimination matrices E, and Ez might produce R. Very likely you and I and Mathematica and Maple would do the elimination differently. The key point is that the final matrix R is always the same. Here is the reason. The original matrix A completely determines the J and F and zero rows in R: 1. 2. The pivot columns (containing 7) are decided by 3C: they are not combinations of earlier columns. The free columns (containing F’) are decided by 3D: they are combinations of earlier columns (and F tells those combinations). The solutions to Rx = 0 are exactly the solutions to Ax = 0, and the special solutions are the same. A small example with rank one will show two E’s that produce the correct R: De. a=|i | i ea; reduces to fl ak You could multiply row 1 of A by 5, and subtract it from row 2: 1 0}; 1/2 O}_} 1/2 O)_, Seo Be | ale) ines: Or you could exchange rows in A, and then subtract 2 times row | from row 2: 2 Jf? ol- [2 2] -2 Multiplication gives EA same. = R and also EnewA = R. The E’s are different, One thing that is determined is the first column of E ak Zo) ‘ E =) ees =| il and ai aeRe 2 Al R is the 128 = Chapter 3 Vector Spaces and Subspaces The first column of E~! is the first pivot column of A = E~!R. The second column of E—! only multiplies the zeros in R. So that second column of E—! is free to change. The matrix R and the r pivot columns in E~! are completely determined by A. = REVIEW OF THE KEY IDEAS =& 1. The reduced row echelon form comes from A by EA = R. 2. The rank of A is the number of pivots in R. 3. The pivot columns of A and R are in the same list pivcol. 4. Those pivot columns are not combinations of earlier columns. 5. The equation Ax = 0 has n —r special solutions (the columns of J). Problem Set 3.3 1 Z Which of these rules gives a correct definition of the rank of A? (a) The number of nonzero rows in R. (b) The number of columns minus the total number of rows. (c) The number of columns minus the number of free columns. (d) The number of 1’s in the matrix Throw away the last m —r Then the factorization A = R. zero rows in R, and the last A = E~!R, with zeros removed, m—r columns of E~!. becomes (first r columns of E~') (first r rows of R). Write the 3 by 5 matrix A at the start of this section as the product of the 3 by 2 matrix from the columns of E~! and the 2 by 5 matrix from the rows of R. This is the factorization A = (COL)(ROW)!. r matrix times an r by n matrix. 3 Find the reduced row echelon forms Every m by n matrix is an m by R of these matrices: (a) The 3 by 4 matrix of all ones. (b) The 3 by 4 matrix with aj; =i+j (c) The 3 by 4 matrix with aj; = (—1)/. —1. 3.3 The Rank and the Row Reduced Form 129 Find R for each of these (block) matrices: B=[A Al ne a Suppose all the pivot variables come Jast instead of first. 
Describe all four blocks in the reduced echelon form The block B should be r by r. What is the nullspace matrix N containing the special solutions? (Silly problem) Describe all 2 by 3 matrices A; and A2, with row echelon forms R, and R2, such that R; + Ro is the row echelon form of A; + Ao. that R; = A, and R> = Ad in this case? Is is true If A has r pivot columns, how do you know that A? has r pivot columns? 3 by 3 example for which the column numbers are different. Give a What are the special solutions to Rx = 0 and y'R = 0 for these R? Ree LO oars a0! WEA os O= OF 0 0 Rh) Ce 2 OS ORO 0 0 0 Problems 9-11 are about r by r invertible matrices inside A. If A has rank r, then it has an r by r submatrix S that is invertible. Remove m —r rows and n —r columns to find such a submatrix S inside each A: fame pee el oe| ber 3 5 0 ois Lid OFel 0 0 10 Suppose P is the m by r submatrix of A containing only the pivot columns of A. Explain why P has rank r. 11 In Problem 10, what is the rank of P'? Then if S? contains only the pivot columns of P! its rank is again r. The matrix S is an r by r invertible submatrix of A. Carry out the steps from A to P to P™ to S' to S for this matrix: aS For A = 4 61] = NN ns, find the invertible S. 130 = Chapter 3. Vector Spaces and Subspaces Problems 12-15 connect the rank of AB to the ranks of A and B. 12 (a) Suppose column j of B is a combination of previous columns of B. Show that column j of AB is the same combination of previous columns of AB. (b) Part (a) implies that rank(AB) < rank(B), because AB cannot have any new pivot columns. Find matrices A; and A2 so that rank(A;B) = rank B and rank(A2B) < rank B, for repel |Be os 13 Problem 12 proved that rank(AB) < rank(B). Then by the same reasoning rank(B! A‘) < rank(A!). How do you deduce that rank(AB) < rank A? 14 (Important) Suppose A and B are n by n matrices, and AB = J. Prove from rank(A B) < rank(A) that the rank of A is n. So A is invertible and B must be its two-sided inverse (Section 2.5). Therefore BA = I (which is not so obvious’). 15 If A is 2 by 3 and B is 3 by 2 and AB = 7, show from its rank that BA # I. Give an example of A and B. 16 Suppose A and B have the same reduced row echelon form R. 17 (a) Show that A and B have the same nullspace and the same row space. (b) Since E;A = R and E2B = R we know times B. The factorization A = (COL)(ROW)! that A equals an in Problem 2 expresses matrix every matrix of rank r as a sum of r rank one matrices, when the multiplication is done as columns times rows. Express A and B in this way as the sum of two rank one matrices: A= 18 eR As A) Suppose A is an m by n matrix of rank r. Its echelon form is R. Describe exactly the matrix Z (its shape and all its entries) that comes from transposing the row echelon form of R’ (prime means transpose): R= 19 a1 Fe rref(A) and Z = (rref(R’))’. Compare Z in Problem 18 with the matrix that comes echelon form of A’ (which is not R’): S= rref(A’) and Y= rref(S’). Explain in one line why Y is or is not equal to Z. from starting with the 3.4 20 The Complete Solution to Ax=b = 131 In Problem 18 we could also find the matrices E and F that put A and R’ into their row echelon forms (as in [ike Beli aref([Ant|, EA = R): and» [Sez] = mref((LR! of |)eeand »ZeSs. What are the shapes of E and F? How is the final Z connected by E and F to the original R? THE COMPLETE SOLUTION TO Ax = 5b a 3.4 The last section totally solved Ax = 0. Elimination converted the problem to Rx = 0. The free variables were given special values (one and zero). 
Then the pivot variables were found by back substitution. We paid no attention to the right side b because it started and ended as zero. The solution x was in the nullspace of A. Now 3B is not zero. Row operations on the left side must act also on the right side. One way to organize that is to add b as an extra column of the matrix. We keep the same example A as before. But we “augment” A with the right side (b), b2, b3) = CBG 2): h Soe ae! 0 0 1 4]//°?|}=]6] asa 4 1] has the augmented 7 | matrix {AS PTO Oren |0 0 1 4 6/=[A 5] peota Qe The augmented matrix is just [A b |. When we apply the usual elimination steps to A, we also apply them to b. In this example we subtract row | from row 3 and then subtract row 2 from row 3. This produces a complete row of zeros: Lh sey 0. 2 1 | has the 0 0 1 4/|°2 |=] 6] augmented CeCe Oneal.4 0 | matrix ee | |0 0 1 4 6|=[R d]. 0.030 026 That very last zero is crucial. It means that the equations can be solved; the third equation has become 0 = 0. In the original matrix A, the first row plus the second row equals the third row. If the equations are consistent, this must be true on the right side of the equations also! The all-important property on the right side was 1+6 = 7. Here are the same augmented matrices for a general b = (by, b2, b3): | 0 l Ww Ww © b| b nN 3 © — —oe a) b La —|]0 (OL 2 014 0°08 050 b _— 6 Now we get 0 = 0 in the third equation provided b3—b; —bz = 0. This is bj +52 = b3. 132 Chapter 3 Vector Spaces and Subspaces The free variables are x2 = x4 = 0. The pivot variables are x; = 1 and x3 = 6, taken from the last column d of the reduced augmented matrix. The code x = partic (A, b) gives this particular solution to Ax = b. First it reduces A and b to R and d. Then the r pivot variables in x are taken directly from d, because the pivot columns in R contain the identity matrix. After row reduction we are just solving /x = d. Notice how we choose the free variables and solve for the pivot variables. After the row reduction to R, those steps are quick. We can see which variables are free (their columns don’t have pivots). When those variables are zero, the pivot variables for x, are in the extra column: The particular solution solves Ax, =b The n—r Ax, = 09. special solutions solve Question Suppose A is a square invertible matrix. What are x, and x,? Answer The particular solution is the one and only solution A~'b. There are no special solutions because there are no free variables. The only vector in the nullspace is X, = 0. The complete solution is x =X) +X, = A-'b +0. This was the situation in Chapter 2. We didn’t mention the nullspace in that chapter; it contained only the zero vector. The solution A~'b appears in the extra column, because the reduced form of A is R = J. Reduction goes from [A 6] to [7 A~'b]. The original system Ax = b is reduced all the way to x = A~'!D. In this chapter, that means m = n = r. It is special here, but square invertible matrices are the ones we see most often in practice. So they got their own chapter at the start of the book. For small examples we can put [A 6] into reduced row echelon form. For a large matrix, MATLAB can do it better. rank. Both columns have pivots. Example 1 Here is a small example with full column Find the condition on (b;, bz, b3) for Ax = b to be solvable, if A= 1 —2 1 2 | and b=] -3 b; bs b3 This condition puts 6 in the column space of A. Find the complete Solution row x = xp» + Xp. Use the augmented matrix, with its extra column b. 
Elimination subtracts | from row 2, and adds 2 times row 1 i —2 | to row 1b; 1 | — ia 2 bo|—10 1 b-bd -—3 bz QO —-1 b3+2b, 3: 1 >{10 0 O 1 0 2b; — bo b-bd, b+b,+b2 The last equation is 0 = 0 provided b3 + bj + by = 0. This condition puts b in the column space of A; then the system is solvable. The rows of A add to the zero row. So for consistency (these are equations!) the entries of b must also add to zero. 3.4 The Complete Solution to Ax=b 133 This example has no free variables and no special solutions. The nullspace solution is x, = 0. The (only) particular solution x, is at the top of the augmented column: 2b, —b 0 If b3 +b; + b2 is not zero, there is no solution to Ax = b (x, doesn’t exist). rank. This example is typical of the extremely important case when A has full column Every column has a pivot. The rank is r = n. The matrix is tall and thin (m =n). Row reduction puts J at the top, when ke A is reduced to R: n by n identity matrix m —n = rows of zeros I 0 : (1) There are no free columns or free variables. F is an empty matrix. We will collect together the different ways of recognizing this type of matrix. 3F Every matrix A with full column rank (r = n) has all these properties: 1. All columns of A are pivot columns. 2. There are no free variables or special solutions. 3. The nullspace N(A) contains only the zero vector x = 0. 4. If Ax =5b has a solution (it might not) then it has only one solution. In the language of the next section, A has independent columns when r = n. In Chapter 4 we will add one more fact to this list: The square matrix A'A is invertible. In this case the nullspace of A (and R) has shrunk to the zero vector. The solu- tion to Ax = b is unique (if it exists). There will be m —n zero rows in R. So there are m —n conditions on b in order to have 0 = 0 in those rows. The example had m = 3 and r =n = 2, so there was one condition b3+b,+b2 = O on the right side b. If this is satisfied, Ax = b has exactly one solution. The other extreme case is full row rank. shrinks to the zero vector. Now it is the nullspace of A! that In this case A is short and wide (m <n). The number of unknowns is at least the number of equations. A matrix has full row rank if r = m. Every row has a pivot, and here is an example. Example 2 There are n = 3 unknowns but only two equations. The rank is r = m = 2: oe Sp Yap Sie Hey I NNII 3 4 134 Chapter 3. Vector Spaces and Subspaces line of solutions in particular AXp =b nullspace Ax, = 90 Figure 3.3. tions. The complete solution is one particular solution plus all nullspace solu- These are two planes in xyz space. The planes are not parallel so they intersect in a line. This line of solutions is exactly what elimination will find. The particular solution will be one point on the line. Adding the nullspace vectors Xx, will move us along the line. Then x = Xp) +X, gives the whole line of solutions. We find x, and x, by elimination. Subtract row | from row 2 and then subtract row 2 from row I: De Le lia 9 Nout ole aah Ghee lt 4 Cele it Min Ota a2 Oe ee lah The free variable is x3. The particular solution has x3 = 0 and the special solution has rey c= Ile Xparticular Comes directly from the right side: x, = (2, 1, 0) Xspecial Comes from the third column (the free column F): s = (—3, 2, 1) It is wise to check that x, and s satisfy the original equations Ax, = b and As = 0: Datel 2+2 ar = 8 4 —34+2+1 —3+4-1 The nullspace solution x, is any multiple of s. 
It moves starting at Xparticular: = = 0 0 along the line of solutions, —3 Complete Solution: SiS YY hy fee azo 1 This line is drawn in Figure 3.3. Any point on the line could have been chosen as the particular solution; we chose the point with x3 = 0. The particular solution is not multiplied by an arbitrary constant! The special solution is, and you understand why. 3.4 The Complete Solution to Ax=b 135 Now we summarize this short wide case (m <n) of full row rank: 3G Every matrix A with full row rank (r = m) has all these properties: 1. _ All rows have pivots, and R has no zero rows. ys Ax = b has a solution for any right side b. 3. The column space is the whole space R”. 4 There are n —r =n —m special solutions in the nullspace of A. In this case with m pivots, the rows are “linearly independent”. In other words, the columns of A! are linearly independent. We are more than ready for the definition of linear independence, as soon as we summarize the four possibilities—which depend on the rank. Notice how r, m, n are the critical numbers! The four possibilities for linear equations depend on the rank r: r=m and r=n_ Ax =b_ has | solution r=m r<m and and r<n _ Short and wide r=n _ “Tall and thin Square and invertible Ax =b_ Ax =b has o solutions has O or | solution r<m and ¢r<n_ Ax =b_ has 0 or & solutions Unknown shape The reduced R will fall in the same category as the matrix A. In case the pivot columns come first, we can display these four possibilities for R: Re(eee|Aaa |ela r=>m=—nh Cases fr=mM<n 1 and 2 have full row rank r = m. I A FHSnN<mM 1 0 eg. 0 FTomyr<n Cases | and 3 have full column rank. Case 4 is the most general in theory and the least common in practice. Note In the first edition of this textbook, we generally stopped at U before reaching R. Instead of reading the complete solution directly from Rx = d, we found it by back substitution from Ux = c. That combination of reduction to U and back substitution for x is slightly faster. Now we prefer the complete reduction: a single “1” in each pivot column. We find that everything is so much clearer in R (and the computer should do the hard work anyway) that we reduce all the way. =» REVIEW OF THE KEY IDEAS 1. The rank r is the number of pivots. 2. Ax =b =& is solvable if and only if the last m —r equations reduce to 0 = 0. 136 Chapter 3 Vector Spaces and Subspaces 3 Then one particular solution x, has all free variables equal to zero. 4. The pivot variables are determined after the free variables are chosen. 5 Full column rank r =n 6 Full row rank r = m means one or infinitely many solutions. means no free variables: one solution or none. Problem Set 3.4 Questions 1-12 are about the solution of Ax = b. Follow the steps in the text to Xp» and x,. Use the augmented 1 matrix with last column b. Write the complete solution as x, plus any multiple of s: x+3y+3z= 2x +6y +9z=5 =X == 3y + 3z.=5. Find the complete solution (also called the general solution) to al 5 6 Bd 4 Soa 0 NAR eta feel if 1 bas Tl Under what condition on b;, bz, b3 is this system solvable? column in elimination. Find all solutions: Include b as a fourth x+2y —2z=b 2x+5y—4z=b) 4x + 9y — 8z = b3. Under what conditions on b;, bz, b3, b4 is each system solvable? case. tia DA le 2 e a Ey 3) bi by fw 2 Oies wid s6qltl ba Bit O nib Qul nacre Bail oh pal A aCe Zi oe Nl Find x in that by b) by Show by elimination that (b;, b2, b3) is in the column space of A if b3 — 2b) + 4b, = 0. bier} ASN ue Stee 2 4 0 What combination of the rows of A gives the zero row? 3.4. 
The Complete Solution to Ax =b Which vectors (b;, b2,b3) are in the column of the rows of A give zero? (2) eae Lg ee OF 25 (One 25 3 (D). Ate space of A? | 2 4 — —_— NO Which 137 combinations eR of Construct a 2 by 3 system Ax = b with particular solution x, = (2,4,0) and homogeneous solution x, = any multiple of (1, 1, 1). Why can’t a | by 3 system have x, Cie)? 10 11 12 = (2,4,0) and x, If Ax = Bb has two solutions x; and x2, find two solutions to Ax = 0. (b) Then find another solution to Ax = 0 and another solution to Ax = b. Explain why these are all false: (a) The complete solution is any linear combination of x, and xp. (b) A system Ax = b has at most one particular solution. (c) The solution x, with all free variables zero is the shortest solution (minimum length ||/x||). Find a 2 by 2 counterexample. (d) If A is invertible there is no homogeneous solution x). Suppose column 5 of U has no pivot. Then x5 is a ______ vector is _____ has _____ solutions. Suppose row 3 of U has no pivot. Then that row is ____._The _____ The zero _. The equation Ux =c equation Ax = b (is) (is not) (might not or r =n. The largest possible rank of a 3 by 5 matrix is _____. in every variable. the only solution to Ax = 0. If Ax = b has a solution, then it Questions 13-18 are about matrices of “full rank” r =m Then there is a pivot of U.. The solution to Ax = b (always exists) (is unique). column space of A is ____. 14 any multiple of (a) is only solvable provided be) solvable. 13 = An example is A = The largest possible rank of a 6 by 4 matrix is ____. Then there is a pivot in every ___ of U. The solution to Ax = b (always exists) (is unique). nullspace of A is _ . An example is A= ____ 15 Find by elimination the rank of A and also the rank of A’: | fee age —1 ks 2 oo 10 and A=jJ|1 1 ie 2/1 The - (rank depends on q). The 138 Chapter 3 Vector Spaces and Subspaces 16 Find the rank of A and also of A™A FE) 17 and also of AA!: : | Ane n, (0) | i 2B —e Reduce A to its echelon form U. Then find a triangular L so that A = LU. |e A= aNG oS 18 ANd pa §be | 6© ele Oe Oi; Oye Seed, Find the complete solution in the form x, + x, to these full rank systems: (a)) x+y+z=4 y Ay (b) tiz=4 pate. eas 19 If Ax = 5b has infinitely many solutions, why is it impossible for Ax = B (new right side) to have only one solution? Could Ax = B have no solution? 20 Choose the number g so that (if possible) the rank is (a) 1, (b) 2, (c) 3: @ A=|]-3 «a —2 2 -l SghGumt and nel q : a q Questions 21-26 are about matrices of rank r = 1. 21 Fill out these matrices so that they have rank 1: 1 AS ied ; and 4 B=|1 and u=|° Cc le 22 If A is an m by n matrix with r = 1, its columns are multiples of one column and its rows are multiples of one row. The column space is a in R”. The nullspace is a in R". Also the column space of A! is a in R". 23. Choose vectors u and v so that A = uv! = column times row: an|i 4 28 2|8 wa a=[? 7? 8 a 4) 3 66, 6 A= uv! is the natural form for every matrix that has rank r = 1. 24 If A is a rank one matrix, the second row of U is . Do an example. 3.4 The Complete Solution to Ax=b 139 25 Multiply a rank one matrix times a rank one matrix, to find the rank of AB and AM: 26 The rank one matrix uv! times the rank one matrix wz! is uz! times the number . This has rank one unless _ a) 27, Give examples of matrices A for which the number of solutions to Ax = b is 28 (a) 0 or 1, depending on b (b) ov, regardless of b (c) 0 or ow, depending on b (d) 1, regardless of b. 
Write down all known relations between r and m and n if Ax = Bb has (a) no solution for some b (b) infinitely many solutions for every b (c) exactly one solution for some 5b, no solution for other b (d) exactly one solution for every b. Questions 29-33 are about Gauss-Jordan elimination wards) and the reduced echelon matrix R. 29 (upwards as well as down- Continue elimination from U to R. Divide rows by pivots so the new pivots are all 1. Then produce zeros above those pivots to reach R: U 30 Suppose 31 Apply Gauss-Jordan Rie a: 2 =\20 0 4 Ww 6 S D ana 0 ea alal | OmO U is square with n pivots (an invertible matrix). elimination to Ux = 0 and Ux Explain why R = I. = cc. Reach Rx = 0 and fy o}=[5 6 so] ™ (Y l=[o 0 4 a) Solve Rx = 0 to find x, (its free variable is x7 = 1). Solve (its free variable is x2 = 0). Rx =d to find x, 140 Chapter 3. Vector Spaces and Subspaces 32 Gauss-Jordan elimination yields the reduced matrix R. Find Rx = 0 and Rx = d: GEL O |x. 3 OS Ge 10 OF 25 0 O02 0 10 120 and CP tert SOMES OO a2. Ont Os P29 4 Solve Ux = 0 or Rx = 0 to find x, (free variable = 1). What are the solutions to RY =a? 33 Reduce Ax = b to Ux = c (Gaussian elimination) and then to Rx = d (GaussJordan): [a0) 2s Soles Aral the63) Qn Ovlale & 7 Ora on pe 2 eee 10 X4 Find a particular solution x, and all homogeneous solutions xj. 34 Find matrices A and B with the given property or explain why you can’t: The 1 only solution of Ax = [2| is ¥ = |¢ |- The ‘only ‘solutiontar Bx =) [87 as -=[1) 35 Find the LU factorization of A and the complete solution to Ax = b: 1 3 6 ee ee i) W me BNW 5 36 and b= and then b= nA WW The complete solution to Ax =[}] is x =[}]+c[°]. Find A. = ooo 3.5 INDEPENDENCE, Independence, Basis and Dimension 141 BASIS AND DIMENSION a 3.5 This important section is about the true size of a subspace. There are n columns in an m by n matrix, and each column has m components. But the true “dimension” of the column space is not necessarily m or n. The dimension is measured by counting independent columns—and we have to say what that means. For this particular subspace (the column space) we will see that the true dimension is the rank r. The idea of independence applies to any vectors v;,... , v¥, in any vector space. Most of this section concentrates on the subspaces that we know and use—especially the column space and nullspace. In the last part we also study “vectors” that are not column vectors. These vectors can be matrices and functions; they can be linearly independent (or not). First come the key examples using column vectors, before the extra examples with matrices and functions. The final goal is to understand a basis for a vector space. We are at the heart of our subject, and we cannot go on without a basis. The four essential ideas in this section are: 1. Independent vectors 2. Spanning a space 3. Basis for a space 4. Dimension of a space. Linear Independence Our first definition of independence is not so conventional, but you are ready for it. DEFINITION The columns of A are linearly independent when the only solution to Ax =0 is x = 0. No other combination Ax of the columns gives the zero vector. With linearly independent columns, the nullspace N(A) contains only the zero vector. Let me illustrate linear independence (and linear dependence) with three vectors in R?: 1. Figure 3.4 (left) shows three vectors not in the same plane. They are independent. 2. Figure 3.4 (right) shows three vectors in the same plane. They are dependent. 
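That test is easy to carry out numerically. Here is a short MATLAB sketch (a quick check with built-in commands, not one of the Teaching Codes; the three vectors are invented for illustration and are not the vectors in Figure 3.4). It puts the vectors into the columns of A and asks whether Ax = 0 has a nonzero solution:

    % Test independence: put the vectors in the columns of A and solve Ax = 0.
    v1 = [1; 0; 0];  v2 = [1; 1; 0];  v3 = [1; 1; 1];   % sample vectors (an assumption)
    A = [v1 v2 v3];
    N = null(A, 'r');        % special solutions to Ax = 0 (empty when none exist)
    if isempty(N)
        disp('Only x = 0 solves Ax = 0: the columns are independent.')
    else
        disp('Ax = 0 has a nonzero solution: the columns are dependent.')
    end

Replacing v3 by v1 + v2 makes the columns dependent, and null(A, 'r') then returns a nonzero solution (a multiple of (1, 1, -1)).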
This idea of independence applies to 7 vectors in 12-dimensional space. If they are the columns of A, and independent, the nullspace only contains x = 0. Now we choose different words to express the same idea. The following definition of independence will apply to any sequence of vectors in any vector space. When the vectors are the columns of A, the two definitions say exactly the same thing. 142 = Chapter 3. Vector Spaces and Subspaces U3 U3 : arid boi —_—- — Ww2 Figure 3.4 Independent vectors vj, v2, v3. Dependent vectors w;, w2, w3. The combination w, — w2 + w3 is (0, 0, 0). DEFINITION The sequence of vectors v1,... , VU, is linearly independent if the only combination that gives the zero vector is Ov; + Ov2 +---+0Ov,. Thus linear independence means that xyvy + .x202 +++++X,V_, If a combination = 90 only happens when all x’s are zero. (1) gives 0, when the x’s are not all zero, the vectors are dependent. Correct language: “The sequence of vectors is linearly independent.” Acceptable shortcut: “The vectors are independent.” A collection of vectors is either dependent or independent. They can be combined to give the zero vector (with nonzero x’s) or they can’t. So the key question is: Which combinations of the vectors give zero? We begin with some small examples in R?: (a) The vectors (1,0) and (0, 1) are independent. (b) The vectors (1, 1) and (1, 1.001) are independent. (c) The vectors (1,1) and (2,2) are dependent. (d) The vectors (1,1) and (0,0) are dependent. Geometrically, (1, 1) and (2, 2) are on a line through the origin. They are not independent. To use the definition, find numbers x; and x2 so that x;(1, 1) +x2(2, 2) = (0, 0). This is the same as solving Ax = 0: i 2 x1 = hi SET 0 LOD 2 e—)2eandex = le The columns are dependent exactly when there is a nonzero vector in the nullspace. If one of the v’s is the zero vector, independence has no chance. Why not? Now move to three vectors in R°. If one of them is a multiple of another one, these vectors are dependent. But the complete test involves all three vectors at once. We put them in a matrix and try to solve Ax = 0. 3.5 Independence, Basis and Dimension 143 Example 1 The columns of this matrix are dependent: Ati bee aE eo eles BOCs 3 1 1 1S eS) l (2 1 0 3 Leas 0 3 0 ee 0 The rank of A is only r = 2. Independent columns would give full column rank r = is. In that matrix the rows are also dependent. You can see a combination of these rows that gives the zero row (it is row 1 minus row 3). For a square matrix, we will show that dependent columns imply dependent rows (and vice versa). Question How do you find that solution to Ax = 0? The systematic way is elimination. le At i Olees il alee al eTeGuCeSEtON Aa —s|n) ORs OP @ 3 mala OR eO The solution x = (—3, 1,1) was exactly the special solution. It shows how the free column (column 3) is a combination of the pivot columns. That kills independence! The general rule is already in our hands: 3H The columns of A are independent exactly when the rank is r =n. pivots and no free variables. Only x = 0 is in the nullspace. There are n One case is of special importance because it is clear from the start. Suppose seven columns have five components each (m = 5 is less than n = 7). Then the columns must be dependent. Any seven vectors from R° are dependent. The rank of A cannot be larger than 5. There cannot be more than five pivots in five rows. The system Ax = 0 has at least 7—5 = 2 free variables, so it has nonzero solutions—which means that the columns are dependent. 
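A quick numerical illustration of that counting argument (a sketch; any 5 by 7 matrix will do, and rand is just a convenient way to produce one):

    % Seven columns in R^5: the rank is at most 5, so Ax = 0 has nonzero solutions.
    A = rand(5, 7);          % a random 5 by 7 matrix
    rank(A)                  % at most 5, never 7
    x = null(A);             % at least 7 - 5 = 2 independent solutions to Ax = 0
    norm(A * x(:, 1))        % essentially zero: a dependence among the seven columns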
31 Any set of vectors in R” must be linearly dependent if n > m. The matrix has more columns than rows—it is short and wide. The columns are certainly dependent if n > m, because Ax = 0 has a nonzero solution. The columns might be dependent or might be independent if n < m. Elimination will reveal the pivot columns. /t is those pivot columns that are independent. Note Another way to describe linear independence is this: “One of the vectors is a combination of the other vectors.” That sounds clear. Why don’t we say this from the start? Our definition was longer: “Some combination gives the zero vector, other than the trivial combination with every x = 0.’ We must rule out the easy way to get the zero vector. That trivial combination of zeros gives every author a headache. In the first statement, the vector that is a combination of the others has coefficient x = 1. 144° Chapter 3 Vector Spaces and Subspaces The point is, our definition doesn’t pick out one particular vector as guilty. All columns of A are treated the same. We look at Ax = 0, and it has a nonzero solution or it hasn’t. In the end that is better than asking if the last column (or the first, or a column in the middle) is a combination of the others. Vectors that Span a Subspace The first subspace in this book was the column space. Starting with n columns vj, --- , Up, the subspace was filled out by including all combinations x; vj +---+2%,U,. The column space consists of all combinations Ax of the columns. We now introduce the single word “span” to describe this: The column space is spanned by the columns. DEFINITION A set of vectors spans a space if their linear combinations fill the space. Example 2 v; = | i and v2 = H span the full two-dimensional space R?. Example 3 v; = Ht v2 = Hi v3 = B also span the full space R?. Example 4 w; = B and w2 = be only span a line in R?. So does w by itself. So does w2 by itself. Think of two vectors coming out from (0, 0,0) in 3-dimensional space. Gener- ally they span a plane. Your mind fills in that plane by taking linear combinations. Mathematically you know other possibilities: two vectors spanning a line, three vectors spanning all of R°, three vectors spanning only a plane. It is even possible that three vectors span only a line, or ten vectors span only a plane. They are certainly not independent! The columns span the column space. Here is a new subspace—which begins with the rows. The combinations of the rows produce the “row space”. DEFINITION The row space of a matrix is the subspace of R” spanned by the rows. The rows of an m by n matrix have n components. They are vectors in R”’— or they would be if they were written as column vectors. There is a quick way to do that: Transpose the matrix. Instead of the rows of A, look at the columns of A’. Same numbers, but now in columns. The row space of A is C(A'). It is the column space of A‘. It is a subspace of R”. The vectors that span it are the columns of A', which are the rows of A. 3.5 Independence, Basis and Dimension 145 Example 5 Asa|2 lime 3 Ae les 3S and a" =| ]-Here 7 Sy eae. m= 3 Ands7a—oe The column space of A is spanned by the two columns of A. It is a plane in R°. The row space of A is spanned by the three rows of A (columns of A‘). It is all of R*. Remember: The rows are in R”. The columns are in R”. Same numbers, different vectors, different spaces. A Basis for a Vector Space In the xy plane, a set of independent vectors could be small—just one single vector. 
A set that spans the xy plane could be large—three vectors, or four, or infinitely many. One vector won’t span the plane. Three vectors won’t be independent. A “basis” for the plane is just right. DEFINITION A basis for a vector Pale is a sequence of vectors that has two properties at once: . 1. The vectors are linearly independent. 2. The vectors span the space. This combination of properties is fundamental to linear algebra. Every vector v in the space is a combination of the basis vectors, because they span the space. More than that, the combination that produces v is unique, because the basis vectors v1,... , Up are independent: There is one and only one way to write v as a combination of the basis vectors. Reason: Suppose v = ajv1; +++: +40, (a; — by)vj +--+ and also v = bv, +---byv,. By subtraction (Gn — bn)dn is the zero vector. From the independence of the v’s, each a; — bj = 0. Hence a; = bj. Example 6 The columns of / = E “|are a basis for R*. This is the “standard basis”. The vectors i= 3 and j = H are independent. They span Re, 146 Chapter 3 Vector Spaces and Subspaces Everybody thinks of this basis first. The vector i goes across and j goes straight up. Similarly the columns of the 3 by 3 identity matrix are the standard basis i, j, k. The columns of the n by n identity matrix give the “standard basis” for R”. Now we find other bases. Example 7 (Important) The columns of any invertible n by n matrix give a basis for R”: = hee Pipi: and) (Arete 1-8 0 ee haa) LD glee ch but not A= : 2 al. 4 When A is invertible, its columns are independent. The only solution to Ax = 0 is x = 0. The columns span the whole space R"—because every vector b is a combination of the columns. Ax = b can always be solved by x = A~'b. Do you see how everything comes together for invertible matrices? Here it is in one sentence: 3) The vectors vj,... , ¥, are a basis for R” exactly when they are the columns of an n by n invertible matrix. Thus R” has infinitely many different bases. When the columns of a matrix are independent, they are a basis for its column space. When the columns are dependent, we keep only the pivot columns—the r columns with pivots. They are independent and they span the column space. 3K The pivot columns of A are a basis for its column space. The pivot rows of A are a basis for its row space. So are the pivot rows of its echelon form R. Example 8 This matrix is not invertible. Its columns are not a basis for anything! i Dm ah s (es ‘ which reduces to eo; al Column | of A is the pivot column. That column alone is a basis for its column space. The second column of A would be a different basis. So would any nonzero multiple of that column. There is no shortage of bases! So we often make a definite choice: the pivot columns. Notice that the pivot column of R is quite different. It ends in zero. That column is a basis for the column space of R, but it is not even a member of the column space of A. The column spaces of A and R are different. Their bases are different. The row space of A is the same as the row space of R. It contains (2,4) and (1,2) and all other multiples of those vectors. As always, there are infinitely many bases to choose from. I think the most natural choice is to pick the nonzero rows of 3.5 Independence, Basis and Dimension 147 R (rows with a pivot). So this matrix A with rank one has only one vector in the basis: ; Basis for the column space: 2 3]: Basis for the row space: 1 >|: The next chapter will come back to these bases for the column space and row space. 
We are happy first with examples where the situation is clear (and the idea of a basis is still new). The next example is larger but still clear. Example 9 Find bases for the column space and row space of YS Le Oats PN | O70: 20's 10 Columns | and 3 are the pivot columns. They are a basis for the column space (of R!). The vectors in that column space all have the form b = (x, y,0). The column space of R is the “xy plane” inside the full 3-dimensional xyz space. That plane is not R?, it is a subspace of R°. Columns 2 and 3 are a basis for the same column space. So are columns | and 4, and also columns 2 and 4. Which columns of R are not a basis for its column space? The row space of R is a subspace of R*. The simplest basis for that row space is the two nonzero rows of R. The third row (the zero vector) is in the row space too. But it is not in a basis for the row space. The basis vectors must be independent. Question Given five vectors in R’, how do you find a basis for the space they span? First answer Make them the rows of A, and eliminate to find the nonzero rows of R. Second answer Put the five vectors into the columns of A. Eliminate to find the pivot columns (of A not R!). The program colbasis uses the column numbers from pivcol. The column space of R had r = 2 basis vectors. Could another basis have more than r vectors, or fewer? This is a crucial question with a good answer. All bases for a vector space contain the same number of vectors. This number is the “dimension” . Dimension of a Vector Space We have to prove what was just stated. There are many choices for the basis vectors, but the number of basis vectors doesn’t change. 3L If vj,..., Vm and w1,... , W, are both bases for the same vector space, then m = n. Proof Suppose that there are more w’s than v’s. From n > m we want to reach a contradiction. The v’s are a basis, so w; must be a combination of the v’s. If w; equals 148 = Chapter 3 Vector Spaces and Subspaces 41101 +:::+4m1Um, this is the first column of a matrix multiplication V A: a\\ W= =e ee =VA. am) We don’t know each aj;, but we know the shape of A (it is m by n). The second vector w2 is also a combination of the v’s. The coefficients in that combination fill the second column of A. The key is that A has a row for every v and a column for every w. It is a short wide matrix, since n > m. There is a nonzero solution to Ax = 0. But then VAx = 0 and Wx = 0. A combination of the w’s gives zero! The w’s could not be a basis—which is the contradiction we wanted. If m > n we exchange the v’s and w’s and repeat the same steps. The only way to avoid a contradiction is to have m =n. This completes the proof. The number of basis vectors depends on the space—not on a particular basis. The number is the same for every basis, and it tells how many “degrees of freedom” the vector space allows. For the space R”, the number is n. This is the dimension of R”. We now introduce the important word dimension for other spaces too. DEFINITION basis. The dimension of a vector space is the number of vectors in every This matches our intuition. The line through v = (1,5, 2) has dimension one. It is a subspace with one vector in its basis. Perpendicular to that line is the plane x + 5y +2z=0. This plane has dimension 2. To prove it, we find a basis (—5, 1, 0) and (—2,0,1). The dimension is 2 because the basis contains two vectors. The plane is the nullspace of the matrix A = [1 @) ZI which has two free variables. Our basis vectors (—5, 1,0) and (—2,0,1) are the “special solutions” to Ax = 0. 
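MATLAB finds the same basis. A short check (using the built-in command null rather than the Teaching Codes; the option 'r' asks for the special solutions):

    % The plane x + 5y + 2z = 0 is the nullspace of the 1 by 3 matrix A.
    A = [1 5 2];
    N = null(A, 'r')         % columns (-5, 1, 0) and (-2, 0, 1): one per free variable
    A * N                    % both products are zero, so both basis vectors lie in the plane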
The next section studies other nullspaces to show that the special solutions always give a basis. Here we emphasize only this: All bases for a space contain the same number of vectors. Note about the language of linear algebra We never say “the rank of a space” or “the dimension of a basis” or “the basis of a matrix”. Those terms have no meaning. It is the dimension of the column space that equals the rank of the matrix. Bases for Matrix Spaces and Function Spaces The words “independence” and “basis” and “dimension” are not at all restricted to column vectors. We can ask whether the 3 by 4 matrices A;, A2, A3 are independent. They are members of the space of all 3 by 4 matrices; some combination might give the zero matrix. We can also ask the dimension of that matrix space (it is 12). 3.5. Independence, Basis and Dimension 149 In differential equations we find a basis for the solutions to d*y/dx* = y. That basis contains functions, probably y = e* and y = e *. Counting the basis functions gives the dimension 2 (for the space of all solutions). We think matrix spaces and function spaces are optional. Your class can go past this part—no problem. But in some way, you haven’t got the ideas straight until you can apply them to “vectors” other than column vectors. Matrix spaces The vector space and here is a basis: M contains all 2 by 2 matrices. Its dimension is 4 f OV [021] )0) O14 OmrO Ar, Aa, Aa As = [4 nll msl nlalo le Those matrices are linearly independent. We are not looking at their columns, but at the whole matrix. Combinations of those four matrices can produce any matrix in M, so they span the space: cyA, +c2A2 +€3A3 + €4A4 = é (Biya ea‘ (67:1 This is zero only if the c’s are all zero—which proves independence. The matrices A;, Az, A4 are a basis for a subspace—the upper triangular matrices. Its dimension is 3. A; and Aq are a basis for the diagonal matrices. What is a basis for the symmetric matrices? Keep A; and Aq, and throw in A? + A3. To push this further, think about the space of all n by n matrices. For a basis, choose matrices that have only a single nonzero entry (that entry is 1). There are n? positions for that 1, so there are n* basis matrices: The dimension of the whole matrix space is n7. The dimension of the subspace of upper triangular matrices is sn? aa Sn. The dimension of the subspace of diagonal matrices is n. The dimension of the subspace of symmetric matrices is 5n? + 5n. Function spaces The equations d*y/dx? =0 and d*y/dx? + y =0 and d*y/dx* — y = 0 involve the second derivative. In calculus we solve to find the functions y(x): y’=0 _ is solved by any linear function y = cx +d y’ =—y_ is solved by any combination y = csinx + dcosx y’=y is solved by any combination y = ce* + de ~. The second solution space has two basis functions: sinx and cosx. The third solution space has basis functions e‘ and e *. The first space has x and 1. It is the “nullspace” of the second derivative! The dimension is 2 in each case (these are second-order equations). 150 Chapter 3. Vector Spaces and Subspaces What about y” = 2? Its solutions do not form a subspace—there is a nonzero right side b = 2. A particular solution is y(x) = x*. The complete solution is y(x) = ya-exos All those functions satisfy y” = 2. Notice the particular solution plus any function cx +d in the nullspace. A linear differential equation is like a linear matrix equation Ax = b. But we solve it by calculus instead of linear algebra. We end here with the space Z that contains only the zero vector. 
The dimension of this space is zero. The empty set (containing no vectors at all) is a basis. We can never allow the zero vector into a basis, because then linear independence is lost. =» REVIEW OF THE KEY IDEAS #8 The columns of A are independent if x = 0 is the only solution to Ax = 0. The vectors v1,..., Uy span a space if their combinations fill the space. A basis contains linearly independent vectors that span the space. ee All bases for a space have the same number of vectors. dimension of the space. This number is the The pivot columns are a basis for the column space and the dimension is r. Problem Set 3.5 Questions 1-10 are about linear independence and linear dependence. 1 Show that v,, v2, v3 are independent but v1, v2, v3, v4 are dependent: 1 y= 1 0 vN2= 0 1 1 v3O= 0 1 2 V4 = 1 3 4 Solve either cjv + c2v2 +¢303 = 0 or Ax = 0. The v’s go in the columns of A. (Recommended) y= 1 el 0 0 Find the largest possible number of independent vectors among l v2= aa v= 1 0 0 eal 04 = 0 1 ey 0 0 1 Uv = 0 | Vv = 0 0 1 aa Prove that if a = 0 ord = 0 or f = 0 (3 cases), the columns of U are dependent: 3.5 Independence, Basis and Dimension If a,d, f in Question 3 are all nonzero, 151 show that the only solution to Ux = 0 is x = 0. Then U has independent columns. Decide the dependence or independence of (a)jeeathe vectors (173.2) and (2. 153) and (3; 2.11) (b) (1, —3,2) and (2, 1, —3) and (—3, 2, 1). the vectors Choose three independent columns of U. Then make two other choices. Do the same for A. Pee YO tee: aa | oo 7 © ode ON6r Tae, 770 ad WO ne) oo A ano creel (Oar) 0440 0.4.0 ay (th If w 1, W2, w3 are independent vectors, show that the differences vj = w2 — w3 and v2 = w) — w3 and v3 = wy — w2 are dependent. Find a combination of the v’s that gives zero. If w,, w2,w3 are independent vectors, show that the sums v} = w2 + w3 and v2 = w; + w3 and v3 = w; + w2 are independent. (Write c)vj+c2v2 +c303 = 0 in terms of the w’s. Find and solve equations for the c’s.) Suppose 10 v1, v2, v3, v4 are vectors in R?. (a) These four vectors are dependent because (b) The two vectors vj and v2 will be dependent if (c) The vectors v; and (0,0,0) are dependent because Find two independent vectors on the plane x + 2y —3z—t = 0 in R*. Then find three independent vectors. Why not four? This plane is the nullspace of what matrix? Questions 11-15 are about the space spanned by a set of vectors. combinations of the vectors. 11 12 Take all linear Describe the subspace of R? (is it a line or plane or R°?) spanned by (a) the two vectors (1,1, —1) and (—1, —1, 1) (b) the three vectors (c) the columns (d) all vectors with positive components. (0, 1,1) and (1, 1,0) and (0, 0, 0) of a 3 by 5 echelon matrix with 2 pivots The vector b is in the subspace spanned by the columns of A when there is a solution to ___. The vector ¢ is in the row space of A when there is a solution LOWE . True or false: If the zero vector is in the row space, the rows are dependent. 152 Chapter 3 13 Find the dimensions of these 4 spaces. (a) column Vector Spaces and Subspaces space of A, (b) column Which two of the spaces are the same? space of U, (c) row space of A, (d) row space of U: | Asesinas So 14 1 ge 1 peat Bd (a er al) eae) 2 wll Cr ee) x1, x3, x4) 62): 10 R‘. It has 24 rearrangements like (x2, = (2 14,.19,.03, Choose and (x4, x3, X1,X2). Those 24 vectors, including x itself, span a subspace S. Find specific vectors x so that the dimension of S is: (a) zero, (b) one, (c) three, (d) four. 15 v + w and v — w are combinations of v and w. 
Write v and w as combinations of v-+w and v—w. The two pairs of vectors the same space. When are they a basis for the same space? Questions 16-26 are about the requirements for a basis. 16 If vj,..., Vy, are linearly independent, the space they span has dimension These vectors are a for that space. If the vectors are the columns of an m by n matrix, then m is than n. 17 Find a basis for each of these subspaces of R?*: (a) All vectors whose components are equal. (b) All vectors whose components (c) All vectors that are perpendicular to (1, 1,0,0) and (1,0, 1, 1). (d) The column space (in R’) and nullspace (in R>) of add to zero. 18 Find three different bases for the column space of U above. ferent bases for the row space of U. 19 Suppose v1, 2,..., 6 are six vectors in R*. U= i : i : als Then find two dif- (a) Those vectors (do)(do not)(might not) span R¢. (b) Those vectors (are)(are not)(might be) linearly independent. (c) Any four of those vectors (are)(are not)(might be) a basis for Re 20 The columns of A are n vectors from R”. If they are linearly independent, what is the rank of A? If they span R”, what is the rank? If they are a basis for R”, what then? 21 Find a basis for the plane x — 2y + 3z = 0 in R*. Then find a basis for the intersection of that plane with the xy plane. Then find a basis for all vectors perpendicular to the plane. 3.5 22 24 Basis and Dimension 153 Suppose the columns of a 5 by 5 matrix A are a basis for R°. (a) The equation Ax = 0 has only the solution (b) If b is in R° then Ax =D Conclusion: 23 Independence, x = 0 because is solvable because A is invertible. Its rank is 5. Suppose S is a 5-dimensional subspace of R®°. True or false: (a) Every basis for S can be extended to a basis for R° by adding one more vector. (b) Every basis for R° can be reduced to a basis for S by removing one vector. U comes from A by subtracting row | from row 3: I Sk i A= Pasa? il 3, ann Cea 2 | OROREO Find bases for the two column spaces. Find bases for the two row spaces. Find bases for the two nullspaces. 25 26 True or false (give a good reason): (a) If the columns of a matrix are dependent, so are the rows. (b) The column space of a 2 by 2 matrix is the same as its row space. (c) The column space. (d) The columns of a matrix are a basis for the column space. space of a 2 by 2 matrix has the same dimension For which numbers i= as its row c and d do these matrices have rank 2? ih || 0 Oo kay Oo @ © @ AOS) 2-2 @ 2 and Ate =|; |] Questions 27-32 are about spaces where the ‘“‘vectors” are matrices. 27 ‘Find a basis for each of these subspaces of 3 by 3 matrices: (a) All diagonal matrices. (b) All symmetric matrices (c) All skew-symmetric (AT =A). matrices (AT = —A). 28 Construct six linearly independent 3 by 3 echelon matrices U},..., Us. 29 ‘Find a basis for the space of all 2 by 3 matrices whose columns Find a basis for the subspace whose rows also add to zero. 30 ~=Show that the six 3 by 3 permutation matrices (Section 2.6) are linearly dependent. add to zero. 154 Chapter 3 31 What subspace of 3 by 3 matrices is spanned by 32 Vector Spaces and Subspaces (a) all invertible matrices? (b) all echelon matrices? (c) the identity matrix? Find a basis for the space of 2 by 3 matrices whose nullspace contains (2, 1, 1). Questions 33-37 are about spaces where the ‘‘vectors” are functions. 33 (a) Find all functions that satisfy a ==); (b) Choose a particular function that satisfies ay = 3s (c) Find all functions that satisfy 42 = 3. 
34 The cosine space F3 contains all combinations y(x) = C cos 3x. Find a basis for the subspace with y(0) = 0. 35 Find a basis for the space of functions that satisfy (a) &-2y=0 ) Acosx + Bcos2x + %-2=0. 36 Suppose y;(x), y2(x), y3(x) are three different functions of x. The vector space they span could have dimension 1, 2, or 3. Give an example of y;, y2, y3 to show each possibility. 37 Find a basis for the space of polynomials p(x) of degree < 3. Find a basis for the subspace with p(1) = 0. 38 Find a basis for the space S of vectors (a, b, c,d) with a+c+d = 0 and also for the space T with a+b = 0 and c = 2d. What is the dimension of the intersection SOT? 3.6 Dimensions of the Four Subspaces 155 DIMENSIONS OF THE FOUR SUBSPACES u 3.6 The main theorem in this chapter connects rank and dimension. The rank of a matrix is the number of pivots. The dimension of a subspace is the number of vectors in a basis. We count pivots or we count basis vectors. The rank of A reveals the dimensions of all four fundamental subspaces. Here are the subspaces, including the new one. Two subspaces come directly from A, and the other two from A!: 1. The row space is C(A"), a subspace of R". The column space is C(A), a subspace of R”. The nullspace is N(A), a subspace of R”. eo The left nullspace is N(A‘), a subspace of R”. This is our new space. In this book the column space and nullspace came first. We know C(A) and N(A) pretty well. Now the other two subspaces come forward. For the row space we take all combinations of the rows. This is also the column space of A‘. For the left nullspace we solve A! y= 0—that system is n by m. This is the nullspace of A'. The vectors y go on the left side of A when the equation is written as y' A =0!. This is the nullspace of A’. The matrices A and A! are usually very different. So are their column spaces and their nullspaces. But those spaces are connected in an absolutely beautiful way. Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. One fact will stand out: The row space and column space have the same dimension. That dimension is r (the rank of the matrix). The other important fact involves the two nullspaces: Their dimensions are n —r and m —r, to make up the full dimensions n and m. Part 2 of the Theorem will describe how the four subspaces fit together (two in R” and two in R”). That completes the “right way” to understand Ax = b. Stay with it—you are doing real mathematics. The Four Subspaces for R Suppose A is reduced to its row echelon form R. For that special form, the four subspaces are easy to identify. We will find a basis for each subspace and check its dimension. Then we watch how the subspaces change (or don’t change!) as we look back at A. The main point is that the four dimensions are the same for A and R. As a specific 3 by 5 example, look at the four subspaces for the echelon matrix R: n= nd pa? 3 PSE SiO OF 0R.0 O-50e 9 1 “0 FO pivot rows | and 2 pivot columns | and 4 156 Chapter 3 Vector Spaces and Subspaces The rank is r = 2 (two pivots). Take the subspaces in order: 1. The row space of R has dimension 2, matching the rank. Reason: The first two rows are a basis. The row space contains combinations of all three rows, but the third row (the zero row) adds nothing new. So rows 1 and 2 span the row space. The pivot rows | and 2 are also independent. That is obvious for this example, and it is always true. If we look only at the pivot columns, we see the r by r identity matrix. 
There is no way to combine its rows to give the zero row (except by the combination with all coefficients zero). So the r pivot rows are independent and the dimension is r.

The dimension of the row space is r. The nonzero rows of R form a basis.

2. The column space of R also has dimension r = 2.

Reason: The pivot columns form a basis. They are independent because they start with the r by r identity matrix. No combination of those pivot columns can give the zero column (except the combination with all coefficients zero). And they also span the column space. Every other (free) column is a combination of the pivot columns. Actually the combination we need is also in the special solution:

Column 2 is 3 (column 1). The special solution is (-3, 1, 0, 0, 0).
Column 3 is 5 (column 1). The special solution is (-5, 0, 1, 0, 0).
Column 5 is 9 (column 1) + 8 (column 4). The special solution is (-9, 0, 0, -8, 1).

The pivot columns are independent, and they span, so they are a basis.

The dimension of the column space is r. The pivot columns form a basis.

3. The nullspace of this R has dimension n - r = 5 - 2. There are n - r = 3 free variables. Here x2, x3, x5 are free (no pivots in those columns). Those 3 free variables yield the 3 special solutions to Rx = 0. Set a free variable to 1, and solve for x1 and x4:

s2 = (-3, 1, 0, 0, 0)    s3 = (-5, 0, 1, 0, 0)    s5 = (-9, 0, 0, -8, 1)

Rx = 0 has the complete solution x = x2 s2 + x3 s3 + x5 s5.

There is a special solution for each free variable. With n variables and r pivot variables, that leaves n - r free variables and special solutions:

The nullspace has dimension n - r. The special solutions form a basis.

The special solutions are independent, because they contain the identity matrix in rows 2, 3, 5. All solutions are combinations of special solutions, x = x2 s2 + x3 s3 + x5 s5, because this gets x2, x3 and x5 in the correct positions. Then the pivot variables x1 and x4 are totally determined by the equations Rx = 0.

4. The nullspace of R^T has dimension m - r = 3 - 2.

Reason: The equation R^T y = 0 looks for combinations of the columns of R^T (the rows of R) that produce zero. This equation R^T y = 0 is

y1(1, 3, 5, 0, 9) + y2(0, 0, 0, 1, 8) + y3(0, 0, 0, 0, 0) = (0, 0, 0, 0, 0).   (1)

The solutions y1, y2, y3 are pretty clear. We need y1 = 0 and y2 = 0. The variable y3 is free (it can be anything). The nullspace of R^T contains all vectors y = (0, 0, y3). It is the line of all multiples of the basis vector (0, 0, 1).

In all cases R ends with m - r zero rows. Every combination of these m - r rows gives zero. These are the only combinations of the rows of R that give zero, because the pivot rows are linearly independent. So we can identify the left nullspace of R, which is the nullspace of R^T:

The left nullspace has dimension m - r. The solutions are y = (0, ..., 0, y_{r+1}, ..., y_m).

The first r rows of R are independent. The other rows are zero. To produce a zero combination, y must start with r zeros. This leaves dimension m - r.

Why is this a "left nullspace"? The reason is that R^T y = 0 can be transposed to y^T R = 0^T. Now y^T is a row vector to the left of R. You see the y's in equation (1) multiplying the rows. This subspace came fourth, and some linear algebra books omit it, but that misses the beauty of the whole subject. In R^n the row space and nullspace have dimensions r and n - r (adding to n). In R^m the column space and left nullspace have dimensions r and m - r (total m).

So far this is proved for echelon matrices R. Figure 3.5 shows the same for A.
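Here is a quick numerical check on this example (a sketch using the built-in commands rank and null, not the Teaching Codes). R is the 3 by 5 echelon matrix above, with pivot columns 1 and 4:

    R = [1 3 5 0 9;
         0 0 0 1 8;
         0 0 0 0 0];
    [m, n] = size(R);        % m = 3, n = 5
    r = rank(R);             % r = 2 pivots
    n - r                    % 3 = dimension of the nullspace
    m - r                    % 1 = dimension of the left nullspace
    null(R, 'r')             % the special solutions s2, s3, s5 as columns
    null(R', 'r')            % the single basis vector (0, 0, 1) for the left nullspace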
The Four Subspaces for A

We have a small job still to do. The subspace dimensions for A are the same as for R. The job is to explain why. Remember that those matrices are connected by an invertible matrix E (the product of all the elementary matrices that reduce A to R):

EA = R  and  A = E^-1 R.   (2)

Figure 3.5  The dimensions of the four fundamental subspaces (for R and for A): the row space and column space have dimension r, the nullspace N(A) has dimension n - r, and the left nullspace N(A^T) has dimension m - r.

1  A has the same row space as R. Same dimension r and same basis.

Reason: Every row of A is a combination of the rows of R. Also every row of R is a combination of the rows of A. In one direction the combinations are given by E^-1, in the other direction by E. Elimination changes the rows of A to the rows of R, but the row spaces are identical.

Since A has the same row space as R, we can choose the first r rows of R as a basis. Or we could choose r suitable rows of the original A. They might not always be the first r rows of A, because those could be dependent. The good r rows of A are the ones that end up as pivot rows in R.

2  The column space of A has dimension r. For every matrix this fact is essential: The number of independent columns equals the number of independent rows.

Wrong reason: "A and R have the same column space." This is false. The columns of R often end in zeros. The columns of A don't often end in zeros. The column spaces are different, but their dimensions are the same, equal to r.

Right reason: A combination of the columns of A is zero exactly when the same combination of the columns of R is zero. Say that another way: Ax = 0 exactly when Rx = 0.

Conclusion  The r pivot columns in R are a basis for its column space. So the r pivot columns of A are a basis for its column space.
The only solution to Aly = 0 is y =0. That is the only combination of the row that gives the zero row. Thus N(A') is Z, the zero space with dimension 0 (which is m—r). In R” the dimensions add to 1+-0= 1. Example 2 A = Le 8S 12 | hase 72 swith? —)5e-and e7a— 1. The row space is the same line through (1, 2,3). The nullspace is the same x1 + 2x2 + 3x3 = 0. The dimensions still add to 1 + 2 = 3. The columns are multiples of the first column (1, 1). But there is more plane than the zero vector in the left nullspace. The first row minus the second row is the zero row. 160 == Chapter 3 Vector Spaces and Subspaces Therefore A' y = 0 has the solution y = (1, —1). The column space and left nullspace are perpendicular lines in R?. Their dimensions are 1 and 1, adding to 2: column space = line through If A has three equal rows, | | its rank is left nullspace = line through . What are two 1 eq : of the y’s in its left nullspace? The y’s combine the rows to give the zero row. Matrices of Rank One That last example had rank r = 1—and rank one matrices are special. We can describe them all. You will see again that dimension of row space = dimension of column space. When r = 1, every row is a multiple of the same row: 1 ®) 2253 246 Ueiting wikg A Sia al 1 equals , times 3 [1 Z 3 |: 0 A column times a row (4 by | times 1 by 3) produces a matrix (4 by 3). All rows are multiples of the row (1, 2,3). All columns are multiples of the column (1, 2, —3, 0). The row space is a line in R”, and the column space is a line in R”. Every rank one matrix has the special form A = uv’ T = column times row. The columns are multiples of u. The rows are multiples of v'. The nullspace is the plane perpendicular to v. (Ax = 0 means that u(v'x) = 0 and then v'x = 0.) It is this perpendicularity of the subspaces that will be Part 2 of the Fundamental Theorem. =» REVIEW OF THE KEY IDEAS »® 1. The r pivot rows of R are a basis for the row space of R and A (same row space). 2. Ther 3. The n —,r nullspace). 4. The last m —r 5. The last m —r rows of E are a basis for the left nullspace of A. pivot columns of A (!) are a basis for its column space. special solutions are a basis for the nullspace of A and R (same rows of J are a basis for the left nullspace of R. 3.6 Dimensions of the Four Subspaces 161 Problem Set 3.6 (a) If a7 by 9 matrix has rank 5, what are the dimensions of the four subspaces? (b) If a3 by 4 matrix has rank 3, what are its column space and left nullspace? Find bases for the four subspaces associated with A and B: ee =[ 4 4 oi I 2 4 aa E 5 af Find a basis for each of the four subspaces associated with Oi 2 3-34 A=) 0 2, Ae iO Oia O nO) ol 2 Construct a matrix with the required property or explain why no such matrix exists: space contains 1] 70 1|,[°|, row space contains [3], | 2]. (a) Column (b) 1 3 Column space has basis E iFnullspace has basis li]. (c) Dimension of nullspace = | + dimension of left nullspace. (d) Left nullspace contains [4], row space contains |} ]. (e) Row space = column space, nullspace ¥ left nullspace. If V is the subspace spanned by (1, 1, 1) and (2, 1,0), find a matrix A that has V as its row space and a matrix B that has V as its nullspace. Without elimination, find dimensions and bases for the four subspaces for 3 > II 0 ands be 1 —Ow Sow >oo Re np Suppose the 3 by 3 matrix A is invertible. Write down bases for the four subspaces for A, and also for the 3 by 6 matrix B=[A A]. 
What are the dimensions of the four subspaces for A, B, and C, if J is the 3 by 3 identity matrix and 0 is the 3 by 2 zero matrix? Lied A=[J 0] and B=| or ot| and». C = [0]. 162 Chapter 3. Vector Spaces and Subspaces Which subspaces are the same for these matrices of different sizes? (a) [A] and hi (b) 4 and k il Prove that all three matrices have the same rank r. 10 If the entries of a 3 by 3 matrix are chosen randomly between 0 and 1, what are the most likely dimensions of the four subspaces? What if the matrix is 3 by 5? 11 (Important) A is an m by n matrix of rank r. Suppose there are right sides b for which Ax = b has no solution. 12 (a) What are all inequalities (< or <) that must be true between m,n, and r? (b) How do you know that A' y = 0 has solutions other than y = 0? Construct a matrix with (1,0, 1) and (1, 2,0) as a basis for its row space and its column space. Why can’t this be a basis for the row space and nullspace? 13 14 True or false: (a) If m =n then the row space of A equals the column space. (b) The matrices A and —A share the same four subspaces. (c) If A and B share the same four subspaces then A is a multiple of B. Without computing A, find bases for the four fundamental subspaces: AO 15 ees VERA 10 Dems wal | a 58 § OFT 2 3 ORO If you exchange the first two rows of A, which of the four subspaces stay the same? If v = (1, 2,3,4) is in the column space of A, write down a vector in the column space of the new matrix. 16 Explain why v = (1, 2,3) cannot be a row of A and also be in the nullspace of A. 17 Describe the four subspaces of R* associated with y We 18 Ot 0 aed4 oO. 0 0 | 0 and” f+ A-=| Pio 0. 1 1 OO (Left nullspace) Add the extra column b and reduce A to echelon form: [A b]=]4 t 2 Br pi 5 6 bb} 7 8 9 bs > eet fae 7 |0 -3 -6 bo —4b, 0 0 0 b3— 2b. +d; 3.6 Dimensions of the Four Subspaces 163 A combination of the rows of A has produced the zero row. What combination is it? (Look at b3 —2b2 +b, on the right side.) Which vectors are in the nullspace of A! and which are in the nullspace of A? 19 Following the method of Problem 18, reduce A to echelon form and look at zero rows. The b column tells which combinations you have taken of the rows: Be tinal eas (a) |3Gn 4 by (b) 3 2 2 AL Ios 2 55 sba From the 6 column after elimination, read off vectors in the left nullspace of A (combinations of rows that give zero). You should have m — r basis vectors for N(A!), 20 (a) Describe all solutions to Ax = 0 if 7 Wee, 21 [05-76 Pa ais MLS eee: Sey SN Oh On 0d 0) aa aaes 50) 20 (b) How many independent solutions are there to Al y = 0? (c) Give a basis for the column space of A. Suppose A is the sum of two matrices of rank one: vectors span the column A = uv! + wz!. (a) Which space of A? (b) Which vectors span the row space of A? (cj *Thesrank?istless thane 2/1f* (d) Compute A and its rank if wu= z= (1,0,0) and 22 Soe orale v= w = (0,0, 1). V2 Construct a matrix whose column space has basis (1, 2, 4), (2,2, 1) and whose row space has basis (1, 0, 0), (0, 1, 1). 23 Without multiplying matrices, find bases for the row and column spaces of A: Tele sles eee a How do you know from these shapes that A is not invertible? 24 Aly =d is solvable when the right side d is in which subspace? The solution is unique when the ______ contains only the zero vector. 25 True or false (with a reason or a counterexample): 164 26 27 = Chapter 3 Vector Spaces and Subspaces (a) A and A! have the same number of pivots. (b) A and A! have the same left nullspace. 
(c) If the row space equals the column space then A! = A. (d) If A’ =—A then the row space of A equals the column space. If AB =C-, the rows of C are combinations of the rows of . So the rank of C is not greater than the rank of . Since B'A! = C!, the rank of C is also not greater than the rank of If a,b,c are given with a 40, how would you choose d so that A = [24] has rank one? Find a basis for the row space and nullspace. 28 Find the ranks of the 8 by 8 checkerboard matrix B and chess matrix C: Ds ice lean Ot ysli O . Orr Aig Oo atari 20m i Oo tf © ! . Orr. 0 NR PR 3 a et Stall PE FC AN | O lt C ° PO al . Ort The numbers r,n, b,g,k, p are all different. (C == Pp pe Ppa four zero rows Dp =) PoP PP 2 oP : nO Gg kK bon Find bases for the row space and left nullspace of B and C. Challenge problem: Find a basis for the nullspace of C. ORTHOGONALITY ORTHOGONALITY OF THE FOUR SUBSPACES ua 4.1 Two vectors are orthogonal when their dot product is zero: v- w = 0 or v'w = 0. This chapter moves up a level, from orthogonal vectors to orthogonal subspaces. Orthogonal means the same as perpendicular. Subspaces entered Chapter 3 with a specific purpose—to throw light on Ax = b. Right away we needed the column space (for b) and the nullspace (for x). Then the light turned onto A’, uncovering two more subspaces. Those four fundamental sub- spaces reveal what a matrix really does. A matrix multiplies a vector: A times x. At the first level this is only numbers. At the second level Ax is a combination of column vectors. The third level shows subspaces. But I don’t think you have seen the whole picture until you study Figure 4.1. It fits the subspaces together, to show the hidden reality of A times x. The 90° angles between subspaces are something new—and we have to say what they mean. The row space is perpendicular to the nullspace. Every row of A is perpendicular to every solution of Ax = 0. Similarly every column is perpendicular to every solution of A'y = 0. That gives the 90° angle on the right side of the figure. This perpendicularity of subspaces is Part 2 of the Fundamental Theorem of Linear Algebra. May we add a word about the left nullspace? It is never reached by Ax, so it might seem useless. But when B is outside the column space—when we want to solve Ax = b and can’t do it—then this nullspace of A? comes into its own. It contains the error in the “least-squares” solution. That is the key application of linear algebra in this chapter. Part 1 of the Fundamental Theorem gave the dimensions of the subspaces. The row and column spaces have the same dimension r (they are drawn the same size). The two nullspaces have the remaining dimensions n — r and m — r. Now we will show that the row space and nullspace are actually perpendicular. DEFINITION Two subspaces V and W of a vector space are orthogonal if every vector v in V is perpendicular to every vector w in W: v-w=0 or v'iw=0 for all v in V and all w in W. 165 166 Chapter 4 Orthogonality dimension dimension =r = Ax=b nullspace nullspace of A of AT dimension =m-—r dimension =n-r Figure 4.1. Two pairs of orthogonal subspaces. Dimensions add to n and m. Example 1 The floor of your room (extended to infinity) is a subspace V. The line where two walls meet is a subspace W (one-dimensional). Those subspaces are orthogonal. Every vector up the meeting line is perpendicular to every vector in the floor. The origin (0, 0, 0) is in the corner. We assume you don’t live in a tent. 
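Orthogonality of two subspaces can be tested on basis vectors alone: if every basis vector of V is perpendicular to every basis vector of W, then every combination of those vectors is perpendicular too. Here is a minimal MATLAB-style check of Example 1; the vectors chosen for the floor V and the meeting line W are only an illustration, not part of the book's Teaching Codes.

    v1 = [1; 0; 0];  v2 = [0; 1; 0];   % a basis for the floor V (the xy plane)
    w  = [0; 0; 1];                    % direction of the line W where two walls meet
    disp([v1'*w  v2'*w])               % both dot products are zero, so V and W are orthogonal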
Example 2 Suppose V is still the floor but W is one of the walls (a two-dimensional space). The wall and floor look orthogonal but they are not! You can find vectors in V and W that are not perpendicular. In fact a vector running along the bottom of the wall is also in the floor. This vector is in both V and W, and it is not perpendicular to itself.

Example 3 If a vector is in two orthogonal subspaces, it must be perpendicular to itself. It is v and it is w, so v^T v = 0. This has to be the zero vector. Zero is the only point where the nullspace meets the row space.

The crucial examples for linear algebra come from the fundamental subspaces.

4A Every vector x in the nullspace of A is perpendicular to every row of A. The nullspace and row space are orthogonal subspaces.

To see why x is perpendicular to the rows, look at Ax = 0. Each row multiplies x:

Ax = [row 1; ... ; row m] x = [0; ... ; 0].   (1)

The first equation says that row 1 is perpendicular to x. The last equation says that row m is perpendicular to x. Every row has a zero dot product with x. Then x is perpendicular to every combination of the rows. The whole row space C(A^T) is orthogonal to the whole nullspace N(A).

Here is a second proof of that orthogonality for readers who like matrix shorthand. The vectors in the row space are combinations A^T y of the rows. Take the dot product of A^T y with any x in the nullspace. These vectors are perpendicular:

x^T (A^T y) = (Ax)^T y = 0^T y = 0.   (2)

We like the first proof. You can see those rows of A multiplying x to produce zeros in equation (1). The second proof shows why A and A^T are both in the Fundamental Theorem. A^T goes with y and A goes with x. At the end we used Ax = 0.

In this next example, the rows are perpendicular to (1, 1, -1) in the nullspace:

Ax = [1 3 4; 5 2 7] [1; 1; -1] = [0; 0] gives the dot products 1 + 3 - 4 = 0 and 5 + 2 - 7 = 0.

Now we turn to the other two subspaces. They are also orthogonal, but in R^m.

4B Every vector y in the nullspace of A^T is perpendicular to every column of A. The left nullspace and column space are orthogonal.

Apply the original proof to A^T. Its nullspace is orthogonal to its row space, which is the column space of A. Q.E.D. For a visual proof, look at y^T A = 0. The row vector y^T multiplies each column of A:

y^T A = y^T [c_1  c_2  ...  c_n] = [0  0  ...  0].   (3)

The dot product with every column is zero. Then y is perpendicular to each column, and to the whole column space.

Very important The fundamental subspaces are more than just orthogonal (in pairs). Their dimensions are also right. Two lines could be perpendicular in 3-dimensional space, but they could not be the row space and nullspace of a matrix. The lines have dimensions 1 and 1, adding to 2. The correct dimensions r and n - r must add to n = 3. Our subspaces have dimensions 2 and 1, or 3 and 0. The fundamental subspaces are not only orthogonal, they are orthogonal complements.

DEFINITION The orthogonal complement of a subspace V contains every vector that is perpendicular to V. This orthogonal subspace is denoted by V⊥ (pronounced "V perp").

By this definition, the nullspace is the orthogonal complement of the row space. Every x that is perpendicular to the rows satisfies Ax = 0, and is included in the nullspace. The reverse is also true (automatically). If v is orthogonal to the nullspace, it must be in the row space. Otherwise we could add this v as an extra row of the matrix, without changing the nullspace.
The row space and its dimension would grow, which breaks the law r + (n —r) =n. We conclude that N(A)+ is exactly the row space C(A'). The left nullspace and column space are not only orthogonal in R”, they are orthogonal complements. The 90° angles are marked in Figure 4.2. Their dimensions add to the full dimension m. Fundamental Theorem of Linear Algebra, Part 2 The nullspace is the orthogonal complement of the row space (in R"). The left nullspace is the orthogonal complement of the column space (in R"). Part | gave the dimensions of the subspaces. Part 2 gives their orientation. They are perpendicular (in pairs). The point of “complements” is that every x can be split into a row space component x, and a nullspace component x,. When A multiplies x =x, +X,, Figure 4.2 shows what happens: The nullspace component goes to zero: Ax, = 0. The row space component goes to the column space: Ax, = Ax. Every vector goes to the column space! Multiplying by A cannot do anything else. But more than that: Every vector in the column space comes from one and only one vector x, in the row space. Proof: If Ax, = Ax’, the difference x, — x; is in the nullspace. It is also in the row space, where x, and x} came from. This difference must be the zero vector, because the spaces are perpendicular. Therefore x, = x’. There is an invertible matrix hiding inside A, if we throw away the two nullspaces. From the row space to the column space, A is invertible. The “pseudoinverse” will invert it in Section 7.4. Example 4 Every diagonal matrix has an r by r invertible submatrix: v= 3 0 lO 5S ORO O @ @ © © @ ORO) contains 3 0 é |; The rank is r = 2. The 2 by 2 submatrix in the upper corner is certainly invertible. The other eleven zeros are responsible for the nullspaces. 4.1 Orthogonality of the Four Subspaces 169 column space of A dim m—r dim n—r Figure 4.2 zero. The true action of A times x: row space to column space, nullspace to Section 7.3 will show how every A becomes a diagonal matrix, when we choose the right bases for R” and R”. This Singular Value Decomposition is a part of the theory that has become extremely important in applications. Combining Bases from Subspaces What follows are some valuable facts about bases. They could have come earlier, when bases were defined, but they were saved until now—when we are ready to use them. After a week you have a clearer sense of what a basis is (independent vectors that span the space). When the count is right, one of those two properties implies the other: 4C Any n linearly independent vectors in R” must span R”. Any n vectors that span R” must be independent. Normally we have to check both properties: First, that the vectors are linearly independent, and second, that they span the space. For n vectors in R”, either independence or spanning is enough by itself. Starting with the correct number of vectors, one property of a basis implies the other. This is true in any vector space, but we care most about R”. When the vectors go into the columns of an n by n matrix A, here are the same two facts. Remember that A is square: 170 = Chapter 4 Orthogonality 4D If the n columns are independent, they must span all of R”. If the n columns span R”, they must be independent. A square system Ax = b always has one solution if it never has two solutions. Uniqueness implies existence and existence implies uniqueness. The square matrix A is invertible. Its columns are a basis for R”. Our standard method of proof is elimination. 
If there are no free variables (uniqueness), there must be n pivots. Then back substitution solves Ax = b (exis- tence). In the opposite direction, suppose Ax = b can always be solved (existence of solutions). Then elimination produced no zero rows. There are n pivots and no free variables. The nullspace contains only x = 0 (uniqueness of solutions). The count is always right for the row space of A and its nullspace. They have dimensions r and n —r. With a basis for the row space and a basis for the nullspace, we have r+(n—r) =n vectors—the right number. Those n vectors are independent.‘ Therefore they span R". They are a basis: Every x in R” is the sum x, +x, of a row space vector x, and a nullspace vector x,. This confirms the splitting in Figure 4.2. It is the key point complements—the dimensions add to n and no vectors are missing. Example 5 For A=[J of orthogonal 1]=[492°], write any vectorx as x, + Xn. (1, 0, 1, 0) and (0, 1, 0, 1) are a basis for the row space. (1,0, —1, 0) and (0, 1, 0, —1) are a basis for the nullspace. Those four vectors are a basis for R*. Any x = (a,b,c, d) can be split into x, + X;: a b eR eee ta d 1 0 [tc 0 =» b+d\ ae 0 1 a6 |kOnet amie 1 0 0 1 bad eet eee 1 wee —='| REVIEW OF THE KEY IDEAS =# 1 The subspaces V and W are orthogonal if every v in V is orthogonal to every w in W. 2 V and W are “orthogonal complements” perpendicular to the other subspace. TIf a combination of the vectors gives xy +X, = if each subspace contains all vectors 0, then x, = —x, is in both subspaces. It is orthogonal to itself and must be zero. All coefficients of the row space basis and nullspace basis must be zero—which proves independence of the n vectors together. 4.1 Orthogonality of the Four Subspaces 171 The nullspace N(A) and the row space C(A!) are orthogonal complements, from Ax = 0. Any n independent vectors in R” will span R”. Every vector x in R” has a nullspace component and a row space component. Problem Set 4.1 Questions 1-10 grow out of Figures 4.1 and 4.2. 1 Suppose A is a 2 by 3 matrix of rank one. Draw Figure 4.1 to match the sizes of the subspaces. Redraw Figure 4.2 for a 3 by 2 matrix of rank r = 2. What are the two parts x, and x,? Construct a matrix with the required property or say why that is impossible: : 1 2 : (a) Column space contains |2|and |-3]. nullspace contains (b) 1 2 1 Row space contains |2|and |-3|.nullspace contains [+] (c) Column space is perpendicular to nullspace (d) Row (e) Columns 1 + row 2 + row 3 = 0, column space contains 1 H (1, 2, 3) add up to zero column, rows add to a row of 1’s. It is possible for the row space to contain the nullspace. Find an example. (a) (b) If Ax =b has a solution and A'y = 0, then y is perpendicular to If Ax = Bb has no solution and Ay = 0, explain why y is not perpendicular to In Figure 4.2, how do we know that Ax; is equal to Ax? How do we know that this vector is in the column space? If Ax is in the nullspace of A’ then Ax must be zero. Why? Which other subspace is Ax in? This is important: A'A has the same nullspace as A. Suppose A is a symmetric matrix (Al = A). (a) Why is its column space perpendicular to its nullspace? (b) If Ax = 0 and Az = 5z, why is x perpendicular to z? These are “eigenvectors.” 172 = Chapter 4 Orthogonality 9 (Recommended) Draw Figure 4.2 to show each subspace correctly for LZ ce, ‘| and 10 hw asl. a Find the pieces x, and x, and draw Figure 4.2 properly if Ar i —alm) 0 —1 0 O and D, 2a! Questions 11-19 are about orthogonal subspaces. 
11 12 Prove that every y in N(A‘) is perpendicular to every Ax in the column space, using the matrix shorthand of equation (2). Start from Aly = 0. The Fundamental Theorem is also stated in the form of Fredholm’s alternative: For any A and Bb, exactly one of these two problems has a solution: (a) Ax=b (b) Aly=0 with bly 40. Either b is in the column space of A or else b is not orthogonal to the nullspace of A!. Choose A and B so that (a) has no solution. Find a solution to (b). 13 If S is the subspace of R* containing only the zero vector, what is St? is spanned by (1, 1, 1), what is S=9? If S is spanned by (2,0,0) If § and (0, 0, 3), what is $+? 14 Suppose S only contains two vectors (1,5, 1) and (2, 2, 2) (not a subspace). Then S+ is the nullspace of the matrix A = . Therefore S+ is a even if S is not. 15 Suppose L is a one-dimensional subspace (a line) in R>. Its orthogonal complement L+ is the perpendicular to L. Then (L+)+ is a perpendicular to L+. In fact (L+)+ is the same as 16 Suppose V is the whole space R*. Then Then (V+)+ contains V+ contains only the vector . So (V+)+ is the same as 17 Suppose S is spanned by the vectors (1, 2, 2,3) and (1, 3,3, 2). Find two vectors that span St. 18 If P is the plane of vectors in R* satisfying x; + x2 +x3-+x4 = 0, write a basis for P+. Construct a matrix that has P as its nullspace. 19 If a subspace S$ is contained in a subspace V, prove that $+ contains V+. 4.1 Orthogonality of the Four Subspaces 173 Questions 20-26 are about perpendicular columns and rows. 20 Suppose an n by n matrix is invertible: AA~! A! is orthogonal to the space spanned by 21 Suppose the columns of A are unit vectors, all mutually perpendicular. = J. Then the first column of : What is ATA? 22 Construct a 3 by 3 matrix A with no zero entries whose columns are mutually perpendicular. Compute A'A. Why is it a diagonal matrix? 23 The lines 3x + y = by and 6x + 2y = line if__——.:~<XIn that case bo are (b;, bz) is perpendicular nullspace of the matrix is the line 3x-+y = ____. nullspace is 24 They are the same to the vector ____—.:~CThe One particular vector in that Why is each of these statements false? (a) (1,1, 1) is perpendicular to (1,1, —2) so the planes x + y —2z=0 are orthogonal subspaces. x + y +z = O and (b) The lines from (0, 0,0) through (2,4,5) and (1, —3, 2) are orthogonal complements. (c) If two subspaces meet only in the zero vector, the subspaces are orthogonal. ZS Find a matrix with v = (1, 2, 3) in the row space and column space. Find another matrix with v in the nullspace and column space. Which pairs of subspaces can v not be in? 26 The command N = null(A) will produce a basis for the nullspace of A. Then the command B = null(N’) will produce a basis for the of A. 174 = Chapter 4 Orthogonality PROJECTIONS am 4.2 May we start this section with two questions? (In addition to that one.) The first question aims to show that projections are easy to visualize. The second question is about matrices: 1 What are the projections of b = (2,3, 4) onto the z axis and the xy plane? Zz What matrices produce those projections onto a line and a plane? When b is projected onto a line, its projection p is the part of b along that line. If b is projected onto a plane, p is the part in that plane. There is a matrix P that multiplies b to give p. The projection p is Pb. One projection goes across to the z axis. The second projection drops straight down to the xy plane. The picture in your mind should be Figure 4.3. Start with b = (2, 3, 4). 
One way gives p; = (0,0,4) and the other way gives p2 = (2,3,0). Those are the parts of b along the z axis and in the xy plane. The projection matrices P; and Pz are 3 by 3. They multiply b with 3 components to produce p with 3 components. Projection onto a line comes from a rank one matrix. Projection onto a plane comes from a rank two matrix: Ontothe.7 axis.) = 0: 2086 [0 1050 Wai irsen| Onto the xy plane: Pop =|0 Oto 1 O OF O20 P, picks out the z component of every vector. P2 picks out the x and y components. To find p; and p2, multiply b by P; and P> (small p for the vector, capital P for the matrix that produces it): pi=Pib=|0 oO 0 0 0 00 1 0 =/0} (pi i pro=Pob=/0 OO 1 OF} lex a y}=]y 0 z 0 0 0 In this case the projections P; and P2 are perpendicular. The xy plane and the z axis are orthogonal subspaces, like the floor of a room and the line between two walls. More than that, the line and plane are orthogonal complements. Their dimensions add to 1 + 2 = 3—every vector b in the whole space is the sum of its parts in the two subspaces. The projections p; and p2 are exactly those parts: The vectors give pj + po =). The matrices give P; + Pp = /. (1) This is perfect. Our goal is reached—for this example. We have the same goal for any line and any plane and any n-dimensional subspace. The object is to find the part p in each subspace, and the projection matrix P that produces p = Pb. Every subspace of R” has an m by m (square) projection matrix. To compute P, we absolutely need a good description of the subspace that it projects onto. 4.2 Figure 4.3. Projections 175 The projections of b onto the z axis and the xy plane. The best description of a subspace is to have a basis. The basis vectors go into the columns of A. We are projecting onto the column space of A. Certainly the z axis is the column space of the following matrix A;. The xy plane is the column space of Az. That plane is also the column space of A3 (a subspace has many bases): Ar= 0 10 1 andes —a| i 20 nO 0 0 and seAci—s 1y 52 eZee O30 The overall problem is to project onto the column space of any m by n matrix. Start with a line. Then n = 1. The matrix A has only one column. Projection Onto a Line We are given a point b = (bj,..., bm) in m-dimensional space. line through the origin, in the direction of a = (a),...,@m). We are also given a We are looking along that line, to find the point p closest to b. The key to projection is orthogonality: The line connecting b to p is perpendicular to the vector a. This is the dotted line marked e in Figure 4.4—which we now compute by algebra. The projection p is some multiple of a! Call it p = Xa = “x hat” times a. Our first step is to compute this unknown number x. That will give the vector p. Then from the formula for p, we read off the projection matrix P. These three steps will lead to all projection matrices: find x, then find p, then find P. The dotted line b— p is b—<xa. It is perpendicular to a—this will determine x. Use the fact that two vectors are perpendicular when their dot product is zero: ss a-(b—xa)=0 or a a-b—Xa-a=0 or aed) s=—— a-a (ab =>: a'‘a For vectors the multiplication a'b is the same as a+b. Using the transpose is better, because it applies also to matrices. (We will soon meet A'b.) Our formula for % immediately gives the formula for p: 176 = Chapter 4 Orthogonality Figure 4.4 ~The projection of b onto a line has length ||p|| = ||b|| cos@. 4E The projection of b onto the line through a is the vector Special case 1: If b=a p=Xa = ‘7774. then t = 1. 
The projection of a onto a is itself. Special case 2: If b is perpendicular to a then a^T b = 0. The projection is p = 0.

Example 1 Project b = (1, 1, 1) onto a = (1, 2, 2) to find p = x̂ a in Figure 4.4.

Solution The number x̂ is the ratio of a^T b = 5 to a^T a = 9. So the projection is p = (5/9) a. The error vector between b and p is e = b - p. Those vectors p and e will add to b:

p = (5/9, 10/9, 10/9) and e = b - p = (4/9, -1/9, -1/9).

The error e should be perpendicular to a and it is: e^T a = 4/9 - 2/9 - 2/9 = 0.

Look at the right triangle of b, p, and e. The vector b is split into two parts: its component along the line is p, its perpendicular component is e. Those two sides of a right triangle have length ||b|| cos θ and ||b|| sin θ. The trigonometry matches the dot product:

p = (a^T b / a^T a) a, so its length is ||p|| = |a^T b| / ||a|| = ||b|| cos θ.   (3)

The dot product is a lot simpler than getting involved with cos θ and the length of b. The example has square roots in cos θ = 5/(3√3) and ||b|| = √3. There are no square roots in the projection p = (5/9) a.

Now comes the projection matrix. In the formula for p, what matrix is multiplying b? You can see it better if the number x̂ is on the right side of a:

p = a x̂ = a (a^T b / a^T a) = P b when the matrix is P = a a^T / a^T a.

P is a column times a row! The column is a, the row is a^T. Then divide by the number a^T a. The matrix P is m by m, but its rank is one. We are projecting onto a one-dimensional subspace, the line through a.

Example 2 Find the projection matrix P = a a^T / a^T a onto the line through a = (1, 2, 2).

Solution Multiply column times row and divide by a^T a = 9:

P = (1/9) [1; 2; 2] [1 2 2] = (1/9) [1 2 2; 2 4 4; 2 4 4].

This matrix projects any vector b onto a. Check p = P b for the particular b = (1, 1, 1) in Example 1:

p = P b = (1/9) [1 2 2; 2 4 4; 2 4 4] [1; 1; 1] = (1/9) [5; 10; 10], which is correct.

If the vector a is doubled, the matrix P stays the same. It still projects onto the same line. If the matrix is squared, P^2 equals P. Projecting a second time doesn't change anything, so P^2 = P. The diagonal entries of P add up to (1/9)(1 + 4 + 4) = 1.

The matrix I - P should be a projection too. It produces the other side e of the triangle, the perpendicular part of b. Note that (I - P) b equals b - p which is e. When P projects onto one subspace, I - P projects onto the perpendicular subspace. Here I - P projects onto the plane perpendicular to a.

Now we move beyond projection onto a line. Projecting onto an n-dimensional subspace of R^m takes more effort. The crucial formulas are collected in equations (5), (6), (7). Basically you need to remember them.

Projection Onto a Subspace

Start with n vectors a_1, ..., a_n in R^m. Assume that these a's are linearly independent. Problem: Find the combination x̂_1 a_1 + ... + x̂_n a_n that is closest to a given vector b. We are projecting each b in R^m onto the subspace spanned by the a's.

With n = 1 (only one vector a_1) this is projection onto a line. The line is the column space of A, which has just one column. In general the matrix A has n columns a_1, ..., a_n. Their combinations in R^m are the vectors Ax in the column space. We are looking for the particular combination p = A x̂ (the projection) that is closest to b. The hat over x̂ indicates the best choice, to give the closest vector in the column space. That choice is a^T b / a^T a when n = 1. For n > 1, the best x̂ is the vector (x̂_1, ..., x̂_n).

We solve this problem for an n-dimensional subspace in three steps: Find the vector x̂, find the projection p = A x̂, find the matrix P.

The key is in the geometry!
The dotted line in Figure 4.5 goes from b to the nearest point A x̂ in the subspace. This error vector b - A x̂ is perpendicular to the subspace. The error b - A x̂ makes a right angle with all the vectors a_1, ..., a_n. That gives the n equations we need to find x̂:

a_1^T (b - A x̂) = 0, ..., a_n^T (b - A x̂) = 0, or in matrix form [a_1^T; ... ; a_n^T] (b - A x̂) = [0; ... ; 0].   (4)

The matrix in those equations is A^T. The n equations are exactly A^T (b - A x̂) = 0. Rewrite A^T (b - A x̂) = 0 in its famous form A^T A x̂ = A^T b. This is the equation for x̂, and the coefficient matrix is A^T A. Now we can find x̂ and p and P:

4F The combination x̂_1 a_1 + ... + x̂_n a_n = A x̂ that is closest to b comes from

A^T (b - A x̂) = 0 or A^T A x̂ = A^T b.   (5)

The symmetric matrix A^T A is n by n. It is invertible if the a's are independent. The solution is x̂ = (A^T A)^-1 A^T b. The projection of b onto the subspace is the vector

p = A x̂ = A (A^T A)^-1 A^T b.   (6)

This formula shows the m by m projection matrix that produces p = P b:

P = A (A^T A)^-1 A^T.   (7)

Compare with projection onto a line, when the matrix A has only one column a:

x̂ = a^T b / a^T a and p = a (a^T b / a^T a) and P = a a^T / a^T a.

Those formulas are identical with (5) and (6) and (7)! The number a^T a becomes the matrix A^T A. When it is a number, we divide by it. When it is a matrix, we invert it. The new formulas contain (A^T A)^-1 instead of 1 / a^T a. The linear independence of the columns a_1, ..., a_n will guarantee that this inverse matrix exists.

The key step was A^T (b - A x̂) = 0. We used geometry (e is perpendicular to all the a's). Linear algebra gives this "normal equation" too, in a very quick way:

1. Our subspace is the column space of A.
2. The error vector b - A x̂ is perpendicular to that column space.
3. Therefore b - A x̂ is in the left nullspace. This means A^T (b - A x̂) = 0.

The left nullspace is important in projections. This nullspace of A^T contains the error vector e = b - A x̂. The vector b is being split into the projection p and the error e = b - p. Figure 4.5 shows the right triangle with sides p, e, and b.

Figure 4.5 The projection p is the nearest point to b in the column space of A (spanned by the columns a_1 and a_2). The perpendicular error e satisfies a_1^T e = 0 and a_2^T e = 0, so A^T e = A^T (b - A x̂) = 0: e is in the nullspace of A^T.

Example 3 If A = [1 0; 1 1; 1 2] and b = (6, 0, 0), find x̂ and p and P.

Solution Compute the square matrix A^T A and also the vector A^T b:

A^T A = [1 1 1; 0 1 2] [1 0; 1 1; 1 2] = [3 3; 3 5] and A^T b = [1 1 1; 0 1 2] [6; 0; 0] = [6; 0].

Now solve the normal equation A^T A x̂ = A^T b to find x̂:

[3 3; 3 5] [x̂_1; x̂_2] = [6; 0] gives x̂ = [5; -3].   (8)

The combination p = A x̂ is the projection of b onto the column space of A:

p = 5 [1; 1; 1] - 3 [0; 1; 2] = [5; 2; -1]. The error is e = b - p = [1; -2; 1].   (9)

That solves the problem for one particular b. To solve it for every b, compute the matrix P = A (A^T A)^-1 A^T. The determinant of A^T A is 15 - 9 = 6; (A^T A)^-1 is easy. Then multiply A times (A^T A)^-1 times A^T to reach P:

(A^T A)^-1 = (1/6) [5 -3; -3 3] and P = (1/6) [5 2 -1; 2 2 2; -1 2 5].   (10)

Two checks on the calculation. First, the error e = (1, -2, 1) is perpendicular to both columns (1, 1, 1) and (0, 1, 2). Second, the final P times b = (6, 0, 0) correctly gives p = (5, 2, -1). We must also have P^2 = P, because a second projection doesn't change the first projection.

Warning The matrix P = A (A^T A)^-1 A^T is deceptive. You might try to split (A^T A)^-1 into A^-1 times (A^T)^-1. If you make that mistake, and substitute it into P, you will find P = A A^-1 (A^T)^-1 A^T. Apparently everything cancels. This looks like P = I, the identity matrix. We want to say why this is wrong. The matrix A is rectangular. It has no inverse matrix. We cannot split (A^T A)^-1 into A^-1 times (A^T)^-1
because there is no A~! in the first place. In our experience, a problem that involves a rectangular matrix almost always leads to A'A. We cannot split up its inverse, since A~! and (A')~! don’t exist. What does exist is the inverse of the square matrix A!A. This fact is so crucial that we state it clearly and give a proof. 4G A!A is invertible if and only if A has linearly independent columns. Proof A'A is a square matrix (n by n). For every matrix A, we will now show that A'A has the same nullspace as A. When the columns of A are linearly independent, its nullspace contains only the zero vector. Then A’ A, with this same nullspace, is invertible. Let A be any matrix. If x is in its nullspace, then Ax = 0. Multiplying by AT gives A'Ax = 0. So x is also in the nullspace of A'A. Now start with the nullspace of A' A. From A’Ax = 0 we must prove that Ax = 0. We can’t multiply by (A')~!, which generally doesn’t exist. Just multiply by x!: (x')ATAx =0 or (Ax)'(Ax)=0 or ||Ax|l? =0. The vector Ax has length zero. Therefore Ax = 0. Every vector x in one nullspace is in the other nullspace. If A has dependent columns, so does A'A. If A has independent columns, so does A'A. This is the good case: When A has independent columns, A!A is square, symmetric, and invertible. To repeat for emphasis: A'A is (n by m) times (m by n). It is always square (n by n). It is always symmetric, because its transpose is (ATA)! = AT(A!)! which equals A'A. We just proved that A'A is invertible—provided A has independent columns. Watch the difference between dependent and independent columns: A! A A‘A |: 1 | ; ; ok Dieta) ana 4 i 8 dependent _ singular Very brief summary To find the projection Al A A'A E 1 4] : =| 2 SOP Ry et 4 indep. | 9 invertible p = X1a,+---+Xnan, solve A'AX = A™D. The projection is Ax and the error is e = b — p = b — AX. The projection matrix P = A(A'A)—'!A! gives p = Pb. This matrix satisfies P? = P. The distance from b to the subspace is |le|]. 4.2 =» REVIEW OF THE KEY IDEAS Projections =& 1 The projection of b onto the line through a is p = a¥ = a(a'b/a'‘a). Z The rank one projection matrix P = aa'/a'a multiplies b to produce p. 3 The projection of b onto a subspace leaves space. 4 When the columns 181 e = b — p perpendicular to the sub- of A are a basis, the normal equation A! A& = A'b leads to p= Ax. 5 The projection matrix P = A(A'A)~'A? has P! = P and P* = P. The second projection leaves p unchanged. Problem Set 4.2 Questions 1-9 ask for projections onto lines. Also errors e = b — p and matrices P. 1 Project the vector b onto the line through a. Check that e is perpendicular to a: (Qe iz Ola l e2 4 1 |i 1 ands 1 (DB), Opies and 1 —1 a= | —3 —1 Draw the projection of b onto a and also compute it from p = Xa: (a) belt and «=(9| (b) »=|| and fae 3 In Problem 1, find the projection matrix P = aa'/a'a onto the line through each vector a. Verify that P* = P. Multiply Pb in each case to compute the projection p. 4 Construct the projection matrices onto the lines through the a’s in Problem 2. Explain why P? should equal P. 5 Compute the projection matrices aa'/a'a onto the lines through a, = (—1, 2, 2) and az = (2,2,—1). Multiply those projection matrices and explain why their product P; Pz is what it is. 6 Project b = (1,0, 0) onto the lines through a; and az in Problem 5 and also onto a3 = (2, —1,2). Add up the three projections pj + p2 + p3. 7 Continuing Problems 5-6, find the projection matrix Verify that Py) + P2 + P3 = 1. P3 onto a3 = (2, —l, 2). 182 = Chapter 4 Orthogonality Poa} 2 a a. 
asl ld P; Poa, =| Questions 8-9-10 Questions 5-6-7 8 | on = Project the vector b = (1, 1) onto the lines through a; = (1,0) and az = (1, 2). Draw the projections p; and p2 and add p; + p2. The projections do not add to b because the a’s are not orthogonal. 5 In Problem 8, the projection of b onto the plane of a, and a2 will equal b. The projection matrix is P = le Check P= “AAMAS YAP wor A=[a, a]=[)4] 10 =“Project a; = (1,0) onto az = (1, 2). Then project the result back onto a;. Draw these projections and multiply the projection matrices P; Pz: Is this a projection? Questions 11-20 ask for projections, and projection matrices, onto subspaces. 11 Project b onto the column space of A by solving A'A¥ = A'b and p = Af: (a) A=|]0 [etl 1] 0 0 and b=|3 2) 4 (b) A=/]1 [orl Oia 1]° and b=|4 4 6 Find e = b — p. It should be perpendicular to the columns of A. 12 Compute the projection matrices P; and P2 onto the column spaces in Problem 11. Verify that P\b gives the first projection p,. Also verify Pe lal 13. Suppose A is the 4 by 4 identity matrix with its last column removed. A is 4 by 3. Project b = (1,2, 3,4) onto the column space of A. What shape is the projection matrix P and what is P? 14 Suppose b equals 2 times the first column of A. What is the projection of b onto the column space of A? Is P = / in this case? Compute p and P when b = (0, 2, 4) and the columns of A are (0, 1,2) and (1, 2, 0). 15 If A is doubled, then P = 2A(4A™A)~!2A™. This is the same as A(ATA)~!AT. The column space of 2A is the same as 16 . Is ¥ the same for A and 2A? What linear combination of (1,2, —1) and (1,0, 1) is closest to b = (2,1, 1)? 4.2 17. (Important) If P* = P. show that (I — P)* =I —P. When Projections 183 P projects onto the column space of A, J — P projects onto the 18 19 (a) If P is the 2 by 2 projection matrix onto the line through (1, 1), then 7 —P is the projection matrix onto (b) If P is the 3 by 3 projection matrix onto the line through (1, 1, 1), then I — P is the projection matrix onto To find the projection matrix onto the plane x — y — 2z = 0, choose two vectors in that plane and make them the columns of A. The plane should be the column space. Then compute P = A(ATA)7!A?. 20 = To find the projection matrix P onto the same plane x — y—2z = 0, write down a vector e that is perpendicular to that plane. Compute the projection Q = ee!/ele and then P = ] — Q. Questions 21-26 show that projection matrices satisfy P* = P and P! = P. 21 Multiply the matrix P = A(A'A)~!A! by itself. Cancel to prove that P? = P. Explain why P(Pb) always equals Pb: The vector Pb is in the column space so its projection is 22 Prove that P = A(A'A)~!A! is symmetric by computing the inverse of a symmetric matrix is symmetric. 23 If A is square and invertible, the warning against splitting (A'A)~! apply: P = AA~!(A')-!Al = 1. error 24 P!. Remember When A is invertible, why is P =I? that does not What is the e? The nullspace of A’ is _____ to the column space C(A). projection of b onto C(A) should be p = So if A'b = 0, the . Check that P = A(ATA)“!A? gives this answer. 25 The projection matrix P onto an n-dimensional subspaces has rank r = Reason: The projections Pb fill the subspace S. So S is the OLsP;, 26 If an m by m matrix has A* = A and its rank is m, prove that A = J. 27 The important fact in Theorem 4G is this: If AlAx = 0 then Ax = 0. New proof: The vector Ax is in the nullspace of . Ax 1s always in the column space of ____. To be in both perpendicular spaces, Ax must be zero. 28 Use P’ = P and P? 
= P to prove that the length squared of column 2 always equals the diagonal entry p22. This number is 2 = ma + a + a for a, \| | wn NNN=AN n. 184 = Chapter 4 Orthogonality LEAST SQUARES APPROXIMATIONS a 4.3 It often happens that Ax = b has no solution. The usual reason is: too many equations. The matrix has more rows than columns. There are more equations than unknowns (m is greater than n). The n columns span a small part of m-dimensional space. Unless all measurements are perfect, b is outside that column space. Elimination reaches an impossible equation and stops. But these are real problems and they need an answer. To repeat: We cannot always get the error e = b— Ax down to zero. When e is zero, X is an exact solution to Ax = b. When the length of e is as small as possible, X is a least squares solution. Our goal in this section is to compute x and use it. The previous section emphasized p (the projection). This section emphasizes ¥ (the least squares solution). They are connected by p = AX. The fundamental equation is still ATA¥ = A'b. Here is a short unofficial way to derive it: When the original Ax = b has no solution, multiply by A! and solve A!A¥ = A'b. Example 1 Find the closest straight line to three points (0, 6), (1,0), and (2, 0). No straight line goes through those points. We are asking for two numbers C and D that satisfy three equations. The line is b = C + Dt. Here are the equations at t = 0, 1,2 to match the given values b = 6, 0, 0: The first point is on the line b = C + Dt if The second point is on the line The third point is on the line b=C+ Dt if b = C + Dt if CUE 06 C+D-1=0 Cae Diet: This 3 by 2 system has no solution; b = 6,0,0 is not a combination of the columns of A: Avil me) Wiel Le 2, 6 fake 2 DIO Ax = b is not solvable. 0 The same numbers were in Example 3 in the last section. In practical problems, the data points are close to a line. But they don’t exactly match any C + Dt, and there easily could be m = 100 points instead of m = 3. The numbers 6, 0, 0 exaggerate the error sO you can see it clearly. Minimizing the Error How do we make the error e = b — Ax as small as possible? This is an important question with a beautiful answer. The best x (called ¥) can be found by geometry or algebra or calculus: 4.3 Least Squares Approximations 185 By geometry Every Ax lies in the plane of the columns (1, 1, 1) and (0, 1,2). In that plane, we are looking for the closest point b. The nearest point is the projection p. The best choice for Ax is p. Then x =X. The smallest possible error is e = b — p. By algebra Every vector b has a part in the column space of A and a perpendicular part in the left nullspace. The column space part is p. The left nullspace part is e. There is an equation we cannot solve (Ax = b). There is an equation we do solve (by removing é): Ax =b=p+e_ is impossible; Ax = p_ is solvable. (1) The solution ¥ to Ax = p makes the error as small as possible, because for any x: Ax —bl? = Ax — pil? + lel’. (2) This is the law c* = a* + b? for a right triangle. The vector Ax — p in the column space is perpendicular to e in the left nullspace. We reduce Ax — p to zero by choosing x to be x. That leaves the smallest possible error (namely e). Notice what “smallest” means. The squared length of Ax — b is being minimized: The least squares solution X makes E = |\|Ax — b\\? as small as possible. By calculus Most functions are minimized by calculus! The derivatives are zero at the minimum. The graph bottoms out and the slope in every direction is zero. 
Here the error function to be minimized is a sum of squares (the square of the error in each equation!): B= "Ax = bi? =(C +D-0—6)7 (C4 D-1 C+D). (3) The unknowns are C and D. Those are the components of ¥, and they determine the line. With two unknowns there are two derivatives—both zero at the minimum. They are “partial derivatives” because 0E/dC treats D as constant and dE/dD treats C as constant: OE oO = AC 1 DEO 6) df /9D =2(C + D-0—6)0) a(n Gey + 2(6 + D= DW) +2(C PD 2) 2) =—0: dE/0D contains the extra factors 0, 1,2. Those are the numbers in £ that multiply D. They appear because of the chain rule. (The derivative of (4 + 5x)? is 2 times 4 + 5x times an extra 5.) In the C derivative the corresponding factors are (1)(1)(1), because C is always multiplied by 1. It is no accident that 1, 1, | and 0, 1, 2 are the columns of A. Now cancel 2 from every term and collect all C’s and all D’s: The C derivative ra is zero; “3G + 3D'=6 The D derivative au is zéro;,, SC. +5) = 0; 186 = Chapter4 Orthogonality b,=0 é,=1 p3=-1 Figure 4.6 The closest line has heights p = (5,2, —1) with errors e = (1, —2, 1). The equations A' Ax = A'b give ¥ = (5, —3). The line is b = 5—3¢ and the projection is 5a; — 3ap. These equations are identical with A’AX = A'b. The best C and D are the components of ¥. The equations (4) from calculus are the same as the “normal equations” from linear algebra: The partial derivatives of \|Ax — b\\? are zero when A! AX = A‘b. The solution is C = 5 and D = —3. Therefore b = 5 — 3t is the best line—it comes closest to the three points. At ¢ = 0, 1, 2 this line goes through p = 5, 2, —1. It could not go through b = 6, 0, 0. The errors are 1, —2, 1. This is the vector e! Figure 4.6a shows the closest line to the three points. It misses by distances €1, €2, €3. Those are vertical distances. The least squares line is chosen to minimize = et + es + e. Figure 4.6b shows the same problem in another way (in 3-dimensional space). The vector b is not in the column space of A. That is why we could not solve Ax = b and put a line through the three points. The smallest possible error is the perpendicular vector e to the plane. This is e = b — AX, the vector of errors (1, —2, 1) in the three equations—and the distances from the best line. Behind both figures is the fundamental equation A'A¥ = A'D. Notice that the errors 1, —2, 1 add to zero. The error e = (e}, €2, e3) is perpendicular to the first column (1, 1,1) in A. The dot product gives e; + e2 + e3 = 0. 4.3 Least Squares Approximations 187 column space A Sy es Siete, 7 not possible = — >-—€ b=pte b not in column space 0 Independent columns No nullspace Figure 4.7 The projection p = AX is closest to b, so ¥ minimizes E = ||b — Ax ||?. The Big Picture The key figure of this book shows the four subspaces and the true action of a matrix. The vector x on the left side of Figure 4.2 went to b = Ax on the right side. In that figure x was split into x, + x,. There were many solutions to Ax = bD. In this section the situation is just the opposite. There are no solutions to Ax = Bb. Instead of splitting up x we are splitting up b. Figure 4.7 shows the big picture for least squares. Instead of Ax = b we solve AX¥ = p. The error e = b—p is unavoidable. Notice how the nullspace N(A) is very small—just one point. With independent columns, the only solution to Ax = 0 is x = 0. Then ATA is invertible. The equation A'Ax = A'b fully determines the best vector ¥. Chapter 7 will have the complete picture—all four subspaces included. 
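Those numbers are easy to verify by machine. Here is a short MATLAB-style sketch of the same calculation (an illustration only, not one of the book's Teaching Codes): it forms the normal equations A^T A x̂ = A^T b and recovers C = 5 and D = -3.

    A = [1 0; 1 1; 1 2];      % columns: all ones, and the times t = 0, 1, 2
    b = [6; 0; 0];
    xhat = (A'*A) \ (A'*b);   % solve the normal equations: xhat = (C, D) = (5, -3)
    p = A*xhat;               % heights of the best line: p = (5, 2, -1)
    e = b - p;                % errors e = (1, -2, 1); they add to zero
    disp(A'*e)                % e is perpendicular to both columns of A

When A has full column rank and more rows than columns, MATLAB's backslash command xhat = A\b reaches the same least squares solution directly.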
Every x splits into x, + x,, and every b splits into p +e. The best solution is still ¥ (or X,) in the row space. We can’t help e and we don’t want x,—this leaves A¥ = p. Fitting a Straight Line This is the clearest application of least squares. It starts with m points in a plane— hopefully near a straight line. At times f,..., tm those points are at heights bj,..., Den Figure 4.6a shows the best line €1,...,@m. b = C + Dt, which misses the points by distances Those are vertical distances. No line is perfect, and the least squares line is chosen to minimize E = et +..-+e¢, The first example in this section had three points. Now we allow m points (m can be large). The algebra still leads to the same two equations A'AX = A'b. The components of X¥ are still C and D. 188 Chapter 4 Orthogonality Figure 4.6b shows the same problem in another way (in m-dimensional space). A line goes exactly through the m points when we exactly solve Ax = b. Generally we can’t do it. There are only two unknowns C and D because A has n = 2 columns: C+ Ve ige Dt, =b, I if = aeto =UH ji amnesia) C+ Din (5) es The column space is so thin that almost certainly b is outside of it. The m points (t;,b;) almost certainly don’t lie on a line. €1,---,@m to the closest line. The components of e are the distances When b happens to lie in the column space, the points happen to lie on a line. In that case b = p. Then Ax = b is solvable and the errors are e = (0,..., 0). The closest line has heights pj,..., Pm with errors e1,..., €mThe equations A’AX = A'b give & = (C, D). The errors are e; = b; — C — Dt;. Fitting points by straight lines is so important that we give the two equations once and for all. Remember that b = C + Dt exactly fits the data points if C+ Dt, ; Cpr = b, wii pals ake it. ee =o. os a ba ag by e ext bm ic (6) This is our equation Ax = b. It is generally unsolvable, if m > 2. But there is one good feature. The columns of A are independent (unless all times ¢; are the same). So we turn to least squares and solve A'A® = A'b. The “dot-product matrix” A‘A is 2.by=2: ries F ee fi) 1 wall ep ty Sas ‘ bees m i fe aa 8, ; (8) On the right side of the equation is the 2 by 1 vector A'b: by A oe 5 me) wae en x a ees en; In a specific problem, all these numbers are given. The m equations Ax = b reduce to two equations ATA¥ = A'b. A formula for C and D will be in equation (10). 4.3 Least Squares Approximations 189 4H The line C + Dt which minimizes ef +--+ +e, is determined by ATAR = A™D: me CL SE ie ate C ENV ldee ESE: SED (9) The vertical errors at the m points on the line are the components of e = b — p. As always, those equations come from geometry or linear algebra or calculus. The error vector (or residual) b— AX is perpendicular to the columns of A (geometry). It is in the nullspace of A? (linear algebra). The best ¥ = (C, D) minimizes the total error E, the sum of squares: E(x) = \|Ax — bil? =(C +.Dh — by)? +--+ + (C + Dtm — bm)?. When calculus sets the derivatives 0E/dC and dE/dD to zero, it produces the two equations in (9). Other least squares problems have more than two unknowns. Fitting a parabola has n = 3 coefficients C, D, E (see below). In general we are fitting m data points by n parameters x;,..., X,. The matrix A hasn columns andn < m. The total error is a function E(x) = ||Ax — b||? of n variables. Its derivatives give the n equations A'A = A'b. The derivative of a square is linear—this is why the method of least squares is so popular. 
Example 2 The columns of A are orthogonal in one special case—when the measurement times add to zero. Suppose b = 1, 2,4 at times ¢ = —2,0,2. Those times add to zero. The dot product with 1, 1,1 is zero and A has orthogonal columns: C+ D(-2) =1 CAD (O) a= Ca UO), =4 The measurements La? or Ae —a C 1 | 0 | = || 2 I ame 4 1, 2,4 are not on a line. There is no exact C and D and x. Look at the matrix A! A in the least squares equation for ¥: A Tee An iPAD . is 3 Ol Le 7 E | sl=[4)- Main point: Now A'A is diagonal. We can solve separately for C= “ and D = g. The zeros in A'A are dot products of perpendicular columns in A. The denominators 3 and 8 are not | and 1, because the columns are not unit vectors. But a diagonal matrix is virtually as good as the identity matrix. Orthogonal columns are so helpful that it is worth moving the time origin to produce them. To do that, subtract away the average time f= (ty + +++ + tn)/m. Then the shifted measurement times 7; = t; —f add to zero. With the columns now orthogonal, A'A is diagonal. The best C and D have direct formulas: C2 by +-+-+)Dm mi and ee ees Encny Tote+Tp (10) 190 = Chapter 4 Orthogonality The best line is C+ DT or C+ D(t —f). The time shift that makes ATA diagonal is an example of the Gram-Schmidt process, which orthogonalizes the columns in advance. Fitting by a Parabola If we throw a ball, it would be crazy to fit the path by a straight line. A parabola b =C + Dt + Et? allows the ball to go up and come down again (b is the height at time ft). The actual path is not a perfect parabola, but the whole theory of projectiles starts with that approximation. When Galileo dropped a stone from the Leaning Tower of Pisa, it accelerated. The distance contains a quadratic term 5gt*. (Galileo’s point was that the stone’s mass is not involved.) Without that term we could never send a satellite into the right orbit. But even with a nonlinear function like t?, the unknowns C, D, E appear linearly. Choosing the best parabola is still a problem in linear algebra. Problem Fit b,,...,bm at times t},..., tm by a parabola b = C + Dt + Et?. With m > 3 points, the equations for an exact fit are generally unsolvable: C+ D+ Et? =b 1 has the m by 3 matrix A=|: : ie tome C8 C2 Dt, + kt be Least squares equations The best parabola chooses ¥ = (C, D, E) to satisfy the three normal A' Ax = A'b. May I ask you to convert this to a problem of projection? The column space of A has dimension ____. The projection of b is p = AX, which combines the three columns using the coefficients C, D, E. The error at the first data point is ey = bj—C-—Dt,—E re The total squared error is FE= er+ . If you prefer to minimize by calculus, take the partial derivatives of E Sith respect to These three derivatives will be zero when X = (C, D, E) solves the 3 by 3 see = equations Section 8.4 has more least squares applications. The big one is Fourier series— approximating functions instead of vectors. The error to be minimized changes from a sum Ve er i e to an integral. We will find the straight line that is closest to f(x). Example 3 For a parabola b = C+Dt+Et? to go through the three heights b = 6, 0, 0 when ¢ = 0, 1, 2, the equations are C+D-0+E-0*=6 C+D-1+E-17=0 C+D-2+E-2?=0. (12) 4.3 Least Squares Approximations 191 b =6—9t + 312 Figure 4.8 From Example 2: An exact fit of the parabola through three points means p =6 and e = 0. The fourth point requires least squares. This is Ax = b. We can solve it exactly. Three data points give three equations and a square matrix. 
The solution is x = (C, D, E) = (6, —9, 3). The parabola through the three points in Figure 4.8a is b = 6 — 9t + 377. What does this mean for projection? The matrix has three columns, which span the whole space R?. The projection matrix is the identity matrix! The projection of b is b. The error is zero. We didn’t need A'A¥ = A'D, because we solved Ax = b. Of course we could multiply by A‘, but there is no reason to do it. Figure 4.8a also shows a fourth point b4 at time ty. If that falls on the parabola, the new Ax = b (four equations) is still solvable. When the fourth point is not on the parabola, we turn to A'A¥ = A‘. Will the least squares parabola stay the same, with all the error at the fourth point? Not likely! An error vector (0,0, 0, e4) is not perpendicular to the column (1,1, 1,1) of A. Least squares balances out the four errors, and they add to zero. =» 1 REVIEW OF THE KEYIDEAS The least squares solution of ¥ minimizes =& E = ||Ax — b\|?. This is the sum of squares of the errors in the m equations (m > n). Zz The best ¥ comes from the normal equations A'A¥ = AD. 3 To fit m points by a line 4 The heights of the best line are p = (p},..- , Pm). The vertical distances to the data points are e = (e),... , &m). b= C + Dt the two normal equations give C and D. 192 Chapter 4 Orthogonality 5 If we fit m points by a combination of n functions y;(t), the m equations >> x; (ti) = b; are generally unsolvable. The n normal equations give the best least ‘squares solution ¥ = (4,....%,): Problem Set 4.3 Problems 1 J 1-10 use four data points to bring out the key ideas. (Straight line b = C + Dt through four points) With b = 0,8, 8,20 at times t = 0,1,3,4, measurements write down the four equations Ax = b (unsolvable). Change the to p = 1,5, 13, 17 and find an exact solution to AX¥ = p. With b = 0,8, 8,20 at t = 0,1,3,4, set up and solve the normal equations A! AX = A'b. For the best straight line in Figure 4.9a, find its four heights and four errors. What is the minimum value of E = et + es es -- e4? 3 Compute p = AX for the same b and A using A'A¥ = A'b. Check that e = b — p is perpendicular to both columns of A. What is the shortest distance |le]| from b to the column space? 4 (Use calculus) Write down E = ||Ax — b||? as a sum of four squares involving C and D. Find the derivative equations JE /dC = 0 and JdE/dD = 0. Divide by 2 to obtain the normal equations A)A¥ = A'b. 5 Find the height C of the best horizontal line to fit b = (0, 8, 8, 20). An exact fit would solve the unsolvable equations C = 0,C = 8,C = 8,C = 20. Find the 4 by 1 matrix A in these equations and solve ATA¥ = A'b. Redraw Figure 4.9a to show the best height x = C and the four errors in e. b,=20 7 b= (0, 8, 8, 20) a ~ €o p=Ca,+Da, b= (0, 1, 3, 4) a,=(1, 1,1, 1) = Figure 4.9 t=1 tz =3 t4=4 Problems 1-11: The closest line C + Dt matches the projection in R*. 4.3 Least Squares Approximations 193 Project b = (0, 8, 8, 20) onto the line through a = (1, 1, 1, 1). Find x = a'b/a'a and the projection p = xa. Redraw Figure 4.9b and check that e = b— p is perpendicular to a. What is the shortest distance |le|| from b to the line in your figure? Find the closest line b = Dt, through the origin, to the same four points. exact fit*would D -0-= solve 0,D -1.=.8, De 4 by | matrix and solve A'A¥ = A'b. slope x = D and the four errors. 3i= "89D <4%=120) An Find the Redraw Figure 4.9a showing the best Project b = (0,8, 8,20) onto the line through a = (0,1,3,4). Find ¥ and p = xa. 
The best C in Problems 5-6 and the best D in Problems 7-8 do not agree with the best (C, D) in Problems 1-4. That is because (1, 1,1, 1) and (0,1,3,4) are ____—s perpendicular. For the closest parabola b = C+ Dt+ Et? to the same four points, write down the unsolvable equations Ax = b. Set up the three normal equations A'A¥ = Ab (solution not required). In Figure 4.9a you are now fitting a parabola—what is happening in Figure 4.9b? 10 11 For the closest cubic b = C + Dt + Et? + Ft? to the same four points, write down the four equations Ax = b. Solve them by elimination. In Figure 4.9a this cubic now goes exactly through the points. Without computation write p and e in Figure 4.9b. The average of the four times is f = + +1+3-+4) =2. The average of the four b’s is b = (0+ 8 +8 + 20) =9. (a) Verify that the best line goes through the center point (, b) = (2,9). (b) Explain why C + Di = b comes from the first normal equation. Questions 12-16 introduce basic ideas of statistics—the foundation for least squares. 2 (Recommended) (Os (lees) This problem projects b = (bj,...,bm) onto the line through (a) Solve a'aX = a'b to show that X is the mean of the b’s. (b) Find the error vector e and the variance (c) Draw a graph with b = (1, 2, 6) fitted by a horizontal line. What are p and e on the graph? Check that p is perpendicular to e and find the matrix P. \le||7 and the standard deviation lle. 13 First assumption behind least squares: Each measurement error has “expected value” zero. Multiply the eight error vectors b — Ax = (+1,+1,+1) by (ATA) 1AT to show that the eight vectors ¥ — x also average to zero. The expected value of ¥ is the correct x. 194 Chapter 4 Orthogonality 14 Second assumption behind least squares: The measurement errors are independent and have the same variance o*. The average of (b — Ax)(b — Ax)! is 071. Multiply on the left by (A7A)~!A™ and multiply on the right by A(ATA)~! to show that the average of (¥ —x)(X — x)! is 02(A'A)~!. matrix for the error in X. 15 This is the covariance A doctor takes m readings b;,...,bm of your pulse rate. The least squares solution to the m equations x = bj,x = b2,...,x = bm is the average xX = (b} + +--+ bm)/m. The matrix A is a column of 1’s. Problem 14 gives the expected error (x — x)" as o2(A! A)! = . By taking m measurements, the variance drops from 07 to o7/m. 16 If you know the average X99 of 99 numbers b),..., 499, how can you quickly find the average Xj99 with one more number bj99? The idea of recursive least squares is to avoid adding 100 numbers. What coefficient correctly gives X00 from bjo9 and X99? 7000100 + Zoq = Tog(bi +--+ bio0). Questions 17-25 give more practice with ¥ and p and e. 17 Write down three equations for the line t= lob =] at. t=, and.b.=-2) b = C + Dt to go through b = 7 at at t= 2...Find the leastasquares, solution X = (C, D) and draw the closest line. 18 Find the projection p = AX in Problem 17. This gives the three of the closest line. Show that the error vector is e = (2, —6, 4). 1g Suppose the measurements at t = —1, 1, 2 are the errors 2, —6,4 in Problem Compute x and the closest line. Explain the answer: dicular to so the projection is p = 0. 20 Suppose the measurements at ¢ = —1,1,2 are b = (5, 13,17). the closest line and e. The error is e = 0 because this D is 21 Which of the four subspaces contains the error vector e? Which contains x? What is the nullspace of A? ZZ Find the best line C + Dt to fit 23 18. b = (2, —6,4) is perpenCompute ¥ and Which contains p? b= 4,2, —1,0,0 at times t = —2, —1,0, 1,2. 
(Distance between lines) The points P = (x, x, x) are on a line through (1, 1, 1) and Q = (y,3y,-—1) are on another line. squared distance ||P — Q|I". Choose x and y to minimize the 24 Is the error vector e orthogonal to b or p or e or ¥? Show that |le||* equals e'b which equals b'b — b! p. This is the smallest total error E. 25 The derivatives of ||Ax||? with respect to the -variables x;,...,x, fill the vector 2A! Ax. The derivatives of 2b’ Ax fill the vector 2A'b. So the derivatives of ||Ax — b\|? are zero when 4.4 ORTHOGONAL Orthogonal Bases and Gram-Schmidt 195 BASES AND GRAM-SCHMIDT a 4.4 This section has two goals. The first is to see how orthogonal vectors make calculations simpler. Dot products are zero—so A!A is a diagonal matrix. The second goal is to construct orthogonal vectors. We will pick combinations of the original vectors to produce right angles. Those original vectors are the columns of A. The orthogonal vectors will be the columns of a new matrix Q. You know from Chapter 3 what a basis consists of—independent vectors that span the space. The basis gives a set of coordinate axes. The axes could meet at any angle (except 0° and 180°). But every time we visualize axes, they are perpendicular. /n our imagination, the coordinate axes are practically always orthogonal. This simplifies the picture and it greatly simplifies the computations. The vectors q1,...,4n are orthogonal when their dot products q; + qj are zero. More exactly gq}9) = 0 whenever i ~# j. With one more step—just divide each vector by its length—the vectors become orthogonal unit vectors. Their lengths are all 1. Then the basis is called orthonormal. DEFINITION The vectors qj,..., qn are orthonormal if tT, _J|0O Vi 41 ~) 1 wheni+#j (orthogonal vectors) when i = j. (unit vectors: |lq;|| = 1) A matrix with orthonormal columns is assigned the special letter Q. The matrix Q is easy to work with because Q'Q =I. This repeats in matrix language that the columns q1,..-@n are orthonormal. It is equation (1) below, and Q is not required to be square. When Q is square, o'oO = / means that gt =O transpose = inverse. 4l A matrix Q with orthonormal columns satisfies Q'Q = /: ag ne ie O' Oe ; ae, oe 1 — Fie i | Tieden Gril 14) 0 | 5) cae a i eee ee eee OO cee | See(l) 1 When row i of Q? multiplies column j of Q, the dot product is q; 9): Off the diagonal (i # j) that dot product is zero by orthogonality. On the diagonal (i = j) the unit vectors give q} qi = lIqi ll ——}a ks If the columns are only orthogonal (not unit vectors), then Q'Q is a diagonal matrix (not the identity matrix). We wouldn’t use the letter Q. But this matrix is almost as good. The important thing is orthogonality—then it is easy to produce unit vectors. 196 Chapter 4 Orthogonality To repeat: Q'Q =I even when Q is rectangular. In that case Q! is only an inverse from the left. For square matrices we also have QQ' =1, so Q! is the two-sided inverse of Q. The rows of a square Q are orthonormal like the columns. To invert the matrix we just transpose it. In this square case we call Q an orthogonal matrix. Here are three examples of orthogonal matrices—rotation and permutation and reflection. Example 1 (Rotation) Q rotates every vector in the plane through the angle @: ee sin 6 a a oat cos 0 A a —sin@ cosé The columns of Q are orthogonal (take their dot product). They are unit vectors because sin? @+cos?@ = 1. Those columns give an orthonormal basis for the plane R?. The standard basis vectors i and j are rotated through @ (see Figure 4.10a). Q-! 
rotates vectors back through −θ. It agrees with Q^T, because the cosine of −θ equals the cosine of θ, and sin(−θ) = −sin θ. We have Q^T Q = I and Q Q^T = I.

Example 2 (Permutation) These matrices change the order of the components:

    [0 1 0; 0 0 1; 1 0 0][x; y; z] = [y; z; x]    and    [0 1; 1 0][x; y] = [y; x].

All columns of these Q's are unit vectors (their lengths are obviously 1). They are also orthogonal (the 1's appear in different places). The inverse of a permutation matrix is its transpose. The inverse puts the components back into their original order:

    Inverse = transpose:    [0 0 1; 1 0 0; 0 1 0][y; z; x] = [x; y; z]    and    [0 1; 1 0][y; x] = [x; y].

Every permutation matrix is an orthogonal matrix.

Example 3 (Reflection) If u is any unit vector, set Q = I − 2uu^T. Notice that uu^T is a matrix while u^T u is the number ‖u‖² = 1. Then Q^T and Q^{−1} both equal Q:

    Q^T = I − 2uu^T = Q    and    Q^T Q = I − 4uu^T + 4uu^T uu^T = I.    (2)

Reflection matrices I − 2uu^T are symmetric and also orthogonal. If you square them, you get the identity matrix. Reflecting twice through a mirror brings back the original, and Q² = Q^T Q = I. Notice u^T u = 1 near the end of equation (2).

*"Orthonormal matrix" would have been a better name for Q, but it's not used. Any matrix with orthonormal columns has the letter Q, but we only call it an orthogonal matrix when it is square.

As examples we choose the unit vectors u1 = (1, 0) and then u2 = (1/√2, −1/√2). Compute 2uu^T (column times row) and subtract from I:

    Q1 = I − 2[1; 0][1 0] = [−1 0; 0 1]    and    Q2 = I − 2[1/2 −1/2; −1/2 1/2] = [0 1; 1 0].

Q1 reflects u1 = (1, 0) across the y axis to (−1, 0). Every vector (x, y) goes into its mirror image (−x, y) across the y axis:

    Reflection from Q1:    [−1 0; 0 1][x; y] = [−x; y].

Q2 is reflection across the 45° line. Every (x, y) goes to (y, x)—this was the permutation in Example 2. A vector like (3, 3) doesn't move when you exchange 3 and 3—it is on the mirror line. Figure 4.10b shows the mirror.

Rotations preserve the length of a vector. So do reflections. So do permutations. So does multiplication by any orthogonal matrix—lengths and angles don't change.

4J If Q has orthonormal columns (Q^T Q = I), it leaves lengths unchanged:

    ‖Qx‖ = ‖x‖ for every vector x.    (3)

Q also preserves dot products: (Qx)^T(Qy) = x^T Q^T Q y = x^T y. Just use Q^T Q = I!

Proof ‖Qx‖² equals ‖x‖² because (Qx)^T(Qx) = x^T Q^T Q x = x^T I x = x^T x.

Orthogonal matrices are excellent for computations—numbers can never grow too large when lengths of vectors are fixed. Good computer codes use Q's as much as possible. That makes them numerically stable.

Figure 4.10 Rotation by Q = [cos θ −sin θ; sin θ cos θ] and reflection by Q = [0 1; 1 0].

Projections Using Orthogonal Bases: Q Replaces A

This chapter is about projections onto subspaces. We developed the equations for x̂ and p and P. When the columns of A were a basis for the subspace, all formulas involved A^T A. The entries of A^T A are the dot products a_i^T a_j.

Suppose the basis vectors are actually orthonormal. The a's become q's. Then A^T A simplifies to Q^T Q = I. Look at the improvements in x̂ and p and P. Instead of Q^T Q we print a blank for the identity matrix:

    x̂ = Q^T b    and    p = Q x̂    and    P = Q Q^T.    (4)

The least squares solution of Qx = b is x̂ = Q^T b. The projection matrix is P = Q Q^T.

There are no matrices to invert. This is the point of an orthonormal basis. The best x̂ = Q^T b just has dot products of b with the rows of Q^T, which are the q's:

    x̂ = [q1^T b; … ; qn^T b]    (dot products).

We have n separate 1-dimensional projections. The "coupling matrix" or "correlation matrix" A^T A is now Q^T Q = I. There is no coupling.
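Here is a small MATLAB illustration of that shortcut (the orthonormal columns below are just an example, and b is arbitrary): with orthonormal columns there is nothing to invert, and x̂ comes from dot products alone.

% xhat = Q'*b when the columns of Q are orthonormal (illustrative sketch)
q1 = [1; -1; 0]/sqrt(2);        % two orthonormal vectors in R^3
q2 = [1;  1; -2]/sqrt(6);
Q  = [q1 q2];
b  = [1; 2; 7];
xhat = Q'*b;                    % n separate dot products q_i'*b
p = Q*xhat;                     % projection of b onto the column space of Q
P = Q*Q';                       % projection matrix
disp(norm(P*b - p))             % should be essentially zero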
Here is p = Q x̂:

    p = Q x̂ = [q1 ⋯ qn][q1^T b; … ; qn^T b] = q1(q1^T b) + ⋯ + qn(qn^T b).    (5)

Important case: When Q is square and m = n, the subspace is the whole space. Then Q^T = Q^{−1} and x̂ = Q^T b is the same as x = Q^{−1} b. The solution is exact! The projection of b onto the whole space is b itself. In this case P = Q Q^T = I.

You may think that projection onto the whole space is not worth mentioning. But when p = b, our formula assembles b out of its 1-dimensional projections. If q1, …, qn is an orthonormal basis for the whole space, so Q is square, then every b is the sum of its components along the q's:

    b = q1(q1^T b) + q2(q2^T b) + ⋯ + qn(qn^T b).    (6)

That is Q Q^T = I. It is the foundation of Fourier series and all the great "transforms" of applied mathematics. They break vectors or functions into perpendicular pieces. Then by adding the pieces, the inverse transform puts the function back together.

Example 4 The columns of this matrix Q are orthonormal vectors q1, q2, q3:

    Q = (1/3)[−1 2 2; 2 −1 2; 2 2 −1]    has first column q1 = (−1/3, 2/3, 2/3).

The separate projections of b = (0, 0, 1) onto q1 and q2 and q3 are

    q1(q1^T b) = (2/3)q1    and    q2(q2^T b) = (2/3)q2    and    q3(q3^T b) = −(1/3)q3.

The sum of the first two is the projection of b onto the plane of q1 and q2. The sum of all three is the projection of b onto the whole space—which is b itself:

    (2/3)q1 + (2/3)q2 − (1/3)q3 = (1/9)[(−2, 4, 4) + (4, −2, 4) + (−2, −2, 1)] = (0, 0, 1) = b.

The Gram-Schmidt Process

The point of this section is that "orthogonal is good." Projections and least squares always involve A^T A. When this matrix becomes Q^T Q = I, the inverse is no problem. The one-dimensional projections are uncoupled. The best x̂ is Q^T b (n separate dot products). For this to be true, we had to say "if the vectors are orthonormal." Now we find a way to create orthonormal vectors.

Start with three independent vectors a, b, c. We intend to construct three orthogonal vectors A, B, C. Then (at the end is easiest) we divide A, B, C by their lengths. That produces three orthonormal vectors q1 = A/‖A‖, q2 = B/‖B‖, q3 = C/‖C‖.

Gram-Schmidt Begin by choosing A = a. This gives the first direction. The next direction B must be perpendicular to A. Start with b and subtract its projection along A. This leaves the perpendicular part, which is the orthogonal vector B:

    B = b − (A^T b / A^T A) A.    (7)

A and B are orthogonal in Figure 4.11. Take the dot product with A to verify that A^T B = A^T b − A^T b = 0. This vector B is what we have called the error vector e, perpendicular to A. Notice that B in equation (7) is not zero (otherwise a and b would be dependent).

Figure 4.11 First find B. Then find C. Then divide by ‖A‖, ‖B‖, and ‖C‖.

The directions A and B are now set. The third direction starts with c. This is not a combination of A and B (because c is not a combination of a and b). But most likely c is not perpendicular to A and B. So subtract off its components in those two directions to get C:

    C = c − (A^T c / A^T A) A − (B^T c / B^T B) B.    (8)

This is the one and only idea of the Gram-Schmidt process. Subtract from every new vector its projections in the directions already set. That idea is repeated at every step.† If we also had a fourth vector d, we would subtract its projections onto A, B, and C. That gives D. At the end, divide the orthogonal vectors A, B, C, D by their lengths. The resulting vectors q1, q2, q3, q4 are orthonormal.

Example 5 Suppose the independent vectors a, b, c are

    a = (1, −1, 0)    and    b = (2, 0, −2)    and    c = (3, −3, 3).

Then A = a has A^T A = 2.
Subtract from b its projection along A:

    B = b − (A^T b / A^T A) A = b − A = (1, 1, −2).

Then subtract from c its projections along A and B, as in equation (8):

    C = c − (A^T c / A^T A) A − (B^T c / B^T B) B = c − 3A + B = (1, 1, 1).

Check: C is perpendicular to A and B. Finally convert A, B, C to unit vectors (length 1, orthonormal). The lengths of A, B, C are √2 and √6 and √3. Divide by those, for an orthonormal basis:

    q1 = (1/√2)(1, −1, 0)    and    q2 = (1/√6)(1, 1, −2)    and    q3 = (1/√3)(1, 1, 1).

Usually A, B, C contain fractions. Almost always q1, q2, q3 contain square roots.

† I think Gram had the idea. I don't really know where Schmidt came in.

The Factorization A = QR

We started with a matrix A, whose columns were a, b, c. We ended with a matrix Q, whose columns are q1, q2, q3. How are those matrices related? Since the vectors a, b, c are combinations of the q's (and vice versa), there must be a third matrix connecting A to Q. Call it R.

The first step was q1 = a/‖a‖ (other vectors not involved). The second equation was (7), where b is a combination of A and B. At that stage C and q3 were not involved. This non-involvement of later vectors is the key point of the process:

- The vectors a and A and q1 are all along a single line.
- The vectors a, b and A, B and q1, q2 are all in the same plane.
- The vectors a, b, c and A, B, C and q1, q2, q3 are in one subspace (dimension 3).

At every step a1, …, ak are combinations of q1, …, qk. Later q's are not involved. The connecting matrix R is triangular, and we have A = QR:

    [a b c] = [q1 q2 q3][q1^T a  q1^T b  q1^T c;  0  q2^T b  q2^T c;  0  0  q3^T c]    or    A = QR.    (9)

A = QR is Gram-Schmidt in a nutshell.

4K (Gram-Schmidt) From independent vectors a1, …, an, Gram-Schmidt constructs orthonormal vectors q1, …, qn. The matrices with these columns satisfy A = QR. Then R = Q^T A is triangular, because later q's are orthogonal to earlier a's: q_i^T a_j = 0 when i > j.

Here are the a's and q's from the example. The i, j entry of R = Q^T A is the dot product of q_i with a_j:

    A = [1 2 3; −1 0 −3; 0 −2 3] = [1/√2 1/√6 1/√3; −1/√2 1/√6 1/√3; 0 −2/√6 1/√3][√2 √2 √18; 0 √6 −√6; 0 0 √3] = QR.

The lengths of A, B, C are the numbers √2, √6, √3 on the diagonal of R. Because of the square roots, QR looks less beautiful than LU. Both factorizations are absolutely central to calculations in linear algebra.

Any m by n matrix A with independent columns can be factored into QR. The m by n matrix Q has orthonormal columns, and the square matrix R is upper triangular with positive diagonal. We must not forget why this is useful for least squares: A^T A equals R^T Q^T Q R = R^T R. The least squares equation A^T A x̂ = A^T b simplifies to

    R^T R x̂ = R^T Q^T b    or    R x̂ = Q^T b.    (10)

Instead of solving Ax = b, which is impossible, we solve R x̂ = Q^T b by back substitution—which is very fast. The real cost is the mn² multiplications in the Gram-Schmidt process, which are needed to find Q and R.

Here is an informal code. It executes equations (11) and (12), for k = 1 then k = 2 and eventually k = n. Equation (11) normalizes to unit vectors:

    For k = 1, …, n:    r_kk = (Σ_{i=1}^m a_ik²)^{1/2}    and    q_ik = a_ik / r_kk    for i = 1, …, m.    (11)

Equation (12) subtracts off the projection onto q_k:

    For j = k + 1, …, n:    r_kj = Σ_{i=1}^m q_ik a_ij    and    a_ij = a_ij − q_ik r_kj    for i = 1, …, m.    (12)

Starting from a, b, c = a1, a2, a3 this code will construct q1, then B and C*, then q2, then C, then q3:

    1  q1 = a1/‖a1‖  in (11)
    2  B = a2 − (q1^T a2)q1  and  C* = a3 − (q1^T a3)q1  in (12)
    3  q2 = B/‖B‖  in (11)
    4  C = C* − (q2^T C*)q2  in (12)
    5  q3 = C/‖C‖  in (11)

Equation (12) subtracts off projections as soon as the new vector q_k is found. This change to "one projection at a time" is called modified Gram-Schmidt. It is numerically more stable than equation (8) which subtracts all projections at once.
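The informal code in equations (11) and (12) translates almost line for line into MATLAB. This is only a sketch of modified Gram-Schmidt; the test matrix is the one from Example 5, and the names A0, Q, R are chosen here for illustration.

% Modified Gram-Schmidt following equations (11) and (12) (a sketch)
A0 = [1 2 3; -1 0 -3; 0 -2 3];      % columns a, b, c from Example 5
A  = A0;                            % working copy: its columns are updated in place
[m, n] = size(A);
Q = zeros(m, n);  R = zeros(n, n);
for k = 1:n
    R(k,k) = norm(A(:,k));          % equation (11): r_kk is the length of column k
    Q(:,k) = A(:,k) / R(k,k);       % equation (11): normalize to the unit vector q_k
    for j = k+1:n
        R(k,j) = Q(:,k)' * A(:,j);  % equation (12): component of column j along q_k
        A(:,j) = A(:,j) - Q(:,k) * R(k,j);   % subtract that projection right away
    end
end
disp(norm(Q*R - A0))                % Q times R rebuilds the original matrix
disp(norm(Q'*Q - eye(n)))           % the columns of Q are orthonormal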
# 1 REVIEW OF THE KEY IDEAS If the orthonormal vectors q),...qQn are the columns #8 of Q, then q} 4; = 0 and q)4i = 1 translate into Q'Q = 1. If Q is square (an orthogonal matrix) then Q' = Q7!. The length of Qx equals the length of x: ||Qx|| = ||x]]. Ww > N The projection onto the column space spanned by the q’s is P = OQ!. 5 If Q is square then P = / and every b= qi(q} b) +...+4@n(q)b). 6 Gram-Schmidt produces orthonormal In matrix form this is the factorization qi, q2,q3 A = OR. from independent a,b,c. 4.4. Orthogonal Bases and Gram-Schmidt 203 Problem Set 4.4 Problems 1-12 are about orthogonal vectors and orthogonal matrices. 1 Are these pairs of vectors orthonormal or only orthogonal or only independent? ) 1 b ape —1 _ () 6 4 | auc Bey ©) cos 6 ee ane —sin@ a Change the second vector when necessary to produce orthonormal vectors. The vectors (2, 2, —1) and (—1, 2, 2) are orthogonal. Divide them by their lengths to find orthonormal vectors q; and g2. Put those into the columns of Q and multiply O'O and OO". (a) If A has three orthogonal columns each of length 4, what is A’ A? (b) If A has three orthogonal columns of lengths 1, 2,3, what is AA? Give an example of each of the following: (a) A matrix Q that has orthonormal columns but QQ? Sali, (b) Two orthogonal vectors that are not linearly independent. (c) An orthonormal basis for R*, where every component is 5 or —5. Find two orthogonal vectors that lie in the plane x + y + 2z = 0. Make them orthonormal. If Q,; and Q> are orthogonal matrices, show that their product Q;Q2 orthogonal matrix. (Use o'o =O) is also an If the columns of Q are orthonormal, what is the least squares solution ¥ to Qx = b? Give an example with b 4 0 but x = 0. (a) Compute P = QQ" when q, = (.8,.6,0) and q2 = (—.6,.8,0). Verify that P? = P. (b) Prove that always (0Q')(QQ') = QQ! by using Q'Q =/. Then P = QQ! is the projection matrix onto the column space of Q. Orthonormal vectors are automatically linearly independent. (a) Vector proof: When c1q1 + c2qg2 + c3q3 = 0, what dot product leads to c) = 0? Similarly cz = 0 and c3 = 0 and the q’s are independent. 10 (b) Matrix proof: Show that Qx = 0 leads to gular, you can use QP but not Or% x = 0. Since (a) Find orthonormal vectors q; and q2 in the plane of a = (1,3,4,5,7) b = (—6, 6, 8, 0, 8). (b) Which vector in this plane is closest to (1, 0,0, 0, 0)? Q may be rectan- and 204 Chapter 4 Orthogonality 11. If q; and q2 are orthonormal vectors in R°, what combination is closest to a given vector b? 12 If aj, az, a3 is a basis for R°’, any vector b can be written as b= xja, + x2a2 + x343 (a) a, az a3 Oe Suppose the a’s are orthonormal. Show that x; = alb. (b) Suppose the a’s are orthogonal. Show that x; = al b/a}ay. (c) If the a’s are independent, x, is the first component of Problems 13. or qt 13-24 are about the Gram-Schmidt qQ 0. times b. process and A = QR. What multiple of a =[1] should be subtracted from b = [4] to make the result B orthogonal to a? Sketch a figure to show a, b, and B. 14 Complete the Gram-Schmidt process in Problem and q2 = B/||B|| and factoring into QR: 13 by computing q; = a/|la|| F f= [a oo|['0! isi) 15 (a) Find orthonormal vectors q1, q2, g3 such that q;, g2 span the column space of (b) Which of the four fundamental subspaces contains q3? (c) Solve Ax = (1, 2,7) by least squares. 16 What multiple of a = (4,5, 2,2) is closest to vectors q; and q2 in the plane of a and BD. 17 ‘Find the projection of b onto the line through a: 1 ¢=nel and. i 1 bi=2\53 5 and) b = (1,2,0,0)? 
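Problems like 10 and 11 can be checked numerically. The sketch below runs Gram-Schmidt on two vectors and projects a third vector onto their plane; the data are Problem 10's vectors, used here only as an example.

% Orthonormal basis for the plane of a and b, then the closest vector to c (a sketch)
a = [1; 3; 4; 5; 7];  b = [-6; 6; 8; 0; 8];
q1 = a / norm(a);
B  = b - (q1'*b)*q1;            % subtract the projection of b along q1
q2 = B / norm(B);
c  = [1; 0; 0; 0; 0];
p  = q1*(q1'*c) + q2*(q2'*c)    % closest vector in the plane to c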
Pi) and Find orthonormal eb — pie) Compute the orthonormal vectors q; = a/||a|| and q2 = e/|le||. 18 If A = QR then ATA = R'R = Gram-Schmidt triangular times triangular. on A corresponds to elimination on A’A. Compare the pivots for A'A with ||a||? = 3 and |le||7 = 8 in Problem 17: 1 l = Av—siales N and Tops Nelly ee Al 4.4 19 20 Orthogonal Bases and Gram-Schmidt 205 True or false (give an example in either case): (a) The inverse of an orthogonal matrix is an orthogonal matrix. (b) If Q (3 by 2) has orthonormal columns then ||Qx|| always equals ||x||. ‘Find an orthonormal basis for the column space of A: 2 0 1 and —4 —3 3 I) = 3 — 0 Then compute the projection of b onto that column space. 21 ~+Find orthogonal vectors A, B,C by Gram-Schmidt from 1 1 Oral and b=/)-1 2 22 ard Cea) 0 Find qj, q2, q3 (orthonormal) Then write A as OR: as combinations > 4 of a, b,c (independent columns). ll oor won Nuns 23 (a) Find a basis for the subspace S in R* spanned by all solutions of xp 24 +x. +x3—x4 = 0. (b) Find a basis for the orthogonal complement S+. (c) Find by; in S and bp in S+ so that b} + by = b = (1, 1,1, 1). If ad —bc > 0, the entries in A = QR are a (a ki 4 CL te =¢| (?) ances (a7 0 sab scd ad — bc [qr 2 Write down A = QR when a, b, c,d = 2,1, 1, 1 and also 1, 1, 1, 1. Which entry becomes zero when Gram-Schmidt breaks down? 206 Chapter 4 Orthogonality Problems 25-28 use the QR code in equations (11-12). It executes Gram-Schmidt. 25 Show why C (found via C* in the steps after (12)) is equal to C in equation (8). 26 Equation (8) subtracts from c its components along A and B. Why not subtract the components along a and along b? 27 Write a working code and apply it to a = (2, 2, —1), What are the q’s? 28 Where are the mn? multiplications in equations (11) and (12)? b = (0, —3, 3), ec= (1, 0, 0). Problems 29-32 involve orthogonal matrices that are special. 29 (a) Choose c so that Q is an orthogonal matrix: jee ee |e ee Gere CEC aleip 8Rien eo sie (b) Change the first row matrix. and column ket 1 to all 1’s and fill in another orthogonal 30 Project b = (1,1, 1,1) onto the first column onto the plane of the first two columns. 31 If w is a unit vector, then Q = J — 2uu!' is a reflection matrix (Example 3). Find Q from u = (0,1) and also from u = (0, V2/2, V2/2). Draw figures to show the reflections of (x, y) and (x, y, z). 32 Q =1 —2uu! 33 in Problem 29(a). Then project b T is a reflection matrix when u'u = 1. (a) Prove that Qu = —u. The mirror is perpendicular to u. (b) Find Qv when u!v =0. The mirror contains v. The first four wavelets are in the columns r=. a ce alae of this wavelet matrix W: VA 0 BG /2) mein 1 Ora), iG es Onn) What is special about the columns of W? W-!. What is the relation of W~! to W? Find the inverse wavelet transform DETERMINANTS THE PROPERTIES OF DETERMINANTS a 5.1 The determinant of a square matrix is a single number. That number contains an amazing amount of information about the matrix. It tells immediately whether the matrix is invertible. The determinant is zero when the matrix has no inverse. When A is invertible, the determinant of A~! is 1/(det A). If det A = 2 then det A~! = 5. In fact the determinant leads to a formula for every entry in A7!. This is one use for determinants—to find formulas for inverse matrices and pivots and solutions A~!b. For a matrix of numbers, we seldom use those formulas. (Or rather, we use elimination as the quickest way to evaluate the answer.) For a matrix with entries a,b, c,d, its determinant shows how A~! 
changes as A changes:

    A = [a b; c d]    has inverse    A^{−1} = (1/(ad − bc)) [d −b; −c a].    (1)

Multiply those matrices to get I. The determinant of A is ad − bc. When det A = 0, we are asked to divide by zero and we can't—then A has no inverse. (The rows are parallel when a/c = b/d. This gives ad = bc and a zero determinant.) Dependent rows lead to det A = 0.

There is also a connection to the pivots, which are a and d − (c/a)b. The product of the two pivots is the determinant:

    a (d − (c/a) b) = ad − cb = det A.

After a row exchange the pivots are c and b − (a/c)d. Those pivots multiply to give minus the determinant. The row exchange reversed the sign of det A.

Looking ahead The determinant of an n by n matrix can be found in three ways:

    1 Multiply the n pivots (times 1 or −1).             This is the pivot formula.
    2 Add up n! terms (times 1 or −1).                   This is the "big" formula.
    3 Combine n smaller determinants (times 1 or −1).    This is the cofactor formula.

You see that plus or minus signs—the decisions between 1 and −1—play a very big part in determinants. That comes from the following rule for n by n matrices:

    The determinant changes sign when two rows (or two columns) are exchanged.

The identity matrix has determinant +1. Exchange two rows and det P = −1. Exchange two more rows and the new permutation has det P = +1. Half of all permutations are even (det P = 1) and half are odd (det P = −1). Starting from I, half of the P's involve an even number of exchanges and half require an odd number.

In the 2 by 2 case, ad has a plus sign and bc has minus—coming from the row exchange:

    det [1 0; 0 1] = 1    and    det [0 1; 1 0] = −1.

The other essential rule is linearity—but a warning comes first. Linearity does not mean that det(A + B) = det A + det B. This is absolutely false. That kind of linearity is not even true when A = I and B = I. The false rule would say that det 2I = 1 + 1 = 2. The true rule is det 2I = 2^n. Determinants are multiplied by t^n (not just by t) when matrices are multiplied by t. But we are getting ahead of ourselves.

In the choice between defining the determinant by its properties or its formulas, we choose its properties—sign reversal and linearity. The properties are simple (Section 5.1). They prepare for the formulas (Section 5.2). Then come the applications, including these three:

    (1) Determinants give A^{−1} and A^{−1}b (by Cramer's Rule).
    (2) The volume of an n-dimensional box is |det A|, when the edges of the box come from the rows of A.
    (3) The numbers λ for which det(A − λI) = 0 are the eigenvalues of A. This is the most important application and it fills Chapter 6.

The Properties of the Determinant

There are three basic properties (rules 1, 2, 3). By using those rules we can compute the determinant of any square matrix A. This number is written in two ways, det A and |A|. Notice: Brackets for the matrix, straight bars for its determinant. When A is a 2 by 2 matrix, the three properties lead to the answer we expect:

    The determinant of [a b; c d] is ad − bc.

We will check each rule against this 2 by 2 formula, but do not forget: The rules apply to any n by n matrix. When we prove that properties 4-10 follow from 1-3, the proof must apply to all square matrices.

Property 1 (the easiest rule) matches the determinant of I with the volume of a unit cube.

1 The determinant of the n by n identity matrix is 1.

2 The determinant changes sign when two rows are exchanged (sign reversal):

    Check:    |c d; a b| = −|a b; c d|    (both sides equal bc − ad).
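Rules 1 and 2 are easy to test numerically. A short MATLAB check of sign reversal, and of the earlier warning that det 2I = 2^n, might look like this (the matrix A is an arbitrary example):

% Rule 2 (sign reversal) and the warning about det(2I), checked numerically
A = [2 1 4; 1 3 0; 5 2 6];      % any square matrix will do
B = A([2 1 3], :);              % exchange rows 1 and 2
disp([det(A) det(B)])           % the two determinants differ only in sign
disp(det(2*eye(4)))             % 2^4 = 16, not 2: every row contributes a factor 2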
Because of this rule, we can find det P for any permutation matrix. Just exchange rows of J until you reach P. Then det P = +1 for an even number of row exchanges and det P = —1 for an odd number. The third rule has to make the big jump to the determinants of all matrices. 3. The determinant is a linear function of each row separately (all other rows stay fixed). If the first row is multiplied by ¢, the determinant is multiplied by f. If first rows are added, determinants are added. This rule only applies when the other rows do not change! Notice how c and d stay the same: : | by ?: multiply row add row ee | of A to row | of A’: In the first case, both sides are tad —thc. ta... 6b Gop ce tl. i : Ge é / 27 d / } se, er\e cod / b / Clad Then t factors out. In the second case, both sides are ad +. a'd — bc — b’c. These rules still apply when A is n by a, and the last n— 1 rows don’t change. May we emphasize this with numbers: 0 O i (eel and 0 0 EO | 0 ih (Oe 0 0 NY or —WwW © By itself, rule 3 does not say what any of those determinants are. It just says that the determinants must pass these two tests for linearity. Combining multiplication and addition, we get any linear combination in the first row: t(row | of A) + ¢t/(row 1 of A’). With this combined row, the determinant is ¢ times detA plus ¢’ times det A’. This rule does not mean that det 2/ = 2 det /. To obtain 27 we have to multiply both rows by 2, and the factor 2 comes out both times: 2 Sao F 0 9 eg =. and | ; 9) U =". 210 = Chapter 5 Determinants This is just like area and volume. Expand a rectangle by 2 and its area increases by 4. Expand an n-dimensional box by tf and its volume increases by t”. The connection is no accident—we will see how determinants equal volumes. Pay special attention to rules 1-3. They completely determine the number det A— but for a big matrix that fact is not obvious. We could stop here to find a formula for n by n determinants. It would be a little complicated—we prefer to go gradually. Instead we write down other properties which follow directly from the first three. These extra rules make determinants much easier to work with. 4 If two rows of A are equal, then detA = 0. Check 2 by: 2: a b =A b Rule 4 follows from rule 2. (Remember we must use the rules and not the 2 by 2 formula.) Exchange the two equal rows. The determinant D is supposed to change sign. But also D has to stay the same, because the matrix is not changed. The only number with —D = D is D = 0—this must be the determinant. (Note: In Boolean algebra the reasoning fails, because —1 = 1. Then D is defined by rules 1, 3, 4.) A matrix with two equal rows has no inverse. Rule 4 makes detA = 0. But matrices can be singular and determinants can be zero without having equal rows! Rule 5 will be the key. We can do row operations without changing det A. 5 Subtracting a multiple of one row from another row leaves det A unchanged. a c—la b d—lb a C b ais Linearity splits the left side into the right side plus another term —i[2 Lab This extra term is zero by rule 4. Therefore rule 5 is correct. Note how only one row changes while the others stay the same—as required by rule 3. Conclusion The determinant is not changed by the usual elimination steps: det A equals det U. If we can find determinants of triangular matrices U, we can find determinants of all matrices A. Every row exchange reverses the sign, so always det A = +det U. We have narrowed the problem to triangular matrices. 6 A matrix with a row of zeros has det A = 0. 
0 e O [=o @ and oe 1b) 0 O =o. For an easy proof, add some other row to the zero row. The determinant is not changed (rule 5). But the matrix now has two equal rows. So det A = 0 by rule 4. 7 If A is a triangular matrix then det A = entries. a41a22++-dyn a < “|= ad and also =product of diagonal 0 [=ad. d 5.1. The Properties of Determinants 211 Suppose all diagonal entries of A are nonzero. Eliminate the off-diagonal entries by the usual steps. (If A is lower triangular, subtract multiples of each row from lower rows. If A is upper triangular, subtract from rows above.) By rule 5 the determinant is not changed—and now the matrix is diagonal: a\\ 0 a22 We must still prove that ; 0 = 411422 °*+ Ann. Gay For this we apply rules 1 and 3. Factor a;; from the first row. Then factor a22 from the second row. Eventually factor a,,, from the last row. The determinant is a; times a7 times --- times dy, times det/. Then rule 1 (used at last!) is det / = 1. What if a diagonal entry of the triangular matrix is zero? Then the matrix is singular. Elimination will produce a zero row. By rule 5 the determinant is unchanged, and by rule 6 a zero row means det A = 0. Thus rule 7 is proved—the determinants of triangular matrices come directly from their main diagonals. 8 If A is singular then det A =0. If A is invertible then det A # 0. |< i is singular if and only if ad —bc =0. Proof Elimination goes from A to U. If A is singular then U has a zero row. The rules give det A = detU = 0. If A is invertible then U has the pivots along its diagonal. The product of nonzero pivots (using rule 7) gives a nonzero determinant: det A = tdet U = + (product of the pivots). This is the first formula for the determinant. MATLAB would use it to find det A from the pivots. The plus or minus sign depends on whether the number of row exchanges is even or odd. In other words, +1 or —1 is the determinant of the permutation matrix P that does the row exchanges. With no row exchanges, the number zero is even and P = J and detA = detU. Note that always det L = 1, because L is triangular with 1’s on the diagonal. What we have is this: If. Again, det P = +1 PAr=,LU_and det then, et P detA =detL, det VU. A = +det U. Equation (2) is our first case of rule 9. (2) 212 9 Chapter 5 Determinants The determinant of AB is the product of det A times det B: |AB|=|A||BI. a b Gund! ap+br cp+dr Peg fees aq+bs cq+ds|" In particular—when the matrix B is A~'—the determinant of A7! is 1/det A: AA7!=I1 so (det A)(det A~!) = det J = 1. This product rule is the most intricate so far. We could check the 2 by 2 case by algebra: (ad — bc)(ps — qr) = (ap + br)(cq + ds) — (aq + bs)(cp + dr). For the n by n case, here is a snappy proof that |AB| = |A||B|. The idea is to consider the ratio D(A) = |AB|/|B|. If this ratio has properties 1,2,3—-which we now check—it must equal the determinant |A|. (The case |B| = 0 is separate and easy, because AB is singular when B is singular. The rule |AB| = |A||B| becomes 0 = 0.) Here are the three properties of the ratio |AB|/|B|: Property 1 (Determinant of /): If A =/ then the ratio becomes |B|/|B| = 1. Property 2. (Sign reversal): When two rows of A are exchanged, so are the same two rows of AB. Therefore |AB| changes sign and so does the ratio |AB|/|B]. Property 3 (Linearity): When row | of A is multiplied by f, so is row 1 of AB. This multiplies |AB| by ¢ and multiplies the ratio by t—as desired. Now suppose row | of A is added to row | of A’ (the other rows staying the same throughout). 
Then row 1 of AB is added to row 1 of A’B. By rule 3, the determinants add. After dividing by |B], the ratios add. Conclusion ‘This ratio |AB|/|B| has the same three properties that define |A|. Therefore it equals |A|. This proves the product rule |AB| = |A||B|. 10 The transpose A* has the same determinant as A. Check: |“ z =|" “| d Did since both sides equal ad — be. The equation |A!| = |A| becomes 0 = 0 when A is singular (we know that A! is also singular). Otherwise A has the usual factorization PA = LU. gives A'p! = U'L’. The proof of |A| = |A™| comes Transposing both sides by using rule 9 for products and comparing: det Pdet A =detLdetU and det A! det P? = detU' det L!. First, det L= det L™ (both have 1’s on the diagonal). Second, det U = det U? (trans- posing leaves the main diagonal unchanged, and triangular determinants only involve that diagonal). Third, det P = det P' (permutations have P’ = P—!, so |P||P™| =1 by rule 9; thus |P| and |P™| both equal 1 or both equal —1). Fourth and finally, the comparison proves that det A equals det A!. 5.1. The Properties of Determinants 213 Important comment Rule 10 practically doubles our list of properties. Every rule for the rows can apply also to the columns (just by transposing, since |A| = |A‘|). The determinant changes sign when two columns are exchanged. A zero column or two equal columns will make the determinant zero. If a column is multiplied by f, so is the determinant. The determinant is a linear function of each column separately. It is time to stop. The list of properties is long enough. Next we find and use an explicit formula for the determinant. =» REVIEW OF THE KEY IDEAS = abe The determinant is defined by linearity, sign reversal, and det J = 1. 2. After elimination det A is + (product of the pivots). 3. | The determinant is zero exactly when A is not invertible. 4. Two remarkable properties are det AB = (det A)(det B) and det A! = det A. Problem Set 5.1 Questions 1-12 are about the rules for determinants. 1 If a 4 by 4 matrix has det A = 2, find det(2A) and det(—A) and det(A”) and det(A~!). 2 If a3 by 3 matrix has det A= —3, find det($A) and det(—A) and det(A*) and det(A~'). 3 True or false, with reason or counterexample: 4 (a) The determinant of / +A is | +det A. (b) The determinant of ABC is |A||B||C\. (c) The determinant of A* is |A/*. (d) The determinant of 4A is 4A]. Which row exchanges show that these “reverse identity matrices” J3 and J4 have [J3|= —1 but [J4| = +17 det 1 1 O|=-l (One) © SO but det ©= OO or 214 Chapter 5 Determinants For n = 5,6,7, count the row exchanges to permute the reverse identity J, to the identity matrix /,. Propose a rule for every size n and predict whether Jo is even or odd. Show how Rule 6 (determinant= 0 if a row is all zero) comes from Rule 3. Prove from the product rule |AB| = |A||B| that an orthogonal matrix determinant | or —1. Also prove that |Q| = \o-!| =|aQ?|. Find the determinants Oe Q has of rotations and reflections: cos@ —sin@ ~ | sin@ cos@ ior ~ 1—2cos?@ —2cosé@sin@ |-2cos@sin@d 1—2sin*6é |" Prove that |A‘| = |A| by transposing A = QR. (R is triangular and Q is orthogonal; note Problem 7.) Why does |R™| = |RI|? 10 If the entries in every row of A add to zero, prove that det A= 0. If every row of A adds to one, prove that det(A—/) = 0. Does this guarantee that det A = 1? 11 Suppose that CD = —DC and find the flaw in this reasoning: Taking determinants gives |C||D| = —|D||C|. Therefore |C| = 0 or |D| = 0. One or both of the matrices must be singular. (That is not true.) 
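Problems 7 to 11 all lean on the product rule and the transpose rule. A quick numerical sanity check (with arbitrary example matrices, not a proof) looks like this:

% Check |AB| = |A||B| and |A'| = |A| on example matrices
A = [1 2 0; 3 1 4; 0 2 5];
B = [2 0 1; 1 1 0; 4 2 3];
disp(det(A*B) - det(A)*det(B))    % should be essentially zero
disp(det(A') - det(A))            % should be zero
Q = [cos(0.3) -sin(0.3); sin(0.3) cos(0.3)];
disp(det(Q))                      % an orthogonal matrix has determinant +1 or -1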
12 The inverse of a 2 by 2 matrix seems to have determinant = 1: det A~! = det I E ae ad—be|—¢€ a| ad—bc What is wrong with this calculation? Questions 13-26 use the rules to compute specific determinants. 13 By applying row operations to produce an upper triangular U, compute det ps i Om 2, Oe 14 Ok ay 0) Ome 0s UP SS and det 2 —-1 —| 2 jing 0 O 0 0 —1 0 yah 2 -!I 2 Use row operations to show that the 3 by 3 “Vandermonde determinant” is det] 1 b b? |=(b—a)(c—a)(c—Db). 5.1. 15 The Properties of Determinants 215 Find the determinants of a rank one matrix and a skew-symmetric matrix: A= |e A> | and Pak | 3 0 =3 eis 8 Op eed =4 0 16 A skew-symmetric matrix has K' = —K. Insert a,b,c for 1,3,4 in Question 15 and show that |K| = 0. Write down a 4 by 4 example with |K| = 1. 17 Use row operations to simplify and compute these determinants: 101% det | 102 103 18 BO 202 203 Find the determinants of U and U~! be @ 2 0 0 Use| 19 3} § 6 Leste det | Peni tz ¢ and 42 and U?: and b Zk , Suppose you do two row operations at once, going from a b cud Find the second determinant. 20 261 302 303 Add row ee a—Lce c=la b—Ld d—Ib| Does it equal ad — bc? 1 of A to row 2, then subtract row 2 from row 1. Then add row | to row 2 and multiply row 1 by —1 to reach B. Which rules show that A= ow: © al andar ong a b have det B = —detA? Those rules could replace Rule 2 in the definition of the determinant. 21 From ad — bc, find the determinants of A and A> es Peel 5 and A nok l all at | and cand Ay=uli A la ut =| \ 1 ae Which two numbers A lead to det(A — AJ) =0? Write down the matrix A —A/ for each of those numbers A—it should not be invertible. PD. From A = laa find A? and A~! and A — AJ and their determinants. two numbers A lead to |A —A/J| =0? Which 216 Chapter 5 Determinants 23 Elimination reduces A to U. Then A = LU: 385 4 = Ass 1) 6: Pakb ce Rl =e. 0 haan a .0. ee ts 40) 8 I 34 ON Oe to 4 ee ei ea! aU Find the determinants of L, U, A, U~!L~!, and U“!L!A. 24 If the i, 7 entry of A isi times j, show that det A = 0. (Exception when A = [1 ].) 25 If the i, j entry of A isi + j, show that det A = 0. (Exception when n = 1 or 2.) 26 Compute the determinants of these matrices by row operations: Angin Ca 8 2 g SSS 3S) ‘er sy, SoS Severs S| Sy & & => SVS 27 28 True or false (give a reason if true or a 2 by 2 example if false): (a) If A is not invertible then AB is not invertible. (b) The determinant of A is the product of its pivots. (c) The determinant of A — B equals det A — det B. (d) AB and BA have the same determinant. (Calculus question) Show that the partial derivatives of f(a,b,c,d) =In(ad —bc) leads to f(A) = In(det A) give Ari Eas ae a, 5.2. PERMUTATIONS Permutations and Cofactors AND 217 COFACTORS ga 5.2 A computer finds the determinant from the pivots. This section explains two other ways to do it, using permutations and using cofactors. We will find a new formula for the determinant—after reviewing the pivot formula. May I give the best example right away? It is my favorite 4 by 4 matrix: on ee hte 9 lath 0) lem 22a lee 0) Or = 2 1 OSs (0 The determinant of this matrix is 5. We can find it in all three ways: 1. The product of the pivots is 2 - ‘ : t : 2 Cancellation produces 5. 2. The “big formula” in equation (7) has 4! = 24 terms. But only five terms are nonzero: det A = 16—4-—-4-—-4412=5. The 16 comes from 2-2.-2-2 on the diagonal. Where does +1 come When you can find those five terms, you have understood formula (7). 3. from? 
The numbers 2, −1, 0, 0 in the first row are multiplied by their cofactors 4, 3, 2, 1 from the other rows. That gives 2·4 − 1·3 = 5. Those cofactors are 3 by 3 determinants. They use the rows and columns that are not used by the entry in the first row. Every term in a determinant uses each row and column once!

The Pivot Formula

Elimination leaves the pivots d1, …, dn on the diagonal of the upper triangular U. If no row exchanges are involved, multiply those pivots to find the determinant:

    det A = (det L)(det U) = (d1 d2 ⋯ dn).    (1)

This is our first formula for det A. When A changes to U, the determinant stays the same. L has 1's on the diagonal, which produces det L = 1. Formula (1) already appeared in the previous section, with the further possibility of row exchanges. The permutation matrix in PA = LU has determinant −1 or +1. This factor ±1 enters the determinant of A:

    (det P)(det A) = (det L)(det U)    gives    det A = ±(d1 d2 ⋯ dn).    (2)

When A has fewer than n pivots, det A = 0 by Rule 8. The matrix is singular.

Example 1 A row exchange produces pivots 4, 2, 1 and that important minus sign:

    A = [0 0 1; 0 2 3; 4 5 6],    PA = [4 5 6; 0 2 3; 0 0 1],    det A = −(4)(2)(1) = −8.

The odd number of row exchanges (namely one exchange) means that det P = −1.

The next example has no row exchanges. It is one of the first matrices we factored in Section 2.6 (when it was 3 by 3). What is remarkable is that we can go directly to the n by n determinant. Large determinants become clear when we know the pivots.

Example 2 (The −1, 2, −1 tridiagonal matrix) The first three pivots are 2 and 3/2 and 4/3. The next pivots are 5/4 and 6/5 and eventually (n+1)/n:

    [2 −1 0 0; −1 2 −1 0; 0 −1 2 −1; 0 0 −1 2]    factors into
    [1 0 0 0; −1/2 1 0 0; 0 −2/3 1 0; 0 0 −3/4 1][2 −1 0 0; 0 3/2 −1 0; 0 0 4/3 −1; 0 0 0 5/4].

The pivots are on the diagonal of U (the last matrix). When 2 and 3/2 and 4/3 and 5/4 are multiplied, the fractions cancel. The determinant of the 4 by 4 matrix is 5. The 3 by 3 determinant is 4. In general the determinant is n + 1:

    det A = (2)(3/2)(4/3) ⋯ ((n+1)/n) = n + 1.

Important point: Those first pivots depend only on the upper left corner of the original matrix. This is a rule for all matrices without row exchanges. We explain it while the example is fresh:

    The first k pivots come from the k by k matrix A_k in the top left corner of A. Their product d1 d2 ⋯ dk is the determinant of A_k.

The 1 by 1 matrix A1 contains the very first pivot d1. This is det A1. The 2 by 2 matrix in the corner has det A2 = d1 d2. Eventually the n by n corner matrix uses all n pivots to give det A_n, which is det A.

Elimination deals with the corner matrix A_k while starting on the whole matrix. We assume no row exchanges—then A = LU and A_k = L_k U_k. Dividing one determinant by the previous determinant (det A_k divided by det A_{k−1}) cancels everything but
The formula has n\ terms. Its size grows fast because n! = 1, 2, 6,24, 120,.... For n = 11 there are forty million terms. For n = 2, the two terms are ad and bc. Half the terms have minus signs (as in —bc). The other half have plus signs (as in ad). For n = 3 there are 3! = (3)(2)(1) terms. Here are those six terms: 41 ts : 431 Notice the pattern. a2 aj is i. +44 1422433 + 412473431 + 413421432 fe —@1 1423432 — 412421433 — 13022431. 032 a33 Each product like aj1a23a32 has one entry from each column. has one entry from each row. (3) It also The column order 1, 3, 2 means that this particular term comes with a minus sign. The column order 3, 1, 2 in a)3a21a32 has a plus sign. It will be “permutations” that tell us the sign. The next step (n = 4) brings 4! = 24 terms. entry from each row and column. There are 24 ways to choose one The diagonal a@11a@22a33a44 with column order 1234 always has a plus sign. To derive the big formula I start with n = 2. The goal is to reach ad — be in a systematic way. Break each row into two simpler rows: ahi [as0) 40.5] Sand le dil Oelo og), Now apply linearity, first in row 1 (with row 2 fixed) and then in row 2 (with row | fixed): Gp bl ol] «ta0| 110.8 AR hel PINs Slee S66 Be lok ytd) eeOl Oe ale (4) 220 Chapter 5 Determinants The last line has 27 = 4 determinants. The first and fourth are zero because their rows are dependent—one row is a multiple of the other row. We are left with 2! = 2 determinants to compute: Ov id b i fa) [=ad ~ be. | =ad | | + be LO Om 1 0 a0 0 | fe Oar: c I won’t labor the point—you see it. The rules lead to permutation matrices, with 1’s and 0’s that give a plus or minus sign. The 1’s are multiplied by numbers that come from A. The permutation tells the column sequence, in this case (1, 2) or (2, 1). Now try n = 3. Each row splits into 3 simpler rows like [a;; 0 0]. Using linearity in each row, det A splits into 3 = 27 simple determinants. If a column choice is repeated—for example if we also choose [a2; 0 0]—then the simple determinant is zero. We pay attention only when the nonzero terms come from different columns. That leaves 3! = 6 determinants: aij a2, 43, aj2 a22. 432 ay3 a23]; 433 aj\ a\2 = a22 a\3 +e a33 a73| + |421 a3 a32 ay} a\2 ate a23| a3 + |421 a a32 a33 a22 a3} One entry is chosen from the first row. Call it aj,g, where the column number @ can be 1, 2, or 3. The number from the second row is ag, and there are two choices for column £. (Column q@ is not available—it is already used.) That leaves only one column to choose in the last row. There are 3! = 6 ways to order the columns. The six permutations of (1, 2, 3) include the identity permutation (1, 2,3) from P = /: (a, B;.@) = Gs 2,3), (23,35 DC 2) dt Sez) ae; Waa), (5) The last three are odd (one exchange). The first three are even (0 or 2 exchanges). When the column sequence is (a, 8, w), we have chosen the entries a},a2ga3,,—and the column sequence comes with a plus or minus sign. The determinant is now split into six simple terms. Factor out the a;;: 1 det A = 41422433 1 ] 1 + 412423431 1 1 1} + 13421432 |1 1 (6) + a11a23a32 1 |+ aj2a21433 | 1 1 + a43€2243) 1 1 1 The first three (even) permutations have det P = +1, the last three (odd) permutations have det P = —1. We have proved the 3 by 3 formula in a systematic way. 5.2 Permutations In the same way you can see the n by n formula. and Cofactors 221 There are n! orderings of the column numbers. The numbers (1,2,...,”) go in each possible order (a, B,..., @). 
With the columns in that order, the determinant is the product ajqa2g---dnw times +1 or —l. The sign depends on the parity (even or odd) of the column ordering. The determinant of the whole matrix is the sum of n! simple determinants (times | or —1)—which is our big formula: Formula for the determinant det A= sum over all n! permutations = P = (a, B,...,@) Ct) Y “(det P )ajqA2p TTI OAC: The 2 by 2) case is +-a11a22 — a ,2a2, (which is ad — bce). P is (1,2) or (2,1). The 3 by 3 case has three products “down to the right” and three products “down to the left.’ Warning: Many people believe they should follow this pattern in the 4 by 4 case. They only take 8 products—it should be 24. Example 3 (Determinant of ucts can be nonzero. U) When U is upper triangular, only one of the n! prod- This one term comes from the diagonal: det U = +u11U22-+-Unn. All other column orderings pick an entry from below the diagonal, where U has zeros. As soon as we pick a number like v2; = 0 from below the diagonal, that term in equation (7) is sure to be zero. Of course det J = 1. The only nonzero term is +(1)(1)---(1) from the diagonal. Example 4 Suppose Z is the identity matrix except for column 3. Then : determinant of Z = The term (1)(1)(c)(1) comes 10 Oty O20 “a0 oo 0 ol” OO oath (8) from the main diagonal with a plus sign. There are 23 other products (one factor from each row and column) but they are all zero. Reason: If we pick a, b, or d from column 3, that column is used up. Then the only available choice from row 3 is zero. Here is a different reason for the same answer. If c = 0, then Z has a row of zeros and det Z = c = 0 is correct. If c is not zero, use elimination. Subtract multiples of row 3 from the other rows, to knock out a,b, d. That leaves a diagonal matrix and det Z = c; This example will soon be used for “Cramer’s Rule.” If we move a, b, c, d into the first column of Z, the determinant is det Z = a. (Why?) Changing one column of I leaves Z with an easy determinant, coming from its main diagonal only. 222 Chapter 5 Determinants Example 5 Suppose A has 1’s just above and below the main diagonal. Here n = 4: TOO ORO Ob Oe al OF S06 A,= and P= Os DORs O10 OO OD OTEO Ol ie 0 The only nonzero choice in the first row is column 2. The only nonzero choice in row 4 is column 3. Then rows 2 and 3 must choose columns 1 and 4. In other words P4 is the only permutation that picks out nonzeros in A4. The determinant of Py is +1 (two exchanges to reach 2,1, 4,3). Therefore det Ag = +1. Determinant by Cofactors Formula (7) is a direct definition of the determinant. It gives you everything at once— but you have to digest it. Somehow this sum of n! terms must satisfy rules 1-2-3 (then all the other properties follow). The easiest is det = 1, already checked. The rule of linearity becomes clear, if you separate out the factor aj; or aj2 OF djq that comes from the first row. With n = 3 we separate the determinant into det A = ay, (22033 — 423032) + a2 (a23a31 — 421433) +413 (a21432 — a22a31). (9) Those three quantities in parentheses are called “cofactors.” They are 2 by 2 determinants, coming from matrices in rows 2 and 3. The first row contributes the factors a1, a12, a3, and the lower rows contribute the cofactors Cj;, Cj2, C 3. Certainly the determinant a1C11 + a12C}2 + a13C13 depends linearly on aj;, a2, aj3—this is rule 3. The cofactor of aj; is C11 = a22a33 — a23a32. You can see it in this splitting: 411 412 443 421 422 423) 431 432 433 a\) = a\2 422. 
423) + |a21 432 433 43] aj3 a23|+|421 422 433 432 43] We are still choosing one entry from each row and column. Since aj, uses up row 1 and column 1, that leaves a 2 by 2 determinant as the cofactor. As always, we have to watch signs. The determinant that goes with a;2 looks like a21a33 — a23a3;. But in the cofactor Cj2, its sign is reversed. The sign pattern for cofactors along the first row is plus-minus-plus-minus. The sign is (—1)!*/ for the cofactor C;; that goes with a;;. You cross out row | and column j to get a submatrix M;;. Then take its determinant, and multiply by (—1)!*/ to get the cofactor: The cofactor is Cy; = (—1)'*/ det Mj;. Then det A = aj1}Cy, + 4j2Ci2 +--+ + ajnCin. (10) This is the “cofactor expansion along row 1.” In the big formula (7), the terms that multiply aj; combine to give det Mj;. The sign is (—1)!*!, meaning plus. Equation (10) is another form of equation (7), with the factors from row | multiplying their cofactors from rows 2, 3, ..., n. 5.2 Permutations and Cofactors 223 Note Whatever is possible for row 1 is possible for row i. The entries aj; in that row also have cofactors Cj;. Those are determinants of order n—1, multiplied by (iyi. Since aj; accounts for row i and column j—and each row and column are to be used once—the matrix Mj; throws out row i and column j. The display shows a43 and its cofactor M43 (with row 4 and column 3 crossed out). The sign (=1)t multiplies the determinant of M43 to give C43. The sign matrix shows the + pattern: as ca (-DiV = = 56 OG OURS as 5A COFACTOR FORMULA with its cofactors: det | + 2p STR — + — us | =) 2. Fee The determinant of A is the dot product of any row i A = aj, Cj, + aj2Cji2 + +++ + ainCin. (il) Each cofactor C;; (order n—1, without row i and column /) includes its correct sign: Cij = (—1)'*/ det Mj. A determinant of order n is a combination of determinants of order n— 1. A recursive person would keep going. Each subdeterminant breaks into determinants of order n—2. We could define all determinants via equation (11). This rule goes from order n to n—1 to n—2 and eventually to order 1. Define the 1 by 1 determinant |a| to be the number a. Then the cofactor method is complete. We preferred to construct det A from its properties (linearity, sign reversal, and det / = 1). The explicit formula those properties. (7) and the cofactor formulas (9)-(11) follow from It is the properties that allow us to change A into an easier matrix like U, when computing the determinant. One last formula comes from the rule that det A = det A'. We can expand in cofactors, down a column instead of across a row. Down column j the entries are aj; to anj. The cofactors are Cj); to C,;. The determinant is the dot product: Cofactor expansion down column j: det A = a1 j;Cjj + 42;C2j +--+ + anjCnj. (12) Cofactors are most useful when the matrices have many zeros—as in the next examples. 224 Chapter 5 Determinants Example 6 The —1, 2, —1 matrix has only two nonzeros in its first row. So only two cofactors are involved in the determinant: * . | - a aid - =I 2 ies olla) leene =| int Maal be) 2 DD =| r= Ue 2 (13) You see 2 times Cj; on the right. This cofactor has exactly the same —1, 2, —1 pattern as the original A—but one size smaller. The other cofactor Cj2 came from crossing out row | and column 2. It is multiplied by aj2 = —1 and by the sign (—1)!*? which is also —1. To compute C12, use cofactors down its first column. The only nonzero is at the top. That contributes another —1 (so we are back to minus). 
Its cofactor is the —1,2,—1 matrix which is two sizes smaller than the original A. Summary Let D,, stand for the determinant of the —1, 2, —1 matrix of order n. Equation (13) gives the 4 by 4 determinant D4 from a 3 by 3 cofactor D3 and a 2 by 2 cofactor Dp: D4 = 2D3 — Do and generally D, = 2D, 4 — D>. (14) Direct calculation gives D2 = 3 and D3 = 4. Therefore D4 = 2(4) —3 = 5. Then Ds = 10—4 = 6. These determinants 3, 4, 5, 6 fit the formula D, = n+ 1. That answer also came from the product of pivots in Example 2. The idea behind cofactors is to reduce the order one step at a time. In this example the determinants D, =n-+ 1 obey the recursion formula n + 1 = 2n — (n — 1). As they must. Example 7 This is the same matrix, except the first entry (upper left) is now 1: Lies PW Pie whe ey af OORT =1 mud All pivots of this matrix turn out to be 1. So its determinant is 1. How does that come from cofactors? Expanding on row 1, the cofactors all agree with Example 6. Just change: a14 = 2. torbi; = "1; det Bg = D3 — Do instead of det Ag = 2D3 — Dp. The determinant of By is 4—3= 1. The determinant of every B, is n —(n—1) = 1. Problem 13 asks you to use cofactors of the Jast row. You still find det B, = 1. 5.2 m Permutations REVIEW OF THE KEY IDEAS and Cofactors 225 = 1. With no row exchanges, det A = (product of the pivots). In the upper left corner, det Ay = (product of the first k pivots). 2. Every term in the big formula (7) uses each row and column once. n! terms have plus sign (when det P = +1). 3. The cofactor C;; is (—1)'*/ times the smaller determinant that omits row i and column j (because aj; uses them). 4. The determinant is the dot product of any row of A with its row of cofactors. Half of the Problem Set 5.2 Problems 1-10 use the formula with n! terms: |A| = >> £01~428 ++ Any. 1 Compute the determinants of A and B from six terms. pendent? Je Compute the determinants of A and B. Are their columns independent? jh Bry Ay == lheLaie ian Mal Mae 3 Bede Le aaPiinge 3 Se ab Cleat ial, Show that det A = 0, regardless of the five nonzeros marked by x’s: x) As || OQ OQ” 4 and Are their columns inde- Myo OW x Olas |, (What is the rank of A?) This problem shows in two ways that det A = 0 (the x’s are any numbers): > | Oe oe Se: Se ee Ss &SoCeo oy iS# &oom (a) How do you know that the rows are linearly dependent? (b) Explain why every term is zero in formula (7) for det A. 226 Chapter 5 Determinants Find two ways to choose nonzeros from four different rows and columns: Age be Oes0 at bh atlFl ice 10 B= Ol Oe Ora" 4. 5 40 2 a2 5 3 (B has the same zeros as A). Olin Ou Is det A equal to 1+ 1 or 1 —1 or —1 — 1? What is det B? Place the smallest number of zeros in a 4 by 4 matrix that will guarantee det A = 0. Place as many zeros as possible while still allowing det A 4 0. (a) If ayy = a22 = a33 = 0, how many of the six terms in detA will be zero? (b) If ay) = a22 = a33 = a44 = 0, how many of the 24 products aj ja2,43)d4m are sure to be zero? How many 5 by 5 permutation matrices have det P = +1? Those are even permutations. Find one that needs four exchanges to reach the identity. 10 If det A is not zero, at least one of the terms in (7) is not zero. Deduce some ordering of the rows of A leaves no zeros on the diagonal. from elimination; that PA can have zeros on the diagonal.) (Don’t use P (a) (b) that How many permutations of (1, 2, 3,4) are even and what are they? An odd permutation matrix times an odd permutation matrix is an permutation matrix. Problems 11-20 use cofactors C;; = (—1)'*/ det M;;. 
Remove row i and column j. 11 Find all cofactors and put them into a cofactor matrix C. Find det B by cofactors: A= 12 is fl B= } 23 4-3, 0 7 0 30 Find the cofactor matrix C and multiply A times CT’. What is A~!? 221, A a 0 13 50 ies ok =—1" 2 The matrix B, is the —1, 2, —1 matrix A, except that b}; = 1 instead of a); = 2. Using cofactors of the /ast row of B4 show that |B4| = 2|B3|—|B| and find |B4|: a UIET fe = os Pe L —1 2 B3=|-1 1 -1 2 -1 -1 2 5.2 Permutations and Cofactors D7, The recursion |B,| = 2|Bn—1| — |Bn—2| is satisfied when every |B,| = 1. This recursion is the same as for the A’s. The difference is in the starting values 1, 1, 1 Or @ = il, DB, Se 14 The n by n determinant C, has 1’s above and below the main diagonal: C; = |0| 15 ce 0 1 0 ooro = CO or 1 oc oro GS (a) What are these determinants By cofactors find the relation between C, and C,_; and C,—2. C), C2, C3, C4? Find Cjo. The matrices in Problem 14 have 1’s just above and below the main diagonal. Going down the matrix, which order of columns gives all 1’s? Decide even or odd to prove: The tridiagonal Cre I= 4,835 5) Cr=so 1,1, 1 matrix of order n has determinant (oe — AN 17 Ont 50 0 | c=|to 1 Omi (b) Cn = 0 (odd n) 16 Ope ape role ie = 2,6,25): E,,: alht lea eat jy sil Gis ail i Nib i a 1a = @ i (ee (a) By cofactors show that E, = E,-1 — En-2. (b) Starting from £; = 1 and Ey = 0 find £3, E4,..., Eg. (c) By noticing how these numbers ih al ee : eventually repeat, find Ej 9. F,, is the determinant of the 1, 1, —1 tridiagonal matrix of order n: 1 fe oe 1 1 — es aoe —a il 0 i 1 Expand in cofactors to show that F, = Fibonacci numbers 1,2,3,5,8,13,.... — ee Fr—1 + Fn—2. The sequence -1 ei) 1 1 1 eas -1l 1 These determinants are usually starts 1,1, 2,3 (with two 1’s) so our F, is the usual Fy,+1. 18 Go back to B, in Problem 13. It is the same as A, except for bj; = 1. So use linearity in the first row, where [1 —1 0] equals [2 —1 0] minus [1 0 0]. Here is |B,|: =| 1 -l 0 2 =I An-1 0 —-1 0 _|-1 ree An-1 0 0 An-1 0 228 Chapter 5 Determinants Linearity gives"|B,| = |A,| = |An-1| = 19 Explain why the 4 by 4 Vandermonde determinant contains i Dut not x lec sped ta. cor x": ae ade Dee eA ioe oo * eA fl) ~ Xie SA Bea The determinant is zero at x = : , and _ The cofactor of x? is V3 = (b—a)(c—a)(c—b). Then V4 = (b—a)(c—a)(c—b) (x —a)(x —b)(x—c). 20 Produce zeros by row operations and find the determinant of O01 ive) RG ie Ghee) te SL Ded Ort 1 0 Find also det G2 and det G3, with zeros on the diagonal and ones elsewhere. you predict det G,? Problems 21 21-24 are about block matrices and block determinants. With 2 by 2 blocks in 4 by 4 matrices, you cannot always use block determinants: A 0 (a) 22 23 Can B D =|A||D| yt) gual!03 but é D # |A||D| — |C||Bl. Why is the first statement true? Somehow B doesn’t enter. (b) Show by example that equality fails (as shown) when C enters. (c) Show by example that the answer det(AD — CB) is also wrong. With block multiplication, A= LU has Ax = L;U, in the top left corner: Ay *| |Le O|| Ux * x #*{| | x iar exh * (a) If A has pivots 2,3,—1 what are the determinants U2, Uz and Aj, Az, A3? of L;, L2, L3 and Uj, (b) If Ay, Az, A3 have determinants 2,3, —1 find the three pivots. Block elimination subtracts CA~! times the first row [A B] from the second row. This leaves the Schur complement D — CA~'B in the corner: fi = CA 1, [cA le A B Pl SC mealla), a= eA ee Take determinants of these block matrices to prove correct rules for square blocks: A B = é aa |A|[D—CA’B|-= if A~! exists |AD=CB| if AC =CA. 
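The block rule in Problem 23 is also easy to test numerically. In this sketch the 2 by 2 blocks are arbitrary examples with A invertible:

% det([A B; C D]) = det(A) * det(D - C*inv(A)*B) when A is invertible (numerical check)
A = [2 1; 1 3];  B = [0 1; 2 0];
C = [1 0; 0 1];  D = [4 1; 1 2];
M = [A B; C D];
disp(det(M) - det(A)*det(D - C*(A\B)))    % should be essentially zero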
5.2 24 Permutations and Cofactors 229 “(2 1-[2 IL If A is m by n and B is n by m, block multiplication gives det M = det AB: If A is a single row and B is a single column what is det M? If A is a column and B is a row what is det M? Do a 3 by 3 example of each. 25 (A calculus question based on the cofactor expansion) (a) Find the derivative of det A with respect to aj;. The other entries are fixed—we are only changing aj; in equation (9). (b) Find the derivative of In(det A) with respect to a,;. By the chain rule this is 1/ det A times the answer to (a), which gives the (1,1) entry inthe matrix. 26 A 3 by 3 determinant has three products “down to the right” and three “down to the left” with minus signs. Compute the six terms to find D. Then explain without determinants why this matrix is or is not invertible: 27 For E4 in Problem nonzero. 16, five of the 4! = 24 terms in the big formula (7) are Find those five terms to show that E4 = —1. 28 For the 4 by 4 matrix at the beginning of this section, find the five terms in the big formula that give det A= 16 -4—4—4+1. 29 Compute the determinants |5j|, |S2|, |S3| of these 1, 3, 1 tridiagonal matrices: Sie (3/5 oe =|; a Sys a © ieh se I 0.1 3 Make a Fibonacci guess for |S4| and verify that you are right. 30 Cofactors of the 1,3, 1 matrices give a recursion similar to the —1, 2, —1 exam- ple, but with 3 instead of 2: |Sin| = 3|Sn-1| - |Sn—2l- Challenge: Show that the Fibonacci number |Sp| = F2n+42 satisfies this recursion. Keep using the Fibonacci rule Fy = Fy—1 + Fr-2. 31 Change 3 to 2 in the upper left corner of the matrix S,. Show that the determinant drops from the Fibonacci number F2,+2 to F241. 230 Chapter 5 Determinants CRAMER’S RULE, INVERSES, AND VOLUMES a 5.3 This section is about the applications of determinants, first to Ax = b and then to my In the entries of A~', you will see det A in every denominator—we divide by it. (If it is zero then A~! doesn’t exist.) Each number in A~! is a determinant divided by another determinant. So is every component of x = A~'b. Start with Cramer’s Rule to find x. A neat idea gives the solution immediately. Put x in the first column of 7. When you multiply by A, the first column becomes Ax which is b: A x 0 O ll Ol Mil = |a thy Gp || =] Jai (1) We multiplied a column at a time. The first column of B; is Ax, the other columns are just copied from A. Now take determinants. The product rule is: (det A)(x;) = det By or xX] i det By, 7 det A (2) This is the first component of x in Cramer’s Rule! Changing a column of A gives By). To find x2, put the vector x into the second column of the identity matrix: a, a a Psa 0. se> Ose @ O ||= il \.ah WD Ge || | lth. (3) Take determinants to find (det A)(x2) = det By. This gives x2 in Cramer’s Rule: 5B (CRAMER’s RULE) If detA is not zero, then Ax = Bb has the unique solution x det B, x] mudepati y det Bz 82" Seto 4 det B,, eae The matrix Bj; is obtained by replacing the jth column of A by the vector b. The computer program for Cramer’s rule only needs one formula line: x(j) = det([AG, 1: 7-1) B AG, j +1: n)])/det(A) To solve an n by n system, Cramer’s Rule evaluates n+1 determinants (of A and the n different B’s). When each one is the sum of n! terms—applying the “big formula” with all permutations—this makes (n + 1)! different terms. It would be crazy to solve equations that way. But we do finally have an explicit formula for the solution x. 
5.3 Cramer’s Rule, Inverses, and Volumes 231 Example 1 Use Cramer’s Rule (it needs four determinants) to solve M2 x3) — | —2x,|+x2 (|) —4x, +x3 = 0. The first determinant is |A|. It should not be zero. into columns Then the right side (1,0,0) goes 1, 2, 3 to produce the matrices B,, Bz, B3: es |A}=|-2 1 O|=7 =i Ona ieSikes! |Bo)=|-2 0 O|=2 ST and and ima laa 1 O|=1 Omom ie 0 |B3)}=|-2 1 0] =4 -4 0 0 |B|=|0 Cramer’s Rule takes ratios to find the components of x. Always divide by |A|: x} = (By — apes == 1 and x2 |Bo| = —=-= 2 Alea |B3| 4 x3=—=-2. and [Al I always substitute the x’s back into the equations, to check the calculations. A Formula for A7! In Example 1, the right side was b = (1,0,0). This is the first column of J. The solution x must be the first column of the inverse. Then the first column of AA~! = J is correct. Another important point: When b = (1,0,0) goes in a column of A to produce one of the B’s, the determinant is just 1 times a cofactor. are cofactors (one size smaller). The numbers 1, 2, 4 Look back to see how each B is a cofactor: |B;| =1 is the cofactor cu=|q |B2| = 2 1s the cofactor Cj2 = — ee B3|=4 |B3|=4 is1s the cofactor Cj3 1= 4 aa |—4 Ol” Main point: The numerators in A~' are cofactors. They are divided by det A. For the second column of A~!, change b to (0, 1, 0). Watch how the determinants of B,, Bo, B3 are cofactors along row 2—including the signs (—)(+)(—): O) ik 1 1 OO 4 OO} =-—-1 ma and Divide —1,5, —4 by |A| =7 i —2 —4 0 1 0O l Ol 1 Saeeana l —2 —4 to get the second column of A7!. 1 1 0 0 1| = —4. QO 232 Chapter 5 Determinants For the third column of A~!, the right side is b = (0,0, 1). The three determi- nants of the B’s become cofactors along the third row. The numbers are —1, —2, 3. We always divide by |A| = 7. Now we have all columns of Tee A= Summary 1 ole #1 (31-0 =e On a times eee A = 1277" 5/7, 2/7 |) equals: Aiea 8) To find A~! you can solve AA~! = J. The columns J. of J lead to the columns of A~!. After stating a short formula for A~', we will give a separate direct proof. Then you have two ways to approach A~!—by putting columns of J into Cramer’s Rule (above), or by cofactors (the lightning way using equation (7) below). 5C (FORMULA FOR A~!)_ determinant of A: (A The i, j entry of A! =e )ij = Cji det A and is the cofactor Cj; divided by the A =i a = det A (4) : The cofactors Cj; go into the “cofactor matrix” C. Then this matrix is transposed. To compute the i, j entry of A~', cross out row j and column i of A. Multiply the determinant by (—1)'*/ to get the cofactor, and divide by det A. Example 2 The matrix A = [2%] has cofactor matrix C = [_¢ -©]. Look at A times the transpose of C: eo Te aed d=") BF kad’ —“be =| ie Alpe 0 0 meal ©) The matrix on the right is det A times J. So divide by det A. Then A times C!/ detA is I, which reveals A7!: ho tie Ge det A which is 1 ad — bc bs <3 |\—c A (6) a This 2 by 2 example uses letters. The 3 by 3 example used numbers. Inverting a 4 by 4 matrix would need sixteen cofactors (each one is a 3 by 3 determinant). Elimination is faster—but now we know an explicit formula for A~!. Direct proof of the formula A~! = C'/ detA The idea is to multiply A times C7: aj ao ay Crp Co. C3) ayy 431 a. 432 493 | | Cig 433.) | Cis Copy C93 Cag 43 det A PO 0 0 0 deta 0) detA (7) 5.3 Row Cramer’s Rule, Inverses, and Volumes 233 | of A times column | of the cofactors yields det A on the right: By cofactors of row 1: a,,C 1 + aj2Cj2 +.a13C13 = det A. Similarly row 2 of A times column 2 of C! yields detA. 
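Before completing the proof, here is a numerical check of AC^T = (det A)I on the matrix of Example 1 (our reading of it); the reason for the zeros off the diagonal continues below:

    % Check A*C' = (det A)*I for the 3 by 3 matrix of Example 1
    A = [1 1 1; -2 1 0; -4 0 1];
    C = zeros(3);
    for i = 1:3
        for j = 1:3
            M = A;  M(i, :) = [];  M(:, j) = [];   % delete row i and column j
            C(i, j) = (-1)^(i + j) * det(M);       % the cofactor C_ij
        end
    end
    disp(A * C')         % det(A) times I, which is 7*eye(3)
    disp(C' / det(A))    % this matrix is A^{-1}, as in formula (4)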
The entries a2 j are multiplying cofactors C2; as they should. Reason for zeros off the main diagonal in equation (7). Rows of A are combined with cofactors from different rows. Row 2 of A times column 1 of C! gives zero, but why? an, Cyy + 422C12 + a23C 73 = 0. (8) Answer: This is the determinant of a matrix A* with two equal rows. A®* is the same as A, except that its first row is a copy of its second row. So det A* = 0, which is equation (8). It is the expansion of detA* along row 1, where A* has the same cofactors C11, Cj2, C13 as A—because all rows agree after the first row. Thus the remarkable matrix multiplication (7) is correct. It has det A times / on the right side: AC!= (det A)I Cee “s Ae — fas det A Example 3 A triangular matrix of I’s has determinant cofactors: Il TA‘aes A\ = @ © (lei Page alle Cross out row © NOFA)acl i has inverse 1 On 0 O On ie eel) es a seeker Ag, = (ies 1 0 “fil | and column |. The inverse matrix contains -!1l | of A to see the cofactor C;; = 1. Now 1 cross out row 1 and column 2. The 3 by 3 submatrix is still triangular with determinant 1. But the cofactor C;2 is —1 because of the sign (—1)!*2. This number —1 goes into the (2, 1) entry of A~'—don’t forget to transpose C! The inverse of a triangular matrix is triangular. Cofactors give a reason why. Example 4 If all cofactors are nonzero, is A sure to be invertible? No way. Example 5 Here is part of a direct computation of A”! by cofactors: Gy A=|1 al 0.1 pe yh PAN and 5) Cj. = —(—2) Cro = —6 1|* and AW=% X ee oa Phe x Sar Bs X 234 Chapter 5 Determinants (x2, y2) (x1, 1) (x3, y3) Figure 5.1 General triangle; special triangle from (0, 0); general from three specials. Area of a Triangle Everybody knows the area of a rectangle—base times height. The area of a triangle is half the base times the height. But here is a question that those formulas don’t answer. If we know the corners area? (x1, y,) and (x2, y2) and (x3, y3) of a triangle, what is the Using the corners to find the base and height is not a good way. Determinants are much better. There are square roots in the base and height, but they cancel out in the good formula. The area of a triangle is half of a 3 by 3 determinant. If one corner is at the origin, say (x3, y3) = (0,0), the determinant is only 2 by 2. 5D (Area of a triangle) The triangle with corners (x;, y;) and (x2, y2) and (x3, y3) has area = E (determinant): Ais sy eee area = 4 wy wa | oe 8 One el ated — 5 i 2 : when’ ((¥3, v3) =©,, 0). When you set x3 = y3 = 0 in the 3 by 3 determinant, you get the 2 by 2 determinant. These formulas have no square roots—they are reasonable to memorize. The 3 by 3 determinant breaks into a sum of three 2 by 2’s, just as the third triangle in Figure 5.1 breaks into three triangles from (0, 0): Sen eae +5(x1y2 — x2y1) 5 (%2 y2 1) = +5(x2y3 —x3y2) x3 y3 | (9) +5 (x31 — x1 y3). This shows the area of the general triangle as the sum of three special areas. If (0, 0) is outside the triangle, two of the special areas can be negative—but the sum is still correct. The real problem is to explain 5(x1y2 — x2y}). 5.3 Cramer’s Rule, Inverses, and Volumes 235 (1,3) (X1, ¥,) (0, 0) Figure 5.2 (4, 1) I Triangle area 5; (0, 0) _A triangle is half of a parallelogram. Why is this the area of a triangle? We can remove the factor $ and change to a parallelogram (twice as big). The parallelogram contains two equal triangles. We now prove that its area is the determinant x;y2 — x2y,;. This area in Figure 5.2 is 11, and the triangle has area 55. 
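Reading the corners in Figure 5.2 as (0, 0), (4, 1), (1, 3), the two determinant formulas in 5D are quick to check by machine (the absolute value guards against a clockwise ordering of the corners); the proof that area equals the determinant comes next:

    % Area of the triangle with corners (0,0), (4,1), (1,3)
    area3 = 0.5 * abs(det([0 0 1; 4 1 1; 1 3 1]));   % 3 by 3 formula
    area2 = 0.5 * abs(det([4 1; 1 3]));              % 2 by 2 formula, one corner at (0,0)
    disp([area3 area2])        % both give 5.5; the parallelogram has area 11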
Proof that a parallelogram starting from (0,0) has area = 2 by 2 determinant. There are many proofs but this one fits with the book. We show that the area has the same properties 1-2-3 as the determinant. Then area = determinant! Remember that those three rules defined the determinant and led to all its other properties. 1 When A = /7/, the parallelogram becomes the unit square. Its area is det / = 1. 2 When rows are exchanged, the determinant reverses sign. The absolute value (positive area) stays the same—it is the same parallelogram. 3 If row | is multiplied by ft, Figure 5.3a shows that the area is also multiplied by f. If row 1 of A is added to row 1 of A’ (keeping row 2 fixed), the new determinant is |A|+|A’|. Figure 5.3b shows that the solid parallelogram areas also add to the dotted parallelogram area (because the two triangles completed by dotted lines are the same). Dotted area = Solid area=A + A’ Full area=tA — 2 = / = ti x Sires (+X), (0, 0) Figure 5.3. M2 YH) (0, 0) Areas tA and A + A’ obey linearity (keeping one side constant). 236 Chapter 5 Determinants That is an exotic proof, when we could use plane geometry. But the proof has a major attraction—it applies in n dimensions. The n edges going out from the origin are given by the rows of an n by n matrix. This is like the triangle with two edges going out from (0,0). The box is completed by more edges, just as the parallelogram was completed from a triangle. Figure 5.4 shows a three-dimensional box—whose edges are not at right angles. The volume of the box in Figure 5.4 equals the absolute value of det A. Our proof is to check that rules 1-3 for determinants are also obeyed by volumes. When an edge is stretched by a factor t, the volume is multiplied by t. When edge 1 is added to edge 1’, the new box has edge 1 + 1’. Its volume is the sum of the two original volumes. This is Figure 5.3b lifted into three dimensions or n dimensions. I would draw the boxes but this paper is only two-dimensional. The unit cube has volume = 1, which is det J. This leaves only rule 2 to be checked. Row exchanges or edge exchanges leave the same box and the same absolute volume. The determinant changes sign, to indicate whether the edges are a right-handed triple (det A > 0) or a left-handed triple(det A < 0). The box volume follows the rules for determinants, so volume = determinant. Example 6 A rectangular box (90° angles) has side lengths r,s, and t. Its volume is r times s times ¢. A matrix that produces those three sides is A = . Then det A also equals rst. Example 7 (Triple and double integrals) In calculus, the box is infinitesimally small! When the coordinates are x, y, z, the box volume is dV = dx dy dz. To integrate over a circle, we might change x and y to r and @. Those are polar coordinates: x = r cos@ (431, 432, 433) (411, 412, 443) Jf (421,422, 22+ 423 a23) x Figure 5.4 Three-dimensional box formed from the three rows of A. 5.3. and y=rsin@. Cramer’s Rule, Inverses, and Volumes 237 The area of a polar box is a determinant J times dr dé: dx/dr dy/or — dx/dO| dy/d0| _ — cos@ —rsing sin 8 r cos@ This determinant is J = rcos?6 +r sin?6 =r. That is the r in polar coordinates. A small area is dA = rdrd9é. The stretching factor J goes into double integrals just as dx/du goes into an ordinary integral [ dx = [(dx/du) du. The Cross Product This is an extra (and optional) application, special for three dimensions. tors u = (u),U2,u3) Start with vec- and v = (vj, v2, v3). These pages are about their cross product. 
Unlike the dot product, which is a number, the cross product is a vector—also in three dimensions. It is written uw x v and pronounced “u cross v.” We will quickly give the components of this vector, and also the properties that make it useful in geometry and physics. This time we bite the bullet, and write down the formula before the properties. DEFINITION vector UXV= The cross product i ie ek |u, U2 U3| VI U2~~——«OU3 of u = (uj, u2,u3) and v = (vj, v2, v3) is the SGovs —u302)i + (U3U] = U1) 7 ie — ok (10) This vector is perpendicular to u and v. The cross product v x u is —(u x v). Comment The 3 by 3 determinant is the easiest way to remember u x v. It is not especially legal, because the first row contains vectors i, j, k and the other rows contain numbers. In the determinant, the vector i = (1,0, 0) multiplies u2v3 and —u3v2. The result is (u2v3 — u3v2, 0,0), which displays the first component of the cross product. Notice the cyclic pattern of the subscripts: 2 and 3 give component |, then 3 and 1 give component 2, then | and 2 give component 3. This completes the definition of u x v. Now we list its properties. Property 1 v x uw reverses every sign so it equals —(u x v). We are exchanging rows 2 and 3 in the determinant so we expect a sign change. Property 2 The cross product is perpendicular to uw (and also to v), The most direct proof is to watch terms cancel. Perpendicularity is a zero dot product: u-(u Xx Vv) = uy (U2V3 — U3V2) + U2(U3V) — U4 V3) + U3(UjV2 — u2U,) = 0. (11) 238 Chapter 5 Determinants This is like replacing i, j, k in the top row by wy, u2, u3. The cross-product determinant now has a repeated row so it is zero. Property 3 The cross product of any vector with itself (two equal rows) is u xu = 0. When w and v are parallel, the cross product is zero. When wu and v are perpendicular, the dot product is zero. One involves sin@ and the other involves cos 6: lw x vl] = |lu|| lvl |sin@| Example 8 Since u = (3,2,0) the z axis: = and fu + | = ||z|| ||v|] | cos 4]. and v = (1,4,0) ‘pak 2 0O|= 10k. 4 0 (12) are in the xy plane, u x v goes up The cross product is (0, 0, 10). The length of u x v equals the area of the parallelogram with sides u and v. This will be important: In this example the area is 10. Example 9 The cross product of wu = (1,1, 1) and v = (1, 1, 2) is (1, —1, 0): 1 alt a]tel afet—s i | EE ee ee ie This vector (1, —1, 0) is perpendicular to (1, 1, 1) and (1, 1, 2) as predicted. Area = MO}, Example 10 The cross product of (1,0,0) and (0, 1,0) obeys the right hand rule. It goes up not down: Io ik k O|=k 0 Rule ux v points along your right thumb when the fingers curl from u to v. u =i v=j Thus i x j = k. The right hand rule also gives j x kK= i and k xi = j. Note the cyclic order. In the opposite order (anti-cyclic) the thumb is reversed and the cross product goes the other way: k x j = —i andi x k = —j and j x i = —k. You see the three plus signs and three minus signs from a 3 by 3 determinant. 5.3 Cramer’s Rule, Inverses, and Volumes 239 The definition of u x v can be based on vectors instead of their components: DEFINITION = The cross product is a vector with length ||w|j ||v|| |sin@|. Its direction is perpendicular to u and v. It points “up” or “down” by the right hand rule. This definition appeals to physicists, who hate to choose axes and coordinates. They see (U1, U2, U3) as the position of a mass and (F,, Fy, F,) as a force acting on it. If F is parallel to u, then u x F = 0—there is no turning. The mass is pushed out or pulled in. 
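The cancellation in equation (11) can also be watched numerically. MATLAB's built-in cross and dot make this a short check (the two vectors are our own sample choices); the determinant explanation of the same cancellation follows:

    % Check Properties 1 and 2 of the cross product on sample vectors
    u = [1 2 3];  v = [4 5 6];          % any two vectors in R^3
    w = cross(u, v);                    % u x v from formula (10), here (-3, 6, -3)
    disp(dot(u, w))                     % 0: u x v is perpendicular to u
    disp(dot(v, w))                     % 0: u x v is perpendicular to v
    disp(cross(v, u) + w)               % 0 0 0: v x u = -(u x v)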
The cross product u x F is the turning force or torque. It points along the turning axis (perpendicular to u and F). Its length ||w|| ||F'|| sin@ measures the “moment” that produces turning. Triple Product = Determinant = Volume Since u xv is a vector, we can take its dot product with a third vector w. That produces the triple product (u x v)-w. It is called a “scalar” triple product, because it is a number. In fact it is a determinant: Gy W, W2 U3 wm=|u, un uz|=|¥ uj v2 3). V1 U2 U3 W2 W3 Ww] u2 U3 (13) We can put w in the top row or the bottom row. The two determinants are the same because it takes row exchanges to go from one to the other. Notice when the determinant is zero: (u x v)»w=O_ First reason exactly when the vectors u, v, w lie in the same plane. u X v is perpendicular to that plane so its dot product with w is zero. Second reason Three vectors in a plane are linearly dependent. The matrix is singular (determinant = 0). Third reason Determinant equals volume. This is zero when the u, v, w box is squashed onto a plane. It is remarkable that (u x v)-w equals the volume of the box with sides u, v, w. Both numbers are given by the 3 by 3 determinant—which carries tremendous information. Like ad — bc for a 2 by 2 matrix, it separates invertible from singular. Chapter 6 will be looking for singular. 240 = Chapter 5 Determinants =» REVIEW OF THE KEY IDEAS #8 Cramer’s Rule solves Ax = b by ratios like x; = |baz...an|/|Al. The formula for the inverse matrix is A~! = CT /(det A). The volume of a box is | det A], when the sides are the rows of A. That volume is needed to change variables in double and triple integrals. The cross product u x v is perpendicular to u and v. Problem Set 5.3 Problems 1 1-5 are about Cramer’s Rule for x = A~!b. Solve these linear equations by Cramer’s Rule x; = det B;/ det A: ad i ie ae 2x1 + b) x2 =] x +2 + 3 =0 aa x2 + 2x3 = 0. Use Cramer’s Rule to solve for y (only). Call the 3 by 3 determinant D: i) ax by =1 ex'-+ dy = 0 ax+by+cez=1 dx+ey+ fz=0 gx thy+iz=0. (b) Cramer’s Rule breaks down when det A = 0. Example (a) has no solution while (b) has infinitely many. What are the ratios x; = det B;/ det A in these two cases? (a) 2x1 + 3x2 = 1 4x, + 6x2 = 1 (b) 2x1 + 3x2 = 1 4x; + 6x2 = 2. Quick proof of Cramer’s rule. The determinant is a linear function of column 1. It is zero when two columns are equal. If b = Ax = Yaj;x; goes into the first column of A, the determinant |b ap a3| = |Lajx; of this matrix ar B; is a3| = x |ay a>) (a3)\ =x (a) What formula for x; comes from left side = right side? (b) What steps lead to middle determinant = right side? det A. If the first column of A is also the right side b, solve the 3 by 3 system Ax = b. How does each determinant in Cramer’s Rule lead to your answer? 5.3. Cramer’s Rule, Inverses, and Volumes Problems 6-16 are about A~! = C'/ det A. Remember 6 241 to transpose C. Find A~! from the cofactor formula C!/det A. Use symmetry in part (b). i (2) eee ee 2. 0 ones) a (0) Os as ot ee FS ee Ore 1h F2 If all the cofactors are zero, how do you know that A has no inverse? of the cofactors are zero, is A sure to be invertible? If none Find the cofactors of A and multiply AC? to find det A: i AC—a it i al lane, F205 and i A) Cs Suppose det A = | and you know all the cofactors. How can you find A? From the formula AC! = (det A)/ show that det C = (det A)"~!. (for professors only) If you know all 16 cofactors of a 4 by 4 invertible matrix A, how would you find A? If all entries of A are integers, and det A = A7! are integers. 
Give a 2 by 2 example. | or —1, prove that all entries of If all entries of A and A~! are integers, prove that det A = 1 or —1. Hint: What is det A times det A~!? 14 Complete the calculation of A~! by cofactors in Example 5. 15 L is lower triangular and S is symmetric. ea) jb=S\a Ee det 4h 16 Qa Sas | di baad mGaE olenen tj (a) L~! is also lower triangular. Which three cofactors are zero? (b) S~! is also symmetric. Which three pairs of cofactors are equal? For n = 5 the matrix C contains ____ cofactors and each 4 by 4 cofactor contains terms and each term needs —____ multiplications. Compare with 53 — 125 for the Gauss-Jordan computation of A~! in Section 2.4. Problems 17 Assume they are invertible: 17-26 are about area and volume by determinants. (a) Find the area of the parallelogram with edges v = (3,2) and w = (1, 4). (b) Find the area of the triangle with sides v, w, and v + w. Draw it. (c) Find the area of the triangle with sides v, w, and w — v. Draw it. 242 Chapter 5 18 A box has edges from (0, 0, 0) to (3, 1, 1) and (1, 3, 1) and (1, 1, 3). Find its volume and also find the area of each face from ||u x v||. The faces are Determinants parallelograms. 19 20 (a) The corners of a triangle are (2, 1) and (3,4) and (0,5). What is the area? (b) Add a corner at (—1,0) the area? to make a lopsided region (four sides). What is The parallelogram with sides (2,1) and (2,3) has the same area as the parallel- ogram with sides (2,2) and (1,3). Find those areas from 2 by 2 determinants and say why they must be equal. (I can’t see why from a picture. Please write to me if you do.) 21 (a) Suppose the column vectors in a 3 by 3 matrix have lengths Lj, L2, L3. What is the largest possible value for the determinant? (b) If all entries have |a;;| = 1, can the determinant (six terms) equal 6? 22 Show by a picture how a rectangle with area x; y2 minus a rectangle with area X2y, produces the same area as our parallelogram. 23 When the edge vectors a, b,c are perpendicular, the volume of the box is |la]| times ||b|| times |lc||. The matrix A’ A is . Find the determinants of ATA and A. 24 The box with edges i and j and w = 2i + 37 + 4k has height . What is the volume? What is the matrix with this determinant? What is i x j and what is its dot product with w? 25 An n-dimensional cube has how many corners? How many edges? How many (n — 1)-dimensional faces? The cube whose edges are the rows of 2/ has volume _____. A hypercube computer has parallel processors at the corners with connections along the edges. 26 The triangle with corners (0, 0), (1, 0), (0, 1) has area 5. The pyramid with four corners (0, 0,0), (1, 0, 0), (0, 1, 0), (0, 0, 1) has volume . What is the volume of a 4-dimensional pyramid with five corners at (0,0, 0,0) and the columns of [? Problems 27, 27-30 are about areas dA and volumes dV Polar coordinates r,@ satisfy x =rcos@ and OxfOr” ON OOn dy/dr The two columns dy/d0| — in calculus. y=rsin@. cos@ sinO? Polar area includes J: —rsind | rcos@ are orthogonal and their lengths are ~ Therefore: J = 5.3. Cramer’s Rule, Inverses, and Volumes 243 28 Spherical coordinates p,y,6 satisfy x = psingcos@ and y = psingsin@ and z= pcosg. Find the 3 by 3 matrix of partial derivatives: dx/dp, dx /dg~, 0x/d0 in row 1. Simplify its determinant to J = p*sing. Then dV ina sphere is p* sing dp dgdé. 29 The matrix that connects 7,6 to x, y is in Problem 27. Invert that 2 by 2 matrix: al ore to dr/ox d0/dx or/dy dO/dy cos@ ? ae yh ? It is surprising that dr/dx = dx/dr (Calculus, Gilbert Strang, p. 501). 
Multiplying the matrices in 27 and 29 gives the chain rule a = cae + a Ae 30 = The triangle with corners (0, 0), (6, 0), and (1, 4) has area it by 6 = 60° the area is ae . When you rotate . The determinant of the rotation matrix is cos @ — sind} sind cos@ _ 5 ? ies ois Problems 31-37 are about the triple product (u x v) -w in three dimensions. 31. A box has base area ||u x v||. Its perpendicular height is ||w||cos@. Volume equals base area times height. So volume = ||u x v|| ||w|| cos @ which is (ux v)-w. Compute base area, height, and volume for u = (2,4,0), v = (—1,3,0), w = Gee 32 33 The volume of the same box is given by a 3 by 3 determinant. determinant. Evaluate that Expand the 3 by 3 determinant in equation (13) in cofactors of its row 1, U2, U3. This expansion is the dot product of u with the vector 34 Which of the triple products (u x w)-v and (w xu)-v and (v x w)-u are the same as (u x v)+w? Which orders of the rows u, v, w give the correct determinant? Shee ket? = (hk 0,=1) andiOi= G51) and.k = @2.2a1)< is a parallelogram and compute its area. Choose T,U,V is a tilted box and compute its volume. 36 37 ChooseS’so' tha. PORS so that OPQRSTUV Suppose (x, y,z) and (1, 1,0) and (1, 2,1) lie on a plane through the origin. What determinant is zero? What equation does this give for the plane? Suppose (x, y, z) is a linear combination of (2, 3,1) and (1, 2,3). What deter- minant is zero? What equation does this give for what plane? 38 (a) Explain from volumes why det2A = 2” det A for n by n matrices. (b) For what size matrix is the false statement det A+det A = det(A + A) true? . TE Raita? Qoereay qe (imees «1 Hay Wieheee SSN oot, (ule GH,ees \-ORk ge aay rN Aa FENN anet if aay inital shied‘4 > \ at melon | Thee ones sage ar all? eal DS, 4 mr: vows hanes otathe aa Skane ©Ts). ton n lthtaie), Dl oe we hice ; Tid peu beeentlw oe etal: Bh Na) Ti nm ei any 7. ‘ght Ay vt an | we ‘+ejane faa. harass ae ay 4 Gh air Ue 3, fi 2 ps tal “7 Mri oJ Pl reigns‘Gin ; a te vn a8 hw i le ing wah panes et ma Se aR ) et vai sat (gyn TY nae wren) Gr Metre HAN ite snl ee TELE a ie Boy rent rth it eee) a me hh ¥ “a re = wit Cote 6 a See Soe) + res ens sagan ehennis ati Swi fis Fu 1 PePol Wii — ) oo mili a tt a bea gD ny .. car 6 Tie pews HOA @! werealiens cutee anycde ‘iD (oem 1) ® eneahd® Oh ee tm ec, @ here pyle Sip eindildipars a acai dina cium et yn ‘oat (arte y gutali¢ eurenehas ee" Nadecole at tow © gee) Minders olgith adTe | Cweusiccrrsctadle Hime oy yo 8 no, mh... 28 te. oe <n eo Wy orbhs at fade Ses oO, () @\ ae Ee nae nical — Sih A) we A tad hye eB bun (f— aS) waits Sadie iD ND AROS ah orice BONY wir th pees aes Wisteria Tonghy @ al peer y aw i pea ud idSot here pity — thai yen otek foo TF Rye snmptiigresest at | we 7 es 7) fart) FS 8 Aet b nerd _ a ies — and) a swore he: . eaters miata ‘ EIGENVALUES AND EIGENVECTORS INTRODUCTION TO EIGENVALUES ua 6.1 Linear equations Ax = b come from steady state problems. Eigenvalues have their greatest importance in dynamic problems. The solution is changing with time—growing or decaying or oscillating. We can’t find it by elimination. This chapter enters a new part of linear algebra. All matrices in this chapter are square. A good model comes from the powers A, A’, A>,... of a matrix. Suppose you need the hundredth power A!. The starting matrix A becomes unrecognizable after a few steps: sOuaaesos en 70.45 Bo ee) £650 {2325 350.475 A A? A j)qres .6000 4000 .6000 .4000 Ae A! was found by using the eigenvalues of A, not by multiplying 100 matrices. 
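That hundredth power is easy to watch on a computer. We read the slightly blurred display above as A = [.8 .3; .2 .7]; with that reading, a few MATLAB lines reproduce the numbers, and the eigenvalue explanation comes next:

    A = [.8 .3; .2 .7];        % the matrix of this example (our reading)
    disp(A^2)                  % [.70 .45; .30 .55]
    disp(A^3)                  % [.650 .525; .350 .475]
    disp(A^100)                % both columns are very close to (.6, .4)
    disp(eig(A))               % the eigenvalues 1 and .5, in some order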
Those eigenvalues are a new way to see into the heart of a matrix. To explain eigenvalues, we first explain eigenvectors. Almost all vectors change direction, when they are multiplied by A. Certain exceptional vectors x are in the same direction as Ax. Those are the “eigenvectors.” Multiply an eigenvector by A, and the vector Ax is a number A times the original x. The basic equation is Ax = Ax. The number A is the “eigenvalue”. It tells whether the special vector x is stretched or shrunk or reversed or left unchanged— when it is multiplied by A. We may find A = 2 or t or —1 or 1. The eigenvalue A could also be zero. The equation Ax = Ox means that this eigenvector x is in the nullspace. If A is the identity matrix, every vector has Ax = x. All vectors are eigenvectors. The eigenvalue (the number lambda) is A = 1. This is unusual to say the least. Most 2 245 246 Chapter 6 Eigenvalues and Eigenvectors | A2x1 = (1)*x1 eS A2x9 = (.5)2x5 = Bey AxD =p) — [ea [1 Figure 6.1 The eigenvectors keep their directions. The eigenvalues for A> are), by 2 matrices have two eigenvector directions and two eigenvalues. This section teaches how to compute the x’s and 4’s. It can come early in the course because we only need the determinant of a 2 by 2 matrix. For the matrix A in our model, here are eigenvectors x; and x2. Multiplying those vectors by A gives x; and 5X2. The eigenvalues 4 are | and 5: fe ey oe za] | ky wre] Wes = 2%, (Ax =x and Ax) = E oD and eel cote Tal alr io an=[5 lie S| If we multiply x; by A again, we still get x;. A"x, =x meéans that A, — 1) ar Tt i (this is 3X2 SO A2 = 5). Every power of A will give ,. Multiplying x2 by A gave 5X2, and if we multiply again we get (5)°x2. When A is squared, the eigenvectors x1 and x2 stay the same. The A’s are now 17 and (5). The eigenvalues are squared! This pattern keeps going, because the eigenvectors stay in their own directions (Figure 6.1) and never get mixed. The eigenvectors of A! are the same x, and x2. The eigenvalues of A! are 1190 = 1 and (5) 1 = very small number. Other vectors do change direction. But those other vectors are combinations of the two eigenvectors. The first column of A is x; + (.2)x2: 2]=L4]+L-3} 8 6 D5 z Multiplying by A gives the first column of A”. Do it separately for x; and (.2)x2. Of course Ax; = x,. And A multiplies the second vector by its eigenvalue 5: alee Wa les Each eigenvector is multiplied by its eigenvalue, whenever A is applied. We didn’t need these eigenvectors to find A*. But it is the good way to do 99 multiplications. 6.1 Introduction to Eigenvalues 247 At every step x; is unchanged and x2 is multiplied by (3), so we have (Gy: Ao ae | is really 1 : | + | small : This is the first column of A!°°, exact. Moe x, + (2)(Gy"x2 = vector The number we originally wrote as .6000 was not We left out (.2)(5)”” which wouldn’t show up for 30 decimal places. The eigenvector x; is a “steady state” that doesn’t change (because A; = 1). The eigenvector x2 is a “decaying mode” that virtually disappears (because A2 = .5). The higher the power of A, the closer its columns approach the steady state. We mention that this particular A is a Markov matrix. Its entries are positive and every column adds to 1. Those facts guarantee that the largest eigenvalue is A = 1 (as we found). Its eigenvector x; = (.6, .4) is the steady state—which all columns of A* will approach. Section 8.2 shows how Markov matrices appear in applications. Example 1 The projection matrix P = Fe 2] has eigenvalues 0 and 1. 
Its eigenvectors are x; = (1,1) and x2 = (1,—1). For those vectors, Px; = x; and Px2 = 90. This projection matrix illustrates three things at once: 1 Each column of P adds to 1, so A = | is an eigenvalue. 2 P is a singular matrix, so 4 = 0 is an eigenvalue. 3 P is a symmetric matrix, so x; and x2 are perpendicular. The only possible eigenvalues of a projection matrix are 0 and 1. The eigenvectors for X = 0 (which means Px = Ox) fill up the nullspace. The eigenvectors for 4 = 1 (which means Px = x) fill up the column space. The nullspace is projected to zero. The column space stays fixed (those vectors are projected onto themselves). An in-between vector like v = (3, 1) partly disappears and partly stays: LE] = [EE] The projection keeps the column space part of v and destroys the nullspace part. To emphasize: Special properties of a matrix lead to special eigenvalues and eigenvectors. That is a major theme of this chapter. Projections have 4 = 0 and |. The next matrix (a reflection and a permutation) is also special. Example 2 The reflection matrix R = [9}] has eigenvalues 1 and —1. 248 Chapter 6 Eigenvalues and Eigenvectors se ® Px,cia = * at ~ X5 \ N Rx, =X, en " x \ s S = \ N 5 \ oS N ‘e ee Projection onto line ; Reflection across line Rx ‘ Figure 6.2 Projections have A = | and 0. Reflections have 4 = 1 and —1. Perpendicular eigenvectors. The eigenvector (1, 1) is unchanged by R. The second eigenvector is (1, —1)—its signs are reversed by R. A matrix with no negative entries can still have a negative eigenvalue! The perpendicular eigenvectors are the same x; and x2 that we found for projection. Behind that fact is a relation between the matrices R and P: DP <= Ii ss IX or io ae ae, iO Oneal 3\-lo Wah: “ak Here is the point. If Px = Ax then 2Px = 2x. the matrix is doubled. Now subtract x (2) The eigenvalues are doubled when = x. The result is (2P — /)x = (2A — 1)x. When a matrix is shifted by I, each x is shifted by 1. The eigenvectors stay the same. The eigenvalues are related exactly as the matrices are related: 2P — 1 eR so the eigenvalues of R are 2) aA = 1 20) —TSsi Similarly the eigenvalues of R? are 7. In this case R* = J and (1)? = Land (—1)? = 1. The Equation for the Eigenvalues In small examples we could try to solve Ax = Ax by trial and error. Now we use determinants and linear algebra. This is the key calculation in the whole chapter—to compute the X’s and x’s. First move Ax to the left side. Write the equation Ax = Ax as (A—AI)x = 0. The matrix A —AJ times the eigenvector x is the zero vector. The eigenvectors make up the nullspace of A—dI! When we know an eigenvalue 2, we find an eigenvector by solving (A —AJ)x = 0. Eigenvalues first. If (A — A/)x = 0 has a nonzero solution, A — AJ is not invertible. The determinant of A —XI must be zero. This is how to recognize an eigenvalue A: 6.1 6A Introduction to Eigenvalues 249 The number A is an eigenvalue of A if and only if det(A — AI) =0. (3) This “characteristic equation” involves only A, not x. Then each eigenvalue A leads to x: Solve (A —Al)x =0 or Ate AY: (4) Since det(A — AJ) = 0 is an equation of the nth degree, the matrix has n eigenvalues. Example 3 A= (3 ral is already singular (zero determinant). Find its 4’s and x’s. When A is singular, 4 = 0 is one of the eigenvalues. The equation Ax = Ox has solutions. They are the eigenvectors for A = 0. But to find all 4’s and x’s, begin as always by subtracting AJ from A: Subtract 4 from the diagonal to find Take the determinant of this matrix. 2 — 5A +4. 
From 1 — A—AJ == = [5 é ) ae | , times 4—4, the determinant involves The other term, not containing A, is 2 times 2. Subtract as in ad — be: ania how — — _-QQ2) =a 2 Ls 76d2 i= (1-24-12) (5) This determinant A*—5A is zero when A is an eigenvalue. Factoring into A times A—S, the two roots are A = 0 and 4 =5S: det(A — AI) = 7-51 Now find the eigenvectors. (A —OI)x = (4=sx =0 yields the eigenvalues The matrices and dA2=5S. Solve (A — A/)x = 0 separately for 4; = 0 and Az =5: ea bee) haan | Rt F | ele A =| 5 A; =0 ae yields an eigenvector yi} 2 Fisted ic [OF Ay 0 A yt am Olen: pale yields an eigenvector i ee fe: Be b fOR Ad == 9. A —O/ and A —S/ are singular (because 0 and 5 are eigenvalues). The eigenvectors are in their nullspaces: (A — A/J)x =0 is Ax = Ax. We need to emphasize: There is nothing exceptional about X = 0. Like every other number, zero might be an eigenvalue and it might not. If A is singular, it is. The eigenvectors fill the nullspace: Ax = Ox = 0. If A is invertible, zero is not an eigenvalue. We shift A by a multiple of / to make it singular. In the example, the shifted matrix A —5/ was singular and 5 was the other eigenvalue. 250 Chapter 6 Summary 1. Eigenvalues and Eigenvectors To solve the eigenvalue problem for an n by n matrix, follow these steps: Compute the determinant of A — AI. With 2 subtracted along the diagonal, this determinant starts with 4” or —A”. It is a polynomial in A of degree n. 2. Find the roots of this polynomial, so that det(A — AJ) = 0. The n roots are the n eigenvalues of A. 3. For each eigenvalue solve the system (A —AI)x = 0. Since the determinant is zero, there are solutions other than x = 0. Those x’s are the eigenvectors. A note on quick computations, when A is 2 by 2. The determinant of A —A/ is a quadratic (starting with 47). By factoring or by the quadratic formula, we find its two roots (the eigenvalues). Then the eigenvectors come immediately from A — AJ. This matrix is singular, so both rows are multiples of a vector (a,b). The eigenvector is any multiple of (b, —a). The example had A = 0 and A =5S: XA =0: rows of A —OJ/ in the direction (1, 2); eigenvector in the direction (2, —1) 4 =5: rows of A —S/ in the direction (—4, 2); eigenvector in the direction (2, 4). Previously we wrote that last eigenvector as (1,2). Both (1,2) and (2,4) are correct. There is a whole line of eigenvectors—any nonzero multiple of x is as good as x. Often we divide by the length, to make the eigenvector into a unit vector. We end with a warning. Some 2 by 2 matrices have only one line of eigenvectors. This can only happen when two eigenvalues are equal. (On the other hand A = / has equal eigenvalues and plenty of eigenvectors.) Similarly some n by n matrices don’t have n independent eigenvectors. Without n eigenvectors, we don’t have a basis. We can’t write every v as a combination of eigenvectors. In the language of the next section, we can’t diagonalize the matrix. Good News, Bad News Bad news first: If you add a row of A to another row, or exchange rows, the eigenvalues usually change. Elimination does not preserve the i’s. After elimination, the triangular U has its eigenvalues sitting along the diagonal—they are the pivots. But they are not the eigenvalues of A! Eigenvalues are changed when row | is added to row 2: | ale Ro ‘| has Good news second: A = 0 and A = 1; a=| Toe I Gy haseAs— Onan Gee The product i, times Az and the sum A; +A can be found quickly from the matrix A. Here the product is 0 times 2. 
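Example 3 is a good place for a numerical check. A minimal sketch (null returns unit vectors, so the eigenvectors appear scaled and possibly with the opposite sign):

    A = [1 2; 2 4];
    disp(poly(A))                 % coefficients of lambda^2 - 5*lambda + 0
    disp(eig(A))                  % the eigenvalues 0 and 5
    disp(null(A - 0*eye(2)))      % eigenvector for lambda = 0, along (2, -1)
    disp(null(A - 5*eye(2)))      % eigenvector for lambda = 5, along (1, 2)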
That agrees with the deter- 6.1 Introduction to Eigenvalues 251 minant of A (which is 0). The sum of the eigenvalues is 0 +2. That agrees with the sum down the main diagonal of A (which is 1+ 1). These quick checks always work: 6B The product of the n eigenvalues equals the determinant of A. 6C The sum of the n eigenvalues equals the sum of the n diagonal entries. number is called the trace of A: Ay + A2 + +++ + An = race = aj; + 422 +--+ + Gan. This (6) Those checks are very useful. They are proved in Problems 15-16 and again in the next section. They don’t remove the pain of computing the 4’s. But when the computation is wrong, they generally tell us so. To compute the correct 4’s, go back to det(A — AJ) = 0. The determinant test makes the product of the ’s equal to the product of the pivots (assuming no row exchanges). But the sum of the A’s is not the sum of the pivots—as the example showed. The individual A’s have almost nothing to do with the individual pivots. In this new part of linear algebra, the key equation is really nonlinear: A multiplies x. Imaginary Eigenvalues One more bit of news (not too terrible). The eigenvalues might not be real numbers. Example 4 The 90° rotation Q =[_° }] has no real eigenvectors or eigenvalues. No vector Qx stays in the same direction as x (except the zero vector which is useless). There cannot be an eigenvector, unless we go to imaginary numbers. Which we do. To see how i can help, look at Q? which is —/. If Q is rotation through 90°, then Q? is rotation through 180°. Its eigenvalues are —1 and —1. (Certainly —Ix = —1x.) Squaring Q is supposed to square its eigenvalues A, so we must have ? = —1. The eigenvalues of the 90° rotation matrix Q are +i and —i. Those A’s come as usual from det(Q — AJ) = 0. This equation gives 4274+1=0. Its roots are Aj =i and A2 = —i. They add to zero (which is the trace of Q). They multiply to one (which is the determinant). We meet the imaginary number 7 (with i? = —1) also in the eigenvectors of Q: ft olll=‘L:] = Lr olli=-Li} Somehow these complex vectors keep their direction as they are rotated. Don’t ask me how. This example makes the all-important point that real matrices can easily have 252 Chapter 6 Eigenvalues and Eigenvectors complex eigenvalues. It also illustrates two special properties that we don’t study completely until Chapter 10: 1 Q is a skew-symmetric matrix so each 4 is pure imaginary. 2 Q is an orthogonal matrix so the absolute value of each A is 1. A symmetric matrix (A? = A) can be compared to a real number. A skew-symmetric matrix (A? = —A) can be compared to an imaginary number. An orthogonal matrix (ATA = J) can be compared to a complex number with |A| = 1. For the eigenvalues those are more than analogies—they are theorems to be proved. The eigenvectors for all these special matrices are perpendicular.* Eigshow There is a MATLAB demo called eigshow, displaying the eigenvalue problem for a 2 by 2 matrix. It starts with the unit vector x = (1,0). The mouse makes this vector move around the unit circle. At the same time the graph shows Ax, in color and also moving. Possibly Ax is ahead of x. Possibly Ax is behind x. Sometimes Ax is parallel to x. At that parallel moment, Ax = Ax and x is an eigenvector. tae / Ax7 = 0.5x7 0.3, 0.7) \ \ / \ \ ellipse of Ax’s AxI——1(O18702) As ge / Fe ciclesofix's ap =O) The eigenvalue A comes from the length and direction of Ax. The built-in choices for A illustrate various possibilities: 1. There are no (real) eigenvectors. The directions of x and Ax never meet. 
eigenvalues and eigenvectors are complex. 2. There is only one line of eigenvectors. but don’t cross. 3. There are eigenvectors in two independent directions. This is typical! Ax crosses x at the first eigenvector, and it crosses back at the second eigenvector. The moving directions Ax and x meet This file is in the Student Version and in MATLAB and it can be downloaded from the Web: *Chapter 10 defines “perpendicular” for complex The vectors. 5.2. It is with our Teaching Codes Somehow (i, 1) and (1,i) are perpendicular. 6.1. Introduction to Eigenvalues 253 http:// pub/mathworks/toolbox/matlab/demos Suppose A is singular (rank one). Its column space is a line. The vector Ax can’t move around, it has to stay on that line. One eigenvector x is along the line. Another eigenvector appears when Ax2 = 0. Zero is an eigenvalue of a singular matrix. You can mentally follow x and Ax for these six matrices. How many eigenvectors and where? When does Ax go clockwise instead of counterclockwise? “Ei =» a) Co) Go bi) fi REVIEW OF THE KEY IDEAS #® 1. Ax =Ax says that x keeps the same direction when multiplied by A. 2. Ax =dx also says that det(A — AJ) = 0. This determines n eigenvalues. 3. The eigenvalues of A* and A~! are A* and A~!, with the same eigenvectors. 4. The sum and product of the A’s equal the trace and determinant. 5. Special matrices like projections and rotations have special eigenvalues ! Problem Set 6.1 1 The example at the start of the chapter has yan ees) on Ee Z| es WO. aA EE s) ne A te eed F Al: The matrix A? is halfway between A and A®. Explain why A? = 5(A + A%) from the eigenvalues and eigenvectors of these three matrices. 2 (a) Show from A how a row exchange can produce different eigenvalues. (b) Why is a zero eigenvalue not changed by the steps of elimination? Find the eigenvalues and the eigenvectors of these two matrices: hi eieenalee 1 A+/T/ has the 4 24 eigenvectors as A. Its eigenvalues are ebyri 254 Chapter 6 Eigenvalues and Eigenvectors 2) Compute the eigenvalues and eigenvectors of A and An aa A! has the (2 A rel | and eigenvectors as A. When —3/4 1/2 1/2 al A has eigenvalues A; and Az, its inverse has eigenvalues Compute the eigenvalues and eigenvectors of A and sh —1 hei, A® has the same values 3 A and ay |i oe ees. ‘| as A. When A has eigenvalues A, and A2, A” has eigen- Find the eigenvalues of A and B and A+ B: i 12) Eigenvalues of A+B values of B. @ il and Leta alk | and rt a+B=|i a (are equal to)(are not equal to) eigenvalues of A plus eigen- Find the eigenvalues of A and B and AB and BA: et i @ i and fl Fa; il | and if ap =| il 3] and Dil Ba=(/ ap Eigenvalues of AB (are equal to)(are not equal to) eigenvalues of A times eigenvalues of B. Eigenvalues of AB (are equal to)(are not equal to) eigenvalues of BA. Elimination produces A = LU. The eigenvalues of U are on its diagonal; they are the . The eigenvalues of L are on its diagonal; they are all The eigenvalues of A are not the same as (a) If you know x is an eigenvector, the way to find A is to (b) If you know A is an eigenvalue, the way to find x is to What do you do to Ax = Ax, in order to prove (a), (b), and (c)? (ay AP tis_an eigenvalue of A’, as in Problem 4. (b) A! (c) A+ 1 is an eigenvalue of A+ /, as in Problem 2. is an eigenvalue of A—!, as in Problem 3. 6.1 10 Introduction to Eigenvalues 255 Find the eigenvalues and eigenvectors for both of these Markoy matrices A and A®. Explain why A! 
is close to AX: ae 2 ery te seal 7 | me =|5; aa 11 Find the eigenvalues and eigenvectors for the projection matrices P and P!: fan If two eigenvectors share the same aoe) Wise) teh at) OF 0. 1 A, so do all their linear combinations. Find an eigenvector of P with no zero components. 72 From the unit vector u = (Z. i ;, 2) construct the rank one projection matrix P=uu'. 13 (a) Show that Pu =u. Then u is an eigenvector with A = 1. (b) If v is perpendicular to uw show that Pv = 0. Then 4 = 0. (c) Find three independent eigenvectors of P all with eigenvalue 4 = 0. Solve det(Q — AJ) = 0 by the quadratic formula to reach A = cos@ +i —sing a eaten =| cos@ rotates the xy plane by the angle @. Find the eigenvectors of Q by solving (Q —A/)x =0. 14 Use ici Every permutation matrix leaves x = (1, 1,..., 1) unchanged. two more A’s for these permutations: Oa ligne () (Pee) eel i ©. 10 15 sin@: gral 0. (P= | O il @ i @ Then A = 1. Find 1 © @ Prove that the determinant of A equals the product A,A2---A,. Start with the polynomial det(A — A/) separated into its n factors. Then set 4 = det(A — AI) = (Ay —A)(A2 —A)-++ (An —A) 16 so detA= The sum of the diagonal entries (the trace) equals the sum of the eigenvalues: a a=| c b | has If A has Ay = 3 and Az =4 gives the eigenvalues det(A —AJ) D =A* —(a+d)X\+ad — bc =0. then det(A—AJ) = A= (a+d+/ )/2andA= _. The quadratic formula __. Their sum is 256 Chapter 6 Eigenvalues and Eigenvectors 17 If A has Ay =4 and Ay =5 then det(A —AJ) = (A —4)(A — 5) = A2 — 94.420. Find three matrices that have trace 18 A 3 by 3 matrix B is known a + d = 9 and determinant 20 and A = 4, 5. to have eigenvalues 0,1,2. This information is enough to find three of these: (a) the rank of B (b) the determinant of B'B (c) the eigenvalues of B'B (d) the eigenvalues of (B+ / ieee 19 Choose the second row of A =[°!] so that A has eigenvalues 4 and 7. 20 Choose a, b,c, so that det(A —A/J) = 9, —A3. Then the eigenvalues are —3, 0, 3: OPO Aa) oi WC 21 The eigenvalues of A equal the eigenvalues of A‘. This is because det(A — A/) equals det(A! — AJ). That is true because . Show by an example that the eigenvectors of A and A! are not the same. 22 Construct any 3 by 3 Markov matrix M: positive entries down each column add to 1. If e = (1, 1,1) verify that M™e = e. By Problem 21, A = 1 is also an eigenvalue of M. Challenge: A 3 by 3 singular Markov matrix with trace : has eigenvalues A = 23 Find three 2 by 2 matrices that have 4; = Az = 0. The trace is zero and the determinant is zero. The matrix A might not be 0 but check that A* = 0. 24 This matrix is singular with rank one. Find three A’s and three eigenvectors: 1 Aja 20 1 25 pk ae) Suppose A and B have the same eigenvalues 4),..., A, with the same independent eigenvectors x;,...,X,. Then A = B. Reason: Any vector x is a combination cyx; +---+c¢,X,. 26 Ha? 2112, Meira Ae Oecd What is Ax? What is Bx? The block B has eigenvalues 1,2 and C has eigenvalues 3,4 and D has eigenvalues 5,7. Find the eigenvalues of the 4 by 4 matrix A: 0 B Cl_|-2 A= | le 0 0 ) 0 6 Fr W CoO 1 SO Ae 27 6.1 Introduction to Eigenvalues C= 1 0 1 — i = Si 0 Find the rank and the four eigenvalues of and ey ee fee py pe ay Ss pa 28 257 oOo ©= Subtract J from the previous A. 
Find the 4’s and then the determinant: 1 l BSA SoS SS SS 7S eyeSey aS 0 When A (all ones) is 5S by 5, the eigenvalues of A and and 29 are (Review) Find the eigenvalues of A, B, and C: oe A= ||@ ee On 4 givel When a+b =c-+d of Oneal 2 @ 0) Je=| 0 ORROmAG 30 B = A —/ 3 ANC ee Gn — a| Pi 4 SRV ED) ee Ds Be Be show that (1, 1) is an eigenvector and find both eigenvalues Goa a 31 When P exchanges rows E AF | and 2 and columns change. Find eigenvectors of A and PAP | Ave A 32 2 ij Ome Sa4 | and 2, the eigenvalues don’t for A= 11: Andee Alea a on WwW Rr WwW RR Suppose A has eigenvalues 0, 3,5 with independent eigenvectors u, v, w. (a) Give a basis for the nullspace and a basis for the column space. (b) Find a particular solution to Ax = v + w. Find all solutions. (c) Show that Ax =u column space.) has no solution. (If it did then “4 would be in the 258 Chapter 6 Eigenvalues and Eigenvectors DIAGONALIZING A MATRIX @ 6.2 When x is an eigenvector, multiplication by A is just multiplication by a single number: Ax = Ax. All the difficulties of matrices are swept away. Instead of an interconnected system, we can follow the eigenvectors separately. It is like having a diagonal matrix, with no off-diagonal interconnections. The 100th power of a diagonal matrix is easy. The point of this section is very direct. The matrix A turns into a diagonal matrix A when we use the eigenvectors properly. This is the matrix form of our key idea. We start right off with that one essential computation. 6D Diagonalization Suppose the n by n matrix A has n linearly independent eigen- vectors. Put them into the columns of an eigenvector matrix S. Then S~!AS is the eigenvalue matrix A: hn The matrix A is “diagonalized.’ We use capital lambda for the eigenvalue matrix, because of the small i’s (the eigenvalues) on its diagonal. Proof Multiply A times its eigenvectors, which are the columns of S. The first column of AS is Ax,. That is A,x,: Assi AC WA Ae le eA eh, Pc The trick is to split this matrix AS into S times A: Ay AjX1 °° * Anky | =] °° Xp so = SA. An Keep those matrices in the right order! Then 4; multiplies the first column x;, as shown. The diagonalization is complete, and we can write AS = SA in two good ways: As=SA) 3. S-'AS=A~ Ors A=SAS_1, (2) The matrix S has an inverse, because its columns (the eigenvectors of A) were assumed to be linearly independent. Without n independent eigenvectors, we can’t diagonalize. The matrices eigenvalues 41,...,A,. A and A have the same The eigenvectors are different. The job of the original eigenvectors was to diagonalize A—those eigenvectors of A went into S. The new eigenvectors, for the diagonal matrix A, are just the 6.2 Diagonalizing a Matrix 259 columns of J. By using A, we can solve differential equations or difference equations or even Ax = b. Example 1 The projection matrix P = [-3:3] has A = 1 and 0 with eigenvectors (1, 1) and (—1, 1). Put the eigenvalues into A and the eigenvectors into S. Then S -Ips=a;: ES5 5 3]fon a 0 The original projection satisfied P? = P. The new projection satisfies A* = A. The column space has swung around from (1,1) to (1,0). The nullspace has swung around from (—1, 1) to (0, 1). Diagonalization lines up the eigenvectors with the coordinate axes. Here are four small remarks about diagonalization, before the applications. Remark 1 Suppose the numbers Aj, ..., An are all different. Then it is automatic that the eigenvectors x;,...,X, are independent. See 6E below. Therefore any matrix that has no repeated eigenvalues can be diagonalized. 
Remark 2 The eigenvector matrix S$ is not unique. We can multiply its columns (the eigenvectors) by any nonzero constants. Suppose we multiply by 5 and —1. Divide the rows of S~! by 5 and —1 to find the new inverse: sh sees Moe wal Be ete = | = SnewP Snew = ie EY E 4 ie fi = E i sp SAIN yak —{ 2 The extreme case is A = /, when every vector is an eigenvector. S can be the eigenvector matrix. Then Any invertible matrix S-!1S = 1 (which is A). Remark 3 To diagonalize A we must use an eigenvector matrix. From S~'AS = A we know that AS = SA. Suppose the first column of S is y. Then the first columns of AS and SA are Ay and A;y. For those to be equal, y must be an eigenvector. The eigenvectors in S come in the same order as the eigenvalues in A. To reverse the order in S and A, put (—1, 1) before (1, 1): eeeemieteice Cer Remark 4 (repeated warning for repeated eigenvalues) Some matrices have too few eigenvectors. Those matrices are not diagonalizable. Here are two examples: 1 —=1 a=|y zy and 0) A=[5 dl a Their eigenvalues happen to be 0 and 0. Nothing is special about zero—it is the repeated A that counts. Look for eigenvectors of the second matrix. Ax =Ox means Oat 0 ip alPale lat 260 Chapter 6 Eigenvalues and Eigenvectors The eigenvector x is a multiple of (1,0). There is no second eigenvector, so A cannot be diagonalized. This matrix is the best example to test any statement about eigenvectors. In many true-false questions, this matrix leads to false. Remember that there is no connection between invertibility and diagonalizability: - Invertibility is concerned with the eigenvalues (zero or not). - Diagonalizability is concerned with the eigenvectors (too few or enough). Each eigenvalue has at least one eigenvector! If (A —A/)x = O leads you to x = 0, then A is not an eigenvalue. Look for a mistake in solving det(A —A/) = 0. If you have eigenvectors for n different ’s, those eigenvectors are independent and A is diagonalizable. 6E (Independent x from different A) Eigenvectors x1, ..., x; that correspond to distinct (all different) eigenvalues are linearly independent. Ann by n matrix that has n different eigenvalues (no repeated i’s) must be diagonalizable. Proof Suppose c;x; +c2x2 = 0. Multiply by A to find cjA x; +c2A2x2 = 0. Multiply by Az to find cyA2x1 + c2A2x2 = 0. Now subtract one from the other: Subtraction leaves (A; — A2)cix; = 0. Since the i’s are different and x; 4 0, we are forced to the conclusion that cj = 0. Similarly cz = 0. No other combination gives c;x; +c2x2 = 0, so the eigenvectors x and x2 must be independent. This proof extends directly to j eigenvectors. Suppose cjx; +---+ cjx; = 0. Multiply by A, then multiply separately by A;, and subtract. This removes x;. Now multiply by A and by Aj;_-; and subtract. This removes x;_;. Eventually only a multiple of x, is left: (At —A3) "Ai = 2; cixy = 0 which forces” cy = 0. (3) Similarly every cj; = 0. When the A’s are all different, the eigenvectors are independent. With n different eigenvalues, the full set of eigenvectors goes into the columns of S. Then A is diagonalized. Example 2 The Markov matrix A = [-8-3] in the last section had A, = 1 and Az = .5. Here is A= SAS7!: bea eal clined The eigenvectors (.6, .4) and (1, —1) are in the columns of S$. We know that they are also the eigenvectors of A?. 
Therefore A* has the same S, and the eigenvalues in A are squared: A* =SAS SAS] = SA2s“!, 6.2 Diagonalizing a Matrix 261 Just keep going, and you see why the high powers A* approach a “steady state”: pO ehOG perry Wows iat ho =| anil itm elle Hal As k gets larger, (.5)* gets smaller. In the limit it disappears completely. That limit is eos oR A aL I =1/)/0 ORT 1 ey Decca: O]| 4 —6]1.4 “41° The limit has the eigenvector x; in both columns. We saw this steady state in the last section. Now we see it more quickly from powers like A! = §A!5s-!, Eigenvalues of AB and A+ B The first guess about the eigenvalues of AB is not true. An eigenvalue 4 of A times an eigenvalue 6 of B usually does not give an eigenvalue of AB. It is very tempting to think it should. Here is a false proof: ABx = ABx = BAx = BAx. (4) It seems that 6 times A is an eigenvalue. When x is an eigenvector for A and B, this proof is correct. The mistake is to expect that A and B automatically share the same eigenvector x. Usually they don’t. Eigenvectors of A are not generally eigenvectors of B. A and B can have all zero eigenvalues, while 5 Onl | and ae ORO k ak then | is an eigenvalue of AB: LO aB=(j | and Ome a+B=|\ ale For the same reason, the eigenvalues of A+ B are generally not A+. Here +f =0 while A + B has eigenvalues 1 and —1. (At least they add to zero.) The false proof suggests what is true. Suppose x really is an eigenvector for both A and B. Then we do have ABx = ABx. Sometimes all n eigenvectors are shared, and we can multiply eigenvalues. The test for A and B to share eigenvectors is important in quantum mechanics—time out to mention this application of linear algebra: 6F Commuting matrices share eigenvectors Suppose A and B can each be diagonalized. They share the same eigenvector matrix S if and only if AB = BA. The uncertainty principle In quantum mechanics, the position matrix P and the mo- mentum matrix Q do not commute. In fact QP —PQ = (these are infinite matrices). Then we cannot have Px = 0 at the same time as Ox = 0 (unless x = 0). If we knew the position exactly, we could not also know the momentum exactly. Problem 32 derives Heisenberg’s uncertainty principle from the Schwarz inequality. 262 Chapter 6 Eigenvalues and Eigenvectors Fibonacci Numbers We present a famous example, which leads to powers of matrices. The Fibonacci numbers start with Fo = 0 and F; = 1. Then every new F is the sum of the two previous F’s: The sequence 0,1, 1,2,3,5,8,13,... comes from = Fy42 = Fruit + Fx. These numbers turn up in a fantastic variety of applications. Plants and trees grow in a spiral pattern, and a pear tree has 8 growths for every 3 turns. For a willow those numbers can be 13 and 5. The champion is a sunflower of Daniel O’Connell, which had 233 seeds in 144 loops. Those are the Fibonacci numbers Fj3 and Fj2. Our problem is more basic. Problem: Find the Fibonacci number F99 The slow way is to apply the rule Fy42 = Fu4j + Fe one step at a time. By adding Fe = 8 to F7 = 13 we reach Fg = 21. Eventually we come to Figg. Linear algebra gives a better way. The key is to begin with a matrix equation uz4; = Aux. That is a one-step rule for vectors, while Fibonacci gave a two-step rule for scalars. We match them by putting Fibonacci numbers into the vectors: F; Let uz, = eal The mig — & + fF 9 7 Frat = Fri | “Becomes Uki1 = é Aluz. 4%(5) a(t} [i [i EB) 6 Every step multiplies by A = ( al After 100 steps we reach u199 = Al uo: The Fibonacci numbers come from the powers of A—and we can compute A! 
without multiplying 100 matrices. This problem is just right for eigenvalues. Subtract 2 from the diagonal of A: AxAL= yes eh leadsto det(A —AJ) =4° -A-1. The eigenvalues solve the equation A* — formula (—b + Vb? — 4ac) /2a: A — 1 = 0. They come from the quadratic _ 145 hs yee? 618. These eigenvalues A; and Az lead to eigenvectors x; and x2. This completes step 1: 1— ee 1 m 0 | eee ah we pra lea P71 ael[=|=[o] mee = =[7]. l—dpo 1 esta % |AD 6.2 Diagonalizing a Matrix 263 Step 2 finds the combination of those eigenvectors that gives ug = (1, 0): es l dM 2 cell asi) 90) i) o See ees MS The final step multiplies uo by A! to find uj99. The eigenvectors stay separate! They are multiplied by (A)? and (2)!: “100 = (A) 1x1 — (Az)!x2 MAKE (7) We want Fio9 = second component of uwjo9. The second components of x; and x2 are 1. Substitute the numbers A; and Az into equation (7), to find A; —A2 = V5 and F\00: \ Fw = 14 J5\10 Je( 5 ) 7p _ 5 100 -( 5 ) ~ 3.54-10°°. (8) Is this a whole number? Yes. The fractions and square roots must disappear, because Fibonacci’s rule Fy42 = Fey, + Fe stays with integers. The second term in (8) is less than 5, so it must move the first term to the nearest whole number: kth Fibonacci number = _ nearest integer to lets (4°) J5 2 The ratio of Fg to Fs is 8/5 = 1.6. The ratio Fj9;/Fi09 must (1 + V5) /2. The Greeks called this number the “golden mean.” \ : (9) be very close to For some reason a rectangle with sides 1.618 and 1 looks especially graceful. Matrix Powers A‘ Fibonacci’s example is a typical difference equation uz; = Aux. Each step multiplies by A. The solution is uy = A*ug. We want to make clear how diagonalizing the matrix gives a quick way to compute AS The eigenvector matrix S produces A = SAS ~! This is a factorization of the matrix, like A = LU or A = QR. The new factorization is perfectly suited to computing powers, because every time S-! multiplies S we get I: A? Ae 2S vais Ay)Weta IPS ore ASS Ss A Ss The eigenvector matrix for AK is still S, and the eigenvalue matrix is Ak. We knew that. The eigenvectors don’t change, and the eigenvalues are taken to the kth power. When A is diagonalized, Akuy is easy to compute: 264 Chapter 6 Eigenvalues and Eigenvectors 1. Find the eigenvalues of A and n independent eigenvectors. 2. Write uo as a combination c)x; +---+CnX, 3. Multiply each eigenvector x; by (A;)*. Then of the eigenvectors. up = Akug = c1(A1)*x1 +--+ + n(n)" Xn. (10) In matrix language A* is (SAS~!)* which is S$ times A* times S~!. guage, the eigenvectors in S$ lead to the c’s: In vector lan- C| UN = CX, s+ + CnXm =| X1 -+* This says that es Xn = Sc. uo Cn The coefficients c},..., Cn in Step 2 are c = S~'uo. Then Step 3 multiplies by ee The combination uz = >> cj (A;)*x; is all in the matrices § and A and § 7: (a1) Boe = SA*S “uo =SA*e=| xj ~~: _ C] 4 eee (A) Oy Cn This result is exactly uz, = c1(A1)* xy fore ep 6 0G, ee It solves uzs1 = Aug. Example 3 Compute A* when S and A and S~! contain whole numbers: It Aa; il | has A; =1 and i “=|, ho and: 1 2S.) A is triangular, with 1 and 2 on the diagonal. A* is also triangular, with 1 and 2* on the diagonal. Those numbers stay separate in A‘. They are combined in A‘: k k 1 With k =1 we get A. With k =0 we get I. With k = —1 we get A~!. Note The zeroth power of every nonsingular matrix is A? = 7. The product SA°S7! becomes SIJS~! which is J. Every A to the zeroth power is 1. But the rule breaks down when A = 0. Then 0° is not determined. We don’t know A®° when A is singular. 
6.2 Diagonalizing a Matrix 265 Nondiagonalizable Matrices (Optional) Suppose i is an eigenvalue of A. We discover that fact in two ways: 1. Eigenvectors (geometric) 2. Eigenvalues (algebraic) There are nonzero solutions to Ax = Ax. The determinant of A — AJ is zero. The number A may be a simple eigenvalue or a multiple eigenvalue, and we want to know its multiplicity. Most eigenvalues have multiplicity M = | (simple eigenvalues). Then there is a single line of eigenvectors, and det(A — AJ) does not have a double factor. For exceptional matrices, an eigenvalue can be repeated. Then there are two different ways to count its multiplicity: 1. pa (Geometric Multiplicity =GM) Count the independent eigenvectors for 4. This is the dimension of the nullspace of A — AJ. (Algebraic Multiplicity = AM) Count the repetitions of A among the eigenvalues. Look at the n roots of det(A — AJ) = 0. The following matrix A is the standard example of trouble. Its eigenvalue A = 0 is repeated. It is a double eigenvalue (AM = 2) with only one eigenvector (GM = 1). The geometric multiplicity can be below the algebraic multiplicity—it is never larger: nell 0 4 0 has det(A A — 21) =| Q- 1 2 Ee =X There “should” be two eigenvectors, because the equation A* = 0 has a double root. The double factor A? makes AM = 2. But there is only one eigenvector. x4 == 1,70). This shortage of eigenvectors when GM < AM means that A is not diagonalizable. The vector called “repeats” in the Teaching Code eigval gives the algebraic mul- tiplicity AM for each eigenvalue. When repeats = [1 1... 1] we know that the n eigenvalues are all different. A is certainly diagonalizable in that case. The sum of all components in “repeats” is always n, because the nth degree equation det(A —A/) = 0 always has n roots (counting repetitions). The diagonal matrix D in the Teaching Code eigvec gives the geometric multiplicity GM for each eigenvalue. This counts the independent eigenvectors. The total number of independent eigenvectors might be less than n. The n by n matrix A is diagonalizable if and only if this total number is n. We have to emphasize: There is nothing special about A = 0. It makes for easy computations, but these three matrices also have the same shortage of eigenvectors. Their repeated eigenvalue is A = 5: oma A=\> | and eI 6 —-l ] and ey ee Fe a Those all have det(A — A/) = (A — 5)?. The algebraic multiplicity is AM = 2. But A — 5/ has rank r = |. The geometric multiplicity is GM = 1. There is only one eigenvector, and these matrices are not diagonalizable. 266 = Chapter 6 Eigenvalues and Eigenvectors =» 1. REVIEW OF THE KEY IDEAS =& If A has n independent eigenvectors (they go into the columns of S), then S TAS is diagonal: S-1AS=A and A=SAS™!,. 2. The powers of A are AX = SA*¥S—!, 3. The eigenvalues of A* are (A1)*; 02° AGEs 4. The solution to uz4; = Au, starting from uo is up = Akug = uy = c1(A1)*x) +.++4 eax, 5. provided SA‘ S~'uo: wp = cx, + +++ + CnXn. A is diagonalizable if every eigenvalue has enough eigenvectors (GM=AM). Problem Set 6.2 Questions 1-8 are about the eigenvalue and eigenvector matrices. 1 Factor these two matrices into oe | SAS7!: and W)C jrandA = 2 A=SAS~ 3 If A has A; = 2 with eigenvector x; = SAS~! 4 thaw’=C A= 2 A 6 7, ie Al (i) nC: ee and Az = 5 with x2 = Eat use to find A. No other matrix has the same A’s and x’s. Suppose A = SAS~!. eigenvector matrix? 5 a True or false: then What is the eigenvalue matrix for A + 2/? 
What is the Check that If the columns A+27=( )(_ )( Vou of S (eigenvectors of A) are linearly independent, (a) A is invertible (b) A is diagonalizable (c) S is invertible (d) S is diagonalizable. If the eigenvectors of A are the columns of /, then A is a matrix. eigenvector matrix S$ is triangular, then S~! is triangular. triangular. Prove that A is also Describe all matrices § that diagonalize this matrix A: ie F4 0>|| Then describe all matrices that diagonalize Aq. 8 Write down the most general matrix that has eigenvectors ea and lege If the 6.2. Diagonalizing a Matrix 267 Questions 9-14 are about Fibonacci and Gibonacci numbers. 9 For the Fibonacci matrix A = li ale compute A? and A> and A*. Then use the text and a calculator to find Fy. 10 Suppose each number G;+2 is the average of the two previous numbers and G,. Then Gx42 = 5(Gest sr Ge Gus2 = Gus + 3Gx ie ad = | hi [el Gry Gx Gea = Ges 11 (a) Find the eigenvalues and eigenvectors of A. (b) Find the limit as n > oo of the matrices A” = SA”"S7!. (c) If Go =O and G; = | show that the Gibonacci numbers approach Gx+1 WIND Diagonalize the Fibonacci matrix by completing S~!: Dh ge WRAL Ao Oe leeieee At O 0 A2 : Do the multiplication SA‘ S aleS — to find its second component. This is the kth Fibonacci number Fy = (Af — 45) /(A1 — 22). 12 The numbers hi and Ae satisfy the Fibonacci rule Fyi2 = Fey, + Fr: ARP Se Oe Pap hee ead oA hee Prove this by using the original equation for the A’s. Then any combination of ne and aS satisfies the rule. The combination F, = (at = EW ACS — 2) gives the right start Fo = 0 and F; = 1. 13 Suppose Fibonacci had started with Fo = 2 and Fj = 1. The rule Fyy2 Fy11 + Fx is the same so the matrix A is the same. Its eigenvectors add to = mtx =feel eelee) Se ete After 20 steps the second component of Ae that number F 0. 14 Ea ise) Prove that every third Fibonacci number in 0, 1, 1, 2,3,... ee is even. Compute 268 Chapter 6 Eigenvalues and Eigenvectors Questions 15-18 are about diagonalizability. 15 True or false: If the eigenvalues of A are 2,2,5 then the matrix is certainly (a) 16 (b) diagonalizable (c) not diagonalizable. True or false: If the only eigenvectors of A are multiples of (1,4) then A has (a) Ww invertible noinverse (b) arepeatedeigenvalue (c) nodiagonalization SASa. Complete these matrices so that det A = 25. Then check that A = 5 is repeated— the determinant of A—A/ is (A—5)?. Find an eigenvector with Ax = 5x. These matrices will not be diagonalizable because there is no second line of eigenvectors. 8 9 4 ONS a=| >| and a=| i and seis 18 The matrix A = iF A) is not diagonalizable because the rank of A—3/ is Change one entry by .01 to make A diagonalizable. Which entries could you change? Questions 19-23 are about powers of matrices. 19 Ak = SA*S~—! approaches the zero matrix as k > oo if and only if every A has absolute value less than . Which of these matrices has A‘ > 0? 6 A sve | E 20 gue (Recommended) Find A and S to diagonalize A in Problem 19. What is the limit of A‘ as k — 00? What is the limit of SA‘S~!? matrix you see the 21 Find A and S to diagonalize B in Problem wo=(5| 22 and w= In the columns of this limiting 19. What is B!°u9 for these uo? (3 and wo=(5]. Diagonalize A and compute SA*‘S~! to prove this formula for A*: at ie a=| 23 6 9 S Al ns has Diagonalize B and compute SA*S~! £1801 P=, | has A lel =a es eee ; to prove this formula for B*: yenloy Ba, po" ak : 6.2. Questions 24-29 are new applications of Diagonalizing a Matrix 269 A = SAS™!. 24 Suppose that A = SAS~!. 
Take determinants to prove that det A = A}A2---Ay, = product of A’s. This quick proof only works when A is 25 Show that trace AB = trace BA, by adding the diagonal entries of AB and BA: a=|? A ed Choose A as S and B as AS~!. and Then a=|4 al sree SAS~! has the same trace as AS~!S. The trace of A equals the trace of A which is 26 AB — BA =I is impossible since the left side has trace = elimination matrix so that AB Ia) If A= SAS—!, BA . But find an A= E and B = E! give —-1 0 0 1 : which has trace zero. diagonalize the block matrix B = ies al Find its eigenvalue and eigenvector matrices. 28 Consider all 4 by 4 matrices A that are diagonalized by the same fixed eigenvector matrix S. Show that the A’s form a subspace (cA and A; + A2 have this same S$). What is this subspace when S = /? What is its dimension? 29 Suppose A* = A. On the left side A multiplies each column of A. Which of our four subspaces contains eigenvectors with A = 1? Which subspace contains eigenvectors with A = 0? From the dimensions of those subspaces, A has a full set of independent eigenvectors and can be diagonalized. 30 (Recommended) Suppose Ax = Ax. If A = 0 then x is in the nullspace. If A 4 0 then x is in the column space. Those spaces have dimensions (n —r) +r =n. So why doesn’t every square matrix have n linearly independent eigenvectors? 31 The eigenvalues of A are | and 9, the eigenvalues of B are —1 and 9: 5 a=|j a! | and ae ah alk Find a matrix square root of A from R = SA S~!. Why is there no real matrix square root of B? 32 (Heisenberg’s Uncertainty Principle) AB — BA = 7 can happen for infinite matrices with A = A? and B = —B!. Then x'x =x! ABx —x'BAx < 2||Ax|| ||Bx||. Explain that last step by using the Schwarz inequality. Then the inequality says that ||Ax||/||x|] times ||Bx||/||x|| 1s at least 7 It is impossible to get the position error and momentum error both very small. 270 33 Chapter 6 Eigenvalues and Eigenvectors If A and B have the same A’s with the same independent eigenvectors, their factorizations into 34 are the same. So A = B. Suppose the same S diagonalizes both A and B, so that A = SA,S~! and B = SA2S~'. Prove that AB = BA. 35 If A = SAS~! show why the product (A—A,/)(A—Az2/)---(A—A,J) is the zero matrix. The Cayley-Hamilton Theorem says that this product is always zero. We are substituting A for the number A in the polynomial det(A — AJ). 36 The matrix A = [734] has det(A — A/) = A? — 1. Show from Problem 35 that A? — I =0. 37 Deduce that A~! = A and check that this is correct. (a) When do the eigenvectors for A = 0 span the nullspace N(A)? (b) When do all the eigenvectors for 4 #0 span the column space C(A)? 6.3 Applications to Differential Equations APPLICATIONS TO DIFFERENTIAL 271 EQUATIONS w» 6.3 Eigenvalues and eigenvectors and A = SAS™! are perfect for matrix powers A‘. They are also perfect for differential equations. This section is mostly linear algebra, but to read it you need one fact from calculus: The derivative of e™ is Xe. It helps to know what e is, but I am not even sure that is essential. The whole point of the section is this: Zo convert differential equations into linear algebra. The equation du/dt = u is solved by u = e’. The equation du/dt = 4u is solved by u =e*. The most basic equations have exponential solutions: du Gee hu , has the solutions . u(t) = Ce’. (1) The number C turns up on both sides of du/dt = du and cancels out. At t = 0 the solution reduces to C (because e® = 1). 
By choosing C = u(0), we match the “initial condition.” The solution that starts from u(0) at t = 0 is u(O)e*". We just solved a 1 by 1 problem. Linear algebra moves to n by n. The unknown is a vector u (now boldface). It starts from the initial vector u(0), which is given. The n equations contain a square matrix A: Problem Solve d = = Au _ starting from the vector u(0) at t=O. (2) This system of differential equations is /inear. If u(t) and v(t) are solutions, so is Cu(t) + Dv(t). We will need n freely available constants to match the n components of u(0). Our first job is to find n “pure exponential solutions” to the equation du/dt = Au. Notice that A is a constant matrix. In other linear equations, A changes as f changes. In nonlinear equations, A changes as u changes. We don’t have either of these difficulties. Equation (2) is “linear with constant coefficients.’ Those and only those are the differential equations that we will convert directly to linear algebra. The main point will be: Solve linear constant coefficient equations by exponentials e™ x. Solution of du/dt = Au Our pure exponential solution will be e* times a fixed vector x. You may guess that i is an eigenvalue of A, and x is the eigenvector. Substitute u(t) = e*'x into the equation du/dt = Au to prove you are right (the factor e*™ will cancel): Au = Ae“ x agrees with d = =he™x provided Ax =Ax. (3) All components of this special solution u = ex share the same e*!. The solution grows when 24 > 0. It decays when A < 0. In general A can be a complex number. Then the real part of A decides growth or decay, while the imaginary part gives oscillation like a sine wave. 272 Chapter 6 Eigenvalues and Eigenvectors Example 1 Solve ie = Ale if i|u starting from u(0) = [3]. This is a vector equation for uw. It contains two scalar equations for the components y and z. They are “coupled together” because the matrix is not diagonal: d y aKa Oa y alle) means that dy _ prea: and Gin wpa oe The idea of eigenvectors is to combine those equations in a way that gets back to 1 by 1 problems. The combinations y + z and y — z will do it: d i; (y+z=zt+y The combination y + z grows d an rr (y — z) like e’, because it has (y — Zz) 4 = 1. The combination y — z decays like e~', because it has 4 = —1. Here is the point: We don’t have to juggle the original equations du/dt = Au, looking for these special combinations. The eigenvectors and eigenvalues do it for us. This matrix A has eigenvalues | and —1. Here are two eigenvectors: Pollil=L] s(t ofLal-[7} The pure exponential solutions u; and w2 take the form e”’x: wie Notice: hitruw =e r| 11 These u’s are eigenvectors. and u(t) hat Xy = € Sahee ee =e" (4) They satisfy Au; = u; and Auz = —uz2, just like x, and x2. The factors e’ and e~' change with time. Those factors give du,/dt = uy; and du2/dt = —uz. We have two solutions to du/dt = Au. To find all other solutions, multiply those special solutions by any C and D and add them: f General solution u(t) =epe Ce 1 A + De zAlieeel t Ce’ + De ea == la " ne) : (5) With these constants C and D, we can match any starting vector u(0). Set t = 0 and e° = 1. The problem asked for u(0) = (4, 2): cf3]+2(_i]=[3| yields C=3 and D=1. With C =3 and D = 1 in the solution (5), the initial value problem is solved. We summarize the steps. The same three steps that solved uz; = Au, in the last section now solve du/dt = Au. The matrix powers A‘ led to AX. The differential equation leads to e*': 6.3 Applications to Differential Equations 1. 
Find the eigenvalues and n independent eigenvectors of A. 2. Write u(0) as a combination cx; +---+cCyXn 3. Multiply each eigenvector x; by e*‘". Then u(t) is the combination u(t) = cye*'xy tes Example 2 Solve du/dt = Au » i Fe =|0 a ok 2 TG ae 273 of the eigenvectors. + ene xy. (6) knowing the eigenvalues A = 1, 2,3 of A: od 1ja 6 starting from “(0)=)'5 4 Step 1 The eigenvectors are x; = (1,0, 0) and x2 = (1, 1,0) and x3 = (1, 1, 1). Step 2 The vector u(0) = (6,5, 4) is xj + x2 + 4x3. Thus (c), c2, c3) = (1, 1, 4). Step 3 The pure exponential solutions are ex, and e*’x2 and e*x3. The combination that starts from u(0) is u(t) = e'x; + e7!x7 + 4e%'x3. The coefficients 1, 1,4 came from solving the linear equation c)x,-+c2x2+c3x3 Gy se, Be) ave = u(0): 4 6x || = | S C3 6 which is Se = u(0). (7) You now have the basic idea—how to solve du/dt = Au. The rest of this section goes further. We solve equations that contain second derivatives, because they arise so often in applications. We also decide whether u(t) approaches zero or blows up or just oscillates. At the end comes the matrix exponential e“'. Then e“‘u(0) solves the equation du/dt = Au in the same way that A*ug solves the equation uz.; = Auk. All these further steps use the A’s and x’s. With extra time, this section makes a strong connection to the whole topic of differential equations. It solves the constant coefficient problems that turn into linear algebra. Use this section to clarify these simplest but most important differential equations—whose solution is completely based on e”’. Second Order Equations Start with the equation y” + by’ + ky = 0. The unknown y(t) is just a scalar, not a vector. The notations y” and y’ mean d*y/dt* and dy/dt. This is a second-order equation because it contains the second derivative y”. It is still linear with constant coefficients b and k. 274 Chapter 6 Eigenvalues and Eigenvectors In a differential equations course, the method of solution is to substitute y = eM, Each derivative brings down a factor 4. We want y = e* to solve the equation: d? d = 2 pect thy Ge tog, =07 + bh + We ie=0. mare (8) Everything depends on A* + bA +k = 0. This equation for 4 has two roots A; and Az. Then the equation for y has two pure solutions yj = e”!’ and y2 = e*2! Their combinations c,y; + c2y2 give all other solutions. This complete solution has two free constants c; and c2 because the equation contains a second derivative. In a linear algebra course we expect matrices and eigenvalues. Therefore we turn the scalar equation into a vector equation. The unknown vector u has components y and y’. The equation is du/dt = Au: Gy Sas dt dy’ = y dly Orit Talolity. converts rts to to —= >| = , ep A : 9 (9) ve = —ky y —— by’by The first equation dy/dt = y’ is trivial (but true). The second equation connects y” to y’ and y. Together the equations connect wu’ to u. So we can solve that vector equation by eigenvalues: —A i 1 : al has determinant 2 A“ + bA+k =0. The equation for the i’s is the same! It is still A7+bX+k =0. The two roots A; and Az are now eigenvalues of A. The eigenvectors and the complete solution are o=[i] [i] so=oer[t]-or[t] In the first component of u(t), you see y = cje*!! +.c7e*2'_the same solution as before. It can’t be anything else. In the second component of u(t) you see dy/dt. The vector problem is completely consistent with the scalar problem. Note 1 In linear algebra the serious danger is a shortage of eigenvectors. Our eigen- vectors (1,A,) and (1,A2) are the same if Aj = Az. Then we can’t diagonalize A. 
In this case we don't yet have a complete solution to du/dt = Au.

In differential equations the danger is also a repeated λ. The pure solutions y = e^{λ1 t} and y = e^{λ2 t} are the same if λ1 = λ2. Again the problem is not yet solved. A second solution has to be found, and it turns out to be y = t e^{λt}. This "impure" solution (with the extra t) appears also in the vector problem. A diagonal matrix Λ = S^{-1}AS cannot be reached when there is only one eigenvector. Then Λ is replaced by the Jordan matrix J. The last section of this chapter describes J. The exponential of J produces that extra factor t.

Note 2  In engineering and physics, the second derivative enters from Newton's Law F = ma. The acceleration a = y'' is multiplied by the mass m. The true equation is my'' + by' + ky = f. (The force F in Newton's Law has three parts, an external force f and a damping force by' and an internal force ky.) This section divides by m, which brings the left side into the standard form starting with y''.

Note 3  Real engineering and real physics deal with systems (not just a single mass at one point). The unknown y is a vector in the first place. The coefficient m is a mass matrix M, not a number. The coefficient k is a stiffness matrix K. The coefficient of y' is a damping matrix which might be zero. In that case the real differential equation is My'' + Ky = f. To bring back the standard form with y'', multiply through by M^{-1}. Eventually the solution involves the eigenvalues of M^{-1}K, which is a major part of computational mechanics.

Example 3  Solve y'' + 4y' + 3y = 0 by substituting e^{λt} and also by linear algebra.

Solution  Substituting y = e^{λt} yields (λ^2 + 4λ + 3)e^{λt} = 0. The quadratic equation is λ^2 + 4λ + 3 = 0. It factors into (λ + 1)(λ + 3) = 0. Therefore λ1 = -1 and λ2 = -3. The pure solutions are y1 = e^{-t} and y2 = e^{-3t}. The complete solution is c1 y1 + c2 y2 and it approaches zero as t increases.

To use linear algebra we set u = (y, y'). This leads to a vector equation:

    dy/dt = y'
    dy'/dt = -3y - 4y'      converts to      du/dt = [0  1; -3  -4] u.

Find the eigenvalues from the determinant of A - λI:

    det(A - λI) = det [-λ  1; -3  -4-λ] = λ^2 + 4λ + 3 = 0.

The equation is the same and the λ's are the same and the solution is the same. With constant coefficients and pure exponentials, calculus goes back to algebra.

Stability of 2 by 2 Matrices

For the solution of du/dt = Au, there is a fundamental question. Does the solution approach u = 0 as t → ∞? Is the problem stable? Example 3 was certainly stable, because both pure solutions e^{-t} and e^{-3t} approach zero. Stability depends on eigenvalues, and the eigenvalues depend on A.

The complete solution u(t) is built from pure solutions e^{λt}x. If the eigenvalue λ is real, we know exactly when e^{λt} approaches zero: the number λ must be negative. If the eigenvalue is a complex number λ = a + ib, its exponential e^{λt} splits into e^{at} e^{ibt}. The second factor e^{ibt} has absolute value fixed at 1:

    e^{ibt} = cos bt + i sin bt   has   |e^{ibt}|^2 = cos^2 bt + sin^2 bt = 1.

[Figure 6.3 pictures the trace-determinant plane, with trace T = λ1 + λ2 across and determinant D = λ1 λ2 up. Its regions are: both Re λ < 0 (stable), both Re λ > 0 (unstable), λ1 < 0 and λ2 > 0 (unstable); the parabola T^2 = 4D separates real λ's from complex λ's.]

Figure 6.3  A 2 by 2 matrix is stable (u(t) → 0) when T < 0 and D > 0.

The other factor e^{at} controls growth (instability) or decay (stability). For stability, the real part a of every λ must be negative. Then every e^{λt} approaches zero. The question is: Which matrices have negative eigenvalues?
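Before the general answer, here is a small MATLAB check of Example 3. It is an illustrative sketch of ours, not from the text, and the initial values y(0) = 1, y'(0) = 0 are an assumption made only for the check. Both eigenvalues are negative, and the computed solution decays:

    % Example 3 as a first-order system: u = (y, y'), du/dt = A u  (illustrative sketch)
    A = [0 1; -3 -4];
    [S, Lambda] = eig(A);                      % the eigenvalues -1 and -3 are on the diagonal
    u0 = [1; 0];                               % assumed initial values y(0) = 1, y'(0) = 0
    c = S \ u0;                                % step 2: u(0) = c1*x1 + c2*x2
    t = 2;
    u = S * diag(exp(diag(Lambda)*t)) * c;     % step 3: multiply each x_i by e^(lambda_i t)
    y_exact = 1.5*exp(-t) - 0.5*exp(-3*t);     % c1 = 3/2, c2 = -1/2 match these initial values
    disp([u(1)  y_exact])                      % the two values of y(2) agree and are small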
More accurately, when are the real parts of the i’s all negative? 2 by 2 matrices allow a clear answer. 6G Stability The matrix A = [25] is stable and u(t) approaches zero when two conditions hold: The trace T=a+d _ must be negative. The determinant D=ad-—bc_ must be positive. Reason If the A’s are real and negative, their sum is negative. This is the trace T. Their product is positive. This is the determinant D. The argument also goes in the reverse T =, direction. If D = Aj,A2 is positive, then 4; and Az have the same sign. If + Az is negative, that sign must be negative. When the ’s are complex numbers, they must have the form a+ib and a — ib. Otherwise T and D will not be real. The determinant D is automatically positive, since (a +ib)(a —ib) = a* +b’. The trace T isa+ib+a—ib=2a. Soa negative trace means that the real part a is negative and the matrix is stable. Q.E.D. Figure 6.3 shows the parabola T*? = 4D which separates real from complex eigenvalues. Solving A* — TA + D = 0 leads to /T2 — 4D. This is real below the parabola and imaginary above it. The stable region is the upper left quarter of the figure—where T < 0 and D > 0. Example 4 Which of these matrices is stable? Ah sl te me ee Ot Ava 5 = 18 6.3 Applications to Differential Equations 277 The Exponential of a Matrix We return briefly to write the solution u(t) in a new form e4‘u(0). This gives a perfect parallel with A‘uo in the previous section. First we have to say what e4’ means. The matrix e“’ has a matrix in the exponent. To define e4', copy e*. The direct definition of e* is by the infinite series 1 +x + 5x? Ss zx? +--+. When you substitute the matrix Af for x, you get a matrix e4’: Definition of the matrix exponential eM = 1+ Att 4(At) + Z(At +.-. (10) The number that divides (At)” is “n factorial.” This is n! = (1)(2)---(n—1)(n). The factorials after 1,2,6 are 4! = 24 and 5! = 120. They grow quickly. The series always converges and its derivative is always Ae“! (Problem 17). Therefore e4‘u(0) solves the differential equation with one quick formula—even if there is a shortage of eigenvectors. This chapter emphasizes how to get to the answer by diagonalization. Assume A is diagonalizable, and substitute A = SAS! into the series. Whenever SAS~'SAS7! appears, cancel S~!S in the middle: e“ = 1+ SAS™'t + 4(SAS™!t)(SAS7't) +--= SUI + At+5(At)? +---JS7! = (11) Sees. That equation says: e4’ equals Se“'S~!. To compute e“’, compute the A’s as usual. Then A is a diagonal matrix and so is e4’—the numbers e*'' are on its diagonal. Multiply Se'S—!u(0) to recognize the new solution u(t) = solution in terms of eigenvalues and eigenvectors: e4'u(0). ert EO Sen Sar BO) =, es lage, It is the old Cl < al rain 6 3 Ant en Cn S contains the eigenvectors x;. A combination c)x| ++-+-+CnX, of its columns is Sc. This matches the starting value when u(0) = Sc. The column c = S~'u(O) at the end of equation (12) brings back the best form of the solution: cyert ANG) y= x] aN he = A == Ge ‘A; Ue ee So A AS a cnern! This is e“‘u(0). It is the same answer that came from our three steps: i (13) 278 Chapter 6 Eigenvalues and Eigenvectors 1. ‘Find the 2’s and x’s. 2 Write u(O) = cyxy +---+CpXp- 3. Multiply each x; by e*’'. Then the solution is u(t) = cye*"’x, +--+» + ene?" Xp. Example 5 Find e4! for (14) A=[_92]. Notice that A* = /: Ll eta ef] ef A>, A°, A’, A® will repeat these four matrices. The top right corner has 1,0, —1,0 repeating over and over. The infinite series for e4’ contains t/1!, 0, —1>/3!,0. 
In other words t — t?/6 is the start of that corner: Pare Han? + hans += | 1 — 1;2 #1 ane ere ap, os The series on the left is e4’. The top row of the matrix shows the series for cost and sint. We have found e“’ directly: ae Lat a sint cost (15) At t = 0 this gives e? = J. Most important, the derivative of e4 is Ae4’: d Ab tcosttae’ Sings) ei essiniey "Cost d¢i—sint) cost |) i=cose sing)!” er | Opal |= 20:1) cost sint |=sint’ eos nl” A is a skew-symmetric matrix (A! = —A). Its exponential e4’ is an orthogonal matrix. The eigenvalues of A are i and —i. The eigenvalues of e“! are e!’ and e~''. This illustrates two general rules: 1 The eigenvalues of e4' are e™. Z When A is skew-symmetric, Example 6 Solve ue =A e*'At is orthogonal. = [istu starting from u(0) = Ea t=O, Solution The eigenvalues 1 and 2 are on the diagonal of A (since A is triangular). The eigenvectors are (1,0) and (1, 1): fo a}lo)=[o) [fo allil=2Li} 6.3. Applications to Differential Equations Step 2 writes u(0) as a combination x; + x2 of these eigenvectors. 279 Remember: This is Sc = u(O). In this case cj = c7 = 1. Then the solution is the same combination of pure exponential solutions: pete Wee 7 0 dot That is the clearest solution e*!'x;+e*2'x7. ae 2t | |: In matrix form, the eigenvectors go into S: t = ip De u(t) = Se S—!u(0) is f NE elt ‘| uo) =| a [WO That last matrix is e4'. It’s not bad to see what a matrix exponential looks like (this is a particularly nice one). The situation is the same as for Ax = b and inverses. We don’t really need A~! to find x, and we don’t need e“! to solve du/dt = Au. But as quick formulas for the answers, A~'b and e“‘u(0) are unbeatable. =» REVIEW OF THE KEY IDEAS #8 1. The equation u’ = Au is linear with constant coefficients. 2. Its solution is usually a combination of exponentials, involving each A and x: u(t) = cye"A\t x, +--+ cne*"' An xp. 3. The constants c,...,¢, 4. The solution approaches zero (stability) if Real part of A <0 for every 2. 5. The solution is always u(t) = e“'u(0), with the matrix exponential e4’. 6. are determined by w(O) = cjx] +--- + cCpXp. Equations involving y” can be reduced to u’ = Au, by combining y’ and y into u=(y’, y). Problem Set 6.3 1 Find i’s and x’s so that u = e My solves du Ae 3 >isbaad Oana lil aes What combination u = cje*!"x; + c2e*2'x» starts from u(0) = (5, —2)? 280 Chapter 6 Eigenvalues and Eigenvectors 2 Solve Problem | for u = (y, z) by back substitution: dz First solve Fina z starting from z(0) = —2. dy : Then solve tes 4y + 3z starting from y(0) = 5. The solution for y will be a combination of e* and e’. Find A to change the scalar equation y” = 5y’ + 4y into a vector equation for u=(y,y): oe se > What are the eigenvalues of A? y’ i : Find them also by substituting y = e t into y” = 5y' +4y. The rabbit and wolf populations show fast growth of rabbits (from 6r) but loss to wolves (from —2w): dr — dt = 6r —2w and Find the eigenvalues and eigenvectors. dw — dt =2r+w. If r(0) = w(0) = 30 what are the popu| to 2 or lations at time ¢? After a long time, is the ratio of rabbits to wolves its) att toy 1% A door is opened between rooms that hold v(0) = 30 people and w(0) = 10 people. The movement between rooms is proportional to the difference v — w: dv —_ =w-v dt and dw —_ =vU—-w. dt Show that the total v+w is constant (40 people). Find the matrix in du/dt = Au and its eigenvalues and eigenvectors. What are v and w at t = 1? Reverse the diffusion of people in Problem 5 to du/dt = —Au: dv —_ =v-—w dt and The total v + w still remains constant. 
changed to —A? dw —_ =WwW-v. dt How are the A’s changed now that A is But show that v(t) grows to infinity from v(0) = 30. The solution to y” = 0 is a straight line y = C + Dt. Convert to a matrix equation: Cy Ey>| has the solution 2 -P fe ae) = i 5 This matrix A cannot be diagonalized. 5A*1? +--+. Se | = SPO) ae Find A* and compute e4’ = J + At + Multiply your e4” times (y(0), y’(0)) to check the straight line y(t) = yO) + y'(O)r. 6.3. Applications to Differential Equations 281 Substitute y = e*’ into y” = 6y’ — 9y to show that A = 3 is a repeated root. This is trouble; we need a second solution. The matrix equation is alr}=Ls |b" Show that this matrix has only one line of eigenvectors. Trouble here too. Show that the second solution is y = te*’. The matrix in this question is skew-symmetric: oe Wine 0 c —-c 0 b—-—a The derivative of uy + U3 + U3 is 2uju) + 2u2u, + 2u3u3. U',, U5, U4 to get zero. Then ju (t) ||? is constant. (b) What type of matrix always has e4"u(0)|| = ||u(O)||? When (a) Q =e“! isa ___ Write (1,0) as a combination c)x; Substitute A is a skew- matrix. + c2x2 of these two eigenvectors of A: rofls}=E] 11 ui, = cuz — bu3 Uy = a3 = cuy uy = bu, — au2. or (a) symmetric matrix, 10 —b alu 0O Lt olLe=~La) (b) The solution to du/dt = Au starting from (1,0) is cyel’xy + coe Substitute e’’ = cost +isint and e'' =cost —isint to find u(t). "xy. (a) Write down two familiar functions that solve the equation d?y/dt* = —y. Which one starts with y(0) = 1 and y’/(0) = 0? (b) This second-order equation y” = —y produces a vector equation u’ = Au: peel u= ig ge dt = y’ 5 i heal erie 7 ba @a yl y’ Au = Put y(t) from part (a) into u(t) = (y, y’). This solves Problem iz 12 again. If A is invertible, which constant vector solves du/dt = Au — b? This vector is Uparticular, and the solutions to du/dt = AU give Upomogeneous: Find the complete solution Up, + u;, to du 13 . du 2 () If c is not an eigenvalue of A, substitute uw = ev fo) 8 and find v to solve du/dt = Au —e“'b. This u = e“'v is a particular solution. How does it break down when c is an eigenvalue? 282 Chapter 6 Eigenvalues and Eigenvectors 14 Find a matrix A to illustrate each of the unstable regions in Figure 6.3: (a) A, <O and A2>0 (b) A, >O0 and Az >0 (c) Complex A’s with real part a > 0. Questions 17-25 are about the matrix exponential e4’. 15 Write five terms of the infinite series for e4’. Take the ¢ derivative of each term. Show that you have four terms of Ae“’. Conclusion: The matrix e“’ solves the differential equation. 16 The matrix B = (i apt has B? = 0. Find e? from a (short) infinite series. Check that the derivative of e?’ is Be?'. 17 = Starting from u(0) the solution at time T is e4Ty(0). Go an additional time f to reach e“'(e47u(0)). This solution at time t + T can also be written as Conclusion: e“ times e47 equals 18 Write 19 — 1)A. For If A? = A show that the infinite series produces e4’ = J + (e' 20 A=[2)¢] in the form SAS™!. Find e4! from Se*'S7!. A =[11] in Problem 20 this gives e4! = Generally e4e? is different from e?e4. They are both different from e4+?. Check this using Problems 20-21 and 18: A=|9et 4 21. Write A = [}4] as SAS~'. B=|)0 -1i a+B= iP)ae Multiply Se“’S~! to find the matrix exponential e“' when t = 0. eld bel beef 22 Put 23 Give two reasons why the matrix e“’ At is never singular: 24 A=[/3] into the infinite series to find e“’. First compute A”! (a) Write down its inverse. (b) Write down its eigenvalues. 
If Ax = Ax then e4’x = Find a solution x(t), y(t) that gets large as t — scientist exchanged the two equations: dx/dt= Ox —4y dy /dt = —2x + 2y pee omnes oo. x To avoid this instability a dy/dt = —2x + 2y dx/dt = Ox —4y. Now the matrix [~? _?] is stable. It has negative eigenvalues. Comment on this. 6.4 Symmetric Matrices 283 SYMMETRIC MATRICES @ 6.4 The eigenvalues of projection matrices are | and 0. The eigenvalues of reflection matrices are 1 and —1. Now we open up to all other symmetric matrices. It is no exaggeration to say that these are the most important matrices the world will ever see—in the theory of linear algebra and also in the applications. We will come immediately to the two basic questions in this subject. Not only the questions, but also the answers. You can guess the questions. The first is about the eigenvalues. The second is about the eigenvectors. What is special about Ax = Xx when A is symmetric? In matrix language, we are looking for special properties of A and S when A = A!. The diagonalization A = SAS~! should reflect the fact that A is symmetric. We get some hint by transposing (S~')'AS'. Since A = A! those are the same. Possibly S~! in the first form equals S$? in the second form. Then S'S = /. We are near the answers and here they are: 1. A symmetric matrix has only real eigenvalues. 2. The eigenvectors can be chosen orthonormal. Those orthonormal eigenvectors go into the columns of S. There are n of them (independent because they are orthonormal). Every symmetric matrix can be diagonalized. Its eigenvector matrix S becomes an orthogonal matrix Q. Orthogonal matrices have Q~! = Q'—what we suspected about S$ is true. To remember it we write Q in place of S, when we choose orthonormal eigenvectors. Why do we use the word “choose”? Because the eigenvectors do not have to be unit vectors. Their lengths are at our disposal. We will choose unit vectors— eigenvectors of length one, which are orthonormal and not just orthogonal. Then A = SAS! is in its special and particular form for symmetric matrices: 6H (Spectral Theorem) Every symmetric matrix A = QAQ! with real diagonal A and orthogonal matrix Q: A=QAQ"'=QaA0' with A! has the factorization Q'!'=aQ!. It is easy to see that OA Q" is symmetric. Take its transpose. You get (Q')TATQT, which is QAQ! again. So every matrix of this form is symmetric. The harder part is to prove that every symmetric matrix has real 4’s and orthonormal x’s. This is the “spectral theorem” in mathematics and the “principal axis theorem” in geometry and physics. We approach it in three steps: 284 Chapter 6 Eigenvalues and Eigenvectors 1. By an example (which proves nothing, except that the spectral theorem might be true) 2. By calculating the 2 by 2 case (which convinces most fair-minded people) 3. By a proof when no eigenvalues are repeated (leaving only real diehards). The diehards are worried about repeated eigenvalues. eigenvectors? Are there still n orthonormal Yes, there are. They go into the columns of S (which becomes last page before the problems outlines this fourth and final step. We now take steps | and 2. In a sense they are optional. mostly for fun, since it is included in the final n by n case. Q). The The 2 by 2 case is Example 1 Find the A’s and x’s when A = E ‘|and A —AI = teny nad Solution The equation det(A — A) = 0 is A* — 5A = 0. The eigenvalues are 0 and 5 (both real). We can see them directly: 4 = 0 is an eigenvalue because A is singular, and 2 = 5 is the other eigenvalue so that 0+ 5 agrees with 1+ 4. 
This is the trace down the diagonal of A. The eigenvectors are (2, —1) and (1, 2)—orthogonal but not yet orthonormal. The eigenvector for 4 = 0 is in the nullspace of A. The eigenvector for A = 5 is in the column space. We ask ourselves, why are the nullspace and column space perpendicular? The Fundamental Theorem says that the nullspace is perpendicular to the row space— not the column space. But our matrix is symmetric! Its row and column spaces are the same. So the eigenvectors (2, —1) and (1, 2) are perpendicular—which their dot product tells us anyway. These eigenvectors have length /5. Divide them by 5 to get unit vectors. Put the unit vectors into the columns of S (which is Q). Then A is diagonalized: oOA0= Ka TES gs Ve al aS 0 5 Now comes the calculation for any 2 by 2 symmetric matrix. values. Second, perpendicular eigenvectors. The A’s come from det | “7” wo |=? eto ac H) =0. First, real eigen- (1) This factors into (A —A,)(A — Az). The product 1A2 is the determinant D = ac — b?. The sum A; + Az is the trace T=a-+c. The test for real roots of Ax? + Bx +C =0 is based on B2 — 4AC. This must not be negative, or its square root in the quadratic formula would be imaginary. Our equation has different letters, A - TA + D = 0, so the test is based on T? — 4D: 6.4 Real eigenvalues: T? Symmetric Matrices 285 —4D = (a+ c)? — 4(ac — b?) must not be negative. Rewrite that as a* + 2ac + c* — 4ac + 4b. Rewrite again as (a — c)* + 4b’. This is not negative! So the roots A; and Az (the eigenvalues) are certainly real. Perpendicular eigenvectors: Ne A—d1)x1 = a= Compute x; and x2 and their dot product: lain” = (A —AoI)x2 = [ine When b is zero, «OP e ¢ eam eee = b = |x.|(SOME A has perpendicular eigenvectors from x from ke i second row (1,0) and (0,1). Otherwise, take the dot product of x; and x2 to prove they are perpendicular: Xi *X2 = D(A2 —c)+ 1 —a)b = dA +42 —a—c) =0. (2) This is zero because A; + Az equals the trace a+c. Thus x; -x2 = 0. Now comes the general n by n case, with real A’s and perpendicular eigenvectors. 61 Real Eigenvalues The eigenvalues of a real symmetric matrix are real. Proof Suppose that Ax = Ax. Until we know otherwise, A might be a complex number. Then A has the form a +ib (a and b real). Its complex conjugate is } = a — ib. Similarly the components of x may be complex numbers, and switching the signs of their imaginary parts gives x. The good thing is that A times ¥ is the conjugate of 4 times x. So take conjugates of Ax = Ax, remembering that A =! is real: Ax =Ax leads to AX =Ax. Transpose to x’A=x!h. (3) Now take the dot product of the first equation with x and the last equation with x: x'Ax =x! ax and also xl Ax =x! ax. (4) The left sides are the same so the right sides are equal. One equation has i, the other has A. They multiply x'x which is not zero—it is the squared length of the eigenvector. Therefore 2 must equal 2, and a +ib equals a — ib. The imaginary part is b = 0 and the number A =a is real. Q.E.D. The eigenvectors come from solving the real equation (A —A/)x = 0. So the x’s are also real. The important fact is that they are perpendicular. 6) Orthogonal Eigenvectors Eigenvectors of a real symmetric matrix (when they correspond to different ’s) are always perpendicular. 286 Chapter 6 Eigenvalues and Eigenvectors A has real eigenvalues and real orthogonal eigenvectors if and only if A = A Proof Suppose Ax = A,x and Ay = Azy and equation with y and the second with x: A= A!. Take dot products of the first (ix)y = (Ax)Ty =x" Aly =x" Ay =x" day. 
(5) The left side is x A, y, the right side is x Ary. Since Ay # Ao, this proves that xly =0. Eigenvectors are perpendicular. Example 2 Find the A’s and x’s for this symmetric matrix with trace zero: ele a=| has =3 det(A = 21) =| 4 4 Fle [=9? -25, The roots of A? — 25 = 0 are Ay = 5 and Ay = —5 (both real). The eigenvectors x; = (1,2) and x2 = (—2, 1) are perpendicular. To make them into unit vectors, divide by their lengths 5. The new x; and x2 are the columns of Q, and Q7! equals Q': peer cat ae Reelem This example shows the main goal of this section— to diagonalize symmetric matrices A by orthogonal eigenvector matrices S = Q: 6H (repeated) Every symmetric matrix A has a complete set of orthogonal eigenvectors: A=SAS~! becomes A= ONO” If A = A! has a double eigenvalue 4, there are two independent eigenvectors. We use Gram-Schmidt to make them orthogonal. The Teaching Code does this for each eigenspace of A, whatever its dimension. The eigenvectors go into the columns of Q. One more step. Every 2 by 2 symmetric matrix looks like a=on0"=[x T » ||" ll ae (6) The columns x; and x2 times the rows 2 ie, and Aoxy produce A: A= Ayx1x} + Agxoxt. (7) This is the great factorization QAQ", written in terms of A’s and x’s. When the symmetric matrix is n by n, there are n columns in Q multiplying n rows in Q'. The n 6.4 pieces are Apex e. Those are matrices! Symmetric Matrices 287 Equation (7) for our example is _[-3 4]_ [1/5 2/5 4/5 —2/5 a=| 4 “|= ee ieee ae valk My On the right, each cea! is a projection matrix. It is like uu’ in Chapter 4. The spectral theorem for symmetric matrices says that A is a combination of projection matrices: A=AyPi +---+AnPr A; = eigenvalue, P; = projection onto eigenspace. Complex Eigenvalues of Real Matrices Equation (3) went from Ax = Ax to Ax = Ax. In the end, A and x were real. Those two equations were the same. But a nonsymmetric matrix can easily produce A and x that are complex. In this case, Ax = AX is different from Ax = Ax. It gives us a new eigenvalue (which is A) and a new eigenvector (which is x): For real matrices, complex d’s and x’s come in “conjugate pairs.” If Ax =)x Example 3 A = then Bsr —sin’| has Ay = Ax =)x. cos@ +isin@ and Az = cosé — isin#. Those eigenvalues are conjugate to each other. They are 4 and i, because the imaginary part sin@ switches sign. The eigenvectors must be x and x (all this is true because A is real): cos@ Bae —sin@g | 1 EB = cos@ Be —sin§ 1 call |]= ve. 1 (cos @ +i sin@) Bi (9) ose me = ssi) 1 |le One is Ax = Ax, the other is AX = Ax. The eigenvectors are x = (1, —i) and ¥ = (1,i). For this real matrix the eigenvalues and eigenvectors are complex conjugates. By Euler’s formula, cos 6+ sin@ is the same as e! Its absolute value is |A| = 1, because cos?@ + sin?@ = 1. This fact |A| = | holds for the eigenvalues of every orthogonal matrix—including this rotation. We apologize that a touch of complex numbers slipped in. They are unavoidable even when the matrix is real. Chapter 10 goes beyond complex numbers A and complex vectors x to complex matrices A. Then you have the whole picture. 288 Chapter 6 Eigenvalues and Eigenvectors We end with two optional discussions. Eigenvalues versus Pivots The eigenvalues of A are very different from the pivots. det(A — AI) = 0. For pivots, we use elimination. For eigenvalues, we solve The only connection so far is this: product of pivots = determinant = product of eigenvalues. We are assuming that A = A! = LU. There is a full set of pivots dj, ...,d,. There are n real eigenvalues 4;,..., An. 
The d’s and i’s are not the same, but they come from the same matrix. This paragraph is about a hidden relation for symmetric matrices: The pivots and the eigenvalues have the same signs. 6K If A is symmetric the number of positive number of positive (negative) pivots. (negative) eigenvalues equals the Example 4 This symmetric matrix A has one positive eigenvalue and one positive pivot: ae 1 TS 3 Hl has pivots 1 and —10 eigenvalues /10 and —/10. The signs of the pivots match the signs of the eigenvalues, one plus and one minus. This could be false when the matrix is not symmetric: Bs ~ eo |-1 —4 has pivots | and 2 eigenvalues —1 and —2. The pivots are positive and the eigenvalues are negative. The diagonal of B has both signs! The diagonal entries are a third set of numbers and we are saying nothing about them. Here is a proof that the pivots and eigenvalues have matching signs when A = A! = LU. You see it best when the pivots are divided out of the rows of U. The pivot matrix D is diagonal. It goes between two triangular matrices L and L', whose diagonal entries are all 1|’s: 1 Fle 2) Oa 3 z alike ma |ea 8 = E A a Gait This is A = LU. ia: ae E ‘| This is mn ! Now divide out —10. A = LDL’. . : It is symmetric. The special event is the appearance of L!. This only happens for symmetric matrices, because LDL" is always symmetric. (Take its transpose to get LDL" again.) For symmetric matrices, A = LU converts to A = LDL" when the pivots are divided out. 6.4 Watch the eigenvalues when the start, the eigenvalues of LDL! Symmetric Matrices 289 L and L" move toward the identity matrix. are 10 and —,/10. At At the end, the eigenvalues of IDI" are 1 and —10 (the pivots!). The eigenvalues are changing, as the “3” in L moves to zero. But to change sign, an eigenvalue would have to cross zero. The matrix would at that moment be singular. Our changing matrix always has pivots 1 and —10, so it is never singular. The signs cannot change, as the 4’s move to the d’s. We repeat the proof for any A = LDL". Move L toward J, by moving the offdiagonal entries to zero. The pivots are not changing and not zero. The eigenvalues i of LDL" change to the eigenvalues d of 1DJ™. Since these eigenvalues cannot cross zero as they move, their signs cannot change. Q.E.D. This connects the two halves of applied linear algebra—pivots and eigenvalues. All Symmetric Matrices are Diagonalizable When no eigenvalues of A are repeated, the eigenvectors are sure to be independent. Then A can be diagonalized. But a repeated eigenvalue can produce a shortage of eigenvectors. This sometimes happens for nonsymmetric matrices. It never happens for symmetric matrices. There are always enough eigenvectors to diagonalize A = A!. Here are three matrices, all with A = —1 and | and | (a repeated eigenvalue): OF Ache Oe 40 Oko Osinepel: Be) = ayOgee OL eee Os {Oat C=) =i] "0° 0 EeOraiee | OO A is symmetric. We guarantee that it can be diagonalized. The nonsymmetric B can also be diagonalized. The nonsymmetric C has only two eigenvectors. It cannot be diagonalized. One way to deal with repeated eigenvalues is to separate them a little. Change the lower right corner of A, B, C from | to d. Then the eigenvalues are —1 and | and d. The three eigenvectors are independent. But when d reaches 1, two eigenvectors of C collapse into one. Its eigenvector matrix S loses invertibility: Sth an) eOee wl OF 10 1 ss) approaches 100 | 0 1 1 | =no On On 2.0 good. This cannot happen when A = A!. Reason: The eigenvectors stay perpendicular. 
They cannot collapse as d — 1. In this example the eigenvectors don’t even change: O72 0 eG OG 0 0rd. ion has orthogonal eigenvectors = columns of S=]-—I1 Oo est) 1 O FOr al Final note The eigenvectors of a skew-symmetric matrix (AT = —A) are perpendicular. The eigenvectors of an orthogonal matrix (Q' = Q™!) are also perpendicular. The best matrices have perpendicular eigenvectors! They are all diagonalizable. I stop there. 290 Chapter 6 Eigenvalues and Eigenvectors The reason for stopping is that the eigenvectors may contain complex numbers. We need Chapter 10 to say what “perpendicular” means. When x and y are complex vectors, the test is no longer x'y = 0. So we can’t prove anything now—but we can reveal the answer. A real matrix has perpendicular eigenvectors if and only if ATA = AA!. Symmetric and skew-symmetric and orthogonal matrices are included among these “normal” matrices. They may be called normal but they are special. The very best are symmetric. =» REVIEW OF THE KEY IDEAS = 1. A symmetric matrix has real eigenvalues and perpendicular eigenvectors. 2. Diagonalization becomes A = QAQ! 3. All symmetric matrices are diagonalizable. 4. The signs of the eigenvalues agree with the signs of the pivots. with an orthogonal matrix Q. Problem Set 6.4 1 Write A as M+ N, symmetric matrix plus skew-symmetric matrix: 1° 2rse Se Ob A=|4°3°0|/=M4N 3 (M' = M,N? = -—N). For any square matrix, M = fo and N = add up to A. If C is symmetric prove that A'CA is also symmetric. (Transpose it.) When A is 6 by 3, what are the shapes of C and A'CA? 3 If A is symmetric, show that the dot product of Ax with y equals the dot product of x with Ay. If A is not symmetric then (Ax)'y = x1( . 4 Find an orthogonal matrix Q that diagonalizes 5 Find an orthogonal matrix Q that diagonalizes this symmetric matrix: A=, 6 A = [tee Si); bag AD na Oe 2 2, Ee oO) Find all orthogonal matrices that diagonalize A = ie ‘ak 6.4 Symmetric Matrices 291 (a) Find a symmetric 2 by 2 matrix with 1’s on the diagonal but a negative eigenvalue. (b) How do you know it must have a negative pivot? (c) How do you know it can’t have two negative eigenvalues? If A> = 0 then the eigenvalues of A must be . Give an example that has A #0. But if A is symmetric, diagonalize it to prove that A must be zero. If A =a-+ib is an eigenvalue of a real matrix A, then its conjugate X}=a-—ib is also an eigenvalue. (If Ax = Ax then also Ax = Ax.) Prove that every real 3 by 3 matrix has a real eigenvalue. 10 Here is a quick “proof” that the eigenvalues of all real matrices are real: Ax =)x gives x'Ax=Ax'x so A= x! Ax xlx is real. Find the flaw in this reasoning—a hidden assumption that is not justified. 11 Write A and B in the form eae 1 3 A| nel, 12 rp“4. Aox2x4 of the spectral theorem S12| Bal QA OF: || 1). (keep ||x1|| = ||x2= What are the eigenvalues of A = [_? |? Create a 3 by 3 skew-symmetric matrix (Al = —A) and verify that its eigenvalues are all imaginary. 13 This matrix is and also . Therefore its eigenvalues are pure imagi- nary and they have |A| = 1. (Reason: ||Mx|| = ||x|| for every x so ||Ax|| = ||x]| for eigenvectors.) Find all four eigenvalues of eG 14 1 Ogg Fe ]/-1 0-1 1 |te herpes -1 -1 1 0 Show that this A (symmetric but complex) does not have two independent eigenvectors: A= ie Alis not diagonalizable; det(A — AJ) = (A — ae A! = A is not such a special property for complex matrices. The good prop- erty becomes A’ = A. When this holds, the eigenvalues are all real and A is diagonalizable. 
292 Chapter 6 15 Even if A is rectangular, the block matrix B = [ar “] is symmetric: Eigenvalues and Eigenvectors ; Odes BA eee tS at ley y | 2|=1?] ee which is Az=Ay Aly Ae (a) Show that A'Az = Az, so that A? is an eigenvalue of ATA. (b) If A=T (2 by 2) find all four eigenvalues and eigenvectors of B. 16 If A =[}] in Problem 15, find all three eigenvalues and eigenvectors of B. 17 Every 2 by 2 symmetric matrix is Ayxixy + Aaxrxd = A,P, +A2P). Explain why a) Feat (b) eee —0) 18 Another proof that eigenvectors are perpendicular when A = A‘. Suppose Ax = Ax and Ay = Oy and A} # 0. Then y is in the nullspace and x is in the column space. They are perpendicular because ____._~-Go carefully—why are these subspaces orthogonal? If the second eigenvalue is a nonzero number f, apply this argument to A — B/. The eigenvalue moves to zero and the eigenvectors stay the same—so they are perpendicular. 19 Find the eigenvector matrix S for the matrix B. Show that it doesn’t collapse at d = 1, even though A = | is repeated. Are the eigenvectors perpendicular? [o) = 20 —-1 0 1 Ou th WG OnOmed eS ee SHTho le From the trace and the determinant find the eigenvalues of pana —3 4 Compare the signs of the 4’s with the signs of the pivots. 21 22 True or false. Give a reason or a counterexample. (a) A matrix with real eigenvalues and eigenvectors is symmetric. (b) A symmetric matrix times a symmetric matrix is symmetric. (c) The inverse of a symmetric matrix is symmetric. (d) The eigenvector matrix S of a symmetric matrix is symmetric. A normal matrix has A! A = AA‘. Why is every skew-symmetric matrix normal? Why is every orthogonal matrix normal? When is [_# 4] normal? 6.4 23 Symmetric Matrices 293 (A paradox for instructors) If AA? = ATA then A and A! share the same eigen- vectors (true). They always share the same eigenvalues. Find the flaw in this conclusion: They must have the same S and A. Therefore A equals A. 24 Which of these classes of matrices do A and B belong to: Invertible, orthogonal, projection, permutation, diagonalizable, Markov? 0 0 Oe ee 6 A & II — Wl —_ o& oOo = Which of these factorizations are possible for A and B: LU, OR, SAS~', QAQ"? 25 What number A = SAS! 26 Db in ee makes A = QAQ! possible? What number makes impossible? This A is nearly symmetric. A= 0" [One aloe But its eigenvectors are far from orthogonal: has eigenvectors : 0: and [2] What is the angle between the eigenvectors? Bh When MATLAB computes A! A, the result is symmetric. But the computed pro- jection matrix P = A(A'A)~'A! might not be exactly symmetric. Construct P from A=[1 1 1 1 1;1 2 3 4 5] and use [S, LAMBDA] =eig(P). From S’ * S show that two of the computed eigenvectors have dot product 0.9999. 294 Chapter 6 Eigenvalues and Eigenvectors POSITIVE DEFINITE MATRICES @ 6.5 This section concentrates on symmetric matrices that have positive eigenvalues. If symmetry makes a matrix important, this extra property (all A > 0) makes it special. When we say special, we don’t mean rare. Symmetric matrices with positive eigenvalues enter all kinds of applications of linear algebra. They are called positive definite. The first problem is to recognize these matrices. You may say, just find the eigenvalues and test A > 0. That is exactly what we want to avoid. Calculating eigenvalues is work. When the A’s are needed, we can compute them. But if we just want to know that they are positive, there are faster ways. Here are the two goals of this section: 1. To find quick tests on a symmetric matrix that guarantee positive eigenvalues. 2. 
To explain the applications of positive definiteness. The matrices are symmetric to start with, so the 4’s are automatically real numbers. An important case is 2 by 2. When does A= ie | have 44 > 0 and i2 > 0? 6L The eigenvalues of A = A" are positive if and only if a > 0 and ac — b? > 0. This test is passed by A = [43]. It is failed by [22] and also by [~} _9]. One failure is because the determinant is 24 — 25 < 0. The other failure is because a = —1. The determinant of +7 is not enough to pass, because the test has two parts. We think of a and ac — b? as a 1 by 1 determinant and a 2 by 2 determinant. Proof without computing the d’s Suppose A; > O and Az > 0. Their product A;A2 equals the determinant ac — b*. That must be positive. Therefore ac is also positive. Then a and c have the same sign. That sign has to be positive, because 4; +A2 equals the trace a+ c. Proved so far: Positive 4’s require ac — b* > 0 and a > 0. The statement was “if and only if,’ so there is another half to prove. Start with a > 0 and ac — b? > 0. This also ensures c > 0. Since A,A2 equals the determinant ac — b*, the i’s are both positive or both negative. Since A; + A> equals the trace a+c> 0, the A’s must be positive. End of proof. Here is another test. Instead of determinants, it checks for positive pivots. 6M_ The eigenvalues of A = A’ are positive if and only if the pivots are positive: a>0O and ac — b? s (0), a A new proof is unnecessary. a= 0) and, The ratio of positive numbers is certainly positive: ae b S90 if and only if a>0O and ac — b* a Soils 6.5 Positive Definite Matrices 295 The point is to recognize that last ratio as the second pivot of A: er The first pivot is a is “| The multiplier nersis b/a 7 ip ki aa The second pivot is oo noltee ulCa ee This connects two big parts of linear algebra. Positive eigenvalues mean positive pivots and vice versa (for symmetric matrices!). If that holds for n by n symmetric matrices, and it does, then we have a quick test for A > 0. The pivots are a lot faster to compute than the eigenvalues. It is very satisfying to see pivots and determinants and eigenvalues and even least squares come together in this course. Example 1 This matrix has a = | (positive). But ac — b? = (3) — 2? is negative: Ny 1 ae 3 ; ‘ : has a negative eigenvalue and a negative pivot. The pivots are 1 and —1. The eigenvalues also multiply to give —1. is negative (we don’t want its formula, just its sign). One eigenvalue Here is a different way to look at symmetric matrices with positive eigenvalues. From Ax = Ax, multiply by x! to get x'Ax = Ax'x. The right side is a positive d times a positive x'x = \|x||?. So the left side x'Ax is positive when x is an eigenvector. The new idea is that this number x" Ax is positive for all vectors x, not just the eigenvectors. (Of course x! Ax = 0 for the trivial vector x = 0.) There is a name for matrices with this property x'Ax > 0. They are positive definite. We will prove that exactly these matrices have positive eigenvalues and pivots. Definition The matrix A is positive definite if x'Ax > 0 for every nonzero vector: x! AX fe E : Al = ax* + 2bxy +cy? > 0. x!Ax is a number (1 by 1 matrix). The four entries a, b, b, c give the four parts of x’ Ax. From a and c on the diagonal come the pure squares ax* and cy*. From b and b off the diagonal come the cross terms bxy and byx (the same). Adding those four parts gives x! Ax = ax? + 2bxy + cy’. This is a quadratic function ofx and y: Ley) = axe 4 2bxy + cy? 
is “second degree.” The rest of this book has been linear (mostly Ax). Now the degree has gone from | to 2. The second derivatives of ax?-+2bxy+cy? are constant. Those second derivatives 296 Chapter 6 Eigenvalues and Eigenvectors are 2a, 2b, 2b, 2c. They go into the second derivative matrix 2A: a 92 af =2ax + 2by Ox Galt GX Hah OVOX axdy — dy? af a 7 — =2bx + 2cy dy go Z s[ 2a. 26 ply |aleoptpes|t This is the 2 by 2 version of what everybody knows for | by 1. There the function is ax”, its slope is 2ax, and its second derivative is 2a. Now the function is x! Ax, its first derivatives are in the vector 2Ax, and its second derivatives are in the matrix 2A. Third derivatives are all zero. Where does calculus use second derivatives? They give the bending of the graph. When f” is positive, the curve bends up from the tangent line. The parabola y = ax is convex up or concave down according to a > 0 or a < 0. The point x = 0 is a minimum point of y = x? and a maximum point of y = —x?. To decide minimum versus maximum for f(x), we look at its second derivative. For a two-variable function f(x, y), the matrix of second derivatives holds the key. One number is not enough to decide minimum versus maximum (versus saddle point). The function f = x'Ax has a minimum at x = y = 0 if and only if A is positive definite. The statement “A is a positive definite matrix” is the 2 by 2 version of “a is a positive number.” Example 2 This example is positive definite. The function f(x, y) is positive: ae 2 i | The function is x' Ax = x7 +4xy+7y*. has pivots | and 3. It is positive because it is a sum of squares: ee 4xy + Ty? = (x + 2y) + Bye. The pivots 1 and 3 multiply those squares. This is no accident! We prove below, by the algebra of “completing the square,” that this always happens. So when the pivots are positive, the sum f(x, y) is guaranteed to be positive. Comparing Examples 1 and 2, the only difference is that a22 changed from 3 to 7. The borderline is when a27 = 4. Above 4, the matrix is positive definite. a22 = 4, the borderline matrix is only semidefinite Then (> 0) changes to (> 0): At ie: : > al has pivots 1 and It has eigenvalues 5 and 0. It has a > 0 but ac — b> = 0. Not quite positive definite. 6.5 Positive Definite Matrices 297 We will summarize this section so far. We have four ways to recognize a positive definite matrix. Right now it is only 2 by 2. 6N When a 2 by 2 symmetric matrix has one of these four properties, it has them all: 1. _ Both of the eigenvalues are positive. 2. The | by 1 and 2 by 2 determinants are positive: 3. The 4. The function x'Ax = ax? + 2bxy + cy? is positive except at (0, 0). pivots are positive: a > 0 and ac — b* > 0. a > 0 and (ac — b*)/a > 0. When A has one (therefore all) of these four properties, it is a positive definite matrix. Note We deal only with symmetric matrices. The cross derivative d°*f/dxdy always equals 0°f/dydx. For f(x, y,z) the nine second derivatives fill a symmetric 3 by 3 matrix. It is positive definite when the three pivots (and the three eigenvalues, and the three determinants) are positive. Example Pp 3 Is f(x, y) =x* + 8xy +3y everywhere positive—except Pp at (0, 0)? Solution The second derivatives are fy, = 2 and fry = fyx = 8 and fy, = 6, all positive. But the test is not positive derivatives. We look for positive definiteness. The answer is no, this function is not always positive. By trial and error we locate a point x = 1, y = —1 where f(1, -1) = 1—8-+3 = —4. 
Better to do linear algebra, and apply the exact tests to the matrix that produced f(x, y): x* + 8xy + 3y* = [x y| b | hea The matrix has ac — b* = 3 — 16. The pivots are 1 and —13. The eigenvalues are (we don’t need them). The matrix is not positive definite. Note how 8xy comes from aj2 = 4 above the diagonal and a2; = 4 symmetrically below. That matrix multiplication in x’ Ax makes the function appear. Main point The sign of b is not the essential thing. The cross derivative d“f/dxdy nie . ye #3 . _ can be positive or negative—it is b~ that enters the tests. The size of b, compared to a and c, decides whether A is positive definite and the function has a minimum. 298 Chapter 6 Eigenvalues and Eigenvectors Example 4 For which numbers c is x? + 8xy + cy’ always positive (or zero)? Solution The matrix is A =[44]. Again a = 1 passes the first test. The second test has ac — b* = c — 16. For a positive definite matrix we need c > 16. The “semidefinite” borderline is c = 16. At that point [} ,¢] has A = 17 and 0, determinants 1 and 0, pivots | and ___._The function x2 + 8xy + loy is @—4y)~, Its graph does not go below zero, but it stays equal to zero all along the line x + 4y = 0. This is close to positive definite, but each test just misses: x’ Ax equals zero for x = (4, —1). Example 5 When A is positive definite, write f(x, y) as a sum of two squares. Solution This is called “completing the square.” The part ax? +2bxy is correct in the first square a(x + 29 he But that ends with a final Gey) To stay even, this added amount b?y*/a has to be subtracted off from cy? at the end: ax* > > + 2bxy + cy” =a b Sie \? + HOM ir 5; > Var (1) After that gentle touch of algebra, the situation is clearer. There are two perfect squares (never negative). They are multiplied by two numbers, which could be positive or negative. Those numbers a and (ac — b*)/a are the pivots! So positive pivots give a sum of squares and a positive definite matrix. Think back to the factorization A = LDL?: QD, lake k ales Ona || ee orn 1 b/a he, 5 i ese T ae To complete the square, we dealt with a and b and fixed the part involving c later. Elimination does exactly the same. It deals with the first column, and fixes the rest later. The numbers that come out are identical. Outside the squares are the pivots. Inside (x ae 2y)? are the numbers | and from L. Every positive definite symmetric matrix factors into A = LDL" with positive pivots. Important to compare A = LDL! with A = QAQ". One is based on pivots (in D), the other is based on eigenvalues (in A). Please do not think that the pivots equal the eigenvalues. Their signs are the same, but the numbers are entirely different. Positive Definite Matrices: n by n For a 2 by 2 matrix, the “positive definite test” uses eigenvalues or determinants or pivots. All those numbers must be positive. We hope and expect that the same tests carry over to larger matrices. They do. 6.5 Positive Definite Matrices 60 When ann 1. ‘All the eigenvalues are positive. 2. All the upper left determinants are positive. 3. All the pivots are positive. 4. x! Ax is positive except at x = 0. The matrix is positive definite. 299 by n symmetric matrix has one of these four properties, it has them all: The upper left determinants are | by 1, 2 by 2,...,” by n. The last one is the determinant of A. This remarkable theorem ties together the whole linear algebra course—at least for symmetric matrices. We believe that two examples are more helpful than a proof. Then we give two applications. 
Example 6 Test the matrices A and A* for positive definiteness: 2 A=|-1 0 —1 0 2 —1 -!1 2 2 and At= b Solution This A is an old friend (or enemy). —-1 |=] b | —1 D Its pivots are 2 and 5 and ;, all positive. Its upper left determinants are 2 and 3 and 4, all positive. Its eigenvalues are 2 — /2 and 2 and 2 + V2, all positive. That completes tests 1, 2, and 3. We can write x! Ax as a sum of three squares (since n = 3). Using A = LDL! the pivots 2, 3, t appear outside the squares. The multipliers from L are inside: xl Ax = 2 (ai — xx. + xe — x2x3 + a) 2 Ome = 2(x1 — 342)" + 3(a2 — 3x3) + 323 > 0. Go to the second matrix A*. The determinant test is easiest. The 1 by 1 determinant is 2 and the 2 by 2 determinant is 3. The 3 by 3 determinant comes from A* itself: det A* = 4+2b—2b? must be positive. At b = —1 and b = 2 we get detA* = 0. In those cases A®* is positive semidefinite (no inverse, zero eigenvalue, x! A*x > 0). Between b = —1 and b = 2 the matrix is positive definite. The corner entry b = 0 in the first example was safely between. Second Application: The Ellipse ax? + 2bxy + cy” = 1 Think of a tilted ellipse centered at (0,0), as in Figure 6.4a. Turn it to line up with the coordinate axes. That is Figure 6.4b. These two pictures show the geometry behind A=QAQ7!: 300 = Chapter 6 Figure 6.4 Eigenvalues and Eigenvectors The tilted ellipse 5x? + 8xy + Sy? = 1. Lined up it is 9X? + Y? =1. 1. The tilted ellipse is associated with A. Its equation is x' Ax = 1. 2. The lined-up ellipse is associated with A. Its equation is X'AX = 1. 3. The rotation matrix from x to X that lines up the ellipse is Q. Example 7 Find the axes of the tilted ellipse 5x2 48xystiSye Sul, Solution Start with the positive definite matrix that matches this function: The function is [x y| E | ae The matrix is A= E alt The eigenvalues of A are 4; =9 and Az = 1. The eigenvectors are heal and ea make them unit vectors, divide by /2. Then spit] Meee, A = QAQ? To is emo 40 (Al mie AceSu)p mag line = Oma |orsIte sla Now multiply by [x y] on the left and [}] on the right to get back the function x AK: 5x? + Bry +5y? = 9 lan, a ay P) *) +1(22) ; 2 we 3 . The function is again a sum of two squares. But this is different from completing the square. The coefficients are not the pivots 5 and 9/5 from D, they are the eigenvalues 9 and | from A. Inside these squares are the eigenvectors (1, yee and (1, SS0y eo: 6.5 Positive Definite Matrices 301 The axes of the tilted ellipse point along the eigenvectors. This explains why A = QAQ' is called the “principal axis theorem”—it displays the axes. Not only the axis directions (from the eigenvectors) but also the axis lengths (from the eigenvalues). To see it all, use capital letters for the new coordinates that line up the ellipse: 38 ar WY af2 Xd and Soy eas, WP) The ellipse becomes 9X* +Y* = 1. The largest value of X? is 5. The point at the end of the shorter axis has X = ‘ and Y = 0. Notice: The bigger eigenvalue A; gives the shorter axis, of half-length 1/./A, = i. The point at the end of the major axis has X =0 and Y = 1. The smaller eigenvalue A2 = 1 gives the greater length 1/./A2 = 1. In the xy system, the axes are along the eigenvectors of A. In the XY system, the axes are along the eigenvectors of A—the coordinate axes. Everything comes from the diagonalization A = QAQ?. 6P Suppose A = QAQ! is positive definite. The graph of x'Ax = 1 is an ellipse: x [x y]QAaQ T| iste yJa] xyp]aux 2 $Y? : =1. The half-lengths of the axes are 1/,/A; and 1/./A2. 
For an ellipse, A must be positive definite. If an eigenvalue is negative (exchange the 4’s with the 5’s in A), we don’t have an ellipse. difference of squares: =» 9X” — Y* =1. The sum of squares becomes This is a hyperbola. REVIEW OF THE KEY IDEAS =& 1. Positive definite matrices have positive eigenvalues and positive pivots. 2. A quick test is given by the upper left determinants: 3. The quadratic function f =x'Ax a > 0 and ac — bo. has a minimum at x = 0: x? Ax =ax* + 2bxy + cy” is positive except at (x, y) = (0, 0). 4, The ellipse x’ Ax = | has its axes along the eigenvectors of A. 5. A'A is automatically positive definite if A has independent columns (r = 7). a 302 Chapter 6 Eigenvalues and Eigenvectors Problem Set 6.5 Problems 1-13 are about tests for positive definiteness. 1 Which of A,, Az, A3, Aq has two positive eigenvalues? Use the test, don’t com- pute the i’s: 5 6 =) 1 10 a=[e 3] a= le, x ele a “el 1 10 mal Explain why c > 0 (instead of a > 0) combined with ac — b*, > 0 is also.a complete test for [#°] to have positive eigenvalues. 2 For which numbers b and c are these matrices positive definite? i a=|; Factor each A into LU 3 | and Ae A se al} and then into LDL’. What is the quadratic f = ax?+2bxy+cy for each of these matrices? Complete the square to write f as a sum of one or two squares d}( te ae ee ee: een eel 2 4 Show that f(x, y) = x2 +4xy + 3y” does not have a minimum at (0,0) even though it has positive coefficients. Write f as a difference of squares and find a point (x, y) where f is negative. 5 The function f(x, y) = 2xy certainly has a saddle point and not a minimum at (0,0). What symmetric matrix A produces this f? What are its eigenvalues? 6 Test to see if A'A is positive definite: 7 (Important) If A has independent columns then A'A is square and symmetric and invertible (Section 4.2). Show why x! AT Ax is positive except when x = 0. Then A'A 8 is more than invertible, it is positive definite. The function f(x, y) = 3(x + 2y)* + 4y? is positive except at (0,0). What is the matrix A, so that f =[x y]A[x y]'? Check that the pivots of A are 3 and 4. 6.5 Positive Definite Matrices 303 Find the 3 by 3 matrix A and its pivots, rank, eigenvalues, and determinant: x] [x1 50) x3 | A 2Y || NG a = ah ne) X3 10 11 Which 3 by 3 symmetric matrices A produce these functions is the first matrix positive definite but not the second one? f = x'Ax? Why Compute the three upper left determinants to establish positive definiteness. ify that their ratios give the second and third pivots. Ver- (a) ih= D(x, + x5 + x3 = X1x2 — x2X3) (Db), Ff = 2 (x7 + x2 +x? — X1X2 — X1Xx3 — x2x3). 2 oF <2, 95 3 0) 73-8 Aves 12 For what numbers c and d are A and B positive definite? Test the 3 determinants: c Ass lt Ly @ il and oy B29 dy 13 4S Find a matrix with a > 0 and c > 0 and a+c > 2b that has a negative eigenvalue. Problems 14 oes de 14—20 are about applications of the tests. If A is positive definite then A~! is positive definite. First proof: The eigenvalues of A~! are positive because _. Second proof (2 by 2): The entries of Al= 1 eT OG a al ee ; ret 4 pass the test 15 If A and B are positive definite, show that A+ B is also positive definite. Pivots and eigenvalues are not convenient for this; better to prove x'(A+B)x > 0 from the positive definiteness of A and B. 16 For a block positive definite matrix, the upper left block A must be positive definite: less ) gt a | The complete block test is that reduces to x'Ax A and C — B'A~'B when y= must be positive definite. 
304 Chapter 6 17 A positive definite matrix cannot have a zero (or worse, a negative number) on Eigenvalues and Eigenvectors its diagonal. Show that this matrix is not positive definite: if @ 2 18 nail 2 x2 | is not positive when (%1,%2,%3) =( , , ). The first entry aj, of a symmetric matrix A cannot be smaller than all the eigenvalues. If it were, then A — a,,/ would on the main diagonal. the eigenvalues. have ______ eigenvalues but it has a Similarly no diagonal entry can be larger than all 19 If x is an eigenvector of A then A is positive definite. x! Ax = . Prove that 4 is positive when 20 Give a quick reason why each of these statements is true: (a) Every positive definite matrix is invertible. (b) The only positive definite permutation matrix is P = J. (c) The only positive definite projection matrix is P = /. (d) A diagonal matrix with positive diagonal entries is positive definite. (e) Asymmetric matrix with a positive determinant might not be positive definite! Problems 21-24 use the eigenvalues; Problems 25-27 are based on pivots. 21 For which s and ¢ do these matrices have positive eigenvalues (therefore positive definite)? s A=|-4 -4 —4 s -4 -4 -4 5 and fe Er 3 i) ne ae ae | 0 ie Sas 22 From A = QAQ! compute the positive definite symmetric square root QA!/7QT of each matrix. Check that this square root gives R? = A: 23 You may have seen the equation for an ellipse as cae + Ns = 1. What are a and b when the equation is written as 4yx? + Ary? = 1? The ellipse 9x? + 16y? = 1 has axes with half-lengths a = and b = 24 Draw the tilted ellipse x*-+xy+y* = 1 and find the half-lengths of its axes from the eigenvalues of the corresponding A. 6.5 25 With positive pivots in D, the factorization A = LDL! (Square roots of the pivots give D = /DVD.) Then Cholesky factorization A = CC!: From 26 Positive Definite Matrices ea 50 2 | find In the Cholesky factorization A. becomes LVDJDL". C = LVD yields the 4 a=| 8 From 305 8 25 find A = CC‘, with C = LVD, the C. of the pivots are on the diagonal of C. Find C (lower triangular) for Al— ZT » sO 0} O © ie 2 8 The symmetric factorization iS 1 and A = LDL! A means that x'Ax = x'LDL'x. This is lle 4 BiG ae qlE we ¥yfa| [0 Tb) Multiplication produces ax* + 2bxy +cy? = a(x + ‘y) + __y*. pivot completes the square. Test witha =2, b=4, c= 10. 28 29 Without multiplying The second A = ees ane i dil—sin@ Cos@'cosé sing 1h find sin? cos@ (a) the determinant of A (b) the eigenvalues of A (c) the eigenvectors of A (d) a reason why A For fi(x,y) = ix4 =f xy ae y? and f2(x,y) derivative matrices A; and A): = is positive definite. seas xy — x find the second _ | a?f/dx? a?f/dxdy La f/ayax a? f/dy? | A is positive definite so f| is concave up (= convex). Find the minimum point of f; and the saddle point of f> (where first derivatives are zero). 30 The graph of z= x* + y? is a bowl opening upward. The graph of z = x7 — y? is a saddle. The graph of z = —x* — y? is a bowl opening downward. What is a test for z= ax* + 2bxy + cy” to have a saddle at (0, 0)? 31 Which values of c give a bowl and which give a saddle for the graph of the function z = 4x* + 12xy +cy?? Describe the graph at the borderline value of c. 306 Chapter 6 Eigenvalues and Eigenvectors SIMILAR MATRICES @ 6.6 The big step in this chapter was to diagonalize a matrix. That was done by S—the eigenvector matrix. The diagonal matrix S~!AS is A—the eigenvalue matrix. But diagonalization was not possible for every A. Some matrices resisted and we had to leave them alone. 
They had too few eigenvectors to produce S. In this new section, S remains the best choice when we can find it, but we allow any invertible matrix M. Starting from A we go to M~'AM. This new matrix may happen to be diagonal— more likely not. It still shares important properties of A. No matter which M we choose, the eigenvalues stay the same. The matrices A and M~! AM are called “similar.” A typical matrix A is similar to a lot of other matrices because there are so many choices of M. DEFINITION Let / be any invertible matrix. Then B = M~!AM is similar to A. If B = M~'AM then immediately A = MBM. That means: If B is similar to A then A is similar to B. The matrix in this reverse direction is M~'—just as good as M. A diagonalizable matrix is similar to A. In that special case M is S. We have A = SAS~! and A = S~'AS. They certainly have the same eigenvalues! This section is opening up to other similar matrices B = M~'AM. The combination M~!AM appears when we change variables. ferential equation for uw and set u = Mv: du — = Au dt becomes dv M—=AMv dt ee whichis —=M dt Start with a dif- £2] “AMv. The original coefficient matrix was A, the new one at the right is M~'AM. Changing variables leads to a similar matrix. When M = S the new system is diagonal—the maximum in simplicity. But other choices of M also make the new system easier to solve. Since we can always go back to u, similar matrices have to give the same growth or decay. More precisely, their eigenvalues are the same. 6Q (No change in A’s) The similar matrices A and M —l 4M have the same eigenvalues. If x is an eigenvector of A then M~!x is an eigenvector of B = M~!AM. The proof is quick, since MBM~'x B = M~'AM gives A= MBM7—!. Suppose Ax = Ax: =)x BM~!x means that =AM7'x. The eigenvalue of B is the same A. The eigenvector is now M~!x. 6.6 Similar Matrices 307 The following example finds three matrices that are similar to one projection matrix. Example 1 The projection A = i a is similar to A = S~!AS = k a Now choose M = 1 0 }the similar matrix M~!AM Let 0 -1 | the similar matrix M~!AM i 0 Also choose M = | is | ae 0). 0 is LE2 5 ey These matrices M~!AM all have eigenvalues 1 and 0. Every 2 by 2 matrix with those eigenvalues is similar to A. The eigenvectors change with M. The eigenvalues in that example are | and 0, not repeated. This makes life easy. Repeated eigenvalues are harder. The next example has eigenvalues 0 and 0. The zero matrix shares those eigenvalues, but it is only similar to itself: M —lOM =0. The following matrix A is similar to every nonzero matrix with eigenvalues 0 and 0. Example 2 A 2 0 01 is similar to every matrix B = |<: : =| ExXCept i —U: i = These matrices B are all singular (like A). They all have rank one (like A). Their trace is cd —cd = 0. Their eigenvalues are 0 and 0 (like A). I chose any M = ie a with ad — bc = 1. These matrices B can’t be diagonalized. In fact A is as close to diagonal as possible. It is the “Jordan form” for the family of matrices B. This is the outstanding member of the family. The Jordan form is as near as we can come to diagonalizing all of these matrices, when there is only one eigenvector. Chapter 7 will explain another approach to similar matrices. Instead of changing variables by u = Mv, we “change the basis.” In this approach, similar matrices represent the same transformation of n-dimensional space. When we choose a basis for that space, we get a matrix. The usual basis vectors in M = J lead to I~'AJ which is A. Other bases lead to other matrices B = M~'AM. 
In this “similarity transformation” from A to B, some things change and some don’t. Here is a table to show connections between A and B: Not changed Eigenvalues Trace and determinant Rank Number of independent eigenvectors Jordan form Changed Eigenvectors Nullspace Column space Row space Left nullspace Singular values 308 = Chapter 6 Eigenvalues and Eigenvectors The eigenvalues don’t change for similar matrices; the eigenvectors do. The trace is the sum of the ’s (unchanged). The determinant is the product of the same As,t The nullspace consists of the eigenvectors for 4 = O (if any), so it can change. Its dimension n—r does not change! The number of eigenvectors stays the same for each i, while the vectors themselves are multiplied by M~!. The singular values depend on A' A, which definitely changes. They come in the next section. The table suggests good exercises in linear algebra. But the last entry in the unchanged column—the Jordan form—is more than an exercise. We lead up to it with one more example of similar matrices. Example 3 This Jordan matrix J has triple eigenvalue 5,5, 5. Its only eigenvectors are multiples of (1, 0, 0): ear) — NY ll OMS rO e/a) et OmORRO tet oO OWNKe © nA oN has rank 2. Every similar matrix B = M~'JM has the same triple eigenvalue 5,5,5. Also B—5/ must have the same rank 2. Its nullspace has dimension 3 — 2 = 1. So each similar matrix B also has only one independent eigenvector. The transpose matrix J? has the same eigenvalues 5,5,5, and J’ —5/ has the same rank 2. Jordan’s theory says that J" is similar to J. The matrix that produces the similarity happens to be the reverse identity M: J =M UM wis 5 40. 0 l\ See 0 1.5 1 = 1 5 1 ‘se Ma 1 1 All blank entries are zero. The only eigenvector of J. iss (1-0. 0) = (0, 0, 1). More exactly, there is one line of eigenvectors (x;, 0, 0) for J and another line (0, 0, x3) for J!. The key fact is that this matrix J is similar to every matrix with eigenvalues 5,5,5 and one line of eigenvectors. Example 4 Since J is as close to diagonal as we can get, the equation du/dt = Ju cannot be simplified by changing variables. We must solve it as it stands: ae pa THere ae mig vga pa tien be a Nagi: is a direct proof that the determinant x y z is is unchanged: dx/dt =5x+y dy/dt =5y+z dz/dt = 5z. det B = (det M—!)(det A)(det M) = det A. 6.6 The system is triangular. We think naturally of back substitution. tion and work upwards. Main point: All solutions contain e*': dz : Similar Matrices 309 Solve the last equa- Geel yields z= z(O)e™ dy = iba Sy4-2 : yields y=ES (y(0) + tz(0))e St dx ea= 5x -+y ; yields De z(0))e St : x== (x (0) +t y(0) 4-51 The two missing eigenvectors are responsible for the te’ and t?e’ terms in this solution. They enter because 4 = 5 is a triple eigenvalue. The Jordan Form For every A, we want to choose M so that M~!AM is as nearly diagonal as possible. When A has a full set of n eigenvectors, they go into the columns of M. Then M = S. The matrix S~'AS is diagonal, period. This matrix is the Jordan form of A—when A can be diagonalized. In the general case, eigenvectors are missing and A can’t be reached. Suppose A has s independent eigenvectors. Then it is similar to a matrix with s blocks. Each block is like J in Example 3. The eigenvalue is on the diagonal and the diagonal above it contains \’s. This block accounts for one eigenvector of A. When there are n eigenvectors and n blocks, they are all 1 by 1. In that case J is A. 
6R (Jordan form) If A has s independent eigenvectors, it is similar to a matrix J that has s Jordan blocks on its diagonal: There is a matrix M such that J M'AM = ie = J. (1) Js Each block in J has one eigenvalue A;, one eigenvector, and 1’s above the diagonal: Ji = ine (2) A is similar to B if they share the same Jordan form J—and not otherwise. This is the big theorem about matrix similarity. In every family of similar matrices, we are picking out one member called J. It is nearly diagonal (or if possible completely diagonal). For that matrix, we can solve du/dt = Ju as in Example 4. 310 Chapter 6 Eigenvalues and Eigenvectors We can take powers J* as in Problems 9-10. Every other matrix in the family has the form A = MJM~!. The connection through M solves du/dt = Au. The point you must see is that MWJ/M~!MJM~! = MJ*M7—!, That cancellation of M~'M in the middle has been used through this chapter (when M was S). We found A! from SA!0S5~—!—by diagonalizing the matrix. Now we can’t quite diagonalize A. So we use MJ! M—! instead. Jordan’s Theorem 6R is proved in my textbook Linear Algebra and Its Applications, published by HBJ/Saunders. Please refer to that book (or more advanced books) for the proof. The reasoning is rather intricate and in actual computations the Jordan form is not popular—its calculation is not stable. A slight change in A will separate the repeated eigenvalues and remove the off-diagonal 1’s—leaving a diagonal A. Proved or not, you have caught the central idea of similarity—to make A as simple as possible while preserving its essential properties. =» REVIEW OF THE KEY IDEAS 1. _B is similar to A if = B= M~'AM. 2. Similar matrices have the same eigenvalues. 3. If A has n independent eigenvectors then A is similar to A (take M = S). 4. Every matrix is similar to a Jordan matrix J (which has A as its diagonal part). J has a “1” above the diagonal for every missing eigenvector. Problem Set 6.6 1 If B = M~'AM and also Conclusion: 2 If C = F"!AF and also Conclusion: 3 C = N~'BN, what matrix T gives C = T~!AT? If B is similar to A and C is similar to B, then C = G~'BG, what matrix M gives If C is similar to A and also to B then Show that A and B are similar by finding M: i ei; @ 0 and @ a=; il i B = M~'AM? 6.6 Similar Matrices 311 If a 2 by 2 matrix A has eigenvalues 0 and 1, why is it similar to A = [)?]? Deduce from Problem 2 that all 2 by 2 matrices with those eigenvalues are similar. Which of these matrices are similar? Check their eigenvalues. [0 al Ol tO ey sy 70 Cary Let ed) 1 0 0 Oe 1 Te There are sixteen 2 by 2 matrices whose entries are 0’s and 1’s. Similar matrices go into the same family. How many families? How many matrices (total 16) in each family? (a) If x is in the nullspace of A show that M~'x is in the nullspace of M~'AM. (b) The nullspaces of A and M —! AM have the same (vectors)(basis)(dimension). If A and B have the exactly the same eigenvalues and eigenvectors, does A = B? With n independent eigenvectors we do have A = B. What if A has eigenvalues 0,0 and only one line of eigenvectors (x;, 0)? By direct multiplication find A? and A? when io EiraeHe Guess the form of A* and check k = 6. Set k = 0 to find A® and k = —1 to find A7!. Questions 10 10-14 are about the Jordan form. By direct multiplication, find J* and J? when ie iCal ah Guess the form of J‘. Set k = 0 to find J®. Set k = —1 to find J~!. 11 The text solved du/dt = Ju for a 3 by 3 Jordan block J. Add a fourth equation dw/dt =5w +x. Follow the pattern of solutions for z, y, x to find w. 
12 These Jordan matrices have eigenvalues 0,0,0,0. They have two eigenvectors (one from each block). But the block sizes don’t match and they are not similar: Pin i jacineo V0 uated 0.6) ona 0010 0 a es resale an. = Deion0 alia tah bohm? pro? ORG For any matrix M, compare JM with MK. If they are equal show that M not invertible. Then M~'JM = K is impossible. is 312 13 Chapter 6 Eigenvalues and Eigenvectors J™ was similar to J in Example 3. Prove that A! is always similar to A, in three steps: First, when A is a Jordan block J;: Find M; so that Mo" J; WW I Second, A is a Jordan matrix J: Build Mo from blocks so that My | JMo =J!. Third, A is any matrix MJM~!: Show that A! is similar to J’ and to J and to A. 14 Find two more matrices similar to J in Example 3. 15 Prove that det(A — 47) = det(M~'!AM — AI). Write 1 = M~'M and factor out det M—! and det M. This says that M~'AM has the same characteristic polynomial as A. So its roots are the same eigenvalues. 16 Which pairs are similar? Choose a,b, c,d to prove that the other pairs aren’t: a b ad 17 ba ak we ed a@ ob ad Db ¢ ai True or false, with a good reason: (a) An invertible matrix can’t be similar to a singular matrix. (b) A symmetric (c) A can’t be similar to —A (d) A can’t be similar to A+ /. matrix can’t be similar to a nonsymmetric matrix. unless A = 0. 18 If B is invertible prove that AB has the same eigenvalues as BA. 19 When A is m by n and B is n by m, AB and BA can have different sizes. But 20 EMIS ste a (a) What sizes are the blocks? They are the same in each matrix. (b) This block equation isM -lFM =G, so F and G have the same eigenvalues. F has the eigenvalues of AB plus n zeros, G has the eigenvalues of BA plus m zeros. Conclusion when m > n: AB has the same eigenvalues as BA plus zeros. Why are these statements all true? (a) If A is similar to B then A? is similar to B?. (b) A* and B? can be similar when A and B are not similar. (c) eal is similar to eas (d) [3 | is not similar to ie ae (e) If we exchange rows | and 2 of A, and then exchange columns the eigenvalues stay the same. | and 2, 6.7 Singular Value Decomposition (SVD) SINGULAR VALUE DECOMPOSITION 313 (SVD) a 6.7 The Singular Value Decomposition is a highlight of linear algebra. A is any m by n matrix. It can be square or rectangular, and we will diagonalize it. Its row space is r-dimensional (inside R”). Its column space is also r-dimensional (inside R”). We are going to choose orthonormal bases for those spaces. The row space basis will be vj,..., v, and the column space basis will be w,,..., Uu,. Start with a 2 by 2 matrix: m =n = 2. Let its rank be r = 2, so it is invertible. Its row space is the plane R*. We want v; and v2 to be perpendicular unit vectors, an orthogonal basis. We also want Av, and Av? to be perpendicular. (This is the tricky part.) Then the unit vectors uj = Avj/||Av;|| and wz = Av2/||Av2|| will be orthogonal, just dividing by lengths. As a specific example, we work with the unsymmetric matrix a-[2 2]. 2a r ye First point Why not choose one orthogonal basis instead of two? Because no orthogonal matrix Q will make Q~!AQ diagonal. Second point Why not choose the eigenvectors of A as the basis? Because that basis is not orthonormal. A is not symmetric and we need two different orthogonal matrices to diagonalize it. We are aiming for orthonormal bases that diagonalize A. The two bases will be different— one basis cannot do it. When the inputs are v; and v2, the outputs are Av; and Av2. We want those to line up with uw; and u2. 
The basis vectors have to give Av; = 0), and also Av2 = 02u2. The numbers oj and 02 are just the lengths ||Av, || and ||/Av2||. With vj and v2 as columns you can see what we are asking for: A E | = jou In matrix notation that is AV = UX. versus capital lambda). on |= c | [° 02 |. (2) The diagonal matrix & is like A (capital sigma © contains the singular values 01,02 and A contains the eigen- values. The difference comes from U and V. When they both equal S, we have AS = SA which means S~'AS = A. The matrix is diagonalized. But the eigenvectors in S$ are not generally orthonormal. The new requirement is that U and V must be orthogonal matrices. The basis vectors in their columns must be orthonormal: vive= Thus V'V =/ which means ae il —v — ny wm} a=}|! OT. Om V? = V~!. Similarly U'U =7 and U' =U™!. (3) 314 Chapter 6 Eigenvalues and Eigenvectors 6R The Singular Value Decomposition (SVD) has orthogonal matrices U and V: AV=US andthn A=UXV7!=UXV'. This is the new factorization of A: orthogonal times diagonal (4) times orthogonal. There is a neat way to get U out of the picture and see V by itself: Multiply A! times A. AA] (USV WU eV')= VEU US. (5) U'U disappears because it equals 7. Then ©! is next to ©. Multiplying those diagonal matrices gives oe and Oe. That leaves an ordinary factorization of the symmetric matrix ATA: 1 Agari of 0 0 052 T Views (6) This is exactly like A = QAQ!. But the symmetric matrix is not A itself. Now the symmetric matrix is A! A! And the columns of V are its eigenvectors. This tells us how to find V. We are ready to complete the example. Example 1 Find the singular value decomposition of A = ee : V2] = =f] Solution Compute A!A and its eigenvectors. Then make them unit vectors: ATA= E 2 has unit eigenvectors v, = The eigenvalues of A'A are 8 and 2. The v’s are perpendicular, because eigenvectors of every symmetric matrix are perpendicular—and A‘A is automatically symmetric. What about uw; and uz? They are quick to find, because Av, is in the direction of uw; and Avg is in the direction of uo: Av, = if A isis = leash Clearly Av, is the same as 2/2u,. The unit vector is uy, = fal; The first singular value is 0, = 2./2. Then o; ss which is the eigenvalue of A'A. We have Av; = 0,u exactly as required. Similarly Av? = is | ueel = Rat Now Av» is /2u2. The unit vector is uz = call The second singular value is 07 = J/2. And On agrees with the other eigenvalue 2 of A'A. We have completed the SVD: a ae be l=(¢ vile ell On 1 ‘lll 1//2 —1/V2 Wak 1/72 (7) 6.7. Figure 6.5 Singular Value Decomposition (SVD) U and V are rotations and reflections. 315 © is a stretching matrix. This matrix, and every invertible 2 by 2 matrix, transforms the unit circle to an ellipse. You can see that in the figure, which was created by Cliff Long and Tom Hern. One final point about that example. We found the w’s from the v’s. Could we find the u’s directly? Yes, by multiplying AA‘ instead of A! A: AAM=(UDV DV SLU). Ses = 0s, (8) This time it is V'V = / that disappears. Multiplying XX! gives ae and oF as before. We have an ordinary factorization of the symmetric matrix AA!. The columns of U are the eigenvectors of AA’: Aa ate fe 2 This matrix happens to be diagonal. oe ae lame NV 2a x os Its eigenvectors are (1,0) and (0, 1). This agrees with uw; and uw2 found earlier. Why should we take the first eigenvector to be (1, 0) instead of (0, 1)? Because we have to follow the order of the eigenvalues. Notice that AA! has the same eigenvalues (8 and 2) as A'A. The singular values are /8 and /2. 
Example 2 Find the SVD of the singular matrix A = [??]. The rank is r = 1. The row space has only one basis vector vj. The column space has only one basis vector u;. We can see those vectors in A, and make them into unit vectors: v; = multiple of row UU he ed (et i|-34] u; = multiple of column v i) 12 H we i 316 Chapter 6 Eigenvalues and Eigenvectors ——= 7 es oa ae column space “ nullspace nullspace of AT Figure 6.6 The SVD chooses orthonormal bases so that Av; = ojU4;. Then Av; must equal o,u;. It does, with singular value oj = V10. stop there (it usually doesn’t): The SVD could ? t= [ye | vor2 1/2]. It is customary for U and V to be square. The matrices need a second column. The vector v2 must be orthogonal to v;, and uz must be orthogonal to wu): 1 45] n= and m= 1 |_|. The vector v2 is in the nullspace. It is perpendicular to v; in the row space. Multiply by A to get Av2 = 0. We could say that the second singular value is 07 = 0, but singular values are like pivots—only the r nonzeros are counted. If A is 2 by 2 then all three matrices U, X, V are 2 by 2 in the complete SVD: 2 ae Tear | URRVIO <0 i i |=uEv =I i =| ie 0 ali me e, 6S The matrices U and V contain orthonormal bases for all four subspaces: first r last n—r_ first r last m-—r_ columns of V: row space of A columns of V: — nullspace of A columns of U: column space of A columns of U: — nullspace of A!. The first columns v,...,v,; and uj,...,u, are the hardest to choose, because Av; has to fall in the direction of u;. The last v’s and u’s (in the nullspaces) are easier. 6.7 Singular Value Decomposition (SVD) 317 As long as those are orthonormal, the SVD will be correct. The v’s are eigenvectors of ATA and the w’s are eigenvectors of AA’. Example 2 has 1, oo aa=|3 | and 7 |8 aat=i 4 ip Those matrices have the same eigenvalues 10 and 0. The first has eigenvectors v; and v2, the second has eigenvectors u; and uz. Multiplication shows that Av; = /10 wy and Av? = 0. It always happens that Av; = o;u;, and we now explain why. Proof of SVD: From A™ Av; = oP vi, the key steps are to multiply by vi and by A: vo} AT Av; = of v) v; gives \| Av; || =o?U AA‘ Ap; — 07 Av; gives u; = Av;/o; 50 that ||Av; || = 9; (10) as a unit eigenvector of AA’. Equation (10) used the small trick of placing parentheses in (v) AT)(Ay;). vector times its transpose, giving ||Av;||?. Equation (11) This is a (11) placed the parentheses in (AA!')(Av;). This shows that Av; is an eigenvector of AA!. We divide by its length o; to get the unit vector u; = Av;/o;. This is the equation Avj = o;u;, which says that A is diagonalized by these outstanding bases. We will give you our opinion directly. The SVD is the climax of this linear algebra course. We think of it as the final step in the Fundamental Theorem. First come the dimensions of the four subspaces. Then their orthogonality. Then the bases which diagonalize A. It is all in the formula A = UXV!. Applications are coming— they are certainly important!—but you have made it to the top. Eigshow Section 6.1 described the MATLAB (it is built into MATLAB (Part 2) display called eigshow and gave its Web location 5.2). The first option is eig, when x moves in a circle and Ax follows on an ellipse. The second option is sud, when two perpendicular vectors x and y travel around a circle. Then Ax and Ay move too. There are four vectors on the screen. The SVD is seen graphically when Ax happens to be perpendicular to Ay. Their directions at that moment give an orthonormal basis u;,u2. Their lengths give the singular values 01, 02. 
The vectors x and y at that moment are the orthonormal basis Vj, U2. On the screen you see these bases that give Av; = oju; matrix language that is AV = UX. This is the SVD. y=(0,1) and Avz = o2u2. Ay = (2,1) Ax = (2, —1) Ax =2V/2u In 318 Chapter 6 Eigenvalues and Eigenvectors =» 1. The SVD REVIEW OF THE KEY IDEAS factors A into UXV!, The numbers = with singular values 0; > --- > 0, > 0. a ssn Ge are the nonzero eigenvalues of AA! and AA. The orthogonal columns of U and V are eigenvectors of AA! and ATA. ag Sey a Those columns give orthogonal bases for the four fundamental subspaces of A, and they diagonalize that matrix: Av; = oju; for i <r. Problem Set 6.7 Problems 1-3 compute the SVD of a square singular matrix A. 1 Compute A'A and its eigenvalues on 0 and unit eigenvectors vj), v2: 2 (a) Compute (b) Verify from Problem en eel AA! and its eigenvalues On Q and unit eigenvectors w1, U2. | that Av; = 0, . Find all entries in the SVD: fo ai=[e @)[” o ffm »]. 3 Write down orthonormal bases for the four fundamental subspaces of this A. Problems 4—7 ask for the SVD of matrices of rank 2. 4 (a) Find the eigenvalues bonacci matrix and unit eigenvectors of A'A (b) Construct the singular value decomposition of A. and AA! for the Fi- 5 Show that the vectors in Problem 4 satisfy Av; = o,u,; and Avz = o7U2. 6 Use the SVD part of the MATLAB and v2 graphically. rf Compute ATA and AA! and their eigenvalues and unit eigenvectors for demo eigshow to find the same vectors v, ape tat Ae | 1 al: Multiply the three matrices UXV! to recover A. 6.7. Singular Value Decomposition (SVD) 319 Problems 8-15 bring out the underlying ideas of the SVD. 8 Suppose uj,...,U, and vj,... , v, are orthonormal bases for R”. Construct the matrix A that transforms each v; into u; to give Avy = uj,... , AU, = Up. Construct the matrix with rank one that has Av = 12u for v = S(1, 1,1, 1) and u= 10 Suppose What 11 HO 2, 1).Its only singular value is oj; = A has orthogonal columns of lengths 0), 02,... , On. Explain how the SVD expresses the matrix A as the sum of r rank one matrices: A =0 Uv); 12 w,, W2,...,W, are U, &, and V in the SVD? Tt +-:: + Oyu v2. Suppose A is a 2 by 2 symmetric matrix with unit eigenvectors u; and uz. its eigenvalues are 4; = 3 and Az = —2, what are the matrices If U, &, V! in its SVD? 13 If A = OR with an orthonormal matrix Q, then the SVD of A is almost the same as the SVD of R. Which of the three matrices in the SVD is changed because of Q? 14 Suppose A is invertible (with oj > 02 > 0). Change A by as small a matrix as possible to produce a singular matrix Ap. Hint: U and V do not change: se[a a)”, ][m eo] 15 16 (a) If A changes to 4A, what is the change in the SVD? (b) What is the SVD for A! and for A~!? Why doesn’t the SVD for A+ / just use X + /? a. | es ee ne 0 a taping ee LOG tir Gy “wed vi dime? jeiytrgiree ger es Tite es te tae 7 Sepvue (ities a iS chien oelign > ~~ a= Y-2 eae 1@a2ea 6 ain SS ham, Gan) © Oe ome ah & ¢ ome W pe heh 7aaah 4 > i>) - ee : _ ogep .t = tive &2 tik’ 9 WY = ail®» Vw) wD Wie de ee ides eata * 9Mcyt r i : 0 rave 2dian npnied tsteeae eed of a date wave a ep ine oe" a RIS : Ceslieuale oe me scaba ol e in ee = Nes wes. » eae =a See ee a lew pp-a~aese®. ’ pa» Sad ay = a Sovyuh niche 2? @ Pees os = em at ec 7 oe? i Ye vies eaters Dadaist wht wae eS> Uh Ol he ay, RS sage ie he 20-2 } Wine ae : 6-0-4 of POU TOO st poetic 4 an tn 7 ASE ") 1) ee. eeBs & (awe easpod = tAmy wee ate aera a, — Sl 0 7 a ches > tes aaa — es Seaver a an a > ow e 43 Se. 
ee »5 nee ie — == Ge 6 = Pale NAM ladas’ ot GA vides of - @e-jeneee . FF = - a re aa a pea a: ae & ed re Te nie Lund) DT deem mepersin via _— _ - LINEAR TRANSFORMATIONS THE IDEA OF A LINEAR TRANSFORMATION a 7.1 When a matrix A multiplies a vector v, it “transforms” v into another vector Av. In goes v, out comes Av. This transformation follows the same idea as a function. In goes a number x, out comes f(x). For one vector v or one number x, we multiply by the matrix or we evaluate the function. The deeper goal is to see all v’s at once. We are transforming the whole space when we multiply every v by A. Start again with a matrix A. It transforms v to Av. It transforms w to Aw. Then we know what happens to u = v + w. There is no doubt about Au, it has to equal Av + Aw. Matrix multiplication gives a linear transformation: DEFINITION A transformation T assigns an output T(v) to each input vector v. The transformation is linear if it meets these requirements for all v and w: (a) Tvu+w)=T(v)+T(w) (0) Bii(Gu) = Gr w) Betorsallxe: If the input is v = 0, the output must be T(v) = 0. We combine (a) and (b) into one: Linearity: T(cv+dw) =cT(v)+dT(w). A linear transformation is highly restricted. Suppose we add up to every vector. Then T(v) = v+uo and T(w) = w+ uo. This isn’t good, or at least it isn’t linear. Applying T to v-+w produces v + w + uo. That is not the same as T(v) + T(w): T(v) + T(w) =v+uptwt+ug does not equal Tiv+w)=v+w+uo. 321 322 Chapter 7 Linear Transformations The exception is when up = 0. The transformation reduces to T(v) = v. This is the identity transformation (nothing moves, as in multiplication by /). That is certainly linear. In this case the input space V is the same as the output space W. The linear-plus-shift transformation T(v) = Av + uo is called “affine.” Straight lines stay straight and computer graphics works with affine transformations. The shift to computer graphics is in Section 8.5. Example 1 Choose a fixed vector a = (1, 3,4), and let T(v) be the dot product a -v: The input is This is linear. v = (v1, v2, v3). The output is The from inputs v come 7T(v) three-dimensional =a-v=v; space, + 3v2 + 413. so V = R>. The outputs are just numbers, so the output space is W = R!. We are multiplying by the row matrix A=[1 3 4]. Then T(v) = Av. You will get good at recognizing which transformations are linear. If the output involves squares or products or lengths, Us or v1v2 or ||v||, then 7 is not linear. Example 2 T(v) = ||v|| is not linear. Requirement (a) for linearity would |v + w|| = ||v|| + ||w]|. Requirement (b) would be ||cv|| = c|lv||. Both are false! be Not (a): The sides of a triangle satisfy an inequality ||v + w|| < ||v|| + ||w]l. Not (b): The length || — v|| is not —||v]]. Example 3 (Important) 7 is the transformation that rotates every vector by 30°. The domain is the xy plane (where the input vector v is). The range is also the xy plane (where the rotated vector T(v) is). We described T without mentioning a matrix: just rotate the plane. Is rotation linear? Yes it is. We can rotate two vectors and add the results. The sum of rotations T(v) + T(w) is the same as the rotation T(v + w) of the sum. The whole plane is turning together, in this linear transformation. Note Transformations have a language of their own. Where there is no matrix, we can’t talk about a column space. But the idea can be rescued and used. The column space consisted of all outputs Av. The nullspace consisted of all inputs for which Av = 0. 
Translate those into “range” and “kernel”: Range of T = set of all outputs T(v): corresponds to column space Kernel of T = set of all inputs for which T(v) = 0: corresponds to nullspace. The range is in the output space W. The kernel is in the input space V. When T is multiplication by a matrix, T(v) = Av, you can translate back to column space and nullspace. For an m by n matrix, the nullspace is a subspace of V = R”. The column space is a subspace of ____. The range might or might not be the whole output space W. 7.1 The Idea of a Linear Transformation Byes: Examples of Transformations (mostly linear) Example 4 Project every 3-dimensional vector down onto the xy plane. The range is that plane, which contains every T(v). The kernel is the z axis (which projects down to zero). This projection is linear. Example 5 Project every 3-dimensional vector onto the horizontal plane z = 1. The vector v = (x, y, 10) is transformed to T(v) = (x, y, 1). This transformation linear. Why not? It doesn’t even transform v = 0 into T(v) = 0. is not Multiply every 3-dimensional vector by a 3 by 3 matrix A. This is definitely a linear transformation! T(v+w)=A(v+w)_ which does equal Av+Aw=T7(v)+T7(w). Example 6 Suppose A is an invertible matrix. The kernel of T is the zero vector; the range W equals the domain V. Another linear transformation is multiplication by A~!. This is the inverse transformation T~', which brings every vector T(v) back to v: T(T(v)) =v matches the matrix multiplication Pp A7!(Av) = i). We are reaching an unavoidable question. Are all linear transformations produced by matrices? Each m by n matrix does produce a linear transformation from R” to R”. The rule is T(v) = Av. Our question is the converse. When a linear T is described as a “rotation” or “projection” or “...”, is there always a matrix hiding behind 7? The answer is yes. This is an approach to linear algebra that doesn’t start with matrices. The next section shows that we still end up with matrices. Linear Transformations of the Plane It is more interesting to see a transformation than to define it. When a 2 by 2 matrix A multiplies all vectors in R?, we can watch how it acts. Start with a “house” in the xy plane. It has eleven endpoints. Those eleven vectors v are transformed into eleven vectors Av. Straight lines between v’s become straight lines between the transformed vectors Av. (The transformation is linear!) Applying A to a house produces a new house—possibly stretched or rotated or otherwise unlivable. This part of the book is visual not theoretical. We will show six houses and the matrices that produce them. The columns of H are the eleven circled points of the first house. (H is 2 by 12, so plot2d will connect the 11th circle to the first.) The 11 points in the house matrix H are multiplied by A to produce the other houses. The 324 Chapter 7 Linear Transformations houses on the cover were produced this way (before Christine Curtis turned them into a quilt for Professor Curtis). pie 6, TPP —68= 7 Orr] ee ONE BER 16" GOB De he Se 3" LON ee, De ONG ee AF hess35° ye 3D Oil Figure 7.1 Linear transformations of a house drawn by plot2d(A*#). —sin 35 | COSTS Oe 7.1. =» The Idea of a Linear Transformation REVIEW OF THE KEY IDEAS 325 = A transformation T takes each v in the input space to T(v) in the output space. Linearity requires that T (cjvj +---+cy,¥,) =c, T(v1) +-+-+en T(0p). The transformation T(v) = Av + vg is linear only if v9 = 0 ! eee ee The quilt on the book cover shows T(house) = AH for nine matrices A. 
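Those pictures come from multiplying a 2 by 12 house matrix by different A's. The sketch below imitates plot2d with ordinary plot commands; the house coordinates here are invented for illustration and are not the H used for Figure 7.1:

    H = [-2  2  2  0 -2 -2;                            % x coordinates (a made-up closed path)
         -2 -2  2  3  2 -2];                           % y coordinates: square walls, roof peak at (0,3)
    theta = 30*pi/180;
    A = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotate the house by 30 degrees
    AH = A*H;
    plot(H(1,:), H(2,:), '-o', AH(1,:), AH(2,:), '-o')
    axis equal

Straight lines between the corners of H stay straight in A*H, because the transformation is linear.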
Problem Set 7.1 A linear transformation must leave the zero vector fixed: from T(v + w) = T(v) + T(w) by choosing w = requirement (b) by choosing c = T(0) = 0. Prove this . Prove it also from Requirement (b) gives T(cv) = cT(v) and also T(dw) = dT(w). Then by addition, requirement (a) gives T( )=(_ ). What is T(cv+dw-+eu)? Which (a) (d) of these transformations T(v) = (v2, v1) T(v) = (0, 1). (b) is not linear? The input is v = (v1, v2): T(v) = (v1, v1) (c) T(v) = (0, v1) If S and T are linear transformations, is S$(7(v)) linear or quadratic? (a) (Special case) If S(v) = v and T(v) = v, then S(T(v)) = v or v*? (b) (General case) S(w,+w2) = S(w1)+S(w2) combine into S(T (v1 + v2)) = S¢ Suppose T7(v) =v except that and T(v,; +02) = T(v,)+T (02) )= ec 7(0, v2) = (0,0). Show that this transformation satisfies T(cv) = cT(v) but not T(v+w)=T(v)+T(w). Which of these transformations satisfy T(v-++-w) = T(v)+T(w) L(coy el(wv)? (a) T(v)=v/|lv|l (d) T(v) = largest component of v. (b) Tv) = vj +v2+03 For these transformations of V = R? to mation T? linear? (a) T(v)=—-v (Dy (c) T(v) = 90° rotation = (—v2, v}) (d) T(v) = projection = (45%, *15*2). (c) Tv) = (v4, 202, 3v3) W = R’, find T(T(v)). re(O) == va (1, 1) and which satisfy Is this transfor- 326 Chapter 7 Linear Transformations 8 Find the range and kernel (like the column space and nullspace) of T: (a) T(v1, v2) = (v2, 1) (c) Tv, v2) = ©, 9) (b) (d) Tv, v2, v3) = (41, v2) Tv, v2) = (11, V1). The “cyclic” transformation T is defined by T (v1, v2, v3) = (v2, v3, v1). What is T(T(v))? What is T3(v)? What is T'!°(v)? Apply T three times and 100 times to v. 10 11 12 A linear transformation from V to W has an inverse from W to V when the range is all of W and the kernel contains only v = 0. Why are these transformations not invertible? (a) T(r, v2) = (v2, v2) W =R? (b) (c) T(v1, v2) = (v1, 02,0) Ty, v2) = +¥2) W=R> W=R! If T(v) = Av and A is m by n, then 7 is “multiplication by A.” (a) What are the input and output spaces V and W? (b) Why is range of T = column space of A? (c) Why is kernel of T = nullspace of A? Suppose a linear T transforms when (ay v= (2, 2) (by (1,1) to (2,2) and (2,0) to (0,0). et (eT) Kye" “P= 1) (2) Find T(v) i= ae) Problems 13-20 may be harder. The input space V contains all 2 by 2 matrices M. 13 14 M is any 2 by 2 matrix and A = [4 al The transformation T is defined by T(M) = AM. What rules of matrix multiplication show that 7 is linear? Suppose A = [12]. Show that the range of T is the whole matrix space V and the kernel is the zero matrix: 15 (1) If AM =O prove that M must be the zero matrix. (2) Find a solution to AM = B for any 2 by 2 matrix B. Suppose A = [42]. Show that the identity matrix / is not in the range of T. Find a nonzero matrix M such that T(M) = AM 16 is zero. Suppose 7 transposes every matrix M. Try to find a matrix A which gives AM = M! for every M. Show that no matrix A will do it. To professors: Is this a linear transformation that doesn’t come from a matrix? 7.1. 17 18 The Idea of a Linear Transformation 327 The transformation T that transposes every matrix is definitely linear. Which of these extra properties are true? (a) T? = identity transformation. (b) The kernel of T is the zero matrix. (c) Every matrix is in the range of T. (d) T(M) = —M is impossible. Suppose T(M) = [§9][m][9 9]. Find a matrix with T(M) 4 0. Describe all matrices with T(M) = 0 (the kernel of 7) and all output matrices T(M) (the range of 7). 19 If A #0 and B + 0 then there is a matrix M such that AMB 4 0. 
Show by example that M =/ might fail. For your example find an M that succeeds. 20 IfA and B are invertible and T(M) = AMB, find T~'(M) inthe form( Questions 21-27 are about house transformations by AH. 21 a2 24 The output is T (house). How can you tell from the picture of T (house) that A is (a) a diagonal matrix? (b) a rank-one matrix? (c) a lower triangular matrix? Draw a picture of T (house) for these matrices: = 23 an) and oi) =f Bel 4 and ee ul, 4 ne What are the conditions on A = [#}] to ensure that T (house) will (a) sit straight up? (b) expand the house by 3 in all directions? (c) rotate the house with no change in its shape? What are the conditions on det A = ad — bc to ensure that T (house) will (a) be squashed onto a line? (b) keep its endpoints in clockwise (c) have the same order (not reflected)? area as the original house? If one side of the house stays in place, how do you know that A = /? 25 )M(_). Describe T (house) when T(v) = —v + (1,0). This 7 is “affine.” 328 Chapter 7 Linear Transformations 26 Change the house matrix H to add a chimney. 27 This MATLAB program creates a vector of 50 angles called theta, and then draws the unit circle and T (circle) = ellipse. You can change A. = 2 Ae) 2 theta = [0:2 * pi/50:2 * pil; circle = [cos(theta); sin(theta)]; ellipse = A* circle; axis([-4 4 —4 4]); axis(‘square’) plot(circle(1,:), circle(2,:), ellipse(1,:), ellipse(2,:)) 28 Add two eyes and a smile to the circle in Problem 27. (If one eye is dark and the other is light, you can tell when the face is reflected across the y axis.) Multiply by matrices A to get new faces. 29 The first house is drawn by this program plot2d(H). Circles from o and lines from —: Koa FtA) cave (2c axis({-1010—1010]), axis(’square’) plot@ny/.c'.%..V. =); Test plot2d(A’ * H) and plot2d(A’ * A* H) with the matrices in Figure 7.1. 30 Without a computer describe the houses A « H for these matrices A: 1 0 E 4 a 31 a 3 5 ag mie: Be 3 3% Nn at i AF What matrices give the houses on the front cover? The second is A = /. 7.2. The Matrix of a Linear Transformation 329 THE MATRIX OF A LINEAR TRANSFORMATION a 7.2 The next pages assign a matrix to every linear transformation. For ordinary column vectors, the input v is in V= R” and the output T(v) is in W= R”. The matrix for T will be m by n. The standard basis vectors for R” and R” lead to a standard matrix for T. Then T(v) = Av in the normal way. But these spaces also have other bases, so the same T is represented by other matrices. A main theme of linear algebra is to choose the bases that give the best matrix. When V and W are not R” and R”, they still have bases. Each choice of basis leads to a matrix for 7. When the input basis is different from the output basis, the matrix for T(v) = v will not be the identity 7. It will be the “change of basis matrix.” Key idea of this section When we know T(v}),..., T(v,) for the basis vectors v\,..., Vn, linearity produces T(v) for every other vector v. Reason Every input v is a unique combination cjv; +---+ Cv, of the basis vectors. Since T is a linear transformation (here is the moment for linearity), the output 7 (v) must be the same combination of the known outputs 7T(v1),..., IMP Suppose v = cyvy +++++ Cyn. Then linearity requires T(v) = cyT (v4) + +++ ¢nT (vn). The rule of linearity extends from cv + dw to all combinations c)v] +--+ + Cn Vp. Example 1 Suppose T transforms v; = (1, 0) to T(v;) = (2, 3, 4). Suppose the second basis vector v2 = (0, 1) goes to T(v2) = (5,5, 5). If T is linear from R? 
to R> then its standard matrix is 3 by 2: 2 a A=\|3 By linearity: 4 “(ui + 123 Mm nn OS 3 Aaa) \ i = || & 9 Example 2 The derivatives of the functions 1, x,x?,x° are 0,1,2x,3x2. Those are four facts about the transformation 7 that “takes the derivative.’ Now add the crucial fact that T is linear: x T(v) = gy — ag beys obeyss thethe linearity rul linearity rule ive(e Senge! ae ae Sea NGU: w) = c— oyu ae Ce beam 330 Chapter 7 Linear Transformations It is exactly those facts that you use to find all other derivatives. From the derivative of each separate power 1, x, x*, x? (those are the basis vectors v1, v2, v3, v4) you find the derivative of any polynomial like 4+ x + x? + x?: d eet x? 4+x3)=14+2x+3x* (because of linearity!) (2) XxX Here the input space V contains all combinations of 1, x, x”, x. I call them vectors, you might call them functions. Those four vectors are a basis for the space V of cubic polynomials (degree < 3). For the nullspace of A, we solved Av = 0. For the kernel of the derivative T, we solve dv/dx = 0. The solution is v = constant. The nullspace of T is one-dimensional, containing all constant functions like v; = 1 (the first basis function). To find the range (or column space), look at all outputs from T(v) = dv/dx. The inputs are cubic polynomials a + bx + cx? + dx?, so the outputs are quadratic polynomials (degree < 2). For the image space W we have a choice. If W = cubics, then the range of T (the quadratics) is a subspace. If W = quadratics, then the range is all of W. That second choice emphasizes the difference between the domain or input space (V = cubics) and the image or output space (W = quadratics). V has dimension n = 4 and W has dimension m = 3. The matrix for T will be 3 by 4. The range of T is a three-dimensional subspace. The matrix will have rank r = 3. The kernel is one-dimensional. The sum 3+ 1 = 4 is the dimension of the input space. This was r + (n —r) =n in the Fundamental Theorem of Linear Algebra. Always (dimension of range) + (dimension of kernel) = dimension of V. Example 3 The integral is the inverse of the derivative. That is the Fundamental Theorem of Calculus. We see it now in linear algebra. The transformation T~! that “takes the integral from 0 to x” is linear! Apply T~! to 1, x, x?: x i Law. 0 x A x dx = 1x?, 0 . x / x? dx = ee 0 By linearity, the integral of w = B + Cx + Dx? is T~'(w) = Bx + $Cx? == £Dx? The integral of a quadratic is a cubic. The input space of T~! is the quadratics, the output space is the cubics. Integration takes W back to V. Its matrix will be 4 by 3. Range of T~! The outputs Bx + 5Cx? + +Dx? are cubics with no constant term. Kernel of T~' The output is zero only if B =C = D=0. The nullspace is the zero vector. Now 3 + 0 = 3 is the dimension of the input space W for T~!. 7.2. The Matrix of a Linear Transformation 331 Matrices for the Derivative and Integral We will show how the matrices A and A~! copy the derivative T and the integral T~!. This is an excellent example from calculus. Then comes the general rule—how to represent any linear transformation T by a matrix A. The derivative transforms the space V of cubics to the space W of quadratics. The basis for V is 1, x, x*, x3. The basis for W is 1, x, x”. The matrix that “takes the derivative” is 3 by 4: O A=|0 oO 1 0 0 2 070 0 0 | =matrix form of FZ. 3 Why is A the correct matrix? Because multiplying by A agrees with transforming by T. The derivative of v =a+bx +cx? +. dx? is T(v) =b + 2cx + 3dx*. 
The same b and 2c and 3d appear when we multiply by the matrix: OneIeaO sOilh ee b O 109s, © Be ea OO. Wel2a 3d (3) Look also at T~!. The integration matrix is 4 by 3. Watch how the following matrix starts with w = B + Cx + Dx? and produces its integral Bx + 5Cx? + 1Dx?: Integration: O27 Or 00 1 O00 1 Oa 0 0 4 B Gipe D 0 B dy Ag (4) ;D I want to call that matrix A~!, and I will. But you realize that rectangular matrices don’t have inverses. At least they don’t have two-sided inverses. This rectangular A has a one-sided inverse. The integral is a one-sided inverse of the derivative! oe AAt=|/0 Ole 1 0 but AeA 0100 OOF OF 10250)" 0 1 If you integrate a function and then differentiate, you get back to the start. So AA! =I. But if you differentiate before integrating, the constant term is lost. The integral of the derivative of \ is zero: T~'T(1) = integral of zero function This matches A~!A, whose first column is all zero. constant functions). = 0. The derivative T has a kernel (the Its matrix A has a nullspace. Main point again: Av copies 7 (v). 332 Chapter 7 Linear Transformations Construction of the Matrix Now for the general rule. Suppose T transforms the space V (n-dimensional) to the space W (m-dimensional). We choose a basis vj,..., V, for V and a basis Wj, ..., Wm for W. The matrix A will be m by n. To find its first column, apply T to the first basis vector v1: T (v4) is in W. It is a combination These numbers aj\,..., 4m matches multiplying ayywy+++:+an1Wm of the output basis. go into the first column of A. Transforming v; to T(v1) (1,0,...,0) by A. It yields that first column of the matrix. When T is the derivative and the first basis vector is 1, its derivative is T(v;) = 0. So for the derivative, the first column of A was all zero. For the integral, the first basis function is again 1 and its integral is x. This is 1 times the second basis function. So the first column of A~! was (0, 1, 0, 0). 7A Each linear transformation T from V to W is represented by a matrix A (after the bases are chosen for V and W). The jth column of A is found by applying T to the jth basis vector v;: T(v;) = combination of basis vectors of W =a,j;wW) +--+ +4mj Wm. (5) These numbers a1j,.-.,@mj go into column j of A. The matrix is constructed to get the basis vectors right. Then linearity gets all other vectors right. Every v is a combination c}v; +---+C,V,_,, and T(v) is a combination of the w’s. When A multiplies the coefficient vector c = (cj,..., Cn) in the v combination, Ac produces the coefficients in the T(v) combination. This is because matrix multiplication (combining columns) is linear like T. A tells what T does. Every linear transformation can be converted to a matrix. This matrix depends on the bases. Example 4 If the bases change, T is the same but the matrix A is different. Suppose we reorder the basis of cubics to x, x”, x7, 1. Keep the original 1, x, x? for the quadratics in W. Now apply T to the first basis vector. The derivative of x is 1. This is the first basis vector of W. So the first column of A looks different: A=}|0 be 10°00) 2 O OJ] Op: Ory 390 = matrix for the derivative T when the bases change to ak aa landil exer When we reorder the basis of V, we reorder the columns of A. The input basis vector v; is responsible for column j. The output basis vector w; is responsible for row i. Soon the changes in the bases will be more than permutations. 7.2. (0) : The Matrix of a Linear Transformation v2 =|— sin sin ube \ v9 . rv) =|989 1’ \9 Figure 7.2 wv 333 ‘. 
T (v1) =T(r) | sing 7 ie = 1/2 1/2 Rotation by @ and projection onto the 45° line. Products AB Match Transformations 7S The examples of derivative and integral made three points. First, linear transformations T are everywhere—in calculus and differential equations and linear algebra. Second, spaces other than R” are important—we had cubics and quadratics. Third, 7 still boils down to a matrix A. Now we make sure that we can find this matrix. The next examples have V = W. We choose the same basis for both spaces. Then we can compare the matrices A? and AB with the transformations T? and TS. Example 5 T rotates every plane vector by the same angle 6. Here Find the rotation matrix A. The answer depends on the basis! V = W = Re Solution The standard basis is vj; = (1,0) and vz = (0,1). To find A, apply T to those basis vectors. In Figure 7.2a, they are rotated by @. The first vector (1, 0) swings around to (cos@,sin@). This equals cos@ times (1,0) plus sin@ times (0, 1). There- fore those numbers cos@ and sin@ go into the first column of A: oe sin@ shows column |f A= cay sin 0 ly cos 0 shows both columns. For the second column, transform the second vector (0, 1). The figure shows it rotated to (—sin@,cos@). Those numbers go into the second column. Multiplying A times (0, 1) produces that column, so A agrees with T. Example 6 (Projection) Suppose T projects every plane vector onto the 45° line. Find its matrix for two different choices of the basis. Solution Start with a specially chosen basis. The basis vector vj is along the 45° line. It projects to itself. So the first column of A contains | and 0. The second basis vector v3 is along the perpendicular line (135°). This basis vector projects to zero. So the second column of A contains 0 and 0: Projection A = ; 3 when V and W have the 45° and 135° basis. 334 Chapter 7 Linear Transformations With the basis in the opposite order (135° then 45°), the matrix is __. Now take the standard basis (1,0) and (0,1). projects to G, 5). That gives the first column of A. Figure 7.2b shows how (1,0) The other basis vector (0, 1) also projects to G, 5). So the matrix is Projection A = for the same NI T and the standard basis. Ll NIK Both A’s are projection matrices. If you square A it doesn’t change. Projecting twice is the same as projecting once: T? = T so A? = A. Notice what is hidden in that statement: The matrix for T? is A?. We have come to something important—the real reason for the way matrices are multiplied. Two transformations S and T are represented by two matrices B and A. When we apply 7 to the output from S, we get the “composition”® TS. When we apply A after B, we get the matrix product AB. The rules for matrix multiplication give the correct matrix AB to represent TS. The transformation S is from a space U to V. Its matrix B uses a basis u),..., Up for U and a basis vj, ..., v, for V. The matrix is n by p. The transformation T is from V to W as before. /ts matrix A must use the same basis v\, ..., Vn for V—this is the output space for S and the input space for 7. Then AB matches T'S: 7B Multiplication The linear transformation TS starts with any vector wu in U, goes to S(u) in V and then to T(S(u)) in W. The matrix AB starts with any x in R?, goes to Bx in R” and then to ABx in R”. The matrix AB correctly represents 7S: TS: U~V-W AB: The input is uw = xju; +-:-+XpUpy. (m by n)(n by p) = (m by p). The output T(S(u)) matches the output ABx. Product of transformations matches product of matrices. 
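Those two matrices, and the rule that AB matches TS, are easy to test numerically. This MATLAB sketch is ours (the angles and names are only illustrative); it checks that the projection matrix satisfies A^2 = A and that the product of two rotation matrices is the rotation by the sum of the angles:

    theta = pi/5;  phi = pi/7;                     % any two angles
    R = @(t) [cos(t) -sin(t); sin(t) cos(t)];      % rotation by t in the standard basis
    P = [1/2 1/2; 1/2 1/2];                        % projection onto the 45 degree line, standard basis
    norm(P*P - P)                                  % zero: projecting twice is the same as projecting once
    norm(R(theta)*R(phi) - R(theta + phi))         % zero (up to roundoff): matrix product matches composition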
The most important cases are when the spaces U, V, W are the same and their bases are the same. With m =n = p we have square matrices. Example 7 S rotates the plane by @ and T also rotates by 6. Then 7S rotates by 26. This transformation T* corresponds to the rotation matrix A? through 26: a. A=B A’ = rotation by 26 = Ieee aed sin2@ cos 20 By matching (transformation)? with (matrix)*, and sin20. Multiply A times A: cos@ —sin@][cos@ —sin@] sin@d cos@}]}sin@ cos@| we pick up the formulas [cos*@—sin*?@ —2sin@cosé | cos*@—sin?6 2sin@cos@ for cos 20 |" 7.2. The Matrix of a Linear Transformation 335 Comparing with the display above, cos 20 = cos? 6 — sin? 6 and sin26 = 2sin@cos@. Trigonometry comes from linear algebra. Example 8 § rotates by @ and T rotates by —@. Then TS = / and AB =/. In this case T(S(u)) is u. We rotate forward and back. For the matrices to match, ABx must be x. The two matrices are inverses. Check this by putting cos(—@) = cos@ and sin(—@) = —sin@ into A: cos 0 qs —sin@ ih hae al x Fake + sin? @ cosé a sin 0 cos@ 0 0 cos @ + sin? @ By the famous identity for cos 6 + sin? 6, this is /. Earlier T took the derivative and S took the integral. Then 7S is the identity but not ST. Therefore AB is the identity matrix but not BA: 0 010 71/4 9 0 0100 AB 040.02. 0 Lh PT SS a 57 ner eres ioe scat OM Oe) 0 0 4 OM08 a The Identity Transformation and Its Matrices We need the matrix for the special and boring transformation T(v) = v. This identity transformation does nothing to v. The matrix also does nothing, provided the output basis is the same as the input basis. The output 7(v;) is v;. When same, this is w;. So the first column of A is (1,0,..., 0). the bases are the When each output T(v;) = v; is the same as wj, the matrix is just I. This seems reasonable: The identity transformation is represented by the identity matrix. But suppose the bases are different. Then 7(v;) = v; is a combination of the w’s. That combination m,,w,; +---+my1Wy, tells us the first column of the matrix M. We will use M (instead of A) for a matrix that represents the identity transformation. When the outputs T(v;) = v; are combinations Yye1 Mijwi, the “change of basis matrix” is M. The basis is changing but the vectors themselves are not changing: T7(v) = v. When the input has one basis and the output has another basis, the matrix is not /. Example 9 The input basis is vj; = (3,7) and v2 = (2,5). The output basis is wy = (1,0) and w2 = (0,1). Then the matrix M is easy to compute: The matrix for T(v)=v is M= Seay: E alk Reason The first input is vj = (3,7). The output is also (3,7) but we express it as 3w; + 7w2. Then the first column of M contains 3 and 7. 336 Chapter 7 Linear Transformations This seems too simple to be important. It becomes trickier when the change of basis goes the other way. We get the inverse matrix: Example 10 The input basis is vj = (1,0) and v2 = (0, 1). The output basis is wy = (Cry) Randinwa71—@)) =i 3 2 E | = : . (ihe MathixetOneA()):— 0s Reason The first input is vj; = (1,0). 5 —2 be 4; The output is also vj but we express it as Sw, — 7w2. Check that 5(3,7) — 7(2,5) does produce (1,0). We are combining the columns of the previous M to get the columns of J. The matrix to do that is M7!: wi wn E * = B »| is MM! =I. A mathematician would say that MM! corresponds to the product of two identity transformations. We start and end with the same basis (1,0) and (0, 1). Matrix multiplication must give 7. So the two change of basis matrices are inverses. 
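These change of basis matrices from Examples 9 and 10 can be confirmed in a few lines. The MATLAB check below is ours:

    M = [3 2;                 % columns are the input basis vectors v1 = (3,7) and v2 = (2,5)
         7 5];
    Minv = inv(M)             % gives [5 -2; -7 3], the matrix of Example 10
    5*[3; 7] - 7*[2; 5]       % gives (1, 0): the combination 5(3,7) - 7(2,5) checked in Example 10
    M*Minv                    % the identity matrix: the two change of basis matrices are inverses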
Warning One mistake about M is very easy to make. This matrix changes from the basis of v’s to the standard columns of 7. But matrix multiplication goes the other way. When you multiply M times the columns of J, you get the v’s. It seems backward but it is really OK. One thing is sure. Multiplying A times (1,0,...,0) gives column 1 of the ma- trix. The novelty of this section is that (1,0,...,0) stands for the first vector v,, written in the basis of v’s. Then column | of the matrix is that same vector v1, written in the standard basis. This is when we keep T = J and change basis. In the rest of the book we keep the basis fixed and T is multiplication by A. = 1. 22. 3. If we know other T(v). REVIEW OF THE KEY IDEAS T(v;),...7T(v,) Linear transformation T Input basis vj, ..., Up Output basis w),..., Wm 4. for the vectors in a basis, linearity determines all Matrix A (m by n) that represents T in these bases The derivative and integral matrices are one-sided inverses: (Derivative) (Integral) 8 = / = d(constant)/dx = 0: Fundamental Theorem of Calculus ! The change of basis matrix M represents T = J. Its columns are the coefficients of each output basis vector expressed in the input basis: w; = mj jvy +--+ + myj Up. 7.2. The Matrix of a Linear Transformation 337 Problem Set 7.2 Questions 1-4 extend the first derivative example to higher derivatives. 1 The transformation S takes the second derivative. Keep 1, x,x*, x> as the basis V1, V2, V3, v4 and also as wi, W2, W3, w4. Sv,, Sv2, Sv3, Sv4 in terms Write of the w’s. Find the 4 by 4 matrix B for S. What functions have v” = 0? They are in the kernel of the second derivative S. What vectors are in the nullspace of its matrix B in Problem 1? B is not the square of the 4 by 3 first derivative matrix as Et le wae o 000% 2an0 Or 0. (Oy +3 Add a zero row to A, so that output space = input space. Then compare A* with B. Conclusion: For B = A” we also want output basis = ____ basis. Then [ma Or (a) The product TS produces the third derivative. by 4 matrices, then compute AB. Add zero rows to make 4 (b) The matrix B* corresponds to S? = fourth derivative. zero? Why is this entirely Questions 5-10 are about a particular T and its matrix A. 5 With bases v1, v2, v3 and wy), w2, w3, suppose T(v,) = w2 and T(v2) = T(v3) = w, + w3. T is a linear transformation. Find the matrix A. (a) What is the output from 7 in Question 5 when the input is v; + v2 + 03? (b) Multiply A times the vector (1, 1, 1). Since T(v2) = T(v3), the solutions to T(v) = 0 are v = are in the nullspace of A? Find all solutions to T(v) = wp. . What vectors Find a vector that is not in the column space of A. Find a combination of w’s that is not in the range of T. You don’t have enough information to determine T?. Why not? Why is its matrix not necessarily A”? 10 Find the rank of A. This is not the dimension of the output space W. It is the dimension of the _ on i 338 Chapter 7 Linear Transformations Questions 11-14 are about invertible linear transformations. 11 Suppose T(v|) = w) +w2+w3 and T(v2) = w2+w3 and T(v3) = w3. Find the matrix for T using these basis vectors. What input vector v gives T(v) = w 1? 12 Invert the matrix A in Problem 11. Also invert the transformation T—what are T—!(w;) and T~!(w2) and T~!(w3)? Find all v’s that solve T(v) = 0. 13 Which of these are true and why is the other one ridiculous? @ 14 TUr=I! ®) TUT@))=1 © Tf) =w1. Suppose the spaces V and W have the same basis 0}, v2. (a) Describe a transformation T (not /) that is its own inverse. 
(b) Describe a transformation T (not /) that equals jie (c) Why can’t the same T be used for both (a) and (b)? Questions 15-20 are about changing the basis. 15 16 17 (a) What matrix transforms (1, 0) into (2, 5) and transforms (0, 1) to (1, 3)? (b) What matrix transforms (2,5) to (1,0) and (1, 3) to (0, 1)? (c) Why does no matrix transform (2,6) to (1,0) and (1,3) to (0, 1)? (a) What matrix M transforms (1,0) and (0, 1) to (r,t) and (s, u)? (b) What matrix N transforms (c) What condition on a,b, c,d will make part (b) impossible? (a) How do M and N in Problem (Ap) wanda(ba)aion(sarae (b) What matrix transforms (2,5) to (1,1) and (1,3) to (0, 2)? (a,c) and (b,d) to (1,0) and (0, 1)? 16 yield the matrix that transforms (a, c) to 18 If you keep the same basis vectors but put them in a different order, the change of basis matrix M is a matrix. If you keep the basis vectors in order but change their lengths, M is a matrix. 19 The matrix that rotates the axis vectors (1,0) and (0,1) through an angle @ is Q. What are the coordinates (a, b) of the original (1, 0) using the new (rotated) axes? This can be tricky. Draw a figure or solve this equation for a and b: De 20 cos@ —sin@ fe | 1 cosé lo]=«[S6]+2| — sin@ nae The matrix that transforms (1,0) and (0,1) to (1,4) and (1,5) is The combination a(1,4) + b(1, 5) that equals (1,0) has (a,b) =( are those new coordinates of (1,0) related to M or M~!? M = , _ ). How 7.2 The Matrix of a Linear Transformation 339 Questions 21-24 are about the space of quadratic polynomials A + Bx + Cx?. 21 22 The parabola w; = 5(x7 +x) equals one at x = | and zero at x = 0 and x = —1. Find the parabolas w2, w3, and y(x): (a) wy equals one at x = 0 and zero at x = 1 and x = —1. (b) w3 equals one at x = —1 and zero at x = 0 and x = 1. (c) y(x) equals 4 at x = 1 and 5 at x =O and 6 at One basis for second-degree polynomials is v; = x = —1. Use wy, wp, w3. 1 and v2 = x and v3 = x°. Another basis is w;, w2, w3 from Problem 21. Find two change of basis matrices, from the w’s to the v’s and from the v’s to the w’s. 23 What are the three equations for A, B,C if the parabola Y = A+ Bx + Cx? equals 4 at x = a and 5 at x = b and 6 at x = c? Find the determinant of the 3 by 3 matrix. For which numbers a, b,c will it be impossible to find this parabola Y? 24 Under what condition on the numbers m),™2,..., mg do these three parabolas give a basis for the space of all parabolas? vy (x) =m, 5) +mox + m3x~ and v2(x) = m4+ms5x +mox~ yD and U3(X) =m7- 25 The Gram-Schmidt process changes s mgx + mox-. a basis a), a2, a3 to an orthonormal basis 91,92, 93. These are columns in A = QR. Show that R is the change of basis matrix from the a’s to the q’s (a2 is what combination of g’s when A = QR?). 26 Elimination changes the rows of A to the rows of U with A = LU. Row 2 of A is what combination of the rows of U? Writing A' = UTL! to work with columns, the change of basis matrix is matrices are __—is) M = L™. (We have bases provided the 145 Suppose vj, v2, v3 are eigenvectors for 7. This means T(v;) = Ajv; for i = 1,2,3. What is the matrix for T when the input and output bases are the v’s? 28 Every invertible linear transformation can have / as its matrix. Choose any input basis vj,..., ¥, and for output basis choose w; to be T(v;). Why must T be invertible? Questions 29-32 review some basic linear transformations. PAs| Using vj = w; and v2 = wp find the standard matrix for these T’s: (a) T(vy) =0 and T(v2) = 3v, (b) Tv) =v, and T(v; + v2) = v4. 
340 = Chapter 7 30 Linear Transformations Suppose T is reflection across the x axis and S is reflection across the y axis. The domain V is the xy plane. If v = (x, y) what is S(T (v))? Find a simpler description of the product ST. 31 Suppose T is reflection across the 45° line, and S is reflection across the y axis. If v = (2,1) then T(v) = (1,2). Find S(7(v)) and T(S(v)). This shows that generally ST Ail: 32 Show that the product ST of two reflections is a rotation. Multiply these reflection matrices to find the rotation angle: cos 20 sin20 33 True or false: If we know sin 20 —cos20 cos 2a sin2a@ sin 2a —cos2a 7(v) for n different nonzero know 7T(v) for every vector in R”. |~ vectors in R”, then we 7.3. Change of Basis 341 CHANGE OF BASIS 7.3 This section returns to one of the fundamental ideas of linear algebra—a basis for R". We don’t intend to change that idea, but we do intend to change the basis. It often happens (and we will give examples) that one basis is especially suitable for a specific problem. By changing to that basis, the vectors and the matrices reveal the information we want. The whole idea of a transform (this book explains the Fourier transform and wavelet transform) is exactly a change of basis. Remember what it means for the vectors wj,..., Ww, to be a basis for R”: 1. The w’s are linearly independent. 2. The nxn 3. Every vector v in R" can be written in exactly one way as a combination of the w’s: matrix W with these columns is invertible. v=cywy +C2W2 +--+ Here is the key point: Those coefficients cj,...,¢, + CpWn. (1) completely describe the vector v, after we have decided on the basis. Originally, a column vector v just has the components v1,..., Un. In the new basis of w’s, the same vector is described by the different set of numbers cj, ...C,. It takes n numbers to describe each vector and it also requires a choice of basis. The n numbers are the coordinates of v in that basis: V1} Cl vere and also Un standard VAX Cn basis A basis is a set of axes for R”. (2) The coordinates c;,...,cC, basis of w’s tell how far to go along each axis. The axes are at right angles when the w’s are orthogonal. Small point: What is the “standard basis’? Those basis vectors are simply the columns of the n by n identity matrix /. These columns e),...,@, are the “default basis.’ When I write down the vector v = (2,4, 5) in R?>, Tam intending and you are expecting the standard basis (the usual xyz axes, where the coordinates are 2, 4, 5): v = 2e; + 4e2 +5e3 1 0 0 2 =2/0}/+4)/1)+5)/0)}/=]4 0 0 l 5 The new question is: What are the coordinates c, C2, c3 in the new basis w,, W2, W3? As usual we put the basis vectors into the columns of a matrix. This is the basis matrix W. Then the fundamental equation v = cjw; + -:: + CW, has the matrix form v = We. From this we immediately know c. 7C The coordinates c = (cj,..., Cn) of a vector v in the basis w1,..., Wy are given by c= W-!'v. The change of basis matrix W~' is the inverse of the basis matrix W. 342 = Chapter 7 Linear Transformations The standard basis has W = J. The coordinates in that default basis e;,..., @n are the usual components v},..., Un. Our first new example is the wavelet basis for R*. Example 1 [Wavelet basis] Wavelets are little waves. They have different lengths and they are localized at different places. The first basis vector is not actually a wavelet, it is the very useful flat vector of all ones: Wir 1 1 | wy 1 | 1 —|] Cea —| we | 0 0 0 (3) —|] Those vectors are orthogonal, which is good. 
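Orthogonality of those columns, and the coefficients of any signal, can be checked by machine. This MATLAB sketch is ours; it uses the signal v = (6, 4, 5, 1) that is worked out later in this section:

    W = [1  1  1  0;
         1  1 -1  0;
         1 -1  0  1;
         1 -1  0 -1];         % columns are the wavelet basis vectors w1, w2, w3, w4
    W'*W                      % the diagonal matrix diag(4, 4, 2, 2): orthogonal columns, not unit vectors
    v = [6; 4; 5; 1];
    c = W\v                   % coefficients (4, 1, 1, 2): v = 4w1 + w2 + w3 + 2w4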
You see how w3 is localized in the first half and wa is localized in the second half. The coefficients cz and c4 will tell us about details in the first half and last half of v. The ultimate in localization is the standard basis. Why do want to change the basis? I think of v1, v2, v3, v4 as the intensities of a signal. It could be an audio signal, like music on a CD. It could be a medical signal, like an electrocardiogram. Of course n = 4 is very short, and n = 10,000 is more realistic. We may need to compress that long signal, by keeping only the largest 5% of the coefficients. This is 20 : 1 compression and (to give only one of its applications) it makes modern video conferencing possible. If we keep only 5% of the standard basis coefficients, we lose 95% of the signal. In image processing, most of the image disappears. In audio, 95% of the tape goes blank. But if we choose a better basis of w’s, 5% of the basis vectors can come very close to the original signal. In image processing and audio coding, you can’t see or hear the difference. One good basis vector is a flat vector like w; = (1, 1, 1, 1). That part alone can represent the whole constant background of our image. wave like w4 = (0,0, 1, —1) or in higher dimensions Then a short wg = (0, 0, 0, 0, 0,0, 1, —1) rep- resents a detail at the end of the signal. The three steps of transform and compression and inverse transform are input v — [lossless] coefficients c > [lossy] compressed © —> compressed0 [reconstruct] In linear algebra, where everything is perfect, we omit the compression step. The output 0 is exactly the same as the input v. The transform gives c = W~'v and the reconstruction brings back v = We. In true signal processing, where nothing is perfect but everything is fast, the transform (lossless) and the compression (which only loses unnecessary information) are absolutely the keys to success. I will show those steps for a typical vector like v = (6, 4,5, 1). Its coefficients in the wavelet basis are c = (4, 1, 1,2). This means that v can be reconstructed from 7.3. Change of Basis 343 c using the w’s. In matrix form the reconstruction is v = We: 6 4 ama 1 Those 1 1 la 1 coefficients 1 I les = ¢c= i 0 =I 0 el feel feywl Mic 0 = (4,1,1,2) are W~!v. because the w’s in its columns are orthogonal. Lie al teen Oalahes Ct LOT elt pgs hee es wb i oly Oe Inverting this basis matrix SS, W is easy But they are not unit vectors. So the inverse is the transpose divided by the lengths squared, W~' = (W’ W)~!W?: i4 wi= |— 1 2 1 1 1 1 -1 -!l | _ 0 0 oF ee Obs tila From the ones in the first row of c = W~!v, notice that c) is the average of v1, V2, 3, U4: C| II 6444541 _ 7 = 4, Example 2 (Same wavelet basis by recursion) I can’t resist showing you a faster way to find the c’s. The special point of the wavelet basis is that you can pick off the details in c3 and c4, before the coarse details in cz and the overall average in c,. A picture will explain this “multiscale” method, which is in Chapter | of my textbook with Nguyen on Wavelets and Filter Banks: Split v = (6, 4,5, 1) into averages and waves at small scale and then large scale: fo o™ averages differences /2 Jy average 4 4 Ce ll ey tel J, difference /2 4 4 plus = 1 344 Chapter 7 Linear Transformations Example 3 (Fourier basis) The first thing an electrical engineer does with a signal is to take its Fourier transform. This is a discrete signal (a vector v) and we are speaking about its Discrete Fourier Transform. The DFT involves complex numbers. 
But if we choose n = 4, the matrices are small and the only complex numbers are 7 and 7a Notice that i? = —i because i? = —1. A true electrical engineer would write j instead of i, but we have to draw a line somewhere. (The word imaginary does start with i.) Here are the four basis vectors, in the columns ~ 2 I — = = The first column 1 1 i2 i> i4 j6 PT is the useful flat basis vector signal or the direct current (the DC term). of the Fourier matrix F: Cig (1, 1,1, 1). It represents the average It is a wave at zero frequency. column is (1, —1, 1, —1), which alternates at the highest frequency. The third The Fourier trans- form decomposes the signal into waves at equally spaced frequencies. The Fourier matrix F is absolutely the most important complex matrix in mathematics and science and engineering. The last section of this book explains the Fast Fourier Transform: it is a factorization of F into matrices with many zeros. The FFT has revolutionized entire industries, by speeding up the Fourier transform. The beautiful thing is that F~' looks like F, with i changed to —i: 1 1 Si becomes F-* -+_1{/1 (i) = — i. (ai epi eane 1 (-i (-i)® (-i)? 1 W (Ci)? 1 (i)? The MATLAB command c = fft(v) produces the Fourier coefficients cj,..., Cc, of the vector v. It multiplies v by F —1 (fast). The Dual Basis The columns of W contain the basis vectors w),...,W,. To find the coefficients C],...,€n Of a vector in this basis, we use the matrix W~!. This subsection just introduces a notation and a new word for the rows of W~!. The vectors in those rows (call them wf ,...,u/) are the dual basis. The properties of the dual basis reflect W~'W = J and also WW! = J. The product W~!W takes rows of W~! times columns of W, in other words dot products of the u’s with the w’s. The two bases are “biorthogonal” because we get 1’s and 0’s: aii WW = a uy5: : me n wi---w, |=F 80 Z se w=), ire fiz} 7.3. Change of Basis 345 For an orthonormal basis, the u’s are the same as the w’s. The basis is biorthogonal to itself! The rows in W~! are the same as the columns in W. In other words W-!—WT!. That is the specially important case of an orthogonal matrix W. Other bases are not orthonormal. The axes don’t have to be perpendicular. The basis matrix W can be invertible without having orthogonal columns. When the inverse matrices are in the opposite order WW! = J, we learn something new. The columns are w;, the rows are Th, and each product is a rank one matrix. Multiply columns times rows: = wy +--+ wut = 7. This is the order that we constantly use to change the basis. c= W-~'v. facts that Then we reconstruct c = W~!v and v from We. Use the new The coefficients are in notation to state the basic v= We = WW!0d: n ihe cociicients atom, Cc; = uly l and the vector is v= w;(ulv). I (5S) | The analysis step takes dot products with the dual basis to find the c’s. synthesis step adds up the pieces c;w; to reconstruct the vector v. =» REVIEW OF THE KEY IDEAS The # 1. The new basis vectors w; are the columns of an invertible matrix W. 2. The coefficients of v in this new basis are oe The vector v is reconstructed 4. If there is compression then c becomes ¢ and we reconstruct D = C] Wi +++-+C, Wp. 5. The rows of W~! c = W~'v (the analysis step). as We = cjw, +---+cC,W, (the synthesis step). are the dual basis vectors and c; = ul. Then ul w; = dj}. Problem Set 7.3 1 Express the vectors e = (1,0,0,0) and v = (1, —1, 1, —1) in the wavelet basis, as in equation (4). The coefficients cj, c2, ¢3, 4 solve Wc =e and Wc =v. 
346 = Chapter 7 Linear Transformations 2 Follow Example 2 to represent v = (7,5, 3, 1) in thie.wavelet basis. The first step —1 —1 The last step writes 6, 6, 2, 2 as an overall average plus a difference, using 1, 1, 1, 1 and 1,1, —1, —1. 3 What are the eight vectors in the wavelet basis for R°? They include the long wavelet (1, 1, 1, 1, —1, —1, —1, —1) and the short wavelet (1, —1, 0, 0, 0, 0, 0, 0). 4 The wavelet basis matrix W factors into simpler matrices i 1 elt et eal Ce A a)ul se [a Oe LOS eee vai OTe tie 0 oO “Or t Olt W; and W): 1 |Rea 6 ew8 |Nees hae 1 Salad, lee si OFS OPT at Then W-! = Wy Wr ! allows ¢ to be computed in two steps. The first splitting in Example 2 shows W,- 'y. Then the second splitting applies Wis Find those inverse matrices W, ' and Wi ; directly from W, and W2. Apply them to v = (6,4, 5, 1). 5 The 4 by 4 Hadamard matrix is like the wavelet matrix but entirely +1 and —1: eae ais ries | nace tee ila fe Aen, Me ed (ieee ee Find the matrix H~! and write v = (7, 5, 3, 1) as acombination of the columns of H. 6 Suppose we have two bases vj,..., UV, and wj,..., w, for R”. If a vector has coefficients b; in one basis and c; in the other basis, what is the change of basis matrix in b= Mc? Start from byyy +-++ + ban = VO =cyw, +--+ + c,W, = We. Your answer represents 7(v) = v with input basis of v’s and output basis of w’s. Because of different bases, the matrix is not /. i The dual basis vectors w},..., w, are the columns of W* = (W-')T. Show that the original basis w;,..., w, is “the dual of the dual.” In other words, show that the w’s are the rows of (W*)~!. Hint: Transpose the equation WW~! = J. 7.4 DIAGONALIZATION Diagonalization and the Pseudoinverse 347 AND THE PSEUDOINVERSE a 7.4 This short section combines the ideas from Section 7.2 (matrix of a linear transforma- tion) and Section 7.3 (change of basis). The combination produces a needed result: the change of matrix due to change of basis. The matrix depends on the input basis and output basis. We want to produce a better matrix than A, by choosing a better basis than the standard basis. The truth is that all our great factorizations of A can be regarded as a change of basis. But this is a short section, so we concentrate on the two outstanding examples. In both cases the good matrix is diagonal. It is either A or D: 1. S~!AS =A when the input and output bases are eigenvectors of A. 2. U~!AV =) when the input and output bases are eigenvectors of A' A and AA’. You see immediately the difference between A and &. In A the bases are the same. The matrix A must be square. And some square matrices cannot be diagonalized by any S, because they don’t have n independent eigenvectors. In & the input and output bases are different. The matrix A can be rectangular. The bases are orthonormal because A'A and AA! are symmetric. Then U~! = UT and V~! = V!. Every matrix A is allowed, and can be diagonalized. This is the Singular Value Decomposition (SWD) of Section 6.7. I will just note that the Gram-Schmidt factorization A = QR chooses only one new basis. That is the orthogonal output basis given by Q. The input uses the standard basis given by J. We don’t reach a diagonal &, but we do reach a triangular R. You see how the output basis matrix appears on the left and the input basis appears on the right. We start with input basis equal to output basis. Similar Matrices: A and SAS~! and WAW7! We begin with a square matrix and one basis. The input space V is R” and the output space W is also R”. 
The basis vectors are the columns of /. The matrix with this basis is n by n, and we call it A. The linear transformation is just “multiplication by A.” Most of this book has been about one fundamental problem — to make the matrix simple. We made it triangular in Chapter 2 (by elimination), and we made it diagonal in Chapter 6 (by eigenvectors). Now the change in the matrix comes from a change of basis. Here are the main facts in advance. When you change the basis for V, the matrix changes from A to AM. Because V is the input space, the matrix M goes on the right (to come first). When you change the basis for W, the new matrix is M~!'A. We are working with the output space so M~! is on the left (to come last). If you change both bases in the same way, the new matrix is M~'AM. The good basis vectors are the eigenvectors, which go into the columns of M = S. The matrix becomes STAs =A. 348 = Chapter 7 Linear Transformations 7D When the basis contains the eigenvectors x1, ..., Xn, the matrix for T becomes A. Reason To find column 1 of the matrix, input the first basis vector x;. The transformation multiplies by A. The output is Ax; = A, x1. This is A, times the first basis vector plus zero times the other basis vectors. Therefore the first column of the matrix is (A;,0,..., 0). In the eigenvector basis, the matrix is diagonal. a on 0 H projects projection 3] x to zero projects to x, Figure 7.3 Projection on the 135° line y = —x. Standard basis vs. eigenvector basis. Example 1 Find the diagonal matrix that projects onto the 135° line y = —x. The standard vectors (1,0) and (0, 1) are projected in Figure 7.3. In the standard basis ries oust Solution The eigenvectors for this projection are x; = (1, —1) and x2 = (1,1). The first eigenvector lies on the projection line and the second is perpendicular (Figure 7.3). Their projections are x; and 0. The eigenvalues are 4; = 1 and Az = O. In the eigenvector basis the projection matrix is eI; al What if you choose a different basis like vj} = w; = (2,0) and v2 = w2 = (1, 1)? Since w) is not an eigenvector, the matrix B in this basis will not be diagonal. The first way to compute B follows the rule of Section 7.2: Find column j of the matrix by writing the output Av; as a combination of w’s. Apply the projection T to (2,0). The result is (1, —1) which is w; — w2. So the first column of B contains 1 and —1. The second vector wz = (1, 1) projects to zero, so the second column of B contains 0 and 0: b= E ilin the basis wy, wo. (1) 7.4 Diagonalization and the Pseudoinverse 349 The second way to find the same B is more insightful. Use W~! and W to change between the standard basis and the basis of w’s. Those change of basis matrices from Section 7.3 are representing the identity transformation. The product of transformations is just /T7, and the product of matrices is B = W~!AW. This matrix is similar to A. 7E For the basis w,..., Wy find the matrix B in three steps. Change the input basis to the standard basis with W. The matrix in the standard basis is A. Then change the output basis back to the w’s with W~!. The product B = W~'AW represents /T/: Bw’s to w’s = =i Woes aed to w’s Example 2 (continuing with the projection) Astandard Ww’s to standard Apply this W~!AW (2) rule to find B for the basis (2,0) and (1, 1): Was a Pr. Fea | ea |oa [eh sao Ee | eet ia ee ta 0 The W~!AW tule has produced the same B as in equation (1). A change of basis produces a similarity transformation in the matrix. The matrices A and B are similar. 
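The rule B = W^(-1) A W is easy to test by machine. This short MATLAB check of Example 2 is ours:

    A = [ 1/2 -1/2;
         -1/2  1/2];          % projection onto the 135 degree line y = -x, standard basis
    W = [2 1;
         0 1];                % columns are the new basis w1 = (2,0) and w2 = (1,1)
    B = inv(W)*A*W            % gives [1 0; -1 0], the matrix B of equation (1)
    eig(A), eig(B)            % both lists are 0 and 1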
They have the same eigenvalues (1 and 0). The Singular Value Decomposition (SVD) Now the input basis can be different from the output basis. In fact the input space R” can be different from the output space R”. The matrix will be m by n, and we want to find it. We call the input basis v,,...,v, and the output basis u1,...,Um. Again the best matrix is diagonal (now m by n). To achieve this diagonal matrix x, each input vector v; must transform into a multiple of u;. That multiple is the singular value oj; on the main diagonal of &: Me The singular values are in the order 0; > 02 > --- (3) > 0,. The reason that the rank r enters is that (by definition) singular values are not zero. The second part of the equation says that v; is in the nullspace for 7 =r-+1,...,n. This gives the correct number n —r of basis vectors for the nullspace. Let me connect the matrices A and & and V and U with the linear transformations they represent. The matrices A and & represent the same transformation. A uses the standard bases for R” and R”, while & uses the input basis of v’s and the output basis of u’s. The matrices V and U give the basis changes; they represent the identity transformations (in R” and R”). The product of transformations is just /7/ again, and it is represented in the v and wu bases by UAV: 350 = Chapter 7 Linear Transformations 7F The matrix © in the new bases comes from A in the standard bases by Xy’s to w’s = Oeren to u’s Astandard V»’s to standard: (4) The SVD chooses orthonormal bases (U~! = U™ and V~! = V") that diagonalize A. The two orthonormal bases in the SVD are the eigenvector bases for A'A (the v’s) and AA! (the u’s). Since those are symmetric matrices, their unit eigenvectors are orthonormal. Their eigenvalues are the numbers 07. To see why those bases diagonalize the matrix — to see that equation (3) holds — we use another proof by parentheses. With v; as a unit eigenvector of A'A, the first line shows that Av; has length o;. The second line finds its direction u;. Start from ATAD; = OF vj: Poet 0; vj©. A’ I, Avj = oj; AA‘ AD; = 0; Av; gives ‘ gives Av; |"2 = a;2 uj = Av;/oj; so that ||/Av;|| = 0;. as a unit eigenvector of AAT (5) XG) The eigenvector u; of AA! has length 1, because of (5). The equation Avj = ojuj says that these bases diagonalize A. Polar Decomposition Every complex number has the polar form re’®. A nonnegative number r multiplies a number on the unit circle. (Remember that |e!?| = |cos@ +isin@| = 1.) Thinking of these numbers as | by | matrices, r > 0 corresponds to a positive semidefinite matrix (call it H) and e’? corresponds to an orthogonal matrix Q. The SVD extends this rel? factorization to matrices (even m by n with rectangular Q). 7G Every real square matrix can be factored into A = QH, where Q is orthogonal and H is symmetric positive semidefinite. If A is invertible then H is positive definite. For the proof we just insert V'V = / into the middle of the SVD: A=UXV! =(UV')(V=XV') =(Q)(A). (7) The first factor UV' is Q. The product of orthogonal matrices is orthogonal. The second factor VXV' is H. It is positive semidefinite because its eigenvalues are in D. If A is invertible then H is also invertible—it is symmetric positive definite. H is the square root of A'A. Equation (7) says that H? = Vd2V! = ATA. There is also a polar decomposition A = K Q in the reverse order. Q is the same but now K = UXU!. This is the square root of AA?. 
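A numerical experiment shows both polar factorizations at once. This MATLAB sketch is ours, with an example matrix of our own choosing (not necessarily the matrix of Example 3 below):

    A = [ 2  2;
         -1  1];              % any real square matrix
    [U, S, V] = svd(A);       % A = U*S*V'
    Q = U*V';                 % orthogonal factor
    H = V*S*V';               % symmetric positive semidefinite factor
    norm(A - Q*H)             % zero (up to roundoff): A = QH
    norm(Q'*Q - eye(2))       % zero: Q is orthogonal
    K = U*S*U';               % the reverse-order factor
    norm(A - K*Q)             % zero again: A = KQ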
7.4 Example 3 Find the polar decomposition as ale i =i 0 Solution The orthogonal part is Vuvi=@Q"!A: Diagonalization and the Pseudoinverse 351 A = QH from its SVD: 2 2/961 UV!. Q = —1/V2 Wel A/S 1//2 Say aR al The positive definite part is H = et ollie ival-Line ial eAlyA WAIL BY) In mechanics, the polar decomposition separates the rotation (in Q) from the stretching (in H). The eigenvalues of H are the singular values of A. They give the stretching factors. The eigenvectors of H are the eigenvectors of A! A. They give the stretching directions (the principal axes). The polar decomposition just splits the key equation Av; = o;u; into two steps. The “H” part multiplies v; by o;. The “Q” part swings the v direction around to the u direction. The other order A = K Q swings v’s to w’s first (with the same Q). Then K multiplies u; by oj; to complete the job of A. The Pseudoinverse By choosing good bases, the action of A has become clear. It multiplies v; in the row space to give oju; in the column space. The inverse matrix must do the opposite! If Av = ou then A~'u = v/o. The singular values of A~! are 1/o, just as the eigenvalues of A~! are 1/A. The bases are reversed. The w’s are in the row space of A~!, the v’s are now in the column space. Until this moment we would have added the words “if A~! exists.” Now we don’t. A matrix that multiplies u; to produce v;/o; does exist. It is denoted by A*: 07; te oe Vi -°* Ur -° + Un T . n by n Uy: ++ Uy ++ +Um m by m n by m (8) At is the pseudoinverse of A. It is ann by m matrix. If A~! exists (we said it again), then At is the same as A~!. In that case m =n =r and we are inverting ULV! to get VD !U!. The new symbol At is needed when r < m or r <n. Then A has no two-sided inverse, but it has a pseudoinverse A* with these properties: 1 Atu; =—v; Oj fori<r and Atu;=0 ' fori>r. 352 Chapter 7 Linear Transformations nullspace of AT Figure 7.4 When we know A is invertible from row space to column space. At inverts it. what happens to each basis vector uj, we know A‘. The vectors uj,,...,U, in the column space of A go back to the row space. The other vectors Uy+1,..-,Um are in the left nullspace, and A* sends them to zero. Notice the pseudoinverse + of the diagonal matrix ©. Each singular value o is just replaced by o~!. The product ©*D is as near to the identity as we can get. We can’t do anything about the zero rows and columns, and we do get an r by r identity matrix. This example has o; = 2 and o2 = 3: St! Lie) Oks 0 30: > “Ove SAA OR Or Example 4 Find the pseudoinverse of A = rank is 1. The singular value is Atevetyt 2ecO *0 HCOr 230 107 Os 708 30 es Teata0 FOO 0 0 0 a2 iouk ae, : : This matrix is not invertible. The 10. That is inverted in D7: a= 1 a Bd bcs ey./S ONO) A654 ve | 0 OR Ds al 2 Sle ae el 9)12 ie io Lo oe A* also has rank 1. Its column space is the row space of A. When A takes (1, 1) in the row space to (4,2) in the column space, At does the reverse. Every rank one 7.4 matrix is a column times a row. Diagonalization and the Pseudoinverse With unit vectors uw and v, that is A = ouv!. 353 Then the best inverse we have is At = vu'/o. The product AA* is uu!, the projection onto the line through u. The product AtA is vv!, the projection onto the line through v. For all matrices, AA*+ and AtA are the projections onto the column space and row space. The shortest least squares solution to Ax Any other vector that solves the normal equation =» =b is xt = A*b (Problem 18). A' Ax = A'D is longer than x*. REVIEW OF THE KEY IDEAS = 1. 
Diagonalization S~'AS = A is the same as a change to the eigenvector basis. 2. |The SVD chooses an input basis of v’s and an output basis of u’s. thogonal bases diagonalize A. 3. Polar decomposition 4. The pseudoinverse At = VX*U! transforms the column space of A back to its row space. Then A‘A is the identity on the row space (and zero on the nullspace). Those or- factors A into (positive definite) times (orthogonal matrix). Problem Set 7.4 Problems 1 1-6 compute and use the SVD of a particular matrix (not invertible). Compute A'A and its eigenvalues and unit eigenvectors v; and v2: s=[3 | What is the only singular value o;? The rank of A is 2 3 r= 1. (a) Compute AA! and its eigenvalues and unit eigenvectors u; and up. (b) Verify from Problem From the u’s and 1 that Av; = o,u;. Put numbers Bab elt df =] v’s write down orthonormal bases for the four fundamental subspaces of this matrix A. 4 into the SVD: Describe all matrices that have those same four subspaces. 354 Chapter 7 From Linear Transformations U, V, and & find the orthogonal matrix H = VXV!. matrix Q = Verify the polar decomposition U V! and the symmetric A = QH. This H is only semidefinite because Compute the pseudoinverse At = VX+U?. 1/o,;. Rename The diagonal matrix &* contains the four subspaces (for A) in Figure 7.4 as four subspaces for At. Compute ATA and AAT. Problems ys 7-11 are about the SVD of an invertible matrix. Compute A'A and its eigenvalues and unit eigenvectors v; and v7. What are the singular values o; and o2 for this matrix A? [2 3 AA! has the same eigenvalues oF and os as A'A. Find unit eigenvectors wu and uw. Put numbers into the SVD: Ef ell? elle of. In Problem 8, multiply columns times rows to show that A = oud, + 02U20>. Prove from A = UXV' that every matrix of rank r is the sum of r matrices of rank one. 10 From U, V, and & find the orthogonal matrix K = UXU'. KQ. 11 The pseudoinverse of this A is the same as Problems 12 Q = UV! and the symmetric A = because 12-13 compute and use the SVD of a | by 3 rectangular matrix. Compute matrix is UP matrix Verify the polar decomposition in the reverse order A'A and AA! and their eigenvalues and unit eigenvectors when the A=[3 4 0]. What are the singular values of A? Put numbers into the singular value decomposition of A: A [3 4 0] = [wil [ou 0 0] [v1 v2 T v3| 9 Put numbers into the pseudoinverse of A. Compute AA* and A* A: At= =|v, MW 3 0 [wi ]'. 7.4 Diagonalization and the Pseudoinverse 355 14 What is the only 2 by 3 matrix that has no pivots and no singular values? What is © for that matrix? At is the zero matrix, but what shape? 15 If det A= 0 how do you know that detAt = 0? 16 When are the factors in UXV! the same as in QAQ™? The eigenvalues 4; must be positive, to equal the o;. Then A must be and positive Problems 17-20 bring out the main properties of A* and x* = ATb. 17 Suppose all matrices have rank one. pales Az 2 i Tl The vector b is (bj, b2). 2) al 7 |.8 <4 sen bee =|; 4 ave =| ei < ae el (a) The equation A! Ax = A'b has many solutions because A‘A is (b) Verify thatxt = Atb = (.2b; +.1b2, .2b; +.1b2) does solve ATAxt = A'D. (c) AAT projects onto the column space of A. Therefore _____ projects onto the nullspace of A?. Then A'(AAt+ —/)b =0. This gives A'Ax+ = A™b and x can be x". 18 The vector x+ is the shortest possible solution to A’AX = A'b. Reason: The difference x — x* is in the nullspace of A'A. This is also the nullspace of A. Explain how it follows that ab a lesaelier ala eae |b Any other solution x has greater length than x*. 
19 Every b in R” is p+e. This is the column space part plus the left nullspace part. Every x in R” is x, +X, = (row space part) + (nullspace part). Then AAt p= 20 AAte = At AT AX, = Find At and AtA and AA? for the 2 by 1 matrix whose SVD is 3 6 —8]15 .s H 7 fe 4 Ac 21 A general 2 by 2 matrix A is determined by four numbers. If triangular, it is determined by three. If diagonal, by two. If a rotation, by one. Check that the total count is four for each factorization of A: LU 22 LDU ORs Gosvo an Ns a Following Problem 21, check that LDL! and QAQ! numbers. This is correct because the matrix A is are determined by three 356 Chapter 7 Linear Transformations 23 sibs. A new factorization! Factor Axed : | into A = EH, with 1’s on the diagonal and H is symmetric. : : where E is lower triangular When is this impossible? 24 Suppose vj,...v, and u;,...,u, are bases for the row space and column space of A. Describe all possible matrices A. 25 A pair of singular vectors v and u will satisfy Av = ou and A'u = ov. This ul. means that the double vector x = 4 is an eigenvector of what symmetric matrix? With what eigenvalue? APPEICAII@ IN: GRAPHS AND NETWORKS a 8.1 This chapter is about five selected applications of linear algebra. We had many applications to choose from. Any time you have a connected system, with each part depending on other parts, you have a matrix. Linear algebra deals with interacting systems, provided the laws that govern them are linear. Over the years I have seen one model so often, and found it so basic and useful, that I always put it first. The model consists of nodes connected by edges. This is called a graph. Graphs of the usual kind display functions f(x). Graphs of this different kind (m edges connecting n nodes) lead to matrices. This section is about the incidence matrix A of a graph—and especially about the four subspaces that come with it. For any m by n matrix there are two subspaces in R” and two in R”. They are the column spaces and nullspaces of A and A‘. Their dimensions are related by the most important theorem in linear algebra. The second part of that theorem is the orthogonality of the subspaces. Our goal is to show how examples from graphs illuminate the Fundamental Theorem of Linear Algebra. We review the four subspaces (for any matrix). Then we construct a directed graph and its incidence matrix. The dimensions will be easy to discover. But we want the subspaces themselves—this is where orthogonality helps. It is essential to connect the subspaces to the graph they come from. By specializing to incidence matrices, the laws of linear algebra become Kirchhoff’s laws. Please don’t be put off by the words “current” and “potential” and “Kirchhoff.’ These rectangular matrices are the best. Every entry of an incidence matrix is 0 or 1 or —1. This continues to hold during elimination. All pivots and multipliers are +1. Therefore both factors in A = LU also contain 0,1,—1. So do the nullspace matrices! All four subspaces have basis vectors with these exceptionally simple components. The matrices are not concocted for a textbook, they come from a model that is absolutely essential in pure and applied mathematics. 357 358 Chapter 8 Applications nullspace nullspace of A cr of A dim n—r Figure 8.1 The four subspaces with their dimensions and orthogonality. Review of the Four Subspaces Start with an m by n matrix. Its columns are vectors in R”. Their linear combinations produce the column space C(A), a subspace of R”. 
Those combinations are exactly the matrix-vector products Ax. The rows of A are vectors in R” (or they would be, if they were column vectors). Their linear combinations produce the row space. To avoid any inconvenience with rows, we transpose the matrix. The row space becomes C(A‘), the column space of A. The central questions of linear algebra come from these two ways of looking at the same numbers, by columns and by rows. The nullspace N(A) contains every x that satisfies Ax = 0—this is a subspace of R". The “left” nullspace contains all solutions to A! y = 0. Now y has m components, and N(A‘) is a subspace of R”. Written as y'A = 0!, we are combining rows of A to produce the zero row. The four subspaces are illustrated by Figure 8.1, which shows R” on one side and R™ on the other. The link between them is A. The information in that figure is crucial. First come the dimensions, which obey the two central laws of linear algebra: dim C(A) = dim C(A!) and dim C(A) + dim N(A) =n. When the row space has dimension r, the nullspace has dimension n — r. Elimination leaves these two spaces unchanged, and the echelon form U gives the dimension count. There are r rows and columns with pivots. There are n—r free columns without pivots, and those lead to vectors in the nullspace. 8.1. Graphs and Networks 359 The following incidence matrix A comes from a graph. Its echelon form is U: BN mie adeno one nO (att 36 tekeiae Oh ple| (oe we (re oun n wen = =f (aoe Tees us Ou 0G al aoe oe Onto 0 0 6 0 0 0. 06 The nullspace of A and U is the line through x = (1, 1, 1,1). The column spaces of A and U have dimension r = 3. The pivot rows are a basis for the row space. Figure 8.1 shows more—the subspaces are orthogonal. Every vector in the nullspace is perpendicular to every vector in the row space. This comes directly from the m equations Ax = 0. For A and U above, x = (1, 1, 1, 1) is perpendicular to all rows and thus to the whole row space. This review of the subspaces applies to any matrix A—only the example was special. Now we concentrate on that example. It is the incidence matrix for a particular graph, and we look to the graph for the meaning of every subspace. Directed Graphs and Incidence Matrices Figure 8.2 displays a graph with m = 6 edges and n = 4 nodes, so the matrix A is 6 by 4. It tells which nodes are connected by which edges. The entries —1 and +1 also tell the direction of each arrow (this is a directed graph). The first row of A gives a record of the first edge: The first edge goes from node | to node 2. The first row has —1 in column | and +1 in column 2. node re Figure 8.2a -l1 1 0 =| ee)ee 0-1 1 =i uO (0 0-1 0 OF R01" 0 eC) 0 al 1 4 1 2, 3 4 5 6 Complete graph with m = 6 edges and n = 4 nodes. Row numbers are edge numbers, column numbers are node numbers. You can write down A immediately by looking at the graph. edge 360 == Chapter 8 Applications The second graph has the same four nodes but only three edges. matrix is 3 by 4: Its incidence @ node O®O® at ® @ ton 0 0/0 1 | Oy Oey yal | 2 3 edge @ 2 Figure 8.2b Tree with 3 edges and 4 nodes and no loops. The first graph is complete—every pair of nodes is connected by an edge. The second graph is a tree—the graph has no closed loops. Those graphs are the two extremes, with the maximum number of edges m = $n(n — 1) and the minimum number m =n—1. We are assuming that the graph is connected, and it makes no fundamental difference which way the arrows go. 
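You can also hand this incidence matrix directly to MATLAB. The short check below is ours; it confirms the rank and the nullspace found above for the complete graph on four nodes:

    A = [-1  1  0  0;         % edge 1 goes from node 1 to node 2
         -1  0  1  0;         % edge 2: node 1 to node 3
          0 -1  1  0;         % edge 3: node 2 to node 3
         -1  0  0  1;         % edge 4: node 1 to node 4
          0 -1  0  1;         % edge 5: node 2 to node 4
          0  0 -1  1];        % edge 6: node 3 to node 4
    rank(A)                   % 3 = n - 1 for this connected graph on n = 4 nodes
    A*ones(4,1)               % the zero vector: x = (1,1,1,1) is in the nullspace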
On each edge, flow with the arrow is “positive.” Flow in the opposite direction counts as negative. The flow might be a current or a signal or a force—or even oil or gas or water. The rows of B match the nonzero rows of U—the echelon form found earlier. Elimination reduces every graph to a tree. The loops produce zero rows in U. Look at the loop from edges 1, 2, 3 in the first graph, which leads to a zero row: —1 [ae = Qe Ort Oh 0 Sone = = 1 Or 0 jae 0 1 1 = =1 0 0; 0 =I Or=1 0 0 — 0 1 0 0 0 0 Those steps are typical. When two edges share a node, elimination produces the “shortcut edge” without that node. If the graph already has this shortcut edge, elimination gives a row of zeros. When the dust clears we have a tree. An idea suggests itself: Rows are dependent when edges form a loop. Independent rows come from trees. This is the key to the row space. For the column space we look at Ax, which is a vector of differences: — —1 (Oi —1 0 1 0 xX] 0 -1 1 0 ND 0 -1 0 1 X4 gee 2] Que x2 — Xx] Wee 1 X3.— X] | Xx3,\— X2 oe |x, — x, x4 — X2 Boae Se: ® 8.1 The unknowns x), x2,.x3,x4 represent potentials Graphs and Networks at the nodes. Then Ax 361 gives the potential differences across the edges. It is these differences that cause flows. now examine the meaning of each subspace. We 1 The nullspace contains the solutions to Ax = 0. All six potential differences are zero. This means: All four potentials are equal. Every x in the nullspace is a constant vector (c,c,c,c). The nullspace of A is a line in R’—its dimension is n —r = 1. The second incidence matrix B has the same nullspace. It contains (1, 1, 1, 1): Baa —1 Ol 0 ly i 0 yh | ue , 0 0 0 We can raise or lower all potentials by the same amount c, without changing the differences. There is an “arbitrary constant” in the potentials. Compare this with the same statement for functions. We can raise or lower f(x) by the same amount C, without changing its derivative. There is an arbitrary constant C in the integral. Calculus adds “+C” to indefinite integrals. Graph theory adds (c, c,c,c) to the vector x of potentials. Linear algebra adds any vector x, in the nullspace to one par- ticular solution of Ax = b. The “+C” disappears in calculus when the integral starts at a known point x = a. Similarly the nullspace disappears when we set x4 = 0. The unknown x4 is removed and so are the fourth columns of A and B. Electrical engineers would say that node 4 has been “grounded.” 2 The row space contains all combinations of the six rows. Its dimension is certainly motccix. The rank isy = 3, as “we Ihe equation ry also saw from elimination. not independent. How — 1) = nm must be 3--l=4. After 3 edges, we start forming loops! The new rows are can we tell if v = (v1, v2, v3, v4) is in the row space? The slow way is to combine rows. The quick way is by orthogonality: v is in the row space if and only if it is perpendicular to (1, 1, 1, 1) in the nullspace. The vector v = (0, 1, 2, 3) fails this test—its components add to 6. The vector (—6, 1, 2, 3) passes the test. It lies in the row space because its components add to zero. It equals 6(row 1) + S(row 3) + 3(row 6). Each row of A adds to zero. This must be true for every vector in the row space. 3 The column space contains all combinations of the four columns. independent columns, since there were three independent rows. We expect three The first three columns are independent (so are any three). But the four columns add to the zero vector, which says again that (1, 1,1, 1) is in the nullspace. 
How can we tell if a particular vector b is in the column space? First answer Try to solve Ax = b. As before, orthogonality gives a better answer. We are now coming to Kirchhoff’s two famous laws of circuit theory—the voltage law and 362 Chapter 8 Applications current law. Those are natural expressions of “laws” of linear algebra. It is especially pleasant to see the key role of the left nullspace. Second answer Ax is the vector of differences in equation (1). If we add differences around a closed loop in the graph, the cancellation leaves zero. Around the big triangle formed by edges 1,3, —2 (the arrow goes backward on edge 2) the differences are (x2 —*1) + 3 = X42) — (a = 11) = 0. This is the voltage law: The components of Ax add to zero around every loop. When b is in the column space, it must obey the same law: Kirchhoff’s Voltage Law: b, + b3 — bz = 0. By testing each loop, we decide whether b is in the column space. Ax = b can be solved exactly when the components of b satisfy all the same dependencies as the rows of A. Then elimination leads to 0 = 0, and Ax = B 1s consistent. 4 The left nullspace contains the solutions to A' y = 0. Its dimension is m—r m9 Mats Daa eh 1 ana 0 i — —1 oT 1 01] » 0 Y3 -1|| 0 = 4] =|0 Sits = 6—3: 0 0 ” The true number of equations is r = 3 and not n = 4. Reason: The four equations add to 0 = 0. The fourth equation follows automatically from the first three. What do the equations mean? The first equation says that —y; — y2 — yq4 = 0. The net flow into node 1 is zero. The fourth equation says that y4 + ys + ye = 0. Flow into the node minus flow out is zero. The equations A' y = 0 are famous and fundamental: Kirchhoff’s Current Law: Flow in equals flow out at each node. This law deserves first place among the equations of applied mathematics. It expresses “conservation” and “continuity” and “balance.” Nothing is lost, nothing is gained. When currents or forces are in equilibrium, the equation to solve is Aly = 0. Notice the beautiful fact that the matrix in this balance equation is the transpose of the incidence matrix A. What are the actual solutions to Aly = 0? The currents must balance themselves. The easiest way is to flow around a loop. If a unit of current goes around the big triangle (forward on edge 1, forward on 3, backward on 2), the vector is y = (1, —1, 1,0, 0,0). This satisfies A'y = 0. Every loop current yields a solution y, because flow in equals flow out at every node. A smaller loop goes forward on edge 1, forward on 5, back on 4. Then y = (1,0,0, —1, 1,0) is also in the left nullspace. 8.1 Graphs and Networks 363 We expect three independent y’s, since 6 — 3 = 3. The three small loops in the graph are independent. The big triangle seems to give a fourth y, but it is the sum of flows around the small loops. The small loops give a basis for the left nullspace. 1 0 0 0 0 1 0 =| Oy 1 —1 l 1 0 —] I 0 ae) 0 0 <li 2 3 a Oe tin bikeah l inden small loops big loop Summary The incidence matrix A comes from a connected graph with n nodes and m edges. The row space and column space have dimensions n — 1. The nullspaces have dimension | and m—n +1: 1 The constant vectors (c,c,...,c) make up the nullspace of A. Z There are r = n — | independent rows, using edges from any tree. 3 Voltage law: The components of Ax add to zero around every loop. 4 Current law: A‘ y = 0 is solved by loop currents. N(A‘) has dimension m—r. There are m—r =m —n +1 independent loops in the graph. 
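Kirchhoff's current law can also be verified numerically. This MATLAB check is ours; it feeds the two loop currents mentioned above into the transpose of the incidence matrix:

    A = [-1 1 0 0; -1 0 1 0; 0 -1 1 0; -1 0 0 1; 0 -1 0 1; 0 0 -1 1];   % incidence matrix as before
    y_big   = [1; -1; 1; 0; 0; 0];   % around the big triangle: forward on edges 1 and 3, back on edge 2
    y_small = [1; 0; 0; -1; 1; 0];   % around a small loop: forward on edges 1 and 5, back on edge 4
    A'*y_big, A'*y_small             % both give the zero vector: flow in equals flow out at every node
    size(null(A'), 2)                % 3 = m - n + 1 independent loops in the left nullspace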
For every graph in a plane, linear algebra yields Euler’s formula: (number of nodes)—(number of edges)+(number of small loops)=1. This is n —m+(m—n-+1) = 1. The graph in our example has 4 -6+3= 1. A single triangle has (3 nodes)—(3 edges)+(1 loop). On a 10-node tree with 9 edges and no loops, Euler’s count is 10 — 9 + 0. All planar graphs lead to the answer 1. Networks and A'CA In a real network, the current y along an edge is the product of two numbers. One number is the difference between the potentials x at the ends of the edge. This differ- 364 Chapter 8 Applications ence is Ax and it drives the flow. The other number is the “conductance” c—which measures how easily flow gets through. In physics and engineering, c is decided by the material. For electrical currents, c is high for metal and low for plastics. For a superconductor, c is nearly infinite. If we consider elastic stretching, c might be low for metal and higher for plastics. In economics, c measures the capacity of an edge or its cost. To summarize, the graph is known from its “connectivity matrix” A. This tells the connections between nodes and edges. A network goes further, and assigns a conductance c to each edge. These numbers cj,..., Cm go into the “conductance matrix” C—which is diagonal. For a network of resistors, the conductance is c = | /(resistance). In addition to Kirchhoff’s laws for the whole system of currents, we have Ohm’s law for each particular current. Ohm’s law connects the current y; on edge | to the potential difference x2—x, between the nodes: Ohm’s Law: Current along edge = conductance times potential difference. Ohm’s law for all m currents is y = —CAx. The vector Ax gives the potential differences, and C multiplies by the conductances. Combining Ohm’s law with Kirchhoff’s current law Aly = 0, we get A'CAx = 0. This is almost the central equation for network flows. The only thing wrong is the zero on the right side! The network needs power from outside—a voltage source or a current source—to make something happen. Note about signs In circuit theory we change from Ax to —Ax. The flow is from higher potential to lower potential. There is (positive) current from node | to node 2 when x; — x2 is positive—whereas Ax was constructed to yield x2 — xj. The minus sign in physics and electrical engineering is a plus sign in mechanical engineering and economics. Ax versus —Ax is a general headache but unavoidable. Note about applied mathematics Every new application has its own form of Ohm’s law. For elastic structures y = CAx is Hooke’s law. The stress y is (elasticity C) times (stretching Ax). For heat conduction, Ax is a temperature gradient. For oil flows it is a pressure gradient. There is a similar law for least square regression in statistics. My textbook Introduction to Applied Mathematics (Wellesley-Cambridge Press) is practically built on “A'CA.” This is the key to equilibrium in matrix equations and also in differential equations. Applied mathematics is more organized than it looks. I have learned to watch for A'CA. We now give an example with a current source. Kirchhoff’s law changes from Aly = 0to Aly = f, to balance the source f from outside. Flow into each node still equals flow out. Figure 8.3 shows the network with its conductances c},..., C6, and it shows the current source going into node |. The source comes out at node 4 to keep the balance (in = out). The problem is: Find the currents yj,..., yo on the six edges. 8.1 Figure 8.3. 
The currents in a network Graphs and Networks with a source S into node 365 1. Example 1 All conductances are c = 1, so that C = J. A current yg travels directly from node | to node 4. Other current goes the long way from node | to node 2 to node 4 (this is y; = ys). Current also goes from node | to node 3 to node 4 (this is y2 = yo). We can find the six currents by using special rules for symmetry, or we can do it right by using A'CA. Since C = /, this matrix is A! A: Seer te eet it So POP Oe ee Theme 1 al -1 O0|| 0 -1 1° 0} Oat iierx 0 =1}|=-1 00 0 1 | 7 0 iia Sy eee Ome tnd) ai Create 21 That last matrix is not invertible! (1,1, 1, 1) is in the nullspace. We cannot 3g See Reed =e, oder solve for all four potentials because One node has to be grounded. Setting x4 = 0 removes the fourth row and column, and this leaves a 3 by 3 invertible matrix. Now we solve A'CAx = f for the unknown potentials x1, x2, x3, with source S into node 1: 3 =e) —-1 -1 -!l xX] iS -1 y= 3 oe x3 0 0 Ohm’s law y = —CAx yI y2 y3 yah ys 6 gives x] S/2 Kon X3 oye S/4 yields the six currents. Remember C = / and x4 = 0: = RRS OPO A101 nul Ol S72 Ok: Arcuri} aS, WN eaariieaivine) ARORR iM ApIS 48 ain ata Gi ory 0 Ora I Half the current goes directly on edge 4. That is yy = S/2. node 2 to node 3. Symmetry indicated y3 = 0 and now S/4 5/4 SO Se 5/4 S/4 No current crosses from the solution proves it. 366 Chapter 8 Applications The same matrix A'A appears in least squares. Nature distributes the currents to minimize the heat loss, where statistics chooses ¥ to minimize the error. Problem Set 8.1 Problems 1-7 and 8-14 are about the incidence matrices for these graphs. \ edge 1 2 3 4 edge 2 edge 3 1 1 Write down the 3 by 3 incidence matrix A for the triangle graph. The first row has —1 in column | and +1 in column 2. What vectors (x1, x2, x3) are in its nullspace? How do you know that (1, 0,0) is not in its row space? d Write down A! for the triangle graph. Find a vector y in its nullspace. The components of y are currents on the edges—how much current is going around the triangle? 3 Eliminate x; and x2 from the third equation to find the echelon matrix U. What tree corresponds to the two nonzero rows of U? Kista Dy —xX, +x3 = b2 —x2 + x3 = b3. 4 Choose a vector (b,, b2, b3) for which Ax = b can be solved, and another vector b that allows no solution. How are those b’s related to y = (1, —1, 1)? 5 Choose a vector (1, f2, f3) for which A’ y = f can be solved, and a vector iy that allows no solution. How are those f’s related to x = (1,1, 1)? The equation Aly = f is Kirchhoff’s law. 6 Multiply matrices to find A'A. Choose a vector f for which A'Ax = f can be solved, and solve for x. Put those potentials x and the currents y = —Ax and current sources f onto the triangle graph. Conductances are 1 because C = /. i With conductances c} = | and cz = cz = 2, multiply matrices to find ATCA. For f = (1,0, —1) find a solution to A'CAx = f. Write the potentials x and currents y = —C Ax on the triangle graph, when the current source f goes into node | and out from node 3. 8.1 Graphs and Networks 367 Write down the 5 by 4 incidence matrix A for the square graph with two loops. Find one solution to Ax = 0 and two solutions to Aly = 0. Find two requirements on the b’s for the five differences x2 — x1, x3 —X1,.X3—X, x4 — X2,x4 — x3 to equal by, bz, b3,b4, bs. You have found Kirchhoff’s law around the two in the graph. 10 Reduce A to its echelon form U. The three nonzero rows give the incidence matrix for what graph? 
You found one tree in the square graph—find the other seven trees. 11 Multiply matrices to find A'A and guess how its entries come from the graph: 12 13 (a) The diagonal of A'A tells how many (b) The off-diagonals —1 or 0 tell which pairs of nodes are Why is each statement true about into each node. A! A? Answer for A'A not A. (a) Its nullspace contains (b) It is positive semidefinite but not positive definite. (c) Its four eigenvalues are real and their signs are With conductances — _ (1, 1,1, 1). Its rank is n — 1. c) = cz = 2 and c3 = c4 = c5 = 3, multiply the matrices A'CA. Find a solution to A'CAx = f = (1,0,0,—1). Write these potentials x and currents y = —CAx on the nodes and edges of the square graph. 14 The matrix A'CA is not invertible. What vectors x are in its nullspace? does A'CAx = f have a solution if and only if fj + fo + f3+ f4 = 0? 15 A connected graph with 7 nodes and 7 edges has how many loops? 16 For the graph with 4 nodes, 6 edges, and 3 loops, add a new node. If you connect it to one old node, Euler’s formula becomes ( it to two old nodes, Euler’s formula becomes 17 18 Why )—( )+( ) = 1. If you connect ( )—( )+( )=1. Suppose A is a 12 by 9 incidence matrix from a connected (but unknown) graph. (a) How many columns of A are independent? (b) What condition on f makes it possible to solve Aly= f? (c) The diagonal entries of A'A give the number What is the sum of those diagonal entries? of edges into each node. Why does a complete graph with n = 6 nodes have m = connecting 6 nodes has ___ edges. 15 edges? A tree 368 = Chapter 8 Applications MARKOV MATRICES AND ECONOMIC MODELS ga 8.2 Early in this book we proposed an experiment. Start with any vector ug = (x, 1 — x). Multiply it again and again by the “transition matrix” A: a= [3 3), The experiment produces uj = Aug and then uz = Au,. After k steps we have Ak uo. Unless MATLAB went haywire, the vectors uo, 1, U2,U3,... approached a “steady state.” That limit state is Uo = (.6,.4). This final outcome does not depend on the starting vector: For every ug we always converge to (.6, .4). The question is why. At that time we had no good way to answer this question. We knew nothing about eigenvalues. It is true that the steady state equation Algo = Ugo could be verified: fs alle}-[2], You would now say that Uo is an eigenvector with eigenvalue 1. That makes it steady. Multiplying by A does not change it. But this equation Aug = Uo does not explain why all vectors ug lead to Uo. Other examples might have a steady state, but it is not necessarily attractive: f.0 pal 3 has the steady state 1 1 Pale In this case, the starting vector ug = (0, 1) will give uw; = (0, 2) and uz = (0,4). The second components are being doubled by the “2” in B. In the language of eigenvalues, B has 4 = | but it also has A = 2—and an eigenvalue larger than one produces instability. The component of wu along that unstable eigenvector is multiplied by 4, and |A| > 1 means blowup. This section is about two special properties of A that guarantee a steady state Uoo. These properties define a Markov matrix, and A above is one particular example: 1 Every entry of A is nonnegative. 2 Every column of A adds to |. B did not have Property 2. When A is a Markov matrix, two facts are immediate: Multiplying a nonnegative uo by A produces a nonnegative uj = Aug. If the components of wp add to 1, so do the components of uj = Auo. Reason: The components of uo add to 1 when [1 vee 1 Juo =I. This4strue-for each column of A by Property 2. 
Then by matrix multiplication it is true for Au0: [1 ⋯ 1]Au0 = [1 ⋯ 1]u0 = 1. The same facts apply to u2 = Au1 and u3 = Au2. Every vector uk = A^k u0 is nonnegative with components adding to 1. These are "probability vectors." The limit u∞ is also a probability vector—but we have to prove that there is a limit! The existence of a steady state will follow from 1 and 2 but not so quickly. We must show that λ = 1 is an eigenvalue of A, and we must estimate the other eigenvalues.

Example 1 The fraction of rental cars in Denver starts at .02. The fraction outside Denver is .98. Every month those fractions (which add to 1) are multiplied by the Markov matrix A:

u1 = Au0 = [.80 .05; .20 .95] [.02; .98] = [.065; .935].

That is a single step of a Markov chain. In one month, the fraction of cars in Denver is up to .065. The chain of vectors is u0, u1, u2, ..., and each step multiplies by A: u1 = Au0, u2 = A²u0, and k steps produce uk = A^k u0. All these vectors are nonnegative because A is nonnegative. Furthermore .065 + .935 = 1.000. Each vector uk will have its components adding to 1. The vector u2 = Au1 is (.09875, .90125). The first component has grown from .02 to .065 to nearly .099. Cars are moving toward Denver. What happens in the long run?

This section involves powers of matrices. The understanding of A^k was our first and best application of diagonalization. Where A^k can be complicated, the diagonal matrix Λ^k is simple. The eigenvector matrix S connects them: A^k equals SΛ^kS⁻¹. The new application to Markov matrices follows up on this idea—to use the eigenvalues (in Λ) and the eigenvectors (in S). We will show that u∞ is an eigenvector corresponding to λ = 1. Since every column of A adds to 1, nothing is lost or gained. We are moving rental cars or populations, and no cars or people suddenly appear (or disappear). The fractions add to 1 and the matrix A keeps them that way. The question is how they are distributed after k time periods—which leads us to A^k.

Solution to Example 1 After k steps the fractions in and out of Denver are the components of A^k u0. To study the powers of A we diagonalize it. The eigenvalues are λ = 1 and λ = .75 (the trace is 1.75, and det(A − λI) = λ² − 1.75λ + .75 = 0). The first eigenvector, with components adding to 1, is x1 = (.2, .8). The second eigenvector is x2 = (−1, 1). Those eigenvectors x1 and x2 are the columns of S. The starting vector u0 is a combination of x1 and x2, in this case with coefficients 1 and .18:

u0 = [.02; .98] = [.2; .8] + .18 [−1; 1].

Now multiply by A to find u1. The eigenvectors are multiplied by λ1 = 1 and λ2 = .75:

u1 = [.2; .8] + (.75)(.18) [−1; 1].

Each time we multiply by A, another .75 multiplies the last vector. The eigenvector x1 is unchanged:

uk = [.2; .8] + (.75)^k (.18) [−1; 1].

This equation reveals what happens. The eigenvector x1 with λ = 1 is the steady state u∞. The other eigenvector x2 gradually disappears because |λ| < 1. The more steps we take, the closer we come to u∞ = (.2, .8). In the limit, .2 of the cars are in Denver and .8 are outside. This is the pattern for Markov chains:

8A If A is a positive Markov matrix (entries aij > 0, each column adds to 1), then λ1 = 1 is larger than any other eigenvalue. The eigenvector x1 is the steady state:

uk = x1 + c2(λ2)^k x2 + ⋯ + cn(λn)^k xn always approaches u∞ = x1.

Assume that the components of u0 add to 1. Then this is true of u1, u2, .... The key point is that we approach a multiple of x1 from every starting vector u0. If all cars start outside Denver, or all start inside, the limit is still u∞ = x1 = (.2, .8).
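The convergence is easy to watch in MATLAB. This is only a sketch of Example 1 (the matrix and the starting vector are the ones given above):

A = [.80 .05;          % Markov matrix of Example 1: each column adds to 1
     .20 .95];
u = [.02; .98];        % starting fractions in and out of Denver
for k = 1:50
    u = A * u;         % one month of the chain: u_{k+1} = A * u_k
end
u                      % very close to the steady state (.2, .8)
[S, Lam] = eig(A);     % eigenvalues .75 and 1
[lam1, j] = max(diag(Lam));
x1 = S(:, j) / sum(S(:, j))    % eigenvector for lambda = 1, scaled to add to 1

Fifty steps are more than enough: (.75)^k drops below 10^(-6) near k = 48, so the printed u agrees with (.2, .8) to about six decimal places.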
The first point is to see that 4 = | is an eigenvalue of A. Reason: Every column of A —J adds to |—1=0. The rows of A —/ add up to the zero row. Those rows are linearly dependent, so A — / is singular. Its determinant is zero and 4 = | is an eigenvalue. Since the trace of A was 1.75, the other eigenvalue had to be A2 = .75. The second point is that no eigenvalue can have |A| > 1. With such an eigenvalue, the powers A* would grow. But A* is also a Markov matrix with nonnegative entries adding to 1—and that leaves no room to get large. A lot of attention is paid to the possibility that another eigenvalue has |A| = 1. Suppose the entries of A or any power A* are all positive—zero is not allowed. In this “regular” case A = | is strictly bigger than any other eigenvalue. When A and its powers have zero entries, another eigenvalue could be as large as A, = 1. Example 2 A =[{)] has no steady state because A7 = —1. This matrix sends all cars from inside Denver to outside, and vice versa. The powers A* alternate between A and J. The second eigenvector x2 = (—1, 1) is multiplied by A2 = —1 at every step—and does not become smaller. With a regular Markov matrix, the powers A* approach the rank one matrix that has the steady state x, in every column. 8.2 Markov Matrices and Economic Models 371 Example 3 (“Everybody moves” ) Start with three groups. At each time step, half of group | goes to group 2 and the other half goes to group 3. The other groups also split in half and move. If the starting populations are pj), p2, p3, then after one step the new populations are Ue uy =Auo=|5 0 i 5 7 1 1 | iia 1 D2 hes 5]| po |=| 5p: +423 O - SPi+4p2 A is a Markov matrix. Nobody is born or lost. It is true that A contains zeros, which gave trouble in Example 2. But after two steps in this new example, the zeros disappear from A?: aS 2 uy=AUp=|5G De 1 AL I 5 Lore 4 4 el 4 1 Pi |Z P2 P3 2 What is the steady state? The eigenvalues of A are 4; = | (because A is Markov) and Mg =) = —5. The eigenvector x; = (, i 5) for X = 1 will be the steady state. When three equal populations split in half and move, the final populations are again equal. When the populations start from uo = (8, 16, 32), the Markov chain approaches its steady state: 8 up = 16 24 u;, = 32 20 [ 16 ua = 12 18 20 u3= 22 19 lig) The step to u4 will split some people in half. This cannot be helped. The total population is 8+ 16+32 = 56 (and later the total is still 20+19+17 = 56). The steady state populations Uo are 56 times Gj. i 4). You can see the three populations approaching, but never reaching, their final limits 56/3. Linear Algebra in Economics: The Consumption Matrix A long essay about linear algebra in economics would be out of place here. A short note about one matrix seems reasonable. The consumption matrix tells how much of each input goes into a unit of output. We have n industries like chemicals, food, and oil. To produce a unit of chemicals may require .2 units of chemicals, .3 units of food, and .4 units of oil. Those numbers go into row | of the consumption matrix A: chemical output food output oil output ae 2 3 4 A) ae al rales: chemical input food input oil input Row 2 shows the inputs to produce food—a heavy use of chemicals and food, not so much oil. Row 3 of A shows the inputs consumed to refine a unit of oil. The real 372 Chapter 8 Applications consumption matrix for the United States in 1958 contained 83 industries. The models in the 1990’s are much larger and more precise. 
We chose a consumption matrix that has a convenient eigenvector. Now comes the question: Can this economy meet demands yj, y2, y3 for chemicals, food, and oil? To do that, the inputs pj, p2, p3 will have to be higher—because part of p is consumed in producing y. The input is p and the consumption is Ap, which leaves p — Ap. This net production is what meets the demand y: Problem Find a vector p such that p—Ap = y or (I—A)p = y or p=(I—A)7'y. Apparently the linear algebra question is whether / — A is invertible. But there is more to the problem. The demand vector y is nonnegative, and so is A. The production levels in p = (I — A)~'y must also be nonnegative. The real question is: When is (I — Ay a nonnegative matrix? This is the test on (J — A)~! for a productive economy, which can meet any positive demand. If A is small compared to J, then Ap is small compared to p. There is plenty of output. If A is too large, then production consumes more than it yields. In this case the external demand y cannot be met. “Small” or “large” is decided by the largest eigenvalue A, of A (which is positive): If Aq. Sul | then. If A; =1 then Tf Aytal § then). (7 =!A)—* (J —A)7! (2A ie has negative entries fails to exist ais nonnegative as desired. The main point is that last one. The reasoning makes use of a nice formula for (J — A)~!, which we give now. The most important infinite series in mathematics is the geometric series 1 +x +x*+---. This series adds up to 1/(1 — x) provided x is between. —1 and 1. (When x = | the series is 1+ 1+1+--- =o. When |x| > 1 the terms x” don’t go to zero and the series cannot converge.) The nice formula for (I — A)~! is the geometric series of matrices: (LS Ay ea Pe Aae Ae Sar aes: If you multiply this series by A, you get the same series S except for J. Therefore S — AS =I, which is (J — A)S = I. The series adds to S = (I— A)7! if it converges. And it converges if |Amax| < 1. In our case A > 0. All terms of the series are nonnegative. Its sum is (J — A)~! > 0. Example 3A 4A = [4441] has A: =.9 and (I — A)! = [33434 36 24]. yh va) 23 36 This economy is productive. A is small compared to J, because Amax is .9. To meet the demand y, start from p = (J — A)~'y. Then Ap is consumed in production, leaving p—Ap. This is (J — A)p = y, and the demand is met. 8.2 Markov Matrices and Economic Models 373 Example 5 A = [94] has A; =2 and (J — A)" = —}[14]. This consumption matrix A is too large. Demands can’t be met, because production consumes more than it yields. The series / + A+ A? +... does not converge to (I — A)~!. The series is growing while (I — A)~! is actually negative. Problem Set 8.2 Questions 1-14 are about Markov matrices and their eigenvalues and powers. 1 Find the eigenvalues of this Markov matrix (their sum is the trace): What is the steady state eigenvector for the eigenvalue A; = 1? 2 Diagonalize the Markov matrix in Problem | to ai ae What is the limit of At = 3 A = SAS~! | SA‘ S~! when Ak =[} 0 .759, approaches [ 00 1°]? What are the eigenvalues and the steady state eigenvectors for these Markov matrices? > > | II > II wow or or won Pat Se. =a ae Rl BIR Bl J by finding its other For every 4 by 4 Markov BIR NIBle BI Ble NI matrix, what eigenvector of A! corresponds to the (known) eigenvalue A = 1? 5 Every year 2% of young people become old and 3% of old people become dead. (No births.) Find the steady state for young old dead 6 =i |,4, ie). |lees 00 OO eae 03 0) 1H) 1 young old dead |, The sum of the components of x equals the sum of the components of Ax. 
If Ax = Ax with A 4 1, prove that the components of this non-steady eigenvector x add to zero. 374 Chapter 8 Applications Z Find the eigenvalues and eigenvectors of A. Factor A into SAS ees valet This was a MATLAB example in Chapter 1. There A!® was computed by squaring four times. What are the factors in A!® = SA!¢s—!? Explain why the powers A‘ in Problem 7 approach this matrix A%: seals al: Challenge problem: Which Markov matrices produce that steady state (.6, .4)? This permutation matrix is also a Markov matrix: oOo —_— ©© Oo So — oreo S oo The steady state eigenvector for A = | is (Gj. ie is i): This is not approached when uo = (0,0, 0, 1). What are wu, and wz and uw3 and u4? What are the four eigenvalues of P, which solve 4 = 1? 10 Prove that the square of a Markov matrix is also a Markov matrix. 11 If A = [25] is a Markov matrix, its eigenvalues are 1 and . The steady state eigenvector is x; = 12 Complete the last row to make A a Markov matrix and find the steady state eigenvector: When A is a symmetric Markov matrix, why is x; = (1,..., 1) its steady state? 13 A Markov differential equation is not du/dt = Au but du/dt = (A —/)u. the eigenvalues of Find pie B=A=1=| A ai When e*!' multiplies the eigenvector x; and e*2’ multiplies x2, what is the steady state as t > co? 8.2 14 The matrix to B = A—I Markov Matrices and Economic Models 375 for a Markov differential equation has each column adding . The steady state x; is the same as for A, but now A; = and ert Questions 15-18 are about linear algebra in economics. 15 Each row of the consumption matrix in Example 4 adds to .9. Why does that make 4 = .9 an eigenvalue, and what is the eigenvector? 16 Multiply 7+ A+ A*+ A+--- by J—A to show that the series adds to For 4i— [ 2], find A? and A? and use the pattern to add up the series. iv, For which of these matrices does J + A+ A* +--+ yield a nonnegative matrix (I — A)~!? Then the economy can meet any demand: v-[2 3] 4-[% f] [3] 18 If the demands in Problem 17 are y = (2, 6), what are the vectors 19 (Markov again) This matrix has zero determinant. a Find the limits of Aug 20 p = (I — A)~!y? What are its eigenvalues? ae) 3 4 BARNA HN starting from uo = (1,0, 0) and then wo = (100, 0, 0). If A is a Markov matrix, does / + A+ A?+--- add up to (J — A)7!? 376 Chapter 8 Applications LINEAR PROGRAMMING a 8.3 Linear programming is linear algebra plus two new ingredients: inequalities and minimization. The starting point is still a matrix equation Ax = b. But the only acceptable solutions are nonnegative. We require x > 0 (meaning that no component of x can be negative). The matrix has n > m, more unknowns than equations. If there are any nonnegative solutions to Ax = b, there are probably a lot. Linear programming picks the solution x* > 0 that minimizes the cost: The cost is cyxXy +++++CnXn. The winning vector x* is the nonnegative solution of Ax = b that has smallest cost. Thus a linear programming problem starts with a matrix A and two vectors b and c: i) A has n > m: for example A=[1 ii) b has m components: iii) The cost c has n components: 1 2] for example b = [4] for example c=[5 3 8]. Then the problem is to minimize c-x subject to the requirements Ax = b and x > 0: Minimize 5x, + 3x2 + 8x3 subject to x1 +x2+2x3 =4 and x1, x2, x3 > 0. We jumped right into the problem, without explaining where it comes from. Linear programming is actually the most important application of mathematics to management. Development of the fastest algorithm and fastest code is highly competitive. 
You will see that finding x* is harder than solving Ax = b, because of the extra requirements: cost minimization and nonnegativity. We will explain the background, and the famous simplex method, after solving the example. Look first at the “constraints”: Ax = b and x > 0. The equation x;+x2+2x3 = 4 gives a plane in three dimensions. The nonnegativity x; > 0, x2 > 0, x3 > 0 chops the plane down to a triangle. The solution x* must lie in the triangle PQR in Figure 8.4. Outside that triangle, some components of x are negative. On the edges of that triangle, one component is zero. At the corners of that triangle, two components are zero. The solution x* will be one of those corners! We will now show why. The triangle contains all vectors x that satisfy Ax = b and x > 0. (Those x’s are called feasible points, and the triangle is the feasible set.) candidates in the minimization of c-x, which is the final step: These points are the Find x* in the triangle to minimize the cost 5x1 + 3x2 + 8x3. The vectors that have zero cost lie on the plane 5x; + 3x2 + 8x3 = 0. That plane does not meet the triangle. We cannot achieve zero cost, while meeting the requirements on x. So increase the cost C until the plane 5x; + 3x2 + 8x3 = C does meet the triangle. This is a family of parallel planes, one for each C. As C increases, the planes move toward the triangle. 8.3 R= (0, 0, 2) (2 hours by computer) Linear Programming 377 Ax = bis the plane x, +x, +2x,=4 Triangle has x,2 0, X,2 OF X32 0 Q = (0, 4, 0) (4 hours by student) P = (4, 0, 0) (4 hours by Ph.D.) Figure 8.4 The triangle containing nonnegative solutions: lowest cost solution x* is one of the corners P, Q, or R. Ax = b and x > 0. The The first plane to touch the triangle has minimum cost C. The point where it touches is the solution x*. This touching point must be one of the corners P or Q or R. A moving plane could not reach the inside of the triangle before it touches a corner! So check the cost 5x; + 3x2 + 8x3 at each corner: P = (4, 0,0) costs 20 Or=1(0, 45-0 weostss 12 R = (0, 0, 2) costs 16. The winner is Q. Then x* = (0,4, 0) solves the linear programming problem. If the cost vector ¢ is changed, the parallel planes are tilted. For small changes, Q is still the winner. For the cost c-x = 5x; + 4x2 + 7x3, the optimum x* moves to R= (0,0, 2). The minimum cost is now 7-2 = 14. Note 1 Some linear programs maximize profit instead of minimizing cost. The mathematics is almost the same. The parallel planes start with a large value of C, instead of a small value. They move toward the origin (instead of away), as C gets smaller. The first touching point is still a corner. Note 2 The requirements Ax = b and x > 0 could be impossible to satisfy. The equation x; + x2 + x3 = —1 cannot be solved with x > 0. The feasible set is empty. Note 3 It could also happen that the feasible set is unbounded. If I change the require- ment to xj + x2 —2x3 = 4, the large positive vector (100, 100, 98) is now a candidate. So is the larger vector (1000, 1000, 998). The plane Ax = Bb is no longer chopped off to a triangle. The two corners P and Q are still candidates for x*, but the third corner has moved to infinity. Note 4 With an unbounded feasible set, the minimum cost could be —oo (minus infinity). Suppose the cost is —xj — x2 + x3. Then the vector (100, 100, 98) costs C = —102. The vector (1000, 1000, 998) costs C = —1002. We are being paid to include x; and x2. Instead of paying a cost for those components. In realistic applications this will not happen. 
378 Chapter 8 Applications But it is theoretically possible that changes in A, b, and c can produce unexpected triangles and costs. Background to Linear Programming This first problem is made up to fit the previous example. The unknowns x1, x2, x3 represent hours of work by a Ph.D. and a student and a machine. The costs per hour are $5, $3, and $8. (I apologize for such low pay.) The number of hours cannot be negative: xj > 0,x2 > 0,x3 > 0. The Ph.D. and the student get through one homework problem per hour—the machine solves two problems in one hour. In principle they can share out the homework, which has four problems to be solved: x; + x2 + 2x3 = 4. The problem is to finish the four problems at minimum cost. If all three are working, the job takes one hour: xj = x2 = x3 = 1. The cost 1s 5+3+8 = 16. But certainly the Ph.D. should be put out of work by the student (who is just as fast and costs less—this problem is getting realistic). When the student works two hours and the machine works one, the cost is 6+ 8 and all four problems get solved. We are on the edge QR of the triangle because the Ph.D. is unemployed: x; = 0. But the best point is at a corner—all work by student (at Q) or all work by machine (at R). In this example the student solves four problems in four hours for $12—the minimum cost. With only one equation in Ax = Bb, the corner (0,4,0) has only one nonzero component. When Ax = b has m equations, corners have m nonzeros. As in Chapter 3, n —m free variables are set to zero. We solve Ax = b for the m basic variables (pivot variables). But unlike Chapter 3, we don’t know which m variables to choose as basic. Our choice must minimize the cost. The number of possible corners is the number of ways to choose m components out of n. This number “n choose m” is heavily involved in gambling and probability. With n = 20 unknowns and m = 8 equations (still small numbers), the “feasible set” can have 20!/8!12! corners. That number is (20)(19) --- (13) = 5,079, 110,400. Checking three corners for the minimum cost was fine. Checking five billion corners is not the way to go. The simplex method described below is much faster. The Dual Problem In linear programming, problems come in pairs. There is a minimum problem and a maximum problem—the original and its “dual.” The original problem was specified by a matrix A and two vectors b and c. The dual problem has the same input, but A is transposed and b and c are switched. Here is the dual to our example: A cheater offers to solve homework problems by looking up the answers. The charge is y dollars per problem, or 4y altogether. (Note how b = 4 has gone into the cost.) The cheater must be as cheap as the Ph.D. or student or machine: y < 5 and y < 3 and 2y < 8. (Note how c = (5, 3, 8) has gone into inequality constraints). The cheater maximizes the income 4y. 8.3. Dual Problem Linear Programming 379 Maximize b-y subject to Aly <c. The maximum occurs when y = 3. The income is 4y = 12. The maximum in the dual problem ($12) equals the minimum in the original ($12). This is always true: Duality Theorem If either problem has a best vector (x* or y*) then so does the other. The minimum cost c -x* equals the maximum income b- y*. Please note that I personally often look up the answers. It’s not cheating. This book started with a row picture and a column picture. The first “duality theorem” was about rank: The number of independent rows equals the number of independent columns. That theorem, like this one, was easy for small matrices. 
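The numbers in this small example are easy to confirm by brute force. The MATLAB sketch below is not the simplex method (that comes later in this section); it only evaluates the cost at the three corners and checks the cheater's income against the minimum:

c = [5 3 8];                 % cost per hour: Ph.D., student, machine
corners = [4 0 0;            % P: all work by the Ph.D.
           0 4 0;            % Q: all work by the student
           0 0 2];           % R: all work by the machine
costs = corners * c'         % 20, 12, 16 -- the minimum cost is 12 at Q
y = 3;                       % cheater's price: y <= 5, y <= 3, 2y <= 8
income = 4 * y               % 12: maximum income equals the minimum cost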
A proof that minimum cost = maximum income is in our text Linear Algebra and Its Applications. Here we establish the easy half of the theorem: The cheater’s income cannot exceed the honest cost: If Av=b.x = 0; Aly <c_ then bly = (Ax)! y = xT(Aly) <x'e. The full duality theorem says that when b! y reaches its maximum and x'¢ reaches its minimum, they are equal: b- y* =c-x”*. The Simplex Method Elimination is the workhorse for linear equations. The simplex method is the workhorse for linear inequalities. We cannot give the simplex method as much space as elimination— but the idea can be briefly described. The simplex method goes from one corner to a neighboring corner of lower cost. Eventually (and quite soon in practice) it reaches the corner of minimum cost. This is the solution x*. A corner is a vector x > 0 that satisfies the m equations Ax = b with at most m positive components. The other n — m components are zero. (Those are the free variables. Back substitution gives the basic variables. All variables must be nonnegative or x is a false corner.) For a neighboring corner, one zero component becomes positive and one positive component becomes zero. The simplex method must decide which component “enters” by becoming positive, and which component “leaves” by becoming zero. That exchange is chosen so as to lower the total cost. This is one step of the simplex method. Here is the overall plan. Look at each zero component at the current corner. If it changes from 0 to 1, the other nonzeros have to adjust to keep Ax = b. Find the new x by back substitution and compute the change in the total cost c-x. This change 380 = Chapter 8 Applications is the “reduced cost” r of the new component. The entering variable is the one that gives the most negative r. This is the greatest cost reduction for a single unit of a new variable. Example 1 Suppose the current corner is (4, 0,0), with the Ph.D. doing all the work (the cost is $20). If the student works one hour, the cost of x = (3, 1,0) is down to $18. The reduced cost is r = —2. If the machine works one hour, then x = (2, 0, 1) also costs $18. The reduced cost is also r = —2. In this case the simplex method can choose either the student or the machine as the entering variable. Even in this small example, the first step may not go immediately to the best x*. The method chooses the entering variable before it knows how much of that variable to include. We computed r when the entering variable changes from 0 to 1, but one unit may be too much or too little. The method now chooses the leaving variable (the Ph.D.). The more of the entering variable we include, the lower the cost. This has to stop when one of the positive components (which are adjusting to keep Ax = 5) hits zero. The leaving variable is the first positive x; to reach zero. When that happens, a neighboring corner has been found. More of the entering variable would make the leaving variable negative, which is not allowed. We have gone along an edge of the allowed feasible set, from the old corner to the new corner. new corner) to find the next variables to enter and leave. Then start again (from the When all reduced costs are positive, the current corner is the optimal x*. No zero component can become positive without increasing c-x. No new variable should enter. The problem is solved. Note Generally x* is reached in an steps, where a@ is not large. But examples have been invented which use an exponential number of simplex steps. 
Eventually a different approach was developed, which is guaranteed to reach x* in fewer (but more difficult) steps. The new methods travel through the interior of the feasible set, to find x* in polynomial time. Khachian proved this was possible, and Karmarkar made it efficient. There is strong competition between Dantzig’s simplex method (traveling around the edges) and Karmarkar’s method through the interior. Example 2 Minimize the cost c-x = 3x; + x2 + 9x3 +.x4. The constraints are x > 0 and two equations Ax = b: x, +2x3+2x4=4 x2 +x3-—x4 =2 m=2 _ equations n=4_ unknowns. A starting corner is x = (4, 2,0, 0) which costs c-x = 14. It has m = 2 nonzeros and n—m=2 zeros (x3 and x4). The question is whether x3 or x4 should enter (become nonzero). Try each of them: Ifixge=1 and x4. =,0; otthenrxe= (291 sk; O)pcostsal 6: If x4 = 1, andex3 = 0, | then x = (3;.3/0,1) costs 13. 8.3 Linear Programming 381 Compare those costs with 14. The reduced cost of x3 is r = 2, positive and useless. The reduced cost of x4 is r = —1, negative and helpful. The entering variable is x4. How much of x4 can enter? One unit of x4 made x; drop from 4 to 3. Four units will make x; drop from 4 to zero (while x2 increases all the way to 6). The leaving variable is x;. The new corner is x = (0,6, 0,4), which costs only c-x = 10. This is the optimal x*, but to know that we have to try another simplex step from (0, 6, 0, 4). Suppose x; or x3 tries to enter: Mx, = and x3—= 0,” live = land 7) =.07 “then x = (lor0s siicostcer then = (0) 3.5152) costsur4. Those costs are higher than 10. Both r’s are positive—it does not pay to move. The current corner (0, 6,0, 4) is the solution x*. These calculations can be streamlined. It turns out that each simplex step solves three linear systems with the same matrix B. (This is the m by m matrix that keeps the m basic columns of A.) When a new column enters and an old column leaves, there is a quick way to update B~'. That is how most computer codes organize the steps of the simplex method. One final note. We described how to go from one corner to a better neighbor. We did not describe how to find the first corner—which is easy in this example but not always. One way is to create new variables x5 and x6, which begin at 4 and 2 (with all the original x’s at zero). Then start the simplex method with x5 + x6 as the cost. Switch to the original problem after x5 and x6 reach zero—a starting corner for the original problem has been found. Problem Set 8.3 1 Draw the region in the xy plane where x +2y = 6 and x > 0 and y > 0. Which point in this “feasible set” minimizes the cost c = x + 3y? Which point gives maximum cost? Z Draw the region in the xy plane where x +2y <6, 2x+y <6, x >0, It has four corners. Which corner minimizes the cost c = 2x — y? 3 y>0. What are the corners of the set x; + 2x2 — x3 = 4 with x1, x2, x3 all > 0? Show that x; + 2x3 can be very negative in this set. 4 Start at x = (0,0, 2) where the machine solves all four problems for $16. Move to x = (0,1, _) to find the reduced cost r (the savings per hour) for work by the student. Find r for the Ph.D. by moving to x = (1,0, not give the number of hours or the total savings. 5 Start from (4,0,0) with c changed to [5 3 7]. Show machine but the total cost is lower for the student. two steps, first to machine and then to student. 6 Choose imum __). Notice that r does that r is better for the The simplex method takes a different c so the Ph.D. gets the job. Rewrite the dual problem income to the cheater). 
(max- 382 Chapter 8 Applications FOURIER SERIES: LINEAR ALGEBRA FOR FUNCTIONS a 8.4 This section goes from finite dimensions to infinite dimensions. I want to explain linear algebra in infinite-dimensional space, and to show that it still works. First step: look back. This book began with vectors and dot products and linear combinations. We begin by converting those basic ideas to the infinite case—then the rest will follow. What does it mean for a vector to have infinitely many components? There are two different answers, both good: 1 The vector becomes v = (v], v2, v3,...). It could be (1, 5, is KARO): 2 The vector becomes a function f(x). It could be sin x. We will go both ways. Then the idea of Fourier series will connect them. After vectors come dot products. The natural dot product of two infinite vectors (vj, V2,...) and (w1, w2,...) iS an infinite series: vew=VvVwijt+v2zu2+-::-. (1) This brings a new question, which never occurred to us for vectors in R”. Does this infinite sum add up to a finite number? Does the series converge? Here is the first and biggest difference between finite and infinite. When v = w = (1,1, 1,...), the sum v-w=1+1+4+14+--- certainly does not converge. In that case is infinite. Since v equals w, we are really computing v-v = \lv|/7 = length squared. The vector (1,1, 1,...) has infinite length. We don’t want that vector. Since we are making the rules, we don’t have to include it. The only vectors to be allowed are those with finite length: DEFINITION The vector (vj, v2,...) is in our infinite-dimensional “Hilbert space” if and only if its length is finite: jv? =v-v= Example 1 The length is 2/./3. the square root: vector vi as v5 “ie v3 +.---must add to a finite number. v = (1, 5s 5 tte) is included We have a geometric vev=l+g+yt Question Answer in Hilbert series that adds to 4/3. = 1 space, because its The length of v is =| If v and w have finite length, how large can their dot product be? The sum v-w = vj w;+v2w2+--- also adds to a finite number. The Schwarz inequality is still true: |v -w| < |lv|| [wll (2) 8.4 Fourier Series: Linear Algebra for Functions 383 The ratio of v + w to ||v|j ||w|] is still the cosine of 6 (the angle between v and w). Even in infinite-dimensional space, |cos@| is not greater than 1. Now change over to functions. Those are the “vectors.” The space of functions f(x), g(x), h(x), ... defined for 0 < x < 27 must be somehow bigger than R”. What is the dot product of f (x) and g(x)? Key point in the continuous case: Sums are replaced by integrals. Instead of a sum of v; times w;, the dot product is an integral of f(x) times g(x). Change the “dot” to parentheses with a comma, and change the words “dot product” to inner product: DEFINITION = The inner product of f(x) and g(x), and the length squared, are 20 (fa= fo fo)go)dx 2 and A If IP = i (f(x)?dx. es The interval [0, 277] where the functions are defined could change to a different interval like [0, 1]. We chose 27 because our first examples are sin x and cos x. Example 2 The length of f(x) = sinx comes from its inner product with itself: 20 (f, f) =|! (sinx)* dx =a. The length of sinx is /7. 0 That is a standard integral in calculus—not part of linear algebra. By writing sin? x as f —1}cos2x, we see it go above and below its average value 5. Multiply that average by the interval length 27 to get the answer z. More important: is zero: The functions sinx and cosx are orthogonal. Qn / 0 20 Sin COSMO This zero is no accident. 
a= / 0 Their inner product on 5 Sine da [-4 cos lh It is highly important to science. —a()) (4) The orthogonality goes beyond the two functions sinx and cos x, to an infinite list of sines and cosines. list contains cosO0x (which is 1), sinx, cos x, sin2x, cos 2x, sin3x,cos3x,.... The Every function in that list is orthogonal to every other function in the list. The next step is to look at linear combinations of those sines and cosines. Fourier Series The Fourier series of a function y(x) is its expansion into sines and cosines: y(x) = ap + a1 cosx + bj sinx + a2 cos2x + bz sin2x+---. (5) 384 Chapter 8 Applications We have an orthogonal basis! The vectors in “function space” are combinations of the sines and cosines. On the interval from x = 27 to x = 4m, all our functions repeat what they did from 0 to 27. They are “periodic.” The distance between repetitions (the period) is 277. Remember: The list is infinite. The Fourier series is an infinite series. Just as we avoided the vector v = (1,1, 1,...) because its length is infinite, so we avoid a function like 5 +008 x +cos2x+cos3x+---. (Note: This is 2 times the famous delta function. It is an infinite “spike” above a single point. At x = 0 its height S+1+1+- . is infinite. At all points inside 0 < x < 2m the series adds in some average way to zero.) The delta function has infinite length, and regretfully it is excluded from our space of functions. Compute the length of a typical sum f(x): 20 Ch ey} 0 (ap + a cosx +b, sinx + a2 cos 2x +--+)? dx 20 =) 0 (a? + ai cos” x + b? sin” x + a3 cos* 2x + ---) dx = 2map-+ (a? +b} +03 +--+). (6) The step from line | to line 2 used orthogonality. All products like cos x cos 2x and sin x cos 3x integrate to give zero. Line 2 contains what is left—the integrals of each sine and cosine squared. Line 3 evaluates those integrals. Unfortunately the integral of 1° is 27, when all other integrals give 2. If we divide by their lengths, our functions become orthonormal: 1 COSEUIESINEG COS EFT AALEE os ATE ORIG lag These are unit vectors. is an orthonormal basis for our function space. We could combine them with coefficients Ag, A,, By, A2,... to yield a function F(x). Then the 27 and the z’s drop out of the formula for length. Equation (6) becomes function length = vector length: FI? =(F, F) = AR + AT + BP +A34--. 7) Here is the important point, for f(x) as well as F(x). The function has finite length exactly when the vector of coefficients has finite length. The integral of (F (x)) matches the sum of coefficients squared. Through Fourier series, we have a perfect match between function space and infinite-dimensional Hilbert space. On one side is the function, on the other side are its Fourier coefficients. 8B The function space contains f(x) exactly when the Hilbert space contains the vector v = (dao, a1, b),...) of Fourier coefficients. Both f(x) and v have finite length. 8.4 Fourier Series: Linear Algebra for Functions 385 Example 3 Suppose f(x) is a “square wave,” equal to —1 for negative x and +1 for positive x. That looks like a step function, not a wave. But remember that f(x) must repeat after each interval of length 277. We should have said fa)= —1 for —-x<x<0 ely OR 2Ot8 hoe re The wave goes back to —1 for 7 < x < 2z. It is an odd function like the sines, and all its cosine coefficients are zero. We will find its Fourier series, containing only sines: fe See 4 SI Xp uA 1 This square wave has length 27, -eeSIN. Ox +e SIN D0 3 (8) 5 because at every point (f(x) is (=1)* or (+1): 20 20 Jl? 
=f (re))tae =| pata — oe, 0 0 At x = 0 the sines are zero and the Fourier series (8) gives zero. This is half way up the jump from —1 to +1. The Fourier series is also interesting when x = 4. At this point the square wave equals 1, and the sines in equation (8) alternate between +1 and —1: [==(1-;+3-4+-). (9) Multiply through by z to find a magical formula 4(1 — ‘=i g= +---) for that famous number. The Fourier Coefficients How do we find the a’s and b’s which multiply the cosines and sines? function f(x), we are asking for its Fourier coefficients: f@) =a +.a; cosx For a given + by sinx + an cos2x +---. Here is the way to find a;. Multiply both sides by cosx. Then integrate from 0 to 2x. The key is orthogonality! All integrals on the right side are zero, except the integral of a, cos? x: 21 / 0 20 TAX) COS de / 0 ay cos? x dx = ay. (10) Divide by z and you have a). To find any other ax, multiply the Fourier series by coskx. Integrate from 0 to 27. Use orthogonality, so only the integral of ax cos? kx is left. That integral is wa,, and divide by 7: 1 a= mw 2n Jo i f(x)coskxdx and similarly by = — wT 2 JO f(x)sinkxdx. (11) 386 Chapter 8 Applications The exception is ap. This time we multiply by cosOx = 1. The integral of 1 is 27: 1 20 Mh) = Ae Tw JO f(x)+1dx = average value of f(x). (12) I used those formulas to find the coefficients in (8) for the square wave. The integral of f(x)coskx was zero. The integral of f(x) sinkx was 4/k for odd k. The point to emphasize is how this infinite-dimensional case is so much like the ndimensional case. Suppose the nonzero vectors v1,..., VU, are orthogonal. We want to write the vector b as a combination of those v’s: b=cyvy +202 +--+ + Cn Vy. (13) Multiply both sides by Oe Use orthogonality, so viv» = 0. Only the c, term is left: vib — C10, YY +0+---+0. Therefore cj = i (14) v, v1 The denominator vf vy is the length squared, like a in equation (11). The numerator vib is the inner product like [ f(x) coskx dx. Coefficients are easy to find when the basis vectors are orthogonal. We are just doing one-dimensional projections, to find the components along each basis vector. The formulas are even better when the vectors are orthonormal. Then we have unit vectors. The denominators cp=vib VE VK are all 1. In this orthonormal and co=vib case, and = cy=vib. (15) You know this in another form. The equation for the c’s is C1Vj +++ +Cy¥, = 5b or Vi °°: «Dp This is an orthogonal matrix Q! Its inverse is Q'. That gives the same c’s as in (15): Qc=b yields c=Q'b. Row by row this is cj = vu}b. Fourier series is like having a matrix with infinitely many orthogonal columns. Those columns are the basis functions 1, cosx,sinx,.... After dividing by their lengths we have an “infinite orthogonal matrix.” Its inverse is its transpose. The formulas for the Fourier coefficients are like (15) when we have unit vectors and like (14) when we don’t. Orthogonality is what reduces an infinite series to one single term. 8.4 Fourier Series: Linear Algebra for Functions 387 Problem Set 8.4 Integrate the trig identity 2 cos jx coskx = cos(j +k)x + cos(j — k)x to show that cos jx is orthogonal to cos kx, provided j # k. What is the result when j = k? Show that the three functions 1, x, and x2 — t are orthogonal, when the integration is from x = —1 tox = 1. Write f(x) = 2x? as a combination of those orthogonal functions. Find a vector (w1,w2, w3,...) that is orthogonal to v = (1, 5, is ve; Js Compute its length ||w||. 
The first three Legendre polynomials are 1, x, and x* — i Choose the number c so that the fourth polynomial x? — cx is orthogonal to the first three. The integrals still go from —1 to 1. For the square wave f(x) in Example 3, show that 20 20 F(x) cosxidxic 0 2 Faysmxdxi=4 0 ik) Sax ae = 0. 0 Which Fourier coefficients come from those integrals? The square wave has ||f ||? = 27. This equals what remarkable sum by equation (6)? Graph the square wave. Then graph by hand the sum of two sine terms in its series, or graph by machine the sum of two, three, and four terms. Find the lengths of these vectors in Hilbert space: ave (s;. ss tp) (b) v=(l,a,a’,...) (©) f@)=14sinx. Compute the Fourier coefficients ax and b; for f(x) defined from 0 to 27: (aye fy SHBior 0 ar, Px) = torr <r (DS x =x. 10 When f(x) has period 277, why is its integral from —z to m the same as from 0 to 2x? If f(x) is an odd function, f(—x) = — f(x), show that ka J (x) dx is zero. 11 From trig identities find the only two terms in the Fourier series for f (x): ye 12 he cossx | (5). i= cose tae) The functions 1, cos x, sinx, cos 2x, sin2x,... are a basis for Hilbert space. Write the derivatives of those first five functions as combinations of the same five functions. What is the 5 by 5 “differentiation matrix” for these functions? 13 Write the complete solution to dy/dx solution to dy/dx = 0. = cosx as a particular solution plus any 388 Chapter 8 Applications COMPUTER GRAPHICS a 8.5 Computer graphics deals with three-dimensional images. The images are moved around. Their scale is changed. They are projected into two dimensions. All the main operations are done by matrices—but the shape of these matrices is surprising. The transformations of three-dimensional space are done with 4 by 4 matrices. You would expect 3 by 3. The reason for the change is that one of the four key operations cannot be done with a 3 by 3 matrix multiplication. Here are the four operations: Translation (shift the origin to another point Po = (x0, yo, Zo)) Rescaling (by c in all directions or by different factors cj, c2, c3) Rotation (around an axis through the origin or an axis through Po) Projection (onto a plane through the origin or a plane through Po). Translation is the easiest—just add (xo, yo, Zo) to every point. But this is not linear! No 3 by 3 matrix can move the origin. So we change the coordinates of the origin to (0, 0,0, 1). This is why the matrices are 4 by 4. The “homogeneous coordinates” of the point (x, y, z) are (x, y, z, 1) and we now show how they work. 1. Translation Shift the whole three-dimensional space along the vector vg. The origin moves to (x9, yo, Zo). This vector vg is added to every point v in R?. Using homogeneous coordinates, the 4 by 4 matrix T shifts the whole space by v9: : : Translation matrix = Le ROS (Ome oe oO Olan eer) SO xo z# yo | Important: Computer graphics works with row vectors. We have row times matrix instead of matrix times column. You can quickly check that [0 0 0 1]T = [xo yo zo 1). To move the points (0, 0,0) and (x, y, z) by vo, change to homogeneous coordinates (0,0,0,1) and (x, y,z, 1). Then multiply by T. A row vector times T gives a row vector: Every v moves to v+ vo: [x y z I]T = [x + x0 y+ yo z+20 1] The output tells where any v will move. (It goes to v + vg.) Translation is now achieved by a matrix, which was impossible in R?. 2. Scaling To make a picture fit a page, we change its width and height. A Xerox copier will rescale a figure by 90%. 
In linear algebra, we multiply by .9 times the identity matrix. That matrix is normally 2 by 2 for a plane and 3 by 3 for a solid. In computer graphics, with homogeneous coordinates, the matrix is one size larger: Rescale the plane: S= 2 a 1 Rescale a solid: S = or O0 0 OPO 0-0 ~O=1 i OOO 8.5 Computer Graphics 389 Important: S is not cI. We keep the 1 in the lower corner. Then [x, y, 1] times S is the correct answer in homogeneous coordinates. The origin stays in position because [(O01]S = [001]. If we change that | to c, the result is strange. The point (cx,cy,cz,c) is the same as (x,y,z, 1). The special property of homogeneous coordinates is that multiplying by cI does not move the point. The origin in R* has homogeneous coordinates (0, 0,0, 1) and (0,0,0,c) for every nonzero c. This is the idea behind the word “ho- mogeneous.” Scaling can be different in different directions. To fit a full-page picture onto a half-page, scale the y direction by 5. To create a margin, scale the x direction by i. The graphics matrix is diagonal but not 2 by 2. It is 3 by 3 to rescale a plane and 4 by 4 to rescale a space: 34 Scaling matrices That last matrix S = C1 NI | S rescales the x, y, z directions by positive numbers point at the origin doesn’t move, because [0001]5=[000 cj, c2,c3. The 1]. Summary The scaling matrix § is the same size as the translation matrix T. They can be multiplied. To translate and then rescale, multiply vTS. To rescale and then translate, multiply vST. (Are those different? Yes.) The extra column in all these matrices leaves the extra | at the end of every vector. The point (x, y, z) in R> has homogeneous coordinates (x, y, z, 1) in P?. This “projective space” is not the same as R*. It is still three-dimensional. To achieve such a thing, (cx, cy, cz,c) is the same point as (x, y, z, 1). Those points of P> are really lines through the origin in R*. Computer graphics uses affine transformations, linear plus shift. An affine transformation T is executed on P? by a 4 by 4 matrix with a special fourth column: @ji aio a3 ayn 224 2230 O A319 ds7 1433, 10 a4, 442 443 | T(,0,0) 0O pO,0s1) =) — The usual 3 by 3 matrix tells us three outputs, this tells four. The usual outputs come from the inputs (1,0,0) and (0,1,0) and (0,0, 1). When the transformation is lin- ear, three outputs reveal everything. When the transformation is affine, the matrix also contains the output from (0, 0,0). Then we know 3. Rotation the shift. A rotation in R* or R® is achieved by an orthogonal matrix determinant is +1. (With determinant Q. Include the extra column when you use homogeneous coordinates! Plane rotation Q= cos@ enue The —1 we get an extra reflection through a mirror.) —sind Bos | becomes R=] cos@ sin@ 0 —sin@ cos@ 0 0 0 1 390 = Chapter 8 Applications This matrix rotates the plane around the origin. How would we rotate around a different point (4,5)? The answer brings out the beauty of homogeneous coordinates. Translate (4,5) to (0,0), then rotate by 6, then translate (0,0) back to (4,5): jor WTR Tce (rey ili) e0e 30) cos@é —sin@ 0 i 0: aa lie Ose Sin on aicos OaOGih Samet ] A. —4 -5 1 0 0 30.60 a0 goal I won’t multiply. The point is to apply the matrices one at a time: v translates to v7_, then rotates to vT_R, and translates back to v7_RT,. Because each point FE aed ] is a row vector, T_ acts first. The center of rotation (4, 5)—otherwise known as (4, 5, 1)—moves first to (0,0, 1). Rotation doesn’t change it. Then 7; moves it back to (4,5, 1). All as it should be. 
The point (4, 6, 1) moves to (0, 1, 1), then turns by 6 and moves back. In three dimensions, every rotation Q turns around an axis. The axis doesn’t move—it is a line of eigenvectors with A = 1. Suppose the axis is in the z direction. The 1 in Q is to leave the z axis alone, the extra 1 in R is to leave the origin alone: O= cos@é lasing 0 —sin@ cosé 0 0 0 1 O and é R= Oe Ou. Oigel Now suppose the rotation is around the unit vector a = (a), a2, a3). With this axis a, the rotation matrix Q which fits into R has three parts: ar QO = (cos@)I + (1 —cos@) | aja aja3 aja2z a; a2a3 aja3 0 aza3 | —sin@ | —a3 as a2 a—-a ONP aa —a, The axis doesn’t move because aQ = a. When a = (0,0, 1) is in the z direction, this Q becomes the previous Q—for rotation around the z axis. The linear transformation Q always goes in the upper left block of R. Below it we see zeros, because rotation leaves the origin in place. When those are not zeros, the transformation is affine and the origin moves. 4. Projection In a linear algebra course, most planes go through the origin. In real life, most don’t. A plane through the origin is a vector space. The other planes are affine spaces, sometimes called “flats.” An affine space is what comes from translating a vector space. We want to project three-dimensional vectors onto planes. Start with a plane through the origin, whose unit normal vector is n. (We will keep n as a column vector.) The vectors in the plane satisfy n'v = 0. The usual projection onto the plane is the matrix I — nn". To project a vector, multiply by this matrix. The vector n is projected to zero, and the in-plane vectors v are projected onto themselves: Ud - nn')n =n— n(n'n) =0 and (J- nn')y =v— n(n'v) =) 8.5 Computer Graphics 391 In homogeneous coordinates the projection matrix becomes 4 by 4 (but the origin doesn’t move): Projection onto the plane n'v =0 P= Now project onto a plane n'(v — v9) = 0 that does not go through the origin. One point on the plane is vo. This is an affine space (or a flat). It is like the solutions to Av = b when the right side is not zero. One particular solution vg is added to the nullspace—to produce a flat. The projection onto the flat has three steps. Translate vg to the origin by 7_. Project along the n direction, and translate back along the row vector vo: Projection onto a flat (NY oe a —v0 eh 4 ! ae I can’t help noticing that 7_ and 7, are inverse matrices: They are like the elementary matrices of Chapter 2. | ie + : 0 translate and translate back. The exercises will include reflection matrices, also known as mirror matrices. These are the fifth type needed in computer graphics. A reflection moves each point twice as far as a projection—the reflection goes through the plane and out the other side. So change the projection J — nn! to I — 2nn! for a mirror matrix. The matrix P gave a “parallel” projection. All points move parallel to n, until they reach the plane. The other choice in computer graphics is a “perspective” projection. This is more popular because it includes foreshortening. With perspective, an object looks larger as it moves closer. Instead of staying parallel to n (and parallel to each other), the lines of projection come toward the eye—the center of projection. This is how we perceive depth in a two-dimensional photograph. The basic problem of computer graphics starts with a scene and a viewing position. Ideally, the image on the screen is what the viewer would see. The simplest image assigns just one bit to every small picture element—called a pixel. 
It is light or dark. This gives a black and white picture with no shading. You would not approve. In practice, we assign shading levels between 0 and 28 for three colors like red, green, and blue. That means 8 x 3 = 24 bits for each pixel. Multiply by the number of pixels, and a lot of memory is needed! Physically, a raster frame buffer directs the electron beam. It scans like a televi- sion set. The quality is controlled by the number of pixels and the number of bits per pixel. In this area, one standard text is Computer Graphics: Principles and Practices by Foley, Van Dam, Feiner, and Hughes (Addison-Wesley, 1990). My best references were notes by Ronald Goldman (Rice University) and by Tony DeRose (University of Washington, now associated with Pixar). 392 Chapter 8 Applications =» 1. REVIEW OF THE KEY IDEAS # Computer graphics needs shift operations T(v) = v + vo as well as linear operations T(v) = Av. 2. 3. A shift in R” can be executed by a matrix of order n + 1, using homogeneous coordinates. | The extra component | in [x yz1] is preserved when all matrices have the num- bers 0,0, 0,1 as last column. Problem Set 8.5 1 A typical point in R? is xi + yj + zk. The coordinate vectors i, j, and k are (1, 0<0), ©, 1, Da@,0; 1). The coordinates of the pomt are @, y, z). This point in computer graphics is xi + yj + zk + origin. Its homogeneous coordinates are( , , , ). Other coordinates for the same point are( , , , ). D2 A linear transformation T is determined when an affine transformation we also need 7’( transformed to xT (i) + yT(j) + zT (k) + 3 Multiply the 4 by 4 matrix 7 for translation along (1, 4,3) and the matrix 7; for translation along (0, 2,5). The product 77; is translation along 4 Write down the 4 by 4 matrix S that scales by a constant c. Multiply ST and also TS, where we know 7 (i), T(j), T(k). For ). The input point (x, y, z, 1) is : T is translation by (1,4, 3). To blow up the picture around the center point (1,4, 3), would you use vST or vTS? 5 What scaling matrix S (in homogeneous coordinates, so 3 by 3) would make this page square? 6 What 4 by 4 matrix would move a corner of a cube to the origin and then multiply all lengths by 2? The corner of the cube is originally at (1, 1, 2). 7 When the three matrices in equation (1) multiply the unit vector a, show that they give (cos@)a and (1 —cos@)a and 0. Addition gives aQ = a and the rotation axis is not moved. 8 If b is perpendicular to a, multiply by the three matrices in (1) to get (cos@)b and 0 and a vector perpendicular to b. So Qb makes an angle @ with b. This is rotation. 9 What is the 3 by 3 projection matrix J — nn! onto the plane Ex + sy + Ez — (0? In homogeneous coordinates add 0, 0,0, 1 as an extra row and column in P. 8.5 10 Computer Graphics 393 With the same 4 by 4 matrix P, multiply T_PT to find the projection matrix onto the plane $x + fy+ bz = 1. The translation T_ moves a point on that plane (choose one) to (0,0, 0,1). The inverse matrix T, moves it back. 11 Project (3, 3,3) onto those planes. Use P in Problem 9 and 10. 12 If you project a square onto a plane, what shape do you get? 13 If you project a cube onto a plane, what is the outline of the projection? Make the projection plane perpendicular to a diagonal of the cube. 14 The 3 by 3 mirror matrix that reflects through the plane n'v = 0 is M = / — 2nn'. 15 T_ PT, in Problem Find the reflection of the point (3, 3,3) in the plane $x fy - Ez =a) Find the reflection of (3, 3,3) in the plane ox - sy _ RZ = |. 
Take three steps T− M T+, using 4 by 4 matrices: translate by T− so the plane goes through the origin, reflect the translated point (3, 3, 3, 1)T− in that plane, then translate back by T+.

16  The vector between the origin (0, 0, 0, 1) and the point (x, y, z, 1) is the difference v = ____. In homogeneous coordinates, vectors end in ____. So we add a ____ to a point, not a point to a point.

17  If you multiply only the last coordinate of each point to get (x, y, z, c), you rescale the whole space by the number ____. This is because (x, y, z, c) is the same point as ____.

9
NUMERICAL LINEAR ALGEBRA

GAUSSIAN ELIMINATION IN PRACTICE ■ 9.1

Numerical linear algebra is a struggle for quick solutions and also accurate solutions. We need efficiency but we have to avoid instability. In Gaussian elimination, the main freedom (always available) is to exchange equations. This section explains when to exchange rows for the sake of speed, and when to do it for the sake of accuracy.

The key to accuracy is to avoid unnecessarily large numbers. Often that requires us to avoid small numbers! A small pivot generally means large multipliers (since we divide by the pivot). Also, a small pivot now means a large pivot later. The product of the pivots is a fixed number (except for its sign). That number is the determinant. A good plan is to choose the largest candidate in each new column as the pivot. This is called "partial pivoting." The competitors are in the pivot position and below. We will see why this strategy is built into computer programs.

Other row exchanges are done to save elimination steps. In practice, most large matrices have only a small percentage of nonzero entries. The user probably knows their location. Elimination is generally fastest when the equations are ordered to put those nonzeros close to the diagonal. Then the matrix is as "banded" as possible.

New questions arise for parallel machines. Vector computers like the Cray have 4, 8, or 16 processors working at once. The Connection Machine already has 1024 and the CM-5 is designed for 2^14 = 16,384 processors in parallel. Now the problem is communication—to send processors the data they need, when they need it. This is a major research area now. The brief comments in this section will try to introduce you to thinking in parallel.

Section 9.2 is about instability that can't be avoided. It is built into the problem, and this sensitivity is measured by the "condition number." Then Section 9.3 describes how to solve Ax = b by iterations. Instead of direct elimination, the computer solves an easier equation many times. Each answer x_k goes back into the same equation to find the next guess x_{k+1}.
For good iterations, the x_k converge quickly to x = A^{-1}b.

Roundoff Error and Partial Pivoting

Up to now, any pivot (nonzero of course) was accepted. In practice a small pivot is dangerous. A catastrophe can occur when numbers of different sizes are added. Computers keep a fixed number of significant digits (say three decimals, for a very weak machine). The sum 10,000 + 1 is rounded off to 10,000. The "1" is completely lost. Watch how that changes the solution to this problem:

    .0001u + v = 1                                       [ .0001   1 ]
       −u + v = 0      starts with coefficient matrix    [   −1    1 ]

If we accept .0001 as the pivot, elimination adds 10,000 times row 1 to row 2. Roundoff leaves

    10,000v = 10,000    instead of    10,001v = 10,000.

The computed answer v = 1 is near the true v = .9999. But then back substitution puts that v into the first equation:

    .0001u + 1 = 1    instead of    .0001u + .9999 = 1.

The first equation gives u = 0. The correct answer (look at the second equation) is u = 1.000. By losing the "1" in the matrix, we have lost the solution. The change from 10,001 to 10,000 has changed the answer from u = 1 to u = 0 (100% error!).

If we exchange rows, even this weak computer finds an answer that is correct to three places:

    −u + v = 0                −u + v = 0              v = 1
    .0001u + v = 1     →      1.0001v = 1      →      u = 1.

The original pivots were .0001 and 10,000—badly scaled. After a row exchange the exact pivots are −1 and 1.0001—well scaled. The computed pivots −1 and 1 come close to the exact values. Small pivots bring numerical instability, and the remedy is partial pivoting. (The short MATLAB sketch below replays this example in three-digit arithmetic.) The kth pivot is decided when we reach and search column k:

    Choose the largest number in row k or below. Exchange its row with row k.

The strategy of complete pivoting looks also in later columns for the largest pivot. It exchanges columns as well as rows. This expense is seldom justified, and all major codes use partial pivoting. Multiplying a row or column by a scaling constant can also be worthwhile. If the first equation above is u + 10,000v = 10,000 and we don't rescale, then 1 is the pivot but we are in trouble again.

For positive definite matrices, row exchanges are not required. It is safe to accept the pivots as they appear. Small pivots can occur, but the matrix is not improved by row exchanges. When its condition number is high, the problem is in the matrix and not in the order of elimination steps. In this case the output is unavoidably sensitive to the input.

The reader now understands how a computer actually solves Ax = b—by elimination with partial pivoting. Compared with the theoretical description—find A^{-1} and multiply A^{-1}b—the details took time. But in computer time, elimination is much faster. I believe this algorithm is also the best approach to the algebra of row spaces and nullspaces.

Operation Counts: Full Matrices and Band Matrices

Here is a practical question about cost. How many separate operations are needed to solve Ax = b by elimination? This decides how large a problem we can afford.

Look first at A, which changes gradually into U. When a multiple of row 1 is subtracted from row 2, we do n operations. The first is a division by the pivot, to find the multiplier l. For the other n − 1 entries along the row, the operation is a "multiply-subtract." For convenience, we count this as a single operation. If you regard multiplying by l and subtracting from the existing entry as two separate operations, multiply all our counts by 2.

The matrix A is n by n. The operation count applies to all n − 1 rows below the first.
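The sketch below replays that catastrophe (my own illustration, not from the Teaching Codes). It simulates a three-significant-digit machine with round(x, 3, 'significant'), which needs a reasonably recent MATLAB, and then lets lu choose the pivot by partial pivoting.

    fl = @(x) round(x, 3, 'significant');      % keep three significant digits
    A = [.0001 1; -1 1];   b = [1; 0];

    % Accept .0001 as the pivot: add 10,000 times row 1 to row 2
    m   = fl(-A(2,1)/A(1,1));                  % multiplier 10,000
    a22 = fl(A(2,2) + m*A(1,2));               % fl(10,001) = 10,000: the "1" is lost
    b2  = fl(b(2) + m*b(1));                   % 10,000
    v   = fl(b2/a22)                           % v = 1
    u   = fl((b(1) - A(1,2)*v)/A(1,1))         % u = 0, a 100% error

    % Partial pivoting exchanges the rows automatically
    [L,U,P] = lu(A);                           % P*A = L*U with pivots -1 and 1.0001

With the rows exchanged, the same three-digit arithmetic gives v = 1 and u = 1, correct to the places we kept. Now back to the cost of elimination: each of the n − 1 rows below the first pivot needs those n operations.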
Thus it requires n times n − 1 operations, or n² − n, to produce zeros below the first pivot. Check: All n² entries are changed, except the n entries in the first row.

When elimination is down to k equations, the rows are shorter. We need only k² − k operations (instead of n² − n) to clear out the column below the pivot. This is true for 1 ≤ k ≤ n. The last step requires no operations (1² − 1 = 0), since the pivot is set and forward elimination is complete. The total count to reach U is the sum of k² − k over all values of k from 1 to n:

    (1² + ··· + n²) − (1 + ··· + n) = n(n + 1)(2n + 1)/6 − n(n + 1)/2 = (n³ − n)/3.

Those are known formulas for the sum of the first n numbers and the sum of the first n squares. Substituting n = 1 into n³ − n gives zero. Substituting n = 100 gives a million minus a hundred—then divide by 3. (That translates into one second on a workstation.) We will ignore the last term n in comparison with the larger term n³, to reach our main conclusion:

    The operation count for forward elimination (A to U) is n³/3.

That means n³/3 multiplications and n³/3 subtractions. Doubling n increases this cost by eight (because n is cubed). 100 equations are OK, 1000 are expensive, 10000 are impossible. We need a faster computer or a lot of zeros or a new idea.

On the right side of the equations, the steps go much faster. We operate on single numbers, not whole rows. Each right side needs exactly n² operations. Remember that we solve two triangular systems, Lc = b forward and Ux = c backward. In back substitution, the last unknown needs only division by the last pivot. The equation above it needs two operations—substituting x_n and dividing by its pivot. The kth step needs k operations, and the total for back substitution is

    1 + 2 + ··· + n = n(n + 1)/2 ≈ n²/2  operations.

The forward part is similar. The n² total exactly equals the count for multiplying A^{-1}b! This leaves Gaussian elimination with two big advantages over A^{-1}b:

1  Elimination requires n³/3 operations compared to n³ for A^{-1}.

2  If A is banded so are L and U. But A^{-1} is full of nonzeros.

Band Matrices

These counts are improved when A has "good zeros." A good zero is an entry that remains zero in L and U. The most important good zeros are at the beginning of a row. No elimination steps are required (the multipliers are zero). So we also find those same good zeros in L. That is especially clear for this tridiagonal matrix A (the −1, 2, −1 matrix of order 4):

    A = [  2  −1   0   0 ]
        [ −1   2  −1   0 ]
        [  0  −1   2  −1 ]
        [  0   0  −1   2 ]

Rows 3 and 4 of A begin with zeros. No multiplier is needed, so L has the same zeros. Also rows 1 and 2 end with zeros. When a multiple of row 1 is subtracted from row 2, no calculation is required beyond the second column. The rows are short. They stay short! Figure 9.1 shows how a band matrix A has band factors L and U.

Figure 9.1  A = LU for a band matrix. Good zeros in A stay zero in L and U.

These zeros lead to a complete change in the operation count, for "half-bandwidth" w:

    A band matrix has a_ij = 0 when |i − j| ≥ w.

Thus w = 1 for a diagonal matrix and w = 2 for a tridiagonal matrix. The length of the pivot row is at most w. There are no more than w − 1 nonzeros below any pivot. Each stage of elimination is complete after w(w − 1) operations, and the band structure survives. There are n columns to clear out. Therefore:

    Forward elimination on a band matrix needs less than w²n operations.

For a band matrix, the count is proportional to n instead of n³. It is also proportional to w². A full matrix has w = n and we are back to n³.
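As a check on these counts (a small experiment of mine, not from the text), the loop below eliminates on a full matrix and counts one operation per division and per multiply-subtract. The total is exactly (n³ − n)/3. The last lines build the −1, 2, −1 matrix as a sparse band matrix: its triangular factors keep the band, while its inverse is full.

    n = 100;  A = rand(n) + n*eye(n);        % strong diagonal keeps the pivots nonzero
    ops = 0;
    for k = 1:n-1
        for i = k+1:n
            l = A(i,k)/A(k,k);   ops = ops + 1;            % one division
            A(i,k+1:n) = A(i,k+1:n) - l*A(k,k+1:n);
            A(i,k) = 0;          ops = ops + (n-k);         % n-k multiply-subtracts
        end
    end
    [ops, (n^3 - n)/3]                        % the two counts agree exactly

    e = ones(n,1);
    B = spdiags([-e 2*e -e], -1:1, n, n);     % the -1, 2, -1 band matrix, w = 2
    [L,U,P] = lu(B);
    [nnz(L)+nnz(U), nnz(inv(B))]              % about 4n nonzeros versus n^2

The band count w²n = 4n is tiny next to n³/3. The next paragraphs make the band count exact.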
For a closer count, remember that the bandwidth drops below w in the lower right corner (not enough space). The exact count to find L and U is

    w(w − 1)(3n − 2w + 1)/3  for a band matrix,   which is   n(n − 1)(n + 1)/3 = (n³ − n)/3   when w = n.

On the right side, to find x from b, the cost is about 2wn (compared to the usual n²). Main point: For a band matrix the operation counts are proportional to n. This is extremely fast. A tridiagonal matrix of order 10,000 is very cheap, provided we don't compute A^{-1}. That inverse matrix has no zeros at all. For the −1, 2, −1 matrix of order 4 it is

    A^{-1} = (1/5) [ 4  3  2  1 ]
                   [ 3  6  4  2 ]
                   [ 2  4  6  3 ]
                   [ 1  2  3  4 ]

We are actually worse off knowing A^{-1} than knowing L and U. Multiplication by A^{-1} needs the full n² steps. Solving Lc = b and Ux = c needs only 2wn. Here that means 4n. A band structure is very common in practice, when the matrix reflects connections between near neighbors. We see a13 = 0 and a14 = 0 because 1 is not a neighbor of 3 and 4.

We close with two more operation counts:   1.  A^{-1} costs n³ steps.   2.  QR costs 2n³/3 steps.

1  Start with AA^{-1} = I. The jth column of A^{-1} solves Ax_j = jth column of I. Normally each of those n right sides needs n² operations, making n³ in all. The left side costs n³/3 as usual. (This is a one-time cost! L and U are not repeated for each new right side.) This count gives 4n³/3, but we can get down to n³. The special saving for the jth column of I comes from its first j − 1 zeros. No work is required on that right side until forward elimination reaches row j. The forward cost for column j is about (n − j)²/2 instead of n²/2. Summing over j, the total for forward elimination on the n right sides is about n³/6. Then the final count of multiplications for A^{-1} (with an equal number of subtractions) is n³ if we actually want the inverse matrix:

    n³/3 (L and U)  +  n³/6 (forward)  +  n(n²/2) (back substitutions)  =  n³.        (1)

2  The Gram-Schmidt process works with columns instead of rows—that is not so important to the count. The key difference from elimination is that the multiplier is decided by a dot product. So it takes n operations to find the multiplier, where elimination just divides by the pivot. Then there are n "multiply-subtract" operations to remove from column 2 its projection along column 1. (See Section 4.4 and Problem 4.4.28 for the sequence of projections.) The cost for Gram-Schmidt is 2n where for elimination it is n. This factor 2 is the price of orthogonality. We are changing a dot product to zero instead of changing an entry to zero.

Caution  To judge a numerical algorithm, it is not enough to count the operations. Beyond "flop counting" is a study of stability and the flow of data. Van Loan emphasizes the three levels of linear algebra: linear combinations cu + v (level 1), matrix-vector Au + v (level 2), and matrix-matrix AB + C (level 3). For parallel computing, level 3 is best. AB uses 2n³ flops (additions and multiplications) and only 2n² data—a good ratio of work to communication overhead. Solving UX = B for matrices is better than Ux = b for vectors. Gauss-Jordan partly wins after all!

Plane Rotations

There are two ways to reach the important factorization A = QR. One way works to find Q, the other way works to find R. Gram-Schmidt chose the first way, and the columns of A were orthogonalized to go into Q. (Then R was an afterthought. It was upper triangular because of the order of Gram-Schmidt steps.) Now we look at a method that starts with A and aims directly at R. Elimination gives A = LU, orthogonalization gives A = QR.
What is the difference, when R and U are both upper triangular? For elimination L is a product of E’s—with 1’s on the diagonal and the multiplier /;; below. For QR we must use orThe E’s are not allowed. We don’t want a triangular L, we want thogonal matrices. an orthogonal Q. There are two simple orthogonal matrices to take the place of the E’s. The reflection matrices I —2uu" are named after Householder. The plane rotation matrices are named after Givens. The matrix that rotates the xy plane by 6, and leaves the z direction alone, is Givens Rotation Oo; =_| cos@é sing 0 —sin@d O cos@ 0 Ort Use Q2) the way you used £>\, to produce a zero in the (2,1) position. That determines the angle 6. Here is an example given by Bill Hager in Applied Numerical Linear Algebra (Prentice-Hall, 1988): geo) QnA=|o 8 0 nea) ao 153 114 6 0 Oro 120 200 +79. —40 ~223 309 |= 150 —155 —110 0 200 Tey —40 Ps 395 9.1. The zero came from —.8(90) + .6(120). 90 cos 6 = ———_____ V907 + 1202 Gaussian Elimination in Practice 401 No need to find 6, what we needed was al 20) sin? = ————____ V907 + 1202 and a) Now we attack the (3,1) entry. The rotation will be in rows and columns 3 and 1. The numbers cos@ and sin@ are determined from 150 and 200, instead of 90 and 120. They happen to be .6 and —.8 again: (Gy 08 On =—s 0 Q3102:A = One more come step to R. 26 0 &&e S00) 0 oC) ie ae The (3,2) entry has to go. from 75 and 100. The rotation is now I 03203102A=|90 0 0 6 Ss 0 8 is 25) t= 25 0 75 0 100 = The numbers in rows 290) 125, 0 fea 0 Hoo cos@ and columns -l=|] . 250 —225 B25 and sin@ now 2 and 3: 250..—125 .0 125 0 Q 250 125 375 We have reached the upper triangular R. What is Q? Move the plane rotations Qj; to the other side to find A = QR—just as you moved the elimination matrices Ej; to the other side to find A = LU: Q303102A=R means A=(Q>,03/'03))R= OR. (3) The inverse of each Qj; is Qi, (rotation through —@). This is different from the inverse of £;;—which was not an orthogonal matrix! Bx added back to row i the multiple /;; (times row j) that E£;; had subtracted. I hope you see how the big computations of linear algebra—LU and QR—are similar but not the same. There is a third big computation—eigenvalues and eigenvectors. If we can make A triangular, we can see its eigenvalues on the diagonal. But we can’t use U and we can’t use R. To preserve the eigenvalues, the allowed step is not Q2;A but Qr1AQ>,. That extra factor Ont wipes out the zero that Q2) created! There are two ways to go. Neither one gives the eigenvalues in a fixed number of steps. (That is impossible. The calculation of cos @ and sin@ involved only a square root. The nth degree equation det(A — AJ) = 0 cannot be solved by a succession of square roots.) But still the rotations Q;; are useful: Method 1 Produce a zero in the (3, 1) entry of Q2;A, instead of (2,1). That zero is not destroyed when Ox multiplies on the right. We are leaving a diagonal of nonzeros under the main diagonal, so we can’t read off the eigenvalues. But this “Hessenberg matrix”’ with the extra diagonal of nonzeros still has a lot of good zeros. Method 2 Choose of Q2AQ>). a different Q2;, which does produce a zero in the (2, 1) position This is just a 2 by 2 eigenvalue problem, for the matrix in the upper 402 Chapter 9 Numerical Linear Algebra left corner of A. The column (cos@, —sin@) is an eigenvector of that matrix. This is the first step in “Jacobi’s method.” The problem of destroying zeros will not go away. 
The second step chooses Q3} so that 03) Q>,AQ3} Ox has a zero in the (3, 1) position. But it loses the zero in the (2, 1) position. Jacobi solves 2 by 2 eigenvalue problems to find his Q;;, but earlier nonzeros keep coming back. In general those nonzeros are smaller each time, and after several loops through the matrix the lower triangular part is substantially reduced. Then the eigenvalues gradually appear on the diagonal. What you should remember is this. The Q’s are orthogonal matrices—their columns with (cos@,sin@) and (—sin@,cos@) are orthogonal unit vectors. Computations with the Q’s are very stable. The angle 6 can be chosen to make a particular entry zero. This is a step toward the final goal of a triangular matrix. That was the goal at the beginning of the book, and it still is. Problem Set 9.1 1 Find the two pivots with and without partial pivoting for 001 0 oe : mae With partial pivoting, why are no entries of L larger than 1? Find a 3 by 3 matrix A with all |aj;| < 1 and |/;;| < 1 but third pivot = 4. Z Compute the exact inverse of the Hilbert matrix A by elimination. Then compute A~! again by rounding all numbers to three figures: A = hilb3) = Nie Wl ee 3 dle WI BIR Ble Ale Wie For the same A compute b = Ax for x = (1, 1, 1) and x = (0, 6, —3.6). A small change Ab produces a large change Ax. 4 Find the eigenvalues (by computer) of the 8 by 8 Hilbert matrix aj; = 1/(@i + j — 1). In the equation Ax = b with ||b|| = 1, how large can ||x|| be? If b has roundoff error less than 5 10~!°, how large an error can this cause in x? For back substitution with a band matrix (width w), show that the number of multiplications to solve Ux = ¢ is approximately wn. 6 7. If you know L and U and @Q and R, is it faster to solve LUx = b or ORx = b? Show that the number of multiplications to invert an upper triangular n by n matrix is about zn. Use back substitution on the columns of 7, upward from 1’s. 9.1 Gaussian Elimination in Practice 403 Choosing the largest available pivot in each column (partial pivoting), factor each AINtOCE Ate LU il. ae; >| and @ eee 2 Aa 0 i ae) ©¢ Put 1’s on the three central diagonals of a 4 by 4 tridiagonal matrix. cofactors of the six zero entries. 10 Those entries are nonzero Find the in A~!. (Suggested by C. Van Loan.) Find the LU factorization of A =[€1]. On your computer solve by elimination when ¢ = 10 10 ot iy, Lee tales The true x is (1, 1). Make = lee 2 10 10 ee ; a table to show the error for each ¢. Exchange the two equations and solve again—the errors should almost disappear. 11 Choose sin@ and cos@ to triangularize A, and find R: cos@é —sin@}]}]1 tay ol E —1 He AR a - 12 Choose sin@ and cos@ to make eigenvalues? 13 When A is multiplied by Q;;, which of the n> entries of A are changed? When Q>,AQ3} triangular (same A). What are the Q;;A is multiplied on the right by Oe which entries are changed now? 14 How many multiplications and how many additions are used to compute Q;;A? (A careful organization of the whole sequence of rotations gives in? multiplica- tions and in? additions—the same as for QR by reflectors and twice as many as tor. U-) 15 (Turning a robot hand) The robot produces any 3 by 3 rotation A from plane rotations around the x, y, z axes. Then Q32Q3;Q21A = R, where A is orthogonal so R is J! The three robot turns are in A = OO Own The three angles are “Euler angles” and det Q = | to avoid reflection. Start by choosing cos@ and sin@ so that cos6 QO7,A = | sind 0 —snO6 cosd 0 O;,;-1 2 2 0) A 2 —1 21 Lae 2 sa is zero in the (2, 1) position. 
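As a computational companion to the plane rotations of this section (and to Problems 11–15 above), here is a short MATLAB sketch of QR by Givens rotations. It is my own illustration with a made-up matrix; each rotation is chosen, as in the text, to zero one entry below the diagonal.

    A = [2 1 1; 4 3 3; 8 7 9];          % any square matrix
    n = size(A,1);
    R = A;   Q = eye(n);
    for j = 1:n-1
        for i = j+1:n                    % zero out R(i,j) with a rotation of rows j and i
            r = hypot(R(j,j), R(i,j));
            if r > 0
                c = R(j,j)/r;   s = R(i,j)/r;
                G = eye(n);   G([j i],[j i]) = [c s; -s c];
                R = G*R;                 % the (i,j) entry becomes zero
                Q = Q*G';                % move each rotation to the other side
            end
        end
    end
    norm(A - Q*R)                        % essentially zero: A = QR
    norm(Q'*Q - eye(n))                  % Q is orthogonal

The inverses of the rotations, multiplied in order, rebuild Q—just as the E's were moved to the other side to produce A = LU.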
404 Chapter 9 Numerical Linear Algebra NORMS AND CONDITION NUMBERS gp 9.2 How do we measure the size of a matrix? For a vector, the length is ||x||. For a matrix, the norm is ||A||. This word “norm” is sometimes used for vectors, instead of length. It is always used for matrices, and there are many ways to measure ||A||. We look at the requirements on all “matrix norms”, and then choose one. Frobenius squared all the entries of A and added; his norm ||A||p is the square root. This treats the matrix like a long vector. It is better to treat the matrix as a matrix. Start with a vector norm: ||x + y|| is not greater than ||x|| + ||y||. This is the triangle inequality: x + y is the third side of the triangle. Also for vectors, the length of 2x or —2x is doubled to 2||x||. The same rules apply to matrix norms: |A + Bll < |All + [Bll and ||cAl] = |e] |All. (1) The second requirements for a norm are new for matrices—because matrices multiply. The size of Ax and the size of AB must stay under control. For all matrices and all vectors, we want || Ax |]< |]All ll and ABI < JA} Bl. (2) This leads to a natural way to define ||A||. Except for the zero matrix, the norm is a positive number. The following choice satisfies all requirements: DEFINITION The norm of a matrix A is the maximum |All = x40 of the ratio \|Ax||/||x ||: ||Ax|| ||| (3) Then ||Ax||/||x|| is never larger than ||A|| (its maximum). This says that ||Ax|| < |All ||x||, as required. ||x||/||x||. Therefore ||| = 1. If A is an orthogonal matrix Q, then again lengths are preserved: |x|| for every x. The ratios again give ||Q|| = 1. Example 1 If A is the identity matrix /, the ratios are always ||Qx|| = Example 2 The norm of a diagonal matrix is its largest entry (using absolute values): The norm of A= E i is; |AllP=3. The ratio is ||Ax|| = \/2?x7 + 32x} divided by ||x|] = ,/x? +3. That is a maximum when x; = 0 and x2 = 1. This vector x = (0, 1) is an eigenvector with Ax = (0, 3). The eigenvalue is 3. This is the largest eigenvalue of A and it equals the norm. 9.2 Norms and Condition Numbers 405 For a positive definite symmetric matrix the norm is ||A\| = Amax.Choosex to be the eigenvector with maximum eigenvalue: Ax = Amaxx. Then ||Ax||/||x|| equals Amax. The point is that no other vector x can make the ratio larger. The matrix is A = QAQ!, and the orthogonal matrices Q and Q" leave lengths unchanged. So the ratio to maximize is really ||Ax||/||x|]. The norm Amax is the largest eigenvalue in the diagonal matrix A. Symmetric matrices Suppose A is symmetric but not positive definite—some eigenvalues of A are negative or zero. Then the norm ||A|| is the largest of |A;|, |Aa|, ..., |An|. We take absolute values of the A’s, because the norm is only concerned with length. For an eigenvector we have ||Ax|| = ||Ax||, which is |A| times ||x|]. Dividing by ||x|| leaves |A|. The x that gives the maximum ratio is the eigenvector for the maximum |A|. Unsymmetric matrices If A is not symmetric, its eigenvalues may not measure its true size. The norm can be large when the eigenvalues are small. generally larger than |\A|max. A very unsymmetric example Thus the norm is has A; = Az = 0 but its norm is not zero: A= 3 a A || A || = max Axl has norm x40 ||x|| —w) The vector x = (0,1) gives Ax = (2,0). The ratio of lengths is 2/1. maximum ratio ||A||, even though x is not an eigenvector. This is the It is the symmetric matrix A™A, not the unsymmetric A, that has x = (0, 1) as its eigenvector. 
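A quick check of that example (my own aside, not from the text): MATLAB's norm agrees with the square root of the largest eigenvalue of A^T A, while the eigenvalues of A itself see nothing.

    A = [0 2; 0 0];                  % the very unsymmetric example above
    norm(A)                          % 2: the matrix norm
    max(abs(eig(A)))                 % 0: the eigenvalues of A miss the size
    sqrt(max(eig(A'*A)))             % 2 again: the norm comes from A'*A
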
The norm is really decided by the largest eigenvalue of A'A, as we now prove. 9A The norm of A (symmetric or not) is the square root of Ninax(Ah A): te || Ax ||? = max 7, x2 wT Al Ax == eo ar rls eee St Amax(A’A) max ( )| 4 (4) Proof Choose x to be the eigenvector of A'A corresponding to its largest eigenvalue max. The ratio in equation (1) is then xT ATAx this particular x, the ratio equals Amax. = x! (Amax)X divided by x'x. For No other x can give a larger ratio. The symmetric matrix A'A has orthonormal eigenvectors q1,q2,---.Qn- Every x is a combination of those vectors. Try this combination in the ratio and remember that q} 4) =U Si pa ALAN xlx > (C141 aur CnQn) qe (c1A1 41 +-+++CnAngn) + ¢ngn)! (c1q1 air ee + Cnn) a crAy a = +... Con +c? (5) 406 Chapter 9 Numerical Linear Algebra That last ratio cannot be larger than Amax. The maximum ratio is when all c’s are zero, except the one that multiplies Amax. Note 1 The ratio in (5) is known as the Rayleigh quotient for the matrix A'A. The maximum is the largest eigenvale dmax(A!A). The minimum is Amin(A! A). If you substitute any vector x into the Rayleigh quotient x! ATAx/x'x, you are guaranteed to get a number between Amin and Amax.Note 2 The norm ||A|| equals the largest singular value Omax of A. The singular values O|,..-., 0, are the square roots of the positive eigenvalues of A'A. So certainly omax = (Amax)!/*. This is the norm of A. Note 3 Check that the unsymmetric example in equation (3) has Limag(A LA) = 4: A= k | leads to ATA = E | with Amax = 4. So the norm is ||Al| = V4. The Condition Number of A Section 9.1 showed that roundoff error can be serious. Some systems are sensitive, others are not so sensitive. The sensitivity to error is measured by the condition number. This is the first chapter in the book which intentionally introduces errors. We want to estimate how much they change x. The original equation is Ax = b. Suppose the right side is changed to b+ Ab because of roundoff or measurement error. The solution is then changed to x+ Ax. Our goal is to estimate the change Ax in the solution from the change AB in the equation. Subtraction gives the error equation A(Ax) = Ab: Subtract Ax = b from A(x + Ax) =b+ Ab to find A(Ax) = AB. (6) The error is Ax = A~!AD. It is large when A~! is large (then A is nearly singular). The error Ax is especially large when Ab points in the worst direction—which is amplified most by A~!. The worst error has ||Ax|| = ||A7/|| ||Ab||. That is the largest possible output error Ax. This error bound ||A~!|| has one serious drawback. If we multiply A by 1000, then A~! is divided by 1000. The matrix looks a thousand times better. But a simple rescaling cannot change the reality of the problem. It is true that Ax will be divided by 1000, but so will the exact solution x = A~'b. The relative error ||Ax||/||x|| will stay the same. It is this relative change in x that should be compared to the relative change in b. Comparing relative errors will now lead to the “condition number” c = ||A|j ||A~! |]. Multiplying A by 1000 does not change this number, because A™! is divided by 1000 and the product c stays the same. 9.2 Norms and Condition Numbers 407 9B The solution error is less than c = ||A|j ||A~‘|| times the problem error: Axl Ad lz) = ° Wen S If the problem error is AA (error in the matrix instead of in b), this changes to || Ax || |AAl| Fe ee Ix + Axl ~° (Al. i») Proof The original equation is b = Ax. The error equation (6) is Ax Apply the key property (2) of matrix norms: ll < All Ix and = A~!Ab. 
Axl] < JA" ABI. Multiply the left sides to get ||b|| ||Ax||, and also multiply the right sides. Divide both sides by ||b|| ||x||. The left side is now the relative error ||Ax||/||x||. The right side is now the upper bound in equation (7). The same condition number c = matrix. We have AA instead of Ab: ||A|l ||A~!|| appears when the error is in the Subtract Ax = b from (A + AA)(x + Ax) = Bb to find A(Ax) = —(AA)(x + Ax). Multiply the last equation by A~! and take norms to reach equation (8): || Ax || ee ee Axl] < AT AAI lle + Axl cd lx + Ax]|| le Conclusion Errors enter in two ways. They begin with an error AA or Ab—we use a wrong matrix or a wrong b. This problem error is amplified (a lot or a little) into the solution error Ax. That error is bounded, relative to x itself, by the condition number c. The error Ab depends on computer roundoff and on the original measurements of b. The error AA also depends on the elimination steps. Small pivots tend to produce large errors in L and U. Then L+ AL times U + AU equals A+ AA. When AA or the condition number is very large, the error Ax can be unacceptable. Example 3 When values: A and A~! are symmetric, c = ||A|| |A—!|| comes 6 0 A=a E | has norm 6. = Ate= LO 1n6 0 1 has; norm from the eigen- i"5 This A is symmetric positive definite. Its norm is Amax = 6. The norm l/Agin = - Multiplying those norms gives the condition number: 7 Amax Amin = GaS'S:3 of A! is 408 Chapter 9 Numerical Linear Algebra Example 4 Keep the same A, with eigenvalues 6 and 2. To make x small, choose b along the first eigenvector (1,0). To make Ax large, choose Ab along the second eigenvector (0, 1). Then x = ib and Ax = 5b. The ratio ||Ax||/||x|| is exactly c = 3 times the ratio ||Ab||/||b|]. This shows that the worst error allowed by the condition number can actually happen. Here is a useful rule of thumb, experimentally verified for Gaussian elimination: The computer can lose logc decimal places to roundoff error. Problem Set 9.2 1 Find the norms Amax and condition numbers Amax/Amin Of these positive definite matrices: Sy 0 ye Find the norms @ 2) wl il 2 23 ih al Jee and condition numbers from the square roots of AraC A LA)vand Axl AlAD: =O) ong 3 a 0 0 jae pie Explain these two inequalities from the definitions of ||A|| and ||Bl: |ABx|| < ||Al] ||Bxl] < |All |Bll Ix. From the ratio that gives ||AB||, deduce that ||AB|| < ||Al| ||B||. This is the key to using matrix norms. 4 5 Use ||AB|| < ||A]] |B] to prove that the condition number of any matrix A is at least 1. Why is J the only symmetric positive definite matrix that has Amax = Amin = 1? Then the only matrices with ||A|| = 1 and ||A~!|| = 1 must have ATA = J. They are matrices. 6 Orthogonal matrices have norm ||Q|| = 1. If A = QR show that ||A]| < ||R]| and also ||R|| < ||Al]. Then ||/A|| = ||R||. Find an example of A = LU with Al] < LINCI. ve 8 (a) Which famous inequality gives ||(A + B)x|| < ||Ax|| + ||Bx|| for every x? (b) Why does the definition (4) of matrix norms lead to ||A+B|| < ||A||+||B||? Show that if A is any eigenvalue of A, then |A| < ||Al|. Start from Ax = Ax. 9.2. Norms and Condition Numbers 409 The “spectral radius” p(A) = |Amax| is the largest absolute value of the eigenvalues. Show with 2 by 2 examples that p(A+B) < p(A)+ p(B) and p(AB) < P(A)p(B) can both be false. The spectral radius is not acceptable as a norm. 10 (a) Explain why A and A! have the same condition number. (b) Explain why A and A! have the same norm. 11 Estimate the condition number of the ill-conditioned matrix A = ie sve |. 
12 Why is the determinant of A no good as a norm? dition number? 13 (Suggested by C. Moler and C. Van Loan.) Compute b— Ay and b— Az when yt 2 kt ee re JOO. | 363 Olle umeGS Why is it no good as a con- cy ly ol * Llu 08701, 999 cline tle Is y closer than z to solving Ax = b? Answer in two ways: Compare the residual b — Ay to b — Az. Then compare y and z to the tue x = (1,-—1). Both answers can be right. Sometimes we want a small residual, sometimes a small Ne 14 (a) Compute the determinant of A in Problem (b) If possible compute Problems 15 13. Compute A’. ||A|| and |A~!|| and show that c > 10°. 15-19 are about vector norms other than the usual ||x|| = /x -x. The “Z! norm” and the “J norm” of x = (x1,...,Xn) Welly = [xa] +--+ + [xn] Compute the norms and are |x ]loo = max [xj]. 1<i<n ||x|| and ||x||; and ||x||,.o of these two vectors in R>: ee ee) se (eee eae 16 Prove that ||x|loo < ||x|| < ||x|l1. Show from the Schwarz inequality that the ratios 1x ||/I Iloo and ||x||1/||x||are never larger than /n. Which vector (x1, ...,- Xn) gives ratios equal to /n? Ww, All vector norms must satisfy the triangle inequality. Prove that lx + Ylloo < II¥lloo+1¥lloo 18 and lx + yl S Ile lia + My h. Vector norms must also satisfy ||cx||= |c| |||]. The norm must be positive except when x = 0. Which of these are norms for (x1, x2)? Ea 89 Wee Be) Ix Ilo = lel + [lt lloo |x || 3 = min |x;| Ix llp = ||Ax|| (answer depends on A). 410° Chapter 9 Numerical Linear Algebra ITERATIVE METHODS FOR LINEAR ALGEBRA gp 9.3 Up to now, our approach to Ax = b has been “direct.” We accepted A as it came. We attacked it with Gaussian elimination. This section is about iterative methods, which replace A by a simpler matrix S. The difference T = S—A is moved over to the right side of the equation. The problem becomes easier to solve, with S instead of A. But there is a price—the simpler system has to be solved over and over. An iterative method is easy to invent. Just split A into S— 7. Then Ax = b is the same as Sx =Tx +b. The novelty is to solve (1) iteratively. (1) Each guess x; leads to the next x,41: SXp41 = PX, +O. (2) Start with any xo. Then solve $x; = Tx9 +6. Continue to the second iteration Sx2 = Tx; +5. A hundred iterations are very common—maybe more. Stop when (and if!) the new vector x;,+1 is sufficiently close to x,—or when the residual Ax; — 6 is near zero. We can choose the stopping test. Our hope is to get near the true solution, more quickly than by elimination. When the sequence x, converges, its limit x = Xoo does solve equation (1). The proof is to let k — oo in equation (2). The two goals of the splitting A = S—T are speed per step and fast convergence of the x;. The speed of each step depends on S$ and the speed of convergence depends on S pele 1 Equation (2) should be easy to solve for x,4,. The “preconditioner” S could be diagonal or triangular. When its LU factorization is known, each iteration step is fast. 2 The difference x —x, (this is the error e,) should go quickly to zero. Subtracting equation (2) from (1) cancels b, and it leaves the error equation: Sex41 = Tex which means ey; = Se eg (3) At every step the error is multiplied by S~'T. If S~'T is small, its powers go quickly to zero. But what is “small”? The extreme splitting is S = A and T =0. Then the first step of the iteration is the original Ax = b. Convergence is perfect and S~'T is zero. But the cost of that step is what we wanted to avoid. The choice of S is a battle between speed per step (a simple S) and fast convergence (S close to A). 
Here are some popular choices: J GSS S = diagonal part of A (the iteration is called Jacobi’s method) = lower triangular part of A (Gauss-Seidel method) 9.3 Iterative Methods for Linear Algebra SOR S = combination of Jacobi and Gauss-Seidel (successive overrelaxation) ILU S = approximate L times approximate U (incomplete LU method). 411 Our first question is pure linear algebra: When do the x,’s converge to x? The answer uncovers the number |A|max that controls convergence. In examples of J and GS and SOR, we will compute this “spectral radius” |A|max. It is the largest eigenvalue of the iteration matrix S~'T. The Spectral Radius Controls Convergence Equation (3) is eg+; = S~'!Te,. Every iteration step multiplies the error by the same matrix B = S~'T. The error after k steps is eg = B*eo. The error approaches zero if the powers of B = S~'T approach zero. It is beautiful to see how the eigenvalues of B—the largest eigenvalue in particular—control the matrix powers B*. 9C Convergence The powers B* approach zero if and only if every eigenvalue of B satisfies |A| < 1. The rate of convergence is controlled by the spectral radius |A.max: The test for convergence is |A|max < 1. Real eigenvalues must lie between —1 and 1. Complex eigenvalues 4 = a+ib must lie inside the unit circle in the complex plane. In that case the absolute value |A| is the square root of a + b*—Chapter 10 will discuss complex numbers. In every case the spectral radius is the largest distance from the origin 0 to the eigenvalues A;,...,A,. Those are eigenvalues of the iteration matrix Bsr Mk To see why |Almax < 1 is necessary, suppose the starting error e9 happens to be an eigenvector of B. After one step the error is Bep = eo. After k steps the error is Bee = Akeo. If we start with an eigenvector, we continue with that eigenvector—and it grows or decays with the powers A*. This factor A* goes to zero when Since this condition is required of every eigenvalue, we need |A|max < l. |A| < 1. To see why |Al|max < 1 is sufficient for the error to approach zero, suppose eo is a combination of eigenvectors: €o9 =cyXi +-+-+cenX, leads to ex = c1(A1)*x} feeef CH) Xp. (4) This is the point of eigenvectors! They grow independently, each one controlled by its eigenvalue. When we multiply by B, the eigenvector x; is multiplied by A;. If all |A;| < 1 then equation (4) ensures that e, goes to zero. Example 1 B = [-6:3] has Amex = 1.1 BY yO rs| Has Ae: B27 Wi times B. Then B®? is (1.1)* times B. The powers of B blow up. Contrast with the powers of B’. The matrix (B’)* has (.6)* and (.5)* on its diagonal. The off-diagonal entries also involve (.6)*, which sets the speed of convergence. 412 Chapter 9 Numerical Linear Algebra Note There is a technical difficulty when B does not have n independent eigenvectors. (To produce this effect in B’, change .5 to .6.) The starting error eg may not be a combination of eigenvectors—there are too few for a basis. Then diagonalization is impossible and equation (4) is not correct. We turn to the Jordan form: B= Siss and Be =syrs*. (5) Section 6.6 shows how J and J* are made of “blocks” with one repeated eigenvalue: et | k ki i = The powers of a 2 by 2 block are Ak 0 ORK k-1 ak |; If |A| < 1 then these powers approach zero. The extra factor k from a double eigenvalue is overwhelmed by the decreasing factor 4*~!. This applies to all Jordan blocks. A larger block has k?A*~? in J*, which also approaches zero when |A| < 1. If all |A| < 1 then J‘ -> 0. This proves Theorem 9C: Convergence requires |Almax < 1. 
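Here is a small numerical confirmation of Theorem 9C before the worked example (my own sketch, with made-up data): build a matrix with a strong diagonal, form the Jacobi and Gauss-Seidel splittings, check that the spectral radius of S^{-1}T is below 1, and watch the iteration converge.

    n = 50;
    A = randn(n) + 2*n*eye(n);            % strong diagonal, so both splittings converge
    b = randn(n,1);

    SJ = diag(diag(A));   TJ = SJ - A;    % Jacobi: S = diagonal part of A
    SG = tril(A);         TG = SG - A;    % Gauss-Seidel: S = lower triangular part
    [max(abs(eig(SJ\TJ))), max(abs(eig(SG\TG)))]     % both spectral radii are below 1

    x = zeros(n,1);
    for k = 1:50
        x = SJ\(TJ*x + b);                % the iteration S x_{k+1} = T x_k + b
    end
    norm(A*x - b)                         % tiny residual: the x_k converged
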
Jacobi versus Seidel

We now solve a specific 2 by 2 problem. The theory of iteration says that the key number is the spectral radius of B = S^{-1}T. Watch for that number written ρ(B)—the Greek letter "rho" stands for the spectral radius, which is |λ|max. Our example is

    2u −  v =  4
    −u + 2v = −2      has the solution      u = 2,  v = 0.        (6)

The first splitting is Jacobi's method. Keep the diagonal terms on the left side (this is S). Move the off-diagonal part of A to the right side (this is T). Then iterate:

    2u_{k+1} = v_k + 4
    2v_{k+1} = u_k − 2.

Start the iteration from u0 = v0 = 0. The first step goes to u1 = 2, v1 = −1. Keep going:

    [0]   [ 2]   [3/2]   [  2 ]   [15/8]   [  2  ]                 [2]
    [0]   [−1]   [ 0 ]   [−1/4]   [ 0  ]   [−1/16]    approaches   [0]

This shows convergence. At steps 1, 3, 5 the second component is −1, −1/4, −1/16. The error is multiplied by 1/4 every two steps. So is the error in the first component. The values 0, 3/2, 15/8 have errors 2, 1/2, 1/8. Those also drop by 4 in each two steps. The error equation is S e_{k+1} = T e_k:

    [2 0; 0 2] e_{k+1} = [0 1; 1 0] e_k      which means      e_{k+1} = [0 1/2; 1/2 0] e_k.        (7)

That last matrix is S^{-1}T. Its eigenvalues are 1/2 and −1/2. So its spectral radius is 1/2:

    B = S^{-1}T = [0 1/2; 1/2 0]   has   |λ|max = 1/2    and    [0 1/2; 1/2 0]² = [1/4 0; 0 1/4].

Two steps multiply the error by 1/4 exactly, in this special example. The important message is this: Jacobi's method works well when the main diagonal of A is large compared to the off-diagonal part. The diagonal part is S, the rest is −T. We want the diagonal to dominate and S^{-1}T to be small.

The eigenvalue λ = 1/2 is unusually small. Ten iterations reduce the error by 2^10 = 1024. Twenty iterations reduce e by (1024)². More typical and more expensive is |λ|max = .99 or .999.

The Gauss-Seidel method keeps the whole lower triangular part of A on the left side as S:

    2u_{k+1}             = v_k + 4               u_{k+1} = (1/2)v_k + 2
    −u_{k+1} + 2v_{k+1}  = −2          or        v_{k+1} = (1/2)u_{k+1} − 1.        (8)

Notice the change. The new u_{k+1} from the first equation is used immediately in the second equation. With Jacobi, we saved the old u_k until the whole step was complete. With Gauss-Seidel, the new values enter right away and the old u_k is destroyed. This cuts the storage in half! It also speeds up the iteration (usually). And it costs no more than the Jacobi method.

Starting from (0, 0), the exact answer (2, 0) is reached in one step. That is an accident I did not expect. Test the iteration from another start u0 = 0 and v0 = −1:

    [ 0]   [ 3/2]   [15/8 ]   [63/32]                 [2]
    [−1]   [−1/4]   [−1/16]   [−1/64]    approaches   [0]

The errors in the first component are 2, 1/2, 1/8, 1/32. The errors in the second component are −1, −1/4, −1/16, −1/64. We divide by 4 in one step not two steps. Gauss-Seidel is twice as fast as Jacobi. This is true for every positive definite tridiagonal matrix; |λ|max for Gauss-Seidel is the square of |λ|max for Jacobi. This holds in many other applications—but not for every matrix. Anything is possible when A is strongly nonsymmetric—Jacobi sometimes better, and both methods might fail. Our example is small:

    B = S^{-1}T = [2 0; −1 2]^{-1} [0 1; 0 0] = [0 1/2; 0 1/4].

The Gauss-Seidel eigenvalues are 0 and 1/4. Compare with 1/2 and −1/2 for Jacobi.

With a small push we can explain the successive overrelaxation method (SOR). The new idea is to introduce a parameter ω (omega) into the iteration. Then choose this number ω to make the spectral radius of S^{-1}T as small as possible.

Rewrite Ax = b as ωAx = ωb. The matrix S in SOR has the diagonal of the original A, but below the diagonal we use ωA. The matrix T on the right side is S − ωA:

    2u_{k+1}              = (2 − 2ω)u_k + ωv_k + 4ω
    −ωu_{k+1} + 2v_{k+1}  =               (2 − 2ω)v_k − 2ω.        (9)
This looks more complicated to us, but the computer goes as fast as ever. ux+1 from the first equation is used immediately to find vz; This is like Gauss-Seidel, Suede with an adjustable number w. cefi =e = i Each new in the second equation. The key matrix is always Ioo 2]. (10) The determinant is (1—w)?. At the best w, both eigenvalues turn out to equal 7 Ay which is close to (4). Therefore SOR is twice as fast as Gauss-Seidel in this example. In other examples SOR can converge ten or a hundred times as fast. I will put on record the most valuable test matrix of order n. It is our favorite —1, 2, —1 tridiagonal matrix. The diagonal i. 21. Below and above are —1’s. example had n = 2, which leads to cos} = 5 as the Jacobi eigenvalue. Our (We found that . above.) Notice especially that this eigenvalue is squared for Gauss-Seidel: 9D The splittings of the —1, 2, —1 matrix of order n yield these eigenvalues of B: Jacobi (S = 0,2, 0 matrix): So Tehes leet cos : n Gauss-Seidel (S = —1, 2,0 matrix): ST SOR (with the best w): (cos S7!T has |Almax = has a oH n == (cos I n+ 7. 2 a 7 /(: + sin 2 ah n 2 F “) : Let me be clear: For the —1, 2, —1 matrix you should not use any of these iterations! Elimination is very fast (exact LU). Iterations are intended for large sparse matrices—when a high percentage of the zero entries are “not good.” The not good zeros are inside the band, which is wide. They become nonzero in the exact L and U, which is why elimination becomes expensive. We mention one more splitting. It is associated with the words “incomplete LU.” The idea is to set the small nonzeros in L and U back to zero. This leaves triangular matrices Lo and Up which are again sparse. That allows fast computations. The splitting has S = LoUo on the left side. Each step is quick: LoUoxr+1 = (A — LoUo)x, + b. 9.3 Iterative Methods for Linear Algebra 415 On the right side we do sparse matrix-vector multiplications. Don’t multiply Lo times Up— those are matrices. Multiply x; by Uo and then multiply that vector by Lo. On the left side we do forward and back substitutions. If LoUo is close to A, then |Almax is small. A few iterations will give a close answer. The difficulty with all four of these splittings is that a single large eigenvalue in S~'T would spoil everything. There is a safer iteration—the conjugate gradient method—which avoids this difficulty. Combined with a good preconditioner S (from the splitting A = S—T), this produces one of the most popular and powerful algorithms in numerical linear algebra.* Iterative Methods for Eigenvalues We move from Ax = b to Ax = Ax. Iterations are an option for linear equations. They are a necessity for eigenvalue problems. The eigenvalues of an n by n matrix are the roots of an nth degree polynomial. The determinant of A — AJ starts with (—A)”". This book must not leave the impression that eigenvalues should be computed from this polynomial. The determinant of A — AJ is a very poor approach—except when n is small. For n > 4 there is no formula to solve det(A — 4/) = 0. Worse than that, the A’s can be very unstable and sensitive. It is much better to work with A itself, gradually making it diagonal or triangular. (Then the eigenvalues appear on the diagonal.) Good computer codes are available in the LAPACK library—individual routines are free on www.netlib.org. This library combines the earlier LINPACK and EISPACK, with improvements. It is a collection of Fortran 77 programs for linear algebra on highperformance computers. 
(The email message send index from lapack brings information.) For your computer and mine, the same efficiency is achieved by high quality matrix packages like MATLAB. We will briefly discuss the power method and the QR method for computing eigenvalues. It makes no sense to give full details of the codes. 1 Power methods and inverse power methods. Start with any vector up. Multiply by A to find w;. Multiply by A again to find u2. If uo is a combination of the eigenvectors, then A multiplies each eigenvector x; by A;. After k steps we have (Ai)*: uy = Akug = cy(A1)*x1 + +++ + Cn(An)*Xn- (11) As the power method continues, the largest eigenvalue begins to dominate. The vectors uz point toward that dominant eigenvector. We saw this for Markov matrices in Chapter 8: A= E «| has Amax =1 é with eigenvector *Conjugate gradients are described in the author’s book greater detail by Golub-Van Loan and by Trefethen-Bau. /ntroduction 75 ek to Applied Mathematics and in 416 = Chapter9 Numerical Linear Algebra Start with wg and multiply at every step by A: Ud iis l oe ylheed Geel llsane 84 cy : ; is approaching Ug = ) 95 |: The speed of convergence depends on the ratio of the second largest eigenvalue 12 to the largest A,;. We don’t want 4; to be small, we want A2/A; to be small. Here h2/d, = .6/1 and the speed is reasonable. For large matrices it often happens that |A2/A1| is very close to 1. Then the power method is too slow. Is there a way to find the smallest eigenvalue—which is often the most important in applications? Yes, by the inverse power method: Multiply uo by Aq! instead of A. Since we never want to compute A~!, we actually solve Au; = uo. By saving the LU factors, the next step Au2 = uw, 1s fast. Eventually | = Kuo = C\ xX] sae CnXn (12) Now the smallest eigenvalue Amin 1s in control. When it is very small, the factor WVpees| is large. For high speed, we make Amin even smaller by shifting the matrix to A—A*/. If 4* is close to Amin then A—A*J has the very small eigenvalue Amin—A*. Each shifted inverse power step divides the eigenvector by this number, and that eigenvector quickly dominates. 2 The QR Method This is a major achievement in numerical linear algebra. Fifty years ago, eigenvalue computations were slow and inaccurate. We didn’t even realize that solving det(A —A/) = O was a terrible method. Jacobi had suggested earlier that A should gradually be made triangular—then the eigenvalues appear automatically on the diagonal. He used 2 by 2 rotations to produce off-diagonal zeros. (Unfortunately the previous zeros can become nonzero again. But Jacobi’s method made a partial comeback with parallel computers.) At present the QR method is the leader in eigenvalue computations and we describe it briefly. The basic step is to factor A, whose eigenvalues we want, into QR. Remember from Gram-Schmidt (Section 4.4) that Q has orthonormal columns and R is triangular. For eigenvalues the key idea is: Reverse Q and R. The new matrix is RQ. A, = RQ is similar to A = QR, the eigenvalues are not changed: QORx =ax This process continues. gives RO(Q~'x) =A(Q7'x). Factor the new matrix A; into Q;R,. Since (13) Then reverse the factors to Rj Q . This is the next matrix A2, and again no change in the eigenvalues. Amazingly, those eigenvalues begin to show up on the diagonal. Often the last entry of A4 holds an accurate eigenvalue. In that case we remove the last row and column and continue with a smaller matrix to find the next eigenvalue. Two extra ideas make this method a success. 
One is to shift the matrix by a multiple of 7, before factoring into QR. Then RQ is shifted back: Factor Ay; —cxl into Qg Ry. The next matrix is Agy, = ReQe + cyl. 9.3 Iterative Methods for Linear Algebra 417 Ax+1 has the same eigenvalues as Az, and the same as the original Ag = A. A good shift chooses c near an (unknown) eigenvalue. That eigenvalue appears more accurately on the diagonal of Ax;;—which tells us a better c for the next step to Ago. The other idea is to obtain off-diagonal zeros before the QR method starts. Change A to the similar matrix L~'AL (no change in the eigenvalues): 1 LOAL = 1522735): [al 1 (eA 5 seer Dil ree 1 ivy i5*.3 ie OL aS Orden? 4 L~' subtracted row 2 from row 3 to produce the zero in column 1. Then L added column 3 to column 2 and left the zero alone. If I try for another zero (too ambitious), I will fail. Subtracting row 1 from row 2 produces a zero. But now L adds column 2 to column | and destroys it. We must leave those nonzeros | and 4 along one subdiagonal. This is a “Hessenberg matrix”, which is reachable in a fixed number of steps. The zeros in the lower left corner will stay zero through the QR method. The operation count for each QR factorization drops from O(n3) to O(n’). Golub and Van Loan give this example of one shifted QR step on a Hessenberg matrix A. The shittis (civ 7 f: Alga |e 3 Ae d 6 One 001 leads toy =:541.69 DL ALG oo 0 00002 Aq = 0.835 —6.656 7.012 Factoring A —7/ into QR produced Ay = RQ+7/. Notice the very small number .00002. The diagonal entry 7.012 is almost an exact eigenvalue of Aj, and therefore of A. Another QR step with shift by 7.0127 would give terrific accuracy. Problem Set 9.3 Problems 1-12 are about iterative methods for Ax = b. 1 Change Ax = b to x = (J—A)x+b. What are S and T for this splitting? What matrix S~!T controls the convergence of xx41 = (J — A)x, + b? Z If 4 is an eigenvalue of A, then ___ is an eigenvalue of B = ] — A. The real eigenvalues of B have absolute value less than | if the real eigenvalues of A lie and _ between 3 Show why the iteration x41 = ([—A)xx +b does not converge for A = [_7 74]. 4 Why is the norm the powers ||BI. of B* never larger than | Bk? Then B* approach zero (convergence). ||B|| < 1 guarantees that No surprise since |A|max is below 418 Chapter 9 Numerical Linear Algebra If A is singular then all splittings A = S—T must fail. From Ax = 0 show that S-'Tx =x. So this matrix B = S~'T has A = 1 and fails. Change the 2’s to 3’s and find the eigenvalues of S~'T for Jacobi’s method: SxXp41 =Tx, +) Find the eigenvalues of 3. is Out i ax, +b. S~'T for the Gauss-Seidel method applied to Problem 6: ew) ke see Does 0 li |Lea 2, |A|max for Gauss-Seidel equal |A| ay QO 1 |6 a [ae+e. Lor Jacobi? For any 2 by 2 matrix [2%] show that |Almax equals |bc/ad| for Gauss-Seidel and |bc/ad|'/? for Jacobi. We need ad #0 for the matrix S to be invertible. The best w produces two equal eigenvalues for S~!'T in the SOR method. Those eigenvalues are w—1 because the determinant is (w—1)*. Set the trace in equation (10) equal to (w — 1) + (w — 1) and find this optimal . 10 Write a computer code (MATLAB or other) for the Gauss-Seidel method. You can define S and T from A, or set up the iteration loop directly from the entries a;;. Test it on the —1, 2, —1 matrices A of order 10, 20, 50 with b = CLG an i0), 11 The Gauss-Seidel iteration at component 7 is If every x)" = x4 how does this show that the solution x is correct? does the formula change for Jacobi’s method? parentheses. 
12 A is the —1, 2, —1 matrix of order n = 10. Divide equation (11) by ae and explain why |A2/A1| controls the convergence of the power method. verge. 14 How insert @ outside the The SOR splitting matrix S is the same as for Gauss-Seidel except that the diagonal is divided by w. Write a program for SOR on an n by n matrix. Apply it with w = 1, 1.4, 1.8, 2.2 when 13 For SOR Construct a matrix A for which this method does not con- The Markov matrix A = [-?-3] has A = 1 and .6, and the power method uz = A*uo converges to [-32]. Find the eigenvectors of A~!. What does the inverse power method u_,; = A~*uo converge to (after you multiply by .6*)? 9.3 15 Iterative Methods for Linear Algebra 419 Show that the n by n matrix with diagonals —1, 2, —1 has the eigenvector x; = (sin -77> sin aT a 8i ae, rind the eigenvalue A; by multiplying Ax). Note: For the other eigenvectors and eigenvalues of this matrix, change 7 to jm in x; and Aj. 16 For A =[_%~!] apply the power method uz+; = Au, three times starting with 1 What eigenvector is the power method converging to? uy = leat 17 In Problem 11 apply the inverse power method uz, = A~!u, three times with the same uo. What eigenvector are the u,x’s approaching? 18 In the QR method for eigenvalues, show that the 2,1 entry drops from sin@ in A = OR to —sin*@ in RQ. (Compute R and RQ.) This “cubic convergence” makes the method a success: cos@ ee sing 0 oe cos@ et —sin@ Il a eaallta ae 19 If A is an orthogonal matrix, its QR factorization has Q = and ht . Therefore RQ = . These are among the rare examples when the QR method fails. 20 The shifted QR method factors A — c/ into QR. Show A; = RQ+cl equals Q-'AQ. Therefore A; has the (but is closer to triangular). 21 When A = A!, the “Lanczos method” finds a’s and b’s and orthonormal q’s so that Ag; = bj-194)-1 + 4jq; + bjaj+1 (with qo = 0). Multiply by q} to find a that the next matrix eigenvalues as A formula for a;. The equation says that AQ = QT where T is a 22 matrix. The equation in Problem 21 develops from this loop with bo = | and rp = any q: Gis = 1)/0j, = J + le G4; Ag, ry = Aq] — 0;gj — qj.) — I Write a computer program. Test on the —1, 2, —1 matrix A. Q'Q should be /. 23 Suppose A is tridiagonal and symmetric in the QR method. From A, = O-A0 show that A, is symmetric. Then change Then change to Ay = RAR™! and show that A, is also tridiagonal. (If the lower part of A, is proved tridiagonal then by symmetry the upper part is too.) Symmetric tridiagonal matrices are at the heart of the QR method. Questions 24—26 are about quick ways to estimate the location of the eigenvalues. 24 If the sum of |a;;| along every row is less than 1, prove that |A| < 1. (If |x;| is larger than the other components of x, why is |2a;;x;| less than |x;|? That means |Ax;| < |x;| so |A| < 1.) 420 Chapter 9 Numerical Linear Algebra (Gershgorin circles) Every eigenvalue of A is in a circle centered at a diagonal entry aj; with radius rj = Lj -+4i\a;;| . This follows from (A —ajj)x; = If |x;| is larger than the other components viding by |x;| leaves |A — ajj| < rj. 25 What bound on |A|max does Problem Xj4iaijX;- of x, this sum is at most r;|x;|. Di- 24 give for these matrices? What are the three Gershgorin circles that contain all the eigenvalues? Ass 26 seh oe [see ie BAD 2-1 Aaa 0 wi2iel Ost gaz These matrices are diagonally dominant because each aj; > rj = absolute sum along the rest of row 7. From the Gershgorin circles containing all 4’s, show that diagonally dominant matrices are invertible. A= | 3/5" a> aes eat A, =| AO. 
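As a computational postscript to Problems 16–20 (my own sketch, not one of the Teaching Codes), the loop below runs the power method on the −1, 2, −1 matrix of order 3 and then takes one shifted QR step.

    A = [2 -1 0; -1 2 -1; 0 -1 2];       % the -1, 2, -1 matrix of order 3
    u = [1; 0; 0];
    for k = 1:20
        u = A*u;   u = u/norm(u);        % power method: u lines up with the
    end                                  % eigenvector of the largest eigenvalue
    u'*A*u                               % Rayleigh quotient, close to 2 + sqrt(2)

    c = A(3,3);                          % shift by the last diagonal entry
    [Q,R] = qr(A - c*eye(3));
    A1 = R*Q + c*eye(3);                 % same eigenvalues as A (Problem 20)
    A1(3,1:2)                            % essentially zero: an eigenvalue sits at A1(3,3)

Here the shift c = 2 happens to be an exact eigenvalue, so one step already puts it in the corner of A1. In practice the shifts only come close, and the corner entries shrink quickly rather than vanish.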
+ all 4 "1 a 2) 3s a 10 COMPLEX VECTORS AND MATRICES COMPLEX NUMBERS a 10.1 A complete theory of linear algebra must include complex numbers. Even when the matrix is real, the eigenvalues and eigenvectors are often complex. Example: A 2 by 2 rotation matrix has no real eigenvectors. Every vector turns by 6—the direction is changed. But there are complex eigenvectors (1,i) and (1, —i). The eigenvalues are also complex numbers e’? and e~'°. If we insist on staying with real numbers, the theory of eigenvalues will be left in midair. The second reason for allowing complex numbers goes beyond 4 and x to the matrix A. The matrix itself may be complex. We will devote a whole section to the most important example—the Fourier matrix. Engineering and science and music and economics all use Fourier series. In reality the series is finite, not infinite. Computing the coefficients in cye’* + cpe?2* +... +c,e!™ is a linear algebra problem. This section gives the main facts about complex numbers. It is a review for some students and a reference for everyone. The underlying fact is that i? =—1. Everything comes from that. We will get as far as the amazing formula e77! = 1. Adding and Multiplying Complex Numbers Start with the imaginary number 7. Everybody knows that x? = —1 has no real solution. When you square a real number, the answer is never negative. So the world has agreed on a solution called i. (Except that electrical engineers call it j.) Imaginary numbers follow the normal rules of addition and multiplication, with one difference. Whenever i* appears it is replaced by —1. 421 422 Chapter 10 Complex Vectors and Matrices 10A A complex number (say 3 + 2i) is the sum of a real number (3) and a pure imaginary number (2i). Addition keeps the real and imaginary parts separate. Multiplication uses i? = —1: Add: (3+2i)+(3+2i) =64+4i Multiply: (3 + 2i)(1 — i) =3 + 2i — 3i — 217 =5 -i. If I add 342i to 1—i, the answer is 4+i. The real numbers 3+ 1 stay separate from the imaginary numbers 2i — i. We are adding the vectors (3,2) and (1, —1). The number (1 — i)* is 1 — i times 1 — i. The rules give the surprising answer —2i: G=)NC 2s l= Se In the complex plane, 1 —i is at an angle of —45°. When we square it to get —2i, the angle doubles to —90°. If we square again, the answer is (—2i)* = —4. The —90° angle has become —180°, which is the direction of a negative real number. A real number is just a complex number z = a + bi, with zero imaginary part: b =0. A pure imaginary number has a = 0: The real part is a=Re (a+bi). The imaginary part is b=Im (a+Di). The Complex Plane Complex numbers correspond to points in a plane. Real numbers go along the x axis. Pure imaginary numbers are on the y axis. The complex number 3+2i is at the point with coordinates (3,2). The number zero, which is 0+ Oi, is at the origin. Adding and subtracting complex numbers is like adding and subtracting vectors in the plane. The real component stays separate from the imaginary component. The vectors go head-to-tail as usual. The complex plane C! is like the ordinary two-dimensional plane R’, except that we multiply complex numbers and we didn’t multiply vectors. Now comes an important idea. The complex conjugate of 3+2i is 3—2i. The complex conjugate of z= 1—i is z= 1+. In general the conjugate of z=a+bi is Z =a-—bi. (Notice the “bar” on the number to indicate the conjugate.) The imaginary parts of z and “z bar” have opposite signs. In the complex plane, Z is the image of z on the other side of the real axis. Two useful facts. 
When we multiply conjugates z̄₁ and z̄₂, we get the conjugate of z₁z₂. When we add z̄₁ and z̄₂, we get the conjugate of z₁ + z₂:

z̄₁ + z̄₂ = (3 − 2i) + (1 + i) = 4 − i = conjugate of z₁ + z₂
z̄₁ × z̄₂ = (3 − 2i) × (1 + i) = 5 + i = conjugate of z₁ × z₂.

Adding and multiplying is exactly what linear algebra needs. By taking conjugates of Ax = λx, when A is real, we have another eigenvalue λ̄ and its eigenvector x̄:

If Ax = λx and A is real then Ax̄ = λ̄x̄.    (1)

Figure 10.1  z = a + bi corresponds to the point (a, b) and the vector [a, b]. The example z = 3 + 2i has r = |z| = √(3² + 2²).

Something special happens when z = 3 + 2i combines with its own complex conjugate z̄ = 3 − 2i. The result from adding z + z̄ or multiplying z z̄ is always real:

(3 + 2i) + (3 − 2i) = 6   (real)
(3 + 2i) × (3 − 2i) = 9 + 6i − 6i − 4i² = 13   (real).

The sum of z = a + bi and its conjugate z̄ = a − bi is the real number 2a. The product of z times z̄ is the real number a² + b²:

z z̄ = (a + bi)(a − bi) = a² + b².    (2)

The next step with complex numbers is division. The best idea is to multiply the denominator by its conjugate to produce a² + b², which is real:

1/(a + ib) = (a − ib)/[(a + ib)(a − ib)] = (a − ib)/(a² + b²)   and   1/(3 + 2i) = (3 − 2i)/[(3 + 2i)(3 − 2i)] = (3 − 2i)/13.

In case a² + b² = 1, this says that (a + ib)⁻¹ is a − ib. On the unit circle, 1/z is z̄. Later we will say: 1/e^{iθ} is e^{−iθ} (the conjugate). A better way to multiply and divide is to use the polar form with distance r and angle θ.

The Polar Form

The square root of a² + b² is |z|. This is the absolute value (or modulus) of the number z = a + ib. The same square root is also written r, because it is the distance from 0 to the complex number. The number r in the polar form gives the size of z:

The absolute value of z = a + ib is |z| = √(a² + b²). This is also called r.
The absolute value of z = 3 + 2i is |z| = √(3² + 2²). This is r = √13.

The other part of the polar form is the angle θ. The angle for z = 5 is θ = 0 (because this z is real and positive). The angle for z = 3i is π/2 radians. The angle for z = −9 is π radians. The angle doubles when the number is squared. This is one reason why the polar form is good for multiplying complex numbers (not so good for addition).

Figure 10.2  The conjugate z̄ = 3 − 2i is the mirror image of z = 3 + 2i across the real axis: z + z̄ = 6 is real.

When the distance is r and the angle is θ, trigonometry gives the other two sides of the triangle. The real part (along the bottom) is a = r cos θ. The imaginary part (up or down) is b = r sin θ. Put those together, and the rectangular form becomes the polar form:

The number z = a + ib is also z = r cos θ + i r sin θ.

Note: cos θ + i sin θ always has absolute value r = 1 because cos²θ + sin²θ = 1. Thus cos θ + i sin θ lies on the circle of radius 1, the unit circle.

Example 1  Find r and θ for z = 1 + i and also for the conjugate z̄ = 1 − i.

Solution  The absolute value is the same for z and z̄. Here it is r = √(1 + 1) = √2, and also |z̄| = √2. The distance from the center is √2. What about the angle? The number 1 + i is at the point (1, 1) in the complex plane. The angle to that point is π/4 radians or 45°. The cosine is 1/√2 and the sine is 1/√2. Combining r and θ brings back z = 1 + i:

r cos θ + i r sin θ = √2 (1/√2) + i √2 (1/√2) = 1 + i.

The angle to the conjugate 1 − i can be positive or negative. We can go to 7π/4 radians, which is 315°. Or we can go backwards through a negative angle, to −π/4 radians or −45°. If z is at angle θ, its conjugate z̄ is at 2π − θ and also at −θ.
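A quick numerical check of Example 1, assuming MATLAB or Octave is at hand (abs and angle are the built-in absolute value and angle; the variable names are ours):

    % r and theta for z = 1 + i, as in Example 1
    z = 1 + 1i;
    r = abs(z)                         % sqrt(2), the absolute value (modulus)
    theta = angle(z)                   % pi/4 radians, which is 45 degrees
    r*(cos(theta) + 1i*sin(theta))     % rebuilds 1 + 1i from the polar form
    angle(conj(z))                     % -pi/4: the conjugate 1 - i sits at -45 degrees

The same two commands give r = sqrt(13) and theta = atan(2/3) for z = 3 + 2i.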
We can freely add 2π or 4π or −2π to any angle! Those go full circles, so the final point is the same. This explains why there are infinitely many choices of θ. Often we select the angle between zero and 2π radians. But −θ is very useful for the conjugate z̄.

Powers and Products: Polar Form

Computing (1 + i)² and (1 + i)⁸ is quickest in polar form. That form has r = √2 and θ = π/4 (or 45°). If we square the absolute value to get r² = 2, and double the angle to get 2θ = π/2 (or 90°), we have (1 + i)². It equals 2i.

For the eighth power we need r⁸ and 8θ:

r⁸ = 2⁴ = 16   and   8θ = 8 · π/4 = 2π.

This means: (1 + i)⁸ has absolute value 16 and angle 2π. The eighth power of 1 + i is the real number 16. Powers are easy in polar form. So is multiplication of complex numbers.

10B  The polar form of zⁿ has absolute value rⁿ. The angle is n times θ:

The nth power of z = r(cos θ + i sin θ) is zⁿ = rⁿ(cos nθ + i sin nθ).    (3)

In that case z multiplies itself. In all cases, multiply r's and add angles:

r(cos θ + i sin θ) times r′(cos θ′ + i sin θ′) = r r′(cos(θ + θ′) + i sin(θ + θ′)).    (4)

One way to understand this is by trigonometry. Concentrate on angles. Why do we get the double angle 2θ for z²?

(cos θ + i sin θ) × (cos θ + i sin θ) = cos²θ + i² sin²θ + 2i sin θ cos θ.

The real part cos²θ − sin²θ is cos 2θ. The imaginary part 2 sin θ cos θ is sin 2θ. Those are the "double angle" formulas. They show that θ in z becomes 2θ in z².

When the angles θ and θ′ are different, use the "addition formulas" instead:

(cos θ + i sin θ)(cos θ′ + i sin θ′) = [cos θ cos θ′ − sin θ sin θ′] + i [sin θ cos θ′ + cos θ sin θ′].

In those brackets, trigonometry sees the cosine and sine of θ + θ′. This confirms equation (4), that angles add when you multiply complex numbers.

There is a second way to understand the rule for zⁿ. It uses the only amazing formula in this section. Remember that cos θ + i sin θ has absolute value 1. The cosine is made up of even powers, starting with 1 − θ²/2. The sine is made up of odd powers, starting with θ − θ³/6. The beautiful fact is that e^{iθ} combines both of those series into cos θ + i sin θ:

e^x = 1 + x + x²/2! + x³/3! + ⋯   becomes   e^{iθ} = 1 + iθ + (iθ)²/2! + (iθ)³/3! + ⋯.

Figure 10.3  (a) Multiplying e^{iθ} times e^{iθ′}.  (b) The nth power of e^{2πi/n} is e^{2πi} = 1. The figure shows the sixth roots e^{2πi/6}, e^{4πi/6}, . . . , e^{12πi/6} = e^{2πi} = 1.

Write −1 for i². The real part 1 − θ²/2 + ⋯ is exactly cos θ. The imaginary part θ − θ³/6 + ⋯ is sin θ. The whole right side is cos θ + i sin θ:

Euler's Formula   e^{iθ} = cos θ + i sin θ.    (5)

The special choice θ = 2π gives cos 2π + i sin 2π, which is 1. Somehow the infinite series e^{2πi} = 1 + 2πi + (2πi)²/2 + ⋯ adds up to 1.

Now multiply e^{iθ} times e^{iθ′}. Angles add for the same reason that exponents add: e² times e³ is e⁵ because (e)(e) × (e)(e)(e) = (e)(e)(e)(e)(e):

e^{iθ} times e^{iθ} is e^{i2θ}   and   e^{iθ} times e^{iθ′} is e^{i(θ + θ′)}.    (6)

Every complex number a + ib = r cos θ + i r sin θ now goes into its best possible form. That form is re^{iθ}. The powers (re^{iθ})ⁿ are equal to rⁿe^{inθ}. They stay on the unit circle when r = 1 and rⁿ = 1. Then we find n different numbers whose nth powers equal 1:

Set w = e^{2πi/n}. The nth powers of 1, w, w², . . . , w^{n−1} all equal 1.

Those are the "nth roots of 1." They solve the equation zⁿ = 1. They are equally spaced around the unit circle in Figure 10.3b, where the full 2π is divided by n. Multiply their angles by n to take nth powers. That gives wⁿ = e^{2πi} which is 1. Also (w²)ⁿ = e^{4πi} = 1.
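The nth roots of 1 are easy to generate and check numerically. A minimal sketch, again assuming MATLAB or Octave (exp handles the complex exponential):

    % The eighth power of 1+i: r^8 = 16 and 8*theta = 2*pi, so the answer is real
    (1 + 1i)^8                  % 16, up to roundoff

    % The n-th roots of 1 are the powers of w = e^(2*pi*i/n)
    n = 6;
    w = exp(2i*pi/n);
    z = w.^(0:n-1)              % the n roots, equally spaced around the unit circle
    abs(z)                      % every root has absolute value 1
    z.^n                        % every entry is 1: each root solves z^n = 1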
Each of those numbers, to the nth power, comes around the unit circle to 17 These roots of | are the key numbers for signal processing. A real digital computer uses only 0 and 1. The complex Fourier transform uses w and its powers. The last section of the book shows how to decompose a vector (a signal) into n frequencies by the Fast Fourier Transform. 10.1. Complex Numbers 427 Problem Set 10.1 Questions 1-8 are about operations on complex numbers. 1 Add and multiply each pair of complex numbers: (a) cD (b) —1l+i,—-1+i (c) cos@+isin@,cos@ —isin@ Locate these points on the complex plane. Simplify them if necessary: @) 3 2+1,2-i1 2+i © @+i? © Fy @ [+i Find the absolute value r = |z| of these four numbers. If @ is the angle for 6—8i, what are the angles for the other three numbers? (Qa Ol Sit (bmn (Giaany 4 eC), Game If |z| = 2 and |w| =3 then |z x w| = mg B(OReEg and |z+w|< and |z/w| = and |z— w| < 5 Find a + ib for the numbers at angles 30°, 60°, 90°, 120° on the unit circle. If w is the number at 30°, check that w? is at 60°. What power of w equals 1? 6 If z=rcos@+irsin@ then |/z has absolute value polar form is . Multiply z x 1/z to get 1. and angle wits vf The 1 by 1 complex multiplication M = (a+ bi)(c + di) 1s a 2 by 2 real multi- BTL The right side contains the real and imaginary parts of M. Test 8 A = A, +iA2 is a complex n by n matrix and b = b; +ib2 is a complex vector. The solution to Ax = b is x; +ix2. Write Ax = b as a real system of size 2n: L dleJ=[a} Questions 9-16 are about the conjugate at+ib=re'®, 9 M = (1+ 31)(1 — 31). 7 = a — ib = re~ ‘@ of the number z = Write down the complex conjugate of each number by changing i to —i: (€) ey | (dd) e'*™=-1 (b) <2=i1)01=1) (ce) 1 (c) e!/2 (which is i) Wwhichisalsoi) (f) i? = 428 Chapter 10 Complex Vectors and Matrices 10 The sum z+Z is always . The difference z—Z is always . Assume z #0. The product z x Z is always . The ratio z/z always has absolute value 11 For a real 3 by 3 matrix, the numbers a2, a), a9 from the determinant are real: det(A — AI) = —A3 +.and? +.aja + ap = 0. Each root A is an eigenvalue. Taking conjugates gives SP so A is also an eigenvalue. ash Path ane: For the matrix with a;; = i—j, find det(A —A/) and the three eigenvalues. Note The conjugate of Ax = Ax is A¥ = AX. This proves two things: 2 is an eigenvalue and xX is its eigenvector. value. 12 Problem 11 only proves that A is an eigen- The eigenvalues of a real 2 by 2 matrix come from the quadratic formula: a-ixz Cc b =22 =a d—-ix ant (ad ~b0) =0 gives the two eigenvalues (notice the + symbol): \,= a+d+/(a+d)? —4(ad — bc) fag ge oe ee (a) If a=b=d =1, the eigenvalues are complex when c is (b) What are the eigenvalues when ad = bc? (c) The two eigenvalues (plus sign and minus sign) are not always conjugates of each other. Why not? 13 In Problem 12 the eigenvalues are not real when (trace)? = (a + d)* is smaller than . Show that the 2’s are real when bc > 0. 14 Find the eigenvalues and eigenvectors of this permutation matrix: 0 = has det(Py —AI) = /— S] Sis SSS) SS et SQ 15 Extend P4 above to P¢ (five 1’s below the diagonal and one in the corner). Find det(Ps — Al) and the six eigenvalues in the complex plane. 16 A real skew-symmetric matrix (A? = —A) has pure imaginary eigenvalues. First proof: If Ax = Ax then block multiplication gives Ea olli)=9 [2] This block matrix is symmetric. Its eigenvalues must be ! So A is 10.1. Complex Numbers 429 Questions 17-24 are about the form re’? of the complex number r cos6+ir sin 6. 
az Write these numbers in Euler’s form re!’. Then square each number: (aye 18 P= SP> (6) cos 20777 sin 20" (Cc) —7F Ges ar Find the absolute value and the angle for z = sin@+icos@ (careful). Locate this z in the complex plane. Multiply z by cos@ + isin@ to get 19 Draw all eight solutions of z® = 1 in the complex plane. What are the rectangular forms a +ib of these eight numbers? 20 Locate the cube roots of 1 in the complex plane. Locate the cube roots of —1. Together these are the sixth roots of — 2a By comparing e9 — cos 30 +isin36 with (e'?)? = (cos@ +isiné)%, find the “triple angle” formulas for cos3@ and sin36@ in terms of cos@ and sin@. 22 Suppose the conjugate z is equal to the reciprocal |/z. What are all possible z’s? 23 (a) Why do e! and i° both have absolute value 1? (b) In the complex plane put stars near the points e’ and i°. (c) The number i© could be (e in/2)e 24 or (er! /2)e | Are those equal? Draw the paths of these numbers from ¢ = 0 to t = 27 in the complex plane: (a) elt (b) elt = e tell (c) G1)! = eft 430 Chapter 10 Complex Vectors and Matrices HERMITIAN AND UNITARY MATRICES m 10.2 The main message of this section can be presented in the first sentence: When you transpose a complex vector z or a matrix A, take the complex conjugate too. Don’t stop at z' or A’. Reverse the signs of all imaginary parts. Starting from a column vector with components z; = aj +ibj;, the good row vector is the conjugate transpose with components a; — 1b;: Zs |e SS ae he ee (1) Here is one reason to go to z. The length squared of a real vector is x74. g -+x2, The length squared of a complex vector is not zt: : +2, With that wrong definition, the length of (1,1) would be 1* +i? =0. A nonzero vector would have zero length—not good. Other vectors would have complex lengths. Instead of (a+bi)* we want a?+b?, the absolute value squared. This is (a + bi) times (a — bi). For each component we want z; times Z;, which is Iz|? = a*+b*. when the components of z multiply the components of 2: That comes F65\ [z1 «+» Zw) | eo) Sal? totes ‘This iss! z'z=Yz\/?. (2) Now the squared length of (1,i) is 17+ |i]? =2. The length is 2 and not zero. The squared length of (1+i, 1—i) is 4. The only vectors with zero length are zero vectors. DEFINITION The length ||z|| is the square rootof \\z\|? =Z'z = |z|* +--+: + zl’. Before going further we replace two symbols by one symbol. Instead of a bar for the conjugate and T for the transpose, we just use a superscript H. Thus z! = z4, This is “z Hermitian,” the conjugate transpose of z. The new word is pronounced ““Hermeeshan.” The new symbol applies also to matrices: The conjugate transpose of a matrix A is AH, : Notation mae . —T The vector z@ is z'. The matrix AM is A —the conjugate transpose of A: If A= j 51 ney then AH — 1 0 |= “A Hermitian” Complex Inner Products For real vectors, the length squared is x'x—the inner product of x with itself. For complex vectors, the length squared is z4z. It will be very desirable if this is the 10.2 Hermitian and Unitary Matrices 431 inner product of z with itself. To make that happen, the complex inner product should use the conjugate transpose (not just the transpose). There will be no effect when the vectors are real, but there is a definite effect when they are complex: DEFINITION The inner product of real or complex vectors u and v is uv: v1 H uv — Sais vee Tn | 2 > |= a Wyvy = +--+ + Unvn. (3) With complex vectors, uy is different from vu. The order of the vectors is now important. 
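In MATLAB or Octave the prime operator is exactly this conjugate transpose, which makes the definitions easy to test (a small sketch; .' is the transpose without conjugation):

    z = [1; 1i];
    z' * z          % 2 = |1|^2 + |i|^2: the squared length, not zero
    z.' * z         % 0: the unconjugated version that wrongly gave zero length

    u = [1; 1i];  v = [1; 1 - 1i];
    u' * v          % the inner product u^H v
    v' * u          % its complex conjugate: the order of the vectors matters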
In fact vlu = Diu, +--- + V_uy is the complex conjugate of uly. We have to put up with a few inconveniences for the greater good. Example 1 The inner product of u = i with v = | | is [1 Example 2 The inner product of u = ES Ue 0 pie. with v = ae a Pe Wea i a i) Ph NOL IE : Example | is surprising. Those vectors (1,7) and (i, 1) don’t look perpendicular. But they are. A zero inner product still means that the (complex) vectors are orthogonal. Similarly the vector (1,7) is orthogonal to the vector (1, —i). Their inner product is 1 —1=0. We are correctly getting zero for the inner product—where we would be incorrectly getting zero for the length of (1,7) if we forgot to take the conjugate. Note We have chosen to conjugate the first vector u. Some authors choose the second vector v. Their complex inner product would be u'v. It is a free choice, as long as we stick to one or the other. We wanted to use the single symbol © in the next formula too: The inner product of Au with v equals the inner product of u with A™v: (Au) y= ul (Ad). (4) The conjugate of Au is Au. Transposing it gives w'A’ as usual. This is wW4AM. Eyerything that should work, does work. The rule for H comes from the rule for '. That applies to products of matrices: 10C The conjugate transpose of AB is (AB)# = BHA, We are constantly using the fact that a — ib times c — id is the conjugate of a + ib times c+ id. 432 Chapter 10 Complex Vectors and Matrices Among real matrices, the symmetric matrices form the most important special class: A = A‘. They have real eigenvalues and a full set of orthogonal eigenvectors. The diagonalizing matrix § is an orthogonal matrix Q. Every symmetric matrix can be written as A = QAQ7! and also as A = QAQ! (because Q~! = Q'). All this follows from aj; = aj;, when A is real. Among complex matrices, the special class consists of the Hermitian matrices: A = A". The condition on the entries is now ajj = ajj. In this case we say that “A is Hermitian.” Every real symmetric matrix is Hermitian, because taking its conjugate has no effect. The next matrix is also Hermitian: Pie 2 3 —3i The main diagonal is real since aj; = Gjj. ~ {343i 5 Across it are conjugates 3 + 3i and 3 — 3i. This example will illustrate the three crucial properties of all Hermitian matrices. 10D If A = A# and z is any vector, the number z" Az is real. Quick proof: z4Az is certainly 1 by 1. Take its conjugate transpose: (zHaz)yH = gH ah(gH)H which is zAz again. Reversing the order has produced the same 1 by | matrix (this used A = AM!) For 1 by 1 matrices, the conjugate transpose is simply the conjugate. So the number z4 Az equals its conjugate and must be real. Here is z#Az in our example: Patient [z1 A 2 re 3—3i || z 5 = n nas a Aa = 27121 + 52222 + G — 31)Ziz2 + G + 31) 2122. The terms 2|z;|? and 5|z2|? from the diagonal are both real. The off-diagonal terms are conjugates of each other—so their sum is real. (The imaginary parts cancel when we add.) The whole expression z# Az is real. 10E Every eigenvalue of a Hermitian matrix is real. Proof Suppose Az = iz. Multiply both sides by z4 to get z#Az = az4z. On the left side, z4Az is real by 10D. On the right side, zz is the length squared, real and positive. So the ratio 4 = z4#Az/zHz is a real number. Q.E.D. The example above has real eigenvalues 4 = 8 and A = —1. 
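Before the determinant computation below, here is a quick numerical confirmation of 10D and 10E for this example, assuming MATLAB or Octave (eig accepts complex matrices):

    A = [2, 3-3i; 3+3i, 5];
    isequal(A, A')          % true: A equals its conjugate transpose, so A is Hermitian
    eig(A)                  % the real eigenvalues -1 and 8

    z = [1+2i; 3-1i];       % any complex vector
    z' * A * z              % z^H A z is real (imaginary part is zero up to roundoff)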
Take the determinant of A —AXI to get (d — 8)(d + 1): De | Sie ate eas pe eee er = AP I Secrecy 10 18 =, Oe 8 ues 10.2 Hermitian and Unitary Matrices 433 10F The eigenvectors of a Hermitian matrix are orthogonal (provided they correspond to different eigenvalues). If Az = Az and Ay = By then y#z = 0. Proof Multiply Az = Az on the left by y#. Multiply y444 = By on the right by z: Vode — hye and: yo ANT byez, (5) The left sides are equal because A = A". Therefore the right sides are equal. Since B is different from 4, the other factor yz must be zero. The eigenvectors are orthogonal, as in the example with 4 = 8 and 6 = —1: of —6 B= Sb | ezialey (0 3 i) ofl fo tg Weal WG) amelie eS) Aine 50 Ss 2a ledieh auc ~ l ofS] c=] t5| say 1at Take the inner product of those eigenvectors y and z: A Bha oi [1 +i oo 1] fa! i ==) o ejg (orthogonal eigenvectors). These eigenvectors have squared length 17 + 17 + 1* = 3. After division by V3 they are unit vectors. They were orthogonal, now they are orthonormal. They go into the columns of the eigenvector matrix S, which diagonalizes A. When A is real and symmetric, it has real orthogonal eigenvectors. Then S is Q—an orthogonal matrix. Now A is complex and Hermitian. Its eigenvectors are complex and orthonormal. The eigenvector matrix S is like Q, but complex. We now assign a new name and a new letter to a complex orthogonal matrix. Unitary Matrices A unitary matrix is a (complex) square matrix with orthonormal columns. It is denoted by U—the complex equivalent of Q. The eigenvectors above, divided by V3 to become unit vectors, are a perfect example: Us ~ nd Valitifa ood! -1 is a unitary ‘ matrix This U is also a Hermitian matrix. I didn’t expect that! The example is almost too perfect. Its second column could be multiplied by —1, or even by 7, and the matrix of eigenvectors would still be unitary: U = — ag = wae oa | is also a unitary matrix. I 434 Chapter 10 Complex Vectors and Matrices e2ni/3 1 Fourier nant : 1 oe V3 : 1 1 1 eon! 3 eft! 3 eAti/3 e2ti/3 e4ni/3 Figure 10.4 The cube roots of 1 go into the Fourier matrix F = F3. The matrix test for real orthonormal columns was O'Q = J. When Q! multiplies Q, the zero inner products appear off the diagonal. In the complex case, Q becomes U and the symbol ' becomes 4. The columns show themselves as orthonormal when U a multiplies U. The inner products of the columns are again | and 0, and they fill up UA = 10G The matrix U has orthonormal columns when If U is square, it is a unitary matrix. Then lebanese1 vu =| 1 sh 1-i| fae1 5 1 Sulealn U®U = I. U4 = U7}, 1-i mM 0 = let a“ (6) Suppose U (with orthogonal column) multiplies any z. The vector length stays the same, because zZ4U4Uz = z4z. If z is an eigenvector, we learn something more: The eigenvalues of unitary (and orthogonal) matrices all have absolute value |i\ = 1. 10H Jf U is unitary then \\Uz|| = ||z\|. Therefore Uz = Xz leads to |A| = 1. Our 2 by 2 example is both Hermitian (U = U®) and unitary (U-! = U4). That means real eigenvalues (A = A), and it means absolute value one (A~! = A). A real number with absolute value 1 has only two possibilities: The eigenvalues are 1 or —1. One thing more about the example: The diagonal of U adds to zero. The trace is zero. So one eigenvalue is 4 = 1, the other is A = —1. The determinant must be 1 times —1, the product of the i’s. Example 3 The 3 by 3 Fourier matrix is in Figure 10.4. Is it Hermitian? Is it unitary? The Fourier matrix is certainly symmetric. It equals its transpose. 
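A numerical check, assuming MATLAB or Octave, previews the answers that the text works out next (norm measures how far a matrix is from zero):

    w = exp(2i*pi/3);
    F = [1 1 1; 1 w w^2; 1 w^2 w^4] / sqrt(3);
    norm(F - F.')             % 0: F equals its transpose, so F is symmetric
    norm(F - F')              % not 0: F is not equal to its conjugate transpose
    norm(F'*F - eye(3))       % essentially 0: the columns are orthonormal, F is unitary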
But it doesn’t equal its conjugate transpose—it is not Hermitian. If you change i to —i, you get a different matrix. 10.2. Hermitian and Unitary Matrices 435 Is F unitary? Yes. The squared length of every column is F(1 +1-+ 1). The columns are unit vectors. The first column is orthogonal to the second column because 14 e27'/3 4 e4ti/3 — 0. This is the sum of the three numbers marked in Figure 10.4. Notice the symmetry of the figure. If you rotate it by 120°, the three points are in the same position. Therefore their sum S also stays in the same position! The only possible sum is S$ = 0, because this is the only point that is in the same position after 120° rotation. Is column 2 of F orthogonal to column 3? Their dot product looks like 41+ e673 4 4/3) — 11 +141). This is not zero. That is because we forgot to take complex conjugates! inner product uses / not !: The complex (column 2)" (column 3) = 4(1- 1+ e273 e473 4 e403 21/3) =: z(1 4 e2ti/3 te ents 3) = 0. So we do have orthogonality. Conclusion: F is a unitary matrix. The next section will study the n by n Fourier matrices. Among all complex unitary matrices, these are the most important. When we multiply a vector by F,, we are computing its discrete Fourier transform. When we multiply by F~!, we are comput- ing the inverse transform. The special property of unitary matrices is that F~! = F™, The inverse transform only differs by changing i to —i: 1 ja = FH 1 1 1 e2ni/3 ei 3 |4 e4nt(s™ Loni/3 — 1 /3 Everyone who works with F recognizes its value. The last section of the book will bring together Fourier analysis and linear algebra. This section ends with a table to translate between real and complex—for vectors and for matrices: Real versus Complex R": vectors with n real components C”: vectors with n complex components length: |[x||? =x? +---+x7 <> length: [zl]? = lz? +--+ lanl? (AT)ij = Aj @ (AM) = Aji (AB)' = BIA! ~ (AB)! = BEAM dot product: xly =X1Vi +e + Xnyn (Ax)Ty =x"(ATy) orthogonality: x' y = 0 symmetric matrices: A = AP inner product: wy = Ufvy + upp (Au)#y = u#( AN) orthogonality: uty =0 Hermitian matrices: A = AM A =UAU~! =UAUE (teal A) K T=_K skew-Hermitian K orthogonal matrices: Q' = Q7! unitary matrices: orthonormal columns: orthonormal columns: A= OAQ7! = QAQ' (teal A) skew-symmetric matrices: Q'Q = / matrices = —K UH! = U—! U#U = / TTT TPT TTT (Ux)F(Uy) = x4y and ||Uz|| = llzll (Qx)'(Qy) =x"y and ||Qx|| = |x| tts The columns and also the eigenvectors of Q and U are orthonormal. Every |A| = 1. 436 Chapter 10 Complex Vectors and Matrices Problem Set 10.2 Find the lengths of wu= (1+i,1—i,1+2i) and v = (i,i,i). Also find uy and viy. Compute AMA and AA. Those are both A inlet i boat matrices: : Solve Az = 0 to find a vector in the nullspace of A in Problem 2. Show that z is orthogonal to the columns of AM. Show that z is not orthogonal to the columns of A’ Problem 3 indicates that the four fundamental subspaces are C(A) and N(A) and and _. Their dimensions are still r and n—r andr and m—r. They are still orthogonal subspaces. The symbol takes the place of th (a) Prove that AMA is always a Hermitian matrix. (b) If Az = 0 then AMAz = 0. If AH Az = 0, multiply by z™ to prove that Az = 0. The nullspaces of A and AMA are . Therefore AHA is an invertible Hermitian matrix when the nullspace of A contains only z = True or false (give a reason if true or a counterexample if false): (a) If A is a real matrix then A+ / is invertible. (b) If A is a Hermitian matrix then A +7/ (c) If U is a unitary matrix then A+ i/ is invertible. 
is invertible. When you multiply a Hermitian matrix by a real number c, is cA still Hermitian? If c =i show that 7A is skew-Hermitian. The 3 by 3 Hermitian matrices are a subspace provided the “scalars” are real numbers. Which classes of matrices does P belong to: orthogonal, invertible, Hermitian, unitary, factorizable into LU, factorizable into QR? P= Compute P*, P?, and P! 10 1 O a © =0 0 in Problem 8. What are the eigenvalues of P? Find the unit eigenvectors of P in Problem 8, and put them into the columns of a unitary matrix F. What property of P makes these eigenvectors orthogonal? 10.2 Hermitian and Unitary Matrices 437 11 Write down the 3 by 3 circulant matrix C = 27 +5P+4P?. eigenvectors as P in Problem 8. Find its eigenvalues. 12 If U and V are unitary matrices, show that U~! is unitary and also UV Start from UHU = 7 and VEV = 1. 13 How do you know that the determinant of every Hermitian matrix is real? 14 The matrix AMA is not only Hermitian but also positive definite, when the columns of A are independent. Proof: z4 A" Az is positive if z is nonzero because 15 Diagonalize this Hermitian matrix to reach A= ar ;é 1 16 17 ei is unitary. UAU#: eel: Diagonalize this skew-Hermitian matrix to reach a It has the same K= UAU BAILA, Ssateu ee sagt Diagonalize this orthogonal matrix to reach Q = UAU H_ Now all 4’s are oe hee Mea sin@ 18 19 20 cos@ Diagonalize this unitary matrix V to reach V = UAU H” Again all 4’s are LE Ten ey = hie Lt —-1 x |- If vj,..., 0, is an orthogonal basis for C”, the matrix with those columns matrix. Show that any vector z equals (vitz)v, feeet (viz) un. is a The functions e~'* and e!* are orthogonal on the interval 0 < x < 2m because their inner product is ieHh) (1,7, 1),w = (i, 1,0) and z= ______ are an orthogonal basis 21 The vectors for ne If 23 The (complex) dimension of C” is __. 24 Describe all 1 by 1 Hermitian matrices and unitary matrices. Do the same for 2 by 2. v = A= R+iS is a Hermitian matrix, are its real and imaginary parts symmetric? Find a non-real basis for C”. 438 Chapter 10 25 How are the eigenvalues of A™ related to the eigenvalues of the square complex matrix A? 26 If u4uw = 1 show that J — 2uu" is Hermitian and also unitary. matrix uu" js the projection onto what line in C”? 27 If Complex Vectors and Matrices A+iB is a unitary matrix (A and B are real) show that The rank-one Q = [4-8] is an orthogonal matrix. 28 If A+iB is a Hermitian matrix (A and B are real) show that [4 —B] is sym- metric. 29 Prove that the inverse of a Hermitian 30 Diagonalize this matrix by constructing its eigenvalue matrix A and its eigenvector matrix S: = 31 2 , lhe matrix is a Hermitian Li 3 = matrix. Al, A matrix with orthonormal eigenvectors has the form A = UAU~! = UAU4. Prove that AA = AHA. These are exactly the normal matrices. 10.3. The Fast Fourier Transform 439 THE FAST FOURIER TRANSFORM a 10.3 Many applications of linear algebra take time to develop. It is not easy to explain them in an hour. The teacher and the author must choose between completing the theory and adding new applications. Generally the theory wins, because this course is the best chance to make it clear—and the importance of any one application seems limited. This section is almost an exception, because the importance of Fourier transforms is almost unlimited. More than that, the algebra is basic. We want to multiply quickly by F and F™=, the Fourier matrix and its inverse. This is achieved by the Fast Fourier Transform— the most valuable numerical algorithm in our lifetime. 
The FFT has revolutionized signal processing. Whole industries are speeded up by this one idea. Electrical engineers are the first to know the difference—they take your Fourier transform as they meet you (if you are a function). Fourier’s idea is to represent f as a sum of harmonics cye!**. The function is seen in frequency space through the coefficients cz, instead of physical space through its values f(x). The passage backward and forward between c’s and /’s is by the Fourier transform. Fast passage is by the FFT. An ordinary product Fc uses n* multiplications (the matrix has n* nonzero entries). The Fast Fourier Transform needs only n times 5 log, n. We will see how. Roots of Unity and the Fourier Matrix Quadratic equations have two roots (or one repeated root). Equations of degree n have n roots (counting repetitions). This is the Fundamental Theorem of Algebra, and to make it true we must allow complex roots. This section is about the very special equation z” = 1. The solutions z are the “nth roots of unity.” They are n evenly spaced points around the unit circle in the complex plane. Figure 10.5 shows the eight solutions to z’ = 1. Their spacing is <(360°) = 45°. The first root is at 45° or 6 = 27/8 radians. It is the complex number w = el? — e!2/8 We call this number wg to emphasize that it is an 8th root. You could write it in terms of cos aa and sin or but don’t do it. The seven other 8th roots are w?, w>,..., w, going around the circle. Powers of w are best in polar form, because we work only with the angle. The fourth roots of | are also in the figure. They are 7, —1, —i, 1. The angle is now 27/4 or 90°. The first root w4 = e°7'/4 is nothing but i. Even the square roots of 1 are seen, with w2 = e!2"/2 = —1. Do not despise those square roots | and —1. The idea behind the FFT is to go from an 8 by 8 Fourier matrix (containing powers of we) to the 4 by 4 matrix below (with powers of w4 =i). The same idea goes from 4 to 2. By exploiting the connections of Fg down to Fy and up to Fj (and beyond), the FFT makes multiplication by F024 very quick. We describe the Fourier matrix, first for n = 4. Its rows contain powers of | and w and w2 and w°. These are the fourth roots of 1, and their powers come in a 440 Chapter 10 Complex Vectors and Matrices w = e2hi/8 — cos +i sin — > Real axis Figure 10.5 The eight solutions to z8 = 1 are 1, w, w*,..., w’ with w = (1+i)/V2. special order: (reat mb Plame See sy ee 1 ag w> No ae w fe in Mla la et Kole i oy te ee ee w? fo a eT The matrix is symmetric (F = F T). It is not Hermitian. Its main diagonal is not real. But 5F is a unitary matrix, which means that (GFA) F) == //5 The columns of F give F&F =4I. The inverse of F is iF H which is iF. The inverse changes from w =i to w = —i. That takes us from F to F. When the Fast Fourier Transform gives a quick way to multiply by Fy4, it does the same for the inverse. The unitary matrix is U = F/,/n. We prefer to avoid that ,/n and just put : outside F~!. The main point is to multiply the matrix F times the coefficients in the Fourier series co + cye’* + cpe2* + c3e3!*: ee ] 1 1 w 1 w* 1 w CO Chee Gar Cu a= Cy == C3 cotcjwtow? +cow? 1 we wr we lilo cotcjw* 1 w wo w? CR cot cw? + cow® + c3w? +cow*+ciw® (1) |° The input is four complex coefficients co, c1, c2, c3. The output is four function values yo, ¥1, ¥2, 3. The first output yo = co +c; +c¢2+ 3 is the value of the Fourier series at x = 0. The second output is the value of that series y, =O never ue dene! ae ent \~cpe'** at x = 21/4: =cotcw +cow? + cw. 
10.3. The Fast Fourier Transform 441 The third and fourth outputs y2 and y3 are the values of doce! at x = 40/4 and x = 67/4. These are finite Fourier series! They contain n = 4 terms and they are evaluated at n = 4 points. Those points x = 0, 27/4, 41/4, 67/4 are equally spaced. The next point would be x = 87/4 which is 27. Then the series is back to yo, because e?”! is the same as e° = 1. Everything cycles around with period 4. In this world 2 + 2 is 0 because (w2)(w2) = w? = 1. In matrix shorthand, F times ¢ gives a column vector y. The four y’s come from evaluating the series at the four x’s with spacing 27/4: 3 : y = Fe produces y; = y cpe'*P7J/) — 20 the value of the series at x = a k=0 We will follow the convention that j and k go from 0 to n — | (instead of 1 to n). The “zeroth row” and “zeroth column” of F contain all ones. The n by n Fourier matrix contains powers of w = e27!/": 1 [dice |Olah w tt 1 wt) 1 ] CO w ee ery C] Cy yee-dD. F,, is symmetric but not Hermitian. wal) VO | Cn—] YI yee |e (2) Yn-1 Its columns are orthogonal, and F,F , = n1I. Then |os is F,/n. The inverse contains powers of w, = e~27'/". Look at the pattern in F: The entry in row j, column k is w/*. Row zero and column zero contain w® = 1. The zeroth output is yo = co +c, +----+¢,-1. This is the series Seer Alexa): When we multiply c by F,, we sum the series at n points. When we multiply y by Pee we find the coefficients c from the function values y. The matrix F passes from “frequency space” to “physical space.” F~! returns from the function values y to the Fourier coefficients c. One Step of the Fast Fourier Transform We want to multiply F times c as quickly as possible. Normally a matrix times a vector takes n? separate multiplications—the matrix has n? entries. You might think it is impossible to do better. (If the matrix has zero entries then multiplications can be skipped. But the Fourier matrix has no zeros!) By using the special pattern w/* for its entries, / can be factored in a way that produces many zeros. This is the FFT. The key idea is to connect F, with the half-size Fourier matrix Fy)... Assume that n is a power of 2 (say n = 2!0 — 1024). We will connect Fj024 to Fs5,;2—or rather 442 Chapter 10 Complex Vectors and Matrices to two copies of Fs\2. When n = 4, the key is in the relation between the matrices les Ta sess hh (hah ime! eo ae ah a: F5 pu 1 alee i2 1 1 tei Wie On the left is F4, with no zeros. On the right is a matrix that is half zero. The work is cut in half. But wait, those matrices are not the same. The block matrix with two copies of the half-size F is one piece of the picture but not the only piece. the factorization of F4 with many Here is zeros: (3) 1 sj 1 i? 1 The matrix on the right is a permutation. It puts the even c’s (co and cz) ahead of the odd c’s (cj and c3). The middle matrix performs separate half-size transforms on the evens and odds. The matrix at the left combines the two half-size outputs—in a way that produces the correct full-size output y = F4c. You could multiply those three matrices to see that their product is F4. The same idea applies when n = 1024 and m = 5n = S12.) The number 11s e2Rt/1024 Te is at the angle @ = 27/1024 on the unit circle. The Fourier matrix F924 is full of powers of w. The first stage of the FFT is the great factorization discovered by Cooley and Tukey (and foreshadowed in 1805 by Gauss): F 2 ekye! I532 1024 I5;2 is the identity matrix. 
D512 || F512 —D5\2 even-odd F512 | |permutation (4) |~ Ds5,2 is the diagonal matrix with entries (1, w,..., wy The two copies of F'5;2 are what we expected. Don’t forget that they use the 512th root of unity (which is nothing but w?!!) The permutation matrix separates the incoming vector c into its even and odd parts c’ = (co, c2,..., C1922) and ec” = (cy, €3, ..., C1023). Here are the algebra formulas which say the same thing as the factorization of Fjo24: 101 (FFT) Set m = 5N. The first m and last m components of y = F,¢ are com- binations of the half-size transforms y’ = F,,c’ and y” = F,,c”. Equation (4) shows Ty’+ Dy” and Ly’ — Dy”: y=yitwiy’, yj Se = V7 ptm j=0,...,.m—1 0 ald! Way; aI = Up. ah 4 ‘ (5) Thus the three steps are: split ¢ into c’ and ce”, transform them by F,, into y’ and y”, and reconstruct y from equation (5). 10.3. The Fast Fourier Transform 443 You might like the flow graph in Figure 10.6 better than these formulas. The graph for n = 4 shows c’ and c” going through the half-size F>. Those steps are called “butterflies,’ from their shape. Then the outputs from the F>’s are combined using the J and D matrices to produce y = F4c: 00 co Yo 00 10 Co yy 01 01 cy yp 10 C3 y3 11 nS e” 11 Figure 10.6 Flow graph for the Fast Fourier Transform with n = 4. This reduction from F, to two F,,’s almost cuts the work in half—you see the zeros in the matrix factorization. That reduction is good but not great. The full idea of the FFT is much more powerful. It saves much more than half the time. The Full FFT by Recursion If you have read this far, you have probably guessed what comes next. We reduced F,, to Fn/2. Keep going to F,/4. The matrices 256 leads to 128. That is recursion. F5;2 lead to F256 (in four copies). Then It is a basic principle of many fast algorithms, and here is the second stage with four copies of F = Fo56 and D = Do56: If F512 nae F512 _ Wee D F a F i D I -D F F pick 0,4, 8,-- pick. 42,6, 10. - pick psn ey ae pick’ 3, 71k We will count the individual multiplications, to see how much is saved. Before the FFT was invented, the count was the usual n? = (1024)?. This is about a million multiplications. I am not saying that they take a long time. The cost becomes large when we have many, many transforms to do—which is typical. Then the saving by the FFT is also large: The final count for size n = 2! is reduced from n? to $n. The number 1024 is 2!°, so / = 10. The original count of (1024)? is reduced to (5)(1024). The saving is a factor of 200. A million is reduced to five thousand. That is why the FFT has revolutionized signal processing. 444 Chapter 10 Complex Vectors and Matrices Here is the reasoning behind snl . There are / levels, going from n = 2! down to n = 1. Each level has 5n multiplications from the diagonal D’s, to reassemble the half-size outputs from the lower level. This yields the final count snl, which is sn log, n. One last note about this remarkable algorithm. There is an amazing rule for the order that the c’s enter the FFT, after all the even-odd permutations. Write the num- bers 0 to n — 1 in binary (base 2). Reverse the order of their digits. The complete picture shows the bit-reversed order at the start, the / = log, n steps of the recursion, and the final output yo,..., ¥,—1; which is F, times c. The book ends with that very fundamental idea, a matrix multiplying a vector. Thank you for studying linear algebra. 1 hope you enjoyed it, and I very much hope you will use it. That is why the book was written. It was a pleasure. 
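Problems 7 and 8 below step through the FFT by hand. As a complement, here is a minimal recursive sketch of equation (5), assuming MATLAB or Octave. The function name fftrec is ours, and note that this chapter's Fourier matrix uses w = e^{+2πi/n} while the built-in fft uses e^{−2πi/n}, so the check builds F directly:

    function y = fftrec(c)
    % Equation (5) applied recursively: y = F_n * c with w = e^(2*pi*i/n).
    n = length(c);                      % n should be a power of 2
    if n == 1
        y = c;  return
    end
    yp  = fftrec(c(1:2:n-1));           % half-size transform of the even-numbered c's
    ypp = fftrec(c(2:2:n));             % half-size transform of the odd-numbered c's
    j   = (0:n/2-1)';
    d   = exp(2i*pi*j/n);               % the diagonal of D: 1, w, ..., w^(n/2-1)
    y   = [yp + d.*ypp;  yp - d.*ypp];  % reassemble the first half and the last half
    end

To test it (with the function saved as fftrec.m), compare against the full Fourier matrix:

    n = 8;  c = randn(n,1);
    F = exp(2i*pi*(0:n-1)'*(0:n-1)/n);  % F(j,k) = w^(jk), the n by n Fourier matrix
    norm(F*c - fftrec(c))               % near zero: same answer, far fewer multiplications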
Problem Set 10.3

1  Multiply the three matrices in equation (3) and compare with F. In which six entries do you need to know that i² = −1?

2  Invert the three factors in equation (3) to find a fast factorization of F⁻¹.

3  F is symmetric. So transpose equation (3) to find a new Fast Fourier Transform!

4  All entries in the factorization of F₆ involve powers of w = sixth root of 1:

   F₆ = [ I   D ] [ F₃      ] [ even-odd    ]
        [ I  −D ] [      F₃ ] [ permutation ].

   Write down these three factors with 1, w, w² in D and powers of w² in F₃. Multiply!

5  If v = (1, 0, 0, 0) and w = (1, 1, 1, 1), show that Fv = w and Fw = 4v. Therefore F⁻¹w = v and F⁻¹v = ___.

6  What is F² and what is F⁴ for the 4 by 4 Fourier matrix?

7  Put the vector c = (1, 0, 1, 0) through the three steps of the FFT to find y = Fc. Do the same for c = (0, 1, 0, 1).

8  Compute y = F₈c by the three FFT steps for c = (1, 0, 1, 0, 1, 0, 1, 0). Repeat the computation for c = (0, 1, 0, 1, 0, 1, 0, 1).

9  If w = e^{2πi/64} then w² and √w are among the ___ and ___ roots of 1.

10  (a) Draw all the sixth roots of 1 on the unit circle. Prove they add to zero.
    (b) What are the three cube roots of 1? Do they also add to zero?

11  The columns of the Fourier matrix F are the eigenvectors of the cyclic permutation P. Multiply PF to find the eigenvalues λ₁ to λ₄ on the diagonal of Λ in PF = FΛ. Then P = FΛF⁻¹, and the eigenvector matrix (usually called S) is F.

12  The equation det(P − λI) = 0 is λ⁴ = 1. This shows again that the eigenvalue matrix Λ is ___. Which permutation P has eigenvalues = cube roots of 1?

13  (a) Two eigenvectors of this circulant matrix C are (1, 1, 1, 1) and (1, i, i², i³). What are the eigenvalues?

        C = [ c₀ c₁ c₂ c₃ ]
            [ c₃ c₀ c₁ c₂ ]
            [ c₂ c₃ c₀ c₁ ]
            [ c₁ c₂ c₃ c₀ ].

    (b) P = FΛF⁻¹ immediately gives P² = FΛ²F⁻¹ and P³ = FΛ³F⁻¹. Then C = c₀I + c₁P + c₂P² + c₃P³ = F(c₀I + c₁Λ + c₂Λ² + c₃Λ³)F⁻¹ = FEF⁻¹. That matrix E in parentheses is diagonal. It contains the ___.

14  Find the eigenvalues of the "periodic" −1, 2, −1 matrix from E = 2I − Λ − Λ³, with the eigenvalues of P in Λ. The −1's in the corners make this matrix periodic:

        C = [  2 −1  0 −1 ]
            [ −1  2 −1  0 ]   has c₀ = 2, c₁ = −1, c₂ = 0, c₃ = −1.
            [  0 −1  2 −1 ]
            [ −1  0 −1  2 ]

15  To multiply C times a vector x, we can multiply F(E(F⁻¹x)) instead. The direct way uses n² separate multiplications. Knowing E and F, the second way uses only n log₂ n + n multiplications. How many of those come from E, how many from F, and how many from F⁻¹?

16  How could you quickly compute these four components of Fc starting from c₀ + c₂, c₀ − c₂, c₁ + c₃, c₁ − c₃? You are finding the Fast Fourier Transform!

        Fc = ( c₀ +    c₁ +    c₂ +    c₃ ,
               c₀ +  i c₁ + i² c₂ + i³ c₃ ,
               c₀ + i² c₁ + i⁴ c₂ + i⁶ c₃ ,
               c₀ + i³ c₁ + i⁶ c₂ + i⁹ c₃ ).
ete a’, AMetinD a6 5s li - = : iyo cere Treake fs =| © ENS | yous Spe? =) eth oh ¢cieme © on emceesaT. ba ane Tr ee nmap wT $F oh things ae Oe 7 aa i ica a . io SOEOMONSTOSElaeite®) EXERC aS Problem Set 1.1, page 6 3 3v+w=(7,5) and 5 (a) Components of cv add to zero v —3w = (—1, —5) and cv+ dw = (2c +-d,c+2d). (b) Components of every cv +dw add to zero (GC) GhoosesG—4eandras—s10) 7 (a) Back on w, forward on v (b) The combinations fill a plane through v and w. 9 The fourth corner can be (4,4) or (4,0) or (—2, 2). 10 (Length of Dye length of w)* = 42422 + (-1)2 + 2? = 25 and also (length of v+w)* =3*+42 = 25. 12 Choose v = (2,1) of v+w)* 15 and w = (1,2). Then (length of v)2 = (length of w) = 5 but (length = 32432, Five more corners (0, 0, 1), (1, 1,0), (1, 0, 1), (0, 1, 1), (1, 1, 1). The center point is (5, 4, The centers of the six faces are Gh 5,0), Cr i 1) and (0, 5, 5), (aie 5, 5) and (5,0, (5,1, 4). 16 A four-dimensional cube has 24 = 16 corners and 2-4 = 8 three-dimensional sides and 24 two-dimensional faces and 32 one-dimensional 17 (a) sum = zero vector (b) sum = —4:00 19 C= h= 0 = (ho 45 tw) = lip —w). edges. vector (e)esum— —4(1:00 vector). The point dv =f tw is three-fourths of the way to v starting from w. The point iy+ iw is Su. 20 All combinations with c+d = 1 are on the line through v and w. The point V = —v+2w is on that line beyond w. 21 The vectors cvu+cw beyond v+w. 22 (a) The combinations and w. Corner fill out the line passing through (0,0) and u = su+ Sw. It continues With c > 0, half this line is removed with 0 < c < | and at (0, 0) (b) With 0 <d <1 and the “ray” starts at (0, 0). fill the parallelogram 0 < c and O < d the parallelogram with sides v becomes an infinite wedge. Pipa, (a) fu + tu+ tw is the center of the triangle between u, v and w; tu +4w of the edge between u and w is the center (b) To fill in the triangle keep c > 0, d > 0, e => 0, and c+td+e=1. 447 448 Solutions to Selected Exercises 25 The vector 5(u +v-+w) is outside the pyramid because 26 All vectors are combinations c+d+e= 5 Paes > 1. of u, v, and w. 27 Vectors cv are in both planes. 29 The solution is s = 2 and d =4. Then 2(1, 2) + 4(3, 1) = (14, 8). Problem Set 1.2, page 17 2 ||w|| =1 and ||v|| =5 =||w||. Then 1.4 < (1)(5) and 24 < (5)(5). 4 uy; =d/|lv|| = Fa: 1) and u = w/||wl| = $(2, 1,2). U; = wari —~3) or Tg! 3)- U> could be nat yall —2, 0). 5 (a) zero angle (b) 180° (or zr) =) ORsoner— 90 6 (a) cos@ = One (c) (v+w)-(v—w) =v-v+w-v—v-w-wWw-w= acorn 5 radians). $010) = 160280r . radians (b) cos@ = 0 so 6 = 90° or % radians VY (@)ecosa — = = SOG) = GON or 5 (d) cos6 = —1//2 so 6 = 135° or 3r 8 (a) False (b) True: u-(cu+dw) = cu-v+du-w choose c=u-w and d= —u-v. 12 (1,1) perpendicular to (1,5) — c(i, 1) if 6 —2c = 0 = (c) True: 0 or c = for cu-v+du-w 3; v-(@w—cv) = = 0 0 if C=v-w/v-r. 16 ol? =9 so joll= 18 (v+w)-(v+w) 35a 30 w= (10, = (v+w)-v+(v+w)-w 0). = v-(v+w)+w-(v+w) = v-v+0-wt+w-vtw-ew = v-v+2v-wt+w-w. 19 2v-w <2l|v|||| wl] leads to |v + wl? =v-v+2v-w+w-w < |v? +2Iv||||wl] + Iwi]? = (|p| + |wi)?. 21 cosB = wyj/||w|| and sinB = w2/||w||. Then cos(B — a) = cosBcosa + sinf sina = vywy/|lv||||wl| + vaw2/|v||||w|| = v + w/|lv||||wl]. 22 We know Cancel that (v—w)-(v—w) =v-v—2v-w+w-w. Compare with the Law of Cosines. —2 to find ||v||||w||cos@ = v-w. 24 Example 6 gives |u||Uj| < 5(uj + U?) and |w2||U2| < 4(u3 + U3). The whole line becomes .96 < (.6)(.8) + (.8)(.6) < (.67 + 87) + 5(.87 + .67) = 1. 
26 The angle between v = Write v-w =xz+yz+xy (1,2, —3) and w = (—3,1,2) has cos@ = a andegs—si20ce as (x+y +z)? — 5(x? + y? +27) which is —4(x? +y? +2%). 27 The length ||v — w|| is between 2 and 8. The dot product v-w is between —15 and 15. Solutions to Selected Exercises 449 Problem Set 2.1, page 29 The solution is not changed; the second plane and row 2 of the matrix and all columns are changed. J Sctey —0Nto tind sy ———3)/2 -andaa— 5) 2s Setny — Oh tomtind. — smandez —alb 6 If x, y,z satisfy the first two equations they also satisfy the third equation. solutions contains cvu+dw and w= (G, iF 5) and u = Sv a Sw and all combinations with c+d=1. 7 Equation DPA v = (1,1,0) The line of 1+ equation 2— equation 3 is now 0 = —4. Solution impossible. (Soa) peAbe —n( GAS)» 10 Nine multiplications for Ax = (18, 5, 0). 12 (z, y, x) and (0,0, 0) and (3, 3, 6). 13 (a) x has n components, Ax has m components the columns are in m-dimensional (b) Planes in n-dimensional space, but space. (b) cvg+dv 14 (a) Linear equations with unknowns x, y, z have the form ax+by+cz=d_ is also a solution if c+d=1 (c) If the equation has right side = constant term = zero, any c and d are allowed. 16 We rotation from R* = ie | ee be iar)l 180° OI O241.%0 POO! Wl) beat hasle) 17 P= andup=!= esa s0 “P70 eS 50 ord 19 F260 0 Oto 16 Onde t <0 “0 LO? 30 40" 10> Fe =] Gi 4's FO ev ea Gy a 23 The first code computes dot products for J = 1,2; the second code combines i—ie Ow Anise4uby esechantestOn— le 4randsJe—dles: columns 24 B(1) = B() + AC, 2) * V(2), B(2) = AQ, 1) * VC), BQ) = B(2) + AQ, 2) * V(2). Code 2 reverses the first two of those three steps. 29 uz = [a ey = ea The components always add to 1. 30 u7, 07, w7 are all close to (.6, .4). Their components still add to 1. 31 8 .3/).6) _ |6 Ea T i) taal Gaeta = steady state s ee . 32 From all starting points the vectors approach (.45, .35, .20). 33 Kel =|) 8 ih Cy 3 4 S+u SD |) cee |p ey no YL 2 S+u 136, row sums = 136/4 = 34. S5—u+u 5) Suv 5-—v 5+u+v 5—u }; 16 numbers in M4 add to 450 Solutions to Selected Exercises Problem Set 2.2, page 40 Multiply by / = ? = 5 and subtract to find 2x +3y = 14 and —6y = 6. Subtract -4 times equation 3y = 3. Then y= | (or add 5 times equation 1). The new 1 and x =5. second equation is If the right side changes sign, so does the solution: (x, y) = (—5, —1). Subtract / = © times equation 1. The new second pivot multiplying y is d — (cb/a) or (ad — bc) /a. The system is singular if b = 4, because 4x +8y is 2 times 2x+4y. Then g = 2-13 = 26 makes the system solvable. The two lines become the same: infinitely many solutions. If a = 2 elimination must fail. The equations have no solution. If a = 0 elimination stops for a row exchange. Then 3y = —3 gives y = —1 and 4x + 6y =6 gives x =3. If k = 3 elimination must fail: no solution. If k = —3, elimination gives 0 = O in equation 2: infinitely many solutions. If k =O a row exchange is needed: one solution. 11 12 Die 2x — Sy a=) il Ve -|- 952 =) =) oe = IC selves 3y a y+ z=1 gives 2y — 3z =2 ee yy —)— en If rarzeromisrate ther startroterow 2 z= 2. or 3, that avoids a row operation. 2x —3y a) Vere — lean = bye =) x=3)' — z=0 2x lero 2x row | from row 2 wale trOmenowe3 row 2 from row 3 13 Subtract 2 times row | from row 2 to reach (d —10)y —z = 2. Equation (3) is y—z=3. If d = 10 exchange rows 2 and 3. If d = 11 the system is singular; third pivot is missing. 14 The second pivot position will contain —2 —b. If b = —2 we must exchange with row 3. 
itp — FS letnessecondsequatlonniss—y 7 — ) Se AuSOLCON ecm ar—nl miye i lewe le 16 If row 1 = row 2, then row 2 is zero after the first step; exchange with row 3 and there is no third pivot. If column 1 = column 2 there is no second pivot. 17 x+2y+3z=0, 4x+8y+4+12z=0, 5x + 10y+ 15z=0. 18 Row 2 becomes 3y —4z = 5, then row 3 becomes (¢ +4)z=t—S. If g = —4 the system is singular — no third pivot. Then if t = 5 the third equation is 0 = 0. Choosing z= 1 the equation 3y — 4z=5 gives y =3 and equation | gives x = —9. 20 Looking from the “end view”, the three planes form a triangle. This happens if rows 1 +2 = row 3 on the left side but not the right side: for example x + y+ z = 0, x—y—z=0, 2x +0y+0z=1. No parallel planes but still no solution. 22 The solution is (1, 2,3,4) instead of (—1,2, —3, 4). 23 The fifth pivot is g The nth pivot is 7 1 24 A 1 1 "aaa a+1 b b+e b+c+3 25 a = 2 (equal columns), a = 4 (equal rows), a = 0 (zero column). 26 Solvable for s = 10 (add equations); [; | and E at Solutions to Selected Exercises 451 Problem Set 2.3, page 50 1 0 Seat Ole OR Om! Bog=) POO i “ez i ONO Om Oma ae ORO =O, vi Olle YF il Eo OO a Ol eae 1 |G 0 (ORO) i 0} -—2 1 1 M = E37E3,E2, is |—4 10 OpeO 1 0 -—2 1 4 Elimination on column 4: | b= | 0] 0 tion in Ux = (1, —4, 10) gives > 1 |-4] 0 z= —5, y= |Prat, are Onto On 0 0 FE 1, £31, E32 , > — : 5.x 0 10 1 1 |—4 10 |. Then back substitu- . This solves Ax = (1,0, 0). Changing a33 from 7 to 11 will change the third pivot from 2 to 6. Changing a33 from 7 to 5 will change the pivot from 2 to no pivot. To reverse E3,, add 7 times row 1 to row 3. The matrix is R3; = No So =o ea OO O O 1). —-1 1 0 M=j]{ After the exchange, E must act on the new row 3 a: 1 (Oe lees 0 2 -3 0] Ie 8) 12 (a) E times the third column of B is the third column of EB to row 3 to give nonzeros. 14 (b) E could add row 2 452 Solutions to Selected Exercises Oe) 17 EF=|a 0|, FE=| ln Fo 19 (a) ib @ =O. 19.0 0) Be 1 column is Elva Each 1 Q a 10). @ b+ac c 1 ‘B i B= 1 Ww |208 IF 07). 2D Ol © ohne times a b columno (a) Dla3jxj (Bb) ag1 — ayy Finitil 2° Seal 2 3 23 | a hea Os ii (b) > | | wel 2g ae 8) Wena ()2x2=2, (dd) (Ax) = Yay;;. 1). 2xjt3m= 1 m= 5 Sal —§x=159 wo = 3. 21 24 The last equation becomes 0 = 3. Change the original 6 to 3. Then row row 3. ee 25 (a) A” 1 has two extra columns; ‘ E 7 0 26 (a) No solution if (b) Infinitely many d= 0 and c 40 A 1 4 1 d=> i iS ae 1 + row 2 = 0 —7 i ae age 4 nee solutions if d = 0 and c = 0. No effect: a and b. Problem Set 2.4, page 59 (a) A (column 3 of B) (b) (Row 1 of A) B (d) (Row | of C)D(column 1 of E£). n wel 1 bn DES P| ana (a) True (c) (Row 3 of A)(column 4 of B) OD: n an = [7% ou (b) False (c) True (d) False. 8 Rows of DA are 3-(row | of A) and 5-(row 2 of A). Rows of EA are row 2 of A and zero. Columns of AD are 3-(column | of A) and 5-(column 2 of A). Columns of AE are zero and column | of A. INE = ip ae 10 alee ee and E(AF) b+d d equals (EA)F and then E(FA) = because matrix multiplication is associative. a+c | a+2c b+d b+2d | E(FA) is not F(EA) because multiplication is not commutative. ) 11 (a) eB 47 (o) B=0 (CG) Bs—s|EOmen @O la (d) Give B equal rows. LOO) 12 ap =|‘ Reel, | gives b=c =0. Then AC =CA gives a=d: A=al. 13 (A — B)? = (B — A)? = A(A — B) — B(A — B) = A* — AB— 15 (a) mn (every entry) (b) mnp BA 4 B?. (c) n> (this is n> dot products). Solutions to Selected 16 By linearity (AB)e agrees with A(Bc). 17 (a) Use only column 2 of B Exercises 453 Also for all other columns of C. (b) Use only row 2 of A (c)-(d) Use row 2 of first A. 
19 Diagonal matrix, lower triangular, symmetric, all rows equal. Zero matrix. 20 (ayy 23 A= 25 Bite ai) ©) aja — (Fain () ajt/ary 9 EOme| has A? =—I; BC 2S WO ites Gt —/ ie a [0 x Sys il — 1 1 of B) = OF ON x =| 0 28 al =| 0 FOil ae Oy al te ie CB. = | BE aPPL lade 5 27 (a) (Row 3 of A)-(column (Dee 5 @) aan — (PADara. 1 ORO n a onal n—lp (Row 3 of A)-(column 2 of B) =0 Gx Fo 0 n_|@ = i |x |eand? [0 0 x| = x OR [FOR OmEes OM ex OME ORs AN ie tates ate)He ; 31 In Problem 30, oo || gh ea al Dele ee ah D~ ebja =| me “Al 33 (a) n—1 additions for a dot product; n2(n— 1) for n> dot products additions for R and S$ (c) 2n(n — 1) +n (b) 4n2(n— 1)+2n- for R and (knowing AC and BD) another dn? +n?(n —1) for S. Total: 3n2(n — 1) +5n? additions for the 3M method. 34 A times X will be the identity matrix /. 35 The solution for b= (G5, 8) is 3x7 475x> -- 8x3 = G, 8, 16): Problem Set 2.5, Page 72 Ob hee 3 0 = Sal 0 1 4 0 P-!=p; 0 O 1 p-!'=]1 0 a+2c=1, —1 1 OJ. 0 ei Ted ae See p= “transpose” of P. CoN Always 3a+6c =0: impossible. 0 —1 0 a=| 0 4 and |0 1 aa a bil-l Ol]la 6b —| E ‘ll 0 ile | : (a) If Ax = (1,0,0), equation 1 + equation 2 — equation 3 is 0 = 1 (b) The right sides must satisfy b, + b2 = b3 (c) Row 3 becomes a row of zeros—no third pivot. If you exchange rows C= AB gives | and 2 of A, you exchange columns C~! = B-1A7! s0 AW! = BCH | and 2 of A =i 454 Solutions to Selected Exercises 12 M-1=c7—!B-!47! 50 B-! = CMA. —| 13 B-1=,7! if | 1 =A! ie a subtract column —" 1 Sib il 1 16 1 —1 1||-1 1 |} 1 01 = 1 2 of A~! 1 from column 1 il i = /J¢ 1 1. th eel = L = E~! after reversing the order and changing —1 to +1. 18 A2B =I can be written as A(AB) = I. Therefore A7! is AB. 19 (1, 1) entry requires 4a —3b = 1. (1, 2) entry requires 2,—a =0. Then b= : and a = 2. 20 6 of the 16 are invertible, including all four with three 1’s. it 3 i © 1 3 i SB it © > [4 i 1 Oso 3 ete eel 230 1 OF ® i leew @- | i OF © O 1 @ © 10) 1 Ss il @ ON ls OPO —b eles | 1 2 Al= 3 5 —1 —-1 -1 3 -1 —-1 -1]|; 3 1 0 1 a E 7 0 > [o in) rE Se 26 A~'=|-2 ORO Al 1 0 1 OM > [o fe—2 = © i SOP ON a0 1 0 1]=]|0]| 1 0 7 4h> a 1 0 OR 0 0 1 -a ele OnnO 1 Otel OnmmnO) =U Ase |. =i O 2 1 0 -1 2 -1 -1 1 (b) False (matrix of all 1’s) 29 Not invertible for c = 7 (equal columns), c = 2 (equal rows), 31 A= SY) 89 So ©= CO © _-—- (7, Ds, 11) etitel 32 =O, 2%, I) 33 hilb(6) is not the exact Hilbert matrix because fractions are rounded off. 1-70 3 & lee "|and Aq! bees eee 0 AeA and ap |I 4 0: (c) True c = 0 (zero column). 30 Elimination produces the pivots a and a —b and a—b. 1 1 0 Clos or 0 ac—b —C 1 so A~! does not exist. -—3 | (notice the pattern); A-!=|-1 28 (a) False with 4 zeros; true with 13 zeros (d) True. BS Solutions to Selected Exercises 455 Problem Set 2.6, page 84 Av 1 0 QO 1 Onl —3 120) 1 a GQ) oy | 1 —2 yo | @ 1 -—2 1 il ab Sh 2 2 = ORE 1 Au—a10) 0 4 lee) ZEA Oma 1 2 0 1 ASS =| oO i Be. 1 33| —6 = U7 UE ORO | SOR to — Oi al Then As p-l E5, E35 YW! SIG 1 EH = Bao Bao, i = 1 —c This is A~!. But ih 9) ee |— 2a l 1 | —b —a | l —a = 1 1 1 ac—b -c | L =A. 1 feel i ip Ol i Il 4 be i die —s lei erOvael eteray (il, eiayel 7 = (0)2 no pivot in row 2 c = 2 leads to zero in the second pivot position: exchange rows and the matrix will be OK. c = 1 leads to zero in the third pivot position = matrix is singular. ih A=|> -@)]} | als 1 4 1 OI a! t Jee i) cae 04 OL sO ! Ol [2 a gO 4 4 O |i 2 : OF ks 3%: | E | |=EDU: notice U is “L transpose 1 4 Om | be QO. Or —4 1 4 a 1 Or OL. 10 12 Oo iP ww iP S OR — | Ne! Deen ty | ||) a Wd Cw! 
| tO eel POS ex ial : gives c= 13 (a) L goes to I iN SOCSHLON 14 (a) Multiply LDU = L;D,U, = Lbenn am r Sip if ie c—s t-—s d-t Ghe=\)) sei nies Wale) al 4 |Ol ye || Ul) OMOR (CREU 3 fakes ae = || (0 l goes tor. by inverses to get ie LD = D\U,U7~!. triangular, right side is upper triangular = both are diagonal | ial (0) (lh th 4 |) | Veto a a 0 a a+b b 0 b b+c is I. a =" (Samews) 16 If T is tridiagonal, each pivot row has only 2 nonzeros. not on later rows. Left side is lower (b) Since L, U, Lj, Uj have diagonals of 1’s we get D = D,. Then Like is J and U,U—! 1 aaa dt b (same U). c It operates only on the next row, U and L are zero away from the main diagonal and the next diagonal. 456 Solutions to Selected Exercises 17 L has the 3 lower zeros of A, but U may not have the upper zero. L has the bottom left zero of B and U has the upper right zero. One zero in A and two zeros in B are “filled in.” its ce Fe ass aoe Be ee be 3G eal on Ome) se x sil il 9 Stems 0) 1 0 (x’s are known after first pivot is used). 0 21 The 2 by 2 upper submatrix B has the first two pivots 2,7. starts with elimination on B. 23 1S NG ee P foo lta i eshe ) Soe l Eat [LO 20 eee 24 c =6 and also c=7 | ee J Thy a 1 Sie | will make LU Reason: Elimination on A oil 3 Ds he 31° Pascal’s triangle in L and U. 1 impossible (c = 6 needs a permutation). Problem Set 2.7, page 95 a ki 1 4A= Atl= le a GHP Sahel 2 li alhas A* = 0. But the diagonal entries of A'A of A with themselves. a Al If A! A = 0, zero dot products = (b) False (c) True 13 (a) There are n! permutation P’ = | Po | Iara |a (Sa) le Sh Gad are dot products of columns zero columns. cl 7 (a) False the same: ks wk P*. (d) False. matrices of order n. Eventually two powers Multiply by (P~!)* to find P’~S (em Cettainlyar is One P3 with P, = |= P! with a LG andy P32 == E = ie | 150) 1 10 0 (b) Pl (row 4) = row 1. Oaelt ffowrenll ay SE ea ited cee |Pomerat 20 Lower right 2 by 2 matrix is le a4 ae of P must be amheathe al 071! Solutions to Selected Exercises 1 23 1 1 AN = lb OQ iW @ 3S 2 il 1 |e | Oil Oo) Wye 1 3 1 ih =| 1 1 1 24 abs(A(1,1)) = 2) 0 so find abs(A(2,1)) > exchange in L; no more elimination so abs(A(2,1)) > GDS CAC) tol; A> =tolwA 3554 |0O0 O Y SC 1 8 —2/3 |. If we wait to exchange: pe Nils «il OF il @ OOM 1 2 1 3 457 Da tol; A > 01 and P > Oil a L = / and U = new A. abs(A(1,1)) =0 Oh 1{andP—>J]1 O 2, 3-4 510) 25" 16) ist oo @O 1 so find 0 =O OJ]; abs(A(2,2)) @ il Oo psi = no so find ik — > | ORO I =0 aa @ @ 1 74) JE = shows the elimination steps as actually done (L is affected by P). 2 Wil 1 30 Ex, =)|-3 1 i |KO 0 and E> AE}, =) l @ © 2ee4 4 9 eis stillisymmetie: ! | Dy — i —4 and E37 E>, AE}, Ee = D. Elimination from both sides gives the 1 symmetric LDL! directly. lo) 31 Total currents are Aly = | —1 Pla REG 0 WES |) = ||RE ORNs YBC + YBS PBS =YCS* ar WES LIBS Hither way (Ax) y=x!(Aly)= xpypc +xByBS — XCYBC + XCYCS — XSYCS — XSYBS1 32 Inputs | 40 5 50 1000 50) x] |=" x2 T 1 A’y =| 50 40 1000 2 50 700 3 3000 ns = 6820 1 truck 188000 | 1 plane © 34 P3 =1 so three rotations for 360°; P rotates around (1, 1,1) by 120°. Problem Set 3.1, Page 107 LxtyAyt+x andx+(y+z)A(e+y)4+zZ 2 The only broken rule is 1 times x equals x. and (cj +c7)x Acyx + cox. 458 Solutions to Selected Exercises 3 (a) cx may not be in our set: not “closed” under scalar multiplication. Also there is no 0 and no —x (b) c(x + y) is the usual (xy)°, while cx + cy is the usual (x°)(y°). Those are equal. With c = 3, x =2, y = 1 they equal 8. The “zero vector” is-the number 1. 
The vector —2 is the number +4. Rule 8 is broken: If c f(x) is defined to be the usual f(cx) then (cj +c2) f = f((cy +c2)x) is different from c; f + cof = usual f(cyx) + f(cox). If (f + g)(x) is the usual f(g(x)) then (g + f)x is g(f(x)) which is different. In Rule 2 both sides are f(g(h(x))). Rule 4 is broken because there might be no inverse function such that f(f~!(x)) =x. If f~! exists it will be the vector —f. (a) The vectors with integer components allow addition, but not multiplication by 5 (b) Remove the x axis from the xy plane (but leave the origin). Multiplication by any c is allowed but not all vector additions. Y (a) All matrices Gap i 0 The sum of (4,0,0) (a) The subspaces (b) All matrices [i ‘| (c) All diagonal matrices. and (0, 4,0) is not on the plane. of R2 are R2 itself, lines through (0,0), and (0, 0) itself (b) The subspaces of R‘ are R?¢ itself, three-dimensional planes n-v = 0, two-dimensional subspaces (nj -v =O and nz -v = 0), lines through (0,0,0,0), and (0,0, 0,0) itself. 15 (a) Two planes through (0, 0,0) probably intersect in a line through (0, 0, 0) (b) The plane and line probably intersect in the point (0, 0, 0) (c) Suppose x is in SMT and y is in SMT. Both vectors are in both subspaces, so x + y and cx are in both subspaces. 19 The column space of A is the “x axis” = all vectors (x,0,0). The column space of B is the “xy plane” = all vectors (x, y,0). The column space of C is the line of vectors Gr 2-0): 20 (a) Solution only if b2 = 2b; and b3 = —b, 22 (a) Every b (b) Solvable only if b3 = 0 (b) Solution only if b3 = —b). (c) Solvable only if b3 = bo. 23 The extra column b makes the column space larger unless b is already in the column space of A: [A vi=[4bawee | (larger column space), [i ie se ed (same column space). 24 The column space of AB is contained in (possibly equal to) the column space of A. If B=0 and A #0 then AB =O has a smaller column space than A. 26 The column space of any invertible 5 by 5 matrix is R>. The equation Ax = b is always solvable (by x = A~'b) so every b is in the column space. 27 (a) False (b) True il tk 0) 28 A= | ib Ws -@ ORO (c) True tal lhe || al @ OP 1 (d) False. Solutions to Selected 459 Exercises Problem Set 3.2, Page 118 (a) Free variables x2, x4,x5 and solutions (b) Free variable x3: solution (1, —1, 1). (—2, 1,0,0,0), (0,0, —2, 1,0), (0,0, —3, 0, 1) The complete solution is (—2x2, x2, —2x4 — 3x5,%4,x5). when no variables are free. The nullspace contains only 0 R= jL 0 0 0 0 OR OMAO te S| OOO >: a0 6 SS 10 0 1 (a) Special solutions ee R=H|@O Y 0 i © 1 ’ IP0 0Bm 0}Sil | 1 =I = (3, 1,0) and (5,0, 1) same nullspace as U and A. 2) fs OF |E 0 IP 1 0 3} 0 5 =3 | (b) (3, 1,0). Total of pivot and free is n. (a) Nullspace is the plane —x + 3y + 5z = 0; it contains all vectors (3y + 5z, y, z) (b) —x +3y+5z=0 and —2x + 6y + 7z = 0; the line contains all points (3y, y, 0). (a) False (b) True 10 (a) Impossible (GC) eA e Ue (c) True (d) True. (b) A = invertible = 1 1 1 ha—ale 12 sae Or Oo i= ©© ooo Gomes coor Ce es OO 8S SS 0 0 0 1 ’ ] 0 0 Ss Soe 0 CO OS 0 =) a) oO x& Ss = Ore eS a oot 14 If column 1 = column 5 then x5 is a free variable. Its special solution is (—1, 0, 0, 0, 1). 15 There are n —r special solutions. The nullspace contains only x = 0 when r =n. column space is R™ when r = m. 17 Au —3 —1]; y and z are free; special solutions The (3, 1,0) and (1,0, 1). 20 If ABx = 0 and A is invertible then multiply by A~! to find Bx = 0. The reason U and LU 23 have the same nullspace is that L is invertible. 
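A numerical aside on special solutions: the matrix below is a hypothetical rank-one example, chosen so that its special solutions are (3, 1, 0) and (5, 0, 1) as in the answer to Problem 4 above.

% Free columns give the special solutions that span the nullspace
A = [1 -3 -5; 2 -6 -10];         % rank 1, free variables x2 and x3
R = rref(A)                      % [1 -3 -5; 0 0 0]
N = null(A, 'r')                 % columns (3,1,0) and (5,0,1)
A*N                              % zero matrix: every combination solves Ax = 0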
ik lies » ] S1/2 =, 23 24 This construction is impossible. 25 26 460 = Solutions to Selected Exercises 27 If the nullspace equals the column impossible. space then n —r = r. If n = 3 then 3 = 2r is 29 R is most likely to be J; R is most likely to be J with fourth row of zeros. 30 (a) A= @ 0 it 0 0 0 ©) aa 4 0 (c) isos A=]0O 1 1+) 1 0 0 —3], ek This was hard! 31 32 Three pivots. 33 These are the nonzero rows: 34 R=[1 —2 HG NT le al Get 0 0 (a) i Woe: at i a ki Al 3 4] ; ot Ras le OX Problem Set 3.3, page 128 (a) and (c) are correct; columns. (d) is false because We odhe |OO OO 1 (b) R=} 0 Op (2) ee Ab Om) 050 If all pivot variables come nullspace matrix is R might happen Ol 1 8: LOO 50 to have 1’s in nonpivot if (C) ea OF last then R must have three zero blocks il ee) 0p loll Opes 0 R = i af The N = Al The matrices A and A! have the same rank r. But pivcol (the column number) is 2 for A and 1 for A!: al | and S=[1] and Al 2 11 The rank of R! is also r, and the example has rank 2: _ ie ed | spel 2eee els ae ete el eee = (408 pl 3 Se ele 13 If we know that rank(B! A‘) < rank(A!), then since rank stays the same for transposes, we have rank(AB) < rank(A). Solutions to Selected Exercises 15 Certainly A and B have at most rank 2, so their product has at most rank2. By by sob cannot bem/mevennin AlbI—we = 1 st Oa) 1 a={1 01h, 10 fo i 4][g 4 U]=[1 1 il & val 0 Ol 8 B= 19 Y is the same il @ O mee 0 0 De Since BA is we} ==Vl avira! EAN = Ii. @ OR Oma o]+]o 0 4 OOS Pw coech cocoate as Z, because 461 Oy ee Oe Ole ek) ht the form is completely OOO NOs 0.0 tee OF ORS SOROS decided by the rank which is the same for A and Al. Problem Set 3.4, page 136 —2 1 Xcomplete = —3 O | +22 l 0 5b, — 2b2 3 Solvable if 25; +b2 = b3. Then x =| bp —2b, 2 | + x3] 0 0 1 5b, — 2b3 Ame pA 4 (a) Solvable if by = 2b; and 3b; —3b3+64 = 0. Then x = 3) (b) Solvable if by = 2b; and 3b; —3b3+bq4 =0. Ih Bh S638 as db 200 (0) tan bo laps I 3 0 O72 6 Every b is in the column 2 OWa2e— 10) particular solution articular the particular soluti solution (d) The homogeneous 92 space: 9 (a) xj —x2 and 0 solve Ax = 0 10 (a) The l Then x =| b by — 3b; | > 3) 20} independent rows. : (no free variables) A 5b, — 2b3 b3 —2b, | + x3] 0 —| -1 | row 3—2 row 2+4 row is the zero row 0 0 0 63 —2b2 + 4b 1 (b) Need b3 = 2b. Row 3 — (b) 2x; —2x2 solves Ax = 0; 2x; —x2 solves Ax = BD. xp is always multiplied by 1 (c)fl ese Ukce Le alate BseBiliapdlecs Then é solution is x, = 0. (b) Any solution shorter than , | is18 shorter the can be 02 462 = Solutions to Selected Exercises 12 If row 3 of U has no pivot, that is a zero row. Ux =c is solvable only if c3 = 0. Ax =b might not be solvable, because U may have other zero rows. 18 (a) a8 | yy} =] Se 4 —] 0}]+y 0 —1 1/+z| 0 EX 4 (oy) || 59 | 1, @ Zz 0 0 1 —1 seal © 1 19 If Ax; = 5b and Axz = b then we can add x; — x2 to any solution of Ax = B. But there will be no solution to Ax = B if B is not in the column space. Ag) CSN.) CHULA Qe w=, —)), GS Ch tl, 3). 24 A rank one matrix has one pivot. The second row of U is zero. 26 (uv! )(wz!) = (uz!) times v'w. Rank one unless v! T w = 0. 28 (a) r <™m, always r <n inne Pee LOE One: 476 S27 he — i © sO One HGF yee PZ ib (OQ) ie Sips iF =v lite |: 1°22 God Vie iN (Ee 2h (O ho: © 0 On nance esl MORO LE 20) le ean) 0 Oe! Os Oren 1s ee 0 QO 50) =1 1 2 |: no solution because ofrow 3. OS git 21]; B cannot exist since 2 equations in 3 unknowns 34 A=]|0 (dd) ii |. The pivot columns contain J so —1 and 2 go into xp. 
© ORO ee On 0); OURS —1 0 2 andxp, =| (0) 7 =o, cannot have a unique Ome solution. i © @ ent ire Silos | ees it 36 A= 2 Oi ¢ 1 3 1 7 4 -Olie0 we0 o| adx= eg ; Relient Oo Om © x3 and no solution. 2, | and luti yes k a Problem Set 3.5, page 150 i yO SS I cC| aa c2 | = O gives C3 ih OO cz3 = cp = cy = 0. But —2v, — 3v2 — 403 + vy = O (dependent). 2 v1, v2, v3 are independent. All six vectors are on the plane (1,1, 1,1)-v =O of these six vectors can be independent. 3 If so no four a=0 then column | = 0; if d =0 then b(column 1) —a(column 2) = 0; if f = 0 then all three columns end in zero (they are perpendicular to (0,0, 1), they lie in the xy plane, they are dependent). Solutions to Selected Exercises Columns Same 1, 2, 4 are independent. answers (not same columns!) 463 Also 1, 3, 4 and 2, 3, 4 and others (but not 1, 2, 3). for A. If cy (w2+w3)+c2 (wy +w3)+c3(w;+w2) = 0 then (cp+c3)w) +(c] +¢3) wW2+(c] +c2)W3 = 0. Since the w’s are independent this requires cz + cz3 = 0, cy +c3 =0,c) +c2 =0. The only solution is c} = cp = c3 = 0. (a) The four vectors are the columns of a 3 by 4 matrix A. There is a nonzero solution to Ac = 0 because there is at least one free variable (c) Ov; + 3(0, 0,0) = 0. (a) Line (b) Plane (c) Plane in R3 (d) All of R?. 12 b is in the column space when there is a solution to Ax = b; ¢ is in the row space when there is a solution to Al y =c. False. The zero vector is always in the row space. 14 The dimension of S is (a) zero when x = 0 (b) one when x = CPE 171) (c) three when x = (1,1, —1,—1) because all rearrangements of this x are perpendicWETS (roy Gila Tek, Ab) (d) four when the x’s are not equal and don’t add to zero. No x gives dimS = 2. 16 The n independent vectors span a space of dimension n. They are a basis for that space. If they are the columns of A then m is not less than n (m > n). 19 (a) The 6 vectors might not span R‘* (c) Any four might be a basis. (b) The 6 vectors are not independent 22 (a) The only solution is x = 0 because the columns are independent (D) RAG Das solvable because the columns span R. 25 (a) False for [1 dimensions 1] (b) False (c) True: Both dimensions = 2 if A is invertible, 0 if A = O, otherwise dimensions = | (d) False, columns may be = dependent. 27 31 (a) All 3 by 3 matrices 32 —] 0 2 0 0 CO) |? —| 0 (b) Upper triangular matrices 0 0 2 Oo; 0 |-1 ah) (a) y(x) = e2* (b) y =x 37 Basis Leones basis x — 1, x? — 1.x? eal 0 2, (one basis vector 0 0; (c) All multiples c/. 0 |-1 0 0 0 Dali in each case). 38 Basis for S: (1,0, —1, 0), (0, 1, 0,0), (1,0, 0, —1); basis for T: (1, —1,0,0) and (0, 0, 2, 1); ST has dimension |. 464 Solutions to Selected Exercises Problem Set 3.6, page 161 A: Row space (1, 2,4); nullspace (—2,1,0) and (—4,0,1); column space (1, 2); left nullspace (—2, 1). B: Row space (1, 2,4) and (2,5, 8); column space (1, 2) and (2, 5); nullspace (2) (—4, il © lO) gl 0,1); left nullspace (b) Impossible: (e) Impossible: basis is empty. r + (n —r) must be 3 Row space = column space requires -9 ()eO} 0} Then m—r=n—r. m =n. (d) |; —3 if A: Row space (0,3, 3,3) and (0,1,0, 1); column space (3,0, 1) and (3,0, 0); nullspace (1,0,0,0) and (0, —1,0, 1); left nullspace (0, 1,0). B: Row space (1), column space (1, 4,5), nullspace: empty basis, left nullspace (—4, 1,0) and (—S,0, 1). Invertible A: basis nullspace and left nullspace row space = column space basis are empty. 
basis Matrix = B: (1,0,0), row space (0, 1,0), basis (0,0, 1); (1, 0,0, 1, 0,0), (OMEOFOS TS O\Rand (OF OMe OOM) sscolumnespaccsbasisn (e050) sa(Osule 0) (On Os) nullspace basis (—1, 0,0, 1,0,0) and (0, —1, 0,0, 1,0) and (0,0, —1, 0,0, 1); left nullspace basis is empty. Row space dimensions 3,3,0; column 2,3, 2; left nullspace dimensions space dimensions 3, 3,0; nullspace dimensions 0, 2, 3. 11 (a) No solution means that r < m. Always r < n. Can’t compare m and n m—r > 0, the left nullspace contains a nonzero vector. 12 4] Om 2 0 i k 13 (a) False 14 Row OA 5 oe DB 2 i (b) True (c) False (choose A and B same space basis De 4 @ a] 0]; fi (1, 2,3,4), (0,1, (b) If r+m—-—r)=n=3 but 242 is 4. size and invertible). 2,3), (0,0, 1,2); nullspace basis space basis (1,0, 0), (0, 1,0), (0,0, 1); left nullspace has empty (0, 1, —2, 1); column basis. 16 If Av= 0 and v is a row of A then v-v =0. 18 Row 3 —2 row 2+ row 1 =0 so the vectors c(1, —2, 1) are in the left nullspace. same vectors happen to be in the nullspace. The 19 Elimination leads to 0 = b3 — by — by so (—1, —1, 1) is in the left nullspace. Elimination leads to b3 — 2b; = 0 and by + bz — 4b; = 0, so (—2,0, 1,0) and (—4, 1,0, 1) are in the left nullspace. 20 (a) All combinations of (—1,2,0,0) and (-4,0, —3, 1) (b) 1 (oS) 5 Onl) 21 (a) u and w dependent 22 (a? 7) (b) v and z (c) rank < 2 if uw and w are dependent or v and z are (d) The rank of uv! + wz! is 2. k ; l= An 4 25 (a) True (same rank) unsymmetric) 22 2 D2 1 (b) False (d) True. |] A=[1 0] (c) False (A can be invertible and also Solutions to Selected Exercises 465 27 Choose d = bc/a. Then the row space has basis (a,b) and the nullspace has basis (—d, a). 28 Both ranks are 2; rows | and 2 are a basis; N(B') separated by a zero; McC!) has (—1,0,0,0,0,0,0,1) columns 3,4,5,6 of J; N(C) is a challenge. has six vectors and with | and —1 (0, —1,0,0,0,0, 1,0) and Problem Set 4.1, page 171 (a) 1 2 —3 (d) Row 2 -3 5 -—3 1 —2 1+ row 2+ row 2 1 (b) |—3 | is not perpendicular to | 1 5 l (c)F Ag 3 = (6, (ORC e eh) , ) is not zero; no such matrix pom } | wall be in the nullspace and row space; no such matrix. (a) If Ax = b has a solution and Aly = 0, then y is perpendicular to b. bly =0 (b) If Ax = b has no solution, b is not in the column perpendicular to y. space. (Ax)! y = It is not xX =X;,+Xn, where x, is in the row space and x, is in the nullspace. Then Ax, = 0 and Ax = Ax; + Axyn = Ax,. All vectors Ax are combinations of the columns of A. If Ax is in the nullspace of A! then Ax must be zero. perpendicular to itself. It is also in the column space => (a) For a symmetric matrix the column space and row space are the same (b) x is in the nullspace and z is in the column space = row space because A = Ab ¥ =x, +x, = (1,-1)+d,1)=@,0). 11 Aly=05 xTAaly =(Ax)'y=05 yl Ax. 13 If S is the subspace of R? containing only the zero vector, then S+ is R?. If S is spanned bya (ey lee) sathen St is spanned by (1, —1,0) and (1,0,—1). If S is spanned by (2, 0, 0) and (0, 0,3), then S+ is spanned by (0, 1, 0). Ih Paydl 14 S~lee is the nullspace of A == E ae | Therefore S+ is a subspace even if S is not. 17 The vectors (—5,0, 1,1) and (0, 1, —1,0) span st. 19 For x in V4, x is perpendicular to any vector in V. Since V contains all the vectors in S, x is also perpendicular to any vector in S. So any x in V+ is also in S+. 2( ~~ Column | of A~! is orthogonal to the space spanned by the 2nd, 3rd, ..., nth rows of A. 24 (a) (1,—1,0) is in both planes. 
Normal vectors are perpendicular but planes intersect is orthogonal to the first line but not on the second line (c) Lines can meet without being orthogonal. (b) (2, —1,0) 466 Solutions to Selected Exercises Problem Set 4.2, pages 181 Q) p=G.3,.9) e€=C9.49) @P=U,3,D; €=@,0,0). \ P=} Py =5 N= ape 1 \ Indeed P? = Py. Pr = 7 1 28 a oA = wm al 915-9) (1,0,0) = b. Po =4 4 4 —2 —2 iets Mi Cae et i 3 il —2)|. P,P, =0 because a; la. } and pz = (3. 9-9) and p3 = (Gs: 3): has Whales ells Pp, = (1,0) and pz = (0.6, 1.2). Then pj + p2 £ Bb. P =I. If A is invertible, P = A(ATA)~!A? = AA~!(A!)—1 AT = 1. (1) p = A(AA)~!A'b = (2, 3,0) and e = (0,0, 4) The projection of b onto the column p=t 5 8 So Peal —4 space of A is Bb itself, but P is not necessarily J. —4 and pss (ODA). a 16 5(1, 2,—-1)+ 3(1, 0, (2 17 (2) p = (4,4,6) and e=0. al) lheretoresbi issn the plane. p? = P and therefore (I — P)* = (I — P)I — P) = 1 — PI— IP+ P* =1— P. When P projects onto the column space of A,/ — P projects onto the left nullspace of A. V6Or=176" 173 Sib. 1/6. Mae3 O=|-1/6 4/6 17 |. P=|1/6 5/6 =13 =1/3. “155273 Weigh ainZhgaaaWe 20 e=(1,=1,=2), 21 (A(ATA)—!AT)? = A(ATA)—“!(ATA)(ATA)—IAT = A(ATA)!AT. Therefore P? = P. Pb is always in the column space (where P projects). Therefore its projection P(Pb) is Ds 24 The nullspace of A! is orthogonal to the column space R(A). So if Ab = 0, the projection of b onto R(A) should be p=0. Check Pb = A(A'A)~!ATb = A(ATA)—!0=0. 26 A! exists since r =m. Multiply A27=A by A~! to get A=/. 27 Ax is in the nullspace of A. But Ax is always in the column space of A. To be in both perpendicular spaces, Ax must be zero. So A and A!A have the same nullspace. Problem Set 4.3, page 192 C aot SS oO RW. — 0 8 g 20 ; 5 |: Change the right side to p = 1 5 13 1J i 3.x = (154) becomes ‘exact. Solutions to Selected Exercises 45 ALA 467 RS| Alb = |B6nlr I 28 ¥ = (1,4). The four heights are 1, 5, 13, 17. The (Mil? See26 errors are —1,3, —5,3. The minimum error is E = 44. E=(C40D)7+(C 4D 8) + (C +3D —8)° + (C FAD = 20/2. ‘Then EOC = 2C +2(C + D—8)+2(C +3D—8) +2(C +4D — 20) = 0 and E/8D = 1-2(C + D—8)+ . 4 NST IG = el 36 Be 2(CA — 20) 20) =0. (Gee 3D== 8) )+ a4? (C +4D — = 0. The equations are E | E=(C—0)*4+ (C —8)*+(C —8) + (C20). AP =[1 I 1 11-ATA =14] and Alb = [36] and C =9. e = (—9, —1, -1, 11). A=[0 1 3 4]', ATA =[26] and A?b=[112] and D = 112/26 = 56/13. 0 8 8 20 0 ee Cc D Ja) y 1 1 Re WO RW © = oe 4 16 64 10 4 8 26 oo Mey A Spy Sake! 36 Cc 1D) \| == | iti. 400 EB 0 1 | 47 Spas @ Sj sales] 5 S nN and p=b 11 (a) The best line is x = 1+4?, which goes through (2, 9) C-m+D->0" ti = VL, 5j. Divide by m to get 12 (a) ala =m. a'b = bj +--:+ bm. b— fa. 7,(bo; — 2)" lel? = Therefore and e=0. (b) From the first equation: C+ Di = b. * is the mean () p ="G, 3,3), of the b’s (b)e= ¢€ = (2,—1,3), tpl 0 ] ae as el 1 13 (AT A)~!AT(Ax — b) = x —f. Error vectors (+1, +1, £1) add to 0, so the x — ¥ add to 0. 14 (® —x)(® —x)? = (ATA)—!AT(b— Ax)(b— Ax)'A(ATA)~!. “covariance 16 matrix” (ATA)! AT02A(AT A)! Take the average to get the which simplifies to o*(ATA)—!, 7002100 + 7p %99 = Top (bi +--+: + Pio0).- 18 p = Ax = (5, 13, 17) gives the heights of the closest line. The error is b— 20 xX = (9,4). 21 e is in N(A"); p= Ax = (5, 13,17) = b. Error p is in R(A); p = (2, —6, 4). e = 0 since b is in the column space of A. ¥ is in R(A‘); N(A) = {0}. 23 The square of the distance between points on two lines is E = (y—x)*+@By—x)?+(1+x)2. LaE/dx = —(y — x) — By —x) + (x +1) =0 and 49E/dy = (y — x) + 3Gy — x) =0. 
The solution is x = —3, = 24 e is orthogonal to p; —3; ji, = z, and the minimal distance is \le||2 =e!'(b—p)= eb =b'b— b! p. 25 The derivatives of ||Ax — b\|2 are zero when x = (AT A)—1AT. =3 468 — Solutions to Selected Exercises Problem Set 4.4, page 203 (a) ATA = 16] (b) ATA is diagonal with entries 1, 4, 9. ee (a) Q=|0 independent 1}, 0 0 QGQ'=]0 1205-0 1 O O10: 00 (b) (1,0) and (0,0) are orthogonal, but not (0) (4.4.4.4), Goh -f-p GbE D.C BEAD. If Q; and Q> are orthogonal matrices then (0102)' 0102 = OF 01010 = O10) = I which implies that Q;Q> is orthogonal also. The least squares solution is ¥ = QO'b. This is 0 if O = ld and b= bak (a) If q1, 92, 93 are orthonormal then the dot product of g; with cjq, +c2q92+03q3 gives cj = 0. Similarly cy = c3 =0 (b) Ox =0>5 O'Ox = O10=05x=0. (a) Two orthonormal vectors are pl. B54 Oe 1) and 79 (7, vector in the plane is the projection 20'(1,0, ORO} > = 3, —4,5, -1) =90 (b) The closest (0.5, —0.18, —0.24, 0.4, 0). If qj and q> are orthonormal vectors in R> then qi}b)qy + (q3.b)q2 is closest to b. (a) aib = al (x14) + x2a2 + x343) = xy(al aj) =x) (b) alb = a} (xjay + x2a7 + x3a3) = x(a} ay). Dheretore sep alb/ata (c) x1 is the first component of A~! times b. 14 15 Feelin, pile abel eel Hegel llal| N=CHA R=CH Jy) and | o)=la o]| 0 (a) gy = 4(1,2,-2), go =4(2,1,2), 93 =4(2,-2,-1) contains q3 qi BI! (db) The nulllspace of AT (c) ¥ = (A?A)—!AT(1,2, 7) = (1,2). 16 The multiple p = aba = ga = za is closest; q, = a/|la|| = 5a and q2 = B/||B\| = ¥3 (6, —3, 10, 10). 18 If 8. A= QR then A'A = R!R = lower times upper triangular. Pivots of A'A are 3 and 19 (a) True (b) True. 20 The orthonormal Ox = x1q) +242. \| Ox ||2 = ce +x3 because q| -q2 = 0. vectors are (5, 5 5.4) and (—5/V52, —1/V52, 1/V52, 5/52). Then b projects to p = (—3.5, —1.5, yr 1.5). Check that b — p is orthogonal to both vectors. 21 A=(1,1,2), B=(,-—1,0), C=(—1,-1, 1). Not yet orthonormal. 23 (a) One basis for this subspace is vy = (1,—1,0,0), (6) (1 ad) v2 = (1,0,—1,0), v3 = (1,0,0, 1) (c) by = (5,4, 5,9) and by = (4,4, 4, 3). iy B 25 (q3C*)qo is the same as a ae because q7 = [By] and the extra q; in C* is orthogonal to q2. 28 There are mn multiplications in (11) and sm 2 n multiplications in each part of (12). Solutions to Selected Exercises 32 (a) Qu = (I — Quu')u =u —2uu'u. (b) Qv=(1 This is —u, provided that ulu equals — 2uu')v =u—2uu'v=u, | provided that uly =0. 33 The columns of the wavelet matrix W are orthonormal. 7.3 for more 469 Then W~! = W!. See Section about wavelets. Problem Set 5.1, page 213 det(5A) = (4)? det A= —3 and det(—A) = (—1)? det A = 3 and det(A?) = 9 and det(A~!) = —3. (a) False (b) True 4 Exchange rows (Sine (c) True (d) False. | and 3. Exchange rows leet e nl — le 1 and 4, then 2 and 3. Dhesdetenminants area lee Ieee lean lenn | ee. lJio1| = 1. Multiply the zero row by ¢t. The determinant is multiplied by ft but the matrix is the same = deu—0)! O'=Q7! and O'QO=1 5/0)? =15|0|=H1. 10 If the entries in every row add to zero, then (1, 1,..., 1) is in the nullspace: singular A has det = 0. If every row adds to one, then rows of A —J add to zero (not necessarily det A = 1). 11 CD =—DC = |CD| = (-1)"|DC| d 12 det(A~!) = If n is even we can have |CD| 4 0. —b det lees ad —be 13 and not —|DC|. =e | = ae = ie. ad —be det(A) = 24 and det(A) =5. 15 det A = 0 and det K = 0. 16 Any 3 by 3 skew-symmetric K has det(K!) = det(—K) = (—1)3det(K). This is —det(K). But also det(K !) = det(K), so we must have det(K) = 0. 19 cet “Oi a= Ibe ae = (ad — bc)(1 = LI). Rules 5 and 1 give Rule 2. 
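The determinant rules used in these solutions (product, transpose, inverse, and det cA = c^n det A) can be confirmed numerically. A small illustrative check with arbitrary 3 by 3 matrices, not taken from the exercises:

% Determinant rules checked on arbitrary invertible matrices
A = [2 1 0; 1 2 1; 0 1 2];
B = [1 4 2; 0 1 3; 5 0 1];
det(A*B) - det(A)*det(B)         % ~ 0
det(A') - det(A)                 % ~ 0
det(inv(A)) - 1/det(A)           % ~ 0
det(3*A) - 3^3*det(A)            % ~ 0 because n = 3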
det(A) = 3, det(A~!) = 4, det(A — Al) = A? — 44 +3. Then A = 1 and A = 3 give det(A — AJ) = 0. Row 2 =2 Row times row | so det A = 0. 3 — row 2 = row 2 — row af/aa be affac] | ate pile aac. ad—be | so A is singular. ae |. _1 |-ax a ad—bc d *) = a =A™. 470 Solutions to Selected Exercises Problem Set 5.2, page 225 det A = 4, its columns are independent; det B = 0, its columns are linearly dependent. Each of the 6 terms in detA is zero; the rank is at most 2; column 411423432a44 gives —1, aj4a73a32a4, detB =2-4-4-2-—-1-4-4-1=48. gives +1 so det A = 0; (a) If ay) = a22 = a33 = O then 4 terms are sure zeros 8 5!/2 = 60 permutation matrices have det(P) = +1. 9 If a1942g°--anw #0, put rows 2 has no pivot. (b) 15 terms are sure zeros. Put row 5 of J at the top. 1, 2, ..., nm into rows a, B, ..., w. Then these nonzero a’s will appear on the main diagonal. 3 11 C= .C=] SS 0 0 =a 42 235 —21 14]. |B} =1(0) +2(42) + 3(-—35) = —21. ste Meee Sn 2iae 40 0 2 4 2 and, AC! = 0400) Therefore Ay = C'. (ersee 00 4 1-1 1-1 ivan = 2det}-1 2 -1]+det}-1 2 = 21B,)~ aet| 13 |B4| 12 C= -1 2 -1 -1 ]=21531- at vie |B9|. 15 Must choose 1’s from column 2 then column 1, column 4 then column 3, and so on. Therefore n must be even to have detA, 4 0. The number of row exchanges is 3D2 so Ch = (-1)"/2. 16 The 1, 1 cofactor is E,-,. The 1,2 cofactor has a single 1 in its first column, with cofactor E,—2. Signs give Ey, = E,-; — En—-2. Then 1, 0, —1, —1, 0, 1 repeats; Ejo9 = —1. 17 The 1,1 cofactor is F,-,. The 1,2 cofactor has a 1 in column 1, with cofactor F,_9. Multiply by (—1)!+? and also (—1) from the 1,2 entry to find Fy = Fy—1 + F,_2- 20 G7 = —-1, G3 =2, G4 = —3, and Gy = (-1)""!(n - 1). 21 (a) If we choose an entry from B we must choose an entry from the zero block; result zero. This leaves two entries from A and two entries from D leading to |A||D| (b) and (c) Take a=|i a B=) a Gal 0 0 1 0 vs ae 0 0 22 (a) All L’s have det = 1; det U; = detAy =2 then 6 then —6 E20 23 Problem 21 gives det heer If AC =CA |= A Oni Be (b) Pivots 2, 3, —4. B 1 and aet|¢ 4 = |A| times Aj |D—CA*B|. this is |AD — CAA~!B| = |AD—CB|. 24 If A is a row and B is a column then det M = det AB = dot product of A and B. If A is a column —al)e Ti and B is a row then AB has rank | and det M = det AB = O (unless Solutions to Selected Exercises 471 25 (a) det A = ay, Aj, + 412A12 +--- +4 nAjn. The derivative with respect to aj, is the (b) dIn(det A)/da,,; = A,,/det A is the (1, 1) entry in the inverse matrix cofactor A, fa! 27 There are five nonzero products, all 1’s with a plus or minus sign. column) numbers and the signs: Here are the (row, + (1, 1), 2)(3, 3)(44,4 + C, 2)(2, D3, 494, 3) — d, 2)(2, DG, 3)(4, 4) — 0, 1I)Q, 2)(3, 44, 3) — d, DQ, 3)G, 2)4, 4). Total 1+ 1—1—1—1=-1. 29 |Sy| = 3, [So] = 8, |$3| = 21. The rule looks like every second Fibonacci number ...3, 5, 8, 13, 21, 34, 55, ... so the guess is |S4] = 55. Following the solution to Problem 28 with 3’s instead of 2’s gives |S4] = 81+1—9-—9-—9=SS. 31 Changing 3 to 2 in the corner reduces the determinant by 1 times the cofactor of that corner entry. This cofactor is the determinant of S$, ; (one size smaller) which is F,. Therefore changing 3 to 2 changes the determinant from F 2,42 to F2,42 — F2, which is F241. Problem Set 5.3, page 240 — (det A= 5.) |Bij=3, |Bo|=—2, det Biy— 105) 3 (a) xy = 3/0 and x2 = —2/0: 4 (a) det([b a idetb> =) —siso x) = 2) and 25) — 1 (b) |A| = 4, |B3|=1. Therefore xy = 3 and x. =—} and x3 = }. no solution a ])/ det A, if det A 40 (b) x; = 0/0 and x7 = 0/0: undetermined. 
(b) The determinant is a linear function of its first column so we get x;|a@, @2 a3| + x2\/az az a3| + x3|a3 az a3]. determinants are zero. The last two If the first column in A is also the right side b then det A= det By. Both By and B3 are singular since a column is repeated. Therefore xj = |B,|/|A| = 1 and x2 = x3 =0. eo (a) | 0 oe 2 ae i 0 4 If all cofactors (b) a = eae 3 1 1 5 1 5 . The inverse of a symmetric matrix is symmetric. 1 1 a Ai 0 then A~! would 3 i ; be zero if it existed; has no zero cofactors but it is not invertible. 6 ate-20 C=|-3 4 -1] On 10 ral and AC'=|]0 : it cannot : exist. A = ol hi ae OO) 3 O|. Therefore det A =3. OOS 10 Take the determinant of both sides. The left side gives det ACT = (det A)(det C) while the right side gives (det A)”. Divide by det A. 1 11 We find det A= (detC)*-! with n = 4. Then det A~! is 1/detA. Construct AW! using the cofactors. Invert to find A. 472 Solutions to Selected Exercises 13 Both det A and det A~! are integers since the matrices contain only integers. But det A~! = 1/det A so detA =1 or —1. 15 (a) C21 = C31 = C32 = 0 (b) Cy2 = C21, C31 = C13, C32 = C93. 16 For n = 5 the matrix C contains 25 cofactors and each 4 by 4 cofactor contains 24 terms and each term needs 3 multiplications: 1800 vs. 125 for Gauss-Jordan. 17 (a) Area |22| = 10 311 18 Volume = |131| (b) 5 (c) 5. = 20. Area of faces = length of cross product 113 ;|211 19 (a) 5/341 timate ==85 (b) 5 + new area 5] 051 =p) cea, = De -101 21 (a) V = L,L2L3 when the box is rectangular < (/3)? < 6. Challenge question: (b) |ajj| < 1 gives each L < J/3. Then Is volume = 5 possible? 1.0.0 |0 1 Oj]; De BOS 24 The box has height 4. The volume is 4. The matrix is (k-w) = 6V2. 3ut 051 volume ijk |311| ix yj =k and and 2n(n — 1)-dimensional faces. =4. 25 The n-dimensional cube has 2” corners, n2"-1 edges The cube whose edges are the rows of 2/7 has volume 2”. 26 The pyramid has volume 28 i singcos@ pcosycosé —psing sind singsin@ pcosgsin@ psingcosé cos p 29 a The 4-dimensional pyramid has volume —psing dr/dx or/dy 0/ax 30/ay = | = p- sing, needed 4: for triple integrals in spheres. 0 cosé = sin® 1—-1sino 1r cosd r 1 Pp 30 The triangle with corners (0,0), (6,0), (1,4) has area 24. Rotated by 6 = 60° the area is : 1 288 unchanged. 32 Ve The determinant of the rotation matrix is J = 2a) lee On| i 2 2 deta 34 (wx u)-v=(vx cos@ —sin@ | _ | 32 sin@ cosé 2 2/=1, D il 0) w)-u=(u x v)-w: Cyclic = even permutation of (u, v, w). 35 S = (2,1,-—1). The area is ||PQ x PS|| = ||(—2, —2, —1)|| = 3. The other four corners could be (0,0, 0), (0,0, 2), (1, 2,2), (1, 1,0). The volume of the tilted box is 1. ASS 36 If (1,1,0), So 37 det} (1,2, 1) and (x,y,z) are coplanar the volume is det} Wap Ke ce 3 il WOR 2 1|=0O=7x 2D 3 —5y +z; plane contains the two vectors. 1 1 he 0] =0 = Solutions to Selected Exercises 473 Problem Set 6.1, page 253 A and A* and A®™ all have the same eigenvectors. 1 and 0.25 for A2, 1 and 0 for A®. The eigenvalues are | and 0.5 for A, Therefore A? is halfway between A and A™. Exchanging the rows of A changes the eigenvalues to 1 and —0.5 (because it is still a Markov matrix with eigenvalue 1, and the trace is now 0.2 +0.3—so the other eigenvalue is —0.5). Eigenvalues can change completely when rows are exchanged. Singular matrices stay singular during elimination. X41 = —1 and Az =5 with eigenvectors x; = (—2,1) and x» = (1,1). has the same eigenvectors, with eigenvalues increased to O and 6. The matrix A-+/ Avhas;s Aq = A? 
has the same —3 and Xo — 2 with x7 = G,—2) and x», = d= 1): eigenvectors as A, with eigenvalues Ke = 9 and v3 = 4, A and B have A; = 1 and Az = 1. AB and BA have 4 = 5(3+ /5). Eigenvalues of AB are not equal to eigenvalues of A times eigenvalues of B. Eigenvalues of AB are equal to eigenvalues of BA. (a) Multiply Ax to see Ax which reveals A (b) Solve (A —AJ)x 10 A has A, = 1 and Ay = .4 with x; = (1,2) and x2 = (1,—1). Az = 0 (same eigenvectors). So A! 12 (a) Baa (uu!)u 5(-1 +i/3); 15 Set 24 =0 has A; = 1 and — (.4)100 which is very near zero. = u(ulu) =O nN — x2 = (—3,0,1,0), 4 =-—1 oll (b>) eve (uu! )v = u(u! v) a) x3 = (—S,0,0, 1) are eigenvectors with 1 = 0. and | (the eigenvalue | is repeated). to find det A= (A1)(A2)--- Qn). 16 If A has A; =3 and Az = 4 then det(A —AJ) NM = 5(a +d+J(a—d)* 19 lands A™ is very near A™. (c) x} = (—1,1,0,0), 14 i A100 has i = 0 to find x. +4bc) =QA—3)Q-4 = Neaen and dr» = s(at+d— re eee dae 7h S od Always Their sum is a+d. hase OO OS Ose Oy A=| 38 “Al 22 A = 1 (for Markov), 0 (for singular), -4 (SONSUDIN==ntraces— 5). 26 =) Ihe Py Si We 27 rank(A) = 1 with 2.= 0, 0, 0;:4; 28 Behas 1 and B=A-—TI rank(C) = 2 with 2 =0, 1b eon sondey b= —See he) has A=-—1, —1, —1, —l, 4. 30 b i H = ea Gadel Carn! = a+o){ Dy A2 =d—b 0,: 2,,2. iimatnixAn — to produce trace =a-+d. 32 (a) u is a basis for the nullspace, v and w give a basis for the column space (O, 7 t) is a particular solution. Add any (c,0, 0) would be in the column space, giving dimension 3. (c) If Ax =u (e)) 28 = had a solution, u 474 Solutions to Selected Exercises Problem Set 6.2, page 266 = Fea oml reel raanl ares eal beelllea WI) Kl If A= SAS7! is still S. 10 then the eigenvalue matrix for A+2/ is A+2I/ and the eigenvector matrix A+ 27 = S(A+2NDS— = SAS! + SQNs-! = A421. (a) False (b) True eel i eer (a) Aa (0) eel (c) True il rae Alhas Ay =1, De (d) False. Ag =—4 with x} =(1,1), 2 ale ee le ‘ Foy = 6765. i x2 = (1, -2) 2 1 ae athe att Saga ae ale jfee=[ 4 3 3 3 3 o[%r]-«[8)-] itl-| Geri | 2 ae 1 Bt 1 SAN sok 13 A% (x, +Xx2) = Ax, se A29x5 = oa af eee The second component is ie + eu jaoyy) = May P27L, 15 (a) True (b) False (c) False. 16 (a) False (b) True (c) True. 20 21 23 0 \, 1 2] \ = 1 1 F k 1 0 Tea 2 s=[ Ei; ako | j] ant sats] ale SG a B}0 I = sum ce Met ie Oe RS es ae eae: 101 2 |= acto ie of those two. Oe = Oe soa Tae Of zaihe@ aia pha 24 det A = (det S) (det A) (det S~!) = det A = 41 ---An. This works when A is diagonalizable. 26 Impossible since the trace of AB — BA is trace AB — trace BA = zero. E = li a 29 If A has columns (x1,..., X,) then A2 = A means CAxiSe as Go) = (CSilposonkso vectors in the column space are eigenvectors with Dimensions add to n => A is diagonalizable. 30 Two problems: The nullspace and column pendent eigenvectors in the column space. 31 n= Sy AS = E space can overlap. i the square root of B would 2 (imaginary) so the trace is not real. E “All A = 1. Always the nullspace has 4 = 0. There may not be r inde- have 4 = J/9 (real) and A = /-1 | does have a real square root |: al Solutions to Selected Exercises 475 35 If A = SAS! then the product (A—A ,/)---(A—AnJ) equals S(A—A,/)--- (Nese The factor A —A;/ is zero in row j. The product is zero in all rows = zero matrix. 36 a2 —1 factors into (A — 1)(A+1) so Problem 35 gives (A—J/)(A+/) = zero matrix. Then A* =I and not necessary. A=A™!. Note: Factoring p(A) = det(A — AJ) into (A —A1)-:-(A— An) Is The Cayley-Hamilton Theorem just says p(A) = zero matrix. 
3H/ (a) The eigenvectors for 4 = 0 always span the nullspace X # 0 span the column space if there are r independent multiplicity = geometric multiplicity for each nonzero i. (b) The eigenvectors for eigenvectors: then algebraic Problem Set 6.3, page 279 uy =e! lal u, =e! ei If u(0) = (5, —2), then u(t) = 3e% A +- Qe! E | z(t) = —2e’; then dy/dt = 4y—6e! with y(0) =5 gives y(t) = 3e* +2e! as in Problem 1. 1? = E | |» Then 4 = 4(5+ V41). / E al has Ay = 5, x1 = (2,1), r(t) = 20e% + 10e, x2 = (1,2); Ag =2, wo) — 10%! + 20e2!. The ratio of rabbits to wolves approaches 2 to 1. When A is skew-symmetric, Q = e4! is an orthogonal matrix and Je" u(0) |}= ||u(0)||. 0 []=t[] ALi} oie [te] [i] a 1 i 10 1 wl 4 Bye |eil 0 4 12 Up = A7!b; Up = 4andu(t) = ce2! +4: Up = B and u(t) = cye [| rene HEE a 13 Substituting u = ev gives ce“v = Ae“v —e“b or (A—cl)v =b or particular solution. 17 The solution at time t +7 is also e4¢+7)u(0). 18 1 1 fe eit? 1s) ee wea Aral lela zi is ah s5| as 20 e (a) The inverse of eA! is eA! v= (A— cl) A equals e ACQt+T) Thus e4! times e4! (b) If Ax = Ax then ely = eMx and e !b= 476 Solutions to Selected Exercises 24 x(t) = e*! and y(t) = —e*! is a growing solution. is The matrix for the exchanged system , ‘i and it does have the same eigenvalues as the original matrix. Problem Set 6.4, page 290 O=F| Ze te (2-20 =i -1 -2 2 (a) E Al (b) The pivots have the same signs as the A’s (c) trace = 1,+A2 = 2, so A can’t have two negative eigenvalues. IfA? = 0 then all 2° = 0 so all A= OA} gl = 0 gives Oma A= 0 as inA =|0 A = 0 and therefore al If A is symmetric then A = 0. If A is complex then d is also an eigenvalue and A+ third eigenvalue must be real. is real. The trace is real so the 10 If x is not real then A = xl Ax/x'x is not necessarily real. 0 12 4 =ib and -ib} A=|-3 0 3 0 0 4] —4 0 has det(A—AJ) = —A2—25A =O and A=0, Si, —Si. 13 Skew-symmetric and orthogonal; 4 =i, i, —i, —i to have trace zero. 15 (a) AZ = dy, Aly SO ATA are > 0 to (On OS— 1) ATAz = Alay) = (b) The eigenvalues a ORO) of B are —1, —1, ea — (ONION): 16 The eigenvalues of B are 0, V2, —J/2 with x; = a Aly = iz. The eigenvalues 1, 1 with x; = (1,—1,0), x2 = of (1,0, —1,0), X3 I| (1,1, V2), Ly) Te 17 (a) [xj x2] is an orthogonal matrix so P, + P2 = xixe + xzxt = [x1 x || 2 (D) PE eae (xl1 xo)xJ2 ==). 18 y is in the nullspace of A and x is in the column space. But A = A! has column space = row space, and this is perpendicular to the nullspace. If Ax = Ax and Ay = fy then (A — BI)x = (A — B)x and (A — B1)y = O and again x L y. Solutions to Selected Exercises 21 ale? (a) False. =, ‘| (c) True. A= OAQ7! and A~! = QA-!97! 16 ial is also symmetric ik il elle fi ot (die Ealsestor2At—s| 22 ile ae 13 (b) False: E AliF sek 477 If A’ = —A then ATA = AAT = —A?. If A is orthogonal then ATA = AAT = J. AN = be ;|is normal only if a =d. 23 A and A! have the same i’s but the order of the x’s can change. A = ee Alhas 4 =i and —i with x; = (l,i) for A but x; = (1, —i) for A. 25 Symmetric matrix when b = 1; repeated eigenvalue when b = —1. Problem Set 6.5, page 302 ait a f a olf) 9s { Wott 0 27 |tee al Solel ale: Real: ‘ft age wel ie s)-E if .*s][o 7) y =—1. x? + dxy + 3y? = (x + 2y)? — y? is negative at x =2, xT AT Ax = (Ax)! (Ax) = 0 only if Ax = 0. Since A has independent columns this only happens when x = 0. 4-4 A= 8 |-4 4 8 -8 determinant -—8 | has only one pivot = 4, rank A = 1, eigenvalues are 24, 0, 0, 16 = 0. 
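A numerical check of the spectral theorem behind these answers (the matrix is an illustrative symmetric choice, not one of the exercise matrices):

% Symmetric A: real eigenvalues and orthonormal eigenvectors, A = Q*Lambda*Q'
A = [2 -1 0; -1 2 -1; 0 -1 2];
[Q, Lambda] = eig(A);
diag(Lambda)'                    % real eigenvalues 2-sqrt(2), 2, 2+sqrt(2)
norm(Q'*Q - eye(3))              % ~ 0: the eigenvectors are orthonormal
norm(A - Q*Lambda*Q')            % ~ 0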
VAri==)25 5 (A> |= 65 Ac| = 308 the pivots, aree2/ 1516/25, 30/6: A is positive definite for c > 1; B is never positive definite (determinants d — 4 and —4d + 12 are never both positive). If aj, were smaller than all the eigenvalues, A —aj,,/ would have positive eigenvalues (so But it has a zero in the (1, 1) position. positive definite). 19 If Ax = Ax then x! Ax = Ax!x. If A is positive definite this leads to 4 = x Ax/xTx > 0 (ratio of positive numbers). 22 Sy tii R= pli Sil [0/9 spell IAD il st levee me [9 2 2g Bed 2a Ree pee Lae bn lesen ee alec =1 23 Ay = 1/a? and Az = 1/b* so a = 1//Aq and b = 1/./Ap. The ellipse 9x* + l6y* has axes with the half-lengths a = 5 and b= i 27 ax? + 2bxy + cy? = a(x + By)? + SFy?. 478 Solutions to Selected Exercises 28 det Ar 10: 2 andi ach (Cosi —SinG) eee asin. 6, cos 6); the A’s are positive. 2 29 Ay = ‘e 5x7 +y=0; Z| is positive definite if x #0; f) = (5x? + y)? = 0 on the curve Ar= lt ‘|is indefinite and (0, 1) is a saddle point. 31 If c > 9 the graph of z is a bowl, if c <9 the graph has a saddle point. When c = 9 the graph of z= (2x + 3y)” is a trough staying at zero on the line 2x + 3y = 0. Problem Set 6.6, page 310 ieee (MN)~'A(MN) Eight families so ifB is similar to A and C is similar to B, then A is similar to C. of similar matrices: 6 matrices have 4 = 0, 1; 3 matrices have 4 = 1, 1 and 3 have A = 0, 0 (two families each!); one has X = 1, —1; one has 4 = 2, 0; two have a7 (4/5). (a) (M~!AM)(M7~!x) M~!AM 10 = have the same pose year 0 M~!(Ax) = M!0=0 (b) The nullspaces of A and of dimension. ae oe cl|0 c¢ et Qe yas ee On Ace me aCe’ 3a jel lm CO tke } all Olweetee wy Fe aleeeee lt amma Ores he x0) —-l_ el ae 13 Choose M; = reverse diagonal matrix to get M eee M; = mM} in each block; Mo has those blocks M; on its block diagonal; (MMoM")-!A(MMoM!). 15 then Al = (M~1yTyT yt = (M-!)TM,)'JMoM™ _ A? is similar to A. det(M—!AM — 21) = det(M~!AM — M—!AIM) = det(M~!(A — AI) M) = det(A — AJ). 17 (a) True: One has 4 = 0, the other doesn’t trix and A is symmetric (c) False: a LO All eigenvalues of A+ / are increased by 1. (b) False. Diagonalize a nonsymmetric ma- | and G in are similar 1 (d) True: 19 (b) AB has all the same eigenvalues as BA plus n — m zeros. Problem Set 6.7, page 318 — ATA= 2 20 #4 has oe =, 80 Oi] = eval 4/17 uy = bes for the column space, vy = bee for the nullspace, vz = ERs 7 “1/ el v7) — for the row space, uz = Lees for the left nullspace. Solutions to Selected 1 2 es. ; oo) : Hie fv? - Exercises 7 479 1/./2 / iF 4 has of = 3 with uj = lial and of = 1 with u2 = bea co ATA=]1 i 1/V6 iO 1] has o? =3 with »,=| 2/V6|, 2 1/J2 02 =1 with» =| 0 |; and 1/V3 v3 = |-1/V3 1/V3 a 1 0 [ I f= [a u2 | v3 | [m v2 3] 1 0 Are io my 11 Multiply uy! using columns (of U) times rows (of DV‘). Suppose the SVD of R is R= UXV!. Then multiply by Q. So the SVD of this A is (QU)=xV!. 15 (a) If A changes to 4A, multiply © by 4. (b) AT = VETUT. And if Aq! exists, it is square and equal to (VT)! p-ly-!, Problem Set 7.1, page 325 oS (a) S(T(v)) =v Choose (b) S(T (v1) + T(v2)) = S(T (v1) + S(T (02)). v = (1,1) and w = (—1,0). Then T(v) + T(w) = v+w (a) T(T(v)) =v (b) T(T(v)) = v + (2, 2) (d) T(T(v)) = T(). (a) T(1,0) =0 Th(Oi) (b) (0,0, 1) is not in the range 2) A is not invertible. (22) sen AM =/ (Cc) RON 10% (GeD) — le 1) +522, 0) then T(v) = b(2, 2)+(0, 0). is impossible. No matrix A gives A i ‘|= but T(v+ w) = (0,0). (c) T(T(v)) =—-v A hs es = E a E Ar To professors: The matrix space has dimen- sion 4. Linear transformations come from 4 by 4 matrices. 
Those in Problems special. (a) True (b) True (c) True 13-15 were (d) False. T(1) =0 but M = i s|= T(M); these fill the range. M = b | in the kernel. T(T~!(M)) = M so T~!(M) = A~!MB-!. (a) Horizontal lines stay horizontal, vertical lines stay vertical a line (c) Vertical lines stay vertical. (a) ad —bc=0 (b) ad —bc > 0 (b) House squashes onto (c) jad — be| = 1. If two independent vectors are transformed to themselves then by linearity T = /. This emphasizes that circles are transformed to ellipses. See also Figure 7.7. 480 Solutions to Selected Exercises Problem Set 7.2, page 337 LI S0 = S09) — OF S03) — 20S = 0p eB — CNS SS) &) SS) (ea) Qo) QQ So SSO) 4 Third derivative has 6 in the (1,4) position; fourth derivative of cubic is zero. See As Ql ean Oc @ i al () al 6 Tv +02 + 03) =2w, + w2 + 23; A times (1,1, 1) gives (2, 1, 2). 7 v = c(v2 — v3) gives T(v) = 0; nullspace (0, c, —c). 9 We don’t know 7(w) unless is (0O,c, —c); the w’s are the same solutions as the v’s. are (1,0,0) + any In that case the matrix is A? 12 A+ = 1° @O |=1 1 0 -l ¢ 0 | so T~!(w}) = i) = Ode 1 the only solution to T(v) = 0 is 15 (a) E | T~!(w2) — 1b) = DA v= 0. (b) E i = inverse of (a) —| 16 (a) M= E i T~!(w3) —aae (b) N= E “| —(c) alé| must be 2A bl (c) ad = be. pwr=[! JE J=[2 3] 21 w(x) =1—x?; w3(x) = A(x? —x); 0 YA T'S OY OS || 5 1 0 QO —.5 -l 5) ]. y=4uw, +5w2 + 6w3. v’s to w’s: inverse matrix 25 a2 = 11241 + 12242 gives az as a combination Swe 26 Row 2 of A is /9;(rowl invertible. 30 T(x, y) =, 30 fe 2(@-—a) sin2(9—a@) | 1 1 = | 1 0 0 1 —-l 1 of the qg’s. So the change of basis matrix of U) + /o2(row2 of U). The change of basis matrix is always —y) and then S(x, —y) = (—x, —y). Thus —sin2(0—a) cos2(6 — al rotates by 2(0 — a). 33 False, because the v’s might not be linearly independent. ST = —I. Solutions to Selected Exercises 481 Problem Set 7.3, page 345 The inverse matrix is W~! = v= 1 1 1 i 4 4 4 4 1 Lee : : iEN Geel a 4 a0 OF Os ae 1 aaa) |. Then e = wy ar iw a 5w3 and 1 wWw3+4+ W4. The eight vectors are (1,1, 1, 1,1, 1,1, 1) and the long wavelet and two medium wavelets Cr —1, —1,0,0,0,0) each time. The Hadamard H/4. and (0,0, 0,0, 1, 1,—1, —1) and four short wavelets with 1, —1 shifting matrix H has orthogonal columns of length 2. So the inverse is H'/4= The transpose of WW-!=1 is (w-lyTwt = 1. So the matrix W! (which has the w’s in its rows) is the inverse to the matrix that has the w*’s in its columns. Problem Set 7.4, page 353 10 A Tae ey 20 aa has A == 50 and 0, | 2 n=x, vil! [5], v2 Ee 5 [ih o, == V50. By il ‘4 has 4 == 50 and 0, a= le 1 u=F5{5| AA [s= Ag Ole + _ : [oyu 13 Ts Ke | Te B 1//50 vl 0 +07U205. Aft (5 O70]V andi At Sev 36 6.48) «(OO 48 .64 0 0 0 0 17 (a) A'A is singular 18 x (b) ATAxt 2 “12, 180: Pl S216) = ATb 4 + _ 203||P oe A is singular. ha: oy a In general this is ojuyv, +++ +o,u;v). 0 ATA=| 3 |i]. Zak H is semidefinite because 0 [arsS ay jE a 2 oY = |) al ANE i v cata} 4 = oud) v2 ur oe + aay ae: 0 (c) Ud - AAT) in the row space of A is perpendicular to ¥—xt projects onto N(A!). in the nullspace of A! A = nullspace of A. The right triangle has c? =a* +b’. 20 At =3[.6 8]=[.12 .16] and 36.48 AtA =[1] and aat =| | 48.64 482 Solutions to Selected Exercises 21 L is determined by one number below the diagonal. Unit eigenvectors in S are each determined by one number. The counts are 1+3 for LU, 1+2+1 for LDU, 1+3 for 23 OR, 1-23-1 for USV 1. 142 Mifor SASS" a Ip 1 b Olfa (LU -T)(uUT DU ) = (triangular)(symmetric). P a#0. Change needs dls eal E ‘| er I haven’t found a good application yet. 
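The pseudoinverse in Problem 7.4.20 above comes straight from the SVD. A sketch, assuming that exercise uses the 1 by 2 matrix A = [3 4] (the printed numbers .12, .16 and .36, .48, .64 are consistent with that reading):

% A+ = V * Sigma+ * U': invert the nonzero singular values, transpose the shape
A = [3 4];
[U, S, V] = svd(A);              % one singular value, sigma_1 = 5
Splus = S';                      % right shape (2 by 1), then invert sigma_1
Splus(1,1) = 1/S(1,1);
Aplus = V * Splus * U'           % [0.12; 0.16] = A'/25
Aplus * A                        % [.36 .48; .48 .64], projection onto the row space
norm(Aplus - pinv(A))            % ~ 0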
Problem Set 8.1, page 366 —1 |-1 0 A= Aly =0 1 0 OO - 14; -1 1 for c nullspace contains | c c y = (1,—1,1); current = 1 |}; | O | is not orthogonal to that nullspace. 0 1 along edge 1, edge 3, back on edge 2 (full loop). Kirchhoff’s Current f =(,0,0); nN Law 2 |-1 —1 -1 2 -l -1 —1, 0 and currents —Ax AlAx = Al f is solvable 2) 3 exe =" | — 35 0 2 1 “I A! y = for f = (1,—1,0) and not solvable for f must be orthogonal to (1, 1, 1) in the nullspace. -1l -2 S28 —2 4 —CAx = ie 3, i 2 1 1a 0 ff sends 3 units into node 3 |—1 —2 A= 2, il j and currents = 2, 1, —1; fe producessx7— |e —s 1 On| —1 myleldsixt—s G aie I potentials: 1e G 1 and out from node 2. 5/4 elem 7/8 Cc ||-|> een GC empotentials Elimination on Ax = b always leads to y'b = 0 which is —b, + by — b3 = 0 and b3 — b4 + bs = 0 (y’s from Problem 8 in the left nullspace). This is Kirchhoff’s Voltage Law around the loops. 2-1 11 en 0 13 elABOG es =! -1 0 -1 diagonal entry = number of edges into the node -l1 3 -1 off-diagonal entry = —1 -—1 =! 3 Ter 4 —2 —2 8 Pipa OR Sar 2, if nodes are connected. —2 0 +3 =3 eels 3 mG x4 = 0 and solved 3 equations); ae2 1 0 : ¢ veal 9 | gives potentials x ==o (75, 6, 6-0) (grounded =| y=—CAx = G. , 0, 5 5). 17 (a) 8 independent columns (b) f must be orthogonal to the nullspace so fy) +---+ oO (c) Each edge goes into 2 nodes, 12 edges make diagonal entries sum to 24. Solutions to Selected Exercises 483 Problem Set 8.2, page 373 N ae He pe mt As) feat(oma " Ge cePrEoncnes A= 1 and.8, 6 iO mille ONL x =(1,0); ae eel eelv6 eee din Dee aE x =( 25 9) A= 1 and —8, h II — fl ro)=) Q. Al tad II — Sy Wo] | WIR must be The steady state is (0,0, 1) = all dead. If Ax = Ax, add components cua ()h on both ; CS) k —> 0) gives A~k > CO.A~- my 6+ 4a A=a [404 .6—.6a eens; M2 is still nonnegative; [1 --- 1]M=[1 1] so multiply by M to find [1 --X= 1]M?=[1 1 and a+d—1 --- 8 pd= le |and 130 a If A £1 the sum with —2 <a <1. from the trace; steady state is a multiple of x; = (b, 1 —a). the solution approaches 1S --- s = As. 1]= columns of M? add to 1. B has A = 0 and —.5 with x; =(.3, Wheseisenvecton sides to find .2) and xp = (—1,1); e7 approaches zero and cje™! 0,x) = cx}. — (le le )rand Ane ia Mi 5 19 Xk = 1 (Markov), 0 (singular), (COM Oe 9). il : 4 has no inverse. .2 (from trace). Steady state (.3, .3,.4) and (30, 30, 40). 20 No, A has an eigenvalue 4 = 1 and (J — A)~! does not exist. Problem Set 8.3, page 381 —_ Feasible set = line segment from (6,0) to (0,3); minimum cost at (6,0), maximum at (0, 3). Feasible set is 4-sided with corners (0,0), (6,0), (2,2), (0,6). Minimize 2x — y at (6,0). 3 Only two corners (4, 0,0) and (0, 2,0); choose x; very negative, x7 = 0, and x3 = x; —4. From (0,0,2) move to x = (0, 1, 1.5) with the constraint xj +x2 +2x3 = 4. The new cost is 3(1) + 8(1.5) = $15 so r = —1 is the reduced cost. The simplex method also checks x = (1,0, 1.5) with cost 5(1) + 8(1.5) = $17 so r = 1 (more expensive). Cost = 20 at start (4,0, 0); keeping x; + x2 + 2x3 =4 move to (3, 1,0) with cost 18 and r = —2; or move to (2,0, 1) with cost 17 and r = —3. Choose x3 as entering variable and move to (0,0, 2) with cost 14. Another step to reach (0,4,0) with minimum cost 12. c=[3 5 7] has minimum cost 12 by the Ph.D. since x = (4,0,0) is minimizing. dual problem maximizes 4y subject to y < 3, y <5, y < 7. Maximum = 12. The 484 Solutions to Selected Exercises Problem Set 8.4, page 387 — Hex COSCO da = denominator notice ae on [ae | j —k #0). 
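The pattern in this problem set (incidence matrix A, Kirchhoff's current law A'y = 0, and the grounded system A'CAx = f) can be run as a tiny example. The triangle below is a hypothetical graph with edge orientations chosen here for illustration, so signs may differ from the exercise figures:

% Triangle graph: edge 1 goes 1->2, edge 2 goes 2->3, edge 3 goes 1->3
A = [-1  1  0;
      0 -1  1;
     -1  0  1];
null(A, 'r')                     % (1,1,1): equal potentials, nothing flows
y = [1; 1; -1];                  % unit flow around the loop (against edge 3)
A' * y                           % zero: the current law holds at every node
C = eye(3);  f = [1; -1; 0];     % unit conductances, source in at 1, out at 2
K = A' * C * A;                  % the graph Laplacian, singular until grounded
x = [K(1:2,1:2) \ f(1:2); 0]     % ground node 3: potentials (1/3, -1/3, 0)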
= 0 and similarly Ga cos(j — k)x dx = O (in the If 7 =k then ee cos” [pe Cie = a [SO —cx)dx = 0 and eae — 5)(x3 —cx)dx = 0 for all c (integral of an odd function). Choose c so that ihe x(x3 —cx)dx = [$x _ alae a F _ c§ a) een) pes c=%. The integrals lead to aj = 0, b) =4/n, bo =0. From equation (8) the a, are zero and by = 4/mk. The square wave Then equation (6) is 27 = m(16/1°)(s5 + a + a +.---) has Wilke ee so this infinite series equals nr /8. = V2; |v]? =1+a2+a4+---=1/(1—a’) so ol? =1+ 5+ 4+ gt --=2 80 lo] + sin?x lv =1/V1—a2; (271 + 2sin x)dx =2n +047 80 lIfll = V3x. @) 7G) = 5 +35 (square wave) U YS 5 cox so a’s are 5, OOF (b) ap = (C7 eax 2a =i 11 COS D — eee 5 + 5 Cos 2x;: 13 dy/dx =cosx has ands OUNei aja Ose cos(x + Toxo 3) = COS x COs io 3 y=yp+yn ere snare 2/77. Ose 2/370. pe—s—Ke exo xsin cot,7 =eal5 COSxX sin V3 : 7 Sinx. =sinx +C. Problem Set 8.5, page 392 = (x, y, z) has homogeneous coordinates (x, y, z, 1) and also (cx, cy, cz, c) for any nonzero c. 1 1 1 hy = 1 1 ese Tal es 1 O32. C pS 1 1 1 = 255 al i © 1 & c eae ) ST = c i is translation along (1, 6, 8). i a ‘ eee , TS= c she : al Ca c AC 1/8.5 Sr 1/11 for a 1 by 1 square. 1 n= (%,%, 3)has ||m] =1 and P=I—nnT=})-4 5 -4 -2 5 -2 mIOCumL , use oT. Solutions to Selected Exercises 10 Ch : oose (0,0, 3) on the plane and multiply a ee i sy T_ PT, = He ae esa Gy 11 485 Gr ese 9 (3,3,3) projects to 3(—1, —1,4) and (3, 3,3, 1) projects to (4, 4, 3, 1). 13 The projection of a cube is a hexagon. 14 1S 1 -—8 —4 —4 -4 7 (3,3, 3) — 2nn™) = (3,3,5)|-8 (3, 3,3, 1) > 3,3,0,1) > 1 -4/=(¥,-4,-}). (-4,-4,-§,) > 3,-4.4,. 17 Rescaled by 1/c because (x, y, z,c) is the same point as (x/c, y/c, z/c, 1). Problem Set 9.1, page 402 Without exchange, pivots .001 and 1000; with exchange, pivots | and —1. When the pivot is larger than the entries below it, /;; = entry/pivot has |/j;|<1. Ae — 1 ele 1 een 11/16 Sale 47/60 ee 1.833 OSS 0.783 COmMparcawithieAu| 0 mo —=3.6 1 0 —1 A=| 1.80 =i |} LO) 0.78 1 1 l 1 —l i |. |avall| <= oY! but ||Ax|| > 6. The largest ||x|| = Ab] is 1/Amin; the largest error is 107 197A are Each row of U has at most w entries. Then w multiplications to substitute components of x (already known from below) and divide by the pivot. Total for n rows is less than wn. L, U, and R need sn? multiplications to solve a linear system. Q needs n? to multiply the right side by o7} = Q!. So QR takes 1.5 times longer than LU On column 5ie multiplications (only the j by j upper j of 7, back substitution needs to reach x. left block is involved). Then 5(1? +2? +--- +n?) © 5(4n°). 10 With 16-digit floating point arithmetic the errors (Ome 10n" 10- a 11 cos 6 = 1/10, ||x — Ycomputed|| care of orderlO "9, 10° >), 10g sae ree ite sin@ = —3//10, R= Fe Osan ley He for ¢ = 12>. 10%, Ome, ra ace s\-#l 0 | 14 Q;;A uses 4n multiplications (2 for each entry in rows i and 7). By factoring out cosé@, the entries 1 and +tan@ need only 2n multiplications, which leads to zn3 for OR. 486 Solutions to Selected Exercises Problem Set 9.2, page 408 All =2, c=2/.5=4; 5,83: |All =3, c= 3/1 =3; All =24+ V2, c= 24+ V2)/2-V2) = For the first inequality replace x by Bx in ||Ax|| < ||All||x||; the second inequality is just |Bx|| < ||Bllllx||. Then ||AB] = max(||ABx||/|)x|) < AMI BI. The triangle inequality gives ||Ax + Bx|| < ||Ax|| + ||Bx||. Divide by ||x|| and take the maximum over all nonzero vectors to find ||A + B|| < ||A|| + ||Bll. If Ax = Ax then ||Ax||/||x|| = |A| for that particular vector x. ratio over all vectors we get ||A|| > |A|. 
When we maximize 13 The residual b— Ay = oes 0) is much smaller than b — Az = (.0013, .0016). much closer to the solution than y. 2 14 devAu=" c> 0m 26 i “soiA = 659,000 —563,000 Bowes 780,000 | Then ||Aj| > 1, the But z is |JA7!|] > 10°, 10°. 16 a Ss = ss is not smaller than max(x?) and not larger than we +e+-4+ x + 2|x||xo| + IIx 12. Certainly ee +.-- see <n max(x?) so |x| < Jn||xlloo. (sign x1, Signx2,..., Signx,) to get x - y = ||x||]. By Schwarz /n\||x||. Choose x = (1, 1,..., 1) for maximum Choose this is at most y = ||x|]||_y|| = ratios ./n. Problem Set 9.3, page 417 If Ax = Ax then (J — A)x = (1 —A)x. Real eigenvalues of B= J — A have |1 —A| < 1 provided A is between O and 2. Jacobi has S~!T= 3ak Gauss-Seidel | with’ |X |max = i. 0 has S~!T = : with |Almax = 5 = ({Almax for Jacobi)”. Wire Ol Set the trace 2 — 2w + ior equal to (w — 1) + (w — 1) to find Wopt = 4(2 — J3) = 1.07. The eigenvalues w — 1 are about .07. 15 The jth component of Ax, is 2sin aa = sin We using sin(a+b) = sinacosb+cosasinb, i 253°) iS" ah = A =) R=Q'A= g — sin are The last two terms, combine into —2 sin a cos =. The eigenvalue 2cOS ue 517 zl : 1 a 5 14 | gives wo =| | m= 47], w= 3/3). ere 1 cos@siné i — sin? 0 d Ay =RO= re 2 coso(1+sin*@) —sin36 — sin? 0 —cosé@ sin? 6 1 i Solutions to Selected Exercises 20 If A—cI = QR then Ay eigenvalues from A to Aj. 21 Multiply Agj = b;-14;-1 are orthonormal). tridiagonal. = RQ4+cI = Q-'(OR+4+c1)O = Q7!AQ. 487 No change in + 4j9; + Djaj41 by q} to find q} Adj = aj; (because the q’s The matrix form (multiplying by columns) Its entries are the a’s and b’s. is AQ 23 If A is symmetric then A; = Q-'AQ = o'AO is also symmetric. Aj = QT where T is = RQ = R(QR)R~! = RAR! has R and R7! upper triangular, so A; cannot have nonzeros on a lower diagonal than A. If A is tridiagonal and symmetric then (by using symmetry for the upper part of A,) the matrix A; = RAR7! is also tridiagonal. Problem Set 10.1, page 427 In polar form these are /5e!?, 5e7/?, Fe”, ai5) Izxwl=6, lz+wl<5, kz/wl=%, lz-wl <5. a+ib= 92 41 ba 3) i, 2b4 Br wl a1, 2+i; 2+i)U+i) = 143i; eM? =-i; eI™ =-1, z+z is real; GE =i; 18 = (3 =i. z—Z is pure imaginary; zz is positive; z/z has absolute value 1. (a) When a = b = d = 1 the square root becomes /4c; A is complex if c < 0 (b) A=O and 1 =a+d when ad = bc (c) the A’s can be real and different. 13 Complex ’s when (a + d)* < 4(ad — bc); write (a+ d)? — 4(ad — bc) as (a — d)? + 4bc which is positive when bc > 0. 14 det(P — AI) = A+ — 1 = 0 has A = 1, —1, i, —i with eigenvectors (1, —1, 1, -1) and (1, i, —1, —i) and (1, —i, —1,7) = columns (1,1,1,1) and of Fourier matrix. 16 The block matrix has real eigenvalues; so iA is real and A is pure imaginary. 18 7a— bean cle 5 — 6; multiply by e'? to get elt)? — |. 21 cos 39 = Re(cos 6 + i sin 0) = cos? 6 — 3cos6@ sin” 6; sin3¢@ = Im(cos 6 + isin 6)3 = 3 cos? 6 sin@ — sin? 6. 23 (a) e! is at angle 6 = 1 on the unit circle; |i®| = 1° = 1 candidates i¢ = ef ("/2+27N)e 24 (a) Unit circle (b) Spiral in to eon (c) There are infinitely many (c) Circle continuing around to angle 6 = Qn. Problem Set 10.2, page 436 z= multiple of (1+7,1+i,—-2); to all columns Az= 0 gives zH 4H = 04 50 z (not z!) is orthogonal of AM (using complex inner product z4 times column). The four fundamental subspaces are R(A), N(A), R(A¥), N(A#), 488 — Solutions to Selected Exercises 5 (a) (AH A)H = AH AHH — AHA again is | Az||? = (0 so Az =0. when N(A) = {0}. (a) False: On A = E =_(b) If AH Az = O then (gHAH)(Az) = 0. 
This The nullspaces of A and AHA (b) True: 0 are the same. 2+ D> Beta 0 5) 5. Aa = 24 5P t4P is invertible —i is not an eigenvalue if A = AH 10 (1,1,1), (, e274/3, 4771/3), 1, 47/3, 6271/3) are orthogonal because P is an orthogonal matrix—and therefore unitary. 11 G=|4 AA ma tie® 5 Ee (complex (c) False. inner product!) aa ee 5e4ti/3 + 4871/3, 13 The determinant is the product of the eigenvalues (all real). 15 18 ee if —1+i D 0 1 1-1 rs Ber 1 0 —1|3\—-1-i ‘ae peta teas STE tL ee eS NOn aU Vesa ils WW ce Oa tceae = aie Oe er he i. vi gives real A, trace zero gives with L2 = 6 + 2/3 has A = 1, —1. 19 The v’s are columns of a unitary matrix U. Then z = uuHz = (multiply by columns) = v, (viz) ae aie vn (viz). 20 Don’t multiply e~!* times e!*; conjugate the first, then ee elk dy = aan Po ts = 0. 22 R+iS =(R+iS)# = RT —-iST; 24 (ealancd 27 lal ; A) aerativanlCacal: a b+ic fee d a Ww el PZ } E ; iva with |w|* D, ee + |z|~ = 1. Unitary means UHU = 1 or (A? —iB™)(A+iB) =(A'A+ B'B) +i(A'B — BTA) =1. Then ATA + B'B =I 30 = R is symmetric but S is skew-symmetric. _fi-i alee and A'B—B!A 1-i]f1 0}, f242a° 5) Ifo ree =O which makes the block matrix orthogonal. =27_ & 5 |= 8A Problem Set 10.3, page 444 ee e —> 1 0 | 0 ah 1 1 0 0 zi 2 0 0 0 (1,1,1,1,0,0,0,0) The second vector becomes as + 2 0 2 0 2 0 1 20) 1 ay (4,0, 0, 0, 0, 0, 0,0) + (0, 0,0, 0, 1,1, 1,1) ~ 0 0 1 | Bs. 0 0 2 0 a 2 0 —2 0 (4, 0,0, 0,4, 0,0, 0) which (0,0, 0, 0, 4,0, 0,0) > (4, 0, 0, 0, —4, 0, 0, 0). If wt = 1 then w? is a 32nd root of 1 and Jw is a 128th root of 1. is Fc. Solutions to Selected Exercises 13 Cl 1C() at Clete CO) =p Goan Oi Cy. 14 con—tCQ) 1-16 Eigenvalues e} = 2—1—1=0, e4 =2—i3—i? =2. 15 Diagonal lect ci? + c3i°; ep =2-i-i> =2, 489 E contains the four eigenvalues e =2-(-1)—-(-) =4, Check trace 0+24+442=8. E needs n multiplications, Fourier matrix F and F-! need 5n logy n multiphi- 16 (cor 2) 4 (ct Fes); then (cy —¢9) 4- 1(cy — c3), them (ep +o) (co — C2) —i(cy — c3). These steps are the FFT! — (ef +°e3), then cations each by the FFT. Total much less than the ordinary n?. A FINAL EXAM This was the final exam on May 18, 1998 in MIT’s linear algebra course 18.06 1. If A is a 5 by 4 matrix with linearly independent columns, find these explicitly: (a) The nullspace of A. (b) The dimension of the left nullspace N (At), (c) One particular solution x, to Ax, = column 2 of A. (d) The general (complete) solution to Ax = column 2 of A. (e) The reduced row echelon form R of A. 2. (a) (b) 3. (a) Find the general (complete) solution to this equation Ax = b: |anes aes 2 il i 2 ey || == || 2 De cDiwnd 4 Find a basis for the column space of the 3 by 9 block matrix [A 2A The command A? ]. N = null (A) produces a matrix whose columns are a basis for the nullspace of A. What matrix (describe its properties) is then pro- duced by B = null (N’)? Notice N’ and not A’. (b) What are the shapes (how many rows and columns) of those matrices N and B, if A is m by n of rank r? 4. Find the determinants of these three matrices: 05.050 0 S&S \| 3 re hOC © Oo -2 Won 5. If possible construct 3 by 3 matrices (a) =) L—~ =) @ II 4- > ~ > ~ ae! ~~ > poy SS (ees A, B, C, D with these properties: A is a symmetric matrix. Its row space is spanned by the vector (1, 1, 2) and its column space is spanned by the vector (2, 2, 4). 490 A Final Exam (b) All three of these equations have no solution but B + 0: 1 0 Se == || Il 0 Bia 0 (c) (d) 491 Boxes 0 e0) | C is a real square matrix but its eigenvalues are not all real. 
The vector (1, 1,1) is in the row space of D but the vector (1, —1,0) is not in the nullspace. . Suppose Uv), U2, Uz iS an orthonormal basis for R?. basis for R? and vj), v2 is an orthonormal (a) What is the rank, what are all vectors in the column space, and what is a basis for the nullspace for the matrix B = w;(v; + v2)!? (b) Suppose A = ujv ote U2V5. Multiply AA‘ and simplify. Show that this is a projection matrix by checking the required properties. (c) Multiply ATA and simplify. This is the identity matrix! example compute A! Av; and then finish the reasoning). - (a) Prove this (for If these three points happen to lie on a line y = C + Dt, what system Ax = b of three equations in two unknowns x = (C, D) would be solvable? y= Oat t= —1, Vis lat ti 0; Vs Brat aa Which value of B puts the vector b = (0, 1, B) into the column space of A? (b) For every B find the numbers C and D that give the best straight line y = C + Dt (closest to the three points in the least squares sense). (c) Find the projection of b onto the column space of A. (d) If you apply the Gram-Schmidt procedure to this matrix A, what is the resulting matrix Q that has orthonormal columns? » (a) Find a complete set of eigenvalues and eigenvectors for the matrix ee Pie | Lew? h\= fo Re — (b) Circle all the properties of this matrix A: A is a projection matrix A has determinant larger than trace A is a positive definite matrix A has three orthonormal eigenvectors A is a Markov matrix A can be factored into A = LU 4 2 : P é Write up = [0] as a combination of eigenvectors of A, and compute w199 = 1 Al0 (0)Uo. MATRIX FACTORIZATIONS were oe triangular 4 (upper triangular U tion 2.6 ; on the Paemety Section 1’s on the diagonal / \ pivots Requirements: No row exchanges as Gaussian elimination reduces A to U. A=LDU= ” ~ lower triangular L pivot matrix \1’s on the diagonal / \ D is diagonal) upper triangular U \1’s on the diagonal Requirements: No row exchanges. The pivots in D are divided out to leave 1’s in U. If A is symmetric then U is L™’ and A = LDL". Section 2.6 and 2.7 PA = LU (permutation matrix P to avoid zeros in the pivot positions). Requirements: A is invertible. Then P, L, U are invertible. exchanges in advance. Alternative: A = L; P,U,. Section 2.7 EA = R (m by P does the row invertible E) (any A) = rref(A). Requirements: None! The reduced row echelon form R has r pivot rows and pivot columns. The only nonzero in a pivot column is the unit pivot. The last m — r rows of E are a basis for the left nullspace of A, and the first r columns of E~! are a basis for the column space of A. Sections 3.2—3.3. A = CC? = (lower triangular matrix C) (transpose is upper triangular) Requirements: A is symmetric and positive definite (all n pivots in D are positive). This Cholesky factorization has C = LVD. Section 6.5 A = QR = (orthonormal columns in Q) (upper triangular R) Requirements: A has independent columns. Those are orthogonalized in Q by the Gram-Schmidt process. If A is square then Q~' = Q?. Section 4.4 A=SAS7!= (eigenvectors in S)(eigenvalues in A)(left eigenvectors in S$ 1). Requirements: A must have n linearly independent eigenvectors. Section 6.2 A=QAQ! =(orthogonal matrix Q)(real eigenvalue matrix A)(Q" is Q~!). Requirements: A is symmetric. This is the Spectral Theorem. Section 6.4 A =MJM—! = (generalized eigenvectors in M)(Jordan blocks in J)(M~!). Requirements: A is any square matrix. Jordan form J has a block for each independent eigenvector of A. Each block has one eigenvalue. Section 6.6 10. 
10. A = UΣV^T = (orthogonal U is m by m)(m by n singular value matrix Σ with σ_1, ..., σ_r on its diagonal)(orthogonal V is n by n). Requirements: None. This singular value decomposition (SVD) has the eigenvectors of AA^T in U and of A^T A in V; σ_i = √λ_i(A^T A) = √λ_i(AA^T). Sections 6.7 and 7.4.

11. A^+ = VΣ^+U^T = (orthogonal, n by n)(n by m pseudoinverse of Σ with 1/σ_1, ..., 1/σ_r on its diagonal)(orthogonal, m by m). Requirements: None. The pseudoinverse has A^+ A = projection onto the row space of A and AA^+ = projection onto the column space. The shortest least-squares solution to Ax = b is x̂ = A^+ b. This solves A^T A x̂ = A^T b. Section 7.4.

12. A = QH = (orthogonal matrix Q)(symmetric positive definite matrix H). Requirements: A is invertible. This polar decomposition has H^2 = A^T A. The factor H is semidefinite if A is singular. The reverse polar decomposition A = KQ has K^2 = AA^T. Both have Q = UV^T from the SVD. Section 7.4.

13. A = UΛU^{-1} = (unitary U)(eigenvalue matrix Λ)(U^{-1} which is U^H). Requirements: A is normal: A^H A = AA^H. Its orthonormal (and possibly complex) eigenvectors are the columns of U. Complex λ's unless A = A^H. Section 10.2.

14. A = UTU^{-1} = (unitary U)(triangular T with λ's on the diagonal)(U^{-1} = U^H). Requirements: Schur triangularization of any square A. There is a matrix U with orthonormal columns that makes U^{-1}AU triangular. Section 10.2.

15. F_n = [ I  D ; I  -D ] [ F_{n/2}  0 ; 0  F_{n/2} ] (even-odd permutation) = one step of the FFT. Requirements: F_n = Fourier matrix with entries w^{jk} where w^n = 1. Then F̄_n F_n = nI. D has 1, w, w^2, ... on its diagonal. For n = 2^ℓ the Fast Fourier Transform has (1/2)nℓ multiplications from ℓ stages of D's. Section 10.3.

A short MATLAB check of several of these factorizations, and of the FFT step in 15, follows.
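Here is a minimal MATLAB sketch, using only the built-in commands lu, chol, qr, eig, svd, and pinv on a small example matrix (the matrix chosen is just an illustration, not from the book). It checks factorizations 3, 5, 6, 8, 10, and 11 numerically; every norm printed should be near zero.

    % Check several factorizations on the 3 by 3 matrix with the -1, 2, -1 pattern
    A = [2 -1 0; -1 2 -1; 0 -1 2];       % symmetric positive definite

    [L, U, P] = lu(A);                   % 3: PA = LU (P handles any row exchanges)
    norm(P*A - L*U)

    R = chol(A);                         % 5: MATLAB returns upper triangular R with A = R'*R
    norm(A - R'*R)                       %    (the book writes A = C*C' with C lower triangular)

    [Q, RR] = qr(A);                     % 6: A = QR with orthonormal columns in Q
    norm(A - Q*RR)

    [S, Lam] = eig(A);                   % 8: A = S*Lam*S'; S is orthogonal because A is symmetric
    norm(A - S*Lam*S')

    [U2, Sig, V] = svd(A);               % 10: A = U*Sigma*V'
    norm(A - U2*Sig*V')

    Aplus = pinv(A);                     % 11: pseudoinverse; this A is invertible, so A+ is inv(A)
    norm(Aplus*A - eye(3))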
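And a small numerical check of the FFT factorization in 15, written for n = 4 with the convention w = e^{2πi/n} and entries w^{jk}; again only a sketch with built-in commands, not a Teaching Code.

    % One FFT step: F4 = [I D; I -D] * blkdiag(F2, F2) * (even-odd permutation)
    n = 4;  w = exp(2i*pi/n);            % w is an nth root of 1
    F4 = w.^((0:n-1)' * (0:n-1));        % Fourier matrix, entries w^(j*k) for j,k = 0,...,n-1
    F2 = [1 1; 1 -1];                    % the 2 by 2 Fourier matrix
    D  = diag([1 w]);                    % D has 1, w on its diagonal
    P  = eye(n);  P = P([1 3 2 4], :);   % P*x lists the even-numbered components first, then the odd
    B  = [eye(2) D; eye(2) -D];
    norm(F4 - B*blkdiag(F2, F2)*P)       % should be near zero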
INDEX

Absolute value, 423 Affine, 322, 390 Angle, 14, 15, 197 Area, 234 Arrow, 2, 4 Associative law, 47, 56, 62 Augmented matrix, 49, 52, 69, 110, 131 Axes, 301 Column vector, 5 Column times row, 58, 63, 128 Combination of columns, 44, 56 Commutative law, 47, 56, 62, 102 Commute, 261 Complement (see orthogonal complement) Complete solution, 115, 134 Completing the square, 298 Complex conjugate, 285, 287, 422, 428 Back substitution, 34, 80, 83 Band matrix, 398, 402 Basis, 145, 169 Basis for column space, 146 Block determinant, 228 Block elimination, 228 Block matrix, 57, 76 Block multiplication, 58 Breakdown, 36, 40 Complex eigenvalue, 251, 422 Calculus, 21, 244, 271 C(A) = column space, 106 C(A^T) = row space, 144 Consistent, 131 Cayley-Hamilton, 270 Change of basis, 335, 341 Characteristic equation, 249 Checkerboard matrix, 164 Chess matrix, 164 chol, 87 Cholesky, 305 Circulant matrix, 445 Clock, 8 Closest line, 184, 186, 188, 193 Coefficient matrix, 23, 26 Cofactor, 222, 224 Cofactor matrix, 232 Column at a time, 47 Column picture, 22, 24, 35 Column space, 105, 156 Complex matrix, 64, 430, 432 Complex number, 421, 427 Components, 1 Compression, 342 Computer graphics, 389, 393 Condition number, 406, 408 Cone, 9 Conjugate gradient, 415 Conjugate transpose, 430 Consumption matrix, 372 Coordinates, 5, 341 Corner, 8, 377, 380 Cosine, 15, 19, 176, 383 Cost function, 377 Cost of elimination, 81, 82 Covariance matrix, 194 Cramer's rule, 230 Cross product, 238, 239 Cube, 8 Current, 98, 363, 367 Dependent, 142, 143, 150 Derivative, 271, 329, 331 Determinant of inverse, 212 Determinant of product, 212 Determinant of transpose, 212 Determinant, 67, 94, 207, 249, 251 Diagonalizable matrix, 260 Diagonalization, 258, 260, 313, 347 Diagonally dominant, 420 Difference equation, 263, 266 Differential equation, 271 Dimension, 117, 147, 148, 156 Distance, 180 Distributive law, 56, 102 Dot product, 10, 14, 54, 90 Fibonacci, 262, 267 Final exam, 490, 491 Fitting by a line, 184, 188, 193 Formula for A^{-1}, 232 Formula for determinant, 219, 221 FORTRAN, 6, 16, 28 Forward elimination, 83 Four fundamental subspaces, 155, 158, 316 Four possibilities, 135 Double integral, 237 Four subspaces (figures), 158, 169, 187 Dual basis, 344 Dual problem, 379 Fourier matrix, 344, 434, 439, 441 Fourier series, 198, 384, 386 Echelon matrix, 116 Economics, 90, 372 Eigenvalue, 245, 249 Eigenvalues of A^2, 246 Fredholm, 172 Free columns, 113, 127 Free variables, 36, 115 Full column rank, 123, 133 Eigenvalues of AB, 261 Full row rank, 123, 135 Eigenvector, 245, 249 Function space, 102, 149, 154, 384 Eigenvector basis, 347 Fundamental Theorem, 159, 168, 172, 317 Eigenvector matrix, 258, 259 Galileo, 190 Eigshow, 252, 317, 318 Einstein, 45 Elementary matrix, 46 elim, 122 Gauss-Jordan, 69, 75, 122, 139 Gauss-Seidel, 410, 413, 414 Gaussian elimination, 37 General solution, 136 Elimination matrix, 45, 50, 122, 127 Elimination, 34, 81 Geometric mean, 16, 18 Geometric series, 373, 376 Ellipse, 299, 301, 315 Gershgorin, 420 Empty basis, 150 Givens, 400 Empty matrix, 133 Gram-Schmidt, 200, 201, 202, 204, 416 Engineering, 90 Graph, 357, 360 Entry, 45 Error, 407, 410 Heisenberg, 261, 269 Error vector, 178, 188, 191 Euler, 403, 426 Even permutation, 96, 208 Existence of solution, 105, 135 Hermitian, 430 Hermitian matrix, 432 Hessenberg matrix, 417 Hilbert matrix, 76, 402 Expected value, 193 Hilbert space, 383, 385 Exponential, 271, 272
Exponential matrix, 277, 278 Homogeneous coordinates, 390, 393 Homogeneous solution, 126, 134 Householder, 400 House, 324, 327 Factorizations, 492, 493 Fast Fourier Transform, 439, 441, 443 Feasible set, 377 Identity matrix, 27, 46 Identity transformation, 335 Income, 10 Incidence matrix, 357, 363, 368 Independent columns, 141, 158, 170 Independent rows, 135, 371 Infinite dimensions, 383 Matrix multiplication, 46, 54, 334 Infinitely many solutions, 36, 135 Inner product, 10, 55, 430 Input basis, 329, 332 Input space, 322 Integral, 330, 337 Intersection point, 24 inverse, 71 Inverse matrix, 66, 69 Inverse of AB, 67, 74 Invertible, 66, 71 Invertible submatrix, 129, 168 Iterative method, 396, 410 Jacobi, 402, 410, 416, 418 Jordan matrix, 307, 308, 309, 412 MATLAB, 6, 16, 28, 252 (see last page for Teaching Codes) Matrix equation, 26 Matrix notation, 27, 45 Matrix of linear transformation, 332 Matrix space, 103, 108, 149, 153, 326 Matrix times vector, 26, 28, 30, 45 Matrix, 26, 54 augmented, 49, 52, 69, 110, 131 band, 398, 402 block, 57, 76 checkerboard, 164 chess, 164 circulant, 445 coefficient, 23, 26 cofactor, 232 complex, 64, 430, 432 consumption, 372 diagonalizable, 260 echelon, 116 eigenvector, 258, 259 Kernel, 322, 326 Kirchhoff, 357, 362, 367 Lanczos, 419 LAPACK, 419 Law of Cosines, 19 Least squares, 184, 185, 198 Left nullspace, 157, 159, 162 Left-inverse, 66 Length, 7, 11, 197, 430 Linear combination, 3, 8, 105 Linear programming, 377 Linear transformation, 321, 323, 329 Linearity, 209, 321 Linearly independent, 141, 142, 150, 260 LINPACK, 80 Loop, 360, 363 lu, 87 Magic matrix, 33 Maple, 29 Markov matrix, 32, 247, 256, 260 Mathematica, 29 elementary, 46 elimination, 45, 50, 122, 127 empty, 133 equation, 26 exponential, 277, 278 Fourier, 344, 434, 439, 441 Hermitian, 432 Hessenberg, 417 Hilbert, 76, 402 identity, 27, 46 incidence, 357, 363, 368 inverse, 66, 69 Markov, 32, 247, 256, 260 multiplication, 46, 54, 334 nonnegative, 369, 373 normal, 292, 293, 438 notation, 27, 45 nullspace, 115, 125, 126 of linear transformation, 332 orthogonal, 196, 252, 283, 314 permutation, 48, 92, 97, 442 pivot, 79 positive definite, 295, 297, 299, 304 powers of, 245, 261, 263, 268 projection, 176, 178, 183, 198, 247, 287 null, 173 Nullspace, 111, 156, 166, 180 Nullspace matrix, 115, 125, 126 rank one, 130, 138, 319 reflection, 196, 206, 247, 400 reverse identity, 213 Nullspace of A^T A, 180
rotation, 196, 251, 400, 421 One-sided inverse, 331 Operation count, 55, 69, 81, 397, 403, 443 Orthogonal, 13, 165 Orthogonal complement, 168, 173 Orthogonal eigenvectors, 283, 433 Orthogonal matrix, 196, 252, 283, 314 Orthogonal subspaces, 165, 166, 168 Orthonormal basis, 313, 316 Orthonormal columns, 195, 197, 203, 433 Orthonormal, 195, 200 Outer product, 55, 58 Output basis, 332 Output space, 322 scaling, 390 second derivative, 296, 337 semidefinite, 296, 298 similar, 306, 347 singular, 37, 211 skew-symmetric, 97, 215, 252, 428 space, 103, 108, 149, 153, 326 sparse, 83 square root, 305 stiffness, 275 stretching, 315 symmetric, 91, 252, 283 times vector, 26, 28, 30, 45 transpose, 89 triangular, 201, 210 tridiagonal, 70, 83, 86, 398, 419 unitary, 433, 434, 440 Vandermonde, 228 zero, 54 Odd permutation, 208 Parabola, 190, 191 Parallel computer, 395 Mean, 193 Minimum, 296, 305 Minor, 223 Parallelogram, 3, 5, 23, 236 Partial pivoting, 395, 396 partic, 132 Particular solution, 132, 134 pascal, 87 Permutation, 219, 221 Permutation matrix, 48, 92, 97, 442 Perpendicular eigenvectors, 248, 283, 285 Perpendicular, 13, 17 (see orthogonal) Multiplicity, 265 Multiplier, 35, 77 pivcol, 122, 124 Pivot, 35 -1, 2, -1, 69, 78, 97, 214, 299, 414, 419 netlib, 419 Network, 364 N(A^T) = left nullspace, 159 Nondiagonalizable, 259, 265 Nonnegative matrix, 369, 373 Nonsingular, 37 Nonzero solution, 117 Norm, 11, 16, 405, 409 Normal equations, 178, 186 Normal matrix, 292, 293, 438 nulbasis, 115 Pivot columns, 113, 125, 156 Pivot matrix, 79 Pivot rows, 155 Pivot variables, 115 Pixel, 392 Plane, 24 Polar form, 424, 425 Polar decomposition, 350 Positive definite matrix, 295, 297, 299, 304 Potential, 90, 361, 367 Power method, 415, 419 Powers of a matrix, 245, 261, 263, 268 Preconditioner, 410 Principal axis, 283, 301 Probability vector, 370 Product of determinants, 212 Product of pivots, 211 Projection, 174, 178 Projection matrix, 176, 178, 183, 198, 247, 287 Projection onto a line, 175 Projection onto a subspace, 177 Proof by parentheses, 66 Pseudoinverse, 168, 351 Pythagoras, 7, 12 QR method, 416 Quadratic form, 295, 297 Quadratic formula, 428 Saddle, 305 Scalar multiplication, 23 Scalar, 2, 102 Scaling matrix, 390 Schur complement, 228 Schwarz inequality, 15, 18, 19, 269, 383 Second derivative matrix, 296, 337 Semicolon, 28 Semidefinite matrix, 296, 298 Sigma notation, 45 Sign reversal, 209 Similar matrix, 306, 347 Simplex method, 377, 380 Singular matrix, 37, 211 Singular value, 313, 406 Singular value decomposition, see SVD Skew-symmetric matrix, 97, 215, 252, 428 slu, 82 Range, 322, 326 Rank of AB, 130 Rank one matrix, 130, 138, 319 Rank one, 138, 160, 177 slv, 83 Solution, 21, 134 Solvable, 105, 135 SOR method, 414, 418 Rank, 123, 156, 158 Rayleigh quotient, 406 Span, 144, 151 Real eigenvalues, 283, 432 Special solutions, 112, 115, 125 Spectral radius, 409, 411 Spectral theorem, 283 splu, 98 Spreadsheet, 11 Square root matrix, 305 Square wave, 386, 388 Stability, 276 Standard basis, 341 Standard deviation, 193 Statistics, 193 Steady state, 33, 369, 371 Stiffness matrix, 275 Stretching matrix, 315 Subspace, 103, 104, 108 Sum of squares, 298 SVD, 313, 317, 350 Symmetric factorization, 92 Symmetric matrix, 91, 252, 283 Real versus complex, 435 Recursive least squares, 194 Reduced echelon form, 70, 71 Reduced row echelon form, 113, 122 Reflection matrix, 196, 206, 247, 400 Repeated eigenvalues, 259, 265 Residual, 189, 419 Reverse identity matrix, 213 Reverse order, 67, 89 Right angle, 7, 13
Right-inverse, 66 Roots of unity, 426, 439 Rotation matrix, 196, 251, 400, 421 Roundoff, 396, 408 Row exchange matrix, 48 Row exchange, 37, 48, 92 Row picture, 21, 24, 35 Row space, 144, 156, 166 Row vector, 5 rref, 122, 130 Sparse matrix, 83 Teaching Codes, see last page Wall, 166 Transparent proof, 99 Wavelets, 206 Transpose matrix, 89 Transpose of AB, 89, 431 Zero matrix, 54 Tree, 360, 367 Zero vector, 102, 103 Triangle inequality, 18, 409 Triangular factorization, 78 -1, 2, -1 matrix, 69, 78, 97, 214, 299, 419 Triangular matrix, 201, 210 Tridiagonal matrix, 70, 83, 86, 398, 419 Triple product, 240 A = LDL^T, 92, 298 Trivial combination, 143 Two triangular systems, 80 A = LDU, 79, 86 A = LU Uncertainty Principle, 261, 269 A = QΛQ^T, 283, 301 Unique solution, 133, 163 A = QH, 350 Unit circle, 12, 423, 440 A = QR, 201, 204, 400 Unit vector, 12, 196 A^+, 351 Unitary matrix, 433, 434, 440 AA^T, 315, 347 Upper triangular, 34, 39 A^T A, 180, 302, 314, 405 Upward elimination, 70, 118 A^T CA, 365 A^H, 430 Vandermonde matrix, 228 Variance, 193 Vector, 1 Vector addition, 2, 23, 102 C(A), 105, 106 C^n, 101, 435 i, j, k, 8 N(A), 111 Vector space, 102, 107 PA = LU, 93, 94 Voltage law, 362, 364 Volume, 210, 235 R^n, 101, 435 V perp = orthogonal complement, 168, 172

MATLAB Teaching Codes

cofactor   Compute the n by n matrix of cofactors.
cramer     Solve the system Ax = b by Cramer's Rule.
deter      Matrix determinant computed from the pivots in PA = LU.
eigen2     Eigenvalues, eigenvectors, and det(A - λI) for 2 by 2 matrices.
eigshow    Graphical demonstration of eigenvalues and singular values.
eigval     Eigenvalues and their multiplicity as roots of det(A - λI) = 0.
eigvec     Compute as many linearly independent eigenvectors as possible.
elim       Reduction of A to row echelon form R by an invertible E.
findpiv    Find a pivot for Gaussian elimination (used by plu).
fourbase   Construct bases for all four fundamental subspaces.
grams      Gram-Schmidt orthogonalization of the columns of A.
house      2 by 12 matrix giving the corner coordinates of a house.
inverse    Matrix inverse (if it exists) by Gauss-Jordan elimination.
leftnull   Compute a basis for the left nullspace.
linefit    Plot the least squares fit to m given points by a line.
lsq        Least squares solution to Ax = b from A^T A x̂ = A^T b.
normal     Eigenvalues and orthonormal eigenvectors when A^T A = AA^T.
nulbasis   Matrix of special solutions to Ax = 0 (basis for nullspace).
orthcomp   Find a basis for the orthogonal complement of a subspace.
partic     Particular solution of Ax = b, with all free variables zero.
plot2d     Two-dimensional plot for the house figures (cover and Section 7.1).
plu        Rectangular PA = LU factorization with row exchanges.
poly2str   Express a polynomial as a string.
project    Project a vector b onto the column space of A.
projmat    Construct the projection matrix onto the column space of A.
randperm   Construct a random permutation.
rowbasis   Compute a basis for the row space from the pivot rows of R.
samespan   Test whether two matrices have the same column space.
signperm   Determinant of the permutation matrix with rows ordered by p.
slu        LU factorization of a square matrix using no row exchanges.
slv        Apply slu to solve the system Ax = b allowing no row exchanges.
splu       Square PA = LU factorization with row exchanges.
splv       The solution to a square, invertible system Ax = b.
symmeig    Compute the eigenvalues and eigenvectors of a symmetric matrix.
tridiag    Construct a tridiagonal matrix with constant diagonals a, b, c.
These Teaching Codes are directly available from the Linear Algebra Home Page: http://web.mit.edu/18.06/www They were written in MATLAB, and translated into Maple and Mathematica.
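For readers without the Teaching Codes, a few of the same computations can be approximated by built-in MATLAB commands. This is only a rough sketch of stand-ins (null, rref, orth, and the backslash solver), not the Teaching Codes themselves.

    % Rough built-in stand-ins for a few Teaching Code tasks (not the codes themselves)
    A = [1 2 2 4; 3 8 6 16];             % any small example matrix

    N = null(A);                         % orthonormal basis for the nullspace (compare nulbasis)
    R = rref(A);                         % reduced row echelon form (compare elim)
    C = orth(A);                         % orthonormal basis for the column space
    x = A \ [1; 2];                      % one solution of Ax = b from the backslash solver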