MATH 2270 Project Description and Ideas
November 23, 2014

• Goals of the project:
  (a) Apply the linear algebra that you've learned to an interesting problem.
  (b) Practice reading and thinking about new material on your own.
  (c) Show your understanding of linear algebra and work out relevant examples.
  (d) Discuss your work with your peers.
  (e) Produce a short, understandable paper presenting your findings.

• Topic: You can choose any topic you wish, but you need to make sure that there is enough linear algebra involved. Solving linear equations will probably not be sufficient. There are detailed examples below and plenty of good ideas in Chapter 8 of Strang, at http://www.math.harvard.edu/archive/21b_fall_03/handouts/use.pdf, and at http://aix1.uottawa.ca/~jkhoury/app.htm. If you are inventing your own topic, be sure to check in with me to make sure you are on the right track.

• Graded components (20 points total, 20% of final grade):
  1. Topic proposal (1 point).
     (a) Due December 1 or 2, in class. I will meet with each of you on those days to briefly discuss your topic choice, and you will turn in your proposal. While I wander around the room, you should form pairs and help each other get started on your projects.
     (b) The proposal should include your chosen topic, the resources you plan to use, and a short outline of what you plan to do.
  2. Small group discussion record (1 point).
     (a) On December 8 and 9, you will form small groups of 3-4 people. Each of you will explain what you are working on. Ask questions and make suggestions! If your group finishes, form a new group and repeat!
     (b) As you go, maintain a sheet of paper with the following information: the name of each person who explains his or her project to you, the topic he or she is working on, and a sentence or two summarizing the project. Turn these sheets in at the end of the day!
     (c) You are only expected to attend class on one of those days.
  3. Paper (18 points).
     (a) Due December 12.
     (b) Roughly 5 pages, typed. No other specific format requirements.
     (c) To type math symbols and matrices, the equation-insertion options in a good text editor should be sufficient. You can also use a professional mathematics typesetting language like LaTeX.
     (d) Include a bibliography listing your sources, which may include the textbook.

• Class attendance: I expect you to attend class on December 1 or 2, so we can have a brief chat about your proposal, and on December 8 or 9, so that you can discuss your project with your peers. I highly recommend that you attend class on November 24 and 26, when I will present the six project ideas included later in this document. We have a test on November 25. You are under no obligation to attend class on December 3, 5, 10, or 12. I may begin reviewing for the final exam on December 10 or 12.

• Getting help: I am here to help you, and I know exactly what my expectations are! Consult with me frequently while you're working on your project. I prefer that you come with questions during the usual class hours (usual room, usual time) or during the problem sessions (usual rooms, usual times). The regular class hours may also be a good time for discussions with other students pursuing similar projects. I can also answer short questions via e-mail. If you want in-depth help from me, come to the problem sessions or make an appointment to meet me at some other time.
• Collaboration: I encourage you to get help from any resource whatsoever (me, other students in the class, the textbook, the internet, your math-whiz neighbor, Gandalf, ...). However, getting help is very different from having someone else do your work for you or copying someone else's work. Getting help is about enhancing your understanding and your ability to write a good paper, not about getting credit for work you didn't do. If you copy someone else's work, I will know about it, and you will get a very poor score on this project.

• Advice: You will likely encounter new definitions and theorems. A great way to begin understanding new mathematical definitions is to invent simple examples that satisfy the definition and simple examples that don't satisfy the definition. A great way to understand new theorems is to think about what the theorem is saying in the context of some simple examples. Simple examples should be part of your write-up! Now you see why I've been asking you for inventions all year long.

• Be brave! It is okay if writing mathematics makes you uncomfortable. Explain things to the best of your ability. Write up your findings and simple examples as you go, and get feedback on your work in progress. I'm here to help you every day of the week.

• Below you will find outlines of possible projects. Don't judge them by their length! The longer examples tend to be mostly self-contained, while some of the shorter examples require you to do outside reading (I've included sources). The sections marked with an asterisk (*) are intended for particularly ambitious students.

1 "The Extraordinary SVD"

In this project, you will learn how the singular value decomposition (SVD) can be used to extract the most important information in a matrix, producing good approximations of a matrix that require far less data than the original matrix. The first section below covers the essential facts you need to know about the SVD, while the second and third sections are two different applications (you should pick one). The title of this project comes from the paper http://arxiv.org/pdf/1103.2338v5.pdf, which you should peruse.

1. Background on the SVD
   (a) Let A be an m × n matrix of rank r. Recall that the SVD produces a decomposition A = U Σ V^T, where the columns u_1, ..., u_m of U are an orthonormal basis of R^m, the columns v_1, ..., v_n of V are an orthonormal basis of R^n, Σ is m × n with the singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 on its diagonal and zeros everywhere else, and A v_i = σ_i u_i for all 1 ≤ i ≤ r.
   (b) Show that the SVD can also be written as a sum of r rank 1 matrices (columns times rows):

           A = σ_1 u_1 v_1^T + ... + σ_r u_r v_r^T.

       There is a Theorem: for every 1 ≤ k ≤ r, the matrix obtained by summing the first k terms in the SVD,

           A_k = σ_1 u_1 v_1^T + ... + σ_k u_k v_k^T,

       is the best approximation to A by a rank k matrix. When the singular values decrease quickly, the approximations using the largest singular values are very good approximations of A by matrices of low rank. When A is a large matrix, these low-rank approximations require much less data than the full A and hence are much more practical in computations than A. Many applications make use of this.
   (c) As an example, consider the SVD

           A = [ 1  0  1 ]
               [ 1  0  1 ]
               [ 1  1 -1 ]

             = 2 [ 1/√2 ] [ 1/√2  0  1/√2 ]  +  √3 [ 0 ] [ 1/√3  1/√3  -1/√3 ]
                 [ 1/√2 ]                           [ 0 ]
                 [  0   ]                           [ 1 ]

             = [ 1  0  1 ]   +   [ 0  0  0 ]
               [ 1  0  1 ]       [ 0  0  0 ]
               [ 0  0  0 ]       [ 1  1 -1 ] .

       You can see that in A, the row [1 0 1] occurs twice, and that this doubled row is the first term in the SVD. This means that this doubled row is the best approximation to A by a rank 1 matrix; in other words, this row [1 0 1] is particularly important in explaining what is going on in the matrix. Moreover, the vector u_1 is a weighted vector telling you which rows are most important for this rank 1 approximation, and the vector v_1 is a unit approximation of those most important rows.
   (d) Building on the intuition from the example, if A is a matrix of data, then the v_i^T are trends in the rows of the matrix of data, and the u_i show in which rows those trends are strongest. The importance of each trend is measured by the singular value σ_i, so the v_1 trend is most important, followed by v_2, and so on.
   (e) For large matrices, computing the SVD by hand is impractical. Fortunately, there is plenty of software that can help us. Matlab, Mathematica, and most other mathematics or statistics software packages can compute the SVD for you. You can also find free online applications that compute the SVD, such as Wolfram Alpha and http://www.bluebit.gr/matrix-calculator/.
   (f) Compute the SVD for some matrices A with a repeated row, or a row that is much larger in length than all the other rows, and observe how the best rank 1 approximation of A (the first term of the SVD) emphasizes the repeated row or the particularly large row. What do u_1 and v_1 look like, and why? Can you explain why the SVD is behaving like this? (A short computational sketch follows this outline if you want to experiment in software.)
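If you would like to try item (f) on a computer, here is a minimal sketch, assuming Python with numpy (Matlab or Mathematica would work just as well). It computes the SVD of the example matrix from (c) and its best rank 1 approximation; nothing here is required for the project.

    # Minimal sketch (assumes numpy): SVD and best rank-1 approximation.
    import numpy as np

    A = np.array([[1., 0., 1.],
                  [1., 0., 1.],
                  [1., 1., -1.]])

    U, s, Vt = np.linalg.svd(A)            # A = U @ diag(s) @ Vt
    print("singular values:", s.round(4))  # approximately 2, sqrt(3), 0

    # Best rank-1 approximation: keep only the largest singular value.
    A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
    print(A1.round(4))                     # reproduces the doubled row [1 0 1]
    print("error:", np.linalg.norm(A - A1).round(4))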
2. First Application: Congressmen Voting
   (a) Read Sections 1, 2, and 3 of the paper http://arxiv.org/pdf/1103.2338v5.pdf. Feel free to find supplementary resources online.
   (b) What is the voting matrix A? Invent some examples of voting matrices.
   (c) The paper says that, using the SVD, A_2 is a good approximation of A. The vector u_1 correlates well with how "partisan" each Congressman is, while u_2 correlates with how "bipartisan" each Congressman is, namely "how often a Congressman votes with the majority". Explain how u_1 and u_2 let us graph the Congressmen in the plane using these two coordinates.
   (d) Explain how A_2 can be used to determine (with some uncertainty) whether each bill passed. How accurate are the results in the paper for the 107th Congress?
   (e) Invent a (small) Congress of your own and a list of bills. Invent various voting matrices A, compute the SVD for each one, graph the Congressmen in the plane using partisan-bipartisan (u_1-u_2) coordinates, and determine how accurately you can reconstruct which bills were passed. Try to invent A so that A_2 is a very good approximation of A. Why is A_2 so good for this A? Also invent A so that A_2 does a poor job, and explain why A_2 is no good for your example. (Hint: the number of strong political parties, that is, groups of Congressmen that vote similarly, should play an important role in your answer.)
   (f) If you want to go further with this topic, you can find plenty of additional material online. Googling "voting SVD" is a way to get started.

3. Second Application: Image Compression
   (a) For this application, you will need to find some grayscale image files and have access to an application that can convert the pixel values into a matrix and compute the SVD of that large matrix. Matlab is a good choice.
   (b) Read about image compression at http://www.columbia.edu/itc/applied/e3101/SVD_applications.pdf.
   (c) Convert each of your grayscale images into a matrix A whose entries are the pixel intensity values. Compute the SVD of A, graph the singular values, compute approximations A_k for a variety of values of k (k is the number of "modes", and each A_k is a compressed version of the image A), and convert each A_k back into an actual grayscale image.
   (d) How many modes do you need for the compressed image to look good? How does this depend on the individual images and on the sizes of the singular values? How much more efficient are your compressed images in terms of storage space? (One possible way to get started in software is sketched below.)
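If you go the software route for the image compression application, the following sketch may help you get started. It assumes Python with numpy and Pillow (Matlab's imread and svd commands do the same job), and "photo.png" is just a placeholder for one of your own grayscale images.

    # Minimal sketch (assumes numpy and Pillow): truncated SVD of an image.
    import numpy as np
    from PIL import Image

    A = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    for k in (5, 20, 50):
        # Keep only the k largest singular values ("modes").
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        Image.fromarray(np.clip(Ak, 0, 255).astype(np.uint8)).save(f"photo_rank{k}.png")
        stored = k * (U.shape[0] + Vt.shape[1] + 1)   # numbers needed to rebuild Ak
        print(k, "modes, relative error",
              round(np.linalg.norm(A - Ak) / np.linalg.norm(A), 4),
              "| numbers stored:", stored, "vs", A.size)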
2 Markov matrices and the Perron-Frobenius Theorem

In this project, you will study when the random walk problem associated to a Markov matrix stabilizes after a large number of iterations. The solution is but one of the many applications of the Perron-Frobenius Theorem (which is explained in detail at http://www2.math.umd.edu/~mboyle/courses/475sp05/spec.pdf).

1. A primitive matrix is a square matrix A with all entries ≥ 0 such that some power of A has all entries > 0. Primitive matrices satisfy the

   Theorem (Perron-Frobenius). Suppose A is a primitive matrix. Let λ be the maximum of the absolute values of the eigenvalues of A. Then λ is an eigenvalue of multiplicity 1, no other eigenvalue has absolute value λ, and there are eigenvectors in the λ-eigenspace that have all entries > 0.

   Work out some examples of the theorem (choosing A to have all entries > 0 guarantees that A is primitive) and some non-examples (like A = I = [ 1 0 ; 0 1 ] and A = [ 0 1 ; 1 0 ]).

2. Given an n × n Markov matrix M (all entries are ≥ 0 and every column sums to 1), construct a graph (with vertices and edges) whose incidence matrix is M. This graph will have directed edges weighted by the entries of M. If p_0 is an initial vertex probability vector (whose entries are the probabilities of starting at each vertex), then the entries of M p_0 are the vertex probabilities after one iteration, and the entries of M^k p_0 are the vertex probabilities after k iterations. (Go over the example we did in class during our discussion of Section 6.2.)

3. If M is a Markov matrix, when is M primitive? The non-example A = I for the Perron-Frobenius theorem above is a Markov matrix that is not primitive. Think about how being primitive relates to the graph G whose incidence matrix is M. Try to prove the

   Theorem. Let M be an n × n Markov matrix and G the directed graph that has vertices v_1, ..., v_n and an edge connecting two vertices v_i and v_j exactly when m_ij > 0. If M is primitive, then G is strongly connected.

4. A primitive n × n Markov matrix M always has λ = 1 as its largest eigenvalue (as in the Perron-Frobenius theorem). It is easy to see that 1 is an eigenvalue of any Markov matrix M (use the fact that (1, ..., 1) is an eigenvector of M^T and the fact that M and M^T have the same eigenvalues), but it is not as obvious that 1 is the largest eigenvalue. Here is an outline of the proof that 1 is the largest eigenvalue:
   (a) Let S denote the subset of R^n consisting of all points (x_1, ..., x_n) such that x_i ≥ 0 for all i and x_1 + ... + x_n = 1. (Sketch what S looks like for n = 1, 2, 3.) Use the fact that M is Markov to show that for all v in S, Mv is still in S.
   (b) Since M is primitive, apply the Perron-Frobenius Theorem to find a largest eigenvalue λ, which has an eigenvector r all of whose components are > 0 (hence, by scaling, you can choose r to lie in S). But then by (a), Mr = λr must be in S, so its components sum to λ; since they must sum to 1, we get λ = 1.

5. You are now ready to fully understand random walks on the graph whose incidence matrix is M when M is primitive and diagonalizable. (Even if M does not have these properties, you can change the entries of M very slightly to get both properties to hold. How does this work?)
   (a) Diagonalize: M = S Λ S^{-1}, with the eigenvalues in Λ in decreasing order (the biggest eigenvalue is the most important!). This makes computing powers of M easy.
   (b) Compute M^∞ = lim_{k→∞} M^k using the fact (which you proved above!) that M has an eigenvalue 1 and all other eigenvalues have absolute value < 1.
   (c) Deduce that every initial vertex probability vector p_0 gives rise to the same vertex probabilities after a long time, namely that M^∞ p_0 is independent of p_0. Interpret this as a statement about random walks on graphs. (A small numerical sketch of this stabilization appears at the end of this section.)

6. * Perron-Frobenius has many other applications. Do some Googling! Google's own search algorithm uses Perron-Frobenius (for starters, see http://en.wikipedia.org/wiki/PageRank), though it is quite complicated. See what you can discover!
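If you want to see the stabilization numerically, here is a minimal sketch, assuming Python with numpy (any of the software mentioned in the SVD project would also work). The particular 3 × 3 matrix is just an invented example of a primitive Markov matrix.

    # Minimal sketch (assumes numpy): powers of a primitive Markov matrix
    # send every starting probability vector to the same limit.
    import numpy as np

    M = np.array([[0.8, 0.3, 0.2],    # every column sums to 1 and all
                  [0.1, 0.4, 0.3],    # entries are positive, so M is a
                  [0.1, 0.3, 0.5]])   # primitive Markov matrix

    p = np.array([1.0, 0.0, 0.0])     # start at vertex 1 with probability 1
    for _ in range(50):
        p = M @ p                     # vertex probabilities after one more step
    print("after many steps:", p.round(4))

    # Compare with the eigenvector for the eigenvalue 1, rescaled to lie in S.
    vals, vecs = np.linalg.eig(M)
    r = np.real(vecs[:, np.argmax(np.real(vals))])
    print("eigenvector for lambda = 1:", (r / r.sum()).round(4))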
3 Symmetries of n-gons and platonic solids

In this project, you will study linear transformations of R^2 and R^3 that preserve a regular n-gon or a platonic solid; such transformations are called symmetries. (I studied these symmetries for my undergraduate thesis, which you can find on my webpage.)

1. Consider the square centered at the origin of R^2 with vertices at (1, 0), (0, 1), (-1, 0), (0, -1).
   (a) A symmetry of the square is a linear transformation T : R^2 → R^2 such that applying T to the square results in the same square (some of the points on the square may move to other points on the square). A symmetry of the square must take vertices of the square to vertices of the square, and edges to edges.
   (b) Find the eight symmetries (rotations and reflections) that preserve the square. Write these symmetries as matrices with respect to the standard basis of R^2.
   (c) What type of matrices are these? (Symmetries must preserve lengths and angles, since otherwise the square would get distorted.)
   (d) What are the eigenvalues and eigenvectors of each of these symmetries? Interpret your answer geometrically. (A short computational sketch follows this section if you want to check your answers.)

2. Do a similar analysis for any regular n-gon centered at the origin in R^2.

3. Consider the cube centered at the origin of R^3 with the eight vertices (±1, ±1, ±1).
   (a) Make an argument for why this cube has 48 symmetries (rotations and reflections).
   (b) Write down some of these symmetries as matrices and discuss their eigenvalues and eigenvectors.

4. * Do a similar analysis for the other platonic solids: tetrahedron, octahedron, dodecahedron, icosahedron.

5. * Investigate the symmetries of other shapes in R^2 or in R^3. Try to find a shape in R^2 with infinitely many symmetries!

6. * What happens when you multiply the matrices corresponding to two symmetries together? Argue that the set of all symmetries of a geometric object forms a group of matrices (see Strang, Exercise 36 on page 119 and Exercise 32 on page 354). The study of groups is an important branch of algebra.
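If you would like to check item 1 by computer, here is a minimal sketch, assuming Python with numpy; it builds the eight symmetries of the square above and prints their eigenvalues.

    # Minimal sketch (assumes numpy): the eight symmetries of the square with
    # vertices (1,0), (0,1), (-1,0), (0,-1), and their eigenvalues.
    import numpy as np

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    rotations = [rotation(k * np.pi / 2) for k in range(4)]   # 0, 90, 180, 270 degrees
    flip = np.array([[1., 0.], [0., -1.]])                    # reflection across the x-axis
    reflections = [R @ flip for R in rotations]               # the four reflections

    for T in rotations + reflections:
        T = T.round(10)                            # clean up floating-point noise
        assert np.allclose(T.T @ T, np.eye(2))     # each symmetry is an orthogonal matrix
        print(T, " eigenvalues:", np.round(np.linalg.eigvals(T), 4))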
4 Spectral graph theory

In this project, you will study how the eigenvalues of the adjacency matrix and the Laplacian matrix of a graph (the kind with vertices and edges) encode information about the graph. One resource is the first 10 pages of https://orion.math.iastate.edu/butler/PDF/dissertation.pdf, which also covers plenty of other cool material.

1. What is a graph? We usually denote a graph, which consists of a vertex set V(G) and an edge set E(G) together with incidence information, by G. Explain the basic definitions and notation. You will need to become familiar with walks and spanning trees.

2. What is the adjacency matrix A of the graph G? Explain why the entries of A^k count the number of walks of length k between two vertices. Explain how the largest eigenvalue of A gives an asymptotic measure of the number of walks of length k for large k. Explain how all of this works for some simple examples (try the graphs with two vertices whose adjacency matrices are A = [ 1 1 ; 1 1 ] and A = [ 0 1 ; 1 0 ]).

3. What is the combinatorial Laplacian L of G? Explain how the eigenvalues of L can be used to compute the number of spanning trees of G (the Matrix Tree Theorem). Show some simple examples! (A small computational sketch of one such example appears at the end of this section.)

4. * Let M denote the adjacency matrix with each column normalized by the vertex degrees. M is a Markov matrix. How can you use powers of M to study random walks on G? The eigenvectors of M can be used to diagonalize M, which will help in computing powers of M. What do the eigenvalues of M tell you about random walks on G that have a large number of steps? See the project "Markov matrices and the Perron-Frobenius Theorem" for details.
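Here is a minimal computational sketch for items 2 and 3, assuming Python with numpy. The complete graph K4 (four vertices, every pair joined by an edge) is just one convenient example: Cayley's formula says it has 4^(4-2) = 16 spanning trees, which is what the Matrix Tree Theorem should reproduce.

    # Minimal sketch (assumes numpy): walks and spanning trees of K4.
    import numpy as np

    A = np.ones((4, 4)) - np.eye(4)          # adjacency matrix of K4
    D = np.diag(A.sum(axis=1))               # degree matrix
    L = D - A                                # combinatorial Laplacian

    # The (i, j) entry of A^k counts the walks of length k from vertex i to vertex j.
    print("walks of length 3 from vertex 0 to vertex 1:",
          int(np.linalg.matrix_power(A, 3)[0, 1]))

    # Matrix Tree Theorem: the number of spanning trees equals the product of
    # the nonzero Laplacian eigenvalues divided by the number of vertices.
    eigs = np.sort(np.linalg.eigvalsh(L))    # eigenvalues 0, 4, 4, 4
    print("Laplacian eigenvalues:", eigs.round(4))
    print("spanning trees of K4:", round(float(np.prod(eigs[1:])) / len(A)))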
5 Coordinate transformations for 3D graphics

In this project, you will study how coordinate transformations in R^3 can be expressed as matrices. The project is based on Section 8.7 of Strang. For quaternions, one reference is Mathematics for 3D Game Programming and Computer Graphics, Third Edition, by Eric Lengyel.

1. Scaling: what is the matrix S that scales the x, y, z components of a vector v in R^3 by the scaling factors c_1, c_2, c_3?

2. Rotation: what is the matrix R_{a,θ} that rotates vectors in R^3 by an angle θ about an axis span(a), where a is a unit vector?
   (a) As a warm-up, do the cases where a is a standard basis vector.
   (b) Find R_{a,θ} for a general a. This is not easy! (Derive the formula (1) given on page 461 of Strang.)

3. Projection onto a plane through the origin: what is the matrix P_n that projects vectors onto the plane through the origin with normal vector n?

4. Including translations: think of R^3 inside R^4.
   (a) Let v_0 be a fixed vector in R^3. Why is the translation function T(v) = v + v_0 not linear? This means that T cannot be represented by a 3 × 3 matrix!
   (b) We use a trick: put R^3 inside R^4 as the set of all vectors ("points") whose fourth component is 1. Explain the importance of "homogeneous" coordinates.
   (c) In our new setting of R^3 inside R^4, how can we think of scaling, rotation, and translation as 4 × 4 matrices? (A small sketch of this idea follows this section.)
   (d) Also, what is the 4 × 4 matrix P_{n,v_0} that projects onto the "flat" with normal vector n that has been shifted by v_0 away from the origin?

5. * Quaternions.
   (a) The set of quaternions H is a four-dimensional real vector space with a multiplication operation. Do some research to figure out the details of how H is defined.
   (b) Quaternions can be used to represent rotations more efficiently: they require less storage space, and multiplying quaternions requires fewer computations than composing rotations. Do some research to explain how this works.
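Here is a minimal sketch of item 4(c), assuming Python with numpy: a 4 × 4 rotation about the z-axis and a 4 × 4 translation acting on a point written in homogeneous coordinates. The particular angle and translation vector are arbitrary choices for illustration.

    # Minimal sketch (assumes numpy): 4x4 matrices acting on homogeneous coordinates.
    import numpy as np

    def rotation_z(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0, 0],
                         [s,  c, 0, 0],
                         [0,  0, 1, 0],
                         [0,  0, 0, 1]])

    def translation(v0):
        T = np.eye(4)
        T[:3, 3] = v0                    # the translation sits in the fourth column
        return T

    point = np.array([1.0, 0.0, 0.0, 1.0])                # (1, 0, 0) with fourth component 1
    M = translation([2, 0, 5]) @ rotation_z(np.pi / 2)    # rotate first, then translate
    print((M @ point).round(4))                           # approximately (2, 1, 5, 1)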
6 The derivative is a linear transformation

In this project, you will study how the derivative of a function of several variables, which you saw in Calculus III, can be interpreted as a linear transformation (matrix). At each point, the derivative gives the best affine approximation to the function. This is the way mathematicians view the derivative!

1. An affine transformation ℓ : R^n → R^m is a combination of a linear transformation L : R^n → R^m and a translation: ℓ(v) = L(v) + v_0, where v_0 is in R^m. Like linear transformations, affine transformations take lines to lines and parallelograms to parallelograms, and they have the advantage of being able to move the origin to any other point v_0. Every affine transformation that fixes the origin is a linear transformation.

2. Let F : R → R^2 be a (differentiable) parametrized curve in R^2, which is a vector of functions F(x) = (f(x), g(x)). The derivative of F is a column vector of functions F'(x) = (f'(x), g'(x)), which at a point x_0 gives a column vector of scalars F'(x_0) = (f'(x_0), g'(x_0)). This derivative gives the best affine approximation to F for x close to x_0 by

       F̄(x) = F'(x_0)(x - x_0) + F(x_0) = ( f'(x_0)(x - x_0) + f(x_0), g'(x_0)(x - x_0) + g(x_0) ),

   which is just a parametrization of the tangent line to the curve F at the point F(x_0). Check that F̄ : R → R^2 is affine. Give geometric reasoning for why F̄ is a better approximation to F near x_0 than any other affine transformation. Work all of this out for several choices of F: for instance F(x) = (2x, x + 1) (a line); F(x) = (x, x^2) (a parabola); F(x) = (cos x, sin x) (the unit circle). A similar analysis can be done for a parametrized curve in any R^n.

3. Let F : R^3 → R be a (differentiable) function of three variables x, y, z. The derivative of F is the gradient

       F'(x, y, z) = ∇F(x, y, z) = [ ∂F/∂x (x, y, z)   ∂F/∂y (x, y, z)   ∂F/∂z (x, y, z) ].

   At every point (x_0, y_0, z_0), F'(x_0, y_0, z_0) is a 1 × 3 row vector of scalars, which gives the best affine approximation to F near (x_0, y_0, z_0) by

       F̄(x, y, z) = F'(x_0, y_0, z_0) (x - x_0, y - y_0, z - z_0)^T + F(x_0, y_0, z_0).

   The product of the row vector F'(x_0, y_0, z_0) with the column vector (x - x_0, y - y_0, z - z_0) is the directional derivative of F at (x_0, y_0, z_0) in the direction (x - x_0, y - y_0, z - z_0), which you learned about in Calculus III! Think about why F̄, which is built from the directional derivative, gives the best affine approximation to F near (x_0, y_0, z_0). Compute some examples (possibly inspired by your textbook or notes from Calculus III).

4. Let F : R^2 → R^3 be a (differentiable) parametrized surface in R^3, which is a vector of functions F(x, y) = (f(x, y), g(x, y), h(x, y)). The derivative of F is the 3 × 2 matrix of partial derivatives

       F'(x, y) = [ ∂f/∂x  ∂f/∂y ]
                  [ ∂g/∂x  ∂g/∂y ]   (each entry evaluated at (x, y)).
                  [ ∂h/∂x  ∂h/∂y ]

   At a point (x_0, y_0), we get an actual 3 × 2 matrix of scalars F'(x_0, y_0). This matrix gives the best affine approximation to F near (x_0, y_0) by

       F̄(x, y) = F'(x_0, y_0) (x - x_0, y - y_0)^T + F(x_0, y_0),

   which is a parametrization of the tangent plane to the surface F at the point F(x_0, y_0). Why is F̄ affine? Why is F̄ the best affine approximation to F near (x_0, y_0)? Work out some explicit examples: for instance F(x, y) = (x, y, x + y) (a plane); F(x, y) = (x, y, x^2 + y^2) (a paraboloid); F(x, y) = (x, y, xy) (a saddle); F(x, y) = (sin x cos y, sin x sin y, cos x) (the unit sphere; look at the point (x_0, y_0) = (π/2, 0)). (A short numerical check using the saddle example appears at the end of this section.)

5. Now generalize this discussion to any differentiable function F : R^n → R^m. What is the derivative of F? Write down the best affine approximation F̄ to F near a point x_0 of R^n. This general case unifies many seemingly different concepts like the derivative of a function of one variable (Calculus I), tangent lines and planes (as in the above two cases), and the gradient.

6. * Why does this definition of the derivative agree with the usual definition of the derivative in the case of a function f : R → R?

7. * Suppose F : R^n → R^m and G : R^m → R^ℓ are differentiable functions. They can be composed to get a differentiable function G ∘ F : R^n → R^ℓ. How does this work if F and G are linear or affine? In general, F and G will not be affine, but we can study their best affine approximations. How is the best affine approximation to G ∘ F related to the best affine approximations to F and G? The relationship is called the "chain rule", and it generalizes the chain rule you have seen in your calculus courses.

8. * In order to give a rigorous proof that F̄(x) = F'(x_0)(x - x_0) + F(x_0) is the best affine approximation to F : R^n → R^m near x_0, we can use an algebraic definition of the derivative. Recall the definition of the derivative in one variable:

       f'(x_0) = lim_{x→x_0} ( f(x) - f(x_0) ) / ( x - x_0 ).

   Another way to write this is that f'(x_0), if it exists, is the real number such that

       lim_{x→x_0} ( f(x) - [ f'(x_0)(x - x_0) + f(x_0) ] ) / ( x - x_0 ) = 0.

   This formulation says that f̄(x) = f'(x_0)(x - x_0) + f(x_0) is the best affine approximation to f(x), because the difference between f(x) and f̄(x) vanishes to first order as x gets close to x_0. Generalize this limit definition of the derivative to any F : R^n → R^m. Your formula will look similar, but it will require some care to state correctly because it will involve vectors rather than just scalars (for instance, you cannot divide by a vector). Show that the derivative defined in terms of partial derivatives satisfies the limit definition.
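For those who want to see the "best affine approximation" numerically, here is a minimal sketch, assuming Python with numpy. It uses the saddle example F(x, y) = (x, y, xy) from item 4: as (x, y) approaches (x_0, y_0), the error of the affine approximation shrinks much faster than the distance to (x_0, y_0).

    # Minimal sketch (assumes numpy): the affine approximation built from the
    # 3x2 matrix of partial derivatives of F(x, y) = (x, y, xy).
    import numpy as np

    def F(x, y):
        return np.array([x, y, x * y])

    def F_prime(x, y):                       # 3x2 matrix of partial derivatives
        return np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [y,   x  ]])

    x0, y0 = 1.0, 2.0
    for h in (0.1, 0.01, 0.001):
        x, y = x0 + h, y0 + h
        affine = F_prime(x0, y0) @ np.array([x - x0, y - y0]) + F(x0, y0)
        error = np.linalg.norm(F(x, y) - affine)
        print("h =", h, " error =", error)   # the error behaves like h^2, not h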