SM261 Class synopsis - Prof. W. D. Joyner (wdj@usna.edu). Last updated 2014-11-25.

Vectors: Let $V = \mathbb{R}^n$ denote the set of $n$-tuples of elements taken from the real numbers $\mathbb{R}$. The elements of $V$ are called vectors. Vectors are often thought of, intuitively and geometrically, as quantities with a length and a direction, drawn as arrows. For $v = (v_1, \ldots, v_n)$, $w = (w_1, w_2, \ldots, w_n) \in V$ and $c \in \mathbb{R}$, define vector addition $+$ and scalar multiplication $\cdot$ "component-wise":
\[ v + w = (v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n), \qquad c \cdot v = (cv_1, cv_2, \ldots, cv_n). \]
The dot product or inner product, denoted by $v \cdot w$ or by $\langle v, w \rangle$, is the real number given by
\[ v \cdot w = v_1 w_1 + v_2 w_2 + \ldots + v_n w_n. \]
We say vectors $v, w \in \mathbb{R}^n$ are orthogonal if $\langle v, w \rangle = 0$, written $v \perp w$. The length of a vector is defined by
\[ \|v\| = \sqrt{v \cdot v}. \]
A unit vector is a vector of length 1. If two vectors $v, w$ are non-zero scalar multiples of each other, say $v = cw$ for some $c \in \mathbb{R}$, $c \neq 0$, then $v, w$ are parallel. The angle $\theta$ between two vectors $v$ and $w$ satisfies
\[ \cos(\theta) = \frac{v \cdot w}{\|v\| \, \|w\|}. \]
A linear combination of vectors $v_1, v_2, \ldots, v_k$ of $V$ is an expression of the form
\[ c_1 v_1 + c_2 v_2 + \ldots + c_k v_k, \]
for some scalar coefficients $c_i \in \mathbb{R}$ ($i = 1, 2, \ldots, k$).

Matrices: An $m \times n$ matrix is a rectangular array or table of elements of $\mathbb{R}$ arranged with $m$ rows and $n$ columns. It is usually written:
\[ A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}. \]
The $(i,j)$th entry of $A$ is $a_{ij}$. The $i$th row of $A$ is
\[ r_i = (a_{i1}, a_{i2}, \ldots, a_{in}) \quad (1 \leq i \leq m). \]
The $j$th column of $A$ is
\[ c_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \quad (1 \leq j \leq n). \]
If $A$ is a square $n \times n$ matrix, then the $(n-1) \times (n-1)$ matrix obtained by removing the $i$th row and the $j$th column is denoted $A(i,j)$:
\[ A(i,j) = \begin{pmatrix} a_{11} & \ldots & a_{1,j-1} & a_{1,j+1} & \ldots & a_{1n} \\ \vdots & & & & & \vdots \\ a_{i-1,1} & \ldots & a_{i-1,j-1} & a_{i-1,j+1} & \ldots & a_{i-1,n} \\ a_{i+1,1} & \ldots & a_{i+1,j-1} & a_{i+1,j+1} & \ldots & a_{i+1,n} \\ \vdots & & & & & \vdots \\ a_{n1} & \ldots & a_{n,j-1} & a_{n,j+1} & \ldots & a_{nn} \end{pmatrix}. \]
The determinant of $A(i,j)$ is called the $(i,j)$th minor of $A$:
\[ \mathrm{Minor}(A)_{i,j} = \det(A(i,j)). \]
The signed $(i,j)$th minor of $A$ is called the $(i,j)$th cofactor of $A$:
\[ \mathrm{Cof}(A)_{i,j} = (-1)^{i+j} \det(A(i,j)). \]

Solving linear equations: Let $A$ be an $m \times n$ matrix. The solutions to the matrix equation $Av = b$ are obtained by the following method (a short software sketch follows the list).

• Compute the row-reduced echelon form of the augmented matrix, $\mathrm{rref}(A, b)$. Suppose it has the form $(A', b')$, for some $m \times n$ matrix $A'$ in row-reduced echelon form. Let $i_1, \ldots, i_k$ denote the indices of the non-pivot columns. (If $(A', b')$ contains a row $(0, \ldots, 0 \mid 1)$, the system is inconsistent and has no solutions.)

• Using the non-pivot variables, parameterize the solution to $A' v = b'$:
\[ v = b_0 + v_{i_1} b_1 + \ldots + v_{i_k} b_k \in b_0 + \mathrm{Span}(\{b_1, \ldots, b_k\}), \]
where $v = (v_1, \ldots, v_n)$.

• A vector $v$ solves $Av = b$ if and only if $v \in b_0 + \mathrm{Span}(\{b_1, \ldots, b_k\})$.
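A minimal sketch of this method in software, using SymPy (the library choice and the example system are illustrative assumptions, not part of the course notes):

```python
# Sketch of the rref-based solution method above, assuming SymPy.
from sympy import Matrix, symbols

# An underdetermined system Av = b: 2 equations, 3 unknowns.
A = Matrix([[1, 2, 1],
            [2, 4, 0]])
b = Matrix([3, 2])

# Step 1: row-reduce the augmented matrix (A | b).
aug, pivots = Matrix.hstack(A, b).rref()
print(aug)     # the pair (A', b') in row-reduced echelon form
print(pivots)  # (0, 2): columns 0 and 2 are pivots, column 1 is free

# Step 2: read off the parameterization v = b0 + v_{i_1} b_1.
# Here rref gives v1 = 1 - 2*v2 and v3 = 2, with v2 free, so:
b0 = Matrix([1, 0, 2])   # particular solution (free variable set to 0)
b1 = A.nullspace()[0]    # basis vector of Ker(A), here (-2, 1, 0)

# Step 3: every v = b0 + t*b1 solves Av = b.
t = symbols('t')
v = b0 + t * b1
print(A * v)             # Matrix([[3], [2]]) for every t
```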
Types of matrices: A matrix having as many rows as it has columns ($m = n$) is called a square matrix. The entries $a_{ii}$ of an $m \times n$ matrix $A = (a_{ij})$ are called the diagonal entries, the entries $a_{ij}$ with $i > j$ are called the lower diagonal entries, and the entries $a_{ij}$ with $i < j$ are called the upper diagonal entries. An $m \times n$ matrix $A = (a_{ij})$ all of whose lower diagonal entries are zero is called an upper triangular matrix. A similar definition holds for lower triangular matrices. The square $n \times n$ matrix with 1's on the diagonal and 0's elsewhere,
\[ \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}, \]
is called the $n \times n$ identity matrix and denoted $I$ or $I_n$. It is both upper triangular and lower triangular. In general, any square matrix which is both upper triangular and lower triangular is called a diagonal matrix.

A square $n \times n$ matrix with exactly one 1 in each row and each column, and 0's elsewhere, is called an $n \times n$ permutation matrix. The identity $I_n$ is a permutation matrix.

Elementary row operations: The elementary row operations on a matrix are

• multiplying one row by a non-zero scalar ($R_i \mapsto cR_i$),

• interchanging two rows ($R_i \leftrightarrow R_j$),

• adding a scalar multiple of one row to another row, replacing that row by the sum ($R_i + cR_j \mapsto R_i$).

These operations are applied to an $m \times n$ matrix $A$, such as the one above. A matrix is in row-reduced echelon form if the following conditions hold.

• All of the 0-rows (rows with no nonzero element), if any, are at the bottom of the matrix. That is, nonzero rows (rows with at least one nonzero element) are above any rows of all 0's.

• The leading coefficient (the first nonzero number reading left-to-right, also called the pivot) of a nonzero row is always strictly to the right of the pivot of the row above it.

• All entries in a column above and below a pivot are 0.

• The leading coefficient (pivot) of any nonzero row is equal to 1.

Matrix inverse: An $n \times n$ matrix $A$ has an inverse matrix if there is a matrix $B$ such that $AB = BA = I_n$. The inverse is denoted $A^{-1}$ (when it exists) and we have
\[ I_n = A^{-1} A = A A^{-1}. \]
The inverse of a matrix $A$ only exists when $A$ is square. When does $A^{-1}$ exist?

Theorem 1. Let $A$ be an $n \times n$ matrix. The following are equivalent.

• $A^{-1}$ exists.

• $\det(A) \neq 0$.

• The system $Av = 0$ has the unique solution $v = 0$.

• The system $Av = b$ has a unique solution (namely, $v = A^{-1} b$).

• The rank of $A$ is $n$.

• $\mathrm{rref}(A) = I_n$.

• $\mathrm{rref}(A, I) = (I, A^{-1})$.

• $\mathrm{rref}(A, b) = (I, A^{-1} b)$.

Adjugate formula for $A^{-1}$: The adjugate of $A$ is the transpose of the matrix of cofactors of $A$:
\[ \mathrm{Adj}(A) = (\mathrm{Cof}(A)_{i,j})^T. \]
The following formula for $A^{-1}$ is valid:
\[ A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A). \]
In the $2 \times 2$ case, this formula becomes:
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
Of course, this only makes sense if $ad - bc \neq 0$, i.e., if $\det(A) \neq 0$.

Determinants: Suppose $A = (a_{ij})$ is an $n \times n$ matrix. Its determinant $\det(A)$ can be computed by a cofactor expansion across any row $i$,
\[ \det(A) = \sum_{j=1}^{n} a_{i,j} \, \mathrm{Cof}(A)_{i,j}, \]
or down any column $j$,
\[ \det(A) = \sum_{i=1}^{n} a_{i,j} \, \mathrm{Cof}(A)_{i,j}. \]
(A code sketch of this expansion appears after Theorem 2 below.)

Theorem 2. Let $A$, $B$ be $n \times n$ matrices. The following statements are true.

• $\det(A^{-1}) = \det(A)^{-1}$.

• $\det(A^k) = \det(A)^k$.

• $\det(A^t) = \det(A)$ (where $A^t$ is the transpose of $A$).

• $\det(A) \det(B) = \det(AB)$.

• If we write $A$ as a column of row vectors,
\[ A = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix}, \]
then the elementary row operation $R_i + cR_j \to R_i$ does not change the value of the determinant:
\[ \det \begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix} = \det \begin{pmatrix} r_1 \\ \vdots \\ r_i + c r_j \\ \vdots \\ r_n \end{pmatrix}. \]

• If we write $A$ as a column of row vectors, as above, then the elementary row operation $cR_i \to R_i$ changes the value of the determinant by a factor of $c$:
\[ c \det \begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix} = \det \begin{pmatrix} r_1 \\ \vdots \\ c r_i \\ \vdots \\ r_n \end{pmatrix}. \]
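To make the cofactor expansion concrete, here is a naive recursive implementation in plain Python. This is a teaching sketch only: expanding cofactors takes on the order of $n!$ steps, while row reduction computes the determinant far faster.

```python
# Naive recursive determinant via cofactor expansion across the first
# row, implementing det(A) = sum_j a_{1j} * Cof(A)_{1j} from above.

def submatrix(A, i, j):
    """Return A(i, j): the matrix A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant of a square matrix given as a list of lists."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # Expand across row 0 (0-indexed): the sign (-1)^(i+j) becomes (-1)^j.
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
               for j in range(n))

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det(A))  # -3
```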
Main properties of row-reduction:

(a) The rank of the matrix $A$ is the number of non-zero rows in $\mathrm{rref}(A)$, denoted $\mathrm{rank}(A)$.

(b) The column span or column space of the matrix $A$ is the span of the column vectors of $A$. It is denoted $\mathrm{Col}(A)$ and its dimension is $\dim \mathrm{Col}(A) = \mathrm{rank}(A)$. A basis of $\mathrm{Col}(A)$ is obtained by writing down the column vectors of $A$ corresponding to the pivot columns of $\mathrm{rref}(A)$.

(c) The row span or row space of the matrix $A$ is the span of the row vectors of $A$. It is denoted $\mathrm{Row}(A)$ and its dimension is $\dim \mathrm{Row}(A) = \mathrm{rank}(A^t)$. Note $\mathrm{Row}(A) = \mathrm{Col}(A^t)$.

(d) The kernel or null space of the matrix $A$ is the subspace of vectors which are sent to $0$ by $A$:
\[ \mathrm{Ker}(A) = \mathrm{Null}(A) = \{ v \in \mathbb{R}^n \mid Av = 0 \}. \]
A basis of $\mathrm{Ker}(A)$ is obtained by the following method.

– Compute the row-reduced echelon form of $A$, $\mathrm{rref}(A)$. Let $i_1, \ldots, i_k$ denote the indices of the non-pivot columns.

– Using the non-pivot variables, parameterize the solution to $\mathrm{rref}(A) v = 0$:
\[ v = v_{i_1} b_1 + \ldots + v_{i_k} b_k, \]
where $v = (v_1, \ldots, v_n)$.

– A basis of $\mathrm{Ker}(A)$ is $\{b_1, \ldots, b_k\}$.

Sometimes the dimension of the kernel of $A$ is called the nullity of $A$: $\mathrm{nullity}(A) = \dim \mathrm{Ker}(A)$.

• Algorithm to compute $A^{-1}$: If $A$ is an $n \times n$ matrix, a fast way to compute $A^{-1}$ is the following method.

– Compute $\mathrm{rref}(A, I_n)$.

– $A$ is invertible if and only if this has the form $(I_n, B)$ for some matrix $B$. Set $A^{-1} = B$.

As a consequence of (a), (b) and (d) above we have the rank plus nullity theorem (which Gilbert Strang calls the "fundamental theorem of linear algebra"):
\[ \mathrm{rank}(A) + \mathrm{nullity}(A) = n, \]
for every $m \times n$ matrix $A$.

Let $A$ be a given $n \times n$ matrix,
\[ A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix}, \]
let $b$ be a given column vector,
\[ b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}, \]
and let $x$ be a column vector of variables,
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \]
The system $Ax = b$ has a unique solution if and only if $\det(A) \neq 0$. Cramer's rule gives us a formula for this solution without having to compute $A^{-1}$.

Cramer's Rule: Write $A$ as a vector of column vectors: $A = (c_1, c_2, \ldots, c_n)$, where
\[ c_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix} \quad (1 \leq j \leq n). \]
Then the $i$th coordinate of the solution $x$ to $Ax = b$ is
\[ x_i = \frac{\det(c_1, \ldots, c_{i-1}, b, c_{i+1}, \ldots, c_n)}{\det(c_1, \ldots, c_{i-1}, c_i, c_{i+1}, \ldots, c_n)} \quad (1 \leq i \leq n). \]

A linear combination of vectors $v_1, \ldots, v_k$ of $V$ is an expression of the form
\[ c_1 v_1 + \ldots + c_k v_k, \]
for some constants $c_1, \ldots, c_k$. The collection of all possible linear combinations of $v_1, \ldots, v_k$ is called the span and denoted $\mathrm{Span}(v_1, \ldots, v_k)$.

If the only solution to
\[ c_1 v_1 + \ldots + c_k v_k = 0 \]
is $c_1 = \ldots = c_k = 0$, then we call the vectors $v_1, \ldots, v_k$ linearly independent.

We call vectors $v_1, \ldots, v_k$ of $V$ a basis of $V$ if (a) the $v_1, \ldots, v_k$ are linearly independent, and (b) $V = \mathrm{Span}(v_1, \ldots, v_k)$. In this case we call $k$ the dimension of $V$, written $\dim(V) = k$.

If $B = \{v_1, \ldots, v_k\}$ is a set of linearly independent vectors and if
\[ c_1 v_1 + \ldots + c_k v_k = v, \]
then we write $[v]_B = (c_1, \ldots, c_k)$ and call $c_1, \ldots, c_k$ the coordinates of $v$ with respect to $B$.

Matrix representation of a linear transformation with respect to a basis:

• Let $T : \mathbb{R}^n \to \mathbb{R}^m$ denote a linear transformation. Assume $\mathbb{R}^m$ and $\mathbb{R}^n$ are each given the standard basis. The matrix representation of $T$ in the standard basis is the $m \times n$ matrix given by
\[ [T] = (T(e_1), \ldots, T(e_n)), \]
where each $T(e_1), \ldots, T(e_n)$ is represented as a column vector in $\mathbb{R}^m$.

• Assume $T : V \to V$ is a linear transformation from a vector space $V$ to itself. Let $B = \{b_1, \ldots, b_n\}$ denote a basis for $V$. The matrix representation of $T$ in the basis $B$ is the $n \times n$ matrix given by
\[ [T]_B = ([T(b_1)]_B, \ldots, [T(b_n)]_B), \]
where each $[T(b_1)]_B, \ldots, [T(b_n)]_B$ is represented as a column vector in $\mathbb{R}^n$. Another formula for this matrix representation is
\[ [T]_B = S^{-1} [T] S, \]
where the vectors $b_1, \ldots, b_n$ are used as the column vectors of the $n \times n$ matrix $S$. (Sometimes $S$ is called the change-of-basis matrix.) A sketch of this formula in code is given after this list.

• Assume $T : V \to V'$ is a linear transformation between vector spaces $V$ and $V'$. Let $B = \{b_1, \ldots, b_n\}$ denote a basis for $V$ and let $B' = \{b'_1, \ldots, b'_m\}$ denote a basis for $V'$. The matrix representation of $T$ in the bases $B$, $B'$ is the $m \times n$ matrix given by
\[ [T]_{B,B'} = ([T(b_1)]_{B'}, \ldots, [T(b_n)]_{B'}), \]
where each $[T(b_1)]_{B'}, \ldots, [T(b_n)]_{B'}$ is represented as a column vector in $\mathbb{R}^m$.
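A small sketch of the formula $[T]_B = S^{-1}[T]S$, using NumPy (the example transformation and basis are made up for illustration):

```python
# Sketch of the change-of-basis formula [T]_B = S^{-1} [T] S, assuming NumPy.
import numpy as np

# T: R^2 -> R^2 in the standard basis, here T(x, y) = (x + y, 2y).
T = np.array([[1.0, 1.0],
              [0.0, 2.0]])

# A basis B = {b1, b2}; its vectors are the columns of S.
b1, b2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
S = np.column_stack([b1, b2])

# Matrix representation of T with respect to B.
T_B = np.linalg.inv(S) @ T @ S
print(T_B)

# Sanity check: the first column of [T]_B is [T(b1)]_B, i.e. the
# coordinates of T(b1) with respect to B.
print(np.linalg.solve(S, T @ b1))  # equals T_B[:, 0]
```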
Orthogonal matrices: An orthogonal basis for the vector space $\mathbb{R}^n$ is a basis of mutually orthogonal vectors. An orthonormal basis for the vector space $\mathbb{R}^n$ is a basis of mutually orthogonal vectors each of which has unit length. An orthogonal matrix is a square matrix whose rows (or columns) form an orthonormal basis. Equivalently, an $n \times n$ matrix $A$ is orthogonal if and only if $A \cdot A^t = I_n$.

Gram-Schmidt process:
Input: $k$ linearly independent vectors $v_1, \ldots, v_k$ in $\mathbb{R}^n$, for $k \leq n$.
Output: $k$ orthonormal vectors $u_1, \ldots, u_k$ in $\mathbb{R}^n$, with $\mathrm{Span}\{v_1, \ldots, v_k\} = \mathrm{Span}\{u_1, \ldots, u_k\}$.

• Let $w_1 = v_1$.

• Let $w_2 = v_2 - a_1 w_1$, where $a_1$ is determined by the condition $\langle w_1, w_2 \rangle = 0$ (explicitly, $a_1 = \langle w_1, v_2 \rangle / \langle w_1, w_1 \rangle$).

• Let $w_3 = v_3 - a_1 w_1 - a_2 w_2$, where $a_1$, $a_2$ are determined by the conditions $\langle w_1, w_3 \rangle = 0$, $\langle w_2, w_3 \rangle = 0$.

...

• Let $w_k = v_k - a_1 w_1 - a_2 w_2 - \ldots - a_{k-1} w_{k-1}$, where $a_1, a_2, \ldots, a_{k-1}$ are determined by the conditions $\langle w_1, w_k \rangle = 0$, $\langle w_2, w_k \rangle = 0$, \ldots, $\langle w_{k-1}, w_k \rangle = 0$.

• Finally, normalize the $w_j$'s to have unit length: $u_1 = w_1/\|w_1\|$, $u_2 = w_2/\|w_2\|$, \ldots, $u_k = w_k/\|w_k\|$.

Orthogonal projections: Let $V$ be a subspace of $\mathbb{R}^n$. The orthogonal complement of $V$ is the subspace
\[ V^\perp = \{ w \in \mathbb{R}^n \mid \langle v, w \rangle = 0 \text{ for all } v \in V \}. \]
Facts:

• Every $w \in \mathbb{R}^n$ has a unique representation
\[ w = v + v^\perp, \]
for $v \in V$ and $v^\perp \in V^\perp$.

• If $V = \mathrm{Col}(A)$, for some matrix $A$, then $\mathrm{Col}(A)^\perp = \mathrm{Ker}(A^t)$.

• If $V = \mathrm{Row}(A)$, for some matrix $A$, then $\mathrm{Row}(A)^\perp = \mathrm{Ker}(A)$.

• If $V$ has orthonormal basis $u_1, \ldots, u_k$ in $\mathbb{R}^n$, then the orthogonal projection $\mathrm{pr}_V : \mathbb{R}^n \to V$ is given by
\[ \mathrm{pr}_V(v) = \langle v, u_1 \rangle u_1 + \langle v, u_2 \rangle u_2 + \ldots + \langle v, u_k \rangle u_k. \]
The matrix representation of $\mathrm{pr}_V$ is the matrix $Q \cdot Q^t$, where $Q = (u_1, \ldots, u_k)$ (written as a matrix of column vectors).

• If $V = \mathrm{Col}(A)$, where $A$ has linearly independent columns, then the matrix representation of $\mathrm{pr}_V$ is the matrix $A (A^t A)^{-1} A^t$:
\[ [\mathrm{pr}_V] = A (A^t A)^{-1} A^t. \]

Application to linear regression: To find the line $y = mx + b$ of best fit for the data $(x_1, y_1), \ldots, (x_n, y_n)$, let
\[ A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \]
and compute
\[ \begin{pmatrix} m \\ b \end{pmatrix} = (A^t A)^{-1} A^t y. \]
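As a concrete illustration, the fit can be computed in NumPy (the data values below are made up for the example):

```python
# Least-squares line y = m x + b via the normal equations
# (A^t A) v = A^t y, assuming NumPy.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Build A with rows (x_i, 1), as in the notes.
A = np.column_stack([x, np.ones_like(x)])

# Solve (A^t A) v = A^t y rather than forming the inverse explicitly;
# this computes the same (A^t A)^{-1} A^t y, but more stably.
m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(m, b)

# np.linalg.lstsq performs the same fit directly.
print(np.linalg.lstsq(A, y, rcond=None)[0])
```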
Eigenvalues and Eigenvectors: In general, the eigenvector equation is
\[ Av = \lambda v, \quad v \neq 0. \]
This means that $\lambda$ is an eigenvalue of $A$ with eigenvector $v$.

• Find the eigenvalues. These are the roots of the characteristic polynomial
\[ p_A(x) = \det(A - x I_n). \]
In the $2 \times 2$ case, this becomes
\[ p(x) = \det \begin{pmatrix} a - x & b \\ c & d - x \end{pmatrix} = x^2 - (a + d)x + (ad - bc). \]

• The eigenspace associated to $\lambda$ is
\[ E_\lambda = \mathrm{Ker}(A - \lambda I_n). \]
The non-zero elements in this vector subspace are the eigenvectors associated to $\lambda$. In the $2 \times 2$ case, if $b \neq 0$, then you can use the formula
\[ v = \begin{pmatrix} b \\ \lambda - a \end{pmatrix} \]
to get an eigenvector associated to $\lambda$.

• The algebraic multiplicity of an eigenvalue $\lambda$ of $A$ is its multiplicity as a root of the characteristic polynomial. For example, if (up to sign)
\[ p_A(x) = \prod_{j=1}^{k} (x - \lambda_j)^{m_j}, \]
where $\lambda_1, \ldots, \lambda_k$ are distinct, then the algebraic multiplicity of $\lambda_j$ is $m_j$.

• The geometric multiplicity of an eigenvalue $\lambda$ of $A$ is the dimension of the subspace $\mathrm{Ker}(A - \lambda I)$.

• In general, the geometric multiplicity of an eigenvalue $\lambda$ is always less than or equal to the algebraic multiplicity of $\lambda$.

• The matrix $A$ is diagonalizable if and only if there is an invertible matrix $S$ such that
\[ S^{-1} A S = D, \]
where $D$ is a diagonal matrix having the eigenvalues of $A$ on the diagonal (counted according to multiplicity).

• If $A$ is symmetric (i.e., $A = A^t$) then $A$ is diagonalizable.

• If $A$ has $n$ distinct eigenvalues then $A$ is diagonalizable.

• If, for each eigenvalue $\lambda$ of $A$, the geometric multiplicity of $\lambda$ is equal to the algebraic multiplicity of $\lambda$, then $A$ is diagonalizable. The converse also holds. Moreover, in that case there are $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ of $A$. Using these vectors $v_j$ as the column vectors of an $n \times n$ matrix $S$, we have
\[ S^{-1} A S = D, \]
where $D$ is a diagonal matrix having the eigenvalues of $A$ on the diagonal (counted according to multiplicity). The matrix representation of $A$ with respect to the basis $B = \{v_1, \ldots, v_n\}$ is $D$:
\[ [A]_B = D. \]
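A short sketch, assuming NumPy, that verifies $S^{-1} A S = D$ for a symmetric (hence diagonalizable) matrix:

```python
# Verify the diagonalization S^{-1} A S = D, assuming NumPy.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, hence diagonalizable

# Columns of S are eigenvectors; the entries of lam are the eigenvalues.
lam, S = np.linalg.eig(A)

D = np.linalg.inv(S) @ A @ S
print(np.round(D, 10))                # diagonal, with lam on the diagonal
print(np.allclose(D, np.diag(lam)))   # True
```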