MA.1 Introduction to matrix algebra The basics What is a matrix? From the regression analysis book of Kutner, Nachtsheim, and Neter (2004, p. 176): “A matrix is a rectangular array of elements arranged in rows and columns” Example: 1 2 3 4 5 6 The dimension or size of a matrix: # rows # columns = rc Example: 23 Rather than using numbers, we can represent parts of a matrix (and the matrix itself) by symbols. Example: MA.2 a11 a12 A a21 a22 a13 a23 where aij is the row i and column j element of A a11 is often called the “(1,1) element” of A, a12 is called the “(1,2) element” of A,… a11 = 1 from the previous example Notice that the matrix A was bolded. When bolding is not possible (writing on a piece of paper or chalkboard), the letter is underlined - A Example: A matrix of size rc a11 a12 a a22 21 A ai1 ai2 ar1 ar 2 a1j a2 j aij arj a1c a2c aic arc Example: Square matrix is rc where r = c MA.3 Example: HS and College GPA From the introduction to R lecture, we can start putting parts of the observed data into a matrix form. For example, 1 1 1 X 1 1 3.04 2.35 2.70 2.28 1.88 The above 202 matrix contains the HS GPAs in the second column. It will be clear shortly why we have a leading column of 1’s. Vector – a r1 (column vector) or 1c (row vector) matrix – special case of a matrix Example: Symbolic representation of a 31 column vector MA.4 a1 A a 2 a3 Example: HS and College GPA 3.10 2.30 3.00 Y 2.20 1.60 The above 201 vector contains the College GPAs. Transpose – Interchange the rows and columns of a matrix or vector Example: a11 a21 a11 a12 a13 a A a A and 22 12 a a a 22 23 21 a13 a23 A is 23 and A is 32 MA.5 The symbol indicates a transpose, and it is said as the word “prime”. Thus, the transpose of A is “A prime”. Example: HS and College GPA Y 3.10 2.30 3.00 2.20 1.60 MA.6 Matrix addition and subtraction Add or subtract the corresponding elements of matrices with the same dimension. Example: 1 2 3 Suppose A and B 4 5 6 1 10 1 5 5 8 . 0 12 2 Then A B and A B 9 10 14 2 8 4 1 0 2 . Example: Using R (basic_matrix_algebra.R) > A<-matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE) > A [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 > class(A) [1] "matrix" > B<-matrix(data = c(-1, 10, -1, 5, 5, 8), nrow = 2, ncol = 3, byrow = TRUE) > A+B [,1] [,2] [,3] [1,] 0 12 2 MA.7 [2,] 9 10 14 > A-B [,1] [,2] [,3] [1,] 2 -8 4 [2,] -1 0 -2 Notes: 1. Be careful with the byrow option. By default, this is set to FALSE. Thus, the numbers would be entered into the matrix by columns. For example, > matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3) [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 2. The class of these objects is “matrix”. 3. A vector can be represented as a “matrix” class type or a type of its own. > y<-matrix(data = c(1,2,3), nrow = 3, ncol = 1, byrow = TRUE) > y [,1] [1,] 1 [2,] 2 [3,] 3 > class(y) [1] "matrix" > x<-c(1,2,3) > x [1] 1 2 3 > class(x) [1] "numeric" MA.8 > is.vector(x) [1] TRUE This can present some confusion when vectors are multiplied with other vectors or matrices because no specific row or column dimensions are given. More on this shortly. 4. A transpose of a matrix can be done using the t() function. For example, > t(A) [,1] [,2] [1,] 1 4 [2,] 2 5 [3,] 3 6 MA.9 Matrix multiplication Scalar - 11 matrix Example: Matrix multiplied by a scalar ca11 ca12 cA ca21 ca22 ca13 where c is a scalar ca23 1 2 3 2 4 6 Let A and c=2. Then 2A . 4 5 6 8 10 12 Multiplying two matrices Suppose you want to multiply the matrices A and B; i.e., AB or AB. In order to do this, you need the number of columns of A to be the same as the number of rows as B. For example, suppose A is 23 and B is 310. You can multiply these matrices. However if B is 410 instead, these matrices could NOT be multiplied. The resulting dimension of C=AB: 1. The number of rows of A is the number of rows of C. 2. The number of columns of B is the number of rows of C. MA.10 3. In other words, C A B where the dimension of the wy w z z y matrices are shown below them. How to multiply two matrices – an example 3 0 1 2 3 1 2 . Notice that A and B Suppose A 4 5 6 0 1 is 23 and B is 32 so C=AB can be done. 3 0 1 2 3 C AB 1 2 4 5 6 0 1 1 3 2 1 3 0 1 0 2 2 3 1 4 3 5 1 6 0 4 0 5 2 6 1 5 7 17 16 The “cross product” of the rows of A and the columns of B are taken to form C In the above example, D=BAAB where BA is: MA.11 3 0 1 2 BA 1 2 4 5 0 1 3 1 0 4 1 1 2 4 0 1 1 4 3 6 3 2 0 5 3 3 0 6 1 2 2 5 1 3 2 6 0 2 1 5 0 3 1 6 3 6 9 9 12 15 4 5 6 In general for a 23 matrix times a 32 matrix: b11 b12 a11 a12 a13 C AB b b 22 21 a a a 22 23 21 b31 b32 a11b11 a12b21 a13b31 a11b12 a12b22 a13b32 a b a b a b a b a b a b 22 21 23 31 21 12 22 22 23 32 21 11 Example: Using R (basic_matrix_algebra.R) > A<-matrix(data 3, > B<-matrix(data 2, = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = byrow = TRUE) = c(3, 0, 1, 2, 0, 1), nrow = 3, ncol = byrow = TRUE) MA.12 > C<-A%*%B > D<-B%*%A > C [,1] [,2] [1,] 5 7 [2,] 17 16 > D [,1] [,2] [,3] [1,] 3 6 9 [2,] 9 12 15 [3,] 4 5 6 > #What is A*B? > A*B Error in A * B : non-conformable arrays Notes: 1. %*% is used for multiplying matrices and/or vectors 2. * means to perform elementwise multiplications. Here is an example where this can be done: > E<-A > A*E [,1] [,2] [,3] [1,] 1 4 9 [2,] 16 25 36 The (i,j) elements of each matrix are multiplied together. 3. In R, multiplying vectors with other vectors or matrices can be confusing since no row or column dimensions MA.13 are given for a vector object. For example, suppose x 1 = 2 , a 31 vector 3 > x<-c(1,2,3) > x%*%x [,1] [1,] 14 > A%*%x [,1] [1,] 14 [2,] 32 How does R know that we want xx (11) instead of xx (33) when we have not told R that x is 31? Similarly, how does R know that Ax is 21? From the R help for %*% in the Base package: Multiplies two matrices, if they are conformable. If one argument is a vector, it will be promoted to either a row or column matrix to make the two arguments conformable. If both are vectors it will return the inner product. An inner product produces a scalar value. If you wanted xx (33), one can use the outer product %o% > x%o%x #outer product [,1] [,2] [,3] [1,] 1 2 3 [2,] 2 4 6 MA.14 [3,] 3 6 9 We will only need to use %*% in this class. Example: HS and College GPA (HS_college_GPA_MA.R) 1 1 1 X 1 1 3.04 3.10 2.30 2.35 3.00 2.70 and Y 2.20 2.28 1.88 1.60 Find XX, XY, and YY > #Read in the data > gpa<-read.table(file = "C:\\chris\\ unl\\Dropbox\\NEW\\ STAT873_rev\\matrix_algebra\\gpa.txt", header=TRUE, sep = "") > head(gpa) HS.GPA College.GPA 1 3.04 3.1 2 2.35 2.3 3 2.70 3.0 4 2.05 1.9 5 2.83 2.5 6 4.32 3.7 > X<-cbind(1, gpa$HS.GPA) > Y<-gpa$College.GPA > X MA.15 [1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [9,] [10,] [11,] [12,] [13,] [14,] [15,] [16,] [17,] [18,] [19,] [20,] [,1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [,2] 3.04 2.35 2.70 2.05 2.83 4.32 3.39 2.32 2.69 0.83 2.39 3.65 1.85 3.83 1.22 1.48 2.28 4.00 2.28 1.88 > Y [1] 3.1 2.3 3.0 1.9 2.5 3.7 3.4 2.6 2.8 1.6 2.0 2.9 2.3 3.2 1.8 1.4 2.0 3.8 2.2 [20] 1.6 > t(X)%*%X [,1] [,2] [1,] 20.00 51.3800 [2,] 51.38 148.4634 > t(X)%*%Y [,1] [1,] 50.100 [2,] 140.229 > t(Y)%*%Y [,1] [1,] 135.15 MA.16 Notes: 1. The cbind() function combines items by “c”olumns. Since 1 is only one element, it will replicate (called “recycling” in R) itself for all elements that you are combining so that one full matrix is formed. There is also a rbind() function that combines by rows. Thus, rbind(a,b) forms a matrix with a above b. y1 y n 2 2. Y Y y1,y2 ,...,yn yi2 i1 yn y1 n n xi1yi yi xn1 y2 i1 i1 x11 x21 n n 3. X Y x x x 12 22 n2 xi2 yi xi2 yi i 1 i1 yn because x11=…=xn1=1 MA.17 4. XX x11 x12 1 x12 x21 x22 1 x22 x11 x12 xn1 x21 x22 xn2 x x n1 n2 1 x12 n 1 1 x22 n x xn2 i2 i 1 1 xn2 xi2 i 1 n 2 xi2 i 1 n 5. Here’s another way to get the X matrix > mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa) > model.matrix(object = mod.fit) (Intercept) HS.GPA 1 1 3.04 2 1 2.35 3 1 2.70 4 1 2.05 5 1 2.83 6 1 4.32 7 1 3.39 8 1 2.32 9 1 2.69 10 1 0.83 11 1 2.39 12 1 3.65 13 1 1.85 14 1 3.83 15 1 1.22 16 1 1.48 17 1 2.28 MA.18 18 1 19 1 20 1 attr(,"assign") [1] 0 1 4.00 2.28 1.88 MA.19 Special types of matrices Symmetric matrix: If A=A, then A is symmetric. 1 2 Example: A , A 2 3 1 2 2 3 Diagonal matrix: A square matrix whose “off-diagonal” elements are 0. a11 0 Example: A 0 a22 0 0 0 0 a33 Identity matrix: A diagonal matrix with 1’s on the diagonal. 1 0 0 Example: I 0 1 0 0 0 1 Note that “I” (the letter I, not the number one) usually denotes the identity matrix. Vector and matrix of 1’s MA.20 A column vector of 1’s: 1 1 A matrix of 1’s: J r r 1 Notes: 1. j j r r 1 r 1 2. j j J r 1r 1 r r 3. J J r J r r r r r r 0 0 Vector of 0’s: 0 r 1 0 1 1 1 j r 1 r 1 1 1 1 1 1 1 1 MA.21 Linear dependence and rank of matrix 1 2 6 Let A 3 4 12 . Think of each column of A as a 5 6 18 vector; i.e., A = [A1, A2, A3]. Note that 3A2=A3. This means the columns of A are “linearly dependent.” Formally, a set of column vectors are linearly dependent if there exists constants 1, 2,…, c (not all zero) such that 1A1+2A2+…+cAc=0. A set of column vectors are linearly independent if 1A1+2A2+…+cAc=0 only for 1=2=…=c=0 . The rank of a matrix is the maximum number of linearly independent columns in the matrix. rank(A)=2 If a matrix does not have a rank equal to the number of its columns, this can cause problems in some of our calculations this semester. More on this later… MA.22 Inverse of a matrix Note that the inverse of a scalar, say b, is b-1. For example, the inverse of b=3 is 3-1=1/3. Also, bb-1=1. In matrix algebra, the inverse of a matrix is another matrix. For example, the inverse of A is A-1, and AA-1=A-1A=I. Note that A must be a square matrix. 1 1 2 1 2 Example: A ,A 3 4 1.5 0.5 Check: AA 1 1 ( 2) 2 1.5 1 1 2 * ( 0.5) 1 0 3 ( 2) 4 1.5 3 1 4 * ( 0.5) 0 1 Finding the inverse For a general way, see a matrix algebra book (e.g., Kolman, 1988). For a 22 matrix, there is a simple a11 a12 formula. Let A . Then a21 a22 a22 a12 1 A1 . a11a22 a12 a21 a21 a11 Verify AA-1=I on your own. Example: Using R (basic_matrix_algebra.R) MA.23 > A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE) > solve(A) [,1] [,2] [1,] -2.0 1.0 [2,] 1.5 -0.5 > A%*%solve(A) [,1] [,2] [1,] 1 1.110223e-16 [2,] 0 1.000000e+00 > solve(A)%*%A [,1] [,2] [1,] 1.000000e+00 0 [2,] 1.110223e-16 1 > round(solve(A)%*%A, 2) [,1] [,2] [1,] 1 0 [2,] 0 1 The solve() function inverts a matrix in R. The solve(A,b) function can also be used to “solve” for x in Ax = b since A-1Ax = A-1b Ix = A-1b x = A-1b Example: HS and College GPA (HS_college_GPA_MA.R) MA.24 1 3.04 3.10 1 2.35 2.30 1 2.70 3.00 Remember that X and Y 1 2.28 2.20 1 1.88 1.60 Find (XX)-1 and (XX)-1XY > solve(t(X)%*%X) [,1] [,2] [1,] 0.4507584 -0.15599781 [2,] -0.1559978 0.06072316 > solve(t(X)%*%X) %*% t(X)%*%Y [,1] [1,] 0.7075776 [2,] 0.6996584 From previous output: > mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa) > mod.fit$coefficients (Intercept) HS.GPA 0.7075776 0.6996584 ˆ 0 Thus, (XX) XY = , where ̂0 and ̂1 are the ˆ1 estimated intercept and slope coefficients in a simple linear regression model. -1 MA.25 Trace and Determinant Suppose we have a pp matrix A: a11 a12 a a22 21 Α ap1 ap2 a1p a2p app Notice this is a square matrix! Trace – The trace of a square matrix A is defined as p tr( Α ) aii a11 a22 ... app ; i.e., the sum of a square i1 matrix’s diagonal elements. Example: Using R (basic_matrix_algebra.R) > A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE) > A [,1] [,2] [1,] 1 2 [2,] 3 4 > diag(A) [1] 1 4 > sum(diag(A)) [1] 5 MA.26 Determinant – The determinant of a square matrix A is p Α a1j Α1j where A1j = (-1)1+j|A1j| and A1j is obtained from A j1 by deleting its first row and its jth column. The determinant for a 22 matrix is defined as a11 a12 a11a22 a12a21 a 21 a22 and the determinant for a 33 matrix can be defined as a11 a12 a a 21 22 a31 a32 a13 a23 a11 a33 a22 a 32 a23 a12 a33 a21 a23 a13 a 31 a33 a21 a22 a 31 a32 Example: Using R (basic_matrix_algebra.R) > A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE) > det(A) [1] -2 MA.27 Eigenvalues and eigenvectors Suppose we have a pp matrix A: a11 a12 a a22 21 Α ap1 ap2 a1p a2p app Notice this is a square matrix! Eigenvalues – The roots of the polynomial equation defined by Α I 0 where I is an identity matrix. If p=2, then the eigenvalues are the roots of a12 a11 a12 0 a11 a a 0 a a 21 22 21 22 (a11 )(a22 ) a21a12 2 (a11 a22 ) (a21a12 a11a22 ) 0 Using the quadratic formula, we obtain a11 a22 (a11 a22 )2 4(a21a12 a11a22 ) 2 MA.28 In general, the eigenvalues are the p roots of c1p + c2p-1 + c3p-2 + … + cp + cp+1 = 0 where cj for i = 1, …, p+1 denote constants. When A is a symmetric matrix, the eigenvalues are real numbers and can be ordered from largest to smallest as 1 2 … p where 1 is the largest. Example: Using R (basic_matrix_algebra.R) > A<-matrix(data = c(1, 0.5, 0.5, 1.25), nrow = 2, ncol = 2, byrow = TRUE) > A [,1] [,2] [1,] 1.0 0.50 [2,] 0.5 1.25 > eigen(A) $values [1] 1.6403882 0.6096118 $vectors [,1] [,2] [1,] 0.6154122 -0.7882054 [2,] 0.7882054 0.6154122 > save.eig<-eigen(A) > save.eig$values [1] 1.6403882 0.6096118 > save.eig$vectors [,1] [,2] [1,] 0.6154122 -0.7882054 [2,] 0.7882054 0.6154122 2.25 2.252 4(.25 1.25) 1.6404 and 0.6096 2 MA.29 Eigenvectors – Each eigenvalue of A has a corresponding nonzero vector b called an eigenvector that satisfies Ab = b. Notes: Eigenvectors for a particular eigenvalue are not unique. When two eigenvalues are not equal, their corresponding eigenvectors are orthogonal (i.e., bibj 0 ). p tr( Α ) i i1 p Α i 12 i1 p Example: Using R (basic_matrix_algebra.R) The eigenvectors of A satisfy 0.5 b1 1 b1 0.5 1.25 b b 2 2 for b = [b1, b2], 1=1.6404, and 2=0.6096. Two possible vectors are: MA.30 0.6154 0.7882 0.7882 for 1=1.6404 and 0.6154 for 2=0.6096 (see the previous example’s output). Note that 0.5 0.6154 1 0.6154 1.6404 0.5 1.25 0.7882 0.7882 and 0.5 0.7882 1 0.7882 0.5 1.25 0.6154 0.6096 0.6154 Below is the verification from R: > A%*%save.eig$vectors[,1] [,1] [1,] 1.009515 [2,] 1.292963 > save.eig$values[1]*save.eig$vectors[,1] [1] 1.009515 1.292963 > A%*%save.eig$vectors[,2] [,1] [1,] -0.4804993 [2,] 0.3751625 > save.eig$values[2]*save.eig$vectors[,2] [1] -0.4804993 0.3751625 MA.31 Vector length – Suppose b = [b1, …, bp] is a vector. The length of a vector is p 2 bi . i1 R chooses eigenvectors so that they have a length of 1. Example: Using R (basic_matrix_algebra.R) > sqrt(sum(save.eig$vectors[,1]^2)) [1] 1 > sqrt(sum(save.eig$vectors[,2]^2)) [1] 1 Below is a plot of the vectors: > #Square plot (default is "m" which means maximal region) > par(pty = "s") > #Set up some dummy values for plot > b1<-c(-1,1) > b2<-c(-1,1) > plot(x = b1, y = b2, type = "n", main = expression(paste("Eigenvectors of ", A)), xlab = expression(b[1]), ylab = expression(b[2]) , panel.first=grid(col="gray", lty="dotted")) > #Run demo(plotmath) for help on mathematical notation > #draw line on plot - h specifies a horizontal line > abline(h = 0, lty = "solid", lwd = 2) > #v specifies a vertical line > abline(v = 0, lty = "solid", lwd = 2) > arrows(x0 = 0, y0 = 0, x1 = save.eig$vectors[1,1], y1 = save.eig$vectors[2,1], col = "red", lty = "solid") > arrows(x0 = 0, y0 = 0, x1 = save.eig$vectors[1,2], y1 = MA.32 save.eig$vectors[2,2], col = "red", lty = "solid") 0.0 -1.0 -0.5 b2 0.5 1.0 Eigenvectors of A -1.0 -0.5 0.0 0.5 1.0 b1 Notes: The vectors are orthogonal. Other eigenvectors do exist! For example, you can multiply one of the vectors by any scalar value. If the scalar was -10, this is what we obtain: > new.vec<--10*save.eig$vectors[,1] MA.33 > A%*%new.vec [,1] [1,] -10.09515 [2,] -12.92963 > save.eig$values[1]*new.vec [1] -10.09515 -12.92963 > sqrt(sum(new.vec^2)) [1] 10 MA.34 Positive Definite and Positive Semidefinite Matrices Definition 4.5: If a symmetric matrix has all of its eigenvalues positive, the matrix is called a positive definite matrix. Definition 4.6: If a symmetric matrix has all nonnegative eigenvalues and if at least one of the eigenvalues is actually 0, then the matrix is called a positive semidefinite matrix. Definition 4.7: If a matrix is either positive definite or positive semidefinite, the matrix is defined to be a nonnegative matrix. If you are familiar with covariance and correlation matrices, they are always nonnegative! Example: Using R (basic_matrix_algebra.R) All eigenvalues are positive so A is a positive definite matrix.