Chapter 5: Matrix approach to simple linear regression analysis
You need to understand matrix algebra for multiple
regression! Fox’s Section 8.2 contains information about
how to use R for matrix algebra.
5.1 Matrices
What is a matrix?
“A matrix is a rectangular array of elements arranged in
rows and columns” (p. 176 of KNN)
Example:

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

Dimension – size of a matrix: # rows × # columns = r×c

Example: the above matrix is 2×3.
Symbolic representation of a matrix:

Example:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$
 2012 Christopher R. Bilder
5.2
where a_ij is the element in row i and column j of A; a11 = 1 in the first example above.
Notice that the matrix A is in bold. When bolding is not
possible (writing on a piece of paper or chalkboard), the
letter is underlined - A
a11 is often called the “(1,1) element” of A, a12 is called
the “(1,2) element” of A,…
Example: r×c matrix

$$\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2c} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ic} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rj} & \cdots & a_{rc}
\end{bmatrix}$$
Example: A square matrix is an r×c matrix with r = c.
Example: HS and College GPA
 2012 Christopher R. Bilder
5.3
1 3.04 
1 2.35 


1 2.70 
X



1 2.28 


1 1.88 
The above 202 matrix contains the HS GPAs in the
second column.
Vector – an r×1 (column vector) or 1×c (row vector) matrix – a special case of a matrix
Example: Symbolic representation of a 3×1 column vector

$$\mathbf{A} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$$
Example: HS and College GPA
3.10 
2.30 


3.00 
Y



2.20 


1.60


The above 201 vector contains the College GPAs.
 2012 Christopher R. Bilder
5.4
Transpose: Interchange the rows and columns of a matrix or
vector
Example:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\quad\text{and}\quad
\mathbf{A}' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{bmatrix}$$

A is 2×3 and A′ is 3×2.

The ′ symbol indicates a transpose, and it is said as the word "prime". Thus, the transpose of A is "A prime".
Example: HS and College GPA
$$\mathbf{Y}' = [\,3.10\;\; 2.30\;\; 3.00\;\; \cdots\;\; 2.20\;\; 1.60\,]$$
 2012 Christopher R. Bilder
5.5
5.2 Matrix addition and subtraction
Add or subtract the corresponding elements of matrices
with the same dimension.
Example:

Suppose $\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} -1 & 10 & -1 \\ 5 & 5 & 8 \end{bmatrix}$. Then

$$\mathbf{A}+\mathbf{B} = \begin{bmatrix} 0 & 12 & 2 \\ 9 & 10 & 14 \end{bmatrix}
\quad\text{and}\quad
\mathbf{A}-\mathbf{B} = \begin{bmatrix} 2 & -8 & 4 \\ -1 & 0 & -2 \end{bmatrix}.$$
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
> class(A)
[1] "matrix"
> B<-matrix(data = c(-1, 10, -1, 5, 5, 8), nrow = 2, ncol = 3, byrow = TRUE)
> A+B
     [,1] [,2] [,3]
[1,]    0   12    2
[2,]    9   10   14
> A-B
     [,1] [,2] [,3]
[1,]    2   -8    4
[2,]   -1    0   -2
 2012 Christopher R. Bilder
5.6
Notes:
1. Be careful with the byrow option. By default, this is set to FALSE. Thus, the numbers would be entered into the matrix by columns. For example,
> matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
2. The class of these objects is “matrix”.
3. A vector can be represented as a "matrix" class type or a type of its own.
> y<-matrix(data = c(1,2,3), nrow = 3, ncol = 1, byrow = TRUE)
> y
     [,1]
[1,]    1
[2,]    2
[3,]    3
> class(y)
[1] "matrix"
> x<-c(1,2,3)
> x
[1] 1 2 3
> class(x)
[1] "numeric"
> is.vector(x)
[1] TRUE
This can present some confusion when vectors are
multiplied with other vectors or matrices because no
specific row or column dimensions are given. More on
this shortly.
 2012 Christopher R. Bilder
5.7
4. A transpose of a matrix can be done using the t() function. For example,
> t(A)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
Example: Simple linear regression model

Yi = E(Yi) + εi for i = 1,…,n can be represented as Y = E(Y) + ε where

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix},\quad
\mathrm{E}(\mathbf{Y}) = \begin{bmatrix} \mathrm{E}(Y_1) \\ \mathrm{E}(Y_2) \\ \vdots \\ \mathrm{E}(Y_n) \end{bmatrix},\quad\text{and}\quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
 2012 Christopher R. Bilder
5.8
5.3 Matrix multiplication
Scalar – a 1×1 matrix

Example: Matrix multiplied by a scalar

$$c\mathbf{A} = \begin{bmatrix} ca_{11} & ca_{12} & ca_{13} \\ ca_{21} & ca_{22} & ca_{23} \end{bmatrix}$$ where c is a scalar.

Let $\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ and c = 2. Then $2\mathbf{A} = \begin{bmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \end{bmatrix}$.
Multiplying two matrices
Suppose you want to multiply the matrices A and B; i.e.,
AB or AB. In order to do this, you need the number of
columns of A to be the same as the number of rows as
B. For example, suppose A is 23 and B is 310. You
can multiply these matrices. However if B is 410
instead, these matrices could NOT be multiplied.
The resulting dimension of C = AB:
1. The number of rows of A is the number of rows of C.
2. The number of columns of B is the number of columns of C.
3. In other words, $\underset{w\times y}{\mathbf{C}} = \underset{w\times z}{\mathbf{A}}\;\underset{z\times y}{\mathbf{B}}$, where the dimensions of the matrices are shown below them.
 2012 Christopher R. Bilder
5.9
How to multiply two matrices – an example
3 0 
 1 2 3
 1 2  . Notice that A
and
B

Suppose A  



4 5 6
0 1
is 23 and B is 32 so C=AB can be done.
3 0 
 1 2 3 

C  AB  
1
2

4
5
6

 0 1


 1 3  2  1  3  0 1 0  2  2  3  1 


 4  3  5  1  6  0 4  0  5  2  6  1
5 7


17 16 
The "cross product" of the rows of A and the columns of B is taken to form C.

In the above example, D = BA ≠ AB, where BA is:
 2012 Christopher R. Bilder
5.10
3 0 
1 2


BA  1 2 

 4 5
0 1
3  1  0  4
  1 1  2  4

 0  1  1 4
3
6 
3  2  0  5 3  3  0  6
1 2  2  5 1 3  2  6 

0  2  1 5 0  3  1 6 
3 6 9 
 9 12 15 


 4 5 6 
In general, for a 2×3 matrix times a 3×2 matrix:

$$\mathbf{C} = \mathbf{AB} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
= \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\
a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \end{bmatrix}$$
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
> B<-matrix(data = c(3, 0, 1, 2, 0, 1), nrow = 3, ncol = 2, byrow = TRUE)
 2012 Christopher R. Bilder
5.11
> C<-A%*%B
> D<-B%*%A
> C
     [,1] [,2]
[1,]    5    7
[2,]   17   16
> D
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    9   12   15
[3,]    4    5    6
> #What is A*B?
> A*B
Error in A * B : non-conformable arrays
Notes:
1. %*% is used for multiplying matrices and/or vectors.
2. * means to perform elementwise multiplication. Here is an example where this can be done:
> E<-A
> A*E
     [,1] [,2] [,3]
[1,]    1    4    9
[2,]   16   25   36
The (i,j) elements of each matrix are multiplied together.
3. Multiplying vectors with other vectors or matrices can be confusing since no row or column dimensions are given for a vector object. For example, suppose $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, a 3×1 vector.
> x<-c(1,2,3)
> x%*%x
     [,1]
[1,]   14
> A%*%x
     [,1]
[1,]   14
[2,]   32
How does R know that we want x′x (1×1) instead of xx′ (3×3) when we have not told R that x is 3×1? Similarly, how does R know that Ax is 2×1? From the R help for %*% in the base package:
Multiplies two matrices, if they are conformable. If
one argument is a vector, it will be promoted to
either a row or column matrix to make the two
arguments conformable. If both are vectors it will
return the inner product.
An inner product produces a scalar value. If you want xx′ (3×3), you can use the outer product operator %o%:
> x%o%x #outer product
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9
 2012 Christopher R. Bilder
5.13
We will only need to use %*% in this class.
Example: HS and College GPA (HS_college_GPA_ch5.R)
1
1

1
X

1

1
3.04 
3.10 
2.30 
2.35 



3.00 
2.70 
Y

and






2.20 
2.28 



1.88 
1.60 
Find XX, XY, and YY
> #Read in the data
> gpa<-read.table(file = "C:\\chris\\UNL\\STAT870\\Chapter1\\gpa.txt", header = TRUE, sep = "")
> head(gpa)
  HS.GPA College.GPA
1   3.04         3.1
2   2.35         2.3
3   2.70         3.0
4   2.05         1.9
5   2.83         2.5
6   4.32         3.7
> X<-cbind(1, gpa$HS.GPA)
> Y<-gpa$College.GPA
> X
      [,1] [,2]
 [1,]    1 3.04
 [2,]    1 2.35
 [3,]    1 2.70
 [4,]    1 2.05
 [5,]    1 2.83
 [6,]    1 4.32
 [7,]    1 3.39
 [8,]    1 2.32
 [9,]    1 2.69
[10,]    1 0.83
[11,]    1 2.39
[12,]    1 3.65
[13,]    1 1.85
[14,]    1 3.83
[15,]    1 1.22
[16,]    1 1.48
[17,]    1 2.28
[18,]    1 4.00
[19,]    1 2.28
[20,]    1 1.88
> Y
 [1] 3.1 2.3 3.0 1.9 2.5 3.7 3.4 2.6 2.8 1.6 2.0 2.9 2.3 3.2 1.8 1.4 2.0 3.8 2.2
[20] 1.6
> t(X)%*%X
      [,1]     [,2]
[1,] 20.00  51.3800
[2,] 51.38 148.4634
> t(X)%*%Y
        [,1]
[1,]  50.100
[2,] 140.229
> t(Y)%*%Y
       [,1]
[1,] 135.15
Notes:
 2012 Christopher R. Bilder
5.15
1. The cbind() function combines items by “c”olumns.
Since 1 is only one element, it will replicate itself for
all elements that you are combining so that one full
matrix is formed. There is also an rbind() function that combines by rows. Thus, rbind(a, b) forms a matrix
with a above b.
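For example, with small made-up vectors (standard R behavior):
> cbind(1, c(5, 6, 7)) #the 1 is recycled to fill the first column
     [,1] [,2]
[1,]    1    5
[2,]    1    6
[3,]    1    7
> rbind(c(1, 2), c(3, 4)) #first argument appears above the second
     [,1] [,2]
[1,]    1    2
[2,]    3    4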
2. $$\mathbf{Y}'\mathbf{Y} = [\,y_1, y_2, \ldots, y_n\,]\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} y_i^2$$
3. $$\mathbf{X}'\mathbf{Y} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} x_{i1}y_i \\ \sum_{i=1}^{n} x_{i2}y_i \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i2}y_i \end{bmatrix}$$
since x11 = … = xn1 = 1
4. $$\mathbf{X}'\mathbf{X} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \end{bmatrix}
\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}
= \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \end{bmatrix}
\begin{bmatrix} 1 & x_{12} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{n2} \end{bmatrix}
= \begin{bmatrix} n & \sum_{i=1}^{n} x_{i2} \\ \sum_{i=1}^{n} x_{i2} & \sum_{i=1}^{n} x_{i2}^2 \end{bmatrix}$$
5. Here’s another way to get the X matrix
> mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa)
> model.matrix(object = mod.fit)
   (Intercept) HS.GPA
1            1   3.04
2            1   2.35
3            1   2.70
4            1   2.05
5            1   2.83
6            1   4.32
7            1   3.39
8            1   2.32
9            1   2.69
10           1   0.83
11           1   2.39
12           1   3.65
13           1   1.85
14           1   3.83
15           1   1.22
16           1   1.48
17           1   2.28
18           1   4.00
19           1   2.28
20           1   1.88
attr(,"assign")
[1] 0 1
 2012 Christopher R. Bilder
5.17
5.4 Special types of matrices
Symmetric matrix: If A′ = A, then A is symmetric.
 1 2
 1 2

,
A

Example: A  

2 3 
2 3 


Diagonal matrix: A square matrix whose “off-diagonal”
elements are 0.
Example: $$\mathbf{A} = \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}$$
Identity matrix: A diagonal matrix with 1’s on the diagonal.
1 0 0
Example: I  0 1 0 


0 0 1
Note that “I” (the letter I, not the number one) usually
denotes the identity matrix.
Vector and matrix of 1’s
1
1
A column vector of 1’s: 1  j   
r 1
 
r 1
 
1
 2012 Christopher R. Bilder
5.18
1 1
1 1
A matrix of 1’s: J  
r r


1 1
Notes:
1. $\mathbf{j}'_{1\times r}\,\mathbf{j}_{r\times 1} = r$
2. $\mathbf{j}_{r\times 1}\,\mathbf{j}'_{1\times r} = \mathbf{J}_{r\times r}$
3. $\mathbf{J}_{r\times r}\,\mathbf{J}_{r\times r} = r\,\mathbf{J}_{r\times r}$
0 
0 
Vector of 0’s: 0   
r 1
 
 
0 
 2012 Christopher R. Bilder
5.19
5.5 Linear dependence and rank of matrix
1 2 6 
Let A  3 4 12  . Think of each column of A as a


5 6 18 
vector; i.e., A = [A1, A2, A3]. Note that 3A2=A3. This
means the columns of A are “linearly dependent.”
Formally, a set of column vectors are linearly dependent
if there exists constants 1, 2,…, n (not all zero) such
that 1A1+2A2+…+cAc=0. A set of column vectors are
linearly independent if 1A1+2A2+…+cAc=0 only for
1=2=…=c=0 .
The rank of a matrix is the maximum number of linearly
independent columns in the matrix.
rank(A)=2
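R does not have a rank function for matrices (rank() ranks the values in a vector), but the QR decomposition reports the matrix rank. A quick check of the example above:

> A<-matrix(data = c(1, 2, 6, 3, 4, 12, 5, 6, 18), nrow = 3, ncol = 3, byrow = TRUE)
> qr(A)$rank
[1] 2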
 2012 Christopher R. Bilder
5.20
5.6 Inverse of a matrix
Note that the inverse of a scalar, say b, is b⁻¹. For example, the inverse of b = 3 is 3⁻¹ = 1/3. Also, bb⁻¹ = 1. In matrix algebra, the inverse of a matrix is another matrix. For example, the inverse of A is A⁻¹, and AA⁻¹ = A⁻¹A = I. Note that A must be a square matrix.
Example: $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $\mathbf{A}^{-1} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}$
Check:

$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 1(-2) + 2(1.5) & 1(1) + 2(-0.5) \\ 3(-2) + 4(1.5) & 3(1) + 4(-0.5) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Finding the inverse

For a general way, see a matrix algebra book (such as Kolman, 1988). For a 2×2 matrix, there is a simple formula. Let $\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then

$$\mathbf{A}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.$$

Verify AA⁻¹ = I on your own.
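One way to do this check is a short sketch of the 2×2 formula in R, run against the example above (the function name inv2 is made up for illustration):

> inv2<-function(A) { matrix(data = c(A[2,2], -A[2,1], -A[1,2], A[1,1]), nrow = 2, ncol = 2) / (A[1,1]*A[2,2] - A[1,2]*A[2,1]) }
> inv2(matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE))
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5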
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)
 2012 Christopher R. Bilder
5.21
> solve(A)
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5
> A%*%solve(A)
     [,1]         [,2]
[1,]    1 1.110223e-16
[2,]    0 1.000000e+00
> solve(A)%*%A
             [,1] [,2]
[1,] 1.000000e+00    0
[2,] 1.110223e-16    1
> round(solve(A)%*%A, 2)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
The solve() function inverts a matrix in R. The solve(A, b) version can also be used to "solve" for x in Ax = b since A⁻¹Ax = A⁻¹b ⟹ Ix = A⁻¹b ⟹ x = A⁻¹b.
Example: HS and College GPA (HS_college_GPA_ch5.R)
1 3.04 
3.10 
1 2.35 
2.30 




1 2.70 
3.00 
Remember that X  
 and Y  





1 2.28 
2.20 




1 1.88 
1.60 
Find (XX)-1 and (XX)-1XY
 2012 Christopher R. Bilder
5.22
> solve(t(X)%*%X)
           [,1]        [,2]
[1,]  0.4507584 -0.15599781
[2,] -0.1559978  0.06072316
> solve(t(X)%*%X) %*% t(X)%*%Y
          [,1]
[1,] 0.7075776
[2,] 0.6996584
From previous output:
> mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa)
> mod.fit$coefficients
(Intercept)      HS.GPA
  0.7075776   0.6996584
Note that $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$!!!
> #Another way to get (X'X)^(-1)
> sum.fit<-summary(mod.fit)
> sum.fit$cov.unscaled
            (Intercept)      HS.GPA
(Intercept)   0.4507584 -0.15599781
HS.GPA       -0.1559978  0.06072316
Read “Uses of Inverse Matrix” on p. 192 of KNN on your
own.
 2012 Christopher R. Bilder
5.23
5.7 Some basic theorems of matrices
Read on your own.
5.8 Random vectors and matrices
A random vector or random matrix contains elements that
are random variables.
Example: Simple linear regression model

Yi = E(Yi) + εi for i = 1,…,n can be represented as Y = E(Y) + ε where

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\quad\text{and}\quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

are random vectors.
Expectation of random vector or matrix: Find the expected
value of the individual elements
Example:

$$\mathrm{E}(\mathbf{Y}) = \mathrm{E}\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \mathrm{E}(Y_1) \\ \mathrm{E}(Y_2) \\ \vdots \\ \mathrm{E}(Y_n) \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}$$
 2012 Christopher R. Bilder
5.24
$$\mathrm{E}(\boldsymbol{\varepsilon}) = \mathrm{E}\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= \begin{bmatrix} \mathrm{E}(\varepsilon_1) \\ \mathrm{E}(\varepsilon_2) \\ \vdots \\ \mathrm{E}(\varepsilon_n) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

since we assume εi ~ N(0, σ²).
Variance-covariance matrix of a random vector
Let Z1 and Z2 be random variables. Remember that Var(Z1) = E[(Z1 − μ1)²] where E(Z1) = μ1.
The covariance of Z1 and Z2 is defined as Cov(Z1, Z2) = E[(Z1 − μ1)(Z2 − μ2)]. The covariance measures the relationship between Z1 and Z2. See p. 4.26 of my Chapter 4 STAT 380 notes for a more mathematical explanation (http://www.chrisbilder.com/stat380/schedule.htm). Note that the correlation between Z1 and Z2 is

$$\mathrm{Corr}(Z_1, Z_2) = \frac{\mathrm{Cov}(Z_1, Z_2)}{\sqrt{\mathrm{Var}(Z_1)\,\mathrm{Var}(Z_2)}}.$$
The "Pearson correlation coefficient", denoted by r, estimates Corr(Z1, Z2).
 2012 Christopher R. Bilder
5.25
Remember that

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]\left[\sum_{i=1}^{n}(Y_i - \bar{Y})^2\right]}}$$

where X = Z1 and Y = Z2.
Notes:
• Cov(Z1, Z1) = E[(Z1 − μ1)(Z1 − μ1)] = E[(Z1 − μ1)²] = Var(Z1)
• If Z1 and Z2 are independent, then Cov(Z1, Z2) = 0.
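For the GPA data read in earlier, r can be computed with the base cor() function; the rounded value below follows from the X′X, X′Y, and Y′Y totals shown in the Section 5.3 example:

> round(cor(gpa$HS.GPA, gpa$College.GPA), 3)
[1] 0.914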
A variance-covariance matrix (most often just called the
covariance matrix) is a matrix whose elements are the
variances and covariances of random variables.
Example: Let $\mathbf{Z} = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}$. The covariance matrix of Z is

$$\mathrm{Cov}(\mathbf{Z}) = \begin{bmatrix} \mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) \\ \mathrm{Cov}(Z_1, Z_2) & \mathrm{Var}(Z_2) \end{bmatrix}$$
Note that Cov(Z1,Z2)=Cov(Z2,Z1) and Cov(Zi,Zi)=Var(Zi)
KNN denote Cov(Z) by σ²{Z}. The notation that I am using is much more prevalent, and there is less chance for confusion (like Z appearing to be multiplied by σ² in σ²{Z}).
Example: Simple linear regression model
 2012 Christopher R. Bilder
5.26
$$\mathrm{Cov}(\boldsymbol{\varepsilon}) = \begin{bmatrix}
\mathrm{Var}(\varepsilon_1) & 0 & \cdots & 0 \\
0 & \mathrm{Var}(\varepsilon_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathrm{Var}(\varepsilon_n)
\end{bmatrix}
= \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix} = \sigma^2\mathbf{I}$$

since Cov(εi, εj) = 0 for i ≠ j (remember that the εi are INDEPENDENT N(0, σ²)).
What is Cov(Y)?
Note that all covariance matrices are symmetric!
Some basic results
Let W = AY where Y is a random vector and A is a
matrix of constants (no random variables in it).
The following results follow:
1. E(A) = A
Remember this is like saying E(3) = 3
 2012 Christopher R. Bilder
5.27
2. E(W) = E(AY) = AE(Y)
Again, this is like saying E(3Y1) = 3E(Y1)
3. Cov(W) = Cov(AY) = ACov(Y)A′
You may have seen before that Var(aY1) = a²Var(Y1) where a is a constant.
Example:

Let $\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ and $\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$. Then $\mathbf{W} = \mathbf{AY} = \begin{bmatrix} Y_1 + Y_2 \\ Y_1 \end{bmatrix}$.

Result 2:

$$\mathrm{E}(\mathbf{W}) = \mathrm{E}\left(\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}\right)
= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\mathrm{E}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
= \begin{bmatrix} \mathrm{E}(Y_1) + \mathrm{E}(Y_2) \\ \mathrm{E}(Y_1) \end{bmatrix}$$

Result 3:

$$\begin{aligned}
\mathrm{Cov}(\mathbf{W}) &= \mathrm{Cov}\left(\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}\right)
= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\mathrm{Cov}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}' \\
&= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1, Y_2) \\ \mathrm{Cov}(Y_1, Y_2) & \mathrm{Var}(Y_2) \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \\
&= \begin{bmatrix} \mathrm{Var}(Y_1) + \mathrm{Cov}(Y_1, Y_2) & \mathrm{Cov}(Y_1, Y_2) + \mathrm{Var}(Y_2) \\ \mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1, Y_2) \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \\
&= \begin{bmatrix} \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + 2\mathrm{Cov}(Y_1, Y_2) & \mathrm{Var}(Y_1) + \mathrm{Cov}(Y_1, Y_2) \\ \mathrm{Var}(Y_1) + \mathrm{Cov}(Y_1, Y_2) & \mathrm{Var}(Y_1) \end{bmatrix}
\end{aligned}$$
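A small simulation sketch of result 3 (not from the course programs; all object names are made up): generate many independent draws of Y with Var(Y1) = 1, Var(Y2) = 4, and Cov(Y1, Y2) = 0, form W = AY for each draw, and compare the sample covariance matrix of W to A Cov(Y) A′, which here works out to [[5, 1], [1, 1]].

> set.seed(8881) #any seed works; the result is approximate
> A<-matrix(data = c(1, 1, 1, 0), nrow = 2, ncol = 2, byrow = TRUE)
> Y.sim<-cbind(rnorm(n = 100000, mean = 0, sd = 1), rnorm(n = 100000, mean = 0, sd = 2))
> W<-t(A%*%t(Y.sim)) #each row of W is A times the corresponding row of Y.sim
> cov(W) #approximately matrix(data = c(5, 1, 1, 1), nrow = 2, ncol = 2)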
 2012 Christopher R. Bilder
5.29
5.9 Simple linear regression model in matrix terms
Yi = β0 + β1Xi + εi where εi ~ independent N(0, σ²) and i = 1,…,n

The model can be rewritten as

Y1 = β0 + β1X1 + ε1
Y2 = β0 + β1X2 + ε2
⋮
Yn = β0 + β1Xn + εn
In matrix terms, let

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix},\quad
\mathbf{X} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix},\quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},\quad\text{and}\quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
Then Y = Xβ + ε, which is

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 X_1 + \varepsilon_1 \\ \beta_0 + \beta_1 X_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 X_n + \varepsilon_n \end{bmatrix}$$
Note that E(Y) = E(Xβ + ε) = E(Xβ) + E(ε) = Xβ since E(ε) = 0 and X and β are constants.
 2012 Christopher R. Bilder
5.30
5.10 Least squares estimation of regression parameters
From Chapter 1: The least squares method tries to find the b0 and b1 such that SSE = Σ(Yi − Ŷi)² = Σ(residual)² is minimized. Formulas for b0 and b1 are derived using calculus.
It can be shown that b0 and b1 can be found by solving the "normal equations":

$$nb_0 + b_1\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$b_0\sum_{i=1}^{n} X_i + b_1\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$$
The normal equations can be rewritten as X′Xb = X′Y where $\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$. In Section 5.3, it was shown that

$$\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
\quad\text{and}\quad
\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}$$
Thus, X′Xb = X′Y can be written as

$$\begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
\;\Longleftrightarrow\;
\begin{bmatrix} nb_0 + b_1\sum_{i=1}^{n} X_i \\ b_0\sum_{i=1}^{n} X_i + b_1\sum_{i=1}^{n} X_i^2 \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}$$
Suppose both sides of X′Xb = X′Y are multiplied by (X′X)⁻¹. Then

(X′X)⁻¹X′Xb = (X′X)⁻¹X′Y
Ib = (X′X)⁻¹X′Y
b = (X′X)⁻¹X′Y

Therefore, we have a way to find b using matrix algebra!
Note that

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
= \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
= \frac{1}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}
\begin{bmatrix} \sum_{i=1}^{n} X_i^2 & -\sum_{i=1}^{n} X_i \\ -\sum_{i=1}^{n} X_i & n \end{bmatrix}
\begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}$$

$$= \frac{1}{n\sum_{i=1}^{n}(X_i - \bar{X})^2}
\begin{bmatrix} \sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i \\ n\sum X_i Y_i - \sum X_i \sum Y_i \end{bmatrix}
= \begin{bmatrix} \bar{Y} - b_1\bar{X} \\ \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \end{bmatrix}
= \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$$

using the 2×2 inverse formula from Section 5.6, which matches the b0 and b1 formulas from Chapter 1.
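A quick numeric check of these scalar formulas against the matrix result, reusing the gpa data frame loaded earlier (cov() and var() both divide by n − 1, so that factor cancels in the ratio for b1):

> b1<-cov(gpa$HS.GPA, gpa$College.GPA)/var(gpa$HS.GPA)
> b0<-mean(gpa$College.GPA) - b1*mean(gpa$HS.GPA)
> c(b0, b1)
[1] 0.7075776 0.6996584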
Example: HS and College GPA (HS_college_GPA_ch5.R)
1 3.04 
3.10 
1 2.35 
2.30 




1 2.70 
3.00 
Remember that X  
 and Y  





1 2.28 
2.20 




1 1.88 
1.60 
Find b  ( X X )1 X Y . We already did this on p. 5.22.
> solve(t(X)%*%X) %*% t(X)%*%Y
          [,1]
[1,] 0.7075776
[2,] 0.6996584
 2012 Christopher R. Bilder
5.33
> #Using the formulation of solve for x in Ax = b
> solve(t(X)%*%X, t(X)%*%Y)
          [,1]
[1,] 0.7075776
[2,] 0.6996584
From previous output:
> mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa)
> mod.fit$coefficients
(Intercept)      HS.GPA
  0.7075776   0.6996584
 2012 Christopher R. Bilder
5.34
5.11 Fitted values and residuals
Let $\hat{\mathbf{Y}} = \begin{bmatrix} \hat{Y}_1 \\ \vdots \\ \hat{Y}_n \end{bmatrix}$. Then $\hat{\mathbf{Y}} = \mathbf{Xb}$ since

$$\begin{bmatrix} \hat{Y}_1 \\ \vdots \\ \hat{Y}_n \end{bmatrix}
= \begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
= \begin{bmatrix} b_0 + b_1 X_1 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}$$
Hat matrix: $\hat{\mathbf{Y}} = \mathbf{Xb} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{HY}$, where H = X(X′X)⁻¹X′ is the hat matrix.

Why is this called the hat matrix? Because it puts the "hat" on Y: premultiplying Y by H produces Ŷ. We will use it in Chapter 10 to measure the influence of observations on the estimated regression line.
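As a quick numeric check with the GPA data (X and Y as defined earlier; H is computed the same way in the course code in Section 5.12):

> H<-X%*%solve(t(X)%*%X)%*%t(X)
> head(H%*%Y, n = 3) #first few elements of HY, matching Y.hat below
         [,1]
[1,] 2.834539
[2,] 2.351775
[3,] 2.596655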
Residuals:

Let $$\mathbf{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} Y_1 - \hat{Y}_1 \\ \vdots \\ Y_n - \hat{Y}_n \end{bmatrix}
= \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$
Covariance matrix of the residuals:

Cov(e) = Cov[(I − H)Y] = (I − H) Cov(Y) (I − H)′
 2012 Christopher R. Bilder
5.35
Now, Cov(Y) = Cov(Xβ + ε) = Cov(ε) since X and β are constants. And, Cov(ε) = σ²I.
Also, (I − H)′ = (I − H) and (I − H)(I − H) = (I − H); (I − H) is called a symmetric "idempotent" matrix. You can use matrix algebra to verify this on your own (replace H with X(X′X)⁻¹X′ and multiply out).
Thus, Cov(e) = (I − H)σ²I(I − H)′ = σ²(I − H).
The estimated covariance matrix of the residuals is then $\widehat{\mathrm{Cov}}(\mathbf{e}) = \mathrm{MSE}\cdot(\mathbf{I} - \mathbf{H}) = \hat{\sigma}^2(\mathbf{I} - \mathbf{H})$, say. Note that KNN would denote this as s²{e}; the $\widehat{\mathrm{Cov}}(\mathbf{e})$ notation is more predominantly used.
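A quick numeric check of the symmetric and idempotent properties with the GPA hat matrix (n = 20 here; all.equal() allows for floating-point rounding, so both comparisons should return TRUE):

> all.equal(t(diag(20) - H), diag(20) - H)
[1] TRUE
> all.equal((diag(20) - H)%*%(diag(20) - H), diag(20) - H)
[1] TRUE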
 2012 Christopher R. Bilder
5.36
5.12 Analysis of variance results
Sums of Squares
From Chapters 1 and 2:

$$\mathrm{SSTO} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2,\quad
\mathrm{SSE} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2,\quad\text{and}\quad
\mathrm{SSR} = \mathrm{SSTO} - \mathrm{SSE} = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$$
These can be rewritten using matrices:
$$\mathrm{SSTO} = \mathbf{Y}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}
= \sum_{i=1}^{n} Y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2
= \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
$$\mathrm{SSE} = \mathbf{e}'\mathbf{e}
= [\,e_1 \cdots e_n\,]\begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}
= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$
$$\mathrm{SSR} = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y} = \mathrm{SSTO} - \mathrm{SSE}$$
Example: HS and College GPA (HS_college_GPA_ch5.R)
Continuing the same program
> n<-length(Y)
> #Cannot use nrow(Y) here since R does not know Y's dimension
> #Could use nrow(X) as well - X is a nx2 matrix
> b<-solve(t(X)%*%X) %*% t(X)%*%Y
> Y.hat<-X%*%b
> e<-Y-Y.hat
> H<-X%*%solve(t(X)%*%X)%*%t(X)
> J<-matrix(data = 1, nrow = n, ncol = n)
> #Notice that R will repeat 1 the correct number of times to fill the matrix
> SSTO<-t(Y)%*%Y - 1/n%*%t(Y)%*%J%*%Y
> SSE<-t(e)%*%e
> MSE<-SSE/(n-nrow(b))
> #Notice how MSE uses * to do the multiplying since it is a scalar. Also, diag(n) creates an identity matrix; diag(c(1,2)) would create a 2x2 diagonal matrix with elements 1 and 2 on the diagonal.
> Cov.e<-MSE*(diag(n) - H) #Does not work!
Error in MSE * (diag(n) - H) : non-conformable arrays
> Cov.e<-as.numeric(MSE)*(diag(n) - H) #The as.numeric() removes the 1x1 matrix meaning from MSE
> SSR = SSTO - SSE
> data.frame(X, Y, Y.hat, e)
   X1   X2   Y    Y.hat           e
1   1 3.04 3.1 2.834539  0.26546091
2   1 2.35 2.3 2.351775 -0.05177482
3   1 2.70 3.0 2.596655  0.40334475
4   1 2.05 1.9 2.141877 -0.24187731
5   1 2.83 2.5 2.687611 -0.18761083
6   1 4.32 3.7 3.730102 -0.03010181
7   1 3.39 3.4 3.079420  0.32058048
8   1 2.32 2.6 2.330785  0.26921493
9   1 2.69 2.8 2.589659  0.21034134
10  1 0.83 1.6 1.288294  0.31170591
11  1 2.39 2.0 2.379761 -0.37976115
12  1 3.65 2.9 3.261331 -0.36133070
13  1 1.85 2.3 2.001946  0.29805437
14  1 3.83 3.2 3.387269 -0.18726921
15  1 1.22 1.8 1.561161  0.23883914
16  1 1.48 1.4 1.743072 -0.34307203
17  1 2.28 2.0 2.302799 -0.30279873
18  1 4.00 3.8 3.506211  0.29378887
19  1 2.28 2.2 2.302799 -0.10279873
20  1 1.88 1.6 2.022935 -0.42293538
> data.frame(n, SSTO, SSE, MSE, SSR)
   n   SSTO      SSE        MSE      SSR
1 20 9.6495 1.587966 0.08822035 8.061534
> round(Cov.e[1:5, 1:5], 6) #5x5 part of Cov(e)
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,]  0.082621 -0.003858 -0.004742 -0.003101 -0.005070
[2,] -0.003858  0.083552 -0.004257 -0.005020 -0.004105
[3,] -0.004742 -0.004257  0.083717 -0.004047 -0.004594
[4,] -0.003101 -0.005020 -0.004047  0.082366 -0.003685
[5,] -0.005070 -0.004105 -0.004594 -0.003685  0.083444
> #From past work
> data.frame(X = model.matrix(mod.fit), Y.hat = mod.fit$fitted.values, e = mod.fit$residuals)
   X..Intercept. X.HS.GPA    Y.hat           e
1              1     3.04 2.834539  0.26546091
2              1     2.35 2.351775 -0.05177482
3              1     2.70 2.596655  0.40334475
4              1     2.05 2.141877 -0.24187731
5              1     2.83 2.687611 -0.18761083
6              1     4.32 3.730102 -0.03010181
7              1     3.39 3.079420  0.32058048
8              1     2.32 2.330785  0.26921493
9              1     2.69 2.589659  0.21034134
10             1     0.83 1.288294  0.31170591
11             1     2.39 2.379761 -0.37976115
12             1     3.65 3.261331 -0.36133070
13             1     1.85 2.001946  0.29805437
14             1     3.83 3.387269 -0.18726921
15             1     1.22 1.561161  0.23883914
16             1     1.48 1.743072 -0.34307203
17             1     2.28 2.302799 -0.30279873
18             1     4.00 3.506211  0.29378887
19             1     2.28 2.302799 -0.10279873
20             1     1.88 2.022935 -0.42293538
> anova(mod.fit)
Analysis of Variance Table

Response: College.GPA
          Df Sum Sq Mean Sq F value    Pr(>F)
HS.GPA     1 8.0615  8.0615   91.38 1.779e-08 ***
Residuals 18 1.5880  0.0882
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 2012 Christopher R. Bilder
5.40
5.13 Inferences in Regression Analysis
Covariance matrix of b:

$$\mathrm{Cov}(\mathbf{b}) = \begin{bmatrix} \mathrm{Var}(b_0) & \mathrm{Cov}(b_0, b_1) \\ \mathrm{Cov}(b_0, b_1) & \mathrm{Var}(b_1) \end{bmatrix} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$
Why?

Cov(b) = Cov[(X′X)⁻¹X′Y]
       = (X′X)⁻¹X′ Cov(Y) [(X′X)⁻¹X′]′
       = (X′X)⁻¹X′ Cov(Y) X(X′X)⁻¹
       = (X′X)⁻¹X′ σ²I X(X′X)⁻¹
       = σ²(X′X)⁻¹X′X(X′X)⁻¹
       = σ²(X′X)⁻¹

Remember that for two matrices C and B, (CB)′ = B′C′.
Estimated covariance matrix of b:

$$\widehat{\mathrm{Cov}}(\mathbf{b}) = \begin{bmatrix} \widehat{\mathrm{Var}}(b_0) & \widehat{\mathrm{Cov}}(b_0, b_1) \\ \widehat{\mathrm{Cov}}(b_0, b_1) & \widehat{\mathrm{Var}}(b_1) \end{bmatrix}
= \mathrm{MSE}\begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum(X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum(X_i - \bar{X})^2} \\ \dfrac{-\bar{X}}{\sum(X_i - \bar{X})^2} & \dfrac{1}{\sum(X_i - \bar{X})^2} \end{bmatrix}
= \mathrm{MSE}\cdot(\mathbf{X}'\mathbf{X})^{-1}$$
 2012 Christopher R. Bilder
5.41
Note that Var(bi) is from Sections 2.1 and 2.2. The Cov(b0, b1) is new.
Estimated variance used in the C.I. for E(Yh):

$$\widehat{\mathrm{Var}}(\hat{Y}_h) = \mathrm{MSE}\left(\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right)
= \mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right]$$

where X_h′ = [1, X_h]
Why?

$$\mathrm{Var}(\hat{Y}_h) = \mathrm{Var}(\mathbf{X}_h'\mathbf{b}) = \mathbf{X}_h'\,\mathrm{Cov}(\mathbf{b})\,\mathbf{X}_h = \sigma^2\,\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h$$

Since b is a vector, I replaced Var() with Cov() (just notation).
Estimated variance used in the P.I. for Yh(new):

$$\widehat{\mathrm{Var}}(Y_{h(new)} - \hat{Y}_h) = \mathrm{MSE}\left(1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right)
= \mathrm{MSE}\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right]$$
Example: HS and College GPA (HS_college_GPA_ch5.R)
> cov.beta.hat<-as.numeric(MSE)*solve(t(X)%*%X)
> cov.beta.hat
            [,1]         [,2]
[1,]  0.03976606 -0.013762181
[2,] -0.01376218  0.005357019
> #From earlier
> sum.fit<-summary(mod.fit)
> sum.fit$cov.unscaled #Another way to get (X'X)^(-1)
            (Intercept)      HS.GPA
(Intercept)   0.4507584 -0.15599781
HS.GPA       -0.1559978  0.06072316
> sum.fit$sigma^2 * sum.fit$cov.unscaled
            (Intercept)       HS.GPA
(Intercept)  0.03976606 -0.013762181
HS.GPA      -0.01376218  0.005357019
> #One more way to get the covariance matrix
> vcov(mod.fit)
            (Intercept)       HS.GPA
(Intercept)  0.03976606 -0.013762181
HS.GPA      -0.01376218  0.005357019
> #Find Var^(Y^_h)
> X.h<-c(1, 1.88)
> as.numeric(MSE)*X.h%*%solve(t(X)%*%X)%*%X.h
            [,1]
[1,] 0.006954107
> #Find Var^(Y_h(new) - Y^_h)
> as.numeric(MSE)*(1 + X.h%*%solve(t(X)%*%X)%*%X.h)
           [,1]
[1,] 0.09517446
> #From HS_college_GPA_ch2.R
> n<-nrow(gpa)
> X.h<-1.88
> Y.hat.h<-as.numeric(mod.fit$coefficients[1] + mod.fit$coefficients[2]*X.h)
> #as.numeric just removes an unneeded name
> ssx<-var(gpa$HS.GPA)*(n-1) #SUM( (X_i - X_bar)^2 )
> X.bar<-mean(gpa$HS.GPA)
> MSE * (1/n + (X.h - X.bar)^2 / ssx) #Taken from the C.I. formula
            [,1]
[1,] 0.006954107
> MSE * (1 + 1/n + (X.h - X.bar)^2 / ssx) #Taken from the P.I. formula
           [,1]
[1,] 0.09517446
 2012 Christopher R. Bilder
Download