Week 5: Matrix Approach to SLR (Chapter 5)

Matrix algebra is widely used in statistics, and it is essential in multiple linear regression for presenting systems of equations and large arrays of data succinctly. Before we discuss SLR in a matrix algebra context, I want to introduce the matrix algebra concepts that will be needed to understand the mathematical operations used later, especially in multiple linear regression (MLR).

Definition of a Matrix

A matrix is a rectangular array of elements arranged in rows and columns:

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{21} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{bmatrix}

An individual element is denoted by a_{ij}, so we write A = [a_{ij}], where i = 1 to r (the number of rows) and j = 1 to c (the number of columns). A matrix is square if the number of rows equals the number of columns. A vector is a matrix with only one column (i.e., a column vector) or only one row (i.e., a row vector). A matrix is often described by its dimensions; e.g., a matrix with 3 rows and 2 columns would be called a "3x2" matrix.

Transpose of a Matrix

The transpose of a matrix is another matrix in which the columns and rows of the original matrix have been exchanged with each other. For example, let A be a 3x2 matrix:

A = \begin{bmatrix} 1 & 3 \\ 5 & 8 \\ 12 & 15 \end{bmatrix}

The transpose of A (a 2x3 matrix) is then:

A' = \begin{bmatrix} 1 & 5 & 12 \\ 3 & 8 & 15 \end{bmatrix}

(Note: Excel has a transpose function under the Edit menu's "Paste Special" feature.)

Equal Matrices

Two matrices are said to be equal if they have the same dimensions (i.e., they both have the same number of rows and columns) and if all corresponding elements are equal. For example, for the following two 2x2 matrices,

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}   and   B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},

to be equal (i.e., A = B), we must have a_{11} = 1, a_{12} = 2, a_{21} = 3, and a_{22} = 4.

Matrix Addition and Subtraction

Two matrices must have the same dimensions to be added or subtracted. Addition and subtraction are applied to corresponding elements in the two matrices to get a new matrix. For example, if you have two 2x2 matrices,

A = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}   and   B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},

and you want to add them, you would get:

C = A + B = \begin{bmatrix} 5+1 & 6+2 \\ 7+3 & 8+4 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}.

If you subtracted them, you would get:

C = A - B = \begin{bmatrix} 5-1 & 6-2 \\ 7-3 & 8-4 \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}.

Matrix Multiplication by a Scalar

A scalar is simply a constant, a 1x1 matrix. You can multiply an r x c matrix by a scalar by multiplying each individual matrix element by the scalar. For example,

4A = 4 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 4(1) & 4(2) \\ 4(3) & 4(4) \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ 12 & 16 \end{bmatrix}.

Multiplication of Two Matrices

To multiply two matrices, the column dimension of the first matrix must equal the row dimension of the second matrix (note that, in matrix algebra, AB ≠ BA in general). If this condition is met, then taking the cross products of the rows of A with the columns of B, and summing these cross products, produces a new matrix that is the product of the two matrices. This may seem complicated, but an example will illustrate the procedure.

Example: Find the product of A and B, if

A = \begin{bmatrix} 1 & 5 & 2 \\ 3 & 8 & 5 \end{bmatrix}   and   B = \begin{bmatrix} 0 & 1 \\ 5 & 6 \\ 7 & 2 \end{bmatrix}.

The product AB will be a 2x2 matrix (2x2 = 2x3 * 3x2):

AB = \begin{bmatrix} 1 & 5 & 2 \\ 3 & 8 & 5 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 5 & 6 \\ 7 & 2 \end{bmatrix} = \begin{bmatrix} 1(0)+5(5)+2(7) & 1(1)+5(6)+2(2) \\ 3(0)+8(5)+5(7) & 3(1)+8(6)+5(2) \end{bmatrix} = \begin{bmatrix} 39 & 35 \\ 75 & 61 \end{bmatrix}.
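These elementwise and cross-product rules map directly onto array operations in software. Below is a minimal sketch in Python with NumPy (not part of the original notes; the array values are simply taken from the examples above) that verifies the addition, subtraction, scalar multiplication, transpose, and matrix multiplication results.

```python
import numpy as np

A = np.array([[5, 6],
              [7, 8]])
B = np.array([[1, 2],
              [3, 4]])

print(A + B)      # [[ 6  8] [10 12]]  elementwise addition
print(A - B)      # [[4 4] [4 4]]      elementwise subtraction
print(4 * B)      # [[ 4  8] [12 16]]  scalar multiplication

# Matrix multiplication: a 2x3 matrix times a 3x2 matrix gives a 2x2 matrix.
A2 = np.array([[1, 5, 2],
               [3, 8, 5]])
B2 = np.array([[0, 1],
               [5, 6],
               [7, 2]])
print(A2 @ B2)    # [[39 35] [75 61]]
print(A2.T)       # transpose: rows and columns exchanged
```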
Symmetric Matrix

If A = A', then the matrix is symmetric. For example,

A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 9 \\ 3 & 9 & 7 \end{bmatrix}   and   A' = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 9 \\ 3 & 9 & 7 \end{bmatrix}.

A symmetric matrix is, by definition, square (e.g., 3x3). A symmetric matrix often occurs when a matrix is multiplied by its transpose.

Diagonal Matrix

A diagonal matrix is a square matrix whose off-diagonal elements are all equal to zero:

A = \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}.

Identity Matrix

An identity matrix, I, is a diagonal matrix whose diagonal elements all equal one:

I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

Thus, AI = IA = A. When a matrix is multiplied by I, the matrix remains unchanged:

IA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.

Scalar Matrix

A scalar matrix is a diagonal matrix with the same value along the diagonal. For example,

A = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}.

Unity Matrix/Vector and Zero Vector

A column vector (r x 1) with all elements equal to 1 will be referred to as:

1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}.

A square matrix (r x r) with all elements equal to 1 will be referred to as:

J = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}.

A column vector with all elements equal to 0 is the zero vector, denoted 0.

Note that:

• 1'1 = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = [n] = n.

• 11' = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} = J (n x n).

Linear Dependence in a Matrix

The columns of a matrix are said to be linearly dependent if one of the columns (a vector) can be expressed as a linear combination of the other columns. See page 188 in KNNL for an example.

Rank of a Matrix

The rank of a matrix is defined as the maximum number of linearly independent columns in the matrix. The rank of a matrix can also be expressed as the maximum number of linearly independent rows. In any case, the rank of an r x c matrix cannot exceed the minimum of the two values r and c. Also, the rank of a matrix that results from the product of two matrices cannot exceed the smaller of the two ranks of the original matrices.

Inverse of a Matrix

The inverse of a matrix is the matrix analogue of a reciprocal, and it has the following property: A^{-1}A = AA^{-1} = I. An inverse of an r x r (square) matrix exists only if the matrix is of full rank; i.e., the rank = r. If the rank < r, then the matrix is said to be singular. Finding the inverse of a matrix can be difficult, especially for large matrices. Only square matrices can have an inverse, though the inverse of a singular square matrix is not defined.

I will only show how to find the inverse of a 2x2 matrix. In general, if

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},

then the inverse is:

A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \begin{bmatrix} d/D & -b/D \\ -c/D & a/D \end{bmatrix},

where D = ad - bc. Note, D is called the determinant of matrix A.

Example: Let A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. So, a = 1, b = 2, c = 3, d = 4.

The determinant is D = ad - bc = 1*4 - 2*3 = -2. So, the inverse is:

A^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} 4/(-2) & -2/(-2) \\ -3/(-2) & 1/(-2) \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}.
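Rank and inverse calculations are easy to check numerically. Here is a brief sketch (not from the notes) that re-does the 2x2 inverse example above with NumPy and shows what happens with a linearly dependent (singular) matrix.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

print(np.linalg.matrix_rank(A))   # 2 -> full rank, so the inverse exists
print(np.linalg.det(A))           # -2.0, the determinant D = ad - bc
print(np.linalg.inv(A))           # [[-2.   1. ] [ 1.5 -0.5]]
print(np.linalg.inv(A) @ A)       # identity matrix (up to rounding error)

# A singular matrix: the second column is 2 times the first (linear dependence),
# so the rank is 1 and np.linalg.inv(S) would raise a LinAlgError.
S = np.array([[1., 2.],
              [3., 6.]])
print(np.linalg.matrix_rank(S))   # 1
```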
Some Basic Theorems about Matrices

• A + B = B + A
• (A + B) + C = A + (B + C)
• (AB)C = A(BC)
• C(A + B) = CA + CB
• λ(A + B) = λA + λB
• (A')' = A
• (A + B)' = A' + B'
• (AB)' = B'A'
• (ABC)' = C'B'A'
• (AB)^{-1} = B^{-1}A^{-1}
• (ABC)^{-1} = C^{-1}B^{-1}A^{-1}
• (A^{-1})^{-1} = A
• (A')^{-1} = (A^{-1})'

Note: expectation notation also extends to random vectors and random matrices (see page 193). For example, let

Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}.   Then, E[Y] = \begin{bmatrix} E[Y_1] \\ E[Y_2] \\ E[Y_3] \end{bmatrix}.

Also, the variance-covariance matrix of a random vector Y can be denoted as:

\sigma^2(Y) = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1,Y_2) & \sigma(Y_1,Y_3) \\ \sigma(Y_2,Y_1) & \sigma^2(Y_2) & \sigma(Y_2,Y_3) \\ \sigma(Y_3,Y_1) & \sigma(Y_3,Y_2) & \sigma^2(Y_3) \end{bmatrix}.

This leads to some additional matrix algebra theorems. First, let W be a random vector that is the product of premultiplying a random vector Y by a constant matrix A (a matrix with fixed elements); i.e., W = AY. Then,

• E[A] = A
• E[W] = E[AY] = AE[Y]
• σ²(W) = σ²(AY) = A σ²(Y) A'

SLR in a Matrix Algebra Context

Now that we've laid the foundation with matrix algebra operations, we can transfer the SLR model to a matrix context. Recall the SLR model: Yi = β0 + β1Xi + εi, i = 1,…,n. This model can be expressed as:

Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1
Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2
⋮
Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n

We can now express the different model components as vectors and matrices, keeping in mind that we must establish the correct dimensions to make the SLR model in matrix notation valid:

Y (n x 1) = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix},   X (n x 2) = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix},   \beta (2 x 1) = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},   \varepsilon (n x 1) = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.

Now, we can rewrite the SLR model in matrix notation:

\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix},

or,

Y (n x 1) = X (n x 2) β (2 x 1) + ε (n x 1).

We can also express some properties of the SLR model in matrix notation:

E[Y] = Xβ

E[ε] = 0   (note: ε is a vector of independent normal variables)

\sigma^2(\varepsilon) = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I   (note: this is a scalar matrix)

Least Squares Estimation of Regression Parameters in Matrix Notation

Recall the normal equations from the first lecture:

n b_0 + b_1 \sum X_i = \sum Y_i
b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i

We can express them in matrix notation as:

X'X b = X'Y,

where X'X is 2x2, b is 2x1, and X'Y is 2x1.

Derivation: First, define some matrices:

X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}

X'Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}

b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}

Now we have:

\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix},

or,

\begin{bmatrix} n b_0 + b_1 \sum X_i \\ b_0 \sum X_i + b_1 \sum X_i^2 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}.

These final equations are the normal equations stated above. To find the regression parameter estimates, b0 and b1, in matrix notation, we premultiply both sides of the normal equations by (X'X)^{-1}:

(X'X)^{-1} X'X b = (X'X)^{-1} X'Y,

which simplifies to:

b = (X'X)^{-1} X'Y.

Fitted Values and Residuals in Matrix Notation

If we have a vector of fitted (predicted) values,

Ŷ = \begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix},

then Ŷ = Xb, because:

\begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}.

We can rewrite the result for the vector of fitted values as:

Ŷ = X(X'X)^{-1}X'Y,

or,

Ŷ = HY,

where H is called the hat matrix:

H (n x n) = X(X'X)^{-1}X'.

The hat matrix is a very important matrix in regression analysis, and it has some special properties. It is symmetric and idempotent; i.e., HH = H.

Next, let the vector of residuals be expressed as:

e (n x 1) = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}.

Then we have:

e = Y - Ŷ = Y - Xb.

We can now derive the variance-covariance matrix of the residuals in matrix notation. The residuals can be expressed as linear combinations of the Yi:

e = Y - Ŷ = Y - HY = (I - H)Y.

Note: the matrix I - H is also symmetric and idempotent.
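The estimator b = (X'X)^{-1}X'Y, the hat matrix, and the residual vector can all be computed with a few array operations. Here is a minimal sketch in Python/NumPy using a small made-up data set (the X and Y values are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical data: n = 5 observations of a single predictor.
x = np.array([1., 2., 3., 4., 5.])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(Y)

# Design matrix: a column of ones (for b0) next to the predictor column.
X = np.column_stack([np.ones(n), x])          # n x 2

XtX_inv = np.linalg.inv(X.T @ X)              # (X'X)^{-1}, 2 x 2
b = XtX_inv @ X.T @ Y                         # b = (X'X)^{-1} X'Y -> [b0, b1]

H = X @ XtX_inv @ X.T                         # hat matrix, n x n
Y_hat = H @ Y                                 # fitted values, same as X @ b
e = Y - Y_hat                                 # residuals, (I - H) Y

print(b)
print(np.allclose(H @ H, H))                  # True: H is idempotent
print(np.allclose(H, H.T))                    # True: H is symmetric
```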
Now we can find the variance-covariance matrix of the residuals (see the proof on page 203):

σ²(e) = σ²(I - H),

which is estimated by:

s²(e) = MSE (I - H).

Analysis of Variance Results in Matrix Notation

Total Sums of Squares: Recall that the total sums of squares for the ANOVA can be expressed as:

SSTO = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n}.

In matrix notation, the total sums of squares is:

SSTO = Y'Y - (1/n) Y'JY.

Note: Y'Y = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = Y_1^2 + Y_2^2 + \cdots + Y_n^2 = \sum Y_i^2.

Error Sums of Squares: Recall that the error sums of squares for the ANOVA can be expressed as:

SSE = \sum e_i^2 = \sum \left( Y_i - \hat Y_i \right)^2.

In matrix notation, the error sums of squares is:

SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y.

Regression Sums of Squares: The regression sums of squares in matrix notation is:

SSR = SSTO - SSE = Y'Y - (1/n) Y'JY - {Y'Y - b'X'Y} = b'X'Y - (1/n) Y'JY.

Note: NKNW, beginning on page 205, show how to express the sums of squares as quadratic forms. I refer you to the book for this topic.

Inferences in Regression Analysis in Matrix Notation

As you might expect, we can express estimators and their variances in matrix notation.

Variance-Covariance Matrix of b: In matrix notation:

\sigma^2(b) = \begin{bmatrix} \sigma^2(b_0) & \sigma(b_0, b_1) \\ \sigma(b_0, b_1) & \sigma^2(b_1) \end{bmatrix} = \sigma^2 \begin{bmatrix} \dfrac{\sum X_i^2}{n \sum (X_i - \bar X)^2} & \dfrac{-\bar X}{\sum (X_i - \bar X)^2} \\ \dfrac{-\bar X}{\sum (X_i - \bar X)^2} & \dfrac{1}{\sum (X_i - \bar X)^2} \end{bmatrix} = \sigma^2 (X'X)^{-1}.

To find the estimated (sample) variance-covariance matrix of b, substitute MSE for σ²:

s²(b) = MSE (X'X)^{-1}.

Derivation: Recall that b = (X'X)^{-1}X'Y = AY, where A is a constant matrix, A = (X'X)^{-1}X', and recall the basic theorem for a constant matrix: σ²(W) = σ²(AY) = A σ²(Y) A'. So, we now have σ²(b) = A σ²(Y) A'. Since σ²(Y) = σ²I, (X'X)^{-1} is symmetric, and (AB)' = B'A', we can say A' = X(X'X)^{-1}. Now, we can complete the derivation:

σ²(b) = A σ²(Y) A' = (X'X)^{-1}X' σ²I X(X'X)^{-1} = σ²(X'X)^{-1}X'X(X'X)^{-1} = σ²(X'X)^{-1}I = σ²(X'X)^{-1}.

Mean Response: In matrix notation,

\hat Y_h = X_h' b,   where X_h = \begin{bmatrix} 1 \\ X_h \end{bmatrix},

s^2(\hat Y_h) = MSE \left( X_h' (X'X)^{-1} X_h \right) = X_h' \, s^2(b) \, X_h.

New Prediction: In matrix notation,

\hat Y_{h(new)} = X_h' b,   where X_h = \begin{bmatrix} 1 \\ X_h \end{bmatrix},

s^2(\hat Y_{h(new)}) = MSE \left( 1 + X_h' (X'X)^{-1} X_h \right).
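Continuing the small hypothetical data set from the earlier sketch, the ANOVA sums of squares, MSE, s²(b), and the variances for a mean response and a new prediction can all be obtained with the same matrix formulas. This is an illustrative sketch only; the data and the level X_h = 2.5 are made up:

```python
import numpy as np

# Same hypothetical data as in the previous sketch.
x = np.array([1., 2., 3., 4., 5.])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(Y)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
J = np.ones((n, n))                            # n x n matrix of ones

SSTO = Y @ Y - (Y @ J @ Y) / n                 # Y'Y - (1/n) Y'JY
SSE  = Y @ Y - b @ X.T @ Y                     # Y'Y - b'X'Y
SSR  = SSTO - SSE                              # b'X'Y - (1/n) Y'JY

MSE  = SSE / (n - 2)                           # SLR estimates 2 parameters
s2_b = MSE * XtX_inv                           # s^2(b) = MSE (X'X)^{-1}

X_h = np.array([1., 2.5])                      # hypothetical level X_h = 2.5
Y_hat_h = X_h @ b                              # estimated mean response
s2_mean = MSE * (X_h @ XtX_inv @ X_h)          # s^2(Yhat_h)
s2_pred = MSE * (1 + X_h @ XtX_inv @ X_h)      # s^2 for a new prediction

print(SSTO, SSE, SSR)
print(s2_b)
print(Y_hat_h, s2_mean, s2_pred)
```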