Week 5: Matrix Approach to SLR (Chapter 5)

Matrix algebra is widely used in statistics and is essential in multiple linear regression for succinctly presenting systems of equations and large arrays of data. Before we discuss SLR in a matrix algebra context, I want to introduce the matrix algebra concepts that will be needed to understand the mathematical operations used later, especially in multiple linear regression (MLR).
Definition of a Matrix
A matrix is a rectangular array of elements arranged in rows and columns:
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{21} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{bmatrix}
An individual element is denoted a_{ij}, so that A = [a_{ij}], where i = 1 to r (the number of rows) and j = 1 to c (the number of columns). A matrix is square if the number of rows equals the number of
columns. A vector is a matrix with only one column (i.e., a column vector) or only one row (i.e.,
a row vector). A matrix is often defined by its dimensions; i.e., a matrix with 3 rows and 2
columns would be called a “3x2” matrix.
Transpose of a Matrix
The transpose of a matrix is another matrix in which the columns and rows of the original matrix have been exchanged with each other. For example, let A be a 3x2 matrix:
A = \begin{bmatrix} 1 & 3 \\ 5 & 8 \\ 12 & 15 \end{bmatrix}
The transpose of A (a 2x3 matrix) is then:
A' = \begin{bmatrix} 1 & 5 & 12 \\ 3 & 8 & 15 \end{bmatrix}
(Note: Excel has a transpose function under the Edit menu’s “Paste Special” feature.)
Equal Matrices
Two matrices are said to be equal if they have the same dimensions (i.e., they both have the same
number of rows and columns) and if all corresponding elements are equal. For example, for the
following two 2x2 matrices,
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
and
B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},
to be equal (i.e., A = B), we must have
a_{11} = 1, a_{12} = 2, a_{21} = 3, a_{22} = 4.
Matrix Addition and Subtraction
Two matrices must have the same dimensions to add or subtract. Addition and subtraction are applied to corresponding elements in both matrices to get a new matrix. For example, if you have two 2x2 matrices,
A = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
and
B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},
and you want to add them, you would get:
C = A + B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5+1 & 6+2 \\ 7+3 & 8+4 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}.
If you subtracted them, you would get:
C = A - B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5-1 & 6-2 \\ 7-3 & 8-4 \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}.
Matrix Multiplication by a Scalar
A scalar is simply a constant, a 1x1 matrix. You can multiply an r x c matrix by a scalar by
multiplying each individual matrix element by the scalar. For example,
4A = 4 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 4 \cdot 1 & 4 \cdot 2 \\ 4 \cdot 3 & 4 \cdot 4 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ 12 & 16 \end{bmatrix}.
Multiplication of Two Matrices
To multiply two matrices, the number of columns of the first matrix must equal the number of rows of the second matrix (note that, in matrix algebra, A*B ≠ B*A in general). If this condition is met, the element in row i and column j of the product is obtained by taking the cross product of row i of A with column j of B and summing; doing this for every row of A and column of B results in a new matrix that represents the product of the two matrices. This may seem complicated, but an example will illustrate the procedure.
Example: Find the product of A and B, if
A = \begin{bmatrix} 1 & 5 & 2 \\ 3 & 8 & 5 \end{bmatrix} and B = \begin{bmatrix} 0 & 1 \\ 5 & 6 \\ 7 & 2 \end{bmatrix}.
The product, AB, will be a 2x2 matrix (2x3 * 3x2 = 2x2):
AB = \begin{bmatrix} 1 & 5 & 2 \\ 3 & 8 & 5 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 5 & 6 \\ 7 & 2 \end{bmatrix} = \begin{bmatrix} 1 \cdot 0 + 5 \cdot 5 + 2 \cdot 7 & 1 \cdot 1 + 5 \cdot 6 + 2 \cdot 2 \\ 3 \cdot 0 + 8 \cdot 5 + 5 \cdot 7 & 3 \cdot 1 + 8 \cdot 6 + 5 \cdot 2 \end{bmatrix} = \begin{bmatrix} 39 & 35 \\ 75 & 61 \end{bmatrix}
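These operations are easy to check numerically. The following short Python/numpy sketch (not part of the original notes; the array values are simply the examples worked above) verifies the transpose, addition, subtraction, scalar multiplication, and matrix multiplication results:

import numpy as np

A = np.array([[5, 6],
              [7, 8]])
B = np.array([[1, 2],
              [3, 4]])

# Addition and subtraction are element-wise (matrices must have the same dimensions)
print(A + B)     # [[ 6  8] [10 12]]
print(A - B)     # [[4 4] [4 4]]

# Multiplying a matrix by a scalar multiplies every element
print(4 * B)     # [[ 4  8] [12 16]]

# Transpose exchanges rows and columns
C = np.array([[1, 3],
              [5, 8],
              [12, 15]])
print(C.T)       # [[ 1  5 12] [ 3  8 15]]

# Matrix multiplication: a 2x3 matrix times a 3x2 matrix gives a 2x2 result
D = np.array([[1, 5, 2],
              [3, 8, 5]])
E = np.array([[0, 1],
              [5, 6],
              [7, 2]])
print(D @ E)     # [[39 35] [75 61]]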
Symmetric Matrix
If A = A’, then the matrix is symmetric. For example,
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 9 \\ 3 & 9 & 7 \end{bmatrix}
A' = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 9 \\ 3 & 9 & 7 \end{bmatrix}.
A symmetric matrix is, by definition, square (e.g., 3 x 3). A symmetric matrix often occurs when
one matrix is multiplied by its transpose.
Diagonal Matrix
A diagonal matrix is a matrix whose off-diagonal elements are all equal to zero:
A = \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}
Identity Matrix
An identity matrix, I, is a diagonal matrix whose diagonal elements all equal one:
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Thus, AI = IA = A. When a matrix is multiplied by I, the matrix remains unchanged:
IA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
Scalar Matrix
A scalar matrix is a diagonal matrix with the same values along the diagonal. For example,
A = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}.
Unity Matrix/Vector and Zero Vector
A column vector with all elements equal to 1 will be referred to as:
\mathbf{1}_{r \times 1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}.
A square matrix with all elements equal to 1 will be referred to as:
J_{r \times r} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}.
Similarly, a column vector with all elements equal to 0 will be referred to as the zero vector, \mathbf{0}.
Note that:
• \mathbf{1}'\mathbf{1} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = [n] = n.
• \mathbf{1}\mathbf{1}' = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} = J_{n \times n}.
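A quick numerical check of these two identities (a sketch, not part of the original notes; n = 4 is arbitrary):

import numpy as np

n = 4
one = np.ones((n, 1))       # the n x 1 unity vector

# 1'1 is a 1x1 matrix whose single element is n
print(one.T @ one)          # [[4.]]

# 11' is the n x n matrix J of all ones
print(one @ one.T)          # 4 x 4 matrix of ones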
Linear Dependence in a Matrix
The columns of a matrix are said to be linearly dependent if one of the columns (a vector) can be expressed as a linear combination of the other columns (vectors); otherwise, the columns are linearly independent. See page 188 in KNNL for an example.
Rank of a Matrix
The rank of a matrix is defined as the maximum number of linearly independent columns in a
matrix. The rank of a matrix can also be expressed as the maximum number of linearly
independent rows. In any case, the rank of an r x c matrix cannot exceed the minimum of the
two values, r and c. Also, the rank of a matrix that results from the product of two matrices
cannot exceed the smaller of the two ranks of the original matrices.
Inverse of a Matrix
The inverse of a matrix plays a role analogous to the reciprocal of a number, and it has the following property:
A^{-1}A = AA^{-1} = I.
An inverse of an r x r (square) matrix exists only if the matrix is full rank; i.e., the rank = r. If the
rank < r, then the matrix is said to be singular. Finding the inverse of a matrix can be difficult,
especially for large matrices. Only square matrices can have an inverse, though the inverse of
some square matrices is not defined.
I will only show how to find the inverse of a 2x2 matrix. In general, if
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
then the inverse is:
A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \begin{bmatrix} d/D & -b/D \\ -c/D & a/D \end{bmatrix},
where D = ad – bc. Note, D is called the determinant of matrix A.
Example: Let A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. So, a = 1, b = 2, c = 3, d = 4.
The determinant is: D = ad - bc = 1*4 - 2*3 = -2.
So, the inverse is:
A^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} 4/(-2) & -2/(-2) \\ -3/(-2) & 1/(-2) \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}.
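This result can be checked numerically; the sketch below (not part of the original notes) applies the 2x2 formula and compares it with numpy's built-in inverse:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Inverse via the 2x2 formula: (1/D) * [[d, -b], [-c, a]]
D = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # determinant = -2
A_inv = (1.0 / D) * np.array([[A[1, 1], -A[0, 1]],
                              [-A[1, 0], A[0, 0]]])
print(A_inv)                                # [[-2.   1. ] [ 1.5 -0.5]]

# Check against numpy and the defining property A^{-1} A = I
print(np.linalg.inv(A))                     # same matrix
print(np.allclose(A_inv @ A, np.eye(2)))    # True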
Some Basic Theorems about Matrices
• A + B = B + A
• (A + B) + C = A + (B + C)
• (AB)C = A(BC)
• C(A + B) = CA + CB
• λ(A + B) = λA + λB
• (A')' = A
• (A + B)' = A' + B'
• (AB)' = B'A'
• (ABC)' = C'B'A'
• (AB)^{-1} = B^{-1}A^{-1}
• (ABC)^{-1} = C^{-1}B^{-1}A^{-1}
• (A^{-1})^{-1} = A
• (A')^{-1} = (A^{-1})'
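The two less obvious identities, the reverse-order rules for transposes and inverses of products, are easy to spot-check numerically (a sketch, not part of the original notes; the matrices are arbitrary invertible examples):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, 6.0]])

# (AB)' = B'A' : the transpose of a product reverses the order
print(np.allclose((A @ B).T, B.T @ A.T))                    # True

# (AB)^{-1} = B^{-1}A^{-1} : the inverse of a product reverses the order
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))     # True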
Note: expectation notation also extends to random vectors and random matrices (see page 193).
For example, let Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}. Then, E[Y] = \begin{bmatrix} E[Y_1] \\ E[Y_2] \\ E[Y_3] \end{bmatrix}. Also, the variance-covariance matrix of a random vector Y can be denoted as:
\sigma^2(Y) = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1, Y_2) & \sigma(Y_1, Y_3) \\ \sigma(Y_2, Y_1) & \sigma^2(Y_2) & \sigma(Y_2, Y_3) \\ \sigma(Y_3, Y_1) & \sigma(Y_3, Y_2) & \sigma^2(Y_3) \end{bmatrix}.
This leads to some additional matrix algebra theorems. First, let W be a random vector that is
the product of premultiplying a random vector Y by a constant matrix A (a matrix with fixed
elements); i.e., W = AY. Then,
• E[A] = A
• E[W] = E[AY] = AE[Y]
• σ²(W) = σ²(AY) = A σ²(Y) A'
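As a rough illustration of the last theorem (a simulation sketch, not part of the original notes; the mean vector, covariance matrix, and A are arbitrary), the sample variance-covariance matrix of W = AY should be close to A σ²(Y) A':

import numpy as np

rng = np.random.default_rng(0)

# An arbitrary mean vector and variance-covariance matrix for a 3 x 1 random vector Y
mu = np.array([1.0, 2.0, 3.0])
V = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])

# A fixed (constant) 2 x 3 matrix A, so W = AY is 2 x 1
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])

Y = rng.multivariate_normal(mu, V, size=100_000)    # each row is one draw of Y'
W = Y @ A.T                                         # each row is one draw of W'

print(A @ V @ A.T)               # theoretical sigma^2(W) = A sigma^2(Y) A'
print(np.cov(W, rowvar=False))   # sample estimate; close to the above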
SLR in a Matrix Algebra Context
Now that we’ve laid the foundation with matrix algebra operations, we can transfer the SLR
model to a matrix context. Recall the SLR model: Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, i = 1, …, n. This model can be expressed as:
Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1
Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2
\vdots
Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n
We can now express the different model components as vectors and matrices, keeping in mind
that we must establish the correct dimensions to make the SLR model in matrix notation valid:
Y_{n \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
X_{n \times 2} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\beta_{2 \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
\varepsilon_{n \times 1} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
Now, we can rewrite the SLR in matrix notation:
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix},
or,
Y_{n \times 1} = X_{n \times 2} \, \beta_{2 \times 1} + \varepsilon_{n \times 1}.
We can also express some properties of the SLR in matrix notation:
E[Y]_{n \times 1} = X\beta
E[\varepsilon]_{n \times 1} = \mathbf{0}_{n \times 1}   (note: ε is a vector of independent normal variables)
\sigma^2(\varepsilon)_{n \times n} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I_{n \times n}   (note: this is a scalar matrix)
Least Squares Estimation of Regression Parameters in Matrix Notation
Recall the normal equations from the first lecture:
n b_0 + b_1 \sum X_i = \sum Y_i
b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i
We can express them in matrix notation as:
X'X_{2 \times 2} \; b_{2 \times 1} = X'Y_{2 \times 1}
Derivation: First, define some matrices:
X'X_{2 \times 2} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
X'Y_{2 \times 1} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}
b_{2 \times 1} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
Now we have:
\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}
or,
\begin{bmatrix} n b_0 + b_1 \sum X_i \\ b_0 \sum X_i + b_1 \sum X_i^2 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}.
These final equations are the normal equations stated above. To find the regression parameter estimates, b_0 and b_1, in matrix notation, we must premultiply both sides of the normal equations by (X'X)^{-1}:
(X'X)^{-1} X'X b = (X'X)^{-1} X'Y,
which simplifies to:
b_{2 \times 1} = (X'X)^{-1}_{2 \times 2} \, X'Y_{2 \times 1}.
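A small numerical sketch (not from the original notes; the x and y values are made up) showing the estimates computed exactly this way:

import numpy as np

# Hypothetical toy data, n = 5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Design matrix: a column of ones followed by the predictor column
X = np.column_stack([np.ones(n), x])
Y = y.reshape(-1, 1)

# b = (X'X)^{-1} X'Y
b = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(b)    # b[0] is b0 (intercept), b[1] is b1 (slope)

# Same answer from numpy's least-squares routine, as a check
print(np.linalg.lstsq(X, Y, rcond=None)[0])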
Fitted Values and Residuals in Matrix Notation
If we have a vector of predicted values,
\hat{Y}_{n \times 1} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix},
then we have \hat{Y}_{n \times 1} = X_{n \times 2} \, b_{2 \times 1} because:
\begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}.
We can rewrite the result for the vector of predicted values as:
\hat{Y} = X(X'X)^{-1}X'Y,
or,
\hat{Y}_{n \times 1} = H_{n \times n} Y_{n \times 1},
where H is called the hat matrix:
H_{n \times n} = X(X'X)^{-1}X'.
The hat matrix is a very important matrix in regression analysis, and it has some special
properties. It is symmetric and idempotent; i.e., HH = H.
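A short numerical sketch (not from the original notes; it reuses the hypothetical data from the previous example) illustrating the hat matrix and these two properties:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones(len(x)), x])
Y = y.reshape(-1, 1)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H, H.T))      # True: H is symmetric
print(np.allclose(H @ H, H))    # True: H is idempotent (HH = H)

# Fitted values: Y_hat = H Y (the same as X b)
print(H @ Y)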
Next, let the vector of residuals be expressed as:
e_{n \times 1} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}.
Then we have:
e_{n \times 1} = Y_{n \times 1} - \hat{Y}_{n \times 1} = Y - Xb.
We can now derive the variance-covariance of the residuals in matrix notation. The residuals can be expressed as linear combinations of the Y_i:
e = Y - \hat{Y} = Y - HY = (I_{n \times n} - H_{n \times n}) \, Y
note: the matrix I – H is also symmetric and idempotent.
Now we can find the variance-covariance matrix for the residuals (see proof on page 203):
\sigma^2(e)_{n \times n} = \sigma^2 (I - H),
which is estimated by:
s^2(e)_{n \times n} = MSE \, (I - H).
Analysis of Variance Results in Matrix Notation
Total Sums of Squares: Recall that the total sums of squares for ANOVA can be expressed as:
SSTO = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n}.
In matrix notation, the total sums of squares is:
SSTO = Y'Y - \left(\frac{1}{n}\right) Y'JY.
Note: Y'Y = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = Y_1^2 + Y_2^2 + \cdots + Y_n^2 = \sum Y_i^2.
Error Sums of Squares: Recall that the error sums of squares for ANOVA can be expressed as:
SSE = \sum e_i^2 = \sum \left( Y_i - \hat{Y}_i \right)^2.
In matrix notation, the error sums of squares is:
SSE = e’e = (Y – Xb)’(Y – Xb) = Y’Y – b’X’Y.
Regression Sums of Squares: The regression sums of squares in matrix notation is:
SSR = SSTO - SSE = Y'Y - \left(\frac{1}{n}\right) Y'JY - \left\{ Y'Y - b'X'Y \right\} = b'X'Y - \left(\frac{1}{n}\right) Y'JY.
Note: NKNW beginning on page 205 show how to express the sums of squares as quadratic
forms. I refer you to the book for this topic.
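As a numerical sketch (not part of the original notes; again using the hypothetical data from earlier), the matrix expressions reproduce the usual ANOVA decomposition:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
X = np.column_stack([np.ones(n), x])
Y = y.reshape(-1, 1)

b = np.linalg.inv(X.T @ X) @ (X.T @ Y)
J = np.ones((n, n))                      # n x n matrix of ones

SSTO = (Y.T @ Y - (1.0 / n) * Y.T @ J @ Y).item()
SSE = (Y.T @ Y - b.T @ X.T @ Y).item()
SSR = (b.T @ X.T @ Y - (1.0 / n) * Y.T @ J @ Y).item()

print(SSTO, SSE, SSR)
print(np.isclose(SSTO, SSE + SSR))       # True: SSTO = SSE + SSR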
Inferences in Regression Analysis in Matrix Notation
As you might expect, we can express estimators and their variances in matrix notation.
Variance-Covariance Matrix of b: In matrix notation,
\sigma^2(b) = \begin{bmatrix} \sigma^2(b_0) & \sigma(b_0, b_1) \\ \sigma(b_0, b_1) & \sigma^2(b_1) \end{bmatrix} = \sigma^2 \begin{bmatrix} \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} \\ \dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} & \dfrac{1}{\sum (X_i - \bar{X})^2} \end{bmatrix} = \sigma^2 (X'X)^{-1}.
To find the sample variance for b, substitute MSE for σ²:
s^2(b) = MSE \, (X'X)^{-1}.
Derivation: Recall that
b = (X'X)^{-1} X'Y = AY, where A = a constant matrix = (X'X)^{-1}X',
and the basic theorem for a constant matrix,
σ²(W) = σ²(AY) = A σ²(Y) A'.
So, we now have:
σ²(b) = A σ²(Y) A'.
Since:
σ²(Y) = σ²I,
(X'X)^{-1} is symmetric, and
by the theorem (AB)' = B'A',
we can now say:
A' = X(X'X)^{-1}.
Now, we can complete the derivation:
σ²(b) = A σ²(Y) A'
= (X'X)^{-1}X' σ²I X(X'X)^{-1}
= σ²(X'X)^{-1}X'X(X'X)^{-1}
= σ²(X'X)^{-1}I
= σ²(X'X)^{-1}.
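A sketch of this computation on the hypothetical data (not from the original notes):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
X = np.column_stack([np.ones(n), x])
Y = y.reshape(-1, 1)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ Y)

e = Y - X @ b                        # residuals
MSE = (e.T @ e).item() / (n - 2)     # SSE / (n - 2)

s2_b = MSE * XtX_inv                 # s^2(b) = MSE (X'X)^{-1}
print(s2_b)                          # diagonal holds s^2(b0) and s^2(b1)
print(np.sqrt(np.diag(s2_b)))        # standard errors of b0 and b1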
Mean Response: In matrix notation,
\hat{Y}_h = X_h' b, where X_h = \begin{bmatrix} 1 \\ X_h \end{bmatrix}.
s^2(\hat{Y}_h) = MSE \left( X_h' (X'X)^{-1} X_h \right) = X_h' \, s^2(b) \, X_h.
New Prediction: In matrix notation,
\hat{Y}_{h(new)} = X_h' b, where X_h = \begin{bmatrix} 1 \\ X_h \end{bmatrix}.
s^2(\hat{Y}_{h(new)}) = MSE \left( 1 + X_h' (X'X)^{-1} X_h \right).
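A final sketch (not from the original notes; X_h = 3.5 is an arbitrary new value, and the data are the same hypothetical values as above) computing both estimated variances:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
X = np.column_stack([np.ones(n), x])
Y = y.reshape(-1, 1)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ Y)
e = Y - X @ b
MSE = (e.T @ e).item() / (n - 2)

Xh = np.array([[1.0], [3.5]])                        # [1, X_h]' for a new X_h = 3.5

Yh_hat = (Xh.T @ b).item()                           # estimated mean response at X_h
s2_mean = MSE * (Xh.T @ XtX_inv @ Xh).item()         # s^2(Y_h hat)
s2_pred = MSE * (1 + (Xh.T @ XtX_inv @ Xh).item())   # s^2 for predicting a new observation

print(Yh_hat, s2_mean, s2_pred)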