# Topic 11: Matrix Approach to Linear Regression

## Outline

- Linear Regression in Matrix Form

## The Model in Scalar Form

- $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- The $\varepsilon_i$ are independent, Normally distributed random variables with mean 0 and variance $\sigma^2$
- Consider writing out the observations:

$$
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_n + \varepsilon_n
\end{aligned}
$$

## The Model in Matrix Form

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

## The Model in Matrix Form II

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

## The Design Matrix

$$
\underset{n \times 2}{\mathbf{X}} =
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
$$

## Vector of Parameters

$$
\underset{2 \times 1}{\boldsymbol{\beta}} =
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
$$

## Vector of Error Terms

$$
\underset{n \times 1}{\boldsymbol{\varepsilon}} =
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

## Vector of Responses

$$
\underset{n \times 1}{\mathbf{Y}} =
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
$$

## Simple Linear Regression in Matrix Form

$$
\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 2}{\mathbf{X}}\;\underset{2 \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}
$$

## Variance-Covariance Matrix

$$
\boldsymbol{\sigma}^2\{\mathbf{Y}\} =
\begin{bmatrix}
\sigma^2\{Y_1\} & \sigma\{Y_1, Y_2\} & \cdots & \sigma\{Y_1, Y_n\} \\
\sigma\{Y_2, Y_1\} & \sigma^2\{Y_2\} & \cdots & \sigma\{Y_2, Y_n\} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma\{Y_n, Y_1\} & \cdots & \sigma\{Y_n, Y_{n-1}\} & \sigma^2\{Y_n\}
\end{bmatrix}
$$

Main diagonal values are the variances; off-diagonal values are the covariances.

## Covariance Matrix of ε

$$
\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} = \operatorname{Cov}\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \underset{n \times n}{\sigma^2 \mathbf{I}}
$$

Independent errors mean that the covariance of any two error terms is zero. Common variance implies the main diagonal values are equal.

## Covariance Matrix of Y

$$
\boldsymbol{\sigma}^2\{\mathbf{Y}\} = \operatorname{Cov}\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \underset{n \times n}{\sigma^2 \mathbf{I}}
$$

## Distributional Assumptions in Matrix Form

- $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$
- $\mathbf{I}$ is an $n \times n$ identity matrix
- Ones in the diagonal elements specify that the variance of each $\varepsilon_i$ is $1 \times \sigma^2$
- Zeros in the off-diagonal elements specify that the covariance between different $\varepsilon_i$ is zero
- This implies that the correlations are zero

## Least Squares

- We want to minimize $(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$
- We take the derivative with respect to the (vector) $\boldsymbol{\beta}$
- This is like a quadratic function
- Recall the function we minimized using the scalar form

## Least Squares II

- The derivative is 2 times the derivative of $(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'$ with respect to $\boldsymbol{\beta}$, times $(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$
- In other words, $-2\mathbf{X}'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$
- We set this equal to $\mathbf{0}$ (a vector of zeros)
- So, $-2\mathbf{X}'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}$
- Or, $\mathbf{X}'\mathbf{Y} = \mathbf{X}'\mathbf{X}\boldsymbol{\beta}$ (the normal equations)

## Normal Equations

- $\mathbf{X}'\mathbf{Y} = (\mathbf{X}'\mathbf{X})\boldsymbol{\beta}$
- Solving for $\boldsymbol{\beta}$ gives the least squares solution $\mathbf{b} = (b_0, b_1)'$
- $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$
- See KNNL p. 199 for details
- This same matrix approach works for multiple regression!

## Fitted Values

$$
\hat{\mathbf{Y}} =
\begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}
=
\begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}
=
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
= \mathbf{X}\mathbf{b}
$$

## Hat Matrix

$$
\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y},
\qquad
\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
$$

We'll use this matrix when assessing diagnostics in multiple regression.

## Estimated Covariance Matrix of b

- The vector $\mathbf{b}$ is a linear combination of the elements of $\mathbf{Y}$
- These estimates are Normal if $\mathbf{Y}$ is Normal
- These estimates will be approximately Normal in general

## A Useful Multivariate Theorem

- $\mathbf{U} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, a multivariate Normal vector
- $\mathbf{V} = \mathbf{c} + \mathbf{D}\mathbf{U}$, a linear transformation of $\mathbf{U}$; $\mathbf{c}$ is a vector and $\mathbf{D}$ is a matrix
- Then $\mathbf{V} \sim N(\mathbf{c} + \mathbf{D}\boldsymbol{\mu}, \mathbf{D}\boldsymbol{\Sigma}\mathbf{D}')$

## Application to b

- $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \left((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right)\mathbf{Y}$
- Since $\mathbf{Y} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, this means the vector $\mathbf{b}$ is Normally distributed with mean $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$ and covariance $\sigma^2\left((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right)\mathbf{I}\left((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right)' = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$
- We will use this framework to do multiple regression, where we have more than one explanatory variable
- Another explanatory variable is just another column in the design matrix
- See Chapter 6
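The normal-equations solution $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ can be sketched in a few lines of NumPy. The data values below are invented for illustration; `np.linalg.solve` is used rather than forming the inverse explicitly, which is the numerically preferred way to solve $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$.

```python
import numpy as np

# Toy data: n = 5 observations (values made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X (n x 2): a column of ones and the predictor column
X = np.column_stack([np.ones_like(x), x])

# Normal equations: X'Y = (X'X) b  =>  solve for b = (b0, b1)'
b = np.linalg.solve(X.T @ X, X.T @ Y)
b0, b1 = b
print(b0, b1)  # intercept and slope
```

The same two lines (build `X`, solve the normal equations) carry over unchanged to multiple regression: each extra explanatory variable is just another column stacked into `X`.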
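The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ can also be checked numerically. A minimal sketch, reusing made-up data: it confirms that $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$ matches $\mathbf{X}\mathbf{b}$, and that $\mathbf{H}$ is symmetric and idempotent ($\mathbf{H}\mathbf{H} = \mathbf{H}$), as a projection matrix must be.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^{-1} X'  (n x n)
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values two ways: H Y and X b give the same vector
Y_hat = H @ Y
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(Y_hat, X @ b))

# H is a projection: symmetric and idempotent, with trace = number of parameters
print(np.allclose(H, H.T), np.allclose(H @ H, H), np.trace(H))
```

The trace of $\mathbf{H}$ equals the number of parameters (2 here), a fact that comes up again in the regression diagnostics mentioned above.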
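The sampling distribution of $\mathbf{b}$, with mean $\boldsymbol{\beta}$ and covariance $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$, can be verified by simulation. A sketch under invented settings ($\boldsymbol{\beta} = (1, 2)'$, $\sigma = 0.5$, a fixed design): many replicate data sets are drawn from $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and refit, and the empirical covariance of the estimates is compared to the theoretical one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
X = np.column_stack([np.ones_like(x), x])
beta = np.array([1.0, 2.0])   # true parameters (invented)
sigma = 0.5                   # true error SD (invented)

# Draw many replicate data sets from Y = X beta + eps and refit each one
bs = []
for _ in range(20000):
    Y = X @ beta + rng.normal(0.0, sigma, size=len(x))
    bs.append(np.linalg.solve(X.T @ X, X.T @ Y))
bs = np.asarray(bs)           # 20000 x 2 array of (b0, b1) estimates

# Compare empirical covariance of b to sigma^2 (X'X)^{-1}
theoretical = sigma**2 * np.linalg.inv(X.T @ X)
empirical = np.cov(bs.T)
print(np.round(theoretical, 4))
print(np.round(empirical, 4))
```

The empirical mean of the estimates should also sit near $\boldsymbol{\beta}$, illustrating that $\mathbf{b}$ is unbiased.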