# Matrix Approach to Simple Linear Regression

KNNL – Chapter 5
## Matrices

- Definition: a matrix is a rectangular array of numbers or symbolic elements.
- In many applications, the rows of a matrix represent individual cases (people, items, plants, animals, ...) and the columns represent attributes or characteristics.
- The dimension of a matrix is its number of rows and columns, often denoted r x c (r rows by c columns).
- A matrix can be written in full form or in abbreviated form:

$$
\underset{r \times c}{\mathbf{A}} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2c} \\
\vdots &        &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ic} \\
\vdots &        &        & \vdots &        & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rj} & \cdots & a_{rc}
\end{bmatrix}
= \left[ a_{ij} \right] \quad i = 1, \ldots, r; \; j = 1, \ldots, c
$$
## Special Types of Matrices

Square matrix: number of rows = number of columns (r = c):

$$
\underset{3 \times 3}{\mathbf{A}} =
\begin{bmatrix} 20 & 32 & 50 \\ 12 & 28 & 42 \\ 28 & 46 & 60 \end{bmatrix}
\qquad
\underset{2 \times 2}{\mathbf{B}} =
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
$$

Vector: a matrix with one column (column vector) or one row (row vector):

$$
\mathbf{D} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \end{bmatrix}
\qquad
\mathbf{E}' = \begin{bmatrix} 17 & 31 \end{bmatrix}
\qquad
\mathbf{F}' = \begin{bmatrix} f_1 & f_2 & f_3 \end{bmatrix}
$$
Transpose: the matrix formed by interchanging the rows and columns of a matrix (a "prime" denotes the transpose):

$$
\mathbf{C} = \begin{bmatrix} 57 \\ 24 \\ 18 \end{bmatrix}
\quad
\mathbf{C}' = \begin{bmatrix} 57 & 24 & 18 \end{bmatrix}
\qquad
\underset{2 \times 3}{\mathbf{G}} = \begin{bmatrix} 6 & 15 & 22 \\ 8 & 13 & 25 \end{bmatrix}
\quad
\underset{3 \times 2}{\mathbf{G}'} = \begin{bmatrix} 6 & 8 \\ 15 & 13 \\ 22 & 25 \end{bmatrix}
$$

In general:

$$
\underset{r \times c}{\mathbf{H}} = \left[ h_{ij} \right] \;\; i = 1, \ldots, r; \; j = 1, \ldots, c
\qquad
\underset{c \times r}{\mathbf{H}'} = \left[ h_{ji} \right] \;\; j = 1, \ldots, c; \; i = 1, \ldots, r
$$
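The transpose example above is easy to verify numerically. A short NumPy sketch (NumPy itself is an assumption here, not part of the original slides):

```python
import numpy as np

# The 2x3 matrix G from the example above
G = np.array([[6, 15, 22],
              [8, 13, 25]])

G_t = G.T  # interchange rows and columns
print(G_t.shape)                 # (3, 2)
print(np.array_equal(G_t.T, G))  # True: (G')' = G
```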
Matrix equality: two matrices are equal when they have the same dimension and all corresponding elements are equal:

$$
\mathbf{A} = \begin{bmatrix} 4 & 6 \\ 12 & 10 \end{bmatrix}
\quad
\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\quad
\mathbf{A} = \mathbf{B} \;\Rightarrow\; b_{11} = 4, \; b_{12} = 6, \; b_{21} = 12, \; b_{22} = 10
$$
## Regression Examples - Toluca Data

$$
\text{Response vector: } \underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\qquad
\mathbf{Y}' = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}
$$

$$
\text{Design matrix: } \underset{n \times 2}{\mathbf{X}} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\qquad
\underset{2 \times n}{\mathbf{X}'} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
$$

Toluca data (n = 25; X = lot size, Y = work hours):

| X | Y | X | Y | X | Y | X | Y | X | Y |
|---|---|---|---|---|---|---|---|---|---|
| 80 | 399 | 30 | 121 | 50 | 221 | 90 | 376 | 70 | 361 |
| 60 | 224 | 120 | 546 | 80 | 352 | 100 | 353 | 50 | 157 |
| 40 | 160 | 70 | 252 | 90 | 389 | 20 | 113 | 110 | 435 |
| 100 | 420 | 30 | 212 | 50 | 268 | 90 | 377 | 110 | 421 |
| 30 | 273 | 90 | 468 | 40 | 244 | 80 | 342 | 70 | 323 |
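The response vector and design matrix for the Toluca data can be assembled directly from the table; a NumPy sketch:

```python
import numpy as np

# Toluca data from the table above (X = lot size, Y = work hours), n = 25
X_vals = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
                   20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70])
Y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
              113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323])

n = len(Y)
X = np.column_stack([np.ones(n), X_vals])  # n x 2 design matrix: rows [1, X_i]
print(n, X.shape)  # 25 (25, 2)
```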
Addition and subtraction of two matrices of common dimension:

$$
\mathbf{C} = \begin{bmatrix} 2 & 7 \\ 10 & 12 \end{bmatrix}
\qquad
\mathbf{D} = \begin{bmatrix} 4 & 0 \\ 14 & 6 \end{bmatrix}
$$

$$
\mathbf{C} + \mathbf{D} =
\begin{bmatrix} 2+4 & 7+0 \\ 10+14 & 12+6 \end{bmatrix} =
\begin{bmatrix} 6 & 7 \\ 24 & 18 \end{bmatrix}
\qquad
\mathbf{C} - \mathbf{D} =
\begin{bmatrix} 2-4 & 7-0 \\ 10-14 & 12-6 \end{bmatrix} =
\begin{bmatrix} -2 & 7 \\ -4 & 6 \end{bmatrix}
$$

In general, for matrices of common dimension:

$$
\underset{r \times c}{\mathbf{A}} = \left[ a_{ij} \right]
\quad
\underset{r \times c}{\mathbf{B}} = \left[ b_{ij} \right]
\quad\Rightarrow\quad
\mathbf{A} + \mathbf{B} = \left[ a_{ij} + b_{ij} \right]
\quad
\mathbf{A} - \mathbf{B} = \left[ a_{ij} - b_{ij} \right]
\quad i = 1, \ldots, r; \; j = 1, \ldots, c
$$
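The C and D example above, checked with NumPy:

```python
import numpy as np

C = np.array([[2, 7], [10, 12]])
D = np.array([[4, 0], [14, 6]])

print(C + D)  # rows: [6 7], [24 18]
print(C - D)  # rows: [-2 7], [-4 6]
```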
Regression example:

$$
\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\quad
\underset{n \times 1}{\mathrm{E}\{\mathbf{Y}\}} = \begin{bmatrix} \mathrm{E}\{Y_1\} \\ \mathrm{E}\{Y_2\} \\ \vdots \\ \mathrm{E}\{Y_n\} \end{bmatrix}
\quad
\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

$$
\mathbf{Y} = \mathrm{E}\{\mathbf{Y}\} + \boldsymbol{\varepsilon},
\quad\text{since}\quad
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix} \mathrm{E}\{Y_1\} \\ \mathrm{E}\{Y_2\} \\ \vdots \\ \mathrm{E}\{Y_n\} \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} =
\begin{bmatrix} \mathrm{E}\{Y_1\} + \varepsilon_1 \\ \mathrm{E}\{Y_2\} + \varepsilon_2 \\ \vdots \\ \mathrm{E}\{Y_n\} + \varepsilon_n \end{bmatrix}
$$
## Matrix Multiplication

Multiplication of a matrix by a scalar (single number):

$$
k = 3, \;
\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 2 & 7 \end{bmatrix}
\;\Rightarrow\;
k\mathbf{A} = \begin{bmatrix} 3(2) & 3(1) \\ 3(2) & 3(7) \end{bmatrix}
= \begin{bmatrix} 6 & 3 \\ 6 & 21 \end{bmatrix}
$$

Multiplication of a matrix by a matrix requires #cols(A) = #rows(B). If $c_A = r_B = c$:

$$
\underset{r_A \times c_A}{\mathbf{A}} \, \underset{r_B \times c_B}{\mathbf{B}}
= \underset{r_A \times c_B}{\mathbf{AB}}
= \left[ ab_{ij} \right]
= \left[ \sum_{k=1}^{c} a_{ik} b_{kj} \right]
\quad i = 1, \ldots, r_A; \; j = 1, \ldots, c_B
$$

where $ab_{ij}$ is the sum of the products of the $c$ elements of the $i$th row of A and the $j$th column of B:

$$
\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} 2 & 5 \\ 3 & -1 \\ 0 & 7 \end{bmatrix}
\qquad
\underset{2 \times 2}{\mathbf{B}} = \begin{bmatrix} 3 & -1 \\ 2 & 4 \end{bmatrix}
$$

$$
\underset{3 \times 2}{\mathbf{AB}} =
\begin{bmatrix}
2(3)+5(2) & 2(-1)+5(4) \\
3(3)+(-1)(2) & 3(-1)+(-1)(4) \\
0(3)+7(2) & 0(-1)+7(4)
\end{bmatrix} =
\begin{bmatrix} 16 & 18 \\ 7 & -7 \\ 14 & 28 \end{bmatrix}
$$
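Both multiplications above can be reproduced with NumPy (`*` for the scalar case, `@` for the matrix product):

```python
import numpy as np

A = np.array([[2, 1], [2, 7]])
print(3 * A)  # rows: [6 3], [6 21]

A2 = np.array([[2, 5], [3, -1], [0, 7]])  # 3x2
B = np.array([[3, -1], [2, 4]])           # 2x2
AB = A2 @ B                               # 3x2 result
print(AB)  # rows: [16 18], [7 -7], [14 28]
```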
## Matrix Multiplication Examples - I

Simultaneous equations (2 equations, $x_1, x_2$ unknown): $a_{11}x_1 + a_{12}x_2 = y_1$ and $a_{21}x_1 + a_{22}x_2 = y_2$:

$$
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
\;\Rightarrow\;
\mathbf{AX} = \mathbf{Y}
$$

Sum of squares:

$$
4^2 + (-2)^2 + 3^2 = \begin{bmatrix} 4 & -2 & 3 \end{bmatrix}
\begin{bmatrix} 4 \\ -2 \\ 3 \end{bmatrix} = 29
$$

Regression equation (expected values):

$$
\mathrm{E}\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta} =
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} =
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
$$
## Matrix Multiplication Examples - II

Matrices used in simple linear regression (that generalize to multiple regression):

$$
\mathbf{Y}'\mathbf{Y} =
\begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \sum_{i=1}^{n} Y_i^2
$$

$$
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
= \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
$$

$$
\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
$$

$$
\mathbf{X}\boldsymbol{\beta} =
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} =
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
$$
## Special Matrix Types

Symmetric matrix: a square matrix whose transpose equals itself, A = A':

$$
\mathbf{A} = \begin{bmatrix} 6 & 19 & 8 \\ 19 & 14 & 3 \\ 8 & 3 & 1 \end{bmatrix}
\qquad
\mathbf{A}' = \begin{bmatrix} 6 & 19 & 8 \\ 19 & 14 & 3 \\ 8 & 3 & 1 \end{bmatrix} = \mathbf{A}
$$

Diagonal matrix: a square matrix with all off-diagonal elements equal to 0. Note: diagonal matrices are symmetric (not vice versa):

$$
\mathbf{A} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} b_1 & 0 & 0 \\ 0 & b_2 & 0 \\ 0 & 0 & b_3 \end{bmatrix}
$$

Identity matrix: a diagonal matrix with all diagonal elements equal to 1 (acts like multiplying a scalar by 1):

$$
\underset{3 \times 3}{\mathbf{I}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\;\Rightarrow\;
\mathbf{IA} = \mathbf{AI} = \mathbf{A}
$$

Scalar matrix: a diagonal matrix with all diagonal elements equal to a single number:

$$
k\mathbf{I} = \begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}
\qquad
\underset{4 \times 4}{\mathbf{I}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$
1-vector, 1-matrix, and zero-vector:

$$
\underset{r \times 1}{\mathbf{1}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\qquad
\underset{r \times r}{\mathbf{J}} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}
\qquad
\underset{r \times 1}{\mathbf{0}} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$

Note:

$$
\underset{1 \times r}{\mathbf{1}'} \, \underset{r \times 1}{\mathbf{1}} =
\begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = r
\qquad
\underset{r \times 1}{\mathbf{1}} \, \underset{1 \times r}{\mathbf{1}'} =
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
\begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} =
\underset{r \times r}{\mathbf{J}}
$$
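The identities 1'1 = r and 11' = J are easy to confirm numerically:

```python
import numpy as np

r = 4
one = np.ones((r, 1))  # r x 1 vector of 1s
J = one @ one.T        # 1 1' = J, the r x r matrix of all 1s

print((one.T @ one)[0, 0])                 # 4.0 -> 1'1 = r
print(np.array_equal(J, np.ones((r, r))))  # True
```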
## Linear Dependence and Rank of a Matrix

- Linear dependence: a linear function of the columns (rows) of a matrix produces a zero vector, i.e., one or more columns (rows) can be written as a linear function of the other columns (rows).
- Rank of a matrix: the number of linearly independent columns (rows) of the matrix. Rank cannot exceed the minimum of the number of rows and columns: rank(A) ≤ min(r_A, c_A).
- A matrix is full rank if rank(A) = min(r_A, c_A).

$$
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} 1 & 3 \\ 4 & 12 \end{bmatrix}
= \begin{bmatrix} \underset{2 \times 1}{\mathbf{A}_1} & \underset{2 \times 1}{\mathbf{A}_2} \end{bmatrix}
\qquad
3\mathbf{A}_1 - \mathbf{A}_2 = \mathbf{0}
$$

The columns of A are linearly dependent, so rank(A) = 1.

$$
\underset{2 \times 2}{\mathbf{B}} = \begin{bmatrix} 4 & 3 \\ 4 & -3 \end{bmatrix}
= \begin{bmatrix} \underset{2 \times 1}{\mathbf{B}_1} & \underset{2 \times 1}{\mathbf{B}_2} \end{bmatrix}
\qquad
\text{only } 0\mathbf{B}_1 + 0\mathbf{B}_2 = \mathbf{0}
$$

The columns of B are linearly independent, so rank(B) = 2.
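A quick numerical check of the two rank examples (B as reconstructed above; `np.linalg.matrix_rank` counts linearly independent columns):

```python
import numpy as np

A = np.array([[1, 3], [4, 12]])  # second column = 3 x first column
B = np.array([[4, 3], [4, -3]])  # columns not proportional

print(np.linalg.matrix_rank(A))  # 1 -> A is not full rank
print(np.linalg.matrix_rank(B))  # 2 -> B is full rank
```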
## Matrix Inverse

- For scalars (except 0), multiplying a number by its reciprocal gives 1: 2(1/2) = 1, x(1/x) = x·x⁻¹ = 1.
- In matrix form, if A is square and full rank (all rows and columns linearly independent), then A has an inverse A⁻¹ such that A⁻¹A = AA⁻¹ = I.

$$
\mathbf{A} = \begin{bmatrix} 2 & 8 \\ 4 & -2 \end{bmatrix}
\qquad
\mathbf{A}^{-1} = \begin{bmatrix} \tfrac{2}{36} & \tfrac{8}{36} \\[2pt] \tfrac{4}{36} & -\tfrac{2}{36} \end{bmatrix}
$$

$$
\mathbf{A}^{-1}\mathbf{A} =
\begin{bmatrix} \tfrac{2}{36} & \tfrac{8}{36} \\[2pt] \tfrac{4}{36} & -\tfrac{2}{36} \end{bmatrix}
\begin{bmatrix} 2 & 8 \\ 4 & -2 \end{bmatrix} =
\begin{bmatrix} \tfrac{4+32}{36} & \tfrac{16-16}{36} \\[2pt] \tfrac{8-8}{36} & \tfrac{32+4}{36} \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}
$$

For a diagonal matrix, the inverse is diagonal with reciprocal elements:

$$
\mathbf{B} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{bmatrix}
\quad
\mathbf{B}^{-1} = \begin{bmatrix} 1/4 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/6 \end{bmatrix}
\quad
\mathbf{B}\mathbf{B}^{-1} =
\begin{bmatrix} 4(1/4) & 0 & 0 \\ 0 & 2(1/2) & 0 \\ 0 & 0 & 6(1/6) \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \mathbf{I}
$$
## Computing an Inverse of a 2x2 Matrix

$$
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\text{ full rank (columns/rows linearly independent)}
\qquad
\text{Determinant of } \mathbf{A}: \; |\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21}
$$

$$
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}
\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
$$

Note: if A is not full rank, then for some value k: $a_{11} = k a_{12}$ and $a_{21} = k a_{22}$, so

$$
|\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21} = k a_{12} a_{22} - a_{12} k a_{22} = 0
$$

Thus A⁻¹ does not exist when A is not full rank. While there are rules for general r x r matrices, we will use computers to invert them.
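The 2x2 formula above, applied to the earlier example A and compared against a library inverse:

```python
import numpy as np

A = np.array([[2.0, 8.0], [4.0, -2.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]  # a11*a22 - a12*a21 = -36
A_inv = (1.0 / det) * np.array([[A[1, 1], -A[0, 1]],
                                [-A[1, 0], A[0, 0]]])

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))     # True
```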
Regression example:

$$
\mathbf{X} = \begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\;\Rightarrow\;
\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
$$

$$
|\mathbf{X}'\mathbf{X}| = n\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2
= n\left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]
= n\sum_{i=1}^{n} (X_i - \bar{X})^2
$$

$$
\left( \mathbf{X}'\mathbf{X} \right)^{-1} =
\frac{1}{n\sum_{i=1}^{n}(X_i - \bar{X})^2}
\begin{bmatrix} \sum_{i=1}^{n} X_i^2 & -\sum_{i=1}^{n} X_i \\ -\sum_{i=1}^{n} X_i & n \end{bmatrix} =
\begin{bmatrix}
\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \\[8pt]
\dfrac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
\end{bmatrix}
$$

$$
\text{Note: } \sum_{i=1}^{n} X_i = n\bar{X}
\qquad
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
$$
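The closed form for (X'X)⁻¹ can be verified against a direct numerical inverse (small hypothetical X values):

```python
import numpy as np

X_vals = np.array([1.0, 2.0, 4.0, 5.0])
n = len(X_vals)
X = np.column_stack([np.ones(n), X_vals])

xbar = X_vals.mean()
Sxx = np.sum((X_vals - xbar) ** 2)  # sum of (X_i - Xbar)^2

# Closed form from the derivation above
closed = np.array([[1.0 / n + xbar**2 / Sxx, -xbar / Sxx],
                   [-xbar / Sxx, 1.0 / Sxx]])
print(np.allclose(closed, np.linalg.inv(X.T @ X)))  # True
```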
## Use of Inverse Matrix - Solving Simultaneous Equations

AY = C, where A and C are matrices of constants and Y is a matrix of unknowns. Then (assuming A is square and full rank):

$$
\mathbf{A}^{-1}\mathbf{AY} = \mathbf{A}^{-1}\mathbf{C}
\;\Rightarrow\;
\mathbf{Y} = \mathbf{A}^{-1}\mathbf{C}
$$

Equation 1: $12y_1 + 6y_2 = 48$. Equation 2: $10y_1 - 2y_2 = 12$:

$$
\mathbf{A} = \begin{bmatrix} 12 & 6 \\ 10 & -2 \end{bmatrix}
\quad
\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
\quad
\mathbf{C} = \begin{bmatrix} 48 \\ 12 \end{bmatrix}
$$

$$
\mathbf{A}^{-1} = \frac{1}{12(-2) - 6(10)}
\begin{bmatrix} -2 & -6 \\ -10 & 12 \end{bmatrix}
= -\frac{1}{84} \begin{bmatrix} -2 & -6 \\ -10 & 12 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 2 & 6 \\ 10 & -12 \end{bmatrix}
$$

$$
\mathbf{Y} = \mathbf{A}^{-1}\mathbf{C} =
\frac{1}{84} \begin{bmatrix} 2 & 6 \\ 10 & -12 \end{bmatrix}
\begin{bmatrix} 48 \\ 12 \end{bmatrix} =
\frac{1}{84} \begin{bmatrix} 96 + 72 \\ 480 - 144 \end{bmatrix} =
\frac{1}{84} \begin{bmatrix} 168 \\ 336 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \end{bmatrix}
$$

Note the wisdom of waiting to divide by |A| until the end of the calculation!
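In code one would solve the system directly rather than form A⁻¹ explicitly (`np.linalg.solve` factors A instead of inverting it):

```python
import numpy as np

A = np.array([[12.0, 6.0], [10.0, -2.0]])
C = np.array([48.0, 12.0])

Y = np.linalg.solve(A, C)  # solves AY = C
print(Y)  # [2. 4.]
```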
## Useful Matrix Results

All rules assume that the matrices are conformable for the operations.

Addition rules:

$$
\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}
\qquad
(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})
$$

Multiplication rules:

$$
(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC})
\qquad
\mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{CA} + \mathbf{CB}
\qquad
k(\mathbf{A} + \mathbf{B}) = k\mathbf{A} + k\mathbf{B} \quad (k \text{ a scalar})
$$

Transpose rules:

$$
(\mathbf{A}')' = \mathbf{A}
\qquad
(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'
\qquad
(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'
\qquad
(\mathbf{ABC})' = \mathbf{C}'\mathbf{B}'\mathbf{A}'
$$

Inverse rules (full-rank, square matrices):

$$
(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}
\qquad
(\mathbf{ABC})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}
\qquad
(\mathbf{A}^{-1})^{-1} = \mathbf{A}
\qquad
(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'
$$
## Random Vectors and Matrices

Shown for the case n = 3; generalizes to any n. Random variables $Y_1, Y_2, Y_3$:

$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
\qquad
\text{Expectation: } \mathrm{E}\{\mathbf{Y}\} = \begin{bmatrix} \mathrm{E}\{Y_1\} \\ \mathrm{E}\{Y_2\} \\ \mathrm{E}\{Y_3\} \end{bmatrix}
$$

In general: $\underset{n \times p}{\mathrm{E}\{\mathbf{Y}\}} = \left[ \mathrm{E}\{Y_{ij}\} \right]$, $i = 1, \ldots, n$; $j = 1, \ldots, p$.

Variance-covariance matrix for a random vector:

$$
\boldsymbol{\sigma}^2\{\mathbf{Y}\}
= \mathrm{E}\left\{ \left[ \mathbf{Y} - \mathrm{E}\{\mathbf{Y}\} \right] \left[ \mathbf{Y} - \mathrm{E}\{\mathbf{Y}\} \right]' \right\}
= \mathrm{E}\left\{
\begin{bmatrix} Y_1 - \mathrm{E}\{Y_1\} \\ Y_2 - \mathrm{E}\{Y_2\} \\ Y_3 - \mathrm{E}\{Y_3\} \end{bmatrix}
\begin{bmatrix} Y_1 - \mathrm{E}\{Y_1\} & Y_2 - \mathrm{E}\{Y_2\} & Y_3 - \mathrm{E}\{Y_3\} \end{bmatrix}
\right\}
$$

Taking expectations of the squared deviations (diagonal) and cross-products (off-diagonal):

$$
\boldsymbol{\sigma}^2\{\mathbf{Y}\} =
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_2^2 & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2
\end{bmatrix}
$$
## Linear Regression Example (n = 3)

Error terms are assumed to be independent, with mean 0 and constant variance $\sigma^2$: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.

$$
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{bmatrix}
\qquad
\mathrm{E}\{\varepsilon_i\} = 0 \;\Rightarrow\;
\mathrm{E}\{\boldsymbol{\varepsilon}\} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

$$
\sigma^2\{\varepsilon_i\} = \sigma^2, \;\;
\sigma\{\varepsilon_i, \varepsilon_j\} = 0 \; (i \neq j)
\;\Rightarrow\;
\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} =
\begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}
= \sigma^2 \mathbf{I}
$$

$$
\mathrm{E}\{\mathbf{Y}\} = \mathrm{E}\{\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\}
= \mathbf{X}\boldsymbol{\beta} + \mathrm{E}\{\boldsymbol{\varepsilon}\}
= \mathbf{X}\boldsymbol{\beta}
\qquad
\boldsymbol{\sigma}^2\{\mathbf{Y}\} = \boldsymbol{\sigma}^2\{\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\}
= \boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\}
= \sigma^2 \mathbf{I}
$$
## Mean and Variance of Linear Functions of Y

$\underset{k \times n}{\mathbf{A}}$ is a matrix of fixed constants; $\underset{n \times 1}{\mathbf{Y}}$ is a random vector:

$$
\underset{k \times 1}{\mathbf{W}} = \mathbf{AY} =
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix}
\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}
\quad\Rightarrow\quad
\text{random vector } \mathbf{W} =
\begin{bmatrix} W_1 \\ \vdots \\ W_k \end{bmatrix} =
\begin{bmatrix} a_{11}Y_1 + \cdots + a_{1n}Y_n \\ \vdots \\ a_{k1}Y_1 + \cdots + a_{kn}Y_n \end{bmatrix}
$$

$$
\mathrm{E}\{\mathbf{W}\} =
\begin{bmatrix} \mathrm{E}\{a_{11}Y_1 + \cdots + a_{1n}Y_n\} \\ \vdots \\ \mathrm{E}\{a_{k1}Y_1 + \cdots + a_{kn}Y_n\} \end{bmatrix} =
\begin{bmatrix} a_{11}\mathrm{E}\{Y_1\} + \cdots + a_{1n}\mathrm{E}\{Y_n\} \\ \vdots \\ a_{k1}\mathrm{E}\{Y_1\} + \cdots + a_{kn}\mathrm{E}\{Y_n\} \end{bmatrix} =
\mathbf{A}\,\mathrm{E}\{\mathbf{Y}\}
$$

$$
\boldsymbol{\sigma}^2\{\mathbf{W}\}
= \mathrm{E}\left\{ \left[ \mathbf{AY} - \mathbf{A}\mathrm{E}\{\mathbf{Y}\} \right] \left[ \mathbf{AY} - \mathbf{A}\mathrm{E}\{\mathbf{Y}\} \right]' \right\}
= \mathbf{A}\,\mathrm{E}\left\{ \left[ \mathbf{Y} - \mathrm{E}\{\mathbf{Y}\} \right] \left[ \mathbf{Y} - \mathrm{E}\{\mathbf{Y}\} \right]' \right\}\mathbf{A}'
= \mathbf{A}\,\boldsymbol{\sigma}^2\{\mathbf{Y}\}\,\mathbf{A}'
$$
## Multivariate Normal Distribution

$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\qquad
\boldsymbol{\mu} = \mathrm{E}\{\mathbf{Y}\} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}
\qquad
\boldsymbol{\Sigma} = \boldsymbol{\sigma}^2\{\mathbf{Y}\} =
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{bmatrix}
$$

Multivariate normal density function:

$$
f(\mathbf{Y}) = (2\pi)^{-n/2} |\boldsymbol{\Sigma}|^{-1/2}
\exp\left[ -\frac{1}{2} (\mathbf{Y} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \boldsymbol{\mu}) \right]
$$

$$
\mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\;\Rightarrow\;
Y_i \sim N(\mu_i, \sigma_i^2) \;\; i = 1, \ldots, n;
\qquad
\sigma\{Y_i, Y_j\} = \sigma_{ij} \;\; (i \neq j)
$$

Note: if A is a (full-rank) matrix of fixed constants:

$$
\mathbf{W} = \mathbf{AY} \sim N(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')
$$
## Simple Linear Regression in Matrix Form

Simple linear regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$:

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
\beta_0 + \beta_1 X_1 + \varepsilon_1 \\
\beta_0 + \beta_1 X_2 + \varepsilon_2 \\
\vdots \\
\beta_0 + \beta_1 X_n + \varepsilon_n
\end{bmatrix}
$$

Defining:

$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\quad
\mathbf{X} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
\quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\;\Rightarrow\;
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
$$

Assuming constant variance and independence of the error terms $\varepsilon_i$:

$$
\boldsymbol{\sigma}^2\{\mathbf{Y}\} = \boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} =
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \underset{n \times n}{\sigma^2 \mathbf{I}}
$$

Further assuming a normal distribution for the error terms $\varepsilon_i$: $\mathbf{Y} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, since

$$
\mathbf{X}\boldsymbol{\beta} =
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
= \mathrm{E}\{\mathbf{Y}\}
$$
## Estimating Parameters by Least Squares

Normal equations, obtained from $\partial Q / \partial \beta_0$ and $\partial Q / \partial \beta_1$ set equal to 0:

$$
n b_0 + b_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i
\qquad
b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i
$$

Note, in matrix form:

$$
\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
\qquad
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}
\qquad
\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
$$

$$
\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}
\;\Rightarrow\;
\mathbf{b} = \left( \mathbf{X}'\mathbf{X} \right)^{-1} \mathbf{X}'\mathbf{Y}
$$

Based on the matrix form:

$$
Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})
= \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
$$

$$
= \sum_{i=1}^{n} Y_i^2
- 2\left[ \beta_0 \sum_{i=1}^{n} Y_i + \beta_1 \sum_{i=1}^{n} X_i Y_i \right]
+ \left[ n\beta_0^2 + 2\beta_0\beta_1 \sum_{i=1}^{n} X_i + \beta_1^2 \sum_{i=1}^{n} X_i^2 \right]
$$

$$
\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} Y_i + 2n\beta_0 + 2\beta_1 \sum_{i=1}^{n} X_i
\qquad
\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} X_i Y_i + 2\beta_0 \sum_{i=1}^{n} X_i + 2\beta_1 \sum_{i=1}^{n} X_i^2
$$

$$
\frac{\partial Q}{\partial \boldsymbol{\beta}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
$$

Setting this equal to zero and replacing $\boldsymbol{\beta}$ with $\mathbf{b}$: $\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$.
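Solving the normal equations for the Toluca data reproduces the textbook fit (b0 ≈ 62.37, b1 ≈ 3.570):

```python
import numpy as np

# Toluca data (X = lot size, Y = work hours)
X_vals = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
                   20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70], dtype=float)
Y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
              113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323], dtype=float)

X = np.column_stack([np.ones_like(X_vals), X_vals])
b = np.linalg.solve(X.T @ X, X.T @ Y)  # normal equations: X'X b = X'Y
print(b)  # approximately [62.37, 3.57] -> (b0, b1)
```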
## Fitted Values and Residuals

$$
\hat{Y}_i = b_0 + b_1 X_i
\qquad
e_i = Y_i - \hat{Y}_i
$$

In matrix form:

$$
\hat{\mathbf{Y}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}
= \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}
= \mathbf{Xb} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{HY}
\qquad
\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
$$

H is called the "hat" or "projection" matrix. Note that H is idempotent (HH = H) and symmetric (H = H'):

$$
\mathbf{HH} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{H}
\qquad
\mathbf{H}' = \left[ \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \right]'
= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{H}
$$

$$
\mathbf{e} = \begin{bmatrix} Y_1 - \hat{Y}_1 \\ \vdots \\ Y_n - \hat{Y}_n \end{bmatrix}
= \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{Xb} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I} - \mathbf{H})\mathbf{Y}
$$

Note:

$$
\mathrm{E}\{\hat{\mathbf{Y}}\} = \mathbf{H}\,\mathrm{E}\{\mathbf{Y}\} = \mathbf{HX}\boldsymbol{\beta}
= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\beta}
\qquad
\boldsymbol{\sigma}^2\{\hat{\mathbf{Y}}\} = \mathbf{H}\,\sigma^2\mathbf{I}\,\mathbf{H}' = \sigma^2\mathbf{H}
\qquad
s^2\{\hat{\mathbf{Y}}\} = MSE \cdot \mathbf{H}
$$

$$
\mathrm{E}\{\mathbf{e}\} = (\mathbf{I} - \mathbf{H})\,\mathrm{E}\{\mathbf{Y}\}
= (\mathbf{I} - \mathbf{H})\mathbf{X}\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\beta} - \mathbf{X}\boldsymbol{\beta} = \mathbf{0}
\qquad
\boldsymbol{\sigma}^2\{\mathbf{e}\} = (\mathbf{I} - \mathbf{H})\,\sigma^2\mathbf{I}\,(\mathbf{I} - \mathbf{H})' = \sigma^2(\mathbf{I} - \mathbf{H})
\qquad
s^2\{\mathbf{e}\} = MSE \cdot (\mathbf{I} - \mathbf{H})
$$
## Analysis of Variance

Total (corrected) sum of squares:

$$
SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
= \sum_{i=1}^{n} Y_i^2 - \frac{\left( \sum_{i=1}^{n} Y_i \right)^2}{n}
$$

Note that

$$
\mathbf{Y}'\mathbf{Y} = \sum_{i=1}^{n} Y_i^2
\qquad
\frac{\left( \sum_{i=1}^{n} Y_i \right)^2}{n} = \frac{1}{n}\mathbf{Y}'\mathbf{JY}
\qquad
\underset{n \times n}{\mathbf{J}} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}
$$

$$
\Rightarrow\; SSTO = \mathbf{Y}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{JY}
= \mathbf{Y}' \left[ \mathbf{I} - \frac{1}{n}\mathbf{J} \right] \mathbf{Y}
$$

$$
SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{Xb})'(\mathbf{Y} - \mathbf{Xb})
= \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{Xb} - \mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{X}'\mathbf{Xb}
= \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}
= \mathbf{Y}'(\mathbf{I} - \mathbf{H})\mathbf{Y}
$$

since $\mathbf{b}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{HY}$.

$$
SSR = SSTO - SSE = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \frac{\left( \sum_{i=1}^{n} Y_i \right)^2}{n}
= \mathbf{Y}'\mathbf{HY} - \frac{1}{n}\mathbf{Y}'\mathbf{JY}
= \mathbf{Y}' \left[ \mathbf{H} - \frac{1}{n}\mathbf{J} \right] \mathbf{Y}
$$

Note that SSTO, SSR, and SSE are all QUADRATIC FORMS: $\mathbf{Y}'\mathbf{AY}$ for symmetric matrices A.
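The three quadratic forms, and the identity SSTO = SSR + SSE, on the same small hypothetical example:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0]])
Y = np.array([2.0, 3.0, 6.0])
n = len(Y)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
I = np.eye(n)

SSTO = Y @ (I - J / n) @ Y  # Y'[I - (1/n)J]Y
SSE = Y @ (I - H) @ Y       # Y'(I - H)Y
SSR = Y @ (H - J / n) @ Y   # Y'[H - (1/n)J]Y
print(np.isclose(SSTO, SSR + SSE))  # True
```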
## Inferences in Linear Regression

$$
\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
\;\Rightarrow\;
\mathrm{E}\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{E}\{\mathbf{Y}\}
= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}
$$

$$
\boldsymbol{\sigma}^2\{\mathbf{b}\}
= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\boldsymbol{\sigma}^2\{\mathbf{Y}\}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{I}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}
$$

Recall:

$$
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix}
\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \\[8pt]
\dfrac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
\end{bmatrix}
$$

$$
\Rightarrow\; s^2\{\mathbf{b}\} = MSE \cdot (\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix}
\dfrac{MSE}{n} + \dfrac{\bar{X}^2 \, MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{-\bar{X} \, MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \\[8pt]
\dfrac{-\bar{X} \, MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
\end{bmatrix}
$$

Estimated mean response at $X = X_h$, with $\mathbf{X}_h' = \begin{bmatrix} 1 & X_h \end{bmatrix}$:

$$
\hat{Y}_h = b_0 + b_1 X_h = \mathbf{X}_h'\mathbf{b}
\qquad
s^2\{\hat{Y}_h\} = \mathbf{X}_h'\, s^2\{\mathbf{b}\}\, \mathbf{X}_h
= MSE \cdot \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h
$$

Predicted new response at $X = X_h$:

$$
\hat{Y}_h = b_0 + b_1 X_h = \mathbf{X}_h'\mathbf{b}
\qquad
s^2\{pred\} = MSE \left[ 1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \right]
$$
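Putting the inference formulas together (hypothetical data; Xh = 3 chosen only for illustration):

```python
import numpy as np

X_vals = np.array([1.0, 2.0, 4.0, 5.0])
Y = np.array([2.0, 3.0, 6.0, 7.0])
n = len(Y)
X = np.column_stack([np.ones(n), X_vals])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y     # least squares estimates
e = Y - X @ b             # residuals
MSE = (e @ e) / (n - 2)   # error mean square

s2_b = MSE * XtX_inv                     # s^2{b}
Xh = np.array([1.0, 3.0])                # Xh' = [1, X_h]
s2_mean = MSE * (Xh @ XtX_inv @ Xh)      # s^2{Y_hat_h}: mean response
s2_pred = MSE * (1 + Xh @ XtX_inv @ Xh)  # s^2{pred}: new observation
print(s2_pred > s2_mean)  # True: predicting a new case adds a full MSE
```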
