Matrix Algebra Review & GLM:
General Linear Models
Advanced Biostatistics
Dean C. Adams
Lecture 5
EEOB 590C
Matrices: The Algebra of Statistics
•Grasping matrix algebra is the KEY to understanding statistics
•It is a compact method of expressing mathematical operations
•It generalizes from one variable to many (i.e., from vectors to matrices)
•Matrix operations have geometric interpretations in data spaces
•Much of our data (e.g., shape) cannot be measured with a single
variable, so multivariate methods are required to properly address
our hypotheses (e.g., we can evaluate covariance)
Scalars, Vectors, and Matrices
•Scalar: a number
•Vector: an ordered list (array) of scalars (n rows × 1 column)
•Matrix: a rectangular array of scalars (n rows × p columns)
$$a = 3 \qquad
\mathbf{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{np} \end{bmatrix}$$
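These objects map directly onto R (a minimal sketch; the particular values are arbitrary):

a <- 3                                # scalar
e <- c(2, 7, 5)                       # vector (n rows x 1 column)
X <- matrix(1:6, nrow = 3, ncol = 2)  # matrix (n rows x p columns)
dim(X)                                # 3 2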
Matrix Transpose
•Reverse rows and columns
•Represented by A^t or A′
•Vector transpose works identically
$$\mathbf{A} = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix} \qquad
\mathbf{A}^t = \mathbf{A}' = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}$$
Matrix Addition and Subtraction
•Matrices must have same dimensions
•Add/subtract element-wise
•Vector addition/subtraction works identically
Addition
$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} + \begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 6 & 12 \end{bmatrix}$$

Subtraction
$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} - \begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix} = \begin{bmatrix} -4 & -4 \\ -4 & -6 \end{bmatrix}$$
Scalar Multiplication
2  6 
ae  3 7    21
 5  15 
1 2  5 10 
aB  5 



0
5
0
25

 

•When multiplying a vector/matrix by a scalar, simply multiply
all elements by the scalar
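These operations map directly onto R's basic operators (a minimal sketch using the matrices from the addition example above):

A <- matrix(c(2, 1, 4, 3), nrow = 2)  # [2 4; 1 3] (R fills column-wise)
B <- matrix(c(6, 5, 8, 9), nrow = 2)  # [6 8; 5 9]
t(A)     # transpose
A + B    # element-wise addition:    [8 12; 6 12]
A - B    # element-wise subtraction: [-4 -4; -4 -6]
5 * A    # scalar multiplication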
Vector Multiplication
•Multiply 2 vectors:
•Right?
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \qquad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \qquad
\mathbf{ab} = \begin{bmatrix} a_1b_1 \\ a_2b_2 \\ \vdots \\ a_nb_n \end{bmatrix}$$
•Wrong!
*Actually, not always wrong, but not general. The above is known as the 'Hadamard' product (element-wise multiplication), a special multiplication procedure that is only possible for vectors/matrices of the same dimension (and only used when specified as a Hadamard product).
Vector Multiplication: Inner Product
•Combines multiplication and addition, and is performed in
specified order
$$\mathbf{a}^t\mathbf{b} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_ib_i$$

(1 × n)(n × 1): the inner dimensions must match; the outer dimensions define the resultant matrix (here 1 × 1, a scalar).

Why "inner" product? For vector multiplication to work, dimensions have to match in a special way. The inner product (or dot product) is a product that returns a scalar.
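In R, %*% performs this product (a minimal sketch; the vectors are arbitrary):

a <- c(1, 2, 3)
b <- c(4, 5, 6)
t(a) %*% b  # inner product: a 1 x 1 matrix containing 32
sum(a * b)  # the same value as a scalar (a * b alone is the Hadamard product)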
Vector Multiplication: Outer Product
$$\mathbf{a}^t\mathbf{b} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \sum_{i=1}^{n} a_ib_i \qquad \text{(Inner Product)}$$

$$\mathbf{b}\mathbf{a}^t = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} = \begin{bmatrix} a_1b_1 & a_2b_1 & \cdots & a_nb_1 \\ a_1b_2 & a_2b_2 & & \vdots \\ \vdots & & \ddots & \\ a_1b_n & \cdots & & a_nb_n \end{bmatrix} \qquad \text{(Outer Product)}$$

The outer product is a "square matrix": rows = columns.

*Note: the trace of ba^t (the sum of its diagonal elements) is equal to the inner product a^t b.
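The outer product and the trace identity in R (a minimal sketch, continuing with a and b from above):

b %*% t(a)             # outer product: an n x n square matrix
sum(diag(b %*% t(a)))  # trace = 32, equal to the inner product t(a) %*% b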
Matrix Multiplication
•Simply the joint multiplication of vectors within matrices
1 1 1 4 
AB  


 2 3  1 1 
1 1 1
AB  
 1


1 1 
AB  




 1
AB  

 2 3 1
Matrix
multiplication
is solving a
series of inner
products


AB  

 2 3 
 1 1  1 1

 
4 

1  
 0

 



1  4  1 1 

 
3


 
 

  2 1  3 1   5
 
 
4 
 





1   2  4  3 1 




11
1 1 1 4  0 3 
AB  
 1 1   5 11
2
3


 

10
Matrix Multiplication
•Order of matrices makes a difference: AB ≠ BA

$$\mathbf{AB} = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 3 \\ 5 & 11 \end{bmatrix} \qquad
\mathbf{BA} = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 9 & 11 \\ 3 & 2 \end{bmatrix}$$

•Also, the matrix product will not always work in both directions:

$$\mathbf{AB} = \underset{(3 \times 2)}{\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 1 & 1 \end{bmatrix}}\;\underset{(2 \times 2)}{\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}} = \begin{bmatrix} 0 & 3 \\ 5 & 11 \\ 2 & 5 \end{bmatrix} \qquad
\mathbf{BA} = \underset{(2 \times 2)}{\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}}\;\underset{(3 \times 2)}{\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 1 & 1 \end{bmatrix}} = \text{NA (inner dimensions do not match)}$$
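The same examples in R, where %*% enforces conformable dimensions (a minimal sketch):

A <- matrix(c(1, 2, -1, 3), nrow = 2)  # [1 -1; 2 3]
B <- matrix(c(1, 1, 4, 1), nrow = 2)   # [1 4; 1 1]
A %*% B                 # [0 3; 5 11]
B %*% A                 # [9 11; 3 2]: order matters
C <- rbind(A, c(1, 1))  # a 3 x 2 matrix
C %*% B                 # works: (3 x 2)(2 x 2)
# B %*% C               # error: non-conformable arguments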
Review: Multiplication
•Scalar multiplication:

$$a\mathbf{e} = 3\begin{bmatrix} 2 \\ 7 \\ 5 \end{bmatrix} = \begin{bmatrix} 6 \\ 21 \\ 15 \end{bmatrix}$$

•Matrix multiplication:

$$\mathbf{AB} = \begin{bmatrix} \sum_{i=1}^{n} a_{1i}b_{i1} & \sum_{i=1}^{n} a_{1i}b_{i2} \\ \sum_{i=1}^{n} a_{2i}b_{i1} & \sum_{i=1}^{n} a_{2i}b_{i2} \end{bmatrix}$$

$$\mathbf{AB} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & 3 \\ 4 & 2 \end{bmatrix} = \begin{bmatrix} 2 + 6 + 12 & 1 + 6 + 6 \\ 8 + 15 + 24 & 4 + 15 + 12 \end{bmatrix} = \begin{bmatrix} 20 & 13 \\ 47 & 31 \end{bmatrix}$$

Inner dimensions MUST AGREE!!!
Special Matrices
(figure: table of special matrix types, not reproduced)

Special Matrices Cont.
(figure continued)
Matrix Multiplication: Geometric Interpretation
•Multiplying data and other matrices has geometric interpretations
•YI = Y: no change to Y
•YcI: change of scale (e.g., enlargement)
•YD: stretching, if D is diagonal
•YT: rigid rotation, if T is p × p orthogonal
•YT: shear, if T is not orthogonal (T can be decomposed into rotation, dilation, rotation)
•Y = data matrix; for example, a 2-D rotation matrix is

$$\mathbf{H} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
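A minimal R sketch of the rotation case (θ = 90° is an arbitrary choice):

theta <- pi / 2
H <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
Y <- matrix(c(1, 0, 0, 1, 1, 1), nrow = 3, byrow = TRUE)  # 3 points in 2-D
Y %*% H            # rigidly rotated data
round(t(H) %*% H)  # H is orthogonal: t(H) %*% H = I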
Matrix Multiplication: Visual Examples
(Figures: an original point configuration shown after scaling by 1/2 and by 2, rotations, and shears and projections; images from C.A.B. Smith, 1969)
Matrix Inversion
•Can't divide matrices, but we can find the inverse (reciprocal) and
then multiply by it (much like: a/b = a × (1/b))
•Inverses have the property that: AA⁻¹ = I
•The matrix must be square and non-singular (i.e., determinant ≠ 0)
$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad
\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
Matrix Inversion: Example
Example:

$$\mathbf{A} = \begin{bmatrix} 3 & 4 \\ 4 & 6 \end{bmatrix}$$

Inverse:

$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\begin{bmatrix} 6 & -4 \\ -4 & 3 \end{bmatrix} = \frac{1}{(3)(6) - (4)(4)}\begin{bmatrix} 6 & -4 \\ -4 & 3 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 6 & -4 \\ -4 & 3 \end{bmatrix} = \begin{bmatrix} 3 & -2 \\ -2 & 1.5 \end{bmatrix}$$

Confirm:

$$\mathbf{AA}^{-1} = \begin{bmatrix} 3 & 4 \\ 4 & 6 \end{bmatrix}\begin{bmatrix} 3 & -2 \\ -2 & 1.5 \end{bmatrix} = \begin{bmatrix} 3(3) + 4(-2) & 3(-2) + 4(1.5) \\ 4(3) + 6(-2) & 4(-2) + 6(1.5) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
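The same example in R, where solve() returns the inverse (a minimal sketch):

A <- matrix(c(3, 4, 4, 6), nrow = 2)
det(A)          # 2
solve(A)        # [3 -2; -2 1.5]
A %*% solve(A)  # identity (up to floating-point error)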
Matrix Inversion: Larger matrices
Inversion of 3 × 3 or larger matrices is not trivial computationally.
(Analytical formula for the 3 × 3 inverse shown here; "plagiarized from Wikipedia…")
Computer programs like R use LAPACK – a library of routines for numerical linear algebra and matrix decompositions – to estimate the matrix inverse.
Matrix Inversion: Uses
•While inversion seems useful in only rare circumstances, we use it
frequently as an analytical tool in statistical estimation.
1 1 
•Consider:
1 1 
1

1
X
1
1

1
2 
3

4
5

6
1

1 1 1 1 1 1  1
t
X X

1 2 3 4 5 6  1
1

1
2 
3   6 21

4   21 91
5

6
This is the matrix equivalent to squaring!
( X X)
t
-1
é 91 -21 ù 1 é 91 -21 ù é 0.876 -0.200 ù
1
=
ê
ú=
ê
ú=ê
ú
6 × 91- 21× 21 ë -21 6 û 105 ë -21 6 û ë -0.200 0.867 û
20
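The same computation in R (a minimal sketch):

X <- cbind(1, 1:6)  # intercept column plus the values 1..6
t(X) %*% X          # [6 21; 21 91]
solve(t(X) %*% X)   # [0.867 -0.200; -0.200 0.057]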
From Matrix Algebra to Linear Models
•Here, X contains a dummy variable (a column of 1s) and a second variable

$$\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{bmatrix} \qquad
\mathbf{X}^t\mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{bmatrix} = \begin{bmatrix} 6 & 21 \\ 21 & 91 \end{bmatrix} = \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}$$

X is a linear model design matrix. It allows calculation of statistics for variables.
A Simple Linear Model
1
1

x  1

1
1
1
1

t
x x  1 1 1 1 1 1  5  n

1
1
8 
3
 
y  4
 
4
1 
8 
3
 
t
x y  1 1 1 1 1  4   20   y
 
4
1 
x x
t
1
1 1
 
5 n
The linear model and its coefficients
y

x x x y  n  y  4
t
1
t
22
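This intercept-only model estimates the mean; in R (a minimal sketch with the data above):

x <- rep(1, 5)
y <- c(8, 3, 4, 4, 1)
solve(t(x) %*% x) %*% t(x) %*% y  # 4
mean(y)                           # the same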
The General Linear Model
$$\underset{n \times p}{\mathbf{Y}} = \underset{n \times k}{\mathbf{X}}\;\underset{k \times p}{\mathbf{B}} + \underset{n \times p}{\mathbf{E}}$$

Data = Design × Coefficients (Prediction) + Error

Y: matrix of p dependent values
X: model design matrix for n subjects and k parameters
B: model coefficients for the k parameters, p times (one column per variable)
E: matrix of unexplained values (residuals)
The General Linear Model
$$\mathbf{Y} = \mathbf{XB} + \mathbf{E} \qquad \underset{k \times p}{\hat{\mathbf{B}}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$$

Why? We need to solve: Y = XB

Make 'X' a square matrix (premultiply both sides by X^t):

$$\mathbf{X}^t\mathbf{Y} = \mathbf{X}^t\mathbf{XB}$$

Multiply by the inverse:

$$(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{XB}$$

$$(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \mathbf{IB} = \hat{\mathbf{B}}$$
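In R this estimator is one line; lm() computes the same quantity more stably via a QR decomposition. A minimal sketch (the response values here are hypothetical):

X <- cbind(1, 1:6)                # design matrix from the earlier slides
Y <- c(2, 4, 5, 8, 9, 12)         # hypothetical response
solve(t(X) %*% X) %*% t(X) %*% Y  # (X'X)^{-1} X'Y
coef(lm(Y ~ X[, 2]))              # matches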
General Linear Models
•Assess variation in Y as explained by models of the form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \epsilon$$

•ANOVA: categorical X
•Ho: no difference among groups
•Regression: continuous X
•Ho: no relationship between X & Y
•ANCOVA: combination of the two
•Ho: no difference among groups while accounting for covariation

(Figures: a histogram of squamosal/dentary ratio illustrating ANOVA, and scatterplots of Y against SVL illustrating regression and ANCOVA, with four groups marked in the ANCOVA panel)
Estimating Parameters
•Typically expressed using SS equations (see e.g., Biometry)
•Linear models simply partition SS and s²

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 = \frac{SS}{df}$$

•ANOVA (single-factor):

$$SSM = SSB = \sum_{i=1}^{a} n_i\left(\bar{Y}_i - \bar{Y}\right)^2 \qquad
SSW = SSE = \sum_{i=1}^{a}\sum_{j=1}^{n}\left(Y_{ij} - \bar{Y}_i\right)^2$$

•Regression:

$$SSM = SSR = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2 \qquad
SSE = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

•Parameters then found using SS-type equations:

$$b_{Y \cdot X} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

•Parameters more easily found using the matrix form of the GLM
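A numeric check of the regression parameter formulas in R (a minimal sketch using the data from the regression example later in this lecture):

x <- c(1, 4, 5, 7, 9)
y <- c(1, 2, 3, 4, 5)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)        # 0.3152 0.5163
coef(lm(y ~ x))  # the same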
Deriving Univariate Regression
$$\mathbf{Y} = \mathbf{XB} + \boldsymbol{\epsilon} \qquad \text{where:} \qquad
\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} \quad \text{and} \quad
\mathbf{X} = \begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}$$

1. Expand matrices:

$$\mathbf{B} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \left(\begin{bmatrix} 1 & \cdots & 1 \\ X_1 & \cdots & X_n \end{bmatrix}\begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & \cdots & 1 \\ X_1 & \cdots & X_n \end{bmatrix}\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}$$

2. Begin rewrite:

$$\mathbf{B} = \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix}$$
Deriving Univariate Regression (Cont. 1)
2. From before:

$$\mathbf{B} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix}$$

3. Calculate the inverse:

$$\begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}^{-1} = \frac{1}{n\sum X^2 - \left(\sum X\right)^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & n \end{bmatrix}$$

4. Multiply:

$$\mathbf{B} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \frac{1}{n\sum X^2 - \left(\sum X\right)^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & n \end{bmatrix}\begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix} = \begin{bmatrix} \dfrac{\sum X^2\sum Y - \sum X\sum XY}{n\sum X^2 - \left(\sum X\right)^2} \\[2ex] \dfrac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2} \end{bmatrix}$$
Deriving Univariate Regression (Cont. 2)
$$\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} \dfrac{\sum X^2\sum Y - \sum X\sum XY}{n\sum X^2 - \left(\sum X\right)^2} \\[2ex] \dfrac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2} \end{bmatrix}$$

$$\beta_1 = \frac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2} \qquad \text{rearranges to:} \qquad
b_{Y \cdot X} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

Denominator:

1: $\sum\left(X - \bar{X}\right)^2 = \sum\left(X - \bar{X}\right)\left(X - \bar{X}\right)$

2: $\sum\left(X - \bar{X}\right)\left(X - \bar{X}\right) = \sum\left(X^2 - 2X\bar{X} + \bar{X}^2\right)$ (expand the multiplication)

3: $\sum\left(X^2 - 2X\bar{X} + \bar{X}^2\right) = \sum X^2 - 2\bar{X}\sum X + n\bar{X}^2$ (separate the sum, extract constants)

4: $\sum X^2 - 2\dfrac{\left(\sum X\right)^2}{n} + \dfrac{\left(\sum X\right)^2}{n} = \sum X^2 - \dfrac{\left(\sum X\right)^2}{n}$ (since $\bar{X} = \frac{\sum X}{n}$)

5: $n\left[\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}\right] = n\sum X^2 - \left(\sum X\right)^2$ (multiply by n)

NOTE: lots of work to get to the solution from matrices!
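A quick check in R that the matrix solution matches the summation formula (a minimal sketch; the data are from the next slide):

x <- c(1, 4, 5, 7, 9)
y <- c(1, 2, 3, 4, 5)
n <- length(x)
(n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)  # b1 = 0.5163
X <- cbind(1, x)
solve(t(X) %*% X) %*% t(X) %*% y  # the same b1 (and b0), via matrices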
Calculating Sums of Squares
•F-ratio is: SSM/SSE (with df corrections)
•Need to calculate full and reduced model SS
•Full model (contains all terms): $SS_F = \sum\left(Y - \hat{Y}\right)^2 = \left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}\right)^t\left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}\right)$
•Reduced model (X# has 1 less term in it): $SS_R = \left(\mathbf{Y} - \mathbf{X}^{\#}\hat{\boldsymbol{\beta}}^{\#}\right)^t\left(\mathbf{Y} - \mathbf{X}^{\#}\hat{\boldsymbol{\beta}}^{\#}\right)$
•Significance based on:

$$F_s = \frac{\left(SS_R - SS_F\right)/\left(q - q'\right)}{SS_F/q'}$$

•General formula for the SS of a model effect (say, A):

$$SS_A = \hat{\boldsymbol{\beta}}_F^t\mathbf{X}_F^t\mathbf{Y} - \hat{\boldsymbol{\beta}}_R^t\mathbf{X}_R^t\mathbf{Y}$$

•Works for multivariate data as well (see Rencher, 1995: p. 359)
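The full-versus-reduced comparison in R (a minimal sketch using the regression example data, with an intercept-only reduced model; q = 4 and q′ = 3 are the residual df):

x <- c(1, 4, 5, 7, 9)
y <- c(1, 2, 3, 4, 5)
Xf <- cbind(1, x)                            # full design
Xr <- matrix(1, 5, 1)                        # reduced design
Bf <- solve(t(Xf) %*% Xf) %*% t(Xf) %*% y
Br <- solve(t(Xr) %*% Xr) %*% t(Xr) %*% y
SSf <- t(y - Xf %*% Bf) %*% (y - Xf %*% Bf)  # 0.1902
SSr <- t(y - Xr %*% Br) %*% (y - Xr %*% Br)  # 10 (= SST)
((SSr - SSf) / (4 - 3)) / (SSf / 3)          # F = 154.7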
SS & SS: Some Comments
•SS > SS (SS: more terms = lower SS; ‘explains’ more)
•q & q’ are df for SS & SS (how many parameters are estimated
for each model vs. sample size)
•(SS - SS) estimates effect of the term of interest
•For bivariate regression: SS = SST
•For bivariate regression: (SS - SS) = [SST – SSE] = SSM
 SS  SS 

'

q

q

Fs  
SS  '
q
31
Regression Example
•The Data:

$$\mathbf{X} = \begin{bmatrix} 1 \\ 4 \\ 5 \\ 7 \\ 9 \end{bmatrix} \qquad
\mathbf{Y} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} \qquad
\text{(for matrix form): } \mathbf{X}_{new} = \begin{bmatrix} 1 & 1 \\ 1 & 4 \\ 1 & 5 \\ 1 & 7 \\ 1 & 9 \end{bmatrix}$$

> lm(yreg~xreg)
Coefficients:
(Intercept)         xreg
     0.3152       0.5163

> solve(t(xnewreg)%*%xnewreg)%*%t(xnewreg)%*%yreg
          [,1]
[1,] 0.3152174
[2,] 0.5163043

> anova(lm(yreg~xreg))
          Df Sum Sq Mean Sq F value  Pr(>F)
xreg       1 9.8098  9.8098  154.71 0.00112 **
Residuals  3 0.1902  0.0634

From the full model:
> t(yreg-yhatreg)%*%(yreg-yhatreg)
          [,1]
[1,] 0.1902174
ANOVA using GLM
•Same idea, but must use special X-matrix coding
•Recode the k groups as k − 1 dummy-variable columns of X (here, 2 groups and 1 contrast column):

$$\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & -1 \\ \vdots & \vdots \\ 1 & -1 \end{bmatrix}$$

•Generally, column 1 yields $\bar{X}$, column 2 yields the deviation from $\bar{X}$ for the mean of group 1, etc. (if set up as orthogonal contrasts)
•Dummy variables can be constructed as orthogonal contrasts of various groups
Orthogonal Contrasts
•Group columns only represent $\bar{X}$, $\bar{X}_1 - \bar{X}$, etc. if they are set up as orthogonal contrasts
•Orthogonal contrasts are columns of 1, 0, −1 that sum to ZERO
•One approach: code the group of interest as 1, the last group as −1, and the others as 0
•If groups have different sample sizes, this can get complicated
•Need to do weighted contrasts (i.e., scale the −1 by sample size)
•To get the mean for a group, multiply the deviation by this weight
GLM ANOVA: The Meaning of β
•For GLM ANOVA, the values in β represent the overall mean ($\bar{X}$) and deviations from the grand mean ($\bar{X}_1 - \bar{X}$)
•For 2 groups, think of β₁ as the 'slope' between the group means
•Example (groups coded 1 and −1):

$$\bar{X} = 5 \qquad \bar{X}_1 = 4 \qquad \bar{X}_2 = 6 \qquad \beta_0 = 5 \qquad \beta_1 = -1$$
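In R, this coding can be generated rather than typed by hand (a minimal sketch using sum-to-zero contrasts and the data from the next slide):

y <- c(5, 4, 4, 4, 3, 7, 5, 6, 6, 6)
g <- factor(rep(1:2, each = 5))
contrasts(g) <- contr.sum(2)      # a column of 1 / -1 that sums to zero
X <- model.matrix(~ g)            # intercept + contrast column
solve(t(X) %*% X) %*% t(X) %*% y  # 5 (grand mean) and -1 (deviation)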
ANOVA Example 1: Equal Sample Sizes
•The Data: n₁ = 5, n₂ = 5

$$\mathbf{X}_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix} \qquad
\mathbf{Y} = \begin{bmatrix} 5 \\ 4 \\ 4 \\ 4 \\ 3 \\ 7 \\ 5 \\ 6 \\ 6 \\ 6 \end{bmatrix} \qquad
\bar{X} = 5 \quad \bar{X}_1 = 4 \quad \bar{X}_2 = 6$$

> summary(lm(y~x2))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0000     0.2236  22.361 1.69e-08
x2           -1.0000     0.2236  -4.472  0.00208

> solve(t(xnew)%*%xnew)%*%t(xnew)%*%y
     [,1]
[1,]    5
[2,]   -1

> anova(lm(y~x2))
          Df Sum Sq Mean Sq F value   Pr(>F)
x2         1     10    10.0      20 0.002077 **
Residuals  8      4     0.5

From the full model:
> t(y-yhat)%*%(y-yhat)
     [,1]
[1,]    4
ANOVA Example 2: Unequal Sample Sizes
•The Data: n₁ = 4, n₂ = 6 (so the −1 codes are weighted: −1 × 4/6 = −.667)

$$\mathbf{X}_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -.667 \\ 1 & -.667 \\ 1 & -.667 \\ 1 & -.667 \\ 1 & -.667 \\ 1 & -.667 \end{bmatrix} \qquad
\mathbf{Y} = \begin{bmatrix} 5 \\ 4 \\ 4 \\ 4 \\ 3 \\ 7 \\ 5 \\ 6 \\ 6 \\ 6 \end{bmatrix} \qquad
\bar{X} = 5 \quad \bar{X}_1 = 4.25 \quad \bar{X}_2 = 5 + (.667)(.75) = 5.5$$

> solve(t(xnew)%*%xnew)%*%t(xnew)%*%y
      [,1]
ones  5.00
xnew -0.75

> summary(lm(y~x))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.5000     0.4621  11.902 2.28e-06 ***
xm           -1.2500     0.7307  -1.711    0.125

> anova(lm(y~x))
          Df Sum Sq Mean Sq F value Pr(>F)
factor(x)  1   3.75  3.7500  2.9268 0.1255
Residuals  8  10.25  1.2812

(Note: lm() with its default dummy coding returns a different but equivalent parameterization: the intercept, 5.5, is one group's mean, and the coefficient, −1.25, is the difference between the two group means.)
GLM Comments
•GLM is the fitting of models
•Fit a null model (e.g., Y ~ 1) and then more complicated models
•Assess the fit of different models via SS (i.e., LRT)
•All of this is accomplished using matrix algebra
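The model-comparison view in R (a minimal sketch with the regression example data):

x <- c(1, 4, 5, 7, 9)
y <- c(1, 2, 3, 4, 5)
fit0 <- lm(y ~ 1)  # null model
fit1 <- lm(y ~ x)  # model with the covariate added
anova(fit0, fit1)  # compares the SS of the nested models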
Conclusions
•ANOVA and regression are all the same model (GLM)
(the key difference is what is used in X)
•GLM can be expressed in matrix algebra
$$\hat{\mathbf{B}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$$
•Question: What about multivariate-Y?