Matrix Algebra Review & GLM: General Linear Models
Advanced Biostatistics
Dean C. Adams
Lecture 5, EEOB 590C

Matrices: The Algebra of Statistics
•Grasping matrix algebra is the KEY to understanding statistics
•Compact method of expressing mathematical operations
•Generalizes from one to many variables (i.e., vectors to matrices)
•Matrix operations have geometric interpretations in data spaces
•Much of our data (e.g., shape) cannot be measured with a single variable, so multivariate methods are required to properly address our hypotheses (e.g., we can evaluate covariance)

Scalars, Vectors, and Matrices
•Scalar: a number, $a$
•Vector: an ordered list (array) of scalars ($n$ rows $\times$ 1 column)
•Matrix: a rectangular array of scalars ($n$ rows $\times$ $p$ columns)

$$\mathbf{e}=\begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} \qquad \mathbf{X}=\begin{bmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{np} \end{bmatrix}$$

Matrix Transpose
•Reverse rows and columns
•Represented by $\mathbf{A}^t$ or $\mathbf{A}'$
•Vector transpose works identically

$$\mathbf{A}=\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} \qquad \mathbf{A}^t=\mathbf{A}'=\begin{bmatrix} a & c & e \\ b & d & f \end{bmatrix}$$

Matrix Addition and Subtraction
•Matrices must have the same dimensions
•Add/subtract element-wise
•Vector addition/subtraction works identically

$$\text{Addition: } \mathbf{A}+\mathbf{B}=\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}+\begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix}=\begin{bmatrix} 8 & 12 \\ 6 & 12 \end{bmatrix}$$

$$\text{Subtraction: } \mathbf{A}-\mathbf{B}=\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}-\begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix}=\begin{bmatrix} -4 & -4 \\ -4 & -6 \end{bmatrix}$$

Scalar Multiplication
•When multiplying a vector/matrix by a scalar, simply multiply all elements by the scalar

$$a\mathbf{e}=3\begin{bmatrix} 2 \\ 7 \\ 5 \end{bmatrix}=\begin{bmatrix} 6 \\ 21 \\ 15 \end{bmatrix} \qquad a\mathbf{B}=5\begin{bmatrix} 1 & 2 \\ 0 & 5 \end{bmatrix}=\begin{bmatrix} 5 & 10 \\ 0 & 25 \end{bmatrix}$$

Vector Multiplication
•Multiply 2 vectors. Right?

$$\mathbf{a}=\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad \mathbf{b}=\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \quad \mathbf{ab}=\begin{bmatrix} a_1b_1 \\ a_2b_2 \\ \vdots \\ a_nb_n \end{bmatrix}$$

•Wrong!
*Actually, not always wrong, but not general. The above is known as the 'Hadamard' product (element-wise multiplication), a special multiplication procedure that is only possible for vectors/matrices of the same dimension (and is only done when specified as the Hadamard product).

Vector Multiplication: Inner Product
•Combines multiplication and addition, and is performed in a specified order

$$\mathbf{a}^t\mathbf{b}=\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}=a_1b_1+a_2b_2+\cdots+a_nb_n=\sum_{i=1}^{n}a_ib_i$$

•Inner dimensions must match; outer dimensions define the resultant matrix: e.g., $(1\times 3)(3\times 1)$ yields a $1\times 1$ result
•Why "inner" product? For vector multiplication to work, dimensions have to match in a special way. The inner product (or dot product) is a product that returns a scalar.

Vector Multiplication: Outer Product

$$\mathbf{a}^t\mathbf{b}=\sum_{i=1}^{n}a_ib_i \quad \text{(inner product: a scalar)}$$

$$\mathbf{b}\mathbf{a}^t=\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}=\begin{bmatrix} a_1b_1 & a_2b_1 & \cdots & a_nb_1 \\ a_1b_2 & a_2b_2 & & a_nb_2 \\ \vdots & & \ddots & \vdots \\ a_1b_n & a_2b_n & \cdots & a_nb_n \end{bmatrix} \quad \text{(outer product)}$$

•The outer product is a "square matrix" (rows = columns)
*Note: the trace of $\mathbf{b}\mathbf{a}^t$ (the sum of its diagonal elements) is equal to the inner product $\mathbf{a}^t\mathbf{b}$

Matrix Multiplication
•Simply the joint multiplication of vectors within matrices: matrix multiplication is solving a series of inner products

$$\mathbf{AB}=\begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}=\begin{bmatrix} 1(1)+(-1)(1) & 1(4)+(-1)(1) \\ 2(1)+3(1) & 2(4)+3(1) \end{bmatrix}=\begin{bmatrix} 0 & 3 \\ 5 & 11 \end{bmatrix}$$

Matrix Multiplication (cont.)
•Order of matrices makes a difference: $\mathbf{AB}\neq\mathbf{BA}$

$$\mathbf{AB}=\begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}=\begin{bmatrix} 0 & 3 \\ 5 & 11 \end{bmatrix} \qquad \mathbf{BA}=\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}=\begin{bmatrix} 9 & 11 \\ 3 & 2 \end{bmatrix}$$

•Also, the matrix product will not always work in both directions. With $\mathbf{A}$ now $3\times 2$:

$$\mathbf{AB}=\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 10 & -4 \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}=\begin{bmatrix} 0 & 3 \\ 5 & 11 \\ 6 & 36 \end{bmatrix} \qquad \mathbf{BA}=\text{NA (inner dimensions do not agree)}$$

Review: Multiplication
•Scalar multiplication: $a\mathbf{e}=3\begin{bmatrix} 2 & 7 & 5 \end{bmatrix}^t=\begin{bmatrix} 6 & 21 & 15 \end{bmatrix}^t$
•Matrix multiplication (each element is an inner product; see the R sketch below):

$$\mathbf{AB}=\begin{bmatrix} \sum_i a_{1i}b_{i1} & \sum_i a_{1i}b_{i2} \\ \sum_i a_{2i}b_{i1} & \sum_i a_{2i}b_{i2} \end{bmatrix} \qquad \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & 3 \\ 4 & 2 \end{bmatrix}=\begin{bmatrix} 2+6+12 & 1+6+6 \\ 8+15+24 & 4+15+12 \end{bmatrix}=\begin{bmatrix} 20 & 13 \\ 47 & 31 \end{bmatrix}$$

•Inner dimensions MUST AGREE!!!

Special Matrices

Special Matrices (cont.)
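These operations can all be checked numerically in R (the software used for the examples later in this lecture). A minimal sketch using the matrices from the slides above; the vectors a and b are arbitrary illustrative values of mine:

# Matrices from the preceding examples
A <- matrix(c(1, 2, -1, 3), nrow = 2)   # A = [1 -1; 2 3] (R fills column-wise)
B <- matrix(c(1, 1, 4, 1), nrow = 2)    # B = [1 4; 1 1]
t(A)                                    # transpose
A + B; A - B                            # element-wise addition/subtraction
5 * B                                   # scalar multiplication
a <- c(1, 2, 3); b <- c(4, 5, 6)
a * b                                   # Hadamard (element-wise) product
t(a) %*% b                              # inner product: a 1x1 matrix (a scalar)
b %*% t(a)                              # outer product: a 3x3 square matrix
sum(diag(b %*% t(a)))                   # trace of ba^t equals the inner product a^t b
A %*% B                                 # [0 3; 5 11]
B %*% A                                 # [9 11; 3 2]: order matters, AB != BA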
Matrix Multiplication: Geometric Interpretation
•Multiplying the data matrix $\mathbf{Y}$ by other matrices has geometric interpretations:
•$\mathbf{YI}=\mathbf{Y}$: no change to $\mathbf{Y}$
•$\mathbf{Y}(c\mathbf{I})=c\mathbf{Y}$: change of scale (e.g., enlargement)
•$\mathbf{YD}$: stretching, if $\mathbf{D}$ is diagonal
•$\mathbf{YT}$: rigid rotation, if $\mathbf{T}$ is a $p\times p$ orthogonal matrix
•$\mathbf{YT}$: shear, if $\mathbf{T}$ is not orthogonal ($\mathbf{T}$ can be decomposed into rotation, dilation, rotation)
•Example rotation matrix:

$$\mathbf{H}=\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Matrix Multiplication: Visual Examples
(figures: Original; Scalar (1/2); Scalar (2); Rotations; Shears and Projections; images from C.A.B. Smith, 1969)

Matrix Inversion
•Can't divide matrices, but can find the inverse (reciprocal) and then multiply by that reciprocal (much like $a/b=a\cdot(1/b)$)
•Inverses have the property: $\mathbf{AA}^{-1}=\mathbf{I}$
•The matrix must be square and non-singular (i.e., determinant $\neq 0$)
•For a $2\times 2$ matrix:

$$\mathbf{A}=\begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad \mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}=\frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Matrix Inversion: Example

$$\mathbf{A}=\begin{bmatrix} 3 & 4 \\ 4 & 6 \end{bmatrix} \qquad \mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\begin{bmatrix} 6 & -4 \\ -4 & 3 \end{bmatrix}=\frac{1}{18-16}\begin{bmatrix} 6 & -4 \\ -4 & 3 \end{bmatrix}=\begin{bmatrix} 3 & -2 \\ -2 & 1.5 \end{bmatrix}$$

Confirm:

$$\mathbf{AA}^{-1}=\begin{bmatrix} 3 & 4 \\ 4 & 6 \end{bmatrix}\begin{bmatrix} 3 & -2 \\ -2 & 1.5 \end{bmatrix}=\begin{bmatrix} 3(3)+4(-2) & 3(-2)+4(1.5) \\ 4(3)+6(-2) & 4(-2)+6(1.5) \end{bmatrix}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Matrix Inversion: Larger Matrices
•Inversion of $3\times 3$ or larger matrices is not trivial computationally (the explicit $3\times 3$ formula shown here was plagiarized from Wikipedia…)
•Computer programs like R use LAPACK (a library of routines for numerical linear equations and decompositions) to compute matrix inverses

Matrix Inversion: Uses
•While inversion seems useful in only rare circumstances, we use it frequently as an analytical tool in statistical estimation
•Consider:

$$\mathbf{X}=\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{bmatrix} \qquad \mathbf{X}^t\mathbf{X}=\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{bmatrix}=\begin{bmatrix} 6 & 21 \\ 21 & 91 \end{bmatrix}$$

This is the matrix equivalent of squaring!

$$(\mathbf{X}^t\mathbf{X})^{-1}=\frac{1}{6\cdot 91-21\cdot 21}\begin{bmatrix} 91 & -21 \\ -21 & 6 \end{bmatrix}=\frac{1}{105}\begin{bmatrix} 91 & -21 \\ -21 & 6 \end{bmatrix}=\begin{bmatrix} 0.867 & -0.200 \\ -0.200 & 0.057 \end{bmatrix}$$

From Matrix Algebra to Linear Models
•Here, $\mathbf{X}$ contains a dummy variable (a column of 1s) and a second variable; for the same $\mathbf{X}$ as above:

$$\mathbf{X}^t\mathbf{X}=\begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}=\begin{bmatrix} 6 & 21 \\ 21 & 91 \end{bmatrix}$$

•$\mathbf{X}$ is a linear model design matrix: it allows calculation of statistics for variables

A Simple Linear Model
•For the simplest model, $\mathbf{x}$ is just the column of 1s:

$$\mathbf{x}=\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad \mathbf{y}=\begin{bmatrix} 8 \\ 3 \\ 4 \\ 4 \\ 1 \end{bmatrix} \qquad \mathbf{x}^t\mathbf{x}=5=n \qquad \mathbf{x}^t\mathbf{y}=20=\sum Y \qquad (\mathbf{x}^t\mathbf{x})^{-1}=\frac{1}{5}=\frac{1}{n}$$

•The linear model and its coefficient:

$$\hat{\beta}=(\mathbf{x}^t\mathbf{x})^{-1}\mathbf{x}^t\mathbf{y}=\frac{\sum Y}{n}=\bar{Y}=4$$

The General Linear Model

$$\underset{n\times p}{\mathbf{Y}}=\underset{n\times k}{\mathbf{X}}\underset{k\times p}{\mathbf{B}}+\underset{n\times p}{\mathbf{E}}$$

•$\mathbf{Y}$ ($n\times p$): matrix of $p$ dependent values (data)
•$\mathbf{X}$ ($n\times k$): model design matrix for $n$ subjects and $k$ parameters
•$\mathbf{B}$ ($k\times p$): model coefficients for $k$ parameters, $p$ times
•$\mathbf{E}$ ($n\times p$): matrix of unexplained values (residuals)
•Data = (Design $\times$ Coefficients) + Error = Prediction + Error
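The inversion examples above can be verified directly; solve() is R's matrix-inverse function (and relies on LAPACK, as noted above). A minimal sketch using the numbers from these slides:

A <- matrix(c(3, 4, 4, 6), nrow = 2)    # A = [3 4; 4 6]
solve(A)                                # [3 -2; -2 1.5]
round(A %*% solve(A))                   # confirms AA^-1 = I
X <- cbind(1, 1:6)                      # dummy column of 1s plus a second variable
t(X) %*% X                              # [6 21; 21 91], the 'matrix squaring'
solve(t(X) %*% X)                       # [0.867 -0.200; -0.200 0.057]
y <- c(8, 3, 4, 4, 1)                   # the simple (mean-only) linear model:
x <- matrix(1, nrow = 5)                # x is a column of 1s
solve(t(x) %*% x) %*% t(x) %*% y        # Bhat = sum(y)/n = ybar = 4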
The General Linear Model (cont.)

$$\underset{k\times p}{\hat{\mathbf{B}}}=(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$$

Why? Need to solve $\mathbf{Y}=\mathbf{XB}$:
•Make '$\mathbf{X}$' square by premultiplying by $\mathbf{X}^t$: $\mathbf{X}^t\mathbf{Y}=\mathbf{X}^t\mathbf{XB}$
•Multiply by the inverse: $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}=(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{XB}$
•Since $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X}=\mathbf{I}$: $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}=\mathbf{IB}=\hat{\mathbf{B}}$

General Linear Models
•Assess variation in $\mathbf{Y}$ as explained by models of the form:

$$Y=\beta_0+\beta_1X_1+\beta_2X_2+\beta_3X_3+\cdots$$

•ANOVA: categorical $X$ ($H_0$: no difference among groups)
•Regression: continuous $X$ ($H_0$: no relationship between $X$ and $Y$)
•ANCOVA: combination of the two ($H_0$: no difference among groups while accounting for covariation)
(figures: ANOVA example plotted against squamosal/dentary ratio; regression and ANCOVA examples plotted against SVL)

Estimating Parameters
•Typically expressed using SS equations (see e.g., Biometry)
•Linear models simply partition SS and $s^2$:

$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2=\frac{SS}{df}$$

•ANOVA (single-factor):

$$SSM=SSB=\sum_{i=1}^{a}n_i(\bar{Y}_i-\bar{Y})^2 \qquad SSW=SSE=\sum_{i=1}^{a}\sum_{j=1}^{n}(Y_{ij}-\bar{Y}_i)^2$$

•Regression:

$$SSM=SSR=\sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 \qquad SSE=\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$

•Parameters then found using SS-type equations:

$$b_{Y\cdot X}=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$

•Parameters are more easily found using the matrix form of the GLM

Deriving Univariate Regression
$\mathbf{Y}=\mathbf{XB}$, where:

$$\mathbf{Y}=\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} \qquad \mathbf{X}=\begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \qquad \mathbf{B}=\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$

1. Expand matrices: $\mathbf{B}=(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$
2. Begin rewrite:

$$\mathbf{X}^t\mathbf{X}=\begin{bmatrix} 1 & \cdots & 1 \\ X_1 & \cdots & X_n \end{bmatrix}\begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}=\begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \qquad \mathbf{X}^t\mathbf{Y}=\begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix}$$

Deriving Univariate Regression (Cont. 1)
3. Calculate the inverse:

$$(\mathbf{X}^t\mathbf{X})^{-1}=\frac{1}{n\sum X^2-\left(\sum X\right)^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & n \end{bmatrix}$$

4. Multiply:

$$\mathbf{B}=(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}=\begin{bmatrix} \dfrac{\sum X^2\sum Y-\sum X\sum XY}{n\sum X^2-\left(\sum X\right)^2} \\[2ex] \dfrac{n\sum XY-\sum X\sum Y}{n\sum X^2-\left(\sum X\right)^2} \end{bmatrix}=\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$

Deriving Univariate Regression (Cont. 2)

$$\beta_1=\frac{n\sum XY-\sum X\sum Y}{n\sum X^2-\left(\sum X\right)^2} \quad \text{rearranges to} \quad b_{Y\cdot X}=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$

Denominator:
1: $\sum(X-\bar{X})^2=\sum(X^2-2X\bar{X}+\bar{X}^2)$ (expand multiplication)
2: $=\sum X^2-2\bar{X}\sum X+n\bar{X}^2$ (separate sums, extract constants)
3: $=\sum X^2-\frac{2\left(\sum X\right)^2}{n}+\frac{\left(\sum X\right)^2}{n}$ (since $\bar{X}=\sum X/n$)
4: $=\sum X^2-\frac{\left(\sum X\right)^2}{n}$
5: $n\sum(X-\bar{X})^2=n\sum X^2-\left(\sum X\right)^2$ (multiply by $n$)
NOTE: lots of work to get to the solution from matrices! (The numerator rearranges the same way.)

Calculating Sums of Squares
•The F-ratio is SSM/SSE (with df corrections)
•Need to calculate full and reduced model SS
•Full model (contains all terms): $SS_F=\sum(Y-\hat{Y})^2=(\mathbf{Y}-\mathbf{X}\hat{\mathbf{B}})^t(\mathbf{Y}-\mathbf{X}\hat{\mathbf{B}})$
•Reduced model ($\mathbf{X}_R$ has 1 less term in it): $SS_R=(\mathbf{Y}-\mathbf{X}_R\hat{\mathbf{B}}_R)^t(\mathbf{Y}-\mathbf{X}_R\hat{\mathbf{B}}_R)$
•Significance based on:

$$F_s=\frac{(SS_R-SS_F)/(q-q')}{SS_F/q'}$$

•General formula for the SS of a model effect (say, A):

$$SS_A=\hat{\mathbf{B}}_F^t\mathbf{X}_F^t\mathbf{Y}-\hat{\mathbf{B}}_R^t\mathbf{X}_R^t\mathbf{Y}$$

•Works for multivariate data as well (see Rencher, 1995: p. 359)

$SS_R$ & $SS_F$: Some Comments
•$SS_R>SS_F$ ($SS_F$: more terms = lower SS; the full model 'explains' more)
•$q$ & $q'$ are the df for $SS_R$ & $SS_F$ (how many parameters are estimated for each model vs. sample size)
•$(SS_R-SS_F)$ estimates the effect of the term of interest
•For bivariate regression: $SS_R=SST$
•For bivariate regression: $(SS_R-SS_F)=[SST-SSE]=SSM$

Regression Example
•The Data: $X=(1,4,5,7,9)$, $Y=(1,2,3,4,5)$; for matrix form:

$$\mathbf{X}_{new}=\begin{bmatrix} 1 & 1 \\ 1 & 4 \\ 1 & 5 \\ 1 & 7 \\ 1 & 9 \end{bmatrix}$$

> lm(yreg~xreg)
Coefficients:
(Intercept)         xreg
     0.3152       0.5163

> solve(t(xnewreg)%*%xnewreg)%*%t(xnewreg)%*%yreg
          [,1]
[1,] 0.3152174
[2,] 0.5163043

> anova(lm(yreg~xreg))
          Df  Sum Sq Mean Sq F value  Pr(>F)
xreg       1  9.8098  9.8098  154.71 0.00112 **
Residuals  3  0.1902  0.0634

From the full model:
> t(yreg-yhatreg)%*%(yreg-yhatreg)
          [,1]
[1,] 0.1902174
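The regression example can be reproduced straight from the matrix formulas, including the full/reduced-model F-ratio. A sketch (object names such as SSfull and SSred are mine, not from the slides):

xreg <- c(1, 4, 5, 7, 9)
yreg <- c(1, 2, 3, 4, 5)
xnewreg <- cbind(1, xreg)                                  # design matrix
B <- solve(t(xnewreg) %*% xnewreg) %*% t(xnewreg) %*% yreg # 0.3152, 0.5163
yhatreg <- xnewreg %*% B                                   # predicted values
SSfull <- t(yreg - yhatreg) %*% (yreg - yhatreg)           # 0.1902 (= SSE)
SSred <- sum((yreg - mean(yreg))^2)                        # reduced model Y~1: SST = 10
Fs <- ((SSred - SSfull) / (4 - 3)) / (SSfull / 3)          # q = 4, q' = 3; Fs = 154.71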
ANOVA using GLM
•Same idea, but must use special X-matrix coding
•Recode $k$ groups as $k-1$ dummy-variable columns of $\mathbf{X}$:

$$\mathbf{X}=\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix}$$

•Generally, column 1 yields $\bar{X}$ (the grand mean), column 2 yields the deviation from $\bar{X}$ for the mean of group 1, etc. (if set up as orthogonal contrasts)
•Dummy variables can be constructed as orthogonal contrasts of various groups

Orthogonal Contrasts
•Group columns only represent $\bar{X}$, $\bar{X}-\bar{X}_1$, etc. if they are set up as orthogonal contrasts
•Orthogonal contrasts are columns of 1, 0, and $-1$ that sum to ZERO
•One approach: code the group of interest as 1, the last group as $-1$, and the others as 0
•If groups have different sample sizes, this can get complicated
•Need to do weighted contrasts (i.e., scale the $-1$ by sample size)
•To get a group mean, multiply the deviation by this weight

GLM ANOVA: The Meaning of $\hat{\mathbf{B}}$
•For GLM ANOVA, the values in $\hat{\mathbf{B}}$ represent the overall mean ($\bar{X}$) and deviations from the grand mean ($\bar{X}-\bar{X}_1$)
•For 2 groups, think of $\beta_1$ as the 'slope' between group means
•Example: $\bar{X}=5$, $\bar{X}_1=4$, $\bar{X}_2=6$, so with groups coded $+1$/$-1$:

$$\hat{\mathbf{B}}=\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}=\begin{bmatrix} 5 \\ -1 \end{bmatrix} \qquad \bar{X}_1=5+(-1)(1)=4 \qquad \bar{X}_2=5+(-1)(-1)=6$$

ANOVA Example 1: Equal Sample Sizes
•The Data: $n_1=5$, $n_2=5$

$$\mathbf{X}=\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix} \qquad \mathbf{Y}=\begin{bmatrix} 5 \\ 4 \\ 4 \\ 4 \\ 3 \\ 7 \\ 5 \\ 6 \\ 6 \\ 6 \end{bmatrix} \qquad \bar{X}=5,\ \bar{X}_1=4,\ \bar{X}_2=6$$

> summary(lm(y~x2))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0000     0.2236  22.361 1.69e-08
x2           -1.0000     0.2236  -4.472  0.00208

> solve(t(xnew)%*%xnew)%*%t(xnew)%*%y
     [,1]
[1,]    5
[2,]   -1

> anova(lm(y~x2))
          Df Sum Sq Mean Sq F value   Pr(>F)
x2         1     10    10.0      20 0.002077 **
Residuals  8      4     0.5

From the full model:
> t(y-yhat)%*%(y-yhat)
     [,1]
[1,]    4

ANOVA Example 2: Unequal Sample Sizes
•The Data: $n_1=4$, $n_2=6$; the contrast for group 2 is weighted by sample size ($-n_1/n_2=-0.667$):

$$\mathbf{X}=\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -0.667 \\ 1 & -0.667 \\ 1 & -0.667 \\ 1 & -0.667 \\ 1 & -0.667 \\ 1 & -0.667 \end{bmatrix} \qquad \mathbf{Y}=\begin{bmatrix} 5 \\ 4 \\ 4 \\ 4 \\ 3 \\ 7 \\ 5 \\ 6 \\ 6 \\ 6 \end{bmatrix} \qquad \bar{X}_1=4.25 \qquad \bar{X}_2=5+(0.667\cdot 0.75)=5.5$$

> solve(t(xnew)%*%xnew)%*%t(xnew)%*%y
      [,1]
ones  5.00
xnew -0.75

> summary(lm(y~x))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.5000     0.4621  11.902 2.28e-06 ***
xm           -1.2500     0.7307  -1.711    0.125

> anova(lm(y~x))
          Df Sum Sq Mean Sq F value Pr(>F)
factor(x)  1   3.75  3.7500  2.9268 0.1255
Residuals  8  10.25  1.2812

GLM Comments
•GLM is the fitting of models
•Fit a null model (e.g., Y~1) and then more complicated models
•Assess the fit of different models via SS (i.e., LRT)
•All of this is accomplished using matrix algebra

Conclusions
•ANOVA and regression are all the same model (the GLM); the key difference is what is used in $\mathbf{X}$
•The GLM can be expressed in matrix algebra:

$$\hat{\mathbf{B}}=(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$$

•Question: what about a multivariate $\mathbf{Y}$?
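To close, the same matrix machinery reproduces ANOVA Example 1. A sketch using the orthogonal-contrast coding shown above (object names are mine):

y <- c(5, 4, 4, 4, 3, 7, 5, 6, 6, 6)
xnew <- cbind(1, rep(c(1, -1), each = 5))          # contrast: group 1 = 1, group 2 = -1
B <- solve(t(xnew) %*% xnew) %*% t(xnew) %*% y     # 5 (grand mean) and -1 (deviation)
yhat <- xnew %*% B                                 # fitted group means: 4 and 6
SSfull <- t(y - yhat) %*% (y - yhat)               # 4
SSred <- sum((y - mean(y))^2)                      # 14, from the reduced model Y~1
Fs <- ((SSred - SSfull) / (9 - 8)) / (SSfull / 8)  # 20, matching anova(lm(y~x2))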