PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 1 Examples of Adding Predictors to Multivariate Models Uses of the gls() Function with the Factor() Function Comparisons of Multivariate Models with Classical MANOVA The data for this example come from http://www.ats.ucla.edu/stat/sas/library/repeated_ut.htm R Syntax for Data Manipulation – Converting from Wide Format to Long/Stacked Format: Resulting R Data Set (personID = 1 and personID =18 shown) ๐ฆ๐1 112 R’s gls() function is modeling ๐๐ = [๐ฆ๐2 ], where ๐ is the pulse rate. For personID = 1 this would be: ๐1 = [166] ๐ฆ๐3 215 Empty multivariate model predicting the mean pulse rate for each intensity level: Multivariate model becomes (simultaneously): ๐๐ = ๐ ๐ ๐ท + ๐๐ ๐ฆ๐1 ๐๐1 ๐ฝ0 1 0 0 ๐ฆ ๐๐ = [ ๐2 ] ; ๐ ๐ = [1 1 0] ; ๐ท = [๐ฝ๐ผ2 ] ; ๐๐ = [๐๐2 ] ๐ฆ๐3 ๐๐3 ๐ฝ๐ผ3 1 0 1 2 ๐๐1 Where ๐๐ ∼ ๐3 (๐, ๐), and ๐ = [๐๐1 ,๐2 ๐๐1 ,๐3 ๐๐1 ,๐2 2 ๐๐2 ๐๐2 ,๐3 ๐๐1 ,๐3 ๐๐2 ,๐3 ] (the unstructured error covariance matrix model). 2 ๐๐3 PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 2 Which leads to: ๐ฆ๐1 = ๐ฝ0 + ๐๐1 ๐ฆ๐2 = ๐ฝ0 + ๐ฝ๐ผ2 + ๐๐2 ๐ฆ๐3 = ๐ฝ0 + ๐ฝ๐ผ3 + ๐๐3 Interpret each effect… Intercept: observationF2: observationF3: PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 3 R Matrix Estimate Adding a Predictor to a Multivariate Model We will add diet (vegetarian or meat eater) as a predictor to the empty model: Multivariate model becomes : ๐๐ = ๐ ๐ ๐ท + ๐๐ For meat eaters: ๐ฆ๐1 1 0 0 ๐๐ = [๐ฆ๐2 ] ; ๐ ๐ = [1 1 0 ๐ฆ๐3 1 0 1 0 0 0 0 0 0 ๐ฝ0 ๐ฝ๐ผ2 ๐๐1 0 ๐ฝ๐ผ3 ; ๐๐ = [๐๐2 ] 0] ; ๐ท = ๐ฝ๐ ๐๐3 0 ๐ฝ๐∗๐ผ2 [๐ฝ๐∗๐ผ3 ] 1 0 1 1 1 0 ๐ฝ0 ๐ฝ๐ผ2 ๐๐1 0 ๐ฝ๐ผ3 ; ๐๐ = [๐๐2 ] 0] ; ๐ท = ๐ฝ๐ ๐๐3 1 ๐ฝ๐∗๐ผ2 [๐ฝ๐∗๐ผ3 ] For vegetarians: ๐ฆ๐1 1 0 0 ๐๐ = [๐ฆ๐2 ] ; ๐ ๐ = [1 1 0 ๐ฆ๐3 1 0 1 2 ๐๐1 Where ๐๐ ∼ ๐3 (๐, ๐), and ๐ = [๐๐1 ,๐2 ๐๐1 ,๐3 ๐๐1 ,๐2 2 ๐๐2 ๐๐2 ,๐3 ๐๐1 ,๐3 ๐๐2 ,๐3 ] (the unstructured error covariance matrix model). 2 ๐๐3 Which leads to the following simultaneous linear models: For meat eaters: ๐ฆ๐1 = ๐ฝ0 + ๐๐1 ๐ฆ๐2 = ๐ฝ0 + ๐ฝ๐ผ2 + ๐๐2 ๐ฆ๐3 = ๐ฝ0 + ๐ฝ๐ผ3 + ๐๐3 For vegetarians: ๐ฆ๐1 = ๐ฝ0 + ๐ฝ๐ + ๐๐1 ๐ฆ๐2 = ๐ฝ0 + ๐ฝ๐ผ2 + ๐ฝ๐ + ๐ฝ๐∗๐ผ2 + ๐๐2 ๐ฆ๐3 = ๐ฝ0 + ๐ฝ๐ผ3 + ๐ฝ๐ + ๐ฝ๐∗๐ผ3 + ๐๐3 PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 4 The ๐ error covariance matrix has changed by adding the predictor – this is analogous to what happens in a univariate general linear model – the predictor shrinks the error variance: For intensity #1 (pulse after warm up): 2 2 Empty model ๐๐1 = 279.9; After adding DIET as a predictor: ๐๐1 = 188.6 2 Therefore ๐ ๐ผ1 = 279.9−188.6 279.9 = 0.326 For intensity #2 (pulse after jogging): 2 2 Empty model ๐๐2 = 473.1; After adding DIET as a predictor: ๐๐2 = 328.4 2 Therefore ๐ ๐ผ2 = 473.1−328.4 473.1 = 0.306 For intensity #3 (pulse after running): 2 2 Empty model ๐๐3 = 770.0; After adding DIET as a predictor: ๐๐3 = 530.1 2 Therefore ๐ ๐ผ3 = 770.0−530.1 770.0 = 0.312 These ๐ 2 reflect the proportion of variance accounted for marginally for each variable. What about the covariances? Notice, adding a predictor reduces those, too. What is needed is a joint description of how the combined variation of all three variables is reduced by the predictor. To get that, we can use the generalized sample variance (the determinant of ๐) For all three variables jointly: Empty model generalized variance: |๐| =1,675,026; After adding DIET as a predictor: |๐| = 1,224,462 Therefore the overall ๐ 2 is: ๐ 2 = 1,675,026−1,224,462 1,675,026 = .269 This value is less than each of the marginal ๐ 2 values because of the contributions of the covariances. PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 5 Because we are using the Residual Maximum Likelihood as our estimator, we can use a multivariate Wald test to determine if DIET significantly improved model fit. This is essentially testing whether the multivariate ๐ 2 change is zero. More formally, we are testing the hypothesis: ๐ป0 : ๐ฝ๐ = ๐ฝ๐∗๐ผ1 = ๐ฝ๐∗๐ผ3 = 0 ๐ป๐ด : At least one not equal to zero We can conclude that DIET significantly improved the fit of the multivariate model – but, what that does not tell us is if it improved the fit for each dependent variable marginally. Interpret each effect… Intercept: observationF2: observationF3: dietF2: observationF2:dietF2: observationF3:dietF2: PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 6 What we cannot find from our direct model output are some key hypothesis tests that show up for this type of analysis: ๏ท ๏ท ๏ท Within subjects main effect of intensity (do mean pulse rates differ when pulse is taken after different intensities?) o Should be 2 degrees of freedom (3 intensity levels - 1) Between subjects main effect of diet type (do mean pulse rates differ for vegetarians and meat eaters?) o Should be 1 degree of freedom (2 diet types - 1) Interaction of within subjects intensity and between subjects diet type (are there differences in pulse rate for varying combinations of intensity and diet?) o Should be 2 degrees of freedom (3 intensity levels - 1)* (2 diet types - 1) PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 7 Comparison of PROC MIXED Multivariate Analyses with Classical Multivariate Analysis of Variance (MANOVA) A semi-frequent request when running multivariate analyses is for some type of “MANOVA” tests. Usually, this is due to the person requesting the classical tests having been trained in a customary multivariate analysis course. This section of the handout describes MANOVA and how what we have just done can provide a MANOVA-style test if you ever are asked for one. We will begin our discussion of MANOVA by starting with the univariate ANOVA (a one-way ANOVA with one categorical IV). In univariate ANOVA, we develop the hypothesis test of group differences in the group mean of the outcome using the concept of sums of squares. ๐ป0 : ๐1 = ๐2 = โฏ = ๐๐บ ๐ป๐ด : at least one ๐ not equal to the others Sums of squares between groups comes from the sum of squared differences from each group mean ๐ฆฬ ๐ from the overall grand mean ๐ฆฬ (where ๐๐ is the within-group sample size for group ๐): ๐บ 2 ๐๐๐ต = ∑ ๐๐ (๐ฆฬ ๐ − ๐ฆฬ ) ๐=1 The sums of squares within groups (here referred to as the sums of squares for error) comes from the squared difference of each person’s outcome and their group’s mean: ๐บ ๐๐ 2 ๐๐๐ธ = ∑ ∑ (๐ฆ๐ − ๐ฆฬ ๐ ) ๐=1 ๐๐ =1 The F-test is the ratio of these two terms, each divided by their respective degrees of freedom: ๐น= ๐๐๐ต /๐๐๐ต ๐๐๐ธ /๐๐๐ธ A MANOVA extends a univariate ANOVA to test hypotheses about mean vectors across categorical independent variables. The hypothesis test is now: ๐ป0 : ๐1 = ๐2 = โฏ = ๐๐บ ๐ป๐ด : at least one ๐ not equal to the others ๐๐1 ๐๐2 Where ๐๐ = [ โฎ ], for ๐observed outcome variables (i.e., a multivariate analysis). The same concepts of univariate ANOVA still ๐๐๐ apply, just with vectors and matrices instead of individual means. The matrix analog to the sums of squares between groups is the Between Groups Sums of Squares and Cross Products matrix (using SAS’ notation ๐, which stands for Hypothesis) is formed ฬ ๐ (size ๐ ๐ฅ 1) and the overall grand mean vector ๐ ฬ (also size ๐ ๐ฅ 1) using each group’s mean vector ๐ ๐บ ๐ ฬ ๐ − ๐ ฬ )(๐ ฬ ๐ − ๐ ฬ ) ๐(๐ ๐ฅ ๐) = ∑ ๐๐ (๐ ๐=1 PRE 905: Spring 2014 – Lecture 9, Example 1 -- Page 8 The matrix analog to the sums of squares within groups (here referred to as the sums of squares for error) is the Error Sums of Squares and Cross Products Matrix which comes from comparing each person’s vector of outcomes ๐๐ with their group mean ฬ ๐ : vector ๐ ๐บ ๐๐ ๐ ฬ )(๐๐ − ๐ ฬ ) ๐(๐ ๐ฅ ๐) = ∑ ∑ (๐๐ − ๐ ๐=1 ๐๐ =1 ๐ฌ Note: for our multivariate models using maximum likelihood and an unstructured ๐ matrix, ๐ = ๐ Now, instead of forming an F-ratio, we can form a matrix that incorporates this ratio: ๐ = ๐−1 ๐ Where MANOVA differs from univariate ANOVA is that there is no one best test statistic for summarizing ๐ . Instead, four popular statistics are formed: det(๐) det(๐+๐) ๏ท Wilk’s Lambda = ๏ท ๏ท ๏ท Pillai’s trace = ๐ก๐๐๐๐(๐(๐ + ๐)−1 ) Hotelling-Lawley trace = ๐ก๐๐๐๐(๐−1 ๐) Roy’s greatest (largest) root = largest eigenvalue of ๐−1 ๐ (we will show how to get this using a modification of a likelihood ratio test) What we seek to say is this: even though these statistics don’t directly come out of PROC MIXED, you can still obtain (at least one of) them. Therefore, the analysis we have done subsumes classical MANOVA (or Multivariate Regression) into a more general framework for investigating multivariate hypotheses. We will now compare a MANOVA with PROC GLM and show how we can achieve the same result from PROC MIXED. The four MANOVA test statistics are shown in the output below. We can obtain Wilks Lambda from a model comparison of our previous two multivariate models in PROC MIXED: Empty Model −2๐ฟ๐ฟ๐ธ๐๐๐๐ = −198.5 ∗ −2 = 397.0 DIET Model −2๐ฟ๐ฟ๐ท๐ผ๐ธ๐ = −186.8 ∗ −2 = 373.6 Wilks Lambda = Λ = exp ( −(−2๐ฟ๐ฟ๐ธ๐๐๐๐ − −2๐ฟ๐ฟ๐ท๐ผ๐ธ๐ ) −(397.0 − 373.6) ) = exp ( ) = 0.273 ๐ 18