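Two-way MANOVA

• We now consider designs with two factors. Factor 1 has g levels and factor 2 has b levels.
• Let $X_{ikr}$ be the $p \times 1$ vector of measurements on the $r$th unit at the $i$th level of factor 1 and the $k$th level of factor 2. The model is
$$X_{ikr} = \mu + \alpha_i + \tau_k + \gamma_{ik} + \varepsilon_{ikr}, \qquad i = 1, \ldots, g,\; k = 1, \ldots, b,\; r = 1, \ldots, n,$$
where all terms are $p \times 1$ vectors.
• Here $\gamma_{ik}$ is a vector of interaction effects.
• To simplify notation, we assume that there are n units in each of the gb combinations of factor levels. The main ideas carry over to the case of $n_{ik}$ units in the $(i, k)$th factor combination, although the sums of squares then no longer decompose orthogonally.
• The vector of measurements taken on the $r$th unit in the treatment group defined by the $i$th level of factor 1 and the $k$th level of factor 2 can be written as
$$\begin{bmatrix} X_{ikr1}\\ X_{ikr2}\\ \vdots\\ X_{ikrp}\end{bmatrix} = \begin{bmatrix} \mu_1\\ \mu_2\\ \vdots\\ \mu_p\end{bmatrix} + \begin{bmatrix} \alpha_{i1}\\ \alpha_{i2}\\ \vdots\\ \alpha_{ip}\end{bmatrix} + \begin{bmatrix} \tau_{k1}\\ \tau_{k2}\\ \vdots\\ \tau_{kp}\end{bmatrix} + \begin{bmatrix} \gamma_{ik1}\\ \gamma_{ik2}\\ \vdots\\ \gamma_{ikp}\end{bmatrix} + \begin{bmatrix} \varepsilon_{ikr1}\\ \varepsilon_{ikr2}\\ \vdots\\ \varepsilon_{ikrp}\end{bmatrix}, \qquad \varepsilon_{ikr} \sim \mathrm{NID}_p(0, \Sigma).$$
• We impose the SAS restrictions:
$$\alpha_{g\ell} = 0 \text{ and } \tau_{b\ell} = 0 \quad \text{for } \ell = 1, 2, \ldots, p,$$
$$\gamma_{gk\ell} = 0 \text{ for } k = 1, \ldots, b \quad\text{and}\quad \gamma_{ib\ell} = 0 \text{ for } i = 1, \ldots, g, \qquad \ell = 1, \ldots, p.$$
• Making the responses from different units the rows of a response matrix, the model for the entire data set can be written in matrix notation as
$$\underbrace{\begin{bmatrix} X_{1111} & X_{1112} & \cdots & X_{111p}\\ X_{1121} & X_{1122} & \cdots & X_{112p}\\ \vdots & \vdots & & \vdots\\ X_{gbn1} & X_{gbn2} & \cdots & X_{gbnp}\end{bmatrix}}_{gbn \times p} = Z \underbrace{\begin{bmatrix} \mu_1 & \mu_2 & \cdots & \mu_p\\ \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1p}\\ \vdots & \vdots & & \vdots\\ \alpha_{g-1,1} & \alpha_{g-1,2} & \cdots & \alpha_{g-1,p}\\ \tau_{11} & \tau_{12} & \cdots & \tau_{1p}\\ \vdots & \vdots & & \vdots\\ \tau_{b-1,1} & \tau_{b-1,2} & \cdots & \tau_{b-1,p}\\ \gamma_{111} & \gamma_{112} & \cdots & \gamma_{11p}\\ \vdots & \vdots & & \vdots\\ \gamma_{g-1,b-1,1} & \gamma_{g-1,b-1,2} & \cdots & \gamma_{g-1,b-1,p}\end{bmatrix}}_{gb \times p} + \varepsilon,$$
where Z is the $gbn \times gb$ design matrix of zeros and ones: a column of ones for $\mu$, indicator columns for levels $1, \ldots, g-1$ of factor 1 and levels $1, \ldots, b-1$ of factor 2, and their products for the interaction effects. The rows of the $gbn \times p$ error matrix $\varepsilon$ are the vectors $\varepsilon_{ikr}'$.
• If the interaction is non-zero, the factor effects are not additive: the effect of one factor may depend on the level of the other factor.
• See the next figure: in the top panel, the effect of factor 2 depends on the level of factor 1.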
When factor 1 is at level 2, there is relatively little effect of factor 2 on the outcome variable.
• The bottom panel shows parallel profiles, an example with no interaction.

Sum of squares and cross-product matrices

• As in the one-factor model, we can decompose the overall variability into different sources. Note that
$$x_{ikr} - \bar{x} = (\bar{x}_{i\cdot} - \bar{x}) + (\bar{x}_{\cdot k} - \bar{x}) + (\bar{x}_{ik} - \bar{x}_{i\cdot} - \bar{x}_{\cdot k} + \bar{x}) + (x_{ikr} - \bar{x}_{ik}),$$
where $\bar{x}_{i\cdot}$ is the $p \times 1$ mean vector of the observations at the $i$th level of factor 1, $\bar{x}_{\cdot k}$ is the $p \times 1$ mean vector of the observations at the $k$th level of factor 2, and $\bar{x}_{ik}$ is the $p \times 1$ mean vector of the observations at the $i$th level of factor 1 and the $k$th level of factor 2.
• Multiplying each side of this identity by its transpose and summing over r, k and i gives the usual decomposition. With equal numbers of units in every cell, all cross-product terms sum to zero, so that $SSP_{ctot} = SSP_{fac1} + SSP_{fac2} + SSP_{int} + SSE$:
$$\begin{array}{lcc}
\text{Source} & \text{SS and CP matrix} & \text{Degrees of freedom}\\ \hline
\text{Factor 1} & SSP_{fac1} & g-1\\
\text{Factor 2} & SSP_{fac2} & b-1\\
\text{Interaction} & SSP_{int} & (g-1)(b-1)\\
\text{Residual} & SSE & gb(n-1)\\ \hline
\text{Corrected total} & SSP_{ctot} & gbn-1
\end{array}$$
• The SS and CP matrices are
$$SSP_{fac1} = \sum_{i=1}^{g} bn\,(\bar{x}_{i\cdot} - \bar{x})(\bar{x}_{i\cdot} - \bar{x})'$$
$$SSP_{fac2} = \sum_{k=1}^{b} gn\,(\bar{x}_{\cdot k} - \bar{x})(\bar{x}_{\cdot k} - \bar{x})'$$
$$SSP_{int} = \sum_{i=1}^{g}\sum_{k=1}^{b} n\,(\bar{x}_{ik} - \bar{x}_{i\cdot} - \bar{x}_{\cdot k} + \bar{x})(\bar{x}_{ik} - \bar{x}_{i\cdot} - \bar{x}_{\cdot k} + \bar{x})'$$
$$SSE = \sum_{i=1}^{g}\sum_{k=1}^{b}\sum_{r=1}^{n} (x_{ikr} - \bar{x}_{ik})(x_{ikr} - \bar{x}_{ik})'$$
$$SSP_{ctot} = \sum_{i=1}^{g}\sum_{k=1}^{b}\sum_{r=1}^{n} (x_{ikr} - \bar{x})(x_{ikr} - \bar{x})'$$
All matrices are $p \times p$.

Testing hypotheses in the two-way model

• We first test for the presence or absence of interaction effects:
$H_0: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{gb} = 0$ versus $H_1$: at least one $\gamma_{ik} \neq 0$.
• Let Wilks' lambda be
$$\Lambda^* = \frac{|SSE|}{|SSP_{int} + SSE|}.$$
For large samples, the statistic on the left below is approximately $\chi^2$ with $p(g-1)(b-1)$ degrees of freedom under $H_0$, and we reject $H_0$ at level α if
$$-\left[gb(n-1) - \frac{p + 1 - (g-1)(b-1)}{2}\right]\ln\Lambda^* \ge \chi^2_{p(g-1)(b-1)}(\alpha).$$
• A more accurate p-value is obtained from Rao's F approximation.
• If we reject the null hypothesis of no interaction effects, interpreting the effects of factors 1 and 2 becomes complicated.
• One recommended approach in this case is to examine the p variables individually (for example, by fitting p univariate ANOVA models) to see whether interactions are present for all outcome variables or only for some of them.
• Interpretation of results can be aided by constructing p profile plots (one for each of the p response variables), with the sample mean at each combination of factor levels substituted for the corresponding population mean.
• If we fail to reject the null hypothesis of no interaction effects, we proceed with tests for the additive effects of factors 1 and 2 using the appropriate multivariate test statistics.
• The null hypothesis of no additive effect of factor 1 is $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_g = 0$ versus $H_1$: at least one $\alpha_i \neq 0$. Wilks' lambda is
$$\Lambda^* = \frac{|SSE|}{|SSP_{fac1} + SSE|},$$
and the null hypothesis is rejected at level α if
$$-\left[gb(n-1) - \frac{p + 1 - (g-1)}{2}\right]\ln\Lambda^* \ge \chi^2_{p(g-1)}(\alpha).$$
• Similarly, the null hypothesis of no additive effect of factor 2 is $H_0: \tau_1 = \tau_2 = \cdots = \tau_b = 0$ versus $H_1$: at least one $\tau_k \neq 0$. Wilks' lambda is
$$\Lambda^* = \frac{|SSE|}{|SSP_{fac2} + SSE|},$$
and the null hypothesis is rejected at level α if
$$-\left[gb(n-1) - \frac{p + 1 - (b-1)}{2}\right]\ln\Lambda^* \ge \chi^2_{p(b-1)}(\alpha).$$

Simultaneous confidence intervals

• As in the one-factor case, we may wish to explore differences across factor levels for each of the p variables.
• For example, simultaneous $100(1-\alpha)\%$ Bonferroni confidence intervals for the differences $\alpha_{\ell j} - \alpha_{mj}$ between the $g(g-1)/2$ pairs of levels of factor 1, for all $j = 1, 2, \ldots, p$ response variables, are
$$(\bar{x}_{\ell\cdot j} - \bar{x}_{m\cdot j}) \pm t_\nu\!\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{E_{jj}}{\nu}\,\frac{2}{bn}},$$
where $\nu = gb(n-1)$, $t_\nu(c)$ is the upper $100c$th percentile of the t distribution with ν degrees of freedom, $E_{jj}$ is the $(j, j)$th diagonal element of the error SS and CP matrix SSE, and $\bar{x}_{\ell\cdot j} - \bar{x}_{m\cdot j}$ is the $j$th element of the $p \times 1$ vector of sample mean differences $\bar{x}_{\ell\cdot} - \bar{x}_{m\cdot}$.
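The matrix form of the model under the SAS restrictions, given at the start of this section, can be checked numerically. The following is a minimal sketch in Python/NumPy, assuming rows ordered by factor-1 level, then factor-2 level, then replicate (as in the response matrix above). It builds the $gbn \times gb$ design matrix Z with reference-cell coding, where the last level of each factor gets no indicator column, and fits B by least squares. The function name `design_matrix` and the simulated data are illustrative assumptions, not SAS's internal implementation.

```python
import numpy as np


def design_matrix(g: int, b: int, n: int) -> np.ndarray:
    """Reference-cell design matrix Z for the two-way model with interaction.

    Columns: intercept, alpha_1..alpha_{g-1}, tau_1..tau_{b-1}, then
    gamma_{ik} for i < g, k < b (k varying fastest). Levels g and b get no
    column, which mirrors the restrictions alpha_g = tau_b = gamma_gk = gamma_ib = 0.
    Rows are ordered by i, then k, then replicate r.
    """
    rows = []
    for i in range(g):
        for k in range(b):
            a = [1.0 if i == l else 0.0 for l in range(g - 1)]
            t = [1.0 if k == q else 0.0 for q in range(b - 1)]
            inter = [ai * tq for ai in a for tq in t]
            row = [1.0] + a + t + inter
            rows.extend([row] * n)  # one identical row per replicate
    return np.array(rows)


# Hypothetical example: g = 3, b = 2, n = 2 units per cell, p = 3 responses.
g, b, n, p = 3, 2, 2, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(g * b * n, p))            # rows are units, columns are responses

Z = design_matrix(g, b, n)
B_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)  # least-squares estimate of B (gb x p)

cell_means = X.reshape(g, b, n, p).mean(axis=2)
fitted = (Z @ B_hat).reshape(g, b, n, p)
print(Z.shape)                                        # (12, 6)
print(np.allclose(fitted, cell_means[:, :, None, :]))  # True
print(np.allclose(B_hat[0], cell_means[-1, -1]))      # True
```

Because the model with interaction has one parameter per cell, the fitted values reproduce the cell means. Under these restrictions, $\hat{\mu}$ is the mean of the reference cell $(g, b)$, and every other estimate is a contrast with that cell. This is a useful way to read estimates printed under this kind of parameterization.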
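The SS and CP decomposition and the Wilks' lambda tests above can be sketched in Python/NumPy. This sketch uses the peanut data from the example later in this section, with the values copied from the PROC PRINT listing. We take factor 1 = variety (g = 3) and factor 2 = location (b = 2). The helper names `sscp_two_way` and `rao_f` are hypothetical. `rao_f` implements the standard Rao F approximation for Wilks' lambda. The code returns test statistics and degrees of freedom only, and leaves tail probabilities to a statistics library.

```python
import numpy as np

# Peanut data as X[i, k, r, :]: factor 1 = variety (5, 6, 8; g = 3),
# factor 2 = location (1, 2; b = 2), n = 2 plots per cell, p = 3 variables.
X = np.array([
    [[[195.3, 153.1, 51.4], [194.3, 167.7, 53.7]],   # variety 5, location 1
     [[189.7, 139.5, 55.5], [180.4, 121.1, 44.4]]],  # variety 5, location 2
    [[[203.0, 156.8, 49.8], [195.9, 166.0, 45.8]],   # variety 6, location 1
     [[202.7, 166.1, 60.4], [197.6, 161.8, 54.1]]],  # variety 6, location 2
    [[[193.5, 164.5, 57.8], [187.0, 165.1, 58.6]],   # variety 8, location 1
     [[201.5, 166.8, 65.0], [200.0, 173.8, 67.2]]],  # variety 8, location 2
])


def sscp_two_way(X):
    """SS and CP matrices of the balanced two-way MANOVA; X has shape (g, b, n, p)."""
    g, b, n, p = X.shape
    xbar = X.mean(axis=(0, 1, 2))          # grand mean vector
    xi = X.mean(axis=(1, 2))               # factor-1 level means, (g, p)
    xk = X.mean(axis=(0, 2))               # factor-2 level means, (b, p)
    xik = X.mean(axis=2)                   # cell means, (g, b, p)
    dint = xik - xi[:, None, :] - xk[None, :, :] + xbar
    res = X - xik[:, :, None, :]
    tot = X - xbar
    return {
        "fac1": b * n * (xi - xbar).T @ (xi - xbar),
        "fac2": g * n * (xk - xbar).T @ (xk - xbar),
        "int": n * np.einsum("ikj,ikl->jl", dint, dint),
        "E": np.einsum("ikrj,ikrl->jl", res, res),
        "ctot": np.einsum("ikrj,ikrl->jl", tot, tot),
    }


def rao_f(lam, p, q, nu):
    """Rao's F approximation for Wilks' lambda (q hypothesis df, nu error df)."""
    d = p**2 + q**2 - 5
    s = np.sqrt((p**2 * q**2 - 4) / d) if d > 0 else 1.0
    df1 = p * q
    df2 = (nu + q - (p + q + 1) / 2) * s - (p * q - 2) / 2
    root = lam ** (1 / s)
    return (1 - root) / root * df2 / df1, df1, df2


g, b, n, p = X.shape
S = sscp_two_way(X)
# Balanced design: cross-products vanish, so the pieces add up to the total.
print(np.allclose(S["fac1"] + S["fac2"] + S["int"] + S["E"], S["ctot"]))  # True

nu = g * b * (n - 1)                             # error df
effects = {"int": (g - 1) * (b - 1), "fac1": g - 1, "fac2": b - 1}
tests = {}
for name, q in effects.items():
    lam = np.linalg.det(S["E"]) / np.linalg.det(S[name] + S["E"])
    chi2_stat = -(nu - (p + 1 - q) / 2) * np.log(lam)   # compare with chi-square on p*q df
    F, df1, df2 = rao_f(lam, p, q, nu)
    tests[name] = (lam, chi2_stat, p * q, F, df1, df2)
    print(f"{name:5s} Lambda* = {lam:.5f}  chi2 = {chi2_stat:.2f} on {p * q} df  "
          f"Rao F = {F:.2f} on ({df1}, {df2:g}) df")
```

For the interaction, this gives $\Lambda^* \approx 0.0743$ and Rao's F ≈ 3.56 on (6, 8) degrees of freedom, in line with the SAS output for location*variety. The chi-square statistic is about 13.0 on 6 degrees of freedom, just above $\chi^2_6(0.05) \approx 12.59$. The F test gives p = 0.0508 (reported by SAS as exact here). In this small data set, the two approaches therefore land on opposite sides of the 5% cutoff.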
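The factor-1 Bonferroni intervals above can be computed directly from the level means and the diagonal of SSE. This is a minimal Python/NumPy sketch; the function name `bonferroni_factor1` is hypothetical. To keep it free of extra dependencies, the caller supplies the critical value $t_\nu(\alpha/(pg(g-1)))$. The value 3.0 used below is a placeholder for illustration, not the correct quantile.

```python
import itertools

import numpy as np


def bonferroni_factor1(X, t_crit):
    """Simultaneous Bonferroni intervals for alpha_lj - alpha_mj.

    X has shape (g, b, n, p): factor-1 level, factor-2 level, replicate, variable.
    t_crit must be the upper alpha / (p*g*(g-1)) point of the t distribution
    with nu = g*b*(n-1) df (for example from scipy.stats.t.ppf).
    Returns {(l, m): (lower, upper)} with p-vectors for every pair l < m.
    """
    g, b, n, p = X.shape
    nu = g * b * (n - 1)
    cell_means = X.mean(axis=2)                        # (g, b, p)
    resid = X - cell_means[:, :, None, :]
    E_diag = np.einsum("ikrj,ikrj->j", resid, resid)   # diagonal elements E_jj of SSE
    level_means = X.mean(axis=(1, 2))                  # factor-1 means, (g, p)
    half_width = t_crit * np.sqrt(E_diag / nu * 2 / (b * n))
    intervals = {}
    for l, m in itertools.combinations(range(g), 2):
        diff = level_means[l] - level_means[m]
        intervals[(l, m)] = (diff - half_width, diff + half_width)
    return intervals


# Hypothetical data: g = 3, b = 2, n = 4, p = 2.
rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, size=(3, 2, 4, 2))
ci = bonferroni_factor1(X, t_crit=3.0)   # placeholder critical value
for pair, (lo, hi) in ci.items():
    print(pair, np.round(lo, 2), np.round(hi, 2))
```

The intervals for factor 2 and for cell comparisons follow the same pattern. Only the means being compared, the divisor inside the square root (gn or n) and the Bonferroni divisor of α change.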
• Similarly, simultaneous $100(1-\alpha)\%$ Bonferroni confidence intervals for the differences $\tau_{kj} - \tau_{qj}$ between the $b(b-1)/2$ pairs of levels of factor 2, for all $j = 1, 2, \ldots, p$ response variables, are
$$(\bar{x}_{\cdot kj} - \bar{x}_{\cdot qj}) \pm t_\nu\!\left(\frac{\alpha}{pb(b-1)}\right)\sqrt{\frac{E_{jj}}{\nu}\,\frac{2}{gn}},$$
where $\nu = gb(n-1)$, $E_{jj}$ is the $(j, j)$th diagonal element of the error SS and CP matrix, and $\bar{x}_{\cdot kj} - \bar{x}_{\cdot qj}$ is the $j$th element of the $p \times 1$ vector of sample mean differences $\bar{x}_{\cdot k} - \bar{x}_{\cdot q}$.
• We need to consider combinations of factor levels if the interaction effects are not negligible.
• For example, simultaneous $100(1-\alpha)\%$ Bonferroni confidence intervals for the differences $(\alpha_{\ell j} + \gamma_{\ell kj}) - (\alpha_{mj} + \gamma_{mkj})$ between the $g(g-1)/2$ pairs of levels of factor 1, at each of the $k = 1, 2, \ldots, b$ levels of factor 2 and for all $j = 1, 2, \ldots, p$ response variables, are
$$(\bar{x}_{\ell kj} - \bar{x}_{mkj}) \pm t_\nu\!\left(\frac{\alpha}{pbg(g-1)}\right)\sqrt{\frac{E_{jj}}{\nu}\,\frac{2}{n}},$$
where $\nu = gb(n-1)$, $E_{jj}$ is the $(j, j)$th diagonal element of the error SS and CP matrix, and $\bar{x}_{\ell kj} - \bar{x}_{mkj}$ is the $j$th element of the $p \times 1$ vector of sample mean differences $\bar{x}_{\ell k} - \bar{x}_{mk}$.

Some Comments

• Note 1: if n = 1, that is, there are no replications within factor-level combinations, we cannot estimate an error SS and CP matrix separately from the interaction. In that case, formal inference is possible only for the additive-effects model for factors 1 and 2, in which the interaction SS and CP matrix serves as the error matrix.
• Note 2: the extension to designs with more than two factors is straightforward.
• It is possible to fit interactions of third, fourth and higher orders when three, four or more factors are included in the experiment.
• Interpretations are complicated if higher-order interactions are not negligible.

Example: Peanuts

• See Problem 6.31 in the textbook.
• Plant scientists conducted an experiment to examine three traits of peanuts.
The two factors in the experiment were variety (three levels) and location (two levels), so there are g × b = 3 × 2 = 6 factor-level combinations.
• Two replications (n = 2) were included for each of the 6 combinations of factor levels.
• The scientists measured three variables on each plot:
– X1 = yield (plot weight)
– X2 = sound mature kernels (weight in grams, maximum of 250 grams)
– X3 = seed size (weight in grams of 100 seeds)
• Fit a two-way model with interaction using SAS.
• It is useful to first obtain profile plots for the three variables to see whether location and variety may interact.
• We plot the mean of each variable for each variety, by location.
• See the figures that follow. The profiles suggest interactions between location and variety for all three variables. The univariate F tests below confirm a significant interaction for x1 (p = 0.0382) and x2 (p = 0.0300), but not for x3 (p = 0.1443).

Example: Peanuts (SAS Code)

```sas
/* This program performs a two-way MANOVA on the peanut data
   posted as peanuts.dat.
   The code is posted as peanuts.sas */

options linesize=64 nocenter nonumber;

data set1;
  infile "c:\stat501\data\peanuts.dat";
  input location variety x1 x2 x3;
  /* Optional labels (not used for the output shown below):
     label x1 = 'Yield' x2 = 'SdMatKer' x3 = 'SeedSize'; */
run;

proc print data=set1;
run;

proc glm data=set1;
  class location variety;
  /* Two-way model with interaction for the three responses */
  model x1-x3 = location variety location*variety / p solution;
  /* Multivariate tests; PRINTH and PRINTE print the H and E matrices */
  manova h=location*variety / printh printe;
  manova h=variety / printh printe;
  manova h=location / printh printe;
  /* Treat x1-x3 as three levels of a within-plot factor "traits" */
  repeated traits 3 profile / printm;
run;
```

Example: Peanuts (SAS Output)

```text
Obs  location  variety     x1      x2     x3
  1      1        5     195.3   153.1   51.4
  2      1        5     194.3   167.7   53.7
  3      2        5     189.7   139.5   55.5
  4      2        5     180.4   121.1   44.4
  5      1        6     203.0   156.8   49.8
  6      1        6     195.9   166.0   45.8
  7      2        6     202.7   166.1   60.4
  8      2        6     197.6   161.8   54.1
  9      1        8     193.5   164.5   57.8
 10      1        8     187.0   165.1   58.6
 11      2        8     201.5   166.8   65.0
 12      2        8     200.0   173.8   67.2
```

```text
Dependent Variable: x1

Source             DF   Sum of Squares   Mean Square
Model               5      401.9175000    80.3835000
Error               6      104.2050000    17.3675000
Corrected Total    11      506.1225000

Source             DF      Type I SS    Mean Square   F Value   Pr > F
location            1      0.7008333      0.7008333      0.04   0.8474
variety             2    196.1150000     98.0575000      5.65   0.0418
location*variety    2    205.1016667    102.5508333      5.90   0.0382
```

```text
Dependent Variable: x2

Source             DF   Sum of Squares   Mean Square
Model               5      2031.777500    406.355500
Error               6       352.105000     58.684167
Corrected Total    11      2383.882500

Source             DF      Type I SS    Mean Square   F Value   Pr > F
location            1     162.067500     162.067500      2.76   0.1476
variety             2    1089.015000     544.507500      9.28   0.0146
location*variety    2     780.695000     390.347500      6.65   0.0300
```

```text
Dependent Variable: x3

Source             DF   Sum of Squares   Mean Square
Model               5      442.5741667    88.5148333
Error               6       94.8350000    15.8058333
Corrected Total    11      537.4091667

Source             DF      Type I SS    Mean Square   F Value   Pr > F
location            1     72.5208333     72.5208333      4.59   0.0759
variety             2    284.1016667    142.0508333      8.99   0.0157
location*variety    2     85.9516667     42.9758333      2.72   0.1443
```

```text
MANOVA Test Criteria and F Approximations for the Hypothesis of
No Overall location*variety Effect
H = Type III SSCP Matrix for location*variety
E = Error SSCP Matrix

S=2    M=0    N=1

Statistic                  Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.07430      3.56        6        8   0.0508
Pillai's Trace           1.29086      3.03        6       10   0.0587
Hotelling-Lawley Trace   7.54429      5.03        6        4   0.0699
Roy's Greatest Root      6.82409     11.37        3        5   0.0113

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
```
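The univariate F values in the tables for x1, x2 and x3 come from the diagonal elements of the hypothesis and error SS and CP matrices: $F_j = (H_{jj}/q)/(E_{jj}/\nu)$, where q is the hypothesis degrees of freedom. Type I and Type III sums of squares coincide here because the design is balanced. The following is a minimal NumPy check against the SAS output. It assumes the data are entered from the PROC PRINT listing, with variety as factor 1 and location as factor 2. The variable names are illustrative.

```python
import numpy as np

# Peanut data as X[i, k, r, :]: factor 1 = variety (5, 6, 8),
# factor 2 = location (1, 2), n = 2 plots per cell, variables x1, x2, x3.
X = np.array([
    [[[195.3, 153.1, 51.4], [194.3, 167.7, 53.7]],   # variety 5, location 1
     [[189.7, 139.5, 55.5], [180.4, 121.1, 44.4]]],  # variety 5, location 2
    [[[203.0, 156.8, 49.8], [195.9, 166.0, 45.8]],   # variety 6, location 1
     [[202.7, 166.1, 60.4], [197.6, 161.8, 54.1]]],  # variety 6, location 2
    [[[193.5, 164.5, 57.8], [187.0, 165.1, 58.6]],   # variety 8, location 1
     [[201.5, 166.8, 65.0], [200.0, 173.8, 67.2]]],  # variety 8, location 2
])

g, b, n, p = X.shape
nu = g * b * (n - 1)                      # error df = 6
xbar = X.mean(axis=(0, 1, 2))
xi = X.mean(axis=(1, 2))                  # variety means, (g, p)
xk = X.mean(axis=(0, 2))                  # location means, (b, p)
xik = X.mean(axis=2)                      # cell means, (g, b, p)
res = X - xik[:, :, None, :]
dint = xik - xi[:, None, :] - xk[None, :, :] + xbar

# Diagonal elements only: the univariate sums of squares for x1, x2, x3.
ss_error = (res ** 2).sum(axis=(0, 1, 2))
ss = {
    "location": (g * n * ((xk - xbar) ** 2).sum(axis=0), b - 1),
    "variety": (b * n * ((xi - xbar) ** 2).sum(axis=0), g - 1),
    "location*variety": (n * (dint ** 2).sum(axis=(0, 1)), (g - 1) * (b - 1)),
}
f_values = {}
for name, (ss_h, q) in ss.items():
    f_values[name] = (ss_h / q) / (ss_error / nu)
    print(f"{name:17s}", np.round(ss_h, 4), np.round(f_values[name], 2))
```

The multivariate tests use the full matrices, including the off-diagonal cross-products. This is why the MANOVA and univariate conclusions can differ, as they do for the interaction here.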
```text
MANOVA Test Criteria and F Approximations for the Hypothesis of
No Overall variety Effect
H = Type III SSCP Matrix for variety
E = Error SSCP Matrix

S=2    M=0    N=1

Statistic                   Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.01244     10.62        6        8   0.0019
Pillai's Trace            1.70911      9.79        6       10   0.0011
Hotelling-Lawley Trace   21.37568     14.25        6        4   0.0113
Roy's Greatest Root      18.18761     30.31        3        5   0.0012

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
```

```text
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of
No Overall location Effect
H = Type III SSCP Matrix for location
E = Error SSCP Matrix

S=1    M=0.5    N=1

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.10651620     11.18        3        4   0.0205
Pillai's Trace           0.89348380     11.18        3        4   0.0205
Hotelling-Lawley Trace   8.38824348     11.18        3        4   0.0205
Roy's Greatest Root      8.38824348     11.18        3        4   0.0205
```

```text
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of
no traits Effect
H = Type III SSCP Matrix for traits
E = Error SSCP Matrix

S=1    M=0    N=1.5

Statistic                       Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                0.000089   28198.5        2        5   <.0001
Pillai's Trace               0.999911   28198.5        2        5   <.0001
Hotelling-Lawley Trace   11279.385332   28198.5        2        5   <.0001
Roy's Greatest Root      11279.385332   28198.5        2        5   <.0001
```

```text
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of
no traits*location Effect
H = Type III SSCP Matrix for traits*location
E = Error SSCP Matrix

S=1    M=0    N=1.5

Statistic                  Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.11262     19.70        2        5   0.0043
Pillai's Trace           0.88738     19.70        2        5   0.0043
Hotelling-Lawley Trace   7.87949     19.70        2        5   0.0043
Roy's Greatest Root      7.87949     19.70        2        5   0.0043
```

```text
MANOVA Test Criteria and F Approximations for the Hypothesis of
no traits*variety Effect
H = Type III SSCP Matrix for traits*variety
E = Error SSCP Matrix

S=2    M=-0.5    N=1.5

Statistic                   Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.02064     14.90        4       10   0.0003
Pillai's Trace            1.55258     10.41        4       12   0.0007
Hotelling-Lawley Trace   19.67760     24.15        4   5.1429   0.0016
Roy's Greatest Root      18.14717     54.44        2        6   0.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
```

```text
Repeated Measures Analysis of Variance

MANOVA Test Criteria and F Approximations for the Hypothesis of
no traits*location*variety Effect
H = Type III SSCP Matrix for traits*location*variety
E = Error SSCP Matrix

S=2    M=-0.5    N=1.5

Statistic                  Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.09547      5.59        4       10   0.0125
Pillai's Trace           1.19307      4.44        4       12   0.0198
Hotelling-Lawley Trace   6.45248      7.92        4   5.1429   0.0204
Roy's Greatest Root      5.94400     17.83        2        6   0.0030

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
```

```text
Tests of Hypotheses for Between Subjects Effects

Source             DF   Type III SS   Mean Square   F Value   Pr > F
location            1      3.802500      3.802500      0.07   0.8067
variety             2   1071.387222    535.693611      9.21   0.0148
location*variety    2    841.031667    420.515833      7.23   0.0252
Error               6    348.941667     58.156944
```

```text
Univariate Tests of Hypotheses for Within Subject Effects

                                                                        Adj Pr > F
Source                   DF  Type III SS  Mean Square  F Value  Pr > F   G - G   H - F
traits                    2  126097.2156   63048.6078  3741.70  <.0001  <.0001  <.0001
traits*location           2     231.4867     115.7433     6.87  0.0103  0.0337  0.0103
traits*variety            4     497.8444     124.4611     7.39  0.0031  0.0188  0.0031
traits*location*variety   4     230.7167      57.6792     3.42  0.0436  0.0919  0.0436
Error(traits)            12     202.2033      16.8503

Greenhouse-Geisser Epsilon   0.5583
Huynh-Feldt Epsilon          1.1672
```
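The repeated-measures results can also be traced back to the error SSCP matrix E. This sketch (NumPy; all names are hypothetical) computes several quantities from it:

- the between-subjects error and location sums of squares, from the sum of the three traits on each plot, divided by p;
- the Error(traits) SS;
- the Greenhouse-Geisser and Huynh-Feldt epsilons.

It uses the centering projector $P = I - J/p$ instead of an explicit orthonormal contrast matrix. The Huynh-Feldt formula is the original one, which reproduces the 1.1672 printed above. More recent SAS releases may report a Lecoutre-corrected value instead.

```python
import numpy as np

# Peanut data as X[i, k, r, :]: factor 1 = variety, factor 2 = location,
# n = 2 plots per cell; the p = 3 variables are the levels of "traits".
X = np.array([
    [[[195.3, 153.1, 51.4], [194.3, 167.7, 53.7]],   # variety 5, location 1
     [[189.7, 139.5, 55.5], [180.4, 121.1, 44.4]]],  # variety 5, location 2
    [[[203.0, 156.8, 49.8], [195.9, 166.0, 45.8]],   # variety 6, location 1
     [[202.7, 166.1, 60.4], [197.6, 161.8, 54.1]]],  # variety 6, location 2
    [[[193.5, 164.5, 57.8], [187.0, 165.1, 58.6]],   # variety 8, location 1
     [[201.5, 166.8, 65.0], [200.0, 173.8, 67.2]]],  # variety 8, location 2
])

g, b, n, p = X.shape
N = g * b * n                     # number of plots (subjects) = 12
nu = g * b * (n - 1)              # between-subjects error df = 6

xik = X.mean(axis=2)
res = X - xik[:, :, None, :]
E = np.einsum("ikrj,ikrl->jl", res, res)       # error SSCP matrix

# P = I - J/p projects onto contrasts among the traits; P E P has the same
# nonzero eigenvalues as C'EC for any orthonormal p x (p - 1) contrast matrix C.
P = np.eye(p) - np.ones((p, p)) / p
S = P @ E @ P

ss_error_within = np.trace(S)                  # Error(traits) SS
ss_error_between = E.sum() / p                 # 1'E1 / p: error SS for the trait sums

# Between-subjects location SS: location effect on the trait sums, divided by p.
s = X.sum(axis=3)                              # trait sum for each plot, (g, b, n)
ss_location = g * n * ((s.mean(axis=(0, 2)) - s.mean()) ** 2).sum() / p

# Greenhouse-Geisser epsilon and the original Huynh-Feldt epsilon.
gg = np.trace(S) ** 2 / ((p - 1) * np.trace(S @ S))
hf = (N * (p - 1) * gg - 2) / ((p - 1) * (nu - (p - 1) * gg))

print(round(ss_error_within, 4), round(ss_error_between, 4), round(ss_location, 4))
print(round(gg, 4), round(hf, 4))
```

The Huynh-Feldt estimate exceeds 1, and an epsilon above 1 is truncated to 1 when adjusting the degrees of freedom. This is consistent with the H - F adjusted p-values above being equal to the unadjusted ones.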