8. Two-way crossed classifications

Example 8.1 Days to germination of three varieties of carrot seed grown in two types of potting soil.

                 Variety 1     Variety 2     Variety 3
Soil type 1      Y111 = 6      Y121 = 13     Y131 = 14
                 Y112 = 10     Y122 = 15     Y132 = 22
                 Y113 = 11

Soil type 2      Y211 = 12     Y221 = 31     Y231 = 18
                 Y212 = 15                   Y232 = 9
                 Y213 = 19                   Y233 = 12
                 Y214 = 18

This is called an "unbalanced" factorial experiment.

Sample sizes

                 Variety 1     Variety 2     Variety 3
Soil type 1      n11 = 3       n12 = 2       n13 = 2
Soil type 2      n21 = 4       n22 = 1       n23 = 3

In general we have
  i = 1, 2, ..., a levels for the first factor,
  j = 1, 2, ..., b levels for the second factor,
  n_ij > 0 observations at the i-th level of the first factor and the j-th level of the second factor.

We will restrict our attention to normal-theory Gauss-Markov models.

"Cell means" model:
\[
Y_{ijk} = \mu_{ij} + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim NID(0, \sigma^2), \qquad
i = 1,\dots,a, \quad j = 1,\dots,b, \quad k = 1,\dots,n_{ij}
\]

Overall mean response:
\[
\bar{\mu}_{\cdot\cdot} = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij}
\]

Mean response at the i-th level of factor 1, averaging across the levels of factor 2:
\[
\bar{\mu}_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} \mu_{ij}
\]

Clearly, $E(Y_{ijk}) = \mu_{ij}$ is estimable if $n_{ij} > 0$.
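As a small illustration of the cell means model, the sketch below computes the cell sample sizes and the cell sample means for the carrot data of Example 8.1. It is written in R (close to the S-PLUS dialect used in these notes); the data frame is typed in by hand here, and the object names `n_ij` and `cell_means` are illustrative choices, not part of the original notes.

```r
# Carrot germination data from Example 8.1, typed in by hand for this sketch.
carrot <- data.frame(
  Soil    = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)

# Cell sample sizes n_ij (rows: soil type, columns: variety).
n_ij <- tapply(carrot$Days, list(carrot$Soil, carrot$Variety), length)

# Cell sample means: with n_ij > 0 these are the least squares
# estimates of the cell means mu_ij.
cell_means <- tapply(carrot$Days, list(carrot$Soil, carrot$Variety), mean)

print(n_ij)
print(cell_means)
```

Every cell has at least one observation, so each $\mu_{ij}$ has an estimate; the printed means should be 9, 14, 18 for soil type 1 and 16, 31, 13 for soil type 2.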
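Mean response at the j-th level of factor 2, averaging across the levels of factor 1:
\[
\bar{\mu}_{\cdot j} = \frac{1}{a} \sum_{i=1}^{a} \mu_{ij}
\]

Contrasts of interest

"Main effects" for factor 1:
\[
\bar{\mu}_{i\cdot} - \bar{\mu}_{\cdot\cdot}, \quad i = 1, 2, \dots, a
\qquad\qquad
\bar{\mu}_{i\cdot} - \bar{\mu}_{k\cdot}, \quad i \neq k
\]

"Main effects" for factor 2:
\[
\bar{\mu}_{\cdot j} - \bar{\mu}_{\cdot\cdot}, \quad j = 1, 2, \dots, b
\qquad\qquad
\bar{\mu}_{\cdot j} - \bar{\mu}_{\cdot \ell}, \quad j \neq \ell
\]

Conditional Effects
\[
\mu_{ij} - \mu_{kj}, \quad i \neq k, \ j = 1, 2, \dots, b
\qquad\qquad
\mu_{ij} - \mu_{i\ell}, \quad j \neq \ell, \ i = 1, 2, \dots, a
\]

Interaction Contrasts
\[
(\mu_{ij} - \mu_{kj}) - (\mu_{i\ell} - \mu_{k\ell})
= (\mu_{ij} - \mu_{i\ell}) - (\mu_{kj} - \mu_{k\ell})
= \mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell}
\]

All of these contrasts are estimable when $n_{ij} > 0$ for all $(i, j)$, because $E(\bar{Y}_{ij\cdot}) = \mu_{ij}$ and any linear function of estimable functions is estimable.

An "effects" model
\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}
\]
where
\[
\epsilon_{ijk} \sim NID(0, \sigma^2), \qquad i = 1, 2, \dots, a, \quad j = 1, 2, \dots, b, \quad k = 1, 2, \dots, n_{ij} > 0 .
\]

For the carrot data, order the observations as
$Y = (Y_{111}, Y_{112}, Y_{113}, Y_{121}, Y_{122}, Y_{131}, Y_{132}, Y_{211}, Y_{212}, Y_{213}, Y_{214}, Y_{221}, Y_{231}, Y_{232}, Y_{233})^T$,
with $\epsilon$ ordered in the same way. Then $Y = X\beta + \epsilon$ with
$\beta = (\mu, \alpha_1, \alpha_2, \beta_1, \beta_2, \beta_3, \gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{21}, \gamma_{22}, \gamma_{23})^T$ and
\[
X = \left[ \begin{array}{cccccccccccc}
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array} \right]
\]

This model matrix has 12 columns but only rank $ab = 6$, so the parameters are not uniquely determined without restrictions. One set of restrictions gives the restricted model
\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}
\]
where
\[
\epsilon_{ijk} \sim NID(0, \sigma^2), \qquad i = 1,\dots,a, \quad j = 1,\dots,b, \quad k = 1,\dots,n_{ij}
\]
and
\[
\alpha_1 = 0, \qquad \beta_1 = 0, \qquad
\gamma_{i1} = 0 \ \text{for all } i = 1,\dots,a, \qquad
\gamma_{1j} = 0 \ \text{for all } j = 1,\dots,b .
\]
We will call these "baseline" restrictions.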
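To make the baseline restrictions concrete, here is a minimal sketch in R (assumed here as a stand-in for S-PLUS) that fits the effects model to the carrot data with reference-level coding. The data frame is typed in by hand, and the names `fit`, `X` and `b` are illustrative. Under these restrictions the fitted coefficients are simple functions of the cell sample means: the intercept is the mean of cell (1,1), and the remaining coefficients are differences relative to that cell.

```r
# Carrot germination data from Example 8.1, typed in by hand for this sketch.
carrot <- data.frame(
  Soil    = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)

# Treatment coding uses level 1 of each factor as the reference level,
# i.e. alpha_1 = 0, beta_1 = 0, gamma_i1 = 0 and gamma_1j = 0.
fit <- lm(Days ~ Soil * Variety, data = carrot,
          contrasts = list(Soil = "contr.treatment",
                           Variety = "contr.treatment"))

X <- model.matrix(fit)  # 15 x 6 model matrix of the restricted model
b <- coef(fit)          # order: mu, alpha_2, beta_2, beta_3, gamma_22, gamma_23

print(qr(X)$rank)       # full column rank (6)
print(b)
```

The coefficients should come out as 9, 7, 5, 9, 10 and -12; for instance 7 = 16 - 9 is the difference between the soil type 2 and soil type 1 sample means for variety 1.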
Restrictions must involve "nonestimable" quantities for the unrestricted "effects" model.

Baseline restrictions (SAS):
\[
\alpha_a = 0, \qquad \beta_b = 0, \qquad
\gamma_{ib} = 0 \ \text{for all } i = 1,\dots,a, \qquad
\gamma_{aj} = 0 \ \text{for all } j = 1,\dots,b
\]

Baseline restrictions (S-PLUS):
\[
\alpha_1 = 0, \qquad \beta_1 = 0, \qquad
\gamma_{i1} = 0 \ \text{for all } i = 1,\dots,a, \qquad
\gamma_{1j} = 0 \ \text{for all } j = 1,\dots,b
\]

Under the S-PLUS baseline restrictions, the cell means for the carrot study ($a = 2$, $b = 3$) are
\[
\begin{array}{l|ccc|c}
 & \text{Variety 1} & \text{Variety 2} & \text{Variety 3} & \text{Means} \\ \hline
\text{Soil type 1} & \mu_{11} = \mu & \mu_{12} = \mu + \beta_2 & \mu_{13} = \mu + \beta_3 &
  \mu + \frac{\beta_2 + \beta_3}{3} \\
\text{Soil type 2} & \mu_{21} = \mu + \alpha_2 & \mu_{22} = \mu + \alpha_2 + \beta_2 + \gamma_{22} &
  \mu_{23} = \mu + \alpha_2 + \beta_3 + \gamma_{23} &
  \mu + \alpha_2 + \frac{\beta_2 + \beta_3}{3} + \frac{\gamma_{22} + \gamma_{23}}{3} \\ \hline
\text{Means} & \mu + \frac{\alpha_2}{2} & \mu + \beta_2 + \frac{\alpha_2}{2} + \frac{\gamma_{22}}{2} &
  \mu + \beta_3 + \frac{\alpha_2}{2} + \frac{\gamma_{23}}{2} &
\end{array}
\]

Interpretation:

$\mu = \mu_{11} = E(Y_{11k})$ is the mean response with both factors at the first level.

$\alpha_i = \mu_{i1} - \mu_{11} = E(Y_{i1k}) - E(Y_{11k})$ is the difference in mean responses between levels i and 1 of factor 1 when factor 2 is at level 1.

$\beta_j = \mu_{1j} - \mu_{11} = E(Y_{1jk}) - E(Y_{11k})$, for $j = 1, 2, \dots, b$, is the difference in the mean responses for levels j and 1 of factor 2 when factor 1 is at level 1.

Interaction:
\[
\gamma_{ij} = (\mu_{ij} - \mu_{i1}) - (\mu_{1j} - \mu_{11}) = (\mu_{ij} - \mu_{1j}) - (\mu_{i1} - \mu_{11})
\]
Under the SAS baseline restrictions the same expressions hold with levels a and b as the reference levels:
$\gamma_{ij} = (\mu_{ij} - \mu_{ib}) - (\mu_{aj} - \mu_{ab}) = (\mu_{ij} - \mu_{aj}) - (\mu_{ib} - \mu_{ab})$.

Note that
\[
\mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = \gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell}
\quad \text{for any } (i, j) \text{ and } (k, \ell).
\]

Matrix formulation:

With the observations ordered as $Y = (Y_{111}, Y_{112}, Y_{113}, Y_{121}, Y_{122}, Y_{131}, Y_{132}, Y_{211}, Y_{212}, Y_{213}, Y_{214}, Y_{221}, Y_{231}, Y_{232}, Y_{233})^T$, the restricted model is $Y = X\beta + \epsilon$, where $Y \sim N(X\beta, \sigma^2 I)$,
\[
X = \left[ \begin{array}{cccccc}
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 1
\end{array} \right],
\qquad
\beta = \left[ \begin{array}{c} \mu \\ \alpha_2 \\ \beta_2 \\ \beta_3 \\ \gamma_{22} \\ \gamma_{23} \end{array} \right]
\]

Least squares estimation:
\[
b = \left[ \begin{array}{c} \hat\mu \\ \hat\alpha_2 \\ \hat\beta_2 \\ \hat\beta_3 \\ \hat\gamma_{22} \\ \hat\gamma_{23} \end{array} \right]
= (X^T X)^{-1} X^T Y
= \left[ \begin{array}{c}
\bar{Y}_{11\cdot} \\
\bar{Y}_{21\cdot} - \bar{Y}_{11\cdot} \\
\bar{Y}_{12\cdot} - \bar{Y}_{11\cdot} \\
\bar{Y}_{13\cdot} - \bar{Y}_{11\cdot} \\
\bar{Y}_{22\cdot} - \bar{Y}_{21\cdot} - \bar{Y}_{12\cdot} + \bar{Y}_{11\cdot} \\
\bar{Y}_{23\cdot} - \bar{Y}_{21\cdot} - \bar{Y}_{13\cdot} + \bar{Y}_{11\cdot}
\end{array} \right]
\]

$\Sigma$-restrictions:
\[
Y_{ijk} = \omega + \gamma_i + \delta_j + \eta_{ij} + \epsilon_{ijk}, \qquad \mu_{ij} = E(Y_{ijk})
\]
where $\epsilon_{ijk} \sim NID(0, \sigma^2)$ and
\[
\sum_{i=1}^{a} \gamma_i = 0, \qquad
\sum_{j=1}^{b} \delta_j = 0, \qquad
\sum_{i=1}^{a} \eta_{ij} = 0 \ \text{for each } j = 1,\dots,b, \qquad
\sum_{j=1}^{b} \eta_{ij} = 0 \ \text{for each } i = 1,\dots,a .
\]

\[
\begin{array}{l|ccc|c}
 & \text{Variety 1} & \text{Variety 2} & \text{Variety 3} & \text{Means} \\ \hline
\text{Soil type 1} & \mu_{11} = \omega + \gamma_1 + \delta_1 + \eta_{11} & \mu_{12} = \omega + \gamma_1 + \delta_2 + \eta_{12} &
  \mu_{13} = \omega + \gamma_1 + \delta_3 + \eta_{13} & \bar\mu_{1\cdot} = \omega + \gamma_1 \\
\text{Soil type 2} & \mu_{21} = \omega + \gamma_2 + \delta_1 + \eta_{21} & \mu_{22} = \omega + \gamma_2 + \delta_2 + \eta_{22} &
  \mu_{23} = \omega + \gamma_2 + \delta_3 + \eta_{23} & \bar\mu_{2\cdot} = \omega + \gamma_2 \\ \hline
\text{Means} & \bar\mu_{\cdot 1} = \omega + \delta_1 & \bar\mu_{\cdot 2} = \omega + \delta_2 & \bar\mu_{\cdot 3} = \omega + \delta_3 &
\end{array}
\]

Interpretation:

$\omega = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij}$ is the overall mean germination time, averaging across all soil types and all varieties used in this study.

$\omega + \gamma_i = \bar\mu_{i\cdot}$, so $\gamma_i = \bar\mu_{i\cdot} - \bar\mu_{\cdot\cdot}$, and $\gamma_1 - \gamma_2 = \bar\mu_{1\cdot} - \bar\mu_{2\cdot}$ is the difference in the mean germination times for different soil types, averaging across varieties.

$\omega + \delta_j = \bar\mu_{\cdot j}$, so $\delta_j = \bar\mu_{\cdot j} - \bar\mu_{\cdot\cdot}$, and $\delta_j - \delta_k = (\bar\mu_{\cdot j} - \bar\mu_{\cdot\cdot}) - (\bar\mu_{\cdot k} - \bar\mu_{\cdot\cdot}) = \bar\mu_{\cdot j} - \bar\mu_{\cdot k}$ is the difference between mean germination times for varieties j and k, averaging across soil types.

Interaction: $\eta_{ij} = \mu_{ij} - (\omega + \gamma_i + \delta_j)$ is a deviation from an additive model. Then,
\[
\mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell} = \eta_{ij} - \eta_{kj} - \eta_{i\ell} + \eta_{k\ell} .
\]

Matrix formulation

With the same ordering of the observations, $Y = X\beta + \epsilon$ with
\[
X = \left[ \begin{array}{rrrrrr}
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 0 & 1 & 0 & -1 \\
1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1
\end{array} \right],
\qquad
\beta = \left[ \begin{array}{c} \omega \\ \gamma_1 \\ \delta_1 \\ \delta_2 \\ \eta_{11} \\ \eta_{12} \end{array} \right]
\]
This uses the $\Sigma$-restrictions to obtain
\[
\gamma_2 = -\gamma_1, \quad
\delta_3 = -\delta_1 - \delta_2, \quad
\eta_{21} = -\eta_{11}, \quad
\eta_{13} = -\eta_{11} - \eta_{12}, \quad
\eta_{22} = -\eta_{12}, \quad
\eta_{23} = -\eta_{13} = \eta_{11} + \eta_{12} .
\]

Least squares estimation
\[
b = \left[ \begin{array}{c} \hat\omega \\ \hat\gamma_1 \\ \hat\delta_1 \\ \hat\delta_2 \\ \hat\eta_{11} \\ \hat\eta_{12} \end{array} \right]
= (X^T X)^{-1} X^T Y
= \left[ \begin{array}{c}
\frac{1}{6} \sum_i \sum_j \bar{Y}_{ij\cdot} \\
\frac{1}{3} \sum_j \bar{Y}_{1j\cdot} - \frac{1}{6} \sum_i \sum_j \bar{Y}_{ij\cdot} \\
\frac{1}{2} \sum_i \bar{Y}_{i1\cdot} - \frac{1}{6} \sum_i \sum_j \bar{Y}_{ij\cdot} \\
\frac{1}{2} \sum_i \bar{Y}_{i2\cdot} - \frac{1}{6} \sum_i \sum_j \bar{Y}_{ij\cdot} \\
\bar{Y}_{11\cdot} - \hat\omega - \hat\gamma_1 - \hat\delta_1 \\
\bar{Y}_{12\cdot} - \hat\omega - \hat\gamma_1 - \hat\delta_2
\end{array} \right]
= \left[ \begin{array}{r} 16.83 \\ -3.17 \\ -4.33 \\ 5.67 \\ -0.33 \\ -5.33 \end{array} \right]
\]

If restrictions are placed on "nonestimable" functions of parameters in the unrestricted "effects" model, then:

The resulting models are reparameterizations of each other.

The quantities
\[
\hat{Y} = P_X Y, \qquad
e = Y - \hat{Y} = (I - P_X) Y, \qquad
SSE = e^T e = Y^T (I - P_X) Y,
\]
\[
\hat{Y}^T \hat{Y} = Y^T P_X Y, \qquad
SS_{model} = Y^T (P_X - P_1) Y
\]
are the same for any set of restrictions.

The solution to the normal equations, $b = (X^T X)^{-1} X^T Y$, and the interpretations of the corresponding parameters will not be the same for all such sets of restrictions.

If you were to place restrictions on estimable functions of parameters in
\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}
\]
then you would change
  rank(X),
  the space spanned by the columns of X,
  $\hat{Y} = X (X^T X)^{-} X^T Y$ and the OLS estimators of other estimable quantities.

Analysis of variance
\[
Y^T Y = Y^T P_\mu Y + Y^T (P_{\mu,\alpha} - P_\mu) Y + Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y
      + Y^T (P_X - P_{\mu,\alpha,\beta}) Y + Y^T (I - P_X) Y
\]
\[
\phantom{Y^T Y} = R(\mu) + R(\alpha \mid \mu) + R(\beta \mid \mu, \alpha) + R(\gamma \mid \mu, \alpha, \beta) + SSE
\]

Partition the columns of the model matrix for the unrestricted effects model as $X = [X_\mu \mid X_\alpha \mid X_\beta \mid X_\gamma]$:
\[
X = \left[ \begin{array}{c|cc|ccc|cccccc}
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array} \right]
\]

Define:
\[
P_\mu = X_\mu (X_\mu^T X_\mu)^{-1} X_\mu^T
\]
\[
X_{\mu,\alpha} = [X_\mu \mid X_\alpha], \qquad
P_{\mu,\alpha} = X_{\mu,\alpha} (X_{\mu,\alpha}^T X_{\mu,\alpha})^{-} X_{\mu,\alpha}^T
\]
\[
X_{\mu,\alpha,\beta} = [X_\mu \mid X_\alpha \mid X_\beta], \qquad
P_{\mu,\alpha,\beta} = X_{\mu,\alpha,\beta} (X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta})^{-} X_{\mu,\alpha,\beta}^T
\]
\[
X = [X_\mu \mid X_\alpha \mid X_\beta \mid X_\gamma], \qquad
P_X = X (X^T X)^{-} X^T
\]

The three model matrices shown in this section (the 15 x 12 matrix for the unrestricted effects model, the 15 x 6 matrix for the baseline restrictions, and the 15 x 6 matrix for the $\Sigma$-restrictions) correspond to reparameterizations of the same model. In each of them the intercept column, the intercept and factor 1 columns, the intercept and both sets of main effect columns, and the full set of columns span the same spaces. Hence:

$R(\mu) = Y^T P_\mu Y$ is the same for all three models.

$R(\mu, \alpha) = Y^T P_{\mu,\alpha} Y$ is the same for all three models, and so is $R(\alpha \mid \mu) = R(\mu, \alpha) - R(\mu)$.

$R(\mu, \alpha, \beta) = Y^T P_{\mu,\alpha,\beta} Y$ is the same for all three models, and so is $R(\beta \mid \mu, \alpha) = R(\mu, \alpha, \beta) - R(\mu, \alpha)$.

$R(\mu, \alpha, \beta, \gamma) = Y^T P_X Y$ is the same for all three models, and so is $R(\gamma \mid \mu, \alpha, \beta) = R(\mu, \alpha, \beta, \gamma) - R(\mu, \alpha, \beta)$.

Consequently, the partition
\[
Y^T Y = Y^T P_\mu Y + Y^T (P_{\mu,\alpha} - P_\mu) Y + Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y
      + Y^T (P_X - P_{\mu,\alpha,\beta}) Y + Y^T (I - P_X) Y
\]
\[
\phantom{Y^T Y} = R(\mu) + R(\alpha \mid \mu) + R(\beta \mid \mu, \alpha) + R(\gamma \mid \mu, \alpha, \beta) + SSE
\]
is the same for all three models.

Normal Theory Gauss-Markov Model

\[
\epsilon_{ijk} \sim NID(0, \sigma^2)
\]
By Cochran's Theorem, these quadratic forms (sums of squares), divided by $\sigma^2$, have independent chi-square distributions with $1$, $a - 1$, $b - 1$, $(a - 1)(b - 1)$, and $n_{\cdot\cdot} - ab$ degrees of freedom, respectively, when $n_{ij} > 0$ for all $(i, j)$.
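The partition of $Y^T Y$ into sequential sums of squares can be checked numerically with projection matrices. The following is a minimal sketch in R (assumed as a stand-in for S-PLUS), using full-rank bases for the nested column spaces; any basis of the same spaces gives the same projections. The helper names `proj` and `quad` and the hand-typed data frame are illustrative, not part of the original notes.

```r
# Carrot germination data from Example 8.1, typed in by hand for this sketch.
carrot <- data.frame(
  Soil    = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)
y <- carrot$Days
n <- nrow(carrot)

# Orthogonal projection onto the column space of a full-column-rank matrix.
proj <- function(X) X %*% solve(crossprod(X), t(X))

# Quadratic form y' P y.
quad <- function(P) drop(t(y) %*% P %*% y)

# Full-rank bases for the column spaces of X_mu, X_{mu,alpha},
# X_{mu,alpha,beta} and X (Soil plays the role of alpha, Variety of beta).
P_mu  <- proj(matrix(1, nrow = n, ncol = 1))
P_ma  <- proj(model.matrix(~ Soil, data = carrot))
P_mab <- proj(model.matrix(~ Soil + Variety, data = carrot))
P_X   <- proj(model.matrix(~ Soil * Variety, data = carrot))

ss <- c(R_mu    = quad(P_mu),
        R_alpha = quad(P_ma - P_mu),
        R_beta  = quad(P_mab - P_ma),
        R_gamma = quad(P_X - P_mab),
        SSE     = quad(diag(n) - P_X))

print(round(ss, 3))
print(sum(ss) - sum(y^2))  # numerically zero: the pieces add up to Y'Y
```

For these data the last four entries should agree with the sequential (Type I) sums of squares that `anova()` reports for the `Soil*Variety` fit in these notes: 52.5, 124.734, 222.766 and 120.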
Using result 4.7, we have also shown earlier that
\[
SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (Y_{ijk} - \bar{Y}_{ij\cdot})^2 = Y^T (I - P_X) Y,
\qquad
\frac{1}{\sigma^2} SSE \sim \chi^2_{n_{\cdot\cdot} - ab} .
\]

Produce ANOVA tables: The lm( ) function in S-PLUS

To allow the lm( ) function to fit a model involving classification variables, create factors.

> carrot <- read.table("carrots.dat",
+     col.names=c("Soil","Variety","Days"))
> carrot$Soil <- as.factor(carrot$Soil)
> carrot$Variety <- as.factor(carrot$Variety)
> options(contrasts=c("contr.sum","contr.poly"))

> lm.out1 <- lm(Days ~ Soil*Variety, data=carrot)
> anova(lm.out1)

gives R(α|μ), R(β|μ,α), R(γ|μ,α,β), and SSE.

> lm.out2 <- lm(Days ~ Variety*Soil, data=carrot)
> anova(lm.out2)

gives R(β|μ), R(α|μ,β), R(γ|μ,α,β), and SSE.

There are four options for creating columns in the model matrix for classification variables:

  contr.helmert
  contr.treatment   sets α_1 = 0, β_1 = 0, γ_1j = 0 for all j, γ_i1 = 0 for all i
  contr.sum         Σ-constraints
  contr.poly        orthogonal polynomial contrasts (equal spacing, equal sample sizes)

# This file is posted as carrots.ssc

> carrot <- read.table("c:\\carrots.dat",
+     col.names=c("Soil","Variety","Days"))
> carrot$Soil <- as.factor(carrot$Soil)
> carrot$Variety <- as.factor(carrot$Variety)
> carrot
   Soil Variety Days
 1    1       1    6
 2    1       1   10
 3    1       1   11
 4    1       2   13
 5    1       2   15
 6    1       3   14
 7    1       3   22
 8    2       1   12
 9    2       1   15
10    2       1   19
11    2       1   18
12    2       2   31
13    2       3   18
14    2       3    9
15    2       3   12

# Compute sample means of germination times
# for all combinations of soil type and
# varieties of carrot seeds and make a
# profile plot.

> means <- tapply(carrot$Days,
+     list(carrot$Variety,carrot$Soil),mean)
> means
   1  2
1  9 16
2 14 31
3 18 13

# Set up the axes and title of the
# profile plot.

> par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5)
> x.axis <- unique(carrot$Variety)
> matplot(c(1,3,1), c(0,40,10), type="n",
+     xlab="Variety", ylab="Mean Time",
+     main= "Average Time to Carrot Seed Germination")

# Add a profile for each soil type

> matlines(x.axis,means,type='l',
+     lty=c(1,3),lwd=3)

# Plot points for the cell sample means

> matpoints(x.axis,means, pch=c(1,16))

# Add a legend to the plot

> legend(2.,38.6,
+     legend=c('Soil Type 1','Soil Type 2'),
+     lty=c(1,3),bty='n')

[Figure: profile plot "Average Time to Carrot Seed Germination"; x-axis: Variety, y-axis: Mean Time; one profile for Soil Type 1 and one for Soil Type 2.]

# Fit a model with main effects and interaction
# Compute both sets of Type I sums of squares

> options(contrasts=c("contr.sum","contr.poly"))
> lm.out1 <- lm(Days~Soil*Variety,data=carrot)
> anova(lm.out1)
Analysis of Variance Table

Response: Days

Terms added sequentially (first to last)
             Df  Sum Sq  Mean Sq  F Value  Pr(F)
        Soil  1  52.500  52.5000 3.937500 0.0785
     Variety  2 124.734  62.3670 4.677527 0.0405
Soil:Variety  2 222.766 111.3830 8.353723 0.0089
   Residuals  9 120.000  13.3333

> lm.out2 <- lm(Days~Variety*Soil,data=carrot)
> anova(lm.out2)
Analysis of Variance Table

Response: Days

Terms added sequentially (first to last)
             Df   Sum Sq  Mean Sq  F Value  Pr(F)
     Variety  2  93.3333  46.6667 3.500000 0.0751
        Soil  1  83.9007  83.9007 6.292553 0.0334
Variety:Soil  2 222.7660 111.3830 8.353723 0.0089
   Residuals  9 120.0000  13.3333

# Create a data frame containing the original
# data and the residuals and estimated means

> data.frame(carrot$Soil,carrot$Variety,
+     carrot$Days,
+     Pred=lm.out1$fitted,
+     Resid=round(lm.out1$resid,3))
   X1 X2 X3 Pred Resid
 1  1  1  6    9    -3
 2  1  1 10    9     1
 3  1  1 11    9     2
 4  1  2 13   14    -1
 5  1  2 15   14     1
 6  1  3 14   18    -4
 7  1  3 22   18     4
 8  2  1 12   16    -4
 9  2  1 15   16    -1
10  2  1 19   16     3
11  2  1 18   16     2
12  2  2 31   31     0
13  2  3 18   13     5
14  2  3  9   13    -4
15  2  3 12   13    -1

# Create residual plots

frame( )
par(cex=1.0,mex=1.0,lwd=3,pch=2,
    mkh=0.1,fig=c(0,1,.51,1), pty='m')
plot(lm.out1$fitted, lm.out1$resid,
     xlab="Estimated Means",
     ylab="Residuals")
abline(h=0, lty=2, lwd=3)

par(fig=c(0, 1, 0, 0.49), pty='s')
qqnorm(lm.out1$resid)
qqline(lm.out1$resid)

[Figure: residuals versus estimated means (x-axis: Estimated Means, y-axis: Residuals), and a normal probability plot of lm.out1$resid (x-axis: Quantiles of Standard Normal).]

# Create plots for studentized residuals.
# You must attach the MASS library to have
# access to the studres( ) function that
# computes studentized residuals in the
# following code

library(MASS)
frame( )
par(cex=1.0,mex=1.0,lwd=3,pch=2,
    mkh=0.1,fin=c(6.5,6.5))
plot(lm.out1$fitted, studres(lm.out1),
     xlab="Estimated Means",
     ylab="Studentized Residuals",
     main="Studentized Residual Plot")
abline(h=0, lty=2, lwd=3)

qqnorm(studres(lm.out1),
       main="Studentized Residuals")
qqline(studres(lm.out1))

[Figure: "Studentized Residual Plot" (x-axis: Estimated Means, y-axis: Studentized Residuals), and a normal probability plot "Studentized Residuals" of studres(lm.out1) (x-axis: Quantiles of Standard Normal).]

# Compute Type III sums of squares and F-tests.
# First create the model matrix for
# the cell means model.

> cb <- as.factor(10*as.numeric(carrot$Soil) +
+     as.numeric(carrot$Variety))
> lm.out <- lm(carrot$Days ~ cb - 1)
> D <- model.matrix(lm.out)
> D
   cb11 cb12 cb13 cb21 cb22 cb23
 1    1    0    0    0    0    0
 2    1    0    0    0    0    0
 3    1    0    0    0    0    0
 4    0    1    0    0    0    0
 5    0    1    0    0    0    0
 6    0    0    1    0    0    0
 7    0    0    1    0    0    0
 8    0    0    0    1    0    0
 9    0    0    0    1    0    0
10    0    0    0    1    0    0
11    0    0    0    1    0    0
12    0    0    0    0    1    0
13    0    0    0    0    0    1
14    0    0    0    0    0    1
15    0    0    0    0    0    1

# Compute the sample means

> y <- matrix(carrot$Days,ncol=1)
> b <- solve(crossprod(D)) %*% crossprod(D,y)
> b
     [,1]
cb11    9
cb12   14
cb13   18
cb21   16
cb22   31
cb23   13

# Compute Type III sums of squares and
# related F-tests

> s <- length(unique(carrot$Soil))
> t <- length(unique(carrot$Variety))
> yhat <- D %*% b
> sse <- crossprod(y-yhat)
> df2 <- nrow(y) - s*t

# Generate an identity matrix and
# a vector of ones

> Iden <- function(n) diag(rep(1,n))
> one <- function(n) matrix(rep(1,n),ncol=1)

# Type III test for soil type

> c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
+     t(one(t)) )
> q1 <- t(b) %*% t(c1)%*%
+     solve( c1 %*% solve(crossprod(D))
+     %*% t(c1))%*% c1 %*% b
> df1<- s-1
> f <- (q1/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c1
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1   -1   -1   -1
> data.frame(SS=q1,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 123.7714  1 9.282857 0.01386499

# Type III test for variety

> c2 <- kronecker( t(one(s)),
+     cbind(Iden(t-1), -one(t-1)) )
> q2 <- t(b) %*% t(c2)%*%
+     solve( c2 %*% solve(crossprod(D))
+     %*% t(c2))%*% c2 %*% b
> df1<- t-1
> f <- (q2/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c2
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0   -1    1    0   -1
[2,]    0    1   -1    0    1   -1
> data.frame(SS=q2,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 192.1277  2 7.204787 0.01354629

# Type III test for the soil by variety interaction

> c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
+     cbind(Iden(t-1),-one(t-1)) )
> q3 <- t(b) %*% t(c3)%*%
+     solve( c3 %*% solve(crossprod(D))
+     %*% t(c3))%*% c3 %*% b
> df1<- (s-1)*(t-1)
> f <- (q3/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c3
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0   -1   -1    0    1
[2,]    0    1   -1    0   -1    1
> data.frame(SS=q3,df=df1,F.stat=f,p.value=p)
       SS df   F.stat    p.value
1 222.766  2 8.353723 0.00888845

What null hypotheses are tested by F-tests derived from such ANOVA tables?
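Consider Type I sums of squares:
\[
R(\mu) = Y^T P_1 Y = Y^T P_1 P_1 Y = (P_1 Y)^T (P_1 Y)
       = (\bar{Y}_{\cdot\cdot\cdot} \mathbf{1})^T (\bar{Y}_{\cdot\cdot\cdot} \mathbf{1})
       = n_{\cdot\cdot} \bar{Y}_{\cdot\cdot\cdot}^2
\]
\[
\frac{1}{\sigma^2} R(\mu) \sim \chi^2_1(\delta^2)
\qquad \text{and} \qquad
F = \frac{R(\mu)}{SSE / (n_{\cdot\cdot} - ab)} \sim F_{(1,\, n_{\cdot\cdot} - ab)}(\delta^2)
\]
where
\[
\delta^2 = \frac{1}{\sigma^2} \beta^T X^T P_1 X \beta
         = \frac{1}{\sigma^2} (\beta^T X^T P_1)(P_1 X \beta)
         = \frac{1}{\sigma^2} (P_1 X \beta)^T (P_1 X \beta) .
\]

For the carrot seed germination study:
\[
P_1 X \beta = \frac{1}{n_{\cdot\cdot}} \mathbf{1} \mathbf{1}^T X \beta
= \frac{1}{n_{\cdot\cdot}} \mathbf{1}\,
  [\, n_{\cdot\cdot},\ n_{1\cdot},\ n_{2\cdot},\ n_{\cdot 1},\ n_{\cdot 2},\ n_{\cdot 3},\
     n_{11},\ n_{12},\ n_{13},\ n_{21},\ n_{22},\ n_{23} \,]\, \beta
\]
\[
\phantom{P_1 X \beta} = \frac{1}{n_{\cdot\cdot}} \mathbf{1}
  \left( n_{\cdot\cdot}\, \mu + \sum_{i=1}^{a} n_{i\cdot}\, \alpha_i + \sum_{j=1}^{b} n_{\cdot j}\, \beta_j
         + \sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij}\, \gamma_{ij} \right)
\]

The null hypothesis is
\[
H_0 : \ 0 = n_{\cdot\cdot}\, \mu + \sum_{i=1}^{a} n_{i\cdot}\, \alpha_i + \sum_{j=1}^{b} n_{\cdot j}\, \beta_j
            + \sum_{i} \sum_{j} n_{ij}\, \gamma_{ij} .
\]
With respect to the cell means $E(Y_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, this null hypothesis is
\[
H_0 : \ 0 = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}}\, \mu_{ij} .
\]

Consider
\[
R(\alpha \mid \mu) = Y^T (P_{\mu,\alpha} - P_\mu) Y
\qquad \text{and} \qquad
F = \frac{R(\alpha \mid \mu) / (a - 1)}{MSE} \sim F_{(a-1,\, n_{\cdot\cdot} - ab)}(\delta^2) .
\]
Here,
\[
\frac{1}{\sigma^2} R(\alpha \mid \mu) \sim \chi^2_{a-1}(\delta^2)
\]
where $a - 1 = \mathrm{rank}(X_{\mu,\alpha}) - \mathrm{rank}(X_\mu)$ and
\[
\delta^2 = \frac{1}{\sigma^2} \beta^T X^T (P_{\mu,\alpha} - P_\mu) X \beta
         = \frac{1}{\sigma^2} [(P_{\mu,\alpha} - P_\mu) X \beta]^T [(P_{\mu,\alpha} - P_\mu) X \beta] .
\]

For the general effects model for the carrot seed germination study:
\[
P_{\mu,\alpha} X = X_{\mu,\alpha} (X_{\mu,\alpha}^T X_{\mu,\alpha})^{-} X_{\mu,\alpha}^T X
\]
\[
= X_{\mu,\alpha}
\left[ \begin{array}{ccc}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} \\
n_{1\cdot} & n_{1\cdot} & 0 \\
n_{2\cdot} & 0 & n_{2\cdot}
\end{array} \right]^{-}
\left[ \begin{array}{cccccccccccc}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{11} & n_{12} & n_{13} & n_{21} & n_{22} & n_{23} \\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} & n_{11} & n_{12} & n_{13} & 0 & 0 & 0 \\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} & 0 & 0 & 0 & n_{21} & n_{22} & n_{23}
\end{array} \right]
\]
\[
= X_{\mu,\alpha}
\left[ \begin{array}{ccc}
0 & 0 & 0 \\
0 & \frac{1}{n_{1\cdot}} & 0 \\
0 & 0 & \frac{1}{n_{2\cdot}}
\end{array} \right]
\left[ \begin{array}{cccccccccccc}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{11} & n_{12} & n_{13} & n_{21} & n_{22} & n_{23} \\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} & n_{11} & n_{12} & n_{13} & 0 & 0 & 0 \\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} & 0 & 0 & 0 & n_{21} & n_{22} & n_{23}
\end{array} \right]
\]
\[
= X_{\mu,\alpha}
\left[ \begin{array}{cccccccccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & \frac{n_{11}}{n_{1\cdot}} & \frac{n_{12}}{n_{1\cdot}} & \frac{n_{13}}{n_{1\cdot}} & \frac{n_{11}}{n_{1\cdot}} & \frac{n_{12}}{n_{1\cdot}} & \frac{n_{13}}{n_{1\cdot}} & 0 & 0 & 0 \\
1 & 0 & 1 & \frac{n_{21}}{n_{2\cdot}} & \frac{n_{22}}{n_{2\cdot}} & \frac{n_{23}}{n_{2\cdot}} & 0 & 0 & 0 & \frac{n_{21}}{n_{2\cdot}} & \frac{n_{22}}{n_{2\cdot}} & \frac{n_{23}}{n_{2\cdot}}
\end{array} \right]
\]
where
\[
X_{\mu,\alpha} = \left[ \begin{array}{ccc}
\mathbf{1}_7 & \mathbf{1}_7 & \mathbf{0} \\
\mathbf{1}_8 & \mathbf{0} & \mathbf{1}_8
\end{array} \right] .
\]

Then, the first seven rows of $(P_{\mu,\alpha} - P_\mu) X \beta$ are
\[
\left[ \mu + \alpha_1 + \sum_{j=1}^{b} \frac{n_{1j}}{n_{1\cdot}} (\beta_j + \gamma_{1j}) \right]
- \left[ \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j
         + \sum_{i} \sum_{j} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} \right]
\]
The last eight rows of $(P_{\mu,\alpha} - P_\mu) X \beta$ are
\[
\left[ \mu + \alpha_2 + \sum_{j=1}^{b} \frac{n_{2j}}{n_{2\cdot}} (\beta_j + \gamma_{2j}) \right]
- \left[ \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j
         + \sum_{i} \sum_{j} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} \right]
\]

The null hypothesis is
\[
H_0 : \ \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} (\beta_j + \gamma_{ij}) \ \text{ are all equal } (i = 1, \dots, a) .
\]
With respect to the cell means model, $\mu_{ij} = E(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$, this null hypothesis is
\[
H_0 : \ \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}}\, \mu_{ij} \ \text{ are all equal } (i = 1, \dots, a) .
\]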
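The hypothesis tested by $R(\alpha \mid \mu)$ compares averages of cell means weighted by $n_{ij}/n_{i\cdot}$. A minimal R sketch (R assumed as a stand-in for S-PLUS; the hand-typed data frame and object names such as `weighted_avg` are illustrative) shows the sample analogue: the weighted averages of the cell sample means are just the raw soil-type means, and $R(\alpha \mid \mu)$ is the between-soil sum of squares built from them. The unweighted averages of cell means are printed for comparison.

```r
# Carrot germination data from Example 8.1, typed in by hand for this sketch.
carrot <- data.frame(
  Soil    = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)

# Cell sizes n_ij and cell sample means (rows: soil type, columns: variety).
n_ij <- tapply(carrot$Days, list(carrot$Soil, carrot$Variety), length)
ybar <- tapply(carrot$Days, list(carrot$Soil, carrot$Variety), mean)
n_i  <- rowSums(n_ij)

# Sample analogue of sum_j (n_ij / n_i.) mu_ij for each soil type.
weighted_avg <- rowSums(n_ij * ybar) / n_i

# Unweighted averages (1/b) sum_j Ybar_ij., for comparison.
unweighted_avg <- rowMeans(ybar)

# R(alpha | mu) expressed through the weighted averages.
grand_mean <- mean(carrot$Days)
R_alpha <- sum(n_i * (weighted_avg - grand_mean)^2)

print(weighted_avg)
print(unweighted_avg)
print(R_alpha)
```

Because the design is unbalanced, the weighted averages (13 and 16.75) differ from the unweighted ones (about 13.67 and 20), so the Type I test for soil type addresses a different comparison of cell means than a test based on unweighted averages would.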