8. Two-way crossed classifications with "all cells filled"

Example 8.1. Days to first germination of three varieties of carrot seed grown in two types of potting soil.

    Soil                  Variety
    type    1            2            3
     1      Y111 =  6    Y121 = 13    Y131 = 14
            Y112 = 10    Y122 = 15    Y132 = 22
            Y113 = 11
     2      Y211 = 12    Y221 = 31    Y231 = 18
            Y212 = 15                 Y232 =  9
            Y213 = 19                 Y233 = 12
            Y214 = 18

This might be called "an unbalanced factorial experiment".

Sample sizes:

                    Soil type
    Variety     1          2
       1        n11 = 3    n21 = 4
       2        n12 = 2    n22 = 1
       3        n13 = 2    n23 = 3

In general we have

    i = 1, 2, ..., a levels for the first factor
    j = 1, 2, ..., b levels for the second factor
    n_ij > 0 observations at the i-th level of the first factor and the j-th level of the second factor

We will restrict our attention to normal-theory Gauss-Markov models.

"Cell means" model:

\[
Y_{ijk} = \mu_{ij} + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim NID(0, \sigma^2), \qquad
\begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, n_{ij} \end{cases}
\]

Clearly, $E(Y_{ijk}) = \mu_{ij}$ is estimable if $n_{ij} > 0$.

Overall mean response:

\[
\mu_{\cdot\cdot} = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij} = \frac{1}{a} \sum_{i=1}^{a} \mu_{i\cdot}
\]

Mean response at the i-th level of factor 1, averaging across the levels of factor 2:

\[
\mu_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} \mu_{ij}
\]
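As a quick check on these definitions, the sketch below enters the Example 8.1 data by hand and computes the sample sizes, the cell sample means, and the unweighted averages that estimate $\mu_{i\cdot}$ and $\mu_{\cdot\cdot}$. It is written for R, whose syntax closely follows S-PLUS. The object names (`carrot`, `n_ij`, `ybar_ij`, `mu_i_dot`, `mu_dot_dot`) are illustrative assumptions, not names taken from the course files.

```r
# Example 8.1 data, entered by hand (object names are illustrative)
carrot <- data.frame(
  Soil    = factor(rep(c(1, 2), times = c(7, 8))),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)

# Sample sizes n_ij (soil types in rows, varieties in columns)
n_ij <- table(carrot$Soil, carrot$Variety)

# Cell sample means, the least squares estimates of mu_ij
ybar_ij <- tapply(carrot$Days, list(carrot$Soil, carrot$Variety), mean)

# Unweighted averages of the cell means: estimates of mu_i. and mu_..
mu_i_dot   <- rowMeans(ybar_ij)
mu_dot_dot <- mean(ybar_ij)

print(n_ij)
print(ybar_ij)
print(mu_i_dot)
print(mu_dot_dot)
```

Because the design is unbalanced, these unweighted averages of cell means are not the same as the raw sample means of all observations for each soil type.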
Mean response at the j-th level of factor 2, averaging across the levels of factor 1:

\[
\mu_{\cdot j} = \frac{1}{a} \sum_{i=1}^{a} \mu_{ij}
\]

Contrasts of interest:

"Main effects" for factor 1:

\[
\mu_{i\cdot} - \mu_{\cdot\cdot}, \quad i = 1, 2, \ldots, a; \qquad \mu_{i\cdot} - \mu_{k\cdot}, \quad i \neq k
\]

"Main effects" for factor 2:

\[
\mu_{\cdot j} - \mu_{\cdot\cdot}, \quad j = 1, 2, \ldots, b; \qquad \mu_{\cdot j} - \mu_{\cdot \ell}, \quad j \neq \ell
\]

Conditional effects:

\[
\mu_{ij} - \mu_{kj} \quad (i \neq k,\; j = 1, 2, \ldots, b); \qquad \mu_{ij} - \mu_{i\ell} \quad (j \neq \ell,\; i = 1, 2, \ldots, a)
\]

Interaction contrasts:

\[
(\mu_{ij} - \mu_{kj}) - (\mu_{i\ell} - \mu_{k\ell}) = (\mu_{ij} - \mu_{i\ell}) - (\mu_{kj} - \mu_{k\ell}) = \mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell}
\]

All of these contrasts are estimable when $n_{ij} > 0$ for all $(i, j)$ because $E(\bar{Y}_{ij\cdot}) = \mu_{ij}$ and any linear function of estimable functions is estimable.

An "effects" model:

\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim NID(0, \sigma^2), \qquad
\begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, n_{ij} > 0 \end{cases}
\]

For the carrot seed data this is $Y = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ with

\[
\begin{bmatrix}
Y_{111}\\ Y_{112}\\ Y_{113}\\ Y_{121}\\ Y_{122}\\ Y_{131}\\ Y_{132}\\ Y_{211}\\ Y_{212}\\ Y_{213}\\ Y_{214}\\ Y_{221}\\ Y_{231}\\ Y_{232}\\ Y_{233}
\end{bmatrix}
=
\left[\begin{array}{cccccccccccc}
1&1&0&1&0&0&1&0&0&0&0&0\\
1&1&0&1&0&0&1&0&0&0&0&0\\
1&1&0&1&0&0&1&0&0&0&0&0\\
1&1&0&0&1&0&0&1&0&0&0&0\\
1&1&0&0&1&0&0&1&0&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&0&1&0&0&0&0&0&1&0\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1
\end{array}\right]
\begin{bmatrix}
\mu\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2\\ \beta_3\\ \gamma_{11}\\ \gamma_{12}\\ \gamma_{13}\\ \gamma_{21}\\ \gamma_{22}\\ \gamma_{23}
\end{bmatrix}
+ \boldsymbol{\epsilon}
\]

This model matrix has 12 columns but rank $ab = 6$, so restrictions on the parameters are needed to obtain a unique solution to the normal equations.

The resulting restricted model is

\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim NID(0, \sigma^2), \qquad
\begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, n_{ij} \end{cases}
\]

and

\[
\alpha_a = 0, \qquad \beta_b = 0, \qquad \gamma_{ib} = 0 \text{ for all } i = 1, \ldots, a, \qquad \gamma_{aj} = 0 \text{ for all } j = 1, \ldots, b
\]

We will call these the "baseline" restrictions.

Under the baseline restrictions the cell means for the carrot study ($a = 2$, $b = 3$) are:

\[
\begin{array}{l|ccc|c}
 & \text{Variety 1} & \text{Variety 2} & \text{Variety 3} & \text{Soil type means}\\ \hline
\text{Soil type 1} & \mu_{11} = \mu + \alpha_1 + \beta_1 + \gamma_{11} & \mu_{12} = \mu + \alpha_1 + \beta_2 + \gamma_{12} & \mu_{13} = \mu + \alpha_1 & \mu + \alpha_1 + \frac{\beta_1 + \beta_2}{3} + \frac{\gamma_{11} + \gamma_{12}}{3}\\
\text{Soil type 2} & \mu_{21} = \mu + \beta_1 & \mu_{22} = \mu + \beta_2 & \mu_{23} = \mu & \mu + \frac{\beta_1 + \beta_2}{3}\\ \hline
\text{Variety means} & \mu + \frac{\alpha_1}{2} + \beta_1 + \frac{\gamma_{11}}{2} & \mu + \frac{\alpha_1}{2} + \beta_2 + \frac{\gamma_{12}}{2} & \mu + \frac{\alpha_1}{2} &
\end{array}
\]

Interpretation:

$\mu = \mu_{ab} = E(Y_{abk})$ is the mean response when factor 1 is at level $a$ and factor 2 is at level $b$.

$\alpha_i = \mu_{ib} - \mu_{ab} = E(Y_{ibk}) - E(Y_{abk})$ is a difference in mean responses between levels $i$ and $a$ of factor 1 when factor 2 is at its highest level.

$\beta_j = \mu_{aj} - \mu_{ab} = E(Y_{ajk}) - E(Y_{abk})$, for $j = 1, 2, \ldots, b$, is the difference in the mean responses for levels $j$ and $b$ of factor 2 when factor 1 is at its highest level.

Interaction:

\[
\gamma_{ij} = (\mu_{ij} - \mu_{ib}) - (\mu_{aj} - \mu_{ab}) = (\mu_{ij} - \mu_{aj}) - (\mu_{ib} - \mu_{ab})
\]

Note that

\[
\gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell} = \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} \quad \text{for any } (i, j) \text{ and } (k, \ell)
\]

Matrix formulation:

\[
\begin{bmatrix}
Y_{111}\\ Y_{112}\\ Y_{113}\\ Y_{121}\\ Y_{122}\\ Y_{131}\\ Y_{132}\\ Y_{211}\\ Y_{212}\\ Y_{213}\\ Y_{214}\\ Y_{221}\\ Y_{231}\\ Y_{232}\\ Y_{233}
\end{bmatrix}
=
\left[\begin{array}{cccccc}
1&1&1&0&1&0\\
1&1&1&0&1&0\\
1&1&1&0&1&0\\
1&1&0&1&0&1\\
1&1&0&1&0&1\\
1&1&0&0&0&0\\
1&1&0&0&0&0\\
1&0&1&0&0&0\\
1&0&1&0&0&0\\
1&0&1&0&0&0\\
1&0&1&0&0&0\\
1&0&0&1&0&0\\
1&0&0&0&0&0\\
1&0&0&0&0&0\\
1&0&0&0&0&0
\end{array}\right]
\begin{bmatrix}
\mu\\ \alpha_1\\ \beta_1\\ \beta_2\\ \gamma_{11}\\ \gamma_{12}
\end{bmatrix}
+ \boldsymbol{\epsilon}
\]

that is, $Y = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ where $\boldsymbol{\epsilon} \sim N(0, \sigma^2 I)$, so that $Y \sim N(X\boldsymbol{\beta}, \sigma^2 I)$.

Least squares estimation:

\[
b = \begin{bmatrix} \hat\mu\\ \hat\alpha_1\\ \hat\beta_1\\ \hat\beta_2\\ \hat\gamma_{11}\\ \hat\gamma_{12} \end{bmatrix}
= (X^T X)^{-1} X^T Y
= \begin{bmatrix}
n_{\cdot\cdot} & n_{1\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{11} & n_{12}\\
n_{1\cdot} & n_{1\cdot} & n_{11} & n_{12} & n_{11} & n_{12}\\
n_{\cdot 1} & n_{11} & n_{\cdot 1} & 0 & n_{11} & 0\\
n_{\cdot 2} & n_{12} & 0 & n_{\cdot 2} & 0 & n_{12}\\
n_{11} & n_{11} & n_{11} & 0 & n_{11} & 0\\
n_{12} & n_{12} & 0 & n_{12} & 0 & n_{12}
\end{bmatrix}^{-1}
\begin{bmatrix} Y_{\cdot\cdot\cdot}\\ Y_{1\cdot\cdot}\\ Y_{\cdot 1\cdot}\\ Y_{\cdot 2\cdot}\\ Y_{11\cdot}\\ Y_{12\cdot} \end{bmatrix}
= \begin{bmatrix}
\bar{Y}_{23\cdot}\\
\bar{Y}_{13\cdot} - \bar{Y}_{23\cdot}\\
\bar{Y}_{21\cdot} - \bar{Y}_{23\cdot}\\
\bar{Y}_{22\cdot} - \bar{Y}_{23\cdot}\\
\bar{Y}_{11\cdot} - \bar{Y}_{13\cdot} - \bar{Y}_{21\cdot} + \bar{Y}_{23\cdot}\\
\bar{Y}_{12\cdot} - \bar{Y}_{13\cdot} - \bar{Y}_{22\cdot} + \bar{Y}_{23\cdot}
\end{bmatrix}
\]

Comments: Imposing a set of restrictions on the parameters in the "effects" model $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$ to obtain a model matrix with full column rank:
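(i) Avoids the use of a generalized inverse in least squares estimation.

(ii) Is equivalent to choosing a generalized inverse for $b = (X^T X)^{-} X^T Y$ in the unrestricted "effects" model.

(iii) Restrictions must involve "non-estimable" quantities for the unrestricted "effects" model.

Baseline restrictions (SAS):

\[
\alpha_a = 0, \qquad \beta_b = 0, \qquad \gamma_{ib} = 0 \text{ for all } i = 1, \ldots, a, \qquad \gamma_{aj} = 0 \text{ for all } j = 1, \ldots, b
\]

Baseline restrictions (S-PLUS):

\[
\alpha_1 = 0, \qquad \beta_1 = 0, \qquad \gamma_{i1} = 0 \text{ for all } i = 1, \ldots, a, \qquad \gamma_{1j} = 0 \text{ for all } j = 1, \ldots, b
\]

$\Sigma$-restrictions:

\[
Y_{ijk} = \omega + \alpha_i + \delta_j + \gamma_{ij} + \epsilon_{ijk}, \qquad \mu_{ij} = E(Y_{ijk}), \qquad \epsilon_{ijk} \sim NID(0, \sigma^2)
\]

and

\[
\sum_{i=1}^{a} \alpha_i = 0, \qquad \sum_{j=1}^{b} \delta_j = 0, \qquad \sum_{i=1}^{a} \gamma_{ij} = 0 \text{ for each } j = 1, \ldots, b, \qquad \sum_{j=1}^{b} \gamma_{ij} = 0 \text{ for each } i = 1, \ldots, a
\]

\[
\begin{array}{l|ccc|c}
 & \text{Variety 1} & \text{Variety 2} & \text{Variety 3} & \text{Means}\\ \hline
\text{Soil type 1} & \mu_{11} = \omega + \alpha_1 + \delta_1 + \gamma_{11} & \mu_{12} = \omega + \alpha_1 + \delta_2 + \gamma_{12} & \mu_{13} = \omega + \alpha_1 + \delta_3 + \gamma_{13} & \mu_{1\cdot} = \omega + \alpha_1\\
\text{Soil type 2} & \mu_{21} = \omega + \alpha_2 + \delta_1 + \gamma_{21} & \mu_{22} = \omega + \alpha_2 + \delta_2 + \gamma_{22} & \mu_{23} = \omega + \alpha_2 + \delta_3 + \gamma_{23} & \mu_{2\cdot} = \omega + \alpha_2\\ \hline
\text{Means} & \mu_{\cdot 1} = \omega + \delta_1 & \mu_{\cdot 2} = \omega + \delta_2 & \mu_{\cdot 3} = \omega + \delta_3 &
\end{array}
\]

Interpretation:

\[
\omega = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij} = \mu_{\cdot\cdot}
\]

is the overall mean germination time, averaging across all soil types and all varieties used in this study.

Since $\omega + \delta_j = \mu_{\cdot j}$, we have $\delta_j = \mu_{\cdot j} - \mu_{\cdot\cdot}$ and

\[
\delta_j - \delta_k = (\mu_{\cdot j} - \mu_{\cdot\cdot}) - (\mu_{\cdot k} - \mu_{\cdot\cdot}) = \mu_{\cdot j} - \mu_{\cdot k}
\]

is the difference between mean germination times for varieties $j$ and $k$, averaging across soil types.

Similarly, $\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot}$ and

\[
\alpha_1 - \alpha_2 = \mu_{1\cdot} - \mu_{2\cdot}
\]

is the difference in the mean germination times for the two soil types, averaging across varieties.

Interaction: $\gamma_{ij} = \mu_{ij} - (\omega + \alpha_i + \delta_j)$ is a deviation from an additive model. Then,

\[
\mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell} = \gamma_{ij} - \gamma_{kj} - \gamma_{i\ell} + \gamma_{k\ell}
\]

Matrix formulation for a model that includes the $\Sigma$-restrictions:

\[
\begin{bmatrix}
Y_{111}\\ Y_{112}\\ Y_{113}\\ Y_{121}\\ Y_{122}\\ Y_{131}\\ Y_{132}\\ Y_{211}\\ Y_{212}\\ Y_{213}\\ Y_{214}\\ Y_{221}\\ Y_{231}\\ Y_{232}\\ Y_{233}
\end{bmatrix}
=
\left[\begin{array}{rrrrrr}
1&1&1&0&1&0\\
1&1&1&0&1&0\\
1&1&1&0&1&0\\
1&1&0&1&0&1\\
1&1&0&1&0&1\\
1&1&-1&-1&-1&-1\\
1&1&-1&-1&-1&-1\\
1&-1&1&0&-1&0\\
1&-1&1&0&-1&0\\
1&-1&1&0&-1&0\\
1&-1&1&0&-1&0\\
1&-1&0&1&0&-1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1
\end{array}\right]
\begin{bmatrix}
\omega\\ \alpha_1\\ \delta_1\\ \delta_2\\ \gamma_{11}\\ \gamma_{12}
\end{bmatrix}
+ \boldsymbol{\epsilon}
\]

This uses the $\Sigma$-restrictions to obtain

\[
\alpha_2 = -\alpha_1, \quad \delta_3 = -\delta_1 - \delta_2, \quad \gamma_{21} = -\gamma_{11}, \quad \gamma_{13} = -\gamma_{11} - \gamma_{12}, \quad \gamma_{22} = -\gamma_{12}, \quad \gamma_{23} = -\gamma_{13} = \gamma_{11} + \gamma_{12}
\]

Least squares estimation: writing $X = [X_\omega \,|\, X_\alpha \,|\, X_\delta \,|\, X_\gamma]$,

\[
b = (X^T X)^{-1} X^T Y =
\begin{bmatrix}
X_\omega^T X_\omega & X_\omega^T X_\alpha & X_\omega^T X_\delta & X_\omega^T X_\gamma\\
X_\alpha^T X_\omega & X_\alpha^T X_\alpha & X_\alpha^T X_\delta & X_\alpha^T X_\gamma\\
X_\delta^T X_\omega & X_\delta^T X_\alpha & X_\delta^T X_\delta & X_\delta^T X_\gamma\\
X_\gamma^T X_\omega & X_\gamma^T X_\alpha & X_\gamma^T X_\delta & X_\gamma^T X_\gamma
\end{bmatrix}^{-1}
\begin{bmatrix} X_\omega^T Y\\ X_\alpha^T Y\\ X_\delta^T Y\\ X_\gamma^T Y \end{bmatrix}
\]

\[
= \left[\begin{array}{rrrrrr}
15 & -1 & 2 & -2 & 0 & 2\\
-1 & 15 & 0 & 2 & 2 & -2\\
2 & 0 & 12 & 5 & -2 & -1\\
-2 & 2 & 5 & 8 & -1 & 0\\
0 & 2 & -2 & -1 & 12 & 5\\
2 & -2 & -1 & 0 & 5 & 8
\end{array}\right]^{-1}
\begin{bmatrix}
Y_{\cdot\cdot\cdot}\\
Y_{1\cdot\cdot} - Y_{2\cdot\cdot}\\
Y_{\cdot 1\cdot} - Y_{\cdot 3\cdot}\\
Y_{\cdot 2\cdot} - Y_{\cdot 3\cdot}\\
Y_{11\cdot} - Y_{13\cdot} - Y_{21\cdot} + Y_{23\cdot}\\
Y_{12\cdot} - Y_{13\cdot} - Y_{22\cdot} + Y_{23\cdot}
\end{bmatrix}
\]

\[
= \begin{bmatrix} \hat\omega\\ \hat\alpha_1\\ \hat\delta_1\\ \hat\delta_2\\ \hat\gamma_{11}\\ \hat\gamma_{12} \end{bmatrix}
= \begin{bmatrix}
\frac{1}{ab} \sum_i \sum_j \bar{Y}_{ij\cdot}\\
\frac{1}{b} \sum_j \bar{Y}_{1j\cdot} - \hat\omega\\
\frac{1}{a} \sum_i \bar{Y}_{i1\cdot} - \hat\omega\\
\frac{1}{a} \sum_i \bar{Y}_{i2\cdot} - \hat\omega\\
\bar{Y}_{11\cdot} - \hat\omega - \hat\alpha_1 - \hat\delta_1\\
\bar{Y}_{12\cdot} - \hat\omega - \hat\alpha_1 - \hat\delta_2
\end{bmatrix}
= \left[\begin{array}{r} 16.83\\ -3.17\\ -4.33\\ 5.67\\ -0.33\\ -5.33 \end{array}\right]
\]

If restrictions are placed on "non-estimable" functions of parameters in the unrestricted "effects" model, then the resulting models are reparameterizations of each other. The quantities

\[
\hat{Y} = P_X Y, \qquad e = Y - \hat{Y} = (I - P_X) Y, \qquad SSE = e^T e = Y^T (I - P_X) Y,
\]
\[
\hat{Y}^T \hat{Y} = Y^T P_X Y, \qquad SS_{model} = Y^T (P_X - P_1) Y
\]

are the same for any set of restrictions.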
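The invariance claim can be checked numerically. The following is a minimal sketch, assuming R rather than S-PLUS and entering the Example 8.1 data by hand. It fits the same model under three sets of restrictions through the `contrasts` argument of `lm()`; `contr.SAS` is R's name for the "last level is the baseline" coding, and the helper `fit_with` is an illustrative name introduced here.

```r
# Example 8.1 data, entered by hand
carrot <- data.frame(
  Soil    = factor(rep(c(1, 2), times = c(7, 8))),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)

# Fit Days ~ Soil * Variety with the same contrast coding for both factors
fit_with <- function(contr) {
  lm(Days ~ Soil * Variety, data = carrot,
     contrasts = list(Soil = contr, Variety = contr))
}

fit_sas <- fit_with("contr.SAS")        # baseline = last level
fit_trt <- fit_with("contr.treatment")  # baseline = first level
fit_sum <- fit_with("contr.sum")        # sum-to-zero restrictions

# The coefficient vectors differ from one coding to the next ...
print(round(coef(fit_sas), 2))
print(round(coef(fit_trt), 2))
print(round(coef(fit_sum), 2))

# ... but the fitted values and SSE do not
print(all.equal(fitted(fit_sas), fitted(fit_sum)))
print(sapply(list(fit_sas, fit_trt, fit_sum), deviance))
```

The `contr.sum` coefficients should agree with the estimates displayed above, and the `contr.SAS` intercept is the sample mean of the cell with soil type 2 and variety 3.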
The solution to the normal equations

\[
b = (X^T X)^{-1} X^T Y
\]

and the interpretations of the corresponding parameters will not be the same for all such sets of restrictions.

If you were to place restrictions on estimable functions of parameters in

\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}
\]

then you would change rank($X$), the space spanned by the columns of $X$, $\hat{Y} = X (X^T X)^{-} X^T Y$, and the OLS estimators of other estimable quantities.

Normal Theory Gauss-Markov Model

\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}
\]

Analysis of variance:

\[
\begin{aligned}
Y^T Y &= Y^T P_\mu Y + Y^T (P_{\mu,\alpha} - P_\mu) Y + Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y + Y^T (P_X - P_{\mu,\alpha,\beta}) Y + Y^T (I - P_X) Y\\
&= R(\mu) + R(\alpha | \mu) + R(\beta | \mu, \alpha) + R(\gamma | \mu, \alpha, \beta) + SSE
\end{aligned}
\]

By Cochran's Theorem, these quadratic forms (sums of squares), divided by $\sigma^2$, have independent chi-square distributions (noncentral in general) with $1$, $a - 1$, $b - 1$, $(a - 1)(b - 1)$, and $n_{\cdot\cdot} - ab$ degrees of freedom, respectively, if $n_{ij} > 0$ for all $(i, j)$.

Partition the model matrix of the unrestricted effects model by columns as $X = [X_\mu \,|\, X_\alpha \,|\, X_\beta \,|\, X_\gamma]$, where $X_\mu$ is the column of ones, $X_\alpha$ contains the two columns for $\alpha_1, \alpha_2$, $X_\beta$ the three columns for $\beta_1, \beta_2, \beta_3$, and $X_\gamma$ the six columns for $\gamma_{11}, \ldots, \gamma_{23}$. Define:

\[
\begin{aligned}
X_\mu &= \mathbf{1}, & P_\mu &= X_\mu (X_\mu^T X_\mu)^{-1} X_\mu^T\\
X_{\mu,\alpha} &= [X_\mu | X_\alpha], & P_{\mu,\alpha} &= X_{\mu,\alpha} (X_{\mu,\alpha}^T X_{\mu,\alpha})^{-} X_{\mu,\alpha}^T\\
X_{\mu,\alpha,\beta} &= [X_\mu | X_\alpha | X_\beta], & P_{\mu,\alpha,\beta} &= X_{\mu,\alpha,\beta} (X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta})^{-} X_{\mu,\alpha,\beta}^T\\
X &= [X_\mu | X_\alpha | X_\beta | X_\gamma], & P_X &= X (X^T X)^{-} X^T
\end{aligned}
\]

The following three model matrices correspond to reparameterizations of the same model. Each displayed row is the row for one cell $(i, j)$ and is repeated $n_{ij}$ times (3, 2, 2, 4, 1, 3 times, in the order shown), giving 15 rows.

Unrestricted effects model, columns $(\mu, \alpha_1, \alpha_2, \beta_1, \beta_2, \beta_3, \gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{21}, \gamma_{22}, \gamma_{23})$:

\[
\left[\begin{array}{cccccccccccc}
1&1&0&1&0&0&1&0&0&0&0&0\\
1&1&0&0&1&0&0&1&0&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&0&1&0&0&0&0&0&1&0\\
1&0&1&0&0&1&0&0&0&0&0&1
\end{array}\right]
\]

Baseline restrictions, columns $(\mu, \alpha_1, \beta_1, \beta_2, \gamma_{11}, \gamma_{12})$:

\[
\left[\begin{array}{cccccc}
1&1&1&0&1&0\\
1&1&0&1&0&1\\
1&1&0&0&0&0\\
1&0&1&0&0&0\\
1&0&0&1&0&0\\
1&0&0&0&0&0
\end{array}\right]
\]

$\Sigma$-restrictions, columns $(\omega, \alpha_1, \delta_1, \delta_2, \gamma_{11}, \gamma_{12})$:

\[
\left[\begin{array}{rrrrrr}
1&1&1&0&1&0\\
1&1&0&1&0&1\\
1&1&-1&-1&-1&-1\\
1&-1&1&0&-1&0\\
1&-1&0&1&0&-1\\
1&-1&-1&-1&1&1
\end{array}\right]
\]

In each case the first column spans the same space, the first two blocks of columns (intercept and soil) span the same space, and so on, so that:

$R(\mu) = Y^T P_\mu Y$ is the same for all three models.

$R(\mu, \alpha) = Y^T P_{\mu,\alpha} Y$ is the same for all three models, and so is $R(\alpha | \mu) = R(\mu, \alpha) - R(\mu)$.

$R(\mu, \alpha, \beta) = Y^T P_{\mu,\alpha,\beta} Y$ is the same for all three models, and so is $R(\beta | \mu, \alpha) = R(\mu, \alpha, \beta) - R(\mu, \alpha)$.

$R(\mu, \alpha, \beta, \gamma) = Y^T P_X Y$ is the same for all three models, and so is $R(\gamma | \mu, \alpha, \beta) = R(\mu, \alpha, \beta, \gamma) - R(\mu, \alpha, \beta)$.

Consequently, the partition

\[
\begin{aligned}
Y^T Y &= Y^T P_\mu Y + Y^T (P_{\mu,\alpha} - P_\mu) Y + Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y + Y^T (P_X - P_{\mu,\alpha,\beta}) Y + Y^T (I - P_X) Y\\
&= R(\mu) + R(\alpha | \mu) + R(\beta | \mu, \alpha) + R(\gamma | \mu, \alpha, \beta) + SSE
\end{aligned}
\]

is the same for all three models. By Cochran's Theorem, these quadratic forms (sums of squares), divided by $\sigma^2$, have independent chi-square distributions with $1$, $a - 1$, $b - 1$, $(a - 1)(b - 1)$, and $n_{\cdot\cdot} - ab$ degrees of freedom, respectively, when $n_{ij} > 0$ for all $(i, j)$.

The lm( ) function in S-PLUS: To allow the lm( ) function to create a model matrix involving classification variables, create factors.
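Before turning to the lm( ) code, the partition of $Y^T Y$ described above can be reproduced directly from projections. The sketch below is written for R under the assumption that the Example 8.1 data are entered by hand. It builds the blocks $X_\mu$, $X_\alpha$, $X_\beta$, $X_\gamma$ of the unrestricted model matrix and computes each $Y^T P Y$ with `qr.fitted()`, which returns $P_X Y$ using the numerical rank of $X$. The helper name `reduction` is an illustrative choice.

```r
# Example 8.1 data, entered by hand
carrot <- data.frame(
  Soil    = factor(rep(c(1, 2), times = c(7, 8))),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)
y <- carrot$Days

# Column blocks of the unrestricted (rank-deficient) model matrix
X_mu    <- matrix(1, nrow = length(y), ncol = 1)
X_alpha <- model.matrix(~ Soil - 1, data = carrot)
X_beta  <- model.matrix(~ Variety - 1, data = carrot)
X_gamma <- model.matrix(~ interaction(Soil, Variety) - 1, data = carrot)

# y' P_X y, where P_X y is obtained from a pivoted QR decomposition of X
reduction <- function(X) sum(qr.fitted(qr(X), y)^2)

R_mu   <- reduction(X_mu)
R_ma   <- reduction(cbind(X_mu, X_alpha))
R_mab  <- reduction(cbind(X_mu, X_alpha, X_beta))
R_full <- reduction(cbind(X_mu, X_alpha, X_beta, X_gamma))

ss <- c("R(mu)"                  = R_mu,
        "R(alpha|mu)"            = R_ma - R_mu,
        "R(beta|mu,alpha)"       = R_mab - R_ma,
        "R(gamma|mu,alpha,beta)" = R_full - R_mab,
        "SSE"                    = sum(y^2) - R_full)
print(round(ss, 3))
```

The last four entries should match the sequential sums of squares that `anova()` reports for `lm(Days ~ Soil * Variety)`, whichever contrast coding is in effect.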
    > carrot <- read.table("carrots.dat",
    +     col.names=c("Soil","Variety","Days"))
    > carrot$Soil <- as.factor(carrot$Soil)
    > carrot$Variety <- as.factor(carrot$Variety)

Produce ANOVA tables:

    > lm.out1 <- lm(Days ~ Soil*Variety, data=carrot)
    > anova(lm.out1)

This table reports $R(\alpha | \mu)$, $R(\beta | \mu, \alpha)$, $R(\gamma | \mu, \alpha, \beta)$, and SSE.

    > lm.out2 <- lm(Days ~ Variety*Soil, data=carrot)
    > anova(lm.out2)

This table reports $R(\beta | \mu)$, $R(\alpha | \mu, \beta)$, $R(\gamma | \mu, \alpha, \beta)$, and SSE.

Using Result 4.7, we have also shown earlier that

\[
SSE = Y^T (I - P_X) Y = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (Y_{ijk} - \bar{Y}_{ij\cdot})^2 \sim \sigma^2 \chi^2_{n_{\cdot\cdot} - ab}
\]

There are four options for creating columns in the model matrix for classification variables:

    contr.helmert     Helmert contrasts
    contr.treatment   sets alpha_1 = 0, beta_1 = 0,
                      gamma_1j = 0 for all j, gamma_i1 = 0 for all i
    contr.sum         Sigma-restrictions (sum-to-zero constraints)
    contr.poly        orthogonal polynomial contrasts
                      (equal spacing, equal sample sizes)

    # This file is posted as carrots.ssc

    carrot <- read.table(
        "c:/courses/st511/snew/carrots.dat",
        col.names=c("Soil","Variety","Days"))
    carrot$Soil <- as.factor(carrot$Soil)
    carrot$Variety <- as.factor(carrot$Variety)

    # Compute sample means of germination times
    # for all combinations of soil type and
    # varieties of carrot seeds and make a
    # profile plot. At this point UNIX users
    # should open a graphics window with
    # the motif( ) function

    means <- tapply(carrot$Days,
        list(carrot$Variety,carrot$Soil),mean)
    means

    # Set up the axes and title of the
    # profile plot.
    par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5)
    x.axis <- unique(carrot$Variety)
    matplot(c(1,3,1), c(0,40,10), type="n",
        xlab="Variety", ylab="Mean Time",
        main= "Average Time to Carrot Seed Germination")

    # Add a profile for each soil type
    matlines(x.axis,means,type='l',
        lty=c(1,3),lwd=3)

    # Plot points at the sample means
    matpoints(x.axis,means, pch=c(1,16))

    # Add a legend to the plot
    legend(2.2,38.6,
        legend=c('Soil Type 1','Soil Type 2'),
        lty=c(1,3),bty='n')

    # Fit a model with main effects and interaction
    # Compute both sets of Type I sums of squares
    lm.out1 <- lm(Days~Soil*Variety,data=carrot)
    anova(lm.out1)
    lm.out2 <- lm(Days~Variety*Soil,data=carrot)
    anova(lm.out2)

    # Create a data frame containing the original
    # data and the residuals and estimated means
    data.frame(carrot$Soil,carrot$Variety,carrot$Days,
        Pred=lm.out1$fitted,
        Resid=round(lm.out1$resid,3))

    # Create residual plots
    frame( )
    par(cex=1.0,mex=1.0,lwd=3,pch=2,
        mkh=0.1,fig=c(0,1,.51,1), pty='m')
    plot(lm.out1$fitted, lm.out1$resid,
        xlab="Estimated Means",
        ylab="Residuals")
    abline(h=0, lty=2, lwd=3)
    par(fig=c(0, 1, 0, 0.49), pty='s')
    qqnorm(lm.out1$resid)
    qqline(lm.out1$resid)

    # Create plots for studentized residuals
    # You must attach the MASS library to
    # have access to the studres( ) function
    # that computes studentized residuals
    # in the following code
    library(MASS)
    frame( )
    par(cex=1.0,mex=1.0,lwd=3,pch=2,
        mkh=0.1,fin=c(6.5,6.5))
    plot(lm.out1$fitted, studres(lm.out1),
        xlab="Estimated Means",
        ylab="Studentized Residuals",
        main="Studentized Residual Plot")
    abline(h=0, lty=2, lwd=3)
    qqnorm(studres(lm.out1),
        main="Studentized Residuals")
    qqline(studres(lm.out1))

    # By default S uses so-called Helmert contrast
    # matrices for unordered factors and orthogonal
    # polynomial contrast matrices for ordered
    # factors.
    lm.out <- lm(Days~Soil*Variety,data=carrot)
    model.matrix(lm.out)
    summary(lm.out)
    anova(lm.out)

    # The default contrast matrices can be changed
    # by resetting the contrasts options.
    # The contr.sum option restricts parameters to
    # sum to zero across the levels of any single
    # factor.
    options(contrasts=c('contr.sum','contr.poly'))
    unlist(options())

    # Now, ``contr.sum'' will be used for unordered
    # factors and orthogonal polynomial contrast
    # matrices will be used for ordered factors.
    lm.out <- lm(Days~Soil*Variety,data=carrot)
    model.matrix(lm.out)
    summary(lm.out)
    anova(lm.out)

    # Compute Type III sums of squares and F-tests.
    # First create the model matrix for
    # the cell means model.
    cb <- as.factor(10*as.numeric(carrot$Soil) +
        as.numeric(carrot$Variety))
    lm.out <- lm(carrot$Days ~ cb - 1)
    D <- model.matrix(lm.out)
    D

    # Compute the sample means
    y <- matrix(carrot$Days,ncol=1)
    b <- solve(crossprod(D)) %*% crossprod(D,y)
    b

    # Compute Type III sums of squares and
    # related F-tests
    s <- length(unique(carrot$Soil))
    t <- length(unique(carrot$Variety))
    yhat <- D %*% b
    sse <- crossprod(y-yhat)
    df2 <- nrow(y) - s*t

    # Generate an identity matrix and a vector
    # of ones
    Iden <- function(n) diag(rep(1,n))
    one <- function(n) matrix(rep(1,n),ncol=1)

    c1 <- kronecker( cbind(Iden(s-1), -one(s-1)),
        t(one(t)) )
    q1 <- t(b) %*% t(c1)%*%
        solve( c1 %*% solve(crossprod(D)) %*% t(c1))%*%
        c1 %*% b
    df1<- s-1
    f <- (q1/df1)/(sse/df2)
    p <- 1-pf(f,df1,df2)
    c1
    data.frame(SS=q1,df=df1,F.stat=f,p.value=p)

    c2 <- kronecker( t(one(s)),
        cbind(Iden(t-1),-one(t-1)) )
    q2 <- t(b) %*% t(c2)%*%
        solve( c2 %*% solve(crossprod(D)) %*% t(c2))%*%
        c2 %*% b
    df1<- t-1
    f <- (q2/df1)/(sse/df2)
    p <- 1-pf(f,df1,df2)
    c2
    data.frame(SS=q2,df=df1,F.stat=f,p.value=p)

    c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
        cbind(Iden(t-1),-one(t-1)) )
    q3 <- t(b) %*% t(c3)%*%
        solve( c3 %*% solve(crossprod(D)) %*% t(c3))%*%
        c3 %*% b
    df1<- (s-1)*(t-1)
    f <- (q3/df1)/(sse/df2)
    p <- 1-pf(f,df1,df2)
    c3
    data.frame(SS=q3,df=df1,F.stat=f,p.value=p)

    > carrot <- read.table("c:/carrots.dat",
    +     col.names=c("Soil","Variety","Days"))
    > carrot$Soil <- as.factor(carrot$Soil)
    > carrot$Variety <- as.factor(carrot$Variety)
    > carrot
       Soil Variety Days
     1    1       1    6
     2    1       1   10
     3    1       1   11
     4    1       2   13
     5    1       2   15
     6    1       3   14
     7    1       3   22
     8    2       1   12
     9    2       1   15
    10    2       1   19
    11    2       1   18
    12    2       2   31
    13    2       3   18
    14    2       3    9
    15    2       3   12

    # Compute sample means of germination times
    # for all combinations of soil type and
    # varieties of carrot seeds and make a
    # profile plot. At this point UNIX users
    # should open a graphics window with
    # the motif( ) function

    > means <- tapply(carrot$Days,
    +     list(carrot$Variety,carrot$Soil),mean)
    > means
       1  2
    1  9 16
    2 14 31
    3 18 13

    # Set up the axes and title of the
    # profile plot.
    > par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5)
    > x.axis <- unique(carrot$Variety)
    > matplot(c(1,3,1), c(0,40,10), type="n",
    +     xlab="Variety", ylab="Mean Time",
    +     main= "Average Time to Carrot Seed Germination")

    # Add a profile for each soil type
    > matlines(x.axis,means,type='l',
    +     lty=c(1,3),lwd=3)

    # Plot points at the sample means
    > matpoints(x.axis,means, pch=c(1,16))

    # Add a legend to the plot
    > legend(2.,38.6,
    +     legend=c('Soil Type 1','Soil Type 2'),
    +     lty=c(1,3),bty='n')

[Figure: profile plot "Average Time to Carrot Seed Germination"; x-axis: Variety; y-axis: Mean Time; one profile each for Soil Type 1 and Soil Type 2]

    # Fit a model with main effects and interaction
    # Compute both sets of Type I sums of squares
    > lm.out1 <- lm(Days~Soil*Variety,data=carrot)
    > anova(lm.out1)
    Analysis of Variance Table

    Response: Days

    Terms added sequentially (first to last)
                 Df Sum of Sq  Mean Sq  F Value  Pr(F)
            Soil  1    52.500  52.5000 3.937500 0.0785
         Variety  2   124.734  62.3670 4.677527 0.0405
    Soil:Variety  2   222.766 111.3830 8.353723 0.0089
       Residuals  9   120.000  13.3333

    > lm.out2 <- lm(Days~Variety*Soil,data=carrot)
    > anova(lm.out2)
    Analysis of Variance Table

    Response: Days

    Terms added sequentially (first to last)
                 Df Sum of Sq  Mean Sq  F Value  Pr(F)
         Variety  2   93.3333  46.6667 3.500000 0.0751
            Soil  1   83.9007  83.9007 6.292553 0.0334
    Variety:Soil  2  222.7660 111.3830 8.353723 0.0089
       Residuals  9  120.0000  13.3333

    # Create a data frame containing the original
    # data and the residuals and estimated means
    > data.frame(carrot$Soil,carrot$Variety,carrot$Days,
    +     Pred=lm.out1$fitted,
    +     Resid=round(lm.out1$resid,3))
       X1 X2 X3 Pred Resid
     1  1  1  6    9    -3
     2  1  1 10    9     1
     3  1  1 11    9     2
     4  1  2 13   14    -1
     5  1  2 15   14     1
     6  1  3 14   18    -4
     7  1  3 22   18     4
     8  2  1 12   16    -4
     9  2  1 15   16    -1
    10  2  1 19   16     3
    11  2  1 18   16     2
    12  2  2 31   31     0
    13  2  3 18   13     5
    14  2  3  9   13    -4
    15  2  3 12   13    -1

    # Create residual plots
    frame( )
    par(cex=1.0,mex=1.0,lwd=3,pch=2,
        mkh=0.1,fig=c(0,1,.51,1), pty='m')
    plot(lm.out1$fitted, lm.out1$resid,
        xlab="Estimated Means",
        ylab="Residuals")
    abline(h=0, lty=2, lwd=3)
    par(fig=c(0, 1, 0, 0.49), pty='s')
    qqnorm(lm.out1$resid)
    qqline(lm.out1$resid)

[Figure: residuals versus estimated means; x-axis: Estimated Means; y-axis: Residuals]

[Figure: normal probability plot of the residuals; x-axis: Quantiles of Standard Normal; y-axis: lm.out1$resid]

    # Create plots for studentized residuals
    # You must attach the MASS library to
    # have access to the studres( ) function
    # that computes studentized residuals
    # in the following code
    library(MASS)
    frame( )
    par(cex=1.0,mex=1.0,lwd=3,pch=2,
        mkh=0.1,fin=c(6.5,6.5))
    plot(lm.out1$fitted, studres(lm.out1),
        xlab="Estimated Means",
        ylab="Studentized Residuals",
        main="Studentized Residual Plot")
    abline(h=0, lty=2, lwd=3)

[Figure: "Studentized Residual Plot"; x-axis: Estimated Means; y-axis: Studentized Residuals]

    qqnorm(studres(lm.out1),
        main="Studentized Residuals")
    qqline(studres(lm.out1))

[Figure: normal probability plot "Studentized Residuals"; x-axis: Quantiles of Standard Normal; y-axis: studres(lm.out1)]

    # By default S uses so-called Helmert contrast
    # matrices for unordered factors and orthogonal
    # polynomial contrast matrices for ordered factors.
    > lm.out <- lm(Days~Soil*Variety,data=carrot)
    > model.matrix(lm.out)
       (Int) Soil Variety1 Variety2 SoilVariety1 SoilVariety2
     1     1   -1       -1       -1            1            1
     2     1   -1       -1       -1            1            1
     3     1   -1       -1       -1            1            1
     4     1   -1        1       -1           -1            1
     5     1   -1        1       -1           -1            1
     6     1   -1        0        2            0           -2
     7     1   -1        0        2            0           -2
     8     1    1       -1       -1           -1           -1
     9     1    1       -1       -1           -1           -1
    10     1    1       -1       -1           -1           -1
    11     1    1       -1       -1           -1           -1
    12     1    1        1       -1            1           -1
    13     1    1        0        2            0            2
    14     1    1        0        2            0            2
    15     1    1        0        2            0            2

    > summary(lm.out)
    Call: lm(formula = Days ~ Soil * Variety, data = carrot)
    Residuals:
     Min 1Q     Median 3Q Max
      -4 -2 9.992e-016  2   5

    Coefficients:
                   Value Std. Error t value Pr(>|t|)
     (Intercept) 16.8333     1.0393 16.1960   0.0000
            Soil  3.1667     1.0393  3.0468   0.0139
        Variety1  5.0000     1.3176  3.7947   0.0043
        Variety2 -0.6667     0.7082 -0.9414   0.3711
    SoilVariety1  2.5000     1.3176  1.8974   0.0903
    SoilVariety2 -2.8333     0.7082 -4.0008   0.0031

    Residual standard error: 3.651 on 9 degrees of freedom
    Multiple R-Squared: 0.7692
    F-statistic: 6 on 5 and 9 degrees of freedom,
    the p-value is 0.01031

    > anova(lm.out)
    Analysis of Variance Table

    Response: Days

    Terms added sequentially (first to last)
                 Df Sum of Sq  Mean Sq  F Value  Pr(F)
            Soil  1    52.500  52.5000 3.937500 0.0785
         Variety  2   124.734  62.3670 4.677527 0.0405
    Soil:Variety  2   222.766 111.3830 8.353723 0.0089
       Residuals  9   120.000  13.3333

    # The default contrast matrices can be changed
    # by resetting the contrasts options. The
    # contr.sum option restricts parameters to
    # sum to zero across the levels of any single
    # factor.
    options(contrasts=c('contr.sum','contr.poly'))
    unlist(options())

    # Now, ``contr.sum'' will be used for unordered
    # factors and orthogonal polynomial contrast
    # matrices will be used for ordered factors.
    lm.out <- lm(Days~Soil*Variety,data=carrot)
    model.matrix(lm.out)
       (Int) Soil Variety1 Variety2 SoilVariety1 SoilVariety2
     1     1    1        1        0            1            0
     2     1    1        1        0            1            0
     3     1    1        1        0            1            0
     4     1    1        0        1            0            1
     5     1    1        0        1            0            1
     6     1    1       -1       -1           -1           -1
     7     1    1       -1       -1           -1           -1
     8     1   -1        1        0           -1            0
     9     1   -1        1        0           -1            0
    10     1   -1        1        0           -1            0
    11     1   -1        1        0           -1            0
    12     1   -1        0        1            0           -1
    13     1   -1       -1       -1            1            1
    14     1   -1       -1       -1            1            1
    15     1   -1       -1       -1            1            1

    > summary(lm.out)
    Call: lm(formula = Days ~ Soil * Variety, data = carrot)
    Residuals:
     Min 1Q     Median 3Q Max
      -4 -2 5.551e-016  2   5

    Coefficients:
                   Value Std. Error t value Pr(>|t|)
     (Intercept) 16.8333     1.0393 16.1960   0.0000
            Soil -3.1667     1.0393 -3.0468   0.0139
        Variety1 -4.3333     1.3147 -3.2961   0.0093
        Variety2  5.6667     1.6574  3.4190   0.0076
    SoilVariety1 -0.3333     1.3147 -0.2535   0.8055
    SoilVariety2 -5.3333     1.6574 -3.2179   0.0105

    Residual standard error: 3.651 on 9 degrees of freedom
    Multiple R-Squared: 0.7692
    F-statistic: 6 on 5 and 9 degrees of freedom,
    the p-value is 0.01031

    > anova(lm.out)
    Analysis of Variance Table

    Response: Days

    Terms added sequentially (first to last)
                 Df Sum of Sq  Mean Sq  F Value  Pr(F)
            Soil  1    52.500  52.5000 3.937500 0.0785
         Variety  2   124.734  62.3670 4.677527 0.0405
    Soil:Variety  2   222.766 111.3830 8.353723 0.0089
       Residuals  9   120.000  13.3333

    # Compute Type III sums of squares and F-tests.
    # First create the model matrix for
    # the cell means model.
    > cb <- as.factor(10*as.numeric(carrot$Soil) +
    +     as.numeric(carrot$Variety))
    > lm.out <- lm(carrot$Days ~ cb - 1)
    > D <- model.matrix(lm.out)
    > D
       cb11 cb12 cb13 cb21 cb22 cb23
     1    1    0    0    0    0    0
     2    1    0    0    0    0    0
     3    1    0    0    0    0    0
     4    0    1    0    0    0    0
     5    0    1    0    0    0    0
     6    0    0    1    0    0    0
     7    0    0    1    0    0    0
     8    0    0    0    1    0    0
     9    0    0    0    1    0    0
    10    0    0    0    1    0    0
    11    0    0    0    1    0    0
    12    0    0    0    0    1    0
    13    0    0    0    0    0    1
    14    0    0    0    0    0    1
    15    0    0    0    0    0    1

    # Compute the sample means
    > y <- matrix(carrot$Days,ncol=1)
    > b <- solve(crossprod(D)) %*% crossprod(D,y)
    > b
         [,1]
    cb11    9
    cb12   14
    cb13   18
    cb21   16
    cb22   31
    cb23   13

    # Compute Type III sums of squares and
    # related F-tests
    > s <- length(unique(carrot$Soil))
    > t <- length(unique(carrot$Variety))
    > yhat <- D %*% b
    > sse <- crossprod(y-yhat)
    > df2 <- nrow(y) - s*t

    # Generate an identity matrix and
    # a vector of ones
    Iden <- function(n) diag(rep(1,n))
    one <- function(n) matrix(rep(1,n),ncol=1)

    > c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
    +     t(one(t)) )
    > q1 <- t(b) %*% t(c1)%*%
    +     solve( c1 %*% solve(crossprod(D)) %*% t(c1))%*%
    +     c1 %*% b
    > df1<- s-1
    > f <- (q1/df1)/(sse/df2)
    > p <- 1-pf(f,df1,df2)
    > c1
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    1    1   -1   -1   -1
    > data.frame(SS=q1,df=df1,F.stat=f,p.value=p)
            SS df   F.stat    p.value
    1 123.7714  1 9.282857 0.01386499

    > c2 <- kronecker( t(one(s)),
    +     cbind(Iden(t-1), -one(t-1)) )
    > q2 <- t(b) %*% t(c2)%*%
    +     solve( c2 %*% solve(crossprod(D)) %*% t(c2))%*%
    +     c2 %*% b
    > df1<- t-1
    > f <- (q2/df1)/(sse/df2)
    > p <- 1-pf(f,df1,df2)
    > c2
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    0   -1    1    0   -1
    [2,]    0    1   -1    0    1   -1
    > data.frame(SS=q2,df=df1,F.stat=f,p.value=p)
            SS df   F.stat    p.value
    1 192.1277  2 7.204787 0.01354629

    > c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
    +     cbind(Iden(t-1),-one(t-1)) )
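    > q3 <- t(b) %*% t(c3)%*%
    +     solve( c3 %*% solve(crossprod(D)) %*% t(c3))%*%
    +     c3 %*% b
    > df1<- (s-1)*(t-1)
    > f <- (q3/df1)/(sse/df2)
    > p <- 1-pf(f,df1,df2)
    > c3
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    0   -1   -1    0    1
    [2,]    0    1   -1    0   -1    1
    > data.frame(SS=q3,df=df1,F.stat=f,p.value=p)
           SS df   F.stat    p.value
    1 222.766  2 8.353723 0.00888845

What null hypotheses are tested by F-tests derived from such ANOVA tables (Type I sums of squares in SAS)?

\[
R(\mu) = Y^T P_1 Y = Y^T P_1 P_1 Y = (P_1 Y)^T (P_1 Y) = (\bar{Y}_{\cdot\cdot\cdot} \mathbf{1})^T (\bar{Y}_{\cdot\cdot\cdot} \mathbf{1}) = n_{\cdot\cdot} \bar{Y}_{\cdot\cdot\cdot}^2
\]

Then

\[
\frac{1}{\sigma^2} R(\mu) \sim \chi^2_1(\delta^2)
\qquad \text{and} \qquad
F = \frac{R(\mu)}{SSE / (n_{\cdot\cdot} - ab)} \sim F_{(1,\, n_{\cdot\cdot} - ab)}(\delta^2)
\]

where

\[
\delta^2 = \frac{1}{\sigma^2} \boldsymbol{\beta}^T X^T P_1 X \boldsymbol{\beta}
= \frac{1}{\sigma^2} (\boldsymbol{\beta}^T X^T P_1^T)(P_1 X \boldsymbol{\beta})
= \frac{1}{\sigma^2} (P_1 X \boldsymbol{\beta})^T (P_1 X \boldsymbol{\beta})
\]

For the carrot seed germination study:

\[
\begin{aligned}
P_1 X \boldsymbol{\beta} &= \mathbf{1} (\mathbf{1}^T \mathbf{1})^{-1} \mathbf{1}^T X \boldsymbol{\beta}\\
&= \frac{1}{n_{\cdot\cdot}} \mathbf{1} \, [\, n_{\cdot\cdot},\; n_{1\cdot},\; n_{2\cdot},\; n_{\cdot 1},\; n_{\cdot 2},\; n_{\cdot 3},\; n_{11},\; n_{12},\; n_{13},\; n_{21},\; n_{22},\; n_{23} \,] \, \boldsymbol{\beta}\\
&= \frac{1}{n_{\cdot\cdot}} \mathbf{1} \left( n_{\cdot\cdot} \mu + \sum_{i=1}^{a} n_{i\cdot} \alpha_i + \sum_{j=1}^{b} n_{\cdot j} \beta_j + \sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij} \gamma_{ij} \right)
\end{aligned}
\]

The null hypothesis is

\[
H_0: \; 0 = n_{\cdot\cdot} \mu + \sum_{i=1}^{a} n_{i\cdot} \alpha_i + \sum_{j=1}^{b} n_{\cdot j} \beta_j + \sum_{i} \sum_{j} n_{ij} \gamma_{ij}
\]

With respect to the cell means $E(Y_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$ this null hypothesis is

\[
H_0: \; 0 = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij}
\]

Consider

\[
R(\alpha | \mu) = Y^T (P_{\mu,\alpha} - P_\mu) Y
\]

Then

\[
\frac{1}{\sigma^2} R(\alpha | \mu) \sim \chi^2_{a-1}(\delta^2)
\qquad \text{and} \qquad
F = \frac{R(\alpha | \mu) / (a - 1)}{MSE} \sim F_{(a-1,\, n_{\cdot\cdot} - ab)}(\delta^2)
\]

Here, $a - 1 = \text{rank}(X_{\mu,\alpha}) - \text{rank}(X_\mu)$ and

\[
\delta^2 = \frac{1}{\sigma^2} \boldsymbol{\beta}^T X^T (P_{\mu,\alpha} - P_\mu) X \boldsymbol{\beta}
= \frac{1}{\sigma^2} [(P_{\mu,\alpha} - P_\mu) X \boldsymbol{\beta}]^T [(P_{\mu,\alpha} - P_\mu) X \boldsymbol{\beta}]
\]

For the general effects model for the carrot seed germination study:

\[
\begin{aligned}
P_{\mu,\alpha} X &= X_{\mu,\alpha} (X_{\mu,\alpha}^T X_{\mu,\alpha})^{-} X_{\mu,\alpha}^T X\\
&= X_{\mu,\alpha}
\begin{bmatrix} n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot}\\ n_{1\cdot} & n_{1\cdot} & 0\\ n_{2\cdot} & 0 & n_{2\cdot} \end{bmatrix}^{-}
\left[\begin{array}{cccccccccccc}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{11} & n_{12} & n_{13} & n_{21} & n_{22} & n_{23}\\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} & n_{11} & n_{12} & n_{13} & 0 & 0 & 0\\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} & 0 & 0 & 0 & n_{21} & n_{22} & n_{23}
\end{array}\right]\\
&= X_{\mu,\alpha}
\begin{bmatrix} 0 & 0 & 0\\ 0 & n_{1\cdot}^{-1} & 0\\ 0 & 0 & n_{2\cdot}^{-1} \end{bmatrix}
\left[\begin{array}{cccccccccccc}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{11} & n_{12} & n_{13} & n_{21} & n_{22} & n_{23}\\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} & n_{11} & n_{12} & n_{13} & 0 & 0 & 0\\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} & 0 & 0 & 0 & n_{21} & n_{22} & n_{23}
\end{array}\right]\\
&= X_{\mu,\alpha}
\left[\begin{array}{cccccccccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & \frac{n_{11}}{n_{1\cdot}} & \frac{n_{12}}{n_{1\cdot}} & \frac{n_{13}}{n_{1\cdot}} & \frac{n_{11}}{n_{1\cdot}} & \frac{n_{12}}{n_{1\cdot}} & \frac{n_{13}}{n_{1\cdot}} & 0 & 0 & 0\\
1 & 0 & 1 & \frac{n_{21}}{n_{2\cdot}} & \frac{n_{22}}{n_{2\cdot}} & \frac{n_{23}}{n_{2\cdot}} & 0 & 0 & 0 & \frac{n_{21}}{n_{2\cdot}} & \frac{n_{22}}{n_{2\cdot}} & \frac{n_{23}}{n_{2\cdot}}
\end{array}\right]
\end{aligned}
\]

Then, the first seven rows of $(P_{\mu,\alpha} - P_\mu) X \boldsymbol{\beta}$ are

\[
\left[ \mu + \alpha_1 + \sum_{j=1}^{b} \frac{n_{1j}}{n_{1\cdot}} (\beta_j + \gamma_{1j}) \right]
- \left[ \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j + \sum_{i} \sum_{j} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} \right]
\]

The last eight rows of $(P_{\mu,\alpha} - P_\mu) X \boldsymbol{\beta}$ are

\[
\left[ \mu + \alpha_2 + \sum_{j=1}^{b} \frac{n_{2j}}{n_{2\cdot}} (\beta_j + \gamma_{2j}) \right]
- \left[ \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j + \sum_{i} \sum_{j} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} \right]
\]

The null hypothesis is

\[
H_0: \; \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} (\beta_j + \gamma_{ij}) \text{ are all equal } (i = 1, \ldots, a)
\]

With respect to the cell means model, $\mu_{ij} = E(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$, this null hypothesis is

\[
H_0: \; \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij} \text{ are all equal } (i = 1, \ldots, a).
\]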
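The difference between this weighted hypothesis and the equally weighted one tested by the Type III contrast `c1` can be seen numerically. The sketch below is written for R, assuming the Example 8.1 data are entered by hand, and uses the single-degree-of-freedom formula $(c^T b)^2 / (c^T (D^T D)^{-1} c)$ for a contrast $c$ in the cell means. The helper `contrast_ss` and the weight matrices `w_type1` and `w_type3` are illustrative names introduced here.

```r
# Example 8.1 data, entered by hand
carrot <- data.frame(
  Soil    = factor(rep(c(1, 2), times = c(7, 8))),
  Variety = factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3)),
  Days    = c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)
)

# Cell sample sizes and cell sample means (soil types in rows)
n    <- unclass(table(carrot$Soil, carrot$Variety))
ybar <- tapply(carrot$Days, list(carrot$Soil, carrot$Variety), mean)

# Sum of squares for one contrast sum(w_ij * mu_ij) in the cell means:
# (estimate)^2 / (variance of the estimate / sigma^2)
contrast_ss <- function(w) sum(w * ybar)^2 / sum(w^2 / n)

# Weights n_ij / n_i. : the hypothesis tested by R(alpha | mu)
w_type1 <- rbind(n[1, ] / sum(n[1, ]), -n[2, ] / sum(n[2, ]))

# Equal weights 1/b : the hypothesis tested by the Type III contrast c1
w_type3 <- rbind(rep(1, 3), rep(-1, 3)) / 3

print(contrast_ss(w_type1))
print(contrast_ss(w_type3))
```

The first value should reproduce the sequential sum of squares for Soil when it is entered first, and the second the Type III sum of squares `q1` computed in the S-PLUS code, which illustrates that the two F-tests address different hypotheses when the $n_{ij}$ are unequal.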