8. Two-way crossed classifications with "all cells filled"

Example 8.1: Days to first germination of three varieties of carrot seed grown in two types of potting soil.

                                 Variety
Soil type     1                 2                 3
    1         Y111 = 6          Y121 = 13         Y131 = 14
              Y112 = 10         Y122 = 15         Y132 = 22
              Y113 = 11
    2         Y211 = 12         Y221 = 31         Y231 = 18
              Y212 = 15                           Y232 = 9
              Y213 = 19                           Y233 = 12
              Y214 = 18

This might be called "an unbalanced factorial experiment".

Sample sizes:

                          Variety
Soil type     1            2            3
    1         n11 = 3      n12 = 2      n13 = 2
    2         n21 = 4      n22 = 1      n23 = 3

In general we have
  i = 1, 2, ..., a levels for the first factor,
  j = 1, 2, ..., b levels for the second factor,
  $n_{ij} > 0$ observations at the i-th level of the first factor and the j-th level of the second factor.
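As a quick check on the tables above, the sketch below enters the Example 8.1 data in R, a close relative of the S-PLUS language used later in this section. It then recomputes the cell sample sizes $n_{ij}$ and the cell means $\bar{Y}_{ij\cdot}$. The vector names `soil`, `variety` and `days` are our own and do not appear in the original notes.

```r
# Example 8.1 data, in the order Y111, Y112, Y113, Y121, ..., Y233
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

# Cell sample sizes n_ij (rows = soil types, columns = varieties)
n.ij <- table(soil, variety)
print(n.ij)

# "All cells filled": every n_ij must be positive
stopifnot(all(n.ij > 0))

# Cell sample means Ybar_ij., the natural estimates of mu_ij
ybar.ij <- tapply(days, list(soil, variety), mean)
print(ybar.ij)
```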
We will restrict our attention to normal-theory Gauss-Markov models.

"Cell means" model:

$$Y_{ijk} = \mu_{ij} + \epsilon_{ijk}$$

where $\epsilon_{ijk} \sim NID(0, \sigma^2)$ for $i = 1, \ldots, a$, $j = 1, \ldots, b$, $k = 1, \ldots, n_{ij}$.

Clearly, $E(Y_{ijk}) = \mu_{ij}$ is estimable if $n_{ij} > 0$.

Overall mean response:

$$\mu_{\cdot\cdot} = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij}$$

Mean response at the i-th level of factor 1, averaging across the levels of factor 2:

$$\mu_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$$

Mean response at the j-th level of factor 2, averaging across the levels of factor 1:

$$\mu_{\cdot j} = \frac{1}{a} \sum_{i=1}^{a} \mu_{ij}$$
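Contrasts of interest:

"Main effects" for factor 1:
  $\mu_{i\cdot} - \mu_{\cdot\cdot}$ for $i = 1, 2, \ldots, a$
  $\mu_{i\cdot} - \mu_{k\cdot}$ for $i \neq k$

"Main effects" for factor 2:
  $\mu_{\cdot j} - \mu_{\cdot\cdot}$ for $j = 1, 2, \ldots, b$
  $\mu_{\cdot j} - \mu_{\cdot \ell}$ for $j \neq \ell$

Conditional effects:
  $\mu_{ij} - \mu_{kj}$ for $i \neq k$ and $j = 1, 2, \ldots, b$
  $\mu_{ij} - \mu_{i\ell}$ for $j \neq \ell$ and $i = 1, 2, \ldots, a$

Interaction contrasts:

$$(\mu_{ij} - \mu_{kj}) - (\mu_{i\ell} - \mu_{k\ell}) = (\mu_{ij} - \mu_{i\ell}) - (\mu_{kj} - \mu_{k\ell}) = \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell}$$

All of these contrasts are estimable when $n_{ij} > 0$ for all $(i, j)$, because
  $E(\bar{Y}_{ij\cdot}) = \mu_{ij}$, and
  any linear function of estimable functions is estimable.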
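Each of these contrasts is estimated by replacing every $\mu_{ij}$ with $\bar{Y}_{ij\cdot}$. The R sketch below does this for the carrot data. Note that the estimate of $\mu_{i\cdot}$ is the unweighted average of the cell means. It is not the raw average of all observations on soil type i, and with unequal $n_{ij}$ the two differ. The variable names are illustrative.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

ybar <- tapply(days, list(soil, variety), mean)  # Ybar_ij. estimates mu_ij

grand   <- mean(ybar)      # estimates mu.. = (1/ab) sum_i sum_j mu_ij
row.est <- rowMeans(ybar)  # estimates mu_i. (soil types)
col.est <- colMeans(ybar)  # estimates mu_.j (varieties)

# Main-effect contrast for factor 1: mu_1. - mu_2.
soil.contrast <- unname(row.est[1] - row.est[2])

# Interaction contrast mu_11 - mu_12 - mu_21 + mu_22
inter.contrast <- ybar[1, 1] - ybar[1, 2] - ybar[2, 1] + ybar[2, 2]

# Raw soil-type means weight cells by n_ij, so they differ from row.est
raw.soil.means <- tapply(days, soil, mean)
print(rbind(unweighted = row.est, raw = raw.soil.means))
```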
An "effects" model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

where $\epsilon_{ijk} \sim NID(0, \sigma^2)$ for $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, n_{ij}$ with $n_{ij} > 0$.

For the carrot seed study ($a = 2$, $b = 3$), $Y = X\beta + \epsilon$ is

$$
\begin{bmatrix}
Y_{111}\\ Y_{112}\\ Y_{113}\\ Y_{121}\\ Y_{122}\\ Y_{131}\\ Y_{132}\\ Y_{211}\\ Y_{212}\\ Y_{213}\\ Y_{214}\\ Y_{221}\\ Y_{231}\\ Y_{232}\\ Y_{233}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\mu\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2\\ \beta_3\\ \gamma_{11}\\ \gamma_{12}\\ \gamma_{13}\\ \gamma_{21}\\ \gamma_{22}\\ \gamma_{23}
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{111}\\ \vdots\\ \epsilon_{233}
\end{bmatrix}
$$

The 12 columns of X are linearly dependent. For example, the $\mu$ column is the sum of the $\alpha_1$ and $\alpha_2$ columns. With all cells filled, rank(X) = ab = 6.
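The rank deficiency of the unrestricted model matrix can be checked numerically. The R sketch below builds X block by block in the column order shown above: intercept, then the $\alpha$'s, the $\beta$'s and the $\gamma$'s. It uses base R's `model.matrix()` and computes the rank with `qr()`. The construction is our own illustration, not code from these notes.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))

X.mu    <- matrix(1, nrow = 15, ncol = 1)              # column for mu
X.alpha <- model.matrix(~ soil - 1)                    # alpha_1, alpha_2 indicators
X.beta  <- model.matrix(~ variety - 1)                 # beta_1, beta_2, beta_3 indicators
cell    <- interaction(soil, variety, lex.order = TRUE) # levels 1.1, 1.2, ..., 2.3
X.gamma <- model.matrix(~ cell - 1)                    # gamma_11, ..., gamma_23 indicators

X <- cbind(X.mu, X.alpha, X.beta, X.gamma)
dim(X)        # 15 12
qr(X)$rank    # 6 = ab linearly independent columns
```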
The resulting restricted model is

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

where $\epsilon_{ijk} \sim NID(0, \sigma^2)$ for $i = 1, \ldots, a$, $j = 1, \ldots, b$, $k = 1, \ldots, n_{ij}$, and

  $\alpha_a = 0$
  $\beta_b = 0$
  $\gamma_{ib} = 0$ for all $i = 1, \ldots, a$
  $\gamma_{aj} = 0$ for all $j = 1, \ldots, b$

We will call these the "baseline" restrictions.

Cell means for the carrot study under the baseline restrictions:

              Variety 1                  Variety 2                  Variety 3       Soil type means
Soil type 1   μ11 = μ + α1 + β1 + γ11    μ12 = μ + α1 + β2 + γ12    μ13 = μ + α1    μ + α1 + (β1 + β2)/3 + (γ11 + γ12)/3
Soil type 2   μ21 = μ + β1               μ22 = μ + β2               μ23 = μ         μ + (β1 + β2)/3
Var. means    μ + α1/2 + β1 + γ11/2      μ + α1/2 + β2 + γ12/2      μ + α1/2

Interpretation:

$\mu = \mu_{ab} = E(Y_{abk})$ is the mean response when factor 1 is at level a and factor 2 is at level b.

$\alpha_i = \mu_{ib} - \mu_{ab} = E(Y_{ibk}) - E(Y_{abk})$ is a difference in mean responses between levels i and a of factor 1 when factor 2 is at its highest level.

$\beta_j = \mu_{aj} - \mu_{ab} = E(Y_{ajk}) - E(Y_{abk})$, for $j = 1, 2, \ldots, b$, is the difference in the mean responses for levels j and b of factor 2 when factor 1 is at its highest level.

Interaction:

$$\gamma_{ij} = (\mu_{ij} - \mu_{ib}) - (\mu_{aj} - \mu_{ab}) = (\mu_{ij} - \mu_{aj}) - (\mu_{ib} - \mu_{ab})$$

Note that

$$\mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = \gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell} \quad \text{for any } (i, j) \text{ and } (k, \ell).$$
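Matrix formulation: $Y = X\beta + \epsilon$, where

$$
\begin{bmatrix}
Y_{111}\\ Y_{112}\\ Y_{113}\\ Y_{121}\\ Y_{122}\\ Y_{131}\\ Y_{132}\\ Y_{211}\\ Y_{212}\\ Y_{213}\\ Y_{214}\\ Y_{221}\\ Y_{231}\\ Y_{232}\\ Y_{233}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 0 & 1 & 0 & 1\\
1 & 1 & 0 & 1 & 0 & 1\\
1 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
\mu\\ \alpha_1\\ \beta_1\\ \beta_2\\ \gamma_{11}\\ \gamma_{12}
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{111}\\ \vdots\\ \epsilon_{233}
\end{bmatrix}
$$

and $Y \sim N(X\beta, \sigma^2 I)$.

Least squares estimation:

$$
b = (X^T X)^{-1} X^T Y =
\begin{bmatrix}
n_{\cdot\cdot} & n_{1\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{11} & n_{12}\\
n_{1\cdot} & n_{1\cdot} & n_{11} & n_{12} & n_{11} & n_{12}\\
n_{\cdot 1} & n_{11} & n_{\cdot 1} & 0 & n_{11} & 0\\
n_{\cdot 2} & n_{12} & 0 & n_{\cdot 2} & 0 & n_{12}\\
n_{11} & n_{11} & n_{11} & 0 & n_{11} & 0\\
n_{12} & n_{12} & 0 & n_{12} & 0 & n_{12}
\end{bmatrix}^{-1}
\begin{bmatrix}
Y_{\cdot\cdot\cdot}\\ Y_{1\cdot\cdot}\\ Y_{\cdot 1\cdot}\\ Y_{\cdot 2\cdot}\\ Y_{11\cdot}\\ Y_{12\cdot}
\end{bmatrix}
=
\begin{bmatrix}
\hat\mu\\ \hat\alpha_1\\ \hat\beta_1\\ \hat\beta_2\\ \hat\gamma_{11}\\ \hat\gamma_{12}
\end{bmatrix}
=
\begin{bmatrix}
\bar{Y}_{23\cdot}\\
\bar{Y}_{13\cdot} - \bar{Y}_{23\cdot}\\
\bar{Y}_{21\cdot} - \bar{Y}_{23\cdot}\\
\bar{Y}_{22\cdot} - \bar{Y}_{23\cdot}\\
\bar{Y}_{11\cdot} - \bar{Y}_{13\cdot} - \bar{Y}_{21\cdot} + \bar{Y}_{23\cdot}\\
\bar{Y}_{12\cdot} - \bar{Y}_{13\cdot} - \bar{Y}_{22\cdot} + \bar{Y}_{23\cdot}
\end{bmatrix}
$$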
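To see the closed-form solution above at work, the following R sketch builds the 15 × 6 baseline (SAS-style) model matrix for the carrot data and solves the normal equations. It then compares the result with the cell-mean formulas. The column labels are our own.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

s1 <- as.numeric(soil == 1)     # alpha_1 column (alpha_2 = 0)
v1 <- as.numeric(variety == 1)  # beta_1 column (beta_3 = 0)
v2 <- as.numeric(variety == 2)  # beta_2 column
X  <- cbind(mu = 1, alpha1 = s1, beta1 = v1, beta2 = v2,
            gamma11 = s1 * v1, gamma12 = s1 * v2)

# X has full column rank, so the ordinary inverse of X'X exists
b <- solve(crossprod(X), crossprod(X, days))

# Closed-form answer in terms of the cell means Ybar_ij.
m <- tapply(days, list(soil, variety), mean)
b.formula <- c(m[2, 3],
               m[1, 3] - m[2, 3],
               m[2, 1] - m[2, 3],
               m[2, 2] - m[2, 3],
               m[1, 1] - m[1, 3] - m[2, 1] + m[2, 3],
               m[1, 2] - m[1, 3] - m[2, 2] + m[2, 3])
print(cbind(b, b.formula))
```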
Comments:

Imposing a set of restrictions on the parameters in the "effects" model

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

to obtain a model matrix with full column rank:

(i) avoids the use of a generalized inverse in least squares estimation;
(ii) is equivalent to choosing a generalized inverse for $b = (X^T X)^- X^T Y$ in the unrestricted "effects" model;
(iii) requires restrictions that involve "non-estimable" quantities for the unrestricted "effects" model.

Baseline restrictions (SAS):
  $\alpha_a = 0$
  $\beta_b = 0$
  $\gamma_{ib} = 0$ for all $i = 1, \ldots, a$
  $\gamma_{aj} = 0$ for all $j = 1, \ldots, b$

Baseline restrictions (S-PLUS):
  $\alpha_1 = 0$
  $\beta_1 = 0$
  $\gamma_{i1} = 0$ for all $i = 1, \ldots, a$
  $\gamma_{1j} = 0$ for all $j = 1, \ldots, b$
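Comment (ii) can be illustrated in R with two fits. The first is a generalized-inverse solution for the unrestricted 12-column model matrix. The second is a full-rank fit under first-level baseline restrictions, using R's `contr.treatment` coding, which plays the role of the S-PLUS baseline restrictions listed above. The two give different coefficient vectors but identical fitted values. `ginv()` from the MASS package computes a Moore-Penrose generalized inverse; this is one convenient choice among many generalized inverses.

```r
library(MASS)  # provides ginv(), a Moore-Penrose generalized inverse

soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

# Unrestricted "effects" model matrix: 12 columns, rank 6
cell <- interaction(soil, variety, lex.order = TRUE)
X <- cbind(1, model.matrix(~ soil - 1), model.matrix(~ variety - 1),
           model.matrix(~ cell - 1))

# One solution of the normal equations, b = (X'X)^- X'Y
b.ginv    <- ginv(crossprod(X)) %*% crossprod(X, days)
yhat.ginv <- drop(X %*% b.ginv)

# Full-rank fit with alpha_1 = beta_1 = gamma_1j = gamma_i1 = 0
fit <- lm(days ~ soil * variety,
          contrasts = list(soil = "contr.treatment",
                           variety = "contr.treatment"))
print(coef(fit))

# Different parameter vectors, identical fitted values
print(all.equal(unname(fitted(fit)), yhat.ginv))
```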
Σ-restrictions:

$$Y_{ijk} = \omega + \alpha_i + \delta_j + \gamma_{ij} + \epsilon_{ijk}$$

where $\epsilon_{ijk} \sim NID(0, \sigma^2)$, $\mu_{ij} = E(Y_{ijk})$, and

$$\sum_{i=1}^{a} \alpha_i = 0, \qquad \sum_{j=1}^{b} \delta_j = 0, \qquad \sum_{i=1}^{a} \gamma_{ij} = 0 \text{ for each } j = 1, \ldots, b, \qquad \sum_{j=1}^{b} \gamma_{ij} = 0 \text{ for each } i = 1, \ldots, a.$$

Cell means for the carrot study under the Σ-restrictions:

              Variety 1                  Variety 2                  Variety 3                  Means
Soil type 1   μ11 = ω + α1 + δ1 + γ11    μ12 = ω + α1 + δ2 + γ12    μ13 = ω + α1 + δ3 + γ13    μ1· = ω + α1
Soil type 2   μ21 = ω + α2 + δ1 + γ21    μ22 = ω + α2 + δ2 + γ22    μ23 = ω + α2 + δ3 + γ23    μ2· = ω + α2
Means         μ·1 = ω + δ1               μ·2 = ω + δ2               μ·3 = ω + δ3

Interpretation:

$$\omega = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij} = \mu_{\cdot\cdot}$$

is the overall mean germination time, averaging across all soil types and all varieties used in this study.

Since $\mu_{\cdot j} = \omega + \delta_j$, we have $\delta_j = \mu_{\cdot j} - \mu_{\cdot\cdot}$, and

$$\delta_j - \delta_k = (\mu_{\cdot j} - \mu_{\cdot\cdot}) - (\mu_{\cdot k} - \mu_{\cdot\cdot}) = \mu_{\cdot j} - \mu_{\cdot k}$$

is the difference between mean germination times for varieties j and k, averaging across soil types.

Similarly,

$$\alpha_1 - \alpha_2 = \mu_{1\cdot} - \mu_{2\cdot}$$

is the difference in the mean germination times for different soil types, averaging across varieties.

For a model that includes the Σ-restrictions,

$$\gamma_{ij} = \mu_{ij} - (\omega + \alpha_i + \delta_j)$$

is a deviation from an additive model. Then

$$\mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell} = \gamma_{ij} - \gamma_{kj} - \gamma_{i\ell} + \gamma_{k\ell}.$$
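Because $\omega$, $\alpha_i$, $\delta_j$ and $\gamma_{ij}$ are defined through the cell means, their least squares estimates can be read directly off the cell sample means. This R sketch does that for the carrot data. It also checks that the estimates satisfy the Σ-restrictions. Only the Example 8.1 data are assumed.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

ybar  <- tapply(days, list(soil, variety), mean)  # Ybar_ij.
omega <- mean(ybar)                # estimate of mu..
alpha <- rowMeans(ybar) - omega    # estimates of mu_i. - mu..
delta <- colMeans(ybar) - omega    # estimates of mu_.j - mu..
# gamma_ij = mu_ij - (omega + alpha_i + delta_j)
gamma <- ybar - (omega + outer(alpha, delta, "+"))

# The Sigma-restrictions hold for the estimates (all sums are zero)
print(c(sum(alpha), sum(delta)))
print(rowSums(gamma)); print(colSums(gamma))
```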
Matrix formulation: for the carrot study, the Σ-restrictions give

  $\alpha_2 = -\alpha_1$
  $\delta_3 = -\delta_1 - \delta_2$
  $\gamma_{21} = -\gamma_{11}$
  $\gamma_{13} = -\gamma_{11} - \gamma_{12}$
  $\gamma_{22} = -\gamma_{12}$
  $\gamma_{23} = -\gamma_{13} = \gamma_{11} + \gamma_{12}$

so that $Y = X\beta + \epsilon$ with

$$
\begin{bmatrix}
Y_{111}\\ Y_{112}\\ Y_{113}\\ Y_{121}\\ Y_{122}\\ Y_{131}\\ Y_{132}\\ Y_{211}\\ Y_{212}\\ Y_{213}\\ Y_{214}\\ Y_{221}\\ Y_{231}\\ Y_{232}\\ Y_{233}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 0 & 1 & 0 & 1\\
1 & 1 & 0 & 1 & 0 & 1\\
1 & 1 & -1 & -1 & -1 & -1\\
1 & 1 & -1 & -1 & -1 & -1\\
1 & -1 & 1 & 0 & -1 & 0\\
1 & -1 & 1 & 0 & -1 & 0\\
1 & -1 & 1 & 0 & -1 & 0\\
1 & -1 & 1 & 0 & -1 & 0\\
1 & -1 & 0 & 1 & 0 & -1\\
1 & -1 & -1 & -1 & 1 & 1\\
1 & -1 & -1 & -1 & 1 & 1\\
1 & -1 & -1 & -1 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
\omega\\ \alpha_1\\ \delta_1\\ \delta_2\\ \gamma_{11}\\ \gamma_{12}
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{111}\\ \vdots\\ \epsilon_{233}
\end{bmatrix}
$$

Least squares estimation:

$$
b = (X^T X)^{-1} X^T Y =
\begin{bmatrix}
15 & -1 & 2 & -2 & 0 & 2\\
-1 & 15 & 0 & 2 & 2 & -2\\
2 & 0 & 12 & 5 & -2 & -1\\
-2 & 2 & 5 & 8 & -1 & 0\\
0 & 2 & -2 & -1 & 12 & 5\\
2 & -2 & -1 & 0 & 5 & 8
\end{bmatrix}^{-1}
\begin{bmatrix}
Y_{\cdot\cdot\cdot}\\
Y_{1\cdot\cdot} - Y_{2\cdot\cdot}\\
Y_{\cdot 1\cdot} - Y_{\cdot 3\cdot}\\
Y_{\cdot 2\cdot} - Y_{\cdot 3\cdot}\\
Y_{11\cdot} - Y_{13\cdot} - Y_{21\cdot} + Y_{23\cdot}\\
Y_{12\cdot} - Y_{13\cdot} - Y_{22\cdot} + Y_{23\cdot}
\end{bmatrix}
=
\begin{bmatrix}
\hat\omega\\ \hat\alpha_1\\ \hat\delta_1\\ \hat\delta_2\\ \hat\gamma_{11}\\ \hat\gamma_{12}
\end{bmatrix}
=
\begin{bmatrix}
16.83\\ -3.17\\ -4.33\\ 5.67\\ -0.33\\ -5.33
\end{bmatrix}
$$

If restrictions are placed on "non-estimable" functions of parameters in the unrestricted "effects" model, then the resulting models are reparameterizations of each other, and

  $\hat{Y} = P_X Y$
  $e = Y - \hat{Y} = (I - P_X) Y$
  $SSE = e^T e = Y^T (I - P_X) Y$
  $\hat{Y}^T \hat{Y} = Y^T P_X Y$
  $SS_{model} = Y^T (P_X - P_1) Y$

are the same for any such set of restrictions.
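The R sketch below fits the Σ-restricted and the baseline parameterizations to the carrot data. It reproduces the $X^T X$ matrix and the estimates 16.83, -3.17, -4.33, 5.67, -0.33, -5.33 given above. It also confirms that both parameterizations yield the same projection $P_X$, so the fitted values and SSE coincide. The helper `proj()` is a hypothetical name of ours and assumes a full-rank X.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

# Sigma-restriction columns: omega, alpha_1, delta_1, delta_2, gamma_11, gamma_12
a1 <- ifelse(soil == 1, 1, -1)
d1 <- ifelse(variety == 1, 1, ifelse(variety == 3, -1, 0))
d2 <- ifelse(variety == 2, 1, ifelse(variety == 3, -1, 0))
X.sum <- cbind(omega = 1, alpha1 = a1, delta1 = d1, delta2 = d2,
               gamma11 = a1 * d1, gamma12 = a1 * d2)

XtX   <- crossprod(X.sum)
b.sum <- solve(XtX, crossprod(X.sum, days))
print(XtX)
print(round(b.sum, 2))

# Baseline (SAS-style) columns: mu, alpha_1, beta_1, beta_2, gamma_11, gamma_12
s1 <- as.numeric(soil == 1)
v1 <- as.numeric(variety == 1)
v2 <- as.numeric(variety == 2)
X.base <- cbind(1, s1, v1, v2, s1 * v1, s1 * v2)

# Projection onto the column space of a full-rank X (hypothetical helper)
proj <- function(X) X %*% solve(crossprod(X), t(X))
P.sum  <- proj(X.sum)
P.base <- proj(X.base)

yhat <- drop(P.sum %*% days)
sse  <- sum((days - yhat)^2)
```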
The solution to the normal equations, $b = (X^T X)^{-1} X^T Y$, and the interpretations of the corresponding parameters will not be the same for all such sets of restrictions.

If you were to place restrictions on estimable functions of parameters in

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

then you would change $\hat{Y} = X(X^T X)^- X^T Y$ and the OLS estimators of other estimable quantities.

Analysis of variance for the normal-theory Gauss-Markov model:

$$
\begin{aligned}
Y^T Y &= Y^T P_\mu Y + Y^T (P_{\mu,\alpha} - P_\mu) Y + Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y + Y^T (P_X - P_{\mu,\alpha,\beta}) Y + Y^T (I - P_X) Y\\
&= R(\mu) + R(\alpha \mid \mu) + R(\beta \mid \mu, \alpha) + R(\gamma \mid \mu, \alpha, \beta) + SSE
\end{aligned}
$$

where $P_X$ is the projection onto the space spanned by the columns of X, and rank(X) = ab.

By Cochran's Theorem, these quadratic forms (or sums of squares), divided by $\sigma^2$, have independent chi-square distributions with 1, a - 1, b - 1, (a - 1)(b - 1), and $n_{\cdot\cdot} - ab$ degrees of freedom, respectively (if $n_{ij} > 0$ for all (i, j)).
Define:

  $X_\mu$ = the column of ones (the first column of the unrestricted model matrix X),
  $X_\alpha$ = the columns of X for $\alpha_1, \alpha_2$,
  $X_\beta$ = the columns of X for $\beta_1, \beta_2, \beta_3$,
  $X_\gamma$ = the columns of X for $\gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{21}, \gamma_{22}, \gamma_{23}$,

and

$$P_\mu = X_\mu (X_\mu^T X_\mu)^{-1} X_\mu^T \quad (= P_1)$$

$$X_{\mu,\alpha} = [X_\mu \mid X_\alpha], \qquad P_{\mu,\alpha} = X_{\mu,\alpha} (X_{\mu,\alpha}^T X_{\mu,\alpha})^- X_{\mu,\alpha}^T$$

$$X_{\mu,\alpha,\beta} = [X_\mu \mid X_\alpha \mid X_\beta], \qquad P_{\mu,\alpha,\beta} = X_{\mu,\alpha,\beta} (X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta})^- X_{\mu,\alpha,\beta}^T$$

$$X = [X_\mu \mid X_\alpha \mid X_\beta \mid X_\gamma], \qquad P_X = X (X^T X)^- X^T$$
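The sequential sums of squares in the partition can be computed directly from these definitions. The R sketch below uses `qr.fitted()`, which returns $PY$ for the column space of a possibly rank-deficient matrix, so no explicit generalized inverse is needed. Because $P$ is symmetric and idempotent, $Y^T P Y$ equals the inner product of $Y$ with $PY$. The helper `R.ss()` is a hypothetical name of ours.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

X.mu    <- matrix(1, nrow = 15, ncol = 1)
X.alpha <- model.matrix(~ soil - 1)
X.beta  <- model.matrix(~ variety - 1)
X.gamma <- model.matrix(~ interaction(soil, variety) - 1)

# Y'PY, where P projects onto the column space of X (hypothetical helper)
R.ss <- function(X, y) sum(y * qr.fitted(qr(X), y))

R.mu   <- R.ss(X.mu, days)
R.ma   <- R.ss(cbind(X.mu, X.alpha), days)
R.mab  <- R.ss(cbind(X.mu, X.alpha, X.beta), days)
R.full <- R.ss(cbind(X.mu, X.alpha, X.beta, X.gamma), days)

ss <- c(mu                  = R.mu,
        alpha.given.mu      = R.ma - R.mu,
        beta.given.mu.alpha = R.mab - R.ma,
        gamma.given.rest    = R.full - R.mab,
        SSE                 = sum(days^2) - R.full)
print(round(ss, 3))
```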
The following three model matrices correspond to reparameterizations of the same model. They are the unrestricted "effects" model matrix (columns μ, α1, α2, β1, β2, β3, γ11, γ12, γ13, γ21, γ22, γ23), the model matrix for the baseline restrictions (columns μ, α1, β1, β2, γ11, γ12), and the model matrix for the Σ-restrictions (columns ω, α1, δ1, δ2, γ11, γ12). All rows for the same cell are identical, so each matrix is determined by its row for cell (i, j):

Cell     Unrestricted                 Baseline       Σ-restrictions
(1,1)    1 1 0 1 0 0 1 0 0 0 0 0      1 1 1 0 1 0    1  1  1  0  1  0
(1,2)    1 1 0 0 1 0 0 1 0 0 0 0      1 1 0 1 0 1    1  1  0  1  0  1
(1,3)    1 1 0 0 0 1 0 0 1 0 0 0      1 1 0 0 0 0    1  1 -1 -1 -1 -1
(2,1)    1 0 1 1 0 0 0 0 0 1 0 0      1 0 1 0 0 0    1 -1  1  0 -1  0
(2,2)    1 0 1 0 1 0 0 0 0 0 1 0      1 0 0 1 0 0    1 -1  0  1  0 -1
(2,3)    1 0 1 0 0 1 0 0 0 0 0 1      1 0 0 0 0 0    1 -1 -1 -1  1  1

In each model matrix the first column is $X_\mu = \mathbf{1}$. Together with $X_\mu$, the factor-1 columns span the same space as $X_{\mu,\alpha}$. Adding the factor-2 columns gives the space spanned by $X_{\mu,\alpha,\beta}$, and the full matrix spans the same space as X. Hence:

$R(\mu) = Y^T P_\mu Y$ is the same for all three models.

$R(\mu, \alpha) = Y^T P_{\mu,\alpha} Y$ is the same for all three models, and so is $R(\alpha \mid \mu) = R(\mu, \alpha) - R(\mu)$.

$R(\mu, \alpha, \beta) = Y^T P_{\mu,\alpha,\beta} Y$ is the same for all three models, and so is $R(\beta \mid \mu, \alpha) = R(\mu, \alpha, \beta) - R(\mu, \alpha)$.

$R(\mu, \alpha, \beta, \gamma) = Y^T P_X Y$ is the same for all three models, and so is $R(\gamma \mid \mu, \alpha, \beta) = R(\mu, \alpha, \beta, \gamma) - R(\mu, \alpha, \beta)$.

Consequently, the partition

$$
\begin{aligned}
Y^T Y &= Y^T P_\mu Y + Y^T (P_{\mu,\alpha} - P_\mu) Y + Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y + Y^T (P_X - P_{\mu,\alpha,\beta}) Y + Y^T (I - P_X) Y\\
&= R(\mu) + R(\alpha \mid \mu) + R(\beta \mid \mu, \alpha) + R(\gamma \mid \mu, \alpha, \beta) + SSE
\end{aligned}
$$

is the same for all three models.

By Cochran's Theorem, these quadratic forms (or sums of squares), divided by $\sigma^2$, have independent chi-square distributions with 1, a - 1, b - 1, (a - 1)(b - 1), and $n_{\cdot\cdot} - ab$ degrees of freedom, respectively, when $n_{ij} > 0$ for all (i, j).
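The invariance claims above can be spot-checked in R. The sketch computes $R(\alpha \mid \mu)$ and $R(\beta \mid \mu, \alpha)$ from the nested column sets of each of the three model matrices and confirms that the three results agree. It also checks that SSE equals the pooled within-cell sum of squares, with $n_{\cdot\cdot} - ab = 9$ degrees of freedom. The helper names `R.ss()` and `nested` are hypothetical.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

R.ss <- function(X, y) sum(y * qr.fitted(qr(X), y))   # Y'PY

one <- rep(1, 15)
s1  <- as.numeric(soil == 1); s2 <- 1 - s1
v   <- sapply(1:3, function(j) as.numeric(variety == j))  # 15 x 3 indicators
a1  <- s1 - s2              # Sigma-restriction alpha_1 column
d   <- v[, 1:2] - v[, 3]    # Sigma-restriction delta_1, delta_2 columns

# Column sets for (mu, alpha) and (mu, alpha, beta) in each parameterization
nested <- list(
  unrestricted = list(cbind(one, s1, s2), cbind(one, s1, s2, v)),
  baseline     = list(cbind(one, s1),     cbind(one, s1, v[, 1:2])),
  sigma        = list(cbind(one, a1),     cbind(one, a1, d)))

R.mu <- R.ss(cbind(one), days)
seq.ss <- sapply(nested, function(m) {
  R.ma  <- R.ss(m[[1]], days)
  R.mab <- R.ss(m[[2]], days)
  c(alpha.given.mu = R.ma - R.mu, beta.given.mu.alpha = R.mab - R.ma)
})
print(round(seq.ss, 4))

# SSE as the pooled within-cell sum of squares
cell.mean <- ave(days, soil, variety)   # Ybar_ij. repeated for each observation
SSE <- sum((days - cell.mean)^2)
df.error <- length(days) - nlevels(soil) * nlevels(variety)
```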
Using result 4.7, we have also shown earlier that

$$SSE = Y^T (I - P_X) Y = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (Y_{ijk} - \bar{Y}_{ij\cdot})^2 \sim \sigma^2 \chi^2_{(n_{\cdot\cdot} - ab)}$$

The lm( ) function in S-PLUS:

To allow the lm( ) function to create a model matrix involving classification variables, create factors.
> carrot <- read.table("carrots.dat",
      col.names=c("Soil","Variety","Days"))
> carrot$Soil <- as.factor(carrot$Soil)
> carrot$Variety <- as.factor(carrot$Variety)

Produce ANOVA tables:

> lm.out1 <- lm(Days ~ Soil*Variety, data=carrot)
> anova(lm.out1)

  Rows of the table: R(α|μ), R(β|μ,α), R(γ|μ,α,β), SSE

> lm.out2 <- lm(Days ~ Variety*Soil, data=carrot)
> anova(lm.out2)

  Rows of the table: R(β|μ), R(α|μ,β), R(γ|μ,α,β), SSE

There are four options for creating columns in the model matrix for classification variables:

  contr.helmert
  contr.treatment   sets α1 = 0, β1 = 0, γ1j = 0 for all j, γi1 = 0 for all i
  contr.sum         Σ constraints
  contr.poly        orthogonal polynomial contrasts (equal spacing, equal sample sizes)
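The same four coding schemes are available in R under the same names, and the sketch below prints them for a three-level factor. For example, `contr.helmert(3)` reproduces the Variety1 and Variety2 columns of the Helmert model matrix shown later in this section. `contr.sum(3)` gives the Σ-restriction coding, in which the last level carries the minus signs. This is R, not S-PLUS, so the defaults differ: R uses `contr.treatment` for unordered factors unless the `contrasts` option is changed.

```r
# Codings for a factor with 3 levels (e.g., Variety)
print(contr.treatment(3))  # level 1 is the baseline (its parameter is set to 0)
print(contr.sum(3))        # columns sum to zero; level 3 gets -1 in every column
print(contr.helmert(3))    # column k contrasts level k+1 with levels 1, ..., k
print(contr.poly(3))       # orthogonal linear and quadratic polynomial contrasts

# The contr.poly columns are orthonormal
print(round(crossprod(contr.poly(3)), 10))
```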
# This file is posted as carrots.ssc
carrot <- read.table(
    "c:/courses/st511/snew/carrots.dat",
    col.names=c("Soil","Variety","Days"))
carrot$Soil <- as.factor(carrot$Soil)
carrot$Variety <- as.factor(carrot$Variety)

# Compute sample means of germination
# times for all combinations of soil
# type and varieties of carrot seeds
# and make a profile plot.
# At this point UNIX users should open
# a graphics window with the motif( )
# function
means <- tapply(carrot$Days,
    list(carrot$Variety,carrot$Soil),mean)
means

# Set up the axes and title of the
# profile plot.
par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5)
x.axis <- unique(carrot$Variety)
matplot(c(1,3,1), c(0,40,10), type="n",
    xlab="Variety", ylab="Mean Time",
    main="Average Time to Carrot Seed Germination")
# Add a profile for each soil type
matlines(x.axis,means,type='l',
    lty=c(1,3),lwd=3)
# Plot points for the cell means
matpoints(x.axis,means, pch=c(1,16))
# Add a legend to the plot
legend(2.2,38.6,
    legend=c('Soil Type 1','Soil Type 2'),
    lty=c(1,3),bty='n')

# Fit a model with main effects and interaction
# Compute both sets of Type I sums of squares
lm.out1 <- lm(Days~Soil*Variety,data=carrot)
anova(lm.out1)
lm.out2 <- lm(Days~Variety*Soil,data=carrot)
anova(lm.out2)

# Create a data frame containing the original
# data and the residuals and estimated means
data.frame(carrot$Soil,carrot$Variety,carrot$Days,
    Pred=lm.out1$fitted,
    Resid=round(lm.out1$resid,3))

# Create residual plots
frame( )
par(cex=1.0,mex=1.0,lwd=3,pch=2,
    mkh=0.1,fig=c(0,1,.51,1), pty='m')
plot(lm.out1$fitted, lm.out1$resid,
    xlab="Estimated Means",
    ylab="Residuals")
abline(h=0, lty=2, lwd=3)
par(fig=c(0, 1, 0, 0.49), pty='s')
qqnorm(lm.out1$resid)
qqline(lm.out1$resid)

# Create plots for studentized residuals
# You must attach the MASS library
# to have access to the studres( )
# function that computes studentized
# residuals in the following code
library(MASS)
frame( )
par(cex=1.0,mex=1.0,lwd=3,pch=2,
    mkh=0.1,fin=c(6.5,6.5))
plot(lm.out1$fitted, studres(lm.out1),
    xlab="Estimated Means",
    ylab="Studentized Residuals",
    main="Studentized Residual Plot")
abline(h=0, lty=2, lwd=3)
qqnorm(studres(lm.out1),
    main="Studentized Residuals")
qqline(studres(lm.out1))

# By default S uses so-called Helmert contrast
# matrices for unordered factors and orthogonal
# polynomial contrast matrices for ordered
# factors.
lm.out <- lm(Days~Soil*Variety,data=carrot)
model.matrix(lm.out)
summary(lm.out)
anova(lm.out)

# The default contrast matrices can be changed
# by resetting the contrasts options. The
# contr.sum option restricts parameters to sum
# to zero across the levels of any single
# factor.
options(contrasts=c('contr.sum','contr.poly'))
unlist(options())

# Now, ``contr.sum'' will be used for unordered
# factors and orthogonal polynomial contrast
# matrices will be used for ordered factors.
lm.out <- lm(Days~Soil*Variety,data=carrot)
model.matrix(lm.out)
summary(lm.out)
anova(lm.out)

# Compute Type III sums of squares and F-tests.
# First create the model matrix for
# the cell means model.
cb <- as.factor(10*as.numeric(carrot$Soil) + as.numeric(carrot$Variety))
lm.out <- lm(carrot$Days ~ cb - 1)
D <- model.matrix(lm.out)
D

# Compute the sample means
y <- matrix(carrot$Days,ncol=1)
b <- solve(crossprod(D)) %*% crossprod(D,y)
b

# Compute Type III sums of squares and
# related F-tests
s <- length(unique(carrot$Soil))
t <- length(unique(carrot$Variety))
yhat <- D %*% b
sse <- crossprod(y-yhat)
df2 <- nrow(y) - s*t

# Generate an identity matrix and a vector
# of ones
Iden <- function(n) diag(rep(1,n))
one <- function(n) matrix(rep(1,n),ncol=1)

c1 <- kronecker( cbind(Iden(s-1),
    -one(s-1)), t(one(t)) )
q1 <- t(b) %*% t(c1)%*%
    solve( c1 %*% solve(crossprod(D))
    %*% t(c1))%*% c1 %*% b
df1<- s-1
f <- (q1/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c1
data.frame(SS=q1,df=df1,F.stat=f,p.value=p)

c2 <- kronecker( t(one(s)),
    cbind(Iden(t-1),-one(t-1)) )
q2 <- t(b) %*% t(c2)%*%
    solve( c2 %*% solve(crossprod(D))
    %*% t(c2))%*% c2 %*% b
df1<- t-1
f <- (q2/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c2
data.frame(SS=q2,df=df1,F.stat=f,p.value=p)

c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
    cbind(Iden(t-1),-one(t-1)) )
q3 <- t(b) %*% t(c3)%*%
    solve( c3 %*% solve(crossprod(D))
    %*% t(c3))%*% c3 %*% b
df1<- (s-1)*(t-1)
f <- (q3/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c3
data.frame(SS=q3,df=df1,F.stat=f,p.value=p)
> carrot <- read.table("c:/carrots.dat",
      col.names=c("Soil","Variety","Days"))
> carrot$Soil <- as.factor(carrot$Soil)
> carrot$Variety <- as.factor(carrot$Variety)
> carrot
   Soil Variety Days
1     1       1    6
2     1       1   10
3     1       1   11
4     1       2   13
5     1       2   15
6     1       3   14
7     1       3   22
8     2       1   12
9     2       1   15
10    2       1   19
11    2       1   18
12    2       2   31
13    2       3   18
14    2       3    9
15    2       3   12
# Compute sample means of germination
# times for all combinations of soil
# type and varieties of carrot seeds
# and make a profile plot.
# At this point UNIX users should open
# a graphics window with the motif( )
# function
> means <- tapply(carrot$Days,
      list(carrot$Variety,carrot$Soil),mean)
> means
   1  2
1  9 16
2 14 31
3 18 13
# Set up the axes and title of the
# profile plot.
> par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5)
> x.axis <- unique(carrot$Variety)
> matplot(c(1,3,1), c(0,40,10), type="n",
      xlab="Variety", ylab="Mean Time",
      main="Average Time to Carrot Seed Germination")
# Add a profile for each soil type
> matlines(x.axis,means,type='l',
      lty=c(1,3),lwd=3)
# Plot points for the cell means
> matpoints(x.axis,means, pch=c(1,16))
# Add a legend to the plot
> legend(2.,38.6,
      legend=c('Soil Type 1','Soil Type 2'),
      lty=c(1,3),bty='n')
[Figure: profile plot "Average Time to Carrot Seed Germination", Mean Time (0 to 40) versus Variety (1 to 3), one profile for Soil Type 1 and one for Soil Type 2]

# Fit a model with main effects and interaction
# Compute both sets of Type I sums of squares
> lm.out1 <- lm(Days~Soil*Variety,data=carrot)
> anova(lm.out1)
Analysis of Variance Table

Response: Days

Terms added sequentially (first to last)
             Df Sum of Sq  Mean Sq  F Value  Pr(F)
Soil          1    52.500  52.5000 3.937500 0.0785
Variety       2   124.734  62.3670 4.677527 0.0405
Soil:Variety  2   222.766 111.3830 8.353723 0.0089
Residuals     9   120.000  13.3333
> lm.out2 <- lm(Days~Variety*Soil,data=carrot)
> anova(lm.out2)
Analysis of Variance Table

Response: Days

Terms added sequentially (first to last)
             Df Sum of Sq  Mean Sq  F Value  Pr(F)
Variety       2   93.3333  46.6667 3.500000 0.0751
Soil          1   83.9007  83.9007 6.292553 0.0334
Variety:Soil  2  222.7660 111.3830 8.353723 0.0089
Residuals     9  120.0000  13.3333

# Create a data frame containing the original
# data and the residuals and estimated means
> data.frame(carrot$Soil,carrot$Variety,carrot$Days,
      Pred=lm.out1$fitted,
      Resid=round(lm.out1$resid,3))
   X1 X2 X3 Pred Resid
1   1  1  6    9    -3
2   1  1 10    9     1
3   1  1 11    9     2
4   1  2 13   14    -1
5   1  2 15   14     1
6   1  3 14   18    -4
7   1  3 22   18     4
8   2  1 12   16    -4
9   2  1 15   16    -1
10  2  1 19   16     3
11  2  1 18   16     2
12  2  2 31   31     0
13  2  3 18   13     5
14  2  3  9   13    -4
15  2  3 12   13    -1
# Create residual plots
frame( )
par(cex=1.0,mex=1.0,lwd=3,pch=2,
    mkh=0.1,fig=c(0,1,.51,1), pty='m')
plot(lm.out1$fitted, lm.out1$resid,
    xlab="Estimated Means",
    ylab="Residuals")
abline(h=0, lty=2, lwd=3)
par(fig=c(0, 1, 0, 0.49), pty='s')
qqnorm(lm.out1$resid)
qqline(lm.out1$resid)

[Figure: upper panel, Residuals versus Estimated Means; lower panel, normal probability plot of lm.out1$resid versus Quantiles of Standard Normal]
# Create plots for studentized residuals
# You must attach the MASS library
# to have access to the studres( )
# function that computes studentized
# residuals in the following code
library(MASS)
frame( )
par(cex=1.0,mex=1.0,lwd=3,pch=2,
    mkh=0.1,fin=c(6.5,6.5))
plot(lm.out1$fitted, studres(lm.out1),
    xlab="Estimated Means",
    ylab="Studentized Residuals",
    main="Studentized Residual Plot")
abline(h=0, lty=2, lwd=3)

[Figure: "Studentized Residual Plot", Studentized Residuals versus Estimated Means]

qqnorm(studres(lm.out1),
    main="Studentized Residuals")
qqline(studres(lm.out1))
[Figure: "Studentized Residuals", normal probability plot of studres(lm.out1) versus Quantiles of Standard Normal]

# By default S uses so-called Helmert contrast
# matrices for unordered factors and orthogonal
# polynomial contrast matrices for ordered factors.
> lm.out <- lm(Days~Soil*Variety,data=carrot)
> model.matrix(lm.out)
   (Int) Soil Variety1 Variety2 SoilVariety1 SoilVariety2
1      1   -1       -1       -1            1            1
2      1   -1       -1       -1            1            1
3      1   -1       -1       -1            1            1
4      1   -1        1       -1           -1            1
5      1   -1        1       -1           -1            1
6      1   -1        0        2            0           -2
7      1   -1        0        2            0           -2
8      1    1       -1       -1           -1           -1
9      1    1       -1       -1           -1           -1
10     1    1       -1       -1           -1           -1
11     1    1       -1       -1           -1           -1
12     1    1        1       -1            1           -1
13     1    1        0        2            0            2
14     1    1        0        2            0            2
15     1    1        0        2            0            2
> summary(lm.out)

Call: lm(formula = Days ~ Soil * Variety, data = carrot)
Residuals:
 Min 1Q     Median 3Q Max
  -4 -2 9.992e-016  2   5

Coefficients:
               Value Std. Error t value Pr(>|t|)
 (Intercept) 16.8333     1.0393 16.1960   0.0000
        Soil  3.1667     1.0393  3.0468   0.0139
    Variety1  5.0000     1.3176  3.7947   0.0043
    Variety2 -0.6667     0.7082 -0.9414   0.3711
SoilVariety1  2.5000     1.3176  1.8974   0.0903
SoilVariety2 -2.8333     0.7082 -4.0008   0.0031

Residual standard error: 3.651 on 9 degrees of freedom
Multiple R-Squared: 0.7692
F-statistic: 6 on 5 and 9 degrees of freedom, the p-value is 0.01031

> anova(lm.out)
Analysis of Variance Table

Response: Days

Terms added sequentially (first to last)
             Df Sum of Sq  Mean Sq  F Value  Pr(F)
Soil          1    52.500  52.5000 3.937500 0.0785
Variety       2   124.734  62.3670 4.677527 0.0405
Soil:Variety  2   222.766 111.3830 8.353723 0.0089
Residuals     9   120.000  13.3333

# The default contrast matrices can be changed by
# resetting the contrasts options. The contr.sum
# option restricts parameters to sum to zero
# across the levels of any single factor.
> options(contrasts=c('contr.sum','contr.poly'))
> unlist(options())
# Now, ``contr.sum'' will be used for unordered
# factors and orthogonal polynomial contrast
# matrices will be used for ordered factors.
> lm.out <- lm(Days~Soil*Variety,data=carrot)
> model.matrix(lm.out)
   (Int) Soil Variety1 Variety2 SoilVariety1 SoilVariety2
1      1    1        1        0            1            0
2      1    1        1        0            1            0
3      1    1        1        0            1            0
4      1    1        0        1            0            1
5      1    1        0        1            0            1
6      1    1       -1       -1           -1           -1
7      1    1       -1       -1           -1           -1
8      1   -1        1        0           -1            0
9      1   -1        1        0           -1            0
10     1   -1        1        0           -1            0
11     1   -1        1        0           -1            0
12     1   -1        0        1            0           -1
13     1   -1       -1       -1            1            1
14     1   -1       -1       -1            1            1
15     1   -1       -1       -1            1            1

> summary(lm.out)

Call: lm(formula = Days ~ Soil * Variety, data = carrot)
Residuals:
 Min 1Q     Median 3Q Max
  -4 -2 5.551e-016  2   5

Coefficients:
               Value Std. Error t value Pr(>|t|)
 (Intercept) 16.8333     1.0393 16.1960   0.0000
        Soil -3.1667     1.0393 -3.0468   0.0139
    Variety1 -4.3333     1.3147 -3.2961   0.0093
    Variety2  5.6667     1.6574  3.4190   0.0076
SoilVariety1 -0.3333     1.3147 -0.2535   0.8055
SoilVariety2 -5.3333     1.6574 -3.2179   0.0105

Residual standard error: 3.651 on 9 degrees of freedom
Multiple R-Squared: 0.7692
F-statistic: 6 on 5 and 9 degrees of freedom, the p-value is 0.01031
> anova(lm.out)
Analysis of Variance Table

Response: Days

Terms added sequentially (first to last)
             Df Sum of Sq  Mean Sq  F Value  Pr(F)
Soil          1    52.500  52.5000 3.937500 0.0785
Variety       2   124.734  62.3670 4.677527 0.0405
Soil:Variety  2   222.766 111.3830 8.353723 0.0089
Residuals     9   120.000  13.3333

# Compute Type III sums of squares and F-tests.
# First create the model matrix for
# the cell means model.
> cb <- as.factor(10*as.numeric(carrot$Soil) + as.numeric(carrot$Variety))
> lm.out <- lm(carrot$Days ~ cb - 1)
> D <- model.matrix(lm.out)
> D
   cb11 cb12 cb13 cb21 cb22 cb23
1     1    0    0    0    0    0
2     1    0    0    0    0    0
3     1    0    0    0    0    0
4     0    1    0    0    0    0
5     0    1    0    0    0    0
6     0    0    1    0    0    0
7     0    0    1    0    0    0
8     0    0    0    1    0    0
9     0    0    0    1    0    0
10    0    0    0    1    0    0
11    0    0    0    1    0    0
12    0    0    0    0    1    0
13    0    0    0    0    0    1
14    0    0    0    0    0    1
15    0    0    0    0    0    1
# Compute the sample means
> y <- matrix(carrot$Days,ncol=1)
> b <- solve(crossprod(D)) %*% crossprod(D,y)
> b
     [,1]
cb11    9
cb12   14
cb13   18
cb21   16
cb22   31
cb23   13
# Compute Type III sums of squares and
# related F-tests
> s <- length(unique(carrot$Soil))
> t <- length(unique(carrot$Variety))
> yhat <- D %*% b
> sse <- crossprod(y-yhat)
> df2 <- nrow(y) - s*t
# Generate an identity matrix and
# a vector of ones
> Iden <- function(n) diag(rep(1,n))
> one <- function(n) matrix(rep(1,n),ncol=1)
> c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
      t(one(t)) )
> q1 <- t(b) %*% t(c1)%*%
      solve( c1 %*% solve(crossprod(D))
      %*% t(c1))%*% c1 %*% b
> df1<- s-1
> f <- (q1/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c1
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1   -1   -1   -1
> data.frame(SS=q1,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 123.7714  1 9.282857 0.01386499

> c2 <- kronecker( t(one(s)), cbind(Iden(t-1),
      -one(t-1)) )
> q2 <- t(b) %*% t(c2)%*%
      solve( c2 %*% solve(crossprod(D))
      %*% t(c2))%*% c2 %*% b
> df1<- t-1
> f <- (q2/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c2
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0   -1    1    0   -1
[2,]    0    1   -1    0    1   -1
> data.frame(SS=q2,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 192.1277  2 7.204787 0.01354629

> c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
      cbind(Iden(t-1),-one(t-1)) )
> q3 <- t(b) %*% t(c3)%*%
      solve( c3 %*% solve(crossprod(D))
      %*% t(c3))%*% c3 %*% b
> df1<- (s-1)*(t-1)
> f <- (q3/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c3
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0   -1   -1    0    1
[2,]    0    1   -1    0   -1    1
> data.frame(SS=q3,df=df1,F.stat=f,p.value=p)
       SS df   F.stat    p.value
1 222.766  2 8.353723 0.00888845
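The following gives an independent cross-check on the Type III F-tests above, done in R rather than S-PLUS. The model is refit with sum-to-zero contrasts, and each term's columns are dropped in turn from the full model matrix. This gives the same sums of squares. `drop1()` with scope `. ~ .` removes one term at a time without respecting marginality; combined with `contr.sum` coding, this is one common way to obtain Type III sums of squares. It is offered as a sketch, not as the method used in these notes.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

# Sum-to-zero coding, as with options(contrasts = c('contr.sum', 'contr.poly'))
fit <- lm(days ~ soil * variety,
          contrasts = list(soil = "contr.sum", variety = "contr.sum"))

# Drop each term's columns (main effects included) and compare residual SS
type3 <- drop1(fit, . ~ ., test = "F")
print(type3)
```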
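What null hypotheses are tested by F-tests derived from such ANOVA tables (Type I sums of squares in SAS)?

$$R(\mu) = Y^T P_1 Y = Y^T P_1 P_1 Y = (P_1 Y)^T (P_1 Y) = (\bar{Y}_{\cdot\cdot\cdot}\mathbf{1})^T (\bar{Y}_{\cdot\cdot\cdot}\mathbf{1}) = n_{\cdot\cdot}\bar{Y}_{\cdot\cdot\cdot}^2$$

and

$$\frac{1}{\sigma^2} R(\mu) \sim \chi^2_1(\delta^2), \qquad F = \frac{R(\mu)}{SSE/(n_{\cdot\cdot} - ab)} \sim F_{(1,\, n_{\cdot\cdot} - ab)}(\delta^2)$$

where

$$\delta^2 = \frac{1}{\sigma^2}\beta^T X^T P_1 X \beta = \frac{1}{\sigma^2}(\beta^T X^T P_1)(P_1 X \beta) = \frac{1}{\sigma^2}(P_1 X \beta)^T (P_1 X \beta).$$

For the carrot seed germination study, $P_1 X\beta = \mathbf{1}\,\frac{1}{n_{\cdot\cdot}}\mathbf{1}^T X\beta$, where

$$\frac{1}{n_{\cdot\cdot}}\mathbf{1}^T X = \frac{1}{n_{\cdot\cdot}}[n_{\cdot\cdot}, n_{1\cdot}, n_{2\cdot}, n_{\cdot 1}, n_{\cdot 2}, n_{\cdot 3}, n_{11}, n_{12}, n_{13}, n_{21}, n_{22}, n_{23}]$$

so that

$$\frac{1}{n_{\cdot\cdot}}\mathbf{1}^T X\beta = \frac{1}{n_{\cdot\cdot}}\left[n_{\cdot\cdot}\mu + \sum_{i=1}^{a} n_{i\cdot}\alpha_i + \sum_{j=1}^{b} n_{\cdot j}\beta_j + \sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}\gamma_{ij}\right].$$

The null hypothesis is

$$H_0: 0 = n_{\cdot\cdot}\mu + \sum_{i=1}^{a} n_{i\cdot}\alpha_i + \sum_{j=1}^{b} n_{\cdot j}\beta_j + \sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}\gamma_{ij}.$$

With respect to the cell means $E(Y_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, this null hypothesis is

$$H_0: 0 = \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}}\mu_{ij}.$$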
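For the carrot data these expressions are easy to evaluate, as the R sketch below shows. It checks that $R(\mu) = n_{\cdot\cdot}\bar{Y}_{\cdot\cdot\cdot}^2$. It also checks that the quantity being tested, $\sum_i\sum_j (n_{ij}/n_{\cdot\cdot})\mu_{ij}$, is estimated by the grand mean $\bar{Y}_{\cdot\cdot\cdot}$ rather than by the unweighted average of the cell means. Variable names are illustrative.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

n.dd <- length(days)                      # n..
P1   <- matrix(1 / n.dd, n.dd, n.dd)      # P_1 = 1(1'1)^{-1}1'
R.mu <- drop(t(days) %*% P1 %*% days)     # R(mu) = Y'P_1 Y

n.ij <- table(soil, variety)
ybar <- tapply(days, list(soil, variety), mean)
weighted   <- sum(n.ij / n.dd * ybar)     # sum_ij (n_ij / n..) Ybar_ij.
unweighted <- mean(ybar)                  # (1/ab) sum_ij Ybar_ij.

# F statistic for R(mu), with SSE the pooled within-cell sum of squares
SSE  <- sum((days - ave(days, soil, variety))^2)
df.e <- n.dd - nlevels(soil) * nlevels(variety)
F.mu <- R.mu / (SSE / df.e)
print(c(R.mu = R.mu, weighted = weighted, unweighted = unweighted, F = F.mu))
```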
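Consider

$$R(\alpha \mid \mu) = Y^T (P_{\mu,\alpha} - P_\mu) Y$$

and

$$F = \frac{R(\alpha \mid \mu)/(a - 1)}{MSE} \sim F_{(a-1,\, n_{\cdot\cdot} - ab)}(\delta^2).$$

Here,

$$\frac{1}{\sigma^2} R(\alpha \mid \mu) \sim \chi^2_{a-1}(\delta^2)$$

where $a - 1 = \text{rank}(X_{\mu,\alpha}) - \text{rank}(X_\mu)$ and

$$\delta^2 = \frac{1}{\sigma^2}\beta^T X^T (P_{\mu,\alpha} - P_\mu) X\beta = \frac{1}{\sigma^2}[(P_{\mu,\alpha} - P_\mu) X\beta]^T [(P_{\mu,\alpha} - P_\mu) X\beta].$$

For the general effects model for the carrot seed germination study, $P_{\mu,\alpha} X = X_{\mu,\alpha}(X_{\mu,\alpha}^T X_{\mu,\alpha})^- X_{\mu,\alpha}^T X$. Here $X_{\mu,\alpha}$ consists of 7 rows equal to [1 1 0] followed by 8 rows equal to [1 0 1], and

$$X_{\mu,\alpha}^T X_{\mu,\alpha} = \begin{bmatrix} n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot}\\ n_{1\cdot} & n_{1\cdot} & 0\\ n_{2\cdot} & 0 & n_{2\cdot} \end{bmatrix}, \qquad (X_{\mu,\alpha}^T X_{\mu,\alpha})^- = \begin{bmatrix} 0 & 0 & 0\\ 0 & n_{1\cdot}^{-1} & 0\\ 0 & 0 & n_{2\cdot}^{-1} \end{bmatrix},$$

$$X_{\mu,\alpha}^T X = \begin{bmatrix}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{11} & n_{12} & n_{13} & n_{21} & n_{22} & n_{23}\\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} & n_{11} & n_{12} & n_{13} & 0 & 0 & 0\\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} & 0 & 0 & 0 & n_{21} & n_{22} & n_{23}
\end{bmatrix}.$$

Hence each of the first seven rows of $P_{\mu,\alpha} X$ is

$$\left[1,\ 1,\ 0,\ \tfrac{n_{11}}{n_{1\cdot}},\ \tfrac{n_{12}}{n_{1\cdot}},\ \tfrac{n_{13}}{n_{1\cdot}},\ \tfrac{n_{11}}{n_{1\cdot}},\ \tfrac{n_{12}}{n_{1\cdot}},\ \tfrac{n_{13}}{n_{1\cdot}},\ 0,\ 0,\ 0\right]$$

and each of the last eight rows is

$$\left[1,\ 0,\ 1,\ \tfrac{n_{21}}{n_{2\cdot}},\ \tfrac{n_{22}}{n_{2\cdot}},\ \tfrac{n_{23}}{n_{2\cdot}},\ 0,\ 0,\ 0,\ \tfrac{n_{21}}{n_{2\cdot}},\ \tfrac{n_{22}}{n_{2\cdot}},\ \tfrac{n_{23}}{n_{2\cdot}}\right],$$

while every row of $P_\mu X$ is $\frac{1}{n_{\cdot\cdot}}[n_{\cdot\cdot}, n_{1\cdot}, n_{2\cdot}, n_{\cdot 1}, n_{\cdot 2}, n_{\cdot 3}, n_{11}, n_{12}, n_{13}, n_{21}, n_{22}, n_{23}]$.

Then, the first seven rows of $(P_{\mu,\alpha} - P_\mu) X\beta$ are

$$\left[\mu + \alpha_1 + \sum_{j=1}^{b}\frac{n_{1j}}{n_{1\cdot}}(\beta_j + \gamma_{1j})\right] - \left[\mu + \sum_{i=1}^{a}\frac{n_{i\cdot}}{n_{\cdot\cdot}}\alpha_i + \sum_{j=1}^{b}\frac{n_{\cdot j}}{n_{\cdot\cdot}}\beta_j + \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{n_{ij}}{n_{\cdot\cdot}}\gamma_{ij}\right]$$

and the last eight rows are

$$\left[\mu + \alpha_2 + \sum_{j=1}^{b}\frac{n_{2j}}{n_{2\cdot}}(\beta_j + \gamma_{2j})\right] - \left[\mu + \sum_{i=1}^{a}\frac{n_{i\cdot}}{n_{\cdot\cdot}}\alpha_i + \sum_{j=1}^{b}\frac{n_{\cdot j}}{n_{\cdot\cdot}}\beta_j + \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{n_{ij}}{n_{\cdot\cdot}}\gamma_{ij}\right].$$

The null hypothesis is

$$H_0: \alpha_i + \sum_{j=1}^{b}\frac{n_{ij}}{n_{i\cdot}}(\beta_j + \gamma_{ij}) \text{ are all equal } (i = 1, \ldots, a).$$

With respect to the cell means model, $\mu_{ij} = E(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$, this null hypothesis is

$$H_0: \sum_{j=1}^{b}\frac{n_{ij}}{n_{i\cdot}}\mu_{ij} \text{ are all equal } (i = 1, \ldots, a).$$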
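To make the last hypothesis concrete, note that for the carrot data $\sum_j (n_{ij}/n_{i\cdot})\bar{Y}_{ij\cdot}$ is simply the raw mean of all observations on soil type i. The Type I test for soils (entered first) therefore compares 13 with 16.75. By contrast, the Type III contrast $c_1 = [1\ 1\ 1\ {-1}\ {-1}\ {-1}]$ used earlier compares the unweighted means (13.67 and 20). The R sketch below verifies this and recomputes $R(\alpha \mid \mu) = 52.5$ as $\sum_i n_{i\cdot}(\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$. The names are ours.

```r
soil    <- factor(rep(1:2, times = c(7, 8)))
variety <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3))
days    <- c(6, 10, 11, 13, 15, 14, 22, 12, 15, 19, 18, 31, 18, 9, 12)

n.ij <- table(soil, variety)
ybar <- tapply(days, list(soil, variety), mean)
n.i  <- rowSums(n.ij)                                # n_i.

# sum_j (n_ij / n_i.) Ybar_ij.  (what R(alpha | mu) compares)
weighted.soil   <- rowSums(n.ij * ybar) / n.i
# (1/b) sum_j Ybar_ij.          (what the Type III test compares)
unweighted.soil <- rowMeans(ybar)

# R(alpha | mu) = sum_i n_i. (weighted mean_i - grand mean)^2
R.alpha <- sum(n.i * (weighted.soil - mean(days))^2)
print(rbind(weighted = weighted.soil, unweighted = unweighted.soil))
print(R.alpha)
```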