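Comments:

For $R(\alpha \mid \mu)$:

(i) $\sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij}$ may not be equal for all $i = 1, \ldots, a$, even though $\frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$ are equal for all $i = 1, \ldots, a$.

(ii) $\sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij}$ may be equal for all $i = 1, \ldots, a$, even though $\frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$ are not equal for some $i = 1, \ldots, a$.

Consider
$$R(\beta \mid \mu, \alpha) = Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y$$
and the corresponding F-statistic
$$F = \frac{R(\beta \mid \mu, \alpha)/(b-1)}{MSE} \sim F_{(b-1,\; n_{\cdot\cdot}-ab)}(\delta^2)$$
Here,
$$\frac{1}{\sigma^2} R(\beta \mid \mu, \alpha) \sim \chi^2_{\mathrm{rank}(X_{\mu,\alpha,\beta}) - \mathrm{rank}(X_{\mu,\alpha})}(\delta^2)$$
with
$$[1 + (a-1) + (b-1)] - [1 + (a-1)] = b - 1 \text{ degrees of freedom}$$
and
$$\delta^2 = \frac{1}{\sigma^2} [(P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) X\beta]^T [(P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) X\beta]$$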
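$$P_{\mu,\alpha,\beta} = X_{\mu,\alpha,\beta} \, [X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta}]^{-} \, X_{\mu,\alpha,\beta}^T$$
where (for a = 2 and b = 3)
$$X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta} =
\begin{bmatrix}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} \\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} \\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} \\
n_{\cdot 1} & n_{11} & n_{21} & n_{\cdot 1} & 0 & 0 \\
n_{\cdot 2} & n_{12} & n_{22} & 0 & n_{\cdot 2} & 0 \\
n_{\cdot 3} & n_{13} & n_{23} & 0 & 0 & n_{\cdot 3}
\end{bmatrix}
= \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$$
(call the upper left 3 x 3 block A, the upper right block B, and the lower right diagonal block C). A generalized inverse is
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}^{-}
= \begin{bmatrix} A^{-} & 0 \\ 0 & 0 \end{bmatrix}
+ \begin{bmatrix} -A^{-} B \\ I \end{bmatrix} [C - B^T A^{-} B]^{-} \begin{bmatrix} -B^T A^{-} & I \end{bmatrix}$$
$$= \begin{bmatrix} 0 & 0 \\ 0 & C^{-1} \end{bmatrix}
+ \begin{bmatrix} I \\ -C^{-1} B^T \end{bmatrix} [A - B C^{-1} B^T]^{-} \begin{bmatrix} I & -B C^{-1} \end{bmatrix}$$
$$= \begin{bmatrix} W & -W B C^{-1} \\ -C^{-1} B^T W & C^{-1} + C^{-1} B^T W B C^{-1} \end{bmatrix}$$
where $W = [A - B C^{-1} B^T]^{-}$.

The null hypothesis is
$$H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} (\beta_j + \gamma_{ij}) - \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \left( \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} (\beta_k + \gamma_{ik}) \right) = 0 \quad \text{for all } j = 1, \ldots, b$$

With respect to the cell means, $E(Y_{ijk}) = \mu_{ij}$, this null hypothesis is
$$H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij} - \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \left( \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} \mu_{ik} \right) = 0 \quad \text{for all } j = 1, 2, \ldots, b.$$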
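Consider
$$R(\gamma \mid \mu, \alpha, \beta) = Y^T [P_X - P_{\mu,\alpha,\beta}] Y$$
and the associated F-statistic
$$F = \frac{R(\gamma \mid \mu, \alpha, \beta)/[(a-1)(b-1)]}{MSE} \sim F_{((a-1)(b-1),\; n_{\cdot\cdot}-ab)}(\delta^2)$$
The null hypothesis is:
$$H_0: (\mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell}) = (\gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell}) = 0 \quad \text{for all } (i,j) \text{ and } (k,\ell).$$

Type I sums of squares

Source of                            Sums of                                         Mean
variation     d.f.                   squares                                         square   F      p-value
Soil types    a-1 = 1                $R(\alpha \mid \mu)$ = 52.5                     52.5     3.94   .0785
Var.          b-1 = 2                $R(\beta \mid \mu, \alpha)$ = 124.73            62.4     4.68   .0405
Interaction   (a-1)(b-1) = 2         $R(\gamma \mid \mu, \alpha, \beta)$ = 222.76    111.38   8.35   .0089
Resid.        $n_{\cdot\cdot}$-ab = 9    $Y^T (I - P_X) Y$ = 120                     13.33
Corr. total   $n_{\cdot\cdot}$-1 = 14    $Y^T (I - P_1) Y$ = 520
Corr. for
the mean      1                      $R(\mu)$ = 3375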
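The first two entries of the Type I table can be checked by hand, since $R(\mu) = n_{\cdot\cdot} \bar{Y}_{\cdot\cdot\cdot}^2$ and $R(\alpha \mid \mu) = \sum_i n_{i\cdot} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$ depend on the data only through the cell means and cell sizes. The sketch below assumes the germination cell means (9, 14, 18 for soil type 1; 16, 31, 13 for soil type 2) and cell sizes (3, 2, 2 and 4, 1, 3) that appear elsewhere in these notes; the function name `type1_first_terms` is illustrative and not part of the notes.

```python
# Cell means and cell sizes for the germination example
# (rows: soil types i = 1, 2; columns: varieties j = 1, 2, 3).
means = [[9.0, 14.0, 18.0], [16.0, 31.0, 13.0]]
n = [[3, 2, 2], [4, 1, 3]]


def type1_first_terms(means, n):
    """Return (R(mu), R(alpha | mu)) from cell means and cell sizes."""
    row_n = [sum(row) for row in n]
    # Row totals are sums of n_ij * cell mean, so row means are WEIGHTED means.
    row_total = [sum(nij * m for nij, m in zip(nr, mr))
                 for nr, mr in zip(n, means)]
    n_total = sum(row_n)
    grand_mean = sum(row_total) / n_total
    r_mu = n_total * grand_mean ** 2
    r_alpha = sum(ni * (ti / ni - grand_mean) ** 2
                  for ni, ti in zip(row_n, row_total))
    return r_mu, r_alpha


r_mu, r_alpha = type1_first_terms(means, n)
print(round(r_mu, 2), round(r_alpha, 2))
```

The row means used here are the weighted means $\sum_j (n_{ij}/n_{i\cdot}) \bar{Y}_{ij\cdot}$ (13.0 and 16.75), not the unweighted means (13.67 and 20.0), which is why $R(\alpha \mid \mu)$ tests the weighted hypothesis discussed in the comments.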
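ANOVA Summary:

Sums of squares and associated null hypotheses (factor A fitted first)

$R(\mu)$:
$$H_0: \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j + \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} = 0$$
$$\text{or } H_0: \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij} = 0$$

$R(\alpha \mid \mu)$:
$$H_0: \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} (\beta_j + \gamma_{ij}) \text{ are equal for all } i = 1, \ldots, a$$
$$\text{or } H_0: \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij} \text{ are equal for all } i = 1, \ldots, a$$

$R(\beta \mid \mu, \alpha)$:
$$H_0: \beta_j + \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \gamma_{ij} - \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \left[ \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} (\beta_k + \gamma_{ik}) \right] = 0 \text{ for all } j = 1, \ldots, b$$
$$\text{or } H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij} = \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \left[ \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} \mu_{ik} \right] \text{ for all } j = 1, \ldots, b$$

$R(\gamma \mid \mu, \alpha, \beta)$:
$$H_0: \gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)$$
$$\text{or } H_0: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)$$

Sums of squares and associated null hypotheses (factor B fitted first)

$R(\mu)$:
$$H_0: \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j + \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} = 0$$
$$\text{or } H_0: \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij} = 0$$

$R(\beta \mid \mu)$:
$$H_0: \beta_j + \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} (\alpha_i + \gamma_{ij}) \text{ are equal for all } j = 1, \ldots, b$$
$$\text{or } H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij} \text{ are equal for all } j = 1, \ldots, b$$

$R(\alpha \mid \mu, \beta)$:
$$H_0: \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \gamma_{ij} - \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \left[ \sum_{k=1}^{a} \frac{n_{kj}}{n_{\cdot j}} (\alpha_k + \gamma_{kj}) \right] = 0 \text{ for all } i = 1, \ldots, a$$
$$\text{or } H_0: \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij} = \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \left[ \sum_{k=1}^{a} \frac{n_{kj}}{n_{\cdot j}} \mu_{kj} \right] \text{ for all } i = 1, \ldots, a$$

$R(\gamma \mid \mu, \alpha, \beta)$:
$$H_0: \gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)$$
$$\text{or } H_0: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)$$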
Type I sums of squares (soils fitted first)

Source of                                 Sums of                                            Mean
variation     d.f.                        squares                                            square   F      p-value
"Soils"       a-1 = 1                     $R(\alpha \mid \mu)$ = 52.50                       52.5     3.94   .0785
"Var."        b-1 = 2                     $R(\beta \mid \mu, \alpha)$ = 124.73               62.4     4.68   .0405
Interaction   (a-1)(b-1) = 2              $R(\gamma \mid \mu, \alpha, \beta)$ = 222.76       111.38   8.35   .0089
"Res."        $\sum\sum (n_{ij}-1)$ = 9   $Y^T (I - P_X) Y$ = 120.00                         13.33
Corr. total   $n_{\cdot\cdot}$-1 = 14     $Y^T (I - P_1) Y$ = 520.00

Type I sums of squares (varieties fitted first)

Source of                                 Sums of                                            Mean
variation     d.f.                        squares                                            square   F      p-value
"Var."        b-1 = 2                     $R(\beta \mid \mu)$ = 93.33                        46.67    3.50   .0751
"Soils"       a-1 = 1                     $R(\alpha \mid \mu, \beta)$ = 83.90                83.90    6.29   .0334
Interaction   (a-1)(b-1) = 2              $R(\gamma \mid \mu, \alpha, \beta)$ = 222.76       111.38   8.35   .0089
"Res."        $\sum\sum (n_{ij}-1)$ = 9   $Y^T (I - P_X) Y$ = 120.00                         13.33
Corr. total   $n_{\cdot\cdot}$-1 = 14     $Y^T (I - P_1) Y$ = 520.00

Type II sums of squares: (SAS)

Source of                                 Sums of                                            Mean
variation     d.f.                        squares                                            square   F     p-value
"Soils"       a-1 = 1                     $R(\alpha \mid \mu, \beta)$ = 83.90                83.90    6.3   .0339
"Var."        b-1 = 2                     $R(\beta \mid \mu, \alpha)$ = 124.73               62.37    4.7   .0405
Interaction   (a-1)(b-1) = 2              $R(\gamma \mid \mu, \alpha, \beta)$ = 222.76       111.38   8.4   .0089
"Res."        $n_{\cdot\cdot}$-ab = 9     $Y^T (I - P_X) Y$ = 120                            13.33
Corr. total   $n_{\cdot\cdot}$-1          $Y^T (I - P_1) Y$ = 520
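One way to see the practical difference between the tables above is to add up each column of sums of squares. The sketch below uses only the rounded values printed in the three tables; the helper name `gap_from_total` is illustrative. Either set of Type I sums of squares, together with the residual sum of squares, reproduces the corrected total (up to rounding), while the Type II sums of squares for these unbalanced data do not.

```python
# Rounded values copied from the three ANOVA tables above.
SSE = 120.00
CORRECTED_TOTAL = 520.00

type1_soils_first = [52.50, 124.73, 222.76]  # R(a|m), R(b|m,a), R(g|m,a,b)
type1_vars_first = [93.33, 83.90, 222.76]    # R(b|m), R(a|m,b), R(g|m,a,b)
type2 = [83.90, 124.73, 222.76]              # R(a|m,b), R(b|m,a), R(g|m,a,b)


def gap_from_total(sums_of_squares):
    """Sum of the listed SS plus SSE, minus the corrected total SS."""
    return sum(sums_of_squares) + SSE - CORRECTED_TOTAL


for label, ss in [("Type I, soils first", type1_soils_first),
                  ("Type I, varieties first", type1_vars_first),
                  ("Type II", type2)]:
    print(label, round(gap_from_total(ss), 2))
```

The two Type I gaps are about -0.01 (rounding in the printed tables); the Type II gap is about 31.39, reflecting the lack of orthogonality in the unbalanced design.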
Examine the soil type effect on time to germination for each variety:

[Figure: "Average Time to Carrot Seed Germination". Mean time (vertical axis, 0 to 40) plotted against variety (1 to 3), with one profile for Soil Type 1 and one for Soil Type 2.]

                       Time to Germination
              Soil Type 1                        Soil Type 2
Variety   $\bar{Y}_{1j\cdot}$   $S_{\bar{Y}_{1j\cdot}}$   $\bar{Y}_{2j\cdot}$   $S_{\bar{Y}_{2j\cdot}}$   t       p-value
j = 1     9.0                   2.11                      16.0                  1.83                      -2.51   .0333
j = 2     14.0                  2.58                      31.0                  3.65                      -3.80   .0042
j = 3     18.0                  2.58                      13.0                  2.11                      1.50    .1679

Time to germination for variety 2 is shorter in soil type 1.

Time to germination for variety 1 may also be shorter in soil type 1.

For variety 3 there is no significant difference in average germination times for the two soil types.
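The t statistics in the germination table can be reproduced from the cell means, the cell sizes, and MSE = 120/9, using $t = (\bar{Y}_{1j\cdot} - \bar{Y}_{2j\cdot}) / \sqrt{MSE (1/n_{1j} + 1/n_{2j})}$. The following is a minimal sketch; the cell sizes (3, 2, 2 for soil type 1 and 4, 1, 3 for soil type 2) are taken from the observation subscripts used in the cell means model of these notes, and the function name `soil_t_statistic` is illustrative. The p-values are not recomputed because the Python standard library has no t distribution.

```python
import math

MSE = 120.0 / 9  # residual mean square on 9 degrees of freedom

# (cell mean, cell size) for soil type 1 and soil type 2, by variety j.
cells = {
    1: ((9.0, 3), (16.0, 4)),
    2: ((14.0, 2), (31.0, 1)),
    3: ((18.0, 2), (13.0, 3)),
}


def soil_t_statistic(mean1, n1, mean2, n2, mse=MSE):
    """t statistic comparing two cell means that share a pooled MSE."""
    std_err = math.sqrt(mse * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err


t_stats = {j: soil_t_statistic(m1, n1, m2, n2)
           for j, ((m1, n1), (m2, n2)) in cells.items()}
for j, t in t_stats.items():
    print(f"variety {j}: t = {t:.2f}")
```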
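In the previous analysis:
$$\bar{Y}_{ij\cdot} = \hat{\mu}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij}$$
is the OLS estimator (b.l.u.e.) for
$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$
with
$$S_{\bar{Y}_{ij\cdot}} = \sqrt{\frac{MSE}{n_{ij}}} \quad \text{for } i = 1, \ldots, a \text{ and } j = 1, \ldots, b$$
and
$$t = \frac{\bar{Y}_{1j\cdot} - \bar{Y}_{2j\cdot}}{\sqrt{MSE \left( \frac{1}{n_{1j}} + \frac{1}{n_{2j}} \right)}} \quad \text{for } j = 1, \ldots, b$$

Method of Unweighted Means
(Type III sums of squares in SAS when $n_{ij} > 0$ for all $(i,j)$).

Use the cell means reparameterization of the model:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} = \mu_{ij} + \epsilon_{ijk}$$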
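The model is $Y = D\mu + \epsilon$:
$$\underbrace{\begin{bmatrix}
Y_{111} \\ Y_{112} \\ Y_{113} \\ Y_{121} \\ Y_{122} \\ Y_{131} \\ Y_{132} \\ Y_{211} \\ Y_{212} \\ Y_{213} \\ Y_{214} \\ Y_{221} \\ Y_{231} \\ Y_{232} \\ Y_{233}
\end{bmatrix}}_{Y}
=
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{D}
\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{bmatrix}
+ \epsilon$$

The least squares estimator (b.l.u.e.) for $\mu$ is
$$b = \hat{\mu} = (D^T D)^{-1} D^T Y
= \begin{bmatrix}
n_{11}^{-1} & & & & & \\
& n_{12}^{-1} & & & & \\
& & n_{13}^{-1} & & & \\
& & & n_{21}^{-1} & & \\
& & & & n_{22}^{-1} & \\
& & & & & n_{23}^{-1}
\end{bmatrix}
\begin{bmatrix} Y_{11\cdot} \\ Y_{12\cdot} \\ Y_{13\cdot} \\ Y_{21\cdot} \\ Y_{22\cdot} \\ Y_{23\cdot} \end{bmatrix}
= \begin{bmatrix} \bar{Y}_{11\cdot} \\ \bar{Y}_{12\cdot} \\ \bar{Y}_{13\cdot} \\ \bar{Y}_{21\cdot} \\ \bar{Y}_{22\cdot} \\ \bar{Y}_{23\cdot} \end{bmatrix}$$
where $Y_{ij\cdot} = \sum_k Y_{ijk}$ is a cell total and $\bar{Y}_{ij\cdot} = Y_{ij\cdot}/n_{ij}$ is a cell mean.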
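The reason the least squares estimator reduces to the vector of cell means is that $D^T D$ is diagonal with the cell sizes on the diagonal. A small sketch that builds the cell means model matrix for the cell sizes of this example and checks this; the names `cell_means_design` and `sizes` are illustrative, not from the notes.

```python
# Cell sizes n11, n12, n13, n21, n22, n23 for the example above.
sizes = [3, 2, 2, 4, 1, 3]


def cell_means_design(sizes):
    """Indicator matrix D: one row per observation, one column per cell."""
    rows = []
    for cell, n_obs in enumerate(sizes):
        for _ in range(n_obs):
            rows.append([1 if col == cell else 0
                         for col in range(len(sizes))])
    return rows


D = cell_means_design(sizes)
k = len(sizes)
# D^T D computed entry by entry as sums of products of columns.
DtD = [[sum(row[p] * row[q] for row in D) for q in range(k)]
       for p in range(k)]

for row in DtD:
    print(row)
```

Because $D^T D = \mathrm{diag}(n_{11}, \ldots, n_{23})$, multiplying $D^T Y$ (the cell totals) by its inverse simply divides each cell total by its cell size.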
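Test the null hypothesis
$$H_0: \frac{1}{b} \sum_{j=1}^{b} \mu_{1j} = \frac{1}{b} \sum_{j=1}^{b} \mu_{2j} = \cdots = \frac{1}{b} \sum_{j=1}^{b} \mu_{aj}$$
vs.
$$H_A: \frac{1}{b} \sum_{j=1}^{b} \mu_{ij} \neq \frac{1}{b} \sum_{j=1}^{b} \mu_{kj} \quad \text{for some } i \neq k$$

The OLS estimator (b.l.u.e.) for $\frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$ is
$$\tilde{Y}_{i\cdot\cdot} = \frac{1}{b} \sum_{j=1}^{b} \bar{Y}_{ij\cdot}$$
with
$$Var(\tilde{Y}_{i\cdot\cdot}) = \frac{1}{b^2} \sum_{j=1}^{b} \frac{\sigma^2}{n_{ij}} = \frac{\sigma^2}{b^2} \sum_{j=1}^{b} \frac{1}{n_{ij}}$$

Express the null hypothesis in matrix form:
$$H_0: C_1 \mu = 0$$
where
$$C_1 = [\, I_{a-1} \mid -1_{a-1} \,] \otimes 1_b^T
= \begin{bmatrix}
1_b^T & & & -1_b^T \\
& \ddots & & \vdots \\
& & 1_b^T & -1_b^T
\end{bmatrix},
\qquad
\mu = \begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \vdots \\ \mu_{1b} \\ \mu_{21} \\ \vdots \\ \mu_{2b} \\ \vdots \\ \mu_{ab} \end{bmatrix}$$
so that
$$C_1 \mu = \begin{bmatrix}
\sum_j \mu_{1j} - \sum_j \mu_{aj} \\
\vdots \\
\sum_j \mu_{a-1,j} - \sum_j \mu_{aj}
\end{bmatrix}$$

Then
$$C_1 b = C_1 (D^T D)^{-1} D^T Y
= \begin{bmatrix}
\sum_j \bar{Y}_{1j\cdot} - \sum_j \bar{Y}_{aj\cdot} \\
\vdots \\
\sum_j \bar{Y}_{a-1,j\cdot} - \sum_j \bar{Y}_{aj\cdot}
\end{bmatrix}$$
is the OLS estimator (b.l.u.e.) of $C_1 \mu$, and
$$Var(C_1 b) = Var(C_1 (D^T D)^{-1} D^T Y)$$
$$= C_1 (D^T D)^{-1} D^T (\sigma^2 I) D (D^T D)^{-1} C_1^T$$
$$= \sigma^2 C_1 (D^T D)^{-1} D^T D (D^T D)^{-1} C_1^T$$
$$= \sigma^2 C_1 (D^T D)^{-1} C_1^T$$

Compute
$$SS_{H_0} = (C_1 b - 0)^T [C_1 (D^T D)^{-1} C_1^T]^{-1} (C_1 b - 0)$$
$$= Y^T D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T Y$$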
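Use result 4.7 to show
$$\frac{1}{\sigma^2} SS_{H_0} \sim \chi^2_{(a-1)}(\delta^2)$$
Check that
$$A = \frac{1}{\sigma^2} D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T (\sigma^2 I)$$
is idempotent and that
$$a - 1 = \mathrm{rank}(C_1 (D^T D)^{-1} C_1^T)$$

Compute:
$$SSE = Y^T (I - P_D) Y$$
where
$$P_D = D (D^T D)^{-1} D^T$$
Use result 4.7 to show
$$\frac{1}{\sigma^2} SSE \sim \chi^2_{\sum\sum (n_{ij}-1)}$$

Use result 4.8 to show that
$$SSE = Y^T \underbrace{(I - P_D)}_{\text{call this } A_1} Y$$
is distributed independently of
$$SS_{H_0} = Y^T \underbrace{D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T}_{\text{call this } A_2} Y$$
Check that
$$A_1 \Sigma A_2 = A_1 (\sigma^2 I) A_2 = \sigma^2 A_1 A_2$$
$$= \sigma^2 (I - P_D) D (D^T D)^{-1} C_1^T (C_1 (D^T D)^{-1} C_1^T)^{-1} C_1 (D^T D)^{-1} D^T = 0$$
This is true because $(I - P_D) D = 0$.

Then
$$F = \frac{SS_{H_0}/(a-1)}{SSE / \left( \sum\sum (n_{ij}-1) \right)} \sim F_{(a-1,\; \sum\sum (n_{ij}-1))}(\delta^2)$$
where
$$\delta^2 = \frac{1}{\sigma^2} \mu^T C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 \mu$$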
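The idempotency and rank checks requested above can be done in a few lines. The following is a sketch of one way to do it; the symbols $M$ and $G$ are labels introduced here for convenience and do not appear in the notes.

```latex
\begin{align*}
G &= C_1 (D^T D)^{-1} C_1^T, \qquad
M = D (D^T D)^{-1} C_1^T G^{-1} C_1 (D^T D)^{-1} D^T, \\
A &= \tfrac{1}{\sigma^2} M (\sigma^2 I) = M, \\
M M &= D (D^T D)^{-1} C_1^T G^{-1}
      \underbrace{C_1 (D^T D)^{-1} D^T D (D^T D)^{-1} C_1^T}_{=\,G}
      G^{-1} C_1 (D^T D)^{-1} D^T = M, \\
\mathrm{rank}(M) &= \mathrm{tr}(M)
      = \mathrm{tr}\!\left( G^{-1} C_1 (D^T D)^{-1} D^T D (D^T D)^{-1} C_1^T \right)
      = \mathrm{tr}(G^{-1} G) = a - 1 .
\end{align*}
```

The rank step uses the fact that the rank of an idempotent matrix equals its trace, together with the cyclic property of the trace; $G$ is $(a-1) \times (a-1)$ and nonsingular because $C_1$ has full row rank.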
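Reject
$$H_0: \frac{1}{b} \sum_{j=1}^{b} \mu_{1j} = \frac{1}{b} \sum_{j=1}^{b} \mu_{2j} = \cdots = \frac{1}{b} \sum_{j=1}^{b} \mu_{aj}$$
if
$$F = \frac{SS_{H_0}/(a-1)}{SSE / \left( \sum\sum (n_{ij}-1) \right)} > F_{(a-1,\; \sum\sum (n_{ij}-1)),\, \alpha}$$
or if
$$p\text{-value} = Pr\left\{ F_{(a-1,\; \sum\sum (n_{ij}-1))} > F \right\} < \alpha$$

Test
$$H_0: \frac{1}{a} \sum_{i=1}^{a} \mu_{i1} = \frac{1}{a} \sum_{i=1}^{a} \mu_{i2} = \cdots = \frac{1}{a} \sum_{i=1}^{a} \mu_{ib}$$
vs.
$$H_A: \frac{1}{a} \sum_{i=1}^{a} \mu_{ij} \neq \frac{1}{a} \sum_{i=1}^{a} \mu_{ik} \quad \text{for some } j \neq k$$

Write the null hypothesis in matrix form as
$$H_0: C_2 \mu = 0$$
where
$$C_2 = 1_a^T \otimes [\, I_{b-1} \mid -1_{b-1} \,]$$
and
$$C_2 \mu = C_2 \begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \vdots \\ \mu_{1b} \\ \mu_{21} \\ \vdots \\ \mu_{ab} \end{bmatrix}
= \begin{bmatrix}
\sum_{i=1}^{a} \mu_{i1} - \sum_{i=1}^{a} \mu_{ib} \\
\vdots \\
\sum_{i=1}^{a} \mu_{i,b-1} - \sum_{i=1}^{a} \mu_{ib}
\end{bmatrix}$$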
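Compute
$$SS_{H_0,2} = Y^T D (D^T D)^{-1} C_2^T [C_2 (D^T D)^{-1} C_2^T]^{-1} C_2 (D^T D)^{-1} D^T Y$$
and reject $H_0$ if
$$F = \frac{SS_{H_0,2}/(b-1)}{SSE / \left( \sum\sum (n_{ij}-1) \right)} > F_{(b-1,\; \sum\sum (n_{ij}-1)),\, \alpha}$$

Test for Interaction:

Test
$$H_0: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = 0 \quad \text{for all } (i,j) \text{ and } (k,\ell)$$
vs.
$$H_A: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} \neq 0 \quad \text{for some } i \neq k \text{ and } j \neq \ell.$$

Write the null hypothesis in matrix form as
$$H_0: C_3 \mu = 0$$
where
$$C_3 = [\, I_{a-1} \mid -1_{a-1} \,] \otimes [\, I_{b-1} \mid -1_{b-1} \,]$$

Compute
$$b = (D^T D)^{-1} D^T Y = \begin{bmatrix} \bar{Y}_{11\cdot} \\ \vdots \\ \bar{Y}_{ab\cdot} \end{bmatrix}$$
$$SS_{H_0,3} = (C_3 b - 0)^T [C_3 (D^T D)^{-1} C_3^T]^{-1} (C_3 b - 0)$$
$$= Y^T D (D^T D)^{-1} C_3^T [C_3 (D^T D)^{-1} C_3^T]^{-1} C_3 (D^T D)^{-1} D^T Y$$
and reject $H_0$ if
$$F = \frac{SS_{H_0,3}/((a-1)(b-1))}{SSE / \left( \sum\sum (n_{ij}-1) \right)} > F_{((a-1)(b-1),\; \sum\sum (n_{ij}-1)),\, \alpha}$$

PROC GLM in SAS reports these as Type III sums of squares.

Source of                       Sum of                      Mean
variation   d.f.                Squares                     Square   F      p-value
Soils       a-1 = 1             $SS_{H_0}$ = 123.77         123.77   9.28   .0139
Var.        b-1 = 2             $SS_{H_0,2}$ = 192.13       96.06    7.20   .0135
Inter.      (a-1)(b-1) = 2      $SS_{H_0,3}$ = 222.76       111.38   8.35   .0089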
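The two main-effect Type III sums of squares in the table above can be reproduced without any matrix algebra. Because the unweighted row means $\tilde{Y}_{i\cdot}$ are independent with variances $\sigma^2 / w_i$, the quadratic form for equality of their means is a weighted sum of squares about a weighted average, with $w_i = [\frac{1}{b^2} \sum_j 1/n_{ij}]^{-1}$. The sketch below assumes the germination cell means and cell sizes used in these notes; the function names are illustrative.

```python
# Germination cell means and cell sizes
# (rows: soil types i = 1, 2; columns: varieties j = 1, 2, 3).
means = [[9.0, 14.0, 18.0], [16.0, 31.0, 13.0]]
n = [[3, 2, 2], [4, 1, 3]]


def unweighted_means_ss(means, n):
    """SS for equality of the unweighted row means of a table of cell means."""
    b = len(means[0])
    y_tilde = [sum(row) / b for row in means]          # unweighted row means
    w = [b * b / sum(1.0 / nij for nij in row) for row in n]
    centre = sum(wi * yi for wi, yi in zip(w, y_tilde)) / sum(w)
    return sum(wi * (yi - centre) ** 2 for wi, yi in zip(w, y_tilde))


def transpose(table):
    """Swap rows and columns so the same function handles the other factor."""
    return [list(col) for col in zip(*table)]


ss_soils = unweighted_means_ss(means, n)
ss_varieties = unweighted_means_ss(transpose(means), transpose(n))
print(round(ss_soils, 2), round(ss_varieties, 2))
```

The interaction sum of squares involves correlated contrasts of cell means and is more naturally computed from the matrix form with $C_3$.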
Note that
$$Y^T P_1 Y$$
$$+ \; Y^T D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T Y$$
$$+ \; Y^T D (D^T D)^{-1} C_2^T [C_2 (D^T D)^{-1} C_2^T]^{-1} C_2 (D^T D)^{-1} D^T Y$$
$$+ \; Y^T D (D^T D)^{-1} C_3^T [C_3 (D^T D)^{-1} C_3^T]^{-1} C_3 (D^T D)^{-1} D^T Y$$
$$+ \; Y^T (I - P_D) Y$$
do not necessarily sum to $Y^T Y$, nor do the middle three terms ($SS_{H_0}$, $SS_{H_0,2}$, $SS_{H_0,3}$) necessarily sum to
$$SS_{model,corrected} = Y^T (P_D - P_1) Y;$$
nor are ($SS_{H_0}$, $SS_{H_0,2}$, $SS_{H_0,3}$) necessarily independent of each other.
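Note that
$$SS_{H_0} = \sum_{i=1}^{a} w_i \left[ \tilde{Y}_{i\cdot} - \frac{\sum_{k=1}^{a} w_k \tilde{Y}_{k\cdot}}{\sum_{k=1}^{a} w_k} \right]^2$$
where
$$\tilde{Y}_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} \bar{Y}_{ij\cdot}, \qquad
w_i = \left[ \frac{1}{b^2} \sum_{j=1}^{b} \frac{1}{n_{ij}} \right]^{-1} = \sigma^2 \left[ Var(\tilde{Y}_{i\cdot}) \right]^{-1}$$
and $\tilde{Y}_{i\cdot}$ is not necessarily equal to
$$\bar{Y}_{i\cdot\cdot} = \frac{\sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} Y_{ijk}}{\sum_{j=1}^{b} n_{ij}} = \frac{\sum_{j=1}^{b} n_{ij} \bar{Y}_{ij\cdot}}{\sum_{j=1}^{b} n_{ij}}$$

Furthermore,
$$SS_{H_0,2} = \sum_{j=1}^{b} w_j \left[ \tilde{Y}_{\cdot j} - \frac{\sum_{\ell=1}^{b} w_\ell \tilde{Y}_{\cdot \ell}}{\sum_{\ell=1}^{b} w_\ell} \right]^2$$
where
$$\tilde{Y}_{\cdot j} = \frac{1}{a} \sum_{i=1}^{a} \bar{Y}_{ij\cdot}, \qquad
w_j = \left[ \frac{1}{a^2} \sum_{i=1}^{a} \frac{1}{n_{ij}} \right]^{-1} = \sigma^2 \left[ Var(\tilde{Y}_{\cdot j}) \right]^{-1}$$
and $\tilde{Y}_{\cdot j}$ is not necessarily equal to
$$\bar{Y}_{\cdot j \cdot} = \frac{\sum_{i=1}^{a} \sum_{k=1}^{n_{ij}} Y_{ijk}}{\sum_{i=1}^{a} n_{ij}} = \frac{\sum_{i=1}^{a} n_{ij} \bar{Y}_{ij\cdot}}{\sum_{i=1}^{a} n_{ij}}$$

Balanced factorial experiments

$n_{ij} = n$ for $i = 1, \ldots, a$ and $j = 1, \ldots, b$

Example 8.2: Sugar Cane Yields
(from Snedecor and Cochran)

                          Nitrogen Level
             150 lb/acre     210 lb/acre     270 lb/acre
             Y111 = 70.5     Y121 = 67.3     Y131 = 79.9
Variety 1    Y112 = 67.5     Y122 = 75.9     Y132 = 72.8
             Y113 = 63.9     Y123 = 72.2     Y133 = 64.8
             Y114 = 64.2     Y124 = 60.5     Y134 = 86.3

             Y211 = 58.6     Y221 = 64.3     Y231 = 64.4
Variety 2    Y212 = 65.2     Y222 = 48.3     Y232 = 67.3
             Y213 = 70.2     Y223 = 74.0     Y233 = 78.0
             Y214 = 51.8     Y224 = 63.6     Y234 = 72.0

             Y311 = 65.8     Y321 = 64.1     Y331 = 56.3
Variety 3    Y312 = 68.3     Y322 = 64.8     Y332 = 54.7
             Y313 = 72.7     Y323 = 70.9     Y333 = 66.2
             Y314 = 67.6     Y324 = 58.3     Y334 = 54.4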
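For a balanced experiment ($n_{ij} = n$), Type I, Type II, and Type III sums of squares are the same:
$$R(\alpha \mid \mu) = R(\alpha \mid \mu, \beta) = SS_{H_0} = n b \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$$
$$R(\beta \mid \mu) = R(\beta \mid \mu, \alpha) = SS_{H_0,2} = n a \sum_{j=1}^{b} (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$$
$$R(\gamma \mid \mu, \alpha, \beta) = SS_{H_0,3} = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$$

ANOVA

Sum of Squares and associated null hypothesis

$R(\mu) = Y^T P_1 Y = a b n \bar{Y}_{\cdot\cdot\cdot}^2$:
$$H_0: \mu + \frac{1}{a} \sum_{i=1}^{a} \alpha_i + \frac{1}{b} \sum_{j=1}^{b} \beta_j + \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \gamma_{ij} = 0
\qquad \left( H_0: \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij} = 0 \right)$$

$R(\alpha \mid \mu) = R(\alpha \mid \mu, \beta) = n b \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$:
$$H_0: \alpha_i + \frac{1}{b} \sum_{j=1}^{b} (\beta_j + \gamma_{ij}) \text{ are equal}
\qquad \left( H_0: \frac{1}{b} \sum_{j=1}^{b} \mu_{ij} \text{ are equal} \right)$$

$R(\beta \mid \mu) = R(\beta \mid \mu, \alpha) = n a \sum_{j=1}^{b} (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$:
$$H_0: \beta_j + \frac{1}{a} \sum_{i=1}^{a} (\alpha_i + \gamma_{ij}) \text{ are equal}
\qquad \left( H_0: \frac{1}{a} \sum_{i=1}^{a} \mu_{ij} \text{ are equal} \right)$$

$R(\gamma \mid \mu, \alpha, \beta) = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$:
$$H_0: \gamma_{ij} - \gamma_{kj} - \gamma_{i\ell} + \gamma_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)$$
$$\left( H_0: \mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell) \right)$$

# This file is stored as cane.ssc

# Enter the data. Note that the first
# line of this file is a line of data,
# not a line of variable names.

cane <- read.table( "cane.dat",
         col.names=c("Variety","Nitrogen",
                     "Yield"))

# Create factors

cane$V <- as.factor(cane$Variety)
cane$N <- as.factor(cane$Nitrogen)

# Print the data frame

cane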
# Compute mean yields for all combinations
# of nitrogen levels and varieties and
# Make a profile plot. At this point
# UNIX users should open a graphics
# window with the motif( ) function.

means <- tapply(cane$Yield,
         list(cane$Variety,cane$Nitrogen),
         mean)
means

# Set up the profile plot

par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5,mkh=.20)
x.axis <- unique(cane$Nitrogen)
matplot(c(130,270), c(50,80),
        type="n", xlab="Nitrogen(lb/acre)",
        ylab="Mean Yield",
        main= "Sugar Cane Yields")

# Add a profile for each variety

matlines(x.axis,means,type='l',
         lty=c(1,3,5),lwd=3)

# Plot symbols for the sample means

matpoints(x.axis,means, pch=c(15,16,18))

# Add a legend to the plot

legend(130,60, legend=c('Variety 1',
       'Variety 2','Variety 3'),
       lty=c(1,3,5),bty='n')
# Fit a model with main effects and
# interaction effects. Compute both
# sets of Type I sums of squares.

options(contrasts=c('contr.sum','contr.poly'))

lm.out1 <- lm(Yield~N*V, data=cane)
anova(lm.out1)

lm.out2 <- lm(Yield~V*N, data=cane)
anova(lm.out2)
summary(lm.out2, correlation=F)
model.matrix(lm.out2)

# Create diagnostic plots

par(mfrow=c(3,2))
plot(lm.out1)

# Create a data frame containing the original
# data and the residuals and estimated means

data.frame(cane$Nitrogen,cane$Variety,
           cane$Yield,Pred=lm.out1$fitted,
           Resid=round(lm.out1$resid,3))

# Compute Type III sums of squares and
# corresponding F-tests.

# Generate an identity matrix and a
# vector of ones

Iden <- function(n) diag(rep(1,n))
one <- function(n) matrix(rep(1,n),ncol=1)

# Compute the transpose of the model
# matrix for the cell means model

s <- length(unique(cane$Nitrogen))
t <- length(unique(cane$Variety))
st <- s*t
r <- length(cane$Yield)/(st)
D <- t(kronecker(Iden(st), t(one(r))))
# Least squares estimation

y <- matrix(cane$Yield,ncol=1)
b <- solve(crossprod(D)) %*% crossprod(D,y)
yhat <- D %*% b
sse <- crossprod(y-yhat)
df2 <- nrow(y) - st

c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                 t(one(t)) )
q1 <- t(b) %*% t(c1)%*% solve( c1 %*%
      solve(crossprod(D)) %*% t(c1))%*%
      c1 %*% b
df1<- s-1
f <- (q1/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c1
data.frame(SS=q1,df=df1,F.stat=f,p.value=p)
c2 <- kronecker( t(one(s)),
cbind(Iden(t-1),-one(t-1)) )
q2 <- t(b) %*% t(c2)%*%solve( c2 %*%
solve(crossprod(D)) %*% t(c2))%*%
c2 %*% b
df1<- t-1
f <- (q2/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c2
data.frame(SS=q2,df=df1,F.stat=f,p.value=p)
c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
cbind(Iden(t-1),-one(t-1)) )
q3 <- t(b) %*% t(c3)%*% solve( c3 %*%
solve(crossprod(D)) %*% t(c3))%*%
c3 %*% b
df1<- (s-1)*(t-1)
f <- (q3/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c3
data.frame(SS=q3,df=df1,F.stat=f,p.value=p)

# Enter the data. Note that the first
# line of this file is a line of data,
# not a line of variable names.

> cane <- read.table("cane.dat",
         col.names=c("Variety","Nitrogen",
                     "Yield"))

# Create factors

> cane$V <- as.factor(cane$Variety)
> cane$N <- as.factor(cane$Nitrogen)

# Print the data frame

> cane
   Variety Nitrogen Yield   N V
 1       1      150  70.5 150 1
 2       1      150  67.5 150 1
 3       1      150  63.9 150 1
 4       1      150  64.2 150 1
 5       1      210  67.3 210 1
 6       1      210  75.9 210 1
 7       1      210  72.2 210 1
 8       1      210  60.5 210 1
 9       1      270  79.9 270 1
10       1      270  72.8 270 1
11       1      270  64.8 270 1
12       1      270  86.3 270 1
13       2      150  58.6 150 2
14       2      150  65.2 150 2
15       2      150  70.2 150 2
16       2      150  51.8 150 2
17       2      210  64.3 210 2
18       2      210  48.3 210 2
19       2      210  74.0 210 2
20       2      210  63.6 210 2
21       2      270  64.4 270 2
22       2      270  67.3 270 2
23       2      270  78.0 270 2
24       2      270  72.0 270 2
25       3      150  65.8 150 3
26       3      150  68.3 150 3
27       3      150  72.7 150 3
28       3      150  67.6 150 3
29       3      210  64.1 210 3
30       3      210  64.8 210 3
31       3      210  70.9 210 3
32       3      210  58.3 210 3
33       3      270  56.3 270 3
34       3      270  54.7 270 3
35       3      270  66.2 270 3
36       3      270  54.4 270 3
# Compute mean yields for all combinations
# of nitrogen levels and varieties and
# Make a profile plot. At this point
# UNIX users should open a graphics
# window with the motif( ) function.

> means <- tapply(cane$Yield,
         list(cane$Variety,cane$Nitrogen),
         mean)
> means
     150    210    270
1 66.525 68.975 75.950
2 61.450 62.550 70.425
3 68.600 64.525 57.900

# Set up the profile plot

> par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5,mkh=.20)
> x.axis <- unique(cane$Nitrogen)
> matplot(c(130,270), c(50,80),
        type="n", xlab="Nitrogen(lb/acre)",
        ylab="Mean Yield",
        main= "Sugar Cane Yields")

# Add a profile for each variety

> matlines(x.axis,means,type='l',
         lty=c(1,3,5),lwd=3)

# Plot symbols for the sample means

> matpoints(x.axis,means, pch=c(15,16,18))

# Add a legend to the plot

> legend(130,60, legend=c('Variety 1',
       'Variety 2','Variety 3'),
       lty=c(1,3,5),bty='n')

[Figure: "Sugar Cane Yields". Mean Yield (vertical axis, 50 to 80) plotted against Nitrogen (lb/acre) (horizontal axis, 140 to 260), with one profile each for Variety 1, Variety 2 and Variety 3.]
# Fit a model with main effects and
# interaction effects. Compute both
# sets of Type I sums of squares.

> options(contrasts=c('contr.sum','contr.poly'))

> lm.out1 <- lm(Yield~N*V, data=cane)
> anova(lm.out1)

Analysis of Variance Table

Response: Yield

Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
        N  2    56.541  28.2703 0.608467 0.5514780
        V  2   319.374 159.6869 3.436975 0.0467974
      N:V  4   559.788 139.9469 3.012107 0.0354707
Residuals 27  1254.460  46.4615

> lm.out2 <- lm(Yield~V*N, data=cane)
> anova(lm.out2)

Analysis of Variance Table

Response: Yield

Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
        V  2   319.374 159.6869 3.436975 0.0467974
        N  2    56.541  28.2703 0.608467 0.5514780
      V:N  4   559.788 139.9469 3.012107 0.0354707
Residuals 27  1254.460  46.4615

> summary(lm.out2, correlation=F)

Call: lm(formula = Yield ~ V * N, data = cane)
Residuals:
    Min     1Q  Median    3Q   Max
 -14.25 -3.131 -0.3625 3.956 11.45

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 66.3222  1.1360    58.3800  0.0000
         V1  4.1611  1.6066     2.5900  0.0153
         V2 -1.5139  1.6066    -0.9423  0.3544
         N1 -0.7972  1.6066    -0.4962  0.6238
         N2 -0.9722  1.6066    -0.6051  0.5501
       V1N1 -3.1611  2.2721    -1.3913  0.1755
       V2N1 -2.5611  2.2721    -1.1272  0.2696
       V1N2 -0.5361  2.2721    -0.2360  0.8152
       V2N2 -1.2861  2.2721    -0.5660  0.5760

Residual standard error: 6.816 on 27 degrees of freedom
Multiple R-Squared: 0.4272
F-statistic: 2.517 on 8 and 27 degrees of freedom,
the p-value is 0.03462
> model.matrix(lm.out2)
   (Intercept) V1 V2 N1 N2 V1N1 V2N1 V1N2 V2N2
 1           1  1  0  1  0    1    0    0    0
 2           1  1  0  1  0    1    0    0    0
 3           1  1  0  1  0    1    0    0    0
 4           1  1  0  1  0    1    0    0    0
 5           1  1  0  0  1    0    0    1    0
 6           1  1  0  0  1    0    0    1    0
 7           1  1  0  0  1    0    0    1    0
 8           1  1  0  0  1    0    0    1    0
 9           1  1  0 -1 -1   -1    0   -1    0
10           1  1  0 -1 -1   -1    0   -1    0
11           1  1  0 -1 -1   -1    0   -1    0
12           1  1  0 -1 -1   -1    0   -1    0
13           1  0  1  1  0    0    1    0    0
14           1  0  1  1  0    0    1    0    0
15           1  0  1  1  0    0    1    0    0
16           1  0  1  1  0    0    1    0    0
17           1  0  1  0  1    0    0    0    1
18           1  0  1  0  1    0    0    0    1
19           1  0  1  0  1    0    0    0    1
20           1  0  1  0  1    0    0    0    1
21           1  0  1 -1 -1    0   -1    0   -1
22           1  0  1 -1 -1    0   -1    0   -1
23           1  0  1 -1 -1    0   -1    0   -1
24           1  0  1 -1 -1    0   -1    0   -1
25           1 -1 -1  1  0   -1   -1    0    0
26           1 -1 -1  1  0   -1   -1    0    0
27           1 -1 -1  1  0   -1   -1    0    0
28           1 -1 -1  1  0   -1   -1    0    0
29           1 -1 -1  0  1    0    0   -1   -1
30           1 -1 -1  0  1    0    0   -1   -1
31           1 -1 -1  0  1    0    0   -1   -1
32           1 -1 -1  0  1    0    0   -1   -1
33           1 -1 -1 -1 -1    1    1    1    1
34           1 -1 -1 -1 -1    1    1    1    1
35           1 -1 -1 -1 -1    1    1    1    1
36           1 -1 -1 -1 -1    1    1    1    1

# Create diagnostic plots

> par(mfrow=c(3,2))
> plot(lm.out1)

[Figure: six diagnostic plots for lm.out1: residuals against fitted values (Fitted : N * V), sqrt(abs(Residuals)) against fitted values, Yield against fitted values, residuals against quantiles of the standard normal, a residual-fit spread plot (fits and residuals against f-value), and Cook's distance against index. Observations 11, 18 and 19 are flagged.]

# Create a data frame containing the original
# data and the residuals and estimated means

> data.frame(cane$Nitrogen,cane$Variety,
           cane$Yield,Pred=lm.out1$fitted,
           Resid=round(lm.out1$resid,3))
    X1 X2   X3   Pred   Resid
 1 150  1 70.5 66.525   3.975
 2 150  1 67.5 66.525   0.975
 3 150  1 63.9 66.525  -2.625
 4 150  1 64.2 66.525  -2.325
 5 210  1 67.3 68.975  -1.675
 6 210  1 75.9 68.975   6.925
 7 210  1 72.2 68.975   3.225
 8 210  1 60.5 68.975  -8.475
 9 270  1 79.9 75.950   3.950
10 270  1 72.8 75.950  -3.150
11 270  1 64.8 75.950 -11.150
12 270  1 86.3 75.950  10.350
13 150  2 58.6 61.450  -2.850
14 150  2 65.2 61.450   3.750
15 150  2 70.2 61.450   8.750
16 150  2 51.8 61.450  -9.650
17 210  2 64.3 62.550   1.750
18 210  2 48.3 62.550 -14.250
19 210  2 74.0 62.550  11.450
20 210  2 63.6 62.550   1.050
21 270  2 64.4 70.425  -6.025
22 270  2 67.3 70.425  -3.125
23 270  2 78.0 70.425   7.575
24 270  2 72.0 70.425   1.575
25 150  3 65.8 68.600  -2.800
26 150  3 68.3 68.600  -0.300
27 150  3 72.7 68.600   4.100
28 150  3 67.6 68.600  -1.000
29 210  3 64.1 64.525  -0.425
30 210  3 64.8 64.525   0.275
31 210  3 70.9 64.525   6.375
32 210  3 58.3 64.525  -6.225
33 270  3 56.3 57.900  -1.600
34 270  3 54.7 57.900  -3.200
35 270  3 66.2 57.900   8.300
36 270  3 54.4 57.900  -3.500
# Compute Type III sums of squares and
# corresponding F-tests.

# Generate an identity matrix and a
# vector of ones

> Iden <- function(n) diag(rep(1,n))
> one <- function(n) matrix(rep(1,n),ncol=1)

# Compute the transpose of the model
# matrix for the cell means model

> s <- length(unique(cane$Nitrogen))
> t <- length(unique(cane$Variety))
> st <- s*t
> r <- length(cane$Yield)/(st)
> D <- t(kronecker(Iden(st), t(one(r))))

# Least squares estimation

> y <- matrix(cane$Yield,ncol=1)
> b <- solve(crossprod(D)) %*% crossprod(D,y)
> yhat <- D %*% b
> sse <- crossprod(y-yhat)
> df2 <- nrow(y) - st

> c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                 t(one(t)) )
> q1 <- t(b) %*% t(c1)%*% solve( c1 %*%
      solve(crossprod(D)) %*% t(c1))%*%
      c1 %*% b
> df1<- s-1
> f <- (q1/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    1    1    0    0    0   -1   -1   -1
[2,]    0    0    0    1    1    1   -1   -1   -1
> data.frame(SS=q1,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 319.3739  2 3.436975 0.04679743

> c2 <- kronecker( t(one(s)),
                 cbind(Iden(t-1),-one(t-1)) )
> q2 <- t(b) %*% t(c2)%*%solve( c2 %*%
      solve(crossprod(D)) %*% t(c2))%*%
      c2 %*% b
> df1<- t-1
> f <- (q2/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c2
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    0   -1    1    0   -1    1    0   -1
[2,]    0    1   -1    0    1   -1    0    1   -1
> data.frame(SS=q2,df=df1,F.stat=f,p.value=p)
        SS df   F.stat  p.value
1 56.54056  2 0.608467 0.551478

> c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                 cbind(Iden(t-1),-one(t-1)) )
> q3 <- t(b) %*% t(c3)%*% solve( c3 %*%
      solve(crossprod(D)) %*% t(c3))%*%
      c3 %*% b
> df1<- (s-1)*(t-1)
> f <- (q3/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c3
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    0   -1    0    0    0   -1    0    1
[2,]    0    1   -1    0    0    0    0   -1    1
[3,]    0    0    0    1    0   -1   -1    0    1
[4,]    0    0    0    0    1   -1    0   -1    1
> data.frame(SS=q3,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 559.7878  4 3.012107 0.03547072
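As a cross-check on the sums of squares printed above, the balanced-data formulas $n b \sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$, $n a \sum_j (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$ and $n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$ can be evaluated directly from the sugar cane data. This is a plain Python sketch, not a translation of the S-PLUS code; the function name `balanced_anova` and the nested-list layout are choices made here for illustration.

```python
# yields[variety][nitrogen level] = four replicate yields (Example 8.2).
yields = [
    [[70.5, 67.5, 63.9, 64.2], [67.3, 75.9, 72.2, 60.5], [79.9, 72.8, 64.8, 86.3]],
    [[58.6, 65.2, 70.2, 51.8], [64.3, 48.3, 74.0, 63.6], [64.4, 67.3, 78.0, 72.0]],
    [[65.8, 68.3, 72.7, 67.6], [64.1, 64.8, 70.9, 58.3], [56.3, 54.7, 66.2, 54.4]],
]


def balanced_anova(y):
    """Return (SS_A, SS_B, SS_AB, SSE) for a balanced two-factor layout."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(obs) / n for obs in row] for row in y]
    row_mean = [sum(r) / b for r in cell]
    col_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a
    ss_a = n * b * sum((m - grand) ** 2 for m in row_mean)
    ss_b = n * a * sum((m - grand) ** 2 for m in col_mean)
    ss_ab = n * sum((cell[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    sse = sum((obs - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for obs in y[i][j])
    return ss_a, ss_b, ss_ab, sse


ss_variety, ss_nitrogen, ss_interaction, sse = balanced_anova(yields)
print(round(ss_variety, 4), round(ss_nitrogen, 4),
      round(ss_interaction, 4), round(sse, 2))
```

Because the design is balanced, these values agree with the Type I sums of squares from either fitting order as well as with the matrix-based Type III computations.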
Conclusions:

Variety 3 exhibits a "linear" decrease in yield as nitrogen increases from 150 lb/acre to 270 lb/acre.

Varieties 1 and 2 exhibit parallel "linear" increasing trends in yield as nitrogen increases from 150 lb/acre to 270 lb/acre.

Variety 1 appears to provide a consistently higher yield than Variety 2, but the difference between these two varieties is not "significant" at the .05 level.

Variety 3 seems to do as well as Variety 1 at 150 lb/acre of nitrogen.
/* Analysis of completely randomized
   factorial experiments with an
   application to the sugar cane data
   from Snedecor and Cochran. This
   program is posted as cane.sas */

data set1;
  infile 'cane.dat';
  input variety nitrogen yield;
run;

/* Print the data */

proc print data=set1;
  var yield;
run;

/* Compute an ANOVA table */

proc glm data=set1;
  class variety nitrogen;
  model yield = variety|nitrogen /
        p clm alpha=.05 ss1 ss2
        ss3 ss4 e e1 e2 e3 e4;
  output out=setr r=resid p=yhat;
  lsmeans variety*nitrogen / stderr pdiff;
  means variety nitrogen / tukey;
  contrast 'n-linear' nitrogen -1 0 1;
  contrast 'n-quad' nitrogen -1 2 -1;
  contrast 'v1-v2' variety 1 -1 0;
  contrast '(v1+v2)-v3' variety .5 .5 -1;
  contrast '(v1-v2)*(n-lin)' variety*nitrogen
           -1 0 1 1 0 -1 0 0 0;
  contrast '(v1-v2)*(n-quad)' variety*nitrogen
           -1 2 -1 1 -2 1 0 0 0;
  contrast '(.5(v1+v2)-v3)*(n-lin)'
           variety*nitrogen
           -.5 0 .5 -.5 0 .5 1 0 -1;
  contrast '(.5(v1+v2)-v3)*(n-quad)'
           variety*nitrogen
           -.5 1 -.5 -.5 1 -.5 1 -2 1;
  estimate 'n-linear' nitrogen -1 0 1;
  estimate 'n-quad' nitrogen -1 2 -1;
  estimate 'v1-v2' variety 1 -1 0;
  estimate '(v1+v2)-v3' variety .5 .5 -1;
  estimate '(v1-v2)*(n-lin)' variety*nitrogen
           -1 0 1 1 0 -1 0 0 0;
  estimate '(v1-v2)*(n-quad)' variety*nitrogen
           -1 2 -1 1 -2 1 0 0 0;
  estimate '(.5(v1+v2)-v3)*(n-lin)' variety*nitrogen
           -.5 0 .5 -.5 0 .5 1 0 -1;
  estimate '(.5(v1+v2)-v3)*(n-quad)' variety*nitrogen
           -.5 1 -.5 -.5 1 -.5 1 -2 1;
run;

/* Make profile plots for the interaction
   between varieties and nitrogen levels */

/* UNIX users can use the following options */

/* goptions cback=white colors=(black)
   targetdevice=ps300 rotate=landscape; */

/* Windows users can use the following */

goptions cback=white colors=black
         device=WIN target=WINPRTC;

proc sort data=set1; by variety nitrogen;

proc means data=set1 noprint;
  by variety nitrogen;
  var yield;
  output out=means mean=my;
run;

axis1 label=(f=swiss h=2.5)
      ORDER = 120 to 300 by 30
      value=(f=swiss h=2.0) w=3.0
      length= 5.5 in;

axis2 label=(f=swiss h=2.0)
      order = 50 to 80 by 10
      value=(f=swiss h=2.0) w= 3.0
      length = 5.5 in;

SYMBOL1 V=CIRCLE H=2.0 w=3 l=1 i=join ;
SYMBOL2 V=DIAMOND H=2.0 w=3 l=3 i=join ;
SYMBOL3 V=square H=2.0 w=3 l=9 i=join ;

PROC GPLOT DATA=means;
  PLOT my*nitrogen=variety /
       vaxis=axis2 haxis=axis1;
  TITLE1 H=3.0 F=swiss "Sugar Cane Yields";
  LABEL my='Mean Yield';
  LABEL nitrogen = 'Nitrogen (lb/acre)';
RUN;
General Form of Estimable Functions

Effect                        Coefficients
Intercept                     L1
variety           1           L2
variety           2           L3
variety           3           L1-L2-L3
nitrogen          150         L5
nitrogen          210         L6
nitrogen          270         L1-L5-L6
variety*nitrogen  1 150       L8
variety*nitrogen  1 210       L9
variety*nitrogen  1 270       L2-L8-L9
variety*nitrogen  2 150       L11
variety*nitrogen  2 210       L12
variety*nitrogen  2 270       L3-L11-L12
variety*nitrogen  3 150       L5-L8-L11
variety*nitrogen  3 210       L6-L9-L12
variety*nitrogen  3 270       L1-L2-L3-L5-L6+L8+L9+L11+L12

Type I Estimable Functions

                              ------------------------- Coefficients -------------------------
Effect                        variety                 nitrogen                variety*nitrogen
Intercept                     0                       0                       0
variety           1           L2                      0                       0
variety           2           L3                      0                       0
variety           3           -L2-L3                  0                       0
nitrogen          150         0                       L5                      0
nitrogen          210         0                       L6                      0
nitrogen          270         0                       -L5-L6                  0
variety*nitrogen  1 150       0.3333*L2               0.3333*L5               L8
variety*nitrogen  1 210       0.3333*L2               0.3333*L6               L9
variety*nitrogen  1 270       0.3333*L2               -0.3333*L5-0.3333*L6    -L8-L9
variety*nitrogen  2 150       0.3333*L3               0.3333*L5               L11
variety*nitrogen  2 210       0.3333*L3               0.3333*L6               L12
variety*nitrogen  2 270       0.3333*L3               -0.3333*L5-0.3333*L6    -L11-L12
variety*nitrogen  3 150       -0.3333*L2-0.3333*L3    0.3333*L5               -L8-L11
variety*nitrogen  3 210       -0.3333*L2-0.3333*L3    0.3333*L6               -L9-L12
variety*nitrogen  3 270       -0.3333*L2-0.3333*L3    -0.3333*L5-0.3333*L6    L8+L9+L11+L12

Type II Estimable Functions, Type III Estimable Functions, Type IV Estimable Functions

For these balanced data, the coefficients SAS prints for the Type II, Type III and Type IV estimable functions of variety, nitrogen and variety*nitrogen are identical to the Type I coefficients shown above.

Dependent Variable: yield

                   Sum of        Mean
Source      DF     Squares       Square      F      Pr > F
Model        8      935.7022     116.9628    2.52   0.0346
Error       27     1254.4600      46.4615
C. Total    35     2190.1622

                                 Mean
Source      DF     Type I SS     Square      F      Pr > F
variety      2     319.3739      159.6869    3.44   0.0468
nitrogen     2      56.5406       28.2703    0.61   0.5515
var*nit      4     559.7878      139.9469    3.01   0.0355

                                 Mean
Source      DF     Type II SS    Square      F      Pr > F
variety      2     319.3739      159.6869    3.44   0.0468
nitrogen     2      56.5406       28.2703    0.61   0.5515
var*nit      4     559.7878      139.9469    3.01   0.0355

                                 Mean
Source      DF     Type III SS   Square      F      Pr > F
variety      2     319.3739      159.6869    3.44   0.0468
nitrogen     2      56.5406       28.2703    0.61   0.5515
var*nit      4     559.7878      139.9469    3.01   0.0355

                                 Mean
Source      DF     Type IV SS    Square      F      Pr > F
variety      2     319.3739      159.6869    3.44   0.0468
nitrogen     2      56.5406       28.2703    0.61   0.5515
var*nit      4     559.7878      139.9469    3.01   0.0355

Least Squares Means

                       yield       Standard
variety   nitrogen     LSMEAN      Error       Pr > |t|
1         150          66.525      3.408133    <.0001
1         210          68.975      3.408133    <.0001
1         270          75.950      3.408133    <.0001
2         150          61.450      3.408133    <.0001
2         210          62.550      3.408133    <.0001
2         270          70.425      3.408133    <.0001
3         150          68.600      3.408133    <.0001
3         210          64.525      3.408133    <.0001
3         270          57.900      3.408133    <.0001
Least Squares Means for effect variety*nitrogen
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: yield

(Least squares means are numbered 1 to 9 in the order of the cells in the least squares means table: variety 1 at 150, 210, 270; variety 2 at 150, 210, 270; variety 3 at 150, 210, 270.)

i/j        1        2        3        4        5        6        7        8        9
1                0.6154   0.0610   0.3017   0.4168   0.4255   0.6702   0.6815   0.0848
2     0.6154              0.1594   0.1301   0.1937   0.7658   0.9386   0.3640   0.0296
3     0.0610   0.1594              0.0056   0.0098   0.2617   0.1389   0.0252   0.0009
4     0.3017   0.1301   0.0056              0.8212   0.0735   0.1495   0.5289   0.4678
5     0.4168   0.1937   0.0098   0.8212              0.1139   0.2202   0.6852   0.3432
6     0.4255   0.7658   0.2617   0.0735   0.1139              0.7079   0.2315   0.0150
7     0.6702   0.9386   0.1389   0.1495   0.2202   0.7079              0.4053   0.0350
8     0.6815   0.3640   0.0252   0.5289   0.6852   0.2315   0.4053              0.1806
9     0.0848   0.0296   0.0009   0.4678   0.3432   0.0150   0.0350   0.1806
Contrast                   DF    Contrast SS     Mean Square     F Value   Pr > F
n-linear                   1      39.5266667      39.5266667      0.85     0.3645
n-quad                     1      17.0138889      17.0138889      0.37     0.5501
v1-v2                      1     193.2337500     193.2337500      4.16     0.0513
(v1+v2)-v3                 1     126.1401389     126.1401389      2.71     0.1110
(v1-v2)*(n-lin)            1       0.2025000       0.2025000      0.00     0.9478
(v1-v2)*(n-quad)           1       1.6875000       1.6875000      0.04     0.8503
(.5(v1+v2)-v3)*(n-lin)     1     528.0133333     528.0133333     11.36     0.0023
(.5(v1+v2)-v3)*(n-quad)    1      29.8844444      29.8844444      0.64     0.4296
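Each contrast above has one degree of freedom, so its F value is the contrast sum of squares divided by the error mean square. The following is a minimal Python sketch of that arithmetic, assuming the error mean square 46.4615 (27 df) reported in the ANOVA table for this experiment; the function name `contrast_f` is only illustrative and is not part of the SAS program.

```python
# Error mean square from the ANOVA table (assumed: 1254.4600 / 27 = 46.4615).
MSE = 46.4615

# Contrast sums of squares copied from the SAS CONTRAST output.
contrast_ss = {
    "n-linear": 39.5266667,
    "n-quad": 17.0138889,
    "v1-v2": 193.2337500,
    "(v1+v2)-v3": 126.1401389,
    "(v1-v2)*(n-lin)": 0.2025000,
    "(v1-v2)*(n-quad)": 1.6875000,
    "(.5(v1+v2)-v3)*(n-lin)": 528.0133333,
    "(.5(v1+v2)-v3)*(n-quad)": 29.8844444,
}


def contrast_f(ss, mse=MSE, df=1):
    """F statistic for a contrast: (SS / df) / MSE."""
    return (ss / df) / mse


f_values = {name: round(contrast_f(ss), 2) for name, ss in contrast_ss.items()}

for name, f in f_values.items():
    print(f"{name:26s} F = {f:5.2f}")
```

The rounded values agree with the F Value column, which is a quick way to confirm that every contrast was tested against the same error term.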
Parameter                  Estimate       Standard Error   t Value   Pr > |t|
n-linear                    2.5666667      2.7827289        0.92     0.3645
n-quad                     -2.9166667      4.8198279       -0.61     0.5501
v1-v2                       5.6750000      2.7827289        2.04     0.0513
(v1+v2)-v3                  3.9708333      2.4099139        1.65     0.1110
(v1-v2)*(n-lin)             0.4500000      6.8162659        0.07     0.9478
(v1-v2)*(n-quad)            2.2500000     11.8061189        0.19     0.8503
(.5(v1+v2)-v3)*(n-lin)     19.9000000      5.9030595        3.37     0.0023
(.5(v1+v2)-v3)*(n-quad)    -8.2000000     10.2243989       -0.80     0.4296

Two factor experiments with empty cells:

Data from Littell, Freund, and Spector, 1991,
SAS System for Linear Models, 3rd edition, SAS Institute, Cary, N.C.
                          Factor B
Factor A    j = 1         j = 2         j = 3
i = 1       Y111 = 5      Y121 = 2      -
            Y112 = 6      Y122 = 3
                          Y123 = 5
                          Y124 = 6
                          Y125 = 7
i = 2       Y211 = 2      Y221 = 8      Y231 = 4
            Y212 = 3      Y222 = 8      Y232 = 4
                          Y223 = 9      Y233 = 6
                                        Y234 = 6
                                        Y235 = 7
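Sample sizes:

                          Factor B
Factor A    j = 1         j = 2         j = 3
i = 1       n11 = 2       n12 = 5       -
i = 2       n21 = 2       n22 = 3       n23 = 5

Effects model:

$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$
for $(i,j) \neq (1,3)$
and $k = 1, \ldots, n_{ij}$

$\mu_{ij} = E(\bar{Y}_{ij\cdot}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$
is estimable for all $(i,j) \neq (1,3)$.

Functions of parameters that are not estimable
include:

$\mu_{13} = \mu + \alpha_1 + \beta_3 + \gamma_{13}$

$\bar{\mu}_{\cdot\cdot} = \frac{1}{6} \sum_{i=1}^{2} \sum_{j=1}^{3} \mu_{ij}
 = \mu + \frac{1}{2}(\alpha_1 + \alpha_2) + \frac{1}{3}(\beta_1 + \beta_2 + \beta_3)
 + \frac{1}{6}(\gamma_{11} + \gamma_{12} + \gamma_{13} + \gamma_{21} + \gamma_{22} + \gamma_{23})$

$\bar{\mu}_{1\cdot} = \frac{1}{3} \sum_{j=1}^{3} \mu_{1j}$

$\bar{\mu}_{\cdot 3} = \frac{1}{2}(\mu_{13} + \mu_{23})$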
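The estimability statements can be checked against the data directly: a cell mean has a natural estimate only where the cell contains observations, and an unweighted row or column average needs data in every cell it averages over. The sketch below is a minimal Python illustration, assuming the Littell, Freund, and Spector data as transcribed in the data table; the helper names `row_mean_estimable` and `col_mean_estimable` are hypothetical and only mirror the argument for this 2 x 3 layout.

```python
from collections import defaultdict

# (A, B, y) triples transcribed from the data table; cell (1, 3) is empty.
data = [
    (1, 1, 5), (1, 1, 6),
    (1, 2, 2), (1, 2, 3), (1, 2, 5), (1, 2, 6), (1, 2, 7),
    (2, 1, 2), (2, 1, 3),
    (2, 2, 8), (2, 2, 8), (2, 2, 9),
    (2, 3, 4), (2, 3, 4), (2, 3, 6), (2, 3, 6), (2, 3, 7),
]

A_LEVELS, B_LEVELS = (1, 2), (1, 2, 3)

# Group the responses by cell.
cells = defaultdict(list)
for a, b, y in data:
    cells[(a, b)].append(y)

cell_sizes = {cell: len(ys) for cell, ys in cells.items()}
cell_means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}
empty_cells = [(a, b) for a in A_LEVELS for b in B_LEVELS if (a, b) not in cells]


def row_mean_estimable(a):
    """Unweighted mean of row a is estimable only if every cell in the row has data."""
    return all((a, b) in cells for b in B_LEVELS)


def col_mean_estimable(b):
    """Unweighted mean of column b is estimable only if every cell in the column has data."""
    return all((a, b) in cells for a in A_LEVELS)


print("empty cells:", empty_cells)
for cell in sorted(cell_means):
    print(cell, cell_sizes[cell], round(cell_means[cell], 4))
```

With these data the only empty cell is (1, 3), so the unweighted means for row 1 and column 3 are the ones that cannot be estimated, in line with the list above.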
Two factor classifications with empty cells:

* No single "best" or "correct" analysis.
* Analysis of variance
  - Test for interaction is useful
  - Use SSE to estimate the error variance $\sigma^2$.
  - Tests for "main effects" may not be
    meaningful, especially in the presence
    of interaction.
* Compute F-tests and sums of squares for
  meaningful contrasts.
* Compare estimated means for different
  combinations of factor levels.
* It may be most convenient to consider the
  various combinations of factor levels as
  levels of a single "combined" factor.
  - one-way ANOVA
  - contrasts
  - compare means
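The "combined factor" approach can be sketched numerically: treat the five non-empty cells as the levels of one factor and compute the usual one-way ANOVA quantities. This is a minimal Python sketch, assuming the Littell, Freund, and Spector data from the data table; the cell labels 11, 12, 21, 22, 23 are simply the (i, j) pairs written as two-digit codes.

```python
# Responses grouped by non-empty cell; the key 10*i + j labels cell (i, j).
cells = {
    11: [5, 6],
    12: [2, 3, 5, 6, 7],
    21: [2, 3],
    22: [8, 8, 9],
    23: [4, 4, 6, 6, 7],
}

n_total = sum(len(ys) for ys in cells.values())
grand_mean = sum(sum(ys) for ys in cells.values()) / n_total

# Error (within-cell) sum of squares: deviations from each cell mean.
sse = 0.0
for ys in cells.values():
    m = sum(ys) / len(ys)
    sse += sum((y - m) ** 2 for y in ys)

# Corrected total and model (between-cell) sums of squares.
ss_total = sum((y - grand_mean) ** 2 for ys in cells.values() for y in ys)
ss_model = ss_total - sse

df_model = len(cells) - 1          # 5 cells -> 4 df
df_error = n_total - len(cells)    # 17 - 5 = 12 df
mse = sse / df_error
f_value = (ss_model / df_model) / mse

print(f"Model  df={df_model}  SS={ss_model:.4f}")
print(f"Error  df={df_error}  SS={sse:.4f}  MSE={mse:.4f}")
print(f"F = {f_value:.2f}")
```

The error sum of squares is the same whether the model is written with two factors and interaction or as a single five-level factor, since both fit one mean per non-empty cell.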
/* SAS code for analyzing data
   from the two factor experiment
   with no data for one combination
   of factors. This code is posted
   as littell.sas */
data set1;
input A B y;
cards;
1 1 5
1 1 6
1 2 2
1 2 3
1 2 5
1 2 6
1 2 7
2 1 2
2 1 3
2 2 8
2 2 8
2 2 9
2 3 4
2 3 4
2 3 6
2 3 6
2 3 7
run;
/* Print the data */
proc print data=set1;
run;

/* Compute sample means for all
   factor combinations with data.
   Make a profile plot. */
proc sort data=set1; by a b;
proc means data=set1 noprint; by a b;
var Y;
output out=means mean=my;
run;

axis1 label=(f=swiss h=2.0)
      value=(f=swiss h=1.8)
      w=3.0 length= 5.0 in;
axis2 label=(f=swiss h=2.0 a=90 r=0)
      value=(f=swiss h=1.8)
      w= 3.0 length = 5.0 in;
SYMBOL1 V=circle H=2.0 w=3 l=1 i=join;
SYMBOL2 V=diamond H=2.0 w=3 l=3 i=join;
goptions cback=white colors=black
         device=WIN target=WINPRTC;
/*
goptions cback=white colors=(black)
         targetdevice=ps300 rotate=landscape;
*/
proc gplot data=means;
plot my*b=a / vaxis=axis2 haxis=axis1;
title ls=0.8in H=3.0 F=swiss "Sample Means";
label my='Mean';
label b = 'Factor B';
footnote ls=0.4in ' ';
run;

/* Perform analysis of variance where
   factor A is entered into the model
   before factor B. Use the LSMEANS
   statement to compare means for
   different combinations of factor A
   and factor B. */
proc glm data=set1;
class A B;
model y = A B A*B / solution ss1 ss2
ss3 ss4 e e1 e2 e3 e4 p;
means A B A*B;
lsmeans A*B / pdiff tdiff stderr;
estimate 'A1-A2' A 1 -1 / e;
contrast 'A1-A2' A 1 -1 / e;
estimate 'A1-A2 within B1' A 1 -1
A*B 1 0 -1 0 0 / e;
estimate 'A1-A2 within B2' A 1 -1
A*B 0 1 0 -1 0 / e;
estimate 'A1-A2 over B' A 1 -1
A*B .5 .5 -.5 -.5 0 / e;
estimate 'B1-B2 over A' B 1 -1 0
A*B .5 -.5 .5 -.5 0 / e;
estimate 'B3-.5(B1+B2) in A2' B -.5 -.5 1
A*B 0 0 -.5 -.5 1 / e;
estimate 'interaction' A*B 1 -1 -1 1 0 / e;
run;
/* Do everything with a one-factor ANOVA by
combining the two factors into a single
factor with 5 categories. */
data set1; set set1;
C=10*A+B;
run;
proc glm data=set1;
class C;
model y = C / solution e e2;
estimate 'C11-C21' C 1 0 -1 0 0;
estimate 'C12-C22' C 0 1 0 -1 0;
estimate '.5(C11+C12-C21-C22)' C .5 .5 -.5 -.5 0;
estimate '.5(C11-C12+C21-C22)' C .5 -.5 .5 -.5 0;
estimate 'C23-.5(C21+C22)' C 0 0 -.5 -.5 1;
estimate 'C11-C12-C21+C22' C 1 -1 -1 1 0;
lsmeans C / stderr tdiff pdiff;
run;
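Each ESTIMATE statement in the one-factor analysis is a linear combination of the five cell means, with standard error $\sqrt{\mathrm{MSE} \sum_c k_c^2 / n_c}$. The following Python sketch reproduces that calculation by hand, assuming the Littell, Freund, and Spector data from the data step; the function name `estimate` and the dictionary layout are illustrative only and do not correspond to anything in SAS.

```python
from math import sqrt

# Responses for the five levels of the combined factor C = 10*A + B.
cells = {
    "C11": [5, 6],
    "C12": [2, 3, 5, 6, 7],
    "C21": [2, 3],
    "C22": [8, 8, 9],
    "C23": [4, 4, 6, 6, 7],
}

means = {c: sum(ys) / len(ys) for c, ys in cells.items()}
n_total = sum(len(ys) for ys in cells.values())

# Pooled within-cell variance (error mean square) on n_total - 5 df.
sse = sum((y - means[c]) ** 2 for c, ys in cells.items() for y in ys)
mse = sse / (n_total - len(cells))


def estimate(coefs):
    """Return (estimate, standard error, t) for a linear combination of cell means.

    coefs maps a cell label to its coefficient; omitted cells have coefficient 0.
    """
    est = sum(k * means[c] for c, k in coefs.items())
    se = sqrt(mse * sum(k * k / len(cells[c]) for c, k in coefs.items()))
    return est, se, est / se


contrasts = {
    "C11-C21": {"C11": 1, "C21": -1},
    "C12-C22": {"C12": 1, "C22": -1},
    ".5(C11+C12-C21-C22)": {"C11": .5, "C12": .5, "C21": -.5, "C22": -.5},
    ".5(C11-C12+C21-C22)": {"C11": .5, "C12": -.5, "C21": .5, "C22": -.5},
    "C23-.5(C21+C22)": {"C21": -.5, "C22": -.5, "C23": 1},
    "C11-C12-C21+C22": {"C11": 1, "C12": -1, "C21": -1, "C22": 1},
}

for label, coefs in contrasts.items():
    est, se, t = estimate(coefs)
    print(f"{label:22s} {est:8.4f} {se:7.4f} {t:6.2f}")
```

Writing the coefficients per cell also makes it easy to see which cell means a given comparison actually uses, which is the main appeal of the combined-factor formulation when a cell is empty.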
General Form of Estimable Functions

Effect            Coefficients
Intercept         L1

A      1          L2
A      2          L1-L2

B      1          L4
B      2          L5
B      3          L1-L4-L5

A*B    1 1        L7
A*B    1 2        L2-L7
A*B    2 1        L4-L7
A*B    2 2        -L2+L5+L7
A*B    2 3        L1-L4-L5
Type III Estimable Functions

                  -----------------Coefficients-----------------
Effect            A           B                     A*B
Intercept         0           0                     0

A      1          L2          0                     0
A      2          -L2         0                     0

B      1          0           L4                    0
B      2          0           L5                    0
B      3          0           -L4-L5                0

A*B    1 1        0.5*L2      0.25*L4-0.25*L5       L7
A*B    1 2        0.5*L2      -0.25*L4+0.25*L5      -L7
A*B    2 1        -0.5*L2     0.75*L4+0.25*L5       -L7
A*B    2 2        -0.5*L2     0.25*L4+0.75*L5       L7
A*B    2 3        0           -L4-L5                0

Type IV Estimable Functions

                  ------------Coefficients------------
Effect            A           B           A*B
Intercept         0           0           0

A      1          L2          0           0
A      2          -L2         0           0

B      1          0           L4          0
B      2          0           L5          0
B      3          0           -L4-L5      0

A*B    1 1        0.5*L2      0           L7
A*B    1 2        0.5*L2      0           -L7
A*B    2 1        -0.5*L2     L4          -L7
A*B    2 2        -0.5*L2     L5          L7
A*B    2 3        0           -L4-L5      0

NOTE: Other Type IV estimable functions exist.
General Form of Estimable Functions

Effect          Coefficients
Intercept       L1

C    11         L2
C    12         L3
C    21         L4
C    22         L5
C    23         L1-L2-L3-L4-L5

Dependent Variable: y

Source      DF    Sum of Squares    Mean Square    F Value    Pr > F
Model        4       45.8157          11.4539       5.27      0.0110
Error       12       26.0667           2.1722
C. Total    16       71.8824

Type II Estimable Functions

                -Coefficients-
Effect          C
Intercept       0

C    11         L2
C    12         L3
C    21         L4
C    22         L5
C    23         -L2-L3-L4-L5

Least Squares Means

C     y LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
11    5.5000      1.0421            0.0002      1
12    4.6000      0.6591            <.0001      2
21    2.5000      1.0422            0.0336      3
22    8.3333      0.8509            <.0001      4
23    5.4000      0.6591            <.0001      5

Least Squares Means for Effect C
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: y

i/j       1           2           3           4           5
1                     0.7299      2.0355     -2.1059      0.0811
                      0.4795      0.0645      0.0569      0.9367
2        -0.7299                  1.7030     -3.4685     -0.8582
          0.4795                  0.1143      0.0046      0.4076
3        -2.0355     -1.7030                 -4.3357     -2.3518
          0.0645      0.1143                  0.0010      0.0366
4         2.1059      3.4685      4.3357                  2.7253
          0.0569      0.0046      0.0010                  0.0184
5        -0.0811      0.8582      2.3518     -2.7253
          0.9367      0.4076      0.0366      0.0184

Parameter                Estimate    Standard Error    t Value    Pr > |t|
C11-C21                   3.0000       1.4738           2.04      0.0645
C12-C22                  -3.7333       1.0763          -3.47      0.0046
.5(C11+C12-C21-C22)      -0.3667       0.9125          -0.40      0.6949
.5(C11-C12+C21-C22)      -2.4667       0.9125          -2.70      0.0192
C23-.5(C21+C22)          -0.0167       0.9418          -0.02      0.9862
C11-C12-C21+C22           6.7333       1.8250           3.69      0.0031

Estimable functions for Type IV sums of
squares may depend on

* location of empty cells
* ordering of the levels for the row and
  column factors
Example: Exchange columns 1 and 3
in the previous example.

                                    Factor 2
Factor 1    A (old j=3)                   B                              C (old j=1)
i = 1       -                             $\bar{Y}_{12\cdot} = 4.6$      $\bar{Y}_{13\cdot} = 5.5$
                                          n12 = 5                        n13 = 2
i = 2       $\bar{Y}_{21\cdot} = 5.4$     $\bar{Y}_{22\cdot} = 8.33$     $\bar{Y}_{23\cdot} = 2.5$
            n21 = 5                       n22 = 3                        n23 = 2
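Type IV estimable functions for Factor B
(coefficients applied to the cell means $\mu_{ij}$):

            A      B      C
i = 1       -      0      0
i = 2       1      0     -1

$\mu_{2A} - \mu_{2C}$

            A      B      C
i = 1       -     .5    -.5
i = 2       0     .5    -.5

$\frac{1}{2}(\mu_{1B} + \mu_{2B}) - \frac{1}{2}(\mu_{1C} + \mu_{2C})$

In either case, Type IV sums of squares and
testable functions are not the same as Type III
sums of squares and testable functions.

Additive model:

$Y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}$
$i = 1, \ldots, a$
$j = 1, \ldots, b$
$k = 1, \ldots, n_{ij}$

For this model
$E(Y_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j$
may be estimable when $n_{ij} = 0$.

For example 8.1, $n_{13} = 0$, but

$\mu_{13} = \mu + \alpha_1 + \beta_3$
$= (\mu + \alpha_2 + \beta_3) - (\mu + \alpha_2 + \beta_2) + (\mu + \alpha_1 + \beta_2)$
$= \mu_{23} + (\mu_{12} - \mu_{22})$
$= E(\bar{Y}_{23\cdot} - \bar{Y}_{22\cdot} + \bar{Y}_{12\cdot})$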
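The identity for $\mu_{13}$ shows that a linear combination of observed cell means is unbiased for the empty cell under the additive model. As a small numerical illustration, the Python sketch below evaluates that combination, assuming the example referred to is the Littell, Freund, and Spector data analyzed earlier (cell totals 23, 25 and 27 with 5, 3 and 5 observations). This is one unbiased estimate only; it is not claimed to be the least squares estimate that a fit of the additive model would report.

```python
# Cell means for the three cells used in the identity (assumed data:
# cell (1,2) = 2,3,5,6,7; cell (2,2) = 8,8,9; cell (2,3) = 4,4,6,6,7).
cell_means = {
    (1, 2): 23 / 5,
    (2, 2): 25 / 3,
    (2, 3): 27 / 5,
}

# mu_13 = mu_23 - mu_22 + mu_12 under the additive model.
mu13_hat = cell_means[(2, 3)] - cell_means[(2, 2)] + cell_means[(1, 2)]

print(round(mu13_hat, 4))
```

The same combination would be biased if interaction were present, which is why cell (1, 3) is not estimable in the model with $\gamma_{ij}$ terms.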
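ANOVA

Sum of Squares: $R(\mu)$
Associated null hypothesis:
$H_0: \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j = 0$
or
$H_0: \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij} = 0$

Sum of Squares: $R(\alpha \mid \mu)$
Associated null hypothesis:
$H_0: \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \beta_j$ are equal for all $i = 1, \ldots, a$
or
$H_0: \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij}$ are equal for all $i$

Sum of Squares: $R(\beta \mid \mu, \alpha)$
Associated null hypothesis:
$H_0: \beta_j$ are equal for all $j = 1, \ldots, b$

Sum of Squares: $R(\mu)$
Associated null hypothesis:
$H_0: \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j = 0$
or
$H_0: \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij} = 0$

Sum of Squares: $R(\beta \mid \mu)$
Associated null hypothesis:
$H_0: \beta_j + \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \alpha_i$ are equal for all $j = 1, \ldots, b$
or
$H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij}$ are equal for all $j$

Sum of Squares: $R(\alpha \mid \mu, \beta)$
Associated null hypothesis:
$H_0: \alpha_i$ are equal for all $i = 1, \ldots, a$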