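Comments:

For $R(\alpha \mid \mu)$:

(i) $\sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij}$ may not be equal for all $i = 1, \ldots, a$, even though $\frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$ are equal for all $i = 1, \ldots, a$.

(ii) $\sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij}$ may be equal for all $i = 1, \ldots, a$, even though $\frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$ are not equal for some $i = 1, \ldots, a$.

Consider $R(\beta \mid \mu, \alpha) = Y^T (P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) Y$ and the corresponding F-statistic. Here,

\[ \frac{1}{\sigma^2} R(\beta \mid \mu, \alpha) \sim \chi^2_{\mathrm{rank}(X_{\mu,\alpha,\beta}) - \mathrm{rank}(X_{\mu,\alpha})}(\delta^2) \]

with $[1 + (a-1) + (b-1)] - [1 + (a-1)] = b - 1$ degrees of freedom,

\[ F = \frac{R(\beta \mid \mu, \alpha)/(b-1)}{MSE} \sim F_{(b-1,\; n_{\cdot\cdot} - ab)}(\delta^2) \]

and

\[ \delta^2 = \frac{1}{\sigma^2} [(P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) X\boldsymbol{\beta}]^T [(P_{\mu,\alpha,\beta} - P_{\mu,\alpha}) X\boldsymbol{\beta}], \]

where $X\boldsymbol{\beta} = E(Y)$.

The projection is

\[ P_{\mu,\alpha,\beta} X = X_{\mu,\alpha,\beta} (X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta})^{-} X_{\mu,\alpha,\beta}^T X \]

where (for $a = 2$, $b = 3$)

\[ X_{\mu,\alpha,\beta}^T X_{\mu,\alpha,\beta} =
\begin{bmatrix}
n_{\cdot\cdot} & n_{1\cdot} & n_{2\cdot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} \\
n_{1\cdot} & n_{1\cdot} & 0 & n_{11} & n_{12} & n_{13} \\
n_{2\cdot} & 0 & n_{2\cdot} & n_{21} & n_{22} & n_{23} \\
n_{\cdot 1} & n_{11} & n_{21} & n_{\cdot 1} & 0 & 0 \\
n_{\cdot 2} & n_{12} & n_{22} & 0 & n_{\cdot 2} & 0 \\
n_{\cdot 3} & n_{13} & n_{23} & 0 & 0 & n_{\cdot 3}
\end{bmatrix}
\quad \text{call this} \quad
\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}. \]

A generalized inverse is

\[ \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}^{-}
= \begin{bmatrix} A^{-} & 0 \\ 0 & 0 \end{bmatrix}
+ \begin{bmatrix} -A^{-} B \\ I \end{bmatrix} [C - B^T A^{-} B]^{-} [\, -B^T A^{-} \mid I \,] \]

\[ = \begin{bmatrix} 0 & 0 \\ 0 & C^{-1} \end{bmatrix}
+ \begin{bmatrix} I \\ -C^{-1} B^T \end{bmatrix} W [\, I \mid -B C^{-1} \,]
= \begin{bmatrix} W & -W B C^{-1} \\ -C^{-1} B^T W & C^{-1} + C^{-1} B^T W B C^{-1} \end{bmatrix} \]

where $W = [A - B C^{-1} B^T]^{-}$.

The null hypothesis is

\[ H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} (\beta_j + \gamma_{ij}) - \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \left( \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} (\beta_k + \gamma_{ik}) \right) = 0 \quad \text{for all } j = 1, \ldots, b. \]

With respect to the cell means, $E(Y_{ijk}) = \mu_{ij}$, this null hypothesis is

\[ H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij} - \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \left( \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} \mu_{ik} \right) = 0 \quad \text{for all } j = 1, 2, \ldots, b. \]

Consider $R(\gamma \mid \mu, \alpha, \beta) = Y^T [P_X - P_{\mu,\alpha,\beta}] Y$ and the associated F-statistic

\[ F = \frac{R(\gamma \mid \mu, \alpha, \beta)/[(a-1)(b-1)]}{MSE} \sim F_{((a-1)(b-1),\; n_{\cdot\cdot} - ab)}(\delta^2). \]

The null hypothesis is:

\[ H_0: (\mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell}) = (\gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell}) = 0 \quad \text{for all } (i,j) \text{ and } (k,\ell). \]

Type I sums of squares

Source of                              Sums of                                          Mean
variation            d.f.              squares                                          square    F      p-value
Soil types           a-1 = 1           $R(\alpha \mid \mu) = 52.5$                      52.5      3.94   .0785
Var.                 b-1 = 2           $R(\beta \mid \mu,\alpha) = 124.73$              62.4      4.68   .0405
Interaction          (a-1)(b-1) = 2    $R(\gamma \mid \mu,\alpha,\beta) = 222.76$       111.38    8.35   .0089
Resid.               n.. - ab = 9      $Y^T (I - P_X) Y = 120$                          13.33
Corr. total          n.. - 1 = 14      $Y^T (I - P_1) Y = 520$
Corr. for the mean   1                 $R(\mu) = 3375$

ANOVA Summary (soil types fitted before varieties):

Sums of squares and associated null hypotheses

$R(\mu)$:
\[ H_0: \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j + \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} = 0
\quad \text{or} \quad H_0: \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij} = 0 \]

$R(\alpha \mid \mu)$:
\[ H_0: \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} (\beta_j + \gamma_{ij}) \text{ are equal for all } i = 1, \ldots, a
\quad \text{or} \quad H_0: \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij} \text{ are equal for all } i = 1, \ldots, a \]

$R(\beta \mid \mu, \alpha)$:
\[ H_0: \beta_j + \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \gamma_{ij} = \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} (\beta_k + \gamma_{ik}) \quad \text{for all } j = 1, \ldots, b \]
\[ \text{or} \quad H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij} = \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \sum_{k=1}^{b} \frac{n_{ik}}{n_{i\cdot}} \mu_{ik} \quad \text{for all } j = 1, \ldots, b \]

$R(\gamma \mid \mu, \alpha, \beta)$:
\[ H_0: \gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)
\quad \text{or} \quad H_0: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell) \]

ANOVA Summary (varieties fitted before soil types):

Sums of squares and associated null hypotheses

$R(\mu)$:
\[ H_0: \mu + \sum_{i=1}^{a} \frac{n_{i\cdot}}{n_{\cdot\cdot}} \alpha_i + \sum_{j=1}^{b} \frac{n_{\cdot j}}{n_{\cdot\cdot}} \beta_j + \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \gamma_{ij} = 0
\quad \text{or} \quad H_0: \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{n_{ij}}{n_{\cdot\cdot}} \mu_{ij} = 0 \]

$R(\beta \mid \mu)$:
\[ H_0: \beta_j + \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} (\alpha_i + \gamma_{ij}) \text{ are equal for all } j = 1, \ldots, b
\quad \text{or} \quad H_0: \sum_{i=1}^{a} \frac{n_{ij}}{n_{\cdot j}} \mu_{ij} \text{ are equal for all } j = 1, \ldots, b \]

$R(\alpha \mid \mu, \beta)$:
\[ H_0: \alpha_i + \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \gamma_{ij} = \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \sum_{k=1}^{a} \frac{n_{kj}}{n_{\cdot j}} (\alpha_k + \gamma_{kj}) \quad \text{for all } i = 1, \ldots, a \]
\[ \text{or} \quad H_0: \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij} = \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}} \sum_{k=1}^{a} \frac{n_{kj}}{n_{\cdot j}} \mu_{kj} \quad \text{for all } i = 1, \ldots, a \]

$R(\gamma \mid \mu, \alpha, \beta)$:
\[ H_0: \gamma_{ij} - \gamma_{i\ell} - \gamma_{kj} + \gamma_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)
\quad \text{or} \quad H_0: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell) \]

Type I sums of squares (soils fitted first)

Source of                            Sums of                                             Mean
variation     d.f.                   squares                                             square    F      p-value
"Soils"       a-1 = 1                $R(\alpha \mid \mu) = 52.50$                        52.5      3.94   .0785
"Var."        b-1 = 2                $R(\beta \mid \mu,\alpha) = 124.73$                 62.4      4.68   .0405
Interaction   (a-1)(b-1) = 2         $R(\gamma \mid \mu,\alpha,\beta) = 222.76$          111.38    8.35   .0089
"Res."        $\sum\sum (n_{ij}-1) = 9$   $Y^T (I - P_X) Y = 120.00$                     13.33
Corr. total   n.. - 1 = 14           $Y^T (I - P_1) Y = 520.00$

Type I sums of squares (varieties fitted first)

Source of                            Sums of                                             Mean
variation     d.f.                   squares                                             square    F      p-value
"Var."        b-1 = 2                $R(\beta \mid \mu) = 93.33$                         46.67     3.50   .0751
"Soils"       a-1 = 1                $R(\alpha \mid \mu,\beta) = 83.90$                  83.90     6.29   .0334
Interaction   (a-1)(b-1) = 2         $R(\gamma \mid \mu,\alpha,\beta) = 222.76$          111.38    8.35   .0089
"Res."        $\sum\sum (n_{ij}-1) = 9$   $Y^T (I - P_X) Y = 120.00$                     13.33
Corr. total   n.. - 1 = 14           $Y^T (I - P_1) Y = 520.00$

Type II sums of squares: (SAS)

Source of                            Sums of                                             Mean
variation     d.f.                   squares                                             square    F      p-value
"Soils"       a-1 = 1                $R(\alpha \mid \mu,\beta) = 83.90$                  83.90     6.3    .0339
"Var."        b-1 = 2                $R(\beta \mid \mu,\alpha) = 124.73$                 62.37     4.7    .0405
Interaction   (a-1)(b-1) = 2         $R(\gamma \mid \mu,\alpha,\beta) = 222.76$          111.38    8.4    .0089
"Res."        n.. - ab = 9           $Y^T (I - P_X) Y = 120$                             13.33
Corr. total   n.. - 1                $Y^T (I - P_1) Y = 520$

Examine the soil type effect on time to germination for each variety:

[Figure: Average Time to Carrot Seed Germination. Mean time plotted against variety (1, 2, 3), with one profile for Soil Type 1 and one for Soil Type 2.]

Time to Germination

              Soil Type 1                              Soil Type 2
Variety   $\bar{Y}_{1j\cdot}$   $S_{\bar{Y}_{1j\cdot}}$   $\bar{Y}_{2j\cdot}$   $S_{\bar{Y}_{2j\cdot}}$     t      p-value
j = 1          9.0                  2.11                  16.0                  1.83              -2.51    .0333
j = 2         14.0                  2.58                  31.0                  3.65              -3.80    .0042
j = 3         18.0                  2.58                  13.0                  2.11               1.50    .1679

Time to germination for variety 2 is shorter in soil type 1. Time to germination for variety 1 may also be shorter in soil type 1. For variety 3 there is no significant difference in average germination times for the two soil types.

In the previous analysis:

\[ \bar{Y}_{ij\cdot} = \hat{\mu}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij} \]

is the OLS estimator (b.l.u.e.) for $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$,

\[ S_{\bar{Y}_{ij\cdot}} = \sqrt{\frac{MSE}{n_{ij}}} \quad \text{for } i = 1, \ldots, a \text{ and } j = 1, \ldots, b, \]

and

\[ t = \frac{\bar{Y}_{1j\cdot} - \bar{Y}_{2j\cdot}}{\sqrt{MSE \left( \frac{1}{n_{1j}} + \frac{1}{n_{2j}} \right)}} \quad \text{for } j = 1, \ldots, b. \]

Method of Unweighted Means (Type III sums of squares in SAS when $n_{ij} > 0$ for all $(i,j)$).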
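Assume $n_{ij} > 0$ for every cell. Use the cell means reparameterization of the model:

\[ Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} = \mu_{ij} + \epsilon_{ijk}. \]

The model is $Y = D\mu + \epsilon$, where

\[ Y = (Y_{111}, Y_{112}, Y_{113}, Y_{121}, Y_{122}, Y_{131}, Y_{132}, Y_{211}, Y_{212}, Y_{213}, Y_{214}, Y_{221}, Y_{231}, Y_{232}, Y_{233})^T, \]

\[ D = \begin{bmatrix}
1_3 & 0 & 0 & 0 & 0 & 0 \\
0 & 1_2 & 0 & 0 & 0 & 0 \\
0 & 0 & 1_2 & 0 & 0 & 0 \\
0 & 0 & 0 & 1_4 & 0 & 0 \\
0 & 0 & 0 & 0 & 1_1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1_3
\end{bmatrix},
\qquad
\mu = (\mu_{11}, \mu_{12}, \mu_{13}, \mu_{21}, \mu_{22}, \mu_{23})^T, \]

and $1_m$ denotes a column of $m$ ones, so each row of $D$ has a single 1 in the column of the cell that the observation belongs to.

The least squares estimator (b.l.u.e.) for $\mu$ is

\[ b = \hat{\mu} = (D^T D)^{-1} D^T Y
= \begin{bmatrix} n_{11}^{-1} & & & & & \\ & n_{12}^{-1} & & & & \\ & & n_{13}^{-1} & & & \\ & & & n_{21}^{-1} & & \\ & & & & n_{22}^{-1} & \\ & & & & & n_{23}^{-1} \end{bmatrix}
\begin{bmatrix} Y_{11\cdot} \\ Y_{12\cdot} \\ Y_{13\cdot} \\ Y_{21\cdot} \\ Y_{22\cdot} \\ Y_{23\cdot} \end{bmatrix}
= \begin{bmatrix} \bar{Y}_{11\cdot} \\ \bar{Y}_{12\cdot} \\ \bar{Y}_{13\cdot} \\ \bar{Y}_{21\cdot} \\ \bar{Y}_{22\cdot} \\ \bar{Y}_{23\cdot} \end{bmatrix}. \]

Test the null hypothesis

\[ H_0: \frac{1}{b} \sum_{j=1}^{b} \mu_{1j} = \frac{1}{b} \sum_{j=1}^{b} \mu_{2j} = \cdots = \frac{1}{b} \sum_{j=1}^{b} \mu_{aj} \]

vs.

\[ H_A: \frac{1}{b} \sum_{j=1}^{b} \mu_{ij} \neq \frac{1}{b} \sum_{j=1}^{b} \mu_{kj} \quad \text{for some } i \neq k. \]

The OLS estimator (b.l.u.e.) for $\frac{1}{b} \sum_{j=1}^{b} \mu_{ij}$ is

\[ \tilde{Y}_{i\cdot\cdot} = \frac{1}{b} \sum_{j=1}^{b} \bar{Y}_{ij\cdot} \]

with

\[ Var(\tilde{Y}_{i\cdot\cdot}) = \frac{1}{b^2} \sum_{j=1}^{b} \frac{\sigma^2}{n_{ij}} = \frac{\sigma^2}{b^2} \sum_{j=1}^{b} \frac{1}{n_{ij}}. \]

Express the null hypothesis in matrix form: $H_0: C_1 \mu = 0$, where

\[ C_1 = [\, I_{a-1} \mid -1_{a-1} \,] \otimes 1_b^T
= \begin{bmatrix}
1_b^T & & & -1_b^T \\
 & \ddots & & \vdots \\
 & & 1_b^T & -1_b^T
\end{bmatrix},
\qquad
\mu = (\mu_{11}, \mu_{12}, \ldots, \mu_{1b}, \mu_{21}, \ldots, \mu_{2b}, \ldots, \mu_{ab})^T. \]

Compute

\[ SS_{H_0} = (C_1 b - 0)^T [C_1 (D^T D)^{-1} C_1^T]^{-1} (C_1 b - 0)
= Y^T D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T Y. \]

Then

\[ C_1 b = C_1 (D^T D)^{-1} D^T Y
= \begin{bmatrix}
\sum_j \bar{Y}_{1j\cdot} - \sum_j \bar{Y}_{aj\cdot} \\
\vdots \\
\sum_j \bar{Y}_{a-1,j\cdot} - \sum_j \bar{Y}_{aj\cdot}
\end{bmatrix} \]

is the OLS estimator (b.l.u.e.) of $C_1 \mu$, and

\[ Var(C_1 b) = Var(C_1 (D^T D)^{-1} D^T Y)
= C_1 (D^T D)^{-1} D^T (\sigma^2 I) D (D^T D)^{-1} C_1^T
= \sigma^2 C_1 (D^T D)^{-1} D^T D (D^T D)^{-1} C_1^T
= \sigma^2 C_1 (D^T D)^{-1} C_1^T. \]

Use result 4.7 to show

\[ \frac{1}{\sigma^2} SS_{H_0} \sim \chi^2_{(a-1)}(\delta^2). \]

Check that

\[ A \Sigma = \frac{1}{\sigma^2} D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T (\sigma^2 I) \]

is idempotent and that $a - 1 = \mathrm{rank}(C_1 (D^T D)^{-1} C_1^T)$.

Compute:

\[ SSE = Y^T (I - P_D) Y \quad \text{where} \quad P_D = D (D^T D)^{-1} D^T. \]

Use result 4.7 to show

\[ \frac{1}{\sigma^2} SSE \sim \chi^2_{\sum_i \sum_j (n_{ij} - 1)}. \]

Use result 4.8 to show that $SSE = Y^T (I - P_D) Y$ (call the matrix $I - P_D$ $A_1$) is distributed independently of

\[ SS_{H_0} = Y^T D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T Y \]

(call the matrix of this quadratic form $A_2$). Check that

\[ A_1 \Sigma A_2 = A_1 (\sigma^2 I) A_2 = \sigma^2 A_1 A_2
= \sigma^2 (I - P_D) D (D^T D)^{-1} C_1^T (C_1 (D^T D)^{-1} C_1^T)^{-1} C_1 (D^T D)^{-1} D^T = 0. \]

This is true because $(I - P_D) D = 0$.

Then

\[ F = \frac{SS_{H_0}/(a-1)}{SSE / \sum_i \sum_j (n_{ij} - 1)} \sim F_{(a-1,\; \sum\sum (n_{ij}-1))}(\delta^2) \]

where

\[ \delta^2 = \frac{1}{\sigma^2} \mu^T C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 \mu. \]

Reject

\[ H_0: \frac{1}{b} \sum_{j=1}^{b} \mu_{1j} = \frac{1}{b} \sum_{j=1}^{b} \mu_{2j} = \cdots = \frac{1}{b} \sum_{j=1}^{b} \mu_{aj} \]

if

\[ F = \frac{SS_{H_0}/(a-1)}{SSE / \sum\sum (n_{ij} - 1)} > F_{(a-1,\; \sum\sum (n_{ij}-1)),\, \alpha} \]

or if

\[ p\text{-value} = Pr\{ F_{(a-1,\; \sum\sum (n_{ij}-1))} > F \} < \alpha. \]

Test

\[ H_0: \frac{1}{a} \sum_{i=1}^{a} \mu_{i1} = \frac{1}{a} \sum_{i=1}^{a} \mu_{i2} = \cdots = \frac{1}{a} \sum_{i=1}^{a} \mu_{ib} \]

vs.

\[ H_A: \frac{1}{a} \sum_{i=1}^{a} \mu_{ij} \neq \frac{1}{a} \sum_{i=1}^{a} \mu_{ik} \quad \text{for some } j \neq k. \]

Write the null hypothesis in matrix form as $H_0: C_2 \mu = 0$, where

\[ C_2 = 1_a^T \otimes [\, I_{b-1} \mid -1_{b-1} \,] \]

and

\[ C_2 \mu = a \begin{bmatrix}
\frac{1}{a} \sum_{i=1}^{a} \mu_{i1} - \frac{1}{a} \sum_{i=1}^{a} \mu_{ib} \\
\vdots \\
\frac{1}{a} \sum_{i=1}^{a} \mu_{i,b-1} - \frac{1}{a} \sum_{i=1}^{a} \mu_{ib}
\end{bmatrix}. \]

Compute

\[ SS_{H_0,2} = Y^T D (D^T D)^{-1} C_2^T [C_2 (D^T D)^{-1} C_2^T]^{-1} C_2 (D^T D)^{-1} D^T Y \]

and reject $H_0$ if

\[ F = \frac{SS_{H_0,2}/(b-1)}{SSE / \sum\sum (n_{ij} - 1)} > F_{(b-1,\; \sum\sum (n_{ij}-1)),\, \alpha}. \]

Test for Interaction:

Test

\[ H_0: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} = 0 \quad \text{for all } (i,j) \text{ and } (k,\ell) \]

vs.

\[ H_A: \mu_{ij} - \mu_{i\ell} - \mu_{kj} + \mu_{k\ell} \neq 0 \quad \text{for some } i \neq k \text{ and } j \neq \ell. \]

Write the null hypothesis in matrix form as $H_0: C_3 \mu = 0$, where

\[ C_3 = [\, I_{a-1} \mid -1_{a-1} \,] \otimes [\, I_{b-1} \mid -1_{b-1} \,]. \]

Compute

\[ b = (D^T D)^{-1} D^T Y = (\bar{Y}_{11\cdot}, \ldots, \bar{Y}_{ab\cdot})^T, \]

\[ SS_{H_0,3} = (C_3 b - 0)^T [C_3 (D^T D)^{-1} C_3^T]^{-1} (C_3 b - 0)
= Y^T D (D^T D)^{-1} C_3^T [C_3 (D^T D)^{-1} C_3^T]^{-1} C_3 (D^T D)^{-1} D^T Y, \]

and reject $H_0$ when the corresponding F-statistic exceeds its critical value.

PROC GLM in SAS reports these sums of squares as Type III sums of squares.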
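The quadratic forms above need only the cell means and the cell counts, because $(D^T D)^{-1}$ is diagonal with entries $1/n_{ij}$. The following is a minimal Python sketch, not part of the original S-PLUS or SAS listings, that evaluates $SS_{H_0}$, $SS_{H_0,2}$ and $SS_{H_0,3}$ for the germination example. It assumes the cell means from the time-to-germination table, the cell counts read off the observation vector (3, 2, 2, 4, 1, 3), and MSE = 120/9 from the ANOVA table; the helper names `solve` and `ss_hypothesis` are hypothetical.

```python
# Sketch only: unweighted-means (Type III) sums of squares from cell means.
# Helper names are illustrative; inputs are taken from the slides' tables.

def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [[float(x) for x in M[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z

def ss_hypothesis(C, means, counts):
    """(C b)' [C (D'D)^{-1} C']^{-1} (C b), using (D'D)^{-1} = diag(1/n_ij)."""
    Cb = [sum(c * m for c, m in zip(row, means)) for row in C]
    V = [[sum(x * y / n for x, y, n in zip(r1, r2, counts)) for r2 in C]
         for r1 in C]
    z = solve(V, Cb)
    return sum(x * y for x, y in zip(Cb, z))

# Cell order: (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)
means = [9.0, 14.0, 18.0, 16.0, 31.0, 13.0]
counts = [3, 2, 2, 4, 1, 3]
mse = 120.0 / 9.0

# a = 2 soil types, b = 3 varieties
C1 = [[1, 1, 1, -1, -1, -1]]                            # [I | -1] (x) 1_b'
C2 = [[1, 0, -1, 1, 0, -1], [0, 1, -1, 0, 1, -1]]       # 1_a' (x) [I | -1]
C3 = [[1, 0, -1, -1, 0, 1], [0, 1, -1, 0, -1, 1]]       # [I | -1] (x) [I | -1]

results = {}
for name, C in [("soils", C1), ("varieties", C2), ("interaction", C3)]:
    ss = ss_hypothesis(C, means, counts)
    df = len(C)
    results[name] = (ss, df, (ss / df) / mse)
    print(f"{name:12s} SS = {ss:8.3f}  df = {df}  F = {results[name][2]:.2f}")
```

Under these assumptions the three sums of squares should agree, up to rounding in the last digit, with the Type III values that PROC GLM reports for this example.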
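Source of                        Sum of                      Mean
variation   d.f.                 Squares                     Square    F      p-value
Soils       a-1 = 1              $SS_{H_0} = 123.77$         123.77    9.28   .0139
Var.        b-1 = 2              $SS_{H_0,2} = 192.13$       96.06     7.20   .0135
Inter.      (a-1)(b-1) = 2       $SS_{H_0,3} = 222.76$       111.38    8.35   .0089

For the interaction test, reject $H_0$ if

\[ F = \frac{SS_{H_0,3}/((a-1)(b-1))}{SSE / \sum\sum (n_{ij} - 1)} > F_{((a-1)(b-1),\; \sum\sum (n_{ij}-1)),\, \alpha}. \]

Note that

\[ Y^T P_1 Y \]
\[ + \; Y^T D (D^T D)^{-1} C_1^T [C_1 (D^T D)^{-1} C_1^T]^{-1} C_1 (D^T D)^{-1} D^T Y \]
\[ + \; Y^T D (D^T D)^{-1} C_2^T [C_2 (D^T D)^{-1} C_2^T]^{-1} C_2 (D^T D)^{-1} D^T Y \]
\[ + \; Y^T D (D^T D)^{-1} C_3^T [C_3 (D^T D)^{-1} C_3^T]^{-1} C_3 (D^T D)^{-1} D^T Y \]
\[ + \; Y^T (I - P_D) Y \]

do not necessarily sum to $Y^T Y$, nor do the middle three terms $(SS_{H_0}, SS_{H_0,2}, SS_{H_0,3})$ necessarily sum to

\[ SS_{model,corrected} = Y^T (P_D - P_1) Y; \]

nor are $(SS_{H_0}, SS_{H_0,2}, SS_{H_0,3})$ necessarily independent of each other.

Note that

\[ SS_{H_0} = \sum_{i=1}^{a} w_i \left[ \tilde{Y}_{i\cdot\cdot} - \frac{\sum_{k=1}^{a} w_k \tilde{Y}_{k\cdot\cdot}}{\sum_{k=1}^{a} w_k} \right]^2 \]

where

\[ \tilde{Y}_{i\cdot\cdot} = \frac{1}{b} \sum_{j=1}^{b} \bar{Y}_{ij\cdot},
\qquad
w_i = \left[ \frac{1}{b^2} \sum_{j=1}^{b} \frac{1}{n_{ij}} \right]^{-1} = \sigma^2 \left[ Var(\tilde{Y}_{i\cdot\cdot}) \right]^{-1}, \]

and $\tilde{Y}_{i\cdot\cdot}$ is not necessarily equal to

\[ \bar{Y}_{i\cdot\cdot} = \frac{\sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} Y_{ijk}}{\sum_{j=1}^{b} n_{ij}} = \frac{\sum_{j=1}^{b} n_{ij} \bar{Y}_{ij\cdot}}{\sum_{j=1}^{b} n_{ij}}. \]

Furthermore,

\[ SS_{H_0,2} = \sum_{j=1}^{b} w_j \left[ \tilde{Y}_{\cdot j \cdot} - \frac{\sum_{\ell=1}^{b} w_\ell \tilde{Y}_{\cdot \ell \cdot}}{\sum_{\ell=1}^{b} w_\ell} \right]^2 \]

where

\[ \tilde{Y}_{\cdot j \cdot} = \frac{1}{a} \sum_{i=1}^{a} \bar{Y}_{ij\cdot},
\qquad
w_j = \left[ \frac{1}{a^2} \sum_{i=1}^{a} \frac{1}{n_{ij}} \right]^{-1} = \sigma^2 \left[ Var(\tilde{Y}_{\cdot j \cdot}) \right]^{-1}, \]

and $\tilde{Y}_{\cdot j \cdot}$ is not necessarily equal to

\[ \bar{Y}_{\cdot j \cdot} = \frac{\sum_{i=1}^{a} \sum_{k=1}^{n_{ij}} Y_{ijk}}{\sum_{i=1}^{a} n_{ij}} = \frac{\sum_{i=1}^{a} n_{ij} \bar{Y}_{ij\cdot}}{\sum_{i=1}^{a} n_{ij}}. \]

Balanced factorial experiments

\[ n_{ij} = n \quad \text{for } i = 1, \ldots, a \text{ and } j = 1, \ldots, b \]

Example 8.2: Sugar Cane Yields (from Snedecor and Cochran)

                              Nitrogen Level
             150 lb/acre       210 lb/acre       270 lb/acre
Variety 1    Y111 = 70.5       Y121 = 67.3       Y131 = 79.9
             Y112 = 67.5       Y122 = 75.9       Y132 = 72.8
             Y113 = 63.9       Y123 = 72.2       Y133 = 64.8
             Y114 = 64.2       Y124 = 60.5       Y134 = 86.3
Variety 2    Y211 = 58.6       Y221 = 64.3       Y231 = 64.4
             Y212 = 65.2       Y222 = 48.3       Y232 = 67.3
             Y213 = 70.2       Y223 = 74.0       Y233 = 78.0
             Y214 = 51.8       Y224 = 63.6       Y234 = 72.0
Variety 3    Y311 = 65.8       Y321 = 64.1       Y331 = 56.3
             Y312 = 68.3       Y322 = 64.8       Y332 = 54.7
             Y313 = 72.7       Y323 = 70.9       Y333 = 66.2
             Y314 = 67.6       Y324 = 58.3       Y334 = 54.4

For a balanced experiment ($n_{ij} = n$), Type I, Type II, and Type III sums of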
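squares are the same:

\[ R(\alpha \mid \mu) = R(\alpha \mid \mu, \beta) = SS_{H_0} = nb \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \]

\[ R(\beta \mid \mu) = R(\beta \mid \mu, \alpha) = SS_{H_0,2} = na \sum_{j=1}^{b} (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \]

\[ R(\gamma \mid \mu, \alpha, \beta) = SS_{H_0,3} = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 \]

ANOVA

Sum of squares and associated null hypothesis

$R(\mu) = Y^T P_1 Y = abn \, \bar{Y}_{\cdot\cdot\cdot}^2$:
\[ H_0: \mu + \frac{1}{a} \sum_{i=1}^{a} \alpha_i + \frac{1}{b} \sum_{j=1}^{b} \beta_j + \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \gamma_{ij} = 0
\qquad \left( H_0: \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \mu_{ij} = 0 \right) \]

$R(\alpha \mid \mu) = R(\alpha \mid \mu, \beta) = nb \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$:
\[ H_0: \alpha_i + \frac{1}{b} \sum_{j=1}^{b} (\beta_j + \gamma_{ij}) \text{ are equal}
\qquad \left( H_0: \frac{1}{b} \sum_{j=1}^{b} \mu_{ij} \text{ are equal} \right) \]

$R(\beta \mid \mu) = R(\beta \mid \mu, \alpha) = na \sum_{j=1}^{b} (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$:
\[ H_0: \beta_j + \frac{1}{a} \sum_{i=1}^{a} (\alpha_i + \gamma_{ij}) \text{ are equal}
\qquad \left( H_0: \frac{1}{a} \sum_{i=1}^{a} \mu_{ij} \text{ are equal} \right) \]

$R(\gamma \mid \mu, \alpha, \beta) = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$:
\[ H_0: \gamma_{ij} - \gamma_{kj} - \gamma_{i\ell} + \gamma_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell)
\qquad \left( H_0: \mu_{ij} - \mu_{kj} - \mu_{i\ell} + \mu_{k\ell} = 0 \text{ for all } (i,j) \text{ and } (k,\ell) \right) \]

# This file is stored as cane.ssc

# Enter the data. Note that the first
# line of this file is a line of data,
# not a line of variable names.

cane <- read.table("cane.dat",
          col.names=c("Variety","Nitrogen","Yield"))

# Create factors

cane$V <- as.factor(cane$Variety)
cane$N <- as.factor(cane$Nitrogen)

# Print the data frame

cane

# Compute mean yields for all combinations
# of nitrogen levels and varieties and
# make a profile plot. At this point
# UNIX users should open a graphics
# window with the motif( ) function.

means <- tapply(cane$Yield,
           list(cane$Variety,cane$Nitrogen), mean)
means

# Set up the profile plot

par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5,mkh=.20)
x.axis <- unique(cane$Nitrogen)
matplot(c(130,270), c(50,80), type="n",
        xlab="Nitrogen(lb/acre)",
        ylab="Mean Yield",
        main= "Sugar Cane Yields")

# Add a profile for each variety. matlines( )
# draws one line per column, so the matrix of
# means (varieties in rows) is transposed first.

matlines(x.axis,t(means),type='l',
         lty=c(1,3,5),lwd=3)

# Plot symbols for the sample means

matpoints(x.axis,t(means), pch=c(15,16,18))

# Add a legend to the plot

legend(130,60, legend=c('Variety 1',
       'Variety 2','Variety 3'),
       lty=c(1,3,5),bty='n')

# Fit a model with main effects and
# interaction effects. Compute both
# sets of Type I sums of squares.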
options(contrasts=c('contr.sum','contr.poly'))

lm.out1 <- lm(Yield~N*V, data=cane)
anova(lm.out1)

lm.out2 <- lm(Yield~V*N, data=cane)
anova(lm.out2)
summary(lm.out2, correlation=F)
model.matrix(lm.out2)

# Create diagnostic plots

par(mfrow=c(3,2))
plot(lm.out1)

# Create a data frame containing the original
# data and the residuals and estimated means

data.frame(cane$Nitrogen,cane$Variety,
           cane$Yield,Pred=lm.out1$fitted,
           Resid=round(lm.out1$resid,3))

# Compute Type III sums of squares and
# corresponding F-tests.

# Generate an identity matrix and a
# vector of ones

Iden <- function(n) diag(rep(1,n))
one <- function(n) matrix(rep(1,n),ncol=1)

# Compute the model matrix D for the cell
# means model (as the transpose of a
# Kronecker product)

s <- length(unique(cane$Nitrogen))
t <- length(unique(cane$Variety))
st <- s*t
r <- length(cane$Yield)/(st)
D <- t(kronecker(Iden(st), t(one(r))))

# Least squares estimation

y <- matrix(cane$Yield,ncol=1)
b <- solve(crossprod(D)) %*% crossprod(D,y)
yhat <- D %*% b
sse <- crossprod(y-yhat)
df2 <- nrow(y) - st

# The rows of cane.dat are ordered by variety
# and then by nitrogen level, so the cell means
# in b follow that order: c1 compares varieties
# and c2 compares nitrogen levels (here s = t = 3).

c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                 t(one(t)) )
q1 <- t(b) %*% t(c1)%*%
      solve( c1 %*% solve(crossprod(D)) %*% t(c1))%*%
      c1 %*% b
df1<- s-1
f <- (q1/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c1
data.frame(SS=q1,df=df1,F.stat=f,p.value=p)

c2 <- kronecker( t(one(s)),
                 cbind(Iden(t-1),-one(t-1)) )
q2 <- t(b) %*% t(c2)%*%
      solve( c2 %*% solve(crossprod(D)) %*% t(c2))%*%
      c2 %*% b
df1<- t-1
f <- (q2/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c2
data.frame(SS=q2,df=df1,F.stat=f,p.value=p)

c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                 cbind(Iden(t-1),-one(t-1)) )
q3 <- t(b) %*% t(c3)%*%
      solve( c3 %*% solve(crossprod(D)) %*% t(c3))%*%
      c3 %*% b
df1<- (s-1)*(t-1)
f <- (q3/df1)/(sse/df2)
p <- 1-pf(f,df1,df2)
c3
data.frame(SS=q3,df=df1,F.stat=f,p.value=p)

Output from running the code in cane.ssc:

# This file is stored as cane.ssc

# Enter the data. Note that the first
# line of this file is a line of data,
# not a line of variable names.
> cane <- read.table("cane.dat",
            col.names=c("Variety","Nitrogen","Yield"))

# Create factors

> cane$V <- as.factor(cane$Variety)
> cane$N <- as.factor(cane$Nitrogen)

# Print the data frame

> cane
   Variety Nitrogen Yield   N V
 1       1      150  70.5 150 1
 2       1      150  67.5 150 1
 3       1      150  63.9 150 1
 4       1      150  64.2 150 1
 5       1      210  67.3 210 1
 6       1      210  75.9 210 1
 7       1      210  72.2 210 1
 8       1      210  60.5 210 1
 9       1      270  79.9 270 1
10       1      270  72.8 270 1
11       1      270  64.8 270 1
12       1      270  86.3 270 1
13       2      150  58.6 150 2
14       2      150  65.2 150 2
15       2      150  70.2 150 2
16       2      150  51.8 150 2
17       2      210  64.3 210 2
18       2      210  48.3 210 2
19       2      210  74.0 210 2
20       2      210  63.6 210 2
21       2      270  64.4 270 2
22       2      270  67.3 270 2
23       2      270  78.0 270 2
24       2      270  72.0 270 2
25       3      150  65.8 150 3
26       3      150  68.3 150 3
27       3      150  72.7 150 3
28       3      150  67.6 150 3
29       3      210  64.1 210 3
30       3      210  64.8 210 3
31       3      210  70.9 210 3
32       3      210  58.3 210 3
33       3      270  56.3 270 3
34       3      270  54.7 270 3
35       3      270  66.2 270 3
36       3      270  54.4 270 3

# Compute mean yields for all combinations
# of nitrogen levels and varieties and
# make a profile plot. At this point
# UNIX users should open a graphics
# window with the motif( ) function.

> means <- tapply(cane$Yield,
             list(cane$Variety,cane$Nitrogen), mean)
> means
     150    210    270
1 66.525 68.975 75.950
2 61.450 62.550 70.425
3 68.600 64.525 57.900

# Set up the profile plot

> par(fin=c(7,7),cex=1.2,lwd=3,mex=1.5,mkh=.20)
> x.axis <- unique(cane$Nitrogen)
> matplot(c(130,270), c(50,80), type="n",
          xlab="Nitrogen(lb/acre)",
          ylab="Mean Yield",
          main= "Sugar Cane Yields")

# Add a profile for each variety. matlines( )
# draws one line per column, so the matrix of
# means (varieties in rows) is transposed first.

> matlines(x.axis,t(means),type='l',
           lty=c(1,3,5),lwd=3)

# Plot symbols for the sample means

> matpoints(x.axis,t(means), pch=c(15,16,18))

# Add a legend to the plot

> legend(130,60, legend=c('Variety 1',
         'Variety 2','Variety 3'),
         lty=c(1,3,5),bty='n')

[Figure: Sugar Cane Yields. Mean Yield plotted against Nitrogen(lb/acre), with one profile for each of Variety 1, Variety 2, and Variety 3.]

# Fit a model with main effects and
# interaction effects.
# Compute both sets of Type I sums of squares.

> options(contrasts=c('contr.sum','contr.poly'))

> lm.out1 <- lm(Yield~N*V, data=cane)
> anova(lm.out1)

Analysis of Variance Table

Response: Yield

Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
        N  2    56.541  28.2703 0.608467 0.5514780
        V  2   319.374 159.6869 3.436975 0.0467974
      N:V  4   559.788 139.9469 3.012107 0.0354707
Residuals 27  1254.460  46.4615

> lm.out2 <- lm(Yield~V*N, data=cane)
> anova(lm.out2)

Analysis of Variance Table

Response: Yield

Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
        V  2   319.374 159.6869 3.436975 0.0467974
        N  2    56.541  28.2703 0.608467 0.5514780
      V:N  4   559.788 139.9469 3.012107 0.0354707
Residuals 27  1254.460  46.4615

> summary(lm.out2, correlation=F)

Call: lm(formula = Yield ~ V * N, data = cane)
Residuals:
    Min     1Q  Median    3Q   Max
 -14.25 -3.131 -0.3625 3.956 11.45

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 66.3222     1.1360 58.3800   0.0000
         V1  4.1611     1.6066  2.5900   0.0153
         V2 -1.5139     1.6066 -0.9423   0.3544
         N1 -0.7972     1.6066 -0.4962   0.6238
         N2 -0.9722     1.6066 -0.6051   0.5501
       V1N1 -3.1611     2.2721 -1.3913   0.1755
       V2N1 -2.5611     2.2721 -1.1272   0.2696
       V1N2 -0.5361     2.2721 -0.2360   0.8152
       V2N2 -1.2861     2.2721 -0.5660   0.5760

Residual standard error: 6.816 on 27 degrees of freedom
Multiple R-Squared: 0.4272
F-statistic: 2.517 on 8 and 27 degrees of freedom, the p-value is 0.03462

> model.matrix(lm.out2)
   (Intercept) V1 V2 N1 N2 V1N1 V2N1 V1N2 V2N2
 1           1  1  0  1  0    1    0    0    0
 2           1  1  0  1  0    1    0    0    0
 3           1  1  0  1  0    1    0    0    0
 4           1  1  0  1  0    1    0    0    0
 5           1  1  0  0  1    0    0    1    0
 6           1  1  0  0  1    0    0    1    0
 7           1  1  0  0  1    0    0    1    0
 8           1  1  0  0  1    0    0    1    0
 9           1  1  0 -1 -1   -1    0   -1    0
10           1  1  0 -1 -1   -1    0   -1    0
11           1  1  0 -1 -1   -1    0   -1    0
12           1  1  0 -1 -1   -1    0   -1    0
13           1  0  1  1  0    0    1    0    0
14           1  0  1  1  0    0    1    0    0
15           1  0  1  1  0    0    1    0    0
16           1  0  1  1  0    0    1    0    0
17           1  0  1  0  1    0    0    0    1
18           1  0  1  0  1    0    0    0    1
19           1  0  1  0  1    0    0    0    1
20           1  0  1  0  1    0    0    0    1
21           1  0  1 -1 -1    0   -1    0   -1
22           1  0  1 -1 -1    0   -1    0   -1
23           1  0  1 -1 -1    0   -1    0   -1
24           1  0  1 -1 -1    0   -1    0   -1
25           1 -1 -1  1  0   -1   -1    0    0
26           1 -1 -1  1  0   -1   -1    0    0
27           1 -1 -1  1  0   -1   -1    0    0
28           1 -1 -1  1  0   -1   -1    0    0
29           1 -1 -1  0  1    0    0   -1   -1
30           1 -1 -1  0  1    0    0   -1   -1
31           1 -1 -1  0  1    0    0   -1   -1
32           1 -1 -1  0  1    0    0   -1   -1
33           1 -1 -1 -1 -1    1    1    1    1
34           1 -1 -1 -1 -1    1    1    1    1
35           1 -1 -1 -1 -1    1    1    1    1
36           1 -1 -1 -1 -1    1    1    1    1

# Create diagnostic plots

> par(mfrow=c(3,2))
> plot(lm.out1)

[Figure: six diagnostic plots for lm.out1: Residuals versus fitted values (Fitted : N * V), sqrt(abs(Residuals)) versus fitted values, Yield versus fitted values, Residuals versus Quantiles of Standard Normal, a residual-fit spread plot (f-value), and Cook's Distance versus Index. Observations 11, 18, and 19 are labeled in the plots.]

# Create a data frame containing the original
# data and the residuals and estimated means

> data.frame(cane$Nitrogen,cane$Variety,
             cane$Yield,Pred=lm.out1$fitted,
             Resid=round(lm.out1$resid,3))

    X1 X2   X3   Pred   Resid
 1 150  1 70.5 66.525   3.975
 2 150  1 67.5 66.525   0.975
 3 150  1 63.9 66.525  -2.625
 4 150  1 64.2 66.525  -2.325
 5 210  1 67.3 68.975  -1.675
 6 210  1 75.9 68.975   6.925
 7 210  1 72.2 68.975   3.225
 8 210  1 60.5 68.975  -8.475
 9 270  1 79.9 75.950   3.950
10 270  1 72.8 75.950  -3.150
11 270  1 64.8 75.950 -11.150
12 270  1 86.3 75.950  10.350
13 150  2 58.6 61.450  -2.850
14 150  2 65.2 61.450   3.750
15 150  2 70.2 61.450   8.750
16 150  2 51.8 61.450  -9.650
17 210  2 64.3 62.550   1.750
18 210  2 48.3 62.550 -14.250
19 210  2 74.0 62.550  11.450
20 210  2 63.6 62.550   1.050
21 270  2 64.4 70.425  -6.025
22 270  2 67.3 70.425  -3.125
23 270  2 78.0 70.425   7.575
24 270  2 72.0 70.425   1.575
25 150  3 65.8 68.600  -2.800
26 150  3 68.3 68.600  -0.300
27 150  3 72.7 68.600   4.100
28 150  3 67.6 68.600  -1.000
29 210  3 64.1 64.525  -0.425
30 210  3 64.8 64.525   0.275
31 210  3 70.9 64.525   6.375
32 210  3 58.3 64.525  -6.225
33 270  3 56.3 57.900  -1.600
34 270  3 54.7 57.900  -3.200
35 270  3 66.2 57.900   8.300
36 270  3 54.4 57.900  -3.500
# Compute Type III sums of squares and
# corresponding F-tests.

# Generate an identity matrix and a
# vector of ones

> Iden <- function(n) diag(rep(1,n))
> one <- function(n) matrix(rep(1,n),ncol=1)

# Compute the transpose of the model
# matrix for the cell means model

> s <- length(unique(cane$Nitrogen))
> t <- length(unique(cane$Variety))
> st <- s*t
> r <- length(cane$Yield)/(st)
> D <- t(kronecker(Iden(st), t(one(r))))

# Least squares estimation

> y <- matrix(cane$Yield,ncol=1)
> b <- solve(crossprod(D)) %*% crossprod(D,y)
> yhat <- D %*% b
> sse <- crossprod(y-yhat)
> df2 <- nrow(y) - st

> c1 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                   t(one(t)) )
> q1 <- t(b) %*% t(c1)%*%
        solve( c1 %*% solve(crossprod(D)) %*% t(c1))%*%
        c1 %*% b
> df1<- s-1
> f <- (q1/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    1    1    0    0    0   -1   -1   -1
[2,]    0    0    0    1    1    1   -1   -1   -1

> data.frame(SS=q1,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 319.3739  2 3.436975 0.04679743

> c2 <- kronecker( t(one(s)),
                   cbind(Iden(t-1),-one(t-1)) )
> q2 <- t(b) %*% t(c2)%*%
        solve( c2 %*% solve(crossprod(D)) %*% t(c2))%*%
        c2 %*% b
> df1<- t-1
> f <- (q2/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c2
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    0   -1    1    0   -1    1    0   -1
[2,]    0    1   -1    0    1   -1    0    1   -1

> data.frame(SS=q2,df=df1,F.stat=f,p.value=p)
        SS df   F.stat  p.value
1 56.54056  2 0.608467 0.551478

> c3 <- kronecker( cbind(Iden(s-1),-one(s-1)),
                   cbind(Iden(t-1),-one(t-1)) )
> q3 <- t(b) %*% t(c3)%*%
        solve( c3 %*% solve(crossprod(D)) %*% t(c3))%*%
        c3 %*% b
> df1<- (s-1)*(t-1)
> f <- (q3/df1)/(sse/df2)
> p <- 1-pf(f,df1,df2)
> c3
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    0   -1    0    0    0   -1    0    1
[2,]    0    1   -1    0    0    0    0   -1    1
[3,]    0    0    0    1    0   -1   -1    0    1
[4,]    0    0    0    0    1   -1    0   -1    1

> data.frame(SS=q3,df=df1,F.stat=f,p.value=p)
        SS df   F.stat    p.value
1 559.7878  4 3.012107 0.03547072

Conclusions:

Variety 3 exhibits a "linear" decrease in yield as nitrogen increases from 150
lb/acre to 270 lb/acre.

Varieties 1 and 2 exhibit parallel "linear" increasing trends in yield as nitrogen increases from 150 lb/acre to 270 lb/acre.

Variety 1 appears to provide a consistently higher yield than Variety 2, but the difference between these two varieties is not "significant" at the .05 level.

Variety 3 seems to do as well as Variety 1 at 150 lb/acre of nitrogen.

/* Analysis of completely randomized
   factorial experiments with an
   application to the sugar cane
   data from Snedecor and Cochran.
   This program is posted as cane.sas */

data set1;
  infile 'cane.dat';
  input variety nitrogen yield;
run;

/* Print the data */

proc print data=set1;
  var yield;
run;

/* Compute an ANOVA table */

proc glm data=set1;
  class variety nitrogen;
  model yield = variety|nitrogen / p clm alpha=.05
                ss1 ss2 ss3 ss4 e e1 e2 e3 e4;
  output out=setr r=resid p=yhat;
  lsmeans variety*nitrogen / stderr pdiff;
  means variety nitrogen / tukey;
  contrast 'n-linear' nitrogen -1 0 1;
  contrast 'n-quad' nitrogen -1 2 -1;
  contrast 'v1-v2' variety 1 -1 0;
  contrast '(v1+v2)-v3' variety .5 .5 -1;
  contrast '(v1-v2)*(n-lin)' variety*nitrogen
           -1 0 1 1 0 -1 0 0 0;
  contrast '(v1-v2)*(n-quad)' variety*nitrogen
           -1 2 -1 1 -2 1 0 0 0;
  contrast '(.5(v1+v2)-v3)*(n-lin)' variety*nitrogen
           -.5 0 .5 -.5 0 .5 1 0 -1;
  contrast '(.5(v1+v2)-v3)*(n-quad)' variety*nitrogen
           -.5 1 -.5 -.5 1 -.5 1 -2 1;
  estimate 'n-linear' nitrogen -1 0 1;
  estimate 'n-quad' nitrogen -1 2 -1;
  estimate 'v1-v2' variety 1 -1 0;
  estimate '(v1+v2)-v3' variety .5 .5 -1;
  estimate '(v1-v2)*(n-lin)' variety*nitrogen
           -1 0 1 1 0 -1 0 0 0;
  estimate '(v1-v2)*(n-quad)' variety*nitrogen
           -1 2 -1 1 -2 1 0 0 0;
  estimate '(.5(v1+v2)-v3)*(n-lin)' variety*nitrogen
           -.5 0 .5 -.5 0 .5 1 0 -1;
  estimate '(.5(v1+v2)-v3)*(n-quad)' variety*nitrogen
           -.5 1 -.5 -.5 1 -.5 1 -2 1;
run;

/* Make a profile plot for the
   interaction between varieties
   and nitrogen levels */

/* UNIX users can use the following options */

/* goptions cback=white colors=(black)
   targetdevice=ps300 rotate=landscape; */

/* Windows users can use the following */

goptions cback=white colors=black
         device=WIN target=WINPRTC;

proc sort data=set1; by variety nitrogen;

proc means data=set1 noprint;
  by variety nitrogen;
  var yield;
  output out=means mean=my;
run;

axis1 label=(f=swiss h=2.5)
      ORDER = 120 to 300 by 30
      value=(f=swiss h=2.0)
      w=3.0 length= 5.5 in;

axis2 label=(f=swiss h=2.0)
      order = 50 to 80 by 10
      value=(f=swiss h=2.0)
      w= 3.0 length = 5.5 in;

SYMBOL1 V=CIRCLE H=2.0 w=3 l=1 i=join ;
SYMBOL2 V=DIAMOND H=2.0 w=3 l=3 i=join ;
SYMBOL3 V=square H=2.0 w=3 l=9 i=join ;

PROC GPLOT DATA=means;
  PLOT my*nitrogen=variety / vaxis=axis2 haxis=axis1;
  TITLE1 H=3.0 F=swiss "Sugar Cane Yields";
  LABEL my='Mean Yield';
  LABEL nitrogen = 'Nitrogen (lb/acre)';
RUN;

General Form of Estimable Functions

Effect                        Coefficients

Intercept                     L1

variety           1           L2
variety           2           L3
variety           3           L1-L2-L3

nitrogen          150         L5
nitrogen          210         L6
nitrogen          270         L1-L5-L6

variety*nitrogen  1 150       L8
variety*nitrogen  1 210       L9
variety*nitrogen  1 270       L2-L8-L9
variety*nitrogen  2 150       L11
variety*nitrogen  2 210       L12
variety*nitrogen  2 270       L3-L11-L12
variety*nitrogen  3 150       L5-L8-L11
variety*nitrogen  3 210       L6-L9-L12
variety*nitrogen  3 270       L1-L2-L3-L5-L6+L8+L9+L11+L12

Type I Estimable Functions

                              ------------------------------Coefficients------------------------------
Effect                        variety                nitrogen               variety*nitrogen

Intercept                     0                      0                      0

variety           1           L2                     0                      0
variety           2           L3                     0                      0
variety           3           -L2-L3                 0                      0

nitrogen          150         0                      L5                     0
nitrogen          210         0                      L6                     0
nitrogen          270         0                      -L5-L6                 0

variety*nitrogen  1 150       0.3333*L2              0.3333*L5              L8
variety*nitrogen  1 210       0.3333*L2              0.3333*L6              L9
variety*nitrogen  1 270       0.3333*L2              -0.3333*L5-0.3333*L6   -L8-L9
variety*nitrogen  2 150       0.3333*L3              0.3333*L5              L11
variety*nitrogen  2 210       0.3333*L3              0.3333*L6              L12
variety*nitrogen  2 270       0.3333*L3              -0.3333*L5-0.3333*L6   -L11-L12
variety*nitrogen  3 150       -0.3333*L2-0.3333*L3   0.3333*L5              -L8-L11
variety*nitrogen  3 210       -0.3333*L2-0.3333*L3   0.3333*L6              -L9-L12
variety*nitrogen  3 270       -0.3333*L2-0.3333*L3   -0.3333*L5-0.3333*L6   L8+L9+L11+L12

Type II, Type III, and Type IV Estimable Functions

For these balanced data, the coefficients that PROC GLM prints for the Type II, Type III, and Type IV estimable functions are identical to the Type I coefficients in the table above.

Dependent Variable: yield

                       Sum of        Mean
Source      DF        Squares      Square      F     Pr > F
Model        8       935.7022    116.9628   2.52     0.0346
Error       27      1254.4600     46.4615
C. Total    35      2190.1622

                                    Mean
Source      DF      Type I SS      Square      F     Pr > F
variety      2       319.3739    159.6869   3.44     0.0468
nitrogen     2        56.5406     28.2703   0.61     0.5515
var*nit      4       559.7878    139.9469   3.01     0.0355

The Type II SS, Type III SS, and Type IV SS tables in the output contain the same sums of squares, mean squares, F values, and p-values as the Type I SS table.

Least Squares Means

                        yield      Standard
variety   nitrogen     LSMEAN         Error    Pr > |t|
   1        150        66.525      3.408133     <.0001
   1        210        68.975      3.408133     <.0001
   1        270        75.950      3.408133     <.0001
   2        150        61.450      3.408133     <.0001
   2        210        62.550      3.408133     <.0001
   2        270        70.425      3.408133     <.0001
   3        150        68.600      3.408133     <.0001
   3        210        64.525      3.408133     <.0001
   3        270        57.900      3.408133     <.0001

Least Squares Means for effect variety*nitrogen
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: yield

i/j       1        2        3        4        5        6        7        8        9
 1             0.6154   0.0610   0.3017   0.4168   0.4255   0.6702   0.6815   0.0848
 2   0.6154             0.1594   0.1301   0.1937   0.7658   0.9386   0.3640   0.0296
 3   0.0610   0.1594             0.0056   0.0098   0.2617   0.1389   0.0252   0.0009
 4   0.3017   0.1301   0.0056             0.8212   0.0735   0.1495   0.5289   0.4678
 5   0.4168   0.1937   0.0098   0.8212             0.1139   0.2202   0.6852   0.3432
 6   0.4255   0.7658   0.2617   0.0735   0.1139             0.7079   0.2315   0.0150
 7   0.6702   0.9386   0.1389   0.1495   0.2202   0.7079             0.4053   0.0350
 8   0.6815   0.3640   0.0252   0.5289   0.6852   0.2315   0.4053             0.1806
 9   0.0848   0.0296   0.0009   0.4678   0.3432   0.0150   0.0350   0.1806

Contrast                   DF    Contrast SS    Mean Square   F Value   Pr > F
n-linear                    1     39.5266667     39.5266667      0.85   0.3645
n-quad                      1     17.0138889     17.0138889      0.37   0.5501
v1-v2                       1    193.2337500    193.2337500      4.16   0.0513
(v1+v2)-v3                  1    126.1401389    126.1401389      2.71   0.1110
(v1-v2)*(n-lin)             1      0.2025000      0.2025000      0.00   0.9478
(v1-v2)*(n-quad)            1      1.6875000      1.6875000      0.04   0.8503
(.5(v1+v2)-v3)*(n-lin)      1    528.0133333    528.0133333     11.36   0.0023
(.5(v1+v2)-v3)*(n-quad)     1     29.8844444     29.8844444      0.64   0.4296

                                             Standard
Parameter                     Estimate          Error   t Value   Pr > |t|
n-linear                     2.5666667      2.7827289      0.92     0.3645
n-quad                      -2.9166667      4.8198279     -0.61     0.5501
v1-v2                        5.6750000      2.7827289      2.04     0.0513
(v1+v2)-v3                   3.9708333      2.4099139      1.65     0.1110
(v1-v2)*(n-lin)              0.4500000      6.8162659      0.07     0.9478
(v1-v2)*(n-quad)             2.2500000     11.8061189      0.19     0.8503
(.5(v1+v2)-v3)*(n-lin)      19.9000000      5.9030595      3.37     0.0023
(.5(v1+v2)-v3)*(n-quad)     -8.2000000     10.2243989     -0.80     0.4296

Two factor experiments with empty cells:

Data from Littell, Freund, and Spector, 1991, SAS System for Linear Models, 3rd edition, SAS Institute, Cary, N.C.

                          Factor B
Factor A     j = 1          j = 2          j = 3
i = 1        Y111 = 5       Y121 = 2       (no data)
             Y112 = 6       Y122 = 3
                            Y123 = 5
                            Y124 = 6
                            Y125 = 7
i = 2        Y211 = 2       Y221 = 8       Y231 = 4
             Y212 = 3       Y222 = 8       Y232 = 4
                            Y223 = 9       Y233 = 6
                                           Y234 = 6
                                           Y235 = 7

$\mu_{ij} = E(\bar{Y}_{ij\cdot}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$ is estimable for all $(i,j) \neq (1,3)$.
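Each cell mean $\mu_{ij}$ with $n_{ij} > 0$ is estimated by its sample mean $\bar{Y}_{ij\cdot}$, and a cell with no observations contributes no estimate. As a small illustration, the following Python sketch (not part of the original SAS program) tabulates the cell counts and sample means for the Littell, Freund, and Spector data and lists any empty cells. The variable names are hypothetical; the data values are those listed in the data table.

```python
from collections import defaultdict

# (A, B, y) triples for the two-factor data with one empty cell
data = [
    (1, 1, 5), (1, 1, 6),
    (1, 2, 2), (1, 2, 3), (1, 2, 5), (1, 2, 6), (1, 2, 7),
    (2, 1, 2), (2, 1, 3),
    (2, 2, 8), (2, 2, 8), (2, 2, 9),
    (2, 3, 4), (2, 3, 4), (2, 3, 6), (2, 3, 6), (2, 3, 7),
]

# Group the responses by cell (i, j)
cells = defaultdict(list)
for a, b, y in data:
    cells[(a, b)].append(y)

levels_a = sorted({a for a, _, _ in data})
levels_b = sorted({b for _, b, _ in data})

# Cell counts n_ij for every combination of levels, including empty cells
counts = {(a, b): len(cells.get((a, b), []))
          for a in levels_a for b in levels_b}

# Sample means exist only for cells that contain data
means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}

# Cells whose mean cannot be estimated from the data
empty = [cell for cell, n in counts.items() if n == 0]

for cell in sorted(counts):
    if counts[cell] > 0:
        print(cell, "n =", counts[cell], "mean =", round(means[cell], 3))
    else:
        print(cell, "n = 0 (no estimate of the cell mean)")
```

Treating each non-empty cell as one level of a single combined factor, as in this dictionary of cells, is one way to organize the comparisons of means when some factor combinations have no data.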
Sample sizes:

                        Factor B
Factor A    j = 1        j = 2        j = 3
i = 1       n11 = 2      n12 = 5      –
i = 2       n21 = 2      n22 = 3      n23 = 5

Effects model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

for $(i,j) \neq (1,3)$ and $k = 1, \ldots, n_{ij}$.

Functions of parameters that are not estimable include:

$$\mu_{13} = \mu + \alpha_1 + \beta_3 + \gamma_{13}$$

$$\bar{\mu}_{\cdot\cdot} = \frac{1}{6}\sum_{i=1}^{2}\sum_{j=1}^{3}\mu_{ij} = \mu + \frac{1}{2}(\alpha_1 + \alpha_2) + \frac{1}{3}(\beta_1 + \beta_2 + \beta_3) + \frac{1}{6}(\gamma_{11} + \gamma_{12} + \gamma_{13} + \gamma_{21} + \gamma_{22} + \gamma_{23})$$

$$\bar{\mu}_{1\cdot} = \frac{1}{3}\sum_{j=1}^{3}\mu_{1j}$$

$$\bar{\mu}_{\cdot 3} = \frac{1}{2}(\mu_{13} + \mu_{23})$$

Two factor classifications with empty cells:

• No single "best" or "correct" analysis.
• Analysis of variance
  – Test for interaction is useful
  – Use SSE to estimate the error variance $\sigma^2$.
  – Tests for "main effects" may not be meaningful, especially in the presence of interaction.
• Compute F-tests and sums of squares for meaningful contrasts.
• Compare estimated means for different combinations of factor levels.
• It may be most convenient to consider the various combinations of factor levels as levels of a single "combined" factor.
  – one-way ANOVA
  – contrasts
  – compare means

/* SAS code for analyzing data from the two factor
   experiment with no data for one combination of
   factors. This code is posted as littell.sas */

data set1;
  input A B y;
  cards;
1 1 5
1 1 6
1 2 2
1 2 3
1 2 5
1 2 6
1 2 7
2 1 2
2 1 3
2 2 8
2 2 8
2 2 9
2 3 4
2 3 4
2 3 6
2 3 6
2 3 7
run;

/* Print the data */

proc print data=set1;
run;

/* Compute sample means for all factor combinations
   with data. Make a profile plot. */

proc sort data=set1; by a b;

proc means data=set1 noprint;
  by a b;
  var Y;
  output out=means mean=my;
run;

SYMBOL1 V=circle H=2.0 w=3 l=1 i=join;
SYMBOL2 V=diamond H=2.0 w=3 l=3 i=join;

goptions cback=white colors=black device=WIN target=WINPRTC;

/* goptions cback=white colors=(black)
   targetdevice=ps300 rotate=landscape; */

proc gplot data=means;
  plot my*b=a / vaxis=axis2 haxis=axis1;
  title ls=0.8in H=3.0 F=swiss "Sample Means";
  label my='Mean';
  label b = 'Factor B';
  footnote ls=0.4in ' ';
run;

/* Perform analysis of variance where factor A is entered into the model before factor B.
   Use the LSMEANS statement to compare means for
   different combinations of factor A and factor B. */

axis1 label=(f=swiss h=2.0) value=(f=swiss h=1.8) w=3.0 length=5.0 in;
axis2 label=(f=swiss h=2.0 a=90 r=0) value=(f=swiss h=1.8) w=3.0 length=5.0 in;

proc glm data=set1;
  class A B;
  model y = A B A*B / solution ss1 ss2 ss3 ss4 e e1 e2 e3 e4 p;
  means A B A*B;
  lsmeans A*B / pdiff tdiff stderr;
  estimate 'A1-A2' A 1 -1 / e;
  contrast 'A1-A2' A 1 -1 / e;
  estimate 'A1-A2 within B1' A 1 -1 A*B 1 0 -1 0 0 / e;
  estimate 'A1-A2 within B2' A 1 -1 A*B 0 1 0 -1 0 / e;
  estimate 'A1-A2 over B' A 1 -1 A*B .5 .5 -.5 -.5 0 / e;
  estimate 'B1-B2 over A' B 1 -1 0 A*B .5 -.5 .5 -.5 0 / e;
  estimate 'B3-.5(B1+B2) in A2' B -.5 -.5 1 A*B 0 0 -.5 -.5 1 / e;
  estimate 'interaction' A*B 1 -1 -1 1 0 / e;
run;

/* Do everything with a one-factor ANOVA by combining
   the two factors into a single factor with 5 categories. */

data set1; set set1;
  C=10*A+B;
run;

proc glm data=set1;
  class C;
  model y = C / solution e e2;
  estimate 'C11-C21' C 1 0 -1 0 0;
  estimate 'C12-C22' C 0 1 0 -1 0;
  /* The next label reads +C22, but its coefficients estimate .5(C11+C12-C21-C22). */
  estimate '.5(C11+C12-C21+C22)' C .5 .5 -.5 -.5 0;
  estimate '.5(C11-C12+C21-C22)' C .5 -.5 .5 -.5 0;
  estimate 'C23-.5(C21+C22)' C 0 0 -.5 -.5 1;
  estimate 'C11-C12-C21+C22' C 1 -1 -1 1 0;
  lsmeans C / stderr tdiff pdiff;
run;

General Form of Estimable Functions

Effect          Coefficients
Intercept       L1
A    1          L2
A    2          L1-L2
B    1          L4
B    2          L5
B    3          L1-L4-L5
A*B  1 1        L7
A*B  1 2        L2-L7
A*B  2 1        L4-L7
A*B  2 2        -L2+L5+L7
A*B  2 3        L1-L4-L5

Type III Estimable Functions

                ----------------Coefficients----------------
Effect          A           B                    A*B
Intercept       0           0                    0
A    1          L2          0                    0
A    2          -L2         0                    0
B    1          0           L4                   0
B    2          0           L5                   0
B    3          0           -L4-L5               0
A*B  1 1        0.5*L2      0.25*L4-0.25*L5      L7
A*B  1 2        0.5*L2      -0.25*L4+0.25*L5     -L7
A*B  2 1        -0.5*L2     0.75*L4+0.25*L5      -L7
A*B  2 2        -0.5*L2     0.25*L4+0.75*L5      L7
A*B  2 3        0           -L4-L5               0

Type IV Estimable Functions

                ---------Coefficients---------
Effect          A           B           A*B
Intercept       0           0           0
A    1          L2          0           0
A    2          -L2         0           0
B    1          0           L4          0
B    2          0           L5          0
B    3          0           -L4-L5      0
A*B  1 1        0.5*L2      0           L7
A*B  1 2        0.5*L2      0           -L7
A*B  2 1        -0.5*L2     L4          -L7
A*B  2 2        -0.5*L2     L5          L7
A*B  2 3        0           -L4-L5      0

NOTE: Other Type IV estimable functions exist.

General Form of Estimable Functions

Effect          Coefficients
Intercept       L1
C    11         L2
C    12         L3
C    21         L4
C    22         L5
C    23         L1-L2-L3-L4-L5

Type II Estimable Functions

Effect          Coefficients (C)
Intercept       0
C    11         L2
C    12         L3
C    21         L4
C    22         L5
C    23         -L2-L3-L4-L5

Dependent Variable: y

Source      DF    Sum of Squares    Mean Square    F Value    Pr > F
Model        4           45.8157        11.4539       5.27    0.0110
Error       12           26.0667         2.1722
C. Total    16           71.8824

Parameter                  Estimate    Standard Error        t    Pr > |t|
C11-C21                      3.0000            1.4738     2.04      0.0645
C12-C22                     -3.7333            1.0763    -3.47      0.0046
.5(C11+C12-C21+C22)         -0.3667            0.9125    -0.40      0.6949
.5(C11-C12+C21-C22)         -2.4667            0.9125    -2.70      0.0192
C23-.5(C21+C22)             -0.0167            0.9418    -0.02      0.9862
C11-C12-C21+C22              6.7333            1.8250     3.69      0.0031

Least Squares Means

C      y LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
11       5.5000            1.0421      0.0002                1
12       4.6000            0.6591      <.0001                2
21       2.5000            1.0422      0.0336                3
22       8.3333            0.8509      <.0001                4
23       5.4000            0.6591      <.0001                5

Least Squares Means for Effect C
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: y

i/j          1            2            3            4            5
 1                     0.7299       2.0355      -2.1059       0.0811
                       0.4795       0.0645       0.0569       0.9367
 2      -0.7299                     1.7030      -3.4685      -0.8582
         0.4795                     0.1143       0.0046       0.4076
 3      -2.0355     -1.70301                    -4.3357      -2.3518
         0.0645       0.1143                     0.0010       0.0366
 4       2.1059      3.46853       4.3357                     2.7253
         0.0569       0.0046       0.0010                     0.0184
 5      -0.0811      0.85824       2.3518      -2.7253
         0.9367       0.4076       0.0366       0.0184

Estimable functions for Type IV sums of squares may depend on

• location of empty cells
• ordering of the levels for the row and column factors

Example: Exchange columns 1 and 3 in the previous example.

                                     Factor 2
Factor 1    A (old j=3)              B                         C (old j=1)
i = 1       –                        $\bar{Y}_{12\cdot} = 4.6$, n12 = 5     $\bar{Y}_{13\cdot} = 5.5$, n13 = 2
i = 2       $\bar{Y}_{21\cdot} = 5.4$, n21 = 5    $\bar{Y}_{22\cdot} = 8.33$, n22 = 3    $\bar{Y}_{23\cdot} = 2.5$, n23 = 2

Type IV estimable functions for Factor B:

$\mu_{2A} - \mu_{2C}$, with coefficients on the cell means

            A       B       C
i = 1       –       0       0
i = 2       1       0      -1

$\frac{1}{2}(\mu_{1B} + \mu_{2B}) - \frac{1}{2}(\mu_{1C} + \mu_{2C})$, with coefficients on the cell means

            A       B       C
i = 1       –      .5     -.5
i = 2       0      .5     -.5

In either case, Type IV sums of squares and testable functions are not the same as Type III sums of squares and testable functions.

Additive model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, b, \quad k = 1, \ldots, n_{ij}$$

For this model $E(Y_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j$ may be estimable when $n_{ij} = 0$.

For Example 8.1, $n_{13} = 0$, but

$$\mu_{13} = \mu + \alpha_1 + \beta_3 = (\mu + \alpha_2 + \beta_3) - (\mu + \alpha_2 + \beta_2) + (\mu + \alpha_1 + \beta_2) = \mu_{23} + (\mu_{12} - \mu_{22}) = E(\bar{Y}_{23\cdot} - \bar{Y}_{22\cdot} + \bar{Y}_{12\cdot})$$

ANOVA

Sum of Squares: $R(\mu)$
Associated null hypothesis:
$$H_0: \mu + \sum_{i=1}^{a}\frac{n_{i\cdot}}{n_{\cdot\cdot}}\alpha_i + \sum_{j=1}^{b}\frac{n_{\cdot j}}{n_{\cdot\cdot}}\beta_j = 0 \quad \text{or} \quad H_0: \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{n_{ij}}{n_{\cdot\cdot}}\mu_{ij} = 0$$

Sum of Squares: $R(\alpha \mid \mu)$
Associated null hypothesis:
$$H_0: \alpha_i + \sum_{j=1}^{b}\frac{n_{ij}}{n_{i\cdot}}\beta_j \text{ are equal for all } i = 1, \ldots, a \quad \text{or} \quad H_0: \sum_{j=1}^{b}\frac{n_{ij}}{n_{i\cdot}}\mu_{ij} \text{ are equal for all } i$$

Sum of Squares: $R(\beta \mid \mu, \alpha)$
Associated null hypothesis:
$$H_0: \beta_j \text{ are equal for all } j = 1, \ldots, b$$

ANOVA

Sum of Squares: $R(\mu)$
Associated null hypothesis:
$$H_0: \mu + \sum_{i=1}^{a}\frac{n_{i\cdot}}{n_{\cdot\cdot}}\alpha_i + \sum_{j=1}^{b}\frac{n_{\cdot j}}{n_{\cdot\cdot}}\beta_j = 0 \quad \text{or} \quad H_0: \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{n_{ij}}{n_{\cdot\cdot}}\mu_{ij} = 0$$

Sum of Squares: $R(\beta \mid \mu)$
Associated null hypothesis:
$$H_0: \beta_j + \sum_{i=1}^{a}\frac{n_{ij}}{n_{\cdot j}}\alpha_i \text{ are equal for all } j = 1, \ldots, b \quad \text{or} \quad H_0: \sum_{i=1}^{a}\frac{n_{ij}}{n_{\cdot j}}\mu_{ij} \text{ are equal for all } j$$

Sum of Squares: $R(\alpha \mid \mu, \beta)$
Associated null hypothesis:
$$H_0: \alpha_i \text{ are equal for all } i = 1, \ldots, a$$
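The additive-model identity $\mu_{13} = \mu_{23} - \mu_{22} + \mu_{12}$ can be verified coefficient by coefficient, and the corresponding combination of observed cell means evaluated for the example data. This is a minimal sketch under stated assumptions: the parameter ordering $(\mu, \alpha_1, \alpha_2, \beta_1, \beta_2, \beta_3)$ and the helper names `additive_row` and `cell_mean` are hypothetical, and the quantity computed is the simple unbiased combination $\bar{Y}_{23\cdot} - \bar{Y}_{22\cdot} + \bar{Y}_{12\cdot}$, which is not necessarily the least squares estimate under the additive model.

```python
from fractions import Fraction

# Observations by cell (i, j) from the example; cell (1, 3) has no data.
DATA = {
    (1, 1): [5, 6],
    (1, 2): [2, 3, 5, 6, 7],
    (2, 1): [2, 3],
    (2, 2): [8, 8, 9],
    (2, 3): [4, 4, 6, 6, 7],
}


def additive_row(i, j):
    """Coefficients of (mu, a1, a2, b1, b2, b3) in mu_ij = mu + a_i + b_j."""
    return [1] + [int(i == r) for r in (1, 2)] + [int(j == c) for c in (1, 2, 3)]


def cell_mean(i, j):
    """Exact sample mean of an observed cell."""
    ys = DATA[(i, j)]
    return Fraction(sum(ys), len(ys))


# Check mu_13 = mu_23 - mu_22 + mu_12 term by term on the coefficient vectors.
combo = [
    p - q + r
    for p, q, r in zip(additive_row(2, 3), additive_row(2, 2), additive_row(1, 2))
]
identity_holds = combo == additive_row(1, 3)

# Unbiased estimate of mu_13 built from the three observed cell means.
mu13_hat = cell_mean(2, 3) - cell_mean(2, 2) + cell_mean(1, 2)

print(identity_holds, float(mu13_hat))
```

With cell means 5.4, 8.33 and 4.6 for cells (2,3), (2,2) and (1,2), the combination evaluates to 5/3, about 1.67. Exact rational arithmetic is used so the result does not depend on rounding the cell means first.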