Topic 28: Unequal Replication in Two-Way ANOVA

Outline
• Two-way ANOVA with unequal numbers of observations in the cells
  – Data and model
  – Regression approach
  – Parameter estimates
• Previous analyses with constant n are just a special case

Data for two-way ANOVA
• Y is the response variable
• Factor A with levels i = 1 to a
• Factor B with levels j = 1 to b
• Yijk is the kth observation in cell (i,j)
• k = 1 to nij, and nij may vary from cell to cell

Recall Bread Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3 levels: bottom, middle, top
• B is the width of the shelf display, b=2 levels: regular, wide
• n=2 stores for each of the 3x2 treatment combinations (BALANCED)

Regression Approach
• Create a-1 dummy variables to represent levels of A
• Create b-1 dummy variables to represent levels of B
• Multiply each of the a-1 variables for A by each of the b-1 variables for B to get the (a-1)(b-1) variables for AB
LET'S LOOK AT THE RELATIONSHIP AMONG THESE SETS OF VARIABLES

Common Set of Variables
• Sum-to-zero restrictions:
  $$\sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0, \quad \sum_i (\alpha\beta)_{ij} = 0, \quad \sum_j (\alpha\beta)_{ij} = 0$$

  data a2;
    set a1;
    X1 = (height eq 1) - (height eq 3);
    X2 = (height eq 2) - (height eq 3);
    X3 = (width eq 1) - (width eq 2);
    X13 = X1*X3;
    X23 = X2*X3;
  run;

Run Proc Reg
  proc reg data=a2;
    model sales= X1 X2 X3 X13 X23 / XPX I;
    height: test X1, X2;
    width: test X3;
    interaction: test X13, X23;
  run;

X′X Matrix
Model Crossproducts X'X X'Y Y'Y
  Variable    Intercept   X1   X2   X3   X13   X23
  Intercept      12        0    0    0     0     0
  X1              0        8    4    0     0     0
  X2              0        4    8    0     0     0
  X3              0        0    0   12     0     0
  X13             0        0    0    0     8     4
  X23             0        0    0    0     4     8
• Sets of variables are orthogonal
• Crossproducts between sets are 0

Orthogonal X’s
• Order in which the sets of variables are fit in the model does not matter
  – Type I SS = Type III SS
• Order of fit not mattering is true for all choices of restrictions when nij is constant
• Orthogonality is lost when the nij are not constant

KNNL Example
• KNNL p 954
• Y is the change in growth rates for children after a treatment
• A is gender, a=2 levels: male, female
• B is bone development, b=3 levels: severely, moderately, or mildly depressed
• nij=3, 2, 2, 1, 3, 3 children in the groups

Read and check the data
  data a3;
    infile 'c:\...\CH23TA01.txt';
    input growth gender bone;
  proc print data=a3;
  run;

  Obs   growth   gender   bone
    1     1.4       1       1
    2     2.4       1       1
    3     2.2       1       1
    4     2.1       1       2
    5     1.7       1       2
    6     0.7       1       3
    7     1.1       1       3
    8     2.4       2       1
    9     2.5       2       2
   10     1.8       2       2
   11     2.0       2       2
   12     0.5       2       3
   13     0.9       2       3
   14     1.3       2       3

Common Set of Variables
• Same sum-to-zero restrictions:
  $$\sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0, \quad \sum_i (\alpha\beta)_{ij} = 0, \quad \sum_j (\alpha\beta)_{ij} = 0$$
• Here X1 and X2 code bone and X3 codes gender

  data a3;
    set a3;
    X1 = (bone eq 1) - (bone eq 3);
    X2 = (bone eq 2) - (bone eq 3);
    X3 = (gender eq 1) - (gender eq 2);
    X13 = X1*X3;
    X23 = X2*X3;
  run;

Run Proc Reg
  proc reg data=a3;
    model growth= X1 X2 X3 X13 X23 / XPX I;
  run;

X′X Matrix
Model Crossproducts X'X X'Y Y'Y
  Variable    Intercept   X1   X2   X3   X13   X23
  Intercept      14       -1    0    0     3     0
  X1             -1        9    5    3     1    -1
  X2              0        5   10    0    -1    -2
  X3              0        3    0   14    -1     0
  X13             3        1   -1   -1     9     5
  X23             0       -1   -2    0     5    10
• Cross-product terms between sets are no longer 0
• Order of fit matters

How does this impact the analysis?
• In regression, this happens all the time (explanatory variables are correlated)
  – t tests look at the significance of a variable when fitted last
• When comparing means, the order of fit alters the null hypothesis being tested

Prepare the data for a plot
  data a3;
    set a3;
    if (gender eq 1)*(bone eq 1) then gb='1_Msev ';
    if (gender eq 1)*(bone eq 2) then gb='2_Mmod ';
    if (gender eq 1)*(bone eq 3) then gb='3_Mmild';
    if (gender eq 2)*(bone eq 1) then gb='4_Fsev ';
    if (gender eq 2)*(bone eq 2) then gb='5_Fmod ';
    if (gender eq 2)*(bone eq 3) then gb='6_Fmild';
  run;

Plot the data
  title1 'Plot of the data';
  symbol1 v=circle i=none;
  proc gplot data=a3;
    plot growth*gb;
  run;

Find the means
  proc means data=a3;
    var growth;
    output out=a2 mean=avgrowth;
    by gender bone;
  run;

Plot the means
  title1 'Plot of the means';
  symbol1 v='M' i=join c=blue;
  symbol2 v='F' i=join c=green;
  proc gplot data=a2;
    plot avgrowth*bone=gender;
  run;

Plot of the means
[Figure: avgrowth (vertical axis) versus bone (1, 2, 3), one joined profile per gender (M = gender 1, F = gender 2). Interaction?]

Cell means model
• Yijk = μij + εijk
  – where μij is the theoretical mean or expected value of all observations in cell (i,j)
  – the εijk are iid N(0, σ²)
  – Yijk ~ N(μij, σ²), independent

Estimates
• Estimate μij by the mean of the observations in cell (i,j):
  $$\bar{Y}_{ij\cdot} = \frac{\sum_k Y_{ijk}}{n_{ij}}$$
• For each (i,j) combination with nij > 1, we can get an estimate of the variance:
  $$s_{ij}^2 = \frac{\sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2}{n_{ij} - 1}$$
• We pool these to get an estimate of σ²

Pooled estimate of σ²
• In general we pool the $s_{ij}^2$, using weights proportional to the df, nij - 1
  – A cell with nij = 1 (gender 2, bone 1 here) contributes 0 df
• The pooled estimate is
  $$s^2 = \frac{\sum_{i,j} (n_{ij} - 1)\, s_{ij}^2}{\sum_{i,j} (n_{ij} - 1)}$$
• Nothing different in terms of parameter estimates from the balanced design

Run proc glm
  proc glm data=a3;
    class gender bone;
    model growth=gender|bone/solution;
    means gender*bone;
  run;
• gender|bone is a shorthand way to write main effects and interactions

Parameter Estimates
• Solution option on the model statement gives parameter estimates for the glm parameterization
• These constraints are
  – Last level of each main effect is zero
  – Interaction terms with i = a or j = b are zero
• These reproduce the cell means in the usual way

Parameter Estimates
  Parameter            Estimate         Standard Error   t Value   Pr > |t|
  Intercept            0.90000000 B       0.2327373        3.87     0.0048
  gender 1            -0.00000000 B       0.3679900       -0.00     1.0000
  bone 1               1.50000000 B       0.4654747        3.22     0.0122
  bone 2               1.20000000 B       0.3291403        3.65     0.0065
  gender*bone 1 1     -0.40000000 B       0.5933661       -0.67     0.5192
  gender*bone 1 2     -0.20000000 B       0.5204165       -0.38     0.7108
Example: $\hat{\mu}_{22} = 0.90 + 1.20 = 2.10$

Output
  Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
  Model               5      4.4742857      0.89485714     5.51    0.0172
  Error               8      1.3000000      0.16250000
  Corrected Total    13      5.7742857
Note DF and SS add as usual

Output Type I SS
  Source         DF   Type I SS    Mean Square   F Value   Pr > F
  gender          1   0.0028571    0.00285714      0.02    0.8978
  bone            2   4.3960000    2.19800000     13.53    0.0027
  gender*bone     2   0.0754286    0.03771429      0.23    0.7980
SSG+SSB+SSGB=4.47429

Output Type III SS
  Source         DF   Type III SS   Mean Square   F Value   Pr > F
  gender          1   0.12000000    0.12000000      0.74    0.4152
  bone            2   4.18971429    2.09485714     12.89    0.0031
  gender*bone     2   0.07542857    0.03771429      0.23    0.7980
SSG+SSB+SSGB=4.38514

Type I vs Type III
• SS for Type I add up to the model SS
• SS for Type III do not necessarily add up
• Type I and Type III are the same for the interaction because it is the last term in the model
• The Type I and Type III analyses for the main effects are not necessarily the same
• Different hypotheses are being examined

Type I vs Type III
• Most people prefer the Type III analysis
• The Type III analysis can be misleading if the cell sizes differ greatly
• Contrasts can provide some insight into the differences in hypotheses

Contrast for A*B
• Same for Type I and Type III
• Null hypothesis is that the profiles are parallel; see plot for interpretation
• μ12 - μ11 = μ22 - μ21 and μ13 - μ12 = μ23 - μ22
• μ11 - μ12 - μ21 + μ22 = 0 and μ12 - μ13 - μ22 + μ23 = 0

A*B Contrast statement
  contrast 'gender*bone Type I and III'
    gender*bone 1 -1 0 -1 1 0,
    gender*bone 0 1 -1 0 -1 1;
  run;

Type III Contrast for gender
• Every cell mean gets equal weight
• (1) μ11 = (1)(μ + α1 + β1 + (αβ)11)
• (1) μ12 = (1)(μ + α1 + β2 + (αβ)12)
• (1) μ13 = (1)(μ + α1 + β3 + (αβ)13)
• (-1) μ21 = (-1)(μ + α2 + β1 + (αβ)21)
• (-1) μ22 = (-1)(μ + α2 + β2 + (αβ)22)
• (-1) μ23 = (-1)(μ + α2 + β3 + (αβ)23)
L = 3α1 - 3α2 + (αβ)11 + (αβ)12 + (αβ)13 - (αβ)21 - (αβ)22 - (αβ)23

Contrast statement Gender Type III
  contrast 'gender Type III'
    gender 3 -3
    gender*bone 1 1 1 -1 -1 -1;

Type I Contrast for gender
• Each cell mean is weighted by its cell size nij
• (3) μ11 = (3)(μ + α1 + β1 + (αβ)11)
• (2) μ12 = (2)(μ + α1 + β2 + (αβ)12)
• (2) μ13 = (2)(μ + α1 + β3 + (αβ)13)
• (-1) μ21 = (-1)(μ + α2 + β1 + (αβ)21)
• (-3) μ22 = (-3)(μ + α2 + β2 + (αβ)22)
• (-3) μ23 = (-3)(μ + α2 + β3 + (αβ)23)
L = (7α1 - 7α2) + (2β1 - β2 - β3) + 3(αβ)11 + 2(αβ)12 + 2(αβ)13 - 1(αβ)21 - 3(αβ)22 - 3(αβ)23

Contrast statement Gender Type I
  contrast 'gender Type I'
    gender 7 -7
    bone 2 -1 -1
    gender*bone 3 2 2 -1 -3 -3;

Contrast output
  Contrast                        DF   Contrast SS
  gender Type III                  1   0.12000000
  gender Type I                    1   0.00285714
  bone Type III                    2   4.18971429
  gender*bone Type I and III       2   0.07542857

Summary
• Type I and Type III F tests test different null hypotheses
• You should be aware of the differences
• Most prefer Type III as it follows logic similar to regression analysis
• Be wary, however, if the cell sizes vary dramatically

Comparing Means
• If interested in Type III hypotheses, need to use LSMEANS to do comparisons
• If interested in Type I hypotheses, need to use MEANS to do comparisons
• We will show this difference via the ESTIMATE statement

SAS Commands
• Will use earlier contrast code to set up the ESTIMATE commands
  estimate 'gender Type III'
    gender 3 -3
    gender*bone 1 1 1 -1 -1 -1 / divisor=3;
  estimate 'gender Type I'
    gender 7 -7
    bone 2 -1 -1
    gender*bone 3 2 2 -1 -3 -3 / divisor=7;

MEANS OUTPUT
  Level of        ------------growth-----------
  gender     N       Mean          Std Dev
  1          7    1.65714286     0.62411843
  2          7    1.62857143     0.75655862
Diff = 0.0286

LSMEANS OUTPUT
  gender   growth LSMEAN
  1         1.60000000
  2         1.80000000
Diff = -0.20

Estimate output
  Parameter          Estimate   Std Err
  gender Type III     -0.200    0.2327
  gender Type I        0.029    0.2155
Notice that these two estimates agree with the differences of the LSMEANS and MEANS estimates, respectively

Analytical Strategy
• First examine interaction
• Some options when the interaction is significant
  – Interpret the plot of means
  – Run A at each level of B and/or B at each level of A
  – Run as a one-way with ab levels
  – Use contrasts

Analytical Strategy
• Some options when the interaction is not significant
  – Use a multiple comparison procedure for the main effects
  – Use contrasts for main effects
  – If needed, rerun without the interaction

Example continued
  proc glm data=a3;
    class gender bone;
    model growth=gender bone/ solution;
    means gender bone/ tukey lines;
  run;
• Interaction dropped from the model: pool here because of small error df
• MEANS statement: for Type I hypotheses

Output
  Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
  Model               3      4.3988571      1.46628571    10.66    0.0019
  Error              10      1.3754286      0.13754286
  Corrected Total    13      5.7742857

Output Type I SS
  Source    DF   Type I SS     Mean Square   F Value   Pr > F
  gender     1   0.00285714    0.00285714      0.02    0.8883
  bone       2   4.39600000    2.19800000     15.98    0.0008

Output Type III SS
  Source    DF   Type III SS   Mean Square   F Value   Pr > F
  gender     1   0.09257143    0.09257143      0.67    0.4311
  bone       2   4.39600000    2.19800000     15.98    0.0008
Although the Type I and Type III tests for gender have different null hypotheses, neither is significant

Tukey comparisons
  Group    Mean     N   bone
  A       2.1000    4   1
  A       2.0200    5   2
  B       0.9000    5   3

Tukey Comparisons
• Why don’t we need a Tukey adjustment for gender?
• The MEANS statement does provide mean estimates, so you know the direction of the difference behind the F test, but that is all the statement provides

Last slide
• Read KNNL Chapter 23
• We used program topic28.sas to generate the output for today
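
Check: MEANS vs LSMEANS by hand
The gender differences reported by MEANS and LSMEANS can be reproduced from the six cell means. The worked arithmetic below is a sketch: the cell means are not printed in the output shown in these slides; they were computed here from the 14 listed observations (and agree with the GLM parameter estimates), with i = gender and j = bone.

```latex
% Cell means computed from the data listing (i = gender, j = bone)
\[
\bar{Y}_{11\cdot}=2.0\ (n_{11}=3),\quad
\bar{Y}_{12\cdot}=1.9\ (n_{12}=2),\quad
\bar{Y}_{13\cdot}=0.9\ (n_{13}=2)
\]
\[
\bar{Y}_{21\cdot}=2.4\ (n_{21}=1),\quad
\bar{Y}_{22\cdot}=2.1\ (n_{22}=3),\quad
\bar{Y}_{23\cdot}=0.9\ (n_{23}=3)
\]

% LSMEANS: unweighted average of the cell means (Type III hypothesis)
\[
\tfrac{1}{3}(2.0+1.9+0.9)=1.6,\qquad
\tfrac{1}{3}(2.4+2.1+0.9)=1.8,\qquad
1.6-1.8=-0.20
\]

% MEANS: cell means weighted by cell size n_{ij} (Type I hypothesis)
\[
\tfrac{1}{7}(3\cdot 2.0+2\cdot 1.9+2\cdot 0.9)=\tfrac{11.6}{7}\approx 1.657,\qquad
\tfrac{1}{7}(1\cdot 2.4+3\cdot 2.1+3\cdot 0.9)=\tfrac{11.4}{7}\approx 1.629
\]
\[
\tfrac{11.6}{7}-\tfrac{11.4}{7}=\tfrac{0.2}{7}\approx 0.029
\]
```

The weights 3, 2, 2 and 1, 3, 3 are the same coefficients that appear in the Type I contrast for gender, which is why the two ESTIMATE results match these differences.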
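
Sketch: requesting MEANS and LSMEANS comparisons
The slides say to use LSMEANS for Type III comparisons and MEANS for Type I comparisons but do not show the statements. A minimal sketch follows, assuming the growth data are in the data set a3 created by the read step. It is not taken from topic28.sas; the pdiff, cl, and slice= options are standard PROC GLM LSMEANS options chosen here for illustration.

```sas
proc glm data=a3;
  class gender bone;
  model growth = gender|bone;

  /* Type III-style comparison: unweighted averages of the cell means, */
  /* with the pairwise difference and confidence limits                */
  lsmeans gender / pdiff cl;

  /* Type I-style summary: observed marginal means, which weight each  */
  /* cell mean by its cell size                                        */
  means gender;

  /* One way to "run A at each level of B": test gender within each    */
  /* bone level (illustrative, not part of the original program)       */
  lsmeans gender*bone / slice=bone;
run;
quit;
```

With only two gender levels there is a single pairwise difference, so no multiplicity adjustment is requested on the gender LSMEANS statement.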