Topic 24: Two-Way ANOVA

Outline
• Two-way ANOVA
  – Data
  – Cell means model
  – Parameter estimates
  – Factor effects model

Two-Way ANOVA
• The response variable Y is continuous
• There are two categorical explanatory variables, or factors

Data for two-way ANOVA
• Y is the response variable
• Factor A with levels i = 1 to a
• Factor B with levels j = 1 to b
• $Y_{ijk}$ is the kth observation in cell (i,j)
• In Chapter 19, we assume an equal sample size in each cell ($n_{ij} = n$)

KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a = 3 levels: bottom, middle, top
• B is the width of the shelf display, b = 2 levels: regular, wide
• n = 2 stores for each of the 3 × 2 = 6 treatment combinations ($n_T = 12$)

Read the data
data a1;
  infile '../data/ch19ta07.txt';
  input sales height width;
run;
proc print data=a1;
run;

The data
Obs  sales  height  width
  1     47       1      1
  2     43       1      1
  3     46       1      2
  4     40       1      2
  5     62       2      1
  6     68       2      1
  7     67       2      2
  8     71       2      2
  9     41       3      1
 10     39       3      1
 11     42       3      2
 12     46       3      2

Notation
• For $Y_{ijk}$ we use
  – i to denote the level of factor A
  – j to denote the level of factor B
  – k to denote the kth observation in cell (i,j)
• i = 1, . . . , a levels of factor A
• j = 1, . . . , b levels of factor B
• k = 1, . . . , n observations in cell (i,j)

Model
• We assume that the response variable observations are
  – Normally distributed
    • With a mean that may depend on the levels of factors A and B
    • With a constant variance
  – Independent

Cell Means Model
• $Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$
  – where $\mu_{ij}$ is the theoretical mean, or expected value, of all observations in cell (i,j)
  – the $\varepsilon_{ijk}$ are iid $N(0, \sigma^2)$
• This means $Y_{ijk} \sim N(\mu_{ij}, \sigma^2)$, independent
• The parameters of the model are
  – $\mu_{ij}$, for i = 1 to a and j = 1 to b
  – $\sigma^2$

Estimates
• Estimate $\mu_{ij}$ by the mean of the observations in cell (i,j), $\bar{Y}_{ij.}$
• $\bar{Y}_{ij.} = \left(\sum_k Y_{ijk}\right)/n$
• For each (i,j) combination, we can get an estimate of the variance:
  $s_{ij}^2 = \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2/(n-1)$
• We need to combine these to get an estimate of $\sigma^2$

Pooled estimate of $\sigma^2$
• In general we pool the $s_{ij}^2$, using weights proportional to the df, $n_{ij} - 1$
• The pooled estimate is $s^2 = \sum_{ij}(n_{ij}-1)s_{ij}^2 \big/ \sum_{ij}(n_{ij}-1)$
• Here $n_{ij} = n$, so $s^2 = \left(\sum_{ij} s_{ij}^2\right)/(ab)$, which is the average sample variance

Run proc glm
proc glm data=a1;
  class height width;
  model sales=height width height*width;
  means height width height*width;   /* marginal and cell means with std devs */
run;

Output
Class Level Information
Class    Levels  Values
height        3  1 2 3
width         2  1 2

Number of Observations Read  12
Number of Observations Used  12

Means statement: height
Level of           ------------sales------------
height       N           Mean         Std Dev
1            4     44.0000000      3.16227766
2            4     67.0000000      3.74165739
3            4     42.0000000      2.94392029

Means statement: width
Level of           ------------sales------------
width        N           Mean         Std Dev
1            6     50.0000000     12.0664825
2            6     52.0000000     13.4313067

Means statement: height*width
Level of  Level of          ------------sales------------
height    width      N           Mean         Std Dev
1         1          2     45.0000000      2.82842712
1         2          2     43.0000000      4.24264069
2         1          2     65.0000000      4.24264069
2         2          2     69.0000000      2.82842712
3         1          2     40.0000000      1.41421356
3         2          2     44.0000000      2.82842712

Code the factor levels
/* B/M/T = bottom/middle/top height; R/W = regular/wide width */
data a1; set a1;
  if height eq 1 and width eq 1 then hw='1_BR';
  if height eq 1 and width eq 2 then hw='2_BW';
  if height eq 2 and width eq 1 then hw='3_MR';
  if height eq 2 and width eq 2 then hw='4_MW';
  if height eq 3 and width eq 1 then hw='5_TR';
  if height eq 3 and width eq 2 then hw='6_TW';
run;

Plot the data
symbol1 v=circle i=none;
proc gplot data=a1;
  plot sales*hw/frame;
run;

The plot
[Figure: sales for each treatment combination hw]

Put the means in a2
proc means data=a1;
  var sales;
  by height width;                /* a1 is already sorted by height, width */
  output out=a2 mean=avsales;
run;
proc print data=a2;
run;

Output data set
Obs  height  width  _TYPE_  _FREQ_  avsales
  1       1      1       0       2       45
  2       1      2       0       2       43
  3       2      1       0       2       65
  4       2      2       0       2       69
  5       3      1       0       2       40
  6       3      2       0       2       44

Plot the means
symbol1 v=square i=join c=black;
symbol2 v=diamond i=join c=black;
proc gplot data=a2;
  plot avsales*height=width/frame;   /* one line per width */
run;

The interaction plot
[Figure: mean sales (avsales) versus height, one line per width]

Questions to consider
• Does the height of the display affect sales?
  If yes, compare top with middle, top with bottom, and middle with bottom
• Does the width of the display affect sales? If yes, compare regular and wide
But wait! Are these factor-level comparisons meaningful?
• Does the effect of height on sales depend on the width?
• Does the effect of width on sales depend on the height?
• If yes, we have an interaction and we need to do some additional analysis

Factor effects model
• For the one-way ANOVA model, we wrote $\mu_i = \mu + \alpha_i$
• Here we use $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$
• Under the "common" formulation
  – $\mu$ ($\mu_{..}$ in KNNL) is the "overall mean"
  – $\alpha_i$ is the main effect of A
  – $\beta_j$ is the main effect of B
  – $(\alpha\beta)_{ij}$ is the interaction between A and B

Factor effects model
• $\mu = \left(\sum_{ij} \mu_{ij}\right)/(ab)$
• $\mu_{i.} = \left(\sum_j \mu_{ij}\right)/b$ and $\mu_{.j} = \left(\sum_i \mu_{ij}\right)/a$
• $\alpha_i = \mu_{i.} - \mu$ and $\beta_j = \mu_{.j} - \mu$
• $(\alpha\beta)_{ij}$ is the difference between $\mu_{ij}$ and $\mu + \alpha_i + \beta_j$:
  $(\alpha\beta)_{ij} = \mu_{ij} - \left(\mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu)\right) = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$

Interpretation
• $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$
• $\mu$ is the "overall" mean
• $\alpha_i$ is an adjustment for level i of A
• $\beta_j$ is an adjustment for level j of B
• $(\alpha\beta)_{ij}$ is an additional adjustment, depending jointly on i and j, that cannot be explained by the previous adjustments

Constraints for this framework
• $\alpha_{.} = \sum_i \alpha_i = 0$
• $\beta_{.} = \sum_j \beta_j = 0$
• $(\alpha\beta)_{.j} = \sum_i (\alpha\beta)_{ij} = 0$ for all j
• $(\alpha\beta)_{i.} = \sum_j (\alpha\beta)_{ij} = 0$ for all i

Estimates for factor effects model
• $\hat{\mu} = \bar{Y}_{...} = \left(\sum_{ijk} Y_{ijk}\right)/(abn)$
• $\hat{\mu}_{i.} = \bar{Y}_{i..}$ and $\hat{\mu}_{.j} = \bar{Y}_{.j.}$
• $\hat{\alpha}_i = \bar{Y}_{i..} - \bar{Y}_{...}$ and $\hat{\beta}_j = \bar{Y}_{.j.} - \bar{Y}_{...}$
• $\widehat{(\alpha\beta)}_{ij} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}$

SS for ANOVA Table
• $SSA = \sum_{ijk} \hat{\alpha}_i^2 = \sum_{ijk} (\bar{Y}_{i..} - \bar{Y}_{...})^2$
• $SSB = \sum_{ijk} \hat{\beta}_j^2 = \sum_{ijk} (\bar{Y}_{.j.} - \bar{Y}_{...})^2$
• $SSAB = \sum_{ijk} \widehat{(\alpha\beta)}_{ij}^2$
• $SSE = \sum_{ijk} (Y_{ijk} - \bar{Y}_{ij.})^2$
• $SSTO = \sum_{ijk} (Y_{ijk} - \bar{Y}_{...})^2$

df for ANOVA Table
• dfA = a − 1
• dfB = b − 1
• dfAB = (a − 1)(b − 1)
• dfE = ab(n − 1)
• dfT = abn − 1 = $n_T$ − 1

MS for ANOVA Table
• MSA = SSA/dfA
• MSB = SSB/dfB
• MSAB = SSAB/dfAB
• MSE = SSE/dfE
• MST = SSTO/dfT

Hypotheses for two-way ANOVA
• H0A: $\alpha_i = 0$ for all i
• H1A: $\alpha_i \ne 0$ for at least one i
• H0B: $\beta_j = 0$ for all j
• H1B: $\beta_j \ne 0$ for at least one j
• H0AB: $(\alpha\beta)_{ij} = 0$ for all (i,j)
• H1AB: $(\alpha\beta)_{ij} \ne 0$ for at least one (i,j)

F statistics
• H0A is tested by FA = MSA/MSE; df = dfA, dfE
• H0B is tested by FB = MSB/MSE; df = dfB, dfE
• H0AB is tested by FAB = MSAB/MSE; df = dfAB, dfE

ANOVA Table
Source   df            SS     MS     F
A        a−1           SSA    MSA    MSA/MSE
B        b−1           SSB    MSB    MSB/MSE
AB       (a−1)(b−1)    SSAB   MSAB   MSAB/MSE
Error    ab(n−1)       SSE    MSE
Total    abn−1         SSTO   MST

P-values
• P-values are calculated using the F(dfNumerator, dfDenominator) distributions
• If P ≤ 0.05, we conclude that the effect being tested is statistically significant (at the 0.05 level)

KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a = 3 levels: bottom, middle, top
• B is the width of the shelf display, b = 2 levels: regular, wide
• n = 2 stores for each of the 3 × 2 treatment combinations

PROC GLM
proc glm data=a1;
  class height width;
  model sales=height width height*width;
run;

Output
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5        1580.0000    316.000000     30.58   0.0003
Error              6        62.000000     10.333333
Corrected Total   11        1642.0000
• There are 6 cells in this design, so the model has 6 − 1 = 5 df

Output
Source          DF   Type III SS   Mean Square   F Value   Pr > F
height           2    1544.00000    772.000000     74.71   <.0001
width            1     12.000000     12.000000      1.16   0.3226
height*width     2     24.000000     12.000000      1.16   0.3747
• Type I and Type III analyses are the same here because $n_{ij}$ is constant (the design is balanced)

Other output
R-Square   Coeff Var   Root MSE   sales Mean
0.962241    6.303040   3.214550     51.00000
• R² is not commonly emphasized in ANOVA: we are more interested in differences among factor levels than in the model's predictive ability

Results
• The main effect of height is
  statistically significant (F = 74.71; df = 2, 6; P < 0.0001)
• The main effect of width is not statistically significant (F = 1.16; df = 1, 6; P = 0.32)
• The interaction between height and width is not statistically significant (F = 1.16; df = 2, 6; P = 0.37)

Interpretation
• The height of the display affects sales of bread
• The width of the display has no apparent effect
• The effect of the height of the display is similar for both the regular and the wide widths

Plot of the means
[Figure: mean sales versus height, one line per width]

Additional analyses
• We will need to do additional analyses to explain the height effect (factor A)
• There were three levels: bottom, middle and top
• We could rerun the analysis as a one-way ANOVA and use the methods we learned in the previous chapters
• Alternatively, use the means statement with the lines option

Run Proc GLM
proc glm data=a1;
  class height width;
  model sales=height width height*width;
  means height / tukey lines;      /* Tukey comparisons shown as letter groupings */
  lsmeans height / adjust=tukey;   /* Tukey-adjusted pairwise p-values */
run;

MEANS Output
Alpha                                  0.05
Error Degrees of Freedom               6
Error Mean Square                      10.33333
Critical Value of Studentized Range    4.33920
Minimum Significant Difference         6.9743

Means with the same letter are not significantly different.
Tukey Grouping     Mean   N   height
A                67.000   4   2
B                44.000   4   1
B                42.000   4   3

LSMEANS Output
height   sales LSMEAN   LSMEAN Number
1          44.0000000   1
2          67.0000000   2
3          42.0000000   3

Least Squares Means for effect height
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sales
i/j        1         2         3
1                0.0001    0.6714
2     0.0001              <.0001
3     0.6714    <.0001

Last slide
• We went over Chapter 19
• We used program topic24.sas to generate the output for today
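As a hand check on the factor effects model for the bread example, the cell means (45, 43, 65, 69, 40, 44), the height means (44, 67, 42), the width means (50, 52), and the grand mean 51 printed by the means statement give the estimates below, with a = 3, b = 2, n = 2. Each squared height effect is counted once for each of the bn = 4 observations at that height, each squared width effect once for each of the an = 6 observations at that width, and each squared interaction effect once for each of the n = 2 observations in its cell. SSE uses the cell standard deviations from the means statement; squared, they are 8, 18, 18, 8, 2, 8, each with n − 1 = 1 df. The results agree with the PROC GLM tables.

$$
\begin{aligned}
\hat{\mu} &= \bar{Y}_{...} = 51\\
\hat{\alpha}_1 &= 44 - 51 = -7, \quad \hat{\alpha}_2 = 67 - 51 = 16, \quad \hat{\alpha}_3 = 42 - 51 = -9\\
\hat{\beta}_1 &= 50 - 51 = -1, \quad \hat{\beta}_2 = 52 - 51 = 1\\
\widehat{(\alpha\beta)}_{11} &= 45 - 44 - 50 + 51 = 2, \quad \widehat{(\alpha\beta)}_{12} = 43 - 44 - 52 + 51 = -2\\
\widehat{(\alpha\beta)}_{21} &= 65 - 67 - 50 + 51 = -1, \quad \widehat{(\alpha\beta)}_{22} = 69 - 67 - 52 + 51 = 1\\
\widehat{(\alpha\beta)}_{31} &= 40 - 42 - 50 + 51 = -1, \quad \widehat{(\alpha\beta)}_{32} = 44 - 42 - 52 + 51 = 1\\[4pt]
SSA &= bn\sum_i \hat{\alpha}_i^2 = 4\,(49 + 256 + 81) = 1544\\
SSB &= an\sum_j \hat{\beta}_j^2 = 6\,(1 + 1) = 12\\
SSAB &= n\sum_{ij} \widehat{(\alpha\beta)}_{ij}^2 = 2\,(4 + 4 + 1 + 1 + 1 + 1) = 24\\
SSE &= \sum_{ij}(n-1)\,s_{ij}^2 = 8 + 18 + 18 + 8 + 2 + 8 = 62\\
SSTO &= SSA + SSB + SSAB + SSE = 1642\\[4pt]
MSE &= 62/6 \approx 10.333\\
F_A &= (1544/2)/10.333 \approx 74.71, \quad F_B = (12/1)/10.333 \approx 1.16, \quad F_{AB} = (24/2)/10.333 \approx 1.16
\end{aligned}
$$

The estimated effects also satisfy the zero-sum constraints: the $\hat{\alpha}_i$ sum to 0, the $\hat{\beta}_j$ sum to 0, and each row and column of the $\widehat{(\alpha\beta)}_{ij}$ sums to 0.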
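The Minimum Significant Difference in the MEANS output can be reproduced from the studentized range critical value. Each height mean is based on bn = 4 observations, and the error term is the two-way MSE with dfE = 6:

$$
MSD = q(0.95;\, a,\, df_E)\sqrt{\frac{MSE}{bn}} = 4.33920\sqrt{\frac{10.33333}{4}} \approx 4.33920 \times 1.60728 \approx 6.9743
$$

Middle (67) differs from bottom (44) and from top (42) by more than 6.97. Bottom and top differ by only 2, which is consistent with the Tukey grouping that places heights 1 and 3 together in group B.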
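The factor-effects estimates can also be computed directly from the data rather than by hand. The sketch below assumes the data set a1 created by the data step above, with variables sales, height and width. The intermediate data sets (cell, rowm, colm, grand, step1, step2, effects) and the variables ybar_ij, ybar_i, ybar_j, ybar, alpha, beta and ab are hypothetical names chosen for this illustration. They are not part of topic24.sas.

```sas
/* Sketch (hypothetical data set and variable names): factor-effects
   estimates for the balanced two-way layout in a1. */

/* Cell means Ybar_ij. (NWAY keeps only the height*width combinations) */
proc means data=a1 noprint nway;
  class height width;
  var sales;
  output out=cell mean=ybar_ij;
run;

/* Factor A (height) means Ybar_i.. */
proc means data=a1 noprint nway;
  class height;
  var sales;
  output out=rowm mean=ybar_i;
run;

/* Factor B (width) means Ybar_.j. */
proc means data=a1 noprint nway;
  class width;
  var sales;
  output out=colm mean=ybar_j;
run;

/* Grand mean Ybar_... (one observation) */
proc means data=a1 noprint;
  var sales;
  output out=grand mean=ybar;
run;

/* Attach the height means to each cell (one-to-many merge by height) */
data step1;
  merge cell(keep=height width ybar_ij) rowm(keep=height ybar_i);
  by height;
run;

/* Attach the width means (re-sort by width first) */
proc sort data=step1;
  by width;
run;
data step2;
  merge step1 colm(keep=width ybar_j);
  by width;
run;

/* Read the grand mean once; SET-variables are retained across rows */
data effects;
  if _n_ = 1 then set grand(keep=ybar);
  set step2;
  alpha = ybar_i - ybar;                     /* alpha-hat_i          */
  beta  = ybar_j - ybar;                     /* beta-hat_j           */
  ab    = ybar_ij - ybar_i - ybar_j + ybar;  /* (alpha beta)-hat_ij  */
run;

proc sort data=effects;
  by height width;
run;
proc print data=effects;
run;
```

Because the design is balanced, the printed alpha, beta and ab columns should satisfy the zero-sum constraints listed under "Constraints for this framework". The data step merges were chosen to match the data-step style used elsewhere in the lecture. A PROC SQL version would work equally well.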
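The one-way rerun mentioned under "Additional analyses" could look like the minimal sketch below. It assumes the same data set a1. In this model width and the height*width interaction are not in the model, so their variation is pooled into the error term. From the two-way table, that error would be SSTO − SSA = 1642 − 1544 = 98 on 11 − 2 = 9 df (MSE ≈ 10.89), instead of 62 on 6 df. The Tukey critical value and minimum significant difference therefore change somewhat from the two-way MEANS output.

```sas
/* Sketch: one-way ANOVA on height alone (assumes data set a1 from above).
   Width and height*width variation is absorbed into the error term. */
proc glm data=a1;
  class height;
  model sales=height;
  means height / tukey lines;   /* Tukey groupings for the three heights */
run;
```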