Topic 25: Inference for Two-Way ANOVA

Outline
• Two-way ANOVA
  – Data, models, parameter estimates
• ANOVA table, EMS
• Analytical strategies
• Regression approach

Data
• Response written Yijk, where
  – i denotes the level of factor A
  – j denotes the level of factor B
  – k denotes the kth observation in cell (i,j)
• i = 1, . . . , a levels of factor A
• j = 1, . . . , b levels of factor B
• k = 1, . . . , n observations in cell (i,j)

Cell means model
• Yijk = μij + εijk
  – where μij is the theoretical mean or expected value of all observations in cell (i,j)
  – the εijk are iid N(0, σ²)
  – This means the Yijk are independent N(μij, σ²)

Factor effects model
• μij = μ + αi + βj + (αβ)ij
• Consider μ to be the overall mean
• αi is the main effect of A
• βj is the main effect of B
• (αβ)ij is the interaction between A and B

Constraints for this interpretation
• α. = Σi αi = 0 (df = a-1)
• β. = Σj βj = 0 (df = b-1)
• (αβ).j = Σi (αβ)ij = 0 for all j
• (αβ)i. = Σj (αβ)ij = 0 for all i (df = (a-1)(b-1))

SAS GLM Constraints
• αa = 0 (1 constraint)
• βb = 0 (1 constraint)
• (αβ)aj = 0 for all j (b constraints)
• (αβ)ib = 0 for all i (a constraints)
• The total is 1 + 1 + a + b - 1 = a + b + 1 (the constraint (αβ)ab is counted twice in the last two bullets above)

Parameters and constraints
• The cell means model has ab parameters for the means
• The factor effects model has (1 + a + b + ab) parameters
  – An intercept (1)
  – Main effect of A (a)
  – Main effect of B (b)
  – Interaction of A and B (ab)

Factor effects model
• There are 1 + a + b + ab parameters
• There are 1 + a + b constraints
• There are ab unconstrained parameters (or sets of parameters), the same number as the parameters for the means in the cell means model
• Some parameters depend on the choice of constraints (for example, the αi), and others do not (for example, the cell means μij)

KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a = 3 levels: bottom, middle, top
• B is the width of the shelf display, b = 2 levels: regular, wide
• n = 2 stores for each of the
  3x2 treatment combinations

Proc GLM with solution
proc glm data=a1;
  class height width;
  model sales=height width height*width / solution;
  means height*width;
run;

Solution output
Parameter            Estimate
Intercept               44.0 B
height 1                -1.0 B
height 2                25.0 B
height 3                 0.0 B
width 1                 -4.0 B
width 2                  0.0 B
height*width 1 1         6.0 B
height*width 1 2         0.0 B
height*width 2 1         0.0 B
height*width 2 2         0.0 B
height*width 3 1         0.0 B
height*width 3 2         0.0 B
(SAS flags with "B" the estimates that are not uniquely estimable. Their values depend on the SAS GLM constraints.)

Means
Based on estimates from the previous two pages
height  width  Mean
1       1      45 = 44 - 1 - 4 + 6
1       2      43 = 44 - 1 + 0 + 0
2       1      65 = 44 + 25 - 4 + 0
2       2      69 = 44 + 25 + 0 + 0
3       1      40 = 44 + 0 - 4 + 0
3       2      44 = 44 + 0 + 0 + 0

Check normality
Alternative way to form a QQ plot
proc glm data=a1;
  class height width;
  model sales=height width height*width;
  output out=a2 r=resid;
run;
proc rank data=a2 out=a3 normal=blom;
  var resid;
  ranks zresid;
run;

Normal Quantile plot
proc sort data=a3;
  by zresid;
run;
symbol1 v=circle i=sm70;
proc gplot data=a3;
  plot resid*zresid / frame;
run;

The plot: [normal quantile plot of resid against zresid]
Note: dfE is only 6

ANOVA Table
Source  df          SS    MS    F
A       a-1         SSA   MSA   MSA/MSE
B       b-1         SSB   MSB   MSB/MSE
AB      (a-1)(b-1)  SSAB  MSAB  MSAB/MSE
Error   ab(n-1)     SSE   MSE
Total   abn-1       SSTO

Expected Mean Squares
• $E(\mathrm{MSE}) = \sigma^2$
• $E(\mathrm{MSA}) = \sigma^2 + nb\sum_i \alpha_i^2/(a-1)$
• $E(\mathrm{MSB}) = \sigma^2 + na\sum_j \beta_j^2/(b-1)$
• $E(\mathrm{MSAB}) = \sigma^2 + n\sum_i\sum_j (\alpha\beta)_{ij}^2/((a-1)(b-1))$
• Here, αi, βj, and (αβ)ij are defined with the usual factor effects constraints

An analytical strategy
• Run the model with main effects and the two-way interaction
• Plot the data and the means, and look at the normal quantile plot and residual plots
• If the assumptions seem reasonable, check the significance of the test for the AB interaction

AB interaction not significant
• If the AB interaction is not statistically significant
  – Possibly rerun the analysis without the interaction (see pooling, §19.10)
  – Potential Type II errors when pooling
  – For a significant main effect with more than two levels, use the means statement with the Tukey multiple comparison procedure

GLM Output
Source  DF    SS     MS      F  Pr > F
Model    5  1580    316  30.58  0.0003
Error    6    62  10.33
Total   11  1642
Note that there are 6 cells in this design, so the model has 6 - 1 = 5 df.

Output: ANOVA (Type I or Type III)
Source        DF    SS   MS      F  Pr > F
height         2  1544  772  74.71  <.0001
width          1    12   12   1.16  0.3226
height*width   2    24   12   1.16  0.3747
Note: the Type I and Type III analyses are the same because the cell size n is constant (balanced design).

Rerun without interaction
proc glm data=a1;
  class height width;
  model sales=height width;
  means height / tukey lines;
run;

ANOVA output
Source  DF   MS      F  Pr > F
height   2  772  71.81  <.0001
width    1   12   1.12  0.3216
MS(height) and MS(width) have not changed. The MSE, the F* statistics, and the P-values have changed because of pooling.

Comparison of MSEs
Model                Source  DF  SS     MS
With interaction     Error    6  62  10.33
Without interaction  Error    8  86  10.75
There is little change in MSE here. Pooling is usually worthwhile only when dfE is small.

Pooling SS
• Data = Model + Residual
• When we remove a term from the "model", we put its variation and the associated df into the "residual"
• This is called pooling
• A benefit is that we have more df for error and a simpler model
• A risk is Type II errors: if the removed term is not truly zero, the pooled MSE is inflated
• Beneficial only in small experiments

Pooling SSE and SSAB
• For the model with interaction
  – SSAB = 24, dfAB = 2
  – SSE = 62, dfE = 6
  – MSE = 62/6 = 10.33
• For the model with main effects only
  – SSE = 62 + 24 = 86, dfE = 6 + 2 = 8
  – MSE = 86/8 = 10.75

Tukey Output
Tukey Grouping    Mean  N  height
A               67.000  4  2
B               44.000  4  1
B               42.000  4  3
Means with the same letter are not significantly different.

Plot of the means
[Figure: plot of the cell means]

Regression Approach
• Similar to what we did for one-way ANOVA
• Use a-1 variables for A
• Use b-1 variables for B
• Multiply each of the a-1 variables for A by each of the b-1 variables for B to get (a-1)(b-1) variables for AB
• You can use the test statement in Proc reg to perform F tests

Create Variables
data a4;
  set a1;
  X1 = (height eq 1) - (height eq 3);
  X2 = (height eq 2) - (height eq 3);
  X3 = (width eq 1) - (width eq 2);
  X13 = X1*X3;
  X23 = X2*X3;
run;

Run Proc Reg
proc reg data=a4;
  model sales= X1 X2 X3 X13 X23 / ss1;
  height: test X1, X2;
  width: test X3;
  interaction: test X13, X23;
run;

SAS Output
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             5      1580.00000    316.00000    30.58  0.0003
Error             6        62.00000     10.33333
Corrected Total  11      1642.00000
This is the same basic ANOVA table as in the GLM output.

SAS Output
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept   1            51.00000         0.92796    54.96    <.0001      31212
X1          1            -7.00000         1.31233    -5.33    0.0018    8.00000
X2          1            16.00000         1.31233    12.19    <.0001  1536.0000
X3          1            -1.00000         0.92796    -1.08    0.3226   12.00000
X13         1             2.00000         1.31233     1.52    0.1783   18.00000
X23         1            -1.00000         1.31233    -0.76    0.4749    6.00000

SS Results
• SS(Height) = SS(X1) + SS(X2|X1)
  1544 = 8.0 + 1536
• SS(Width) = SS(X3|X1,X2)
  12 = 12
• SS(Height*Width) = SS(X13|X1,X2,X3) + SS(X23|X1,X2,X3,X13)
  24 = 18 + 6

Test Results
Test height Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     2     772.0000    74.71  <.0001
Denominator   6     10.33333

Test width Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     1      12.0000     1.16  0.3226
Denominator   6      10.3333

Test interaction Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     2       12.000     1.16  0.3747
Denominator   6       10.333

Interpreting Estimates
$\hat\mu_{1\cdot} = 51 + (-7) = 44$
$\hat\mu_{2\cdot} = 51 + 16 = 67$
$\hat\mu_{3\cdot} = 51 - (-7) - 16 = 42$
$\hat\mu_{\cdot 1} = 51 + (-1) = 50$
$\hat\mu_{\cdot 2} = 51 - (-1) = 52$
$\hat\mu_{11} = 51 + (-7) + (-1) + 2 = 45$
$\hat\mu_{22} = 51 + 16 - (-1) - (-1) = 69$
The subtractions occur because X1 = X2 = -1 for height 3 and X3 = -1 for width 2.

Last slide
• Finish reading KNNL Chapter 19
• Topic25.sas contains the SAS commands for these slides
• We will now focus more on the strategies needed to handle an ANOVA with two or more factors
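The Factor effects model and SAS GLM Constraints slides note that some parameters depend on the choice of constraints and others do not. The sketch below checks this in Python rather than SAS. It is for illustration only, and the function names are hypothetical. It computes the effects from the KNNL cell means under both constraint systems. Under the SAS GLM constraints it should reproduce the Solution output (44, -1, 25, -4, 6, and zeros). Under the usual sum-to-zero constraints it gives 51, -7, 16, -1, 2, -1, the proc reg estimates. Both sets rebuild the same cell means.

```python
# Cell means for the KNNL bread example, keyed by (height i, width j),
# copied from the "Means" slide.
CELL_MEANS = {
    (1, 1): 45, (1, 2): 43,
    (2, 1): 65, (2, 2): 69,
    (3, 1): 40, (3, 2): 44,
}
A_LEVELS, B_LEVELS = 3, 2  # a = 3 heights, b = 2 widths


def baseline_effects(mu, a=A_LEVELS, b=B_LEVELS):
    """Effects under the SAS GLM constraints: every effect that involves
    the last level of A or the last level of B is set to 0."""
    intercept = mu[(a, b)]
    alpha = {i: mu[(i, b)] - mu[(a, b)] for i in range(1, a + 1)}
    beta = {j: mu[(a, j)] - mu[(a, b)] for j in range(1, b + 1)}
    ab = {(i, j): mu[(i, j)] - mu[(i, b)] - mu[(a, j)] + mu[(a, b)]
          for i in range(1, a + 1) for j in range(1, b + 1)}
    return intercept, alpha, beta, ab


def sum_to_zero_effects(mu, a=A_LEVELS, b=B_LEVELS):
    """Effects under the usual constraints: every sum over i or over j is 0."""
    grand = sum(mu.values()) / (a * b)
    row = {i: sum(mu[(i, j)] for j in range(1, b + 1)) / b for i in range(1, a + 1)}
    col = {j: sum(mu[(i, j)] for i in range(1, a + 1)) / a for j in range(1, b + 1)}
    alpha = {i: row[i] - grand for i in row}
    beta = {j: col[j] - grand for j in col}
    ab = {(i, j): mu[(i, j)] - row[i] - col[j] + grand for (i, j) in mu}
    return grand, alpha, beta, ab


def rebuild_means(intercept, alpha, beta, ab):
    """mu_ij = mu + alpha_i + beta_j + (alpha beta)_ij, for either constraint set."""
    return {(i, j): intercept + alpha[i] + beta[j] + ab[(i, j)] for (i, j) in ab}


if __name__ == "__main__":
    for effects in (baseline_effects(CELL_MEANS), sum_to_zero_effects(CELL_MEANS)):
        print(effects)
        print("same cell means:", rebuild_means(*effects) == CELL_MEANS)
```

The two effect sets differ, but they are two coordinate systems for the same ab cell means. That is why SAS flags its solution with "B".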
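The ANOVA table, GLM output and pooling slides can be checked from the cell means alone. In a balanced design, SSA, SSB and SSAB depend on the data only through the cell means. SSE does need the raw observations, which these slides do not list, so this Python sketch takes SSE = 62 from the GLM output as an input. It is a minimal illustration, and the names are hypothetical.

```python
# Balanced two-way ANOVA for the KNNL bread example, computed from the
# cell means (copied from the "Means" slide) with n = 2 stores per cell.
CELL_MEANS = {
    (1, 1): 45, (1, 2): 43,
    (2, 1): 65, (2, 2): 69,
    (3, 1): 40, (3, 2): 44,
}
N_PER_CELL = 2
SSE = 62.0  # taken from the GLM output; computing it needs the raw data


def two_way_anova(mu, n, sse):
    """Return SS, df, dfE, MSE and F* for A, B and AB in a balanced design."""
    levels_a = sorted({i for i, _ in mu})
    levels_b = sorted({j for _, j in mu})
    a, b = len(levels_a), len(levels_b)
    grand = sum(mu.values()) / (a * b)
    row = {i: sum(mu[(i, j)] for j in levels_b) / b for i in levels_a}
    col = {j: sum(mu[(i, j)] for i in levels_a) / a for j in levels_b}
    ss = {
        "A": n * b * sum((row[i] - grand) ** 2 for i in levels_a),
        "B": n * a * sum((col[j] - grand) ** 2 for j in levels_b),
        "AB": n * sum((mu[(i, j)] - row[i] - col[j] + grand) ** 2 for (i, j) in mu),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    df_e = a * b * (n - 1)
    mse = sse / df_e
    f_stat = {term: (ss[term] / df[term]) / mse for term in ss}
    return ss, df, df_e, mse, f_stat


def pool(ss, df, sse, df_e, term):
    """Pooling: move a dropped term's SS and df into the error line."""
    sse_pooled = sse + ss[term]
    df_pooled = df_e + df[term]
    return sse_pooled, df_pooled, sse_pooled / df_pooled


if __name__ == "__main__":
    ss, df, df_e, mse, f_stat = two_way_anova(CELL_MEANS, N_PER_CELL, SSE)
    print(ss, df_e, round(mse, 2), {k: round(v, 2) for k, v in f_stat.items()})
    sse_p, df_p, mse_p = pool(ss, df, SSE, df_e, "AB")
    print(sse_p, df_p, mse_p, round((ss["A"] / df["A"]) / mse_p, 2))
```

The sketch should give the slide values: SS of 1544, 12 and 24 with F* of 74.71, 1.16 and 1.16. After pooling AB into error, SSE is 86 on 8 df, MSE is 10.75, and F* for height is 71.81.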
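In the Check normality code, proc rank with normal=blom replaces each residual by its Blom normal score. The score is Φ⁻¹((r − 3/8)/(n + 1/4)), where r is the residual's rank. Plotting resid against these scores gives the normal quantile plot. The Python sketch below computes the same scores with statistics.NormalDist. Tied values get the average of their ranks, which is assumed (not verified here) to match PROC RANK's default handling of ties. The residuals in the usage block are made up and are not the bread data.

```python
from statistics import NormalDist


def blom_scores(values):
    """Blom normal scores, inv_Phi((r - 3/8) / (n + 1/4)), where r is the
    1-based rank of each value; tied values share the mean of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # extend the run of tied values that begins at sorted position `start`
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


if __name__ == "__main__":
    resid = [2.0, -2.0, 3.0, -3.0, 1.0, -1.0]  # made-up residuals, not the bread data
    pairs = sorted(zip(blom_scores(resid), resid))
    for z, r in pairs:
        print(f"{z:7.3f} {r:5.1f}")  # points of the normal quantile plot
```

With dfE = 6, a plot built from so few residuals can only flag gross departures from normality.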
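The Interpreting Estimates slide reads cell and marginal means off the proc reg estimates. This Python sketch applies the effect coding from the data a4 step to the printed estimates and evaluates the fitted cell means. It is illustrative only, and the function names are hypothetical. The model is saturated, with six parameters for six cells, so the fitted means should equal the observed cell means.

```python
# Parameter estimates printed by PROC REG on the "Parameter Estimates" slide.
ESTIMATES = {"Intercept": 51.0, "X1": -7.0, "X2": 16.0,
             "X3": -1.0, "X13": 2.0, "X23": -1.0}


def effect_codes(height, width):
    """Same effect coding as the data a4 step on the slides."""
    x1 = int(height == 1) - int(height == 3)
    x2 = int(height == 2) - int(height == 3)
    x3 = int(width == 1) - int(width == 2)
    return {"X1": x1, "X2": x2, "X3": x3, "X13": x1 * x3, "X23": x2 * x3}


def fitted_mean(height, width, est=ESTIMATES):
    """Fitted cell mean: intercept plus each estimate times its coded value."""
    codes = effect_codes(height, width)
    return est["Intercept"] + sum(est[name] * value for name, value in codes.items())


if __name__ == "__main__":
    for h in (1, 2, 3):
        print("height", h, [fitted_mean(h, w) for w in (1, 2)])
```

Averaging the fitted means over width gives the height means 44, 67 and 42. Averaging over height gives the width means 50 and 52, the values on the slide.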