Topic 25: Inference for Two-Way ANOVA

Outline
• Two-way ANOVA
– Data, models, parameter estimates
• ANOVA table, EMS
• Analytical strategies
• Regression approach
Data
• Response written Yijk where
– i denotes the level of the factor A
– j denotes the level of the factor B
– k denotes the kth observation in cell (i,j)
• i = 1, . . . , a levels of factor A
• j = 1, . . . , b levels of factor B
• k = 1, . . . , n observations in cell (i,j)
Cell means model
• Yijk = μij + εijk
– where μij is the theoretical mean or expected value of all observations in cell (i,j)
– the εijk are iid N(0, σ²)
– This means the Yijk ~ N(μij, σ²) and are independent
Factor effects model
• μij = μ + αi + βj + (αβ)ij
• Consider μ to be the overall mean
• αi is the main effect of A
• βj is the main effect of B
• (αβ)ij is the interaction between A and B
Constraints for this
interpretation
• α. = Σi αi = 0 (df = a-1)
• β. = Σj βj = 0 (df = b-1)
• (αβ).j = Σi (αβ)ij = 0 for all j
• (αβ)i. = Σj (αβ)ij = 0 for all i
– df = (a-1)(b-1) for the interaction
SAS GLM Constraints
• αa = 0 (1 constraint)
• βb = 0 (1 constraint)
• (αβ)aj = 0 for all j (b constraints)
• (αβ)ib = 0 for all i (a constraints)
• The total is 1+1+a+b-1 = a+b+1 (the constraint (αβ)ab = 0 is counted twice in the last two bullets above)
Parameters and
constraints
• The cell means model has ab
parameters for the means
• The factor effects model has
(1+a+b+ab) parameters
– An intercept (1)
– Main effect of A (a)
– Main effect of B (b)
– Interaction of A and B (ab)
Factor effects model
• There are 1+a+b+ab parameters
• There are 1+a+b constraints
• There are ab unconstrained
parameters (or sets of parameters),
the same number of parameters for
the means in the cell means model
• While certain parameters depend on
choice of constraints, others do not
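The parameter count above can be checked with a few lines of Python. This is a minimal sketch; the function name count_parameters is hypothetical and is not part of the course materials.

```python
def count_parameters(a, b):
    """Count parameters, constraints, and free parameters
    in the two-way factor effects model with a and b levels."""
    n_params = 1 + a + b + a * b      # intercept, A, B, AB
    n_constraints = 1 + a + b         # as counted on the slide
    n_free = n_params - n_constraints
    return n_params, n_constraints, n_free

# Dimensions of the bread example used later: a = 3 heights, b = 2 widths
a, b = 3, 2
n_params, n_constraints, n_free = count_parameters(a, b)
print(n_params, n_constraints, n_free)
```

The number of free parameters equals ab, the number of cell means, for any a and b.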
KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3
levels: bottom, middle, top
• B is the width of the shelf display, b=2:
regular, wide
• n=2 stores for each of the 3x2
treatment combinations
Proc GLM with solution
proc glm data=a1;
class height width;
model sales=height width height*width / solution;
means height*width;
run;
Solution output
Parameter    Level   Estimate
Intercept              44.0 B
height       1         -1.0 B
height       2         25.0 B
height       3          0.0 B
width        1         -4.0 B
width        2          0.0 B
Solution output
Parameter      Level   Estimate
height*width   1 1        6.0 B
height*width   1 2        0.0 B
height*width   2 1        0.0 B
height*width   2 2        0.0 B
height*width   3 1        0.0 B
height*width   3 2        0.0 B
Means
Based on estimates from previous two pages
height  width  Mean
1       1      45 = 44 - 1 - 4 + 6
1       2      43 = 44 - 1 + 0 + 0
2       1      65 = 44 + 25 - 4 + 0
2       2      69 = 44 + 25 + 0 + 0
3       1      40 = 44 + 0 - 4 + 0
3       2      44 = 44 + 0 + 0 + 0
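The same arithmetic can be sketched in Python: each cell mean is the intercept plus the height, width, and interaction estimates from the GLM solution output. The dictionary names are illustrative only; the numbers are the estimates reported by SAS.

```python
# Estimates from the GLM solution output (SAS sets the last level to 0)
intercept = 44.0
height_eff = {1: -1.0, 2: 25.0, 3: 0.0}
width_eff = {1: -4.0, 2: 0.0}
inter_eff = {(1, 1): 6.0, (1, 2): 0.0,
             (2, 1): 0.0, (2, 2): 0.0,
             (3, 1): 0.0, (3, 2): 0.0}

# Cell mean = intercept + height effect + width effect + interaction
cell_means = {
    (i, j): intercept + height_eff[i] + width_eff[j] + inter_eff[(i, j)]
    for i in height_eff
    for j in width_eff
}

for (i, j), m in sorted(cell_means.items()):
    print(i, j, m)
```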
Check normality
Alternative way to form QQplot
proc glm data=a1;
class height width;
model sales=height width height*width;
output out=a2 r=resid;
proc rank data=a2 out=a3 normal=blom;
var resid; ranks zresid;
Normal Quantile plot
proc sort data=a3;
by zresid;
symbol1 v=circle i=sm70;
proc gplot data=a3;
plot resid*zresid/frame;
run;
The plot
Note, dfE is only 6
ANOVA Table
Source  df           SS    MS    F
A       a-1          SSA   MSA   MSA/MSE
B       b-1          SSB   MSB   MSB/MSE
AB      (a-1)(b-1)   SSAB  MSAB  MSAB/MSE
Error   ab(n-1)      SSE   MSE
Total   abn-1        SSTO
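As a quick check of the df column, the following Python sketch evaluates the formulas for a balanced design. The function name anova_df is hypothetical; the values a = 3, b = 2, n = 2 are those of the bread example.

```python
def anova_df(a, b, n):
    """Degrees of freedom for a balanced two-way ANOVA with interaction."""
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }

df = anova_df(a=3, b=2, n=2)
print(df)

# The four component df add up to the total df
components = df["A"] + df["B"] + df["AB"] + df["Error"]
print(components == df["Total"])
```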
Expected Mean Squares
• E(MSE) = σ²
• E(MSA) = σ² + nb(Σi αi²)/(a-1)
• E(MSB) = σ² + na(Σj βj²)/(b-1)
• E(MSAB) = σ² + n(Σi Σj (αβ)ij²)/((a-1)(b-1))
• Here, αi, βj, and (αβ)ij are defined with the usual factor effects constraints
An analytical strategy
• Run the model with main effects and
the two-way interaction
• Plot the data, the means, and look at
the normal quantile plot and residual
plots
• If assumptions seem reasonable,
check the significance of the test for the
interaction
AB interaction not significant
• If the AB interaction is not statistically
significant
– Possibly rerun the analysis without the
interaction (See pooling §19.10)
– Potential Type II errors when pooling
– For a main effect with more than two
levels that is significant, use the
means statement with the Tukey
multiple comparison procedure
GLM Output
Source  DF    SS     MS      F  Pr > F
Model    5  1580    316  30.58  0.0003
Error    6    62  10.33
Total   11  1642
Note that there are 6 cells in
this design.
Output ANOVA
Type I or Type III
Source  DF    SS   MS      F  Pr > F
height   2  1544  772  74.71  <.0001
width    1    12   12   1.16  0.3226
h*w      2    24   12   1.16  0.3747
Note: Type I and Type III analyses are the same because cell size n is constant
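Because the design is balanced, the sums of squares for height, width, and the interaction can be reproduced from the six cell means alone. The Python sketch below does this; it takes SSE = 62 from the GLM output rather than recomputing it, since the raw data are not listed on these slides. Variable names are illustrative.

```python
# Cell means from the means table, keyed by (height, width)
cell_means = {(1, 1): 45.0, (1, 2): 43.0, (2, 1): 65.0,
              (2, 2): 69.0, (3, 1): 40.0, (3, 2): 44.0}
a, b, n = 3, 2, 2
sse = 62.0  # taken from the GLM output, not recomputed here

grand = sum(cell_means.values()) / (a * b)
row = {i: sum(cell_means[(i, j)] for j in range(1, b + 1)) / b
       for i in range(1, a + 1)}
col = {j: sum(cell_means[(i, j)] for i in range(1, a + 1)) / a
       for j in range(1, b + 1)}

# Balanced-design formulas for the sums of squares
ssa = n * b * sum((row[i] - grand) ** 2 for i in row)
ssb = n * a * sum((col[j] - grand) ** 2 for j in col)
ssab = n * sum((cell_means[(i, j)] - row[i] - col[j] + grand) ** 2
               for (i, j) in cell_means)

mse = sse / (a * b * (n - 1))
f_a = (ssa / (a - 1)) / mse
f_b = (ssb / (b - 1)) / mse
f_ab = (ssab / ((a - 1) * (b - 1))) / mse
print(ssa, ssb, ssab)
print(round(f_a, 2), round(f_b, 2), round(f_ab, 2))
```

The P-values are not computed here because they require the F distribution, which is outside the Python standard library.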
Rerun without interaction
proc glm data=a1;
class height width;
model sales=height width;
means height / tukey lines;
run;
ANOVA output
Source  DF   MS      F  Pr > F
height   2  772  71.81  <.0001
width    1   12   1.12  0.3216
MS(height) and MS(width) have not changed. The MSE, F*'s, and P-values have changed because of pooling.
Comparison of MSEs
Model                 Source  DF  SS     MS
With interaction      Error    6  62  10.33
Without interaction   Error    8  86  10.75
Little change in MSE here; pooling is often done only when the error df is small
Pooling SS
• Data = Model + Residual
• When we remove a term from the "model", we put this variation and the associated df into "residual"
• This is called pooling
• A benefit is that we have more df for error
and a simpler model
• Potential Type II errors
• Beneficial only in small experiments
Pooling SSE and SSAB
• For model with interaction
• SSAB=24, dfAB=2
• SSE=62, dfE=6
• MSE=10.33
• For the model with main effects only
• SSE=62+24=86, dfE=6+2=8
• MSE=10.75
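The pooling arithmetic, and its effect on the F statistics, can be sketched in Python. All inputs are the values reported in the output above; the variable names are illustrative.

```python
# Model with interaction (values from the GLM output)
ss_ab, df_ab = 24.0, 2
sse, df_e = 62.0, 6
ms_height, ms_width = 772.0, 12.0

# Pool the interaction into error for the main-effects-only model
pooled_sse = sse + ss_ab
pooled_df = df_e + df_ab
pooled_mse = pooled_sse / pooled_df

# The main effect mean squares are unchanged; only the denominator changes
f_height = ms_height / pooled_mse
f_width = ms_width / pooled_mse
print(pooled_sse, pooled_df, pooled_mse)
print(round(f_height, 2), round(f_width, 2))
```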
Tukey Output
Grouping    Mean  N  height
A         67.000  4  2
B         44.000  4  1
B
B         42.000  4  3
Plot of the means
Regression Approach
• Similar to what we did for one-way
• Use a-1 variables for A
• Use b-1 variables for B
• Multiply each of the a-1 variables for A times each of the b-1 for B to get (a-1)(b-1) for AB
• You can use the test statement in Proc
reg to perform F tests
Create Variables
data a4;
set a1;
X1 = (height eq 1) - (height eq 3);
X2 = (height eq 2) - (height eq 3);
X3 = (width eq 1) - (width eq 2);
X13 = X1*X3;
X23 = X2*X3;
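For readers who want to see the coding outside SAS, here is a rough Python equivalent of the data step. The function name effect_code is hypothetical; the logic mirrors the SAS comparisons, where (height eq 1) evaluates to 1 or 0.

```python
def effect_code(height, width):
    """Return (X1, X2, X3, X13, X23) for one observation,
    using the last level of each factor as the reference (-1)."""
    x1 = int(height == 1) - int(height == 3)
    x2 = int(height == 2) - int(height == 3)
    x3 = int(width == 1) - int(width == 2)
    return (x1, x2, x3, x1 * x3, x2 * x3)

# One row per treatment combination
cells = [(h, w) for h in (1, 2, 3) for w in (1, 2)]
coded = {cell: effect_code(*cell) for cell in cells}
for cell in cells:
    print(cell, coded[cell])

# With equal cell sizes, every coded column sums to zero
column_sums = [sum(coded[cell][k] for cell in cells) for k in range(5)]
print(column_sums)
```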
Run Proc Reg
proc reg data=a4;
model sales= X1 X2 X3 X13 X23 / ss1;
height: test X1, X2;
width: test X3;
interaction: test X13, X23;
run;
SAS Output
Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              5      1580.00000    316.00000    30.58  0.0003
Error              6        62.00000     10.33333
Corrected Total   11      1642.00000
Same basic ANOVA table
SAS Output
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept   1            51.00000         0.92796    54.96    <.0001      31212
X1          1            -7.00000         1.31233    -5.33    0.0018    8.00000
X2          1            16.00000         1.31233    12.19    <.0001  1536.0000
X3          1            -1.00000         0.92796    -1.08    0.3226   12.00000
X13         1             2.00000         1.31233     1.52    0.1783   18.00000
X23         1            -1.00000         1.31233    -0.76    0.4749    6.00000
SS Results
• SS(Height) = SS(X1) + SS(X2|X1)
– 1544 = 8.0 + 1536
• SS(Width) = SS(X3|X1,X2)
– 12 = 12
• SS(Height*Width) = SS(X13|X1,X2,X3) + SS(X23|X1,X2,X3,X13)
– 24 = 18 + 6
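The first Type I SS can be verified by hand. With equal cell sizes X1 sums to zero, so it is orthogonal to the intercept and SS(X1) = (Σ X1·Y)² / Σ X1². The Python sketch below uses the height means and N = 4 from the Tukey output; it obtains SS(X2|X1) by subtraction from SS(Height) rather than fitting the regression. Variable names are illustrative.

```python
# Height means and observations per height, from the Tukey output
height_means = {1: 44.0, 2: 67.0, 3: 42.0}
n_per_height = 4
x1 = {1: 1, 2: 0, 3: -1}  # value of X1 at each height

# Sum of X1*Y and of X1^2 over all 12 observations
sum_x1y = sum(x1[h] * n_per_height * height_means[h] for h in x1)
sum_x1sq = sum(n_per_height * x1[h] ** 2 for h in x1)
ss_x1 = sum_x1y ** 2 / sum_x1sq

# SS(Height) from the height means, then SS(X2|X1) by subtraction
grand = sum(height_means.values()) / len(height_means)
ss_height = n_per_height * sum((m - grand) ** 2
                               for m in height_means.values())
ss_x2_given_x1 = ss_height - ss_x1
print(ss_x1, ss_height, ss_x2_given_x1)
```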
Test Results
Test height Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     2     772.0000    74.71  <.0001
Denominator   6     10.33333

Test width Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     1      12.0000     1.16  0.3226
Denominator   6      10.3333

Test interaction Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     2       12.000     1.16  0.3747
Denominator   6       10.333
Interpreting Estimates
μ̂1. = 51 + (-7) = 44
μ̂2. = 51 + 16 = 67
μ̂3. = 51 - (-7) - 16 = 42
μ̂.1 = 51 + (-1) = 50
μ̂.2 = 51 - (-1) = 52
μ̂11 = 51 + (-7) + (-1) + 2 = 45
μ̂22 = 51 + 16 - (-1) - (-1) = 69
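All six cell means can be recovered the same way by plugging the coded variables into the fitted regression equation. This is a sketch in Python using the parameter estimates from the Proc Reg output; the function name predicted_mean is hypothetical.

```python
# Parameter estimates from the Proc Reg output
b0, b1, b2, b3, b13, b23 = 51.0, -7.0, 16.0, -1.0, 2.0, -1.0

def predicted_mean(height, width):
    """Fitted cell mean under the coding used to create X1, X2, X3."""
    x1 = int(height == 1) - int(height == 3)
    x2 = int(height == 2) - int(height == 3)
    x3 = int(width == 1) - int(width == 2)
    return (b0 + b1 * x1 + b2 * x2 + b3 * x3
            + b13 * x1 * x3 + b23 * x2 * x3)

fitted = {(h, w): predicted_mean(h, w) for h in (1, 2, 3) for w in (1, 2)}
for cell, m in sorted(fitted.items()):
    print(cell, m)
```

The fitted values agree with the cell means obtained from the GLM solution, even though the two parameterizations use different constraints.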
Last slide
• Finish reading KNNL Chapter 19
• Topic25.sas contains the SAS
commands for these slides
• We will now focus more on the
strategies needed to handle a two- or
more factor ANOVA