Topic 24: Two-Way ANOVA

Outline
• Two-way ANOVA
–Data
–Cell means model
–Parameter estimates
–Factor effects model
Two-Way ANOVA
• The response variable Y is
continuous
• There are two categorical
explanatory variables or factors
Data for two-way ANOVA
• Y is the response variable
• Factor A with levels i = 1 to a
• Factor B with levels j = 1 to b
• Yijk is the kth observation in cell (i,j)
• In Chapter 19, we assume equal
sample size in each cell (nij=n)
KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3
levels: bottom, middle, top
• B is the width of the shelf display, b=2
levels: regular, wide
• n=2 stores for each of the 3x2=6
treatment combinations (nT=12)
Read the data
data a1;
infile '../data/ch19ta07.txt';
input sales height width;
proc print data=a1;
run;
The data
Obs  sales  height  width
  1     47       1      1
  2     43       1      1
  3     46       1      2
  4     40       1      2
  5     62       2      1
  6     68       2      1
  7     67       2      2
  8     71       2      2
  9     41       3      1
 10     39       3      1
 11     42       3      2
 12     46       3      2
Notation
• For Yijk we use
– i to denote the level of the factor A
– j to denote the level of the factor B
– k to denote the kth observation in cell (i,j)
• i = 1, . . . , a levels of factor A
• j = 1, . . . , b levels of factor B
• k = 1, . . . , n observations in cell (i,j)
Model
• We assume that the response variable
observations are
– Normally distributed
• With a mean that may depend on the
levels of the factors A and B
• With a constant variance
– Independent
Cell Means Model
• Yijk = μij + εijk
– where μij is the theoretical mean or
expected value of all observations in
cell (i,j)
– the εijk are iid N(0, σ²)
• This means Yijk ~ N(μij, σ²), independent
• The parameters of the model are
– μij, for i = 1 to a and j = 1 to b
– σ²
Estimates
• Estimate μij by the mean of the observations in cell (i,j), $\bar{Y}_{ij\cdot}$
• $\bar{Y}_{ij\cdot} = \left(\sum_k Y_{ijk}\right) / n$
• For each (i,j) combination, we can get an estimate of the variance
• $s_{ij}^2 = \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 / (n - 1)$
• We need to combine these to get an estimate of σ²
Pooled estimate of σ²
• In general we pool the $s_{ij}^2$, using weights proportional to the df, $n_{ij} - 1$
• The pooled estimate is $s^2 = \sum_{ij} (n_{ij} - 1) s_{ij}^2 \,/\, \sum_{ij} (n_{ij} - 1)$
• Here, $n_{ij} = n$, so $s^2 = \left(\sum_{ij} s_{ij}^2\right) / (ab)$, which is the average sample variance
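As a worked check of the pooling formula, here is a sketch for the bread example. The cell variances are computed directly from the two observations in each cell of the data listing (for example, cell (1,1) holds 47 and 43, so its variance is (2² + 2²)/1 = 8).

```latex
\begin{align*}
s_{11}^2 &= 8, \quad s_{12}^2 = 18, \quad s_{21}^2 = 18, \quad
s_{22}^2 = 8, \quad s_{31}^2 = 2, \quad s_{32}^2 = 8 \\
s^2 &= \frac{\sum_{ij} s_{ij}^2}{ab}
     = \frac{8 + 18 + 18 + 8 + 2 + 8}{6}
     = \frac{62}{6} \approx 10.33
\end{align*}
```

This agrees with the Error Mean Square of 10.333333 that proc glm reports for these data.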
Run proc glm
proc glm data=a1;
class height width;
model sales=
height width height*width;
means height width
height*width;
run;
Output
Class Level Information
Class    Levels  Values
height        3  1 2 3
width         2  1 2
Number of Observations Read  12
Number of Observations Used  12
Means statement height
Level of height  N  sales Mean  sales Std Dev
1                4  44.0000000  3.16227766
2                4  67.0000000  3.74165739
3                4  42.0000000  2.94392029
Means statement width
Level of width  N  sales Mean  sales Std Dev
1               6  50.0000000  12.0664825
2               6  52.0000000  13.4313067
Means statement ht*w
Level of height  Level of width  N  sales Mean  sales Std Dev
1                1               2  45.0000000  2.82842712
1                2               2  43.0000000  4.24264069
2                1               2  65.0000000  4.24264069
2                2               2  69.0000000  2.82842712
3                1               2  40.0000000  1.41421356
3                2               2  44.0000000  2.82842712
Code the factor levels
data a1; set a1;
if height eq 1 and width eq 1 then hw='1_BR';
if height eq 1 and width eq 2 then hw='2_BW';
if height eq 2 and width eq 1 then hw='3_MR';
if height eq 2 and width eq 2 then hw='4_MW';
if height eq 3 and width eq 1 then hw='5_TR';
if height eq 3 and width eq 2 then hw='6_TW';
Plot the data
symbol1 v=circle i=none;
proc gplot data=a1;
plot sales*hw/frame;
run;
The plot
[Figure: sales plotted against hw, one circle per store]
Put the means in a2
proc means data=a1;
var sales;
by height width;
output out=a2 mean=avsales;
proc print data=a2;
run;
Output Data Set
Obs  height  width  _TYPE_  _FREQ_  avsales
  1       1      1       0       2       45
  2       1      2       0       2       43
  3       2      1       0       2       65
  4       2      2       0       2       69
  5       3      1       0       2       40
  6       3      2       0       2       44
Plot the means
symbol1 v=square i=join c=black;
symbol2 v=diamond i=join c=black;
proc gplot data=a2;
plot avsales*height=width/frame;
run;
The interaction plot
[Figure: avsales plotted against height, with one joined line per width]
Questions to consider
• Does the height of the display affect
sales? If yes, compare top with
middle, top with bottom, and middle
with bottom
• Does the width of the display affect
sales? If yes, compare regular and
wide
But wait!!! Are these
factor level comparisons
meaningful?
• Does the effect of height on sales
depend on the width?
• Does the effect of width on sales
depend on the height?
• If yes, we have an interaction and we
need to do some additional analysis
Factor effects model
• For the one-way ANOVA model, we
wrote μi = μ + αi
• Here we use μij = μ + αi + βj + (αβ)ij
• Under “common” formulation
– μ (μ.. in KNNL) is the “overall mean”
– αi is the main effect of A
– βj is the main effect of B
– (αβ)ij is the interaction between A and B
Factor effects model
• μ = (Σij μij)/(ab)
• μi. = (Σj μij)/b and μ.j = (Σi μij)/a
• αi = μi. – μ and βj = μ.j - μ
• (αβ)ij is difference between μij and μ + αi + βj
• (αβ)ij = μij - (μ + (μi. - μ) + (μ.j - μ))
= μij – μi. – μ.j + μ
Interpretation
• μij = μ + αi + βj + (αβ)ij
• μ is the “overall” mean
• αi is an adjustment for level i of A
• βj is an adjustment for level j of B
• (αβ)ij is an additional adjustment that takes into account both i and j and that cannot be explained by the previous adjustments
Constraints for this
framework
• α. = Σi αi= 0
• β. = Σjβj = 0
• (αβ).j = Σi (αβ)ij = 0 for all j
• (αβ)i. = Σj (αβ)ij = 0 for all i
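These constraints are not extra assumptions; they follow from defining the effects as deviations from averages. A short derivation of the first and third constraints, using μi. = (Σj μij)/b, μ.j = (Σi μij)/a and μ = (Σij μij)/(ab) from the factor effects model:

```latex
\begin{align*}
\sum_i \alpha_i
  &= \sum_i (\mu_{i\cdot} - \mu)
   = a\mu - a\mu = 0 \\
\sum_i (\alpha\beta)_{ij}
  &= \sum_i (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu)
   = a\mu_{\cdot j} - a\mu - a\mu_{\cdot j} + a\mu = 0
\end{align*}
```

The other two constraints follow by the same argument with the roles of i and j exchanged.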
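Estimates for factor
effects model
$\hat{\mu} = \bar{Y}_{\cdot\cdot\cdot} = \left(\sum_{ijk} Y_{ijk}\right) / (abn)$
$\hat{\mu}_{i\cdot} = \bar{Y}_{i\cdot\cdot}$ and $\hat{\mu}_{\cdot j} = \bar{Y}_{\cdot j\cdot}$
$\hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$ and $\hat{\beta}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}$
$(\widehat{\alpha\beta})_{ij} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}$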
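To make the estimates concrete, here is a worked sketch for the bread example. It uses the means reported by the means statement: overall mean 51, height means 44, 67, 42, width means 50, 52, and cell means 45, 43, 65, 69, 40, 44. Each interaction estimate is the cell mean minus the height mean minus the width mean plus the overall mean.

```latex
\begin{align*}
\hat{\mu} &= \bar{Y}_{\cdot\cdot\cdot} = 51 \\
\hat{\alpha}_1 &= 44 - 51 = -7, \quad
\hat{\alpha}_2 = 67 - 51 = 16, \quad
\hat{\alpha}_3 = 42 - 51 = -9 \\
\hat{\beta}_1 &= 50 - 51 = -1, \quad
\hat{\beta}_2 = 52 - 51 = 1 \\
(\widehat{\alpha\beta})_{11} &= 45 - 44 - 50 + 51 = 2, \quad
(\widehat{\alpha\beta})_{12} = 43 - 44 - 52 + 51 = -2 \\
(\widehat{\alpha\beta})_{21} &= 65 - 67 - 50 + 51 = -1, \quad
(\widehat{\alpha\beta})_{22} = 69 - 67 - 52 + 51 = 1 \\
(\widehat{\alpha\beta})_{31} &= 40 - 42 - 50 + 51 = -1, \quad
(\widehat{\alpha\beta})_{32} = 44 - 42 - 52 + 51 = 1
\end{align*}
```

The estimates sum to zero over each index, as the constraints require.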
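SS for ANOVA Table
$SSA = \sum_{ijk} \hat{\alpha}_i^2 = \sum_{ijk} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$
$SSB = \sum_{ijk} \hat{\beta}_j^2$
$SSAB = \sum_{ijk} (\widehat{\alpha\beta})_{ij}^2$
$SSE = \sum_{ijk} (Y_{ijk} - \bar{Y}_{ij\cdot})^2$
$SSTO = \sum_{ijk} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2$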
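As a worked sketch for the bread example (a = 3, b = 2, n = 2): because each term is constant over the indices it does not depend on, the triple sums reduce to multiples of sums over levels. The values below come from the marginal and cell means reported by the means statement; the interaction estimates are cell mean minus height mean minus width mean plus overall mean, and the cell variances are computed from the two observations in each cell.

```latex
\begin{align*}
SSA &= bn \sum_i \hat{\alpha}_i^2
     = 4\,[(44-51)^2 + (67-51)^2 + (42-51)^2]
     = 4(49 + 256 + 81) = 1544 \\
SSB &= an \sum_j \hat{\beta}_j^2
     = 6\,[(50-51)^2 + (52-51)^2] = 6(1 + 1) = 12 \\
SSAB &= n \sum_{ij} (\widehat{\alpha\beta})_{ij}^2
      = 2\,[2^2 + (-2)^2 + (-1)^2 + 1^2 + (-1)^2 + 1^2] = 2(12) = 24 \\
SSE &= \sum_{ij} (n-1)\, s_{ij}^2 = 8 + 18 + 18 + 8 + 2 + 8 = 62 \\
SSTO &= SSA + SSB + SSAB + SSE = 1544 + 12 + 24 + 62 = 1642
\end{align*}
```

These agree with the Type III SS, Error, and Corrected Total lines in the proc glm output.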
df for ANOVA Table
• dfA = a-1
• dfB = b-1
• dfAB = (a-1)(b-1)
• dfE = ab(n-1)
• dfT = abn-1 = nT-1
MS for ANOVA Table
• MSA = SSA/dfA
• MSB = SSB/dfB
• MSAB = SSAB/dfAB
• MSE = SSE/dfE
• MST = SSTO/dfT
Hypotheses for two-way
ANOVA
• H0A: αi = 0 for all i
• H1A: αi ≠ 0 for at least one i
• H0B: βj = 0 for all j
• H1B: βj ≠ 0 for at least one j
• H0AB: (αβ)ij = 0 for all (i,j)
• H1AB: (αβ)ij ≠ 0 for at least one (i,j)
F statistics
• H0A is tested by FA = MSA/MSE;
df=dfA, dfE
• H0B is tested by FB = MSB/MSE;
df=dfB, dfE
• H0AB is tested by FAB = MSAB/MSE;
df=dfAB, dfE
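A worked sketch for the bread example (a = 3, b = 2, n = 2), taking the sums of squares from the proc glm output for these data (SSA = 1544, SSB = 12, SSAB = 24, SSE = 62):

```latex
\begin{align*}
MSA &= 1544/2 = 772, \quad MSB = 12/1 = 12, \quad
MSAB = 24/2 = 12, \quad MSE = 62/6 \approx 10.333 \\
F_A &= 772/10.333 \approx 74.71 \quad (\text{df} = 2, 6) \\
F_B &= 12/10.333 \approx 1.16 \quad (\text{df} = 1, 6) \\
F_{AB} &= 12/10.333 \approx 1.16 \quad (\text{df} = 2, 6)
\end{align*}
```

FB and FAB happen to coincide here because MSB and MSAB are both 12; their P-values differ because the numerator df differ.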
ANOVA Table
Source  df          SS    MS    F
A       a-1         SSA   MSA   MSA/MSE
B       b-1         SSB   MSB   MSB/MSE
AB      (a-1)(b-1)  SSAB  MSAB  MSAB/MSE
Error   ab(n-1)     SSE   MSE
Total   abn-1       SSTO  MST
P-values
• P-values are calculated using the
F(dfNumerator, dfDenominator)
distributions
• If P ≤ 0.05 we conclude that the effect
being tested is statistically
significant
KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3
levels: bottom, middle, top
• B is the width of the shelf display, b=2:
regular, wide
• n=2 stores for each of the 3x2
treatment combinations
PROC GLM
proc glm data=a1;
class height width;
model sales=
height width height*width;
run;
Output
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             5       1580.0000   316.000000    30.58  0.0003
Error             6       62.000000    10.333333
Corrected Total  11       1642.0000
Note that there are 6 cells in this design, so the model has 6-1 = 5 df
Output ANOVA
Source        DF  Type III SS  Mean Square  F Value  Pr > F
height         2   1544.00000   772.000000    74.71  <.0001
width          1    12.000000    12.000000     1.16  0.3226
height*width   2    24.000000    12.000000     1.16  0.3747
Note: Type I and Type III analyses are the same because nij is constant
Other output
R-Square  Coeff Var  Root MSE  sales Mean
0.962241   6.303040  3.214550    51.00000
We commonly do not consider R-square when performing ANOVA; we are more interested in differences between levels than in the model's predictive ability
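For reference, a sketch of how these summary numbers follow from the ANOVA output for the bread data (model SS 1580, total SS 1642, MSE 10.333333, mean of sales 51):

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Model}}}{SSTO} = \frac{1580}{1642} \approx 0.9622 \\
\text{Root MSE} &= \sqrt{MSE} = \sqrt{10.3333} \approx 3.2146 \\
\text{Coeff Var} &= 100 \times \frac{\text{Root MSE}}{\bar{Y}_{\cdot\cdot\cdot}}
  = 100 \times \frac{3.2146}{51} \approx 6.303
\end{align*}
```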
Results
• The main effect of height is statistically
significant (F=74.71; df=2,6; P<0.0001)
• The main effect of width is not
statistically significant (F=1.16; df=1,6;
P=0.32)
• The interaction between height and
width is not statistically significant
(F=1.16; df=2,6; P=0.37)
Interpretation
• The height of the display affects
sales of bread
• The width of the display has no
apparent effect
• The effect of the height of the display
is similar for both the regular and the
wide widths
Plot of the means
Additional analyses
• We will need to do additional
analyses to explain the height effect
(factor A)
• There were three levels: bottom,
middle and top
• We could rerun the data with a one-way ANOVA and use the methods we
learned in the previous chapters
• Use the means statement with the lines option
Run Proc GLM
proc glm data=a1;
class height width;
model sales=
height width height*width;
means height / tukey lines;
lsmeans height / adjust=tukey;
run;
MEANS Output
Alpha                                    0.05
Error Degrees of Freedom                    6
Error Mean Square                    10.33333
Critical Value of Studentized Range   4.33920
Minimum Significant Difference         6.9743
Means with the same letter are not significantly different.
Tukey Grouping    Mean  N  height
A               67.000  4  2
B               44.000  4  1
B
B               42.000  4  3
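A sketch of where the Minimum Significant Difference comes from, using the values printed above. Each height mean is based on bn = 4 observations, and the studentized range form of the Tukey difference is assumed:

```latex
\begin{align*}
MSD &= q \sqrt{\frac{MSE}{bn}}
     = 4.33920 \times \sqrt{\frac{10.33333}{4}}
     \approx 4.33920 \times 1.60728 \approx 6.974 \\
\bar{Y}_{2\cdot\cdot} - \bar{Y}_{1\cdot\cdot} &= 67 - 44 = 23 > 6.974 \\
\bar{Y}_{2\cdot\cdot} - \bar{Y}_{3\cdot\cdot} &= 67 - 42 = 25 > 6.974 \\
\bar{Y}_{1\cdot\cdot} - \bar{Y}_{3\cdot\cdot} &= 44 - 42 = 2 < 6.974
\end{align*}
```

So the middle shelf (height 2) differs from both the bottom and the top, while bottom and top share the letter B.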
LSMEANS Output
height  sales LSMEAN  LSMEAN Number
1         44.0000000  1
2         67.0000000  2
3         42.0000000  3
Least Squares Means for effect height
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sales
i/j       1       2       3
1            0.0001  0.6714
2    0.0001          <.0001
3    0.6714  <.0001
Last slide
• We went over Chapter 19
• We used program topic24.sas to
generate the output for today.