Topic 18 - ANOVA (II)
18-1
Topic 18 – More on the One Way Analysis of Variance
Two issues still to be dealt with:
a) checking the assumptions of the model, and
b) inference on individual means or combinations of
means, e.g. do two treatment means differ?
1. Estimation
a. Predicted Values or LSMEANS (“Least Squares Means”)
i. The best estimators of the “cell” means $\mu_i$ are the
sample means $\bar{y}_{i\bullet}$.
ii. The standard error of the mean estimator $\bar{y}_{i\bullet}$ is
estimated using
$SE(\bar{y}_{i\bullet}) = \sqrt{MSE / n_i}$.
b. Residuals
i. The estimators of the error terms $\varepsilon_{ij}$ are the
residuals
$e_{ij} = y_{ij} - \bar{y}_{i\bullet}$.
ii. The residuals always sum to 0 (i.e., $\sum_i \sum_j e_{ij} = 0$) and
have standard deviation $\sigma_\varepsilon$, estimated by $\sqrt{MSE}$
(“Root Mean Squared Error”).
iii. Under the assumptions, the residuals have a
Normal distribution with mean 0 and constant
variance $\sigma_\varepsilon^2$.
EXAMPLE: the comparisons of the effects of safelights
on plant height

Treatment  Height  Predicted  SE(Pred)    Residual  SE(Resid)
D          32.94   34.02      0.52243899  -1.08     0.90489088
D          35.98   34.02      0.52243899   1.96     0.90489088
D          34.76   34.02      0.52243899   0.74     0.90489088
D          32.4    34.02      0.52243899  -1.62     0.90489088
AL         30.55   31.9       0.52243899  -1.35     0.90489088
AL         32.64   31.9       0.52243899   0.74     0.90489088
AL         32.37   31.9       0.52243899   0.47     0.90489088
AL         32.04   31.9       0.52243899   0.14     0.90489088
AH         31.23   30.84      0.52243899   0.39     0.90489088
AH         31.09   30.84      0.52243899   0.25     0.90489088
AH         30.62   30.84      0.52243899  -0.22     0.90489088
AH         30.42   30.84      0.52243899  -0.42     0.90489088
BL         34.41   34.3075    0.52243899   0.1025   0.90489088
BL         34.88   34.3075    0.52243899   0.5725   0.90489088
BL         34.07   34.3075    0.52243899  -0.2375   0.90489088
BL         33.87   34.3075    0.52243899  -0.4375   0.90489088
BH         35.61   34.2925    0.52243899   1.3175   0.90489088
BH         35      34.2925    0.52243899   0.7075   0.90489088
BH         33.65   34.2925    0.52243899  -0.6425   0.90489088
BH         32.91   34.2925    0.52243899  -1.3825   0.90489088
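The columns of this table can be reproduced with a short calculation. The sketch below is one way to do it in Python with NumPy (the notes themselves use JMP, so the variable names here are illustrative assumptions). The formula used for SE(Resid), $\sqrt{MSE\,(1 - 1/n_i)}$, is the standard result for a one-way layout; it is not stated in the notes, but it matches the tabulated value.

```python
import numpy as np

# Plant heights by safelight treatment, copied from the table above.
heights = {
    "D":  [32.94, 35.98, 34.76, 32.40],
    "AL": [30.55, 32.64, 32.37, 32.04],
    "AH": [31.23, 31.09, 30.62, 30.42],
    "BL": [34.41, 34.88, 34.07, 33.87],
    "BH": [35.61, 35.00, 33.65, 32.91],
}
groups = {name: np.asarray(values) for name, values in heights.items()}
n_total = sum(len(y) for y in groups.values())   # N = 20
n_groups = len(groups)                           # t = 5

# Predicted value for every observation in a group is the group sample mean.
means = {name: y.mean() for name, y in groups.items()}

# Residuals e_ij = y_ij - ybar_i.
residuals = {name: y - means[name] for name, y in groups.items()}

# MSE = SSE / (N - t), the pooled within-group variance.
sse = sum(float((e ** 2).sum()) for e in residuals.values())
mse = sse / (n_total - n_groups)

# SE of a group mean: sqrt(MSE / n_i).
se_pred = {name: np.sqrt(mse / len(y)) for name, y in groups.items()}

# SE of a residual: sqrt(MSE * (1 - 1/n_i)) (standard result, see lead-in).
se_resid = {name: np.sqrt(mse * (1 - 1 / len(y))) for name, y in groups.items()}

for name in groups:
    print(f"{name:>2}  mean={means[name]:.4f}  "
          f"SE(pred)={se_pred[name]:.8f}  SE(resid)={se_resid[name]:.8f}")
print(f"MSE={mse:.5f}  root MSE={np.sqrt(mse):.5f}")
```

Because every treatment has four plants, SE(Pred) and SE(Resid) are the same for all rows; with unequal sample sizes they would differ by treatment.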
2. Checking The Assumptions of the Model
a. Constant Variance
i. Graphically – do box plots of the residuals for each
treatment and look for similar variabilities
[Figure: Plot of Residual Height By Treatment (y-axis: Residual Height, -2 to 2; x-axis: Treatment AH, AL, BH, BL, D)]
[Figure: Box Plots of Residual Height By Treatment (y-axis: Residual Height, -2 to 2; x-axis: Treatment AH, AL, BH, BL, D)]
ii. Hypothesis testing of equality of the treatment
variances $\sigma_i^2$, $i = 1, 2, \ldots, t$, using Levene’s test or
similar
Homogeneity of Variance Tests (from the JMP help files)
When the variances across groups are not equal, the usual
analysis of variance assumptions are not satisfied and the
ANOVA F-test is not valid.
JMP gives four tests for equality of group variances. The
concept behind the first three tests of equal variances is to
perform an analysis of variance on a new response
variable constructed to measure the spread in each group.
The fourth test is Bartlett’s test, which is similar to the
likelihood-ratio test under normal distributions.
• O’Brien’s test constructs a dependent variable so that
the group means of the new variable equal the group
sample variances of the original response. An
ANOVA on the O’Brien variable is actually an
ANOVA on the group sample variances (O’Brien
1979, Olejnik and Algina 1987).
• The Brown-Forsythe test shows the F-test from an
ANOVA in which the response is the absolute value
of the difference of each observation and the group
median (Brown and Forsythe 1974a).
• Bartlett’s test is a weighted geometric average of the
group sample variances multiplied by a correction
factor to give it a $\chi^2$-distribution. Dividing the Bartlett
Chi-square test statistic by the degrees of freedom
gives the F-value shown in the table. Bartlett’s test is
valid only under normality (Bartlett and Kendall
1946).
• Levene’s test shows the F-test from an ANOVA in
which the response is the absolute value of the
difference of each observation and the group mean
(Levene 1960).
Test of $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$ versus $H_A$: not all $\sigma_i^2$ are equal.
This can be used when the sample sizes are unequal.
The Tests that the Variances are Equal table shows the
differences between group means to the grand mean and
to the median, and gives a summary of testing procedures.
Tests that the Variances are Equal

Level  Count  Std Dev   MeanAbsDif to Mean  MeanAbsDif to Median
AH     4      0.382710  0.320000            0.320000
AL     4      0.932845  0.675000            0.605000
BH     4      1.232947  1.012500            1.012500
BL     4      0.441994  0.337500            0.337500
D      4      1.651262  1.350000            1.350000

Test            F Ratio  DFNum  DFDen  Prob > F
O'Brien[.5]     2.3658   4      15     0.0995
Brown-Forsythe  3.7971   4      15     0.0252
Levene          5.1496   4      15     0.0082
Bartlett        1.7934   4      .      0.1270

Warning: Small sample sizes. Use Caution.
Conclusion: Based on Levene’s test, reject the null
hypothesis and conclude that the variance is
heterogeneous, varying with the different treatments.
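For readers without JMP, three of the four tests can be approximated with SciPy. This is a sketch under the assumption that `scipy.stats.levene` with `center="mean"` corresponds to JMP's Levene row and with `center="median"` to its Brown-Forsythe row, which is consistent with the definitions quoted above; O'Brien's test has no direct SciPy equivalent and is omitted.

```python
from scipy import stats

# Plant heights by safelight treatment (from the example table).
ah = [31.23, 31.09, 30.62, 30.42]
al = [30.55, 32.64, 32.37, 32.04]
bh = [35.61, 35.00, 33.65, 32.91]
bl = [34.41, 34.88, 34.07, 33.87]
d = [32.94, 35.98, 34.76, 32.40]
samples = [ah, al, bh, bl, d]

# Levene: ANOVA on |y_ij - group mean|.
levene_f, levene_p = stats.levene(*samples, center="mean")

# Brown-Forsythe: ANOVA on |y_ij - group median|.
bf_f, bf_p = stats.levene(*samples, center="median")

# Bartlett: SciPy reports the chi-square statistic; dividing by the
# degrees of freedom (t - 1) gives the F-style value that JMP tabulates.
bartlett_chi2, bartlett_p = stats.bartlett(*samples)
bartlett_f = bartlett_chi2 / (len(samples) - 1)

print(f"Levene          F={levene_f:.4f}  p={levene_p:.4f}")
print(f"Brown-Forsythe  F={bf_f:.4f}  p={bf_p:.4f}")
print(f"Bartlett        F={bartlett_f:.4f}  p={bartlett_p:.4f}")
```

With only four plants per treatment, the tests disagree (Levene and Brown-Forsythe reject at the 5% level, Bartlett does not), which is one reason the JMP output carries a small-sample warning.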
b. Normality
i. Graphically
1. do a stem and leaf plot, a histogram, or
something similar using the residuals to check
for the shape of the distribution and for outliers
2. do a normal probability (quantile) plot of the
residuals
A Normal quantile plot is a graph of the observed values
of the dataset (X-axis) against the values expected for a
random sample of size n from a Normal distribution with the
mean and variance of the sample data.
To interpret: the points on the graph fall close to a straight
line when the data are normally distributed. They should
fall within the 95% confidence limits around
the straight line (with a slope of 1).
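The coordinates of such a plot can be computed without any plotting library. The sketch below uses `scipy.stats.probplot` on the residuals from the safelight example (an illustrative choice of data). Note that SciPy returns the theoretical quantiles first and the ordered data second, and works with standard Normal quantiles, so the fitted line has slope close to the residual standard deviation rather than 1; swapping or rescaling the axes to match the description above is only a matter of presentation.

```python
import numpy as np
from scipy import stats

# Residuals from the safelight example table.
residuals = np.array([
    -1.08, 1.96, 0.74, -1.62,          # D
    -1.35, 0.74, 0.47, 0.14,           # AL
    0.39, 0.25, -0.22, -0.42,          # AH
    0.1025, 0.5725, -0.2375, -0.4375,  # BL
    1.3175, 0.7075, -0.6425, -1.3825,  # BH
])

# probplot pairs each ordered residual with the standard Normal quantile
# expected for its rank, and fits a least-squares line through the points.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm"
)

for q, e in zip(theoretical_q, ordered_resid):
    print(f"expected quantile {q:7.3f}   observed residual {e:8.4f}")
print(f"fitted line: slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

A correlation `r` close to 1 means the points lie near a straight line; it is a rough numerical companion to the visual check, not a formal test.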
NOTE: usually, normality is NOT reviewed or tested until
after any problems with variance are corrected.
If the variances are unequal, the pooled residuals are a
mixture of distributions with different spreads, so they are
unlikely to look Normal: such a mixture is typically
leptokurtic (more peaked and heavier-tailed than expected
for a Normal Distribution).
Example: A study was performed in order to determine if
the mean weight of migrating warblers varied across
different habitats of pine trees and hardwoods (dp, ep, hw,
mw). A total of 174 birds were collected and weighed.
Results of the ANOVA:

Analysis of Variance
Source    DF   Sum of Squares  Mean Square  F Ratio  Prob > F
Model     3    246.6366        82.2122      9.8080   <.0001
Error     170  1424.9719       8.3822
C. Total  173  1671.6085
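The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F ratio is the model mean square over the error mean square. A minimal sketch, taking the sums of squares and degrees of freedom from the table as given and using `scipy.stats.f.sf` for the upper-tail probability:

```python
from scipy import stats

# Sums of squares and degrees of freedom from the warbler ANOVA table.
ss_model, df_model = 246.6366, 3
ss_error, df_error = 1424.9719, 170

# Mean square = sum of squares / degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error   # this is the MSE

# F ratio and its upper-tail probability under H0 (all means equal).
f_ratio = ms_model / ms_error
p_value = stats.f.sf(f_ratio, df_model, df_error)

# The corrected total is the sum of the model and error rows.
ss_total = ss_model + ss_error
df_total = df_model + df_error

print(f"MS(model)={ms_model:.4f}  MSE={ms_error:.4f}")
print(f"F={f_ratio:.4f}  p={p_value:.2e}")
print(f"C. Total: DF={df_total}, SS={ss_total:.4f}")
```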
[Figure: Residual by Predicted Plot (y-axis: weight Residual; x-axis: weight Predicted)]
This plot is useful for checking constant variance and outliers.
Tests that the Variances are Equal
Test            F Ratio  DFNum  DFDen  Prob > F
O'Brien[.5]     0.2729   3      170    0.8449
Brown-Forsythe  0.6891   3      170    0.5599
Levene          1.0748   3      170    0.3613
Bartlett        0.4077   3      .      0.7475
Conclusion: there is insufficient evidence to reject the null
hypothesis that the variances are equal.
So, now let’s check for normality:
[Figure: Normal Quantile Plot and histogram, Distribution: Residual weight, with fitted Normal(-5e-15, 2.86999)]

Fitted Normal
Parameter Estimates
Type        Parameter  Estimate  Lower 95%  Upper 95%
Location    Mu         -0.00000  -0.42944   0.429440
Dispersion  Sigma      2.86999   2.59681    3.207908
Based on the quantile plot and the overlay of a Normal
distribution on the histogram, there is not much evidence
that the assumption that the error terms are Normally
distributed is reasonable. We could also do a Shapiro-Wilk
test here.
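A Shapiro-Wilk test on the residuals could be sketched as follows. The warbler weights are not listed in these notes, so the safelight heights are used here purely as stand-in data; the call to `scipy.stats.shapiro` is the same for any set of residuals.

```python
import numpy as np
from scipy import stats

# Stand-in data: safelight heights (the warbler data are not in the notes).
groups = [
    np.array([32.94, 35.98, 34.76, 32.40]),  # D
    np.array([30.55, 32.64, 32.37, 32.04]),  # AL
    np.array([31.23, 31.09, 30.62, 30.42]),  # AH
    np.array([34.41, 34.88, 34.07, 33.87]),  # BL
    np.array([35.61, 35.00, 33.65, 32.91]),  # BH
]

# Pool the residuals e_ij = y_ij - ybar_i from all treatments.
residuals = np.concatenate([y - y.mean() for y in groups])

# H0: the residuals come from a Normal distribution.
w_stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk W={w_stat:.4f}, p={p_value:.4f}")
```

A small p-value is evidence against normality; a large one only means the test found no clear departure.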
c. Independence and Random selection/allocation
i. This is something that is controlled and decided by
the scientist when planning and executing the
experiment.
ii. Important points to consider: in addition to
randomly selecting experimental units for inclusion
in the study and randomly allocating those units to
treatments, one should also randomly order the
laboratory analyses of the units after the
experiment is over.
For example, in the study of height of plants as affected
by light regime, the scientist should measure the plants
in random order rather than take plants from the same
treatment sequentially. Subtle changes in the way
measurements are done could occur over time and
influence the results.
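A random measurement order is easy to generate in advance. The sketch below assumes 20 plants labelled by treatment and replicate number; the labels and the seed are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical plant labels: 5 treatments x 4 plants each.
treatments = ["D", "AL", "AH", "BL", "BH"]
plants = [f"{trt}-{rep}" for trt in treatments for rep in range(1, 5)]

# Fixed seed (arbitrary) so the order can be reproduced and documented.
rng = np.random.default_rng(seed=18)

# Shuffle the plants so treatments are interleaved at random
# instead of being measured one treatment after another.
measurement_order = rng.permutation(plants)

for position, plant in enumerate(measurement_order, start=1):
    print(f"{position:2d}. measure plant {plant}")
```

Recording the seed alongside the data lets someone else reconstruct exactly which plant was measured when.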
Remedial Measures
Many different methods:
1. change the model to account for the non-independence
2. change the model to account for the unequal
variance
3. do a transformation of the data for unequal variance
and non-normality
4. use a non-parametric test for severely non-normal
data
a. Kruskal-Wallis test
b. Bootstrapping
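As an illustration of option 4a, the Kruskal-Wallis test replaces the observations by their ranks and compares the rank sums across treatments. A minimal sketch with `scipy.stats.kruskal`, applied to the safelight heights (the notes do not show this analysis, so it is only an example of the call):

```python
from scipy import stats

# Plant heights by safelight treatment (from the example table).
d = [32.94, 35.98, 34.76, 32.40]
al = [30.55, 32.64, 32.37, 32.04]
ah = [31.23, 31.09, 30.62, 30.42]
bl = [34.41, 34.88, 34.07, 33.87]
bh = [35.61, 35.00, 33.65, 32.91]

# H0: all treatments have the same distribution of heights.
# The statistic H is referred to a chi-square distribution with t - 1 df.
h_stat, p_value = stats.kruskal(d, al, ah, bl, bh)
print(f"Kruskal-Wallis H={h_stat:.3f}, p={p_value:.4f}")
```

The chi-square reference distribution is a large-sample approximation, so with four plants per treatment the p-value should be read with some caution.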
Example of changing the model to allow for unequal
variance:
Model:
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$
where
• $\mu$ is the overall (grand) mean,
• $\mu_i$ is the ith treatment mean,
• $\alpha_i$ ($= \mu_i - \mu$) is the deviation of the ith treatment mean
from the overall mean, and,
• $\varepsilon_{ij}$ ($= Y_{ij} - \mu_i$) is called the error term, i.e. it is the
deviation of the jth observation, $Y_{ij}$, from the ith
treatment mean.
The error terms are independently, Normally distributed
with a mean of 0 but with treatment-specific variances
$\mathrm{var}(\varepsilon_{ij}) = \sigma_i^2$.
To analyze this model in JMP, use the Fit Y by X
platform with the unequal variances option:

Oneway Analysis of Height By Treatment
Welch Anova testing Means Equal, allowing Std Devs Not Equal
F Ratio  DFNum  DFDen   Prob > F
30.3117  4      7.1118  0.0001
This test is for $H_0: \mu_1 = \cdots = \mu_t$ vs. $H_A$: at least one mean
differs, when the variances are unequal.
The denominator degrees of freedom have to be modified
(similar to the Satterthwaite method we used for two-population
t-tests) to allow for the unequal variances.
That is why they are 7.1118 rather than the DFDen (= 15)
that we saw for the equal variance test.
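The Welch statistic and its adjusted denominator degrees of freedom can be computed by hand from the group means and variances. The sketch below follows the standard Welch (1951) formulas, weighting each treatment by $w_i = n_i / s_i^2$; it is written from those formulas rather than from JMP's documentation, so treat the agreement with the output above as a check, not a guarantee of identical internals.

```python
import numpy as np
from scipy import stats

# Plant heights by safelight treatment (from the example table).
groups = [
    np.array([32.94, 35.98, 34.76, 32.40]),  # D
    np.array([30.55, 32.64, 32.37, 32.04]),  # AL
    np.array([31.23, 31.09, 30.62, 30.42]),  # AH
    np.array([34.41, 34.88, 34.07, 33.87]),  # BL
    np.array([35.61, 35.00, 33.65, 32.91]),  # BH
]
k = len(groups)
n = np.array([len(y) for y in groups])
means = np.array([y.mean() for y in groups])
variances = np.array([y.var(ddof=1) for y in groups])

# Weights: precise (low-variance) group means count for more.
w = n / variances
w_sum = w.sum()
weighted_mean = (w * means).sum() / w_sum

# Correction term that also drives the denominator degrees of freedom.
lam = (((1 - w / w_sum) ** 2) / (n - 1)).sum()

numerator = (w * (means - weighted_mean) ** 2).sum() / (k - 1)
denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
f_welch = numerator / denominator

df_num = k - 1
df_den = (k ** 2 - 1) / (3 * lam)   # Satterthwaite-style adjustment
p_value = stats.f.sf(f_welch, df_num, df_den)

print(f"Welch F={f_welch:.4f}, DFNum={df_num}, DFDen={df_den:.4f}, p={p_value:.4f}")
```

The denominator degrees of freedom shrink when the weights are dominated by a few low-variance treatments, which is what happens here: AH and BL carry most of the weight.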