Lecture 22: ANOVA
Statistics 101
Prof. Rundel
November 29, 2011

Announcements
Homework 10 due Thursday 12/1
OpenIntro quiz tonight 6pm - Thursday noon
Presentations next Monday; attend the whole session you're signed up for
Recap

Review question
Below is the output of a regression model for predicting % voted for
McCain in the 2008 presidential election from region (levels = midwest,
northeast, south, west). Each case is a state. Which region has the
highest average % voted for McCain?

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       25.750      3.779   6.814 1.73e-08 ***
regionnortheast  -15.417      5.773  -2.671   0.0104 *
regionsouth        8.812      4.999   1.763   0.0846 .
regionwest         1.327      5.241   0.253   0.8012

(a) midwest
(b) northeast
(c) south
(d) west

Review question
Which of the following is false about the assumptions and conditions for
multiple linear regression?
(a) Residuals should have a linear relationship with the response variable.
(b) Explanatory variables should have a linear relationship with the response variable.
(c) Explanatory variables should not have strong linear relationships with each other.
(d) Residuals should be normally distributed around 0.
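With dummy coding, each region's fitted mean is the intercept (the midwest baseline) plus that region's coefficient. A quick arithmetic check, sketched in Python rather than the R used in class, with the values copied from the output above:

```python
# Fitted mean % voted for McCain by region: the intercept is the midwest
# baseline, and each dummy coefficient shifts that baseline.
intercept = 25.750
coefs = {"midwest": 0.0, "northeast": -15.417, "south": 8.812, "west": 1.327}

means = {region: intercept + c for region, c in coefs.items()}
highest = max(means, key=means.get)
print(highest, round(means[highest], 3))  # south, 34.562
```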
ANOVA

To compare the means of 2 groups we use a Z or a T statistic.
To compare the means of 3+ groups we use a new test called ANOVA and a
new statistic called F.

ANOVA is used to assess whether the mean of the outcome variable is
different for different levels of a categorical variable.

H0: The mean outcome is the same across all categories,
    µ1 = µ2 = · · · = µk,
where µi represents the mean of the outcome for observations in category i.
HA: The mean of the outcome is different for some (or all) groups.

Clicker question
What does ANOVA mean?
(a) Another variance analysis
(b) Analysis of variance
(c) An omnibus variance analysis
(d) Aardvarks not over vanilla ants
ANOVA

Model and assumptions

    y_ij = µ_i + ε_ij,

where observation j belongs to group i and has error ε_ij.

In fitting this model we assume that the errors are
    independent
    nearly normal
    of nearly constant variance
These are more or less the same conditions as for multiple linear
regression.

If the assumptions are met, we perform ANOVA to evaluate whether the data
provide strong evidence against the null hypothesis that all µi are equal.

z/t test vs. ANOVA - Purpose

z/t test: Compare the means from two groups to see whether they are so
far apart that the observed difference cannot reasonably be attributed to
sampling variability.
    H0: µ1 = µ2

ANOVA: Compare the means from two or more groups to see whether they are
so far apart that the observed differences cannot all reasonably be
attributed to sampling variability.
    H0: µ1 = µ2 = · · · = µk
ANOVA

z/t test vs. ANOVA - Method

z/t test: Compute a test statistic (a ratio).
    z/t = (x̄1 − x̄2) / SE(x̄1 − x̄2)

ANOVA: Compute a test statistic (a ratio).
    F = variability bet. groups / variability w/in groups

Large test statistics lead to small p-values. If the p-value is small
enough, H0 is rejected and we conclude that the population means are not
all equal.

With only two groups the t test and ANOVA are equivalent, but only if we
use the pooled standard deviation in the denominator of the test
statistic, which we largely skipped over since it's rarely used.
With more than two groups, ANOVA compares the sample means to an overall
grand mean.
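The two-group equivalence can be checked numerically: with the pooled variance in the denominator, F = t². A small sketch in Python (made-up illustrative data, not the course data):

```python
import math

# With two groups, the pooled two-sample t statistic and the one-way
# ANOVA F statistic satisfy F = t^2.
a = [3.1, 3.6, 3.7, 3.8, 4.3]
b = [4.8, 4.9, 5.3, 5.4, 5.7]

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    # sum of squared deviations from the group mean
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(a), len(b)
sp2 = (ss(a) + ss(b)) / (n1 + n2 - 2)  # pooled variance = MSE when k = 2
t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

grand = mean(a + b)
msg = n1 * (mean(a) - grand) ** 2 + n2 * (mean(b) - grand) ** 2  # dfG = 1
F = msg / sp2

print(F, t ** 2)  # the two agree up to floating-point error
```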
ANOVA

Aldrin in the Wolf River

The Wolf River in Tennessee flows past an abandoned site once used by
the pesticide industry for dumping wastes, including chlordane (a
pesticide) and aldrin and dieldrin (both insecticides).

These highly toxic organic compounds can cause various cancers and birth
defects.

The standard method to test whether these substances are present in a
river is to take samples at six-tenths depth.

But since these compounds are denser than water and their molecules tend
to stick to particles of sediment, they are more likely to be found in
higher concentrations near the bottom than near mid-depth.

Data
Aldrin concentration (nanograms per liter) at three levels of depth.

      bottom  middepth  surface
  1     3.80      3.20     3.10
  2     4.80      3.80     3.60
  3     4.90      4.30     3.70
  4     5.30      4.80     3.80
  5     5.40      4.90     4.30
  6     5.70      5.20     4.40
  7     6.30      5.20     4.40
  8     7.30      6.20     4.40
  9     8.10      6.30     5.10
 10     8.80      6.60     5.20
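The group summaries used below can be computed directly from this table. A minimal Python sketch (standard library only; the course itself works in R):

```python
import math

# Group means and standard deviations for the aldrin data above.
data = {
    "bottom":   [3.80, 4.80, 4.90, 5.30, 5.40, 5.70, 6.30, 7.30, 8.10, 8.80],
    "middepth": [3.20, 3.80, 4.30, 4.80, 4.90, 5.20, 5.20, 6.20, 6.30, 6.60],
    "surface":  [3.10, 3.60, 3.70, 3.80, 4.30, 4.40, 4.40, 4.40, 5.10, 5.20],
}

summary = {}
for depth, xs in data.items():
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    summary[depth] = (round(m, 2), round(sd, 2))
    print(f"{depth}: mean = {m:.2f}, SD = {sd:.2f}")
```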
ANOVA

Exploratory analysis

Aldrin concentration (nanograms per liter) at three levels of depth.

[Side-by-side box plots of aldrin concentration for Bottom, Mid-depth,
and Surface]

            mean    SD
Bottom      6.04  1.58
Mid-depth   5.05  1.10
Surface     4.20  0.66

Hypotheses

Clicker question
What are the correct hypotheses for testing for a difference between the
mean aldrin concentrations among the three levels?
(a) H0: µB = µM = µS
    HA: µB ≠ µM ≠ µS
(b) H0: µB ≠ µM ≠ µS
    HA: µB = µM = µS
(c) H0: µB = µM = µS
    HA: At least one pair of means is different.
(d) H0: µB = µM = µS = 0
    HA: At least one pair of means is different.
(e) H0: µB = µM = µS
    HA: µB > µM > µS
ANOVA and the F test

Test statistic

    F = variability bet. groups / variability w/in groups

Does there appear to be a lot of variability within groups? How about
between groups?

Test statistic (cont.)

    F = variability bet. groups / variability w/in groups = MSG / MSE

MSG is the mean square between groups, with dfG = k − 1,
where k is the number of groups.
MSE is the mean square error (the variability in the residuals), with
dfE = n − k, where n is the number of observations.
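The pieces of this ratio can be assembled by hand from the aldrin data, using the standard decomposition SSG = Σ nᵢ(x̄ᵢ − x̄)² and SSE = Σ (x − x̄ᵢ)². A Python sketch (stdlib only, for illustration):

```python
# Between/within decomposition for the aldrin data: SSG, SSE, MSG, MSE, F.
groups = {
    "bottom":   [3.80, 4.80, 4.90, 5.30, 5.40, 5.70, 6.30, 7.30, 8.10, 8.80],
    "middepth": [3.20, 3.80, 4.30, 4.80, 4.90, 5.20, 5.20, 6.20, 6.30, 6.60],
    "surface":  [3.10, 3.60, 3.70, 3.80, 4.30, 4.40, 4.40, 4.40, 5.10, 5.20],
}

all_obs = [x for xs in groups.values() for x in xs]
n, k = len(all_obs), len(groups)
grand = sum(all_obs) / n

# Sum of squares between groups (deviation of each group mean from the
# grand mean) and within groups (deviation of each point from its group mean).
ssg = sum(len(xs) * (sum(xs) / len(xs) - grand) ** 2 for xs in groups.values())
sse = sum((x - sum(xs) / len(xs)) ** 2 for xs in groups.values() for x in xs)

msg = ssg / (k - 1)   # dfG = k - 1 = 2
mse = sse / (n - k)   # dfE = n - k = 27
F = msg / mse
print(round(msg, 2), round(mse, 2), round(F, 2))  # 8.48 1.38 6.13
```

The values match the ANOVA output table shown on the next slide.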
ANOVA

ANOVA and the F test

p-value

    F = variability bet. groups / variability w/in groups

In order to be able to reject H0, we need a small p-value, which requires
a large F statistic.
In order to obtain a large F statistic, the variability between sample
means needs to be greater than the variability within sample means.

[F distribution with dfG = 2 and dfE = 27, shaded to the right of the
observed statistic F = 6.13]

ANOVA output for the aldrin data

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
depth        2   16.96     8.48     6.13  0.0064
Residuals   27   37.33     1.38

Let's see what this table shows...
    dfG = 3 − 1 = 2
    dfE = n − k = 30 − 3 = 27
    MSG = 16.96 / 2 = 8.48
    MSE = 37.33 / 27 = 1.38
    F = 8.48 / 1.38 = 6.13
    p-value = P(F > 6.13) ≈ 0.0064
ANOVA and the F test

Conclusion

If the p-value is small (less than α), reject H0. The data provide
convincing evidence that at least one pair of means is different from
each other (but we can't tell which ones).

If the p-value is large, fail to reject H0. The data do not provide
convincing evidence that at least one pair of means is different from
each other; the observed differences in sample means are attributable to
sampling variability (or chance).

Conclusion - in context

Clicker question
What is the conclusion of the hypothesis test?
The data provide convincing evidence that the average aldrin concentration
(a) is different for all groups.
(b) on the surface is lower than at the other levels.
(c) is different for at least two of the groups.
(d) is the same for all groups.
ANOVA

Graphical diagnostics for ANOVA

(1) independent residuals

We will assume independence of observations, and hence of residuals.

If we believe the order in which the data were collected might introduce
dependence (as with time series data), we should verify this assumption
by plotting the residuals vs. the order of data collection. In this case
the data have already been sorted, so we have no way of telling what
order they were collected in.

(2) nearly normal residuals

[Normal probability plot and histogram of residuals]

Does this assumption appear to be satisfied?
(3) constant variance

Check with a side-by-side box plot of the outcome.

[Side-by-side box plots of aldrin concentration for surface, bottom, and
mid-depth]

Does this assumption appear to be satisfied?

Assumptions - recap

Independence is always important, though sometimes difficult to check.
The normality condition is important when the sample size for each group
is relatively small.
The constant variance condition is especially important when the sample
sizes differ between groups.
ANOVA

Multiple comparisons & Type 1 error rate

Which means differ?

Earlier we concluded that at least one pair of means differ. The natural
question that follows is "which ones?"

We can do two-sample t tests for the difference in each possible pair of
groups. Can you see any pitfalls with this approach? When we run too many
tests, the Type 1 error rate increases. This issue is resolved by using a
modified significance level.

Multiple comparisons

The scenario of testing many pairs of groups is called multiple
comparisons. The Bonferroni correction suggests that a more stringent
significance level is more appropriate for these tests:

    α* = α / K

where K is the number of comparisons being considered. If there are k
groups, then usually all possible pairs are compared, and K = k(k − 1)/2.

Determining the modified α

Clicker question
In the aldrin data set, depth has 3 levels: bottom, mid-depth, and
surface. If α = 0.05, what should be the modified significance level for
the two-sample t tests determining which pairs of groups have
significantly different means?
(a) α* = 0.05
(b) α* = 0.05/2 = 0.025
(c) α* = 0.05/3 = 0.0167
(d) α* = 0.05/6 = 0.0083

Which means differ?

Clicker question
Based on the p-values given below, which pairs of means are significantly
different?
    t test between bottom & surface: p-value = 0.005244
    t test between bottom & mid-depth: p-value = 0.1236
    t test between mid-depth & surface: p-value = 0.05441
(a) bottom & surface
(b) bottom & mid-depth
(c) mid-depth & surface
(d) bottom & mid-depth; mid-depth & surface
(e) bottom & mid-depth; bottom & surface; mid-depth & surface
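The Bonferroni logic takes only a few lines. A Python sketch using the pairwise p-values reported above:

```python
# Bonferroni-adjusted pairwise comparisons for the aldrin data.
alpha = 0.05
k = 3                    # depth has 3 levels
K = k * (k - 1) // 2     # number of pairwise comparisons = 3
alpha_star = alpha / K   # 0.0167 (rounded)

pvalues = {
    ("bottom", "surface"):    0.005244,
    ("bottom", "mid-depth"):  0.1236,
    ("mid-depth", "surface"): 0.05441,
}

significant = [pair for pair, p in pvalues.items() if p < alpha_star]
print(significant)  # only bottom & surface clears the adjusted threshold
```

Note that mid-depth vs. surface (p = 0.05441) would not be significant even at the unadjusted α = 0.05.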
ANOVA

Using ANOVA for multiple regression

Revisit - Kids' scores

Predicting cognitive test scores of three- and four-year-old children
using characteristics of their mothers. Data are from a survey of adult
American women and their children.

      kid_score  mom_hs  mom_iq  mom_work  mom_age
  1          65     yes  121.12       yes       27
  ...
  5         115     yes   92.75       yes       27
  6          98      no  107.90        no       18
  ...
434          70     yes   91.25       yes       25

Testing the model as a whole

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.59241    9.21906   2.125   0.0341 *
mom_hsyes    5.09482    2.31450   2.201   0.0282 *
mom_iq       0.56147    0.06064   9.259   <2e-16 ***
mom_workyes  2.53718    2.35067   1.079   0.2810
mom_age      0.21802    0.33074   0.659   0.5101

Residual standard error: 18.14 on 429 degrees of freedom
Multiple R-squared: 0.2171, Adjusted R-squared: 0.2098
F-statistic: 29.74 on 4 and 429 DF, p-value: < 2.2e-16

The F statistic and its p-value are used to test the model as a whole.
H0: The model as a whole is not significant.
HA: The model as a whole is significant.
What is the conclusion of the test?
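As a check, the whole-model F statistic can be recovered from R² alone, assuming the standard identity F = (R²/p) / ((1 − R²)/(n − p − 1)) with p predictors and n − p − 1 residual degrees of freedom. A Python sketch using the values from the output above:

```python
# Recover the whole-model F statistic from R-squared and the degrees of
# freedom reported in the regression output.
r2 = 0.2171
p = 4          # model df: 4 predictors
df_res = 429   # residual df: n - p - 1

F = (r2 / p) / ((1 - r2) / df_res)
print(round(F, 2))  # 29.74, matching the reported F-statistic
```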