Chapter 12

advertisement
Chapter 12
One-Way Analysis of
Variance
Introduction to the Practice of
STATISTICS
SEVENTH
EDI T I O N
Moore / McCabe / Craig
Lecture Presentation Slides
Chapter 12
One-Way Analysis of
Variance
12.1 Inference for One-Way Analysis of Variance
12.2 Comparing the Means
2
12.1 Inference for One-Way
Analysis of Variance
 The Idea of ANOVA
 Comparing Several Means
 The Problem of Multiple Comparisons
 The ANOVA F Test
3
Introduction
The two sample t procedures of Chapter 7 compared the means of two
populations or the mean responses to two treatments in an
experiment.
In this chapter we’ll compare any number of means using Analysis of
Variance.
Note: We are comparing means even though the procedure is
Analysis of Variance.
4
The Idea of ANOVA
When comparing different populations or treatments, the data are
subject to sampling variability. We can pose the question for inference
in terms of the mean response.
Analysis of variance (ANOVA) is the technique used to compare
several means.
One-way ANOVA is used for situations in which there is only one way
to classify the populations of interest.
Two-way ANOVA is used to analyze the effect of two factors.
5
The Idea of ANOVA
The details of ANOVA are a bit daunting. The main idea is that when we
ask if a set of means gives evidence for differences among the
population means, what matters is not how far apart the sample means
are, but how far apart they are relative to the variability of individual
observations.
The Analysis of Variance Idea
Analysis of variance compares the variation due to specific
sources with the variation among individuals who should be
similar. In particular, ANOVA tests whether several populations
have the same mean by comparing how far apart the sample
means are with how much variation there is within the sample.
6
The Idea of ANOVA



The sample means for the three samples are the same for each set.
The variation among sample means for (a) is identical to (b).
The variation among the individuals within the three samples is much less
for (b).
 CONCLUSION: the samples in (b) contain a larger amount of variation
among the sample means relative to the amount of variation within the
samples, so ANOVA will find more significant differences among the
means in (b)
– assuming equal sample sizes here for (a) and (b).
– Note: larger samples will find more significant differences.
7
Comparing Several Means
Do SUVs and trucks have
lower gas mileage than midsize
cars?
Response variable: gas mileage
(mpg)
Groups: vehicle classification
31 midsize cars
31 SUVs
14 standard-size pickup trucks
Data from the Environmental Protection Agency’s Model Year
2003 Fuel Economy Guide, www.fueleconomy.gov.
8
Comparing Several Means
Means:
Midsize: 27.903
SUV:
22.677
Pickup: 21.286


Mean gas mileage for
SUVs and pickups
appears less than for
midsize cars.
Are these differences
statistically significant?
9
Comparing Several Means
Means:
Midsize: 27.903
SUV:
22.677
Pickup: 21.286
Null hypothesis:
The true means (for gas mileage) are the
same for all groups (the three vehicle
classifications).
We could look at separate t tests to compare each pair of
means to see if they are different:
27.903 vs. 22.677, 27.903 vs. 21.286, & 22.677 vs. 21.286
H0: μ1 = μ2
H0: μ1 = μ3
H0: μ2 = μ3
However, this gives rise to the problem of multiple
comparisons!
10
Problem of Multiple Comparisons
We have the problem of how to do many comparisons at the same
time with some overall measure of confidence in all the conclusions.
Statistical methods for dealing with this problem usually have two
steps:


An overall test to find any differences among the parameters
we want to compare
A detailed follow-up analysis to decide which groups differ
and how large the differences are
Follow-up analyses can be quite complex; we will look at only the
overall test for a difference in several means and examine the data
to make follow-up conclusions.
11
The One-Way ANOVA Model
Random sampling always produces chance variations. Any “factor
effect” would thus show up in our data as the factor-driven
differences plus chance variations (“error”):
Data = fit + residual
The one-way ANOVA model analyzes
situations where chance variations are
normally distributed N(0,σ) such that:
12
The ANOVA F Test
We want to test the null hypothesis that there are no differences
among the means of the populations.
The basic conditions for inference are that we have random
samples from each population and that the population is Normally
distributed.
The alternative hypothesis is that there is some difference. That is,
not all means are equal. This hypothesis is not one-sided or twosided. It is “many-sided.”
This test is called the analysis of variance F test (ANOVA).
13
Conditions for ANOVA
Like all inference procedures, ANOVA is valid only in some
circumstances. The conditions under which we can use ANOVA are:
Conditions for ANOVA Inference
 We have I independent SRSs, one from each population.
We measure the same response variable for each sample.
The ith population has a Normal distribution with unknown
mean µi. One-way ANOVA tests the null hypothesis that all
population means are the same.
 All of the populations have the same standard deviation
, whose value is unknown.
Checking Standard Deviations in ANOVA
 The results of the ANOVA F test are approximately correct
when the largest sample standard deviation is no more than
twice as large as the smallest sample standard deviation.
14
The ANOVA F Statistic
To determine statistical significance, we need a test statistic that we can
calculate:
The ANOVA F Statistic
The analysis of variance F statistic for testing the equality of
several means has this form:
variation among the sample means
F=
variation among individuals in the same sample
• F must be zero or positive
– F is zero only when all sample means are identical
– F gets larger as means move further apart
• Large values of F are evidence against H0: equal means
• The F test is upper-one-sided
15
F Distributions
The F distributions are a family of right-skewed distributions that take
only values greater than 0. A specific F distribution is determined by
the degrees of freedom of the numerator and denominator of the F
statistic.
When describing an F distribution, always give the numerator
degrees of freedom first. Our brief notation will be F(df1, df2) with df1
degrees of freedom in the numerator and df2 degrees of freedom in
the denominator.
Degrees of Freedom for the F Test
We want to compare the means of I populations. We have an SRS of
size n from the ith population, so that the total number of observations
in all samples combined is:
N = n1 + n2 + … + ni
If the null hypothesis that all population means are equal is true, the
ANOVA F statistic has the F distribution with I – 1 degrees of freedom
in the numerator and N – I degrees of freedom in the denominator.
16
The ANOVA F Test
F=

variation among the sample means
variation among individuals in the same sample
The measures of variation in the numerator and denominator are
mean squares:
 Numerator: Mean Square for Groups (MSG)
n1(x1  x )2  n2 (x2  x )2 L  nI (xI  x )2
MSG 
I 1

Denominator: Mean Square for Error (MSE)
(n1 1)s12  (n2 1)s22 L  (nI 1)sI2
MSE 
NI




MSE is also called the pooled sample variance, written as sp2 (sp
is the pooled standard deviation)
sp2 estimates the common variance  2
17
The ANOVA Table
Source of variation
Sum of squares
SS
DF
Mean square
MS
F
P value
F crit
Among or between
“groups”
n
(
x

x
)

I -1
SSG/DFG
MSG/MSE
Tail area
above F
Value of
F for a
Within groups or
“error”
(
n
1
)
s

i
i
N-I
SSE/DFE
Total
SST=SSG+SSE
N–1
2
i i
2
(
x
x
)

2
ij
R2 = SSG/SST
√MSE = sp
Coefficient of determination
Pooled standard deviation
The sum of squares represents variation in the data: SST = SSG + SSE.
The degrees of freedom reflect the ANOVA model: DFT = DFG + DFE.
Data (“Total”) = fit (“Groups”) + residual (“Error”)
18
Example
P-value<.05
significant
differences
Follow-up analysis
There is significant evidence that the three types of vehicle do not all
have the same gas mileage.
From the confidence intervals (and looking at the original data), we
see that SUVs and pickups have similar fuel economy and both are
distinctly poorer than midsize cars.
19
12.2 Comparing the Means
 Contrasts
 Multiple Comparisons
 Power of the F Test*
20
Introduction
You have calculated a p-value for your ANOVA test. Now what?
If you found a significant result, you still need to determine which treatments
were different from which.
 You can gain insight by looking back at your plots (boxplot, mean ±
s).
 There are several tests of statistical significance designed specifically
for multiple tests. You can choose contrasts, or multiple
comparisons.
 You can find the confidence interval for each mean mi shown to be
significantly different from the others.
21
Introduction
 Contrasts can be used only when there are clear expectations BEFORE
starting an experiment, and these are reflected in the experimental
design. Contrasts are planned comparisons.
 Patients are given either drug A, drug B, or a placebo. The three treatments
are not symmetrical. The placebo is meant to provide a baseline against which
the other drugs can be compared.
 Multiple comparisons should be used when there are no justified
expectations. Those are pair-wise tests of significance.
 We compare gas mileage for eight brands of SUVs. We have no prior
knowledge to expect any brand to perform differently from the rest. Pair-wise
comparisons should be performed here, but only if an ANOVA test on all eight
brands reached statistical significance first.
It is NOT appropriate to use a contrast test when suggested
comparisons appear only after the data is collected.
22
Contrasts
When an experiment is designed to test a specific hypothesis that some
treatments are different from other treatments, we can use contrasts to
test for significant differences between these specific treatments.
 Contrasts are more powerful than multiple comparisons because
they are more specific. They are more able to pick up a significant
difference.
 You can use a t test on the contrasts or calculate a t confidence
interval.
 The results are valid regardless of the results of your multiple
sample ANOVA test (you are still testing a valid hypothesis).
23
Contrasts
A contrast is a combination of
population means of the form
y

a

im
i
where the coefficients ai have sum 0.
To test the null hypothesis
H0: y = 0 use the t statistic:
t
cSE
c
with degrees of freedom DFE that is
associated with sp. The alternative
hypothesis can be one- or two-sided.
The corresponding sample contrast is:
c
a

ix
i
The standard error of c is:
A level C confidence interval for
the difference y is:
c

t*
SE
c
2
2
a
a
where t* is the critical value defining
i
i
SE

s

MSE

c
p
the middle C% of the t distribution
n
n
i
i
with DFE degrees of freedom.
24
Contrasts
Contrasts are not always readily available in statistical software
packages (when they are, you need to assign the coefficients “ai”), or
may be limited to comparing each sample to a control.
If your software doesn’t provide an option for contrasts, you can test
your contrast hypothesis with a regular t test using the formulas we just
highlighted. Remember to use the pooled variance and degrees of
freedom as they reflect your better estimate of the population variance.
Then you can look up your p-value in a table of t distribution.
25
Example
Do nematodes affect plant growth? A botanist prepares 16 identical planting
pots and adds different numbers of nematodes into the pots. Seedling
growth (in mm) is recorded two weeks later.
One group contains no nematodes at all. If the botanist planned this group as
a baseline/control, then a contrast of all the nematode groups against the
control would be valid.
26
Example
Contrast of all the nematode groups against the control:
Combined contrast hypotheses:
H0: µ1 = 1/3 (µ2+ µ3 + µ4) vs.
Ha: µ1> 1/3 (µ2+ µ3 + µ4)  one tailed
xi
G1: 0 nematode
10.65
G2: 1,000 nematodes 10.425
G3: 5,000 nematodes 5.6
G4: 10,000 nematodes 5.45
si
2.053
1.486
1.244
1.771
Contrast coefficients: (+1 −1/3 −1/3 −1/3) or (+3 −1 −1 −1)
c

a
x

3
*
10
.
65

10
.
425

5
.
6

5
.
45

10
.
475

i
i
2
2
2


a
3
(

1
)
i


SE

s

2
.
78
*

3
*

2
.
9

c p


n
4 4


i
t

c
SE

10
.
5
2
.
9

3
.
6
df
:
N
12
c
In Excel: TDIST(3.6,12,1) = tdist(t, df, tails) ≈ 0.002 (p-value).
Nematodes result in significantly shorter seedlings (alpha 1%).
27
Example

ANOVA: H0: all µi are equal vs. Ha: not all µi are equal
ANO VA
SeedlingLengt h
Bet ween G r oups
Wit hin G r oups
Tot al

Sum of
Squar es
100. 647
33. 328
133. 974
df
3
12
15
Mean Squar e
33. 549
2. 777
F
12. 080
Sig.
. 001
 not all µi are equal
Planned comparison:
H0: µ1 = 1/3 (µ2+ µ3 + µ4) vs.
Cont r ast Coef f i ci ent s
Ha: µ1> 1/3 (µ2+ µ3 + µ4)  one tailed
Contrast coefficients: (+3 −1 −1 −1)
Cont r as t
Cont r ast
1
0
-3
Nem at odesLevel
1000
5000
1
1
10000
1
Te st s
Valu e of
Con t r as
Con
t t r as St
t d . Er r or
t
See d lin gL e ng t hAs s u m e eq u al v a r ia1n c e s - 10 . 4 75 0 2 . 8 8 6 50 - 3. 6 2 9
Doe s no t a s s u m e 1e qu a l - 10 . 4 75 0 3 . 3 4 8 23 - 3. 1 2 9
v ar ia n c e s
df
Sig .
12
4. 139
Nematodes result in significantly shorter seedlings (alpha 1%).
( 2- t aile d )
. 00 3
. 03 4
28
Multiple Comparisons
Multiple comparison tests are variants on the two-sample t test.
 They use the pooled standard deviation sp = √MSE.
 The pooled degrees of freedom DFE.
 And they compensate for the multiple comparisons.
We compute the t statistic
for all pairs of means:
A given test is significant (µi and µj significantly different), when
|tij| ≥ t** (df = DFE).
The value of t** depends on which procedure you choose to use.
29
The Bonferroni Procedure
The Bonferroni procedure performs a number of pair-wise
comparisons with t tests and then multiplies each p-value by the
number of comparisons made. This ensures that the probability of
making any false rejection among all comparisons made is no greater
than the chosen significance level α.
As a consequence, the higher the number of pair-wise comparisons
you make, the more difficult it will be to show statistical significance
for each test. But the chance of committing a type I error also
increases with the number of tests made. The Bonferroni procedure
lowers the working significance level of each test to compensate for
the increased chance of type I errors among all tests performed.
30
Simultaneous Confidence Intervals
We can also calculate simultaneous level C confidence intervals for
all pair-wise differences (µi − µj) between population means:
11
CI
:
(
x
x
)

t
*
*
s

i
j
p
n
i n
j
spis the pooled variance, MSE.
t** is the t critical value with degrees of freedom DFE = N – I, adjusted
for multiple, simultaneous comparisons (e.g., Bonferroni procedure).
31
Power*
The power, or sensitivity, of a one-way ANOVA is the probability that the
test will be able to detect a difference among the groups (i.e., reach
statistical significance) when there really is a difference.
Estimate the power of your test while designing your experiment to
select sample sizes appropriate to detect an amount of difference
between means that you deem important.
 Too small a sample is a waste of experiment, but too large a
sample is also a waste of resources.
 A power of about 80% is often suggested.
32
Power Computations*
ANOVA power is affected by:
 The significance level a
 The sample sizes and number of groups being compared
 The differences between group means µi
 The guessed population standard deviation
You need to decide what alternative Ha you would consider important,
detect statistically for the means µi, and guess the common standard
deviation σ (from similar studies or preliminary work).
The power computations then require calculating a non-centrality
parameter λ, which follows the F distribution with DFG and DFE
degrees of freedom to arrive at the power of the test.
33
Chapter 12
One-Way Analysis of
Variance
12.1 Inference for One-Way Analysis of Variance
12.2 Comparing the Means
34
Download