Section 13

STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA)
Fall 2013
Introduction
Suppose we wish to compare k population means ($k \ge 2$). This situation can arise in two
ways. If the study is observational, we are obtaining independently drawn samples from
k distinct populations and we wish to compare the population means for some numerical
response of interest. If the study is experimental, then we are using a completely
randomized design to obtain our data from k distinct treatment groups. In a completely
randomized design the experimental units are randomly assigned to one of k treatments
and the response value from each unit is obtained. The mean of the numerical response
of interest is then compared across the different treatment groups.
There are two main questions of interest:
1) Are there at least two population means that differ?
2) If so, which population means differ and how much do they differ by?
More formally:
$H_o: \mu_1 = \mu_2 = \dots = \mu_k$, i.e. all the population means are equal and have a common mean ($\mu$)
$H_a: \mu_i \ne \mu_j$ for some $i \ne j$, i.e. at least two population means differ.
If we reject the null then we use comparative methods to answer question 2 above.
Basic Idea:
The test procedure compares the variation in observations between samples to the
variation within samples. If the variation between samples is large relative to the
variation within samples we are likely to conclude that the population means are not all
equal. The diagrams below illustrate this idea...
Between Group Variation >> Within Group Variation
(Conclude population means differ)
Between Group Variation ≈ Within Group Variation
(Fail to conclude the population means differ)
The analysis of variance gets its name because we are using variation to decide
whether the population means differ.
Computational Details for Equal Sample Size Case ($n_1 = n_2 = \dots = n_k = n$)
Preliminary notation:
$Y_{ij}$ = jth obs. from population i, $i = 1, \dots, k$, $j = 1, \dots, n$
$\bar{Y}_{i\cdot}$ = sample mean for the sample from population i,
\[ \bar{Y}_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} Y_{ij}, \quad i = 1, \dots, k \]
$\bar{Y}_{\cdot\cdot}$ = grand mean of all obs.,
\[ \bar{Y}_{\cdot\cdot} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n} Y_{ij}}{N} = \frac{\sum_{i=1}^{k}\bar{Y}_{i\cdot}}{k} \]
$N = n_1 + n_2 + \dots + n_k$, i.e. N is the total sample size for the experiment
TWO ESTIMATES OF THE COMMON VARIANCE ($\sigma^2$)
Estimate 1: Using Between Group Variation
If the null hypothesis is true, each of the $\bar{Y}_{i\cdot}$'s represents a randomly selected observation
from a normal distribution with mean $\mu$ (the common mean) and std. deviation = $\sigma/\sqrt{n}$,
i.e. the standard error of the mean. In other words, each comes from the sampling distribution $\bar{Y} \sim N(\mu, \sigma/\sqrt{n})$.
Thus if we find the sample standard deviation of the $\bar{Y}_{i\cdot}$'s we get an estimate of $\sigma/\sqrt{n}$, or
in terms of the sample variance we have,
\[ \text{Sample variance of the } \bar{Y}_{i\cdot}\text{'s} = \frac{\sum_{i=1}^{k}(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}{k-1} \]
is an estimate of $\sigma^2/n$.
Therefore if we multiply this sample variance by n,
\[ \frac{n\sum_{i=1}^{k}(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}{k-1} \]
we obtain an estimate of $\sigma^2$, if and only if the null hypothesis is true. However if the
alternative hypothesis is true, this estimate of $\sigma^2$ will be too BIG. This formula will
be large when there is substantial between group variation. This measure of between
group variation is called the Mean Square for Treatments and is denoted $MS_{Treat}$.
Estimate 2: Using Within Group Variation
Another estimate of the common variance ($\sigma^2$) can be found by looking at the variation
of the observations within each of the k treatment groups. By extending the pooled variance from the two population case we have the following:
\[ \text{Pooled estimate of } \sigma^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \dots + (n_k-1)s_k^2}{n_1 + n_2 + \dots + n_k - k} \]
which simplifies to the mean of the k sample variances when the sample sizes are all
equal.
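The simplification can be checked directly by setting every $n_i = n$ in the pooled formula. The short worked step below is an illustration added for clarity, using only the symbols already defined ($s_i^2$ is the sample variance of group i):

```latex
\[
\frac{(n-1)s_1^2 + (n-1)s_2^2 + \dots + (n-1)s_k^2}{nk - k}
  = \frac{(n-1)\sum_{i=1}^{k} s_i^2}{k(n-1)}
  = \frac{1}{k}\sum_{i=1}^{k} s_i^2
\]
```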
Another way to write this is as follows:
\[ \text{Pooled estimate of } \sigma^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i\cdot})^2}{N-k} = \frac{SS_{Error}}{N-k} = MS_{Error} \]
This will be an estimate of the common population variance ($\sigma^2$) regardless of whether
the null hypothesis is true or not. This is called the Mean Square for Error ($MS_{Error}$).
Thus if the null hypothesis is true we have two estimates of the common variance ($\sigma^2$),
namely the Mean Square for Treatments ($MS_{Treat}$) and the Mean Square for Error ($MS_{Error}$).
If $MS_{Treat} \gg MS_{Error}$ we reject $H_o$, i.e. the between group variation is large relative to the
within group variation.
If $MS_{Treat} \approx MS_{Error}$ we fail to reject $H_o$, i.e. the between group variation is NOT large
relative to the within group variation.
Test Statistic
The test statistic compares the mean squares for treatment and error in the form of a ratio,
\[ F = \frac{MS_{Treat}}{MS_{Error}} \sim F_{k-1,\,N-k} \]
i.e. when $H_o$ is true, F follows an F-distribution with numerator df = k − 1 and denominator df = N − k.
Large values for F give small p-values and lead to rejection of the null hypothesis.
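To make the two variance estimates concrete, here is a minimal Python sketch that computes the mean square for treatments, the mean square for error, and their ratio F by hand. The data values and the function name `one_way_anova` are hypothetical, chosen only for illustration; the course itself uses JMP for these calculations.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return (MS_Treat, MS_Error, F) for a list of samples, one list per group."""
    k = len(groups)                       # number of groups
    N = sum(len(g) for g in groups)       # total sample size
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-group variation: SS_Treat = sum of n_i * (group mean - grand mean)^2
    ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: SS_Error = sum of (n_i - 1) * s_i^2
    ss_error = sum((len(g) - 1) * variance(g) for g in groups)
    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / (N - k)
    return ms_treat, ms_error, ms_treat / ms_error

# Hypothetical data: k = 3 groups, n = 4 observations each
groups = [[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]]
ms_treat, ms_error, f_stat = one_way_anova(groups)
print(ms_treat, ms_error, f_stat)
```

For these made-up numbers the third group sits well above the other two, so the between-group estimate is roughly ten times the within-group estimate. Turning F into a p-value additionally requires the upper tail area of the F-distribution with k − 1 and N − k degrees of freedom, which statistical software reports.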
EXAMPLE 13.1 - Weight Gain in Anorexia Patients
Data File: Anorexia.JMP available on the course website
These data give the pre- and post-weights of patients being treated for anorexia
nervosa. There are actually three different treatment plans being used in this
study, and we wish to compare their performance.
The variables in the data file are:
Treatment – Family, Standard, Behavioral
prewt - weight at the beginning of treatment
postwt - weight at the end of the study period
Weight Gain - weight gained (or lost) during treatment (postwt-prewt)
We begin our analysis by examining comparative displays for the weight gained
across the three treatment methods. To do this select Fit Y by X from the Analyze
menu and place the grouping variable, group, in the X box and place the
response, Weight Gain, in the Y box and click OK. Here boxplots, mean
diamonds, normal quantile plots, and comparison circles have been added.
Things to consider from this graphical display:
• Do there appear to be differences in the mean weight gain?
• Are the weight changes normally distributed?
• Is the variation in weight gain equal across therapies?
Checking the Equality of Variance Assumption
To test whether it is reasonable to assume the population variances are equal for these
three therapies select UnEqual Variances from the Oneway Analysis pull-down menu.
Equality of Variance Test
$H_o: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
$H_a$: the population variances are not all equal
We have no evidence to conclude that the variances/standard deviations of the weight
gains for the different treatment programs differ (p >> .05).
ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS
To test the null hypothesis that the mean weight gain is the same for each of the therapy
methods we will perform the standard one-way ANOVA test. To do this in JMP select
Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the
test are shown in the Analysis of Variance box.
$MS_{Treat} = 307.22$
$MS_{Error} = 56.68$
The mean square for treatments is 5.42 times larger than the
mean square for error! This provides strong evidence against
the null hypothesis.

The p-value contained in the ANOVA table is .0065, thus we reject the null hypothesis at
the .05 level and conclude statistically significant differences in the mean weight gain
experienced by patients in the different therapy groups exist.
MULTIPLE COMPARISONS
Because we have concluded that the mean weight gains across treatment method
are not all equal it is natural to ask the secondary question:
Which means are significantly different from one another?
We could consider performing a series of two-sample t-Tests and constructing
confidence intervals for independent samples to compare all possible pairs of
means, however if the number of treatment groups is large we will almost
certainly find two treatment means as being significantly different. Why?
Consider a situation where we have k = 7 different treatments that we wish to
compare. To compare all possible pairs of means (1 vs. 2, 1 vs. 3, …, 6 vs. 7)
would require performing a total of $k(k-1)/2 = 21$ two-sample t-Tests. If we used
$\alpha = P(\text{Type I Error}) = .05$ for each test we expect to make $21(.05) \approx 1$ Type I Error,
i.e. we expect to find one pair of means as being significantly different when in
fact they are not. This problem only becomes worse as the number of groups, k,
gets larger.
Experiment-wise Error Rate (EER)
Another way to think about this is to consider the probability of making no Type
I Errors when making our pair-wise comparisons. When k = 7 for example, the
probability of making no Type I Errors is $(.95)^{21} = .3406$ (treating the 21 tests as independent), i.e. the probability that
we make at least one Type I Error is therefore .6594 or a 65.94% chance.
Certainly this is unacceptable! Why would you conduct a statistical analysis when you
know that you have a 66% chance of making an error in your conclusions? This
probability is called the experiment-wise error rate.
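The arithmetic behind the k = 7 example can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the calculation assumes the pairwise tests are independent, which is an approximation.

```python
from math import comb

k = 7          # number of treatment groups (from the example)
alpha = 0.05   # per-test significance level

m = comb(k, 2)                   # number of pairwise comparisons, k(k-1)/2
expected_errors = m * alpha      # expected number of Type I Errors
p_no_error = (1 - alpha) ** m    # P(no Type I Errors), assuming independent tests
eer = 1 - p_no_error             # P(at least one Type I Error)

print(m, round(expected_errors, 2), round(p_no_error, 4), round(eer, 4))
```

Changing `k` shows how quickly the experiment-wise error rate grows as more groups are compared.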
Bonferroni Correction
There are several different ways to control the experiment-wise error rate (EER).
One of the easiest ways to control the experiment-wise error rate is to use the Bonferroni
Correction. If we plan on making m comparisons or conducting m significance
tests the Bonferroni Correction is to simply use $\alpha/m$ as our significance level
rather than $\alpha$. This simple correction guarantees that our experiment-wise error
rate will be no larger than $\alpha$. This correction implies that our p-values will have
to be less than $\alpha/m$, rather than $\alpha$, to be considered statistically significant.
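As a small illustration of the correction, the Python sketch below compares a set of p-values against the Bonferroni threshold. The function name `bonferroni_significant` and the p-values are hypothetical, not results from any data set in these notes.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni-corrected level alpha/m."""
    m = len(p_values)          # number of comparisons
    threshold = alpha / m      # corrected significance level
    return threshold, [p < threshold for p in p_values]

# Hypothetical p-values from m = 3 pairwise comparisons
p_values = [0.004, 0.030, 0.250]
threshold, flags = bonferroni_significant(p_values)
print(threshold, flags)
```

Here the second p-value (0.030) would be called significant at the unadjusted .05 level but not at the corrected level of .05/3, which is the price paid for controlling the experiment-wise error rate.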
Multiple Comparison Procedures for Pairwise Comparisons of k Pop. Means
When performing pair-wise comparison of population means in ANOVA there
are several different methods that can be employed. These methods depend on
the types of pair-wise comparisons we wish to perform. The different types
available in JMP are summarized briefly below:
• Compare each pair using the usual two-sample t-Test for independent samples. This choice does not provide any experiment-wise error rate protection! (DON’T USE)
• Compare all pairs using Tukey’s Honest Significant Difference (HSD) approach. This is the best choice if you are interested in comparing each possible pair of treatments.
• Compare the means to the “Best” using Hsu’s method. The best mean can either be the minimum (if smaller is better for the response) or maximum (if bigger is better for the response).
• Compare each mean to a control group using Dunnett’s method. This compares each treatment mean to a control group only. You must identify the control group in JMP by clicking on an observation in your comparative plot corresponding to the control group before selecting this option.
Multiple Comparison Options in JMP
EXAMPLE 13.1 - Weight Gain in Anorexia Patients (cont’d)
For these data we are probably interested in comparing each of the treatments to
one another. For this we will use Tukey’s multiple comparison procedure for
comparing all pairs of population means. Select Compare Means from the
Oneway Analysis menu and highlight the All Pairs, Tukey HSD option. Beside
the graph you will now notice there are circles plotted. There is one circle for
each group and each circle is centered at the mean for the corresponding
group. The sizes of the circles are inversely proportional to the sample size, thus
larger circles will be drawn for groups with smaller sample sizes. These circles are
called comparison circles and can be used to see which pairs of means are
significantly different from each other. To do this, click on the circle for one of
the treatments. Notice that the treatment group selected will appear in the
plot window and the circle will become red & bold. The means that are
significantly different from the treatment group selected will have circles that are
gray. These color differences will also be conveyed in the group labels on the
horizontal axis. In the plot below we have selected the Standard treatment group.
The results of the pairwise comparisons are also contained in the output window
(shown below). The matrix labeled Comparison for all pairs using Tukey-Kramer
HSD identifies pairs of means that are significantly different using positive
entries in this matrix. Here we see only treatments 2 and 3 significantly differ. I
generally turn this option off as the other methods for displaying significant
differences are better.
The next table conveys the same information by using different letters to
represent populations that have significantly different means. Notice treatments
2 and 3 are not connected by the same letter so they are significantly different.
Finally the CI’s in the Ordered Differences section give estimates for the
differences in the population means. Here we see that mean weight gain for
patients in treatment 3 is estimated to be between 2.09 lbs. and 13.34 lbs. larger
than the mean weight gain for patients receiving treatment 2 (see highlighted
section below).
Example 13.2: Study of Diet and Growth of Western Rock Lobsters
The western rock lobster is recognized as a valued fishery estimated at 300 million
dollars a year. Therefore, recent studies have focused on protecting this commodity and
increasing the production. In one particular study, a large sample of juvenile lobsters
was collected at various locations and held in a tank at the Fisheries Marine Research
Lab for two months as an acclimation period. During this period, the lobsters were fed a
diet of mussels and prawn pellets. Subsequently, juvenile lobsters were fed for nine
weeks on either the fresh mussel diet (D1) or one of four artificial diets: two in semi-moist form (D2 and D3) and two in dry-pelleted form (D4 and D5). At the end of the
nine weeks, 10 animals were sampled at random from each diet treatment, and their
Muscle Somatic Index (MSI) was measured. The MSI is the weight of wet tail muscle
expressed as a percentage of the weight of the whole animal, wet.
Question(s) of Interest: Is there evidence that lobsters perform as well on the artificial
diets as they do on the natural diet?
Source: Tsvetneko, E., J. Brown, B.D. Glencross, and L.H. Evans. 1999. Measures of condition in
dietary studies on western rock lobster post-pueruli. Proceedings, International Symposium on
Lobster Health Management, Adelaide, September 1999.
Step 0: Check the assumptions to be sure that the ANOVA is valid.
• Make sure that the groups are independent.
• Check the normality assumption.
• Check whether we have any evidence that the group variances differ (the ANOVA assumes equal variances).
Step 1: Convert the research question into Ho and Ha.
H0:
Ha:
Step 2: Find the test statistic and p-value from your data.
In JMP, select Means/Anova.
If you question the assumption of equal variances:
If you question the assumption of normality:
Step 3: Write a conclusion in the context of the problem.
Multiple Comparisons for the Western Rock Lobster data:
The researchers regarded the fresh mussel diet (D1) as a control treatment. Therefore,
we should use Dunnett’s test to compare the treatments.
Using Dunnett’s test in JMP: Select Compare Means > With Control, Dunnett’s.
First, you need to identify which group is the control group. Click on D1, the control
diet.
Questions:
1. Where do the significant differences exist?
2. Do the artificial diets appear to promote or retard body condition (based on the
muscle somatic index)?
3. Which, if any, of the artificial diets would you recommend?
EXAMPLE 13.3 – Butter Fat Content in Cow Milk
Data File: Butterfat-cows.JMP
This data set gives the butter fat content of milk from five breeds of dairy cows.
The variables in the data file are:
• Breed – Ayshire, Canadian, Guernsey, Holstein-Fresian, Jersey
• Age Group – two different age classes of cows (1 = younger, 2 = older)
• Butterfat - % butter fat found in the milk sample
We begin our analysis by examining comparative displays for the butter fat
content across the five breeds. To do this select Fit Y by X from the Analyze
menu and place the grouping variable, Breed, in the X box and place the
response, Butterfat, in the Y box and click OK. The resulting plot simply shows a
scatter plot of butter fat content versus breed. In many cases there will be numerous
data points on top of each other in such a display, making the plot harder to read.
To help alleviate this problem we can stagger or jitter the points a bit from
vertical by selecting Jitter Points from the Display Options menu. To help
visualize breed differences we could add quantile boxplots, mean diamonds, or
means with standard error bars to the graph. To do any or all of these select the
appropriate options from the Display Options menu. The display below has
quantile boxplots, sample means with error bars, and normal quantile plots
added.
We can clearly see the butter fat content is largest for Jersey and Guernsey cows
and lowest for Holstein-Fresian. There also appears to be some difference in the
variation of butter fat content. The summary statistics below
confirm our findings based on the graph above. To obtain these summary
statistics select the Quantiles and Means, Std Dev, Std Err options from the
Oneway Analysis menu at the top of the window.
Summary Statistics for Butter Fat Content by Breed
To test the null hypothesis that the butter fat content is the same for each of the
breeds we can perform a one-way ANOVA test.
$H_o: \mu_{Ayshire} = \mu_{Canadian} = \dots = \mu_{Jersey}$
H a : at least two breeds have different mean percent butter fat content in their milk
Again the assumptions for one-way ANOVA are as follows:
1.) The samples are drawn independently or come from using a completely
randomized design. Here we can assume the cows sampled from each
breed were independently sampled.
2.) The variable of interest is normally distributed for each population.
3.) The population variances are equal across groups. Here this means:
$\sigma^2_{Ayshire} = \sigma^2_{Canadian} = \dots = \sigma^2_{Jersey}$
If this assumption is violated we can use Welch’s ANOVA, which allows
for inequality of the population variances or we can transform the
response (see below).
To check these assumptions in JMP select UnEqual Variances and Normal
Quantile Plot > Actual by Quantile (the 1st option). To conduct the one-way
ANOVA test select Means, Anova/t-Test from the Oneway Analysis pull-down
menu.
The graphical results and the results of the test are shown below.
Butter Fat Content by Breed
Normality appears to be satisfied
The equality of variance test results are shown below.
We have strong evidence against
the equality of the population
variances/standard deviations.
(Use Welch’s ANOVA)
We conclude at least two means differ.
Because we have strong evidence against equality of the population variances we
could use Welch’s ANOVA to test the equality of the means. This test allows for
the population variances/standard deviations to differ when comparing the
population means. For Welch’s ANOVA we see that the p-value < .0001,
therefore we have strong evidence against the equality of the means and we
conclude these dairy breeds differ in the mean butter fat content of their milk.
Variance Stabilizing Transformations
Another approach that is often used when we encounter situations where
there is evidence against the equality of variance assumption is to transform the
response. Often we find that the variation appears to increase as the mean
increases. For these data we see that this is the case: the estimated standard
deviations increase as the estimated means increase. To correct this we generally
consider taking the log of the response. Sometimes the square root and
reciprocal are used but these transformations do not allow for any meaningful
interpretation in the original scale, whereas the log transform does.
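A small Python sketch can illustrate why the log helps when the standard deviation grows in proportion to the mean. The data are hypothetical and deliberately idealized: the second group is just the first group multiplied by 3, so both its mean and its spread are exactly three times larger.

```python
from math import log
from statistics import stdev

# Hypothetical data: group_b is group_a scaled up by a factor of 3,
# so its mean and its standard deviation are both three times larger.
group_a = [4.0, 5.0, 6.0, 5.5, 4.5]
group_b = [3 * y for y in group_a]

# Ratio of standard deviations on the original scale
sd_ratio_raw = stdev(group_b) / stdev(group_a)

# Ratio of standard deviations after taking natural logs
sd_ratio_log = stdev([log(y) for y in group_b]) / stdev([log(y) for y in group_a])

print(sd_ratio_raw, sd_ratio_log)
```

On the original scale the standard deviations differ by a factor of 3, while on the log scale they agree, because log(3y) = log(3) + log(y) only shifts the values. Real data will not be this tidy, so the equality of variance check should still be repeated after transforming.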
As we can see, as the sample means for the
groups increase so do the sample standard
deviations. To stabilize the variances/standard
deviations we can try a log transformation.
The results of the analysis using the log response are shown below.
Analysis of the log(Percent Butter Fat) Content for the Five Dairy Cow Breeds
Normality is still satisfied, there is no evidence against
the equality of the variances, and we have strong
evidence that at least two means differ.
Because we have concluded that the mean/median log butter fat content differs
across breed (p < .0001), it is natural to ask the secondary question: Which breeds
have typical values that are significantly different from one another? To do this
we again use multiple comparison procedures. In JMP select Compare Means
from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD
option. The results of Tukey’s HSD are shown below.
Below is the plot showing the results when clicking on the circle for Guernsey’s.
We can see that the mean log-Butterfat Content for Guernsey’s and Jersey’s do
not significantly differ from each other, but they both differ significantly from the
other three breeds.
Here we see only Guernsey and Jersey do not significantly differ. The table
using letters conveys the same information, using different letters to represent
populations that have significantly different means. The CI’s in the Ordered
Differences section give estimates for the differences in the population
means/medians in the log scale. As was the case with using the log
transformation with two populations, we interpret the results in the original
scale in terms of ratios of medians. For example, back transforming the interval
comparing Jersey to Holstein-Fresian, we estimate that the median butter fat
content found in Jersey milk is between 1.33 and 1.55 times larger than the
median butter fat content in Holstein-Fresian milk. We could also state this in
terms of percentages as follows: we estimate that the typical butter fat content
found in Jersey milk is between 33% and 55% higher than the butter fat content
found in Holstein-Fresian milk.
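The back-transformation step can be sketched in Python as follows. The log-scale endpoints used here are hypothetical: they are simply the natural logs of the ratios 1.33 and 1.55 quoted above, chosen so the example reproduces those numbers, and are not copied from the JMP output. The sketch also assumes the natural log was the transformation applied.

```python
from math import exp, log

def back_transform(log_lower, log_upper):
    """Convert a CI for a difference of log-scale means into a CI for a ratio of medians."""
    return exp(log_lower), exp(log_upper)

# Hypothetical log-scale endpoints, chosen as log(1.33) and log(1.55)
# so that they reproduce the ratios quoted in the text; they are not JMP output.
log_lower, log_upper = log(1.33), log(1.55)

ratio_lower, ratio_upper = back_transform(log_lower, log_upper)
pct_lower = (ratio_lower - 1) * 100   # percent higher, lower bound
pct_upper = (ratio_upper - 1) * 100   # percent higher, upper bound

print(round(ratio_lower, 2), round(ratio_upper, 2), round(pct_lower), round(pct_upper))
```

Subtracting 1 from each ratio and multiplying by 100 gives the percentage statement, which is often easier for a general audience to read.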
ONE-WAY ANOVA MODEL (FYI only)
In statistics we often use models to represent the mechanism that generates the observed
response values. Before we introduce the one-way ANOVA model we first must
introduce some notation.
Let,
$n_i$ = sample size from group i, ($i = 1, \dots, k$)
$Y_{ij}$ = the jth response value for the ith group, ($j = 1, \dots, n_i$)
$\mu$ = mean common to all groups assuming the null hypothesis is true
$\tau_i$ = shift in the mean due to the fact the observation came from group i
$\varepsilon_{ij}$ = random error for the jth observation from the ith group.
Note: The test procedure we use requires that the random errors are normally distributed
with mean 0 and variance $\sigma^2$. Equivalently the test procedure requires that the response
is normally distributed with a common standard deviation $\sigma$ for all k groups.
One-way ANOVA Model:
\[ Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \dots, k \text{ and } j = 1, \dots, n_i \]
The null hypothesis is equivalent to saying that the $\tau_i$ are all 0, and the alternative says that
at least one $\tau_i \ne 0$, i.e. $H_o: \tau_i = 0$ for all i vs. $H_a: \tau_i \ne 0$ for some i. We must decide
on the basis of the data whether we have evidence against the null. We display this graphically
as follows:
Estimates of the Model Parameters:
$\hat{\mu} = \bar{Y}_{\cdot\cdot}$ ~ the grand mean
$\hat{\tau}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ ~ the estimated ith treatment effect
$\hat{Y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{Y}_{i\cdot}$ ~ these are called the fitted values.
Note:
\[ Y_{ij} = \hat{\mu} + \hat{\tau}_i + \hat{\varepsilon}_{ij} \]
\[ Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot}) \]
$\hat{\varepsilon}_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$ ~ these are called the residuals.
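The parameter estimates can be computed directly from grouped data. The Python sketch below uses hypothetical observations and illustrative variable names (`mu_hat`, `tau_hat`, `fitted`, `residuals`) to show the grand mean, the estimated treatment effects, the fitted values, and the residuals.

```python
from statistics import mean

# Hypothetical data: k = 3 groups with unequal sample sizes
groups = {"A": [10.0, 12.0, 14.0], "B": [20.0, 22.0], "C": [15.0, 17.0, 19.0, 21.0]}

all_obs = [y for ys in groups.values() for y in ys]

mu_hat = mean(all_obs)                                         # grand mean
tau_hat = {g: mean(ys) - mu_hat for g, ys in groups.items()}   # estimated treatment effects
fitted = {g: mu_hat + tau_hat[g] for g in groups}              # fitted value = group mean
residuals = {g: [y - fitted[g] for y in ys] for g, ys in groups.items()}

print(mu_hat, tau_hat, fitted, residuals)
```

Two features are worth checking in the output: the fitted value for every observation in a group is that group's sample mean, and the residuals within each group sum to zero.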
An Alternative Derivation of the Test Statistic Based on the Model
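Starting with
\[ Y_{ij} = \hat{\mu} + \hat{\tau}_i + \hat{\varepsilon}_{ij} \]
\[ Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot}) \]
we obtain
\[ (Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot}) \]
After squaring both sides we have,
\[ (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + (Y_{ij} - \bar{Y}_{i\cdot})^2 + 2(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot}) \]
Finally, after summing over all of the observations and simplifying (the cross-product term sums to zero because the deviations $Y_{ij} - \bar{Y}_{i\cdot}$ sum to zero within each group), we obtain
\[ \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i\cdot})^2 \]
These measures of variation are called “Sum of Squares” and are denoted SS. The above
expression can be written
\[ SS_{Total} = SS_{Treat} + SS_{Error} \]
The degrees of freedom associated with each of these SS’s have the same relationship
and are given below:
df total = df treatment + df error
(N – 1) = (k – 1) + (N – k)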
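The sum of squares decomposition and the matching degrees of freedom can be verified numerically. The following Python sketch does so for a small hypothetical data set; the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical data: k = 3 groups with unequal sample sizes
groups = [[10.0, 12.0, 14.0], [20.0, 22.0], [15.0, 17.0, 19.0, 21.0]]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = mean([y for g in groups for y in g])

# Total variation of every observation about the grand mean
ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
# Between-group variation, weighting each group mean by its sample size
ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group variation of each observation about its own group mean
ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

df_total, df_treat, df_error = N - 1, k - 1, N - k

print(ss_total, ss_treat + ss_error)
print(df_total, df_treat + df_error)
```

Up to floating-point rounding, the two numbers printed on each line agree, which is exactly the identity stated above.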
The mean squares (i.e. variance estimates) discussed above are found by taking the sum
of squares and dividing by their associated degrees of freedom, i.e.
\[ MS_{Treat} = \frac{SS_{Treat}}{k-1} = \frac{\sum_{i=1}^{k} n_i \hat{\tau}_i^2}{k-1} \]
which is an estimate of the common pop. variance ($\sigma^2$) only if Ho is true
and
\[ MS_{Error} = \frac{SS_{Error}}{N-k} = \hat{\sigma}^2 \]
which is an estimate of the common population variance $\sigma^2$ regardless of Ho.
Thus when testing
$H_o: \tau_i = 0$ for all i.
$H_a: \tau_i \ne 0$ for some i.
we reject Ho when MSTreat >> MSError , i.e. when the estimated treatment effects are
large!