ANOVA
In the last set of notes we looked at the pooled t-test, which is used to compare two group means
assuming they have equal variances. Next, we will explore a procedure called Analysis of Variance
(ANOVA). The biggest advantage of ANOVA is that it allows you to compare as many group means as
you want, whereas the pooled t-test limits you to only two. The ANOVA tests the following hypotheses:
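H0: $\mu_1 = \mu_2 = \cdots = \mu_t$ (all t group population means are equal)
Ha: at least one group population mean differs from the others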
The test helps to determine if the observed differences among the sample means could have happened
by chance if the null hypothesis were true. Just like the pooled t-test, there are certain assumptions that
must be met in order for the ANOVA to be valid. We will discuss these in more detail later.
Assumptions:
1. The groups have the ______________ variances (standard deviations).
2. The response variables for each group are ___________________ distributed.
3. The observations were either __________________ selected from each group’s
population, or in the case of designed experiments were randomly assigned to each
of the groups.
Idea behind ANOVA
The ANOVA is a statistical method used to compare group (or treatment) ___________. So, why is it
called analysis of variance? It’s because the ANOVA test compares two types of variability. Consider the
following figures A, B and C. Each shows data distributions for two groups, where the length of the red
box corresponds to the variability within each group and the green line in the middle of the box
represents the mean of each group.
[Figure A]   [Figure B]   [Figure C]
Questions:
1. What do you notice about the difference between the sample means in Figure A as compared to
the difference between sample means in Figure B?
2. What do you notice about the variability within the groups in Figure A as compared to Figure B?
3. Which figure do you think provides more evidence that the group population means differ from
each other: Figure A or Figure B? Explain your reasoning.
4. Which figure do you think provides more evidence that the group population means differ from
each other: Figure B or Figure C? Explain your reasoning.
Your intuition should have led you in this direction:

The data in Figure _______ provide stronger evidence that the group means differ than do the
data in Figure _______ because the variability within each sample is smaller, even though the
variability between groups is the same (i.e., the difference between the group means is the same).

The data in Figure _______ provide stronger evidence that the group means differ than do the data
in Figure _______ because the variability between the group means is larger in this case.
As you can see, both types of variability (within group and between groups) play a role in determining
whether we have evidence the group means differ. In general, the ANOVA works this way: the larger
the variability _________________ groups relative to the variability _______________ groups, the more
evidence we have that the group means differ. The test statistic we will use is the F-statistic and is
calculated as follows:
$$F = \frac{\text{Estimate of variability between groups}}{\text{Estimate of variability within groups}}$$
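As a rough illustration of this ratio, the Python sketch below computes it for two made-up data sets that share the same group means but differ in how spread out each group is. The function name `f_ratio` and both data sets are hypothetical (they are not part of these notes), and the divisors used for the two estimates are the degrees of freedom introduced later in the notes.

```python
from statistics import mean

def f_ratio(groups):
    """Between-group variability estimate divided by within-group estimate."""
    all_obs = [y for g in groups for y in g]
    overall = mean(all_obs)
    n_total, t = len(all_obs), len(groups)
    # Between groups: squared distance of each group mean from the overall
    # mean, counted once per observation, divided by t - 1.
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups) / (t - 1)
    # Within groups: squared distance of each observation from its own
    # group mean, divided by N - t.
    within = sum((y - mean(g)) ** 2 for g in groups for y in g) / (n_total - t)
    return between / within

# Hypothetical data: both sets have group means of 10 and 20.
tight = [[9, 10, 11], [19, 20, 21]]    # little spread within each group
spread = [[4, 10, 16], [14, 20, 26]]   # much more spread within each group

print("F (tight groups): ", f_ratio(tight))
print("F (spread groups):", f_ratio(spread))
```

Comparing the two printed ratios shows how the within-group spread alone changes F when the group means stay fixed.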
Questions:
5. What does it mean if the F-statistic is small?
6. What does it mean if the F-statistic is large?
Example: Consider the example dealing with the coagulation rate of rabbits from the last set of notes.
Recall, an experiment was conducted to investigate how two diets (A and B) affect coagulation rate. The
data is given below.
Diet    Coagulation Rate (Time in Seconds)    Mean
A       68, 72                                70
B       56, 60                                58
Estimating the variability WITHIN groups
To calculate this quantity, we will compare each observation to its group mean. The distance between
each observation and its group mean is the __________ of interest.
Diet A average = __________
Diet B average = __________
Diet    Replicate    Time    Group Mean    Error     Error²
A       1            68      ______        ______    ______
A       2            72      ______        ______    ______
B       1            56      ______        ______    ______
B       2            60      ______        ______    ______
To measure the variability within groups, we consider the sum of the squared error terms. We call this sum
_____, or the Error Sum of Squares.
SSE =
To get our estimate of variability within groups, we divide the SSE by the degrees of freedom (df) for
error. Let N = the total number of observations and t = the number of groups.
df for error = N – t =
When we divide the SSE by the df for error we call this Mean Square Error (_____).
MSE =
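One way to check the table and the quantities above is the short Python sketch below. It uses the diet data from the example; the variable names are our own choices for illustration, not anything produced by JMP.

```python
from statistics import mean

# Coagulation times (seconds) from the example table.
data = {"A": [68, 72], "B": [56, 60]}

# Error = observation minus its own group mean; SSE = sum of squared errors.
sse = sum((y - mean(times)) ** 2 for times in data.values() for y in times)

n_total = sum(len(times) for times in data.values())  # N, total observations
t = len(data)                                          # t, number of groups
df_error = n_total - t
mse = sse / df_error

print("SSE =", sse)
print("df for error =", df_error)
print("MSE =", mse)
```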
We can also think of this another way….
Because the sample sizes for both diets are equal in this case, the estimate of the variability within
groups (the MSE) is simply the average of the two variances:
$\text{MSE} = \frac{s_1^2 + s_2^2}{2}$ = __________
Note that if the sample sizes are different, the estimate of the variability within groups would be a
weighted average of the sample variances. In general,
$$\text{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_t - 1)s_t^2}{n_1 + n_2 + \cdots + n_t - t}$$
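The weighted-average view of the MSE can be sketched in Python as follows. The helper name `pooled_mse` is hypothetical, and the unequal-sample-size case uses a made-up third observation for diet B (it is not part of the study data) purely to show the weights at work.

```python
from statistics import variance

def pooled_mse(groups):
    """Weighted average of the sample variances, with weights n_i - 1."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) for g in groups) - len(groups)  # N - t
    return numerator / denominator

diet_a, diet_b = [68, 72], [56, 60]

# Equal sample sizes: the pooled value matches the simple average of variances.
equal_n = pooled_mse([diet_a, diet_b])
simple_average = (variance(diet_a) + variance(diet_b)) / 2

# Hypothetical extra observation (61) makes the sample sizes unequal, so the
# larger group's variance receives more weight.
unequal_n = pooled_mse([diet_a, [56, 60, 61]])

print("Pooled MSE, equal n:      ", equal_n)
print("Simple average of s^2:    ", simple_average)
print("Pooled MSE, unequal n:    ", unequal_n)
```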
Estimating the variability BETWEEN groups
To calculate this quantity, we will compare each observation’s _______________ mean to the overall
mean. So, the distance between each observation’s group mean and the overall mean is the deviation
of interest when calculating the variability between groups.
Diet A average = __________
Diet B average = __________
Overall Average = __________
Diet    Replicate    Time    Group Mean    Overall Mean    Deviance²
A       1            68      ______        ______          ______
A       2            72      ______        ______          ______
B       1            56      ______        ______          ______
B       2            60      ______        ______          ______
To measure the variability between groups, we consider the sum of the squared deviances from the
above table. We call this __________, or the Treatment Sum of Squares.
SSTrt =
Once again, to get our estimate of the variability between groups, we divide SSTrt by the df for
treatment.
df for treatment = t – 1 =
Finally, when we divide SSTrt by the df for treatment we call this the Mean Square for Treatment
(MSTrt).
MSTrt =
We can also think of this another way…
To measure the variation between groups, we consider the following quantity:
$$\text{MSTrt} = \frac{n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_t(\bar{y}_t - \bar{y})^2}{t - 1}$$
where $\bar{y}_i$ represents the mean of the ith treatment group and $\bar{y}$ represents the overall mean.
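The two ways of thinking about the between-group variability can be compared with a short Python sketch. It uses the diet data from the example; the variable names are our own and are only for illustration.

```python
from statistics import mean

# Coagulation times (seconds) from the example table.
data = {"A": [68, 72], "B": [56, 60]}
all_times = [y for times in data.values() for y in times]
overall_mean = mean(all_times)

# Table approach: one squared deviation (group mean - overall mean)
# for every observation in the data set.
ss_trt_table = sum((mean(times) - overall_mean) ** 2
                   for times in data.values() for _ in times)

# Formula approach: n_i * (group mean - overall mean)^2, one term per group.
ss_trt_formula = sum(len(times) * (mean(times) - overall_mean) ** 2
                     for times in data.values())

df_trt = len(data) - 1          # t - 1
ms_trt = ss_trt_formula / df_trt

print("SSTrt (table)   =", ss_trt_table)
print("SSTrt (formula) =", ss_trt_formula)
print("df for treatment =", df_trt)
print("MSTrt =", ms_trt)
```

Both approaches give the same sum of squares because each group mean's squared deviation is simply counted once for every observation in that group.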
Calculating the F-statistic and p-value
As mentioned earlier, the F-statistic is calculated as follows:
$$F = \frac{\text{Estimate of variability between groups}}{\text{Estimate of variability within groups}}$$
When the null hypothesis (the group/treatment means are equal) is true, this test statistic follows the F-distribution with numerator df = __________ and denominator df = __________.
As mentioned earlier, the larger the variability between groups relative to the variability within groups,
the more evidence we have that the group means are different. So, the larger the F-statistic, the more
evidence we have __________________ the null hypothesis.
Recall, the p-value is the probability of obtaining a test statistic at least as extreme as our observed
result, given the null hypothesis is true. To find the p-value associated with the F-statistic, we find the
probability of obtaining our F-statistic or one that is even larger.
We can get the p-value from JMP.
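For readers who want to check the software output by hand, here is a minimal Python sketch for the diet example. It relies on a shortcut that only applies to this example: with two groups of two observations the F-statistic has 1 and 2 degrees of freedom, it equals the square of a t-statistic with 2 df, and that t-distribution has a closed-form tail probability. The function name `f_tail_df_1_2` is hypothetical, and the shortcut should not be used for other degrees of freedom, where statistical software is the practical route.

```python
from math import sqrt
from statistics import mean

# Coagulation times (seconds) from the example table.
data = {"A": [68, 72], "B": [56, 60]}
all_times = [y for times in data.values() for y in times]
overall_mean = mean(all_times)
n_total, t = len(all_times), len(data)

# Mean squares, computed as described in the sections above.
ms_trt = sum(len(g) * (mean(g) - overall_mean) ** 2
             for g in data.values()) / (t - 1)
mse = sum((y - mean(g)) ** 2
          for g in data.values() for y in g) / (n_total - t)

f_stat = ms_trt / mse

def f_tail_df_1_2(f):
    """P(F >= f) for an F-distribution with exactly 1 and 2 df.

    Special case only: uses the closed-form CDF of the t-distribution
    with 2 df, since F(1, 2) is the square of such a t variable.
    """
    return 1 - sqrt(f / (f + 2))

# Guard: the shortcut is valid only for this example's degrees of freedom.
assert (t - 1, n_total - t) == (1, 2)
p_value = f_tail_df_1_2(f_stat)

print("F =", f_stat)
print("p-value =", p_value)
```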
Carrying out the ANOVA in JMP.
First, you’ll want to enter the data in the following manner:
Next, choose Analyze > Fit Model. Then enter the following:
Click Run. The output appears as follows:
Questions:
7. Find the sums of squares in the output:
   SSTrt =
   SSE =
8. Find the degrees of freedom in the output:
   df Error =
   df Treatment =
9. Find the mean squares in the output:
   MSTrt =
   MSE =
10. Find the F-statistic in the output:
11. Find the p-value in the output:
12. What do we conclude from this study?