Simple ANalysis Of Variance (ANOVA)

• Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples.
• In general, ANOVA procedures are generalizations of the t-test: it can be shown that, if one is only interested in the difference between two groups on one independent categorical (i.e., grouping) variable, the independent samples t-test is a special case of ANOVA.
• A one-way ANOVA refers to having only one grouping variable or factor, which is the independent variable. It is possible to have more than one grouping variable, but we will start with the simplest case.
• If one has only two levels of the grouping variable then one can simply conduct an independent samples t-test, but if one has more than two levels of the grouping variable then one needs to conduct an ANOVA.
• Since we have more than two groups in ANOVA, we need a way to describe the difference between all the means. One way to do this is to compute the variance between the sample means: a large variance implies that the sample means differ a lot, whereas a small variance implies that the sample means are not that different. This gives us a single numeric value for the difference between all the sample means.
• The statistic used in ANOVA partitions the variance into two components: (1) the between-treatment[1] variability and (2) the within-treatment variability.
• Whenever means from different samples are compared, there are three sources that can cause differences to be observed between the sample means:
1. Differences due to Treatment
2. Individual Differences
3. Differences due to Experimental Error
These are the three sources of variability that can cause one to observe differences between treatment groups, and so these sources of variability are referred to as the between-treatment variability.
Only two of these sources of variability can be observed within a treatment group, specifically individual differences and experimental error, and these are referred to as the within-treatment variability[2].
• The statistic used in ANOVA, the F-statistic, uses a ratio of between-treatment variability to within-treatment variability to test whether or not there is a difference among treatments. Specifically:

F = between-treatment variability / within-treatment variability
  = (treatment effect + individual differences + experimental error) / (individual differences + experimental error)

[1] Note that groups do not always represent treatments. Oftentimes ANOVA is used to determine differences in intact groups such as those that differ by ethnicity or gender.
[2] It should be noted that your book, and many statistical software packages, refer to the within-treatment variability as the error variability.
• If the treatment effect is small, then the ratio will be close to one. Therefore, an F-statistic close to one would be expected if the null hypothesis were true and there were no treatment differences.
• If the treatment effect is large, then the ratio will be much greater than one, because the between-treatment variability will be much larger than the within-treatment variability.
• The hypotheses tested in ANOVA are:
H0: µ1 = µ2 = µ3 = ... = µK
H1: at least one mean is different from the rest
where K = the total number of groups or sample means being compared
• In the population, group j has mean µj and variance σj². In the sample, group j has mean X̄j and variance sj². The sample size for each group j is nj, and the total number of observations is N = n1 + n2 + n3 + … + nK. The grand mean of all observations is X̄.
• The assumptions underlying the test are the same as the assumptions underlying the t-test for independent samples. Specifically,
1. Each group j in the population is normally distributed with mean µj
2. The variance in each group is the same, so that σ1² = σ2² = ⋯ = σK² = σ², otherwise known as the homogeneity of variance assumption.
3. Each observation is independent of every other observation
• The computations underlying a simple one-way ANOVA are pretty straightforward if you remember that a variance is composed of two parts: (1) the sum of squared deviations from the mean (SS) and (2) the degrees of freedom (df), which can be thought of as the number of potentially different values that are used to compute the SS, minus 1.
• Therefore the total variance, across all groups, is computed using SStotal = Σ(X − X̄)² and dftotal = N − 1. We partition this variance into two parts, the within-treatment variance and the between-treatment variance. Note that the total variance is simply the sum of the within-treatment variance and the between-treatment variance, and the df for the total variance is simply the sum of the df associated with the within-treatment variance and the between-treatment variance.
• The within-treatment or within-group variance is computed using SSwithin = SSerror = Σ(X − X̄j)², which represents the sum of the squared deviations of each observation from its group mean, and dfwithin = dferror = (n1 − 1) + (n2 − 1) + (n3 − 1) + … + (nK − 1) = (total number of observations) − (number of groups) = N − K. The ratio of SSwithin to dfwithin is known as the Mean Square within groups (MSwithin) or Mean Square Error (MSerror).
• The between-treatment variability is computed using SSbetween = SStreatment = Σ nj(X̄j − X̄)², which represents the sum of the squared deviations of all group means from the grand (overall) mean, weighted by group size, and dfbetween = dftreatment = K − 1, or the number of groups minus one. The ratio of SSbetween to dfbetween is known as the Mean Square between groups (MSbetween) or Mean Square Treatment (MStreatment).
• The F-statistic is calculated by computing the ratio of Mean Square between groups (MSbetween or MStreatment) to Mean Square within groups (MSwithin or MSerror). Specifically,

F = MSbetween / MSwithin

• This ratio follows a sampling distribution known as the F distribution, which is a family of distributions indexed by the df of the numerator and the df of the denominator.
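The computations above translate directly into code. Below is a minimal sketch in Python with NumPy (which this handout does not itself use) of the SS/df partition and the resulting F-statistic; the function name one_way_anova is our own invention, not a library routine.

```python
import numpy as np

def one_way_anova(groups):
    """One-way ANOVA from the SS/df partition described above.

    groups: list of 1-D NumPy arrays, one per treatment group.
    Returns (F, df_between, df_within).
    """
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    N, K = len(all_obs), len(groups)

    # Between-treatment: squared deviations of each group mean from
    # the grand mean, weighted by group size, with df = K - 1.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    df_between = K - 1

    # Within-treatment: squared deviations of each observation from
    # its own group mean, with df = N - K.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_within = N - K

    # Sanity check on the partition: SS_total = SS_between + SS_within.
    ss_total = ((all_obs - grand_mean) ** 2).sum()
    assert np.isclose(ss_total, ss_between + ss_within)

    # F is the ratio of the two mean squares.
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# e.g. one_way_anova([np.array([1., 2.]), np.array([2., 3.]), np.array([4., 6.])])
```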
Example
A psychologist is interested in determining the extent to which physical attractiveness may influence a person's judgment of other personal characteristics, such as intelligence or ability. So he selects three groups of subjects and asks them to pretend to be a company personnel manager, and he gives them all a stack of identical job applications that include pictures of the applicants. One group of subjects is given only pictures of very attractive people, another group is given only pictures of average looking people, and a third group is given only pictures of unattractive people. Subjects are asked to rate the quality of each applicant on a scale of 0 (which represents very poor qualities) to 10 (which represents excellent qualities). The following data are obtained:
Attractive:   5  4  4  3  5  6  4  3  8  3  5
Average:      6  5  3  6  6  7  5  4  6  8  7  8
Unattractive: 4  3  1  3  1  2  2  4  3  2  1
What should he conclude?
Well, we first need to calculate the grand mean and the means for each of the three groups:

X̄ = (5 + 4 + 4 + 6 + 5 + 3 + 4 + ... + 1) / 34 = 4.32

X̄1 = (5 + 4 + 4 + 3 + ... + 5) / 11 = 4.55

X̄2 = (6 + 5 + 3 + 6 + ... + 8) / 12 = 5.92

X̄3 = (4 + 3 + 1 + ... + 1) / 11 = 2.36
Now we can calculate[3]

MSwithin = Σ(X − X̄j)² / (N − K)
= [(5 − 4.55)² + (4 − 4.55)² + ... + (6 − 5.92)² + (5 − 5.92)² + ... + (4 − 2.36)² + ... + (1 − 2.36)²] / (34 − 3) = 1.94

and

MSbetween = Σ nj(X̄j − X̄)² / (K − 1)
= [11(4.55 − 4.32)² + 12(5.92 − 4.32)² + 11(2.36 − 4.32)²] / (3 − 1) = 73.25 / 2 = 36.63

[3] Note: Answers obtained by hand, from Excel, or from a statistical software package will most likely all vary slightly due to rounding error; here SSbetween = 73.25 carries the group means at full precision, so plugging in the rounded means shown above will not reproduce it exactly.
So the F-statistic = 36.63 / 1.94 = 18.88, but how likely is it to have obtained this value if the null hypothesis is true?
With 2 and 31 df the critical F, at α = .05, is approximately 3.32.
So the psychologist can reject the null hypothesis and conclude that a person's judgment of the job qualifications of prospective applicants appears to be influenced by how attractive the prospective applicant is.
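As a check, the example can be run through SciPy; a sketch, assuming SciPy is available (f_oneway and the F critical value via f.ppf are standard SciPy calls), using the data from the table above:

```python
from scipy import stats

attractive   = [5, 4, 4, 3, 5, 6, 4, 3, 8, 3, 5]
average      = [6, 5, 3, 6, 6, 7, 5, 4, 6, 8, 7, 8]
unattractive = [4, 3, 1, 3, 1, 2, 2, 4, 3, 2, 1]

# One-way ANOVA: returns the F-statistic and its p-value.
F, p = stats.f_oneway(attractive, average, unattractive)
print(F, p)  # F is approximately 18.9; p is far below .05

# Critical F at alpha = .05 with 2 and 31 df.
f_crit = stats.f.ppf(0.95, dfn=2, dfd=31)
print(f_crit)  # approximately 3.30
```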
• The ANOVA procedure is robust to violations of the assumptions, especially the assumption of normality. Violating the assumption of homogeneity of variance is especially problematic if the groups have different sample sizes.
• Levene's test, which we talked about before in terms of the t-test, can be used to test whether the homogeneity of variance assumption has been violated. If it has, then the Welch procedure can be used to adjust the df used in ANOVA, similar to what we talked about for the t-test.
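SciPy exposes Levene's test directly (scipy.stats.levene); a Welch-adjusted one-way ANOVA is not in SciPy itself, so the sketch below only runs the assumption check on the example data, and simply points at a follow-up option:

```python
from scipy import stats

attractive   = [5, 4, 4, 3, 5, 6, 4, 3, 8, 3, 5]
average      = [6, 5, 3, 6, 6, 7, 5, 4, 6, 8, 7, 8]
unattractive = [4, 3, 1, 3, 1, 2, 2, 4, 3, 2, 1]

# Levene's test: H0 is that all group variances are equal.
W, p = stats.levene(attractive, average, unattractive)
if p < .05:
    # Variances differ; a Welch-adjusted ANOVA (e.g., pingouin's
    # welch_anova) would be the safer follow-up here.
    print("Homogeneity of variance is questionable: p =", p)
else:
    print("No evidence against equal variances: p =", p)
```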
• If the normality assumption is violated, then the data can be transformed to be more normally distributed (this won't change the substance of the statistical test; it just re-scales things). Common transformations include:
1. Taking the square root of each observation, which is beneficial if the data is very skewed.
2. Taking the log of each observation, which is beneficial if the data is very positively skewed.
3. Taking the reciprocal of each observation (i.e., 1/observation), which is beneficial if there are very large values in the positive tail of the distribution.
Another approach to dealing with a violation of the normality assumption is to use a trimmed sample, which removes a fixed percentage of the extreme values in each of the tails of the distribution, or a Winsorized sample, which replaces the values that are trimmed with the most extreme observations remaining in each tail. In the latter case the df need to be adjusted by the number of values that are replaced. A sketch of all of these options appears below.
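A minimal sketch of the transformations and of trimming/Winsorizing, using NumPy and the SciPy utilities that exist for this purpose (scipy.stats.trimboth and scipy.stats.mstats.winsorize); the toy data and the 10% cut are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

x = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 40.0])  # skewed toy data

sqrt_x  = np.sqrt(x)   # square root: for very skewed data
log_x   = np.log(x)    # log: for very positively skewed data
recip_x = 1.0 / x      # reciprocal: tames very large values in the positive tail

# Trimmed sample: drop 10% of the observations from each tail.
trimmed = stats.trimboth(x, proportiontocut=0.10)

# Winsorized sample: replace the top/bottom 10% with the most extreme
# values that remain, instead of dropping them.
winsorized = mstats.winsorize(x, limits=(0.10, 0.10))
```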
• As we explore more complicated ANOVA models (models with more than one grouping variable) it will become important to be able to differentiate between fixed factors (or groups) and random factors.
• A fixed factor is one in which the researcher is only interested in the various levels of the different groups that are being studied. These levels are not assumed to be representative of, nor generalizable to, other levels of the group.
• A random factor is one in which the researcher considers the various levels of the grouping variable to be a random sample from all possible levels. In this situation the results of the statistical test may be generalized to other levels of the group.
• It should be noted that there is a direct relationship between the t-test for independent samples and the ANOVA when K = 2. Specifically, it can be shown mathematically that the F-statistic equals the t-statistic squared (i.e., F = t²).
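This identity is easy to confirm numerically; a sketch, assuming SciPy and reusing the first two groups from the example (any two samples would do):

```python
from scipy import stats

g1 = [5, 4, 4, 3, 5, 6, 4, 3, 8, 3, 5]
g2 = [6, 5, 3, 6, 6, 7, 5, 4, 6, 8, 7, 8]

t, _ = stats.ttest_ind(g1, g2)   # pooled-variance independent samples t-test
F, _ = stats.f_oneway(g1, g2)    # one-way ANOVA with K = 2

print(t ** 2, F)  # the two values agree, up to floating-point error
```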
Power and Effect Size
• Similar to the t-test, finding statistical significance does not tell us whether the differences are important from a practical perspective. Several measures of effect size have been proposed, all of which differ in terms of how biased they are.
• η² (eta-squared), or the correlation ratio, is one of the oldest measures of effect size. It represents the percentage of total variability that can be "accounted for" by differences in the grouping variable, or the percentage by which the error variability (i.e., within-treatment variability) is reduced by considering group membership. This is done by calculating the ratio of SSbetween to SStotal. Specifically:

η² = SSbetween / SStotal

For our previous example, η² = 73.25 / 133.44 = .55, meaning 55% of the variation in ratings can be accounted for by differences in the independent variable (i.e., the groups).

This effect size measure is biased upwards, meaning it is larger than would be expected if it were to have been calculated from the population, rather than estimated from the sample.
• An alternative effect size measure to η² is ω² (omega-squared). It also measures the percentage of total variability that can be "accounted for" by between-group variability, but does so by using MS values, rather than SS values, thereby making use of sample size information. Specifically, for a fixed-effects[4] ANOVA:

ω² = [SSbetween − (K − 1)·MSwithin] / (SStotal + MSwithin)

For our previous example, ω² = [73.25 − (3 − 1)(1.942)] / (133.44 + 1.942) = 69.37 / 135.38 = .51.

This measure of effect size has been found to be less biased than η². Note that it is smaller than what we obtained for η².

[4] Note that this measure of effect size is computed slightly differently for a random effects ANOVA model; that formula is not presented here.
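Both effect sizes drop straight out of the ANOVA quantities already computed; a minimal sketch using the example's SS and MS values (variable names are ours):

```python
ss_between = 73.25
ss_within  = 60.19            # 31 * 1.942
ss_total   = ss_between + ss_within
ms_within  = 1.942
K = 3

# eta-squared: share of total variability accounted for by groups.
eta_sq = ss_between / ss_total

# omega-squared (fixed effects): less biased; uses MS_within.
omega_sq = (ss_between - (K - 1) * ms_within) / (ss_total + ms_within)

print(round(eta_sq, 2), round(omega_sq, 2))  # 0.55 0.51
```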
• Estimating power for ANOVA is a straightforward extension of how power was estimated for the t-test. We simply use different notation, and different tables. Moreover, we assume equal sample sizes in each group, which is the optimal situation.
• In an ANOVA context, φ′ is comparable to d in the independent t-test context, and separates out the effect size from the sample size. However, we need to incorporate the fact that we are using variance estimates in the ANOVA context. Specifically,

φ′ = √[ (Σ(µj − µ)² / K) / MSwithin ]

So, if we were to assume that the population values correspond exactly to what we obtained in our example (unlikely as this may be), then

φ′ = √( [(4.55 − 4.32)² + (5.92 − 4.32)² + (2.36 − 4.32)²] / 3 / 1.942 ) = √(2.143 / 1.942) = 1.05
Furthermore, in an ANOVA context, φ is comparable to δ in the independent t-test context,
in that it incorporates sample size to allow us to determine how large of a sample we need to
detect meaningful differences, from a practical perspective. However, even though we may
wind up with unequal sample sizes in our group we calculate power based on the assumption
of equal sample sizes. Specifically,
φ = φ′ n where n = the number of subjects in each group
So, if we were to assume that we expected 12 subjects in each of our groups in our example
then:
φ = φ′ n = 1.1 12 = 3.81
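A minimal sketch of the φ′ and φ computations, plugging in the example's group means, grand mean, and MSwithin (small differences from the values above are rounding; see footnote 3):

```python
import numpy as np

ms_within = 1.942
means = np.array([4.55, 5.92, 2.36])  # group means, treated as population values
mu = 4.32                             # grand mean

phi_prime = np.sqrt(((means - mu) ** 2).mean() / ms_within)
phi = phi_prime * np.sqrt(12)         # n = 12 subjects per group

print(round(phi_prime, 2), round(phi, 2))  # approximately 1.05 and 3.65
```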
• In an ANOVA context we can use the non-centrality parameter for the F distribution, which is the mean of the F-distribution if the null hypothesis is false, with K − 1 and N − K df for the numerator and denominator, respectively.

For our example, we will use an estimate corresponding to φ = 3.0, because our table in the book does not go any higher, with 2 df for the numerator and 30 df for the denominator (because our book does not have very fine gradations for df in the denominator). Using the table in the book we find that β = .03 if we want to conduct our test at α = .01. Therefore, since Power = 1 − β, the power of the experiment we ran was approximately .97.
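The table lookup can be replaced by the non-central F distribution directly; a sketch, assuming SciPy, where the non-centrality parameter is taken as λ = K·φ² (the convention under which φ = φ′√n, as in the handout's tables):

```python
from scipy import stats

K, N = 3, 34
alpha = 0.01
phi = 3.0                 # the value used with the book's table
lam = K * phi ** 2        # non-centrality parameter: lambda = K * phi^2

# Critical F under H0, then power from the non-central F distribution.
f_crit = stats.f.ppf(1 - alpha, K - 1, N - K)
power = 1 - stats.ncf.cdf(f_crit, K - 1, N - K, lam)
print(round(power, 2))    # approximately .97, matching the table lookup
```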