IM911
DTC Quantitative Research Methods
Comparing Means II:
Nonparametric Tests and
Bivariate and Multivariate Analysis
of Variance (ANOVA)
Thursday 25th February 2016
Two-sample t-tests: Limitations
• In Week 6 we looked at two-sample t-tests, which are used to
test the (null) hypothesis that the population means for two
groups are the same.
• But t-tests make an assumption of homogeneity of variance
(i.e. that the spread of values is the same in each of the
groups).
• Furthermore, the assumption that the difference between
sample means has a t-distribution is only reasonable for small
samples if the variable has (approximately) a normal
distribution.
• And, of course, very often we are interested in comparing
more than two groups…
Nonparametric alternatives
• Where the assumptions of a t-test are seriously violated,
an alternative approach is to use a nonparametric test.
• Nonparametric tests are also referred to as distribution-free
tests, as they have the advantage of not requiring
the same assumptions about distributions of values.
• In practice (when using SPSS), such tests work in a similar
way to parametric tests, with the same processes of
selecting variables and of assessing statistical
significance, based on the p-value that is calculated for
the test statistic.
The weakness of the
nonparametric alternative…
• However, where their assumptions are met, parametric
tests such as t-tests are to be preferred because, in
general, for the same sample size(s), they are less likely
to generate Type II errors (i.e. failing to reject a null
hypothesis that is false).
• Nonparametric tests are thus less powerful.
• This lack of power results from the loss of
information when interval-level data are converted
to ranked data (i.e. merely ordering the values from
lowest to highest).
The Mann-Whitney U-test
• This is a nonparametric alternative to the two-sample
t-test for comparing two independent
samples. In effect, it focuses on average ranks of
values rather than on average values.
• U is calculated by, first, ranking all the values in the
two samples taken together.
• The ranked values for each sample are then added
up, and, if the sample size for a sample is n, then
n(n+1)/2 is subtracted from the sum of the ranks.
• The smaller of the numbers generated for the two
samples becomes the U-statistic.
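The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the two small samples are made up for the example, and tied values are given the average of the ranks they occupy (a common convention that the slide does not spell out).

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample a, U for sample b, the smaller of the two)."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    return u_a, u_b, min(u_a, u_b)


# Hypothetical data, chosen only to illustrate the arithmetic.
group_a = [18, 19, 21, 22, 24]
group_b = [20, 23, 25, 26, 27]
u_a, u_b, u = mann_whitney_u(group_a, group_b)
print(u_a, u_b, u)  # 4.0 21.0 4.0
```

The two candidate values always add up to the product of the two sample sizes (here 4 + 21 = 5 x 5), which is a useful check on the ranking.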
Mann-Whitney (continued)
• The U-test can represent a better way of
comparing an ordinal measure between two
groups than assuming the measure can be
treated as interval-level.
• Since it is based on ranks, it is more robust
than the t-test with respect to the impact of
outliers.
• However, it is less appropriate where there are
more than a small number of ‘tied’ values.
Another alternative…
• Where there are a substantial number of tied values, the
Kolmogorov-Smirnov Two Sample Test may be more
appropriate.
• This is (yet) another nonparametric test, focusing on whether
the two groups have the same distribution of values, and
based on the maximum absolute difference between the
observed cumulative frequency distributions for the two
samples.
• However, this is a broader hypothesis than one focusing on
the level of the values.
• It has also been noted in the technical literature that this test
has limited power and hence gives a high chance of a Type II
Error, i.e. not identifying a difference when one exists.
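The test statistic described above, the maximum absolute difference between the two observed cumulative distributions, can be sketched as follows. The function name and the data are hypothetical, and the sketch computes only the statistic, not its p-value.

```python
def ks_two_sample_statistic(sample_a, sample_b):
    """Maximum absolute gap between two empirical cumulative distributions."""
    n_a, n_b = len(sample_a), len(sample_b)
    d_max = 0.0
    # The gap can only change at an observed value, so check each one in turn.
    for x in sorted(set(sample_a) | set(sample_b)):
        cum_a = sum(1 for v in sample_a if v <= x) / n_a
        cum_b = sum(1 for v in sample_b if v <= x) / n_b
        d_max = max(d_max, abs(cum_a - cum_b))
    return d_max


# Hypothetical ordinal scores with many ties.
group_a = [1, 2, 2, 3, 3]
group_b = [2, 3, 3, 4, 4]
d = ks_two_sample_statistic(group_a, group_b)
print(round(d, 3))
```

For these values the largest gap is 0.4: by a score of 2, 60% of the first group but only 20% of the second group has been observed.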
Rethinking the difference between
two sample means: An example
Women’s ages at marriage (in years)

First pair of samples:
Education              Ages at marriage              Mean
Left school at 16:     19 19 19 20 20 20 21 21 21    20.0
Stayed on at school:   24 24 24 25 25 25 26 26 26    25.0

Second pair of samples:
Education              Ages at marriage              Mean
Left school at 16:     16 17 18 19 20 21 22 23 24    20.0
Stayed on at school:   21 22 23 24 25 26 27 28 29    25.0
Which pair of samples provides
stronger evidence of a difference?
Question: Within each pair of samples on the preceding slide the
difference between the sample means is the same (25.0 - 20.0 = 5.0
years). Given this similarity, which pair of samples provides stronger
evidence that there is a difference between the mean ages at
marriage, in the population, of women who left school at 16 and of
women who stayed on at school?
Answer: It seems intuitively obvious that the first pair of samples
provides stronger evidence of a difference, since in this case the
ages at marriage in each of the two groups are quite homogeneous,
and as a consequence there is no overlap between the two groups.
It seems implausible that a set of values that is so homogeneous
within groups but different between groups could have arisen by
chance, rather than as a consequence of some underlying
difference between the groups.
Comparing types of variation
• Another way of looking at the above is to say that the
difference between the means in the first pair of
samples is large when compared with the differences
between individuals within either of the groups.
• The difference between the group means can be labelled
as between-groups variation and the differences
between individuals within each of the groups can be
labelled as within-group variation.
• It is the comparison of between-groups variation and
within-group variation that is at the heart of the
statistical technique labelled analysis of variance
(ANOVA).
Quantifying variation
• As in the first pair of samples in the example, a high level
of between-groups variation relative to within-group
variation gives one more confidence that there is an
underlying difference between the groups.
• But how can one quantify the between-groups variation
and the within-group variation?
• Typically, when we want to summarise the spread of a
set of values we calculate the standard deviation
corresponding to those values. A similar approach is used
to quantify the two forms of variation.
Sums of squares
• Recall that the standard deviation is based on the
squared differences between each of a set of
individual values and a mean value.
• Between-groups variation is thus quantified as the
sum of the squared differences between the group
means and the overall mean, with each squared
difference being weighted by the number of cases in
the group in question (since larger groups are
obviously of greater empirical importance).
• Thus, in the example, the between-groups variation
can be calculated as:
[ 9 × (20.0 - 22.5)² ] + [ 9 × (25.0 - 22.5)² ] = 112.5
Sums of squares (continued)
• The within-group variation can be calculated by
taking each of the groups in turn, and calculating the
sum of squared differences between the individual
values in that group and the mean for that group.
• Thus, in the first of the groups in the second pair of
samples:
(16 - 20)² + (17 - 20)² + (18 - 20)² + (19 - 20)² + (20 - 20)² +
(21 - 20)² + (22 - 20)² + (23 - 20)² + (24 - 20)² = 60.0
• The second of the groups in the second pair of
samples also generates a sum of squared differences
of 60.0, so the total value for the within-group
variation is 60.0 + 60.0 = 120.0
Partitioning variation
• Note that the overall amount of variation within the
data can be measured by calculating the sum of
squared differences between each of the individual
values (i.e. all the values in both of the groups) and
the overall mean.
• This calculation results in a figure of 232.5.
• Note that 232.5 = 112.5 + 120.0!
• In other words, the technique of Analysis of Variance
involves breaking down (‘partitioning’) the overall
variation in a set of values into its between-groups
and within-group components.
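The partition can be checked directly for the second pair of samples. The sketch below uses the ages from the example; the variable names are chosen for illustration only.

```python
# Second pair of samples from the example (ages at marriage).
left_at_16 = [16, 17, 18, 19, 20, 21, 22, 23, 24]
stayed_on = [21, 22, 23, 24, 25, 26, 27, 28, 29]
groups = [left_at_16, stayed_on]


def mean(values):
    return sum(values) / len(values)


all_values = [v for g in groups for v in g]
grand_mean = mean(all_values)  # 22.5

# Between groups: squared distance of each group mean from the overall mean,
# weighted by the number of cases in the group.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within groups: squared distance of each value from its own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

# Total: squared distance of each value from the overall mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

print(ss_between, ss_within, ss_total)  # 112.5 120.0 232.5
```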
Accounting for sources of variation
• Now that the two forms of variation have
been quantified the next step is to compare
the two values that have been obtained with
each other.
• However, when doing this it makes sense to
take account of:
(a) the number of groups being considered,
and
(b) the number of individuals in each group.
Degrees of freedom
• In this case there are only two groups, hence we are only
making one comparison between groups. In fact, the number
of degrees of freedom (sources of variation) attached to the
between-groups variation is always equal to the number of
groups less one.
• The number of degrees of freedom (sources of variation) for
the within-group variation is the total number of individuals in
all the groups, less the number of groups (or, to put it another
way, the sum across all the groups of the number of
individuals in each group minus one). Thus, in this case:
Degrees of freedom of between-groups variation = 2 - 1 = 1
Degrees of freedom of within-group variation = 18 - 2 = 16
Calculating the F-statistic
• We now divide the two amounts of variation by their
respective degrees of freedom, i.e.:
Between-groups variation = 112.5/1 = 112.5
Within-group variation = 120.0/16 = 7.5
• Finally we compare the amounts of the two forms of
variation by dividing the first amount by the second
amount, giving 112.5/7.5 = 15.0.
• Thus, in a sense, the between-groups variation is 15
times as great as the within-group variation.
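As a sketch, the whole calculation can be wrapped in one small function and applied to both pairs of samples from the example. The function name is an assumption made for illustration; the data are those given earlier.

```python
def one_way_f(groups):
    """Return (F, between-groups df, within-group df) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


first_pair = [
    [19, 19, 19, 20, 20, 20, 21, 21, 21],
    [24, 24, 24, 25, 25, 25, 26, 26, 26],
]
second_pair = [
    [16, 17, 18, 19, 20, 21, 22, 23, 24],
    [21, 22, 23, 24, 25, 26, 27, 28, 29],
]

print(one_way_f(first_pair))   # (150.0, 1, 16)
print(one_way_f(second_pair))  # (15.0, 1, 16)
```

The first pair of samples, with the same difference in means but far less within-group variation, gives an F-statistic ten times as large, which matches the earlier intuition that it provides the stronger evidence of a difference.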
Evaluating the F-statistic
• Note that an F-statistic has associated with it two sets of
degrees of freedom (corresponding to the between-groups
variation and the within-group variation). Hence
here we have an F-statistic of 15.0 with 1 degree of
freedom and 16 degrees of freedom.
• Differences between sample means that occur simply as
a consequence of sampling error result, on average, in
the same amount of between-groups variation per
degree of freedom as within-group variation per degree
of freedom. Hence the average F-statistic where the null
hypothesis of equal means is correct will be 1.
• How rarely, then, would an F-statistic of 15.0 occur
simply as a consequence of sampling error?
The usual p-value…
• For an F-statistic of 15.0 with 1 degree of freedom
and 16 degrees of freedom, the p-value is 0.0013.
• Since p < 0.05, we can reject the (null) hypothesis
that the population means for the two groups are
the same.
• However, ANOVA makes the same assumptions
about homogeneity of variance and normally
distributed values as t-tests do!
• And, if we are comparing more than two groups, the
question arises as to whether the means for
particular pairs of groups differ from each other.
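The p-value quoted above can be approximated without statistical software. The sketch below relies on the fact that an F-statistic with 1 and n degrees of freedom is the square of a t-statistic with n degrees of freedom, and integrates the t density numerically with Simpson's rule. The function names and the number of integration steps are choices made for illustration; in practice SPSS or a statistics library would report this value directly.

```python
import math


def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f_with_1_df(f_stat, df_within, steps=10_000):
    """P(F >= f_stat) for an F-statistic with 1 and df_within degrees of freedom.

    Only valid when the between-groups df is 1 (i.e. two groups).
    steps must be even for Simpson's rule.
    """
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_density(0.0, df_within) + t_density(upper, df_within)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df_within)
    central_area = 2 * (h / 3) * total  # P(-upper < t < upper)
    return 1 - central_area


p = p_value_f_with_1_df(15.0, 16)
print(round(p, 5))
```

The result is a little over one in a thousand, in line with the figure quoted above and well below the 0.05 threshold.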
Post-hoc tests
• Rather than carrying out a large number of t-tests for pairs of
groups, which involves a substantially increased chance of one
or more Type I Errors (i.e. false positives), there are a number
of alternative ways of comparing the groups more
appropriately in a pair-wise way.
• If the assumptions of homogeneity of variance and normal
distribution of values are met, then Tukey’s HSD test corrects
for the increased chance of Type I Errors when groups are
compared in a pair-wise way.
• Another common post-hoc procedure is Scheffé’s test.
However, because this allows for more complex forms of
comparisons (i.e. of three or more means), it is unnecessarily
low in power for pair-wise comparisons, i.e. the chance of
Type II Errors is increased when it is used to look at these.
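To see why a large number of uncorrected pair-wise t-tests is a problem, the following sketch counts the pairs and estimates the chance of at least one Type I Error at the 0.05 level. It makes the simplifying assumption that the tests are independent, which pair-wise comparisons sharing groups are not, so the figures are indicative only. The function name is hypothetical.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairs, chance of one or more Type I Errors).

    Assumes, for simplicity, that the pair-wise tests are independent.
    """
    n_pairs = comb(n_groups, 2)
    return n_pairs, 1 - (1 - alpha) ** n_pairs


for n_groups in (3, 5, 7):
    n_pairs, rate = familywise_error_rate(n_groups)
    print(n_groups, n_pairs, round(rate, 3))
```

With seven groups there are 21 possible pairs, and under this simplification the chance of at least one false positive is roughly two in three rather than one in twenty.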
…and the nonparametric alternative?
• The Kruskal-Wallis H Test is the nonparametric
test equivalent to (one-way) ANOVA, being an
extension of the Mann-Whitney U-test to
allow the comparison of more than two
(independent) samples.
• The above comment refers to one-way ANOVA
because the technique can be generalised to
versions which involve two or more
independent variables at the same time…
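A minimal sketch of the Kruskal-Wallis H statistic follows, using the standard formula H = 12/(N(N+1)) × Σ(R²/n) - 3(N+1), where R is the sum of ranks in a group, n is the group's size and N is the total number of cases. The sketch assumes there are no tied values (it omits the correction for ties), and the data and function name are made up for illustration.

```python
def kruskal_wallis_h(groups):
    """H statistic without a tie correction; requires all values to be distinct."""
    pooled = sorted(v for g in groups for v in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = lowest value
    n_total = len(pooled)
    # Sum over groups of (rank sum squared / group size).
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical data for three independent samples.
groups = [[1, 2, 4], [3, 5, 7], [6, 8, 9]]
h = kruskal_wallis_h(groups)
print(round(h, 3))
```

As with the U-test, only the ranks of the values enter the calculation. H is then assessed against a chi-square distribution with (number of groups - 1) degrees of freedom.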
Multivariate analysis
As noted last week, we can use multivariate analysis to
elaborate bivariate relationships, in order to answer the
following types of questions:
1. Why does the relationship [between two variables] exist?
Spurious relationships, intervening variables
2. How general is the relationship? Does it vary in existence/intensity
between subgroups?
The replication of/specification of relationships
These objectives can be achieved via an elaboration of ANOVA.
Starting with some means…
BSA 2006: At what age did you retire work? (Q296)

NS-SEC class                                    N      Mean
Employers in large org.; higher manag. & pr.    64     60.84
Lower profess & manag; higher techn. & su.      183    58.01
Intermediate occupations                        88     56.18
Employers in small org.; own account work       72     61.39
Lower supervisory & technical occupation        96     60.04
Semi-routine occupations                        144    58.53
Routine occupations                             111    57.60
Total                                           758    58.65
… and then a One-Way ANOVA
BSA 2006: At what age did you retire work? (Q296)

                  Sum of Squares    df     Mean Square    F        Sig.
Between Groups    1769.833          6      294.972        3.845    .001
Within Groups     57609.915         751    76.711
Total             59379.748         757

Since p=0.001 < 0.05, there is a significant relationship
between occupational class (NS-SEC) and retirement age.
… but we need to remember to reflect on whether the
assumptions of ANOVA are met in this case!
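The mean squares and the F-statistic in the table follow from the sums of squares and degrees of freedom in the same way as in the marriage-age example. A short check, using the figures from the table (variable names are illustrative):

```python
# Figures taken from the One-Way ANOVA table above.
ss_between, df_between = 1769.833, 6
ss_within, df_within = 57609.915, 751

ms_between = ss_between / df_between  # mean square = sum of squares / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
# 294.972 76.711 3.845
```

Note that the between-groups degrees of freedom are 7 - 1 = 6 (seven NS-SEC classes) and the within-group degrees of freedom are 758 - 7 = 751.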
Assumptions: a reminder
• ANOVA makes an assumption of homogeneity of
variance (i.e. that the spread of values is the same in
each of the groups).
• Furthermore, ANOVA assumes that the variable has
(approximately) a normal distribution within each of
the groups.
• Levene’s test of the former assumption results in
p<0.001, i.e. the assumption is not plausible.
• … and it is also not self-evident that retirement ages
would have a normal distribution!
Nevertheless…
• We might ask ourselves the question whether
some of the class difference in retirement
ages reflects gender.
• And hence there is a motivation to carry out a
Two-way ANOVA to look at the effects of class
and gender simultaneously.
Two-way ANOVA results
BSA 2006: At what age did you retire work? (Q296)

Source             Sum of Sq. (Type III)    df     Mean Sq.    F         Sig.
Corrected Model    4739.996                 13     364.615     4.965     .000
RClass             619.086                  6      103.181     1.405     .210
RSex               2188.093                 1      2188.093    29.794    .000
RClass * RSex      506.510                  6      84.418      1.149     .332
Error              54639.752                744    73.441
Corrected Total    59379.748                757

… so what do the results mean?
• The overall variation explained by the two variables is
greater (4739.996 compared to 1769.833).
• But the between-groups variation which is unique to
class is no longer significant (p=0.210 > 0.05)
• Whereas the between-groups variation which is unique
to sex is significant (p<0.001)
• … but sex and class do not have interacting effects
(p=0.332)
• Note that the class, sex and interaction sums of squares
don’t add up to the overall ‘explained’ sum of squares
because some of the effects of class and sex overlap.
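Both points can be checked from the figures in the Two-way ANOVA table. The sketch below recomputes each F-statistic as the effect's mean square divided by the error mean square, and then compares the sum of the three separate sums of squares with the Corrected Model figure. The variable names are illustrative.

```python
# Figures taken from the Two-way ANOVA table: (sum of squares, df).
effects = {
    "RClass": (619.086, 6),
    "RSex": (2188.093, 1),
    "RClass * RSex": (506.510, 6),
}
ss_error, df_error = 54639.752, 744
ss_model = 4739.996  # Corrected Model

ms_error = ss_error / df_error
f_stats = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}
for name, f_stat in f_stats.items():
    print(name, round(f_stat, 3))

# Variation attributed uniquely to class, sex and their interaction ...
ss_unique = sum(ss for ss, _ in effects.values())
# ... falls short of the overall 'explained' variation.
ss_shared = ss_model - ss_unique
print(round(ss_unique, 3), round(ss_shared, 3))  # 3313.689 1426.307
```

The shortfall of about 1,426 is the part of the explained variation that cannot be attributed uniquely to class, to sex or to their interaction.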
A multivariate conclusion!
• The class differences in retirement age
observed in the One-way ANOVA are shown
by the Two-way ANOVA to be a spurious
consequence of the relationships between
gender and class and between gender and
retirement age!