3.4 Wilcoxon and Kruskal Wallis alternatives to t

Wilcoxon and Kruskal-Wallis: Rank-based alternatives to t-tests and ANOVA
T-tests and ANOVA can perform poorly when we have large outliers in our data or when the
data are not normally distributed. Non-normal distributions violate the assumptions of
ANOVA, and can greatly reduce the power to detect significant effects. Even when
graphs of our data suggest there is a difference among the groups, we may get nonsignificant p-values from t-tests and ANOVA as a result of the outliers or non-normal
distributions.
In these cases, we can use alternatives to the t-test and ANOVA based on ranks:
Non-parametric (rank) test    Corresponding parametric test
Wilcoxon rank sum test        T-test (ordinary, two sample version)
Wilcoxon signed rank test     Paired t-test
Kruskal-Wallis test           ANOVA
T-tests and ANOVA are called parametric tests because they assume that the data follow
a distribution (such as normal distribution) that has parameters mean and standard
deviation. The relationship between the rank-based tests and the parametric tests is like
the relationship between the median and the mean. The median (which is based on
rank) is less affected by outliers than is the mean. Because the rank-based methods
do not rely on assumptions about the distribution (such as normal) or parameters (such
as mean and standard deviation), they are also called distribution-free or nonparametric methods. The Wilcoxon rank sum test is also known as the Mann-Whitney U
test. It is sometimes called the Wilcoxon-Mann-Whitney test to give credit to everyone.
If the sample size is large, deviations from normal distribution are less of a problem for
t-tests and ANOVA.
Wilcoxon rank sum for two independent samples:
Recall a t-test example where we asked the question: do colon cancer patients have
elevated levels of the mucin protein in their blood? We measured the level of the
protein mucin in the blood of patients with colon cancer and in healthy controls.
Group             Mucin level
Colon cancer      83, 89, 90, 93, 98
Healthy control   99, 100, 103, 104, 141
The boxplot shows a separation of the two groups, but the p-value for the t-test is not
significant (p=0.054). The Wilcoxon rank sum test is an alternative to the t-test that uses
the rank value of each observation, rather than the actual value. Here are the rank
values of the original observations.
Group             Mucin level              Rank of mucin level
Colon cancer      83, 89, 90, 93, 98       1, 2, 3, 4, 5
Healthy control   99, 100, 103, 104, 141   6, 7, 8, 9, 10
Using the rank values, the Wilcoxon rank sum test yields a p-value of p=0.0079, so we
reject the null hypothesis and conclude that colon cancer patients have mucin levels
different from those of healthy controls.
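The p-value above can be reproduced by direct enumeration. The sketch below is a minimal illustration, assuming there are no tied values; the function name rank_sum_exact_p is our own and does not come from any library. Under the null hypothesis every choice of 5 of the 10 ranks is equally likely to belong to the colon cancer group, so the two-sided p-value is the fraction of those choices whose rank sum is at least as far from its expected value as the observed rank sum.

```python
from itertools import combinations

def rank_sum_exact_p(x, y):
    """Exact two-sided Wilcoxon rank sum p-value; assumes no tied values."""
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = smallest value
    n, total = len(x), len(pooled)
    observed = sum(rank[v] for v in x)
    expected = n * (total + 1) / 2  # mean rank sum of group x under the null
    # Under the null, every set of n ranks is equally likely for group x.
    sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= abs(observed - expected))
    return observed, extreme / len(sums)

cancer = [83, 89, 90, 93, 98]
control = [99, 100, 103, 104, 141]
w, p = rank_sum_exact_p(cancer, control)
print(w, round(p, 4))  # 15 0.0079
```

Only 2 of the 252 possible rank assignments (rank sums 15 and 40) are this extreme, which gives 2/252 = 0.0079.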
Excel does not have the rank-based tests built in. If you only have Excel, you can convert
the original values to ranks and analyze the rank-values using a t-test or ANOVA. This
method does not give identical results to using the Wilcoxon tests or Kruskal-Wallis, but
the results are quite similar, and the method is an acceptable alternative if you don't
have software that will do the rank tests. For the mucin data, the t-test on the rank values
gives p=0.0010, which is similar to the Wilcoxon rank sum p-value of p=0.0079.
Recall that we tried a log transform of the data to reduce the effect of the outlier. For
these data, the log transform yields a t-test p-value of p=0.036. If a log transform makes
your data more normally distributed, it may be worthwhile trying that before trying the
rank methods.
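The three t-test p-values quoted for the mucin data (original values, rank values, log values) can be checked with a short script. This is a sketch under stated assumptions: it uses the equal-variance (pooled) two-sample t-test, which the text does not specify but which matches the quoted figures, and it gets the p-value by numerically integrating the t density rather than calling a statistics package. The helper names pooled_t and t_two_sided_p are our own.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny)), df

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus twice the t density integrated over [0, |t|]."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # Simpson's rule; steps must be even
    s = density(0) + density(abs(t))
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * density(i * h)
    return 1 - 2 * (s * h / 3)

cancer = [83, 89, 90, 93, 98]
control = [99, 100, 103, 104, 141]
versions = {
    "raw": (cancer, control),
    "rank": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "log": ([math.log(v) for v in cancer], [math.log(v) for v in control]),
}
p_values = {}
for name, (x, y) in versions.items():
    t, df = pooled_t(x, y)
    p_values[name] = t_two_sided_p(t, df)
    print(f"{name}: t = {t:.2f}, p = {p_values[name]:.3f}")
# raw: t = -2.26, p = 0.054
# rank: t = -5.00, p = 0.001
# log: t = -2.52, p = 0.036
```

The base of the logarithm does not matter here, because changing the base rescales both groups by the same constant and leaves the t statistic unchanged.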
Wilcoxon signed rank test for paired (matched) samples:
Recall that we used the paired t-test to test for difference in paired (matched) samples,
such as the difference before and after treatment. The Wilcoxon signed rank test is the
rank-based analog of the paired t-test. Here's an example.
Do bears lose weight between winter and spring?
We previously used the paired t-test to examine the change in weight of bears, where
the same bears were weighed in winter and in spring. We'll analyze the data using the
Wilcoxon signed rank test.
Measurement time   Bear weights
Winter             300, 470, 550, 650, 750, 760, 800, 985, 1100, 1200
Spring             280, 420, 500, 620, 690, 710, 790, 935, 1050, 1110
Difference         20, 50, 50, 30, 60, 50, 10, 50, 50, 90
Notice that all the bears lose weight. Using the paired t-test, we get p = 0.0001053,
which is significant. We reject the null hypothesis that the change in weight between
winter and spring is zero. The Wilcoxon signed rank test gives us p = 0.0053, so we still
reject the null hypothesis. In this case, the results are similar for paired t-test and
Wilcoxon signed rank test. Construct some data sets with outliers to see when the t-test
and Wilcoxon tests give different results.
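The signed rank calculation for the bear data can be sketched as follows. Because several differences are tied (50 appears five times), this sketch uses the normal approximation with a continuity correction and a tie correction to the variance. That this is how the p = 0.0053 above was obtained is our assumption, although the figures agree. The helper names average_ranks and signed_rank_normal_p are our own.

```python
import math
from collections import Counter

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def signed_rank_normal_p(first, second):
    """Signed rank statistic V+ and its two-sided normal-approximation p-value."""
    d = [a - b for a, b in zip(first, second) if a != b]  # drop zero differences
    n = len(d)
    ranks = average_ranks([abs(v) for v in d])
    v_plus = sum(r for r, v in zip(ranks, d) if v > 0)  # ranks of positive differences
    mu = n * (n + 1) / 4
    tie_term = sum(c ** 3 - c for c in Counter(abs(v) for v in d).values()) / 48
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)
    z = max(abs(v_plus - mu) - 0.5, 0.0) / sd  # 0.5 is the continuity correction
    return v_plus, math.erfc(z / math.sqrt(2))

winter = [300, 470, 550, 650, 750, 760, 800, 985, 1100, 1200]
spring = [280, 420, 500, 620, 690, 710, 790, 935, 1050, 1110]
v, p = signed_rank_normal_p(winter, spring)
print(v, round(p, 4))  # 55.0 0.0053
```

Every difference is positive, so V+ takes its largest possible value, 1 + 2 + ... + 10 = 55.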
The Kruskal-Wallis test for two or more treatment groups
The Kruskal-Wallis test is the non-parametric version of one-way ANOVA, and allows us
to compare two or more treatment groups.
For the chickwts example (chick weight by feed type), R reports:
Kruskal-Wallis rank sum test
data: weight by feed
Kruskal-Wallis chi-squared = 37.3427, df = 5, p-value = 5.113e-07
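The Kruskal-Wallis statistic is computed from the rank sums of the groups and compared with a chi-squared distribution with (number of groups - 1) degrees of freedom. The sketch below is a minimal illustration with our own helper names (average_ranks, kruskal_wallis_h, chi2_upper_tail). The raw chickwts values are not listed here, so the code applies the statistic to the two mucin groups and separately checks the p-value for the reported chi-squared value of 37.3427 with df = 5.

```python
import math
from collections import Counter

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1

def chi2_upper_tail(x, df):
    """P(chi-squared with integer df > x), built up by the gamma recurrence."""
    h = x / 2
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(h)) if df % 2 else math.exp(-h)
    term = math.exp(-h) * h ** a / math.gamma(a + 1)
    while a < df / 2:
        q += term
        a += 1
        term *= h / a
    return q

h_stat, df = kruskal_wallis_h([[83, 89, 90, 93, 98], [99, 100, 103, 104, 141]])
print(round(h_stat, 3), df)  # 6.818 1
p_chick = chi2_upper_tail(37.3427, 5)
print(f"{p_chick:.3e}")  # 5.113e-07
```

The second printed value agrees with the p-value in the chickwts output above.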
When should I use t-tests and ANOVA versus Wilcoxon tests and Kruskal-Wallis?
The Wilcoxon tests and Kruskal-Wallis don't require the assumption of normal
distributions, and are less affected by outliers, so why don't we always use them instead
of t-tests and ANOVA? There are several considerations.
If the data are normally distributed, ANOVA and t-tests have slightly more power to
detect differences than do the rank-based tests.
It is easier to include additional variables in ANOVA and regression models (which also
assume normally distributed residuals) than it is to include additional variables in rank-based models.
ANOVA and t-tests give us a quantitative measure of the difference between the group
means, and can provide group means adjusted for covariates.
It is generally easier to get confidence intervals using methods that assume the data are
normally distributed. But newer computer-intensive methods such as bootstrap now
make getting confidence intervals easier for rank methods.
When we have a very small sample size (5 to 10 per group), it may be mathematically
nearly impossible to get a significant p-value using a non-parametric test. But we may be
able to get significance using a parametric test, if we are willing to make the assumption
of normal or t distributions.
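The small-sample limitation can be made concrete for the Wilcoxon rank sum test. With n observations per group and no ties, the most extreme outcome is complete separation of the two groups, and its exact two-sided p-value is 2 divided by the number of ways to choose n ranks out of 2n. The sketch below (the function name smallest_two_sided_p is our own) prints that floor for a few group sizes.

```python
from math import comb

def smallest_two_sided_p(n_per_group):
    """Smallest exact two-sided rank sum p-value with equal groups and no ties."""
    return 2 / comb(2 * n_per_group, n_per_group)

for n in (3, 4, 5):
    print(n, round(smallest_two_sided_p(n), 5))
# 3 0.1
# 4 0.02857
# 5 0.00794
```

With 3 per group the test cannot reach p < 0.05 at all, and with 5 per group it reaches p = 0.0079 only when the groups do not overlap, as in the mucin example.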
Do the Wilcoxon tests and Kruskal-Wallis test compare the medians of groups?
The Wilcoxon tests and Kruskal-Wallis do not compare the medians of the treatment
groups. They compare the entire distributions of the groups (that is, the ranks of all the
observations, not just the median). You can construct two treatment groups that have
identical medians, but that differ in their rank sums, and will give significant differences
in the Wilcoxon or Kruskal-Wallis tests.
group.A=(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,
1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1)
group.B=(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
2,2,2,2,2,2,2,22,2,2,2,2,2,2,2,2,2,2,2,2,2,2,22,2,2,2,2,2,2,2)
The median of both groups is 0. The Wilcoxon rank sum test gives p = 0.0295, indicating
that the two groups are significantly different. This result shows that, although group A
and group B have identical medians, they differ in location, and differ significantly in the
Wilcoxon test.
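The same point can be seen on a smaller scale. The sketch below uses two hypothetical groups of 11 values each, built with the same structure as group A and group B but not taken from them, and shows that identical medians can go together with different rank sums. The helper name average_ranks is our own.

```python
from statistics import median

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

# Hypothetical illustration data: more than half of each group is 0.
small_a = [0] * 6 + [1] * 5
small_b = [0] * 6 + [2] * 5

ranks = average_ranks(small_a + small_b)
sum_a = sum(ranks[:len(small_a)])
sum_b = sum(ranks[len(small_a):])
print(median(small_a), median(small_b))  # 0 0
print(sum_a, sum_b)  # 114.0 139.0
```

The zeros share the same average rank in both groups, so the whole difference in rank sums comes from the non-zero values, which the median ignores.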