Nonparametric Statistics or Distribution-free Statistics
Previously, we often used the assumption that our samples were drawn from normally distributed
populations. This chapter introduces techniques that do not make these assumptions. These
methods are called Distribution-free Statistics or Nonparametric Statistics. In situations where
the normal assumption holds, the nonparametric tests are less efficient than our traditional
parametric methods. However, when the normal assumption is not valid, the nonparametric
methods are more appropriate.
In this section, we consider four nonparametric tests: (1) the Wilcoxon Rank Sum Test or Mann-Whitney U Test, (2) the Wilcoxon Signed Rank Test, (3) the Kruskal-Wallis Test, and (4) the one-sample test of runs.
Wilcoxon Rank Sum Test or Mann-Whitney U Test:
This technique tests whether the medians of 2 populations are the same, when the 2 samples are
independent of each other. This test is comparable to the parametric t-test on the difference
between 2 means that we considered previously.
Technique: Merge the observations from the 2 samples and rank them. Find the rank sums $R_1$ and $R_2$ for the 2 samples. Compute the $T_1$ statistic (based on the smaller of the 2 samples, or either sample if the sample sizes are equal) and its mean $\mu_{T_1}$ and standard deviation $\sigma_{T_1}$ as follows:
$$T_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad \mu_{T_1} = \frac{n_1 n_2}{2}, \qquad \sigma_{T_1} = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}.$$
If the 2 sample sizes are each greater than 10, then $T_1$ is approximately normal and we can standardize to get $Z = (T_1 - \mu_{T_1})/\sigma_{T_1}$. Then if $Z$ is statistically different from zero, we conclude that the medians are not the same.
Wilcoxon Signed Rank Test:
This technique tests whether the medians of 2 populations are the same, when the 2 samples are
not independent of each other. This test is comparable to the parametric matched-pairs test that
we considered previously.
Technique: Calculate the differences between the two sample values for each pair of
observations. Drop the zero values. Of the n non-zero values, rank the absolute values of the
differences. Sum the ranks of the positive and negative differences separately. Let W be the
sum of the positive ranks. The mean and standard deviation of $W$ are
$$\mu_W = \frac{n(n+1)}{4} \qquad \text{and} \qquad \sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$
If the number of non-zero differences is at least 20, then $W$ is approximately normal and we can standardize to get $Z = (W - \mu_W)/\sigma_W$. If $Z$ is statistically different from zero, we conclude that the medians are not the same.
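A similar Python sketch of the signed rank procedure follows; the names signed_rank_test, before, and after are hypothetical, ties among the absolute differences receive average ranks, and no continuity correction is applied:

```python
import numpy as np
from scipy.stats import rankdata, norm

def signed_rank_test(before, after):
    """Wilcoxon signed rank test via the normal approximation."""
    d = np.asarray(after, float) - np.asarray(before, float)
    d = d[d != 0]                              # drop the zero differences
    n = len(d)
    ranks = rankdata(np.abs(d))                # rank the absolute differences
    W = ranks[d > 0].sum()                     # sum of the positive ranks
    mu = n * (n + 1) / 4                       # mean of W under H0
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (W - mu) / sigma
    p = 2 * norm.sf(abs(z))                    # two-sided p-value
    return W, z, p

# Hypothetical paired data with 25 pairs (at least 20 non-zero differences recommended).
rng = np.random.default_rng(0)
before = rng.normal(50, 5, size=25)
after = before + rng.normal(1.0, 3.0, size=25)
print(signed_rank_test(before, after))
```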
Kruskal-Wallis test:
This technique tests the null hypothesis that several populations have the same median. It is the
nonparametric equivalent of the one-factor ANOVA. The test statistic is
$$K = \frac{12}{n(n+1)} \sum_{j} \frac{R_j^2}{n_j} - 3(n+1)$$
where $n_j$ is the number of observations in the $j$th sample, $n$ is the total number of observations, and $R_j$ is the sum of the ranks for the $j$th sample. If each $n_j$ is at least 5 and the null hypothesis is true, then the distribution of $K$ is $\chi^2$ with $c-1$ degrees of freedom, where $c$ is the number of sample groups. If testing at the level $\alpha$ and $K$ is in the $\alpha$-tail, then we conclude that the medians are not the same.
In the case of ties, a corrected statistic Kc should be computed.
$$K_c = \frac{K}{1 - \dfrac{\sum_j (t_j^3 - t_j)}{n^3 - n}}$$
where $t_j$ is the number of tied observations in the $j$th group of tied values.
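As an illustration, the Python sketch below computes $K$, applies the tie correction, and compares the result against the $\chi^2$ distribution; the function name kruskal_wallis and the three sample groups are hypothetical:

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kruskal_wallis(*groups):
    """Kruskal-Wallis test with the tie correction, via the chi-square approximation."""
    pooled = np.concatenate([np.asarray(g, float) for g in groups])
    n = len(pooled)
    ranks = rankdata(pooled)                         # ties get average ranks
    sizes = [len(g) for g in groups]
    splits = np.split(ranks, np.cumsum(sizes)[:-1])  # ranks split back into their groups
    K = 12 / (n * (n + 1)) * sum(r.sum() ** 2 / nj
                                 for r, nj in zip(splits, sizes)) - 3 * (n + 1)
    # Tie correction: counts holds the number of observations sharing each tied value.
    _, counts = np.unique(pooled, return_counts=True)
    Kc = K / (1 - np.sum(counts ** 3 - counts) / (n ** 3 - n))
    p = chi2.sf(Kc, df=len(groups) - 1)              # upper-tail p-value
    return Kc, p

# Hypothetical groups, each with at least 5 observations.
a = [27, 31, 29, 35, 33, 30]
b = [24, 26, 28, 25, 27, 23]
c = [32, 36, 34, 38, 31, 37]
print(kruskal_wallis(a, b, c))
```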
One sample test of runs:
This technique tests for randomness of order of occurrence.
A run is a sequence of identical occurrences that are followed and preceded by different
occurrences. Count the number of runs, $r$, and let $n_1$ and $n_2$ be the numbers of occurrences of the two types. If the order is random, the mean of $r$ is
$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
and the standard deviation of $r$ is
$$\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}.$$
If either $n_1$ or $n_2$ is greater than 20, then $r$ is approximately normally distributed and we can standardize to get $Z = (r - \mu_r)/\sigma_r$. Then if $Z$ is statistically different from zero, we conclude that the order of occurrence is not random.
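Finally, a minimal Python sketch of the runs test on a hypothetical two-symbol sequence (coded 0/1 here) might look like this:

```python
import numpy as np
from scipy.stats import norm

def runs_test(seq):
    """One-sample runs test for randomness via the normal approximation."""
    seq = np.asarray(seq)
    kinds = np.unique(seq)                     # the two types of occurrence
    n1 = int(np.sum(seq == kinds[0]))
    n2 = int(np.sum(seq == kinds[1]))
    r = 1 + int(np.sum(seq[1:] != seq[:-1]))   # number of runs
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = np.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                    / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (r - mu) / sigma
    p = 2 * norm.sf(abs(z))                    # two-sided p-value
    return r, z, p

# Hypothetical 0/1 sequence of 40 observations; one symbol count exceeds 20.
seq = [0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1,
       0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(runs_test(seq))
```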