A comparison of the powers of the Chi-Square
test statistic with the discrete
Kolmogorov-Smirnov and Cramér-von Mises
test statistics
Michael Steele¹ and Janet Chaseling²
¹ School of Mathematical and Physical Sciences, James Cook University,
Australia, Mike.Steele@jcu.edu.au.
² Australian School of Environmental Studies, Griffith University, Australia,
Summary. This paper provides extensive simulated power studies for three major
goodness-of-fit test statistics for a uniform null distribution against a variety of
alternative distributions. Recommendations are made so that applied researchers
can use the specific test statistics with high power.
Key words: power, goodness-of-fit, ordinal, Kolmogorov-Smirnov
1 Introduction
Researchers from many disciplines use goodness-of-fit test statistics for discrete data,
but studies of their power are quite limited. Contributions to the power of test
statistics for discrete data have been made by [CLS94], [PS77], [Fro96], [Ste02] and [SC06].
The Chi-Square test statistic is the main choice of applied researchers; however, this
paper shows that in some situations this choice may come at the expense of power.
The powers of the discrete Kolmogorov-Smirnov and Cramér-von Mises test
statistics discussed in Section 2 are compared to that of the Chi-Square for a uniform
null distribution against several alternative distributions defined in Section 3. The
results of the power study are presented and discussed in Section 4, with concluding
comments and recommendations in Section 5 on the most powerful of the three test
statistics.
2 The test statistics
The three test statistics used in this power study are the discrete Kolmogorov-Smirnov
[PS77], the discrete Cramér-von Mises [CLS94] and Pearson's Chi-Square
[Pea00], as defined in equations 2.1, 2.2 and 2.3 respectively.
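\[ KS = \max_{1 \le i \le k} |Z_i| \tag{2.1} \]
\[ W^2 = N^{-1} \sum_{i=1}^{k} Z_i^2 \, p_i \tag{2.2} \]
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{2.3} \]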
where k is the number of cells, N is the sample size, p_i is the probability for cell i, O_i
and E_i are the observed and expected frequencies for cell i, and Z_i is the cumulative
sum of the differences between the observed and expected frequencies up to and
including cell i.
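The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual forms KS = max |Z_i|, W² = N⁻¹ Σ Z_i² p_i and χ² = Σ (O_i − E_i)² / E_i; the function name `gof_statistics` and the sample counts are hypothetical and are not taken from the study.

```python
import numpy as np

def gof_statistics(observed, probs):
    """Return (KS, W2, chi2) for observed cell counts under null cell probabilities."""
    observed = np.asarray(observed, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = observed.sum()                      # N, the sample size
    expected = n * probs                    # E_i
    z = np.cumsum(observed - expected)      # Z_i, cumulative sum of O_j - E_j
    ks = np.max(np.abs(z))                  # discrete Kolmogorov-Smirnov
    w2 = np.sum(z ** 2 * probs) / n         # discrete Cramer-von Mises
    chi2 = np.sum((observed - expected) ** 2 / expected)  # Pearson chi-square
    return ks, w2, chi2

# Hypothetical sample of N = 20 observations over ten cells, uniform null.
counts = [4, 3, 2, 2, 2, 2, 1, 2, 1, 1]
ks, w2, chi2 = gof_statistics(counts, np.full(10, 0.1))
print(ks, w2, chi2)
```

Because Z_i accumulates the differences from the first cell onwards, an excess in the early cells carries through many terms of KS and W², whereas χ² treats each cell separately.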
3 Method of the power study
For consistency, a uniform null distribution over ten cells is tested against each of the
alternative distributions defined in Table 1. The powers of each of the three test
statistics are approximated for total sample sizes of 10, 20, 30, 50, 100 and 200
(representing 1, 2, 3, 5, 10 and 20 observations per cell under the null distribution).
Power is estimated using 10,000 simulated random samples.
Table 1. Alternative distributions for the power study
              Cell Probability (2 Decimal Places)
Distribution  1     2     3     4     5     6     7     8     9     10
Decreasing    0.32  0.13  0.10  0.08  0.07  0.07  0.06  0.06  0.05  …
Step          …     0.05  0.05  0.05  0.05  0.15  0.15  0.15  0.15  …
Triangular    0.17  0.13  0.10  0.07  0.03  0.03  0.07  0.10  0.13  …
Platykurtic   0.04  0.11  0.11  0.12  0.12  0.12  0.12  0.11  0.11  …
Leptokurtic   0.05  0.05  0.05  0.05  0.30  0.30  0.05  0.05  0.05  …
Bimodal       …     0.11  0.17  0.11  0.06  0.06  0.11  0.17  0.11  …
As the null distribution of each test statistic is generated from these simulated
random samples, the null distribution is discrete. This means that a critical value,
and the corresponding power, at a significance level of exactly 5% may not be achievable.
Again for consistency, the power is linearly interpolated to give an estimate of the
power at the fixed significance level of 5%.
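The simulation and interpolation steps can be sketched as follows. This is one plausible reading of the procedure, not the authors' code: the interpolation is taken between the achievable significance levels that bracket 5%, rejection is assumed to occur for large values of the statistic, and the step-type cell probabilities, the seed and the helper names `chi_square` and `interpolated_power` are all illustrative assumptions.

```python
import numpy as np

def chi_square(counts, probs):
    """Pearson chi-square for each row of counts against null cell probabilities."""
    expected = counts.sum(axis=1, keepdims=True) * probs
    return ((counts - expected) ** 2 / expected).sum(axis=1)

def interpolated_power(null_stats, alt_stats, alpha=0.05):
    """Estimate power at exactly alpha by linear interpolation (assumed scheme)."""
    # Round so that exact ties are not split by floating-point noise.
    null_sorted = np.sort(np.round(null_stats, 10))
    alt_sorted = np.sort(np.round(alt_stats, 10))
    crit = np.unique(null_sorted)  # achievable critical values, ascending
    # Rule: reject when the statistic is >= the critical value.
    size = 1.0 - np.searchsorted(null_sorted, crit, side="left") / null_sorted.size
    power = 1.0 - np.searchsorted(alt_sorted, crit, side="left") / alt_sorted.size
    # Reverse so sizes increase, and add the "never reject" point (size 0, power 0).
    sizes = np.concatenate(([0.0], size[::-1]))
    powers = np.concatenate(([0.0], power[::-1]))
    return float(np.interp(alpha, sizes, powers))

rng = np.random.default_rng(0)                 # arbitrary seed for repeatability
null_p = np.full(10, 0.1)                      # uniform null over ten cells
alt_p = np.array([0.05] * 5 + [0.15] * 5)      # assumed step-type alternative
n, reps = 20, 10_000                           # 2 per cell, 10,000 samples

null_stats = chi_square(rng.multinomial(n, null_p, size=reps), null_p)
alt_stats = chi_square(rng.multinomial(n, alt_p, size=reps), null_p)
power = interpolated_power(null_stats, alt_stats)
print(f"Estimated power of chi-square at the 5% level: {power:.3f}")
```

The same `interpolated_power` function applies unchanged to simulated KS or W² values, since it only needs the two arrays of statistics.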
4 Results
4.1 Decreasing alternative distribution
For the smaller sample sizes, Figure 1 shows that the two empirical distribution
function (EDF) type test statistics, W² and KS, have greater power than the χ² test
statistic. It should be noted that the powers of all three test statistics are very high
for sample sizes of at least five observations per cell under the uniform null
distribution (N ≥ 50).
The major cumulative difference between the uniform null distribution and this
particular decreasing trend type alternative distribution occurs at the initial cells,
with the largest difference occurring at the second cell. As they are based on the
cumulative difference between the observed and expected frequencies, the two EDF
test statistics are greatly affected by larger cumulative differences in the earlier cells.
4.2 Step type alternative distribution
The cumulative difference between the uniform null and this step type distribution
continues to increase up to the fifth cell. As a result, Figure 2 shows the EDF type
test statistics to be more powerful again, for the same reasons advanced in Section 4.1.
The powers of KS and W² are approximately equal over all sample sizes. For sample
sizes of at least 10 per cell under the null distribution, the powers of all three test
statistics are approximately equal and very high.
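The role of the cumulative difference can be checked numerically. The sketch below assumes a step-type alternative with five cells at 0.05 followed by five at 0.15; these probabilities are an illustrative choice, and the variable names are hypothetical.

```python
import numpy as np

null_p = np.full(10, 0.1)                     # uniform null over ten cells
step_p = np.array([0.05] * 5 + [0.15] * 5)    # assumed step-type alternative

# Cumulative difference between alternative and null cell probabilities.
cum_diff = np.cumsum(step_p - null_p)

# Cell (1-based) at which the magnitude of the cumulative difference peaks.
largest_cell = int(np.argmax(np.abs(cum_diff))) + 1
print(np.round(cum_diff, 2), largest_cell)
```

Under this assumption the magnitude grows by 0.05 per cell until the fifth cell and then shrinks back to zero, which is the pattern that KS and W² are built to detect.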
Fig. 1. Power of KS, W² and χ² for a uniform null and decreasing alternative
4.3 Triangular alternative distribution
At the earlier cells, the magnitude of the cumulative differences between the uniform
null and the triangular alternative distribution is not as large as the differences in
Sections 4.1 and 4.2. Figure 3 shows that the power of W² and KS is less than that of
the χ² test statistic for most of the sample sizes, the exception being the larger
sample sizes, where the power of all three test statistics is approximately equal. It
is clear from Figure 3 that overall the χ² test statistic is the more powerful for this
particular alternative distribution.
Fig. 2. Power of KS, W² and χ² for a uniform null and step type alternative
4.4 Platykurtic alternative distribution
As the cumulative differences between the uniform null and this platykurtic
alternative distribution are not large, the KS and W² test statistics are shown in
Figure 4 to have very low power for all sample sizes. However, it is important to
note that although the power of the χ² test statistic is much greater than that of the
two EDF test statistics for the larger sample sizes, its power for smaller sample sizes
is also very poor. The χ² test statistic is the only one of the three in this power study
that could be recommended, but for the smaller sample sizes it has low power.
4.5 Leptokurtic alternative distribution
The powers of KS and W² are shown in Figure 5 to be lower than that of the χ² test
statistic for the smaller sample sizes of up to three per cell under the uniform null
distribution. However, for sample sizes of at least five per cell, the powers of all
three test statistics are approximately the same and very high for this leptokurtic
alternative distribution. Overall, the χ² test statistic could be used with higher power
than the two EDF test statistics for the leptokurtic alternative distribution,
particularly for smaller sample sizes.
Fig. 3. Power of KS, W² and χ² for a uniform null and triangular alternative
Fig. 4. Power of KS, W² and χ² for a uniform null and platykurtic alternative
4.6 Bimodal alternative distribution
As was the case in Section 4.4 with the platykurtic alternative distribution, the
powers of the EDF test statistics are shown in Figure 6 to be very low regardless of
the sample size, and the power of the χ² test statistic is also shown to be very low for
sample sizes less than ten per cell under the null distribution. It is clear that, unless
the sample size is quite large, none of the three test statistics can be used with
high power for a bimodal alternative distribution.
Fig. 5. Power of KS, W² and χ² for a uniform null and leptokurtic alternative
5 Conclusions and Recommendations
The results from Section 4 indicate that none of the three test statistics can be
recommended as having high power for all situations. However, these results and the
summary below (Table 2) should give applied researchers some understanding of
why, with respect to power, blindly selecting the χ² test statistic may
not always be the ideal approach in determining ‘fit’.
Table 2. Summary of powers for various alternative distributions
Alternative Distribution  Ranking of the Powers
Decreasing                W² > KS > χ²
Step                      W² ≈ KS > χ²
Triangular                χ² > KS ≈ W²
Platykurtic               χ² > KS ≈ W²
Leptokurtic               χ² > KS > W²
Bimodal                   χ² > KS > W²
Fig. 6. Power of KS, W² and χ² for a uniform null and bimodal alternative
