
A comparison of the powers of the Chi-Square test statistic with the discrete Kolmogorov-Smirnov and Cramér-von Mises test statistics

Michael Steele¹ and Janet Chaseling²

¹ School of Mathematical and Physical Sciences, James Cook University, Australia, Mike.Steele@jcu.edu.au
² Australian School of Environmental Studies, Griffith University, Australia, J.Chaseling@griffith.edu.au

Summary. This paper provides extensive simulated power studies of three major goodness-of-fit test statistics for a uniform null distribution against a variety of alternative distributions. Recommendations are made so that applied researchers can use these test statistics with high power.

Key words: power, goodness-of-fit, ordinal, Kolmogorov-Smirnov

1 Introduction

Researchers from many disciplines use goodness-of-fit test statistics for discrete data, but studies of their power are quite limited. Contributions to the study of the power of test statistics for discrete data have been made by [CLS94], [PS77], [Fro96], [Ste02] and [SC06]. The Chi-Square test statistic is the main choice of applied researchers; however, this paper shows that in some situations this choice may come at the expense of power. The powers of the discrete Kolmogorov-Smirnov and Cramér-von Mises test statistics, discussed in Section 2, are compared with that of the Chi-Square test statistic for a uniform null distribution against several alternative distributions defined in Section 3. The results of the power study are presented and discussed in Section 4, with concluding comments and recommendations on the most powerful of the three test statistics in Section 5.

2 The test statistics

The three test statistics used in this power study are the discrete Kolmogorov-Smirnov [PS77], the discrete Cramér-von Mises [CLS94] and Pearson's Chi-Square [Pea00], as defined in equations 2.1, 2.2 and 2.3 respectively.
$$KS = \max_{1 \le i \le k} |Z_i| \qquad (2.1)$$

$$W^2 = N^{-1} \sum_{i=1}^{k} Z_i^2 \, p_i \qquad (2.2)$$

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (2.3)$$

where k is the number of cells, N is the sample size, p_i is the probability for cell i, O_i and E_i are the observed and expected frequencies for cell i, and Z_i is the cumulative sum of the differences between the observed and expected frequencies up to and including cell i:

$$Z_i = \sum_{j=1}^{i} (O_j - E_j)$$

3 Method of the power study

For consistency, a uniform null distribution over ten cells is used against each of the alternative distributions defined in Table 1. The power of each of the three test statistics is approximated for total sample sizes of 10, 20, 30, 50, 100 and 200 (representing 1, 2, 3, 5, 10 and 20 observations per cell under the null distribution). Power is estimated using 10000 simulated random samples.

Table 1. Alternative distributions for the power study (cell probabilities for cells 1 to 10, to 2 decimal places)

Description       1     2     3     4     5     6     7     8     9    10
Decreasing     0.32  0.13  0.10  0.08  0.07  0.07  0.06  0.06  0.05  0.05
Step           0.05  0.05  0.05  0.05  0.05  0.15  0.15  0.15  0.15  0.15
Triangular     0.17  0.13  0.10  0.07  0.03  0.03  0.07  0.10  0.13  0.17
Platykurtic    0.04  0.11  0.11  0.12  0.12  0.12  0.12  0.11  0.11  0.04
Leptokurtic    0.05  0.05  0.05  0.05  0.30  0.30  0.05  0.05  0.05  0.05
Bimodal        0.05  0.11  0.17  0.11  0.06  0.06  0.11  0.17  0.11  0.05

Because the null distribution of each test statistic is generated from these simulated random samples of discrete data, it is itself discrete. This means that a critical value, and hence a corresponding power, at a significance level of exactly 5% may not be attainable. Again for consistency, the power is linearly interpolated to give an estimate of the power at the fixed significance level of 5%.

4 Results

4.1 Decreasing alternative distribution

For the smaller sample sizes, Figure 1 shows that the two EDF (empirical distribution function) type test statistics, W² and KS, have greater power than the χ² test statistic.
It should be noted that the powers of all three test statistics are very high for sample sizes of at least five observations per cell under the uniform null distribution (N ≥ 50). The major cumulative difference between the uniform null distribution and this particular decreasing trend alternative occurs in the initial cells, with the largest difference occurring at the second cell. Because they are based on the cumulative differences between the observed and expected frequencies, the two EDF test statistics are strongly affected by large cumulative differences in the earlier cells.

Fig. 1. Power of KS, W² and χ² for a uniform null and decreasing alternative

4.2 Step type alternative distribution

The cumulative difference between the uniform null and this step type distribution continues to increase in magnitude up to the fifth cell. Consequently, Figure 2 again shows the EDF type test statistics to be more powerful, for the same reasons advanced in Section 4.1. The powers of KS and W² are approximately equal over all sample sizes. For sample sizes of at least 10 per cell under the null distribution, the powers of all three test statistics are approximately equal and very high.

Fig. 2. Power of KS, W² and χ² for a uniform null and step type alternative

4.3 Triangular alternative distribution

In the earlier cells, the magnitude of the cumulative differences between the uniform null and the triangular alternative distribution is not as large as in Sections 4.1 and 4.2. Figure 3 shows the power of W² and KS to be less than that of the χ² test statistic for most of the sample sizes, the exception being the larger sample sizes, where the powers of all three test statistics are approximately equal. Overall, it is clear from Figure 3 that the χ² test statistic is the most powerful of the three for this particular alternative distribution.
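The simulation described in Section 3, together with the cumulative-difference argument used in Sections 4.1 to 4.3, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: all function and variable names (UNIFORM, TABLE_1, statistics, interpolated_power, estimate_power) are hypothetical; large values of each statistic are assumed to lead to rejection; the rounded Table 1 rows are renormalised to sum to one (the decreasing row sums to 0.99); the simulated null distribution is assumed to use the same number of samples as the alternative; and the interpolation to 5% is assumed to be linear between the two simulated critical values whose achieved significance levels bracket 5%.

```python
import numpy as np

# Uniform null over k = 10 cells and the Table 1 alternatives (rounded to 2 d.p.).
UNIFORM = np.full(10, 0.1)
TABLE_1 = {
    "Decreasing":  [0.32, 0.13, 0.10, 0.08, 0.07, 0.07, 0.06, 0.06, 0.05, 0.05],
    "Step":        [0.05, 0.05, 0.05, 0.05, 0.05, 0.15, 0.15, 0.15, 0.15, 0.15],
    "Triangular":  [0.17, 0.13, 0.10, 0.07, 0.03, 0.03, 0.07, 0.10, 0.13, 0.17],
    "Platykurtic": [0.04, 0.11, 0.11, 0.12, 0.12, 0.12, 0.12, 0.11, 0.11, 0.04],
    "Leptokurtic": [0.05, 0.05, 0.05, 0.05, 0.30, 0.30, 0.05, 0.05, 0.05, 0.05],
    "Bimodal":     [0.05, 0.11, 0.17, 0.11, 0.06, 0.06, 0.11, 0.17, 0.11, 0.05],
}


def cumulative_differences(alt_probs, null_probs=UNIFORM):
    """Expected Z_i / N under an alternative: running sum of (alternative - null)."""
    return np.cumsum(np.asarray(alt_probs, float) - np.asarray(null_probs, float))


def statistics(counts, null_probs=UNIFORM):
    """KS (2.1), W^2 (2.2) and chi^2 (2.3) for each row of a samples-by-cells array."""
    counts = np.atleast_2d(counts).astype(float)
    p = np.asarray(null_probs, float)
    n = counts.sum(axis=1, keepdims=True)      # sample size N of each row
    expected = n * p                            # E_i = N p_i
    z = np.cumsum(counts - expected, axis=1)    # Z_i
    return {
        "KS": np.abs(z).max(axis=1),
        "W2": (z ** 2 * p).sum(axis=1) / n[:, 0],
        "chi2": ((counts - expected) ** 2 / expected).sum(axis=1),
    }


def interpolated_power(null_stats, alt_stats, alpha=0.05):
    """Power at level alpha, interpolated linearly between bracketing achieved sizes.

    Assumption: reject H0 when the statistic is >= c, with c taken from the
    simulated null values; np.inf is appended as a "never reject" critical value.
    """
    null_sorted = np.sort(np.round(null_stats, 10))  # rounding merges float near-ties
    alt_sorted = np.sort(np.round(alt_stats, 10))
    crit = np.append(np.unique(null_sorted), np.inf)
    size = 1.0 - np.searchsorted(null_sorted, crit, side="left") / null_sorted.size
    power = 1.0 - np.searchsorted(alt_sorted, crit, side="left") / alt_sorted.size
    # size[0] == 1 > alpha and size[-1] == 0, so j is always at least 1.
    j = np.nonzero(size <= alpha)[0][0]
    a_lo, a_hi = size[j], size[j - 1]   # a_lo <= alpha < a_hi
    b_lo, b_hi = power[j], power[j - 1]
    return float(b_lo + (alpha - a_lo) * (b_hi - b_lo) / (a_hi - a_lo))


def estimate_power(alt_probs, n, null_probs=UNIFORM, n_sims=10_000, alpha=0.05, seed=1):
    """Simulated power of KS, W^2 and chi^2 for a total sample size n."""
    rng = np.random.default_rng(seed)
    alt = np.asarray(alt_probs, float)
    alt = alt / alt.sum()  # assumption: renormalise the rounded Table 1 row
    null_stats = statistics(rng.multinomial(n, null_probs, size=n_sims), null_probs)
    alt_stats = statistics(rng.multinomial(n, alt, size=n_sims), null_probs)
    return {name: interpolated_power(null_stats[name], alt_stats[name], alpha)
            for name in null_stats}


if __name__ == "__main__":
    for name, probs in TABLE_1.items():
        print(name, round(float(np.abs(cumulative_differences(probs)).max()), 2))
    for n in (10, 20, 30, 50, 100, 200):
        print(n, estimate_power(TABLE_1["Decreasing"], n))
```

Applied to the rows of Table 1, the largest expected |Z_i|/N is about 0.25 for the decreasing and step alternatives, compared with about 0.10 for the triangular, 0.06 for the platykurtic and 0.05 for the bimodal alternatives, which is consistent with the explanations in Sections 4.1 to 4.3. The simulated powers depend on the seed and on the assumed interpolation rule, so they need not match Figures 1 to 6 exactly.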
4.4 Platykurtic alternative distribution

As the cumulative differences between the uniform null and this platykurtic alternative distribution are not large, Figure 4 shows the KS and W² test statistics to have very low power for all sample sizes. It is important to note, however, that although the χ² test statistic is much more powerful than the two EDF test statistics for the larger sample sizes, its power for the smaller sample sizes is also very poor. The χ² test statistic is the only one of the three test statistics in this power study that could be recommended, but for the smaller sample sizes it has low power.

4.5 Leptokurtic alternative distribution

Figure 5 shows the powers of KS and W² to be lower than that of the χ² test statistic for the smaller sample sizes of up to three observations per cell under the uniform null distribution. However, for sample sizes of at least five per cell, the powers of all three test statistics are approximately the same and very high for this leptokurtic alternative distribution. Overall, the χ² test statistic can be used with higher power than the two EDF test statistics for the leptokurtic alternative distribution, particularly for smaller sample sizes.

Fig. 3. Power of KS, W² and χ² for a uniform null and triangular alternative

Fig. 4. Power of KS, W² and χ² for a uniform null and platykurtic alternative

4.6 Bimodal alternative distribution

As was the case in Section 4.4 with the platykurtic alternative distribution, Figure 6 shows the powers of the EDF test statistics to be very low regardless of the sample size, and the power of the χ² test statistic is also very low for sample sizes of less than ten per cell under the null distribution.
It is clear that unless the sample size is quite large, none of the three test statistics can be used with high power against a bimodal alternative distribution.

Fig. 5. Power of KS, W² and χ² for a uniform null and leptokurtic alternative

5 Conclusions and Recommendations

The results in Section 4 indicate that none of the three test statistics can be recommended as having high power in all situations. However, these results and the summary in Table 2 should give applied researchers some understanding of why, with respect to power, blindly selecting the χ² test statistic may not always be the ideal approach to determining 'fit'.

Table 2. Summary of powers for various alternative distributions

Alternative distribution   Ranking of the powers
Decreasing                 W² > KS > χ²
Step                       W² ≈ KS > χ²
Triangular                 χ² > KS ≈ W²
Platykurtic                χ² > KS ≈ W²
Leptokurtic                χ² > KS > W²
Bimodal                    χ² > KS > W²

Fig. 6. Power of KS, W² and χ² for a uniform null and bimodal alternative

References

[CLS94] Choulakian, V., Lockhart, R.A., Stephens, M.A.: Cramér-von Mises test statistics for discrete distributions. The Canadian Journal of Statistics 22, 125-137 (1994)
[Fro96] From, S.G.: A new goodness of fit test for the equality of multinomial cell probabilities verses trend alternatives. Communications in Statistics - Theory and Methods 25, 3167-3183 (1996)
[Pea00] Pearson, K.: On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50, 157-175 (1900)
[PS77] Pettitt, A.N., Stephens, M.A.: The Kolmogorov-Smirnov goodness-of-fit statistic with discrete and grouped data. Technometrics 19, 205-210 (1977)
[Ste02] Steele, M.: The power of categorical goodness-of-fit test statistics. Ph.D. thesis, Griffith University (2002)
[SC06] Steele, M., Chaseling, J.: Powers of discrete goodness-of-fit test statistics for a uniform null against a selection of alternative distributions. Communications in Statistics - Simulation and Computation (in press) (2006)