Selected Topics in NONPARAMETRIC STATISTICS
Gabino P. Petilos, Ph.D.

INTRODUCTION

Test statistics that require assumptions about the parameters of the populations from which samples are drawn are called parametric statistical tests. Examples are the z-test, the t-test, and the F-test. These tests assume that the population from which the sample was drawn is normally distributed and that the sample size is large enough for this assumption to be reasonable. Moreover, the dependent variable must be measured on at least an interval scale.

When these assumptions are not met, the use of parametric tests is questionable because the conclusion would be contingent on assumptions that may not hold. It is also often the case that the dependent variable is not measured on at least an interval scale. For instance, we might want to identify the factors that explain "passing or not passing the Licensure Examination for Teachers (LET)"; this variable is categorical. As another example, a researcher might wish to determine whether three groups of middle-level managers differ in their management styles; management style is difficult to measure on an interval scale because it is inherently categorical. There are also many research problems in which the population of interest has few elements, so that normality of the distribution of the data would be difficult to justify. In such situations, parametric statistics would probably not be appropriate because the data may not follow the normal distribution.

These considerations should motivate us to learn other statistical tests that are useful when parametric tests cannot be meaningfully applied. Such tests are generally called nonparametric statistics and are applicable even when the data gathered are as low as nominal. Nonparametric tests are also called distribution-free statistics because they do not require the data to follow a specific distribution. In this context, the test of hypothesis focuses on the equivalence of distributions rather than the equality of parameters.

Some advantages of nonparametric statistics:
1. They can be used with very small sample sizes.
2. They make fewer assumptions about the data and hence may be more relevant to the research situation at hand.
3. They can analyze data that are inherently ranks, as well as numerical scores that have only the strength of ranks.
4. They can treat data that are simply classificatory or categorical, i.e., measured on a nominal scale.
5. They are easier to learn than parametric tests and their results can be interpreted directly.

However, nonparametric statistics are less powerful than their parametric counterparts: when the assumptions of normality and homogeneity of variances are satisfied, the parametric test is more powerful than the corresponding nonparametric test. Nonparametric tests are also not as systematic as parametric tests, and the statistical tables they require are scattered across sources and come in different formats.

Despite these disadvantages, nonparametric statistical tests are recommended whenever parametric tests are not applicable.
Knowledge of the most commonly used nonparametric statistical tests, which are presented in this material, is therefore necessary. Once these are learned, it will be easy to learn other nonparametric tests not found in this material. All nonparametric tests discussed in this module are based on the book of Siegel and Castellan (1988), and some of the important statistical tables from that reference are reproduced here for the convenience of the readers. In presenting a particular test, the readers are given the data requirement and the function of the test, followed by a detailed illustration of how the test is applied. The same data are then analyzed using the Statistical Package for the Social Sciences (SPSS) software, and the results are shown after each example.

In interpreting the SPSS output, we emphasize the use of the p-value, or significance level of the test, which is commonly used in reporting results of hypothesis testing. The p-value is the probability of obtaining a value of the test statistic as extreme as, or more extreme than, the one actually observed when the null hypothesis is true. In general, when the p-value associated with a test statistic is smaller than or equal to the chosen level of significance α, the null hypothesis is rejected. This complements what we do when analyzing data manually, where the null hypothesis is rejected whenever the absolute value of the computed test statistic is greater than or equal to the corresponding tabular (critical) value. The equivalence of the two rules is discussed in the next paragraph and illustrated in Fig. 1.

[Fig. 1. Standard normal curve showing the critical value z_α, the computed value z_computed to its right, and the p-value as the shaded area to the right of z_computed.]

In this figure, the area to the right of z_α is equal to the level of significance α; the value z_α is called the critical or tabular value. When the data are analyzed manually, the value of the test statistic is computed and compared with this critical value. When the computed value is greater than or equal to the tabular value, the null hypothesis is rejected; this means the computed value lies to the right of the critical value, so the area to the right of the computed value, which is the p-value or significance level of the test, is smaller than the given level of significance. Hence, rejecting the null hypothesis whenever the p-value is less than or equal to α gives the same decision.

The nonparametric tests included in this material are those most commonly used for comparing independent samples and for comparing dependent or paired samples.
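The equivalence between the critical-value rule and the p-value rule can also be checked numerically. The short sketch below is not from the original material; it is a minimal illustration in Python using SciPy's standard normal distribution, with a hypothetical computed z of 2.10.

```python
# Minimal sketch: rejecting H0 because z_computed >= z_critical is the same decision
# as rejecting it because p_value <= alpha (right-tailed test).
from scipy.stats import norm

alpha = 0.05
z_critical = norm.ppf(1 - alpha)   # about 1.645
z_computed = 2.10                  # hypothetical computed test statistic
p_value = norm.sf(z_computed)      # area to the right of the computed z

print(z_computed >= z_critical)    # True
print(p_value <= alpha)            # True -- same decision
```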
MANN-WHITNEY U TEST (OR WILCOXON RANK-SUM TEST)
DATA REQUIREMENT: RANKED DATA (ORDINAL)
FUNCTION: USED TO COMPARE TWO INDEPENDENT SAMPLES

Example: Do male or female students endorse a stricter norm of honesty? A sample of 15 students, 7 males and 8 females, was given brief descriptions of 20 situations that might be considered dishonest (for example, glancing at somebody's paper during a test, copying someone's solution, etc.) and asked to classify each on a scale from "very honest" to "not honest at all". Summative scores ranging from 0 to 50 were obtained, with higher scores indicating endorsement of a stricter norm of honesty. Is the difference between male and female students statistically significant? Use the Wilcoxon-Mann-Whitney test at α = 0.05.

      MALE                         FEMALE
      Score   Rank                 Score   Rank
  1     29     5               1     36     8.5
  2     36     8.5             2     42    13
  3     24     2               3     46    15
  4     26     3               4     41    12
  5     33     7               5     20     1
  6     27     4               6     43    14
  7     31     6               7     39    10
                               8     40    11
  Sum         35.5             Sum         84.5

The ranks are obtained by combining the two data sets, assigning 1 to the lowest score, 2 to the next higher score, and so on. If there are tied scores, each receives the average of the ranks that would have been assigned had the scores been distinct. After assigning the ranks, we split the scores back into their original groups together with their corresponding ranks.

H0: There is no significant difference in the perceived norm of honesty between male and female students.
H1: Female students endorse a stricter norm of honesty than male students (i.e., the median score on the norm-of-honesty scale is significantly higher for female students than for male students).

Let m and n be the sample sizes of the smaller and larger group, respectively. Focusing on the smaller group (males), let Wx be the sum of the ranks of this group; thus Wx = 35.5. Using Appendix Table J (pp. 339-346, Siegel & Castellan), we locate the sub-table for m = 7. Since the alternative hypothesis implies that Wx should be small, we use the left (lower) tail of the distribution. When the null hypothesis is true, the probability associated with Wx ≤ 35.5 lies between 0.0070 and 0.0103, which is significant at the .05 level of significance. We therefore conclude that female students endorse a stricter norm of honesty than male students.

When m > 10 or n > 10, Appendix Table J cannot be used, but we can use the normal approximation since the sampling distribution of Wx rapidly approaches the normal distribution. Consider the following example.

      MALE                         FEMALE
      Score   Rank                 Score   Rank
  1     29     9.5             1     36    13.5
  2     36    13.5             2     42    25
  3     24     3               3     46    28
  4     26     5               4     41    23.5
  5     38    18.5             5     48    29
  6     16     1               6     41    23.5
  7     37    15.5             7     38    18.5
  8     27     6.5             8     45    27
  9     38    18.5             9     38    18.5
 10     28     8              10     37    15.5
 11     25     4              11     29     9.5
 12     33    12              12     20     2
 13     27     6.5            13     43    26
 14     31    11              14     39    21
                              15     40    22
  Sum        132.5             Sum        302.5

From this summary, Wx = 132.5 (the sum of the ranks of the smaller group). Since the sample sizes are large, we use the normal approximation. Note that m = 14 and n = 15, so that N = m + n = 29.

    Mean of Wx:      μ_Wx = m(N + 1)/2 = 14(29 + 1)/2 = 14(30)/2 = 210
    Variance of Wx:  σ²_Wx = mn(N + 1)/12 = (14)(15)(29 + 1)/12 = 525,  so that s.d. = √525 = 22.913

The test statistic is given by

    z = (Wx ± 0.5 − m(N + 1)/2) / √( mn(N + 1)/12 )

where +0.5 is used for a left-tail probability and −0.5 for a right-tail probability. Thus, for the given data,

    z = (132.5 + 0.5 − 210) / 22.913 = −77 / 22.913 = −3.361.

At the α = .05 level of significance, the tabular z (one-tailed) is 1.645. Since the absolute value of the computed z exceeds the tabular value, the null hypothesis is rejected.

When there are tied scores, we compute a correction for ties before obtaining the standard deviation of the sampling distribution of Wx. The tied values are:

Grouping   Value   Rank   t_j
   1         27     6.5    2
   2         29     9.5    2
   3         36    13.5    2
   4         37    15.5    2
   5         38    18.5    4
   6         41    23.5    2

Thus

    Σ (t_j³ − t_j)/12 = (2³−2)/12 + (2³−2)/12 + (2³−2)/12 + (2³−2)/12 + (4³−4)/12 + (2³−2)/12
                      = (6 + 6 + 6 + 6 + 60 + 6)/12 = 90/12 = 7.5.

Therefore

    σ²_Wx = [ mn / (N(N − 1)) ] [ (N³ − N)/12 − Σ (t_j³ − t_j)/12 ]
          = [ (14)(15) / (29)(28) ] [ (29³ − 29)/12 − 7.5 ] = 523.0603,

so that σ_Wx = √523.0603 = 22.8705.
Finally, since the test is one-tailed (to the left), the value of the test statistic is

    z = (Wx + 0.5 − μ_Wx) / σ_Wx = (132.5 + 0.5 − 210) / 22.8705 = −3.367.

The corresponding tabular value of z at α = 0.05 (one-tailed) is 1.645. Since the absolute value of the computed z is greater than the tabular value, we reject the null hypothesis. We therefore conclude that the perceived norms of honesty of male and female students are significantly different; in particular, female students tend to endorse a stricter norm of honesty than male students.

REMARKS: The effect of correcting for ties is to increase the magnitude of z, making it more significant. If no correction for ties is employed, the value of z is conservative, since its associated probability will be slightly inflated. Siegel and Castellan recommend correcting for ties only when the proportion of ties is quite large.

-------------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS

Wilcoxon Rank Sum Test

Ranks
SCORE   GROUP      N    Mean Rank   Sum of Ranks
        FEMALE    15      20.17        302.50
        MALE      14       9.46        132.50
        Total     29

Test Statistics(b)                       SCORE
Mann-Whitney U                          27.500
Wilcoxon W                             132.500
Z                                       -3.389
Asymp. Sig. (2-tailed)                    .001
Exact Sig. [2*(1-tailed Sig.)]            .000 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP

Note: the SPSS Z of -3.389 is obtained without adding 0.5 to the numerator of the test statistic. The very small p-value (.001) leads to the rejection of the null hypothesis.
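The same large-sample comparison can be reproduced in Python. The following is a minimal sketch using the example data above; scipy.stats.mannwhitneyu reports the U statistic (27.5 here, as in the SPSS output) rather than the rank sum Wx, and its asymptotic method applies the tie and continuity corrections automatically. The method argument assumes a reasonably recent version of SciPy.

```python
# Minimal sketch of the Wilcoxon-Mann-Whitney test for the male/female example above.
from scipy.stats import mannwhitneyu

male   = [29, 36, 24, 26, 38, 16, 37, 27, 38, 28, 25, 33, 27, 31]
female = [36, 42, 46, 41, 48, 41, 38, 45, 38, 37, 29, 20, 43, 39, 40]

# One-tailed test of H1: male scores tend to be lower than female scores.
u_stat, p_value = mannwhitneyu(male, female, alternative="less", method="asymptotic")
print(u_stat, p_value)   # U = 27.5, matching the Mann-Whitney U in the SPSS output
```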
WILCOXON SIGNED RANKS TEST
DATA REQUIREMENT: ORDINAL OR RANKED DATA
FUNCTION: USED TO COMPARE TWO DEPENDENT OR CORRELATED SAMPLES

Example: It is claimed that jogging can improve the self-esteem of a person in less than 3 weeks. The self-esteem scores of 15 students who subscribed to this program were recorded before and after the treatment. Is there a significant difference between the median self-esteem of the students before and after subscribing to the program?

STUDENT        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
SCORE BEFORE  58  61  61  69  64  68  70  62  56  59  72  75  73  67  65
SCORE AFTER   63  58  62  69  70  75  73  67  58  55  80  80  69  70  74

Let di be the difference score for any matched pair, here di = (score after) − (score before). In applying the Wilcoxon signed-ranks test for equality of the central tendencies, we first disregard all differences equal to zero and rank the remaining |di| without regard to sign: rank 1 goes to the smallest |di|, rank 2 to the next smallest, and so on. When two or more differences have the same absolute value, each receives the average of the ranks that would have been assigned had the differences been distinguishable. Finally, each rank is given the sign of its difference. The resulting table is shown below.

STUDENT        1   2   3   4   5   6   7   8   9   10  11  12  13   14  15
SCORE BEFORE  58  61  61  69  64  68  70  62  56   59  72  75  73   67  65
SCORE AFTER   63  58  62  69  70  75  73  67  58   55  80  80  69   70  74
d              5  -3   1   0   6   7   3   5   2   -4   8   5  -4    3   9
Rank of d      9  -4   1  --  11  12   4   9   2 -6.5  13   9 -6.5   4  14

The Wilcoxon signed-ranks statistic is T+, the sum of the ranks of the positive differences. This statistic is used to test the difference between the two conditions.

H0: The improvement in self-esteem does not depend on the jogging program (the sum of the positive ranks and the sum of the negative ranks are expected to be equal).
H1: The improvement in self-esteem depends on the program (the sum of the positive ranks differs from the sum of the negative ranks).

For small samples (N ≤ 15), Appendix Table H on pp. 332-334 of Siegel and Castellan may be used; the table gives one-tailed probabilities, so for a two-tailed test the table entry is simply doubled. From the given data, N = 14 (since one difference is 0) and T+ = 88. For N = 14 and T+ = 88, the tabled probability is 0.0123, so the two-tailed probability is 2(0.0123) = 0.0246, which is less than α = .05. Hence we reject the null hypothesis.

For N > 15, Appendix Table H cannot be used. The sum of the positive ranks, T+, is, however, approximately normally distributed with

    Mean      μ_T+ = N(N + 1)/4
    Variance  σ²_T+ = N(N + 1)(2N + 1)/24,

so that the test statistic z = (T+ − μ_T+)/σ_T+ is used to test the difference between the two conditions. If there are tied ranks, the variance is adjusted to account for the decrease in the variability of T+:

    σ²_T+ = [ N(N + 1)(2N + 1) − (1/2) Σ t_j(t_j − 1)(t_j + 1) ] / 24

where g is the number of groups of tied ranks and t_j is the number of tied ranks in group j.

To illustrate the normal approximation, consider the same data with two more students included, so that there are 17 pairs:

STUDENT        1   2   3   4   5   6   7   8   9   10  11  12  13   14  15  16  17
SCORE BEFORE  58  61  61  69  64  68  70  62  56   59  72  75  73   67  65  60  59
SCORE AFTER   63  58  62  69  70  75  73  67  58   55  80  80  69   70  74  70  70
d              5  -3   1   0   6   7   3   5   2   -4   8   5  -4    3   9  10  11
Rank of d      9  -4   1  --  11  12   4   9   2 -6.5  13   9 -6.5   4  14  15  16

For these data, N = 16 (since one difference is 0) and T+ = 119. The mean and variance are

    Mean     = N(N + 1)/4 = 16(16 + 1)/4 = 68
    Variance = N(N + 1)(2N + 1)/24 = 16(17)(33)/24 = 374.

Since there are tied ranks, we compute the correction term:

Grouping   Rank   t_j
   1        4      3
   2        6.5    2
   3        9      3

    (1/2) Σ t_j(t_j − 1)(t_j + 1) = (1/2)[(3)(2)(4) + (2)(1)(3) + (3)(2)(4)] = (1/2)(54) = 27.

Therefore

    σ²_T+ = [ N(N + 1)(2N + 1) − 27 ] / 24 = (8976 − 27)/24 = 372.875,   s.d. = √372.875 = 19.310.

Therefore

    z = (T+ − μ_T+)/σ_T+ = (119 − 68)/19.310 = 2.641.

The tabular value of z at α = .05 (one-tailed) is 1.645. Since the computed value of z exceeds the tabular value, the null hypothesis is rejected. We therefore conclude that the improvement in the self-esteem of a person depends on the program.

-----------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS

Wilcoxon Signed Ranks Test

Ranks                                       N    Mean Rank   Sum of Ranks
Score After - Score Before
  Negative Ranks (Score After < Before)     3       5.67         17.00
  Positive Ranks (Score After > Before)    13       9.15        119.00
  Ties (Score After = Score Before)         1
  Total                                    17

Test Statistics(b)            Score After - Score Before
Z                                        -2.641 (a)
Asymp. Sig. (2-tailed)                     .008
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Note: SPSS bases Z on the negative ranks, so its sign is negative; its magnitude agrees with the tie-corrected hand computation. The small p-value (.008) leads to the rejection of the null hypothesis.
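A minimal sketch of the same paired comparison in Python is shown below, assuming the 17-student data above. By default, scipy.stats.wilcoxon drops zero differences, performs a two-sided test, and falls back to a tie-corrected normal approximation when ties prevent an exact distribution, so its p-value should be close to the SPSS value of .008.

```python
# Minimal sketch of the Wilcoxon signed-ranks test for the 17-student example above.
from scipy.stats import wilcoxon

before = [58, 61, 61, 69, 64, 68, 70, 62, 56, 59, 72, 75, 73, 67, 65, 60, 59]
after  = [63, 58, 62, 69, 70, 75, 73, 67, 58, 55, 80, 80, 69, 70, 74, 70, 70]

# Defaults: zero differences are dropped and a normal approximation is used
# when ties are present; the test is two-sided.
stat, p_value = wilcoxon(after, before)
print(stat, p_value)   # the smaller rank sum (17) and a p-value near .008
```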
KRUSKAL WALLIS ANOVA
DATA REQUIREMENT: ORDINAL OR RANKED DATA
FUNCTION: USED TO COMPARE THREE OR MORE INDEPENDENT SAMPLES

Suppose we want to compare the effectiveness of three methods of teaching science, namely lecture, modular, and computer-assisted instruction. A random sample of 15 students was randomly assigned to three groups. The scores of the five students in each group are shown below.

              Method of Teaching
G1 (Lecture)   G2 (Modular)   G3 (Computer Assisted)
     80             84                85
     81             80                86
     81             81                91
     80             81                82
     82             82                87

Ho: The median scores of the three groups of students exposed to the three teaching methods do not differ.
H1: The median scores of the three groups of students differ (non-directional alternative hypothesis).

Converting the scores into ranks (treating them as one set of scores), we get the following results:

              Method of Teaching
G1 (Lecture)   G2 (Modular)   G3 (Computer Assisted)
      2             11                12
      5.5            2                13
      5.5            5.5              15
      2              5.5               9
      9              9                14
Rj = 24         Rj = 33           Rj = 63
Mean rank = 4.8  Mean rank = 6.6   Mean rank = 12.6

Based on the mean ranks, the students exposed to computer-assisted instruction appear to have performed better than those exposed to the lecture and modular methods. We confirm this observation by conducting the Kruskal-Wallis ANOVA, since the three groups are independent samples and the data are ranks.

TEST STATISTIC:

    KW = [ 12 / (N(N + 1)) ] Σ n_j (mean rank of group j)² − 3(N + 1)
       = [ 12 / (15)(16) ] [ 5(4.8)² + 5(6.6)² + 5(12.6)² ] − 3(15 + 1)
       = (12/240)(1,126.8) − 48 = 8.34

The corresponding tabular value (Table O, at α = .05) is 5.78. Since the computed KW exceeds the tabular value, we reject the null hypothesis and conclude that the median scores of the students exposed to the three teaching methods are significantly different.

CORRECTED VALUE OF KW WHEN THERE ARE TIED OBSERVATIONS

In the original data there are 3 groups of tied scores: three scores are tied at 80, four are tied at 81, and three are tied at 82. The correction factor for ties is

    C = 1 − Σ (t_i³ − t_i) / (N³ − N) = 1 − [ (3³ − 3) + (4³ − 4) + (3³ − 3) ] / (15³ − 15)
      = 1 − 108/3360 = 0.967857.

The corrected value of KW is

    KW_corrected = KW / C = 8.34 / 0.967857 = 8.617,

which is still significant at α = .05.

Pairwise comparisons must then be done to determine where the differences lie. Two groups u and v are declared significantly different when

    |mean rank of u − mean rank of v| ≥ z_(α/k(k−1)) √[ (N(N + 1)/12)(1/n_u + 1/n_v) ]

where N is the total sample size, n_u and n_v are the sample sizes of the two groups being compared, k is the number of groups, and z_(α/k(k−1)) is the corresponding tabular value from Table AII (p. 320, Siegel and Castellan). If the sample sizes are equal, the critical difference is the same for every pair; in the given example n_u = n_v = 5 for all comparisons. At α = .05, α/k(k − 1) = .05/6 ≈ .0083 and the critical z from Table AII is 2.394, so the critical difference is

    2.394 √[ (15)(16)/12 × (1/5 + 1/5) ] = (2.394)(2.828) = 6.7713.

Any absolute difference between mean ranks that exceeds 6.7713 is therefore declared significant.
Using the mean ranks computed above, 4.8 (lecture), 6.6 (modular), and 12.6 (computer assisted), the absolute differences are:

    |12.6 − 4.8| = 7.8 > 6.7713   (significant)
    |12.6 − 6.6| = 6.0 < 6.7713   (not significant)
    |6.6 − 4.8|  = 1.8 < 6.7713   (not significant)

Based on the comparison test, only the effects of the lecture method and of computer-assisted instruction in teaching science are significantly different, in favor of the latter.

-----------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS

Kruskal-Wallis ANOVA

Ranks
SCORE   GRP                  N    Mean Rank
        lecture              5       4.80
        modular              5       6.60
        computer assisted    5      12.60
        Total               15

Test Statistics(a,b)           SCORE
Chi-Square (KW statistic)      8.617
df                                 2
Asymp. Sig.                     .013
a. Kruskal Wallis Test
b. Grouping Variable: GRP

--------------------------------------------------------------------------------------------------------------------------
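A minimal sketch of the same comparison in Python is shown below, assuming the three-group data above. scipy.stats.kruskal applies the tie correction automatically, so it should reproduce the corrected KW of 8.617 and a p-value of about .013, as in the SPSS output.

```python
# Minimal sketch of the Kruskal-Wallis test for the three teaching-method groups above.
from scipy.stats import kruskal

lecture = [80, 81, 81, 80, 82]
modular = [84, 80, 81, 81, 82]
cai     = [85, 86, 91, 82, 87]

h_stat, p_value = kruskal(lecture, modular, cai)   # tie correction applied automatically
print(h_stat, p_value)   # about 8.617 and .013, matching the SPSS output
```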
FRIEDMAN'S TWO WAY ANOVA
DATA REQUIREMENT: ORDINAL OR RANKED DATA
FUNCTION: USED TO COMPARE THREE OR MORE DEPENDENT SAMPLES (Repeated Measures Design)

Consider the data below: the scores of 18 matched sets of subjects on a standardized test in College Algebra, one subject in each set having been taught by each method.

SET   METHOD A (Modular)   METHOD B (Computer Assisted)   METHOD C (Lecture)
 1          18                      19                          16
 2          24                      23                          20
 3          19                      20                          16
 4          23                      25                          23
 5          24                      21                          18
 6          18                      22                          16
 7          25                      25                          25
 8          19                      24                          18
 9          21                      23                          20
10          22                      22                          17
11          19                      20                          17
12          23                      21                          16
13          24                      24                          18
14          20                      23                          21
15          21                      22                          17
16          19                      24                          17
17          25                      22                          16
18          18                      25                          20

For this problem, the hypotheses are:
Ho: The three methods of teaching are equally effective in improving the performance of the students in College Algebra.
H1: At least one teaching method is more effective than the others in improving the performance of the students in College Algebra.

To apply the Friedman test, we rank the scores row-wise, assigning rank 1 to the lowest score in each row and using average ranks for tied observations. The results of the ranking are shown below.

SET   METHOD A   METHOD B   METHOD C
 1       2          3          1
 2       3          2          1
 3       2          3          1
 4       1.5        3          1.5
 5       3          2          1
 6       2          3          1
 7       2          2          2
 8       2          3          1
 9       2          3          1
10       2.5        2.5        1
11       2          3          1
12       3          2          1
13       2.5        2.5        1
14       1          3          2
15       2          3          1
16       2          3          1
17       3          2          1
18       1          3          2
Rj     38.5        48.0       21.5

The value of Fr without correction for ties is

    Fr = [ 12 / (Nk(k + 1)) ] Σ R_j² − 3N(k + 1),   where N = number of sets and k = number of groups,
       = [ 12 / (18)(3)(3 + 1) ] (38.5² + 48² + 21.5²) − 3(18)(4) = 236.028 − 216 = 20.028.

To correct Fr for tied observations, we first record the sizes of the tied groups within rows. In the given data there are 45 "ties" of size 1 (all distinct values, read row-wise), 3 ties of size 2, and 1 tie of size 3. Hence

    Σ Σ t³ = 45(1³) + 3(2³) + 1(3³) = 96.

The value of Fr corrected for ties is

    Fr = [ 12 Σ R_j² − 3N²k(k + 1)² ] / [ Nk(k + 1) + (Nk − Σ Σ t³)/(k − 1) ]
       = [ 12(38.5² + 48² + 21.5²) − 3(18²)(3)(3 + 1)² ] / [ 18(3)(4) + (54 − 96)/(3 − 1) ]
       = (50,982 − 46,656) / (216 − 21) = 4,326 / 195 = 22.185.

The value of Fr uncorrected for ties is therefore slightly lower than the value of Fr corrected for ties.

The significance of this value is assessed using the chi-square distribution with degrees of freedom d.f. = k − 1. For d.f. = 2 and α = 0.05, the tabular chi-square value is 5.99. Since Fr exceeds the tabular value, we reject the null hypothesis and conclude that at least one method of teaching College Algebra is superior to another.

PAIRWISE COMPARISON

The test above is global: it only tells us that the three groups are not comparable; it does not tell us which two specific groups are significantly different from one another. To determine which pairs differ, we perform the pairwise comparison test suggested by Siegel and Castellan (pp. 180-181). First we compute the absolute differences of the rank sums for the three groups:

    |R_A − R_B| = |38.5 − 48|   =  9.5
    |R_A − R_C| = |38.5 − 21.5| = 17.0
    |R_B − R_C| = |48 − 21.5|   = 26.5

The difference of the rank sums is greatest between the subjects exposed to Method B and Method C, followed by those exposed to Method A and Method C. Each absolute difference is compared with the critical value

    z_(α/k(k−1)) √[ Nk(k + 1)/6 ].

If α = 0.05, then α/k(k − 1) = .0083, and from the normal table (by interpolation) the corresponding value of z is 2.394. Since √[ Nk(k + 1)/6 ] = √[ 18(3)(4)/6 ] = 6, the critical value for the pairwise comparison is 2.394(6) = 14.364. Since 17 > 14.364 and 26.5 > 14.364, we conclude that the groups exposed to Methods A and B each differ significantly from the group exposed to Method C; the groups exposed to Methods A and B did not differ significantly in their algebra scores.

On the basis of the pairwise comparison test, it could be said that teaching algebra using modules and using computers are better methods than the lecture method.

--------------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS

Friedman Test

Ranks          Mean Rank
Method A          2.14
Method B          2.67
Method C          1.19

Test Statistics(a)
N                         18
Chi-Square (Fr)       22.185
df                         2
Asymp. Sig.             .000
a. Friedman Test

------------------------------------------------------------------------------------------------------------------------
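A minimal sketch of the same analysis in Python is shown below, assuming the 18 matched sets above. scipy.stats.friedmanchisquare ranks within each row and, as far as the standard implementation goes, applies a tie correction, so its statistic should be close to the corrected Fr of 22.185 reported by SPSS.

```python
# Minimal sketch of the Friedman test for the 18 matched sets above.
from scipy.stats import friedmanchisquare

method_a = [18, 24, 19, 23, 24, 18, 25, 19, 21, 22, 19, 23, 24, 20, 21, 19, 25, 18]
method_b = [19, 23, 20, 25, 21, 22, 25, 24, 23, 22, 20, 21, 24, 23, 22, 24, 22, 25]
method_c = [16, 20, 16, 23, 18, 16, 25, 18, 20, 17, 17, 16, 18, 21, 17, 17, 16, 20]

fr_stat, p_value = friedmanchisquare(method_a, method_b, method_c)
print(fr_stat, p_value)   # should be close to the tie-corrected Fr of 22.185
```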
CHI-SQUARE TEST FOR TWO INDEPENDENT SAMPLES
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: USED TO COMPARE TWO INDEPENDENT SAMPLES

In a study on the use of seat belts in preventing fatalities, the records of the last 100 vehicular accidents were reviewed. These 100 accidents involved 238 persons. Each person was classified as using or not using a seat belt when the accident happened, and as fatally injured or a survivor.

                              Wearing seat belts?
Injured fatally?          Yes             No            Total
Yes                     9 (13.04)      88 (83.96)         97
No                     23 (18.96)     118 (122.04)       141
Total                      32             206             238

The samples can be treated as two independent samples, those wearing seat belts and those not wearing them, and we compare the proportion fatally injured in each. From the table, 9 out of 32, or about 28.1%, of those who wore seat belts were fatally injured, while 88 out of 206, or about 42.7%, of those who did not wear seat belts were fatally injured. Are these independent proportions significantly different?

The null and alternative hypotheses are stated as follows:
Ho: There is no significant difference between the proportions of fatally injured persons among those who wear and those who do not wear seat belts.
H1: There is a significant difference between the proportions of fatally injured persons among those who wear and those who do not wear seat belts.

The appropriate test statistic for this problem is the chi-square test, whose general formula is

    χ² = Σ (o − e)²/e   (summed over all cells)

where o is the observed frequency and e is the expected frequency given by

    e = (row total)(column total) / grand total.

For the given table we have:

    o = 9:    e = (97)(32)/238   = 13.04
    o = 88:   e = (97)(206)/238  = 83.96
    o = 23:   e = (141)(32)/238  = 18.96
    o = 118:  e = (141)(206)/238 = 122.04

Thus the chi-square value is

    χ² = (9 − 13.04)²/13.04 + (88 − 83.96)²/83.96 + (23 − 18.96)²/18.96 + (118 − 122.04)²/122.04
       = 1.2517 + 0.1944 + 0.8608 + 0.1337 = 2.441.

For 2 × 2 tables in which the sample size is small (say N < 50), the following corrected formula is recommended:

    χ² = Σ (|o − e| − 0.5)²/e,

i.e., 0.5 is subtracted from the absolute difference between the observed and expected frequencies before squaring and dividing by the expected frequency. For the same table, the corrected value is

    χ² = (|9 − 13.04| − 0.5)²/13.04 + (|88 − 83.96| − 0.5)²/83.96 + (|23 − 18.96| − 0.5)²/18.96 + (|118 − 122.04| − 0.5)²/122.04
       = 0.9620 + 0.1494 + 0.6618 + 0.1028 (using the exact expected values) = 1.876.

A more efficient formula that is equivalent to the corrected equation above is

    χ² = N( |AD − BC| − N/2 )² / [ (A + B)(C + D)(A + C)(B + D) ]

where the symbols are taken from a contingency table in the following format:

                    Variable X
Variable Y        Yes       No        Total
Yes                A         B        A + B
No                 C         D        C + D
Total            A + C     B + D      A + B + C + D = N

Thus, for the data in the table shown above,

    χ² = 238( |(9)(118) − (88)(23)| − 238/2 )² / [ (97)(141)(32)(206) ]
       = 238(962 − 119)² / 90,158,784 = 1.876,

which is the same value of the test statistic.

Since the tabular chi-square value at the 0.05 level of significance and df = 1 is 3.84, the null hypothesis cannot be rejected. There is insufficient evidence, from these data, that wearing seat belts reduces the number of people fatally injured during accidents.

Remarks: When to use the chi-square test for a 2 × 2 table
1. When N ≤ 20, always use the Fisher exact test.
2. When N is between 20 and 40, the corrected chi-square formula above may be used if all expected frequencies are 5 or more. If the smallest expected frequency is less than 5, use the Fisher exact test.
3. When N > 40, use the corrected chi-square formula above.
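A minimal sketch of the same 2 × 2 analysis in Python is shown below, assuming the seat-belt table above. scipy.stats.chi2_contingency applies Yates' continuity correction to 2 × 2 tables by default, matching the corrected value of about 1.876, and scipy.stats.fisher_exact is the small-sample alternative mentioned in the remarks.

```python
# Minimal sketch of the 2 x 2 chi-square test for the seat-belt example above.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = np.array([[9,  88],    # fatally injured:     belted, not belted
                  [23, 118]])  # not fatally injured: belted, not belted

chi2, p_value, dof, expected = chi2_contingency(table)  # Yates correction by default for 2 x 2
print(chi2, dof, p_value)    # about 1.876 with df = 1
print(np.round(expected, 2)) # the expected frequencies shown in parentheses above

print(fisher_exact(table))   # exact alternative recommended for small N
```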
MCNEMAR TEST
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: USED TO COMPARE TWO DEPENDENT SAMPLES

The McNemar test for the significance of changes is applicable to "before-and-after" designs in which each subject serves as its own control and the measurements are made on either a nominal or an ordinal scale. To test the significance of any observed change, a fourfold table of frequencies is used to represent the first and second sets of responses from the same individuals. In this table, + and − denote the two possible responses, arranged as follows:

                      after
                    −       +
before     +        A       B
           −        C       D

where A, B, C, and D are the observed frequencies. Thus A denotes the number of individuals whose response was + on the first measure and − on the second, and D is the number of individuals who changed from − to +. B and C are the respondents who gave the same response on both measures (+ for B and − for C). If the null hypothesis of no difference between the number who changed from + to − and the number who changed from − to + is true, then the expected frequency in each of cells A and D is (A + D)/2. The corresponding McNemar test statistic (with Yates' correction) is

    χ² = ( |A − D| − 1 )² / (A + D),   with df = 1.

Illustration: How consistent are people in their voting habits? Do people vote for the same political party from election to election? Below are the results of a poll in which people were asked whether they had voted for NP or LP in each of the last two presidential elections.

                          1998 Elections
                          NP        LP
1992 Elections   NP       117        23
                 LP        27       178

The data can be assumed to come from two dependent samples, since the same group of people was interviewed about two different occasions, and the data are frequency counts, so the McNemar test is applicable. We first rearrange the entries to conform to the fourfold table suggested by Siegel and Castellan:

                          1998 Elections
                       − (LP)     + (NP)     Total
1992   + (NP)            23         117        140
       − (LP)           178          27        205
Total                   201         144        345

The analysis centers on the proportion of voters who voted for NP in 1992 and in 1998. From the table, 140 out of 345, or about 40.6%, voted for NP in the 1992 elections, while 144 out of 345, or about 41.7%, voted for NP in the 1998 elections, a slight change in the NP share. Thus the null and alternative hypotheses are:

Ho: There is no significant difference between the proportion who voted for NP during the 1992 elections and the proportion who voted for NP during the 1998 elections.
H1: The proportion who voted for NP during the 1998 elections differs significantly from the proportion who voted for NP during the 1992 elections.

Using the McNemar test with A = 23 and D = 27, the value of the test statistic is

    χ² = ( |23 − 27| − 1 )² / (23 + 27) = 9/50 = 0.18.

The tabular chi-square value at the 0.05 level of significance is 3.84. Since the computed value does not exceed the tabular value, we do not reject the null hypothesis. It could be said that people tend to vote for the same party affiliation from election to election.

COMPUTER OUTPUT USING SPSS

McNemar Test

Y1998 & Y1992
                 Y1992
Y1998          NP      LP
NP             117      27
LP              23     178

Test Statistics(b)       Y1998 & Y1992
N                              345
Chi-Square (a)                .180
Asymp. Sig.                   .671
a. Continuity Corrected
b. McNemar Test

Remark: When the expected frequency (A + D)/2 is small (less than 5), we use the binomial test instead.
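A minimal sketch of the same test in Python is shown below, assuming the voting table above and the mcnemar function in statsmodels. With exact=False and correction=True it uses the continuity-corrected chi-square form computed by hand above, while exact=True gives the binomial version recommended when (A + D)/2 is small.

```python
# Minimal sketch of the McNemar test for the voting example above.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: 1992 vote (NP, LP); columns: 1998 vote (NP, LP).
table = np.array([[117, 23],
                  [27, 178]])

result = mcnemar(table, exact=False, correction=True)  # continuity-corrected chi-square form
print(result.statistic, result.pvalue)                 # about 0.18 and .67, as in the SPSS output

exact_result = mcnemar(table, exact=True)              # binomial version for small (A + D)/2
print(exact_result.pvalue)
```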
CHI-SQUARE TEST FOR THREE OR MORE INDEPENDENT SAMPLES
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: TO COMPARE THREE OR MORE INDEPENDENT SAMPLES

Suppose we want to compare the effectiveness of three methods of teaching advanced statistics: the lecture method (Method 1), the modular method (Method 2), and the use of CAI materials (Method 3). We first randomly form three independent groups (samples) of students, each group to be taught by one of the three methods. The dependent variable is the students' performance on the final examination in Advanced Statistics. If the data were scores, the one-way ANOVA would be applicable. Suppose, however, that each student's performance is placed in one of the following categories: Below Satisfactory (score of 74 and below), Fair (75-79), Satisfactory (80-84), and Above Satisfactory (85 and above). It is then of interest to determine how many students in each group fall within each of these four categories. A comparison of the frequencies or proportions can be done descriptively, but if we want to test whether the proportions within each category differ significantly, the chi-square test of significance can be used.

Ho: The distributions of grades of students exposed to the three teaching methods do not differ significantly (equivalently, there is no significant difference among the three groups in the proportion of students falling in each performance category).
H1: The distributions of grades of students exposed to the three teaching methods differ significantly.

Let us use the following hypothetical data to test the given null hypothesis:

                              Method of Teaching
Performance Category     Lecture    Modular    CAI     TOTAL
Above Satisfactory           9         20       18        47
Satisfactory                12         18       21        51
Fair                        15         10        8        33
Below Satisfactory          24         12        6        42
Total                       60         60       53       173

The hypothetical data show 60 students exposed to the lecture method, 60 to the modular method, and 53 taught with computer-assisted instruction. The distribution of performance for students under the lecture method appears to differ from the other two methods. We test the significance of this difference by computing the chi-square statistic

    χ² = Σ (o − e)²/e = Σ o²/e − N,   with d.f. = (r − 1)(c − 1),

where o is the observed frequency, e = (row total)(column total)/(grand total) is the expected frequency, N is the grand total, r is the number of categories of the row (dependent) variable, and c is the number of categories of the column (independent) variable. To compute this value, we need the expected frequencies corresponding to the observed frequencies. The results are shown in the table below.
                              Method of Teaching
Performance Category     Lecture        Modular        CAI          TOTAL
Above Satisfactory        9 (16.3)      20 (16.3)     18 (14.4)        47
Satisfactory             12 (17.7)      18 (17.7)     21 (15.6)        51
Fair                     15 (11.4)      10 (11.4)      8 (10.1)        33
Below Satisfactory       24 (14.6)      12 (14.6)      6 (12.9)        42
Total                        60             60            53          173

Based on these table entries, the value of the chi-square statistic is

    χ² = 9²/16.3 + 12²/17.7 + 15²/11.4 + 24²/14.6 + 20²/16.3 + 18²/17.7 + 10²/11.4 + 12²/14.6
         + 18²/14.4 + 21²/15.6 + 8²/10.1 + 6²/12.9 − 173
       = 193.67 − 173 = 20.67.

At α = 0.05 and d.f. = (4 − 1)(3 − 1) = 6, the corresponding tabular value is 12.59. Since the computed chi-square exceeds the tabular value, the null hypothesis is rejected. It may be concluded that the effects of the three methods of teaching on the students' performance in the final test are significantly different.

To determine where the differences lie, we conduct pairwise comparison tests by partitioning the original table into 2 × 2 sub-tables, each with d.f. = 1. Each partition table for an r × c contingency table has the entries

     A    B  |  R1
     C    D  |  R2
    C1   C2

where C1 is the sum of the original column totals for the columns combined into A and C, C2 is the sum of the original column totals for the columns combined into B and D, R1 is the sum of the original row totals for the rows combined into A and B, R2 is the sum of the original row totals for the rows combined into C and D, and N is the grand total of the original contingency table. The chi-square statistic associated with each partition is computed using the formula

    χ² = N [ C2(A·R2 − C·R1) − C1(B·R2 − D·R1) ]² / [ C1·C2·R1·R2·(C1 + C2)(R1 + R2) ].

For the given contingency table, we have the following computed χ² values.

1. Comparing the effects of the lecture and modular methods for students with Above Satisfactory versus Satisfactory performance. The partition table is

      9   20 |  47
     12   18 |  51
     60   60       N = 173

    χ² = 173[ 60(9×51 − 12×47) − 60(20×51 − 18×47) ]² / [ 60×60×47×51×(60 + 60)(47 + 51) ] = 0.48

2. Comparing the effects of the lecture and modular methods for students with Above Satisfactory and Satisfactory performance combined versus Fair performance:

     21   38 |  98
     15   10 |  33
     60   60       N = 173

    χ² = 173[ 60(21×33 − 15×98) − 60(38×33 − 10×98) ]² / [ 60×60×98×33×(60 + 60)(98 + 33) ] = 3.76

3. Comparing the effects of the lecture and modular methods for students with Above Satisfactory, Satisfactory, and Fair performance combined versus Below Satisfactory performance:

     36   48 | 131
     24   12 |  42
     60   60       N = 173

    χ² = 173[ 60(36×42 − 24×131) − 60(48×42 − 12×131) ]² / [ 60×60×131×42×(60 + 60)(131 + 42) ] = 6.53

4. Comparing the effects of the lecture and modular methods combined versus CAI for students with Above Satisfactory versus Satisfactory performance:

     29   18 |  47
     30   21 |  51
    120   53       N = 173

    χ² = 173[ 53(29×51 − 30×47) − 120(18×51 − 21×47) ]² / [ 120×53×47×51×(120 + 53)(47 + 51) ] = 0.10

5. Comparing the effects of the lecture and modular methods combined versus CAI for students with Above Satisfactory and Satisfactory performance combined versus Fair performance:

     59   39 |  98
     25    8 |  33
    120   53       N = 173

    χ² = 173[ 53(59×33 − 25×98) − 120(39×33 − 8×98) ]² / [ 120×53×98×33×(120 + 53)(98 + 33) ] = 2.81

6. Comparing the effects of the lecture and modular methods combined versus CAI for students with Above Satisfactory, Satisfactory, and Fair performance combined versus Below Satisfactory performance:

     84   47 | 131
     36    6 |  42
    120   53       N = 173

    χ² = 173[ 53(84×42 − 36×131) − 120(47×42 − 6×131) ]² / [ 120×53×131×42×(120 + 53)(131 + 42) ] = 7.00

Summary of the chi-square values:

Partition    χ² value    Tabular value    Interpretation
    1           0.48          3.84        Not significant
    2           3.76          3.84        Not significant
    3           6.53          3.84        Significant
    4           0.10          3.84        Not significant
    5           2.81          3.84        Not significant
    6           7.00          3.84        Significant
Total          20.68

Based on the comparisons made, it is concluded that the effects of the lecture and modular methods of teaching advanced statistics are significantly different when students with at least Fair performance are compared with those with Below Satisfactory performance. Similarly, the combined effect of the lecture and modular methods is significantly different from the effect of CAI for students with at least Fair performance versus those with Below Satisfactory performance. In general, then, the three methods of teaching differ from one another mainly with respect to the students who failed: the hypothetical data show many failures in the group exposed to the lecture method, fewer in the group exposed to the modular method, and fewer still in the group taught with CAI.

ANALYSIS OF THE SAME DATA USING SPSS:

Crosstabs: Performance * Method of Teaching

                                  Lecture    Modular    CAI     Total
Above Satisfactory   Count            9         20       18        47
                     Expected      16.3       16.3     14.4      47.0
Satisfactory         Count           12         18       21        51
                     Expected      17.7       17.7     15.6      51.0
Fair                 Count           15         10        8        33
                     Expected      11.4       11.4     10.1      33.0
Below Satisfactory   Count           24         12        6        42
                     Expected      14.6       14.6     12.9      42.0
Total                Count           60         60       53       173

Chi-Square Tests              Value       df    Asymp. Sig. (2-sided)
Pearson Chi-Square          20.647 (a)     6           .002
N of Valid Cases               173
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.11.
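A minimal sketch of the overall r × c test in Python is shown below, assuming the 4 × 3 performance table above. SciPy has no built-in routine for the Siegel-Castellan partitioning, so the 2 × 2 partition chi-squares would still be computed by hand or with a small helper; chi2_contingency reproduces the overall test reported by SPSS.

```python
# Minimal sketch of the overall 4 x 3 chi-square test for the performance table above.
import numpy as np
from scipy.stats import chi2_contingency

#                Lecture  Modular  CAI
table = np.array([[ 9, 20, 18],   # Above Satisfactory
                  [12, 18, 21],   # Satisfactory
                  [15, 10,  8],   # Fair
                  [24, 12,  6]])  # Below Satisfactory

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, dof, p_value)     # about 20.65, 6, .002, matching the SPSS output
print(np.round(expected, 1))  # the expected frequencies shown in parentheses above
```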
COCHRAN'S Q TEST
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: TO COMPARE THREE OR MORE DEPENDENT SAMPLES

Example: An experimental study was conducted to determine which method of teaching would improve the conceptual understanding of students in Physics. Twenty sets of matched individuals were selected and randomly assigned to the three groups. The dependent variable was the student's performance on a test given after the experiment, coded 1 if the student passed the test and 0 if he failed. The data are shown below.

Subject   METHOD A   METHOD B   METHOD C     Li    Li²
   1          1          0          0         1      1
   2          1          1          0         2      4
   3          0          1          0         1      1
   4          1          0          0         1      1
   5          1          0          0         1      1
   6          1          1          1         3      9
   7          1          1          0         2      4
   8          1          0          1         2      4
   9          0          1          0         1      1
  10          1          1          1         3      9
  11          1          1          0         2      4
  12          1          0          0         1      1
  13          0          1          1         2      4
  14          0          0          0         0      0
  15          1          0          1         2      4
  16          1          1          1         3      9
  17          1          1          0         2      4
  18          0          0          0         0      0
  19          1          1          0         2      4
  20          0          0          0         0      0
          G1 = 14    G2 = 11     G3 = 6    ΣLi = 31   ΣLi² = 65

Proportion who passed:  P1 = 14/20 = 70%;  P2 = 11/20 = 55%;  P3 = 6/20 = 30%.

Cochran's Q is the appropriate test for this research design for the following reasons:
1. We are comparing three dependent samples.
2. The data are categorical (pass/fail).

The test statistic is

    Q = (k − 1) [ k Σ G_j² − (Σ G_j)² ] / [ k Σ L_i − Σ L_i² ]

where k is the number of groups, G_j is the number of "passes" in group j, and L_i is the number of "passes" for matched set i.

The null and alternative hypotheses in this research problem are:
Ho: There is no significant difference among the three groups in the proportion of subjects who pass the test.
H1: There is a significant difference among the three groups in the proportion of subjects who pass the test.

Based on the given data, the computed value of Q is

    Q = (3 − 1) [ 3(14² + 11² + 6²) − 31² ] / [ 3(31) − 65 ] = 2(1,059 − 961)/28 = 2(98)/28 = 7.0.

At α = 0.05 and df = 2, the tabular chi-square value is 5.99. Therefore the null hypothesis is rejected. We conclude that the observed proportions are significantly different.

Note: When the Q statistic is significant, pairwise comparisons must be conducted to determine which two particular groups are significantly different.

-------------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS

Cochran Test

Frequencies          Value 0    Value 1
METHOD A                 6         14
METHOD B                 9         11
METHOD C                14          6

Test Statistics
N                        20
Cochran's Q          7.000 (a)
df                        2
Asymp. Sig.            .030
a. 1 is treated as a success.

-------------------------------------------------------------------------------------------------------------------------
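A minimal sketch of the same test in Python is shown below, assuming the 20 × 3 pass/fail data above and the cochrans_q helper in statsmodels, which expects one row per subject and one column per condition and returns an object with statistic and pvalue attributes.

```python
# Minimal sketch of Cochran's Q test for the 20 x 3 pass/fail data above.
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

method_a = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0]
method_b = [0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
method_c = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]

data = np.column_stack([method_a, method_b, method_c])   # one row per subject
result = cochrans_q(data)
print(result.statistic, result.pvalue)   # Q = 7.0 with p about .030, as in the SPSS output
```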
SUMMARY

This material discussed test statistics that can be used when the given data cannot be analyzed with a parametric test such as the t-test or the F-test because of the data requirements and the scale of measurement used in gathering the data. As mentioned earlier, the nonparametric statistical tests discussed in this material are only the most commonly used alternatives to the t-test and the F-test; there are other nonparametric tests, and interested readers are referred to the book of Siegel and Castellan. The table below summarizes the correspondence between parametric and nonparametric tests used for comparing groups.

                 Independent Samples                                  Dependent Samples
Type of Data     2 groups                 3 or more groups            2 groups                    3 or more groups
Interval         t-test                   F-test (One-Way ANOVA)      t-test                      F-test (Repeated Measures ANOVA)
Ordinal          Wilcoxon Rank Sum Test   Kruskal-Wallis Test         Wilcoxon Signed Ranks Test  Friedman's Test
                 (Mann-Whitney U)
Nominal          Chi-Square Test          Chi-Square Test             McNemar Test                Cochran's Q Test

REFERENCE:

Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.