
NONPARAMETRIC STATISTICS

Selected Topics in
NONPARAMETRIC STATISTICS
Gabino P. Petilos, Ph.D.
INTRODUCTION
Statistical tests that require assumptions about the parameters of the populations from which
samples are drawn are called parametric statistical tests. Examples are the z-test, the t-test, and
the F-test. These tests assume that the population from which the sample was drawn is normally
distributed, or that the sample size is large enough for the sampling distribution of the test
statistic to be approximately normal. Moreover, the dependent variable must be measured on at
least an interval scale.
When the above assumptions are not met, the use of parametric statistical tests is
questionable because the conclusion arrived at is contingent on the assumptions underlying
the particular test statistic. Also, the dependent variable is often not measured on
at least an interval scale. For instance, we might want to identify the factors that explain “passing or
not passing the Licensure Examination for Teachers (LET)”. Note that this variable is
categorical. As another example, a researcher might be interested in determining whether three groups
of middle-level managers differ in their management styles. The dependent variable,
management style, is difficult to measure on an interval scale because it is inherently
categorical. There are also many research problems where the population of interest has few
elements, so that normality of the distribution of the data would be difficult to justify. In this situation,
parametric statistics would probably not be appropriate because the data may not follow the normal
distribution.
The foregoing arguments should motivate us to learn other statistical tests that are useful
when parametric statistical tests cannot be meaningfully applied. These tests are generally
called nonparametric statistics, and they are applicable even when the data gathered are measured
only on a nominal scale.
Nonparametric statistical tests are also called distribution-free statistics because they do not
require a specific distribution of data. In this context, the test of hypothesis is focused on
equivalence of distribution rather than equality of parameters.
The following is the list of some advantages of nonparametric statistics:
1. They can be used for very small sample sizes.
2. They make fewer assumptions about the data and hence may be more relevant to the
situation mentioned in research.
3. They are available to analyze data which are inherently in ranks as well as data where
numerical scores have the strength of ranks.
4. They are available to treat data that are simply classificatory or categorical, i.e., measured on
a nominal scale.
5. They are easier to learn than parametric tests and the results can be interpreted directly.
However, nonparametric tests are less powerful than their parametric counterparts. This
means that when the assumptions of normality of the data and homogeneity of variances are satisfied,
parametric statistical tests are more likely to detect a true difference than their nonparametric
counterparts. Nonparametric statistical tests are also not as systematic as parametric tests. Moreover, the
statistical tables used in nonparametric statistical tests are widely scattered and have different
formats.
Despite the disadvantages mentioned, nonparametric statistical tests are recommended
when parametric tests are not applicable. Knowledge of the most commonly used nonparametric
statistical tests which are presented in this material is therefore necessary. Once these are learned, it
will be easy to learn other nonparametric statistical tests not found in this material.
All nonparametric tests discussed in this module are based on the book of Siegel and
Castellan (1989). Also, some of the important statistical tables in the same reference are reproduced
here for the convenience of the readers. In presenting a particular test, the readers are provided
information as to the data requirement and the function of the said test followed by a detailed
illustration on how the test is applied. The same data are analyzed using the Statistical Package for
the Social Sciences (SPSS) software and the results are indicated after each example.
In interpreting the SPSS computer output, we emphasize the use of the p-value or significance
level of the test, which is commonly used in presenting results of hypothesis testing. The p-value is
the probability of obtaining a value of the test statistic as extreme as or more extreme than the one
actually observed, assuming the null hypothesis is true. In general, when the p-value associated with a
test statistic is smaller than or equal to the given level of significance α, the null hypothesis is rejected.
The direction of this comparison is the opposite of what we do when analyzing data manually, where the
null hypothesis is rejected whenever the absolute value of the computed test statistic is greater than or
equal to the corresponding tabular value. The equivalence of these two techniques is discussed in the
succeeding paragraph and illustrated in Fig. 1.

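Fig. 1. Normal curve marking the tabular value z and the computed value zcomputed; the p-value is the area to the right of zcomputed.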
In this figure, the area to the right of z is equal to the level of significance α. The value of z is
called the critical or tabular value. When the data are manually analyzed, the value of the test
statistic is computed and compared with the critical z value. When the computed value is greater
than or equal to the tabular value, the null hypothesis is rejected. This means that the computed
value will lie to the right of the critical value. The area to the right of the computed value will be
smaller than the given level of significance. This area is what we call the p-value or significance level
of the test. Hence when the p-value is less than or equal to the given level of significance, the null
hypothesis is also rejected.
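The equivalence of the two decision rules can be checked numerically. The following is a minimal Python sketch, not part of the original module: the computed value 2.10 is a made-up illustration, and the function name `upper_tail_p` is hypothetical.

```python
import math

def upper_tail_p(z):
    """Area to the right of z under the standard normal curve."""
    return 0.5 * math.erfc(z / math.sqrt(2))

alpha = 0.05
z_critical = 1.645   # one-tailed tabular value at alpha = 0.05
z_computed = 2.10    # hypothetical computed test statistic (illustration only)

p_value = upper_tail_p(z_computed)

# Rule 1 (manual analysis): reject when the computed value reaches the tabular value.
reject_by_table = z_computed >= z_critical
# Rule 2 (computer output): reject when the p-value does not exceed alpha.
reject_by_p = p_value <= alpha

print(round(p_value, 4), reject_by_table, reject_by_p)
```

Both rules give the same decision because a computed value to the right of the critical value always leaves a right-tail area smaller than α.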
The nonparametric tests included in this material are only those that are commonly used in
comparing independent samples as well as dependent or paired samples.
MANN-WHITNEY U TEST (OR WILCOXON RANK-SUM TEST)
DATA REQUIREMENT: RANKED DATA (ORDINAL)
FUNCTION: USED TO COMPARE TWO INDEPENDENT SAMPLES
Example:
Do male or female students endorse a stricter norm of honesty? A sample of 15 students, 7
males and 8 females, was given brief descriptions of 20 situations that might be considered
dishonest (for example, glancing at somebody’s paper during a test, copying someone’s solution,
etc.) and was asked to classify each on a scale from ‘very honest’ to ‘not honest at all’. The summed
scores, ranging from 50 (indicating the strictest norm of honesty) to 0, are given below. Is the difference
between male and female students statistically significant? Use the Wilcoxon-Mann-Whitney test at α = 0.05.
MALE
No.    Score   Rank
1      29      5
2      36      8.5
3      24      2
4      26      3
5      33      7
6      27      4
7      31      6
Sum            35.5

FEMALE
No.    Score   Rank
1      36      8.5
2      42      13
3      46      15
4      41      12
5      20      1
6      43      14
7      39      10
8      40      11
Sum            84.5
The ranks can be obtained by combining the two data sets into one and assigning 1 to the lowest score, 2 to
the next higher score, etc. If there are tied scores, we assign to each the average of the ranks that would
have been assigned if the scores were distinct.
After assigning the ranks, we split the scores again into their original groups together with their
corresponding ranks.
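The ranking procedure just described can be sketched in Python as follows. This is an illustrative sketch only; the helper name `midrank` is hypothetical and not part of the module.

```python
def midrank(x, pooled):
    """Rank of x within pooled; tied values share the average of their ranks."""
    less = sum(v < x for v in pooled)     # scores strictly below x
    equal = sum(v == x for v in pooled)   # scores tied with x (including x itself)
    return less + (equal + 1) / 2

male = [29, 36, 24, 26, 33, 27, 31]
female = [36, 42, 46, 41, 20, 43, 39, 40]
pooled = male + female                    # combine the two data sets into one

# Split the ranks back into the original groups and add them up.
w_male = sum(midrank(x, pooled) for x in male)
w_female = sum(midrank(x, pooled) for x in female)
print(w_male, w_female)
```

The two tied scores of 36 would occupy ranks 8 and 9, so each receives 8.5, and the two rank sums reproduce the totals in the tables above.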
H0: There is no significant difference on the perceived norm of honesty between male and
female students.
H1: Female students endorse a stricter norm of honesty than male students (or: the median
perceived score on the norm of honesty for female students is significantly higher than that
for male students).
Let m and n be the sample sizes of the smaller and larger group, respectively. Focusing on the
smaller group (male), let Wx be the sum of the ranks of this group. Thus, Wx = 35.5. Using Appendix
Table J (pp. 339 – 346, Siegel & Castellan), we locate the sub-table for m = 7.
Since the alternative hypothesis implies that Wx should be small, we use the left (lower) tail of the
distribution. When the null hypothesis is true, the probability associated with Wx ≤ 35.5 is between
0.0070 and 0.0103, which is significant at the .05 level of significance. Thus we conclude that female
students endorse a stricter norm of honesty than male students.
When m > 10 or n > 10, Appendix Table J cannot be used but we can use the normal
approximation since the sampling distribution of Wx rapidly approaches that of the normal
distribution. Let us consider the following example.
MALE
No.    Score   Rank
1      29      9.5
2      36      13.5
3      24      3
4      26      5
5      38      18.5
6      16      1
7      37      15.5
8      27      6.5
9      38      18.5
10     28      8
11     25      4
12     33      12
13     27      6.5
14     31      11
Sum            132.5

FEMALE
No.    Score   Rank
1      36      13.5
2      42      25
3      46      28
4      41      23.5
5      48      29
6      41      23.5
7      38      18.5
8      45      27
9      38      18.5
10     37      15.5
11     29      9.5
12     20      2
13     43      26
14     39      21
15     40      22
Sum            302.5
From the given summary, let Wx = 132.5 (the sum of the ranks of the smaller group). Since the
sample size is large, we use the normal approximation. Note that m = 14 and n = 15,
so that N = m + n = 29. The mean, variance, and standard deviation of the sampling distribution of Wx are
\[ \mu_{W_x} = \frac{m(N+1)}{2} = \frac{14(29+1)}{2} = \frac{14(30)}{2} = 210, \]
\[ \sigma^2_{W_x} = \frac{mn(N+1)}{12} = \frac{(14)(15)(29+1)}{12} = 525, \]
\[ \sigma_{W_x} = \sqrt{525} = 22.913. \]
The test statistic is given by:
\[ z = \frac{W_x \pm 0.5 - \dfrac{m(N+1)}{2}}{\sqrt{\dfrac{mn(N+1)}{12}}}, \]
where +0.5 is used for left tail probability and -0.5 is used for right tail probability.
Thus, for the given data,
\[ z = \frac{132.5 + 0.5 - 210}{22.913} = \frac{-77}{22.913} = -3.361. \]
At α = .05 level of significance, the tabular z (one-tailed) = 1.645. Since the absolute computed
value of z exceeds the tabular value, the null hypothesis is rejected.
When there are tied scores, we compute the correction for ties before getting the standard
deviation of the sampling distribution of WX. The computation is done below:
Grouping   Value   Rank    tj
1          27      6.5     2
2          29      9.5     2
3          36      13.5    2
4          37      15.5    2
5          38      18.5    4
6          41      23.5    2

Thus,
\[ \sum_{j=1}^{6} \frac{t_j^3 - t_j}{12} = \frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{4^3 - 4}{12} + \frac{2^3 - 2}{12} = (.5)(5) + 5 = 7.5. \]
Therefore,
\[ \sigma^2_{W_x} = \frac{mn}{N(N-1)} \left[ \frac{N^3 - N}{12} - \sum_{j=1}^{g} \frac{t_j^3 - t_j}{12} \right] = \frac{(14)(15)}{29(29-1)} \left[ \frac{29^3 - 29}{12} - 7.5 \right] = 523.0603. \]
Thus,
\[ \sigma_{W_x} = \sqrt{523.0603448} = 22.8705. \]
Finally, since the test is one-tailed (to the left), the value of the test statistic z is:
\[ z = \frac{W_x + 0.5 - \mu_{W_x}}{\sigma_{W_x}} = \frac{132.5 + 0.5 - 210}{22.8705} = -3.36678. \]
The corresponding tabular value of z at 0.05 (one-tailed) is 1.645. Since the absolute
computed value of z is greater than the corresponding tabular value, we reject the null hypothesis.
We therefore conclude that the perceived norm of honesty of male and female students are
significantly different. In particular, we say that female students tend to endorse stricter norm of
honesty than male students.
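As a cross-check of the normal approximation with and without the correction for ties, the computation can be sketched in Python. This is an illustration under the formulas stated above; the helper `midrank` and all variable names are hypothetical.

```python
import math
from collections import Counter

def midrank(x, pooled):
    """Rank of x within pooled; tied values share the average of their ranks."""
    less = sum(v < x for v in pooled)
    equal = sum(v == x for v in pooled)
    return less + (equal + 1) / 2

male = [29, 36, 24, 26, 38, 16, 37, 27, 38, 28, 25, 33, 27, 31]
female = [36, 42, 46, 41, 48, 41, 38, 45, 38, 37, 29, 20, 43, 39, 40]
pooled = male + female
m, n = len(male), len(female)
N = m + n

w_x = sum(midrank(x, pooled) for x in male)   # rank sum of the smaller group
mean = m * (N + 1) / 2

# Variance without the correction for ties.
var_plain = m * n * (N + 1) / 12

# Variance with the correction for ties; untied values (t = 1) contribute 0.
tie_term = sum((t**3 - t) / 12 for t in Counter(pooled).values())
var_ties = m * n / (N * (N - 1)) * ((N**3 - N) / 12 - tie_term)

# +0.5 because the left-tail probability is wanted.
z_plain = (w_x + 0.5 - mean) / math.sqrt(var_plain)
z_ties = (w_x + 0.5 - mean) / math.sqrt(var_ties)
print(w_x, tie_term, round(z_plain, 3), round(z_ties, 3))
```

The corrected variance is slightly smaller than the uncorrected one, which is why the corrected z is slightly larger in absolute value.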
REMARKS: The effect of correcting for ties is that it increases the magnitude of z, making it more
significant. If no correction for ties is employed, the value of z is conservative since its
associated probability will be slightly inflated. Siegel and Castellan recommend that
one should correct for ties only if the proportion of ties is quite large.
-------------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS
Wilcoxon Rank Sum Test

Ranks
         GROUP     N     Mean Rank   Sum of Ranks
SCORE    FEMALE    15    20.17       302.50
         MALE      14    9.46        132.50
         Total     29

Test Statistics(b)
                                   SCORE
Mann-Whitney U                     27.500
Wilcoxon W                         132.500
Z                                  -3.389
Asymp. Sig. (2-tailed)             .001
Exact Sig. [2*(1-tailed Sig.)]     .000(a)
a. Not corrected for ties.
b. Grouping Variable: GROUP

Note on Z: obtained without adding 0.5 to the numerator of the test statistic.
Note on Asymp. Sig.: very small p-value, which leads to the rejection of the null hypothesis.
WILCOXON SIGNED RANKS TEST
DATA REQUIREMENT: ORDINAL OR RANKED DATA
FUNCTION: USED TO COMPARE TWO DEPENDENT OR CORRELATED SAMPLES
Example:
It is claimed that jogging can improve the self-esteem of a person in less than 3 weeks. The
self-esteem scores of 15 students who subscribed to this program were recorded before and after
the treatment. Is there a significant difference between the median self-esteem of the students
before and after subscribing to the program?
STUDENT   SCORE BEFORE   SCORE AFTER
1         58             63
2         61             58
3         61             62
4         69             69
5         64             70
6         68             75
7         70             73
8         62             67
9         56             58
10        59             55
11        72             80
12        75             80
13        73             69
14        67             70
15        65             74
Let di be the difference score for any matched pair, representing the difference between the
paired scores under two treatments X and Y, that is, di = Xi – Yi. In this example, X is the score
after the program and Y is the score before it.
In applying the Wilcoxon signed-ranks test for equality of central tendencies, we first
disregard all differences equal to zero and rank the remaining di’s without regard to sign: assign the
rank of 1 to the smallest |di|, the rank of 2 to the next smallest, and so on. When the absolute values
of two or more differences are the same, assign to each the average of the ranks that would have been
assigned if the differences were distinguishable. After this procedure, affix to each rank the sign of
the difference. The table, with the differences and their signed ranks added, is reproduced below.
STUDENT   SCORE BEFORE   SCORE AFTER   d     Rank of d
1         58             63            5     9
2         61             58            -3    -4
3         61             62            1     1
4         69             69            0     --
5         64             70            6     11
6         68             75            7     12
7         70             73            3     4
8         62             67            5     9
9         56             58            2     2
10        59             55            -4    -6.5
11        72             80            8     13
12        75             80            5     9
13        73             69            -4    -6.5
14        67             70            3     4
15        65             74            9     14
The Wilcoxon signed ranks statistic is T+ which is the sum of all ranks where the differences
are positive. This test statistic is used to test the difference between the two groups.
H0: The improvement in self-esteem does not depend on the jogging program (or: The sum of
the positive ranks and the sum of the negative ranks are equal).
H1: The improvement in the self-esteem of a person depends on the program (or: the sum of
the positive ranks differs from the sum of the negative ranks).
For small samples (N ≤ 15), Appendix Table H on pp. 332-334 of Siegel and Castellan may be
used for one-tailed and two-tailed tests. For two-tailed tests, the table entry is simply doubled.
From the given data, N = 14 (since one difference is 0) and T+ = 88. Using Appendix Table H
for N = 14 and T+ = 88, the two-tailed probability is 2(0.0123) = 0.0246, which is smaller than
α = .05. Hence, we reject the null hypothesis.
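The signed-rank computation for these data can be sketched in Python as follows. The sketch is illustrative only; the helper `midrank` and the variable names are hypothetical.

```python
def midrank(x, pooled):
    """Rank of x within pooled; tied values share the average of their ranks."""
    less = sum(v < x for v in pooled)
    equal = sum(v == x for v in pooled)
    return less + (equal + 1) / 2

before = [58, 61, 61, 69, 64, 68, 70, 62, 56, 59, 72, 75, 73, 67, 65]
after = [63, 58, 62, 69, 70, 75, 73, 67, 58, 55, 80, 80, 69, 70, 74]

# d = after - before, disregarding zero differences.
d = [a - b for a, b in zip(after, before) if a != b]
abs_d = [abs(x) for x in d]

# Rank the absolute differences, then split the ranks by the sign of d.
ranks = [midrank(abs(x), abs_d) for x in d]
t_plus = sum(r for x, r in zip(d, ranks) if x > 0)
t_minus = sum(r for x, r in zip(d, ranks) if x < 0)

N = len(d)
print(N, t_plus, t_minus)
```

As a quick consistency check, T+ and the sum of the negative ranks must add up to N(N + 1)/2, which is 105 when N = 14.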
For N > 15, Appendix Table H cannot be used. The sum of the ranks, T+, however, is approximately
normally distributed with
\[ \text{Mean: } \mu_{T^+} = \frac{N(N+1)}{4}, \quad \text{and Variance: } \sigma^2_{T^+} = \frac{N(N+1)(2N+1)}{24}, \]
so that the test statistic
\[ z = \frac{T^+ - \mu_{T^+}}{\sigma_{T^+}} \]
is used to test the difference between the two sets of scores.
If there are tied ranks, the test statistic is also adjusted to account for the decrease in the
variability of T+. The new variance is given by
\[ \text{Variance} = \frac{N(N+1)(2N+1)}{24} - \frac{1}{2} \sum_{j=1}^{g} t_j (t_j - 1)(t_j + 1), \]
where g = number of groups of tied ranks and tj = number of tied ranks in group j.
To illustrate the normal approximation of the Wilcoxon test, consider the same data but we
include two more students. Thus, N = 17. The data are reproduced below:
STUDENT   SCORE BEFORE   SCORE AFTER   d     Rank of d
1         58             63            5     9
2         61             58            -3    -4
3         61             62            1     1
4         69             69            0     --
5         64             70            6     11
6         68             75            7     12
7         70             73            3     4
8         62             67            5     9
9         56             58            2     2
10        59             55            -4    -6.5
11        72             80            8     13
12        75             80            5     9
13        73             69            -4    -6.5
14        67             70            3     4
15        65             74            9     14
16        60             70            10    15
17        59             70            11    16
For the given data, N = 16 (since one difference is 0) and T+ = 119. The mean and variance
are given by:
\[ \text{Mean} = \frac{N(N+1)}{4} = \frac{16(16+1)}{4} = 68; \text{ and} \]
\[ \text{Variance} = \frac{N(N+1)(2N+1)}{24} = \frac{16(17)(33)}{24} = 374. \]
Since there are tied ranks we have:

Grouping   Rank   tj
1          4      3
2          6.5    2
3          9      3

We first compute the value of the quantity \( \frac{1}{2} \sum_{j=1}^{g} t_j (t_j - 1)(t_j + 1) \). From the given values of tj, we
have
\[ \frac{1}{2} \sum_{j=1}^{g} t_j (t_j - 1)(t_j + 1) = \frac{1}{2} [(3)(3-1)(3+1) + (2)(2-1)(2+1) + (3)(3-1)(3+1)] = 27. \]
Therefore,
\[ \sigma^2_{T^+} = \frac{N(N+1)(2N+1)}{24} - \frac{1}{2} \sum_{j=1}^{g} t_j (t_j - 1)(t_j + 1) = 374 - 27 = 347. \]
Thus, s.d. = \( \sqrt{347} = 18.628 \). Therefore,
\[ z = \frac{T^+ - \mu_{T^+}}{\sigma_{T^+}} = \frac{119 - 68}{18.628} = 2.738. \]
The tabular value of z at α = .05 (one-tailed) is 1.645.
Since the computed value of z exceeded the tabular value, the null hypothesis is rejected.
We therefore conclude that the improvement in the self-esteem of a person depends on the
program.
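The normal approximation for the 17 students can be sketched in Python as a check on the arithmetic. Two variance corrections are computed: the one stated in this module, and, as an assumption not taken from the module, the form of the tie correction commonly given elsewhere, which subtracts the sum of (t^3 - t)/48. Helper and variable names are hypothetical.

```python
import math
from collections import Counter

def midrank(x, pooled):
    """Rank of x within pooled; tied values share the average of their ranks."""
    less = sum(v < x for v in pooled)
    equal = sum(v == x for v in pooled)
    return less + (equal + 1) / 2

before = [58, 61, 61, 69, 64, 68, 70, 62, 56, 59, 72, 75, 73, 67, 65, 60, 59]
after = [63, 58, 62, 69, 70, 75, 73, 67, 58, 55, 80, 80, 69, 70, 74, 70, 70]

d = [a - b for a, b in zip(after, before) if a != b]   # zero differences dropped
abs_d = [abs(x) for x in d]
N = len(d)
t_plus = sum(midrank(abs(x), abs_d) for x in d if x > 0)

mean = N * (N + 1) / 4
variance = N * (N + 1) * (2 * N + 1) / 24

tie_sizes = [t for t in Counter(abs_d).values() if t > 1]

# Correction as written in this module: (1/2) * sum of t(t - 1)(t + 1).
module_term = 0.5 * sum(t * (t - 1) * (t + 1) for t in tie_sizes)
z_module = (t_plus - mean) / math.sqrt(variance - module_term)

# ASSUMPTION (not from this module): the commonly cited correction sum((t^3 - t)/48).
common_term = sum(t**3 - t for t in tie_sizes) / 48
z_common = (t_plus - mean) / math.sqrt(variance - common_term)

print(N, t_plus, round(z_module, 3), round(z_common, 3))
```

Under that assumed alternative correction the value of z is about 2.641, which agrees in magnitude with the Z reported in the SPSS output for these data. Both values exceed 1.645, so the decision to reject the null hypothesis is the same either way.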
-----------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS
Wilcoxon Signed Ranks Test

Ranks
                                               N       Mean Rank   Sum of Ranks
Score After - Score Before   Negative Ranks    3(a)    5.67        17.00
                             Positive Ranks    13(b)   9.15        119.00
                             Ties              1(c)
                             Total             17
a. Score After < Score Before
b. Score After > Score Before
c. Score After = Score Before

Test Statistics(b)
                           Score After - Score Before
Z                          -2.641(a)
Asymp. Sig. (2-tailed)     .008
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Note on Z: computed without correcting the variance.
Note on Asymp. Sig.: very small p-value, which leads to the rejection of the null hypothesis.
KRUSKAL WALLIS ANOVA
DATA REQUIREMENT: ORDINAL OR RANKED DATA
FUNCTION: USED TO COMPARE THREE OR MORE INDEPENDENT SAMPLES
Suppose we want to compare the effectiveness of three methods of teaching science, namely
lecture, modular, and computer assisted. A random sample of 15 students were randomly assigned
to three groups. The scores of the five students in each group are shown below.
Method of Teaching
G1           G2           G3
(Lecture)    (Modular)    (Computer Assisted)
80           84           85
81           80           86
81           81           91
80           81           82
82           82           87
Ho: The median scores of the three groups of students exposed to the three teaching
methods are not significantly different.
H1: The median scores of the three groups of students are significantly different (non-directional alternative hypothesis).
Converting the scores into ranks (treating them as one set of scores), we get the following
results:
METHOD OF TEACHING
                     G1           G2           G3
                     (Lecture)    (Modular)    (Computer Assisted)
                     2            11           12
                     5.5          2            13
                     5.5          5.5          15
                     2            5.5          9
                     9            9            14
Rj (sum of ranks)    24           33           63
R̄j (mean rank)       4.8          6.6          12.6
Based on the mean ranks (R̄j), we note that the students exposed to the lecture and
modular methods did not perform as well as the students exposed to computer-assisted
instruction. We need to confirm this observation by conducting a statistical test using the Kruskal-
Wallis ANOVA since the three groups are independent samples and the data are ranks.
TEST STATISTIC:
\[ KW = \frac{12}{N(N+1)} \sum_{j=1}^{k} n_j \bar{R}_j^2 - 3(N+1) \]
\[ = \frac{12}{15(16)} \left[ 5(4.8)^2 + 5(6.6)^2 + 5(12.6)^2 \right] - 3(15+1) \]
\[ = \frac{12}{15(16)} (1{,}126.8) - 48 = 8.34 \]
The corresponding tabular value using Table O, at α = .05, is given by 5.78. Since the
computed value of KW exceeded the tabular value, we have to reject the null hypothesis. We
therefore conclude that the median scores of the students exposed to the three teaching methods
are significantly different.
CORRECTED VALUE OF KW WHEN THERE ARE TIED OBSERVATIONS
If we look at the original data, we can observe that there are 3 groups of tied scores, namely
80, 81, and 82. Three scores are tied at 80, four scores are tied at 81, and
three scores are tied at 82. If we correct the obtained KW statistic for ties, the correction
factor for ties is given by
\[ 1 - \frac{\sum_{i=1}^{g} (t_i^3 - t_i)}{N^3 - N} = 1 - \frac{(3^3 - 3) + (4^3 - 4) + (3^3 - 3)}{15^3 - 15} = 1 - \frac{108}{3360} = .967857 \]
The corrected value of KW is given by
\[ KW_{Corrected} = \frac{\dfrac{12}{N(N+1)} \sum_{j=1}^{k} n_j \bar{R}_j^2 - 3(N+1)}{1 - \dfrac{\sum_{i=1}^{g} (t_i^3 - t_i)}{N^3 - N}} = \frac{8.34}{0.967857} = 8.617, \]
which is still significant at α = .05.
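The uncorrected and tie-corrected KW values can be reproduced with a short Python sketch. It is an illustration under the formulas above; the helper `midrank` and the dictionary keys are hypothetical names.

```python
from collections import Counter

def midrank(x, pooled):
    """Rank of x within pooled; tied values share the average of their ranks."""
    less = sum(v < x for v in pooled)
    equal = sum(v == x for v in pooled)
    return less + (equal + 1) / 2

groups = {
    "lecture": [80, 81, 81, 80, 82],
    "modular": [84, 80, 81, 81, 82],
    "computer assisted": [85, 86, 91, 82, 87],
}
pooled = [x for scores in groups.values() for x in scores]
N = len(pooled)

# Mean rank of each group when all 15 scores are ranked as one set.
mean_rank = {
    name: sum(midrank(x, pooled) for x in scores) / len(scores)
    for name, scores in groups.items()
}

kw = 12 / (N * (N + 1)) * sum(
    len(scores) * mean_rank[name] ** 2 for name, scores in groups.items()
) - 3 * (N + 1)

# Correction factor for ties; untied scores (t = 1) contribute 0.
tie_factor = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (N**3 - N)
kw_corrected = kw / tie_factor

print(mean_rank, round(kw, 2), round(tie_factor, 6), round(kw_corrected, 3))
```

Because the correction factor is always at most 1, dividing by it can only increase KW.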
Pairwise comparison must be done to determine where the differences lie. To do this, we
have to compute the critical difference given by
\[ z_{\alpha / k(k-1)} \sqrt{ \frac{N(N+1)}{12} \left( \frac{1}{n_u} + \frac{1}{n_v} \right) }, \]
where N is the sum of all the sample sizes among all groups, nu and nv are the sample sizes of the two
groups being compared, k is the number of groups, and \( z_{\alpha / k(k-1)} \) is the corresponding tabular value
obtained using Table AII (page 320, Siegel and Castellan). If the sample sizes are equal, nu and nv will
be the same. In the given example nu = nv = 5 for all comparison groups. Hence we only have one
critical difference for comparing any two groups. At α = .05, the critical value of z using
Table AII is 2.394 (two tailed). Hence the critical difference is given by
\[ (2.394) \sqrt{ \frac{15(15+1)}{12} \left( \frac{1}{5} + \frac{1}{5} \right) } = (2.394)(2.828) = 6.7713. \]
Any absolute difference between the mean ranks that exceeds the value of 6.7713 is
therefore declared significant. Based on the mean ranks obtained earlier, we have R̄1 = 4.8, R̄2 = 6.6,
and R̄3 = 12.6. Computing the absolute differences, we have
|R̄3 - R̄1| = |12.6 - 4.8| = 7.8 > 6.7713 (significant);
|R̄3 - R̄2| = |12.6 - 6.6| = 6.0 < 6.7713 (not significant);
|R̄2 - R̄1| = |6.6 - 4.8| = 1.8 < 6.7713 (not significant).
Based on the comparison test, only the effects of the lecture method and the computer
assisted instruction of teaching science are significantly different in favor of the latter.
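The pairwise comparison can be sketched in Python as below. This is an illustrative sketch only: the tabular value 2.394 is taken from the text, the mean ranks are those computed earlier, and the variable names are hypothetical.

```python
import math
from itertools import combinations

mean_rank = {"lecture": 4.8, "modular": 6.6, "computer assisted": 12.6}
n_u = n_v = 5        # equal group sizes in this example
N = 15
z_tabular = 2.394    # tabular z for alpha / (k(k - 1)) = .05 / 6, from the text

# Critical difference between two mean ranks.
critical = z_tabular * math.sqrt(N * (N + 1) / 12 * (1 / n_u + 1 / n_v))

# Keep the pairs whose absolute mean-rank difference reaches the critical difference.
significant = [
    (a, b)
    for a, b in combinations(mean_rank, 2)
    if abs(mean_rank[a] - mean_rank[b]) >= critical
]
print(round(critical, 4), significant)
```

With unequal group sizes the critical difference would have to be recomputed for each pair, since it depends on the two sample sizes being compared.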
-----------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS
Kruskal-Wallis ANOVA

Ranks
         GRP                  N     Mean Rank
SCORE    lecture              5     4.80
         modular              5     6.60
         computer assisted    5     12.60
         Total                15

Test Statistics(a,b)
                SCORE
Chi-Square      8.617
df              2
Asymp. Sig.     .013
a. Kruskal Wallis Test
b. Grouping Variable: GRP

Note on Chi-Square: this is the KW statistic.
--------------------------------------------------------------------------------------------------------------------------
FRIEDMAN’S TWO WAY ANOVA
DATA REQUIREMENT: ORDINAL OR RANKED DATA
FUNCTION: USED TO COMPARE THREE OR MORE DEPENDENT
SAMPLES (Repeated Measures Design)
Consider the data given below. These data are the scores of 18 pairs of subjects on the
standardized test in Mathematics.
PAIR   METHOD A    METHOD B               METHOD C
       (Modular)   (Computer Assisted)    (Lecture)
1      18          19                     16
2      24          23                     20
3      19          20                     16
4      23          25                     23
5      24          21                     18
6      18          22                     16
7      25          25                     25
8      19          24                     18
9      21          23                     20
10     22          22                     17
11     19          20                     17
12     23          21                     16
13     24          24                     18
14     20          23                     21
15     21          22                     17
16     19          24                     17
17     25          22                     16
18     18          25                     20
For this problem, the null hypothesis is:
Ho: The three methods of teaching are equally effective in terms of improving the
performance of three groups of students in College Algebra.
H1: There is a teaching method which is more effective in terms of improving the
performance of the students in College Algebra.
To apply Friedman’s test, we first rank the scores row-wise, starting with a rank of 1 for
the lowest score. Tied scores within a row are assigned the average of their ranks. The results of
the ranking are shown below.
PAIR   METHOD A    METHOD B               METHOD C
       (Modular)   (Computer Assisted)    (Lecture)
1      2           3                      1
2      3           2                      1
3      2           3                      1
4      1.5         3                      1.5
5      3           2                      1
6      2           3                      1
7      2           2                      2
8      2           3                      1
9      2           3                      1
10     2.5         2.5                    1
11     2           3                      1
12     3           2                      1
13     2.5         2.5                    1
14     1           3                      2
15     2           3                      1
16     2           3                      1
17     3           2                      1
18     1           3                      2
Rj     38.5        48.0                   21.5
The value of Fr without correction is given by:
\[ F_r = \left[ \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 \right] - 3N(k+1), \]
where N = number of subjects and k = number of groups.
Thus, we have
\[ F_r = \left[ \frac{12}{18(3)(3+1)} (38.5^2 + 48^2 + 21.5^2) \right] - 3(18)(4) = 236.028 - 216 = 20.028. \]
The value of Fr with correction for tied observations is computed by first recording the sizes of
the sets of tied scores within each row. In the given data, there are 45 ties of size 1 (all distinct
scores, read row-wise), 3 ties of size 2, and 1 tie of size 3. The correction for ties is given by:
\[ \sum_{i=1}^{N} \sum_{j=1}^{g_i} t_{ij}^3 = 45(1^3) + 3(2^3) + 1(3^3) = 96. \]
Hence the value of Fr with this correction is given by:
\[ F_r = \frac{12 \sum_{j=1}^{k} R_j^2 - 3N^2 k(k+1)^2}{Nk(k+1) + \dfrac{Nk - \sum_{i=1}^{N} \sum_{j=1}^{g_i} t_{ij}^3}{k-1}} \]
Substituting the values, we have,
\[ F_r = \frac{12(38.5^2 + 48^2 + 21.5^2) - 3(18^2)(3)(3+1)^2}{18(3)(3+1) + \dfrac{18(3) - 96}{2}} = \frac{12(4248.5) - 46656}{216 + (-21)} = \frac{50982 - 46656}{195} = \frac{4326}{195} = 22.1846 \text{ or } 22.185. \]
The value of Fr uncorrected for ties is therefore slightly lower than the value of Fr
corrected for ties. The significance of this value can be assessed using the Chi-square distribution
with degrees of freedom (d.f.) = k - 1.
Now, if d.f. = 2 and α = 0.05, the tabular value of Chi-square is 5.99. Since the value of Fr
exceeds the tabular value, we reject the null hypothesis. We conclude that at least one method of
teaching college algebra is superior to another.
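The Friedman computation, with and without the correction for ties, can be sketched in Python as follows. This is an illustrative sketch of the formulas above; the helper `midrank` and the variable names are hypothetical.

```python
from collections import Counter

def midrank(x, row):
    """Rank of x within its own row; tied values share the average of their ranks."""
    less = sum(v < x for v in row)
    equal = sum(v == x for v in row)
    return less + (equal + 1) / 2

# Each row is one matched set: (Method A, Method B, Method C).
scores = [
    (18, 19, 16), (24, 23, 20), (19, 20, 16), (23, 25, 23), (24, 21, 18),
    (18, 22, 16), (25, 25, 25), (19, 24, 18), (21, 23, 20), (22, 22, 17),
    (19, 20, 17), (23, 21, 16), (24, 24, 18), (20, 23, 21), (21, 22, 17),
    (19, 24, 17), (25, 22, 16), (18, 25, 20),
]
N = len(scores)
k = 3

# Column sums of the row-wise ranks.
rank_sums = [sum(midrank(row[j], row) for row in scores) for j in range(k)]
sum_sq = sum(r**2 for r in rank_sums)

fr = 12 / (N * k * (k + 1)) * sum_sq - 3 * N * (k + 1)

# Sum of t^3 over every set of tied scores in every row (untied scores have t = 1).
sum_t3 = sum(t**3 for row in scores for t in Counter(row).values())
fr_corrected = (12 * sum_sq - 3 * N**2 * k * (k + 1) ** 2) / (
    N * k * (k + 1) + (N * k - sum_t3) / (k - 1)
)

print(rank_sums, round(fr, 3), sum_t3, round(fr_corrected, 3))
```

When no row contains ties, the sum of t^3 equals Nk, the extra term in the denominator vanishes, and the corrected formula reduces to the uncorrected one.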
PAIRWISE COMPARISON
The test above is global, i.e., it only tells us that the three groups are not comparable. It does
not tell us which two specific groups are significantly different from one another. To determine
which two particular groups are significantly different, we perform the pairwise comparison test as
suggested by Siegel and Castellan (pp. 180-181).
First we determine the absolute differences of the sums of the ranks for the three groups.
|RA - RB| = |38.5 - 48| = 9.5
|RA - RC| = |38.5 - 21.5| = 17.0
|RB - RC| = |48 - 21.5| = 26.5
The difference of the sums of the ranks is greatest between the subjects exposed to Method
B and Method C, followed by those exposed to Method A and Method C.
The absolute difference is compared to the critical value given by
\[ z_{\alpha / k(k-1)} \sqrt{ \frac{Nk(k+1)}{6} }. \]
If α = 0.05, then \( \frac{\alpha}{k(k-1)} = \frac{.05}{6} = .0083 \). Using the normal table, the corresponding value of z (through
interpolation) is given by 2.394. And since
\[ \sqrt{ \frac{Nk(k+1)}{6} } = \sqrt{ \frac{18(3)(4)}{6} } = \sqrt{36} = 6, \]
the critical value for the pairwise comparison is 2.394(6) = 14.364.
Since 17 > 14.364 and 26.5 > 14.364, we conclude that the groups exposed to Methods A
and B each differed significantly from the group exposed to Method C. However, since 9.5 < 14.364,
the groups exposed to Methods A and B did not differ significantly in their scores in algebra.
On the basis of the pairwise comparison test, it could be said that teaching algebra using
modules and using computers are better methods than the lecture method.
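The pairwise comparison of rank sums can be sketched in Python as below. It is illustrative only: the tabular value 2.394 and the rank sums are taken from the text, and the variable names are hypothetical.

```python
import math
from itertools import combinations

rank_sum = {"A": 38.5, "B": 48.0, "C": 21.5}
N = 18               # number of matched sets
k = 3                # number of methods
z_tabular = 2.394    # tabular z for alpha / (k(k - 1)) = .05 / 6, from the text

# Critical value for the absolute difference between two rank sums.
critical = z_tabular * math.sqrt(N * k * (k + 1) / 6)

significant = [
    (a, b)
    for a, b in combinations(rank_sum, 2)
    if abs(rank_sum[a] - rank_sum[b]) >= critical
]
print(round(critical, 3), significant)
```

Note that this comparison uses differences of rank sums, whereas the Kruskal-Wallis comparison uses differences of mean ranks, so the two critical values are not on the same scale.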
-------------------------------------------------------------------------------------------------------------------------
COMPUTER OUTPUT USING SPSS
Friedman Test

Ranks
            Mean Rank
Method A    2.14
Method B    2.67
Method C    1.19

Test Statistics(a)
N               18
Chi-Square      22.185
df              2
Asymp. Sig.     .000
a. Friedman Test

Note on Chi-Square: this is the Fr statistic.
------------------------------------------------------------------------------------------------------------------------
CHI-SQUARE TEST FOR TWO INDEPENDENT SAMPLES
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: USED TO COMPARE TWO INDEPENDENT SAMPLES
In a study conducted on the use of seat belts in preventing fatalities, records of the last 100
vehicular accidents were reviewed. These 100 accidents involved 238 persons. Each person was
classified as using or not using seat belts when the accident happened and as injured fatally or a
survivor.
                       Wearing Seat Belts?
Injured Fatally?    Yes           No              Total
Yes                 9 (13.04)     88 (83.96)      97
No                  23 (18.96)    118 (122.04)    141
Total               32            206             238
(Expected frequencies are shown in parentheses.)
The samples could be treated as independent samples: those wearing seat belts and those
not wearing them. We can then compare the proportion of those wearing seat belts who were fatally
injured with the proportion of those not wearing them who were fatally injured. From this table, 9 out
of 32 or 28.1% of those who wore seat belts were fatally injured. On the other hand, 88 out of 206 or
42.7% of those who did not wear seat belts were fatally injured. Are these independent proportions or
percentages significantly different? The null hypothesis and the corresponding alternative hypothesis
are stated as follows:
Ho: There is no significant difference between the proportion of fatally injured persons who
wear and who do not wear seatbelts.
H1: There is a significant difference between the proportion of fatally injured persons who
wear and who do not wear seatbelts.
The test statistic appropriate for this problem is the Chi-square test. The general formula for
the Chi-square test is given by
\[ \chi^2 = \sum_{\text{over all cells}} \frac{(o - e)^2}{e} \]
where:
o is the observed frequency
e is the expected frequency given by e = (row total × column total) / grand total
For instance, from the given table we have the following computations:

o      e
9      (97)(32)/238 = 13.04202 or 13.04
88     (97)(206)/238 = 83.95798 or 83.96
23     (141)(32)/238 = 18.95798 or 18.96
118    (141)(206)/238 = 122.042 or 122.04

Thus, the Chi-square value is given by
\[ \chi^2 = \sum_{\text{over all cells}} \frac{(o - e)^2}{e} = \frac{(9 - 13.04)^2}{13.04} + \frac{(88 - 83.96)^2}{83.96} + \frac{(23 - 18.96)^2}{18.96} + \frac{(118 - 122.04)^2}{122.04} \]
= 1.251656441 + 0.194397332 + 0.860843881 + 0.133739757
= 2.440637411 or 2.441.
For 2 × 2 tables in which the sample size is small (say N < 50), the following formula is
recommended:
\[ \chi^2 = \sum_{\text{over all cells}} \frac{(|o - e| - 0.5)^2}{e}, \]
i.e., 0.5 is subtracted first from the absolute difference of the observed and the expected frequency
before squaring and dividing by the expected frequency. For the same table above, the corrected
value is given by:
\[ \chi^2 = \frac{(|9 - 13.04| - 0.5)^2}{13.04} + \frac{(|88 - 83.96| - 0.5)^2}{83.96} + \frac{(|23 - 18.96| - 0.5)^2}{18.96} + \frac{(|118 - 122.04| - 0.5)^2}{122.04} \]
= 0.961959 + 0.14943 + 0.661773 + 0.1028 (using the exact values)
= 1.875962 or 1.876
A more efficient formula that is equivalent to the equation shown above is given by
\[ \chi^2 = \frac{N \left( |AD - BC| - \dfrac{N}{2} \right)^2}{(A + B)(C + D)(A + C)(B + D)} \]
where the symbols are taken from a contingency table whose format is given below

                 Variable X
Variable Y    Yes      No       Total
Yes           A        B        A+B
No            C        D        C+D
Total         A+C      B+D      A+B+C+D

Thus, for the data in the table shown above, we have:
\[ \chi^2 = \frac{238 \left( |(9)(118) - (88)(23)| - \dfrac{238}{2} \right)^2}{(97)(141)(206)(32)} = \frac{238(962 - 119)^2}{90158784} = 1.876, \]
which is the same value of the test statistic.
Since the tabular Chi-square value at the 0.05 level of significance and df = 1 is 3.84, the null
hypothesis cannot be rejected. There is insufficient evidence to show that wearing seat belts
reduces the number of fatally injured people during accidents.
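The three Chi-square computations for the seat belt data can be compared with a short Python sketch. It is illustrative only and the variable names are hypothetical. Because the sketch keeps the expected frequencies unrounded, the uncorrected value comes out near 2.443 rather than the 2.441 obtained above from expected frequencies rounded to two decimals.

```python
observed = [[9, 88], [23, 118]]   # rows: injured fatally (yes, no); columns: seat belt (yes, no)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [
    (o, e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]

chi2 = sum((o - e) ** 2 / e for o, e in cells)
chi2_corrected = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)

# Shortcut formula for a 2 x 2 table with the same continuity correction.
(a, b), (c, d) = observed
chi2_shortcut = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

print(round(chi2, 3), round(chi2_corrected, 3), round(chi2_shortcut, 3))
```

The cell-by-cell corrected value and the shortcut formula agree, and both fall below the tabular value of 3.84.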
Remarks: When to use the Chi-square test
1. When N ≤ 20, always use the Fisher exact test.
2. When N is between 20 and 40, the Chi-square test given by
\[ \chi^2 = \frac{N \left( |AD - BC| - \dfrac{N}{2} \right)^2}{(A + B)(C + D)(A + C)(B + D)} \]
may be used if all expected frequencies are 5 or more. If the smallest expected frequency is
less than 5, use the Fisher exact test.
3. When N > 40, use the Chi-square test with the formula above.
MCNEMAR TEST
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: USED TO COMPARE TWO DEPENDENT SAMPLES
The McNemar test for the significance of changes is applicable to “before-and-after” designs
in which each subject is its own control and in which the measurements are made on either a
nominal or ordinal scale.
To test the significance of any observed change using this test, a fourfold table of frequencies
is used to represent the first and second sets of responses from the same individuals. In this table, +
and – signs are used to denote different responses arranged as shown below

              After
Before      –      +
+           A      B
–           C      D

where A, B, C, and D are the observed frequencies. Thus, A denotes the number of individuals whose
responses were + on the first measure and – on the second measure. Similarly, D is the number of
individuals who changed from minus (–) to plus (+). B and C are the respondents who responded the
same (+ for B and – for C) on both measures.
If the null hypothesis of no significant difference in the number who changed from plus (+)
to minus (–) and from minus (–) to plus (+) is true, then the expected frequencies for cells A and
D are each equal to (A+D)/2. The corresponding test statistic is called the McNemar test and is given
by
Test Statistic (with Yates’ correction):
\[ \chi^2 = \frac{(|A - D| - 1)^2}{A + D} \]
with df = 1.
Illustration:
How consistent are people in their voting habits? Do people vote for the same political party
from election to election? Below are the results of a poll in which people were asked if they had
voted for NP or LP in each of the last two presidential elections.

                          1998 Elections
    1992 Elections        NP        LP
          NP             117        27
          LP              23       178

In applying the McNemar test, we always use + and – to denote different responses. The table is
organized as shown below:

                    After
    Before        –       +
      +           A       B
      –           C       D

The obtained data can be assumed to have come from two dependent samples since the
same group of people were interviewed on two different occasions. Moreover, the data are
frequency counts so the McNemar Test is applicable. We first rearrange the entries in the given
table to conform to the table suggested by Siegel and Castellan as shown below.

                              1998 Elections
    1992 Elections       + (NP)     – (LP)     Total
        – (LP)              23        178        201
        + (NP)             117         27        144
        Total              140        205        345

The analysis could be centered on the proportion of voters who voted for NP during the 1992
elections and during the 1998 elections.
From the table, we observe that 144 out of 345 or about 41.7% voted for NP during the 1992
elections. On the other hand, only 140 out of 345 or about 40.6% voted for NP during the 1998
elections. There was therefore a decrease in the proportion of voters who voted for NP from the
1992 elections to the 1998 elections. Thus, our null and alternative hypotheses are:
Ho: There is no significant difference between the proportion of those who voted for NP during
the 1992 elections and the proportion of those who voted for the same party during the 1998
elections.
H1: The proportion of those who voted for NP during the 1998 elections was significantly
lower than the proportion of those who voted for the same party during the 1992 elections.
Using the McNemar test, we have A = 23, D = 27. Thus, the value of the test statistic is
$$\chi^2 = \frac{(|23 - 27| - 1)^2}{23 + 27} = 0.18$$
The tabular chi-square value at 0.05 level of significance is 3.84. Since the computed value
did not exceed the tabular value, we do not reject the null hypothesis.
It could be said that people tend to vote for the same party affiliation from election to
election.
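The McNemar computation for the election data can be sketched in a few lines. The function name is illustrative, and the p-value uses the closed-form upper tail of the chi-square distribution with 1 d.f., which is an addition to the hand computation rather than part of it.

```python
import math

def mcnemar_chi_square(a, d):
    """Continuity-corrected McNemar statistic from the two 'changer' cells."""
    return (abs(a - d) - 1) ** 2 / (a + d)

# A = 23 and D = 27 are the voters who switched parties between elections.
stat = mcnemar_chi_square(23, 27)

# Upper-tail probability of a chi-square variate with 1 d.f.
p_value = math.erfc(math.sqrt(stat / 2))

print(round(stat, 2), round(p_value, 3))  # 0.18 0.671
```

Only the two off-diagonal cells enter the statistic; the 117 and 178 voters who did not switch carry no information about change.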
COMPUTER OUTPUT USING THE SPSS
McNemar Test
Y1998 & Y1992

                    Y1992
    Y1998         NP      LP
      NP         117      23
      LP          27     178

Test Statistics(b)
                               Y1998 & Y1992
    N                               345
    Chi-Square(a)                  .180
    Asymp. Sig.                    .671
    a. Continuity Corrected
    b. McNemar Test

Remark:
When the expected frequency (A+D)/2 is small (less than 5), we use the Binomial Test.
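For the small-sample case mentioned in the remark, an exact binomial p-value can be sketched as follows. This assumes the usual setup in which, under the null hypothesis, each of the A + D subjects who changed is equally likely to change in either direction; the two-sided p-value is taken as twice the smaller tail, which is one common convention and not necessarily the one intended in the original material. The function name is hypothetical.

```python
from math import comb

def mcnemar_exact_p(a, d):
    """Two-sided exact binomial p-value for the changer cells a and d (p = 0.5)."""
    n = a + d
    k = min(a, d)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail and cap at 1.
    return min(1.0, 2 * lower_tail)

# Hypothetical small example: 1 subject changed one way, 5 the other way.
print(mcnemar_exact_p(1, 5))  # 0.21875
```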
CHI-SQUARE TEST FOR THREE OR MORE INDEPENDENT SAMPLES
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: TO COMPARE THREE OR MORE INDEPENDENT SAMPLES
Suppose we want to compare the effectiveness of three methods of teaching advanced
statistics namely, Lecture Method (Method 1), Modular Method (Method 2) and Using CAI Materials
(Method 3). To do this, we first randomly form three independent groups (samples) of students
where each group will be taught by any one of the three methods of teaching. The dependent
variable in this case is the performance of the students in the final examination in Advanced
Statistics. If the data are scores, the One way ANOVA will be applicable. But suppose the
performance of the student is categorized into one of the following categories: Below Satisfactory
(score of 74 and below), Fair (75 – 79), Satisfactory (80 – 84), and Above Satisfactory (85 and above).
It would be interesting to determine how many of the students in each group would have
scores falling within each of these four categories. A comparison of the frequencies or proportions
can be done descriptively. But if we want to test whether the proportions within each category differ
significantly from one another, the Chi-Square test of significance can be used.
Ho: The distribution of grades of students exposed to the three teaching methods will not
differ significantly. (or There is no significant difference between the proportion of
students in each of the three groups who obtained Above Satisfactory ratings)
Ha: The distribution of grades of students exposed to the three teaching methods will differ
significantly. (or There is a significant difference between the proportion of students in
each of the three groups who obtained Above Satisfactory ratings)
Let us use the following hypothetical data to test the given null hypothesis:

                                   Method of Teaching
    Performance Category      Lecture   Modular    CAI     TOTAL
    Above Satisfactory            9        20       18       47
    Satisfactory                 12        18       21       51
    Fair                         15        10        8       33
    Below Satisfactory           24        12        6       42
    Total                        60        60       53      173

The hypothetical data show that there were 60 students exposed to the lecture method, 60
students exposed to the modular method, and 53 students exposed to the use of Computer Assisted
Instruction. It can be gleaned from the same table that the distribution of performance for students
under the lecture method seems to differ from that of students exposed to the other two methods.
We will test the significance of this difference by computing the Chi-square test statistic given by
$$\chi^2 = \sum_{\text{over all cells}} \frac{(o - e)^2}{e} = \sum_{\text{over all cells}} \frac{o^2}{e} - N, \quad \text{with d.f.} = (r-1)(c-1)$$
where:
o is the observed frequency,
e is the expected frequency given by e = (row total × column total) / grand total,
N is the grand total, and
r is the number of categories of the row variable (dependent variable) while
c is the number of categories of the column variable (independent variable).
To compute this value, we need to compute the expected frequencies corresponding to the
observed frequencies. The results are shown in the table below.

                                      Method of Teaching
    Performance Category      Lecture      Modular       CAI        TOTAL
    Above Satisfactory        9 (16.3)    20 (16.3)    18 (14.4)      47
    Satisfactory             12 (17.7)    18 (17.7)    21 (15.6)      51
    Fair                     15 (11.4)    10 (11.4)     8 (10.1)      33
    Below Satisfactory       24 (14.6)    12 (14.6)     6 (12.9)      42
    Total                       60           60           53         173

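Based on the table entries, the value of the Chi-square statistic is given by
$$\chi^2 = \frac{9^2}{16.3} + \frac{12^2}{17.7} + \frac{15^2}{11.4} + \frac{24^2}{14.6} + \frac{20^2}{16.3} + \frac{18^2}{17.7} + \frac{10^2}{11.4} + \frac{12^2}{14.6} + \frac{18^2}{14.4} + \frac{21^2}{15.6} + \frac{8^2}{10.1} + \frac{6^2}{12.9} - 173$$
$$= 193.670 - 173 = 20.67$$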
Thus the computed Chi-square value is 20.67. At α = 0.05 and d.f. = (4-1)(3-1) = 6, the
corresponding tabular value is 12.59. Since the computed chi-square value exceeded the tabular
value, the null hypothesis is rejected. It may be concluded that the effects of the three methods of
teaching on the performance of the students in the final test are significantly different.
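The same test can be reproduced programmatically. The sketch below computes the expected frequencies from the marginal totals and sums (o − e)²/e over all cells; the function name is illustrative. Because it keeps the expected frequencies unrounded, it gives about 20.65 rather than the hand-computed 20.67, which used expected values rounded to one decimal place.

```python
def chi_square_rxc(table):
    """Pearson chi-square statistic and d.f. for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Above Satisfactory, Satisfactory, Fair, Below Satisfactory.
# Columns: Lecture, Modular, CAI.
observed = [
    [9, 20, 18],
    [12, 18, 21],
    [15, 10, 8],
    [24, 12, 6],
]

stat, df = chi_square_rxc(observed)
print(round(stat, 2), df)  # 20.65 6
print(stat > 12.59)        # True, so the null hypothesis is rejected
```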
To determine where the differences lie, we conduct the pairwise comparison test by first
partitioning the original table into 2 × 2 sub-tables each with d.f. = 1.
The ith partition table for any r × c contingency table has the following entries needed to
compute the corresponding chi-square value:

        A      B      R1
        C      D      R2
        C1     C2     N

where
C1 is the sum of all marginal column totals determined by A and C;
C2 is the sum of all marginal column totals determined by B and D;
R1 is the sum of all marginal row totals determined by A and B;
R2 is the sum of all marginal row totals determined by C and D;
N is the grand total in the original contingency table.
The Chi-square statistic associated with each partition is computed using the formula
$$\chi^2 = \frac{N\,[\,C_2(A R_2 - C R_1) - C_1(B R_2 - D R_1)\,]^2}{C_1\, C_2\, R_1\, R_2\, (C_1 + C_2)(R_1 + R_2)}$$
For the given contingency table, we have the following computed χ² values:
1. Comparing the effects of lecture and modular methods for students who got Above Satisfactory
and Satisfactory performance. The partition table is given by:

         9     20     47
        12     18     51
        60     60    173

Therefore,
$$\chi^2 = \frac{173\,[\,60(9 \cdot 51 - 12 \cdot 47) - 60(20 \cdot 51 - 18 \cdot 47)\,]^2}{60 \cdot 60 \cdot 47 \cdot 51 \cdot (60 + 60)(47 + 51)} = 0.48$$
2. Comparing the effects of lecture and modular methods for students who got Above Satisfactory
and Satisfactory performance combined and those who got Fair performance. The partition table
is given by:

        21     38     98
        15     10     33
        60     60    173

Therefore,
$$\chi^2 = \frac{173\,[\,60(21 \cdot 33 - 15 \cdot 98) - 60(38 \cdot 33 - 10 \cdot 98)\,]^2}{60 \cdot 60 \cdot 98 \cdot 33 \cdot (60 + 60)(98 + 33)} = 3.76$$
3. Comparing the effects of lecture and modular methods for students who got Above Satisfactory,
Satisfactory and Fair performance combined and those who got Below Satisfactory performance.
The partition table is given by:

        36     48    131
        24     12     42
        60     60    173

Therefore,
$$\chi^2 = \frac{173\,[\,60(36 \cdot 42 - 24 \cdot 131) - 60(48 \cdot 42 - 12 \cdot 131)\,]^2}{60 \cdot 60 \cdot 131 \cdot 42 \cdot (60 + 60)(131 + 42)} = 6.53$$
4. Comparing the effects of lecture and modular methods combined versus the effect of CAI for
students who got Above Satisfactory and Satisfactory performance. The partition table is given
by:

        29     18     47
        30     21     51
       120     53    173

Therefore,
$$\chi^2 = \frac{173\,[\,53(29 \cdot 51 - 30 \cdot 47) - 120(18 \cdot 51 - 21 \cdot 47)\,]^2}{120 \cdot 53 \cdot 47 \cdot 51 \cdot (120 + 53)(47 + 51)} = 0.10$$
5. Comparing the effects of lecture and modular methods combined versus the effect of CAI for
students who got Above Satisfactory and Satisfactory performance combined and those who got
Fair performance. The partition table is given by:

        59     39     98
        25      8     33
       120     53    173

Therefore,
$$\chi^2 = \frac{173\,[\,53(59 \cdot 33 - 25 \cdot 98) - 120(39 \cdot 33 - 8 \cdot 98)\,]^2}{120 \cdot 53 \cdot 98 \cdot 33 \cdot (120 + 53)(98 + 33)} = 2.81$$
6. Comparing the effects of lecture and modular methods combined versus CAI for students who
got Above Satisfactory, Satisfactory and Fair performance combined and those who got Below
Satisfactory performance. The partition table is given by:

        84     47    131
        36      6     42
       120     53    173

Therefore,
$$\chi^2 = \frac{173\,[\,53(84 \cdot 42 - 36 \cdot 131) - 120(47 \cdot 42 - 6 \cdot 131)\,]^2}{120 \cdot 53 \cdot 131 \cdot 42 \cdot (120 + 53)(131 + 42)} = 7.0$$
Summary of the Chi-square Values:

    Partition     χ² value     Tabular Value     Interpretation
        1           0.48           3.84          Not Significant
        2           3.76           3.84          Not Significant
        3           6.53           3.84          Significant
        4           0.10           3.84          Not Significant
        5           2.81           3.84          Not Significant
        6           7.00           3.84          Significant
      Total        20.68

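The six partition values can be recomputed directly from the partition formula. The sketch below is illustrative (the function name and the tuple layout are assumptions); each tuple lists A, B, C, D, R1, R2, C1, C2 as they appear in the partition tables. The unrounded results agree with the summary table to within rounding, and their sum is about 20.65, the chi-square value of the full table when the expected frequencies are not rounded.

```python
def partition_chi_square(a, b, c, d, r1, r2, c1, c2, n):
    """Chi-square (1 d.f.) for one 2 x 2 partition of an r x c table."""
    numerator = n * (c2 * (a * r2 - c * r1) - c1 * (b * r2 - d * r1)) ** 2
    denominator = c1 * c2 * r1 * r2 * (c1 + c2) * (r1 + r2)
    return numerator / denominator

# (A, B, C, D, R1, R2, C1, C2) for partitions 1 to 6.
partitions = [
    (9, 20, 12, 18, 47, 51, 60, 60),
    (21, 38, 15, 10, 98, 33, 60, 60),
    (36, 48, 24, 12, 131, 42, 60, 60),
    (29, 18, 30, 21, 47, 51, 120, 53),
    (59, 39, 25, 8, 98, 33, 120, 53),
    (84, 47, 36, 6, 131, 42, 120, 53),
]

# N = 173 is the grand total of the original contingency table.
values = [partition_chi_square(*p, 173) for p in partitions]

for number, value in enumerate(values, start=1):
    verdict = "Significant" if value > 3.84 else "Not Significant"
    print(number, round(value, 2), verdict)
print("Total", round(sum(values), 2))
```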
Based on the comparisons made, it is concluded that the effects of the lecture and modular
methods of teaching advanced statistics are significantly different when those who obtained at
least Fair performance are compared with those who got Below Satisfactory performance.
Similarly, the effects of the lecture and modular methods combined are deemed significantly
different from the effects of CAI when students who got at least Fair performance are compared
with those who got Below Satisfactory performance.
In general then, the effects of the three methods of teaching can be said to differ from
one another insofar as those who failed are concerned. The hypothetical data showed that the
most failures were in the group exposed to the lecture method, followed by the group exposed to
the modular method. There were fewer failures in the group taught using CAI.
ANALYSIS OF THE SAME DATA USING SPSS:
Crosstabs
Performance * Method of Teaching Crosstabulation

                                              Method of Teaching
    Performance                           Lecture   Modular     CAI     Total
    Above Satisfactory   Count                9        20        18       47
                         Expected Count     16.3      16.3      14.4     47.0
    Satisfactory         Count               12        18        21       51
                         Expected Count     17.7      17.7      15.6     51.0
    Fair                 Count               15        10         8       33
                         Expected Count     11.4      11.4      10.1     33.0
    Below Satisfactory   Count               24        12         6       42
                         Expected Count     14.6      14.6      12.9     42.0
    Total                Count               60        60        53      173
                         Expected Count     60.0      60.0      53.0    173.0

Chi-Square Tests
                             Value      df     Asymp. Sig. (2-sided)
    Pearson Chi-Square      20.647a      6            .002
    N of Valid Cases          173
    a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.11.

COCHRAN’S Q TEST
DATA REQUIREMENT: NOMINAL OR FREQUENCY COUNTS
FUNCTION: TO COMPARE THREE OR MORE DEPENDENT SAMPLES
Example:
An experimental study was conducted to determine the method of teaching that would
improve the conceptual understanding of the students in Physics. Twenty sets of matched individuals
were selected and randomly assigned to the three groups. The dependent variable of the study was
the students’ performance in the test to be given after the experiment. Each student’s performance
was coded as 1 if the student passes the test and 0 if he fails. The data are shown below.

    Subject    METHOD A    METHOD B    METHOD C     Li     Li²
       1           1           0           0         1      1
       2           1           1           0         2      4
       3           0           1           0         1      1
       4           1           0           0         1      1
       5           1           0           0         1      1
       6           1           1           1         3      9
       7           1           1           0         2      4
       8           1           0           1         2      4
       9           0           1           0         1      1
      10           1           1           1         3      9
      11           1           1           0         2      4
      12           1           0           0         1      1
      13           0           1           1         2      4
      14           0           0           0         0      0
      15           1           0           1         2      4
      16           1           1           1         3      9
      17           1           1           0         2      4
      18           0           0           0         0      0
      19           1           1           0         2      4
      20           0           0           0         0      0
     Total      G1 = 14     G2 = 11     G3 = 6    ΣLi = 31   ΣLi² = 65

Proportion who passed: P1 = 14/20 or 70%; P2 = 11/20 or 55%; P3 = 6/20 or 30%

In this research problem, the Cochran's Q test, whose test statistic is given below, is
appropriate for the following reasons:
1. We are comparing three dependent samples.
2. The data are categorical.
Test Statistic:
$$Q = \frac{(k-1)\left[\,k\sum_{j=1}^{k} G_j^2 - \left(\sum_{j=1}^{k} G_j\right)^2\right]}{k\sum_{i=1}^{N} L_i - \sum_{i=1}^{N} L_i^2}$$
where k is the number of groups, N is the number of matched sets, Gj is the number of passes
(1's) in the jth column, and Li is the number of passes in the ith row. Q is referred to the
Chi-square distribution with df = k – 1.
The null and alternative hypotheses in this research problem are:
Ho: There is no significant difference in the proportion of subjects who pass the test in each of the
three groups.
H1: There is a significant difference in the proportion of subjects who pass the test in each of the
three groups.
Based on the given data, the computed value of the test statistic Q is
$$Q = \frac{(k-1)\left[\,k\sum_{j=1}^{k} G_j^2 - \left(\sum_{j=1}^{k} G_j\right)^2\right]}{k\sum_{i=1}^{N} L_i - \sum_{i=1}^{N} L_i^2} = \frac{(3-1)\left[\,3(14^2 + 11^2 + 6^2) - 31^2\,\right]}{3(31) - 65} = \frac{2(98)}{28} = 7.0.$$
At α = 0.05 and df = 2, the tabular Chi-square value is 5.99. Therefore the null hypothesis is
rejected. We conclude that the observed proportions are significantly different.
Note: When the Q statistic is significant, pairwise comparison must be conducted to determine
which two particular groups are significantly different.
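The Q statistic for the Physics data can be recomputed from the raw 0/1 scores. This is a minimal sketch with illustrative names; the p-value line uses the closed form exp(−Q/2), which holds only because df = 2 in this example.

```python
import math

def cochran_q(rows):
    """Cochran's Q and its d.f. for a list of matched sets of 0/1 scores."""
    k = len(rows[0])
    column_totals = [sum(col) for col in zip(*rows)]   # G_j
    row_totals = [sum(row) for row in rows]            # L_i
    numerator = (k - 1) * (
        k * sum(g * g for g in column_totals) - sum(column_totals) ** 2
    )
    denominator = k * sum(row_totals) - sum(l * l for l in row_totals)
    return numerator / denominator, k - 1

method_a = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0]
method_b = [0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
method_c = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]

# One row per matched set: (Method A, Method B, Method C).
rows = list(zip(method_a, method_b, method_c))

q, df = cochran_q(rows)
p_value = math.exp(-q / 2)  # chi-square upper tail, valid for df = 2 only

print(q, df, round(p_value, 3))  # 7.0 2 0.03
```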
COMPUTER OUTPUT USING SPSS
Cochran Test
Frequencies
                       Value
                     0       1
    METHOD A         6      14
    METHOD B         9      11
    METHOD C        14       6

Test Statistics
    N                 20
    Cochran's Q       7.000a
    df                2
    Asymp. Sig.       .030
    a. 1 is treated as a success.

SUMMARY
This material discussed test statistics that can be used when the given data cannot be
analyzed using a parametric test like the t-test and the F-test because of the data requirements
and the scale of measurement used in gathering the data.
As mentioned earlier, the nonparametric statistical tests discussed in this material are only
the most commonly used tests which are alternatives to the t-test and the F-test. There are other
nonparametric tests that could be used and interested readers are referred to the book of Siegel
and Castellan. The table below summarizes the relation between parametric and nonparametric
tests used for comparing groups.

                     Independent Samples                          Dependent Samples
    Type of Data     2 groups                  3 or more groups   2 groups                     3 or more groups
    Interval         t-test                    F-test             t-test                       F-test
                                               (One Way ANOVA)                                 (Repeated Measures ANOVA)
    Ordinal          Wilcoxon Rank Sum Test    Kruskal Wallis     Wilcoxon Signed Ranks Test   Friedman's Test
                                               Test
    Nominal          Chi-Square Test           Chi-Square Test    McNemar Test                 Cochran's Q test

REFERENCE: Siegel, S. & Castellan, N. J. (1988). Nonparametric Statistics for the Behavioral
Sciences (2nd ed.). New York: McGraw-Hill Book Company.