
Non-parametric Tests

Applied Statistics for Engineers – Non-parametric Tests
Husam A. Abu Hajar
Non-parametric Tests
Non-parametric tests (sometimes called assumption-free tests) are used when one or more of the
parametric assumptions are violated. These tests work based on ranking the data from the lowest
score (rank of 1) and the rank increases accordingly (2, 3, …, 𝑛). The analysis is then carried out
on the ranks (not the actual scores).
The main non-parametric tests that we will investigate in this chapter are:




Mann-Whitney test.
Wilcoxon signed-rank test.
Friedman’s test.
Kruskal-Wallis test.
Comparing two independent conditions
For testing two different conditions with different participants, we may use the non-parametric
equivalents of the independent 𝑡-test: the Mann-Whitney test and the Wilcoxon rank-sum test.
Example: A doctor is interested in assessing the effect of two drugs so she tested two different
groups of patients: the first group (10 participants), were given drug A while the second group (10
participants) were given drug B. A certain health indicator was measured over two days for all
participants and the results are presented in the following Table.
Participant   Drug   Indicator (day 1)   Indicator (day 2)
1             A      15                  28
2             A      35                  35
3             A      16                  35
4             A      18                  24
5             A      19                  39
6             A      17                  32
7             A      27                  27
8             A      16                  29
9             A      13                  36
10            A      20                  35
11            B      16                  5
12            B      15                  6
13            B      20                  30
14            B      15                  8
15            B      16                  9
16            B      13                  7
17            B      14                  6
18            B      19                  17
19            B      18                  3
20            B      18                  10
The rationale behind the Wilcoxon rank-sum and Mann-Whitney tests is as follows: Ignore the
drug type for a moment and rank the indicator data from the lowest to the highest score. If there
were no significant differences between the two drugs, we would expect the sum of ranks in drug
A to be quite similar to the sum of ranks in drug B.
If we were to take only the “Indicator (day 2)” data and rank it from the lowest to the highest, the
ranking will look like this:
Participant   Drug   Indicator (day 2)   Potential rank   Actual rank
19            B      3                   1                1
11            B      5                   2                2
17            B      6                   3                3.5
12            B      6                   4                3.5
16            B      7                   5                5
14            B      8                   6                6
15            B      9                   7                7
20            B      10                  8                8
18            B      17                  9                9
4             A      24                  10               10
7             A      27                  11               11
1             A      28                  12               12
8             A      29                  13               13
13            B      30                  14               14
6             A      32                  15               15
3             A      35                  16               17
10            A      35                  17               17
2             A      35                  18               17
9             A      36                  19               19
5             A      39                  20               20
Notice that tied scores (the two 6s and the three 35s) receive the same actual rank: each is assigned the average of the ranks they would otherwise occupy.
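This ranking step is easy to verify programmatically. A minimal Python sketch (the function name avg_ranks is my own) that reproduces the "Actual rank" column:

```python
def avg_ranks(scores):
    """Rank scores from lowest (rank 1) to highest, averaging tied ranks."""
    ordered = sorted(scores)
    # Map each distinct score to the mean of the positions it occupies.
    positions = {}
    for pos, s in enumerate(ordered, start=1):
        positions.setdefault(s, []).append(pos)
    return [sum(positions[s]) / len(positions[s]) for s in scores]

day2 = [3, 5, 6, 6, 7, 8, 9, 10, 17, 24, 27, 28, 29, 30, 32, 35, 35, 35, 36, 39]
print(avg_ranks(day2))
```

The two 6s share rank (3 + 4)/2 = 3.5 and the three 35s share rank (16 + 17 + 18)/3 = 17, exactly as in the table.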
Wilcoxon rank-sum test: If we add the ranks for each drug, the sums of ranks are 151 for drug A and 59 for drug B. The smaller of the two sums is selected as our test statistic for day 2, that is, W = 59. Next we find the mean W̄ and the standard error SE:

W̄ = n1(n1 + n2 + 1)/2 = 10 × (10 + 10 + 1)/2 = 105

SE = √[n1 × n2 × (n1 + n2 + 1)/12] = √[10 × 10 × (10 + 10 + 1)/12] = 13.23

Now we convert the test statistic to a z-score:

z = (W − W̄)/SE = (59 − 105)/13.23 = −3.48
This value is smaller than −1.96, so we conclude that there is a significant difference between the two drugs on day 2 (two-tailed test at the 0.05 level).
Mann-Whitney test: the test statistic U is derived as follows:

U = n1 n2 + n1(n1 + 1)/2 − R1 = 10 × 10 + (10 × 11)/2 − 151 = 4.00

where R1 is the sum of ranks for drug A. The U statistic leads to the same conclusion: the difference is significant.
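The whole day-2 calculation can be sketched in Python as a check on the hand computation (variable names are mine):

```python
import math

# Day-2 indicator scores for the two groups.
drug_a = [28, 35, 35, 24, 39, 32, 27, 29, 36, 35]
drug_b = [5, 6, 30, 8, 9, 7, 6, 17, 3, 10]

# Rank the pooled scores from lowest to highest, averaging tied ranks.
pooled = drug_a + drug_b
positions = {}
for pos, s in enumerate(sorted(pooled), start=1):
    positions.setdefault(s, []).append(pos)
rank = {s: sum(p) / len(p) for s, p in positions.items()}

r_a = sum(rank[s] for s in drug_a)   # sum of ranks, drug A: 151
r_b = sum(rank[s] for s in drug_b)   # sum of ranks, drug B: 59

# Wilcoxon rank-sum: W is the smaller rank sum, converted to a z-score.
n1 = n2 = 10
w = min(r_a, r_b)
w_mean = n1 * (n1 + n2 + 1) / 2
se = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (w - w_mean) / se

# Mann-Whitney U, from the rank sum of drug A.
u = n1 * n2 + n1 * (n1 + 1) / 2 - r_a

print(w, round(z, 2), u)   # 59.0 -3.48 4.0
```

The results (W = 59, z = −3.48, U = 4) match both the hand calculation and the SPSS output for day 2.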
Running the Mann-Whitney test on SPSS
Data input is done using a coding variable (drug) so the “Variable View” window should look like
this:
First, we need to explore the data (normality and homogeneity of variances). It is clear that the homogeneity-of-variances assumption has been met; however, not all variables satisfy the normality assumption. Therefore, we will carry out non-parametric tests on the data.
Tests of Normality
                          Kolmogorov-Smirnov(a)       Shapiro-Wilk
                Drug      Statistic   df   Sig.       Statistic   df   Sig.
IndicatorDay1   Drug A    .276        10   .030       .811        10   .020
                Drug B    .170        10   .200*      .959        10   .780
IndicatorDay2   Drug A    .235        10   .126       .941        10   .566
                Drug B    .305        10   .009       .753        10   .004
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Test of Homogeneity of Variance
                                           Levene Statistic   df1   df2      Sig.
IndicatorDay1   Based on Mean              3.644              1     18       .072
                Based on Median            1.880              1     18       .187
                Based on Median and with   1.880              1     10.076   .200
                  adjusted df
                Based on trimmed mean      2.845              1     18       .109
IndicatorDay2   Based on Mean              .508               1     18       .485
                Based on Median            .091               1     18       .766
                Based on Median and with   .091               1     11.888   .768
                  adjusted df
                Based on trimmed mean      .275               1     18       .606
Go to Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples. Remember to click "Define Groups" and input the numbers 1 and 2 to compare drug A and drug B.
Click "Exact" to select the method for calculating the Mann-Whitney significance. The default "Asymptotic only" method is accurate only for large samples. For small samples or badly skewed data, the "Exact" method, although more computationally demanding, is more accurate. The "Monte Carlo" method creates a distribution similar to the sample's and draws several samples from it (10,000 by default), from which the mean significance level is computed. In summary, we select "Monte Carlo" for large samples and "Exact" for small samples.
You may want to click “Options” and then check the “Descriptives” option. Click “OK” to obtain
the SPSS output.
SPSS output
Ranks
                Drug     N    Mean Rank   Sum of Ranks
IndicatorDay1   Drug A   10   11.95       119.50
                Drug B   10   9.05        90.50
                Total    20
IndicatorDay2   Drug A   10   15.10       151.00
                Drug B   10   5.90        59.00
                Total    20
The “Ranks” table is a summary of the averages and sums of the total ranks for each drug. This
table is like a descriptive statistics table of the ranks and is useful to interpret significant differences
should they exist.
Test Statistics(a)
                                 IndicatorDay1   IndicatorDay2
Mann-Whitney U                   35.500          4.000
Wilcoxon W                       90.500          59.000
Z                                -1.105          -3.484
Asymp. Sig. (2-tailed)           .269            .000
Exact Sig. [2*(1-tailed Sig.)]   .280(b)         .000(b)
Exact Sig. (2-tailed)            .288            .000
Exact Sig. (1-tailed)            .144            .000
Point Probability                .013            .000
a. Grouping Variable: Drug
b. Not corrected for ties.
The "Test Statistics" table contains the computed z-scores for the variables, the Mann-Whitney U statistic, the Wilcoxon W statistic, and the significance levels for the Mann-Whitney test. The difference is significant for day 2 but not for day 1.
Comparing two related conditions
If scores under two different conditions come from the same participants, we will use the Wilcoxon
signed-rank test (different from Wilcoxon sum test) which is equivalent to the paired 𝑡-test.
Going back to the previous example (drugs), if we want to compare the effects of the drug between
the first day and the second day, then we are conducting a repeated-measures experiment (same
participants are evaluated over two days). The data is non-parametric so we will use the Wilcoxon
signed-rank test which works in a pretty similar way to the dependent 𝑡-test. The differences
between scores in the two conditions are ranked (ignoring the sign of the difference) and the sign
of the difference (+ or –) is assigned to the rank. Zero differences (equal scores) are eliminated
from the analysis. Tied ranks are handled in a similar manner as described earlier. The sums of
positive ranks and negative ranks are computed and the 𝑇 statistic is the smaller of the two sums.
Then we calculate a mean statistic and a standard error, from which we calculate a 𝑧-score to be
compared against a critical value.
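As a sketch, this procedure applied to the drug A data looks like the following in Python. No tie correction is applied to the standard error here, which is why z comes out at −2.52 rather than the tie-corrected −2.527 that SPSS reports:

```python
import math

# Drug A scores for the same 10 participants on day 1 and day 2.
day1 = [15, 35, 16, 18, 19, 17, 27, 16, 13, 20]
day2 = [28, 35, 35, 24, 39, 32, 27, 29, 36, 35]

# Differences; zero differences are dropped from the analysis.
diffs = [b - a for a, b in zip(day1, day2) if b != a]

# Rank the absolute differences, averaging tied ranks.
positions = {}
for pos, v in enumerate(sorted(abs(d) for d in diffs), start=1):
    positions.setdefault(v, []).append(pos)
rank = {v: sum(p) / len(p) for v, p in positions.items()}

# Sum the ranks of the positive and negative differences separately.
t_plus = sum(rank[abs(d)] for d in diffs if d > 0)
t_minus = sum(rank[abs(d)] for d in diffs if d < 0)
t = min(t_plus, t_minus)

# Convert T to a z-score (no tie correction in this sketch).
n = len(diffs)
t_mean = n * (n + 1) / 4
se = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t - t_mean) / se

print(t_plus, t_minus, round(z, 2))   # 36.0 0 -2.52
```

The 8 positive ranks summing to 36, zero negative ranks, and 2 dropped ties match the "Ranks" table in the SPSS output for drug A.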
Running Wilcoxon signed-rank test on SPSS
We need to split the variables based on the drug (depending on the version, SPSS may automatically split the variables based on the drug). We then carry out the repeated-measures analysis on SPSS by going to Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples as shown in the Figure below. Use the same guidelines for "Exact" and "Options" as before.
SPSS output
Descriptive Statistics
Drug                     N    Mean      Std. Deviation   Minimum   Maximum
Drug A   IndicatorDay1   10   19.6000   6.60303          13.00     35.00
         IndicatorDay2   10   32.0000   4.78423          24.00     39.00
Drug B   IndicatorDay1   10   16.4000   2.27058          13.00     20.00
         IndicatorDay2   10   10.1000   7.95054          3.00      30.00
The first key output is the "Ranks" table, which provides a summary of the ranked scores, including the number of negative ranks (day 2 < day 1), the number of positive ranks (day 2 > day 1), and the number of tied ranks.
Ranks
Drug                                        N      Mean Rank   Sum of Ranks
Drug A   IndicatorDay2 -   Negative Ranks   0(a)   .00         .00
         IndicatorDay1     Positive Ranks   8(b)   4.50        36.00
                           Ties             2(c)
                           Total            10
Drug B   IndicatorDay2 -   Negative Ranks   9(a)   5.22        47.00
         IndicatorDay1     Positive Ranks   1(b)   8.00        8.00
                           Ties             0(c)
                           Total            10
a. IndicatorDay2 < IndicatorDay1
b. IndicatorDay2 > IndicatorDay1
c. IndicatorDay2 = IndicatorDay1
The next output is the "Test Statistics" table, in which the test statistic (Z) and its significance are shown for both drugs. We can conclude that the differences between the two days are significant for both drugs. To determine the direction of this difference, we inspect the "Descriptive Statistics" table, from which we can conclude that the indicator's mean value for day 1 is greater than for day 2 for drug B, while the mean for day 2 is greater than for day 1 for drug A. This opposite direction of the effect is considered an interaction.
Test Statistics(a)
Drug                               IndicatorDay2 - IndicatorDay1
Drug A   Z                         -2.527(b)
         Asymp. Sig. (2-tailed)    .012
         Exact Sig. (2-tailed)     .008
         Exact Sig. (1-tailed)     .004
         Point Probability         .004
Drug B   Z                         -1.990(c)
         Asymp. Sig. (2-tailed)    .047
         Exact Sig. (2-tailed)     .045
         Exact Sig. (1-tailed)     .022
         Point Probability         .003
a. Wilcoxon Signed Ranks Test
b. Based on negative ranks.
c. Based on positive ranks.
Comparing different independent groups
Remember that we conducted one-way ANOVA to compare the means of several independent
groups even when some assumptions have been violated. There is a non-parametric counterpart
called the Kruskal-Wallis test. Similar to the previous tests, the Kruskal-Wallis is based on ranking
the data and the sum of ranks for each group (𝑅 ) is determined. A test statistic 𝐻 is then calculated
which has a chi-square distribution, from which we can determine whether the differences are
significant.
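For reference, the statistic is H = [12/(N(N + 1))] Σ R_i²/n_i − 3(N + 1), where N is the total sample size, n_i the size of group i, and R_i its rank sum. A minimal Python sketch on a small hypothetical dataset (assuming no tied scores; the data are illustrative, not from the example below):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from ranked data; assumes no tied scores."""
    pooled = sorted(s for g in groups for s in g)
    rank = {s: i for i, s in enumerate(pooled, start=1)}
    n_total = len(pooled)
    rank_sums = [sum(rank[s] for s in g) for g in groups]
    return 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)

# Hypothetical data: three clearly separated groups of three scores each.
h = kruskal_wallis_h([[1.1, 2.4, 3.0], [4.2, 5.5, 6.1], [7.3, 8.8, 9.9]])
print(round(h, 1))   # 7.2
```

With H = 7.2 on 2 degrees of freedom (above the 5.99 critical value of the chi-square distribution), these hypothetical groups would differ significantly.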
Example: Four conditions (A, B, C, and D) were tested using different participants, and the measured outcome is a continuous variable, as shown in the Table below.
                  Condition
A        B        C        D
0.35     0.33     0.40     0.31
0.58     0.36     0.60     0.32
0.88     0.63     0.96     0.56
0.92     0.64     1.20     0.57
1.22     0.77     1.31     0.71
1.51     1.53     1.35     0.81
1.52     1.62     1.68     0.87
1.57     1.71     1.83     1.18
2.43     1.94     2.10     1.25
2.79     2.48     2.93     1.33
3.40     2.71     2.96     1.34
4.52     4.12     3.00     1.49
4.72     5.65     3.09     1.50
6.90     6.76     3.36     2.09
7.58     7.08     4.34     2.70
7.78     7.26     5.81     2.75
9.62     7.92     5.94     2.83
10.05    8.04     10.16    3.07
10.32    12.10    10.98    3.28
21.08    18.47    18.21    4.11
To run the Kruskal-Wallis test on SPSS, the outcome variable should be entered into one column and the distinction between groups made using a categorical variable. To explore the normality and homogeneity assumptions, go to Analyze → Descriptive Statistics → Explore. The results are presented in the output below, from which we can conclude that both assumptions have been violated.
Tests of Normality
                         Kolmogorov-Smirnov(a)       Shapiro-Wilk
          Condition      Statistic   df   Sig.       Statistic   df   Sig.
Outcome   Condition A    .181        20   .085       .805        20   .001
          Condition B    .208        20   .024       .826        20   .002
          Condition C    .268        20   .001       .743        20   .000
          Condition D    .205        20   .027       .912        20   .070
a. Lilliefors Significance Correction
Test of Homogeneity of Variance
                                     Levene Statistic   df1   df2      Sig.
Outcome   Based on Mean              5.115              3     76       .003
          Based on Median            2.861              3     76       .042
          Based on Median and with   2.861              3     58.104   .045
            adjusted df
          Based on trimmed mean      4.070              3     76       .010
To run the Kruskal-Wallis test, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples. Input the variables as shown in the Figure below.
The "Jonckheere-Terpstra" option is useful for examining a linear trend in the data. Select "Exact" and then check the "Monte Carlo" option. Click "OK" to run the analysis.
SPSS output
The first two outputs are the “Descriptive Statistics” and the “Ranks” tables as shown below.
Descriptive Statistics
            N    Mean     Std. Deviation   Minimum   Maximum
Outcome     80   3.8392   4.26056          .31       21.08
Condition   80   2.5000   1.12509          1.00      4.00

Ranks
          Condition     N    Mean Rank
Outcome   Condition A   20   46.35
          Condition B   20   44.15
          Condition C   20   44.15
          Condition D   20   27.35
          Total         80
In the "Test Statistics" table below, the Kruskal-Wallis statistic (H) is presented along with its significance (we will look at the Monte Carlo value). We can conclude that there is a significant effect of the condition on the outcome.
Test Statistics(a,b)
                                            Outcome
Chi-Square                                  8.659
df                                          3
Asymp. Sig.                                 .034
Monte Carlo Sig.    Sig.                    .031(c)
                    99% CI Lower Bound      .027
                    99% CI Upper Bound      .036
a. Kruskal Wallis Test
b. Grouping Variable: Condition
c. Based on 10000 sampled tables with starting seed 2000000.
To identify where the differences occur, we need to run contrasts or post hoc tests. These are not directly available for non-parametric data; instead, we can run multiple Mann-Whitney tests and, to avoid inflating the Type I error rate, divide the critical significance (0.05) by the number of comparisons (the Bonferroni correction). It is therefore sensible to select only the comparisons of interest in order to minimize the loss of power.
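For example, testing all pairs of the four conditions requires six Mann-Whitney tests, while restricting attention to a smaller set of planned comparisons keeps the corrected threshold higher. A sketch (the choice of focused comparisons here is purely illustrative):

```python
from itertools import combinations

conditions = ["A", "B", "C", "D"]

# All pairwise comparisons: 6 tests, so each is judged at 0.05/6.
pairs = list(combinations(conditions, 2))
alpha_all = 0.05 / len(pairs)

# Only the comparisons of interest (say, everything against D):
# 3 tests, so the Bonferroni-corrected threshold stays higher.
focused = [("A", "D"), ("B", "D"), ("C", "D")]
alpha_focused = 0.05 / len(focused)

print(len(pairs), round(alpha_all, 4), round(alpha_focused, 4))   # 6 0.0083 0.0167
```

Each Mann-Whitney p-value would then be compared against the corrected threshold rather than 0.05.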
The next output is the "Jonckheere-Terpstra Test" table, which tests whether the medians ascend or descend as the category code increases (1 for A to 4 for D). A positive z-value indicates medians that increase with the categorical variable. The sign is negative here, which indicates a decreasing trend of the medians, and this trend is significant.
Jonckheere-Terpstra Test(a)
                                                    Outcome
Number of Levels in Condition                       4
N                                                   80
Observed J-T Statistic                              912.000
Mean J-T Statistic                                  1200.000
Std. Deviation of J-T Statistic                     116.333
Std. J-T Statistic                                  -2.476
Asymp. Sig. (2-tailed)                              .013
Monte Carlo Sig. (2-tailed)   Sig.                  .012(b)
                              99% CI Lower Bound    .009
                              99% CI Upper Bound    .015
Monte Carlo Sig. (1-tailed)   Sig.                  .006(b)
                              99% CI Lower Bound    .004
                              99% CI Upper Bound    .008
a. Grouping Variable: Condition
b. Based on 10000 sampled tables with starting seed 2000000.
Comparing related groups
If the same participants are tested to measure an outcome for 3 or more conditions, and if one or
more of the parametric data assumptions have been violated, we can use Friedman’s ANOVA.
This test is much like the other non-parametric tests which depend on ranking the data.
We will insert the data for the different conditions in columns and each row will represent a single
participant. The data for each person is ranked from the lowest to the highest (lowest score receives
a rank of 1). The ranks under each condition are added and the summed ranks are used to calculate
a test statistic 𝐹 which has a chi-square distribution.
Example: A new diet is claimed to produce a significant weight loss over a period of two months.
10 participants engaged in this diet for two months and their weights were recorded at the
beginning of the diet, after one month, and after two months as shown in the following Table.
              Weight (kg)
Participant   Start     Month 1   Month 2
1             63.75     65.38     81.34
2             62.98     66.24     69.31
3             65.98     67.70     77.89
4             107.27    102.72    91.33
5             66.58     69.45     72.87
6             120.46    119.96    114.26
7             62.01     66.09     68.01
8             71.87     73.62     55.43
9             83.01     75.81     71.63
10            76.62     67.66     68.60
Running Friedman’s ANOVA on SPSS
We first conduct exploratory analysis to check the normality of the data (no need to check for
homogeneity). The results are shown in the output below. Clearly, the normality assumption has
been violated in 2 of the 3 variables.
Tests of Normality
          Kolmogorov-Smirnov(a)       Shapiro-Wilk
          Statistic   df   Sig.       Statistic   df   Sig.
Start     .228        10   .149       .784        10   .009
Month1    .335        10   .002       .685        10   .001
Month2    .203        10   .200*      .877        10   .121
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
To run Friedman's ANOVA on SPSS, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Drag the variables as shown in the Figure below.
Click “OK” to view the results.
SPSS output
Based on the output below, the Friedman’s test statistic (chi-square) is not significant which means
that the weights did not differ significantly over the studied two-month period.
Descriptive Statistics
          N    Mean      Std. Deviation   Minimum   Maximum
Start     10   78.0530   20.23028         62.01     120.46
Month1    10   77.4630   18.61402         65.38     119.96
Month2    10   77.0670   16.10698         55.43     114.26
Ranks
          Mean Rank
Start     1.90
Month1    2.00
Month2    2.10

Test Statistics(a)
N                   10
Chi-Square          .200
df                  2
Asymp. Sig.         .905
Exact Sig.          .974
Point Probability   .143
a. Friedman Test
Had the statistic been significant, we would need to follow up with post hoc tests to identify where the significant difference occurs. We can carry out post hoc tests by running multiple Wilcoxon signed-rank tests and correcting the critical significance by dividing 0.05 by the number of comparisons (Bonferroni correction).
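As a check on the SPSS output, Friedman's statistic F_r = [12/(nk(k + 1))] Σ R_j² − 3n(k + 1) can be computed by hand for the weight data. No participant has tied weights across the three time points, so no tie correction is needed (a sketch):

```python
# Weights of the 10 participants at the start, after one month, and after two.
start  = [63.75, 62.98, 65.98, 107.27, 66.58, 120.46, 62.01, 71.87, 83.01, 76.62]
month1 = [65.38, 66.24, 67.70, 102.72, 69.45, 119.96, 66.09, 73.62, 75.81, 67.66]
month2 = [81.34, 69.31, 77.89, 91.33, 72.87, 114.26, 68.01, 55.43, 71.63, 68.60]

n, k = len(start), 3
rank_sums = [0.0] * k
for row in zip(start, month1, month2):
    # Rank each participant's three weights from lowest (1) to highest (3);
    # index lookup is safe because no row contains tied weights.
    ordered = sorted(row)
    for j, w in enumerate(row):
        rank_sums[j] += ordered.index(w) + 1

f_r = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
print(rank_sums, round(f_r, 3))   # [19.0, 20.0, 21.0] 0.2
```

The rank sums 19, 20, and 21 give the mean ranks 1.90, 2.00, and 2.10 from the "Ranks" table, and F_r = 0.2 matches the reported chi-square of .200.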
Self-study problems:
1. 14 students were given two quizzes. The first quiz (control) was an ordinary one, while in the second quiz the students were informed that they would be rewarded based on their scores (each student took both quizzes). The results are presented in the table below. What are your conclusions?
          Score
Student   Quiz 1   Quiz 2
1         0        1
2         0        3
3         0        1
4         2        1
5         2        2
6         0        1
7         0        2
8         0        3
9         0        3
10        2        3
11        0        4
12        1        0
13        2        1
14        2        3
2. 59 students from three different majors (business, engineering, and science) took exams in
4 different subjects: A, B, C, and D (each student took four exams). The data is in the “Nonparametric example” data file.
a. Compare the performance of students from the different majors in the 4 different
subjects.
b. Compare the performance of students from the same major in the 4 different
subjects.