Applied Statistics for Engineers – Non-parametric Tests
Husam A. Abu Hajar

Non-parametric Tests

Non-parametric tests (sometimes called assumption-free tests) are used when one or more of the parametric assumptions are violated. These tests work by ranking the data: the lowest score receives a rank of 1 and the ranks increase accordingly (2, 3, …, n). The analysis is then carried out on the ranks, not on the actual scores. The main non-parametric tests that we will investigate in this chapter are:

- Mann-Whitney test
- Wilcoxon signed-rank test
- Friedman's test
- Kruskal-Wallis test

Comparing two independent conditions

For testing two different conditions with different participants, we may use the non-parametric equivalents of the independent t-test: the Mann-Whitney test and the Wilcoxon rank-sum test.

Example: A doctor is interested in assessing the effect of two drugs, so she tested two different groups of patients: the first group (10 participants) was given drug A, while the second group (10 participants) was given drug B. A certain health indicator was measured over two days for all participants and the results are presented in the following table.

Participant:         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Drug:                A   A   A   A   A   A   A   A   A   A   B   B   B   B   B   B   B   B   B   B
Indicator (day 1):  15  35  16  18  19  17  27  16  13  20  16  15  20  15  16  13  14  19  18  18
Indicator (day 2):  28  35  35  24  39  32  27  29  36  35   5   6  30   8   9   7   6  17   3  10

The rationale behind the Wilcoxon rank-sum and Mann-Whitney tests is as follows: ignore the drug type for a moment and rank the indicator data from the lowest to the highest score. If there were no significant differences between the two drugs, we would expect the sum of ranks in drug A to be quite similar to the sum of ranks in drug B.
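This rationale is easy to check numerically. The sketch below is an illustration only (it assumes Python with numpy and scipy installed; the variable names are ours, not SPSS's): it pools the day-2 scores, ranks them with ties averaged, and computes the rank sums and statistics derived by hand in the next section.

```python
import numpy as np
from scipy import stats

# Day-2 indicator scores from the table above
drug_a = [28, 35, 35, 24, 39, 32, 27, 29, 36, 35]
drug_b = [5, 6, 30, 8, 9, 7, 6, 17, 3, 10]
n1, n2 = len(drug_a), len(drug_b)

# Rank all 20 scores together; tied scores receive the average rank
ranks = stats.rankdata(drug_a + drug_b)
r1, r2 = ranks[:n1].sum(), ranks[n1:].sum()
print(r1, r2)                   # 151.0 59.0 -- very dissimilar rank sums

# Wilcoxon rank-sum statistic: the smaller of the two rank sums,
# converted to a z-score using its mean and standard error
w = min(r1, r2)
w_mean = n1 * (n1 + n2 + 1) / 2
se = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (w - w_mean) / se
print(w, round(z, 2))           # 59.0 -3.48

# Mann-Whitney U; SPSS reports the smaller of U and n1*n2 - U
u1 = stats.mannwhitneyu(drug_a, drug_b).statistic
u = min(u1, n1 * n2 - u1)
print(u)                        # 4.0
```

The rank sums (151 and 59), W, U, and the z-score agree with the hand calculation and with the SPSS "Ranks" and "Test Statistics" tables shown later in this section.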
If we were to take only the "Indicator (day 2)" data and rank it from the lowest to the highest, the ranking would look like this:

Participant:         19  11  17  12  16  14  15  20  18   4   7   1   8  13   6   3  10   2   9   5
Drug:                 B   B   B   B   B   B   B   B   B   A   A   A   A   B   A   A   A   A   A   A
Indicator (day 2):    3   5   6   6   7   8   9  10  17  24  27  28  29  30  32  35  35  35  36  39
Potential rank:       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Actual rank:          1   2 3.5 3.5   5   6   7   8   9  10  11  12  13  14  15  17  17  17  19  20

Notice that tied scores (the two 6s and the three 35s) share the same actual rank, namely the average of the potential ranks they occupy.

Wilcoxon rank-sum test: If we add the ranks for each drug, the sums of ranks would be 151 for drug A and 59 for drug B. The lower of the two sums (59) is selected as our test statistic for day 2, that is, W = 59. Next, we find the mean of W and the standard error SE as follows (with n1 = n2 = 10):

W-bar = n1(n1 + n2 + 1)/2 = 10 × (10 + 10 + 1)/2 = 105
SE = sqrt[n1 × n2 × (n1 + n2 + 1)/12] = sqrt[10 × 10 × (10 + 10 + 1)/12] = 13.23

Now we convert the test statistic to a z-score as follows:

z = (W − W-bar)/SE = (59 − 105)/13.23 = −3.48

This value is smaller than −1.96, so we can conclude that there is a significant difference (two-tailed).

Mann-Whitney test: the test statistic U is derived from the sum of ranks for drug A (R1 = 151) as follows:

U = n1 n2 + n1(n1 + 1)/2 − R1 = 10 × 10 + (10 × 11)/2 − 151 = 4.00

The U statistic also leads to the conclusion that the difference is significant.

Running the Mann-Whitney test on SPSS

Data input is done using a coding variable (Drug), so the "Variable View" window should look like this:

First, we need to explore the data (normality and homogeneity). It is clear that the homogeneity of variances assumption has been met; however, not all variables satisfied the normality assumption. Therefore, we will carry out non-parametric tests on the data.

Tests of Normality:
Tests of Normality
                          Kolmogorov-Smirnov(a)       Shapiro-Wilk
               Drug       Statistic  df  Sig.         Statistic  df  Sig.
IndicatorDay1  Drug A     .276       10  .030         .811       10  .020
               Drug B     .170       10  .200*        .959       10  .780
IndicatorDay2  Drug A     .235       10  .126         .941       10  .566
               Drug B     .305       10  .009         .753       10  .004
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Test of Homogeneity of Variance
                                                  Levene Statistic  df1  df2     Sig.
IndicatorDay1  Based on Mean                      3.644             1    18      .072
               Based on Median                    1.880             1    18      .187
               Based on Median and with adj. df   1.880             1    10.076  .200
               Based on trimmed mean              2.845             1    18      .109
IndicatorDay2  Based on Mean                      .508              1    18      .485
               Based on Median                    .091              1    18      .766
               Based on Median and with adj. df   .091              1    11.888  .768
               Based on trimmed mean              .275              1    18      .606

Go to Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples. Remember to click "Define Groups" and input the numbers 1 and 2 to compare drug A and drug B.

Click "Exact" to select the method for calculating the Mann-Whitney significance. The default, "Asymptotic only", is accurate only for large samples. For small samples or poorly distributed data, the "Exact" method, which is a more complex and time-consuming computation, is more accurate. The "Monte Carlo" method creates a distribution similar to the sample's distribution, and several samples (the default is 10,000) are then drawn from this distribution, from which the mean significance level is computed. In summary, we select "Monte Carlo" for large samples and "Exact" for small samples. You may want to click "Options" and then check the "Descriptives" option. Click "OK" to obtain the SPSS output.

SPSS output

Ranks
               Drug      N   Mean Rank  Sum of Ranks
IndicatorDay1  Drug A    10  11.95      119.50
               Drug B    10   9.05       90.50
               Total     20
IndicatorDay2  Drug A    10  15.10      151.00
               Drug B    10   5.90       59.00
               Total     20

The "Ranks" table is a summary of the averages and sums of the ranks for each drug.
This table is like a descriptive statistics table of the ranks and is useful for interpreting significant differences should they exist.

Test Statistics(a)
                                IndicatorDay1  IndicatorDay2
Mann-Whitney U                  35.500          4.000
Wilcoxon W                      90.500         59.000
Z                               -1.105         -3.484
Asymp. Sig. (2-tailed)          .269           .000
Exact Sig. [2*(1-tailed Sig.)]  .280(b)        .000(b)
Exact Sig. (2-tailed)           .288           .000
Exact Sig. (1-tailed)           .144           .000
Point Probability               .013           .000
a. Grouping Variable: Drug
b. Not corrected for ties.

The "Test Statistics" table contains the computed z-scores for the variables, the Mann-Whitney U statistic, the Wilcoxon W statistic, and the significance levels for the Mann-Whitney test. The difference is significant for day 2 and not significant for day 1.

Comparing two related conditions

If scores under two different conditions come from the same participants, we use the Wilcoxon signed-rank test (different from the Wilcoxon rank-sum test), which is the non-parametric equivalent of the paired t-test.

Going back to the previous example (drugs), if we want to compare the effects of the drug between the first day and the second day, then we are conducting a repeated-measures experiment (the same participants are evaluated over two days). The data violate the parametric assumptions, so we will use the Wilcoxon signed-rank test, which works in a fairly similar way to the dependent t-test. The differences between scores in the two conditions are ranked (ignoring the sign of the difference), and the sign of each difference (+ or −) is then attached to its rank. Zero differences (equal scores) are eliminated from the analysis, and tied ranks are handled in the same manner as described earlier. The sums of the positive ranks and of the negative ranks are computed, and the T statistic is the smaller of the two sums. We then calculate a mean statistic and a standard error, from which we compute a z-score to be compared against a critical value.
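Before moving to SPSS, this procedure can be sketched numerically for drug A's two days (an illustration assuming Python with numpy and scipy; the tie-corrected variance below is the standard textbook form, not necessarily SPSS's exact internals):

```python
import numpy as np
from scipy import stats

# Drug A scores from the earlier table (same 10 participants, two days)
day1 = np.array([15, 35, 16, 18, 19, 17, 27, 16, 13, 20])
day2 = np.array([28, 35, 35, 24, 39, 32, 27, 29, 36, 35])

diff = day2 - day1
diff = diff[diff != 0]                    # zero differences are dropped
ranks = stats.rankdata(np.abs(diff))      # rank |difference|, ties averaged
t_pos = ranks[diff > 0].sum()             # sum of positive ranks
t_neg = ranks[diff < 0].sum()             # sum of negative ranks
t = min(t_pos, t_neg)                     # Wilcoxon T statistic
print(t_pos, t_neg, t)                    # 36.0 0.0 0.0

# z-score from the mean and (tie-corrected) variance of T
n = len(diff)
t_mean = n * (n + 1) / 4
_, tie_sizes = np.unique(np.abs(diff), return_counts=True)
var = n * (n + 1) * (2 * n + 1) / 24 - (tie_sizes**3 - tie_sizes).sum() / 48
z = (t - t_mean) / np.sqrt(var)
print(round(z, 3))                        # -2.527

# scipy's built-in test gives the same statistic
res = stats.wilcoxon(day1, day2)          # drops zero differences by default
print(res.statistic)                      # 0.0
```

T = 0 with eight positive ranks and two discarded zero differences matches the SPSS "Ranks" table for drug A below, and z ≈ −2.527 matches its "Test Statistics" output.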
Running the Wilcoxon signed-rank test on SPSS

We need to split the file by drug (depending on the version, SPSS may automatically split the output by drug). We then carry out the repeated-measures analysis by going to Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples, as shown in the Figure below. Use the same guidelines for "Exact" and "Options" as before.

SPSS output

Descriptive Statistics
Drug                   N   Mean     Std. Deviation  Minimum  Maximum
Drug A  IndicatorDay1  10  19.6000  6.60303         13.00    35.00
        IndicatorDay2  10  32.0000  4.78423         24.00    39.00
Drug B  IndicatorDay1  10  16.4000  2.27058         13.00    20.00
        IndicatorDay2  10  10.1000  7.95054          3.00    30.00

The first key output is the "Ranks" table, which provides a summary of the ranked scores, including the number of negative ranks (day 2 < day 1), the number of positive ranks (day 2 > day 1), and the number of ties.

Ranks (IndicatorDay2 − IndicatorDay1)
Drug                    N     Mean Rank  Sum of Ranks
Drug A  Negative Ranks  0(a)   .00         .00
        Positive Ranks  8(b)  4.50        36.00
        Ties            2(c)
        Total           10
Drug B  Negative Ranks  9(a)  5.22        47.00
        Positive Ranks  1(b)  8.00         8.00
        Ties            0(c)
        Total           10
a. IndicatorDay2 < IndicatorDay1
b. IndicatorDay2 > IndicatorDay1
c. IndicatorDay2 = IndicatorDay1

The next output is the "Test Statistics" table, in which the test statistic (Z) and its significance are shown for both drugs. We can conclude that the differences between the two days are significant for both drugs. To determine the direction of this difference, we may inspect the "Descriptive Statistics" table, from which we can conclude that the indicator's mean value for day 1 is greater than that of day 2 for drug B, while the mean of day 2 is greater than that of day 1 for drug A. This opposite direction of the effect is considered an interaction.

Test Statistics(a)  (IndicatorDay2 − IndicatorDay1)
Drug A  Z                       -2.527(b)
        Asymp. Sig. (2-tailed)  .012
        Exact Sig. (2-tailed)   .008
        Exact Sig. (1-tailed)   .004
        Point Probability       .004
Drug B  Z                       -1.990(c)
        Asymp. Sig. (2-tailed)  .047
        Exact Sig. (2-tailed)   .045
        Exact Sig. (1-tailed)   .022
        Point Probability       .003
a. Wilcoxon Signed Ranks Test
b. Based on negative ranks.
c. Based on positive ranks.

Comparing different independent groups

Recall that we conducted one-way ANOVA to compare the means of several independent groups. When some of its assumptions are violated, there is a non-parametric counterpart called the Kruskal-Wallis test. Like the previous tests, the Kruskal-Wallis test is based on ranking the data, and the sum of ranks for each group (R_i) is determined. A test statistic H is then calculated, which has a chi-square distribution and from which we can determine whether the differences are significant.

Example: Four conditions (A, B, C, and D) were tested using different participants (20 per condition) and the measured outcome is continuous, as shown in the table below.

Condition A: 0.35, 0.58, 0.88, 0.92, 1.22, 1.51, 1.52, 1.57, 2.43, 2.79, 3.40, 4.52, 4.72, 6.90, 7.58, 7.78, 9.62, 10.05, 10.32, 21.08
Condition B: 0.33, 0.36, 0.63, 0.64, 0.77, 1.53, 1.62, 1.71, 1.94, 2.48, 2.71, 4.12, 5.65, 6.76, 7.08, 7.26, 7.92, 8.04, 12.10, 18.47
Condition C: 0.40, 0.60, 0.96, 1.20, 1.31, 1.35, 1.68, 1.83, 2.10, 2.93, 2.96, 3.00, 3.09, 3.36, 4.34, 5.81, 5.94, 10.16, 10.98, 18.21
Condition D: 0.31, 0.32, 0.56, 0.57, 0.71, 0.81, 0.87, 1.18, 1.25, 1.33, 1.34, 1.49, 1.50, 2.09, 2.70, 2.75, 2.83, 3.07, 3.28, 4.11

To run the Kruskal-Wallis test on SPSS, the outcome variable should be entered into one column and the distinction between groups is made using a categorical variable. To explore the normality and homogeneity assumptions, go to Analyze → Descriptive Statistics → Explore. The results are presented in the output below, from which we can conclude that both assumptions have been violated.

Tests of Normality:
Tests of Normality
                        Kolmogorov-Smirnov(a)       Shapiro-Wilk
          Condition     Statistic  df  Sig.         Statistic  df  Sig.
Outcome   Condition A   .181       20  .085         .805       20  .001
          Condition B   .208       20  .024         .826       20  .002
          Condition C   .268       20  .001         .743       20  .000
          Condition D   .205       20  .027         .912       20  .070
a. Lilliefors Significance Correction

Test of Homogeneity of Variance
                                               Levene Statistic  df1  df2     Sig.
Outcome  Based on Mean                         5.115             3    76      .003
         Based on Median                       2.861             3    76      .042
         Based on Median and with adj. df      2.861             3    58.104  .045
         Based on trimmed mean                 4.070             3    76      .010

To run the Kruskal-Wallis test, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples and input the variables as shown in the Figure below. The "Jonckheere-Terpstra" option is a useful one for examining a linear trend in the data. Select "Exact" and then check the "Monte Carlo" option. Click "OK" to run the analysis.

SPSS output

The first two outputs are the "Descriptive Statistics" and the "Ranks" tables, as shown below.

Descriptive Statistics
           N   Mean    Std. Deviation  Minimum  Maximum
Outcome    80  3.8392  4.26056          .31     21.08
Condition  80  2.5000  1.12509         1.00      4.00

Ranks
         Condition    N   Mean Rank
Outcome  Condition A  20  46.35
         Condition B  20  44.15
         Condition C  20  44.15
         Condition D  20  27.35
         Total        80

In the "Test Statistics" table below, the Kruskal-Wallis statistic (H, labelled Chi-Square) is presented along with its significance (we will look at the Monte Carlo value). We can conclude that there is a significant effect of the condition on the outcome.

Test Statistics(a,b)
                                          Outcome
Chi-Square                                8.659
df                                        3
Asymp. Sig.                               .034
Monte Carlo Sig.   Sig.                   .031(c)
                   99% CI Lower Bound     .027
                   99% CI Upper Bound     .036
a. Kruskal Wallis Test
b. Grouping Variable: Condition
c. Based on 10000 sampled tables with starting seed 2000000.

To identify where the differences occur, we need to run contrasts or post hoc tests.
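As a cross-check of the H statistic above, and a sketch of the Bonferroni-corrected pairwise follow-ups described next, the analysis can be reproduced with scipy (an illustration under the assumption that Python and scipy are available; this is not SPSS's exact post hoc procedure):

```python
from itertools import combinations
from scipy import stats

# The four conditions from the example above (20 observations each)
outcome = {
    "A": [0.35, 0.58, 0.88, 0.92, 1.22, 1.51, 1.52, 1.57, 2.43, 2.79,
          3.40, 4.52, 4.72, 6.90, 7.58, 7.78, 9.62, 10.05, 10.32, 21.08],
    "B": [0.33, 0.36, 0.63, 0.64, 0.77, 1.53, 1.62, 1.71, 1.94, 2.48,
          2.71, 4.12, 5.65, 6.76, 7.08, 7.26, 7.92, 8.04, 12.10, 18.47],
    "C": [0.40, 0.60, 0.96, 1.20, 1.31, 1.35, 1.68, 1.83, 2.10, 2.93,
          2.96, 3.00, 3.09, 3.36, 4.34, 5.81, 5.94, 10.16, 10.98, 18.21],
    "D": [0.31, 0.32, 0.56, 0.57, 0.71, 0.81, 0.87, 1.18, 1.25, 1.33,
          1.34, 1.49, 1.50, 2.09, 2.70, 2.75, 2.83, 3.07, 3.28, 4.11],
}

# Kruskal-Wallis H and its asymptotic (chi-square) significance
h, p = stats.kruskal(*outcome.values())
print(round(h, 3), round(p, 3))        # 8.659 0.034

# Post hoc sketch: pairwise Mann-Whitney tests, each judged against a
# Bonferroni-corrected critical significance of 0.05 / 6 comparisons
pairs = list(combinations(outcome, 2))
alpha = 0.05 / len(pairs)
for g1, g2 in pairs:
    _, p_pair = stats.mannwhitneyu(outcome[g1], outcome[g2],
                                   alternative="two-sided")
    print(f"{g1} vs {g2}: p = {p_pair:.3f}, significant: {p_pair < alpha}")
```

H = 8.659 and the asymptotic significance of .034 match the SPSS "Test Statistics" table above.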
These tests are not directly available for non-parametric data; instead, we can run multiple Mann-Whitney tests, and to avoid the inflation of the Type I error rate, we divide the critical significance (0.05) by the number of comparisons (i.e. a Bonferroni correction). It is therefore advisable to select only the comparisons of interest in order to minimize the loss of power.

The next output is the "Jonckheere-Terpstra Test" table, which tests whether the medians ascend or descend as the category code increases (1 for A to 4 for D). A positive z-value indicates an increasing trend of the medians as the categorical variable increases. The sign is negative here, which indicates a decreasing trend of the medians, and this trend is significant.

Jonckheere-Terpstra Test(a)
                                                    Outcome
Number of Levels in Condition                       4
N                                                   80
Observed J-T Statistic                              912.000
Mean J-T Statistic                                  1200.000
Std. Deviation of J-T Statistic                     116.333
Std. J-T Statistic                                  -2.476
Asymp. Sig. (2-tailed)                              .013
Monte Carlo Sig. (2-tailed)   Sig.                  .012(b)
                              99% CI Lower Bound    .009
                              99% CI Upper Bound    .015
Monte Carlo Sig. (1-tailed)   Sig.                  .006(b)
                              99% CI Lower Bound    .004
                              99% CI Upper Bound    .008
a. Grouping Variable: Condition
b. Based on 10000 sampled tables with starting seed 2000000.

Comparing related groups

If the same participants are tested to measure an outcome under 3 or more conditions, and if one or more of the parametric assumptions have been violated, we can use Friedman's ANOVA. This test is much like the other non-parametric tests in that it depends on ranking the data. We insert the data for the different conditions in columns, and each row represents a single participant. The data for each person are ranked from the lowest to the highest (the lowest score receives a rank of 1). The ranks under each condition are added, and the summed ranks are used to calculate a test statistic that has a chi-square distribution.
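The last step can be written out explicitly. With n participants, k conditions, and R_j denoting the sum of ranks under condition j, the statistic (in the form given in standard references; notation ours) is

```latex
\chi_F^2 \;=\; \frac{12}{n\,k\,(k+1)} \sum_{j=1}^{k} R_j^2 \;-\; 3\,n\,(k+1)
```

and it is compared against a chi-square distribution with k − 1 degrees of freedom; for small samples, exact significance values are used instead of this approximation.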
Example: A new diet is claimed to produce a significant weight loss over a period of two months. 10 participants followed this diet for two months, and their weights were recorded at the beginning of the diet, after one month, and after two months, as shown in the following table.

                      Weight (kg)
Participant   Start    Month 1   Month 2
1              63.75    65.38     81.34
2              62.98    66.24     69.31
3              65.98    67.70     77.89
4             107.27   102.72     91.33
5              66.58    69.45     72.87
6             120.46   119.96    114.26
7              62.01    66.09     68.01
8              71.87    73.62     55.43
9              83.01    75.81     71.63
10             76.62    67.66     68.60

Running Friedman's ANOVA on SPSS

We first conduct an exploratory analysis to check the normality of the data (there is no need to check for homogeneity). The results are shown in the output below. Clearly, the normality assumption has been violated in 2 of the 3 variables.

Tests of Normality
         Kolmogorov-Smirnov(a)       Shapiro-Wilk
         Statistic  df  Sig.         Statistic  df  Sig.
Start    .228       10  .149         .784       10  .009
Month1   .335       10  .002         .685       10  .001
Month2   .203       10  .200*        .877       10  .121
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

To run Friedman's ANOVA on SPSS, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Drag the variables as shown in the Figure below. Click "OK" to view the results.

SPSS output

Based on the output below, Friedman's test statistic (Chi-Square) is not significant, which means that the weights did not differ significantly over the studied two-month period.

Descriptive Statistics
        N   Mean     Std. Deviation  Minimum  Maximum
Start   10  78.0530  20.23028        62.01    120.46
Month1  10  77.4630  18.61402        65.38    119.96
Month2  10  77.0670  16.10698        55.43    114.26

Ranks
        Mean Rank
Start   1.90
Month1  2.00
Month2  2.10

Test Statistics(a)
N                  10
Chi-Square         .200
df                 2
Asymp. Sig.        .905
Exact Sig.         .974
Point Probability  .143
a. Friedman Test
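The SPSS results above can be reproduced with scipy's implementation of Friedman's test (a sketch assuming Python with scipy; weights as entered in the table above):

```python
from scipy import stats

# Weights (kg) for the 10 participants at the three time points
start  = [63.75, 62.98, 65.98, 107.27, 66.58, 120.46, 62.01, 71.87, 83.01, 76.62]
month1 = [65.38, 66.24, 67.70, 102.72, 69.45, 119.96, 66.09, 73.62, 75.81, 67.66]
month2 = [81.34, 69.31, 77.89, 91.33, 72.87, 114.26, 68.01, 55.43, 71.63, 68.60]

# Each participant's three weights are ranked 1-3; the statistic is
# computed from the summed ranks of the three conditions
chi2, p = stats.friedmanchisquare(start, month1, month2)
print(round(chi2, 3), round(p, 3))   # 0.2 0.905
```

Both the chi-square value (.200) and the asymptotic significance (.905) match the SPSS "Test Statistics" output.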
If the statistic had been significant, we would need to follow up with post hoc tests to identify where the significant differences occur. We can carry out post hoc tests by running multiple Wilcoxon signed-rank tests and correcting the critical significance by dividing 0.05 by the number of comparisons (Bonferroni correction).

Self-study problems:

1. 14 students were given two quizzes. The first quiz (control) was an ordinary one, while in the second quiz the students were informed that they would be rewarded based on their scores (each student took both quizzes). The results are presented in the table below. What are your conclusions?

Student:  1  2  3  4  5  6  7  8  9  10  11  12  13  14
Quiz 1:   0  0  0  2  2  0  0  0  0   2   0   1   2   2
Quiz 2:   1  3  1  1  2  1  2  3  3   3   4   0   1   3

2. 59 students from three different majors (business, engineering, and science) took exams in 4 different subjects: A, B, C, and D (each student took four exams). The data is in the "Nonparametric example" data file.
   a. Compare the performance of students from the different majors in the 4 different subjects.
   b. Compare the performance of students from the same major in the 4 different subjects.