NONPARAMETRIC TESTS (Chapter 9 and Section 12.7)

In situations where the normality of the population(s) is suspect, or the sample sizes are so small that checking normality is not really feasible, it is sometimes preferable to use nonparametric tests to make inferences about the "average" value.

THE SIGN TEST (first used in the binomial distribution handout)

The sign test can be used in place of the paired t-test when we have evidence that the paired differences are NOT normally distributed. It is a simple test to perform; however, it is not the best nonparametric alternative to the paired t-test.

Example: Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis

A researcher believes that patients with cystic fibrosis (CF) expend more energy at rest than those without CF. To obtain a fair comparison she matches 13 patients with CF to 13 patients without CF on the basis of age, sex, height, and weight. She then measures the REE for each pair of subjects and compares the results. The following results were obtained:

Pair   CF (C)   Healthy (H)   Difference = C - H   Sign of Difference
 1      1153        996            157                   +
 2      1132       1080             52                   +
 3      1165       1182            -17                   -
 4      1460       1452              8                   +
 5      1634       1162            472                   +
 6      1493       1619           -126                   -
 7      1358       1140            218                   +
 8      1453       1123            330                   +
 9      1185       1113             72                   +
10      1824       1463            361                   +
11      1793       1632            161                   +
12      1930       1614            316                   +
13      2075       1836            239                   +

If there were no difference in resting energy between the CF patient and the healthy patient in each pair, we would expect P(+) = P(-) = 1/2 when looking at the paired differences. Here we see that we have 11 +'s and only 2 -'s, which seems to suggest that cystic fibrosis patients have a larger REE than a similar healthy individual. We can use the Binomial Table Generator to find the probability that we would obtain 11 or more +'s by chance variation alone.
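The tail probability returned by the Binomial Table Generator can also be computed directly from the binomial formula. Below is a minimal Python sketch using only the standard library; the function name `sign_test_upper_p` is our own illustration, and the sketch assumes the usual convention of discarding zero differences before counting signs (there are none in this example).

```python
from math import comb


def sign_test_upper_p(n_plus: int, n: int) -> float:
    """Upper-tailed sign test p-value: P(X >= n_plus) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, x) for x in range(n_plus, n + 1)) / 2 ** n


# Paired differences C - H from the REE table
diffs = [157, 52, -17, 8, 472, -126, 218, 330, 72, 361, 161, 316, 239]

n = sum(d != 0 for d in diffs)       # zero differences carry no sign and are dropped
n_plus = sum(d > 0 for d in diffs)   # number of + signs
p_value = sign_test_upper_p(n_plus, n)
print(n_plus, n, round(p_value, 4))  # 11 13 0.0112
```

Summing exact binomial coefficients and dividing by 2^n keeps the calculation exact until the final division.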
Results from the Binomial Table Generator with n = 13 and p = .50: P(X ≥ 11) = .0112 < .05, so we conclude that individuals with CF have a higher resting energy expenditure than healthy individuals of the same sex, age, height, and weight.

WILCOXON SIGNED-RANK TEST

This test is a better alternative to the paired t-test than the sign test discussed above. It is used when we do not wish to assume that the population of paired differences is normally distributed. The Wilcoxon signed-rank test uses ranks of the absolute paired differences rather than the actual difference values.

Example: Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis

As an example we again consider the resting energy of cystic fibrosis patients.

Pair             1     2     3     4     5     6     7     8     9    10    11    12    13
CF (C)        1153  1132  1165  1460  1634  1493  1358  1453  1185  1824  1793  1930  2075
Healthy (H)    996  1080  1182  1452  1162  1619  1140  1123  1113  1463  1632  1614  1836
d = C - H      157    52   -17     8   472  -126   218   330    72   361   161   316   239
Sign of d        +     +     -     +     +     -     +     +     +     +     +     +     +
|d|            157    52    17     8   472   126   218   330    72   361   161   316   239
Rank of |d|      6     3     2     1    13     5     8    11     4    12     7    10     9
Signed rank      6     3    -2     1    13    -5     8    11     4    12     7    10     9

We then calculate
$T^+$ = the sum of the positive signed ranks = ___________ and
$T^-$ = the sum of the absolute values of the negative signed ranks = ___________

Our hypotheses are stated in terms of the median of the paired differences. Listed below are the hypotheses, along with the test statistic (based on the signed-rank sums) used to test each one.

Let $M_d$ denote the median paired difference.
$H_0: M_d = 0$ vs. $H_a: M_d \neq 0$ (two-tailed)
$H_0: M_d = 0$ vs. $H_a: M_d > 0$ (upper-tailed)
$H_0: M_d = 0$ vs.
$H_a: M_d < 0$ (lower-tailed)

The corresponding test statistics are:
two-tailed: $T = \min(T^+, T^-)$
upper-tailed: $T = T^-$
lower-tailed: $T = T^+$

For this example, if we had originally hypothesized that the cystic fibrosis patients would have a larger REE than similar healthy individuals, then we have the upper-tailed alternative and our test statistic is $T = T^-$ = _______.

The Wilcoxon Signed-Rank Table at the end of this handout contains p-values associated with test statistic values for small sample sizes, i.e., number of pairs n < 30. If n > 12 we can use a z-statistic and find the p-value from the standard normal table:

$$z_T = \frac{T - \mu_T}{\sigma_T}, \qquad \text{where } \mu_T = \frac{n(n+1)}{4} \text{ and } \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$

Here we have n = 13, so we can use the above approximation as follows:

$$\mu_T = \frac{13(14)}{4} = 45.5, \qquad \sigma_T = \sqrt{\frac{13(14)(27)}{24}} = 14.31$$

Thus our z-statistic is $z_T = (T - \mu_T)/\sigma_T$ = ________.

Now we find the p-value using the standard normal table. Here our p-value = _________, thus we reject the null hypothesis and conclude that the cystic fibrosis patients have a higher resting energy expenditure than healthy individuals who are the same age, sex, height, and weight.

IN JMP

Select Distribution > Test Mean > enter 0 for the hypothesized value and check the nonparametric test box. The results of the test are on the following page. The p-values for the upper-tailed t-test and the Wilcoxon signed-rank test have been highlighted.

The test statistic reported by JMP for the Wilcoxon test = $(T^+ - T^-)/2 = (84 - 7)/2 = 38.5$.

Why? Because $T^+ + T^- = n(n+1)/2 = 91$, this quantity equals $T^+ - n(n+1)/4 = 84 - 45.5$, i.e., $T^+$ measured from its mean under $H_0$. In any case, we only need the p-value.

Conclusion:

WILCOXON RANK SUM TEST (MANN-WHITNEY U TEST)

This test is an alternative to the two-sample t-test for comparing the "average" value of two populations when the samples from each population are taken independently. In the discussion below we label the two populations to be compared 1 and 2, and we assume the sample size from population 1 is n and the sample size from population 2 is m. The hypotheses tested can be stated as follows:

$H_0$: The distributions of population 1 and population 2 are identical.
If the populations are symmetric (but not necessarily normal), the null hypothesis can be expressed in terms of the population medians as $M_1 = M_2$.

$H_a$: The distributions of population 1 and population 2 are different (two-tailed): $M_1 \neq M_2$
or
$H_a$: The distribution of population 1 is shifted to the right of the distribution of population 2, i.e., the population 1 values are generally larger than the population 2 values (right-tailed): $M_1 > M_2$
or
$H_a$: The distribution of population 1 is shifted to the left of the distribution of population 2, i.e., the population 1 values are generally smaller than the population 2 values (left-tailed): $M_1 < M_2$

The test statistic is based on the sum of the ranks assigned to the observed data from each population when the combined sample is ranked from smallest to largest. Our test statistic is given by

$$T = S_1 - \frac{n(n+1)}{2}, \quad \text{where } S_1 = \text{the sum of the ranks assigned to the population 1 values,}$$
or
$$T = S_2 - \frac{m(m+1)}{2}, \quad \text{where } S_2 = \text{the sum of the ranks assigned to the population 2 values.}$$

The choice is irrelevant, but you do need to take your choice into account when making your decision.

The Wilcoxon (Mann-Whitney) table at the end of these notes gives critical values for T for cases where n and m are both less than 20. For a two-sided test we reject if T is sufficiently small or sufficiently large.

For $H_a: M_1 < M_2$ we reject if $T = S_1 - n(n+1)/2$ is "small" or if $T = S_2 - m(m+1)/2$ is "large".
For $H_a: M_1 > M_2$ we reject if $T = S_1 - n(n+1)/2$ is "large" or if $T = S_2 - m(m+1)/2$ is "small".

If we have larger sample sizes, we can use a normal approximation to find the p-value. The normal approximation test statistic based on T is

$$z = \frac{T - \mu_T}{\sigma_T}, \qquad \text{where } \mu_T = \frac{mn}{2} \text{ and } \sigma_T = \sqrt{\frac{nm(n+m+1)}{12}}.$$

We can then use the standard normal table to find the p-value.

Example: Oral glucose response in patients with Huntington's disease vs. control

Davidson et al.
studied the responses to oral glucose in patients with Huntington's disease and in a group of control subjects. The five-hour responses are shown in the table on the following page. Is there evidence to suggest that the five-hour glucose level (mg percent) is greater for patients with Huntington's disease? In conducting the study the researchers used 11 patients with Huntington's disease (H) and 10 controls (C).

$H_0: M_C = M_H$ vs. $H_a: M_C < M_H$

The data below are the five-hour glucose levels for the two samples.
Control: 83 73 65 65
Huntington's: 85 89 86 90 91 77 77 78 93 97 100 85 82 75 92 86 86

You can use JMP to compute the ranks or to conduct the entire test, as we will see later.

The sum of the ranked glucose levels for the control group is: ____________.
The sum of the ranked glucose levels for the Huntington's group is: ____________.

The sum of the ranks for the control group is smaller than the rank sum for the Huntington's disease patients, but this could be expected even if the null hypothesis were true. Why?

The test statistic T is based on the sum of the ranks for the controls. Intuitively, we will reject the null hypothesis if the sum of the ranked glucose levels for the control group is "small".

To find the p-value with the normal approximation, we can work directly with the rank sum S of the size-n group, which under $H_0$ has

$$\mu_S = \frac{n(n+m+1)}{2}, \qquad \sigma_S = \sqrt{\frac{nm(n+m+1)}{12}}, \qquad z = \frac{S - \mu_S}{\sigma_S} \text{ approximately } N(0,1).$$

For the size-m group, the roles of n and m are reversed in the mean formula. Subtracting $n(n+1)/2$ from both S and $\mu_S$ shows that this gives the same z as using $T = S - n(n+1)/2$ with $\mu_T = mn/2$.

For the control group we take n = 10 and m = 11, which gives the following.
$$\mu = \frac{n(n+m+1)}{2} = \frac{10(10+11+1)}{2} = 110, \qquad \sigma = \sqrt{\frac{nm(n+m+1)}{12}} = \sqrt{\frac{10(11)(10+11+1)}{12}} = 14.20$$

$$z = \frac{78 - 110}{14.20} = -2.25 \qquad \text{or, using the Huntington's rank sum (mean } 11(22)/2 = 121\text{),} \qquad z = \frac{153 - 121}{14.20} = 2.25$$

Compute the p-value using the normal approximation z-score:

Conclusion:

Wilcoxon Rank Sum Test in JMP
Data Table
Select Nonparametric > Wilcoxon Test

Wilcoxon Signed-Rank Test p-values (n < 30)

Critical Values for Wilcoxon (Mann-Whitney) Rank Sum Test

Nonparametric Approach: Kruskal-Wallis Test

If the normality assumption is suspect, or the sample sizes from each of the k populations are too small to assess normality, we can use the Kruskal-Wallis test to compare the size of the values drawn from the different populations. There are two basic assumptions for the Kruskal-Wallis test:
1) The samples from the k populations are independently drawn.
2) All k populations are identical in shape, with the only potential difference being in the location of the typical values (medians).

Hypotheses:
$H_0$: All k populations have the same median or "typical/average" value.
$H_a$: At least one of the populations has a median or "typical/average" value different from the others, i.e., at least one population is shifted away from the others.

To perform the test we rank all N observations from smallest to largest and compute the rank sum $R_i$ for each of the k samples. The test statistic looks at the difference between the average rank for each group i, $R_i/n_i$, and the average rank for all observations, $(N+1)/2$. If there are differences among the populations, we expect some groups to have an average rank much larger than the average rank for all observations and some to have smaller average ranks.

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \frac{R_i}{n_i} - \frac{N+1}{2} \right)^2 \sim \chi^2_{k-1} \quad \text{(approximately, under } H_0\text{; chi-square distribution with df = } k-1\text{)}$$

The larger H is, the stronger the evidence against the null hypothesis that the populations have the same location/median. Large values of H lead to small p-values!

Example: Antecubital Vein Cortisol Levels

Cawson et al.
studied cortisol levels in three groups of patients who were delivered between 38 and 42 weeks gestation. Group I was studied before the onset of labor at elective Caesarean section, Group II was studied at emergency Caesarean section during induced labor, and Group III consisted of patients in whom spontaneous labor occurred and who were delivered either vaginally or by Caesarean section. We wish to know whether the median cortisol levels differ across these three groups.

Group I:    262  307  211  323  454  339  304  154  287  356
Group II:   465  501  455  355  468  362
Group III:  343  772  207  1048  838  687

Enter these data into two columns, one denoting the group and the other containing the cortisol level. Select Analyze > Fit Y by X and place Group in the X box and Cortisol level in the Y box. Select Nonparametric > Wilcoxon Test; this will perform a Kruskal-Wallis test.

$R_1 = 69$, $R_2 = 90$, $R_3 = 94$ and $H = 9.23$ (p-value = .0099). We have evidence to suggest that the median cortisol levels differ significantly among the three groups.

Multiple Comparisons for Kruskal-Wallis Test

To determine whether group i significantly differs from group j, we compare the average ranks $\bar{R}_i = R_i/n_i$ and $\bar{R}_j = R_j/n_j$ by computing

$$z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$$

and then the p-value = $P(Z > |z_{ij}|)$.

Bonferroni Correction
If the p-value is less than $\alpha/(2m)$, where m = # of pair-wise comparisons to be made (typically $\binom{k}{2}$ if all pair-wise comparisons are of interest), we conclude that groups i and j differ significantly. For this example we can make a total of $m = \binom{3}{2} = 3$ pair-wise comparisons, so we compare our p-values to $\alpha/(2m) = .05/(2(3)) = .00833$.

Here N = 22, $n_1 = 10$, $n_2 = n_3 = 6$, and the average ranks are $\bar{R}_1 = 69/10 = 6.900$, $\bar{R}_2 = 90/6 = 15.000$, $\bar{R}_3 = 94/6 = 15.667$.

Comparing Group I vs. Group II
$$z_{12} = \frac{6.900 - 15.000}{\sqrt{\frac{22(23)}{12}\left(\frac{1}{10} + \frac{1}{6}\right)}} = -2.42$$
P(Z > 2.42) = .0078 < .00833, so we conclude these groups significantly differ in terms of cortisol level.

Comparing Group I vs. Group III
$$z_{13} = \frac{6.900 - 15.667}{\sqrt{\frac{22(23)}{12}\left(\frac{1}{10} + \frac{1}{6}\right)}} = -2.61$$
P(Z > 2.61) = .0045 < .00833, so we conclude these groups significantly differ in terms of cortisol level.

Comparing Group II vs. Group III
$$z_{23} = \frac{15.000 - 15.667}{\sqrt{\frac{22(23)}{12}\left(\frac{1}{6} + \frac{1}{6}\right)}} = -0.18$$
P(Z > 0.18) = .4286 > .00833, so we fail to conclude these groups differ significantly.
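The Kruskal-Wallis statistic and the Bonferroni-adjusted pairwise comparisons can be checked with a short Python sketch (standard library only; `midranks`, `chi2_sf_even_df` and `upper_normal_tail` are our own hypothetical helpers, not library functions). Two assumptions are worth flagging: the group assignment in the code (10 observations in Group I, 6 in each of Groups II and III) is the one consistent with $R_1 = 69$, $R_2 = 90$, $R_3 = 94$ and with the group sizes in the comparison formulas; and the pairwise z-statistics divide by the standard error of a difference in average ranks, so their numerators use $R_i/n_i$ (the form usually attributed to Dunn). The chi-square tail uses the closed form that holds only for even degrees of freedom, which covers k = 3 groups (df = 2).

```python
from itertools import combinations
from math import comb, erfc, exp, factorial, sqrt


def midranks(values):
    """Rank values from smallest (1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # sorted positions i..j share one rank
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """P(chi-square_df > x); this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


def upper_normal_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


# Assumed group assignment (10, 6, 6 observations); it reproduces R1 = 69, R2 = 90, R3 = 94
groups = {
    "I": [262, 307, 211, 323, 454, 339, 304, 154, 287, 356],
    "II": [465, 501, 455, 355, 468, 362],
    "III": [343, 772, 207, 1048, 838, 687],
}
labels = list(groups)
pooled = [x for g in labels for x in groups[g]]
ranks = midranks(pooled)  # rank all N observations together
N = len(pooled)

sizes, rank_sums, start = {}, {}, 0
for g in labels:
    sizes[g] = len(groups[g])
    rank_sums[g] = sum(ranks[start:start + sizes[g]])
    start += sizes[g]

# Kruskal-Wallis H: compares each group's average rank with (N + 1) / 2
H = 12 / (N * (N + 1)) * sum(
    sizes[g] * (rank_sums[g] / sizes[g] - (N + 1) / 2) ** 2 for g in labels
)
p_H = chi2_sf_even_df(H, len(labels) - 1)

# Pairwise comparisons on average ranks, Bonferroni cutoff alpha / (2m)
alpha = 0.05
m = comb(len(labels), 2)
dunn = {}
for a, b in combinations(labels, 2):
    se = sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = (rank_sums[a] / sizes[a] - rank_sums[b] / sizes[b]) / se
    p = upper_normal_tail(abs(z))
    dunn[(a, b)] = (z, p, p < alpha / (2 * m))

print(rank_sums, round(H, 2), round(p_H, 4))  # {'I': 69.0, 'II': 90.0, 'III': 94.0} 9.23 0.0099
for pair, (z, p, significant) in dunn.items():
    print(pair, round(z, 2), round(p, 4), significant)
```

The cortisol values contain no ties, so H is computed here without a tie correction; the `midranks` helper still assigns average ranks if ties do occur.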
In conclusion, we have identified the elective Caesarean section patients (Group I) as being significantly different in terms of antecubital vein cortisol levels from both the emergency Caesarean section patients (Group II) and the patients in whom spontaneous labor occurred (Group III); Groups II and III do not differ significantly.

Friedman's Test for Randomized Complete Block (RCB) Designs

For the analysis of one-way ANOVA with blocking, i.e., the analysis of results from RCB designs, a nonparametric alternative is Friedman's test. The assumptions required for this test are as follows:
1) The data consist of b mutually independent samples (blocks) of size k (# of treatments).
2) The variable of interest is continuous, or at least ordinal.
3) There is no interaction between blocks and treatments.
4) The observations within each block may be ranked in order of magnitude.

Hypotheses:
$H_0: M_1 = M_2 = \cdots = M_k$
$H_a$: At least one equality is violated.

The Friedman test statistic is defined as:

$$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} \left[ R_j - \frac{b(k+1)}{2} \right]^2$$

where $R_j$ = the sum of the ranks for treatment j. The ranks are assigned to the treatments within each block, but the rank sums are computed across blocks. Under $H_0$ the test statistic approximately follows a chi-square distribution with df = k - 1.

Example: Serum amylase values (enzyme units per 100 ml of serum) in patients with pancreatitis.

In JMP (sort of Friedman's Test):
Multiple Comparisons (Steel-Dwass ~ nonparametric version of Tukey's all pairs)

In R,
> Amylase = read.table(file.choose(),header=T,sep=",")
> names(Amylase)
[1] "Block"   "Method"  "Amylase"
> friedman.test(Amylase~Method|Block,data=Amylase)

        Friedman rank sum test

data:  Amylase and Method and Block
Friedman chi-squared = 15.9429, df = 2, p-value = 0.0003452
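Friedman's statistic can be computed straight from its definition: rank within each block, then sum the ranks for each treatment across blocks. The amylase data themselves are not reproduced in these notes, so the minimal Python sketch below (standard library only; `midranks`, `friedman_statistic` and `chi2_sf_even_df` are hypothetical helper names of our own) uses a small made-up data set with 3 blocks and 3 treatments, and also checks the chi-square tail probability for the statistic R reports for the amylase example (15.9429 with df = 2). As before, the closed-form chi-square tail is valid only for even df.

```python
from math import exp, factorial


def midranks(values):
    """Rank values within one block (1 = smallest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(data):
    """data[block][treatment] -> (rank sums R_j, chi-square_r statistic)."""
    b, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for block in data:                       # ranks are assigned within each block...
        for j, r in enumerate(midranks(block)):
            rank_sums[j] += r                # ...and summed across blocks
    center = b * (k + 1) / 2                 # expected rank sum under H0
    stat = 12 / (b * k * (k + 1)) * sum((R - center) ** 2 for R in rank_sums)
    return rank_sums, stat


def chi2_sf_even_df(x, df):
    """P(chi-square_df > x); this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


# Hypothetical toy data: 3 blocks (rows) x 3 treatments (columns)
toy = [[10, 20, 30], [5, 7, 9], [1, 2, 3]]
R, stat = friedman_statistic(toy)
print(R, round(stat, 4), round(chi2_sf_even_df(stat, 2), 4))  # [3.0, 6.0, 9.0] 6.0 0.0498

# Tail probability for the statistic R reported for the amylase data
print(round(chi2_sf_even_df(15.9429, 2), 7))  # 0.0003452
```

In the toy data every block orders the treatments the same way, which gives the largest possible value of the statistic for b = 3 and k = 3.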