Section 25 - Nonparametric Tests

NONPARAMETRIC TESTS (Chapter 9 and Section 12.7)
In situations where the normality of the population(s) is suspect, or the sample sizes are so
small that checking normality is not really feasible, it is sometimes preferable to use
nonparametric tests to make inferences about the “average” value.
THE SIGN TEST (first used in the binomial distribution handout)
The sign test can be used in place of the paired t-test when we have evidence that the
paired differences are NOT normally distributed. It is simple to perform; however, it is
not the best nonparametric alternative to the paired t-test.
Example: Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis
A researcher believes that patients with cystic fibrosis (CF) expend more energy at rest
than those without CF. To obtain a fair comparison, she matches 13 patients with CF to 13
patients without CF on the basis of age, sex, height, and weight. She then measures the REE
of each subject in every pair and compares the results.
The following results were obtained:
Pair   CF (C)   Healthy (H)   Difference (C - H)   Sign of Difference
1      1153     996           157                  +
2      1132     1080          52                   +
3      1165     1182          -17                  -
4      1460     1452          8                    +
5      1634     1162          472                  +
6      1493     1619          -126                 -
7      1358     1140          218                  +
8      1453     1123          330                  +
9      1185     1113          72                   +
10     1824     1463          361                  +
11     1793     1632          161                  +
12     1930     1614          316                  +
13     2075     1836          239                  +
If there were no difference in resting energy between the CF patients and the healthy
patients, we would expect P(+) = P(-) = 1/2 for the paired differences. Here we have
11 +’s and only 2 –’s, which suggests that cystic fibrosis patients have a larger resting
energy expenditure than similar healthy individuals. We can use the Binomial Table
Generator to find the probability of obtaining 11 or more +’s by chance variation alone.
Results from Binomial Table Generator with n = 13 and p = .50
P(X ≥ 11) = .0112 < α = .05, so we conclude that individuals with CF have a higher
resting energy expenditure than healthy individuals of the same sex, age, height, and weight.
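The binomial tail probability above can also be computed directly instead of read from the Binomial Table Generator. A minimal sketch using only the Python standard library; the helper name `sign_test_upper_p` is our own, hypothetical choice:

```python
from math import comb


def sign_test_upper_p(n_plus: int, n: int) -> float:
    """Exact upper-tail sign test p-value: P(X >= n_plus) for X ~ Binomial(n, 1/2)."""
    # Under the null hypothesis all 2**n sign patterns are equally likely,
    # so count the patterns with at least n_plus plus signs.
    return sum(comb(n, x) for x in range(n_plus, n + 1)) / 2 ** n


# REE example: 11 of the 13 paired differences are positive
p_value = sign_test_upper_p(11, 13)
print(f"P(X >= 11) = {p_value:.4f}")  # P(X >= 11) = 0.0112
```

The exact value is (78 + 13 + 1)/8192 = 92/8192, which rounds to the .0112 reported above.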
WILCOXON SIGNED-RANK TEST
This test is a better alternative to the paired t-test than the sign test discussed above,
because it uses the magnitudes of the paired differences (through their ranks) and not just
their signs. It is used when we do not wish to assume that the population of paired
differences is normally distributed. The Wilcoxon signed-rank test uses the ranks of the
absolute paired differences rather than the actual difference values.
Example: Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis
As an example we again consider the resting energy of cystic fibrosis patients.
Pair   CF (C)   Healthy (H)   d = C - H   Sign   |d|   Rank of |d|   Signed Rank
1      1153     996           157         +      157   6             6
2      1132     1080          52          +      52    3             3
3      1165     1182          -17         -      17    2             -2
4      1460     1452          8           +      8     1             1
5      1634     1162          472         +      472   13            13
6      1493     1619          -126        -      126   5             -5
7      1358     1140          218         +      218   8             8
8      1453     1123          330         +      330   11            11
9      1185     1113          72          +      72    4             4
10     1824     1463          361         +      361   12            12
11     1793     1632          161         +      161   7             7
12     1930     1614          316         +      316   10            10
13     2075     1836          239         +      239   9             9
We then calculate T = the sum of the positive signed ranks = ___________
and T = the sum of the negative signed ranks = ___________
Our hypotheses are stated in terms of the median of the paired differences. Listed below
are the hypotheses, along with the test statistic based on the signed rank sums used to test
each one.
Let $M_d$ denote the median paired difference.
$H_o: M_d = 0$ vs. $H_a: M_d \neq 0$ (two-tailed): test statistic $T = \min(T^+, T^-)$
$H_o: M_d = 0$ vs. $H_a: M_d > 0$ (upper-tailed): test statistic $T = T^-$
$H_o: M_d = 0$ vs. $H_a: M_d < 0$ (lower-tailed): test statistic $T = T^+$
In each case, small values of T are evidence against $H_o$.
For this example, if we had originally hypothesized that cystic fibrosis patients have a
larger REE than similar healthy individuals, then we have the upper-tailed alternative, and
our test statistic is T = _______
The Wilcoxon Signed-Rank Table at the end of this handout contains p-values associated
with test statistic values for small sample sizes, i.e. number of pairs n < 30.
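Exact p-values of this kind come from the null distribution of the signed rank sums: when there are no ties and no zero differences, each rank from 1 through n is equally likely to carry a + or a − sign, so P(T ≤ t) is the fraction of the 2^n sign patterns whose negative ranks add to at most t. A Python sketch under those assumptions; the function names are hypothetical, and we do not claim the handout's table was generated this way:

```python
def signed_rank_null_counts(n):
    """counts[s] = number of subsets of the ranks {1, ..., n} that sum to s."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # the empty subset has sum 0
    for rank in range(1, n + 1):
        # move downward through the sums so each rank is used at most once
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def signed_rank_lower_p(t, n):
    """Exact P(T <= t), where T is T+ or T- (both have the same null distribution)."""
    counts = signed_rank_null_counts(n)
    return sum(counts[: t + 1]) / 2 ** n


# REE example: the negative signed ranks are -2 and -5, so T- = 7 with n = 13 pairs
print(f"{signed_rank_lower_p(7, 13):.4f}")  # 0.0023
```

Small values of $T^-$ support the upper-tailed alternative $M_d > 0$, so this is the exact upper-tailed p-value for the REE data (19 of the 8192 sign patterns give $T^- \le 7$).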
If n>12 we can use a z-statistic and find the p-value from the standard normal table.
$$z_T = \frac{T - \mu_T}{\sigma_T}, \quad \text{where} \quad \mu_T = \frac{n(n+1)}{4} \quad \text{and} \quad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$
Here we have n = 13, so we can use the above approximation as follows:
$$\mu_T = \frac{n(n+1)}{4} = \frac{13(14)}{4} = 45.5$$
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{13(14)(27)}{24}} = 14.31$$
Thus our z-statistic is $z_T = \dfrac{T - \mu_T}{\sigma_T} =$ ___________
Now we find the p-value using the standard normal table.
Here our p-value = _________, so we reject the null hypothesis and conclude that
cystic fibrosis patients have a higher resting energy expenditure than healthy individuals
who are the same age, sex, height, and weight.
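The normal-approximation arithmetic for the REE example can be checked with a short Python sketch that uses `statistics.NormalDist` from the standard library. It assumes the upper-tailed test with $T = T^- = 7$ (the negative signed ranks of pairs 3 and 6), so the p-value is the lower-tail area below z; using $T^+ = 84$ instead gives a z of the same size with the opposite sign, and the same p-value from the upper tail. The helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_normal_approx(t, n):
    """Return (mu_T, sigma_T, z) for the Wilcoxon signed-rank normal approximation."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mu, sigma, (t - mu) / sigma


mu, sigma, z = signed_rank_normal_approx(7, 13)
# Small T- is evidence for the upper-tailed alternative, so take the lower tail of Z.
p_value = NormalDist().cdf(z)
print(f"mu_T = {mu}, sigma_T = {sigma:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```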
IN JMP
Select Distribution > Test Mean > Enter 0 for the hypothesized value and check the
nonparametric test box.
The results of the test are on the following page.
The p-values for the upper-tailed t-Test and the Wilcoxon signed-rank test have been
highlighted.
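The test statistic reported by JMP for the Wilcoxon test is
$$\frac{T^+ - T^-}{2} = \frac{84 - 7}{2} = 38.5.$$
This is simply $T^+$ centered at its null mean: because $T^+ + T^- = n(n+1)/2$, we have
$(T^+ - T^-)/2 = T^+ - n(n+1)/4 = 84 - 45.5 = 38.5$. In any case, we only need the p-value.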
Conclusion:
WILCOXON RANK SUM TEST (MANN-WHITNEY U TEST)
This test is an alternative to the two-sample t-test for comparing the “average” value of
two populations where the samples from each population are taken independently. In the
discussion below we will label the two populations to be compared as 1 and 2. We will
also assume the sample size from population 1 is n and the sample size from population 2
is m.
The hypotheses tested can be stated as follows:
$H_o$: The distributions of population 1 and population 2 are identical.
If the populations are symmetric (but not necessarily normal), the null hypothesis
can be expressed in terms of the population medians as $M_1 = M_2$.
$H_a$: The distributions of population 1 and population 2 are different. (two-tailed)
$M_1 \neq M_2$
or
$H_a$: The distribution of population 1 is shifted to the right of the distribution for
population 2, i.e. the population 1 values are generally larger than the population
2 values. (right-tailed)
$M_1 > M_2$
or
$H_a$: The distribution of population 1 is shifted to the left of the distribution for
population 2, i.e. the population 1 values are generally smaller than the population
2 values. (left-tailed)
$M_1 < M_2$
The test statistic is based on the sum of the ranks assigned to the observed data from
each population when the combined sample is ranked from smallest to largest. Our test
statistic is given by:
$$T = S_1 - \frac{n(n+1)}{2}$$
where $S_1$ = the sum of the ranks assigned to the pop 1 values,
or
$$T = S_2 - \frac{m(m+1)}{2}$$
where $S_2$ = the sum of the ranks assigned to the pop 2 values.
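The choice is arbitrary, but you do need to take it into account when making your
decision: the two versions of T always sum to nm, so a small value of one corresponds to a
large value of the other.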
The Wilcoxon (Mann-Whitney) table at the end of these notes gives critical values for T for
cases where n and m are both less than 20.
• For a two-sided test we reject if T is sufficiently small or sufficiently large.
• For $H_a: M_1 < M_2$ we reject if $T = S_1 - \frac{n(n+1)}{2}$ is “small” or if
$T = S_2 - \frac{m(m+1)}{2}$ is “large”.
• For $H_a: M_1 > M_2$ we reject if $T = S_1 - \frac{n(n+1)}{2}$ is “large” or if
$T = S_2 - \frac{m(m+1)}{2}$ is “small”.
With larger sample sizes we can use a normal approximation to find the p-value.
The normal approximation test statistic is based on T as follows:
$$z = \frac{T - \mu_T}{\sigma_T}, \quad \text{where} \quad \mu_T = \frac{mn}{2} \quad \text{and} \quad \sigma_T = \sqrt{\frac{nm(n+m+1)}{12}}.$$
We can then use the standard normal table to find the p-value.
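A minimal Python sketch of this procedure: pool the two samples, rank them (tied values get average ranks), form $T = S_1 - n(n+1)/2$, and standardize with $\mu_T$ and $\sigma_T$. The function names and the toy data are hypothetical, and $\sigma_T$ here has no adjustment for ties.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = average_rank
        start = end + 1
    return ranks


def rank_sum_statistic(sample1, sample2):
    """Return (T, z) with T = S1 - n(n+1)/2 computed from the pooled ranking."""
    n, m = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    s1 = sum(ranks[:n])                  # rank sum for the population 1 sample
    t = s1 - n * (n + 1) / 2
    mu = m * n / 2
    sigma = sqrt(n * m * (n + m + 1) / 12)
    return t, (t - mu) / sigma


# Hypothetical toy samples (not from this handout)
group1 = [1.2, 3.4, 2.2]
group2 = [5.1, 4.0]
t1, z1 = rank_sum_statistic(group1, group2)
t2, z2 = rank_sum_statistic(group2, group1)
# Lower-tail p-value for Ha: M1 < M2 (small T computed from population 1)
p_lower = NormalDist().cdf(z1)
print(t1, t2, round(z1, 3), round(p_lower, 4))
```

Swapping the roles of the samples gives T values that add to nm (here 0 + 6 = 6) and z values of opposite sign, which is why the rejection direction depends on which rank sum you use.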
Example: Oral glucose response in patients with Huntington’s disease vs. controls
Davidson et al. studied the responses to oral glucose in patients with Huntington’s disease
and in a group of control subjects. The five-hour responses are given below. Is there
evidence to suggest that the five-hour glucose level (mg percent) is greater for patients
with Huntington’s disease?
In conducting the study the researchers used n = 11 patients with Huntington’s disease
(H) and m = 10 controls.
$H_o: M_C = M_H$ vs. $H_a: M_C < M_H$
The data below are the five-hour glucose levels for the two samples.
Control: 83
73
65
65
Huntington’s: 85
89
86
90
91
77
77
78
93
97
100
85
82
75
92
86
86
You can use JMP to compute the ranks or to conduct the entire test as we will see later.
The sum of the ranked glucose levels for the control group is:
____________.
The sum of the ranked glucose levels for the Huntington’s group is: ____________.
The sum of the ranks for the control group is smaller than the rank sum for the
Huntington’s disease patients, but this could be expected even if the null hypothesis were
true. Why?
The test statistic T is based on the sum of the ranks for the controls. Intuitively, we will
reject the null hypothesis if the sum of the ranked glucose levels for the control group is
“small”. To find the p-value using the normal approximation, we work directly with the rank
sum T of a group of size n (the other group having size m), which has mean and standard
deviation
$$\mu_T = \frac{n(m+n+1)}{2}, \qquad \sigma_T = \sqrt{\frac{nm(n+m+1)}{12}},$$
so that
$$z = \frac{T - \mu_T}{\sigma_T} \sim N(0,1) \text{ (approximately)}.$$
For the rank sum of the size-m group, the roles of n and m are reversed in the mean formula.
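For the control group (sample size 10, versus 11 in the Huntington’s group) we have the
following:
$$\mu_T = \frac{10(10+11+1)}{2} = 110, \qquad \sigma_T = \sqrt{\frac{10 \cdot 11\,(10+11+1)}{12}} = 14.20$$
$$z = \frac{78 - 110}{14.20} = -2.25$$
Equivalently, using the rank sum for the Huntington’s group, whose mean is
11(10 + 11 + 1)/2 = 121,
$$z = \frac{153 - 121}{14.20} = 2.25.$$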
Compute p-value using normal approximation z-score
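The z-scores above and the resulting p-value can be reproduced with a few lines of Python. This sketch takes the rank sums 78 (controls) and 153 (Huntington’s) from the calculation above, treats the rank sum itself as T as this example does, and uses the lower tail because $H_a: M_C < M_H$; the helper name `rank_sum_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z(rank_sum, n, m):
    """z-score for the rank sum of a group of size n when the other group has size m."""
    mu = n * (m + n + 1) / 2
    sigma = sqrt(n * m * (n + m + 1) / 12)
    return (rank_sum - mu) / sigma


z_control = rank_sum_z(78, n=10, m=11)       # controls
z_huntington = rank_sum_z(153, n=11, m=10)   # Huntington's disease patients
# Small control rank sums support Ha: M_C < M_H, so use the lower tail.
p_value = NormalDist().cdf(z_control)
print(f"z = {z_control:.2f} or {z_huntington:.2f}, p-value = {p_value:.4f}")
```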
Conclusion:
Wilcoxon Rank Sum Test in JMP
Data Table
Select Nonparametric > Wilcoxon Test
Wilcoxon Signed-Rank Test p-values for (n < 30)
Critical Values for Wilcoxon (Mann-Whitney) Rank Sum Test
Nonparametric Approach: Kruskal-Wallis Test
If the normality assumption is suspect or the sample sizes from each of the k populations
are too small to assess normality we can use the Kruskal-Wallis Test to compare the size
of the values drawn from the different populations. There are two basic assumptions for
the Kruskal-Wallis test:
1) The samples from the k populations are independently drawn.
2) The null hypothesis is that all k populations are identical in shape, with the
only potential difference being in the location of the typical values (medians).
Hypotheses:
$H_o$: All k populations have the same median or “typical/average” value.
$H_a$: At least one of the populations has a median or “typical/average” value
different from the others,
or
at least one population is shifted away from the others.
To perform the test, we rank all of the data from smallest to largest and compute the
rank sum $R_i$ for each of the k samples. The test statistic looks at the difference between
the average rank for each group, $R_i/n_i$, and the average rank for all observations,
$(N+1)/2$, where N is the total number of observations. If there are differences among the
populations, we expect some groups to have an average rank much larger than the average
rank for all observations and others to have much smaller average ranks.
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \frac{R_i}{n_i} - \frac{N+1}{2} \right)^2 \sim \chi^2_{k-1} \quad \text{(Chi-square distribution with df = k - 1)}$$
The larger H is, the stronger the evidence we have against the null hypothesis that the
populations have the same location/median. Large values of H lead to small p-values!
Example: Antecubital Vein Cortisol Levels
Cawson et al. studied cortisol levels in three groups of patients who were delivered between 38 and 42
weeks gestation. Group I was studied before the onset of labor at elective Caesarean section, Group II was
studied at emergency Caesarean section during induced labor, and Group III consisted of patients in whom
spontaneous labor occurred and who were delivered either vaginally or by Caesarean section. We wish to
know whether the median cortisol levels differ across these three groups.
Group I: 262, 307, 211, 323, 454, 339, 304, 154, 287, 356
Group II: 465, 501, 455, 355, 468, 362
Group III: 343, 772, 207, 1048, 838, 687
Enter these data into two columns, one denoting the group and the other containing the
cortisol level.
Select Analyze > Fit Y by X and place Group in the X box and Cortisol level in the Y box.
Select Nonparametric > Wilcoxon Test. This will perform a Kruskal-Wallis test.
$R_1 = 69$, $R_2 = 90$, $R_3 = 94$, and $H = 9.23$ (p-value = .0099).
We have evidence to suggest that the median cortisol levels differ significantly among
the three groups.
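The Kruskal-Wallis computation can be sketched in Python with the standard library. The sketch assumes that Group I contains the ten values 262, 307, 211, 323, 454, 339, 304, 154, 287, 356 and Group III the six values 343, 772, 207, 1048, 838, 687; this is the split consistent with the rank sums reported above and with the group sizes (10, 6, 6) used in the multiple comparisons. There are no ties in these data, so no tie correction is applied, and the closed-form tail probability exp(−H/2) is valid only because df = k − 1 = 2. Function names are hypothetical.

```python
import math


def midranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = average_rank
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (rank sums R_i, H) using the average-rank form of the statistic."""
    sizes = [len(g) for g in groups]
    total = sum(sizes)                                 # N
    ranks = midranks([x for g in groups for x in g])   # rank the pooled data
    rank_sums, start = [], 0
    for size in sizes:
        rank_sums.append(sum(ranks[start:start + size]))
        start += size
    overall_mean_rank = (total + 1) / 2
    h = 12 / (total * (total + 1)) * sum(
        n_i * (r_i / n_i - overall_mean_rank) ** 2 for n_i, r_i in zip(sizes, rank_sums)
    )
    return rank_sums, h


group1 = [262, 307, 211, 323, 454, 339, 304, 154, 287, 356]
group2 = [465, 501, 455, 355, 468, 362]
group3 = [343, 772, 207, 1048, 838, 687]
rank_sums, h = kruskal_wallis([group1, group2, group3])
p_value = math.exp(-h / 2)  # chi-square upper tail, valid for df = 2 only
print(rank_sums, round(h, 2), round(p_value, 4))
```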
Multiple Comparisons for Kruskal-Wallis Test
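To determine if group i differs significantly from group j, we compute
$$z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}, \quad \text{where} \quad \bar{R}_i = \frac{R_i}{n_i},$$
and then compute p-value = $P(Z > |z_{ij}|)$.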
Bonferroni Correction
If the p-value is less than $\alpha/(2m)$, where m = # of pair-wise comparisons to be made
(typically $\binom{k}{2}$ if all pair-wise comparisons are of interest), we conclude that
groups i and j differ significantly. For this example, we can make a total of
$m = \binom{3}{2} = 3$ pair-wise comparisons, so we compare our p-values to
$$\frac{.05}{2(3)} = .00833.$$
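Comparing Group I vs. Group II
$$z_{12} = \frac{\frac{69}{10} - \frac{90}{6}}{\sqrt{\frac{22(23)}{12}\left(\frac{1}{10} + \frac{1}{6}\right)}} = \frac{6.90 - 15.00}{3.353} = -2.42$$
P(Z > 2.42) = .0078 < .00833, so we conclude these groups differ significantly in terms of
cortisol level.
Comparing Group I vs. Group III
$$z_{13} = \frac{\frac{69}{10} - \frac{94}{6}}{\sqrt{\frac{22(23)}{12}\left(\frac{1}{10} + \frac{1}{6}\right)}} = \frac{6.90 - 15.67}{3.353} = -2.61$$
P(Z > 2.61) = .0045 < .00833, so we conclude these groups differ significantly in terms of
cortisol level.
Comparing Group II vs. Group III
$$z_{23} = \frac{\frac{90}{6} - \frac{94}{6}}{\sqrt{\frac{22(23)}{12}\left(\frac{1}{6} + \frac{1}{6}\right)}} = \frac{15.00 - 15.67}{3.749} = -0.18$$
P(Z > 0.18) = .4286 > .00833, so we fail to conclude that these groups differ
significantly.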
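In conclusion, the elective Caesarean section patients (Group I) differ significantly in
antecubital vein cortisol levels both from the emergency Caesarean section patients
(Group II) and from the patients in whom spontaneous labor occurred (Group III), while
Groups II and III do not differ significantly.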
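The pairwise comparisons can be sketched in Python as follows. The sketch takes the rank sums (69, 90, 94) and group sizes (10, 6, 6) from this example, compares average ranks $R_i/n_i$ in the manner of Dunn's procedure, and uses the Bonferroni cutoff .05/(2m); the variable names are our own.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

rank_sums = {"I": 69.0, "II": 90.0, "III": 94.0}
sizes = {"I": 10, "II": 6, "III": 6}
total = sum(sizes.values())                    # N = 22
pairs = list(combinations(rank_sums, 2))       # all k(k-1)/2 pairs of group labels
cutoff = 0.05 / (2 * len(pairs))               # Bonferroni: .05 / (2 * 3) = .00833

results = {}
for a, b in pairs:
    mean_rank_diff = rank_sums[a] / sizes[a] - rank_sums[b] / sizes[b]
    std_error = sqrt(total * (total + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = mean_rank_diff / std_error
    p = 1 - NormalDist().cdf(abs(z))           # P(Z > |z|)
    results[(a, b)] = (z, p, p < cutoff)
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.4f}, significant = {p < cutoff}")
```

Dividing each rank sum by its group size matters here because the groups have unequal sizes; comparing raw rank sums would mix group size into the difference.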
Friedman’s Test for Randomized Complete Block (RCB) Designs
For the analysis of one-way ANOVA with blocking, i.e. the analysis of results from RCB
designs, a nonparametric alternative is Friedman’s Test. The assumptions required for
this test are as follows:
1) The data consist of b mutually independent samples (blocks) of size k (# of
treatments).
2) The variable of interest is continuous, or at least ordinal.
3) There is no interaction between blocks and treatments.
4) The observations within each block may be ranked in order of magnitude.
Hypotheses:
$H_o: M_1 = M_2 = \cdots = M_k$
$H_a$: At least one equality is violated.
The Friedman test statistic is defined as:
$$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} \left[ R_j - \frac{b(k+1)}{2} \right]^2$$
where $R_j$ = the sum of the ranks for treatment j. The ranks are assigned to the treatments
within each block, but the sum of the ranks for each treatment is computed across blocks.
Under $H_o$, the test statistic approximately follows a chi-square distribution with df = k – 1.
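A minimal Python sketch of the Friedman statistic: rank within each block, then sum each treatment's ranks across blocks. The function name and the 3 × 3 data set are hypothetical (the serum amylase data are not reproduced here), ties within a block get average ranks without any tie correction, and exp(−χ²_r/2) gives the chi-square tail only for df = k − 1 = 2.

```python
import math


def midranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = average_rank
        start = end + 1
    return ranks


def friedman_statistic(blocks):
    """blocks: list of b rows, each holding the k treatment values for one block."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # rank the treatments within this block, then add to each treatment's total
        for j, r in enumerate(midranks(row)):
            rank_sums[j] += r
    expected = b * (k + 1) / 2          # mean of each R_j under H_o
    chi2_r = 12 / (b * k * (k + 1)) * sum((r - expected) ** 2 for r in rank_sums)
    return rank_sums, chi2_r


# Hypothetical data: 3 blocks x 3 treatments, treatment 3 largest in every block
blocks = [[10, 12, 15], [8, 9, 20], [11, 14, 16]]
rank_sums, chi2_r = friedman_statistic(blocks)
p_value = math.exp(-chi2_r / 2)         # chi-square upper tail, valid for df = 2 only
print(rank_sums, round(chi2_r, 4), round(p_value, 4))
```

Here every block ranks the treatments the same way, so the rank sums are 3, 6, and 9, as far from the common mean b(k + 1)/2 = 6 as three blocks allow.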
Example: Serum amylase values (enzyme units per 100 ml of serum) in patients with pancreatitis.
In JMP (sort of Friedman’s Test):
Multiple Comparisons (Steel-Dwass ~ nonparametric version of Tukey’s all pairs)
In R,
> Amylase = read.table(file.choose(), header = TRUE, sep = ",")
> names(Amylase)
[1] "Block"   "Method"  "Amylase"
> friedman.test(Amylase ~ Method | Block, data = Amylase)

Friedman rank sum test

data:  Amylase and Method and Block
Friedman chi-squared = 15.9429, df = 2, p-value = 0.0003452