What are non-parametric tests?

Statistics for Health Research
Non-Parametric
Methods
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Objectives of Presentation
• Introduction
• Ranks & Median
• Wilcoxon Signed Rank Test
• Paired Wilcoxon Signed Rank
• Mann-Whitney test
• Spearman’s Rank Correlation Coefficient
• Others….
What are non-parametric tests?
• ‘Parametric’ tests involve estimating parameters such as the mean, and assume that the sample means are ‘Normally’ distributed
• Often data do not follow a Normal distribution, e.g. number of cigarettes smoked, cost to NHS etc.
• These are typically positively skewed distributions
A positively skewed distribution
[Figure: histogram of units of alcohol per week (x-axis, 0 to 50) against frequency (y-axis, 0 to 20). Mean = 8.03, Std. Dev. = 12.952, N = 30]
What are non-parametric tests?
• ‘Non-parametric’ tests were
developed for these situations where
fewer assumptions have to be made
• NP tests STILL have assumptions but these are less stringent
• NP tests can be applied to Normal
data but parametric tests have
greater power IF assumptions met
Ranks
• Practical differences between
parametric and NP are that NP
methods use the ranks of values
rather than the actual values
• E.g.
1,2,3,4,5,7,13,22,38,45 - actual
1,2,3,4,5,6, 7, 8, 9,10 - rank
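The conversion from values to ranks can be sketched in a few lines of Python. This is only an illustration: the helper name `average_ranks` is made up here, and it assumes the usual convention that tied values share the average of the ranks they would otherwise occupy.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    # rank = (number of smaller values) + (number of equal values + 1) / 2
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


actual = [1, 2, 3, 4, 5, 7, 13, 22, 38, 45]
ranks = average_ranks(actual)
print(ranks)  # ranks 1.0 to 10.0, matching the slide

# With ties: the two values of 1 share ranks 1 and 2, so each gets 1.5.
tied_ranks = average_ranks([1, 8, 14, 3, 2, 1, 5])
print(tied_ranks)
```

Note that the ranks discard how far apart the values are: 38 and 45 are only one rank apart, exactly as 1 and 2 are.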
Median
• The median is the value above and
below which 50% of the data lie.
• If the data is ranked in order, it is
the middle value
• In symmetric distributions the mean
and median are the same
• In skewed distributions, median more
appropriate
Median
• BPs:
135, 138, 140, 140, 141, 142, 143
Median=140
• No. of cigarettes smoked:
0, 1, 2, 2, 2, 3, 5, 5, 8, 10
Median=2.5
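As a quick check of the two answers above, here is a small sketch using Python's standard `statistics` module (the variable names are illustrative only). With an even number of values, the median is taken as the average of the two middle values.

```python
from statistics import median

bps = [135, 138, 140, 140, 141, 142, 143]
cigarettes = [0, 1, 2, 2, 2, 3, 5, 5, 8, 10]

bp_median = median(bps)          # 7 values: the 4th value in rank order
cig_median = median(cigarettes)  # 10 values: average of the 5th and 6th

print(bp_median)   # 140
print(cig_median)  # 2.5
```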
T-test
• The one-sample t-test is used to test whether the mean of a sample is significantly different from a hypothesised population mean
• The t-test relies on the sample being drawn from a Normally distributed population
• If the sample is not Normal then use the Wilcoxon Signed Rank Test as an alternative
Wilcoxon Signed Rank Test
• NP test relating to the median as
measure of central tendency
• The ranks of the absolute differences between the data and the hypothesised median are calculated
• The ranks for the negative and the
positive differences are then summed
separately (W- and W+ resp.)
• The minimum of these is the test
statistic, W
Wilcoxon Signed Rank Test:
Example
The median heart rate for an 18 year
old girl is supposed to be 82bpm. A
student takes the pulse rates of 8
female students (all aged 18):
83, 90, 96, 82, 85, 80, 81, 87
Do these results suggest that the
median might not be 82?
Wilcoxon Signed Rank Test:
Example
H0: median=82
H1: median≠82
Two-tailed test
Because one result equals 82 this
cannot be used in the analysis
Wilcoxon Signed Rank Test:
Example
Result   Above or below   Absolute difference   Rank of
         median           from median=82        difference
83       +                1                     1.5
90       +                8                     6
96       +                14                    7
85       +                3                     4
80       -                2                     3
81       -                1                     1.5
87       +                5                     5
W+ = 1.5+6+7+4+5 = 23.5
W- = 3+1.5 = 4.5
So, W=4.5
n=7, so the value of W > tabulated value of 2, so p>0.05
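The rank sums in this worked example can be reproduced with a short Python sketch. It is an illustration rather than the procedure used to prepare the slides: the helper names `average_ranks` and `signed_rank_sums` are made up here, and results equal to the hypothesised median are simply dropped, as described above.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def signed_rank_sums(data, hypothesised_median):
    """Return (n, W+, W-) after excluding results equal to the median."""
    diffs = [x - hypothesised_median for x in data if x != hypothesised_median]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), w_plus, w_minus


pulse = [83, 90, 96, 82, 85, 80, 81, 87]
n, w_plus, w_minus = signed_rank_sums(pulse, 82)
w = min(w_plus, w_minus)  # test statistic: the smaller rank sum
print(n, w_plus, w_minus, w)  # 7 23.5 4.5 4.5
```

A useful sanity check is that W+ and W- always add up to n(n+1)/2, here 28.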
Wilcoxon Signed Rank Test:
Example
Therefore, the student should conclude
that these results could have come
from a population which had a median
of 82 as the result is not
significantly different to the null
hypothesis value.
Wilcoxon Signed Rank Test
Normal Approximation
• As the number of ranks (n) becomes larger, the distribution of W becomes approximately Normal
• Generally, if n>20
• Mean W=n(n+1)/4
• Variance W=n(n+1)(2n+1)/24
• Z=(W-mean W)/SD(W)
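The Normal approximation above translates directly into code. The sketch below is a minimal version with no correction for ties and no continuity correction, and the input values (W = 100 with n = 25) are hypothetical numbers chosen only to show the arithmetic. The function names are likewise illustrative.

```python
from math import erfc, sqrt


def signed_rank_z(w, n):
    """Z score for a signed rank statistic W based on n non-zero differences."""
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    return (w - mean_w) / sqrt(var_w)


def two_sided_p(z):
    """Two-sided p-value for a standard Normal Z score."""
    return erfc(abs(z) / sqrt(2))


# Hypothetical illustration: n = 25 non-zero differences, W = 100.
z = signed_rank_z(100, 25)
p = two_sided_p(z)
print(round(z, 3), round(p, 3))
```

Statistical packages usually apply a tie correction to the variance, so their p-values may differ slightly from this uncorrected version.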
Wilcoxon Signed Rank Test
Assumptions
• Population should be approximately
symmetrical but need not be Normal
• Results must be classified as either being greater than or less than the median, i.e. exclude results equal to the median
• Can be used for small or large
samples
Paired samples t-test
• Disadvantage: Assumes data are a
random sample from a population
which is Normally distributed
• Advantage: Uses all detail of the
available data, and if the data are
normally distributed it is the most
powerful test
The Wilcoxon Signed Rank Test
for Paired Comparisons
• Disadvantage: Only the sign (+ or -) and the rank of each change are analysed, not its actual size
• Advantage: Easy to carry out and does not require the data to come from a Normal distribution
Paired And Not Paired
Comparisons
• If you have the same sample
measured on two separate occasions
then this is a paired comparison
• Two independent samples is not a
paired comparison
• Different samples which are
‘matched’ by age and gender are
paired
The Wilcoxon Signed Rank Test
for Paired Comparisons
• Similar calculation to the Wilcoxon
Signed Rank test, only the
differences in the paired results are
ranked
• Example using SPSS:
A group of 10 patients with chronic
anxiety receive sessions of cognitive
therapy. Quality of Life scores are
measured before and after therapy.
Wilcoxon Signed Rank Test
example
QoL Score
Patient   Before   After
1         6        9
2         5        12
3         3        9
4         4        9
5         2        3
6         1        1
7         3        2
8         8        12
9         6        9
10        12       10
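The paired calculation can be sketched by hand in Python before turning to SPSS. This is an illustration under two assumptions: the before and after scores are paired in the order listed, and patients with no change are excluded, as in the one-sample test. The helper name `average_ranks` is made up here.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


before = [6, 5, 3, 4, 2, 1, 3, 8, 6, 12]
after = [9, 12, 9, 9, 3, 1, 2, 12, 9, 10]

# Work on the within-patient differences, dropping zero changes.
diffs = [a - b for a, b in zip(after, before) if a != b]
ranks = average_ranks([abs(d) for d in diffs])

w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
w = min(w_plus, w_minus)
print(len(diffs), w_plus, w_minus, w)  # 9 40.5 4.5 4.5
```

Only two patients scored lower after therapy, and their changes were small, which is why W- is so much smaller than W+.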
Wilcoxon Signed Rank Test
example
SPSS Output
p < 0.05
Mann-Whitney test
• Used when we want to compare two unrelated or INDEPENDENT groups
• For parametric data you would use the unpaired (independent) samples t-test
• The assumptions of the t-test were:
1. The distribution of the measure in each group is approx Normally distributed
2. The variances are similar
Example (1)
The following data shows the number
of alcohol units per week collected in a
survey:
Men (n=13): 0,0,1,5,10,30,45,5,5,1,0,0,0
Women (n=14): 0,0,0,0,1,5,4,1,0,0,3,20,0,0
Is the amount greater in men compared
to women?
Example (2)
How would you test whether the
distributions in both groups are
approximately Normally distributed?




• Plot histograms
• Stem and leaf plot
• Box-plot
• Q-Q or P-P plot
Boxplots of alcohol units per week by gender
[Figure: boxplots for Male and Female (x-axis: Gender) of units of alcohol per week (y-axis, 0 to 50), with outlying cases marked]
Example (3)
Are those distributions symmetrical?
Definitely not!
They are both highly skewed so not
Normal. If the data are still not Normal
after transformation, use a non-parametric
test: Mann-Whitney
The boxplots suggest that males perhaps
tend to have a higher intake than women.
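One quick numerical check for skewness, alongside the plots, is to compare the mean and the median in each group: in a positively skewed sample the mean is pulled well above the median. The sketch below uses Python's standard `statistics` module on the survey data; it is an informal check, not a formal test of Normality.

```python
from statistics import mean, median

men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]

men_mean, men_median = mean(men), median(men)
women_mean, women_median = mean(women), median(women)

# A mean far above the median points to a long right-hand tail.
print(f"Men:   mean {men_mean:.2f}, median {men_median}")
print(f"Women: mean {women_mean:.2f}, median {women_median}")
```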
Mann-Whitney on SPSS
Normal approx (NS)
Mann-Whitney (NS)
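The Mann-Whitney statistic behind this output can be sketched by hand. One common way to define U, assumed here, is to count across all man-woman pairs how often the man's value is higher, scoring ties as one half. The function name `mann_whitney_u` is illustrative, and the Z score below uses the standard large-sample formula without a tie correction, so it will not exactly match a package that corrects for the many tied zeros.

```python
from math import sqrt


def mann_whitney_u(x, y):
    """Count pairs where the x value exceeds the y value; ties count one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]

u_men = mann_whitney_u(men, women)
u_women = mann_whitney_u(women, men)
u = min(u_men, u_women)

# Large-sample Normal approximation (no tie correction).
n1, n2 = len(men), len(women)
mean_u = n1 * n2 / 2
sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u
print(u_men, u_women, round(z, 2))
```

The two U values always sum to n1 × n2 (here 182). Men win more of the pairwise comparisons, but |Z| stays below 1.96, in line with the non-significant result shown.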
Spearman Rank Correlation
• Method for investigating the
relationship between 2 measured
variables
• Non-parametric equivalent to
Pearson correlation
• Variables are either non-Normal or
measured on ordinal scale
Spearman Rank Correlation
Example
A researcher wishes to assess whether
the distance to general practice
influences the time of diagnosis of
colorectal cancer.
The null hypothesis would be that
distance is not associated with time to
diagnosis. Data collected for 7 patients
Distance from GP and time to diagnosis
Distance (km)   Time to diagnosis (weeks)
5               6
2               4
4               3
8               4
20              5
45              5
10              4
Scatterplot
Distance from GP and time to diagnosis
Distance   Time      Rank for   Rank for   Difference   d²
(km)       (weeks)   distance   time       in ranks
2          4         1          3          -2           4
4          3         2          1          1            1
5          6         3          7          -4           16
8          4         4          3          1            1
10         4         5          3          2            4
20         5         6          5.5        0.5          0.25
45         5         7          5.5        1.5          2.25
Total of differences = 0
Σd² = 28.5
Spearman Rank Correlation
Example
The formula for Spearman’s rank
correlation is:
$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
where n is the number of pairs and d is the difference in ranks within each pair
Spearman’s on SPSS
Spearman Rank Correlation
Example
In our example, rs=0.468
In SPSS we can see that this value is
not significant, i.e. p=0.29
Therefore there is no significant
relationship between the distance to a
GP and the time to diagnosis, but note
that the correlation is fairly high! With
only 7 patients the test has little power.
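The value rs=0.468 can be reproduced from the ranks with a short Python sketch. One point worth knowing: putting Σd²=28.5 into the shortcut formula gives about 0.491, not 0.468. The shortcut is exact only when there are no tied ranks, and the time variable here has ties. Computing an ordinary Pearson correlation on the ranks handles the ties and gives 0.468; that this is what SPSS reports is an assumption based on the matching value. The helper names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


distance = [5, 2, 4, 8, 20, 45, 10]  # km
weeks = [6, 4, 3, 4, 5, 5, 4]        # time to diagnosis

rank_d, rank_t = average_ranks(distance), average_ranks(weeks)
n = len(distance)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_d, rank_t))
rs_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # ignores ties
rs_ties = pearson(rank_d, rank_t)                  # allows for ties

print(sum_d2, round(rs_shortcut, 3), round(rs_ties, 3))  # 28.5 0.491 0.468
```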
Spearman Rank Correlation
• Correlations lie between –1 and +1
• A correlation coefficient close to zero indicates weak or no correlation
• A significant rs value depends on sample size and tells you that it is unlikely these results have arisen by chance
• Correlation does NOT measure causality, only association
Chi-squared test
• Used when comparing 2 or more groups of categorical or nominal data (as opposed to measured data)
• Already covered!
• In SPSS, the Chi-squared test is a test of observed vs. expected in a single categorical variable
More than 2 groups
• So far we have been comparing 2 groups
• If we have 3 or more independent groups and the data are not Normal we need an NP equivalent to ANOVA
• If independent samples use Kruskal-Wallis
• If related samples use Friedman
• Same assumptions as before
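To give a feel for how the rank idea extends to three or more independent groups, here is a rough Python sketch of the Kruskal-Wallis H statistic in its basic form, without a tie correction. The three groups are hypothetical values invented purely for illustration, and the function names are not from any package.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), with no tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # rank all observations together
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical data: three completely separated groups of three values.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(round(h, 2))
```

H is usually compared with a Chi-squared distribution on (number of groups - 1) degrees of freedom, although with groups this small exact tables are preferable.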
Parametric / Non-parametric
Parametric Tests                 Non-parametric Tests
Single sample t-test             Wilcoxon signed rank test
Paired sample t-test             Paired Wilcoxon signed rank
2 independent samples t-test     Mann-Whitney test (Note: sometimes called Wilcoxon Rank Sums test!)
One-way Analysis of Variance     Kruskal-Wallis
Pearson’s correlation            Spearman Rank
Summary
Non-parametric
• Non-parametric methods have fewer
assumptions than parametric tests
• So useful when these assumptions not met
• Often used when sample size is small and
difficult to tell if Normally distributed
• Non-parametric methods are a ragbag of
tests developed over time with no
consistent framework
• Read in datasets LDL, etc and carry out
appropriate Non-Parametric tests