Spearman's Rank Correlation Coefficient

Spearman's coefficient is used in data analysis to assess the extent to which two variables are correlated.
It is an objective measure which provides a numerical value to summarise the relationship between two
variables such as air temperature and altitude.
The statistical significance of the result can be determined to assess the likelihood of it occurring by chance.
However, it must be remembered that even if there is a statistically significant relationship between two
variables it does not prove a causal link, i.e. it does not prove that a change in one variable has been responsible
for a change in the other.
It is a useful technique for testing hypotheses; it can suggest relationships which are worthy of further
investigation, or it can eliminate some variables from the study. The formula for calculating Spearman's Rank
correlation coefficient is:
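In its commonly quoted textbook form, assumed here to be the version intended, the coefficient R is computed
from the difference d between the two ranks of each pair and the number of pairs n:

```latex
R = 1 - \frac{6 \sum d^{2}}{n\,(n^{2} - 1)}
```

Here the sum runs over all n pairs. R is the symbol used for the coefficient elsewhere in this guide; d and n
are the conventional names and are assumptions as far as this guide is concerned.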
The test effectively measures the extent to which the rank order of the data for one variable differs from the
rank order of the data for the second variable, and then assesses whether this difference is statistically
significant. The best way to calculate Spearman's Rank is to use the following Excel spreadsheet.
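For readers who would rather see the steps than use a spreadsheet, the calculation can be sketched in a few
lines of Python. This is a minimal sketch, not the spreadsheet's method: it assumes the commonly quoted formula
R = 1 - 6 x (sum of d squared) / (n x (n squared - 1)), and it assumes tied values share the average of the
ranks they occupy. The function names and the altitude and temperature figures are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rank(x, y):
    """Spearman's Rank via the sum of squared rank differences (assumed formula)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs of data are needed")
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical paired data: altitude (m) and air temperature (deg C).
altitude = [100, 200, 300, 400, 500]
temperature = [15.0, 14.1, 12.8, 13.0, 10.5]
print(round(spearman_rank(altitude, temperature), 3))
```

Five pairs are used only to keep the example short. When many ranks are tied this formula is only approximate,
which is one reason tied data weaken the result.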
Interpreting the Result
When commenting on the Spearman rank result there are four aspects to consider:
1. What is the strength of the correlation between the variables?
Coefficient    Strength
1              Perfect correlation
0.9            Strong correlation
0.7 - 0.8      Good correlation
0.5 - 0.6      Weak correlation
0.3 - 0.4      Poor correlation
0              No correlation
2. What is the direction of the relationship?
If the calculation produces a positive value (e.g. +0.8) then the relationship is positive, i.e. as one variable
increases then so does the other. If the coefficient is negative (e.g. -0.8) then the relationship is negative, i.e.
as one variable increases the other decreases.
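The first two questions can be answered mechanically, and a short Python sketch may make the reading of the
coefficient concrete. It is only an illustration: the function name is hypothetical, and treating a value that
falls between two listed bands (or below 0.3) as belonging to the band beneath it is an assumption, since the
bands listed under question 1 do not say how such values should be handled.

```python
def describe_correlation(r):
    """Return (strength, direction) for a Spearman's Rank coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("the coefficient must lie between -1 and +1")
    size = abs(r)  # strength depends on the size of r, not its sign
    # Lower bound of each band listed under question 1.
    # Assumption: values between bands fall into the band below.
    bands = [
        (1.0, "perfect"),
        (0.9, "strong"),
        (0.7, "good"),
        (0.5, "weak"),
        (0.3, "poor"),
    ]
    strength = "no"  # assumption: anything below 0.3 is reported as no correlation
    for lower_bound, label in bands:
        if size >= lower_bound:
            strength = label
            break
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(describe_correlation(-0.8))
```

A result of -0.8 is therefore read as a good negative correlation: the two variables move in opposite
directions, and the ranks agree closely in reverse order.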
3. Is the result statistically significant or could the relationship observed have occurred by chance?
In order to assess the statistical significance, or reliability, of the result, critical values for R must be
consulted. The following table is the same as the critical value table found on the spreadsheet.
n     95%      99%
4     1.000    -
5     0.900    1.000
6     0.829    0.943
7     0.714    0.893
8     0.643    0.833
9     0.600    0.783
10    0.564    0.746
12    0.506    0.712
14    0.456    0.645
16    0.425    0.601
18    0.399    0.564
20    0.377    0.534
22    0.359    0.508
24    0.343    0.485
26    0.329    0.465
28    0.317    0.448
30    0.306    0.432
When interpreting the result for statistical significance the critical value applies whether the result is positive
or negative. Ignore the positive or negative sign; simply assess the result you have calculated against the
critical values given for your value of n.
If the size of your result for Spearman's Rank is equal to, or greater than, the stated critical value then it is
statistically significant. Significance is measured from 0 to 1.0; 0.05 significance (which may be stated as the
95% confidence level) means that a result at least this strong could have occurred by chance 5 in 100 times.
The rejection level is 0.05 significance. Any result which does not satisfy this level has to be rejected as not
statistically significant: it could have occurred by chance more than 5 times in 100, which represents an
unacceptable level of chance, so chance cannot be ruled out as the explanation for the observed relationship.
The 0.01 significance level (99% confidence level) means that the result could have occurred by chance 1 in 100
times. This means that the given result is very unlikely to have occurred by chance and represents a
statistically very significant result. Obviously, if your result satisfies the 0.01 level, then it must satisfy the
0.05 level. In this case there is no need to state that it satisfies 0.05. You simply need to state the most
significant level that your result satisfies.
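The lookup described above can be sketched in Python. The critical values are copied from the table in this
section; the function name, the choice to return the level as a string, and reading the table as having no 99%
entry for n = 4 are assumptions made for illustration.

```python
# Critical values of R from the table above: n -> (95% level, 99% level).
# None marks the 99% entry for n = 4, assumed to be absent from the table.
CRITICAL_VALUES = {
    4: (1.000, None),
    5: (0.900, 1.000),
    6: (0.829, 0.943),
    7: (0.714, 0.893),
    8: (0.643, 0.833),
    9: (0.600, 0.783),
    10: (0.564, 0.746),
    12: (0.506, 0.712),
    14: (0.456, 0.645),
    16: (0.425, 0.601),
    18: (0.399, 0.564),
    20: (0.377, 0.534),
    22: (0.359, 0.508),
    24: (0.343, 0.485),
    26: (0.329, 0.465),
    28: (0.317, 0.448),
    30: (0.306, 0.432),
}


def significance_level(r, n):
    """Return the most significant level satisfied ("0.01" or "0.05"), or None."""
    if n not in CRITICAL_VALUES:
        raise KeyError(f"no critical values are tabulated for n = {n}")
    critical_95, critical_99 = CRITICAL_VALUES[n]
    size = abs(r)  # the sign of the result is ignored
    if critical_99 is not None and size >= critical_99:
        return "0.01"
    if size >= critical_95:
        return "0.05"
    return None  # the result is rejected as not statistically significant


print(significance_level(-0.8, 10))
```

Only the most significant level is returned, matching the advice to state just the strongest level that the
result satisfies.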
4. Does the observed relationship have geological significance?
It is important that you interpret the relationship in its geological context; is this what you would expect? How
do the two variables relate? Try to develop your reasons for the observed relationship. You will have started
the investigation by establishing a hypothesis; consider your result and relate this to your initial research
question.
Issues to watch out for when using Spearman's Rank.

- You must have at least 10 sets of paired data! Spearman's rank is unreliable if n is less than 10.
- The more sets of data used the more reliable the result; but the calculation becomes more complicated,
  particularly in the ranking process, and there is more chance of error once you use more than 30 sets of
  paired data.
- Obviously large samples are much better handled by a computer statistical package such as Excel.
- If the data set contains too many tied ranks then this undermines the statistical reliability of the
  coefficient; there is little that you can do about the nature of the data collected but you should be
  aware of any limitations in the application of a chosen method of analysis.
- Remember, the Spearman's Rank coefficient must be between +1 and -1 in value.
Finally...
Without supporting evidence a correlation coefficient proves very little; it is possible to identify a correlation
between two sets of data without the data being connected in any meaningful way. The Spearman's Rank test
merely indicates the degree of correlation between two sets of data and allows some assessment of the
likelihood that the relationship occurred by chance; it does not prove that a change in one variable causes a
change in the other. It does not prove a causal relationship.
It is very important that you choose your variables sensibly; do not attempt to suggest that there is a direct
causal link between two variables which only have some spurious connection! It may be possible to demonstrate a
statistical relationship between two variables, but any observed connection may not be direct; it may have been
caused by a third factor which is not being investigated.