Topics:
Non-Parametric Measures of Correlation
Spearman’s Rho
Goodman’s & Kruskal’s Gamma
Stat203
Fall2011 – Week 13, Lecture 1
Page 1 of 26
More Correlation?
The Pearson correlation we’ve already studied was somewhat restrictive.
Recall, Pearson correlation was usually only good when:
- both variables are interval or ratio scaled
- both variables are approximately normally distributed
What about when we want to know the relationship between two variables that are:
- interval or ratio but not normally distributed
- ordinal
- nominal
Correlation: Interval or Ratio data that isn’t Normally Distributed
Spearman’s Rho is a measure of correlation you can use when the data is strongly skewed, or you’re not sure whether the variables are Normally distributed.
It is also sometimes called the Spearman Rank-Order correlation coefficient, so let’s define ‘Rank-Order’ first.
An observation’s rank is its position in the ordering (highest to lowest) of the n observations in the dataset.
An example …
Example (Ch12, Q5): Is there a relationship between Distance from School and Number of clubs joined?
First, let’s look at ‘ranks’:
Name       Distance to      Rank Order of        Number of       Rank Order of Number
           School (miles)   Distance to School   Clubs Joined    of Clubs Joined
Lee        4                ____                 3               ____
Rhonda     2                ____                 1               ____
Jess       7                ____                 5               ____
Evelyn     1                ____                 2               ____
Mohammed   4                ____                 1               ____
Steve      6                ____                 1               ____
George     9                ____                 9               ____
Juan       7                ____                 6               ____
Chi        7                ____                 5               ____
David      17               ____                 8               ____
Now that we have the ranks of each individual’s value of each variable, Spearman’s Rho is actually calculated using exactly the same formula as Pearson’s correlation, but using the ranks!
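The calculation just described can be sketched in a few lines of Python. This is a minimal sketch, not the SPSS procedure: the helper names `average_ranks` and `pearson` are made up for illustration, the values are those read off the example table, ranks run from smallest (1) to largest, and tied values are assumed to share the average of the ranks they occupy. Ranking both variables in the opposite direction would not change the result.

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first rank position held by v
        count = ordered.count(v)                # number of observations tied at v
        ranks.append(first + (count - 1) / 2)   # average of the tied positions
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Order: Lee, Rhonda, Jess, Evelyn, Mohammed, Steve, George, Juan, Chi, David
distance = [4, 2, 7, 1, 4, 6, 9, 7, 7, 17]
clubs = [3, 1, 5, 2, 1, 1, 9, 6, 5, 8]

# Spearman's Rho: Pearson's formula applied to the ranks
rho = pearson(average_ranks(distance), average_ranks(clubs))
print(f"Spearman's Rho = {rho:.4f}")
```

Under these assumptions the result is about 0.8375, in line with the 0.838 that SPSS reports for this example.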
Let’s look at this using SPSS.
First off, let’s examine the histograms of these two variables to see why we shouldn’t use the Pearson Correlation.
Are these variables normally distributed?
... the sample size is small, so it’s hard to tell … but there is one easy way to see which correlation to use.
Use SPSS to calculate both the Pearson and Spearman Correlations
If both variables were approximately normally distributed, the Spearman and Pearson correlations would be very close!
In this case, the Pearson and Spearman correlation coefficients differ and the histograms seem to show some skewness, so we should use Spearman’s Rho as the correlation.
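To see the disagreement for this example, the Pearson correlation can be recomputed on the raw values. A minimal sketch, assuming the values read off the example table; `pearson` is an illustrative helper, not an SPSS function.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Order: Lee, Rhonda, Jess, Evelyn, Mohammed, Steve, George, Juan, Chi, David
distance = [4, 2, 7, 1, 4, 6, 9, 7, 7, 17]
clubs = [3, 1, 5, 2, 1, 1, 9, 6, 5, 8]

r = pearson(distance, clubs)
print(f"Pearson r on the raw values = {r:.3f}")
```

On these values Pearson’s r comes out near 0.78, lower than the Spearman value of 0.838. An outlying value such as David’s 17 miles carries its full magnitude in Pearson’s r but only its rank in Spearman’s Rho.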
So, our conclusion from examining this data is that Distance to School and the number of clubs joined have a statistically significant (p-value = 0.002), positive correlation (ρ = 0.838).
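A p-value for a correlation is commonly obtained from a t statistic with n - 2 degrees of freedom, t = ρ·sqrt((n - 2) / (1 - ρ²)). The sketch below applies that approximation to the reported ρ = 0.838 with n = 10 students. Whether SPSS uses exactly this approximation for Spearman’s Rho is an assumption here; 3.355 is the standard two-tailed 1% critical value of t with 8 degrees of freedom.

```python
from math import sqrt

rho = 0.838   # Spearman's Rho reported in the lecture
n = 10        # number of students in the example

# t approximation for testing whether the correlation differs from zero
t = rho * sqrt((n - 2) / (1 - rho ** 2))

critical_t_01 = 3.355  # two-tailed 1% critical value of t, 8 degrees of freedom
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
print("significant at the 1% level:", abs(t) > critical_t_01)
```

A t statistic of roughly 4.3 lies well beyond the 1% critical value, which is consistent with the reported p-value of 0.002.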
Correlations between Ordinal Variables?
From the previous example, we should note that the key to calculating Spearman’s Rho was to identify the rank-order of each individual for each variable.
Recall, Ordinal variables only give us an ‘ordering’ … or the rank of one individual compared to another!
So … we can use Spearman’s Rho for correlations involving ordinal data!
Example (Ch12, Q7): A researcher ranks population density and Quality of Life for 10 cities. Is there a relationship between these two variables?
Research Question:
Individuals:
Population:
Variables:
Parameter:
Statistical Hypothesis:
… From SPSS:
Conclusion:
… but note that the Spearman correlation is identical to the Pearson!
This is because the data we analyzed were already ranks … and when only the ranks are available, Spearman and Pearson will be the same.
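Why the two coefficients coincide can be checked directly: ranking data that are already ranks (with no ties) returns the same numbers, so Pearson’s formula receives identical input either way. A minimal sketch with hypothetical city ranks (not the textbook data) and an illustrative helper `average_ranks`:

```python
def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

# Hypothetical ranks for 10 cities (made up for illustration)
density_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quality_rank = [9, 10, 7, 8, 5, 6, 3, 4, 1, 2]

# Re-ranking ranks changes nothing, so Spearman and Pearson see the same numbers
print(average_ranks(density_rank) == density_rank)
print(average_ranks(quality_rank) == quality_rank)
```

Both comparisons print `True`.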
Example (Ch12, Q11): Comparing High School GPA to College performance. Is there a relationship between the two?
Research Question:
Individuals:
Population:
Variables:
Parameter:
Statistical Hypothesis:
From SPSS:
Conclusion:
… but note now that the Pearson does not match the Spearman.
Why?
Only one variable contained ranks, the other variable was ratio scaled. So, for determining the correlation between ratio and ordinal data, we should use the Spearman.
Goodman’s and Kruskal’s Gamma
Although Spearman’s Rho can be used in most cases involving ordinal data, if you have ‘lots’ of ties, you may have to use Gamma as an alternative.
What’s a tie? A tie occurs when two or more individuals have the same value of a variable, or the same combination of values on both variables.
Why would there be lots of ties?
Think back to the homework; recall the General Happiness variable from the GSS – there were only three categories for this ordinal variable and most people selected ‘Pretty Happy’ … they were all tied.
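One quick way to gauge how heavily a variable is tied is to count how often its most common value occurs. A minimal sketch, using made-up responses (not actual GSS data) for a three-category happiness item:

```python
from collections import Counter

# Hypothetical responses from 10 people (illustration only)
responses = (["Very Happy"] * 3) + (["Pretty Happy"] * 6) + (["Not Too Happy"] * 1)

counts = Counter(responses)
value, count = counts.most_common(1)[0]   # most frequent category and its count
share = count / len(responses)            # proportion of the sample tied at that value

print(f"{value}: {count} of {len(responses)} responses ({share:.0%})")
```

Here 60% of the sample shares a single value, so most pairs of individuals cannot be ordered on this variable.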
Gamma in SPSS
As with the other statistics, we won’t calculate this by hand.
But it’s easy to find in SPSS.
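For reference, Gamma compares concordant pairs (two individuals ordered the same way on both variables) with discordant pairs (ordered in opposite ways), ignoring any pair tied on either variable: Gamma = (C - D) / (C + D). A minimal sketch with made-up ordinal codes (not the textbook data); `goodman_kruskal_gamma` is an illustrative name, not an SPSS or library function.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Return (concordant, discordant, gamma) for two ordinal variables."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # pair ordered the same way on both variables
        elif direction < 0:
            discordant += 1      # pair ordered in opposite ways
        # direction == 0: tied on at least one variable, so the pair is ignored
    # Note: undefined (division by zero) if every pair is tied
    gamma = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, gamma

# Hypothetical codes: 1 = low, 2 = medium, 3 = high
status = [1, 1, 2, 2, 3, 3]
books = [1, 2, 1, 3, 2, 3]

c, d, gamma = goodman_kruskal_gamma(status, books)
print(f"concordant = {c}, discordant = {d}, gamma = {gamma:.3f}")
```

For these made-up codes there are 7 concordant and 2 discordant pairs, giving a Gamma of about 0.556.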
Example (Ch12, Q12): Is there a relationship between SocioEconomic Status and Number of books read?
Let’s first look at this data in SPSS:
Are there ties?
Let’s do a cross-tab to obtain the table in the textbook, but note that we can generate some statistics along the way:
… and the output:
So, what would our conclusion be regarding the relationship between these variables?
When making conclusions regarding ‘relationship’ questions, quote the strength and direction (i.e., the actual correlation) and the p-value or ‘significance’.
Conclusion:
So, we’ve studied 3 different ways to calculate correlation:
- Pearson’s r
- Spearman’s Rho
- Goodman’s and Kruskal’s Gamma
How do I know which to use?
- Consider the type of variables involved
- Consider the distribution of the variables
- Consider the # of ties
... and if all else fails, if Spearman’s gives a different conclusion than Pearson, use Spearman … and if it looks like you have 10% or more of your data with the same value of one variable or the other, use Gamma
For all correlations …
All correlations we have studied have a maximum of 1 and a minimum of -1 and describe the strength and direction of the relationship between TWO variables.
SPSS provides a p-value for all correlations, and all are interpreted the same (significance of the relationship).
Research Hypotheses involving correlations always ask about a significant relationship between the variables.
Today’s Topics
Non-Parametric Measures of Correlation
- Pearson’s r isn’t always good enough
- All correlations have similar interpretations regarding strength and direction of relationship
- All correlations have a p-value which is interpreted similarly
Spearman’s Rho
- for non-normal (i.e., skewed) interval or ratio-scaled variables
- if one or more of the variables are ordinal
Goodman’s & Kruskal’s Gamma
- useful for correlations between ordinal variables with lots of ties
Reading:
This lecture included material from Chapter 12 up to page 430.
No more reading for this course!