Goodman's and Kruskal's Gamma

Topics:

Non-Parametric Measures of Correlation

Spearman’s Rho

Goodman’s & Kruskal’s Gamma

Stat203

Fall2011 – Week 13, Lecture 1

Page 1 of 26

More Correlation?

The Pearson correlation we’ve already studied was somewhat restrictive.

Recall that Pearson correlation was usually only appropriate when:

- both variables are interval or ratio scaled

- both variables are approximately normally distributed

What about when we want to know the relationship between two variables that are:

- interval or ratio but not normally distributed

- ordinal

- nominal

Correlation: Interval or Ratio data that isn’t Normally Distributed

Spearman’s Rho is a measure of correlation you can use when the data is strongly skewed, or you’re not sure whether the variables are Normally distributed.

It is also sometimes called the Spearman Rank-Order correlation coefficient, so let’s define ‘Rank-Order’ first.

An observation’s rank is its position (from highest to lowest) among the n observations in the dataset.
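A minimal Python sketch of ranking, assuming the usual convention that tied observations share the average of the positions they occupy. For simplicity it ranks from smallest to largest. Ranking from highest to lowest instead would flip every rank, but Spearman’s Rho comes out the same as long as both variables are ranked in the same direction. `average_ranks` is a hypothetical helper name, not an SPSS function.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    # Indices of the observations, sorted by their values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover every observation tied with the one at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; each tied observation gets their mean.
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Hypothetical data: the two observations with value 4 are tied.
print(average_ranks([4, 2, 7, 4]))  # [2.5, 1.0, 4.0, 2.5]
```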

An example …

Example (Ch12, Q5): Is there a relationship between Distance from School and Number of clubs joined?

First, let’s look at ‘ranks’:

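| Student | Distance to School (miles) | Rank Order of Distance to School | Number of Clubs Joined | Rank Order of Number of Clubs Joined |
|---|---|---|---|---|
| Lee | 4 | | 3 | |
| Rhonda | 2 | | 1 | |
| Jess | 7 | | 5 | |
| Evelyn | 1 | | 2 | |
| Mohammed | 4 | | 1 | |
| Steve | 6 | | 1 | |
| George | 9 | | 9 | |
| Juan | 7 | | 6 | |
| Chi | 7 | | 5 | |
| David | 17 | | 8 | |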

Now that we have the ranks of each individual’s value of each variable, Spearman’s Rho is actually calculated using exactly the same formula as Pearson’s correlation, but using the ranks!
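As a check on this idea, here is a minimal Python sketch (hypothetical helpers, not SPSS’s own procedure) that ranks the ten students’ distances and club counts from the Ch12 Q5 example, averaging ranks over ties, and then applies the ordinary Pearson formula to those ranks. It gives a Spearman’s Rho of 0.8375, in line with the ρ = 0.838 reported from SPSS later in this lecture, while Pearson’s r on the raw values comes out lower, at about 0.776.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's Rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Lee, Rhonda, Jess, Evelyn, Mohammed, Steve, George, Juan, Chi, David
distance = [4, 2, 7, 1, 4, 6, 9, 7, 7, 17]  # miles to school
clubs = [3, 1, 5, 2, 1, 1, 9, 6, 5, 8]      # number of clubs joined

print(round(spearman(distance, clubs), 4))  # 0.8375
print(round(pearson(distance, clubs), 3))   # about 0.776, Pearson on the raw values
```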

Let’s look at this using SPSS.

First off, let’s examine the histograms of these two variables to see why we shouldn’t use the Pearson correlation.

Are these variables normally distributed?

... the sample size is small, so it’s hard to tell … but there is one easy way to see which correlation to use.

Use SPSS to calculate both the Pearson and Spearman correlations.

If the data were normally distributed, the Spearman and Pearson correlations would usually be very close (though not exactly identical).

In this case, the Pearson and Spearman correlation coefficients differ noticeably and the histograms seem to show some skewness, so we should use Spearman’s Rho as the correlation.

So, our conclusion from examining this data is that Distance to School and the number of clubs joined have a statistically significant (p-value = 0.002), positive correlation (ρ = 0.838).
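The slides take the p-value straight from SPSS. As a rough cross-check, assuming the common t approximation for testing a correlation (the slides do not say which method SPSS uses), t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. With ρ = 0.838 and n = 10 students, t is about 4.34, well beyond the two-tailed 5% critical value of 2.306 for 8 degrees of freedom (from a standard t table), which is consistent with the small p-value reported. `correlation_t` is a hypothetical helper name.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing whether a correlation r, from n pairs, differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


T_CRIT_DF8 = 2.306  # two-tailed 5% critical value of t with 8 df (standard t table)

t = correlation_t(0.838, 10)
print(round(t, 2), t > T_CRIT_DF8)  # 4.34 True
```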

Correlations between Ordinal Variables?

From the previous example, we should note that the key to calculating Spearman’s Rho was to identify the rank-order of each individual for each variable.

Recall that ordinal variables only give us an ‘ordering’ … the rank of one individual compared to another!

So … we can use Spearman’s Rho for correlations involving ordinal data!

Example (Ch12, Q7): A researcher ranks population density and Quality of Life for 10 cities. Is there a relationship between these two variables?

Research Question:

Individuals:

Population:

Variables:

Parameter:

Statistical Hypothesis:

… From SPSS:

Conclusion:

… but note that the Spearman correlation is identical to the Pearson!

This is because the data we analyzed were ranks … and when both variables are only available as ranks, Spearman and Pearson will be the same.
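A minimal sketch of why this happens, using made-up rankings for 10 cities rather than the textbook’s data: re-ranking values that are already the ranks 1 to n leaves them unchanged, so ‘Pearson on the ranks’ and ‘Pearson on the data’ are the same calculation. `simple_ranks` is a hypothetical helper that assumes there are no ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from deviations around the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ranks(values):
    """Rank values 1..n from smallest to largest (assumes no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


# Made-up rankings of 10 cities (1 = lowest).
density_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quality_rank = [3, 1, 2, 5, 4, 8, 6, 7, 10, 9]

# Re-ranking data that are already the ranks 1..n changes nothing ...
print(simple_ranks(quality_rank) == quality_rank)  # True

# ... so Spearman (Pearson on the ranks) is literally Pearson on the data.
rho = pearson(simple_ranks(density_rank), simple_ranks(quality_rank))
r = pearson(density_rank, quality_rank)
print(rho == r)  # True
```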

Example (Ch12, Q11): Comparing High School GPA to College performance. Is there a relationship between the two?

Research Question:

Individuals:

Population:

Variables:

Parameter:

Statistical Hypothesis:

From SPSS:

Conclusion:

… but note now that the Pearson does not match the Spearman.

Why?

Only one variable contained ranks; the other variable was ratio scaled. So, for determining the correlation between ratio and ordinal data, we should use the Spearman.
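The same kind of sketch with one ratio-scaled variable, again using made-up numbers (high school GPAs and college ranks for 10 students, not the textbook’s data). Ranking the GPAs keeps their order but throws away the unequal gaps between them, so Pearson’s r on the raw GPAs (about 0.945) and Spearman’s Rho (about 0.964) no longer agree. The helpers are hypothetical and assume no ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from deviations around the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ranks(values):
    """Rank values 1..n from smallest to largest (assumes no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


# Made-up data for 10 students: high school GPA (ratio) and college rank (1 = lowest).
hs_gpa = [2.1, 2.5, 2.8, 3.0, 3.2, 3.4, 3.6, 3.7, 3.9, 4.0]
college_rank = [1, 3, 2, 4, 6, 5, 7, 9, 8, 10]

r = pearson(hs_gpa, college_rank)                                # uses the GPA values
rho = pearson(simple_ranks(hs_gpa), simple_ranks(college_rank))  # uses only their order
print(round(r, 3), round(rho, 3))  # 0.945 0.964
```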

Goodman’s and Kruskal’s Gamma

Although Spearman’s Rho can be used in most cases involving ordinal data, if you have ‘lots’ of ties, you may have to use Gamma as an alternative.

What’s a tie? A tie occurs when two or more individuals have the same value of a variable (or the same combination of values on two variables).

Why would there be lots of ties?

Think back to the homework; recall the General Happiness variable from the GSS – there were only three categories for this ordinal variable and most people selected ‘Pretty Happy’ … they were all tied.

Gamma in SPSS

As with the other statistics, we won’t calculate this by hand.

But it’s easy to find in SPSS.
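Although we leave the arithmetic to SPSS, a minimal Python sketch may help show what Gamma measures. It assumes the standard definition Gamma = (C - D) / (C + D), where C counts concordant pairs of individuals (one is higher than the other on both variables) and D counts discordant pairs (higher on one variable, lower on the other). Pairs tied on either variable are simply left out, which is why Gamma copes well with lots of ties. The function name and the small table are hypothetical.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a cross-tab whose rows and columns are
    both listed from the lowest to the highest ordinal category."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a strictly higher row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables: concordant
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on the row variable, lower on the column variable
                        discordant += table[i][j] * table[k][m]
                    # m == j: tied on the column variable, so the pair is ignored
    # Pairs in the same row are tied on the row variable and are never counted.
    if concordant + discordant == 0:
        raise ValueError("Gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 2 x 2 table: rows = low/high on one variable, columns = low/high on the other.
print(goodman_kruskal_gamma([[2, 1], [1, 2]]))  # 0.6
```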

Example (Ch12, Q12): Is there a relationship between SocioEconomic Status and Number of books read?

Let’s first look at this data in SPSS:

Are there ties?

Let’s do a cross-tab to obtain the table in the textbook, but note that we can generate some statistics along the way:
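For readers working outside SPSS, a minimal sketch of building the same kind of cross-tab from raw responses with Python’s `collections.Counter`. The SES and book-count categories and the responses are made up, and `crosstab` is a hypothetical helper. Listing the levels in their natural order matters, since Gamma reads ‘higher’ and ‘lower’ from the order of the rows and columns.

```python
from collections import Counter

# Made-up raw data: one (SES, books read) pair per respondent.
ses_levels = ["low", "middle", "high"]    # row categories, lowest first
book_levels = ["few", "some", "many"]     # column categories, lowest first
responses = [("low", "few"), ("low", "few"), ("low", "some"),
             ("middle", "some"), ("middle", "few"), ("middle", "many"),
             ("high", "many"), ("high", "some"), ("high", "many")]


def crosstab(pairs, row_levels, col_levels):
    """Count pairs into a table with one row per row level and one column per
    column level, in the order given; missing combinations count as 0."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = crosstab(responses, ses_levels, book_levels)
for level, row in zip(ses_levels, table):
    print(f"{level:>6}: {row}")
```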

… and the output:

So, what would our conclusion be regarding the relationship between these variables?

When making conclusions regarding ‘relationship’ questions, quote the strength and direction (i.e., the actual correlation) and the p-value or ‘significance’.

Conclusion:

So, we’ve studied 3 different ways to calculate correlation:

- Pearson’s r

- Spearman’s Rho

- Goodman’s and Kruskal’s Gamma

How do I know which to use?

- Consider the type of variables involved

- Consider the distribution of the variables

- Consider the # of ties

... and if all else fails: if Spearman’s Rho gives a different conclusion than Pearson’s r, use Spearman’s Rho … and if it looks like 10% or more of your data share the same value of one variable or the other, use Gamma.
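One way to act on the ‘number of ties’ check: a minimal sketch that reports the most common value of a variable and the share of observations that have it, for comparison with the 10% rule of thumb above. The data are made up in the spirit of the GSS happiness example, and `most_common_share` is a hypothetical helper. In small samples a single observation can already reach 10%, so look at how many observations actually share the top value.

```python
from collections import Counter


def most_common_share(values):
    """Return the most frequent value and the fraction of observations that have it."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Made-up answers to a three-category happiness question.
happiness = ["Pretty happy"] * 6 + ["Very happy"] * 3 + ["Not too happy"] * 1
value, share = most_common_share(happiness)
print(value, share, share >= 0.10)  # Pretty happy 0.6 True
```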

For all correlations …

All correlations we have studied have a maximum of 1 and a minimum of -1 and describe the strength and direction of the relationship between TWO variables.

SPSS provides a p-value for all correlations, and all are interpreted the same (significance of the relationship).

Research Hypotheses involving correlations always ask about a significant relationship between the variables.

Today’s Topics

Non-Parametric Measures of Correlation

- Pearson’s r isn’t always good enough

- All correlations have similar interpretations regarding strength and direction of relationship

- All correlations have a p-value which is interpreted similarly

Spearman’s Rho

- for non-normal (i.e., skewed) interval or ratio-scaled variables

- if one or more of the variables are ordinal

Goodman’s & Kruskal’s Gamma

- useful for correlations between ordinal variables with lots of ties

Reading:

This lecture included material from Chapter 12 up to page 430.

No more reading for this course!
