Correlation

Correlations
Renan Levine
POL 242
July 12, 2006
Association: Crosstabulation
Category | Specifics | Symmetry Specification | Measure of Association | Indication of Direction
Nominal X Nominal | Only (2 X 2) | Symmetrical | Phi | Yes
Nominal X Nominal | Greater than (2 X 2) | - | Cramer's V | No
Nominal X Ordinal | At least (2 X 3) | - | Cramer's V | No
Ordinal X Ordinal | Square (e.g., 3 X 3) | Symmetrical | Kendall's Tau-b | Yes
Ordinal X Ordinal | Rectangle (e.g., 3 X 4) | Asymmetrical | Kendall's Tau-c | Yes
Interval X Interval (not taught yet) | - | - | Pearson's r (not yet taught) | Yes
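Today: Correlations
Correlation is a measure of the relationship between variables. It is measured with a coefficient, Pearson’s r, that ranges from -1 to 1.
It measures the strength of the relationship between interval or ratio variables:
$$r = \frac{\sum Z_x Z_y}{n - 1}$$
$Z_x$ = Z scores for the X variable and $Z_y$ = Z scores for the Y variable. Multiply each case's two Z scores, sum the products, and divide by the number of paired cases minus one.
A Z score is a value's distance from its variable's mean, measured in standard deviations: $Z_x = (X - \bar{X}) / s_X$, where $s_X$ is the sample standard deviation of X.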
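To make the formula concrete, here is a minimal Python sketch (an illustration, not Webstats' own code) that standardizes each variable with the sample standard deviation and then divides the sum of the products of paired Z scores by n - 1. The five pairs of scores are made up.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: (value - mean) / sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    s = stdev(values)  # sample SD, matching the n - 1 in the formula for r
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """Pearson's r = sum(Zx * Zy) / (n - 1) over paired cases."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of paired cases")
    n = len(x)
    zx = z_scores(x)
    zy = z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Made-up feeling-thermometer scores for five hypothetical respondents
x = [10, 30, 50, 70, 90]
y = [20, 35, 45, 80, 85]
print(round(pearson_r(x, y), 3))
```

Because each Z score already divides by the standard deviation, this gives the same value as the textbook form that divides the sum of cross-products of deviations by the square root of the product of the sums of squares.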
Correlation r
Absolute values closer to 0 indicate that there is little or no linear relationship.
The closer the absolute value of the coefficient is to 1, the stronger the relationship between the variables being correlated.
Generally, 0.2-0.4 is weak, 0.4-0.6 is okay, and 0.6 or higher is strong.
If the correlation is very high, the two variables probably measure something closely related; consider combining them into an index or choosing just one of them.
Positive Relationship
If two variables are related positively or directly, r > 0.
Variables “track together”: high values on Variable X are associated with high values on Variable Y, and low values on X are associated with low values on Y.
Example
Robert D. Putnam; Robert Leonardi; Raffaella Y. Nanetti; Franco Pavoncello. “Explaining Institutional Success: The Case of Italian Regional
Government.” The American Political Science Review 77:1 (Mar. 1983), pp. 55-74
More fun examples: http://www.nationmaster.com/correlations/eco_gdp-economy-gdp-nominal
Example II
[Figure: scatterplot "Opinion towards Palin & McCain"; x-axis: Feeling Thermometer Palin (0 to 100); y-axis: Feeling Thermometer McCain (0 to 100); r = 0.84]
Negative or Inverse Relationship
Variables can be inversely or negatively related (r < 0): high values of X are associated with low values of Y.
Example – Negative / Inverse
[Figure: scatterplot "Opinion towards Obama & McCain" (Time/SRBI, Oct 3-6, 2008); x-axis: Feeling Thermometer Obama (0 to 100); y-axis: Feeling Thermometer McCain (0 to 100); red = Republicans, blue = Democrats, grey diamonds = Independents; r = -0.68]
Data
You need interval-level data.
You will find many interval-level variables in:
Countries / World
Provinces
Election studies (feeling thermometers, odds of a party entering government, etc.)
You can often use an index you created as an interval-level variable.
Compare
[Figure: scatterplot "Opinion towards Palin & McCain"; x-axis: Feeling Thermometer Palin (0 to 100); y-axis: Feeling Thermometer McCain (0 to 100). Annotations: "Most points close to a line." and "Lots more noise here. Typical of public opinion data."]
Differences between Public Opinion and Aggregate Data
Although it is not uncommon to have one or a few outliers in aggregate data, public opinion data tends to be “noisy”.
Feeling thermometer example:
Many respondents gave both candidates a 50.
Quite a few respondents liked both candidates, even though most who liked McCain disliked Obama.
A Pearson’s r that counts as high for public opinion data may count as low for an association in aggregate data.
Guidelines for Public Opinion Data
Magnitude of Association | Qualification | Comments
0.00 | No Relationship | Knowing the independent variable does not reduce the number of errors in predicting the dependent variable at all.
.00 to .15 | Not Useful | Not Acceptable
.15 to .20 | Very Weak | Minimally acceptable
.20 to .25 | Moderately Strong | Acceptable
.25 to .30 | Fairly Strong | Good Work
.30 to .40 | Strong | Great Work
.40 to .70 | Very Strong/Worrisomely Strong | EITHER an excellent relationship OR the two variables are measuring the same thing
.70 to .99 | Redundant (?) | Proceed with caution: are the two variables testing the same thing?
1.00 | Perfect Relationship | If we know the independent variable, we can predict the dependent variable with absolute success.
Rough Guidelines for Aggregate Data
Magnitude of Association | Qualification | Comments
0.00 | No Relationship | Knowing the independent variable does not reduce the number of errors in predicting the dependent variable at all.
.00 to .30 | Not useful, very weak | Not Acceptable
.30 to .50 | Weak | Minimally acceptable
.50 to .70 | Fairly Strong | Acceptable
.70 to .80 | Strong | Good Work
.80 to .90 | Very Strong/Worrisomely Strong | EITHER an excellent relationship OR the two variables are measuring the same thing
.90 to .99 | Redundant (?) | Proceed with caution: are the two variables testing the same thing?
1.00 | Perfect Relationship | If we know the independent variable, we can predict the dependent variable with absolute success.
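The two guideline tables can be applied mechanically with a small lookup. This is a hypothetical helper, not a Webstats feature: the band edges are copied from the tables, the aggregate "Very Strong" band is assumed to start at 0.80 (the rule of thumb for aggregate data, above |0.80|), and a value sitting exactly on a cut point is assumed to belong to the higher band.

```python
# Lower bound of each band, read off the two guideline tables.
# Assumption: a value exactly on a cut point goes into the higher band.
PUBLIC_OPINION = [
    (0.00, "Not Useful"),
    (0.15, "Very Weak"),
    (0.20, "Moderately Strong"),
    (0.25, "Fairly Strong"),
    (0.30, "Strong"),
    (0.40, "Very Strong/Worrisomely Strong"),
    (0.70, "Redundant (?)"),
]
AGGREGATE = [
    (0.00, "Not useful, very weak"),
    (0.30, "Weak"),
    (0.50, "Fairly Strong"),
    (0.70, "Strong"),
    (0.80, "Very Strong/Worrisomely Strong"),  # assumed cut point: |0.80|
    (0.90, "Redundant (?)"),
]

def qualify(r, bands):
    """Return the qualification for a coefficient; only its absolute value matters."""
    size = abs(r)
    if size == 0:
        return "No Relationship"
    if size >= 1:
        return "Perfect Relationship"
    label = bands[0][1]
    for lower, name in bands:
        if size >= lower:
            label = name  # keep the highest band whose lower bound is reached
    return label

# The same coefficient reads very differently under the two sets of guidelines
print(qualify(0.45, PUBLIC_OPINION))  # -> Very Strong/Worrisomely Strong
print(qualify(0.45, AGGREGATE))       # -> Weak
```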
Very Strong or Worrisome??
Public opinion: above |0.40|
Aggregate: above |0.80|
But these are just guidelines. It depends on how good the data are:
Lots of variation in the data
Large scale (10, 20, or 100 points, e.g., prediction odds, physicians per 100,000 people, feeling thermometer scales)
Number of observations (N): the provinces dataset is small.
Outstanding or the same?
You either have an outstanding relationship OR the variables may be measuring the same idea.
Ex. Unemployment and GDP both measure economic health.
Ex. The feeling thermometers for Barack Obama and for Joe Biden both measure attitudes towards the Democratic ticket.
This applies to inverse relationships too.
Example above: the Obama and McCain feeling thermometers are different sides of the same coin, as both seem to measure partisanship.
Use Yo’ Brain
The computer cannot tell you whether it’s a good, strong relationship or two measures looking at the same thing.
You need to understand what each variable is measuring.
It is the same thought process as in index creation.
Use your knowledge of the world and theory to decide whether two variables measure the same thing or two different things.
Example (above): Putnam’s relationship between civic culture and government performance.
Failed States survey: it appears that the higher an indicator's value, the worse off the country is in that particular field.
http://www.fundforpeace.org/web/index.php?option=com_content&task=view&id=99&Itemid=140
Flip side
A relationship you expect to be strong surprisingly is not?!
Make certain both variables are interval.
Double-check that you cleaned up the data.
Make sure missing values are coded as missing.
Next week: you may need to qualify the relationship because some subgroup of the data is not like the others; those subgroups need to be identified.
Think about the relationship: maybe it’s not linear, so the relationship is present for only part of the range.
Usefulness


Quick, easy way to look at several variables to see if
they are related.
With strong association, you can begin to think about
predicting values of Y based on a value of X.

Ex. Positive correlation – you know a high value of X is
associated with a high value of Y!
Webstats Output
Correlation Coefficients (each cell: Pearson’s r, then N in parentheses, then P = significance)
 | Q375A1 | Q305 | Q375A3 | Q1005
Q375A1 | 1.0000 (686) P= . | .2916 (666) P= .000 | .5320 (667) P= .000 | -.3163 (672) P= .000
Q305 | .2916 (666) P= .000 | 1.0000 (2776) P= . | .2679 (660) P= .000 | -.1272 (2721) P= .000
Q375A3 | .5320 (667) P= .000 | .2679 (660) P= .000 | 1.0000 (682) P= . | -.2020 (666) P= .000
Q1005 | -.3163 (672) P= .000 | -.1272 (2721) P= .000 | -.2020 (666) P= .000 | 1.0000 (3181) P= .
Significance?
Webstats will tell you whether or not the correlation coefficient is significant.
Remember that significance only tells you whether the relationship may be due to chance, not the strength of the relationship.
It is almost unheard of to have a strong relationship that is insignificant when using survey data.
So, don’t spend any time discussing significance.
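The usual test of whether r differs from zero converts it to a t statistic with n - 2 degrees of freedom: $t = r\sqrt{(n-2)/(1-r^2)}$. The sketch below applies it to the Q305 x Q1005 coefficient from the output above (r = -.1272, N = 2721) and, for contrast, to the same r with a hypothetical N of 100. The p-value uses a normal approximation, an assumption that is reasonable only for large N (an exact p-value needs the t distribution), which is enough to show that a weak coefficient can still be highly significant.

```python
from math import sqrt
from statistics import NormalDist

def t_for_r(r, n):
    """t statistic for testing r = 0 with n paired cases (df = n - 2)."""
    return r * sqrt((n - 2) / (1 - r * r))

def approx_two_sided_p(t):
    """Large-sample approximation: treat t as standard normal (reasonable only for large n)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Same weak coefficient, very different sample sizes (N = 100 is hypothetical)
for r, n in [(-0.1272, 2721), (-0.1272, 100)]:
    t = t_for_r(r, n)
    print(f"r = {r}, N = {n}: t = {t:.2f}, approx. p = {approx_two_sided_p(t):.4f}")
```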
What if non-interval/non-ratio?
It is usually more appropriate to use the other measures of association.
Webstats will still perform a correlation, but be ready for the results to be weaker.
The program may report (instead of Pearson’s r):
Spearman: ordinal x ordinal
Point-biserial: one interval/ratio, one dichotomous
Phi: two dichotomous variables
All are interpreted the same way.