Correlation

S519: Evaluation of
Information Systems
Social Statistics
Ch5: Correlation
This week



What is correlation?
How to compute?
How to interpret?
Correlation Coefficients

The relations between two variables


How the value of one variable changes when the
value of another variable changes
A correlation coefficient is a numerical index
to reflect the relationship between two
variables.


Range: -1 ~ +1
Bivariate correlation (for two variables)
Correlation Coefficients

Parametric


Pearson product-moment correlation (named for
inventor Karl Pearson)
Non-parametric


Spearman’s rank correlation
Kendall tau rank correlation coefficient
Pearson correlation coefficient

For two variables which are continuous in
nature


Height, age, test score, income
But not for discrete or categorical variables

Race, political affiliation, social class, rank
rxy is the correlation between variable
X and variable Y
Types of correlation
coefficients

Direct correlation (positive correlation):


If both variables change in the same direction
Indirect correlation (negative correlation):

If the two variables change in opposite directions

See table 5.1 (S-p112)

-0.70 and +0.5, which is stronger?
Pearson product-moment
correlation coefficient
$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

rxy: the correlation coefficient between X and Y
n: the size of the sample
X: the individual's score on the X variable
Y: the individual's score on the Y variable
XY: the product of each X score times its corresponding Y score
X2: the individual X score, squared
Y2: the individual Y score, squared
Exercise

Calculate the Pearson correlation coefficient

X    Y
2    3
4    2
5    6
6    5
4    3
7    6
8    5
5    4
6    4
7    5

1. Are variable X and variable Y correlated?
2. What does this correlation mean?
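
The exercise can be checked with a short script. The sketch below applies the raw-score formula from the previous slide to the ten X and Y pairs; the function name `pearson_r` is chosen here for illustration and does not come from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation via the raw-score formula on the slide."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# The ten score pairs from the exercise
X = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
Y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]

r = pearson_r(X, Y)
print(round(r, 3))  # 0.692
```

The intermediate sums are ΣX = 54, ΣY = 43, ΣXY = 247, ΣX² = 320 and ΣY² = 201, so r = 148 / √(284 × 161) ≈ 0.69: X and Y are directly (positively) correlated.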
Using Excel to calculate


CORREL function
or the PEARSON function
Visualizing a correlation

Scatterplot or scattergram
[Figure: scatterplot of the exercise data, with X on the horizontal axis and Y on the vertical axis]
Direct (positive) correlation
[Figure: scatterplot of a direct (positive) correlation]
r = 1: a perfect direct (or positive) correlation
In real-life data, 0.7 or 0.8 may be the highest correlation you will see
Indirect (or negative) correlation
[Figure: scatterplot of an indirect (negative) correlation]
Strength and direction are important
Excel Scatterplot
Four sets of data with the
same correlation of 0.816
Linear correlation

Linear correlation means that the X and Y values fall along
one straight line

Curvilinear correlation

Age and memory
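
A small sketch of why the distinction matters: the Pearson coefficient measures only the linear part of a relationship, so a strong curvilinear pattern can produce r close to zero. The data below are hypothetical (an inverted-U "memory score" over arbitrary age groups) and are not taken from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation via the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: the score rises, peaks in the middle group, then falls
age_group = list(range(11))                      # 0..10, arbitrary units
memory = [25 - (a - 5) ** 2 for a in age_group]  # inverted U

r = pearson_r(age_group, memory)
print(r)  # 0.0
```

The two variables are perfectly related by a curve, yet r is 0, which is one reason to look at the scatterplot before interpreting the coefficient.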
More than 2 variables?
How to calculate the correlation coefficient?
1. CORREL()
2. Correlation in the Data Analysis toolset

Income   Education   Attitude   Vote
74190    13          1          1
80931    12          3          2
81314    11          4          2
73089    11          5          2
62023    11          3          2
61217    10          4          2
84526    11          5          1
87251    11          4          1
62659    12          5          2
76450    10          6          2
70512    12          7          2
78858    9           6          1
78628    13          7          1
86212    14          8          2
74962    9           8          2
58828    11          9          4
61471    10          8          5
78621    12          7          5
60071    9           8          4
More than 2 variables?

Correlation matrix
            Income   Education   Attitude   Vote
Income      1.00     0.35        -0.19      0.51
Education            1.00        -0.21      0.43
Attitude                         1.00       0.55
Vote                                        1.00
Excel

Data Analysis tool - correlation
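
Outside Excel, a correlation matrix is simply the pairwise coefficient computed for every pair of columns. The sketch below assumes the four columns of 19 values in the data slide are, in order, income, education, attitude and vote; the function names are illustrative. Under that reading, income and education come out near 0.35 and attitude and vote near 0.55, but the remaining entries should be checked against the matrix above rather than assumed to match it.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation via the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def correlation_matrix(columns):
    """Return {name: {other: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Rows are (income, education, attitude, vote); column order is an assumption
rows = [
    (74190, 13, 1, 1), (80931, 12, 3, 2), (81314, 11, 4, 2),
    (73089, 11, 5, 2), (62023, 11, 3, 2), (61217, 10, 4, 2),
    (84526, 11, 5, 1), (87251, 11, 4, 1), (62659, 12, 5, 2),
    (76450, 10, 6, 2), (70512, 12, 7, 2), (78858, 9, 6, 1),
    (78628, 13, 7, 1), (86212, 14, 8, 2), (74962, 9, 8, 2),
    (58828, 11, 9, 4), (61471, 10, 8, 5), (78621, 12, 7, 5),
    (60071, 9, 8, 4),
]
names = ["Income", "Education", "Attitude", "Vote"]
data = {name: [row[i] for row in rows] for i, name in enumerate(names)}

matrix = correlation_matrix(data)
for name in names:
    cells = " ".join(f"{matrix[name][other]:6.2f}" for other in names)
    print(name.ljust(10), cells)
```

The matrix is symmetric with 1.00 on the diagonal, which is why the slide shows only the upper triangle.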
Meaning of Correlation
coefficient

Correlation value:


- finite number ~ + finite number
Correlation coefficient value:

-1.00 ~ +1.00
rxy value    Interpretation
0.8 ~ 1.0    Very strong relationship (share most things in common)
0.6 ~ 0.8    Strong relationship (share many things in common)
0.4 ~ 0.6    Moderate relationship (share something in common)
0.2 ~ 0.4    Weak relationship (share a little in common)
0.0 ~ 0.2    Weak or no relationship (share very little or nothing in common)
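
The rule of thumb above can be written as a small lookup. Strength depends on the absolute value of the coefficient, so the sign is dropped first. The function name `interpret_r` is illustrative, and placing a value that falls exactly on a boundary (such as 0.6) in the higher band is an assumption, since the table does not say which band owns the boundary.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal labels in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    size = abs(r)  # strength ignores direction
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


print(interpret_r(-0.70))  # Strong relationship
print(interpret_r(0.50))   # Moderate relationship
```

This also answers the earlier question: -0.70 is stronger than +0.5, because only the size of the coefficient measures strength.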
Coefficient of determination

Coefficient of determination:
The percentage of variance in one variable that is
accounted for by the variance in the other
variable.
= the square of the correlation coefficient


$r_{GPA \cdot Time} = 0.70$
$r^2_{GPA \cdot Time} = 0.49$
49% of the variance in GPA can be
explained by the variance in
studying time
Coefficient of nondetermination

The amount of unexplained variance, 1 - r², is called
the coefficient of nondetermination (coefficient
of alienation)
correlation   determination   interpretation
0             0
0.5           0.25
0.9           0.81
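
As a quick check of the arithmetic, the sketch below squares each correlation to get the share of explained variance and subtracts from 1 to get the unexplained share. The function names are illustrative only.

```python
def coefficient_of_determination(r):
    """Share of variance in one variable accounted for by the other."""
    return r ** 2


def coefficient_of_nondetermination(r):
    """Share of variance left unexplained."""
    return 1 - r ** 2


# Values from the slides: the table rows plus the GPA and study-time example
for r in (0.0, 0.5, 0.9, 0.70):
    explained = coefficient_of_determination(r)
    unexplained = coefficient_of_nondetermination(r)
    print(f"r = {r:.2f}  explained = {explained:.2f}  unexplained = {unexplained:.2f}")
```

For r = 0.70 this gives 0.49 explained and 0.51 unexplained, so about half of the variance in GPA is not accounted for by studying time.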
Ice cream and crime


In a small town in Greece,
the local police found a direct correlation
between ice cream consumption and crime
Correlation vs. causality


A correlation represents the association
between two or more variables
By itself it says nothing about causality (two
correlated variables need not have a
cause-and-effect relation)


Ice cream and crime are correlated, but
ice cream does not cause crime