Correlation
• Correlation is used to describe the linear relationship
between two continuous variables (e.g., height and weight).
In general, correlation tends to be used when there is no
identified response variable. It measures the strength
(qualitatively) and direction of the linear relationship
between two or more variables.
• The Pearson correlation coefficient measures the strength
of the linear association between two variables. It can be
used to estimate the population correlation, ρ.
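The slides do not write out how r is calculated. As a minimal sketch, the usual sample formula (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations) can be coded as below. The function name `pearson_r` and the height and weight values are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg); not data from the slides.
height = [150, 160, 165, 172, 180, 188]
weight = [52, 58, 63, 70, 77, 85]

r = pearson_r(height, weight)
print(round(r, 3))  # a value close to, but below, 1
```

The value of r computed from a sample in this way is the estimate of the population correlation ρ.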
Example
Correlating Two Lung Function Meters
(Wright vs. Mini-Wright)
Peak expiratory flow rate (PEFR) can be measured using
either the “Wright Peak Flow Meter” or the “Mini-Wright
Peak Flow Meter.” Do their measurements “correlate”?
[Figure: scatter plot of PEFR by Mini Meter (y-axis) against PEFR by Large Meter (x-axis), r = 0.943.]
Note that this is a very different question from whether their
measurements “agree.”
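To illustrate why correlation and agreement are different questions, the sketch below uses invented readings in which one meter always reads 60 units higher than the other. The readings, the offset, and the names `large` and `mini` are assumptions made for this example and are not the Wright data.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: the second meter is biased upward by a constant 60.
large = [250, 320, 400, 480, 560, 640]
mini = [value + 60 for value in large]

r = pearson_r(large, mini)
mean_diff = sum(m - l for m, l in zip(mini, large)) / len(large)

print(round(r, 3))   # perfect correlation (up to rounding)
print(mean_diff)     # yet the meters disagree by 60 on every subject
```

A constant bias leaves the correlation untouched, so a high r alone cannot show that two instruments give the same readings.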
Properties of Correlations
• A correlation can range in value between –1 and 1:
- r is a dimensionless quantity; that is, r is independent of
  the units of measurement of X and Y.
- If the correlation is greater than 0, then Y tends to increase
  as X increases, and the two variables are said to be positively
  correlated. An r = 1 is a perfect positive correlation.
- If the correlation is less than 0, then Y tends to decrease
  as X increases, and the two variables are said to be
  negatively correlated. An r = -1 is a perfect negative
  correlation.
- If the correlation is 0, then there is no linear relationship
  between X and Y. The two variables are said to be
  uncorrelated. The correlation is 0 when the covariance
  of X and Y is 0.
• The correlation coefficient is a measure of the strength of
the linear trend relative to the variability of the data around
that trend. Thus, it is dependent both on the magnitude of
the trend and the magnitude of the variability in the data.
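The properties listed above can be checked numerically. The sketch below is one possible illustration, using made-up data with a fixed alternating pattern in place of random noise so that the results are reproducible. All variable names and values are assumptions for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(1, 11))
# Deterministic stand-in for scatter around the trend: +1, -1, +1, ...
pattern = [1 if i % 2 == 0 else -1 for i in range(len(x))]

def line_with_scatter(slope, spread):
    return [slope * xi + spread * p for xi, p in zip(x, pattern)]

r_low_noise = pearson_r(x, line_with_scatter(2, 1))    # small scatter
r_high_noise = pearson_r(x, line_with_scatter(2, 5))   # same trend, more scatter
r_steep = pearson_r(x, line_with_scatter(6, 5))        # same scatter, steeper trend

# Changing the units of X (e.g. centimetres to inches) leaves r unchanged.
x_rescaled = [xi / 2.54 for xi in x]
r_rescaled = pearson_r(x_rescaled, line_with_scatter(2, 1))

# Reversing the direction of Y flips the sign of r.
r_flipped = pearson_r(x, [-yi for yi in line_with_scatter(2, 1)])

print(r_low_noise, r_high_noise, r_steep, r_rescaled, r_flipped)
```

With the trend held fixed, more scatter lowers r; with the scatter held fixed, a steeper trend raises it.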
[Figure: five scatter plots of Y against X illustrating r = 1.0, r = -1.0, r = 0.9, r = -0.9, and r ≈ 0.]
• Correlation is not a measure of the magnitude of the slope of
the regression line.
[Figure: two plots of Y against X in which the points lie exactly on a straight line, one line much steeper than the other.]
Both plots above have r = 1, but very different slopes.
Note: In linear regression we can (and do) estimate the
amount Y increases/decreases with a one-unit change in X.
However, the slope depends on the units of measurement of
X and Y, whereas the correlation is dimensionless.
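The distinction between slope and correlation can be sketched as follows. The helper `slope_and_r` computes the least-squares slope (the sum of cross-products divided by the sum of squares of X) alongside r. The two data sets are hypothetical and simply lie exactly on lines with different slopes.

```python
import math

def slope_and_r(x, y):
    """Least-squares slope of y on x, and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

x = [0, 2, 4, 6, 8, 10]
y_steep = [0.8 * xi + 1 for xi in x]     # hypothetical steep line
y_shallow = [0.1 * xi + 4 for xi in x]   # hypothetical shallow line

slope_steep, r_steep = slope_and_r(x, y_steep)
slope_shallow, r_shallow = slope_and_r(x, y_shallow)

# Re-expressing X in units 100 times smaller (e.g. metres to centimetres)
# divides the slope by 100 but leaves r unchanged.
x_cm = [100 * xi for xi in x]
slope_cm, r_cm = slope_and_r(x_cm, y_steep)

print(slope_steep, r_steep)
print(slope_shallow, r_shallow)
print(slope_cm, r_cm)
```

Both lines give r equal to 1 up to rounding, while the slopes differ by a factor of eight and change again when the units of X change.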
• Correlation is not a measure of the appropriateness of the
straight-line model.
[Figure: examples where r = 0: two scatter plots of Y against X.]
[Figure: examples when r is high: two scatter plots of Y against X, with r = -0.766 and r = -0.767.]
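As a rough numerical illustration of this point, the sketch below builds two invented data sets. In the first, Y is completely determined by X through a symmetric curve, yet r is 0. In the second, Y follows a curve rather than a line, yet r is large. The data are assumptions for this example and are not the data behind the plots above.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(9))  # 0, 1, ..., 8

# Perfect but non-linear dependence, symmetric about x = 4.
y_symmetric = [(xi - 4) ** 2 for xi in x]
r_symmetric = pearson_r(x, y_symmetric)

# A curved, steadily increasing relationship.
y_curved = [xi ** 2 for xi in x]
r_curved = pearson_r(x, y_curved)

print(round(r_symmetric, 6))  # essentially zero despite exact dependence
print(round(r_curved, 3))     # large, although a straight line is the wrong model
```

In neither case does r say whether a straight line is an appropriate description, which is why plotting the data remains necessary.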