This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2006, The Johns Hopkins University and Karl W. Broman. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Fathers’ and daughters’ heights
[Figure: histograms of fathers’ heights and daughters’ heights; horizontal axis: height (inches).]
Fathers’ heights: mean = 67.7, SD = 2.8
Daughters’ heights: mean = 63.8, SD = 2.7
Pearson and Lee (1906) Biometrika 2:357-462
1376 pairs
Fathers’ and daughters’ heights
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches); corr = 0.52.]
Covariance and correlation
Let X and Y be random variables with
µX = E(X), µY = E(Y), σX = SD(X), σY = SD(Y)
For example, sample a father/daughter pair and let
X = the father’s height and Y = the daughter’s height.
Covariance: $\mathrm{cov}(X, Y) = E\{(X - \mu_X)(Y - \mu_Y)\}$
Correlation: $\mathrm{cor}(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$
−1 ≤ cor(X, Y) ≤ 1
cov(X,Y) can be any real number.
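As a rough illustration of these two definitions, the sketch below evaluates them for a small discrete joint distribution. The distribution and the helper names (`expectation`, `cov_and_cor`) are made up for this example and have nothing to do with the height data. Rescaling Y changes the covariance but leaves the correlation alone.

```python
import math

def expectation(values, probs):
    """E(V) for a discrete variable with the given outcome probabilities."""
    return sum(v * p for v, p in zip(values, probs))

def cov_and_cor(outcomes, probs):
    """Return (cov(X, Y), cor(X, Y)) for a discrete joint distribution.

    outcomes: list of (x, y) pairs; probs: matching probabilities (sum to 1).
    """
    xs = [x for x, _ in outcomes]
    ys = [y for _, y in outcomes]
    mu_x = expectation(xs, probs)
    mu_y = expectation(ys, probs)
    cov = expectation([(x - mu_x) * (y - mu_y) for x, y in outcomes], probs)
    sd_x = math.sqrt(expectation([(x - mu_x) ** 2 for x in xs], probs))
    sd_y = math.sqrt(expectation([(y - mu_y) ** 2 for y in ys], probs))
    return cov, cov / (sd_x * sd_y)

# Hypothetical toy distribution: X and Y tend to agree.
outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [0.4, 0.1, 0.1, 0.4]
cov, cor = cov_and_cor(outcomes, probs)

# Same distribution with Y measured in units ten times smaller.
scaled = [(x, 10 * y) for x, y in outcomes]
cov_scaled, cor_scaled = cov_and_cor(scaled, probs)

print(cov, cor)
print(cov_scaled, cor_scaled)
```

The covariance depends on the units of X and Y, which is why it can be any real number; dividing by σX σY removes the units and confines the result to [−1, 1].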
Examples
[Figure: nine scatterplot panels (vertical axis: Y) with corr = −0.9, −0.5, −0.1, 0, 0.1, 0.3, 0.5, 0.7, and 0.9.]
Estimated correlation
Consider n pairs of data:
(x1, y1), (x2, y2), (x3, y3), . . . , (xn, yn)
We consider these as independent draws from some
bivariate distribution.
We estimate the correlation in the underlying distribution by:
$r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$
This is sometimes called the correlation coefficient.
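The estimate r can be computed directly from the sums in its definition. The following is a minimal sketch; the function name `sample_correlation` and the five height pairs are invented for illustration and are not taken from the Pearson and Lee data.

```python
import math

def sample_correlation(xs, ys):
    """Correlation coefficient r for paired data (x1, y1), ..., (xn, yn)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up heights in inches, for illustration only.
fathers = [65.0, 67.0, 68.0, 70.0, 72.0]
daughters = [62.0, 64.0, 63.0, 65.0, 66.0]
r = sample_correlation(fathers, daughters)
print(round(r, 3))
```

Any factor of 1/n or 1/(n − 1) would appear in both the numerator and the denominator and cancel, so it is omitted here.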
Correlation measures linear association
All three plots have correlation ≈ 0.7!
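A different, deliberately extreme illustration of the same point (not one of the three plots referred to above): in the sketch below Y is completely determined by X in both data sets, yet only the linear relationship gets a correlation of 1. The data and the helper `sample_correlation` are assumptions made for this example.

```python
import math

def sample_correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys_line = [2.0 * x + 1.0 for x in xs]   # exact linear relationship
ys_curve = [x ** 2 for x in xs]         # exact but symmetric, nonlinear relationship

r_line = sample_correlation(xs, ys_line)
r_curve = sample_correlation(xs, ys_curve)
print(r_line, r_curve)
```

A correlation near zero therefore does not rule out a strong relationship; it only says that a straight line describes the relationship poorly.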
Fathers’ and daughters’ heights
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches); corr = 0.52.]
Linear regression
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
Regression line
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
Slope = r × SD(Y) / SD(X)
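To make the slope formula concrete, here is a small sketch that plugs in the summary statistics quoted earlier for the height data (means 67.7 and 63.8, SDs 2.8 and 2.7, corr = 0.52). The intercept is chosen so that the line passes through the point of means, which is the usual least-squares property; the function name `predict_daughter` is hypothetical.

```python
# Summary statistics quoted in these slides for the Pearson and Lee data.
MEAN_X, SD_X = 67.7, 2.8   # fathers' heights (inches)
MEAN_Y, SD_Y = 63.8, 2.7   # daughters' heights (inches)
R = 0.52                   # correlation

# Slope = r * SD(Y) / SD(X); intercept assumes the line passes through (mean X, mean Y).
slope = R * SD_Y / SD_X
intercept = MEAN_Y - slope * MEAN_X

def predict_daughter(father_height):
    """Predicted daughter's height (inches) from the regression line."""
    return intercept + slope * father_height

print(round(slope, 3), round(intercept, 2))
print(round(predict_daughter(72.0), 2))
```

Because r is well below 1, a father who is some number of SDs above the fathers' mean leads to a predicted daughter's height that is only about half as many SDs above the daughters' mean.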
SD line
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
Slope = SD(Y) / SD(X)
SD line vs regression line
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
Both lines go through the point (X̄, Ȳ).
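The two lines can be compared numerically with the summary statistics quoted earlier for the height data. This is only a sketch; the function names `sd_line` and `regression_line` are introduced here for illustration.

```python
# Summary statistics quoted in these slides for the Pearson and Lee data.
MEAN_X, SD_X = 67.7, 2.8   # fathers' heights (inches)
MEAN_Y, SD_Y = 63.8, 2.7   # daughters' heights (inches)
R = 0.52                   # correlation

def sd_line(x):
    """SD line: slope SD(Y) / SD(X), through the point of means."""
    return MEAN_Y + (SD_Y / SD_X) * (x - MEAN_X)

def regression_line(x):
    """Regression line: slope r * SD(Y) / SD(X), through the point of means."""
    return MEAN_Y + (R * SD_Y / SD_X) * (x - MEAN_X)

# Evaluate both lines 1 SD below, at, and 1 SD above the fathers' mean.
for k in (-1, 0, 1):
    x = MEAN_X + k * SD_X
    print(k, round(sd_line(x), 2), round(regression_line(x), 2))
```

The lines agree at the point of means and separate as the father's height moves away from it, with the regression line the flatter of the two whenever |r| < 1.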
Predicting father’s ht from daughter’s ht
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
There are two regression lines!
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
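A short sketch of why the two lines differ, again using the summary statistics quoted earlier. It assumes that the line for predicting the father's height is obtained by swapping the roles of X and Y in the slope formula, giving slope r × SD(X) / SD(Y); the function names are hypothetical.

```python
# Summary statistics quoted in these slides for the Pearson and Lee data.
MEAN_X, SD_X = 67.7, 2.8   # fathers' heights (inches)
MEAN_Y, SD_Y = 63.8, 2.7   # daughters' heights (inches)
R = 0.52                   # correlation

def daughter_from_father(x):
    """Regression of Y on X: slope r * SD(Y) / SD(X)."""
    return MEAN_Y + (R * SD_Y / SD_X) * (x - MEAN_X)

def father_from_daughter(y):
    """Regression of X on Y: slope r * SD(X) / SD(Y) (roles swapped)."""
    return MEAN_X + (R * SD_X / SD_Y) * (y - MEAN_Y)

# Slopes of both lines when drawn with father's height on the horizontal axis.
slope_y_on_x = R * SD_Y / SD_X
slope_x_on_y_in_plot = SD_Y / (R * SD_X)

# Predict a daughter from a 72-inch father, then predict a father back from her.
predicted_daughter = daughter_from_father(72.0)
round_trip_father = father_from_daughter(predicted_daughter)
print(round(slope_y_on_x, 3), round(slope_x_on_y_in_plot, 3))
print(round(predicted_daughter, 2), round(round_trip_father, 2))
```

The round trip does not return 72 inches: each prediction shrinks the deviation from the mean by a factor of r, so the two lines coincide only when r = ±1.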