Measures of Association

The association between two continuous variables can be
illustrated in a scatterplot.
[Figure: scatterplot titled "serum cotinine and attachment levels"; x-axis: serum cotinine (ng/mL); y-axis: mean attachment level (mm).]
This scatterplot depicts serum cotinine levels (a metabolite
of nicotine) and mean attachment levels of 30 current
smokers. Each point indicates the two values for a single
person.
There appears to be a slight positive correlation; that is,
higher cotinine levels seem to be associated with higher
attachment levels.
An “eyeball” assessment such as this can be quite
subjective.
Pearson Correlation Coefficient
An objective, numerical measure of correlation between
two characteristics measured on the same subjects is the
Pearson Correlation Coefficient.
For n subjects the data for the two variables can be
arranged in pairs.
subject    Variable 1 (X)    Variable 2 (Y)
1          x1                y1
2          x2                y2
.          .                 .
.          .                 .
.          .                 .
n          xn                yn
The sample correlation coefficient, r, is computed by
$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}, $$
which is an estimate of the population correlation
coefficient,


$$ \rho = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y}. $$
The correlation is always between –1 and 1 and is a
measure of the linear association between X and Y.
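The formula for r can be sketched directly in code. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the five data points are hypothetical and are not taken from the cotinine study.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed term by term from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    # (If either variable is constant this is zero and r is undefined.)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (assumption, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
print(round(r, 2))  # 0.8
```

Here the cross-product sum is 8 and both sums of squares are 10, so r = 8 / 10 = 0.8.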
Scatterplots illustrating a range of correlation coefficients
[Figure: six scatterplot panels with r = 0.78, r = -0.7, r = 0.4, r = -0.4, r = 0.18, and r = 0.01.]
Interpretation of the correlation coefficient
1. Correlation is unitless; that is, it is not affected by
changes of location or scale.
2. r and ρ are always between –1 and 1.
3. r > 0 says Y ↑ when X ↑.
r < 0 says Y ↓ when X ↑.
(same thing for ρ)
4. X and Y independent → r ≈ 0 , ρ = 0
5. r ≈ 0 , ρ = 0 → no linear relationship between X
and Y.
6. If r = 1 or –1 then the points (x,y) lie on a straight
line (perfect linear relationship).
7. “Strength” of linear association is indicated by the
magnitude of r. Values closer to ±1 indicate a stronger
linear association.
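A few of these properties can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `pearson_r` and the toy data are hypothetical, and the rescaling uses positive scale factors (a negative factor would flip the sign of r).

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the definition (hypothetical helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (assumption, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_original = pearson_r(x, y)

# Property 1: a change of location and (positive) scale, such as a change
# of units, leaves r unchanged.
x_rescaled = [10.0 * v + 3.0 for v in x]
y_rescaled = [0.5 * v - 7.0 for v in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

# Property 3: reversing the direction of Y flips the sign of r.
r_flipped = pearson_r(x, [-v for v in y])

# Property 6: points lying exactly on a rising straight line give r = 1.
r_line = pearson_r(x, [2.0 * v + 1.0 for v in x])

print(round(r_original, 3), round(r_rescaled, 3), round(r_flipped, 3), round(r_line, 3))
```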
Two Extreme Examples
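[Figure: two scatterplots of y against x, one with r = -1 and one with r = 1.]
r = 1 or –1 means a perfect linear relationship.
[Figure: scatterplot of y against x with r = 0.]
Points with a perfect association but r = 0, because the association
is not a linear association.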
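One way to reproduce the r = 0 case is a parabola on x values that are symmetric about zero. This is an assumed example (the slide does not say which curve its figure shows), and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the definition (hypothetical helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is completely determined by x, but the relationship is not linear.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
# The positive and negative cross-products cancel exactly, so r is zero.
print(abs(r) < 1e-12)  # True
```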
Inference for correlation coefficients
If ρ = 0 (and X or Y is Normal), then the statistic
$$ t = \sqrt{n-2}\, \frac{r}{\sqrt{1 - r^2}} $$
has an approximate t distribution with n − 2 degrees of
freedom (t_{n-2}). Thus we can use it to test
H0: ρ = 0 vs H1: ρ ≠ 0.
Example: Attachment level and serum cotinine
The sample correlation coefficient between the serum
cotinine levels and mean attachment levels for 30
current smokers is r = 0.498.
To test whether there is good evidence that the true
correlation, ρ, is different from zero, we compute the
statistic
$$ t = \sqrt{30-2}\, \frac{0.498}{\sqrt{1 - 0.498^2}} = 3.04, $$
which is greater than t_{28, .975} = 2.048, so we reject H0 at
the α = .05 level.
The p-value for the test is P(|t_{28}| > 3.04) = 0.005
(from Excel).
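The arithmetic in this example can be reproduced with a short script. The slide obtained the p-value from Excel; the sketch below instead integrates the t density numerically with Simpson's rule so that it needs only the standard library. That substitution, and the function names `correlation_t_stat`, `t_pdf`, and `two_sided_p`, are assumptions made for illustration.

```python
import math

def correlation_t_stat(r, n):
    """t = sqrt(n - 2) * r / sqrt(1 - r^2), the test statistic for H0: rho = 0."""
    return math.sqrt(n - 2) * r / math.sqrt(1.0 - r * r)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_half = total * h / 3.0  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_half

# Values from the example: r = 0.498 for n = 30 current smokers.
t_stat = correlation_t_stat(0.498, 30)
p_value = two_sided_p(t_stat, df=28)

print(round(t_stat, 2))   # 3.04
print(t_stat > 2.048)     # True: exceeds the critical value quoted above
print(round(p_value, 3))  # 0.005
```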
Spearman Rank Correlation
One problem with the Pearson correlation coefficient
is that, like the sample mean and standard deviation,
it can be unduly influenced by outliers (extreme
values).
The Spearman rank correlation coefficient, rs , is an
alternative measure of association that is more robust
(less likely to be influenced by a small number of
outliers).
It is calculated simply by using the Pearson
correlation coefficient formula, but applying it to
the ranks of X and Y instead.
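A minimal sketch of this recipe follows. The helper names and the toy data are hypothetical. Giving tied values the average of their ranks is a common convention assumed here; it is not something the slides specify.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the definition (hypothetical helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; ties share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rs(xs, ys):
    """Spearman rank correlation: the Pearson formula applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical toy data: y rises with x, but the last y value is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]

r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(round(r, 3), round(rs, 3))  # 0.743 1.0
```

The outlier pulls the Pearson r well below 1, while the ranks (and so rs) are unaffected by how extreme the last value is.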
Example: Clinical trial studying leptin and proinflammatory cytokines, before and after hypo-caloric diet
[Figure: scatterplot of change in leptin (y-axis) against change in TNF receptor 55 (x-axis).]
TNFα      leptin     rank(TNFα)   rank(leptin)
59.1      89.80      15           14
115.9     -49.25     16           5
-67.8     188.55     4            18
-67.7     -23.10     5            7
660.6     -519.65    19           1
148.3     102.90     17           15
50.2      -29.55     14           6
154.8     -83.20     18           3
-93.7     -60.00     2            4
22.4      -12.65     11           11
-19.3     255.40     7            19
-5.8      27.90      8            13
9.4       -15.60     10           10
36.3      186.19     13           17
23.6      -23.10     12           7
-91.6     -111.95    3            2
-121.5    -8.52      1            12
0.6       165.75     9            16
-54.3     -18.50     6            9
r = -0.67
rs = -0.16
If n > 10, we can test the
hypothesis
H0: ρs = 0 vs H1: ρs ≠ 0,
using the same procedure:
$$ t_s = \sqrt{19-2}\, \frac{-0.16}{\sqrt{1 - (-0.16)^2}} = -0.67, $$
so the p-value is
P(|t_{17}| > |-0.67|) = 0.51.
Compare this to the corresponding values
for the Pearson r:
t = -3.72, p-value < 0.002
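The rank-based calculation can be retraced from the rank columns in the table. In this sketch the ranks are copied as printed, including the rank 7 that appears twice for the tied leptin values, and the helper names are hypothetical. The slide plugs the rounded value rs = -0.16 into the t formula, so the code does the same; using the unrounded rs gives a slightly larger statistic in absolute value (about -0.69).

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the definition (hypothetical helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_stat(r, n):
    """t = sqrt(n - 2) * r / sqrt(1 - r^2)."""
    return math.sqrt(n - 2) * r / math.sqrt(1.0 - r * r)

# Rank columns copied from the table, in row order.
rank_tnf = [15, 16, 4, 5, 19, 17, 14, 18, 2, 11, 7, 8, 10, 13, 12, 3, 1, 9, 6]
rank_leptin = [14, 5, 18, 7, 1, 15, 6, 3, 4, 11, 19, 13, 10, 17, 7, 2, 12, 16, 9]

n = len(rank_tnf)  # 19 subjects
rs = pearson_r(rank_tnf, rank_leptin)

# Follow the slide: use rs rounded to two decimals in the t formula.
t_spearman = correlation_t_stat(round(rs, 2), n)
# Pearson comparison, using the r = -0.67 reported in the table.
t_pearson = correlation_t_stat(-0.67, n)

print(round(rs, 2))          # -0.16
print(round(t_spearman, 2))  # -0.67
print(round(t_pearson, 2))   # -3.72
```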