Correlational Problems and Fallacies James H. Steiger

Introduction
In this module, we discuss some common
problems and fallacies regarding correlation
coefficients and their interpretation:

Interpreting a Correlation
Correlation and Causality
Perfect Correlation and Equivalence
Zero Correlation vs. No Relation
Combining Populations, and Ignoring Explanatory Variables
Restriction of Range

Interpreting a Correlation
If scores are on roughly similar scales, the
shape of the scatterplot can reveal a
substantial amount about the correlation.
[Figure: Scatterplot (Cigarettes vs. Cardiac Reserve). x-axis: Cigarettes Smoked; y-axis: Cardiac Reserve.]
[Figure: Scatterplot (Shoe Size vs. IQ). x-axis: Shoe Size; y-axis: IQ; r = .01.]
[Figure: Scatterplot (GPA vs. IQ). x-axis: GPA; y-axis: IQ; r = .72.]
Anscombe’s Quartet
X1      Y1   X2      Y2   X3      Y3   X4      Y4
10    8.04   10    9.14   10    7.46    8    6.58
 8    6.95    8    8.14    8    6.77    8    5.76
13    7.58   13    8.74   13   12.74    8    7.71
 9    8.81    9    8.77    9    7.11    8    8.84
11    8.33   11    9.26   11    7.81    8    8.47
14    9.96   14    8.10   14    8.84    8    7.04
 6    7.24    6    6.13    6    6.08    8    5.25
 4    4.26    4    3.10    4    5.39   19   12.50
12   10.84   12    9.13   12    8.15    8    5.56
 7    4.82    7    7.26    7    6.42    8    7.91
 5    5.68    5    4.74    5    5.73    8    6.89
Each of the above 4 data sets has the following
summary statistics:
$M_Y = 7.5$, $M_X = 9$, $s_X^2 = 11$, $s_Y^2 = 4.13$, $r_{XY} = .82$
Each has a best fitting linear regression line of
$\hat{Y} = .5X + 3$
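The shared summary statistics can be checked directly from the four data sets listed above. The following is a minimal sketch in plain Python; the helper names (`mean`, `variance`, `pearson_r`, `summarize`) are illustrative and not part of the original slides, and the variances use the sample (n - 1) denominator.

```python
import math

# Anscombe's quartet, as listed in the data table.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def mean(v):
    return sum(v) / len(v)

def variance(v):
    # Sample variance (n - 1 in the denominator).
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def summarize(x, y):
    # Least-squares line: slope = r * s_Y / s_X, intercept = M_Y - slope * M_X.
    r = pearson_r(x, y)
    slope = r * math.sqrt(variance(y) / variance(x))
    return {
        "mean_x": mean(x), "mean_y": mean(y),
        "var_x": variance(x), "var_y": variance(y),
        "r": r, "slope": slope, "intercept": mean(y) - slope * mean(x),
    }

for name, (x, y) in quartet.items():
    s = summarize(x, y)
    print(name, {k: round(v, 2) for k, v in s.items()})
```

All four sets print nearly identical values, which is the point of the quartet: the summary statistics alone cannot distinguish data sets whose scatterplots look completely different.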
[Figure: four scatterplots of Anscombe’s Quartet (Anscombe.STA 8v*11c): Y1 vs. X1, Y2 vs. X2, Y3 vs. X3, and Y4 vs. X4, each with the fitted line y = 3 + 0.5*x + eps.]
Correlation and Causality
Correlation is not causality. This is a
standard adage in textbooks on statistics and
experimental design, but it is still forgotten
on occasion.
Example: The number of fire trucks sent to a fire
correlates positively with the dollar damage done by
the fire, yet the trucks do not cause the damage.
Larger fires both draw more trucks and do more damage.
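The fire truck example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variable `fire_size`, the coefficients, and the noise levels are hypothetical and chosen for illustration. It assumes that fire size drives both the number of trucks and the damage, and that trucks have no effect on damage at all.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    # Residuals from the least-squares regression of y on x.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

n = 2000
# Hypothetical model: fire size is the common cause of both variables.
fire_size = [random.uniform(0, 1) for _ in range(n)]
trucks = [10 * s + random.gauss(0, 1) for s in fire_size]
damage = [100 * s + random.gauss(0, 10) for s in fire_size]

r_raw = pearson_r(trucks, damage)
# Correlation between trucks and damage once fire size is held constant.
r_partial = pearson_r(residuals(trucks, fire_size), residuals(damage, fire_size))

print(f"r(trucks, damage)             = {r_raw:.2f}")
print(f"r(trucks, damage | fire size) = {r_partial:.2f}")
```

In this setup the raw correlation is high even though trucks have no causal effect on damage, and it largely disappears once fire size is partialled out.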
Perfect Correlation and
Equivalence
Two variables may correlate highly (or even
perfectly), without measuring the same
construct.
Example: Height and weight on the planet
Zorg.
Height and Weight on the Planet
Zorg
Zero Correlation vs. No Relation
The Pearson correlation coefficient is a
measure of linear relation. Many strong
relationships are nonlinear. Always examine
the scatterplot!
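A small constructed example makes the point concrete. In the sketch below (the data are invented for illustration), Y is completely determined by X through Y = X^2, yet the Pearson correlation is zero because the relation has no linear component over a range symmetric about zero.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Y is a perfect (nonlinear) function of X.
x = list(range(-5, 6))
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```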
Combining Populations
If two groups with different means and/or
covariances are combined, the resulting
mixture can exhibit spurious correlations.
Example. (C.P.) Suppose the correlation
between strength and mathematics
performance is zero for 6th grade boys, and
zero for 8th grade boys. Does this mean it
will be zero in a combined group of 6th and
8th graders?
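One way to explore the question is with a toy data set. The scores below are invented for illustration and are not from the original slides. Within each grade, strength and mathematics performance are constructed to be uncorrelated, but the 8th graders are assumed to score higher on both.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: (strength, math) for each boy.
sixth_strength, sixth_math = [40, 40, 60, 60], [50, 70, 50, 70]
eighth_strength, eighth_math = [80, 80, 100, 100], [80, 100, 80, 100]

r_sixth = pearson_r(sixth_strength, sixth_math)
r_eighth = pearson_r(eighth_strength, eighth_math)
r_combined = pearson_r(sixth_strength + eighth_strength,
                       sixth_math + eighth_math)

print(f"6th grade: r = {r_sixth:.2f}")
print(f"8th grade: r = {r_eighth:.2f}")
print(f"combined:  r = {r_combined:.2f}")
```

In this toy example the combined correlation is clearly positive even though it is zero within each grade, because grade level acts as an ignored explanatory variable that shifts both means.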
Restriction of Range
Often, when linear regression is used to
predict performance, the population is
restricted. For example, the GRE is used to
predict performance in graduate school, but
people with low GRE scores are often
refused admission to graduate school.
Consequently, the “available data” are a
truncated version of the full data set.
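The effect of truncation on the correlation can be sketched with simulated data. The setup is an assumption made for illustration, not the data behind the slides: standardized scores with a population correlation of .73, with only cases more than one standard deviation above the predictor mean treated as "admitted".

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rho = 0.73      # assumed population correlation
n = 10000
cutoff = 1.0    # hypothetical admission cutoff on the predictor (z-score)

# Predictor x and criterion y with correlation rho in the full population.
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]

# Keep only the cases above the cutoff, as if the rest were never observed.
selected = [(xi, yi) for xi, yi in zip(x, y) if xi > cutoff]

r_full = pearson_r(x, y)
r_restricted = pearson_r([p[0] for p in selected], [p[1] for p in selected])

print(f"full sample:       N = {n}, r = {r_full:.2f}")
print(f"restricted sample: N = {len(selected)}, r = {r_restricted:.2f}")
```

The correlation in the restricted sample comes out well below the full-sample value, even though the underlying relation between the two variables is unchanged.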
[Figure: Scatterplot (Restriction of Range.STA 10v*1000c), full data set. x-axis: VAR1; y-axis: VAR2; N = 1000, r = .73; fitted line y = 33.905 + 0.514*x + eps.]
[Figure: Scatterplot (Restriction of Range.STA 10v*1000c), restricted data set. x-axis: VAR1; y-axis: VAR2; N = 153, r = .40; fitted line y = 35.09 + 0.497*x + eps.]
The “Third Variable Fallacy”
Often people assume, sometimes almost
subconsciously, that when two variables correlate
highly with a third variable, they correlate highly
with each other.
Actually, if $r_{XW}$ and $r_{YW}$ are both .7071, $r_{XY}$ can
vary anywhere from 0 to 1.
Only when $r_{XW}$ and/or $r_{YW}$ become very high does
the correlation between X and Y become highly
restricted.
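The range quoted above can be reproduced with a standard result that the slides do not state explicitly: for the three correlations to form a valid (positive semidefinite) correlation matrix, the correlation between X and Y must lie within the product of the other two correlations plus or minus sqrt((1 - rXW^2)(1 - rYW^2)). The function name `rxy_bounds` is illustrative.

```python
import math

def rxy_bounds(r_xw, r_yw):
    # Admissible range of r_XY given r_XW and r_YW, from the requirement
    # that the 3x3 correlation matrix be positive semidefinite.
    center = r_xw * r_yw
    half_width = math.sqrt((1 - r_xw ** 2) * (1 - r_yw ** 2))
    return center - half_width, center + half_width

for r in (0.7071, 0.9, 0.99):
    low, high = rxy_bounds(r, r)
    print(f"r_XW = r_YW = {r}: r_XY between {low:.2f} and {high:.2f}")
```

With both correlations at .7071 the bounds are essentially 0 and 1, matching the statement above; the interval only becomes narrow as the two correlations approach 1.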