Simpson's Paradox and the Concept of "Lurkers"

THE SCIENTIFIC STUDY OF
POLITICS (POL 51)
Professor B. Jones
University of California, Davis
TODAY
• Pitfalls and Paradoxes
• The Concept of a “Lurker”

THE BASEBALL MANAGER’S DILEMMA
• Bottom of the ninth, down by 1 run
• Two outs
• Runners on second and third
• …and the pitcher is up
• You have only two players left
• …and this is the National League.
• What will you do?

THE CHOICES
• Player 1: 110 hits from 500 at bats.
• Player 2: 280 hits from 1200 at bats.
• Their “batting average” (hits divided by at bats):
  • Player 1: 110/500 = .220
  • Player 2: 280/1200 = .233
• Who would you choose?
• On batting average, Player 2 > Player 1

BUT WAIT!



Both players are switch-hitters (they can bat from
the left or right side of the plate)
We’ll go “money ball” and play the best match-up
The data:
Player     Side         At Bats   Hits   Average
Player 1   From Right   400       84     0.210
Player 1   From Left    100       26     0.260
Player 2   From Right   400       80     0.200
Player 2   From Left    800       200    0.250
HUH?
• What happened?
  • Not accounting for switch hitting, Player 2 is preferred to Player 1.
  • When accounting for switch hitting, Player 1 is preferred to Player 2.
  • Worse! From either side of the plate, we would conclude Player 1 is better than Player 2, even though Player 2’s overall batting average is higher!
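The reversal can be checked directly from the switch-hitting data. The sketch below recomputes each player's average by side of the plate and overall; the dictionary layout and the names `records`, `by_side`, and `overall` are illustrative choices, not part of the lecture.

```python
# (at bats, hits) by player and side of the plate, taken from the data above.
records = {
    "Player 1": {"right": (400, 84), "left": (100, 26)},
    "Player 2": {"right": (400, 80), "left": (800, 200)},
}

# Batting average within each side of the plate.
by_side = {
    player: {side: hits / at_bats for side, (at_bats, hits) in sides.items()}
    for player, sides in records.items()
}

# Overall batting average: pool hits and at bats across both sides.
overall = {
    player: sum(h for _, h in sides.values()) / sum(ab for ab, _ in sides.values())
    for player, sides in records.items()
}

for player in records:
    sides = ", ".join(f"{side} {avg:.3f}" for side, avg in by_side[player].items())
    print(f"{player}: {sides}, overall {overall[player]:.3f}")
```

Player 2 takes two thirds of his at bats from the left, the side on which both players hit better, which is what lifts his pooled average above Player 1's.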

COLLEGE ADMISSIONS

University Admission Statistics


• 1000 women apply, 1000 men apply
• Admission rate:
  • Women: 510/1000 = 51 percent
  • Men: 800/1000 = 80 percent
• Conclusion?
• Evidence of gender bias?
• This was the basis of the U.C. Berkeley gender bias case in the 1970s
Source: http://walrandpc.eecs.berkeley.edu/126/simpson.htm
BUT WAIT


• Students apply to one of two colleges, College A or College B.
• The admissions data:
           Female                         Male
College    Applied   Accepted   Rate      Applied   Accepted   Rate
A          980       490        50%       200       80         40%
B          20        20         100%      800       720        90%
Total      1000      510        51%       1000      800        80%
Findings?


• Within each college, the admission rate is higher for women than for men.
• Overall, the admission rate is higher for men.
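The two findings can be stated as a single check: one group has the higher rate in every subgroup but the lower rate once the subgroups are pooled. Below is a minimal sketch of such a check applied to the College A / College B numbers; the helper names `rate` and `is_reversal` are hypothetical and introduced only for illustration.

```python
# (applied, accepted) by gender and college, from the admissions data above.
admissions = {
    "Female": {"A": (980, 490), "B": (20, 20)},
    "Male": {"A": (200, 80), "B": (800, 720)},
}

def rate(counts):
    """Pool a list of (applied, accepted) pairs into one acceptance rate."""
    applied = sum(a for a, _ in counts)
    accepted = sum(k for _, k in counts)
    return accepted / applied

def is_reversal(data, first, second):
    """True if `first` beats `second` in every subgroup but loses overall."""
    wins_each = all(
        rate([data[first][g]]) > rate([data[second][g]]) for g in data[first]
    )
    loses_overall = rate(list(data[first].values())) < rate(list(data[second].values()))
    return wins_each and loses_overall

print(is_reversal(admissions, "Female", "Male"))
```

The same function could be pointed at the batting data by treating each side of the plate as a subgroup.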
SIMPSON’S PARADOX
• The two preceding examples illustrate Simpson’s Paradox
• Named for E.H. Simpson (based on his 1951 paper)
• The phenomenon has been known since at least 1899 (and Yule published a paper on it in 1903).
• Why a paradox?
  • The result is counterintuitive.
SIMPSON’S PARADOX

• The Paradox: a “reversal result”
• The relationship between two variables found within sub-groups reverses direction when the sub-groups are combined
  • Batting averages on left/right side vs. overall
  • Gender admissions by college vs. overall gender admission rate

Consider admissions data again.
ADMISSIONS DATA AND 1973 BERKELEY CASE

• Our example
  • The “model”: Admission Rate = f(Gender)
  • Y = Admission Rate; X = Gender
  • Gender bias hypothesis: admission rates of women will be lower than those of men.
  • Data seem consistent with the hypothesis.
• The problem:
  • There is a third variable; what is it?
  • College to which students applied (A vs. B)
  • Z = College
PARADOX REVEALED


• The problem is simple
  • (A) There is a strong association between Y and Z
    • One college (B) is easier to “get into” than the other college (A)
  • (B) There is a strong association between X and Z
    • Women tend to apply to the harder college (A) at higher rates; men tend to apply to the easier college (B) at higher rates.
• Therefore, because of (A) and (B), there is a strong connection between Y and X
• This connection, however, is spurious.
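Points (A) and (B) can be made concrete by writing each gender's overall admission rate as a weighted average of the college-level rates, with weights equal to the share of that gender applying to each college. The probability notation below is introduced here for illustration; the numbers are those of the College A / College B example.

```latex
\begin{align*}
P(\text{admit} \mid X = x) &= \sum_{z \in \{A, B\}} P(\text{admit} \mid X = x, Z = z)\, P(Z = z \mid X = x) \\
\text{Women:}\quad & 0.50 \times 0.98 + 1.00 \times 0.02 = 0.51 \\
\text{Men:}\quad & 0.40 \times 0.20 + 0.90 \times 0.80 = 0.80
\end{align*}
```

Women have the higher rate in both colleges, but 98 percent of their weight falls on the harder college, while 80 percent of the men's weight falls on the easier one.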
A PICTURE

The Nature of the Problem
[Diagram with three nodes: Gender (X), College (Z), and Admission Rate (Y)]
PARADOX RESOLVED
• Beware the Lurker Variable
• Lurker Variable:
  • A lurking variable (confounding factor or variable, or simply a confound or confounder) is a "hidden" variable in a statistical or research model that affects the variables in question but is not known or acknowledged, and thus (potentially) distorts the resulting data. This hidden third variable causes the two measured variables to falsely appear to be in a causal relation. Such a relation between two observed variables is termed a spurious relationship. (Source: http://en.wikipedia.org/wiki/Confounder)
• The problem: Z is a confounder. If we had accounted for Z, we would have arrived at different conclusions.
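One common way to account for Z is direct standardization: apply each gender's college-specific admission rates to the same reference mix of applicants, so that differences in where people applied drop out. The sketch below uses the pooled applicant counts as the reference mix, which is an assumption made for illustration; it shows the general technique and is not necessarily the adjustment used in the Berkeley analysis.

```python
# (applied, accepted) by gender and college, from the College A / College B example.
admissions = {
    "Female": {"A": (980, 490), "B": (20, 20)},
    "Male": {"A": (200, 80), "B": (800, 720)},
}
colleges = ["A", "B"]

# Reference mix (assumed): all applicants to each college, both genders pooled.
pooled_applied = {c: sum(admissions[g][c][0] for g in admissions) for c in colleges}
total_applied = sum(pooled_applied.values())

def standardized_rate(gender):
    """Admission rate the gender would have if it applied in the pooled proportions."""
    expected_admits = sum(
        (admissions[gender][c][1] / admissions[gender][c][0]) * pooled_applied[c]
        for c in colleges
    )
    return expected_admits / total_applied

for gender in admissions:
    print(f"{gender}: {standardized_rate(gender):.3f}")
```

With the application pattern held fixed, the standardized rate for women (0.705) exceeds that for men (0.605), in line with the college-by-college comparison.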
IMPLICATIONS

• Berkeley, 1973
  • Gender bias not found when accounting for departmental admission rates
  • Interestingly, it was found that women tended to apply to more difficult graduate programs than men.
  • Across departments, graduate admission rates were higher for women.
  • Not accounting for departmental differences, gender bias appeared
IMPLICATIONS FOR RESEARCH DESIGN
• Combining sub-groups (aggregation) can lead to serious inferential problems
  • ESPECIALLY if the presence of lurking variables is not accounted for
• Large samples with lots of subgroups can lead to these kinds of problems
• Simpson’s Paradox is a real concern
  • …but often not recognized.
