Simpson’s Paradox
Jeff Witmer
18 March 2010
G. Udny Yule
(Not Edward H. Simpson)
                  Treatment A       Treatment B
Blue disease      81/87    93%      234/270  87%
Red disease       192/263  73%      55/80    69%
Black disease     273/350  78%      289/350  83%

81 + 192 = 273
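The reversal in the table can be traced to how each treatment's patients split between the two diseases. The following is a minimal sketch, assuming (as the sums suggest) that the black-disease row pools the blue and red rows; the names `counts` and `pooled_rate` are illustrative only. It writes each pooled rate as a weighted average of the per-disease rates.

```python
from fractions import Fraction

# (successes, patients) copied from the table, keyed by treatment then disease.
counts = {
    "A": {"blue": (81, 87), "red": (192, 263)},
    "B": {"blue": (234, 270), "red": (55, 80)},
}

def pooled_rate(by_disease):
    """Pooled success rate as a weighted average of the per-disease rates."""
    total = sum(n for _, n in by_disease.values())
    rate = Fraction(0)
    for successes, n in by_disease.values():
        weight = Fraction(n, total)  # share of this treatment's patients
        rate += weight * Fraction(successes, n)
    return rate

for treatment, by_disease in counts.items():
    total = sum(n for _, n in by_disease.values())
    shares = {d: round(n / total, 2) for d, (_, n) in by_disease.items()}
    print(treatment, shares, round(float(pooled_rate(by_disease)), 3))
```

Treatment B's patients mostly have the blue disease, where both treatments do well, so its pooled rate is pulled upward even though it trails Treatment A within each disease.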
$\frac{a}{b} < \frac{A}{B}$ and $\frac{c}{d} < \frac{C}{D}$, but $\frac{a+c}{b+d} > \frac{A+C}{B+D}$

$\frac{1}{5} < \frac{2}{8}$ and $\frac{7}{4} < \frac{6}{3}$, but $\frac{1+7}{5+4} > \frac{2+6}{8+3}$
              Lefthanded pitchers   Righthanded pitchers   All pitchers
Sizemore      29/134  .216          79/302  .262           108/436  .248
Valbuena      8/39    .205          84/329  .255           92/368   .250
UC Berkeley 1973 grad school admissions

          Applicants   % admitted
Men       8442         44%
Women     4321         35%

6 large departments:

              Men                        Women
Department    Applicants   % admitted    Applicants   % admitted
A             825          62%           108          82%
B             560          63%           25           68%
C             325          37%           593          34%
D             417          33%           375          35%
E             191          28%           393          24%
F             272          6%            341          7%
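One way to see what the department table implies is to compare each group's crude admission rate with a rate computed under a shared applicant mix (direct standardization). This is a sketch under stated assumptions: standardization is not named in the slides, the rounded percentages from the table are used as given, and the function names are illustrative.

```python
# (applicants, percent admitted) for the six departments, from the table.
men = {"A": (825, 62), "B": (560, 63), "C": (325, 37),
       "D": (417, 33), "E": (191, 28), "F": (272, 6)}
women = {"A": (108, 82), "B": (25, 68), "C": (593, 34),
         "D": (375, 35), "E": (393, 24), "F": (341, 7)}

def crude_rate(group):
    """Percent admitted, weighting each department by the group's own applicants."""
    total = sum(n for n, _ in group.values())
    return sum(n * pct for n, pct in group.values()) / total

def standardized_rate(group, weights):
    """Percent admitted, weighting every department by a shared applicant mix."""
    total = sum(weights.values())
    return sum(weights[d] * pct for d, (_, pct) in group.items()) / total

# Shared mix (an assumption): men plus women applying to each department.
combined = {d: men[d][0] + women[d][0] for d in men}

print("crude:", round(crude_rate(men), 1), round(crude_rate(women), 1))
print("standardized:", round(standardized_rate(men, combined), 1),
      round(standardized_rate(women, combined), 1))
```

Across these six departments the crude rate favors men, while the standardized rate favors women, because women applied more often to the departments with low admission rates.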
January 2009
Percent of Planes Delayed from City of Origin

              Continental               United
Airport       Late   Total   %          Late   Total   %
Newark        957    3998    23.9       100    399     25.1
LaGuardia     62     356     17.4       113    573     19.7
Pittsburgh    8      60      13.3       17     119     14.3
Detroit       16     145     11.0       16     139     11.5
Totals        1043   4559    22.9       246    1230    20.0
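The percentages in the table can be recomputed from the late and total counts. The sketch below uses only the counts shown; the function name `pct_late` is illustrative. It also pools just Newark and LaGuardia, the two largest airports for these carriers.

```python
# (late, total) departures by airport, January 2009, copied from the table.
continental = {"Newark": (957, 3998), "LaGuardia": (62, 356),
               "Pittsburgh": (8, 60), "Detroit": (16, 145)}
united = {"Newark": (100, 399), "LaGuardia": (113, 573),
          "Pittsburgh": (17, 119), "Detroit": (16, 139)}

def pct_late(table, airports=None):
    """Percent late over the chosen airports (all airports when none are named)."""
    chosen = airports if airports is not None else list(table)
    late = sum(table[a][0] for a in chosen)
    total = sum(table[a][1] for a in chosen)
    return 100 * late / total

for airport in continental:
    print(airport,
          round(pct_late(continental, [airport]), 1),
          round(pct_late(united, [airport]), 1))

print("All four airports",
      round(pct_late(continental), 1), round(pct_late(united), 1))

two = ["Newark", "LaGuardia"]
print("Newark + LaGuardia",
      round(pct_late(continental, two), 1), round(pct_late(united, two), 1))
```

Continental has the lower delay rate at every airport but the higher rate overall, since most of its flights leave from Newark, the airport with the most delays.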
Showing two of the four airports graphically:
Circles correspond to Continental (Newark much larger than LGA)
Squares correspond to United Airlines (LGA larger than Newark)
X marks the average for the two airports: Continental at 23.4% and United at 21.9%
This is the marginal effect (ignoring/combining airports)
$\frac{1}{5} < \frac{2}{8}$ and $\frac{7}{4} < \frac{6}{3}$, but $\frac{1+7}{5+4} > \frac{2+6}{8+3}$
Hat tip: Roger Nelsen
What level of aggregation is “right”?
            1995              1996              Combined
Justice     104/411  .253     45/140   .321     149/551  .270
Jeter       12/48    .250     183/582  .314     195/630  .310
Unemployment is higher now than in the 1980s recession
for each education level, but lower overall
WSJ article, 2 December 2009
(graph only shows two education levels…)
1977 data from 20 counties in Florida
Sentence = death penalty?

              White victim     Black victim     All victims
White def     19/151  .126     0/9    .000      19/160  .119
Black def     11/63   .175     6/103  .058      17/166  .102
The probability of a convicted murderer being given the
death penalty (vs life in prison) depends more on the
victim’s race than on the defendant’s race
Hat tip: Mike Radelet (and Alan Agresti) and Jimmy Doi
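The claim about victim's race versus defendant's race can be checked against the table by comparing differences in death-sentence rates. This is a rough sketch that uses rate differences as the measure, which is a choice made here rather than something stated in the slides; the names `cases` and `rate` are illustrative.

```python
# (death sentences, convictions) keyed by (defendant race, victim race).
cases = {
    ("white", "white"): (19, 151), ("black", "white"): (11, 63),
    ("white", "black"): (0, 9),    ("black", "black"): (6, 103),
}

def rate(defendant=None, victim=None):
    """Death-sentence rate over the cells matching the given races (None = any)."""
    cells = [v for (d, vic), v in cases.items()
             if defendant in (None, d) and victim in (None, vic)]
    return sum(s for s, _ in cells) / sum(n for _, n in cells)

# Gap by victim race, holding the defendant's race fixed.
victim_gaps = {d: rate(d, "white") - rate(d, "black") for d in ("white", "black")}
# Gap by defendant race, holding the victim's race fixed.
defendant_gaps = {v: rate("black", v) - rate("white", v) for v in ("white", "black")}
# Gap by defendant race when victims are pooled.
marginal_gap = rate("black") - rate("white")

print({k: round(g, 3) for k, g in victim_gaps.items()})
print({k: round(g, 3) for k, g in defendant_gaps.items()})
print(round(marginal_gap, 3))
```

Within each victim group black defendants have the higher rate, yet the pooled gap has the opposite sign, and both victim-race gaps are larger than either defendant-race gap.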
Regression data: Olympic 1500 winners
Hat tip: Phil Everson
State average SAT scores (1995)
[Three scatterplots: average SAT vs. expenditure per student, average SAT vs. average teacher salary, average SAT vs. pupil/teacher ratio]
The 10 states with the lowest per pupil spending included
four -- North Dakota, South Dakota, Tennessee, Utah --
among the 10 states with the top SAT scores. Only one of
the 10 states with the highest per pupil expenditures --
Wisconsin -- was among the 10 states with the highest SAT
scores. New Jersey has the highest per pupil expenditures,
an astonishing $10,561, which teachers' unions elsewhere
try to use as a negotiating benchmark. New Jersey's rank
regarding SAT scores? Thirty-ninth... The fact that the
quality of schools... [fails to correlate] with education
appropriations will have no effect on the teacher unions'
insistence that money is the crucial variable.
-- George F. Will (September 12, 1993),
"Meaningless Money Factor," The Washington Post, C7.
Consider the fraction of students in a state who take the SAT
[Scatterplot: average SAT vs. fraction of students taking the test]
States with high SAT scores but low fractions taking the test:
North Dakota, Iowa, Minnesota, Utah, Wisconsin, South Dakota
Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  1051.88710    20.82477    50.51   < 2e-16 ***
expend          7.91371     3.49820     2.26     0.028 *
frac           -6.38069     0.70362    -9.07   8.3e-12 ***
I(frac^2)       0.04741     0.00916     5.18   4.9e-06 ***
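The coefficient table defines a fitted equation that can be evaluated directly. The sketch below only restates the estimates shown; the function name and the input values are hypothetical, and the units of `expend` and `frac` are not given in the slides, so inputs are assumed to be on whatever scale the model was fit with.

```python
# Coefficient estimates copied from the regression output above.
INTERCEPT, B_EXPEND, B_FRAC, B_FRAC2 = 1051.88710, 7.91371, -6.38069, 0.04741

def predicted_sat(expend, frac):
    """Fitted mean SAT; expend and frac must use the units the model was fit with."""
    return INTERCEPT + B_EXPEND * expend + B_FRAC * frac + B_FRAC2 * frac ** 2

# Hypothetical inputs: same fraction tested, one extra unit of expenditure.
low = predicted_sat(expend=5.0, frac=50.0)
high = predicted_sat(expend=6.0, frac=50.0)
print(round(high - low, 2))
```

With the fraction of students tested held fixed, the fitted SAT rises with expenditure, the opposite of the impression given by comparing spending and scores alone.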
Added variable plot for “expend”
Hat tip: Stacey Hancock and Albyn Jones
Other examples
US Senate: replace moderate Dems with moderate Reps and the
Senate could become more conservative while each party
caucus becomes more liberal
The 20-yr death rate was higher for non-smokers than
for smokers in a UK city – but lower in (almost) each
age group.
Hat tip: Jo Hardin
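The Senate example above can be illustrated with made-up numbers. Everything in this sketch is hypothetical: the ideology scores, the chamber size, and the single seat that changes hands are chosen only to show that the pattern is arithmetically possible.

```python
from statistics import mean

# Hypothetical ideology scores: lower = more liberal, higher = more conservative.
dems = [-0.6, -0.5, -0.4, -0.1]   # the last senator is a moderate Democrat
reps = [0.4, 0.5, 0.6]

# Replace the moderate Democrat (-0.1) with a moderate Republican (0.2).
dems_after = [-0.6, -0.5, -0.4]
reps_after = [0.2, 0.4, 0.5, 0.6]

print("Dem caucus:", round(mean(dems), 3), "->", round(mean(dems_after), 3))
print("Rep caucus:", round(mean(reps), 3), "->", round(mean(reps_after), 3))
print("Senate:", round(mean(dems + reps), 3), "->",
      round(mean(dems_after + reps_after), 3))
```

Each caucus average moves in the liberal direction because the departing Democrat and the arriving Republican are both moderates, while the chamber average moves in the conservative direction because the new senator is to the right of the one replaced.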
Yule, G. U. (1903). "Notes on the Theory of Association of
Attributes in Statistics". Biometrika 2: 121–134.
Simpson, Edward H. (1951). "The Interpretation of
Interaction in Contingency Tables". Journal of the Royal
Statistical Society, Ser. B 13: 238–241.
Stigler, S. M. (1980). "Stigler's Law of Eponymy".
Transactions of the New York Academy of Sciences 39:
147–58 (Merton Festschrift Volume, F. Gieryn (ed)).