Simpson’s Paradox
Jeff Witmer
18 March 2010

G. Udny Yule (Not Edward H. Simpson)

                 Treatment A        Treatment B
Blue disease     81/87     93%      234/270   87%
Red disease      192/263   73%      55/80     69%
Black disease    273/350   78%      289/350   83%

81 + 192 = 273

a/b < A/B and c/d < C/D, but (a+c)/(b+d) > (A+C)/(B+D)

1/5 < 2/8 and 7/4 < 6/3, but (1+7)/(5+4) > (2+6)/(8+3)

            Lefthanded pitchers   Righthanded pitchers   All pitchers
Sizemore    29/134   .216         79/302   .262          108/436   .248
Valbuena    8/39     .205         84/329   .255          92/368    .250

UC Berkeley 1973 grad school admissions

         Applicants   % admitted
Men      8442         44%
Women    4321         35%

6 large departments:

              Men                         Women
Department    Applicants   % admitted     Applicants   % admitted
A             825          62%            108          82%
B             560          63%            25           68%
C             325          37%            593          34%
D             417          33%            375          35%
E             191          28%            393          24%
F             272          6%             341          7%

January 2009: Percent of Planes Delayed from City of Origin

              Continental              United
Airport       Late    Total   %        Late    Total   %
Newark        957     3998    23.9     100     399     25.1
LaGuardia     62      356     17.4     113     573     19.7
Pittsburgh    8       60      13.3     17      119     14.3
Detroit       16      145     11.0     16      139     11.5
Totals        1043    4559    22.9     246     1230    20.0

Showing two of the four airports graphically:
Circles correspond to Continental (Newark much larger than LGA)
Squares correspond to United Airlines (LGA larger than Newark)
X equals average for the two airports: Continental at 23.4% and United at 21.9%
Marginal effect (ignoring/combining airports)

1/5 < 2/8 and 7/4 < 6/3, but (1+7)/(5+4) > (2+6)/(8+3)
Hat tip: Roger Nelsen

What level of aggregation is “right”?

           1995             1996             Combined
Justice    104/411   .253   45/140    .321   149/551   .270
Jeter      12/48     .250   183/582   .314   195/630   .310

Unemployment is higher now than in the 1980s recession for each education level, but lower overall
WSJ article, 2 December 2009 (graph only shows two education levels…)

1977 data from 20 counties in Florida
Sentence = death penalty?
                   White victim     Black victim     All victims
White defendant    19/151   .126    0/9     .000     19/160   .119
Black defendant    11/63    .175    6/103   .058     17/166   .102

The probability of a convicted murderer being given the death penalty (vs life in prison) depends more on the victim’s race than on the defendant’s race.
Hat tip: Mike Radelet (and Alan Agresti) and Jimmy Doi

Regression data: Olympic 1500 winners
Hat tip: Phil Everson

State average SAT scores (1995)
Average SAT vs Expenditure per student
Average SAT vs Average teacher salary
Average SAT vs Pupil/teacher ratio

The 10 states with the lowest per pupil spending included four -- North Dakota, South Dakota, Tennessee, Utah -- among the 10 states with the top SAT scores. Only one of the 10 states with the highest per pupil expenditures -- Wisconsin -- was among the 10 states with the highest SAT scores. New Jersey has the highest per pupil expenditures, an astonishing $10,561, which teachers' unions elsewhere try to use as a negotiating benchmark. New Jersey's rank regarding SAT scores? Thirty-ninth... The fact that the quality of schools... [fails to correlate] with education appropriations will have no effect on the teacher unions' insistence that money is the crucial variable.
-- George F. Will (September 12, 1993), "Meaningless Money Factor," The Washington Post, C7.

Consider the fraction of students in a state who take the SAT

States with high SAT scores but low fractions taking the test:
North Dakota, Iowa, Minnesota, Utah, Wisconsin, South Dakota

Average SAT vs Fraction of students taking the test

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  1051.88710    20.82477    50.51   < 2e-16  ***
expend          7.91371     3.49820     2.26     0.028  *
frac           -6.38069     0.70362    -9.07   8.3e-12  ***
I(frac^2)       0.04741     0.00916     5.18   4.9e-06  ***

Added variable plot for “expend”
Hat tip: Stacey Hancock and Albyn Jones

Other examples

US Senate: replace moderate Dems with moderate Reps and the Senate could become more conservative while each party caucus becomes more liberal.

The 20-yr death rate was higher for non-smokers than for smokers in a UK city, but lower in (almost) each age group.
Hat tip: Jo Hardin

Yule, G. U. (1903). "Notes on the Theory of Association of Attributes in Statistics". Biometrika 2: 121–134.
Simpson, E. H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Ser. B 13: 238–241.
Stigler, S. M. (1980). "Stigler's Law of Eponymy". Transactions of the New York Academy of Sciences 39: 147–58 (Merton Festschrift Volume, F. Gieryn (ed.)).
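
The reversals in the count tables above can be checked with exact arithmetic. What follows is a minimal sketch in Python and is not part of the original slides. The names `pooled_rate`, `is_simpson_reversal`, and `EXAMPLES` are illustrative assumptions. The counts are copied from the slides, with the group that leads in every subgroup listed first. The Berkeley table is left out because it gives percentages rather than counts for the departments.

```python
from fractions import Fraction

def pooled_rate(groups):
    """Rate after combining subgroups: total successes over total trials."""
    successes = sum(s for s, _ in groups)
    trials = sum(t for _, t in groups)
    return Fraction(successes, trials)

def is_simpson_reversal(leader, trailer):
    """True if `leader` has the strictly higher rate in every subgroup
    but the strictly lower rate once the subgroups are pooled.

    Both arguments are lists of (successes, trials) pairs, one pair per
    subgroup, given in the same subgroup order.
    """
    leads_everywhere = all(
        Fraction(s1, t1) > Fraction(s2, t2)
        for (s1, t1), (s2, t2) in zip(leader, trailer)
    )
    return leads_everywhere and pooled_rate(leader) < pooled_rate(trailer)

# Counts taken from the slides; the first entry leads within each subgroup.
EXAMPLES = {
    "Treatment A vs B (blue, red disease)":
        ([(81, 87), (192, 263)], [(234, 270), (55, 80)]),
    "Sizemore vs Valbuena (LHP, RHP)":
        ([(29, 134), (79, 302)], [(8, 39), (84, 329)]),
    "United vs Continental delays (four airports)":
        ([(100, 399), (113, 573), (17, 119), (16, 139)],
         [(957, 3998), (62, 356), (8, 60), (16, 145)]),
    "Justice vs Jeter (1995, 1996)":
        ([(104, 411), (45, 140)], [(12, 48), (183, 582)]),
    "Black vs white defendants (white, black victims)":
        ([(11, 63), (6, 103)], [(19, 151), (0, 9)]),
}

for name, (leader, trailer) in EXAMPLES.items():
    print(name, is_simpson_reversal(leader, trailer))
```

Using `Fraction` instead of floating point keeps every comparison exact, which matters for close calls such as Justice's .253 against Jeter's .250 in 1995.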