THE SCIENTIFIC STUDY OF POLITICS (POL 51)
Professor B. Jones
University of California, Davis

TODAY
Pitfalls and Paradoxes
The Concept of a “Lurker”

THE BASEBALL MANAGER’S DILEMMA
Bottom of the ninth, down by 1 run
Two outs
Runners on second and third
…and the pitcher is up
You have only two players left
…and this is the National League.
What will you do?

THE CHOICES
Player 1: 110 hits from 500 at bats.
Player 2: 280 hits from 1200 at bats.
Their “batting average”
  Player 1: 110/500=.220
  Player 2: 280/1200=.233
Who would you choose?
On batting average, Player 2 > Player 1

BUT WAIT!
Both players are switch-hitters (they can bat from the left or right side of the plate)
We’ll go “money ball” and play the best match-up
The data:

            Player 1                    Player 2
Side        From Right   From Left      From Right   From Left
At Bats     400          100            400          800
Hits        84           26             80           200
Average     0.210        0.260          0.200        0.250

HUH?
What happened?
Not accounting for switch hitting, Player 2 is preferred to Player 1
When accounting for switch hitting, Player 1 is preferred to Player 2
Worse! From either side of the plate, we would conclude Player 1 is better than Player 2, even though Player 2’s overall batting average is higher!

COLLEGE ADMISSIONS
University Admission Statistics
1000 women apply, 1000 men apply
Admission Rate:
  Women: 510/1000=51 percent
  Men: 800/1000=80 percent
Conclusion? Evidence of gender bias?
This was the basis of the U.C. Berkeley gender bias case in the 1970s
Source: http://walrandpc.eecs.berkeley.edu/126/simpson.htm

BUT WAIT
There are two colleges students apply to, College A and College B.
The Admissions Data:

            Female                        Male
College     Applied   Accepted   Rate     Applied   Accepted   Rate
A           980       490        50%      200       80         40%
B           20        20         100%     800       720        90%
Total       1000      510        51%      1000      800        80%

Findings?
The admission rate at each college is higher for women than for men.
The overall admission rate is higher for men.

SIMPSON’S PARADOX
The two preceding examples illustrate Simpson’s Paradox
Named for E.H. Simpson (based on a 1951 paper)
The phenomenon has been known since at least 1899 (and Yule published a paper on it in 1903).
Why a paradox? The result is counterintuitive.

SIMPSON’S PARADOX
The Paradox: A “reversal result”
The relationship between two variables found within subgroups differs in direction from the relationship found when the subgroups are combined
  Batting averages on left/right side vs. overall batting average
  Gender admission rates by college vs. overall gender admission rate
Consider the admissions data again.

ADMISSIONS DATA AND 1973 BERKELEY CASE
Our example
The “model”: Admission Rate=f(Gender)
Gender Bias Hypothesis: Admission rates of women will be lower than those of men.
Y=Admission Rate; X=Gender
The data seem consistent with the hypothesis.
The Problem:
There is a third variable; what is it?
College to which students applied (A vs. B)
Z=College

PARADOX REVEALED
The Problem is Simple
(A) There is a strong association between Y and Z
  One college (B) is easier to “get into” than the other college (A)
(B) There is a strong association between X and Z
  Women tend to apply to the harder college (A) at higher rates; men tend to apply to the easier college (B) at higher rates.
Therefore, because of (A) and (B), there is a strong connection between Y and X
This connection, however, is spurious.

A PICTURE
The Nature of the Problem
[Figure: diagram relating Gender (X), College (Z), and Admission Rate (Y)]

PARADOX RESOLVED
Beware the Lurker Variable
Lurker Variable: A lurking variable (confounding factor or variable, or simply a confound or confounder) is a "hidden" variable in a statistical or research model that affects the variables in question but is not known or acknowledged, and thus (potentially) distorts the resulting data. This hidden third variable causes the two measured variables to falsely appear to be in a causal relation. Such a relation between two observed variables is termed a spurious relationship. (Source: http://en.wikipedia.org/wiki/Confounder)
The Problem: Z is a confounder. If we had accounted for Z, we would have arrived at different conclusions.

IMPLICATIONS
Berkeley, 1973
Gender bias was not found when accounting for departmental admission rates
Interestingly, it was found that women tended to apply to more difficult graduate programs than men.
Once departments were taken into account, graduate admission rates were, if anything, higher for women.
Not accounting for departmental differences, gender bias appeared

IMPLICATIONS FOR RESEARCH DESIGN
Combining subgroups (aggregation) can lead to serious inferential problems
ESPECIALLY if the presence of lurking variables is not accounted for
Large samples with lots of subgroups can lead to these kinds of problems
Simpson’s Paradox is a real concern
…but often not recognized.
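
CHECKING THE REVERSAL (A SKETCH)
Both reversals can be checked directly from the counts on the slides. The Python below is a minimal illustration, not part of the original lecture: the function names (pooled_rates, stratum_rates, shows_reversal) and the dictionary layout are assumptions made for this sketch. Player labels follow the batting-average slide (Player 1 = 110/500, Player 2 = 280/1200).

```python
# Counts copied from the slides: (group, subgroup) -> (successes, trials).
# The data layout and all function names are illustrative assumptions.
batting = {
    ("Player 1", "right"): (84, 400),
    ("Player 1", "left"): (26, 100),
    ("Player 2", "right"): (80, 400),
    ("Player 2", "left"): (200, 800),
}
admissions = {
    ("Women", "A"): (490, 980),
    ("Women", "B"): (20, 20),
    ("Men", "A"): (80, 200),
    ("Men", "B"): (720, 800),
}


def stratum_rates(data):
    """Success rate within each (group, subgroup) cell."""
    return {key: s / n for key, (s, n) in data.items()}


def pooled_rates(data):
    """Success rate per group after combining (aggregating) its subgroups."""
    totals = {}
    for (group, _subgroup), (s, n) in data.items():
        total_s, total_n = totals.get(group, (0, 0))
        totals[group] = (total_s + s, total_n + n)
    return {group: s / n for group, (s, n) in totals.items()}


def shows_reversal(data, first, second):
    """True if `first` beats `second` in every subgroup but loses when pooled."""
    by_cell = stratum_rates(data)
    subgroups = {subgroup for (_group, subgroup) in data}
    wins_every_subgroup = all(
        by_cell[(first, sg)] > by_cell[(second, sg)] for sg in subgroups
    )
    pooled = pooled_rates(data)
    return wins_every_subgroup and pooled[first] < pooled[second]


if __name__ == "__main__":
    for name, data, first, second in [
        ("batting", batting, "Player 1", "Player 2"),
        ("admissions", admissions, "Women", "Men"),
    ]:
        print(name, "pooled:", pooled_rates(data))
        print(name, "reversal:", shows_reversal(data, first, second))
```

In both data sets the group that wins inside every subgroup loses after aggregation because its trials are concentrated in the subgroup with the lower success rate (at bats from the right side for Player 1, applications to College A for women). That unequal mix is the lurking variable Z at work.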