What’s wrong with single-level epidemiology?

S V Subramanian
Professor of Population Health and Geography, Harvard University
http://www.hsph.harvard.edu/faculty/sv-subramanian/

Public Lecture Series, Duke-National University of Singapore Medical School
December 13, 2013

Outline
• The problem with single level models
• Revisiting a classic
• Importance of multilevel perspectives
• Concluding remarks

Single level perspectives

Ancel Keys: Seven Countries Study
Thomas Dawber: Framingham Heart Study

Risk factors for Cardiovascular Disease
• Framingham
  – 1948 and on-going
  – One town in Massachusetts, xxxxxxxxxxxxxxxxxxxxxxxxxx xxx
  – Inferential Unit: Individuals
  – Population variability: a nuisance
  – Unit of analysis: Individuals, xxxxxxxxxxxx
  – http://www.framinghamheartstudy.org/about/history.html
• Seven Countries
  – 1958-1970
  – 7 countries: Yugoslavia, Italy, Greece, Finland, Netherlands, USA, Japan
  – Inferential Unit: Populations
  – Population variability: of substantive interest
  – Unit of analysis: Sites/Countries
  – http://www.sph.umn.edu/epi/history/sevencountries.asp

American Sociological Review, Vol. 15, No. 3 (Jun., 1950), pp. 351-357

Correlation between illiteracy and being Black

Individual      Illiteracy    Black
Illiteracy      1             0.203
Black                         1

State           %Illiteracy   %Black
%Illiteracy     1             0.773
%Black                        1

Correlation between illiteracy and being foreign-born

Individual      Illiteracy    Foreign-born
Illiteracy      1             0.118
Foreign-born                  1

State           %Illiteracy   %Foreign-born
%Illiteracy     1
%Foreign-born   -0.526        1
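A minimal sketch of why an individual-level correlation and a state-level (ecological) correlation computed from the same people can differ in size and even in sign, as in the foreign-born tables above. The three states and their counts are hypothetical, chosen only for illustration; they are not Robinson's 1930 census figures, and the function names are likewise illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL counts per state (assumption, not census data):
# (foreign-born illiterate, foreign-born literate,
#  native illiterate,       native literate)
STATES = {
    "A": (40, 60, 180, 720),   # few foreign-born, high illiteracy overall
    "B": (60, 240, 56, 644),
    "C": (50, 450, 10, 490),   # many foreign-born, low illiteracy overall
}


def individual_records(states):
    """Expand the counts into one (foreign_born, illiterate) pair per person."""
    foreign_born, illiterate = [], []
    for fb_ill, fb_lit, nat_ill, nat_lit in states.values():
        cells = ((fb_ill, 1, 1), (fb_lit, 1, 0), (nat_ill, 0, 1), (nat_lit, 0, 0))
        for count, fb, ill in cells:
            foreign_born.extend([fb] * count)
            illiterate.extend([ill] * count)
    return foreign_born, illiterate


def state_percentages(states):
    """Aggregate each state to (% foreign-born, % illiterate)."""
    pct_fb, pct_ill = [], []
    for fb_ill, fb_lit, nat_ill, nat_lit in states.values():
        total = fb_ill + fb_lit + nat_ill + nat_lit
        pct_fb.append(100 * (fb_ill + fb_lit) / total)
        pct_ill.append(100 * (fb_ill + nat_ill) / total)
    return pct_fb, pct_ill


r_individual = pearson(*individual_records(STATES))
r_state = pearson(*state_percentages(STATES))
print(f"individual-level r = {r_individual:.3f}")
print(f"state-level r      = {r_state:.3f}")
```

In this toy setup foreign-born people are more often illiterate than natives within every state, so the individual-level correlation is positive, while states with more foreign-born residents have lower illiteracy overall, so the state-level correlation is negative. Neither number is "wrong"; they answer questions at different levels.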
American Sociological Review, Vol. 15, No. 3 (Jun., 1950), pp. 351-357
• On the ecological relationship
  – The purpose of this paper will have been accomplished if it prevents the future computation of meaningless correlations.
• On the individual relationship
  – The purpose of this paper will have been accomplished if it stimulates the study of similar problems with use of meaningful correlation between the properties of individuals.

Critique
• Conclusion not supported by analysis
• Questionable Assumption
  – “In each study that uses ecological correlations, the obvious purpose is to discover something about the behavior of individuals”.
• Methodological individualism
  – Technically accurate but substantively misleading in asserting the primacy of “individual” relationships for understanding the association between race and illiteracy in the US
• Conflates ecology with aggregate
  – the whole need not simply be the sum of its parts

Robinson’s Reach
• 3rd most cited paper in ASR (>1000 citations)
• Dire warnings of “ecological fallacy” - a cornerstone of ALL epidemiologic textbooks.
• Motivated collection of individual survey data

Multilevel, or more precisely, Two-level perspectives

Revisiting Robinson’s Example
• Data: 1930 US Census
• Structure: 98,241,245 individuals in 49 States
• Outcome: Illiterate or not
• Predictors:
  – Individual: Race/Nativity (Native White, Foreign-born White, and Black)
  – State: Percentage of Black Population; and Jim Crow or not
• Model: Two-level Binomial Logistic Model using Markov chain Monte Carlo (MCMC) estimation with the Metropolis-Hastings algorithm

Race-illiteracy association appears to be sensitive to States’ circumstances

                Ignoring states         Accounting for states
                OR (95% CI)             OR (95% CI)
Native White    1                       1
Black           11.66 (11.63, 11.69)    5.86 (5.84, 5.88)

State “effects” not sensitive to racial composition
[Figure: between-state variation (in logits) in the null model and after accounting for race.]

Substantial heterogeneity in the illiteracy-race association
[Figure: illiteracy by state, plotted separately for Native Whites and Blacks, with individual states labeled.]

“Everywhere is nowhere”

Race and Racial Context
[Figure: series for Black, Foreign-born White, and Native White.]

States are not simply “aggregates” of individuals
• Presence of Jim Crow Law
  – i.e., federal and state laws that permitted racial discrimination under the concept of “separate but equal”
• Reality: “separate and unequal”
  – Per capita educational expenditure in public schools
    • Georgia: White Child = $11.30; “Colored” Child = $0.00.
    • Alabama: White Child = $26.47; “Colored” Child = $3.81

States with and without Jim Crow Laws in Education

Association between state Jim Crow laws, race, and illiteracy
[Figure: predicted probability of being illiterate for Native Whites and Blacks, in states with Jim Crow laws (JC) and without (NJC).]

The problem with Robinson’s analysis was thinking at only ONE level, leading to an impoverished interpretation of the data.

• Critical re-thinking of any single-level analysis, ecological or individual
• No longer need to choose A level of analysis: an inductive approach to ascertaining at which level the action lies

Where do we go from here?

1. Conceptualizing Micro and Macro Contexts
   – e.g., neighborhoods (micro contexts) are often embedded in larger settings such as counties, states, or regions (macro contexts)
2. Considering geographical and non-geographical contexts simultaneously (e.g., neighborhoods and schools)

Current multilevel applications suffer from the problem of missing or omitted levels.

1. Importance of considering multiple (nested) geographies

Life expectancy patterns in the US

Data
• Response: Life expectancy
• Predictor: Time (i.e., “technological progress”)
• Structure
  – Repeated cross-section
  – Three-level: years (1961-2000) at level-1 (n=122850) nested within 3150 counties at level-2 nested within 51 states at level-3.
• Model: Three-level random coefficient model

Level at which action lies: Two Level

\[
y_{ij} = \beta_{0j} + \beta_1 x_{ij} + e_{0ij}, \qquad
\beta_{0j} = \beta_0 + u_{0j}
\]
\[
u_{0j} \sim N(0, \sigma^2_{u0}), \qquad
e_{0ij} \sim N(0, \sigma^2_{e0})
\]

Level at which action lies: Three Level

\[
y_{ijk} = \beta_{0jk} + \beta_1 x_{ijk} + e_{0ijk}, \qquad
\beta_{0jk} = \beta_{0k} + u_{0jk}, \qquad
\beta_{0k} = \beta_0 + v_{0k}
\]
\[
v_{0k} \sim N(0, \sigma^2_{v0}), \qquad
u_{0jk} \sim N(0, \sigma^2_{u0}), \qquad
e_{0ijk} \sim N(0, \sigma^2_{e0})
\]

Variance          Ignore State           Include State
                  Estimate    SE         Estimate    SE
Between state     -           -          1.705       0.350
Between county    2.984       0.076      1.422       0.036
Between time      0.512       0.002      0.512       0.002

State Variation trumps County Variation

\[
y_{ijk} = \beta_{0jk} + \beta_{1jk} x_{ijk} + e_{0ijk}
\]
\[
\beta_{0jk} = \beta_{0k} + u_{0jk}, \qquad \beta_{0k} = \beta_0 + v_{0k}
\]
\[
\beta_{1jk} = \beta_{1k} + u_{1jk}, \qquad \beta_{1k} = \beta_1 + v_{1k}
\]
\[
\begin{pmatrix} v_{0k} \\ v_{1k} \end{pmatrix} \sim N(0, \Omega_v), \qquad
\Omega_v = \begin{pmatrix} \sigma^2_{v0} & \\ \sigma_{v01} & \sigma^2_{v1} \end{pmatrix}
\]
\[
\begin{pmatrix} u_{0jk} \\ u_{1jk} \end{pmatrix} \sim N(0, \Omega_u), \qquad
\Omega_u = \begin{pmatrix} \sigma^2_{u0} & \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}
\]
\[
e_{0ijk} \sim N(0, \sigma^2_{e0})
\]

Macro contexts are as important as micro contexts, if not more so, for life expectancy variation in the US.

2.
Importance of non-geographical/spatial contexts (e.g., schools, workplaces, hospitals, social networks)

One context at a time
• In prior multilevel research, we have not given sufficient attention to all potentially relevant contexts for health
  – Most multilevel research to date has focused on one setting: neighborhoods
  – However, other settings may be relevant for health
  – Ignoring them can bias inferences

Empirical illustration: schools versus neighborhoods
• 180 days/year
• 6 or more hours/day
• 12-13 years

The idea of cross-classification
[Figure: grid of students (☺) at level 1, cross-classified at level 2 by School 1-4 and Neighborhood 1-4.]

Study population
• In-home Survey: 20,745 students nested in 132 schools
  – Mean 125.4 students/school
• 2142 neighborhoods (census tracts)
  – Mean 7.3 students/tract
• 3 outcomes: Tobacco use, Depression, Weight status

Number of days cigarettes smoked (last 30 days): variance estimate (standard error)

Variance        Hierarchical School    Hierarchical Neighborhood    Cross-Classified
Neighborhood    -                      4.7 (0.5)                    0.35 (0.3)
School          5.6 (0.6)              -                            5.54 (0.8)

Log odds of smoking: variance estimate (standard error)

Variance        Hierarchical School    Hierarchical Neighborhood    Cross-Classified
Neighborhood    -                      0.25 (0.03)                  0.06 (0.02)
School          0.35 (0.04)            -                            0.36 (0.06)

Self reported Body Mass Index: variance estimate (standard error)

Variance        Hierarchical School    Hierarchical Neighborhood    Cross-Classified
Neighborhood    -                      0.73 (0.10)                  0.22 (0.08)
School          0.98 (0.11)            -                            0.87 (0.14)

CES-D Scale (Depression): variance estimate (standard error)

Variance        Hierarchical School    Hierarchical Neighborhood    Cross-Classified
Neighborhood    -                      1.84 (0.52)                  0.45 (0.34)
School          2.05 (0.56)            -                            1.69 (0.59)

Concluding remarks
• Need critical re-thinking of ALL single-level epidemiological analyses
  – Dire warnings of “ecological fallacy”, but the science is full of studies that risk “individualistic or atomistic fallacy”
• Need to carefully consider the units of analysis in epidemiological investigations and the problem of “omitted” levels
• Need to consider multiple contexts simultaneously (places/schools/worksites)
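As a rough companion to the school and neighborhood variance tables above, the sketch below works out two simple summaries from the reported point estimates: the school share of the contextual (school plus neighborhood) variance in the cross-classified model, and how much the neighborhood variance shrinks once schools are modeled alongside neighborhoods. The summaries and names are illustrative assumptions, not quantities reported in the lecture; standard errors are ignored, and individual-level variance is not reported, so these are shares of contextual variance only.

```python
# Point estimates transcribed from the variance tables above.
# Each entry: (neighborhood variance, neighborhood-only hierarchical model;
#              neighborhood variance, cross-classified model;
#              school variance, cross-classified model)
ESTIMATES = {
    "Days smoked (last 30 days)": (4.7, 0.35, 5.54),
    "Log odds of smoking": (0.25, 0.06, 0.36),
    "Self-reported BMI": (0.73, 0.22, 0.87),
    "CES-D (depression)": (1.84, 0.45, 1.69),
}


def summarize(estimates):
    """Return {outcome: (school share, neighborhood variance drop)}."""
    rows = {}
    for outcome, (nbhd_alone, nbhd_cc, school_cc) in estimates.items():
        # Share of school + neighborhood variance attributed to schools.
        school_share = school_cc / (school_cc + nbhd_cc)
        # Relative fall in neighborhood variance once schools are included.
        nbhd_drop = 1 - nbhd_cc / nbhd_alone
        rows[outcome] = (school_share, nbhd_drop)
    return rows


summary = summarize(ESTIMATES)
for outcome, (school_share, nbhd_drop) in summary.items():
    print(f"{outcome:28s} school share = {school_share:.0%}, "
          f"neighborhood variance drop = {nbhd_drop:.0%}")
```

For all four outcomes, most of the contextual variance sits with schools, and the neighborhood variance falls sharply relative to the neighborhood-only model, which is the omitted-level problem in numerical form.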