Multilevel models: concept and application

What’s wrong with single-level epidemiology?
S V Subramanian
Professor of Population Health and Geography
Harvard University
http://www.hsph.harvard.edu/faculty/sv-subramanian/
Public Lecture Series
Duke-National University of Singapore Medical School
December 13, 2013
Outline
• The problem with single level models
• Revisiting a classic
• Importance of multilevel perspectives
• Concluding remarks
Single level perspectives
Ancel Keys: Seven Country Study
Thomas Dawber: Framingham Heart Study
Risk factors for Cardiovascular Disease
• Framingham
– 1948 and on-going
– One town in Massachusetts, xxxxxxxxxxxxxxxxxxxxxxxxxx xxx
– Inferential Unit: Individuals
– Population variability: a nuisance
– Unit of analysis: Individuals, xxxxxxxxxxxx
– http://www.framinghamheartstudy.org/about/history.html
• Seven Country
– 1958-1970
– 7 countries: Yugoslavia, Italy, Greece, Finland, Netherlands, USA, Japan
– Inferential Unit: Populations
– Population variability: of substantive interest
– Unit of analysis: Sites/Countries
– http://www.sph.umn.edu/epi/history/sevencountries.asp
W. S. Robinson, "Ecological Correlations and the Behavior of Individuals", American Sociological Review, Vol. 15, No. 3 (Jun., 1950), pp. 351-357
Individual-level correlation
              Illiteracy   Black
Illiteracy    1
Black         0.203        1
State-level correlation
              %Illiteracy   %Black
%Illiteracy   1
%Black        0.773         1
Individual-level correlation
                Illiteracy   Foreign-born
Illiteracy      1
Foreign-born    0.118        1

State-level correlation
                %Illiteracy   %Foreign-born
%Illiteracy     1
%Foreign-born   -0.526        1
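The gap between the two kinds of correlation can be reproduced on a toy population. The sketch below is a minimal illustration using hypothetical counts for three made-up states (not Robinson's census data): the individual-level correlation is computed over people coded 0/1, and the ecological correlation over the states' proportions. The names `PEOPLE`, `group`, `pearson`, and `state_shares` are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def group(state, black, illiterate, count):
    """Expand one (state, black, illiterate) cell into `count` individuals."""
    return [(state, black, illiterate)] * count


# Hypothetical individuals: (state, is_black, is_illiterate), 100 people per state.
PEOPLE = (
    group("A", 0, 0, 90) + group("A", 0, 1, 5) + group("A", 1, 0, 4) + group("A", 1, 1, 1)
    + group("B", 0, 0, 60) + group("B", 0, 1, 10) + group("B", 1, 0, 20) + group("B", 1, 1, 10)
    + group("C", 0, 0, 35) + group("C", 0, 1, 15) + group("C", 1, 0, 30) + group("C", 1, 1, 20)
)


def state_shares(people):
    """Aggregate individuals to (share black, share illiterate) per state."""
    totals = {}
    for state, black, illiterate in people:
        n, b, i = totals.get(state, (0, 0, 0))
        totals[state] = (n + 1, b + black, i + illiterate)
    return {s: (b / n, i / n) for s, (n, b, i) in totals.items()}


# Individual-level correlation: one observation per person.
individual_r = pearson([p[1] for p in PEOPLE], [p[2] for p in PEOPLE])
# Ecological correlation: one observation per state.
shares = list(state_shares(PEOPLE).values())
ecological_r = pearson([s[0] for s in shares], [s[1] for s in shares])
print(f"individual r = {individual_r:.3f}; ecological r = {ecological_r:.3f}")
```

With these made-up counts the individual correlation is about 0.25 while the ecological correlation is close to 1, the same direction of gap as 0.203 versus 0.773 in the census tables.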
• On the ecological relationship
– The purpose of this paper will have been accomplished if
it prevents the future computation of meaningless
correlations.
• On the individual relationship
– The purpose of this paper will have been accomplished if
it stimulates the study of similar problems with use of
meaningful correlation between the properties of
individuals.
Critique
• Conclusion not supported by analysis
• Questionable Assumption
– “In each study that uses ecological correlations, the obvious purpose is
to discover something about the behavior of individuals”.
• Methodological individualism
– Technically accurate but substantively misleading in asserting the
primacy of “individual” relationships for understanding the association
between race and illiteracy in the US
• Conflates ecology with aggregates
– The whole need not simply be the sum of its parts
Robinson’s Reach
• 3rd most cited paper in ASR (>1000 citations)
• Dire warnings of the “ecological fallacy”: a cornerstone of ALL epidemiologic textbooks
• Motivated collection of individual survey data
Multilevel, or more precisely, two-level perspectives
Revisiting Robinson’s Example
• Data: 1930 US Census
• Structure: 98,241,245 individuals in 49 States
• Outcome: illiterate or not
• Predictors:
– Individual: race/nativity (Native White, Foreign-born White, and Black)
– State: percentage of Black population; and Jim Crow or not
• Model: two-level binomial logistic model using Markov chain Monte Carlo (MCMC) estimation with the Metropolis-Hastings algorithm
Race-illiteracy association appears to be sensitive to states’ circumstances
               Ignoring states        Accounting for states
               OR (95% CI)            OR (95% CI)
Native White   1 (reference)          1 (reference)
Black          11.66 (11.63, 11.69)   5.86 (5.84, 5.88)
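Why can conditioning on states roughly halve the odds ratio? The estimate on this slide comes from the two-level logistic model fitted by MCMC. The sketch below swaps in a much simpler technique, the Mantel-Haenszel odds ratio stratified by state, to show the mechanism on hypothetical counts. When Black residents are concentrated in high-illiteracy states, the crude (collapsed) odds ratio exceeds the within-state one. The state counts and function names are assumptions made for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a: Black and illiterate, b: Black and literate,
    c: Native White and illiterate, d: Native White and literate.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over per-state 2x2 tables (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts for two states; within each state the odds ratio is 2.
STATES = [
    (2, 18, 50, 900),      # low-illiteracy state with few Black residents
    (300, 300, 100, 200),  # high-illiteracy state with many Black residents
]

# Collapsing the states into one table ignores where people live.
pooled = tuple(sum(cells) for cells in zip(*STATES))
crude = odds_ratio(*pooled)
adjusted = mantel_haenszel_or(STATES)
print(f"crude OR = {crude:.2f}; state-stratified OR = {adjusted:.2f}")
```

Here the crude odds ratio is about 7, while the state-stratified one is 2. A multilevel model goes further than stratification: it treats states as a sampled population and estimates how much they vary.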
State “effects” not sensitive to racial composition
[Figure: between-state variation (in logits, axis 0.00 to 2.00) in the null model and after accounting for race]
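Between-state variation reported in logits can be put on more interpretable scales. The sketch below shows two common conversions for a two-level logistic model, neither of which is stated in the lecture. The first is the variance partition coefficient under the latent-variable convention that fixes level-1 variance at π²/3. The second is the median odds ratio (MOR). The variances in the loop are hypothetical, not values read from the figure.

```python
import math
from statistics import NormalDist


def latent_vpc(state_var):
    """Share of latent-scale variance lying between states (level-1 variance = pi^2/3)."""
    return state_var / (state_var + math.pi ** 2 / 3)


def median_odds_ratio(state_var):
    """Median odds ratio implied by a between-state variance on the logit scale."""
    return math.exp(math.sqrt(2 * state_var) * NormalDist().inv_cdf(0.75))


for var in (0.5, 1.0, 1.5):  # hypothetical between-state variances (logits)
    print(f"var={var}: VPC={latent_vpc(var):.3f}, MOR={median_odds_ratio(var):.2f}")
```

A between-state variance of 1.0 logit, for example, corresponds to an MOR of about 2.6. Comparing two randomly chosen states, the median odds of illiteracy in the higher-risk state would then be about 2.6 times those in the lower-risk one.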
Substantial heterogeneity in the illiteracy-race association
[Figure: state-level illiteracy for Native Whites and for Blacks; labelled states include New Mexico, Louisiana, Kentucky, North Carolina, Tennessee, South Carolina, Alabama, Mississippi, Oregon, New York, Washington, California, South Dakota, Minnesota, Nevada, and the District of Columbia]
“Everywhere is nowhere”
Race and Racial Context
[Figure; series: Black, Foreign-born White, Native White]
States are not simply “aggregates” of individuals
• Presence of Jim Crow laws
– i.e., federal and state laws that permitted racial
discrimination under the concept of “separate but equal”
• Reality: “separate and unequal”
– Per capita educational expenditure in public schools
• Georgia: White Child = $11.30; “Colored” Child = $0.00
• Alabama: White Child = $26.47; “Colored” Child = $3.81
States with and without Jim Crow Laws in Education
Association between state Jim Crow laws, race, and illiteracy
[Figure: predicted probability of being illiterate (axis 0 to 0.14) for Native Whites and Blacks, in states without (NJC) and with (JC) Jim Crow laws]
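Predicted probabilities like these come from back-transforming a logistic model's linear predictor. The sketch below shows one common way to do this. It assumes a fixed part with race, a Jim Crow indicator, and their interaction, and sets the state random effects to zero. The coefficients in `COEF` are hypothetical and are not the lecture's estimates.

```python
import math


def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fixed effects on the logit scale; NOT the lecture's estimates.
COEF = {"intercept": -4.0, "black": 1.5, "jim_crow": 0.3, "black_x_jim_crow": 0.6}


def predicted_probability(black, jim_crow, coef=COEF):
    """Predicted illiteracy probability for 0/1 indicators of race and Jim Crow state."""
    eta = (coef["intercept"]
           + coef["black"] * black
           + coef["jim_crow"] * jim_crow
           + coef["black_x_jim_crow"] * black * jim_crow)
    return inv_logit(eta)


for label, black in (("Native Whites", 0), ("Blacks", 1)):
    njc, jc = predicted_probability(black, 0), predicted_probability(black, 1)
    print(f"{label}: NJC={njc:.3f}, JC={jc:.3f}")
```

The interaction term lets the Jim Crow difference be larger for Blacks than for Native Whites. This is the kind of pattern a state-level law, rather than a mere aggregate of individuals, can produce.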
The problem with Robinson’s analysis
was thinking at only ONE level, leading
to an impoverished interpretation of
the data.
• Critical re-thinking of any single-level analyses: ecological or individual
• No longer need to choose A level of analysis: an inductive approach to ascertaining the level at which action lies
Where do we go from here?
1. Conceptualizing Micro and Macro Contexts
– e.g., neighborhoods (micro contexts) are often embedded in larger settings such as counties, states, or regions (macro contexts)
2. Considering geographical and non-geographical
contexts simultaneously (e.g., neighborhoods
and schools)
Current multilevel applications suffer from the problem of missing or omitted levels.
1. Importance of considering multiple
(nested) geographies
Life expectancy patterns in the US
Data
• Response: Life expectancy
• Predictor: Time (i.e., “technological progress”)
• Structure
– Repeated cross-section
– Three-level: years (1961-2000) at level 1 (n = 122,850) nested within 3,150 counties at level 2, nested within 51 states at level 3.
• Model: Three-level random coefficient model
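Level at which action lies: two levels versus three levels
Two-level model (years within counties, ignoring states):
$$y_{ij} = \beta_{0j} + \beta_1 x_{ij} + e_{0ij}, \qquad \beta_{0j} = \beta_0 + u_{0j}$$
$$u_{0j} \sim N(0, \sigma^2_{u0}), \qquad e_{0ij} \sim N(0, \sigma^2_{e0})$$
Three-level model (years within counties within states):
$$y_{ijk} = \beta_{0jk} + \beta_1 x_{ijk} + e_{0ijk}, \qquad \beta_{0jk} = \beta_{0k} + u_{0jk}, \qquad \beta_{0k} = \beta_0 + v_{0k}$$
$$v_{0k} \sim N(0, \sigma^2_{v0}), \qquad u_{0jk} \sim N(0, \sigma^2_{u0}), \qquad e_{0ijk} \sim N(0, \sigma^2_{e0})$$
Variance         Ignore State     Include State
                 Estimate (SE)    Estimate (SE)
Between state    -                1.705 (0.350)
Between county   2.984 (0.076)    1.422 (0.036)
Between time     0.512 (0.002)    0.512 (0.002)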
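How much of the total variance sits at each level? Because life expectancy is a continuous response, each level's share is simply its variance component divided by the total. The sketch below computes these shares. It assumes the estimates pair with levels as written in the dictionaries (state, county, time); that pairing is reconstructed from the slide. `variance_shares` is an illustrative name.

```python
def variance_shares(components):
    """Each variance component's share of the total (continuous-response VPC)."""
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}


# Variance estimates for life expectancy (standard errors omitted).
IGNORE_STATE = {"county": 2.984, "time": 0.512}
INCLUDE_STATE = {"state": 1.705, "county": 1.422, "time": 0.512}

for name, comps in (("ignore state", IGNORE_STATE), ("include state", INCLUDE_STATE)):
    shares = variance_shares(comps)
    print(name, {level: round(share, 3) for level, share in shares.items()})
```

With these values, the county share falls from about 85% to about 39% once states are included, and roughly 47% of the variance lies between states. Much of what looked like county variation is state variation attributed to the wrong level.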
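State Variation trumps County Variation
$$y_{ijk} = \beta_{0jk} + \beta_{1jk} x_{ijk} + e_{0ijk}$$
$$\beta_{0jk} = \beta_{0k} + u_{0jk}, \qquad \beta_{0k} = \beta_0 + v_{0k}$$
$$\beta_{1jk} = \beta_{1k} + u_{1jk}, \qquad \beta_{1k} = \beta_1 + v_{1k}$$
$$\begin{bmatrix} v_{0k} \\ v_{1k} \end{bmatrix} \sim N(0, \Omega_v): \quad \Omega_v = \begin{bmatrix} \sigma^2_{v0} & \\ \sigma_{v01} & \sigma^2_{v1} \end{bmatrix}$$
$$\begin{bmatrix} u_{0jk} \\ u_{1jk} \end{bmatrix} \sim N(0, \Omega_u): \quad \Omega_u = \begin{bmatrix} \sigma^2_{u0} & \\ \sigma_{u01} & \sigma^2_{u1} \end{bmatrix}$$
$$e_{0ijk} \sim N(0, \sigma^2_{e0})$$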
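With a random intercept and a random slope on time, the between-state (or between-county) variance is no longer a single number. It is a quadratic function of x: Var(v0k + v1k x) = σ²_v0 + 2σ_v01 x + σ²_v1 x². The sketch below evaluates that function. The parameter values are hypothetical, chosen only to show state variance widening over time while county variance stays roughly flat. x is assumed to be years since the start of the series.

```python
def level_variance(x, var0, cov01, var1):
    """Variance of the level-specific deviation (intercept + slope * x) at predictor value x."""
    return var0 + 2 * cov01 * x + var1 * x ** 2


# Hypothetical parameters (not estimates from the lecture); x = years since start.
STATE = {"var0": 1.0, "cov01": 0.02, "var1": 0.001}
COUNTY = {"var0": 1.0, "cov01": -0.01, "var1": 0.0005}

for x in (0, 10, 20, 30):
    state_v = level_variance(x, **STATE)
    county_v = level_variance(x, **COUNTY)
    print(f"x={x}: state={state_v:.3f}, county={county_v:.3f}")
```

A positive intercept-slope covariance makes the level's variance grow with x, and a negative one can make it shrink before growing. This is why the level at which "action lies" can change over time.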
Macro contexts (states) are as important as, if not more important than, micro contexts (counties) for life expectancy variation in the US.
2. Importance of non-geographical/spatial
contexts (e.g., schools, workplaces,
hospitals, social networks)
One context at a time
• In prior multilevel research, we have not given sufficient attention to all potentially relevant contexts for health
– Most multilevel research to date has focused on one setting: neighborhoods
– However, other settings may be relevant for health
– Studying one context at a time risks biased inferences
Empirical illustration: schools versus neighborhoods
Time spent in school:
• 180 days/year
• 6 or more hours/day
• 12-13 years
The idea of cross-classified structures
[Figure: students (☺) cross-classified at level 2 by School 1-4 and Neighborhood 1-4; students from the same neighborhood attend different schools, and each school draws students from several neighborhoods]
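Cross-classification means neither context nests within the other. The sketch below checks a list of student memberships for nesting in either direction, using hypothetical school and neighborhood labels. `nests_within` and `classify_structure` are illustrative names, not functions from any multilevel software.

```python
from collections import defaultdict


def nests_within(memberships):
    """True if every lower-level unit belongs to exactly one higher-level unit.

    memberships: iterable of (lower_unit, higher_unit) pairs, one per student.
    """
    parents = defaultdict(set)
    for lower, higher in memberships:
        parents[lower].add(higher)
    return all(len(units) == 1 for units in parents.values())


def classify_structure(students):
    """students: list of (school, neighborhood) pairs, one per student."""
    schools_in_neighborhoods = nests_within(students)
    neighborhoods_in_schools = nests_within((n, s) for s, n in students)
    if schools_in_neighborhoods or neighborhoods_in_schools:
        return "hierarchical"
    return "cross-classified"


# Hypothetical students: School 1 draws from two neighborhoods, and
# Neighborhood 1 sends students to two schools.
students = [("School 1", "Neighborhood 1"), ("School 1", "Neighborhood 2"),
            ("School 2", "Neighborhood 1"), ("School 2", "Neighborhood 2")]
print(classify_structure(students))
```

When the check returns "cross-classified", a model with only schools or only neighborhoods forces one context's variation to be absorbed by the other. That is the omitted-context problem the following tables illustrate.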
Study population
• In-home survey: 20,745 students nested in 132 schools
– Mean 125.4 students/school
• 2,142 neighborhoods (census tracts)
– Mean 7.3 students/tract
• 3 outcomes: tobacco use, depression, weight status
Number of days cigarettes smoked (last 30 days)
Variance: estimate (standard error)
Level          Hierarchical (school)   Hierarchical (neighborhood)   Cross-classified
Neighborhood   -                       4.7 (0.5)                     0.35 (0.3)
School         5.6 (0.6)               -                             5.54 (0.8)
Log odds of smoking
Variance: estimate (standard error)
Level          Hierarchical (school)   Hierarchical (neighborhood)   Cross-classified
Neighborhood   -                       0.25 (0.03)                   0.06 (0.02)
School         0.35 (0.04)             -                             0.36 (0.06)
Self-reported Body Mass Index
Variance: estimate (standard error)
Level          Hierarchical (school)   Hierarchical (neighborhood)   Cross-classified
Neighborhood   -                       0.73 (0.10)                   0.22 (0.08)
School         0.98 (0.11)             -                             0.87 (0.14)
CES-D Scale (Depression)
Variance: estimate (standard error)
Level          Hierarchical (school)   Hierarchical (neighborhood)   Cross-classified
Neighborhood   -                       1.84 (0.52)                   0.45 (0.34)
School         2.05 (0.56)             -                             1.69 (0.59)
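The contrast across the four outcomes can be summarised as the proportional change in each variance when the other context is added. The sketch below transcribes the point estimates from the school and neighborhood tables and ignores their standard errors. For each level it compares the single-context hierarchical model with the cross-classified model. `ESTIMATES` and `relative_drop` are illustrative names.

```python
# Variance point estimates (SEs omitted): for each level,
# (single-context hierarchical model, cross-classified model).
ESTIMATES = {
    "Days smoked": {"neighborhood": (4.7, 0.35), "school": (5.6, 5.54)},
    "Log odds of smoking": {"neighborhood": (0.25, 0.06), "school": (0.35, 0.36)},
    "BMI": {"neighborhood": (0.73, 0.22), "school": (0.98, 0.87)},
    "CES-D": {"neighborhood": (1.84, 0.45), "school": (2.05, 1.69)},
}


def relative_drop(hierarchical, cross_classified):
    """Proportional fall in a variance when the other context is added (negative = rise)."""
    return (hierarchical - cross_classified) / hierarchical


for outcome, levels in ESTIMATES.items():
    drops = {level: relative_drop(*pair) for level, pair in levels.items()}
    print(outcome, {level: f"{drop:.0%}" for level, drop in drops.items()})
```

On these point estimates, neighborhood variance falls by roughly 70% to 93% across the four outcomes, while school variance changes by at most about 18%. Some estimates are imprecise, such as 0.35 (0.3), so the percentages are indicative only.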
Concluding remarks
• Need critical re-thinking of ALL single-level epidemiological
analyses
– Dire warnings of “ecological fallacy”, but the science is
full of studies that risk “individualistic or atomistic
fallacy”
• Need to carefully consider the units of analysis in
epidemiological investigations and the problem of
“omitted” levels
• Need to consider multiple contexts simultaneously
(places/schools/worksites)