Multilevel Modeling Using Stata
Andrew Hicks
CCPR Statistics and Methods Core
Workshop based on the book: Multilevel and Longitudinal Modeling Using Stata (Second Edition) by Sophia Rabe-Hesketh and Anders Skrondal
Within-Subject Dependence
[Figure: Mini Wright measurements (vertical axis, 200 to 700) plotted against subject ID (horizontal axis, subjects 1 to 17), with separate markers for Occasion 1 and Occasion 2.]
Within-Subject Dependence: We can predict a subject's occasion 2 measurement if we know that subject's occasion 1 measurement.
Between-Subject Heterogeneity: There are large differences between subjects (compare subjects 9 and 15).
Within-subject dependence is due to between-subject heterogeneity.
Standard Regression Model
$y_{ij} = \beta + \xi_{ij}$
$y_{ij}$: measurement on occasion $i$ for subject $j$
$\beta$: population mean
$\xi_{ij}$: residuals (error terms), assumed independent over subjects and occasions
[Figure: the residuals $\xi_{ij}$ shown as deviations of individual measurements from the population mean $\beta$.]
This model ignores the information about within-subject dependence.
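Variance Component Model
$y_{ij} = \beta + \xi_{ij}$
Splitting the total residual as $\xi_{ij} = \zeta_j + \epsilon_{ij}$ gives
$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$
Random intercept $\zeta_j$: deviation of subject $j$'s mean from the overall mean $\beta$
Within-subject residual $\epsilon_{ij}$: deviation of observation $i$ from subject $j$'s mean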
[Figure: for subject $j$, the random intercept $\zeta_j$ is the distance from the overall mean $\beta$ to the subject mean $\beta + \zeta_j$; the within-subject residuals $\epsilon_{1j}$ and $\epsilon_{2j}$ are the deviations of the two measurements from $\beta + \zeta_j$.]
Variance Component Model
$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$
$\zeta_j \sim N(0, \psi)$
$\epsilon_{ij} \sim N(0, \theta)$
$\mathrm{Var}(y_{ij}) = \mathrm{Var}(\beta) + \mathrm{Var}(\zeta_j) + \mathrm{Var}(\epsilon_{ij}) = 0 + \psi + \theta = \psi + \theta$
Variance Component Model
$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$
Proportion of total variance due to subject differences:
$\rho = \dfrac{\mathrm{Var}(\zeta_j)}{\mathrm{Var}(y_{ij})} = \dfrac{\psi}{\psi + \theta}$
Intraclass correlation: within-cluster correlation
$\mathrm{Cor}(y_{1j}, y_{2j}) = \rho$
Random or Fixed Effect?
Since every subject has a different effect $\zeta_j$, we can think of subject as a categorical explanatory variable. Because the subject effects are treated as random, we have been using a random-effects model:
$y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \quad \zeta_j \sim N(0, \psi)$
What if we want each effect to be a fixed parameter for a specific subject? Then we would use a fixed-effects model:
$y_{ij} = \beta + \alpha_j + \epsilon_{ij}, \quad \sum_{j=1}^{J} \alpha_j = 0$
In Stata: xtreg wm, fe (the subject identifier must first be declared with xtset, or passed through the i() option)
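A hedged sketch of the fixed-effects fit on simulated data. Only the command `xtreg wm, fe` comes from the slide. The variable names `id` and `alpha`, the sample size, and the numeric values are assumptions made for illustration.

```stata
* Hypothetical simulated data: 17 subjects, 2 occasions each.
* Names and values are illustrative assumptions, not the workshop's data.
clear
set seed 2468
set obs 17
generate id = _n
generate alpha = rnormal(0, 100)            // subject-specific effect
expand 2                                    // two occasions per subject
generate wm = 450 + alpha + rnormal(0, 20)

xtset id                                    // declare the subject identifier
xtreg wm, fe                                // fixed-effects model from the slide
```

The line at the bottom of the `xtreg, fe` output reports an F test that all subject effects are zero, which is one way to check whether subject-specific intercepts are needed at all.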
Random or Fixed Effect?
Random-effects model:
use if interest concerns the population of clusters
“generalize the potential effect”
e.g. the nurse giving the drug
Fixed-effects model:
use if we are interested in the “effect” of the specific clusters in a particular dataset
“replicable in life”
e.g. the actual drug
Random Intercept Model with Covariates
Without covariates:
$y_{ij} = \beta + \xi_{ij}$
$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$
With covariates:
$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij} + \xi_{ij}$
$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij} + \zeta_j + \epsilon_{ij}$
$y_{ij} = (\beta_1 + \zeta_j) + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij} + \epsilon_{ij}$
The random intercept $\zeta_j$ is a random parameter: it is not estimated along with the fixed parameters $\beta_1, \ldots, \beta_p$. Instead, its variance $\psi$ is estimated together with the variance $\theta$ of $\epsilon_{ij}$.
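One way this could look in Stata, sketched on simulated data with a single covariate. The variable names (`id`, `x`, `y`) and all numeric values are hypothetical, and the `mixed` command assumes Stata 13 or later (earlier releases call it `xtmixed`).

```stata
* Hypothetical simulated data: 50 clusters with 4 observations each.
* Names and values are illustrative assumptions only.
clear
set seed 1357
set obs 50
generate id = _n
generate zeta = rnormal(0, 2)               // random intercept, psi = 4
expand 4                                    // four observations per cluster
generate x = rnormal()
generate y = 1 + 0.5*x + zeta + rnormal(0, 1)   // theta = 1

mixed y x || id:                            // random-intercept model with covariate x
```

The fixed part of the output lists the coefficient of `x` and the constant, while the random-effects part lists `var(_cons)`, the estimate of the random-intercept variance, and `var(Residual)`, the estimate of the residual variance.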
Ecological Fallacy
The ecological fallacy occurs when between-cluster relationships differ substantially from within-cluster relationships.
• It can be caused by cluster-level confounding.
For example, mothers who smoke during pregnancy may also adopt other behaviors such as drinking and poor nutritional intake, or have lower socioeconomic status and be less educated. These variables adversely affect birthweight and have not been adequately controlled for. In these cases the covariate is correlated with the error term (endogeneity).
• Because of this, the between-effect may be an overestimate of the true effect.
• In contrast, for within-effects each mother serves as her own control, so within-mother estimates may be closer to the true causal effect.
How to test for endogeneity?
Use the Hausman test to compare two alternative estimators of $\beta$: the fixed-effects (within) estimator and the random-effects estimator.
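A minimal sketch of the Hausman comparison in Stata, using simulated data in which the covariate is deliberately correlated with a cluster-level confounder. The variable names (`mother`, `smoke`, `birwt`, `u`) and all numeric values are hypothetical and are not taken from the workshop's data.

```stata
* Hypothetical simulated data: 200 mothers with 3 births each.
* Names and values are illustrative assumptions only.
clear
set seed 2024
set obs 200
generate mother = _n
generate u = rnormal(0, 1)                          // unobserved cluster-level confounder
expand 3                                            // three births per mother
generate smoke = (0.8*u + rnormal(0, 1)) > 0        // covariate correlated with u
generate birwt = 3500 - 200*smoke - 150*u + rnormal(0, 300)

xtset mother
xtreg birwt smoke, fe                               // within (fixed-effects) estimator
estimates store fixed
xtreg birwt smoke, re                               // random-effects estimator
estimates store random
hausman fixed random                                // consistent estimator listed first
```

A small p-value from `hausman` suggests that the two estimators differ systematically, which is consistent with the covariate being correlated with the random intercept.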
Random-coefficient model
We've already considered random intercept models, where the intercept is allowed to vary over clusters after controlling for covariates.
What if we would also like the coefficients (or slopes) to vary across clusters?
Models that involve both random intercepts and random slopes are called Random Coefficient Models.
Random-coefficient model
Random Intercept Model:
$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j + \epsilon_{ij}$
Random Coefficient Model:
$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \epsilon_{ij}$
$y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) x_{ij} + \epsilon_{ij}$
$\zeta_{1j}$: cluster-specific random intercept
$\zeta_{2j}$: cluster-specific random slope
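The contrast between the two models can be sketched in Stata on simulated data. The variable names (`id`, `x`, `y`, `zeta1`, `zeta2`), the stored-estimate names (`ri`, `rc`) and all numeric values are assumptions for illustration, and `mixed` assumes Stata 13 or later.

```stata
* Hypothetical simulated data: 50 clusters with 6 observations each.
* Names and values are illustrative assumptions only.
clear
set seed 97531
set obs 50
generate id = _n
generate zeta1 = rnormal(0, 2)              // cluster-specific random intercept
generate zeta2 = rnormal(0, 0.5)            // cluster-specific random slope
expand 6                                    // six observations per cluster
generate x = rnormal()
generate y = 1 + 0.5*x + zeta1 + zeta2*x + rnormal(0, 1)

mixed y x || id:                            // random-intercept model
estimates store ri
mixed y x || id: x, covariance(unstructured)    // random-coefficient model
estimates store rc
lrtest rc ri                                // do the random slopes improve the fit?
```

The `covariance(unstructured)` option lets the random intercept and random slope be correlated. Stata flags the likelihood-ratio test as conservative here because the null hypothesis puts a variance on the boundary of the parameter space.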