Multilevel Modeling using Stata

Andrew Hicks
CCPR Statistics and Methods Core

Workshop based on the book: Multilevel and Longitudinal Modeling Using Stata (Second Edition) by Sophia Rabe-Hesketh and Anders Skrondal

Within-Subject Dependence

[Figure: Mini Wright measurements (vertical axis, 200 to 700) plotted against subject ID (1 to 17), with separate markers for occasion 1 and occasion 2.]

Within-Subject Dependence: We can predict the occasion 2 measurement if we know the subject’s occasion 1 measurement.
Between-Subject Heterogeneity: There are large differences between subjects (compare subjects 9 and 15).
Within-subject dependence is due to between-subject heterogeneity.

Standard Regression Model

$y_{ij} = \beta + \xi_{ij}$

$y_{ij}$: measurement on occasion $i$ for subject $j$
$\beta$: population mean
$\xi_{ij}$: residuals (error terms), independent over subjects and occasions

This model clearly ignores information about within-subject dependence.

Variance Component Model

The residual $\xi_{ij}$ of the standard regression model $y_{ij} = \beta + \xi_{ij}$ is split into two components:

$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$

$\zeta_j$: random intercept, the deviation of subject $j$’s mean from the overall mean $\beta$
$\epsilon_{ij}$: within-subject residual, the deviation of observation $i$ from subject $j$’s mean

[Figure: for subject $j$, the subject mean $\beta + \zeta_j$ is offset by $\zeta_j$ from the overall mean $\beta$, and the two observations deviate from the subject mean by $\epsilon_{1j}$ and $\epsilon_{2j}$.]

Distributional assumptions:

$\zeta_j \sim N(0, \psi)$
$\epsilon_{ij} \sim N(0, \theta)$

Because $\beta$ is a fixed constant, $\mathrm{Var}(\beta) = 0$, so

$\mathrm{Var}(y_{ij}) = \mathrm{Var}(\beta) + \mathrm{Var}(\zeta_j) + \mathrm{Var}(\epsilon_{ij}) = \psi + \theta$

Proportion of total variance due to subject differences:

$\frac{\mathrm{Var}(\zeta_j)}{\mathrm{Var}(y_{ij})} = \frac{\psi}{\psi + \theta} = \rho$

Intraclass correlation (within-cluster correlation):

$\mathrm{Cor}(y_{1j}, y_{2j}) = \rho$

Random or Fixed Effect?

Since every subject has a different effect $\zeta_j$, we can think of subjects as categorical explanatory variables.
Since the effect of each subject is random, we have been using a random effect model:

$y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \quad \zeta_j \sim N(0, \psi)$

What if we want to fix our model so that each effect is for a specific subject? Then we would use a fixed effect model:

$y_{ij} = \beta + \alpha_j + \epsilon_{ij}, \quad \sum_{j=1}^{J} \alpha_j = 0$

In Stata:
. xtreg wm, fe

Random effect model: appropriate if the interest concerns the population of clusters (“generalize the potential effect”), e.g. the nurse giving the drug.
Fixed effect model: appropriate if we are interested in the “effect” of the specific clusters in a particular dataset (“replicable in life”), e.g. the actual drug.

Random Intercept Model with Covariates

Without covariates:

$y_{ij} = \beta + \xi_{ij}$
$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$

With covariates:

$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij} + \xi_{ij}$
$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij} + \zeta_j + \epsilon_{ij}$
$y_{ij} = (\beta_1 + \zeta_j) + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij} + \epsilon_{ij}$

$\zeta_j$ is a random parameter: it is not estimated along with the fixed parameters $\beta_1, \dots, \beta_p$. Instead, its variance $\psi$ is estimated together with the variance $\theta$ of $\epsilon_{ij}$.

Ecological Fallacy

The ecological fallacy occurs when between-cluster relationships differ substantially from within-cluster relationships.

• It can be caused by cluster-level confounding. For example, mothers who smoke during pregnancy may also adopt other behaviors such as drinking and poor nutritional intake, or have lower socioeconomic status and be less educated. These variables adversely affect birthweight and have not been adequately controlled for. In these cases the covariate is correlated with the error term (endogeneity).
• Because of this, the between-effect may be an overestimate of the true effect.
• In contrast, for within-effects each mother serves as her own control, so within-mother estimates may be closer to the true causal effect.

How to test for endogeneity? Use the Hausman test to compare two alternative estimators of $\beta$.

Random-coefficient model

We’ve already considered random intercept models, where the intercept is allowed to vary over clusters after controlling for covariates.
What if we would also like the coefficients (or slopes) to vary across clusters? Models that involve both random intercepts and random slopes are called random coefficient models.

Random intercept model:

$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j + \epsilon_{ij}$

Random coefficient model:

$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \epsilon_{ij}$
$y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) x_{ij} + \epsilon_{ij}$

$\zeta_{1j}$: cluster-specific random intercept
$\zeta_{2j}$: cluster-specific random slope
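
One consequence of the random coefficient model is that the residual variance is no longer constant in $x_{ij}$. The sketch below works this out under assumptions that the slides do not state: $\zeta_{1j}$ and $\zeta_{2j}$ have zero means, variances $\psi_{11}$ and $\psi_{22}$, and covariance $\psi_{21}$, and both are independent of $\epsilon_{ij}$, which has variance $\theta$ as in the variance component model. The symbols $\psi_{11}$, $\psi_{22}$, $\psi_{21}$ and the label $\xi_{ij}$ for the total residual are notation introduced here for illustration.

```latex
% Total residual of the random coefficient model (assumed notation)
\xi_{ij} = \zeta_{1j} + \zeta_{2j} x_{ij} + \epsilon_{ij}

% Assumed: Var(zeta_1j) = psi_11, Var(zeta_2j) = psi_22,
% Cov(zeta_1j, zeta_2j) = psi_21, Var(epsilon_ij) = theta,
% and epsilon_ij independent of (zeta_1j, zeta_2j)
\mathrm{Var}(\xi_{ij} \mid x_{ij})
  = \psi_{11} + 2 \psi_{21} x_{ij} + \psi_{22} x_{ij}^{2} + \theta

% Covariance between two occasions i and i' in the same cluster j
% (the epsilon terms drop out because they are independent across occasions)
\mathrm{Cov}(\xi_{ij}, \xi_{i'j} \mid x_{ij}, x_{i'j})
  = \psi_{11} + \psi_{21} (x_{ij} + x_{i'j}) + \psi_{22} x_{ij} x_{i'j}
```

Under these assumptions, both the total variance and the within-cluster correlation depend on the covariate values, unlike the random intercept model, where the within-cluster correlation is the single constant $\rho$.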