Analysis of Clustered and Longitudinal Data
Module 3: Linear Mixed Models (LMMs) for Clustered Data, Two Level, Part A
Biostat 512: Module 3A - Kathy Welch, Heidi Reichert

The Linear Mixed Model (LMM)
• A linear mixed model is a parametric model for a continuous outcome.
• The model is linear in the parameters.
• The model contains both fixed and random effects.
• LMMs can be used to analyze both clustered and longitudinal/repeated-measures data.
• This module covers the analysis of clustered data using LMMs; later modules cover the analysis of longitudinal and repeated-measures data using LMMs.

Data Example: Rat Pup Data
• 30 female rats were randomly assigned to one of three treatment groups: high dose, low dose, and control. The objective of the study was to compare the birth weights of pups from litters born to female rats that received the drug at high and low doses with the birth weights of pups from litters born to female rats that received the control treatment.
• Research question: Is there an effect of drug treatment (High, Low, Control) on birth weight?

Clustered Data Example: Rat Pup Data
• The design is unbalanced:
  – The number of rats receiving each treatment varies by treatment group (3 rats in the high-dose group died).
  – The number of rat pups per litter varies across litters.
• Variables include:
  – Litter (litter ID number)
  – Pup_ID (rat pup ID number)
  – Weight (birth weight of the rat pup: the outcome)
  – Sex (sex of the rat pup: female or male)
  – Treatment (dose: high, low, or control)

The Rat Pup Data is Multilevel
[Figure: two-level hierarchy. Level 2 (Litter): Litter 1, Litter 2, ... Level 1 (Rat Pup): Pup 11, Pup 21, ..., Pup n1 within Litter 1; Pup 12, Pup 22, ..., Pup n2 within Litter 2; ...]
• Level 1 variables: Birth Weight, Sex
• Level 2 variables: Treatment

Weights Vary Within and Between Litters
• Rat pup weights vary from pup to pup within the same litter.
• The average litter weight varies between litters.
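To make the within/between distinction concrete, here is a minimal Python sketch that splits the total sum of squares of birth weight into a between-litter part (litter means around the grand mean) and a within-litter part (pups around their own litter mean). The `litters` values are hypothetical toy numbers, not the rat pup data, and this is a descriptive decomposition, not the LMM itself.

```python
from statistics import mean

# Hypothetical toy data: litter ID -> pup birth weights (NOT the rat pup data).
litters = {
    1: [6.1, 6.4, 6.0, 6.3],
    2: [5.2, 5.5, 5.4],
    3: [6.8, 7.1, 6.9, 7.0, 6.7],
}

def decompose(litters):
    """Split the total sum of squares into between-litter and within-litter parts."""
    all_weights = [w for ws in litters.values() for w in ws]
    grand_mean = mean(all_weights)
    # Within: squared deviations of each pup from its own litter mean.
    ss_within = sum((w - mean(ws)) ** 2 for ws in litters.values() for w in ws)
    # Between: squared deviations of litter means from the grand mean,
    # weighted by the number of pups in the litter.
    ss_between = sum(len(ws) * (mean(ws) - grand_mean) ** 2 for ws in litters.values())
    ss_total = sum((w - grand_mean) ** 2 for w in all_weights)
    return ss_total, ss_between, ss_within

total, between, within = decompose(litters)
print(f"total={total:.3f} between={between:.3f} within={within:.3f}")
```

In this toy example most of the variation is between litters, which is the pattern the mixed model is designed to capture with a litter-level random effect.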
Weights are Correlated Within Litters
• The weights of pups within the same litter tend to be similar.
• For some litters, the pup weights lie entirely above or entirely below the overall average.

Summarize the Level 1 Covariate(s)
• The Level 1 covariate is sex.

  sex    |  Freq.  Percent    Cum.
  -------+-------------------------
  Female |    151    46.89   46.89
  Male   |    171    53.11  100.00
  -------+-------------------------
  Total  |    322   100.00

Summarize Weight by the Level 1 Covariate(s)
• Y is Weight.
• The Level 1 covariate is sex (female: 0 = male, 1 = female).

  Summary of weight by categories of female (Sex)
  female |   N      mean        sd     min    max
  -------+-----------------------------------------
  0      | 171  6.205322  .6741926    4.57   8.33
  1      | 151  5.940132  .5867458    3.68   7.73
  -------+-----------------------------------------
  Total  | 322  6.080963  .6474272    3.68   8.33

Visualize Weight by the Level 1 Covariate(s)
• Use boxplots to assess the effect of sex.

Summarize the Level 2 Covariates
• The Level 2 covariate is treatment group (counts are litters).

  treatment |  Freq.  Percent    Cum.
  ----------+-------------------------
  Control   |     10    37.04   37.04
  High      |      7    25.93   62.96
  Low       |     10    37.04  100.00
  ----------+-------------------------
  Total     |     27   100.00

Visualize Weight by the Level 2 Covariates
• Use boxplots to assess the effect of treatment.

The Linear Mixed Model (LMM) for Clustered Data
• LMMs for clustered data allow for both fixed and random effects.
• Fixed effects may be modeled at any level of the data.
  – In the rat pup data, we are interested in the fixed effects of sex and treatment.
  – Sex can vary from pup to pup within a litter; it is measured at Level 1.
  – Treatment is constant for all pups within the same litter; it is measured at Level 2.
• Random effects usually include a random intercept for each level of clustering, to account for possible correlation within clusters and to allow inference to the larger population of clusters.
  – In the rat pup model, we will include a random intercept for each litter.

The LMM for the Rat Pup Data
• We start with the simplest mixed model:
$$\text{Weight}_{ij} = \underbrace{\beta_0}_{\text{fixed}} + \underbrace{b_{0j} + \varepsilon_{ij}}_{\text{random}}$$
where
  – $i$ denotes a rat pup
  – $j$ denotes the litter
  – $\beta_0$ is the overall intercept term
  – $b_{0j}$ is the random deviation from the fixed intercept for litter $j$
  – $\varepsilon_{ij}$ is the random error for the $i$th rat pup in the $j$th litter

  weight |     Coef.  Std. Err.      z  P>|z|  [95% Conf. Interval]
  -------+-------------------------------------------------------------
  _cons  |  6.195284   .1090958  56.79  0.000   5.981461   6.409108

  Random-effects Parameters |  Estimate  Std. Err.  [95% Conf. Interval]
  --------------------------+---------------------------------------------
  litter: Identity          |
    var(_cons)              |  .3003704    .092285   .1644887   .5485019
  --------------------------+---------------------------------------------
  var(Residual)             |  .1963076    .016214   .1669676   .2308033

• The random portion of the model now has two parts: the cluster-specific random deviations (the $b_{0j}$) and the subject-within-cluster errors (the $\varepsilon_{ij}$).
• This LMM is commonly referred to as the variance components model, because it partitions the total variation in the outcome into between-cluster variation and within-cluster variation.
  – The variance of the random intercepts is the between-cluster variation, also referred to as the Level 2 variance.
  – The variance of the residuals is the within-cluster variation, also known as the Level 1 variance.
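As a worked illustration using the estimates from the output above, and assuming (as is standard for this model) that the $b_{0j}$ and $\varepsilon_{ij}$ are mutually independent with variances $\sigma^2_b$ and $\sigma^2$ (notation introduced here), the total variance of a pup's weight splits into the two components, and two pups from the same litter share only the litter-level part. The ratio $\rho$ is the intraclass correlation.

$$
\begin{aligned}
\operatorname{Var}(\text{Weight}_{ij}) &= \sigma^2_b + \sigma^2 \approx 0.3004 + 0.1963 = 0.4967 \\
\operatorname{Cov}(\text{Weight}_{ij}, \text{Weight}_{i'j}) &= \sigma^2_b \quad (i \neq i') \\
\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2} &\approx \frac{0.3004}{0.4967} \approx 0.605
\end{aligned}
$$

So, in the intercept-only model, roughly 60% of the variation in birth weight is between litters.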
The LMM for the Rat Pup Data
• We now add dummy variables for the Level 2 covariate, Treatment:
$$\text{Weight}_{ij} = \underbrace{\beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j}_{\text{fixed}} + \underbrace{b_{0j} + \varepsilon_{ij}}_{\text{random}}$$
where
  – $i$ denotes a rat pup
  – $j$ denotes the litter
  – $\beta_0$ is the overall intercept term and represents the mean for the Control group
  – $\beta_1$, $\beta_2$ are the differences in mean weight for the High and Low treatment groups, respectively, compared to Control
  – $b_{0j}$ is the random deviation from the treatment-specific intercept for litter $j$
  – $\varepsilon_{ij}$ is the random error for the $i$th rat pup in the $j$th litter

  weight     |     Coef.  Std. Err.      z  P>|z|  [95% Conf. Interval]
  -----------+-------------------------------------------------------------
  treatnum   |
    1        | -.3944372   .2695682  -1.46  0.143  -.9227811   .1339067
    2        | -.4287423   .2434727  -1.76  0.078  -.9059401   .0484555
             |
  _cons      |  6.453315   .1716384  37.60  0.000    6.11691    6.78972

  Random-effects Parameters |  Estimate  Std. Err.  [95% Conf. Interval]
  --------------------------+---------------------------------------------
  litter: Identity          |
    var(_cons)              |   .276991   .0905209   .1459796   .5255803
  --------------------------+---------------------------------------------
  var(Residual)             |  .1965504   .0162532   .1671422   .2311328

• Adding the Level 2 treatment dummies reduced the Level 2 (between-cluster) variance, from 0.300 to 0.277.
  – The variance of the random intercepts (the $b_{0j}$) is smaller because the systematic between-litter variation due to treatment has been removed.
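One common way to summarize this change is the proportional reduction in a variance component (sometimes called a level-specific pseudo-R²). The Python sketch below computes it from the Stata estimates shown for the intercept-only and treatment models; the helper name `proportional_reduction` is hypothetical, and this summary is offered as an illustration rather than as the course's prescribed measure.

```python
def proportional_reduction(var_before: float, var_after: float) -> float:
    """Proportional reduction in a variance component after adding covariates.

    Positive values mean the added covariates explained part of that component;
    values near zero (or slightly negative) mean essentially no change.
    """
    return (var_before - var_after) / var_before

# Estimates copied from the Stata output for the two models.
level2_null = 0.3003704   # var(_cons), intercept-only model
level2_treat = 0.276991   # var(_cons), model with treatment dummies
level1_null = 0.1963076   # var(Residual), intercept-only model
level1_treat = 0.1965504  # var(Residual), model with treatment dummies

print(f"Level 2 reduction: {proportional_reduction(level2_null, level2_treat):.3f}")
print(f"Level 1 reduction: {proportional_reduction(level1_null, level1_treat):.4f}")
```

Treatment, a Level 2 covariate, explains about 8% of the between-litter variance and leaves the Level 1 residual variance essentially unchanged, which is what we would expect for a covariate that is constant within litters.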
The LMM for the Rat Pup Data
• We now add a dummy variable for the Level 1 covariate, Sex:
$$\text{Weight}_{ij} = \underbrace{\beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + \beta_3 \text{Sex}_{ij}}_{\text{fixed}} + \underbrace{b_{0j} + \varepsilon_{ij}}_{\text{random}}$$
where
  – $i$ denotes a rat pup
  – $j$ denotes the litter
  – $\beta_0$ is the overall intercept term and represents the mean for males in the Control group
  – $\beta_1$, $\beta_2$ are the differences in mean weight for the High and Low treatment groups, respectively, compared to Control
  – $\beta_3$ is the effect of being female compared to male ($\text{Sex}_{ij} = 1$ for females, 0 for males)
  – $b_{0j}$ is the random deviation from the treatment-specific intercept for litter $j$
  – $\varepsilon_{ij}$ is the random error for the $i$th rat pup in the $j$th litter

  weight     |     Coef.  Std. Err.      z  P>|z|  [95% Conf. Interval]
  -----------+-------------------------------------------------------------
  treatnum   |
    1        |  -.354683   .2893063  -1.23  0.220  -.9217129   .2123469
    2        | -.3747049   .2617241  -1.43  0.152  -.8876746   .1382648
             |
  1.female   | -.3612726   .0477986  -7.56  0.000  -.4549561  -.2675891
  _cons      |  6.606246   .1856211  35.59  0.000   6.242436   6.970057

  Random-effects Parameters |  Estimate  Std. Err.  [95% Conf. Interval]
  --------------------------+---------------------------------------------
  litter: Identity          |
    var(_cons)              |  .3259097   .1037444   .1746388   .6082104
  --------------------------+---------------------------------------------
  var(Residual)             |  .1636033   .0135447   .1390981   .1924257

• Adding the Level 1 dummy for sex reduced the Level 1 (within-cluster) variance, from 0.197 to 0.164.
  – The residual variance is smaller because the systematic within-litter variation due to sex has been removed.

The LMM Accounts for Correlation
• Given the $b_{0j}$'s, the $\varepsilon_{ij}$'s within a cluster are independent. Because all pups in litter $j$ share the same $b_{0j}$, their weights are nevertheless correlated when the $b_{0j}$'s are not conditioned on.

The Linear Mixed Model (LMM) for Clustered Data
• LMMs for clustered data generally include both fixed and random effects.
  – We include random intercepts for each level of clustering.
• In LMMs, the random part of the model involves two parts: the $b_{0j}$'s and the $\varepsilon_{ij}$'s.
  – The variance of the random intercepts (the $b_{0j}$'s) quantifies the between-cluster variation in the outcome.
  – The residual variance (the variance of the $\varepsilon_{ij}$'s) quantifies the within-cluster variation in the outcome.

Data Setup for LMM Analysis: Long Form
• In long form, each row is one rat pup (Level 1 unit), and litter-level variables such as Treatment are repeated on every row for pups in the same litter.

Lab Example: Rat Pup Data