GROWTH MIXTURE MODELING
Shaunna L. Clark & Ryne Estabrook
Advanced Genetic Epidemiology Statistical Workshop
October 24, 2012
OUTLINE
• Growth Mixture Model
• Regime Switching
• Other Longitudinal Mixture Models
• OpenMx
  • GMM
  • How to extend GMM to FMM
  • How to get individual class probabilities from OpenMx
• Exercise
HOMOGENEITY VS. HETEROGENEITY
• Previous session showed a growth model where everyone follows the same mean trajectory of use
  • With some individual variations
• Is this an accurate representation of the development of substance abuse/dependence?
  • Probably not
[Figure: number of drinks per week (y-axis, 0 to 25) by age (x-axis, 12 to 24)]
GROWTH MIXTURE MODELING (GMM)
Muthén & Shedden, 1999; Muthén, 2001
• Setting
  • A single item measured repeatedly
    • Example: Number of substances currently using
  • Hypothesized trajectory classes
    • Non-users; Early initiate; Late, but consistent use
  • Individual trajectory variation within class
• Aims
  • Estimate trajectory shapes
    • Linear, quadratic, etc.
  • Estimate trajectory class probabilities
    • Proportion of sample in each trajectory class
  • Estimate variation within class
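LINEAR GROWTH MODEL DIAGRAM
[Path diagram: intercept factor I (mean mInt, variance σ2Int) with loadings of 1 on xT1 to xT5; slope factor S (mean mSlope, variance σ2Slope) with loadings 0, 1, 2, 3, 4 on xT1 to xT5; intercept-slope covariance σ2Int,Slope; residual variances σ2ε1 to σ2ε5]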
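LINEAR GMM MODEL DIAGRAM
[Path diagram: latent class variable C added to the linear growth model; intercept factor I (mean mInt, variance σ2Int) with loadings of 1 on xT1 to xT5; slope factor S (mean mSlope, variance σ2Slope) with loadings 0, 1, 2, 3, 4 on xT1 to xT5; intercept-slope covariance σ2Int,Slope; residual variances σ2ε1 to σ2ε5]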
GMM EXAMPLE PROFILE PLOT
[Figures: two example profile plots]
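GROWTH MIXTURE MODEL EQUATIONS
$$x_{itk} = \mathrm{Intercept}_{ik} + \lambda_{tk}\,\mathrm{Slope}_{ik} + \varepsilon_{itk}$$
$$\mathrm{Intercept}_{ik} = \alpha_{0k} + \zeta_{0ik}$$
$$\mathrm{Slope}_{ik} = \alpha_{1k} + \zeta_{1ik}$$
for individual i at time t in class k
$$\varepsilon_{itk} \sim N(0, \sigma)$$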
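The growth mixture equations imply a mean trajectory and a covariance matrix for each class. The base R sketch below shows that arithmetic for two classes. All numeric values (`alpha`, `phi`, `theta`) are made up for illustration and are not estimates from the workshop data.

```r
# Hypothetical illustration values; none come from the slides
nocc   <- 5                                    # measurement occasions xT1..xT5
lambda <- cbind(rep(1, nocc), 0:(nocc - 1))    # loadings: intercept = 1, slope = 0..4

# Class-specific growth factor means (alpha_0k, alpha_1k), made up for the sketch
alpha <- list(class1 = c(0.5, 0.1), class2 = c(2.0, 1.0))

# Growth factor (co)variances and residual variances, also made up
phi   <- matrix(c(1.0, 0.2,
                  0.2, 0.3), nrow = 2, byrow = TRUE)
theta <- diag(0.5, nocc)

# Model-implied mean trajectory for each class: lambda %*% alpha_k
impliedMeans <- sapply(alpha, function(a) as.vector(lambda %*% a))

# Model-implied covariance of the repeated measures within a class
impliedCov <- lambda %*% phi %*% t(lambda) + theta

impliedMeans
impliedCov
```

Here both classes share `phi` and `theta`; giving each class its own matrices would be the less restrictive choice.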
LATENT CLASS GROWTH MODEL (LCGA) VS. GMM
Nagin, 1999; Nagin & Tremblay, 1999
• Same as GMM except no residual variance on growth factors
  • No individual variation within class
  • Everyone in a class has the same trajectory
• LCGA is a special case of GMM
[Path diagram: latent class variable C; intercept factor I (mean mInt) with loadings of 1 and slope factor S (mean mSlope) with loadings 0, 1, 2, 3, 4 on xT1 to xT5]
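One way to see LCGA as a special case of GMM is to set the growth factor (co)variance matrix to zero and compare the implied within-class covariance. The base R sketch below does this; the variance values are hypothetical.

```r
nocc   <- 5
lambda <- cbind(rep(1, nocc), 0:(nocc - 1))   # intercept and slope loadings
theta  <- diag(0.5, nocc)                     # hypothetical residual variances

phiGMM  <- diag(c(1.0, 0.3))    # hypothetical growth factor variances (GMM)
phiLCGA <- matrix(0, 2, 2)      # LCGA: growth factor (co)variances fixed to zero

covGMM  <- lambda %*% phiGMM  %*% t(lambda) + theta
covLCGA <- lambda %*% phiLCGA %*% t(lambda) + theta

# Under LCGA only the residual variance remains within a class
isTRUE(all.equal(covLCGA, theta))
```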
CLASS ENUMERATION
• Still cannot use the LRT χ2 to compare models with different numbers of classes
• Information criteria: AIC (Akaike, 1974), BIC (Schwarz, 1978)
  • Penalize for number of parameters and sample size
  • Model with lowest value
• Interpretation and usefulness
  • Profile plot
  • Substantive theory
  • Predictive validity
  • Size of classes
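As a rough illustration of comparing class solutions by information criteria, the base R sketch below computes the textbook forms of AIC, BIC and sample-size adjusted BIC from -2 log-likelihood, number of parameters and sample size. `infoCriteria` is a hypothetical helper, not an OpenMx function, and the fit values are made up. Software may report differently scaled versions of these criteria, so the numbers need not match a given program's summary output.

```r
# Hypothetical helper (not part of OpenMx): textbook information criteria
infoCriteria <- function(minus2LL, np, N) {
  c(AIC   = minus2LL + 2 * np,
    BIC   = minus2LL + np * log(N),
    saBIC = minus2LL + np * log((N + 2) / 24))
}

# Made-up fits for 1- to 3-class models on N = 500 people
fits <- data.frame(classes  = 1:3,
                   minus2LL = c(5200, 5100, 5092),
                   np       = c(9, 14, 19))
ic <- t(mapply(infoCriteria, fits$minus2LL, fits$np, MoreArgs = list(N = 500)))
cbind(fits, ic)

# Number of classes preferred by BIC (lowest value)
bestBIC <- fits$classes[which.min(ic[, "BIC"])]
bestBIC
```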
ANALYSIS PLAN
• Determine growth function
• Determine number of classes
• Examine mean plots, with and without individual trajectories
• Determine whether growth factor variances:
  1. Need to be different from zero (GMM vs. LCGA)
  2. Should be held equal across classes
• Add covariates and distal outcomes
MODELING ZERO
HOW DO I MODEL ZEROS?
• Particularly relevant for substance abuse (or other outcomes with floor effects) to model non-users
• Some outcomes are right skewed so that there are many low values of the dependent variable
• However, some outcomes may have more zeros than expected
  • Example: Alcohol consumption; individuals who never drink
  • These individuals will always respond that they consumed zero drinks
WHEN YOU HAVE MORE ZEROS THAN EXPECTED
• In this case, zeros can be thought of as coming from two populations
  1. Structural zeros: zeros always occur in this population
     • Example: Never drinkers
  2. Others who produce zeros with some probability at the time of measurement
     • Example: Occasional drinkers
ONE OPTION
• Identify those individuals in the two populations
  • Structural zeros can then be eliminated
  • Those who could potentially produce zeros are retained
• But it can be very difficult to tell the difference between the two
• Or the population of interest is the entire population
  • i.e. both drinkers and non-drinkers
  • Stem issue
ZERO-CLASS
• Consider what you mean by a zero
  • Only non-users who have not initiated use, or also those who have initiated but tried only once?
• Fix growth factor means to zero
  • Start not using, stay not using
  • If you only fix the means, it will not be a pure zero-class
    • Likely to pick up people who have tried once or twice, but have not moved to regular use
• Fix growth factor means and (co)variances to zero
  • No variance in group
  • Sometimes can cause computation issues
REGIME SWITCHING
IS GMM A GOOD MODEL FOR SUBSTANCE USE DEVELOPMENT?
• Maybe not
• Assumes that individuals remain in same trajectory over time
  • Once a heavy smoker always a heavy smoker, even if you successfully quit for a period
• May not hold with many substance use outcomes
  • Examples: Switching from moderate to heavy drinking, changing from daily smoker to nonsmoker
INDIVIDUAL TRAJECTORY PLOTS
• Dolan et al. (2005) presented the regime switching model (RSM) as a way to get traction on this issue
DOLAN ET AL. REGIME SWITCHING MODEL (RSM)
• Regime = latent trajectory class
  • Ex: habitual moderate drinkers, heavy drinkers
• Regime Switch = move from one regime to another
  • Ex: A switch from moderate to heavy drinking
• Used latent Markov modeling for normally distributed outcomes (Schmittmann et al., 2005)
RSM WITH ORDINAL DATA
• Dolan RSM model was designed to be used with normally distributed data
• Substance abuse measures are often:
  • If continuous, not normally distributed
  • Count
    • Ex: # of drinking/using days per month
  • Categorical
    • Ex: Do you use X substance?
• As we’ve seen in previous talks, can use the Mehta, Neale and Flay (2004) method when we have ordinal data
APPLICATION: ADOLESCENT DRINKING
• From Dolan et al. paper
• Data: National Longitudinal Survey of Youth (NLSY97)
  • Years 1998, 1999, 2000, 2001
  • 737 white males and females
  • Age 13 or 14 in 1998
  • Indicated that they regularly drank alcohol
• Outcome: “In the past 30 days, on days you drank, how much did you drink?”
  • Made ordinal: 0 = 0-2 drinks; 1 = 3 drinks; 2 = 4-6 drinks; 3 = 7+ drinks
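The ordinal recoding of the drinking item can be sketched in base R with `cut()`. The `drinks` vector is hypothetical (not NLSY97 data) and the sketch assumes whole-number responses.

```r
# Hypothetical drinks-per-drinking-day responses (not NLSY97 data)
drinks <- c(0, 2, 3, 4, 6, 7, 12, NA)

# Cut points follow the slide: 0-2 -> 0, 3 -> 1, 4-6 -> 2, 7+ -> 3
# Intervals are closed on the right, e.g. (2, 3] holds the value 3
drinksOrd <- cut(drinks,
                 breaks = c(-Inf, 2, 3, 6, Inf),
                 labels = 0:3,
                 ordered_result = TRUE)

table(drinksOrd, useNA = "ifany")
```

An ordered factor is used because the category order carries information in a threshold model.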
MODEL SIMPLIFICATIONS FOR GMM & RSM APPLICATION
• Assumed linear model
  • Really quadratic
• No correlation between intercept and slope
  • Where you start drinking at the beginning of the study does not influence how your drinking develops during the study
• Transition probabilities equivalent across time
  • Probability of drinking between ages 12-13 is the same as between ages 20-21
COMPARING GMM AND RSM
Model          -2*LL   np   AIC   BIC     saBIC
3-Class GMM    -5077   18   995   -4199   -958
3-Class RSM    -4589   26   990   -4183   -955
3-CLASS GMM PROFILE PLOT
[Figure: profile plot, y-axis from -2 to 12. Classes: Growing (72%), Moderate (18%), Low (10%)]
3-CLASS RSM PROFILE PLOT
[Figure: profile plot, y-axis from -2 to 12, x-axis from 0 to 3. Classes: Moderate (12%), High (10%), Low (77%)]
GMM & RSM COMBINED PROFILE PLOT
[Figure: combined profile plot, y-axis from -2 to 12, x-axis from 0 to 3. Series: RSW-Moderate, RSW-High, RSW-Low, GMM-Growing, GMM-Moderate, GMM-Low]
RSM TRANSITION PROBABILITIES
Class      Low    Moderate   Heavy
Low        0.74   0.01       0.04
Moderate   0.17   0.67       0.22
Heavy      0.09   0.32       0.74
• Likely to stay in same class
• Low class unlikely to switch to other classes
• Most likely to switch between moderate and high drinking classes
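A quick check of the transition probabilities can be sketched in base R. The matrix below enters the three values that follow each class label on the slide as one row. Under that layout it is the columns that sum to 1, which suggests (an inference, not something stated on the slide) that each column gives the distribution of destination classes for one origin class. The starting proportions are hypothetical.

```r
classes <- c("Low", "Moderate", "Heavy")

# Values from the slide, three per class label, entered row by row
trans <- matrix(c(0.74, 0.01, 0.04,
                  0.17, 0.67, 0.22,
                  0.09, 0.32, 0.74),
                nrow = 3, byrow = TRUE,
                dimnames = list(to = classes, from = classes))

colSums(trans)   # each column sums to 1

# Hypothetical starting class proportions (an assumption for illustration only)
start <- c(Low = 0.6, Moderate = 0.3, Heavy = 0.1)

# Expected class proportions after one transition
nextProps <- as.vector(trans %*% start)
names(nextProps) <- classes
round(nextProps, 3)
```

Because the transition probabilities are assumed equal across time, applying `trans` repeatedly would give the expected proportions at later occasions.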
OTHER LONGITUDINAL MIXTURE MODELS
• Longitudinal Latent Class Analysis
  • Models patterns of change over time, rather than functional growth form
  • Lanza & Collins, 2006; Feldman et al., 2009
Number of parameters (11 variables, 3 classes, quadratic growth for LCGA):
                  LCA   LCGA
Binary item       35    11
3 category item   68    12
LATENT TRANSITION ANALYSIS
• Models transition from one state to another over time
• Unlike RSM, does not impose growth structure
• Ex: Drinking alcohol or not over time
• Graham et al., 1991; Nylund et al., 2006
• Script on the OpenMx forum
[Path diagram: latent class variable C1 measured by x1 to x5 at Time 1; latent class variable C2 measured by x1 to x5 at Time 2]
OTHER LONGITUDINAL MIXTURE MODELS
• Survival Mixture
  • Multiple latent classes of individuals with different survival functions
    • Ex: Different groups based on age of initiation
  • Kaplan, 2004; Masyn, 2003; Muthén & Masyn, 2005
OPENMX: GMM EXAMPLE
GMM_example.R
2 Classes
Intercept and Slope
MAKE OBJECTS FOR THINGS WE WILL REFERENCE THROUGHOUT THE SCRIPT
#Load OpenMx
library(OpenMx)
#Number of measurement occasions
nocc <- 4
#Number of growth factors (intercept, slope)
nfac <- 2
#Number of classes
nclass <- 2
#Number of thresholds; number of categories of the variable minus 1
nthresh <- 3
#Function that will help us label our thresholds
labFun <- function(name = "matrix", nrow = 1, ncol = 1) {
    matlab <- matrix(paste(rep(name, each = nrow * ncol), rep(rep(1:nrow), ncol),
        rep(1:ncol, each = nrow), sep = "_"))
    return(matlab)
}
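To see what the labelling helper produces, the base R sketch below calls it for 3 thresholds and 4 occasions. The function is repeated (in cleaned-up form) only so the block runs on its own; the reshaping at the end is an added convenience, not part of the script.

```r
# Cleaned-up copy of the slide's helper so this block is self-contained
labFun <- function(name = "matrix", nrow = 1, ncol = 1) {
  matlab <- matrix(paste(rep(name, each = nrow * ncol),
                         rep(rep(1:nrow), ncol),
                         rep(1:ncol, each = nrow), sep = "_"))
  return(matlab)
}

labs <- labFun("th", nrow = 3, ncol = 4)
dim(labs)         # a single column holding nrow * ncol labels
head(labs[, 1])   # labels run down the thresholds within each occasion

# Reshaped to thresholds (rows) by occasions (columns) for readability
labsWide <- matrix(labs, nrow = 3, ncol = 4)
labsWide
```

Each label has the form name_threshold_occasion, so a parameter can later be matched by its label.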
SETTING UP THE GROWTH PART OF THE MODEL
#Factor Loadings: intercept loadings fixed to 1, slope loadings fixed to 0, 1, ..., nocc-1
lamda <- mxMatrix("Full", nrow = nocc, ncol = nfac,
    values = c(rep(1, nocc), 0:(nocc - 1)), name = "lambda")
#Factor Variances
phi <- mxMatrix("Diag", nrow = nfac, ncol = nfac,
    free = TRUE, labels = c("vi", "vs"), name = "phi")
#Error terms
theta <- mxMatrix("Diag", nrow = nocc, ncol = nocc,
    free = TRUE, labels = paste("theta", 1:nocc, sep = ""),
    values = 1, name = "theta")
#Factor Means
alpha <- mxMatrix("Full", nrow = 1, ncol = nfac, free = TRUE,
    labels = c("mi", "ms"), name = "alpha")
GROWTH PART CONT’D
#Item Thresholds: first two fixed at 0 and 1, third freely estimated
thresh <- mxMatrix(type = "Full", nrow = nthresh, ncol = nocc,
    free = rep(c(FALSE, FALSE, TRUE), nocc),
    values = rep(c(0, 1, 1.1), nocc),
    lbound = .0001, labels = labFun("th", nthresh, nocc), name = "thresh")
#Model-implied covariance matrix and means
cov <- mxAlgebra(lambda %*% phi %*% t(lambda) + theta, name = "cov")
mean <- mxAlgebra(alpha %*% t(lambda), name = "mean")
#FIML objective; vector = TRUE returns a likelihood for each row of the data
obj <- mxFIMLObjective("cov", "mean", dimnames = names(ordgsmsData),
    thresholds = "thresh", vector = TRUE)
lgc <- mxModel("LGC", lamda, phi, theta, alpha, thresh, cov, mean, obj)
CLASS-SPECIFIC MODEL
#Copy the growth model, then give its parameters class-specific labels and starting values
class1 <- mxModel(lgc, name = "Class1")
class1 <- omxSetParameters(class1,
    labels = c("vi", "vs", "mi", "ms"),
    values = c(0.01, 0.05, 0.14, 0.32),
    newlabels = c("vi1", "vs1", "mi1", "ms1"))
As in LCA, repeat for all your latent classes. Just make sure to change the class number and starting values accordingly.
CLASS PROPORTIONS
#Fixing one probability to 1
classP <- mxMatrix("Full", nrow = nclass, ncol = 1, free = c(TRUE, FALSE),
    values = 1, lbound = 0.001,
    labels = c("p1", "p2"),
    name = "Props")
# rescale the class proportion matrix into a class probability matrix by dividing by their sum
# (done with a kronecker product of the class proportions and 1/sum)
classS <- mxAlgebra(Props %x% (1 / sum(Props)),
    name = "classProbs")
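The rescaling step can be tried outside OpenMx, since base R also provides the `%x%` Kronecker operator. The sketch below uses hypothetical unscaled proportions, with the second value playing the role of the proportion fixed to 1.

```r
# Hypothetical unscaled class proportions; the second stands in for the fixed value 1
Props <- matrix(c(2.5, 1), nrow = 2, ncol = 1)

# Same expression as in the mxAlgebra: Kronecker product with 1/sum
classProbs <- Props %x% (1 / sum(Props))

classProbs
sum(classProbs)
```

Estimating unscaled proportions and rescaling keeps every class probability positive and makes them sum to 1 without a constrained optimization.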
CLASS-SPECIFIC OBJECTIVES
# class-specific likelihoods weighted by the class probabilities
sumll <- mxAlgebra(-2 * sum(log(
    classProbs[1,1] %x% Class1.objective
    + classProbs[2,1] %x% Class2.objective)),
    name = "sumll")
# make an mxAlgebraObjective
obj <- mxAlgebraObjective("sumll")
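The quantity minimized by the algebra objective can be sketched in base R. The per-person likelihood vectors below are hypothetical stand-ins for what the class objectives return when `vector = TRUE`, and the class probabilities are made up.

```r
# Hypothetical per-person likelihoods under each class (one element per person)
lik1 <- c(0.020, 0.001, 0.015, 0.0005)
lik2 <- c(0.002, 0.030, 0.010, 0.0200)

# Hypothetical class probabilities (sum to 1)
classProbs <- c(0.6, 0.4)

# Mixture likelihood per person, then -2 times the sum of the logs
mixLik <- classProbs[1] * lik1 + classProbs[2] * lik2
sumll  <- -2 * sum(log(mixLik))
sumll
```

The weighting happens on the likelihood scale, before taking logs, which is why the class objectives need to return likelihood vectors rather than a single summed value.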
FINISH IT OFF
# put it all in a model
gmm <- mxModel("GMM 2 Class",
    mxData(observed = ordgsmsData, type = "raw"),
    class1, class2,
    classP, classS, sumll, obj)
# run it
gmmFit <- mxRun(gmm, unsafe = TRUE)
# run it again using starting values from previous run
summary(gmmFit2 <- mxRun(gmmFit))
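DIFFERENCE BETWEEN GMM AND FMM?
[Path diagram: Factor Mixture Model. Latent class variable C; factor F with variance σ2F; indicators x1 to x5]
[Path diagram: Intercept Only Growth Mixture Model. Latent class variable C; intercept factor I with variance σ2Int; loadings of 1 on xT1 to xT5]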
GMM AND FMM
• The difference between the two models shown on the previous slide is that the factor loadings are restricted to 1 in the GMM, whereas in the FMM they are freely estimated
• Adjust the script by letting the values of the lambda matrix be freely estimated
• To run the FMM on the previous slide,
  • similar to factor analysis, need to fix a parameter so the model is identified
  • Restrict the mean of two of the factors in two classes to set the metric of the factor
FMM & MEASUREMENT INVARIANCE
Clark et al. (In Press)
• In the previous version, the thresholds of the items were measurement invariant across classes
  • Classes were differentiated based on differences in the means and variances of the factor
• Can also have models where there are measurement non-invariant thresholds
  • Classes arising because of differences in item thresholds
  • Add thresholds to class-specific statements
  • Need to restrict the factor mean to zero because can’t identify both factor mean and item thresholds
HOW DO WE EXTRACT CLASS PROBABILITIES AND CALCULATE ENTROPY IN OPENMX?
Ryne Estabrook
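The slides do not show the extraction step itself. As a sketch of the arithmetic only, the base R block below assumes per-person class likelihoods are already available (for instance from class objectives run with `vector = TRUE`) and applies Bayes' rule to get posterior class probabilities, then a relative entropy summary. All input values are hypothetical.

```r
# Hypothetical per-person class likelihoods (rows = people, columns = classes)
lik <- cbind(class1 = c(0.020, 0.001, 0.015, 0.0005),
             class2 = c(0.002, 0.030, 0.010, 0.0200))
classProbs <- c(0.6, 0.4)   # hypothetical estimated class probabilities

# Bayes' rule: class probability times likelihood, normalized within person
joint <- sweep(lik, 2, classProbs, `*`)
post  <- joint / rowSums(joint)

# Most likely class for each person
modalClass <- max.col(post, ties.method = "first")

# Relative entropy: 1 - sum(-p * log(p)) / (N * log(K))
# Values near 1 indicate clear classification
N <- nrow(post)
K <- ncol(post)
entropy <- 1 - sum(-post * log(post)) / (N * log(K))

round(post, 3)
modalClass
entropy
```

The entropy line assumes no posterior probability is exactly zero; a real script would need to guard against `0 * log(0)`.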
OPENMX EXERCISE/HOMEWORK
• Adjust the GMM_example.R script to include:
  • A quadratic growth function
  • A third class
• Run it
  • Re-run it
• Interpret the output
  • What are the classes?