Recent advances in group-based trajectory modeling

An Overview of Two Recent Advances in Trajectory Modeling
Daniel S. Nagin
Combining Propensity Score Matching and Group-Based Trajectory Analysis in an Observational Study (Psychological Methods, 2007; also Developmental Psychology, 2008)
Amelia Haviland, RAND Corporation
Daniel S. Nagin, Carnegie Mellon University
Paul R. Rosenbaum, University of Pennsylvania
Problem Setting
• Inferring the “treatment (aka causal) effect” of an important life event or a therapeutic intervention with non-experimental longitudinal data
• Overcoming a severe selection problem whereby treatment probability depends heavily upon the prior trajectory of the outcome: boys with high prior violence levels are more likely to join gangs
• Dealing with feedback effects: violence and gang membership may be mutually reinforcing
• The treatment effect may also depend upon the prior trajectory of the outcome
• Measuring the effect of gang membership is a prototypical example of a large set of important inference problems in psychopathology
  – Divorce and depression
  – Drug treatment and drug abuse
Montreal Data
• 1037 Caucasian, francophone, nonimmigrant males
• First assessment at age 6 in 1984
• Most recent assessment at age 17 in 1995
• Data collected on a wide variety of individual, familial, and parental characteristics, including self-reported violent delinquency and gang membership from age 11 to 17
• Prototypical modern longitudinal dataset: rich measurements of the characteristics and behaviors of participants

Annual Assessments of Violent Delinquency and Gang Membership
• Violent delinquency: frequency in the last year of:
  – Gang fighting
  – Fist fighting
  – Carrying/using a deadly weapon
  – Threatening or attacking someone
  – Throwing an object at someone
• Gang membership: “In the past year, have you been part of a group or gang that committed reprehensible acts?”
The Selection Problem: Violent Delinquency from Age 11 to 14 of Gang Members at Age 14
[Figure: violent delinquency at ages 11, 12, 13, and 14 for gang members at age 14 vs. non-gang members at age 14]
Cochran’s Advice on How to Proceed: “How should the study be conducted if it were possible to do it by controlled experimentation?”
• Well-defined treatment: what is the effect of first-time gang membership at age 14 on violence at age 14 and beyond?
• Good baseline measurements on the treated (gang members at 14) and controls (non-gang members at 14): provided by trajectory groups
• Randomize treatment to create comparability (i.e., balance) on all covariates between treated and controls: provided by propensity score matching
Treatment, Covariates, & Outcomes
• Time = −: Baseline covariates, fixed and time-varying, including violence prior to age 14
• Time = 0: Treatment assignment: first-time gang status at 14
• Time = +: Responses to gang status at 14 (outcomes)
  – Outcomes: violence at 14 and beyond
  – “Treatment compliance”: gang status at 15 and beyond
Baseline Measurements: Trajectories of Violent Delinquency from Age 11 to 13 for Sub-sample with No Gang Involvement over this Period
[Figure: delinquent violence by age (11 to 13) for three trajectory groups. Annotations: 31% of Chronics join gangs at age 14; 15% of Decliners join gangs at age 14; 7% of Lows join gangs at age 14]
Trajectory Groups as Baseline Measurements
• Allow a test of whether the facilitation effect of gang membership depends on developmental history
• Aid in controlling for selection effects by comparing gang and non-gang members with comparable histories of violence that are uncontaminated by the effects of prior gang membership
Creating balance with propensity score matching
• The propensity score relates the probability of treatment to specified covariates
• By matching on the propensity score, treated and controls are balanced on the covariates in the propensity score
• Imbalance may remain on other covariates
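In practice the propensity score is usually estimated with a logistic regression of treatment status on the covariates. The slides do not say which estimator was used, so the following is only a minimal pure-Python sketch under that assumption; `propensity` and `fit_logistic` are hypothetical helper names, and the learning rate, iteration count, and toy data are arbitrary.

```python
import math


def propensity(beta, x):
    """Logit propensity score P(treated | x) for beta = [intercept, b1, ..., bk]."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, z, lr=0.1, n_iter=5000):
    """Maximum-likelihood logit fit by plain gradient ascent on the mean log-likelihood.

    X: list of covariate vectors (e.g. prior violence, risk factors).
    z: 1 if treated (joined a gang at 14), else 0.
    """
    k = len(X[0])
    n = len(X)
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, zi in zip(X, z):
            resid = zi - propensity(beta, xi)  # score contribution of one subject
            grad[0] += resid
            for m in range(k):
                grad[m + 1] += resid * xi[m]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Invented toy data: one covariate (a prior-violence score) and gang joining.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
z = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(X, z)
scores = [propensity(beta, x) for x in X]
```

With this toy data the fitted slope is positive, so subjects with more prior violence receive higher scores, mirroring the selection pattern described earlier.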
Creating balance: match first-time gang joiners at 14 with one or more “comparable” non-gang joiners
• Match within trajectory group
  – Group-specific treatment effect estimates
  – Helps to balance prior history of violence
• Within-group matching based on:
  – Propensity score for gang membership at age 14
  – Covariates in the propensity score include:
    – Self-reported violence at ages 10-13 plus teacher and peer ratings of aggression
    – Posterior probability of trajectory group membership
    – Many risk factors for violence and gang membership, such as low IQ, having a teen mother, hyperactivity, and opposition
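The slides do not spell out the matching algorithm, so the following is only a simple greedy stand-in for within-group matching. Each gang joiner is paired with up to `max_controls` not-yet-used non-joiners from the same trajectory group whose propensity scores lie within `caliper`. The function name, the dict layout, and both tuning parameters are hypothetical. Because set sizes depend on the caliper and the pool of controls, they vary across treated subjects, loosely as in the study.

```python
def match_within_groups(units, max_controls=3, caliper=0.05):
    """Greedy nearest-neighbour matching on the propensity score within trajectory groups.

    units: list of dicts with keys 'id', 'group' (trajectory group), 'treated' (bool), 'ps' (score).
    Each control is used at most once. Returns {treated_id: [control_id, ...]}.
    """
    used = set()
    matches = {}
    # Treated subjects with the highest scores go first: close controls are scarcest for them.
    for t in sorted((u for u in units if u["treated"]), key=lambda u: -u["ps"]):
        candidates = [
            c for c in units
            if not c["treated"]
            and c["group"] == t["group"]  # compare only within the same trajectory group
            and c["id"] not in used
            and abs(c["ps"] - t["ps"]) <= caliper
        ]
        candidates.sort(key=lambda c: abs(c["ps"] - t["ps"]))
        chosen = candidates[:max_controls]
        used.update(c["id"] for c in chosen)
        matches[t["id"]] = [c["id"] for c in chosen]
    return matches
```

Restricting the candidates to the treated subject's own group is what makes the resulting contrasts group-specific.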
Twelve Covariates Comparing Gang
Joiners at 14 with Potential Controls
Propensity for gang joining by trajectory
group (before matching)
Matching Strategy
• 21 gang joiners in the low trajectory matched with 105 (out of 276) non-gang joiners from that trajectory
  – Number of matches ranges from 2 to 7
• 38 gang joiners in the declining trajectory matched with 114 (out of 216) non-gang joiners from that trajectory
  – Number of matches ranges from 1 to 6
Balance before and after matching for
selected variables
Standardized differences across the 15
variables used in matching
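The slides report standardized differences without giving the formula. A common definition divides the difference in means by the square root of the average of the treated and control variances (often multiplied by 100). The sketch below assumes that definition and, for simplicity, ignores any weighting that matched sets of varying size might call for.

```python
import math
import statistics


def standardized_difference(treated, controls):
    """(mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2), using sample variances; unweighted."""
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(controls)) / 2)
    return (statistics.mean(treated) - statistics.mean(controls)) / pooled_sd
```

Computing this for each matching variable before and after matching gives the kind of comparison summarized on this slide.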
“Intent to Treat” Effects of First-time Gang Membership at 14 on Violence at Ages 14 to 17
Age | Group | Significance level
14 | Low | .008
14 | Declining | .033
15 | Low | .034
15 | Declining | .086
16 | Low | .044
16 | Declining | .753
17 | Low | .070
17 | Declining | .530
Effects of First-time Gang Membership at 14 on Violence at 14 to 17
[Figure, Low Trajectory: violence at ages 14 to 17 by gang status at age 14 (gang member vs. non-gang member at age 14)]
[Figure, Declining Trajectory: violence at ages 14 to 17 by gang status at age 14 (gang member vs. non-gang member at age 14)]
Concluding Observations on Strengths of this Approach
• Trajectory group specific effects
• Transparency
• Weaknesses open to view
• Keeping time in order
Extending Group-Based Trajectory Modeling to Account for Subject Attrition
Daniel S. Nagin, Carnegie Mellon University
Bobby Jones, Carnegie Mellon University
Amelia Haviland, RAND Corporation
Trajectories Based on 1979 Dutch Conviction Cohort
[Figure: mean number of convictions per year by age (12 to 72) for four groups: SO (70.9%), LR-D (21.7%), MR-D (5.7%), HR-P (1.6%)]
Missing Data
• Two types
  – Intermittent missing assessments (y1, y2, ., y4, ., y6)
  – Subject attrition, where assessments cease starting in period τ (y1, y2, y3, ., ., .)
• In the standard model, both types are assumed to be missing at random
• Model extension designed to account for potentially non-random subject attrition
• No change in the model for intermittent missing assessments
Some Notation
• $T$ = number of assessment periods
• $\tau_i$ = period $t$ in which subject $i$ drops out
• $\theta_t^j$ = probability of dropout in group $j$ in period $t$
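To make the two kinds of missingness concrete, this sketch derives $\tau_i$ from one subject's response vector, with `None` marking a missing assessment. Treating a subject who never drops out as having $\tau_i = T + 1$ is a convention assumed here, not stated on the slide, and the helper names are hypothetical.

```python
def dropout_period(y):
    """tau_i: first period (1-based) from which every remaining assessment is missing.

    Returns len(y) + 1 (i.e. T + 1) when the last assessment is observed (no dropout).
    """
    tau = len(y) + 1
    for t in range(len(y), 0, -1):
        if y[t - 1] is None:
            tau = t
        else:
            break
    return tau


def intermittent_missing(y):
    """Missing periods before dropout: left to the missing-at-random treatment."""
    tau = dropout_period(y)
    return [t for t in range(1, tau) if y[t - 1] is None]
```

For the slide's two patterns, (y1, y2, ., y4, ., y6) has no dropout and intermittent gaps in periods 3 and 5, while (y1, y2, y3, ., ., .) has $\tau_i = 4$.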
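Probability of Dropout in Period t (group j)
Period | Probability of dropout
1 | 0
2 | $\theta_2^j$
3 | $(1-\theta_2^j)\,\theta_3^j$
4 | $(1-\theta_2^j)(1-\theta_3^j)\,\theta_4^j$
⋮ | ⋮
T | $\prod_{t=2}^{T-1}(1-\theta_t^j)\,\theta_T^j$
No dropout | 1 – all the above probabilities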
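Reading $\theta_t^j$ as the probability of dropping out in period $t$ for a group-$j$ member still participating in period $t-1$, each row of the table is a running survival product times the current dropout probability. A minimal sketch under that reading (the function name is hypothetical):

```python
def dropout_distribution(theta):
    """theta[t - 1] = dropout probability in period t given participation through t - 1 (theta[0] = 0).

    Returns [P(dropout in period 1), ..., P(dropout in period T), P(no dropout)].
    """
    probs = []
    still_in = 1.0  # probability of still participating at the start of period t
    for th in theta:
        probs.append(still_in * th)
        still_in *= 1.0 - th
    probs.append(still_in)  # "1 - all the above probabilities"
    return probs
```

The entries always sum to one, which is why the no-dropout row can be written as one minus the others.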
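The Dropout Extended Likelihood for Group j
$$P^j(Y_i \mid age_i, j; \beta^j, \theta^j) = \left[\prod_{t=1}^{\tau_i - 1} p^j(y_{it} \mid w_{it} = 0, age_i, j; \beta^j)\,(1 - \theta_t^j)\right]\theta_{\tau_i}^j$$
where $\theta_{\tau_i}^j \equiv 1$ for a subject who never drops out ($\tau_i = T + 1$).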
Specification of $\theta_t^j$
• Binary logit model
• Predictor variables
  – Fixed characteristics of i, $x_i$
  – Prior values of the outcome, $y_{i,t-1}, y_{i,t-2}, \ldots$
• If trajectory group were known, dropout within trajectory group j would be “exogenous”, or “ignorable conditional on observed covariates”
• Because trajectory group is latent, at the population level dropout is “non-ignorable”
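Putting the last three slides together, the sketch below evaluates the dropout-extended likelihood for one subject in one group, with $\theta_t^j$ given by a binary logit in a fixed covariate $x_i$ and the prior outcome $y_{i,t-1}$. It then forms posterior group probabilities $\pi_j P^j(Y_i) / \sum_k \pi_k P^k(Y_i)$. The Poisson outcome model with a log-linear trajectory, the single lagged outcome in the logit, and all coefficient values are assumptions made for illustration only.

```python
import math


def dropout_hazard(gamma_j, x_i, y_prev):
    """theta_t^j = logistic(g0 + g1 * x_i + g2 * y_{i,t-1}); gamma_j holds hypothetical coefficients."""
    g0, g1, g2 = gamma_j
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * x_i + g2 * y_prev)))


def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)


def group_likelihood(y_obs, ages, beta_j, gamma_j, x_i, T):
    """Dropout-extended likelihood P^j(Y_i) for one subject and one group j.

    y_obs: outcomes observed in periods 1..tau_i - 1 (at least one, no intermittent gaps),
    so tau_i = len(y_obs) + 1 and tau_i = T + 1 means no dropout.
    Assumed outcome model: Poisson with lambda_t = exp(b0 + b1 * age_t).
    """
    tau = len(y_obs) + 1
    like = 1.0
    for t in range(1, tau):
        lam = math.exp(beta_j[0] + beta_j[1] * ages[t - 1])
        # No dropout is possible in period 1, so theta_1^j = 0.
        theta = 0.0 if t == 1 else dropout_hazard(gamma_j, x_i, y_obs[t - 2])
        like *= poisson_pmf(y_obs[t - 1], lam) * (1.0 - theta)
    if tau <= T:
        # Dropout in period tau depends on the last observed outcome y_{i,tau-1}.
        like *= dropout_hazard(gamma_j, x_i, y_obs[tau - 2])
    return like


def posterior_group_probs(pi, likelihoods):
    """P(j | Y_i) = pi_j P^j(Y_i) / sum_k pi_k P^k(Y_i)."""
    weighted = [p * L for p, L in zip(pi, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Same outcome trajectory in both groups; only the dropout logit differs.
ages = [1, 2, 3, 4]
L_low = group_likelihood([1, 2], ages, (0.0, 0.0), (-2.0, 0.0, 0.0), x_i=0.0, T=4)
L_high = group_likelihood([1, 2], ages, (0.0, 0.0), (0.0, 0.0, 0.0), x_i=0.0, T=4)
post = posterior_group_probs([0.5, 0.5], [L_low, L_high])  # about [0.30, 0.70]
```

Because the two groups differ only in their dropout logits, a subject who drops out in period 3 is pulled toward the high-dropout group: within a known group, dropout depends only on observed quantities, but with latent groups it carries information about group membership.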
Simulation Objectives
• Examine the effects of differential attrition rates across groups that are not initially well separated
• Examine the effects of using model estimates to make population-level projections
Simulation 1: Two-Group Model with Different Dropout Probabilities and Small Initial Separation
[Figure: panels plotting E(y) against time for the two groups; annotations include “No dropout” and “Slope = .5”]
Simulation Results: Group 1 and Group 2 Initially Not Well Separated
Group 1 per-period dropout probability | Expected Group 1 assessment periods | Probability of Group 1 dropout on or before period 6 | Model without dropout: Group 1 prob. est. (π1) | Model without dropout: % bias | Model with dropout: Group 1 prob. est. (π1) | Model with dropout: % bias | Model with dropout: dropout prob. est.
0 | 6.0 | 0 | .200 | 0.0 | .200 | 0.0 | .000
.05 | 5.3 | .226 | .171 | -14.5 | .199 | -0.5 | .051
.10 | 4.7 | .410 | .146 | -27.0 | .199 | -0.5 | .099
.15 | 4.2 | .556 | .122 | -39.0 | .200 | 0.0 | .150
.20 | 3.7 | .672 | .100 | -50.0 | .199 | -0.5 | .199
.25 | 3.3 | .762 | .079 | -60.5 | .200 | 0.0 | .250
.30 | 2.9 | .832 | .061 | -69.5 | .199 | -0.5 | .301
.35 | 2.6 | .884 | .046 | -77.0 | .199 | -0.5 | .350
.40 | 2.4 | .922 | .034 | -83.0 | .199 | -0.5 | .398
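The expected-assessment, cumulative-dropout, and percent-bias columns can be recomputed directly. The sketch assumes T = 6 assessment periods, no dropout in period 1 (as in the dropout table earlier), a constant Group 1 dropout probability in periods 2 to 6, and a true Group 1 probability of .200 (the value recovered when there is no dropout).

```python
def expected_assessments(theta, T=6):
    """Expected number of periods a Group 1 member is assessed: sum of (1 - theta)^(t - 1)."""
    return sum((1 - theta) ** (t - 1) for t in range(1, T + 1))


def prob_dropout_by(theta, T=6):
    """Probability of having dropped out on or before period T: 1 - (1 - theta)^(T - 1)."""
    return 1 - (1 - theta) ** (T - 1)


def percent_bias(estimate, true_value=0.200):
    """Percent bias of an estimated group probability relative to its true value."""
    return 100 * (estimate - true_value) / true_value


for theta in (0.0, 0.05, 0.10, 0.20, 0.40):
    print(f"{theta:.2f}  {expected_assessments(theta):.1f}  {prob_dropout_by(theta):.3f}")
```

The loop prints values that agree with the corresponding rows of the table, and, for example, percent_bias(.171) gives -14.5, the first nonzero entry in the model-without-dropout bias column.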
Simulation 2: Projecting to the Population Level from Model Parameter Estimates
Chinese Longitudinal Healthy Longevity Survey (CLHLS)
• Randomly selected counties and cities in 22 provinces
• 4 waves, 1998 to 2005
• 80 to 105 years old at baseline
• 8,805 individuals at baseline
• 68.9% had died by 2005
• Analyzed the cohort aged 90 to 93 in 1998
Activities of Daily Living
• On your own and without assistance, can you:
  – Bathe
  – Dress
  – Toilet
  – Get up from bed or chair
  – Eat
• Disability measured by the count of items where assistance is required
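As a concrete reading of the disability measure, the ADL count is the number of the five activities for which help is needed, so it ranges from 0 to 5. The item keys below are hypothetical labels for the five questions.

```python
# Hypothetical keys for the five ADL questions.
ADL_ITEMS = ("bathe", "dress", "toilet", "get_up", "eat")


def adl_count(needs_assistance):
    """Number of ADL items (0 to 5) for which the respondent needs assistance."""
    return sum(1 for item in ADL_ITEMS if needs_assistance.get(item, False))
```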
Table 3
Summary Statistics for the Age 90 to 93 CLHLS Cohort at Baseline
Variable | N | Average
ADL 1998 count | 1078 | .84
ADL 2000 count | 580 | 1.05
ADL 2002 count | 335 | 1.16
ADL 2005 count | 120 | 1.26
Female | 1078 | .52
Life threatening disease | 1078 | .11
ADL Trajectory Model Without Dropout
[Figure: ADL count (0 to 5) by wave (1 to 4) for three groups: Low (27.1%), Medium (60.0%), High (12.9%)]
ADL Trajectory Model With Dropout
[Figure: ADL count by wave (1 to 4) for three groups: Low (20.1%, DP = .34), Medium (58.6%, DP = .47), High (21.3%, DP = .64)]
Table 4
Predicted Population Average ADL Counts from the Models With and Without Dropout
Period | Average ADL count | Without dropout: predicted ADL count | Without dropout: % error | With dropout: $\tilde{\pi}_t^1$ | With dropout: $\tilde{\pi}_t^2$ | With dropout: $\tilde{\pi}_t^3$ | With dropout: predicted ADL count | With dropout: % error
1998 | .84 | .91 | 8.3 | .201 | .586 | .213 | .93 | 10.7
2000 | 1.05 | 1.19 | 13.3 | .254 | .600 | .146 | 1.07 | 1.9
2002 | 1.16 | 1.42 | 22.4 | .309 | .593 | .097 | 1.17 | .9
2005 | 1.26 | 1.89 | 50.0 | .366 | .571 | .063 | 1.58 | 25.4
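The % error columns are consistent with 100 × (predicted − average) / average; for example, (.91 − .84) / .84 ≈ 8.3%. The $\tilde{\pi}_t^j$ columns appear to give the group composition of participants still present at each wave. The sketch below assumes constant per-wave dropout probabilities equal to the DP values in the with-dropout ADL figure. Under that simplification it reproduces the $\tilde{\pi}_t^j$ columns only approximately (to within about .01), presumably because the fitted model lets dropout depend on the prior ADL count.

```python
def percent_error(predicted, observed):
    """Prediction error as a percentage of the observed population average."""
    return 100 * (predicted - observed) / observed


def survivor_shares(pi, dropout_prob, n_waves):
    """Group shares among participants still present at each wave.

    Simplifying assumption: group j loses a constant fraction dropout_prob[j] per wave.
    """
    shares = []
    for w in range(n_waves):
        present = [p * (1 - d) ** w for p, d in zip(pi, dropout_prob)]
        total = sum(present)
        shares.append([s / total for s in present])
    return shares


# Group sizes and DP values as reported for the model with dropout.
shares = survivor_shares([0.201, 0.586, 0.213], [0.34, 0.47, 0.64], n_waves=4)
```

The shift toward the low group over time is what pulls the with-dropout predictions below those of the model that ignores dropout.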
Adding Covariates to the Model to Test the Morbidity Compression vs. Expansion Hypothesis
• Will increases in longevity compress or expand disability levels in the population of the elderly?
• “Had a life threatening disease” at baseline or prior is positively correlated with both ADL counts at baseline and the subsequent mortality rate.
• Question: Would a reduction in the incidence of life threatening diseases at baseline increase or decrease the population-level ADL count?
Testing Strategy and Results
• Specify group membership probability ($\pi_j$) and dropout probability ($\theta_t^j$) to be functions of the life threatening disease variable
• Both are also functions of sex; dropout probability alone is also a function of the ADL count in the prior period
• Life threatening disease is significantly related to group membership in the expected way but has no relationship with dropout due to death
• Thus, unambiguous support for compression
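Group-based trajectory models commonly link group membership probabilities to baseline covariates through a multinomial logit. Assuming that form, with the low group as the reference and hypothetical coefficients, the sketch shows how a life threatening disease indicator shifts the $\pi_j$. The dropout logit would take the indicator as one more predictor in the same way.

```python
import math


def group_probs(x, alphas, deltas):
    """Multinomial logit: pi_j(x) = exp(a_j + d_j * x) / sum_k exp(a_k + d_k * x).

    Group 1 is the reference (a_1 = d_1 = 0); all coefficients are hypothetical.
    """
    scores = [a + d * x for a, d in zip(alphas, deltas)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


alphas = [0.0, 1.0, -0.5]  # low, medium, high
deltas = [0.0, 0.3, 1.2]   # effect of having a life threatening disease
no_disease = group_probs(0, alphas, deltas)
disease = group_probs(1, alphas, deltas)
```

With a positive coefficient for the high group, having the disease moves probability mass from the low group toward the high group, so lowering disease incidence would shift the population toward lower-disability trajectories.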
Projecting the reduction in population average ADL count from a
25% reduction in the incidence of the life threatening disease at
baseline
Table 6
Own and Cross Elasticity Estimates (%) for Life Threatening Disease Incidences
Group | Own elasticity | Cross elasticity, Group 2 | Cross elasticity, Group 3 | Total elasticity
1. Low ($\pi_1 = .201$) | NA | -.033 | -.059 | -.092
2. Medium ($\pi_2 = .586$) | -.173 | NA | .069 | -.104
3. High ($\pi_3 = .213$) | .232 | -.036 | NA | .196
Projected % Reduction in Population Average ADL Count
Year | Reduction (%)
1998 | 3.0
2000 | 2.2
2002 | 1.5
2005 | .7
Conclusions and Future Research
• Large differences in dropout rates across trajectory groups matter
• Future research
  – Investigate effects of endogenous selection
  – Compare results in data sets with more modest dropout rates
  – Further research on morbidity expansion and contraction