An Overview of Two Recent Advances in Trajectory Modeling
Daniel S. Nagin

Combining Propensity Score Matching and Group-Based Trajectory Analysis in an Observational Study
(Psychological Methods, 2007) (Also, Developmental Psychology, 2008)
Amelia Haviland, RAND Corporation
Daniel S. Nagin, Carnegie Mellon University
Paul R. Rosenbaum, University of Pennsylvania

Problem Setting
• Inferring the “treatment (aka causal) effect” of an important life event or a therapeutic intervention with non-experimental longitudinal data
• Overcoming a severe selection problem whereby treatment probability depends heavily upon the prior trajectory of the outcome: boys with high prior violence levels are more likely to join gangs
• Dealing with feedback effects: violence and gang membership may be mutually reinforcing
• Treatment effect may also depend upon the prior trajectory of the outcome
• Measuring the effect of gang membership is a prototypical example of a large set of important inference problems in psychopathology
  – Divorce and depression
  – Drug treatment and drug abuse

Montreal Data
• 1037 Caucasian, francophone, nonimmigrant males
• First assessment at age 6 in 1984
• Most recent assessment at age 17 in 1995
• Data collected on a wide variety of individual, familial, and parental characteristics, including self-reported violent delinquency and gang membership from age 11 to 17
• Prototypical modern longitudinal dataset: rich measurements of the characteristics and behaviors of participants

Annual Assessments of Violent Delinquency and Gang Membership
• Violent delinquency: frequency in the last year of:
  – Gang fighting
  – Fist fighting
  – Carrying/using a deadly weapon
  – Threatening or attacking someone
  – Throwing an object at someone
• Gang membership: “In the past year have you been part of a group or gang that committed reprehensible acts?”

The Selection Problem: Violent Delinquency from Age 11 to 14 of Gang Members at Age 14
[Figure: violence at ages 11, 12, 13, and 14 (vertical axis 0 to 4), shown separately for gang members at age 14 and non-gang members at age 14]

Cochran’s Advice on How to Proceed: “How should the study be conducted if it were possible to do it by controlled experimentation?”
• Well-defined treatment: what is the effect of first-time gang membership at age 14 on violence at age 14 and beyond?
• Good baseline measurements on the treated (gang members at 14) and controls (non-gang members at 14): provided by trajectory groups
• Randomize treatment to create comparability (i.e., balance) on all covariates between treated and controls: provided by propensity score matching

Treatment, Covariates, & Outcomes
• Time = −: Baseline covariates, fixed and time-varying, including violence prior to age 14
• Time = 0: Treatment assignment, first-time gang status at 14
• Time = +: Responses to gang status at 14
  – Outcomes: violence at 14 and beyond
  – “Treatment compliance”: gang status at 15 and beyond

Baseline Measurements: Trajectories of Violent Delinquency from Age 11 to 13 for Sub-sample with NO Gang Involvement over this Period
[Figure: delinquent violence (vertical axis 0 to 5) by age 11 to 13 for three trajectory groups]
• 31% of Chronics join gangs at age 14
• 15% of Decliners join gangs at age 14
• 7% of Lows join gangs at age 14

Trajectory Groups as Baseline Measurements
• Allows a test of whether the facilitation effect of gang membership depends on developmental history
• Aids in controlling for selection effects by comparing gang and non-gang members with comparable histories of violence that are uncontaminated by the effects of prior gang membership

Creating Balance with Propensity Score Matching
• The propensity score relates the probability of treatment to specified covariates
• By matching on the propensity score, treated and controls are balanced on the covariates in the propensity score
• Imbalance may remain on other covariates

Creating balance: Match first-time
gang joiners at 14 with one or more “comparable” non-gang joiners
• Match within trajectory group
  – Group-specific treatment effect estimates
  – Helps to balance prior history of violence
• Within-group matching based on:
  – Propensity score for gang membership at age 14
• Covariates in the propensity score include:
  – Self-reported violence at ages 10-13 plus teacher and peer ratings of aggression
  – Posterior probability of trajectory group membership
  – Many risk factors for violence and gang membership, such as low IQ, having a teen mother, hyperactivity, and opposition

Twelve Covariates Comparing Gang Joiners at 14 with Potential Controls

Propensity for Gang Joining by Trajectory Group (Before Matching)

Matching Strategy
• 21 gang joiners in the low trajectory matched with 105 (out of 276) non-gang joiners from that trajectory
  – Number of matches ranges from 2 to 7
• 38 gang joiners in the declining trajectory matched with 114 (out of 216) non-gang joiners from that trajectory
  – Number of matches ranges from 1 to 6

Balance Before and After Matching for Selected Variables

Standardized Differences Across the 15 Variables Used in Matching

“Intent to Treat” Effects of First-time Gang Membership at 14 on Violence at Age 14 to 17
Age   Group       Significance Level
14    Low         .008
14    Declining   .033
15    Low         .034
15    Declining   .086
16    Low         .044
16    Declining   .753
17    Low         .070
17    Declining   .530

Effects of First-time Gang Membership at 14 on Violence at 14 to 17
[Figure, panel 1: “Low Trajectory: Violence at Ages 14-17 by Gang Status at Age 14”, violence at ages 14, 15, 16, and 17 (vertical axis 0 to 2) for gang members and non-gang members at age 14]
[Figure, panel 2: “Declining Trajectory: Violence at Ages 14 to 17 by Gang Status at Age 14”, violence at ages 14, 15, 16, and 17 (vertical axis 0 to 3) for gang members and non-gang members at age 14]

Concluding Observations on Strengths of this Approach
• Trajectory group-specific effects
• Transparency
• Weaknesses open to view
• Keeping time in order

Extending Group-Based Trajectory
Modeling to Account for Subject Attrition
Daniel S. Nagin, Carnegie Mellon University
Bobby Jones, Carnegie Mellon University
Amelia Haviland, RAND Corporation

Trajectories Based on 1979 Dutch Conviction Cohort
[Figure: mean number of convictions per year (vertical axis 0 to 3) by age 12 to 72 for four trajectory groups: SO (70.9%), LR-D (21.7%), MR-D (5.7%), HR-P (1.6%)]

Missing Data
• Two types
  – Intermittent missing assessments (y1, y2, ., y4, ., y6)
  – Subject attrition, where assessments cease starting in period τ (y1, y2, y3, ., ., .)
• Both types assumed to be missing at random
• Model extension designed to account for potentially non-random subject attrition
• No change in the model for intermittent missing assessments

Some Notation
• T = number of assessment periods
• $\tau_i$ = period t in which subject i drops out
• $\theta_t^j$ = probability of dropout in group j in period t

Probability of Dropout in Period t
Period             Probability of Dropout
1                  0
2, 3, 4, ..., T    . . .
No dropout         1 – all the above probabilities

The Dropout Extended Likelihood for Group j
$$P(Y_i \mid age_i, j; \beta^j, \theta^j) = \left[ \prod_{t=1}^{\tau_i - 1} p(y_{it} \mid w_{it} = 0, age_i, j; \beta^j)\,(1 - \theta_t^j) \right] \theta_{\tau_i}^j$$

Specification of $\theta_t^j$
• Binary logit model
• Predictor variables
  – Fixed characteristics of i, $x_i$
  – Prior values of outcome, $y_{it-1}, y_{it-2}, \ldots$
• If trajectory group were known, dropout within trajectory group j would be “exogenous” or “ignorable conditional on observed covariates”
• Because trajectory group is latent, at the population level dropout is “non-ignorable”

Simulation Objectives
• Examine effects of differential attrition rates across groups that are not initially well separated
• Examine the effects of using model estimates to make population-level projections

Simulation 1: Two Group Model With Different Drop Probabilities and Small Initial Separation
[Figure: panels of E(y) against time, axes running to 10; annotations “No dropout” and “Slope=.5”]

Simulation Results: Group 1 and Group 2 Initially not Well Separated
                                               Model Without Dropout     Model With Dropout
Group 1       Expected      Probability of     Group 1       Percent     Group 1       Percent   Dropout
per-period    Group 1       Group 1 dropout    prob. est.    bias        prob. est.    bias      prob.
dropout       assessment    on or before       (π1)                      (π1)                    est.
probability   periods       period 6
0             6.0           0                  .200            0.0       .200           0.0      .000
.05           5.3           .226               .171          -14.5       .199          -0.5      .051
.10           4.7           .410               .146          -27.0       .199          -0.5      .099
.15           4.2           .556               .122          -39.0       .200           0.0      .150
.20           3.7           .672               .100          -50.0       .199          -0.5      .199
.25           3.3           .762               .079          -60.5       .200           0.0      .250
.30           2.9           .832               .061          -69.5       .199          -0.5      .301
.35           2.6           .884               .046          -77.0       .199          -0.5      .350
.40           2.4           .922               .034          -83.0       .199          -0.5      .398

Simulation 2: Projecting to the Population Level from Model Parameter Estimates
• Chinese Longitudinal Healthy Longevity Survey (CLHLS)
  – Randomly selected counties and cities in 22 provinces
  – 4 waves, 1998 to 2005
  – 80 to 105 years old at baseline
  – 8805 individuals at baseline
  – 68.9% had died by 2005
• Analyzed the cohort aged 90 to 93 in 1998

Activities of Daily Living
• On your own and without assistance can you:
  – Bathe
  – Dress
  – Toilet
  – Get up from bed or chair
  – Eat
• Disability measured by count of items where assistance is required

Table 3 Summary Statistics for the Age 90 to 93 CLHLS Cohort at Baseline
Variable                   N      Average
ADL 1998 Count             1078   .84
ADL 2000 Count             580    1.05
ADL 2002 Count             335    1.16
ADL 2005 Count             120    1.26
Female                     1078   .52
Life Threatening Disease   1078   .11

ADL Trajectory Model Without Dropout
[Figure: ADL count (vertical axis 0 to 5) by wave 1 to 4 for three groups: Low (27.1%), Medium (60.0%), High (12.9%)]

ADL Trajectory Model With Drop Out
[Figure: ADL count (vertical axis 0 to 4.5) by wave 1 to 4 for three groups: Low (20.1%, DP=.34), Medium (58.6%, DP=.47), High (21.3%, DP=.64)]

Table 4 Predicted Population Average ADL Counts from the Models With and Without Dropout
                    Model Without Drop Out    Model With Drop Out
Period   Average    Predicted   %             $\tilde{\pi}_t^1$   $\tilde{\pi}_t^2$   $\tilde{\pi}_t^3$   Predicted   %
         ADL        ADL         Error                                                                     ADL         Error
         Count      Count                                                                                 Count
1998     .84        .91          8.3          .201                .586                .213                .93         10.7
2000     1.05       1.19        13.3          .254                .600                .146                1.07         1.9
2002     1.16       1.42        22.4          .309                .593                .097                1.17          .9
2005     1.26       1.89        50.0          .366                .571                .063                1.58        25.4

Adding Covariates to Model to Test the Morbidity Compression v. Expansion Hypothesis
• Will increases in longevity compress or expand disability level in the population of the elderly?
• “Had a life threatening disease” at baseline or prior is positively correlated with both ADL counts at baseline and subsequent mortality rate.
• Question: Would a reduction in the incidence of life threatening diseases at baseline increase or decrease the population-level ADL count?

Testing Strategy and Results
• Specify group membership probability ($\pi_j$) and dropout probability ($\theta_t^j$) to be a function of the life threatening disease variable
• Both are also functions of sex; dropout probability alone is also a function of ADL count in the prior period
• Life threatening disease is significantly related to group membership in the expected way but has no relationship with dropout due to death
• Thus, unambiguous support for compression

Projecting the reduction in population average ADL count from a 25% reduction in the incidence of the life threatening disease at baseline

Table 6 Own and Cross Elasticity Estimates (%) for Life Threatening Disease Incidences
Groups: 1. Low ($\pi_1$ = .201); 2. Medium ($\pi_2$ = .586); 3. High ($\pi_3$ = .213)
Column headings: Own Elasticity; Cross Elasticity (Group 2, Group 3); Total Elasticity
Cell values in source order: NA, -.033, -.059, -.092, NA, -.173, -.104, .232, -.036, NA, .196, .069

Projected % Reduction in Population Average ADL Count
Year   Reduction (%)
1998   3.0
2000   2.2
2002   1.5
2005   .7

Conclusions and Future Research
• Large differences in dropout rates across trajectory groups matter
• Future research
  – Investigate effects of endogenous selection
  – Compare results in data sets with more modest dropout rates
  – Further research on morbidity expansion and contraction
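
The point that differential dropout matters can be illustrated with a minimal Python sketch. It assumes, as a simplification, that each ADL group has a constant per-wave dropout probability equal to the DP values shown for the ADL trajectory model with dropout (.34, .47, .64), with baseline group shares of .201, .586, and .213. The model in the slides lets dropout depend on covariates and on the prior ADL count, so this is only an approximation, and the function name `surviving_shares` is hypothetical.

```python
from typing import List, Sequence


def surviving_shares(
    pi: Sequence[float], dropout: Sequence[float], waves: int
) -> List[List[float]]:
    """Group shares among subjects still observed at each wave.

    Assumes a constant per-wave dropout probability within each group.
    This is a simplification made for illustration; the slides specify
    dropout with a binary logit model that includes covariates.
    """
    shares = []
    for wave in range(waves):
        # Joint probability of belonging to group j and not yet having
        # dropped out after `wave` opportunities to drop out.
        remaining = [p * (1.0 - d) ** wave for p, d in zip(pi, dropout)]
        total = sum(remaining)
        # Renormalize so the shares among survivors sum to one.
        shares.append([r / total for r in remaining])
    return shares


# Baseline shares and DP values read from the ADL model with dropout.
PI = [0.201, 0.586, 0.213]  # Low, Medium, High
DP = [0.34, 0.47, 0.64]     # assumed constant per-wave dropout probability

if __name__ == "__main__":
    years = (1998, 2000, 2002, 2005)
    for year, row in zip(years, surviving_shares(PI, DP, len(years))):
        print(year, [round(s, 3) for s in row])
```

Under these assumptions the High group's share among surviving respondents falls from .213 at baseline to roughly .06 by the fourth wave, and every share stays within about .01 of the group-share columns reported in Table 4. A model that ignores dropout keeps the baseline shares fixed, which is one way to see why its predicted population averages drift upward over time.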