Introducing discrete-time survival analysis
ALDA, Chapter Eleven

“To exist is to change, to change is to mature”
Henri Bergson

John B. Willett & Judith D. Singer
Harvard Graduate School of Education

Chapter 11: Fitting basic discrete-time hazard models
• Review basic descriptive statistics for discrete-time survival data (Ch 10): life table, hazard function, survivor function, median lifetime
• Specifying a suitable discrete-time hazard model (§11.1 & 11.2): both heuristic and formal representations
• Fitting the discrete-time hazard model to data (§11.3): it turns out that it's very easy to fit the model
• Interpreting parameter estimates (§11.4): very different from growth modeling, but more similar to logistic regression
• Displaying fitted hazard and survivor functions (§11.5): as in growth modeling, we'll display fitted functions at prototypical predictor values
• Comparing (nested) discrete-time hazard models using goodness-of-fit statistics (§11.6): methods for data analysis and model comparison

Illustrative example: Grade at first heterosexual intercourse
• Data source: Deborah Capaldi & colleagues (1996), Child Development
• Sample: 180 middle school boys (all considered “at risk”)
• Research design: Large panel study in which each boy was tracked from 7th through 12th grades
• By the end of data collection (at the end of 12th grade), n=126 (70.0%) had had sex
• The remaining n=54 (30.0%) were still virgins. These censored observations pose a challenge for data analysis.
• Question predictor: PT, for parenting transition, a dichotomy indicating whether the boy lived with both biological parents during his early formative years (before 7th grade, when data collection began)
  • 72 boys (40%) lived with both biological parents (PT=0)
  • 108 boys (60%) experienced at least one parenting transition before 7th grade (PT=1)
• Ultimately, we'll also examine a continuous predictor, PAS, which assesses the parents' level of antisocial behavior during the child's formative years (also time-invariant: it describes behavior before the study started). Because the original scale is arbitrary, scores have been standardized to a mean of 0 and an sd of 1.
(ALDA, Section 11.1, pp 358-360)

The life table: Summarizing the distribution of event occurrence over time
• Columns of the life table: the J intervals (T = 7, 8, …, 12), the risk set, n experiencing the target event in interval j, and n censored in interval j
How might we summarize the distribution of event occurrence?
(ALDA, Section 10.1, pp 326-329)

Assessing the conditional risk of event occurrence: The discrete-time hazard function
$\hat{h}(t_j) = \dfrac{n\ \text{events}_j}{n\ \text{at risk}_j}$
$\hat{h}(t_7) = \dfrac{15}{180} = 0.0833, \qquad \hat{h}(t_9) = \dfrac{24}{158} = 0.1519, \qquad \hat{h}(t_{12}) = \dfrac{26}{80} = 0.3250$

Discrete-time hazard
• Conditional probability that individual i will experience the target event in time period j ($T_i = j$) given that s/he didn't experience it in any earlier time period ($T_i \ge j$)
• $h(t_{ij}) = \Pr\{T_i = j \mid T_i \ge j\}$
• As a probability (only in discrete time), hazard is bounded by 0 and 1. This is an issue for modeling that we'll need to address.
• Estimation is easy because each value of hazard is based on that interval's risk set.
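As a minimal sketch of the estimator above (written in Python, which the slides themselves do not use), the following recomputes the three hazard values quoted on the slide from their event counts and risk sets. The names `life_table` and `estimate_hazard` are illustrative assumptions, not part of ALDA, and only the three grades whose counts appear on the slide are included.

```python
# Counts quoted on the slide for three grades:
# grade -> (n events in that grade, n at risk at the start of that grade)
life_table = {7: (15, 180), 9: (24, 158), 12: (26, 80)}


def estimate_hazard(n_events, n_at_risk):
    """Sample hazard: share of the interval's risk set that experiences the event."""
    if n_at_risk <= 0:
        raise ValueError("the risk set must contain at least one person")
    return n_events / n_at_risk


# One hazard estimate per grade, each based only on that grade's risk set.
hazard = {grade: estimate_hazard(*counts) for grade, counts in life_table.items()}

for grade, h in hazard.items():
    print(f"grade {grade}: estimated hazard = {h:.4f}")
```

Each estimate uses only its own interval's risk set, which is why censored boys can contribute to every interval in which they were still at risk.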
[Figure: estimated hazard function h(t) plotted against Grade (6 to 12)]
(ALDA, Section 10.2.1, pp 330-339)

Cumulating risk over time: The survivor function (and median lifetime)
$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$
$\hat{S}(t_7) = 1.0 \times [1 - 0.0833] = 0.9167$
$\hat{S}(t_9) = 0.8778 \times [1 - 0.1519] = 0.7444$

Discrete-time survival probability
• Probability that individual i will “survive” beyond time period j ($T_i > j$), i.e., will not experience the event until after time period j
• $S(t_{ij}) = \Pr\{T_i > j\}$
• Also a probability bounded by 0 and 1
• At the beginning of time, $S(t_{i0}) = 1.0$
• Strategy for estimation: Since $h(t_{ij})$ tells us about the probability of event occurrence, $1 - h(t_{ij})$ tells us about the probability of non-occurrence (i.e., about survival)

[Figure: estimated survivor function S(t) plotted against Grade (6 to 12); estimated median lifetime ML = 10.6]
(ALDA, Section 10.2, pp 330-339)

Converting a person-level data set into a person-period data set
• Person-level data set: one row per person

ID    T    CENSOR   PT
193   9    0        1
126   12   0        1
407   12   1        0

• ID 193 had sex in the 9th grade
• ID 126 had sex in the 12th grade
• ID 407 was censored, remaining a virgin through 12th grade
• Person-period data set: one row for every person-period until event occurrence or censoring (different from growth modeling)
• EVENT distinguishes event occurrence from censoring: it is 1 in the period when the event occurs and 0 otherwise, so a censored boy has EVENT=0 in every row
(ALDA, Section 10.5.1, pp 351-354)

Contemplating a DTSA model: Inspecting sample plots of within-group hazard and survivor functions
Q's to ask when examining sample hazard functions:
• What is the shape of each hazard function? Here, the shapes are similar: both begin low and climb steadily over time.
• Does the relative level of hazard differ across groups? Here, hazard for boys with a parenting transition is consistently higher.
• Suggests partitioning variation in risk into:
  • A baseline profile of risk
  • A shift in risk corresponding to variation in the predictor
Q's to ask when examining sample survivor functions:
• They tend to be less useful because they assess the predictor's cumulative effect. Here, they tell us that the ML for boys with a PT is 10.0 vs. 11.7 when PT=0.
• Note: reversal of relative rankings
We're almost ready to go, but back to the bounded nature of hazard
(ALDA, Section 11.1.1, pp 358-361)

As in regular regression, we use transformation to deal with hazard's bounds: Understanding the effects of taking odds and logits
$\text{odds} = \dfrac{\text{hazard}}{1 - \text{hazard}}$
$\text{logit} = \log(\text{odds}) = \log\!\left(\dfrac{\text{hazard}}{1 - \text{hazard}}\right)$
[Figure: three panels plotted against Grade (6 to 12), each with one curve for “One or more early parenting transitions” and one for “No early parenting transitions”: estimated hazard, estimated odds, and estimated logit(hazard)]

Facts about odds scale
• Symmetric about 1 (50/50)
• Effect most prominent when hazard is larger
• Easy to get back to raw hazard: $\text{hazard} = \dfrac{\text{odds}}{1 + \text{odds}}$
• But it's still bounded below by 0 and it's asymmetric (raw differences have different meanings depending upon the value of odds)
(ALDA, Section 11.1.2, pp 362-365)

Facts about logit scale
• Not bounded at all, although you need to get used to negative values (whenever hazard < .50)
• Usually regularizes distance between hazard functions
• Stretches distance between small values
• Compresses distance between large values
• It's easy to get back to raw hazard: $\text{hazard} = \dfrac{1}{1 + e^{-\text{logit}}}$

What population model might have generated these sample data?
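Before turning to that question, the conversions among hazard, odds, and logit listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are assumptions introduced here, and the example input is the grade 7 sample hazard of 0.0833 quoted earlier in the deck.

```python
import math


def hazard_to_odds(hazard):
    """odds = hazard / (1 - hazard); defined for 0 <= hazard < 1."""
    return hazard / (1.0 - hazard)


def hazard_to_logit(hazard):
    """logit = log(odds); defined for 0 < hazard < 1 and unbounded in both directions."""
    return math.log(hazard_to_odds(hazard))


def odds_to_hazard(odds):
    """Back-transform: hazard = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def logit_to_hazard(logit):
    """Back-transform: hazard = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


example_hazard = 0.0833  # grade 7 sample hazard from the life table
example_odds = hazard_to_odds(example_hazard)
example_logit = hazard_to_logit(example_hazard)

print(f"hazard={example_hazard:.4f}  odds={example_odds:.4f}  logit={example_logit:.4f}")
print(f"round trip from logit: {logit_to_hazard(example_logit):.4f}")
```

Because the sample hazard is below .50, its logit is negative, as the slide notes.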
Plotting sample hazard estimates and overlaying alternative hypothesized models
[Figure: sample logit(hazard) estimates plotted against Grade (6 to 12) for PT=0 and PT=1, overlaid with three hypothesized population models: a flat, a linear, and a general population logit hazard, each shifted when PT switches from 0 to 1]

Three reasonable features of a population discrete-time hazard model
1. For each predictor value, there is a population logit-hazard function.
   • When the predictor(s)=0, we call it the “baseline” logit-hazard function.
2. Each population logit-hazard function is constrained to have the identical shape, regardless of predictor value.
   • This is an assumption, and it can (and will) be relaxed later.
3. The distance between each of these logit-hazard functions is identical in every time period.
   • Differences in predictor value only “shift” the logit-hazard function “vertically.”
   • This assumption can (and will) be relaxed later.
   • In the meantime, the magnitude of this shift is the magnitude of the predictor's effect.
(ALDA, Section 11.1.1, pp 366-369)

How do we specify a discrete-time hazard model that has these 3 features?
• Recode PERIOD into a set of TIME indicators
• Constant vertical shift in logit hazard associated with variation in PT
$\text{logit}\, h(t_{ij}) = [\alpha_7 D_7 + \cdots + \alpha_j D_j + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i$
(ALDA, Section 11.2, pp 369-372)

How does this model relate to the previous graph?
Carefully unpacking the discrete-time hazard model
[Figure: logit(hazard) plotted against Grade (6 to 12), with each grade matched to its TIME indicator (D7=1 in grade 7, D8=1 in grade 8, ..., D12=1 in grade 12) and its intercept $\alpha_7, \alpha_8, \ldots, \alpha_{12}$]
• When PT=0, you get the baseline logit hazard function
• When PT=1, you shift this entire baseline vertically by $\beta_1$
$\text{logit}\, h(t_{ij}) = [\alpha_7 D_7 + \cdots + \alpha_j D_j + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i$
And we can add predictors just as in regular (logistic) regression:
$\text{logit}\, h(t_{ij}) = [\alpha_7 D_7 + \cdots + \alpha_j D_j + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i + \beta_2 PAS_i$
(ALDA, Section 11.2.1, pp 372-376)

How does this model behave when hazard is expressed in the other scales?
What does the DT hazard model look like when expressed on the other scales?
[Figure: population functions for PT=0 and PT=1 plotted against Grade (6 to 12) on three scales (logit(hazard), odds, and hazard), linked by $\text{odds} = e^{\text{logit}}$ and $\text{hazard} = 1/(1 + e^{-\text{logit}})$]
• On the logit scale, the distance between functions is identical in every time period (an assumption built into our model): the functions differ by $\beta_1$
• On the odds scale, one function is a constant magnification (or diminution) of the other, by a factor of $\exp(\beta_1)$: they are proportional
• On the hazard scale, the functions have no constant relationship (we would need to use a complementary log-log transformation to get a proportional hazards model)
The “standard” DTSA model is a proportional odds model!
(ALDA, Section 11.2.2, pp 376-379)

Fitting the model to data: Use logistic regression in the person-period data set
[Figure: person-period data set with its columns labeled as TIME indicators, outcome, and substantive predictors]
• All parameter estimates, standard errors, t- and z-statistics, goodness-of-fit statistics, and tests will be correct for the discrete-time hazard model

Model A: $\text{logit}\, h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}$
Model B: $\text{logit}\, h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12} + \beta_1 PT$
Model C: $\text{logit}\, h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12} + \beta_2 PAS$
Model D: $\text{logit}\, h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12} + \beta_1 PT + \beta_2 PAS$

• The $\alpha$'s estimate the baseline logit hazard function
• The $\beta$'s assess the effects of substantive predictors
(ALDA, Section 11.3, pp 378-386)

Strategies for interpreting the $\hat{\alpha}$'s: ML estimates of the baseline hazard function
• Simplifying interpretation by transforming back to odds and hazard
• Because there are no substantive predictors in Model A, its estimates describe the baseline for the entire sample
• If est's are approx equal, baseline is flat
• If est's decline, hazard declines
• If est's increase (as they do here), hazard increases
(ALDA, Section 11.4.1, pp 386-388)

Strategies for interpreting the $\hat{\beta}$'s: ML estimates of the substantive predictors' effects (ALDA, Section 11.4.2 & 11.4.3, pp 388-390)
Dichotomous predictors
• As in regular logistic regression, antilogging a parameter estimate yields the estimated odds ratio associated with a 1-unit difference in the predictor: $e^{\hat{\beta}_{PT}} = e^{0.8736} = 2.40$
• The estimated odds of first intercourse for boys who have experienced a parenting transition are 2.4 times the odds for boys who did not experience such a transition.
Continuous predictors
• Antilogging still yields an estimated odds ratio associated with a 1-unit difference in the predictor: $e^{\hat{\beta}_{PAS}} = e^{0.4428} = 1.56$
• The estimated odds of first intercourse for boys whose parents exhibited “1 unit more” of antisocial behavior are 1.56 times the odds for boys whose parental antisocial behavior was one unit lower.
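The antilogging step can be checked with a short Python sketch. The two coefficients are the estimates quoted on the slide (0.8736 for PT and 0.4428 for PAS); the dictionary and function names are assumptions introduced for illustration.

```python
import math

# Parameter estimates quoted on the slide (logit scale).
estimates = {"PT": 0.8736, "PAS": 0.4428}


def odds_ratio(beta_hat):
    """Antilog of a logit-scale coefficient: odds ratio for a 1-unit difference."""
    return math.exp(beta_hat)


odds_ratios = {name: odds_ratio(beta) for name, beta in estimates.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: exp({estimates[name]}) = {ratio:.2f}")
```

Both ratios exceed 1, which matches positive coefficients on the logit scale: higher values of either predictor go with higher odds of event occurrence in every grade.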
Because odds ratios are symmetric about 1, you can also invert the odds ratios and change the reference group:
• Estimated odds of first intercourse for boys who did not experience a parenting transition are 1/(2.40)=.42, or approximately 40% of the odds for boys who did
• Estimated odds of first intercourse for boys whose parents have “1 unit less” of antisocial behavior are 1/(1.56)=.641, or approximately 2/3rds of the odds for boys whose parents were 1 unit higher

Displaying fitted hazard and survivor functions: Illustrating the general idea using Model B for a single dichotomous predictor
With a single dichotomous predictor, there are only 2 possible prototypical functions:
• PT=0 (for boys from stable homes with no parenting transitions before 7th grade)
• PT=1 (for boys who experienced one or more early parenting transitions)
$\text{logit}\, \hat{h}(t_j) = \hat{\alpha}_j + \hat{\beta}_1 PT$
$\hat{h}(t_j) = \dfrac{1}{1 + e^{-\text{logit}\, \hat{h}(t_j)}}$
$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$
(ALDA, Section 11.5.1, pp 392-394)

Displaying fitted hazard and survivor functions
• Fitted logit hazard: constant vertical separation of 0.8736 (the parameter estimate for PT), so it is easy to see the effect of PT
• Fitted hazard: non-constant vertical separation (no simple interpretation because the model is proportional in odds, not hazard)
• Fitted survivor functions: the effect of PT cumulates into a large difference in estimated median lifetimes (9.9 vs. 11.8, a difference of about 2 years)
(ALDA, Section 11.5.1, pp 392-394)

Displaying fitted hazard and survivor functions when some predictors are continuous
• As in growth modeling, select substantively interesting prototypical values and proceed just as you did for dichotomous predictors
• Here, we'll choose +/- 1 sd of PAS (low=-1, medium=0, and high=+1)
[Figure: fitted hazard and fitted survival probability plotted against Grade (6 to 12) for PAS = -1, 0, and +1, shown separately for boys with no early parenting transitions and boys with one or more early parenting transitions]

Estimated Median Lifetimes
PAS          PT=0    PT=1
Low (-1)     >12.0   10.7
Medium (0)   11.5    10.1
High (+1)    10.9    9.6
(ALDA, Section 11.5.1, pp 392-394)

Comparing goodness of fit using deviance statistics and information criteria: The strategies are generally the same as in growth modeling
• Deviance: smaller value, better fit; $\chi^2$ dist.; compare nested models
• AIC, BIC: smaller value, better fit; compare non-nested models
• Model B vs. Model A provides an uncontrolled test of H0: $\beta_{PT}=0$. $\Delta$Deviance=17.30 (1 df), p<.001
• Model C vs. Model A provides an uncontrolled test of H0: $\beta_{PAS}=0$. $\Delta$Deviance=14.79 (1 df), p<.001
• Model D vs. Models B & C provide controlled tests (both null hypotheses are rejected as well)
(ALDA, Section 11.6, pp 397-402)
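The two uncontrolled deviance tests can be reproduced with a small Python sketch. It relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)); the function name `chi2_upper_tail_1df` is an assumption introduced here, and the deviance differences are the values quoted on the slide.

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for X ~ chi-square with 1 df, via the standard normal tail."""
    if x < 0:
        raise ValueError("a deviance difference cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


# Differences in deviance between nested models, as quoted on the slide.
deviance_differences = {
    "Model B vs. Model A (adds PT)": 17.30,
    "Model C vs. Model A (adds PAS)": 14.79,
}

p_values = {
    label: chi2_upper_tail_1df(diff) for label, diff in deviance_differences.items()
}

for label, p in p_values.items():
    print(f"{label}: p = {p:.6f}")
```

This shortcut applies only to 1-df comparisons; a test with more degrees of freedom would need a general chi-square tail function from a statistics library.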