Data structure for a continuous-time event history analysis
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

Overview
• Structure of most survey data: one record per respondent
• Event history analysis requires separate records for each period at risk of the event
  – One record per spell
• How to create one record per spell
  – Components of the dependent variable
  – Fixed characteristics
  – Time-varying characteristics

Event history analysis: Continuous time data

Data preparation for an event history
• A continuous-time event history analysis requires one record per period at risk
  – Also known as a spell
• Survey data often contain one record per respondent
• Creating an event history data set involves generating one record per spell
  – Creating the components of the dependent variable
  – Mapping the fixed covariates onto each spell
  – Calculating time-varying covariates for each spell

Source data: 1 record per respondent
A period (.) marks a date that is missing or does not apply.

    Date of   1st       1st      2nd       2nd      Date of  1st child's  2nd child's  Date 1st  Date last
ID  birth     marriage  divorce  marriage  divorce  death    birth        birth        observed  observed   Gender
1   2/1/52    .         .        .         .        .        .            .            7/15/85   10/1/10    F
2   7/15/69   6/22/10   .        .         .        .        .            .            9/21/85   11/5/10    M
3   3/1/65    8/1/90    1/1/97   10/1/04   .        .        12/5/95      .            10/8/85   5/1/05     M
4   3/1/42    6/1/63    .        .         .        10/1/02  9/21/64      5/11/67      12/2/85   10/2/02    F

Creating an analytic data set for study of divorce
The same source data (one record per respondent), with the columns grouped by their role in the analysis:
• Fixed covariate (e.g., gender)
• Dates of beginning of period(s) at risk: dates of 1st and 2nd marriage
• Dates of event: dates of 1st and 2nd divorce
• Dates pertaining to censoring and status at end of observation: date of death, date 1st observed, date last observed
• Dates pertaining to time-varying covariates: dates of 1st and 2nd child's birth

Example timelines for study of divorce
M = Married; D = Divorced; L = Lost to follow-up; O = Censored by end of study; X = Died
• Case 1: Never married -> no spells
• Case 2: Married once, censored by end of survey
• Case 3: Married twice, lost to follow-up before end of survey
• Case 4: Married once, died before end of survey
[Figure: one timeline per case, running from the start to the end of the observation period. Case 2: M to O. Case 3: M to D, then M to L; between D and the second M the respondent is not married -> not at risk of divorce -> not part of a spell. Case 4: M to X.]

Calculating number of spells for each respondent
• Each respondent contributes a spell for each time they are at risk of the event under study
• If they are never at risk -> no spells
  – Thus some respondents in the original data set might not be included in the event history analysis. E.g., in an analysis of
    • Getting married, anyone who was already married throughout the period of observation is not at risk of becoming married -> no spells
    • Getting divorced, the same respondent would be at risk the entire time!
• If they are at risk once -> one spell
  – No more than one spell/respondent for non-repeatable events like death
• For repeatable events -> potential for multiple spells
  – In an event history analysis of divorce, anyone who is observed during two periods of marriage contributes two spells

Example spells for a study of divorce
Event under study = divorce
M = Married; D = Divorced; L = Lost to follow-up; O = Censored by end of study; X = Died
• Case 1: Never married -> Contributes zero spells to the divorce event history data set
• Case 2: Married once, censored by end of survey. Contributes one open spell
• Case 3: Married twice, lost to follow-up before end of survey. Contributes two total spells: one closed and one open (censored)
• Case 4: Married once, died before end of survey. Contributes one open spell
[Figure: the same timelines, with each spell at risk of divorce marked. Case 2: M to O. Case 3: M to D, then M to L. Case 4: M to X.]

Event history data: Continuous time
1 record per spell
• Case 1 does not contribute ANY spells at risk of divorce because she was never married
• Cases 2 and 4 each contribute 1 spell, because each was married once
• Case 3 contributes 2 spells (periods at risk of divorce), 1 for each time married

    Spell #       Date spell
ID  (marriage #)  started
2   1             6/22/10
3   1             8/1/90
3   2             10/1/04
4   1             6/1/63

Dependent variables for an event history
• Duration of spell
• Event indicator
  – Can be dichotomous
    • Occurred or not
  – Can be multichotomous
    • Differentiate between different reasons for nonevent
      – Death
      – Lost to follow-up
• Both components must be constructed from information about the respondent’s timeline
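
The step from one record per respondent to one record per spell can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the class and field names (`Respondent`, `Spell`, `marriages`) are hypothetical, and it assumes that every marriage start date opens one spell at risk of divorce. The dates are the marriage dates of the four example respondents, with two-digit years read as 19xx or 20xx.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Respondent:
    """One record per respondent (hypothetical field names)."""
    rid: int
    marriages: List[date] = field(default_factory=list)  # marriage start dates, in order


@dataclass
class Spell:
    """One record per spell, i.e. per period at risk of divorce."""
    rid: int
    spell_number: int  # marriage number
    start: date


def build_spells(respondents: List[Respondent]) -> List[Spell]:
    """Emit one spell per marriage; never-married respondents emit none."""
    result = []
    for r in respondents:
        for number, start in enumerate(r.marriages, start=1):
            result.append(Spell(r.rid, number, start))
    return result


respondents = [
    Respondent(1),                                         # never married
    Respondent(2, [date(2010, 6, 22)]),
    Respondent(3, [date(1990, 8, 1), date(2004, 10, 1)]),  # married twice
    Respondent(4, [date(1963, 6, 1)]),
]

spells = build_spells(respondents)
for s in spells:
    print(s.rid, s.spell_number, s.start.isoformat())
```

Respondent 1 drops out of the spell file entirely, while respondent 3 appears twice, which matches the spell table above.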

Measures of duration replace most dates
• In the event history data set, measures of duration replace
  – Dates of onset of risk period
  – Event occurrence
  – Censoring
• Calculated from distance between dates from event history

    Spell #       Date spell  Duration  Age first  Age at       Age last
ID  (marriage #)  started     of spell  observed   start of     observed
                              (mos.)    (yrs)      spell (yrs)  (yrs)
2   1             6/22/10     3.5       16         40           41
3   1             8/1/90      76.5      20         25           45
3   2             10/1/04     6.5       20         39           45
4   1             6/1/63      474.5     43         21           60

Duration calculations
• Unless precise dates are known, events or censoring are assumed to have occurred half-way through the period, yielding 0.5 person-units of exposure.
  – Assuming that an event occurred in the middle of a time period corresponds to a constant risk of the event during that time interval (Trussell and Hammerslough, 1983)
• If exact dates are known, fractional person-time units can be assigned accordingly.
  – For instance, if a person was divorced on March 10, they would be assigned 10/30 or 0.333 person-months at risk in that month.

Detailed indicator of status at end of spell

    Spell #       Date spell  Duration  Status    Divorce
ID  (marriage #)  started     of spell  at end    event
                              (mos.)    of spell  indicator
2   1             6/22/10     3.5       0         0
3   1             8/1/90      76.5      1         1
3   2             10/1/04     6.5       2         0
4   1             6/1/63      474.5     3         0

• Case 2: Married once, still married at end of survey
• Case 3, spell 1: First marriage ended in divorce
• Case 3, spell 2: Married second time, lost to follow-up in 2005
• Case 4: Married once, died in 2002

Coding of status indicator:
0 = censored
1 = divorced
2 = lost to follow-up (LFU)
3 = died

Coding of divorce event indicator:
0 = censored, LFU, died
1 = divorced
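
The coding rules and duration conventions above can be sketched in Python. The function names are hypothetical. The month-counting formula in `months_half_period` is an assumption: the slides do not state an exact formula, and this is one reading of the half-period convention (calendar months elapsed, with the final month counted as 0.5) that reproduces the two durations shown for respondent 3.

```python
# Status codes at end of spell, as given in the coding table.
STATUS_LABELS = {0: "censored", 1: "divorced", 2: "lost to follow-up", 3: "died"}


def divorce_indicator(status: int) -> int:
    """Collapse the detailed status into the dichotomous event indicator."""
    if status not in STATUS_LABELS:
        raise ValueError(f"unknown status code: {status}")
    return 1 if status == 1 else 0


def months_half_period(start_year: int, start_month: int,
                       end_year: int, end_month: int) -> float:
    """Duration in months when only the month of the event or censoring is known.

    Assumption: count calendar months elapsed and treat the final month as
    contributing 0.5 person-months (event assumed to occur mid-period).
    """
    elapsed = (end_year - start_year) * 12 + (end_month - start_month)
    if elapsed < 1:
        raise ValueError("end month must fall after start month")
    return elapsed - 0.5


def fraction_of_month(day: int, days_in_month: int = 30) -> float:
    """Exposure in the final month when the exact day is known (30-day month as in the slides)."""
    return day / days_in_month


print([divorce_indicator(s) for s in (0, 1, 2, 3)])   # statuses 0..3 -> event indicator
print(months_half_period(1990, 8, 1997, 1))           # respondent 3, first marriage
print(months_half_period(2004, 10, 2005, 5))          # respondent 3, second marriage
print(round(fraction_of_month(10), 3))                # divorced on March 10
```

Keeping the detailed status code alongside the dichotomous indicator lets the same spell file support analyses that treat death or loss to follow-up as distinct outcomes.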

Dependent variable components

    Spell #       Date spell  Duration  Status    Divorce    Age first  Age at       Age last
ID  (marriage #)  started     of spell  at end    event      observed   start of     observed  Gender
                              (mos.)    of spell  indicator  (yrs)      spell (yrs)  (yrs)
2   1             6/22/10     3.5       0         0          16         40           41        male
3   1             8/1/90      76.5      1         1          20         25           45        male
3   2             10/1/04     6.5       2         0          20         39           45        male
4   1             6/1/63      474.5     3         0          43         21           60        female

• Dependent variable component #1 – duration measure: duration of spell (mos.)
• Dependent variable component #2 – dichotomous event indicator: divorce event indicator

Fixed and time-varying covariates
• Covariate = independent variable
• Fixed covariates are those that have the same value for a given respondent throughout the spell
  – E.g., except in rare cases, each person’s gender remains constant
• Time-varying covariates are those whose values can change for a respondent between or during spells
  – E.g., number of children
• Need to map each of these correctly from the one-record-per-respondent format onto the one-record-per-spell format

Fixed covariates
Age at start of spell and gender do not change during the course of a spell.

    Spell #       Date spell  Duration  Status    Divorce    Age at
ID  (marriage #)  started     of spell  at end    event      start of     Gender
                              (mos.)    of spell  indicator  spell (yrs)
2   1             6/22/10     3.5       0         0          40           male
3   1             8/1/90      76.5      1         1          25           male
3   2             10/1/04     6.5       2         0          39           male
4   1             6/1/63      474.5     3         0          21           female

Time-varying covariate
None of these respondents had children prior to their first marriage (though theoretically some cases could). Respondent #3 had one child during his first marriage, so the value of that variable changes between his first and second marriages (first and second spells at risk of divorce).

    Spell #       Date spell  Duration  Status    Divorce    Age first  Age at       Age last          # kids at
ID  (marriage #)  started     of spell  at end    event      observed   start of     observed  Gender  start of
                              (mos.)    of spell  indicator  (yrs)      spell (yrs)  (yrs)             spell
2   1             6/22/10     3.5       0         0          16         40           41        male    0
3   1             8/1/90      76.5      1         1          20         25           45        male    0
3   2             10/1/04     6.5       2         0          20         39           45        male    1
4   1             6/1/63      474.5     3         0          43         21           60        female  0

Presenting information on event history construction: Background work
• Most of the gory details of programming the creation of an event history are part of behind-the-scenes work
  – Important to do consistency checks to make sure event histories were created correctly given
    • Original data source of information for timeline construction
    • Type of event under study
    • Fixed covariates
    • Time-varying covariates
  – E.g., correct
    • Number of spells for each respondent
    • Duration and event indicators for each spell

Presenting information on event history construction
• In the data and methods section, describe:
  – Original data source of information for timeline construction
    • Dates, status, duration of events
  – Type of event under study
  – What constitutes censoring
  – Fixed covariates
  – Time-varying covariates
    • Source(s) of information for determining timing of changes in those variables
• See checklist in chapter 17 of Writing about Multivariate Analysis, 2nd Edition for more detail on what to report
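
As an illustration of the consistency checks mentioned under background work, the sketch below tests a small spell file for the correct number of spells per respondent, positive durations, event indicators that agree with the status codes, and a number of children at the start of each spell that agrees with the children's birth dates. All names (`SPELLS`, `CHILD_BIRTHS`, `EXPECTED_SPELLS`, `check_spells`) are hypothetical, and the values are transcribed from the example tables, with expected spell counts taken as the number of marriages per respondent.

```python
from collections import Counter
from datetime import date

# Spell records transcribed from the example spell table (hypothetical layout).
SPELLS = [
    {"id": 2, "spell": 1, "start": date(2010, 6, 22), "duration": 3.5,
     "status": 0, "event": 0, "kids": 0},
    {"id": 3, "spell": 1, "start": date(1990, 8, 1), "duration": 76.5,
     "status": 1, "event": 1, "kids": 0},
    {"id": 3, "spell": 2, "start": date(2004, 10, 1), "duration": 6.5,
     "status": 2, "event": 0, "kids": 1},
    {"id": 4, "spell": 1, "start": date(1963, 6, 1), "duration": 474.5,
     "status": 3, "event": 0, "kids": 0},
]

# Children's birth dates and number of marriages, from the source data.
CHILD_BIRTHS = {2: [], 3: [date(1995, 12, 5)],
                4: [date(1964, 9, 21), date(1967, 5, 11)]}
EXPECTED_SPELLS = {2: 1, 3: 2, 4: 1}


def kids_at_start(rid, start):
    """Count children born before the spell began (time-varying covariate)."""
    return sum(1 for born in CHILD_BIRTHS.get(rid, []) if born < start)


def check_spells(spells):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    counts = Counter(s["id"] for s in spells)
    for rid, expected in EXPECTED_SPELLS.items():
        if counts[rid] != expected:
            problems.append(f"respondent {rid}: {counts[rid]} spells, expected {expected}")
    for s in spells:
        label = f"respondent {s['id']} spell {s['spell']}"
        if s["duration"] <= 0:
            problems.append(f"{label}: non-positive duration")
        if s["event"] != (1 if s["status"] == 1 else 0):
            problems.append(f"{label}: event indicator disagrees with status")
        if s["kids"] != kids_at_start(s["id"], s["start"]):
            problems.append(f"{label}: number of kids disagrees with birth dates")
    return problems


problems = check_spells(SPELLS)
print(problems if problems else "all checks passed")
```

Checks like these catch structural errors, such as a missing spell or a miscoded event; they do not by themselves confirm that each duration was computed from the right pair of dates.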

Summary
• A continuous-time event history analysis requires a separate record for each period at risk of the event
• For each spell, calculate
  – Components of the dependent variable
    • Duration measure
    • Event indicator
  – Fixed characteristics
  – Time-varying characteristics
• In data and methods section, describe data sources and variables for the event history

Suggested resources
• Allison, P. D. 2010. Survival Analysis Using the SAS System: A Practical Guide, 2nd Edition. Cary, NC: SAS Institute.
• Trussell, James, and Charles Hammerslough. 1983. “A Hazards-Model Analysis of the Covariates of Infant and Child Mortality in Sri Lanka.” Demography 20 (1): 1–26.
• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. University of Chicago Press, chapter 17.

Suggested online resources
• Podcast on data structure for a discrete-time event history analysis

Suggested exercises
• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Question #3a in the problem set for chapter 17
  – Suggested course extensions for chapter 17
    • “Reviewing” exercises #2a through 2h
    • “Applying statistics and writing” exercises #1 and 2a

Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html