Basic Concepts in Survival Analysis
Mark A. Weaver, PhD
Family Health International
Office of AIDS Research, NIH ICSSC, FHI
Goa, India, September 2009

“Survival Analysis”
• Seems, unfortunately, to imply studying deaths
  – Indeed, this is what prompted the development of these methods
  – Insurance companies had great interest in estimating the probability that a new client would soon die or get really sick (“actuarial methods”)

Other Names
• Some other, less morbid, names for survival analysis methods:
  – Time-to-event analysis
  – Failure-time analysis
  – Event-history analysis
  – Duration analysis
  – Transition analysis

Survival Outcomes
• Outcome variable: time until an event occurs
  – Note that this is like a composite outcome with 2 components:
    • time – usually continuous
    • event – a transition from one well-defined state to another well-defined state, e.g.:
      – alive / dead
      – no disease / disease
      – not married / married
      – unemployed / employed
• “Survival analysis was designed for longitudinal data on the occurrence of events.” (Allison 1995)

Survival Data – Time
• All subjects must have a precisely defined time origin (Time 0)
  – Typically the date of randomization in an RCT
  – Must be able to exclude potential subjects who have already had the event prior to time 0 (e.g., use a screening test with high sensitivity)
• Typically, subjects are followed prospectively until they have an event or until they leave the study
  – In an RCT, time is often calculated in days
  – Time = (date of event or date left study) minus date of randomization

Survival Data – End-Points
• Must have a clearly defined case definition, and must be able to identify subjects who have an event
  – Typically simply a 0/1 variable in the dataset
• Typically, assume at most 1 event per subject
  – If multiple events are possible, might use “time to first event”
  – Methods for repeated events are available, but are not often used, for various reasons

Censoring
• Ok, so what’s the big deal? Why do we need special analysis methods?
• Not all subjects will have an event during the study period (or possibly ever)
AND
• Not all subjects will be followed for the same amount of time.
• This combination makes “standard” analysis methods (e.g., t-tests, ANOVA, chi-squared tests, logistic regression, etc.) inappropriate

Censoring
• Subjects for whom no event is observed are referred to as censored.
• Some possible reasons for censoring:
  1. Study ends before subject has an event
  2. Subject withdraws from study
  3. Subject is lost to follow-up
  4. Subject has another kind of event that precludes observing the event of interest (e.g., subject dies, or the event of interest is preterm birth but the subject has an abortion)
• Survival analysis has built-in methods for dealing with these types of missing data
  – As long as censoring is not related to the event (i.e., requires “non-informative” censoring).

Simple Example
• Clinical trial for preventing secondary strokes among sickle cell patients over 5 years
• 2 treatments: New Drug (ND), Standard Transfusions (ST)
  – ND: 3+, 3, 4, 5, 5+
  – ST: 1, 2+, 3, 4+, 5+
  (“+” marks a censored time: no event was observed by then)
• Naïve analysis methods:
  1. Ignore the +’s and compare mean survival times
     ND: 20/5 = 4
     ST: 15/5 = 3
     ND better?
  2. Compare “raw” proportions of subjects with an event
     ND: 60%
     ST: 40%
     ST better?

Right Censoring
• Assume each subject will have some (possibly unobserved) event time T
• If you observe that a subject has the event at time T, then the subject is uncensored.
• If all you know is that T is greater than some value c, then the subject is right censored at c.
  – E.g., suppose a subject completes a 1-year study without the event of interest; all you know is that T > 1 year.
  – By far the most common type of censoring!
  – All survival analysis methods easily handle right censoring by default.
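The two naïve analyses from the Simple Example can be reproduced in a few lines. This is a minimal Python sketch, not part of the original slides: storing each subject as a (time, event) pair, with event = 0 for a “+” (right-censored) time, is a common convention but an assumption here, and the helper names `parse`, `naive_mean_time`, and `naive_event_proportion` are hypothetical. It reproduces the conflicting summaries (mean time 4 vs 3, event proportions 60% vs 40%) and makes the flaw concrete: method 1 treats censored times as if they were event times, while method 2 ignores how long each subject was followed.

```python
# Simple Example data: "+" marks a right-censored time (no event observed by then).
ND = ["3+", "3", "4", "5", "5+"]
ST = ["1", "2+", "3", "4+", "5+"]


def parse(obs):
    """Convert '3+'-style strings into (time, event) pairs.

    event = 1 if the event was observed at that time, 0 if the subject was censored.
    """
    pairs = []
    for s in obs:
        censored = s.endswith("+")
        time = float(s.rstrip("+"))
        pairs.append((time, 0 if censored else 1))
    return pairs


def naive_mean_time(pairs):
    """Naive method 1: ignore the censoring flags and average all times."""
    return sum(t for t, _ in pairs) / len(pairs)


def naive_event_proportion(pairs):
    """Naive method 2: raw proportion of subjects with an observed event."""
    return sum(e for _, e in pairs) / len(pairs)


for name, obs in [("ND", ND), ("ST", ST)]:
    pairs = parse(obs)
    print(name, pairs,
          "mean time:", naive_mean_time(pairs),
          "event proportion:", naive_event_proportion(pairs))
```

Neither summary uses both components of the outcome (time and event) together, which is exactly what the survival methods are built to do.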
Right-Censored Data
[Figure: follow-up timelines for subjects A–E against months in study (0, 6, 12) up to the end of study; A and C uncensored, B, D, and E censored]

3 Types of Right-Censoring
• Type I censoring – study ends when a certain time point is reached
  – E.g., each subject is intended to be followed for 1 year
• Type II censoring – study ends when a certain number of events occur
  – E.g., study ends after 82 events have occurred
• Random censoring – observation terminated for reasons not under the control of the investigator
  – E.g., participant dropout

Other Kinds of Censoring
• Interval censoring: you don’t observe the actual time of the event; all you know is that a < T < b for some values a and b.
  – E.g., testing for STI at 0, 6, and 12 months. If negative at 6 months but positive at 12 months, then you know that 6 < T < 12.

Other Kinds of Censoring
• Left censoring: you don’t observe the actual time of the event; all you know is that the event happened prior to the time of observation.
  – Special case of interval censoring when a = 0.
  – E.g., studying menarche and beginning to follow girls at age 12: some will already have started menstruating.
  – Typically not an issue in RCTs

Two Key Functions
• Two functions are fundamental and of central interest in survival analysis:
  – S(t) = survivor function
  – h(t) = hazard function

Survivor Function
• S(t) = Pr{ T > t } = probability of surviving longer than time t
• Estimating survival probabilities for different, meaningful values of t provides crucial summary information for survival data.
  – E.g., in studying breast cancer, one might be particularly interested in the 5- and 10-year survival rates.
• Note: the probability of having an event by time t (failure probability) is just 1 – S(t).
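To make the definition S(t) = Pr{ T > t } concrete: when every event time is observed (no censoring at all), S(t) can be estimated as the fraction of subjects whose event time exceeds t, and the failure probability is 1 minus that. The minimal Python sketch below assumes exactly that; the event times are hypothetical values invented for illustration, and `empirical_survivor` is a hypothetical helper name. With censored subjects this simple proportion no longer works, since we do not know whether a censored subject's T exceeds t, which is why the dedicated survival methods are needed.

```python
from typing import Sequence


def empirical_survivor(times: Sequence[float], t: float) -> float:
    """Estimate S(t) = Pr{T > t} as the fraction of subjects whose event time exceeds t.

    Valid only when every event time is observed (no censoring).
    """
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed event times in years (illustration only).
times = [0.5, 1.2, 2.0, 3.1, 4.4, 5.5, 6.0, 8.3, 10.7, 12.0]

# 5- and 10-year survival, as in the breast cancer example.
for t in (5, 10):
    s = empirical_survivor(times, t)
    print(f"t={t}: S(t)={s:.2f}, failure probability 1 - S(t)={1 - s:.2f}")
```

For these made-up times, 5 of 10 subjects survive past 5 years and 2 of 10 past 10 years.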
Theoretical Properties of Survivor Function
[Figure: S(t) plotted against t, starting at 1 and declining toward 0 as t → ∞]
• S(0) = 1
• S(t) is a non-increasing, smooth function
• S(∞) = 0

Hazard Function
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
• h(t) = instantaneous event (or failure) rate at time t, given that the event hasn’t occurred up until time t
• Sometimes called the “instantaneous failure probability”, but it’s really not a probability
  – “Potential”? “Propensity”?

Theoretical Properties of Hazard Functions
[Figure: an example hazard function h(t) plotted against t]
• h(t) ≥ 0
• No upper bound

Special Case – Exponential Failure Times
[Figure: h(t) and S(t) plotted against t for exponential failure times]
• h(t) = λ
• S(t) = exp{ −λt }
• Common assumption for sample size calculations
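The hazard definition and the exponential special case can be checked against each other numerically. For continuous T, P(t ≤ T < t + Δt | T ≥ t) = (S(t) − S(t + Δt)) / S(t), so plugging a small but finite Δt into the limit definition approximates h(t). More generally S(t) = exp{ −∫₀ᵗ h(u) du }, so a constant hazard λ gives S(t) = exp{ −λt }. The minimal Python sketch below assumes a hypothetical rate λ = 0.2 and uses hypothetical helper names; the approximate hazard it prints stays at λ for every t, while S(t) starts at 1 and decreases.

```python
import math


def exp_survivor(t: float, lam: float) -> float:
    """Survivor function under exponential failure times: S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def approx_hazard(survivor, t: float, dt: float = 1e-6) -> float:
    """Approximate h(t) from its limit definition using a small, finite dt.

    For continuous T: P(t <= T < t + dt | T >= t) = (S(t) - S(t + dt)) / S(t).
    """
    s_t = survivor(t)
    return (s_t - survivor(t + dt)) / (dt * s_t)


lam = 0.2  # hypothetical constant hazard (events per unit time)


def S(t: float) -> float:
    """Exponential survivor function with the hypothetical rate above."""
    return exp_survivor(t, lam)


for t in (0.0, 1.0, 5.0, 10.0):
    print(f"t={t:>4}: S(t)={S(t):.4f}, approx h(t)={approx_hazard(S, t):.6f}")
```

Swapping in a survivor function whose hazard is not constant (for example, a hypothetical S(t) = exp{ −(t/5)² }) would show the approximate hazard changing with t instead.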