Survival Analysis

What is survival analysis?
Survival analysis is a technique for analyzing data in which the outcome variable is the time until an event occurs. That event can be death, marriage, falling into poverty, or a part breaking in a machine. The key element in survival analysis is that you want to know either when the event occurs or the hazard of the event occurring (the instantaneous risk of the event at a given time among those who have not yet experienced it).

How is survival analysis different from OLS?
In OLS, the variable of interest is a rate or proportion. For example, OLS could estimate the likelihood that a patient relapses after treatment. With survival analysis, the variable of interest is the time until the event or the hazard of the event, so you can estimate the chance that a patient will have relapsed by a given time.

When do you use survival analysis?
Use survival analysis when you have time-dependent covariates, or when your data are censored (you do not have a complete picture for some participants).
Note: Your data do not have to be censored to use survival analysis, but if you have censored data you should consider survival analysis.

Time-dependent covariates
Time-dependent covariates are variables whose values change over time, such as marital status, employment status, or number of children. Time-independent covariates are variables that do not vary with time, such as race, country of origin, and sex.

Censoring
Censoring occurs when you do not have complete information about the event time for some of your participants. There are four main types of censoring:
1. Right censoring: For some participants, the event has not occurred by the end of the observation period. For example, if you are studying deaths over a 10-year period, some patients will still be alive when the period ends. In models not designed for survival data, right-censored observations typically bias the results: you must either drop them or assume that the event did or did not occur for them by the end of the period, which leads to an over- or underestimation.
2. Left censoring: The event of interest has already occurred for an individual before the study period begins, so its exact time is unknown. This carries the same risk of bias as right censoring.
3. Interval censoring: You know that the event occurred within a particular time interval, but not exactly when.
4. Random censoring: Observations end for reasons beyond the investigator's control, for example emigration or loss to follow-up. Random censoring can be either non-informative or informative:
a. Non-informative: censored subjects are representative of all subjects with the same values of the explanatory variables, so the censoring will not lead to bias.
b. Informative: censored subjects are not representative of other subjects, and the censoring could introduce bias into your study.
Note that you must decide whether censoring is informative or non-informative on substantive grounds; nothing in the observed data tells you this.

Credit: Menggang Yu, Indiana University

Kaplan-Meier Survival Curves and Life Tables
Kaplan-Meier survival curves and life tables are like descriptive statistics: they describe your data but have limited explanatory capability, because the only variable they consider is time.

Kaplan-Meier Survival Curves
Kaplan-Meier survival curves track the survival of study subjects over time. The curve estimates the survival function, taking only time into account. Time is treated as continuous rather than divided into intervals: the estimate is updated at each observed event time, which gives the curve its step shape.

Life Tables
A life table can be thought of as a more detailed frequency distribution table. The range of survival times is divided into intervals defined by the researcher. For each interval, the life table reports the number or proportion of subjects who enter the interval alive, the number who fail (experience the event) during the interval, and the number censored during the interval. The hazard rate and other statistics are computed from these counts.

Cox Proportional Hazards Model
The Cox proportional hazards model is a general regression model used in survival analysis.
This model allows you to specify and control for individual characteristics, such as age, sex, and socioeconomic status (depending on the variables of interest in your model). The variables can be time-independent (for example, age at the start of the study, or sex) or time-dependent (for example, socioeconomic status).

The Cox model for time-independent variables is:
$$h_i(t) = h_0(t)\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})$$
Here $h_i(t)$ is the hazard for individual $i$ at time $t$, $h_0(t)$ is the baseline hazard (the hazard when all covariates equal zero), each $x_{ik}$ is the value of a time-independent variable for individual $i$, and $\beta_k$ is its coefficient. Because $h_0(t)$ is shared by all individuals, the ratio of any two individuals' hazards does not depend on $t$; this is why the model is called "proportional hazards".

The model for time-dependent variables is specified as follows:
$$h_i(t) = h_0(t)\exp(\beta_1 x_{i1}(t) + \beta_2 x_{i2}(t) + \cdots + \beta_k x_{ik}(t))$$
Here, the $x_{ik}(t)$ are functions specifying how each variable changes over time.

Like other survival analysis techniques, the Cox proportional hazards model can handle censored data.
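To make the Kaplan-Meier curve and the Cox model described above concrete, here is a minimal pure-Python sketch, assuming right-censored data and a single time-independent covariate. The data set is hypothetical, made up for illustration, and the function names (`kaplan_meier`, `cox_score_info`, `cox_fit`) are illustrative rather than any library's API. `kaplan_meier` computes the product-limit estimate $\hat S(t) = \prod_{t_j \le t}(1 - d_j/n_j)$, where $d_j$ is the number of events at time $t_j$ and $n_j$ is the number still at risk; censored subjects count as at risk up to their censoring time and then drop out. `cox_fit` estimates the coefficient $\beta$ by Newton-Raphson on the Cox partial likelihood, using Breslow's approximation for tied event times. The baseline hazard $h_0(t)$ cancels out of that likelihood, which is why it never has to be specified. For real analyses an established package (for example Python's lifelines, or Stata's `sts` and `stcox` commands) is the safer choice, and it also handles multiple and time-dependent covariates.

```python
import math


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each subject (event or censoring time)
    events -- 1 if the event was observed, 0 if the subject was right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # n_j: subjects still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject whose time equals t (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk    # multiply by (1 - d_j / n_j)
            curve.append((t, surv))
        # Subjects censored at t were still at risk at t; they leave afterwards.
        at_risk -= removed
    return curve


def cox_score_info(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood
    for one covariate, with Breslow's handling of tied event times."""
    score = info = 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue                     # censored subjects add no event term
        # Risk set: everyone still under observation at time ti.
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        mean = s1 / s0                   # exp(beta*x)-weighted mean of x
        score += xi - mean
        info += s2 / s0 - mean * mean
    return score, info


def cox_fit(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson estimate of beta in h_i(t) = h0(t) * exp(beta * x_i)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: follow-up times, event indicators, and a 0/1 covariate.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
group = [1, 1, 0, 1, 0, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")

beta = cox_fit(times, events, group)
print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.3f}")
```

Running the script prints the steps of the survival curve and the estimated coefficient. A positive $\beta$ means that subjects with `group = 1` have a higher hazard than those with `group = 0`, by the same factor $\exp(\beta)$ at every time $t$, which is the proportional hazards property in action.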