
Survival Analysis
What is survival analysis?
Survival analysis is a technique to analyze data when your outcome variable is time
until an event occurs. That event can be death, marriage, falling into poverty, or a
part breaking in a machine. The key element in survival analysis is that you want to
know either when the event occurred or the hazard of an event occurring.
How is survival analysis different from OLS?
In OLS, the rate or proportion is the variable of interest. For example, OLS could
determine the likelihood of a patient relapsing after treatment. With survival
analysis, the variable of interest is the time until an event or the hazard of the
event; you can determine the chance a patient will relapse at a given time.
When do you use survival analysis?
1. When you have time-dependent covariates.
2. When your data are censored (you don’t have a complete picture for some
participants). Note: Your data don’t have to be censored to use survival
analysis, but if you have censored data you should consider survival analysis.
Time-dependent covariates
Time-dependent covariates are variables that change over time, such as marital
status, employment status, or number of children. Time-independent covariates
are variables that do not vary with time, such as race, country of origin, and sex.
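One common way to store a time-dependent covariate is to split each subject's follow-up into rows over which the covariate is constant (often called start-stop format). The sketch below illustrates that layout only; the class name, field names, and example subject are hypothetical and are not taken from this handout.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    subject_id: int
    start: float   # time at which this row's interval opens
    stop: float    # time at which this row's interval closes
    married: int   # time-dependent covariate, constant within the row
    sex: str       # time-independent covariate, repeated on every row
    event: int     # 1 if the event occurred at `stop`, otherwise 0


def split_follow_up(subject_id, sex, change_times, values, end_time, event):
    """Split one subject's follow-up into rows where `married` is constant.

    change_times[j] is the time at which the covariate takes values[j].
    change_times must start at 0, be increasing, and end before end_time.
    """
    assert change_times[0] == 0 and change_times == sorted(change_times)
    assert change_times[-1] < end_time
    rows = []
    for j, (start, value) in enumerate(zip(change_times, values)):
        is_last = j == len(change_times) - 1
        stop = end_time if is_last else change_times[j + 1]
        # Only the final row can carry the event; earlier rows end in a
        # covariate change, not in the event.
        rows.append(Interval(subject_id, start, stop, value, sex,
                             event if is_last else 0))
    return rows


# Hypothetical subject: unmarried until year 3, married afterwards,
# event observed at year 8.
rows = split_follow_up(1, "F", [0, 3], [0, 1], 8, 1)
for row in rows:
    print(row)
```

The time-independent covariate (sex) is simply copied onto each row, while the time-dependent one (married) changes from row to row.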
Censoring
Censoring occurs when you don’t have complete knowledge of the data for some of
your participants. There are four main types of censoring:
1. Right censoring: The event has not occurred for some participants by the
end of the time interval covered by the data, e.g. you are studying deaths
over a 10-year period and some patients are still alive when it ends. In
non-survival analysis models, right-censored data typically bias the
results, as you would either have to drop those observations or presume
that the event either did or did not occur for them by the end of the
period (leading to over- or underestimation).
2. Left censoring: The event of interest has already occurred for an individual
before the study period began. This carries the same bias risk as right
censoring.
3. Interval censoring: You know that the event time falls within a particular
interval of time, but not the exact time it happened.
4. Random censoring: Observations are terminated for reasons beyond the
control of the investigator, for example emigration or participants lost to
follow-up. This can be either non-informative or informative.
a. Non-informative censored data are representative of all subjects with
the same values of the explanatory variables, so they will not lead to bias.
b. Informative censored data are not representative of other subjects
and could introduce bias into your study. Note that you must judge for
yourself whether the censoring is informative or non-informative;
nothing about the actual data tells you this.
Credit: Menggang Yu, Indiana University
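The bias from mishandling right-censored data can be seen with a small numeric sketch. The event times below are made up for illustration, and the 10-year study window mirrors the example in the list above. Both naive approaches (dropping censored subjects, or treating them as if the event happened at the end of the study) understate the mean time to event.

```python
# Hypothetical true event times in years; in a real study the values
# above the study window would never be observed.
true_times = [2, 4, 6, 9, 12, 15, 20]
study_end = 10

# What the investigator actually records: (observed time, event seen?)
observed = [(min(t, study_end), t <= study_end) for t in true_times]

true_mean = sum(true_times) / len(true_times)

# Naive approach 1: drop every right-censored subject.
kept = [t for t, seen in observed if seen]
mean_dropped = sum(kept) / len(kept)

# Naive approach 2: pretend the event happened at the end of the study.
all_times = [t for t, _ in observed]
mean_as_events = sum(all_times) / len(all_times)

print(round(true_mean, 2), round(mean_dropped, 2), round(mean_as_events, 2))
```

Survival methods avoid both shortcuts by keeping censored subjects in the risk set up to the time they were last observed.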
Kaplan-Meier Survival Curves and Life Tables
Kaplan-Meier Survival Curves and Life Tables are like descriptive statistics; they
describe your data but have limited explanatory capability. The only variable they
consider is time.
Kaplan-Meier Survival Curves
Kaplan-Meier survival curves track the survival of study subjects over time. The
curve estimates the survival function taking only time into account. The survival
function is estimated over continuous time, rather than over time divided into
intervals.
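As a minimal sketch, the Kaplan-Meier (product-limit) estimate can be computed by hand: at each distinct event time, the running survival probability is multiplied by one minus the number of events divided by the number still at risk. The function name and the sample data are hypothetical, and the code assumes the usual convention that a subject censored at a given time is still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  : observed time for each subject (event or censoring time)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every subject whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical data: six subjects, two of them censored (event = 0).
curve = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 0, 1, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 4))
```

The curve only steps down at times where an event was observed; censored subjects shrink the risk set without causing a step.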
Life Tables
Life tables can be thought of as a more detailed frequency distribution table. The
distribution of survival times is divided into intervals defined by the researcher.
Life tables then report the number or proportion of subjects that enter the interval
alive, the number that fail in the interval, and the number censored in the interval.
The hazard rate and other statistics are computed from these data.
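The counts a life table reports can be sketched in a few lines. The function below is illustrative only: the interval boundaries and data are hypothetical, intervals are taken as [lower, upper), and the conditional failure probability uses the common actuarial adjustment (censored subjects count as at risk for half the interval), which is an assumption and not something this handout specifies.

```python
def life_table(times, events, breaks):
    """Build life-table rows for the intervals [breaks[j], breaks[j + 1]).

    Assumes every observed time is >= breaks[0], so all subjects enter
    the first interval.
    """
    rows = []
    entering = len(times)
    for lower, upper in zip(breaks[:-1], breaks[1:]):
        in_interval = [e for t, e in zip(times, events) if lower <= t < upper]
        failed = sum(1 for e in in_interval if e == 1)
        censored = sum(1 for e in in_interval if e == 0)
        # Actuarial adjustment: a censored subject contributes half an interval.
        effective = entering - censored / 2
        cond_prob_fail = failed / effective if effective > 0 else 0.0
        rows.append({
            "interval": (lower, upper),
            "entering": entering,
            "failed": failed,
            "censored": censored,
            "cond_prob_fail": cond_prob_fail,
        })
        entering -= failed + censored
    return rows


# Hypothetical data: six subjects, two censored, three 5-year intervals.
table = life_table([3, 5, 5, 8, 10, 12], [1, 0, 1, 1, 0, 1], [0, 5, 10, 15])
for row in table:
    print(row)
```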
Cox Proportional Hazard Model
The Cox proportional hazard model is a general regression model used in survival
analysis. This model allows you to specify and control for individual characteristics,
such as age, sex, and socioeconomic status (depending on the variables of interest in
your model). The variables can be time-independent (age, sex) or time-dependent
(socioeconomic status). The Cox model for time-independent variables is:
$$h_i(t) = h_0(t)\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
Here $h_i(t)$ is the hazard for subject $i$ at time $t$, $h_0(t)$ is the baseline
hazard, and the $\beta$ terms are the regression coefficients. Each of the $x_{ik}$
variables represents a time-independent variable. The model for time-dependent
variables is specified as follows:
$$h_i(t) = h_0(t)\exp(\beta_1 x_{i1}(t) + \beta_2 x_{i2}(t) + \dots + \beta_k x_{ik}(t))$$
Here, the $x_{ik}(t)$ are functions specifying variables that change over time. Like
other survival analysis techniques, the Cox proportional hazards model can handle
censored data.
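For the time-independent form of the model, a short sketch shows why the hazards are called proportional: the ratio of two subjects' hazards depends only on their covariates and the coefficients, not on the baseline hazard at any given time. The coefficient values, covariates, and baseline hazard values below are invented for illustration, and the code evaluates the model for given coefficients but does not estimate them.

```python
import math


def cox_hazard(baseline_hazard, betas, x):
    """Hazard h_i(t) = h_0(t) * exp(sum of beta_k * x_ik) at one time point."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard * math.exp(linear_predictor)


def hazard_ratio(betas, x_a, x_b):
    """Ratio of subject a's hazard to subject b's; the baseline cancels out."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x_a, x_b)))


# Hypothetical coefficients for (age in years, sex coded 1 = male, 0 = female).
betas = [0.03, 0.5]
subject_a = [60, 1]
subject_b = [60, 0]

ratio = hazard_ratio(betas, subject_a, subject_b)

# The same ratio holds whatever the baseline hazard is at a given time.
for h0 in (0.01, 0.05):
    direct = cox_hazard(h0, betas, subject_a) / cox_hazard(h0, betas, subject_b)
    print(h0, round(direct, 4), round(ratio, 4))
```

In practice the coefficients are estimated from data with a statistics package; this sketch only shows how the model turns covariates into hazards.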
Resources
Introduction to Survival Analysis: http://courses.washington.edu/b515/l15.pdf
Survival Analysis Using Stata: www.nyu.edu/its/socsci/Docs/Survival8.ppt
Survival Analysis Textbook (available online through UW Libraries): Survival
Analysis: A Self-Learning Text by David Kleinbaum and Mitchel Klein