Survival analysis

First example of the day
• Small cell lung cancer
• Median survival time: 8-10 months
• 2-year survival is 10%
• New treatment showed median survival of 13.2 months

Progressively censored observations
Current life table
• Completed dataset
Cohort life table
• Analysis "on the fly"

Problem
Do patients survive longer after treatment 1 than after treatment 2?
Possible solutions:
• ANOVA on mean survival time?
• ANOVA on median survival time?
• 100 person-years of observation: how long the average person has been in the study.
• 10 persons being observed for 10 years
• 100 persons being observed for 1 year

Life table analysis
A subset of 13 patients undergoing the same treatment. Time interval chosen to be 3 months.
• $n_i$: number of patients starting a given period
• $d_i$: number of terminal events; in this example, progression/response
• $w_i$: number of patients that have not yet been in the study long enough to finish this period
• Number exposed to risk: $n_i - w_i/2$, assuming that patients withdraw in the middle of the period on average
• $q_i = d_i / (n_i - w_i/2)$: proportion of patients terminating in the period
• $p_i = 1 - q_i$: proportion of patients surviving the period
• $S_i = p_i \, p_{i-1} \cdots p_1$: cumulative proportion surviving (a product of conditional probabilities)

Survival curves
How long will a lung cancer patient keep having cancer on this particular treatment?

Kaplan-Meier
Simple example with only 2 "terminal events".

Confidence interval of the Kaplan-Meier method
$$SE(S_i) = S_i \sqrt{\sum_{j \le i} \frac{d_j}{n_j (n_j - d_j)}}$$
E.g. after 32 months, with a single terminal event among 10 patients at risk:
$$SE(S_i) = 0.9 \sqrt{\frac{1}{10 \, (10 - 1)}} = 0.0949$$
Survival plot for all data on treatment 1.
Are there differences between the treatments?

Comparing Two Survival Curves
One could use the confidence intervals… But what if the confidence intervals fail to overlap only at some points?
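The life-table quantities and the Kaplan-Meier standard error defined above can be sketched in a few lines of Python. This is a minimal illustration, not the calculation behind the slides: the function names `life_table` and `greenwood_se` are made up here, and the three-period input is hypothetical data (the 13-patient table from the slide is not reproduced in the text). Only the last call uses numbers from the slides (10 patients at risk, 1 terminal event).

```python
import math


def life_table(intervals):
    """Actuarial life table.

    intervals: list of (n_i, d_i, w_i) tuples in time order, where n_i is the
    number starting the period, d_i the terminal events and w_i the withdrawals.
    """
    rows = []
    cumulative = 1.0
    for n, d, w in intervals:
        exposed = n - w / 2.0      # number exposed to risk
        q = d / exposed            # proportion terminating in the period
        p = 1.0 - q                # proportion surviving the period
        cumulative *= p            # S_i = p_i * p_(i-1) * ... * p_1
        rows.append({"exposed": exposed, "q": q, "p": p, "S": cumulative})
    return rows


def greenwood_se(events):
    """Kaplan-Meier estimate and its standard error.

    events: list of (n_j, d_j) pairs, one per event time up to the time of
    interest (n_j at risk, d_j terminal events).
    """
    survival = 1.0
    total = 0.0
    for n, d in events:
        survival *= (n - d) / n
        total += d / (n * (n - d))
    return survival, survival * math.sqrt(total)


# Hypothetical three-period example (NOT the data from the slides).
table = life_table([(13, 2, 1), (10, 3, 0), (7, 1, 2)])
for row in table:
    print(row)

# The worked example from the slide: 1 event among 10 patients at risk.
km_s, km_se = greenwood_se([(10, 1)])
print(round(km_s, 4), round(km_se, 4))
```

With one event among 10 patients the sketch returns S = 0.9 and a standard error of about 0.0949, matching the slide.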
Methods for comparing two survival curves:
• Logrank statistics
• Hazard ratio
• Mantel-Haenszel methods

The logrank statistics
Aka Mantel-logrank statistics, aka Cox-Mantel-logrank statistics.

Five steps to the logrank statistics table
1. Divide the data into intervals (e.g. 10 months)
2. Count the number of patients at risk in the groups and in total
3. Count the number of terminal events in the groups and in total
4. Calculate the expected numbers of terminal events. E.g. in the interval 31-40: 44 at risk in group 1 and 46 in group 2, 4 terminal events. Expected terminal events: 4 × (44/90) and 4 × (46/90)
5. Calculate the total

Smells like chi-square statistics
$$\chi^2 = \sum_{\text{all treatments}} \frac{(O - E)^2}{E} = \frac{(23 - 17.07)^2}{17.07} + \frac{(12 - 17.93)^2}{17.93} = 4.02$$
df = 1; p < 0.05

Hazard ratio
$$\text{Hazard ratio} = \frac{O_1 / E_1}{O_2 / E_2} = \frac{23 / 17.07}{12 / 17.93} = 2.01$$

Mantel-Haenszel test
For each interval, form a 2×2 table with cells a and b in the first row, c and d in the second row, and grand total n:
$$OR = \frac{\sum a d / n}{\sum b c / n}$$
Is the OR significantly different from 1? Look at cell (1,1):
• Expected value: $E(a_i) = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
• Variance: $V(a_i) = \dfrac{(a + c)(b + d)(a + b)(c + d)}{n^2 (n - 1)}$
$$\chi^2_{MH} = \frac{\left(\sum a_i - \sum E(a_i)\right)^2}{\sum V(a_i)} = 1.12$$
df = 1; p > 0.05

Hazard function
$$H = -\log(S_i)$$
$$H = \frac{d}{f + c}$$
where d is the number of terminal events, f is the sum of failure times and c is the sum of censored times.

Logistic regression
Who survived Titanic?

The sinking of Titanic
Titanic sank on the night of April 14th-15th 1912 with 2228 souls on board; 705 survived. A dataset covering 1309 passengers has been preserved. Who survived?

The data

pclass  survived  name                                             sex     age     sibsp  parch
1       1         Allen, Miss. Elisabeth Walton                    female  29      0      0
1       1         Allison, Master. Hudson Trevor                   male    0.9167  1      2
1       0         Allison, Miss. Helen Loraine                     female  2       1      2
1       0         Allison, Mr. Hudson Joshua Creighton             male    30      1      2
1       0         Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25      1      2
1       1         Anderson, Mr. Harry                              male    48      0      0
1       1         Andrews, Miss. Kornelia Theodosia                female  63      1      0
1       0         Andrews, Mr. Thomas Jr                           male    39      0      0
1       1         Appleton, Mrs. Edward Dale (Charlotte Lamson)    female  53      2      0

sibsp is the number of siblings and/or spouses accompanying the passenger; parch is the number of parents and/or children accompanying. Some values are missing.
Can we predict who will survive Titanic II?

Analyzing the data in a (too) simple manner
• Associations between factors without considering interactions

Could we use multiple linear regression to predict survival?
Multiple linear regression:
• Response variable is defined between −∞ and +∞
• Normally distributed
• $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
Logistic regression:
• Response variable is defined between 0 and 1
• Bernoulli distributed
• Logit transformation is modeled linearly:
$$\ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

The logistic function
$$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)} = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)\right)}$$

The sigmoidal curve
$$p = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$
[Figure: p (0 to 1) plotted against x (−6 to 6) for β0 = 0, β1 = 1]
• The intercept basically just shifts the curve along the input variable. [Figure: curves for β0 = 0, 2 and −2, all with β1 = 1]
• Large regression coefficient → risk factor strongly influences the probability. [Figure: curves for β1 = 1, 2 and 0.5, all with β0 = 0]
• Positive regression coefficient → risk factor increases the probability. [Figure: curves for β1 = 1 and −1, both with β0 = 0]

Logistic regression of the Titanic data – passenger class
1. Summary of data
2. Coding of the dependent variable
3. Coding of the categorical explanatory variable: first class: 1; second class: 2; third class: reference

• A fit of the null model, basically just the intercept. Usually not interesting.
• The total probability of survival is 500/1309 = 0.382. The cutoff is 0.5, so all are classified as non-survivors.
• Basically tests if the null model is sufficient. It almost certainly is not. Shows that survival is related to pclass (which is not in the null model).

1. Omnibus test: uses the likelihood ratio (LR) to test whether adding the pclass variable makes the model better. It did! But only better than the null model, so no surprise.
2. Model summary: other measures of the goodness of fit.
3. Classification table: by including pclass, 67.7% of passengers were correctly categorized.
4. Variables in the equation: the first line repeats that pclass has a significant effect on survival. B is the fitted logistic regression coefficient.
Exp(B) is the odds ratio, so the odds of survival are 4.7 (3.6-6.3) times those of passengers in third class (the reference class).

Logistic regression of the Titanic data – adding age to the model
• Oops… Some data points are missing
• And the null model is poorer
• Cox and Snell's R-square increased from 0.093 to 0.141, indicating a better model
• With this model we can correctly classify 69.1%; passenger class alone classified 67.7%
• Age has a significant influence on survival.
• The odds ratio of age is 0.963
• So the odds of a 31-year-old are 0.963 times the odds of a 30-year-old.
• Or the odds for a 30-year-old to survive are 1/0.963 = 1.038 times those of a 31-year-old

Logistic regression of the Titanic data – age alone
• The model is extremely poor
• Consequently age appears to be insignificant in estimating survival.

Logistic regression of the Titanic data – adding family and sex
• The model is becoming better

Logistic regression of the Titanic data – using the model to predict
• What is the probability that a 25-year-old woman, accompanied only by her husband and holding a second-class ticket, would survive Titanic?
$$z = -2.703 - 0.041 \times 25 + 2.552 + 1.718 + 0.925 = 1.4670$$
$$p = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-1.4670}} = 0.8126$$

Using the model to predict survival
• The same question, computed with age apparently entered as a standardized value, (25 − 30)/14.41:
$$z = -3.929 - 0.589 \times \frac{-5}{14.41} + 1.718 + 2.552 + 0.926 = 1.4714$$
$$p = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-1.4714}} = 0.8133$$

Is it realistic that Leonardo survives and the chick dies?
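The prediction arithmetic on the last slides can be re-checked with a short Python sketch. It only evaluates the two expressions for z exactly as they are printed on the slides and passes them through the logistic function. The slides do not say which coefficient belongs to which variable, so apart from the intercept and the age term the remaining terms are kept as an unlabeled list; the names `logistic`, `other_terms`, `z_raw` and `z_std` are illustrative assumptions, not anything from the original analysis.

```python
import math


def logistic(z):
    """Logistic function p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# First slide: age entered in years.
intercept = -2.703
age_coefficient = -0.041
# Remaining terms as printed on the slide; their variable labels are not
# given in the text, so they are deliberately left unlabeled here.
other_terms = [2.552, 1.718, 0.925]

z_raw = intercept + age_coefficient * 25 + sum(other_terms)
p_raw = logistic(z_raw)

# Second slide: the age term is written as -0.589 * (-5) / 14.41, which looks
# like a standardized age (assumption: mean 30, standard deviation 14.41).
z_std = -3.929 - 0.589 * (-5) / 14.41 + 1.718 + 2.552 + 0.926
p_std = logistic(z_std)

print(round(z_raw, 4), round(p_raw, 4))
print(round(z_std, 4), round(p_std, 4))
```

Both versions put the survival probability at roughly 0.81. The small difference in z (1.4670 versus 1.4714) comes from rounding in the printed coefficients.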