Survival Analysis Introduction (AKA Event History Analysis)

Situation
Analysis of data in which the dependent variable consists of two aspects: 1) whether or not the outcome occurred, a dichotomous characteristic, and 2) the amount of time that elapsed before the outcome occurred, a continuous quantitative characteristic.

Typical situations with this type of dependent variable

Medical literature
Two treatments for a disease are given. We attempt to record 1) whether or not each patient died (the dichotomous outcome) and 2) how long each patient survived until death (the continuous outcome).
Group A given Drug A. Group B given Drug B.
Are death rates different in the two groups? Are survival times different in the two groups?

Turnover literature
Persons are hired by an organization into two different buildings (plants). We attempt to record 1) whether or not each employee quits before retirement and 2) how long each employee is employed before quitting.
Building A: Kill and Debone. Building B: Cook.
Are turnover rates different in the two plants? Are there differences in length of service in the two plants?

Survival Analysis – 1 Printed on 10/9/2008

Dealing with the two aspects of survival analysis: one dichotomous and one continuous
Dichotomous: Dying / Leaving the company
Continuous: Length of survival / Time at the company
These two aspects, the proportion dying/turning over and the "average" time before death/turnover, are negatively correlated: if the death rate is lower, survival times are longer. But they're not perfectly negatively correlated, so each gives us a slightly different picture of survival across groups.
If we could observe persons for an infinite period of time, until EVERYONE had died or quit, then we would probably just analyze the survival times: a positively skewed variable, but one for which there are plenty of analytic tools (Mann-Whitney, Kruskal-Wallis, transformations to normality, etc.).
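In a data file, the two aspects become two variables per case: a duration and a status flag. The helper below is a hypothetical sketch (standard library only; the function name and the dates are invented for illustration) of the usual coding when the outcome may not have happened by the time observation stops, which is the complication taken up under Window of Observation.

```python
from datetime import date


def duration_and_status(start, observation_end, outcome_date=None):
    """Record one case as (duration in days, status).

    status = 1: the outcome (death/quit) was observed on outcome_date.
    status = 0: no outcome was seen by observation_end, so the duration is
                only a lower bound on the true survival time (censored).
    """
    if outcome_date is not None and outcome_date <= observation_end:
        return (outcome_date - start).days, 1
    # Still alive/employed when observation stopped: right-censored.
    return (observation_end - start).days, 0


window_closes = date(2004, 12, 31)
print(duration_and_status(date(2004, 1, 1), window_closes, date(2004, 3, 1)))
print(duration_and_status(date(2004, 12, 1), window_closes))
```

For a case lost to follow-up, passing the date of last contact as observation_end gives the same status-0 coding.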
Window of Observation
The problem with this is that we don't have an infinite period of time to wait until everyone quits or dies. Plus, we may lose contact with people, so for some people we won't know how long they survived. Plus, people die/quit for a variety of reasons, many or most not related to the specifics of the treatment or working conditions. So survival times may be shortened arbitrarily and randomly for some people. Somehow, this variability must be taken into account.
The window of observation is the specific time period in which participant survival is recorded. That is, at some time we begin recording whether or not each person is surviving, and at some later time we quit monitoring each person. The exigencies of research require that our windows of observation be finite. Because the window is of finite duration, we necessarily have incomplete information on some participants. Of particular importance is the fact that some will still be alive/working when we quit observing. This means that we won't have accurate survival times for some people, and we also won't have accurate death/quit rates for some groups, since some members of each group may still be working when the window closes. Oh, woe is me! What to do?

Overview of types of cases in survival analysis
[Figure: timeline running from "Monitoring of cases begins, i.e., Window opens" (left) to "Monitoring of cases ends, i.e., Window closes" (right); each case is drawn as an arrow.]
Ideal cases: each starting time and ending time is known.
Cases whose ending time (time of termination/death) is unknown. These are called right-censored cases, the most common kind.
 - Cases that are still employed/surviving at the time monitoring ends.
 - Cases lost to follow-up (quit answering the phone, left the state, etc.).
Cases whose starting times (times the disease develops) are unknown.
I believe these are called left-censored, although Tabachnick & Fidell are ambiguous on p. 5?? and p. 5?? regarding this. They use "failed before study began" and "disease process began before study began", two different conditions. I believe their discussion on p 5?? is the correct definition.
Cases whose starting times and ending times are unknown: Fagettaboutit, these are not analyzable.

Incorrect Analysis 1: Analyze only the outcomes (deaths or quits)
We could use logistic regression to compare death/quit rates between groups. (Use linear regression in a pinch, praying that the God of statistics won't strike you down.)
Problem: it's possible to create situations in which distributions of durations are different even though proportions of outcomes are identical. Consider the following . . .
Assume we're dealing with employment. In the figures, each arrow represents the duration of employment for a person. The horizontal axis is time. The vertical line at the left represents the time at which the window of observation opened; the vertical line at the right represents the time at which it closed. The head (->) of the arrow represents death/termination.
[Figure: Group A, Termination Rate = 100%; Group B, Termination Rate = 100%.]
Clearly, Group A has longer average employment times, but both have exactly the same proportion of turnovers, 100% in this example. So comparison of death/quit rates gives an inaccurate picture of the differences between the groups.

Incorrect Analysis 2: Analyze only the durations, ignoring the deaths/turnovers
Use Mann-Whitney U tests, since durations will be positively skewed.
[Figure: Group A, Average Survival Time = ; Group B, Average Survival Time = ]
In the example above, the two groups have equal average survival time but different turnover rates: Group A has a 100% turnover rate, while that of Group B is 60%.
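Coding each case as a (duration, status) pair, the two pitfalls can be reproduced with a few hypothetical numbers (invented for illustration, not the data behind the figures). Each single summary misses the difference the other one reveals. Note that two of the group B durations in the second example are censored, so their mean understates true tenure; that is exactly the bias survival analysis is designed to handle.

```python
from statistics import mean


def summarize(cases):
    """cases: list of (duration, status) pairs, status 1 = quit, 0 = censored.
    Returns (termination rate, mean recorded duration)."""
    durations = [d for d, _ in cases]
    statuses = [s for _, s in cases]
    return sum(statuses) / len(cases), mean(durations)


# Incorrect Analysis 1: same termination rate, different durations.
group_a1 = [(8, 1), (9, 1), (10, 1), (11, 1), (12, 1)]
group_b1 = [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1)]

# Incorrect Analysis 2: same mean duration, different termination rates.
group_a2 = [(2, 1), (4, 1), (6, 1), (8, 1), (10, 1)]
group_b2 = [(4, 1), (5, 1), (6, 1), (7, 0), (8, 0)]  # last two still employed

for label, a, b in [("Analysis 1", group_a1, group_b1),
                    ("Analysis 2", group_a2, group_b2)]:
    print(label, "A:", summarize(a), "B:", summarize(b))
```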
In the Incorrect Analysis 2 example, analysis of only survival times gives an incorrect picture of the differences in survival between the groups.
Each type of incomplete analysis ignores one aspect of the complete dependent variable. We need a method of analysis that takes both aspects into account. Survival analysis is an analytic technique that combines both aspects.

Survival Analysis (also called Event History Analysis)
An analytic technique that models both the proportion of outcomes (death/turnover) and the average duration to outcome.
3 separate techniques: Life Table, Kaplan-Meier, Cox Regression.

Key concepts common to all
1. Survival function (the most important one of all)
A plot of the proportion surviving from time 0 up to a given time vs. time. A cumulative plot.
[Figure: Proportion Surviving (vertical axis, 0% to 100%) vs. Time (horizontal axis); two curves annotated at time t: "63% have survived at time, t" and "54% have survived at time, t".]
Generally a decreasing curve, since the proportion surviving can only remain constant or decrease across time. Separate curves for separate groups.
Note that the survival function represents the proportion dying/terminating: the height of the curve at a point is the proportion still surviving, so 1 minus the height is the proportion that has died/terminated by that time. The curve also represents duration of stay/life (how far the curve has progressed to the right from t = 0). It is a two-dimensional representation of the two aspects of survival: turnover rates and length of life/employment.

Comparing survival rates between groups
The vertical axis represents the proportion surviving (equivalently, 1 minus the proportion that has turned over). Within a vertical slice at any point, turnover rates up to a particular time can be compared. In the following, we see that Group B had lower survival/higher turnover at the indicated time.
[Figure: survival curves for Groups A and B vs. Time, with a vertical slice at one time point.]
The horizontal axis represents duration of life/stay. Within a horizontal slice at any point, durations can be compared. In the following, we see that the time to reach 70% turnover was longer for Group A.
[Figure: survival curves for Groups A and B vs. Time, with a horizontal slice at 70% turnover.]

2. Hazard function
A plot of the proportion dying/leaving in each time interval from among those who had survived to that interval. Among those who have survived until time t, the hazard function gives the proportion who die in the next interval. Not a cumulative plot.
Hazard function for human mortality: highest at very young ages and at old ages.
[Figure: Proportion Dying (vertical axis) vs. Age (horizontal axis).]

3. Cumulative hazard
A cumulative plot that rises as the survival plot falls. Strictly, the cumulative hazard is H(t) = -ln S(t), where S(t) is the survival function. The plot of the proportion dying/turning over up to a particular time is 1 - S(t), the complement (not the inverse) of the survival plot; SPSS offers it as "One minus survival".

Three general types of Survival Analysis
1. Life Tables analysis. The window of observation is cut up into n equal-length intervals. Proportions of persons surviving/dying within each interval are computed. This is the original method.
Useful for analysis of one group or for comparison of a few groups defined by levels of a single categorical factor. Can't incorporate quantitative predictors. Can't incorporate more than 2 qualitative predictors in SPSS. Cannot analyze interactions of 2 or more predictors.
2. Kaplan-Meier analysis. Event-based. Rather than defining intervals based on time, intervals are defined by occurrences of death/termination. Each death/termination marks the end of one interval and the beginning of the next.
Can't incorporate quantitative predictors. Can't incorporate more than 2 qualitative predictors in SPSS. Cannot analyze interactions of 2 or more predictors.
3. Cox Proportional Hazards Regression (Cox Regression)
A very general procedure, based on a specific mathematical model of survival developed by Cox. Estimates a hazard function for the whole sample, then estimates ratios of the hazards for groups/persons with different values of the IVs to this overall hazard function.
As implemented in SPSS, the output and analyses look a lot like logistic regression.
Can incorporate quantitative predictors.
Can incorporate multiple qualitative and quantitative factors. Can incorporate interactions.
Requires: proportional hazard functions. Survival plots for different groups must diverge "nicely" and can't cross.
[Figure: two pairs of survival curves, labeled "OK" (curves diverge without crossing) and "Not OK" (curves cross).]

Based on Tabachnick Table 11.1, p. 511
Analyzed using SPSS Life Tables
Suppose the efficacy of Drug 0 is being compared with that of Drug 1. Each was formulated to prolong the life of patients with a usually terminal form of cancer. Seven patients were given Drug 0 and five were given Drug 1. Patients were observed for up to 12 months. After 12 months, the window of observation closed and the results were entered into SPSS.
So this problem is analogous to a turnover problem in organizational research with two groups of employees treated differently.
[Screen shot of the Life Tables dialog, with two options annotated "Like ANOVA" and "Like multiple t-tests".]
The SPSS syntax to invoke the analysis:
SAVE OUTFILE='G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav'
 /COMPRESSED.
SURVIVAL TABLE=months BY drug(0 1)
 /INTERVAL=THRU 12 BY 1
 /STATUS=outcome(1)
 /PRINT=TABLE
 /PLOTS (SURVIVAL)=months BY drug.
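The columns SPSS prints for this syntax (in the life table output that follows) can be reproduced with the standard actuarial life-table formulas: withdrawals count as half exposed, the hazard rate is events / (width * (exposed - events/2)), and Greenwood-type sums give the standard errors. The sketch below is a reconstruction under those assumptions, not SPSS's own code; the function name life_table is hypothetical, and its results agree with the printed values for this data set (interval width 1, as in /INTERVAL=THRU 12 BY 1).

```python
import math


def life_table(times, events, width=1.0):
    """Actuarial life table for one group, one row (dict) per interval.

    times  : follow-up time of each case (e.g., months)
    events : 1 = terminal event (death/quit), 0 = withdrawn (censored)
    """
    rows = []
    entering = len(times)
    cum_surv = 1.0
    var_sum = 0.0          # running sum of q / (n' * p), used for the SEs
    k = 0
    while entering > 0:
        lo, hi = k * width, (k + 1) * width
        here = [e for t, e in zip(times, events) if lo <= t < hi]
        d = sum(here)                     # terminal events in the interval
        w = len(here) - d                 # withdrawals in the interval
        exposed = entering - w / 2        # withdrawals count as half exposed
        q = d / exposed                   # proportion terminating
        p = 1 - q                         # proportion surviving
        previous = cum_surv
        cum_surv *= p
        density = (previous - cum_surv) / width
        hazard = d / (width * (exposed - d / 2))
        # The density SE uses the variance sum from the preceding intervals.
        se_density = density * math.sqrt(var_sum + p / (exposed * q)) if d else 0.0
        se_hazard = (hazard * math.sqrt((1 - (hazard * width / 2) ** 2) / (exposed * q))
                     if d else 0.0)
        if p > 0:
            var_sum += q / (exposed * p)
        se_cum = cum_surv * math.sqrt(var_sum)
        rows.append({"start": lo, "entering": entering, "withdrawing": w,
                     "exposed": exposed, "events": d, "prop_terminating": q,
                     "prop_surviving": p, "cum_surviving": cum_surv,
                     "se_cum": se_cum, "density": density,
                     "se_density": se_density, "hazard": hazard,
                     "se_hazard": se_hazard})
        entering -= d + w
        k += 1
    return rows


drug0 = life_table([1, 2, 2, 3, 4, 5, 11], [1] * 7)
drug1 = life_table([7, 8, 10, 10, 12], [1, 1, 1, 1, 0])
for row in drug1:
    print(f"{row['start']:>4.0f} {row['exposed']:6.3f} "
          f"{row['cum_surviving']:.2f} {row['hazard']:.2f}")
```

The hazard rate column is the interval version of the hazard function described earlier: events in the interval relative to the (average) number still at risk.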
Survival Analysis
[DataSet0] G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav
Survival Variable: months

Life Table (First-order Controls: drug)
Column key: Start = Interval Start Time; Enter = Number Entering Interval; Withdr = Number Withdrawing during Interval; Exposed = Number Exposed to Risk; Events = Number of Terminal Events; PTerm = Proportion Terminating; PSurv = Proportion Surviving; CumSurv = Cumulative Proportion Surviving at End of Interval; SE(Cum) = its Std. Error; Dens = Probability Density; SE(Dens) = its Std. Error; Hazard = Hazard Rate; SE(Haz) = its Std. Error.

drug  Start  Enter  Withdr  Exposed  Events  PTerm  PSurv  CumSurv  SE(Cum)  Dens  SE(Dens)  Hazard  SE(Haz)
0     0      7      0       7.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
0     1      7      0       7.000    1       .14    .86    .86      .13      .143  .132      .15     .15
0     2      6      0       6.000    2       .33    .67    .57      .19      .286  .171      .40     .28
0     3      4      0       4.000    1       .25    .75    .43      .19      .143  .132      .29     .28
0     4      3      0       3.000    1       .33    .67    .29      .17      .143  .132      .40     .39
0     5      2      0       2.000    1       .50    .50    .14      .13      .143  .132      .67     .63
0     6      1      0       1.000    0       .00    1.00   .14      .13      .000  .000      .00     .00
0     7      1      0       1.000    0       .00    1.00   .14      .13      .000  .000      .00     .00
0     8      1      0       1.000    0       .00    1.00   .14      .13      .000  .000      .00     .00
0     9      1      0       1.000    0       .00    1.00   .14      .13      .000  .000      .00     .00
0     10     1      0       1.000    0       .00    1.00   .14      .13      .000  .000      .00     .00
0     11     1      0       1.000    1       1.00   .00    .00      .00      .143  .132      2.00    .00
1     0      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     1      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     2      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     3      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     4      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     5      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     6      5      0       5.000    0       .00    1.00   1.00     .00      .000  .000      .00     .00
1     7      5      0       5.000    1       .20    .80    .80      .18      .200  .179      .22     .22
1     8      4      0       4.000    1       .25    .75    .60      .22      .200  .179      .29     .28
1     9      3      0       3.000    0       .00    1.00   .60      .22      .000  .000      .00     .00
1     10     3      0       3.000    2       .67    .33    .20      .18      .400  .219      1.00    .61
1     11     1      0       1.000    0       .00    1.00   .20      .18      .000  .000      .00     .00
1     12     1      1       .500     0       .00    1.00   .20      .18      .000  .000      .00     .00

[Figure: life-table survival functions for drug 0 and drug 1.]
The results suggest that survival is longer with Drug 1 (the top, orange, curve). The syntax above requests no significance test; the Kaplan-Meier analysis below provides tests of the difference.

Tabachnick Table 11.1, p. 511 (start here on 10/14/15)
Analyzed using SPSS Kaplan-Meier
[Screen shot of the Kaplan-Meier dialog. Define Event had already been pressed when this screen shot was taken.]
KM months BY drug
 /STATUS=outcome(1)
 /PRINT TABLE MEAN
 /PLOT SURVIVAL
 /TEST LOGRANK BRESLOW TARONE
 /COMPARE OVERALL POOLED.

Kaplan-Meier
[DataSet2] G:\MdbT\InClassDatasets\Survival(T&Bp511).sav

Case Processing Summary
drug     Total N  N of Events  Censored N  Censored Percent
0        7        7            0           .0%
1        5        4            1           20.0%
Overall  12       11           1           8.3%

Survival Table (Status: 1 = died, 0 = censored)
drug  Case  Time    Status  Cum. Prop. Surviving  Std. Error  N of Cumulative Events  N of Remaining Cases
0     1     1.000   1       .857                  .132        1                       6
0     2     2.000   1       .                     .           2                       5
0     3     2.000   1       .571                  .187        3                       4
0     4     3.000   1       .429                  .187        4                       3
0     5     4.000   1       .286                  .171        5                       2
0     6     5.000   1       .143                  .132        6                       1
0     7     11.000  1       .000                  .000        7                       0
1     1     7.000   1       .800                  .179        1                       4
1     2     8.000   1       .600                  .219        2                       3
1     3     10.000  1       .                     .           3                       2
1     4     10.000  1       .200                  .179        4                       1
1     5     12.000  0       .                     .           4                       0

Means and Medians for Survival Time
         Mean(a)                                  Median
drug     Estimate  Std. Error  95% CI             Estimate  Std. Error  95% CI
0        4.000     1.272       1.506 to 6.494     3.000     1.309       .434 to 5.566
1        9.400     .780        7.872 to 10.928    10.000    .894        8.247 to 11.753
Overall  6.250     1.081       4.131 to 8.369     5.000     2.598       .000 to 10.092
a. Estimation is limited to the largest survival time if it is censored.

Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           3.747       1   .053
Breslow (Generalized Wilcoxon)  4.926       1   .026
Tarone-Ware                     4.522       1   .033
Test of equality of survival distributions for the different levels of drug.

[Figure: Kaplan-Meier survival functions for drug 0 and drug 1.]
Note that censored cases are denoted with a + on the survival function. As with the analysis using the LIFE TABLES procedure, the results support the conclusion that survival is longer with Drug 1: the Breslow and Tarone-Ware tests are significant at the .05 level, although the log-rank test falls just short (p = .053).

Tabachnick Table 11.1, p. 511
Analyzed using SPSS Cox Regression
The program will not produce a survival curve for a group of cases defined by the value of a variable unless that variable is a categorical variable. For that reason, I told the program that drug is a categorical variable, so that survival curves for each value of drug could be obtained. Since drug is a dichotomy, the analysis could be done without labeling it categorical, but in that case the survival curves for each value of drug could not have been generated.
[Screen shots of the Cox Regression dialogs.]
As mentioned above, if you want separate predicted survival functions for each value of drug, drug must be declared categorical. In the Plots dialog, the left panel would yield 1 plot; the right panel yields a plot for each value of drug.
COXREG months
 /STATUS=outcome(1)
 /PATTERN BY drug
 /CONTRAST (drug)=Indicator(1)
 /METHOD=ENTER drug
 /PLOT SURVIVAL
 /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).

Cox Regression
[DataSet2] G:\MdbT\InClassDatasets\Survival(T&Bp511).sav

Case Processing Summary
                                                                        N    Percent
Cases available in analysis  Event(a)                                   11   91.7%
                             Censored                                   1    8.3%
                             Total                                      12   100.0%
Cases dropped                Cases with missing values                  0    .0%
                             Cases with negative time                   0    .0%
                             Censored cases before the earliest event
                             in a stratum                               0    .0%
                             Total                                      0    .0%
Total                                                                   12   100.0%
a. Dependent Variable: months

Categorical Variable Codings(b)
drug(a)  Frequency  (1)
0        7          0
1        5          1
a. Indicator Parameter Coding
b. Category variable: drug

Block 0: Beginning Block
Omnibus Tests of Model Coefficients: -2 Log Likelihood = 40.740

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood               37.394
Overall (score)                 Chi-square 3.469, df 1, Sig. .063
Change From Previous Step       Chi-square 3.346, df 1, Sig. .067
Change From Previous Block      Chi-square 3.346, df 1, Sig. .067
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
      B       SE    Wald   df  Sig.  Exp(B)
drug  -1.176  .658  3.192  1   .074  .309

Covariate Means and Pattern Values
      Mean  Pattern 1  Pattern 2
drug  .417  .000       1.000

Cox regression coefficient signs are relative to death, not survival. So a positive sign means that larger values of the independent variable have higher death rates, and a negative sign means that larger values of the independent variable have lower death rates. Here B = -1.176 for drug, so Drug 1 has the lower death rate.
[Figure: hand sketch of Death (vertical axis) against Drug = 0, 1 (horizontal axis).]
In Cox Regression, we're predicting DEATH, not survival.
I strongly recommend that you create a plot such as the one immediately above by hand to make sure you understand the Cox Regression results. I do it every time I use this procedure.

[Figure: COXREG predicted (Y-hat) survival functions for drug = 0 and drug = 1.]
The Cox Regression plots are Y-hat plots, not observed survival functions: they show predicted survival, not actual survival. In this sense, they're like the tables and plots of estimated marginal means from GLM. I usually report observed survival functions, using Kaplan-Meier, rather than these predicted survival functions. However, these are certainly useful in situations in which you want to show what survival should be for specific groups at specific times.

Example: Turnover at a local Manufacturing Plant
1. Effect of friends and/or family at the plant
In this study, turnover at a local manufacturing plant was studied. On the application blank, applicants were asked to indicate whether or not they had friends or family already working at the plant. Some did not respond to this question; they're included in the analysis.
[Screen shot of the data editor.]
The variable wsfr2 represents whether or not the applicant had friends at the company: wsfr2 = 0.50 means yes, wsfr2 = -0.50 means no, and wsfr2 = 0.15 means no info. wsfr2 was created to deal with missing values in a special way. The fact that the values are fractional has no bearing on the analyses; they could just as well have been 0, 1, 2 or 1, 2, 3.
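The three tests requested by /TEST LOGRANK BRESLOW TARONE (in the drug example above and the turnover analysis below) compare observed with expected events at every event time and differ only in the weight each time gets: 1 for log rank, the number at risk n for Breslow, and sqrt(n) for Tarone-Ware. Below is a minimal standard-library sketch built from those textbook definitions (function names are hypothetical; this is not SPSS's code). It handles k groups with df = k - 1, treats group codes purely as labels (just as the wsfr2 values are), and reproduces the drug example's 3.747, 4.926 and 4.522.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square p-value from the series for the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    s, h = df / 2.0, x / 2.0
    if h <= 0:
        return 1.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= h / (s + n)
        total += term
        n += 1
    return 1.0 - math.exp(s * math.log(h) - h - math.lgamma(s)) * total


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, m):
            f = a[r][c] / a[c][c]
            for cc in range(c, m + 1):
                a[r][cc] -= f * a[c][cc]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (a[r][m] - sum(a[r][cc] * x[cc] for cc in range(r + 1, m))) / a[r][r]
    return x


def weighted_logrank(times, groups, events, weight="logrank"):
    """k-group test of equal survival distributions; returns (chi2, df, p)."""
    labels = sorted(set(groups))
    m = len(labels) - 1            # the last group is redundant given the others
    U = [0.0] * m
    V = [[0.0] * m for _ in range(m)]
    for t in sorted({tt for tt, e in zip(times, events) if e == 1}):
        risk = [g for tt, g in zip(times, groups) if tt >= t]
        dead = [g for tt, g, e in zip(times, groups, events) if tt == t and e == 1]
        n, d = len(risk), len(dead)
        w = {"logrank": 1.0, "breslow": float(n), "tarone": math.sqrt(n)}[weight]
        n_g = [risk.count(g) for g in labels[:m]]
        d_g = [dead.count(g) for g in labels[:m]]
        factor = d * (n - d) / (n - 1) if n > 1 else 0.0
        for i in range(m):
            U[i] += w * (d_g[i] - d * n_g[i] / n)      # observed minus expected
            for j in range(m):
                cov = n_g[i] * ((n if i == j else 0) - n_g[j]) / (n * n)
                V[i][j] += w * w * factor * cov
    chi2 = sum(u * x for u, x in zip(U, solve(V, U)))
    return chi2, m, chi2_sf(chi2, m)


times = [1, 2, 2, 3, 4, 5, 11, 7, 8, 10, 10, 12]
drug = [0] * 7 + [1] * 5
died = [1] * 11 + [0]
for kind in ("logrank", "breslow", "tarone"):
    chi2, df, p = weighted_logrank(times, drug, died, kind)
    print(f"{kind:8s} chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
```

Because the Breslow weights emphasize early event times (when many are at risk), it can disagree with the log rank, as it does in the drug example.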
Kaplan-Meier output is shown below.
KM dos BY wsfr2
 /STATUS=status(1)
 /PRINT TABLE MEAN
 /PLOT SURVIVAL
 /TEST LOGRANK BRESLOW TARONE
 /COMPARE OVERALL POOLED .

Kaplan-Meier
[DataSet3] G:\MdbR\1TurnoverArticle\TurnoverArticleDataset061005.sav
Huge table not reproduced here.

Case Processing Summary
wsfr2 (Whether F/F at company for whole sample analyses)  Total N  N of Events  Censored N  Censored Percent
-.50 (No friends)                                          423      174          249         58.9%
.15 (Whole sample missing value: no info)                  100      40           60          60.0%
.50 (Had friends)                                          778      220          558         71.7%
Overall                                                    1301     434          867         66.6%

The Survival Table (700+ lines long) was deleted.

Means and Medians for Survival Time
                 Mean(a)                                     Median
wsfr2            Estimate  Std. Error  95% CI                Estimate  Std. Error  95% CI
-.50             610.597   25.559      560.500 to 660.693    667.000   .           .
.15 (no info)    579.795   49.233      483.299 to 676.291    528.000   151.013     232.014 to 823.986
.50              769.900   18.559      733.524 to 806.277    .         .           .
Overall          706.965   15.009      677.548 to 736.383    .         .           .
a. Estimation is limited to the largest survival time if it is censored.

Note that there is no estimate of median survival for the 0.50 group (or overall). A median is reported only when the survival function falls to .50, and in that group it never did: with so many censored cases, almost everyone was still on the job at the end of the window of observation.

Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           25.344      2   .000
Breslow (Generalized Wilcoxon)  25.325      2   .000
Tarone-Ware                     25.004      2   .000
Test of equality of survival distributions for the different levels of wsfr2.

Clearly there are significant differences in overall survival between the groups.

[Figure: Kaplan-Meier survival functions for three groups (had friends or family, missing response, no friends or family); arrows mark 1 year and 2 years.]
Note the huge difference in proportion surviving after two years: almost 20 percentage points between those with friends and those without friends.
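The Means and Medians table follows directly from the Kaplan-Meier curve: the mean is the area under the curve up to the largest observed time, and the median is the first event time at which the curve reaches .50. The standard-library sketch below (hypothetical function names; Greenwood's formula for the standard errors) reproduces the drug example's estimates, SEs, means (4.000, 9.400, 6.250) and medians (3, 10, 5), and shows why a group whose curve never drops to .50, like wsfr2 = .50 here, gets no median.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: [(time, S(t), Greenwood SE)], one row per
    distinct event time. events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, green, rows, i = len(data), 1.0, 0.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]
        d = sum(tied)
        if d:
            surv *= 1 - d / at_risk
            if at_risk > d:
                green += d / (at_risk * (at_risk - d))
            rows.append((t, surv, surv * math.sqrt(green)))
        at_risk -= len(tied)          # events and censored cases both leave
        i += len(tied)
    return rows


def km_mean(times, events):
    """Area under the KM curve, stopped at the largest observed time
    (a restricted mean when that time is censored)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s, _ in kaplan_meier(times, events):
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (max(times) - prev_t)


def km_median(times, events):
    """First event time with S(t) <= .50; None if the curve never gets there."""
    for t, s, _ in kaplan_meier(times, events):
        if s <= 0.5 + 1e-12:          # small tolerance for floating-point error
            return t
    return None


drug0 = ([1, 2, 2, 3, 4, 5, 11], [1] * 7)
drug1 = ([7, 8, 10, 10, 12], [1, 1, 1, 1, 0])
both = (drug0[0] + drug1[0], drug0[1] + drug1[1])
for name, group in [("drug 0", drug0), ("drug 1", drug1), ("overall", both)]:
    print(name, round(km_mean(*group), 3), km_median(*group))
```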
The data strongly suggest that applicants who had friends or family at the company had higher survival rates at all times, up to 1100 days (about 3 years). For example, at the end of 1 year (leftmost arrow in the above figure), the survival rate of those with friends and family was about 70%, while that for those who said they did not have friends or family at the organization was about 60%. By two years (rightmost arrow), the retention rate of those with friends or family was about 68%, while the rate of those without had decreased to 50%.
The fact that the curve for those with no information fell between the other two curves suggests that the employees for whom no information was available were a mixture of some who did have friends and family and some who did not.

Using Survival Analysis to validate selection test questions
An I/O consulting firm gave a 30-question pre-employment questionnaire to 1000+ employees of a local company. Each question had from one to five alternatives. The consulting company wanted to identify questions that predicted long tenure with the organization. (They would have preferred to identify questions that predicted high performance, but it was not possible to get good performance data. Don't get me started on why organizations don't gather good performance data.)
In order to identify responses associated with long tenure, a survival analysis was conducted for each question. A few of the analyses are presented below. In each survival plot, each curve is the survival function of persons who made a particular response to the item. I picked only those for which the difference in survival curves was significant or approached significance.

Question 1
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           5.382       2   .068
Breslow (Generalized Wilcoxon)  4.307       2   .116
Tarone-Ware                     4.756       2   .093
[Figure: survival functions by response to Question 1; curves labeled +1, 0, -1(?).]
The numbers represent the 3 possible responses to the question, coded as +1, 0, -1. For this question, I believe we treated +1 as an indicator of long tenure and both 0 and -1 as indicators of short tenure.

Question 2
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           7.647       4   .105
Breslow (Generalized Wilcoxon)  6.950       4   .139
Tarone-Ware                     7.298       4   .121
[Figure: survival functions by response; curves labeled +1, 0, -1(?).]
As with the previous question, the response coded as +1 was treated as an indicator of long tenure and all other responses were treated as indicators of short tenure.

Question 3
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           5.070       3   .167
Breslow (Generalized Wilcoxon)  5.525       3   .137
Tarone-Ware                     5.493       3   .139
Test of equality of survival distributions for the different levels of GenQ4.
GenQ4 L: I prefer a job that / S: How often you experience conflict with a coworker?
[Figure: survival functions by response; curves labeled +1 and 0.]
There were very few persons who responded +1 or 0, but those who did were treated as long tenure and those who responded 0 as short tenure.

Question 4
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           7.753       4   .101
Breslow (Generalized Wilcoxon)  6.762       4   .149
Tarone-Ware                     7.439       4   .114
Test of equality of survival distributions for the different levels of GenQ3.
GenQ3 L: Received safety training? / S: You are asked to do more physically demanding work than you were hired to do because someone out sick, how do you react?
[Figure: survival functions by response; curves labeled +1, 0, -1(?).]
+1: long tenure. Else: short tenure.

Question 5
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           10.971      4   .027
Breslow (Generalized Wilcoxon)  9.931       4   .042
Tarone-Ware                     10.597      4   .031
Test of equality of survival distributions for the different levels of GenQ2.
GenQ2 L: Your team in disagreement over who will clean the floor. What method is fair? / S: Recent supervisor rate dependability?
[Figure: survival functions by response; curves labeled +1, 0, -1(?).]

Question 6
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           8.052       3   .045
Breslow (Generalized Wilcoxon)  12.729      3   .005
Tarone-Ware                     10.614      3   .014
Test of equality of survival distributions for the different levels of GenQ1.
GenQ1 L: Which strategies inspire a team and help be more effective? / S: Your team in disagreement over who will clean the floor. What method is fair?
[Figure: survival functions by response; curves labeled +1, 0, -1(?).]

Thirty questions were evaluated in the above fashion: for each of the 30 questions, the survival analysis was examined as shown above, and the questions whose responses showed significant differences in survival were identified. Finally, an overall index was calculated, using syntax like the following. In this particular case, the response associated with long survival added 1 to the index, and the response associated with short survival subtracted 1 from the index.

Tenure Scale Computation
Compute genshort=0.
if ((genq1=3 or genq1=4)) genshort=genshort+1.
if ((genq1=1 or genq1=2)) genshort=genshort-1.
if ((genq2=3 or genq2=4)) genshort=genshort+1.
if ((genq2=1 or genq2=2 or genq2=5)) genshort=genshort-1.
if ((genq6=3)) genshort=genshort+1.
if ((genq6=1 or genq6=2)) genshort=genshort-1.
if ((genq12=1)) genshort=genshort+1.
if ((genq12=3)) genshort=genshort-1.
if ((genq13=1)) genshort=genshort+1.
if ((genq13=2 or genq13=3 or genq13=4)) genshort=genshort-1.
if ((genq21=1 or genq21=3)) genshort=genshort+1.
if ((genq21=2)) genshort=genshort-1.

Validity of the Tenure Scale
The following is not based on the scale above but on a similar scale.
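The scoring rule in the syntax above, and the median split used for the validity check, can be sketched in Python. This is an illustration only: the item keys are copied from the IF statements, the applicant's answers are invented, and a missing answer simply adds nothing (as a missing value in an SPSS IF condition leaves the index unchanged).

```python
# Keys copied from the IF statements in the syntax above:
# item -> (responses that add 1, responses that subtract 1).
KEYS = {
    "genq1": ({3, 4}, {1, 2}),
    "genq2": ({3, 4}, {1, 2, 5}),
    "genq6": ({3}, {1, 2}),
    "genq12": ({1}, {3}),
    "genq13": ({1}, {2, 3, 4}),
    "genq21": ({1, 3}, {2}),
}


def tenure_index(responses):
    """+1 for each long-tenure response, -1 for each short-tenure response;
    unkeyed or missing answers add nothing."""
    score = 0
    for item, (long_tenure, short_tenure) in KEYS.items():
        answer = responses.get(item)
        if answer in long_tenure:
            score += 1
        elif answer in short_tenure:
            score -= 1
    return score


def split_at(scores, cut):
    """Group 1 = index above the cut, Group 0 = at or below it."""
    return [1 if s > cut else 0 for s in scores]


# Hypothetical applicant (answers invented for illustration).
applicant = {"genq1": 4, "genq2": 5, "genq6": 3, "genq12": 2, "genq13": 1, "genq21": 2}
print(tenure_index(applicant))  # prints 1
```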
The median score on the scale was determined to be -14. Group 0 was all employees with an index value less than or equal to -14. Group 1 was all employees with an index value greater than -14.
[Figure: survival functions for Group 1 and Group 0; time axis marked at 1 yr, 2 yr, 3 yr, 4 yr.]
The graph indicates that those in Group 1, with large values of the index, had a nearly 70% retention rate after 50 months. Those in Group 0 had a 40% retention rate after the same length of time.
The implication of this analysis would be to recommend that the company use the scale in hiring, giving preference to applicants with higher scores on the scale.
Potential problems
The above curve was based on the same sample that was used to select the questions, so there is clearly capitalization on chance. The scale should be tested on a different sample; that is, the results need to be cross-validated.

Multivariate Analysis using Cox Regression
Turnover as a function of 1) friends at the organization (wsfr2) and 2) ethnic group of the employee (neth), with sex (nsex) also entered as a covariate.
COXREG dos
 /STATUS=status(1)
 /PATTERN BY wsfr2
 /CONTRAST (neth)=Indicator(1)
 /CONTRAST (wsfr2)=Indicator
 /METHOD=ENTER wsfr2 nsex neth
 /PLOT SURVIVAL
 /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
wsfr2: 0.50 = friends at the company; 0.15 = no info on whether has friends; -0.50 = does not have friends.
neth: 1 = Employee is White; 2 = Employee is Black; 3 = Employee is American Indian or Asian or Hispanic.

Cox Regression
[DataSet1] G:\MDBR\1TurnoverArticle\TurnoverArticleDataset061005.sav

Case Processing Summary
                                                                        N     Percent
Cases available in analysis  Event(a)                                   434   33.4%
                             Censored                                   867   66.6%
                             Total                                      1301  100.0%
Cases dropped                Cases with missing values                  0     0.0%
                             Cases with negative time                   0     0.0%
                             Censored cases before the earliest event
                             in a stratum                               0     0.0%
                             Total                                      0     0.0%
Total                                                                   1301  100.0%
a. Dependent Variable: dos (Days of service: termdate-effdate or 3/1/1-effdate or 12/31/4-effdate)

Categorical Variable Codings
                                               Frequency  (1)  (2)
wsfr2(a,b)  -.50 = -.50                        423        1    0
            .15 = Whole sample missing value   100        0    1
            .50 = .50                          778        0    0
neth(b,c)   1.00 = White                       903        0    0
            2.00 = Black                       324        1    0
            3.00 = Am Ind, Asian, Hisp         74         0    1
a. Category variable: wsfr2 (Whether F/F at company for whole sample analyses)
b. Indicator Parameter Coding
c. Category variable: neth (1=White, 2=Black, 3=Am Ind, Asian, Hisp)
Note that wsfr2 = 0.50 (friends) is the reference group, and neth = 1 (White) is the reference group.

Block 0: Beginning Block
Omnibus Tests of Model Coefficients: -2 Log Likelihood = 5871.672

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood               5827.342
Overall (score)                 Chi-square 42.322, df 5, Sig. .000
Change From Previous Step       Chi-square 44.330, df 5, Sig. .000
Change From Previous Block      Chi-square 44.330, df 5, Sig. .000
a. Beginning Block Number 1. Method = Enter

Remember we're predicting "Quit", not survival.
Variables in the Equation
          B      SE    Wald    df  Sig.  Exp(B)
wsfr2                  22.427  2   .000
wsfr2(1)  .464   .102  20.763  1   .000  1.590
wsfr2(2)  .421   .173  5.969   1   .015  1.524
nsex      -.223  .100  4.952   1   .026  .800
neth                   10.799  2   .005
neth(1)   .088   .109  .657    1   .417  1.092
neth(2)   -.908  .295  9.490   1   .002  .403

Covariate Means and Pattern Values
          Mean   Pattern 1  Pattern 2  Pattern 3
wsfr2(1)  .325   1.000      .000       .000
wsfr2(2)  .077   .000       1.000      .000
nsex      1.421  1.421      1.421      1.421
neth(1)   .249   .249       .249       .249
neth(2)   .057   .057       .057       .057

Annotations on the printed output: both wsfr2 coefficients are positive, so those with no friends (wsfr2(1)) and those with a missing value (wsfr2(2)) quit more than those with friends; nsex is coded 1 = female, 2 = male; for the neth indicators, 0 = White and 1 = Black or AI/As.

[Figure: predicted survival functions for the three wsfr2 patterns.]

Testing for Interactions in Cox Regression
The interaction of Friends and Nsex
To specify that an interaction be tested, click on the 1st variable name, then, while holding down the CTRL key (Command on the Mac), click on the 2nd variable name. Finally, click on the >a*b> button.
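The B values in these tables come from maximizing Cox's partial likelihood, and the -2 Log Likelihood values also support likelihood-ratio (LR) tests of nested models. The sketch below (hypothetical function names, standard library only) fits one covariate by Newton-Raphson with Breslow's handling of tied times, which reproduces the drug example's B = -1.176, SE = .658 and -2LL values of 40.740 and 37.394. It then compares the main-effects turnover model (-2LL = 5827.342) with the interaction models in the output that follows; this LR comparison is an alternative to the Wald tests these notes use, not something the notes themselves report.

```python
import math


def cox_fit_one_covariate(times, x, events, max_iter=25):
    """Newton-Raphson fit of Cox's partial likelihood for one covariate,
    Breslow approximation for ties. Returns (B, SE, -2LL at B=0, -2LL at B)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    def pieces(b):
        ll = score = info = 0.0
        for t in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= t]
            died = [xi for ti, xi, ei in zip(times, x, events) if ti == t and ei == 1]
            s0 = sum(math.exp(b * xi) for xi in risk)
            s1 = sum(xi * math.exp(b * xi) for xi in risk)
            s2 = sum(xi * xi * math.exp(b * xi) for xi in risk)
            d = len(died)
            ll += b * sum(died) - d * math.log(s0)
            score += sum(died) - d * s1 / s0
            info += d * (s2 / s0 - (s1 / s0) ** 2)
        return ll, score, info

    b = 0.0
    for _ in range(max_iter):
        _, score, info = pieces(b)
        step = score / info
        b += step
        if abs(step) < 1e-10:
            break
    ll, _, info = pieces(b)
    return b, 1.0 / math.sqrt(info), -2.0 * pieces(0.0)[0], -2.0 * ll


def lr_p_value(chi2, df):
    """Upper-tail chi-square p-value; closed forms for df = 1 and even df only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df % 2 == 0:
        h = chi2 / 2.0
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    raise ValueError("this sketch handles df = 1 or even df only")


# Tabachnick Table 11.1 example: drug (0/1) as the only covariate.
times = [1, 2, 2, 3, 4, 5, 11, 7, 8, 10, 10, 12]
drug = [0] * 7 + [1] * 5
died = [1] * 11 + [0]
b, se, m2ll_0, m2ll_b = cox_fit_one_covariate(times, drug, died)
print(f"B = {b:.3f}, SE = {se:.3f}, Exp(B) = {math.exp(b):.3f}")
print(f"-2LL {m2ll_0:.3f} -> {m2ll_b:.3f}, LR p = {lr_p_value(m2ll_0 - m2ll_b, 1):.3f}")

# Turnover example: does each interaction improve on the main-effects model?
main_effects = 5827.342
for name, m2ll, df in [("nsex*wsfr2", 5824.879, 2), ("neth*wsfr2", 5820.584, 4),
                       ("neth*nsex", 5827.298, 2)]:
    lr = main_effects - m2ll
    print(f"{name}: LR chi-square = {lr:.3f}, df = {df}, p = {lr_p_value(lr, df):.3f}")
```

Exp(B) is the hazard ratio: in the main-effects table, exp(.464) = 1.590 means those without friends quit at about 1.6 times the rate of those with friends, other predictors held constant.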
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood   Overall (score)         Change From Previous Step   Change From Previous Block
                    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5824.879            44.989       7  .000    46.792       7  .000        46.792       7  .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
                    B      SE     Wald   df   Sig.   Exp(B)
wsfr2                             3.022   2   .221
wsfr2(1)          .429   .306     1.975   1   .160   1.536
wsfr2(2)         -.333   .530      .394   1   .530    .717
nsex             -.282   .138     4.158   1   .041    .754
neth                             10.964   2   .004
neth(1)           .097   .109      .800   1   .371   1.102
neth(2)          -.907   .295     9.464   1   .002    .404
nsex*wsfr2                        2.517   2   .284
nsex*wsfr2(1)     .023   .213      .011   1   .915   1.023
nsex*wsfr2(2)     .541   .347     2.424   1   .119   1.717

The nsex*wsfr2 interaction is not significant (Wald = 2.517, df = 2, p = .284), so the effect of having friends is the same for females as it is for males.

The interaction of Friends and Neth

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood   Overall (score)         Change From Previous Step   Change From Previous Block
                    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5820.584            49.194       9  .000    51.088       9  .000        51.088       9  .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
                        B       SE     Wald   df   Sig.   Exp(B)
wsfr2                                 27.320   2   .000
wsfr2(1)              .599    .121    24.386   1   .000   1.820
wsfr2(2)              .623    .209     8.846   1   .003   1.864
nsex                 -.224    .100     4.973   1   .026    .799
neth                                   6.603   2   .037
neth(1)               .298    .150     3.934   1   .047   1.347
neth(2)              -.465    .344     1.835   1   .176    .628
neth*wsfr2                             6.377   4   .173
neth(1)*wsfr2(1)     -.392    .230     2.906   1   .088    .675
neth(2)*wsfr2(1)    -1.222    .791     2.385   1   .123    .295
neth(1)*wsfr2(2)     -.534    .378     1.995   1   .158    .586
neth(2)*wsfr2(2)    -1.093   1.075     1.035   1   .309    .335

Again, the effect of Friends is the same for each ethnic group (neth*wsfr2: Wald = 6.377, df = 4, p = .173). What the heck? What about the interaction of nsex and neth?

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood   Overall (score)         Change From Previous Step   Change From Previous Block
                    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5827.298            42.395       7  .000    44.373       7  .000        44.373       7  .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
                  B      SE     Wald   df   Sig.   Exp(B)
wsfr2                           22.438   2   .000
wsfr2(1)        .465   .102     20.791   1   .000   1.591
wsfr2(2)        .421   .173      5.941   1   .015   1.523
nsex           -.216   .118      3.365   1   .067    .806
neth                              .888   2   .642
neth(1)         .112   .327       .117   1   .732   1.118
neth(2)        -.739   .884       .700   1   .403    .477
neth*nsex                         .043   2   .979
neth(1)*nsex   -.018   .234       .006   1   .940    .982
neth(2)*nsex   -.125   .624       .040   1   .842    .883

Nope (neth*nsex: Wald = .043, df = 2, p = .979).

Survival Analysis of a phenomenon with a positive outcome

PEG vs. PEGJ Example (skipped in F013 & F014)

The data for this example compare two methods of feeding trauma patients: percutaneous endoscopic gastrojejunostomy (PEGJ) and percutaneous endoscopic gastrostomy (PEG). The investigators hoped the data would show that PEGJ provides continuous, uninterrupted nutrition more consistently than PEG. Time to reach a nutrition goal was the continuous dependent variable. Patients were observed for 14 days, and whether or not a patient reached the goal was the status: reaching the goal was the +1 state, and a patient who had not reached the goal within 14 days was treated as a censored case. GROUP = 1 is the PEGJ group; GROUP = 2 is the PEG group.

NUTRSD    NUTRGOAL  DAYSGOAL  GOALIN14  GROUP  ISS  AGE
02/15/98  02/16/98         1         1      1   29   43
01/10/98  01/12/98         2         1      1    5   88
02/14/98  02/18/98         4         1      1   29   37
02/02/98  02/06/98         4         1      1   27   36
01/10/98  01/13/98         3         1      1   13   92
01/09/98  .               15         0      2   19   73
01/02/98  01/04/98         2         1      2   26   42
01/20/98  01/22/98         2         1      2   36   55
03/18/98  .                5         1      1   27   23
02/04/98  02/06/98         2         1      2   13   72
01/23/98  .               15         0      2   10   45
02/01/98  02/02/98         1         1      1   22   59
02/20/98  02/21/98         1         1      1   17   54
02/03/98  02/04/98         1         1      2   14   78
03/31/98  04/02/98         2         1      2   18   30
04/13/98  04/15/98         2         1      2   27   49
05/08/98  05/09/98         1         1      2    9   22
04/14/98  04/20/98         6         1      2    9   60
05/27/98  05/28/98         1         1      1   17   27
05/13/98  .               15         0      2   29   95
05/07/98  05/16/98         9         1      2   25   31
04/16/98  04/17/98         1         1      2   32   31
03/23/98  03/25/98         2         1      2   20   41
04/07/98  04/08/98         1         1      2   16   29
03/29/98  03/30/98         1         1      1   25   24
04/30/98  05/01/98         1         1      2   29   52
05/05/98  05/08/98         3         1      2   38   79
05/28/98  05/30/98         2         1      1    4   76
06/08/98  06/10/98         2         1      2   16   70
05/27/98  05/28/98         1         1      1    9   27
04/27/98  04/29/98         2         1      1   22   87
04/10/98  04/11/98         1         1      1   27   36
02/26/98  03/04/98         6         1      1   25   54
03/27/98  03/28/98         1         1      1   29   22
04/17/98  04/18/98         1         1      1   22   22
02/25/98  03/05/98         8         1      1   25   79
03/18/98  03/19/98         1         1      1   25   56
01/28/98  01/29/98         1         1      1   17   66
03/23/98  03/24/98         1         1      1   16   20
04/29/98  05/03/98         4         1      1   26   22
07/19/98  08/02/98        14         1      2   34   33
08/13/98  08/15/98         2         1      1   25   49
08/25/98  .               15         0      2   26   77
10/06/98  10/07/98         1         1      2   34   19
09/10/98  09/11/98         1         1      2   27   36
08/14/98  08/15/98         1         1      1   30   35
08/25/98  08/27/98         2         1      2   27   29
09/20/98  09/21/98         1         1      2   36   62
09/29/98  10/01/98         2         1      2   17   19
10/09/98  .               15         0      2   38   74
10/02/98  10/03/98         1         1      1   10   40
08/26/98  09/04/98         9         1      2   18   48
08/19/98  08/21/98         2         1      1   18   31
08/03/98  08/04/98         1         1      1   41   46
08/25/98  08/28/98         3         1      2   24   37
09/17/98  .               15         0      2   26   75
07/02/98  .               15         0      1   19   28
08/03/98  08/05/98         2         1      2   13   52
07/15/98  07/17/98         2         1      2   38   71
07/27/98  08/01/98         5         1      2   34   33
04/30/98  05/02/98         2         1      2    4   61
05/29/98  05/30/98         1         1      1   29   58
05/16/98  05/18/98         2         1      2   19   42
06/20/98  06/23/98         3         1      1   25   19
08/30/98  .               15         0      1   25   70
04/30/98  05/02/98         2         1      2   43   33
07/01/98  07/02/98         1         1      1   43   79
09/29/98  .               15         0      2   17   18
05/28/98  06/08/98        11         1      2   36   57
07/15/98  07/16/98         1         1      2   27   59
08/11/98  08/12/98         1         1      1   19   43
10/12/98  10/13/98         1         1      1   36   18
08/24/98  08/25/98         1         1      1   20   84
10/22/98  .               15         0      1   25   17
10/08/98  10/09/98         1         1      2   25   20
10/06/98  .               15         0      2   17   31
07/30/98  08/02/98         3         1      1   22   26
04/16/98  04/17/98         1         1      1   38   18
10/08/98  10/09/98         1         1      1   25   34
08/19/98  08/21/98         2         1      1   34   22
03/20/98  03/21/98         1         1      1   25   48
06/20/98  06/21/98         1         1      1   11   45
07/30/98  07/31/98         1         1      1   25   33
09/07/98  .               15         0      2   36   28
07/17/98  07/18/98         1         1      1   22   62
09/15/98  09/17/98         2         1      2   20   47
07/07/98  07/08/98         1         1      1   33   27
10/01/98  10/02/98         1         1      2   25   33
09/11/98  09/12/98         1         1      1   41   31

DAYSGOAL is the "length of the arrow" variable in the first handout.
GOALIN14 represents whether the goal was reached or not: GOALIN14 = 1 means that the goal was reached; GOALIN14 = 0 means that the case is right-censored.
GROUP = 1: PEGJ. GROUP = 2: PEG.
ISS: Injury Severity Score, a measure of the amount of trauma (taken at admission).

Specifying the analysis using Life Tables . . .

The output of LIFE TABLES

SURVIVAL TABLE=DAYSGOAL BY GROUP(1 2)
  /INTERVAL=THRU 15 BY 1
  /STATUS=GOALIN14(1)
  /PRINT=TABLE
  /PLOTS (SURVIVAL)=DAYSGOAL BY GROUP .

Survival Analysis  G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav

Life Table (Survival Variable: DAYSGOAL; First-order Control: GROUP)
Columns: Start = Interval Start Time; Enter = Number Entering Interval; Withdr = Number Withdrawing during Interval; Exposed = Number Exposed to Risk; Events = Number of Terminal Events; PTerm = Proportion Terminating; PSurv = Proportion Surviving; CumSurv = Cumulative Proportion Surviving at End of Interval; SE(Cum) = its Std. Error; Dens = Probability Density; SE(Dens) = its Std. Error; Hazard = Hazard Rate; SE(Haz) = its Std. Error.

GROUP 1
 Start  Enter  Withdr  Exposed  Events  PTerm  PSurv  CumSurv  SE(Cum)  Dens  SE(Dens)  Hazard  SE(Haz)
  .000    46      0     46.000     0     .00    1.00    1.00      .00   .000    .000      .00     .00
 1.000    46      0     46.000    28     .61     .39     .39      .07   .609    .072      .88     .15
 2.000    18      0     18.000     6     .33     .67     .26      .06   .130    .050      .40     .16
 3.000    12      0     12.000     3     .25     .75     .20      .06   .065    .036      .29     .16
 4.000     9      0      9.000     3     .33     .67     .13      .05   .065    .036      .40     .23
 5.000     6      0      6.000     1     .17     .83     .11      .05   .022    .022      .18     .18
 6.000     5      0      5.000     1     .20     .80     .09      .04   .022    .022      .22     .22
 7.000     4      0      4.000     0     .00    1.00     .09      .04   .000    .000      .00     .00
 8.000     4      0      4.000     1     .25     .75     .07      .04   .022    .022      .29     .28
 9.000     3      0      3.000     0     .00    1.00     .07      .04   .000    .000      .00     .00
10.000     3      0      3.000     0     .00    1.00     .07      .04   .000    .000      .00     .00
11.000     3      0      3.000     0     .00    1.00     .07      .04   .000    .000      .00     .00
12.000     3      0      3.000     0     .00    1.00     .07      .04   .000    .000      .00     .00
13.000     3      0      3.000     0     .00    1.00     .07      .04   .000    .000      .00     .00
14.000     3      0      3.000     0     .00    1.00     .07      .04   .000    .000      .00     .00

GROUP 2
 Start  Enter  Withdr  Exposed  Events  PTerm  PSurv  CumSurv  SE(Cum)  Dens  SE(Dens)  Hazard  SE(Haz)
  .000    43      0     43.000     0     .00    1.00    1.00      .00   .000    .000      .00     .00
 1.000    43      0     43.000    11     .26     .74     .74      .07   .256    .067      .29     .09
 2.000    32      0     32.000    15     .47     .53     .40      .07   .349    .073      .61     .15
 3.000    17      0     17.000     2     .12     .88     .35      .07   .047    .032      .13     .09
 4.000    15      0     15.000     0     .00    1.00     .35      .07   .000    .000      .00     .00
 5.000    15      0     15.000     1     .07     .93     .33      .07   .023    .023      .07     .07
 6.000    14      0     14.000     1     .07     .93     .30      .07   .023    .023      .07     .07
 7.000    13      0     13.000     0     .00    1.00     .30      .07   .000    .000      .00     .00
 8.000    13      0     13.000     0     .00    1.00     .30      .07   .000    .000      .00     .00
 9.000    13      0     13.000     2     .15     .85     .26      .07   .047    .032      .17     .12
10.000    11      0     11.000     0     .00    1.00     .26      .07   .000    .000      .00     .00
11.000    11      0     11.000     1     .09     .91     .23      .06   .023    .023      .10     .10
12.000    10      0     10.000     0     .00    1.00     .23      .06   .000    .000      .00     .00
13.000    10      0     10.000     0     .00    1.00     .23      .06   .000    .000      .00     .00
14.000    10      0     10.000     1     .10     .90     .21      .06   .023    .023      .11     .11

Median Survival Time
GROUP   Med Time
1         1.82
2         2.70

[Figure: Survival Function, First-order Control: GROUP.]

Since the outcome is a good event, the faster the curve falls toward zero, the better, so the group performing best is the group with the lower curve. These data are unusual because the "event" is something sought after (reaching a feeding goal) rather than something to be avoided (death or termination). So for these data, lower "survival" is preferred: the sooner a patient reached the nutrition goal, the better. Thus, the investigators hoped that patients in the PEGJ condition would reach the goal faster, producing lower "survival" curves, and the medians agree (1.82 days for PEGJ vs. 2.70 days for PEG). In this case, "survival" should be called "failure to reach feeding goal."
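The Life Tables columns can be rebuilt by hand. Below is a minimal Python sketch; the per-day event counts are tallied from the data listing above, and the helper names are hypothetical. Because every censored case has DAYSGOAL = 15, nobody withdraws during intervals .000 through 14.000, so "Number Exposed to Risk" equals "Number Entering Interval" and the usual actuarial half-withdrawal adjustment plays no role. The median is assumed (not confirmed by the handout) to be a linear interpolation within the interval where the cumulative proportion surviving first drops to .50; under that assumption it reproduces the printed 1.82 and 2.70.

```python
"""Sketch: actuarial life table and interpolated median for the PEG/PEGJ data."""

# Goal-reached counts per day, tallied from the data listing (GOALIN14 = 1).
EVENTS = {
    1: {1: 28, 2: 6, 3: 3, 4: 3, 5: 1, 6: 1, 8: 1},          # GROUP 1 (PEGJ)
    2: {1: 11, 2: 15, 3: 2, 5: 1, 6: 1, 9: 2, 11: 1, 14: 1},  # GROUP 2 (PEG)
}
N = {1: 46, 2: 43}


def life_table(n, events, last=14):
    """Rows of (start, entering, events, cumulative proportion surviving at end)
    for one-day intervals [t, t + 1). With no withdrawals, exposed == entering."""
    rows, entering, cum = [], n, 1.0
    for start in range(last + 1):
        d = events.get(start, 0)
        cum *= 1.0 - d / entering  # proportion surviving this interval
        rows.append((start, entering, d, cum))
        entering -= d
    return rows


def median_time(rows, width=1.0):
    """Interpolate linearly inside the first interval where cumulative survival <= .50."""
    prev = 1.0
    for start, _, _, cum in rows:
        if cum <= 0.5:
            return start + width * (prev - 0.5) / (prev - cum)
        prev = cum
    return None  # median not reached within the table


if __name__ == "__main__":
    for g in (1, 2):
        print("GROUP", g, "median", round(median_time(life_table(N[g], EVENTS[g])), 2))
```

For GROUP 1 the cumulative proportion drops from 1.00 to 18/46 = .39 during interval 1, so the median is 1 + .50/.61, about 1.82; for GROUP 2 it drops from 32/43 to 17/43 during interval 2, giving exactly 2.70.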
Analysis of the same data using Kaplan-Meier

KM DAYSGOAL BY GROUP
  /STATUS=GOALIN14(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL HAZARD
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED .

Kaplan-Meier  G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav

Case Processing Summary
                                      Censored
GROUP     Total N   N of Events      N    Percent
1            46          43           3      6.5%
2            43          34           9     20.9%
Overall      89          77          12     13.5%

Means and Medians for Survival Time
           Mean(a)                                   Median
GROUP    Estimate  Std. Error  95% CI             Estimate  Std. Error  95% CI
1          2.717      .527     1.685 to 3.750       1.000       .        .
2          5.488      .857     3.808 to 7.169       2.000      .214      1.581 to 2.419
Overall    4.056      .517     3.043 to 5.069       2.000      .211      1.587 to 2.413
a. Estimation is limited to the largest survival time if it is censored.

Overall Comparisons
                                   Chi-Square   df   Sig.
Log Rank (Mantel-Cox)                 8.479      1   .004
Breslow (Generalized Wilcoxon)        9.588      1   .002
Tarone-Ware                           9.306      1   .002
Test of equality of survival distributions for the different levels of GROUP.

The same analysis using Cox Regression

One requirement of Cox regression analysis is that the hazard functions be proportional. That means that for any two values of a covariate, the ratio of the hazards for those two values must be constant across time. This rules out hazard functions that cross and, in general, hazard functions that stay a constant distance apart (parallel on the raw scale) rather than a constant ratio apart. Roughly speaking, the hazard functions should look like the following . . .

[Figure: example of two proportional hazard functions.]

That is, the hazard functions diverge over time (as proportional hazards do when they rise); plotted on a log scale, they would be parallel.

COXREG DAYSGOAL
  /STATUS=GOALIN14(1)
  /PATTERN BY GROUP
  /CONTRAST (GROUP)=Indicator(1)
  /METHOD=ENTER GROUP
  /PLOT SURVIVAL HAZARD
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
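The COXREG command above maximizes the Cox partial likelihood. The minimal Python sketch below (hypothetical helper names, not SPSS's own code) fits the single GROUP coefficient by Newton-Raphson, coding GROUP 2 as 1 to mirror Indicator(1). It assumes Breslow's handling of tied times; under that assumption it matches the printed -2 Log Likelihood of 618.281 at B = 0, the score chi-square of 5.448, and B of about -.542. It also computes the log-rank statistic from the Kaplan-Meier output (8.479) to show that the two tests share the same observed-minus-expected numerator and differ only in the variance: with 39 of the 77 events tied on day 1, the log-rank tie correction (n - d)/(n - 1) makes its variance much smaller. Event counts are tallied from the data listing.

```python
"""Sketch: Cox partial likelihood (Breslow ties) for GROUP, plus the log-rank test."""
import math

# DAYSGOAL -> number of cases reaching the goal that day (GOALIN14 = 1).
EVENTS = {
    1: {1: 28, 2: 6, 3: 3, 4: 3, 5: 1, 6: 1, 8: 1},          # GROUP 1 (PEGJ)
    2: {1: 11, 2: 15, 3: 2, 5: 1, 6: 1, 9: 2, 11: 1, 14: 1},  # GROUP 2 (PEG)
}
CENSORED = {1: 3, 2: 9}  # censored cases, all recorded as DAYSGOAL = 15


def build_cases():
    """Parallel lists (time, status, x) with x = 1 for GROUP 2, as in Indicator(1)."""
    times, status, x = [], [], []
    for group in (1, 2):
        xi = 1 if group == 2 else 0
        for day, n in EVENTS[group].items():
            times += [day] * n
            status += [1] * n
            x += [xi] * n
        times += [15] * CENSORED[group]
        status += [0] * CENSORED[group]
        x += [xi] * CENSORED[group]
    return times, status, x


def risk_sets(times, status, x):
    """For each distinct event time: (x values at risk, events, events with x = 1)."""
    out = []
    for t in sorted({ti for ti, si in zip(times, status) if si == 1}):
        at_risk = [xi for ti, xi in zip(times, x) if ti >= t]
        d = sum(1 for ti, si in zip(times, status) if ti == t and si == 1)
        sx = sum(xi for ti, si, xi in zip(times, status, x) if ti == t and si == 1)
        out.append((at_risk, d, sx))
    return out


def breslow(beta, sets):
    """Log partial likelihood, score and information at beta (Breslow ties)."""
    ll = u = info = 0.0
    for at_risk, d, sx in sets:
        w = [math.exp(beta * xi) for xi in at_risk]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, at_risk))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, at_risk))
        ll += beta * sx - d * math.log(s0)
        u += sx - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return ll, u, info


def fit_cox(sets, tol=1e-10, max_iter=50):
    """Newton-Raphson on the single coefficient, starting from beta = 0."""
    ll0, u0, i0 = breslow(0.0, sets)
    beta = 0.0
    for _ in range(max_iter):
        _, u, info = breslow(beta, sets)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    ll, _, info = breslow(beta, sets)
    return {"B": beta, "SE": 1 / math.sqrt(info), "Exp(B)": math.exp(beta),
            "-2LL(0)": -2 * ll0, "-2LL": -2 * ll, "score": u0 ** 2 / i0}


def logrank(sets):
    """Mantel-Cox chi-square: same O - E numerator, hypergeometric variance."""
    o_minus_e = var = 0.0
    for at_risk, d, sx in sets:
        n, n1 = len(at_risk), sum(at_risk)  # n1 = number at risk in GROUP 2
        o_minus_e += sx - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var


if __name__ == "__main__":
    sets = risk_sets(*build_cases())
    print(fit_cox(sets))
    print("log-rank:", round(logrank(sets), 3))
```

Running it gives B of about -.542 (Exp(B) about .582) and a log-rank chi-square of about 8.479.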
Cox Regression  G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav

Case Processing Summary
                                                                          N     Percent
Cases available in analysis   Event(a)                                   77      86.5%
                              Censored                                   12      13.5%
                              Total                                      89     100.0%
Cases dropped                 Cases with missing values                   0        .0%
                              Cases with negative time                    0        .0%
                              Censored cases before the earliest
                              event in a stratum                          0        .0%
                              Total                                       0        .0%
Total                                                                    89     100.0%
a. Dependent Variable: DAYSGOAL

Categorical Variable Codings(b)
              Frequency   (1)
GROUP(a)  1       46        0
          2       43        1
a. Indicator Parameter Coding
b. Category variable: GROUP

Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 618.281

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a,b)
-2 Log Likelihood   Overall (score)         Change From Previous Step   Change From Previous Block
                    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
612.895              5.448       1  .020     5.385       1  .020         5.385       1  .020
a. Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 618.281
b. Beginning Block Number 1. Method = Enter

Variables in the Equation
          B      SE    Wald   df   Sig.   Exp(B)
GROUP   -.542   .235   5.332   1   .021    .582

Because GROUP is coded 1 for the PEG group (GROUP = 2), Exp(B) = .582 means that PEG patients reached the nutrition goal at about 0.58 times the rate of PEGJ patients.

Covariate Means and Pattern Values
                    Pattern
          Mean     1       2
GROUP     .483   .000   1.000

[Figure: predicted curves for patterns 1 and 2, labeled "Goal" / "No Goal".]

The above graph presents predicted proportions. They are analogous to plots of y-hats vs. predictors in a regression analysis. When you perform a Cox regression analysis, you may also have to run a Kaplan-Meier analysis just to get the observed survival curves that the K-M procedure produces.

Comparing Turnover in two plants

A company was interested in determining the causes of turnover in two of its plants.
Plant A: One part of the preparation of food for sale to retailers is undertaken.
Plant B: A different part of the preparation of food for sale to retailers is undertaken.
The two plants hire from the same pool of employees. Each plant is managed by a different person. The overall "survival" of employees in the two plants is as follows . . .

filter off.
compute reploc = newloc.
value labels reploc 1 "A" 2 "B".
filter by useme.
KM dayswrkd by reploc
  /STATUS=termed(1)
  /PRINT MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.

Kaplan-Meier
[DataSet1] G:\MDBR\???\AllEmployeesNN041025.sav

Case Processing Summary
                                      Censored
reploc    Total N   N of Events      N    Percent
1.00 A      310         126         184     59.4%
2.00 B      837         285         552     65.9%
Overall    1147         411         736     64.2%

Means and Medians for Survival Time
           Mean(a)                                        Median
reploc   Estimate  Std. Error  95% CI                 Estimate  Std. Error  95% CI
1.00 A    355.796    16.345    323.760 to 387.832      377.000    39.815    298.962 to 455.038
2.00 B    424.911     9.197    406.884 to 442.938      559.000      .       .
Overall   407.357     8.081    391.519 to 423.195      489.000    33.040    424.242 to 553.758
a. Estimation is limited to the largest survival time if it is censored.

Overall Comparisons
                                   Chi-Square   df   Sig.
Log Rank (Mantel-Cox)                13.633      1   .000
Breslow (Generalized Wilcoxon)       10.203      1   .001
Tarone-Ware                          11.880      1   .001
Test of equality of survival distributions for the different levels of reploc.

filter off.

[Figure: Kaplan-Meier survival functions for Plants A and B.]

Clearly, employee "retention/survival" is best in Plant B (mean 424.9 vs. 355.8 days; log-rank chi-square = 13.633, p < .001).

Are these differences in survival rates the same for the different ethnic groups employed by the company?
Perhaps the differences between buildings are due to the fact that the different buildings have different proportions of ethnic groups . . .

neweth * reploc Crosstabulation
                                              reploc
neweth                                1.00 A    2.00 B     Total
.00 White or Black   Count               130       219        349
                     % within reploc    41.4%     25.7%      29.9%
1.00 Hispanic        Count               184       634        818
                     % within reploc    58.6%     74.3%      70.1%
Total                Count               314       853       1167
                     % within reploc   100.0%    100.0%     100.0%

. . . coupled with the fact that the different ethnic groups have different survival rates . . .

[Figure: survival curves for Hispanic and White/Black employees.]

These differences suggest that the difference in survival between buildings might be a side effect of the difference in the proportion of Hispanics in the two buildings, combined with the difference in survival between Hispanics and White/Black employees. The way to resolve this issue is to perform a multivariate analysis, pitting building against ethnic group. Of the procedures covered here (Life Tables, Kaplan-Meier, Cox regression), only Cox regression can do this.

Multivariate analysis: joint effect of building and ethnic group

filter off.
filter by useme.
COXREG dayswrkd
  /STATUS=termed(1)
  /METHOD=ENTER reploc neweth
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).

Cox Regression

Case Processing Summary
                                                                          N     Percent
Cases available in analysis   Event(a)                                  411      33.9%
                              Censored                                  736      60.7%
                              Total                                    1147      94.6%
Cases dropped                 Cases with missing values                  65       5.4%
                              Cases with negative time                    0       0.0%
                              Censored cases before the earliest
                              event in a stratum                          0       0.0%
                              Total                                      65       5.4%
Total                                                                  1212     100.0%
a. Dependent Variable: dayswrkd

Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 5312.092

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood   Overall (score)         Change From Previous Step   Change From Previous Block
                    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5222.145           101.652       2  .000    89.947       2  .000        89.947       2  .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
           B      SE     Wald   df   Sig.   Exp(B)
reploc   -.159   .111    2.072   1   .150    .853
neweth   -.916   .102   80.462   1   .000    .400

Covariate Means
          Mean
reploc   1.730
neweth    .697

filter off.

So, when controlling for differences in ethnic groups, no difference in survival (turnover) between the two buildings was found (reploc: B = -.159, p = .150), while ethnic group remained strongly related to quitting (neweth: Exp(B) = .400, p < .001). The manager of Building A was very happy with this result.
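The derived columns in these Cox tables follow simple formulas: Exp(B) = e^B, Wald = (B/SE)^2 tested on 1 df, and the chi-square for a block is the drop in -2 Log Likelihood. The minimal Python sketch below (hypothetical helper names) recomputes them from the printed values of the plant model. Because B and SE are rounded to three decimals, the recomputed Wald for reploc comes out near 2.05 rather than the printed 2.072. The 95% interval is the usual Wald interval for Exp(B); it is added here for illustration and is not part of the output above.

```python
"""Sketch: Exp(B), Wald statistic, 95% CI and block chi-square from printed Cox output."""
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def coef_summary(b, se):
    """Exp(B), Wald = (B/SE)^2 with its 1-df p-value, and a Wald 95% CI for Exp(B)."""
    wald = (b / se) ** 2
    p = math.erfc(math.sqrt(wald / 2.0))  # P(chi-square with 1 df > wald)
    ci = (math.exp(b - Z_975 * se), math.exp(b + Z_975 * se))
    return {"Exp(B)": math.exp(b), "Wald": wald, "p": p, "95% CI": ci}


def lr_chi_square(m2ll_before, m2ll_after):
    """Change in -2 Log Likelihood between two nested models."""
    return m2ll_before - m2ll_after


if __name__ == "__main__":
    print("reploc", coef_summary(-0.159, 0.111))
    print("neweth", coef_summary(-0.916, 0.102))
    print("block chi-square", round(lr_chi_square(5312.092, 5222.145), 3))
```

The block chi-square 5312.092 - 5222.145 = 89.947 matches the "Change From Previous Block" entry, tested on 2 df because two covariates were entered.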