P5510 Lecture 7 - Survival Analysis

Survival Analysis Introduction
(AKA Event History Analysis)
Situation
Analysis of data in which the dependent variable consists of two aspects –
1) whether or not the outcome occurred – a dichotomous characteristic and
2) the amount of time that elapsed before occurrence of the outcome – a continuous
quantitative characteristic.
The typical situations with this type of dependent variable
Medical literature
Two treatments for a disease are given. We attempt to record
1) Whether or not each patient died – the dichotomous outcome – and
2) how long each patient survived until death – the continuous outcome.
Group A given Drug A.
Group B given Drug B.
Are death rates different in the two groups?
Are survival times different in the two groups?
Turnover literature
Persons are hired by an organization into two different buildings. We attempt to record
1) Whether or not each employee quits before retirement and
2) how long each employee is employed before quitting.
Building A: Kill and Debone
Building B: Cook
Are turnover rates different in the two plants?
Are there differences in length of service in the two plants?
Survival Analysis – 1
Printed on 10/9/2008
Dealing with the two aspects of survival analysis: One Dichotomous and one Continuous
Dichotomous: Dying / Leaving the company
Continuous: Length of survival / Time at the company
These two aspects – proportion dying/turning over and “average” time before death/turnover – are negatively correlated. If the death rate is lower, survival times are longer. But
they’re not perfectly negatively correlated, so each gives us a slightly different picture of survival
across groups.
If we could observe persons for an infinite period of time – until EVERYONE had died or
quit, then we would probably just analyze the survival times – a positively skewed variable, but
one for which there are plenty of analytic tools – Mann-Whitney, Kruskal-Wallis,
transformations to normality, etc.
Window of Observation
The problem with this is that we don’t have an infinite period of time to wait until everyone
quits or dies. Plus, it may be the case that we lose contact with people so for some people we
won’t know how long they survived. Plus, people die/quit for a variety of reasons, many or
most not related to the specifics of the treatment or working conditions. So survival times may
be shortened arbitrarily and randomly for some people. Somehow, this variability must be
taken into account.
The window of observation is the specific time period in which participant survival is
recorded.
That is, at some time, we begin recording whether or not each person is surviving. At
some later time we quit monitoring each person.
The exigencies of research require that our windows of observation be finite.
Because the window is of finite duration, this necessarily results in incomplete information on
some participants. Of particular importance is the fact that some will still be alive/working when we
quit observing.
This means that we won’t have accurate survival times for some people and we also won’t have
accurate death/quit rates for some groups, since some members of each group may still be
working when the window closes.
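A minimal sketch, in Python, of how a finite window turns each person's record into the two-part dependent variable. The function name `code_case` and all dates are hypothetical, chosen only for illustration; cases lost to follow-up would need their own last-contact date and are not handled here.

```python
from datetime import date

def code_case(start, end, window_close):
    """Return (duration_in_days, status) for one person.

    status = 1: the event (death/quit) was observed before the window closed.
    status = 0: the person was still surviving when the window closed,
                so the recorded duration is only a lower bound.
    """
    if end is not None and end <= window_close:
        return (end - start).days, 1
    return (window_close - start).days, 0

window_close = date(2004, 12, 31)  # hypothetical closing date of the window

# Hypothetical employee who quit inside the window.
quit_case = code_case(date(2003, 1, 1), date(2003, 7, 1), window_close)
# Hypothetical employee still working when the window closed.
still_here = code_case(date(2003, 1, 1), None, window_close)

print(quit_case, still_here)
```

The second case illustrates the problem described above: its duration is known only to be at least as long as the value recorded.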
Oh, woe is me! What to do?
Overview of Types of cases in survival analysis
[Figure: a time line with two vertical marks. The left mark is where monitoring of cases begins, i.e., the window opens. The right mark is where monitoring of cases ends, i.e., the window closes. Each case is drawn as a horizontal line; a ? marks a starting or ending time that is unknown.]
Ideal Cases – each starting time and ending time is known.
Right Censored Cases – cases whose ending time (time of termination/death) is unknown. These are the most common. Two kinds appear in the figure:
Cases still employed/surviving at the time monitoring ends.
A case lost to follow-up (quit answering phone, left state, etc.)
Cases whose starting times (times disease develops) are unknown.
I believe these are called left-censored, although Tabachnick & Fidell are ambiguous on p. 5?? and p. 5??
regarding this. They use "failed before study begin" and "disease process began before study began" - two
different conditions. I believe their discussion on p 5?? is the correct definition.
Cases whose starting times and ending times are unknown.
Fagettaboutit – these are not analyzable.
Incorrect Analysis 1: Analysis of only the outcomes – deaths or quits.
We could use logistic regression to compare death/quit rates between groups.
(Use linear regression in a pinch praying that the God of statistics won’t strike you down).
Problem – it’s possible to create situations in which distributions of durations are different even
though proportions of outcomes are identical.
Consider the following . . . Assume we’re dealing with employment.
In the figures, each arrow represents duration of employment for a person. The horizontal axis
is time. The vertical line at the left represents the time at which the window of observation
opened. The vertical line at the right represents the time at which the window closed. The -> of
the arrow represents death/termination.
Group A – Termination Rate = 100%
Group B – Termination Rate = 100%
Clearly, Group A has longer average employment times, but both have the exact same
proportion of turnovers – 100% in this example. So comparison of death/quit rates gives an
inaccurate picture of the differences between the groups.
Incorrect Analysis 2 – Analyze only the durations. Ignore the deaths/turnovers.
Use Mann-Whitney U-tests since durations will be positively skewed.
Group A – Average Survival Time =
Group B - Average Survival Time =
In the example above, the two groups have equal average survival time, but different turnover
rates – Group A has a 100% turnover rate, while that of Group B is 60%. In this case, analysis
of only survival times will give an incorrect picture of the differences in survival between the
groups.
Each type of incomplete analysis ignores one aspect of the complete dependent variable. We
need a method of analysis that takes into account both aspects.
Survival analysis is an analytic technique that combines both aspects.
Survival Analysis (also called Event History Analysis)
An analytic technique that models both proportion of outcomes (death/turnover) and average
duration to outcome.
3 separate techniques – Life Table, Kaplan-Meier, Cox Regression
Key concepts common to all
1. Survival function – most important one of all of them
A plot of proportion surviving from time 0 up to a given time vs. time
A cumulative plot.
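[Figure: survival function. Vertical axis: Proportion Surviving, from 0 to 100%; horizontal axis: Time. Callouts at a time t read "63% have survived at time, t." and "54% have survived at time, t."]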
Generally decreasing curve, since proportion surviving can only remain constant or decrease
across time.
Separate curves for separate groups.
Note that the survival function represents both aspects. The height of the curve at a point represents
the proportion surviving, and so the proportion dying/terminating, up to that time.
The curve also represents duration of stay/life (how far the curve has progressed to the right
from t=0). It is a two-dimensional representation of the two aspects of survival – turnover rates
and length of life/employment.
Comparing survival rates between groups.
The vertical axis represents proportion of survivals or turnovers.
Within a vertical slice at any point, turnover rates up to a particular time can be compared.
In the following, we see that Group B had lower survival/higher turnover at the indicated time.
[Figure: survival curves for Groups A and B against Time, with a vertical slice at one time point.]
The horizontal axis represents duration of life/stay.
Within a horizontal slice at any point, durations can be compared.
In the following, we see that the time to reach 70% turnover was longer for Group A.
[Figure: survival curves for Groups A and B against Time, with a horizontal slice labeled 70%.]
2. Hazard function
A plot of proportion dying/leaving at time intervals from among those who had survived to that
time period.
Among those who have survived until time t, the hazard function gives the proportion who
will die next.
Not a cumulative plot.
Hazard function for human mortality – highest at young ages and at old ages.
[Figure: hazard function for human mortality. Vertical axis: Proportion Dying; horizontal axis: Age.]
3. Cumulative Hazard.
A plot of the hazard accumulated up to a particular time, H(t) = -ln S(t), where S(t) is the survival function. While few have died it is close to the proportion dying/turning over up to that time, 1 - S(t).
A cumulative plot – it rises as the survival plot falls.
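A small sketch, in Python, of how the three key concepts relate, assuming the usual definitions: the interval hazard is the proportion dying among those who entered the interval, and the cumulative hazard is H(t) = -ln S(t). The survival proportions are made up for illustration and the function names are hypothetical.

```python
import math

def hazard_from_survival(surv):
    """Interval hazards from cumulative survival proportions.

    surv[i] is the proportion still surviving at the end of interval i;
    everyone (1.0) is surviving at the start of the first interval.
    """
    hazards = []
    prev = 1.0
    for s in surv:
        # Proportion dying in this interval among those who entered it.
        hazards.append((prev - s) / prev if prev > 0 else float("nan"))
        prev = s
    return hazards

def cumulative_hazard(surv):
    """Cumulative hazard H(t) = -ln S(t) for each survival proportion."""
    return [-math.log(s) if s > 0 else float("inf") for s in surv]

surv = [0.90, 0.72, 0.36]           # made-up survival proportions
hazards = hazard_from_survival(surv)
cum_haz = cumulative_hazard(surv)
print(hazards)
print(cum_haz)
```

Note that the hazards (about .10, .20, .50) rise and fall interval by interval, while the survival proportions and the cumulative hazard can only move in one direction.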
Three general types of Survival Analysis
1. Life Tables analysis.
The window of observation is cut up into n equal-length intervals.
Proportions of persons surviving/dying within each interval are computed.
This is the original method.
Useful for analysis of one group or for comparison of a few groups defined by levels of a single
categorical factor.
Can’t incorporate quantitative predictors.
Can’t incorporate more than 2 qualitative predictors in SPSS.
Cannot analyze interactions of 2 or more predictors.
2. Kaplan-Meier analysis.
Event-based. Rather than defining intervals based on time, intervals are defined based on
occurrence of death/termination. Each death/termination marks the end of one interval and the
beginning of a subsequent interval.
Can’t incorporate quantitative predictors.
Can’t incorporate more than 2 qualitative predictors in SPSS.
Cannot analyze interactions of 2 or more predictors.
3. Cox Proportional Hazards Regression (Cox Regression)
A very general procedure.
Based on a specific mathematical model of survival developed by Cox.
Estimates hazard probabilities for whole sample.
Then estimates ratios of hazards to this overall hazard function for groups/persons with
different values of IV’s
As implemented in SPSS, output and analyses look a lot like logistic regression.
Can incorporate quantitative predictors.
Can incorporate multiple qualitative and quantitative factors.
Can incorporate interactions.
Requires:
Proportional hazard functions.
Survival plots for different groups must diverge “nicely” and can’t cross back.
[Figure: two panels of survival curves for two groups, one panel labeled OK and the other labeled Not OK.]
Based on Tabachnick Table 11.1, p. 511
Analyzed using SPSS Life Tables
Suppose the efficacy of Drug 0 is being compared with that of Drug 1. Each was formulated to prolong life of
patients with a usually terminal form of cancer. Seven patients were given Drug 0 and five were given Drug 1.
Patients were observed for up to 12 months. After 12 months, the window of observation closed and the results
were entered into SPSS.
So this problem is analogous to a turnover problem in organizational research with two groups of employees
treated differently.
Like ANOVA
Like multiple t-tests
The SPSS syntax to invoke the analysis.
SAVE OUTFILE='G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav'
/COMPRESSED.
SURVIVAL TABLE=months BY drug(0 1)
/INTERVAL=THRU 12 BY 1
/STATUS=outcome(1)
/PRINT=TABLE
/PLOTS (SURVIVAL)=months BY drug.
Survival Analysis
[DataSet0] G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav
Survival Variable: months
Life Table
First-order Controls: drug
Column key:
Start = Interval Start Time
Enter = Number Entering Interval
Withdr = Number Withdrawing during Interval
Exposed = Number Exposed to Risk
Events = Number of Terminal Events
PropTerm = Proportion Terminating
PropSurv = Proportion Surviving
CumSurv = Cumulative Proportion Surviving at End of Interval
SE(Cum) = Std. Error of Cumulative Proportion Surviving at End of Interval
Density = Probability Density
SE(Dens) = Std. Error of Probability Density
Hazard = Hazard Rate
SE(Haz) = Std. Error of Hazard Rate

drug  Start  Enter  Withdr  Exposed  Events  PropTerm  PropSurv  CumSurv  SE(Cum)  Density  SE(Dens)  Hazard  SE(Haz)
0     0      7      0       7.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
0     1      7      0       7.000    1       .14       .86       .86      .13      .143     .132      .15     .15
0     2      6      0       6.000    2       .33       .67       .57      .19      .286     .171      .40     .28
0     3      4      0       4.000    1       .25       .75       .43      .19      .143     .132      .29     .28
0     4      3      0       3.000    1       .33       .67       .29      .17      .143     .132      .40     .39
0     5      2      0       2.000    1       .50       .50       .14      .13      .143     .132      .67     .63
0     6      1      0       1.000    0       .00       1.00      .14      .13      .000     .000      .00     .00
0     7      1      0       1.000    0       .00       1.00      .14      .13      .000     .000      .00     .00
0     8      1      0       1.000    0       .00       1.00      .14      .13      .000     .000      .00     .00
0     9      1      0       1.000    0       .00       1.00      .14      .13      .000     .000      .00     .00
0     10     1      0       1.000    0       .00       1.00      .14      .13      .000     .000      .00     .00
0     11     1      0       1.000    1       1.00      .00       .00      .00      .143     .132      2.00    .00
1     0      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     1      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     2      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     3      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     4      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     5      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     6      5      0       5.000    0       .00       1.00      1.00     .00      .000     .000      .00     .00
1     7      5      0       5.000    1       .20       .80       .80      .18      .200     .179      .22     .22
1     8      4      0       4.000    1       .25       .75       .60      .22      .200     .179      .29     .28
1     9      3      0       3.000    0       .00       1.00      .60      .22      .000     .000      .00     .00
1     10     3      0       3.000    2       .67       .33       .20      .18      .400     .219      1.00    .61
1     11     1      0       1.000    0       .00       1.00      .20      .18      .000     .000      .00     .00
1     12     1      1       .500     0       .00       1.00      .20      .18      .000     .000      .00     .00
The results suggest that survival is significantly longer with Drug 1 – the top (orange) curve.
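The life-table quantities can be checked by hand. The following is a sketch in Python, not SPSS's own code: the function name `life_table` is hypothetical, and the formulas are the standard actuarial ones (withdrawals count as half a case at risk; hazard = 2q / (width x (1 + p))), which reproduce the values printed in the Life Table above for this example. The event and withdrawal counts are taken from that table.

```python
def life_table(n_start, events, withdrawn, width=1.0):
    """Actuarial life table for one group.

    events[i] and withdrawn[i] are the counts of terminal events and
    withdrawals (censored cases) in interval i.
    """
    rows = []
    entering = n_start
    cum_surv = 1.0
    for d, w in zip(events, withdrawn):
        exposed = entering - w / 2.0             # withdrawals count as half a case
        q = d / exposed if exposed > 0 else 0.0  # proportion terminating
        p = 1.0 - q                              # proportion surviving
        density = cum_surv * q / width           # uses survival at start of interval
        hazard = 2.0 * q / (width * (1.0 + p))   # hazard rate for the interval
        cum_surv *= p                            # survival at end of interval
        rows.append({"entering": entering, "exposed": exposed,
                     "prop_term": q, "prop_surv": p, "cum_surv": cum_surv,
                     "density": density, "hazard": hazard})
        entering -= d + w
    return rows

# Counts per 1-month interval, copied from the Life Table output.
drug0 = life_table(7, [0, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 1], [0] * 12)
drug1 = life_table(5, [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 2, 0, 0], [0] * 12 + [1])

for start, row in enumerate(drug0):
    print(start, round(row["cum_surv"], 2), round(row["hazard"], 2))
```

For example, in the interval starting at month 2 for Drug 0, 2 of the 6 patients entering died, giving a cumulative proportion surviving of .57 and a hazard rate of .40, as in the table.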
Tabachnick Table 11.1, p. 511 start here on 10/14/15
Analyzed using SPSS Kaplan-Meier
[Define Event] had already been pressed when this screen shot was taken.
KM months BY drug /STATUS=outcome(1) /PRINT TABLE MEAN
 /PLOT SURVIVAL
 /TEST LOGRANK BRESLOW TARONE /COMPARE OVERALL POOLED.
Kaplan-Meier
[DataSet2] G:\MdbT\InClassDatasets\Survival(T&Bp511).sav
Case Processing Summary
                                  Censored
drug     Total N   N of Events   N    Percent
0        7         7             0    .0%
1        5         4             1    20.0%
Overall  12        11            1    8.3%

Survival Table
                            Cumulative Proportion
                            Surviving at the Time    N of Cumulative  N of Remaining
drug  Case  Time    Status  Estimate  Std. Error     Events           Cases
0     1     1.000   1       .857      .132           1                6
0     2     2.000   1       .         .              2                5
0     3     2.000   1       .571      .187           3                4
0     4     3.000   1       .429      .187           4                3
0     5     4.000   1       .286      .171           5                2
0     6     5.000   1       .143      .132           6                1
0     7     11.000  1       .000      .000           7                0
1     1     7.000   1       .800      .179           1                4
1     2     8.000   1       .600      .219           2                3
1     3     10.000  1       .         .              3                2
1     4     10.000  1       .200      .179           4                1
1     5     12.000  0       .         .              4                0

Means and Medians for Survival Time
         Mean(a)                                            Median
                               95% Confidence Interval                          95% Confidence Interval
drug     Estimate  Std. Error  Lower Bound  Upper Bound     Estimate  Std. Error  Lower Bound  Upper Bound
0        4.000     1.272       1.506        6.494           3.000     1.309       .434         5.566
1        9.400     .780        7.872        10.928          10.000    .894        8.247        11.753
Overall  6.250     1.081       4.131        8.369           5.000     2.598       .000         10.092
a. Estimation is limited to the largest survival time if it is censored.

Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            3.747       1   .053
Breslow (Generalized Wilcoxon)   4.926       1   .026
Tarone-Ware                      4.522       1   .033
Test of equality of survival distributions for the different levels of drug.
Note that censored cases are
denoted with a + on the
survival function.
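The estimates in the Survival Table can be reproduced by hand with the product-limit rule: at each time at which a death occurs, multiply the running proportion surviving by (number at risk - deaths) / (number at risk). A minimal sketch in Python follows; the function name `kaplan_meier` is hypothetical, and the times and status codes are the ones listed in the Survival Table (status 1 = died, 0 = censored).

```python
def kaplan_meier(times, status):
    """Product-limit estimate of the survival function.

    status: 1 = event observed, 0 = censored.
    Returns a list of (event_time, cumulative_proportion_surviving).
    """
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, s in zip(times, status) if x == t and s == 1)
        if deaths > 0:
            # Censored cases leave the risk set but do not lower the curve.
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
    return curve

drug0 = kaplan_meier([1, 2, 2, 3, 4, 5, 11], [1, 1, 1, 1, 1, 1, 1])
drug1 = kaplan_meier([7, 8, 10, 10, 12], [1, 1, 1, 1, 0])

for label, curve in (("drug 0", drug0), ("drug 1", drug1)):
    for t, s in curve:
        print(label, t, round(s, 3))
```

The case censored at 12 months in the Drug 1 group leaves the curve at .200; it only removes that patient from the set at risk.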
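As was the case with the analysis using the LIFE TABLES procedure, the results support the conclusion that
survival is longer with Drug 1. The Breslow (Sig. = .026) and Tarone-Ware (Sig. = .033) tests are significant at the .05 level; the Log Rank test just misses (Sig. = .053).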
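For readers who want to see where the Log Rank chi-square comes from, here is a sketch in Python of the standard two-group calculation: at each event time, compare the observed deaths in one group with the number expected if the groups shared a single survival function. The function name `log_rank` is hypothetical; the data are the same times and status codes as in the Kaplan-Meier Survival Table.

```python
def log_rank(times_a, status_a, times_b, status_b):
    """Two-group log rank chi-square (1 df). status: 1 = event, 0 = censored."""
    data = [(t, s, 0) for t, s in zip(times_a, status_a)]
    data += [(t, s, 1) for t, s in zip(times_b, status_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, s, _ in data if s == 1}):
        n = sum(1 for x, _, _ in data if x >= t)                   # at risk, both groups
        n_a = sum(1 for x, _, g in data if x >= t and g == 0)      # at risk, group A
        d = sum(1 for x, s, _ in data if x == t and s == 1)        # deaths at t
        d_a = sum(1 for x, s, g in data if x == t and s == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A death count at time t.
            variance += n_a * (n - n_a) * d * (n - d) / (n * n * (n - 1))
    return (observed_a - expected_a) ** 2 / variance

chi_square = log_rank([1, 2, 2, 3, 4, 5, 11], [1, 1, 1, 1, 1, 1, 1],
                      [7, 8, 10, 10, 12], [1, 1, 1, 1, 0])
print(round(chi_square, 3))
```

Drug 0 had 7 observed deaths against about 4.15 expected, which is what drives the chi-square of about 3.75 reported by SPSS.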
Tabachnick Table 11.1, p. 511
Analyzed using SPSS Cox Regression
The program will not produce a survival curve for a group of cases defined by the value of a variable unless that
variable is a categorical variable.
For that reason, I told the program that drug is a categorical variable so that survival curves for each value of
drug could be obtained.
Since drug is a dichotomy, the analysis could be done without labeling it categorical, but in that case the
survival curves for each value of drug could not have been generated.
As mentioned above if you want separate predicted survival functions for each ...
The left panel would yield 1 plot.
The right panel yields a plot for each value of drug.
COXREG months /STATUS=outcome(1) /PATTERN BY drug
 /CONTRAST (drug)=Indicator(1) /METHOD=ENTER drug
 /PLOT SURVIVAL
 /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Cox Regression
[DataSet2] G:\MdbT\InClassDatasets\Survival(T&Bp511).sav
Case Processing Summary
                                                                   N    Percent
Cases available in analysis  Event(a)                              11   91.7%
                             Censored                              1    8.3%
                             Total                                 12   100.0%
Cases dropped                Cases with missing values             0    .0%
                             Cases with negative time              0    .0%
                             Censored cases before the
                             earliest event in a stratum           0    .0%
                             Total                                 0    .0%
Total                                                              12   100.0%
a. Dependent Variable: months

Categorical Variable Codings(b)
            Frequency  (1)
drug(a)  0  7          0
         1  5          1
a. Indicator Parameter Coding
b. Category variable: drug
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood
40.740

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                   Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood  Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
37.394             3.469       1   .063    3.346       1   .067        3.346       1   .067
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
       B       SE    Wald   df  Sig.  Exp(B)
drug   -1.176  .658  3.192  1   .074  .309

Covariate Means and Pattern Values
             Pattern
       Mean  1     2
drug   .417  .000  1.000

Cox regression coefficient signs are relative to death, not
survival. So a positive sign means that larger values of the
independent variable have higher death rates. And
negative signs mean that larger values of the independent
variable have lower death rates.
In Cox Regression, we’re predicting DEATH, not survival.
[Figure: hand-drawn plot of Death (vertical axis) against Drug (horizontal axis, values 0 and 1).]
I strongly recommend that you create a plot such as the one immediately above by hand to make sure you
understand the Cox Regression results. I do it every time I use this procedure.
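A quick arithmetic check can accompany the hand-drawn plot. The sketch below, in Python, uses the B and SE printed for drug in the Variables in the Equation table; the variable names are my own. Exp(B) is the ratio of the hazard (death rate) for drug = 1 to that for drug = 0, and the Wald statistic is (B / SE) squared.

```python
import math

b = -1.176   # B for drug, from Variables in the Equation
se = 0.658   # SE for drug

hazard_ratio = math.exp(b)              # Exp(B): hazard for Drug 1 relative to Drug 0
wald = (b / se) ** 2                    # Wald chi-square with 1 df
percent_change = 100 * (hazard_ratio - 1)

print(round(hazard_ratio, 3))   # matches the printed Exp(B) of .309
print(round(wald, 2))           # close to the printed 3.192 (B and SE are rounded)
print(round(percent_change))    # negative: Drug 1 has the lower death rate
```

Because B is negative, Exp(B) is below 1: the estimated hazard of death with Drug 1 is about 31% of the hazard with Drug 0. The difference is not significant at the .05 level (Sig. = .074).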
The Cox Regression plots are y-hat plots, not observed survival functions.
COXREG plots are plots of predicted survival, not actual survival. In this sense, they’re like the tables and
plots of estimated marginal means from GLM. I usually report observed survival functions, using Kaplan-Meier, rather than these predicted survival functions. However, these are certainly useful in situations in which
you want to show what survival should be for specific groups at specific times.
Example: Turnover at a local Manufacturing Plant
1. Effect of Friends and/or family at the plant
In this study, turnover at a local manufacturing plant was studied. On the application blank, applicants were
asked to indicate whether or not they had friends or family already working at the plant.
Some did not respond to this question. They’re included in the analysis.
A screen shot of the data editor
The variable, wsfr2, represents whether or not the applicant had friends at the company.
wsfr2 = 0.50 means yes.
wsfr2 = -0.50 means no.
wsfr2 = 0.15 means no info.
Wsfr2 was created to deal with missing values in a special way. The fact that the values are fractional has no
bearing on the analyses. They could just as well have been 0, 1, 2 or 1, 2, 3.
Kaplan-Meier output is shown
KM
dos BY wsfr2 /STATUS=status(1)
/PRINT TABLE MEAN
/PLOT SURVIVAL
/TEST LOGRANK BRESLOW TARONE
/COMPARE OVERALL POOLED .
Kaplan-Meier
[DataSet3] G:\MdbR\1TurnoverArticle\TurnoverArticleDataset061005.sav
Huge table not reproduced here.
Case Processing Summary
wsfr2 Whether F/F at company                              Censored
for whole sample analyses        Total N  N of Events     N     Percent
-.50                             423      174             249   58.9%
.15 Whole sample missing value   100      40              60    60.0%
.50                              778      220             558   71.7%
Overall                          1301     434             867   66.6%
-.50 = No friends
.15 = No info
.50 = Had friends
The Survival table (700+ lines long) was deleted.
Means and Medians for Survival Time
                  Mean(a)                                    Median
                                      95% Confidence Interval                    95% Confidence Interval
wsfr2             Estimate  Std. Error  Lower Bound  Upper Bound   Estimate  Std. Error  Lower Bound  Upper Bound
-.50              610.597   25.559      560.500      660.693       667.000   .           .            .
.15 missing info  579.795   49.233      483.299      676.291       528.000   151.013     232.014      823.986
.50               769.900   18.559      733.524      806.277       .         .           .            .
Overall           706.965   15.009      677.548      736.383       .         .           .            .
a. Estimation is limited to the largest survival time if it is censored.
Note that there is no estimate of median survival for the 0.50 group. I believe this is because of the large
proportion of censored cases in that group – 71.7% were still on the job at the end of the window of
observation, so the survival curve never dropped to 50%.
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            25.344      2   .000
Breslow (Generalized Wilcoxon)   25.325      2   .000
Tarone-Ware                      25.004      2   .000
Test of equality of survival distributions for the different levels of wsfr2.
Clearly there are significant differences in overall survival between the groups.
[Figure: Kaplan-Meier survival curves for the three groups, labeled Had friends or family, Missing response, and No friends or family. Arrows mark 1 year and 2 years.]
Note the huge difference in proportion surviving after two years – almost 20% difference between those with friends and those without friends.
The data strongly suggest that applicants who had friends or family at the company had higher survival
rates at all times, up to 1100 days (about 3 years).
For example, at the end of 1 year (leftmost arrow in the above figure) the survival rate of those with friends and
family was about 70% while that for those who said they did not have friends or family at the organization was
about 60%.
By two years (rightmost arrow), the rate of retention of those with friends or family was about 68% while the rate of those
without had decreased to 50%.
The fact that the curve for those for whom no information was available was between the other two curves
suggests that those employees for whom no information was available were a mixture of some who did have
friends and family and those who did not.
Using Survival Analysis to validate selection test questions.
An I/O consulting firm gave a 30 question pre-employment questionnaire to 1000+ employees of a local
company. Each question had from one to five alternatives. The consulting company wanted to identify
questions that predicted long tenure with the organization. (They would have preferred to identify questions
that predicted high performance, but it was not possible to get good performance data. Don’t get me started on
why organizations don’t gather good performance data.)
In order to identify responses associated with long tenure, a survival analysis was conducted for each question.
A few of the analyses are presented below.
For each survival function, each curve is the survival function of persons who made a particular response to the
item. I picked only those for which the difference in survival curves was significant or approached significance.
Question 1
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            5.382       2   .068
Breslow (Generalized Wilcoxon)   4.307       2   .116
Tarone-Ware                      4.756       2   .093
[Figure: survival curves by response; curves labeled +1, 0, -1?]
The numbers represent the 3 possible responses to the question, coded as +1, 0, -1.
For this question, I believe we treated +1 as an indicator of long tenure and both 0 and -1 as indicators of short tenure.
Question 2
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            7.647       4   .105
Breslow (Generalized Wilcoxon)   6.950       4   .139
Tarone-Ware                      7.298       4   .121
[Figure: survival curves by response; curves labeled +1, 0, -1?]
As in the case of the question on the previous page, the response coded as +1 was treated as an indicator of long tenure and all other responses were treated as indicators of short tenure.
Question 3
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            5.070       3   .167
Breslow (Generalized Wilcoxon)   5.525       3   .137
Tarone-Ware                      5.493       3   .139
Test of equality of survival distributions for the different levels of GenQ4 Gen
Q4 L:I prefer a job that / S: How often you experience conflict with a coworker?.
[Figure: survival curves by response; curves labeled +1, 0]
There were very few persons who responded +1 or 0, but those who did were treated as long tenure and those who responded 0 as short tenure.
Question 4
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            7.753       4   .101
Breslow (Generalized Wilcoxon)   6.762       4   .149
Tarone-Ware                      7.439       4   .114
Test of equality of survival distributions for the different levels of GenQ3 Gen
Q3 L: Recieved safety training? / S: You are asked to do more physically
demanding work than you were hired to do because someone out sick, how
do you react?.
[Figure: survival curves by response; curves labeled +1, 0, -1?]
+1: Long tenure
Else: Short tenure
Question 5
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            10.971      4   .027
Breslow (Generalized Wilcoxon)   9.931       4   .042
Tarone-Ware                      10.597      4   .031
Test of equality of survival distributions for the different levels of GenQ2 Gen
Q2 L: Your team in disagreement over who will clean the floor. What
method is fair?/ S: Recent supervisor rate dependability?.
[Figure: survival curves by response; curves labeled +1, 0, -1?]
Question 6
Overall Comparisons
                                 Chi-Square  df  Sig.
Log Rank (Mantel-Cox)            8.052       3   .045
Breslow (Generalized Wilcoxon)   12.729      3   .005
Tarone-Ware                      10.614      3   .014
Test of equality of survival distributions for the different levels of GenQ1
GenQ1 L: Which strategies inspire a team and help be more effective?/
S:Your team in disagreement over who will clean the floor. What method is
fair?.
[Figure: survival curves by response; curves labeled +1, 0, -1?]
Thirty questions were evaluated in the above fashion.
After examination of the individual survival curves for the 30 questions, as shown above, those questions with
significant differences in survival between responses were identified.
Finally, an overall index was calculated, using syntax like the following . . .
In this particular case, the response associated with long survival added 1 to the index.
The response associated with short survival subtracted 1 from the index.
Tenure Scale Computation
Compute genshort=0.
if ((genq1=3 or genq1=4)) genshort=genshort+1.
if ((genq1=1 or genq1=2)) genshort=genshort-1.
if ((genq2=3 or genq2=4)) genshort=genshort+1.
if ((genq2=1 or genq2=2 or genq2=5)) genshort=genshort-1.
if ((genq6=3)) genshort=genshort+1.
if ((genq6=1 or genq6=2)) genshort=genshort-1.
if ((genq12=1)) genshort=genshort+1.
if ((genq12=3)) genshort=genshort-1.
if ((genq13=1)) genshort=genshort+1.
if ((genq13=2 or genq13=3 or genq13=4)) genshort=genshort-1.
if ((genq21=1 or genq21=3)) genshort=genshort+1.
if ((genq21=2)) genshort=genshort-1.
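The same scoring rule can be sketched outside SPSS. The Python below mirrors the syntax above: a response associated with long survival adds 1, a response associated with short survival subtracts 1, and any other response (or a missing item) leaves the index unchanged. The names `LONG`, `SHORT` and `tenure_index` are hypothetical; the response codes are copied from the syntax.

```python
# Response codes associated with long and short tenure, per item.
LONG = {"genq1": {3, 4}, "genq2": {3, 4}, "genq6": {3},
        "genq12": {1}, "genq13": {1}, "genq21": {1, 3}}
SHORT = {"genq1": {1, 2}, "genq2": {1, 2, 5}, "genq6": {1, 2},
         "genq12": {3}, "genq13": {2, 3, 4}, "genq21": {2}}

def tenure_index(responses):
    """Sum +1 for each long-tenure response and -1 for each short-tenure one."""
    index = 0
    for item, value in responses.items():
        if value in LONG.get(item, ()):
            index += 1
        elif value in SHORT.get(item, ()):
            index -= 1
    return index

# Hypothetical applicant: +1 (genq1), -1 (genq2), +1 (genq6), 0 (genq12=2 is unscored).
applicant = {"genq1": 3, "genq2": 5, "genq6": 3, "genq12": 2}
score = tenure_index(applicant)
print(score)
```

Keeping the scoring key in two small tables makes it easy to see, and to cross-validate later, exactly which responses feed the index.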
Validity of the Tenure Scale
The following is not based on the scale above but on a similar scale.
The median score on the scale was determined to be -14.
Group 0 was all employees with an index value less than or equal to -14.
Group 1 was all employees with an index value greater than -14.
[Figure: survival curves for Group 1 and Group 0, with 1 yr, 2 yr, 3 yr and 4 yr marked on the time axis.]
The graph indicates that those in Group 1, with large values of the index, had a nearly 70% retention rate after
50 months.
Those in Group 0 had a 40% retention rate after the same length of time.
The implication of this analysis would be to recommend to the company to use the scale in hiring of employees,
giving preference to those with higher scores on the scale.
Potential problems
The above curve was based on the same sample that was used to select the questions. So clearly there is
capitalization on chance. The scale should be tested on a different sample. That is, the results need to be
cross-validated.
Multivariate Analysis using Cox Regression
Turnover as a function of
1) friends at the organization (wsfr2),
2) sex of the employee (nsex) and
3) ethnic group of the employee (neth)
COXREG dos
/STATUS=status(1)
/PATTERN BY wsfr2
/CONTRAST (neth)=Indicator(1)
/CONTRAST (wsfr2)=Indicator
/METHOD=ENTER wsfr2 nsex neth
/PLOT SURVIVAL
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Wsfr2 =
0.50 friends at the company
0.15 no info on whether has friends
-0.50 does not have friends
Neth =
1 Employee is White
2 Employee is Black
3 Employee is American Indian or Asian or Hispanic
Cox Regression
[DataSet1] G:\MDBR\1TurnoverArticle\TurnoverArticleDataset061005.sav
Case Processing Summary
                                                                   N     Percent
Cases available in analysis  Event(a)                              434   33.4%
                             Censored                              867   66.6%
                             Total                                 1301  100.0%
Cases dropped                Cases with missing values             0     0.0%
                             Cases with negative time              0     0.0%
                             Censored cases before the earliest
                             event in a stratum                    0     0.0%
                             Total                                 0     0.0%
Total                                                              1301  100.0%
a. Dependent Variable: dos Days of service: termdate-effdate or 3/1/1-effdate or 12/31/4-effdate

Categorical Variable Codings(a,c)
                                          Frequency  (1)  (2)
wsfr2(b)  -.50=-.50                       423        1    0
          .15=Whole sample missing value  100        0    1
          .50=.50                         778        0    0
neth(b)   1.00=White                      903        0    0
          2.00=Black                      324        1    0
          3.00=Am Ind,Asian,Hisp          74         0    1
a. Category variable: wsfr2 (Whether F/F at company for whole sample analyses)
b. Indicator Parameter Coding
c. Category variable: neth (1=White, 2=Black, 3=Am Ind,Asian, Hisp)
Note that Wsfr2 = 0.50 (friends) is the reference group.
Neth = 1 (white) is the reference group.
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood
5871.672

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                   Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood  Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5827.342           42.322      5   .000    44.330      5   .000        44.330      5   .000
a. Beginning Block Number 1. Method = Enter

Remember we’re predicting “Quit”, not survival.
Variables in the Equation
           B      SE    Wald    df  Sig.  Exp(B)
wsfr2                   22.427  2   .000
wsfr2(1)   .464   .102  20.763  1   .000  1.590
wsfr2(2)   .421   .173  5.969   1   .015  1.524
nsex       -.223  .100  4.952   1   .026  .800
neth                    10.799  2   .005
neth(1)    .088   .109  .657    1   .417  1.092
neth(2)    -.908  .295  9.490   1   .002  .403

Covariate Means and Pattern Values
                  Pattern
           Mean   1      2      3
wsfr2(1)   .325   1.000  .000   .000
wsfr2(2)   .077   .000   1.000  .000
nsex       1.421  1.421  1.421  1.421
neth(1)    .249   .249   .249   .249
neth(2)    .057   .057   .057   .057

[Figures: hand-drawn plots of Quit against the coded predictors, with axis labels 0=Fr, 1=No Fr or missing; 1=Fr, 2=MV; 1=Fe, 2=Ma; 0=W, 1=Black or AI/As.]
[Figure: Predicted survival functions.]
Testing for Interactions in Cox Regression
The interaction of Friends and Nsex
To specify that an interaction be tested, click on the 1st variable name, then while holding down the CTRL key (or Command on the Mac), click on the 2nd variable name.
Finally, click on the >a*b> button.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                   Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood  Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5824.879           44.989      7   .000    46.792      7   .000        46.792      7   .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
                B      SE    Wald    df  Sig.  Exp(B)
wsfr2                        3.022   2   .221
wsfr2(1)        .429   .306  1.975   1   .160  1.536
wsfr2(2)        -.333  .530  .394    1   .530  .717
nsex            -.282  .138  4.158   1   .041  .754
neth                         10.964  2   .004
neth(1)         .097   .109  .800    1   .371  1.102
neth(2)         -.907  .295  9.464   1   .002  .404
nsex*wsfr2                   2.517   2   .284
nsex*wsfr2(1)   .023   .213  .011    1   .915  1.023
nsex*wsfr2(2)   .541   .347  2.424   1   .119  1.717
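The nsex*wsfr2 interaction is not significant (Wald = 2.517, df = 2, Sig. = .284), so there is no evidence that the effect of having friends differs between Females and Males.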
The interaction of Friends and Neth
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                   Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood  Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5820.584           49.194      9   .000    51.088      9   .000        51.088      9   .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
                   B       SE     Wald    df  Sig.  Exp(B)
wsfr2                             27.320  2   .000
wsfr2(1)           .599    .121   24.386  1   .000  1.820
wsfr2(2)           .623    .209   8.846   1   .003  1.864
nsex               -.224   .100   4.973   1   .026  .799
neth                              6.603   2   .037
neth(1)            .298    .150   3.934   1   .047  1.347
neth(2)            -.465   .344   1.835   1   .176  .628
neth*wsfr2                        6.377   4   .173
neth(1)*wsfr2(1)   -.392   .230   2.906   1   .088  .675
neth(2)*wsfr2(1)   -1.222  .791   2.385   1   .123  .295
neth(1)*wsfr2(2)   -.534   .378   1.995   1   .158  .586
neth(2)*wsfr2(2)   -1.093  1.075  1.035   1   .309  .335
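Again, the neth*wsfr2 interaction is not significant (Wald = 6.377, df = 4, Sig. = .173), so there is no evidence that the effect of Friends differs across ethnic groups.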
What the heck? What about the interaction of nsex and neth?
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                   Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood  Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5827.298           42.395      7   .000    44.373      7   .000        44.373      7   .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
               B      SE    Wald    df  Sig.  Exp(B)
wsfr2                       22.438  2   .000
wsfr2(1)       .465   .102  20.791  1   .000  1.591
wsfr2(2)       .421   .173  5.941   1   .015  1.523
nsex           -.216  .118  3.365   1   .067  .806
neth                        .888    2   .642
neth(1)        .112   .327  .117    1   .732  1.118
neth(2)        -.739  .884  .700    1   .403  .477
neth*nsex                   .043    2   .979
neth(1)*nsex   -.018  .234  .006    1   .940  .982
neth(2)*nsex   -.125  .624  .040    1   .842  .883
Nope. The neth*nsex interaction is nowhere near significant (Wald = .043, df = 2, p = .979).
Survival Analysis of a phenomenon with a positive outcome
PEG vs. PEGJ Example Skipped in F013 & F014
The data for this example compared two methods of feeding trauma patients, one using a percutaneous
endoscopic gastrojejunostomy (PEGJ) and the other using a percutaneous endoscopic gastrostomy (PEG). It
was hoped that the data would show that the PEGJ technique would provide continuous uninterrupted nutrition
with greater consistency than PEG. Time to reach a nutrition goal was the continuous dependent
variable. Patients were observed for 14 days. Whether or not a patient reached the goal was the status.
Reaching the goal was the +1 state. A patient who had not reached the goal in 14 days was treated as a
censored case. Group=1 is the PEGJ group. Group=2 is the PEG group.
NUTRSD     NUTRGOAL   DAYSGOAL  GOALIN14  GROUP  ISS  AGE
02/15/98   02/16/98   1         1         1      29   43
01/10/98   01/12/98   2         1         1      5    88
02/14/98   02/18/98   4         1         1      29   37
02/02/98   02/06/98   4         1         1      27   36
01/10/98   01/13/98   3         1         1      13   92
01/09/98   .          15        0         2      19   73
01/02/98   01/04/98   2         1         2      26   42
01/20/98   01/22/98   2         1         2      36   55
03/18/98   .          5         1         1      27   23
02/04/98   02/06/98   2         1         2      13   72
01/23/98   .          15        0         2      10   45
02/01/98   02/02/98   1         1         1      22   59
02/20/98   02/21/98   1         1         1      17   54
02/03/98   02/04/98   1         1         2      14   78
03/31/98   04/02/98   2         1         2      18   30
04/13/98   04/15/98   2         1         2      27   49
05/08/98   05/09/98   1         1         2      9    22
04/14/98   04/20/98   6         1         2      9    60
05/27/98   05/28/98   1         1         1      17   27
05/13/98   .          15        0         2      29   95
05/07/98   05/16/98   9         1         2      25   31
04/16/98   04/17/98   1         1         2      32   31
03/23/98   03/25/98   2         1         2      20   41
04/07/98   04/08/98   1         1         2      16   29
03/29/98   03/30/98   1         1         1      25   24
04/30/98   05/01/98   1         1         2      29   52
05/05/98   05/08/98   3         1         2      38   79
05/28/98   05/30/98   2         1         1      4    76
06/08/98   06/10/98   2         1         2      16   70
05/27/98   05/28/98   1         1         1      9    27
04/27/98   04/29/98   2         1         1      22   87
04/10/98   04/11/98   1         1         1      27   36
02/26/98   03/04/98   6         1         1      25   54
03/27/98   03/28/98   1         1         1      29   22
04/17/98   04/18/98   1         1         1      22   22
02/25/98   03/05/98   8         1         1      25   79
03/18/98   03/19/98   1         1         1      25   56
01/28/98   01/29/98   1         1         1      17   66
03/23/98   03/24/98   1         1         1      16   20
04/29/98   05/03/98   4         1         1      26   22
07/19/98   08/02/98   14        1         2      34   33
08/13/98   08/15/98   2         1         1      25   49
08/25/98   .          15        0         2      26   77
10/06/98   10/07/98   1         1         2      34   19
09/10/98   09/11/98   1         1         2      27   36
08/14/98   08/15/98   1         1         1      30   35
08/25/98   08/27/98   2         1         2      27   29
09/20/98   09/21/98   1         1         2      36   62
09/29/98   10/01/98   2         1         2      17   19
10/09/98   .          15        0         2      38   74
DAYSGOAL is the “length of the arrow” variable in the first handout.
GOALIN14 is a variable which represents whether the goal was reached or not.
GOALIN14=1 means that the goal was reached.
GOALIN14=0 means that the case is right-censored. In the listing, censored cases carry DAYSGOAL=15.
GROUP=1: PEGJ
GROUP=2: PEG
ISS: Injury Severity Score, a measure of amount of trauma (taken at admission)
NUTRSD     NUTRGOAL   DAYSGOAL  GOALIN14  GROUP  ISS  AGE
10/02/98   10/03/98   1         1         1      10   40
08/26/98   09/04/98   9         1         2      18   48
08/19/98   08/21/98   2         1         1      18   31
08/03/98   08/04/98   1         1         1      41   46
08/25/98   08/28/98   3         1         2      24   37
09/17/98   .          15        0         2      26   75
07/02/98   .          15        0         1      19   28
08/03/98   08/05/98   2         1         2      13   52
07/15/98   07/17/98   2         1         2      38   71
07/27/98   08/01/98   5         1         2      34   33
04/30/98   05/02/98   2         1         2      4    61
05/29/98   05/30/98   1         1         1      29   58
05/16/98   05/18/98   2         1         2      19   42
06/20/98   06/23/98   3         1         1      25   19
08/30/98   .          15        0         1      25   70
04/30/98   05/02/98   2         1         2      43   33
07/01/98   07/02/98   1         1         1      43   79
09/29/98   .          15        0         2      17   18
05/28/98   06/08/98   11        1         2      36   57
07/15/98   07/16/98   1         1         2      27   59
08/11/98   08/12/98   1         1         1      19   43
10/12/98   10/13/98   1         1         1      36   18
08/24/98   08/25/98   1         1         1      20   84
10/22/98   .          15        0         1      25   17
10/08/98   10/09/98   1         1         2      25   20
10/06/98   .          15        0         2      17   31
07/30/98   08/02/98   3         1         1      22   26
04/16/98   04/17/98   1         1         1      38   18
10/08/98   10/09/98   1         1         1      25   34
08/19/98   08/21/98   2         1         1      34   22
03/20/98   03/21/98   1         1         1      25   48
06/20/98   06/21/98   1         1         1      11   45
07/30/98   07/31/98   1         1         1      25   33
09/07/98   .          15        0         2      36   28
07/17/98   07/18/98   1         1         1      22   62
09/15/98   09/17/98   2         1         2      20   47
07/07/98   07/08/98   1         1         1      33   27
10/01/98   10/02/98   1         1         2      25   33
09/11/98   09/12/98   1         1         1      41   31
Specifying the analysis using Life Tables . . .
The output of LIFE TABLES
SURVIVAL
TABLE=DAYSGOAL BY GROUP(1 2)
/INTERVAL=THRU 15 BY 1
/STATUS=GOALIN14(1)
/PRINT=TABLE
/PLOTS ( SURVIVAL)=DAYSGOAL BY GROUP
.
Survival Analysis
G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav
Survival Variable: DAYSGOAL

Life Table
Column key: Start = Interval Start Time; Enter = Number Entering Interval; Wdrw = Number Withdrawing during Interval; Exposed = Number Exposed to Risk; Events = Number of Terminal Events; PTerm = Proportion Terminating; PSurv = Proportion Surviving; CumSurv = Cumulative Proportion Surviving at End of Interval; SE(Cum) = Std. Error of Cumulative Proportion Surviving at End of Interval; Dens = Probability Density; SE(Dens) = Std. Error of Probability Density; Haz = Hazard Rate; SE(Haz) = Std. Error of Hazard Rate.

First-order Controls: GROUP 1
Start    Enter  Wdrw  Exposed  Events  PTerm  PSurv  CumSurv  SE(Cum)  Dens   SE(Dens)  Haz   SE(Haz)
.000     46     0     46.000   0       .00    1.00   1.00     .00      .000   .000      .00   .00
1.000    46     0     46.000   28      .61    .39    .39      .07      .609   .072      .88   .15
2.000    18     0     18.000   6       .33    .67    .26      .06      .130   .050      .40   .16
3.000    12     0     12.000   3       .25    .75    .20      .06      .065   .036      .29   .16
4.000    9      0     9.000    3       .33    .67    .13      .05      .065   .036      .40   .23
5.000    6      0     6.000    1       .17    .83    .11      .05      .022   .022      .18   .18
6.000    5      0     5.000    1       .20    .80    .09      .04      .022   .022      .22   .22
7.000    4      0     4.000    0       .00    1.00   .09      .04      .000   .000      .00   .00
8.000    4      0     4.000    1       .25    .75    .07      .04      .022   .022      .29   .28
9.000    3      0     3.000    0       .00    1.00   .07      .04      .000   .000      .00   .00
10.000   3      0     3.000    0       .00    1.00   .07      .04      .000   .000      .00   .00
11.000   3      0     3.000    0       .00    1.00   .07      .04      .000   .000      .00   .00
12.000   3      0     3.000    0       .00    1.00   .07      .04      .000   .000      .00   .00
13.000   3      0     3.000    0       .00    1.00   .07      .04      .000   .000      .00   .00
14.000   3      0     3.000    0       .00    1.00   .07      .04      .000   .000      .00   .00

First-order Controls: GROUP 2
Start    Enter  Wdrw  Exposed  Events  PTerm  PSurv  CumSurv  SE(Cum)  Dens   SE(Dens)  Haz   SE(Haz)
.000     43     0     43.000   0       .00    1.00   1.00     .00      .000   .000      .00   .00
1.000    43     0     43.000   11      .26    .74    .74      .07      .256   .067      .29   .09
2.000    32     0     32.000   15      .47    .53    .40      .07      .349   .073      .61   .15
3.000    17     0     17.000   2       .12    .88    .35      .07      .047   .032      .13   .09
4.000    15     0     15.000   0       .00    1.00   .35      .07      .000   .000      .00   .00
5.000    15     0     15.000   1       .07    .93    .33      .07      .023   .023      .07   .07
6.000    14     0     14.000   1       .07    .93    .30      .07      .023   .023      .07   .07
7.000    13     0     13.000   0       .00    1.00   .30      .07      .000   .000      .00   .00
8.000    13     0     13.000   0       .00    1.00   .30      .07      .000   .000      .00   .00
9.000    13     0     13.000   2       .15    .85    .26      .07      .047   .032      .17   .12
10.000   11     0     11.000   0       .00    1.00   .26      .07      .000   .000      .00   .00
11.000   11     0     11.000   1       .09    .91    .23      .06      .023   .023      .10   .10
12.000   10     0     10.000   0       .00    1.00   .23      .06      .000   .000      .00   .00
13.000   10     0     10.000   0       .00    1.00   .23      .06      .000   .000      .00   .00
14.000   10     0     10.000   1       .10    .90    .21      .06      .023   .023      .11   .11

Median Survival Time
First-order Controls   Med Time
GROUP 1                1.82
GROUP 2                2.70
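The life-table columns above follow from the event and withdrawal counts in each interval. The sketch below recomputes them for GROUP 1 using the usual actuarial formulas (withdrawals count as half-exposed; hazard = 2q / (width * (1 + p))). It is an illustration under those assumptions, not SPSS's own code, and the function and key names are hypothetical.

```python
def life_table(n_start, events, withdrawn, width=1.0):
    """Build actuarial life-table rows from per-interval counts."""
    rows = []
    entering = n_start
    cum_surv = 1.0
    for d, w in zip(events, withdrawn):
        exposed = entering - w / 2.0              # withdrawals are half-exposed
        q = d / exposed if exposed > 0 else 0.0   # proportion terminating
        p = 1.0 - q                               # proportion surviving
        previous = cum_surv
        cum_surv = previous * p                   # cumulative proportion surviving
        rows.append({
            "entering": entering,
            "exposed": exposed,
            "prop_terminating": q,
            "prop_surviving": p,
            "cum_surv": cum_surv,
            "density": (previous - cum_surv) / width,
            "hazard": 2.0 * q / (width * (1.0 + p)),
        })
        entering -= d + w
    return rows

# GROUP 1 (PEGJ): terminal events in intervals starting at 0, 1, ..., 14.
group1_events = [0, 28, 6, 3, 3, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
group1_withdrawn = [0] * 15
rows = life_table(46, group1_events, group1_withdrawn)

for start, row in enumerate(rows[:4]):
    print(start, row["entering"], round(row["cum_surv"], 2),
          round(row["density"], 3), round(row["hazard"], 2))
```

For the interval starting at 1, for example, 28 of 46 patients reached the goal, giving a proportion terminating of about .61 and a cumulative "survival" of about .39, as in the printed table.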
[Figure: survival functions by First-order Control: GROUP]
Since the outcome is a good event, the faster the curve falls to zero, the better. So the group performing best is the group with the lowest curve.
These data are unusual because the “event” is something that is sought after (reaching a feeding goal) rather
than something that is to be avoided (death or termination). So for these data, lower "survival" is preferred,
since the "event" is not death, but reaching a nutrition goal. The sooner a patient reached the nutrition goal the
better. Thus, the investigators hoped that patients in the PEGJ condition would reach those goals faster, leading
to lower "survival" curves. In this case, survival should be called "Failure to reach feeding goal."
Analysis of the same data using Kaplan-Meier
KM
DAYSGOAL BY GROUP /STATUS=GOALIN14(1)
/PRINT TABLE MEAN
/PLOT SURVIVAL HAZARD
/TEST LOGRANK BRESLOW TARONE
/COMPARE OVERALL POOLED .
Kaplan-Meier
G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav

Case Processing Summary
                                    Censored
GROUP     Total N   N of Events     N     Percent
1         46        43              3     6.5%
2         43        34              9     20.9%
Overall   89        77              12    13.5%

Means and Medians for Survival Time
Mean(a)
GROUP     Estimate   Std. Error   95% CI Lower Bound   95% CI Upper Bound
1         2.717      .527         1.685                3.750
2         5.488      .857         3.808                7.169
Overall   4.056      .517         3.043                5.069
Median
GROUP     Estimate   Std. Error   95% CI Lower Bound   95% CI Upper Bound
1         1.000      .            .                    .
2         2.000      .214         1.581                2.419
Overall   2.000      .211         1.587                2.413
a. Estimation is limited to the largest survival time if it is censored.
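The Kaplan-Meier means and medians above can be checked by hand. The sketch below is a minimal product-limit estimator written for illustration (the function names are hypothetical); the event counts per day are read from the life-table output, and censored cases are entered at day 15 as in the data listing. The mean is the area under the survival curve up to the largest time, which is how footnote a restricts it.

```python
def expand(event_counts, n_censored, censor_day):
    """Turn {day: number of events} plus censored cases into parallel lists."""
    times, events = [], []
    for day, count in event_counts.items():
        times += [day] * count
        events += [1] * count
    times += [censor_day] * n_censored
    events += [0] * n_censored
    return times, events

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        c = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        if d > 0:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= d + c
    return curve

def median_time(curve):
    """First event time at which estimated survival is .5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

def restricted_mean(curve, t_max):
    """Area under the step survival curve from 0 to t_max."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_max - prev_t)

g1 = expand({1: 28, 2: 6, 3: 3, 4: 3, 5: 1, 6: 1, 8: 1}, 3, 15)            # PEGJ
g2 = expand({1: 11, 2: 15, 3: 2, 5: 1, 6: 1, 9: 2, 11: 1, 14: 1}, 9, 15)   # PEG
km1, km2 = kaplan_meier(*g1), kaplan_meier(*g2)
mean1, mean2 = restricted_mean(km1, 15), restricted_mean(km2, 15)
median1, median2 = median_time(km1), median_time(km2)
print(round(mean1, 3), median1, round(mean2, 3), median2)
```

Because all censoring in these data occurs at day 15, after the last event, the Kaplan-Meier curve here is just the observed proportion of patients who have not yet reached the goal.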
Overall Comparisons
                                 Chi-Square   df   Sig.
Log Rank (Mantel-Cox)            8.479        1    .004
Breslow (Generalized Wilcoxon)   9.588        1    .002
Tarone-Ware                      9.306        1    .002
Test of equality of survival distributions for the different levels of GROUP.
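The Log Rank statistic above compares, at each event time, the number of events observed in one group with the number expected if the two groups shared one survival distribution. The following is a minimal sketch of the standard two-group log-rank computation, written for illustration and not taken from SPSS; the names are hypothetical, and the counts are again read from the life-table output with censored cases placed at day 15.

```python
def expand(event_counts, n_censored, censor_day):
    """Turn {day: number of events} plus censored cases into (time, event) pairs."""
    cases = []
    for day, count in event_counts.items():
        cases += [(day, 1)] * count
    cases += [(censor_day, 0)] * n_censored
    return cases

def log_rank(group_a, group_b):
    """Two-group log-rank chi-square (1 df) from lists of (time, event) pairs."""
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _ in group_a if ti >= t)   # at risk in group A
        n_b = sum(1 for ti, _ in group_b if ti >= t)   # at risk in group B
        d_a = sum(1 for ti, e in group_a if ti == t and e == 1)
        d_b = sum(1 for ti, e in group_b if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance

pegj = expand({1: 28, 2: 6, 3: 3, 4: 3, 5: 1, 6: 1, 8: 1}, 3, 15)
peg = expand({1: 11, 2: 15, 3: 2, 5: 1, 6: 1, 9: 2, 11: 1, 14: 1}, 9, 15)
chi_square = log_rank(pegj, peg)
print(round(chi_square, 3))
```

The Breslow and Tarone-Ware tests use the same observed-minus-expected terms but weight early event times more heavily, which this sketch does not implement.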
The same analysis using Cox Regression
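One requirement of the Cox Regression analysis is that the hazard functions be proportional. That
means that for any two values of a covariate, the ratio of the hazards for those two values is
constant over time.
This rules out hazard functions which cross. It also rules out hazard functions that stay a constant
distance apart (parallel on the raw scale) while the hazard itself is changing, because a constant ratio
means the gap between the curves grows or shrinks along with the hazard.
Roughly speaking, the hazard functions should look like the following . . .
[Figure: hazard functions that diverge over time]
That is, when the hazards rise over time, the hazard functions diverge.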
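A small numeric illustration of the proportionality requirement: under the Cox model, a group's hazard is the baseline hazard multiplied by exp(B), so the ratio of the two hazards is the same at every time point even though the gap between them changes. The baseline hazard and the coefficient below are invented purely for illustration.

```python
import math

def baseline_hazard(t):
    """Hypothetical rising baseline hazard, chosen only for illustration."""
    return 0.05 * t

def group_hazard(t, b):
    """Cox-model hazard for a group with coefficient b: h0(t) * exp(b)."""
    return baseline_hazard(t) * math.exp(b)

b = math.log(2.0)   # assumed coefficient, i.e. a hazard ratio of 2
times = [1, 2, 4, 8]
ratios = [group_hazard(t, b) / baseline_hazard(t) for t in times]
gaps = [group_hazard(t, b) - baseline_hazard(t) for t in times]

for t, ratio, gap in zip(times, ratios, gaps):
    print(f"t={t}: ratio={ratio:.2f}, difference={gap:.2f}")
```

The ratio stays fixed while the difference widens as the baseline hazard rises, which is the diverging pattern described above.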
COXREG
DAYSGOAL /STATUS=GOALIN14(1)
/PATTERN BY GROUP
/CONTRAST (GROUP)=Indicator(1)
/METHOD=ENTER GROUP
/PLOT SURVIVAL HAZARD
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
Cox Regression
G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav

Case Processing Summary
                                                                                    N     Percent
Cases available in analysis   Event(a)                                              77    86.5%
                              Censored                                              12    13.5%
                              Total                                                 89    100.0%
Cases dropped                 Cases with missing values                             0     .0%
                              Cases with negative time                              0     .0%
                              Censored cases before the earliest event in a stratum 0     .0%
                              Total                                                 0     .0%
Total                                                                               89    100.0%
a. Dependent Variable: DAYSGOAL

Categorical Variable Codings(b)
             Frequency   (1)
GROUP(a) 1   46          0
         2   43          1
a. Indicator Parameter Coding
b. Category variable: GROUP

Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 618.281

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a,b)
-2 Log Likelihood: 612.895
                              Chi-square   df   Sig.
Overall (score)               5.448        1    .020
Change From Previous Step     5.385        1    .020
Change From Previous Block    5.385        1    .020
a. Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 618.281
b. Beginning Block Number 1. Method = Enter

Variables in the Equation
         B       SE      Wald    df   Sig.   Exp(B)
GROUP    -.542   .235    5.332   1    .021   .582

Covariate Means and Pattern Values
                  Pattern
         Mean     1       2
GROUP    .483     .000    1.000

[Figure: survival functions for patterns 1 and 2 (GROUP), annotated "Goal" and "No Goal"]
The above graph presents predicted proportions. They are analogous to plots of y-hats vs. predictors in a
regression analysis.
Because the Cox-regression plots show predicted curves, you may also have to run a Kaplan-Meier analysis just for the
observed survival curves the K-M procedure produces.
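The key numbers in the Cox output for GROUP can be recomputed from a few printed values. The sketch below is an illustration only: the function names are hypothetical, and the 95% confidence interval for Exp(B) is an addition that does not appear in the output shown (it assumes the usual normal approximation, B plus or minus 1.96 SE).

```python
import math

def hazard_ratio_ci(b, se, z=1.96):
    """Exp(B) with an approximate 95% confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

def lr_chi_square(neg2ll_before, neg2ll_after):
    """Likelihood-ratio chi-square: the drop in -2 Log Likelihood."""
    return neg2ll_before - neg2ll_after

def p_value_1df(chi_square):
    """Upper-tail p-value of a chi-square with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))

# Printed values: B = -.542, SE = .235, -2LL = 618.281 (Block 0), 612.895 (Block 1)
hr, hr_low, hr_high = hazard_ratio_ci(-0.542, 0.235)
lr = lr_chi_square(618.281, 612.895)
p_lr = p_value_1df(lr)

print(f"Exp(B) = {hr:.3f}, approx. 95% CI {hr_low:.3f} to {hr_high:.3f}")
print(f"Change in -2LL = {lr:.3f}, p = {p_lr:.3f}")
```

With Indicator(1) coding, GROUP=1 (PEGJ) is the reference category, so an Exp(B) of about .58 says that at any given time the PEG group's rate of reaching the goal is estimated at roughly 58% of the PEGJ group's rate.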
Comparing Turnover in two plants
A company was interested in determining the causes of turnover in two of its plants.
Plant A: One part of the preparation of food for sale to retailers is undertaken.
Plant B: A different part of the preparation of food for sale to retailers is undertaken.
The two plants hire from the same pool of employees. Each plant is managed by a different person.
The overall “survival” of employees in the two plants is as follows . . .
filter off.
compute reploc = newloc.
value labels reploc 1 "A" 2 "B".
filter by useme.
KM dayswrkd by reploc /STATUS=termed(1)/PRINT MEAN /PLOT SURVIVAL
/TEST LOGRANK BRESLOW TARONE /COMPARE OVERALL POOLED.
Kaplan-Meier
[DataSet1] G:\MDBR\???\AllEmployeesNN041025.sav

Case Processing Summary
                                    Censored
reploc    Total N   N of Events     N      Percent
1.00 A    310       126             184    59.4%
2.00 B    837       285             552    65.9%
Overall   1147      411             736    64.2%

Means and Medians for Survival Time
Mean(a)
reploc    Estimate   Std. Error   95% CI Lower Bound   95% CI Upper Bound
1.00 A    355.796    16.345       323.760              387.832
2.00 B    424.911    9.197        406.884              442.938
Overall   407.357    8.081        391.519              423.195
Median
reploc    Estimate   Std. Error   95% CI Lower Bound   95% CI Upper Bound
1.00 A    377.000    39.815       298.962              455.038
2.00 B    559.000    .            .                    .
Overall   489.000    33.040       424.242              553.758
a. Estimation is limited to the largest survival time if it is censored.

Overall Comparisons
                                 Chi-Square   df   Sig.
Log Rank (Mantel-Cox)            13.633       1    .000
Breslow (Generalized Wilcoxon)   10.203       1    .001
Tarone-Ware                      11.880       1    .001
Test of equality of survival distributions for the different levels of reploc.
filter off.
Clearly, employee “retention/survival” is best in Plant B.
Are these differences in survival rates the same for the different ethnic groups employed by the
company?
Perhaps the differences between buildings are due to the fact that the different buildings have different
proportions of ethnic groups
neweth * reploc Crosstabulation
                                         reploc
neweth                                   1.00 A    2.00 B    Total
.00 White or Black   Count               130       219       349
                     % within reploc     41.4%     25.7%     29.9%
1.00 Hispanic        Count               184       634       818
                     % within reploc     58.6%     74.3%     70.1%
Total                Count               314       853       1167
                     % within reploc     100.0%    100.0%    100.0%
coupled with the fact that the different ethnic groups have different survival rates . . .
[Figure: survival curves for the Hispanic and White/Black groups]
These differences suggest that the difference in survival between buildings might be a side effect of the
difference in proportion of Hispanics in the two buildings combined with the difference in survival between
Hispanics and Whites/Blacks.
The way to resolve this issue is to perform a multivariate analysis, pitting building against ethnic group.
Of the procedures used here, only Cox Regression can do this.
Multivariate analysis: the joint effect of building and ethnic group.
filter off.
filter by useme.
COXREG dayswrkd
/STATUS=termed(1)
/METHOD=ENTER reploc neweth
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Cox Regression

Case Processing Summary
                                                                                    N      Percent
Cases available in analysis   Event(a)                                              411    33.9%
                              Censored                                              736    60.7%
                              Total                                                 1147   94.6%
Cases dropped                 Cases with missing values                             65     5.4%
                              Cases with negative time                              0      0.0%
                              Censored cases before the earliest event in a stratum 0      0.0%
                              Total                                                 65     5.4%
Total                                                                               1212   100.0%
a. Dependent Variable: dayswrkd

Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 5312.092

Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood: 5222.145
                              Chi-square   df   Sig.
Overall (score)               101.652      2    .000
Change From Previous Step     89.947       2    .000
Change From Previous Block    89.947       2    .000
a. Beginning Block Number 1. Method = Enter

Variables in the Equation
          B       SE      Wald     df   Sig.   Exp(B)
reploc    -.159   .111    2.072    1    .150   .853
neweth    -.916   .102    80.462   1    .000   .400

Covariate Means
          Mean
reploc    1.730
neweth    .697
filter off.
So, when controlling for differences in ethnic groups, no significant difference in survival (turnover) between the two
buildings was found (reploc: B = -.159, p = .150). The manager of Building A was very happy with this result.