Prediction

STA 695 Final
Angela Schoergendorfer
May 5, 2009
1)
Weibull regression model
I fit a Weibull regression model with the covariates age, age^2 and t5. The results show that the effect of t5
is not significant (p=.209), while both the linear and the quadratic effect of age are significant (p=.0066
and p=.0008, respectively). I therefore refit the model without t5 (model 1.2) and use this model for the
comparison in part 3.
> model1 = survreg(Surv(time,status)~age + I(age^2) + t5, data=stan)
> summary(model1)
Call:
survreg(formula = Surv(time, status) ~ age + I(age^2) + t5, data = stan)
               Value Std. Error     z        p
(Intercept)  4.66230    1.66855  2.79 5.20e-03
age          0.24207    0.08915  2.72 6.62e-03
I(age^2)    -0.00388    0.00116 -3.34 8.26e-04
t5          -0.37258    0.29670 -1.26 2.09e-01
Log(scale)   0.47114    0.08434  5.59 2.32e-08

Scale= 1.60
Weibull distribution
Loglik(model)= -760.4   Loglik(intercept only)= -770.7
Chisq= 20.62 on 3 degrees of freedom, p= 0.00013
Number of Newton-Raphson Iterations: 5
n=153 (25 observations deleted due to missingness)
> model1.2 = survreg(Surv(time,status)~age + I(age^2) , data=stan)
> summary(model1.2)
Call:
survreg(formula = Surv(time, status) ~ age + I(age^2), data = stan)
               Value Std. Error     z        p
(Intercept)  4.76093    1.59528  2.98 2.84e-03
age          0.20988    0.08645  2.43 1.52e-02
I(age^2)    -0.00346    0.00113 -3.06 2.22e-03
Log(scale)   0.47509    0.07934  5.99 2.12e-09

Scale= 1.61
Weibull distribution
Loglik(model)= -843.9   Loglik(intercept only)= -852.9
Chisq= 18.09 on 2 degrees of freedom, p= 0.00012
Number of Newton-Raphson Iterations: 5
n= 178
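The log-likelihoods of model 1 (n=153) and model 1.2 (n=178) are based on different sets of patients, so they cannot be compared directly. A minimal sketch, assuming both models are refit to the 153 patients with an observed t5 score (the names stan_cc, full and reduced are introduced here only for illustration), gives a likelihood ratio test for t5 to complement the Wald p-value of .209:

```r
library(survival)

# Same preparation as in the R code of part 5: drop survival times below 5
stan <- stanford2[stanford2$time >= 5, ]
# Hypothetical complete-case subset: patients with an observed t5 score
stan_cc <- stan[!is.na(stan$t5), ]

# Full and reduced Weibull models, both fit to the same patients
full    <- survreg(Surv(time, status) ~ age + I(age^2) + t5, data = stan_cc)
reduced <- survreg(Surv(time, status) ~ age + I(age^2), data = stan_cc)

# survreg stores c(intercept-only, fitted model) log-likelihoods in $loglik
lr_stat <- 2 * (full$loglik[2] - reduced$loglik[2])
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
cat("LR statistic:", round(lr_stat, 3), "  p-value:", signif(lr_p, 3), "\n")
```

The likelihood ratio and Wald tests need not agree exactly, but both test the same null hypothesis that the t5 coefficient is zero.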
2)
Cox Proportional Hazard Regression
Fitting a Cox proportional hazards regression model with the covariates age, age^2 and t5, we find that,
just as with the Weibull regression model, the linear and quadratic effects of age are significant (p=.0079
and p=.0012, respectively), while the effect of t5 is not (p=.2300).
> model2 = coxph(Surv(time,status)~ age + I(age^2) + t5 , data=stan)
> summary(model2)
Call:
coxph(formula = Surv(time, status) ~ age + I(age^2) + t5, data = stan)

  n=153 (25 observations deleted due to missingness)

             coef exp(coef) se(coef)     z      p
age      -0.14747     0.863 0.055493 -2.66 0.0079
I(age^2)  0.00234     1.002 0.000721  3.25 0.0012
t5        0.22466     1.252 0.186847  1.20 0.2300

         exp(coef) exp(-coef) lower .95 upper .95
age          0.863      1.159     0.774     0.962
I(age^2)     1.002      0.998     1.001     1.004
t5           1.252      0.799     0.868     1.806

Rsquare= 0.116   (max possible= 0.996 )
Likelihood ratio test= 18.9  on 3 df,   p=0.000288
Wald test            = 21.3  on 3 df,   p=9.07e-05
Score (logrank) test = 21.5  on 3 df,   p=8.24e-05
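The agreement between the Weibull and Cox fits can also be checked on the coefficient scale. survreg fits the Weibull model in accelerated failure time form, log T = x'b + sigma*W with W standard extreme value, which implies the proportional hazards form h(t|x) = h0(t) exp(-x'b/sigma). A sketch, assuming this standard survreg parametrization (ph_from_weibull and comparison are hypothetical names), converts the coefficients of model 1 via -b/sigma and places them next to those of model 2. The sign flip explains why age has a positive coefficient in the Weibull output and a negative one in the Cox output.

```r
library(survival)

stan   <- stanford2[stanford2$time >= 5, ]
model1 <- survreg(Surv(time, status) ~ age + I(age^2) + t5, data = stan)
model2 <- coxph(Surv(time, status) ~ age + I(age^2) + t5, data = stan)

# Weibull AFT coefficients on the log-hazard scale: gamma = -beta / sigma
# (the intercept is dropped; in the PH form it belongs to the baseline hazard)
ph_from_weibull <- -coef(model1)[-1] / model1$scale

# Side-by-side with the Cox estimates (both fits use the same 153 patients)
comparison <- cbind(weibull_as_ph = ph_from_weibull, cox = coef(model2))
print(round(comparison, 5))
```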
Prediction
The estimated survival curve and pointwise 95% confidence interval for a patient who is 45 years old and
has a t5 mismatch score of 1.3 are plotted in Figure 1. To estimate the cumulative hazard for this patient,
I centered age, age^2 and t5 at the covariate means stored in the fitted model (model2$means), because
basehaz() with centered=T returns the cumulative hazard at those means. The patient has a slightly lower
hazard of dying than the baseline hazard (see Figure 2), which corresponds to a patient with mean age,
mean age^2 and mean mismatch score t5.
# predicted survival curve for patient with age=45, t5 = 1.3
newdata = data.frame(age=c(45),t5=c(1.3))
fitpred2 = survfit(model2,newdata)
# predicted cumulative hazard
# basehaz() is centered at model2$means (age, age^2, t5 averaged over the 153 patients in the fit)
baseH = basehaz(model2, centered=T)
predH = baseH$hazard * exp(sum(model2$coef*(c(45,45^2,1.3)-model2$means)))
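A quick consistency check of this calculation, assuming the documented behaviour of basehaz(centered=TRUE) (baseline evaluated at model2$means): scaling the baseline cumulative hazard by exp(b'(x - means)) should reproduce the cumulative hazard that survfit() computes directly for the same patient. The names x_new and lp are illustrative only.

```r
library(survival)

stan   <- stanford2[stanford2$time >= 5, ]
model2 <- coxph(Surv(time, status) ~ age + I(age^2) + t5, data = stan)

# New patient's covariates, in the order of coef(model2): age, age^2, t5
x_new <- c(45, 45^2, 1.3)

# With centered = TRUE the baseline is the cumulative hazard at model2$means
baseH <- basehaz(model2, centered = TRUE)
lp    <- sum(coef(model2) * (x_new - model2$means))
predH <- baseH$hazard * exp(lp)

# Cumulative hazard for the same patient computed by survfit()
fitpred2 <- survfit(model2, newdata = data.frame(age = 45, t5 = 1.3))
print(head(cbind(time = baseH$time, by_hand = predH, survfit = as.vector(fitpred2$cumhaz))))
```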
Figure 1
Figure 2
3)
Cox Proportional Hazard Regression without t5
Model3 refits the Cox PH regression without the covariate t5.
> model3 = coxph(Surv(time,status)~ age + I(age^2), data=stan)
> summary(model3)
Call:
coxph(formula = Surv(time, status) ~ age + I(age^2), data = stan)

  n= 178

             coef exp(coef) se(coef)     z      p
age      -0.12488     0.883 0.054043 -2.31 0.0210
I(age^2)  0.00205     1.002 0.000705  2.91 0.0037

         exp(coef) exp(-coef) lower .95 upper .95
age          0.883      1.133     0.794     0.981
I(age^2)     1.002      0.998     1.001     1.003

Rsquare= 0.088   (max possible= 0.996 )
Likelihood ratio test= 16.5  on 2 df,   p=0.000267
Wald test            = 18.6  on 2 df,   p=9e-05
Score (logrank) test = 18.6  on 2 df,   p=9.22e-05
Prediction
Figure 3 shows the predicted median survival time over age, and Figure 4 shows the observed (or censored)
survival time over age (all following plots use the same y-axis scale to make comparisons easier). As
expected, the observed data scatter much more widely than the predicted curve. The plot of predicted median
survival time shows that the median survival time increases up to about age 30 (from 431 days at age 12 to
1634 days at ages 30-32) and steadily decreases after that.
# predict each patient's survival time
fitpred3=survfit(model3,newdata=stan)
# median = first time point at which the predicted survival drops below 0.5
find.median = function(x) return(sum(x>=.5))
median_loc = apply(fitpred3$surv,2,find.median)
median_pred = fitpred3$time[median_loc+1]
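The hand-rolled find.median() takes the first time point at which a predicted survival curve drops below 0.5. A sketch assuming the quantile() method that the survival package provides for survfit objects: it is expected to return the same medians (NA where a curve never falls below 0.5), which the comparison below checks. The name median_q is illustrative.

```r
library(survival)

stan     <- stanford2[stanford2$time >= 5, ]
model3   <- coxph(Surv(time, status) ~ age + I(age^2), data = stan)
fitpred3 <- survfit(model3, newdata = stan)

# Hand-rolled median: index of the last time with S(t) >= 0.5, plus one
find.median <- function(x) sum(x >= 0.5)
median_loc  <- apply(fitpred3$surv, 2, find.median)
median_pred <- fitpred3$time[median_loc + 1]

# Built-in alternative: one median per predicted curve (one curve per patient)
median_q <- as.vector(quantile(fitpred3, probs = 0.5, conf.int = FALSE))
print(head(cbind(hand_rolled = median_pred, quantile = median_q)))
```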
Figure 3
Figure 4
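The turning point near age 30 follows directly from the coefficients: a linear predictor b1*age + b2*age^2 has its extremum at age = -b1/(2*b2). With the reported estimates this gives 0.12488/(2*0.00205), about 30.5, for the Cox model and 0.20988/(2*0.00346), about 30.3, for the Weibull model 1.2. A short sketch computing the same from the fitted objects (turning_point is a hypothetical helper):

```r
library(survival)

stan     <- stanford2[stanford2$time >= 5, ]
model3   <- coxph(Surv(time, status) ~ age + I(age^2), data = stan)
model1.2 <- survreg(Surv(time, status) ~ age + I(age^2), data = stan)

# b1*age + b2*age^2 has its extremum at age = -b1 / (2 * b2)
turning_point <- function(b) unname(-b["age"] / (2 * b["I(age^2)"]))

tp_cox     <- turning_point(coef(model3))    # age with the lowest hazard (b2 > 0)
tp_weibull <- turning_point(coef(model1.2))  # age with the longest predicted survival (b2 < 0)
print(c(cox = tp_cox, weibull = tp_weibull))
```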
Comparison to Weibull model without t5
Figure 5 plots the predicted median survival time from the Weibull regression model without t5 (model
1.2) as a solid line and overlays the individual predicted medians from the Cox PH model as points.
The two curves have very similar shapes: predicted survival time increases until about age 30 and then
steadily decreases. There are slight deviations around ages 20 and 40, where the median survival time
predicted by the Cox PH model is slightly higher than that predicted by the Weibull regression. Overall the
two models agree, both in finding age and age^2 significant and t5 nonsignificant, and in their predictions
of median survival time.
Figure 5
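For the Weibull model, the median plotted in Figure 5 has a closed form. Under the survreg parametrization S(t|x) = exp(-(t/exp(lp))^(1/sigma)), setting S = 0.5 gives median = exp(lp) * (log 2)^sigma. A sketch checking this against predict(type="quantile"), assuming the standard survreg Weibull parametrization (median_closed is an illustrative name):

```r
library(survival)

stan     <- stanford2[stanford2$time >= 5, ]
model1.2 <- survreg(Surv(time, status) ~ age + I(age^2), data = stan)

# S(t | x) = exp(-(t / exp(lp))^(1 / scale)) equals 0.5 at t = exp(lp) * log(2)^scale
lp            <- predict(model1.2, newdata = stan, type = "lp")
median_closed <- exp(lp) * log(2)^model1.2$scale

# The same medians from predict(), as used for Figure 5
median_weibull <- predict(model1.2, newdata = stan, type = "quantile", p = 0.5)
print(head(cbind(closed_form = median_closed, predict = median_weibull)))
```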
4)
SAS Analysis (103 patients)
The first model contains the variables ‘plant’ and ‘surg’, along with the covariates age, age^2, and their
interactions with ‘plant’. Neither interaction is significant, so I removed them one at a time. Even after
removing the interactions, age^2 was not significant, so I removed it as well and ended up with the same
model as on p. 139 of the book.
The PHREG Procedure

Model Information
Data Set                 WORK.STAN
Dependent Variable       surv1
Censoring Variable       dead
Censoring Value(s)       0
Ties Handling            EFRON

Number of Observations Read    103
Number of Observations Used    103

Summary of the Number of Event and Censored Values
                               Percent
Total    Event    Censored    Censored
  103       75          28       27.18

Analysis of Maximum Likelihood Estimates
                         Parameter   Standard                          Hazard
Parameter           DF    Estimate      Error  Chi-Square  Pr > ChiSq   Ratio  Label
plant                1    -1.15579    3.36472      0.1180      0.7312       .
surg                 1    -0.70131    0.36712      3.6492      0.0561   0.496
ageaccpt             1    -0.04130    0.07779      0.2819      0.5954       .
ageaccpt_sq          1   0.0007543    0.00103      0.5408      0.4621       .
plant*ageaccpt       1     0.02603    0.15631      0.0277      0.8677       .  plant * ageaccpt
plant*ageaccpt_sq    1  -0.0000396    0.00181      0.0005      0.9826       .  plant * ageaccpt_sq
Model after the interactions have been removed:

Analysis of Maximum Likelihood Estimates
                   Parameter   Standard                          Hazard
Parameter     DF    Estimate      Error  Chi-Square  Pr > ChiSq   Ratio
plant          1    -0.04785    0.30526      0.0246      0.8754   0.953
surg           1    -0.70043    0.36492      3.6842      0.0549   0.496
ageaccpt       1    -0.05362    0.06364      0.7098      0.3995   0.948
ageaccpt_sq    1     0.00103  0.0007718      1.7815      0.1820   1.001
Final model with only one significant covariate (apart from the time-dependent covariate ‘plant’ and the
variable of interest ‘surg’):

Analysis of Maximum Likelihood Estimates
                   Parameter   Standard                          Hazard
Parameter     DF    Estimate      Error  Chi-Square  Pr > ChiSq   Ratio
plant          1    -0.04614    0.30276      0.0232      0.8789   0.955
surg           1    -0.77143    0.35961      4.6019      0.0319   0.462
ageaccpt       1     0.03109    0.01391      4.9948      0.0254   1.032
This dataset is not the same as in the previous analysis (parts 1-3), and the variables considered are
different, so the results cannot be expected to be identical. Still, one might expect the quadratic effect of
age to be significant. In this dataset, however, there are only 5 patients younger than 25 (compared to 19
in the previous dataset). Five observations are plausibly too few to detect a decrease in expected survival
time in this age range relative to older ages, which might explain why age^2 is not significant in this
model.
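The SAS programming statement sets ‘plant’ to 1 only once the transplant has happened, which makes it a time-dependent covariate. In R the same kind of model can be fit in counting-process (start, stop] form. A sketch, assuming the survival package's heart data set (documented as the Stanford heart transplant data in counting-process form, with age recorded as age minus 48 years) covers the same 103 patients; the coefficients need not match the SAS output digit for digit, since the two data preparations may handle same-day transplants differently. The name fit_td is illustrative.

```r
library(survival)

# heart: one row per patient and transplant status; 'transplant' switches
# from 0 to 1 at the transplant date, playing the role of 'plant' above
fit_td <- coxph(Surv(start, stop, event) ~ transplant + surgery + age,
                data = heart, ties = "efron")
print(summary(fit_td)$coefficients)
```

Shifting age by a constant does not change its coefficient, so the age effect is directly comparable to that of ‘ageaccpt’.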
5)
R code
library(survival)
data(stanford2)
# delete observations where survival time <5
stan = stanford2[(stanford2$time>=5),]
## 1 ##
## Weibull regression model (t5 ... mismatch score)
model1 = survreg(Surv(time,status)~age + I(age^2) + t5, data=stan)
summary(model1)
# leave out t5
model1.2 = survreg(Surv(time,status)~age + I(age^2) , data=stan)
summary(model1.2)
## 2 ##
## Cox PH regression
model2 = coxph(Surv(time,status)~ age + I(age^2) + t5 , data=stan)
summary(model2)
plot(survfit(model2))
# predicted survival curve for patient with age=45, t5 = 1.3
newdata = data.frame(age=c(45),t5=c(1.3))
fitpred2 = survfit(model2,newdata)
par(new=T)
plot(fitpred2, main="Predicted Survival for Patient Age = 45, t5 = 1.3", col=2)
legend(2000,1, legend=c("baseline","predicted"),col=c(1,2), lty=1)
# predicted cumulative hazard
baseH = basehaz(model2, centered=T)
# basehaz() is centered at model2$means (age, age^2, t5 over the 153 patients in the fit),
# so the linear predictor is taken relative to those means
predH = baseH$hazard * exp(sum(model2$coef*(c(45,45^2,1.3)-model2$means)))
plot(baseH$time, baseH$hazard, type="l",
main="Estimated Baseline Cumulative Hazard \n Predicted Cumulative Hazard for Patient Age = 45, t5 = 1.3",
xlab = "Time", ylab="Cumulative Hazard")
lines(baseH$time,predH, type="l", lty=2)
legend(0,1.8, legend=c("baseline","predicted"),lty=c(1,2))
## 3 ##
## 2nd Cox PH model
model3 = coxph(Surv(time,status)~ age + I(age^2), data=stan)
summary(model3)
fit3 = survfit(model3)
plot(fit3)
# predict each patients' survival time
fitpred3=survfit(model3,newdata=stan)
find.median = function(x) return(sum(x>=.5))
median_loc = apply(fitpred3$surv,2,find.median)
median_pred = fitpred3$time[median_loc+1]
cbind(stan$id,median_pred)
# predicted median survival over age
plot(stan$age, median_pred, xlab="Age", ylab="Predicted median survival",
main="Predicted median survival time over age", ylim=c(0,3700))
# observed survival over age
plot(stan$age[stan$status==1], stan$time[stan$status==1], pch=20,
ylim=c(0,3700), main="Observed survival over Age",
xlab="Age", ylab="observed time")
points(stan$age[stan$status==0], stan$time[stan$status==0])
legend(11,3650,legend=c("observed death", "censored"), pch=c(20,1))
# predicted vs. observed
plot(median_pred[stan$status==1], stan$time[stan$status==1], pch=20,
ylim=c(0,3700), main="Observed survival over predicted median",
xlab="Predicted Median survival", ylab="observed time")
points(median_pred[stan$status==0], stan$time[stan$status==0])
legend(100,3650,legend=c("observed death", "censored"), pch=c(20,1))
abline(0,1) # reference line: observed = predicted
# comparison to model 1.2
median_weibull = predict(model1.2, newdata=stan, type="quantile", p=.5)
plot(stan$age, median_weibull, xlab="Age", ylab="Predicted median survival",
main="Predicted median survival time over age \n Weibull model and Cox PH (no t5)",
type="l",
ylim=c(0,3700))
points(stan$age, median_pred)
legend(50,3000,legend=c("Weibull","Cox PH"), pch=c(NA,1), lty=c(1,NA))
plot(median_weibull, median_pred, xlab="Predicted Median Weibull",
ylab="Predicted Median Cox PH", main="Cox PH vs. Weibull - predicted medians")
abline(0,1) # reference line: Cox PH median = Weibull median
# comparison to model 1
# predict only for patients with an observed t5, so the lengths match the plotted ages
median_weibull = predict(model1, newdata=stan[!is.na(stan$t5),], type="quantile", p=.5)
plot(stan$age[!is.na(stan$t5)], median_weibull, xlab="Age", ylab="Predicted median survival",
main="Predicted median survival time over age \n Weibull model (with t5) and Cox PH (no t5)",
ylim=c(0,3700))
points(stan$age[!is.na(stan$t5)], median_pred[!is.na(stan$t5)], pch=20)
legend(50,3000,legend=c("Weibull","Cox PH"), pch=c(1,20))
plot(median_weibull, median_pred[!is.na(stan$t5)], xlab="Predicted Median Weibull",
ylab="Predicted Median Cox PH", main="Cox PH vs. Weibull - predicted medians")
abline(0,1)
6)
SAS Code
data stan; set stan;
ageaccpt_sq= ageaccpt*ageaccpt;
run;
proc phreg data=stan;
model surv1*dead(0) = plant surg ageaccpt ageaccpt_sq plant*ageaccpt
plant*ageaccpt_sq/ ties=efron;
if wait>surv1 or wait=. then plant=0; else plant=1;
run;
proc phreg data=stan;
model surv1*dead(0) = plant surg ageaccpt plant*ageaccpt / ties=efron;
if wait>surv1 or wait=. then plant=0; else plant=1;
run;
proc phreg data=stan;
model surv1*dead(0) = plant surg ageaccpt ageaccpt_sq/ ties=efron;
if wait>surv1 or wait=. then plant=0; else plant=1;
run;
proc phreg data=stan;
model surv1*dead(0) = plant surg ageaccpt / ties=efron;
if wait>surv1 or wait=. then plant=0; else plant=1;
run;
proc gplot data=stan;
plot surv1*ageaccpt=dead;
run;