Survival Analysis in SAS - Department of Statistics

STA/BST 222
Fall 2011
Nov 22, 2011



K-M estimate
COX MODEL
AFT MODEL

PROC LIFETEST
1. Compute and graph the estimated survival function, using
----the Kaplan-Meier method, or
----the life-table method.
2. Test the null hypothesis that the survival
functions are identical for two or more groups.
3. Test the association between survival times and
sets of quantitative covariates.
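For intuition about item 2, here is a from-scratch Python sketch (not SAS) of the two-group log-rank test. PROC LIFETEST reports this test, along with others, when the groups are given in a STRATA statement. The function name logrank_test and the 0/1 coding of events and groups are illustrative assumptions. For real analyses, use the SAS output.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : observed (event or censoring) times
    events : 1 = event observed, 0 = censored
    groups : group label, 0 or 1
    Returns (chi-square statistic on 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e = 0.0  # sum over event times of observed - expected events in group 1
    variance = 0.0   # hypergeometric variance of that sum
    for t in sorted({ti for ti, di, _ in data if di == 1}):
        at_risk = [gi for ti, _, gi in data if ti >= t]            # groups still at risk
        dying = [gi for ti, di, gi in data if ti == t and di == 1]  # groups with an event at t
        n, n1 = len(at_risk), sum(at_risk)
        d, d1 = len(dying), sum(dying)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square(1): P(|Z| > sqrt(chi2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

If the two groups have identical data, the statistic is 0 and the p-value is 1.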


KM estimator using PROC LIFETEST
Syntax:
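proc lifetest data=dataname;
   /* censoringindicator(0): the value 0 marks a censored time */
   time timevar*censoringindicator(0);
run;
/* plots=(s) plots the estimated survival function; graphics requests SAS/GRAPH output */
proc lifetest data=dataname plots=(s) graphics;
   time timevar*censoringindicator(0);
   symbol1 v=none;   /* draw the curve without plotting symbols */
run;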
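The product-limit calculation behind the K-M estimate can be sketched in Python (not SAS) as follows. The function kaplan_meier is hypothetical. Events are coded 1 = event and 0 = censored, matching censoringindicator(0). A subject censored at an event time is kept in that time's risk set, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    observation was censored.
    Returns (event_time, S(event_time)) pairs; S is a step function that
    stays constant between consecutive event times.
    """
    surv = 1.0
    steps = []
    for t in sorted({ti for ti, di in zip(times, events) if di == 1}):
        n_at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        n_events = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        surv *= 1.0 - n_events / n_at_risk            # conditional survival past t
        steps.append((t, surv))
    return steps
```

For example, `kaplan_meier([3, 5, 5, 8, 10], [1, 1, 0, 1, 0])` gives steps at times 3, 5 and 8 with S approximately 0.8, 0.6 and 0.3.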

life table estimator using PROC LIFETEST
proc lifetest data=dataname method=life;
   time timevar*censoringindicator(0);
run;
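The life-table (actuarial) method groups times into intervals. Below is a minimal Python sketch. It assumes equal-width intervals starting at 0 and the usual actuarial adjustment, in which censored subjects count as at risk for half of their interval. life_table and width are hypothetical names. The function returns the estimated survival at the end of each interval, which is also the survival at the start of the next interval.

```python
def life_table(times, events, width):
    """Actuarial (life-table) survival estimates on intervals [k*width, (k+1)*width).

    events[i] is 1 for an observed event and 0 for a censored time.
    Censored subjects are counted as at risk for half of their interval,
    so the effective number at risk is n_entering - n_censored / 2.
    Returns (interval_start, interval_end, S(interval_end)) for each interval.
    """
    n_intervals = int(max(times) // width) + 1
    n_entering = len(times)
    surv = 1.0
    rows = []
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        in_interval = [di for ti, di in zip(times, events) if start <= ti < end]
        n_events = sum(in_interval)
        n_censored = len(in_interval) - n_events
        effective = n_entering - n_censored / 2.0
        if effective > 0:
            # Multiply by the conditional probability of surviving this interval.
            surv *= 1.0 - n_events / effective
        rows.append((start, end, surv))
        n_entering -= len(in_interval)
    return rows
```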




PROC PHREG
Advantages:
1. Can incorporate time-dependent covariates.
2. Can deal with tied data.
Syntax:
proc phreg data=dataname;
   model timevar*censoringindicator(0)=predictors;
run;
Note: There is no intercept estimate; any intercept is absorbed into the
unspecified baseline hazard.
When ties are present:
proc phreg data=dataname;
   /* ties= breslow (the default), efron, exact, or discrete */
   model timevar*censoringindicator(0)=predictors
         /ties=efron;
run;
efron is fast and more accurate than breslow; exact and discrete can be
very slow when there are many ties.
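To see what the ties= choice changes, here is a Python sketch of the Cox partial log-likelihood for a single covariate under the Breslow and Efron approximations. It is an illustration only, and cox_loglik is a hypothetical name. PROC PHREG maximizes the corresponding likelihood over $\beta$ numerically. When there are no tied event times, the two options give the same value.

```python
import math


def cox_loglik(beta, times, events, x, ties="efron"):
    """Cox partial log-likelihood for a single covariate x.

    events[i] is 1 for an observed event and 0 for a censored time.
    ties selects how tied event times are handled: "breslow" or "efron".
    """
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")
    loglik = 0.0
    for t in sorted({ti for ti, di in zip(times, events) if di == 1}):
        # Sum of exp(beta * x) over everyone still at risk at t.
        risk_sum = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= t)
        # Covariate values of the subjects whose events occur at t.
        dead = [xi for ti, di, xi in zip(times, events, x) if ti == t and di == 1]
        d = len(dead)
        dead_sum = sum(math.exp(beta * xi) for xi in dead)
        loglik += beta * sum(dead)
        if ties == "breslow":
            # Every tied event is compared with the full risk set.
            loglik -= d * math.log(risk_sum)
        else:
            # Efron: tied subjects are removed from the risk set in equal fractions.
            for j in range(d):
                loglik -= math.log(risk_sum - (j / d) * dead_sum)
    return loglik
```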



When there are no time-dependent covariates, the
Cox model can be written as $S(t) = [S_0(t)]^{\exp(\beta' x)}$.
Estimate the baseline survival function (BASELINE statement).
Syntax:
proc phreg data=dataname;
   model timevar*censorid(0)=predictors
         /ties=efron;
   /* write estimates to data set a:
      s = S(t), ls = log S(t), lls = log(-log S(t)) */
   baseline out=a survival=s logsurv=ls
            loglogs=lls;
run;
proc print data=a;
run;
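Below is a Python sketch of the Breslow estimator of the survival curve from a fitted one-covariate Cox model, $\hat S(t \mid x_{ref}) = \exp\{-\sum_{t_i \le t} d_i / \sum_{j \in R(t_i)} \exp[\beta (x_j - x_{ref})]\}$. By default it evaluates the curve at the sample mean of x, which is where the BASELINE statement evaluates it when no COVARIATES= data set is given. PROC PHREG offers more than one estimator of the baseline survival, so its printed values may differ slightly. The function name is hypothetical.

```python
import math


def breslow_survival(beta, times, events, x, x_ref=None):
    """Breslow estimate of S(t | x_ref) after fitting a one-covariate Cox model.

    beta   : fitted Cox coefficient for x
    events : 1 = event observed, 0 = censored
    x_ref  : covariate value for the curve; defaults to the sample mean of x
    Returns (event_time, S(event_time | x_ref)) pairs.
    """
    if x_ref is None:
        x_ref = sum(x) / len(x)
    cum_hazard = 0.0
    curve = []
    for t in sorted({ti for ti, di in zip(times, events) if di == 1}):
        d = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        # Risk-set total of hazard ratios relative to x_ref.
        risk_sum = sum(math.exp(beta * (xj - x_ref)) for tj, xj in zip(times, x) if tj >= t)
        cum_hazard += d / risk_sum
        curve.append((t, math.exp(-cum_hazard)))
    return curve
```

With beta = 0 this reduces to exp(-Nelson-Aalen estimate).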


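Prediction for a particular set of covariate values.
Syntax:
data covals;
   input predictors;   /* same covariate names as in the MODEL statement */
   cards;
**** data ***
;
run;
proc phreg data=dataname;
   model timevar*censorid(0)=predictors;
   /* one curve per row of covals, with confidence limits lcl and ucl;
      nomean drops the extra curve at the covariate means */
   baseline out=a covariates=covals survival=s lower=lcl
            upper=ucl/nomean;
run;
proc print data=a;
run;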
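Prediction rests on the relation $S(t \mid x) = [S_0(t)]^{\exp(\beta' x)}$. Given a baseline curve evaluated at x = 0 and the coefficient estimates, the predicted curve for a new covariate vector is a pointwise power. The Python sketch below shows this. predict_survival is a hypothetical name, and it returns point estimates only, with no confidence limits.

```python
import math


def predict_survival(baseline_curve, betas, x_new):
    """Predicted survival S(t | x_new) = S0(t) ** exp(beta' x_new).

    baseline_curve : (time, S0(time)) pairs, with S0 the survival curve at x = 0
    betas, x_new   : coefficients and covariate values, in the same order
    """
    risk_score = math.exp(sum(b * xv for b, xv in zip(betas, x_new)))
    return [(t, s0 ** risk_score) for t, s0 in baseline_curve]
```

If the available curve is taken at the covariate means $\bar x$ instead of at 0, the same idea applies with $\exp[\beta'(x - \bar x)]$ as the exponent.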
AFT model:
$\log T_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \sigma \varepsilon_i$
1. It’s simpler to fix the variance of $\varepsilon_i$ at some standard
value (e.g., 1.0) and let $\sigma$ change in value to
accommodate changes in the variance of the distribution.
2. As for the log transformation of T, its main purpose
is to ensure that the predicted values of T are positive,
regardless of the values of the x’s and the $\beta$’s.
3. The output line labeled SCALE is an estimate of the $\sigma$
parameter, along with its estimated standard error.
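For dist=weibull, the error $\varepsilon$ has the standard (minimum) extreme-value distribution, $P(\varepsilon > w) = \exp(-e^{w})$. The model $\log T = \beta' x + \sigma \varepsilon$ therefore gives $S(t \mid x) = \exp\{-\exp[(\log t - \beta' x)/\sigma]\}$. The Python sketch below evaluates this survival function. It also converts the AFT slopes to proportional-hazards coefficients $-\beta_j/\sigma$, which makes Weibull LIFEREG estimates comparable with PHREG estimates. The function names are hypothetical; betas[0] is the intercept and scale is the SCALE estimate.

```python
import math


def weibull_aft_survival(t, betas, x, scale):
    """S(t | x) for the Weibull AFT model log T = beta' x + scale * eps.

    eps has the standard (minimum) extreme-value distribution,
    P(eps > w) = exp(-exp(w)).  betas[0] is the intercept and x holds the
    remaining covariates; scale is the SCALE (sigma) estimate.
    """
    mu = betas[0] + sum(b * xv for b, xv in zip(betas[1:], x))
    return math.exp(-math.exp((math.log(t) - mu) / scale))


def weibull_ph_coefficients(betas, scale):
    """Proportional-hazards (log hazard ratio) coefficients: -beta_j / sigma."""
    return [-b / scale for b in betas[1:]]
```

With scale = 1 the model reduces to the exponential.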





If we let the error term be iid but arbitrary and unspecified, we get
the (nonparametric) AFT model.
The Buckley-James estimator of the parameters can be thought of as
a nonparametric version of the EM algorithm: each censored
residual is replaced by its conditional expected value (E-step),
followed by the usual least-squares regression (M-step).
The nonparametric nature of this procedure appears in both the
E-step (where there is no parametric distribution for the residuals;
the expectation is taken under the Kaplan-Meier estimate of the
residual distribution) and the M-step (where no parametric
likelihood is maximized; least squares etc. is used instead).
The least-squares Buckley-James estimator can be computed with the
R function bj() in the Design library (now rms). The
trustworthiness of the variance estimates from bj() is in doubt;
Mai Zhou (Department of Statistics, University of Kentucky)
recommends using empirical likelihood instead. See bjtest() etc. in the
emplik package.
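The E-step/M-step description can be sketched in Python for a single covariate. This is a teaching sketch under stated assumptions, not the code inside bj(). buckley_james and _ols are hypothetical names, and the starting value is least squares that ignores censoring. The largest residual is treated as uncensored so that the Kaplan-Meier residual distribution has total mass 1, which is a common convention. No variance estimate is attempted.

```python
def _ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def buckley_james(x, y, events, n_iter=50, tol=1e-8):
    """Buckley-James fit of y = a + b * x + error with right-censored y.

    y      : log survival times (observed or censored)
    events : 1 = uncensored, 0 = right-censored
    Returns (a, b).  The iteration can oscillate on some data sets, so it is
    capped at n_iter steps.
    """
    n = len(y)
    a, b = _ols(x, y)  # start from least squares that ignores censoring
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Sort residuals; at ties, events come before censored values.
        order = sorted(range(n), key=lambda i: (resid[i], -events[i]))
        # Kaplan-Meier masses of the residual distribution.
        # The largest residual is treated as an event so the masses sum to 1.
        jump = [0.0] * n
        surv = 1.0
        for pos, i in enumerate(order):
            if events[i] == 1 or pos == n - 1:
                jump[pos] = surv / (n - pos)  # n - pos residuals still "at risk"
                surv -= jump[pos]
        # Tail sums give E[e | e > e_i] for a censored residual e_i.
        tail_mass = [0.0] * (n + 1)
        tail_sum = [0.0] * (n + 1)
        for pos in range(n - 1, -1, -1):
            tail_mass[pos] = tail_mass[pos + 1] + jump[pos]
            tail_sum[pos] = tail_sum[pos + 1] + jump[pos] * resid[order[pos]]
        # E-step: replace each censored y by its fitted value plus the
        # conditional mean of the residuals beyond it.
        y_star = list(y)
        for pos, i in enumerate(order):
            if events[i] == 0 and tail_mass[pos + 1] > 0:
                y_star[i] = a + b * x[i] + tail_sum[pos + 1] / tail_mass[pos + 1]
        # M-step: ordinary least squares on the completed data.
        new_a, new_b = _ols(x, y_star)
        done = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if done:
            break
    return a, b
```

Without censoring, the first E-step changes nothing and the result is ordinary least squares.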


PROC LIFEREG
Advantages:
1. Accommodates left censoring and interval
censoring. PROC PHREG only allows right
censoring.
2. Can test certain hypotheses about the shape of the
hazard function. PROC PHREG only gives
nonparametric estimates of the survival function.



If the shape of the survival distribution is known,
PROC LIFEREG produces more efficient estimates
(with smaller standard errors) than PROC PHREG.
PROC LIFEREG’s greatest limitation is that it does
not handle time-dependent covariates.
Syntax:
proc lifereg data=dataname;
   /* distri_T: e.g. weibull (the default), exponential, gamma, llogistic, lnormal */
   model timevar*censorid(0)=predictors/dist=distri_T;
run;


PROC LIFEREG allows five distributions for the
error term, and for each distribution there is a
corresponding distribution for T:

Distribution of error term    Distribution of T
Extreme value (2 par.)        Weibull
Extreme value (1 par.)        Exponential
Log-gamma                     Gamma
Logistic                      Log-logistic
Normal                        Log-normal

Note: all AFT models are named for the distribution of T
rather than the distribution of the error term or of log T.
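The first two rows of the table follow from a short calculation. Assume, as for dist=weibull, that $\varepsilon$ has the standard minimum extreme-value distribution with $P(\varepsilon > w) = \exp(-e^{w})$. Write $\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, so that $\log T = \mu + \sigma\varepsilon$ with $\sigma > 0$:

```latex
\begin{aligned}
S_T(t) &= P(T > t) = P\!\left(\varepsilon > \frac{\log t - \mu}{\sigma}\right)
        = \exp\!\left[-\exp\!\left(\frac{\log t - \mu}{\sigma}\right)\right] \\
       &= \exp\!\left[-\left(t\,e^{-\mu}\right)^{1/\sigma}\right]
\end{aligned}
```

This is a Weibull survival function with shape $1/\sigma$ and scale $e^{\mu}$. Fixing $\sigma = 1$ (the one-parameter extreme-value case) gives $S_T(t) = \exp(-t\,e^{-\mu})$, an exponential distribution with rate $e^{-\mu}$.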