lab3_EG

HRP 262, SAS LAB THREE, April 25, 2012
Topics: Cox Regression
Lab Objectives
After today’s lab you should be able to:
1. Fit models using PROC PHREG. Understand PROC PHREG output.
2. Understand how to implement and interpret different methods for dealing with ties
(exact, efron, breslow, discrete).
3. Understand output from the “baseline” statement.
4. Output estimated survivor functions and plot cumulative hazards.
5. Output and plot predicted survivor functions at user-specified levels of the covariates.
6. Understand the role of the strata statement in PROC PHREG.
7. Evaluate the PH assumption graphically and by including interactions with time in the
model.
8. Use the “where” subsetting statement in any PROC.
9. Add time-dependent variables to the model. Understand SAS syntax for time-dependent
variables. Be able to correctly specify the time-dependent variables you intend!
SAS PROCs and their SAS EG equivalents:
PROC LIFETEST: Analyze > Survival Analysis > Life Tables
PROC LIFEREG: none
PROC PHREG: Analyze > Survival Analysis > Proportional Hazards regression
LAB EXERCISE STEPS:
Follow along with the computer in front…
1. Go to the class website: www.stanford.edu/~kcobb/courses/hrp262
Lab 2 data > Save > Save on your desktop as hmohiv (SAS format)
Lab 3 data > Save > Save on your desktop as uis (SAS format)
2. Open SAS EG: From the desktop, double-click “Applications” > double-click the SAS
Enterprise Guide 4.2 icon.
3. Click on “New Project”
4. You DO NOT need to import the dataset into SAS, since the dataset is already in SAS
format (.sas7bdat). You DO need to name a library that points to the desktop, where the
dataset is located. Assign the library name hrp262 to your desktop folder:
ToolsAssign Project Library
Name the library HRP262 and then click Next.
Browse to find your Desktop. Then Click Next.
Click Next through the next screen.
Click Finish.
5. Confirm that you have created an hrp262 library that contains the SAS datasets uis and
hmohiv.
6. We will first use the HMO-HIV dataset to examine ties in a Proportional Hazards model.
Recall from last time that the HMOHIV dataset has many ties for time.
First we will use the default method (BRESLOW) for dealing with ties.
Select AnalyzeSurvival AnalysisProportional Hazards regression
Drag time as your Survival time variable and Censor as your censoring variable. Make sure
to set the censoring value to 0.
Add age and drug as your explanatory variables:
Under Methods make sure that you have checked Compute confidence limits for hazard ratio.
You will also see the various methods for dealing with failure time ties (Breslow’s approximate
likelihood should be chosen by default).
Then Click Run.
Review of output:
Callouts on the output: the method used for dealing with ties (Breslow is the default); the -2 Log Likelihood, used for comparing with other models (likelihood ratio test); the test of the global null hypothesis that all coefficients are equal to 0; and the Wald tests of whether individual betas are equal to 0.
Interpretation: a 9.6% increase in the hazard for every 1-year increase in age, controlling for IV drug use; a 156.3% increase in the hazard for IV drug users, controlling for age.
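The percentages above come from exponentiating the Breslow coefficients reported in step 8 ($\hat\beta_{age} = 0.09151$, $\hat\beta_{drug} = 0.94108$). A worked version of that arithmetic follows; note that the effect of a k-year age difference is multiplicative, $e^{k\hat\beta}$, not k times 9.6%:

```latex
\begin{aligned}
\widehat{HR}_{\text{age}} &= e^{0.09151} \approx 1.096
  &&\Rightarrow\ (1.096 - 1) \times 100\% \approx 9.6\% \text{ higher hazard per year of age} \\
\widehat{HR}_{\text{drug}} &= e^{0.94108} \approx 2.563
  &&\Rightarrow\ (2.563 - 1) \times 100\% \approx 156.3\% \text{ higher hazard for IV drug users} \\
\widehat{HR}_{\text{age, 10 years}} &= e^{10 \times 0.09151} = e^{0.9151} \approx 2.50
  &&\text{(not } 1 + 10 \times 0.096 = 1.96\text{)}
\end{aligned}
```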
7. FYI, the corresponding SAS code for proportional hazards regression would be:
Syntax for PHREG
/**ties= not specified, so the default (Breslow) is used**/
proc phreg data=hrp262.hmohiv;
model time*censor(0)=age drug / risklimits;
title 'Cox model for hmohiv data-- ties';
run;
The risklimits option requests confidence limits for the estimated hazard ratios.
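As a check on what risklimits prints, the limits can be reproduced by hand as Wald-type limits, $\exp(\hat\beta \pm 1.96\,\widehat{SE})$, and the Wald chi-square as $(\hat\beta/\widehat{SE})^2$. The sketch below uses the Breslow estimates from step 8; these are hand calculations from the rounded printed values, so the last digits of the chi-squares differ slightly from SAS (24.5009 and 13.5662):

```latex
\begin{aligned}
\text{Age: } & \exp(0.09151 \pm 1.96 \times 0.01849) = (e^{0.05527},\ e^{0.12775}) \approx (1.057,\ 1.136),
  && (0.09151 / 0.01849)^2 \approx 24.49 \\
\text{Drug: } & \exp(0.94108 \pm 1.96 \times 0.25550) = (e^{0.44030},\ e^{1.44186}) \approx (1.553,\ 4.229),
  && (0.94108 / 0.25550)^2 \approx 13.57
\end{aligned}
```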
8. Use the Modify Code button to rerun the Cox model with the other possible
specifications for ties (Efron, Exact, Discrete). Then compare the results.
Analysis of Maximum Likelihood Estimates by method for ties (BRESLOW is from above; DF = 1 for every parameter)

Ties       Parameter  Estimate  Std Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% HR Confidence Limits
BRESLOW    Age        0.09151   0.01849    24.5009     <.0001      1.096         1.057   1.136
BRESLOW    Drug       0.94108   0.25550    13.5662     0.0002      2.563         1.553   4.229
EFRON      Age        0.09714   0.01864    27.1597     <.0001      1.102         1.062   1.143
EFRON      Drug       1.01670   0.25622    15.7459     <.0001      2.764         1.673   4.567
EXACT      Age        0.09768   0.01874    27.1731     <.0001      1.103         1.063   1.144
EXACT      Drug       1.02263   0.25716    15.8132     <.0001      2.781         1.680   4.603
DISCRETE   Age        0.10315   0.02006    26.4449     <.0001      1.109         1.066   1.153
DISCRETE   Drug       1.07004   0.27438    15.2084     <.0001      2.916         1.703   4.992
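Note: there are only small differences between Breslow, Efron, and exact; relative to exact, the
Breslow and Efron approximations are attenuated slightly toward the null (Breslow more so).
Discrete estimates odds ratios (a discrete-time logistic model) rather than hazard ratios; hence the slightly larger estimates.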
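To see why the methods differ, consider one event time $t$ with $d$ tied failures (the set $D$) and risk set $R(t)$, and write $\theta_j = \exp(x_j^\top\beta)$ (notation ours). Breslow uses the full risk-set sum for every one of the $d$ tied failures, which makes the denominator too large and pulls estimates toward 0; Efron removes an increasing fraction of the tied subjects' contribution for each successive failure; the discrete method treats the ties as truly simultaneous events in discrete time (a conditional logistic model), so $\theta_j$ is an odds ratio rather than a hazard ratio. TIES=EXACT instead averages the continuous-time likelihood over all $d!$ possible orderings of the tied failures, which is the most computationally expensive option. The standard likelihood contributions:

```latex
\begin{aligned}
L_t^{\text{Breslow}} &= \frac{\prod_{i \in D} \theta_i}{\left( \sum_{j \in R(t)} \theta_j \right)^{d}} \\[4pt]
L_t^{\text{Efron}} &= \frac{\prod_{i \in D} \theta_i}{\prod_{k=0}^{d-1} \left( \sum_{j \in R(t)} \theta_j - \frac{k}{d} \sum_{i \in D} \theta_i \right)} \\[4pt]
L_t^{\text{discrete}} &= \frac{\prod_{i \in D} \theta_i}{\sum_{Q \subseteq R(t),\ |Q| = d} \ \prod_{q \in Q} \theta_q}
\end{aligned}
```

With no ties ($d = 1$) all three reduce to the usual Cox term $\theta_i / \sum_{j \in R(t)} \theta_j$.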
9. Corresponding SAS code for different specifications in handling ties.
/**Efron**/
proc phreg data=hrp262.hmohiv;
model time*censor(0)=age drug / ties=efron risklimits;
title 'Cox model for hmohiv data -- ties=efron';
run;
/**Exact**/
proc phreg data=hrp262.hmohiv;
model time*censor(0)=age drug / ties=exact risklimits;
title 'Cox model for hmohiv data -- ties=exact';
run;
/**Discrete**/
proc phreg data=hrp262.hmohiv;
model time*censor(0)=age drug / ties=discrete risklimits;
title 'Cox model for hmohiv data -- ties=discrete';
run;
10. Estimate the survival function and plot the cumulative hazard [= -log S(t)]. SAS estimates
the baseline hazard and survival functions using a nonparametric maximum likelihood method.
Using the estimated coefficients for the covariates (the betas from above), SAS can estimate a
survival function for an individual with specific values of the covariates (by default, SAS plugs in the
mean values for the cohort). In EG this is as simple as modifying the procedure to estimate the
survival function. Click on Modify Task:
$$S(t) = e^{-\int_0^t h(u)\,du} \quad\Longrightarrow\quad -\log S(t) = \int_0^t h(u)\,du$$
Go to Results and check the box next to Baseline survivor function estimates. SAS EG will
automatically output the estimated survival function (along with log survival and log(-log survival)) to a
temporary dataset.
Go to Plots and check Cumulative hazard function plot.
Click Run.
Under Output Data you should see the following dataset of estimated survival. Callouts on the dataset: one row per event time (28 unique event times); the age and drug columns hold the mean values of age and drug for the dataset; the log survival column gives log S(t) (highlighted at t=2) and the log(-log survival) column gives log(-log S(t)) (highlighted at t=15).
The cumulative hazard curves upward, indicating an increasing hazard with time (the slope of the
cumulative hazard is the hazard itself: $H'(t) = h(t)$).
FYI, in SAS code we would use a baseline statement to output this estimated survival function
(and log survival and log(-log survival)) as follows:
title ' ';
proc phreg data=hrp262.hmohiv;
model time*censor(0)=age drug / ties=discrete risklimits;
baseline out=outdata survival=S logsurv=ls loglogs=lls;
run;
proc print data=outdata;
run;
To graph the cumulative hazard using code:
data outdata2;
set outdata;
cumhaz=-ls; /* cumulative hazard H(t) = -log S(t) */
run;
goptions reset=all;
axis1 label=(angle=90);
symbol1 value=none i=join;
proc gplot data=outdata2;
label time='Time(Months)';
label cumhaz='Cumulative Hazard';
plot cumhaz*time / vaxis=axis1;
run;
quit;
11. We can also use these plots to assess the validity of the PH assumption. As discussed in
lecture on Monday, the PH assumption implies that the log(-log) survival curves should be parallel.
First we need to rerun the PH analysis to stratify the analysis by the variable Drug. Click on
Modify Task:
Move Drug from Explanatory variables to Strata variables.
Click Run.
This should generate a new output dataset containing log(S) and log(-log(S)). Examine the
output data -- what is stratification doing?
Under your PH analysis, go to the Output Data tab and click on Graph > Line Plot.
Under Line Plot, choose Multiple line plots by group column
Under Data, plot time on the Horizontal by loglogsurvival on the Vertical. Group the plots by
Drug.
Click Run.
FYI, the SAS code equivalent is as follows:
Note that DRUG has been removed from the model statement; stratifying by DRUG allows SAS to
estimate a separate baseline hazard for each drug group (these can then be compared to check
the PH assumption).
proc phreg data=hrp262.hmohiv;
model time*censor(0)= age/ ties=discrete risklimits;
strata drug;
baseline out=outdata survival=S logsurv=ls loglogs=lls;
run;
**Use SolutionsAnalysisInteractive Data AnalysisWork.Outdata
to examine the contents of outdata. What is stratification doing?
proc gplot data=outdata;
title 'Evaluate proportional hazards assumption for variable: drug';
plot lls*time=drug /vaxis=axis1;
symbol1 i=join c=black line=1;
symbol2 i=join c=black line=2;
run;
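Why parallel curves indicate PH: if $h_i(t) = HR \cdot h_j(t)$ for all $t$, then, using $S(t) = e^{-\int_0^t h(u)\,du}$, $S_i(t) = S_j(t)^{HR}$, so
$$\log S_i(t) = \log\left[S_j(t)^{HR}\right] = HR \cdot \log S_j(t)$$
$$\log(-\log S_i(t)) = \log(-HR \cdot \log S_j(t)) = \log HR + \log(-\log S_j(t))$$
That is, the two log(-log) survival curves differ by the constant $K = \log HR$ at every $t$, so they should be parallel.
Note on the graph: one curve is short because there are no events in this group beyond t=20.
Recall: in SAS LAB 2 we graphed log(-log S(t)) for these data against log(time) using proc lifetest, which involved fewer steps!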
12. Since it’s hard to tell from the graph whether the hazards are proportional, we might wonder whether
the PH assumption is violated, i.e., whether there is an interaction between drug and time. We can test
this (and correct the problem) by adding such an interaction term to the model.
**We have to use code to do this, because “time” in the interaction must be the current event time,
which changes from one risk set to the next; thus we cannot simply create a time*drug variable in a
data step (that would use each subject's own follow-up time) and add it to the model. The drugtime
variable that we are creating below is a time-dependent variable, where “time” is updated at every event
time (SAS does this work for us).
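A sketch of the partial likelihood that PHREG maximizes for the model below (notation ours: $\delta_i = 1$ marks an observed event and $R(t_i)$ is the risk set at subject i's event time). The key point is that drugtime is recomputed as $\text{drug}_j \times t_i$ for every subject j in the risk set, using the current event time $t_i$ rather than subject j's own follow-up time $t_j$; a data-step drug*time variable would put $t_j$ into the denominator instead, which is a different (and wrong) model:

```latex
L(\beta) = \prod_{i:\ \delta_i = 1}
  \frac{\exp\{\beta_1\,\text{drug}_i + \beta_2\,\text{age}_i + \beta_3\,\text{drug}_i\, t_i\}}
       {\sum_{j \in R(t_i)} \exp\{\beta_1\,\text{drug}_j + \beta_2\,\text{age}_j + \beta_3\,\text{drug}_j\, t_i\}}
```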
Find the previous Cox regression run that included age and drug as predictors. Directly
modify the code as follows:
PROC PHREG DATA=WORK.TMP0TempTableInput
PLOTS=SURVIVAL
;
MODEL time * Censor (0) = Drug Age drugtime /
TIES=BRESLOW
RISKLIMITS ALPHA=0.05
SELECTION=NONE
;
drugtime=drug*time;
RUN;TITLE;
SAS creates time-dependent variables for you through programming statements inside the phreg
procedure (no separate data step is needed).
This temporary variable (drugtime) is not saved to any dataset.
Analysis of Maximum Likelihood Estimates (DF = 1 for every parameter)

Parameter  Estimate  Std Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% HR Confidence Limits
Drug       1.12079   0.34996    10.2570     0.0014      3.067         1.545   6.090
Age        0.08925   0.01877    22.6193     <.0001      1.093         1.054   1.134
drugtime   -0.02818  0.03884    0.5264      0.4681      0.972         0.901   1.049
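The p-value for drugtime (0.4681) is not significant, so there is little evidence that a drug-by-time interaction exists, i.e., little evidence that the PH assumption is violated for drug.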
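Because the model contains both drug and drug × time, the drug hazard ratio (controlling for age) is a function of time rather than a single number. Plugging in the estimates from the table above gives the following worked illustration (time is in months for these data):

```latex
\begin{aligned}
\widehat{HR}_{\text{drug}}(t) &= \exp\left(\hat\beta_{\text{drug}} + \hat\beta_{\text{drugtime}}\, t\right) = \exp(1.12079 - 0.02818\, t) \\
\widehat{HR}_{\text{drug}}(0) &= e^{1.12079} \approx 3.07, \qquad
\widehat{HR}_{\text{drug}}(10) = e^{0.83899} \approx 2.31, \qquad
\widehat{HR}_{\text{drug}}(20) = e^{0.55719} \approx 1.75
\end{aligned}
```

The point estimates suggest a drug effect that weakens over follow-up, but because the drugtime coefficient is not significant, the data are also consistent with a constant hazard ratio.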
13. Obtain predictions about survival times for particular sets of covariate values that need
not appear in the data set being analyzed. You have to create a new dataset that contains
the covariate values of interest. We’ll use code to enter the new dataset (though this could
also be done in point and click).
NewNew Program
data MyCovs;
input age drug;
datalines;
55 1
22 0
;
run;
Then run a Cox regression model on the original hmohiv dataset, with age and drug as
predictors:
Drag time as your Survival time variable and Censor as your censoring variable. Make sure
to set the censoring value to 0.
Add age and drug as your explanatory variables:
Under Resultsask for Baseline survivor function estimates; Then click Browse to
rename these data…
Find the Work libraryName the dataset Outdata
Then click Save and Run.
Examine the data under Output Data. Callout: these rows define the survivor curve at the mean age and mean IV drug use.
Make one modification to the code to apply your model to the MyCovs dataset (i.e., to get
survivor curves for a 55-year-old drug user and a 22-year-old non-drug user): add covariates=mycovs to the BASELINE statement.
BASELINE OUT=WORK.OUTDATA(LABEL="Baseline Survivor Function Estimates for WORK.QUERY_FOR_HMOHIV_0000")
SURVIVAL=_SURVIV_
UPPER=_SDFUCL_
LOWER=_SDFLCL_
LOGLOGS=_LOGLOGS_
LOGSURV=_LOGSURV_
STDERR=_STDERR_
STDXBETA=_STDXBETA_
XBETA=_XBETA_
covariates=mycovs /* the added option: one survivor curve per row of MyCovs */
;
RUN;
Note that the output data now contains two survivor curves, one for a 55-year-old drug user and one for
a 22-year-old non-drug user:
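Both curves come from one fitted model: under proportional hazards, the survivor function for a covariate profile $x$ is the baseline survivor function raised to the power $\exp(x^\top\hat\beta)$. The worked comparison of the two MyCovs profiles below assumes this run used the default Breslow ties, so that the coefficients are the Breslow estimates from step 8 ($\hat\beta_{age} = 0.09151$, $\hat\beta_{drug} = 0.94108$):

```latex
\begin{aligned}
\hat S(t \mid x) &= \hat S_0(t)^{\exp(x^\top \hat\beta)} \\[4pt]
\frac{h(t \mid \text{age}=55,\ \text{drug}=1)}{h(t \mid \text{age}=22,\ \text{drug}=0)}
  &= \exp\{0.09151 \times (55 - 22) + 0.94108\} = e^{3.96091} \approx 52.5 \\[4pt]
\Rightarrow\ \hat S(t \mid 55, 1) &= \hat S(t \mid 22, 0)^{52.5}
\end{aligned}
```

For example, at a hypothetical time where the 22-year-old non-user's predicted survival is 0.95, the 55-year-old user's would be $0.95^{52.5} \approx 0.068$, which illustrates how quickly survival can fall for the higher-risk profile.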
14. Graph these survival curves (we’ll use code here so that we can superimpose the
confidence limits on the same graph):
goptions reset=all;
axis1 label=(angle=90);
symbol1 i=join c=black line=1; /* estimated survival: solid line */
symbol2 i=join c=black line=2; /* lower confidence limit: dashed */
symbol3 i=join c=black line=2; /* upper confidence limit: dashed */
proc gplot data=outdata;
title 'Survivor function at age 55 with drug';
plot _surviv_*time _sdflcl_*time _sdfucl_*time / overlay vaxis=axis1;
where age=55;
run;
proc gplot data=outdata;
title 'Survivor function at age 22 without drug';
plot _surviv_*time _sdflcl_*time _sdfucl_*time / overlay vaxis=axis1;
where age=22;
run;
quit;
Note the use of the very convenient where statement (which works in most SAS procs) for subsetting.
Survival drops off quickly for this group. Note that the confidence limits behave erratically as the
predicted survival approaches 0.
Now, let’s switch datasets to UIS. The data dictionary is below for your reference:
These data were collected from a randomized trial of two in-patient treatment courses for drug
addiction (heroin, cocaine, and unspecified other drugs): a shorter treatment plan and a longer
treatment plan. Time to censoring or return to drug use was recorded, as was length of time in
the treatment plan. Variables are below:
Variable  Description                                             Codes/Values
ID        Identification code                                     1-628
AGE       Age at enrollment                                       Years
BECKTOTA  Beck Depression Score at admission                      0-54
HERCOC    Heroin/cocaine use during 3 months prior to admission   1=Heroin & cocaine, 2=Heroin only, 3=Cocaine only, 4=Neither
IVHX      IV drug use history at admission                        1=Never, 2=Previous, 3=Recent
NDRUGTX   Number of prior drug treatments                         0-40
RACE      Subject’s race                                          0=White, 1=Other
TREAT     Treatment randomization assignment                      0=Short, 1=Long
SITE      Treatment site                                          0=A, 1=B
LOT       Length of treatment (exit date - admission date)        Days
TIME      Time to return of drug use (measured from admission)    Days
CENSOR    Returned to drug use                                    1=Returned to drug use, 0=Otherwise
15. Examine the data using a KM plot stratified by “TREAT”, which specifies the randomization
group (short or long). In EG, open the UIS dataset. Click on Analyze > Survival Analysis >
Life Tables.
Drag time to Survival time, censor as the Censoring variable, and treat as the Strata variable.
Make sure to specify 0 as the Censoring value.
Click Run.
Scroll to the bottom of the results window to examine the KM plot.
The analogous SAS code is:
goptions reset=all;
proc format;
value treat
1 = "long"
0 = "short";
run;
proc lifetest data=hrp262.uis plots=(s) graphics censoredsymbol=none;
label time='time(days)';
time time*censor(0);
strata treat;
title 'UIS KM plot';
symbol1 c=black v=none line=1 i=join;
symbol2 c=black v=none line=2 i=join; *gives plots that will be suitable for black and white copying;
format treat treat.;
run;
Note the use of proc format to assign a format to the variable treat.
16. Now run a PH analysis with treat and age as predictors, first as a crude analysis and then
stratified by site. Scroll through and review the output as a class.
Under the UIS dataset, click on Analyze > Survival Analysis > Proportional Hazards.
For the non-stratified analysis, specify time under Survival time, censor as the Censoring
variable, age and treat as the Explanatory variables.
Click Run.
WITHOUT STRATIFICATION BY SITE:
Analysis of Maximum Likelihood Estimates

Parameter  DF  Estimate  Std Error  Chi-Square  Pr > ChiSq  Hazard Ratio
age        1   -0.01327  0.00721    3.3847      0.0658      0.987
treat      1   -0.22298  0.08933    6.2307      0.0126      0.800

Interpretation: there is a 20% decrease in the hazard of relapse in those treated in the long program
compared with those in the short program.
Now repeat the analysis with site as the Stratification variable:
Click Run.
WITH STRATIFICATION BY SITE:
Analysis of Maximum Likelihood Estimates

Parameter  DF  Estimate  Std Error  Chi-Square  Pr > ChiSq  Hazard Ratio
age        1   -0.01429  0.00729    3.8434      0.0499      0.986
treat      1   -0.23507  0.08961    6.8820      0.0087      0.791
The fact that there’s little difference between the hazard ratios with and without stratification
(0.800 vs. 0.791 for treat) indicates that there’s little confounding by site, BUT it might be important
to leave the stratification in anyway, to avert potential criticism of your analysis by reviewers.
There seems to be a protective effect of being randomized to the longer treatment group.
Stratification allows there to be totally different (and even non-parallel) baseline hazard functions in each stratum. Rather than trying
to estimate hazard ratios between the strata, you instead obtain a hazard ratio that has been “averaged” over the different strata (like
a Mantel-Haenszel odds ratio).
Risk sets for the partial likelihood are stratum-specific.
This is useful if you want to control for a possible confounding variable that (1) may violate the Proportional Hazards assumption
and/or (2) is a nuisance variable, that is, you don’t care about getting a hazard ratio for it.
For example, in this study, there were two sites (hospitals) in which treatment was administered (SITE = A or B). There’s good reason to
believe that baseline hazards might be totally different depending on the site (because of different local populations and different ways
of administering the treatment). For this study, we don’t care about the effect of site; we just want to know whether the long treatment is
better than the short treatment. So, we “average” the HR for the long treatment compared with the short treatment over the two
sites.
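In symbols (notation ours; $\delta_i = 1$ marks an observed event), the stratified model gives each site its own baseline hazard while sharing one set of coefficients for $x = (\text{age}, \text{treat})$. The partial likelihood is then a product of within-site partial likelihoods whose risk sets $R_s(t)$ contain only subjects from site $s$:

```latex
\begin{aligned}
h(t \mid x,\ \text{site} = s) &= h_{0s}(t)\, e^{x^\top \beta}, \qquad s \in \{A, B\} \\[4pt]
L(\beta) &= \prod_{s} L_s(\beta)
  = \prod_{s} \ \prod_{i \in s:\ \delta_i = 1} \frac{e^{x_i^\top \beta}}{\sum_{j \in R_s(t_i)} e^{x_j^\top \beta}}
\end{aligned}
```

The baseline hazards $h_{0s}(t)$ cancel within each factor and are never estimated or compared, which is why stratification yields no hazard ratio for site and imposes no PH assumption on it.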
The corresponding SAS code is:
Without stratification:
proc phreg data=hrp262.uis;
model time*censor(0)=age treat/risklimits;
run;
With stratification:
proc phreg data=hrp262.uis;
model time*censor(0)=age treat/risklimits;
strata site;
run;
17. Run PROC PHREG with current treatment status (off_trt) as a time-dependent variable, plus age and treat
(much easier with code).
proc phreg data=hrp262.uis;
model time*censor(0)=off_trt age treat/risklimits;
/* programming statement: evaluated at each event time, where time = the current event time */
if lot>=time then off_trt=0; else if lot<time then off_trt=1;
run;
This adds a time-dependent covariate, off_trt, which is 1 if the subject has left the treatment program and 0 if they are still in the
treatment program. This evaluates whether just being in the treatment facility and program prevents relapse.
At each event time T, for each individual still in the risk set, SAS compares the length of treatment (LOT) to T. For example, suppose the
first event occurred at T=2 (TIME and LOT are both measured in days). SAS compares the length of treatment for each individual with 2. If
the person was in treatment for at least 2 days, then they are still on treatment at the first event time and off_trt=0. If the person was in
treatment for less than 2 days, then at T=2 they are off treatment and off_trt=1. Thus, the values of off_trt change at each event time, i.e.,
in each term of the partial likelihood.
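In notation (ours; $\delta_i = 1$ marks an observed event), the covariate is an indicator of having left treatment before the current event time, and each denominator term uses its value at that event time $t_i$ for every subject $j$ still at risk:

```latex
\begin{aligned}
\text{off\_trt}_j(t) &= \begin{cases} 0, & \text{LOT}_j \ge t \\ 1, & \text{LOT}_j < t \end{cases} \\[4pt]
L(\beta) &= \prod_{i:\ \delta_i = 1}
  \frac{\exp\{\beta_1\, \text{off\_trt}_i(t_i) + \beta_2\, \text{age}_i + \beta_3\, \text{treat}_i\}}
       {\sum_{j \in R(t_i)} \exp\{\beta_1\, \text{off\_trt}_j(t_i) + \beta_2\, \text{age}_j + \beta_3\, \text{treat}_j\}}
\end{aligned}
```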
RESULTS:
Analysis of Maximum Likelihood Estimates (DF = 1 for every variable)

Variable  Estimate  Std Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% HR Confidence Limits
off_trt   2.56681   0.15173    286.1826    <.0001      13.024        9.674   17.535
age       -0.00719  0.00728    0.9777      0.3228      0.993         0.979   1.007
treat     0.01082   0.08988    0.0145      0.9042      1.011         0.848   1.206
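After accounting for whether a subject is currently in treatment, there appears to be no difference between the short and long
programs (treat: HR 1.011, p = 0.90). Currently being in treatment strongly protects against relapse (the hazard is about 13 times
higher once a subject is off treatment), and the longer program appears to help mainly by keeping subjects in treatment longer.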