Proportional Hazard Regression

Cox Proportional Hazards Modeling (PROC PHREG)

Consider the following data: Drug addicts are enrolled in two residential treatment programs that differ in length (treat = 0 is short, treat = 1 is long). The patients are assigned to two different sites (site = 0 is site A, site = 1 is site B). Herco indicates heroin and cocaine use in the past three months (1 = heroin and cocaine use, 2 = heroin or cocaine use, 3 = neither heroin nor cocaine use). Other variables recorded were age at time of enrollment, ndrugtx (number of previous drug treatments), time until return to drug use, and censor (1 = return to drug use, 0 = censored).

Reading a SAS Data Set into SAS

You will need to save the data set uis_small to your computer. It is a SAS data set, and it can be read into a SAS program using the following code (adjusting the file location as appropriate):

    DATA uis;
      SET 'C:\uis_small';
    RUN;

To make sure the data set was read in properly, print out the first 10 observations:

    PROC PRINT DATA=uis (obs=10);
    RUN;

    The SAS System

    Obs  ID  age  ndrugtx  treat  site  time  censor  herco
      1   1   39        1      1     0   188       1      3
      2   2   33        8      1     0    26       1      3
      3   3   33        3      1     0   207       1      2
      4   4   32        1      0     0   144       1      3
      5   5   24        5      1     0   551       0      2
      6   6   30        1      1     0    32       1      1
      7   7   39       34      1     0   459       1      3
      8   8   27        2      1     0    22       1      3
      9   9   40        3      1     0   210       1      2
     10  10   36        7      1     0   184       1      2

First compare the survival curves across the levels of the three categorical variables treat, site and herco:

    PROC LIFETEST DATA=uis PLOTS=(s);
      TITLE 'Survival by Treatment';
      TIME time*censor(0);
      STRATA treat;
    RUN;

    PROC LIFETEST DATA=uis PLOTS=(s);
      TITLE 'Survival by Site';
      TIME time*censor(0);
      STRATA site;
    RUN;

    PROC LIFETEST DATA=uis PLOTS=(s);
      TITLE 'Survival by herco';
      TIME time*censor(0);
      STRATA herco;
    RUN;

Survival by Treatment: The Wilcoxon and log-rank tests (output not shown) are statistically significant (p = 0.0021 and p = 0.0091, respectively). Treatment affects the risk of returning to drug use.

Survival by Site: The Wilcoxon and log-rank tests (output not shown) are not statistically significant (p = 0.0779 and p = 0.1240, respectively). There is no evidence that site affects the risk of returning to drug use.

Survival by herco: The Wilcoxon and log-rank tests (output not shown) are not statistically significant (p = 0.2919 and p = 0.1473, respectively). There is no evidence that herco affects the risk of returning to drug use, although the survival curves cross initially, which may affect these tests.

Now examine whether ndrugtx and age affect the risk of returning to drug use. Because these are continuous variables, we will use proportional hazard regression (PROC PHREG):

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = ndrugtx;
    RUN;

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age;
    RUN;

Output from PHREG: ndrugtx

Interpreting the Output: ndrugtx

• The proportional hazards regression model for these data with ndrugtx as the predictor is:
  λ(t) = λ0(t) exp(0.02937*ndrugtx)
• The relative risk of a 1-unit increase in the number of previous drug treatments is:
  λ0(t) exp(0.02937*1) / λ0(t) exp(0.02937*0) = exp(0.02937 - 0) = exp(0.02937) = 1.03
• With each additional prior drug treatment, the risk of relapsing increases by 3% (1.03 - 1.00 = 0.03).
• Notice that the SAS output also gives you this relative risk under “Hazard Ratio.”
• This term is significant (p < 0.0001), which indicates that prior drug treatments affect the risk of relapse.

Output from PHREG: age

Interpreting the Output: Age

• The proportional hazards regression model for these data with age as the predictor is:
  λ(t) = λ0(t) exp(-0.01286*age)
• The relative risk of a 1-year increase in age at enrollment is:
  λ0(t) exp(-0.01286*1) / λ0(t) exp(-0.01286*0) = exp(-0.01286 - 0) = exp(-0.01286) = 0.987
• With each 1-year increase in age at enrollment, the risk of relapsing decreases by 1.3% (1.000 - 0.987 = 0.013).
• Notice that the SAS output also gives you this relative risk under “Hazard Ratio.”
• Age is not significantly related to risk, however (p = 0.735).
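
The same calculation extends to increases of more than one unit, because the baseline hazard cancels in the ratio. The worked example below is only an illustration: the increments of 5 prior treatments and 10 years of age are arbitrary choices that do not appear in the handout, and the coefficients are the single-predictor estimates quoted above. The age estimate was not statistically significant, so its hazard ratio should be read with that in mind.

```latex
% Hazard ratio for a k-unit increase in a predictor x with coefficient \beta
\[
\frac{\lambda_0(t)\exp\{\beta (x + k)\}}{\lambda_0(t)\exp\{\beta x\}} = \exp(\beta k)
\]
% Illustrative increments (assumed, not from the handout):
\[
\text{ndrugtx, } k = 5:\quad \exp(0.02937 \times 5) = \exp(0.14685) \approx 1.158
\]
\[
\text{age, } k = 10:\quad \exp(-0.01286 \times 10) = \exp(-0.1286) \approx 0.879
\]
```

Under these estimates, five additional prior treatments correspond to roughly a 16% higher hazard of relapse, and ten additional years of age to roughly a 12% lower hazard.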

The Full Model

First consider the full model with all of the predictor variables. As part of the PHREG procedure, we will create 2 new variables: herco2 and herco3. In addition, we will conduct a test labeled “herco” to determine whether both of these variables together are significant.

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site herco2 herco3;
      herco2 = herco=2;
      herco3 = herco=3;
      herco: TEST herco2, herco3;
    RUN;

Results from “herco” test

The test of our two new variables, herco2 and herco3, is non-significant (p = 0.1130), so we will drop herco from our model and run the refitted model.

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site;
    RUN;

Output from Model w/o herco

All of the terms in the model are significant, except for site, which is approaching significance. Because we know from previous research that site is important, we will leave it in our model.

We will now check six different interactions in our model, to see if any significant ones exist: ndrugtx*age, ndrugtx*treat, ndrugtx*site, age*treat, age*site, treat*site

Adding ndrugtx*age to the model (notice you can create the interaction term within the PHREG procedure):

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site drugage;
      drugage = ndrugtx*age;
    RUN;

ndrugtx*age interaction not significant

Adding ndrugtx*treat to the model

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site drugtreat;
      drugtreat = ndrugtx*treat;
    RUN;

ndrugtx*treat not significant

Adding ndrugtx*site to the model

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site drugsite;
      drugsite = ndrugtx*site;
    RUN;

ndrugtx*site not significant

Adding age*treat to the model

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site agetreat;
      agetreat = age*treat;
    RUN;

age*treat not significant

Adding age*site to the model

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site agesite;
      agesite = age*site;
    RUN;

age*site interaction IS significant

Adding treat*site to the model

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site treatsite;
      treatsite = treat*site;
    RUN;

treat*site not significant

Final Model Selection

Not only was the age*site interaction significant, but once we included it in our model, the site term also became statistically significant. The final proportional hazard model therefore includes the age*site interaction term (agesite):

    λ(t) = λ0(t) exp(β1*age + β2*ndrugtx + β3*treat + β4*site + β5*agesite)
    λ(t) = λ0(t) exp(-0.034*age + 0.036*ndrugtx - 0.267*treat - 1.246*site + 0.034*agesite)

Testing Proportionality

The Cox proportional hazard regression we have just conducted assumes that the hazards are proportional, that is, that the hazard ratio for each predictor is constant over time. To test this assumption of proportionality, we add time-dependent variables (each predictor multiplied by a function of time) and test whether they are significant. If they are not significant, there is no evidence that the relative risk changes with time, and we can conclude that the risks in our model are proportional.

Creating and testing time-dependent variables (on the log scale):

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx treat site agesite aget drugt treatt sitet;
      agesite = age*site;
      aget = age*log(time);
      drugt = ndrugtx*log(time);
      treatt = treat*log(time);
      sitet = site*log(time);
      test_proportionality: TEST aget, drugt, treatt, sitet;
    RUN;

Testing Proportionality Output

The test we labeled “test_proportionality” is not significant (p = 0.7309), which means that the time-dependent variables are not jointly significant. We can assume proportionality over time.

If we cannot assume proportionality…

If the assumption of proportionality were not met, we could stratify on the variable whose risk is not proportional. For example, if we found the variable treat to be non-proportional, we could stratify on that variable:

    PROC PHREG DATA=uis;
      MODEL time*censor(0) = age ndrugtx site agesite;
      agesite = age*site;
      STRATA treat;
    RUN;

Output stratifying on treat
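
To make the stratified fit concrete, the model implied by the STRATA statement can be written out as below. This is a sketch of the standard stratified Cox form, not output from the handout: the stratum index j and the symbol λ0j are notation introduced here, and the coefficient names follow the final model above with the treat term removed.

```latex
% Stratified Cox model: one baseline hazard per level of treat (j = 0 short, j = 1 long)
\[
\lambda_j(t) = \lambda_{0j}(t)\,
  \exp\!\left(\beta_1\,\mathrm{age} + \beta_2\,\mathrm{ndrugtx}
            + \beta_4\,\mathrm{site} + \beta_5\,\mathrm{agesite}\right),
\qquad j \in \{0, 1\}
\]
```

Each treatment group gets its own unspecified baseline hazard, so no proportionality is assumed between the short and long programs, while the remaining coefficients are shared across the two strata. The trade-off is that the stratified model no longer produces a coefficient or hazard ratio for treat itself.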