Power Analysis (Hsieh and Lavori, 2000, Controlled Clinical Trials, 21, 552-560)

• α = Type I error level
• 1 − β = power
• Nonbinary covariate denoted by X; the variance of X is σ²
• Proportional hazards model: h(t) = h0(t) exp(βX)
• Required number of deaths is D0 = (z_{1−α} + z_{1−β})² / (σ²β²)
• Suppose there are k covariates X1, X2, ..., Xk. Then the required number of deaths is approximately D = D0/(1 − R²), where R² is the square of the multiple correlation coefficient for regressing the covariate of interest on the others
• Length of the accrual period is T0 time units
• Recruit r individuals per time unit
• Length of the follow-up period is T
• Baseline survivor function is S0(t), the survivor function when X = 0
• The probability of death before the end of the study, for a patient who enters the study at time t with covariate X = x, is 1 − [S0((T0 − t) + T)]^{exp(βx)}
• Suppose X has a distribution with density f(x). Then the probability that a randomly selected patient dies before the end of the study is

    (1/T0) ∫_{t=0}^{T0} ∫_{x=−∞}^{∞} f(x) (1 − [S0(T0 + T − t)]^{exp(βx)}) dx dt
      = (1/T0) ∫_{t=T}^{T0+T} ∫_{x=−∞}^{∞} f(x) (1 − [S0(t)]^{exp(βx)}) dx dt

• Substitute the corresponding expected number of deaths for D in the Hsieh/Lavori formula,

    (N/T0) ∫_{t=T}^{T0+T} ∫_{x=−∞}^{∞} f(x) (1 − [S0(t)]^{exp(βx)}) dx dt = (z_{1−α} + z_{1−β})² / (σ²β²),

  and solve for N. The left-hand side is the expected number of deaths, where N = rT0 is the sample size.

The PHPOW macro

The code appears on page 151 of SAS(R) Survival Analysis Techniques for Medical Research, Second Edition, by Alan B. Cantor.

    %PHPOW(T= , TAU= , ALPHA=.05, N= , POWER= , VAR= , DELTA= , S0= );

• T is the accrual time
• TAU is the follow-up time
• ALPHA is the Type I error level (default=.05)
• N is the sample size
• POWER is the power
• VAR is the variance of the covariate
• DELTA is the value of exp(β)
• S0 is the formula for the baseline survivor function; it should be written as a function of the variable time
• To obtain a sample size, omit N= from the PHPOW call
• To obtain a power value, omit POWER= from the PHPOW call
• PHPOW assumes that f(x) is a normal density with mean 0 and variance σ²
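For readers who want to check the arithmetic outside SAS, the following R code is a minimal sketch (it is not the PHPOW macro itself; the function name ph.power and the use of a two-sided critical value are assumptions of this sketch). It evaluates the death-probability integral numerically and plugs the expected number of deaths into the formula above.

    # Minimal R sketch of the Hsieh-Lavori power calculation (not the PHPOW macro).
    # Assumes X ~ N(0, sigma2), uniform accrual over [0, T0], follow-up tau,
    # and a user-supplied baseline survivor function S0(t).
    ph.power <- function(n, T0, tau, sigma2, delta, alpha = 0.05,
                         S0 = function(t) exp(-0.4 * t)) {
      beta <- log(delta)
      # P(death) for a subject with covariate value x, averaged over uniform entry times
      p.death.x <- function(x)
        integrate(function(t) 1 - S0(t)^exp(beta * x),
                  lower = tau, upper = T0 + tau)$value / T0
      # Average over the covariate density f(x) = N(0, sigma2)
      p.death <- integrate(Vectorize(function(x)
        dnorm(x, 0, sqrt(sigma2)) * p.death.x(x)), -Inf, Inf)$value
      D <- n * p.death                 # expected number of deaths
      z.alpha <- qnorm(1 - alpha / 2)  # two-sided critical value (assumption of this sketch)
      pnorm(sqrt(D * sigma2 * beta^2) - z.alpha)
    }

    ph.power(n = 120, T0 = 3, tau = 2, sigma2 = 1, delta = 1.3)  # roughly 0.7 for the example below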
Example: Suppose you are designing a randomized clinical trial to study the effect of a variable X on a survival distribution. Suppose we plan to
• accrue 120 patients over three years
• have an additional 2-year follow-up period
• use Type I error level .05
• assume a baseline exponential distribution with constant hazard of about 0.4
• estimate power for a hazard ratio of exp(β) = 1.3

    /* Example of the PHPOW macro */
    %include 'c:\st565\sas\phpow.macro.sas';

    %phpow(T=3, tau=2, n=120, var=1, delta=1.3, s0 = exp(-.4*time))

        Alpha                     = 0.05
        Hazard Ratio              = 1.3
        Accrual Time              = 3
        Followup Time             = 2
        Covariate Variance        = 1
        Baseline Survival         = exp(-.4*time)
        Sample Size               = 120
        Power (Calculated)        = 0.7

    /* Now compute a sample size */
    %phpow(T=3, tau=2, power=.8, var=1, delta=1.3, s0 = exp(-.4*time))

        Alpha                     = 0.05
        Hazard Ratio              = 1.3
        Accrual Time              = 3
        Followup Time             = 2
        Covariate Variance        = 1
        Baseline Survival         = exp(-.4*time)
        Power                     = 0.8
        Sample Size (Calculated)  = 155

Dealing with Missing Values

• Suppose we have a proportional hazards model with three covariates: hi(t) = h0(t) exp(β1 X1i + β2 X2i + β3 X3i)
• PHREG in SAS and coxph in R delete any case with a missing value for any of the covariates
• Multiple imputation:
  – Generate 5 samples of complete data
  – Use an MCMC approach that samples from a joint posterior distribution for the parameters and the missing covariate values, given the observed data
• Fit the proportional hazards model to each of the 5 completed data sets:
  – estimate the parameters
  – estimate the covariance matrix using the inverse of the information matrix for the completed data set (biased toward zero)
• Use the average of the 5 sets of parameter estimates
• Adjust the average of the 5 covariance matrices using the variation among the 5 sets of parameter estimates:

    V(β̂_MI) = (average within-imputation covariance matrix) + (between-imputation variation)
             = (1/5) Σ_{j=1}^{5} V̂(β̂_j) + ((5+1)/5) (1/(5−1)) Σ_{j=1}^{5} (β̂_j − β̄)²

    proc mi data=set3 out=impute noprint;
      var x1 x2 x3;
    run;

    proc phreg data=impute outest=out3 covout noprint;
      model time*status(0) = x1 x2 x3;
      by _imputation_;
    run;

    proc mianalyze data=out3;
      var x1 x2 x3;
    run;

            Multiple Imputation Parameter Estimates

    Parameter   Estimate   Std Error    95% Conf. Limits       DF
    x1           0.21924     0.13497   -0.0732    0.5118    12.60
    x2           0.31919     0.09746    0.1232    0.5151    48.96
    x3          -0.12231     0.03500   -0.1923   -0.0523    36.12

            Multiple Imputation Parameter Estimates

    Parameter    Minimum     Maximum
    x1           0.07210     0.31359
    x2           0.28355     0.38977
    x3          -0.14237    -0.09171

            Multiple Imputation Parameter Estimates

    Parameter    Theta0   t for H0: Parameter=Theta0   Pr > |t|
    x1                0            1.62                 0.1290
    x2                0            3.27                 0.0019
    x3                0           -3.49                 0.0013
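A minimal R sketch of the same pooling step, applying Rubin's rule exactly as in the variance formula above. It assumes the five completed (imputed) data sets are already available as a list of data frames named completed, each with variables time, status, x1, x2, x3 (all names hypothetical).

    library(survival)
    # Fit the Cox model to each completed data set
    fits  <- lapply(completed, function(d)
               coxph(Surv(time, status) ~ x1 + x2 + x3, data = d))
    betas <- sapply(fits, coef)                     # p x 5 matrix of estimates
    bbar  <- rowMeans(betas)                        # averaged coefficients
    Ubar  <- Reduce(`+`, lapply(fits, vcov)) / 5    # average within-imputation covariance
    B     <- cov(t(betas))                          # between-imputation covariance
    V.MI  <- Ubar + (1 + 1/5) * B                   # Rubin's rule, as on the slide
    cbind(estimate = bbar, std.error = sqrt(diag(V.MI)))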
Counting Processes

Replace the pair (T, δ) with the pair of stochastic processes (N(t), Y(t)), where
• N(t) is the number of observed events in [0, t]
• Y(t) = 1 if the unit is under observation and at risk at time t; Y(t) = 0 otherwise

For right-censored survival data:
• N(t) = I({T ≤ t, δ = 1})
• Y(t) = I({T ≥ t})

• Shifts emphasis from modelling the hazard function to modelling the intensity, or rate, of a point process
• The development of the Cox model and of the Kaplan-Meier and Nelson-Aalen estimators can easily be formulated in terms of counting processes
• Enables the use of martingale theory to prove results
• Additional details:
  – Counting Processes and Survival Analysis by Fleming and Harrington
  – Statistical Models Based on Counting Processes by Andersen, Borgan, Gill and Keiding
  – Sections 1.3, 2.2, 2.3 and 3.7 of Modeling Survival Data by Therneau and Grambsch

Partial likelihood function:

    L(β) = Π_{i=1}^{n} Π_{t≥0} [ Yi(t) ri(β, t) / Σ_{j=1}^{n} Yj(t) rj(β, t) ]^{dNi(t)},

where ri(β, t) = exp[Xi(t)β], i = 1, 2, ..., n. The log-partial likelihood is

    ℓ(β) = Σ_{i=1}^{n} ∫_{0}^{∞} { log[Yi(t) ri(β, t)] − log[ Σ_{j=1}^{n} Yj(t) rj(β, t) ] } dNi(t)

Counting process martingale for the i-th individual:

    Mi(t) = Ni(t) − ∫_{0}^{t} Yi(s) h0(s) exp[Xi(s)β] ds

Types of Residuals and Their Uses

Schoenfeld residuals (Collett, p. 117):

    r̂ik = δi (xik − x̄^w_k(ti)),   where   x̄^w_k(ti) = Σ_{j∈R(ti)} xjk exp(xj'β̂) / Σ_{j∈R(ti)} exp(xj'β̂)

for the value of the k-th covariate for the i-th subject.
• individual contributions to the derivative of the log partial likelihood
• x̄^w_k(ti) is an estimator of the risk-set conditional mean of the covariate
• sum to zero (follows from the partial likelihood equations)
• missing for censored subjects

Scaled Schoenfeld residuals (Grambsch and Therneau, Biometrika, 1994):

Let the Schoenfeld residuals for the i-th individual be denoted by ri = (r1i, ..., rki). The scaled Schoenfeld residuals are the components of the vector

    ri* = m Var(β̂) ri,

where m is the observed number of failures.
• Useful for detecting departures from the proportional hazards assumption due to time-dependent coefficients: h(t) = h0(t) exp(Xβ(t))
• β(t) for a treatment effect would decrease over time if the treatment loses effectiveness relative to a placebo or standard treatment
• Grambsch and Therneau showed that if β̂j is the estimated coefficient from fitting a Cox model with no time-dependent covariates, then E(r*ji) ≈ βj(ti) − β̂j
• Plot the scaled Schoenfeld residuals against some function of time and pass a smooth curve through the plot
• Deviations from a horizontal line provide evidence against the proportional hazards assumption

Cox-Snell residuals (Collett, p. 112):

    rCS,i = exp(xi'β̂) Ĥ0(ti)

• Use the Nelson-Aalen estimator for Ĥ0(t)
• Have a unit exponential distribution when the proposed model is appropriate
• They are never negative
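One common way to use Cox-Snell residuals: if the model is adequate they behave like a censored sample from a unit exponential distribution, so their estimated cumulative hazard plotted against the residuals should follow the 45-degree line. A minimal R sketch, assuming a data frame dat with variables time, status, x1, and x2 (all names hypothetical):

    library(survival)
    fit <- coxph(Surv(time, status) ~ x1 + x2, data = dat)   # illustrative model
    rcs <- predict(fit, type = "expected")    # H0-hat(t_i) * exp(x_i'beta-hat): Cox-Snell residuals
    # Treat the residuals as censored "survival times" and plot their cumulative hazard
    sf  <- survfit(Surv(rcs, dat$status) ~ 1)
    plot(sf$time, -log(sf$surv), xlab = "Cox-Snell residual",
         ylab = "Estimated cumulative hazard")
    abline(0, 1, lty = 2)   # unit-exponential reference line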
Martingale residuals (Collett, p. 115):

    Mi(t) = Ni(t) − Êi(t, β) = Ni(t) − ∫_{0}^{t} Yi(s) exp[Xi'(s)β] dĤ0(s),

where Ni(t) is a counting process that is equal to zero until failure and equal to 1 thereafter.
• Typically evaluated at the end of follow-up: Mi = δi − Ĥi(ti) = δi + log(Ŝi(ti))
• Simply (observed count) − (predicted count), given the covariate values, the length of follow-up, and the history of any time-dependent covariates
• Must sum to zero, but do not have a symmetric distribution
• Individuals with extreme negative or positive values of observed − expected are poorly fit by the model (outliers?)
• Useful for determining the functional form of a covariate. When covariate xj is of interest:
  1. Fit a model without xj
  2. Plot the martingale residuals versus xj
  3. Pass a smooth curve through the residuals
  4. The functional form is suggested by the smoothed curve
• Similar to partial residual plots in linear regression
• If necessary, use splines to model the functional form

Deviance residuals (Collett, p. 116):

    Dj = sign(M̂j) {−2[M̂j + δj log(δj − M̂j)]}^{1/2},

where δj = 1 if the observation is an actual failure time.
• More nearly symmetrically distributed than martingale residuals, but do not sum to zero
• Analogous to deviance residuals in GLMs; a one-term Taylor expansion relates them to Pearson residuals
• Plot the deviance residuals versus the risk scores
• If censoring is light to moderate, they look like normally distributed noise
• If censoring is heavy, they tend to have a large collection of points near zero
• May detect outliers (both those who die sooner and those who die later than expected)

Score (dfbeta) residuals (Collett, p. 118; Therneau and Grambsch, p. 85):
• Useful for assessing the influence of individual cases
• Useful for robust variance estimation
• Denoted L̂i
• A modification of the Schoenfeld residuals
• dfbeta (keyword in the SAS OUTPUT statement)

Model Checking: Influence and Poor Fit

• Leverage
  – measured using the score residuals (L̂i)
  – a weighted average of the distance of the value xik from the risk-set means x̄^w_jk, where the weights are the changes in the martingale residuals
  – a large value (either positive or negative) means the observation is far from the mean and isolated
  – ressco (keyword in the SAS OUTPUT statement)
• Influence on a particular coefficient
  – measured by the change in the coefficient when subject i is deleted from the data
  – Δβ̂i = (β̂ − β̂(−i)) ≈ V̂ar(β̂) L̂i: called scaled score residuals or dfbeta residuals (in SAS and S-Plus)
  – plot against ID or time
• Overall measure of influence, often called the likelihood displacement statistic:

    ldi = (Δβ̂i)' [V̂ar(β̂)]^{−1} (Δβ̂i) = L̂i' [V̂ar(β̂)] L̂i ≈ 2[Lp(β̂) − Lp(β̂(−i))]

  – plot versus a summary statistic (e.g., the martingale residuals)
  – LD (keyword in the SAS OUTPUT statement)

Model Checking: Assessing the PH Assumption

• For a single categorical covariate, the logarithms of the cumulative hazards should be parallel; i.e., plot log(Ĥ(t)) for each level of the covariate (in SAS, use proc lifetest with a STRATA statement for the covariate and plots=(lls))
• Include (covariate) × log(time) interactions in the model and do hypothesis tests (see the R sketch following this list)
• Plot the scaled Schoenfeld residuals for each covariate versus a function of time and pass a smooth curve through them
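A minimal R sketch of the first two checks (the log cumulative hazard plot and a covariate-by-log(time) interaction), assuming the VA lung cancer data have been read into a data frame va with factor celltf as in the S-Plus code later in these notes:

    library(survival)
    # Graphical check: for a categorical covariate, the log cumulative hazards
    # should be roughly parallel under proportional hazards.
    sf <- survfit(Surv(time, status) ~ celltf, data = va)
    plot(sf, fun = "cloglog", xlab = "Time (log scale)", ylab = "log(-log S(t)) = log H(t)")

    # Test-based check: add a covariate-by-log(time) interaction as a
    # time-dependent term through coxph's tt() mechanism.
    fit.tt <- coxph(Surv(time, status) ~ karno + tt(karno), data = va,
                    tt = function(x, t, ...) x * log(t))
    summary(fit.tt)   # a significant tt(karno) term suggests non-proportional hazards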
Cox model in SAS: VA data

    /* SAS code for fitting the Cox model to survival times from the VA lung
       cancer trial of 137 male patients with inoperable lung cancer */

    /* Variables
       Treatment: 1=standard, 2=test (chemotherapy)
       Celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
       Survival in days
       Status: 1=dead, 0=censored
       Karnofsky score: measures patient performance of activities of daily living

         SCORE   FUNCTION
           100   Normal, no evidence of disease
            90   Able to perform normal activity with only minor symptoms
            80   Able to perform normal activity with effort, some symptoms
            70   Able to care for self but unable to do normal activities
            60   Requires occasional assistance
            50   Requires considerable assistance
            40   Disabled, requires special assistance
            30   Severely disabled
            20   Very sick, requires supportive treatment
            10   Moribund

       Months from diagnosis
       Age in years
       Prior therapy: 0=no, 10=yes
    */

    data va;
      infile 'c:\st565\data\va.dat';
      input rx cellt time status karno months age prior_rx;
      prior_rx = prior_rx/10;
      if (cellt=1) then celltype='squamous';
      if (cellt=2) then celltype='smallcell';
      if (cellt=3) then celltype='adeno';
      if (cellt=4) then celltype='large';
      /* Create dummy variables for cell types */
      cell1=0; cell2=0; cell3=0;
      if (cellt=2) then cell1=1;
      if (cellt=3) then cell2=1;
      if (cellt=4) then cell3=1;
      dummy=1;
    run;

    /* Select covariate values at which the baseline survivor
       function will be estimated */
    data cov;
      input rx cell1 cell2 cell3 karno months age prior_rx;
      datalines;
    1 1 0 0 80 6 67 1
    0 1 0 0 80 6 67 1
    ;
    run;

    proc phreg data=va;
      model time*status(0) = rx cell1 cell2 cell3 karno months age prior_rx / ties=efron;
      baseline out=ba xbeta=xbeta stdxbeta=stdxbeta survival=surv
               upper=ucl lower=lcl covariates=cov;
    run;

    proc print data=ba;
    run;

    proc sort data=ba;
      by rx time;
    run;

    proc gplot data=ba;
      plot surv*time lcl*time ucl*time / overlay;
      symbol1 v=none interpol=step line=1 w=4;
      symbol2 v=none interpol=step line=2 w=4;
      symbol3 v=none interpol=step line=2 w=4;
      by rx;
      title "Estimated Survival Curves";
    run;

    /* Examine the functional form of covariates using martingale residuals */
    proc phreg data=va;
      model time*status(0) = dummy;
      output out=resid_out resmart=mart_res / order=data;
    run;

    data data_res;
      merge va resid_out;
    run;

    /* gplot has a built-in smoother with smoothness ranging from 0 to 100 */
    proc gplot data=data_res;
      plot mart_res*age / haxis=axis2 vaxis=axis1;
      symbol i=sm60s v=dot h=1.2 w=3;
      axis1 label=(h=2 r=0 a=90 f=swiss "Residuals") value=(h=2.0 f=swiss);
      axis2 label=(h=2 f=swiss) value=(h=2.0 f=swiss);
      label mart_res='Residual';
      title "Martingale Residuals";
    run;

    /* Test the proportional hazards assumption using scaled Schoenfeld
       residuals: produces both plots and tests */
    proc phreg data=va;
      model time*status(0) = rx karno age cell1 cell2 cell3 months prior_rx / ties=efron;
      output out=schoenb ressch= schrx schkarno schage;
    run;

    proc gplot data=schoenb;
      plot schrx*time schkarno*time schage*time / haxis=axis2 vaxis=axis1;
      symbol value=dot i=sm60s h=1.2 w=3;
      axis1 label=(h=2 r=0 a=90 f=swiss) value=(h=2.0 f=swiss);
      axis2 label=(h=2 f=swiss) value=(h=2.0 f=swiss);
      title "Schoenfeld Residuals";
    run;

    /* Use the SCHOEN macro to produce plots of scaled Schoenfeld residuals
       and tests */
    %include 'c:\mydocuments\courses\st565\sas\therneau\schoen.sas';

    /* Include the DASPLINE macro for fitting splines; it is used by the
       SCHOEN macro */
    %include 'c:\mydocuments\courses\st565\sas\therneau\daspline.sas';

    %schoen(data=va, time=time, event=status,
            xvars = rx karno age cell1 cell2 cell3 months prior_rx,
            outsch=outs, outbt=scaled, plot=t, points=yes, df=4, alpha=.05);
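The SAS code that follows extracts dfbeta residuals for influence diagnostics. For comparison, here is a minimal R sketch of the same idea, including the likelihood displacement statistic ld_i from the influence slide above (the reduced model is only an illustration, and va is the R data frame built in the S-Plus code later in these notes):

    library(survival)
    fit <- coxph(Surv(time, status) ~ rx + karno + age, data = va)   # illustrative model
    dfb <- resid(fit, type = "dfbeta")   # n x p: change in each coefficient when subject i is dropped
    # Likelihood displacement: ld_i = dfbeta_i' [Var(beta-hat)]^{-1} dfbeta_i
    ld  <- rowSums((dfb %*% solve(vcov(fit))) * dfb)
    plot(ld, xlab = "Subject", ylab = "Likelihood displacement")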
    /* Examine dfbeta residuals for influence: they account for the change in
       the score residuals when an observation is dropped. The information
       matrix is not adjusted, so influence may be underestimated. However,
       they are much quicker to compute than jackknife residuals (which refit
       the Cox model for each subject) */
    proc phreg data=va;
      model time*status(0) = rx karno age cell1 cell2 cell3 months prior_rx / ties=efron;
      output out=dfout dfbeta = rx_inf karno_inf age_inf;
    run;

    data dfout;
      set dfout;
      ID = _n_;
    run;

    proc gplot data=dfout;
      plot rx_inf*ID karno_inf*ID age_inf*ID / haxis=axis2 vaxis=axis1;
      symbol1 v=dot h=1.0;
      axis1 label=(h=2 r=0 a=90 f=swiss) value=(h=2.0 f=swiss);
      axis2 label=(h=2 f=swiss 'Patient') value=(h=2.0 f=swiss);
      title "Dfbeta values";
    run;

Cox model in R and SPlus

    # Splus code to fit the Cox model to data from the VA lung cancer trial
    # of 137 male patients with inoperable lung cancer.
    # This code is posted as vacoxph.diag.ssc
    #
    # Variables
    #   Treatment: 1=standard, 2=test (chemotherapy)
    #   Celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
    #   Survival in days
    #   Status: 1=dead, 0=censored
    #   Karnofsky score
    #   Months from diagnosis
    #   Age in years
    #   Prior therapy: 0=no, 10=yes

    # Enter the data into a data frame.
    va <- read.table("c:/st565/data/va.dat", header=F,
                     col.names=c("rx", "cellt", "time", "status",
                                 "karno", "months", "age", "priorrx"))
    va$rx      <- va$rx - 1
    va$priorrx <- va$priorrx/10
    va$celltf  <- as.factor(va$cellt)

    options(contrasts=c("contr.treatment", "contr.poly"))

    vafit <- coxph(Surv(time, status) ~ rx + celltf + karno + months + age + priorrx,
                   data = va, x=T, y=T, robust=T, method="efron", singular.ok=T)
    summary(vafit)

    # Martingale residuals
    mart.res <- resid(vafit)
    par(lwd=4, mex=2, fin=c(8,8))
    plot(va$age, mart.res, xlab="Age", ylab="Residuals",
         main="Martingale Residuals", cex=1.5, lwd=4)
    lines(lowess(va$age, mart.res, iter=0), lty=1)

[Figure: martingale residuals versus age with a lowess smooth]

    # Fit a model with spline terms
    va2 <- coxph(Surv(time, status) ~ rx + celltf + ns(karno, df=4) + months +
                 ns(age, df=3) + priorrx, data = va, method="efron")
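One way to see what the spline terms in va2 have estimated is to plot each term's contribution to the linear predictor. A minimal sketch; the column look-up by grep and the plotting choices are assumptions of this sketch:

    # Per-term contributions to the linear predictor from the spline model va2
    pt   <- predict(va2, type = "terms")
    kcol <- grep("karno", colnames(pt))     # column holding the ns(karno, df=4) term
    ord  <- order(va$karno)
    plot(va$karno[ord], pt[ord, kcol], type = "l",
         xlab = "Karnofsky score", ylab = "Contribution to the log hazard")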
    # Testing the PH assumption
    vafit3 <- coxph(Surv(time, status) ~ rx + celltf + karno + months + age + priorrx,
                    data = va)
    zph.vet <- cox.zph(vafit3, transform="log")

    # The output from cox.zph gives
    #  1) the Pearson correlation between the scaled Schoenfeld residuals
    #     and g(t) for each covariate
    #  2) a test statistic for each covariate
    #  3) a global test statistic over all covariates

                 rho  chisq        p
    rx       -0.0156   0.04 0.841485
    celltf2   0.0527   0.44 0.506927
    celltf3   0.1628   3.90 0.048430
    celltf4   0.1873   4.92 0.026521
    karno     0.2933  11.88 0.000566
    months    0.1132   1.70 0.192913
    age       0.2098   6.59 0.010245
    priorrx  -0.1668   3.99 0.045842
    GLOBAL        NA  27.53 0.000572

    # Plot the scaled Schoenfeld residuals against the transformed time scale.
    plot(zph.vet[5])   # karno
    plot(zph.vet[7])   # age
    plot(zph.vet[1])   # rx

[Figures: smoothed scaled Schoenfeld residuals, Beta(t), for rx, karno, and age
plotted against log time]

    # Compute dfbeta (score) residuals
    dfbeta <- resid(vafit3, type="dfbeta")
    n <- length(va$age)
    va$id <- seq(1,n)
    plot(va$id, dfbeta[,1], ylab="Influence for RX", xlab="subject")
    plot(va$id, dfbeta[,5], ylab="Influence for Karnofsky", xlab="subject")
    plot(va$id, dfbeta[,7], ylab="Influence for Age", xlab="subject")

[Figures: dfbeta (influence) values for RX, Karnofsky score, and age, plotted by subject]

Strategies for non-proportional hazards

• Does it matter?
• Check for outliers or bad data

Some strategies if non-proportionality is substantial and real:

• Stratify: incorporate covariates with non-proportional effects into the model as stratification factors rather than regressors,

      hs(t, x, β) = h0s(t) exp(x'β),   s = 1, ..., S strata

  – With PHREG in SAS, use the STRATA statement
  – In Splus or R use coxph(Surv(time, status) ~ rx + size + number + strata(center))
  – Could allow β to vary by stratum:

      hs(t, x, βs) = h0s(t) exp(x'βs),   s = 1, ..., S

  – Drawbacks:
    – No simple test for the significance of the stratification factors on survival
    – Stratum boundaries must be selected for continuous factors
    – Too few strata may not sufficiently reduce bias
    – Too many strata may result in an unwarranted loss of efficiency

• Partition time into segments with proportional hazards
• Introduce time-dependent covariates
• Use other types of models

  – Accelerated failure time (AFT) models
    – The survival probability of an individual with covariate X at time t is the baseline survival probability evaluated at time t exp(Xβ): the time scale is expanded or contracted by the factor exp(Xβ), called the acceleration factor
    – Survivor function: S(t|X) = S0(t e^{Xβ})
    – Hazard function: h(t|X) = exp(Xβ) h0(t e^{Xβ})
    – Cumulative hazard: Λ(t|X) = Λ0(t e^{Xβ})
    – Implies a linear model for the log of the failure time, log(T) = μ0 − Xβ + σW, where S0(t) is the survivor function of the random variable exp(μ0 + σW) and W is a mean-zero random variable; typical distributional forms are gamma, Gaussian, and extreme value (the latter corresponds to a Weibull distribution for T)
    – Common in industrial applications
    – Also used for biological data related to cumulative effects (such as toxicity)
    – Fit with proc lifereg in SAS (no need to log-transform the data); sample code, with an R analogue sketched after this list:

        proc lifereg data= ;
          model time*status(0) = covariates / distribution = weibull;
          class ;
          output ;
        run;

  – Additive hazard models
    – A linear model for the hazard: λi(t) = λ0(t) + Xiβ
    – Often used in epidemiologic applications where λ0(t) is the baseline mortality of the population and β measures the excess risk for the patients
    – Can produce a negative predicted hazard
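As referenced above, a minimal R sketch of the corresponding AFT fit (survreg is the usual R/S-Plus analogue of proc lifereg; the covariates chosen for the VA data are only illustrative):

    library(survival)
    # Weibull AFT model for the VA data; exp(coef) is a time ratio
    # (acceleration factor), not a hazard ratio.
    aft <- survreg(Surv(time, status) ~ karno + age + celltf,
                   data = va, dist = "weibull")
    summary(aft)
    exp(coef(aft))   # multiplicative effects on survival time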
Correlated Survival Times

Examples:
• Times to recurrent infections on the same patient
• Times to rehospitalization and time to death for cancer patients
• Failure times for angioplasty treatments when two different angioplasty procedures are used on each subject
• Pregnant mice are exposed to different levels of a suspected toxin and the time to tumor appearance is measured for each littermate

Some strategies:
• Analyze the time to the first event
  – Simple
  – Loses information
• Marginal approach
  – Initially ignore any correlations and base estimation on a partial likelihood that incorrectly assumes all times are independent
  – Use robust covariance estimation
• Frailty models: include random effects to account for the correlations

Marginal Approach

Mantel et al. (1977, Cancer Research, 37, 3863-3868) reported times to tumor in a study with 50 litters of rats, with three rats per litter. Within each litter, one of the three rats was exposed to a potential tumor-promoting agent. Forty of the 150 rats developed tumors during the follow-up period. Let x = 1 for exposed rats and x = 0 for the other two rats in each litter.

• Proportional hazards model: hij(t) = h0(t) exp(β xij)
• Assume completely independent responses to obtain β̂
• The inverse of the information matrix for the partial likelihood based on completely independent responses (call it V) is a biased estimator of Var(β̂)
• Use a robust covariance estimator V W V, where W is a consistent estimator of the covariance matrix of the score function (a small R sketch of this sandwich estimator follows the SAS output below)

    # Splus code to fit the Cox model to data from 50 rat litters
    # This code is posted as rats.ssc
    #
    # Variables
    #   Litter number: even litter numbers are males,
    #                  odd litter numbers are females
    #   Exposure: 0=no, 1=yes
    #   Time: follow-up time
    #   Status: 0-tumor, 1-censored

    # Enter the data into a data frame.
    rats <- read.table("c:/stat565/rats.dat", header=F,
                       col.names=c("litter", "rx", "time", "status"))

    fitm <- coxph(Surv(time, status) ~ rx + cluster(litter),
                  data = rats, robust=T, method="efron")
    summary(fitm)

    Call:
    coxph(formula = Surv(time, status) ~ rx + cluster(litter),
          data = rats, method = "efron", robust = T)

      n= 150
        coef exp(coef) se(coef) robust se    z      p
    rx 0.905      2.47    0.318     0.303 2.99 0.0028

       exp(coef) exp(-coef) lower .95 upper .95
    rx      2.47      0.405      1.37      4.47

    Rsquare= 0.052   (max possible= 0.916 )
    Likelihood ratio test= 7.98  on 1 df,   p=0.00474
    Wald test            = 8.94  on 1 df,   p=0.00278
    Score (logrank) test = 8.68  on 1 df,   p=0.00322
    Robust               = 7.65            p=0.00569

(Note: the likelihood ratio and score tests assume independence of observations within a cluster; the Wald and robust score tests do not.)

    /* SAS code for the rat data */
    /* Variables
       Litter number: even litter numbers are males,
                      odd litter numbers are female
       Exposure: 0=no, 1=yes
       Time: Follow-up time
       Status: 0-tumor, 1-censored
    */
    data rats;
      infile 'c:\stat565\rats.dat';
      input litter rx time status;
    run;

    %include 'c:\stat565\phlev.sas';

    %phlev(data=rats, time=time, event=status, xvars = rx, id=litter,
           outlev=phlev, outvar=phvar);

    The PHREG Procedure
    Analysis of Maximum Likelihood Estimates

                Parameter      Std                              Hazard
    Variable DF  Estimate    Error   Chi-Square   Pr>ChiSq       Ratio
    rx        1   0.89823  0.31740       8.0087     0.0047       2.455

    Comparison of the Cox model beta, SE, and chi-square to the robust estimates
    (the Wald chi-square is based on the robust estimates)

          Parameter            Robust     Chi-    Robust      Wald
    Var    Estimate      SE        SE   Square    Chi-Sq    Chi-Sq         P
    rx       0.8982  0.3174    0.3137    8.009     8.197         .    0.0042
    wald          .       .         .        .         .     8.197    0.0042
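As noted above, the robust (sandwich) variance that coxph reports with cluster(litter) can be computed directly from the dfbeta residuals summed within litters. A minimal sketch (object names are hypothetical):

    library(survival)
    # Naive fit that (incorrectly) treats all 150 rats as independent
    fit.indep <- coxph(Surv(time, status) ~ rx, data = rats)
    dfb   <- resid(fit.indep, type = "dfbeta")   # per-rat influence on beta-hat
    g     <- rowsum(dfb, rats$litter)            # sum the dfbetas within each litter
    V.rob <- t(g) %*% g                          # grouped sandwich (V W V) estimate
    sqrt(diag(vcov(fit.indep)))   # naive (model-based) standard error
    sqrt(diag(V.rob))             # robust standard error, close to the "robust se" above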
Frailty Models

• Incorporate random effects into the hazard function to account for dependence within 'groups' of observations, e.g.,
  – littermates
  – multiple procedures on the same subject
• Assume the random effects are independent of the censoring process
• Model: hij(t) = h0(t) exp(ωi + xij'β), where ω1, ..., ωg are random effects
  – Gamma frailty model: the ωi's are distributed as the logs of iid gamma random variables with variance θ; the within-group correlation is θ/(2 + θ)
  – Gaussian frailty model: the ωi's are distributed as iid Gaussian random variables with mean zero and variance θ

• Splus code (a gamma frailty is the default):

    fitf <- coxph(Surv(time, status) ~ rx + frailty(litter), data=rats)

    Call:
    coxph(formula = Surv(time, status) ~ rx + frailty(litter),
          data = rats, method = "efron")

      n= 150
                     coef se(coef)   se2 Chisq   DF      p
    rx              0.914    0.323 0.319  8.01  1.0 0.0046
    frailty(litter)                      17.69 14.4 0.2400

       exp(coef) exp(-coef) lower .95 upper .95
    rx       2.5      0.401      1.32       4.7

    Iterations: 6 outer, 19 Newton-Raphson
    Variance of random effect= 0.499   I-likelihood = -180.8
    Degrees of freedom for terms= 1.0 14.4
    Rsquare= 0.222   (max possible= 0.916 )
    Likelihood ratio test= 37.6  on 15.38 df,   p=0.00124
    Wald test            = 8.01  on 15.38 df,   p=0.934

• For a Gaussian frailty model:

    fitf <- coxph(Surv(time, status) ~ rx + frailty(litter, dist='gaussian'),
                  data=rats)

Extensions:
• Covariates measured with error (e.g., see Hu, Tsiatis, and Davidian, 1998, Biometrics)
• Competing risks (e.g., see Pepe and Mori, 1993, Statistics in Medicine)
• Left-censored and interval-censored data (e.g., see Finkelstein and Wolfe, 1985, Biometrics)
• Recurrent events (Therneau and Grambsch)
• Ordered multiple events (Therneau and Grambsch; Collett, Chapter 11)