Power Analysis

• α = Type I error level

• 1 − β = power

• Nonbinary covariate denoted by X

• The variance of X is σ²

• Length of the accrual period is T0 time units

• Recruit r individuals per time unit

• Length of the follow-up period is T

• Proportional hazards:

  h(t) = h0(t) exp(βX)

• Baseline survivor function is S0(t) when X = 0

• Required number of deaths is

  D0 = (z_{1−α} + z_{1−β})² (σ²β²)^{−1}

• Suppose there are k covariates X1, X2, ..., Xk; then the required number
  of deaths is approximately

  D = D0/(1 − R²)

  where R² is the square of the multiple correlation coefficient for
  regressing one covariate on the others

Hsieh and Lavori (2000, Controlled Clinical Trials, 21, 552-560)
• Probability of death before the end of the study for a patient who enters
  the study at time t with covariate X = x is

  1 − [S0(T0 − t + T)]^{exp(βx)}

• Suppose X has a distribution with density f(x)

• The probability that a randomly selected patient dies before the end of
  the study is

  (1/T0) ∫_{t=0}^{T0} ∫_{x=−∞}^{∞} f(x) (1 − [S0(T0 + T − t)]^{exp(βx)}) dx dt

  = (1/T0) ∫_{t=T}^{T0+T} ∫_{x=−∞}^{∞} f(x) (1 − [S0(t)]^{exp(βx)}) dx dt

• Substitute this for D in the Hsieh and Lavori formula:

  (N/T0) ∫_{t=T}^{T0+T} ∫_{x=−∞}^{∞} f(x) (1 − [S0(t)]^{exp(βx)}) dx dt
      = (z_{1−α} + z_{1−β})² (σ²β²)^{−1}

  and solve for N.

• The expected number of deaths is

  (N/T0) ∫_{t=T}^{T0+T} ∫_{x=−∞}^{∞} f(x) (1 − [S0(t)]^{exp(βx)}) dx dt

  where N = rT0 is the sample size.
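These integrals are easy to evaluate numerically. Below is a minimal R sketch
of the calculation, assuming (as in the PHPOW example later in these notes) a
normal covariate with mean 0 and an exponential baseline survivor function.
The function and argument names are mine, not part of the PHPOW macro, and
the macro may differ in detail (for example, one- versus two-sided α).

# Minimal R sketch of the sample size calculation above (not the PHPOW macro).
# Assumes S0(t) = exp(-h0*t) and X ~ N(0, sigma2); the slide's one-sided
# z_{1-alpha} is used, so the result may differ from PHPOW's.
phpow.n <- function(T0, Tfu, alpha = 0.05, power = 0.80,
                    sigma2 = 1, delta = 1.3, h0 = 0.4) {
  beta <- log(delta)
  # required number of deaths: D0 = (z_{1-a} + z_{1-b})^2 / (sigma^2 beta^2)
  D0 <- (qnorm(1 - alpha) + qnorm(power))^2 / (sigma2 * beta^2)
  # P(death before end of study)
  #   = (1/T0) * int_{Tfu}^{T0+Tfu} int_x f(x) (1 - S0(t)^exp(beta*x)) dx dt
  inner <- function(t)
    integrate(function(x) dnorm(x, sd = sqrt(sigma2)) *
                          (1 - exp(-h0 * t)^exp(beta * x)),
              -Inf, Inf)$value
  pdeath <- integrate(Vectorize(inner), Tfu, T0 + Tfu)$value / T0
  ceiling(D0 / pdeath)   # N so that expected deaths >= required deaths
}

phpow.n(T0 = 3, Tfu = 2)   # same design values as the PHPOW example below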
The PHPOW macro:

The code appears on page 151 of SAS(R) Survival Analysis Techniques for
Medical Research, Second Edition, by Alan B. Cantor.

%PHPOW(T= , TAU= , ALPHA=.05,
       N= , POWER= , VAR= , DELTA= ,
       S0= );

where

T is the accrual time
TAU is the follow-up time
ALPHA is the Type I error level (default=.05)
N is the sample size
POWER is the power
VAR is the variance of the covariate
DELTA is the value of exp(β)
S0 is the formula for the baseline survival function; it should be written
   as a function of the variable "time"
Example:

Suppose you are designing a randomized clinical trial to study the effect of
a variable X on a survival distribution. Suppose we plan to

• accrue 120 patients over three years

• have an additional 2-year follow-up period

• use Type I error level .05

• assume a baseline exponential distribution with constant hazard of
  about 0.4

• estimate power for a hazard ratio of exp(β) = 1.3

• To obtain a power value, omit POWER= from PHPOW

• To obtain a sample size, omit N= from PHPOW

• PHPOW assumes that f(x) is a normal density with mean 0 and variance σ²
/* Example of the PHPOW macro */

%include 'c:\st565\sas\phpow.macro.sas';

/* Compute power for a fixed sample size */
%phpow(T=3, tau=2, n=120, var=1, delta=1.3,
       s0 = exp(-.4*time))

Alpha               =  0.05
Hazard Ratio        =  1.3
Accrual Time        =  3
Followup Time       =  2
Covariate Variance  =  1
Baseline Survival   =  exp(-.4*time)
Sample Size         =  120
Power(Calculated)   =  0.7

/* Now compute a sample size */
%phpow(T=3, tau=2, power=.8, var=1, delta=1.3,
       s0 = exp(-.4*time))

Alpha                    =  0.05
Hazard Ratio             =  1.3
Accrual Time             =  3
Followup Time            =  2
Covariate Variance       =  1
Baseline Survival        =  exp(-.4*time)
Power                    =  0.8
Sample Size(Calculated)  =  155
Dealing with Missing Values

• Suppose we have a proportional hazards model with three covariates

  hi(t) = h0(t) exp(β1X1i + β2X2i + β3X3i)

• PHREG in SAS and coxph in R delete any case with a missing value for any
  of the covariates

• Multiple imputation

  – Generate 5 samples of complete data

  – Use an MCMC approach that samples from a joint posterior distribution
    for the parameters and the missing covariate values given the observed
    data

• Fit the proportional hazards model to each of the 5 completed data sets

  – estimate parameters

  – estimate the covariance matrix using the inverse of the information
    matrix for the completed data set (biased toward zero)

• Use the average of the 5 sets of parameter estimates

• Adjust the average of the 5 covariance matrices using the variation in
  the 5 sets of parameter estimates
V(β̂_MI) = (average within-imputation covariance matrix)
           + (between-imputation variation)

         = (1/5) Σ_{j=1}^{5} V̂(β̂_j)
           + [(5+1)/5] [1/(5−1)] Σ_{j=1}^{5} (β̂_j − β̄)²

proc mi data=set3 out=impute noprint;
  var x1 x2 x3;
run;

proc phreg data=impute outest=out3 covout noprint;
  model time*status(0) = x1 x2 x3;
  by _imputation_;
run;

proc mianalyze data=out3;
  var x1 x2 x3;
run;

Multiple Imputation Parameter Estimates

Parameter   Estimate   Std Error   95% Conf. Limits       DF
x1           0.21924     0.13497   -0.0732   0.5118    12.60
x2           0.31919     0.09746    0.1232   0.5151    48.96
x3          -0.12231     0.03500   -0.1923  -0.0523    36.12
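The combining rule above is simple enough to apply by hand as a check on what
PROC MIANALYZE reports. A minimal R sketch for a single coefficient, assuming
you have collected the 5 point estimates and their within-imputation variances
(the function name is mine):

# Rubin's combining rule for one coefficient over m = 5 imputations.
# est = the 5 point estimates; v = their within-imputation variances (SE^2).
pool.mi <- function(est, v) {
  m    <- length(est)
  qbar <- mean(est)                    # pooled point estimate
  W    <- mean(v)                      # average within-imputation variance
  B    <- sum((est - qbar)^2) / (m - 1)  # between-imputation variance
  Tvar <- W + (1 + 1/m) * B            # total variance, as in the formula above
  c(estimate = qbar, se = sqrt(Tvar))
}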
Multiple Imputation Parameter Estimates

Parameter    Minimum    Maximum
x1           0.07210    0.31359
x2           0.28355    0.38977
x3          -0.14237   -0.09171

Multiple Imputation Parameter Estimates

                       t for H0:
Parameter   Theta0   Parameter=Theta0   Pr>|t|
x1               0               1.62   0.1290
x2               0               3.27   0.0019
x3               0              -3.49   0.0013

Counting Processes

Replaces the pair (T, δ) with the pair of stochastic processes
(N(t), Y(t)), where

• N(t) is the number of observed events in [0, t]

• Y(t) = 1 if the unit is under observation and at risk at time t;
  Y(t) = 0 otherwise

For right censored survival data

• N(t) = I({T ≤ t, δ = 1})

• Y(t) = I({T ≥ t})
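For right censored data the two processes are just indicator functions of the
observed pair (T, δ). A minimal R sketch (the function names are mine):

# Counting process representation of a right censored observation (T, delta):
# N(t) = I(T <= t, delta = 1) and Y(t) = I(T >= t).
N.t <- function(t, T, delta) as.numeric(T <= t & delta == 1)
Y.t <- function(t, T)        as.numeric(T >= t)

# e.g., a subject censored at T = 90 contributes N(t) = 0 for all t and is
# at risk (Y(t) = 1) up through t = 90
N.t(100, T = 90, delta = 0)   # 0
Y.t(60,  T = 90)              # 1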
• Shifts emphasis from modelling the hazard function to modelling the
  intensity or rate of a point process

• The development of the Cox model and of the Kaplan-Meier and Nelson-Aalen
  estimators can easily be formulated in terms of counting processes

• Enables the use of martingale theory to prove results

• Additional details

  – Counting Processes and Survival Analysis by Fleming and Harrington

  – Statistical Models Based on Counting Processes by Andersen, Borgan,
    Gill and Keiding

  – Sections 1.3, 2.2, 2.3 and 3.7 of Modeling Survival Data by
    Therneau and Grambsch

Partial likelihood function:

  L(β) = Π_{i=1}^{n} Π_{t≥0} [ Y_i(t) r_i(β, t) / Σ_{j=1}^{n} Y_j(t) r_j(β, t) ]^{dN_i(t)}

where r_i(β, t) = exp[X_i(t)β], i = 1, 2, ..., n

The log-partial likelihood is

  ℓ(β) = Σ_{i=1}^{n} ∫_0^∞ [ log(Y_i(t) r_i(β, t)) − log( Σ_{j=1}^{n} Y_j(t) r_j(β, t) ) ] dN_i(t)

Counting process martingale for the i-th individual:

  M_i(t) = N_i(t) − ∫_0^t Y_i(s) h_0(s) exp[X_i(s)β] ds
Types of Residuals and Their Uses

• Schoenfeld Residuals: (Collett, p 117)

  r̂_ik = δ_i (x_ik − x̄^w_ik)

  where

  x̄^w_ik = Σ_{j∈R(t_i)} x_jk exp(x_j^T β̂) / Σ_{j∈R(t_i)} exp(x_j^T β̂)

  for the value of the k-th covariate for the i-th subject

  – individual contributions to the derivative of the log partial
    likelihood

  – x̄^w_ik is an estimator of the risk-set conditional mean of the
    covariate

  – sum to zero (follows from the partial likelihood equations)

  – missing for censored subjects

Scaled Schoenfeld Residuals
(Grambsch and Therneau, Biometrika, 1994)

Let the Schoenfeld residuals for the i-th individual be denoted by
r_i = (r_1i, ..., r_ki). The scaled Schoenfeld residuals are the components
of the vector

  r*_i = m V̂ar(β̂) r_i

where m is the observed number of failures.

Useful for detecting departures from the proportional hazards assumption
due to time-dependent coefficients:

  h(t) = h0(t) exp(Xβ(t))

• β(t) for a treatment effect would decrease over time if the treatment
  loses effectiveness relative to a placebo or standard treatment
• Grambsch and Therneau showed that if β̂_j is the estimated coefficient
  from fitting a Cox model with no time-dependent covariates, then

  E(r*_ji) ≈ β_j(t_i) − β̂_j

• Plot the scaled Schoenfeld residuals against some function of time and
  pass a smooth curve through the plot

• Deviations from a horizontal line provide evidence against the
  proportional hazards assumption

Cox-Snell Residuals (Collett, p 112)

  r_CS,i = exp(X_i^T(t_i) β̂) Ĥ_0(t_i)

• Use the Nelson-Aalen estimator for Ĥ_0(t)

• Have an exponential distribution when the proposed model is appropriate

• They are never negative
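In R the Cox-Snell residuals can be computed from the martingale residuals,
since r_CS,i = δ_i − M_i. A minimal sketch, assuming a fitted coxph object
called fit and a 0/1 event indicator status aligned with it (these names are
placeholders, e.g. the vafit model fit to the VA data later in these notes):

# Cox-Snell residuals from a coxph fit: r_CS = delta - martingale residual.
# If the model is adequate these behave like a censored sample from an
# exponential distribution, so their estimated cumulative hazard plotted
# against the residuals should be roughly a 45-degree line.
cs     <- status - resid(fit, type = "martingale")
fit.cs <- survfit(Surv(cs, status) ~ 1)
plot(fit.cs$time, -log(fit.cs$surv),
     xlab = "Cox-Snell residual", ylab = "Estimated cumulative hazard")
abline(0, 1, lty = 2)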
Martingale Residuals (Collett, p 115)

  M_i(t) = N_i(t) − Ê_i(t, β)
         = N_i(t) − ∫_0^t Y_i(s) exp[X_i^T(s)β] dĤ_0(s)

where N_i(t) is a counting process that is equal to zero until failure, then
equal to 1 thereafter.

• Typically evaluated at the end of follow-up:

  M_i(t) = δ_i − Ĥ_i(t) = δ_i + log(Ŝ_i(t))

• Just (observed count) − (predicted count), given the covariate values, the
  length of follow-up, and the history of any time-dependent covariates

• Must sum to zero, but do not have a symmetric distribution

• Individuals with extreme negative or positive values of O−E are poorly fit
  by the model (outliers?)

• Useful for determining the functional form of a covariate

• When covariate xj is of interest:

  1. Fit a model without xj
  2. Plot the martingale residuals versus xj
  3. Pass a smooth curve through the residuals
  4. The functional form is suggested by the smoothed curve (similar to
     partial residual plots in linear regression)

• If necessary, use splines to model the functional form
Deviance residuals: (Collett, p. 116)

  D_i = sign(M̂_i) {−2[M̂_i + δ_i log(δ_i − M̂_i)]}^{1/2}

where δ_i = 1 if the observed time is an actual failure time.

• More nearly symmetrically distributed than martingale residuals, but do
  not sum to zero

• A one-term Taylor series expansion relates them to Pearson residuals;
  they are analogous to deviance residuals in GLMs

• Plot the deviance residuals versus the risk scores

• With light to moderate censoring, they look like normally distributed
  noise

• With heavy censoring, they tend to have a large collection of points near
  zero

• May detect outliers (both those who die sooner and those who die later
  than expected)

Score (dfbeta) Residuals:
(page 118 in Collett, page 85 in T&G)

• A modification of the Schoenfeld residuals; denote them by L̂_i

• Useful for assessing the influence of individual cases

• Useful for robust variance estimation

• dfbeta (in the SAS OUTPUT statement)
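A minimal R sketch of the deviance-residual-versus-risk-score plot described
above, assuming a fitted coxph object called fit:

# Deviance residuals plotted against the risk score (linear predictor)
dres <- resid(fit, type = "deviance")
risk <- predict(fit, type = "lp")
plot(risk, dres, xlab = "Risk score", ylab = "Deviance residual")
abline(h = 0, lty = 2)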
Model Checking: Influence and Poor Fit

• Leverage

  – measure using the score residuals (L̂_i)

  – L̂_ik is a weighted average of the distance of the value x_ik to the
    risk set means x̄^w_jk, where the weights are the changes in the
    martingale residuals

  – a large value (either positive or negative) means the observation is
    far from the mean and isolated

  – RESSCO (in the SAS OUTPUT statement)

• Influence on a particular coefficient

  – measured by the change in the coefficient when subject i is deleted
    from the data

  – Δβ̂_i = (β̂ − β̂_(−i)) ≈ V̂ar(β̂) L̂_i: called scaled score residuals or
    dfbeta residuals (in SAS and Splus)

  – Plot against ID or time

• Overall measure of influence, often called the likelihood displacement
  statistic

  ld_i = (Δβ̂_i)^T [V̂ar(β̂)]^{−1} (Δβ̂_i) = L̂_i^T V̂ar(β̂) L̂_i

  approximately equal to

  ld_i ≈ 2[L_p(β̂) − L_p(β̂_(−i))]

  – plot versus a summary statistic (e.g., the martingale residuals)

  – LD (in the SAS OUTPUT statement)
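All of these quantities can be pulled out of a fitted coxph object. A minimal
R sketch, assuming a fit called fit; the likelihood displacement is assembled
by hand from the score residuals and the estimated covariance matrix:

# Score residuals L_i (one row per subject), dfbeta ~ Var(beta-hat) L_i,
# and the likelihood displacement ld_i = L_i' Var(beta-hat) L_i.
L   <- resid(fit, type = "score")      # n x p matrix of score residuals
dfb <- resid(fit, type = "dfbeta")     # approximate change in beta-hat when
                                       # subject i is dropped
ld  <- rowSums((L %*% vcov(fit)) * L)  # likelihood displacement statistic
plot(ld, xlab = "Subject", ylab = "Likelihood displacement")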
Model Checking
Assessing the PH assumption

• For a single categorical covariate, the logarithms of the cumulative
  hazards should be parallel; i.e., plot log(Ĥ(t)) for each level of the
  covariate (in SAS, use proc lifetest with strata=covariate and
  plots=(lls))

• Include (covariate)×log(time) interactions in the model and do hypothesis
  tests

• Plot the scaled Schoenfeld residuals (with a smooth curve through them)
  for each covariate versus a function of time

Cox model in SAS: VA data

/* SAS code for Kaplan-Meier estimation of survivor
   functions for times from the VA lung cancer trial
   of 137 male patients with inoperable lung cancer */

/* Variables
   Treatment: 1=standard, 2=test (chemotherapy)
   Celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
   Survival in days
   Status: 1=dead, 0=censored
   Karnofsky score: Measures patient performance of
     activities of daily living.

     SCORE   FUNCTION
     100     Normal, no evidence of disease
      90     Able to perform normal activity with only minor symptoms
      80     Able to perform normal activity with effort, some symptoms
      70     Able to care for self but unable to do normal activities
      60     Requires occasional assistance
      50     Requires considerable assistance
      40     Disabled, requires special assistance
      30     Severely disabled
      20     Very sick, requires supportive treatment
      10     Moribund

   Months from Diagnosis
   Age in years
   Prior therapy: 0=no, 10=yes */
data va;
  infile 'c:\st565\data\va.dat';
  input rx cellt time status karno months age prior_rx;
  prior_rx = prior_rx/10;
  if (cellt=1) then celltype='squamous';
  if (cellt=2) then celltype='smallcell';
  if (cellt=3) then celltype='adeno';
  if (cellt=4) then celltype='large';

  /* Create dummy variables for cell types */
  cell1=0;
  cell2=0;
  cell3=0;
  if (cellt=2) then cell1=1;
  if (cellt=3) then cell2=1;
  if (cellt=4) then cell3=1;
  dummy=1;
run;

/* select covariate values where the baseline
   survivor function will be estimated */
data cov;
  input rx cell1 cell2 cell3 karno months age prior_rx;
  datalines;
1 1 0 0 80 6 67 1
0 1 0 0 80 6 67 1
;
run;

proc phreg data=va;
  model time*status(0)= rx cell1 cell2 cell3 karno
        months age prior_rx / ties=efron;
  baseline out=ba xbeta=xbeta stdxbeta=stdxbeta
           survival=surv upper=ucl lower=lcl
           covariates=cov;
run;

proc print data=ba; run;

proc sort data=ba; by rx time; run;

proc gplot data=ba;
  plot surv*time lcl*time ucl*time / overlay;
  symbol1 v=none interpol=step line=1 w=4;
  symbol2 v=none interpol=step line=2 w=4;
  symbol3 v=none interpol=step line=2 w=4;
  by rx;
  title "Estimated Survival Curves";
run;
/* Examination of the functional form of
   covariates using martingale residuals */

proc phreg data=va;
  model time*status(0) = dummy;
  output out=resid_out resmart=mart_res / order=data;
run;

data data_res;
  merge va resid_out;
run;

/* gplot has a built-in smoother with
   smoothness ranging from 0 to 100 */
proc gplot data=data_res;
  plot mart_res*age / haxis=axis2 vaxis=axis1;
  symbol i=sm60s v=dot h=1.2 w=3;
  axis1 label = (h=2 r=0 a=90 f=swiss "Residuals")
        value = (h=2.0 f=swiss);
  axis2 label = (h=2 f=swiss )
        value = (h=2.0 f=swiss);
  label mart_res='Residual';
  title "Martingale Residuals";
run;
/* Test proportional hazards criterion
using scaled Schoenfeld residuals:
Produces both plots and tests */
proc phreg data=va;
model time*status(0) = rx karno age cell1
cell2 cell3 months prior_rx /ties=efron;
output out=schoenb ressch= schrx schkarno schage;
run;
proc gplot data=schoenb;
plot schrx*time schkarno*time schage*time/
haxis=axis2 vaxis=axis1;
symbol value=dot i=sm60s h=1.2 w=3;
axis1 label = (h=2 r=0 a=90 f=swiss )
value = (h=2.0 f=swiss);
axis2 label = (h=2 f=swiss)
value = (h=2.0 f=swiss);
title "Schoenfeld Residuals";
run;
/* Use the SCHOEN macro to produce plots
   of scaled Schoenfeld residuals and tests */
%include 'c:\mydocuments\courses\st565\sas\therneau\schoen.sas';

/* Include the macro DASPLINE for fitting splines.
   This macro is used in the SCHOEN macro */
%include 'c:\mydocuments\courses\st565\sas\therneau\daspline.sas';

%schoen(data=va, time=time, event=status,
        xvars = rx karno age cell1 cell2 cell3 months prior_rx,
        outsch=outs, outbt=scaled, plot=t, points=yes, df=4,
        alpha=.05);
/* Examine dfbeta residuals for influence:
   approximates the change in the coefficient
   estimates when an observation is dropped,
   computed from the score residuals.  The
   information matrix is not adjusted, so
   influence may be underestimated.  However,
   much quicker to compute than jackknife
   residuals (refitting the Cox model for
   each subject). */

proc phreg data=va;
  model time*status(0)= rx karno age cell1
        cell2 cell3 months prior_rx / ties=efron;
  output out=dfout dfbeta = rx_inf karno_inf age_inf;
run;

data dfout;
  set dfout;
  ID = _n_;
run;

proc gplot data=dfout;
  plot rx_inf*ID karno_inf*ID age_inf*ID /
       haxis=axis2 vaxis=axis1;
  symbol1 v=dot h=1.0;
  axis1 label = (h=2 r=0 a=90 f=swiss )
        value = (h=2.0 f=swiss);
  axis2 label = (h=2 f=swiss 'Patient')
        value = (h=2.0 f=swiss);
  title "Dfbeta values";
run;
Cox model in R and SPlus

# Splus code to fit the Cox model to data
# from the VA lung cancer trial of
# 137 male patients with inoperable
# lung cancer.  This code is posted as
# vacoxph.diag.ssc

# Variables
#   Treatment: 1=standard, 2=test (chemotherapy)
#   Celltype: 1=squamous, 2=smallcell,
#             3=adeno, 4=large
#   Survival in days
#   Status: 1=dead, 0=censored
#   Karnofsky score
#   Months from Diagnosis
#   Age in years
#   Prior therapy: 0=no, 10=yes

library(survival)   # needed in R (built in to S-Plus)

# Enter the data into a data frame.
va <- read.table("c:/st565/data/va.dat",
  header=F, col.names=c("rx", "cellt", "time",
  "status", "karno", "months", "age", "priorrx"))
va$rx <- va$rx-1
va$priorrx <- va$priorrx/10
va$celltf <- as.factor(va$cellt)
options(contrasts=c("contr.treatment", "contr.poly"))

vafit <- coxph(Surv(time, status) ~ rx+celltf+karno
               +months + age + priorrx, data = va,
               x=T, y=T, robust=T,
               method="efron", singular.ok=T)

summary(vafit)

# Fit a model with spline terms
library(splines)   # for ns()
va2 <- coxph(Surv(time, status) ~ rx+celltf+ns(karno, df=4)
             +months + ns(age, df=3) + priorrx, data = va,
             method="efron")

Martingale Residuals

# Martingale residuals
mart.res <- resid(vafit)
par(lwd=4, mex=2, fin=c(8,8))
plot(va$age, mart.res, xlab="Age", ylab="Residuals",
     main="Martingale Residuals", cex=1.5, lwd=4)
lines(lowess(va$age, mart.res, iter=0), lty=1)

[Figure: martingale residuals from vafit plotted against Age,
 with a lowess smooth]
# Testing the PH assumption

vafit3 <- coxph(Surv(time,status) ~ rx + celltf + karno +
                months + age + priorrx, data=va)
zph.vet <- cox.zph(vafit3, transform="log")

# output from cox.zph gives
#  1) pearson correlation between scaled schoenfeld
#     residuals and g(t) for each covariate
#  2) test statistic for each test
#  3) global test statistic over all covariates

            rho  chisq        p
rx      -0.0156   0.04 0.841485
celltf2  0.0527   0.44 0.506927
celltf3  0.1628   3.90 0.048430
celltf4  0.1873   4.92 0.026521
karno    0.2933  11.88 0.000566
months   0.1132   1.70 0.192913
age      0.2098   6.59 0.010245
priorrx -0.1668   3.99 0.045842
GLOBAL       NA  27.53 0.000572

# Plot the scaled Schoenfeld residuals against the
# transformed time scale.
plot(zph.vet[5])
plot(zph.vet[7])
plot(zph.vet[1])

[Figure: Beta(t) for rx, with smooth, plotted against time (log scale)]
[Figures: Beta(t) for karno and Beta(t) for age, with smooths, plotted
 against time (log scale)]
type="dfbeta")
ylab="Influence for RX",
ylab="Influence for Karnofsky",
0.02
0.0
dfbeta <- resid(vafit3,
n <- length(va$age)
va$id <- seq(1,n)
plot(va$id, dfbeta[,1],
xlab="subject")
plot(va$id, dfbeta[,5],
xlab="subject")
plot(va$id, dfbeta[,7],
xlab="subject")
-0.04
# Compute dfbeta (score) residuals
Influence for RX
0.04
0.06
428
ylab="Influence for Age",
0
20
40
60
80
100
120
subject
430
431
140
0.002
-0.002
0.0
Influence for Age
0.004
0.002
0.001
0.0
-0.001
Influence for Karnofsky
0
20
40
60
80
100
120
140
0
20
subject
40
60
80
100
120
140
subject
432
Strategies for non-proportional hazards

• Does it matter?

• Check for outliers or bad data

Some strategies if non-proportionality is substantial and real

Stratify: Incorporate covariates with non-proportional effects into the
model as stratification factors rather than regressors

  h_s(t, x, β) = h_0s(t) exp(x^T β)   for s = 1, ..., S strata

• With PHREG in SAS, use the STRATA statement

• In Splus or R use

  coxph(Surv(time, status) ~ rx + size + number
        + strata(center))

• Could allow β to vary by strata

  h_s(t, x, β_s) = h_0s(t) exp(x^T β_s)   for s = 1, ..., S

• Drawbacks

  – No simple test for the significance of the stratification factors on
    survival

  – Stratum boundaries must be selected for continuous factors

  – Too few strata may not sufficiently reduce bias

  – Too many strata may result in unwarranted loss of efficiency

• Partition time into segments with proportional hazards

• Introduce time-dependent covariates

• Use other types of models

• Accelerated Failure Time (AFT) models

  – Expand or contract the time scale by a factor exp(Xβ), called the
    acceleration factor

  – The survival function of an individual with covariate X at time t is
    the same as the baseline survival function evaluated at time t exp(Xβ)

  – Survivor function
    S(t|X) = S0(t e^{Xβ})

  – Hazard function
    h(t|X) = exp(Xβ) h0(t e^{Xβ})

  – Cumulative hazard
    Λ(t|X) = Λ0(t e^{Xβ})
• The AFT model implies a linear model for the log of the failure time

  log(T) = μ0 − Xβ + σW        (1)

  where S0(t) is the survival function of the random variable exp(μ0 + σW)
  and W is a mean-zero random variable; typical distributional forms:
  gamma, Gaussian, extreme value (corresponds to a Weibull distribution
  for T)

• Common in industrial applications

• Also used for biological data related to a cumulative effect (such as
  toxicity)

• proc lifereg in SAS

  Sample code (no need to log-transform the data):

  proc lifereg data= ;
    model time*status(0) = covariates / distribution = weibull;
    class ;
    output ;
  run;

• Additive hazard models

  – A linear model for the hazard

    λi(t) = λ0(t) + Xiβ

  – Often used in epidemiologic applications where λ0(t) is the baseline
    mortality of the population and β measures the excess risk for the
    patients

  – Can give a negative predicted hazard
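For reference, a minimal R sketch that parallels the proc lifereg call above
(a Weibull AFT fit with survreg) and, for the additive-hazards idea, Aalen's
additive regression via aareg. The covariates and the va data frame are taken
from the VA example earlier in these notes, and aareg fits a time-varying-
coefficient version of the additive model rather than the constant-β model
written above.

library(survival)

# Weibull AFT model, the R parallel of proc lifereg
# (no log transform of the response is needed)
aft <- survreg(Surv(time, status) ~ rx + karno + age,
               data = va, dist = "weibull")
summary(aft)

# Aalen's additive regression, a time-varying-coefficient relative of the
# additive hazard model h(t) = h0(t) + X beta
add <- aareg(Surv(time, status) ~ rx + karno + age, data = va)
summary(add)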
Correlated Survival Times

• Times to recurrent infections on the same patient

• Times to rehospitalization and time to death for cancer patients

• Failure times for angioplasty treatments when two different angioplasty
  procedures are used on each subject

• Pregnant mice are exposed to different levels of a suspected toxin and
  time to tumor appearance is measured for each littermate

Some strategies:

• Analyze time to first event

  – Simple

  – Loss of information

• Marginal approach

  – Initially ignore any correlations and base estimation on a partial
    likelihood that incorrectly assumes that all times are independent

  – Robust covariance estimation

• Frailty models: include random effects to account for correlations
Marginal Approach

Mantel, et al. (1977, Cancer Research, 37, 3863-3868) reported times to
tumor in a study with 50 litters of rats, with three rats per litter.
Within each litter, one of the three rats was exposed to a potential tumor
promoting agent. 40 of the 150 rats developed tumors during the follow-up
period. Let x = 1 for exposed rats and x = 0 for the other two rats in each
litter.

• Proportional hazards model:

  h_ij(t) = h0(t) exp(βx_ij)

• Assume completely independent responses to obtain β̂

• The inverse of the information matrix for the partial likelihood function
  based on completely independent responses (call it V) is a biased
  estimator of Var(β̂)

• Use a robust covariance estimator V W V, where W is a consistent
  estimator of the covariance matrix for the score function
# Splus code to fit the Cox model to
# data from 50 rat litters.
# This code is posted as rats.ssc

# Variables
#   Litter number:
#     even litter numbers are males,
#     odd litter numbers are female
#   Exposure: 0=no, 1=yes
#   Time: Follow-up time
#   Status: 1=tumor, 0=censored

# Enter the data into a data frame.
rats <- read.table("c:/stat565/rats.dat",
  header=F, col.names=c("litter", "rx",
  "time", "status"))

fitm <- coxph(Surv(time, status) ~ rx +
              cluster(litter), data = rats,
              robust=T, method="efron")
summary(fitm)

Call:
coxph(formula = Surv(time, status) ~ rx +
      cluster(litter), data = rats,
      method = "efron", robust = T)

  n= 150
    coef exp(coef) se(coef) robust se    z      p
rx 0.905      2.47    0.318     0.303 2.99 0.0028

   exp(coef) exp(-coef) lower .95 upper .95
rx      2.47      0.405      1.37      4.47

Rsquare= 0.052   (max possible= 0.916 )
Likelihood ratio test= 7.98  on 1 df,   p=0.00474
Wald test            = 8.94  on 1 df,   p=0.00278
Score (logrank) test = 8.68  on 1 df,   p=0.00322,   Robust = 7.65  p=0.00569

(Note: the likelihood ratio and score tests assume independence of
observations within a cluster; the Wald and robust score tests do not.)
/* SAS code for the rat data */

/* Variables
   Litter number:
     even litter numbers are males,
     odd litter numbers are female
   Exposure: 0=no, 1=yes
   Time: Follow-up time
   Status: 1=tumor, 0=censored
*/

data rats;
  infile 'c:\stat565\rats.dat';
  input litter rx time status;
run;

%include 'c:\stat565\phlev.sas';

%phlev(data=rats, time=time, event=status,
       xvars = rx, id=litter, outlev=phlev,
       outvar= phvar );

The PHREG Procedure
Analysis of Maximum Likelihood Estimates

             Parameter   Std                              Hazard
Variable  DF  Estimate   Error     Chi-Square  Pr>ChiSq    Ratio
rx         1   0.89823   0.31740       8.0087    0.0047    2.455

Comparison of Cox model Beta, SE, and chi-square to robust estimates
(the Wald chi-square is based on the robust estimates)

       Parameter            Robust     Chi-   Robust     Wald
Var     Estimate      SE        SE   Square   Chi-Sq   Chi-Sq        P
rx        0.8982  0.3174    0.3137    8.009    8.197        .   0.0042
wald           .       .         .        .        .    8.197   0.0042
Frailty models

• Incorporate random effects into the hazard function to account for
  dependence within 'groups' of observations

  – littermates

  – multiple procedures on the same subject

• Model: h_ij(t) = h0(t) exp(ω_i + x^T_ij β),
  where ω_1, ..., ω_g are random effects.

  – Gamma frailty model: The ω_i's are distributed as logs of iid gamma
    random variables with variance θ. Within-group correlation is θ/(2 + θ)

  – Gaussian frailty model: The ω_i's are distributed as iid Gaussian
    random variables with mean zero and variance θ.

• Assume the random effects are independent of the censoring process

• Splus code

  fitf <- coxph(Surv(time, status) ~ rx +
                frailty(litter), data=rats)

Call:
coxph(formula = Surv(time, status) ~ rx +
      frailty(litter), data = rats,
      method = "efron")

  n= 150
                 coef se(coef)   se2 Chisq   DF      p
rx              0.914    0.323 0.319  8.01  1.0 0.0046
frailty(litter)                      17.69 14.4 0.2400

   exp(coef) exp(-coef) lower .95 upper .95
rx       2.5      0.401      1.32       4.7

Iterations: 6 outer, 19 Newton-Raphson
Variance of random effect= 0.499   I-likelihood = -180.8
Degrees of freedom for terms= 1.0 14.4
Rsquare= 0.222   (max possible= 0.916 )
Likelihood ratio test= 37.6  on 15.38 df,   p=0.00124
Wald test            = 8.01  on 15.38 df,   p=0.934

For a Gaussian frailty:

  fitf <- coxph(Surv(time, status) ~ rx +
                frailty(litter, dist='gaussian'),
                data=rats)
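A Gaussian litter frailty can also be fit as a Gaussian random effect with
the coxme package; a hedged alternative sketch (coxme is a separate package,
not part of the code posted for this course):

# Gaussian random effect (frailty) for litter via the coxme package
library(coxme)
fit.me <- coxme(Surv(time, status) ~ rx + (1 | litter), data = rats)
fit.me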
Extensions:

• Covariates measured with error (e.g., see Hu, Tsiatis, and Davidian,
  1998, Biometrics)

• Competing risks (e.g., see Pepe and Mori, 1993, Statistics in Medicine)

• Left censored and interval censored data (e.g., see Finkelstein and
  Wolfe, 1985, Biometrics)

• Recurrent events (T&G)

• Ordered multiple events (T&G; Collett, Chapter 11)