Model Checking in the Cox PH model

Model Checking in the Proportional Hazards Model
• Model adequacy
• Influential observations
• PH assumption
• Example: leaders data
• 472 observations in total
• 11 covariates fitted in the initial PH model
• Variables: manner, start, age, conflict, loginc, region
Assessing model adequacy
(1) Residuals checking:
• Cox-Snell (& modified)
• Martingale
• Deviance
• Schoenfeld
• Score
(2) Residual Plots
(a) Cox-Snell residual plot: $-\log \hat{S}(r_{Ci})$ vs. $r_{Ci}$
If the model fits well, the plot yields a straight line through the origin with unit slope
[Figure: Cox-Snell residual plot; y-axis: negative log SDF, x-axis: Cox-Snell residual (both from 0 to 3.0)]
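A minimal sketch of how the points of the Cox-Snell plot could be computed, assuming the linear predictors $\hat\beta' x_i$ are already available from a fitted model. It uses the Breslow estimate of the baseline cumulative hazard, so that $r_{Ci} = \exp(\hat\beta' x_i)\,\hat H_0(t_i)$, and, as an assumption, the Nelson-Aalen cumulative hazard of the residuals as the estimate of $-\log \hat S(r_{Ci})$. The function names and toy data are hypothetical.

```python
import math

def breslow_baseline_cumhaz(times, events, eta):
    """Breslow estimate of the baseline cumulative hazard H0(t),
    evaluated at each subject's own time t_i.
    eta[i] is the linear predictor beta'x_i from an already fitted model."""
    n = len(times)
    increments = []
    for s in sorted({t for t, d in zip(times, events) if d == 1}):
        deaths = sum(1 for t, d in zip(times, events) if t == s and d == 1)
        risk = sum(math.exp(eta[l]) for l in range(n) if times[l] >= s)
        increments.append((s, deaths / risk))
    return [sum(inc for s, inc in increments if s <= t) for t in times]

def cox_snell_residuals(times, events, eta):
    """r_Ci = exp(eta_i) * H0_hat(t_i)."""
    h0 = breslow_baseline_cumhaz(times, events, eta)
    return [math.exp(e) * h for e, h in zip(eta, h0)]

def neg_log_survival_of_residuals(resid, events):
    """Nelson-Aalen cumulative hazard of the residuals, treating them as
    censored 'survival times' (censoring carried over from delta_i).
    Used here as the estimate of -log S(r_Ci) at each residual."""
    event_vals = sorted({r for r, d in zip(resid, events) if d == 1})
    out = []
    for v in resid:
        h = 0.0
        for s in event_vals:
            if s > v:
                break
            deaths = sum(1 for r, d in zip(resid, events) if r == s and d == 1)
            at_risk = sum(1 for r in resid if r >= s)
            h += deaths / at_risk
        out.append(h)
    return out

# Hypothetical toy data: survival times, event indicators, linear predictors
times = [2, 3, 5, 7, 11]
events = [1, 0, 1, 1, 0]
eta = [0.0] * 5
r_c = cox_snell_residuals(times, events, eta)
points = list(zip(r_c, neg_log_survival_of_residuals(r_c, events)))
```

With every linear predictor set to zero, the residuals reduce to Nelson-Aalen estimates and the points should fall on the unit-slope line, which gives a simple sanity check of the computation.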
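(b) Index plot of martingale and deviance residuals
Martingale residuals, $r_{Mi} = \delta_i - r_{Ci}$, where $\delta_i$ is the event indicator (1 = death, 0 = censored):
• take values between $-\infty$ and 1
• in large samples, are uncorrelated with each other and have mean zero
• can be interpreted as the difference between the observed and expected number of deaths in $(0, t_i)$ for the ith individual
Deviance residuals:
$r_{Di} = \operatorname{sgn}(r_{Mi})\left[-2\left\{r_{Mi} + \delta_i \log(\delta_i - r_{Mi})\right\}\right]^{1/2}$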
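The two definitions can be sketched directly from given Cox-Snell residuals. This is a minimal illustration, assuming the Cox-Snell residuals have already been computed; the function names and values are hypothetical. Note that $\delta_i - r_{Mi}$ is just $r_{Ci}$, and the log term drops out for censored observations.

```python
import math

def martingale_residuals(events, cox_snell):
    """r_Mi = delta_i - r_Ci, with delta_i = 1 for a death, 0 if censored."""
    return [d - rc for d, rc in zip(events, cox_snell)]

def deviance_residuals(events, cox_snell):
    """r_Di = sgn(r_Mi) * [-2 {r_Mi + delta_i log(delta_i - r_Mi)}]^(1/2)."""
    out = []
    for d, rc in zip(events, cox_snell):
        r_m = d - rc
        # delta_i - r_Mi equals r_Ci; the log term is 0 for censored cases
        log_term = math.log(d - r_m) if d == 1 else 0.0
        dev = -2.0 * (r_m + log_term)
        # max() guards against tiny negative values from rounding
        out.append(math.copysign(math.sqrt(max(dev, 0.0)), r_m))
    return out

# Hypothetical Cox-Snell residuals for one death and one censored case
events = [1, 0]
r_c = [0.5, 0.5]
r_m = martingale_residuals(events, r_c)   # [0.5, -0.5]
r_d = deviance_residuals(events, r_c)
```

The censored case gets a martingale residual of $-0.5$ and a deviance residual of $-1$; the square-root transformation is what makes deviance residuals more symmetric about zero than the martingale residuals, which are bounded above by 1.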
(c) Deviance residuals vs. risk score
• Deviance residuals are a transformation of the martingale residuals
• They are more symmetrically distributed about zero
• The smaller the residual, the better the observation is fitted by the model
• Risk score: the linear predictor $\beta' x_i$ in $h_i(t) = h_0(t)\exp(\beta' x_i)$
• Large negative values indicate a lower than average risk of failure
• Large positive values indicate a higher than average risk of failure
[Figure: Plot of the deviance residuals against the linear predictor; y-axis: deviance residual, x-axis: linear predictor]
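A small sketch of the risk score and of the points behind this plot, assuming coefficient estimates are already available. Comparing each subject with the average linear predictor, via $\exp(\eta_i - \bar\eta)$, is one way to read "higher or lower than average risk"; that comparison, the function names, and the numbers are illustrative assumptions.

```python
import math

def risk_scores(beta, X):
    """Linear predictor (risk score) beta'x_i for each covariate row x_i."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def hazard_ratio_vs_average(eta):
    """exp(eta_i - mean(eta)): above 1 means a higher than average risk
    of failure, below 1 a lower than average risk."""
    mean_eta = sum(eta) / len(eta)
    return [math.exp(e - mean_eta) for e in eta]

def deviance_vs_risk_points(eta, dev):
    """(risk score, deviance residual) pairs sorted by risk score,
    i.e. the points of the deviance-residual plot."""
    return sorted(zip(eta, dev))

# Hypothetical coefficients and covariate rows (two covariates)
beta = [0.5, -1.0]
X = [[1, 0], [0, 1], [2, 1]]
eta = risk_scores(beta, X)          # [0.5, -1.0, 0.0]
hr = hazard_ratio_vs_average(eta)
```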
(3) Checking the functional form of a covariate
– Compute the martingale residuals from fitting the null model (which contains no covariates)
– Plot the residuals against the covariate of interest
– A straight line indicates that a linear term is appropriate (in practice, a smoother is usually needed to reveal the trend)
[Figure: Martingale residuals from the null model vs. start; y-axis: martingale residual for the null model, x-axis: start]
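The null-model residuals need no fitting at all: with no covariates, the Breslow estimate reduces to the Nelson-Aalen estimate, so $r_{Mi} = \delta_i - \hat H(t_i)$. The sketch below computes these residuals and smooths them against a covariate. The running-mean smoother is a simple stand-in for the lowess-type smoother usually used; the function names, window size, and data are hypothetical.

```python
def null_martingale_residuals(times, events):
    """Martingale residuals from the null model (no covariates):
    delta_i - H_hat(t_i), where H_hat is the Nelson-Aalen estimate
    (the Breslow estimate with every linear predictor equal to 0)."""
    increments = []
    for s in sorted({t for t, d in zip(times, events) if d == 1}):
        deaths = sum(1 for t, d in zip(times, events) if t == s and d == 1)
        at_risk = sum(1 for t in times if t >= s)
        increments.append((s, deaths / at_risk))
    cumhaz = [sum(inc for s, inc in increments if s <= t) for t in times]
    return [d - h for d, h in zip(events, cumhaz)]

def running_mean_smoother(x, y, window=3):
    """Sort by the covariate and average the residuals over a centred
    window; a simple stand-in for a lowess-type smoother."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    half = window // 2
    smooth = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        smooth.append(sum(ys[lo:hi]) / (hi - lo))
    return xs, smooth

# Hypothetical data: times, event indicators and a covariate of interest
times = [2, 3, 5, 7, 11]
events = [1, 0, 1, 1, 0]
covariate = [3, 1, 5, 2, 4]
resid = null_martingale_residuals(times, events)
xs, trend = running_mean_smoother(covariate, resid)
```

Plotting `trend` against `xs` gives the smoothed curve; a roughly straight trend suggests the covariate can enter the model linearly.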
Identifying influential observations
• Delta-beta statistics: $\Delta_i \hat\beta_j = \hat\beta_j - \hat\beta_{j(i)}$
– the difference between the parameter estimate from the fit using all observations and the estimate with the ith observation omitted from the fit
– assess the influence of each observation on a parameter estimate
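The definition can be applied literally by refitting the model once per omitted observation. The sketch below does this for a one-covariate Cox model fitted by Newton-Raphson on the partial likelihood (Breslow handling of ties). The "Approx." values in SAS output are approximations to these quantities rather than refits; this exact version is only practical for small data. Function names and toy data are hypothetical.

```python
import math

def cox_score_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the one-covariate Cox
    partial likelihood (Breslow handling of ties)."""
    score, info = 0.0, 0.0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        risk = [l for l in range(len(times)) if times[l] >= times[i]]
        w = [math.exp(beta * x[l]) for l in risk]
        s0 = sum(w)
        s1 = sum(wl * x[l] for wl, l in zip(w, risk))
        s2 = sum(wl * x[l] ** 2 for wl, l in zip(w, risk))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def cox_fit_1d(times, events, x, iters=25):
    """Newton-Raphson maximisation of the partial likelihood from beta = 0."""
    beta = 0.0
    for _ in range(iters):
        score, info = cox_score_info(beta, times, events, x)
        beta += score / info
    return beta

def delta_betas(times, events, x):
    """Exact delta-beta: beta_hat - beta_hat_(i), refitting without obs i."""
    full = cox_fit_1d(times, events, x)
    out = []
    for i in range(len(times)):
        keep = [k for k in range(len(times)) if k != i]
        beta_i = cox_fit_1d([times[k] for k in keep],
                            [events[k] for k in keep],
                            [x[k] for k in keep])
        out.append(full - beta_i)
    return out

# Hypothetical toy data with a binary covariate
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1]
db = delta_betas(times, events, x)
```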
SAS output
Approx. delta-betas for manner: Extreme Observations
      Lowest                 Highest
   Value         Obs      Value        Obs
  -0.0300321     342      0.0109460    180
  -0.0263841     234      0.0121594    212
  -0.0180988     428      0.0128754    403
  -0.0168336     141      0.0141662    258
  -0.0156685     247      0.0154732    286

Approx. LDi: Extreme Observations
      Lowest                 Highest
   Value         Obs      Value        Obs
   6.21925E-05   302      0.0838694    234
   6.33683E-05   345      0.0973350    286
   7.60771E-05   346      0.1086945     54
   7.62382E-05   395      0.1152216    342
   8.00260E-05   373      0.1250329    449
• Likelihood displacement (LD)
$LD_i = 2\{\log L(\hat\beta) - \log L(\hat\beta_{(i)})\}$
– the change in the maximized log-likelihood when the ith observation is omitted from the fit
– assesses the influence of observations on the overall fit (the whole set of parameter estimates)
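A sketch of the exact likelihood displacement for a one-covariate Cox model, under the same assumptions as a small refit-based example: Breslow ties, Newton-Raphson from $\beta = 0$, and both log-likelihoods evaluated on the full data. Because $\hat\beta$ maximizes the full-data partial likelihood, every $LD_i$ should be non-negative. The SAS "Approx." LD values approximate this quantity without refitting; function names and data here are hypothetical.

```python
import math

def log_partial_lik(beta, times, events, x):
    """Log partial likelihood of a one-covariate Cox model (Breslow ties)."""
    total = 0.0
    for i in range(len(times)):
        if events[i] == 1:
            risk_sum = sum(math.exp(beta * x[l])
                           for l in range(len(times)) if times[l] >= times[i])
            total += beta * x[i] - math.log(risk_sum)
    return total

def fit_1d(times, events, x, iters=25):
    """Newton-Raphson fit of beta for the same model, starting from 0."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(len(times)):
            if events[i] != 1:
                continue
            risk = [l for l in range(len(times)) if times[l] >= times[i]]
            w = [math.exp(beta * x[l]) for l in risk]
            s0 = sum(w)
            m1 = sum(wl * x[l] for wl, l in zip(w, risk)) / s0
            m2 = sum(wl * x[l] ** 2 for wl, l in zip(w, risk)) / s0
            score += x[i] - m1
            info += m2 - m1 ** 2
        beta += score / info
    return beta

def likelihood_displacement(times, events, x):
    """LD_i = 2{log L(beta_hat) - log L(beta_hat_(i))}, both evaluated on
    the full data; beta_hat_(i) is refitted without observation i."""
    beta_hat = fit_1d(times, events, x)
    full_ll = log_partial_lik(beta_hat, times, events, x)
    out = []
    for i in range(len(times)):
        keep = [k for k in range(len(times)) if k != i]
        beta_i = fit_1d([times[k] for k in keep], [events[k] for k in keep],
                        [x[k] for k in keep])
        out.append(2.0 * (full_ll - log_partial_lik(beta_i, times, events, x)))
    return out

# Hypothetical toy data with a binary covariate
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1]
ld = likelihood_displacement(times, events, x)
```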
• Treatment of influential observations:
– Check the original data and correct any mistakes found
– Make inferences based on both the full and the reduced data, and contrast the results
Testing the PH assumption
• A crucial assumption when using the Cox model
• The effect of the covariates on the hazard rate is the same over time
Before fitting a Cox model
– Group the data according to the levels of one or more factors
– Plot $\log[-\log \hat{S}(t)]$ vs. $\log(t)$ for each group
– The curves are parallel if the hazards are proportional across the different groups
– In SAS, use the strata option
Checking PH assumption for manner
[Figure: plot of log(−log(surv)) (lls) vs. log t (logy), one curve per level of manner]
Checking PH assumption for region
[Figure: plot of log(−log(surv)) (lls) vs. log t (logy), one curve per level of region]
After fitting a Cox model
• Plot weighted Schoenfeld residuals vs. time
$E(r^*_{Pji}) \approx \beta_j(t_i) - \hat\beta_j$
where $\beta_j(t_i)$ is the time-varying coefficient at the ith failure time and $\hat\beta_j$ is the estimate in the fitted Cox PH model
Plot weighted Schoenfeld residuals vs. time (cont'd)
$r^*_{Pji} + \hat\beta_j$ vs. $t_i$
– Detects whether some form of time dependency exists for a particular covariate
– A horizontal line indicates that the coefficient is constant over time, so the PH assumption is valid
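A sketch of the plotted quantity for a single covariate, assuming $\hat\beta$ is already estimated. Here the Schoenfeld residual is $r_{Pi} = x_i - \bar x_w(t_i)$, the covariate minus its risk-weighted mean over the risk set, and it is scaled by the inverse of the weighted risk-set variance $V(t_i)$. Software often uses an approximation to this scaling instead; the function name and data are hypothetical.

```python
import math

def scaled_schoenfeld_points(times, events, x, beta_hat):
    """For each failure time t_i, return (t_i, r*_Pi + beta_hat) for a single
    covariate, where r_Pi = x_i - weighted risk-set mean of x and
    r*_Pi = r_Pi / V(t_i), V(t_i) being the weighted risk-set variance."""
    points = []
    for i in range(len(times)):
        if events[i] != 1:
            continue
        risk = [l for l in range(len(times)) if times[l] >= times[i]]
        w = [math.exp(beta_hat * x[l]) for l in risk]
        s0 = sum(w)
        mean = sum(wl * x[l] for wl, l in zip(w, risk)) / s0
        var = sum(wl * (x[l] - mean) ** 2 for wl, l in zip(w, risk)) / s0
        if var == 0.0:
            continue  # residual undefined: x does not vary in the risk set
        r_p = x[i] - mean  # unscaled Schoenfeld residual
        points.append((times[i], r_p / var + beta_hat))
    return points

# Hypothetical data and coefficient estimate
pts = scaled_schoenfeld_points([1, 2, 3], [1, 1, 1], [1, 0, 1], beta_hat=0.0)
```

In practice these points are plotted against $t_i$ together with a smoother; a trend away from a horizontal line suggests the coefficient changes over time.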
Adding a time-dependent covariate if the hazard ratio varies with time
For example, a PH model with one covariate assumes, for the ith observation,
$h_i(t; Z) = h_0(t)\exp(\beta_1 Z_{1i})$
If the hazard ratio between two groups varies with time, we can include an interaction term with time in the model:
$h_i(t; Z) = h_0(t)\exp(\beta_1 Z_{1i} + \beta_2 Z_{1i} t)$
The relative hazard is now $\exp(\beta_1 Z_{1i} + \beta_2 Z_{1i} t)$, which depends on $t$.
If the coefficient $\beta_2$ is significant, the model is no longer a PH model. The test of the hypothesis $\beta_2 = 0$ is therefore a test of the PH assumption.
Summary
• Assessing model fit:
• Residuals & residual plots
• Functional form of the covariates
• Influence diagnostics:
• Delta-beta statistics
• LD statistics
• Testing the PH assumption:
• Plot of log[-log S] vs. log(t)
• Plot of Schoenfeld residuals vs. t
• Adding a time-dependent variable