Model Checking in the Proportional Hazards Model

• Model adequacy
• Influential observations
• PH assumption

Example: leaders data
• 472 observations in total
• 11 covariates fitted in the initial PH model
  – manner, start, age, conflict, loginc, region

Assessing model adequacy

(1) Residual checking:
• Cox-Snell (& modified)
• Martingale
• Deviance
• Schoenfeld
• Score

(2) Residual plots

(a) Cox-Snell residual plot
• Plot $-\log \hat{S}(r_{Ci})$ vs. $r_{Ci}$
• A well-fitting model yields a straight line through the origin with unit slope

[Figure: Cox-Snell residual plot. Vertical axis: Negative Log SDF (0.0 to 3.0); horizontal axis: e (0.0 to 3.0)]

(b) Index plot of martingale and deviance residuals

Martingale residuals:

$$r_{Mi} = \delta_i - r_{Ci}$$

where $\delta_i$ is the event indicator (1 if the failure is observed, 0 if censored). They
• take values between $-\infty$ and 1
• are, in large samples, uncorrelated with each other, with mean zero
• can be interpreted as the difference between the observed and expected number of deaths in $(0, t_i)$ for the $i$th individual

Deviance residuals:

$$r_{Di} = \operatorname{sgn}(r_{Mi}) \left[ -2 \{ r_{Mi} + \delta_i \log(\delta_i - r_{Mi}) \} \right]^{1/2}$$

(c) Deviance residuals vs. risk score
• A transformation of the martingale residuals
• More symmetrically distributed about zero
• The smaller the residual (in absolute value), the better the observation is fitted by the model
• Risk score: the linear predictor $\hat{\beta}' x_i$ in $h_i(t) = h_0(t) \exp(\beta' x_i)$
• Large negative values: a lower than average risk of failure
• Large positive values: a higher than average risk of failure

[Figure: Plot of the deviance residuals against the linear predictor. Vertical axis: Deviance Residual (-3 to 2); horizontal axis: Linear Predictor (-15 to -11)]

(3) Checking the functional form of a covariate
– Compute the martingale residuals from fitting the null model (the model that contains no covariates)
– Plot the residuals vs. the
covariate of interest
– A straight line indicates that a linear term is needed (in most cases a smoother is needed to see the pattern)

[Figure: Martingale residuals from the null model vs. start. Vertical axis: martingale residual for the null model (-3 to 1); horizontal axis: start (960 to 990)]

Identifying influential observations

• Delta-beta statistics

$$\Delta_i \hat{\beta}_j = \hat{\beta}_j - \hat{\beta}_{j(i)}$$

– The difference in the estimate of the $j$th parameter between the fit to all observations and the fit with the $i$th observation omitted
– Assesses the influence of each observation on a single parameter estimate

SAS output: Approx. delta-betas for manner (Extreme Observations)

  --------Lowest--------    --------Highest-------
  Value            Obs      Value            Obs
  -0.0300321       342      0.0109460        180
  -0.0263841       234      0.0121594        212
  -0.0180988       428      0.0128754        403
  -0.0168336       141      0.0141662        258
  -0.0156685       247      0.0154732        286

• Likelihood displacement (LD)

$$LD_i = 2 \{ \log L(\hat{\beta}) - \log L(\hat{\beta}_{(i)}) \}$$

– The change in the maximized log-likelihood when the $i$th observation is omitted from the fit
– Assesses the influence of each observation on the overall fit (the whole set of parameter estimates)

SAS output: Approx. LDi (Extreme Observations)

  --------Lowest--------    --------Highest-------
  Value            Obs      Value            Obs
  6.21925E-05      302      0.0838694        234
  6.33683E-05      345      0.0973350        286
  7.60771E-05      346      0.1086945         54
  7.62382E-05      395      0.1152216        342
  8.00260E-05      373      0.1250329        449

• Treatment:
– Check the original data and correct any mistakes found
– Make inferences from both fits (full and reduced data) and contrast the results

Testing PH assumption

• A crucial assumption when using the Cox model
• The effect of each covariate on the hazard rate is the same over time

Before fitting a Cox model
– Group the data according to the levels of one or more factors
– Plot $\log[-\log \hat{S}(t)]$ vs. $\log(t)$ for each group
– The curves are parallel if the hazard ratios are proportional across the different groups
– In SAS, use the strata option

[Figure: Checking PH assumption for manner. Plot of log(-log(surv)) vs. logt; vertical axis: lls (-2 to 1); horizontal axis: logy (0 to 4); one curve per level of manner (0, 1)]

[Figure: Checking PH assumption for region. Plot of log(-log(surv)) vs. logt; vertical axis: lls (-2 to 2); horizontal axis: logy (0 to 4); one curve per level of region (0, 1, 2, 3)]

After fitting a Cox model

• Plot weighted Schoenfeld
residuals vs. time

$$E(r^*_{Pji}) \approx \beta_j(t_i) - \hat{\beta}_j$$

where $\beta_j(t_i)$ is the time-varying coefficient at the $i$th failure time and $\hat{\beta}_j$ is the estimate in the fitted Cox PH model.

– Plot $r^*_{Pji} + \hat{\beta}_j$ vs. $t_i$
– The plot detects whether some form of time dependency exists in a particular covariate
– A horizontal line shows that the coefficient is constant, so the PH assumption is valid

Adding a time-dependent covariate if the hazard ratio varies with time

For example, a PH model with one covariate assumes, for the $i$th observation,

$$h_i(t; Z) = h_0(t) \exp(\beta_1 Z_{1i})$$

If the hazard ratio between two groups varies with time, we can add an interaction term between the covariate and time to the model:

$$h_i(t; Z) = h_0(t) \exp(\beta_1 Z_{1i} + \beta_2 Z_{1i} \cdot t)$$

The relative hazard is now $\exp(\beta_1 Z_{1i} + \beta_2 Z_{1i} \cdot t)$, which depends on $t$. If the coefficient $\beta_2$ is significant, the model is no longer a PH model. The test of the hypothesis that $\beta_2 = 0$ is therefore a test of the PH assumption.

Summary

• Assessing model fit:
  – Residuals & residual plots
  – Functional form of the covariates
• Influence diagnostics:
  – Delta-beta statistics
  – LD statistics
• Testing PH assumption:
  – Plot of log[-log S] vs. log(t)
  – Plot of Schoenfeld residuals vs. t
  – Adding a time-dependent variable
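
Supplementary sketch: the Cox-Snell, martingale and deviance residuals used in the model adequacy checks can be computed directly from their definitions. The minimal Python sketch below assumes the Cox-Snell residual is the estimated cumulative hazard at the observed time, $r_{Ci} = \hat{H}_0(t_i) \exp(\hat{\beta}' x_i)$, and takes $\delta_i \log(\delta_i - r_{Mi})$ to be 0 for censored observations. The function names and the three input rows are hypothetical illustrations, not values from the leaders data.

```python
import math


def cox_snell(cum_base_hazard, linear_predictor):
    """Cox-Snell residual r_C = H0(t) * exp(beta' x).

    Both arguments are assumed to come from an already fitted Cox model.
    """
    return cum_base_hazard * math.exp(linear_predictor)


def martingale(delta, r_c):
    """Martingale residual r_M = delta - r_C (delta: 1 = event, 0 = censored)."""
    return delta - r_c


def deviance(delta, r_m):
    """Deviance residual sgn(r_M) * sqrt(-2 * (r_M + delta * log(delta - r_M)))."""
    # For censored observations (delta = 0) the log term is taken as 0.
    log_term = math.log(delta - r_m) if delta == 1 else 0.0
    inner = -2.0 * (r_m + log_term)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.copysign(math.sqrt(max(inner, 0.0)), r_m)


if __name__ == "__main__":
    # Hypothetical rows: (event indicator, cumulative baseline hazard, linear predictor)
    rows = [(1, 0.20, 0.5), (0, 0.80, -0.3), (1, 1.50, 0.1)]
    for delta, h0, lp in rows:
        r_c = cox_snell(h0, lp)
        r_m = martingale(delta, r_c)
        r_d = deviance(delta, r_m)
        print(f"delta={delta}  r_C={r_c:.4f}  r_M={r_m:.4f}  r_D={r_d:.4f}")
```

Keeping the three functions separate mirrors the chain of definitions: each residual is obtained from the previous one, so the deviance residual always carries the sign of the martingale residual it was computed from.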