Statistics for Health Research
Assessing Survival: Cox Proportional Hazards Model
Peter T. Donnan, Professor of Epidemiology and Biostatistics

Objectives of Workshop
• Understand the general form of the Cox PH model
• Understand the need for adjusted Hazard Ratios (HR)
• Implement the Cox model in SPSS
• Understand and interpret the output from SPSS

Modelling: Detecting signal from background noise

Survival Regression Models
Expressed in terms of the hazard function, formally defined as: the instantaneous risk of event (mortality) in the next time interval t, conditional on having survived to the start of the interval t

What is hazard?
Hazard rate is an instantaneous rate of events as a function of time

Plot of hazard
Note that the hazard changes over time, denoted by h(t)
[Figure: hazard h(t) against time, from birth to old age]

Survival Regression Models
The Cox model expresses the relationship between the hazard and a set of variables or covariates
These could be arm of trial, age, gender, social deprivation, Dukes stage, co-morbidity, etc.

How is the relationship formulated?
Simplest equation is: h = k
h is the hazard; k is a constant, e.g. 0.3 per person-year
[Figure: hazard against age in years]

How is the relationship formulated?
Next simplest is a linear equation: h = a + βx + e
h is the outcome; a is the intercept; β is the slope related to x, the explanatory variable; and e is the error term or ‘noise’

Linear model of hazard
[Figure: hazard against age in years]

Cox Proportional Hazards Model (1972)
h(t) = h0(t) × r(β, x)
h0 is the baseline hazard; the function r(β, x) reflects how the hazard function changes (β) according to differences in subjects’ characteristics (x)

Exponential model of hazard
[Figure: hazard against age in years]

What is Hazard Ratio?
Hazard Ratio (HR) is the ratio of hazards in two groups, e.g. men vs. women, new drug vs. BSC
N.B. It is the improvement in one group over the other in terms of the rate at which events will occur from a particular time point to another time point

What is Hazard Ratio?
Hazard Ratio (HR) is the ratio of hazards in two groups and remains constant over time (n.b. survival curve widens)
[Figure: survival against time]

Interpretation of HR comparing two groups
HR = 1: no difference between groups (do NOT reject the null hypothesis)
HR < 1: reduction in hazard relative to comparator (e.g. HR = 0.6 is a 40% reduction)
HR > 1: increase in hazard relative to comparator (e.g. HR = 1.7 is a 70% increase)

Cox Proportional Hazards Model: Hazard Ratio
r(β, x) = exp(βx)
Consider the hazard ratio for men vs. women, then
$\mathrm{HR} = \frac{h_{\text{men}}(t)}{h_{\text{women}}(t)} = \frac{h_0(t)\, r(\beta, x_{\text{men}})}{h_0(t)\, r(\beta, x_{\text{women}})}$

Cox Proportional Hazards Model: Hazard Ratio
If coding for gender is x = 1 (men) and x = 0 (women) then:
$\mathrm{HR} = \frac{r(\beta, x_{\text{men}})}{r(\beta, x_{\text{women}})} = \frac{\exp(\beta)}{\exp(0)} = \exp(\beta)$
where β is the regression coefficient for gender

Hazard ratios in SPSS
SPSS gives hazard ratios for a binary factor coded (0,1) automatically from exponentiation of regression coefficients (95% CI are also given as an option)
Note that the HR is labelled as EXP(B) in the output

Fitting Gender in Cox Model in SPSS

Output from Cox Model in SPSS
Variables in the Equation
            B       SE      Wald    df    Sig.    Exp(B)
SEXNUM    -.038    .121     .097     1    .755     .963

Key to the output:
SEXNUM: variable in model
B: regression coefficient
SE: standard error
Wald: test statistic, (β / se(β))^2
df: degrees of freedom
Sig.: p-value
Exp(B): HR for men vs. women

Logrank Test: Null Hypothesis
The null hypothesis for the logrank test: hazard rate for group A = hazard rate for group B, i.e.
$\mathrm{HR} = \frac{O_A / E_A}{O_B / E_B} = 1$

Wald Test: Null Hypothesis
The null hypothesis for the Wald test: Hazard Ratio = 1
Equivalent to regression coefficient β = 0
Note that if the 95% CI for the HR includes 1 then the null hypothesis cannot be rejected

Hazard ratios for categorical factors in SPSS
• Enter factor as before
• Click on ‘categorical’ and choose the reference category (usually first or last)
• E.g.
for Dukes’ staging, Stage A may be chosen as the reference category
• HRs are now given in output for survival in each category relative to Stage A
• Hence there will be n-1 HRs for n categories

Fitting a categorical variable: Dukes’ Staging
Categorical Variable Codings(a,b)
DUKES      Frequency     (1)      (2)      (3)      (4)
0 = A          18        .000     .000     .000     .000    (reference category)
1 = B         107       1.000     .000     .000     .000
2 = C         188        .000    1.000     .000     .000
3 = D         123        .000     .000    1.000     .000
9 = UK         40        .000     .000     .000    1.000
a. Indicator Parameter Coding
b. Category variable: DUKES (Dukes Staging)

Variables in the Equation
                                                                 95.0% CI for Exp(B)
              B       SE       Wald     df    Sig.    Exp(B)     Lower     Upper
DUKES                        105.703     4    .000
DUKES(1)     .066    .441       .022     1    .882     1.068      .450     2.536     (B vs. A)
DUKES(2)     .716    .421      2.893     1    .089     2.047      .897     4.672     (C vs. A)
DUKES(3)    1.753    .420     17.379     1    .000     5.769     2.531    13.151     (D vs. A)
DUKES(4)    1.328    .446      8.875     1    .003     3.775     1.575     9.046     (UK vs. A)

One Solution to Confounding
Use multiple Cox regression with both predictor and confounder as explanatory variables, i.e. fit:
h(t) = h0(t) exp(β1x1 + β2x2)
x1 is Dukes’ Stage and x2 is Age

Fitting a multiple regression: Dukes’ Staging and Age
Variables in the Equation
                                                                 95.0% CI for Exp(B)
              B       SE       Wald     df    Sig.    Exp(B)     Lower     Upper
AGE          .019    .006      9.181     1    .002     1.019     1.007     1.032

Variables in the Equation
                                                                 95.0% CI for Exp(B)
              B       SE       Wald     df    Sig.    Exp(B)     Lower     Upper
DUKES                        111.400     4    .000
DUKES(1)     .159    .442       .130     1    .719     1.172      .493     2.788
DUKES(2)     .822    .422      3.800     1    .051     2.276      .996     5.203
DUKES(3)    1.896    .422     20.181     1    .000     6.662     2.913    15.238
DUKES(4)    1.321    .446      8.773     1    .003     3.748     1.564     8.986
AGE          .024    .007     13.761     1    .000     1.024     1.011     1.038     (Age adjusted for Dukes’ Stage)

Interpretation of the Hazard Ratio
For a continuous variable such as age, the HR represents the incremental increase in hazard per unit increase in age, i.e. HR = 1.024 is an increase of 2.4% for a one-year increase in age
For a categorical variable the HR represents the incremental increase in hazard in one category relative to the reference category, i.e.
HR = 6.66 for Stage D compared with A represents a 6.7-fold increase in hazard

First steps in modelling
• What hypotheses are you testing?
• If there is a main ‘exposure’ variable, enter it first and assess confounders one at a time
• Assess each variable on statistical significance and clinical importance
• It is acceptable to have an ‘important’ variable without statistical significance

Summary
• The Cox Proportional Hazards model is the most used analytical tool in survival research
• It is easily fitted in SPSS
• Model assessment requires some thought
• Next step is to consider how to select multiple factors for the ‘best’ model

Check assumption of proportional hazards (PH)
• Proportional hazards assumes that the ratio of hazard in one group to another remains the same throughout the follow-up period
• For example, that the HR for men vs. women is constant over time
• Simplest method is to check for parallel lines in the Log (-Log) plot of survival

Check assumption of proportional hazards for each factor.
Log minus log plot of survival should give parallel lines if PH holds
Hint: Within the Cox model select the factor as CATEGORICAL and in PLOTS select the log minus log function for separate lines of the factor

Proportional hazards holds for Dukes’ Staging
Categorical Variable Codings(b)
dukes(a)   Frequency    (1)    (2)    (3)    (4)
0 = A          18        1      0      0      0
1 = B         107        0      1      0      0
2 = C         188        0      0      1      0
3 = D         123        0      0      0      1
9 = UK         40        0      0      0      0
a. Indicator Parameter Coding
b. Category variable: dukes (Dukes Staging)

Summary
• Selection of factors for multiple Cox regression models requires some judgement
• Automatic procedures are available but treat results with caution
• They are easily fitted in SPSS
• Check the proportional hazards assumption
• Parsimonious models are better

Practical
• Read in Colorectal.sav and try to fit a multiple proportional hazards model
• Check the proportional hazards assumption
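
When reading the SPSS "Variables in the Equation" tables shown earlier, it can help to recompute Exp(B), its 95% CI and the Wald statistic by hand from B and SE. The sketch below is not part of the original workshop, which uses SPSS throughout. Python, the function name `hazard_ratio_summary` and the 1.96 multiplier (a normal approximation for the 95% CI) are assumptions made for illustration. Because the printed B and SE are rounded to three decimals, the results agree with the SPSS output only approximately.

```python
import math


def hazard_ratio_summary(b, se, z=1.96):
    """Summarise one Cox regression coefficient (hypothetical helper).

    b  : regression coefficient (column B in the SPSS output)
    se : its standard error (column SE)
    z  : normal quantile for the CI; 1.96 is assumed for a 95% interval
    """
    hr = math.exp(b)                 # Exp(B), the hazard ratio
    lower = math.exp(b - z * se)     # lower CI limit for Exp(B)
    upper = math.exp(b + z * se)     # upper CI limit for Exp(B)
    wald = (b / se) ** 2             # Wald statistic, (beta / se(beta))^2
    # p-value from a chi-square distribution with 1 df:
    # P(X > wald) = erfc(sqrt(wald / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return {"hr": hr, "lower": lower, "upper": upper, "wald": wald, "p": p_value}


# Rounded values copied from the tables above.
examples = {
    "SEXNUM (men vs. women)": (-0.038, 0.121),
    "DUKES(3), D vs. A, age-adjusted": (1.896, 0.422),
}

for label, (b, se) in examples.items():
    s = hazard_ratio_summary(b, se)
    print(
        f"{label}: HR={s['hr']:.3f} "
        f"(95% CI {s['lower']:.3f} to {s['upper']:.3f}), "
        f"Wald={s['wald']:.3f}, p={s['p']:.3f}"
    )
```

If the interval returned for a coefficient contains 1, the Wald null hypothesis (HR = 1, equivalently β = 0) cannot be rejected at the 5% level, which matches the rule stated on the Wald test slide.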