Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg, Germany Patrick Royston MRC Clinical Trials Unit, London, UK Multivariable regression models with continuous covariates with a practical emphasis on fractional polynomials and applications in clinical epidemiology The problem … “Quantifying epidemiologic risk factors using non-parametric regression: model selection remains the greatest challenge” Rosenberg PS et al, Statistics in Medicine 2003; 22:3369-3381 Trivial nowadays to fit almost any model To choose a good model is much harder 2 Overview • Context and motivation • Introduction to fractional polynomials for the univariate smoothing problem • Extension to multivariable models • Robustness and stability • Software sources • Conclusions 3 Motivation • Often have continuous risk factors in epidemiology and clinical studies – how to model them? • Linear model may describe a dose-response relationship badly ‘Linear’ = straight line = 0 + 1 X + … throughout talk • Using cut-points has several problems • Splines recommended by some – but are not ideal Lack a well-defined approach to model selection ‘Black box’ Robustness issues 4 Problems of cut-points • Step-function is a poor approximation to true relationship Almost always fits data less well than a suitable continuous function • ‘Optimal’ cut-points have several difficulties Biased effect estimates Inflated P-values Not reproducible in other studies 5 Example datasets 1. Epidemiology • Whitehall 1 17,370 male Civil Servants aged 40-64 years Measurements include: age, cigarette smoking, BP, cholesterol, height, weight, job grade Outcomes of interest: coronary heart disease, allcause mortality logistic regression Interested in risk as function of covariates Several continuous covariates Some may have no influence in multivariable context 6 Example datasets 2. Clinical studies • German breast cancer study group (BMFT-2) Prognostic factors in primary breast cancer Age, menopausal status, tumour size, grade, no. of positive lymph nodes, hormone receptor status Recurrence-free survival time Cox regression 686 patients, 299 events Several continuous covariates Interested in prognostic model and effect of individual variables 7 Example: Systolic blood pressure vs. age 50 100 150 200 250 300 Whitehall 1: BP vs age 40 45 50 55 60 65 Age, years 8 Example: Curve fitting (Systolic BP and age – not linear) 150 Whitehall 1: BP vs age 125 130 135 140 145 95% CI Linear function FP1 function Running line 40 45 50 55 60 65 Age, years 9 Empirical curve fitting: Aims • Smoothing • Visualise relationship of Y with X • Provide and/or suggest functional form 10 Some approaches • ‘Non-parametric’ (local-influence) models Locally weighted (kernel) fits (e.g. lowess) Regression splines Smoothing splines (used in generalized additive models) • Parametric (non-local influence) models Polynomials Non-linear curves Fractional polynomials Intermediate between polynomials and non-linear curves 11 Local regression models • Advantages Flexible – because local! May reveal ‘true’ curve shape (?) • Disadvantages Unstable – because local! No concise form for models Therefore, hard for others to use – publication,compare results with those from other models Curves not necessarily smooth ‘Black box’ approach Many approaches – which one(s) to use? 12 Polynomial models • Do not have the disadvantages of local regression models, but do have others: • Lack of flexibility (low order) • Artefacts in fitted curves (high order) • Cannot have asymptotes 13 Fractional polynomial models • Describe for one covariate, X multiple regression later • Fractional polynomial of degree m for X with powers p1, … , pm is given by FPm(X) = 1 X p + … + m X p 1 m • Powers p1,…, pm are taken from a special set {2, 1, 0.5, 0, 0.5, 1, 2, 3} • Usually m = 1 or m = 2 is sufficient for a good fit 14 FP1 and FP2 models • FP1 models are simple power transformations • 1/X2, 1/X, 1/X, log X, X, X, X2, X3 8 models • FP2 models are combinations of these For example 1(1/X) + 2(X2) 28 models • Note ‘repeated powers’ models For example 1(1/X) + 2(1/X)log X 8 models 15 FP1 and FP2 models: some properties • Many useful curves • A variety of features are available: Monotonic Can have asymptote Non-monotonic (single maximum or minimum) Single turning-point • Get better fit than with conventional polynomials, even of higher degree 16 Examples of FP2 curves - varying powers (-2, 1) (-2, 2) (-2, -2) (-2, -1) 17 Examples of FP2 curves - single power, different coefficients (-2, 2) 4 Y 2 0 -2 -4 10 20 30 x 40 50 18 A philosophy of function selection • Prefer simple (linear) model • Use more complex (non-linear) FP1 or FP2 model if indicated by the data • Contrast to local regression modelling Already starts with a complex model 19 Estimation and significance testing for FP models • Fit model with each combination of powers FP1: 8 single powers FP2: 36 combinations of powers • Choose model with lowest deviance (MLE) • Comparing FPm with FP(m 1): compare deviance difference with 2 on 2 d.f. one d.f. for power, 1 d.f. for regression coefficient supported by simulations; slightly conservative 20 Selection of FP function • • • • • • • Has flavour of a closed test procedure Use 2 approximations to get P-values Define nominal P-value for all tests (often 5%) Fit linear and best FP1 and FP2 models Test FP2 vs. null – test of any effect of X (2 on 4 df) Test FP2 vs linear – test of non-linearity (2 on 3 df) Test FP2 vs FP1 – test of more complex function against simpler one (2 on 2 df) 21 Example: Systolic BP and age Model FP2 v Null FP2 v Linear FP2 v FP1 d.f. 4 3 2 Deviance difference 944.57 29.95 3.29 Pvalue 0.000 0.000 0.2 Reminder: FP1 had power 3: 1 X3 FP2 had powers (1,1): 1 X + 2 X log X 22 Aside: FP versus spline • Why care about FPs when splines are more flexible? • More flexible more unstable More chance of ‘over-fitting’ • In epidemiology, dose-response relationships are often simple • Illustrate by small simulation example 23 FP versus spline (continued) • • • • • • • • Logarithmic relationships are common in practice Simulate regression model y = 0 + 1log(X) + error Error is normally distributed N(0, 2) Take 0 = 0, 1 = 1; X has lognormal distribution Vary = {1, 0.5, 0.25, 0.125} Fit FP1, FP2 and spline with 2, 4, 6 d.f. Compute mean square error Compare with mean square error for true model 24 FP vs. spline (continued) 2 y 0 -2 -4 -4 -2 y 0 2 4 Sigma = 0.5 4 Sigma = 1 2 4 6 0 2 4 x Sigma = 0.25 Sigma = 0.125 6 2 y 0 -2 -4 -4 -2 y 0 2 4 x 4 0 0 2 4 x 6 0 2 4 6 x 25 FP vs. spline (continued) FP1 and spline with 2 df 2 1 0 -1 -2 -2 -1 0 1 2 Solid: FP1; dashed: spline 2 df 4 6 0 2 4 6 0 2 4 6 0 2 4 6 1 0 -1 -2 -2 -1 0 1 2 2 2 0 26 FP vs. spline (continued) 2 1 0 -1 -2 -2 -1 0 1 2 FP2 and spline with 4 df 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 1 0 -1 -2 -2 -1 0 1 2 1 2 0 27 FP vs. spline (continued) 0 .04 .08 .12 FP vs. spline: prediction error .125 .25 .5 1 sigma True Spline 2df FP1 Spline 4df FP2 Spline 6df 28 FP vs. spline (continued) • In this example, spline usually less accurate than FP • FP2 less accurate than FP1 (over-fitting) • FP1 and FP2 more accurate than splines • Splines often had non-monotonic fitted curves Could be medically implausible • Of course, this is a special example 29 Multivariable FP (MFP) models • Assume have k > 1 continuous covariates and perhaps some categoric or binary covariates • Allow dropping of non-significant variables • Wish to find best multivariable FP model for all X’s • Impractical to try all combinations of powers • Require iterative fitting procedure 30 Fitting multivariable FP models (MFP algorithm) • Combine backward elimination of weak variables with search for best FP functions • Determine fitting order from linear model • Apply FP model selection procedure to each X in turn fixing functions (but not ’s) for other X’s • Cycle until FP functions (i.e. powers) and variables selected do not change 31 Example: Prognostic factors in breast cancer • Aim to develop a prognostic index for risk of tumour recurrence or death • Have 7 prognostic factors 4 continuous, 3 categorical • Select variables and functions using 5% significance level 32 Univariate linear analysis Variable X1 X2 X3 X4a X4b X5 X6 X7 Name Age Menopausal status Tumour size Grade 2 or 3 Grade 3 No. of positive lymph nodes Progesterone receptor status Oestrogen receptor status 2 0.58 0.28 15.68 19.92 8.19 50.02 34.04 4.70 33 Univariate FP2 analysis Variable X1 age X3 size X5 nodes X6 PgR X7 ER Powers (2, 0.5) (1, 3) (1, 2) (0.5, 0) (2, 1) 2 d.f. 17.61 4 19.81 4 81.36 4 52.73 4 23.07 4 P 0.001 0.001 < 0.001 < 0.001 < 0.001 Gain 17.03 4.13 31.34 18.69 18.37 Gain compares FP2 with linear on 3 d.f. All factors except for X3 have a non-linear effect 34 Multivariable FP analysis Variable X1 age X3 size X5 nodes X6 PgR X7 ER X2 mens. X4a grad 2/3 X4b grad 3 FP etc. (2, 0.5) Out (2, 1) 0.5 Out Out In Out 2 19.33 5.31 74.14 32.70 2.15 0.21 4.59 0.15 d.f. P 4 0.001 4 0.3 4 <0.001 4 <0.001 4 0.7 1 0.6 1 0.03 1 0.7 35 Comments on analysis • Conventional backwards elimination at 5% level selects X4a, X5, X6, and X1 is excluded • FP analysis picks up same variables as backward elimination, and additionally X1 • Note considerable non-linearity of X1 and X5 • X1 has no linear influence on risk of recurrence • FP model detects more structure in the data than the linear model 36 Plots of fitted FP functions Breast cancer: Fitted FP functions 1 Nodes 20 40 -1 -.5 0 .5 Log relative hazard 5 4 3 2 1 0 Log relative hazard Age 60 80 Age, years 0 10 20 30 40 No. of positive lymph nodes 50 0 -1 -2 -3 Log relative hazard 1 Progesterone receptor 0 500 1000 1500 2000 Progesterone receptor status 2500 37 Survival by risk groups 0.00 0.25 0.50 0.75 1.00 Prognostic classification scheme 0 2 4 Recurrence-free survival, yr Group = Low risk Group = High risk 6 8 Group = Medium risk 38 Robustness of FP functions • Breast cancer example showed non-robust functions for nodes – not medically sensible • Situation can be improved by performing covariate transformation before FP analysis • Can be done systematically (work in progress) • Sauerbrei & Royston (1999) used negative exponential transformation of nodes exp(–0.12 * number of nodes) 39 0 .5 1 1.5 Making the function for lymph nodes more robust -.5 Original Exponential transformation 0 10 20 30 No. of positive lymph nodes 40 50 40 2nd example: Whitehall 1 MFP analysis Covariate Age Cigarettes Systolic BP Total cholesterol Height Weight Job grade FP etc. Linear 0.5 -1, -0.5 Linear Linear -2, 3 In No variables were eliminated by the MFP algorithm Weight is eliminated by linear backward elimination 41 Plots of FP functions Whitehall 1: multivariable FP analysis Cigarettes .5 .4 .3 .2 .1 .08 45 50 55 60 Age at entry 65 0 20 40 Cigarettes/day 5 10 Cholesterol/ mmol/l 100 150 200 250 300 Systolic BP 15 40 60 Height .08 .09 .1 Probability of death .2 .1 .12 .14 .16 .18 Probability of death .14 .12 .1 .08 0 50 Weight .16 Total cholesterol 60 .11 .12 .13 40 Probability of death Systolic BP Probability of death .1 Probability of death .15 .1 .05 Probability of death .2 .12 .14 .16 .18 Age 80 100 120 140 Weight/kgs 140 160 180 Height/cms 200 42 Stability Models (variables, FP functions) selected by statistical criteria – cut-off on P-value Approach has several advantages … … and also is known to have problems Omission bias Selection bias Unstable – many models may fit equally well 43 Stability • Instability may be studied by bootstrap resampling (sampling with replacement) Take bootstrap sample B times Select model by chosen procedure Count how many times each variable is selected Summarise inclusion frequencies & their dependencies Study fitted functions for each covariate • May lead to choosing several possible models, or a model different from the original one 44 Bootstrap stability analysis of the breast cancer dataset • 5000 bootstrap samples taken (!) • MFP algorithm with Cox model applied to each sample • Resulted in 1222 different models (!!) • Nevertheless, could identify stable subset consisting of 60% of replications Judged by similarity of functions selected 45 Bootstrap stability analysis of the breast cancer dataset Variable Model selected Age FP1 FP2 Menopausal status — Tumour size FP1 FP2 Grade 2/3 — Grade 3 — Lymph nodes FP1 Progesterone receptors FP1 FP2 Oestrogen receptors FP1 FP2 % bootstraps model selected 16 76 20 34 6 58 9 100 95 4 13 6 46 Bootstrap analysis: summaries of fitted curves from stable subset Log relative hazard 6 1 0 4 -1 2 -2 -3 0 20 30 40 50 60 Age, years 70 80 Log relative hazard 2 0 25 50 75 Tumour size, mm 100 0 250 PgR, fmol/L 500 1 1 0 0 -1 -1 0 10 20 30 Number of positive lymph nodes 47 Presentation of models for continuous covariates • The function + 95% CI gives the whole story • Functions for important covariates should always be plotted • In epidemiology, sometimes useful to give a more conventional table of results in categories • This can be done from the fitted function 48 Example: Cigarette smoking and allcause mortality (Whitehall 1) Cigarettes per day Number OR (model based) Range Ref. At risk Dying Estimate 95% CI point 0 (referent) 0 10103 690 1.00 -1-10 5 2254 243 1.69 1.59, 1.80 11-20 15 3448 494 2.25 2.04, 2.49 21-30 25 1117 185 2.60 2.31, 2.91 31-40 35 283 48 2.86 2.52, 3.24 41-50 45 43 8 3.07 2.68, 3.52 51-60 55 12 2 3.25 2.82, 3.75 49 Other issues (1) • Handling continuous confounders May use a larger P-value for selection e.g. 0.2 Not so concerned about functional form here • Binary/continuous covariate interactions Can be modelled using FPs (Royston & Sauerbrei 2004) Adjust for other factors using MFP 50 Other issues (2) • Time-varying effects in survival analysis Can be modelled using FP functions of time (Berger; also Sauerbrei & Royston, in progress) • Checking adequacy of FP functions May be done by using splines Fit FP function and see if spline function adds anything, adjusting for the fitted FP function 51 Software sources • Most comprehensive implementation is in Stata Command mfp is part of Stata 8 • Versions for SAS and R are now available Contact W Sauerbrei (wfs@imbi.uni-freiburg.de) to request a copy of the SAS macro R version available on CRAN archive mfp package 52 Concluding remarks (1) • FP method in general No reason (other than convention) why regression models should include only positive integer powers of covariates FP is a simple extension of an existing method Simple to program and simple to explain Parametric, so can easily get predicted values FP usually gives better fit than standard polynomials Cannot do worse, since standard polynomials are included 53 Concluding remarks (2) • Multivariable FP modelling Many applications in general context of multiple regression modelling Well-defined procedure based on standard principles for selecting variables and functions Aspects of robustness and stability have been investigated (and methods are available) Much experience gained so far suggests that method is very useful in clinical epidemiology 54 Some references • • • • • • • Royston P, Altman DG (1994) Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Applied Statistics 43: 429-467 Royston P, Altman DG (1997) Approximating statistical functions by using fractional polynomial regression. The Statistician 46: 1-12 Sauerbrei W, Royston P (1999) Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. JRSS(A) 162: 71-94. Corrigendum JRSS(A) 165: 399--400, 2002 Royston P, Ambler G, Sauerbrei W. (1999) The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology, 28: 964-974. Royston P, Sauerbrei W (2004). A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine 23: 2509-2525. Royston P, Sauerbrei W (2003) Stability of multivariable fractional polynomial models with selection of variables and transformations: a bootstrap investigation. Statistics in Medicine 22: 639-659. Armitage P, Berry G, Matthews JNS (2002) Statistical Methods in Medical Research. Oxford, Blackwell. 55