Patrick Royston MRC Clinical Trials Unit, London, UK Willi Sauerbrei Institut of Medical Biometry and Informatics University Medical Center Freiburg, Germany Interactions With Continuous Variables – Extensions of the Multivariable Fractional Polynomial Approach Overview • Issues in regression models • (Multivariable) fractional polynomials (MFP) • Interactions of continuous variable with – Binary variable – Continuous variable – Time • Summary 2 Observational Studies Several variables, mix of continuous and (ordered) categorical variables Different situations: – prediction – explanation Explanation is the main interest here: • Identify variables with (strong) influence on the outcome • Determine functional form (roughly) for continuous variables The issues are very similar in different types of regression models (linear regression model, GLM, survival models ...) Use subject-matter knowledge for modelling ... ... but for some variables, data-driven choice inevitable 3 Regression models X=(X1, ...,Xp) covariate, prognostic factors g(x) = ß1 X1 + ß2 X2 +...+ ßp Xp (assuming effects are linear) normal errors (linear) regression model Y normally distributed E (Y|X) = ß0 + g(X) Var (Y|X) = σ2I logistic regression model Y binary Logit P (Y|X) = ln P(Y 1 X ) P(Y 0 X ) β 0 g(X) survival times T survival time (partly censored) Incorporation of covariates λ(t X ) λ 0 (t)exp (g(X)) 4 Central issue To select or not to select (full model)? Which variables to include? 5 Continuous variables – The problem “Quantifying epidemiologic risk factors using nonparametric regression: model selection remains the greatest challenge” Rosenberg PS et al, Statistics in Medicine 2003; 22:3369-3381 Discussion of issues in (univariate) modelling with splines Trivial nowadays to fit almost any model To choose a good model is much harder 6 Alcohol consumption as risk factor for oral cancer Rosenberg et al, StatMed 2003 7 Building multivariable regression models Before dealing with the functional form, the ‚easier‘ problem of model selection: variable selection assuming that the effect of each continuous variable is linear 8 Multivariable models - methods for variable selection Full model – variance inflation in the case of multicollinearity Stepwise procedures prespecified (in, out) and actual significance level? • forward selection (FS) • stepwise selection (StS) • backward elimination (BE) All subset selection which criteria? • Cp • AIC • BIC Mallows Akaike Information Criterion Bayes Information Criterion = (SSE / σˆ)2 - n+ p 2 = n ln (SSE / n) +p2 = n ln (SSE / n) + p ln(n) fit penalty Combining selection with Shrinkage Bayes variable selection Recommendations??? Central issue: MORE OR LESS COMPLEX MODELS? 9 Backward elimination is a sensible approach - Significance level can be chosen - Reduces overfitting Of course required • Checks • Sensitivity analysis • Stability analysis 10 Continuous variables – what functional form? Traditional approaches a) Linear function - may be inadequate functional form - misspecification of functional form may lead to wrong conclusions b) ‘best‘ ‘standard‘ transformation c) Step function (categorial data) - Loss of information - How many cutpoints? - Which cutpoints? - Bias introduced by outcome-dependent choice 11 StatMed 2006, 25:127-141 12 Continuous variables – newer approaches • ‘Non-parametric’ (local-influence) models – Locally weighted (kernel) fits (e.g. lowess) – Regression splines – Smoothing splines • Parametric (non-local influence) models – Polynomials – Non-linear curves – Fractional polynomials • Intermediate between polynomials and non-linear curves 13 Fractional polynomial models • Describe for one covariate, X • Fractional polynomial of degree m for X with powers p1, … , pm is given by FPm(X) = 1 X p1 + … + m X pm • Powers p1,…, pm are taken from a special set {2, 1, 0.5, 0, 0.5, 1, 2, 3} • Usually m = 1 or m = 2 is sufficient for a good fit • Repeated powers (p1=p2) 1 X p1 + 2 X p1log X • 8 FP1, 36 FP2 models 14 Examples of FP2 curves - varying powers (-2, 1) (-2, 2) (-2, -2) (-2, -1) 15 Examples of FP2 curves - single power, different coefficients (-2, 2) 4 Y 2 0 -2 -4 10 20 30 x 40 50 16 Our philosophy of function selection • Prefer simple (linear) model • Use more complex (non-linear) FP1 or FP2 model if indicated by the data • Contrasts to more local regression modelling – Already starts with a complex model 17 Example: Prognostic factors GBSG-study in node-positive breast cancer 299 events for recurrence-free survival time (RFS) in 686 patients with complete data 7 prognostic factors, of which 5 are continuous 18 FP analysis for the effect of age Degree 1 Power Model chisquare -2 6.41 -1 3.39 -0.5 2.32 0 1.53 0.5 0.97 1 0.58 2 0.17 3 0.03 Powers -2 -2 -2 -2 -2 -2 -2 -2 -1 -1 -1 -1 -2 -1 -0.5 0 0.5 1 2 3 -1 -0.5 0 0.5 Model chisquare 17.09 17.57 17.61 17.52 17.30 16.97 16.04 14.91 17.58 17.30 16.85 16.25 Degree 2 Powers Model Powers chisquare -1 1 15.56 0 2 -1 2 13.99 0 3 -1 3 12.37 0.5 0.5 -0.5 -0.5 16.82 0.5 1 -0.5 0 16.18 0.5 2 -0.5 0.5 15.41 0.5 3 -0.5 1 14.55 1 1 -0.5 2 12.74 1 2 -0.5 3 10.98 1 3 0 0 15.36 2 2 0 0.5 14.43 2 3 0 1 13.44 3 3 Model chisquare 11.45 9.61 13.37 12.29 10.19 8.32 11.14 8.99 7.15 6.87 5.17 3.67 7 19 Function selection procedure (FSP) Effect of age at 5% level? χ2 df p-value Any effect? Best FP2 versus null 17.61 4 0.0015 Linear function suitable? Best FP2 versus linear 17.03 3 0.0007 FP1 sufficient? Best FP2 vs. best FP1 11.20 2 0.0037 20 Many predictors – MFP With many continuous predictors selection of best FP for each becomes more difficult MFP algorithm as a standardized way to variable and function selection (usually binary and categorical variables are also available) MFP algorithm combines backward elimination with FP function selection procedures 21 Continuous factors Different results with different analyses Age as prognostic factor in breast cancer (adjusted) P-value 0.9 0.2 0.001 22 Results similar? Nodes as prognostic factor in breast cancer (adjusted) P-value 0.001 0.001 0.001 23 Multivariable FP Final Model in breast cancer example 2 0 .5 0 .5 f X 1 , X 1 ; X 4 a ; exp 0.12 X 5 ; X 6 age grade nodes progesterone Model choosen out of Model more than a million possible models, one model selected - Sensible? - Interpretable? - Stable? Bootstrap stability analysis (see R & S 2003) 24 Example: Risk factors • Whitehall 1 – 17,370 male Civil Servants aged 40-64 years, 1670 (9.7%) died – Measurements include: age, cigarette smoking, BP, cholesterol, height, weight, job grade – Outcomes of interest: all-cause mortality at 10 years logistic regression 25 Whitehall 1 Systolic blood pressure Deviance difference in comparison to a straight line for FP(1) and FP(2) models First degree Power Deviance Difference p -74.19 -2 -43.15 -1 -29.40 -0.5 -17.37 0 -7.45 0.5 0.00 1 6.43* 2 0.98 3 Powers q p -2 -2 -1 -2 -2 -0.5 0 -2 0.5 -2 1 -2 2 -2 3 -2 -1 -1 -1 -0.5 0 -1 -1 0.5 Fractional polynomials Second degree Deviance Powers Deviance q Difference Difference p 1 12.97 -1 26.22* 2 7.80 -1 24.43 3 2.53 -1 22.80 -0.5 -0.5 17.97 20.72 -0.5 0 16.00 18.23 -0.5 0.5 13.93 15.38 -0.5 1 11.77 8.85 -0.5 2 7.39 1.63 -0.5 3 3.10 21.62 0 14.24 0 19.78 0.5 12.43 0 17.69 1 10.61 0 15.41 Powers q p 2 0 3 0 0.5 0.5 1 0.5 2 0.5 3 0.5 1 1 2 1 3 1 2 2 3 2 3 3 Deviance difference 7.05 3.74 10.94 9.51 6.80 4.41 8.46 6.61 5.11 6.44 6.45 7.59 26 Similar fit of several functions 27 Presentation of models for continuous covariates • The function + 95% CI gives the whole story • Functions for important covariates should always be plotted • In epidemiology, sometimes useful to give a more conventional table of results in categories • This can be done from the fitted function 28 Whitehall 1 Systolic blood pressure Odds ratio from final FP(2) model LogOR= 2.92 – 5.43X-2 –14.30* X –2 log X Presented in categories Systolic blood pressure (mm Hg) Range ref. point 90 88 91-100 95 101-110 105 111-120 115 121-130 125 131-140 135 141-160 150 161-180 170 181-200 190 201-240 220 241-280 250 Number of men at risk dying OR (model-based) Estimate 95%CI 27 283 1079 2668 3456 4197 2775 1437 438 154 5 2.47 1.42 1.00 0.94 1.04 1.25 1.77 2.87 4.54 8.24 15.42 3 22 84 164 289 470 344 252 108 41 4 1.75, 3.49 1.21, 1.67 0.86, 1.03 0.91, 1.19 1.07, 1.46 1.50, 2.08 2.42, 3.41 3.78, 5.46 6.60, 10.28 11.64, 20.43 29 Whitehall 1 MFP analysis Covariate Age Cigarettes Systolic BP Total cholesterol Height Weight Job grade FP etc. Linear 0.5 -1, -0.5 Linear Linear -2, 3 In No variables were eliminated by the MFP algorithm Assuming a linear function weight is eliminated by backward elimination 30 Interactions Motivation – I Detecting predictive factors (interaction with treatment) • Don’t investigate effects in separate subgroups! • Investigation of treatment/covariate interaction requires statistical tests • Care is needed to avoid over-interpretation • Distinguish two cases: - Hypothesis generation: searching several interactions - Specific predefined hypothesis • For current bad practise - see Assmann et al (Lancet 2000) 31 Motivation - II Continuous by continuous interactions • usually linear by linear product term • not sensible if main effect (prognostic effect) is nonlinear • mismodelling the main effect may introduce spurious interactions 32 Detecting predictive factors (treatment – covariate interaction) • Most popular approach - Treatment effect in separate subgroups - Has several problems (Assman et al 2000) • Test of treatment/covariate interaction required - For `binary`covariate standard test for interaction available • Continuous covariate - Often categorized into two groups 33 Categorizing a continuous covariate • How many cutpoints? • Position of the cutpoint(s) • Loss of information loss of power 34 Standard approach • Based on binary predictor • Need cut-point for continuous predictor • Illustration - problem with cut-point approach TAM*ER interaction in breast cancer (GBSG-study) P -value (log scale) .5 .2 .1 .05 .02 .01 0 5 10 ER cutpoint, fmol/l 15 20 35 Treatment effect by subgroup Lo g h azard ra tio for tre atm e nt Low ER High ER 1 .5 0 -.5 0 5 10 ER cutpoint, fmol/l 15 20 36 New approaches for continuous covariates • STEPP Subpopulation treatment effect pattern plots Bonetti & Gelber 2000 • MFPI Multivariable fractional polynomial interaction approach Royston & Sauerbrei 2004 37 STEPP Contin. covariate Sequences of overlapping subpopulations Sliding window Tail oriented 2g-1 subpopulations (here g=8) 38 STEPP Estimates in subpopulations • No interaction treatment effects ‚similar‘ in all subpopulations • Plot effects in subpopulations 39 STEPP Overlapping populations, therefore correlation between treatment effects in subpopulations Simultaneous confidence band and tests proposed 40 MFPI • Have one continuous factor X of interest • Use other prognostic factors to build an adjustment model, e.g. by MFP Interaction part – with or without adjustment • Find best FP2 transformation of X with same powers in each treatment group • LRT of equality of reg coefficients • Test against main effects model(no interaction) based on 2 with 2df • Distinguish predefined hypothesis - hypothesis searching 41 RCT: Metastatic renal carcinoma Comparison of MPA with interferon 1 .00 N = 347, 322 Death (1) MPA 0 .00 0 .25 0 .50 0 .75 (2) Interferon At risk 1: 175 55 22 11 3 2 1 At risk 2: 172 73 36 20 8 5 1 0 12 24 36 48 60 72 Follow-up (months) 42 Overall: Interferon is better (p<0.01) • Is the treatment effect similar in all patients? Sensible questions? - Yes, from our point of view • Ten factors available for the investigation of treatment – covariate interactions 43 MFPI Treatment effect function for WCC -4 -2 0 2 Original data 5 10 White cell count 15 20 Only a result of complex (mis-)modelling? 44 Does the MFPI model agree with the data? Check proposed trend Treatment effect in subgroups defined by WCC 0 12 24 36 48 60 72 12 24 36 0 12 24 48 Follow-up (months) 60 72 36 48 60 72 Group IV 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 Group III 0 Group II 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 Group I 0 12 24 36 48 60 72 Follow-up (months) HR (Interferon to MPA; adjusted values similar) overall: 0.75 (0.60 – 0.93) I : 0.53 (0.34 – 0.83) II : 0.69 (0.44 – 1.07) III : 0.89 (0.57 – 1.37) IV : 1.32 (0.85 –2.05) 45 STEPP – Interaction with WCC SLIDING WINDOW (n1 = 25, n2 = 40) TAIL ORIENTED (g = 8) 46 1 .5 .2 .1 Relative hazard 2 STEPP as check of MFPI 4 8 12 16 20 White cell count STEPP – tail-oriented, g = 6 47 MFPI – Type I error 0 .5 D en sity 1 1 .5 Random permutation of a continuous covariate (haemoglobin) no interaction Distribution of P-value from test of interaction 1000 runs, Type I error: 0.054 0 .2 .4 .6 .8 1 p 48 Continuous by continuous interactions MFPIgen • Have Z1 , Z2 continuous and X confounders • Apply MFP to X, Z1 and Z2, forcing Z1 and Z2 into the model. FP functions f1(Z1) and f2(Z2) will be selected for Z1 and Z2 • Add term f1(Z1)* f2(Z2) to the model chosen and use LRT for test of interaction • Often f1(Z1) and/or f2(Z2) are linear • Check all pairs of continuous variables for an interaction • Check (graphically) interactions for artefacts • Use forward stepwise if more than one interaction remains • Low significance level for interactions 49 Interactions Whitehall 1 Consider only age and weight Main effects: age – linear weight – FP2 (-1,3) Interaction? Include age*weight-1 + age*weight3 into the model LRT: χ2 = 5.27 (2df, p = 0.07) no (strong) interaction 50 Erroneously assume that the effect of weight is linear Interaction? Include age*weight into the model LRT: χ2 = 8.74 (1df, p = 0.003) hightly significant interaction 51 Model check: categorize age in 4 equal sized groups Compute running line smooth of the binary outcome on weight in each group 52 0 Whitehall 1: check of age x weight interaction 2nd quartile 4th quartile -4 -3 -2 -1 1st quartile 3rd quartile 40 60 80 100 120 140 weight 53 Running line smooth are about parallel across age groups no (strong) interactions smoothed probabilities are about equally spaced effect of age is linear 54 Erroneously assume that the effect of weight is linear Estimated slopes of weight in age-groups indicates strong qualitative interaction between age und weight 55 Whitehall 1: P-values for two-way interactions from MFPIgen *FP transformations chol*age highly significant 56 Presentation of interactions Whitehall 1: age*chol interaction Effect (adjusted) for 10th, 35th, 65th and 90th centile 57 Age*Chol interaction Chol ‚low‘ : age has an effect Chol ‚high‘ : age has no effect Age ‚low‘ : chol has an effect Age ‚high‘ : chol has no effect 58 Age*Chol interaction Does the model fit? Check in 4 subgroups Linearity of chol: ok But: Slopes are not monotonically ordered Lack of fit of linear*linear interaction 59 More complicated model? Interaction ‚real‘? Validation in new data! 60 Survival data effect of a covariate may vary in time time by covariate interaction 61 Extending the Cox model • Cox model (t | X) = 0(t) exp (X) • Relax PH-assumption dynamic Cox model (t | X) = 0(t) exp ((t) X) HR(x,t) – function of X and time t • Relax linearity assumption (t | X) = 0(t) exp ( f (X)) 62 Causes of non-proportionality • Effect gets weaker with time • Incorrect modelling – – – omission of an important covariate incorrect functional form of a covariate different survival model is appropriate 63 Non-PH – What can be done? Non-PH - Does it matter ? - Is it real ? Non-PH is large and real – – – stratify by the factor (t|X, V=j) = j (t) exp (X ) • effect of V not estimated, not tested • for continuous variables grouping necessary Partition time axis Model non-proportionality by time-dependent covariate 64 Example: Time-varying effects Rotterdam breast cancer data 2982 patients 1 to 231 months follow-up time 1518 events for RFI (recurrence free interval) Adjuvant treatment with chemo- or hormonal therapy according to clinic guidelines 70% without adjuvant treatment Covariates continuous age, number of positive nodes, estrogen, progesterone categorical menopausal status, tumor size, grade 65 • Treatment variables ( chemo , hormon) will be analysed as usual covariates • 9 covariates , partly strong correlation (age-meno; estrogen-progesterone; chemo, hormon – nodes ) variable selection • Use multivariable fractional polynomial approach for model selection in the Cox proportional hazards model 66 Assessing PH-assumption • Plots – Plots of log(-log(S(t))) vs log t should be parallel for groups – Plotting Schoenfeld residuals against time to identify patterns in regression coefficients – Many other plots proposed • Tests many proposed, often based on Schoenfeld residuals, most differ only in choice of time transformation • Partition the time axis and fit models seperatly to each time interval • Including time-by-covariate interaction terms in the model and estimate the log hazard ratio function 67 Smoothed Schoenfeld residuals – multivariable MFP model assuming PH 68 Selected model with MFP estimates Factor test of time-varying effect for different time transformations t p-value rank(t) Log(t) Sqrt(t) SE X1 – age -0.01 0.002 0.082 0.243 0.329 0.149 X3a – size 0.29 0.057 0.000 0.000 0.001 0.000 X4b – grade 0.39 0.064 0.189 0.198 0.129 0.164 X5e – nodes -1.71 0.081 0.002 0.000 0.000 0.000 X8 - chemo-T -0.39 0.085 0.091 0.008 0.023 0.034 X9 – horm-T -0.45 0.073 0.014 0.001 0.000 0.002 Index 1.00 0.039 0.000 0.000 0.000 0.000 69 Including time – by covariate interaction (Semi-) parametric models for β(t) • model (t) x = x + x g(t) calculate time-varying covariate x g(t) fit time-varying Cox model and test for plot (t) against t • 0 g(t) – which form? – – – – ‘usual‘ function, eg t, log(t) piecewise splines fractional polynomials 70 MFPTime algorithm Motivation • Multivariable strategy required to select – Variables which have influence on outcome – For continuous variables determine functional form of the influence (‚usual‘ linearity assumption sensible?) – Proportional hazards assumption sensible or does a time-varying function fit the data better? 71 MFPTime algorithm (1) • Determine (time-fixed) MFP model M0 possible problems variable included, but effect is not constant in time variable not included because of short term effect only • Consider short term period only Additional to M0 significant variables? This gives M1 72 MFPTime algorithm (2) For all variables (with transformations) selected from full time-period and short time-period • Investigate time function for each covariate in forward stepwise fashion - may use small P value • Adjust for covariates from selected model • To determine time function for a variable compare deviance of models ( 2) from FPT2 to null (time fixed effect) 4 DF FPT2 to log 3 DF FPT2 to FPT1 2 DF • Use strategy analogous to stepwise to add time-varying functions to MFP model M1 73 Development of the model Variable X1 X3b Model M0 Model M1 Model M2 β SE β SE β SE -0.013 0.002 -0.013 0.002 -0.013 0.002 0.171 0.080 0.150 0.081 - - X4 0.39 0.064 0.354 0.065 0.375 0.065 X5e(2) -1.71 0.081 -1.681 0.083 -1.696 0.084 X8 -0.39 0.085 -0.389 0.085 -0.411 0.085 X9 -0.45 0.073 -0.443 0.073 -0.446 0.073 X3a 0.29 0.057 0.249 0.059 - 0.112 0.107 -0.032 0.012 - 0.137 0.024 logX6 - - X3a(log(t)) - - - - - 0.298 0.073 logX6(log(t)) - - - - 0.128 0.016 0.504 0.082 -0.361 0.052 Index Index(log(t)) 1.000 - 0.039 - 1.000 - 0.038 - 74 Time-varying effects in final model 75 Final model includes time-varying functions for progesterone ( log(t) ) and tumor size ( log(t) ) Prognostic ability of the Index vanishes in time 76 Software sources MFP • Most comprehensive implementation is in Stata – Command mfp is part since Stata 8 (now Stata 10) • Versions for SAS and R are available – SAS www.imbi.uni-freiburg.de/biom/mfp – R version available on CRAN archive • mfp package • Extensions to investigate interactions • So far only in Stata 77 Concluding comments – MFP • FPs use full information - in contrast to a priori categorisation • FPs search within flexible class of functions (FP1 and FP(2)-44 models) • MFP is a well-defined multivariate model-building strategy – combines search for transformations with BE • Important that model reflects medical knowledge, e.g. monotonic / asymptotic functional forms 78 Towards recommendations for model-building by selection of variables and functional forms for continuous predictors under several assumptions Issue Recommendation Variable selection procedure Backward elimination; significance level as key tuning parameter, choice depends on the aim of the study Functional form for continuous covariates Linear function as the 'default', check improvement in model fit by fractional polynomials. Check derived function for undetected local features Extreme values or influential points Check at least univariately for outliers and influential points in continuous variables. A preliminary transformation may improve the model selected. For a proposal see R & S 2007 Sensitivity analysis Important assumptions should be checked by a sensitivity analysis. Highly context dependent Check of model stability The bootstrap is a suitable approach to check for model stability Complexity of a predictor A predictor should be 'as parsimonious as possible' Sauerbrei et al. SiM 2007 79 Interactions • Interactions are often ignored by analysts • Continuous categorical has been studied in FP context because clinically very important • Continuous continuous is more complex • Interaction with time important for long-term FU survival data 80 MFP extensions • MFPI – treatment/covariate interactions In contrast to STEPP it avoids categorisation • MFPIgen – interaction between two continuous variables • MFPT – time-varying effects in survival data 81 Summary Getting the big picture right is more important than optimising aspects and ignoring others • strong predictors • strong non-linearity • strong interactions • strong non-PH in survival model 82 References Harrell FE jr. (2001): Regression Modeling Strategies. Springer. Royston P, Altman DG. (1994): Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Applied Statistics, 43, 429-467. Royston P, Altman DG, Sauerbrei W (2006): Dichotomizing continuous predictors in multiple regression: a bad idea. Statistics in Medicine, 25, 127-141. Royston P, Sauerbrei W. (2004): A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine, 23, 2509-2525. Royston P, Sauerbrei W. (2005): Building multivariable regression models with continuous covariates, with a practical emphasis on fractional polynomials and applications in clinical epidemiology. Methods of Information in Medicine, 44, 561571. Royston P, Sauerbrei W. (2007): Improving the robustness of fractional polynomial models by preliminary covariate transformation: a pragmatic approach. Computational Statistics and Data Analysis, 51: 4240-4253. Royston P, Sauerbrei W (2008): Multivariable Model-Building - A pragmatic approach to regression analysis based on fractional polynomials for continuous variables. Wiley. Sauerbrei W. (1999): The use of resampling methods to simplify regression models in medical statistics. Applied Statistics, 48, 313-329. Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. (2006): Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Computational Statistics & Data Analysis, 50, 3464-3485. Sauerbrei W, Royston P. (1999): Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society A, 162, 71-94. Sauerbrei W, Royston P, Binder H (2007): Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine, 26:5512-28. Sauerbrei W, Royston P, Look M. (2007): A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biometrical Journal, 49: 453-473. Sauerbrei W, Royston P, Zapien K. (2007): Detecting an interaction between treatment and a continuous covariate: a comparison of two approaches. Computational Statistics and Data Analysis, 51: 4054-4063. Schumacher M, Holländer N, Schwarzer G, Sauerbrei W. (2006): Prognostic Factor Studies. In Crowley J, Ankerst DP (ed.), Handbook of Statistics in Clinical Oncology, Chapman&Hall/CRC, 289-333. 83