
Multivariable regression models with continuous covariates

with a practical emphasis on fractional polynomials and applications in clinical epidemiology

Professor Patrick Royston,

MRC Clinical Trials Unit, London.

Berlin, April 2005.

8/4/2005 1

The problem …

“Quantifying epidemiologic risk factors using non-parametric regression: model selection remains the greatest challenge”

Rosenberg PS et al, Statistics in Medicine 2003; 22:3369-3381

Trivial nowadays to fit almost any model

To choose a good model is much harder


Overview

Context and motivation

Introduction to fractional polynomials for the univariate smoothing problem

Extension to multivariable models

• More on spline models

• Stability analysis

• Stata aspects

Conclusions


Motivation

Often have continuous risk factors in epidemiology and clinical studies – how to model them?

• Linear model may describe a dose-response relationship badly

‘Linear’ = straight line = $\beta_0 + \beta_1 X + \dots$ throughout talk

Using cut-points has several problems

• Splines recommended by some – but are not ideal

Lack a well-defined approach to model selection

 ‘Black box’

Robustness issues


Problems of cut-points

Step-function is a poor approximation to true relationship

Almost always fits data less well than a suitable continuous function

• ‘Optimal’ cut-points have several difficulties

Biased effect estimates

Inflated P-values

Not reproducible in other studies


Example datasets

1. Epidemiology

Whitehall 1

17,370 male Civil Servants aged 40-64 years

Measurements include: age, cigarette smoking, BP, cholesterol, height, weight, job grade

Outcomes of interest: coronary heart disease, all-cause mortality

Logistic regression

Interested in risk as function of covariates

Several continuous covariates

Some may have no influence in multivariable context


Example datasets

2. Clinical studies

German breast cancer study group (BMFT-2)

Prognostic factors in primary breast cancer

Age, menopausal status, tumour size, grade, no. of positive lymph nodes, hormone receptor status

Recurrence-free survival time

Cox regression

686 patients, 299 events

Several continuous covariates

Interested in prognostic model and effect of individual variables


Example: Systolic blood pressure vs. age

[Figure: Whitehall 1, systolic BP vs age; x-axis: age, 40 to 65 years]

Example: Curve fitting

(Systolic BP and age – not linear)

[Figure: Whitehall 1, BP vs age, showing linear function, FP1 function and running line, with 95% CI; x-axis: age, 40 to 65 years]

Empirical curve fitting: Aims

Smoothing

Visualise relationship of Y with X

Provide and/or suggest functional form


Some approaches

• ‘Non-parametric’ (local-influence) models

Locally weighted (kernel) fits (e.g. lowess)

Regression splines

Smoothing splines (used in generalized additive models)

Parametric (non-local influence) models

Polynomials

Non-linear curves

Fractional polynomials

Intermediate between polynomials and non-linear curves

Local regression models

• Advantages

Flexible – because local!

May reveal ‘true’ curve shape (?)

Disadvantages

Unstable – because local!

No concise form for models

Therefore, hard for others to use – publication, comparing results with those from other models

Curves not necessarily smooth

‘Black box’ approach

Many approaches – which one(s) to use?

Polynomial models

Do not have the disadvantages of local regression models, but do have others:

Lack of flexibility (low order)

Artefacts in fitted curves (high order)

Cannot have asymptotes


Fractional polynomial models

Describe for one covariate, X

Multiple regression later

• Fractional polynomial of degree m for X with powers $p_1, \dots, p_m$ is given by

$\mathrm{FP}m(X) = \beta_1 X^{p_1} + \dots + \beta_m X^{p_m}$

Powers $p_1, \dots, p_m$ are taken from a special set

$\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$

Usually m = 1 or m = 2 is sufficient for a good fit

FP1 and FP2 models

FP1 models are simple power transformations

• $1/X^2$, $1/X$, $1/\sqrt{X}$, $\log X$, $\sqrt{X}$, $X$, $X^2$, $X^3$

8 models

FP2 models are combinations of these

For example $\beta_1 (1/X) + \beta_2 X^2$

28 models

• Note ‘repeated powers’ models

For example $\beta_1 (1/X) + \beta_2 (1/X) \log X$

8 models
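The model counts above can be checked by enumeration. The following is a minimal Python sketch, not the Stata implementation; the names `POWERS`, `power_term` and `fp_terms` are illustrative only. It assumes X > 0, that power 0 stands for log X, and that a repeated power multiplies the previous term by log X, as in the example above.

```python
import math
from itertools import combinations_with_replacement

# The special set of FP powers; power 0 is read as log X.
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def power_term(x, p):
    """Return x**p, with the FP convention that p == 0 means log(x). Needs x > 0."""
    return math.log(x) if p == 0 else x ** p


def fp_terms(x, powers):
    """Basis terms of an FP function at x for a sorted tuple of powers.

    A repeated power contributes the previous term multiplied by log(x).
    """
    terms = []
    for j, p in enumerate(powers):
        if j > 0 and p == powers[j - 1]:
            terms.append(terms[-1] * math.log(x))
        else:
            terms.append(power_term(x, p))
    return terms


fp1_models = [(p,) for p in POWERS]                           # 8 single powers
fp2_models = list(combinations_with_replacement(POWERS, 2))   # 28 + 8 pairs
repeated = [pq for pq in fp2_models if pq[0] == pq[1]]        # repeated powers

print(len(fp1_models), len(fp2_models), len(repeated))
print(fp_terms(2.0, (-1, 2)), fp_terms(2.0, (-1, -1)))
```

The regression coefficients are not part of this sketch; an FP2 fit would regress the outcome on the two terms returned by `fp_terms`.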


FP1 and FP2 models: some properties

Many useful curves

A variety of features are available:

Monotonic

Can have asymptote

Non-monotonic (single maximum or minimum)

Single turning-point

Get better fit than with conventional polynomials, even of higher degree


Examples of FP2 curves

- varying powers

[Figure: FP2 curves with powers (-2, 1), (-2, 2), (-2, -1) and (-2, -2)]

Examples of FP2 curves

- single power, different coefficients

[Figure: FP2 curves with powers (-2, 2) and different coefficients; x from 10 to 50]

A philosophy of function selection

Prefer simple (linear) model

Use more complex (non-linear) FP1 or FP2 model if indicated by the data

Contrast to local regression modelling

Already starts with a complex model


Estimation and significance testing for FP models

Fit model with each combination of powers

FP1: 8 single powers

FP2: 36 combinations of powers

Choose model with lowest deviance (MLE)

Comparing FPm with FP(m − 1):

Compare deviance difference with $\chi^2$ on 2 d.f.

One d.f. for power, one d.f. for regression coefficient

Supported by simulations; slightly conservative
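The search over powers can be sketched for the FP1 case. This is a minimal illustration in plain Python, assuming a normal-errors model so that the deviance is n log(RSS/n) up to a constant and a non-zero residual sum of squares; `best_fp1` and its helpers are hypothetical names, not part of any package.

```python
import math

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def transform(x, p):
    """FP1 transformation of x (x > 0); power 0 means log(x)."""
    return math.log(x) if p == 0 else x ** p


def rss_simple_regression(t, y):
    """Residual sum of squares from the least-squares fit of y on (1, t)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))


def best_fp1(x, y):
    """Fit all 8 FP1 models and return (power, deviance) of the best one.

    Deviance is taken as n*log(RSS/n), i.e. -2 log-likelihood up to a
    constant under normal errors (an assumption of this sketch).
    """
    n = len(x)
    results = []
    for p in POWERS:
        rss = rss_simple_regression([transform(xi, p) for xi in x], y)
        results.append((n * math.log(rss / n), p))
    deviance, power = min(results)   # lowest deviance wins
    return power, deviance


# Toy data: a logarithmic relationship with a small deterministic wobble.
x = [float(i) for i in range(1, 21)]
y = [2 + 3 * math.log(xi) + 0.01 * (-1) ** i for i, xi in enumerate(x)]
print(best_fp1(x, y))
```

An FP2 search would follow the same pattern over the 36 pairs of powers, with a two-covariate regression in place of the simple one.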

Selection of FP function

Has flavour of a closed test procedure

• Use $\chi^2$ approximations to get P-values

Define nominal P-value for all tests (often 5%)

Fit linear and best FP1 and FP2 models

Test FP2 vs. null – test of any effect of X ($\chi^2$ on 4 d.f.)

Test FP2 vs. linear – test of non-linearity ($\chi^2$ on 3 d.f.)

Test FP2 vs. FP1 – test of more complex function against simpler one ($\chi^2$ on 2 d.f.)

Example: Systolic BP and age

Model           d.f.   Deviance difference   P-value
FP2 v Null      4      944.57                0.000
FP2 v Linear    3      29.95                 0.000
FP2 v FP1       2      3.29                  0.2

Reminder:

FP1 had power 3: $\beta_1 X^3$

FP2 had powers (1, 1): $\beta_1 X + \beta_2 X \log X$
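The three tests for the systolic BP example can be reproduced from the deviance differences alone. The sketch below is an illustration in plain Python, not the Stata code: `chi2_sf` computes the chi-square upper tail for integer d.f. from the standard recurrence for the incomplete gamma function, and `select_fp_function` is a hypothetical helper that assumes the usual reading of the sequence, namely stopping at the first non-significant test.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(chi2_df > x) for integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with z = x / 2,
    starting from Q(1, z) = exp(-z) or Q(1/2, z) = erfc(sqrt(z)).
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def select_fp_function(dev_null, dev_linear, dev_fp1, alpha=0.05):
    """Choose a function from deviance differences of the best FP2 model
    against the null, linear and best FP1 models (4, 3 and 2 d.f.)."""
    if chi2_sf(dev_null, 4) >= alpha:
        return "null"      # no evidence of any effect of X
    if chi2_sf(dev_linear, 3) >= alpha:
        return "linear"    # no evidence of non-linearity
    if chi2_sf(dev_fp1, 2) >= alpha:
        return "FP1"       # FP2 not significantly better than FP1
    return "FP2"


# Deviance differences from the systolic BP and age example.
choice = select_fp_function(944.57, 29.95, 3.29)
print(choice, round(chi2_sf(3.29, 2), 2))
```

With the nominal 5% level, the first two tests are significant and the third is not, so the FP1 function (power 3) is retained.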

Aside: FP versus spline

Why care about FPs when splines are more flexible?

More flexible, so more unstable

More chance of ‘over-fitting’

In epidemiology, dose-response relationships are often simple

Illustrate by small simulation example


FP versus spline (continued)

Logarithmic relationships are common in practice

• Simulate regression model $y = \beta_0 + \beta_1 \log(X) + \text{error}$

• Error is normally distributed $N(0, \sigma^2)$

Take $\beta_0 = 0$, $\beta_1 = 1$; X has lognormal distribution

Vary $\sigma$ = {1, 0.5, 0.25, 0.125}

Fit FP1, FP2 and spline with 2, 4, 6 d.f.

Compute mean square error

Compare with mean square error for true model
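A reduced version of this simulation can be sketched in plain Python for the FP1 fit only. Several details are assumptions, since the slides do not give them: the sample size (200), the lognormal parameters (log X standard normal), a single replicate per σ, and mean square error measured between the fitted curve and the true curve log X at the observed X. Splines and FP2 are left out.

```python
import math
import random

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def transform(x, p):
    """FP1 transformation of x (x > 0); power 0 means log(x)."""
    return math.log(x) if p == 0 else x ** p


def fit_line(t, y):
    """Least-squares fit of y on (1, t); returns intercept, slope, RSS."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return intercept, slope, rss


def fit_fp1(x, y):
    """Best FP1 model by residual sum of squares: (power, intercept, slope)."""
    best = None
    for p in POWERS:
        b0, b1, rss = fit_line([transform(xi, p) for xi in x], y)
        if best is None or rss < best[3]:
            best = (p, b0, b1, rss)
    return best[:3]


def simulate_mse(sigma, n=200, seed=1):
    """One replicate: y = log(X) + N(0, sigma^2) error, X lognormal.

    Returns the mean squared difference between the fitted FP1 curve and
    the true curve log(X), evaluated at the simulated X values.
    """
    rng = random.Random(seed)
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    y = [math.log(xi) + rng.gauss(0.0, sigma) for xi in x]
    p, b0, b1 = fit_fp1(x, y)
    fitted = [b0 + b1 * transform(xi, p) for xi in x]
    return sum((f - math.log(xi)) ** 2 for f, xi in zip(fitted, x)) / n


results = {sigma: simulate_mse(sigma) for sigma in (1, 0.5, 0.25, 0.125)}
for sigma, mse in results.items():
    print(sigma, mse)
```

A fuller study would average over many replicates and add the FP2 and spline fits for comparison.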


FP vs. spline (continued)

[Figure: four panels, Sigma = 1, 0.5, 0.25 and 0.125; x-axis: x, 0 to 6]

FP vs. spline (continued)

FP1 and spline with 2 df

[Figure: four panels; solid: FP1; dashed: spline 2 df; x-axis 0 to 6]

FP vs. spline (continued)

FP2 and spline with 4 df

[Figure: four panels; x-axis 0 to 5]

FP vs. spline (continued)

FP vs. spline: prediction error

[Figure: prediction error against sigma (0.125, 0.25, 0.5, 1) for the true model, FP1, FP2, and splines with 2, 4 and 6 df]

FP vs. spline (continued)

In this example, spline usually less accurate than FP

FP2 less accurate than FP1 (over-fitting)

FP1 and FP2 more accurate than splines

Splines often had non-monotonic fitted curves

Could be medically implausible

Of course, this is a special example


Multivariable FP (MFP) models

Assume have k > 1 continuous covariates and perhaps some categoric or binary covariates

Allow dropping of non-significant variables

Wish to find best multivariable FP model for all X’s

Impractical to try all combinations of powers

Require iterative fitting procedure


Fitting multivariable FP models

(MFP algorithm)

Combine backward elimination of weak variables with search for best FP functions

Determine fitting order from linear model

Apply FP model selection procedure to each X in turn, fixing functions (but not $\beta$’s) for other X’s

Cycle until FP functions (i.e. powers) and variables selected do not change
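The control flow of the cycle can be sketched without committing to a regression model. In the Python outline below, `mfp_cycle` is a hypothetical name, and `select_function` is a placeholder for the per-variable FP selection step: it receives a variable and the functions currently chosen for the other variables, and returns 'out', 'linear' or a tuple of powers. This is an outline of the backfitting loop only, not the mfp implementation.

```python
def mfp_cycle(order, select_function, max_cycles=10):
    """Cycle over variables until the selected functions stop changing.

    order           -- variable names in fitting order (e.g. from a linear model)
    select_function -- callable(name, others) returning the function chosen
                       for `name`, with the functions of the other variables
                       held fixed (their coefficients are re-estimated inside)
    Returns the final {name: function} mapping and the number of cycles run.
    """
    current = {name: "linear" for name in order}   # start from a linear model
    for cycle in range(1, max_cycles + 1):
        previous = dict(current)
        for name in order:
            others = {k: v for k, v in current.items() if k != name}
            current[name] = select_function(name, others)
        if current == previous:                     # nothing changed: converged
            return current, cycle
    return current, max_cycles


def toy_selector(name, others):
    """Made-up rule for illustration: x1 always gets an FP2 function,
    and x3 drops out once x1 is no longer linear."""
    if name == "x1":
        return (-2, -0.5)
    if name == "x3":
        return "linear" if others["x1"] == "linear" else "out"
    return "linear"


model, cycles = mfp_cycle(["x3", "x1", "x5"], toy_selector)
print(model, cycles)
```

In the toy run the selection for x3 changes in the second cycle because the function for x1 changed in the first, which is why a single pass is not enough.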


Example: Prognostic factors in breast cancer

Aim to develop a prognostic index for risk of tumour recurrence or death

Have 7 prognostic factors

4 continuous, 3 categorical

Select variables and functions using 5% significance level


Univariate linear analysis

Variable   Name                            $\chi^2$
X1         Age                             0.58
X2         Menopausal status               0.28
X3         Tumour size                     15.68
X4a        Grade 2 or 3                    19.92
X4b        Grade 3                         8.19
X5         No. of positive lymph nodes     50.02
X6         Progesterone receptor status    34.04
X7         Oestrogen receptor status       4.70

Univariate FP2 analysis

Variable      Powers        $\chi^2$   d.f.   P          Gain
X1  age       (-2, -0.5)    17.61      4      0.001      17.03
X3  size      (-1, 3)       19.81      4      0.001      4.13
X5  nodes     (1, 2)        81.36      4      < 0.001    31.34
X6  PgR       (-0.5, 0)     52.73      4      < 0.001    18.69
X7  ER        (-2, 1)       23.07      4      < 0.001    18.37

Gain compares FP2 with linear on 3 d.f.

All factors except for X3 have a non-linear effect

Multivariable FP analysis

Variable          FP etc.       $\chi^2$   d.f.   P
X1   age          (-2, -0.5)    19.33      4      0.001
X3   size         Out           5.31       4      0.3
X5   nodes        (-2, -1)      74.14      4      <0.001
X6   PgR          0.5           32.70      4      <0.001
X7   ER           Out           2.15       4      0.7
X2   mens.        Out           0.21       1      0.6
X4a  grad 2/3     In            4.59       1      0.03
X4b  grad 3       Out           0.15       1      0.7

Comments on analysis

Conventional backwards elimination at 5% level selects X4a, X5, X6, and X1 is excluded

FP analysis picks up same variables as backward elimination, and additionally X1

Note considerable non-linearity of X1 and X5

X1 has no linear influence on risk of recurrence

FP model detects more structure in the data than the linear model


Plots of fitted FP functions

[Figure: Breast cancer, fitted FP functions in three panels: age (years), no. of positive lymph nodes, progesterone receptor status]

Survival by risk groups

Prognostic classification scheme

[Figure: recurrence-free survival (years, 0 to 8) for low, medium and high risk groups]

Robustness of FP functions

Breast cancer example showed non-robust functions for nodes – not medically sensible

Situation can be improved by performing covariate transformation before FP analysis

Can be done systematically (work in progress)

Sauerbrei & Royston (1999) used negative exponential transformation of nodes

exp(-0.12 * number of nodes)
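The effect of this pre-transformation is easy to tabulate. The short Python sketch below uses the constant 0.12 quoted above; the function name `transform_nodes` is illustrative only.

```python
import math


def transform_nodes(nodes, rate=0.12):
    """Negative exponential transformation exp(-rate * nodes).

    Maps node counts 0, 1, 2, ... into (0, 1], so that very large counts
    are pulled close together before the FP analysis.
    """
    return math.exp(-rate * nodes)


for n in (0, 1, 5, 10, 20, 50):
    print(n, transform_nodes(n))
```

Because the transformed value is bounded and nearly flat for large counts, a few patients with many positive nodes have much less leverage on the fitted function.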

Making the function for lymph nodes more robust

[Figure: fitted functions for no. of positive lymph nodes (0 to 50): original and with exponential transformation]

2nd example: Whitehall 1

MFP analysis

Covariate            FP etc.
Age                  Linear
Cigarettes           0.5
Systolic BP          -1, -0.5
Total cholesterol    Linear
Height               Linear
Weight               -2, 3
Job grade            In

No variables were eliminated by the MFP algorithm

Weight is eliminated by linear backward elimination

Plots of FP functions

[Figure: Whitehall 1, multivariable FP analysis, six panels: age at entry, cigarettes/day, systolic BP, total cholesterol (mmol/l), weight (kg), height (cm)]

A new multivariable regression algorithm with spline functions

• Inspired by closed test procedure for selecting an FP function

Start with predefined number of knots

Determines maximum complexity of function

• Use predetermined knot positions

E.g. at fixed percentile positions of the distribution of x

Simplest function (default) is linear

Closed test procedure to reduce the knot set if some knots are not significant

Apply backfitting procedure as in mfp

Implemented in Stata as new command mrsnb


Splines: Breast cancer example

Selects variables similar to mfp

Grade 2/3 omitted, otherwise selected variables are identical

Knots: age(46, 53); transformed nodes(linear);

PgR(7, 132)

Deviance of selected model almost identical to mfp model


Plots of fitted FP functions

[Figure: three panels: age (years), no. of positive lymph nodes, progesterone receptor status. Solid lines, FP; dashed lines, spline]

Improving the robustness of spline models

Often have covariates with positively skew distributions – can produce curve artefacts

Simple approach is to log-transform covariates with a skew distribution – e.g. skewness coefficient $\gamma_1$ > 0.5

Then fit the spline model

In the breast cancer example, this approach gives a more satisfactory log function for PgR
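The rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: skewness is taken as the moment coefficient m3 / m2^1.5, the threshold 0.5 is the example value above, and all values are assumed strictly positive (a covariate containing zeros would need a shift before taking logs, which is not handled here). The function names are hypothetical.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def maybe_log_transform(values, threshold=0.5):
    """Log-transform a positively skewed covariate before spline fitting.

    Returns (values, transformed_flag). Assumes every value is > 0.
    """
    if skewness(values) > threshold:
        return [math.log(v) for v in values], True
    return list(values), False


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = [1.0, 1.0, 1.0, 1.0, 100.0]
print(maybe_log_transform(symmetric)[1], maybe_log_transform(skewed)[1])
```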


Stability of FP models

Models (variables, FP functions) selected by statistical criteria – cut-off on P-value

• Approach has several advantages …

• … and also is known to have problems

Omission bias

Selection bias

Unstable – many models may fit equally well


Stability investigation

Instability may be studied by bootstrap resampling (sampling with replacement)

Take bootstrap sample B times

Select model by chosen procedure

Count how many times each variable is selected

Summarise inclusion frequencies & their dependencies

Study fitted functions for each covariate

May lead to choosing several possible models, or a model different from the original one
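The resampling loop itself is short. The Python sketch below is an outline only: `bootstrap_inclusion` is a hypothetical name, and `select_model` stands in for whatever selection procedure is being studied (for example an MFP fit), returning the set of variables it selects on a given sample. The toy selector at the end is made up purely to exercise the loop.

```python
import random
from collections import Counter


def bootstrap_inclusion(data, select_model, n_boot=200, seed=1):
    """Bootstrap inclusion frequencies for a model selection procedure.

    data         -- list of observations (rows)
    select_model -- callable(sample) returning an iterable of selected variables
    Returns ({variable: inclusion frequency}, Counter of distinct models).
    """
    rng = random.Random(seed)
    n = len(data)
    var_counts = Counter()
    model_counts = Counter()
    for _ in range(n_boot):
        # Sample n observations with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        selected = frozenset(select_model(sample))
        var_counts.update(selected)       # count each selected variable once
        model_counts[selected] += 1       # count how often each model occurs
    freqs = {v: c / n_boot for v, c in var_counts.items()}
    return freqs, model_counts


def toy_selector(sample):
    """Made-up rule: 'nodes' is always selected, 'age' only in some samples."""
    if sum(sample) / len(sample) > 9.5:
        return {"nodes", "age"}
    return {"nodes"}


freqs, models = bootstrap_inclusion(list(range(20)), toy_selector)
print(freqs, len(models))
```

The `models` counter gives the number of distinct models found, and cross-tabulating the selected sets would show dependencies among inclusion frequencies.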


Bootstrap stability analysis of the breast cancer dataset

5000 bootstrap samples taken (!)

MFP algorithm with Cox model applied to each sample

Resulted in 1222 different models (!!)

Nevertheless, could identify stable subset consisting of 60% of replications

Judged by similarity of functions selected


Bootstrap stability analysis of the breast cancer dataset

Variable                  Model selected   % bootstraps model selected
Age                       FP1              16
                          FP2              76
Menopausal status                          20
Tumour size               FP1              34
                          FP2              6
Grade 2/3                                  58
Grade 3                                    9
Lymph nodes               FP1              100
Progesterone receptors    FP1              95
                          FP2              4
Oestrogen receptors       FP1              13
                          FP2              6

Bootstrap analysis: summaries of fitted curves from stable subset

[Figure: four panels: age (years), number of positive lymph nodes, tumour size (mm), PgR (fmol/L)]

Presentation of models for continuous covariates

The function + 95% CI gives the whole story

Functions for important covariates should always be plotted

In epidemiology, sometimes useful to give a more conventional table of results in categories

This can be done from the fitted function


Example: Cigarette smoking and all-cause mortality (Whitehall 1)

Cigarettes per day           Number                OR (model based)
Range           Ref. point   At risk    Dying      Estimate   95% CI
0 (referent)    0            10103      690        1.00       --
1-10            5            2254       243        1.69       1.59, 1.80
11-20           15           3448       494        2.25       2.04, 2.49
21-30           25           1117       185        2.60       2.31, 2.91
31-40           35           283        48         2.86       2.52, 3.24
41-50           45           43         8          3.07       2.68, 3.52
51-60           55           12         2          3.25       2.82, 3.75

Other issues (1)

Handling continuous confounders

May use a larger P-value for selection e.g. 0.2

Not so concerned about functional form here

Binary/continuous covariate interactions

Can be modelled using FPs (Royston & Sauerbrei 2004)

Adjust for other factors using MFP


Other issues (2)

Time-varying effects in survival analysis

Can be modelled using FP functions of time (Berger; also Sauerbrei & Royston, in progress)

Checking adequacy of FP functions

May be done by using splines

Fit FP function and see if spline function adds anything, adjusting for the fitted FP function


Stata aspects

Command mfp is part of Stata 8

• Example of use:

mfp stcox x1 x2 x3 x4a x4b x5 x6 x7 hormon, select(0.05, hormon:1)

Command mrsnb is available from the author (PR)

Example of use:

mrsnb stcox x1 x2 x3 x4a x4b x5 x6 x7 hormon, select(0.05, hormon:1)

Command mfpboot is available from PR

Does bootstrap stability analysis of MFP models


Concluding remarks (1)

FP method in general

No reason (other than convention) why regression models should include only positive integer powers of covariates

FP is a simple extension of an existing method

Simple to program and simple to explain

Parametric, so can easily get predicted values

FP usually gives better fit than standard polynomials

Cannot do worse, since standard polynomials are included


Concluding remarks (2)

Multivariable FP modelling

Many applications in general context of multiple regression modelling

Well-defined procedure based on standard principles for selecting variables and functions

Aspects of robustness and stability have been investigated (and methods are available)

Much experience gained so far suggests that method is very useful in clinical epidemiology


Some references

Royston P, Altman DG (1994) Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Applied Statistics 43: 429-467.

Royston P, Altman DG (1997) Approximating statistical functions by using fractional polynomial regression. The Statistician 46: 1-12.

Sauerbrei W, Royston P (1999) Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. JRSS(A) 162: 71-94. Corrigendum JRSS(A) 165: 399-400, 2002.

Royston P, Ambler G, Sauerbrei W (1999) The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology 28: 964-974.

Royston P, Sauerbrei W (2004) A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine 23: 2509-2525.

Royston P, Sauerbrei W (2003) Stability of multivariable fractional polynomial models with selection of variables and transformations: a bootstrap investigation. Statistics in Medicine 22: 639-659.

Armitage P, Berry G, Matthews JNS (2002) Statistical Methods in Medical Research. Oxford, Blackwell.
