Improving long-term survival estimation through flexible models, combining evidence and accessible software

Chris Jackson
MRC Biostatistics Unit, Cambridge, U.K.
UCL, 7 July 2015
Chris Jackson, MRC-BSU Cambridge
Improving long-term survival estimation
1/ 37
Overview
“Improving long-term survival estimation through
1. flexible models
2. combining evidence
3. accessible software...
...but you can’t currently have all three at the same time!”
Premise of the talk
Long-term survival modelling needs
- Sufficient long-term data:
  - individual data with long follow up, or
  - individual data with short follow up + other data on the long-term period
- Models flexible enough to capture the data
- Usable software, skills.
[Figure: survival probabilities against years (0 to 60), with observed data over the follow-up period and extrapolation beyond it]

Overview
1. Using long-term population data to extrapolate RCT evidence over time (Part I)
   (Benaglia, Jackson & Sharples, Stat. Med (2015) 34(5):796–811)
   flexible modelling | combining evidence | accessible software
2. Flexible parametric survival models for one individual dataset (Part II)
   - The flexsurv R package for parametric survival modelling
   flexible modelling | combining evidence | accessible software

Part I
Using long-term population data to
extrapolate short-term survival data
Motivating example: ICDs
Implantable cardioverter defibrillators (ICDs) compared with anti-arrhythmic drugs (AADs) for prevention of sudden cardiac death in people with cardiac arrhythmia.
Data:
- Individual data from cohort of 535 UK cardiac arrhythmia patients implanted with ICDs between 1991 and 2002.
  - Relatively short-term follow-up: approximately 75% of people followed for less than 5 years, maximum 10 years
- Meta-analysis of three (non-UK) RCTs (published HRs).
- UK population mortality statistics by age, sex, cause of death.
Estimate the survival curve over the lifetime of ICD and AAD patients in the UK
(Benaglia, Jackson & Sharples, Stat. Med (2015) 34(5):796–811)

Previous Work: central idea
(Demiris & Sharples, Stat. Med. 2006)
Use UK population data with same age/sex distribution to anchor the ICD population risk
[Figure: survival probabilities against years (0 to 60) for the ICD cohort and the UK population, with observed data over the follow-up period and extrapolation beyond it]

Previous Work: key assumption
Constant (multiplicative) hazard ratio between ICD and UK population:
$h_{\mathrm{ICD}}(t) = e^{\beta} h_{\mathrm{UK}}(t)$, for $t > 0$
[Figure: log-hazard against years (0 to 60); the ICD cohort log-hazard and the UK population log-hazard are separated by a constant distance β]
This seems a strong assumption:
1. ICD patients at greater risk of arrhythmia death
2. If proportion of deaths caused by arrhythmia changes over time, then extrapolating constant HR for all causes may be inaccurate

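A minimal sketch of what the constant hazard ratio assumption implies for survival: if the hazards are proportional, the cumulative hazards are too, so the cohort survival is the population survival raised to the power $e^{\beta}$. The Weibull reference curve `S_uk` and the value of `beta` are hypothetical stand-ins, not the UK data or estimates from the talk.

```r
# Hypothetical Weibull survival curve standing in for the matched UK population
S_uk <- function(t, shape = 1.5, rate = 0.002) exp(-rate * t^shape)

# Constant log hazard ratio (illustrative value only)
beta <- log(3)

# Proportional hazards: H_ICD(t) = exp(beta) * H_UK(t),
# hence S_ICD(t) = S_UK(t)^exp(beta)
S_icd <- function(t) S_uk(t)^exp(beta)

years <- c(0, 10, 20, 30, 40)
data.frame(years, uk = S_uk(years), icd = S_icd(years))
```

Because the exponent is fixed, the extrapolated cohort curve is determined entirely by the population curve and the single parameter β, which is why the assumption is so strong.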
Proportion of UK deaths which are due to arrhythmia
[Figure: proportion of arrhythmic deaths (0.00 to 0.30) by age group (15-19 to 95+), for males and females, UK population 2002]

Account for multiple causes of death when modelling
Extrapolate constant cause-specific instead of all-cause hazard ratio
Simulation study: ignoring cause-specific nature of hazard gives bias in mean survival
- particularly when hazard increases much quicker for other cause
- See paper for full details (Benaglia, Jackson & Sharples, Stat. Med (2015) 34(5):796–811)
Application to ICD example...

Model to extrapolate survival for ICD patients
(not considering AAD control group, RCT data for the moment...)
- General population data: cause of death ($k$ = arrhythmic, non-arrhythmic) known.
  Cause-specific survival is Weibull with hazard:
  $h^{(k)}_{\mathrm{UK}}(t) = \alpha_k \lambda_k t^{\alpha_k - 1}$
- ICD cohort: cause of death unknown.
  Overall survival follows a polyhazard model (Louzada-Neto, Biometrics 1999):
  $h_{\mathrm{ICD}}(t) = h^{\mathrm{arr}}_{\mathrm{ICD}}(t) + h^{\mathrm{other}}_{\mathrm{ICD}}(t)$
  - $t$: minimum time to one of 2 possible causes of death
  - Hazard is the sum of 2 cause-specific hazards

Cause-specific proportional hazards assumption
ICD cohort hazard is related to the general population hazard as:
$h_{\mathrm{ICD}}(t) = h^{\mathrm{arr}}_{\mathrm{ICD}}(t) + h^{\mathrm{other}}_{\mathrm{ICD}}(t)$
$\quad = e^{\beta} h^{\mathrm{arr}}_{\mathrm{UK}}(t) + h^{\mathrm{other}}_{\mathrm{UK}}(t)$
$\quad = e^{\beta} \alpha_1 \lambda_1 t^{\alpha_1 - 1} + \alpha_2 \lambda_2 t^{\alpha_2 - 1}$ (poly-Weibull)
Arrhythmia hazard is proportional to UK matched population. Other-cause hazard is identical.
- Joint Bayesian model for ICD cohort + UK population data
- Estimate joint posterior of parameters $\alpha_1, \alpha_2, \lambda_1, \lambda_2, \beta$ by MCMC (using WinBUGS).
- WBDev add-on needed to implement the poly-Weibull distribution for the cohort data

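The poly-Weibull hazard above integrates in closed form, which a short R sketch can make concrete. This is only an illustration of the formula, not the WinBUGS/WBDev implementation used in the talk; the function names and all parameter values are hypothetical.

```r
# Poly-Weibull hazard: arrhythmia hazard scaled by exp(beta),
# other-cause hazard taken unchanged from the population
h_polyweibull <- function(t, alpha1, lambda1, alpha2, lambda2, beta) {
  exp(beta) * alpha1 * lambda1 * t^(alpha1 - 1) +
    alpha2 * lambda2 * t^(alpha2 - 1)
}

# Survival is exp(-cumulative hazard); each term alpha * lambda * t^(alpha - 1)
# integrates to lambda * t^alpha
S_polyweibull <- function(t, alpha1, lambda1, alpha2, lambda2, beta) {
  exp(-(exp(beta) * lambda1 * t^alpha1 + lambda2 * t^alpha2))
}

# Hypothetical parameter values, chosen only to make the example run
pars <- list(alpha1 = 1.2, lambda1 = 0.01,
             alpha2 = 2.5, lambda2 = 0.0005, beta = log(2))

t_grid <- c(1, 5, 10, 20)
haz  <- do.call(h_polyweibull, c(list(t = t_grid), pars))
surv <- do.call(S_polyweibull, c(list(t = t_grid), pars))
data.frame(years = t_grid, hazard = haz, survival = surv)
```

With different shapes `alpha1` and `alpha2`, the share of the total hazard due to arrhythmia changes over time, which a single Weibull with a constant all-cause hazard ratio cannot represent.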
Weakly informative prior distributions
Express beliefs on an intuitive scale; exact choice may make a difference for small populations
Weibull rate λ:
- Age around 60 on study entry: cannot survive more than 60 additional years. Mean survival ∼ U(0, 60).
- $1/\lambda \sim U(0, 100)$ gives a mean $(1/\lambda)\,\Gamma(1 + 1/\alpha)$ of < 60, for all plausible α.
Weibull shape α: controls hazard vs. time: $h(t) = \alpha\lambda(\lambda t)^{\alpha - 1}$
- Hazard ratio for doubled time $t$ is $2^{\alpha - 1}$.
- Prior mean of 1.5 for this, with 95% CI about (0.64, 100)
- implies $\log(\alpha) \sim N(0.5, \sigma = 0.78)$
Log HR β between ICD patients and general population: 95% CI for HR (1/150, 150) → $\beta \sim N(0, \sigma = 2.5)$

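The translation between the intuitive scale and the normal priors can be checked with a few lines of arithmetic. The sketch below uses only the prior parameters quoted on the slide; taking the 95% interval as ±1.96 standard deviations is an assumption (the quoted HR range of 150 is closer to ±2 standard deviations, since exp(5) is about 148).

```r
z <- qnorm(0.975)  # about 1.96

# Shape prior: log(alpha) ~ N(0.5, sd = 0.78)
alpha_ci <- exp(0.5 + c(-1, 1) * z * 0.78)

# Implied hazard ratio for doubled time, 2^(alpha - 1):
# value at the prior median of alpha, and 95% interval
hr_doubling_median <- 2^(exp(0.5) - 1)
hr_doubling_ci <- 2^(alpha_ci - 1)

# Log hazard ratio prior: beta ~ N(0, sd = 2.5), on the HR scale
hr_ci <- exp(c(-1, 1) * z * 2.5)

round(c(median = hr_doubling_median,
        lower = hr_doubling_ci[1], upper = hr_doubling_ci[2]), 2)
round(hr_ci, 4)
```

The interval for the doubled-time hazard ratio comes out at roughly (0.64, 97), close to the (0.64, 100) quoted, and the HR interval at roughly (1/134, 134).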
Extrapolating real ICD cohort data
[Figure: two panels (Men, Women) of survival probability against years (0 to 50), showing the general population and the ICD cohort extrapolated under the Weibull and poly-Weibull models]
Mean survival:
          Poly-Weibull     Weibull model
Men       8.68 (0.80)      8.48 (0.72)
Women     11.13 (2.23)     10.04 (1.98)
- Ignore the cause-specific hazard (Weibull) or account for it (poly-Weibull)
- More bias for women when ignoring it
  - due to time-varying proportion of deaths due to arrhythmia.

Including an intervention effect from literature
Hazards for three groups under Poly-Weibull model:
$h_{\mathrm{UK}}(t) = h^{\mathrm{arr}}_{\mathrm{UK}}(t) + h^{\mathrm{other}}_{\mathrm{UK}}(t)$
$h_{\mathrm{ICD}}(t) = e^{\beta} h^{\mathrm{arr}}_{\mathrm{UK}}(t) + h^{\mathrm{other}}_{\mathrm{UK}}(t)$
$h_{\mathrm{AAD}}(t) = e^{\gamma_a + \beta} h^{\mathrm{arr}}_{\mathrm{UK}}(t) + h^{\mathrm{other}}_{\mathrm{UK}}(t)$
Meta-analysis of ICD vs AAD trials, published HR for arrhythmia mortality, gives a prior for $\gamma_a$.
For the (probably biased) Weibull model we have:
$h_{\mathrm{UK}}(t) = \mu_1 \alpha t^{\alpha - 1} = e^{\beta_0} \alpha t^{\alpha - 1}$
$h_{\mathrm{ICD}}(t) = e^{\beta_1} h_{\mathrm{UK}}(t) = e^{\beta_0 + \beta_1} \alpha t^{\alpha - 1}$
$h_{\mathrm{AAD}}(t) = e^{\gamma} h_{\mathrm{ICD}}(t) = e^{\beta_0 + \beta_1 + \gamma} \alpha t^{\alpha - 1}$
Prior for γ from published meta-analysis HR for all-cause mortality.
Outcome of interest → life years gained (LYG) by ICDs vs AADs.

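Life years gained is the difference between the areas under the two survival curves. A rough sketch under the poly-Weibull hazards above, using numerical integration at fixed parameter values; the talk computes this from the joint posterior, and every parameter value and function name here is hypothetical.

```r
# Poly-Weibull survival with a log hazard ratio applied to the arrhythmia cause only
surv_pw <- function(t, log_hr_arr, a1 = 1.2, l1 = 0.01, a2 = 2.5, l2 = 0.0005) {
  exp(-(exp(log_hr_arr) * l1 * t^a1 + l2 * t^a2))
}

beta    <- log(2)    # ICD cohort vs UK population, arrhythmia hazard (hypothetical)
gamma_a <- log(1.5)  # AAD vs ICD, arrhythmia hazard (hypothetical)

# Expected survival time = area under the survival curve up to the horizon
mean_surv <- function(log_hr_arr, horizon = 60) {
  integrate(surv_pw, lower = 0, upper = horizon, log_hr_arr = log_hr_arr)$value
}

mean_icd <- mean_surv(beta)
mean_aad <- mean_surv(beta + gamma_a)
lyg <- mean_icd - mean_aad
c(ICD = mean_icd, AAD = mean_aad, LYG = lyg)
```

Because only the arrhythmia term is scaled by $e^{\gamma_a}$, the gain depends on how much of the total hazard is arrhythmic at each age, which is where the poly-Weibull and Weibull models part ways.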
Extrapolating incremental survival between interventions
- ICD cohort extrapolated using population data
- AAD survival generated with aid of meta-analysis.
- Life-years gained from ICD appears biased if the Weibull model is used
  - Slightly more apparent bias for women
[Figure: survival probability against years (0 to 50) for the general population and for the ICD and AAD groups, the latter two under both the poly-Weibull and Weibull models]
Life-years gained from ICD
           Weibull        Poly-Weibull
Overall    1.82 (0.49)    3.12 (0.61)
Women      1.89 (0.62)    3.11 (0.76)
Men        1.73 (0.47)    2.91 (0.58)
... Still bias for men

Risks of combining data (1): data inconsistency
Causes of death may be recorded inconsistently between
- meta-analysis of ICD vs drug trials “HR for arrhythmia deaths”
- population mortality data
Sensitivity analysis: assume 10%-20% of “arrhythmia”/“non-arrhythmia” deaths are misclassified.
- e.g. if fewer deaths are actually affected by treatment, expected survival gains from treatment are lower
- Still doesn’t explain the discrepancy between models for men.

Risks of combining data (2): assumptions about hazards
We assumed ICD patients had
- arrhythmia hazard proportional (= greater)
- other-cause hazard identical
to general population.
- What if ICD patients at greater risk from some other causes (other heart disease), as well as arrhythmia?
- May have led to biases in survival (underestimation of AAD-specific survival in poly-Weibull model... reasoning for this in paper)

Conclusions: survival extrapolation example
- Bayesian models useful for combining short-term RCT / cohort and longer-term survival data.
- Ignoring cause-specific hazard, thus misspecifying the underlying model, introduces bias in survival estimates.
  - may underestimate or overestimate overall survival.
- Bias can be alleviated by modelling cause-specific hazards
  - but requires cause-specific survival data / treatment effects
  - and information about which causes will be affected by disease status and / or treatment
- Bias for treatment comparisons may be less if bias acts in the same way in all treatment groups.
- Sensitivity analysis to model / data assumptions important
- Requires model-specific BUGS code...
(Benaglia, Jackson & Sharples, Stat. Med (2015) 34(5):796–811)

Part II
Flexible parametric survival models and
software
Flexible parametric models
- 2-parameter Weibull model judged adequate in ICD application, but this is not always the case.
- 3–4 parameter models (generalised gamma / F) (see e.g. Jackson et al Int J Biostat) sometimes better
More flexible is a spline-based model (Royston & Parmar, Stat. Med. 2002)
- can model both baseline hazard and non-proportional hazards between groups with any number of parameters
- can fit as well as needed
- can be implemented in Stata (stpm2) and R (flexsurv)

Spline survival model
Log cumulative hazard (log(−log survival))
- Weibull: linear function of log time t
- Spline: piecewise cubic function of log(t)
Pieces separated by knots to span the data
0 knots: Weibull
n knots: n + 2 parameters
[Figure: log(−log(survival)) against years (log scale, 0.1 to 10), comparing the Weibull fit with spline fits using 1, 2 and 3 knots]
[Figure: AIC (axis from 1700 to 1760) for the Weibull model and for splines with 1 to 5 knots]
[Figure: survival against years (0 to 10), comparing the Weibull fit with spline fits using 1, 2 and 3 knots]
(German breast cancer data, see Sauerbrei & Royston, J.Roy.Stat.Soc A 1999)

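A comparison of this kind can be sketched with `flexsurvspline()` and the `bc` breast cancer data shipped with flexsurv. Fitting without covariates and placing knots at the package's default quantiles are assumptions, so the AIC values need not match those plotted in the talk.

```r
library(flexsurv)  # provides flexsurvspline() and the bc data; attaches survival

# Spline models on the log cumulative hazard scale with 0 to 5 internal knots;
# k = 0 is the Weibull model
n_knots <- 0:5
fits <- lapply(n_knots, function(k) {
  flexsurvspline(Surv(recyrs, censrec) ~ 1, data = bc, k = k, scale = "hazard")
})

# Parameter count and AIC for each fit, taken from the fitted model objects
comparison <- data.frame(
  knots  = n_knots,
  params = sapply(fits, function(f) f$npars),
  AIC    = sapply(fits, function(f) f$AIC)
)
comparison
```

The `params` column should follow the "n knots: n + 2 parameters" rule, and the AIC column shows where additional knots stop improving the fit.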
Spline equations
Log cumulative hazard in terms of $x = \log(t)$, parameters γ
$\log(-\log S(x, \gamma)) = \gamma_0 + \gamma_1 x$ for Weibull
$\log(-\log S(x, \gamma)) = \gamma_0 + \gamma_1 x + \gamma_2 v_1(x) + \ldots + \gamma_{m+1} v_m(x)$ for spline with $m$ knots. $v_j(x)$ is the $j$th basis function
$v_j(x) = (x - k_j)^3_+ - \lambda_j (x - k_{\min})^3_+ - (1 - \lambda_j)(x - k_{\max})^3_+$, where $\lambda_j = \dfrac{k_{\max} - k_j}{k_{\max} - k_{\min}}$
and $(x - a)_+ = \max(0, x - a)$.
A cubic specially constructed to give a smooth function at the knots $k_{\min}, \ldots, k_{\max}$

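The basis function is short enough to write out directly. A minimal base-R sketch of the definition above for a single internal knot; the function names, knot positions and coefficients are hypothetical and chosen only for illustration.

```r
# Basis function v_j(x) for internal knot k_j with boundary knots k_min, k_max
spline_basis <- function(x, k_j, k_min, k_max) {
  pos_cube <- function(u) pmax(u, 0)^3          # (u)_+^3
  lambda_j <- (k_max - k_j) / (k_max - k_min)
  pos_cube(x - k_j) - lambda_j * pos_cube(x - k_min) -
    (1 - lambda_j) * pos_cube(x - k_max)
}

# Log cumulative hazard for a one-knot spline; gamma = (gamma0, gamma1, gamma2)
log_cum_haz <- function(t, gamma, k_j, k_min, k_max) {
  x <- log(t)
  gamma[1] + gamma[2] * x + gamma[3] * spline_basis(x, k_j, k_min, k_max)
}

# Hypothetical knots (on the log-time scale) and coefficients
k_min <- log(0.1); k_j <- log(2); k_max <- log(7)
gamma <- c(-3, 1.5, -0.2)

t_grid <- c(0.5, 1, 5, 10)
surv <- exp(-exp(log_cum_haz(t_grid, gamma, k_j, k_min, k_max)))
data.frame(years = t_grid, survival = surv)
```

The weights $\lambda_j$ make the cubic and quadratic terms cancel beyond $k_{\max}$, so the log cumulative hazard is linear in log time outside the boundary knots, matching the Weibull form in the tails.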
Modelling covariates, and non-proportional hazards
Log cumulative hazard defined by
$\log(-\log S(x, \gamma)) = \gamma_0 + \gamma_1 x + \gamma_2 v_1(x) + \ldots + \gamma_{m+1} v_m(x)$
Covariates $z$ can be placed on any parameter γ
- $\gamma_0 = \beta^{\top} z$ gives a proportional hazards model
  - Sufficiently flexible splines give similar answers to Cox models
- Modelling any other $\gamma_r$ as linear in $z$ gives non-proportional hazards:
  - hazard ratio can be an arbitrarily flexible function of time

Modelling covariates, and non-proportional hazards
[Figure: survival against years (0 to 10) for the Good, Medium and Poor prognostic groups, under the Weibull fit (PH) and the spline fit (PH)]
Hazard ratios (vs. Good) under proportional hazards:
           Medium vs good prognosis    Poor vs good prognosis
Weibull    2.33 (1.67, 3.26)           5.33 (3.86, 7.35)
Cox        2.32 (1.66, 3.24)           5.04 (3.65, 6.96)
Spline     2.31 (1.65, 3.23)           5.02 (3.64, 6.93)

Modelling covariates, and non-proportional hazards
Relaxing the proportional hazards assumption
- improves short-term fit
- changes extrapolation
[Figure: survival against years (0 to 10) for the Good, Medium and Poor prognostic groups, under the spline fit (PH) and the spline fit (non-PH)]

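In flexsurv, relaxing proportional hazards amounts to letting a second spline parameter depend on the covariate. A sketch using the `bc` data; the choice of one internal knot and of `gamma1` as the extra covariate-dependent parameter are assumptions, since the slides do not state which specification produced the plotted fits.

```r
library(flexsurv)

# Proportional hazards: prognostic group shifts gamma0 only
sp_ph <- flexsurvspline(Surv(recyrs, censrec) ~ group,
                        data = bc, k = 1, scale = "hazard")

# Non-proportional hazards: group also modifies gamma1, the coefficient of log(t)
sp_nonph <- flexsurvspline(Surv(recyrs, censrec) ~ group + gamma1(group),
                           data = bc, k = 1, scale = "hazard")

# Compare fit within the data
c(PH = sp_ph$AIC, nonPH = sp_nonph$AIC)

# Fitted survival at 5 years, and extrapolated to 20 years, for each group
summary(sp_nonph, t = c(5, 20), type = "survival")
```

Comparing the 20-year summaries of `sp_ph` and `sp_nonph` shows how the two specifications can diverge once they are pushed beyond the follow-up period.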
Ongoing challenges with spline models
Interconnected issues:
- Hard to combine with evidence synthesis...
- ...No easy Bayesian implementation (tricky in BUGS)...
- ...Priors for / interpretation of spline coefficients γ.

flexsurv R package
flexsurv (Jackson, CRAN (2011–2015))
- Fit any parametric survival model to individual data by maximum likelihood
  - right/interval censoring, left-truncation, multi-state models
- Standard models built in, plus spline, generalized gamma/F
- Can program your own novel parametric distributions, given the hazard or probability density function
- Any parameter of any distribution can depend on covariates
  - proportional hazards / accelerated failure time typical, but not necessary
- Customisable outputs / summary presentation
Similar Stata packages
- stpm2 (Royston & Lambert) for spline model
- stgenreg (Crowther & Lambert) for user-defined parametric survival models

Fitting models, and default output
Generalized gamma 3-parameter model.
Two out of the three parameters depend on a covariate (prognostic group)

    library(flexsurv)  # provides flexsurvreg() and the bc data
    fs2 <- flexsurvreg(Surv(recyrs, censrec) ~ group + sigma(group),
                       data=bc, dist="gengamma")
    fs2

    Call:
    flexsurvreg(formula = Surv(recyrs, censrec) ~ group + sigma(group),
        data = bc, dist = "gengamma")

    Estimates:
                        data mean  est      L95%     U95%     se       exp(est)  L95%     U95%
    mu                       NA     1.9840   1.6822   2.2859   0.1540       NA       NA       NA
    sigma                    NA     1.1765   0.9392   1.4738   0.1352       NA       NA       NA
    Q                        NA    -0.7074  -1.1795  -0.2353   0.2409       NA       NA       NA
    groupMedium          0.3338    -0.7105  -1.0346  -0.3863   0.1654   0.4914   0.3554   0.6795
    groupPoor            0.3324    -1.4043  -1.7102  -1.0983   0.1561   0.2455   0.1808   0.3334
    sigma(groupMedium)   0.3338    -0.0494  -0.3105   0.2118   0.1332   0.9518   0.7331   1.2358
    sigma(groupPoor)     0.3324    -0.1768  -0.4293   0.0757   0.1288   0.8380   0.6510   1.0787

    N = 686, Events: 299, Censored: 387
    Total time at risk: 2113.425
    Log-likelihood = -786.2172, df = 7
    AIC = 1586.434

Fitting models, plot output

    plot(fs2, xlim=c(0, 10))
    text(x=c(2, 3.5, 4), y=c(0.4, 0.55, 0.7),
         c("Poor","Medium","Good"))

[Figure: fitted survival against time for the Good, Medium and Poor prognostic groups]

Summarise user-defined functions of the parameters
e.g. restricted mean survival up to time $t$ for a model with parameters α
$E(\min(T, t) \mid \alpha) = \int_0^t S(u \mid \alpha)\,du$

    # Restricted mean survival up to `horizon` for the generalized gamma
    mean.gengamma <- function(mu, sigma, Q,
                              horizon=10, ...){
        # Survival function S(t) = 1 - F(t)
        surv <- function(t, ...) {
            1 - pgengamma(q=t, mu=mu,
                          sigma=sigma, Q=Q, ...)
        }
        # Area under S(t) from 0 to the horizon
        integrate(surv, 0, horizon, ...)$value
    }

    summary(fs2, newdata=list(group="Good"),
            t=1, fn=mean.gengamma)

    group=Good
      time     est      lcl     ucl
    1    1 7.36671 6.725084 7.85355

[Figure: survival against years (0 to 12.5), with the 10-year mean survival marked]

flexsurv is user-extensible
- Similar facility for people to fit their own parametric models
  - given definitions of the hazard or probability density as R functions
- User guide and Supplementary examples manuals give lots of worked examples
  - User guide to be published in Journal of Statistical Software
- Suggestions for additions welcome
  - ideally build a user-contributed library of examples

Limitations of flexsurv
- Restricted to parametric modelling of one individual-level dataset
- No random effects / frailty, though could be extended in theory
For synthesis of multiple related datasets, Bayesian probabilistic modelling languages (BUGS / JAGS / Stan) preferred
- though higher barrier to entry at the moment

In summary
Modelling long term survival may need
- a combination of data sources...
  - but modelling/software tools are tricky
- flexible models...
  - can be hard work to assess assumptions
  - hard to combine with synthesis of multiple datasets
- accessible software