Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach
Catalina A. Vallejos(1) and Mark F.J. Steel(2)
(1) MRC Biostatistics Unit, EMBL-European Bioinformatics Institute.
(2) Dept. of Statistics, University of Warwick.
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics. 27-29 July, 2015

Motivation
What happens if we ignore unobserved heterogeneity?
[Figure: two panels, "Individual hazard rates" (Group 1, Group 2) and "Population hazard rates"; h(t) against t for 0 ≤ t ≤ 20.]
Catalina Vallejos, MRC Biostatistics Unit and EMBL European Bioinformatics Institute

Mixture families of life distributions
Definition
$T_i$ is distributed as a mixture of life distributions iff its density function is given by
$$f(t_i \mid \psi, \theta) \equiv \int_{\mathcal{L}} f^*(t_i \mid \psi, \Lambda_i = \lambda_i)\, dP_{\Lambda_i}(\lambda_i \mid \theta),$$
where $f^*(\cdot \mid \psi, \Lambda_i = \lambda_i)$ is a lifetime density and $P_{\Lambda_i}(\cdot \mid \theta)$ is a cdf on $\mathcal{L}$, possibly depending on a parameter $\theta \in \Theta$.
• Distinction between individual-level and population-level survival
• The intuition behind the underlying model is preserved
• The influence of outlying observations is attenuated

Rate Mixtures of Weibull distributions
Definition
$T_i$ is distributed as a Rate Mixture of Weibull (RMW) distributions iff
$$T_i \mid \alpha, \gamma, \Lambda_i = \lambda_i \sim \text{Weibull}(\alpha\lambda_i, \gamma), \qquad \Lambda_i \mid \theta \sim P_{\Lambda_i}(\cdot \mid \theta),$$
i.e.
$$f(t_i \mid \alpha, \gamma, \theta) = \int_{\mathcal{L}} \gamma\alpha\lambda_i\, t_i^{\gamma-1} e^{-\alpha\lambda_i t_i^{\gamma}}\, dP_{\Lambda_i}(\lambda_i \mid \theta), \qquad t_i > 0,$$
where $\alpha, \gamma > 0$ and $P_{\Lambda_i}(\cdot \mid \theta)$ is a cdf on $\mathcal{L}$, possibly depending on a parameter $\theta \in \Theta$. Denote $T_i \sim \mathrm{RMW}_P(\alpha, \gamma, \theta)$.

Rate Mixtures of Weibull distributions
• Relates to the existing literature on frailty models
• Typically $\gamma = 1$ and $\Lambda_i \sim$ gamma (Lomax distribution) ⇒ e.g. Jewell (1982), Abbring and Van Den Berg (2007)
• Non-parametric mixtures ⇒ e.g. Kottas (2006)
• Case $\gamma = 1$: Rate Mixtures of Exponentials, $T_i \sim \mathrm{RME}_P(\alpha, \theta)$
• If $T_i \sim \mathrm{RME}_P(\alpha, \theta)$, then $T_i^{1/\gamma} \sim \mathrm{RMW}_P(\alpha, \gamma, \theta)$.
• For $\gamma \le 1$: decreasing hazard rate (Marshall and Olkin, 2007)
• Identifiability precludes unknown scale parameters in $P$ ⇒ fix the scale parameters in $P$ or set $E(\Lambda_i \mid \theta) = 1$

Rate Mixtures of Weibull distributions
Example: RMW model with Gamma(θ, θ) mixing and α = 1
[Figure: density function f(t) and hazard function h(t) for γ = 0.7 and γ = 2, each with θ = 1, θ = 5 and θ = ∞; 0 ≤ t ≤ 3.]

Rate Mixtures of Weibull distributions
Coefficient of variation
Theorem
If all the required moments exist, the coefficient of variation (cv) of distributions in the RMW family is
$$cv(\gamma, \theta) = \sqrt{\frac{\Gamma(1 + 2/\gamma)}{\Gamma^2(1 + 1/\gamma)} \underbrace{\frac{\mathrm{var}_{\Lambda_i}(\Lambda_i^{-1/\gamma} \mid \theta)}{E^2_{\Lambda_i}(\Lambda_i^{-1/\gamma} \mid \theta)}}_{(cv^*(\gamma,\theta))^2} + \underbrace{\frac{\Gamma(1 + 2/\gamma) - \Gamma^2(1 + 1/\gamma)}{\Gamma^2(1 + 1/\gamma)}}_{(cv_W(\gamma))^2}}.$$
When $\gamma = 1$ it simplifies to
$$cv(1, \theta) = \sqrt{2\,\frac{\mathrm{var}_{\Lambda_i}(\Lambda_i^{-1} \mid \theta)}{E^2_{\Lambda_i}(\Lambda_i^{-1} \mid \theta)} + 1}.$$
If $\theta$ is unknown, we restrict the range of $(\gamma, \theta)$ such that cv is finite.

A regression model based on RMW distributions
Proportional Hazards (PH) models are popular in this context:
$$h_{T_i}(t_i \mid x_i, \beta^*, \Lambda_i = \lambda_i) = \lambda_i \gamma t_i^{\gamma-1} e^{x_i'\beta^*}, \qquad \Lambda_i \sim P_{\Lambda_i}(\theta).$$
But the PH property is not preserved after mixture!
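The loss of proportionality can be seen in closed form for gamma frailty. A minimal sketch, assuming Gamma(θ, θ) mixing (shape θ, rate θ), a single binary covariate x ∈ {0, 1} and baseline cumulative hazard $t^{\gamma}$; the function name `population_hazard_ratio` is hypothetical. Integrating out $\Lambda_i$ gives the population hazard $\gamma t^{\gamma-1} e^{x\beta^*} / (1 + t^{\gamma} e^{x\beta^*}/\theta)$, so the ratio of population hazards for x = 1 versus x = 0 is not constant in t.

```python
import math


def population_hazard_ratio(t: float, beta_star: float, gamma: float, theta: float) -> float:
    """Population hazard ratio h(t | x=1) / h(t | x=0) under gamma frailty.

    Assumed (illustrative) model: conditional hazard
        lambda * gamma * t**(gamma - 1) * exp(x * beta_star),
    with Lambda ~ Gamma(shape=theta, rate=theta), so E(Lambda) = 1.
    Integrating out Lambda gives the population hazard
        gamma * t**(gamma - 1) * exp(x * beta_star) / (1 + t**gamma * exp(x * beta_star) / theta).
    """
    cum_hazard = t ** gamma   # baseline cumulative hazard H0(t) = t^gamma
    hr = math.exp(beta_star)  # individual-level (conditional) hazard ratio
    # The common factor gamma * t**(gamma - 1) cancels in the ratio.
    return hr * (1 + cum_hazard / theta) / (1 + hr * cum_hazard / theta)


if __name__ == "__main__":
    for t in (0.01, 0.5, 1.0, 5.0, 50.0):
        ratio = population_hazard_ratio(t, beta_star=1.0, gamma=1.5, theta=2.0)
        print(f"t = {t:6.2f}  population hazard ratio = {ratio:.4f}")
```

The ratio starts at $e^{\beta^*}$ near t = 0 and decreases toward 1 as t grows, because individuals with large $\lambda_i$ leave the risk set first; proportionality holds only conditionally on $\Lambda_i$.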
Instead, we use an Accelerated Failure Time (AFT) specification
$$T_i \sim \mathrm{RMW}_P(\alpha_i, \gamma, \theta), \qquad \alpha_i = e^{-\gamma x_i'\beta},$$
which is equivalent to
$$\log(T_i) = x_i'\beta + \log\left(\Lambda_i^{-1/\gamma} T_0\right), \qquad \Lambda_i \sim P_{\Lambda_i}(\theta), \quad T_0 \sim \text{Weibull}(1, \gamma).$$
The PH and AFT regressions are equivalent upon setting $\beta = -\beta^*/\gamma$.

Bayesian inference for the RMW-AFT model
A weakly informative prior
First consider the RME case ($\gamma = 1$). Jeffreys and independence Jeffreys priors have the structure $\pi(\beta, \theta) \propto \pi(\theta)$, but they are complicated to derive and $\pi(\theta)$ might not be proper.
Approach:
• Keep the Jeffreys structure but use a proper $\pi(\theta)$
• Match priors through a common proper prior for cv, say $\pi^*(cv)$
• Exploit the functional relationship between cv and $\theta$

Bayesian inference for the RMW-AFT model
A weakly informative prior
Table: Relationship between cv and θ for some RME models.
Mixing density | Range of cv | $cv(\theta)$ | $\lvert dcv(\theta)/d\theta \rvert$
Gamma(θ, θ) | (1, ∞) | $\sqrt{\theta/(\theta-2)}$ | $\theta^{-1/2}(\theta-2)^{-3/2}$
Inverse-Gamma(θ, 1) | (1, √3) | $\sqrt{(\theta+2)/\theta}$ | $\theta^{-3/2}(\theta+2)^{-1/2}$
Inverse-Gaussian(θ, 1) | (1, √5) | $\sqrt{(5\theta^2+4\theta+1)/(\theta^2+2\theta+1)}$ | $(3\theta+1)\,(5\theta^2+4\theta+1)^{-1/2}(\theta+1)^{-2}$
Log-Normal(0, θ) | (1, ∞) | $\sqrt{2e^{\theta}-1}$ | $e^{\theta}(2e^{\theta}-1)^{-1/2}$
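The cv theorem and the closed forms in the RME table can be checked numerically. A minimal sketch, with hypothetical function names, using the algebraically equivalent form $cv^2 = \frac{\Gamma(1+2/\gamma)}{\Gamma^2(1+1/\gamma)} \frac{E(\Lambda_i^{-2/\gamma})}{E^2(\Lambda_i^{-1/\gamma})} - 1$. It assumes that Inverse-Gamma(θ, 1) means $\Lambda_i^{-1} \sim$ Gamma(shape θ, rate 1) and that θ in Log-Normal(0, θ) is the variance of $\log \Lambda_i$; the Inverse-Gaussian row is left out because its parameterisation is not stated here.

```python
import math


def cv_rmw(gamma: float, m1: float, m2: float) -> float:
    """cv of an RMW distribution from the mixing moments.

    m1 = E(Lambda^(-1/gamma) | theta) and m2 = E(Lambda^(-2/gamma) | theta).
    Uses cv^2 = Gamma(1+2/gamma) / Gamma(1+1/gamma)^2 * m2 / m1^2 - 1, which equals the
    theorem's two-term form because var/E^2 = m2/m1^2 - 1.
    """
    weibull_term = math.exp(math.lgamma(1 + 2 / gamma) - 2 * math.lgamma(1 + 1 / gamma))
    return math.sqrt(weibull_term * m2 / m1 ** 2 - 1)


def gamma_mixing_moments(gamma: float, theta: float) -> tuple[float, float]:
    """(m1, m2) for Lambda ~ Gamma(shape=theta, rate=theta).

    E(Lambda^s) = Gamma(theta + s) / (Gamma(theta) * theta**s), finite only when
    theta + s > 0; with s = -2/gamma this is the restriction theta > 2/gamma.
    """
    if theta <= 2 / gamma:
        raise ValueError("cv is infinite unless theta > 2 / gamma")

    def moment(s: float) -> float:
        return math.exp(math.lgamma(theta + s) - math.lgamma(theta) - s * math.log(theta))

    return moment(-1 / gamma), moment(-2 / gamma)


# Closed forms from the RME table (gamma = 1), for comparison.
def cv_table_gamma(theta: float) -> float:
    return math.sqrt(theta / (theta - 2))


def cv_table_inverse_gamma(theta: float) -> float:
    return math.sqrt((theta + 2) / theta)


def cv_table_lognormal(theta: float) -> float:
    return math.sqrt(2 * math.exp(theta) - 1)


if __name__ == "__main__":
    theta = 4.0
    # Gamma(theta, theta) mixing.
    print(cv_rmw(1.0, *gamma_mixing_moments(1.0, theta)), cv_table_gamma(theta))
    # Assumed: 1/Lambda ~ Gamma(shape=theta, rate=1), so E(1/Lambda) = theta, E(1/Lambda^2) = theta(theta+1).
    print(cv_rmw(1.0, theta, theta * (theta + 1)), cv_table_inverse_gamma(theta))
    # Assumed: log(Lambda) ~ N(0, theta), so E(1/Lambda) = exp(theta/2), E(1/Lambda^2) = exp(2 theta).
    print(cv_rmw(1.0, math.exp(theta / 2), math.exp(2 * theta)), cv_table_lognormal(theta))
```

For γ = 1 the three pairs agree, and for Gamma(θ, θ) mixing with large θ the cv approaches the plain Weibull value $cv_W(\gamma)$. When $\pi(\theta)$ is induced from $\pi^*(cv)$, the derivative column of the table plays the role of the Jacobian of the change of variables from cv to θ.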
Bayesian inference for the RMW-AFT model
A weakly informative prior
For the general RMW case, we choose $\pi(\beta, \gamma, \theta) \propto \pi(\gamma, \theta) \equiv \pi(\theta \mid \gamma)\,\pi(\gamma)$, where $\pi(\theta \mid \gamma)$ and $\pi(\gamma)$ are proper.
Approach:
• Define $\pi(\theta \mid \gamma)$ as before through $\pi^*(cv)$, given $\gamma$
• Choose a proper $\pi(\gamma)$
These priors are improper (they are flat in $\beta$), but the posterior distribution is well defined under mild conditions.

Bayesian inference for the RMW-AFT model
Outlier detection
No heterogeneity ⇒ $\lambda_1 = \lambda_2 = \cdots = \lambda_n = \lambda$
Extreme value of $\lambda_i$ ⇒ potential outlier
Formally, we contrast the models
$M_0: \Lambda_i = \lambda_{\mathrm{ref}}$ versus $M_1: \Lambda_i \neq \lambda_{\mathrm{ref}}$ (with all other $\Lambda_j$, $j \neq i$, free),
where $(t, c)$ denote the observed times and censoring indicators, through
$$BF_{01}^{(i)} = \left. \pi(\lambda_i \mid t, c)\, E\!\left[\frac{1}{dP(\lambda_i \mid \theta)}\right] \right|_{\lambda_i = \lambda_{\mathrm{ref}}}.$$
Choice of $\lambda_{\mathrm{ref}}$?
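To make $BF_{01}^{(i)}$ concrete, here is a minimal sketch of the simplest case, assuming Gamma(θ, θ) mixing with $\alpha_i$ (in the AFT model, $\alpha_i = e^{-\gamma x_i'\beta}$), γ and θ held fixed at known values. Under that assumption the expectation reduces to $1/p(\lambda_{\mathrm{ref}} \mid \theta)$, the posterior of $\lambda_i$ is conjugate, and the Bayes factor is a Savage-Dickey ratio of posterior to prior ordinates. The value $\lambda_{\mathrm{ref}} = 1$ is used only for illustration (it equals $E(\Lambda_i \mid \theta)$ under this mixing), and the names `log_bf01`, `gamma_logpdf` and the `observed` flag are hypothetical. In the full model the posterior ordinate and the expectation would instead be estimated from MCMC output.

```python
import math


def gamma_logpdf(x: float, shape: float, rate: float) -> float:
    """Log-density of a Gamma(shape, rate) distribution at x > 0."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(x) - rate * x


def log_bf01(t: float, observed: bool, alpha: float, gamma: float, theta: float,
             lambda_ref: float = 1.0) -> float:
    """log BF_01 for M0: Lambda_i = lambda_ref, with alpha, gamma and theta held fixed.

    Assumes Lambda_i ~ Gamma(theta, theta) and T_i | lambda_i ~ Weibull(alpha * lambda_i, gamma).
    The likelihood of lambda_i is lambda_i**delta * exp(-alpha * lambda_i * t**gamma), where
    delta = 1 for an observed event and 0 for a right-censored time, so the posterior of
    lambda_i is Gamma(theta + delta, theta + alpha * t**gamma). With theta fixed, the
    expectation in BF_01 reduces to 1 / p(lambda_ref | theta).
    """
    delta = 1.0 if observed else 0.0
    log_posterior = gamma_logpdf(lambda_ref, theta + delta, theta + alpha * t ** gamma)
    log_prior = gamma_logpdf(lambda_ref, theta, theta)
    return log_posterior - log_prior


if __name__ == "__main__":
    # Hypothetical inputs: alpha = gamma = theta = 1 (exponential baseline, unit-mean frailty).
    for t, observed in ((1.0, True), (1.0, False), (20.0, True)):
        print(t, observed, round(log_bf01(t, observed, 1.0, 1.0, 1.0), 3))
```

A strongly negative log $BF_{01}$, as for the long observed survival time above, is evidence against $\Lambda_i = \lambda_{\mathrm{ref}}$ and flags observation i as a potential outlier. The censored case shows why $\lambda_{\mathrm{ref}}$ matters: censoring alone shifts the posterior of $\lambda_i$.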
Bayesian inference for the RMW-AFT model
Outlier detection
• In Vallejos and Steel (2014) we recommended $\lambda_{\mathrm{ref}} = E(\Lambda_i \mid \theta)$
• This is not appropriate for RMW models, where censoring is very informative about the $\lambda_i$'s. For censored observations we use the correction factor $\lambda^{c}_{\mathrm{ref}} = R_i(\beta, \gamma, \theta)\, \lambda^{o}_{\mathrm{ref}}$, with
$$R_i(\beta, \gamma, \theta) = \frac{E(\Lambda_i \mid t_i, c_i = 0, \beta, \gamma, \theta)}{E(\Lambda_i \mid t_i, c_i = 1, \beta, \gamma, \theta)}.$$

Applications
To illustrate, we analyse 2 real datasets:
Dataset | n | Censoring | # covariates
Veterans' Administration lung cancer (VA) | 137 | 7% | 5
Cerebral palsy (CP) | 1,549 | 84% | 2
We use RMW-AFT models as well as a Weibull model.

Applications
Model comparison
We compare models defined by different mixing distributions using
• Bayes Factors
• Conditional Predictive Ordinate (CPO): for observation $i$, $CPO_i = f(t_i \mid t_{-i})$, with $t_{-i} = (t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_n)$, where $f(\cdot \mid t_{-i})$ is the predictive density given $t_{-i}$.
• $PsML = \prod_{i=1}^{n} CPO_i$ (Geisser and Eddy, 1979) ⇒ ratios of PsMLs define pseudo Bayes factors (PsBF)

Application: VA data
Model comparison in terms of BF and PsBF
[Figure: log pseudo Bayes factors against log Bayes factors for the Weibull, RMWEXP, RMWGAM, RMWIGAM, RMWIGAUSS and RMWLN models; panels for γ ~ Gamma(4,1), γ ~ Gamma(1,1) and γ ~ Gamma(0.001,0.001), with E(cv) = 1.5 and E(cv) = 5.0.]

Application: CP data
Model comparison in terms of BF and PsBF (Geisser and Eddy, 1979)
[Figure: log pseudo Bayes factors against log Bayes factors for the Weibull, RMWEXP, RMWGAM, RMWIGAM, RMWIGAUSS and RMWLN models; panels for γ ~ Gamma(4,1), γ ~ Gamma(1,1) and γ ~ Gamma(0.001,0.001), with E(cv) = 1.5 and E(cv) = 5.0.]

Applications: VA dataset
Posterior medians and 95% HPD intervals for some regression coefficients
[Figure]

Applications: CP dataset
Posterior medians and 95% HPD intervals for some regression coefficients
[Figure]

Applications
Posterior medians and 95% HPD intervals for γ
[Figure: VA dataset and CP dataset.]

Application: VA data
Outlier detection (Gamma(θ, θ) mixing)
[Figure: 2log(BF) against patient, for E(cv) = 1.5 and E(cv) = 5.0; selected patients are labelled.]

Application: CP data
Outlier detection (Exponential(1) mixing)
[Figure: Bayes factors against patient.]

Conclusions
1. We explored mixtures of life distributions (e.g. the RMW family) to deal with unobserved heterogeneity and outliers
2. Covariates enter through an AFT specification, which retains the AFT structure and the interpretation of β
3. The prior is based on the structure of the Jeffreys prior, but allows meaningful BFs
4. We proposed an outlier detection method based on the mixing parameters
5. The data support mixing, which is critical for the estimation of β and γ

Acknowledgements
This research project was funded by
• University of Warwick
• Pontificia Universidad Católica de Chile
Many thanks to P.O.D. Pharoah and Prof. Jane Hutton for access to the cerebral palsy dataset.

Full reference list and more details in
C.A. Vallejos and M.F.J. Steel (2014). Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach. CRiSM-WP 14-20.
C.A. Vallejos and M.F.J. Steel (2015). Objective Bayesian survival analysis using scale mixtures of log-normal distributions. JASA.