Chapter 17
Failure-Time Regression Analysis

William Q. Meeker and Luis A. Escobar
Iowa State University and Louisiana State University

Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on the authors’ text Statistical Methods for Reliability Data, John Wiley & Sons Inc. 1998.

December 14, 2015 8h 10min

Chapter 17 Failure-Time Regression Analysis: Objectives
• Describe applications of failure-time regression.
• Describe graphical methods for displaying censored regression data.
• Introduce time-scaling transformation functions.
• Describe simple regression models to relate life to explanatory variables.
• Illustrate the use of likelihood methods for censored regression data.
• Explain the importance of model diagnostics.
• Describe and illustrate extensions to nonstandard multiple regression models.

Computer Program Execution Time Versus System Load
• Time to complete a computationally intensive task.
• Information from the Unix uptime command.
• Predictions needed for scheduling subsequent steps in a multi-step computational process.

[Figure: Scatter Plot of Computer Program Execution Time Versus System Load. Vertical axis: Seconds (0 to 800); horizontal axis: System Load (0 to 6).]

Explanatory Variables for Failure Times
Useful explanatory variables explain/predict why some units fail quickly and some units survive a long time.
• Continuous variables like stress, temperature, voltage, and pressure.
• Discrete variables like number of hardening treatments or number of simultaneous users of a system.
• Categorical variables like manufacturer, design, and location.
Regression model relates failure time distribution to explanatory variables x = (x1, . . . , xk):
Pr(T ≤ t) = F(t) = F(t; x).

Failure-Time Regression Analysis
• Presentation motivated by practical problems in reliability analysis.
◮ Nonstandard regression models that relate life to explanatory variables.
◮ Data may be censored.
◮ Data are not necessarily from a normal distribution.
• These features make the ideas presented here more general than standard regression. Material in this chapter is an extension of statistical regression analysis with normally distributed data and
mean = β0 + β1x1 + · · · + βkxk,
where the xi are explanatory variables.

Nickel-Base Super-Alloy Fatigue Data: 26 Observations in Total, 4 Censored (Nelson 1984, 1990)
Originally described and analyzed by Nelson (1984 and 1990).
• Thousands of cycles to failure as a function of pseudo-stress in ksi.
• 26 units tested; 4 units did not fail.
Objective: Explore models that might be used to describe the relationship between life length and the amount of pseudo-stress applied to the tested specimens.

[Figure: Nickel-Base Super-Alloy Fatigue Data (Nelson 1984, 1990). Vertical axis: Thousands of Cycles (0 to 250); horizontal axis: Pseudo-stress (ksi), 60 to 160.]

[Figure: SAFT Models Illustrating Acceleration and Deceleration. Vertical axis: Time at Level x; horizontal axis: Time at Level x0. Curves: scale accelerating transformation, scale decelerating transformation. Here AF(x) > 0 and AF(x0) = 1.]

The SAFT Transformation Models and Acceleration
In a time transformation plot of T(x0) versus T(x):
• The SAFT models are straight lines through the origin.
• Accelerating SAFT models are straight lines below the diagonal.
• Decelerating SAFT models are straight lines above the diagonal.

• Scaled time: F(t; x) = F[AF(x) t; x0]. Thus the cdfs F(t; x) and F(t; x0) do not cross each other.
• Proportional quantiles: tp(x) = tp(x0)/AF(x). Taking logs gives
log[tp(x0)] − log[tp(x)] = log[AF(x)].
This shows that in any plot with a log-time scale, tp(x0) and tp(x) are equidistant for all p. In particular, in a probability plot with a log-time scale, F(t; x) is a translation of F(t; x0) along the log(t) axis.
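Some Properties of SAFT Models: the scaled-time and proportional-quantile properties above hold for a SAFT model T(x) = T(x0)/AF(x), AF(x) > 0, with baseline cdf F(t; x0), so that AF(x0) = 1.

Scale Accelerated Failure Time (SAFT) Model
• Scale Accelerated Failure Time (SAFT) models are defined by
T(x) = T(x0)/AF(x), AF(x) > 0, AF(x0) = 1,
where typical forms for AF(x) are:
◮ log[AF(x)] = β1x with x0 = 0 for a scalar x.
◮ log[AF(x)] = β1x1 + · · · + βkxk with x0 = 0 for a vector x = (x1, . . . , xk).
• When AF(x) > 1, the model accelerates time in the sense that time moves more quickly at x than at x0, so that T(x) < T(x0).
• When 0 < AF(x) < 1, T(x) > T(x0), and time is decelerated (but we still call this a SAFT model).

[Figure: Weibull Probability Plot of Two Members from a SAFT Model with a Baseline Lognormal Distribution. Vertical axis: Probability; horizontal axis: Time.]

[Figure: Computer Program Execution Time Versus System Load, Log-Linear Lognormal Regression Model, with fitted 10%, 50%, and 90% quantile lines. Vertical axis: Seconds; horizontal axis: System Load.]
\[ \log[\hat{t}_p(x)] = \hat{\mu}(x) + \Phi_{\mathrm{nor}}^{-1}(p)\,\hat{\sigma} \]

Estimated Parameter Variance-Covariance Matrix
Local (observed information) estimate, where \mathcal{L} denotes the log likelihood:
\[
\hat{\Sigma}_{\hat{\theta}} =
\begin{bmatrix}
\widehat{\mathrm{Var}}(\hat\beta_0) & \widehat{\mathrm{Cov}}(\hat\beta_0,\hat\beta_1) & \widehat{\mathrm{Cov}}(\hat\beta_0,\hat\sigma) \\
\widehat{\mathrm{Cov}}(\hat\beta_1,\hat\beta_0) & \widehat{\mathrm{Var}}(\hat\beta_1) & \widehat{\mathrm{Cov}}(\hat\beta_1,\hat\sigma) \\
\widehat{\mathrm{Cov}}(\hat\sigma,\hat\beta_0) & \widehat{\mathrm{Cov}}(\hat\sigma,\hat\beta_1) & \widehat{\mathrm{Var}}(\hat\sigma)
\end{bmatrix}
=
\begin{bmatrix}
-\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\beta_0^2} & -\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\beta_0\,\partial\beta_1} & -\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\beta_0\,\partial\sigma} \\
-\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\beta_1\,\partial\beta_0} & -\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\beta_1^2} & -\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\beta_1\,\partial\sigma} \\
-\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\sigma\,\partial\beta_0} & -\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\sigma\,\partial\beta_1} & -\dfrac{\partial^2 \mathcal{L}(\beta_0,\beta_1,\sigma)}{\partial\sigma^2}
\end{bmatrix}^{-1}
\]
Partial derivatives are evaluated at β̂0, β̂1, σ̂.

Lognormal Distribution Simple Regression Model with Constant Shape Parameter β = 1/σ
• The lognormal simple regression model is
\[ \Pr(T \le t) = F(t;\mu,\sigma) = F(t;\beta_0,\beta_1,\sigma) = \Phi_{\mathrm{nor}}\!\left[\frac{\log(t)-\mu}{\sigma}\right] \]
where µ = µ(x) = β0 + β1x and σ does not depend on x.
• The failure-time log quantile function
\[ \log[t_p(x)] = \mu(x) + \Phi_{\mathrm{nor}}^{-1}(p)\,\sigma \]
is linear in x.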
Notice that, for the lognormal simple regression model, tp(x) = exp(β1x) tp(0), which implies that this regression model is a scale accelerated failure time (SAFT) model with AF(x) = exp(−β1x).

Likelihood for Lognormal Distribution Simple Regression Model with Right Censored Data
The likelihood for n independent observations has the form
\[
L(\beta_0,\beta_1,\sigma) = \prod_{i=1}^{n} L_i(\beta_0,\beta_1,\sigma;\mathrm{data}_i)
= \prod_{i=1}^{n} \left\{ \frac{1}{\sigma t_i}\,\phi_{\mathrm{nor}}\!\left[\frac{\log(t_i)-\mu_i}{\sigma}\right] \right\}^{\delta_i}
\left\{ 1 - \Phi_{\mathrm{nor}}\!\left[\frac{\log(t_i)-\mu_i}{\sigma}\right] \right\}^{1-\delta_i}
\]
where data_i = (xi, ti, δi), µi = β0 + β1xi,
δi = 1 for an exact observation, δi = 0 for a right censored observation,
φnor(z) is the standardized normal pdf and Φnor(z) is the corresponding normal cdf.
The parameters are θ = (β0, β1, σ).

Standard Errors and Confidence Intervals for Parameters
• Lognormal ML estimates for the computer time experiment were θ̂ = (β̂0, β̂1, σ̂) = (4.49, .290, .312), and an estimate of the variance-covariance matrix for θ̂ is
\[
\hat{\Sigma}_{\hat{\theta}} =
\begin{bmatrix}
.012 & -.0037 & 0 \\
-.0037 & .0021 & 0 \\
0 & 0 & .0029
\end{bmatrix}.
\]
• Normal-approximation confidence interval for the computer execution time regression slope is
\[ [\underset{\sim}{\beta_1},\ \tilde{\beta}_1] = \hat\beta_1 \pm z_{(.975)}\,\widehat{\mathrm{se}}_{\hat\beta_1} = .290 \pm 1.96(.046) = [.20,\ .38] \]
where \widehat{\mathrm{se}}_{\hat\beta_1} = \sqrt{.0021} = .046.

Standard Errors and Confidence Intervals for Quantities at Specific Explanatory Variable Conditions
• Unknown values of µ and σ at each level of x.
• µ̂ = β̂0 + β̂1x, σ̂ does not depend on x, and
\[
\hat{\Sigma}_{\hat\mu,\hat\sigma} =
\begin{bmatrix}
\widehat{\mathrm{Var}}(\hat\mu) & \widehat{\mathrm{Cov}}(\hat\mu,\hat\sigma) \\
\widehat{\mathrm{Cov}}(\hat\mu,\hat\sigma) & \widehat{\mathrm{Var}}(\hat\sigma)
\end{bmatrix}
\]
is obtained from
\[ \widehat{\mathrm{Var}}(\hat\mu) = \widehat{\mathrm{Var}}(\hat\beta_0) + 2x\,\widehat{\mathrm{Cov}}(\hat\beta_0,\hat\beta_1) + x^2\,\widehat{\mathrm{Var}}(\hat\beta_1) \]
and
\[ \widehat{\mathrm{Cov}}(\hat\mu,\hat\sigma) = \widehat{\mathrm{Cov}}(\hat\beta_0,\hat\sigma) + x\,\widehat{\mathrm{Cov}}(\hat\beta_1,\hat\sigma). \]
• Use the above results with the methods from Chapter 8 to compute normal-approximation confidence intervals for F(t), h(t), and tp.
• Could also use likelihood or simulation-based confidence intervals.

[Figure: Log-Quadratic Weibull Regression Model with Constant (β = 1/σ) Fit to the Fatigue Data, with fitted 10%, 50%, and 90% quantile curves. Vertical axis: Thousands of Cycles; horizontal axis: Pseudo-stress (ksi).]
\[ \log[\hat{t}_p(x)] = \hat\mu(x) + \Phi_{\mathrm{sev}}^{-1}(p)\,\hat\sigma, \quad x = \log(\text{pseudo-stress}), \quad \hat\mu = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 \]

Weibull Distribution Quadratic Regression Model with Constant Shape Parameter β = 1/σ
This is an AF time model with the following characteristics:
• The Weibull quadratic regression model is
Pr[T ≤ t] = Φsev{[log(t) − µ]/σ}
where µ = µ(x) = β0 + β1x + β2x² and σ does not depend on x.
• The failure-time log quantile function
\[ \log[t_p(x)] = \mu(x) + \Phi_{\mathrm{sev}}^{-1}(p)\,\sigma \]
is quadratic in x. Also
tp(x) = exp(β1x + β2x²) × tp(0),
which shows that the model is a SAFT model.

Likelihood for Weibull Distribution Quadratic Regression Model with Right Censored Data
The likelihood for n independent observations has the form
\[
L(\beta_0,\beta_1,\beta_2,\sigma) = \prod_{i=1}^{n} L_i(\beta_0,\beta_1,\beta_2,\sigma;\mathrm{data}_i)
= \prod_{i=1}^{n} \left\{ \frac{1}{\sigma t_i}\,\phi_{\mathrm{sev}}\!\left[\frac{\log(t_i)-\mu_i}{\sigma}\right] \right\}^{\delta_i}
\left\{ 1 - \Phi_{\mathrm{sev}}\!\left[\frac{\log(t_i)-\mu_i}{\sigma}\right] \right\}^{1-\delta_i}
\]
where µi = β0 + β1xi + β2xi², and δi = 1 for an exact observation, δi = 0 for a right censored observation.
The parameters are θ = (β0, β1, β2, σ).

[Figure: Log-Quadratic Weibull Regression Model with Nonconstant β = 1/σ Fit to the Fatigue Data. Vertical axis: Thousands of Cycles; horizontal axis: Pseudo-stress (ksi).]
\[ \log[\hat{t}_p(x)] = \hat\mu(x) + \Phi_{\mathrm{sev}}^{-1}(p)\,\hat\sigma(x), \quad x = \log(\text{pseudo-stress}), \quad \hat\mu = \hat\beta_0^{[\mu]} + \hat\beta_1^{[\mu]} x + \hat\beta_2^{[\mu]} x^2, \quad \log(\hat\sigma) = \hat\beta_0^{[\sigma]} + \hat\beta_1^{[\sigma]} x \]

Weibull Distribution Quadratic Regression Model with Nonconstant β = 1/σ
• The Weibull quadratic regression model is
Pr[T ≤ t] = Φsev{[log(t) − µ]/σ},
where
\[ \mu = \mu(x) = \beta_0^{[\mu]} + \beta_1^{[\mu]} x + \beta_2^{[\mu]} x^2 \quad\text{and}\quad \log(\sigma) = \log[\sigma(x)] = \beta_0^{[\sigma]} + \beta_1^{[\sigma]} x. \]
• The failure-time log quantile function is
\[ \log[t_p(x)] = \mu(x) + \Phi_{\mathrm{sev}}^{-1}(p)\,\sigma(x), \]
which is not quadratic in x. Also
\[ t_p(x) = \exp[\mu(x)-\mu(0)]\,\exp\!\left\{ \Phi_{\mathrm{sev}}^{-1}(p)\,[\sigma(x)-\sigma(0)] \right\} \times t_p(0). \]
Because the coefficient in front of tp(0) depends on p, this shows that the model is not a SAFT model.
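The right-censored lognormal simple regression likelihood given earlier in this chapter can be evaluated numerically. The following is a minimal sketch, not the authors' software: the function name `lognormal_reg_loglik` and the toy data are hypothetical, and the code works on the log scale (sum of log contributions) rather than with the product itself.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: supplies phi_nor and Phi_nor


def lognormal_reg_loglik(beta0, beta1, sigma, data):
    """Log likelihood for the lognormal simple regression model.

    data: iterable of (x_i, t_i, delta_i), with delta_i = 1 for an exact
    failure time and delta_i = 0 for a right-censored time.
    """
    total = 0.0
    for x, t, delta in data:
        mu_i = beta0 + beta1 * x
        z = (math.log(t) - mu_i) / sigma
        if delta == 1:
            # exact observation: log{ (1/(sigma t)) * phi_nor(z) }
            total += math.log(_STD_NORMAL.pdf(z)) - math.log(sigma * t)
        else:
            # right-censored observation: log{ 1 - Phi_nor(z) }
            total += math.log(1.0 - _STD_NORMAL.cdf(z))
    return total


# Hypothetical toy data (x, t, delta); NOT the computer-time data set.
toy_data = [(1.0, 120.0, 1), (2.0, 150.0, 1), (3.0, 240.0, 0)]
toy_loglik = lognormal_reg_loglik(4.49, 0.290, 0.312, toy_data)
```

In practice the ML estimates are found by maximizing this function over (β0, β1, σ) with a numerical optimizer; the Weibull versions replace the normal pdf and cdf with their smallest extreme value counterparts.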
Likelihood for Weibull Distribution Quadratic Regression Model with Nonconstant β = 1/σ and Right Censored Data
The likelihood for n independent observations has the form
\[
L(\beta_0^{[\mu]},\beta_1^{[\mu]},\beta_2^{[\mu]},\beta_0^{[\sigma]},\beta_1^{[\sigma]})
= \prod_{i=1}^{n} L_i(\beta_0^{[\mu]},\beta_1^{[\mu]},\beta_2^{[\mu]},\beta_0^{[\sigma]},\beta_1^{[\sigma]};\mathrm{data}_i)
= \prod_{i=1}^{n} \left\{ \frac{1}{\sigma_i t_i}\,\phi_{\mathrm{sev}}\!\left[\frac{\log(t_i)-\mu_i}{\sigma_i}\right] \right\}^{\delta_i}
\left\{ 1 - \Phi_{\mathrm{sev}}\!\left[\frac{\log(t_i)-\mu_i}{\sigma_i}\right] \right\}^{1-\delta_i}
\]
where
\[ \mu_i = \beta_0^{[\mu]} + \beta_1^{[\mu]} x_i + \beta_2^{[\mu]} x_i^2 \quad\text{and}\quad \sigma_i = \exp\!\left( \beta_0^{[\sigma]} + \beta_1^{[\sigma]} x_i \right). \]
Parameters are θ = (β0^[µ], β1^[µ], β2^[µ], β0^[σ], β1^[σ]).

[Figure: Stanford Heart Transplant Data, Quadratic Log-Mean Lognormal Regression Model, with fitted 10%, 50%, and 90% quantile curves. Vertical axis: Days (log scale); horizontal axis: Years.]

[Figure: Stanford Heart Transplant Data, Quadratic Log-Location Weibull Regression Model, with fitted 10%, 50%, and 90% quantile curves. Vertical axis: Days (log scale); horizontal axis: Years.]

Extrapolation and Empirical Models
• Empirical models can be useful, providing a smooth curve to describe a population or a process.
• Should not be used to extrapolate outside of the range of one’s data.
• There are many different kinds of extrapolation:
◮ In an explanatory variable like stress
◮ To the upper tail of a distribution
◮ To the lower tail of a distribution
• Need to get the right curve to extrapolate: look toward physical or other process theory.

Checking Model Assumptions
• Graphical checks using generalizations of usual diagnostics (including residual analysis):
◮ Residuals versus fitted values.
◮ Probability plot of residuals.
◮ Other residual plots.
◮ Influence (or sensitivity) analysis (see Escobar and Meeker 1992, Biometrics).
• Most analytical tests can be suitably generalized, at least approximately, for censored data (especially using likelihood ratio tests).

Definition of Residuals
• For location-scale distributions like the normal, logistic, and smallest extreme value,
\[ \hat\epsilon_i = \frac{y_i - \hat{y}_i}{\hat\sigma} \]
where ŷi is an appropriately defined fitted value (e.g., ŷi = µ̂i).
• With models for positive random variables like Weibull, lognormal, and loglogistic, standardized residuals are defined as
\[ \exp(\hat\epsilon_i) = \exp\!\left[ \frac{\log(t_i) - \log(\hat{t}_i)}{\hat\sigma} \right] = \left( \frac{t_i}{\hat{t}_i} \right)^{1/\hat\sigma} \]
where t̂i = exp(µ̂i), and when ti is a censored observation, the corresponding residual is also censored.

[Figure: Plot of Standardized Residuals Versus Fitted Values for the Log-Quadratic Weibull Regression Model Fit to the Super Alloy Data on Log-Log Axes. Vertical axis: Standardized Residuals; horizontal axis: Fitted Values.]

[Figure: Probability Plot of the Standardized Residuals from the Log-Quadratic Weibull Regression Model Fit to the Super Alloy Data. Vertical axis: Probability; horizontal axis: Standardized Residuals.]

Empirical Regression Models and Sensitivity Analysis: Objectives
• Describe a class of regression models that can be used to describe the relationship between failure time and explanatory variables.
• Describe and illustrate the use of empirical regression models.
• Illustrate the need for sensitivity analysis.
• Show how to conduct a sensitivity analysis.

Picciotto Yarn Failure Data (Picciotto 1970)
• Cycles to failure on specimens of yarn of different lengths.
• Subset of data at a particular level of stress and particular specimen lengths (30 mm, 60 mm, and 90 mm).
• Original data were uncensored. For purposes of illustration, the data were censored at 100 thousand cycles.
• Suppose that the goal is to estimate life for 10 mm units.

[Figure: Picciotto Yarn Failure Data Showing Fraction Failing at Each Length (point labels 22/99, 71/99, and 92/100). Vertical axis: Thousands of Cycles; horizontal axis: Length.]

[Figure: Picciotto Yarn Failure Data, Multiple Weibull Probability Plot with Weibull ML Estimates, for lengths 30, 60, and 90. Vertical axis: Probability; horizontal axis: Thousands of Cycles.]

[Figure: Picciotto Yarn Failure Data, Multiple Lognormal Probability Plot with Lognormal ML Estimates, for lengths 30, 60, and 90. Vertical axis: Probability; horizontal axis: Thousands of Cycles.]

[Figure: Picciotto Yarn Failure Data, Multiple Lognormal Probability Plot with Lognormal ML Log-Linear Regression Model Estimates, for lengths 10, 30, 60, and 90. Vertical axis: Probability; horizontal axis: Thousands of Cycles.]

Suggested Strategy for Fitting Empirical Regression Models and Sensitivity Analysis
• Use data and previous experience to choose a base-line model:
◮ Distribution at individual conditions
◮ Relationship between explanatory variables and distributions at individual conditions
• Fit models.
• Use diagnostics (e.g., residual analysis) to check models.
• Assess uncertainty:
◮ Confidence intervals quantify statistical uncertainty.
◮ Perturb and otherwise change the model and reanalyze (sensitivity analysis) to assess model uncertainty.
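The residual analysis recommended in the strategy above relies on the standardized residuals defined earlier for positive-response models, exp(ε̂i) = (ti / t̂i)^(1/σ̂) with t̂i = exp(µ̂i). The sketch below is an illustration under assumed inputs; the function name `standardized_residuals` and the toy numbers are hypothetical.

```python
import math


def standardized_residuals(times, mu_hat, sigma_hat, censored):
    """Return a list of (exp(eps_hat_i), is_censored_i) pairs.

    times     : observed (failure or censoring) times t_i
    mu_hat    : fitted location values mu_hat_i on the log-time scale
    sigma_hat : fitted scale parameter
    censored  : booleans; a censored time gives a censored residual
    """
    residuals = []
    for t, mu, is_cens in zip(times, mu_hat, censored):
        t_hat = math.exp(mu)                       # fitted value t_hat_i
        resid = (t / t_hat) ** (1.0 / sigma_hat)   # (t_i / t_hat_i)^(1/sigma_hat)
        residuals.append((resid, is_cens))
    return residuals


# Hypothetical toy values, for illustration only.
toy_resid = standardized_residuals(
    times=[math.exp(4.0), math.exp(4.5), 100.0],
    mu_hat=[4.0, 4.0, 4.2],
    sigma_hat=0.5,
    censored=[False, False, True],
)
```

Carrying the censoring flag along with each residual matters because a probability plot of the residuals has to treat censored residuals as censored, just as with the original data.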
[Figure: Scatter Plot of the Picciotto Yarn Failure Data with Lognormal ML Log-Linear Regression Model Estimate Densities, with fitted 10%, 50%, and 90% quantile lines. Vertical axis: Thousands of Cycles; horizontal axis: Length on log scale.]
\[ \log[\hat{t}_p(x)] = \hat\beta_0 + \hat\beta_1 \log(\text{Length}) + \Phi_{\mathrm{nor}}^{-1}(p)\,\hat\sigma \]

[Figure: Picciotto Yarn Failure Data, Log-Linear Model Residual vs Fitted Values Plot. Vertical axis: Standardized Residuals; horizontal axis: Fitted Values.]

[Figure: Picciotto Yarn Failure Data, Multiple Lognormal Probability Plot with Lognormal ML Squareroot-Linear Regression Model Estimates, for lengths 10, 30, 60, and 90. Vertical axis: Probability; horizontal axis: Thousands of Cycles.]

[Figure: Scatter Plot of the Picciotto Yarn Failure Data with Lognormal ML Squareroot-Linear Regression Model Estimate Densities, with fitted 10%, 50%, and 90% quantile lines. Vertical axis: Thousands of Cycles; horizontal axis: Length on sqrt scale.]
\[ \log[\hat{t}_p(x)] = \hat\beta_0 + \hat\beta_1 \sqrt{\text{Length}} + \Phi_{\mathrm{nor}}^{-1}(p)\,\hat\sigma \]

[Figure: Picciotto Yarn Failure Data, Squareroot-Linear Model Residual vs Fitted Values Plot. Vertical axis: Standardized Residuals; horizontal axis: Fitted Values.]

Examples of Monotone Increasing Power Transformations of a Positive Explanatory Variable

    λ        Transformation
    −2       xi* = −1/xi²
    −1       xi* = −1/xi
    −.5      xi* = −1/√xi
    −.333    xi* = −1/xi^(1/3)
    0        xi* = log(xi)  (by definition)
    .333     xi* = xi^(1/3)
    .5       xi* = √xi
    1        xi* = xi
    2        xi* = xi²

[Figure: Picciotto Yarn Failure Data, Relationship Sensitivity Analysis for the 0.1 Quantile at Length 30 mm: ML estimate of the 0.1 quantile with 95% pointwise confidence intervals. Vertical axis: 0.1 Quantile of Life in Thousands of Cycles; horizontal axis: Box-Cox Transformation Power (−1.5 to 1.5).]

Box-Cox Transformation
• The Box-Cox family of power transformations of a positive explanatory variable is
\[
x_i^{*} =
\begin{cases}
\dfrac{x_i^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\[1ex]
\log(x_i) & \lambda = 0
\end{cases}
\]
where xi is the original, untransformed explanatory variable for observation i and λ is the power transformation parameter.
• The Box-Cox transformation has the following properties:
◮ The transformed value xi* is an increasing function of xi.
◮ For fixed xi, xi* is a continuous function of λ through 0.

[Figure: Picciotto Yarn Failure Data, Relationship Sensitivity Analysis for the 0.1 Quantile at Length 10 mm: ML estimate of the 0.1 quantile with 95% pointwise confidence intervals. Vertical axis: 0.1 Quantile of Life in Thousands of Cycles; horizontal axis: Box-Cox Transformation Power (−1.5 to 1.5).]

[Figure: Picciotto Yarn Failure Data, Profile Plot for Different Box-Cox Parameters for the Length Relationship: profile likelihood and 95% confidence interval for the Box-Cox transformation power from the lognormal distribution. Vertical axes: Profile Likelihood and Confidence Level; horizontal axis: Box-Cox Transformation Power.]

[Figure: Picciotto Yarn Failure Data, Relationship/Distribution Sensitivity Analysis, 0.1 Quantile at 10 mm Length: lognormal and Weibull 0.1 quantile estimates. Vertical axis: 0.1 Quantile of Life in Thousands of Cycles; horizontal axis: Box-Cox Transformation Power.]

[Figure: Scatter Plot of the Effect of Voltage and Temperature on Glass Capacitor Life (Zelen 1959), at 170 Degrees C and 180 Degrees C. Vertical axis: Hours; horizontal axis: Volts.]
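The Box-Cox family defined above is simple to compute, and doing so makes its two stated properties easy to check. This is a minimal sketch; the function name `box_cox` is an illustrative choice and not taken from the chapter's software.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive explanatory variable.

    Returns (x**lam - 1) / lam for lam != 0 and log(x) for lam == 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox transformation requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Transform the yarn lengths (mm) mentioned in the chapter for a few powers.
lengths = [10.0, 30.0, 60.0, 90.0]
transformed = {lam: [box_cox(x, lam) for x in lengths] for lam in (-1.0, 0.0, 0.5, 1.0)}
```

A sensitivity analysis of the kind shown in the plots would refit the regression model with the transformed lengths for each λ on a grid and track how the quantile estimate of interest changes.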
[Figure: Picciotto Yarn Failure Data, Relationship Sensitivity Analysis, 0.1, 0.5, and 0.9 Quantiles at 10 mm Length (lognormal distribution). Vertical axis: Quantile of Life in Thousands of Cycles; horizontal axis: Box-Cox Transformation Power.]

Glass Capacitor Failure Data
• Experiment designed to determine the effect of voltage and temperature on capacitor life.
• 2 × 4 factorial, 8 units at each combination.
• Test at each combination run until 4 of 8 units failed (Type II censoring).
• Original data from Zelen (1959).

[Figure: Weibull Probability Plots of Glass Capacitor Life Test Results at Individual Temperature and Voltage Test Conditions (170 and 180 degrees C; 200, 250, 300, and 350 V). Vertical axis: Proportion Failing; horizontal axis: Hours.]

Two-Variable Regression Models
• Additive model
\[ \log[t_p(x)] = y_p(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \Phi^{-1}(p)\,\sigma. \]
Because tp(x) = exp[yp(x)] = exp(β1x1 + β2x2) tp(0), this is a SAFT model.
• The interaction model
\[ \log[t_p(x)] = y_p(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \Phi^{-1}(p)\,\sigma \]
is also SAFT.
• Comparing the two models gives
\[ -2 \times (\mathcal{L}_1 - \mathcal{L}_2) = -2 \times (-244.24 + 244.17) = 0.14, \]
which is small relative to \chi^2_{(.95,1)} = 3.84.

[Figure: Weibull Probability Plots with Weibull Regression Model ML Estimates of F(t) at each Set of Conditions for the Glass Capacitor Data. Vertical axis: Proportion Failing; horizontal axis: Hours.]

[Figure: Estimates of Weibull t0.5 Plotted for each Combination of the Glass Capacitor Test Conditions (170 and 180 Degrees C). Points are regression-model-free estimates. Vertical axis: Hours; horizontal axis: Volts.]

The Proportional Hazard Failure Time Model
The proportional hazard (PH) model assumes
h(t; x) = Ψ(x) h(t; x0), for all t > 0.
h(t; x0) and F(t; x0) denote the baseline hazard function and cdf of the model.
The PH model implies (some details later):
• S(t; x) = [S(t; x0)]^Ψ(x) or 1 − F(t; x) = [1 − F(t; x0)]^Ψ(x). If Ψ(x) ≠ 1, F(t; x) and F(t; x0) do not cross, and the model is accelerating if Ψ(x) > 1 and decelerating if Ψ(x) < 1.
• From 1 − F(t; x) = [1 − F(t; x0)]^Ψ(x) and taking logs (twice):
\[ \log\{-\log[1 - F(t;x)]\} - \log\{-\log[1 - F(t;x_0)]\} = \log[\Psi(x)]. \]
Thus in a Weibull probability scale F(t; x) and F(t; x0) are equidistant. In particular, Weibull plots of F(t; x) and F(t; x0) are translations of each other along the probability axis.

[Figure: Weibull Probability Plot of Two Members from a PH Model with a Baseline Lognormal Distribution. Vertical axis: Probability; horizontal axis: Time.]

[Figure: Proportional Hazard Model (Lognormal Baseline) as a Time Transformation. Vertical axis: Time at Level x; horizontal axis: Time at Level x0. Curves: accelerating proportional hazards, decelerating proportional hazards.]

Interpreting PH Models as a Failure Time Transformation
Suppose that T(x0) ∼ F(t; x0) and define the time transformation
\[ T(x) = F^{-1}\!\left( 1 - \{1 - F[T(x_0); x_0]\}^{1/\Psi(x)};\ x_0 \right). \]
It can be shown that:
• T(x) and T(x0) follow the PH relationship h(t; x) = Ψ(x) h(t; x0).
• T(x) is a monotone transformation of T(x0) such that
T(x) > T(x0) if Ψ(x) < 1 (decelerating),
T(x) = T(x0) if Ψ(x) = 1 (identity transform),
T(x) < T(x0) if Ψ(x) > 1 (accelerating).

Some Comments on the Proportional Hazard Failure Time Model
Here we consider the proportional hazards (PH) model
h(t; x) = Ψ(x) h(t; x0), for all t > 0.
• In general, the PH model does not preserve the baseline distribution. For example, if T(x0) has a lognormal distribution, then T(x) has a power lognormal distribution.
• The semiparametric model without a particular specification of h(t; x0) is known as Cox’s proportional hazards model.

[Figure: Weibull Probability Plot of Two Members from a PH/SAFT Model with a Baseline Weibull Distribution. Vertical axis: Probability; horizontal axis: Time.]

Contrasting SAFT and PH Models
The scale failure time model tp(x) = tp(x0)/Ψ(x) and the PH model h(t; x) = Ψ(x) h(t; x0) are equivalent if and only if the baseline distribution is Weibull. In other words, the Weibull distribution is the only baseline distribution for which a SAFT model is also a PH model, and vice versa.
Heuristic argument: Consider F(t; x1) and F(t; x2). The Weibull probability plots of these two cdfs are translations of each other in both the probability and the log(t) scale if and only if the plots are straight parallel lines.

Statistical Methods for the Semiparametric (Cox) PH Model
Data: n units, t(1), . . . , t(r) ordered failure times with n − r censored observations (usually multiply censored).
• RS_i = RS(t(i) − ε) is the risk set just before the failure at time t(i).
• Each unit has a vector xi of explanatory variables (often called covariates).
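The contrast between SAFT and PH models drawn above can be checked numerically with the PH time transformation T(x) = F⁻¹(1 − {1 − F[T(x0); x0]}^(1/Ψ(x)); x0). The sketch below is an illustration under assumed values: the Weibull parameters (eta, beta), the lognormal parameters (mu, sigma), the value Ψ = 2, and all function names are hypothetical. With a Weibull baseline the ratio T(x)/T(x0) is the same at every time (a pure scale change, here Ψ^(−1/β)); with a lognormal baseline it is not.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ph_transform(t0, psi, cdf, quantile):
    """PH time transformation for a baseline with the given cdf and quantile function."""
    return quantile(1.0 - (1.0 - cdf(t0)) ** (1.0 / psi))


# Hypothetical Weibull baseline: F(t) = 1 - exp[-(t/eta)^beta]
ETA, BETA = 100.0, 2.0

def weibull_cdf(t):
    return 1.0 - math.exp(-((t / ETA) ** BETA))

def weibull_quantile(p):
    return ETA * (-math.log(1.0 - p)) ** (1.0 / BETA)


# Hypothetical lognormal baseline: F(t) = Phi_nor[(log t - mu)/sigma]
MU, SIGMA = math.log(100.0), 0.5

def lognormal_cdf(t):
    return _STD_NORMAL.cdf((math.log(t) - MU) / SIGMA)

def lognormal_quantile(p):
    return math.exp(MU + SIGMA * _STD_NORMAL.inv_cdf(p))


PSI = 2.0  # Psi(x) > 1, so the transformation should accelerate time
baseline_times = [50.0, 100.0, 200.0]
weibull_ratios = [ph_transform(t, PSI, weibull_cdf, weibull_quantile) / t
                  for t in baseline_times]
lognormal_ratios = [ph_transform(t, PSI, lognormal_cdf, lognormal_quantile) / t
                    for t in baseline_times]
```

Plotting the transformed times against the baseline times would give a straight line through the origin in the Weibull case and a curve below the diagonal in the lognormal case.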
Cox PH Model Likelihood (Probability of the Data)
• The probability that individual i dies in [t, t + ∆t] is
h(t; xi)∆t = h(t; x0)Ψ(xi)∆t = h(t; x0) exp(xi′β)∆t
where Ψ(xi) = exp(xi′β) = exp(β1x1 + · · · + βkxk) and x0 = 0.
• If a death occurs at time t, the probability that it was individual i is
\[ L_i = \frac{h(t;x_i)\,\Delta t}{\sum_{\ell \in RS_i} h(t;x_\ell)\,\Delta t} = \frac{\exp(x_i'\beta)}{\sum_{\ell \in RS_i} \exp(x_\ell'\beta)}. \]
• Conditional on the observed failure times, the joint probability of the data is
\[ L(\beta) = \prod_{i=1}^{r} \frac{\exp(x_i'\beta)}{\sum_{\ell \in RS_i} \exp(x_\ell'\beta)}. \]
• Need mild conditions on xi and the censoring mechanism.

Comments on the Cox PH Model Likelihood
• Cox (1972) called L(β) a conditional likelihood.
• L(β) is not a true likelihood, but there is justification for treating it as such.
• With no censoring, Kalbfleisch and Prentice (1973) show that L(β) can be justified as a marginal likelihood based on ranks of the observed failures. The argument can be extended to Type II censoring.
• Cox (1975) provided a partial likelihood justification.
• Asymptotic theory shows that the estimates obtained from maximizing L(β) have high efficiency.
• There are some technical difficulties when there are tied observations. Approximations for L(β) are generally used.

Estimating h(t; x0) [or S(t; x0) or F(t; x0)]
• Basic idea is to substitute β̂ for β and maximize the probability of the data (after adjustment for the explanatory variables), as in the product limit estimator.
• Let qi = S(ti + ε; x0), i = 1, . . . , r for the r failure times and let qj = S(tj + ε; x0), j = r + 1, . . . , n for the n − r censored observations. If there are no ties among the failure times,
\[ L(q) = \prod_{i=1}^{r} \left[ q_{i-1}^{\hat\Psi(x_i)} - q_i^{\hat\Psi(x_i)} \right] \prod_{j=r+1}^{n} q_{j-1}^{\hat\Psi(x_j)} \]
where Ψ̂(xi) = exp(xi′β̂) and q0 ≡ 1.
• Maximize L(q) with respect to qi, i = 1, . . . , r to get the step-function estimate of S(t; x0).
• Estimate of S(t) at x is Ŝ(t; x) = [Ŝ(t; x0)]^Ψ̂(x).
• Setup more complicated when there are ties.

[Figure: Picciotto Yarn Failure Data, Cox PH Model Survival Estimates (Linear Axes), for 30 mm, 60 mm, and 90 mm. Horizontal axis: Thousands of Cycles.]

[Figure: Picciotto Yarn Failure Data, Cox PH Model Survival Estimates (on Weibull Probability Scales), for 30 mm, 60 mm, and 90 mm. Vertical axis: Proportion Failing; horizontal axis: Thousands of Cycles.]

[Figure: Picciotto Yarn Failure Data, Cox PH Model Schoenfeld Residuals versus Time. Vertical axis: Schoenfeld Residuals; horizontal axis: Thousands of Cycles.]

[Figure: General Failure Time Transformation Graph. Vertical axis: Time at Level x; horizontal axis: Time at Level x0. Curves: accelerating transformation, decelerating transformation, accelerating and decelerating transformation.]

General Failure Time Transformation Model
A general time transformation model is
T(x) = Υ[T(x0), x]
where x0 are some baseline conditions.
We assume that Υ(t, x) satisfies the following conditions:
• Υ(t, x) is nonnegative, i.e., Υ(t, x) ≥ 0 for all t and x.
• For fixed x, Υ(t, x) is monotone increasing in t.
• For all x, Υ(0, x) = 0.
• For x0 the transformation is the identity transformation, i.e., Υ(t, x0) = t for all t.

General Failure Time Transformation Model
This general assumed transformation model implies:
• The distribution of T(x) can be determined from the distribution of T(x0) and x. In particular, tp(x) = Υ[tp(x0), x] for 0 ≤ p ≤ 1.
• In a plot of T(x0) versus T(x):
◮ T(x) entirely below the diagonal line implies acceleration.
◮ T(x) entirely above the diagonal line implies deceleration.
◮ Other T(x)s imply acceleration sometimes and deceleration other times. The cdfs of T(x) and T(x0) cross.
◮ Scale time transformations and the proportional hazard model are special cases.

Other Topics in Failure-Time Regression
• Models with non-location-scale distributions.
• Nonlinear relationship regression model.
• Fatigue-limit regression model.
• Special topics in regression: random effects models, nonparametric SAFT regression models, parametric PH regression models.
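The Cox PH model likelihood L(β) given earlier in this chapter is a product, over the r failures, of exp(xi′β) divided by the sum of exp(xℓ′β) over the risk set. The following is a minimal sketch of its logarithm, assuming no tied failure times (the slides note that ties need approximations); the function name `cox_partial_loglik` and the toy data are hypothetical.

```python
import math


def cox_partial_loglik(beta, times, events, covariates):
    """Log of the Cox PH likelihood L(beta), assuming no tied failure times.

    beta       : list of regression coefficients
    times      : observed failure or censoring times
    events     : 1 for a failure, 0 for a right-censored observation
    covariates : list of covariate vectors x_i (same length as beta)
    """
    # Linear predictors x_i' beta for every unit.
    eta = [sum(b * xk for b, xk in zip(beta, x)) for x in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored units contribute only through risk sets
        # Risk set RS_i: units still under observation just before t_i.
        risk_sum = sum(math.exp(eta[l]) for l, t_l in enumerate(times) if t_l >= t_i)
        loglik += eta[i] - math.log(risk_sum)
    return loglik


# Hypothetical toy data: three units, one covariate, the second unit censored.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 0, 1]
toy_x = [[0.0], [1.0], [2.0]]
toy_value = cox_partial_loglik([0.0], toy_times, toy_events, toy_x)
```

Maximizing this function over β with a numerical optimizer gives the estimate β̂ that is then substituted into L(q) to estimate the baseline survival function.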