Chapter 14
Introduction to the Use of Bayesian Methods for Reliability Data

William Q. Meeker and Luis A. Escobar
Iowa State University and Louisiana State University

Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on the authors' text Statistical Methods for Reliability Data, John Wiley & Sons, Inc., 1998. December 14, 2015.

Chapter 14 Objectives
• Describe the use of Bayesian statistical methods to combine prior information with data to make inferences.
• Explain the relationship between Bayesian methods and the likelihood methods used in earlier chapters.
• Discuss sources of prior information.
• Explain the dangers of using wishful thinking or expectations as prior information.
• Describe useful computing methods for Bayesian inference.
• Illustrate Bayesian methods for estimating reliability.
• Illustrate Bayesian methods for prediction.
• Compare Bayesian and likelihood methods under different assumptions about prior information.

Introduction
• Bayesian methods augment the likelihood with prior information.
• A probability distribution is used to describe our prior beliefs about a parameter or set of parameters.
• Sources of prior information:
◮ Subjective Bayes: the prior information is subjective.
◮ Empirical Bayes: the prior information comes from past data.
• Bayesian methods are closely related to likelihood methods.

Bayes Method for Inference
[Diagram: the model for the DATA and the observed DATA determine the likelihood L(DATA|θ); combining the likelihood with the prior f(θ) yields the posterior f(θ|DATA).]

Updating Prior Information Using Bayes Theorem
Bayes theorem provides a mechanism for combining prior information with sample data to make inferences on model parameters. For a vector parameter θ the procedure is as follows:
• Prior information on θ is expressed in terms of a pdf f(θ).
• We observe some data which, for the specified model, has likelihood L(DATA|θ) ≡ L(θ; DATA).
• Using Bayes theorem, the conditional distribution of θ given the data (also known as the posterior distribution of θ) is

    f(\theta \mid \mathrm{DATA}) = \frac{L(\mathrm{DATA} \mid \theta)\, f(\theta)}{\int L(\mathrm{DATA} \mid \theta)\, f(\theta)\, d\theta} = \frac{R(\theta)\, f(\theta)}{\int R(\theta)\, f(\theta)\, d\theta}

where R(θ) = L(θ)/L(θ̂) is the relative likelihood and the multiple integral is computed over the region where f(θ) > 0.

Some Comments on Posterior Distributions
• The posterior f(θ|DATA) is a function of the prior, the model, and the data.
• In general, it is impossible to compute the multiple integral ∫ L(DATA|θ) f(θ) dθ in closed form.
• New statistical and numerical methods that take advantage of modern computing power are facilitating the computation of the posterior.

Differences Between Bayesian and Frequentist Inference
• Nuisance parameters:
◮ Bayesian methods eliminate nuisance parameters by using marginal posterior distributions.
◮ Large-sample likelihood theory suggests maximizing (profiling) over nuisance parameters.
• There are no important differences between the two approaches in large samples.
• Interpretation:
◮ Bayesian methods are justified in terms of probability statements about the parameters.
◮ Frequentist methods are justified in terms of repeated sampling and asymptotic theory.

Proper Prior Distributions
A proper prior is any positive function defined on the parameter space that integrates to a finite value (usually 1).
• Uniform prior: f(θ) = 1/(b − a) for a ≤ θ ≤ b. This prior does not express strong preference for specific values of θ in the interval.
• Examples of non-uniform prior distributions:
◮ Normal with mean a and standard deviation b.
◮ Beta between specified endpoints a and b, with specified shape parameters (allows a more general shape).
◮ Isosceles triangle with base (the range) between a and b.
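For a single-parameter problem, the posterior in Bayes theorem can be computed directly on a grid. The following is a minimal sketch, with made-up data (the failure and censoring times, the exponential lifetime model, and the grid limits are hypothetical, not from the text), showing the prior-times-relative-likelihood calculation and its normalization:

```python
import numpy as np

# Hypothetical data: three failure times and 17 right-censored units
failures = np.array([150.0, 320.0, 410.0])
censored = np.full(17, 500.0)

# Log-spaced grid for the exponential mean theta
theta = np.exp(np.linspace(np.log(50.0), np.log(50000.0), 4000))

total_time = failures.sum() + censored.sum()
loglik = -len(failures) * np.log(theta) - total_time / theta
R = np.exp(loglik - loglik.max())            # relative likelihood R(theta)

prior = 1.0 / theta                          # uniform in log(theta), improper
post = R * prior
post /= np.trapz(post, theta)                # normalize to a proper posterior

post_mean = np.trapz(theta * post, theta)    # Bayes point estimate
cdf = np.cumsum(post * np.gradient(theta))   # crude cdf on the grid
lo, hi = np.interp([0.025, 0.975], cdf, theta)
print(f"posterior mean {post_mean:.0f}, 95% interval ({lo:.0f}, {hi:.0f})")
```

With two or more parameters this direct gridding becomes expensive, which motivates the simulation methods described later in the chapter.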
• For a positive parameter θ, one may want to specify the prior in terms of log(θ).

Sources of Prior Information
• Informative:
◮ Past data.
◮ Physical, chemical, and mechanical theory.
◮ Expert knowledge.
• Diffuse (or approximately non-informative):
◮ Uniform over a finite range of the parameter (or of a function of the parameter).
◮ Uniform over an infinite range of the parameter (an improper prior).
◮ Other vague or diffuse priors.

Improper Prior Distributions
An improper prior is a positive function f(θ) over the parameter space for which

    \int f(\theta)\, d\theta = \infty.

• Uniform on an interval of infinite length: f(θ) = c for all θ.
• For a positive parameter θ, the corresponding choice is f[log(θ)] = c, which gives f(θ) = c/θ, θ > 0.
• To use an improper prior, one must have

    \int f(\theta)\, L(\theta \mid \mathrm{DATA})\, d\theta < \infty

(a condition on the form of the likelihood and the DATA).
• These prior distributions can be made proper by specifying a finite interval for θ and choosing c so that the total probability is 1.

Effect of Using Vague (or Diffuse) Prior Distributions
• For a uniform prior f(θ) (possibly improper) across all possible values of θ,

    f(\theta \mid \mathrm{DATA}) = \frac{R(\theta)\, f(\theta)}{\int R(\theta)\, f(\theta)\, d\theta} = \frac{R(\theta)}{\int R(\theta)\, d\theta},

which shows that the posterior f(θ|DATA) is proportional to the likelihood.
• The posterior is approximately proportional to the likelihood for a proper (finite-range) uniform prior if the range is wide enough that R(θ) ≈ 0 wherever f(θ) = 0.
• Other diffuse priors also yield a posterior that is approximately proportional to the likelihood when f(θ) is roughly constant over the region where R(θ) is appreciably greater than 0.

Eliciting or Specifying a Prior Distribution
• Eliciting a meaningful joint prior distribution for a vector of parameters may be difficult:
◮ The marginals may not completely determine the joint distribution.
◮ It is difficult to express or elicit dependencies among parameters through a joint distribution.
◮ The standard parameterization may not have practical meaning.
• General approach: choose an appropriate parameterization in which the priors for the parameters are approximately independent.

Expert Opinion and Eliciting Prior Information
• Identify parameters that, from past experience (or data), can be specified approximately independently (e.g., for high-reliability applications, a small quantile and the Weibull shape parameter).
• Determine for which parameters there is useful informative prior information.
• For parameters for which there is useful informative prior information, specify the form and range of the distribution (e.g., lognormal with 99.7% content between two specified points).
• For parameters for which there is no useful informative prior information, determine the form and range of a vague prior (e.g., uniform over a wide interval).

Example of Eliciting Prior Information: Bearing-Cage Time to Fracture Distribution
With appropriate questioning, the engineers provided the following information:
• Time to fracture data can often be described by a Weibull distribution.
• From previous similar studies involving heavily censored data, (µ, σ) tend to be correlated (making it difficult to specify a joint prior for them).
• For small p (near the proportion failing in previous studies), (tp, σ) are approximately independent, which allows the specification of approximately independent priors.

Example of Eliciting Prior Information: Bearing-Cage Fracture Field Data (Continued)
• Based on experience with previous products of the same material and knowledge of the failure mechanism, there is strong prior information about the Weibull shape parameter.
• The engineers did not have strong prior information on possible values of the distribution quantiles.
• For the Weibull shape parameter, take log(σ) ∼ NOR(a0, b0), where a0 and b0 are obtained from the specification of two quantiles σγ/2 and σ(1−γ/2) of the prior distribution for σ. Then

    a_0 = \log\!\left[\sqrt{\sigma_{\gamma/2}\, \sigma_{(1-\gamma/2)}}\right], \qquad b_0 = \log\!\left[\sqrt{\sigma_{(1-\gamma/2)}\, /\, \sigma_{\gamma/2}}\right] \Big/\, z_{(1-\gamma/2)}.

• Uncertainty in the Weibull 0.01 quantile is described by a UNIFORM[log(a1), log(b1)] distribution with a1 = 100 and b1 = 5000 hours (a wide range, not very informative).

[Figures: prior pdfs for log(t0.01) and t0.01 when a1 = 100, b1 = 5000 (log axis); prior pdfs for log(σ) and σ when σ0.005 = 0.2, σ0.995 = 0.5.]

Joint Lognormal-Uniform Prior Distributions
• The prior for log(σ) is normal:

    f[\log(\sigma)] = \frac{1}{b_0}\, \phi_{\mathrm{nor}}\!\left[\frac{\log(\sigma) - a_0}{b_0}\right], \qquad \sigma > 0.

The corresponding density for σ is f(σ) = (1/σ) f[log(σ)].
• The prior for log(tp) is uniform:

    f[\log(t_p)] = \frac{1}{\log(b_1/a_1)}, \qquad a_1 \le t_p \le b_1.

The corresponding density for tp is f(tp) = (1/tp) f[log(tp)].
• Consequently, the joint prior distribution for (tp, σ) is

    f(t_p, \sigma) = \frac{f[\log(t_p)]\, f[\log(\sigma)]}{t_p\, \sigma}, \qquad a_1 \le t_p \le b_1,\ \sigma > 0.
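The elicited specifications above translate directly into prior parameters. Below is a small sketch using the numbers from this example (σ0.005 = 0.2, σ0.995 = 0.5, so γ = 0.01, and a1 = 100, b1 = 5000 hours); the function names are ours, introduced for illustration:

```python
import numpy as np
from scipy.stats import norm

s_lo, s_hi, gamma = 0.2, 0.5, 0.01           # elicited quantiles of sigma
z = norm.ppf(1 - gamma / 2)                  # z_{0.995}, about 2.576

a0 = np.log(np.sqrt(s_lo * s_hi))            # mean of the log(sigma) prior
b0 = np.log(np.sqrt(s_hi / s_lo)) / z        # sd of the log(sigma) prior

a1, b1 = 100.0, 5000.0                       # uniform range for log(t_0.01)

def prior_sigma_pdf(sigma):
    """Marginal prior density f(sigma) = (1/sigma) f[log(sigma)]."""
    return norm.pdf((np.log(sigma) - a0) / b0) / (b0 * sigma)

def prior_tp_pdf(tp):
    """Marginal prior density f(t_p) = 1 / [t_p log(b1/a1)] on [a1, b1]."""
    inside = (tp >= a1) & (tp <= b1)
    return np.where(inside, 1.0 / (tp * np.log(b1 / a1)), 0.0)

print(f"a0 = {a0:.3f}, b0 = {b0:.3f}")       # approx a0 = -1.151, b0 = 0.178
```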
Joint Prior Distribution for (µ, σ)
• The transformation µ = log(tp) − Φ⁻¹sev(p)σ, with σ unchanged, yields the prior for (µ, σ):

    f(\mu, \sigma) = f[\log(t_p)] \times \frac{f[\log(\sigma)]}{\sigma} = \frac{1}{\log(b_1/a_1)} \times \frac{\phi_{\mathrm{nor}}\{[\log(\sigma) - a_0]/b_0\}}{\sigma\, b_0},

where log(tp) = µ + Φ⁻¹sev(p)σ, and the support is log(a1) − Φ⁻¹sev(p)σ ≤ µ ≤ log(b1) − Φ⁻¹sev(p)σ, σ > 0.
• The region in which f(µ, σ) > 0 is South-West to North-East oriented (positively correlated) because Cov(µ, σ) = −Φ⁻¹sev(p) Var(σ) > 0 for small p.

[Figure: simulated joint prior for t0.01 and σ, plotted on log axes.]

Methods to Compute the Posterior
• Numerical integration: to obtain the posterior, one needs to evaluate the integral ∫ R(θ) f(θ) dθ over the region on which f(θ) > 0. In general there is no closed form for this integral, and the computation has to be done numerically using fixed-quadrature or adaptive integration algorithms.
• Simulation methods: the posterior can be approximated using Monte Carlo simulation resampling methods, as described below.

Joint Posterior Distribution for (µ, σ)
• The likelihood for the bearing-cage data, with the product taken over the n units in the data set, is

    L(\mu, \sigma) = \prod_{i=1}^{n} \left[\frac{1}{\sigma t_i}\, \phi_{\mathrm{sev}}\!\left(\frac{\log(t_i) - \mu}{\sigma}\right)\right]^{\delta_i} \left[1 - \Phi_{\mathrm{sev}}\!\left(\frac{\log(t_i) - \mu}{\sigma}\right)\right]^{1-\delta_i},

where δi = 1 if observation i is a failure and δi = 0 if it is right censored.
• The posterior distribution is

    f(\mu, \sigma \mid \mathrm{DATA}) = \frac{L(\mu, \sigma)\, f(\mu, \sigma)}{\int\!\!\int L(v, w)\, f(v, w)\, dv\, dw} = \frac{R(\mu, \sigma)\, f(\mu, \sigma)}{\int\!\!\int R(v, w)\, f(v, w)\, dv\, dw}.
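The likelihood above is best evaluated on the log scale to avoid underflow with heavily censored data. A sketch (the function names are ours; t and delta stand for the observed times and the failure/censoring indicators):

```python
import numpy as np

def sev_log_pdf(z):
    """log phi_sev(z) = z - exp(z) for the standard SEV distribution."""
    return z - np.exp(z)

def sev_log_sf(z):
    """log[1 - Phi_sev(z)] = -exp(z), the SEV log survival function."""
    return -np.exp(z)

def loglik(mu, sigma, t, delta):
    """Weibull/SEV log likelihood with exact failures (delta = 1)
    and right-censored observations (delta = 0)."""
    z = (np.log(t) - mu) / sigma
    return np.sum(delta * (sev_log_pdf(z) - np.log(sigma * t))
                  + (1 - delta) * sev_log_sf(z))
```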
Computing the Posterior Using Simulation
Using simulation, one can draw a sample from the posterior using only the likelihood and the prior. The procedure for a general parameter θ and prior distribution f(θ) is as follows:
• Let θi, i = 1, …, M, be a random sample from f(θ).
• The ith observation, θi, is retained with probability R(θi): if Ui is a random observation from a uniform (0, 1) distribution, then θi is retained if Ui ≤ R(θi).
• It can be shown that the retained observations, say θ1⋆, …, θM⋆⋆ (M⋆ ≤ M), are observations from the posterior f(θ|DATA).

Sampling from the Prior
The joint prior sample for θ = (µ, σ) is generated as follows, for i = 1, …, M:
• Use the inverse cdf method (see Chapter 4) to obtain a pseudorandom sample for tp:

    (t_p)_i = a_1 \left(b_1 / a_1\right)^{U_{1i}},

where U11, …, U1M is a pseudorandom sample from a uniform (0, 1) distribution.
• Similarly, obtain a pseudorandom sample for σ:

    \sigma_i = \exp\!\left[a_0 + b_0\, \Phi^{-1}_{\mathrm{nor}}(U_{2i})\right],

where U21, …, U2M is another, independent pseudorandom sample from a uniform (0, 1) distribution.
• Then θi = (µi, σi), with µi = log[(tp)i] − Φ⁻¹sev(p)σi, is a pseudorandom sample from the (µ, σ) prior.

[Figures: simulated joint and marginal prior distributions for µ and σ; simulated joint prior distribution for (µ, σ) with relative likelihood contours; joint posterior for µ and σ.]
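Putting the pieces together, here is a sketch of the prior-sampling recipe and the retain-with-probability-R(θ) filter described above. It assumes the loglik function from the previous sketch; the data arrays t and delta and all other names are ours:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(123)

def Phi_sev_inv(q):
    """SEV quantile function: Phi_sev^{-1}(q) = log(-log(1 - q))."""
    return np.log(-np.log(1 - q))

def sample_prior(M, a0, b0, a1, b1, p=0.01):
    """Draw (mu, sigma) from the joint prior via (t_p, sigma)."""
    tp = a1 * (b1 / a1) ** rng.uniform(size=M)   # log-uniform on [a1, b1]
    sigma = np.exp(a0 + b0 * norm.ppf(rng.uniform(size=M)))
    mu = np.log(tp) - Phi_sev_inv(p) * sigma
    return mu, sigma

def posterior_sample(M, a0, b0, a1, b1, t, delta, p=0.01):
    """Retain each prior draw with probability R(theta_i)."""
    mu, sigma = sample_prior(M, a0, b0, a1, b1, p)
    ll = np.array([loglik(m, s, t, delta) for m, s in zip(mu, sigma)])
    R = np.exp(ll - ll.max())      # relative likelihood; the sample maximum
                                   # stands in for the value at the ML estimate
    keep = rng.uniform(size=M) <= R
    return mu[keep], sigma[keep]   # M* draws from the posterior
```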
Comments on Computing Posteriors Using Resampling
• The number of observations M⋆ from the posterior is random, with expected value

    E(M^\star) = M \int f(\theta)\, R(\theta)\, d\theta.

• When the prior and the data do not agree well, M⋆ will be much smaller than it would be otherwise, and a larger prior sample size M will be required.
• One can add to the posterior sample by sequentially filtering groups of prior points until a sufficient number of points is available in the posterior.

Posterior and Marginal Posterior Distributions for the Model Parameters
• Using the general resampling method described above, one obtains a sample from the posterior of θ, say θi⋆ = (µi⋆, σi⋆), i = 1, …, M⋆.
• Inferences on individual parameters are obtained by using the marginal posterior distribution of the parameter of interest. The marginal posterior of θj is

    f(\theta_j \mid \mathrm{DATA}) = \int f(\theta \mid \mathrm{DATA})\, d\theta',

where θ′ is the subset of the parameters excluding θj.
• Inferences for µ or σ alone are based on the corresponding marginal samples µi⋆ and σi⋆, respectively.

[Figure: joint posterior and marginal posteriors for µ and σ for the bearing-cage data.]

Posterior and Marginal Posterior Distributions for Functions of the Model Parameters
• Inferences on a scalar function g(θ) of the parameters are obtained by using the marginal posterior distribution f[g(θ)|DATA].
• Using the simulation method, inferences are based on the simulated marginal posterior distributions. For example:
◮ The marginal posterior f(tp|DATA), used for inference on quantiles, is obtained from the empirical distribution of exp[µi⋆ + Φ⁻¹sev(p)σi⋆].
◮ The marginal posterior f[F(te)|DATA], used for inference on the failure probability at a given time te, is obtained from the empirical distribution of Φsev{[log(te) − µi⋆]/σi⋆}.

[Figures: simulated marginal posterior distributions for F(2000) and F(5000); simulated marginal posterior distributions for t0.05 and t0.10 (hours).]

Bayes Point Estimation
Bayesian inferences for θ and for functions of the parameters g(θ) are based entirely on their posterior distributions f(θ|DATA) and f[g(θ)|DATA].
• If g(θ) is a scalar, a common Bayesian estimate of g(θ) is its posterior mean,

    \hat{g}(\theta) = E[g(\theta) \mid \mathrm{DATA}] = \int g(\theta)\, f(\theta \mid \mathrm{DATA})\, d\theta.

In particular, for the ith component of θ, θ̂i is the posterior mean of θi. The posterior mean is the Bayes estimate that minimizes squared-error loss.
• Other possible choices for estimating g(θ) include (a) the posterior mode, which is typically very similar to the ML estimate, and (b) the posterior median.

One-Sided Bayes Confidence Bounds
• A 100(1 − α)% Bayes lower confidence bound (or credible bound) for a scalar function g(θ) is the value \underset{\sim}{g} satisfying

    \int_{\underset{\sim}{g}}^{\infty} f[g(\theta) \mid \mathrm{DATA}]\, dg(\theta) = 1 - \alpha.

• A 100(1 − α)% Bayes upper confidence bound (or credible bound) for a scalar function g(θ) is the value g̃ satisfying

    \int_{-\infty}^{\tilde{g}} f[g(\theta) \mid \mathrm{DATA}]\, dg(\theta) = 1 - \alpha.
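Given the retained draws, point estimates and credible bounds reduce to sample summaries of the marginal posteriors. A sketch, assuming arrays mu_star and sigma_star of posterior draws from the filter above (function names are ours):

```python
import numpy as np

def summarize(g_draws, alpha=0.05):
    """Posterior mean, median, and equal-tail 100(1-alpha)% credible
    interval for a scalar g(theta), from posterior draws of g."""
    lo, hi = np.quantile(g_draws, [alpha / 2, 1 - alpha / 2])
    return g_draws.mean(), np.median(g_draws), (lo, hi)

def quantile_draws(mu_star, sigma_star, p):
    """Draws of t_p = exp[mu + Phi_sev^{-1}(p) sigma]."""
    return np.exp(mu_star + np.log(-np.log(1 - p)) * sigma_star)

def fraction_failing_draws(mu_star, sigma_star, te):
    """Draws of F(te) = Phi_sev{[log(te) - mu] / sigma}."""
    return 1 - np.exp(-np.exp((np.log(te) - mu_star) / sigma_star))

# e.g. summarize(quantile_draws(mu_star, sigma_star, p=0.10))
#      summarize(fraction_failing_draws(mu_star, sigma_star, te=2000.0))
```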
Two-Sided Bayes Confidence Intervals
• A 100(1 − α)% Bayes confidence interval (or credible interval) for a scalar function g(θ) is any interval [\underset{\sim}{g}, g̃] satisfying

    \int_{\underset{\sim}{g}}^{\tilde{g}} f[g(\theta) \mid \mathrm{DATA}]\, dg(\theta) = 1 - \alpha. \qquad (1)

• The interval [\underset{\sim}{g}, g̃] can be chosen in different ways:
◮ Combining two one-sided 100(1 − α/2)% bounds puts equal probability in each tail (the usual choice, unless there is more concern about being incorrect in one direction than in the other).
◮ A 100(1 − α)% highest posterior density (HPD) confidence interval chooses [\underset{\sim}{g}, g̃] to consist of all values of g with f(g|DATA) > c, where c is chosen such that (1) holds. HPD intervals are similar to likelihood-based confidence intervals. Also, when f[g(θ)|DATA] is unimodal, the HPD interval is the narrowest Bayes interval.

Bayesian Joint Confidence Regions
The same procedure generalizes to confidence regions for vector functions g(θ) of θ.
• A 100(1 − α)% Bayes confidence region (or credible region) for a vector-valued function g(θ) is defined as

    CR_B = \{\, g(\theta) : f[g \mid \mathrm{DATA}] \ge c \,\},

where c is chosen such that

    \int_{CR_B} f[g(\theta) \mid \mathrm{DATA}]\, dg(\theta) = 1 - \alpha.

• Presentation of the confidence region is difficult when g(θ) has more than 2 components.

Bayes Versus Likelihood
• A summary table or plots can be used to compare likelihood-based and Bayes confidence intervals for µ, σ, and t0.1 in the bearing-cage data example.

Prediction of Future Events
• Future events can be predicted by using the Bayes predictive distribution.
• If X [with pdf f(·|θ)] represents a future random variable:
◮ the posterior predictive pdf of X is

    f(x \mid \mathrm{DATA}) = \int f(x \mid \theta)\, f(\theta \mid \mathrm{DATA})\, d\theta = E_{\theta \mid \mathrm{DATA}}[f(x \mid \theta)]

◮ the posterior predictive cdf of X is

    F(x \mid \mathrm{DATA}) = \int_{-\infty}^{x} f(u \mid \mathrm{DATA})\, du = \int F(x \mid \theta)\, f(\theta \mid \mathrm{DATA})\, d\theta = E_{\theta \mid \mathrm{DATA}}[F(x \mid \theta)]

where the expectations are computed with respect to the posterior distribution of θ.

Approximating Predictive Distributions
• f(x|DATA) can be approximated by the average of the posterior pdfs f(x|θi⋆):

    f(x \mid \mathrm{DATA}) \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} f(x \mid \theta_i^\star).

• Similarly, F(x|DATA) can be approximated by the average of the posterior cdfs F(x|θi⋆):

    F(x \mid \mathrm{DATA}) \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} F(x \mid \theta_i^\star).

• A two-sided 100(1 − α)% Bayesian prediction interval for a new observation is given by the α/2 and (1 − α/2) quantiles of F(x|DATA).

Location-Scale Based Prediction Problems
Here we consider prediction problems in which log(T) has a location-scale distribution.
• Predicting a future value of T: in this case X = T and x = t, and

    f(t \mid \theta) = \frac{1}{\sigma t}\, \phi(\zeta), \qquad F(t \mid \theta) = \Phi(\zeta),

where ζ = [log(t) − µ]/σ.
• Thus, for the bearing-cage fracture data, approximations of the predictive pdf and cdf for a new observation are

    f(t \mid \mathrm{DATA}) \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} \frac{1}{\sigma_i^\star t}\, \phi_{\mathrm{sev}}(\zeta_i^\star), \qquad F(t \mid \mathrm{DATA}) \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} \Phi_{\mathrm{sev}}(\zeta_i^\star),

where ζi⋆ = [log(t) − µi⋆]/σi⋆.

Prediction of an Order Statistic
Here we consider prediction of the kth order statistic in a future sample of size m from the distribution of T, when log(T) has a location-scale distribution.
• In this case X = T(k) and x = t(k), and

    f[t_{(k)} \mid \theta] = \frac{m!}{(k-1)!\,(m-k)!} \left[\Phi(\zeta)\right]^{k-1} \left[1 - \Phi(\zeta)\right]^{m-k} \frac{1}{\sigma\, t_{(k)}}\, \phi(\zeta)

    F[t_{(k)} \mid \theta] = \sum_{j=k}^{m} \frac{m!}{j!\,(m-j)!} \left[\Phi(\zeta)\right]^{j} \left[1 - \Phi(\zeta)\right]^{m-j}

where ζ = [log(t(k)) − µ]/σ.

Predicting the 1st Order Statistic
When k = 1 (predicting the 1st order statistic), the formulas simplify to
• pdf:

    f[t_{(1)} \mid \theta] = m \times \frac{1}{\sigma\, t_{(1)}}\, \phi(\zeta) \left[1 - \Phi(\zeta)\right]^{m-1}

• cdf:

    F[t_{(1)} \mid \theta] = 1 - \left[1 - \Phi(\zeta)\right]^{m}

where ζ = [log(t(1)) − µ]/σ.
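The averaging formula for F(t|DATA) can be inverted numerically to get a prediction interval. A sketch, again assuming posterior draws mu_star and sigma_star (the grid limits are arbitrary placeholders):

```python
import numpy as np

def Phi_sev(z):
    """Standard smallest-extreme-value cdf."""
    return 1 - np.exp(-np.exp(z))

def predictive_cdf(t, mu_star, sigma_star):
    """F(t|DATA): average of the SEV cdfs over the posterior draws."""
    return Phi_sev((np.log(t) - mu_star) / sigma_star).mean()

def prediction_interval(mu_star, sigma_star, alpha=0.05):
    """Invert F(t|DATA) on a log-spaced grid for the alpha/2 and
    1 - alpha/2 predictive quantiles."""
    grid = np.exp(np.linspace(np.log(10.0), np.log(1e5), 2000))
    cdf = np.array([predictive_cdf(t, mu_star, sigma_star) for t in grid])
    lo, hi = np.interp([alpha / 2, 1 - alpha / 2], cdf, grid)
    return lo, hi
```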
Predicting a New Observation
• F(t|DATA) can be approximated by the average of the posterior probabilities F(t|θi⋆), i = 1, …, M⋆.
• Similarly, f(t|DATA) can be approximated by the average of the posterior densities f(t|θi⋆), i = 1, …, M⋆.
• In particular, for the bearing-cage fracture data, approximations of the predictive pdf and cdf are

    f(t \mid \mathrm{DATA}) \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} \frac{1}{\sigma_i^\star t}\, \phi_{\mathrm{sev}}\!\left[\frac{\log(t) - \mu_i^\star}{\sigma_i^\star}\right], \qquad F(t \mid \mathrm{DATA}) \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} \Phi_{\mathrm{sev}}\!\left[\frac{\log(t) - \mu_i^\star}{\sigma_i^\star}\right].

• A 100(1 − α)% Bayesian prediction interval for a new observation is given by the corresponding percentiles of this distribution.

Predicting the 1st Order Statistic for the Bearing-Cage Fracture Data
For the bearing-cage fracture data (see the sketch at the end of this section):
• An approximation of the predictive pdf for the 1st order statistic is

    f[t_{(1)} \mid \mathrm{DATA}] \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} m \times \frac{1}{\sigma_i^\star t}\, \phi_{\mathrm{sev}}(\zeta_i^\star) \left[1 - \Phi_{\mathrm{sev}}(\zeta_i^\star)\right]^{m-1}.

• The corresponding predictive cdf is

    F[t_{(1)} \mid \mathrm{DATA}] \approx \frac{1}{M^\star} \sum_{i=1}^{M^\star} \left\{1 - \left[1 - \Phi_{\mathrm{sev}}(\zeta_i^\star)\right]^{m}\right\},

where ζi⋆ = [log(t) − µi⋆]/σi⋆.

[Figure: predictive density and prediction intervals for a future observation from the bearing-cage population, 0 to 15000 hours.]

Caution on the Use of Prior Information
• In many applications, engineers really do have useful, indisputable prior information. In such cases, the information should be integrated into the analysis.
• We must beware of the use of wishful thinking as prior information. The potential for generating seriously misleading conclusions is high.
• As with other inferential methods, when using Bayesian methods it is important to do sensitivity analyses with respect to uncertain inputs to one's model (including the specified prior information).
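As a closing illustration, here is a sketch of the predictive cdf of the first of m future failures, following the averaging formulas above (assumes posterior draws mu_star and sigma_star; the function name is ours):

```python
import numpy as np

def first_os_predictive_cdf(t, m, mu_star, sigma_star):
    """F[t_(1)|DATA]: (1/M*) sum_i {1 - [1 - Phi_sev(z_i)]^m}, the
    predictive cdf of the minimum of m future observations."""
    z = (np.log(t) - mu_star) / sigma_star
    surv = np.exp(-np.exp(z))       # 1 - Phi_sev(z), the SEV survival function
    return np.mean(1.0 - surv ** m)

# e.g. a 95% prediction interval for the first of m = 10 future failures
# comes from inverting this cdf at 0.025 and 0.975, as in the earlier sketch.
```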