Chapter 8 Maximum Likelihood for Location-Scale Based Distributions William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on the authors’ text Statistical Methods for Reliability Data, John Wiley & Sons Inc. 1998. January 13, 2014 3h 41min 8-1 Chapter 8 Maximum Likelihood for Location-Scale Based Distributions Objectives • Illustrate likelihood-based methods for parametric models based on log-location-scale distributions (especially Weibull and Lognormal). • Construct and interpret likelihood-ratio-based confidence intervals/regions for model parameters and for functions of model parameters. • Construct and interpret normal-approximation confidence intervals/regions. • Describe the advantages and pitfalls of assuming that the log-location-scale distribution shape parameter is known. 8-2 Weibull Probability Plot of the Shock Absorber Data Log Kilometers 3.8 4.0 4.2 4.4 .98 .9 Proportion Failing .5 .3 .2 .1 .05 • • .02 • • • • • • •• -2 -4 • .01 0 .005 .003 Standard quantile .7 -6 .001 .0005 5000 10000 15000 20000 25000 Kilometers 8-3 Weibull Distribution Model Likelihood for Right Censored Data • The Weibull distribution model is Pr(T ≤ t) = F (t; µ, σ) = Φsev {[log(t) − µ]/σ} . • The likelihood has the form L(µ, σ) = = = n Y i=1 n Y δi = 1 0 1−δi δ [f (ti ; µ, σ)] i [1 − F (ti ; µ, σ)] i=1 n Y i=1 ( Li (µ, σ; datai ) 1 φsev σti if if log(ti ) − µ σ δi × 1 − Φsev log(ti ) − µ σ 1−δi ti is an exact observation ti is a right censored observation φsev (z) is the standardized smallest extreme value density. 8-4 Lognormal Distribution Model Likelihood for Right Censored Data • The lognormal distribution model is Pr(T ≤ t) = F (t; µ, σ) = Φnor {[log(t) − µ]/σ} . • The likelihood has the form L(µ, σ) = = = n Y i=1 n Y δi = 1 0 δ [f (ti ; µ, σ)] i [1 − F (ti ; µ, σ)] i=1 n Y i=1 ( Li (µ, σ; datai ) 1 φnor σti if if log(ti ) − µ σ δi 1−δi × 1 − Φnor log(ti ) − µ σ 1−δi ti is an exact observation ti is a right censored observation φnor (z) is the standardized normal density. 8-5 Relative Likelihood 0 0.2 0.4 0.6 0.8 1 Weibull Relative Likelihood for the Shock Absorber Data b = 10.23 and σ b = .3164 ML Estimates: µ b log(σ b )) R(µ, log(σ)) = L(µ, log(σ))/L(µ, 10 .8 10 .6 µ 10 .4 10 .2 10 -0.6 -0.8 -1 -1.2 -1.4 -1.6 log(σ) 8-6 Weibull Relative Likelihood for the Shock Absorber Data b = 10.23 and σ b = .3164 ML Estimates: µ b σ b) R(µ, σ) = L(µ, σ)/L(µ, 0.7 0.6 0.1 0.5 0.2 0.01 0.4 0.4 0.7 0.9 σ 0.3 0.2 0.001 9.8 10.0 0.001 10.2 10.4 10.6 10.8 µ 8-7 Weibull Likelihood-Based Joint Confidence Regions for µ and σ for the Shock Absorber Data b = 10.23 and σ b = .3164 ML Estimates: h µ i R(µ, σ) > exp −χ2 /2 = 100α% (1−α;2) 0.7 0.6 95 90 0.5 60 50 0.4 70 80 99 25 σ 0.3 0.2 9.8 10.0 10.2 10.4 10.6 10.8 µ 8-8 Six-Distribution ML Probability Plot of the Shock Absorber Data Shock Absorber Data (Both Failure Modes) Probability Plots with ML Estimates and Pointwise 95% Confidence Intervals Smallest Extreme Value Probability Plot .7 .3 .1 .5 .1 .02 .02 .005 .005 .001 5000 10000 15000 20000 25000 30000 Normal Probability Plot .9 Probability Weibull Probability Plot 5000 15000 25000 Lognormal Probability Plot .8 .5 .2 .05 .6 .3 .1 .02 .002 10000 .005 .0005 5000 10000 15000 20000 25000 30000 Logistic Probability Plot .9 5000 10000 15000 25000 Loglogistic Probability Plot .8 .4 .6 .3 .1 .1 .02 .005 .001 .02 .005 5000 10000 15000 20000 25000 30000 5000 10000 15000 25000 8-9 Weibull Probability Plot of Shock Absorber Failure Times (Both Failure Modes) with Maximum Likelihood Estimates and Normal-Approximation 95% Pointwise Confidence Intervals for F (t) .9 .7 .5 .3 Fraction Failing .2 .1 .05 .03 .02 .01 .005 ^ η = 27719 .003 β^ = 3.16 .001 4000 6000 8000 10000 14000 18000 22000 28000 Kilometers 8 - 10 Lognormal Probability Plots of Shock Absorber Data with ML Estimates and Normal-Approximation 95% Pointwise Confidence Intervals for F (t). The Curved Line is the Weibull ML Estimate. .7 • Lognormal Distribution ML Fit Weibull Distribution ML Fit Lognormal Distribution 95% Pointwise Confidence Intervals .6 .5 • • Proportion Failing .4 • .3 • .2 • • • .1 • .05 • .02 • .01 5000 10000 15000 20000 25000 30000 Kilometers 8 - 11 Large-Sample Approximate Theory for Likelihood Ratios for Parameter Vector • Relative likelihood for (µ, σ) is L(µ, σ) . R(µ, σ) = b σ b) L(µ, • If evaluated at the true (µ, σ), then, asymptotically, −2 log[R(µ, σ)] follows, a chisquare distribution with 2 degrees of freedom. • General theory in the Appendix. 8 - 12 Weibull Profile Likelihood R(µ) (exp(µ) ≈ t.63) for the Shock Absorber i Data h L(µ,σ) R(µ) = max L(µ b ,σ b) σ 1.0 0.50 0.60 0.6 0.70 0.80 0.4 Confidence Level Profile Likelihood 0.8 0.90 0.2 0.95 0.99 0.0 9.8 10.0 10.2 10.4 10.6 10.8 µ 8 - 13 Weibull Profile Likelihood R(σ) (σ = 1/β) for the Shock Absorber i Data h L(µ,σ) R(σ) = max L(µ b ,σ b) µ 1.0 0.50 0.60 0.6 0.70 0.80 0.4 Confidence Level Profile Likelihood 0.8 0.90 0.2 0.95 0.99 0.0 0.2 0.3 0.4 0.5 0.6 σ 8 - 14 Large-Sample Approximate Theory for Likelihood Ratios for Parameter Vector Subset Need: Inferences on subset θ 1, from the partition θ = (θ 1, θ 2)′. • k1 = length(θ 1). • When (θ 1, θ 2)′ = (µ, σ), profile likelihood for θ 1 = µ is R(µ) = max σ " # L(µ, σ) . b b L(µ, σ ) • If evaluated at the true θ 1 = µ, then, asymptotically, −2 log[R(µ)] follows, a chisquare distribution with k1 = 1 degrees of freedom. • General theory in the Appendix. 8 - 15 Asymptotic Theory of Likelihood Ratios – Continued • An approximate 100(1 − α)% likelihood-based confidence region for θ 1 is the set of all values of θ 1 such that −2 log[R(θ 1)] < χ2 (1−α;k1) or, equivalently, the set defined by h i 2 R(θ 1) > exp −χ(1−α;k )/2 . 1 • Transformation of θ 1 will not affect the confidence statement. • Can improve the asymptotic approximation with simulation (only small effect except in very small samples). 8 - 16 Contour Plot of Weibull Relative Likelihood R(t.1, σ) for the Shock Absorber Data (Parameterized with t.1 and σ) b) R(t.1, σ) = L(t.1, σ)/L(tb.1, σ 0.01 0.6 0.5 σ 0.4 0.9 0.3 0.7 0.4 0.20.1 0.2 6000 10000 14000 18000 t.1 8 - 17 Confidence Regions and Intervals for Functions of µ and σ • Likelihood approach can be applied to functions of parameters. • Define the function of interest as one of the parameters, replacing one of the original parameters giving one-to-one reparameterization g (µ, σ) = [g1(µ, σ), g2(µ, σ)]. • Then follow previous procedure. • Simple to implement if function and its inverse are easy to compute. 8 - 18 Weibull Profile Likelihood R(t.1) for the Shock Absorber Data L(t.1,σ) R(t.1) = max b σ L(t.1,σ b) 1.0 0.50 0.60 0.6 0.70 0.80 0.4 Confidence Level Profile Likelihood 0.8 0.90 0.2 0.95 0.99 0.0 6000 8000 10000 12000 14000 16000 18000 20000 t.1 8 - 19 Weibull Profile Likelihood R[F (10000)] for the Shock Absorber Data L[F (10000),σ] max h i R [F (10000)] = σ L Fb (10000),σ b 1.0 0.50 0.60 0.6 0.70 0.80 0.4 Confidence Level Profile Likelihood 0.8 0.90 0.2 0.95 0.99 0.0 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 F(10000) 8 - 20 Asymptotic Theory of ML Estimation b denote the ML estimator of θ . Let θ • If evaluated at the true value of θ , then asymptotically, b has a MVN(θ , Σ ) and thus the Wald (large samples) θ b θ statistic h i−1 ′ b − θ) Σ b − θ) (θ (θ b θ has a chisquare distribution with k degrees of freedom, where k is the length of θ . • Here, Σb = Iθ−1 is the large sample approximate covariance θ matrix where the Fisher information matrix for θ is " Iθ = E − ∂ 2L(θ ) ∂ θ∂ θ′ # . 8 - 21 Asymptotic Theory for Wald’s Statistic • Alternative asymptotic theory is based on the large-sample distribution of quadratic forms (Wald’s statistic). • Let Σ̂b be a consistent estimator of Σb , the asymptotic θ θ b . For example, covariance matrix of θ " Σ̂b = − θ #−1 2 ∂ L(θ ) ∂ θ∂ θ′ b. where the derivatives are evaluated at θ • Asymptotically, the Wald statistic i−1 h ′ b b − θ) w(θ ) = (θ − θ ) Σ̂b (θ θ when evaluated at the true θ , follows a chisquare distribution with k degrees of freedom, where k is the length of θ. 8 - 22 Asymptotic Theory for Wald’s Statistic – Continued • An approximate 100(1 − α)% confidence region for θ is the set of all values of θ in the ellipsoid h i−1 ′ b b − θ ) ≤ χ2 (θ − θ ) Σ̂b (θ (1−α;k). θ • This is sometimes known as the normal-theory confidence region. • Can specialize to functions or subsets of θ . • Can transform to improve asymptotic approximation. Try to get a log likelihood with approximate quadratic shape. 8 - 23 Normal-Approximation Confidence Intervals for Model Parameters • Estimated variance matrix for the shock absorber data Σ̂µb,σb = " d µ) b Var( d µ, b σ b) Cov( d µ, d σ b σ b ) Var( b) Cov( # = " .01208 .00399 .00399 .00535 # b − µ)/sc • Assuming that Zµb = (µ eµb ∼ ˙ NOR(0, 1) distribution, an approximate 100(1 − α)% confidence interval for µ is [µ, where sceµb = q e d µ). b Var( b ± z(1−α/2)sc µ̃] = µ eµb b )−log(σ)]/sc • Assuming that Zlog(σb) = [log(σ elog(σb) ∼ ˙ NOR(0, 1) an approximate 100(1 − α)% confidence interval for σ is [σ , h e b /w, σ̃] = [σ i b × w] σ b and sc where w = exp z(1−α/2)sceσb /σ eσb = q d σ b ). Var( 8 - 24 Normal-Approximation Confidence Intervals for Function g1 = g1(µ, σ) b σ b ). • ML estimate gb1 = g1(µ, • Assuming Zgb1 = (gb1 −g1)/scegb1 ∼ ˙ NOR(0, 1), an approximate 100(1 − α)% confidence interval for g1 is [g1, where e g̃1] = gb1 ± z(1−α/2)scegb1 , " # 2 2 q ∂g1 ∂g1 ∂g1 ∂g1 d d gb1) = d µ d σ b bg = Var( se Cov(µ b, σ b) Var( b ) + Var( b ) + 2 1 ∂µ ∂σ ∂µ ∂σ b σ b. • Partial derivatives evaluated at µ, • General theory in the appendix. 8 - 25 Normal-Approximation Confidence Interval for F (te; µ, σ) Objective: Obtain a point estimate and a confidence interval for Pr(T ≤ te) = F (te; µ, σ) at a fixed and known point te. b = (µ, b σ b ) and Σ̂b are available. • The ML estimates θ θ • The ML estimate for F (te; µ, σ) is c) b σ b ) = Φ(ζ Fb = F (te; µ, e b σ b. where ζce = [log(te) − µ]/ • In the context of Wald’s theory, however, there are many ways to obtain a confidence interval for F (te; µ, σ). 8 - 26 Confidence Interval for F (te; µ, σ)–Continued Note: Wald’s confidence intervals depend on the parameterization used to derive the intervals. For example, an approximate 100(1−α)% confidence interval for F (te; µ, σ) can be obtained using: • The asymptotic normality of ZFb = (Fb − F )/sceFb [F , e F̃ ] = Fb (te) ± z(1−α/2)sceFb . • The asymptotic normality of Zlogit(Fb) = [logit(Fb )−logit(F )]/scelogit(F [F , e " Fb (te) , F̃ ] = b b F (te) + (1 − F (te)) × w Fb (te) Fb (te) + (1 − Fb (te))/w where w = exp{z(1−α/2)sceFb /[Fb (te)(1 − Fb (te))]}. 8 - 27 # Confidence Interval for F (te; µ, σ)—Continued Comments: • Often the confidence interval based on the asymptotic normality of ZFb has poor statistical properties caused by the slow convergence toward normality of ZFb . • The confidence interval based on the transformation Zlogit(Fb) can have better statistical properties if Zlogit(Fb) converges to normality faster than ZFb . 8 - 28 ML Estimates for Biomedical Data R Proc Reliability ML estimates (Weibull Here we display SAS and lognormal) for the DMBA and the IUD data. 8 - 29 R Proc Reliability SAS Nonparametric and Weibull ML Estimate for DMBA Data with Parametric Pointwise Approximate 95% Confidence Intervals Percent 99.9 99 95 80 70 60 50 40 30 20 10 5 2 1 .5 .2 .1 100 500 Days 8 - 30 R Proc Reliability SAS Nonparametric and Lognormal ML Estimate for DMBA Data with Parametric Pointwise Approximate 95% Confidence Intervals 99.9 99 95 Percent 90 80 70 60 50 40 30 20 10 5 2 1 .5 .2 .1 100 500 Days 8 - 31 R Proc Reliability SAS Lognormal ML Estimate for IUD Data with a set of Pointwise Approximate 95% Confidence Intervals 95 90 80 Percent 70 60 50 40 30 20 10 5 2 1 .5 .2 5 10 100 Weeks 8 - 32 R Proc Reliability SAS Weibull ML Estimate for IUD Data with a set of Pointwise Approximate 95% Confidence Intervals 95 90 80 70 60 50 40 30 Percent 20 10 5 2 1 .5 .2 5 10 100 Weeks 8 - 33 Inference when σ (or Weibull β) is Given • Simplifies problem. Only one parameter with r failures and t1, . . . , tn failures and censor times P η̂ = 1/β β n i=1 ti , r sceη̂ = η̂ β s 1 . r • Provides much more precision, especially with small r. • If 0 failures can provide ◮ Upper confidence bound on F (t). ◮ Lower confidence bound on tp. • Requires sensitivity analysis because β is in doubt. • Danger of misleading inferences. 8 - 34 Bearing-Cage Fracture Field Data • A population of n = 1703 units had been introduced into service over time and 6 failures have been observed. • There is concern that the B10 design life specification of t.1 = 8 thousand hours was not being met. • ML estimate is tb.1= 3.903 thousand hours and an approximate 95% likelihood-ratio confidence interval for t.1 is [2.093, 22.144] thousand hours. • Management also wanted to know how many additional failures could be expected in the next year. 8 - 35 Bearing-Cage Fracture Data Event Plot Bearing Cage Failure Data Count Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 288 148 124 111 106 99 110 114 119 127 123 93 47 41 27 11 6 2 0 500 1000 1500 2000 Hours 8 - 36 Weibull Probability Plots Bearing-Cage Fracture Data with Weibull ML Estimates and Sets of 95% Pointwise Confidence Intervals for F (t) with β Estimated, and Assumed Known Values of β= 1.5, 2, and 3. β = 1.5 (or σ = .667) .7 .2 Fraction Failing Fraction Failing β estimated .03 .005 .001 .0002 .00003 .03 .005 .001 .0002 .00003 200 1000 5000 200 1000 Hours Hours β = 2 (or σ = .5 ) β = 3 (or σ = .333) .7 .2 Fraction Failing Fraction Failing .7 .2 .03 .005 .001 .0002 .00003 5000 .7 .2 .03 .005 .001 .0002 .00003 200 1000 Hours 5000 200 1000 5000 Hours 8 - 37 Lognormal and Weibull Comparison Bearing-Cage Fracture Field Data Lognormal Probability Paper .6 Lognormal Distribution ML Fit Weibull Distribution ML Fit Lognormal Distribution 95% Pointwise Confidence Intervals .5 .4 .3 Proportion Failing .2 .1 .05 • .02 .01 • • .005 .002 • .001 .0005 • • .0001 .00005 200 500 1000 2000 5000 10000 Hours 8 - 38 Weibull/SEV Distribution with Given β = 1/σ and Zero Failures • ML Estimate for the Weibull Scale Parameter η Cannot be Computed Unless the Available Data Contains One or More Failures. • For a sample of n units with running times t1, . . . , tn and no failures, a conservative 100(1−α)% lower confidence bound for η is P 1 β β t 2 n η = 2 i=1 i . χ(1−α;2) e • The lower bound η can be translated into an lower confie dence bound for functions like tp for specified p or a upper confidence bound for F (te) for a specified te. 8 - 39 Component A Safe Data • A metal component in a ship’s propulsion system fails from fatigue-caused fracture. • Because of persistent reliability problems, the component was redesigned to have a longer service life. • Previous experience suggests that the Weibull shape parameter is near β = 2, and almost certainly between 1.5 and 2.5. • Newly designed components were put into service during the past year and no failures have been reported. Hours: 500 1000 1500 2000 2500 3000 Number of Units: 10 12 8 9 7 9 Staggered entry data, with no reported failures. 3500 6 4000 3 • Can replacement be increased from 2000 hours to 4000 hours? 8 - 40 Weibull Model 95% Upper Confidence Bounds on F (t) for Component-A with Different Fixed Values for the Weibull Shape Parameter β F(t) .5 1 .3 .2 2 .1 3 .05 .03 .02 4 .01 1.5 .005 5 .003 6 .001 2 β= 2.5 .0005 500 1000 2000 5000 Time 8 - 41 Relative Weibull Likelihood with One Failure at 1 and One Survivor at 2 14 12 sigma 10 0.1 8 6 0.2 4 0.3 2 0.5 0.9 0 -2 2 6 10 14 mu 8 - 42 Relative Weibull Likelihood with One Failure at 1.9 and One Survivor at 2 1.0 0.8 sigma 0.1 0.6 0.4 0.2 0.3 0.2 0.5 0.9 0.0 0.6 1.0 1.4 1.8 mu 8 - 43 Regularity Conditions • Each technical result (e.g., asymptotic distribution of an estimator) has its own set of conditions on the model (see Lehmann 1983, Rao 1973). • Frequent reference to Regularity Conditions which give rise to simple results. • For special cases the regularity conditions are easy to state and check. For example, for some location-scale distributions the needed conditions are: z 2φ2(z) lim = 0 z→−∞ Φ(z) z 2φ2(z) = 0. lim z→+∞ 1 − Φ(z) • In non-regular models, asymptotic behavior is more complicated (e.g., behavior depends on θ ), but there are still useful asymptotic results. 8 - 44 Regularity Conditions – Continued Some typical regularity conditions include: • Support does not depend on unknown parameters. • Number of parameters does not grow too fast with n. • Continuous derivatives of log likelihood (w.r.t. θ ). • Bounded derivatives of likelihood. • Can exchange the order of differentiation of log likelihood w.r.t. θ and integration w.r.t. data. • Identifiability. 8 - 45 Other Topics Related to Parametric Likelihood Covered in the Book • Truncated data (Chapter 11). • Threshold parameters (Chapter 11). • Other distributions (e.g., gamma) (Chapter 11). • Bayesian methods (Chapter 14). • Multiple failure modes (Chapter 15). 8 - 46