WARWICK MANUFACTURING GROUP
Product Excellence using 6 Sigma (PEUSS)
Section 8: Weibull analysis

THE USE OF WEIBULL IN DEFECT DATA ANALYSIS

Copyright © 2007 University of Warwick

Contents
1 Introduction
2 Data
3 The mechanics of Weibull analysis
4 Interpretation of Weibull output
5 Practical difficulties with Weibull plotting
6 Comparison with hazard plotting
7 Conclusions
8 References
9 ANNEX A – two cycle Weibull paper
10 ANNEX B – Progressive example of Weibull plotting
11 ANNEX C – Estimation of Weibull location parameter
12 ANNEX D – Example of a 3-parameter Weibull plot
13 ANNEX E – the effect of scatter
14 ANNEX F – 95% confidence limits for Weibull
15 ANNEX G – Weibull plot of multiply censored data

1 Introduction

These notes give a brief introduction to Weibull analysis and its potential contribution to equipment maintenance and lifing policies. Statistical terminology has been avoided wherever possible and those terms which are used are explained, albeit briefly.

Weibull analysis originated from a paper [1] published in 1951 by a Swedish mechanical engineer, Professor Waloddi Weibull. His original paper did little more than propose a multi-parameter distribution, but it became widely appreciated and was shown by Pratt and Whitney in 1967 to have some application to the analysis of defect data.

1.1 Information sources

The definitive statistical text on Weibull is cited at [2], and publications closer to the working level are given at [3] and [4]. A set of British Standards, BS 5760 Parts 1 to 3, covers a broad spectrum of reliability activities.
Part 1 on Reliability Programme Management was issued in 1979 but is of little value here except for its comments on the difficulties of obtaining adequate data. Part 2 [5] contains valuable guidance for the application of Weibull analysis although this may be difficult to extract. The third part of the Standard contains authentic practical examples illustrating the principles established in Parts 1 and 2. One further source of information is an I Mech E paper by Sherwin and Lees [6]. Part 1 of this paper is a good review of current Weibull theory and Part 2 provides some insight into the practical problems inherent in its use.

1.2 Application to sampled defect data

It is important to define the context in which the following Weibull analysis may be used. All that is stated subsequently is applicable to sampled defect data. This is a very different situation from that which exists on, say, the RB-211, for which Rolls Royce has a complete database. They know at any time the life distribution of all the in-service engines and their components, and their analysis can be done from knowledge of the utilisations at failure and the current utilisation for all the non-failed components. Their form of Weibull analysis is unique to this situation of total visibility.

It is assumed here, however, that most organisations are not in this fortunate position; their data will at best be of some representative sample of the failures which have occurred, and of the utilisation of unfailed units. It cannot be stressed too highly, though, that the life of unfailed units must be known if a realistic estimate of lifetimes to failure is to be made, and, therefore, data must be collected on unfailed units in the sample.
2 Data

The basic elements in defect data analysis comprise:
• a population, from which some sample is taken in the form of times to failure (here time is taken to mean any appropriate measure of utilisation),
• an analytical technique such as Weibull which is then applied to the sample of failure data to derive a mathematical model for the behaviour of the sample, and hopefully of the population also, and finally
• some deductions which are generated by an examination of the model. These deductions will influence the decisions to be made about the maintenance strategy for the population.

The most difficult part of this process is the acquisition of trustworthy data. No amount of elegance in the statistical treatment of the data will enable sound judgements to be made from invalid data.

Weibull analysis requires times to failure. This is higher quality data than knowledge of the number of failures in an interval. A failure must be a defined event and preferably objective rather than some subjectively assessed degradation in performance. A typical sample, therefore, might at its most superficial level comprise a collection of individual times to failure for the equipment under investigation.

2.1 Quality of data

The quality of data is a most difficult feature to assess and yet its importance cannot be overstated. When there is a choice between a relatively large amount of dubious data and a relatively small amount of sound data, the latter is always preferred. The quality problem has several facets:
• The data should be a statistically random sample of the population. Exactly what this means in terms of the hardware will differ in each case. Clearly the modification state of equipments may be relevant to the failures being experienced and failure data which cannot be allocated to one or other modification is likely to be misleading.
By an examination of the source of the data the user must satisfy himself that it contains no bias, or else recognise such a bias and confine the deductions accordingly. For example, data obtained from one user unit for an item experiencing failures of a nature which may be influenced by the quality of maintenance, local operating conditions/practices or any other idiosyncrasy of that unit may be used providing the conclusions drawn are suitably confined to the unit concerned.
• A less obvious data quality problem concerns the measure of utilisation to be used; it must not only be the appropriate one for the equipment as a whole, but it must also be appropriate for the major failure modes. As will be seen later, an analysis at equipment level can be totally misleading if there are several significant failure modes each exhibiting their own type of behaviour. The view of the problem at equipment level may give a misleading indication of the counter-strategies to be employed. The more meaningful deeper examination will not be possible unless the data contains mode information at the right depth and degree of integrity.
• It is necessary to know any other details which may have a bearing on the failure sensitivity of the equipment; for example the installed position of the failures which comprise the sample. There are many factors which may render elements of a sample unrepresentative including such things as misuse or incorrect diagnosis.

2.2 Quantity of data

Whereas the effects of poor quality are insidious, the effects of inadequate quantity of data are more apparent and can, in part, be countered. To see how this may be done it is necessary to examine one of the statistical characteristics used in Weibull analysis.
An equipment undergoing in-service failures will exhibit a cumulative distribution function, F(t), which is the distribution in time of the cumulative failure pattern or cumulative percent failed as a function of time, as indicated by the sample.

Consider a sample of 5 failures (sample size n = 5). The symbol i is used to indicate the failure number once the failure times are ranked in ascending order; so here i will take the integer values 1 to 5 inclusive. Suppose the 5 failure times are 2, 7, 13, 19 and 27 cycles. Now the first failure at 2 cycles may be thought to correspond to an F(t) value of i/n, where i = 1 and n = 5, i.e.

F(t) at 2 cycles = 1/5, or 0.2, or 20%

Similarly for the second failure time of 7 cycles, the corresponding F(t) is 40% and so on. On this basis, this data is suggesting that the fifth failure at 27 cycles corresponds to a cumulative percent failed of 100%. In other words, on the basis of this sample, 100% of the population will fail by 27 cycles. Clearly this is unrealistic. A further sample of 10 items may contain one or more which exceed a 27 cycle life. A much larger sample of 1000 items may well indicate that rather than correspond to a 100% cumulative failure, 27 cycles corresponds to some lesser cumulative failure of, say, 85 or 90%. This problem of small sample bias is best overcome as follows:
• Sample Size Less Than 50. A table of Median Ranks has been calculated which gives a best estimate of the F(t) value corresponding to each failure time in the sample. This table is issued with these notes. It indicates that in the example just considered, the F(t) values corresponding to the 5 ascending failure times quoted are not 20%, 40%, 60%, 80% and 100%, but are 12.9%, 31.4%, 50%, 68.6% and 87.1%. It is this latter set of F(t) values which should be plotted against the corresponding ranked failure times on a Weibull plot.
Median rank values give the best estimate for the primary Weibull parameter and are best suited to some later work on confidence limits.
• Sample Size Less Than 100. For sample sizes less than 100, in the absence of Median Rank tables the true median rank values can be adequately approximated using Bernard's Approximation:

$$F(t) = \frac{i - 0.3}{n + 0.4}$$

• Sample Sizes Greater Than 100. Above a sample size of about 100 the problem of small sample bias is insignificant and the F(t) values may be calculated from the expression for the Mean Ranks:

$$F(t) = \frac{i}{n + 1}$$

2.3 Trends in data

A trend may be in relation to any other time-base than the one being used to assess reliability. For instance, if the reliability of vehicle engines is being assessed against vehicle miles, and the failure behaviour is seen to change with the date of engine manufacture, or with calendar time, then the data contains a trend.

Before attempting to use a mathematical model such as Weibull analysis, it is important to check the homogeneity of the data. If the data indicates the presence of a time series, it is inappropriate to apply a model that requires homogeneity.

A simple approach to this would be to chart the failure data in the form of cumulative failures against cumulative time. Deviation from a straight line would indicate a trend. A mathematical approach to this would be to employ the Laplace Trend Test.

2.3.1 Laplace trend test

The Laplace Trend Test assesses the distribution of failure events with respect to the ordering indicated by the criterion being tested. The trend ordering may simply be that the failures occurred in a specific order, rather than randomly. For instance, if the date of engine manufacture is being considered as an ordering criterion, then the failure times should be put on a time-line as follows, as ordered by date of engine manufacture.
Common sense will indicate that the later manufactured engines are not lasting as long to failure as the first. Therefore, there is clear evidence of a trend. However, to put this on a statistical basis, the Laplace Trend Test is used:

Test statistic:

$$U = \sqrt{12\,N(t)} \left[ \frac{\sum t}{N(t)\,t} - 0.5 \right]$$

where
U = standardised normal deviate; if the calculated value is less than that taken from tables appropriate to the confidence required (e.g. 95%), then trend is not proven. If it is greater than the tabled value, then trend is demonstrated.
Σt = sum of failure times (on cumulative scale: see diagram above)
N(t) = total number of failures
t = total test time

Crudely, the ratio Σt / (N(t) t):
< 0.5 indicates reducing failure rate
= 0.5 indicates no trend
> 0.5 indicates increasing failure rate

2.3.2 Amendment for Failure Terminated Data

Where data has been collected, terminating collection at the point of a failure, then the Laplace Trend Test is amended as follows:

Test statistic:

$$U = \sqrt{12\,N(t_{n-1})} \left[ \frac{\sum_{i=1}^{n-1} t_i}{N(t_{n-1})\,t_n} - 0.5 \right]$$

where the last failure is not counted, whereas the final time interval is counted.

3 The mechanics of Weibull analysis

3.1 The value of analysis

On occasions, an analysis of the data reveals little that was not apparent from engineering judgement applied to the physics of the failures and an examination of the raw data. However, on other occasions, the true behaviour of equipments can be obscured even when viewed by the most experienced assessor. It is always necessary to keep a balance between deductions drawn from data analysis and those which arise from an examination of the mechanics of failure. Ideally, these should be suggesting complementary rather than conflicting counter-strategies to unreliability.
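Two of the calculations from section 2 lend themselves to a quick numerical check before going further. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, the exact median rank is obtained here by bisection on the cumulative binomial (one standard way of defining it, assumed to match the issued tables), and the failure times passed to the Laplace function in the usage note are invented for illustration.

```python
import math


def median_rank(i, n, tol=1e-12):
    """Exact median rank for failure i of n, found by bisection.

    Solves P(at least i of n items have failed) = 0.5 for the
    underlying failure probability F.
    """
    def prob_at_least_i(f):
        return sum(math.comb(n, k) * f**k * (1 - f) ** (n - k)
                   for k in range(i, n + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least_i(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bernard(i, n):
    """Bernard's approximation to the median rank (section 2.2)."""
    return (i - 0.3) / (n + 0.4)


def laplace_u(cumulative_failure_times, total_time=None):
    """Laplace trend statistic U (section 2.3.1).

    If total_time is None the data is treated as failure terminated
    (section 2.3.2): the last failure supplies the total time and is
    itself left out of the sum and the count.
    """
    times = sorted(cumulative_failure_times)
    if total_time is None:
        total_time = times[-1]
        times = times[:-1]
    n = len(times)
    if n == 0 or total_time <= 0:
        raise ValueError("need at least one counted failure and a positive total time")
    ratio = sum(times) / (n * total_time)
    return math.sqrt(12 * n) * (ratio - 0.5)


# The 5-failure example of section 2.2: exact median ranks against Bernard.
for i in range(1, 6):
    print(i, round(100 * median_rank(i, 5), 1), round(100 * bernard(i, 5), 1))
```

As a usage illustration with invented data, `laplace_u([60, 70, 80, 90, 95], total_time=100)` gives a positive U because the failures cluster late in the observation period, while evenly spaced failures such as `laplace_u([10, 20, 30, 40])` give a U of zero.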
There are many reliability characteristics of an item which may be of interest and significantly more reliability measures or parameters which can be used to describe those characteristics. Weibull will provide meaningful information on two such characteristics. First, it will give some measure of how failures are distributed with time. Second, it will indicate the hazard regime for the failures under consideration. The significance of these two measures of reliability is described later.

Weibull is a 3-parameter distribution which has the great strength of being sufficiently flexible to encompass almost all the failure distributions found in practice, and hence provides information on the 3 failure regimes normally encountered. Weibull analysis is primarily a graphical technique although it can be done analytically. The danger in the analytical approach is that it takes away the picture and replaces it with apparent precision in terms of the evaluated parameters. This is generally considered to be poor practice since it eliminates the judgement and experience of the plotter. Weibull plots are often used to provide a broad feel for the nature of the failures; this is why, to some extent, it is a nonsense to worry about errors of about 1% when using Bernard's approximation, when the process of plotting the points and fitting the best straight line will probably involve significantly larger "errors". The aim is to appreciate in broad terms how the equipment is behaving. Weibull can make such profound statements about an equipment's behaviour that ±5% may be relatively trivial.

3.2 Evaluating the Weibull parameters

The first stage of Weibull analysis once the data has been obtained is the estimation of the 3 Weibull parameters:
β: Shape parameter.
η: Scale parameter or characteristic life.
γ: Location parameter or minimum life.
The general expression for the Weibull F(t) is:

$$F(t) = 1 - \exp\left\{ -\left[ \frac{t - \gamma}{\eta} \right]^{\beta} \right\}$$

This can be transformed into:

$$\log \log \frac{1}{1 - F(t)} = \beta \log(t - \gamma) - \beta \log \eta$$

It follows that if F(t) can be plotted against t (corresponding failure times) on paper which has a reciprocal double log scale on one axis and a log scale on the other, and that data forms a straight line, then the data can be modelled by Weibull and the parameters extracted from the plot. A piece of 2 cycle Weibull paper (Chartwell Graph Data Ref C6572) is shown at Annex A and this is simply a piece of graph paper constructed such that its vertical scale is a double log reciprocal and its horizontal scale is a conventional log.

The mechanics of the plot are described progressively using the following example and the associated illustrations in plots 2 to 12 of Annex B. Consider the following times to failure for a sample of 10 items: 410, 1050, 825, 300, 660, 900, 500, 1200, 750 and 600 hours.
• Assemble the data in ascending order and tabulate it against the corresponding F(t) values for a sample size of 10, obtained from the Median Rank tables. The tabulation is shown at Section 16.
• Mark the appropriate time scale on the horizontal axis on a piece of Weibull paper (plot 2).
• Plot on the Weibull paper the ranked hours at failure (ti) on the horizontal axis against the corresponding F(t) value on the vertical axis (plot 3).
• If the points constitute a reasonable straight line then construct that line. Note that real data frequently snakes about the straight line due to scatter in the data; this is not a problem providing the snaking motion is clearly to either side of the line.
When determining the position of the line give more weight to the later points rather than the early ones; this is necessary both because of the effects of cumulation and because the Weibull paper tends to give a disproportionate emphasis to the early points which should be countered where these are at variance with the subsequent points. Do not attempt to draw more than one straight line through the data and do not construct a straight line where there is manifestly a curve. In this example the fitting of the line presents no problem (plot 4). On the matter of how much data is required for a Weibull plot, note that any 4 or so of the pieces of data used here would give an adequate straight line. In such circumstances 4 points may well be enough. Generally, 7 or so points would be a reasonable minimum, depending on their shape once plotted.
• The fact that the data produced a straight line when initially plotted enables 2 statements to be made:
  o The data can apparently be modelled by the Weibull distribution.
  o The location parameter or minimum life (γ) is approximately zero. This parameter is discussed later.
• At plot 5 a scale for the estimate of the Shape Parameter, β, is highlighted. This scale can be seen to range from 0.5 to 5, although β values outside this range are possible.
• The next step is to construct a perpendicular from the Estimation Point in the top left hand corner of the paper to the plotted line (plot 6).
• The estimated value of β, termed β̂, is given by the intersection of the constructed perpendicular and the β scale. In this example, β̂ is about 2.4 (plot 7).
• At plot 8 a dotted horizontal line is highlighted corresponding to an F(t) value of 63.2%. Now the scale parameter or characteristic life estimate is the life which corresponds to a cumulative mortality of 63.2% of the population.
Hence to determine its value it is necessary only to follow the η Estimator line horizontally until it intersects the plotted line and then read off the corresponding time on the lower scale. Plot 9 shows that, based on this sample, these components have a characteristic life of about 830 hours. By this time 63.2% of them will have failed.
• At plot 10 the evaluation of the proportion failed corresponding to the mean of the distribution of the times to failure (Pµ) is shown to be 52.7% using the point of intersection of the perpendicular and the Pµ scale. This value is inserted in the F(t) scale and its intersection with the plotted line determines the estimated mean of the distribution of the times to failure (μ̂). In this case this is about 740 hours.
• The median life can also be easily extracted; that is to say the life corresponding to 50% mortality. This is shown at plot 11 to be about 720 hours, based on this sample.
• Finally, plot 12 illustrates that this data is indicating that a 400 hour life would result in about 15% of in-service failures for these equipments. Conversely, an acceptable level of in-service failure may be converted into a life; for example it can be seen from plot 12 that an acceptable level of in-service failure of, say, 30% would correspond to a life of about 550 hours, and so on.

4 Interpretation of Weibull output

4.1 Concept of hazard

Before examining the significance of the Weibull shape parameter β it is necessary to know something of the concept of hazard and the 3 so-called failure regimes. The parameter of interest here is the hazard rate, h(t). This is the conditional probability that an equipment will fail in a given interval of unit time given that it has survived until that interval of time.
It is, therefore, the instantaneous failure rate and can in general be thought of as a measure of the probability of failure, where this probability varies with the time the item has been in service. The 3 failure regimes are defined in terms of hazard rate and not, as is a common misconception, in terms of failure rate.

The 3 regimes are often thought of in the form of the so-called 'bath-tub' curve; this is a valid concept for the behaviour of a system over its whole life but is a misleading model for the vast majority of components and, more importantly, their individual failure modes (see [5] and [7]). An individual mode is unlikely to exhibit more than one of the 3 characteristics of decreasing, constant or increasing hazard.

4.1.1 Shape parameter less than unity

A β value of less than unity indicates that the item or failure mode may be characterised by the first regime of decreasing hazard. This is sometimes termed the early failure or infant mortality period and it is a common fallacy that such failures are unavoidable. The distribution of times to failure will follow a hyper-exponential distribution in which the instantaneous probability of failure is decreasing with time in service. This hyper-exponential distribution models a concentration of failure times at each end of the time scale; many items fail early or else go on to a substantial life, whilst relatively few fail between the extremes. The extent to which β is below 1 is a measure of the severity of the early failures; 0.9 for example would be a relatively weak early failure effect, particularly if the sample size, and therefore the confidence, was low.

If there is a single or a predominant failure mode with a β<1, then clearly component lifing is inappropriate since the replacement is more likely to fail than the replaced item.
Just as importantly, a β<1 gives a powerful indication of the causes of these failures, which are classically attributed to two deficiencies. First, such failures may result from poor quality control in the manufacturing process or some other mechanism which permits the installation of low quality components. It is for this reason that burn-in programmes are the common counter-strategy to poor quality control for electronic components which would otherwise generate an unacceptably high initial in-service level of failure. The second primary cause of infant mortality is an inadequate standard of maintenance activity, and here the analysis is pointing to a lack of quality rather than quantity in the work undertaken. The circumstance classically associated with infant mortality problems is the introduction of new equipment, possibly of new design, which is unfamiliar to its operators and its maintainers. Clearly in such situations, the high initial level of unreliability should decrease with the dissemination of experience and the replacement of weakling components with those of normal standard.

The problem of infant mortality has been shown to be much more prevalent than might have been anticipated. In one particular study (Part 2 of [6]) it was found to be the dominant failure regime on a variety of mechanical components of traditional design.

4.1.2 Shape parameter equal to unity

When the shape parameter has a value of approximately one, the Weibull analysis is indicating that constant hazard conditions apply. This is the special case where the degree of hazard is not changing with time in service and such terms as failure rate, MTBF and MTTF may be used meaningfully. This is the most frequently assumed distribution because to do so simplifies the mathematical manipulation significantly and opens up the possibility of using many other reliability techniques which are based on, but rarely state, the precondition that constant hazard conditions apply.
To assume constant hazard, with its associated negative exponential distribution of times to failure, over some or all of an equipment's life must frequently produce misleading conclusions.

The term 'random failures' is often used to describe constant hazard and refers to the necessary conditions that failures be independent of each other and of time. Equipments which predominantly suffer constant hazard over their working lives should not be lifed since, by definition, the replacement has the same hazard or instantaneous probability of failure as the replaced item.

Individual failure modes with β = 1 tend to be the exception. Frequently, an equipment will appear to exhibit constant hazard because it has several failure modes of a variety of types, none of which is dominant. This summation effect is a particular characteristic of complex maintained systems comprising multiple series elements whether they are electronic, electrical, mechanical or some combination, particularly when their lives have been randomised by earlier failure replacements. The difficulty here is that the counter-strategy for the individual failure modes may well be significantly different to those suggested by constant hazard conditions for the system or equipment as a whole. There may well be, therefore, a need for a deeper analysis at mode level. Typical counter-strategies to known constant hazard conditions include de-rating, redundancy or modification.

4.1.3 Shape parameter greater than unity

If the Weibull shape parameter is greater than one the analysis is indicating that increasing hazard conditions apply. The instantaneous probability of failure is therefore increasing with time; the higher the β value, the greater is the rate of increase. This is often called the 'wearout' phase, although again this term can be misleading.
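The three regimes of sections 4.1.1 to 4.1.3 can be seen directly in the hazard function implied by the Weibull F(t), namely h(t) = (β/η)((t − γ)/η)^(β − 1). That closed form is a standard result rather than something stated in these notes, and the sketch below is illustrative only: the function name is hypothetical, η = 830 hours is borrowed from the worked example, and the β values 0.5, 1.0 and 2.4 are chosen simply to show one case from each regime.

```python
def weibull_hazard(t, beta, eta, gamma=0.0):
    """Hazard rate h(t) of a 3-parameter Weibull, valid for t > gamma."""
    if t <= gamma:
        raise ValueError("hazard is only evaluated here for t greater than gamma")
    x = (t - gamma) / eta
    return (beta / eta) * x ** (beta - 1)


# One illustrative shape parameter per regime, evaluated early and late in life.
eta = 830.0  # characteristic life in hours, taken from the worked example
for beta in (0.5, 1.0, 2.4):
    early = weibull_hazard(100.0, beta, eta)
    late = weibull_hazard(800.0, beta, eta)
    if late < early:
        regime = "decreasing hazard"
    elif late > early:
        regime = "increasing hazard"
    else:
        regime = "constant hazard"
    print(f"beta={beta}: h(100)={early:.6f}, h(800)={late:.6f} -> {regime}")
```

With β = 1 the exponent vanishes and the hazard reduces to 1/η at every age, which is why lifing brings no benefit in that regime.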
The time dependence of failures now permits sensible consideration of planned replacement providing the total cost of a failure replacement is greater than the total cost of a planned replacement. The interval for such replacements should be optimised and there is at least one general technique [8] which will do this directly from the Weibull parameters, providing the total costs are known.

Various values of β can be associated with certain distributions of times to failure and the commonest causes of such distributions. A β value of about 2 arises from a times to failure distribution which is roughly log-normal (see Figure 1).

Figure 1. Probability density function for a shape parameter of 2.

Such distributions may be attributable to a wear-out phenomenon but are classically generated by situations where failure is due to the nucleation effect of imperfections or weaknesses, such as in crack propagation. A shape parameter of about 2 is an indication, therefore, of fatigue failure. As the β value increases above 2, the shape of the pdf approaches the symmetrical normal distribution until at about β = 3.4 the pdf is approximately normal (Figure 2).

Figure 2. Probability density function for a shape parameter of 3.4.

A β value of this order indicates at least one dominant failure mode which is being caused by wear or attrition. As the β value rises still further so does the rate of wear-out. Such situations need not necessarily be viewed with alarm; if the combined analysis for the 3 Weibull parameters indicates a pdf of the form shown below, of which a very high β, say about 6 or 7, is just one element, then clearly a strategy to replace at t0 might be highly satisfactory, particularly if it is a critical component, since the evidence suggests there will be no in-service failure once that life is introduced (Figure 3).

Figure 3. Probability density function for a shape parameter of 6 to 7.
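The change in shape across Figures 1 to 3 can be put on a numerical footing through the skewness of the Weibull pdf, which depends on β alone. The sketch below is an illustration under that assumption, using the standard moment formulae in terms of the gamma function; the function name is hypothetical and the list of β values is chosen only to bracket the cases discussed above.

```python
import math


def weibull_skewness(beta):
    """Skewness of a Weibull pdf with shape parameter beta.

    Uses the raw moments E[X^k] = eta^k * Gamma(1 + k/beta); the scale
    parameter eta cancels, so it is omitted.
    """
    g1 = math.gamma(1 + 1 / beta)
    g2 = math.gamma(1 + 2 / beta)
    g3 = math.gamma(1 + 3 / beta)
    variance = g2 - g1**2
    third_central_moment = g3 - 3 * g1 * g2 + 2 * g1**3
    return third_central_moment / variance**1.5


# Positive skewness means a long right-hand tail, negative a long left-hand tail.
for beta in (1.0, 2.0, 3.4, 6.5):
    print(f"beta={beta}: skewness={weibull_skewness(beta):+.3f}")
```

A strongly positive value at β = 2 corresponds to the right-skewed shape of Figure 1, a value near zero around β = 3.4 to the near-symmetry of Figure 2, and a negative value at high β to the left-leaning shape of Figure 3.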
The initiation of increasing hazard conditions and their rate of increase may be a function of the maintenance policy adopted and the operating conditions imposed on the equipment.

4.1.4 Some general comments on β

The Weibull shape parameter provides a clear indication of which failure regime is the appropriate one for the mode under investigation and quantifies the degree of decreasing or increasing hazard. It can be used, therefore, to indicate which counter-strategies are most likely to succeed and aids interpretation of the physics of failure. It can also be used to quantify the effects of any modifications or maintenance policy changes.

Although the use of median ranks provides the best estimate of β by un-biasing the sample data, it is important to remember that the confidence which can be placed on the β estimate for any given failure mode is primarily a function of the sample size and quality of the data for that mode.

4.2 Scale parameter or characteristic life

As stated earlier, η is the value in time by which 63.2% of all failures will have occurred. In this sense, η is just one point on the time scale, providing some standard measure of the distribution of times to failure. Looking back at the example of 10 items, it was found that β = 2.4 and η = 830 hours. This information helps the construction of a picture of the appropriate pdf (see Figure 4).

Figure 4. Probability density function and characteristic life.

To say here that the characteristic life is 830 hours is to say simply that roughly two thirds of all failures will occur by that time, according to this sample. As Sherwin showed in his study [6], this is a very useful means of quantifying the effects of some change in maintenance strategy. There are, however, other measures, some of which were evaluated in the example.
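The figures read from the plots in the 10-item example can be cross-checked numerically. The sketch below is offered only as a companion to the graphical method, not a replacement for it: it assumes γ = 0, uses Bernard's approximation in place of the Median Rank tables, and fits an ordinary least-squares line with every point weighted equally, whereas the hand-drawn line gives more weight to the later points. All function names are hypothetical.

```python
import math

# Times to failure (hours) for the 10 items of the worked example.
failure_hours = [410, 1050, 825, 300, 660, 900, 500, 1200, 750, 600]


def fit_weibull_2p(times):
    """Least-squares fit of ln(-ln(1 - F)) against ln(t), with gamma assumed 0."""
    ts = sorted(times)
    n = len(ts)
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]  # Bernard
    xs = [math.log(t) for t in ts]
    ys = [math.log(-math.log(1 - f)) for f in ranks]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                        # slope of the line
    eta = math.exp(x_bar - y_bar / beta)    # from intercept = -beta * ln(eta)
    return beta, eta


def fraction_failed(t, beta, eta):
    """Weibull F(t) with gamma = 0."""
    return 1 - math.exp(-((t / eta) ** beta))


def life_at_fraction(f, beta, eta):
    """Life by which a fraction f has failed (inverse of F(t))."""
    return eta * (-math.log(1 - f)) ** (1 / beta)


def mean_life(beta, eta):
    """Mean of the distribution of times to failure."""
    return eta * math.gamma(1 + 1 / beta)


beta_fit, eta_fit = fit_weibull_2p(failure_hours)
print(f"least-squares fit: beta={beta_fit:.2f}, eta={eta_fit:.0f} h")

# Derived measures using the values read from the hand-drawn plot.
beta_plot, eta_plot = 2.4, 830.0
mu = mean_life(beta_plot, eta_plot)
print(f"mean life        : {mu:.0f} h")
print(f"F(mean)          : {100 * fraction_failed(mu, beta_plot, eta_plot):.1f} %")
print(f"median life      : {life_at_fraction(0.5, beta_plot, eta_plot):.0f} h")
print(f"F(400 h)         : {100 * fraction_failed(400, beta_plot, eta_plot):.1f} %")
print(f"life at 30 %     : {life_at_fraction(0.30, beta_plot, eta_plot):.0f} h")
```

The least-squares estimates should land in the same region as the plotted values without matching them exactly, which is consistent with the earlier remark that the precision of an analytical fit is more apparent than real.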
The mean of this log-normalish distribution for these items was found to be about 740 hours and corresponded to a percent failed of 52.7%. Figure 5 can be sketched using these estimates.

Figure 5. Probability density function and mean life.

Alternatively the median or 50% life was found to be about 720 hours (Figure 6).

Figure 6. Probability density function and median life.

Here the 3 measures of time are all doing roughly the same thing. The characteristic life, however, is taken as the standard measure of position. Its significance is strengthened by the fact that when constant hazard conditions apply, i.e. β = 1, then the η value becomes the mean time between failures (MTBF) for a repairable equipment or a mean time to failure (MTTF) for a non-repairable equipment, and is therefore the inverse of the constant hazard failure rate. This is the only circumstance in which η may be termed an MTBF/MTTF.

4.3 Location parameter or minimum life

It was briefly stated during the example that if a reasonable straight line could be fitted to the initial plot, then the value of the location parameter is approximately zero. Sometimes, however, the first plot may appear concave when viewed from the bottom right hand corner of the sheet (Figure 7).

Figure 7. Points on a curve using Weibull paper.

When this occurs it is necessary to subtract some quantity of time (γ̂) from every failure time used to plot the curve. This is best done by a method attributed to General Motors and shown in Annex C. Using this or any other suitable method, an estimate of γ, termed γ̂, can be obtained.
The estimate is enhanced by subtracting its value from every failure time and replotting the data: if γ̂ is too small the curve will remain concave but to a lesser degree than before; if γ̂ is too large the plot will become convex; and the best estimate of γ is that value which when subtracted from all the failure times gives the best straight line.

The significance of γ is that it is some value of time by which the complete distribution of times to failure is shifted, normally to the right, hence the term 'location'. In the earlier example the distribution with γ̂ = 0 is shown at Figure 4. If, however, γ̂ had taken some positive value, say 425 hours, then this value must be added to all the times to failure extracted from the subsequent analysis of the straight line, and Figure 4 would have changed to that illustrated at Figure 8.

Figure 8. Effect of location parameter.

Here two thirds of the population does not fail until 1255 hours and, most importantly, the γ value or minimum life value has shifted the time origin such that no failure is anticipated in the first 425 hours of service. The existence of a positive location parameter is therefore a highly desirable feature in any equipment and the initial plot should always be examined for a potential concave form. A further example of a 3-parameter Weibull plot is given at Annex D.

5 Practical difficulties with Weibull plotting

5.1 Scatter

The problem of scatter in the original data and the resultant snaking effect this can produce has been briefly mentioned. At Annex E, however, is a plot using 11 pieces of real data which illustrates a severe case of snaking. It is possible to plot a line and an attempt has been made in this case which gives the necessary added weight to later points.
The difficulty is obvious; it is necessary to satisfy yourself that you are seeing true snaking about a straight line caused by scatter of the points about the line and not some other phenomenon.

5.2 Extrapolation

Successful Weibull plotting relies on having historical failure data. Inaccuracies will arise if the span in time of that data is not significantly greater than the mean of the distribution of times to failure. If data obtained over an inadequate range is used as a basis for extrapolation (i.e. extending the plotted line significantly), estimates of the 3 parameters are likely to be inaccurate and may well fail to reveal characteristics of later life such as a bi-modal wear-out phenomenon. The solution is comprehensive data at the right level.

5.3 Multi-modal failures

The difficulty of multi-modal failures has been mentioned previously. In the same way that the distribution of times to failure for a single mode will be a characteristic of that mode, so the more modes there are contributing to the failure data, the more the individual characteristics of each mode are obscured; the combined effect of a number of failure modes often tends to look like constant hazard (β = 1.0). In some cases this has been found to be so even when the modes themselves have all had a high wear-out characteristic (β ≈ 3 or 4). This tendency is strongest when there are many modes, none of which is dominant. Hence knowledge of the failure regimes of the individual failure modes of an equipment is more useful in formulating a maintenance policy than that of the failure regime of the equipment itself. The solution once again is data precise enough to identify the characteristics of all the significant failure modes.

A Weibull plot using data gathered at equipment level may or may not indicate multi-modal behaviour. The most frequent manifestation of such behaviour is a convex or cranked plot as shown in Figure 9.
Figure 9. Multi-modal behaviour on Weibull paper.

The cranked plot shown above should not normally be drawn since it implies the existence of 2 failure regimes, one following the other in time. This is rarely the case; in general the bi- or multi-modal plots will be found to be mixed along both lines, because the distributions of times to failure themselves overlap. This is illustrated in Figure 10.

Figure 10. Multiple probability density functions.

One example of this bi-modal behaviour is quoted in [6]. There a vacuum pump was found to have one mode of severe infant mortality (β = 0.42) combined with another of wear-out (β = 3.2). It is most unlikely that an analysis of their combined times to failure would have suggested an adequate maintenance strategy for the item as a whole.

The convex curve also shown in Figure 9 indicates the presence of corrupt or multi-modal data. One form of corruption stems from the concept of a negative location parameter; if life is consumed in storage but the failure data under analysis uses an in-service life measured once the items are issued from store, then clearly the data is corrupt in that only a part life is being used in the analysis.

Once adequate multi-modal data has been obtained it is possible to separate the data for each mode and replot all the data in such a way as to make maximum use of every piece of life information. This approach provides more confidence than simply plotting failure data for the individual mode and is best done using an adaptation of the technique for dealing with multiply-censored data; this topic is covered later.

5.4 Confidence limits

As was pointed out earlier, most forms of analysis will give a false impression of accuracy and Weibull is no exception, particularly when the sample size is less than 50.
The limitations of the data are best recognised by the construction of suitable confidence limits on the original plot. The confidence limits normally employed are the 95% lower confidence limit (LCL) or 5% Ranks, and the 95% upper confidence limit (UCL) or 95% Ranks, although other levels of confidence can be used. With these notes are tables of LCL and UCL ranks which can be seen to be a function solely of sample size.

The technique for using these ranks consists of entering the vertical axis of the Weibull plot at the ith F(t) value quoted in the tables for the appropriate sample size. A straight horizontal line should be drawn from the point of entry to intersect the line constructed from the data. From the point of intersection, move vertically up (for a lower limit) or down (for an upper limit) until horizontal with the corresponding ith plotted point.

The technique is shown at Plot 17 in Annex F for the lower bound using the same example as in Annex B. The first value obtained from the table for a sample size of 10 is 0.5; this cannot be used since it does not intersect the plotted line. The next value is 3.6 and this is shown in Plot 17 to generate point (1) on the lower bound. The third point of entry is at 8.7 and this is shown to produce point (2), which is level with the third plotted point for the straight line, and so on.

The primary use of the lower bound curve constructed through the final set of points is that it is a visual statement of how bad this equipment might be and still give rise to the raw data observed, with 95% confidence. Hence it can be said here that although the best estimate for η is 830 hours, we can be only 95% confident, based on the data used, that the true η is greater than or equal to 615 hours. Similarly at Plot 18, which shows the construction of a 95% upper bound, we can be 95% confident that the true η is less than or equal to 1040 hours.
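
The rank values used in this construction can also be computed rather than read from tables. The following is a minimal sketch, assuming the tabulated ranks are the usual quantiles of the ith ordered failure out of n (found here by solving a cumulative binomial expression by bisection); the function name `rank_percent` is illustrative and not part of these notes.

```python
from math import comb


def rank_percent(i, n, level):
    """Rank, as a cumulative % failed, for the i-th of n ordered failures.

    Finds the per-item failure probability p for which the chance that at
    least i of the n items have failed equals `level`. level = 0.05, 0.50
    and 0.95 give the 5%, median and 95% ranks respectively.
    """
    def prob_at_least_i(p):
        return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
                   for k in range(i, n + 1))

    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the probability increases with p
        mid = (lo + hi) / 2
        if prob_at_least_i(mid) < level:
            lo = mid
        else:
            hi = mid
    return 100 * (lo + hi) / 2


# 5% and median ranks for the first three failures in a sample of 10
for i in (1, 2, 3):
    print(i, round(rank_percent(i, 10, 0.05), 2),
          round(rank_percent(i, 10, 0.50), 2))
```

For a sample size of 10 the 5% ranks produced this way should agree, to rounding, with the entry values of 0.5, 3.6 and 8.7 quoted above.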
These two statements can be combined to give symmetrical 90% confidence limits for η of between 615 and 1040 hours. This range can only be reduced by either diminishing the confidence level (and therefore increasing the risks of erroneous deduction) or by increasing the quantity of data.

5.5 Censoring of sample data

Often samples contain information on incomplete times to failure in addition to the more obviously useful consumed lives at failure. This incomplete data may arise because an item has to be withdrawn for some reason other than the failure which is being studied. If the equipment suffers multi-modal failures then, in an analysis of a particular mode, failure times attributable to all other modes become censorings. Alternatively the data collection period may end without some equipments failing, i.e. with unknown finish times.

The outcome of such situations is generally a series of complete failure times and a series of incomplete failure times or censorings for the mode under investigation. This latter information, the collection of times when the equipment did not fail for the particular reason, cannot be ignored since to do so would bias the analysis and diminish the confidence level associated with subsequent statements drawn from the plot. The assumption is generally made that the non-failures would have failed with equal probability at any time between the known failures or censored lives, or after all of them. Therefore an item removed during inspection because it is nearing unacceptable limits is closer to a failure and is not a censoring.

The mechanics of dealing with censored data require the determination of a mean order number for each failure; this may be considered as an alternative to the failure number i used previously, the primary difference being that the mean order number becomes a non-integer once the first censoring is reached. The technique is outlined using the example at Annex G.
As a first step a table is constructed with columns (a) and (b) listing in ascending order the failure and censoring times respectively. Column (c) is calculated as the survivors prior to each event in either of columns (a) or (b); where the event is a censoring the corresponding surviving number is shown in parentheses by convention. Clearly the data in the sample is multiply-censored in that it is a mixture of failure and censored times; a total of 7 failures and 9 censorings gives a sample size n of 16. Column (d) is obtained using the formula:

\[ m_i = m_{i-1} + \frac{n + 1 - m_{i-1}}{1 + k_i} \]

where
m_i = current mean order number
m_{i-1} = previous mean order number
n = total sample size for failures and censorings
k_i = number of survivors prior to the failure or censoring under consideration

Mean order number values are determined only for failures. Once the first censoring occurs at 65, all subsequent m_i values are non-integers.

The median rank values at column (e) are taken from the median rank tables using linear interpolation when necessary. For purposes of comparison only, the equivalent median ranks obtained from Benard's approximation, (i − 0.3)/(n + 0.4), are included at column (f). These are obtained by substituting m_i for i in the standard expression. They can be seen to be largely in agreement with the purer figures in column (e). Finally 5% LCL and 95% UCL figures are included at columns (g) and (h). These are obtained from the tables using linear interpolation where necessary.

The median rank figures in column (e) are plotted on Weibull paper against the corresponding failure times at column (a) in the normal way. The plot is illustrated at Plot 19 in Annex G, and produces β, η and γ estimates without difficulty. For completeness, Plot 20 shows the 5% LCL and 95% UCL curves; a 90% confidence range for η of between 90 and 148 units of time is obtained.
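
The mean order number calculation described above can be sketched in a few lines. This is an illustrative sketch only: the event list is the Annex G data as read from the table (times kept in the order tabulated), the starting value m_0 = 0 is an assumption that reproduces m = 1, 2, 3 for the first three failures, and the function name `mean_order_numbers` is hypothetical.

```python
# Annex G events in tabulated order: (time, True = failure, False = censoring)
EVENTS = [
    (31.7, True), (39.2, True), (57.5, True), (65.0, False),
    (65.8, True), (70.0, True), (75.0, False), (75.0, False),
    (87.5, False), (88.3, False), (84.2, False), (101.7, False),
    (105.8, True), (109.2, False), (110.0, True), (130.0, False),
]


def mean_order_numbers(events):
    """Return (time, k, m, approx_median_rank_%) for each failure.

    k is the number of survivors immediately before the event, m follows
    m_i = m_(i-1) + (n + 1 - m_(i-1)) / (1 + k_i), and the last value is
    the approximation (m - 0.3) / (n + 0.4) expressed as a percentage.
    """
    n = len(events)
    m_prev = 0.0  # assumed starting value; gives m = 1 for the first failure
    rows = []
    for position, (time, failed) in enumerate(events):
        k = n - position
        if failed:  # censorings only reduce k; they get no order number
            m = m_prev + (n + 1 - m_prev) / (1 + k)
            rows.append((time, k, m, 100 * (m - 0.3) / (n + 0.4)))
            m_prev = m
    return rows


for time, k, m, rank in mean_order_numbers(EVENTS):
    print(f"{time:6.1f}  k={k:2d}  m={m:5.2f}  rank={rank:5.2f}%")
```

Because the sketch carries unrounded values of m forward, its results may differ from the tabulated figures in the second decimal place.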
6 Comparison with hazard plotting

It is often thought that Weibull plots are no better than plotting techniques based on the cumulative hazard function calculated from sample data. Such methods will give estimates of the 3 Weibull parameters, and the mechanics of obtaining them are often slightly simpler than for the equivalent application of Weibull. However, cumulative hazard plots give little feel for the behaviour of the equipment in terms of the levels of risk of in-service failures for a proposed life. More importantly, such methods contain no correction for small sample bias and are therefore less suitable for use with samples smaller than 50. This limitation is compounded by the difficulty of attempting the evaluation of confidence limits on a cumulative hazard plot. Finally, cases have occurred where cumulative hazard plots have failed to indicate multi-modal behaviour which was readily apparent from a conventional Weibull plot from the same data.

7 Conclusions

The ability of the Weibull distribution to model failure situations of many types, including those where non-constant hazard conditions apply, makes it one of the most generally useful distributions for analysing failure data. The information it provides, both in terms of the modelled distribution of times to failure and the prevailing failure regime, is fundamental to the selection of a successful maintenance strategy, whether or not component lifing is an element in that strategy.

Weibull's use of median ranks helps overcome the problems inherent in small samples. The degree of risk associated with small samples can be quantified using confidence limits and this can be done for complete or multiply-censored data. Weibull plots can quantify the risks associated with a proposed lifing policy and can indicate the likely distribution of failure arisings.
In addition, they may well indicate the presence of more than one failure mode. However, Weibull is not an autonomous process for providing instant solutions; it must be used in conjunction with knowledge of the mechanics of the failures under study. The final point to be made is that Weibull, like all such techniques, relies upon data of adequate quantity and quality; this is particularly true of multi-modal failure patterns.

8 References

1. Weibull W. A statistical distribution function of wide application. ASME paper 51-A-6, Nov 1951.
2. Mann N R, Schafer R E and Singpurwalla N D. Methods for statistical analysis of reliability and life data. Wiley 1974.
3. Bompas-Smith J H. Mechanical survival - the use of reliability data. McGraw-Hill 1973.
4. Carter A D S. Mechanical reliability. Macmillan 1972.
5. British Standard 5760: Part 2: 1981. Reliability of systems, equipments and components; guide to the assessment of reliability.
6. Sherwin D J and Lees F P. An investigation of the application of failure data analysis to decision making in the maintenance of process plant. Proc Instn Mech Engrs, Vol 194, No 29, 1980.
7. Carter A D S. The bathtub curve for mechanical components - fact or fiction. Conference on Improvement of Reliability in Engineering, Instn Mech Engrs, Loughborough 1973.
8. Glasser G J. Planned replacement: some theory and its application. Journal of Quality Technology, Vol 1, No 2, April 1969.

9 ANNEX A – two cycle Weibull paper

Plot 1: Blank Weibull paper

10 ANNEX B – Progressive example of Weibull plotting

Arranging the raw data:

Failure number (i)    Ranked hours at failure (ti)    Median rank, cumulative % failed F(t)
        1                       300                              6.7
        2                       410                             16.2
        3                       500                             25.9
        4                       600                             35.5
        5                       660                             45.2
        6                       750                             54.8
        7                       825                             64.5
        8                       900                             74.1
        9                      1050                             83.8
       10                      1200                             93.3

The following plots illustrate Weibull plotting.

Plot 2: Annotating the axes
Plot 3: Plot of ranked hours at failure against F(t)
Plot 4: Construct closest fit straight line
Plot 5: Estimation of shape parameter
Plot 6: Use of estimation point
Plot 7: Estimate of shape parameter
Plot 8: ETA estimator line
Plot 9: Calculation of characteristic life
Plot 10: Calculation of mean life
Plot 11: Calculation of median life
Plot 12: Calculation of B15 life

11 ANNEX C – Estimation of Weibull location parameter

Steps:
1. Plot the data initially, observing a concave curve when viewed from the bottom right hand corner.
2. Select 2 extreme points on the vertical scale (say a and b), and determine the corresponding failure times (t1 and t3).
3. Divide the physical distance between points a and b in half without regard for the scale of the vertical axis, and so obtain point c.
4. Determine the failure time corresponding to point c (i.e. t2).
5. The estimate of the location parameter is given by:

\[ \hat{\gamma} = t_2 - \frac{(t_3 - t_2)(t_2 - t_1)}{(t_3 - t_2) - (t_2 - t_1)} \]

Figure 11. Estimation of location parameter.

12 ANNEX D – Example of a 3-parameter Weibull plot

Problem: to determine the Weibull parameters for the following (ordered) sample times to failure: 1000, 1300, 1550, 1850, 2100, 2450 and 3000 hours.

Steps:
1. Plot initially (Plot 1).
2. Having identified a concave form apply the technique at Annex C (Plot 2). Determine γ and evaluate modified times to failure.
3. Plot modified points and confirm a straight line (Plot 3).
4. Extract β and η in the normal way, remembering to add γ to the straight line value for η (Plot 4).
5. Sketch the probability density function (Plot 5).

Plotting the raw data:

Failure number (i)    Ranked hours at failure (ti)    Median rank, cumulative % failed F(t)
        1                      1000                              9.4
        2                      1300                             22.8
        3                      1550                             36.4
        4                      1850                             50.0
        5                      2100                             63.6
        6                      2450                             77.2
        7                      3000                             90.6

Plot 13: 3-parameter Weibull plot
Plot 14: Construction of gamma estimator

From Plot 14:
t1 = 810 hours
t2 = 1500 hours
t3 = 4000 hours

General expression from Annex C:

\[ \hat{\gamma} = t_2 - \frac{(t_3 - t_2)(t_2 - t_1)}{(t_3 - t_2) - (t_2 - t_1)} \]

\[ \hat{\gamma} = 1500 - \frac{(4000 - 1500)(1500 - 810)}{(4000 - 1500) - (1500 - 810)} = 1500 - 953 = 547 \text{ hours} \]

Replot using:
1000 - 547 = 453
1300 - 547 = 753
1550 - 547 = 1003
1850 - 547 = 1303
2100 - 547 = 1553
2450 - 547 = 1903
3000 - 547 = 2453

Plot 15: Adjusted plot

13 ANNEX E – the effect of scatter

Plot 16: The effect of scatter
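
The location parameter estimate of Annex C, and the replotting step of Annex D, can be checked numerically. The sketch below is illustrative only: the function name `location_estimate` is hypothetical, and the inputs are the three times substituted into the Annex D calculation (810, 1500 and 4000 hours).

```python
def location_estimate(t1, t2, t3):
    """Annex C estimate: t2 - (t3 - t2)(t2 - t1) / ((t3 - t2) - (t2 - t1))."""
    upper = t3 - t2
    lower = t2 - t1
    if upper == lower:
        # Equal spacing gives a zero denominator, so the formula
        # yields no finite estimate for these three points.
        raise ValueError("t2 - t1 equals t3 - t2; no estimate available")
    return t2 - (upper * lower) / (upper - lower)


gamma_hat = location_estimate(810, 1500, 4000)

# Modified times to failure for the replot, using the rounded estimate
raw_times = [1000, 1300, 1550, 1850, 2100, 2450, 3000]
adjusted = [t - round(gamma_hat) for t in raw_times]

print(round(gamma_hat), adjusted)
```

The rounded estimate and the adjusted times should match the 547 hours and the replot values listed in Annex D.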
14 ANNEX F – 95% confidence limits for Weibull

Plot 17: 95% lower confidence limit construction
Plot 18: 95% upper confidence limit construction

15 ANNEX G – Weibull plot of multiply censored data

(a) Failure   (b) Censoring   (c) Survivors   (d) Mean order   (e) Median   (f) Benard's   (g) 5% rank     (h) 95% rank
times ti      times ci        ki              number mi        ranks %      approxn %      lower bound     upper bound
   31.7          -               16              1                4.2          4.27           0.3             17
   39.2          -               15              2               10.2         10.37           2.2             26
   57.5          -               14              3               16.3         16.46           5.3             34
    -           65.0            (13)             -                -             -              -               -
   65.8          -               12              4.08            22.89        23.05           9.32            42.48
   70.0          -               11              5.16            29.49        29.63          13.8             49.12
    -           75.0            (10)             -                -             -              -               -
    -           75.0             (9)             -                -             -              -               -
    -           87.5             (8)             -                -             -              -               -
    -           88.3             (7)             -                -             -              -               -
    -           84.2             (6)             -                -             -              -               -
    -          101.7             (5)             -                -             -              -               -
  105.8          -                4              7.53            44.03        44.09          25.65            64.18
    -          109.2             (3)             -                -             -              -               -
  110.0          -                2             10.69            63.31        63.35          43.14            80.45
    -          130.0             (1)             -                -             -              -               -

Worked values for column (d):
m4 = 3 + (16 + 1 − 3)/(1 + 12) = 4.08
m5 = 4.08 + (16 + 1 − 4.08)/(1 + 11) = 5.16

Multiply censored data

Plot 19: Plot of data with censorings
Plot 20: Confidence limits on plot with censorings

16 Median rank tables

Median Ranks (50%)

Rank                                 Sample size
order      1      2      3      4      5      6      7      8      9     10
    1  50.00  29.29  20.63  15.91  12.94  10.91   9.43   8.30   7.41   6.70
    2         70.71  50.00  38.57  31.38  26.44  22.85  20.11  17.96  16.23
    3                79.37  61.43  50.00  42.14  36.41  32.05  28.62  25.86
    4                       84.09  68.62  57.86  50.00  44.02  39.31  35.51
    5                              87.06  73.56  63.59  55.98  50.00  45.17
    6                                     89.09  77.15  67.95  60.69  54.83
    7                                            90.57  79.89  71.38  64.49
    8                                                   91.70  82.04  74.14
    9                                                          92.59  83.77
   10                                                                 93.30

Median Ranks (50%)

Rank                                 Sample size
order     11     12     13     14     15     16     17     18     19     20
    1   6.11   5.61   5.19   4.83   4.52   4.24   4.00   3.78   3.58   3.41
    2  14.80  13.60  12.58  11.70  10.94  10.27   9.68   9.15   8.68   8.25
    3  23.58  21.67  20.04  18.65  17.43  16.37  15.42  14.58  13.83  13.15
    4  32.38  29.76  27.53  25.61  23.94  22.47  21.18  20.02  18.99  18.05
    5  41.19  37.85  35.02  32.58  30.45  28.59  26.94  25.47  24.15  22.97
    6  50.00  45.95  42.51  39.54  36.97  34.71  32.70  30.92  29.32  27.88
    7  58.81  54.05  50.00  46.51  43.48  40.82  38.47  36.37  34.49  32.80
    8  67.62  62.15  57.49  53.49  50.00  46.94  44.23  41.82  39.66  37.71
    9  76.42  70.24  64.98  60.46  56.52  53.06  50.00  47.27  44.83  42.63
   10  85.20  78.33  72.47  67.42  63.03  59.18  55.77  52.73  50.00  47.54
   11  93.89  86.40  79.96  74.39  69.55  65.29  61.53  58.18  55.17  52.46
   12         94.39  87.42  81.35  76.06  71.41  67.30  63.63  60.34  57.37
   13                94.81  88.30  82.57  77.53  73.06  69.08  65.51  62.29
   14                       95.17  89.06  83.63  78.82  74.53  70.68  67.20
   15                              95.48  89.73  84.58  79.98  75.85  72.12
   16                                     95.76  90.32  85.42  81.01  77.03
   17                                            96.00  90.85  86.17  81.95
   18                                                   96.22  91.32  86.85
   19                                                          96.42  91.75
   20                                                                 96.59

Median Ranks (50%)

Rank                                 Sample size
order     21     22     23     24     25     26     27     28     29     30
    1   3.25   3.10   2.97   2.85   2.73   2.63   2.53   2.45   2.36   2.28
    2   7.86   7.51   7.19   6.90   6.62   6.37   6.14   5.92   5.72   5.53
    3  12.53  11.97  11.46  10.99  10.55  10.15   9.78   9.44   9.11   8.81
    4  17.21  16.44  15.73  15.09  14.49  13.94  13.43  12.96  12.52  12.10
    5  21.89  20.91  20.01  19.19  18.43  17.74  17.09  16.48  15.92  15.40
    6  26.57  25.38  24.30  23.30  22.38  21.53  20.74  20.01  19.33  18.69
    7  31.26  29.86  28.58  27.41  26.32  25.32  24.40  23.54  22.74  21.99
    8  35.94  34.33  32.86  31.51  30.27  29.12  28.06  27.07  26.14  25.28
    9  40.63  38.81  37.15  35.62  34.22  32.92  31.71  30.59  29.55  28.58
   10  45.31  43.29  41.43  39.73  38.16  36.71  35.37  34.12  32.96  31.87
   11  50.00  47.76  45.72  43.84  42.11  40.51  39.03  37.65  36.37  35.17
   12  54.69  52.24  50.00  47.95  46.05  44.31  42.68  41.18  39.77  38.46
   13  59.37  56.71  54.28  52.05  50.00  48.10  46.34  44.71  43.18  41.76
   14  64.06  61.19  58.57  56.16  53.95  51.90  50.00  48.24  46.59  45.06
   15  68.74  65.67  62.85  60.27  57.89  55.69  53.66  51.76  50.00  48.35
   16  73.43  70.14  67.14  64.38  61.84  59.49  57.32  55.29  53.41  51.65
   17  78.11  74.62  71.42  68.49  65.78  63.29  60.97  58.82  56.82  54.94
   18  82.79  79.09  75.70  72.59  69.73  67.08  64.63  62.35  60.23  58.24
   19  87.47  83.56  79.99  76.70  73.68  70.88  68.29  65.88  63.63  61.54
   20  92.14  88.03  84.27  80.81  77.62  74.68  71.94  69.41  67.04  64.83
   21  96.75  92.49  88.54  84.91  81.57  78.47  75.60  72.93  70.45  68.13
   22         96.90  92.81  89.01  85.51  82.26  79.26  76.46  73.86  71.42
   23                97.03  93.10  89.45  86.06  82.91  79.99  77.26  74.72
   24                       97.15  93.38  89.85  86.57  83.52  80.67  78.01
   25                              97.27  93.63  90.22  87.04  84.08  81.31
   26                                     97.37  93.86  90.56  87.48  84.60
   27                                            97.47  94.08  90.89  87.90
   28                                                   97.55  94.28  91.19
   29                                                          97.64  94.47
   30                                                                 97.72