Dynamic Prognoser Architecture via the Path Classification and Estimation (PACE) Model

Dr. Dustin R. Garvey (dgarvey@expmicrosys.com)
Expert Microsystems, Inc.

Dr. J. Wesley Hines (jhines2@utk.edu)
University of Tennessee, Knoxville
Nuclear Engineering Department

Abstract

Most modern prognostic algorithms are founded on a simple abstraction of device degradation: for an individual device, there exists a degradation signal that progresses along a unique path until it crosses a critical failure threshold. While this abstraction has been shown to be valid for well understood failure modes under controlled stress conditions, its viability for "real world" devices exposed to "real world" stresses is questionable. Because the complexity of degradation should scale with the complexity of the device, the applicability of a simple abstraction of degradation is increasingly arguable for modern devices. This paper proposes an alternative to the current abstraction of degradation, founded on the premise that degradation data should be allowed to speak for itself. In this way, many different forms of information can be incorporated into a prognoser's estimate of a device's remaining useful life (RUL). More specifically, this paper outlines a methodology for implementing a dynamic prognoser that can be incrementally trained to learn general (physical model output, expert opinion, etc.) and specific ("real world" data) degradation trends.

This work demonstrates the viability of the proposed method by applying a particular embodiment, namely the path classification and estimation (PACE) model, to data collected from a deep-well oil exploration drill. To begin, expert opinion is used to develop a PACE prognoser. Next, data collected from individual drills is used to incrementally train the prognoser to learn specific degradation trends.

1 Introduction

With the increased interest in the development of usable prognosis systems, there is an increasing focus on the details that can help develop an accurate and adaptable system. One such issue is the need for an adaptable system that can use one set of information to initialize the prognoser and then continuously use collected data to update the system, thereby learning "real world" degradation patterns. This paper addresses this problem by using expert knowledge to initialize a path classification and estimation (PACE) prognoser and then using collected examples of oil drill failure to incrementally supplement its initial knowledge with example cases.

2 PACE Model

The general path model (GPM) (Lu & Meeker 1993) is founded on the concept that a degradation signal collected from an individual device will follow a general path until it reaches an associated failure threshold. Since its introduction, the model of degradation proposed in the GPM has been widely adopted and has resulted in many techniques that can be related to the GPM in one way or another (Lu & Meeker 1993, Upadhyaya et al. 1994, Mishra et al. 2004, Yan et al. 2004, Xu & Zhao 2005, Liao et al. 2006). From this cursory description, it can be seen that there are two fundamental assumptions of the GPM and its modern counterparts: 1) there exists a path for the degradation signal that can be parameterized via regression, machine learning, etc. and 2) there exists a failure threshold for the degradation signal that accurately predicts when a device will fail.

Given modern computational capacity, the first assumption is easily satisfied, in that many methods exist for parameterizing simple (polynomial regression, power regression, etc.) and complex (fuzzy inference systems, neural networks, etc.) relationships from data. The assumption that a threshold exists that accurately predicts device failure is not so easily reconciled. While the existence of a failure threshold has been shown to be valid for well understood degradation processes (e.g. seeded crack growth) and controlled testing environments (e.g. constant load or uniform cycling), Liao et al. (2006) observe that for "real world" applications, where the failure modes are not always well understood or can be too complex to be quantified by a single threshold, the failure boundary is "gray" at best. Wang and Coit (2007) attempt to address this problem by integrating uncertainty into the estimate of the threshold, but in the end the authors replace an estimate of the threshold with another, more conservative estimate.

For this work, instead of saying that a device has failed if its degradation signal exceeds its threshold, the approach implemented by Liao et al. (2006) is adopted, where the data is allowed to speak for itself. In other words, a device is interpreted as having failed if it in fact fails. Before continuing, it is important to note that if the failure thresholds are well established, the data can be formatted such that the instant where the signal crosses the threshold is interpreted as a failure event.

As its name suggests, the PACE model is fundamentally composed of two operations: 1) classify a current degradation path as belonging to one or more previously collected exemplar degradation paths and 2) use the resulting memberships to estimate the RUL. Hence the name: path classification (classify the path according to exemplar paths) and estimation (estimate the RUL from the results of the classification).

At this point, the PACE is described in more detail by considering a hypothetical example. To begin, consider the example degradation signals presented in Figure 1.
The degradation signals and their associated failure times are presented in the top plot. Here, the failure times can be set to be either the time that the device fails or the time at which an expert determines that the device performance has degraded enough that it has effectively failed. For this example, it can be seen that there is no clear failure threshold for the degradation signal. Notice that the paths are generalized by fitting an arbitrary function to the data via regression, machine learning, etc. Two useful pieces of information can be extracted from the degradation paths: the failure times and the "shape" of the degradation, which is described by the functional approximations. These pieces of information can be used to construct a vector of exemplar failure times and a vector of functional approximations, as follows:

\[
\mathbf{T} = \begin{bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{bmatrix},
\qquad
\mathbf{f}(t, \Theta) = \begin{bmatrix} f_1(t, \theta_1) \\ f_2(t, \theta_2) \\ f_3(t, \theta_3) \\ f_4(t, \theta_4) \end{bmatrix}
\]

Here, T_i and f_i(t, θ_i) are the failure time and functional approximation of the ith exemplar degradation signal path, θ_i are the parameters of the functional approximation of the ith exemplar degradation signal path, and Θ denotes the parameters of all of the functional approximations.

Figure 1 Example degradation signals and their associated functional approximations

To test the PACE, suppose the degradation signal of another, similar device is being monitored and an estimate of the RUL of the individual device is needed at an arbitrary time t*. Such a case is presented in Figure 2, where the degradation signal is plotted in BLACK. The query observation of the degradation signal at time t* is written as u(t*). To estimate the RUL of the device via the PACE model, the algorithm presented in Figure 3 is used. The general process for estimating the RUL is composed of three steps. First, the expected degradation signal values according to the exemplar degradation paths are estimated by evaluating the regressed functions at t*. At the same time, the expected RULs are calculated by subtracting the current time t* from the observed failure times of the exemplar paths. Second, the observed degradation signal u(t*) is classified according to the vector of expected degradation signal values U(t*). The new degradation path is assigned a membership value for each of the historical paths, which characterizes its similarity to that exemplar. Third, the vector of memberships of the observed degradation value to the exemplar degradation paths is combined with the vector of expected RULs to estimate the RUL of the individual device. The details of the PACE model can now be described in the context of the present example.

Figure 2 Illustration of an observation of a device's degradation signal at time t* relative to the functional approximations of the exemplar devices

First, the current time t* is used to estimate the expected values of the degradation signal and RULs according to the exemplar paths. In equation form, the expected values of the degradation signal according to the exemplar paths are simply the approximating functions evaluated at the current time t*.

\[
\mathbf{f}(t^*, \Theta) = \begin{bmatrix} f_1(t^*, \theta_1) \\ f_2(t^*, \theta_2) \\ f_3(t^*, \theta_3) \\ f_4(t^*, \theta_4) \end{bmatrix}
\]

The function evaluations can be interpreted as exemplars of the degradation signal at time t*. In this context, the above vector can be rewritten as follows:

\[
\mathbf{U}(t^*) = \begin{bmatrix} f_1(t^*, \theta_1) \\ f_2(t^*, \theta_2) \\ f_3(t^*, \theta_3) \\ f_4(t^*, \theta_4) \end{bmatrix}
= \begin{bmatrix} U_1(t^*) \\ U_2(t^*) \\ U_3(t^*) \\ U_4(t^*) \end{bmatrix}
\]

At the same time, the current time t* is used with the vector of failure times to calculate the expected RULs according to the exemplar degradation paths.

\[
\mathbf{L}(t^*) = \mathbf{T} - t^* = \begin{bmatrix} T_1 - t^* \\ T_2 - t^* \\ T_3 - t^* \\ T_4 - t^* \end{bmatrix}
\]

Now the currently observed degradation signal value u(t*) can be compared to the expected degradation signal values U(t*) by any one of a number of classification algorithms to obtain a vector of memberships μ_U[u(t*)]. Here, the memberships have values on [0,1] and μ_{U_i}[u(t*)] denotes the membership of u(t*) to the ith exemplar path.

\[
\boldsymbol{\mu}_U[u(t^*)] = \begin{bmatrix} \mu_{U_1}[u(t^*)] \\ \mu_{U_2}[u(t^*)] \\ \mu_{U_3}[u(t^*)] \\ \mu_{U_4}[u(t^*)] \end{bmatrix}
\]

Finally, the above memberships and the expected RULs are combined in some way, such as a simple weighted average, to estimate the current RUL of the individual device.

Figure 3 Process diagram of the PACE model for estimating the RUL

3 Dynamic Prognoser by Example

Now that the PACE has been described, let's take a look at how it can be used to implement a dynamic prognoser. More specifically, how can the PACE be structured to use expert opinion prior to data collection, and how can the prognoser itself be updated as degradation data is obtained? To begin, let's use the causal prognoser outlined in Garvey (2007) and Garvey & Hines (2007), where the cumulative vibration energy is used to infer the RUL of an oil drill's steering system. Let's begin by considering the relationship of vibration to degradation by examining the signal itself.

3.1 Vibration and Degradation

Before we can get into the details of how the PACE can be constructed to use expert opinion, we need to take a look at how vibration is related to steering system degradation. To do this, consider the chart of the lateral vibration in G's (1 G = 9.81 m/s^2) presented in Figure 4. Also included in the plot are the vibration level that separates normal and moderate stresses (ORANGE) and the vibration level that separates moderate and severe stresses (RED). For this particular steering system, a failure occurs just prior to the end of the file, after 160 elapsed hours. Notice that while the majority of the observations are in the normal stress range, there are many peaks that lie in the moderate and severe range.
What this means in terms of degradation is that as more severe vibration events occur, the degradation of the steering system increases until failure.

Figure 4 Example vibration signal (GRAY) with normal-to-moderate (ORANGE) and moderate-to-severe (RED) stress levels

For the sake of simplicity, we're going to use a simple counting procedure to infer degradation. More specifically, we're going to count the number of severe vibration events (i.e. vibration above 5 G) and then relate this to the RUL.

3.2 Initial Prognoser from Expert Knowledge

As is the case for many engineering products, often no data is available for a new design, but it is still desired to have some means for inferring the RUL of individual devices for the imposed environmental stresses. Since the PACE doesn't use a fixed architecture, but relies on the general concept of having a functional form of the degradation and an associated failure time for each example, we can train an initial prognoser on almost anything. For example, if we have a physical model, we could simulate the degradation to obtain the curves. For this work, we'll use expert knowledge to construct the original prognoser and then augment this knowledge with example degradations as they become available. To get started, consider the following rules of thumb that could possibly be obtained from engineers and operators:

1) For an environment with no moderate or severe events, the steering unit can be expected to survive 200 hours.
2) For an environment with 50 moderate or severe events, the steering unit can be expected to survive 100 hours.
3) For an environment with 200 moderate or severe events, the steering unit can be expected to survive 0 hours.

In order to use these rules in the PACE, we need a vector of functional forms of the degradation and their corresponding lifetimes. What we get are the following vectors:

\[
\mathbf{f}(t, \Theta) = \begin{bmatrix} 0 \\ 50 \\ 200 \end{bmatrix},
\qquad
\mathbf{T} = \begin{bmatrix} 200 \\ 100 \\ 0 \end{bmatrix}
\]

Notice that the functional forms of the degradation are constants, meaning that regardless of the current time, they evaluate to 0, 50, and 200 respectively. It's important to note at this point that if we had a physical model of the degradation, we could use the parameterized physical form of the equations instead of constants. The constants are only used to enable easy communication of the underlying principle.

3.3 Updating the Prognoser with Examples

Let's now suppose that we've had a steering unit failure and obtained the vibration signal presented earlier in Figure 4. In this case we've counted 646 moderate or severe vibration events and the operational time before failure was 49 hours. Notice we're using the operational time and not the elapsed time, since we want to know how the degradation affects the health of the unit while it is being used. Now that we have this example, we can include it in the prognosis model by simply appending the count and lifetime to the previous vectors to obtain the following exemplar vectors.

\[
\mathbf{f}(t, \Theta) = \begin{bmatrix} 0 \\ 50 \\ 200 \\ 646 \end{bmatrix},
\qquad
\mathbf{T} = \begin{bmatrix} 200 \\ 100 \\ 0 \\ 49 \end{bmatrix}
\]

At this point, the initial expert opinion has been supplemented with actual degradation data. In this way, we can continuously update the prognoser memory as additional data is made available. It is important to note that the method implemented in this discussion is rather simple, in that we're using a single number to quantify degradation and not a "path". The reason for the simplicity is to enable effective communication of the concept. If we were implementing a more complex system, where we had functional forms of the degradation derived from expert opinion or simulation, we would simply append the functional form of the degradation to the initial set as data is made available.
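The three PACE steps from Section 2, together with the exemplar vectors above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the text leaves the classification algorithm open, so the Gaussian membership function and its `bandwidth` parameter are assumptions made here, as are the names `PaceSketch`, `constant_path`, and `gaussian_membership`. The final combination is the simple weighted average that the text suggests.

```python
import math
from typing import Callable, List


def gaussian_membership(u: float, exemplar_value: float, bandwidth: float) -> float:
    """Membership of observation u to one exemplar value.

    ASSUMPTION: a Gaussian kernel is used here; the paper allows any classifier.
    """
    return math.exp(-0.5 * ((u - exemplar_value) / bandwidth) ** 2)


def constant_path(value: float) -> Callable[[float], float]:
    """Degradation path f(t) that is constant in time, as in the expert rules."""
    return lambda t: float(value)


class PaceSketch:
    """Hypothetical container for exemplar paths f_i and failure times T_i."""

    def __init__(self, paths: List[Callable[[float], float]],
                 failure_times: List[float], bandwidth: float) -> None:
        self.paths = list(paths)
        self.failure_times = list(failure_times)
        self.bandwidth = bandwidth  # assumed tuning parameter, not from the paper

    def add_exemplar(self, path: Callable[[float], float], failure_time: float) -> None:
        """Incremental update: append one observed path and its failure time."""
        self.paths.append(path)
        self.failure_times.append(failure_time)

    def estimate_rul(self, u: float, t: float) -> float:
        # Step 1: expected signal values U(t*) and expected RULs L(t*) = T - t*.
        expected = [f(t) for f in self.paths]
        ruls = [T - t for T in self.failure_times]
        # Step 2: memberships of the observation u(t*) to each exemplar.
        mu = [gaussian_membership(u, U, self.bandwidth) for U in expected]
        total = sum(mu)
        if total == 0.0:
            raise ValueError("observation has zero membership to every exemplar")
        # Step 3: membership-weighted average of the expected RULs.
        return sum(m * r for m, r in zip(mu, ruls)) / total


# Expert rules of thumb from Section 3.2: event counts and lifetimes in hours.
prognoser = PaceSketch(
    paths=[constant_path(c) for c in (0, 50, 200)],
    failure_times=[200, 100, 0],
    bandwidth=50.0,  # hypothetical value, chosen only for illustration
)
rul_before = prognoser.estimate_rul(u=120, t=10.0)

# Section 3.3: append the observed failure (646 events, 49 operational hours).
prognoser.add_exemplar(constant_path(646), 49)
rul_after = prognoser.estimate_rul(u=120, t=10.0)
print(rul_before, rul_after)
```

With a narrow bandwidth the estimate collapses to the expected RUL of the nearest exemplar, while a wider one blends several exemplars. Note that T_i - t* becomes negative for exemplars whose failure time has already passed; the sketch leaves those values untouched, and whether to clip them is a design choice the text does not address.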
Furthermore, since the PACE is purely data driven, if additional stress signals are identified, they can easily be integrated by simply appending another column to the matrix of functional approximations of the degradation.

3.4 Testing and Results

To test the approach, the prognoser was incrementally trained on five "real world" data sets. For each prognoser, a random sampling of 20 observations of the number of events, elapsed time, and actual RUL for the 5 sets was used as a test set. In other words, we used the samples of the number of events and elapsed time to estimate the RUL with the current prognoser. Next, the estimate of the RUL was compared to the actual RUL to assess the accuracy of the individual prognoser. This process was repeated, beginning with the prognoser trained only with expert opinion and ending with the prognoser trained on the expert opinion and the entire set of example cases.

The results are presented in Table 1. The # Examples column lists the number of "real world" examples used in the prognoser. The prognoser accuracy over the test set for each prognoser is presented in the MAE (mean absolute error) column. Notice that there is a clear downward trend in the MAE as the prognoser is incrementally trained on real degradation data. For the prognoser trained only with the expert opinion, the MAE is approximately 97 hours. After the expert opinion has been supplemented with the five examples, the MAE improves to 53 hours, a reduction of 44 hours. This represents a 45% improvement in the predictive performance of the prognoser.

Table 1 Summary of the incremental prognoser estimate accuracies

# Examples    MAE (hrs)
0             97
1             92
2             81
3             80
4             74
5             53

4 Conclusions

This paper has shown that by taking a more relaxed approach and allowing degradation data to speak for itself, we can develop a prognosis system that incrementally learns "real world" degradation mechanisms by simply retaining additional degradation examples. The example presented in this paper illustrates that incrementally training the PACE prognoser can significantly improve its performance over time: the MAE fell from 97 to 53 hours as five examples were added. Furthermore, it is important to note that while the present example was simplified for the sake of discussion, there are no technical roadblocks to implementing the described approach with more complex functional approximations of the degradation, provided the functional approximations are known or can be found.

5 References

Garvey, Dustin R. (2007), An Integrated Fuzzy Inference Based Monitoring, Diagnostic, and Prognostic System, Ph.D. Dissertation, Nuclear Engineering Department, University of Tennessee, Knoxville: 2007.

Garvey, Dustin R. and J. Wesley Hines (2007), "An Integrated Fuzzy Inference Based Monitoring, Diagnostic, and Prognostic System", Proceedings of the 61st Annual Meeting of the Machinery Failure and Prevention Technology Society, pp.59-68, Virginia Beach, VA: April 17-19, 2007.

Liao, Haitao, Wenbiao Zhao, and Huairui Guo (2006), "Predicting Remaining Useful Life of an Individual Unit Using Proportional Hazards Model and Logistic Regression Model", Proceedings of the Reliability and Maintainability Symposium (RAMS), pp.127-132: January 23-26, 2006.

Lu, C. Joseph and William Q. Meeker (1993), "Using Degradation Measures to Estimate a Time-to-Failure Distribution", Technometrics, Vol. 35, No. 2, pp.161-174: May 1993.

Mishra, S., S. Ganesan, M. Pecht and J. Xie (2004), "Life Consumption Monitoring for Electronic Prognostics", Proceedings of the IEEE Aerospace Conference, Vol. 5, pp.3455-3467: March 6-13, 2004.

Nadaraya, E.A. (1964), "On estimating regression", Theory of Probability and Its Applications, Vol. 10, pp.186-190: 1964.

Upadhyaya, Belle R., Masoud Naghedolfeizi, and B. Raychaudhuri (1994), "Residual Life Estimation of Plant Components", Periodic and Predictive Maintenance Technology, pp.22-29: June 1994.

Wang, Peng and David W. Coit (2007), "Reliability and Degradation Modeling with Random or Uncertain Failure Threshold", Proceedings of the Annual Reliability and Maintainability Symposium, Las Vegas, NV: January 28-31, 2007.

Watson, G. S. (1964), "Smooth Regression Analysis", Indian Journal of Statistics, Series A, Vol. 26, pp.359-372: 1964.

Yan, Jihong, Muammer Koc, and Jay Lee (2004), "A Prognostic Algorithm for Machine Performance Assessment and Its Applications", Production Planning & Control, Vol. 15, No. 8, pp.796-801: December 2004.

Xu, Di and Wenbiao Zhao (2005), "Reliability Prediction using Multivariate Degradation Data", Proceedings of the Annual Reliability and Maintainability Symposium, pp.337-341, Alexandria, VA: January 24-27, 2005.