Eindhoven University of Technology
Department of Industrial Engineering and Innovation Sciences — Information Systems group

A machine learning approach for data-driven maintenance with the absence of run-to-failure data

J. Koks (1395769)
Master of Science in Operations Management and Logistics
Award date: 07 July 2021

Supervisors:
Dr. Y. (Yingqian) Zhang – TU/e, first supervisor
Dr. H. (Rik) Eshuis – TU/e, second supervisor
Company supervisors known but withheld

Abstract

This thesis proposes an approach to data-driven maintenance that can be set up in the absence of run-to-failure data. A predictive, data-driven maintenance policy based on the degradation trajectory was chosen. The most appropriate method is to work with a data-driven degradation model built on a health indicator.
Because the health indicator shows that the degradation of the specific part is approximately linear, an arbitrary failure point can be set; since the degradation is constant, this point can simply be substituted once run-to-failure data is gathered. The main focus is therefore on the predictability of the process. After comparing different machine learning algorithms and finding the best settings in the data preprocessing, the Random Forest algorithm turned out to be the most accurate. Data-driven maintenance can reduce maintenance time, material costs, maintenance planning time, and overall maintenance costs, while increasing equipment uptime and availability.

Executive Summary

Problem statement and research goal

This report results from a master thesis project at a healthcare company located in Europe, in collaboration with the Eindhoven University of Technology. The project focuses on applying data-driven maintenance for a specific machine (deliberately unnamed), machine Y, which fabricates a specific product (deliberately unnamed). The company currently uses a time-based maintenance policy for its maintenance activities on this machine. Within the machine there is a specific component, component X, which is essential to its operation. There is no documentation on how many products component X can manufacture before it must be replaced. When the first machine was purchased, a life span was determined based on average production. That determination was made several years ago, and internal factors have changed since then. As a result, knowledge of what constitutes an appropriate life span for component X is lacking. Data-driven maintenance can reduce maintenance time, material costs, maintenance planning time, and overall maintenance costs, while increasing equipment uptime and availability.
The problem statement summarizes the research goal and its relevance: the conduct of maintenance activities is not optimized for effectiveness and frequency and does not use historic planned-maintenance and event data to decide when to replace component X in machine Y. An improvement on the current situation (i.e., a time-based policy) could be a different maintenance policy, such as condition-based maintenance built on the available data. Condition-based maintenance would decrease the number of maintenance moments, leading to higher production capacity and lower maintenance costs. On top of that, the findings can be applied to other similar production lines within the factories of the healthcare company. To improve the current situation, a main question was constructed that helps to investigate the possibilities of using data to optimize the maintenance policy for component X. The main research question is: How can the data currently measured within machine Y help implement a more data-driven maintenance policy for component X?

Performed activities in the project

This project uses the CRISP-DM methodology, which, in contrast to other methodologies, focuses on data-related improvements or designs. Following the steps of the CRISP-DM methodology, the important findings per phase are described below.

Business understanding + data understanding: Under the current time-based preventive maintenance policy, component X is replaced before it actually fails. The project is therefore characterized by the absence of run-to-failure data. Additionally, there is no direct measurement on the equipment (e.g., sensors measuring temperature or vibration), but an indirect relationship that reflects the degradation of component X. This indirect relationship is known but deliberately not disclosed in the public version of the thesis.

Data preparation: All the datasets are integrated, and all string variables are removed.
Data is reduced to a lower granularity to perform different tests. Based on different sliding time windows, time-domain features are created. Lastly, odd maintenance moments were corrected. Different time-domain features are compared to find the most fitting health indicator for predicting the degradation of component X. The health indicator is evaluated based on monotonicity, trendability, and correlation with the Remaining Useful Lifetime (RUL). The name of the best-performing health indicator is withheld as company-sensitive information; it is obtained when using hourly data (i.e., granularity) and a sliding time window of one hour to create the time-domain features. To solve the problem of the absence of run-to-failure data, an arbitrary lower bound is set. Because the degradation of component X indicated by the health indicator follows a more or less linear trend, this arbitrary lower bound can be substituted once run-to-failure data is gathered. Because the degradation is constant, substitution can be applied, and the main focus is on the predictability of the condition of component X.

Modeling: A comparison is made with a simpler time-series model based solely on one parameter, to investigate whether there is explainable behavior in the dataset. We compare that method with more complex machine learning predictive models. For our implementation, we selected Simple Linear Regression, XGBoost, Random Forest, 1D-CNN, and LSTM for predicting the RUL of component X and performed multiple experiments.

Evaluation: An in-depth comparison is made of each machine learning model against a simple linear regression. The best-performing algorithm is Random Forest, which has a Root Mean Squared Error (RMSE) of roughly 18 active production hours (RMSE in products known but anonymized). These results were achieved with a data granularity of hourly data and an hourly sliding time window.
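The data-preparation step above can be sketched as follows: sliding-window time-domain features computed from a signal, plus a simple monotonicity score for a candidate health indicator. This is a minimal illustration on synthetic data, not the thesis code; the column names, window size, and the particular monotonicity definition (fraction of increasing minus decreasing steps) are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly signal reflecting the degradation of component X
# (synthetic: a linear downward trend plus noise; not the thesis data).
idx = pd.date_range("2021-01-01", periods=200, freq="h")
signal = pd.Series(
    np.linspace(1.0, 0.2, 200) + np.random.default_rng(0).normal(0, 0.01, 200),
    index=idx,
)

# Time-domain features over a sliding window (here 3 hourly samples).
window = 3
features = pd.DataFrame({
    "mean": signal.rolling(window).mean(),
    "std": signal.rolling(window).std(),
    "rms": signal.pow(2).rolling(window).mean().pow(0.5),
    "peak": signal.rolling(window).max(),
}).dropna()

# One common monotonicity measure for a candidate health indicator:
# fraction of increasing steps minus fraction of decreasing steps (-1..1).
def monotonicity(hi: pd.Series) -> float:
    d = np.diff(hi.to_numpy())
    return float((np.sum(d > 0) - np.sum(d < 0)) / (len(hi) - 1))

print(f"monotonicity of rolling mean: {monotonicity(features['mean']):.2f}")
```

A feature whose monotonicity score is close to -1 or 1 tracks the one-directional wear of the component well, which is why such scores are used here to rank candidate health indicators.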
Deployment: The domain experts have indicated that they would like to receive the code of the prediction model so they can predict, one to several times a week, how long a specific machine can keep producing. They can then schedule a maintenance engineer to replace component X. When planning the maintenance, a safety margin equal to the average RMSE should be included.

Results

At the beginning of this research project, the goal of the project and the research questions were defined. The main research question is: How can the maintenance be made more data-driven to optimize the current maintenance policy for machine Y? Based on the needs of the company, the knowledge of the process, and the data gathered from machine Y, a choice was made to implement predictive data-driven maintenance based on degradation. The characteristics that made this project challenging are the absence of run-to-failure data and the indirect equipment measurement. The most appropriate method is to work with a data-driven degradation model based on a health indicator. The health indicator was evaluated based on trendability, monotonicity, and correlation with the RUL. The correlation between the degradation of the health indicator and the degradation based on the RUL confirms that the indirect measurement can serve as a health indicator for component X. To solve the problem of the absence of run-to-failure data, a relative, arbitrary lower bound is used. Because the degradation of component X indicated by the health indicator is approximately linear, this arbitrary lower bound can be substituted once run-to-failure data is gathered. Because the degradation is constant, substitution can be applied, and the main focus is on the predictability of the process. After comparing different predictive models and finding the best settings in the data preprocessing, the Random Forest algorithm proved the best in terms of accuracy.
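The modeling, evaluation, and deployment steps above can be sketched in a few lines: fit a Random Forest regressor on features derived from a health indicator, score it with RMSE, and subtract that RMSE as a safety margin when scheduling the replacement. The data, feature choices, and hyperparameters below are illustrative assumptions, not the thesis pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the thesis data: a roughly linear health indicator
# and the corresponding Remaining Useful Lifetime (RUL) in production hours.
n = 500
hi = np.linspace(1.0, 0.0, n) + rng.normal(0, 0.02, n)
X = np.column_stack([hi, np.gradient(hi)])   # HI level and its local trend
y = np.linspace(500.0, 0.0, n)               # RUL target in hours

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

# Deployment rule described in the text: subtract a safety margin equal to
# the (average) RMSE from the predicted RUL when scheduling the replacement.
predicted_rul = model.predict(X_test[:1])[0]
plan_within = max(predicted_rul - rmse, 0.0)
print(f"RMSE: {rmse:.1f} h; schedule replacement within {plan_within:.1f} h")
```

Run weekly on fresh data, such a script yields a conservative replacement deadline: the maintenance engineer is scheduled one RMSE earlier than the point prediction, which absorbs the model's average error.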
The resulting RMSE (value in products known but anonymized) means that a prediction can be made with an error of roughly 18 active production hours. Note that to fully implement the data-driven maintenance policy, run-to-failure data must still be collected to find the true point of failure and establish the lower bound.

Preface

This thesis serves as the conclusion of my Master Operations Management & Logistics, which I have followed for 2.5 to 3 years. I would like to thank the company for allowing me to apply the knowledge I have gained in the field. First, I would like to thank my mentor, Dr. Yingqian Zhang, for her guidance throughout the project. The joint meetings with other students on Wednesday mornings were always helpful for showing intermediate results and receiving feedback; following fellow students' projects was a nice touch that made me think about possible methods I could apply in my own project. Secondly, I would like to thank Ya Song for the individual sessions helping me with different questions regarding predictive maintenance, and Dr. Rik Eshuis for his feedback on the draft version of my thesis. I would also like to thank my supervisors from the company; individual acknowledgments have been removed due to company-sensitive information. Finally, I would like to thank my family, girlfriend, friends, and fellow students, who supported me and provided sufficient distraction during my internship period in social isolation.

Eindhoven, July 2021
Jelle Koks

Contents

Abstract
Executive Summary
  Problem statement and research goal
  Performed activities in the project
  Results
Preface
List of Acronyms
List of Figures
List of Tables
List of Equations
Part I – Introduction & Problem Description
1. Introduction
  1.1 Context: Company
    1.1.1 Production line and product produced
    1.1.2 A deeper understanding of Machine Y
    1.1.3 Data availability
  1.2 Problem description
    1.2.1 Problem formulation
    1.2.2 Research questions and research method per question
    1.2.3 Scope
    1.2.4 Research design
Part II – A literature overview
2. Literature Review
  2.1 Maintenance
    2.1.1 Maintenance strategies
    2.1.2 Different condition-based maintenance strategies as predictive maintenance
    2.1.3 Answering sub research question Q-1
  2.2 Process of data-driven strategy
    2.2.1 Process of prognostics
    2.2.2 Impact of missing labels
    2.2.3 Answering sub research question Q-3
  2.3 Predictive models used in data-driven maintenance
    2.3.1 Statistical approaches
    2.3.2 Machine learning approaches (AI approaches)
    2.3.3 (Deep) Transfer Learning
    2.3.4 Quality measures
    2.3.5 Answering sub research question Q-4
  2.4 Conclusion of literature review
    2.4.1 Gap in literature
Part III – Data exploration, understanding, preparation
3. Data exploration
  3.1 Descriptive statistics
    3.1.1 Product dataset
    3.1.2 Process dataset
  3.2 Production and incidents per period
  3.3 Failure rate during lifetime of Component X
  3.4 Seeking trends in data
  3.5 Analyzing maintenance moments
4. Data preparation
  4.1 Data integration
  4.2 Data cleaning
    4.2.1 Handling outliers and missing data
    4.2.2 Adjusting maintenance moments
    4.2.3 Selection of applicable production periods
  4.3 Data transformation
  4.4 Feature selection
  4.5 Data reduction
  4.6 Conclusion of data exploration
Part IV – Modeling
5. Experimental setup
  5.1 Health indicator construction
  5.2 Health stage division
  5.3 RUL prediction
6. Modeling
  6.1 Linear Regression
  6.2 Decision Tree
    6.2.1 Gradient boosted regression tree (XGBoost)
    6.2.2 Random Forest
  6.3 Convolutional neural network
  6.4 Long short-term memory
Part V – Results
7. Results
  7.1 Comparison of different prediction models for RUL
  7.2 Impact of the determination of the lower bounds
  7.3 Impact of data granularity and sliding time window
  7.4 Impact of filtering the data on production cell level
  7.5 Impact of adding process data as features
  7.6 Conclusion of experiments
8. Conclusion
  8.1 Conclusion
  8.2 Recommendations
9. Future opportunities and implementation
  9.1 Implementation on the work floor
  9.2 Experiment with run-to-failure data
  9.3 Created capacity and saved costs
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E

List of Acronyms

ANN – Artificial Neural Network
ARMA – Auto-Regressive Moving-Average
CBM – Condition-Based Maintenance
CNN – Convolutional Neural Network
CRISP-DM – Cross-Industry Standard Process for Data Mining
DT – Decision Tree
DBM – Detection-Based Maintenance
GB – Gradient Boosting
HI – Health Indicator
HS – Health Stage
LR – Linear Regression
LSTM – Long Short-Term Memory
MAPE – Mean Absolute Percentage Error
PdM – Predictive Maintenance
RF – Random Forest
RMSE – Root Mean Squared Error
RUL – Remaining Useful Lifetime
SVR – Support Vector Regression
XGBoost – eXtreme Gradient Boosting

List of Figures

Figure 1.1 Product Z
Figure 1.2 Production line of the Product Z
Figure 1.3 Pocket within EDM-unit
Figure 1.4 Clamping jaws in pocket
Figure 1.5 Electrical contact
Figure 1.6 New contact spring
Figure 1.7 Worn-out contact spring
Figure 1.8 Parameters Product Z
Figure 1.9 Structure of sub-research questions
Figure 1.10 CRISP-DM methodology
Figure 2.1 Categorization of maintenance policies (Avontuur, 2017)
Figure 2.2 Different condition-based maintenance strategies
Figure 2.3 General approach of prognostics, Abid et al. (2018)
Figure 3.1 Boxplot of each parameter for each machine
Figure 3.2 Boxplot of height difference per diameter for each pocket within VT16-machine
Figure 3.3 Zoomed-in boxplot of height difference per diameter for each pocket within VT16-machine
Figure 3.4 Failure rate during lifetime of Component X
Figure 3.5 Example of abnormal behavior
Figure 5.1 Experiment setup of RUL prediction
Figure 5.2 Trajectories of train and test set with a cut-off point
Figure 6.1 Graphical display of Random Forest
Figure 6.2 General structure of convolutional neural network as described in Chen et al. (2020)
Figure 6.3 Structure of long short-term memory network as described in Chen et al. (2020)
Figure 6.4 Time window processing
Figure 7.1 Linear regression of health index and remaining useful lifetime
Figure 7.2 Predicted RUL values versus the real RUL values, XGBoost
Figure 7.3 Predicted RUL values versus the real RUL values, Random Forest
Figure 7.4 Predicted RUL values versus the real RUL values, CNN
Figure 7.5 Predicted RUL values versus the real RUL values, LSTM
Figure 9.1 Process of implementing the prediction model
Figure 9.2 Replacements of Component X or production cell after last batched maintenance moment
Figure 9.3 Health indicator trajectory of run-to-failure data

List of Tables

Table 1.1 Standard maintenance moments for Machine Y
Table 3.1 Descriptive statistics of the product dataset
Table 3.2 Descriptive statistics of the process dataset
Table 3.3 Production per period for each machine
Table 3.4 Results of Mann-Kendall trend test for the product dataset
Table 3.5 Reviewed maintenance moments
Table 4.1 Review of production periods
Table 4.2 Time-domain features
Table 5.1 Testing monotonicity and trendability on potential HIs
Table 5.2 Impact of granularity and sliding time window
Table 7.1 Comparison of different data granularity and sliding window on Random Forest performance

List of Equations

Equation 2.1 RMSE formula
Equation 2.2 MAPE formula
Equation 3.1 Sign formula of Mann-Kendall trend test
Equation 3.2 S statistic formula of Mann-Kendall trend test
Equation 3.3 Variance formula of Mann-Kendall trend test
Equation 3.4 Normalized test statistic of Mann-Kendall trend test
Equation 5.1 Formula for monotonicity
Equation 5.2 Formula for trendability
Equation 6.1 Simple linear regression
Equation 6.2 Convolutional operation for each feature map
Equation 6.3 Pooling function for CNN
Equation 6.4 Min-max normalization formula
Equation 6.5 Forget gate
Equation 6.6 Learned information as input for input gate
Equation 6.7 Input gate
Equation 6.8 Update of the current gate
56 Equation 6.9 Output gate ............................................................................................................................ 57 Equation 6.10 Output of hidden layer......................................................................................................... 57 Equation 6.11 Standardization of data........................................................................................................ 57 xii Part I – Introduction & Problem Description 1. Introduction This report results from a master thesis project at a healthcare company located in Europe, in collaboration with the Eindhoven University of Technology. The project focuses on applying datadriven maintenance for Machine Y to produce Product Z. The company currently uses a timebased maintenance policy for its maintenance activities at Machine Y (i.e., replacing component X). Component X is an essential component within Machine Y. There is no documentation on how many products Component X can manufacture before replacing them. When the first machine was purchased, a life span was determined based on average production. The determination was already several years ago, and internal aspects are changed over time. As a result, knowledge lacks what constitutes an appropriate life span for Component X. Lately, industrial consultants at the company have gained interest in Industry 4.0. Industry 4.0 is what experts have called “The fourth industrial revolution,” the digital revolution. New technologies from Industry 4.0 integrate people, machines, and products, enabling faster and more targeted exchange of information (Rauch et al., 2020). Industry 4.0 is strongly associated with the integration between physical and digital systems of production environments. This digital revolution enables a collection of a large amount of data measured by different equipment (i.e., sensors) located in various places. 
These large amounts of data contain information about the processes in production. Analyzing the data can bring out valuable information regarding system dynamics. By applying analytics, it is possible to find interpretive results for strategic decision making, providing advantages such as cost reduction, machine fault reduction, repair stop reduction, spare parts inventory reduction, spare part life increase, increased production, improvement in operator safety, repair verification, and overall profit, among others (Peres et al., 2018; Sezer et al., 2018; Biswal and Sabareesh, 2015). Data-driven maintenance can support current maintenance policies. This master thesis project aims to improve the current maintenance policy into a data-driven maintenance policy. The initial goal was to implement an accurate prognostics model that predicts when the specific part of the machine (i.e., Component X) can be replaced before faulty products are manufactured, with all their consequences. The project helped to optimize the life span of the replaceable machine part (i.e., Component X). In this situation, an operator can be involved on a preventive basis to replace worn-out equipment based on its current condition (i.e., data-driven predictive maintenance).

1.1 Context: Company

This section presents some background information about the company's manufacturing site in Anomalized, where the thesis project was executed. The company is a leading manufacturer of healthcare technology. This thesis project has been executed at the Product Z production site in Anomalized.

1.1.1 Production line and product produced

This thesis focuses on the production of the Product Z, see Figure 1.1. Within the production line, only one unique product is produced. Thus, there are no changeovers for the machines. The Product Z line consists of several machines that are positioned behind each other. A simplified representation of the process is shown in Figure 1.2.
A deeper description of the process is given in Appendix A.

Figure 1.1 Product Z

To be able to perform data-driven maintenance, data should be available. Data is available in the case of Machine Y. Machine Y consists of several parts, but we focus on the part that has to be replaced regularly and is in direct contact with the operation of the machine, namely Component X. Machine Y is maintained regularly (i.e., preventively every two weeks). Machine Y contains a certain part of the equipment that wears out relatively quickly. This part of the equipment is called Component X. See subsection 1.1.2 for a detailed description of the function of Component X.

Figure 1.2 Production line of the Product Z

Because Machine Y performs the most complex tasks in the production line, data had already been gathered before this project started to monitor the situation accurately. For the data availability, see subsection 1.1.3.

1.1.2 A deeper understanding of Machine Y

To fully understand the function of Machine Y, the importance of Component X, and the associated collected data, a deeper understanding of the process within the machine is required. This information was gathered by visiting the production line and talking to domain experts who explained the process. Machine Y consists of four identical machines, which all behave in the same way. Each machine consists of 30 production cells. Figure 1.3 shows a production cell. A production cell has several functions within Machine Y. Those functions are transporting and centering the Product Z, clamping the Product Z during transport, and lastly, conducting the operation (anomalized). Construction-wise, a production cell is built up of different parts, each with its own function. The parts are named below:

Figure 1.3 Production cell within Machine Y

1. Clamping jaws/fixing fingers: One production cell has three clamping jaws: L1, L2, and L3. See Figure 1.4.
The clamping jaws fixate the Product Z with a clamping force, so that the same position remains during transport and the process.

Figure 1.4 Clamping jaws in production cell

2. Component X: Product Z are mounted on the clamp fingers for conducting the operation (anomalized). Figure 1.5 shows the contact between Component X and the Product Z. As said in the previous section, Component X is the equipment this thesis is focused on. If a Component X is not replaced in time, Machine Y cannot conduct the operation, which means faulty products. There is a risk that a Component X will break, leading to an unexpected interruption during production. Because Machine Y is highly automated and records a significant amount of event data, this is the part of the process where data-driven maintenance can potentially be applied.

Figure 1.5 Electrical contact

The reason why Component X was chosen is the following. It is the part that has to be replaced the most frequently, and it is the only part for which an indirect measurement is available. Section 1.1.3 elaborates on the indirect relationship between the measured parameter and the condition of Component X. Figure 1.6 and Figure 1.7 show the difference between a new Component X and a worn-out Component X. Figure 1.6 shows a new Component X. The comparison between a new and a worn-out Component X is removed due to company-sensitive information.

Figure 1.6 New Component X
Figure 1.7 Worn-out Component X

To give an insight into which relevant parameters provide meaningful insights into the condition of Component X, the available data from Machine Y is described in the following subsection.

1.1.3 Data availability

Different sources of data were available, which were used to make the maintenance more data-driven. In total, three datasets were used for this project. Firstly, a document keeps track of when maintenance was performed on top of the planned maintenance moments.
The second dataset consists of event log data which is logged within Machine Y. Furthermore, the third dataset keeps quality measurements of Product Z. To better understand the key parameters from each dataset, each dataset is treated individually. The first dataset keeps a record of when maintenance is carried out, i.e., a logbook. Normally this is every two weeks, when the Component X are replaced. To spread maintenance moments evenly over time, a fixed moment is set for each machine within Machine Y, every two weeks; see Table 1.1.

Table 1.1 Standard maintenance moments for Machine Y

Machine   Week         Moment
VT16      Even weeks   Saturday 11:00 AM
VT26      Odd weeks    Saturday 11:00 AM
VT36      Odd weeks    Wednesday 11:00 AM
VT46      Even weeks   Wednesday 11:00 AM

Occasionally, the relevant part is also replaced in between. Unplanned maintenance is recorded in this logbook. Because the logbook is tracked manually, there is some margin of error between the indicated time when maintenance is performed and the actual time. Also, the engineers in question mentioned that Saturday 11:00 is a target time they prefer to stick to, but this is not always achieved. Planned maintenance is not recorded in the logbook. A margin of error is considered while performing analysis and predictions in the later stages of this thesis. The second source contains log data regarding the parameters of the machine, i.e., multivariate time series. Different parameters can be distinguished, such as relative position based on a calibration, parameters regarding count values based on occurrences of behavior within a specified timeframe, and subsequent steps regarding the operation of the process. A more detailed overview of each parameter is given in Appendix B. Log data is recorded per product within each production cell in Machine Y. Data gathering is done automatically. Because the data is gathered automatically, there is no margin of error when recording the data.
However, there are empty values and outliers in this dataset, which should be discussed with the domain experts. Lastly, the third source contains log data of the end product, i.e., multivariate time series. Company-sensitive information is shown in Figure 1.8, and the detailed description is removed. Hence, there are three different track parameters on top of the assignment of each machine and each production cell. Every 40-60 seconds, an end product is measured. The measurements are automatically logged. The delay from production to final measurement is negligible.

Figure 1.8 Parameters Product Z

As with the second dataset, the empty values and outliers are handled closely with the domain experts. A major observation is that due to the timely maintenance policy, there is no run-to-failure data available. The failure data which is available so far is individual per production cell. The absence of run-to-failure data will be one of the challenges in finding a suitable method. As mentioned previously, a relationship was found between the degradation of the measured parameters and the condition of Component X. A more thorough exploration is made in section 3. In the following section, an in-depth problem description is given to break down the problem into several sub-questions and structure the project.

1.2 Problem description

This section defines the problem that forms the motivation for this research project. The company is investigating how data and analytics as part of Industry 4.0 (i.e., smart industries) can make the maintenance policy more data-driven. First, the problem is introduced and defined. Subsequently, the sub-problems and associated research questions are formulated.

1.2.1 Problem formulation

The company currently uses planned maintenance manuals that describe all the required planned maintenance activities for each machine, and this is handled through a time-based maintenance policy.
There is, for example, no documentation of how many products Component X can manufacture. When the first machine was purchased, a life span was determined based on average production. This determination was made several years ago, and some things have changed regarding the supplier of Component X, the material of Product Z, and other influencing factors. As a result, knowledge is lacking as to what constitutes an appropriate life cycle for Component X and whether it is use-based or condition-based. As part of the smart-industry initiative, the company has decided to do more with big data. Data-driven maintenance can reduce maintenance time, material costs, maintenance planning time, and maintenance costs. On the other hand, it can increase equipment uptime and availability. The problem statement summarizes the goal of the research and its relevance:

Conducting maintenance activities is not optimized for effectiveness and frequency and does not use historic planned maintenance and event data to replace the Component X in Machine Y.

The company's goal for this project is to investigate whether data-driven maintenance can be applied to Machine Y. An improvement on the current situation (i.e., time-based policy) could be a different maintenance policy, such as condition-based maintenance based on the data available. Condition-based maintenance will decrease the number of maintenance moments, which will lead to a higher production capacity and lower maintenance costs. On top of that, the findings can be applied to other similar production lines.

1.2.2 Research questions and research method per question

The main question was constructed to structure the project; the answer to this question provides the necessary insights. The research question of this project is formulated as follows:

How can the data currently measured within the Machine Y unit help implement a more data-driven maintenance policy for Component X?
Several sub-questions were drafted to give more structure to the research and to answer this rather broad research question. These sub-questions are divided into groups, as shown in Figure 1.9. Q-1, Q-3, and Q-4 help to find meaningful insights applicable to the case's current situation. Q-2 describes the current situation and desired outputs with all their characteristics. Q-5 helps to structure the experimental setup, and Q-6 supports the comparison to a baseline and to results found in the domain of data-driven maintenance.

Figure 1.9 Structure of sub-research questions

Q-1: Which maintenance policies are there, and what are the characteristics of each policy?

This sub-question was created to make the spectrum of maintenance coherent. It serves as an introduction, and the first goal of the literature review is to know the characteristics of all possible maintenance policies. A literature review is used as the research method to answer this sub-question. It supports the description of the current situation and the desired output in the second sub-question. A detailed answer to this sub-question is given in section 2.

Q-2: What are the characteristics of the current situation?

To answer this question, the findings of the literature review are used. Also, the production factory was visited to get an overview of the production line. On top of that, unstructured interviews were conducted with domain experts. It can be concluded that the lack of run-to-failure data is the biggest bottleneck for figuring out an effective maintenance policy. On top of that, there is no direct measurement on the equipment (e.g., temperature, vibrations) but an indirect relationship that reflects Component X's degradation. Those two aspects were considered while finding an applicable policy. The current maintenance policy is a preventive maintenance policy conducted every two weeks.
Q-3: Based on the available data, what type of data-driven strategy is best and realizable for the situation of Component X?

Based on the conducted literature review and an assessment of the current situation through a factory visit and unstructured interviews, two constraints can be identified. Firstly, no run-to-failure data is available. Secondly, the sensor data is not measured directly on the maintained equipment. The most appropriate method is to work with a data-driven degradation model based on a Health Indicator (HI). An indirect relationship has been found between product-related parameters and the condition of Component X. Those product-related parameters are used as a health indicator. The health indicator was evaluated on trendability, monotonicity, and correlation with the RUL. To solve the problem of the absence of run-to-failure data, an arbitrary lower bound is used. Because the degradation of Component X indicated by the health indicator follows a linear trend, we can substitute this arbitrary lower bound once sufficient run-to-failure data is gathered. The model must be a prognostic prediction model to predict a failing point.

Q-4: Which prediction models can be applied for data-driven maintenance?

Based on the literature review, several predictive models from the machine learning domain can be applied to create a degradation-based model. A comparison is made with simpler time-series methods based solely on one parameter to investigate if there is explainable behavior in the dataset. We compare that method with more complex machine learning predictive models. All the used models are explained in detail in section 2.3. This choice was made because these predictive models are often used for RUL predictions. Linear regression was chosen as a baseline because it is the easiest choice to implement. The decision tree methods (Random Forest and XGBoost) were chosen because of their ability to retrace the important parameters.
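The degradation-plus-lower-bound idea described under Q-3 can be sketched as follows: fit a straight line through the health-indicator observations and extrapolate to the (arbitrary) failure threshold. This is a minimal illustration, not the thesis implementation; the function name, variable names, and numbers are all hypothetical.

```python
def estimate_rul_linear(hours, hi, failure_threshold):
    """Fit a straight line through health-indicator (HI) observations and
    extrapolate to an (arbitrary) failure threshold; returns the estimated
    remaining useful life after the last observation."""
    n = len(hours)
    mx = sum(hours) / n
    my = sum(hi) / n
    # least-squares slope and intercept of the degradation trend
    slope = sum((x - mx) * (y - my) for x, y in zip(hours, hi)) \
        / sum((x - mx) ** 2 for x in hours)
    intercept = my - slope * mx
    if slope >= 0:  # no downward degradation trend detected
        return None
    crossing = (failure_threshold - intercept) / slope
    return max(crossing - hours[-1], 0.0)

# Toy data: the HI degrades linearly from 1.0 at 0.01 per hour, so the
# threshold of 0.2 would be crossed at t = 80, giving a RUL of 50 hours.
print(estimate_rul_linear([0, 10, 20, 30], [1.0, 0.9, 0.8, 0.7], 0.2))  # → 50.0
```

Because the trend is assumed linear, the threshold value only shifts the predicted crossing time; once real run-to-failure data exists, the arbitrary bound can be replaced without changing the fitting procedure.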
Furthermore, the more complex neural networks were chosen to see if more complex models perform better. For our implementation, we selected Simple Linear Regression, XGBoost, Random Forest, 1D-CNN, and LSTM for predicting the Remaining Useful Lifetime (RUL) of Component X.

Q-5: How can we optimize each predictive model in terms of accuracy?

To ensure that each predictive model performs to its fullest potential, section 6 provides a narrative describing how each predictive model operates. Subsequently, the input data is set up to suit its respective model. Five different experiments have been conducted to find the best input data and settings for each prediction model. The search for the optimal dataset (e.g., granularity, feature selection) is described in sections 6 and 7. If applicable, for each model, hyperparameter optimization based on a grid search is performed, which can be read in section 6.

Q-6: What is the performance of the best-performing prediction model based on accuracy?

An in-depth comparison is made between each machine learning model and intuitive time-series methods. In section 7, this comparison is made. The best-performing predictive model is Random Forest, which has an RMSE of roughly 18 active production hours (RMSE in products known but anomalized). For achieving those results, hourly data granularity is used, together with a sliding time window of one hour.

1.2.3 Scope

Several decisions have been made to adjust the magnitude of this research project. An overview of these decisions is given in this section.

1.2.3.1 Restrictions on data collection

As indicated in a previous section, there are no direct sensors associated with the condition of the maintained equipment of Machine Y. As a result, less accurate results are obtained compared to what would be possible with sensor data directly on the piece of equipment.
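To illustrate the sliding-time-window setup mentioned under Q-6, the sketch below turns a univariate hourly series into overlapping feature rows, each of which a regressor such as Random Forest could map to a RUL label. The series values and window length are illustrative, not taken from the company data.

```python
def sliding_windows(series, window):
    """Turn a univariate hourly series into overlapping rows of
    `window` consecutive observations (one row per prediction point)."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]

# Illustrative hourly health-indicator values (not company data).
hourly_hi = [0.95, 0.93, 0.90, 0.88, 0.85]
X = sliding_windows(hourly_hi, 3)
print(X)  # → [[0.95, 0.93, 0.9], [0.93, 0.9, 0.88], [0.9, 0.88, 0.85]]
# Each row would be paired with a RUL label at that hour and fed to a
# regressor such as Random Forest.
```

Both the aggregation granularity (here: hourly) and the window length are tuning knobs; the thesis compares several such settings before settling on an hourly window.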
This drawback has been discussed with the stakeholders. However, there is no option to collect additional data with additional sensors.

1.2.3.2 Implementation of predictive model

In order to maximize the applicability of this research, the choice was made to work with limited company data instead of simulation data to build a predictive model. Using company data causes the predictive model to be less accurate than if it were built with more usable simulation data. A prototype in the form of a proof-of-concept has been delivered. This solution is based on historical data for the year XXXX. Ensuring that the solution is implemented in real-time in the working environment with possible visualization is not part of the project definition.

1.2.4 Research design

This project aims to deliver a proof of concept for data-driven maintenance based on the current data available within the production process of the Product Z in Drachten. For this project, we will use the CRISP-DM methodology, also shown in Figure 1.10. CRISP-DM is the Cross-Industry Standard Process for Data Mining, widely used by industry members (Olson & Delen, 2008). In contrast to other methodologies, CRISP-DM focuses on data-related improvements or designs. Briefly, each phase of the CRISP-DM model is described with its core tasks and relevant sub-questions. The CRISP-DM model is used as the structure of the thesis from this point forward.

Figure 1.10 CRISP-DM methodology

In the business understanding phase, the current situation of the process is described with the context and desired goals for this data-driven maintenance. This phase is mainly described in section 1. The data understanding phase is closely related to the business understanding phase. It was necessary to understand which data is available and what it means. Also, data exploration is done.
Through new information from the data understanding, the perception of the business understanding can change and vice versa. It was important that the understanding of the data is verified with domain experts. The data understanding phase is described in sections 1 and 3. Q-2 can be answered after the business understanding and data understanding phases. In parallel with the business understanding phase and data understanding phase, a literature review is conducted. During this literature review, research questions Q-1, Q-3, and Q-4 are answered. The summary of the literature review is shown in section 2. The three different datasets are integrated during the data preparation phase, and missing values and outliers were handled. Nominal data is converted to numerical data. Moreover, all parameters which were not relevant were removed; the most useful parameters were chosen as input to the modeling phase. The data preparation stage is described in section 4. Subsequently, the modeling phase is described through different experiments. Different predictive models and settings within those predictive models are compared with a traditional regression method. The modeling phase is described in sections 6 and 7. During this phase, sub-question Q-5 is answered. Next, the evaluation phase follows. The results of the different predictive models are compared across different situations. The evaluation phase is shown in section 7. The last sub-question, Q-6, is answered in that section. The last phase is the deployment of the predictive model. However, this thesis project ends when the proof-of-concept is delivered. Making the predictions workable in a real-time application is out of the scope described in section 3.4.

Part II – A literature overview

2. Literature Review

To explore different maintenance policies with their characteristics, a literature review was conducted. The goal was to find the most applicable one for the company's needs.
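The preparation steps just described (imputing missing values and converting nominal data to numerical data) can be sketched as below. The record layout, key names, and category list are hypothetical, chosen only for illustration.

```python
def prepare(records, numeric_key, category_key, categories):
    """Impute missing numeric values with the mean and one-hot encode a
    nominal attribute; returns one purely numeric feature row per record."""
    known = [r[numeric_key] for r in records if r[numeric_key] is not None]
    mean = sum(known) / len(known)
    rows = []
    for r in records:
        # mean imputation for missing values
        value = r[numeric_key] if r[numeric_key] is not None else mean
        # one-hot encoding of the nominal attribute
        onehot = [1 if r[category_key] == c else 0 for c in categories]
        rows.append([value] + onehot)
    return rows

records = [
    {"position": 0.40, "machine": "VT16"},
    {"position": None, "machine": "VT26"},   # missing value to impute
    {"position": 0.60, "machine": "VT16"},
]
print(prepare(records, "position", "machine", ["VT16", "VT26"]))
# → [[0.4, 1, 0], [0.5, 0, 1], [0.6, 1, 0]]
```

Outlier handling is deliberately omitted here, since the thesis resolves outliers case by case with domain experts rather than by a fixed rule.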
Also, several predictive models are discussed, which can be applied within the maintenance strategies considering the absence of run-to-failure data. During this literature review, several sub-questions are addressed, which are listed in this section.

2.1 Maintenance

In this section, Q-1 is central. To answer the question "Which maintenance strategies are there, and what are the characteristics of each policy?", an in-depth analysis is made of which maintenance strategies exist and which is the most applicable for the current and the desired situation. Maintenance may be defined as the actions necessary for retaining or restoring a piece of equipment, machine, or system to the specified operable condition to achieve its maximum useful life. More specifically, maintenance includes repairing broken equipment, preserving equipment conditions, and preventing failure, which ultimately reduces production losses and downtime and reduces environmental and associated safety hazards (Adebimpe et al., 2015). In the past, maintenance was seen as a necessary evil to keep a company's core business running. Nowadays, the role of maintenance has changed into a strategically important part of the company's business. New requirements are set up regarding quality, automation, performance, environment, safety, and agility, according to Duffuaa & Raouf (2015). Because of this new maintenance role, new maintenance strategies and maintenance models were established to gain competitive advantages.

2.1.1 Maintenance strategies

Maintenance can roughly be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM). Corrective Maintenance: Under a breakdown corrective maintenance policy, a part is not replaced until it has failed (Arts, 2014). The corrective maintenance policy is the most expensive because the failed equipment can damage other parts of the machine or products.
When a repair is conducted, the whole production must be stopped, which will lead to downtime in production. Costs involved can be high machine downtime, low production availability, and other equipment costs. Corrective maintenance is an attractive option for components that do not wear, such as electronics. Mobley (2013) has shown that corrective maintenance is about three times more expensive than the same task conducted in a preventive mode. Preventive Maintenance: also called planned maintenance; the aim is to replace parts before failure occurs (Arts, 2014). The goal of preventive maintenance is to reduce the failure rate of an item. It improves product performance and reduces or even minimizes machine downtime and failure costs (Shagluf et al., 2014). The general principle of preventive maintenance is that the risk of component failure can increase over time (Peng, 2014). Based on the work of Avontuur (2017), different preventive maintenance policies can be indicated, as shown in Figure 2.1.

Figure 2.1 Categorization of maintenance policies (Avontuur, 2017)

Avontuur (2016) divided preventive maintenance into the categories time-based maintenance, usage-based maintenance, detection-based maintenance, and condition-based maintenance. Each preventive maintenance policy is briefly discussed.
- Time-based maintenance: maintenance actions are performed after an amount of time. The parameter "time" can either be a reference to the time a component has been in use (i.e., time in production) or calendar time (i.e., real time).
- Usage-based maintenance: almost similar to the previous policy, but instead of time, a parameter that expresses the real usage of a product is used.
- Detection-based maintenance: Under Detection-Based Maintenance (DBM), manual inspections through human senses (sight, hearing, smell, or touch) are performed to detect whether a maintenance action is required (Waeyenbergh, 2006).
This maintenance policy can be continuous or with fixed periods.
- Condition-based maintenance (CBM): a decision-making policy based on a real-time diagnosis of impending failures and a prognosis of future equipment health. A modeling approach is used to predict the moment of failure and make a timely decision on when to perform the appropriate preventive maintenance actions (Veldman et al., 2011). Predefined control-level thresholds set on measured parameters define when preventive maintenance should be performed to avoid costly downtime (Zhu, Peng, & van Houtum, 2014). This can be continuous or with fixed periods.

Condition-based maintenance (CBM) is the most desirable for the case of the company. Data is automatically logged in a database related to producing a Product Z and measuring the final product itself. Predictive maintenance can be framed as predictive condition-based maintenance based on the system's condition and that of its components. The condition of a system is quantified by obtaining data from various sensors in the system periodically or even continually. CBM attempts to avoid unnecessary maintenance tasks by taking maintenance actions only when there is evidence of abnormal behavior of a physical asset. It is a proactive process that requires developing a predictive model that can trigger an alarm for corresponding maintenance (Peng et al., 2009). In the next subsection, different condition-based maintenance strategies are discussed to implement fault prognostics (i.e., predictive maintenance).

2.1.2 Different condition-based maintenance strategies as predictive maintenance

Condition-based maintenance can be divided into two different sorts of strategies. Diagnostics and prognostics are two important aspects of a predictive maintenance program. Diagnostics deals with fault detection, isolation, and identification when an abnormality occurs. Prognostics deals with fault and degradation prediction before they occur.
Prognostic algorithms for predictive maintenance have only recently been introduced into the technical literature and have received much attention in maintenance research and development (Peng et al., 2009). The prognosis part is important for this thesis because the company benefits from a situation where machine malfunctions are predicted. Additionally, no failure data is currently available when the data is filtered on its respective machine. The absence of run-to-failure data means that diagnostics cannot be applied. Also, a diagnostics algorithm has little added value: if a Component X breaks, a production cell within Machine Y can no longer produce, and Machine Y itself already detects a non-functional production cell. Diagnostics detects the presence of an incipient failure or the point of transition from the normal state to a degraded state. Anomaly detection can be done statistically (e.g., detecting outliers based on a simple threshold or a dynamic threshold). Recently, machine learning algorithms have also been used for diagnostic purposes. Jimenez et al. (2020) divide unsupervised anomaly detection algorithms into different main groups: nearest-neighbor-based techniques, clustering-based techniques, statistical algorithms, subspace techniques, unsupervised neural networks, and real-time and time-series analysis-based algorithms. However, for the company, diagnostics are not interesting due to the absence of failure data because of the preventive maintenance policy. For now, the focus is laid on prognosis. Prognosis is a relatively new area and has become a significant function of a maintenance system (Yang et al., 2009). From a human perspective, it seems that machines fail abruptly. Nevertheless, machines usually go through a measurable sign of failure before it occurs (Lee et al., 2006). For that reason, prognosis can use this measurable sign to predict the appearance of a failure and the amount of time left before it occurs.
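As a small aside on the statistical detection mentioned above, a dynamic threshold can be sketched as flagging points that deviate strongly from a rolling window of recent observations. This is a generic illustration with synthetic numbers, not an algorithm taken from the thesis.

```python
import statistics

def dynamic_threshold_anomalies(series, window=5, k=3.0):
    """Flag indices whose value lies more than k standard deviations from
    the mean of the preceding `window` observations (a dynamic threshold,
    as opposed to one fixed level for the whole series)."""
    anomalies = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu = statistics.fmean(recent)
        sigma = statistics.pstdev(recent)
        if sigma > 0 and abs(series[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies

# Synthetic signal with one spike at index 6.
data = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0]
print(dynamic_threshold_anomalies(data))  # → [6]
```

Because the threshold adapts to the recent level and spread of the signal, slow drift (such as gradual degradation) is tolerated while sudden jumps are flagged, which is exactly why such diagnostics add little here: the machine already reports abrupt cell failures itself.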
As shown in Figure 2.2, prognostic approaches can be classified into three basic groups: model-based prognostics, data-driven prognostics, and experience-based prognostics (Luo et al., 2008; Yang et al., 2008; Muller et al., 2008).

Figure 2.2 Different condition-based maintenance strategies

Because the prognostic methods of condition-based maintenance best fit the desired situation, the model-based policy, the experience-based policy, and the data-driven strategy are succinctly explained in the following subsections. A combination of two or more of these strategies is also possible.

2.1.2.1 Model-based approach

Based on the work of Peng et al. (2009), physical model-based approaches usually employ mathematical models that are directly tied to physical processes that have direct or indirect effects on the health of the related components. Domain experts usually develop the physical models, and large sets of data validate the parameters in the model. Physical model-based approaches used for prognostics require specific mechanistic knowledge (i.e., first principles) and theories relevant to the monitored systems. The main advantage of the model-based approach is the ability to incorporate a physical understanding of the monitored system. Moreover, if the understanding of the system degradation improves, the model can be adapted to increase accuracy. Peng et al. (2009) stated that it is usually tough to build a mathematical model for a physical system from first principles in real-world applications. So, the uses of physical model-based methodologies are limited, especially when there is no deep understanding of the mechanics of the process or when the process is nonlinear, as is the case at the company.

2.1.2.2 Experience-based approach

An experience-based prognosis is an approach that does not depend on the equipment's historical data or the output of a mechanical model-based system. The approach solely depends on expert judgment (Yang et al., 2009).
This method is the least used of the three approaches; prognosis researchers focus more on numerical condition data. Two widely known examples of the knowledge-based approach are expert systems (ES) and fuzzy logic. Peng et al. (2009) stated that expert systems are suitable for problems normally solved by human specialists. An ES can be considered a computer system programmed to exhibit expert knowledge in solving a domain problem. Usually, rules are expressed in the form: IF condition, THEN consequence. The condition portion of a rule is usually some fact, while the consequence portion can be outcomes that affect the outside world. However, it is difficult to obtain domain knowledge and convert it to rules. Fuzzy logic provides a very human-like and intuitive way of representing and reasoning with incomplete and inaccurate information by using linguistic variables. However, for the case within the company, no such domain knowledge is available. Maintenance is done based on a preventive policy; therefore, the behavior of the process after producing for too long with the same Component X is unknown. An experience-based approach seems unusable with the current knowledge.

2.1.2.3 Data-driven approach
The data-driven prognostic methodology is based on statistical and learning techniques originating from pattern recognition theory for system degradation behavior (Peng et al., 2009). Data-driven models are usually developed from collected input/output data. These models can process various data types and exploit nuances in the data that rule-based systems cannot discover. Dragomir et al. (2009) classify data-driven methods into two categories: statistical approaches and AI approaches. Statistical approaches include multivariate statistical methods, linear and quadratic discriminant analysis, partial least squares (PLS), and signal analysis. Artificial intelligence (AI) techniques include neural networks, decision trees, and graphical models.
Based on the current knowledge of the process, this method seems the most convenient. Data-driven maintenance tries to make connections based on data, and no domain knowledge is required.

2.1.2.4 Hybrid approach
Peng et al. (2010) stated that in real-world prognostic processes, the trends of all characteristic parameters are diversified and difficult to predict using a single prediction method. Thus, a combined prediction method is adopted for prognostics. Using a well-designed condition-based combined prediction method that combines two or more prognostic approaches for data extraction, data analysis, and modeling may have the following advantages: (1) the demerits of each individual theory are offset and the merits of all prediction methods can be utilized, (2) the complexity of the computation may be reduced, and (3) the prediction precision may be improved. To apply a hybrid prognostic model, there must be sufficient knowledge within the department to apply at least two methods. Such knowledge is not available at the company because maintenance is always done preventively: the engineers do not know what happens after the two-week cycle, there are no distributions available that can be linked to the wear of Component X, and no run-to-failure data is available. Although there are arguments for a hybrid approach, it turns out to be impossible with the information currently available.

2.1.3 Answering sub research question Q-1
The sub-question that was central to this chapter was as follows: Which maintenance policies are there, and what are the characteristics of each policy? To give a brief recap, the current situation within the company is a time-based maintenance policy: every two weeks, Component X is replaced. A usage-based policy or a condition-based policy can replace the time-based maintenance. The reason for this is that data is automatically logged during production.
Some measurements are also taken on the final product. The next sub-question is Q-3: Based on the available data, what type of data-driven strategy is best and realizable for the situation of Component X? Here, the specific circumstances within the company are considered, together with the type of method that is best to work out. This sub-question is addressed in section 2.2.

2.2 Process of data-driven strategy
Firstly, the characteristics of the current situation are described. Subsequently, a suitable type of data-driven strategy is examined based on the literature. Q-3 is central to this section. As shown in section 2.1.3, a condition-based maintenance policy is possible with the data currently gathered in the company's process. Condition-based maintenance can again be subdivided into diagnostics and prognostics. Because there is no failure data available at the machine level, a prognostic method is chosen. The most appropriate method to work out is a data-driven maintenance policy, which can take a statistical or an AI approach.

2.2.1 Process of prognostics
Prognostics can predict a fault before it occurs and estimate the remaining useful life (RUL) of equipment. It is generally performed in three key steps (Xu et al., 2019). This general approach is used in multiple papers (Xu et al., 2019; Abid et al., 2018; Saidi et al., 2017; Javed et al., 2015; Zhang et al., 2016):
(1) Health Indicator (HI) construction: HIs are built to represent the health status of the equipment.
(2) Health Stage (HS) division: the lifetime of the equipment is divided into several HSs based on the built HIs.
(3) RUL prediction: the RUL is estimated by assessing the degradation trend in the unhealthy stage.
Abid et al. (2018) and Lei et al. (2018) also take data acquisition in the form of data monitoring into account as a preliminary step.
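The three steps above can be sketched end to end. The synthetic health indicator, the stage threshold, and the linear extrapolation to a failure level below are illustrative assumptions, not the procedure finally used in this thesis; the monotonicity and trendability formulas are one common formulation from the cited literature.

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) HI construction: a synthetic, noisy, linearly degrading health
#     indicator stands in for a physical or virtual HI built from sensors.
t = np.arange(300)                               # operating time (e.g., products made)
hi = 0.004 * t + rng.normal(0, 0.03, t.size)

# Suitability checks for the HI (one common formulation):
d = np.diff(hi)
monotonicity = abs((d > 0).sum() - (d < 0).sum()) / d.size
trendability = abs(np.corrcoef(hi, t)[0, 1])     # correlation with operating time

# (2) HS division: a threshold on the HI separates the healthy stage
#     from the unhealthy (degraded) stage.
threshold = 0.6
unhealthy = hi > threshold
onset = int(np.argmax(unhealthy))                # first index in the unhealthy stage

# (3) RUL prediction: fit the degradation trend in the unhealthy stage and
#     extrapolate linearly to an (arbitrary) failure level.
failure_level = 1.6
slope, intercept = np.polyfit(t[unhealthy], hi[unhealthy], deg=1)
t_failure = (failure_level - intercept) / slope
rul_now = t_failure - t[-1]                      # RUL in operating-time units
```

Because the synthetic HI is close to perfectly linear, the extrapolation step works well here; a real HI would first have to pass the monotonicity and trendability checks.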
The process of making an RUL prediction can be described as a series of steps, depicted in Figure 2.3.

Figure 2.3 General approach of prognostics, Abid et al. (2018)

This method is chosen because it is the standard approach for prognostics. Health prognostics is the main task in condition-based maintenance; it aims to predict the RUL of machinery based on historical and ongoing degradation trends observed from condition monitoring information (Abid et al., 2018). Because a degradation trend can also be traced in our data, it is logical to use the same method.

2.2.1.1 Health indicator (HI) construction
Abid et al. (2018) describe Health Indicator (HI) construction as the main step for achieving prognosis. A HI represents the evolution of the system's performance over time. HIs can be classified into two categories: physical health indicators and virtual health indicators. Physical health indicators are based on a single feature, such as the raw data gathered from a sensor. In contrast, virtual health indicators are based on a fusion of multiple features that can better represent the system's health. The most relevant HI evaluation criteria are monotonicity and trendability (Saidi et al., 2017; Javed et al., 2015; Zhang et al., 2016):
Monotonicity: evaluates the negative or positive trend of the HI, under the assumption that the system cannot self-heal.
Trendability: is related to time and represents the correlation between the degradation trend and the operating time of a component.
When selecting a health indicator (i.e., physical or virtual), the monotonicity and trendability need to be considered. If a candidate health indicator is not monotonic and shows no trendability, it is not a fitting health indicator.

2.2.1.2 Health stage (HS) division
When degradation occurs, the HI presents an increasing or decreasing trend.
The degradation can be detected by dividing the HI into two or multiple stages using a threshold according to the degradation trend. Using a threshold for this task is common in the literature (Abid et al., 2018). The goals of this division are to:
(1) identify the stages where the degradation process is active
(2) separate the degradation process or its evolution over time according to its dynamics
However, in the case of the company, maintenance has always been carried out preventively; therefore, making a health stage division is difficult. While analyzing the data, it will be necessary to focus on the deterioration of the process in order to establish one or more thresholds that describe the condition of Component X. Health stage division improves the reliability of both degradation detection and the RUL estimation (Abid et al., 2018).

2.2.1.3 RUL prediction
The RUL of machinery is defined as the length from the current time to the end of the useful life. The RUL can be expressed in time units or production units. Still, two issues related to RUL prediction remain unanswered: firstly, how to predict the RUL based on the condition monitoring information, and secondly, how to measure the prediction accuracy of different approaches. Both are answered in the next chapter. On top of that, the impact of the missing labels caused by the preventive maintenance policy is discussed.

2.2.2 Impact of missing labels
In a situation like the company's, Zschech et al. (2019) attempt to develop a prognostic model with missing labels. Zschech et al. (2019) stated that for critical machines, the aim is to avoid failures and faults through strictly short maintenance intervals. As a result, no thresholds and tolerance limits are known or observed that provide labels to mark necessary intervention points. In addition, sensors that can describe physical health conditions directly (e.g., crack size, state of wear) are rarely used.
Moreover, due to the pressure to use plants efficiently, it is often impossible to carry out test runs beyond safe conditions. Consequently, possible data observations might be truncated before the actual end of life, and thus interesting events that describe fault patterns are not recorded. Overall, such circumstances can be characterized by the absence of a prospective target variable on which to build the prognosis. Henceforth, this problem, called missing labels, can be seen as a major hurdle in developing adequate prognostic models (Gouriveau et al., 2013). In the case of Zschech et al. (2019), no direct labels were provided due to missing CBM thresholds and non-traceable results from quality control. However, no mention was made of the absence of run-to-failure data. On the other hand, Kim et al. (2021) described a method that calculates the HI as the MAE between the input and output data of an autoencoder. An autoencoder is a neural network with a bottleneck hidden layer that focuses on the essentials of the input data; the target output is the input data itself. Kim et al. (2021) use the difference between the input data and the output data of the autoencoder to determine whether the data are normal. The reasoning behind this is that the autoencoder learns the relationships between the input variables. As the autoencoder learns the training data, if the test data are similar to the training data, the reconstruction error (i.e., the HI) between the input and the output will be small. On the contrary, when data that differ from the training data are used as test data, the reconstruction error between input and output will be large. However, the problem of determining a sensible HS division (i.e., a lower bound) remains. Normally, if there is historical run-to-failure data, the initial threshold can be set using the failure data. However, when there is no historical run-to-failure data, the initial threshold is assumed arbitrarily. To determine a threshold, Kim et al.
(2021) suggest two methods: first, let an expert provide a threshold, or second, determine it based on the HI calculated from the training data. Kim et al. (2021) proposed to use a Gaussian-distribution-based value because z-normalization was applied in the preprocessing. Lastly, Kim et al. (2021) stated that the threshold can be updated when run-to-failure data accumulates. This way of determining a failure point can be useful for the case of the company. However, the main drawback of this approach is that the lower bound is based on the standard deviation, and there is no data to measure the quality of this prediction because there is no run-to-failure data. Still, training the autoencoder on relatively short production periods versus long ones can be examined. If no meaningful health indicator can be determined based on monotonicity and trendability, the health indicator produced by the autoencoder can be used. During this study, establishing a lower limit as a health stage division will be investigated, as Kim et al. (2021) proposed: if the data is standardized, the deviation from the mean can serve as a relative lower limit. Malhotra et al. (2016) use a similar method with an LSTM encoder-decoder (LSTM-ED). You et al. (2013) also developed a solution for the case where many failed historical samples are available. The results were ineffective when failed historical samples were limited, but the performance improved quickly as more failed historical samples became available.

2.2.3 Answering sub research question Q-3
To answer sub-question Q-3 (Based on the available data, what type of data-driven strategy is best and realizable for the situation of Component X?): the general approach of prognostics suggested by Abid et al. (2018) is followed.
The approach consists of four steps, namely: monitoring data, health indicator (HI) construction, degradation detection using health stage (HS) division, and lastly, RUL estimation. For a HI, we have to find a signal that shows monotonicity and trendability. However, because the company uses a preventive maintenance policy, run-to-failure data is absent. Kim et al. (2021) suggest using an autoencoder to calculate a HI and a Gaussian-distribution-based value as an initial threshold for the health stages. In the next section, various prediction algorithms are discussed, together with methods to measure the prediction accuracy of different approaches. During this process, sub-question Q-4 stands central: Which prediction models can be applied for data-driven maintenance?

2.3 Predictive models used in data-driven maintenance
This section answers Q-4: Which prediction models can be applied for data-driven maintenance? We distinguish two sorts of data-driven prognostic algorithms (i.e., for predictive maintenance): we firstly discuss statistical approaches and subsequently machine learning approaches. To find the most applicable algorithms for our data-driven maintenance policy, we used different (systematic) literature reviews and surveys (Lei et al., 2018; Jimenez et al., 2020; Adhikari et al., 2018; Ramezani et al., 2019; Mathew et al., 2017). Because we want to give an RUL based on a threshold found during the HS division, only regression-based models are considered (i.e., no classification models are used). Anomaly detection algorithms are also widely used within predictive maintenance; however, their common goal is fault detection.

2.3.1 Statistical approaches
Jimenez et al. (2020) stated that statistical models aim to analyze the behavior of random variables based on recorded data.
For predictive maintenance, statistical models are used to determine the current degradation and the expected remaining life of technical systems. Predicting the expected remaining life is performed by comparing the current behavior of the measured random variables against known behaviors represented by a series of data. Normalization and data cleaning are common preliminary tasks performed on the data series to obtain the distribution function before the trend analysis; cleaning removes outliers, constants, binary variables, or any other variable that is not useful for degradation analysis.

Linear Regression: This is the most basic type of algorithm used for predictive analysis. Regression estimates are used to explain the relationship between a dependent variable and one or multiple independent variables (i.e., multiple linear regression, such as in Luo et al. (2015)). In this situation, the RUL is considered the dependent variable. Mathew et al. (2017) also include a simple linear regression in their work. This linear regression can serve as a baseline to measure the quality of more complex algorithms.

ARMA: Autoregressive-moving-average models are primarily a system identification technique, built using only normal operating data to identify the behavior of complex systems. The root mean square of the residual errors is then used to indicate the machine degradation through a degradation index. The residual errors are the difference between the output of the identification model and the behavior of the system; if the index exceeds a certain threshold, this indicates a faulty system. Pham et al. (2012) use such an algorithm on top of a proportional hazard model and a support vector machine to develop a remaining useful lifetime estimation for a low-methane compressor. However, this method seems hard to work out in the absence of run-to-failure data. Liao et al. (2020) also use an ARMA model; however, run-to-failure data was available.
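A minimal sketch of this residual-based degradation index is shown below, using a pure AR model fitted by least squares instead of a full ARMA identification, on synthetic data; all values and parameters are illustrative assumptions.

```python
import numpy as np

def ar_design(signal, order):
    """Design matrix of lagged values x[t-1..t-order] for t = order..n-1."""
    n = len(signal)
    return np.column_stack(
        [signal[order - k - 1 : n - k - 1] for k in range(order)]
    )

def fit_ar(signal, order=2):
    """Least-squares AR(order) fit on normal operating data only."""
    coef, *_ = np.linalg.lstsq(ar_design(signal, order), signal[order:], rcond=None)
    return coef

def degradation_index(signal, coef, order=2):
    """RMS of the residual errors between model output and observed behavior."""
    resid = signal[order:] - ar_design(signal, order) @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic 'normal operating' AR(2) signal.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(0, 0.1)

coef = fit_ar(x)
di_normal = degradation_index(x, coef)

# A 'degraded' signal deviates from the identified model, raising the index.
degraded = x + rng.normal(0, 0.5, 500)
di_degraded = degradation_index(degraded, coef)
```

The key point of the approach survives the simplification: the model is identified on healthy data only, and degradation shows up as growing residuals against that model.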
Bayesian models: A Bayesian model is a statistical model where probability is used to represent all uncertainty within the model: the uncertainty regarding the output and the uncertainty regarding the input data (i.e., parameters). Finding these prior probabilities poses the main problem for applying Bayes' theorem. Bayesian models can be applied for predictive maintenance purposes when data including anticipated failures, with their related symptoms and life expectancy, is available (Jimenez et al., 2020). However, due to the absence of run-to-failure data, it is impossible to assign such probabilities to an event.

Statistical models offer potential solutions for diagnostic as well as prognostic tasks. However, the main drawbacks of statistical models concern the need for enough historical data to build a reliable model and the modeling of uncertainty. For predictive maintenance systems, statistical models are often implemented in multi-model approaches. The need for historical data makes statistical models harder to implement, except for linear regression and other (simple) regression analyses.

2.3.2 Machine learning approaches (AI approaches)
Machine learning is a branch of artificial intelligence that uses specialized learning algorithms to build models from data. These models can deal with and capture complex relationships in the data that are difficult to obtain using physics-based, statistical, or stochastic models (Jimenez et al., 2020). The results of such approaches are hard to explain because of their lack of transparency; many of these techniques are called "black boxes". One key point of machine learning models is their learning process, which depends on the system's application, goal, and available data. The following methods were considered based on the goal of using the algorithms for prognostics.

Decision Tree: In decision tree algorithms, a tree is constructed to serve as a predictive model.
The branches of this tree illustrate the outcome of the decision taken. The observations about an item can be converted to conclusions with the help of this decision tree, as stated in Mathew et al. (2017). The work of Patil et al. (2018) also underlines the use of decision tree algorithms to retrace the feature importance of the predictive algorithms.

Random Forest (RF): Random forest is an ensemble learning method. It operates by constructing multiple decision trees when training the algorithm; the output is the mean prediction of the individual trees. In the research Mathew et al. (2017) conducted on predicting the RUL of a turbofan engine, the random forest algorithm performed the best compared to various other machine learning algorithms. Bey-Temsamani et al. (2009) also used it for their similarity model.

Gradient Boosting (GB): This is a forward learning ensemble method, based on the idea that good predictive results can be obtained through increasingly refined approximations. Gradient boosting builds regression trees sequentially, based on all the dataset features, in a fully distributed way. In the work of Mathew et al. (2017), the gradient boosting algorithm performed second best when predicting the RUL of a turbofan engine.

Support Vector Machine/Regression (SVR): The numeric variables in the different columns of the data form an n-dimensional space, and a hyperplane is a surface that splits this input variable space. With a support vector machine, a hyperplane is selected that best separates the points in the input variable space by their class, normally using a kernel. With support vector regression, the same idea is turned into an RUL prediction. Benkedjouh et al. (2013) use such an SVR method with a Gaussian kernel to predict the RUL of bearings.

Artificial Neural Network (ANN): ANNs mimic the working process of the human brain by connecting many nodes in a complex layer structure.
They are the most commonly used AI techniques in the field of machinery RUL prediction (Lei et al., 2018).

Long Short-Term Memory (LSTM): An LSTM network is an architecture specializing in discovering the underlying patterns embedded in time series; it is proposed to track the system degradation and, consequently, predict the RUL. Zhang et al. (2018) use such a method to predict the RUL of a plane's engine.

Convolutional Neural Network (CNN): Convolutional neural networks are distinguished from other neural networks by their superior performance with image, speech, or audio signal inputs. However, they are also widely used for prognostics in the form of RUL prediction, as proposed by Li et al. (2018).

Each of these machine learning approaches can be used in an RUL estimation for the company. When looking at (systematic) literature reviews and surveys (Lei et al., 2018; Jimenez et al., 2020; Adhikari et al., 2018; Ramezani et al., 2019; Mathew et al., 2017) within the prognostic aspect of data-driven maintenance, little to no attention is paid to decision tree methods. The choice is made to focus on those methods. A benefit of tree methods is that it is relatively easy to determine which parameters were important in the prediction. Also, more complex methods will be worked out in the form of a CNN and an LSTM, since these outperform the regular ANN.

2.3.3 (Deep) Transfer Learning
In real-life scenarios, the absence of run-to-failure data is common. In the last few years, (deep) transfer learning or (deep) domain adaptation has also been used for Remaining Useful Lifetime (RUL) estimation. The RUL relates to the amount of time left before a piece of equipment is considered unable to perform its intended function. Da Costa et al. (2019) delineate a situation with processes that require prognostic prediction models that can leverage real-time data collected continuously over different locations.
Although they perform similar processes, the systems log different multivariate sensor data due to equipment version updates, sensor malfunction, and timing. In such cases, Da Costa et al. (2019) suggest that the high-dimensional temporal data has to be used directly to determine the health state of the systems, and that models have to adapt to incoming changes in the data. The work of Da Costa et al. (2019) proposes an LSTM network to address the problem of learning temporal dependencies from time-series sensor data that can be transferred across related RUL prediction tasks. The learning is based on a source domain with sufficient run-to-failure annotated data, while the target domain contains only sensor data. Da Costa et al. (2019) developed a Domain Adversarial Neural Network (DANN) approach to learn domain-invariant features that can be used to predict the RUL of the target domain. This option could be interesting for the case of the company because it handles the absence of run-to-failure data. However, the most important principle of transfer learning is to have a source domain with sufficient run-to-failure data; the learned behavior can then be transferred, with some transformations, to the target domain. In the situation of the company, there are no similar machines or processes with partially the same parameters, which rules out the possibility of implementing (deep) transfer learning.

2.3.4 Quality measures
Various papers, including Malhotra et al. (2016), Kim et al. (2021), Abid et al. (2018), and Zschech et al. (2019), use the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE) as quality measures for their RUL prediction. RMSE is shown in Equation 2.1, and MAPE is shown in Equation 2.2.
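These two quality measures can be implemented directly; the example RUL values below are illustrative.

```python
import numpy as np

def rmse(rul_true, rul_pred):
    """Root Mean Squared Error between true and predicted RUL (Equation 2.1)."""
    r, p = np.asarray(rul_true, float), np.asarray(rul_pred, float)
    return float(np.sqrt(np.mean((r - p) ** 2)))

def mape(rul_true, rul_pred):
    """Mean Absolute Percentage Error, in percent (Equation 2.2)."""
    r, p = np.asarray(rul_true, float), np.asarray(rul_pred, float)
    return float(100.0 * np.mean(np.abs((r - p) / r)))

true_rul = [100, 80, 60, 40]   # e.g., products remaining before failure
pred_rul = [110, 75, 60, 50]
```

Note that MAPE is undefined when a true RUL of zero occurs (division by zero), which matters near the end of life; RMSE has no such restriction.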
Equation 2.1 RMSE formula

RMSE = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( r_i(t) - r_i^{*}(t) \right)^2 }

Equation 2.2 MAPE formula

MAPE = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{ r_i(t) - r_i^{*}(t) }{ r_i(t) } \right|

where n is the number of observations, t is the time index, r_i(t) represents the true RUL, and r_i^{*}(t) represents the predicted RUL. Because we want to predict the remaining useful lifetime in products or time units, we use these measures rather than classification accuracy.

2.3.5 Answering sub research question Q-4
To answer sub-research question Q-4 (Which prediction models can be applied for data-driven maintenance?): we decided to start with simple machine learning methods such as decision tree regression. On top of that, we want to apply a more complex neural network that specifically uses time series, namely a Long Short-Term Memory (LSTM) network. To compare these methods, we use a statistical regression approach, namely linear regression. This variety of methods allows us to see which predictive algorithm comes up with the best predictions for the RUL estimation.

2.4 Conclusion of literature review
To conclude the systematic literature review, various maintenance policies were identified. Different methods and algorithms are applicable for diagnostics and prognostics of machine failure, depending on the available information. For the case of the company, the most fitting maintenance approach is data-driven maintenance, since much data is gathered and measured within the production process of Product Z, and only limited knowledge is available about the internal process and the degradation of the equipment during production. Various algorithms are applicable for an RUL estimation. However, the choice is made for two simple machine learning algorithms (Decision Tree, Random Forest), two complex machine learning algorithms (LSTM, CNN), and a statistical approach to compare the machine learning algorithms against, namely a simple linear regression.

2.4.1 Gap in literature
Xu et al.
(2019) attempt to provide a brief overview of the PdM system, emphasizing the current developments of data-driven fault diagnostics and prognostics methods. Xu et al. (2019) described future trends in the field, where data-driven fault diagnostics and prognostics are still developing. The following future trends in this field are suggested:
• developing accurate hybrid methods combining data-driven and model-based diagnostics and prognostics approaches;
• improving the efficiency of diagnostics and prognostics methods;
• predicting the machinery RUL with limited labeled data;
• managing the uncertainties properly in the scheduling process of maintenance tasks.
For our case, we have limited labeled data or, more precisely, no labeled data. Little research presented in the literature addresses predictive maintenance with no labeled data. This makes the topic academically relevant, which is also confirmed by the limited number of search results when it was specifically searched for during the systematic literature review. During this literature review, only the papers of Kim et al. (2021), Zschech et al. (2019), Malhotra et al. (2016), and You et al. (2013) mentioned the absence of labeled data, where Kim et al. (2021), Malhotra et al. (2016), and You et al. (2013) also mentioned the absence or redundancy of run-to-failure data. None of those papers used decision tree methods to predict the RUL.

Part III – Data exploration, understanding, preparation

3. Data exploration
A deeper understanding of the data is obtained during this phase. The available datasets are explored in order to understand what data is available as well as its quality. This data exploration is guided by the problem formulation and the literature review. The focus lies on which data is suitable for our chosen data-driven maintenance strategy. Firstly, general statistics are shown to get an idea of what the distribution looks like.
Subsequently, the amount of production and the incidental maintenance moments are analyzed to spot deviating production cycles. After that, a trend analysis is performed to find parameters that show the constant trend necessary for a predictive degradation model, as was shown in the previous section. Furthermore, the maintenance moments are analyzed.

3.1 Descriptive statistics
For both numerical datasets, descriptive statistics are given to seek outliers and to see how the different machines within Machine Y behave compared to each other. On a deeper level, a comparison is made between the production cells in Machine Y. All the data for the year XXXX is used.

3.1.1 Product dataset
First, the general descriptive statistics are derived from the processed dataset. In Table 3.1, all the parameters are shown. The product dataset consists of 601,956 data rows and five parameters. ROD_11, ROD_12, and ROD_13 indicate [removed due to company-sensitive information], and OTAFP indicates the production cell.

Table 3.1 Descriptive statistics product dataset

         count      missing  mean   std   min   25%   50%    75%    max
ROD_11   601956.00  0.00     6.85   2.12  0.69  5.43  6.61   7.99   105.34
ROD_12   601956.00  0.00     7.30   2.24  1.01  5.81  7.03   8.47   105.09
ROD_13   601956.00  0.00     7.35   2.21  0.80  5.87  7.10   8.54   110.24
OTAFP    601956.00  0.00     15.46  8.66  1.00  8.00  15.00  23.00  30.00

Taking the first and third quartiles into account, all three parameters (i.e., ROD_11, ROD_12, and ROD_13) lie within a range of 5.43 to 8.54. The max values, however, are extremely high compared to the first and third quartiles. We needed to see whether these particular outliers are equally distributed across all machines within Machine Y (i.e., VT16, VT26, VT36, VT46) or whether one of those machines causes them. It is also important to look at the production cell level.
Should the cause be specific to a machine and a production cell therein, the cause should be investigated. The same can be said about the min values. When looking at the machine level (Figure 3.1), it can be concluded that the outliers are evenly distributed across each machine within Machine Y. The same holds when the different parameters measured on Product Z are examined.

Figure 3.1 Boxplot of each parameter for each machine

Subsequently, the production cells within these machines are examined. For visibility, this is done per parameter and per machine. An example is shown in Figure 3.2. To see all parameters for the respective Machine Y, refer to Appendix C.

Figure 3.2 Boxplot of parameters for production cells within the VT16 machine

The differences in the first and third quartiles between the production cell distributions are fairly minimal. Also, the differences in outliers are not specifically attributable to a certain production cell within a machine, even when looking at Appendix C. When zooming in on the first and third quartiles per machine, the parameters show minimal differences between production cells; see Figure 3.3. Based on the behavior per production cell, it is particularly striking that there are no outliers below the lowest quartile in some production cells (e.g., cell 1 and cell 6). The median also often differs between production cells, as does the distribution of the quartiles. These differences are minimal, but it can be interesting to make analyses at the production cell level during the modeling phase.

Figure 3.3 Zoomed-in boxplot of parameters for production cells within the VT16 machine

3.1.2 Process dataset
Secondly, a similar analysis is made for the process dataset. However, this dataset has more data rows and columns, respectively 22,000,000+ rows and 35 columns. The columns REMARK, OBEWF, OCTPP, ORACT, and OTFAP are binary or categorical parameters.
Therefore, quantiles of those parameters are meaningless. On the contrary, the descriptive statistics of the other parameters give meaningful insights. OL1PR, OL2PR, and OL3PR show negative numbers for the minimum value in Table 3.2. This makes sense because those parameters indicate the position of Product Z within the production cell based on a reference point. For comparison, OVNKZ also gives a negative number as its minimum value. Having discussed this with the domain expert, this should not be possible, indicating corrupted data. After discussing all the descriptive statistics with the domain expert, it turned out that a product can only be produced if (removed due to company sensitive information), indicated by the parameter OTVKT. A missing value for this parameter is a corrupted data point; this is the case for 2,666,070 data points. Those data points will be removed in the data preparation phase. In Table 3.2, the parameters are shown which are measured while Machine Y is fabricating.

Table 3.2 Descriptive statistics process dataset

Parameter  count     missing   mean      std       min      25%       50%       75%       max
REMARK     0         22246927  -         -         -        -         -         -         -
OBEWF      329132    21917795  19.06832  7.568774  0        19        24        24        30
OCTEN      19627563  2619364   21600.26  3962.155  0        19248     21344     23776     104736
OCTEO      19611740  2635187   19819.13  2570.739  0        18544     19872     21264     70016
OCTOK      19627323  2619604   61385.99  7441.904  0        57696     61600     65568     144768
OCTOV      12548883  9698044   104.0141  21.87353  0        96        96        112       3552
OCTPG      158787    22088140  14.35211  855.5327  0        0         0         0         117376
OCTPP      91285     22155642  0         0         0        0         0         0         0
OCTSC      19602719  2644208   5252.139  2777.877  0        3456      4512      6528      71920
OCTU3      4788930   17457997  96.18299  116.1308  0        16        48        128       1168
OL1PR      17629655  4617272   2.902294  38.39188  -1974.0  -18.8     0         18.8      1720.2
OL2PR      17153164  5093763   -0.0453   35.19797  -1870.6  -18.8     0         18.8      1710.8
OL3PR      17547304  4699623   1.183271  36.78888  -1983.4  -18.8     0         18.8      1739.0
OPRPX      19649003  2597924   14320.05  1519.914  0        12565     15184.16  15286.14  17121.76
OPRPY      19649287  2597640   16491.61  977.1917  0        15535.29  16286.65  17589.03  18983.24
OPTEW      8544604   13702323  2671.496  762.0827  -1       2071      2674      3307      4550
ORACT      57250     22189677  0.985659  0.118892  0        1         1         1         1
OSTV1      19584413  2662514   1017.452  121.4371  0        932       1003      1088      2017
OSTV3      19186064  3060863   131.5311  22.86511  0        116       127       142       2004
OSTV4      19209996  3036931   109.8519  20.44288  0        95        107       122       852
OSTV5      19338658  2908269   106.2063  30.33349  0        85        101       122       938
OTAFP      19683300  2563627   15.49777  8.653749  0        8         15        23        30
OTVKT      19580857  2666070   1304.731  155.3285  0        1191      1291      1396      3721
OVNKT      19367536  2879391   311.2378  34.61268  0        279       303       343       2000
OVNKZ      19549684  2697243   165.6038  18.72718  -81.8    153       165       178.3     1413.8
OVSP1      19561000  2685927   4129.676  637.0317  0        3638.7    3965.3    4203      6033.6
OVSP3      17731528  4515399   20.76635  0.768799  0        20.4      20.6      21        28.2
OVSP4      17341812  4905115   10.48372  0.483952  0        10.2      10.4      10.7      29.5
OVSP5      16624150  5622777   7.355334  0.374405  0        7.1       7.3       7.5       12.1
OVTV1      19580374  2666553   998.7494  124.7466  0        910       987       1073      1987
OVTV3      19083104  3163823   118.1898  22.6554   0        103       114       128       1861
OVTV4      19110594  3136333   96.01248  20.24725  0        82        92        108       837
OVTV5      19277183  2969744   92.55479  30.27356  0        71        88        109       656

For each parameter within this dataset, histograms and boxplots are made. Those boxplots are shown per machine and, on a deeper level, per production cell. On top of that, histograms are made for the categorical parameters to see the distribution per category. All the outliers per respective machine or production cell are discussed with the domain experts. All corrupt behavior is removed in the data preparation phase by removing, for each parameter, all values that are more than three standard deviations from the mean. Next, the production for each machine is shown in the following subsection. The production, in combination with the number of incidents of Component X or the production cell, gives a clear picture of which maintenance periods showed irregular behavior. This is taken into account during data preparation, so that only clean data periods are considered.

3.2 Production and incidents per period

The number of products produced by each machine within Machine Y is shown in Table 3.3. In this table, the production numbers per period (i.e., every two weeks) for each specific machine are given, to indicate whether the production is evenly distributed over the year. The incidental maintenance moments are given between brackets.

Table 3.3 Production per period for each machine

Because the VT46 was not introduced within the organization until the end of XXXX, this creates extensive gaps in the data. There are also gaps in the production of the VT46 due to malfunctions in the recording of data. Furthermore, there are some fluctuations between production periods. During the data preparation phase, periods with low numbers of manufactured products or a high number of incidental maintenance moments are reconsidered. Those moments could cause unpredictable behavior and are unable to represent reality.

3.3 Failure rate during lifetime of Component X

A logical consequence of the aging of Component X would be that more products fail in the production process. After all, the Components X are replaced so that the process can perform optimally. However, this is not the case. In Figure 3.4, the production hours since the last maintenance moment show no increase in failed products. Also, the number of product failures due to Component X is negligible: about 4,000 products out of 22,000,000.

Figure 3.4 Failure rate during lifetime of Component X

3.4 Seeking trends in data

A Mann-Kendall trend test is used to determine whether or not a trend exists in time-series data (Kamal and Pachauri, 2018). It is a non-parametric test, meaning there is no underlying assumption made about the normality of the data.
The null hypothesis, H0, states that there is no monotonic trend, and this is tested against one of three possible alternative hypotheses, HA: (i) there is a monotonic upward trend, (ii) there is a monotonic downward trend, or (iii) there is either a monotonic upward or a monotonic downward trend. It is a robust test for trend detection. Kamal and Pachauri (2018) stated that the following assumptions underlie the MK test:
- In the absence of a trend, the data are independently and identically distributed (iid).
- The measurements represent the true states of the observables at the times of measurement.
- The methods used for sample collection, instrumental measurements, and data handling are unbiased.

Following the approach of Kamal and Pachauri (2018), the first step in the Mann-Kendall test for a time series x1, x2, ..., xn of length n is to compute the indicator function sgn(xj − xi), see Equation 3.1:

Equation 3.1 Sign function of the Mann-Kendall trend test

sgn(xj − xi) = { 1 if xj − xi > 0;  0 if xj − xi = 0;  −1 if xj − xi < 0 }

which tells whether the difference between the measurements at times i and j is positive, zero, or negative. Next, the test statistic S is computed by summing these indicators over all pairs of observations, see Equation 3.2:

Equation 3.2 S statistic of the Mann-Kendall trend test

S = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} sgn(xj − xi)

Furthermore, the variance VAR(S) is given by Equation 3.3:

Equation 3.3 Variance formula of the Mann-Kendall trend test

VAR(S) = (1/18) [ n(n − 1)(2n + 5) − Σ_{k=1}^{g} tk(tk − 1)(2tk + 5) ]

where g is the total number of tie groups in the data, tk the number of data points in the k-th tie group, and n the number of data points within the time series. Using S and VAR(S), the normalized Mann-Kendall test statistic ZMK is computed with the following transformation, which ensures that ZMK is distributed approximately normally, following Equation 3.4:

Equation 3.4 Normalized test statistic of the Mann-Kendall trend test

ZMK = { (S − 1)/√VAR(S) if S > 0;  0 if S = 0;  (S + 1)/√VAR(S) if S < 0 }

At a significance level α (0.05), it is determined whether or not to accept the alternative hypothesis HA for each variant of HA separately: the trend is decreasing if ZMK is negative and the corresponding p-value is smaller than the level of significance; the trend is increasing if ZMK is positive and the p-value is smaller than the level of significance; if the p-value is not smaller than the level of significance, there is no significant trend. This process is done for all parameters within the process dataset and the product dataset for each production period. However, a continual trend was only found in the product dataset. Parameters that show a trend are used to re-evaluate the correctness of the planned maintenance moments in the next subsection. Table 3.4 shows only the parameters with a constant trend in the time-series data, which are the parameters of the product dataset that measure (removed due to company sensitive information). Looking at which production cycles deviate from the other cycles helps us exclude these production cycles in the data preparation phase, which will improve the quality of the predictive model.
Table 3.4 Results of the Mann-Kendall trend test for the product dataset

[Table 3.4 lists, per two-week period and per machine (VT16, VT26, VT36, VT46), the number of samples, the trend direction (up, down, or no trend), and the significance level, with *** p < 0.001, ** p < 0.01, * p < 0.05. The column layout of the table was lost in extraction.]

What strikes one when looking at Table 3.4 are the deviating production cycles: for VT16 periods 15 and 16, for VT26 periods 7, 11, and 15, for VT36 periods 11, 15, 20, and 22 (and to a lesser extent period 7), and for VT46 periods 20 and 21 (to a lesser extent period 24). These periods are not included in the data preparation phase. It also appears that ROD_11, ROD_12, and ROD_13 have a constant downward trend. This downward trend offers the possibility to use these parameters as degradation parameters and to establish a lower bound for a predictive maintenance policy.
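Because the downward trend is roughly linear, such a degradation parameter can be extrapolated toward a failure threshold to obtain a RUL estimate; a minimal numpy sketch with hypothetical measurements and an arbitrary threshold:

```python
import numpy as np

# Hypothetical HI measurements (hours since maintenance vs. HI value); the
# downward trend is assumed to be linear, as observed for ROD_11/12/13.
hours = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
hi = np.array([7.4, 7.2, 7.0, 6.8, 6.6])

# Fit hi = a * t + b and extrapolate to an arbitrary lower bound.
a, b = np.polyfit(hours, hi, 1)
lower_bound = 6.0                    # arbitrary failure threshold
t_failure = (lower_bound - b) / a    # time at which the fitted line hits it
rul = t_failure - hours[-1]          # remaining useful life after last sample
```

Because the degradation rate is constant, the arbitrary threshold can later be substituted by a real failure point once run-to-failure data is gathered, without changing the approach.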
Because these parameters show the same behavior most of the time, they are also used in the following subsection to check the maintenance moments for correctness.

3.5 Analyzing maintenance moments

Since the parameters found in the previous subsection show a stable trend over the lifetime of Component X, these parameters are used to evaluate the maintenance periods. There are four machines with individual production cycles. An example of a sequence of data that indicates abnormal behavior is given in Figure 3.5. Abnormal behavior includes jumps in the data that are not linked to a maintenance moment (red dotted) or a maintenance moment with no jump in the data (green dotted).

Figure 3.5 Example of abnormal behavior

All the possible irregular behavior is recorded in Table 3.5. In those cases, the maintenance moments have deviated from the two-week cycle. All these abnormal maintenance moments are discussed with domain experts, and during the data preparation phase the maintenance periods were adjusted.

Table 3.5 Reviewed maintenance moments (timestamps removed due to company sensitive information)

Machine  Action  Confirmed
VT16     Remove  Yes
VT16     Remove  Yes
VT16     Remove  Yes
VT16     Add     Yes
VT16     Add     Yes
VT16     Add     Yes
VT16     Add     Yes
VT26     Remove  Yes
VT26     Remove  Yes
VT26     Add     Yes
VT36     Remove  Yes
VT36     Remove  Yes
VT36     Remove  Yes
VT36     Remove  Yes
VT36     Remove  Yes
VT36     Add     Yes
VT36     Add     Yes
VT36     Add     Yes
VT36     Add     Yes
VT36     Add     Yes
VT36     Add     Yes

4. Data preparation

Validation of the data is important so that a sound analysis can be made; using incorrect data can severely affect the quality of the model. The data has to be prepared so that it is correct and can be used for the analysis and as input for the model. The data preparation step consists of the following tasks: data integration, data cleaning, data transformation, feature selection, and data reduction.
This sequence is also used as the structure of this section.

4.1 Data integration

For the analysis of whether data-driven maintenance can be applied to Machine Y, historical data (i.e., the full production year XXXX) is used, obtained as CSV files from an ORACLE database; this applies to both the product and the process dataset. We can classify the data sources under the following headings: the logbook, which keeps track of the incidental maintenance moments (entered manually); the process dataset, which logs various parameters related to the production of the product; and finally the product dataset, which measures different parameters regarding the product. Both the product and process datasets are automatically maintained in the ORACLE database. Based on the timestamp, the data can be linked together, as it is tracked automatically in real time with minimal delay. The data can be grouped at several levels, namely at the machine level or at the production cell level. We grouped the data based on the respective machine. Because the process data is kept per product, while the product data is based on a production sample, it was decided to use different degrees of granularity, with the data grouped through the mean of each interval. On the machine level, the data is split into daily, 8-hourly, hourly, and quarter data.

4.2 Data cleaning

Data cleaning is necessary to correct corrupt or inaccurate data. Its tasks are to fill in missing values, identify outliers, correct inconsistent data, and resolve redundancy caused by data integration. Data cleaning for the current situation can be divided into three tasks, namely:
• Handling of outliers and missing values
• Adjusting maintenance moments
• Selection of applicable production periods
The actions that have been carried out are briefly dealt with in the following subsections. They are a response to the behavior found in the data exploration stage.
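As a sketch, the first of these cleaning tasks might look as follows in pandas (toy data; the OTVKT/OTAFP/WORKCELL requirements and the three-standard-deviations rule are the ones detailed in subsection 4.2.1, but the values here are invented):

```python
import pandas as pd

# Toy slice of the process dataset (parameter names from subsection 4.2.1).
df = pd.DataFrame({
    "OTVKT":    [1300.0] * 12 + [3700.0, None],
    "OTAFP":    [15] * 14,
    "WORKCELL": ["VT16"] * 14,
})

# A data entry is only valid when OTVKT, OTAFP, and WORKCELL are all present.
df = df.dropna(subset=["OTVKT", "OTAFP", "WORKCELL"])

# Remove outliers more than three standard deviations from the mean.
mean, std = df["OTVKT"].mean(), df["OTVKT"].std()
df = df[(df["OTVKT"] - mean).abs() <= 3 * std]
```

The same two-step pattern (drop corrupt rows, then trim 3-sigma outliers per parameter) applies to the product dataset.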
4.2.1 Handling outliers and missing data

First, we focus on the process dataset concerning the missing data. In Table 3.2, the descriptive statistics of the process dataset are shown. After discussing this with the domain experts, it turned out that a data entry must at least have a value for the parameter OTVKT; otherwise, the product cannot be manufactured. A data entry must also be linked to a table position and a machine, which are indicated with OTAFP and WORKCELL, respectively. All data entries with no value for these parameters are deleted, as they indicate a corrupt data point. The process dataset originally consists of 22,246,927 data points. When the requirements above are taken into consideration, the number of data points is 18,495,610. Since we are using historical data from 2020, there is a period before the first maintenance moment and a period after the last maintenance moment that cannot be linked to a whole production cycle. These data points are also not considered. When this is taken into account, we are left with 18,214,165 data points for the process dataset. These production numbers are used to create a target variable for each cycle. Before using the variables, outliers are removed.

For the product dataset, there is no missing data for any of the parameters. All data entries are connected to a specific machine and production cell. Therefore, only actions concerning the outliers have been taken. All data points that are more than three standard deviations from the mean of the respective parameter (i.e., for all three parameters) are removed. Also, the points that cannot be linked to a production cycle are removed, just like in the process dataset. After all these steps, the total number of data entries is reduced from 601,956 to 582,832 data points.

4.2.2 Adjusting maintenance moments

For each machine, the data points of the process dataset are examined.
The constant trend found for these parameters in subsection 3.4 shows several moments where the trend continues, suggesting that no maintenance has taken place, and moments where maintenance should have taken place but the expected behavior is not visible. The adjusted moments are shown in Table 3.5.

4.2.3 Selection of applicable production periods

To ensure that the prediction model is as accurate as possible, only realistic production cycles will be considered. After consultation with the domain experts, cycles with limited production (i.e., removed due to company sensitive information), cycles with relatively high numbers of incidental maintenance moments (i.e., removed due to company sensitive information), and cycles that show no or opposite trends are not included. After entering the new maintenance moments, the same analysis was performed as in the data exploration stage. This results in the periods given in Table 4.1 as input for the model. Prod. indicates whether the limit of (removed due to company sensitive information) products is reached. Inc. indicates whether the number of incidental maintenance moments does not exceed (removed due to company sensitive information). Trend indicates the downward trend for the parameters in the product dataset. The last column indicates whether the period is used as input for the prediction model.

Table 4.1 Review of production periods

[Table 4.1 lists, per period (0 to 25) and per machine (VT16, VT26, VT36, VT46), a Y/N indication for the Prod., Inc., and Trend criteria and whether the period is used as model input. The column layout of the table was lost in extraction.]

4.3 Data transformation

Data has to be converted to the right format. In the data, the column WORKCELL indicates which machine is used to manufacture the product in question. It is stored as a string, which is converted to binary values using one-hot encoding. Subsequently, the timestamps are changed from string to DateTime. Lastly, the table position (i.e., a production cell within a machine) has a different name in the process and the product dataset; both are renamed to a general name in order to link the datasets. The data is grouped on different granularities to extract variation from the data. These time buckets are also used to create the different features stated in the next subsection.

4.4 Feature selection

Fitting input parameters can be selected from both the product dataset and the process dataset. The product dataset showed a degrading trend in all three measured parameters, which would correlate with the wear of Component X. In the experiments Yang et al. (2007) conducted, the performance of their classification model for fault diagnosis improved with time-domain features. Yang et al. (2007) had a different goal in mind, namely fault detection, whereas this project focuses on prognostic maintenance. Still, it was decided to apply the time-domain features they used to this case, with the intention of obtaining a more accurate prediction model. The input for the time-domain features is based on different time granularities (i.e., daily, 8-hourly, hourly, quarter) and on the average value of the parameters of the product dataset, see Table 4.2.
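As an illustration, such time-domain features can be computed from one window of measurements; a minimal numpy sketch (illustrative only, not the project's implementation, with the symbols of Table 4.2 spelled out as dictionary keys):

```python
import numpy as np

def time_domain_features(window):
    """Compute the Table 4.2 time-domain features for one window of values."""
    x = np.asarray(window, dtype=float)
    mean = x.mean()
    rms = np.sqrt(np.mean(x ** 2))            # root mean square
    smr = np.mean(np.sqrt(np.abs(x))) ** 2    # square mean root
    peak = np.max(np.abs(x))                  # peak value
    return {
        "mean": mean,
        "var": np.mean((x - mean) ** 2),
        "max": x.max(),
        "min": x.min(),
        "rms": rms,
        "smr": smr,
        "peak": peak,
        "skewness": np.mean((x - mean) ** 3),
        "kurtosis": np.mean((x - mean) ** 4),
        "S_factor": rms / mean,
        "C_factor": peak / rms,
        "I_factor": peak / mean,
        "L_factor": peak / smr,
    }

features = time_domain_features([1.0, 2.0, 3.0, 4.0])
```

Applied per parameter over rolling windows of the chosen granularity, this yields the feature set used as model input.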
Table 4.2 Time-domain features

Mean: x̄ = (1/N) Σ_{t=1}^{N} x(t)
Variance: x_var = (1/N) Σ_{t=1}^{N} (x(t) − x̄)²
Square mean root: x_r = ( (1/N) Σ_{t=1}^{N} |x(t)|^{1/2} )²
Root mean square: x_rms = ( (1/N) Σ_{t=1}^{N} x(t)² )^{1/2}
Max value: x_max = max(x(t))
Min value: x_min = min(x(t))
Peak-peak value: x̂ = max(|x(t)|)
Skewness: x_skew = (1/N) Σ_{t=1}^{N} (x(t) − x̄)³
Kurtosis: x_kur = (1/N) Σ_{t=1}^{N} (x(t) − x̄)⁴
S-factor: S = x_rms / x̄
C-factor: C = x̂ / x_rms
I-factor: I = x̂ / x̄
L-factor: L = x̂ / x_r

4.5 Data reduction

As mentioned before, all data entries that could not be linked to a production period (i.e., the period between two planned maintenance moments) were removed. Also, several cycles were not included because of limited production, numerous incidents, or no proven trend; see Table 4.1. Finally, the parameter REMARK has been removed from the process dataset, because it contains only empty values.

4.6 Conclusion data preparation

The goal of this section was to prepare the data for the modeling phase. A more in-depth overview has been given of the available data. Data preparation helps to answer the sub-question Q-2: What are the characteristics of the current situation? The maintenance moments are revised, and time-domain features are added. The parameters in the product dataset show a trend and are potentially usable as a health indicator. In Appendix D, a snapshot of the different granularities is shown. Before the different granularities are made, the raw data consists of 582,832 rows and 25 columns (i.e., features).

Part IV – Modeling

5. Experimental setup

As noted in section 2, prognostics can be divided into three phases (i.e., four if data monitoring is included). These three prognostic steps are: health indicator (HI) construction, health stage (HS) division, and RUL prediction. This section is structured based on those three key steps.
5.1 Health indicator construction

For selecting a fitting HI for the case of the company, we first evaluate which kind of HI fits the situation best. Abid et al. (2018) describe HIs in two categories: physical health indicators and virtual health indicators. Physical HIs are based on a single feature, such as raw data gathered from sensors, a residuals-based feature, or a time-domain or time-frequency feature extracted from data measured by monitoring sensors. In the case of complex degradation, it is hard to find one feature sensitive to those degradations. To tackle this problem, features can be combined in order to exploit their complementarity. However, during the data exploration, no form of degradation was found except for the parameters in the product dataset. Because the degradation in those parameters was gradual, we make use of a physical HI. To evaluate possible health indicators, monotonicity and trendability are considered, as suggested by Abid et al. (2018). Nguyen et al. (2021) also mention that monotonicity and trendability are the only factors that can evaluate an HI when no run-to-failure data is available. Monotonicity is used to evaluate the negative or positive trend of the health indicator. It can be measured by the absolute difference between the fractions of positive and negative derivatives of the HI, as indicated in Equation 5.1.

Equation 5.1 Formula for monotonicity

M = | (No. of dH/dx > 0)/(n − 1) − (No. of dH/dx < 0)/(n − 1) |,  M ∈ [0, 1]

where dH/dx represents the derivative of the HI and n represents the number of observations. An M approaching 1 represents a higher monotonicity of the degradation.

Trendability is related to time and represents the correlation between the degradation trend and the operating time of Component X.
Trendability is calculated with Equation 5.2:

Equation 5.2 Formula for trendability

R = [ n Σ_{i=1}^{n} x_i t_i − (Σ_{i=1}^{n} x_i)(Σ_{i=1}^{n} t_i) ] / √( [ n Σ_{i=1}^{n} x_i² − (Σ_{i=1}^{n} x_i)² ] [ n Σ_{i=1}^{n} t_i² − (Σ_{i=1}^{n} t_i)² ] ),  R ∈ [−1, 1]

R represents the correlation coefficient between the indicator x and the time index t. When R approaches 1, the HI has a strong positive linear correlation with time.

In subsection 3.4, the raw parameters of the product dataset (i.e., ROD_11, ROD_12, and ROD_13) were found to be monotonic and to have a trend. When the results were discussed with domain experts from the company, a potential reason was given for a possible correlation between the parameters of the product dataset and the condition of Component X. Due to the position of Product Z in the production cell, (removed due to company sensitive information regarding the indirect relationship). This explains the difference between a new Component X and a worn-out Component X. We will test this hypothesis in the next subsection by calculating the correlation of the HI to the RUL. However, to find the best single parameter to use as a health indicator, we want to quantify the results. We also took the time-domain features based on the parameters of the product dataset into consideration. The monotonicity and trendability results for the various HI options, averaged per production cycle, are shown in Table 5.1. In Table 5.1, we can see the effect on monotonicity and trendability of changing the data granularity. The different time-domain features are calculated based on various rolling windows. It can be seen that a higher-level data granularity (e.g., daily) results in a higher score on monotonicity and trendability. During production, the differences in measured data are quite large (i.e., much variation); when the data is averaged, there is less noise and the HI degrades more stepwise.
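Equations 5.1 and 5.2 can be computed in a few lines; an illustrative numpy sketch on a hypothetical, strictly decreasing HI series:

```python
import numpy as np

def monotonicity(hi):
    """Equation 5.1: |#(positive derivatives) - #(negative derivatives)| / (n - 1)."""
    d = np.diff(np.asarray(hi, dtype=float))
    return abs(int((d > 0).sum()) - int((d < 0).sum())) / len(d)

def trendability(hi):
    """Equation 5.2: Pearson correlation between the HI and its time index."""
    hi = np.asarray(hi, dtype=float)
    return np.corrcoef(hi, np.arange(len(hi)))[0, 1]

hi = [7.4, 7.3, 7.1, 7.0, 6.8, 6.5]   # hypothetical, strictly decreasing HI
m = monotonicity(hi)    # 1.0: every step goes down
r = trendability(hi)    # close to -1: strong negative linear trend
```

Evaluating these two functions per production cycle and averaging gives the kind of scores reported in Tables 5.1 and 5.2.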
Based on the evaluation criteria, the mean (i.e., based on a sliding window) outperforms the other time-domain features in monotonicity and trendability. AVG_ROD is used as a baseline. This parameter holds the average values of the parameters of the product dataset, while the mean takes a sliding window over this parameter.

Table 5.1 Testing monotonicity and trendability on potential HI's

             Daily            8-Hourly         Hourly           Quarter
             Mon     Trend    Mon     Trend    Mon     Trend    Mon     Trend
Raw          0.005   -0.245   -       -        -       -        -       -
Avg. ROD     0.428   -0.771   0.161   -0.686   0.028   -0.614   0.016   -0.484
Mean         0.534   -0.829   0.205   -0.773   0.035   -0.642   0.017   -0.527
Var          0.283    0.208   0.109    0.197   0.033    0.136   0.016    0.084
Max          0.167   -0.140   0.116   -0.243   0.044   -0.272   0.035   -0.265
Min          0.382   -0.762   0.175   -0.674   0.049   -0.539   0.035   -0.461
SMR          0.514   -0.827   0.224   -0.774   0.038   -0.631   0.017   -0.512
RMS          0.449   -0.800   0.200   -0.737   0.031   -0.590   0.020   -0.474
Peak-peak    0.257    0.534   0.138    0.301   0.050    0.090   0.040    0.051
Skewness     0.178    0.047   0.097   -0.022   0.029   -0.020   0.020   -0.016
Kurtosis     0.169    0.116   0.083    0.041   0.037    0.007   0.015   -0.002
S-factor     0.248    0.567   0.114    0.438   0.033    0.226   0.169    0.129
C-factor     0.424    0.743   0.160    0.575   0.055    0.317   0.039    0.218
I-factor     0.418    0.753   0.155    0.590   0.055    0.331   0.042    0.226
L-factor     0.419    0.754   0.165    0.592   0.055    0.334   0.039    0.229

Based on the values found for monotonicity and trendability, a deeper analysis was conducted to see the impact of adjusting the data granularity in combination with different sliding windows. The results of this analysis are shown in Table 5.2. The higher-level the granularity, the better the monotonicity scores; at lower-level granularity, the monotonicity decreases due to the variation in the data, because many more data points are considered. Also, with a wider sliding window, the trendability increases for lower-granularity data. To conclude the analysis, the mean over a sliding time window is the best fit due to its high trendability over time.
However, the monotonicity scores low at lower-level granularity, and this can cause inaccuracy during a prediction of the RUL. To test the suitability of the chosen HI, its correlation to the RUL is calculated in the next subsection.

Table 5.2 Impact of granularity and sliding time window on the mean

                            Sliding time window
Granularity   Daily            8-Hourly         Hourly           Quarter
Daily         Mon: 0.534       Mon: 0.486       Mon: 0.421       Mon: 0.425
              Trend: -0.829    Trend: -0.802    Trend: -0.759    Trend: -0.763
8-hourly      Mon: 0.402       Mon: 0.205       Mon: 0.165       Mon: 0.152
              Trend: -0.817    Trend: -0.773    Trend: -0.668    Trend: -0.671
Hourly        Mon: 0.243       Mon: 0.095       Mon: 0.035       Mon: 0.034
              Trend: -0.839    Trend: -0.797    Trend: -0.642    Trend: -0.605
Quarter       Mon: 0.151       Mon: 0.056       Mon: 0.020       Mon: 0.017
              Trend: -0.847    Trend: -0.804    Trend: -0.645    Trend: -0.527
Raw data      Mon: 0.093       Mon: 0.054       Mon: 0.029       Mon: 0.021
              Trend: -0.854    Trend: -0.810    Trend: -0.659    Trend: -0.522

5.2 Health stage division

In the previous subsection, we chose the HI that best reflects the condition of Component X (i.e., the system health). When degradation occurs, the HI presents a decreasing trend. The next step in the process of predicting the RUL is the HS division. Normally, degradation can be detected by dividing the HI into two or multiple stages using a threshold according to the degradation trend. Dividing the health stage using a threshold is widely used in the literature for this task (Abid et al., 2018). Abid et al. (2018) offer solutions for determining an initial threshold when no fault history is known; they assume an arbitrary threshold. This arbitrary initial threshold can be provided by an expert or determined based on the HI calculated from the training data. After contact with domain experts of the operational process at the company, no initial threshold for a parameter could be set based on expert knowledge. Therefore, the only option available is to determine a somewhat arbitrary initial threshold. Abid et al.
(2018) use a Gaussian-distribution-based value because z-normalization was applied in their preprocessing method. However, there are more options, namely:
• A static lower bound (i.e., an arbitrary number)
• A relative lower bound (i.e., based on a degradation percentage relative to the first few measurements of a cycle)
• A lower bound based on statistics (i.e., the same method as Abid et al. (2018))
When the results are generated in section 7, these three methods to determine a lower bound are compared, to analyze which lower bound fits the data best in terms of accuracy. In addition, a correlation has been calculated between the RUL and the HI. The results were 0.945, 0.35, and 0.948 for, respectively, the static, relative, and statistical way of determining a lower bound. The options were calculated based on hourly data, using a rolling window of one day to reduce the variation of each data point. The high correlation between the RUL and the HI confirms that the average values of the product parameters can predict the condition of Component X. To conclude, this method can be used even if run-to-failure data does not exist, and it is expected that the threshold can be updated when run-to-failure data accumulates, as mentioned by Abid et al. (2018). An HI constructed in an unsupervised manner can capture the degradation in a system: the HI decreases as the system degrades. The lower bound can be replaced when run-to-failure data becomes available.

5.3 RUL prediction

As stated, we use degradation modeling based on a data-driven approach. In the literature review (see section 2), various potential prediction models were discussed; these are implemented here. To ensure satisfactory results, an experimental setup has been drafted; for a graphical representation, see Figure 5.1.

Figure 5.1 Experimental setup of the RUL prediction

At first, the preprocessing step will be reviewed to guarantee that only valuable and useful information is used for the prediction model.
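The three lower-bound options listed above can be expressed in a few lines; an illustrative numpy sketch (the HI values, the 15% degradation, and the three-standard-deviations factor are illustrative choices, not the tuned settings of this project):

```python
import numpy as np

hi = np.array([7.4, 7.3, 7.1, 7.0, 6.8, 6.5, 6.3, 6.1])  # hypothetical HI series

# Option 1: static lower bound, an arbitrary fixed number.
static_lb = 6.0

# Option 2: relative lower bound, e.g. 15% degradation with respect to the
# first few "healthy" measurements of the cycle.
relative_lb = 0.85 * hi[:3].mean()

# Option 3: statistics-based lower bound, e.g. several standard deviations
# below the mean (Gaussian-inspired, in the spirit of Abid et al., 2018).
statistic_lb = hi.mean() - 3.0 * hi.std()
```

Whichever option is chosen, the resulting bound plays the role of the failure point until real run-to-failure data becomes available to replace it.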
Data is integrated, cleaned, transformed, and removed in this preprocessing step according to section 4. After that, the data is split into a training set to determine the best parameters and a test set to predict the RUL. This split is done based on a 70%-30% ratio over full production cycles. A production cycle is the period on a specific machine between preventive maintenance moments. The feature selection is done by comparing the importance of each feature. After the RUL prediction, the accuracy results will be described. The split into the train and test set is shown in Figure 5.2.

In Figure 5.2, the different trajectories are shown based on the health indicator. A relative lower bound is used for this graph, based on 15% degradation relative to the first 10 hours of data. The health indicator is the average result of the parameters from the product dataset. Based on the test set, a random point in time is selected to estimate the RUL. This method is chosen to mimic a real-life application, where at some point you want to see how many more products can be produced until Component X needs to be replaced.

Figure 5.2 Trajectories of train and test set with a cut-off point

To determine the effectiveness of the different prediction models, an evaluation must be made using suitable and meaningful metrics. Based on the literature review conducted in section 2, the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) are widely used in the literature for RUL evaluation against the real RUL. For the evaluation, the failure point is set based on the lower bound, and the real RUL is compared to the predicted RUL. During the process so far, different choices and options were discussed without knowing their effect on the accuracy of the RUL prediction. Therefore, we want to evaluate different variations that could be based on the current data.
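The two evaluation metrics can be written down directly. A small sketch, with the caveat (relevant in section 7) that MAPE is undefined when the true RUL is 0:

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    # Root Mean Square Error: penalizes large absolute errors.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred) -> float:
    # Mean Absolute Percentage Error: undefined for y_true == 0, which is
    # why RMSE is preferred for cycles whose real RUL reaches 0.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

print(rmse([100, 50, 10], [110, 45, 12]))
print(mape([100, 50, 10], [110, 45, 12]))   # the small true value dominates
```

The same absolute error thus weighs much heavier in the MAPE when the true RUL is small, which explains the large MAPE values reported later.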
Various experiments are prepared, namely:
• Determining the best RUL prediction models for the case of the company
• The determination of lower bounds for the RUL prediction
• The granularity of the data versus the sliding time window for the RUL prediction
• Filtering the data on machine level versus production cell level for the RUL prediction
• The impact of stationary process data on the RUL prediction
To determine the best prediction models, two different decision tree methods are implemented along with two more complex neural networks, namely a CNN and an LSTM. As a baseline, we compare those results against a simple linear regression based on the HI. To find the best determination of lower bounds, three options are compared on the best-performing prediction models. The three options are a static lower bound, a relative lower bound, and a more statistically based lower bound based on a Gaussian distribution after standardizing the data. Based on the best prediction model with the best-functioning lower bound, different settings within the data preprocessing are compared based on data granularity and a sliding time window. The different data granularities and sliding time windows can be seen in Table 5.2. Second to last, another adjustment is made within the data preprocessing: the data is filtered at a position level within the machine (i.e., production cell) rather than at the machine level, in order to compare the accuracy based on the same analysis. However, the company has stated that maintenance of all production cells is preferably performed simultaneously per machine because of efficiency and the associated costs. Lastly, the influence of adding additional features is evaluated. The variables based on the process data were initially not considered due to their stationary behavior; additionally, no trends were found in the data understanding phase when performing the Mann-Kendall trend test.
Before we perform all the tests, all used prediction models are briefly discussed in the next section. Where needed, the method by which the data is standardized or normalized is explained. Lastly, hyperparameter optimization is addressed to achieve the optimal results for each specific prediction model.

6. Modeling

In this section, all the models applied to the RUL prediction of Component X are briefly explained. First, the general concept of each prediction model is described. Following this, we discuss specific tunings within the respective prediction model. We also mention whether standardization or normalization is applicable for the prediction model. Lastly, we mention how parameter optimization has been done for each respective prediction model.

6.1 Linear Regression

Simple linear regression is a statistical model, widely used in ML regression tasks, based on the idea that the relationship between two variables can be explained by Equation 6.1:

Equation 6.1 Simple linear regression
$y_i = \alpha + \beta x_i + \epsilon_i$

where $\epsilon_i$ is the error term, and $\alpha$ and $\beta$ are the parameters of the regression: $\beta$ represents the variation of the dependent variable with respect to the independent variable, and $\alpha$ represents the value of the dependent variable when the independent one is equal to zero (i.e., the intercept). Using simple linear regression, we want to find the parameters $\alpha$ and $\beta$ that minimize the squared error terms; this procedure is called Ordinary Least Squares (OLS). For the company, we use simple linear regression as a baseline to compare against the more complex machine learning algorithms. The RUL is the dependent variable, while the HI is the independent variable. We train a linear regression function based on the training data, as shown in the previous section. For finding the optimal parameters of the linear regression, we use the Ordinary Least Squares (OLS) method.
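As an illustration only (fully synthetic numbers, not the company's data), the baseline amounts to:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
# Synthetic stand-in for the real data: the HI decreases roughly linearly
# with use, so the RUL (here in arbitrary product units) grows with the HI.
rul_train = rng.uniform(0.0, 500.0, 200)
hi_train = 6.0 + 0.004 * rul_train + rng.normal(0.0, 0.05, 200)

# OLS fit of the RUL (dependent) on the HI (independent), as described above.
model = LinearRegression().fit(hi_train.reshape(-1, 1), rul_train)
print("intercept:", model.intercept_, "slope:", model.coef_[0])

# Predicting the RUL for a newly observed HI value:
print(model.predict(np.array([[6.5]])))
```

The fitted slope and intercept in the thesis are on the (anonymized) product scale; the numbers here only illustrate the mechanics.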
After that, we use new data (the test set) to validate the fitted linear function and measure its performance based on the RMSE and MAPE.

6.2 Decision Tree

Patil et al. (2018) described a decision tree as the principle of simple decision-making rules worked out in a flow-chart form to get the desired output. In decision tree regression, various subsets of the existing dataset are created to form decision nodes and leaf nodes. Decision nodes represent features, and leaf nodes represent a decision. Because we focus on decision tree regression, the target is a numerical value (i.e., the RUL value). The input is the feature set, while the output is a decision tree whose topmost decision node, called the root node, corresponds to the best predictor. We use two decision tree methods: Gradient Boosted Regression Tree and Random Forest.

6.2.1 Gradient boosted regression tree (XGBoost)

A gradient boosted regression tree (e.g., XGBoost) is an iteratively accumulative decision tree algorithm. The prediction model accumulates the results of multiple decision trees into the final prediction output by establishing a group of weak learners. Gradient boosted regression trees are ensemble learners like Random Forest, which use decision trees as base learners, but they differ in the following manners according to Singh et al. (2019): (1) Random Forest is an ensemble of low-bias, high-variance deep decision trees, whereas boosted trees use high-bias, low-variance shallow trees as base learners; (2) in Random Forest, base decision trees grow in parallel, whereas in gradient boosted trees this is done sequentially. Boosting is a directed search in which, at each iteration, a new tree learns the gradient of the residuals between the target values and the currently predicted values. Gradient boosted trees use gradient descent based on these learned gradients.
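A hedged sketch of this setup on toy data, using scikit-learn's GradientBoostingRegressor as a stand-in for the XGBoost library used in the thesis. The grid over the learning rate (i.e., eta) and max depth, and the early-stopping window of 10 rounds, mirror the optimization described in this section:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                              # toy features (HI + derived features)
y = 100.0 - 30.0 * X[:, 0] + rng.normal(0.0, 5.0, 300)     # toy RUL target

# n_iter_no_change=10 plays the role of the 10-round early-stopping window;
# the thesis uses 10-fold CV, cv=3 keeps this sketch fast.
grid = GridSearchCV(
    GradientBoostingRegressor(n_estimators=500, n_iter_no_change=10, random_state=0),
    param_grid={"learning_rate": [0.05, 0.1, 0.3], "max_depth": [3, 5, 7]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
```

Early stopping caps the number of boosting rounds once validation performance stalls, so the 500-tree budget is an upper bound rather than the number of trees actually built.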
For optimizing the results, we use XGBoost, a Python module for gradient boosted regression tree learning. A grid search strategy is used for the eta and max depth. The eta is the step size shrinkage used in the update to prevent overfitting, while the max depth is the maximum depth of a tree (i.e., the amount of decision nodes). We also make use of early stopping rounds to find the optimal number of boosting rounds; the number of boosting rounds corresponds to the number of trees to build. With early stopping, if the performance has not improved for n rounds, the training is stopped, and the best number of boosting rounds is stored. We use a window of 10 rounds. On top of that, we use 10-fold cross-validation to average the performance on the training data and eliminate coincidences. No data normalization or standardization is needed for the gradient boosted regression tree to work.

6.2.2 Random Forest

Random Forest is a supervised learning algorithm that uses ensemble methods (bagging) to solve both regression and classification problems. Random Forest regression is the assembly of various decision tree regressors, combined as an ensemble whose predictions are averaged to find the best prediction for the RUL. We focus on the regression models because we want to predict the RUL of Component X. A graphical display of a Random Forest is shown in Figure 6.1.

Figure 6.1 Graphical display of Random Forest

Generally, Random Forest (regression) is used for noisy datasets, as it increases the model's bias and prevents overfitting (Patil et al., 2018). Patil et al. (2018) also stated that a small variation in the dataset does not make the model unstable, due to the ensemble of trees. The randomized sampling of features ensures that noisy features contribute minimally to the prediction of the RUL.
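A corresponding sketch for the Random Forest on toy data, using scikit-learn's RandomForestRegressor. The grid over the number of estimators and max_features matches the optimization described in this section, except that newer scikit-learn versions replace the old "auto" option with None:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                              # toy feature matrix
y = 80.0 - 25.0 * X[:, 0] + rng.normal(0.0, 4.0, 300)      # toy RUL target

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 250],
                "max_features": ["sqrt", "log2", None]},   # None ~ the old "auto"
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
```

As in the thesis, no scaling is applied first: tree splits are invariant to monotone transformations of the features.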
The fundamental concept behind Random Forest regression is that a large number of uncorrelated models operating as a committee will outperform any individual constituent model (i.e., a single decision tree). As for the XGBoost algorithm, we also make use of a grid search for the hyperparameter optimization of the Random Forest. However, this is done for the number of estimators (i.e., the number of trees that are built before the predictions are averaged) and the max features. Max features defines how many features should be considered when looking for the best split. Similar to the XGBoost method, we use 10-fold cross-validation to average the results on the training data. Again, no standardization or normalization is needed for the Random Forest to work.

6.3 Convolutional neural network

Convolution is mainly known as a technology in computer vision, such as image recognition or object detection. Chen et al. (2020) describe the general structure based on Figure 6.2. This figure shows two significant characteristics (i.e., spatial weight sharing and a local receptive field). Those characteristics are realized by alternately stacking convolution layers and pooling layers. In the convolution layers, the network convolves multiple kernels or filters with the input data to generate feature maps, after which pooling layers are applied to aggregate the features into significant abstract features (Chen et al., 2020).

Figure 6.2 General structure of a convolutional neural network as described in Chen et al. (2020)

Generally, a CNN is employed on two-dimensional data. However, for our application, the input data is one-dimensional: we want to see the relationship between the health indicator and the degradation of Component X. The one-dimensional data is represented as $x = [x_1, x_2, \ldots, x_N]$, where $N$ denotes the length of each sample.
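The convolution and pooling operations just described can be sketched in numpy for a toy 1-D signal. This illustrates the two operations only, not the network used in the thesis:

```python
import numpy as np

def conv1d_relu(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    # Slide a filter of length K_L over x and apply a ReLU activation.
    K_L, N = len(w), len(x)
    z = np.array([w @ x[i:i + K_L] + b for i in range(N - K_L + 1)])
    return np.maximum(z, 0.0)

def max_pool1d(z: np.ndarray, P_L: int, s: int) -> np.ndarray:
    # Take the max over windows of size P_L with stride s.
    return np.array([z[j:j + P_L].max() for j in range(0, len(z) - P_L + 1, s)])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
z = conv1d_relu(x, w=np.array([0.5, 0.5]), b=0.0)
print(z)                          # 7 - 2 + 1 = 6 learned features
print(max_pool1d(z, P_L=2, s=2))  # pooled: reduced feature dimension
```

The pooling step shows how the feature dimension shrinks while the locally dominant responses are kept.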
The convolutional operation in each feature map can be expressed as Equation 6.2:

Equation 6.2 Convolutional operation for each feature map
$z_i = \varphi(w^{T} x_{i:i+K_L-1} + b), \quad i = 1, 2, 3, \ldots, N - K_L + 1$

where $w$ is the filter with length $K_L$; the symbols $b$ and $\varphi$ denote the bias and the activation function, respectively. The activation function is selected as a rectified linear unit (ReLU). The output $z_i$ can be seen as a learned feature with respect to the input sample $x$. Pooling is another operation of a CNN. Convolution layers are used to generate features. Because of those convolution layers, the input signal is considered locally stationary in this research, indicating that extracted features that are useful in one local, short-time window can also be valuable in other window regions. Pooling is used to summarize the outputs of adjacent groups of units in the same feature map. It remarkably reduces the feature dimension and over-fitting. The features generated by the convolution network are described as $z = [z_1, z_2, \ldots, z_M]$. Then, the pooling function is defined as Equation 6.3:

Equation 6.3 Pooling function for CNN
$F = \{\max\{z_{j:j+P_L-1}\} \mid j = s(i-1) + 1,\ i = 1, 2, 3, \ldots\}$

where $P_L$ and $s$ denote the window size and stride of the pooling, respectively. In our case, the CNN extracts local spatial features in a sliding-window way and cannot encode time-series information. Chen et al. (2020) indicate that deep neural networks (DNNs) are hard to train, which can be alleviated by normalization. Normalization ensures a stable distribution of activation values and further reduces the internal covariate shift generated by the change of network parameters. Subsequently, it allows a higher learning rate to be deployed and acts as a regularizer. Normalization is done on the training dataset based on a min-max scaler, as given in Equation 6.4.
Equation 6.4 Min-max normalization formula
$x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min}}$

Batch normalization also helps against over-fitting to a certain degree (Chen et al., 2020). On top of that, the performance of a CNN can be optimized by using different layers. We use a relatively simple CNN because of the simple linear degradation in the data. Subsequently, the number of epochs, the learning rate, and the batch size can be changed. Because optimizing a CNN is more of an art than a science, the best adjustments are made using a trial-and-error method. On top of that, we use the Adam optimizer, which provides an optimization algorithm that can handle sparse gradients on noisy problems, to minimize the RMSE of the CNN.

6.4 Long short-term memory

A long short-term memory network is one of the advanced structures among recurrent neural networks (RNNs), which is often used to deal with time-series tasks (Chen et al., 2020). In contrast to feed-forward neural networks, LSTM networks have memory capability and enable the collected information stream to continue to flow inside the network. LSTM networks can link previous information to the present time, enabling the prediction model to use historical status information from sequence data to decide the current state of the equipment. LSTM networks contain four core elements: the cell state, the forget gate, the input gate, and the output gate. Those four components are shown in Figure 6.3.

Figure 6.3 Structure of a long short-term memory network as described in Chen et al. (2020)

The three gates (i.e., the forget gate, input gate, and output gate) update the information of the cell state, which lets the LSTM remove or add information automatically. The forget gate determines what information the LSTM is intended to remove from the previous cell state $C_{t-1}$.
The mathematical expression is shown in Equation 6.5:

Equation 6.5 Forget gate
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

$h_{t-1}$ and $x_t$ represent the hidden state at time $t-1$ and the input feature at time $t$, respectively. $W_f$ and $b_f$ are the weight parameters and bias term of the forget gate layer. $\sigma$ is the sigmoid function. The output of this gate is $f_t$, whose value ranges from 0 to 1; 0 indicates that the information is fully dropped, while 1 indicates that the information of the cell state is totally retained (Chen et al., 2020). The input gate decides what newly learned information the LSTM is going to add to the current state $C_t$. The mathematical expression of the learned information is shown in Equation 6.6:

Equation 6.6 Learned information as input for the input gate
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

It is calculated by mapping the previous hidden state and current input features in a non-linear way. Subsequently, the input gate selects the significant part of this information (Chen et al., 2020). The input gate is expressed as the following mathematical expression, see Equation 6.7:

Equation 6.7 Input gate
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

If the two equations are combined (i.e., Equation 6.6 and Equation 6.7), the current cell state $C_t$ is computed as the following term, see Equation 6.8:

Equation 6.8 Update of the current cell state
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

The current cell state is thus composed of two terms, $f_t \odot C_{t-1}$ and $i_t \odot \tilde{C}_t$. $f_t \odot C_{t-1}$ represents the filtered information after discarding, and $i_t \odot \tilde{C}_t$ is the newly generated feature information that is being added. The output gate determines which part of the cell state the LSTM is going to output.
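Read together, Equations 6.5-6.8 amount to the following single-step cell-state update, sketched here in plain numpy with toy dimensions (purely illustrative, not the trained network):

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def lstm_cell_state(x_t, h_prev, C_prev, Wf, bf, Wi, bi, Wc, bc):
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)            # forget gate, Eq. 6.5
    C_tilde = np.tanh(Wc @ z + bc)        # learned information, Eq. 6.6
    i_t = sigmoid(Wi @ z + bi)            # input gate, Eq. 6.7
    return f_t * C_prev + i_t * C_tilde   # cell-state update, Eq. 6.8

rng = np.random.default_rng(0)
H, D = 2, 3                               # hidden size and input size
Wf, Wi, Wc = (rng.normal(size=(H, H + D)) for _ in range(3))
bf = bi = bc = np.zeros(H)
C_t = lstm_cell_state(rng.normal(size=D), np.zeros(H), np.ones(H),
                      Wf, bf, Wi, bi, Wc, bc)
print(C_t)                                # new cell state, shape (H,)
```

With a saturated forget gate and a closed input gate, the update reduces to copying the previous cell state unchanged, which is exactly the "fully retained" case described above.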
The final output of the hidden layer $h_t$ is calculated as follows, see Equation 6.9 and Equation 6.10:

Equation 6.9 Output gate
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

Equation 6.10 Output of the hidden layer
$h_t = o_t \odot \tanh(C_t)$

Based on all the equations, the non-linear gates adaptively regulate the incoming and outgoing information of the LSTM network. Moreover, the hidden state $h_t$ contains all historical state information from time 0 until the current time (i.e., time $t$). This time-series information is helpful to construct an accurate health indicator (Chen et al., 2020). Standardization of the data is needed to run the LSTM network, using the following formula, see Equation 6.11:

Equation 6.11 Standardization of data
$x_{scaled} = \dfrac{x - \bar{x}}{\sigma}$

Batch normalization also helps against over-fitting to a certain degree (Chen et al., 2020). As with the CNN, the performance of an LSTM network can be optimized by using different layers; here we also use a simple LSTM model, for the same reason mentioned for the CNN. As with the CNN, a trial-and-error method is used to find fitting epochs, learning rate, and batch size. The Adam optimizer is also applied to minimize the RMSE of the LSTM network.

A characteristic of an LSTM network is that it can use time window processing, shown in Figure 6.4. The window is a sequence of time-series data with length n. Based on the window length, the LSTM network takes the datapoints of each feature (i.e., parameter) up to n-1 timesteps back in time. Taking sequence data into account offers the possibility that the prediction model can include fluctuations within this sliding time window. The sliding time window moves forward with k timesteps, also called a shift. The sliding window principle is also used in the 1D-CNN network, but not for XGBoost and Random Forest. For our prediction model, we use a shift of 1 step and a time window of 25 timesteps for hourly data.

Figure 6.4 Time window processing

Part V – Results
7. Results

In this subsection, two sub-questions are answered, namely Q-5: How can we optimize each predictive model in terms of accuracy? and Q-6: What is the performance of the best-performing prediction model based on accuracy? Different prediction models were examined in the previous section, with optional normalization or standardization of the data and hyperparameter optimization; this answers Q-5 in part. This section examines the best settings of the data in terms of lower bounds, data granularity and sliding window, the included features, and the way the data is filtered, to see which prediction model performs optimally. When we know the best settings, we also have enough information to answer sub-question Q-6. First, we test the different prediction models against each other to choose the best prediction model. Then, different settings of the lower bounds are considered. Subsequently, different settings of the data granularity and sliding window are compared. Next, the data is filtered on the production cell level to see if it affects the accuracy of the prediction model. Furthermore, process data is added to the model to see if any useful features improve the prediction model's performance.

7.1 Comparison of different prediction models for RUL

For comparing the different machine learning algorithms, the choice is made to use a data granularity of an hour to get relatively quick answers in terms of computational power for each prediction model. We use a sliding window based on a time window of a day, because Table 5.2 shows the highest monotonicity and trendability for this setting at an hourly granularity. For the implementation, we use a relative lower bound, as it is the simplest, fairly easy to implement, and at first glance the best to use.

Linear Regression: The easiest prediction model to implement is a simple linear regression. The data was divided into a train and test set based on a 70%-30% ratio over whole production cycles.
We take an evenly distributed sample for this split over all the machines to get balanced input data evenly distributed over the approved production periods. When we train a linear regression based on the HI (i.e., the average values of the product dataset) on the training set, a global formula is discovered with an intercept of -306,473 products and a coefficient of 54,920 products for each (anonymized) unit gained. The linear regression is also graphically shown in Figure 7.1.

Evaluating its results, an RMSE of roughly 28 production hours (RMSE in products known but anonymized) is realized; when calculating the MAPE, a result of 85.66% is realized. The difference between the MAPE and RMSE is that the MAPE penalizes inaccurate predictions more heavily when the true value is small. Before we can say anything meaningful about the results of the linear regression, the results of the other prediction models will first be discussed.

Figure 7.1 Linear regression of health index and remaining useful lifetime

XGBoost: As for the linear regression, we used the same split for the train and test set. We only make one prediction per production cycle; this also applies to the other prediction models. We used a lower bound of 15% degradation of the health indicator (HI) based on the first 10 data points (i.e., the first 10 production hours in a production cycle). A constraint is added that a machine must be active for 24 production hours to gather enough input data. This leaves a timeline between the first 24 production hours and the time the lower bound is reached. A random moment is taken between those points, and based on that moment, we see whether we can predict how many more products can be manufactured before the lower bound is reached. When generating results, the hyperparameter optimization regarding the eta and max depth of the trees, discussed in the previous section, gave an optimal depth of five layers and an eta of 0.1, respectively.
After running the XGBoost prediction model, an RMSE of 25 production hours (RMSE in products known but anonymized) was found for 13 test periods; this result is an improvement compared to the linear regression. In terms of MAPE, 122.23% is realized, which is higher than with the linear regression. The reason for this can be seen in Figure 7.2: what can be observed is the relatively low true RUL value of periods 9, 10, 11, 12, and 13. This fluctuation leads to a higher MAPE compared to the average values of the linear regression.

Figure 7.2 Predicted RUL values versus the real RUL values performed by XGBoost

For now, it is too early to say anything about prediction performance. First, we will work out the other decision tree algorithm together with the two neural networks.

Random Forest: To get a fair comparison, the same data split is used for the Random Forest algorithm as for the XGBoost algorithm. However, for the Random Forest algorithm, other hyperparameters must be optimized. For the Random Forest algorithm, we use a grid search to evaluate the number of estimators (i.e., the number of trees to build the forest) and the number of parameters used per tree (e.g., auto, sqrt, or log2). When the grid search is performed, the optimal number of estimators for the training dataset is 250, while the maximum number of features is determined based on the square root of the total number of input parameters. After the hyperparameter optimization, an RMSE of 22 production hours is reached (RMSE in products known but anonymized), while a MAPE of 78.84% is realized. The result of the prediction model is also shown in Figure 7.3.

What can be observed from this graph is that the predictions are closer to the true RUL for most of the datapoints compared to the graph created by the XGBoost algorithm, hence the better results. However, there is still a gap between the predictions for the test dataset's 10th and 11th production cycles.
Figure 7.3 Predicted RUL values versus the real RUL values performed by Random Forest

CNN: For the 1D convolutional neural network, we use the same split as before. The CNN also has its own interchangeable parameters. Because a CNN is a more complex method and an optimal outcome is difficult to achieve, trial and error is used for the structure of the network, the number of epochs, and the learning rate. The performance, however, is less than that of the other prediction models. For the convolutional neural network used, we used 50 epochs with a stepwise learning rate to get optimal results. When running the CNN, we made use of an extra validation set. When the CNN is trained, the history shown after each epoch is closely followed to see whether the prediction model is learning. When we look at the results, they are worse than those of XGBoost, Random Forest, or even the simple linear regression. An RMSE of 63 production hours is realized (RMSE in products known but anonymized), and a MAPE of 424.13% is achieved, which is distressing. When looking at the training feed, the prediction model improves over time; however, the model overfits. Much time was spent on improving the CNN, but this is the best result that could be found for it. This overfitting probably occurs because the degradation behaves linearly, while the CNN tries to find more complex connections that are not there. CNN networks take the history of an n-number of time points. With this, the network tries to analyze behavior based on the transformation of a parameter over this period, possibly coupled with other parameters. This makes the prediction model assume that, based on the training data, there is a complex connection in the data, which there is not. A decision tree algorithm (i.e., Random Forest) is better to apply in our situation. A graphical result of the CNN is shown in Figure 7.4.
Figure 7.4 Predicted RUL values versus the real RUL values performed by CNN

LSTM: Like all the other prediction models, we use the same train and test split to get a fair comparison. As with the CNN, the LSTM network is more complex than the decision tree methods. For this LSTM network, we also used different settings for the hyperparameters, such as the number of epochs, the learning rate (also stepwise), and the batch size when running the LSTM network. As before, we used an extra validation set based on the training set to see whether the prediction model works. Because the degradation behaves more or less linearly, the decision is made to implement a relatively simple LSTM. Looking at the LSTM network's performance, the model performs similarly to the CNN. Like the CNN, the LSTM looks at an n-number of time points back in time to analyze trends within that interval. Although the LSTM is specialized in time-series data, there are no returning and constant patterns in the data, so the performance of the LSTM model is worse when compared to the decision tree algorithms. The reason for overfitting is the same as with the CNN: the LSTM network tries to find patterns in the window it was given, which are not there. When looking at the RMSE and MAPE, 55 production hours (RMSE in products known but anonymized) and 450.65% are realized, respectively. The results of the LSTM network are graphically shown in Figure 7.5.

Figure 7.5 Predicted RUL values versus the real RUL values performed by LSTM

Conclusion: When comparing all the prediction models, the best results based on the RMSE and MAPE have been found when predicting the RUL with the Random Forest algorithm. For further experiments, we will use the RF algorithm to compare different settings in the data preprocessing. However, it is also interesting to see which parameters were taken into account in the prediction of the RUL.
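Which parameters the Random Forest leaned on can be read off a fitted model via its feature_importances_ attribute. A toy sketch with hypothetical feature names (the real ranking is in Appendix E):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
# Hypothetical names standing in for the real features used in the thesis.
names = ["products_produced", "cycle_number", "min_ROD11_prev_day", "HI_mean"]
X = rng.normal(size=(200, 4))
y = 50.0 - 20.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0.0, 2.0, 200)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in sorted(zip(names, rf.feature_importances_), key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```

The importances sum to 1, so they can be read directly as each feature's relative contribution to the impurity reduction across the forest.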
The main parameters used were the number of products produced until a certain point and the cycle number (based on the specific machine). On a lower level, the minimum value of ROD 11 of the previous day, the minimum value of ROD 13 of the previous day, the SMR of the AVG_ROD, the minimum of the AVG_ROD, and the SMR of ROD 13 were important. For a full analysis of how important each parameter was for predicting the RUL, see Appendix E for a graphical review with a legend. According to the domain experts, on average, when a machine is fully functioning, 2000 products per hour can be fabricated. When we look at the best prediction, an RMSE of around 22 production hours is realized (RMSE in products known but anonymized). Based on Figure 7.3, it is observed that in most cases an overprediction is made. With a safety margin of one production day, it can be predicted when a certain lower limit on the HI is reached.

7.2 Impact of the determination of the lower bounds

The second experiment is based on the determination of various lower bounds. As discussed, three options are compared based on the Random Forest. We compare a static lower bound, a relative lower bound, and a statistical lower bound based on the standard deviation of the training dataset. We select each lower bound such that roughly equal amounts of data are available for all options. For the experiment performed in section 7.1, we used 5571 hours of data. For the static and statistical lower bound, a specific bound is selected that yields roughly the same amount of data. First, the static way was implemented: a lower bound was determined based on the HI. If the AVG_ROD decreases below 6.7 units, we use that as our failure point. With this bound, 5292 hourly data points were used from the four machines. After running a prediction based on the Random Forest algorithm, an RMSE of 25 production hours was realized (RMSE in products known but anonymized).
However, the MAPE was relatively high at 62,847%; the reason behind this extraordinary number is that one of the test production cycles had a real RUL of 0, so for a fair comparison it is better to look at the RMSE. When looking at the important parameters, the HI itself was determined to be the most important feature. Secondly, the statistical way was implemented. First, the HI was standardized, as shown in Equation 6.11. To use a similar number of hourly data points, a lower bound was set at -0.5 for the standardized data. With this lower bound, 5313 hourly data points were used from the four machines. When running the Random Forest algorithm, the performance is significantly lower in comparison with the static lower bound or the relative lower bound. An RMSE of 40 production hours (RMSE in products known but anonymized) is realized, with a MAPE of 26,344%. The reason for this extraordinarily high number is the same as for the static way: one of the test periods had a real RUL of 0. Lastly, the relative way of determining the lower bound is briefly repeated. It was already implemented when comparing the different prediction models in the previous subsection. It resulted in an RMSE of 22 production hours (RMSE in products known but anonymized), while a MAPE of 78.84% was realized; this is the best-performing method of selecting a lower bound for the case of the company.

7.3 Impact of data granularity and sliding time window

The third experiment is based on finding the most suitable settings concerning the data granularity and sliding time window. This optimization is done in the preprocessing step before a prediction is made with the Random Forest algorithm. We compare the same settings for the data granularity and sliding time window that were used to calculate the monotonicity and trendability. The results are expressed in terms of RMSE.
MAPE can get extraordinarily high when the RUL approaches 0. Finding the optimal data granularity and using a sliding time window to create the features can improve the accuracy. If the data granularity is coarse, the Random Forest can easily detect the current health of Component X due to the high monotonicity and trendability; however, the intermediate steps at which the RUL is projected will also be substantial. The trick is to find a granularity at which the Random Forest algorithm can easily deduce the condition of Component X but can still predict as accurately as possible. The results are shown in Table 7.1.

Table 7.1 Comparison of data granularity and sliding time window settings on the performance of the Random Forest (RMSE in production hours)

                     Sliding time window
Granularity      Daily   8-Hourly   Hourly   Quarter
Daily             33       20         13       32
8-hourly          45       30         25       26
Hourly            25       42         18       31
Quarter           40       31         25       31
Raw data          31       21         23       33

What can be observed from Table 7.1 is an improvement in the RMSE compared to the results of the first experiment, which can be found in section 7.1. Looking vertically at Table 7.1, it appears that the hourly sliding window produces the best results. When the data has a granularity of one day, exceptionally good results are obtained with a sliding window based on hourly data. This has to do with the fact that the prediction model then focuses on the production so far, as seen in the most important features during prediction. A note on the results: the outcome can also be influenced by the train/test split and by the point in the test set at which a prediction is made. This is demonstrated for a granularity of one hour and a sliding window of one day.
The prediction was repeated with the same settings as in the first experiment; now the prediction model scores slightly worse based on the RMSE. The conclusion of this experiment is that a sliding window of one hour combined with a data granularity of one hour provides the best results. When a granularity of one day is taken, the prediction is purely based on the production up to a certain point in time, which makes it unreliable.

7.4 Impact of filtering the data on production cell-level

For the second-to-last experiment, we test whether the prediction is more accurate for each individual production cell within the machines. To test this concept, instead of filtering the data per machine, we split the data per production cell within the machine (i.e., a different granularity). The test is relatively simple: instead of one prediction for the machine, we get 30 individual predictions. The time-domain features are also calculated at the production cell level. We made 30 individual predictions at the production cell level; the average of these 30 predictions was an RMSE of 24 production hours (RMSE in products known but anonymized). This was done on a granularity of hourly data and a sliding time window of one hour for creating the HI and time-domain features. Compared to the realized RMSE of 18 production hours at the machine level (RMSE in products known but anonymized), predicting the RUL of an individual production cell is less accurate. A note is that the result may be influenced by the randomness of the train/test split and by the time from which the RUL is predicted.

7.5 Impact of adding process data as features

For the last experiment, we test whether the accuracy of the Random Forest algorithm increases when we extend it with process parameters. As stated in section 5.4, no trend was found for one of the process parameters when performing the Mann-Kendall trend test.
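The Mann-Kendall statistic mentioned here can be computed without external libraries. The sketch below is a minimal version of the S statistic (no tie or variance correction, hence no p-value): positive values indicate an upward trend, strongly negative values a downward trend.

```python
import numpy as np

def mann_kendall_s(series) -> int:
    # Mann-Kendall S statistic: sum of the signs of all pairwise
    # later-minus-earlier differences in the series.
    x = np.asarray(series, dtype=float)
    s = 0
    for i in range(len(x) - 1):
        s += int(np.sum(np.sign(x[i + 1:] - x[i])))
    return s

declining = [10.0, 9.6, 9.1, 8.7, 8.0, 7.4]  # clear downward trend
print(mann_kendall_s(declining))             # -15: every pair decreases

rng = np.random.default_rng(3)
print(mann_kendall_s(rng.normal(0, 1, 50)))  # near 0: no structural trend
```

An S close to 0 relative to the series length is what "no trend was found" amounts to for a process parameter.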
For testing the influence of process data as features, we use a granularity of hourly data combined with a sliding time window of one hour when creating the time-domain features and the HI. A prediction is then made including the process parameters (see Appendix B) on top of the other (time-domain) features. An RMSE of 31 production hours is realized (RMSE in products known but anonymized), indicating that the Random Forest algorithm performs worse when the process parameters are included.

7.6 Conclusion of experiments

To conclude, the Random Forest algorithm performs best in terms of accuracy based on the RMSE and MAPE. The best choice of lower bound is the relative lower bound based on the first 10 hours of production data. With a granularity of hourly data for creating the time-domain features, the Random Forest algorithm performs optimally. The other experiments, on the impact of filtering the data at production cell level and of adding process data as features, did not improve the results. The best RMSE found was 18 production hours (RMSE in products known but anonymized). When the machine normally produces XXXX products per hour, we can predict reaching a lower limit with an accuracy of 18 hours.

8. Conclusion

8.1 Conclusion

At the beginning of this research project, the goal and the research questions were defined. The main research question is: How can the data currently measured within Machine Y help implement a more data-driven maintenance policy for Component X? The approach to answering the main research question is captured in the CRISP-DM methodology, which was used to structure this data-driven maintenance project. The goal of the research project is: Deliver a proof of concept for data-driven maintenance based on the current data available within the production process of Machine Y. It can be concluded that the goal of the research project has been achieved.
Predicting the remaining useful lifetime is possible based on an indirect measurement of parameters of the product. However, some assumptions were made to answer the question. To walk through the process step by step, all the sub-questions are answered first, after which the main question is answered.

Q-1: Which maintenance strategies are there, and what are the characteristics of each policy?

Various maintenance policies are described and analyzed. The policy the company used before was a time-based preventive maintenance policy: every two weeks, unrelated to the number of products produced, the Component X's were preventively replaced. Based on the needs of the company, the knowledge of the process, and the gathered data of Machine Y, a choice was made to implement a prognostic data-driven maintenance policy.

Q-2: What are the characteristics of the current situation?

It can be concluded that the lack of run-to-failure data is the biggest drawback for determining an effective maintenance policy. On top of that, there is no direct measurement on the equipment (e.g., temperature, vibrations), but an indirect relationship that reflects Component X's degradation. Those two aspects were considered while finding an applicable policy. The current maintenance policy is a preventive maintenance policy conducted every two weeks.

Q-3: Based on the available data, what type of data-driven strategy is best and realizable for the situation of Component X?

Based on the current situation described in Q-2, the most fitting method is to work with a data-driven degradation model based on a health indicator. We test the indirect measurements, which are the parameters of the product, against the condition of Component X; this is used as a health indicator. The health indicator was evaluated based on trendability, monotonicity, and correlation with the RUL.
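Common simple definitions of the first two of these criteria can be sketched as follows; this is an illustration of the idea on a synthetic, ideal health indicator, and the thesis may use different exact formulas.

```python
import numpy as np

def monotonicity(hi) -> float:
    # |#positive steps - #negative steps| / (n - 1); 1.0 means strictly monotonic.
    d = np.diff(np.asarray(hi, dtype=float))
    return float(abs(np.sum(d > 0) - np.sum(d < 0)) / len(d))

def trendability(hi) -> float:
    # Absolute linear correlation between the HI and time.
    hi = np.asarray(hi, dtype=float)
    return float(abs(np.corrcoef(hi, np.arange(len(hi)))[0, 1]))

hi = np.linspace(10, 6, 50)  # an ideal, linearly degrading health indicator
print(monotonicity(hi))      # 1.0: strictly decreasing
print(round(trendability(hi), 3))
```

A perfectly linear degradation, as assumed for Component X, scores 1.0 on both criteria; a noisy or flat candidate indicator scores lower.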
To solve the problem of the absence of run-to-failure data, a relative, arbitrary lower bound is used. Because the degradation of Component X indicated by the health indicator behaves as a linear line, we can substitute this arbitrary lower bound once run-to-failure data is gathered.

Q-4: Which prediction models can be applied for data-driven maintenance?

Based on the researched literature, several prediction models from the machine learning domain can be applied for creating a degradation-based model. A comparison is made with a simpler time series model based solely on one parameter, to investigate whether there is explainable behavior in the dataset. We compare that method with more complex machine learning prediction models. The following prediction models are used for predicting the Remaining Useful Lifetime (RUL) of Component X: a Simple Linear Regression, XGBoost, Random Forest, 1D-CNN, and LSTM.

Q-5: How can we optimize each prediction model in terms of accuracy?

To ensure that each prediction model performs to its fullest potential, a deep analysis is first made of how each prediction model works. Subsequently, the input data is set up so that it lends itself to the respective prediction model. This is done by optimizing the data preprocessing, i.e., the data granularity, the feature selection, and the sliding time window. Where applicable, hyperparameter optimization based on a grid search is performed for each model.

Q-6: What is the performance of the best-performing prediction model based on accuracy?

An in-depth comparison is made for each machine learning model against a simple linear regression. The best-performing prediction model is Random Forest, with an RMSE of roughly 18 production hours (RMSE in products known but anonymized). To achieve those results, a data granularity of hourly data is used, together with a sliding time window of one hour.
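The evaluation behind these numbers can be sketched end-to-end on synthetic data. This is a minimal illustration, not the project code: the features, the linear RUL relation, and all constants are assumptions. It also shows why the MAPE explodes when a test cycle's true RUL is (near) zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 5))  # stand-ins for HI / time-domain features
# Synthetic RUL in production hours, decreasing linearly with the first feature.
y = np.clip(200 * (1 - X[:, 0]) + rng.normal(0, 5, 500), 0, None)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

rmse = float(np.sqrt(np.mean((y_te - pred) ** 2)))
# MAPE divides by the true RUL: one cycle with RUL near 0 dominates the average.
mape = float(np.mean(np.abs(y_te - pred) / np.maximum(y_te, 1e-6)) * 100)
print(f"RMSE: {rmse:.1f} production hours, MAPE: {mape:.0f}%")
```

This is why the experiments fall back on the RMSE whenever a test period ends with a real RUL of 0.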
Main question: How can the maintenance be made more data-driven to optimize the current maintenance policy for Machine Y?

Based on the needs of the company, the knowledge of the process, and the gathered data of Machine Y, a choice was made to implement prognostic data-driven maintenance based on degradation. The characteristics that made this project challenging are the absence of run-to-failure data and the indirect measurement of the equipment. The most fitting method is to work with a data-driven degradation model based on a health indicator. The health indicator was evaluated based on trendability, monotonicity, and correlation with the RUL. The correlation between the degradation of the health indicator and the degradation based on the RUL confirms that the indirect measurement can serve as a health indicator for Component X. To solve the problem of the absence of run-to-failure data, a relative, arbitrary lower bound is used. Because the degradation of Component X indicated by the health indicator behaves as a linear line, we can substitute this arbitrary lower bound once run-to-failure data is gathered. Because the degradation is constant, substitution can be applied, and the main focus is on the predictability of the process. After comparing different prediction models and finding the best settings in the data preprocessing, the Random Forest algorithm turned out to be the best in terms of accuracy, with an RMSE of 36,193 products, roughly 18 active production hours (RMSE in products known but anonymized). To ultimately implement the data-driven maintenance policy, run-to-failure data must be collected to find the true point of failure and establish the lower bound based on failure data.

8.2 Recommendations

Different recommendations can be given based on the outcomes of the project. First of all, it is important to generate run-to-failure data in order to extend the life of Component X.
If Component X remains operational for longer in Machine Y, this means fewer maintenance moments and lower maintenance costs. After all, the predictability of Component X has been proven through this project. Second, it might be interesting to focus more specifically on traditional time series models based on the health indicator alone; this might improve the predictability of Component X's life. Lastly, adding direct sensors to the process (e.g., vibrations, temperature) can improve predictability: a predictive maintenance project is easier to implement with direct measurements on the replaceable equipment.

9. Future opportunities and implementation

Since the scope of the assignment was to establish a more data-driven method for Machine Y, and there was insufficient time to implement it, several aspects are described for future implementation and for the cost savings of the data-driven maintenance policy. First, the way of implementing the data-driven maintenance policy is described. Next, the savings of the solution are given based on additional capacity and costs. Furthermore, a final experiment is done on run-to-failure data to see whether the degradation does indeed behave downwards, as was assumed until now.

9.1 Implementation on the work floor

Normally, at the end of a CRISP-DM cycle, deployment is the last step in the process. When the decision was made to begin this case study, the expected result was a working proof-of-concept for Machine Y; this has been delivered. The role fulfilled during this case study can be compared to that of a data analyst, which is to create a suitable prediction model. The next steps of implementing (real-time) data-driven maintenance are shown in Figure 9.1.

Figure 9.1 Process of implementing the prediction model

Based on the accuracy of the results, it was decided not to implement the solution in real-time.
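As an illustration of such a non-real-time use, a small script run a few times a week could convert the model output into a proposed replacement date, subtracting a safety margin first. Everything below is an illustrative sketch, not delivered code; the 18-hour margin is of the order of the best RMSE found in the experiments, and the 2000 products per hour is the domain experts' estimate.

```python
from datetime import datetime, timedelta

PRODUCTS_PER_HOUR = 2000   # domain experts' estimate for a fully functioning machine
SAFETY_MARGIN_HOURS = 18   # of the order of the best RMSE found in the experiments

def products_to_hours(n_products: float) -> float:
    # Translate a product count (e.g., an RMSE in products) into production hours.
    return n_products / PRODUCTS_PER_HOUR

def schedule_replacement(predicted_rul_hours: float, now: datetime) -> datetime:
    # Plan the replacement of Component X one safety margin before the
    # predicted RUL runs out, never in the past.
    lead = max(predicted_rul_hours - SAFETY_MARGIN_HOURS, 0.0)
    return now + timedelta(hours=lead)

now = datetime(2021, 7, 7, 8, 0)
print(products_to_hours(36_000))         # -> 18.0 production hours
print(schedule_replacement(100.0, now))  # -> 2021-07-10 18:00:00
```

The maintenance engineer would then be booked for the returned date, which builds the safety margin directly into the planning.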
The domain experts have indicated that they would like to get hold of the code of the prediction model to make a prediction one to several times a week as to how long a specific machine can still produce. They can then schedule a time for the maintenance engineer to replace Component X. This should include a safety margin equal to the average RMSE.

9.2 Experiment with run-to-failure data

After talking to the engineers at the company regarding Machine Y, the absence of run-to-failure data was discussed. After making clear that a final lower bound can only be set once failure data has been collected, it was decided to let several machines produce up to the failure point. It was agreed with the maintenance personnel that a Component X will be replaced when it is no longer functional. First, the replacements of Component X are shown in Figure 9.2. Observable from this graph is that the renewal of the Component X's or production cells takes place gradually. This indicates that not every Component X in the production cell has the same lifespan, provided that corresponding numbers of products are manufactured for each production cell.

Figure 9.2 Replacements of Component X or production cell after the last batched maintenance moment (machines VT16, VT26, and VT36)

It is more interesting to see how the health index, on which the lower bound is based, behaves. When a health indicator is compiled per individual production cell, no structural downward trend can be found. This was also the case when looking for trends in section 4; there, it was decided to work with the data in such a way that a consistent parameter could be found from which a health indicator could be compiled. All the production cells of VT16 and VT26 that were not replaced are used to see if the health indicator degrades over time.
The reason why VT36 is not included is that every production cell in it has been renewed. This is shown in Figure 9.3 on the next page.

The behavior that can be observed from Figure 9.3 does not meet expectations: there is no monotonic trend. A reason for this may be that several Component X's or production cells have been replaced without being documented; the several staggered patterns in Figure 9.3 may indicate this.

Figure 9.3 Health indicator trajectory of the run-to-failure data

For now, we should not draw too many conclusions based on two health indicators on possibly incorrect data. It is recommended that the company conduct more trials to see whether the health indicator remains consistent. A final observation during the evaluation was the number of products produced compared to the number of maintenance moments. As discussed earlier, the maintenance moments are not error-free.

Removed due to company-sensitive information regarding the found results.

9.3 Created capacity and saved costs

Removed due to company-sensitive information regarding the found results.
Appendix A:

Appendix B:

Appendix C:

Appendix D:

Appendix E: