Application of a data-driven monitoring technique to diagnose air leaks in an automotive diesel engine: a case study

David Antory

Electrical Test for Advanced Architectures, International Automotive Research Centre, Warwick Manufacturing Group, University of Warwick, Coventry, CV4 7AL, U.K. (E-mail: d.antory@warwick.ac.uk; Tel: +44-24-76575441; Fax: +44-24-76575403)

Abstract

This paper presents a case study of the application of a data-driven monitoring technique to diagnose air leaks in an automotive diesel engine. Using measurement signals taken from the sensors/actuators which are present in a modern automotive vehicle, a data-driven diagnostic model is built for condition monitoring purposes. Detailed investigations have shown that measured signals taken from the experimental test-bed often contain redundant information and noise due to the nature of the process. In order to deliver a clear interpretation of these measured signals, they therefore need to undergo a ‘compression’ and an ‘extraction’ stage in the modelling process. It is at this stage that the proposed data-driven monitoring technique plays a significant role, by retaining only the important information in the original measured signals for fault diagnosis purposes. The status of the engine’s performance is then monitored using this diagnostic model. This condition monitoring process involves two separate stages: fault detection and root-cause diagnosis. The effectiveness of this diagnostic model was validated using an experimental automotive 1.9L 4-cylinder diesel engine embedded in a chassis dynamometer in an engine test-bed. Two joint diagnostics plots were used to provide an accurate and sensitive fault detection process. Using the proposed model, small air leaks in the inlet manifold plenum chamber with a diameter of 2 to 6 mm were accurately detected. Further analysis using the contributions to the T² and Q statistics shows the effect of these air leaks on fuel consumption.
It was later discovered that these air leaks may contribute to emissions faults. In comparison to the existing model-based approaches, the proposed method has several benefits: (i) it makes no simplifying assumptions, as the model is built entirely from the measured signals; (ii) it is simple and straightforward; (iii) no additional hardware is required for modelling; (iv) it is a time- and cost-efficient way to deliver condition monitoring (i.e. a fault diagnosis application); (v) it is capable of pinpointing the root cause and the effect of the problem; and (vi) it is feasible to implement in practice.

Keywords: application, data-driven technique, condition monitoring, diagnosis, air leaks, automotive diesel engine

1. Introduction

Stringent emission regulations have led automotive manufacturers to develop systems which can detect and diagnose any fault which may cause tailpipe emissions to rise above a prescribed threshold. This can be achieved by continuously monitoring the automotive data characteristics for any abnormal behaviour. Recently, Mills [1] discussed a way to perform automated analysis of automotive data to oversee vehicle system operations, to automate data capture and analysis, and also to improve the diagnostic process. Such an approach can be viewed as a method for improving the reliability, safety and efficiency of the processes, as discussed by Isermann [2] and Gertler [3]. It can also be used as a way to conduct fault detection and identification.

Previous work by the author [4] investigated faults in an automotive engine using measurement signals which were available in production engines, and excluded the remaining signals which can only be measured in a test bed environment. The work reported in this paper extends previous investigations by using all measurement signals taken from an engine during tests conducted in a laboratory.
This additional step allows a complete analysis of the experimental data, which may be beneficial in the design, development, manufacturing and service stages of the vehicle lifecycle. A detailed analysis is then performed to demonstrate the detection and diagnosis processes. This paper shows that faults caused by small leaks (of 2 mm, 4 mm and 6 mm diameter) in the intake manifold plenum chamber of a 1.9 litre TDI diesel engine can be reliably detected and diagnosed. The model, built using a data-driven technique named principal component analysis (PCA), performed more accurate condition monitoring of this fault than that achieved by using a conventional physical model (Section 4). The improved performance is especially apparent for the smallest air leak (with a diameter of 2 mm).

This paper is organised as follows: the next section describes the data-driven technique, the PCA method, which is followed by a discussion of the experimental data in Section 3. Section 4 discusses the condition monitoring process, where the detection and diagnosis of various sizes of air leak is explained in detail. Finally, Section 5 concludes this paper and discusses future work.

2. Data-driven technique for condition monitoring

This section discusses the data-driven technique known as principal component analysis (PCA). PCA has gained considerable attention for condition monitoring, mostly in the field of industrial chemical and semiconductor processes [5-8]. The technique can also be successfully applied to automotive applications [4].

2.1. The PCA method

The different types of signals collected from the process are recorded in a range of different unit scales. Therefore, in PCA, normalisation is an essential first stage to make the variance of one process variable comparable to that of any other [9]. Normalisation can be done by mean-centring or auto-scaling the raw data; the latter is done by dividing the mean-centred data by its standard deviation.
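
The auto-scaling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `auto_scale` and the two example signals are hypothetical and are not taken from the experimental data.

```python
from statistics import mean, stdev

def auto_scale(columns):
    """Mean-centre each variable and divide by its sample standard deviation.

    `columns` is a list of variables, each a list of m samples.
    The means and standard deviations are returned as well, because they
    are needed later to normalise newly observed data in the same way.
    """
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # sample std, divisor (m - 1)
    scaled = [[(v - mu) / sd for v in col]
              for col, mu, sd in zip(columns, means, stds)]
    return scaled, means, stds

# Two made-up signals recorded on very different unit scales.
turbo_speed = [2000.0, 3000.0, 4000.0, 5000.0]
boost_bar = [0.2, 0.4, 0.6, 0.8]
scaled, means, stds = auto_scale([turbo_speed, boost_bar])
```

After scaling, both example signals have zero mean and unit variance, so neither dominates the analysis simply because of its units.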
The normalised data are then stored in column vectors that form a matrix. PCA identifies combinations of variables that describe the major trends in the data set. It relies on an eigenvector decomposition of the covariance or correlation matrix of the process variables [10]. The most important information can then be described using a small number of principal components (PCs). In this respect, PCA is a powerful tool for analysing multivariate data sets [11].

For a given data matrix $X \in \Re^{m \times n}$, containing m samples (stored as row vectors) of n ($n \ll m$) process variables (stored as column vectors), the application of PCA gives rise to a reduced set of ‘synthetic’ process variables, the PC scores T and the PC loadings P, which contain the important variation:

$$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T + E = \sum_{i=1}^{k} t_i p_i^T + E \qquad (1)$$

X is decomposed into a sum of vector products of PC score vectors $t_i$, stored as column vectors in T, and PC loading vectors $p_i$, stored as column vectors in P, where k < n is the number of dominant eigenvectors of the correlation matrix $S_{xx}$ that represent the significant process variation. $S_{xx}$ is defined as follows:

$$S_{xx} = \frac{1}{m-1} X^T X \in \Re^{n \times n} \qquad (2)$$

Here, X is auto-scaled to have zero mean and unit variance. The residual matrix E describes unimportant variation and noise in the original data X. The important variation is stored in the estimation matrix $\hat{X} = \sum_{i=1}^{k} t_i p_i^T$; thus, the residual E can be written as follows:

$$E = X - \hat{X} \qquad (3)$$

Whilst the elements in the loading vectors describe the coefficients of the linear relationships between the process variables, the elements in the score vectors represent the variation in these variables. The model is built by determining k, which represents a reduced set of PCs that describe significant process variation.
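
The decomposition in Eqs. (1) to (3) can be illustrated with a small, dependency-free Python sketch. It is not the implementation used in this study: the eigenvectors of the correlation matrix are found here by power iteration with deflation (a stand-in for a library eigen-solver), loadings are stored as a list of vectors, and the function names and the six-sample data set are hypothetical.

```python
import math
from statistics import mean, stdev

def auto_scale_rows(raw):
    """Auto-scale an m-by-n list of rows to zero mean, unit variance per column."""
    cols = list(zip(*raw))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in raw]

def correlation_matrix(X):
    """S_xx = X^T X / (m - 1) for auto-scaled X, as in Eq. (2)."""
    m, n = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / (m - 1) for j in range(n)]
            for i in range(n)]

def matvec(S, p):
    return [sum(s * v for s, v in zip(row, p)) for row in S]

def dominant_eigenpairs(S, k, iters=1000):
    """First k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    n = len(S)
    S = [row[:] for row in S]                 # work on a copy
    eigvals, loadings = [], []
    for _ in range(k):
        p = [float(i + 1) for i in range(n)]  # arbitrary non-symmetric start
        for _step in range(iters):
            q = matvec(S, p)
            norm = math.sqrt(sum(v * v for v in q))
            p = [v / norm for v in q]
        lam = sum(a * b for a, b in zip(p, matvec(S, p)))  # Rayleigh quotient
        eigvals.append(lam)
        loadings.append(p)
        for i in range(n):                    # deflate the component just found
            for j in range(n):
                S[i][j] -= lam * p[i] * p[j]
    return eigvals, loadings

def pca_model(X, k):
    """Return eigenvalues, loadings p_i, scores t_i = X p_i and residual E."""
    eigvals, loadings = dominant_eigenpairs(correlation_matrix(X), k)
    scores = [[sum(x * pj for x, pj in zip(row, p)) for p in loadings]
              for row in X]
    X_hat = [[sum(t * p[j] for t, p in zip(t_row, loadings))
              for j in range(len(X[0]))] for t_row in scores]
    E = [[x - xh for x, xh in zip(row, h_row)] for row, h_row in zip(X, X_hat)]
    return eigvals, loadings, scores, E

# Made-up data: the first two variables are almost collinear (redundant).
raw = [[1.0, 2.1, 5.0], [2.0, 3.9, 3.0], [3.0, 6.2, 6.0],
       [4.0, 8.1, 2.0], [5.0, 9.8, 4.0], [6.0, 12.1, 7.0]]
X = auto_scale_rows(raw)
eigvals, loadings, scores, E = pca_model(X, k=2)
residual_variance = sum(e * e for row in E for e in row) / (len(X) - 1)
```

In this example two PCs are retained for three variables; because two of the variables carry nearly the same information, the variance left in E is small compared with the variance explained by the retained eigenvalues.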
The loading vectors $p_i$ are the eigenvectors of the correlation matrix $S_{xx}$, which can be formulated as follows:

$$S_{xx} p_i = \lambda_i p_i \qquad (4)$$

where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$ of the correlation matrix. It measures the amount of variance explained by the $\{t_i, p_i\}$ pairs, which are arranged in descending order of $\lambda_i$. Consequently, the first k pairs capture the largest amount of variation, which contains the largest amount of information from the original data. The score vector $t_i$ is the linear combination of the original data matrix X defined by the loadings $p_i$, as shown below:

$$X p_i = t_i \qquad (5)$$

This transformation enhances the ability of PCA to extract information from the original data by eliminating redundant information. The reduced set of variables is then used for modelling and analysis. Kourti and MacGregor [8] stated that this new reduced data set often contains more robust information about the process than the original data. More details about PCA can be found in Jackson [10] and Jolliffe [12].

2.2. Monitoring statistics

Using PCA for condition monitoring involves the application of a PCA model to newly observed data. The procedure applied is similar to that of building the actual model. The observed data are normalised using the mean and standard deviation of the PCA model. Condition monitoring can then be performed using the number of PCs k retained to build the model, the loadings P, and the correlation matrix $S_{xx}$. The examination focuses on the variation of the observed data within the PCA model and on the mismatch between the PCA model and the observed data.

2.2.1. The Hotelling’s T² statistic

The Hotelling’s T² statistic gives a measure of the significant variation of the process. It is simply the sum of the squared scores, each divided by its variance. The PC score t is obtained by projecting the new observed data $x_{new}$ onto the plane defined by the PCA loadings P.
This can be summarised as follows:

$$t = x_{new} P \qquad (6)$$

$$T^2 = t \Lambda^{-1} t^T = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} \qquad (7)$$

where $\Lambda^{-1}$ is a diagonal matrix of the inverse of the k largest eigenvalues $\lambda_i$ of the correlation matrix $S_{xx}$ in descending order, and $t_i$ is the ith score. The Hotelling’s T² statistic can be plotted as a function of time. The statistical threshold for T² can be calculated using the F-distribution [10, 12] as follows:

$$T_\alpha^2 = \frac{k(m-1)}{m-k} F_\alpha(k, m-k) \qquad (8)$$

where $T_\alpha^2$ is the threshold value at confidence level $\alpha$ (typically 95% or 99%), m is the number of samples used to build the PCA model, k is the number of PCs retained and $F_\alpha(k, m-k)$ is the upper $100\alpha\%$ critical point of the F-distribution with k and (m - k) degrees of freedom.

2.2.2. The Q (residual) statistic

The Q statistic measures the mismatch between the PCA model and the observed data. It shows how well the newly observed data conform to the PCA model. The mismatch between measured and estimated sensor readings results in the residual e, which forms the basis of the Q statistic and is formulated as follows:

$$e = x - t P^T = x [I_n - P P^T] \qquad (9)$$

The Q statistic is simply the sum of squares of the residual e, thus:

$$Q = e e^T = \sum_{j=1}^{n} e_j^2 \qquad (10)$$

where $e_j$ is the jth residual. The Q statistic can be plotted as a function of time. The statistical threshold for the Q statistic [13] can be calculated as follows:

$$Q_\alpha = \theta_1 \left( \frac{h_0 c_\alpha \sqrt{2\theta_2}}{\theta_1} + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} + 1 \right)^{1/h_0} \qquad (11)$$

where $\theta_1 = \sum_{i=k+1}^{n} \lambda_i$, $\theta_2 = \sum_{i=k+1}^{n} \lambda_i^2$, $\theta_3 = \sum_{i=k+1}^{n} \lambda_i^3$, $h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2}$, and $c_\alpha$ is the normal deviate corresponding to the $(1 - \alpha)$ percentile.

2.2.3. Geometrical interpretation of the monitoring statistics

The geometrical interpretation of the Hotelling’s T² and the Q statistics is illustrated in Fig. 1 for a 2D plane formed by the first and second principal components.
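
The same split into an in-plane part (T²) and an off-plane part (Q) can be shown numerically. The Python sketch below evaluates Eqs. (6) to (11) for two samples, one displaced within the PC plane and one displaced orthogonally to it. Everything in the example is an assumption made for illustration: the loadings, the eigenvalues, the choice of k = 2 and the tabulated F critical value of about 4.61 are not results from this study, and the F value is passed in by hand because the Python standard library has no F-distribution.

```python
from statistics import NormalDist

def t2_statistic(x, P, eigvals):
    """Hotelling's T^2, Eqs. (6)-(7). x is an auto-scaled row vector,
    P is a list of the k retained loading vectors, eigvals their eigenvalues."""
    t = [sum(xj * pj for xj, pj in zip(x, p)) for p in P]   # t = x P
    return sum(ti * ti / lam for ti, lam in zip(t, eigvals))

def q_statistic(x, P):
    """Q statistic, Eqs. (9)-(10): squared length of e = x - t P^T."""
    t = [sum(xj * pj for xj, pj in zip(x, p)) for p in P]
    x_hat = [sum(ti * p[j] for ti, p in zip(t, P)) for j in range(len(x))]
    return sum((xj - xh) ** 2 for xj, xh in zip(x, x_hat))

def t2_limit(k, m, f_crit):
    """Eq. (8). f_crit is F_alpha(k, m - k), supplied by the caller."""
    return k * (m - 1) / (m - k) * f_crit

def q_limit(discarded_eigvals, confidence=0.99):
    """Eq. (11), computed from the n - k eigenvalues left out of the model."""
    th1 = sum(discarded_eigvals)
    th2 = sum(lam ** 2 for lam in discarded_eigvals)
    th3 = sum(lam ** 3 for lam in discarded_eigvals)
    h0 = 1 - 2 * th1 * th3 / (3 * th2 ** 2)
    c = NormalDist().inv_cdf(confidence)                    # normal deviate
    return th1 * (h0 * c * (2 * th2) ** 0.5 / th1
                  + th2 * h0 * (h0 - 1) / th1 ** 2 + 1) ** (1 / h0)

# Hypothetical 3-variable model with k = 2 retained PCs.
P = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
retained, discarded = [2.0, 0.9], [0.1]

in_plane = [5.0, 0.0, 0.0]    # far from the centre, but inside the PC plane
off_plane = [0.0, 0.0, 1.5]   # at the centre, but displaced out of the plane

t2_lim = t2_limit(k=2, m=6000, f_crit=4.61)   # 4.61: assumed tabulated value
q_lim = q_limit(discarded, confidence=0.99)
flags = {
    "in_plane": (t2_statistic(in_plane, P, retained) > t2_lim,
                 q_statistic(in_plane, P) > q_lim),
    "off_plane": (t2_statistic(off_plane, P, retained) > t2_lim,
                  q_statistic(off_plane, P) > q_lim),
}
```

Each sample violates only one of the two limits, which is why the statistics are usually read together rather than individually.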
Point A shows the orthogonal deviation of a new sample perpendicular to the ellipse plane model, while point B shows the horizontal deviation of a new sample from the centre of the ellipse plane model. The deviation represents the effect of the abnormal situation on the process. The further away this deviation is from the ellipse plane model, the more serious the effect of the fault which has occurred.

Fig. 1: Geometric interpretation of the monitoring statistics

The two monitoring statistics mentioned above can complement each other to produce more accurate condition monitoring. However, when the effect of the fault only emerges in one of the monitoring statistics, it may cause confusion in the analysis and interpretation of results. A joint monitoring statistics plot which combines the Q and the Hotelling’s T² statistics may give a better interpretation. This is discussed in detail in the following section.

2.2.4. Kernel density method for joint diagnostics

Chen et al. [14] stated that in process condition monitoring, the Q and Hotelling’s T² statistics are the most important statistical parameters. They are useful for monitoring the system performance independently to detect any abnormal situation. Combining both statistics can improve on the sensitivity of the individual monitoring statistics, especially when dealing with incipient faults such as small air leaks. This can be done by simply using each individual statistic’s confidence limit to create joint diagnostics confidence limits (i.e. by plotting the Q against the T² statistic in two dimensions). Alternatively, a new confidence region can be generated from the probability density function (PDF) of the joint Q and T² statistics using the kernel density estimation (KDE) method [14]. A PDF describes the likelihood with which a data point has occurred in previous process operations.
KDE approximates the density function by a sum of small kernel functions (for example of Gaussian or Epanechnikov type) centred on each data point. Using the kernel, confidence regions are determined entirely from the structure contained in the data set, without reference to a parametric model. KDE provides simple, reliable and useful information for a wide range of applications in fields such as medicine, engineering and economics [15]. The univariate kernel density estimator can be formulated as follows:

$$\hat{f}(x; h) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\} \qquad (12)$$

where K is a kernel function that satisfies the condition $\int K(x)\,dx = 1$, and h is the bandwidth. Using the rescaling notation $K_h(u) = h^{-1} K(u/h)$, Eq. (12) is transformed into:

$$\hat{f}(x; h) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i) \qquad (13)$$

A unimodal probability density function that is symmetric about zero is usually chosen for K. One important aspect of the non-parametric KDE approach is the determination of the bandwidth h. Wand and Jones [15] state that even though it is possible to choose the bandwidth subjectively by eye in many situations, this is very time-consuming, especially if one has no prior knowledge of the structure of the data. They proposed the use of an automatic bandwidth selector. In this paper, an automatic bandwidth selector based on cross-validation of the mean integrated squared error (MISE) is adopted.

The extension from univariate to multivariate KDE requires some modification. The bandwidth h is transformed into a bandwidth matrix H using a diagonal matrix with one parameter as follows: $H = h^2 I$, where I is an identity matrix. In order to avoid a loss of accuracy by forcing the bandwidth to be the same in all dimensions, Fukunaga [16] suggested rescaling the data and stated that rescaling makes all variables become the same in all dimensions.
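
As a rough illustration of the univariate estimator in Eqs. (12) and (13), the Python sketch below uses a Gaussian kernel. The sample values and the bandwidth h = 0.5 are made up, and the bandwidth is simply fixed by hand here rather than chosen by the MISE cross-validation selector adopted in this paper.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: unimodal, symmetric about zero, integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    """Univariate kernel density estimate of Eq. (12) evaluated at x."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

# Hypothetical values of one monitoring statistic under normal operation.
q_values = [0.8, 1.1, 1.3, 1.9, 2.4]
h = 0.5  # bandwidth chosen by eye for this sketch only

density_at_1 = kde(1.0, q_values, h)

# Numerical check that the estimate behaves like a density (area close to 1).
step = 0.01
area = sum(kde(-5.0 + i * step, q_values, h) * step for i in range(1300))
```

A smaller h gives a spikier estimate that follows individual samples, while a larger h smooths over the structure of the data, which is why the choice of bandwidth matters.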
Rescaling in this way reduces the computational load and provides a reasonable choice for bandwidth selection. H is determined by minimising a global error criterion. For MISE cross-validation, this criterion is given by:

$$\mathrm{MISE}\{\hat{f}(\cdot, H)\} = \mathrm{E} \int [\hat{f}(x, H) - f(x)]^2 \, dx \qquad (14)$$

where $\hat{f}(x, H)$ is the fitted density function and $f(x)$ is the real density function. The multivariate kernel density estimator can then be written as follows:

$$\hat{f}(x, H) = n^{-1} \sum_{i=1}^{n} K_H(x - X_i) \qquad (15)$$

where H is a symmetric positive definite d × d bandwidth matrix. In analogy to the univariate version, $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$, where $\int K(x)\,dx = 1$.

The probability density function of the joint monitoring statistics, Q and Hotelling’s T², can be built using Eq. (15) with a small modification, where $x = \{Q, T^2\}^T$ and $X_i = \{Q_i, T_i^2\}^T$. In this paper, a 99% confidence region is adopted, which means that under normal operating conditions not more than 1% of the total observed data lie outside this region. More detailed information about KDE can be found in [15, 17]. Examples of the application of KDE to process monitoring can be found in [18, 19].

3. An experimental automotive diesel engine

This section explains the procedure used to obtain the experimental data from an engine test-cell facility. A description of the automotive diesel engine used in this study is briefly presented. This is followed by a discussion of the types of fault conditions investigated.

3.1. Design of experiment (DoE)

A four-cylinder Volkswagen 1.9 litre turbocharged direct injection (TDI) diesel engine was used to provide the experimental data. The engine was coupled to a 145 kW AC Schenck dynamometer and Ricardo control system in an instrumented test bed facility. A photo of the test laboratory is given in Fig. 2.

Fig. 2: Photo of engine test cell with Volkswagen diesel engine connected to the dynamometer

The fault-free, or baseline, performance characteristics of the engine were recorded at steady-state conditions with speed settings of 1500, 2500, 3500 and 4500 rev/min. Five different pedal positions, ranging from 30% to 100%, were tested at each speed. These test conditions are summarised in Table 1.

Table 1: Matrix of speed/load settings used during the engine tests

The values of the pedal positions were chosen using the following procedure. Firstly, the peak torque values at each speed were recorded. Next, the pedal positions corresponding to 20%, 40%, 60%, 80% and 100% of these peak torque values were noted. These pedal positions were then used during both fault-free and fault-containing tests. This ensured that the same inputs were used in all tests. Experimental data from a total of 20 different combinations, covering a wide range of operating conditions, were therefore used to develop the model.

Each steady-state condition was recorded for 30 seconds at a sampling rate of 10 Hz. A total of 300 points were therefore recorded for each combination, producing an overall total of 6000 points across the entire range of steady-state driving conditions. At each point, the signals from 12 transducers were recorded using the combined input settings shown in Table 2. The first 7 outputs are available in production engines, while the remaining 5 outputs can be captured using laboratory instruments. The model derived did not include engine speed and pedal position, as these represent input parameters that are set by the test-cell operator via the dynamometer control system. Including these inputs for modelling will not give additional information since they represent the ideal situation of steady-state behaviour.
It would be a different matter for a transient dynamic experimental case, where the dynamic characteristics of the inputs (speed and load) heavily influence the output signals. In that case it would be mandatory to include them in the model. In this steady-state experimental case, the remaining 12 output variables, shown in Table 2, can be affected by any operational fault that occurs.

Table 2: Recorded experimental engine signals

3.2. Fault investigated: air leak in the intake manifold

The fault to be examined was an air leak in the intake manifold. This particular kind of fault can be difficult to detect as, under a range of operating conditions, the turbocharger waste gate will inherently try to counteract the fault and maintain the manifold boost pressure at a pre-determined level. Consequently, depending on the magnitude of the air leak, the fault may be imperceptible to the driver. However, the engine management system (EMS) assumes that all of the air that passes the airflow meter will subsequently enter the combustion chamber. If some of this air escapes from the manifold, then the overall air-fuel ratio will be lower than that assumed by the EMS. This could therefore lead to an increase in the levels of carbon monoxide, unburned hydrocarbons and particulate matter being released into the atmosphere, especially at full load conditions. Depending on the location of the leak within the intake manifold and the method of control used, the exhaust gas recirculation (EGR) process may also be affected, leading to an increase in NOx emissions.

In this investigation, the air leak was created by drilling holes of 2 mm, 4 mm and 6 mm diameter in a removable bolt in the inlet manifold plenum chamber. The manifold was pressure tested for leaks prior to the experimental leak being introduced. The complexity of the combustion process makes the identification of such a fault a difficult task. The effect of these leaks on the raw data is shown in Fig. 3, which highlights the fact that it is difficult to identify the abnormal effect. Consequently, with the exception of the φ6 mm hole, this type of fault would be difficult to detect using a physical model (see Section 4.1). Of particular interest is the data recorded with a φ2 mm leak, which appears to be identical to the data recorded during the fault-free condition.

Fig. 3: Raw plot of the experimental data for all measured signals

4. Condition monitoring of intake manifold air leaks

This section discusses the condition monitoring process, where the detection and diagnosis of air leaks in the inlet manifold plenum chamber is investigated. A comparison between physical and principal component models is provided to illustrate the effectiveness of the proposed data-driven model over conventional physical techniques in examining the effect of air leaks, especially for the smallest leak (φ2 mm).

4.1. Physical model

Using a physical model, the air leak rate was calculated from the pressure difference between the manifold and the surrounding atmosphere using the following equation:

$$\dot{m} = C_D A \sqrt{2 \rho \Delta P} \qquad (16)$$

where $\dot{m}$ is the mass flow rate of air through the hole (kg/s), A is the area of the hole (m²), $\rho$ is the density (kg/m³), $\Delta P$ is the manifold boost pressure (Pa), and $C_D$ is the coefficient of discharge, which was taken to be 0.6. The air flow entering the engine was measured by the air flow meter during engine testing.

As expected, it was found that the percentage of air lost through the hole increased as the diameter of the hole increased. For the three diameters tested, the highest percentage loss occurred at 1500 rev/min at full load, reaching 1.98%, 7.05% and 15.19% for the φ2 mm, φ4 mm and φ6 mm holes, respectively. Under these conditions the air flow rate entering the engine is low due to the low engine speed. However, the manifold boost pressure, which is the driving force for the air leakage flow, is almost at its maximum.
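
To make Eq. (16) concrete, the short Python sketch below evaluates the leak mass flow for the three hole diameters at a single operating point. Only the discharge coefficient of 0.6 and the diameters come from the text; the boost pressure of 0.8 bar and the air density of 1.2 kg/m³ are assumed values chosen for illustration, and the function name is hypothetical.

```python
import math

def leak_mass_flow(diameter_m, delta_p_pa, density_kg_m3, cd=0.6):
    """Mass flow through a circular hole, Eq. (16): m_dot = C_D * A * sqrt(2 rho dP)."""
    area = math.pi * diameter_m ** 2 / 4.0
    return cd * area * math.sqrt(2.0 * density_kg_m3 * delta_p_pa)

# Assumed operating point (not measured values from the test bed).
BOOST_PA = 0.8e5   # 0.8 bar manifold boost pressure, assumption
RHO_AIR = 1.2      # kg/m^3, assumption

flows = {d_mm: leak_mass_flow(d_mm / 1000.0, BOOST_PA, RHO_AIR)
         for d_mm in (2, 4, 6)}
for d_mm, m_dot in flows.items():
    print(f"{d_mm} mm hole: {m_dot * 3600.0:.2f} kg/h")
```

At a fixed pressure difference the leak flow grows with the hole area, that is with the square of the diameter. The percentage loss additionally depends on the measured air flow entering the engine at each operating point.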
At low speed and full load, the air leak flow rate therefore constitutes a high proportion of the flow rate entering the engine. Fig. 4 shows the flow rate of air through the hole as a percentage of the air entering the engine. Given that the maximum air leak rate was less than 2% for the φ2 mm hole, with an average value of less than 1%, this fault posed a difficult challenge for the fault detection and diagnosis algorithm.

Fig. 4: Percentage air loss caused by 2 mm, 4 mm and 6 mm air leaks in the inlet manifold plenum chamber for all combinations tested.

4.2. Principal components model

Using the 12 output signals taken from the experimental data, a PCA model was built. Table 3 shows the variance captured by the PC scores in descending order.

Table 3: Variance captured by PCA

The method of choosing how many PCs to retain was based on the percentage of variance captured by each PC. A minimum of 1% was required for a PC to be included. Popular methods such as the eigenvalue-one rule and the cross-validation procedure are not suitable for this case. Both approaches select 2 PCs to be retained to build a model. During residual evaluation it was found that, using these approaches, the residual still contained a considerable amount of the variation of the original data. Further examination revealed that 5 PCs captured most of the original variance of the experimental data and left only a negligible level, of less than 1%, of unimportant variation and noise. Information regarding the methods used to choose the number of PCs is not discussed here due to limitations of space, but can be found in Jolliffe [12].

4.3. Process monitoring of air leaks fault

Section 2.2 has discussed the monitoring statistics used to detect air leaks in the intake manifold plenum chamber.
To illustrate the monitoring process, a new data set of 150 seconds (corresponding to 1500 samples) covering various driving conditions was used for validation purposes, with a φ2 mm air leak introduced for the last 300 samples at 1500 rev/min and at full load.

Fig. 5: Process monitoring using Hotelling’s T² statistic.

Fig. 6: Process monitoring using Q residual statistic.

Fig. 5 and Fig. 6 illustrate the monitoring process using the Hotelling’s T² and Q statistics, respectively. Two confidence limits (99% and 95%) are provided to highlight the violation that was caused by a φ2 mm air leak at full load, 1500 rev/min. While the first 1200 samples remain below the confidence limits, the majority of the last 300 samples (sample 1201 onwards) strongly violate the confidence limits. This abnormal condition caused by the φ2 mm air leak in the intake manifold becomes more apparent in Fig. 7.

The joint monitoring statistics plots shown in Fig. 7 can enhance the detection capabilities with increased sensitivity. The first joint diagnostics plot simply combines the two monitoring statistics (Q and T²) and plots them together on the X and Y axes. The validation data set consists of the plus symbol (which represents the first 1200 conforming samples) and the cross symbol (which represents the non-conforming samples caused by the φ2 mm air leak at sample numbers 1201 to 1500). There are 4 regions in Fig. 7(a) defined by the two 99% confidence limits of the monitoring statistics, denoted as R1, R2, R3 and R4, respectively. R1 is the normal region, containing samples which fall below both confidence limits. In contrast, R3 is the region containing those samples which violate both confidence limits. The samples contained in this region represent the most abnormal conditions and stem mostly from the φ2 mm condition represented by the cross symbol. R2 and R4 contain samples which violate either the Q residual (R2) or the Hotelling’s T² (R4) statistic alone.

Fig. 7(a): Process monitoring using combined Q residual and Hotelling’s T² statistics.

The second joint diagnostics plot, shown in Fig. 7(b), utilises a confidence region estimated using the kernel density method discussed in Section 2.2.4 for condition monitoring. Here, the contour represents the 99% confidence region of the joint PDF of the Q and T² statistics. Any points falling outside the contour represent outliers which occurred as an effect of the air leaks.

Fig. 7(b): Process monitoring using a kernel density confidence region of the combined Q residual and Hotelling’s T² statistics.

Further analysis of the effect of the air leaks can be obtained using the contributions to the T² and Q statistics. Fig. 8 shows an analysis of the variation within the model (T² statistic). It is obvious that the HC (hydrocarbon) measurement signal is affected the most when air leaks occur, especially from point 1200 onwards, where the air leak is introduced.

Fig. 8: Contribution to Hotelling’s T² statistic for various data points (fault-free and faulty conditions).

Fig. 9: Contribution to Q residual statistic for various data points (fault-free and faulty conditions).

In a similar fashion, Fig. 9 shows an analysis of the mismatch between the diagnostic model and the unseen measured signals (Q statistic). It clearly shows that the fuel flow measurement signal is affected the most by air leaks, especially from point 1200 onwards.

5. Conclusions and further work

This paper demonstrated that a data-driven monitoring technique, principal component analysis (PCA), is a simple, straightforward, powerful and potentially useful technique for condition monitoring in automotive applications. The diagnostic model is capable of exploring and exploiting underlying ‘hidden’ information from the experimental data in a compact manner. No simplifying assumptions are needed in building the model.
This means that the model is derived solely from the measurement signals. The interdependency of the original signals is ‘captured’ and ‘transformed’ into a new and smaller number of independent signals (Section 2). The remaining un-captured signals will contain mainly uninformative and noisy data. The variation within the model (T² statistic) and the residual of the un-captured signals (Q statistic) are used as the backbone of the fault detection and diagnosis process. It was shown in Section 4 that the diagnostic PCA model performed better than a physical model (where an assumption is made to define a coefficient of discharge, $C_D$) when detecting air leaks at the intake manifold plenum chamber, especially for a small diameter air leak (φ2 mm).

Using two joint monitoring statistics plots, a clearer detection and diagnosis can be visually represented and a better analysis can be carried out. A confidence region estimated using the kernel method increases the sensitivity of the monitoring process. It allows easier visual interpretation and thereby improves the detection and diagnosis of small air leaks (see Fig. 7(b)) in comparison to joint diagnostics built by simply combining both monitoring statistics (see Fig. 7(a)). Further analysis using the contributions to the T² and Q statistics shows the effect of the air leaks on fuel consumption and indicates that they may contribute to emissions faults.

Another important benefit of using this diagnostic model is that it can be used to detect and to diagnose any type of fault (within the scope of the measured signals) in a similar manner to the air leak fault. The proposed technique has therefore shown good potential for automotive applications. It may be a valuable tool for a variety of condition monitoring situations, especially as emissions regulations become increasingly stringent.

Acknowledgements

The author wishes to acknowledge the support of Dr. Darja Brandenburg for proofreading this manuscript.
Comments and suggestions received from Dr. Geoffrey McCullough of the Internal Combustion Engines Research Group are appreciated. Special gratitude goes to Dr. Paul McEntee for his assistance in collecting the experimental data with the support of the Virtual Engineering Centre. Support received from Prof. George W. Irwin and Dr. Uwe Kruger of the Intelligent Systems and Control Research Group, Queen’s University Belfast, is gratefully acknowledged. Thanks also to the Electrical Test for Advanced Architectures team at the International Automotive Research Centre, University of Warwick.

References

[1] W.N. Mills III, Automated analysis of automotive data, SAE World Congress, Vehicle Diagnostic, SP-1922, No. 2005-01-1437, Detroit, USA, April 2005.
[2] R. Isermann, Model-based fault detection and diagnosis – status and applications, Annual Reviews in Control, 29 (2005) 71-85.
[3] J. Gertler, Fault detection and diagnosis in engineering systems, Marcel Dekker, New York, USA, 1998.
[4] D. Antory, Fault diagnosis applications using nonlinear multivariate statistical process control, Ph.D. Thesis, School of Electrical & Electronics Engineering, Virtual Engineering Centre, Queen’s University Belfast, Belfast, Northern Ireland, UK, February 2005.
[5] S.J. Qin, Statistical process monitoring: basics and beyond, J. Chemometrics, 17 (2003) 480-502.
[6] J.F. MacGregor, Data-based methods for process analysis, monitoring and control, in: Proc. 13th IFAC Symposium on System Identification, Rotterdam, The Netherlands, 2003, pp. 1019-1029.
[7] J.F. MacGregor, T. Kourti, Statistical process control of multivariable processes, Control Engineering Practice, 3 (1995) 403-414.
[8] T. Kourti, J.F. MacGregor, Process analysis, monitoring and diagnosis, using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems, 28 (1995) 3-21.
[9] P. Geladi, B.R. Kowalski, Partial least-squares regression: a tutorial, Analytica Chimica Acta, 185 (1986) 1-17.
[10]. J.E. Jackson, A user's guide to principal components, Wiley, New York, USA, 1991.
[11]. K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate analysis, Academic Press, London, UK, 1979.
[12]. I.T. Jolliffe, Principal component analysis, Springer, New York, USA, 1986.
[13]. J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics, 21 (1979) 341-349.
[14]. Q. Chen, U. Kruger, M. Meronk, A.Y.T. Leung, Synthesis of T2 and Q statistics for process monitoring, Control Engineering Practice, 12 (2004) 745-755.
[15]. M.P. Wand, M.C. Jones, Kernel smoothing, Monographs on Statistics and Applied Probability 60, Chapman & Hall, London, UK, 1995.
[16]. K. Fukunaga, Introduction to statistical pattern recognition, Academic Press, London, UK, 1990.
[17]. B.W. Silverman, Density estimation for statistics and data analysis, Monographs on Statistics and Applied Probability 26, Chapman & Hall, London, UK, 1986.
[18]. E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process performance monitoring charts, J. Process Control, 6 (1996) 349-358.
[19]. Q. Chen, R. Wynne, P. Goulding, D.J. Sandoz, The application of principal component analysis and kernel density estimation to enhance process monitoring, Control Engineering Practice, 8 (2000) 531-543.

Vitae

David obtained a Sarjana Teknik (Ingenieur) degree in Electronics Engineering (major) from the Institute of Technology Sepuluh Nopember (ITS), Surabaya, Indonesia. After graduating, he worked as an engineer for a year before taking up an opportunity for further studies at the University of Sheffield, where he received an MSc(Eng) degree in Control Systems Engineering. Upon completion, he joined the Virtual Engineering Centre, Queen's University Belfast, to undertake Ph.D. research in a multidisciplinary project. He completed his studies and obtained a Ph.D. degree in Control Engineering.
Since September 2004, he has been working as a Project Engineer for the Electrical Test for Advanced Architectures project at the International Automotive Research Centre (IARC), Warwick Manufacturing Group, based at the University of Warwick. He is a member of IEEE and SAE. His current research interests include fault detection and diagnosis, multivariate statistical process control, non-linear system modelling and identification, neural networks, intelligent data mining and process optimisation, applied to modelling and control in automotive, aeronautics and industrial chemical processes.

Figure captions

Fig. 1: Geometric interpretation of the monitoring statistics

Fig. 2: Photo of engine test cell with Volkswagen diesel engine connected to a chassis dynamometer

Fig. 3: Raw plot of the experimental data for all measured signals. [Panels, each plotted against data points: Fuel Flow (kg/h), Air Flow (kg/h), Intake Manifold Pressure (bar), Intake Manifold Temperature (°C), Turbine Inlet Pressure (bar), Turbine Inlet Temperature (°C), Turbine Exit Pressure (bar), Torque (Nm), Turbo Speed (Hz), CO2 (percentage), HC (ppm), O2 (percentage). Legend: original fault-free samples; samples of 2 mm, 4 mm and 6 mm air leaks.]

Fig. 4: Percentage air loss caused by 2 mm, 4 mm and 6 mm air leaks in the inlet manifold plenum chamber for all combinations tested. [Plot: percentage air loss through hole against data points. Legend: 2 mm hole, 4 mm hole, 6 mm hole.]
Fig. 5: Process monitoring using Hotelling’s T2 statistic. [Plot: Hotelling’s T2 statistic against sample number, with 95% and 99% confidence limits; non-conforming samples are marked.]

Fig. 6: Process monitoring using Q residual statistic. [Plot: Q residual statistic against sample number, with 95% and 99% confidence limits; non-conforming samples are marked.]

Fig. 7(a): Process monitoring using combined Q residual and Hotelling’s T2 statistics. [Plot: Hotelling’s T2 statistic against Q residual statistic; conforming and non-conforming samples are distinguished, points are labelled by sample number, and regions R1 to R4 are marked.]

Fig. 7(b): Process monitoring using a kernel density confidence region of the combined Q residual and Hotelling’s T2 statistics. [Plot: Hotelling’s T2 statistic against Q residual statistic; points are labelled by sample number.]

Fig. 8: Contribution to Hotelling’s T2 statistic for various data points (fault-free and faulty conditions).

Fig. 9: Contribution to Q residual statistic for various data points (fault-free and faulty conditions).

Table captions

Table 1: Matrix of speed/load settings used during the engine tests

Speed (rev/min)    Pedal Position (% load)
1500               30    40    54    62    100
2500               49    59    74    78    100
3500               57    64    74    80    100
4500               62    65    76    83    100

Table 2: Recorded experimental engine signals

Engine Variable                Unit
Speed                          rev/min
Pedal Position                 %
Fuel Flow                      kg/h
Air Flow                       kg/h
Intake Manifold Pressure       bar
Intake Manifold Temperature    °C
Turbine Inlet Pressure         bar
Turbine Inlet Temperature      °C
Turbine Exit Pressure          bar
Torque                         Nm
Turbo Speed                    Hz
CO2                            %
HC                             ppm
O2                             %

Note column entries: input, output

Table 3: Variance captured by PCA

Number of PC    Eigenvalue    Variance Captured by each PC (%)    Total Sum of Variance Captured (%)
1               7.46          62.13                               62.13
2               3.55          29.61                               91.74
3               0.44          3.68                                95.42
4               0.30          2.52                                97.94
5               0.15          1.25                                99.19
6               0.049         0.40                                99.59
7               0.040         0.33                                99.92
8               0.006         0.047                               99.967
9               0.003         0.015                               99.982
10              0.001         0.011                               99.993
11              0.0006        0.005                               99.998
12              0.0002        0.002                               100
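
As an illustration of how the quantities in Table 3 relate to the two monitoring statistics, a minimal sketch in Python with NumPy is given below. It is not the implementation used in this study. The function names (variance_captured, fit_pca, t2_and_q), the dictionary layout of the model and the synthetic data are assumptions made purely for illustration, and the confidence limits and the kernel density region are omitted. Percentages recomputed from the rounded eigenvalues of Table 3 may differ slightly from the tabulated values.

```python
import numpy as np

# Eigenvalues as listed in Table 3 (rounded values from the table).
TABLE3_EIGENVALUES = np.array([7.46, 3.55, 0.44, 0.30, 0.15, 0.049,
                               0.040, 0.006, 0.003, 0.001, 0.0006, 0.0002])


def variance_captured(eigenvalues):
    """Return (percent per PC, cumulative percent) for a set of eigenvalues."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    percent = 100.0 * eigenvalues / eigenvalues.sum()
    return percent, np.cumsum(percent)


def fit_pca(X, n_components):
    """Autoscale X (samples x variables) and retain the leading PCs.

    Hypothetical helper: returns the scaling parameters, the loading
    matrix P and the retained eigenvalues lam.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Covariance of autoscaled data is the correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return {"mean": mean, "std": std,
            "P": eigvecs[:, :n_components], "lam": eigvals[:n_components]}


def t2_and_q(model, X):
    """Compute Hotelling's T2 and the Q residual statistic for each sample."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                          # scores in the retained subspace
    t2 = np.sum(T ** 2 / model["lam"], axis=1)  # variation within the PCA model
    residual = Z - T @ model["P"].T             # part not captured by the model
    q = np.sum(residual ** 2, axis=1)           # squared residual norm
    return t2, q


# Usage on synthetic, correlated data (an assumption, not engine data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X_demo = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
demo_model = fit_pca(X_demo, n_components=2)
t2_demo, q_demo = t2_and_q(demo_model, X_demo)
percent, cumulative = variance_captured(TABLE3_EIGENVALUES)
```

In this sketch T2 is evaluated on the retained scores and Q on what the retained components leave unexplained, so a sample can violate one statistic without violating the other; this is the motivation for plotting the two jointly as in Fig. 7.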