Application of a data-driven monitoring technique to
diagnose air leaks in an automotive diesel engine: a case study
David Antory
Electrical Test for Advanced Architectures, International Automotive Research Centre,
Warwick Manufacturing Group, University of Warwick, Coventry, CV4 7AL, U.K.
(E-mail: d.antory@warwick.ac.uk; Tel: +44-24-76575441; Fax: +44-24-76575403)
Abstract
This paper presents a case study of the application of a data-driven monitoring
technique to diagnose air leaks in an automotive diesel engine. Using measurement
signals taken from the sensors/actuators which are present in a modern automotive
vehicle, a data-driven diagnostic model is built for condition monitoring purposes.
Detailed investigations have shown that measured signals taken from the experimental
test-bed often contain redundant information and noise due to the nature of the process.
In order to deliver a clear interpretation of these measured signals, they therefore need
to undergo a ‘compression’ and an ‘extraction’ stage in the modelling process. It is at
this stage that the proposed data-driven monitoring technique plays a significant role by
taking only the important information of the original measured signals for fault
diagnosis purposes. The status of the engine’s performance is then monitored using this
diagnostic model. This condition monitoring process involves two separate stages of
fault detection and root-cause diagnosis.
The effectiveness of this diagnostic model was validated using an experimental
automotive 1.9L 4-cylinder diesel engine embedded in a chassis dynamometer in an
engine test-bed. Two joint diagnostics plots were used to provide an accurate and
sensitive fault detection process. Using the proposed model, small air leaks in the inlet
manifold plenum chamber with a diameter size of 2 to 6 mm were accurately detected.
Further analysis using contributions to the T² and Q statistics shows the effect of these air
leaks on fuel consumption. It was later discovered that these air leaks may contribute to
an emissions fault.
In comparison to the existing model-based approaches, the proposed method has
several benefits: (i) it makes no simplifying assumptions, as the model is built entirely
from the measured signals; (ii) it is simple and straightforward; (iii) no additional
hardware is required for modelling; (iv) it is a time- and cost-efficient way to
deliver condition monitoring (i.e. fault diagnosis); (v) it is capable of pinpointing
the root cause and the effect of the problem; and (vi) it is feasible to
implement in practice.
Keywords: application, data-driven technique, condition monitoring, diagnosis, air
leaks, automotive diesel engine
1. Introduction
Stringent emission regulations have led automotive manufacturers to develop
systems which can detect and diagnose any fault which may cause tailpipe emissions to
rise above a prescribed threshold. This can be achieved by continuously monitoring the
automotive data characteristics for any abnormal behaviour. Recently, Mills [1]
discussed a way to perform automated analysis of automotive data to oversee vehicle
system operations, to automate data capture and analysis, and also to improve the
diagnostic process. Such an approach can be viewed as a method for improving the
reliability, safety and efficiency of the processes as discussed by Isermann [2] and
Gertler [3]. This can also be used as a way to conduct fault detection and identification.
Previous work by the author [4] investigated faults in an automotive engine using
measurement signals which were available in production engines and excluded the
remaining signals which can only be measured in a test bed environment. The work
reported in this paper extends previous investigations by using all measurement signals
taken from an engine during tests conducted in a laboratory. This additional step allows
a complete analysis of the experimental data which may be beneficial in the design,
development, manufacturing and service stages of the vehicle lifecycle. A detailed
analysis is then performed to demonstrate the detection and diagnosis processes. This
paper shows that faults caused by various small leaks (of 2 mm, 4 mm and 6 mm
diameter) in the intake manifold plenum chamber of a 1.9 litre TDI diesel engine can
be well detected and diagnosed. The model, built using a data-driven technique named
principal component analysis (PCA), performed more accurate condition monitoring of
this fault than that achieved by using a conventional physical model (Section 4). The
improved performance is especially apparent for the smallest air leak (with a diameter
of 2 mm).
This paper is organised as follows: the next section describes the data-driven
technique, the PCA method, which is followed by a discussion of the experimental data
in Section 3. Section 4 discusses the condition monitoring process where the detection
and diagnosis of various sizes of air leak is explained in detail. Finally Section 5
concludes this paper and discusses the future work.
2. Data-driven technique for condition monitoring
This section discusses the data-driven technique known as principal component
analysis (PCA). PCA has gained considerable attention mostly in the field of industrial
chemical and semiconductor processes for condition monitoring [5-8]. The technique
can be successfully applied to automotive applications [4].
2.1. The PCA method
The different types of signals collected from the process are recorded in a range of
different unit scales. Therefore, in PCA, normalisation is an essential first stage to make
the variance of one process variable comparable to that of any other [9].
Normalisation can be done by mean-centring or auto-scaling the raw data; the latter
divides the mean-centred data by its standard deviation. The normalised data
are then stored in column vectors that form a matrix. PCA identifies a combination of
variables that describe major trends in the data set. It relies on an eigenvector
decomposition of the covariance or correlation matrix of the process variables [10]. The
most important information can then be described using a small number of principal
components (PC). PCA is a powerful tool in this respect, for analysing multivariate data
sets [11].
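The auto-scaling step described above can be sketched as follows. This is a minimal illustration only: the function name `autoscale` and the two example signals are hypothetical and are not taken from the experimental data.

```python
from statistics import mean, stdev

def autoscale(columns):
    """Mean-centre each variable and divide it by its sample standard deviation."""
    scaled, params = [], []
    for col in columns:
        mu, sigma = mean(col), stdev(col)
        scaled.append([(v - mu) / sigma for v in col])
        params.append((mu, sigma))      # kept so new data can be scaled the same way
    return scaled, params

# Hypothetical signals recorded in very different units (illustrative values only).
boost_pressure_kpa = [150.0, 152.0, 149.0, 151.0, 153.0]
fuel_flow_kg_h = [4.1, 4.3, 4.0, 4.2, 4.4]

scaled, params = autoscale([boost_pressure_kpa, fuel_flow_kg_h])
```

After scaling, both signals have zero mean and unit variance, so neither dominates the decomposition purely because of its units.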
For a given data matrix $X \in \Re^{m \times n}$, in which m samples are stored as row
vectors and n (n << m) process variables are stored as column vectors, the
application of PCA gives rise to a reduced set of ‘synthetic’ process variables, PC
scores T and PC loadings P, containing important variation, written as follows:
$$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T + E = \sum_{i=1}^{k} t_i p_i^T + E \qquad (1)$$
X is decomposed into a sum of vector products of PC score vectors $t_i$, stored as
column vectors in T, and PC loading vectors $p_i$, stored as column vectors in P, where
k < n represents the significant process variation shown by the first k dominant
eigenvectors of the correlation matrix $S_{xx}$ defined as follows:
$$S_{xx} = \frac{1}{m-1} X^T X \in \Re^{n \times n} \qquad (2)$$
Here, X is auto-scaled to have zero mean and unit variance. The residual matrix E
describes unimportant variation and noise in the original data X. The important
variation is stored in the estimation matrix $\hat{X} = \sum_{i=1}^{k} t_i p_i^T$; thus,
the residual E can be written as follows:
$$E = X - \hat{X} \qquad (3)$$
Whilst the elements in the loading vectors describe the coefficient of the linear
relationships between the process variables, the elements in the score vectors represent
the variation in these variables. The model is built by determining k, which represents a
reduced set of PCs that describe significant process variation.
The loading vectors $p_i$ are the eigenvectors of the correlation matrix $S_{xx}$, which
can be formulated as follows:
$$S_{xx} p_i = \lambda_i p_i \qquad (4)$$
where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$ of the correlation
matrix. It measures the amount of variance explained by the $\{t_i, p_i\}$ pairs, which are
arranged in descending order of $\lambda_i$. Consequently, the first k pairs capture the
largest amount of variation which contains the largest amount of information from the
original data.
The score vector $t_i$ is the linear combination of the original data matrix X defined by
loadings $p_i$ as shown below:
$$X p_i = t_i \qquad (5)$$
This transformation enhances the ability of PCA to extract information from the original
data by eliminating redundant information. The reduced set of variables is then used for
modelling and analysis. Kourti and MacGregor [8] stated that this new reduced data set
often contains more robust information of the process than the original data. More
details about PCA can be found in Jackson [10] and Jolliffe [12].
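To make Eqs. (1) to (5) concrete, the sketch below builds a small PCA model using only the Python standard library. It is an illustration under stated assumptions, not the implementation used in this study: the data are hypothetical, and the eigenvectors are found by power iteration with deflation, a simple solver chosen here only to keep the example self-contained (a linear algebra library would normally be used).

```python
import math
from statistics import mean, stdev

def autoscale(X):
    """Mean-centre every column of X and scale it to unit variance."""
    stats = [(mean(col), stdev(col)) for col in zip(*X)]
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in X]

def correlation_matrix(X):
    """S_xx = X^T X / (m - 1) for an auto-scaled m x n matrix X (Eq. 2)."""
    m, n = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / (m - 1) for j in range(n)]
            for i in range(n)]

def leading_eigenpairs(S, k, iters=500):
    """First k (eigenvalue, eigenvector) pairs of a symmetric matrix S,
    found by power iteration with deflation (illustrative solver)."""
    n = len(S)
    S = [row[:] for row in S]                      # work on a copy
    pairs = []
    for _ in range(k):
        p = [float(i + 1) for i in range(n)]       # arbitrary start vector
        for _ in range(iters):
            w = [sum(S[i][j] * p[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in w))
            p = [v / norm for v in w]
        lam = sum(p[i] * S[i][j] * p[j] for i in range(n) for j in range(n))
        pairs.append((lam, p))
        for i in range(n):                         # deflate the component just found
            for j in range(n):
                S[i][j] -= lam * p[i] * p[j]
    return pairs

# Hypothetical data: three signals, the second strongly tied to the first.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
X_raw = [[100 + 5 * ai, 2 + 0.1 * (ai + 0.5 * bi), 40 + 3 * (bi + 0.2 * ci)]
         for ai, bi, ci in zip(a, b, c)]

X = autoscale(X_raw)
k = 2                                              # number of retained PCs
pairs = leading_eigenpairs(correlation_matrix(X), k)
eigenvalues = [lam for lam, _ in pairs]
P = [p for _, p in pairs]                          # loading vectors p_i

# Scores t_i = X p_i (Eq. 5), estimate X_hat = sum t_i p_i^T, residual E (Eq. 3).
T = [[sum(x * pj for x, pj in zip(row, p)) for p in P] for row in X]
X_hat = [[sum(t * p[j] for t, p in zip(t_row, P)) for j in range(len(row))]
         for t_row, row in zip(T, X)]
E = [[x - xh for x, xh in zip(row, rh)] for row, rh in zip(X, X_hat)]
```

In this constructed example two PCs capture almost all of the variance of the three signals, so the residual E carries only the small remaining eigenvalue.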
2.2. Monitoring statistics
Using PCA for condition monitoring involves the application of a PCA model to new
observed data. The procedure applied is similar to that of building the actual model. The
observed data are normalised using the mean and standard deviation of the PCA model.
By using the same number of PCs retained to build the model k, the loadings P, and the
correlation matrix Sxx, condition monitoring can be performed. The examination focuses
on the variation of the observed data within the PCA model and the mismatch between
the PCA model and the observed data.
2.2.1. The Hotelling’s T2 statistic
The Hotelling’s T² statistic gives a measure of significant variation of the process. It
is simply the sum of the squared scores, each divided by its variance. The PC score
vector t is obtained by projecting the new observed data $x_{new}$ onto the plane defined
by the PCA loadings P. This can be summarised as follows:
$$t = x_{new} P \qquad (6)$$
$$T^2 = t \Lambda^{-1} t^T = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} \qquad (7)$$
where $x_{new}$ and t are row vectors, $\Lambda^{-1}$ is a diagonal matrix of the inverse of
the k largest eigenvalues $\lambda_i$ of correlation matrix $S_{xx}$ in descending order, and
$t_i$ is the ith score.
The Hotelling’s T² statistic can be plotted as a function of time. The statistical
thresholds for T² can be calculated using the F-distribution [10, 12] as follows:
$$T_\alpha^2 = \frac{k(m-1)}{m-k} F_\alpha(k, m-k) \qquad (8)$$
where $T_\alpha^2$ is the threshold value at confidence level α (typically 95% or 99%),
m is the number of samples used to build the PCA model, k is the number
of PCs retained and $F_\alpha(k, m-k)$ is the upper 100α% critical point of the
F-distribution with k and (m - k) degrees of freedom.
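A minimal sketch of Eqs. (6) to (8) is given below. The loadings, eigenvalues and sample are hypothetical, and the F-distribution critical point is passed in as a number rather than computed; the value 4.61 is an approximate table value for the 99% point with 2 and roughly 6000 degrees of freedom and should be treated as an assumption.

```python
def hotelling_t2(x_new, loadings, eigenvalues):
    """T^2 = sum_i t_i^2 / lambda_i, with scores t = x_new P (Eqs. 6 and 7)."""
    t = [sum(x * p for x, p in zip(x_new, p_i)) for p_i in loadings]
    return sum(t_i * t_i / lam for t_i, lam in zip(t, eigenvalues))

def t2_threshold(k, m, f_crit):
    """Eq. (8); f_crit is the F-distribution critical point F_alpha(k, m - k),
    supplied by the caller (for example from statistical tables)."""
    return k * (m - 1) / (m - k) * f_crit

# Hypothetical model with k = 2 retained PCs for n = 3 auto-scaled variables.
loadings = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0]]   # orthonormal loading vectors p_i
eigenvalues = [2.0, 0.9]
x_new = [1.0, 2.0, 0.5]                          # one new auto-scaled sample

t2 = hotelling_t2(x_new, loadings, eigenvalues)
limit = t2_threshold(k=2, m=6000, f_crit=4.61)   # f_crit: assumed table value
violates = t2 > limit
```

A sample is flagged when its T² value exceeds the limit; in this constructed case it does not.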
2.2.2. The Q (residual) statistic
The Q statistic measures the mismatch between the PCA model and the
observed data. It shows how well the newly observed data conform to the PCA model.
The mismatch between measured and estimated sensor readings results in the residual e,
which forms the basis of the Q statistic and is formulated as follows:
$$e = x - t P^T = x [I_n - P P^T] \qquad (9)$$
The Q statistic is simply the sum of squares of the residual e, thus:
$$Q = e e^T = \sum_{j=1}^{n} e_j^2 \qquad (10)$$
where $e_j$ is the jth residual. The Q statistic can be plotted as a function of time. The
statistical thresholds for the Q statistic [13] can be calculated as follows:
$$Q_\alpha = \theta_1 \left( \frac{h_0 c_\alpha \sqrt{2\theta_2}}{\theta_1} + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} + 1 \right)^{1/h_0} \qquad (11)$$
where $\theta_1 = \sum_{i=k+1}^{n} \lambda_i$, $\theta_2 = \sum_{i=k+1}^{n} \lambda_i^2$,
$\theta_3 = \sum_{i=k+1}^{n} \lambda_i^3$, $h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2}$
and $c_\alpha$ is the normal deviate corresponding to the (1 − α) percentile.
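The residual, the Q statistic and its threshold (Eqs. (9) to (11)) can be sketched as follows. The loadings, sample and discarded eigenvalue are hypothetical, and the code takes the normal deviate from the standard normal quantile at the chosen confidence level (0.99 here), which is one reading of the definition of $c_\alpha$ above.

```python
import math
from statistics import NormalDist

def q_statistic(x_new, loadings):
    """Q = sum_j e_j^2 with residual e = x - t P^T (Eqs. 9 and 10)."""
    t = [sum(x * p for x, p in zip(x_new, p_i)) for p_i in loadings]
    x_hat = [sum(t_i * p_i[j] for t_i, p_i in zip(t, loadings))
             for j in range(len(x_new))]
    return sum((x - xh) ** 2 for x, xh in zip(x_new, x_hat))

def q_threshold(discarded_eigenvalues, confidence=0.99):
    """Eq. (11), using the eigenvalues lambda_{k+1} ... lambda_n left out of the model."""
    th1 = sum(discarded_eigenvalues)
    th2 = sum(lam ** 2 for lam in discarded_eigenvalues)
    th3 = sum(lam ** 3 for lam in discarded_eigenvalues)
    h0 = 1 - 2 * th1 * th3 / (3 * th2 ** 2)
    c = NormalDist().inv_cdf(confidence)          # standard normal deviate
    inner = h0 * c * math.sqrt(2 * th2) / th1 + th2 * h0 * (h0 - 1) / th1 ** 2 + 1
    return th1 * inner ** (1 / h0)

# Hypothetical model: 2 retained PCs for 3 variables, one discarded eigenvalue.
loadings = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0]]
x_new = [1.0, 2.0, 0.5]

q = q_statistic(x_new, loadings)                  # only the third variable is unexplained
q_limit_99 = q_threshold([0.1], confidence=0.99)
q_limit_95 = q_threshold([0.1], confidence=0.95)
```

Because the two loading vectors span the first two variables exactly, the whole residual in this example comes from the third variable.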
2.2.3. Geometrical interpretation of the monitoring statistics
The geometrical interpretation of the Hotelling’s T2 and the Q statistics is illustrated
in Fig.1 for a 2D plane formed by the first and second principal components. Point A
shows the orthogonal deviation of a new sample perpendicular to the ellipse plane
model, while point B shows the horizontal deviation of a new sample from the centre of
the ellipse plane model. The deviation reflects the effect of the abnormal
situation on the process. The further away this deviation is from the ellipse plane model,
the more serious the effect of the fault which has occurred.
Fig. 1: Geometric interpretation of the monitoring statistics
The two monitoring statistics mentioned above can complement each other to
produce more accurate condition monitoring. However, when the effect of the fault only
emerges in one of the monitoring statistics, it may cause confusion in the analysis and
interpretation of results. A joint monitoring statistics plot which combines the Q and the
Hotelling’s T² statistics may give a better interpretation. This is discussed in detail in the
following section.
2.2.4. Kernel density method for joint diagnostics
Chen et al. [14] stated that in process condition monitoring, the Q and Hotelling’s T2
statistics are the most important statistical parameters. They are useful for monitoring
the system performance independently to detect any abnormal situation. Combining
both statistics can improve on the sensitivity of either individual monitoring statistic,
especially when dealing with incipient faults, such as small air leaks. This can be done
by simply using each individual statistic’s confidence limit to create joint diagnostics
confidence limits (i.e. by plotting the Q against the T² statistic in a two-dimensional
plot). Alternatively, a new confidence region can be generated from the probability
density functions (PDF) of the joint Q and T2 statistics using the kernel density
estimation (KDE) method [14].
A PDF describes the likelihood with which a data point has occurred in previous
process operations. KDE approximates the density function by a sum of small kernel
functions (for example of a Gaussian or Epanechnikov type) centred on each data point.
Using the kernel, confidence regions are determined entirely from the structure contained
in the data set without reference to a parametric model. KDE provides simple, reliable
and useful information to a wide range of applications in fields such as medicine,
engineering and economics [15].
The univariate kernel density estimator can be formulated as follows:
$$\hat{f}(x; h) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\} \qquad (12)$$
where K is a kernel function that satisfies the condition $\int K(x)\,dx = 1$, and h is the
bandwidth. Using a rescaling notation where $K_h(u) = h^{-1} K(u/h)$, Eq. (12) is
transformed into:
$$\hat{f}(x; h) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i) \qquad (13)$$
A unimodal probability density function that is symmetric about zero is usually
chosen for K. One important aspect when using the non-parametric KDE approach is the
determination of the bandwidth h. Wand and Jones [15] state that even though it is
possible to choose the bandwidth subjectively by eye in many situations, it is very
time-consuming, especially if one has no prior knowledge of the structure of the data. They
proposed the use of an automatic bandwidth selector. In this paper, an automatic
bandwidth selector based on cross-validation of the mean integrated squared error (MISE)
is adopted.
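The univariate estimator of Eq. (12) can be sketched with a Gaussian kernel as shown below. The data points and the fixed bandwidth are hypothetical; the automatic MISE cross-validation selector adopted in this paper is not reproduced here.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: unimodal, symmetric about zero, integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, h):
    """Eq. (12): f_hat(x; h) = (n h)^-1 * sum_i K((x - X_i) / h)."""
    n = len(data)
    return sum(gaussian_kernel((x - x_i) / h) for x_i in data) / (n * h)

# Hypothetical observations and a hand-picked bandwidth (both are assumptions).
data = [-1.0, 0.0, 0.5, 2.0]
h = 0.5

# Riemann-sum check that the estimated density integrates to approximately 1.
step = 0.01
area = sum(kde(-10 + i * step, data, h) for i in range(2001)) * step
```

A smaller h gives a spikier estimate that follows individual points, while a larger h gives a smoother one, which is why the choice of bandwidth matters.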
The extension from univariate to multivariate KDE requires some modification. The
bandwidth h is transformed into a bandwidth matrix H using a diagonal matrix with one
parameter as follows: $H = h^2 I$, where I is an identity matrix. In order to avoid the loss
of accuracy caused by forcing the bandwidth to be the same in all dimensions, Fukunaga [16]
suggested rescaling the data so that all variables have a comparable scale in every
dimension. This reduces the computational load and provides a reasonable
choice for bandwidth selection. H is determined by minimising a global error criterion.
For MISE cross-validation, this criterion is given by:
$$\mathrm{MISE}\{\hat{f}(\cdot, H)\} = E \int [\hat{f}(x, H) - f(x)]^2 \, dx \qquad (14)$$
where $\hat{f}(x, H)$ is the fitted density function and $f(x)$ is the real density function. The
multivariate kernel density estimator can then be written as follows:
$$\hat{f}(x, H) = n^{-1} \sum_{i=1}^{n} K_H(x - X_i) \qquad (15)$$
where H is a symmetric positive definite d × d bandwidth matrix.
In analogy to the univariate version, $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$, where
$\int K(x)\,dx = 1$.
The probability density functions of the joint monitoring statistics between the Q and
Hotelling’s T² statistics can be built using Eq. (15) with a small modification, where
$x = \begin{Bmatrix} Q \\ T^2 \end{Bmatrix}$ and $X_i = \begin{Bmatrix} Q_i \\ T_i^2 \end{Bmatrix}$.
In this paper, a 99% confidence region is adopted, which
means that under normal operating conditions not more than 1% of the total observed
data lie outside of this region.
More detailed information about KDE can be found in [15, 17]. Examples of the
application of KDE to process monitoring can be found in [18, 19].
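One simple way to turn Eq. (15) into a confidence region is sketched below for $H = h^2 I$ and a Gaussian kernel. The reference data are randomly generated stand-ins for rescaled (Q, T²) pairs, the bandwidth is hand-picked, and the contour level is taken as the density value exceeded by 99% of the reference points. All of these are assumptions made for illustration and are not necessarily the procedure of [14].

```python
import math
import random

def kde2(point, data, h):
    """Eq. (15) in two dimensions with H = h^2 I and a Gaussian kernel."""
    total = 0.0
    for q_i, t2_i in data:
        d2 = ((point[0] - q_i) ** 2 + (point[1] - t2_i) ** 2) / (h * h)
        total += math.exp(-0.5 * d2) / (2 * math.pi)
    return total / (len(data) * h * h)

def contour_level(data, h, coverage=0.99):
    """Density value that roughly `coverage` of the reference points exceed."""
    densities = sorted(kde2(p, data, h) for p in data)
    return densities[int((1 - coverage) * len(densities))]

def is_outlier(point, data, h, level):
    """A point lies outside the confidence region if its density is below the level."""
    return kde2(point, data, h) < level

# Hypothetical, already rescaled (Q, T^2) pairs from fault-free operation.
rng = random.Random(0)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
h = 0.5                                           # hand-picked bandwidth (assumption)

level = contour_level(reference, h)
far_point_flagged = is_outlier((10.0, 10.0), reference, h, level)
centre_flagged = is_outlier((0.0, 0.0), reference, h, level)
```

Unlike the rectangular limits obtained from the two individual thresholds, the contour follows the shape of the reference data, so the region can be tighter where fault-free samples are never observed.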
3. An experimental automotive diesel engine
This section explains the procedure used to obtain the experimental data from an
engine test-cell facility. A description of the automotive diesel engine used in this study
is briefly presented. This is followed by a discussion of the types of fault conditions
investigated.
3.1. Design of experiment (DoE)
A four-cylinder Volkswagen 1.9 litre turbocharged direct injection (TDI) diesel
engine was used to provide the experimental data. The engine was coupled to a 145 kW
AC Schenck dynamometer and Ricardo control system in an instrumented test bed
facility. A photo of the test laboratory is given in Fig. 2.
Fig. 2: Photo of engine test cell with Volkswagen diesel engine connected to
the dynamometer
The fault-free, or baseline performance characteristics of the engine were recorded at
steady-state conditions with speed settings of 1500, 2500, 3500, and 4500 rev/min,
respectively. Five different pedal positions, ranging from 30% to 100%, were tested at
each speed. These test conditions are summarised in Table 1.
Table 1: Matrix of speed/load settings used during the engine tests
The values of the pedal positions were chosen using the following procedure. Firstly,
the peak torque values at each speed were recorded. Next, the pedal positions
corresponding to 20%, 40%, 60%, 80% and 100% of these peak torque values were
noted. These pedal positions were then used during both fault-free and fault-containing
tests. This ensured that the same inputs were used in all tests. Experimental data from a
total of 20 different combinations, covering a wide range of operating conditions, were
therefore used to develop the model. Each steady-state condition was recorded for 30
seconds at a sampling rate of 10 Hz. A total of 300 points were therefore recorded for
each combination, producing an overall total of 6000 points across the entire range of
steady-state driving conditions. At each point, the signals from 12 transducers were
recorded using the combined input settings shown in Table 2. The first 7 outputs are
available in production engines, while the remaining 5 outputs can be captured using
laboratory instruments. The model derived did not include engine speed and pedal
position as these represent input parameters that are set by the test-cell operator via the
dynamometer control system. Including these inputs in the model would not give
additional information since they represent the ideal situation of steady-state behaviour.
It would be a different matter for a transient dynamic experimental case, where the
dynamic characteristics of the inputs (speed and load) heavily influence the output
signals; in that case it would be mandatory to include them in the model. In this
steady-state experimental case, the remaining 12 output variables, shown in Table 2, can
be affected by any operational fault that occurs.
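The size of the baseline data set follows directly from the test matrix described above; the short sketch below reproduces the arithmetic. The variable names are illustrative only.

```python
from itertools import product

speeds_rev_min = [1500, 2500, 3500, 4500]
torque_fractions = [0.2, 0.4, 0.6, 0.8, 1.0]      # fractions of peak torque at each speed
record_time_s, sampling_rate_hz = 30, 10

conditions = list(product(speeds_rev_min, torque_fractions))
points_per_condition = record_time_s * sampling_rate_hz
total_points = len(conditions) * points_per_condition
```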
Table 2: Recorded experimental engine signals
3.2. Fault investigated: air leak in the intake manifold
The fault to be examined was an air leak in the intake manifold. This particular kind
of fault can be difficult to detect as, under a range of operating conditions, the turbocharger waste gate will inherently try to counteract the fault and maintain the manifold
boost pressure at a pre-determined level. Consequently, depending on the magnitude of
the air leak, the fault may be imperceptible to the driver. However, the engine
management system (EMS) assumes that all of the air which passes the airflow meter
will subsequently enter the combustion chamber. If some of this air escapes from the
manifold, then the overall air-fuel ratio will be lower than that assumed by the EMS.
This could therefore lead to an increase in the levels of carbon monoxide, unburned
hydrocarbons and particulate matter being released into the atmosphere, especially at
full load conditions. Depending on the location of the leak within the intake manifold
and the method of control used, the exhaust gas recirculation (EGR) process may also
be affected, leading to an increase in NOx emissions.
In this investigation, the air leak was created by drilling holes of 2 mm, 4 mm and 6
mm diameters in a removable bolt in the inlet manifold plenum chamber. The manifold
was pressure tested for leaks prior to the experimental leak being introduced. The
complexity of the combustion process makes the identification of such a fault a difficult
task. The effect of these leaks on the raw data is shown in Fig. 3, which highlights the
fact that it is difficult to identify the abnormal effect. Consequently, with the exception
of the φ6 mm hole, this type of fault would be difficult to detect using a physical model
(see Section 4.1). Of particular interest is the data recorded with a φ2 mm leak which
appears to be identical to the data recorded during the fault-free condition.
Fig. 3: Raw plot of the experimental data for all measured signals
4. Condition monitoring of intake manifold air leaks
This section discusses the condition monitoring process where the detection and
diagnosis of air leaks in the inlet manifold plenum chamber is investigated. A
comparison between physical and principal component models is provided to illustrate
the effectiveness of the proposed data-driven model over conventional physical
techniques to examine the effect of air leaks, especially for the smallest leak (φ2 mm).
4.1. Physical model
In the physical model, the air leak rate was calculated from the pressure difference
between the manifold and the surrounding atmosphere using the following equation:
$$\dot{m} = A \sqrt{2 \rho \Delta P} \times C_D \qquad (16)$$
where $\dot{m}$ is the mass flow rate of air through the hole (kg/s), A is the area of the
hole (m²), ρ is the density (kg/m³), ΔP is the manifold boost pressure (Pa), and $C_D$ is
the coefficient of discharge, which was taken to be 0.6. The air flow entering the engine
was measured by the air flow meter during engine testing.
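Eq. (16) can be evaluated for the three hole sizes as sketched below. The air density and boost pressure used here are hypothetical round numbers chosen only to illustrate the calculation; they are not measurements from the test bed.

```python
import math

def leak_mass_flow(diameter_m, density_kg_m3, delta_p_pa, c_d=0.6):
    """Eq. (16): mass flow rate (kg/s) through a circular hole of the given diameter."""
    area = math.pi * diameter_m ** 2 / 4
    return c_d * area * math.sqrt(2 * density_kg_m3 * delta_p_pa)

# Assumed operating point (illustrative values only).
density = 1.8          # kg/m^3, air in the manifold
boost = 100_000.0      # Pa, manifold boost pressure

flows = {d_mm: leak_mass_flow(d_mm / 1000, density, boost) for d_mm in (2, 4, 6)}
```

Since the flow scales with the hole area, doubling the diameter quadruples the leak rate at a given boost pressure, which is consistent with the rapid growth in percentage air loss from the φ2 mm to the φ6 mm hole.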
As expected, it was found that the percentage of air lost through the hole increased as
the diameter of the hole increased. For the three diameters tested, the highest percentage
loss occurred at 1500 rev/min at full load, reaching 1.98%, 7.05% and 15.19% for φ2
mm, φ4 mm and φ6 mm holes, respectively. Under these conditions the air flow rate
entering the engine is low due to the low engine speed. However, the manifold boost
pressure, which is the driving force for the air leakage flow, is almost at its maximum.
Consequently, the air leak flow rate constitutes a high proportion of the flow rate
entering the engine.
Fig. 4 shows the flow rate of air through the hole as a percentage of air entering the
engine. Given that the maximum air leak rate was less than 2% for the φ2 mm hole, with
an average value less than 1%, this fault posed a difficult challenge for the fault
detection and diagnosis algorithm.
Fig. 4: Percentage air loss caused by 2 mm, 4 mm and 6 mm air leaks in the
inlet manifold plenum chamber for all combinations tested.
4.2. Principal components model
Using the 12 output signals taken from the experimental data, a PCA model was
built. Table 3 shows the variance captured by PC scores in descending order.
Table 3: Variance captured by PCA
The method of choosing how many PCs to retain was based on the percentage of
variance captured by each PC. A minimum of 1% was required for a PC to be included.
Popular methods such as the eigenvalue-one rule and the cross-validation procedure are
not suitable for this case. Both approaches select 2 PCs to be retained to build a model.
During residual evaluation it was found that using these approaches the residual value
still contained a considerable amount of the variation of the original data. Further
examination revealed that 5 PCs captured most of the original variance of the
experimental data and left only a negligible level of less than 1% of unimportant
variation and noise. Information regarding the methods used to choose the number of
PCs is not discussed here due to limitation of space but can be found in Jolliffe [12].
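The 1% rule described above can be sketched as follows. The twelve eigenvalues are hypothetical, chosen only so that the example mirrors the behaviour reported here (the eigenvalue-one rule keeps 2 PCs, the 1% rule keeps 5); they are not the values of Table 3.

```python
def select_components(eigenvalues, min_fraction=0.01):
    """Keep every PC that captures at least `min_fraction` of the total variance."""
    total = sum(eigenvalues)
    fractions = [lam / total for lam in eigenvalues]
    k = sum(1 for f in fractions if f >= min_fraction)
    return k, fractions

# Hypothetical eigenvalues of a 12 x 12 correlation matrix, in descending order.
eigenvalues = [7.2, 3.0, 0.9, 0.5, 0.3, 0.04, 0.02, 0.015, 0.01, 0.008, 0.005, 0.002]

k, fractions = select_components(eigenvalues)
k_eigenvalue_one = sum(1 for lam in eigenvalues if lam > 1.0)
residual_fraction = sum(fractions[k:])            # variance left in the residual
```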
4.3. Process monitoring of air leaks fault
Section 2.2 has discussed the monitoring statistics used to detect air leaks in the
intake manifold plenum chamber. To illustrate the monitoring process, a new data set of
150 seconds (corresponding to 1500 samples) for various driving conditions with a φ2
mm air leak introduced for the last 300 samples at 1500 rev/min and at full load, was
used for validation purposes.
Fig. 5: Process monitoring using Hotelling’s T2 statistic.
Fig. 6: Process monitoring using Q residual statistic.
Fig. 5 and Fig. 6 illustrate the monitoring process using Hotelling’s T² and Q statistics,
respectively. Two confidence limits (99% and 95%) are provided to highlight the
violation that was caused by a φ2 mm air leak at full load, 1500 rev/min. While the first
1200 samples remain below the confidence limits, the majority of the last 300 samples
(sample 1201 onwards) strongly violate the confidence limits. This abnormal condition
caused by the φ2 mm air leak in the intake manifold becomes more apparent in Fig. 7.
The joint monitoring statistics plots shown in Fig. 7 can enhance the detection
capabilities with increased sensitivity. The first joint diagnostics plot simply combines
the two monitoring statistics (Q and T2) and plots them together on the X and Y axes.
The validation data set consists of the plus symbol (which represents the first 1200
conforming samples) and the cross symbol (which represents the non-conforming
samples caused by the φ2 mm air leak at sample numbers 1201 to 1500). There are 4
regions in Fig. 7(a) defined by the two 99% confidence limits of the monitoring
statistics, denoted as R1, R2, R3 and R4, respectively. R1 illustrates the normal region
containing samples which fall below both confidence limits. In contrast, as can be seen
R3 is the region containing those samples which violate both confidence limits. The
samples contained in this region represent the most abnormal conditions and stem
mostly from the φ2 mm condition represented by the cross symbol. R2 and R4 contain
samples which violate either the Q residual (R2) or Hotelling’s T2 (R4) statistics alone.
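The four regions described above amount to a simple classification of each sample against the two confidence limits, sketched below. The function name and the numerical limits are hypothetical.

```python
def joint_region(q, t2, q_limit, t2_limit):
    """Assign a sample to region R1 to R4 of the joint Q versus T^2 plot."""
    q_violated, t2_violated = q > q_limit, t2 > t2_limit
    if q_violated and t2_violated:
        return "R3"                    # both limits violated: most abnormal
    if q_violated:
        return "R2"                    # Q residual limit violated alone
    if t2_violated:
        return "R4"                    # Hotelling's T^2 limit violated alone
    return "R1"                        # normal region

# Hypothetical 99% limits and samples (illustrative values only).
q_limit, t2_limit = 0.66, 9.22
regions = [joint_region(q, t2, q_limit, t2_limit)
           for q, t2 in [(0.1, 2.0), (0.9, 3.0), (1.5, 20.0), (0.2, 12.0)]]
```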
Fig. 7(a): Process monitoring using combined Q residual and Hotelling’s T2
statistics.
The second joint diagnostics plot shown in Fig. 7(b) utilises a confidence region
estimated using the kernel density method as discussed in Section 2.2.4 for condition
monitoring. Here, the contour represents the 99% confidence region of the joint PDF
between the Q and T2 statistics. Any points falling outside the contour represent outliers
which occurred as an effect of the air leaks.
Fig. 7(b): Process monitoring using a kernel density confidence region of
the combined Q residual and Hotelling’s T2 statistics.
Further analysis of the effect of the air leaks can be carried out using contributions to
the T² and Q statistics. Fig. 8 shows an analysis of the variation captured by the
model (T² statistic). It is obvious that the HC (hydrocarbon) measurement signal is
affected the most when air leaks occurred, especially from point 1200 onwards, where
the air leak is introduced.
Fig. 8: Contribution to Hotelling’s T2 statistic for various data points (fault-free and
faulty conditions).
Fig. 9: Contribution to Q residual statistic for various data points (fault-free and faulty
conditions).
In a similar fashion, Fig. 9 shows an analysis of the mismatch between the diagnostic
model and the unseen measured signals (Q statistic). It clearly shows that the fuel flow
measurement signal is affected the most by air leaks, especially from point 1200
onwards.
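A common convention for contribution analysis of the Q statistic is to split Eq. (10) into its individual squared residuals, one per signal. The sketch below uses that convention as an assumption, since the exact contribution formula is not stated here; the signal names and residual values are hypothetical.

```python
def q_contributions(residual, signal_names):
    """Split Q = sum_j e_j^2 (Eq. 10) into one squared-residual term per signal."""
    return {name: e * e for name, e in zip(signal_names, residual)}

# Hypothetical residual vector for one faulty sample (illustrative values only).
signal_names = ["fuel_flow", "boost_pressure", "air_flow", "HC"]
residual = [1.8, -0.4, 0.6, 0.3]

contributions = q_contributions(residual, signal_names)
q_total = sum(contributions.values())
largest = max(contributions, key=contributions.get)
```

The signal with the largest contribution is the first candidate when tracing the root cause of a violation.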
5. Conclusions and further work
This paper demonstrated that a data-driven monitoring technique, principal
components analysis (PCA), is a simple, straight-forward, powerful and potentially
useful technique for condition monitoring in automotive applications. The diagnostic
model is capable of exploring and exploiting underlying ‘hidden’ information from the
experimental data in a compact manner. No simplifying assumptions are needed in
building the model, which means that the model is derived
solely from the measurement signals. The interdependency of the original signals is
‘captured’ and ‘transformed’ into a new and smaller number of independent signals
(Section 2). The remaining un-captured signals will contain mainly un-informative and
noisy data. The variation within the model (T² statistic) and the residual (Q statistic)
are used as the backbone of the fault detection and diagnosis process.
It was shown in Section 4 that the diagnostic PCA model performed better than a
physical model (where an assumption is made to define a coefficient of
discharge, CD) when detecting air leaks at the intake manifold plenum chamber,
especially for a small-diameter air leak (φ2 mm). Using two joint monitoring statistics
plots, detection and diagnosis can be represented visually more clearly and a better
analysis can be carried out. A confidence region estimated using the kernel density
method increases the sensitivity of the monitoring process. It allows easier visual
interpretation and thereby improves the detection and diagnosis of small air leaks
(see Fig. 7(b)) in comparison to joint diagnostics built by simply combining both
monitoring statistics (see Fig. 7(a)). Further analysis using contributions to the T² and
Q statistics shows the effect of the air leaks on fuel consumption and indicates that they
may contribute to an emissions fault. Another important benefit of using this diagnostic
model is that it can be used to detect and to diagnose any type of fault (within the scope
of the measured signals) in a similar manner to the air leak fault.
The proposed technique has therefore shown good potential for automotive
applications. It may be a valuable tool for a variety of condition monitoring situations,
especially as the emissions regulations become increasingly stringent.
Acknowledgements
David wishes to acknowledge the support of Dr. Darja Brandenburg for proofreading this manuscript. Comments and suggestions received from Dr. Geoffrey
McCullough of the Internal Combustion Engines Research Group are appreciated.
Special gratitude goes to Dr. Paul McEntee for his assistance in collecting the
experimental data with the support of the Virtual Engineering Centre. Support received
from Prof. George W. Irwin and Dr. Uwe Kruger of the Intelligent Systems and Control
Research Group, Queen’s University Belfast, is gratefully acknowledged. Thanks also
to the Electrical Test for Advanced Architectures team at the International Automotive
Research Centre, University of Warwick.
References
[1]. W.N. Mills III, Automated analysis of automotive data, SAE World Congress,
Vehicle Diagnostic, SP-1922, No. 2005-01-1437, Detroit, USA, April 2005.
[2]. R. Isermann, Model-based fault detection and diagnosis – status and applications,
Annual Reviews in Control, 29(2005) 71-85.
[3]. J. Gertler, Fault detection and diagnosis in engineering systems, Marcel Dekker,
New York, USA, 1998.
[4]. D. Antory, Fault diagnosis applications using nonlinear multivariate statistical
process control, Ph.D. Thesis, School of Electrical & Electronics Engineering,
Virtual Engineering Centre, Queen’s University Belfast, Belfast, Northern Ireland,
UK, February 2005.
[5]. S.J. Qin, Statistical process monitoring: basics and beyond, J. Chemometrics,
17(2003) 480-502.
[6]. J.F. MacGregor, Data-based methods for process analysis, monitoring and control,
in: Proc. 13th IFAC Symposium on System Identification, Rotterdam, The
Netherlands, 2003, pp. 1019-1029.
[7]. J.F. MacGregor, T. Kourti, Statistical process control of multivariable processes,
Control Engineering Practice, 3(1995) 403-414.
[8]. T. Kourti, J.F. MacGregor, Process analysis, monitoring and diagnosis, using
multivariate projection methods, Chemometrics and Intelligent Laboratory
Systems, 28(1995) 3-21.
[9]. P. Geladi, B.R. Kowalski, Partial least-squares regression: a tutorial, Analytica
Chimica Acta, 185(1986) 1-17.
[10]. J.E. Jackson, A user's guide to principal components, Wiley, New York, USA,
1991.
[11]. K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate analysis, Academic Press,
London, UK, 1979.
[12]. I.T. Jolliffe, Principal component analysis, Springer, New York, USA, 1986.
[13]. J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with
principal component analysis, Technometrics, 21(1979) 341-349.
[14]. Q. Chen, U. Kruger, M. Meronk, A.Y.T. Leung, Synthesis of T2 and Q statistics for
process monitoring, Control Engineering Practice, 12(2004) 745-755.
[15]. M.P. Wand, M.C. Jones, Kernel smoothing, Monographs on statistics and applied
probability 60, Chapman & Hall, London, UK, 1995.
[16]. K. Fukunaga, Introduction to statistical pattern recognition, Academic Press,
London, UK, 1990.
[17]. B.W. Silverman, Density estimation for statistics and data analysis, Monographs on
Statistics and Applied Probability 26, Chapman & Hall, London, UK, 1986.
[18]. E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process
performance monitoring charts, J. Process Control, 6(1996) 349-358.
[19]. Q. Chen, R. Wynne, P. Goulding, D.J. Sandoz, The application of principal
component analysis and kernel density estimation to enhance process monitoring,
Control Engineering Practice, 8(2000) 531-543.
Vitae
David obtained a Sarjana Teknik (Ingenieur) degree in
Electronics Engineering (major) from the Institute of
Technology Sepuluh Nopember (ITS), Surabaya, Indonesia.
Following qualification, he worked as an engineer for a
year, before taking up an opportunity for further studies at
the University of Sheffield, where he received an MSc(Eng)
degree in Control Systems Engineering. Upon completion, he joined the Virtual
Engineering Centre, Queen's University Belfast, to pursue Ph.D. research in a
multidisciplinary project. He completed his studies and obtained a Ph.D. degree in
Control Engineering. Since September 2004, he has been working as a Project Engineer for the
Electrical Test for Advanced Architectures project at the International Automotive
Research Centre (IARC), Warwick Manufacturing Group, based at the University of
Warwick. He is a member of IEEE and SAE. His current research interests include:
fault detection and diagnosis, multivariate statistical process control, non-linear system
modelling and identification, neural networks, intelligent data mining and process
optimisation, applied to modelling and control in automotive, aeronautics and industrial
chemical processes.
Figure captions
Fig. 1: Geometric interpretation of the monitoring statistics
Fig. 2: Photo of engine test cell with Volkswagen diesel engine connected to
a chassis dynamometer
[Fig. 3 plot omitted. Twelve panels, each plotted against Data Points: Fuel Flow (kg/h), Air Flow (kg/h), Intake Manifold Pressure (bar), Intake Manifold Temperature (°C), Turbine Inlet Pressure (bar), Turbine Inlet Temperature (°C), Torque (Nm), Turbine Exit Pressure (bar), Turbo Speed (Hz), CO2 (percentage), HC (ppm), O2 (percentage). Legend: original fault-free samples; samples of 2 mm air leaks; samples of 4 mm air leaks; samples of 6 mm air leaks.]
Fig. 3: Raw plot of the experimental data for all measured signals
[Fig. 4 plot omitted. Vertical axis: Percentage air loss through hole; horizontal axis: Data Points. Legend: 2 mm hole; 4 mm hole; 6 mm hole.]
Fig. 4: Percentage air loss caused by 2 mm, 4 mm and 6 mm air leaks in the
inlet manifold plenum chamber for all combinations tested.
[Fig. 5 plot omitted. Vertical axis: Hotelling's T2 Statistic; horizontal axis: Sample Number. Legend: Hotelling's T2 statistic value; 99% Confidence limit; 95% Confidence limit. Annotation: Non-conforming samples.]
Fig. 5: Process monitoring using Hotelling’s T2 statistic.
[Fig. 6 plot omitted. Vertical axis: Q Residual Statistic; horizontal axis: Sample Number. Legend: Q Residual statistic value; 99% Confidence limit; 95% Confidence limit. Annotation: Non-conforming samples.]
Fig. 6: Process monitoring using Q residual statistic.
[Fig. 7(a) plot omitted. Scatter plot with vertical axis: Hotelling's T2 Statistic; horizontal axis: Q Residual Statistic. Points are labelled with their sample numbers (1201 to 1500). Legend: Conforming samples; Non-conforming samples. Regions are marked R1, R2, R3 and R4.]
Fig. 7(a): Process monitoring using combined Q residual and Hotelling’s T2 statistics.
[Fig. 7(b) plot omitted. Scatter plot with vertical axis: Hotelling's T2 Statistic; horizontal axis: Q Residual Statistic. Points are labelled with their sample numbers (1201 to 1500).]
Fig. 7(b): Process monitoring using a kernel density confidence region of
the combined Q residual and Hotelling’s T2 statistics.
Fig. 8: Contribution to Hotelling’s T2 statistic for various data points (fault-free and
faulty conditions).
Fig. 9: Contribution to Q residual statistic for various data points (fault-free and faulty
conditions).
Table captions
Table 1: Matrix of speed/load settings used during the engine tests
Speed (rev/min)    Pedal Position (% load)
1500               30, 40, 54, 62, 100
2500               49, 59, 74, 78, 100
3500               57, 64, 74, 80, 100
4500               62, 65, 76, 83, 100
Table 2: Recorded experimental engine signals
Engine Variable                Unit       Note
Speed                          rev/min    input
Pedal Position                 %          input
Fuel Flow                      kg/h       output
Air Flow                       kg/h       output
Intake Manifold Pressure       bar        output
Intake Manifold Temperature    °C         output
Turbine Inlet Pressure         bar        output
Turbine Inlet Temperature      °C         output
Turbine Exit Pressure          bar        output
Torque                         Nm         output
Turbo Speed                    Hz         output
CO2                            %          output
HC                             ppm        output
O2                             %          output
Table 3: Variance captured by PCA
Number of PC    Eigenvalue    Variance Captured by each PC (%)    Total Sum of Variance Captured (%)
1               7.46          62.13                               62.13
2               3.55          29.61                               91.74
3               0.44          3.68                                95.42
4               0.30          2.52                                97.94
5               0.15          1.25                                99.19
6               0.049         0.40                                99.59
7               0.040         0.33                                99.92
8               0.006         0.047                               99.967
9               0.003         0.015                               99.982
10              0.001         0.011                               99.993
11              0.0006        0.005                               99.998
12              0.0002        0.002                               100
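The percentage columns of Table 3 are related to the eigenvalues by simple arithmetic: each eigenvalue divided by the sum of all eigenvalues gives the variance captured by that PC, and a running sum gives the total. The sketch below reproduces this using the rounded eigenvalues from the table, so the results agree with the tabulated percentages only approximately. The function name variance_captured and the 95% retention threshold are illustrative assumptions and are not taken from the paper.

```python
from itertools import accumulate

# Eigenvalues as printed in Table 3 (rounded values).
eigenvalues = [7.46, 3.55, 0.44, 0.30, 0.15, 0.049,
               0.040, 0.006, 0.003, 0.001, 0.0006, 0.0002]

def variance_captured(eigvals):
    """Return per-PC and cumulative percentage of variance captured."""
    total = sum(eigvals)
    percent = [100.0 * value / total for value in eigvals]
    cumulative = list(accumulate(percent))
    return percent, cumulative

percent, cumulative = variance_captured(eigenvalues)

# Hypothetical selection rule: keep the fewest PCs whose cumulative
# variance reaches the threshold (95% is an assumed value).
threshold = 95.0
n_keep = next(i for i, c in enumerate(cumulative, start=1) if c >= threshold)

for pc, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC {pc:2d}: {p:7.3f}%  cumulative {c:8.3f}%")
print("PCs needed for the assumed threshold:", n_keep)
```

The eigenvalues sum to roughly 12, which is what would be expected if the twelve output signals of Table 2 were scaled to unit variance before the decomposition.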