ACCURATE IDENTIFICATION OF BIASED MEASUREMENTS UNDER SERIAL CORRELATION

0263–8762/00/$10.00+0.00
© Institution of Chemical Engineers
Trans IChemE, Vol. 78, Part A, October 2000
R. KONGSJAHJU, D. K. ROLLINS and M. B. BASCUÑANA
Departments of Chemical Engineering and Statistics, Iowa State University, Iowa, USA
Chemical process data are often correlated over time (i.e., auto or serially correlated) due
to recycle loops, large material inventories, sampling lag, dead time, and process
dynamics created by high-order systems and transportation lag. However, many
approaches that attempt to identify gross errors in measured process variables have not
addressed the issue of serial correlation which can lead to large inaccuracies in identifying
biased measured variables. Hence, this work extends the unbiased estimation technique
(UBET) of Rollins and Davis1 to address serial correlation. The serially correlated gross error
detection study of Kao et al.2 is used as a basis for setting up this study and comparison. In their
work, the type of autocorrelation was assumed known (ARMA(1,1)), and the measurement test
(MT) was used for the identification of the measurement bias. While Kao et al.2 used
prewhitening of the data and variances of measured variables derived from knowledge of the
time correlation structure, this work presents two prewhitening methods and a different
identification strategy based on the UBET. Results of the simulation study show the UBET has
higher perfect identification rates and lower type I error rates than the MT.
Keywords: serial correlation; autocorrelation; time series; gross error detection; data
reconciliation
INTRODUCTION
As measurement and modelling technology continues to
improve, along with advancements in computer technology,
the amount of available process data will continue to grow.
However, the benefits of these large amounts of data cannot
be fully harnessed if data are inaccurate. For the past four
decades, active research in the area of gross error detection
(GED) has aspired to detect, identify, and correct
measured process variables with significantly large errors.
Most GED methods proposed in the literature for the
detection of large process measurement errors are derived
from the assumption of serially independent process data
that are iid (i.e., identically and independently distributed).
This iid assumption is not always a realistic view of the
condition of real processes. From a statistical point of view,
serial correlation occurs when a process measurement
variable is correlated over time. This kind of correlation
can occur when there is model-mismatch in the dynamics
that was not accounted for in either the process or
measurement model. In most GED literature, serial correlation is differentiated from process dynamics and is used to
refer strictly to correlation between measurement errors. In
chemical processes, serial correlation may occur because of
a number of physical factors such as process dead time,
process dynamics, process control (e.g., feedback control),
as well as factors related to measuring instruments.
Kao et al.2 proposed one of the first GED methods to
address serial correlation. They prewhitened residuals of
measured variables and used the measurement test (MT) to
identify gross measurement errors. The authors’ evaluation
of this approach indicated a high rate of false identification
consistent with other MT methods involving non-serially
correlated process data (Rollins, et al.3).
The concept of prewhitening process measurements for
handling serial correlation is also used in this work to extend
the capabilities of the unbiased estimation technique
(UBET) developed by Rollins and Davis1. Over the years
the UBET has been extended to address a variety of
conditions including unknown variances and covariances of
measurement errors (Rollins and Davis4), bilinear constraints (Rollins and Roelfs5, Kuiper et al.6), dynamic
processes (Rollins et al.7, Devanathan8) and for automatically controlled processes (Rollins et al.9, Manuell et al.10).
However, this is the authors’ first attempt to extend the
UBET to serially correlated data.
There are two major contributions in this article. First,
two ways of prewhitening serially correlated data are
presented: (1) directly on the measured variables and (2) on
the nodal mass balances, both of which are different from
Kao et al.2. The merits of both strategies are then
investigated. The second major contribution is the modification of the UBET test statistics and hypothesis tests to
correctly address the use of prewhitened transformed data.
This paper is organized into three sections. The first
section presents the process measurement model that
includes the serial correlation structure used in this study
and a note on the scope of this work. This section is
followed by a reproduction of some of Kao et al.2’s results and
a discussion of the limitations of the measurement test (MT)
in dealing with serial correlation. The last section discusses
the enhancement of the UBET to serially correlated process
measurements. This section compares results of the UBET
and the MT.
PROCESS AND MEASUREMENT MODEL UNDER
SERIAL CORRELATION
The steady state measurement model that takes into account measurement bias ($\delta_t$) and serially correlated measurement errors ($\varepsilon_t$) with the correlation structure of a first order autoregressive moving-average model, ARMA(1,1)*, is shown below:

$$Y_t = \mu + \varepsilon_t + \delta_t \tag{1}$$

where

$$\varepsilon_t = \frac{1 - \theta_1 B}{1 - \phi_1 B}\, U_t \tag{2}$$

$$U_t \sim N_u(0, \Sigma) \tag{3}$$

Since process variables have to satisfy physical constraints, equation (1) is subject to

$$A\mu = 0 \tag{4}$$

where $A$ is a known $w \times u$ constraint matrix with $\mathrm{rank}(A) = w$ ($u$ = number of measured variables and $w$ = number of physical constraints).
The equations for the development of the matrix $A$ for the generic process network shown in Figure 1 used by Kao et al.2 and in this study are given below.

$$\begin{aligned}
\mu_1 - \mu_2 + \mu_4 + \mu_6 &= 0 \\
\mu_2 - \mu_3 &= 0 \\
\mu_3 - \mu_4 - \mu_5 &= 0 \\
\mu_5 - \mu_6 - \mu_7 &= 0
\end{aligned} \tag{5}$$

Thus, for this process network ($u = 7$, $w = 4$), equation (4) can be written as:

$$A\mu = \begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & -1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \\ \mu_6 \\ \mu_7 \end{bmatrix} = 0 \tag{6}$$

Figure 1. Process network used in the simulation study.
In the next section, Kao et al.2’s approach to serial correlation using the measurement test (MT) is examined using a replication of their algorithm. This study used exactly the same system and physical constraints (i.e., material balances) as in Kao et al.2 to facilitate the comparison of their method with the new methods introduced in this paper. All replication details can be found in Kao et al.2, including the value of $\mu$ used in the simulations, which is $(1,3,3,1,2,1,1)^T$. As in Kao et al.2, one (and only one) biased stream exists for each simulation run.
While it is possible for the serial correlation to be of a higher order ARMA(p,q), ARMA(1,1) time series models are used to generate data following the study by Kao et al.2. In addition, the scope used by Kao et al.2 was adopted for use in this study. This indicates the following assumptions:
(1) variances and covariances of measurement errors are known;
(2) ARMA parameters (i.e., values of $\theta_1$ and $\phi_1$) are known accurately without any sampling error;
(3) there is only one measurement bias at any given time; and
(4) the true state variable vector $x_t = x$ is fixed.

* For a comprehensive introduction to time series analysis including ARMA models and calculation of autocorrelation and partial autocorrelation functions (i.e., ACFs and PACFs) discussed in this paper, the reader is referred to Box and Jenkins11.

The fourth assumption means that there are no variances
due to process dynamics and thus, there will be no
autocorrelations in the y(t) induced by it. This means
autocorrelations that would be detected could only come
from errors in the measurement data (the y’s). It must be
noted that to date, there has not been any meaningful
progress in the literature in actually separately identifying
autocorrelations due to process dynamics and measurement
errors because of modelling difficulty. As will be seen in the
later sections of this paper, the approach here is not
restricted by the first three assumptions; they are adopted only
for reasons of comparison.
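
The setup described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' FORTRAN/IMSL code: the signs in `A` assume inflows are positive and outflows negative (chosen so that the stated flow vector balances at every node), and the helper names `matvec`, `arma11_errors` and `simulate_stream` are hypothetical.

```python
import random

# Constraint matrix A (rows: nodes A-D, columns: streams 1-7).
# Sign convention is an assumption: +1 for inflow, -1 for outflow.
A = [
    [1, -1, 0, 1, 0, 1, 0],
    [0, 1, -1, 0, 0, 0, 0],
    [0, 0, 1, -1, -1, 0, 0],
    [0, 0, 0, 0, 1, -1, -1],
]
MU = [1, 3, 3, 1, 2, 1, 1]  # true flows used in the simulations


def matvec(matrix, vector):
    """Plain matrix-vector product for small lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def arma11_errors(n, phi1, theta1, sigma, rng):
    """eps_t = phi1*eps_{t-1} + U_t - theta1*U_{t-1}, with U_t ~ N(0, sigma^2)."""
    eps, prev_eps, prev_u = [], 0.0, 0.0
    for _ in range(n):
        u = rng.gauss(0.0, sigma)
        prev_eps = phi1 * prev_eps + u - theta1 * prev_u
        prev_u = u
        eps.append(prev_eps)
    return eps


def simulate_stream(n, mu_i, delta_i, phi1, theta1, sigma, rng):
    """Y_t = mu + eps_t + delta for one stream with a constant bias delta_i."""
    return [mu_i + delta_i + e for e in arma11_errors(n, phi1, theta1, sigma, rng)]


rng = random.Random(0)
print(matvec(A, MU))  # [0, 0, 0, 0]: the flows satisfy every nodal balance
y1 = simulate_stream(200, MU[0], 5.0, 0.4, 0.2, 1.0, rng)
```

Zero pre-sample values are used to start the recursion, which is a simplification; a burn-in period would be a reasonable alternative.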
USING THE MEASUREMENT TEST FOR
HANDLING SERIAL CORRELATION
The measurement test (MT) is a popular GED test
developed by Mah and Tamhane12. It has been used in
a number of methods under the assumption of white noise
(Iordache et al.13, Heenan and Serth 14, Rosenberg et al.15,
Narasimhan and Mah16). To the authors’ knowledge, Kao
et al.2’s application of the MT is the only GED method
dealing with serial correlation that has been formally
evaluated, and thus it is used as the benchmark for
comparing the performance of the new methods presented here.
A brief discussion of the procedure used in Kao et al. and
some comments on their results follow.
Prewhitening Applied to the Estimates of the Residuals
The same measurement model as described in the
previous section was used in the work of Kao et al.2. The
serially correlated measurement error term, Et, with
ARMA(1,1) structure is shown in equations (2) and (3).
The MT makes use of the vector of measurement adjustments obtained from weighted least-squares data reconciliation. This residual vector, formed from these estimates and the measured variables, is given by

$$r_t = Y_t - \hat{Y}_t = \Sigma A^T (A \Sigma A^T)^{-1} A Y_t \tag{7}$$

where $A$ is the $w \times u$ constraint matrix, $\Sigma$ is the $u \times u$ variance-covariance matrix of random measurement error terms, $Y_t$ is the $u \times 1$ measurement vector, and $\hat{Y}_t$ is the least-squares estimate of $E[Y_t]$.
As the basis of their approach, Kao et al.2 prewhitened $r_t$ by

$$U_t = \frac{1 - \phi_1 B}{1 - \theta_1 B}\, r_t. \tag{8}$$

The MT is then applied to the filtered (prewhitened) residuals, $U_t$, and declares a bias in the $i$th measurement at time $t$ if, and only if,

$$|Z_{U_{ti}}| = \frac{|U_{ti}|}{\sigma_i} > Z_{\alpha'} \tag{9}$$

where $Z_{U_{ti}}$ is the standardized univariate normal test statistic for the $i$th measurement at time $t$, $Z_{\alpha'}$ is the upper $\alpha'$ point of the standard normal distribution, $\alpha'$ is $\frac{1}{2}\left[1 - (1 - \alpha)^{1/u}\right]$, and $\sigma_i$ is the standard deviation of measured variable $i$ (i.e., the square root of the $i$th element of the diagonal of $\Sigma$).
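
As a rough illustration of the decision rule in equation (9), the following sketch computes the critical value and flags measurements. It assumes the test is two-sided (absolute values) and uses hypothetical function names; it is not taken from Kao et al.

```python
from statistics import NormalDist


def mt_critical_value(alpha, n_measured):
    """Upper alpha' point of N(0, 1), with alpha' = 0.5 * (1 - (1 - alpha)**(1/u))."""
    alpha_prime = 0.5 * (1.0 - (1.0 - alpha) ** (1.0 / n_measured))
    return NormalDist().inv_cdf(1.0 - alpha_prime)


def measurement_test(u_filtered, sigmas, alpha):
    """Return the (0-based) indices i for which |U_ti| / sigma_i exceeds the critical value."""
    z_crit = mt_critical_value(alpha, len(u_filtered))
    return [
        i
        for i, (u_i, s_i) in enumerate(zip(u_filtered, sigmas))
        if abs(u_i) / s_i > z_crit
    ]


# Example: seven prewhitened residuals with unit standard deviations, alpha = 0.1
flagged = measurement_test([0.1, 3.0, -0.2, 0.0, 0.3, -2.6, 0.5], [1.0] * 7, 0.1)
print(flagged)  # [1, 5]
```

Because each measurement is tested separately, nothing in this rule prevents several streams from being flagged at once, which is the behaviour examined in the results that follow.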
The replication of Kao et al.2’s method included the
following conditions:
(1) $\Sigma = 0.25I$;
(2) a significance level ($\alpha$) of 0.1;
(3) only one biased variable with magnitude $\delta_t = 0.25$ existed at any given time;
(4) there were 10,000 trials simulated per biased variable.
Data were generated using FORTRAN™ Power Station with IMSL™ subroutines in Microsoft™ Windows™.
Standard normally distributed random numbers were used
to generate the white noise.
The performance measures presented by Kao et al.2 were OP (overall power) and P(type I error) (probability of type I error). They defined OP as follows:

$$\mathrm{OP} = \frac{\text{No. of non-zero } \delta\text{s correctly identified}}{\text{No. of non-zero } \delta\text{s simulated}}. \tag{10}$$
An OP equal to 0.63 means that for 10,000 cases with simulated biases, 6,300 cases identify the bias correctly. It should be noted, however, that their chosen performance measures have weaknesses. First, whenever the biased stream is correctly identified, OP increases even though other non-biased streams may have also been identified as biased at that time. In other words, OP still increases despite false identifications in other portions of the network. Second, Kao et al.2 used P(type I error) to indicate the performance of false identifications when no actual bias existed in the measured variables. Hence, this indicator does not provide sufficient information regarding the false identification of non-biased variables when bias exists. To give a better measure of performance for unbiased variables when bias exists, P(type I error) was replaced with the AVTI (averaged type I error) (Narasimhan and Mah16) and the OPF (overall performance) (Rollins and Davis1) was added to the study. These measures are defined as

$$\mathrm{AVTI} = \frac{\text{No. of zero } \delta\text{s wrongly identified}}{\text{No. of simulation trials (10,000)}} \tag{11}$$

$$\mathrm{OPF} = \frac{\text{No. of trials with perfect identification}}{\text{No. of simulation trials (10,000)}}. \tag{12}$$
The AVTI effectively shows false identifications for unbiased variables when an actual bias exists in some variable(s) in the process. The OPF, on the other hand, is a measure of perfect identifications. For example, an OPF of 0.5 means that in the 10,000 simulated trials there were 5,000 trials that identified all zero $\delta$s correctly and all non-zero $\delta$s correctly. The main goal in identification is to obtain high OPF and low AVTI.

Table 1. Performance of the MT applied to averaged prewhitened residuals with the same ARMA(1,1) in all streams; $\Sigma = 0.25$, $\delta_i = 0.2$, $\alpha = 0.1$; $\phi_1 = 0.4$, $\theta_1 = 0.2$.

i    AVTI     OP       OP(a)    OPF
1    0.9649   0.6365   0.6100   0.0051
2    0.9672   0.6374   0.6100   0.0079
3    0.9695   0.6412   0.6200   0.0054
4    0.9734   0.5772   0.4800   0.0027
5    0.9684   0.6135   0.5600   0.0056
6    0.9686   0.5693   0.4800   0.0035
7    0.9663   0.6373   0.6100   0.0063

i = the stream variable that is biased
OP = OP results from replication
OP(a) = OP results reported by Kao et al.2
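
To make the three measures concrete, the sketch below computes them from a list of simulated trials. The data layout and the function name `ged_metrics` are hypothetical. One reading is assumed and should be checked against the intended definition: a trial contributes to the AVTI when at least one unbiased stream is flagged, which keeps the AVTI between 0 and 1; counting every wrongly flagged stream separately would be the more literal reading of equation (11).

```python
def ged_metrics(trials):
    """Compute OP, AVTI and OPF.

    trials: list of (true_biased, flagged) pairs, each a set of stream indices.
    """
    n_trials = len(trials)
    biases_simulated = sum(len(true) for true, _ in trials)
    correct = sum(len(true & flagged) for true, flagged in trials)
    # Assumption: count trials with at least one false identification.
    trials_with_false_id = sum(1 for true, flagged in trials if flagged - true)
    perfect = sum(1 for true, flagged in trials if true == flagged)
    return {
        "OP": correct / biases_simulated,
        "AVTI": trials_with_false_id / n_trials,
        "OPF": perfect / n_trials,
    }


# Four trials with stream 0 biased: perfect, correct plus a false flag,
# wrong stream flagged, nothing flagged.
example = [({0}, {0}), ({0}, {0, 3}), ({0}, {2}), ({0}, set())]
print(ged_metrics(example))  # {'OP': 0.5, 'AVTI': 0.5, 'OPF': 0.25}
```

The second trial shows the weakness of OP noted above: it raises OP even though an unbiased stream was also flagged, whereas it does not count toward OPF.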
The MT’s Performance
Table 1 presents results of some cases in Kao et al.2 and the replication of their study in this work. As shown, there is close agreement between the OPs. It can therefore be concluded that their study and, more importantly, their algorithm have been successfully replicated. Table 1 reveals very high values of AVTI and very low values of OPF. A high AVTI means the method has a very high likelihood of identifying unbiased variables as biased when an actual bias exists in a stream somewhere within the process network. Very low OPFs, on the other hand, indicate very high rates of imperfect identification (i.e., while the biased stream is correctly identified, many unbiased streams in the network are also being identified as biased at the same time). This result shows the particular drawback of a high rate of false identification for the general class of strategies using the MT in the presence of biased variables. This conclusion agrees with previous studies done on the MT (Rollins17, Rollins et al.3) under the condition of white noise. In addition, it must be noted that the method of prewhitening presented by Kao et al.2 attempts to remove serial correlation from the estimated values of the residuals (i.e., it prewhitens the estimates of the residuals). This approach produces biased estimates of $Y_t$ and therefore produces biased residuals that result in decreased identification accuracy.
USING THE UBET FOR HANDLING SERIAL
CORRELATION
The unbiased estimation technique (UBET) seeks to
obtain high OPFs. The UBET is an approach for gross error
detection which was designed to address limitations of other
techniques such as inability to control type I and type II
errors, statistically inconsistent estimators and biased
estimators. The ultimate goal, as the name suggests, is
producing unbiased estimators. In this study, the approach
and model formulation of the UBET were used for gross
error detection and identification, and extended to handle
serially correlated process data.
Note that in the previous section, for the MT, prewhitening is applied to the residuals using estimated values of process variables. The prewhitening approaches used here are different. Two ways of prewhitening process data for application of the UBET were considered. The first approach was to directly prewhiten the serially correlated measurement data (i.e., the $Y_t$s). The second approach was to prewhiten the nodal mass balances ($R_t$). As will be seen later, an advantage of the latter approach over the former one is that there are fewer variables to model and transform.
A simulation study was done to show the effects of these
prewhitening schemes with the UBET and to compare them
to the method presented by Kao et al.2. As stated earlier, the
same flow network as in Kao et al.2 and Rollins and Davis1
was used (see Figure 1). For this study, $\Sigma = I$ and a
significance level ($\alpha$) of 0.05 were used. Only one biased
variable ($\delta_{ti} = \delta_i = 5.0$) existed in each trial. The number of
simulation trials per biased variable was 10,000 and the data
were generated by using the FORTRAN™ Power Station
with IMSL™ subroutines in Microsoft™ Windows™.
Standard normally distributed random numbers were used
to generate the white noise.
Prewhitening Applied to Yt

The idea of prewhitening $Y_t$ is to multiply $Y_t$, given by equation (1), by a transfer function that transforms it to randomly independent process data ($Y^*_t$), given by the equation below:

$$Y^*_t = \mu^* + U_t + \delta^*_t \tag{13}$$

where

$$Y^*_t = P(B)\,Y_t = Y_t + \pi_1 Y_{t-1} + \pi_2 Y_{t-2} + \ldots \tag{14}$$

and

$$\mu^* = \mu\,(1 + \pi_1 + \pi_2 + \ldots) \tag{15}$$

If the bias is assumed constant over time, $\delta_t = \delta$ and $\delta^*_t = \delta^*$, with

$$\delta^* = \delta\,(1 + \pi_1 + \pi_2 + \ldots) \tag{16}$$

where

$$\pi_m = \theta_1^{\,m-1}(\theta_1 - \phi_1), \quad m = 1, 2, 3, \ldots \tag{17}$$

The assumption of constant bias is not a very restrictive assumption, granted that a bias is soon removed once it is detected and identified. The transfer function $P(B)$ is written as a function of $\theta_1(B)$ and $\phi_1(B)$ (note that $B$ is the backward-shift operator):

$$P(B) = \theta_1^{-1}(B)\,\phi_1(B) = \frac{1 - \phi_1 B}{1 - \theta_1 B} \tag{18}$$

The material balances of the nodes are modeled as follows,

$$A\mu^* = A\mu\,(1 + \pi_1 + \pi_2 + \ldots) = 0. \tag{19}$$

Note that equation (19) applies only when all streams have the same ARMA(1,1) structure (i.e., where $1 + \pi_1 + \pi_2 + \ldots$ is the same for every stream) so that $A\mu$ can be factored out and set to zero (i.e., due to the conservation of mass). Although not restrictive, we are assuming the nonexistence of leaks for simplicity.

The next step, as in the UBET, is the following transformation using the prewhitened process data ($Y^*_t$):

$$R^*_t = AY^*_t = A\mu^* + A\delta^* + AU_t \tag{20}$$

where it is assumed that $\mu_{R^*_t} = E[R^*_t] = 0$ if, and only if, $\delta^* = 0$ (i.e., $\delta = 0$). This assumption is necessary for the hypotheses:

$$H_{0t}: \mu_{R^*_t} = A\delta^* = 0 \qquad H_{at}: \mu_{R^*_t} = A\delta^* \neq 0 \tag{21}$$

The basic identification mechanism of the UBET is to relate linear combinations of the components of $\mu_{R^*_t}$ to specific conclusions regarding the components of $\delta^*$:

$$H_{0i,t}: l_i^T \mu_{R^*_t} = 0 \qquad H_{ai,t}: l_i^T \mu_{R^*_t} \neq 0 \tag{22}$$

where $l_i$ is a vector of zeros and ones, representing nodal balances or combinations of nodal balances.

The Bonferroni test statistic is used as a test for the above hypotheses with the appropriate changes from Rollins and Davis1. This test is given as: reject $H_{0t}: l^T \mu_{R^*_t} = 0$ in favor of $H_{at}: l^T \mu_{R^*_t} \neq 0$ if, and only if,

$$\frac{\left|\sqrt{N}\, l^T \bar{R}^*_t\right|}{\sqrt{l^T \Sigma_{R^*_t}\, l}} \geq Z_{\alpha/2w} \tag{23}$$

where $\Sigma_{R^*_t} = A\Sigma A^T$ and $N$ is the sample size (the number of measurements for each variable at each sampling time).

Adequate prewhitening will remove serial correlation from process data. One way to verify the removal of serial correlation is to calculate the sample autocorrelation functions or ACFs. The ACF plots for the serially correlated process data (for a particular stream measurement) and the prewhitened results are shown in Figures 2 and 3, respectively. The dashed lines represent the upper and lower limits from which the sample ACFs are judged to be significantly different from zero. Prior to prewhitening, the ACF plot (Figure 2) shows ACF values that are significantly different from zero in the early lags and dying down as the lag increases. After the prewhitening, these ACF values are all within the limits (Figure 3). This confirms the absence of significant serial correlation.

Figure 2. ACF plot of ARMA(1,1) process data, $\phi_1 = 0.8$ and $\theta_1 = 0.5$.

Figure 3. ACF plot of prewhitened ARMA(1,1) process data, $\phi_1 = 0.8$ and $\theta_1 = 0.5$.
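
The prewhitening filter and the ACF check described above can be illustrated with a short standard-library sketch. It is an illustration under stated assumptions rather than the authors' implementation: the filter is applied recursively with zero pre-sample values, the limits are taken as approximately $\pm 2/\sqrt{n}$, and the function names are hypothetical. The parameter values $\phi_1 = 0.8$ and $\theta_1 = 0.5$ match those quoted for Figures 2 and 3.

```python
import random


def arma11(n, phi1, theta1, rng):
    """Generate eps_t = phi1*eps_{t-1} + U_t - theta1*U_{t-1} with unit-variance U_t."""
    eps, prev_e, prev_u = [], 0.0, 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        prev_e = phi1 * prev_e + u - theta1 * prev_u
        prev_u = u
        eps.append(prev_e)
    return eps


def prewhiten(series, phi1, theta1):
    """Apply P(B) = (1 - phi1*B) / (1 - theta1*B) recursively:
    y*_t = theta1*y*_{t-1} + y_t - phi1*y_{t-1}, with zero pre-sample values."""
    out, prev_in, prev_out = [], 0.0, 0.0
    for y in series:
        prev_out = theta1 * prev_out + y - phi1 * prev_in
        prev_in = y
        out.append(prev_out)
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


rng = random.Random(1)
eps = arma11(2000, 0.8, 0.5, rng)
white = prewhiten(eps, 0.8, 0.5)
limit = 2.0 / len(eps) ** 0.5  # approximate significance limits
print("before:", [round(r, 2) for r in sample_acf(eps, 3)])
print("after: ", [round(r, 2) for r in sample_acf(white, 3)], "limit:", round(limit, 3))
```

With the true parameters and matching start-up values, the recursion recovers the generating white noise, so the early-lag autocorrelations should fall inside the limits after filtering.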
Another way of observing serial correlation is from a time series plot (i.e., by plotting the values against time). The time series plot for the case above is shown in Figure 4. The upper line is the correlated process data and the bottom line is the prewhitened data. A trend is seen in the upper line, which shows dependence of data on past values (i.e., serial correlation), while the bottom line is slightly smoother but the difference is not very apparent. To see the removal of serial correlation from $Y_t$ in Figure 4 more clearly, Figure 5 shows a comparison of the prewhitened residuals ($\varepsilon^*_t$) to the white noise ($U_t$). Since the prewhitened residuals are identical to the white noise residuals, it can be concluded that the serial correlation has been removed and that this prewhitening scheme is effective.
Some concerns arise in applying the UBET to the prewhitened data. The first involves dealing with variables having different ARMA(1,1) structures, i.e., when $\phi_1$ and $\theta_1$ are not the same for every stream. Consequently, $(1 + \pi_1 + \pi_2 + \ldots)$, which is a function of $\phi_1$ and $\theta_1$, is not the same for each variable. Hence, $A\mu$ cannot be factored out in the matrix form of the material balances (i.e., the balances cannot be written in the form of equation (19)) and the Bonferroni test will not be at the specified level. This drawback is eliminated using the second approach discussed in the next section.

Figure 5. Comparison plot of the white noise, $U_t$, and the residual from prewhitened process data, $\varepsilon^*_t$, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $U_t$: dotted line; $\varepsilon^*_t$: solid line.

Another concern involves the effect of the magnitude of the difference between $\theta_1$ and $\phi_1$ on the GED performance. To understand this behaviour, one needs to understand the relationship between the ARMA structures and the effect of prewhitening on the GED performance. In equation (13), the prewhitened $Y_t$, $Y^*_t$, is a function of the transformed $\mu$, $\mu^*$, and the transformed $\delta$, $\delta^*$. Therefore, values of $Y^*_t$ also depend significantly on the values of $\phi_1$ and $\theta_1$ (see equations (14)–(17)). If $\theta_1 > \phi_1$, then $(\theta_1 - \phi_1)$ is positive and $Y^*_t > Y_t$. Similarly, if $\theta_1 < \phi_1$, then $(\theta_1 - \phi_1)$ is negative and $Y^*_t < Y_t$. If the difference of these parameters is zero (which is not likely in practice), the weights in equation (17) vanish and the model reduces to a white noise model (i.e., randomly independent process data). Some runs with different combinations of $\phi_1$ and $\theta_1$ values were simulated to illustrate these effects. The following cases were tested: (1) $\phi_1 < \theta_1$, (2) $\phi_1 = \theta_1$, (3) $\phi_1 > \theta_1$.
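
The effect of the parameter difference on the transformed bias can be checked numerically. The sketch below assumes the weights take the form $\pi_m = \theta_1^{m-1}(\theta_1 - \phi_1)$ with $|\theta_1| < 1$, in which case the factor $1 + \pi_1 + \pi_2 + \ldots$ multiplying $\delta$ sums to $(1 - \phi_1)/(1 - \theta_1)$. The function names are hypothetical, and the parameter pairs are ones that appear in the result tables.

```python
def pi_weight(m, phi1, theta1):
    """pi_m = theta1**(m - 1) * (theta1 - phi1), for m = 1, 2, 3, ..."""
    return (theta1 - phi1) * theta1 ** (m - 1)


def bias_gain(phi1, theta1, terms=200):
    """Truncated sum 1 + pi_1 + pi_2 + ..., the factor relating delta* to delta."""
    return 1.0 + sum(pi_weight(m, phi1, theta1) for m in range(1, terms + 1))


for phi1, theta1 in [(0.2, 0.4), (0.4, 0.2), (0.5, 0.0)]:
    closed_form = (1.0 - phi1) / (1.0 - theta1)
    print(phi1, theta1, round(bias_gain(phi1, theta1), 4), round(closed_form, 4))
# 0.2 0.4 1.3333 1.3333   (theta1 > phi1: the transformed bias is amplified)
# 0.4 0.2 0.75 0.75       (theta1 < phi1: the transformed bias is attenuated)
# 0.5 0.0 0.5 0.5
```

A gain above one makes a constant bias easier to detect after prewhitening, and a gain below one makes it harder, which is the pattern the simulation cases are designed to show.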
Table 2 contains some representative UBET results when $\theta_1 > \phi_1$. As shown in the table, the UBET’s OP and OPF are high and the AVTI is low. These values indicate that the UBET effectively identified biased variables as biased and unbiased variables as unbiased. In other words, there is a high rate of perfect identification when the measurements satisfy this correlation structure, with $\theta_1 > \phi_1$. From other runs performed but not shown in the tables, it was observed that the greater the absolute value of the difference of the parameters, $|\theta_1 - \phi_1|$, the better the performance (i.e., higher OP and OPF).

Table 2. Performance of the UBET applied to prewhitened process data for same ARMA(1,1) in all streams; when $\theta_1 > \phi_1$, $\sigma = 1.0$, $\delta = 5.0$, $\alpha = 0.05$, $N = 3$; $\phi_1 = 0.2$, $\theta_1 = 0.4$.

i    AVTI     OP       OPF
1    0.0291   0.9668   0.9394
2    0.0185   0.9994   0.9809
3    0.0204   1.0000   0.9796
4    0.0233   0.9994   0.9761
5    0.0200   1.0000   0.9800
6    0.0194   0.9996   0.9802
7    0.0204   0.9643   0.9451

i = the stream variable that is biased

Table 3 gives results of the runs when $\phi_1 \geq \theta_1$. This time, the greater the absolute value of the difference between the parameters (i.e., $|\theta_1 - \phi_1|$), the worse the performance. This is because the greater the difference, the smaller $\delta^*$ is compared to $\delta$ (equation (16)), and this smaller value of $\delta^*$ decreases the ability (i.e., power) of detecting $\delta$. These tables show the effect of increasing the sample size ($N$) to improve detection as $|\theta_1 - \phi_1|$ increases. One sees that excellent performance is still possible if a large enough $N$ is used, which is equivalent to a small enough measurement error term variance.

Table 3. Performance of the UBET applied to prewhitened process data for same ARMA(1,1) in all streams, $\phi_1 = 0.5$, $\theta_1 = 0.0$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$.

     N = 3                      N = 10                     N = 20
i    AVTI     OP       OPF      AVTI     OP       OPF      AVTI     OP       OPF
1    0.0122   0.0009   0.0000   0.0240   0.2337   0.2283   0.0289   0.9449   0.9200
2    0.0109   0.2802   0.2766   0.0172   0.9135   0.8980   0.0199   0.9981   0.9783
3    0.0133   0.3778   0.3728   0.0187   0.9780   0.9597   0.0195   1.0000   0.9805
4    0.0149   0.2055   0.2038   0.0194   0.8901   0.8763   0.0234   0.9986   0.9757
5    0.0128   0.2782   0.2740   0.0177   0.9583   0.9423   0.0203   1.0000   0.9797
6    0.0127   0.2036   0.2004   0.0181   0.8919   0.8766   0.0168   0.9988   0.9820
7    0.0119   0.0003   0.0003   0.0199   0.2316   0.2243   0.0178   0.9419   0.9258

i = the stream variable that is biased

Figure 4. Comparison plot of $Y_t$ and prewhitened $Y_t$, $Y^*_t$, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $Y_t$: solid line; $Y^*_t$: dotted line.
MT results under similar conditions as the UBET runs are
presented in Tables 4 and 5. In all cases, the OPs are high,
the AVTIs are high, and the OPFs are low, indicating very
poor rates of perfect identification. Note that the values of
OP and AVTI are 1.0. As discussed in earlier sections, this is
attributable to an MT weakness that leads to conclusions
that all variables are biased when in reality only one variable
is biased. While the MT performs well in detecting the existence
of biased variables, it identifies the non-biased variables
very poorly.
Table 4. Performance of the MT applied to prewhitened estimated values of the residuals for same ARMA(1,1) in all streams; when $\theta_1 > \phi_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$, $N = 3$; $\phi_1 = 0.2$, $\theta_1 = 0.4$.

i    AVTI     OP       OPF
1    1.0000   1.0000   0.0000
2    1.0000   1.0000   0.0000
3    1.0000   1.0000   0.0000
4    1.0000   1.0000   0.0000
5    1.0000   1.0000   0.0000
6    1.0000   1.0000   0.0000
7    1.0000   1.0000   0.0000

i = the stream variable that is biased

Table 5. Performance of the MT applied to prewhitened estimated values of the residuals for same ARMA(1,1) in all streams; when $\theta_1 < \phi_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$, $N = 3$; $\phi_1 = 0.4$, $\theta_1 = 0.2$.

i    AVTI     OP       OPF
1    1.0000   1.0000   0.0000
2    1.0000   1.0000   0.0000
3    1.0000   1.0000   0.0000
4    1.0000   1.0000   0.0000
5    1.0000   1.0000   0.0000
6    1.0000   1.0000   0.0000
7    1.0000   1.0000   0.0000

i = the stream variable that is biased

Prewhitening Applied to Rt

The second approach of prewhitening applied to the UBET is to prewhiten $R_t$. The difference between this approach and the preceding one is that instead of prewhitening $Y_t$, $R_t$ (or $AY_t$) is prewhitened. Because this approach does not require factoring out $A\mu$, as in equation (19), it is applicable to process variables with different serial correlation structures, as will be shown.

Using the process and the general measurement models presented in equations (4) and (1), respectively, the vector of nodal balances, $R_t$, is modelled as a function of the linear combinations of $\delta_t$ and the serially correlated error term:

$$R_t = \Delta + \varepsilon_{R_t} \tag{24}$$

where

$$\Delta = A\delta_t \tag{25}$$

and

$$\varepsilon_{R_t} = \frac{1 - \theta_1 B}{1 - \phi_1 B}\, U_t \tag{26}$$

$$U_t \sim N_w(0, A\Sigma A^T) \tag{27}$$

Similar modelling as applied to $Y_t$ is applied to $R_t$ as follows:

$$R^*_t = \Delta^* + U_t \tag{28}$$

with

$$\mu_{R^*_t} = E[R^*_t] = \Delta^*, \qquad \Sigma_{R^*_t} = \mathrm{Var}[R^*_t] = A\Sigma A^T \tag{29}$$

where

$$R^*_t = P(B)\,R_t = R_t + \pi_1 R_{t-1} + \pi_2 R_{t-2} + \ldots \tag{30}$$

$$\Delta^* = \Delta\,(1 + \pi_1 + \pi_2 + \ldots) \tag{31}$$

and

$$P(B) = \theta_1^{-1}(B)\,\phi_1(B) = \frac{1 - \phi_1 B}{1 - \theta_1 B} \tag{32}$$
where

$$\pi_m = \theta_1^{\,m-1}(\theta_1 - \phi_1), \quad m = 1, 2, 3, \ldots \tag{33}$$

The results using UBET with the prewhitened $R_t$ are given in Tables 6–8. Table 6 presents cases with $\theta_1 \geq \phi_1$. The results are similar to results when $Y_t$ is prewhitened directly given the same magnitudes of ARMA parameters. The OPs and the OPFs are high, and the AVTIs are low, indicating excellent identification of both biased and unbiased variables. The performance improves further as $N$ increases. Table 7 presents the results when $\theta_1 < \phi_1$. The same behaviour is observed as when $Y_t$ is prewhitened directly. That is, as $|\theta_1 - \phi_1|$ becomes larger, the identification of both biased and unbiased variables deteriorates. As in the analysis of the previous approach, this happens because as $|\theta_1 - \phi_1|$ becomes larger, the magnitude of the transformed bias ($\Delta^*$) is reduced (see equation (31)), and thus it becomes more difficult to detect bias. A possible way to overcome this limitation is to use data sets that represent periods before and after the bias occurrence, as will be illustrated.

Table 6. Performance of the UBET applied to prewhitened $R_t$ for same ARMA(1,1) structure in every node and the overall material balance, when $\theta_1 > \phi_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$, $N = 3$.

     $\phi_1 = 0.0$, $\theta_1 = 0.5$      $\phi_1 = 0.5$, $\theta_1 = 0.8$
i    AVTI     OP       OPF        AVTI     OP       OPF
1    0.0303   0.9931   0.9629     0.0279   1.0000   0.9721
2    0.0296   1.0000   0.9704     0.0284   1.0000   0.9716
3    0.0198   1.0000   0.9802     0.0160   1.0000   0.9840
4    0.0304   1.0000   0.9696     0.0300   1.0000   0.9700
5    0.0293   1.0000   0.9707     0.0300   1.0000   0.9700
6    0.0288   1.0000   0.9712     0.0278   1.0000   0.9722
7    0.0202   0.9919   0.9723     0.0206   1.0000   0.9794

i = the stream variable that is biased

Table 7. Performance of the UBET for prewhitened $R_t$ for same ARMA(1,1) structure in every node and the overall material balance, when $\theta_1 < \phi_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$.

     $\phi_1 = 0.4$, $\theta_1 = 0.2$            $\phi_1 = 0.5$, $\theta_1 = 0.0$
i    N    AVTI     OP       OPF        N    AVTI     OP       OPF
1    3    0.0220   0.1879   0.1838     3    0.0110   0.0297   0.0292
     10   0.0319   0.8076   0.7818     20   0.0299   0.7459   0.7238
2    3    0.0235   0.7363   0.7163     3    0.0146   0.2322   0.2261
     10   0.0298   0.9997   0.9699     20   0.0267   0.9982   0.9715
3    3    0.0205   0.8635   0.8450     3    0.0129   0.3156   0.3093
     10   0.0193   0.9999   0.9806     20   0.0221   1.000    0.9779
4    3    0.0265   0.6545   0.6350     3    0.0159   0.1594   0.1546
     10   0.0251   0.9998   0.9747     20   0.0300   1.0000   0.9693
5    3    0.0245   0.7725   0.7517     3    0.0181   0.2168   0.2105
     10   0.0278   1.0000   0.9722     20   0.0291   1.0000   0.9709
6    3    0.0247   0.6512   0.6334     3    0.0167   0.1614   0.1564
     10   0.0298   0.9994   0.9696     20   0.0310   0.9985   0.9676
7    3    0.0158   0.2155   0.2134     3    0.0114   0.0391   0.0384
     10   0.0210   0.8027   0.7855     20   0.0195   0.7441   0.7296

i = the stream variable that is biased

Figure 6. Prewhitening $R_{tA}$ with constant bias throughout, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $R_t$: solid line; $R^*_t$: dotted line.

Figure 7. Prewhitening $R_{tA}$ with bias introduced at time 50, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $R^*_t$: dotted line; Bias: solid line.

Figure 6 is a time series plot for node A when the bias exists for all times shown. The dashed line represents the prewhitened nodal balance data of node A ($R^*_{tA}$). The solid line is the correlated nodal balance data ($R_{tA}$). As seen in the figure, after the serial correlation has been removed, $R^*_{tA}$ is closer to zero, in the direction of non-existence of bias, reducing bias detection sensitivity. To overcome this, one may elect to perform GED analysis at each sampling time. This will improve not only detection but also the estimation of the time of bias occurrence. Figure 7 illustrates a case of bias occurrence in one of the streams entering node A at time 50. It shows that $R^*_t$ changes at time 50 but then goes back to the level it was at before time 50. Table 8 presents some representative results of the UBET’s performance for this type of bias occurrence for the same ARMA(1,1) structure in all nodes. Results show high rates of perfect identification
ACCURATE IDENTIFICATION OF BIASED MEASUREMENTS UNDER SERIAL CORRELATION
Table 8. Performance of the UBET applied to prewhitened Rt for the same ARMA(1,1) structure in all nodes, when φ1 > θ1, σ = 1.0, δi = 5.0, α = 0.05. Bias introduced in the middle of the run.

          φ1 = 0.8, θ1 = 0.5        φ1 = 0.9, θ1 = 0.9        φ1 = 0.2, θ1 = 0.4
i   N     AVTI    OP      OPF       AVTI    OP      OPF       AVTI    OP      OPF
1   3     0.0274  0.4505  0.4374    0.0595  0.4759  0.4451    0.0309  0.4519  0.4376
1   8     0.0286  0.9330  0.9067    0.0627  0.9237  0.8654    0.0278  0.9339  0.9079
2   3     0.0282  0.9608  0.9329    0.0561  0.9360  0.8813    0.0289  0.9587  0.9299
2   8     0.0294  1.0000  0.9706    0.0572  0.9999  0.9427    0.0283  1.0000  0.9717
3   3     0.0208  0.9929  0.9721    0.0358  0.9883  0.9527    0.0181  0.9918  0.9737
3   8     0.0220  1.0000  0.9780    0.0396  1.0000  0.9604    0.0194  1.0000  0.9806
4   3     0.0250  0.9508  0.9268    0.0599  0.9363  0.8796    0.0303  0.9533  0.9247
4   8     0.0313  1.0000  0.9687    0.0583  1.0000  0.9417    0.0297  1.0000  0.9703
5   3     0.0310  0.9846  0.9538    0.0571  0.9734  0.9176    0.0298  0.9839  0.9545
5   8     0.0290  1.0000  0.9710    0.0578  1.0000  0.9422    0.0312  1.0000  0.9688
6   3     0.0279  0.9553  0.9279    0.0523  0.9330  0.8838    0.0298  0.9542  0.9252
6   8     0.0311  0.9999  0.9688    0.0600  0.9999  0.9399    0.0291  1.0000  0.9709
7   3     0.0194  0.4672  0.4580    0.0431  0.4520  0.4321    0.0197  0.4659  0.4568
7   8     0.0195  0.9360  0.9177    0.0394  0.9014  0.8647    0.0191  0.9351  0.9173

i = the stream variable that is biased.
Note: the φ1 and θ1 values in the column headings are shown as magnitudes only; their signs are not legible in this copy.
and that the performance improves further as N is increased.
Similarly, Table 9 demonstrates the UBET GED performance when the ARMA(1,1) structure is different for each
measured variable. These cases show that even when
the difference between φ1 and θ1 is large and the correlation structure is different
at every stream, the UBET is able to perform well if the
analysed data represent times before and times after the bias
occurrence.

Table 9. Performance of the UBET applied to prewhitened Rt for different ARMA(1,1) structure in every node and the overall material balance, σ = 1.0, δi = 5.0, α = 0.05, N = 8.

i    AVTI    OP      OPF
1    0.0280  0.9349  0.9086
2    0.0282  1.0000  0.9718
3    0.0184  1.0000  0.9816
4    0.0268  1.0000  0.9732
5    0.0289  1.0000  0.9711
6    0.0303  1.0000  0.9697
7    0.0218  0.9339  0.9132

i = the stream variable that is biased.
The structure with different parameters for every node is given by:
(1) node A: φA,1 = 0.5 and θA,1 = 0.8,
(2) node B: φB,1 = 0.5 and θB,1 = 0.8,
(3) node C: φC,1 = 0.0 and θC,1 = 0.0,
(4) node D: φD,1 = 0.2 and θD,1 = 0.4,
(5) overall balance ABCD: φABCD,1 = 0.4 and θABCD,1 = 0.2.
Note: these parameter values are shown as magnitudes only; their signs are not legible in this copy.

CONCLUSIONS
This study shows that prewhitening the serially correlated process data facilitates the use of the UBET for
effective gross error detection. Furthermore, the performance of the UBET was shown to be superior to that of the MT in
handling prewhitened serially correlated data. Although the MT
seems to give high overall power to detect bias, its large
AVTI and small OPF take away its attractiveness. In
contrast, the UBET can give high OP and OPF and low
AVTI. Secondly, prewhitening the nodal balances (Rt) is
more effective than prewhitening the correlated process data
directly (Yt). Prewhitening Rt eliminates several limitations
of prewhitening Yt. In addition, when θ1 < φ1, the limitation
of power reduction can be overcome by the prewhitening
scheme if one performs GED analysis at each sampling
time. This prewhitening scheme requires fewer variables to
be prewhitened (i.e., only the nodal balances as opposed to
each measured variable).
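The point about needing fewer prewhitened series can be made concrete with a small sketch. It assumes the usual linear balance form Rt = A Yt and uses a hypothetical 7-stream, 4-node incidence matrix. The matrix, the flow values and the variable names are illustrative assumptions and are not taken from the network used in this study, and white noise is used instead of serially correlated noise for brevity.

```python
import numpy as np

# Hypothetical incidence matrix A (4 nodes x 7 streams): +1 = inflow, -1 = outflow.
A = np.array([
    [1, -1,  0,  1,  0,  1,  0],   # node A
    [0,  1, -1, -1,  0,  0,  0],   # node B
    [0,  0,  1,  0, -1, -1,  0],   # node C
    [0,  0,  0,  0,  1,  0, -1],   # node D
], dtype=float)

# Assumed true steady-state flows that satisfy A @ mu = 0.
mu = np.array([10.0, 20.0, 15.0, 5.0, 10.0, 5.0, 10.0])

# A bias of 5.0 on stream 2 only (index 1).
delta = np.zeros(7)
delta[1] = 5.0

rng = np.random.default_rng(0)
n = 50
# Measurements: one row per sampling time (white noise here, for brevity).
Y = mu + delta + rng.normal(scale=1.0, size=(n, 7))

# Nodal balances, one row per sampling time: R_t = A @ Y_t.
R = Y @ A.T

print("series to prewhiten if working on Y:", Y.shape[1])
print("series to prewhiten if working on R:", R.shape[1])
print("expected shift in nodal balances, A @ delta:", A @ delta)
```

In this hypothetical layout only four balance series would need an ARMA model, compared with seven measurement series, and the bias on one stream appears as a shift in the balances of the two nodes that the stream connects.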
Following the assumptions in Kao et al., ARMA
parameters used in this study were assumed to be known
accurately without sampling error. In a real application,
parameter estimates may be obtained by doing a time series
analysis (i.e., analysing the ACFs and PACFs) of measurement data or nodal residuals, depending on the GED
prewhitening approach chosen. Since the estimation of
parameters introduces sampling error, the sensitivity of the
tests to these errors may have to be examined in the future.
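As a rough illustration of the time series analysis mentioned above, the sketch below computes a sample ACF and a simple moment-based starting value for φ1. It relies on the standard ARMA(1,1) property ρk = φ1 ρk-1 for k ≥ 2. The function names are hypothetical, and this particular estimator is an assumption for illustration; the study itself treats the parameters as known.

```python
import numpy as np


def sample_acf(z, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a 1-D series."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom
                     for k in range(max_lag + 1)])


def phi1_from_acf(r):
    """Moment-based starting value of phi1 for an ARMA(1,1) series: r_2 / r_1."""
    return r[2] / r[1]


# Small deterministic check of sample_acf: an alternating series has a
# strongly negative lag-1 autocorrelation and a positive lag-2 one.
z = np.array([1.0, -1.0] * 5)
r = sample_acf(z, 2)
print("sample ACF at lags 0, 1, 2:", r)

# Applying the moment relation to an assumed set of ACF values.
print("phi1 starting value:", phi1_from_acf(np.array([1.0, 0.4, 0.32])))
```

A value obtained this way would serve only as a starting point; in practice the sampling error of the estimates is the concern raised in the paragraph above, and a full fit would normally use a dedicated time series routine.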
The idea of prewhitening data is not a new one. However, the
ways in which the authors have prewhitened data in the context of
GED are a significant contribution of this work. In addition,
the extension of the UBET to address GED analysis under
these prewhitening schemes and its performance results are
also significant contributions. While it may be difficult to
accurately approximate the serially correlated behaviour for
very large networks, we believe that this work has merit for
a large number of chemical processes in industry today.
Finally, although this work was presented under the
assumption of steady state, the proposed approach would
also be applicable to conditions of pseudo steady state, as in
Rollins and Davis4.
NOMENCLATURE
A        constraint matrix, (w × u)
ACF      autocorrelation function
AVTI     average type I error
Et       serially correlated random measurement error vector (u × 1) at time t
Ert      serially correlated random nodal balances error vector (w × 1) at time t
k        lag numbers
l        general vector used for making linear combinations of measurements
li       vector used for making linear combinations for tests of δi
MT       measurement test
n        number of time instants in the time series
N        sample size
OP       overall power
OPF      overall performance
rt       estimates of measurement errors in the MT at time t (u × 1)
Rt       nodal balance vector (w × 1) at time t
Rt       prewhitened nodal balance vector (w × 1) at time t
UBET     unbiased estimation technique
Ut       white noise random error vector (u × 1) at time t
Yt       process measurements vector at time t (u × 1)
Zα/2w    100/(α/2w)th percentile of the normal distribution

Greek letters
α        type I error level or the significance level
Δ        unknown w × 1 vector of linear combinations of measurement biases
Δ        transformed Δ after prewhitening step, (w × 1)
Δt       unknown w × 1 vector of linear combinations of measurement biases at time t
δ        unknown u × 1 vector of measurement biases
δ0       initial value of the drifting bias, u × 1
δ        transformed δ after prewhitening step, (u × 1)
δi       bias of stream i
δt       δ at time t
μ        unknown true process mean vector, (u × 1)
μ        transformed μ after prewhitening step, (u × 1)
μRt      the expected value of Rt, (w × 1)
μRt      the expected value of Rt, (w × 1)
φp(B)    AR(p) model transfer function
φ1       first-order coefficient of autoregressive function
Π(B)     prewhitening transfer function as a function of π1, π2, . . .
θ0       zero-order coefficient of moving average function
θq(B)    MA(q) model transfer function
θ1       first-order coefficient of moving average function
ρk       ACF values at k lag numbers
Σ        variance-covariance measurement matrix of measured variables, (u × u)
ΣRt      variance-covariance matrix for R, (w × w)
σ        standard deviation of normal random errors
ACKNOWLEDGEMENTS
We would like to acknowledge the partial support for this project by the
National Science Foundation under Grant CTS-9453534.
REFERENCES
1. Rollins, D. K. and Davis, J. F., 1992, Unbiased estimation of gross
errors in process measurements, AIChE J, 38: 563-572.
2. Kao, C. S., Mah, R. S. H. and Tamhane, A. C., 1990, Gross error
detection in serially correlated process data, Ind Eng Chem Res, 29:
1004-1012.
3. Rollins, D. K., Cheng, Y. and Devanathan, S., 1996, Intelligent
selection of hypothesis tests to enhance gross error identification,
Comp and Chem Eng, 20: 517-530.
4. Rollins, D. K. and Davis, J. F., 1993, Gross error detection when
variance-covariance matrices are unknown, AIChE J, 39: 1335-1341.
5. Rollins, D. K. and Roelfs, S. D., 1992, Gross error detection when
constraints are bilinear, AIChE J, 38: 1295-1298.
6. Kuiper, S. D., Rollins, D. K. and Chen, V. C. P., 1997, Gross error
detection strategies when constraints are bilinear, ADCHEM '97
International Symposium on Advanced Control of Chemical Processes,
289.
7. Rollins, D. K. and Devanathan, S., 1993, Unbiased estimation in
dynamic data reconciliation, AIChE J, 39: 1330-1334.
8. Devanathan, S., 1993, Dynamic data reconciliation and gross error
detection, Masters Thesis (Iowa State University, USA).
9. Rollins, D. K., Cheng, Y. and Chen, V. C. P., 1996, Detection of
equipment faults in automatically controlled processes, AIChE J, 42:
642.
10. Manuell, L. M., Bascunana, V. B. and Rollins, D. K., 1997, Statistical
fault detection of automatically controlled processes, ADCHEM '97 Int
Symp on Advanced Control of Chemical Processes, 458-463.
11. Box, G. E. P. and Jenkins, G. M., 1970, Time Series Analysis,
Forecasting and Control, Revised ed, Holden Day, San Francisco.
12. Mah, R. S. H. and Tamhane, A. C., 1982, Detection of gross errors in
process data, AIChE J, 28: 828.
13. Iordache, C., Mah, R. S. H. and Tamhane, A. C., 1985, Performance
studies of the measurement test for detection of gross errors in process
data, AIChE J, 31: 1187.
14. Heenan, W. A. and Serth, R. W., 1986, Gross error detection and data
reconciliation in steam-metering systems, AIChE J, 32: 733-742.
15. Rosenberg, J., Mah, R. S. H. and Iordache, C., 1987, Evaluation of
schemes for detecting and identifying gross errors in process data, Ind
Eng Chem Res, 26: 555-564.
16. Narasimhan, S. and Mah, R. S. H., 1987, Generalized likelihood ratio
methods for gross error identification, AIChE J, 33: 1514.
17. Rollins, D. K., 1990, Unbiased estimation of measured process
variables when measurement biases and process leaks are present,
PhD Dissertation (Ohio State University, USA).
ADDRESS
Correspondence concerning this paper should be addressed to Dr D. K.
Rollins, Department of Chemical Engineering, Iowa State University, 1033
Sweeney Hall, Ames, IA 50010, USA, (E-mail: drollins@iastate.edu).
The manuscript was received 26 October 1999 and accepted for
publication after revision 1 August 2000.