0263–8762/00/$10.00+0.00
© Institution of Chemical Engineers
Trans IChemE, Vol. 78, Part A, October 2000

ACCURATE IDENTIFICATION OF BIASED MEASUREMENTS UNDER SERIAL CORRELATION

R. KONGSJAHJU, D. K. ROLLINS and M. B. BASCUÑANA
Departments of Chemical Engineering and Statistics, Iowa State University, Iowa, USA

Chemical process data are often correlated over time (i.e., auto or serially correlated) due to recycle loops, large material inventories, sampling lag, dead time, and process dynamics created by high-order systems and transportation lag. However, many approaches that attempt to identify gross errors in measured process variables have not addressed the issue of serial correlation, which can lead to large inaccuracies in identifying biased measured variables. Hence, this work extends the unbiased estimation technique (UBET) of Rollins and Davis1 to address serial correlation. The serially correlated gross error detection study of Kao et al.2 is used as a basis for setting up this study and comparison. In their work, the type of autocorrelation was assumed known (ARMA(1,1)), and the measurement test (MT) was used for the identification of the measurement bias. While Kao et al.2 used prewhitening of the data and variances of measured variables derived from knowledge of the time correlation structure, this work presents two prewhitening methods and a different identification strategy based on the UBET. Results of the simulation study show the UBET has higher perfect identification rates and lower type I error rates than the MT.

Keywords: serial correlation; autocorrelation; time series; gross error detection; data reconciliation

INTRODUCTION

As measurement and modelling technology continues to improve, along with advancements in computer technology, the amount of available process data will continue to grow. However, the benefits of these large amounts of data cannot be fully harnessed if data are inaccurate.
For the past four decades, active research in the area of gross error detection (GED) has aspired to detect, identify, and correct measured process variables with significantly large errors. Most GED methods proposed in the literature for the detection of large process measurement errors are derived from the assumption of serially independent process data that are iid (i.e., identically and independently distributed). This iid assumption is not always a realistic view of the condition of real processes.

From a statistical point of view, serial correlation occurs when a process measurement variable is correlated over time. This kind of correlation can occur when there is model mismatch in the dynamics that was not accounted for in either the process or the measurement model. In most GED literature, serial correlation is differentiated from process dynamics and is used to refer strictly to correlation between measurement errors. In chemical processes, serial correlation may occur because of a number of physical factors such as process dead time, process dynamics, process control (e.g., feedback control), as well as factors related to measuring instruments.

Kao et al.2 proposed one of the first GED methods to address serial correlation. They prewhitened residuals of measured variables and used the measurement test (MT) to identify gross measurement errors. The authors' evaluation of this approach indicated a high rate of false identification, consistent with other MT methods involving non-serially correlated process data (Rollins et al.3). The concept of prewhitening process measurements for handling serial correlation is also used in this work to extend the capabilities of the unbiased estimation technique (UBET) developed by Rollins and Davis1.
Over the years the UBET has been extended to address a variety of conditions including unknown variances and covariances of measurement errors (Rollins and Davis4), bilinear constraints (Rollins and Roelfs5, Kuiper et al.6), dynamic processes (Rollins et al.7, Devanathan8) and automatically controlled processes (Rollins et al.9, Manuell et al.10). However, this is the authors' first attempt to extend the UBET to serially correlated data.

There are two major contributions in this article. First, two ways of prewhitening serially correlated data are presented: (1) directly on the measured variables and (2) on the nodal mass balances, both of which are different from Kao et al.2. The merits of both strategies are then investigated. The second major contribution is the modification of the UBET test statistics and hypothesis tests to correctly address the use of prewhitened transformed data.

This paper is organized into three sections. The first section presents the process measurement model that includes the serial correlation structure used in this study and a note on the scope of this work. This section is followed by a reproduction of some of Kao et al.2's results and a discussion of the limitations of the measurement test (MT) in dealing with serial correlation. The last section discusses the enhancement of the UBET to serially correlated process measurements and compares results of the UBET and the MT.

PROCESS AND MEASUREMENT MODEL UNDER SERIAL CORRELATION

The steady state measurement model that takes into account measurement bias ($\delta_t$) and serially correlated measurement errors ($\varepsilon_t$) with the correlation structure of a first order autoregressive moving-average model, ARMA(1,1)*, is shown below:

$$Y_t = \mu + \varepsilon_t + \delta_t \tag{1}$$

where

$$\varepsilon_t = \frac{1 - \theta_1 B}{1 - \phi_1 B}\,U_t \tag{2}$$

$$U_t \sim N_u(0, \Sigma) \tag{3}$$

Since process variables have to satisfy physical constraints, equation (1) is subject to

$$A\mu = 0 \tag{4}$$

where $A$ is a known $w \times u$ constraint matrix with rank($A$) $= w$ ($u$ = number of measured variables and $w$ = number of physical constraints). The equations for the development of the matrix $A$ for the generic process network shown in Figure 1, used by Kao et al.2 and in this study, are given below.

$$\begin{aligned}
\mu_1 - \mu_2 + \mu_4 + \mu_6 &= 0 \\
\mu_2 - \mu_3 &= 0 \\
\mu_3 - \mu_4 - \mu_5 &= 0 \\
\mu_5 - \mu_6 - \mu_7 &= 0
\end{aligned} \tag{5}$$

Thus, for this process network ($u = 7$, $w = 4$), equation (4) can be written as:

$$A\mu = \begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & -1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \\ \mu_6 \\ \mu_7 \end{bmatrix} = 0 \tag{6}$$

Figure 1. Process network used in the simulation study.

In the next section, Kao et al.2's approach to serial correlation using the measurement test (MT) is examined using a replication of their algorithm. This study used exactly the same system and physical constraints (i.e., material balances) as in Kao et al. to facilitate the comparison of their method with the new methods introduced in this paper. All replication details can be found in Kao et al., including the value of $\mu$ used in the simulations, which is $(1, 3, 3, 1, 2, 1, 1)^T$. As in Kao et al.2, one (and only one) biased stream exists for each simulation run.

While it is possible for the serial correlation to be of a higher order ARMA($p$,$q$), ARMA(1,1) time series models are used to generate data following the study by Kao et al.2. In addition, the scope used by Kao et al.2 was adopted for use in this study. This indicates the following assumptions: (1) variances and covariances of measurement errors are known; (2) ARMA parameters (i.e., values of $\theta_1$ and $\phi_1$) are known accurately without any sampling error; (3) there is only one measurement bias at any given time; and (4) the true state variable vector $x(t) = x$ is fixed. The fourth assumption means that there are no variances due to process dynamics and thus there will be no autocorrelations in the $y(t)$ induced by it. This means that any autocorrelations detected could only come from errors in the measurement data (the $y$'s). It must be noted that, to date, there has not been any meaningful progress in the literature in separately identifying autocorrelations due to process dynamics and measurement errors because of modelling difficulty. As will be seen in the later sections of this paper, the approach here is not restricted by the first three assumptions; they are adopted only for reasons of comparison.

* For a comprehensive introduction to time series analysis including ARMA models and calculation of autocorrelation and partial autocorrelation functions (i.e., ACFs and PACFs) discussed in this paper, the reader is referred to Box and Jenkins11.

USING THE MEASUREMENT TEST FOR HANDLING SERIAL CORRELATION

The measurement test (MT) is a popular GED test developed by Mah and Tamhane12. It has been used in a number of methods under the assumption of white noise (Iordache et al.13, Heenan and Serth14, Rosenberg et al.15, Narasimhan and Mah16). To the authors' knowledge, Kao et al.'s application of the MT is the only GED method dealing with serial correlation that has been formally evaluated and thus should be used as the medium for comparing the performance of the new methods stated here. A brief discussion of the procedure used in Kao et al. and some comments on their results follow.

Prewhitening Applied to the Estimates of the Residuals

The same measurement model as described in the previous section was used in the work of Kao et al.2. The serially correlated measurement error term, $\varepsilon_t$, with ARMA(1,1) structure is shown in equations (2) and (3).
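As a minimal sketch of the measurement model in equations (1) to (4), the following generates serially correlated measurements for the seven-stream network. Several things here are assumptions rather than details taken from the source: the Box-Jenkins sign convention $(1 - \phi_1 B)\varepsilon_t = (1 - \theta_1 B)U_t$, the signs in the matrix `A` (chosen so that $A\mu = 0$ for the stated $\mu$), the zero initial state, and all function names.

```python
import random

# Constraint matrix for the four-node, seven-stream network. The signs are an
# assumption, chosen so that A @ MU = 0 for the mu quoted in the text.
A = [
    [1, -1, 0, 1, 0, 1, 0],
    [0, 1, -1, 0, 0, 0, 0],
    [0, 0, 1, -1, -1, 0, 0],
    [0, 0, 0, 0, 1, -1, -1],
]
MU = [1.0, 3.0, 3.0, 1.0, 2.0, 1.0, 1.0]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def arma11_errors(n_steps, phi1, theta1, sigma, rng):
    """eps_t = phi1*eps_{t-1} + U_t - theta1*U_{t-1}, started from zero."""
    eps, prev_eps, prev_u = [], 0.0, 0.0
    for _ in range(n_steps):
        u = rng.gauss(0.0, sigma)  # white noise U_t
        e = phi1 * prev_eps + u - theta1 * prev_u
        eps.append(e)
        prev_eps, prev_u = e, u
    return eps


def simulate_measurements(n_steps, phi1, theta1, sigma, biased_stream, delta, seed=0):
    """Y_t = mu + eps_t + delta_t with a constant bias on one stream."""
    rng = random.Random(seed)
    errors = [arma11_errors(n_steps, phi1, theta1, sigma, rng) for _ in MU]
    rows = []
    for t in range(n_steps):
        y = [MU[i] + errors[i][t] for i in range(len(MU))]
        y[biased_stream] += delta
        rows.append(y)
    return rows


# Sigma = 0.25 I corresponds to sigma = 0.5 for every stream.
Y = simulate_measurements(200, phi1=0.8, theta1=0.5, sigma=0.5,
                          biased_stream=2, delta=0.25)
```

Here `biased_stream=2` places the bias on stream 3 (zero-based indexing), and each stream uses the same ARMA(1,1) parameters.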
The MT makes use of the vector of measurement adjustments obtained from weighted least-squares data reconciliation. This residual vector, using these estimates and the measured variables, is given by

$$r_t = Y_t - \hat{Y}_t = \Sigma A^T (A \Sigma A^T)^{-1} A Y_t \tag{7}$$

where $A$ is the $w \times u$ constraint matrix, $\Sigma$ is the $u \times u$ variance-covariance matrix of random measurement error terms, $Y_t$ is the $u \times 1$ measurement vector, and $\hat{Y}_t$ is the least-squares estimate of $E[Y_t]$. As the basis of their approach, Kao et al.2 prewhitened $r_t$ by

$$U_t = \frac{1 - \phi_1 B}{1 - \theta_1 B}\, r_t . \tag{8}$$

The MT is then applied to the filtered (prewhitened) residuals, $U_t$, and declares a bias in the $i$th measurement at time $t$ if, and only if,

$$Z_{U_{ti}} = \frac{|U_{ti}|}{\sigma_i} > Z_{\alpha'} \tag{9}$$

where $Z_{U_{ti}}$ is the standardized univariate normal test statistic for the $i$th measurement at time $t$, $Z_{\alpha'}$ is the upper $\alpha'$ point of the standard normal distribution, $\alpha'$ is $\tfrac{1}{2}\left[1 - (1 - \alpha)^{1/u}\right]$, and $\sigma_i$ is the standard deviation of measured variable $i$ (from the $i$th diagonal element of $\Sigma$).

The replication of Kao et al.2's method included the following conditions: (1) $\Sigma = 0.25I$; (2) a significance level ($\alpha$) of 0.1; (3) only one biased variable with magnitude $\delta_t = 0.25$ existed at any given time; (4) there were 10,000 trials simulated per biased variable. Data were generated using FORTRAN™ Power Station with IMSL™ subroutines in Microsoft™ Windows™. Standard normally distributed random numbers were used to generate the white noise.

The performance measures presented by Kao et al.2 were OP (overall power) and P(type I error) (probability of type I error). They defined OP as follows:

$$\mathrm{OP} = \frac{\text{No. of non-zero } \delta\text{s correctly identified}}{\text{No. of non-zero } \delta\text{s simulated}} . \tag{10}$$

An OP equal to 0.63 means that for 10,000 cases with simulated biases, 6,300 cases identify the bias correctly. It should be noted, however, that their chosen performance measures have weaknesses. First, whenever the biased stream is correctly identified, OP increases even though other non-biased streams may have also been identified as biased at that time. In other words, OP still increases despite false identifications in other portions of the network. Second, Kao et al.2 used P(type I error) to indicate the performance of false identifications when no actual bias existed in the measured variables. Hence, this indicator does not provide sufficient information regarding the false identification of non-biased variables when bias exists. To give a better measure of performance for unbiased variables when bias exists, P(type I error) was replaced with the AVTI (averaged type I error) (Narasimhan and Mah16) and the OPF (overall performance) (Rollins and Davis1) was added to the study. These measures are defined as

$$\mathrm{AVTI} = \frac{\text{No. of zero } \delta\text{s wrongly identified}}{\text{No. of simulation trials (10,000)}} \tag{11}$$

$$\mathrm{OPF} = \frac{\text{No. of trials with perfect identification}}{\text{No. of simulation trials (10,000)}} . \tag{12}$$

The AVTI effectively shows false identifications for unbiased variables when an actual bias exists in some variable(s) in the process. The OPF, on the other hand, is a measure of perfect identifications. For example, an OPF of 0.5 means that in the 10,000 simulated trials there were 5,000 trials that identified all zero $\delta$s correctly and all nonzero $\delta$s correctly. The main goal in identification is to obtain high OPF and low AVTI.

Table 1. Performance of the MT applied to averaged prewhitened residuals with the same ARMA(1,1) in all streams; $\Sigma = 0.25I$, $\delta_i = 0.2$, $\alpha = 0.1$, $\phi_1 = 0.4$, $\theta_1 = 0.2$.

i    AVTI     OP       OPa      OPF
1    0.9649   0.6365   0.6100   0.0051
2    0.9672   0.6374   0.6100   0.0079
3    0.9695   0.6412   0.6200   0.0054
4    0.9734   0.5772   0.4800   0.0027
5    0.9684   0.6135   0.5600   0.0056
6    0.9686   0.5693   0.4800   0.0035
7    0.9663   0.6373   0.6100   0.0063

i = the stream variable that is biased; OP = OP results from replication; OPa = OP results reported by Kao et al.2

The MT's Performance

Table 1 presents results of some cases in Kao et al.2 and the replication of their study in this work. As shown, there is close agreement between the OPs.
It can then be concluded that their study, and more importantly their algorithm, has been successfully replicated. Table 1 reveals very high values of AVTI and very low values of OPF. A high AVTI means that the method has a very high likelihood of identifying unbiased variables as biased when an actual bias exists in a stream somewhere within the process network. Very low OPFs, on the other hand, suggest very high rates of imperfect identification (i.e., while the biased stream is correctly identified, many unbiased streams in the network are also being identified as biased at the same time). This result shows the particular drawback of a high rate of false identification for the general class of strategies using the MT in the presence of biased variables. This conclusion agrees with previous studies done on the MT (Rollins17, Rollins et al.3) under the condition of white noise.

In addition, it must be noted that the method of prewhitening presented by Kao et al.2 attempts to remove serial correlation from the estimated values of the residuals, i.e., it prewhitens the estimates of the residuals. This approach produces biased estimates of $Y_t$ and therefore produces biased residuals that result in decreased identification accuracy.

USING THE UBET FOR HANDLING SERIAL CORRELATION

The unbiased estimation technique (UBET) seeks to obtain high OPFs. The UBET is an approach for gross error detection which was designed to address limitations of other techniques such as inability to control type I and type II errors, statistically inconsistent estimators and biased estimators. The ultimate goal, as the name suggests, is producing unbiased estimators. In this study, the approach and model formulation of the UBET were used for gross error detection and identification, and extended to handling serially correlated process data.
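The performance measures of equations (10) to (12), which are used in all comparisons in this paper, can be sketched as below for runs with a single biased stream. The function name and the set-based representation of each trial are hypothetical. The AVTI is computed here as the fraction of trials in which at least one unbiased stream is flagged; this reading is an assumption, adopted because the tabulated AVTI values never exceed 1.

```python
def performance_measures(true_biased, flagged_sets):
    """Compute OP, AVTI and OPF for simulation runs with one biased stream.

    true_biased  -- index of the stream that carries the simulated bias
    flagged_sets -- one set per trial holding the streams declared biased
    """
    n_trials = len(flagged_sets)
    # OP: trials in which the truly biased stream was flagged at all.
    hits = sum(1 for flagged in flagged_sets if true_biased in flagged)
    # AVTI (assumed reading): trials with at least one unbiased stream flagged.
    false_trials = sum(1 for flagged in flagged_sets if flagged - {true_biased})
    # OPF: trials in which exactly the biased stream, and nothing else, was flagged.
    perfect = sum(1 for flagged in flagged_sets if flagged == {true_biased})
    return {
        "OP": hits / n_trials,
        "AVTI": false_trials / n_trials,
        "OPF": perfect / n_trials,
    }


# Four illustrative trials with the bias on stream 2.
example = performance_measures(2, [{2}, {2, 5}, {5}, set()])
```

In this small example the second trial raises OP while also counting towards AVTI, which is the weakness of OP noted above, and only the first trial contributes to OPF.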
Note that in the previous section, for the MT, prewhitening is applied to the residuals using estimated values of process variables. The prewhitening approaches used here are different. Two ways of prewhitening process data for application of the UBET were considered. The first approach was to directly prewhiten the serially correlated measurement data (i.e., the $Y_t$s). The second approach was to prewhiten the nodal mass balances ($R_t$). As will be seen later, an advantage of the latter approach over the former one is that there are fewer variables to model and transform.

A simulation study was done to show the effects of these prewhitening schemes with the UBET and to compare them to the method presented by Kao et al.2. As stated earlier, the same flow network as in Kao et al.2 and Rollins and Davis1 was used (see Figure 1). For this study, $\Sigma = I$ and a significance level ($\alpha$) of 0.05 were used. Only one biased variable ($\delta_{ti} = \delta_i = 5.0$) existed in each trial. The number of simulation trials per biased variable was 10,000 and the data were generated by using the FORTRAN™ Power Station with IMSL™ subroutines in Microsoft™ Windows™. Standard normally distributed random numbers were used to generate the white noise.

Prewhitening Applied to Yt

The idea of prewhitening $Y_t$ is to multiply $Y_t$, given by equation (1), with a transfer function that transforms it to randomly independent process data ($Y_t^*$), given by the equation below:

$$Y_t^* = \mu^* + U_t + \delta_t^* \tag{13}$$

where

$$Y_t^* = P(B)\,Y_t = Y_t + \pi_1 Y_{t-1} + \pi_2 Y_{t-2} + \ldots \tag{14}$$

and

$$\mu^* = \mu\,(1 + \pi_1 + \pi_2 + \ldots) \tag{15}$$

If the bias is assumed constant over time, $\delta_t = \delta$ and $\delta_t^* = \delta^*$, with

$$\delta^* = \delta\,(1 + \pi_1 + \pi_2 + \ldots), \tag{16}$$

where

$$\pi_m = \theta_1^{\,m-1}(\theta_1 - \phi_1), \quad m = 1, 2, 3, \ldots \tag{17}$$

The assumption of constant bias is not a very restrictive assumption, granted that a bias is soon removed once it is detected and identified. The transfer function $P(B)$ is written as a function of $\theta_1(B)$ and $\phi_1(B)$ (note that $B$ is the backward-shift operator):

$$P(B) = \theta_1^{-1}(B)\,\phi_1(B) = \frac{1 - \phi_1 B}{1 - \theta_1 B} \tag{18}$$

The material balances of the nodes are modeled as follows,

$$A\mu^* = A\mu\,(1 + \pi_1 + \pi_2 + \ldots) = 0. \tag{19}$$

Note that equation (19) applies only when all streams have the same ARMA(1,1) structure (i.e., where $(1 + \pi_1 + \pi_2 + \ldots)$ is the same for every stream) so that $A\mu$ can be factored out and set to zero (i.e., due to the conservation of mass). Although not restrictive, we are assuming the nonexistence of leaks for simplicity. The next step, as in the UBET, is the following transformation using the prewhitened process data ($Y_t^*$):

$$R_t^* = A Y_t^* = A\mu^* + A\delta^* + A U_t \tag{20}$$

where it is assumed that $\mu_{R_t^*} = E[R_t^*] = 0$ if, and only if, $\delta^* = 0$ (i.e., $\delta = 0$). This assumption is necessary for the hypotheses:

$$H_{0t}: \mu_{R_t^*} = A\delta^* = 0 \qquad H_{at}: \mu_{R_t^*} = A\delta^* \neq 0 \tag{21}$$

The basic identification mechanism of the UBET is to relate linear combinations of the components of $\mu_{R_t^*}$ to specific conclusions regarding the components of $\delta^*$.

$$H_{0i,t}: l_i^T \mu_{R_t^*} = 0 \qquad H_{ai,t}: l_i^T \mu_{R_t^*} \neq 0 \tag{22}$$

where $l_i$ is a vector of zeros and ones, representing nodal balances or combinations of nodal balances. The Bonferroni test statistic is used as a test for the above hypotheses with the appropriate changes from Rollins and Davis1. This test is given as: reject $H_{0t}: l^T \mu_{R_t^*} = 0$ in favor of $H_{at}: l^T \mu_{R_t^*} \neq 0$ if, and only if,

$$\frac{\sqrt{N}\,\left| l^T R_t^* \right|}{\sqrt{l^T \Sigma_{R_t^*}\, l}} \geq Z_{\alpha/2w} \tag{23}$$

where $\Sigma_{R_t^*} = A \Sigma A^T$ and $N$ is the sample size (the number of measurements for each variable at each sampling time).

Adequate prewhitening will remove serial correlation from process data. One way to verify the removal of serial correlation is to calculate the sample autocorrelation functions or ACFs. The ACF plots for the serially correlated process data (for a particular stream measurement) and the prewhitened results are shown in Figures 2 and 3, respectively. The dashed lines represent the upper and lower limits from which the sample ACFs are judged to be significantly different from zero. Prior to prewhitening, the ACF plot (Figure 2) shows ACF values that are significantly different from zero in the early lags and dying down as the lag increases. After the prewhitening, these ACF values are all within the limits (Figure 3). This confirms the absence of significant serial correlation.

Figure 2. ACF plot of ARMA(1,1) process data, $\phi_1 = 0.8$ and $\theta_1 = 0.5$.

Figure 3. ACF plot of prewhitened ARMA(1,1) process data, $\phi_1 = 0.8$ and $\theta_1 = 0.5$.

Another way of observing serial correlation is from a time series plot (i.e., by plotting the values against time). The time series plot for the case above is shown in Figure 4. The upper line is the correlated process data and the bottom line is the prewhitened data. A trend is seen in the upper line, which shows dependence of data on past values (i.e., serial correlation), while the bottom line is slightly smoother but the difference is not very apparent. To see the removal of serial correlation from $Y_t$ in Figure 4 more clearly, Figure 5 shows a comparison of the prewhitened residuals ($\varepsilon_t^*$) to the white noise ($U_t$). Since the prewhitened residuals are identical to the white noise residuals, removal of serial correlation and the effectiveness of this prewhitening scheme can be concluded.

Some concerns arise in applying the UBET to the prewhitened data. The first involves dealing with variables having different ARMA(1,1) structures, i.e., when $\phi_1$ and $\theta_1$ are not the same for every stream. Consequently, $(1 + \pi_1 + \pi_2 + \ldots)$, which is a function of $\phi_1$ and $\theta_1$, is not the same for each variable.
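A minimal sketch of the prewhitening step and the ACF check follows. It assumes the sign convention $(1 - \phi_1 B)\varepsilon_t = (1 - \theta_1 B)U_t$, zero initial conditions, and applies the filter to the zero-mean error series rather than to $Y_t$ itself. The function names are hypothetical. Under these assumptions the filter $P(B) = (1 - \phi_1 B)/(1 - \theta_1 B)$ can be run recursively and returns the generating white noise, which is the behaviour the comparison of prewhitened residuals with white noise describes.

```python
import random


def arma11(white_noise, phi1, theta1):
    """eps_t = phi1*eps_{t-1} + U_t - theta1*U_{t-1}, zero initial state."""
    out, prev_e, prev_u = [], 0.0, 0.0
    for u in white_noise:
        e = phi1 * prev_e + u - theta1 * prev_u
        out.append(e)
        prev_e, prev_u = e, u
    return out


def prewhiten(series, phi1, theta1):
    """Apply P(B) = (1 - phi1*B)/(1 - theta1*B) recursively:
    x*_t = theta1*x*_{t-1} + x_t - phi1*x_{t-1}, zero initial state."""
    out, prev_x, prev_star = [], 0.0, 0.0
    for x in series:
        star = theta1 * prev_star + x - phi1 * prev_x
        out.append(star)
        prev_x, prev_star = x, star
    return out


def sample_acf(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


rng = random.Random(1)
U = [rng.gauss(0.0, 1.0) for _ in range(2000)]      # white noise
E = arma11(U, phi1=0.8, theta1=0.5)                 # serially correlated errors
E_star = prewhiten(E, phi1=0.8, theta1=0.5)         # prewhitened errors
max_gap = max(abs(a - b) for a, b in zip(E_star, U))
```

Comparing `sample_acf(E, 1)` with `sample_acf(E_star, 1)` gives a quick numerical counterpart to the before and after ACF plots, and `max_gap` measures how closely the prewhitened series matches the white noise.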
Hence, the result $A\mu$ cannot be factored out in the matrix form of the material balances (i.e., the balances cannot be written in the form of equation (19)) and the Bonferroni test will not be at the specified level. This drawback is eliminated using the second approach discussed in the next section.

Figure 5. Comparison plot of the white noise, $U_t$, and the residual from prewhitened process data, $\varepsilon_t^*$, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $U_t$ ····; $\varepsilon_t^*$ ——.

Another concern involves the effect of the magnitude of the difference between $\theta_1$ and $\phi_1$ on the GED performance. To understand this behaviour, one needs to understand the relationship between the ARMA structures and the effect of prewhitening on the GED performance. In equation (13), the prewhitened $Y_t$, $Y_t^*$, is a function of the transformed $\mu$, $\mu^*$, and the transformed $\delta$, $\delta^*$. Therefore, values of $Y_t^*$ are also significantly dependent on values of $\phi_1$ and $\theta_1$ (see equations (14)–(17)). If $\theta_1 > \phi_1$, then $(\theta_1 - \phi_1)$ is positive and $Y_t^* > Y_t$. Similarly, if $\theta_1 < \phi_1$, then $(\theta_1 - \phi_1)$ is negative and $Y_t^* < Y_t$. If the difference of these parameters is zero (which is not likely in practice), equation (17) is reduced to a white noise model (i.e., randomly independent process data).

Some runs with different combinations of $\phi_1$ and $\theta_1$ values were simulated to illustrate these effects. The following cases were tested: (1) $\phi_1 < \theta_1$, (2) $\phi_1 = \theta_1$, (3) $\phi_1 > \theta_1$. Table 2 contains some representative UBET results when $\theta_1 > \phi_1$. As shown in the table, the UBET's OP and OPF are high and the AVTI is low. These values indicate that the UBET effectively identified biased variables when they are biased and unbiased variables when they do not have bias. In other words, there is a high rate of perfect identification when the measurements satisfy this correlation structure, with $\theta_1 > \phi_1$.
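The dependence of the transformed bias on the ARMA parameters can be made explicit with a short worked calculation. It is a sketch under two assumptions that the garbled source does not let one confirm: that the transformed bias is $\delta^* = \delta(1 + \pi_1 + \pi_2 + \ldots)$ and that the weights are $\pi_m = \theta_1^{m-1}(\theta_1 - \phi_1)$ with $|\theta_1| < 1$.

```latex
\frac{\delta^*}{\delta}
  = 1 + \sum_{m=1}^{\infty} \pi_m
  = 1 + (\theta_1 - \phi_1) \sum_{m=1}^{\infty} \theta_1^{\,m-1}
  = 1 + \frac{\theta_1 - \phi_1}{1 - \theta_1}
  = \frac{1 - \phi_1}{1 - \theta_1},
  \qquad |\theta_1| < 1 .

% Example with \theta_1 > \phi_1:  \phi_1 = 0.2,\ \theta_1 = 0.4
%   \delta^*/\delta = 0.8 / 0.6 \approx 1.33, so \delta = 5.0 becomes \delta^* \approx 6.67
% Example with \theta_1 < \phi_1:  \phi_1 = 0.5,\ \theta_1 = 0.0
%   \delta^*/\delta = 0.5 / 1.0 = 0.5,  so \delta = 5.0 becomes \delta^* = 2.5
```

Under these assumptions the ratio is simply $P(B)$ evaluated at $B = 1$: prewhitening inflates a constant bias when $\theta_1 > \phi_1$ and shrinks it when $\theta_1 < \phi_1$, which matches the direction of the performance differences reported for the two cases.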
Figure 4. Comparison plot of $Y_t$ and prewhitened $Y_t$, $Y_t^*$, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $Y_t$ ——; $Y_t^*$ ····.

Table 2. Performance of the UBET applied to prewhitened process data for same ARMA(1,1) in all streams; when $\theta_1 > \phi_1$, $\sigma = 1.0$, $\delta = 5.0$, $\alpha = 0.05$, $N = 3$, $\phi_1 = 0.2$, $\theta_1 = 0.4$.

i    AVTI     OP       OPF
1    0.0291   0.9668   0.9394
2    0.0185   0.9994   0.9809
3    0.0204   1.0000   0.9796
4    0.0233   0.9994   0.9761
5    0.0200   1.0000   0.9800
6    0.0194   0.9996   0.9802
7    0.0204   0.9643   0.9451

i = the stream variable that is biased

Table 3. Performance of the UBET applied to prewhitened process data for same ARMA(1,1) in all streams, $\phi_1 = 0.5$, $\theta_1 = 0.0$, $\sigma = 1.0$, $\delta_1 = 5.0$, $\alpha = 0.05$.

     N = 3                       N = 10                      N = 20
i    AVTI     OP       OPF       AVTI     OP       OPF       AVTI     OP       OPF
1    0.0122   0.0009   0.0000    0.0240   0.2337   0.2283    0.0289   0.9449   0.9200
2    0.0109   0.2802   0.2766    0.0172   0.9135   0.8980    0.0199   0.9981   0.9783
3    0.0133   0.3778   0.3728    0.0187   0.9780   0.9597    0.0195   1.0000   0.9805
4    0.0149   0.2055   0.2038    0.0194   0.8901   0.8763    0.0234   0.9986   0.9757
5    0.0128   0.2782   0.2740    0.0177   0.9583   0.9423    0.0203   1.0000   0.9797
6    0.0127   0.2036   0.2004    0.0181   0.8919   0.8766    0.0168   0.9988   0.9820
7    0.0119   0.0003   0.0003    0.0199   0.2316   0.2243    0.0178   0.9419   0.9258

i = the stream variable that is biased

From other runs performed but not shown in the tables, it was observed that the greater the absolute value of the difference of the parameters, $|\theta_1 - \phi_1|$, the better the performance (i.e., higher OP and OPF). Table 3 gives results of the runs when $\phi_1 \geq \theta_1$. This time, the greater the absolute value of the difference between the parameters (i.e., $|\theta_1 - \phi_1|$), the worse the performance. This is because the greater the difference, the smaller $\delta^*$ is compared to $\delta$ (equation (16)), and this smaller value of $\delta^*$ decreases the ability (i.e., power) of detecting $\delta$. These tables show the effect of increasing the sample size ($N$) to improve detection as $|\theta_1 - \phi_1|$ increases.
One sees that excellent performance is still possible if a large enough $N$ is used, which is equivalent to a small enough measurement error term variance.

MT results under similar conditions as the UBET runs are presented in Tables 4 and 5. In all cases, the OPs are high, the AVTIs are high, and the OPFs are low, indicating very poor rates of perfect identification. Note that the values of OP and AVTI are 1.0. As discussed in earlier sections, this is attributable to an MT weakness that leads to conclusions that all variables are biased when in reality only one variable is biased. While it performs well in detecting the existence of biased variables, it identifies the non-biased variables very poorly.

Table 4. Performance of the MT applied to prewhitened estimated values of the residuals for same ARMA(1,1) in all streams; when $\theta_1 > \phi_1$, $\sigma = 1.0$, $\delta_1 = 5.0$, $\alpha = 0.05$, $N = 3$, $\phi_1 = 0.2$, $\theta_1 = 0.4$.

i    AVTI     OP       OPF
1    1.0000   1.0000   0.0000
2    1.0000   1.0000   0.0000
3    1.0000   1.0000   0.0000
4    1.0000   1.0000   0.0000
5    1.0000   1.0000   0.0000
6    1.0000   1.0000   0.0000
7    1.0000   1.0000   0.0000

i = the stream variable that is biased

Table 5. Performance of the MT applied to prewhitened estimated values of the residuals for same ARMA(1,1) in all streams; when $\theta_1 < \phi_1$, $\sigma = 1.0$, $\delta_1 = 5.0$, $\alpha = 0.05$, $N = 3$, $\phi_1 = 0.4$, $\theta_1 = 0.2$.

i    AVTI     OP       OPF
1    1.0000   1.0000   0.0000
2    1.0000   1.0000   0.0000
3    1.0000   1.0000   0.0000
4    1.0000   1.0000   0.0000
5    1.0000   1.0000   0.0000
6    1.0000   1.0000   0.0000
7    1.0000   1.0000   0.0000

i = the stream variable that is biased

Prewhitening Applied to Rt

The second approach of prewhitening applied to the UBET is to prewhiten $R_t$. The difference of this approach from the preceding one is that instead of prewhitening $Y_t$, $R_t$ (or $AY_t$) is prewhitened. Because this approach does not require factoring out $A\mu$, as in equation (19), it is applicable to process variables with different serial correlation structures, as will be shown. Using the process and the general measurement models presented in equations (4) and (1), respectively, the vector of nodal balances, $R_t$, is modelled as a function of the linear combinations of $\delta_t$ and the serially correlated error term:

$$R_t = \Delta + \varepsilon_{rt} \tag{24}$$

where

$$\Delta = A\delta_t \tag{25}$$

and

$$\varepsilon_{rt} = \frac{1 - \theta_1 B}{1 - \phi_1 B}\,U_t \tag{26}$$

$$U_t \sim N_w(0, A \Sigma A^T) \tag{27}$$

Similar modelling as applied to $Y_t$ is applied to $R_t$ as follows:

$$R_t^* = \Delta^* + U_t \tag{28}$$

with

$$\mu_{R_t^*} = E[R_t^*] = \Delta^*, \qquad \Sigma_{R_t^*} = \mathrm{Var}[R_t^*] = A \Sigma A^T \tag{29}$$

where

$$R_t^* = P(B)\,R_t = R_t + \pi_1 R_{t-1} + \pi_2 R_{t-2} + \ldots \tag{30}$$

$$\Delta^* = \Delta\,(1 + \pi_1 + \pi_2 + \ldots) \tag{31}$$

and

$$P(B) = \theta_1^{-1}(B)\,\phi_1(B) = \frac{1 - \phi_1 B}{1 - \theta_1 B} \tag{32}$$

$$\pi_m = \theta_1^{\,m-1}(\theta_1 - \phi_1) \tag{33}$$

The results using the UBET with the prewhitened $R_t$ are given in Tables 6–8. Table 6 presents cases with $\theta_1 \geq \phi_1$. The results are similar to the results when $Y_t$ is prewhitened directly, given the same magnitudes of ARMA parameters. The OPs and the OPFs are high, and the AVTIs are low, indicating excellent identification of both biased and unbiased variables. The performance improves further as $N$ increases. Table 7 presents the results when $\theta_1 < \phi_1$. The same behaviour is observed as when $Y_t$ is prewhitened directly. That is, as $|\theta_1 - \phi_1|$ becomes larger, the identification of both biased and unbiased variables deteriorates. As with the previous approach, this happens because as $|\theta_1 - \phi_1|$ becomes larger, the magnitude of the transformed bias ($\Delta^*$) is reduced (see equation (31)), and thus it becomes more difficult to detect bias.

Table 6. Performance of the UBET applied to prewhitened $R_t$ for same ARMA(1,1) structure in every node and the overall material balance, when $\theta_1 > \phi_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$, $N = 3$.

     φ1 = 0.0, θ1 = 0.5          φ1 = 0.5, θ1 = 0.8
i    AVTI     OP       OPF       AVTI     OP       OPF
1    0.0303   0.9931   0.9629    0.0279   1.0000   0.9721
2    0.0296   1.0000   0.9704    0.0284   1.0000   0.9716
3    0.0198   1.0000   0.9802    0.0160   1.0000   0.9840
4    0.0304   1.0000   0.9696    0.0300   1.0000   0.9700
5    0.0293   1.0000   0.9707    0.0300   1.0000   0.9700
6    0.0288   1.0000   0.9712    0.0278   1.0000   0.9722
7    0.0202   0.9919   0.9723    0.0206   1.0000   0.9794

i = the stream variable that is biased

Table 7. Performance of the UBET for prewhitened $R_t$ for same ARMA(1,1) structure in every node and the overall material balance, when $\theta_1 < \phi_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$.

     φ1 = 0.4, θ1 = 0.2                φ1 = 0.5, θ1 = 0.0
i    N    AVTI     OP       OPF        N    AVTI     OP       OPF
1    3    0.0220   0.1879   0.1838     3    0.0110   0.0297   0.0292
     10   0.0319   0.8076   0.7818     20   0.0299   0.7459   0.7238
2    3    0.0235   0.7363   0.7163     3    0.0146   0.2322   0.2261
     10   0.0298   0.9997   0.9699     20   0.0267   0.9982   0.9715
3    3    0.0205   0.8635   0.8450     3    0.0129   0.3156   0.3093
     10   0.0193   0.9999   0.9806     20   0.0221   1.000    0.9779
4    3    0.0265   0.6545   0.6350     3    0.0159   0.1594   0.1546
     10   0.0251   0.9998   0.9747     20   0.0300   1.0000   0.9693
5    3    0.0245   0.7725   0.7517     3    0.0181   0.2168   0.2105
     10   0.0278   1.0000   0.9722     20   0.0291   1.0000   0.9709
6    3    0.0247   0.6512   0.6334     3    0.0167   0.1614   0.1564
     10   0.0298   0.9994   0.9696     20   0.0310   0.9985   0.9676
7    3    0.0158   0.2155   0.2134     3    0.0114   0.0391   0.0384
     10   0.0210   0.8027   0.7855     20   0.0195   0.7441   0.7296

i = the stream variable that is biased

A possible way to overcome this limitation is to use data sets that represent periods before and after the bias occurrence, as will be illustrated. Figure 6 is a time series plot for node A when the bias exists for all times shown. The dashed line represents the prewhitened nodal balance data of node A ($R_{tA}^*$). The solid line is the correlated nodal balance data ($R_{tA}$). As seen in the figure, after the serial correlation has been removed, $R_{tA}^*$ is closer to zero, in the direction of non-existence of bias, reducing bias detection sensitivity. To overcome this, one may elect to perform GED analysis at each sampling time. This will improve not only detection, but also the estimation of the time of bias occurrence. Figure 7 illustrates a case of bias occurrence in one of the streams entering node A at time 50. It shows that $R_t^*$ changes at time 50 but then goes back to the level it was at before time 50. Table 8 presents some representative results of the UBET's performance for this type of bias occurrence for the same ARMA(1,1) structure in all nodes. Results show high rates of perfect identification and that the performance improves further as $N$ is increased.

Figure 6. Prewhitening $R_{tA}$ with constant bias throughout, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $R_t$ ——; $R_t^*$ ····.

Figure 7. Prewhitening $R_{tA}$ with bias introduced at time 50, $\phi_1 = 0.8$ and $\theta_1 = 0.5$. $R_t^*$ ····; Bias ——.

Table 8. Performance of the UBET applied to prewhitened $R_t$ for same ARMA(1,1) structure in all nodes, when $\phi_1 > \theta_1$, $\sigma = 1.0$, $\delta_i = 5.0$, $\alpha = 0.05$. Bias introduced in the middle of the run.

          φ1 = 0.8, θ1 = 0.5          φ1 = 0.9, θ1 = 0.9          φ1 = 0.2, θ1 = 0.4
i    N    AVTI     OP       OPF       AVTI     OP       OPF       AVTI     OP       OPF
1    3    0.0274   0.4505   0.4374    0.0595   0.4759   0.4451    0.0309   0.4519   0.4376
     8    0.0286   0.9330   0.9067    0.0627   0.9237   0.8654    0.0278   0.9339   0.9079
2    3    0.0282   0.9608   0.9329    0.0561   0.9360   0.8813    0.0289   0.9587   0.9299
     8    0.0294   1.0000   0.9706    0.0572   0.9999   0.9427    0.0283   1.0000   0.9717
3    3    0.0208   0.9929   0.9721    0.0358   0.9883   0.9527    0.0181   0.9918   0.9737
     8    0.0220   1.0000   0.9780    0.0396   1.0000   0.9604    0.0194   1.0000   0.9806
4    3    0.0250   0.9508   0.9268    0.0599   0.9363   0.8796    0.0303   0.9533   0.9247
     8    0.0313   1.0000   0.9687    0.0583   1.0000   0.9417    0.0297   1.0000   0.9703
5    3    0.0310   0.9846   0.9538    0.0571   0.9734   0.9176    0.0298   0.9839   0.9545
     8    0.0290   1.0000   0.9710    0.0578   1.0000   0.9422    0.0312   1.0000   0.9688
6    3    0.0279   0.9553   0.9279    0.0523   0.9330   0.8838    0.0298   0.9542   0.9252
     8    0.0311   0.9999   0.9688    0.0600   0.9999   0.9399    0.0291   1.0000   0.9709
7    3    0.0194   0.4672   0.4580    0.0431   0.4520   0.4321    0.0197   0.4659   0.4568
     8    0.0195   0.9360   0.9177    0.0394   0.9014   0.8647    0.0191   0.9351   0.9173

i = the stream variable that is biased
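The Bonferroni test applied to prewhitened nodal balances can be sketched as follows. This covers only the test statistic and its critical value. The rule that maps rejected linear combinations to one specific biased stream is not reproduced here. Assumptions: $\Sigma = \sigma^2 I$, the signs in `A` are chosen so that $A\mu = 0$, the vector `r_mean` is a noise-free stand-in for the averaged prewhitened balances, the absolute value reflects the two-sided alternative, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

# Assumed constraint matrix (rows = nodes A, B, C, D; columns = streams 1..7).
A = [
    [1, -1, 0, 1, 0, 1, 0],
    [0, 1, -1, 0, 0, 0, 0],
    [0, 0, 1, -1, -1, 0, 0],
    [0, 0, 0, 0, 1, -1, -1],
]


def nodal_covariance(a, sigma2):
    """Sigma_R = A Sigma A^T for the special case Sigma = sigma2 * I."""
    return [[sigma2 * sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]


def bonferroni_rejects(l, r_star_mean, sigma_r, n, alpha, n_tests):
    """Reject H0: l^T mu_R = 0 when sqrt(n)*|l^T R*| / sqrt(l^T Sigma_R l)
    reaches the upper alpha/(2*n_tests) point of the standard normal."""
    numerator = math.sqrt(n) * abs(sum(li * ri for li, ri in zip(l, r_star_mean)))
    size = len(l)
    quad = sum(l[i] * sigma_r[i][j] * l[j] for i in range(size) for j in range(size))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))
    return numerator / math.sqrt(quad) >= z_crit


# Noise-free illustration: transformed bias of 5.0 on stream 3 only.
delta_star = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
r_mean = [sum(a * d for a, d in zip(row, delta_star)) for row in A]
sigma_r = nodal_covariance(A, 1.0)
unit_vectors = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
decisions = [
    bonferroni_rejects(l, r_mean, sigma_r, n=3, alpha=0.05, n_tests=4)
    for l in unit_vectors
]
```

With these inputs only the balances around nodes B and C are rejected, and in the assumed matrix stream 3 is the only stream that links those two nodes. This illustrates how patterns of rejected nodal balances carry information about the location of the bias.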
Similarly, Table 9 demonstrates the UBET GED performance when the ARMA(1,1) structure is different for each measured variable. These cases show that even when φ1 is large relative to θ1 and the correlation structure is different at every stream, the UBET is able to perform well if the analysed data represent times before and times after the bias occurrence.

Table 9. Performance of the UBET applied to prewhitened Rt for different ARMA(1,1) structure in every node and the overall material balance, σ = 1.0, δi = 5.0, α = 0.05, N = 8.

i    AVTI    OP      OPF
1    0.0280  0.9349  0.9086
2    0.0282  1.0000  0.9718
3    0.0184  1.0000  0.9816
4    0.0268  1.0000  0.9732
5    0.0289  1.0000  0.9711
6    0.0303  1.0000  0.9697
7    0.0218  0.9339  0.9132

i = the stream variable that is biased.
The structure with different parameters for every node is given by: (1) node A: φA,1 = 0.5 and θA,1 = 0.8; (2) node B: φB,1 = 0.5 and θB,1 = 0.8; (3) node C: φC,1 = 0.0 and θC,1 = 0.0; (4) node D: φD,1 = 0.2 and θD,1 = 0.4; (5) node ABCD: φABCD,1 = 0.4 and θABCD,1 = 0.2.

CONCLUSIONS

This study shows that prewhitening the serially correlated process data facilitates the use of the UBET for effective gross error detection. Furthermore, the performance of the UBET was shown to be superior to that of the MT in handling prewhitened serially correlated data. Although the MT seems to give high overall power to detect bias, its large AVTI and small OPF take away its attractiveness. In contrast, the UBET can give high OP and OPF and low AVTI.

Secondly, prewhitening the nodal balances (Rt) is more effective than prewhitening the correlated process data directly (Yt). Prewhitening Rt eliminates several limitations of prewhitening Yt. In addition, when θ1 < φ1, the limitation of power reduction can be overcome by the prewhitening scheme if one performs GED analysis at each sampling time. This prewhitening scheme requires fewer variables to be prewhitened (i.e., only the nodal balances as opposed to each measured variable).
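The point that only the nodal balances need to be prewhitened can be made concrete with a toy example. The sketch below assumes the nodal balances are formed as Rt = A Yt, which is consistent with the dimensions listed in the nomenclature; the two-node, three-stream network, the flow values, and the function name are hypothetical and are not the network used in this study.

```python
def nodal_balances(constraint_matrix, measurements):
    """Return R = A y for a single measurement vector y (assumed definition)."""
    return [
        sum(a * y for a, y in zip(row, measurements))
        for row in constraint_matrix
    ]


# Hypothetical network: stream 1 -> node A -> stream 2 -> node B -> stream 3.
A = [
    [1, -1, 0],  # node A: stream 1 in, stream 2 out
    [0, 1, -1],  # node B: stream 2 in, stream 3 out
]

true_flows = [10.0, 10.0, 10.0]  # hypothetical steady-state flows
bias = [0.0, 5.0, 0.0]           # bias of magnitude 5.0 on stream 2
measured = [f + b for f, b in zip(true_flows, bias)]

balances = nodal_balances(A, measured)
print("nodal balances:", balances)
print("series to prewhiten:", len(balances), "balances vs", len(measured), "streams")
```

Here the bias on stream 2 appears with opposite signs in the two balances it touches, and only two balance series (rather than three measurement series) would need an ARMA model and a prewhitening filter.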
Following the assumptions in Kao et al., the ARMA parameters used in this study were assumed to be known accurately, without sampling error. In a real application, parameter estimates may be obtained by a time series analysis (i.e., analysing the ACFs and PACFs) of measurement data or nodal residuals, depending on the GED prewhitening approach chosen. Since the estimation of parameters introduces sampling error, the sensitivity of the tests to these errors may have to be examined in the future.

The idea of prewhitening data is not a new one. However, the way the authors have prewhitened in the context of GED is a significant contribution of this work. In addition, the extension of the UBET to address GED analysis under these prewhitening schemes and its performance results are also significant contributions. While it may be difficult to accurately approximate the serially correlated behaviour for very large networks, we believe that this work has merit for a large number of chemical processes in industry today. Finally, although this work was presented under the assumption of steady state, the proposed approach would also be applicable to conditions of pseudo steady state, as in Rollins and Davis4.

NOMENCLATURE

A        constraint matrix, (w × u)
ACF      autocorrelation function
AVTI     average type I error
Et       serially correlated random measurement error vector (u × 1) at time t
Ert      serially correlated random nodal balances error vector (w × 1) at time t
k        lag numbers
l        general vector used for making linear combinations of measurements
li       vector used for making linear combinations for tests of δi
MT       measurement test
n        number of time instants in the time series
N        sample size
OP       overall power
OPF      overall performance
rt       estimates of measurement errors in the MT at time t (u × 1)
Rt       nodal balance vector (w × 1) at time t
Rt (prewhitened)   prewhitened nodal balance vector (w × 1) at time t
UBET     unbiased estimation technique
Ut       white noise random error vector (u × 1) at time t
Yt       process measurements vector at time t (u × 1)
Zα/2w    100(α/2w)th percentile of the normal distribution

Greek letters
α        type I error level or the significance level
Δ        unknown w × 1 vector of linear combinations of measurement biases
Δ (transformed)    transformed Δ after prewhitening step, (w × 1)
Δt       unknown w × 1 vector of linear combinations of measurement biases at time t
δ        unknown u × 1 vector of measurement biases
δ0       initial value of the drifting bias, u × 1
δ (transformed)    transformed δ after prewhitening step, (u × 1)
δi       bias of stream i
δt       δ at time t
μ        unknown true process mean vector, (u × 1)
μ (transformed)    transformed μ after prewhitening step, (u × 1)
μRt      the expected value of Rt, (w × 1)
μRt (prewhitened)  the expected value of the prewhitened Rt, (w × 1)
φp(B)    AR(p) model transfer function
φ1       first-order coefficient of autoregressive function
Π(B)     prewhitening transfer function as a function of π1, π2, . . .
θ0       zero-order coefficient of moving average function
θq(B)    MA(q) model transfer function
θ1       first-order coefficient of moving average function
ρk       ACF values at k lag numbers
Σ        variance-covariance matrix of measured variables, (u × u)
ΣRt      variance-covariance matrix for R, (w × w)
σ        standard deviation of normal random errors

REFERENCES

1. Rollins, D. K. and Davis, J. F., 1992, Unbiased estimation of gross errors in process measurements, AIChE J, 38: 563–572.
2. Kao, C. S., Mah, R. S. H. and Tamhane, A. C., 1990, Gross error detection in serially correlated process data, Ind Eng Chem Res, 29: 1004–1012.
3. Rollins, D. K., Cheng, Y. and Devanathan, S., 1996, Intelligent selection of hypothesis tests to enhance gross error identification, Comp and Chem Eng, 20: 517–530.
4. Rollins, D. K. and Davis, J. F., 1993, Gross error detection when variance-covariance matrices are unknown, AIChE J, 39: 1335–1341.
5. Rollins, D. K. and Roelfs, S. D., 1992, Gross error detection when constraints are bilinear, AIChE J, 38: 1295–1298.
6. Kuiper, S. D., Rollins, D. K. and Chen, V. C. P., 1997, Gross error detection strategies when constraints are bilinear, ADCHEM '97 International Symposium on Advanced Control of Chemical Processes, 289.
7. Rollins, D. K. and Devanathan, S., 1993, Unbiased estimation in dynamic data reconciliation, AIChE J, 39: 1330–1334.
8. Devanathan, S., 1993, Dynamic data reconciliation and gross error detection, Masters Thesis (Iowa State University, USA).
9. Rollins, D. K., Cheng, Y. and Chen, V. C. P., 1996, Detection of equipment faults in automatically controlled processes, AIChE J, 42: 642.
10. Manuell, L. M., Bascunana, V. B. and Rollins, D. K., 1997, Statistical fault detection of automatically controlled processes, ADCHEM '97 Int Symp on Advanced Control of Chemical Processes, 458–463.
11. Box, G. E. P. and Jenkins, G. M., 1970, Time Series Analysis, Forecasting and Control, Revised ed, Holden Day, San Francisco.
12. Mah, R. S. H. and Tamhane, A. C., 1982, Detection of gross errors in process data, AIChE J, 28: 828.
13. Iordache, C., Mah, R. S. H. and Tamhane, A. C., 1985, Performance studies of the measurement test for detection of gross errors in process data, AIChE J, 31: 1187.
14. Heenan, W. A. and Serth, R. W., 1986, Gross error detection and data reconciliation in steam-metering systems, AIChE J, 32: 733–742.
15. Rosenberg, J., Mah, R. S. H. and Iordache, C., 1987, Evaluation of schemes for detecting and identifying gross errors in process data, Ind Eng Chem Res, 26: 555–564.
16. Narasimhan, S. and Mah, R. S. H., 1987, Generalized likelihood ratio methods for gross error identification, AIChE J, 33: 1514.
17. Rollins, D. K., 1990, Unbiased estimation of measured process variables when measurement biases and process leaks are present, PhD Dissertation (Ohio State University, USA).

ACKNOWLEDGEMENTS

We would like to acknowledge the partial support for this project by the National Science Foundation under Grant CTS-9453534.

ADDRESS

Correspondence concerning this paper should be addressed to Dr D. K. Rollins, Department of Chemical Engineering, Iowa State University, 1033 Sweeney Hall, Ames, IA 50010, USA. (E-mail: drollins@iastate.edu).

The manuscript was received 26 October 1999 and accepted for publication after revision 1 August 2000.