0263–8762/06/$30.00+0.00
© 2006 Institution of Chemical Engineers
Trans IChemE, Part A, January 2006
Chemical Engineering Research and Design, 84(A1): 9–21
www.icheme.org/journals
doi: 10.1205/cherd.05190

OPTIMAL DETERMINISTIC TRANSFER FUNCTION MODELLING IN THE PRESENCE OF SERIALLY CORRELATED NOISE

D. K. ROLLINS, N. BHANDARI, S.-T. CHIN, T. M. JUNGE and K. M. ROOSA
Department of Chemical Engineering, Iowa State University, Ames, IA, USA

Correspondence to: Professor D. K. Rollins, Department of Chemical Engineering, 2114 Sweeney Hall, Iowa State University, Ames, IA 50011, USA. E-mail: drollins@iastate.edu

This article addresses the development of predictive transfer function models for nonlinear dynamic processes under serially correlated model error. This work is presented in the context of the block-oriented exact solution technique (BEST) for multiple input, multiple output (MIMO) processes proposed by Bhandari and Rollins (2003) for continuous-time modelling and Rollins and Bhandari (2004) for constrained discrete-time modelling. This work proposes a model building methodology that is able to separately determine the steady state, dynamic and noise model structures. It includes a pre-whitening procedure that is effective for the general class of discrete ARMA(p, q) noise (Box and Jenkins, 1976). The proposed method is demonstrated using a simulated physical system and a real physical system.

Keywords: Wiener system; Hammerstein system; predictive modelling; dynamic modelling; block-oriented modelling; ARMA; serially correlated noise.

INTRODUCTION

The noise or error term in a dynamic predictive model is often serially correlated, i.e., related over time. In these situations, the predictive ability of a model may be improved by developing and using an accurate error term model (ETM). Consequently, the purpose of this article is to propose a model development method under autoregressive moving average (ARMA) noise in the context of the block-oriented method developed by Bhandari and Rollins (2003) for continuous-time modelling and by Rollins and Bhandari (2004) for constrained discrete-time modelling.

In block-oriented modelling, static and dynamic behavior are represented in separate blocks arranged in a network connected by variables that are either observed or unobserved. The two most basic systems are the Hammerstein system and the Wiener system, which are special cases of the more general ‘sandwich model’ discussed in Pearson and Ogunnaike (1997). The first block in the Hammerstein system is the static gain function, which is typically nonlinear in the inputs. This function then enters the second block, a linear dynamic transfer function that ultimately produces the output response. The Wiener system is similar to the Hammerstein system but reverses the order of the blocks; the Wiener system is shown in Figure 1 for a multiple input, multiple output (MIMO) system decomposed to q multiple input, single output (MISO) blocks (see Nells, 2001). The advantages of the Wiener system over the Hammerstein system are the following: (1) each input has a separate dynamic block; and (2) it addresses nonlinear dynamic behaviour functionally and directly through the blocks connecting the outputs. Note that block-oriented sandwich models are systems with static nonlinear and linear dynamic blocks arranged in series or parallel connections. Although, in this article, we primarily focus on the Wiener and Hammerstein systems, the methodology that we propose is applicable to block-oriented modelling in general.

Figure 1. A description of the general MIMO Wiener model structure (decomposed to q MISO blocks) with $i = 1, \ldots, q$ outputs and $j = 1, \ldots, p$ inputs. There is one set of blocks for each of the q outputs. For each set of blocks, each of the p inputs ($u_j$) passes through a separate linear dynamic block ($G_{ij}$) and produces an intermediate variable, $v_{ij}$, that is an element of the vector $\mathbf{v}_i$. Each $\mathbf{v}_i$ passes through a nonlinear static function $f_i(\mathbf{v}_i)$ and generates the output $\eta_i$.

Three common sources of serially correlated noise are model mismatch, measurement errors and unmeasured inputs. These sources combine to give the ETM its serially correlated nature. Most block-oriented modelling articles in the literature address only independently distributed noise, the so-called ‘white’ noise (e.g., see Gómez and Baeyens, 2004; Hagenblad and Ljung, 2000; Hagenblad, 1999; Bai, 1998; Kalafatis et al., 1997; Westwick and Verhaegan, 1996; Greblicki, 1994; Wigren, 1993). This is an insufficient representation of a ‘real’ system, which will inevitably have serially correlated noise due to these error sources. Hence, this article seeks to overcome this insufficiency by including serially correlated noise in block-oriented modelling. Figure 2 is a modification of Figure 1 and illustrates the contributions of unmeasured inputs and measurement error to the error term.

In our literature search we found only a few studies involving serially correlated noise [these included the works of Cao and Gertler (2004); Zhu (2002); David and Bastin (2001); Chen and Fassois (1992); and Haist et al. (1973)] in block-oriented modelling. These studies all employed methods of simultaneous identification of the DTFM and ETM structures, rather than separate identification of these structures. Also, only a small fraction of these studies specifically addressed Wiener systems (Zhu, 2002; Chen and Fassois, 1992) or Hammerstein systems (Haist et al., 1973), and none of them involved the modelling of physical systems.
Block-oriented modelling of physical systems in the presence of serially correlated noise consists of determining three types of model structures: (1) the static or steady-state model (SSM) ($f_\eta$ in Figure 2); (2) the dynamic deterministic transfer function model (DTFM) (i.e., $\eta$ in Figure 2); and (3) the dynamic ETM ($\varepsilon$ in Figure 2). If the goal is to determine the DTFM that explains the greatest amount of variation in the output, then identification of this model can be quite challenging as it is competing with the ETM for dynamic predictive power. In view of this, we make the following comments. First, the information for determining the DTFM comes from the relationships of the past inputs on the current output. Secondly, the information for determining the ETM comes from the relationships of the past outputs on the current output. Furthermore, past outputs contain a composite of input information that makes the past values of an output variable more information-rich than the past values of any one input variable. Thus, in many situations, it is possible to obtain high predictive accuracy with only a few input variables, or with none at all. Note that this is the core justification for autoregressive-integrated moving average (ARIMA) modelling (see Box and Jenkins, 1976), which uses no inputs and is a dynamic modelling approach based strictly on past outputs. Therefore, given a transfer function modelling problem where ARIMA modelling alone can be quite effective, one could obtain excellent performance irrespective of the DTFM and its contribution. Consequently, in a dynamic setting, under serially correlated noise, the modeller must be careful not to allow the ETM to take predictive power away from the DTFM when the goal is to obtain the optimal DTFM (i.e., the one with maximum predictive performance). To ensure this goal, model development requires an ability to separately develop, evaluate and partition the contributions of both of these dynamic terms to the extent that the contribution of the DTFM is optimized. Note that for a DTFM to be optimal it must have the optimal form, which includes containing all significant inputs.

Figure 2. A description of a MISO Wiener System with p measured inputs, q unmeasured inputs, and noise. The unmeasured inputs contribute to the unmeasured process noise, $\varepsilon_{\text{process}}$. The $\varepsilon_{\text{measurement}}$ term represents all the measurement errors. The error term, $\varepsilon$, is equal to $\varepsilon_{\text{process}}$ plus $\varepsilon_{\text{measurement}}$. The output, $y$, is equal to the exogenous (deterministic) term, $\eta$, plus the error (stochastic) term, $\varepsilon$ (i.e., the ETM).

Consequently, to achieve the goal of obtaining the maximum DTFM under serially correlated noise, this article extends the two-stage method of Bhandari and Rollins (2003) and Rollins and Bhandari (2004), which separately determines the SSM and DTFM, and proposes a three-stage method that also separately determines the ETM. We evaluate this method using the mathematically simulated CSTR in Bhandari and Rollins (2003) and the real self-regulating level process presented in Rietz (1998).

This work is presented as follows. The next section reviews the Wiener and Hammerstein modelling methods proposed in Rollins and Bhandari (2004). The third section provides a mathematical description of the three measurement models evaluated in this study. The fourth section then details the proposed three-stage model development procedure. The fifth section demonstrates the effectiveness of the proposed procedure in the CSTR study by examining several cases of serially correlated noise, while the sixth section presents the results of the real process study. Concluding remarks are given in the final section.

THE CONSTRAINED MIMO W-BEST AND H-BEST MODELS

This section gives the constrained discrete-time W-BEST (Wiener) and H-BEST (Hammerstein) modelling approaches recently proposed by Rollins and Bhandari (2004). For continuous-time versions, see Bhandari and Rollins (2003) and Rollins et al. (2003). Note that, although the application in this study is discrete-time modelling, the methodology that we present is equally applicable to continuous-time modelling. Working from the description of the Wiener system given in Figure 1 with q outputs and p inputs, a general deterministic mathematical W-BEST model (i.e., the expectation form) is given by equations (1) and (2):

$$a_{ij,n}\frac{d^{n}v_{ij}(t)}{dt^{n}} + a_{ij,n-1}\frac{d^{n-1}v_{ij}(t)}{dt^{n-1}} + \cdots + a_{ij,1}\frac{dv_{ij}(t)}{dt} + v_{ij}(t) = b_{ij,m}\frac{d^{m}u_{j}(t)}{dt^{m}} + b_{ij,m-1}\frac{d^{m-1}u_{j}(t)}{dt^{m-1}} + \cdots + b_{ij,1}\frac{du_{j}(t)}{dt} + u_{j}(t) \tag{1}$$

$$\eta_i(t) = f_i(\mathbf{v}_i(t)) \tag{2}$$

where all initial conditions and derivatives are zero, $i$ refers to the output with $i = 1, \ldots, q$, $j$ refers to the input with $j = 1, \ldots, p$, and $\mathbf{v}_i(t) = [v_{i1}, v_{i2}, \ldots, v_{ip}]^{T}$. Note that for simplicity, equation (1) is written without dead time and there are no restrictions placed on the static function given by equation (2). Discretization of equation (2) gives:

$$\eta_{i,t} = f_i(\mathbf{v}_{i,t}) \tag{3}$$

where $t = k\Delta t$, $k = 1, \ldots, n_t$, $\Delta t$ = the sampling interval and $n_t$ = the number of samples. For discretization of equation (1) and constraining it to a gain of one (this is the reason it is called a ‘constrained’ method), Rollins and Bhandari (2004) presented the following results when $m = 0$:

$$v_{ij,t} = \sum_{k=1}^{n}\delta_{ij,k}\,v_{ij,t-k} + \left[1 - \sum_{k=1}^{n}\delta_{ij,k}\right]u_{j,t-1} \tag{4}$$

and when $m > 0$:

$$v_{ij,t} = \sum_{k=1}^{n}\delta_{ij,k}\,v_{ij,t-k} + \sum_{\ell=1}^{m}\omega_{ij,\ell}\,u_{j,t-\ell} + \left[1 - \sum_{k=1}^{n}\delta_{ij,k} - \sum_{\ell=1}^{m}\omega_{ij,\ell}\right]u_{j,t-(m+1)} \tag{5}$$

where the $\delta_{ij}$’s and the $\omega_{ij}$’s are estimated from data. We will apply equations (3) and (5) in the simulation study later.

The general MISO Hammerstein system in differential equation form can be represented by equations (6) and (7):

$$a_{i,n}\frac{d^{n}\eta_{i}(t)}{dt^{n}} + a_{i,n-1}\frac{d^{n-1}\eta_{i}(t)}{dt^{n-1}} + \cdots + a_{i,1}\frac{d\eta_{i}(t)}{dt} + \eta_{i}(t) = b_{i,m}\frac{d^{m}v_{i}(t)}{dt^{m}} + b_{i,m-1}\frac{d^{m-1}v_{i}(t)}{dt^{m-1}} + \cdots + b_{i,1}\frac{dv_{i}(t)}{dt} + v_{i}(t) \tag{6}$$

$$v_i(t) = f_i(\mathbf{u}(t)) \tag{7}$$

Discretization of equation (7) gives:

$$v_{i,t} = f_i(\mathbf{u}_t) \tag{8}$$

For discretization of equation (6) and constraining it to a gain of one, Rollins and Bhandari (2004) presented the following results for the H-BEST model when $m = 0$:

$$\eta_{i,t} = \sum_{k=1}^{n}\delta_{i,k}\,\eta_{i,t-k} + \left[1 - \sum_{k=1}^{n}\delta_{i,k}\right]v_{i,t-1} \tag{9}$$

and when $m > 0$:

$$\eta_{i,t} = \sum_{k=1}^{n}\delta_{i,k}\,\eta_{i,t-k} + \sum_{\ell=1}^{m}\omega_{i,\ell}\,v_{i,t-\ell} + \left[1 - \sum_{k=1}^{n}\delta_{i,k} - \sum_{\ell=1}^{m}\omega_{i,\ell}\right]v_{i,t-(m+1)} \tag{10}$$

We will apply equations (8) and (9) in the real process study later in this article. In the next section we give the measurement models for the development of the model-building procedure.

THE MEASUREMENT MODELS

The three measurement models used in this study are presented in this section. Model 1 is the exogenous, white noise model; Model 2 is the exogenous, serially correlated noise model; and Model 3 is the ARIMA model. Models 1 and 2 are used in the proposed three-stage model-building methodology, while the results of Model 3 (where predictions are based solely on past outputs) are presented to illustrate the potential competitiveness of the ETM with the DTFM, as discussed in the Introduction. In the studies to follow, we consider only one output; thus, we drop the subscript $i$ for simplicity.

Model 1: The ‘White’ Noise Model

$$y_t = \eta_t^{(1)} + a_t \tag{11}$$

where

$$a_t \text{ is distributed indep } N(0, \sigma^2) \quad \forall t \tag{12}$$

and $\eta_t^{(1)}$ is $\eta_t$ from equation (3) for W-BEST or equations (9) or (10) for H-BEST, whichever is appropriate, with estimator $\hat{y}_t^{(1)}$:

$$\hat{y}_t^{(1)} = \hat{\eta}_t^{(1)} \tag{13}$$

Note that ‘^’ is used to identify an estimator. $\hat{\eta}_t^{(1)}$ is obtained via nonlinear least squares regression by determining the estimates of the $\delta$’s and $\omega$’s that minimize $\mathrm{SSE}^{(1)}$, the sum of squared errors (SSE) under Model 1 [see equation (A2)].

Model 2: The Pre-whitening Model

$$y_t^{(2)} = \eta_t^{(2)} + N_t \tag{14}$$

where

$$N_t = \frac{\theta_q(B)}{\phi_p(B)}\,a_t \quad \forall t, \tag{15}$$

$$\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q, \qquad \phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$

and $B^r x_t = x_{t-r}$. Thus, $N_t$ is an ARMA(p, q) ETM.

Pre-whitening Model 2

Let

$$\frac{\phi_p(B)}{\theta_q(B)} = \Pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \cdots \tag{16}$$

Note that the right-hand side of equation (16) can be obtained by long division [for an application of equation (16) in another context see Kongsjahju and Rollins (2000)]. Then from equations (14)–(16) we get

$$\Pi(B)\,y_t^{(2)} = \Pi(B)\,\eta_t^{(2)} + a_t \;\Longrightarrow\; y_t^{(2)} = \eta_t^{(2)} + \pi_1\left(y_{t-1} - \eta_{t-1}^{(2)}\right) + \pi_2\left(y_{t-2} - \eta_{t-2}^{(2)}\right) + \cdots + a_t \tag{17}$$

which is now in white noise form with estimator

$$\hat{y}_t^{(2)} = \hat{\eta}_t^{(2)} + \hat{\pi}_1\left(y_{t-1} - \hat{\eta}_{t-1}^{(2)}\right) + \hat{\pi}_2\left(y_{t-2} - \hat{\eta}_{t-2}^{(2)}\right) + \cdots \tag{18}$$

Note that $N_t$ and $a_t$ play the role of $\varepsilon$ in Figure 2. In practice, the number of terms actually used in equation (18) is finite since $\pi_i$ dies out as $i$ increases. The one-step-ahead (OSA) predictor, $\hat{y}_t^{(2)}$, is obtained via nonlinear least squares regression by determining the estimates of the $\delta$’s and $\omega$’s contained in $\hat{\eta}_t^{(2)}$ that minimize $\mathrm{SSE}^{(2)}$ [see equation (A3)]. Note that, as discussed previously, although their structures are equivalent, the coefficients in $\hat{\eta}_t^{(1)}$ and $\hat{\eta}_t^{(2)}$ will be different since the objective function for obtaining $\hat{\eta}_t^{(1)}$, $\mathrm{SSE}^{(1)}$, is not the same as the objective function for obtaining $\hat{\eta}_t^{(2)}$, $\mathrm{SSE}^{(2)}$.

For OSA prediction, which uses equally spaced outputs, we recommend $\hat{y}_t^{(2)}$, i.e., equation (18). However, for applications requiring only the DTFM (i.e., $\eta_t$) (e.g., when outputs are not measured on-line), we recommend $\hat{\eta}_t^{(1)}$, the estimator for $\eta_t$ under Model 1, over $\hat{\eta}_t^{(2)}$, the estimator for $\eta_t$ under Model 2. Appendix A gives a simple proof illustrating that under the least squares criterion, when comparing DTFMs, $\hat{\eta}_t^{(1)}$ is the best estimator (i.e., gives the smallest SSE) regardless of the nature of the ETM. In the simulation study that we present later, the SSEs using $\hat{\eta}_t^{(1)}$ were all less than the SSEs using $\hat{\eta}_t^{(2)}$, which further supports our claim that $\hat{\eta}_t^{(1)}$ is a better estimator for $\eta_t$ than $\hat{\eta}_t^{(2)}$. As an example in the literature see the ‘gas furnace problem’ in Box and Jenkins (1976). For this problem $\hat{\eta}_t^{(1)}$ is given on p. 383 and $\hat{\eta}_t^{(2)}$ is given as equation (11.4.1) on p. 396. Although not calculated in Box and Jenkins (1976), we found $\mathrm{SSE}^{(1)} = 221.6$ ($R^2 = 0.927$) using $\hat{\eta}_t^{(1)}$, which is less than $\mathrm{SSE}^{(2)} = 222.0$ ($R^2 = 0.926$) using $\hat{\eta}_t^{(2)}$, in agreement with our claim from Appendix A. We looked at other examples from the literature (not shown for space consideration) and they all were in agreement with our claim.

Model 3: The ARIMA Model

$$y_t^{(3)} = \frac{N_t}{(1 - B)^d} \tag{19}$$

This model is included because we give its performance for each case in the CSTR simulation study to indicate how the ARIMA estimator stacks up against the estimators under Models 1 and 2. Note that when $\hat{y}_t^{(3)}$ is close to $\hat{y}_t^{(2)}$, this means that a very large amount of the effect of the inputs is contained in previous outputs. Hence, when this is the case, it will be possible to obtain near-optimal OSA model accuracy using a sub-optimal DTFM structure. This is because a large portion of the input effect can be carried by previous outputs, and when this happens, the OSA DTFM structure can be different from the optimal DTFM structure and still produce accuracy near the optimal OSA model. Therefore, as indicated earlier, accurate ARIMA model estimation is an indication that simultaneous approaches could result in a model structure for $\eta_t$ that is significantly sub-optimal.
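As a hedged illustration of the pre-whitening step, the following Python sketch computes the leading π weights of equation (16) by polynomial long division and then applies them in a one-step-ahead prediction of the form of equation (18). The function names `pi_weights` and `osa_predict` and all numerical values are illustrative assumptions, not taken from the study; the fitted deterministic output is assumed to come from a separate regression step.

```python
import numpy as np


def pi_weights(phi, theta, n_terms):
    """Return [pi_1, ..., pi_n] with phi(B)/theta(B) = 1 - pi_1 B - pi_2 B^2 - ...

    phi and theta hold AR and MA coefficients in the Box-Jenkins sign
    convention: phi(B) = 1 - phi_1 B - ..., theta(B) = 1 - theta_1 B - ...
    """
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    num = np.concatenate(([1.0], -phi))    # phi(B), ascending powers of B
    den = np.concatenate(([1.0], -theta))  # theta(B), ascending powers of B
    quot = np.zeros(n_terms + 1)
    for j in range(n_terms + 1):
        acc = num[j] if j < len(num) else 0.0
        # Long division: remove the contribution of earlier quotient terms.
        for k in range(1, min(j, len(den) - 1) + 1):
            acc -= den[k] * quot[j - k]
        quot[j] = acc  # den[0] is 1, so no further scaling is needed
    return -quot[1:]


def osa_predict(y, eta_hat, pi):
    """One-step-ahead predictor: eta_hat_t + sum_i pi_i (y_{t-i} - eta_hat_{t-i})."""
    y = np.asarray(y, dtype=float)
    eta_hat = np.asarray(eta_hat, dtype=float)
    resid = y - eta_hat
    y_hat = eta_hat.copy()
    for i, p in enumerate(pi, start=1):
        # Past residuals before the first sample are treated as zero.
        y_hat[i:] += p * resid[:-i]
    return y_hat


# Hypothetical ARMA(1, 1) error term with phi_1 = 0.8 and theta_1 = 0.2.
weights = pi_weights([0.8], [0.2], n_terms=5)
prediction = osa_predict([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], pi_weights([0.5], [], 3))
```

For a pure AR(1) error term the division terminates after one term, so equation (18) reduces to a single lagged residual correction; with an MA part the weights decay geometrically and the sum is truncated in practice.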
MODEL BUILDING IN THE PRESENCE OF ARMA NOISE

When Model 1 is appropriate (i.e., white noise), Bhandari and Rollins (2003) give a two-stage procedure for building block-oriented models that exploits statistical design of experiments (SDOE) to maximize information content and that separately determines the static and dynamic model forms and parameter estimates. In this section, we extend this procedure to three stages by adding a stage that separately determines the serial correlation structure for the error term [i.e., equation (15)]. This procedure is given in the following six steps:

(1) Select the SDOE and run the design points (input changes) as a sequence of step tests.
(2) Stage 1: Average the steady-state data from each input change and find the form of equation (3); estimate the ultimate response model parameters [see Rollins and Bhandari (2004) for more details].
(3) Stage 2: From a visual examination of response plots from the step tests, select the dynamic model form [e.g., equations (4) or (5)] and estimate the dynamic parameters under Model 1. This step is repeated until an acceptable dynamic model is obtained.
(4) Stage 3: Using the residuals from Stage 2, determine the ARMA(p, q) form of equation (15) and the initial estimates of the p + q parameters.
(5) Simultaneously refit the dynamic parameters (using the form and initial estimates found in Stage 2) and the ARMA parameters (using the form and initial estimates found in Stage 3) under Model 2.
(6) Check the residuals from Step 5 for compliance to white noise.

The SDOE selected in Step 1 is based on a priori assumptions of the ultimate response behavior in the input space as described in Rollins et al. (2003). For additional help in block-oriented discrete-time modelling see Rollins and Bhandari (2004). Step 2 differs slightly from the procedure given in Rollins and Bhandari (2004) in the requirement of averaging the steady-state data for each input change. This is a necessary requirement because the steady-state data for a given input change will be serially correlated when the ETM follows equation (15). However, since the groups of steady-state data for each input change are far apart in time, the groups will not be serially correlated and thus, neither will their averages. One can check this behaviour via the autocorrelation function (ACF) for the residuals. Note that as the amount of serial correlation increases [e.g., as $\phi$ increases in an AR(1) model], the sample size used for averaging must be sufficiently large to maintain sufficiently low variance in the averages. For assistance with Step 3, see details in Rollins and Bhandari (2004) where this step is identical since it is under Model 1.

In Step 4 (Stage 3), the ARMA (or ARIMA) structure for the error term is determined. To obtain this form, we use the ACF and the partial autocorrelation function (PACF). See Box and Jenkins (1976) for assistance on using the ACF and PACF for determining an ARMA structure from residuals and determining parameter estimates from models. In the studies to follow, we demonstrate this step using the Minitab computer program. In Step 5, the dynamic parameters and ARMA parameters are re-estimated simultaneously under Model 2 to fully comply with the assumption of white noise and all the assumptions of least squares estimation. This step produces the OSA predictor that uses past outputs [i.e., equation (18)] and the estimated coefficients of $\eta_t$ and $N_t$ under Model 2. The final step is a check on the white noise assumption for the final OSA predictive model given by equation (18). This assumption should be checked using the ACF before accepting the final model. In the next section, we apply these steps in modelling $C_A$ for the simulated CSTR.
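The identification in Stage 3 relies on the sample ACF and PACF of the Stage 2 residuals. The study used Minitab for this; the NumPy sketch below is a stand-in under stated assumptions, not the authors' tool. It computes the sample ACF directly and the PACF by the Durbin-Levinson recursion, then applies both to simulated AR(1) residuals. The function names, the seed, the series length and the coefficient 0.6 are all hypothetical.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi_curr = np.array([phi_kk])
        else:
            num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, r[1:k])
            phi_kk = num / den
            phi_curr = np.concatenate(
                (phi_prev - phi_kk * phi_prev[::-1], [phi_kk]))
        pacf[k] = phi_kk
        phi_prev = phi_curr
    return pacf


# Hypothetical residuals: AR(1) with coefficient 0.6 (not the study's data).
rng = np.random.default_rng(0)
n, phi_true = 5000, 0.6
innov = rng.normal(size=n)
resid = np.zeros(n)
for t in range(1, n):
    resid[t] = phi_true * resid[t - 1] + innov[t]

acf = sample_acf(resid, 10)
pacf = sample_pacf(resid, 10)
band = 1.96 / np.sqrt(n)   # approximate 95% band for a white-noise series
phi1_initial = acf[1]      # lag-1 autocorrelation as an initial AR(1) estimate
```

An ACF that decays geometrically together with a PACF that is significant only at lag 1 points to an AR(1) form, and the lag-1 autocorrelation can serve as the initial estimate carried into Step 5. The same `band` can be reused in Step 6 to check the final residuals for white noise.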
MATHEMATICALLY SIMULATED CSTR STUDY

This section applies the proposed model-building procedure given in the previous section to block-oriented modelling when the error term is serially correlated. The application is to the simulated CSTR used in Bhandari and Rollins (2003) for continuous-time modelling and Rollins and Bhandari (2004) for discrete-time modelling. For the physical details of this process see Bhandari and Rollins (2003). Although this reactor has five outputs, we restrict this study to just $C_A$ for space considerations. We studied all outputs, and our conclusions for $C_A$ apply equally well to the others.

A simplified diagram of the CSTR is given in Figure 3. Reactants A and B enter the CSTR as two different flow streams and form product C. The second-order, exothermic reaction taking place in the CSTR gives the process strong nonlinear and interactive behavior [see Bhandari and Rollins (2003)]. The process model consists of the overall mass balance, component (A and B) mole balances, and energy balances on the tank and jacket contents. The input variables are the feed flowrate of A ($q_{Af}$), the feed temperature of A ($T_{Af}$), the feed concentration of A ($C_{Af}$), the feed flowrate of B ($q_{Bf}$), the feed temperature of B ($T_{Bf}$), the feed concentration of B ($C_{Bf}$), and the coolant flowrate to the jacket ($q_c$).

Figure 3. Schematic of the CSTR.

Rollins and Bhandari (2004) effectively modelled all the outlet variables as Wiener processes with $a_t = 0$ in equation (11) (i.e., no added noise). Although the fitted models were excellent, a check on the residuals revealed significant serial correlation even without adding noise to the outputs. In this study, $N_{a,t}$ (i.e., the noise added to $C_A$) is either AR(1) or ARMA(1, 1).
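A minimal sketch of how such added noise could be generated is given below, assuming the Box-Jenkins form $(1 - \phi B)N_t = (1 - \theta B)a_t$ with Gaussian white noise $a_t$; setting $\theta = 0$ gives the AR(1) case. The function names, the seed and the zero start-up values are assumptions made for illustration, and the study's own noise generator is not described in the text.

```python
import numpy as np


def arma11_filter(a, phi, theta):
    """Apply N_t = phi*N_{t-1} + a_t - theta*a_{t-1} to an innovation series.

    Values before the first sample are taken as zero (an assumption).
    """
    a = np.asarray(a, dtype=float)
    noise = np.zeros_like(a)
    for t in range(len(a)):
        prev_noise = noise[t - 1] if t > 0 else 0.0
        prev_a = a[t - 1] if t > 0 else 0.0
        noise[t] = phi * prev_noise + a[t] - theta * prev_a
    return noise


def simulate_noise(n, sigma, phi, theta=0.0, seed=None):
    """Draw n Gaussian innovations with standard deviation sigma and filter them."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, size=n)
    return arma11_filter(a, phi, theta)


# Illustrative call: an AR(1) series (theta = 0) to be added to a noise-free output.
noise = simulate_noise(n=1500, sigma=0.006, phi=0.5, seed=1)
```

Separating the filter from the random draw keeps the recursion testable with a known innovation sequence, for example a unit impulse.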
The only other modification we made to the data relative to Rollins and Bhandari (2004) is the addition of four centre design points for replication to estimate $\sigma^2$. These 60 step tests were completely randomized to represent the design points for the training sequence. The type of experimental design remained the same: a three-level Box–Behnken design. This design includes: (1) each step test (design point) being five minutes long for a total run time of 300 min; (2) a sampling time of 0.2 min; (3) the same static gain or ultimate response function, $f(\mathbf{v}_t)$ [see equation (3)]; and (4) the same dynamic model forms, $v_{j,t}$ [see equation (5)], as determined in Rollins and Bhandari (2004). That is,

$$\eta_t = f(\mathbf{v}_t) = b_0 + b_1 v_{1,t} + \cdots + b_7 v_{7,t} + b_8 (v_{1,t})^2 + \cdots + b_{14} (v_{7,t})^2 + b_{15} v_{1,t} v_{2,t} + b_{16} v_{1,t} v_{3,t} + \cdots + b_{35} v_{6,t} v_{7,t} \tag{20}$$

and

$$v_{j,t} = \delta_{j,1} v_{j,t-1} + \delta_{j,2} v_{j,t-2} + \omega_{j,1} u_{j,t-1} + (1 - \delta_{j,1} - \delta_{j,2} - \omega_{j,1})\,u_{j,t-2} \tag{21}$$

Note that the coefficients of equation (20) are estimated in Step 2 of the proposed procedure with the corresponding $u_j$ substituted for $v_j$ [see Rollins and Bhandari (2004)]. An overall result for all cases in this study, as mentioned previously, is that the SSE using $\hat{\eta}_t^{(1)}$ was always less than the SSE using $\hat{\eta}_t^{(2)}$, in agreement with our proof in Appendix A. These results are not given for space consideration. We will now examine the testing results of the first part of this study, with $N_{a,t}$ equal to AR(1).

Part 1: $N_{a,t}$ Equal to AR(1)

For Part 1 of this study, random error was added to $C_{A,t}$ with the following distribution:

$$y_t = C_{A,t} + N_{a,t} \tag{22}$$

where

$$N_{a,t} = \frac{a_t}{1 - \phi_a B} \tag{23}$$

and

$$a_t \text{ is distributed indep } N(0, \sigma^2) \quad \forall t \tag{24}$$

We varied $\sigma$ as follows: 0, 0.002, 0.006. The AR(1) coefficient was 0 or 0.5. This gave five different cases that we randomly replicated three times. One trial of the most extreme case, $\sigma = 0.006$ and $\phi_a = 0.5$, will be examined in detail to demonstrate the proposed model-building procedure. We will refer to this trial as the ‘example case’.
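The dynamic form in equation (21) can be sketched in a few lines of Python to show why the fourth coefficient enforces a gain of one: the input coefficients sum to $1 - \delta_{j,1} - \delta_{j,2}$, so a unit step in the input drives the intermediate variable to one. The function name and the parameter values below are hypothetical and chosen only to give a stable response; zero initial conditions (deviation variables) are assumed.

```python
import numpy as np


def constrained_block(u, delta1, delta2, omega1):
    """Second-order, unity-gain discrete block in the form of equation (21).

    v_t = delta1*v_{t-1} + delta2*v_{t-2} + omega1*u_{t-1}
          + (1 - delta1 - delta2 - omega1)*u_{t-2}
    Samples before t = 0 are taken as zero.
    """
    u = np.asarray(u, dtype=float)
    v = np.zeros_like(u)
    last_coeff = 1.0 - delta1 - delta2 - omega1  # fixes the steady-state gain at one
    for t in range(len(u)):
        v1 = v[t - 1] if t >= 1 else 0.0
        v2 = v[t - 2] if t >= 2 else 0.0
        u1 = u[t - 1] if t >= 1 else 0.0
        u2 = u[t - 2] if t >= 2 else 0.0
        v[t] = delta1 * v1 + delta2 * v2 + omega1 * u1 + last_coeff * u2
    return v


# Hypothetical parameters (poles at 0.7 and 0.5), unit step input.
step_response = constrained_block(np.ones(400), delta1=1.2, delta2=-0.35, omega1=0.1)
```

Because the gain is fixed at one in the dynamic block, all steady-state behaviour is carried by the static function in equation (20), which is what allows the static and dynamic parts to be fitted in separate stages.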
Figure 4 plots the observed and true responses of $C_A$ over time for the example case to illustrate the behaviour of the added noise on the process response. Figure 4 includes a plot representing all the response data for this case along with a magnified view of the first 50 min of the training sequence. To conserve space, we do not present the input sequence, but it is similar to the one in Rollins and Bhandari (2004). Note that Figure 4 represents Step 1 in the proposed procedure.

Figure 4. The observed values of $C_A$ and the true response of $C_A$ for the training case ($\sigma = 0.006$ and $\phi_a = 0.5$) illustrating the highest level of added noise in the simulation study. The plot to the left ‘blows up’ the first 50 min and the plot on the right contains all the response data.

Step 2 of the procedure is the identification and fitting of the ultimate response model. The model form is given by equation (20) with the corresponding $u_j$ substituted for $v_j$. For each steady state, we took the final five values right before the next input change, averaged them, and used these averages to estimate the coefficients in equation (20). The estimated parameters are given in Table 1 below. The residuals from this fit (not shown) did not show evidence of serial correlation from the ACF plot. The $R^2$ value was 99.7%.

Table 1. Estimated ultimate response coefficients in equation (20) for the example case.

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| $\hat{b}_{0} \times 10^{3}$ | 3.21 | $\hat{b}_{12} \times 10^{7}$ | −1.62 | $\hat{b}_{24} \times 10^{2}$ | −6.65 |
| $\hat{b}_{1} \times 10^{3}$ | 4.37 | $\hat{b}_{13} \times 10^{1}$ | −1.73 | $\hat{b}_{25} \times 10^{4}$ | 3.37 |
| $\hat{b}_{2} \times 10^{1}$ | 3.57 | $\hat{b}_{14} \times 10^{6}$ | −7.50 | $\hat{b}_{26} \times 10^{4}$ | 2.14 |
| $\hat{b}_{3} \times 10^{3}$ | −6.46 | $\hat{b}_{15} \times 10^{3}$ | 9.22 | $\hat{b}_{27} \times 10^{4}$ | 1.37 |
| $\hat{b}_{4} \times 10^{3}$ | −3.84 | $\hat{b}_{16} \times 10^{5}$ | 6.00 | $\hat{b}_{28} \times 10^{3}$ | 9.95 |
| $\hat{b}_{5} \times 10^{2}$ | −1.26 | $\hat{b}_{17} \times 10^{5}$ | 8.71 | $\hat{b}_{29} \times 10^{5}$ | −5.09 |
| $\hat{b}_{6} \times 10^{1}$ | −4.59 | $\hat{b}_{18} \times 10^{5}$ | 6.01 | $\hat{b}_{30} \times 10^{5}$ | 8.87 |
| $\hat{b}_{7} \times 10^{4}$ | 9.93 | $\hat{b}_{19} \times 10^{4}$ | 1.46 | $\hat{b}_{31} \times 10^{3}$ | 4.77 |
| $\hat{b}_{8} \times 10^{5}$ | −6.86 | $\hat{b}_{20} \times 10^{5}$ | 1.17 | $\hat{b}_{32} \times 10^{5}$ | −5.10 |
| $\hat{b}_{9} \times 10^{1}$ | 5.16 | $\hat{b}_{21} \times 10^{2}$ | 1.26 | $\hat{b}_{33} \times 10^{3}$ | 1.53 |
| $\hat{b}_{10} \times 10^{4}$ | 1.83 | $\hat{b}_{22} \times 10^{3}$ | 6.46 | $\hat{b}_{34} \times 10^{5}$ | −3.96 |
| $\hat{b}_{11} \times 10^{5}$ | 5.95 | $\hat{b}_{23} \times 10^{3}$ | −5.72 | $\hat{b}_{35} \times 10^{4}$ | −1.55 |

The next step is the estimation of the dynamic model parameters in equation (21) under Model 1 using all the data. That is, this step determined $\hat{\eta}^{(1)}$. The $R^2$ for this fit was 98.1%. Figure 5 contains the ACF and PACF of the residuals for this fit. This figure shows an exponential decay of the ACF and a significant lag in the PACF indicating AR(1) behaviour. Following Step 4 of the procedure, we used the ARIMA command in the Minitab program to fit an AR(1) model to the residuals, giving the estimate $\hat{\phi}_1 = 0.61$. Step 5 re-estimated the parameters in equation (21) using Model 2 with an AR(1) error term, as described in equation (25) below, using the pre-whitening form (the new estimate was $\hat{\phi}_1 = 0.64$), giving:

$$\hat{y}_t^{(2)} = \hat{\eta}_t^{(2)} + \hat{\phi}_1\left(y_{t-1} - \hat{\eta}_{t-1}^{(2)}\right) \tag{25}$$

Figure 5. The ACF and PACF of the residuals under Model 1 indicating an AR(1) noise model.

The ACF of the residuals from this fit is shown in Figure 6. As evidenced by the small values of the lagged correlation coefficients, which are within the confidence bands, any significant serial correlation appears to be completely removed. The final $\hat{\eta}^{(2)}$ parameter estimates used in equation (25) are given in Table 2; they are different from the ones obtained for $\hat{\eta}^{(1)}$. The $R^2$ using equation (25) was 98.8%. Plots of $\hat{y}^{(2)}$ and $\hat{\eta}^{(1)}$ are given in Figure 7 for the first 50 min of the training time and for the total training time. The plots of $\hat{y}^{(2)}$ follow the data, $y$, quite well. Note that $\hat{\eta}^{(1)}$ follows $C_A$ closely as well.

Figure 6. ACF of the residuals and the time series plot of the residuals of the final fit of Model 2 for the example case, indicating removal of serially correlated noise via pre-whitening.

Table 2. Estimated dynamic parameters in equation (21) under Model 2 for the example case.

| Input (j) | $\hat{\delta}_{j,1}$ | $\hat{\delta}_{j,2}$ | $\hat{\omega}_{j,1}$ |
|---|---|---|---|
| $q_{Af}$ (1) | 1.561 | −0.616 | 0.283 |
| $C_{Af}$ (2) | 1.381 | −0.432 | 0.305 |
| $T_{Af}$ (3) | 1.314 | −0.444 | 0.074 |
| $T_{Bf}$ (4) | 1.042 | −0.215 | 0.119 |
| $q_{Bf}$ (5) | 1.251 | −0.364 | 0.109 |
| $C_{Bf}$ (6) | 1.429 | −0.513 | 0.11 |
| $q_c$ (7) | 1.444 | −0.549 | 0.059 |

Figure 7. Time series plots for $\hat{y}^{(2)}$ and $\hat{\eta}^{(1)}$ under training. The top two plots are for the first 50 min while the bottom two plots are for the total training time. For this training case, $\hat{y}^{(2)}$ fits the data, $y$, quite well, and $\hat{\eta}^{(1)}$ agrees well with $C_A$.

The models were then tested using the same test sequence in Rollins and Bhandari (2004). Plots of $\hat{y}^{(2)}$ and $\hat{\eta}^{(1)}$ are given in Figure 8 for the first 50 min of the testing time and the total testing time. Similar to the training data, in this most extreme case, the plots of $\hat{y}^{(2)}$ follow the data quite well, and $\hat{\eta}^{(1)}$ follows $C_A$ closely as well. Next we examine a summary of test results for Part 1.

Figure 8. Time series plots for $\hat{y}^{(2)}$ and $\hat{\eta}^{(1)}$ under testing. The top two plots are for the first 50 min while the bottom two plots are for the total testing time. For this testing case, $\hat{y}^{(2)}$ fits the data, $y$, quite well, and $\hat{\eta}^{(1)}$ agrees well with $C_A$.

A summary of all the cases we ran for Part 1 of this study is provided by Table 3.

Table 3. Relative T-SSPE and relative O-SSPE for $\hat{\eta}^{(1)}$, $\hat{\eta}^{(2)}$ and $\hat{y}^{(2)}$ for Part 1 of the CSTR study.

$\sigma$: 0, 0.002, 0.006. $\phi_a$: 0, 0.5, 0, 0.5. Replication: 1, 2, 3, Avg.
Relative T-SSPE; estimator in equation (26): $\hat{\eta}^{(1)}$, $\hat{\eta}^{(2)}$, $\hat{y}^{(2)}$, $\hat{y}^{(3)}$.
Relative O-SSPE; estimator in equation (27): $\hat{y}^{(2)}$, $\hat{y}^{(3)}$.
1.00 2.02 0.20 0.20
1.02 1.40 0.37 0.94; 0.93 1.20 0.38 0.97; 0.91 1.14 0.36 0.93; 0.95 1.25 0.37 0.95
1.83 1.93 1.09 3.11; 1.34 1.30 0.86 3.01; 1.28 1.24 0.77 2.91; 1.48 1.49 0.91 3.01
1.17 1.42 0.37 0.94; 1.36 1.74 0.52 1.05; 1.29 1.20 0.48 1.01; 1.27 1.45 0.46 1.00
2.37 2.41 1.80 4.97; 2.52 2.52 1.85 4.88; 2.55 2.60 1.76 4.97; 2.48 2.51 1.81 4.94
0.63 1.18; 0.64 1.25; 0.64 1.21; 0.64 1.21
3.42 5.46; 3.29 5.46; 3.17 5.32; 3.29 5.41
0.64 1.17; 0.51 0.87; 0.63 0.98; 0.69 1.00
2.90 4.26; 2.92 4.23; 2.94 4.26; 2.92 4.25
Notes: (1) Each result in this table is relative to the case T-SSPE for $\hat{\eta}^{(1)}$ with $\sigma = 0$ and $\phi_a = 0$. (2) All of the results in this table used an AR(2) model for the noise under Model 2 except for cases with $\sigma = 0.006$ and $\phi_a = 0.5$ that used an AR(1) model.

The results in this table are expressed as relative measures of the sum of squared prediction errors (SSPE). To quantitatively assess the extent of agreement between the true and observed responses and the predicted responses, we define two terms called the true sum of squared prediction error (T-SSPE) and the observed sum of squared prediction error (O-SSPE), given by equations (26) and (27), respectively.

$$\text{T-SSPE} = \sum_{k=1}^{M}\left(C_{A,k} - \hat{\eta}_k\right)^2 \tag{26}$$

$$\text{O-SSPE} = \sum_{k=1}^{M}\left(y_k - \hat{y}_k\right)^2 \tag{27}$$

where $M$ is the total number of equally spaced sampling points used over the testing interval. For this study $M = 1500$. The smaller the SSPE, the more accurate the model. However, note that SSPE and SSE are determined from different data. SSPE is calculated from the testing data and SSE is calculated from the training data.
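Equations (26) and (27) share the same arithmetic and differ only in the reference series (the noise-free response for T-SSPE, the observed response for O-SSPE). A short Python sketch, with hypothetical function names and toy numbers that are not from the study, makes the computation and the normalization used in Table 3 explicit:

```python
import numpy as np


def sspe(reference, prediction):
    """Sum of squared prediction errors over the testing interval."""
    reference = np.asarray(reference, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if reference.shape != prediction.shape:
        raise ValueError("reference and prediction must have the same shape")
    return float(np.sum((reference - prediction) ** 2))


def relative_sspe(reference, prediction, baseline):
    """SSPE divided by a baseline SSPE (e.g., the noise-free T-SSPE case)."""
    return sspe(reference, prediction) / baseline


# Toy numbers only: true response, observed response and two predictors.
c_a_true = [1.0, 2.0, 3.0]
y_obs = [1.1, 1.9, 3.2]
eta_hat = [1.0, 1.0, 1.0]
y_hat = [1.0, 2.0, 3.0]

t_sspe = sspe(c_a_true, eta_hat)   # equation (26): true response vs. DTFM estimate
o_sspe = sspe(y_obs, y_hat)        # equation (27): observed response vs. OSA predictor
```

Dividing every entry by one baseline value, as in `relative_sspe`, puts all cases on a common scale so that entries can be compared across noise levels.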
Nonetheless, since both are measures of model quality, we expect trends in SSPE to follow trends in SSE. It is important to recognize that the proof in Appendix A pertains to SSE and not SSPE and, as we stated previously, no SSE results from this study contradict this proof. All the SSPEs in Table 3 are relative to the T-SSPE for $\hat{\eta}^{(1)}$ in equation (26) for the case when $\sigma = 0$ and $\phi_a = 0$. To give an indication of the achievable model accuracy from using only past outputs, this table also includes ARIMA modelling results for each case. As discussed previously, this is to indicate the potential difficulty simultaneous identification methods can have in finding the optimal DTFM, since this difficulty increases when high accuracy is achievable by ARIMA modelling, which uses no inputs.

First we give some overall results and conclusions from the test cases in Table 3. From Table 3, we see that SSPE increases as $\sigma$ increases and as $\phi$ increases. This behaviour is expected and is more pronounced for large $\sigma$. Next we focus on the proposed OSA predictor, $\hat{y}^{(2)}$, alone. As expected, $\hat{y}^{(2)}$ has the lowest SSPE value in all cases. Now we compare the two estimators for the DTFM, $\hat{\eta}^{(1)}$ and $\hat{\eta}^{(2)}$. As Table 3 shows, the SSPE for $\hat{\eta}^{(1)}$ is better than the SSPE for $\hat{\eta}^{(2)}$ in all cases and mimics the SSE results. Finally, note that the SSPEs for the ARIMA model OSA predictor, $\hat{y}^{(3)}$, are only about one-and-a-half to three times worse than the SSPEs for $\hat{y}^{(2)}$ and in some cases are better than the SSPE for $\hat{\eta}^{(1)}$. Thus, for this process and the conditions of this study, it appears that a simultaneous identification method could have difficulty in obtaining the optimal DTFM structure.

Part 2: $N_{a,t}$ Equal to ARMA(1,1)

For Part 2 of this study, the random error, $N_{a,t}$, that was added to $C_{A,t}$ had the following distribution:

$$N_{a,t} = \frac{1 - \theta_a B}{1 - \phi_a B}\, a_t \tag{28}$$

where

$$a_t \text{ is distributed indep } N(0, \sigma^2) \quad \forall t \tag{29}$$

For this study, $\sigma$ was fixed at 0.006. We looked at combinations of high values for the AR parameter ($\phi_a$) and low values for the MA parameter ($\theta_a$), and vice versa. In all, we looked at four different cases with two trials each. The ARMA parameter combinations for these cases were: $\theta_a = 0.2$, $\phi_a = 0.8$; $\theta_a = 0.8$, $\phi_a = 0.2$; $\theta_a = -0.5$, $\phi_a = 0.5$; and $\theta_a = 0.5$, $\phi_a = -0.5$. For $\sigma = 0.006$, the level of noise was less than that shown in Figure 4 but still highly significant. To conserve space, we do not present the training results and only present the testing results, which are summarized in Table 4. From Table 4, with the variations in the ARMA parameters as given, the proposed method was still able to obtain excellent SSPE performance for $\hat{\eta}^{(1)}$ and $\hat{y}^{(2)}$. In the case with $\phi_a = -0.5$, the SSPE for $\hat{y}^{(2)}$ is greater than the SSPE for $\hat{\eta}^{(1)}$. However, the training SSEs are just the opposite, with SSE$^{(1)}$ slightly greater than SSE$^{(2)}$ for this case, as expected. Also, in the case with $\phi_a = 0.2$, the SSPE for $\hat{\eta}^{(1)}$ is slightly greater (but not significantly greater) than the SSPE for $\hat{\eta}^{(2)}$. However, as mentioned previously, the SSEs were just the opposite and in agreement with the proof in Appendix A. Finally, note that for the case with $\theta_a = 0.2$, the T-SSPE for $\hat{y}^{(3)}$ is very close to the T-SSPE for $\hat{y}^{(2)}$, indicating a very high potential for difficulty in finding the optimal DTFM structure using a simultaneous identification procedure.

Table 4. Relative T-SSPE and relative O-SSPE for $\hat{\eta}^{(1)}$, $\hat{\eta}^{(2)}$ and $\hat{y}^{(2)}$ for Part 2 of the CSTR study. T-SSPE uses the estimator in equation (26); O-SSPE uses the estimator in equation (27).

$\theta_a$   $\phi_a$   Measure   Estimator              Rep 1   Rep 2   Avg
0.2          0.8        T-SSPE    $\hat{\eta}^{(1)}$     2.23    2.22    2.22
0.2          0.8        T-SSPE    $\hat{\eta}^{(2)}$     2.53    2.27    2.40
0.2          0.8        T-SSPE    $\hat{y}^{(2)}$        1.44    1.56    1.50
0.2          0.8        T-SSPE    $\hat{y}^{(3)}$        1.97    1.96    1.96
0.2          0.8        O-SSPE    $\hat{y}^{(2)}$        1.34    1.27    1.31
0.2          0.8        O-SSPE    $\hat{y}^{(3)}$        4.55    3.04    3.80
0.8          0.2        T-SSPE    $\hat{\eta}^{(1)}$     1.22    1.01    1.11
0.8          0.2        T-SSPE    $\hat{\eta}^{(2)}$     1.14    0.97    1.05
0.8          0.2        T-SSPE    $\hat{y}^{(2)}$        0.54    0.46    0.50
0.8          0.2        T-SSPE    $\hat{y}^{(3)}$        2.40    2.42    2.41
0.8          0.2        O-SSPE    $\hat{y}^{(2)}$        2.08    2.05    2.06
0.8          0.2        O-SSPE    $\hat{y}^{(3)}$        3.54    3.41    3.47
−0.5         0.5        T-SSPE    $\hat{\eta}^{(1)}$     2.98    2.41    2.69
−0.5         0.5        T-SSPE    $\hat{\eta}^{(2)}$     3.37    2.78    3.08
−0.5         0.5        T-SSPE    $\hat{y}^{(2)}$        2.00    2.49    2.24
−0.5         0.5        T-SSPE    $\hat{y}^{(3)}$        2.52    2.58    2.55
−0.5         0.5        O-SSPE    $\hat{y}^{(2)}$        1.47    1.45    1.44
−0.5         0.5        O-SSPE    $\hat{y}^{(3)}$        3.65    3.61    3.57
0.5          −0.5       T-SSPE    $\hat{\eta}^{(1)}$     1.04    0.89    0.97
0.5          −0.5       T-SSPE    $\hat{\eta}^{(2)}$     4.01    0.97    2.49
0.5          −0.5       T-SSPE    $\hat{y}^{(2)}$        1.47    1.31    1.39
0.5          −0.5       T-SSPE    $\hat{y}^{(3)}$        4.78    4.37    4.58
0.5          −0.5       O-SSPE    $\hat{y}^{(2)}$        2.63    2.22    2.42
0.5          −0.5       O-SSPE    $\hat{y}^{(3)}$        6.00    5.46    5.73

Notes: (1) Each result in this table is relative to the case T-SSPE for $\hat{\eta}^{(1)}$ with $\sigma = 0$ and $\phi_a = 0$. (2) All of the results in this table used an ARMA(2,1) model for the noise under Model 2.

LEVEL PROCESS STUDY

To evaluate the proposed method using real data, we applied our method to the self-regulating level process shown in Figure 9. The level, $h$, reaches a steady state for changes in inlet flow rate, $q$. All the data for this study are taken from Rietz (1998), who originally built a two-stage, continuous-time H-BEST model for this system under Model 1. In this section, we will use the steady-state model from Rietz (1998) but will rebuild the discrete-time H-BEST model in applying Stages 2 and 3 of the proposed three-stage model-building procedure to address serially correlated noise.

Figure 9. The tank level system used in the study for the real process.

In Step 1, the experimental design used by Rietz (1998) consisted of two step tests of +0.15 $q_0$, where $q_0 = 0.66$ gpm. The steady-state model was obtained in Step 2 (Stage 1) and is given by equation (30):

$$v_t = a + b\,(q - q_0) \tag{30}$$

where $a = 0.3468$ and $b = 85.2$.

At Step 3 (Stage 2), by relying on visual examination, we selected a first-order dynamic model form as described by equation (31) (for space consideration, the plots are not shown).

$$\hat{\eta}_t = \hat{\delta}_1 \hat{\eta}_{t-1} + \left(1 - \hat{\delta}_1\right) v_{t-1} \tag{31}$$

where $\hat{\delta}_1 = 6.99$. The initial value of this parameter was obtained under Model 1. $R^2$ for this training step was 1.0. The ACF and PACF for the residuals are plotted in Figure 10. As shown in these plots, the noise structure appears to be an AR(1) or AR(2) model.
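The identification step described here, reading the noise structure from the ACF and PACF of the Model 1 residuals, can be illustrated with a small sketch. The residual series below is simulated from an assumed AR(1) process with coefficient 0.7 (not the level-process data), and the helper names `sample_acf` and `sample_pacf` are hypothetical. The PACF is obtained from the sample ACF with the Durbin-Levinson recursion, which is one standard way to compute it and not necessarily the one used by the authors.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.zeros(0)  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi_curr = np.array([phi_kk])
        else:
            num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, rho[1:k])
            phi_kk = num / den
            phi_curr = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
        phi_prev = phi_curr
    return pacf

# Simulated stand-in for Model 1 residuals: AR(1) with an assumed coefficient.
rng = np.random.default_rng(42)
n, phi = 2000, 0.7
resid = np.zeros(n)
shocks = rng.normal(0.0, 1.0, n)
for t in range(1, n):
    resid[t] = phi * resid[t - 1] + shocks[t]

acf = sample_acf(resid, 10)
pacf = sample_pacf(resid, 10)
band = 2.0 / np.sqrt(n)  # approximate 95% band for a white-noise series

print("ACF :", np.round(acf[1:6], 3))
print("PACF:", np.round(pacf[1:6], 3))
print("band: +/-", round(band, 3))
```

For an AR(1) series of this kind the ACF decays roughly geometrically while the PACF is large at lag 1 and close to the band afterwards, which is the pattern the caption of Figure 10 describes.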
We modelled these residuals using an AR(2) structure and obtained the initial estimates for $\phi_1$ and $\phi_2$ as described in Step 4 (Stage 3) of the model-building procedure. Then, $\delta_1$, $\phi_1$ and $\phi_2$ were re-estimated simultaneously using equation (32) below (per Step 5):

$$\hat{y}_t^{(2)} = \hat{\eta}_t^{(2)} + \hat{\phi}_1\left(y_{t-1} - \hat{\eta}_{t-1}^{(2)}\right) + \hat{\phi}_2\left(y_{t-2} - \hat{\eta}_{t-2}^{(2)}\right) \tag{32}$$

where $\hat{\phi}_1 = 0.7115$ and $\hat{\phi}_2 = 0.2910$. As a final step, we checked for compliance to white noise. As shown in Figure 11, this step appears quite successful. To test this model, we used the same testing sequence as in Rietz (1998); this sequence is given in Figure 12. As shown in Figure 13, both the fitted DTFM ($\hat{\eta}^{(1)}$) and the OSA predictor ($\hat{y}^{(2)}$) fit the process response quite well. Thus, this procedure appears to have promise as an effective model-building method for real physical systems in the presence of serially correlated noise.

Figure 10. The ACF (left) and PACF (right) of the residuals under Model 1 for the real process study. The noise model appears to be AR(1) or AR(2) because the ACF has an exponential decay and the PACF has a large correlation for lag 1 and no significant correlations for other time lags.

Figure 11. ACF of the residuals for the training data for the real process after pre-whitening (i.e., after Step 5). The plot shows excellent removal of the serially correlated noise (compare with the ACF plot in Figure 10).

Figure 12. The test input sequence for the real process study.

Figure 13. The test sequence performance of $\hat{y}$ under Model 2 (i.e., $\hat{y}^{(2)}$) and $\hat{\eta}$ under Model 1 (i.e., $\hat{\eta}^{(1)}$) for the real process. The OSA predictor, $\hat{y}^{(2)}$, uses previous outputs and inputs but the DTFM, $\hat{\eta}^{(1)}$, uses only inputs. In both cases performance is excellent, with $\hat{\eta}^{(1)}$ only slightly worse than $\hat{y}^{(2)}$.

CLOSING COMMENTS

This work has proposed a model-building procedure for block-oriented modelling when the error term is serially correlated. This pre-whitening procedure addresses general ARMA noise, which is more common in real systems than 'white' (i.e., uncorrelated) noise. The procedure appears to be effective and is able to separately determine model structures and parameter estimates for the steady-state (ultimate response) model (SSM), the dynamic deterministic transfer function model (DTFM), and the error term model (ETM) in three stages, which gives it an advantage over methods that determine these model structures simultaneously. The advantage is that the proposed approach allows the modeller to first maximize the variation explained by the DTFM before fitting the ETM, which could otherwise excessively dominate predictive performance and significantly reduce the amount of variation explained by the DTFM. This is especially true when ARIMA modelling, which uses only past outputs, can produce a highly accurate fit. This work demonstrated the reality of this situation by showing several cases where ARIMA modelling performed well relative to OSA predictive modelling with optimal DTFM structure. Therefore, for OSA transfer function modelling in the presence of serially correlated noise, it is our strong recommendation that modellers obtain their DTFM structure ahead of their ETM structure, regardless of the method they use.

When the deterministic DTFM is the only goal, a significant implication of this work is the conclusion that the optimal model can be obtained under white noise regardless of the presence of serially correlated noise. Hence, in this situation, the best modelling approach is simply to determine the parameters that minimize SSE, without consideration of the characteristics of the noise.

The proposed three-stage model-building approach is readily extendable to sandwich block structures. An important advantage of our three-stage approach is the ability to treat non-invertible static functions, which provides promise in addressing multiple input, multiple output processes. However, the estimation of the static nonlinear functions will be nested one inside the other, creating a challenge for the optimization process to obtain accurate estimates of the model parameters.

NOMENCLATURE

$a_t$       white noise term
$C_{Af}$    stream A inlet concentration
$C_{Bf}$    stream B inlet concentration
$C_A$       concentration of A in the reactor
$C_B$       concentration of B in the reactor
$C_C$       concentration of C in the reactor
$f$         nonlinear static gain function
$G$         linear dynamic function
$m$         number of zeros
$M$         number of equally spaced times over the testing interval
$n$         number of poles
$n_t$       number of samples
$N_t$       serially correlated noise model
$p$         number of inputs
$p$         number of AR parameters
$q$         number of outputs
$q$         number of MA parameters
$q_{Af}$    feed A flowrate
$q_{Bf}$    feed B flowrate
$q_c$       coolant flowrate
$T$         temperature in the reactor
$T_{Af}$    stream A inlet temperature
$T_{Bf}$    stream B inlet temperature
$T$         tank temperature in the reactor
$u$         vector of input variables
$v$         vector of intermediate variables
$y$         measured value of the output

Greek symbols
$\beta$             vector of parameters for static gain function
$\varepsilon$       error term
$\delta$, $\omega$  vector of dynamic parameters for discrete-time models
$\eta$              expectation function (true value of the output)
$\sigma$            standard deviation of the noise
$\phi$              autoregressive parameter
$\theta$            moving average parameter

Subscripts
$a$    noise
$i$    output
$j$    input
$t$    sampling instant

Superscripts
^      estimate
(1)    under Model 1
(2)    under Model 2
(3)    under Model 3

REFERENCES

Bai, E., 1998, An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear systems, Automatica, 34(3): 333–338.
Bhandari, N. and Rollins, D.K., 2003, A continuous-time MIMO Wiener modeling method, Industrial & Engineering Chemistry Research, 42: 5583–5595.
Box, G.P.
and Jenkins, G.M., 1976, Time Series Analysis: Forecasting and Control, Revised edition (Holden-Day, Oakland, California).
Cao, J. and Gertler, J., 2004, Noise-induced bias in last principal component modeling of linear system, Journal of Process Control, 14: 365–376.
Chen, C.H. and Fassois, S.D., 1992, Maximum likelihood identification of stochastic Wiener-Hammerstein-type-non-linear systems, Mechanical Systems and Signal Processing, 6(2): 135–153.
David, B. and Bastin, G., 2001, An estimator of the inverse covariance matrix and its application to ML parameter estimation in dynamical systems, Automatica, 37: 99–106.
Gomez, J.C. and Baeyens, E., 2004, Identification of block-oriented nonlinear systems using orthonormal bases, Journal of Process Control, 14: 685–697.
Greblicki, W., 1994, Nonparametric identification of Wiener systems by orthogonal series, IEEE Transactions on Automatic Control, 39(10): 2077–2086.
Hagenblad, A., 1999, Aspects of the identification of Wiener models, Technical report licentiate thesis no. 793, Department of Electrical Engineering, Linköping Studies University, SE-581 83 Linköping, Sweden.
Hagenblad, A. and Ljung, L., 2000, Maximum likelihood estimation of Wiener models, Proceedings of the 39th IEEE Conference on Decision and Control, 2417–2418.
Haist, N.D., Chang, F.H.I. and Luus, R., 1973, Nonlinear identification in the presence of correlated noise in using a Hammerstein model, IEEE Transactions on Automatic Control, 18(5): 552–555.
Kalafatis, A.D., Wang, L. and Cluett, W.R., 1997, Identification of Wiener-type nonlinear systems in a noisy environment, International Journal of Control, 66(6): 923–941.
Kongsjahju, R. and Rollins, D.K., 2000, Accurate identification of biased measurements under serial correlation, IChemE Transactions Part A – Chemical Engineering Research and Design, 78: 1010–1017.
Nells, O., 2001, Nonlinear System Identification (Springer, Germany).
Pearson, R.K.
and Ogunnaike, B.A., 1997, Nonlinear process identification, in Nonlinear Process Control, 20–23 (Prentice-Hall PTR, Upper Saddle River).
Reitz, C.A., 1998, The application of a semi-empirical modeling technique to real processes, Graduate thesis submission, Chemical Engineering Department, Iowa State University.
Rollins, D.K., Bhandari, N., Bassily, A.M., Colver, G.M. and Chin, S., 2003, A continuous-time nonlinear dynamic predictive modeling method for Hammerstein processes, Industrial & Engineering Chemistry Research, 42: 861–872.
Rollins, D.K. and Bhandari, N., 2004, Constrained MIMO dynamic discrete-time modeling exploiting optimal experimental design, Journal of Process Control, 14: 671–683.
Westwick, D. and Verhaegan, M., 1996, Identifying MIMO Wiener systems using subspace model identification methods, 52: 235–258.
Wigren, T., 1993, Recursive prediction error identification using the nonlinear Wiener model, Automatica, 29(4): 1011–1025.
Zhu, Y., 2002, Estimation of an N-L-N Hammerstein-Wiener model, Automatica, 38: 1607–1614.

APPENDIX A

Proof that $\hat{\eta}_t^{(1)}$ is the Best Estimator for $\eta_t$ Under Least Squares

Let $\hat{\eta}_t^{*}$ be the $\hat{\eta}_t$ that minimizes

$$\mathrm{SSE} = \sum_{t=1}^{n_t} \left(y_t - \hat{\eta}_t\right)^2 \tag{A1}$$

Under Model 1 [equation (11)],

$$\mathrm{SSE}^{(1)} = \sum_{t=1}^{n_t} \left(y_t - \hat{y}_t^{(1)}\right)^2 = \sum_{t=1}^{n_t} \left(y_t - \hat{\eta}_t^{(1)}\right)^2 \tag{A2}$$

Therefore, from examination of equations (A1) and (A2), one sees that $\hat{\eta}_t^{(1)}$ must be the best estimator of $\eta_t$, that is, $\hat{\eta}_t^{*}$. Likewise, under Model 2 [equation (14)],

$$\mathrm{SSE}^{(2)} = \sum_{t=1}^{n_t} \left(y_t - \hat{y}_t^{(2)}\right)^2 = \sum_{t=1}^{n_t} \left(y_t - \left[\hat{\eta}_t^{(2)} + \hat{\pi}_1\left(y_{t-1} - \hat{\eta}_{t-1}^{(2)}\right) + \hat{\pi}_2\left(y_{t-2} - \hat{\eta}_{t-2}^{(2)}\right) + \cdots\right]\right)^2 \tag{A3}$$

Thus, $\hat{\eta}_t^{(2)}$ is obtained through the minimization of equation (A3) and not

$$\mathrm{SSE} = \sum_{t=1}^{n_t} \left(y_t - \hat{\eta}_t^{(2)}\right)^2 \tag{A4}$$

Therefore, it is possible for $\hat{\eta}_t^{(2)}$ not to be $\hat{\eta}_t^{*}$, and thus it is not preferred over $\hat{\eta}_t^{(1)}$, which is always equal to $\hat{\eta}_t^{*}$.

It is easy to create a problem to support this proof. For example, let the true model be $\eta_t = b\,e^{gt}$ for some $b$ and $g$, and fit the model $\hat{\eta}_t = \hat{a}_0 + \hat{a}_1 t$ over some values of $t$ with any distribution for the ETM. Determine SSE$^{(1)}$ using equation (A2) and SSE$^{(2)}$ using equation (A3). Then obtain SSE using equation (A4) with $\hat{\eta}_t^{(2)}$ from equation (A3). Upon comparing this SSE with SSE$^{(1)}$ you will find that it is greater than, or at best equal to, SSE$^{(1)}$.

The manuscript was received 9 August 2005 and accepted for publication after revision 9 January 2006.
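The numerical check suggested at the end of Appendix A can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the values b = 1.0 and g = 0.8, the AR(1) noise with coefficient 0.7, the sample size, and the use of a single AR coefficient in place of the general weights in equation (A3). Model 2 is fitted here by a simple grid search over the AR coefficient with ordinary least squares at each grid point, which is a convenient stand-in and not the authors' estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
t = np.linspace(0.0, 2.0, n)
eta_true = 1.0 * np.exp(0.8 * t)          # assumed true model: b = 1.0, g = 0.8

# Assumed AR(1) error term added to the true response.
phi_true, sigma = 0.7, 0.05
shocks = rng.normal(0.0, sigma, n)
noise = np.zeros(n)
for k in range(1, n):
    noise[k] = phi_true * noise[k - 1] + shocks[k]
y = eta_true + noise

X = np.column_stack([np.ones(n), t])      # design matrix for a0 + a1 * t

# Model 1: ordinary least squares, i.e. direct minimization of (A1)/(A2).
coef1, *_ = np.linalg.lstsq(X, y, rcond=None)
sse1 = float(np.sum((y - X @ coef1) ** 2))

# Model 2: minimize the one-step-ahead SSE of (A3) with one AR coefficient.
# For a fixed phi the problem is linear in (a0, a1), so phi is profiled on a grid.
best = None
for phi in np.linspace(-0.95, 0.95, 191):
    Xw = X[1:] - phi * X[:-1]             # pre-whitened regressors
    zw = y[1:] - phi * y[:-1]             # pre-whitened response
    coef, *_ = np.linalg.lstsq(Xw, zw, rcond=None)
    sse = float(np.sum((zw - Xw @ coef) ** 2))
    if best is None or sse < best[0]:
        best = (sse, phi, coef)
sse2, phi_hat, coef2 = best

# Equation (A4): SSE of the Model 2 deterministic part alone.
sse_a4 = float(np.sum((y - X @ coef2) ** 2))

print(f"SSE(1)            = {sse1:.4f}")
print(f"SSE(2), eq. (A3)  = {sse2:.4f} at phi_hat = {phi_hat:.2f}")
print(f"SSE from eq. (A4) = {sse_a4:.4f}")
```

Because the Model 1 coefficients minimize the plain sum of squares over the same linear family, the value from equation (A4) cannot fall below SSE(1), whatever noise realization is drawn.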