0263–8762/06/$30.00+0.00
© 2006 Institution of Chemical Engineers
Trans IChemE, Part A, January 2006
Chemical Engineering Research and Design, 84(A1): 9–21
www.icheme.org/journals
doi: 10.1205/cherd.05190
OPTIMAL DETERMINISTIC TRANSFER FUNCTION MODELLING
IN THE PRESENCE OF SERIALLY CORRELATED NOISE
D. K. ROLLINS, N. BHANDARI, S.-T. CHIN, T. M. JUNGE and K. M. ROOSA
Department of Chemical Engineering, Iowa State University, Ames, IA, USA
This article addresses the development of predictive transfer function models for nonlinear dynamic processes under serially correlated model error. This work is presented in the context of the block-oriented exact solution technique (BEST) for multiple input, multiple output (MIMO) processes proposed by Bhandari and Rollins (2003) for continuous-time modelling and Rollins and Bhandari (2004) for constrained discrete-time modelling. This work proposes a model building methodology that is able to separately determine the steady state, dynamic and noise model structures. It includes a pre-whitening procedure that is effective for the general class of discrete ARMA(p, q) noise (Box and Jenkins, 1976). The proposed method is demonstrated using a simulated physical system and a real physical system.
Keywords: Wiener system; Hammerstein system; predictive modelling; dynamic modelling;
block-oriented modelling; ARMA; serially correlated noise.
INTRODUCTION

The noise or error term in a dynamic predictive model is often serially correlated, i.e., related over time. Therefore, in these situations, the predictive ability of a model may be improved by the development and use of an accurate error term model (ETM). Consequently, the purpose of this article is to propose a model development method under autoregressive, moving average (ARMA) noise in the context of the block-oriented method developed by Bhandari and Rollins (2003) for continuous-time modelling and by Rollins and Bhandari (2004) for constrained discrete-time modelling.

In block-oriented modelling, static and dynamic behaviour are represented in separate blocks and arranged in a network connected by variables that are either observed or unobserved. The two most basic systems are the Hammerstein system and the Wiener system, which are special cases of the more general 'sandwich model' as discussed in Pearson and Ogunnaike (1997). The first block in the Hammerstein system is the static gain function, which is typically nonlinear in the inputs. This function then enters the second block, a linear dynamic transfer function that ultimately produces the output response. The Wiener system is similar to the Hammerstein system but reverses the order of the blocks; the Wiener system is shown in Figure 1 for a multiple input, multiple output (MIMO) system decomposed into q multiple input, single output (MISO) blocks (see Nelles, 2001). The advantages of the Wiener system over the Hammerstein system are the following: (1) each input has a separate dynamic block; and (2) it addresses nonlinear dynamic behaviour functionally and directly through the blocks connecting the outputs. Note that block-oriented sandwich models are systems with static nonlinear and linear dynamic blocks arranged in series or parallel connections. Although, in this article, we primarily focus on the Wiener and Hammerstein systems, the methodology that we propose is applicable to block-oriented modelling in general.

Three common sources of serially correlated noise are model mismatch, measurement errors and unmeasured inputs. These sources combine to give the ETM its serially correlated nature. Most of the block-oriented modelling articles found in the literature address only independently distributed noise, the so-called 'white' noise (e.g., see Gómez and Baeyens, 2004; Hagenblad and Ljung, 2000; Hagenblad, 1999; Bai, 1998; Kalafatis et al., 1997; Westwick and Verhaegen, 1996; Greblicki, 1994; Wigren, 1993). This is an insufficient representation of a 'real' system, which will inevitably have serially correlated noise due to these error sources. Hence, this article seeks to overcome this insufficiency with the inclusion of serially correlated noise in block-oriented modelling. Figure 2 is a modification of Figure 1 and illustrates the contributions of unmeasured inputs and measurement error to the error term.

In our literature search we found only a few studies in block-oriented modelling involving serially correlated noise [these included the works of Cao and Gertler (2004); Zhu (2002); David and Bastin (2001); Chen and Fassois (1992); and Haist et al. (1973)]. These studies all employed methods of simultaneous identification of the DTFM and ETM structures, rather than separate identification of these structures. Also, only a small fraction of these studies specifically addressed Wiener systems (Zhu, 2002; Chen and Fassois, 1992) or Hammerstein systems (Haist et al., 1973), and none of them involved the modelling of physical systems.

Figure 1. A description of the general MIMO Wiener model structure (decomposed into q MISO blocks) with i = 1, ..., q outputs and j = 1, ..., p inputs. There is one set of blocks for each of the q outputs. For each set of blocks, each of the p inputs (u_j) passes through a separate linear dynamic block (G_ij) and produces an intermediate variable, v_ij, that is an element of the vector v_i. Each v_i passes through a nonlinear static function f_i(v_i) and generates the output η_i.

Correspondence to: Professor D. K. Rollins, Department of Chemical Engineering, 2114 Sweeney Hall, Iowa State University, Ames, IA 50011, USA. E-mail: drollins@iastate.edu
Block-oriented modelling of physical systems in the presence of serially correlated noise consists of the determination of three types of model structures: (1) the static or steady-state model (SSM) (f_η in Figure 2); (2) the dynamic deterministic transfer function model (DTFM) (i.e., η in Figure 2); and (3) the dynamic ETM (ε in Figure 2). If the goal is to determine the DTFM that explains the greatest amount of variation in the output, then identification of this model can be quite challenging as it is competing with the ETM for dynamic predictive power. In view of this, we make the following comments. First, the information for determining the DTFM comes from the relationships of the past inputs to the current output. Secondly, the information for determining the ETM comes from the relationships of the past outputs to the current output. Furthermore, past outputs contain a composite of input information that makes the past values of an output variable more information-rich than the past values of any one input variable. Thus, in many situations, it is possible to obtain high predictive accuracy without the use of any (or with only a few) input variables. Note that this is the core justification for autoregressive-integrated moving average (ARIMA) modelling (see Box and Jenkins, 1976), which uses no inputs and is a dynamic modelling approach based strictly on
past outputs. Therefore, given a transfer function modelling problem where ARIMA modelling alone can be quite effective, one could obtain excellent performance irrespective of the DTFM and its contribution. Consequently, in a dynamic setting, under serially correlated noise, the modeller must be careful not to allow the ETM to take predictive power away from the DTFM when the goal is to obtain the optimal DTFM (i.e., the one with maximum predictive performance). To ensure this goal, model development requires an ability to separately develop, evaluate and partition the contributions of both of these dynamic terms to the extent that the contribution of the DTFM is optimized. Note that for a DTFM structure to be optimal it must have the correct form or structure, which includes containing all significant inputs.

Figure 2. A description of a MISO Wiener system with p measured inputs, q unmeasured inputs, and noise. The unmeasured inputs contribute to the unmeasured process noise, ε_process. The ε_measurement term represents all the measurement errors. The error term, ε, is equal to ε_process plus ε_measurement. The output, y, is equal to the exogenous (deterministic) term, η, plus the error (stochastic) term, ε (i.e., the ETM).
Consequently, to achieve the goal of obtaining the maximum DTFM under serially correlated noise, this article extends the two-stage method of Bhandari and Rollins (2003) and Rollins and Bhandari (2004) that separately determines the SSM and DTFM and proposes a three-stage method that also separately determines the ETM. We evaluate this method using the mathematically simulated CSTR in Bhandari and Rollins (2003) and the real self-regulating level process presented in Reitz (1998).
This work is presented as follows. The next section reviews
the Wiener and Hammerstein modelling methods proposed
in Rollins and Bhandari (2004). The third section provides
a mathematical description of the three measurement
models evaluated in this study. The fourth section then
details the proposed three-stage model development procedure. The fifth section demonstrates the effectiveness of
the proposed procedure in the CSTR study by examining
several cases of serially correlated noise, while the sixth
section presents the results of the real process study.
Concluding remarks are given in the final section.
THE CONSTRAINED MIMO W-BEST AND H-BEST MODELS

This section gives the constrained discrete-time W-BEST (Wiener) and H-BEST (Hammerstein) modelling approaches recently proposed by Rollins and Bhandari (2004). For continuous-time versions, see Bhandari and Rollins (2003) and Rollins et al. (2003). Note that, although the application in this study is discrete-time modelling, the methodology that we present is equally applicable to continuous-time modelling.

Working from the description of the Wiener system given in Figure 1 with q outputs and p inputs, a general deterministic mathematical W-BEST model (i.e., the expectation form) is given by equations (1) and (2):

    a_{ij,n} d^n v_{ij}(t)/dt^n + a_{ij,n-1} d^{n-1} v_{ij}(t)/dt^{n-1} + ... + a_{ij,1} dv_{ij}(t)/dt + v_{ij}(t)
        = b_{ij,m} d^m u_j(t)/dt^m + b_{ij,m-1} d^{m-1} u_j(t)/dt^{m-1} + ... + b_{ij,1} du_j(t)/dt + u_j(t)    (1)

    η_i(t) = f_i(v_i(t))    (2)

where all initial conditions and derivatives are zero, i refers to the output with i = 1, ..., q, j refers to the input with j = 1, ..., p, and v_i(t) = [v_{i1}, v_{i2}, ..., v_{ip}]^T. Note that for simplicity, equation (1) is written without dead time and there are no restrictions placed on the static function given by equation (2). Discretization of equation (2) gives:

    η_{i,t} = f_i(v_{i,t})    (3)

where t = kΔt, k = 1, ..., n_t, Δt = the sampling interval and n_t = the number of samples. For discretization of equation (1) and constraining it to a gain of one (this is the reason it is called a 'constrained' method), Rollins and Bhandari (2004) presented the following results when m = 0:

    v_{ij,t} = Σ_{k=1}^{n} δ_{ij,k} v_{ij,t-k} + [1 - Σ_{k=1}^{n} δ_{ij,k}] u_{j,t-1}    (4)

and when m > 0:

    v_{ij,t} = Σ_{k=1}^{n} δ_{ij,k} v_{ij,t-k} + Σ_{ℓ=1}^{m} ω_{ij,ℓ} u_{j,t-ℓ} + [1 - Σ_{k=1}^{n} δ_{ij,k} - Σ_{ℓ=1}^{m} ω_{ij,ℓ}] u_{j,t-(m+1)}    (5)

where the δ_ij's and the ω_ij's are estimated from data. We will apply equations (3) and (5) in the simulation study later.

The general MISO Hammerstein system in differential equation form can be represented by equations (6) and (7):

    a_{i,n} d^n η_i(t)/dt^n + a_{i,n-1} d^{n-1} η_i(t)/dt^{n-1} + ... + a_{i,1} dη_i(t)/dt + η_i(t)
        = b_{i,m} d^m v_i(t)/dt^m + b_{i,m-1} d^{m-1} v_i(t)/dt^{m-1} + ... + b_{i,1} dv_i(t)/dt + v_i(t)    (6)

    v_i(t) = f_i(u(t))    (7)

Discretization of equation (7) gives:

    v_{i,t} = f_i(u_t)    (8)

For discretization of equation (6) and constraining it to a gain of one, Rollins and Bhandari (2004) presented the following results for the H-BEST model when m = 0:

    η_{i,t} = Σ_{k=1}^{n} δ_{i,k} η_{i,t-k} + [1 - Σ_{k=1}^{n} δ_{i,k}] v_{i,t-1}    (9)

and when m > 0:

    η_{i,t} = Σ_{k=1}^{n} δ_{i,k} η_{i,t-k} + Σ_{ℓ=1}^{m} ω_{i,ℓ} v_{i,t-ℓ} + [1 - Σ_{k=1}^{n} δ_{i,k} - Σ_{ℓ=1}^{m} ω_{i,ℓ}] v_{i,t-(m+1)}    (10)

We will apply equations (8) and (9) in the real process study later in this article. In the next section we give the measurement models for the development of the model-building procedure.
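To make the gain constraint concrete, here is a minimal sketch (our code, not the authors'; the coefficient values are made up for illustration) of the single-input block of equation (5) with n = 2 and m = 1. Because the coefficient on the last input lag is forced to 1 - Σ_k δ_k - Σ_ℓ ω_ℓ, a sustained unit step in u drives v to exactly one:

```python
def constrained_block(u, deltas, omegas):
    """Simulate v_t = sum_k delta_k v_{t-k} + sum_l omega_l u_{t-l}
    + [1 - sum(deltas) - sum(omegas)] u_{t-(m+1)}, zero initial conditions."""
    n, m = len(deltas), len(omegas)
    tail = 1.0 - sum(deltas) - sum(omegas)   # the constrained coefficient
    v = [0.0] * len(u)
    for t in range(len(u)):
        acc = sum(deltas[k] * v[t - 1 - k] for k in range(n) if t - 1 - k >= 0)
        acc += sum(omegas[l] * u[t - 1 - l] for l in range(m) if t - 1 - l >= 0)
        if t - (m + 1) >= 0:
            acc += tail * u[t - (m + 1)]
        v[t] = acc
    return v

# Unit step input: the response settles at the step height (gain of one),
# regardless of the (stable) delta and omega values chosen.
u = [1.0] * 200
v = constrained_block(u, deltas=[0.9, -0.2], omegas=[0.1])
```

At steady state v* = v*(δ1 + δ2) + u(ω1 + tail), and the constrained tail coefficient cancels the δ's and ω's so that v* = u, which is the unit-gain property the discretization is built to preserve.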
THE MEASUREMENT MODELS

The three measurement models used in this study are presented in this section. Model 1 is the exogenous, white noise model; Model 2 is the exogenous, serially correlated noise model; and Model 3 is the ARIMA model. Models 1 and 2 are used in the proposed three-stage model-building methodology, while the results of Model 3 (where predictions are based solely on past outputs) are presented to illustrate the potential competitiveness of the ETM with the DTFM, as discussed in the Introduction. In the studies to follow, we consider only one output; thus, we drop the subscript i for simplicity.

Model 1: The 'White' Noise Model

    y_t = η_t^(1) + a_t    (11)

where

    a_t is distributed indep N(0, σ²)  for all t    (12)

and η_t^(1) is η_t from equation (3) for W-BEST or equations (9) or (10) for H-BEST, whichever is appropriate, with estimator ŷ_t^(1):

    ŷ_t^(1) = η̂_t^(1)    (13)

Note that '^' is used to identify an estimator. η̂_t^(1) is obtained via nonlinear least squares regression by determining the estimates of the δ's and ω's that minimize SSE^(1), the sum of squared errors (SSE) under Model 1 [see equation (A2)].

Model 2: The Pre-whitening Model

    y_t^(2) = η_t^(2) + N_t    (14)

where

    N_t = [θ_q(B)/φ_p(B)] a_t  for all t,
    θ_q(B) = 1 - θ_1 B - θ_2 B² - ... - θ_q B^q,
    φ_p(B) = 1 - φ_1 B - φ_2 B² - ... - φ_p B^p    (15)

and B^r x_t = x_{t-r}. Thus, N_t is an ARMA(p, q) ETM.

Pre-whitening Model 2

Let

    φ_p(B)/θ_q(B) = Π(B) = 1 - π_1 B - π_2 B² - ...    (16)

Note that the right-hand side of equation (16) can be obtained by long division [for an application of equation (16) in another context see Kongsjahju and Rollins (2000)]. Then from equations (14)-(16) we get

    Π(B) y_t^(2) = Π(B) η_t^(2) + a_t  ⟹
    y_t^(2) = η_t^(2) + π_1 (y_{t-1} - η_{t-1}^(2)) + π_2 (y_{t-2} - η_{t-2}^(2)) + ... + a_t    (17)

which is now in white noise form with estimator

    ŷ_t^(2) = η̂_t^(2) + π̂_1 (y_{t-1} - η̂_{t-1}^(2)) + π̂_2 (y_{t-2} - η̂_{t-2}^(2)) + ...    (18)

Note that N_t and a_t play the role of ε in Figure 2. In practice, the number of terms actually used in equation (18) is finite since π_i dies out as i increases.

The one-step-ahead (OSA) predictor, ŷ_t^(2), is obtained via nonlinear least squares regression by determining the estimates of the δ's and ω's contained in η̂_t^(2) that minimize SSE^(2) [see equation (A3)]. Note that, as discussed previously, although their structures are equivalent, the coefficients in η̂_t^(1) and η̂_t^(2) will be different since the objective function for obtaining η̂_t^(1), SSE^(1), is not the same as the objective function for obtaining η̂_t^(2), SSE^(2).

For OSA prediction, which uses equally spaced outputs, we recommend ŷ_t^(2), i.e., equation (18). However, for applications requiring only the DTFM (i.e., η_t) (e.g., when outputs are not measured on-line), we recommend η̂_t^(1), the estimator for η_t under Model 1, over η̂_t^(2), the estimator for η_t under Model 2. Appendix A gives a simple proof illustrating that, under the least squares criterion, when comparing DTFMs, η̂_t^(1) is the best estimator (i.e., gives the smallest SSE) regardless of the nature of the ETM. In the simulation study that we present later, the SSEs using η̂_t^(1) were all less than the SSEs using η̂_t^(2), further supporting our claim that η̂_t^(1) is a better estimator for η_t than η̂_t^(2). As an example from the literature, see the 'gas furnace problem' in Box and Jenkins (1976). For this problem η̂_t^(1) is given on p. 383 and η̂_t^(2) is given as equation (11.4.1) on p. 396. Although not calculated in Box and Jenkins (1976), we found SSE^(1) = 221.6 (R² = 0.927) using η̂_t^(1), which is less than SSE^(2) = 222.0 (R² = 0.926) using η̂_t^(2), in agreement with our claim from Appendix A. We looked at other examples from the literature (not shown for space considerations) and they were all in agreement with our claim.

Model 3: The ARIMA Model

    y_t^(3) = N_t / (1 - B)^d    (19)

This model is included because we give its performance for each case in the CSTR simulation study to indicate how the ARIMA estimator stacks up against the estimators under Models 1 and 2. Note that when ŷ_t^(3) is close to ŷ_t^(2), this means that a very large amount of the effect of the inputs is contained in previous outputs. Hence, when this is the case, it will be possible to obtain near-optimal OSA model accuracy using a sub-optimal DTFM structure. This is because a large portion of the input effect can be carried by previous outputs, and when this happens, the OSA DTFM structure can be different from the optimal DTFM structure and still produce accuracy near the optimal
OSA model. Therefore, as indicated earlier, accurate ARIMA model estimation is an indication that simultaneous approaches could result in a model structure for η_t that is significantly sub-optimal.
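Equation (16) says the π weights come from dividing φ_p(B) by θ_q(B). The following is a short sketch (our illustration, assuming nothing beyond the long-division identity; the ARMA(1, 1) coefficients are made up) of that polynomial division:

```python
# Long division of phi(B) by theta(B) to obtain Pi(B) = 1 - pi_1 B - pi_2 B^2 - ...
# Polynomials are coefficient lists in ascending powers of B, following the
# sign convention of equation (15): phi(B) = 1 - 0.8 B  ->  [1.0, -0.8].

def pi_weights(phi, theta, n_terms):
    """Return [pi_1, ..., pi_n] from phi(B) / theta(B) = Pi(B)."""
    # If Pi(B) has coefficients c_j (with c_0 = 1), then phi(B) = theta(B) * Pi(B)
    # gives the recursion c_j = phi_j - sum_{i=1..j} theta_i * c_{j-i}.
    c = [1.0]
    for j in range(1, n_terms + 1):
        phi_j = phi[j] if j < len(phi) else 0.0
        s = sum((theta[i] if i < len(theta) else 0.0) * c[j - i]
                for i in range(1, j + 1))
        c.append(phi_j - s)
    return [-cj for cj in c[1:]]    # Pi(B) = 1 - pi_1 B - ...  =>  pi_j = -c_j

# ARMA(1,1) error with phi_1 = 0.8 and theta_1 = 0.5: the weights decay
# geometrically (pi_j = 0.3 * 0.5^(j-1)), so the expansion in equation (18)
# can be truncated after a few terms.
pis = pi_weights([1.0, -0.8], [1.0, -0.5], 4)
```

Here π_1 = 0.3, π_2 = 0.15, π_3 = 0.075, ..., illustrating the remark that π_i dies out as i increases.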
MODEL BUILDING IN THE PRESENCE OF
ARMA NOISE
When Model 1 is appropriate (i.e., white noise), Bhandari and Rollins (2003) give a two-stage procedure for building block-oriented models that exploits statistical design of experiments (SDOE) to maximize information content and separately determines the static and dynamic model forms and parameter estimates. In this section, we extend this procedure to three stages to include a stage that separately determines the serial correlation structure for the error term [i.e., equation (15)]. This procedure is given in the following six steps:
(1) Select the SDOE and run the design points (input
changes) as a sequence of step tests.
(2) Stage 1: Average the steady-state data from each input
change and find the form of equation (3); estimate the
ultimate response model parameters [see Rollins and
Bhandari (2004) for more details].
(3) Stage 2: From a visual examination of response plots
from the step tests, select the dynamic model form
[e.g. equations (4) or (5)] and estimate the dynamic
parameters under Model 1. This step is repeated until
an acceptable dynamic model is obtained.
(4) Stage 3: Using the residuals from Step 3, determine the
ARMA(p, q) form of equation (15) and the initial
estimates of the p + q parameters.
(5) Simultaneously refit the dynamic parameters (using the
form and initial estimates found in Stage 2) and the
ARMA parameters (using the form and initial estimates
found in Stage 3) under Model 2.
(6) Check the residuals from Step 5 for compliance to
white noise.
The SDOE selected in Step 1 is based on a priori assumptions of the ultimate response behaviour in the input space as described in Rollins et al. (2003). For additional help in block-oriented discrete-time modelling see Rollins and Bhandari (2004). Step 2 differs slightly from the procedure given in Rollins and Bhandari (2004) in the requirement of averaging the steady-state data for each input change. This is a necessary requirement because the steady-state data for a given input change will be serially correlated when the ETM follows equation (15). However, since the groups of steady-state data for each input change are far apart in time, the groups will not be serially correlated and thus, neither will their averages. One can check this behaviour via the autocorrelation function (ACF) for the residuals. Note that as the amount of serial correlation increases [e.g., as φ increases in an AR(1) model], the sample size used for averaging must be sufficiently large to maintain a sufficiently low variance in the averages.
For assistance with Step 3, see details in Rollins and Bhandari (2004), where this step is identical since it is under Model 1. In Step 4 (Stage 3), the ARMA (or ARIMA) structure for the error term is determined. To obtain this form, we use the ACF and the partial autocorrelation function (PACF). See Box and Jenkins (1976) for assistance on using the ACF and PACF to determine an ARMA structure from residuals and to estimate its parameters. In the studies to follow, we demonstrate this step using the Minitab computer program. In Step 5, the dynamic parameters and
ARMA parameters are re-estimated simultaneously under Model 2 to fully comply with the assumption of white noise and all the assumptions of least squares estimation. This step produces the OSA predictor that uses past outputs [i.e., equation (18)] and the estimated coefficients of η_t and N_t under Model 2. The final step is a check on the white noise assumption for the final OSA predictive model given by equation (18). This assumption should be checked using the ACF before accepting the final model. In the next section, we apply these steps in modelling C_A for the simulated CSTR.
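Stage 3 reads the ETM structure off the ACF and PACF of the residuals from the dynamic fit. As a minimal sketch (ours; the paper itself uses Minitab's ARIMA command), the sample ACF of simulated AR(1) residuals recovers the geometric decay ρ_k ≈ φ^k, and the lag-1 value is a simple moment estimate of φ:

```python
import random

def acf(x, nlags):
    """Sample autocorrelation function at lags 1..nlags."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((xi - mean) ** 2 for xi in x) / n
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n / c0
            for k in range(1, nlags + 1)]

# Simulated AR(1) residuals with phi = 0.6 (seeded so the example is repeatable).
rng = random.Random(7)
a = [rng.gauss(0.0, 1.0) for _ in range(5000)]
r = [a[0]]
for t in range(1, len(a)):
    r.append(0.6 * r[-1] + a[t])

rho = acf(r, 3)
# rho decays roughly as 0.6, 0.36, 0.216, the signature used in Stage 3 to
# identify AR(1) structure; for an AR(1) process the PACF would cut off
# after lag 1.
```

For a real study one would of course use a statistics package for this step, as the authors do; the sketch only shows what the package computes.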
MATHEMATICALLY SIMULATED CSTR STUDY
This section applies the proposed model-building procedure given in the previous section to block-oriented modelling when the error term is serially correlated. This application is to the simulated CSTR used in Bhandari and Rollins (2003) for continuous-time modelling and Rollins and Bhandari (2004) for discrete-time modelling. For the physical details of this process see Bhandari and Rollins (2003). Although this reactor has five outputs, we restrict this study to just C_A for space considerations. We studied all the outputs, and our conclusions for C_A apply equally well to them.
A simplified diagram of the CSTR is given in Figure 3.
Reactants A and B enter the CSTR as two different flow
streams and form product C. The second-order, exothermic
reaction taking place in the CSTR gives the process strong
nonlinear and interactive behavior [see Bhandari and
Rollins (2003)]. The process model consists of the overall
mass balance, component (A and B) mole balances,
and energy balances on the tank and jacket contents.
The input variables are the feed flowrate of A (qAf),
the feed temperature of A (TAf), the feed concentration of
Figure 3. Schematic of the CSTR.
A (CAf), the feed flowrate of B (qBf), the feed temperature
of B (TBf), the feed concentration of B (CBf), and the
coolant flowrate to the jacket (qc). Rollins and Bhandari
(2004) effectively modelled all the outlet variables as Wiener processes with a_t = 0 in equation (11) (i.e., no added noise). Although the fitted models were excellent, a check on the residuals revealed significant serial correlation even without adding noise to the outputs. In this study, N_{a,t} (i.e., the noise added to C_A) is either AR(1) or ARMA(1, 1). The only other modification we made to the data relative to Rollins and Bhandari (2004) is the four additional centre design points for replication to estimate σ². These 60 step tests were completely randomized to represent the design points for the training sequence. The type of experimental design remained the same: a three-level Box-Behnken design. This design includes: (1) each step test (design point) being five minutes long for a total run time of 300 min; (2) a sampling time of 0.2 min; (3) the same static gain or ultimate response function, f(v_t) [see equation (3)]; and (4) the same dynamic model forms, v_{j,t} [see equation (5)], as determined in Rollins and Bhandari (2004). That is,

    η_t = f(v_t) = β_0 + β_1 v_{1,t} + ... + β_7 v_{7,t} + β_8 (v_{1,t})² + ... + β_14 (v_{7,t})² + β_15 v_{1,t} v_{2,t} + β_16 v_{1,t} v_{3,t} + ... + β_35 v_{6,t} v_{7,t}    (20)

and

    v_{j,t} = δ_{j,1} v_{j,t-1} + δ_{j,2} v_{j,t-2} + ω_{j,1} u_{j,t-1} + (1 - δ_{j,1} - δ_{j,2} - ω_{j,1}) u_{j,t-2}    (21)
Note that the coefficients of equation (20) are estimated in Step 2 of the proposed procedure with the corresponding u_j substituted for v_j [see Rollins and Bhandari (2004)]. An overall result for all cases in this study, as mentioned previously, is that the SSE using η̂_t^(1) was always less than the SSE using η̂_t^(2), in agreement with our proof in Appendix A. These results are not given for space considerations. We will now examine testing results of the first part of this study: N_{a,t} equal to AR(1).
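As a check on the model size, equation (20) is a full quadratic response surface in the seven intermediate variables: an intercept, 7 linear terms, 7 pure quadratic terms and C(7, 2) = 21 two-way interactions, i.e., 36 coefficients (β_0 through β_35, matching Table 1). A small sketch (ours, not the authors' code) that builds and evaluates such a surface:

```python
from itertools import combinations

def quadratic_terms(v):
    """Regressor vector for a full quadratic surface: intercept, linear,
    pure quadratic and two-way interaction terms (the form of equation (20))."""
    terms = [1.0]
    terms += list(v)                                   # v_1 ... v_p
    terms += [x * x for x in v]                        # v_1^2 ... v_p^2
    terms += [v[i] * v[j] for i, j in combinations(range(len(v)), 2)]
    return terms

def eta(beta, v):
    """Evaluate eta_t = f(v_t) for a given coefficient vector beta."""
    return sum(b, * (x,))[0] if False else sum(b * x for b, x in zip(beta, quadratic_terms(v)))

# Seven inputs give 1 + 7 + 7 + 21 = 36 regressors, matching beta_0..beta_35.
n_terms = len(quadratic_terms([0.0] * 7))
```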
Part 1: N_{a,t} Equal to AR(1)

For Part 1 of this study, random error was added to C_{A,t} with the following distribution:

    y_t = C_{A,t} + N_{a,t}    (22)

where

    N_{a,t} = a_t / (1 - φ_a B)    (23)

and

    a_t is distributed indep N(0, σ²)  for all t    (24)

We varied σ as follows: 0, 0.002, 0.006. The AR(1) coefficient φ_a was 0 or 0.5. This gave five different cases that we randomly replicated three times. One trial of the most extreme case, σ = 0.006 and φ_a = 0.5, will be examined in detail to demonstrate the proposed model-building procedure. We will refer to this trial as the 'example case'. Figure 4 plots the observed and true responses of C_A over time for the example case to illustrate the behaviour of the added noise on the process response. Figure 4 includes a plot representing all the response data for this case along with a magnified view of the first 50 min of the training sequence. To conserve space, we do not present the input sequence, but it is similar to the one in Rollins and Bhandari (2004). Note that Figure 4 represents Step 1 in the proposed procedure.
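The noise model of equations (22)-(24) can be written as the recursion N_{a,t} = φ_a N_{a,t-1} + a_t. Below is a sketch (our illustration, not the authors' simulator; the constant true response is a stand-in for the simulated C_A trajectory) of adding such noise to a true response sequence:

```python
import random

def add_ar1_noise(ca_true, phi_a, sigma, seed=0):
    """Observed output y_t = C_A,t + N_a,t with N_a,t = phi_a * N_a,t-1 + a_t,
    a_t ~ N(0, sigma^2): equations (22)-(24)."""
    rng = random.Random(seed)
    y, n_prev = [], 0.0
    for ca in ca_true:
        a_t = rng.gauss(0.0, sigma)
        n_t = phi_a * n_prev + a_t      # AR(1) recursion, equation (23)
        y.append(ca + n_t)              # observed output, equation (22)
        n_prev = n_t
    return y

# Example: the most extreme case in Part 1 (sigma = 0.006, phi_a = 0.5).
y_obs = add_ar1_noise([0.1] * 1500, phi_a=0.5, sigma=0.006)
```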
Step 2 of the procedure is the identification and fitting of
the ultimate response model. The model form is given by
equation (20) with the corresponding uj substituted for vj.
For each steady state, we took the final five values right
before the next input change, averaged them, and used
these averages to estimate the coefficients in equation
(20). The estimated parameters are given in Table 1 below. The residuals from this fit (not shown) did not show evidence of serial correlation in the ACF plot. The R² value was 99.7%.
Figure 4. The observed values of C_A and the true response of C_A for the training case (σ = 0.006 and φ_a = 0.5), illustrating the highest level of added noise in the simulation study. The plot on the left magnifies the first 50 min and the plot on the right contains all the response data.
Table 1. Estimated ultimate response coefficients in equation (20) for the example case.

Parameter       Value     Parameter       Value     Parameter       Value
β̂_0 × 10³       3.21      β̂_12 × 10⁷     -1.62      β̂_24 × 10²     -6.65
β̂_1 × 10³       4.37      β̂_13 × 10¹     -1.73      β̂_25 × 10⁴      3.37
β̂_2 × 10¹       3.57      β̂_14 × 10⁶     -7.50      β̂_26 × 10⁴      2.14
β̂_3 × 10³      -6.46      β̂_15 × 10³      9.22      β̂_27 × 10⁴      1.37
β̂_4 × 10³      -3.84      β̂_16 × 10⁵      6.00      β̂_28 × 10³      9.95
β̂_5 × 10²      -1.26      β̂_17 × 10⁵      8.71      β̂_29 × 10⁵     -5.09
β̂_6 × 10¹      -4.59      β̂_18 × 10⁵      6.01      β̂_30 × 10⁵      8.87
β̂_7 × 10⁴       9.93      β̂_19 × 10⁴      1.46      β̂_31 × 10³      4.77
β̂_8 × 10⁵      -6.86      β̂_20 × 10⁵      1.17      β̂_32 × 10⁵     -5.10
β̂_9 × 10¹       5.16      β̂_21 × 10²      1.26      β̂_33 × 10³      1.53
β̂_10 × 10⁴      1.83      β̂_22 × 10³      6.46      β̂_34 × 10⁵     -3.96
β̂_11 × 10⁵      5.95      β̂_23 × 10³     -5.72      β̂_35 × 10⁴     -1.55
The next step is the estimation of the dynamic model parameters in equation (21) under Model 1 using all the data. That is, this step determined η̂^(1). The R² for this fit was 98.1%. Figure 5 contains the ACF and PACF of the residuals for this fit. This figure shows an exponential decay of the ACF and a single significant lag in the PACF, indicating AR(1) behaviour. Following Step 4 of the procedure, we used the ARIMA command in the Minitab program to fit an AR(1) model to the residuals, giving an estimate of φ̂_1 = 0.61.
Step 5 re-estimated the parameters in equation (21) using Model 2 with an AR(1) error term, as described in equation (25) below, using the pre-whitening form (the new estimate being φ̂_1 = 0.64), giving:

    ŷ_t^(2) = η̂_t^(2) + φ̂_1 (y_{t-1} - η̂_{t-1}^(2))    (25)
The ACF of the residuals from this fit is shown in Figure 6. As evidenced by the small values of the lagged correlation coefficients, which lie within the confidence bands, any significant serial correlation appears to be completely removed. The final η̂^(2) parameter estimates used in equation (25) are given in Table 2; they are different from the ones obtained for η̂^(1). R² using equation (25) was 98.8%. Plots of ŷ^(2) and η̂^(1) are given in Figure 7 for the first 50 min of the training time and for the total training time. The plots of ŷ^(2) follow the data, y, quite well. Note that η̂^(1) follows C_A closely as well.

Figure 5. The ACF and PACF of the residuals under Model 1, indicating an AR(1) noise model.

Figure 6. ACF of the residuals and the time series plot of the residuals of the final fit of Model 2 for the example case, indicating removal of serially correlated noise via pre-whitening.
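In this AR(1) case the pre-whitened predictor of equation (25) has a simple reading: the deterministic prediction η̂_t is corrected by a fraction φ̂_1 of the previous one-step error. A one-line sketch (ours, with made-up numbers):

```python
def osa_predict(y_prev, eta_hat_t, eta_hat_prev, phi_hat):
    """One-step-ahead prediction, equation (25):
    y_hat_t = eta_hat_t + phi_hat * (y_{t-1} - eta_hat_{t-1})."""
    return eta_hat_t + phi_hat * (y_prev - eta_hat_prev)

# If the previous observation sat 0.004 above its deterministic prediction
# and phi_hat = 0.64, the next prediction is shifted up by 0.64 * 0.004.
y_hat = osa_predict(0.104, 0.100, 0.100, 0.64)
```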
Table 2. Estimated dynamic parameters in equation (21) under Model 2 for the example case.

Dynamic                              Input (j)
parameter   qAf (1)   CAf (2)   TAf (3)   TBf (4)   qBf (5)   CBf (6)   qc (7)
δ̂_{j,1}      1.561     1.381     1.314     1.042     1.251     1.429     1.444
δ̂_{j,2}     -0.616    -0.432    -0.444    -0.215    -0.364    -0.513    -0.549
ω̂_{j,1}      0.283     0.305     0.074     0.119     0.109     0.11      0.059

The models were then tested using the same test sequence as in Rollins and Bhandari (2004). Plots of ŷ^(2) and η̂^(1) are given in Figure 8 for the first 50 min of the testing time and the total testing time. Similar to the training data, in this most extreme case, the plots of ŷ^(2) follow the data quite well, and η̂^(1) follows C_A closely as well. Next we examine a summary of test results for Part 1.

A summary of all the cases we ran for Part 1 of this study is provided in Table 3. The results in this table are expressed as relative measures of the sum of squared prediction errors (SSPE). To quantitatively assess the extent of agreement between the true and observed responses and the predicted responses, we define two terms called the true sum of squared prediction error (T-SSPE) and the observed sum of squared prediction error (O-SSPE), given by equations (26) and (27), respectively.

    T-SSPE = Σ_{k=1}^{M} (C_{A,k} - η̂_k)²    (26)

    O-SSPE = Σ_{k=1}^{M} (y_k - ŷ_k)²    (27)

where M is the total number of equally spaced sampling points used over the testing interval. For this study M = 1500.
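The two metrics can be sketched directly from equations (26) and (27) (our helper functions, not the authors' code):

```python
def t_sspe(ca_true, eta_hat):
    """True SSPE, equation (26): predictions vs the noise-free response."""
    return sum((c - e) ** 2 for c, e in zip(ca_true, eta_hat))

def o_sspe(y_obs, y_hat):
    """Observed SSPE, equation (27): OSA predictions vs the noisy output."""
    return sum((y - yh) ** 2 for y, yh in zip(y_obs, y_hat))

# Toy check with M = 3 points: constant offsets of 0.01 and 0.02 give
# sums of 3 * 0.01^2 and 3 * 0.02^2 respectively.
t_val = t_sspe([0.1, 0.2, 0.3], [0.11, 0.21, 0.31])
o_val = o_sspe([0.1, 0.2, 0.3], [0.12, 0.22, 0.32])
```

The distinction matters for the comparisons that follow: T-SSPE rewards recovering the deterministic response itself, while O-SSPE rewards one-step-ahead tracking of the noisy data.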
The smaller the SSPE, the more accurate the model. However, note that SSPE and SSE are determined from different data: SSPE is calculated from the testing data

Figure 7. Time series plots for ŷ^(2) and η̂^(1) under training. The top two plots are for the first 50 min while the bottom two plots are for the total training time. For this training case, ŷ^(2) fits the data, y, quite well, and η̂^(1) agrees well with C_A.
Figure 8. Time series plots for ŷ^(2) and η̂^(1) under testing. The top two plots are for the first 50 min while the bottom two plots are for the total testing time. For this testing case, ŷ^(2) fits the data, y, quite well, and η̂^(1) agrees well with C_A.
Table 3. Relative T-SSPE and relative O-SSPE for η̂^(1), η̂^(2) and ŷ^(2) for Part 1 of the CSTR study.

                            Relative T-SSPE;                    Relative O-SSPE;
                            estimator in equation (26)          estimator in equation (27)
σ      φ_a   Replication    η̂^(1)   η̂^(2)   ŷ^(2)   ŷ^(3)      ŷ^(2)   ŷ^(3)
0      0        —           1.00    2.02    0.20    0.20        —       —
0.002  0        1           1.02    1.40    0.37    0.94       0.63    1.18
                2           0.93    1.20    0.38    0.97       0.64    1.25
                3           0.91    1.14    0.36    0.93       0.64    1.21
                Avg         0.95    1.25    0.37    0.95       0.64    1.21
0.002  0.5      1           1.83    1.93    1.09    3.11       3.42    5.46
                2           1.34    1.30    0.86    3.01       3.29    5.46
                3           1.28    1.24    0.77    2.91       3.17    5.32
                Avg         1.48    1.49    0.91    3.01       3.29    5.41
0.006  0        1           1.17    1.42    0.37    0.94       0.64    1.17
                2           1.36    1.74    0.52    1.05       0.51    0.87
                3           1.29    1.20    0.48    1.01       0.63    0.98
                Avg         1.27    1.45    0.46    1.00       0.69    1.00
0.006  0.5      1           2.37    2.41    1.80    4.97       2.90    4.26
                2           2.52    2.52    1.85    4.88       2.92    4.23
                3           2.55    2.60    1.76    4.97       2.94    4.26
                Avg         2.48    2.51    1.81    4.94       2.92    4.25

Notes: (1) Each result in this table is relative to the T-SSPE for η̂^(1) with σ = 0 and φ_a = 0.
(2) All of the results in this table used an AR(2) model for the noise under Model 2 except for the cases with σ = 0.006 and φ_a = 0.5, which used an AR(1) model.
and SSE is calculated from the training data. Nonetheless,
since both are measures of model quality, we expect
trends in SSPE to follow trends in SSE. It is important to
recognize that the proof in Appendix A pertains to SSE
and not SSPE, and as we stated previously, there are no
SSE results from this study in contradiction of this proof.
All the SSPEs in Table 3 are relative to the T-SSPE for ĥ(1) in equation (26) for the case when σ = 0 and φ_a = 0. To give an indication of the achievable model accuracy from using only past outputs, this table also includes ARIMA modelling results for each case. As discussed previously, this is to indicate the potential difficulty simultaneous identification methods can have in finding the optimal DTFM, since this difficulty increases when high accuracy is achievable by ARIMA modelling, which uses no inputs.
First we give some overall results and conclusions from the test cases in Table 3. From Table 3, we see that SSPE increases as σ increases and as φ_a increases. This behaviour is expected and is more pronounced for large σ. Next we focus on the proposed OSA predictor, ŷ(2), alone. As expected, ŷ(2) has the lowest SSPE value in all cases. Now we compare the two estimators for the DTFM, ĥ(1) and ĥ(2). As Table 3 shows, the SSPE for ĥ(1) is better (i.e., lower) than the SSPE for ĥ(2) in all cases and mimics the SSE results. Finally, note that the SSPEs for the ARIMA model OSA predictor, ŷ(3), are only about one-and-a-half to three times worse than the SSPEs for ŷ(2) and in some cases are better than the SSPE for ĥ(1). Thus, for this process and the conditions of this study, it appears that a simultaneous identification method could have difficulty in obtaining the optimal DTFM structure.
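The relative SSPE entries discussed above reduce to a simple computation. The sketch below is illustrative only; the function names are ours, not the paper's:

```python
import numpy as np

def sspe(y, y_pred):
    """Sum of squared prediction errors over a testing sequence."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.sum((y - y_pred) ** 2))

def relative_sspe(y, y_pred, baseline):
    """SSPE scaled by a baseline SSPE, as in the entries of Table 3."""
    return sspe(y, y_pred) / baseline
```

Here `baseline` would be the T-SSPE of ĥ(1) for the σ = 0, φ_a = 0 case.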
For this study, σ was fixed at 0.006. We looked at combinations of a high value for the AR parameter (φ_a) and a low value for the MA parameter (θ_a), and vice versa. In all, we looked at four different cases with two trials each. The ARMA parameter combinations for these cases were: θ_a = 0.2, φ_a = 0.8; θ_a = 0.8, φ_a = 0.2; θ_a = −0.5, φ_a = 0.5; and θ_a = 0.5, φ_a = −0.5. For σ = 0.006, the level of noise was less than that shown in Figure 4 but still highly significant. To conserve space, we do not present the training results and only present the testing results, which are summarized in Table 4.
From Table 4, with the variations in the ARMA parameters as given, the proposed method was still able to obtain excellent SSPE performance for ĥ(1) and ŷ(2). In the case with φ_a = −0.5, the SSPE for ŷ(2) is greater than the SSPE for ĥ(1). However, the training SSEs are just the opposite, with SSE(1) slightly greater than SSE(2) for this case, as expected. Also, in the case with φ_a = 0.2, the SSPE for ĥ(1) is slightly greater (but not significantly greater) than the SSPE for ĥ(2). However, as mentioned previously, the SSEs were just the opposite and in agreement with the proof in Appendix A. Finally, note that for the case with θ_a = 0.2, the T-SSPE for ŷ(3) is very close to the T-SSPE for ŷ(2), indicating a very high potential for difficulty in finding the optimal DTFM structure using a simultaneous identification procedure.
LEVEL PROCESS STUDY
To evaluate the proposed method using real data, we
applied our method to the self-regulating level process
shown in Figure 9. The level, h, reaches a steady state
for changes in inlet flow rate, q. All the data for this
study are taken from Rietz (1998) who originally built a
two-stage, continuous-time H-BEST model for this
system under Model 1. In this section, we will use the
steady-state model from Rietz (1998) but will rebuild the
Part 2: N_a,t Equal to ARMA(1, 1)
For Part 2 of this study, the random error, N_a,t, that was added to C_A,t had the following distribution:

    N_a,t = [(1 − θ_a B)/(1 − φ_a B)] a_t,  ∀t    (28)

where

    a_t is distributed independently N(0, σ²)    (29)
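The ARMA(1, 1) error term of equations (28) and (29) is straightforward to simulate. The sketch below is a minimal illustration; the function name and seed are ours, and the parameter values correspond to one of the study's cases:

```python
import numpy as np

def arma11_noise(n, phi_a, theta_a, sigma, seed=0):
    """Simulate N_t = [(1 - theta_a*B)/(1 - phi_a*B)] a_t with a_t ~ N(0, sigma^2),
    i.e., the recursion N_t = phi_a*N_{t-1} + a_t - theta_a*a_{t-1}."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, n)
    N = np.zeros(n)
    for t in range(1, n):
        N[t] = phi_a * N[t - 1] + a[t] - theta_a * a[t - 1]
    return N

# One of the Part 2 cases: theta_a = 0.2, phi_a = 0.8, sigma = 0.006
noise = arma11_noise(5000, phi_a=0.8, theta_a=0.2, sigma=0.006)
```

With φ_a = 0.8 and θ_a = 0.2 the series is strongly positively autocorrelated, which is what makes pre-whitening necessary before checking residuals against white-noise limits.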
Table 4. Relative T-SSPE and relative O-SSPE for ĥ(1), ĥ(2) and ŷ(2) for Part 2 of the CSTR study.

Relative T-SSPE, estimator in equation (26)
θ_a     φ_a     Replication    ĥ(1)    ĥ(2)    ŷ(2)    ŷ(3)
0.2     0.8     1              2.23    2.53    1.44    1.97
                2              2.22    2.27    1.56    1.96
                Avg            2.22    2.40    1.50    1.96
0.8     0.2     1              1.22    1.14    0.54    2.40
                2              1.01    0.97    0.46    2.42
                Avg            1.11    1.05    0.50    2.41
−0.5    0.5     1              2.98    3.37    2.00    2.52
                2              2.41    2.78    2.49    2.58
                Avg            2.69    3.08    2.24    2.55
0.5     −0.5    1              1.04    4.01    1.47    4.78
                2              0.89    0.97    1.31    4.37
                Avg            0.97    2.49    1.39    4.58

Relative O-SSPE, estimator in equation (27)
θ_a     φ_a     Replication    ŷ(2)    ŷ(3)
0.2     0.8     1              1.34    4.55
                2              1.27    3.04
                Avg            1.31    3.80
0.8     0.2     1              2.08    3.54
                2              2.05    3.41
                Avg            2.06    3.47
−0.5    0.5     1              1.47    3.65
                2              1.45    3.61
                Avg            1.44    3.57
0.5     −0.5    1              2.63    6.00
                2              2.22    5.46
                Avg            2.42    5.73

Notes: (1) Each result in this table is relative to the T-SSPE for ĥ(1) with σ = 0 and φ_a = 0.
(2) All of the results in this table used an ARMA(2, 1) model for the noise under Model 2.
At Step 3 (Stage 2), by relying on visual examination, we selected a first-order dynamic model form as described by equation (31) (for space considerations, the plots are not shown):

    ĥ_t = δ̂_1 ĥ_{t−1} + (1 − δ̂_1) v_{t−1}    (31)
Figure 9. The tank level system used in the study for the real process.
discrete-time H-BEST model in applying Stages 2 and 3 of
the proposed three-stage model building procedure to
address serially correlated noise.
In Step 1, the experimental design used by Rietz (1998) consisted of two step tests of ±0.15q_0, where q_0 = 0.66 gpm. The steady-state model was obtained in Step 2 (Stage 1) and is given by equation (30):
    v_t = a + b(q − q_0)    (30)

where a = 0.3468 and b = 85.2.
where δ̂_1 = 6.99. The initial value of this parameter was obtained under Model 1. R² for this training step was 1.0. The ACF and PACF for the residuals are plotted in Figure 10.
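The recursion of equation (31) can be sketched as follows; δ₁ = 0.7 is an illustrative stable value, not the fitted estimate:

```python
def dtfm_first_order(v, delta1, h0=0.0):
    """First-order discrete-time dynamic model, equation (31):
    h_t = delta1 * h_{t-1} + (1 - delta1) * v_{t-1}."""
    h = [h0]
    for t in range(1, len(v)):
        h.append(delta1 * h[-1] + (1.0 - delta1) * v[t - 1])
    return h
```

For a unit step in v the response rises geometrically toward the steady-state value of 1, consistent with the unit static gain built into the (1 − δ₁) weighting.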
As shown in these plots, the noise structure appears to be an AR(1) or AR(2) model. We modelled these residuals using an AR(2) structure and obtained the initial estimates for φ_1 and φ_2 as described in Step 4 (Stage 3) of the model-building procedure. Then, δ_1, φ_1 and φ_2 were re-estimated simultaneously using equation (32) below (per Step 5):
    ŷ_t(2) = ĥ_t(2) + φ̂_1[y_{t−1} − ĥ_{t−1}(2)] + φ̂_2[y_{t−2} − ĥ_{t−2}(2)]    (32)

where φ̂_1 = 0.7115 and φ̂_2 = 0.2910. As a final step, we checked for compliance to white noise. As shown in Figure 11, this step appears quite successful.
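A minimal sketch of the OSA predictor in equation (32), assuming the deterministic predictions ĥ_t(2) are already available (the function name and the test values are ours, not the paper's):

```python
import numpy as np

def osa_predictor(y, h_hat, phi1, phi2):
    """One-step-ahead predictor of equation (32):
    y_hat_t = h_hat_t + phi1*(y_{t-1} - h_hat_{t-1}) + phi2*(y_{t-2} - h_hat_{t-2})."""
    y = np.asarray(y, float)
    h = np.asarray(h_hat, float)
    e = y - h                        # residuals of the deterministic model
    y_hat = h.copy()
    y_hat[1] += phi1 * e[0]          # only one past residual available at t = 1
    y_hat[2:] += phi1 * e[1:-1] + phi2 * e[:-2]
    return y_hat
```

When the residuals of the deterministic model really are AR(2) with these parameters, the prediction errors of ŷ(2) collapse to the white-noise shocks, which is exactly the pre-whitening behaviour checked in Figure 11.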
To test this model, we used the same testing sequence as in Rietz (1998); this sequence is given in Figure 12. As shown in Figure 13, both the fitted DTFM (ĥ(1)) and the OSA predictor (ŷ(2)) fit the process response quite well. Thus, this procedure appears to have promise as an effective
Figure 10. The ACF (left) and PACF (right) of the residuals under Model 1 for the real process study. The noise model appears to be AR(1) or AR(2)
because the ACF has an exponential decay and the PACF has a large correlation for lag 1 and no significant correlations for other time lags.
Figure 11. ACF of the residuals for the training data for the real process
after pre-whitening (i.e., after Step 5). The plot shows excellent removal
of the serially correlated noise (compare with the ACF plot in Figure 10).
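The compliance check illustrated in Figure 11 amounts to comparing the residual ACF with approximate ±2/√n limits. A minimal sketch, with our function names and the usual large-sample approximation for the limits:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function of a residual series."""
    x = np.asarray(x, float)
    x = x - x.mean()
    c0 = float(np.dot(x, x))
    return np.array([np.dot(x[:-k], x[k:]) / c0 for k in range(1, nlags + 1)])

def inside_white_noise_bounds(resid, nlags=20):
    """True when every sample autocorrelation lies inside the approximate
    +/- 2/sqrt(n) limits used to judge compliance to white noise."""
    bound = 2.0 / np.sqrt(len(resid))
    return bool(np.all(np.abs(acf(resid, nlags)) < bound))
```

A strongly autocorrelated residual series (e.g., unremoved AR(1) noise) fails this check immediately, while successfully pre-whitened residuals should fall inside the limits at essentially all lags.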
Figure 12. The test input sequence for the real process study.
Figure 13. The test sequence performance of ŷ under Model 2 (i.e., ŷ(2)) and ĥ under Model 1 (i.e., ĥ(1)) for the real process. The OSA predictor, ŷ(2), uses previous outputs and inputs but the DTFM, ĥ(1), uses only inputs. In both cases performance is excellent, with ĥ(1) only slightly worse than ŷ(2).
model-building method for real physical systems in the presence of serially correlated noise.
CLOSING COMMENTS
This work has proposed a model-building procedure for block-oriented modelling when the error term is serially correlated. The pre-whitening procedure addresses general ARMA noise, which is more common in real systems than 'white' (i.e., uncorrelated) noise. The procedure appears to be effective and is able to separately determine model structures and parameter estimates for the steady-state (ultimate response) model (SSM), the dynamic deterministic transfer function model (DTFM), and the error term model (ETM) in three stages, which gives it an advantage over methods that simultaneously determine these model structures. The advantage is that the proposed approach allows the modeller to first maximize the explained variation of the DTFM before fitting the ETM, which could otherwise excessively dominate predictive performance and significantly reduce the amount of variation explained by the DTFM. This is especially true when ARIMA modelling, which uses only past outputs, can produce a highly accurate fit. This work demonstrated the reality of this situation by showing several cases where ARIMA modelling performed well relative to OSA predictive modelling with the optimal DTFM structure. Therefore, for OSA transfer function modelling in the presence of serially correlated noise, it is our strong recommendation that modellers obtain their DTFM structure ahead of their ETM structure, regardless of the method they use.
When the deterministic DTFM is the only goal, a significant implication of this work is the conclusion that the optimal model can be obtained under the white-noise assumption regardless of the presence of serially correlated noise. Hence, in this situation, the best modelling approach is to simply determine the parameters that minimize SSE, without consideration of the characteristics of the noise.
The proposed three-stage model-building approach is readily extendable to sandwich block structures. An important advantage of our three-stage approach is the ability to treat non-invertible static functions, which provides promise in addressing multiple input, multiple output processes. However, the estimation of the static nonlinear functions will be nested one inside the other, creating a challenge for the optimization process to obtain accurate estimates of the model parameters.
NOMENCLATURE
a_t     white noise term
C_Af    stream A inlet concentration
C_Bf    stream B inlet concentration
C_A     concentration of A in the reactor
C_B     concentration of B in the reactor
C_C     concentration of C in the reactor
f       nonlinear static gain function
G       linear dynamic function
m       number of zeros
M       number of equally spaced times over the testing interval
n       number of poles
n_t     number of samples
N_t     serially correlated noise model
p       number of inputs
p       number of AR parameters
q       number of outputs
q       number of MA parameters
q_Af    feed A flowrate
q_Bf    feed B flowrate
q_c     coolant flowrate
T       temperature in the reactor
T_Af    stream A inlet temperature
T_Bf    stream B inlet temperature
T       tank temperature in the reactor
u       vector of input variables
v       vector of intermediate variables
y       measured value of the output

Greek symbols
β       vector of parameters for static gain function
ε       error term
δ, ω    vectors of dynamic parameters for discrete-time models
η       expectation function (true value of the output)
σ       standard deviation of the noise
φ       autoregressive parameter
θ       moving average parameter

Subscripts
a       noise
i       output
j       input
t       sampling instant

Superscripts
^       estimate
(1)     under Model 1
(2)     under Model 2
(3)     under Model 3
REFERENCES
Bai, E., 1998, An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear systems, Automatica, 34(3): 333–338.
Bhandari, N. and Rollins, D.K., 2003, A continuous-time MIMO Wiener modeling method, Industrial & Engineering Chemistry Research, 42: 5583–5595.
Box, G.E.P. and Jenkins, G.M., 1976, Time Series Analysis: Forecasting and Control, Revised edition (Holden-Day, Oakland, California).
Cao, J. and Gertler, J., 2004, Noise-induced bias in last principal component modeling of linear system, Journal of Process Control, 14: 365–376.
Chen, C.H. and Fassois, S.D., 1992, Maximum likelihood identification of stochastic Wiener-Hammerstein-type non-linear systems, Mechanical Systems and Signal Processing, 6(2): 135–153.
David, B. and Bastin, G., 2001, An estimator of the inverse covariance matrix and its application to ML parameter estimation in dynamical systems, Automatica, 37: 99–106.
Gomez, J.C. and Baeyens, E., 2004, Identification of block-oriented nonlinear systems using orthonormal bases, Journal of Process Control, 14: 685–697.
Greblicki, W., 1994, Nonparametric identification of Wiener systems by orthogonal series, IEEE Transactions on Automatic Control, 39(10): 2077–2086.
Hagenblad, A., 1999, Aspects of the identification of Wiener models, Licentiate thesis no. 793, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden.
Hagenblad, A. and Ljung, L., 2000, Maximum likelihood estimation of Wiener models, Proceedings of the 39th IEEE Conference on Decision and Control, 2417–2418.
Haist, N.D., Chang, F.H.I. and Luus, R., 1973, Nonlinear identification in the presence of correlated noise using a Hammerstein model, IEEE Transactions on Automatic Control, 18(5): 552–555.
Kalafatis, A.D., Wang, L. and Cluett, W.R., 1997, Identification of Wiener-type nonlinear systems in a noisy environment, International Journal of Control, 66(6): 923–941.
Kongsjahju, R. and Rollins, D.K., 2000, Accurate identification of biased measurements under serial correlation, IChemE Transactions Part A – Chemical Engineering Research and Design, 78: 1010–1017.
Nells, O., 2001, Nonlinear System Identification (Springer, Germany).
Pearson, R.K. and Ogunnaike, B.A., 1997, Nonlinear process identification, in Nonlinear Process Control, 20–23 (Prentice-Hall PTR, Upper Saddle River).
Reitz, C.A., 1998, The application of a semi-empirical modeling technique to real processes, Graduate thesis, Chemical Engineering Department, Iowa State University.
Rollins, D.K., Bhandari, N., Bassily, A.M., Colver, G.M. and Chin, S., 2003, A continuous-time nonlinear dynamic predictive modeling method for Hammerstein processes, Industrial & Engineering Chemistry Research, 42: 861–872.
Rollins, D.K. and Bhandari, N., 2004, Constrained MIMO dynamic discrete-time modeling exploiting optimal experimental design, Journal of Process Control, 14: 671–683.
Westwick, D. and Verhaegan, M., 1996, Identifying MIMO Wiener systems using subspace model identification methods, 52: 235–258.
Wigren, T., 1993, Recursive prediction error identification using the nonlinear Wiener model, Automatica, 29(4): 1011–1025.
Zhu, Y., 2002, Estimation of an N-L-N Hammerstein-Wiener model, Automatica, 38: 1607–1614.
APPENDIX A

Proof that ĥ_t(1) is the Best Estimator for h_t Under Least Squares

Let ĥ_t* be the ĥ_t that minimizes

    SSE = Σ_{t=1}^{n_t} (y_t − ĥ_t)²    (A1)
Under Model 1 [equation (11)],

    SSE(1) = Σ_{t=1}^{n_t} [y_t − ŷ_t(1)]² = Σ_{t=1}^{n_t} [y_t − ĥ_t(1)]²    (A2)

Therefore, from examination of equations (A1) and (A2), one sees that ĥ_t(1) must be the best estimator of h_t, that is, ĥ_t*. Likewise, under Model 2 [equation (14)],
    SSE(2) = Σ_{t=1}^{n_t} [y_t − ŷ_t(2)]²
           = Σ_{t=1}^{n_t} {[y_t − ĥ_t(2)] − π̂_1[y_{t−1} − ĥ_{t−1}(2)] − π̂_2[y_{t−2} − ĥ_{t−2}(2)] − · · ·}²    (A3)
Thus, ĥ_t(2) is obtained through the minimization of equation (A3) and not of

    SSE = Σ_{t=1}^{n_t} [y_t − ĥ_t(2)]²    (A4)

Therefore, it is possible for ĥ_t(2) not to be ĥ_t*, and thus it is not preferred over ĥ_t(1), which is always equal to ĥ_t*. It is easy to create a problem to support this proof. For example, let the true model be h_t = βe^(γt) for some β and γ, and fit the model ĥ_t = α̂_0 + α̂_1 t over some values of t with any distribution for the ETM. Determine SSE(1) using equation (A2) and SSE(2) using equation (A3). Then obtain SSE using equation (A4) with ĥ_t(2) from equation (A3). Upon comparing this SSE with SSE(1), you will find that it is greater than, or at best equal to, SSE(1).
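The numerical experiment described above can be sketched in a few lines. The values of β, γ, the AR(1) error parameter and the grid search over π̂_1 are illustrative choices of ours, not values from the paper; the comparison at the end is the inequality the proof guarantees:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
t = np.arange(n, dtype=float)
eta = 1.0 * np.exp(0.03 * t)            # true model h_t = beta*exp(gamma*t), illustrative beta, gamma
e = np.zeros(n)
a = rng.normal(0.0, 0.05, n)
for k in range(1, n):                    # AR(1) error term, illustrative parameter 0.7
    e[k] = 0.7 * e[k - 1] + a[k]
y = eta + e

X = np.column_stack([np.ones(n), t])     # misspecified line h_hat_t = alpha0 + alpha1*t

# Model 1: minimize equation (A1)/(A2) directly
b1, *_ = np.linalg.lstsq(X, y, rcond=None)
sse1 = float(np.sum((y - X @ b1) ** 2))

# Model 2: minimize equation (A3) with one AR term, coarse grid over pi_1
best = (np.inf, None)
for pi1 in np.linspace(-0.95, 0.95, 191):
    Z = X[1:] - pi1 * X[:-1]             # transformed regressors
    w = y[1:] - pi1 * y[:-1]             # transformed response
    b, *_ = np.linalg.lstsq(Z, w, rcond=None)
    crit = float(np.sum((w - Z @ b) ** 2))
    if crit < best[0]:
        best = (crit, b)
b2 = best[1]

# Equation (A4) evaluated at h_hat^(2): cannot beat SSE(1), which minimizes (A1)
sse_a4 = float(np.sum((y - X @ b2) ** 2))
```

Because b1 is the exact minimizer of the quadratic criterion (A1), evaluating (A4) at any other coefficient vector, including b2 from the Model 2 criterion, can only give an equal or larger value.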
The manuscript was received 9 August 2005 and accepted for publication after revision 9 January 2006.