Bioprocess Engineering 22 (2000) 85±93 Ó Springer-Verlag 2000 Issues regarding artificial neural network modeling for reactors and fermenters V.C.P. Chen, D.K. Rollins Abstract In recent years researchers in many areas have used arti®cial neural networks (ANNs) to model a variety of physical relationships. While in many cases this selection appears sound and reasonable, one must remember than ANN modeling is an empirical modeling technique (based on data) and is subject to the limitations of such techniques. Poor prediction occurs when the training data set does not contain adequate ``information'' to model a dynamic process. Using data from a simulated continuousstirred tank reactor, this paper illustrates four scenarios: (1) steady state, (2) large process time constant, (3) infrequent sampling, and (4) variable sampling rate. The ®rst scenario is typical of simulation studies while the other three incorporate attributes found in real plant data. For the cases in which ANNs predicted well, linear regression (LR), one of the oldest empirical modeling techniques, predicted equally well, and when LR failed to accurately model/predict the data, ANNs predicted poorly. Since real plant data would resemble a combination of situations (2), (3), and (4), it is important to understand that empirical models are not necessarily appropriate for predictively modeling dynamic processes in practice. 1 Introduction Arti®cial neural network (ANN) models have recently been used to model a variety of complex nonlinear physical relationships. Empirical techniques that have been employed to predictively model a dynamic process include radial basis function models for a continuous-stirred tank reactor Received: 11 February 1999 V.C.P. Chen (&) School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332 D.K. Rollins Departments of Chemical Engineering and Statistics, Iowa State University, Ames, Iowa 50011 The authors wish to acknowledge partial support for this research by the National Science Foundation under grant number CTS-945 3534. We are also grateful to Kristine Bendixen, Charlotte Schulze-Hewett, Aletia Van Brocklin, and Parisa Taravati for collecting the data from the continuous-stirred tank reactor and ®tting the neural network models. Finally, we would like to thank Dasaratha Sridhar for writing the neural network code. (CSTR) [1], autoregressive moving average with external input (AR-MAX) models for a distillation process [2], an ANN for a continuous-stirred tank fermenter [3], and the recently introduced semi-empirical technique (SET) demonstrated for a CSTR [4]. ANNs fall into the class of purely empirical methods since their structures are seldom, if ever, phenomenologically inspired, and their coef®cients are determined strictly by the data and seldom have physical meaning. As an empirical technique, ANN models are ¯exible, relatively easy to ®t, and typically perform very well in applications suited for empirical modeling. However, due to the ease of obtaining accurate ®ts to data sets, they can be misapplied and misused just as easily. This paper demonstrates important limitations of empirical models in accurately predicting model outputs (i.e., responses) to changes in inputs (disturbances) for real data taken from dynamic chemical, or biochemical process. In addition to ANN models, linear regression (LR), the most common empirical modeling technique, is tested. Both ANNs and LR predict well under steady state conditions, with data sampled at every input change, and under dynamic conditions, with constant time intervals between input changes and data sampled at very input change. However, under more complex sampling conditions found in real dynamic plant data such as infrequent sampling and variable sampling rate, empirical methods will likely predict poorly. When process data are sampled infrequently, input changes (there could be several) that occur between sampling instances have to be ignored by the empirical model because of its inability to predict without a complete set of input and output variables. With variable sampling under dynamic conditions, having the predicted response match the next measured response in time is a matter of fortuity and not control. Thus, we conjecture that an empirical modeling approach will not be successful under complex conditions of sampling. To evaluate our suppositions, we consider situations when: (1) the process nearly reaches steady state before the next sampling; (2) the input change period is much smaller than the time to reach steady state, but the sampling rate is constant; (3) the data are sampled infrequently; and (4) the sampling rate varies signi®cantly. The distinguishing feature between ANNs and LR is that in situations when predictive ability is poor, ANNs are able to interpolate the training data, while LR can perform poorly in ®tting the training data. Thus, (as commonly known) when working with ANNs, one must employ model validation on a test data set, so as not to misinterpret a good ®t to the training data as good predictive ability. 85 Bioprocess Engineering 22 (2000) 86 Empirical dynamic modeling of a response variable requires discrete-time past input and output measured data. The ideal situation is frequent data sampled at a constant and equal sampling rate for inputs and outputs. However, the typical real process is far from this ideal situation. More likely, several critical variables will be sampled infrequently and at different rates. The presentation in this article was motivated by the work of Normandin et al. [3] who modeled biomass and substrate responses to changes in the inlet/outlet ¯ow rate with an ANN. We commend them for demonstrating this powerful application of empirical modeling in the context of biochemical engineering. The main purpose of their ANN model was its inclusion in an iterative optimization routine intended to determine, for each sampling point, the inlet ¯ow rate which maximizes a steady-state objective performance criterion. Recognizing that one purpose of their work was to obtain an adequate ANN model under the conditions of their study (at which they were successful), our purpose is to supplement their work with cautions when empirically modeling data with complex sampling attributes. In addition, we also want to highlight the ¯exibility that modelers have in choosing a particular empirical method. We demonstrate this by bringing a represented empirical method, LR, into our study and demonstrating that LR and ANN are successful and unsuccessful together. Thus, in situations well suited for empirical modeling, the selection of a particular approach will likely be just a matter of preference. Our objectives, stated above, will be accomplished through a study involving the four situations stated above. For these four situations, model prediction performance is tested on a simulated continuous stirred tank reactor (CSTR) with ¯ow rate qc as the input variable and concentration of species A CA as the output variable. The CSTR and the two empirical modeling techniques (ANN and LR) in the context of the CSTR are described in Section 3. Section 4.1 discusses the situations in which the ANN and LR models performed well, and Section 4.2 presents the situations in which they did not. Finally, closing remarks are given in the last section. 2 LR performance on the data in Normandin et al As our only comparison to the ANN used by Normandin et al. we ®t LR to the 50 substrate samples forming their test data set. Neither their training data set nor their ®nal predictive ANN model was available. Using the notation of their paper, the LR model we employed was: Fig. 1. Linear regression ®t to the substrate test data in Normadin et al. paper 3 The CSTR process model Since Normandin et al. did not provide all the necessary information needed to replicate their process, we generated data from a different, but similar process. The CSTR in this study is similar to the one used by Pottmann and Seborg [1] except for the inclusion of an energy balance on the jacket contents to account for the jacket content temperature (which here is the same as the outlet coolant temperature) varying with process changes. The dynamics equations describing this process are given below and the coef®cients and other quantities can be found in Table 1: dCA q CAf ÿ CA k0 CA eÿE=RT ; dt V dT q ÿDHk0 CA Tf ÿ T dt V qCp eÿE=RT q Cpc qc c 1 ÿ eÿhA=qc pc Cpc Tc ÿ T ; qCp V 2 dTc qc qc Tcf ÿ Tc 1 ÿ eÿhA=qc pc Cpc T ÿ Tc : dt Vc Vc 3 Model development follows Pottmann and Seborg very closely. Speci®cally, model development used current and past input and output values to predict the output value of the next sampled output. That is: Table 1. Nominal conditions for the CSTR Variable Tank volume Feed ¯ow rate Feed temperature Feed concentration Coolant volume Ss1 b0 b1 Fs b2 Fs ÿ Fsÿ1 b3 Ss ; Coolant ¯ow rate Coolant temperature where S denotes dimensionless substrate concentration, F Densities denotes dimensionless ¯ow rate, s denotes dimensionless Speci®c heats time, b denotes the LR coef®cients, and denotes random Pre-exponential factor error in the data. As shown in the Fig. 1, LR ®ts well with Exponential factor Heat of reaction similar performance as the ANN ®t obtained by NorHeat transfer characteristics mandin et al. Thus, our belief that, with proper ®tting, Concentration of A when one empirical approach predicts well others will Stamping period predict well is supported by the results in Fig. 1. 1 Symbol Nominal value V q Tf CAf Vc qc Tc q; qc C; Cpc k0 E=R ÿDH E=R CAi Dt 100 l 100 l min)1 350 K 1 mol l)1 25.76 l 100 l min)1 350 K 1000 g)1 K)1 1 cal g l )1 7.2 ´ 1010 min)1 9.98 ´ 103 K 2.0 ´ 105 cal mol)1 7.0 ´ 105 cal)1 K)1 0.0137349 mol l)1 0.1 min V.C.P Chen, D.K. Rollins : Issues regarding arti®cial neural network modeling CAi1 f qciÿ2 ; qciÿ1 ; qci ; CAiÿ2 ; CAiÿ1 ; CAi i1 ; 4 where i is the current time instant and i+1 is the error term for the prediction of CA at the (i + 1)-th sampling instant. Notice that Eq. (4) is not a continuous function of time. Thus, three drawbacks of empirical approaches are immediately seen from Eq. (4): (1) one cannot obtain prediction between sampling times; (2) a variable sampling time could have a negative impact on accuracy; and (3) inputs (i.e., the q's) and outputs (i.e., the CA's) must be sampled at the same time to provide the most accurate prediction of CA. We employed three-layer (input, hidden, output) ANN models using a hyperbolic tangent activation function. Our ANN model equation was: CAi1 Tanh c X j xj1 Zji1 i1 ; Zji1 Tanh dj m1j qci m2j qciÿ1 m3j qciÿ2 m4j CAi m5j CAiÿ1 m6j CAiÿ2 : 5 The statistical model discrimination technique called ®nal prediction error (FPE) [5] was used to help obtain the ``best'' ANN model. In the cases that ®t poorly to the test data based on the FPE result, we also used cross validation to determine the best model. In these cases the FPE-based ANN usually recommended a very large number of nodes in the hidden layer and the cross validation procedure determined a smaller number of nodes in the hidden layer. Note that the FPE-based ANN provided an upper bound on the number of nodes in the hidden layers that we investigated. For example, when FPE gave 22 nodes as the optimal number, we evaluated, by cross validation, models with 1 to 22 nodes in the hidden layer. The one that we accepted from cross validation was the one with the smallest prediction error. Thus, we are con®dent in our conclusions of ANN's inability to accurately ®t in these cases. Our LR model equation was: CAi1 b0 b1 qciÿ2 b2 qciÿ1 b3 qci b4 CAiÿ2 b5 CAiÿ1 b6 CAi i1 : Table 2. Correlation r between estimated and true concentrations for both the training and test data sets. Coef®cient of multiple determination R2 for the empirical ®t to the training data sets 6 4 Prediction performance of empirical models Four cases were considered in this study: (1) steady state; (2) large time constant; (3) infrequent sampling; and (4) variable sampling rate. For each case, the training data set consisted of 200 samples and the test data set consisted of 50 samples. The input sequences for the training and test data sets were generated randomly from a uniform (90, 110) probability distribution. The same input sequence of 1000 values was used for all four training data sets with case (3) requiring all 1000, and cases (1), (2), and (4) requiring only the ®rst 200 in the sequence. The same input sequence of 50 values was used for all four test data sets with all 50 required for all cases (since the goal was to test how well the models predicted the true process). Table 2 presents the correlation r between the true CA and estimated concentrations for A using LR and ANN models. Also included are the R2 values (the percent of the total variation in the data explained by the model) resulting from the ®t of the empirical models to the training data sets. Accurate prediction was achieved by both LR and ANN models for the steady-state and large time constant cases while poor prediction resulted for both models in the infrequent sampling and variable sampling rate cases. Each case is examined more closely in the following sections. 4.1 Cases with accurate prediction 4.1.1 Steady-state processes In the steady-state setting, each input change essentially reaches its full effect before another change occurs. Thus, the response is the same for identical inputs. This relationship between output and input without time dependence is easily modeled, given the complete set of input changes, by ®tting a purely empirical technique to a representative data set of responses for different input changes. For our steady-state CSTR data, the process time constant was one minute (s = 1), input changes occurred every 5 minutes, and data was sampled at every input Case Data set LR ANN (FPE) ANN (Test) Steady-state Training r R2 Test r 0.9991 99.820% 0.9991 0.9999 99.998% 0.9999 Large time constant Training r R2 Test r 0.9983 99.801% 0.9975 0.9984 99.991% 0.9977 Infrequent sampling Training r R2 Test r )0.2001 4.002% )0.6047 0.9991 99.857% )0.0664 0.3239 10.613% )0.5008 Variable sampling rate Training r R2 Test r 0.4371 19.104% 0.1525 0.9900 98.031% )0.0093 0.5453 30.002% 0.1778 87 Bioprocess Engineering 22 (2000) 88 Fig. 2. Steady-state case, training data (®rst 500 minutes): a input sequence, b concentration responses change. In Fig. 2 the ®rst 500 minutes (out of 1000) of the training data input sequence is shown with the true process and the ANN and LR ®ts on the 200-sample training data set. The FPE-based ANN model selected four nodes in the hidden layer as the ``best'' ANN structure for this data. Figure 3 shows their performance on the steady-state test data. Both LR and the FPE-based ANN model ®t the Fig. 3. Steady-state case, test data: a input sequence, b concentration responses training data and predict the test data very well (the shortdashed lines for the ANN are not visible because they lie directly under the long-dashed lines for LR). Note that the dynamic transitions between samples cannot be captured by either empirical model. In Table 2, both empirical models achieve correlations over 0.99 on both the training and test data. V.C.P Chen, D.K. Rollins : Issues regarding arti®cial neural network modeling 4.1.2 Large time constant To simulate a basic dynamic condition, the process time constant was increased to ®ve minutes, with input changes occurring every minute and samples taken at each input change. In Table 2, as in the steady-state case, both LR and ANN achieved correlations over 0.99 on both the training and test data. In Fig. 4, the ®rst 100 minutes (out of 200) of the training data input sequence are shown with the corresponding true process and the ANN and LR ®ts on the 200-sample training data set. The FPE-based ANN model selected three nodes in the hidden layer. Figure 5 shows the results on the test data. Again, both LR and the FPE-based ANN performed very well, but this time their predictions on the test data were dissimilar enough to be able to distinguish the short-dashed from the long-dashed lines. We can gain additional information from the coef®cients for LR. For the steady-state case, only last input change qci was required to accurately predict the next concentration CAi1 . The large time constant case additionally requires the last concentration CAi . The ANN coef®cients are unable to provide this insight. Note that for the ``best'' ANN model structure, the steady-state case used more nodes in the hidden layer than the large time constant case. This seems counterintuitive to us since the large time constant case is clearly more complex than the steady-state case. In contrast to our results here, a paper by Chen et al. [6] found that LR and ANN models exhibited time lag in the case with a large time constant. We suspect these contrary results are due to the nonconstant time intervals between input changes/samples in the data used by Chen et al. Thus, the accurate predictions produced in Fig. 5 are Fig. 4. Large time constant, training data (®rst 100 minutes): a input sequence, b concentration responses heavily dependent on the constant one-minute time interval between samples/input changes. 4.2 Cases with poor prediction For the remaining two cases, neither ANNs nor LR were able to predict accurately; although ANNs were able to interpolate the training data well. The results include two ANN ®tted models, one whose structure was determined by the model discrimination technique FPE and one whose structure was selected to minimize the test data prediction error sums of squares (SSPE): 50 X ^ Ai 2 ; CAi ÿ C i1 where the CAi is the measured concentration of A for the ^ Ai is the prediction i-th sample in the test data set and C using the trained model. This ``test-based'' ANN structure represents the best prediction that an ANN model, based on the given training data, could achieve. In Table 2, the FPE-based ANN models have high correlations on the training data for the infrequent sampling and variable sampling rate; however, LR and both ANN models all result in low correlations on the test data. 4.2.1 Infrequent sampling For the infrequent sampling case, multiple input changes were permitted to occur between samples. Similarly to the steady-state case, the process time constant was one minute with input changes occurring every minute, but instead of sampling at every input change, samples were taken every ®ve minutes (i.e., at every ®fth input 89 Bioprocess Engineering 22 (2000) 90 Fig. 5. Large time constant, test data: a input sequence, b concentration responses change). This situation represents the typical real plant attribute of input changes occurring between samples. In Fig. 6, the ®rst 150 minutes (out of 300) of the training data input sequence are shown with the corresponding true process and the ANN and LR ®ts on the 200-sample training data set. The sampled data are indicated by dots on the input sequence and the true process. The FPE- Fig. 6. Infrequent sampling, training data (®rst 150 minutes): a input sequence, b concentration responses based ANN selected 23 nodes in the hidden layer and is able to interpolate the samples comprising the training data, while LR and the test-based ANN with one node in the hidden layer mostly follow the average. An experienced ANN modeler would recognize that the selection of 23 nodes by the FPE model discrimination technique is unjusti®ed. However, to a naive ANN modeler, the in- V.C.P Chen, D.K. Rollins : Issues regarding arti®cial neural network modeling 91 Fig. 7. Infrequent sampling, test data: a input sequence, b concentration responses terpolating ®t suggests that ANNs are ¯exible enough to represent this process. In Fig. 7, none of the empirical models predict well. The FPE-based ANN which interpolated the training data varies wildly up and down beyond the true process. The other two models stay closer to the process, but none shows any semblance of the true process in the test data. While a poor ®t is demonstrated by LR in examining the training data, these plots illustrate the importance of model validation of ANNs on a test data set. 4.2.2 Variable sampling frequency To simulate conditions in which sampling frequency varied from one sample to the next, we generated 199 uniformly distributed times ranging from 0.1 minutes to 1.5 minutes. This situation of randomly varying sample times represents the real plant attribute of non-constant sampling rate. For simplicity, we chose to have input changes occur at the same time qc (the input) and CA (the output) were sampled. Figure 8 the ®rst 85 minutes (out of 170) of the training data input sequence is shown with the corresponding true process and the ANN and LR ®ts on the 200-sample training data set. Similarly to the infrequent sampling case, the FPE-based ANN selected a large number (22) of nodes in the hidden layer while the test-based ANN only selected two nodes. Again, we ®nd that the FPE-based ANN is able to interpolate the training data, but none of the empirical models predict well on the test data in Fig. 9. Equation (4) provides insight on the poor ®t of this case. Since CA prediction is dependent on past data sampled discretely in time, if the time of prediction does not match the time of the next sample for CA, this alone could cause a large prediction error. With variable sampling rates, these two times are not likely to correspond. Thus, when signi®cant sampling rate variability exist, empirical modeling does not appear to be a viable choice for predictive modeling of dynamic processes. 5 Concluding remarks Using four carefully constructed cases, we tested the predictive capability of LR and ANN. Although ANNs have the reputation of accurately modeling complex nonlinear physical relationships, these models are still data-driven and are subject to the limitations of empirical models. In the steady-state and large time constant cases, both empirical models performed extremely well. However, in the infrequent sampling and variable sampling rate cases, neither could accurately predict the test data, despite the FPE-based ANN model's ability to interpolate the training data. Infrequent data sampling causes the problem of missing information and makes it dif®cult for empirical predictive models to perform well. Variable sampling rate data cause model predictions to not correspond to sampling times, as necessary, for discrete dynamic data. These cases illustrated the distinguishing feature between ANNs and LR: a good ®t to the training data implies good predictive ability for the LR model, but does not necessarily imply good prediction by the ANN model. Thus, even though an ANN model was capable of interpolating the training data sets, the limitations of empirical modeling prohibited ANNs from accurately predicting the corresponding test data set. This interpolative ability (i.e., excellent ®ts to training data) of ANNs may be misleading in many applications, and ANN modelers should be aware of the importance of Bioprocess Engineering 22 (2000) 92 Fig. 8. Variable sampling, training data (®rst 85 minutes): a input sequence, b concentration responses Fig. 9. Variable sampling, test data: a input sequence, b concentration responses model validation using a test data set. In addition, we encourage modelers to approach modeling problems broadly. That is, to classify them by approach (e.g., empirical) and not by modeling method (e.g., regression) since, as shown by this work, if one empirical method is successful some other empirical method is likely to be successful. In contrast, if the application is not suited for empirical modeling, no empirical modeling method will be successful. Thus, selection of a particular empirical method should be based more on training and familiarity, rather than a conception of superiority. References 1. Pottmann, M.; Seborg D.E.: Identi®cation of non-linear process using reciprocal multiquadric functions. J. Proc. Cont. 2 (1992) V.C.P Chen, D.K. Rollins : Issues regarding arti®cial neural network modeling 2. Yin, K.; Asbjornsen, O.A.: Application of MIMO adaptive control to a binary distillation process. Chem. Eng. Comm. 124 (1993) 115 3. Normandin, A.; Thibault J.; Grandjean B.P.A.: Optimizing control of a continuous stirred tank fermenter using a neural network. Bioprocess Eng. 10 (1994) 109 4. Rollins, D.K.; Liang J.M.; Smith, P.: Accurate simplistic predictive modeling of non-linear dynamic processes. ISA Transactions 36 (1997) 293±303 5. Akaike, H.: Statistical predictor identi®cation. Ann. Inst. Statist. Math. 22 (1970) 203 6. Chen, V.C.P.; Heldt, J.; McGlynn, K.; Rollins, D.K.: Critical issues in data collection and conditions when ®tting predictive neural network models for dynamic processes. ISA/96 Proc.: Advances in Instrumentation and Control 51 (1996) 279 93