6th International Conference on Hydroinformatics - Liong, Phoon & Babovic (eds) © 2004 World Scientific Publishing Company, ISBN 981-238-787-0

COMPARING ARTIFICIAL NEURAL NETWORKS AND SUPPORT VECTOR MACHINES FOR MODELLING RAINFALL-RUNOFF

BARBARA CANNAS, AUGUSTO MONTISCI, ALESSANDRA FANNI
Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy

LINDA SEE
School of Geography, University of Leeds, Leeds, United Kingdom

GIOVANNI M. SECHI
Department of Land Engineering, University of Cagliari, Cagliari, Italy

Monthly river flow forecasting is an essential part of water resource management. In this paper, different approaches to modelling river flow are compared for the Santa Chiara section of the Tirso Basin in Sardinia. The results indicate that no significant improvement can be obtained with different neural models for monthly data forecasting, although some pre-processing techniques can improve forecasting performance, as confirmed by the literature.

INTRODUCTION

Artificial Neural Networks (ANNs) are being used increasingly in hydrology and have been applied to a range of different areas including rainfall-runoff, water quality, sedimentation and rainfall forecasting (Abrahart et al. [1], Baratti et al. [3]). ANNs for forecasting river flow are almost always trained using a multi-layer perceptron (MLP) with the backpropagation algorithm. This may be due in part to the fact that MLPs were the first successful models to be implemented (Rumelhart et al. [10]), and because the algorithm is simple to program and apply. However, there are now many different types of model available, some of which may be more suited to river flow forecasting and prediction, and hence warrant further investigation.

The aim of this paper is to compare several approaches to modelling the rainfall-runoff problem for water management purposes when only basic input variables are available and where physical models would be too complex for practical application. Different approaches are used including: support vector machines, time-delay neural networks, self-organizing maps and feedforward networks trained with backpropagation. Models are developed to predict discharge one time step ahead using monthly discharge at the Santa Chiara section of the River Tirso. Historical runoff, rainfall and temperature data are available for the period 1924 to 1992. The aforementioned approaches are then compared to results produced using a time series prediction model and persistence. A variety of different performance measures are used in evaluating the best approach to monthly runoff forecasting.

SUPPORT VECTOR MACHINES

Support Vector Machines (SVMs) are learning machines that can be used to solve both classification and regression problems (Vapnik [12], [13]). Such problems can be expressed in general as a set of training data {(x1, y1), ..., (xN, yN)}, where the xi are the values that describe the system under study and the yi are the corresponding target values. The basic idea is to project the points xi into a higher dimensional space, called the feature space, where the points can be linearly separated, in the case of classification problems, or a linear regression can be performed, in the case of regression problems. In both cases, only a small part of the training set directly contributes to determining the structure of the machine.

For regression problems in particular, the goal is to find a function whose deviations from the target values are less than a fixed tolerance for all the training points, while at the same time being as flat as possible. Only the points whose deviations exceed this tolerance contribute to modifying the regression function, so the optimal solution ultimately depends on only a small part of the training data set. The feature space is defined by referring only to this subset of points. The final solution depends on a set of parameters in the cost function of the optimisation problem. These parameters affect the performance of the machine and must be determined by means of a trial-and-error procedure.

An SVM for regression can be used to forecast a time series. In this situation, the performance of the machine is expressed in terms of its generalization capability. Two sets (training and validation) are used for parameter tuning, and both sets are then included in the training data during the final learning procedure. The input data to an SVM can be pre-processed using the same procedures as for neural networks and other types of models. For time series forecasting in particular, the one-dimensional signal must be transformed into a multi-dimensional signal, for example by means of a tapped delay line.
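As a minimal sketch of this forecasting scheme, the fragment below builds the tapped delay line and fits a ν-SVM for regression, assuming Python with the scikit-learn library rather than the LIBSVM tool used later in this paper; the file name, memory depth and parameter values are purely illustrative.

import numpy as np
from sklearn.svm import NuSVR

def tapped_delay(series, depth):
    """Turn a 1-D series into (lag-vector, next-value) pairs."""
    X = np.array([series[i:i + depth] for i in range(len(series) - depth)])
    y = series[depth:]
    return X, y

# runoff: 1-D array of monthly runoff, already normalized to [-1, 1]
runoff = np.loadtxt("runoff_monthly.txt")      # hypothetical file name
X, y = tapped_delay(runoff, depth=6)

# indicative parameter values; in practice they are tuned by trial and error
svr = NuSVR(nu=0.7, C=0.8, gamma=0.48)
svr.fit(X[:480], y[:480])                      # roughly the first 40 years for training
one_step_ahead = svr.predict(X[480:])          # forecasts for the remaining months

The support vectors retained by the fitted model (svr.support_) are the only training points that shape the regression function, which mirrors the sparsity described above.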
ARTIFICIAL NEURAL NETWORKS

Time Delay Neural Networks

Locally Recurrent Neural Networks (LRNNs) have more recently been proposed for time series forecasting. LRNNs have an MLP-based structure with synapses characterized by taps and feedback connections (Back and Tsoi [2]); that is, a short-term memory mechanism is brought into the feedforward network topology and recurrent connections are introduced. For most processes, the advantage of such a mixed model over a pure MLP with lagged inputs is that the same task can be achieved more parsimoniously. Nevertheless, feedbacks are necessary only if the system is characterized by long and complex temporal dynamics. Moreover, LRNNs are general but difficult to train. For this reason, Time Delay Neural Networks (TDNNs) are often used instead because of their greater simplicity.

TDNNs were originally introduced for speech recognition (Waibel et al. [14]). The architecture of a TDNN is well suited to modelling temporal sequences because it is invariant under translation in time or space. TDNNs use built-in time-delay steps to represent temporal relationships. Each neuron output is the weighted summation of the outputs of the neurons in the previous layer over a temporal input window, passed through a non-linear function (e.g. a sigmoid function). Each time-delay step is treated equally, which means that the recent past is not favoured over the distant past.

Self-Organizing Maps

A self-organizing map (SOM) is a type of artificial neural network developed by Kohonen [5], which is used more often for classification than for function approximation. It differs from the MLP in both the configuration of the neurons and the training algorithm. The neurons in a SOM are usually arranged in a two-dimensional grid, where each neuron has an associated vector of weights that corresponds to the input variables. The weights are first initialised randomly. Training then consists of selecting a data case, determining the neuron that is closest in Euclidean distance (or another measure of similarity) and updating the winning neuron and those within a certain neighbourhood around the winner. This process is repeated over many iterations until a stopping condition is reached. Training generally proceeds in two broad stages: a shorter initial training phase, in which the map comes to reflect the coarser and more general patterns in the data, followed by a much longer fine-tuning stage, in which the local details of the organisation are refined. When training is complete, the weight vectors associated with each neuron define the partitioning of the multidimensional data.
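The training loop just described can be sketched in a few lines of NumPy; the map size, learning rate and neighbourhood schedule below are illustrative assumptions and not the settings adopted in this study.

import numpy as np

def train_som(data, rows=4, cols=4, iters=5000, lr0=0.5, radius0=2.0, seed=0):
    """Minimal SOM: pick a case, find the winner, update winner and neighbours."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-1.0, 1.0, size=(rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(iters):
        frac = t / iters
        lr, radius = lr0 * (1 - frac), max(radius0 * (1 - frac), 0.5)
        x = data[rng.integers(len(data))]                       # select a data case
        dist = np.linalg.norm(weights - x, axis=2)              # Euclidean distance to every neuron
        winner = np.unravel_index(np.argmin(dist), dist.shape)  # best-matching unit
        grid_dist = np.linalg.norm(grid - np.array(winner), axis=2)
        h = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))       # neighbourhood function
        weights += lr * h[..., None] * (x - weights)            # move winner and neighbours towards x
    return weights

Once trained, each input pattern can be assigned to its best-matching neuron, and the resulting clusters can be used to split the data before separate forecasting models are fitted, as is done in the modelling section below.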
OTHER MODELS

Traditional time series models can also be used in modelling river flow. ARMA (AutoRegressive Moving Average) models were developed by Box and Jenkins [4] and can be written as follows:

x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \ldots + \theta_1 a_{t-1} + \theta_2 a_{t-2} + \ldots + a_t    (1)

where x_t is the predicted value, \phi_0 is the constant offset, the \phi_i are weights associated with each previous observation x_{t-i}, the \theta_i are weights associated with each previous shock, the a_{t-i} are previous shocks or noise terms, and a_t is the current shock.
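For illustration, a model of the form of Eq. (1) can be fitted and used for one-step-ahead forecasting with the statsmodels library, as sketched below; the ARMA(2,1) order and the file name are placeholders rather than the lag structure actually selected for the Tirso series (see the Modelling section).

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# runoff: 1-D array of monthly runoff values (training portion only)
runoff = np.loadtxt("runoff_monthly.txt")        # hypothetical file name
train = runoff[:480]                             # roughly the first 40 years

# ARIMA(p, d, q) with d = 0 is an ARMA(p, q) model of the form in Eq. (1)
model = ARIMA(train, order=(2, 0, 1))
fitted = model.fit()

next_month = fitted.forecast(steps=1)            # one-step-ahead prediction
print(fitted.params)                             # estimated phi and theta weights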
Persistence is the substitution of the last known value as the current prediction and represents a good benchmark against which other predictions can be measured.

DATA AND STUDY AREA

Data used in this paper are from the Tirso basin, located in Sardinia, Italy, at the Santa Chiara section. The Tirso basin is of particular interest because of its geographic configuration and water resource management: a dam was built at the S. Chiara section in 1924, providing water resources for central Sardinia. The basin area is 2,082.01 km2 and is characterized by the availability of detailed data from several rainfall gauges. Recently, a new "Cantoniera Tirso" dam was built a few kilometers downstream, creating a reservoir with a storage volume of 780 Mm3, one of the largest in Europe.

The data available for the hydrological system are limited to monthly recorded time series of three basic input variables: the rainfall and temperature at the gauging station and the runoff at the considered station. In this study the analysis is developed on the runoff rates, which are treated as the realization of a single stochastic process that should contain all the information necessary to characterize the basin. In previous work (Baratti et al. [3]), the authors verified that temperature and rainfall data do not significantly improve model performance; hence these data are not considered in this study.

MODELLING

The following models were developed using the first 40 years of data to train the model (1924-1963), the next 9 years for cross-validation (1964-1972) and the remaining 20 years as an independent test data set (1973-1992), where the data were normalized between -1 and 1 prior to training:

SVM: The SVM simulations were performed with LIBSVM, a free tool available online [7]. A memory depth equal to 6 was used, so the input space of the SVM has a dimension equal to 6. The ν-SVM for regression gave the best results. Most of the parameters were left at their default values, while for the others an iterative sensitivity analysis was performed. The adopted parameters have the following values: ν = 0.7; cost = 0.8; γ = 0.48. Refer to LIBSVM [7] for the meaning of these parameters.

TDNN: The TDNN model had 3 hidden neurons and 16 input values.

MLP: The training algorithm used was Levenberg-Marquardt. The network had 8 input nodes and 9 hidden neurons. The network input included the actual runoff, the 5 previous runoff values and the sine and cosine of the annual clock, while the network output was the runoff at t+1.

SOM/Manual data splitting: the data were clustered with SOMs of differing sizes, resulting in 4 different patterns or behaviours. The data were then divided into the 4 subsets and each subset was trained using an MLP with t to t-3 as inputs and t+1 as output. A manual data splitting procedure was also used, in which the data at time t were split into low, medium and high flow categories. Once split, the data were trained using MLPs with the same input window.

ARMA model: From plots of the autocorrelation function, the best model was determined to have two autoregressive terms (lags of 1 and 2) and 1 moving average term (lag of 4). A single differencing operation was also undertaken but this did not improve the model results.

Persistence: as outlined in the section on Other Models.

The following measures of evaluation were used to compare the performance of the different models, where N is the number of observations, the O_i are the observed data and the P_i are the predicted values:

Coefficient of Efficiency (Nash and Sutcliffe [8]):

R = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}    (2)

The seasonal Coefficient of Efficiency, following the definition in Lorrai and Sechi [6]:

R_d = \frac{\sum_{d=1}^{D} (E_d - \hat{E}_d)}{\sum_{d=1}^{D} E_d}    (3)

where

\hat{E}_d = \sum_{i \in d} (P_i - O_i)^2,  d = 1 to D months    (4)

Root mean squared error:

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (O_i - P_i)^2}    (5)

Mean absolute error:

MAE = \frac{1}{N} \sum_{i=1}^{N} |O_i - P_i|    (6)

Mean higher order error function:

MS4E = \frac{1}{N} \sum_{i=1}^{N} (O_i - P_i)^4    (7)
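The global measures (2) and (5)-(7) can be computed directly in NumPy, as in the sketch below; the array names are illustrative, and the seasonal coefficient (3)-(4) is omitted because it additionally requires grouping the test months by calendar month.

import numpy as np

def evaluate(obs, pred):
    """Global goodness-of-fit measures used in Table 1 (seasonal R_d excluded)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = obs - pred
    r = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)   # Eq. (2)
    rmse = np.sqrt(np.mean(err ** 2))                               # Eq. (5)
    mae = np.mean(np.abs(err))                                      # Eq. (6)
    ms4e = np.mean(err ** 4)                                        # Eq. (7)
    return {"R": r, "RMSE": rmse, "MAE": mae, "MS4E": ms4e}

# Example: persistence benchmark, where the previous observation is the prediction
# scores = evaluate(obs=runoff[1:], pred=runoff[:-1])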
DISCUSSION OF RESULTS

The measures of evaluation were calculated for each model and are listed in Table 1 for the test data set (1973 to 1992).

Table 1. Measures of evaluation for each model for the testing data set (1973 to 1992)

Model                R       Rd      RMSE     MAE     MS4E x 10^6
SVM                  0.41    0.30    12.23     8.03    0.16
TDNN                 0.43    0.35    11.58     7.36    0.11
MLP                  0.42    0.38    29.0     19.0     0.50
SOM                 -0.04   -0.26    16.36    13.78    0.34
Data Partitioning    0.57    0.48    10.09     8.70    0.02
ARMA                 0.29    0.19    13.14    10.09    0.19
Persistence          0.06   -0.06    15.44     9.47    0.38

The pure MLP solution, although providing efficiencies comparable to other solutions (i.e. TDNN and SVM), produced the worst results overall in terms of RMSE, MAE and MS4E. The SOM, although showing improvements in both RMSE and MAE over a pure MLP, was the second worst performing model and had very poor efficiencies. This result was surprising because a SOM was used successfully in a previous study to pre-process the data prior to training (See and Openshaw [11]). Examining plots of the test data revealed that the SOM was actually able to predict the peaks relatively well in places but had problems predicting low flows, indicating that the low flow behaviour was not captured adequately during the partitioning of the data with the SOM.

The SVM and TDNN had comparable performances and were better than both a traditional linear time series model (i.e. ARMA) and persistence. However, they did not result in improvements in overall efficiency relative to a pure MLP.

The manual data partitioning technique was used to divide the data at time t into low, medium and high flows prior to training with individual MLPs. At least one network was therefore concentrating on low flow data, which was a clear deficiency of the SOM approach. The result was a definite improvement in efficiency as well as slight improvements in the other evaluation measures. Efficiencies were also calculated for the three subsets (low, medium and high flow predictions). Interestingly, they were poor for the low flow predictions and 90% for the medium and high flow predictions, further emphasizing the low flow data problem encountered in the SOM approach. Although the data partitioning technique produced the best overall results, it did involve the training of 3 different MLPs, thus requiring greater computational effort.
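A sketch of this partitioning scheme is given below, using scikit-learn MLPRegressor networks as stand-ins for the Levenberg-Marquardt-trained MLPs of this study (scikit-learn does not provide that training algorithm); the quantile-based class boundaries, file name and network settings are illustrative assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

# runoff: 1-D array of monthly runoff, normalized to [-1, 1]
runoff = np.loadtxt("runoff_monthly.txt")        # hypothetical file name

depth = 4                                        # inputs t-3 ... t, output t+1
X = np.array([runoff[i:i + depth] for i in range(len(runoff) - depth)])
y = runoff[depth:]

# split patterns into low / medium / high flow according to the flow at time t
q_t = X[:, -1]
lo, hi = np.quantile(q_t, [0.33, 0.67])          # illustrative class boundaries
labels = np.where(q_t < lo, 0, np.where(q_t < hi, 1, 2))

models = {}
for c in (0, 1, 2):                              # one MLP per flow class
    mask = labels == c
    models[c] = MLPRegressor(hidden_layer_sizes=(9,), solver="lbfgs",
                             max_iter=2000, random_state=0).fit(X[mask], y[mask])

# each new pattern is routed to the network trained on its own flow class
pred = np.array([models[c].predict(x[None, :])[0] for x, c in zip(X, labels)])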
CONCLUSIONS

The implementation of different neural network models to forecast runoff in a Sardinian basin was presented in this paper. The results showed that most of the neural network models could be useful in constructing a tool to support the planning and management of water resources. The measures of efficiency obtained with the different models, although significantly greater than those obtained with traditional autoregressive models, were still only around 40%. A sizeable increase was obtained when the input data were manually partitioned into low, medium and high flows before training with 3 individual MLPs, indicating that this pre-processing technique warrants further investigation. It should be noted that, in general and in Sardinian basins in particular, rainfall and runoff time series present high non-linearity and non-stationarity, and neural network models may not be able to cope with these two aspects if no pre-processing of the input and/or output data is performed. Wavelet transformation and multiresolution analysis, for example, have been applied successfully to time series analysis when the signals to be processed are characterized by non-stationarity (Nason and von Sachs [9]), and this represents an area of ongoing research.

ACKNOWLEDGEMENTS

The authors would like to thank the British Council/Ministero dell'Istruzione, dell'Università e della Ricerca in Italy for the funding provided through the British Council-MIUR/CRUI Agreement 2002-2003.

REFERENCES

[1] Abrahart, R.J., Kneale, P.E., See, L., "Neural Networks in Hydrology", A.A. Balkema, Rotterdam, (in press).
[2] Back, A.D., Tsoi, A.C., "FIR and IIR synapses, a new neural network architecture for time series modeling", Neural Computation, Vol. 3, (1991), pp. 375-385.
[3] Baratti, R., Cannas, B., Fanni, A., Pintus, M., Sechi, G.M., Toreno, N., "River flow forecast for reservoir management through neural networks", Neurocomputing, Vol. 55, (2003), pp. 421-437.
[4] Box, G.E.P., Jenkins, G.M., "Time Series Analysis: Forecasting and Control", Holden-Day, Oakland, CA, (1976).
[5] Kohonen, T., "Self-Organization and Associative Memory", Springer-Verlag, Berlin, (1984).
[6] Lorrai, M., Sechi, G.M., "Neural nets for modelling rainfall-runoff transformations", Water Resources Management, Vol. 9, (1995), pp. 299-313.
[7] LIBSVM, A Library for Support Vector Machines, (2003), http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[8] Nash, J.E., Sutcliffe, J.V., "River flow forecasting through conceptual models, I: A discussion of principles", Journal of Hydrology, Vol. 10, (1970), pp. 282-290.
[9] Nason, G.P., von Sachs, R., "Wavelets in time series analysis", Phil. Trans. Roy. Soc. A, Vol. 357, (1999), pp. 2511-2526.
[10] Rumelhart, D.E., Hinton, G.E., Williams, R.J., "Learning internal representations by error propagation". In: Rumelhart, D.E. and McClelland, J.L. (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, (1986), pp. 318-362.
[11] See, L., Openshaw, S., "A hybrid multi-model approach to river level forecasting", Hydrological Sciences Journal, Vol. 45, (2000), pp. 523-536.
[12] Vapnik, V., "The Nature of Statistical Learning Theory", Springer, New York, (1995).
[13] Vapnik, V., "Statistical Learning Theory", Wiley, New York, (1998).
[14] Waibel, A.H., Hanazawa, T., Lang, K.J., Hinton, G., Shikano, K., "Phoneme recognition using time-delay neural networks", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, (1989), pp. 328-339.