Liquid flow time series prediction using feed-forward neural networks and SuperSAB learning algorithm

Laurenţiu Leuştean
National Institute for Research and Development in Informatics - ICI, 8-10 Averescu Avenue, 71316 Bucharest, Romania, E-mail: leo@u3.ici.ro

Abstract - This paper presents an application of feed-forward neural networks and the SuperSAB algorithm to forecasting a time series consisting of the average monthly liquid flow measured at the Borzii Vineţi hydrometric station on the Jiu River, situated in the southwest of Romania.

Keywords: neural networks, liquid flow, time series prediction, adaptive learning algorithms, SuperSAB learning algorithm.

I. INTRODUCTION

The forecasting of natural phenomena related to river flow is currently one of the priorities of hydrology. In order to obtain an adequate prediction it is necessary to observe the phenomena over a long period of time, usually more than 30 years. Liquid flows display great fluctuations that can cause the flooding of agricultural land or, conversely, can diminish, thus badly affecting the water supply. Accordingly, knowing the evolution of flows is crucial for predicting the agricultural output of irrigated or flooded land.

Since the presentation of the Backpropagation algorithm [7], a vast variety of improvements of the technique for training the weights in a feed-forward neural network have been proposed [4]. Some of the most popular algorithms are based on adaptive learning strategies. SuperSAB is such an algorithm, introduced by T. Tollenaere in 1990 [9].

This paper presents an application of feed-forward neural networks and the SuperSAB algorithm to forecasting a time series consisting of the average monthly liquid flow measured at the Borzii Vineţi hydrometric station on the Jiu River, situated in the southwest of Romania, in the Petroşani Depression. The hydrological basin corresponding to this station has an area of 1222 km² and a mean elevation of 1135 m [10, 11]. The data are from the period 1944-1989 [12]. The average monthly liquid flow represents the water volume that flows through the section of a river in a month, reported per unit of time, and it is expressed in m³/s.

We must mention that in hydrology, for the bigger rivers, because of the inertia of the instruments used for measuring the speed of the water, the measured liquid flow may be larger or smaller than the real one. That is why a tolerance of 10% is considered to be within the normal limits of measurement. It follows that a forecast is considered correct if its deviation from the measured value is at most 10%.

II. FEED-FORWARD NEURAL NETWORKS

Neural networks are computational frameworks consisting of massively connected simple processing units (neurons). One of the most popular neural net paradigms is the feed-forward neural network. In a feed-forward neural network, the neurons are usually arranged in layers [7].

A. Basic concepts

A neural network is a parallel distributed information processing structure consisting of processing elements (neurons) interconnected via unidirectional signal channels called connections. Each processing element has a single output connection that branches into as many collateral connections as desired; each carries the same signal - the processing element output signal. The processing element output signal can be of any mathematical type desired. The information processing that goes on within each processing element can be defined arbitrarily, with the restriction that it must be completely local; that is, it must depend only on the current values of the input signals arriving at the processing element via incoming connections and on values stored in the processing element's local memory.

Neural networks develop information processing capabilities by learning from examples. Learning techniques can be roughly divided into two categories:
1. supervised learning;
2. unsupervised learning.

Supervised learning requires a set of examples for which the desired network response is known. The learning process then consists in adapting the network so that it produces the correct response for the set of examples. The resulting network should then be able to generalize (give a good response) when presented with cases not found in the set of examples. In unsupervised learning the neural network is autonomous: it processes the data it is presented with, finds out about some of the properties of the data set and learns to reflect these properties in its output. What exactly these properties are, that the network can learn to recognize, depends on the particular network model and learning method.

A feed-forward neural net is denoted as $N_I \times N_1 \times \ldots \times N_L \times N_O$, where $N_I$ represents the number of input units; $L$ represents the number of hidden layers; $N_i$ represents the number of units in hidden layer $i$, $i = 1, \ldots, L$; and $N_O$ represents the number of output units. By convention, the input layer does not count, since the input units are not processing units; they simply pass on the input vector x. Units from the hidden layers and the output layer are processing units. Figure 1 gives a typical fully connected 2-layer feed-forward network with a $3 \times 4 \times 3$ structure.

Figure 1: A 3x4x3 feed-forward neural network.

Each processing unit has an activation function that is commonly chosen to be the sigmoid function:
$$f(x) = \frac{1}{1 + e^{-x}}.$$
The net input to a processing unit $j$ is given by
$$net_j = \sum_i w_{ij} x_i + \theta_j,$$
where the $x_i$'s are the outputs from the previous layer, $w_{ij}$ is the weight (connection strength) of the link connecting unit $i$ to unit $j$, and $\theta_j$ is the bias of unit $j$, which determines the location of the sigmoid function on the x axis. The activation value (output) of unit $j$ is given by $a_j = f(net_j)$. We remark that
$$f'(net_j) = \frac{e^{-net_j}}{(1 + e^{-net_j})^2} = f(net_j)(1 - f(net_j)) = a_j(1 - a_j).$$
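To make these definitions concrete, the following minimal sketch (illustrative code only; the function names, layer sizes and random initialization are assumptions of this note, not part of the original experiments) computes the forward pass of a fully connected feed-forward network with sigmoid processing units.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Forward pass through a fully connected feed-forward network.

    weights[l] has shape (units in layer l+1, units in layer l) and
    biases[l] has shape (units in layer l+1,).  Returns the activations
    of all processing layers; the last entry is the network output.
    """
    activations = [np.asarray(x, dtype=float)]
    for W, theta in zip(weights, biases):
        net = W @ activations[-1] + theta   # net_j = sum_i w_ij x_i + theta_j
        activations.append(sigmoid(net))    # a_j = f(net_j)
    return activations

# Example: the 3x4x3 network of Figure 1 with random weights in [-1, 1].
rng = np.random.default_rng(0)
sizes = [3, 4, 3]
weights = [rng.uniform(-1, 1, (sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
biases = [rng.uniform(-1, 1, sizes[i + 1]) for i in range(len(sizes) - 1)]
output = forward([0.2, 0.5, 0.9], weights, biases)[-1]
```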
The objective of the different supervised learning algorithms is the iterative optimization of a so-called error function, representing a measure of the performance of the network. This error function is defined as the mean square sum of the differences between the values of the output units of the network and the desired target values, calculated for the whole pattern set. The error for a pattern $p$ is given by
$$E_p = \sum_{j=1}^{N_O} (d_{pj} - a_{pj})^2,$$
where $d_{pj}$ and $a_{pj}$ are the target and the actual response value of output neuron $j$ corresponding to the pattern $p$. The total error is
$$E = \frac{1}{2} \sum_{p=1}^{P} E_p = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{N_O} (d_{pj} - a_{pj})^2,$$
where $P$ is the number of training patterns.

During the training process a set of pattern examples is used, each example consisting of a pair with the input and the corresponding target output. The patterns are presented to the network sequentially, in an iterative manner, the appropriate weight corrections being performed during the process to adapt the network to the desired behavior. This iteration continues until the connection weight values allow the network to perform the required mapping. Each presentation of the whole pattern set is called an epoch.

B. The Backpropagation algorithm

The Backpropagation algorithm is the most popular supervised learning algorithm for feed-forward neural networks [7]. In this algorithm the minimization of the error function is carried out using a gradient-descent technique. The necessary corrections to the weights of the network at each moment $t$ are obtained by calculating the partial derivative of the error function with respect to each weight $w_{ij}$. A gradient vector representing the direction of steepest increase in the weight space is thus obtained. The next step is to compute the resulting weight update. In its simplest form, the weight update is a scaled step in the opposite direction of the gradient. Hence, the weight update rule is
$$\Delta_p w_{ij}(t) = -\epsilon\, \frac{\partial E_p}{\partial w_{ij}}(t),$$
where $\epsilon \in (0,1)$ is a parameter determining the step size, called the learning rate.

The partial derivative of the error for the pattern $p$ is given by
$$\frac{\partial E_p}{\partial w_{ij}}(t) = -\delta_{pj}\, a_{pi},$$
where $\delta_{pj}$ is the error signal of unit $j$ and is obtained as follows:
- if unit $j$ is an output unit, then $\delta_{pj} = f'(net_{pj})(d_{pj} - a_{pj})$;
- if unit $j$ is a hidden unit, then $\delta_{pj} = f'(net_{pj}) \sum_k \delta_{pk} w_{jk}$.

Hence, the error signals $\delta_{pj}$ for the output units can be calculated using directly available values, since the error measure is based on the difference between the desired values $d_{pj}$ and the actual values $a_{pj}$. However, that measure is not available for the hidden units. The solution is to backpropagate the $\delta_{pj}$ values layer by layer through the network.

A momentum term was introduced in the Backpropagation algorithm [7]. The idea consists in incorporating into the present weight update some influence of the past iteration. The weight update rule becomes
$$\Delta_p w_{ij}(t) = \epsilon\, \delta_{pj}\, a_{pi} + \mu\, \Delta_p w_{ij}(t-1),$$
where $\mu$ is the momentum term and determines the amount of influence of the previous iteration on the present one. The momentum introduces a "damping" effect on the search procedure, thus avoiding oscillation in irregular areas of the error surface and accelerating convergence in long flat areas. In some situations it may keep the search procedure from being stopped in a local minimum, helping it to skip over such regions without performing any minimization there. In summary, it has been shown to improve the convergence of the Backpropagation algorithm in general.
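The sketch below (illustrative code; the single-hidden-layer shape, the variable names and the omission of bias updates are assumptions made here, not the paper's implementation) applies these per-pattern update rules, including the momentum term, to one training pattern.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_pattern(x, d, W1, b1, W2, b2, state, eps=0.1, mu=0.1):
    """One per-pattern Backpropagation update with momentum for a network
    with a single hidden layer of sigmoid units (bias updates omitted)."""
    # Forward pass.
    a1 = sigmoid(W1 @ x + b1)            # hidden activations
    a2 = sigmoid(W2 @ a1 + b2)           # output activations

    # Error signals: delta_j = f'(net_j)(d_j - a_j) for output units,
    # delta_j = f'(net_j) sum_k delta_k w_jk for hidden units,
    # using f'(net_j) = a_j (1 - a_j).
    delta2 = a2 * (1 - a2) * (d - a2)
    delta1 = a1 * (1 - a1) * (W2.T @ delta2)

    # Weight update: dw_ij(t) = eps * delta_j * a_i + mu * dw_ij(t-1).
    state["dW2"] = eps * np.outer(delta2, a1) + mu * state.get("dW2", 0.0)
    state["dW1"] = eps * np.outer(delta1, x) + mu * state.get("dW1", 0.0)
    W2 += state["dW2"]
    W1 += state["dW1"]
    return np.sum((d - a2) ** 2)         # E_p for this pattern
```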
III. THE SuperSAB ALGORITHM

The SuperSAB algorithm is a local adaptive technique introduced by T. Tollenaere [9]. A very similar approach is the Delta-Bar-Delta algorithm [2]. For a comparison of SuperSAB with other adaptive learning rate algorithms, see [4].

To overcome the drawbacks of the simple Backpropagation weight update, this algorithm proposes weight-specific learning rates, since the error function may have a different shape with respect to the one-dimensional view of each weight in the network. Because of this, Tollenaere introduced a second learning law, which determines the evolution of a learning rate according to a local estimation of the shape of the error function. This estimation is based on the observed behavior of the partial derivative during two successive weight-steps. If the derivatives have the same sign, the learning rate is slightly increased by multiplying it with a factor greater than unity, in order to accelerate learning in shallow regions. On the other hand, a change in sign of the two derivatives indicates that the procedure has overshot a local minimum; the previous weight-step was too large. As a consequence, the learning rate is decreased by multiplying it with a factor smaller than unity:
$$\eta_{ij}(t) = \begin{cases} \eta^{+}\, \eta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t) \cdot \dfrac{\partial E}{\partial w_{ij}}(t-1) > 0, \\[1ex] \eta^{-}\, \eta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t) \cdot \dfrac{\partial E}{\partial w_{ij}}(t-1) < 0, \\[1ex] \eta_{ij}(t-1), & \text{otherwise,} \end{cases}$$
where $0 < \eta^{-} < 1 < \eta^{+}$. Moreover, in case of a change in sign of two successive derivatives, the previous weight-step is reverted.

The partial derivative of the total error is given by
$$\frac{\partial E}{\partial w_{ij}}(t) = \frac{1}{2} \sum_{p=1}^{P} \frac{\partial E_p}{\partial w_{ij}}(t) = -\frac{1}{2} \sum_{p=1}^{P} \delta_{pj}\, a_{pi}.$$
Hence, the error signals $\delta_{pj}$ must be accumulated for all $P$ training patterns. This means that the weights are updated only after the presentation of all training patterns. The weight update itself is the same as with Backpropagation learning, except that the fixed global learning rate is replaced by a weight-specific, dynamic learning rate $\eta_{ij}(t)$:
$$\Delta w_{ij}(t) = -\eta_{ij}(t)\, \frac{\partial E}{\partial w_{ij}}(t) + \mu\, \Delta w_{ij}(t-1).$$

SuperSAB has been shown to be a fast converging algorithm that is often considerably faster than ordinary gradient descent. One possible problem of SuperSAB is the large number of parameters that need to be determined in order to achieve good convergence times, namely the initial learning rate, the momentum factor, and the increase and decrease factors. Another drawback, inherent to all learning-rate adaptation algorithms, is the remaining influence of the size of the partial derivative on the weight-step. Despite careful adaptation of the learning rate, the derivative itself can have an unforeseeable influence on the size of the weight-step. For example, consider the situation where a very shallow error function leads to a permanent increase of the learning rate. Although the learning rate grows rather large, the resulting weight-step remains small, due to the small partial derivative. When suddenly a region of steep descent is reached, probably indicating the presence of a minimum, the resulting large derivative is scaled by the large learning rate, pushing the weight into a region far away from the previous (promising) position.

The algorithm was improved with the flat-spot elimination technique [1]. The flat-spot problem occurs when the output $a_j$ of the sigmoid activation function of some neuron $j$ approaches 0.0 or 1.0. In this case, the function derivative $a_j(1 - a_j)$ becomes too close to zero, leading to a very small weight update. The technique consists in always adding a constant of 0.1 to the derivative, yielding $a_j(1 - a_j) + 0.1$.
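As a rough illustration of these rules, the sketch below (array-based bookkeeping, argument names and the exact form of the weight-step reversal are assumptions of this note, not the author's implementation) performs one batch SuperSAB step for a single weight matrix.

```python
import numpy as np

def supersab_step(grad, prev_grad, eta, prev_dw, eta_up=1.5, eta_down=0.3, mu=0.1):
    """One batch SuperSAB update for a single weight matrix.

    grad, prev_grad : dE/dw_ij at epochs t and t-1
    eta             : per-weight learning rates eta_ij, modified in place
    prev_dw         : previous weight changes dw_ij(t-1)
    Returns the weight changes dw_ij(t) to be added to the weights.
    (The flat-spot constant 0.1 would be added to a_j(1 - a_j) when the
    gradients themselves are computed.)
    """
    sign = grad * prev_grad
    eta[sign > 0] *= eta_up            # same sign: shallow region, accelerate
    eta[sign < 0] *= eta_down          # sign change: minimum overshot, slow down

    dw = -eta * grad + mu * prev_dw    # dw_ij(t) = -eta_ij(t) dE/dw_ij(t) + mu dw_ij(t-1)
    dw[sign < 0] = -prev_dw[sign < 0]  # revert the previous weight-step on a sign change
    return dw
```

The default increase and decrease factors above are set to the values that gave the best result in the experiments reported below.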
IV. EXPERIMENTAL MODEL

A time series is a sequence of time-ordered data values that are measurements of some physical process. A time series forecasting problem can be easily mapped to a feed-forward neural network [8]. The number of input units corresponds to the number of input data terms, and the number of output units represents the forecast horizon: a one-step-ahead forecast can be performed by a neural network with one output unit, and a k-step-ahead forecast can be mapped to a neural network with k output units.

This paper applies feed-forward neural networks and the SuperSAB algorithm to forecasting the time series consisting of the average monthly liquid flow measured at the Borzii Vineţi hydrometric station on the Jiu River. The data are from the period 1944-1989 [12]. A forecast is correct if its value deviates by at most 10% from the measured value.

In our experiments, we predicted the liquid flow of one month using the liquid flows of the 12 previous months. Hence, a term of the time series consists of 12+1 values, and the time series has 540 terms. The first 450 terms were used as training patterns and the next 50 as validation patterns. The last 40 terms of the time series were used to test the forecasting performance of the models. Validation helps ensure that performance on the testing set is improving: the program keeps track of the network status at its point of best performance on the validation set, and if this performance does not improve for a number of epochs, the network is restored to its position of best performance and the trial is stopped.

All the experiments were carried out on a Pentium at 133 MHz with 24 MB RAM. The feed-forward neural networks used in the experiments had 12 input units, one hidden layer and one output unit. We tested different neural network structures by varying the number of hidden units, and we used different values for the parameters of the SuperSAB algorithm. For each neural network structure and parameter setting, 5 experiments were run with different random initial weights, uniformly distributed over the interval [-1, 1]. The reported results are the averages of the 5 runs.

The data series were normalized to the range [0, 1] before being fed into the neural networks:
$$data_{[0,1]} = \frac{largest\_data - data}{largest\_data - smallest\_data}.$$
Forecasts from the neural network outputs were transformed back to the original data scale before the percentage of good forecasts was reported.
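A minimal sketch of this data preparation (hypothetical code; the function names and return convention are assumptions made here, not the author's program) is given below.

```python
import numpy as np

def prepare_patterns(flows, window=12):
    """Build (input, target) pairs from the monthly flow series: each pattern
    uses 12 consecutive monthly flows to predict the following month."""
    flows = np.asarray(flows, dtype=float)
    # Scaling used in the paper: data_[0,1] = (largest - data) / (largest - smallest).
    largest, smallest = flows.max(), flows.min()
    scaled = (largest - flows) / (largest - smallest)

    X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
    y = scaled[window:]
    return X, y, largest, smallest

def to_original_scale(scaled_value, largest, smallest):
    # Invert the scaling before comparing a forecast with the measured flow.
    return largest - scaled_value * (largest - smallest)
```

With 46 years of monthly data (552 values), this windowing yields the 540 patterns of 12+1 values described above, which were then split into 450 training, 50 validation and 40 test patterns.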
The parameters used by the algorithm are:
- $\eta_0$, the initial learning rate. At the beginning, all learning rates $\eta_{ij}$ are set to this value. The choice of the value for this parameter is rather uncritical, since it is adapted as learning proceeds. We set $\eta_0$ to 0.1.
- $\mu$, the momentum. We also set this parameter to 0.1.
- $\eta^{+}$, the factor greater than unity. We used the following values for $\eta^{+}$: 1.1, 1.3, 1.5, 1.8 and 2.0.
- $\eta^{-}$, the factor smaller than unity. We set $\eta^{-}$ to 0.001, 0.1, 0.3, 0.5, 0.7 and 0.9.
- Nh, the number of units in the hidden layer. We used neural networks with 1, 3, 6 and 12 hidden units.

The following tables present the percentages of correct predictions obtained after 2000 epochs.

Table 1 ($\eta^{+}$ = 1.1)

  $\eta^{-}$   Nh=1     Nh=3     Nh=6     Nh=12
  0.001       14.37%   11.50%    6.66%   20.00%
  0.1          9.18%   17.50%   20.83%   18.50%
  0.3         19.18%   16.66%   18.33%   16.66%
  0.5         18.33%   20.00%   15.00%   15.00%
  0.7         20.00%   20.00%   20.83%   16.66%
  0.9          6.66%   19.00%   17.50%   19.18%

Table 2 ($\eta^{+}$ = 1.3)

  $\eta^{-}$   Nh=1     Nh=3     Nh=6     Nh=12
  0.001       15.00%   15.83%   18.12%   17.33%
  0.1         13.33%   19.16%   18.33%   16.66%
  0.3         19.18%   19.16%   13.33%   19.00%
  0.5         19.18%   16.86%   14.16%   19.00%
  0.7         13.12%   14.16%    9.18%   0-2%
  0.9         10.00%   0-2%     0-2%     0-2%

Table 3 ($\eta^{+}$ = 1.5)

  $\eta^{-}$   Nh=1     Nh=3     Nh=6     Nh=12
  0.001       20.00%   18.33%   18.33%   15.83%
  0.1         18.33%   20.50%   20.83%   16.66%
  0.3         20.83%   17.50%   20.83%   22.50%
  0.5         13.33%    8.75%   18.33%   11.66%
  0.7         16.66%   0-2%     0-2%     0-2%
  0.9         0-2%     0-2%     0-2%     0-2%

Table 4 ($\eta^{+}$ = 1.8)

  $\eta^{-}$   Nh=1     Nh=3     Nh=6     Nh=12
  0.001       16.66%   21.16%   18.33%   18.33%
  0.1         16.66%   19.60%   20.16%   20.00%
  0.3         14.16%   17.50%   18.33%   19.00%
  0.5         0-2%     0-2%     0-2%     0-2%
  0.7         0-2%     0-2%     0-2%     0-2%
  0.9         0-2%     0-2%     0-2%     0-2%

Table 5 ($\eta^{+}$ = 2.0)

  $\eta^{-}$   Nh=1     Nh=3     Nh=6     Nh=12
  0.001       16.86%   12.50%   12.50%   17.50%
  0.1         16.66%   18.33%   16.66%   20.00%
  0.3         17.50%   12.50%   10.83%   15.83%
  0.5         0-2%     0-2%     0-2%     0-2%
  0.7         0-2%     0-2%     0-2%     0-2%
  0.9         0-2%     0-2%     0-2%     0-2%
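For reference, a minimal sketch of how the percentages tabulated above could be computed under the 10% tolerance criterion is given below (illustrative code; the function name and interface are assumptions of this note, not the author's program).

```python
import numpy as np

def percentage_correct(predicted, measured, tolerance=0.10):
    """Share of forecasts whose deviation from the measured flow is at most
    10% of the measured value, expressed as a percentage."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    correct = np.abs(predicted - measured) <= tolerance * measured
    return 100.0 * correct.mean()

# For example, 9 correct forecasts out of the 40 test patterns give 22.50%.
```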
V. CONCLUSIONS

We see that the best percentage is 22.50%. Since we used 40 terms to test the forecasting performance of the networks, a percentage of 22.50% means that 9 of the 40 predictions were correct. This result was obtained with the following values of the parameters: $\eta^{+}$ = 1.5, $\eta^{-}$ = 0.3, $\eta_0$ = 0.1, $\mu$ = 0.1 and Nh = 6. We notice that better results were obtained in the experiments with $\eta^{+}$ = 1.5, $\eta^{-}$ = 0.3 and with $\eta^{+}$ = 1.1, $\eta^{-}$ = 0.7. When both $\eta^{+}$ and $\eta^{-}$ had large values, the percentages of correct predictions were very small, 0-2%.

In [3], we used another local adaptive learning scheme for feed-forward neural networks, the RPROP algorithm (Resilient Backpropagation) [4, 5, 6], to forecast a time series consisting of the average monthly liquid flow measured at the Broşteni hydrometric station on the Motru River, an affluent of the Jiu. The best percentage of correct predictions obtained there was 26.65%, slightly better than the best result obtained in this paper.

In this paper and in [3], we used feed-forward neural networks. Another idea is to use recurrent neural networks, that is, networks having feedback connections. These are very interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory processes. The general property making such networks interesting and potentially useful is that they manifest highly nonlinear dynamical behavior. Hence, using recurrent neural networks could improve our results.

Forecasting the hydrological system of an area using neural networks represents a new method of investigation in the hydrological field, and we hope that it can contribute to the improvement of hydrological prognosis. We think that neural networks can be a promising alternative to classical prognosis based on statistical methods. In the future, we shall apply neural networks to forecasting other natural phenomena, too.

ACKNOWLEDGEMENTS

This paper is part of INTAS Project no. 397: "Data Mining Technologies and Image Processing: Theory and Applications", Task 4: "The prognosis of harvest on the base of the weather- and geo-monitoring of a region".

REFERENCES

[1] S. Fahlman, "An empirical study of learning speed in backpropagation networks", Technical Report, Carnegie Mellon University, 1988.
[2] R. Jacobs, "Increased rates of convergence through learning rate adaptation", Neural Networks, vol. 1, pp. 295-307, 1988.
[3] L. Leuştean, "Liquid flow time series prediction using feed-forward neural networks and Rprop learning algorithm", Studies in Informatics and Control, vol. 10, no. 4, pp. 287-299, 2001.
[4] M. Riedmiller, "Advanced supervised learning in multi-layer perceptrons - from Backpropagation to adaptive learning algorithms", International Journal on Computer Standards and Interfaces, vol. 16, pp. 265-278, 1994.
[5] M. Riedmiller, "Rprop - description and implementation details", Technical Report, University of Karlsruhe, January 1994.
[6] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm", in Proceedings of the IEEE International Conference on Neural Networks (ICNN), San Francisco, USA, H. Ruspini, Ed., pp. 586-591, 1993.
[7] D. E. Rumelhart, G. Hinton and R. Williams, "Learning internal representations by error propagation", in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I: Foundations, D. E. Rumelhart, J. McClelland and the PDP Research Group, Eds., MIT Press, Cambridge, MA, pp. 318-364, 1986.
[8] Z. Tang and P. Fishwick, "Feed-forward neural nets as models for time series forecasting", Technical Report, University of Florida, Gainesville, 1993.
[9] T. Tollenaere, "SuperSAB: Fast adaptive backpropagation with good scaling properties", Neural Networks, vol. 3, pp. 561-573, 1990.
[10] L. Ujvary, Geografia apelor României, Scientific Ed., Bucharest, 1972 (in Romanian).
[11] ***, Monografia hidrologică a bazinului hidrografic al râului Jiu, Studii de hidrologie XV, I.S.C.H., Bucharest, 1966 (in Romanian).
[12] ***, Anuarele hidrologice din perioada 1944-1989, I.N.M.H., Bucharest (in Romanian).