Liquid flow time series prediction using feed-forward neural networks and
SuperSAB learning algorithm
Laurenţiu Leuştean
National Institute for Research and Development in Informatics - ICI, 8-10 Averescu Avenue, 71316 Bucharest, Romania
E-mail: leo@u3.ici.ro
Abstract - This paper presents an application of feed-forward neural networks and the SuperSAB algorithm to forecasting a time series consisting of the average monthly liquid flow measured at the Borzii Vineţi hydrometric station on the Jiu River, situated in the southwest of Romania.
Keywords: neural networks, liquid flow, time series prediction,
adaptive learning algorithms, SuperSAB learning algorithm.
I. INTRODUCTION
The forecasting of natural phenomena related to river flow is currently one of the priorities of hydrology. In order to obtain an adequate prediction it is necessary to observe the phenomena over a long period of time, usually more than 30 years.
Liquid flows display great fluctuations that can cause the flooding of agricultural land or, conversely, can diminish, badly affecting the water supply. Accordingly, knowing the evolution of flows is crucial for predicting the agricultural output on irrigated or flooded land.
Neural networks are computational frameworks consisting of massively connected simple processing units (neurons). One of the most popular neural net paradigms is the feed-forward neural network, in which the neurons are usually arranged in layers. Since the presentation of the Backpropagation algorithm [7], a vast variety of improvements of the technique for training the weights of a feed-forward neural network has been proposed [4]. Some of the most popular algorithms are based on adaptive learning strategies. SuperSAB is such an algorithm, introduced by T. Tollenaere in 1990 [9].
This paper presents an application of feed-forward neural networks and the SuperSAB algorithm to forecasting a time series consisting of the average monthly liquid flow measured at the Borzii Vineţi hydrometric station on the Jiu River, situated in the southwest of Romania, in the Petroşani Depression. The hydrological basin corresponding to this station has an area of 1222 km^2 and a mean elevation of 1135 m [10, 11]. The data cover the period 1944-1989 [12].
The average monthly liquid flow represents the water volume that flows through the section of a river in a month, reported per unit of time; it is expressed in m^3/s.
We must mention that in hydrology, for the bigger rivers, because of the inertia of the instruments used for measuring the speed of the water, the measured liquid flow may be larger or smaller than the real one. That is why a tolerance of 10% is considered to be within the normal limits of measurement. It follows that a forecast is considered correct if it deviates by at most 10% from the measured value.

II. FEED-FORWARD NEURAL NETWORKS
A. Basic concepts
A neural network is a parallel distributed information
processing structure consisting of processing elements
(neurons) interconnected via unidirectional signal channels
called connections. Each processing element has a single
output connection that branches into as many collateral
connections as desired; each carries the same signal - the
processing element output signal. The processing element
output signal can be of any mathematical type desired. The
information processing that goes on within each processing
element can be defined arbitrarily with the restriction that it
must be completely local; that is, it must depend only on the
current values of the input signals arriving at the processing
element via incoming connections and on values stored in the
processing element’s local memory.
Neural networks develop information processing capabilities by learning from examples. Learning techniques can be roughly divided into two categories:
1. supervised learning;
2. unsupervised learning.
Supervised learning requires a set of examples for which
the desired network response is known. The learning process then consists in adapting the network so that it will produce the correct response for the set of examples. The
resulting network should then be able to generalize (give a
good response) when presented with cases not found in the
set of examples.
In unsupervised learning the neural network is
autonomous; it processes the data it is presented with, finds
out about some of the properties of the data set and learns to
reflect these properties in its output. Exactly which properties the network can learn to recognize depends on the particular network model and learning method.
One of the most popular neural net paradigms is the feed-forward neural network. In a feed-forward neural network, the neurons are usually arranged in layers [7]. A feed-forward neural net is denoted as

N_I - N_1 - ... - N_i - ... - N_L - N_O,

where:
- N_I represents the number of input units;
- L represents the number of hidden layers;
- N_i represents the number of units in hidden layer i, i = 1, ..., L;
- N_O represents the number of output units.

By convention, the input layer does not count, since the input units are not processing units; they simply pass on the input vector x. The units from the hidden layers and the output layer are processing units. Figure 1 gives a typical fully connected 2-layer feed-forward network with a 3-4-3 structure.

[Figure 1: A 3x4x3 feed-forward neural network.]

Each processing unit has an activation function that is commonly chosen to be the sigmoid function:

f(x) = \frac{1}{1 + e^{-x}}.

The net input to a processing unit j is given by

net_j = \sum_i w_{ij} x_i + \theta_j,

where the x_i's are the outputs from the previous layer, w_{ij} is the weight (connection strength) of the link connecting unit i to unit j, and \theta_j is the bias of unit j, which determines the location of the sigmoid function on the x axis.

The activation value (output) of unit j is given by

a_j = f(net_j) = \frac{1}{1 + e^{-net_j}}.

We remark that

f'(net_j) = \frac{e^{-net_j}}{(1 + e^{-net_j})^2} = f(net_j)(1 - f(net_j)) = a_j (1 - a_j).

The objective of the different supervised learning algorithms is the iterative optimization of a so-called error function, representing a measure of the performance of the network. This error function is defined as the mean square sum of the differences between the values of the output units of the network and the desired target values, calculated for the whole pattern set. The error for a pattern p is given by

E_p = \sum_{j=1}^{N_O} (d_{pj} - a_{pj})^2,

where d_{pj} and a_{pj} are the target and the actual response value of output neuron j corresponding to the pattern p. The total error is

E = \frac{1}{2} \sum_{p=1}^{P} E_p = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{N_O} (d_{pj} - a_{pj})^2,

where P is the number of training patterns.
During the training process a set of pattern examples is used, each example consisting of a pair with the input and the corresponding target output. The patterns are presented to the network sequentially, in an iterative manner, the appropriate weight corrections being performed during the process to adapt the network to the desired behavior. This iteration continues until the connection weight values allow the network to perform the required mapping. Each presentation of the whole pattern set is called an epoch.
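To make the notation above concrete, here is a minimal Python sketch of the forward pass of a fully connected feed-forward network with sigmoid units, together with the error E_p for one pattern. The layer sizes, random weights and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Propagate the input vector x through the network.

    weights[l] has shape (units in layer l, units in previous layer),
    biases[l] has shape (units in layer l,).
    Returns the activations of every processing layer."""
    activations, a = [], x
    for W, theta in zip(weights, biases):
        net = W @ a + theta   # net_j = sum_i w_ij * x_i + theta_j
        a = sigmoid(net)      # a_j = f(net_j)
        activations.append(a)
    return activations

def pattern_error(output, target):
    # E_p = sum_j (d_pj - a_pj)^2
    return np.sum((target - output) ** 2)

# Illustrative 3-4-3 network, as in Figure 1, with random weights.
rng = np.random.default_rng(0)
weights = [rng.uniform(-1, 1, (4, 3)), rng.uniform(-1, 1, (3, 4))]
biases = [np.zeros(4), np.zeros(3)]
outputs = forward(np.array([0.2, 0.5, 0.1]), weights, biases)
print(pattern_error(outputs[-1], np.array([0.0, 1.0, 0.0])))
```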
B. The Backpropagation algorithm
The Backpropagation algorithm is the most popular supervised learning algorithm for feed-forward neural networks [7].
In this algorithm the minimization of the error function is carried out using a gradient-descent technique. The necessary corrections to the weights of the network for each moment t are obtained by calculating the partial derivative of the error function with respect to each weight w_{ij}. A gradient vector representing the direction of steepest increase in the weight space is thus obtained. The next step is to compute the resulting weight update. In its simplest form, the weight update is a scaled step in the opposite direction of the gradient. Hence, the weight update rule is

\Delta_p w_{ij}(t) = -\eta \frac{\partial E_p}{\partial w_{ij}}(t),

where \eta \in (0,1) is a parameter determining the step size and is called the learning rate.
The partial derivative of the error for the pattern p is given by

\frac{\partial E_p}{\partial w_{ij}}(t) = -\delta_{pj} \cdot a_{pi},

where \delta_{pj} is the error signal of unit j and is obtained as follows:
- if unit j is an output unit, then

\delta_{pj} = f'(net_{pj}) (d_{pj} - a_{pj});

- if unit j is a hidden unit, then

\delta_{pj} = f'(net_{pj}) \sum_k \delta_{pk} w_{jk}.
Hence, the error signals \delta_{pj} for the output units can be calculated using directly available values, since the error measure is based on the difference between the desired values d_{pj} and the actual values a_{pj}. However, that measure is not available for the hidden units. The solution is to back-propagate the \delta_{pj} values layer by layer through the network.
A momentum term was introduced in the Backpropagation algorithm [7]. The idea consists in incorporating into the present weight update some influence of the past iteration. The weight update rule becomes

\Delta_p w_{ij}(t) = \eta \, \delta_{pj} \, a_{pi} + \alpha \, \Delta_p w_{ij}(t-1),

where \alpha is the momentum term and determines the amount of influence of the previous iteration on the present one.
The momentum introduces a "damping" effect on the search procedure, thus avoiding oscillations in irregular areas of the error surface and accelerating the convergence in long flat areas. In some situations it may prevent the search procedure from being stopped in a local minimum, helping it to skip over such regions without performing any minimization there. In summary, it has been shown to improve the convergence of the Backpropagation algorithm in general.
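The update rules above can be put together in a short sketch. The following Python fragment performs one per-pattern Backpropagation step with momentum for a network with a single hidden layer; the array shapes, parameter values and function name are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, d, W1, b1, W2, b2, state, eta=0.1, alpha=0.1):
    """One per-pattern gradient step with momentum.

    W1: hidden-layer weights (Nh, Ni); W2: output weights (No, Nh).
    state holds the previous weight updates (zero arrays initially)."""
    # Forward pass.
    a1 = sigmoid(W1 @ x + b1)                 # hidden activations
    a2 = sigmoid(W2 @ a1 + b2)                # output activations
    # Error signals (delta rule).
    delta2 = a2 * (1 - a2) * (d - a2)         # output units
    delta1 = a1 * (1 - a1) * (W2.T @ delta2)  # hidden units, back-propagated
    # Weight updates: eta * delta_pj * a_pi + alpha * previous update.
    dW2 = eta * np.outer(delta2, a1) + alpha * state["dW2"]
    dW1 = eta * np.outer(delta1, x) + alpha * state["dW1"]
    W2 += dW2
    W1 += dW1
    b2 += eta * delta2   # biases updated without momentum in this sketch
    b1 += eta * delta1
    state["dW2"], state["dW1"] = dW2, dW1
    return W1, b1, W2, b2
```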
III. THE SuperSAB ALGORITHM
The SuperSAB algorithm is a local adaptive technique introduced by T. Tollenaere [9]. A very similar approach is the Delta-Bar-Delta algorithm [2]. For a comparison of SuperSAB with other adaptive learning rate algorithms, see [4].
To overcome the drawbacks of the simple Backpropagation weight update, this algorithm proposes weight-specific learning rates, since the error function may have a different shape with respect to the one-dimensional view of each weight in the network. Because of this, Tollenaere introduced a second learning law, which determines the evolution of a learning rate according to a local estimation of the shape of the error function.
This estimation is based on the observed behavior of the partial derivative during two successive weight-steps. If the derivatives have the same sign, the learning rate is slightly increased by multiplying it with a factor greater than unity, in order to accelerate learning in shallow regions. On the other hand, a change in sign of the two derivatives indicates that the procedure has overshot a local minimum; the previous weight-step was too large. As a consequence, the learning rate is decreased by multiplying it with a decreasing factor smaller than unity:

\eta_{ij}(t) = \begin{cases}
\eta^{+} \cdot \eta_{ij}(t-1), & \text{if } \frac{\partial E}{\partial w_{ij}}(t) \cdot \frac{\partial E}{\partial w_{ij}}(t-1) > 0 \\
\eta^{-} \cdot \eta_{ij}(t-1), & \text{if } \frac{\partial E}{\partial w_{ij}}(t) \cdot \frac{\partial E}{\partial w_{ij}}(t-1) < 0 \\
\eta_{ij}(t-1), & \text{otherwise,}
\end{cases}

where 0 < \eta^{-} < 1 < \eta^{+}. Moreover, in case of a change in sign of two successive derivatives, the previous weight-step is reverted.
The partial derivative of the total error is given by

\frac{\partial E}{\partial w_{ij}}(t) = \frac{1}{2} \sum_{p=1}^{P} \frac{\partial E_p}{\partial w_{ij}}(t) = -\frac{1}{2} \sum_{p=1}^{P} \delta_{pj} \cdot a_{pi}.

Hence, the error signals \delta_{pj} must be accumulated for all P training patterns. This means that the weights are updated only after the presentation of all training patterns.
The weight update itself is the same as with Backpropagation learning, except that the fixed global learning rate \eta is replaced by a weight-specific, dynamic learning rate \eta_{ij}(t):

\Delta w_{ij}(t) = -\eta_{ij}(t) \frac{\partial E}{\partial w_{ij}}(t) + \alpha \, \Delta w_{ij}(t-1).

SuperSAB has been shown to be a fast converging algorithm that is often considerably faster than ordinary gradient descent. One possible problem of SuperSAB is the large number of parameters that need to be determined in order to achieve good convergence times, namely the initial learning rate, the momentum factor, and the increase (decrease) factors. Another drawback, inherent to all learning-rate adaptation algorithms, is the remaining influence of the size of the partial derivative on the weight-step.
Despite careful adaptation of the learning rate, the derivative itself can have an unforeseeable influence on the size of the weight-step. For example, consider the situation where a very shallow error function leads to a permanent increase of the learning rate. Although the learning rate grows rather large, the resulting weight-step remains small, due to the small partial derivative. When suddenly a region of steep descent is reached, probably indicating the presence of a minimum, the resulting large derivative is scaled by the large learning rate, pushing the weight to a region far away from the previous (promising) position.
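A compact sketch of the SuperSAB rule described in this section is given below, in Python. It adapts a per-weight learning rate from the sign of two successive derivatives of the total error, reverts the previous weight-step on a sign change, and then applies the batch update with momentum. The function signature, the default factors and the exact treatment of the revert case are illustrative assumptions.

```python
import numpy as np

def supersab_update(W, grad, prev_grad, rates, prev_dW,
                    eta_plus=1.1, eta_minus=0.5, alpha=0.1):
    """One SuperSAB batch update for a weight matrix W.

    grad, prev_grad : dE/dw at the current and previous epoch
    rates           : per-weight learning rates eta_ij(t-1)
    prev_dW         : previous weight update (momentum term)"""
    sign = grad * prev_grad
    # Increase the rate where the derivative kept its sign,
    # decrease it where the sign changed, keep it otherwise.
    rates = np.where(sign > 0, rates * eta_plus,
             np.where(sign < 0, rates * eta_minus, rates))
    dW = -rates * grad + alpha * prev_dW
    # Where the sign changed, revert the previous weight-step instead.
    dW = np.where(sign < 0, -prev_dW, dW)
    return W + dW, rates, dW
```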
IV. EXPERIMENTAL MODEL
A time series is a sequence of time-ordered data values that are measurements of some physical process. A time series forecasting problem can easily be mapped to a feed-forward neural network [8]. The number of input units corresponds to the number of input data terms. The number of output units represents the forecast horizon. A one-step-ahead forecast can be performed by a neural network with one output unit, and a k-step-ahead forecast can be mapped to a neural network with k output units.
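As an illustration of this mapping, the following Python sketch builds one-step-ahead training patterns from a monthly series using a 12-month input window, as in the experiments described below; the function name and the example series are illustrative.

```python
import numpy as np

def make_patterns(series, window=12):
    """Turn a 1-D series into (input, target) pairs for one-step-ahead
    forecasting: each pattern uses `window` consecutive values as input
    and the following value as target, so a term of the time series
    consists of window + 1 values."""
    inputs, targets = [], []
    for t in range(len(series) - window):
        inputs.append(series[t:t + window])
        targets.append(series[t + window])
    return np.array(inputs), np.array(targets)

# Example: a 540-term series yields 528 patterns of 12 inputs and 1 target.
X, y = make_patterns(np.arange(540, dtype=float), window=12)
print(X.shape, y.shape)  # (528, 12) (528,)
```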
This paper presents an application of feed-forward neural networks and the SuperSAB algorithm to forecasting a time series consisting of the average monthly liquid flow measured at the Borzii Vineţi hydrometric station on the Jiu River. The data cover the period 1944-1989 [12]. A forecast is considered correct if its value deviates by at most 10% from the measured value. In our experiments, we predicted the liquid flow for one month using the liquid flows from the 12 previous months. Hence, a term of the time series consists of 12+1 values.
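A minimal sketch of this correctness criterion, in Python: a forecast is counted as correct when its relative deviation from the measured value is at most 10%. The function name is an illustrative assumption.

```python
import numpy as np

def percent_correct(forecasts, measured, tolerance=0.10):
    """Percentage of forecasts within `tolerance` relative deviation
    of the measured values."""
    forecasts = np.asarray(forecasts, dtype=float)
    measured = np.asarray(measured, dtype=float)
    relative_deviation = np.abs(forecasts - measured) / np.abs(measured)
    return 100.0 * np.mean(relative_deviation <= tolerance)

print(percent_correct([95.0, 120.0], [100.0, 100.0]))  # 50.0
```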
The time series has 540 terms. The first 450 terms were used as training patterns and the next 50 as validation patterns. The last 40 terms of the time series were used to test the forecasting performance of the models. Validation can help ensure that performance on the testing set is improving. The program keeps track of the network status at its point of best performance. If this performance does not improve for a number of epochs, the network is restored to its position of best performance and the trial is stopped.
All the experiments were carried out on a Pentium at 133 MHz with 24 MB of RAM.
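The validation procedure just described amounts to early stopping with checkpointing. A sketch of that control loop is given below, in Python; the callback interface, the patience value and the variable names are illustrative assumptions, not details taken from the paper.

```python
import copy

def train_with_validation(net, train_one_epoch, validation_error,
                          max_epochs=2000, patience=50):
    """Keep the best network seen on the validation set and stop the trial
    when it has not improved for `patience` epochs.

    `train_one_epoch(net)` and `validation_error(net)` are caller-supplied
    callbacks (illustrative interface)."""
    best_net = copy.deepcopy(net)
    best_error = float("inf")
    stale_epochs = 0
    for _ in range(max_epochs):
        train_one_epoch(net)
        error = validation_error(net)
        if error < best_error:
            best_error, best_net, stale_epochs = error, copy.deepcopy(net), 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_net
```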
The feed-forward neural networks used in the experiments had 12 input units, one hidden layer and one output unit. We tested different neural network structures by varying the number of hidden units, and we used different values for the parameters of the SuperSAB algorithm.
For each neural network structure and parameter setting, 5 experiments were run with different random initial weights, uniformly distributed over the interval [-1, 1]. The reported results are the averages of the 5 runs. The data series were normalized to the range [0, 1] before being fed into the neural networks:

data_{[0,1]} = \frac{largest\_data - data}{largest\_data - smallest\_data}.

Forecasts from the neural network outputs were transformed back to the original data scale before the percentage of good forecasts was reported.
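A small Python sketch of this scaling and of its inverse, following the formula above (note that it maps the largest value to 0 and the smallest to 1); the function names are illustrative.

```python
import numpy as np

def normalize(data, largest, smallest):
    # data_[0,1] = (largest - data) / (largest - smallest)
    return (largest - np.asarray(data, dtype=float)) / (largest - smallest)

def denormalize(scaled, largest, smallest):
    # Invert the scaling to report forecasts on the original scale.
    return largest - np.asarray(scaled, dtype=float) * (largest - smallest)

flows = np.array([12.0, 30.0, 21.0])
scaled = normalize(flows, largest=flows.max(), smallest=flows.min())
print(scaled)                                         # [1.  0.  0.5]
print(denormalize(scaled, flows.max(), flows.min()))  # [12. 30. 21.]
```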
The algorithm was improved with the flat-spot elimination technique [1]. The flat-spot problem occurs when the output a_j of the sigmoid activation function of some neuron j approaches 0.0 or 1.0. In this case, the function derivative a_j(1 - a_j) comes too close to zero, leading to a very small weight update. The technique consists in always adding a constant of 0.1 to the derivative, yielding a_j(1 - a_j) + 0.1.
The parameters used by the algorithm are:
- \eta_0 - the initial learning rate. At the beginning, all learning rates are set to this value. The choice of the value for this parameter is rather uncritical, for it is adapted as learning proceeds. We set \eta_0 to 0.1.
- \alpha - the momentum. We also set this parameter to 0.1.
- \eta^{+} - the factor greater than unity. We used for \eta^{+} the values 1.1, 1.3, 1.5, 1.8 and 2.0.
- \eta^{-} - the factor smaller than unity. We set \eta^{-} to 0.001, 0.1, 0.3, 0.5, 0.7 and 0.9.
- Nh - the number of units in the hidden layer. We used neural networks with 1, 3, 6 and 12 hidden units.
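The parameter study itself can be organized as a plain grid loop over these values, averaging the percentage of correct predictions over 5 random initializations, as in the protocol above. The Python sketch below assumes a caller-supplied train_and_test callback; it illustrates the protocol and is not the program used in the paper.

```python
import itertools
import numpy as np

ETA_PLUS = [1.1, 1.3, 1.5, 1.8, 2.0]
ETA_MINUS = [0.001, 0.1, 0.3, 0.5, 0.7, 0.9]
HIDDEN_UNITS = [1, 3, 6, 12]

def parameter_sweep(train_and_test, runs=5):
    """Average the percentage of correct predictions over `runs` random
    initializations for every (eta+, eta-, Nh) combination.

    `train_and_test(eta_plus, eta_minus, nh, seed)` is a caller-supplied
    callback returning the percentage of correct test forecasts
    (illustrative interface)."""
    results = {}
    for eta_plus, eta_minus, nh in itertools.product(
            ETA_PLUS, ETA_MINUS, HIDDEN_UNITS):
        scores = [train_and_test(eta_plus, eta_minus, nh, seed)
                  for seed in range(runs)]
        results[(eta_plus, eta_minus, nh)] = float(np.mean(scores))
    return results
```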
The next tables present the percentages of correct predictions obtained after 2000 epochs.
Table 1 (\eta^{+} = 1.1)

\eta^{-}     Nh=1      Nh=3      Nh=6      Nh=12
0.001      14.37%    11.50%     6.66%    20.00%
0.1         9.18%    17.50%    20.83%    18.50%
0.3        19.18%    16.66%    18.33%    16.66%
0.5        18.33%    20.00%    15.00%    15.00%
0.7        20.00%    20.00%    20.83%    16.66%
0.9         6.66%    19.00%    17.50%    19.18%

Table 2 (\eta^{+} = 1.3)

\eta^{-}     Nh=1      Nh=3      Nh=6      Nh=12
0.001      15.00%    15.83%    18.12%    17.33%
0.1        13.33%    19.16%    18.33%    16.66%
0.3        19.18%    19.16%    13.33%    19.00%
0.5        19.18%    16.86%    14.16%    19.00%
0.7        13.12%    14.16%     9.18%     0-2%
0.9        10.00%     0-2%      0-2%      0-2%

Table 3 (\eta^{+} = 1.5)

\eta^{-}     Nh=1      Nh=3      Nh=6      Nh=12
0.001      20.00%    18.33%    18.33%    15.83%
0.1        18.33%    20.50%    20.83%    16.66%
0.3        20.83%    17.50%    20.83%    22.50%
0.5        13.33%     8.75%    18.33%    11.66%
0.7        16.66%     0-2%      0-2%      0-2%
0.9         0-2%      0-2%      0-2%      0-2%

Table 4 (\eta^{+} = 1.8)

\eta^{-}     Nh=1      Nh=3      Nh=6      Nh=12
0.001      16.66%    21.16%    18.33%    18.33%
0.1        16.66%    19.60%    20.16%    20.00%
0.3        14.16%    17.50%    18.33%    19.00%
0.5         0-2%      0-2%      0-2%      0-2%
0.7         0-2%      0-2%      0-2%      0-2%
0.9         0-2%      0-2%      0-2%      0-2%

Table 5 (\eta^{+} = 2.0)

\eta^{-}     Nh=1      Nh=3      Nh=6      Nh=12
0.001      16.86%    12.50%    12.50%    17.50%
0.1        16.66%    18.33%    16.66%    20.00%
0.3        17.50%    12.50%    10.83%    15.83%
0.5         0-2%      0-2%      0-2%      0-2%
0.7         0-2%      0-2%      0-2%      0-2%
0.9         0-2%      0-2%      0-2%      0-2%
V. CONCLUSIONS
We see that the best percentage is 22.50%. Since we used 40 terms to test the forecasting performance of the networks, a percentage of 22.50% means that 9 of the 40 predictions were correct. This result was obtained with the following values of the parameters: \eta^{+} = 1.5, \eta^{-} = 0.3, \eta_0 = 0.1, \alpha = 0.1 and Nh = 6.
We notice that better results were obtained in the experiments with \eta^{+} = 1.5, \eta^{-} = 0.3 and with \eta^{+} = 1.1, \eta^{-} = 0.7. When both \eta^{+} and \eta^{-} had large values, the percentages of correct predictions were very small, 0-2%.
In [3], we used another local adaptive learning scheme for feed-forward neural networks, the RPROP algorithm (Resilient Backpropagation) [4, 5, 6], to forecast a time series consisting of the average monthly liquid flow measured at the Broşteni hydrometric station on the Motru River, an affluent of the Jiu. The best percentage of correct predictions obtained there was 26.65%, slightly better than the best result obtained in this paper.
In this paper and in [3], we used feed-forward neural networks. Another idea is to use recurrent neural networks, that is, networks having feedback connections. These are very interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory processes. The general property making such networks interesting and potentially useful is that they manifest highly nonlinear dynamical behavior. Hence, using recurrent neural networks could improve our results.
The forecasting of the hydrological system of an area using neural networks represents a new method of investigation in the hydrological field, and we hope that it can contribute to the improvement of hydrological prognosis. We think that neural networks can be a promising alternative to classical prognosis based on statistical methods. In the future, we shall apply neural networks to forecasting other natural phenomena, too.
ACKNOWLEDGEMENTS
This paper is part of the INTAS Project no. 397: "Data Mining Technologies and Image Processing: Theory and Applications", Task 4: "The prognosis of harvest on the base of the weather- and geo-monitoring of a region".
REFERENCES
[1] S. Fahlman, "An empirical study of learning speed in back-propagation networks", Technical Report, Carnegie Mellon University, 1988.
[2] R. Jacobs, "Increased rates of convergence through learning rate adaptation", Neural Networks, vol. 1, pp. 295-307, 1988.
[3] L. Leuştean, "Liquid flow time series prediction using feed-forward neural networks and Rprop learning algorithm", Studies in Informatics and Control, vol. 10, no. 4, pp. 287-299, 2001.
[4] M. Riedmiller, "Advanced supervised learning in multi-layer perceptrons - from Backpropagation to adaptive learning algorithms", International Journal on Computer Standards and Interfaces, vol. 16, pp. 265-278, 1994.
[5] M. Riedmiller, "Rprop - description and implementation details", Technical Report, University of Karlsruhe, January 1994.
[6] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm", in Proceedings of the IEEE International Conference on Neural Networks (ICNN), San Francisco, USA, H. Ruspini, Ed., pp. 586-591, 1993.
[7] D. E. Rumelhart, G. Hinton and R. Williams, "Learning internal representations by error propagation", in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I: Foundations, D. E. Rumelhart, J. McClelland and the PDP Research Group, MIT Press, Cambridge, MA, pp. 318-364, 1986.
[8] Z. Tang and P. Fishwick, "Feed-forward neural nets as models for time series forecasting", Technical Report, University of Florida, Gainesville, 1993.
[9] T. Tollenaere, "SuperSAB: Fast adaptive backpropagation with good scaling properties", Neural Networks, vol. 3, pp. 561-573, 1990.
[10] L. Ujvary, Geografia apelor României, Scientific Ed., Bucharest, 1972 (in Romanian).
[11] ***, Monografia hidrologică a bazinului hidrografic al râului Jiu, Studii de hidrologie XV, ISCH, Bucharest, 1966 (in Romanian).
[12] ***, Anuarele hidrologice din perioada 1944-1989, I.N.M.H., Bucharest (in Romanian).