6th International Conference on Hydroinformatics - Liong, Phoon & Babovic (eds)
© 2004 World Scientific Publishing Company, ISBN 981-238-787-0
COMPARING ARTIFICIAL NEURAL NETWORKS AND
SUPPORT VECTOR MACHINES FOR MODELLING RAINFALL-RUNOFF
BARBARA CANNAS, AUGUSTO MONTISCI, ALESSANDRA FANNI
Department of Electrical and Electronic Engineering, University of Cagliari,
Cagliari, Italy
LINDA SEE
School of Geography, University of Leeds
Leeds, United Kingdom
GIOVANNI M. SECHI
Department of Land Engineering, University of Cagliari,
Cagliari, Italy
Monthly river flow forecasting is an essential part of water resource management. In this
paper, different approaches to modelling river flow are compared for the Santa Chiara
section of the Tirso Basin in Sardinia. The results indicated that no significant
improvement can be obtained with different neural models for monthly data forecasting,
although some pre-processing techniques can improve forecasting performance,
as confirmed by the literature.
INTRODUCTION
Artificial Neural Networks (ANNs) are being used increasingly in hydrology and have
been applied to a range of different areas including rainfall-runoff, water quality,
sedimentation and rainfall forecasting (Abrahart et al. [1], Baratti et al. [3]). ANNs for
forecasting river flow are almost always trained using a multi-layer perceptron (MLP)
with the backpropagation algorithm. This may be due in part to the fact that MLPs were
the first successful models to be implemented (Rumelhart et al. [10]), and because the
algorithm is simple to program and apply. However, there are now many different types
of model available, some of which may be more suited to river flow forecasting and
prediction, and hence warrant further investigation.
The aim of this paper is to compare several approaches to modelling the rainfall-runoff problem for water management purposes when only basic input variables are
available and where physical models would be too complex for practical application.
Different approaches are used including: support vector machines, time-delay neural
networks, self-organizing maps and feedforward networks trained with backpropagation.
Models are developed to predict discharge at one time step ahead using monthly
discharge at the Santa Chiara section of the River Tirso. Historical runoff, rainfall and
temperature data are available from the period 1924 to 1992. The aforementioned
approaches are then compared to results produced using a time series prediction model
and persistence. A variety of different performance measures are used in evaluating the
best approach to monthly runoff forecasting.
SUPPORT VECTOR MACHINES
Support Vector Machines (SVMs) are learning machines that can be used to solve both
classification and regression problems (Vapnik, [12] [13]). Such problems can be
expressed in general as a set of training data {(x1 ; y1) ; … ; (xN ; yN)}, where xi are the
values that describe the system under study, and yi are the corresponding target values.
The basic idea is to project the points xi into a higher dimensional space, called the
feature space, where the points can be linearly separated, in the case of classification
problems, or a linear regression can be performed, in the case of regression problems. In
both cases, only a small part of the training set directly contributes to the determination of
the machine structure.
For regression problems in particular, the goal is to find a function that has
deviations from the target values that are less than a fixed value for all the training points,
but at the same time, be as flat as possible. For a given feasible solution that exceeds the
stated deviation, only the points that violate the constraint contribute to modifying the
regression function. In the end, the optimal solution will only depend on a small part of
the training data set. The feature space is defined by referring only to this subset of
points. The final solution depends on finding a set of parameters to the optimisation
problem cost function. Such parameters affect the performance of the machine and must
be determined by means of a trial-and-error procedure.
The SVM for regression can be used to forecast a time series data set. In this
situation, the performance of the machine is expressed in terms of its generalization
capability. Two sets (training and validation) are used for parameter tuning, and then both
sets are included in the training data set during the learning procedure.
The input data to a SVM can be pre-processed using the same procedures as for
neural networks and other types of models. For time series forecasting in particular, the
mono-dimensional signal must be transformed into a multi-dimensional signal, for
example, by means of a tapped delay line.
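For illustration, this windowing step can be sketched as below; the function name and the toy series are ours, and a memory depth of 6 matches the SVM configuration used later in the paper:

```python
import numpy as np

def tapped_delay_line(series, depth):
    """Turn a 1-D series into a matrix of lagged inputs plus targets.

    Each row holds `depth` consecutive values; the target is the
    value that immediately follows the window.
    """
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + depth] for i in range(len(series) - depth)])
    y = series[depth:]
    return X, y

# Toy example: an 8-point series with memory depth 6 yields 2 training rows.
X, y = tapped_delay_line([1, 2, 3, 4, 5, 6, 7, 8], depth=6)
```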
ARTIFICIAL NEURAL NETWORKS
Time Delay Neural Networks
Locally Recurrent Neural Networks (LRNNs) have more recently been proposed for
time series forecasting. LRNNs have an MLP-based structure with synapses
characterized by taps and feedback connections (Back and Tsoi [2]), i.e. the short-term
memory mechanism is brought into the feedforward network topology, and recurrent
connections are introduced. In most processes, the advantage of using such a mixed
model, in contrast to using a pure MLP approach with lagged inputs, is that the same task
can be achieved more parsimoniously. Nevertheless, feedback connections are necessary only if the
system is characterized by long and complex temporal dynamics. Moreover, LRNNs are
general but difficult to train. For this reason, Time Delay Neural Networks (TDNNs) are
often used instead because of increased simplicity. TDNNs were originally introduced in
speech recognition (Waibel et al. [14]). The architecture of a TDNN is well suited to
modelling temporal sequences because it is invariant under translation in time or space.
TDNNs use built-in time-delay steps to represent temporal relationships. Each neuron
output is the weighted summation of the outputs of the neurons in the previous layer
over a temporal input window, passed through a non-linear function (e.g. a sigmoid).
Each time-delay step is considered equally, which means that the recent past is not
favoured over the distant past.
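A minimal sketch of this per-neuron computation follows; the window length, weights and bias below are illustrative, not values taken from the paper:

```python
import numpy as np

def tdnn_neuron(window, weights, bias):
    """Output of a single time-delay neuron: a weighted sum over a
    temporal window of previous-layer outputs, passed through a sigmoid.
    Each delay step has its own weight, but no step is privileged."""
    z = float(np.dot(weights, window) + bias)
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid non-linearity

window = np.array([0.2, 0.5, 0.1, 0.4])   # outputs at t-3 .. t (illustrative)
w = np.array([0.3, -0.1, 0.8, 0.2])       # one weight per delay step
out = tdnn_neuron(window, w, bias=0.0)
```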
Self-Organizing Maps
A self-organizing map (SOM) is a type of artificial neural network developed by
Kohonen [5], which is used more often for classification than function approximation. It
differs from the MLP in both the configuration of the neurons and in the training
algorithm. The neurons in a SOM are usually arranged in a two-dimensional grid, where
each neuron has an associated vector of weights that corresponds to the input variables.
Weights are first initialised randomly. Training then consists of selecting a data case,
determining the neuron that is closest in Euclidean distance (or another measure of
similarity) and updating the winning neuron and those within a certain neighbourhood
around the winner. This process is repeated over many iterations until a stopping
condition is reached. Training generally proceeds in two broad stages: a shorter initial
training phase in which the map reflects the coarser and more general patterns in the data
followed by a much longer fine tuning stage in which the local details of the organisation
are refined. When training is completed, the weight vectors associated with each neuron
define the partitioning of the multidimensional data.
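The training loop just described can be sketched as follows; the map size, decay schedules and Gaussian neighbourhood kernel are illustrative choices, not the exact settings used in this study:

```python
import numpy as np

def som_train_step(weights, x, t, n_iter, lr0=0.5, sigma0=2.0):
    """One SOM training step: find the winning neuron (closest in
    Euclidean distance) and pull it, and its grid neighbours, towards
    the input.  Rate and radius decay over the iterations, giving the
    coarse-then-fine two-stage behaviour described above."""
    rows, cols, _ = weights.shape
    dists = np.linalg.norm(weights - x, axis=2)        # distance per neuron
    wi, wj = np.unravel_index(np.argmin(dists), (rows, cols))
    lr = lr0 * (1.0 - t / n_iter)                      # decaying learning rate
    sigma = sigma0 * (1.0 - t / n_iter) + 1e-9         # decaying radius
    for i in range(rows):
        for j in range(cols):
            grid_d2 = (i - wi) ** 2 + (j - wj) ** 2
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # neighbourhood kernel
            weights[i, j] += lr * h * (x - weights[i, j])
    return weights

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=(4, 4, 3))   # 4x4 map, 3 input variables
x = np.array([0.2, -0.5, 0.1])
for t in range(100):                          # repeated presentations
    w = som_train_step(w, x, t, n_iter=100)
```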
OTHER MODELS
Traditional time series models can also be used in modelling river flow. ARMA
(AutoRegressive Moving Average) models were developed by Box and Jenkins [4] and
can be written as follows:
x_t = φ_0 + φ_1 x_{t-1} + φ_2 x_{t-2} + … + θ_1 a_{t-1} + θ_2 a_{t-2} + … + a_t    (1)

where x_t is the predicted value, φ_0 is the constant offset, φ_i are weights associated with
each previous observation, x_{t-i} are previous observations, θ_i are weights associated with
each previous shock, a_{t-i} are previous shocks or noise terms, and a_t is the current shock.
Persistence is the substitution of the last known value as the current prediction, and
represents a good benchmark against which other predictions can be measured.
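Persistence amounts to a one-line rule; a sketch with illustrative numbers:

```python
def persistence_forecast(observed):
    """Persistence benchmark: the forecast for month t+1 is simply the
    runoff observed in month t."""
    return observed[:-1]    # each value predicts the following month

obs = [10.0, 12.0, 9.0, 15.0]       # illustrative monthly runoff
preds = persistence_forecast(obs)   # forecasts for months 2..4
targets = obs[1:]                   # the values actually observed
```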
DATA AND STUDY AREA
Data used in this paper are from the Tirso basin, located in Sardinia, Italy, at the Santa
Chiara section. The Tirso basin is of particular interest because of its geographic
configuration and water resource management, as a dam was built in the S. Chiara section
in 1924, providing water resources for central Sardinia. The basin area is 2,082.01 km²
and is characterized by the availability of detailed data from several rainfall gauges.
Recently, a new “Cantoniera Tirso” dam was built a few kilometers downstream, creating
a reservoir with a storage volume of 780 Mm³, one of the largest in Europe.
The data available for the hydrological system are limited to monthly recorded
numerical time series associated with three basic input variables: the rainfall and
temperature at the gauging station and the runoff at the considered station.
In this study the analysis is developed on the runoff rates, which are treated as the
realization of one stochastic process that should contain all the information necessary to
characterize the basin. In previous works (Baratti et al. [3]), the authors have verified that
temperature and rainfall data do not significantly improve model performance. Hence
these data are not considered in this study.
MODELLING
The following models were developed using the first 40 years of data to train the model
(1924-1963), the next 9 years for cross-validation (1964-1972) and the remaining 20
years as an independent test data set (1973-1992), where the data were normalized
between -1 and 1 prior to training:

- SVM: The SVM simulations were performed with LIBSVM [7], a free tool available on-line. A memory depth equal to 6 was used, so the input space of the SVM has a dimension equal to 6. The ν-SVM for regression gave the best results. Most of the parameters were left at their default values, while for the others an iterative sensitivity analysis was performed. The adopted parameters have the following values: ν = 0.7; cost = 0.8; γ = 0.48. Refer to LIBSVM [7] for the meaning of these parameters.
- TDNN: The TDNN model had 3 hidden neurons and 16 input values.
- MLP: The training algorithm used was Levenberg-Marquardt. The network had 8 input nodes and 9 hidden neurons. The network input included the actual runoff, the 5 previous runoff values and the sine and cosine of the annual clock, while the network output was the runoff at t+1.
- SOM/Manual data splitting: The data were trained with SOMs of differing sizes, resulting in 4 different patterns or behaviours. The data were then divided into the 4 subsets and each subset was then trained using an MLP with t to t-3 as inputs and t+1 as output. A manual data splitting procedure was also used, in which the data at time t were split into low, medium and high flow categories. Once split, the data were then trained using MLPs and the same input window.
- ARMA model: From plots of the autocorrelation function, the best model was determined to have two autoregressive terms (lags of 1 and 2) and one moving average term (lag of 4). A single differencing operation was also undertaken but this did not improve the model results.
- Persistence: as outlined in the section on Other Models.
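The preparation step shared by these models can be sketched as follows; the synthetic runoff series is a stand-in for the real record, but the [-1, 1] normalization and the 40/9/20-year chronological split follow the paper:

```python
import numpy as np

def normalise(series, lo=-1.0, hi=1.0):
    """Linearly rescale a series into [lo, hi], as done before training."""
    s = np.asarray(series, dtype=float)
    return lo + (hi - lo) * (s - s.min()) / (s.max() - s.min())

# Stand-in for the monthly runoff record, 1924-1992 (69 years x 12 months).
runoff = np.random.default_rng(1).gamma(2.0, 10.0, size=69 * 12)
scaled = normalise(runoff)

# Chronological split matching the paper: 40 years of training data,
# 9 years of cross-validation data, 20 years of independent test data.
train = scaled[: 40 * 12]
valid = scaled[40 * 12 : 49 * 12]
test = scaled[49 * 12 :]
```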
The following measures of evaluation have been used to compare the performance of the
different models, where N is the number of observations, Oi are the actual data and Pi are
the predicted values:
Coefficient of Efficiency (Nash and Sutcliffe [8]), where Ō is the mean of the observed values:

R = 1 − [Σ_{i=1}^{N} (O_i − P_i)²] / [Σ_{i=1}^{N} (O_i − Ō)²]    (2)
The seasonal Coefficient of Efficiency following the definition in Lorrai and Sechi [6]:

R_d = [Σ_{d=1}^{D} (E_d − Ē)] / [Σ_{d=1}^{D} E_d]    (3)

where E_d = Σ_{i∈d} (P_i − O_i) and d = 1 to D months.    (4)
Root mean squared error:

RMSE = √[N⁻¹ Σ_{i=1}^{N} (O_i − P_i)²]    (5)
Mean absolute error:

MAE = N⁻¹ Σ_{i=1}^{N} |O_i − P_i|    (6)
Mean higher order error function (MS4E):

MS4E = N⁻¹ Σ_{i=1}^{N} (O_i − P_i)⁴    (7)
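These measures translate directly into code; a sketch with illustrative data (the seasonal coefficient is omitted here, as it depends on the monthly grouping):

```python
import numpy as np

def efficiency(o, p):
    """Nash-Sutcliffe coefficient of efficiency; 1 is a perfect fit,
    0 means no better than predicting the observed mean."""
    o, p = np.asarray(o, float), np.asarray(p, float)
    return 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)

def rmse(o, p):
    """Root mean squared error."""
    o, p = np.asarray(o, float), np.asarray(p, float)
    return float(np.sqrt(np.mean((o - p) ** 2)))

def mae(o, p):
    """Mean absolute error."""
    o, p = np.asarray(o, float), np.asarray(p, float)
    return float(np.mean(np.abs(o - p)))

def ms4e(o, p):
    """Mean fourth-order error; penalizes large (peak) errors heavily."""
    o, p = np.asarray(o, float), np.asarray(p, float)
    return float(np.mean((o - p) ** 4))

obs = [10.0, 20.0, 15.0, 30.0]   # illustrative observed runoff
pred = [12.0, 18.0, 15.0, 28.0]  # illustrative predictions
```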
DISCUSSION OF RESULTS
The measures of evaluation were calculated for each model and are listed in Table 1 for
the test data set (1973 to 1992).
Table 1. Measures of evaluation for each model for the testing data set (1973 to 1992)

Model               R       Rd      RMSE    MAE     MS4E x 10^6
SVM                 0.41    0.30    12.23    8.03   0.16
TDNN                0.43    0.35    11.58    7.36   0.11
MLP                 0.42    0.38    29.0    19.0    0.50
SOM                -0.04   -0.26    16.36   13.78   0.34
Data Partitioning   0.57    0.48    10.09    8.70   0.02
ARMA                0.29    0.19    13.14   10.09   0.19
Persistence         0.06   -0.06    15.44    9.47   0.38
The pure MLP solution, although providing comparable efficiencies to other solutions
(i.e. TDNN and SVM), provided the worst results overall in terms of RMSE, MAE and
MS4E. The SOM, although showing improvements in both RMSE and MAE over a pure
MLP, was the second worst performing model and had very poor efficiencies. This result
was surprising because a SOM was used successfully in a previous study to pre-process
the data prior to training (See and Openshaw [11]). Examining plots of the test data
revealed that the SOM was actually able to predict the peaks relatively well in places but
had problems predicting low flows, indicating that the low flow behaviour was not
captured adequately during the partitioning of the data with the SOM.
The SVM and TDNN had comparable performances and were better than both a
traditional linear time series model (i.e. ARMA) and persistence. However, they did not
result in improvements in overall efficiency relative to a pure MLP.
The manual data partitioning technique was used in order to divide the data at time t
into low, medium and high flows, prior to training with individual MLPs. At least one
network was therefore concentrating on low flow data, which was a clear deficiency with
the SOM approach. The result was a definitive improvement in efficiency as well as
slight improvements in the other evaluation measures. Efficiencies were also calculated
for the three subsets (low, medium and high flow predictions). Interestingly, they were
poor for the low flow predictions and 90% for the medium and high flow predictions,
further emphasizing the low flow data problem encountered in the SOM approach.
Although the data partitioning technique resulted in the best overall results, it did involve
the training of 3 different MLPs, thus requiring higher computational effort.
CONCLUSIONS
In this paper, different neural network models were implemented to forecast runoff in a
Sardinian basin. The results showed that most of the neural network
models could be useful in constructing a tool to support the planning and management of
water resources. The measures of efficiency obtained with the different models, although
significantly greater than those obtained with traditional autoregressive models, were still
only around 40%. A sizeable increase was obtained when the input data were manually
partitioned into low, medium and high flows before training with 3 individual MLPs,
indicating that this pre-processing technique warrants further investigation. In fact, it
should be noted that in general, and in Sardinian basins in particular, rainfall and
runoff time series present high non-linearity and non-stationarity, and neural network
models may not be able to cope with these two different aspects if no pre-processing of
the input and/or output data is performed. Wavelet transformation and multiresolution
analysis, for example, have been applied successfully to time series analysis when the
signals to be processed are characterized by non-stationarity (Nason et al. [9]), and this
represents an area of ongoing research.
ACKNOWLEDGEMENTS
The authors would like to thank the British Council/Ministero dell’Istruzione,
dell’Università e della Ricerca in Italy for the funding provided through the British
Council–MIUR/CRUI Agreement 2002-2003.
REFERENCES
[1] Abrahart, R.J., Kneale, P.E., See, L. “Neural Networks in Hydrology”, A.A.
Balkema, Rotterdam, (in press).
[2] Back, D., Tsoi, A.C., “FIR and IIR synapses, a new neural network architecture for
time series modeling”, Neural Computation, Vol. 3, Massachusetts Institute of
Technology, (1991), pp. 375-385.
[3] Baratti, R., Cannas, B., Fanni, A., Pintus, M., Sechi, G.M., Toreno, N., “River flow
forecast for reservoir management through neural networks”, Neurocomputing, Vol.
55, (2003), pp. 421-437.
[4] Box, G.E.P., Jenkins, G.M. “Time Series Analysis: Forecasting and Control”.
Oakland, CA, Holden-Day, (1976).
[5] Kohonen, T., “Self-organization and associative memory”, Springer-Verlag, Berlin,
(1984).
[6] Lorrai, M., Sechi, G.M., “Neural nets for modeling rainfall-runoff transformations”,
Water Resources Management, Vol. 9, (1995), pp. 299-313.
[7] LIBSVM, A Library for Support Vector Machines, (2003),
http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[8] Nash, J.E., Sutcliffe, J.V., “River flow forecasting through conceptual models, I: A
discussion of principles”, Journal of Hydrology, Vol. 10, (1970), pp. 282-290.
[9] Nason, G. P., and von Sachs, R., “Wavelets in time series analysis”, Phil. Trans.
Roy. Soc. A, Vol. 357, (1999), pp. 2511 – 2526.
[10] Rumelhart, D.E., Hinton, G.E., Williams, R.J. “Learning internal representations by
error propagation”. In: Rumelhart, D.E. & McClelland, J.L. (eds.) Parallel
distributed processing: explorations in the microstructure of cognition, Vol.1.
Cambridge MA, MIT Press, (1986), pp. 318-362.
[11] See, L. and Openshaw, S. “A hybrid multi-model approach to river level
forecasting”. Hydrological Sciences Journal, Vol. 45, (2000), pp.523-536.
[12] Vapnik, V. “The Nature of Statistical Learning Theory”, Springer, NY, (1995).
[13] Vapnik, V. “Statistical Learning Theory”, Wiley, NY, (1998).
[14] Waibel, A.H., Hanazawa, T., Lang, K.J., Hinton, G., Shikano, K., “Phoneme
recognition using time-delay neural networks”, IEEE Transactions on Acoustics,
Speech, and Signal Processing, Vol. 37, (1989), pp. 328-339.