

Improved Water Level Forecasting Performance by Using Optimal Steepness
Coefficients in an Artificial Neural Network
Article in Water Resources Management · August 2011
DOI: 10.1007/s11269-011-9824-z
4 authors:
Muhammad Sulaiman, Universiti Malaysia Pahang
Ahmed El-Shafie, Universiti Kebangsaan Malaysia
Othman A Karim, Universiti Kebangsaan Malaysia
Hassan Basri, Universiti Kebangsaan Malaysia
Water Resour Manage (2011) 25:2525–2541
DOI 10.1007/s11269-011-9824-z
Improved Water Level Forecasting Performance
by Using Optimal Steepness Coefficients
in an Artificial Neural Network
Muhammad Sulaiman · Ahmed El-Shafie ·
Othman Karim · Hassan Basri
Received: 20 August 2010 / Accepted: 4 April 2011 /
Published online: 24 May 2011
© Springer Science+Business Media B.V. 2011
Abstract Developing water level forecasting models is essential in water resources
management and flood prediction. Accurate water level forecasting helps achieve
efficient and optimum use of water resources and minimize flooding damages. The
artificial neural network (ANN) is a computing model that has been successfully
tested in many forecasting studies, including river flow. Improving the ANN computational approach could help produce accurate forecasting results. Most studies
conducted to date have used a sigmoid function in a multi-layer perceptron neural
network as the basis of the ANN; however, they have not considered the effect of
sigmoid steepness on the forecasting results. In this study, the effectiveness of the
steepness coefficient (SC) in the sigmoid function of an ANN model designed to
test the accuracy of 1-day water level forecasts was investigated. The performance of
data training and data validation was evaluated using two statistical indices, the efficiency
coefficient and the root mean square error. The weight initialization was fixed at 0.5 in
the ANN so that even comparisons could be made between models. Three hundred
rounds of data training were conducted using five ANN architectures, six datasets
and 10 steepness coefficients. The results showed that the optimal SC improved the
forecasting accuracy of the ANN data training and data validation when compared
with the standard SC. Importantly, the performance of ANN data training improved
significantly with utilization of the optimal SC.
Keywords Artificial neural networks · Sigmoid function · Steepness coefficient ·
Water level forecasting
M. Sulaiman (B)
Faculty of Civil Engineering and Natural Resources, University Malaysia Pahang,
Pahang, Malaysia
e-mail: [email protected]
A. El-Shafie · O. Karim · H. Basri
Department of Civil Engineering, Faculty of Engineering, National University of Malaysia,
Bangi, Malaysia
1 Introduction
River flow forecasting is essential in water resources management because it can
facilitate the management of water resources, thereby optimizing the use of water.
The ability to forecast river flow also helps predict the occurrence of future flooding,
enabling better preparation to avoid the loss of lives and minimize property damage.
Forecasting studies normally require a series of historical datasets so that future
events can be predicted based on past events. It is vital for local agencies such as the
water authority to maintain good quality river flow data to facilitate reliable river
flow forecasting.
The artificial neural network (ANN) is a computing model that can solve nonlinear problems from a series of sample datasets. ANN computing is based on the
way the human brain processes information. The ANN can define hidden patterns
within sample datasets and forecast based on new dataset inputs. Thus, in many
areas of study such as forecasting river flow, the ANN does not require defined
physical conditions of the subject, which in this case is the river. Other problems
such as regression, classification, prediction, system identification, feature extraction
and data clustering can also be solved through ANN computing.
ANNs have been widely applied in many areas including the financial, mathematical, computer, medicinal, weather forecasting and engineering fields. In water
resources studies, ANNs are employed to forecast daily river flow (Atiya et al.
1999; Coulibaly et al. 2000; Ahmed and Sarma 2007; El-Shafie et al. 2008; Wu et al.
2009), water levels (Bustami et al. 2007; Leahy et al. 2008), flood events (Tareghian
and Kashefipour 2007; Kerh and Lee 2006), rainfall runoff patterns (Chiang et al.
2004; Agarwal and Singh 2004; Rahnama and Barani 2005), reservoir optimization
(Cancelliere et al. 2002; Chandramouli and Deka 2005) and sedimentation (Cigizoglu
and Kisi 2006; Rai and Mathur 2008). For a general review of the application of
ANN in hydrology, refer to ASCE (2000a, b). The objective in studies of ANN
forecasting is to identify the best forecasting model that can provide the most
accurate forecasting results possible. This is achieved by modifying key components
in the ANN during data training: the number and type of data inputs, the number of
hidden layers and neurons, the activation transfer function, and the optimization
method used to determine the neuron weights.
In river flow forecasting, data inputs can be generated from historical river flow
data, rainfall, precipitation and sedimentation. El-Shafie et al. (2007, 2008, 2009)
conducted river flow forecasting of the Nile River using river flow data obtained
from a single station. Turan and Yurdusev (2009) used a multiple upstream river
flow station to forecast river flow. Chiang et al. (2004) and Rahnama and Barani
(2005) included rainfall and runoff data when conducting river flow forecasting,
and Zealand et al. (1999) included precipitation, rainfall and flow data in their
forecasting study. Fernando et al. (2005) suggested several methods to identify the
proper inputs to a neural network. Other studies (Alvisi et al. 2006; Toth and Brath
2007) have investigated the effects of the number and type of inputs on the ANN
Once data inputs have been determined, selection of the number of hidden
layers and neurons plays a vital role in achieving the best forecasting performance.
Determination of the number of layers and neurons used in forecasting studies
has generally been based on a trial-and-error approach (Coulibaly et al. 2000;
Joorabchi et al. 2007; Solaimani and Darvari 2008; Turan and Yurdusev 2009). Many
studies have shown that one hidden layer is sufficient. Indeed, Hornik et al. (1989)
found that multilayer feed forward networks with one hidden layer were capable of
approximating to any desired degree of accuracy provided sufficient hidden units
were available. Zhang et al. (1998) reviewed published studies of ANN and found
that a single hidden layer is most popular and widely used in the layer selection. Two
hidden layers are also able to produce the best forecasting performance in certain
problems (Barron 1994). However, selection of the number of ANN neurons is still
based on trial and error (Chauhan and Shrivastava 2008).
Activation transfer function (ATF) is the main computing element of ANN and
plays an important role in achieving the best forecasting performance. The most
common type of activation transfer function is the sigmoid function (Zhang et al.
1998). However, several studies have used different types of ATFs within the ANN to
improve the forecasting performance. Shamseldin et al. (2002) used logistic, bipolar,
hyperbolic tangent, arc-tan and scaled arc-tan to explore the potential improvement
of ANN forecasting. Joorabchi et al. (2007) applied log-sigmoid and hyperbolic tangent sigmoid transfer functions to produce their output. Han et al. (1996) introduced
optimization of the variant sigmoid function using a genetic algorithm to optimize
the ANN convergence speed and generalization capability.
Many researchers have studied the exterior architecture of the ANN using a trial
and error approach. Others have investigated the interior architecture of the ANN,
but have been limited to testing different types of ATF in the ANN architecture. The
sigmoid function, which is the most commonly used computing function for ATF, has
been widely used in the ANN because of its ability to influence the performance of
the ANN. This study was conducted to evaluate the effectiveness of the steepness
coefficient in a sigmoid function for improving ANN data training and forecasting in
a river flow study. Additionally, the effectiveness of the optimal steepness coefficient
approach for the exterior architecture in river flow forecasting was compared with
that of the traditional approach based on trial and error. The improved performance
of the ANN water level forecasting could assist water authorities in managing water
2 Methodology
In this study, sigmoid functions with different steepness coefficients were evaluated
as activation transfer functions in an ANN to improve water level forecasting at
Rantau Panjang Station, Johor Baru, Malaysia. The performance of the ANN data
training and data validation was measured using the statistical index Nash-Sutcliffe
efficiency coefficient (NS). The Root Mean Square Error (RMSE) was also used to
measure the accuracy of the data forecasting performance.
2.1 Artificial Neural Network
The ANN is a non-linear mathematical computing model that can solve arbitrarily
complex non-linear problems such as time series data forecasting. The ANN can
identify hidden patterns in sample datasets and forecast based on these patterns.
ANN computing is based on the way human brains process information. Indeed,
the ANN is a non-linear solver that needs to compute constant terms or unknown
parameters before the x’s or data input can be computed. The general ANN nonlinear equation can be defined as follows:
$$y_1, \ldots, y_k = f(x_1, \ldots, x_m, c_1, \ldots, c_n) \qquad (1)$$
where y is the forecasted output, k is the number of forecasted outputs, x is the
input, m is the number of data input, c is a constant term and n is the number of
constant terms. The constant term c is referred to as the weight in ANN modeling.
y can be computed if m, the c values and n are known. The x values are data
input. To determine the m, c values and n, it is necessary to understand the ANN architecture.
The architecture of ANN is dependent on the type of ANN that is used. In this
study, a multi-layer perceptron back-propagation (MLP-BP) neural network, which
is the most common type of ANN used in forecasting studies, was employed. The
reason for the popularity of the MLP-BP is its simplicity, easy implementation and
demonstrated success in forecasting studies. Figure 1 provides an example of the
MLP-BP architecture. There are three layers in MLP-BP, an input layer, a hidden
layer and an output layer. There can be only one input layer and one output layer;
however, there can be more than one hidden layer. Each layer contains stored
neurons, with the total number of neurons being determined by the user. The number
of neurons in the input layer is equivalent to the number of data inputs selected in a
study. The number of output neurons in the output layer is generally one, resulting
in one forecasted output. The number of neurons in the hidden layer is subject to
user selection. Most often, the number of input neurons and hidden neurons are
determined based on trial and error. An additional dummy neuron known as a bias
neuron that holds a value of 1 is added to the input layer and each hidden layer.
The bias neuron acts as a threshold value so that the value of the forecasted output
falls between 0 and 1. The neurons are interconnected between layers as shown in
the figure. The direction of the link is from the input layer to the output layer, and
these links represent the computational process within the ANN architecture. The
actual computational process occurs in the neuron, in which the activation transfer
function assigned to the neuron is used to compute the incoming value and produce
an output. In the input neurons, the activation transfer function is a linear transfer
function that relays the incoming data input as output. In the hidden and output
neurons, the activation transfer function computes the incoming value and produces
an output; the function commonly used there is a sigmoid function. Figure 2 shows a
neuron with a sigmoid function that computes incoming data and produces an output.
The data received by the neuron is the weighted sum of the outputs of the neurons in
the previous layer.
Fig. 1 Multi-layer perceptron back-propagation neural network in the study
Fig. 2 An active neuron with sigmoid function and threshold input
Finally, the weight c values can be determined by data training. Data training can
be accomplished when the number of neurons in each layer is defined, an activation
transfer function is assigned to the neurons in each layer, and training data are
available. The data training process involves two parts of computation, feed-forward
and back propagation. In the feed-forward computation, computing starts from the
input layer and proceeds to the output layer based on the links and ATF in the ANN
architecture. The output of the feed-forward computation is the forecasted value. In
the data training process, performance is evaluated based on the model output value
with respect to the observed value. If the measured performance does not achieve
the target performance, a back-propagation computation takes place. The backpropagation computation is a process of adjusting the weights in the architecture
based on the gradient descent method. The process of the weight adjustment starts
from the output layer and proceeds backward toward the input layer. The process
of feed-forward and back-propagation continues until the performance target is
achieved. Once the data training process is completed, the weights of the c values are
determined. Then, the forecasting process proceeds based on the single feed-forward
computation utilizing the input data as shown in the model architecture. The results
are then evaluated by applying performance measures to the observed data versus
the model output values.
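The feed-forward and back-propagation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes one hidden layer, a squared-error loss, per-sample (online) gradient descent, and a sigmoid with steepness coefficient k in the hidden and output neurons. The function names, the learning rate `lr` and the helper `mean_squared_error` are hypothetical; the fixed initial weight of 0.5 and the 2000 epochs follow the settings reported in this paper.

```python
import math

def sigmoid(x, k=1.0):
    """Sigmoid activation with steepness coefficient k."""
    return 1.0 / (1.0 + math.exp(-k * x))

def feed_forward(x, w_hidden, w_out, k=1.0):
    """Return (hidden activations, output) for one input vector x.

    The last weight in each row belongs to the bias neuron, whose value is 1.
    """
    xb = list(x) + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)), k) for row in w_hidden]
    hb = hidden + [1.0]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hb)), k)
    return hidden, out

def mean_squared_error(samples, w_hidden, w_out, k=1.0):
    """Average squared difference between targets and model outputs."""
    errors = [(t - feed_forward(x, w_hidden, w_out, k)[1]) ** 2 for x, t in samples]
    return sum(errors) / len(errors)

def train(samples, n_in, n_hidden, k=1.0, lr=0.5, epochs=2000, w0=0.5):
    """Train a one-hidden-layer network; every weight starts at w0."""
    w_hidden = [[w0] * (n_in + 1) for _ in range(n_hidden)]
    w_out = [w0] * (n_hidden + 1)
    for _ in range(epochs):
        for x, target in samples:
            hidden, out = feed_forward(x, w_hidden, w_out, k)
            # Back-propagation starts at the output neuron; the sigmoid
            # derivative is k * y * (1 - y).
            delta_out = (out - target) * k * out * (1.0 - out)
            # Hidden deltas use the output weights before they are updated.
            delta_hidden = [delta_out * w_out[j] * k * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            hb = hidden + [1.0]
            for j in range(n_hidden + 1):
                w_out[j] -= lr * delta_out * hb[j]
            xb = list(x) + [1.0]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xb[i]
    return w_hidden, w_out
```

Because every weight starts at the same value, the hidden neurons in this sketch receive identical updates; that is a consequence of fixed initialization in general, not a claim about the results reported here.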
2.2 Sigmoid Function
The activation transfer function (ATF) forces incoming values to range between 0
and 1 or between −1 and 1, depending on the type of function used. The most commonly used
ATF in the MLP-BP is the sigmoid function. The sigmoid function is a differentiable
function in which the gradient method can be applied to the ANN to adjust the ANN
weights so that the model output and observed data reach a target performance value
during data training. The sigmoid function is defined as follows:
$$y = \frac{1}{1 + e^{-kx}} \qquad (2)$$
where y is the sigmoid value, k is the sigmoid steepness coefficient and x is the data
or incoming values. As in the case of the neural network, the incoming values are the
summation of the input and weight values. Additional input with a value of 1 and its
weights are added as a threshold value so that the computed sigmoid function will
result in an activation value between 0 and 1.
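To see how the steepness coefficient enters the gradient method, it helps to differentiate the sigmoid function. The short derivation below follows directly from the definition of y given above and is added as an illustration; it is not part of the original text.

```latex
\begin{align*}
y &= \frac{1}{1 + e^{-kx}} \\
\frac{dy}{dx} &= \frac{k\, e^{-kx}}{\left(1 + e^{-kx}\right)^{2}}
              = k\, y\, (1 - y) \\
\left.\frac{dy}{dx}\right|_{x=0} &= \frac{k}{4}
\end{align*}
```

The slope at x = 0 is therefore proportional to k, so a milder steepness coefficient (k < 1) flattens the curve and scales down the gradient used in the weight adjustment by the same factor.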
2.3 Case Study Area and Dataset
The case study area in the present study was Rantau Panjang station, which is located
along the bank of the Johor River (Fig. 3). More than 12 flood events have occurred
in Rantau Panjang since 1963. All occurrences of flooding were caused by river flow
from the upper stream and heavy rainfall during the Northeast monsoon, which is
between December and January (except in 1964). The Johor River, which is located
at the central part of southern Johor, originates from Mount Gemuruh (109 m) and
Mount Belumut (1,010 m). The river length is about 122.7 km with a drainage area of
2,636 km2 . The river flows through the southern part of Johor and discharges into the
Straits of Johor (0 m). The main tributaries of the river are the Linggiu and Sayong
Fig. 3 The Johor River basin
Data for this study were provided by the Department of Irrigation and Drainage,
Ampang, Selangor. The data consist of 45 years of historical hourly water level data
obtained from 1963 to 2007. The collected data were divided into two categories:
training data and validation data. Due to missing daily data in the historical records,
the training and validation data were divided into smaller groups of continuous
datasets. Six water level datasets were extracted from 1963 to 1986 for data training,
and a single dataset with 1,144 daily water level data points was extracted from 1994 to
2007 for data validation. The description of the training and validation datasets is
shown in Table 1. The datasets in this study were normalized based on the following
equation:
$$N = \frac{O_i - O_{\min}}{O_{\max} - O_{\min}} \qquad (3)$$
where N is the normalized value, O is the observed value, Omax is the maximum
observed value and Omin is the minimum observed value.
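The normalization can be written as a short Python helper. This is a minimal sketch; the function names are hypothetical, and the inverse transform `denormalize` is an addition for convenience (for example, to convert forecasts back to water level units) that the paper does not describe.

```python
def normalize(values):
    """Scale observed values to the range 0 to 1 (min-max normalization)."""
    o_min, o_max = min(values), max(values)
    if o_max == o_min:
        raise ValueError("all observed values are equal; cannot normalize")
    return [(o - o_min) / (o_max - o_min) for o in values]

def denormalize(n, o_min, o_max):
    """Map a normalized value back to the original units (assumed helper)."""
    return n * (o_max - o_min) + o_min
```

For example, `normalize([2.0, 4.0, 6.0])` gives `[0.0, 0.5, 1.0]`.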
2.4 Implementation
A total of 10 sigmoid functions based on 10 different steepness coefficients (Fig. 4;
Table 2) were used as the activation transfer functions in the hidden and output
neurons. The steepness coefficients ranged from 0.025 to 1. The steepness coefficient
with a value of one, which is commonly used as the default steepness coefficient
in sigmoid functions, was used as the Standard Steepness Coefficient (SSC). The
other nine steepness coefficients ranging from 0.025 to 0.7 were referred to as Milder
Steepness Coefficients (MSC) and numbered from 1 to 9. As an example, MSC_1
refers to a steepness coefficient of 0.7. Pre-analysis of a steepness coefficient greater
than one shows poor performance in data training and validation of the water level;
thus, it is not described in this paper.
Previous studies (Atiya et al. 1999; Solaimani and Darvari 2008) have revealed
that a single layer is sufficient for forecasting river flow. Therefore, in this study,
an MLP-BP with one hidden layer was selected. In the ANN model tested in this
study, 3, 4, 5, 6, and 7 input data values were investigated. The number of input data
refers to the number of multi-lead days ahead of the forecasting day. The number of
input neurons was the same as the number of data input, while the number of output
neurons was one, referring to 1 day ahead of water level forecasting. The number
of neurons in the hidden layer was equal to the number of input neurons. Based on
these findings, five MLP-BP architectures ranging from Net1 to Net5 were developed
(Table 3). The MLP-BP architecture used in this study is shown in Fig. 1.
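One way to build the training pairs for these architectures is a sliding window over the daily series. The sketch below assumes that the inputs are the water levels of the n consecutive days immediately preceding the forecast day; the paper's wording on this point is brief, so this reading, and the function name `make_windows`, are assumptions.

```python
def make_windows(series, n_inputs):
    """Return (inputs, target) pairs for 1-day-ahead forecasting.

    Each input is a list of n_inputs consecutive daily values and the
    target is the value on the following day (assumed lag structure).
    """
    pairs = []
    for start in range(len(series) - n_inputs):
        inputs = list(series[start:start + n_inputs])
        target = series[start + n_inputs]
        pairs.append((inputs, target))
    return pairs
```

With three inputs, a series of five daily values yields two training pairs; with seven inputs, the same series yields none, which is why continuous datasets without gaps are needed.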
The goal of ANN data training is to determine the weights in the ANN architectures by processing sets of data input from historical data through iteration.
Generally, the weights in the network model are initialized with random values
between 0 and 1. This method helps achieve better forecasting performance
when compared with conditions in which the initial weights are initialized with fixed
values. However, to enable equal data forecasting comparisons between different
network models, the initial weights in all network models in this study were initialized
with a fixed value of 0.5. By doing so, each data training process started from the
same weight values, thereby enabling even comparisons of the forecasting
performance of different network models. Another parameter that needs to be fixed
is the number of epochs, so that the forecasting performance of the different network
models can be compared on an equal basis. Thus, an early stopping procedure (ASCE 2000a), which is the best
approach to stopping data training, was not implemented in this study.
Table 1 Training and forecasting datasets used in the study (columns: Date from, Date to, Water level (days); rows: six data training datasets and one data forecasting dataset)
Fig. 4 Sigmoid function with ten different steepness coefficients. MSC milder steepness coefficient
In summary, 300 data training models based on six training datasets, 10 steepness coefficients and five ANN architectures were tested. The ANN weights were
initialized with a fixed value of 0.5, and 2000 epochs were used for the data training.
The model performance was tested based on a single validating dataset generated
from the best known training performances of the SSC and the optimal MSC of the
six training datasets. Comparisons of the observed and forecasting water levels were
charted to show the results of the data validation.
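The count of 300 training runs follows from the full combination of datasets, steepness coefficients and architectures, which can be enumerated as below. The steepness values are those listed in Table 2. The mapping of Net1 to Net5 onto 3 to 7 input neurons is an assumption based on the order in which the text lists them, and the variable names are hypothetical.

```python
from itertools import product

DATASETS = [f"DS_{i}" for i in range(1, 7)]

# Steepness coefficients from Table 2 (SSC is the standard value of 1).
STEEPNESS = {
    "SSC": 1.0, "MSC_1": 0.7, "MSC_2": 0.5, "MSC_3": 0.35, "MSC_4": 0.25,
    "MSC_5": 0.17, "MSC_6": 0.1, "MSC_7": 0.075, "MSC_8": 0.05, "MSC_9": 0.025,
}

# Assumed mapping: Net1..Net5 use 3..7 inputs, with as many hidden neurons.
ARCHITECTURES = {f"Net{i}": n for i, n in enumerate(range(3, 8), start=1)}

EPOCHS = 2000          # fixed number of epochs, no early stopping
INITIAL_WEIGHT = 0.5   # fixed weight initialization

runs = [
    {"dataset": ds, "steepness": sc, "k": STEEPNESS[sc],
     "network": net, "n_inputs": ARCHITECTURES[net]}
    for ds, sc, net in product(DATASETS, STEEPNESS, ARCHITECTURES)
]
```

Here `len(runs)` is 6 × 10 × 5 = 300, one entry per trained model.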
2.5 Performance Evaluation
Table 2 The ten steepness coefficients in the sigmoid function
Steepness coefficient    Sigmoid function
SSC                      1/(1 + e^(−x))
MSC_1                    1/(1 + e^(−0.7x))
MSC_2                    1/(1 + e^(−0.5x))
MSC_3                    1/(1 + e^(−0.35x))
MSC_4                    1/(1 + e^(−0.25x))
MSC_5                    1/(1 + e^(−0.17x))
MSC_6                    1/(1 + e^(−0.1x))
MSC_7                    1/(1 + e^(−0.075x))
MSC_8                    1/(1 + e^(−0.05x))
MSC_9                    1/(1 + e^(−0.025x))
Table 3 The five ANN architectures (Net1 to Net5)
Data training performance in this study was evaluated using the Nash-Sutcliffe
coefficient of efficiency (NS). The efficiency NS is a statistical index widely used
to describe the forecasting accuracy of hydrological models. An NS value of
one indicates a perfect fit between the modeled and observed values. In addition, an
NS value greater than 0.9 indicates satisfactory performance, while
a value greater than 0.95 indicates good performance between the two data
series. The NS is defined as:
$$NS = 1 - \frac{\sum_{i=1}^{N} (O_i - F_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2} \qquad (4)$$
where O is the observed value, F is the forecasted value, $\bar{O}$ is the mean of the
observed values and N is the number of data being evaluated. The model performance
was measured using the Nash-Sutcliffe coefficient of efficiency and root mean square error (RMSE). The RMSE
describes the average magnitude of error of the observed and forecasted values and
is defined as:
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (O_i - F_i)^2} \qquad (5)$$
where the terms above are the same as in Eq. 4. The units for RMSE in this study are
in millimeters.
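Both performance measures are straightforward to compute from paired observed and forecasted values. The following is a minimal sketch using only the Python standard library; the function names are hypothetical.

```python
import math

def nash_sutcliffe(observed, forecast):
    """Nash-Sutcliffe coefficient of efficiency (NS); 1 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def rmse(observed, forecast):
    """Root mean square error, in the units of the input data."""
    sse = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    return math.sqrt(sse / len(observed))
```

As a small check, observed values `[1, 2, 3, 4]` and forecasts `[1, 2, 3, 5]` give a sum of squared errors of 1 and a sum of squared deviations from the mean of 5, so NS = 0.8 and RMSE = 0.5.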
3 Results and Discussion
The data training and data validation results for the five ANN architectures trained
with six datasets, 10 different steepness coefficients in sigmoid function and 2000
epochs are presented herein.
Figure 5 shows the data training performance for the five ANN architectures in
which each of the six datasets was trained with 10 steepness coefficients. Datasets
trained with an MSC generally performed better than datasets trained with the SSC,
and the milder coefficients MSC_3 to MSC_9 tended to perform better than the SSC,
MSC_1 and MSC_2. However, this does not mean that a milder steepness coefficient
always produces better training performance. Figure 6 shows an enlargement of Fig. 5e for
data training performance greater than 0.9. The figure shows a variation in dataset
performance above 0.9 for Net5, which is also observed for Net1 to Net4. Dataset
DS_1 shows decreasing performance starting from MSC_3, which demonstrates that
a milder steepness coefficient is not necessarily better. Datasets DS_2, DS_4 and
DS_6 indicate that the best performance occurred with MSC_9, which was the
mildest MSC tested. In fact, the optimal MSC
could occur anywhere from MSC_3 to MSC_9. As shown in Fig. 6, datasets DS_1,
DS_3 and DS_6 showed strong data training performance, with NS values for the
SSC training close to or greater than 0.9, while datasets
DS_2, DS_4 and DS_5 showed poor data training performance, with NS values
below 0.85 and occasionally lower than 0.7. However,
for the datasets with poor training performance, the NS values improved to greater than 0.85,
and most to greater than 0.9, when a steepness coefficient in the range of MSC_4
to MSC_9 was used. The results shown in Figs. 5 and 6 indicate that an optimal
steepness coefficient existed in all ANN architectures and datasets tested, which gave
better data training results than
the standard steepness coefficient. Indeed, even bad training datasets improved
significantly to an NS value above 0.9, which can be considered satisfactory.
Fig. 5 Data training performances of the six datasets with five different network models and ten
steepness coefficients. Panels: (a) Net1, (b) Net2, (c) Net3, (d) Net4, (e) Net5
Fig. 6 Enlargement of data training performance for Net5. DS dataset. MSC milder steepness coefficient
Figure 7 summarizes the data training performances from Fig. 5 for five sets
of ANN architectures that were trained with the SSC and another five sets of ANN
architectures that were trained with the optimal MSC for each of the datasets. The
optimal MSC refers to the best data training performance among the nine MSCs
for the specific dataset and ANN architecture. For data training with the SSC, the
NS values achieved for each of the six datasets varied with the ANN architecture.
For example, for dataset DS_1, the
NS ranged from 0.893 to 0.922. In the trial and error method, if the ANN
architecture is to be selected based on dataset DS_1 for data validation, then the
ANN architecture Net3 will be used. This differs from data training using the optimal
MSC, in which the data training performances for the different ANN architectures are
all approximately 0.940. Thus, any ANN architecture trained with the optimal MSC can be used
for data validation. Another example is dataset DS_2, for which the data training
performance varies from 0.100 to 0.790. ANN architecture Net5 will be selected for
validation in the trial and error method, but for data training with the optimal
MSC, any ANN architecture can be used in data validation since the results for the
different ANN architectures are all approximately 0.908. These findings indicate that the choice of
ANN architecture has little effect on the optimal data training performance. Instead, the
optimal steepness coefficient that seems to exist in all ANN architectures strongly
influences the data training performance results. The same conclusion holds for
datasets DS_3 to DS_6. The figures and tables show that the optimal MSC is a better
approach than the trial and error method and that it produces a better result than
the SSC. These findings may help guide future ANN studies, in which
use of the optimal steepness coefficient method could be a better approach than trial
and error for identifying the optimal data training performance.
Fig. 7 Comparison of data training performances based on the standard steepness coefficient (SSC)
and optimal milder steepness coefficients (MSC)
Table 4 shows the data training performance for the six datasets with the best
SSC and the optimal MSC, based on data training using the five ANN architectures and
10 steepness coefficients. Figure 7 shows that the MSC improves on the SSC for all
datasets. In more detail, Table 4 shows that for the datasets with relatively good
SSC performance (DS_1, DS_3 and DS_6), the NS improved by between 0.9 and 2
percentage points (0.009 to 0.020). For the datasets with relatively poor
SSC performance (DS_2, DS_4 and DS_5), the NS improved substantially, by between
12 and 14 percentage points. Such improvements in model performance are significant,
since an NS greater than 0.9 is considered satisfactory performance.
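The size of these improvements can be checked with a few lines of arithmetic. The sketch below copies the NS values from Table 4 and reports each gain in percentage points of NS (the difference multiplied by 100); reading the quoted percentages as absolute differences rather than relative changes is an interpretation, and the variable names are hypothetical.

```python
# NS values copied from Table 4 (best SSC and best optimal MSC per dataset).
best_ssc = {"DS_1": 0.922, "DS_2": 0.790, "DS_3": 0.938,
            "DS_4": 0.794, "DS_5": 0.805, "DS_6": 0.938}
best_msc = {"DS_1": 0.942, "DS_2": 0.915, "DS_3": 0.952,
            "DS_4": 0.921, "DS_5": 0.940, "DS_6": 0.947}

# Gain in percentage points of NS for each training dataset.
gain_points = {ds: round((best_msc[ds] - best_ssc[ds]) * 100, 1)
               for ds in best_ssc}

good = [gain_points[ds] for ds in ("DS_1", "DS_3", "DS_6")]
poor = [gain_points[ds] for ds in ("DS_2", "DS_4", "DS_5")]
print("good datasets:", min(good), "to", max(good))
print("poor datasets:", min(poor), "to", max(poor))
```

Under this reading, the gains for DS_1, DS_3 and DS_6 fall between about 0.9 and 2.0 points, and those for DS_2, DS_4 and DS_5 between about 12.5 and 13.5 points, in line with the ranges quoted in the text.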
Table 4 Data training performances (NS) of the six training datasets
Dataset    Best SSC        Best optimal MSC
DS_1       0.922 (Net1)    0.942 (Net5:MSC_3)
DS_2       0.790 (Net5)    0.915 (Net5:MSC_9)
DS_3       0.938 (Net1)    0.952 (Net5:MSC_4)
DS_4       0.794 (Net1)    0.921 (Net5:MSC_9)
DS_5       0.805 (Net1)    0.940 (Net5:MSC_7)
DS_6       0.938 (Net5)    0.947 (Net5:MSC_9)
Table 5 shows the model performance on the validation dataset (DS_7) for the data
training models with the best SSC and the optimal MSC. With the SSC (Table 5,
column 2), the NS values for DS_7 were generally consistent with the training
values for DS_1 to DS_6 (Table 4, column 2). For datasets
DS_2 and DS_5, however, the SSC-based validation performance was better than
the data training performance. Model performance based on the SSC varied widely from
0.615 to 0.930; however, the forecasting performance of the relatively good performance
datasets (DS_1, DS_3 and DS_6) based on the optimal MSC had similar values of
0.938. Likewise, the relatively poor performance datasets (DS_2, DS_4 and DS_5)
showed consistent performance values of 0.921 to 0.928. These findings indicate that
the optimal forecasting performance was achieved by the different good performance
datasets (DS_1, DS_3 and DS_6) using the optimal MSC approach.
Table 5 Forecasting performances (NS) of dataset DS_7
Training dataset    Best SSC        Best optimal MSC
DS_1                0.913 (Net1)    0.938 (Net5:MSC_3)
DS_2                0.873 (Net5)    0.928 (Net5:MSC_9)
DS_3                0.923 (Net1)    0.938 (Net5:MSC_4)
DS_4                0.615 (Net1)    0.924 (Net5:MSC_9)
DS_5                0.867 (Net1)    0.924 (Net5:MSC_7)
DS_6                0.930 (Net5)    0.938 (Net5:MSC_9)
RMSE (mm): values not recoverable
Figure 8 shows the model performance for DS_7 based on data training using
dataset DS_4 and the SSC, while Fig. 9 shows the model performance for DS_7 based
on data training using dataset DS_4 and the optimal MSC (MSC_9). DS_4 was
chosen as the training dataset because it gave the worst data forecasting
performance with the SSC but improved drastically with the optimal
MSC. The figures show a comparison of the observed and forecasted water levels and
a scatter plot of the data forecasting performance. In the scatter plot
in Fig. 8, most data points lie away from the line of equality (the middle line), while in the
scatter plot in Fig. 9 most data points are centered around it. Correspondingly,
the data forecasting based on the SSC has an NS of 0.615, while
the data forecasting based on the optimal MSC has an NS of 0.924. These results
demonstrate that the optimal MSC greatly improved the forecasts obtained from the poorly performing training dataset.
Fig. 8 Model performance of ANN based model using DS4, Net1 and standard steepness coefficient
Fig. 9 Model performance of ANN based model using DS4, Net5 and milder steepness coefficient
Figure 10 shows the forecasting performance of DS_7 based on data training using
dataset DS_6 and the SSC. Figure 11 shows the forecasting performance of DS_7 based
on data training using dataset DS_6 and the optimal MSC, which is MSC_9. The use
of DS_6 as a training dataset produced the best data forecasting performance based
on both the SSC and the optimal MSC. The scatter plots of the forecasting performances of
the two models are similar, except that the scatter plot in Fig. 11 shows more
data centered around the middle line than the plot in Fig. 10, especially for
observed data less than 6,000 mm. This accuracy is reflected by the performance of
the SSC and optimal MSC, which had NS values of 0.93 and 0.938, respectively. The
results shown in both scatter plots were slightly better than those shown in the scatter
plot in Fig. 9, which had an NS value of 0.924. These findings confirm the existence
of an optimal MSC in the sigmoid function that produced the optimal forecasting performance.
Based on these findings, the MSCs are able to provide good forecasting results;
however, it is not obvious how an MSC produces sigmoid values between 0 and 1
at the output neuron, since it appears to have a limited range of sigmoid
values (Fig. 4). For example, MSC_9 appears to produce sigmoid values limited to
around 0.5. This is because the sigmoid functions shown in Fig. 4 are plotted only for
x-values between −4 and 4, a range suited to viewing sigmoid values between 0 and
1 for the SSC. If the plot is extended to x-values between −100 and 100, the
sigmoid values between 0 and 1 become visible for all MSCs. For MSC_9, the sigmoid
value is 0.1 if the x-value is −88 and 0.9 if the x-value is 88. For the SSC, the
x-values that produce sigmoid values of 0.1 and 0.9 are −2.2 and 2.2, respectively.
Thus, all MSCs can produce sigmoid values between 0
and 1 if appropriate x-values are passed to the sigmoid function. It is also not
obvious how the summation of w and x values passed to the hidden and output neurons
can reach such a high value. Considering the two parameters (the w and x values)
Fig. 10 Model performance of ANN based model using DS6, Net5 and standard steepness coefficient
Fig. 11 Model performance of ANN based model using DS6, Net5 and milder steepness coefficient
in the summation, only large weights can increase the summation, because
the x values always fall between 0 and 1. Figure 12 shows an example of
the bias weights of the SSC and MSC_9 that are passed to the output neuron during data
training. As shown in the figure, the weight for MSC_9 increased dramatically
compared with that for the SSC during the data training process. After 2000 epochs, the weight
generated with the SSC was only 2.48424, whereas that generated with MSC_9 was 86.12270. It should
be noted that both bias weights were initialized with a value of 0.5 at the start of the
data training process. Thus, these findings confirm that the weights in the ANN
model influence the output of the MSC between 0 and 1. A linear transfer function has
been successfully applied to the output layer to improve the forecasting performance
of ANN models (Toth and Brath 2007; Hornik et al. 1989). However, in our case
study, based on the bias weight explanation and its effect on the summation value
that is passed to the output neurons, it is not possible to use a linear transfer function
in the output layer together with an MSC in the hidden layer. This is due to the rigidity of the
back propagation algorithm in re-adjusting the weights of the hidden layer and output layer
when the weight values differ considerably. The use
of an MSC in the hidden neurons produces a wide difference between the weight values in the
hidden and output layers, which makes it infeasible to use a linear transfer function
in the output neuron. However, optimization algorithms other than
back propagation, such as the genetic algorithm and particle swarm optimization,
could help overcome this drawback.
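The range argument above can be checked numerically. The sketch below assumes the sigmoid has the form f(x) = 1 / (1 + exp(−kx)) with steepness coefficient k, that the SSC corresponds to k = 1, and that MSC_9 corresponds to k = 0.025; the last value is inferred from the ±88 figure quoted in the text and is an assumption of this example. The function names are hypothetical.

```python
import math


def sigmoid(x, k=1.0):
    """Logistic sigmoid with steepness coefficient k (assumed form)."""
    return 1.0 / (1.0 + math.exp(-k * x))


def required_input(y, k=1.0):
    """Net input x at which sigmoid(x, k) equals y, for 0 < y < 1."""
    return math.log(y / (1.0 - y)) / k


def max_net_input(weights, bias):
    """Upper bound on bias + sum(w_i * x_i) when every x_i lies in [0, 1].
    Only positive weights can raise the sum, so the bound is set by them."""
    return bias + sum(w for w in weights if w > 0)


SSC = 1.0      # standard steepness coefficient (assumed k = 1)
MSC_9 = 0.025  # assumed value, inferred from the +/-88 figure in the text

for name, k in (("SSC", SSC), ("MSC_9", MSC_9)):
    low = required_input(0.1, k)
    high = required_input(0.9, k)
    print(f"{name}: output 0.1 at x = {low:.1f}, output 0.9 at x = {high:.1f}")

# Illustration only: treat each reported bias weight as the whole net input,
# ignoring every other term in the summation.
print(f"SSC bias weight   -> {sigmoid(2.48424, SSC):.3f}")
print(f"MSC_9 bias weight -> {sigmoid(86.12270, MSC_9):.3f}")
```

Because dividing by k stretches the input axis, a milder coefficient needs a proportionally larger net input to reach the same output, and with inputs confined to [0, 1] that larger net input can only come from larger weights.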
Fig. 12 Weight of bias connected to output neuron (a) using standard steepness coefficient, (b) using
milder steepness coefficient MSC_9
4 Conclusions
The results of this study demonstrated that the optimal steepness coefficient of the
sigmoid function effectively improved the ANN data training performance when
compared with ANN trained using the standard steepness coefficient. Based on
this study, models with a steepness coefficient between 0.17 and 0.025 performed
better than those with a steepness coefficient between 1 and 0.35. These results
could also be applicable to other studies; therefore, it is suggested that additional
studies employing steepness coefficients between 0.17 and 0.025 be conducted to evaluate
the ANN data training and forecasting performance. The results of the present study
also show that the optimal steepness coefficient method is more efficient for producing the best training performance than trial and error. Importantly, the optimal
steepness coefficient significantly improved the data training of poor performance
training datasets. The improved data training can help improve the accuracy of data
forecasting at Rantau Panjang station and thus assist in monitoring for the possible
future occurrence of floods. For future research applying a linear transfer function at the
output neurons and an MSC in the hidden neurons, it is highly recommended to find a better
and more flexible optimization technique that can handle the large difference
in weight adjustment between the hidden and output neurons, which is a drawback of the back
propagation algorithm.
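A follow-up evaluation of candidate steepness coefficients could be organized as a simple sweep: train one network per coefficient and keep the one with the highest NS on held-out data. The sketch below is a hypothetical illustration, not the model used in this study. The one-input network, the synthetic sine data, the learning rate and the candidate list (which only reuses the range endpoints quoted above) are all assumptions, and its scores say nothing about the Johor River data.

```python
import math
import random


def sigmoid(x, k):
    """Numerically stable logistic sigmoid with steepness coefficient k."""
    z = k * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def nash_sutcliffe(obs, sim):
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)


def train_and_score(k, train, test, hidden=3, epochs=300, lr=0.5, seed=0):
    """Train a 1-input, one-hidden-layer network by plain back propagation
    and return the NS on the held-out pairs."""
    rng = random.Random(seed)  # same seed for every k, for a fair comparison
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b1 = [0.5] * hidden  # bias weights start at 0.5, as in the text
    b2 = 0.5

    def forward(x):
        h = [sigmoid(w1[j] * x + b1[j], k) for j in range(hidden)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2, k)
        return h, y

    for _ in range(epochs):
        for x, t in train:
            h, y = forward(x)
            # The derivative of sigmoid(k * x) carries an extra factor k.
            delta_out = (y - t) * k * y * (1.0 - y)
            for j in range(hidden):
                delta_h = delta_out * w2[j] * k * h[j] * (1.0 - h[j])
                w2[j] -= lr * delta_out * h[j]
                w1[j] -= lr * delta_h * x
                b1[j] -= lr * delta_h
            b2 -= lr * delta_out

    obs = [t for _, t in test]
    sim = [forward(x)[1] for x, _ in test]
    return nash_sutcliffe(obs, sim)


# Synthetic data scaled into (0, 1); purely illustrative.
points = [(i / 19, 0.5 + 0.4 * math.sin(2 * math.pi * i / 19)) for i in range(20)]
train, test = points[::2], points[1::2]

candidates = [1.0, 0.35, 0.17, 0.025]  # hypothetical candidate list
scores = {k: train_and_score(k, train, test) for k in candidates}
best_k = max(scores, key=scores.get)
for k in candidates:
    print(f"k = {k}: NS = {scores[k]:.3f}")
print("selected steepness coefficient:", best_k)
```

Which coefficient wins on this toy problem depends on the hypothetical settings; the point is only the structure of the comparison, with every candidate trained from the same initial weights and judged by the same NS criterion.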
Acknowledgements The authors thank the Environmental Research Group at the Department
of Civil Engineering, Faculty of Engineering, Universiti Kebangsaan Malaysia for a research grant
(UKM-GUP-PLW-08-13-308) provided to the second and third authors. In addition, the authors
appreciate the Department of Irrigation of Selangor and Johor for providing data and assisting with
the background study of the Johor River.
References
Agarwal A, Singh RD (2004) Runoff modelling through back propagation artificial neural network
with variable rainfall-runoff data. Water Resour Manage 18:285–300
Ahmed JA, Sarma AK (2007) Artificial neural network model for synthetic stream flow generation.
Water Resour Manage 21(6):1015–1029
Alvisi S, Mascellani G, Franchini M, Bardossy A (2006) Water level forecasting through fuzzy logic
and neural network approaches. Hydrol Earth Syst Sci 10(1):1–17
ASCE Task Committee on the application of ANN in Hydrology (2000a) Artificial neural networks
in hydrology. I: preliminary concepts. J Hydrol Eng 5(2):115–123
ASCE Task Committee on the application of ANN in Hydrology (2000b) Artificial neural networks
in hydrology. II: hydrological applications. J Hydrol Eng 5(2):124–137
Atiya AF, El-Shoura SM, Shaheen SI, El-Sherif MS (1999) A comparison between neural network
technique—case study: river flow forecasting. IEEE Trans Neural Netw 10(2):402–409
Barron AR (1994) Approximation and estimation bounds for artificial neural networks. Mach Learn
Bustami RA, Bessaih N, Bong C, Suhaili S (2007) Artificial neural network for precipitation and
water level predictions of Bedup River. IAENG International Journal of Computer Science
Cancelliere A, Giuliano G, Ancarani A, Rossi G (2002) A neural networks approach for deriving
irrigation reservoir operating rules. Water Resour Manage 16:71–88
Chandramouli V, Deka P (2005) Neural network based decision support model for optimal reservoir
operation. Water Resour Manage 19:447–464
Chauhan S, Shrivastava RK (2008) Performance evaluation of reference evapotranspiration estimation using climate based methods and artificial neural networks. Water Resour Manage 23:825–
Chiang YM, Chang LC, Chang FJ (2004) Comparison of static-feedforward and dynamic-feedback
neural networks for rainfall–runoff modeling. J Hydrol 290:297–311
Cigizoglu HK, Kisi O (2006) Methods to improve the neural network performance in suspended
sediment estimation. J Hydrol 317:221–238
Coulibaly P, Anctil F, Bobee B (2000) Daily reservoir inflow forecasting using artificial neural
networks with stopped Training Approach. J Hydrol 230:244–257
El-Shafie A, Reda Taha M, Noureldin A (2007) A neuro-fuzzy model for inflow forecasting of the
Nile River at Aswan High Dam. Water Resour Manage 21(3):533–556
El-Shafie A, Noureldin AE, Taha MR, Basri H (2008) Neural network model for Nile river inflow
forecasting based on correlation analysis of historical inflow data. J Appl Sci 8(24):4487–4499
El-Shafie A, Abdin AE, Noureldin A, Taha MR (2009) Enhancing inflow forecasting model at
Aswan high dam utilizing radial basis neural network and upstream monitoring stations measurements. Water Resour Manage 23(11):2289–2315
Fernando TMKG, Maier HR, Dandy GC, May RJ (2005) Efficient selection of inputs for artificial
neural network models, Proc. of MODSIM 2005 International Congress on Modelling and Simulation: Modelling and Simulation Society of Australia and New Zealand, December 2005/Andre
Zerger and Robert M. Argent (eds) 1806–1812
Han J, Moraga C, Sinne S (1996) Optimization of feedforward neural networks. Eng Appl Artif
Intell 9(2):109–119
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2:359–366
Joorabchi A, Zhang H, Blumenstein M (2007) Application of Artificial Neural Networks in Flow
Discharge Prediction for Fitzroy River, Australia. J Coast Res SI 50:287–291
Kerh T, Lee CS (2006) Neural networks forecasting of flood discharge at an unmeasured station
using river upstream information. Adv Eng Softw 37:533–543
Leahy P, Kiely G, Gearóid C (2008) Structural optimisation and input selection of an artificial neural
network for river level prediction. J Hydrol 355:192–201
Rahnama MB, Barani GA (2005) Application of rainfall-runoff models to Zard River catchment’s.
Am J Environ Sci 1(1):86–89
Rai RK, Mathur BS (2008) Event-based sediment yield modeling using artificial neural network.
Water Resour Manage 22(4):423–441
Shamseldin AY, Nasr AE, O’Connor KM (2002) Comparison of different forms of the multi-layer
feed-forward neural network method used for river flow forecasting. Hydrol Earth Syst Sci
Solaimani K, Darvari Z (2008) Suitability of artificial neural network in daily flow forecasting. J Appl
Sci 8(17):2949–2957
Tareghian R, Kashefipour SM (2007) Application of fuzzy systems and artificial neural networks for
flood forecasting. J Appl Sci 7(22):3451–3459
Toth E, Brath A (2007) Multistep ahead streamflow forecasting: role of calibration data in conceptual
and neural network modeling. Water Resour Res 43:W11405. doi:10.1029/2006WR005383
Turan ME, Yurdusev MA (2009) River flow estimation from upstream flow records by artificial
intelligence methods. J Hydrol 369:71–77
Wu CL, Chau KW, Li YS (2009) Methods to improve neural network performance in daily flows
prediction. J Hydrol 372:80–93
Zealand CM, Burn DH, Simonovic SP (1999) Short term stream flow forecasting using artificial
neural networks. J Hydrol 214:32–48
Zhang G, Patuwo BE, Hu MY (1998) Forecasting with artificial neural networks: the state of the art.
Int J Forecast 14:35–62