International Workshop ADVANCES IN STATISTICAL HYDROLOGY
May 23-25, 2010 Taormina, Italy
DEVELOPMENT OF STAGE-DISCHARGE RATING CURVE IN RIVER
USING GENETIC ALGORITHMS AND MODEL TREE
by
Bhola N.S. Ghimire(1) and M. Janga Reddy(2)
(1) Research Scholar (ghimire@iitb.ac.in)
(2) Assistant Professor (mjreddy@civil.iitb.ac.in)
Department of Civil Engineering, Indian Institute of Technology, Bombay, India
ABSTRACT
Discharge measurement in rivers is a challenging job for hydraulic engineers. A graph of stage versus discharge, or the
line through the data points, represents the stage-discharge relationship, also known as the rating curve. The stage-discharge
relationship is an approximate method for estimating discharge in rivers and streams. For various hydrological
applications, such as water and sediment budget analysis and the operation and control of water resources projects, accurate
information about flow in rivers is very important. Stage is easier to measure than
discharge. The stage-discharge relationship at a particular river cross-section, even under conditions of
meticulous observation, is not necessarily unique, as rivers are often influenced by several other factors which are
neither always understood nor easy to quantify. This is because, in reality, discharge is not a function of stage
alone. Discharge also depends on the longitudinal slope of the river, the geometry of the channel, bed roughness, etc. However,
measuring these parameters at each and every time step and section is not possible. Hence there is a need to
establish an accurate relationship between stage and discharge. Conventional parametric regression methods usually
fail to model these relationships.
This paper presents the use of genetic algorithms (GA), a search procedure based on the mechanics of natural selection
and natural genetics, and the Model Tree (M5), a data-driven technique for continuous class problems that
provides a structural representation of the data and a piecewise linear fit of the classes, to establish the
stage-discharge relationship in river hydrology. The results are compared with those of other methods: gene-expression
programming (GEP), multiple linear regression (MLR) and the classical stage-discharge rating curve (RC). Model
performance is measured with the coefficient of determination and the root mean square error.
The results obtained from the GA-based and MT-based models are found to be much better than those of the other
methods.
Keywords: Genetic algorithms, Model tree, Gene-expression programming, Multiple linear regression, rating curve.
1 INTRODUCTION
Hydraulic engineers need discharge measurements in rivers for various purposes, and obtaining them is one of their
more challenging jobs. Discharge depends largely on the nature of rainfall in the catchment area,
which is stochastic, and stage varies with discharge accordingly. A graph of stage
versus discharge, or the line through the data points, represents the stage-discharge relationship, commonly
called the rating curve. The rating curve is a fundamental technique in discharge calculation. For
various hydrological applications, such as water resources planning, reservoir operation, sediment handling
and hydrologic modelling, accurate information about discharge and stage is very important.
Stage can be measured at any time, whereas measuring discharge needs considerable preparation, which may
not be practical. Hence, to predict discharge from measured stage, a specified relation between
them is required. The stage-discharge relationship at a particular river cross-section, even under conditions of
meticulous observation, is not necessarily unique, as rivers are often influenced by factors neither always
understood nor easy to quantify (Sefe, 1996). This is because, in reality, discharge is not a
function of stage alone. Discharge also depends on the longitudinal slope of the river, the geometry of the channel, bed
roughness, etc. However, measuring these parameters at every time step and section is not
feasible, so in practice discharge is usually treated as dependent on stage alone. Hence
there is a need to establish an accurate relationship between discharge and stage. The conventional
parametric regression methods usually fail to model these relationships (Habib and Meselhe, 2006). Habib and Meselhe
specified two distinct approaches to stage-discharge modelling: numerical solutions
and data-driven techniques. Following the second approach, they developed stage-discharge relationships for coastal low-gradient streams
using neural networks and nonparametric regression. The first approach requires
data from sites with accurate boundary conditions.
Tawfik et al. (1997) introduced an approach based on a multilayer artificial neural network (ANN) for
modelling the stage-discharge relationship. The same approach was followed by Jain and Chalisgaonker (2000),
Sudheer and Jain (2003) and Bhattacharya and Solomatine (2005). Bhattacharya and Solomatine (2005) also used
the M5 model tree, in addition to ANN, to relate stage and discharge in rivers. Peterson-Overleir (2006) introduced
a methodology based on the Jones formula and nonlinear regression for
situations where the stage-discharge relationship is affected by hysteresis due to unsteady flow. Tayfur and
Singh (2006) used ANN and fuzzy logic to model rainfall-runoff laboratory data. Baiamonte and Ferro (2007) obtained
relationships for estimating the two coefficients of the stage-discharge equation from
experimental runs in flumes with contraction ratios
ranging from 0.17 to 0.81 and flume slopes ranging from 0.5 to 3.5%.
Using a compound neural network, Jain (2008) developed an integrated stage-discharge-suspended sediment relationship.
Soft-computing techniques such as ANN are widely used in water resources engineering, whereas GP and GA
have been used by only a few researchers. Savic et al. (1999) and Babovic and Keijzer (2002) developed
GP models of the rainfall-runoff relation at different sites. Dorado et al. (2003) applied
GP and ANN to predict runoff from rainfall in urban areas. Giustolisi (2004) used GP to
determine the Chezy resistance coefficient for full circular corrugated channels. Cheng et al. (2005) used
GA to calibrate a rainfall-runoff model developed from fuzzy methods. Rabunal et al. (2007) used
GP and ANN to derive the unit hydrograph for a typical urban basin. Kumar and Reddy (2007) used GA for
optimization of multipurpose reservoir operation. Sivapragasam et al. (2008) modelled the storage-discharge
relationship of the non-linear Muskingum model using GP, an evolutionary algorithm-based
modelling approach. Comparing the results with those of particle swarm optimization, they
found the same optimum values from both techniques. Recently, Aytek and Kisi (2008) used GEP for suspended
sediment modelling and Guven and Aytek (2009) used GEP for stage-discharge modelling in American
rivers.
Similarly, another data-driven tool, the model tree (MT), has been used by a few researchers in hydrology. MT
has been reported to give better accuracy than ANN in water management problems, rainfall-runoff modelling, canal
sedimentation, etc. (Solomatine, 2002; Solomatine and Dulal, 2003; Bhattacharya et al., 2005). Reddy and
Ghimire (2009) used model trees successfully for Suspended Sediment Load (SSL) estimation in
American rivers.
The objective of this article is to support the use of the soft-computing techniques GA and MT in
water resources engineering, specifically for modelling the relation between stage and discharge. The model
results are compared with those of conventional methods, namely the stage-discharge rating curve (RC) and
multiple linear regression (MLR), as well as with the predictions of the GEP model.
2 MODELLING TECHNIQUES
2.1 Genetic Algorithms (GAs)
Genetic Algorithms (GAs) are a particular class of evolutionary algorithms that use techniques inspired by
evolutionary biology to solve a problem. In other words, GAs are population-based search
techniques that work on Darwin's principle of "survival of the fittest" (Goldberg, 1989).
The idea in all these evolutionary algorithms is to evolve a population of candidate solutions to a given
problem, using operators inspired by natural genetic variation and natural selection, such as inheritance,
mutation, selection, and crossover.
GAs were invented by John Holland in the 1960s and were developed by him, his
students and colleagues at the University of Michigan (Goldberg, 1989). In their formulation, a GA is
a method for moving from one population of "chromosomes" (e.g., strings of ones and zeros, called "bits")
to a new population by using a kind of "natural selection" together with the genetics-inspired operators of
crossover, mutation, and inversion. Each chromosome consists of "genes" (e.g., bits), each gene being an
instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in the
population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring
than the less fit ones. Crossover exchanges subparts of two chromosomes, roughly mimicking biological
recombination between two single chromosome organisms; mutation randomly changes the allele values of
some locations in the chromosome; and inversion reverses the order of a contiguous section of the
chromosome, thus rearranging the order in which genes are arrayed. Further details on GAs can be
found in Goldberg (1989).
2.1.1 Elements of GA. In a GA, the search starts with an initial set of random solutions known as the population.
Each chromosome in the population is evaluated using a fitness function, which represents a
measure of the success of the chromosome. Based on the fitness values, a set of
chromosomes is selected for breeding. To create a new generation, genetic operators such as
crossover and mutation are applied. According to their fitness values, parents and offspring are selected, while
some are rejected so as to keep the population size constant for the new generation. The cycle of
evaluation–selection–reproduction is continued until an optimal or a near-optimal solution is found. The
fundamental steps of the algorithm are shown in Figure 1.
[Figure 1 flowchart: initial population generation; evaluate fitness of all individuals in population; termination criteria met? Yes: stop the search. No: select individuals for next generation; crossover and mutation; next generation.]
Figure 1 – Schematic diagram of genetic algorithms (Tung et al., 2006)
Selection. Selection attempts to apply pressure upon the population in a manner similar to that of natural
selection found in biological systems. Before making it into the next generation's population, selected
chromosomes may undergo crossover or mutation (depending upon the probabilities of crossover and
mutation), in which case the offspring chromosomes are the ones that make it into the next
generation's population. Poorer performing individuals (as evaluated by the fitness function) are weeded out, and
better performing, or fitter, individuals have a greater than average chance of passing the information
they contain to the next generation. Of the several selection methods available, tournament selection is applied in this
study. The tournament selection operator uses roulette selection N times to produce a tournament
subset of chromosomes. The best chromosome in this subset is then chosen as the selected chromosome.
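The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name tournament_select, the toy population and the fitness values are hypothetical, and the tournament subset is drawn uniformly at random (a simplification of the roulette-based draw described in the text). Lower fitness is treated as better, since the fitness used in this study is an error measure.

```python
import random

def tournament_select(population, fitness, size=4, rng=random):
    """Return the best of `size` randomly drawn chromosomes (lower fitness wins)."""
    # Assumption: the tournament subset is drawn uniformly, with replacement.
    contestants = [rng.randrange(len(population)) for _ in range(size)]
    # The chromosome with the smallest error in the subset is selected.
    best = min(contestants, key=lambda i: fitness[i])
    return population[best]

# Hypothetical toy population of bit strings with made-up error values.
population = ["0011", "0101", "1100", "1111"]
fitness = [4.0, 2.5, 9.1, 0.7]
winner = tournament_select(population, fitness, size=4, rng=random.Random(0))
print("selected chromosome:", winner)
```

A larger tournament size increases the selection pressure, because the best chromosome of the population is more likely to be present in each subset.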
Crossover. Crossover allows solutions to exchange information in a way similar to that used by a natural
organism undergoing reproduction. In other words, crossover is a genetic operator that combines (mates)
two chromosomes (parents) to produce a new chromosome (offspring). This operator randomly chooses a
locus and exchanges the subsequences before and after that locus between two chromosomes to create two
offspring. The idea behind crossover is that the new chromosome may be better than both of the parents if it
takes the best characteristics from each of the parents. Crossover occurs during evolution according to a
user-definable crossover probability. For example, if two parents (chromosomes) A and B, having four
genes each, form two children (offspring) by exchanging genes after the second gene (Figure 2),
the operation is called single-point crossover, whereas if segments are exchanged at two points it is called two-point
crossover. In this study two-point crossover is used.
[Figure 2 image not reproduced; label: crossover point]
Figure 2 – Single point cross-over operator
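The two crossover variants discussed above can be sketched as follows. This is an illustrative sketch only; the function names and the four-gene parents "AAAA" and "BBBB" are hypothetical and are chosen to mirror the four-gene example in the text.

```python
import random

def single_point_crossover(parent_a, parent_b, point):
    """Exchange everything after `point` between two equal-length chromosomes."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def two_point_crossover(parent_a, parent_b, rng=random):
    """Exchange the segment lying between two randomly chosen cut points."""
    n = len(parent_a)
    # Two distinct cut points, sorted so that 0 <= i < j <= n.
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

# Four-gene parents, cut after the second gene.
print(single_point_crossover("AAAA", "BBBB", 2))
a, b = two_point_crossover("AAAA", "BBBB", rng=random.Random(1))
print(a, b)
```

In both variants every gene position of the two children still holds exactly the two parental values, so no information is created or lost; it is only recombined.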
Mutation. Mutation randomly changes (flips) the value of single bits within individual
strings to maintain the diversity of a population and help a genetic algorithm escape from a local
optimum. It is typically used sparingly. For example, in Figure 3 the parent becomes a new child when
gene number two is mutated.
Figure 3 – Mutation operator
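A bit-flip mutation of the kind described above can be sketched in a few lines. The function name mutate and the mutation rate of 0.05 are assumptions made for illustration; the paper does not report a per-bit mutation rate.

```python
import random

def mutate(bits, rate=0.05, rng=random):
    """Flip each bit of a binary string independently with probability `rate`."""
    flipped = []
    for b in bits:
        if rng.random() < rate:
            # Flip the allele: 0 becomes 1 and 1 becomes 0.
            flipped.append("1" if b == "0" else "0")
        else:
            flipped.append(b)
    return "".join(flipped)

print(mutate("10101100", rate=0.05, rng=random.Random(3)))
```

Keeping the rate small matches the remark that mutation is used sparingly: most offspring pass through unchanged, and only an occasional bit is perturbed.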
2.1.2 Fitness function used in GA. Many fitness
functions can be used in a GA for parameter estimation. For this study, the root mean square error was taken as the
function to be minimized. The fitness
function is given in Equation (1), where Qoi and Qpi are the values observed in the field and the values predicted
by the developed GA model respectively, n is the total number of observations and F is the
error function.
\[ \min F = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (Q_{oi} - Q_{pi})^{2} } \]    (1)
2.2 Model Tree (MT)
A model tree is a data-driven technique for continuous class problems that provides a structural
representation of the data and a piecewise linear fit of the classes. It is a kind of decision tree that
predicts numeric values with linear regression functions at the leaves. A model tree
groups the data according to their similarity and then fits local regression equations, which helps to
minimize the error in the model. Quinlan (1992) and Wang and Witten (1997) describe the
technique.
The fundamental steps of the M5 model tree used to process the data in this study follow the flow chart of
Reddy and Ghimire (2009). Initially the algorithm splits the parameter space into sub-spaces. Then it
builds a linear regression model for each sub-space. It uses information theory in splitting the data, which
helps to fit an appropriate model. During model formulation each split follows the idea of
decision-tree integration of several models. Finally it uses computational intelligence techniques for possible
solutions to each model. The major advantages of model trees over regression trees are: (a) model trees are
much smaller than regression trees, (b) the decision strength is clear and (c) the regression functions normally
do not involve many variables. Computational requirements for model trees grow rapidly with
dimensionality, and tasks may involve hundreds of attributes, which can help to give a better
formulation. Tree-based models are developed by a divide-and-conquer method. The standard deviation
reduction (SDR) is the main criterion for selecting a split, and is given by Equation (2).
\[ SDR = sd(T) - \sum_{i} \frac{|T_i|}{|T|} \, sd(T_i) \]    (2)
where T represents the set of examples that reach the node; Ti represents the subset of examples that have the
ith outcome of the potential set (i.e. the sets that result from splitting the node according to the chosen
attribute); and sd(.) represents the standard deviation.
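Equation (2) can be made concrete with a short Python sketch that scores every candidate split of a single attribute (stage) and keeps the one with the largest SDR. The sketch rests on several assumptions that are not stated in the paper: the population standard deviation is used for sd(.), candidate thresholds are placed midway between consecutive stage values, and the four data points are synthetic. The names sdr and best_split are hypothetical.

```python
from statistics import pstdev

def sdr(parent, subsets):
    """Standard deviation reduction of Equation (2) for one candidate split."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)

def best_split(stage, flow):
    """Return (SDR, threshold) of the best binary split on stage."""
    pairs = sorted(zip(stage, flow))
    flows = [q for _, q in pairs]
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal stage values
        gain = sdr(flows, [flows[:k], flows[k:]])
        threshold = 0.5 * (pairs[k - 1][0] + pairs[k][0])
        if best is None or gain > best[0]:
            best = (gain, threshold)
    return best

# Synthetic example: two flat regimes separated between stage 2 and stage 3.
gain, threshold = best_split([1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 50.0, 50.0])
print(gain, threshold)
```

In this synthetic case the split between the two regimes removes all of the spread in discharge, so it is the one a tree builder using SDR would choose; a linear model is then fitted within each side.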
Pruning and smoothing. If the generated tree has too many leaves, the prediction may be 'too
accurate' on the training data: the tree overfits the existing data and generalizes poorly. The tree can be made
more robust by simplifying it. Merging the lower sub-trees into one node is called pruning. The
process used to compensate for the sharp discontinuities that occur between adjacent linear models at
the leaves of the pruned tree is called smoothing. Smoothing is difficult for models constructed
from a small number of training samples.
Advantages of model trees. A model tree is in effect a set of local linear models. Model trees may serve as
an alternative to ANNs and are often almost as accurate as ANNs (Solomatine, 2002). They have the following
advantages: (a) an MT trains much faster than an ANN, (b) the results given by a model tree are transparent and
can be easily understood by decision makers, and (c) using pruning it is possible to easily generate a range of
MTs, from a simple linear regression to a much more accurate but complex combination of local models (many
branches and leaves).
2.3 Multiple linear regression (MLR)
Many engineering and scientific problems are concerned with determining a relationship between a set of
variables. Usually, a single response variable Y (the dependent variable) is modelled as a function of a set of
independent variables x1, x2, x3, ..., xn. It can be written as
\[ Y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n + a_0 \]    (3)
where the coefficient ai is the regression coefficient for the ith independent variable (xi), computed by the least
squares method. When n = 1, Equation (3) reduces to a simple linear regression equation. Similarly, when n = 2,
the function corresponds to a plane in three dimensions, and for values of n greater than 2 the function is a
hyperplane in n+1 dimensions. If Yi is the observed dependent variable and Ypi is the predicted value
of the dependent variable using Equation (3), then the sum of squared errors is given by Equation (4).
\[ \sum_{i=1}^{N} e_{yi}^{2} = \sum_{i=1}^{N} (Y_i - Y_{pi})^{2} \]    (4)
2.4 Stage-Discharge Rating Curve (RC)
A stage-discharge rating curve (simply: rating curve, RC) describes the relationship between the water level
(stage) at a channel cross-section and the rate of discharge at that section. Ideally, a rating curve describes a
unique functional relationship between stage and discharge; it is therefore obtained as a smooth and
continuous curve with a reasonable degree of sensitivity. Unfortunately there cannot be a unique stage-discharge
relationship unless the flow is uniform, and owing to the stochastic nature of rainfall, river flow is not
uniform. Hence the ideal relation between stage and discharge does not hold exactly and serves only as an
approximation (Henderson, 1966).
A sufficient number of measured discharges, when plotted against the corresponding stages, gives a
relationship that represents the integrated effect of a wide range of channel and flow parameters. The control
(the combined effect of these parameters) is usually categorized as permanent or shifting. In a shifting control,
the parameters are not fixed and change with time. In a permanent control the parameters are constant
(Subramanya, 2006).
A majority of streams and rivers, especially non-alluvial rivers, exhibit permanent control. For the
permanent control case, the relationship between stage and discharge is a single-valued relation
expressed as Equation (5), which is the equation of a parabola, where Q = discharge in m3/s, G =
gauge height (stage) in m, a = a constant which represents the gauge reading corresponding to zero discharge, and
β and C are rating curve constants.
\[ Q = C \, (G - a)^{\beta} \]    (5)
Traditionally, the best values of a, β and C in Equation (5) for a given range of stage are obtained by the
least square error method. Taking logarithms of Equation (5) gives Equation (6).
\[ \log Q = \beta \log (G - a) + \log C \]    (6)
or
\[ Y = \beta X + c' \]    (7)
Equation (6) has the form of a straight line, as written in Equation (7), where the
dependent variable Y = log Q, the independent variable X = log (G − a) and c' = log C. To get the best-fit straight
line through n observations of the independent and dependent variables (X and Y), a regression of the
dependent variable on the independent variable is normally carried out. Depending upon the nature of the data, two or
more straight lines are often required to fit the given data. A preliminary analysis of the data usually
reveals the approximate position of the break points for each range of data. The actual break
points may be determined by solving the two equations for Q and G, or graphically. Sometimes the curve
changes from a parabolic to a complex curve and vice versa, and sometimes the constants and exponents vary
through the range (Guven and Aytek, 2009). So it is not easy to find the values of the parameters (a, β and
C) for each case, and sometimes it may be impossible to obtain the true values.
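The traditional log-linear fit of Equations (6) and (7) can be sketched as follows for a single segment and a known value of a. The function name fit_rating_curve is hypothetical, and the data are synthetic points generated from Q = 50 (G − 1)^2 purely to show that the regression recovers C and β; they are not the Schuylkill River records.

```python
import math

def fit_rating_curve(stage, discharge, a):
    """Least-squares fit of Y = beta * X + c' with X = log(G - a), Y = log Q."""
    x = [math.log10(g - a) for g in stage]
    y = [math.log10(q) for q in discharge]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # slope of the straight line
    c = 10 ** (mean_y - beta * mean_x)    # C = 10 ** c'
    return c, beta

# Synthetic data from Q = 50 (G - 1)^2, with the zero-discharge stage a = 1 m.
stage = [1.5, 2.0, 3.0, 4.0]
discharge = [50.0 * (g - 1.0) ** 2 for g in stage]
c, beta = fit_rating_curve(stage, discharge, a=1.0)
print(c, beta)
```

The sketch makes the limitation discussed above visible: a must be fixed before the regression can be run, and every stage must exceed a for the logarithm to exist. Optimizing a, β and C together is what motivates the GA approach.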
In view of this difficulty, this study focuses on optimizing the parameters (a, β and C)
of Equation (5) using GA, and on developing piecewise linear equations using MT. The
methodology gave sufficiently good results in the case studies, and it is believed that it
will help to solve many practical problems related to stage-discharge relations.
3 CASE STUDIES
3.1 Stage–Discharge Data
To demonstrate the application of GA and MT, daily time series of stage and
discharge were taken from two stations on the Schuylkill River, USA: Berne (Station no. 01470500, Lat. 40º31'21'', Long.
75º59'55'') and Philadelphia (Station no. 01474500, Lat. 39º58'04'', Long. 75º11'20''). The
catchment area of the Berne station is about 919.45 km2 and that of the Philadelphia station is 4902.85 km2. This
information was obtained from the USGS website.
Data for the period October 01, 2000 to September 30, 2006 were taken for both stations. The first
five years of data were used for training and the last year (October 01, 2005 to September 30,
2006) for testing. Some statistical parameters for these sites
are shown in Table I for the training and testing sets. The parameters µ, σ, σ/µ, Csx, Xmax and Xmin are the mean,
standard deviation, coefficient of variation, skewness, maximum and minimum values respectively. The discharge limits
of the Berne station are 2.125 to 972.014 m3/s and those of the Philadelphia station are 2.239 to 1484.943 m3/s.
The corresponding stage limits are 1.384 to 5.088 m and 1.686 to 3.463 m respectively.
The developed models are valid for these ranges.
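The training and testing periods described above can be reproduced with a date-based split. The sketch below is illustrative only: the record layout (date, stage, discharge), the function name split_by_date and the placeholder records with empty measurements are assumptions, since the USGS file format is not described in the paper.

```python
from datetime import date, timedelta

TRAIN_START = date(2000, 10, 1)
TEST_START = date(2005, 10, 1)
TEST_END = date(2006, 9, 30)

def split_by_date(records):
    """Split (date, stage, discharge) records into training and testing sets."""
    train = [r for r in records if TRAIN_START <= r[0] < TEST_START]
    test = [r for r in records if TEST_START <= r[0] <= TEST_END]
    return train, test

# Placeholder daily records covering the whole study period (no real measurements).
n_days = (TEST_END - TRAIN_START).days + 1
records = [(TRAIN_START + timedelta(days=k), None, None) for k in range(n_days)]
train, test = split_by_date(records)
print(len(train), len(test))
```

For a complete daily record this split gives 1826 training days and 365 testing days. The training count agrees with the number of instances covered by the model tree rules reported in Table II for each station.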
Table I – The daily statistical parameters for training and testing data set for two stations at Schuylkill River
Data set   Station (no.)            Basin area (km2)  Data type  µ        σ        σ/µ    Csx     Xmax       Xmin
Training   Berne (01470500)         919.45            Stage*     1.65     0.22     0.13   2.25    3.418      1.384
Training   Berne (01470500)         919.45            Flow*      21.95    26.72    1.22   5.23    399.574    2.125
Training   Philadelphia (01474500)  4902.85           Stage      1.96     0.18     0.09   2.04    3.338      1.686
Training   Philadelphia (01474500)  4902.85           Flow       97.15    111.18   1.14   3.88    1312.07    2.239
Testing    Berne (01470500)         919.45            Stage      1.66     0.32     0.19   5.09    5.088      1.396
Testing    Berne (01470500)         919.45            Flow       24.32    61.88    2.54   11.46   972.014    2.522
Testing    Philadelphia (01474500)  4902.85           Stage      1.98     0.20     0.10   3.39    3.463      1.774
Testing    Philadelphia (01474500)  4902.85           Flow       109.42   147.86   1.35   5.63    1484.943   17.258
*The units of stage and flow are (m) and (m3/s) respectively.
3.2 Development of Models based on conventional methods
The stage-discharge rating curve (RC) and multiple linear regression (MLR) are taken as the conventional
methods. The RC was developed in two forms: a simple power equation (RC-1, which does not
consider the stage corresponding to zero discharge) and a slightly more complex form
(RC-2, which considers the stage corresponding to zero discharge). The models developed with these
methods (RC and MLR) are shown in Equations (8) to (10) for the Berne station and Equations (11)
to (13) for the Philadelphia station. In developing the complex rating curve (RC-2), the stage
corresponding to zero discharge was fixed with the help of scatter plots for the training period. The
reference stages used to fix the stage corresponding to zero discharge for the Berne station are 1.418,
1.628 and 2.064. Similarly, reference stages of 1.765, 1.945 and 2.396 were used
for the Philadelphia station. The values adopted for the stage corresponding to
zero discharge at the Berne and Philadelphia stations are 1.223 and 1.645 m respectively. In
developing the MLR models, a single independent variable was used so that performance could be compared with the
other models, so they reduce to simple linear models, as shown in Equations (10) and (13).
\[ Q = 0.441 \, H^{7.036} \]    (8)
\[ Q = 93.951 \, (H - 1.223)^{1.9885} \]    (9)
\[ Q = 116.069 \, H - 170.356 \]    (10)
\[ Q = 0.055 \, H^{10.512} \]    (11)
\[ Q = 670.039 \, (H - 1.645)^{1.841} \]    (12)
\[ Q = 602.645 \, H - 1084.43 \]    (13)
In Equations (8) to (13), Q is discharge in m3/s and H is stage height in m measured above the reference
datum.
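The three conventional models for the Berne station, Equations (8) to (10), can be evaluated side by side with a few lines of Python. The function names are hypothetical, and the clamp that returns zero discharge below the zero-discharge stage in the RC-2 function is an assumption added so that the power law is defined for every stage; the coefficients are those of the equations above.

```python
def rc1_berne(h):
    """RC-1, Equation (8): simple power law in stage."""
    return 0.441 * h ** 7.036

def rc2_berne(h, h0=1.223):
    """RC-2, Equation (9): power law above the zero-discharge stage h0."""
    # Assumption: discharge is taken as zero at or below h0.
    return 93.951 * max(h - h0, 0.0) ** 1.9885

def mlr_berne(h):
    """MLR, Equation (10): straight line in stage."""
    return 116.069 * h - 170.356

# Compare the three models over a few stages within the reported range (m).
for h in (1.4, 1.65, 2.0, 3.0, 5.0):
    print(h, round(rc1_berne(h), 2), round(rc2_berne(h), 2), round(mlr_berne(h), 2))
```

One point worth noting from the linear form is that Equation (10) crosses zero at H = 170.356 / 116.069, about 1.47 m, which is above the minimum observed stage of 1.384 m, so the MLR model returns negative discharge at the lowest stages.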
3.3 Development of Models based on Genetic Algorithms (GAs)
The parameters (a, β and C) of Equation (5) are optimized with GA. First, the parameters are estimated on the "training
set" selected from the whole data. The resulting relation is then used to predict the
discharge values in the "testing set". The predicted values are compared with the measured values using
the statistical performance measures coefficient of determination and root mean square error.
[Figure 4 plot: fitness (0 to 600) versus generations (0 to 20), with an inset showing fitness from 0 to 40 over generations 10 to 20.]
Figure 4 – Fitness convergence of Philadelphia station
A function program was written in the Matlab environment to carry out the optimization. The population size was
fixed at 200 with a uniform creation function. Tournament selection of size 4 with rank
scaling was selected, and the adaptive feasible mutation function was used. Two-point
crossover and forward migration were set in the program. The program was run five times and the
parameters giving the best fitness value were recorded for both the Berne and Philadelphia stations. The
fitness convergence on the training set for the Philadelphia station is shown in Figure 4. Similar behaviour was
observed for the Berne station. From Figure 4 it can be seen that the function value reached a minimum of 19.63
m3/s at the 14th generation for the Philadelphia station. Similarly, for the Berne station it reached 3.67 m3/s at the 14th
generation.
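The optimization described above can be sketched as a small real-coded GA in plain Python. This is not the authors' Matlab program. The sketch keeps the population size of 200, tournaments of size 4, two-point crossover and the RMSE fitness of Equation (1), but the Gaussian mutation with a rate of 0.2, the two-member elitism, the parameter bounds and the names rmse, run_ga and BOUNDS are all assumptions. The training data are synthetic points generated from a power law of the form of Equation (5), not the USGS records.

```python
import random

def rmse(params, stage, flow):
    """Equation (1): root mean square error of Q = C (H - a)^beta."""
    a, beta, c = params
    total = 0.0
    for h, q in zip(stage, flow):
        pred = c * max(h - a, 0.0) ** beta  # clamp keeps the power real
        total += (q - pred) ** 2
    return (total / len(stage)) ** 0.5

def run_ga(stage, flow, bounds, pop_size=200, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness found in each generation
    for _ in range(generations):
        fit = [rmse(ind, stage, flow) for ind in pop]
        order = sorted(range(pop_size), key=fit.__getitem__)
        history.append(fit[order[0]])
        # Elitism (assumption): the two best individuals survive unchanged.
        nxt = [pop[order[0]][:], pop[order[1]][:]]
        while len(nxt) < pop_size:
            # Tournament selection of size 4, lower RMSE wins.
            p1 = pop[min(rng.sample(range(pop_size), 4), key=fit.__getitem__)]
            p2 = pop[min(rng.sample(range(pop_size), 4), key=fit.__getitem__)]
            # Two-point crossover on the three genes (a, beta, C).
            i, j = sorted(rng.sample(range(len(bounds) + 1), 2))
            child = p1[:i] + p2[i:j] + p1[j:]
            # Gaussian mutation (assumption), clipped to the bounds.
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < 0.2:
                    step = rng.gauss(0.0, 0.05 * (hi - lo))
                    child[k] = min(hi, max(lo, child[k] + step))
            nxt.append(child)
        pop = nxt
    fit = [rmse(ind, stage, flow) for ind in pop]
    best = min(range(pop_size), key=fit.__getitem__)
    history.append(fit[best])
    return pop[best], history

# Synthetic training data from Q = 94.848 (H - 1.262)^1.765 (illustration only).
stage = [1.4, 1.6, 1.8, 2.2, 2.8, 3.4]
flow = [94.848 * (h - 1.262) ** 1.765 for h in stage]
BOUNDS = [(1.0, 1.39), (1.0, 3.0), (50.0, 150.0)]  # assumed ranges for a, beta, C
best, history = run_ga(stage, flow, BOUNDS)
print("a, beta, C =", best, "RMSE =", history[-1])
```

Because the two best individuals are carried over unchanged, the best fitness recorded in history can never increase from one generation to the next, which gives a convergence curve of the same general shape as Figure 4.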
The parameter values corresponding to the best fitness values are used in the final relations between stage and discharge.
The values of the parameters (a, β and C) are 1.262, 1.765 and 94.848 for the Berne station and 1.695, 1.526 and
630 for the Philadelphia station. The explicit formulations of the GA models for the Berne and Philadelphia stations
are given in Equations (14) and (15) respectively.
\[ Q = 94.848 \, (H - 1.262)^{1.765} \]    (14)
\[ Q = 630 \, (H - 1.695)^{1.526} \]    (15)
3.4 Development of Models based on Model Tree (MT)
MT models are formulated based on the fitness function given in Equation (4). The minimum number of instances is set
to four during formulation. The training and testing sets are the same as those used in the GA model
formulation. The rule sets produced by the program are shown in Table II. These rule sets test the time
series data fed to the program and decide the value according to the fitness function.
Table II – Model tree logic sets
Berne Station (01470500)
Rules:
If     Ht <= 1.572 [721/1.785%] : Rule 1
elseif Ht <= 1.691 [503/4.826%] : Rule 2
elseif Ht <= 1.929 [430/7.216%] : Rule 3
elseif Ht <= 2.247 [131/13.753%] : Rule 4
else   [41/16.241%] : Rule 5
end
Philadelphia Station (01474500)
Rules:
If     H <= 1.898 [408/2.114%] : Rule 1
elseif H <= 1.901 [369/4.163%] : Rule 2
elseif H <= 1.984 [417/2.216%] : Rule 3
elseif H <= 2.057 [257/1.539%] : Rule 4
elseif H <= 2.228 [245/3.151%] : Rule 5
elseif H <= 2.467 [93/6.874%] : Rule 6
else   [37/13.366%] : Rule 7
end
Based on Table II, five linear models were developed for Berne station and seven linear models were
developed for Philadelphia station. The developed linear models are shown in Equations (16) and (17).
LM 1: Qt = 54.5153 · Ht − 74.1687
LM 2: Qt = 78.4768 · Ht − 111.6473
LM 3: Qt = 106.0997 · Ht − 158.8781
LM 4: Qt = 142.2248 · Ht − 228.7286
LM 5: Qt = 225.1834 · Ht − 420.9018        (16)
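Read together with the Berne rules of Table II, the five linear models of Equation (16) amount to a piecewise-linear lookup on stage. A minimal sketch is given here, assuming that Rule k selects LM k; the names BERNE_RULES and berne_discharge are hypothetical, and the thresholds and coefficients are those reported for the Berne station.

```python
# (upper stage limit in m, slope, intercept) for LM 1 to LM 5 at Berne.
BERNE_RULES = [
    (1.572, 54.5153, -74.1687),
    (1.691, 78.4768, -111.6473),
    (1.929, 106.0997, -158.8781),
    (2.247, 142.2248, -228.7286),
    (float("inf"), 225.1834, -420.9018),  # final "else" branch
]

def berne_discharge(h):
    """Return discharge (m3/s) from the first rule whose stage limit covers h."""
    for upper, slope, intercept in BERNE_RULES:
        if h <= upper:
            return slope * h + intercept

for h in (1.5, 1.6, 1.8, 2.0, 3.0):
    print(h, round(berne_discharge(h), 3))
```

The slopes increase from one leaf to the next, so the chain of linear models approximates the convex shape of a power-law rating curve.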
LM 1: Qt = 339.3015 · Ht − 590.8252
LM 2: Qt = 23.5523 · Ht − 25.4191
LM 3: Qt = 465.2298 · Ht − 828.8627
LM 4: Qt = 520.3345 · Ht − 938.6954
LM 5: Qt = 628.692 · Ht − 1163.6821
LM 6: Qt = 808.0967 · Ht − 1563.0307
LM 7: Qt = 985.0411 · Ht − 2009.3265        (17)
3.5 Results and discussions
The GEP models presented by Guven and Aytek (2009) for the Berne and Philadelphia stations are shown
in Equations (18) and (19) respectively. In these equations Q is discharge in m3/s and h is stage height measured
from the datum in m. Their study showed the usefulness of GEP models over conventional models.
The present study compares the results obtained from those models with the results of the models developed
here. Model performance is measured by the
coefficient of determination (R2) and the root mean square error (RMSE), which are widely used
in many research areas. Table III shows the performance values of all the models.
\[ Q = 10.313 \, h^{1.5} + 4.738 \, h^{-6} - 27.743 \]    (18)
\[ Q = 2h - 4.925 \, h^{2} + 54.421 \, (2h - 4.715 / h)^{2} - 8.349 \]    (19)
Table III – The R2 and RMSE values for testing period
Models   Berne (01470500)      Philadelphia (01474500)
         R2       RMSE         R2       RMSE
RC-1     0.78     2142         0.668    1674.8
RC-2     0.993    25.7         0.985    43.6
MLR      0.866    31.5         0.941    42.2
GA       0.997    5.9          0.998    5.8
MT       0.970    13.3         0.998    7.3
GEP*     0.942    61.9         0.998    23.1
* Guven and Aytek (2009)
From Table III, it is seen that for the Berne station the models developed using GA (R2 = 0.997
and RMSE = 5.9) and MT (R2 = 0.970 and RMSE = 13.3) give much lower RMSE than the conventional models as
well as the model proposed by the earlier researchers. Similarly, for the Philadelphia station the GA
model (R2 = 0.998 and RMSE = 5.8) and the MT model (R2 = 0.998 and RMSE = 7.3) perform better than the conventional
models. The RMSE values given by these models are also lower than those of the GEP model proposed by Guven and
Aytek (2009) for both stations.
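The two performance measures reported in Table III can be computed as in the sketch below. The paper does not state which definition of the coefficient of determination was used; the sketch assumes the squared Pearson correlation between observed and predicted discharge, and the function names and the four-point example are hypothetical.

```python
def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

def r_squared(observed, predicted):
    """Assumed definition: squared Pearson correlation of the two series."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    sop = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return sop * sop / (soo * spp)

# Made-up example: predictions are exactly twice the observations.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Under this definition a model can be perfectly correlated with the observations and still carry a large RMSE, as the example shows. This is one reason to read the two columns of Table III together rather than relying on R2 alone.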
4 CONCLUSIONS
In this study, an optimization tool, the Genetic Algorithm (GA), and a data-driven technique, the Model
Tree (MT), are used to develop the relationship between river stage and discharge. The results of the GA
and MT models are compared with those of models developed by conventional methods and with those of
the GEP model. The results show that both the GA and MT models perform better than the conventional
models. Of the two, the GA model appears superior to the MT model.
The GA model for the maximum recorded stage height of 5.088 m at Berne predicted a discharge of 1013
m3/s, which is close to the observed value of 972 m3/s. Similarly, at the Philadelphia station, for the
maximum recorded stage of 3.463 m, the model predicted 1503 m3/s against an observed value of
1485 m3/s. The GA model therefore also gives good results at higher stages. The proposed methodology is
expected to be useful for other sites.
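The closeness of the peak-stage predictions quoted above can be expressed as a relative error. The short Python sketch below uses only the four discharge values and two stage heights stated in the text; the helper name and the dictionary layout are illustrative assumptions.

```python
def relative_error_percent(predicted, observed):
    """Signed relative error of a prediction, as a percentage of the observation."""
    return 100.0 * (predicted - observed) / observed


# Peak-stage values quoted in the conclusions (discharge in m3/s, stage in m)
peak_checks = {
    "Berne": {"stage_m": 5.088, "predicted": 1013.0, "observed": 972.0},
    "Philadelphia": {"stage_m": 3.463, "predicted": 1503.0, "observed": 1485.0},
}

for station, v in peak_checks.items():
    err = relative_error_percent(v["predicted"], v["observed"])
    print(f"{station}: {err:.1f}% at stage {v['stage_m']} m")
```

The GA model overestimates the peak discharge by about 4.2% at Berne and about 1.2% at Philadelphia.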
Acknowledgments: The authors thank the USGS for making the data freely available for download
through its web site.
5 REFERENCES
Aytek, A. and Kisi, O. (2008). A genetic programming approach to suspended sediment modelling. Journal of
Hydrology, 351:288-298.
Babovic, V. and Keijzer, M. (2002). Rainfall runoff modelling based on genetic programming. Nordic Hydrology,
33(5), 331-346.
Baiamonte, G. and Ferro, V. (2007). Simple flume for flow measurement in sloping open channel. Journal of Irrig.
Drain. Eng., 133(1), 71-78.
Bhattacharya, B. and Solomatine, D.P. (2005). Neural networks and M5 model trees in modelling water level-discharge
relationship. Neurocomputing, 63: 381-396.
Bhattacharya, B., Price, R.K. and Solomatine, D.P. (2005). Data-driven modelling in the context of sediment transport.
Journal of Physics and Chemistry of the Earth, 30 (4-5), 297-302.
Cheng, C., Wu, X. and Chau, K.W. (2005). Multiple criteria rainfall-runoff model calibration using a parallel genetic
algorithm in a cluster of computers. J. Hydrol. Sciences, 50(6), 1069-1087.
Dorado, J., Rabunal, J.R., Pazos, A., Rivero, D., Santos, A. and Puertas, J. (2003). Prediction and modelling of the
rainfall-runoff transformation of a typical urban basin using ANN and GP. Applied Artificial Intelligence, 17(4),
329-343.
Giustolisi, O. (2004). Using genetic programming to determine Chezy resistance coefficient in corrugated channels. J.
Hydroinformatics, 6(3), 157-173.
Goldberg, D.E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Boston.
Guven, A. and Aytek, A. (2009). New approach for stage-discharge relationship: Gene-expression programming. J.
Hydrologic Eng., 14(8), 812-820.
Habib, E.H., and Meselhe, E.A. (2006). Stage-discharge relations for low-gradient tidal streams using data-driven
models. J. Hydraul. Eng., 132(5), 482-492.
Henderson, F.M. (1966). Open channel flow. The Macmillan Company, New York.
Jain, S.K. (2008). Development of integrated discharge and sediment rating relation using a compound neural network.
J. Hydrologic Eng., 13(3), 124-131.
Jain, S.K. and Chalisgaonkar, D. (2000). Setting up stage-discharge relations using ANN. J. Hydrologic Eng., 5(4),
428-433.
Kumar, D.N. and Reddy, M.J. (2007). Multipurpose reservoir operation using Particle Swarm Optimization. J. Water
Res. Plan. Manage., 133(3), 192-201.
Petersen-Øverleir, A. (2006). Modelling stage-discharge relationships affected by hysteresis using the Jones formula and
nonlinear regression. J. Hydrological Sciences, 51(3), 365-388.
Quinlan, J.R. (1992). Learning with continuous classes. Proceedings Australian Joint Conference on Artificial
Intelligence, 343-348. World Scientific, Singapore.
Rabunal, J.R., Puertas, J., Suarez, J. and Rivero, D. (2007). Determination of the unit hydrograph of a typical urban
basin using genetic programming and artificial neural networks. Hydrol. Process, 21, 476-485.
Reddy, M.J. and Ghimire, B.N.S. (2009). Use of Model tree and Gene expression programming to predict the suspended
sediment load in rivers. J. Intelligent Systems, 18(3), 211-227.
Savic, D.A., Walters, G.A. and Davidson, J.W. (1999). A genetic programming approach to rainfall-runoff modelling.
Water Resource Management, 13: 219-231.
Sefe, F.T.K. (1996). A study of the stage-discharge relationship of the Okavango River at Mohembo, Botswana. J.
Hydrological Sciences, 41(1), 97-116.
Sivapragasam, C., Maheswaran, R. and Venkatesh, V. (2008). Genetic programming approach for flood routing in
natural channels. Hydrol. Process., 22:623-628.
Solomatine, D.P. (2002). Computational intelligence techniques in modelling water systems: some applications. IEEE,
0-7803-7278, 2,1853-1858.
Solomatine, D.P., and Dulal, K.N. (2003). Model tree as an alternative to neural network in rainfall-runoff modeling.
Hydrological Sciences Journal, 48(3), 399-411.
Subramanya, K. (2006). Engineering Hydrology. Tata McGraw-Hill, New Delhi.
Sudheer, K.P. and Jain, S.K. (2003). Radial basis function Neural Network for modelling rating curves. J. Hydrologic
Eng., 8(3), 161-164.
Tawfik, M., Ibrahim, A. and Fahmy, H. (1997). Hysteresis sensitive Neural Network for modelling rating curves. J.
Computing Civil Eng., 11(3), 206-211.
Tung, Y.K., Yen, B.C. and Melching, C.S. (2006). Hydrosystems engineering reliability assessment and risk analysis.
McGraw-Hill, New York.
Tayfur, G. and Singh, V.P. (2006). ANN and fuzzy logic models for simulating event-based rainfall-runoff. J. Hydraul.
Eng., 132(12), 1321-1330.
USGS website, http://www.usgs.gov.
Wang, Y. and Witten, I.H. (1997). Induction of model trees for predicting continuous classes. Proc. European
Conference on Machine Learning, University of Economics, Faculty of Informatics and Statistics, Prague.
***