Handbook of HydroInformatics
Volume I: Classic Soft-Computing Techniques
Edited by
Saeid Eslamian
Full Professor of Hydrology and Water Resources Sustainability, Department of Water Engineering,
College of Agriculture, Isfahan University of Technology, Iran
Faezeh Eslamian
McGill University, Quebec, Canada
Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2023 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about the
Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance
Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher
(other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become
necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
any information, methods, compounds, or experiments described herein. In using such information or methods
they should be mindful of their own safety and the safety of others, including parties for whom they have a
professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability
for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or
from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-821285-1
For information on all Elsevier publications
visit our website at https://www.elsevier.com/books-and-journals
Publisher: Candice Janco
Acquisitions Editor: Maria Elekidou
Editorial Project Manager: Rupinder Heron
Production Project Manager: Bharatwaj Varatharajan
Cover Designer: Greg Harris
Typeset by STRAIVE, India
Dedication
To Late Dr. Mark Twain (American Writer, Humorist, Entrepreneur, Publisher,
and Lecturer, 1835–1910)
“Data is like garbage. You’d better know what you are going to do with it
before you collect it.”
Contents

Contributors
About the editors
Preface

1. Advanced machine learning techniques: Multivariate regression
   Reza Daneshfar, Mohammad Esmaeili, Mohammad Mohammadi-Khanaposhtani, Alireza Baghban, Sajjad Habibzadeh, and Saeid Eslamian
   1. Introduction
   2. Linear regression
   3. Multivariate linear regression
   4. Gradient descent method
   5. Polynomial regression
   6. Overfitting and underfitting
   7. Cross-validation
   8. Comparison between linear and polynomial regressions
   9. Learning curve
   10. Regularized linear models
   11. The ridge regression
   12. The effect of collinearity in the coefficients of an estimator
   13. Outliers impact
   14. Lasso regression
   15. Elastic net
   16. Early stopping
   17. Logistic regression
   18. Estimation of probabilities
   19. Training and the cost function
   20. Conclusions
   Appendix: Python code
      Linear regression
      Gradient descent method
      Comparison between linear and polynomial regressions
      Learning curve
      The effect of collinearity in the coefficients of an estimator
      Outliers impact
      Lasso regression
      Elastic net
      Training and the cost function
   References

2. Bat algorithm optimized extreme learning machine: A new modeling strategy for predicting river water turbidity at the United States
   Salim Heddam
   1. Introduction
   2. Study area and data
   3. Methodology
      3.1 Feedforward artificial neural network
      3.2 Dynamic evolving neural-fuzzy inference system
      3.3 Bat algorithm optimized extreme learning machine
      3.4 Multiple linear regression
      3.5 Performance assessment of the models
   4. Results and discussion
      4.1 USGS 1497500 station
      4.2 USGS 11501000 station
      4.3 USGS 14210000 station
      4.4 USGS 14211010 station
   5. Conclusions
   References

3. Bayesian theory: Methods and applications
   Yaser Sabzevari and Saeid Eslamian
   1. Introduction
   2. Bayesian inference
   3. Phases
   4. Estimates
   5. Theorem Bayes
      5.1 Argument of Bayes
      5.2 Bayesian estimation theory
      5.3 Machine learning using Bayesian method
      5.4 Bayesian theory in machine learning
      5.5 Definition of basic concepts
      5.6 Bayesian machine learning methods
      5.7 Optimal Bayes classifier
      5.8 Naive Bayes classifier
   6. Bayesian network
   7. History of Bayesian model application in water resources
   8. Case study of Bayesian network application in modeling of evapotranspiration of reference plant
   9. Conclusions
   References

4. CFD models
   Hossien Riahi-Madvar, Mohammad Mehdi Riyahi, and Saeid Eslamian
   1. Introduction
   2. Numerical model of one-dimensional advection dispersion equation (1D-ADE)
   3. Physically influenced scheme
   4. Finite Volume Solution of Saint-Venant equations for dam-break simulation using PIS
   5. Discretization of continuity equation using PIS
   6. Discretization of the momentum equation using PIS
   7. Quasi-two-dimensional flow simulation
   8. Numerical solution of quasi-two-dimensional model
   9. 3D numerical modeling of flow in compound channel using turbulence models
   10. Three-dimensional numerical model
   11. Grid generation and the flow field solution
   12. Comparison of different turbulence models
   13. Three-dimensional pollutant transfer modeling
   14. Results of pollutant transfer modeling
   15. Conclusions
   References

5. Cross-validation
   Amir Seraj, Mohammad Mohammadi-Khanaposhtani, Reza Daneshfar, Maryam Naseri, Mohammad Esmaeili, Alireza Baghban, Sajjad Habibzadeh, and Saeid Eslamian
   1. Introduction
      1.1 Importance of validation
      1.2 Validation of the training process
   2. Cross-validation
      2.1 Exhaustive and nonexhaustive cross-validation
      2.2 Repeated random subsampling cross-validation
      2.3 Time-series cross-validation
      2.4 k-fold cross-validation
      2.5 Stratified k-fold cross-validation
      2.6 Nested
   3. Computational procedures
   4. Conclusions
   References

6. Comparative study on the selected node and link-based performance indices to investigate the hydraulic capacity of the water distribution network
   C.R. Suribabu and P. Sivakumar
   1. Introduction
   2. Resilience of water distribution network
   3. Hydraulic uniformity index (HUI)
   4. Mean excess pressure (MEP)
   5. Proposed measure
      5.1 Energy loss uniformity (ELU)
   6. Hanoi network
   7. Results and discussion
   8. Conclusions
   References

7. The co-nodal system analysis
   Vladan Kuzmanović
   1. Introduction
   2. Co-nodal and system analysis
   3. Paleo-hydrology and remote sensing
   4. Methods
   5. Nodes and cyclic confluent system
      5.1 H-cycloids analysis and fluvial dynamics
   6. Three Danube phases
   7. Danubian hypocycles as overlapping phases
   8. Conclusions
   References
   Further reading

8. Data assimilation
   Mohammad Mahdi Dorafshan, Mohammad Reza Jabbari, and Saeid Eslamian
   1. Introduction
   2. What is data assimilation?
   3. Types of data assimilation methods
      3.1 Types of updating procedure
      3.2 Types of updating variable
   4. Optimal filtering methods
      4.1 Kalman filter
      4.2 Transfer function
      4.3 Extended Kalman filter
      4.4 Unscented Kalman filter
   5. Auto-regressive method
   6. Considerations in using data assimilation
   7. Conclusions
   References

9. Data reduction techniques
   M. Mehdi Bateni and Saeid Eslamian
   1. Introduction
   2. Principal component analysis
   3. Singular spectrum analysis
      3.1 Univariate singular spectral analysis
      3.2 Multivariate singular spectral analysis
   4. Canonical correlation analysis
   5. Factor analysis
      5.1 Principal axis factoring
   6. Random projection
   7. Isometric mapping
   8. Self-organizing maps
   9. Discriminant analysis
   10. Piecewise aggregate approximation
   11. Clustering
      11.1 k-means clustering
      11.2 Hierarchical clustering
      11.3 Density-based clustering
   12. Conclusions
   References

10. Decision tree algorithms
   Amir Ahmad Dehghani, Neshat Movahedi, Khalil Ghorbani, and Saeid Eslamian
   1. Introduction
      1.1 ID3 algorithm
      1.2 C4.5 algorithm
      1.3 CART algorithm
      1.4 CHAID algorithm
      1.5 M5 algorithm
      1.6 Random forest
      1.7 Application of DT algorithms in water sciences
   2. M5 model tree
      2.1 Splitting
      2.2 Pruning
      2.3 Smoothing
   3. Data set
      3.1 Empirical formula for flow discharge
      3.2 Model evaluation and comparison
   4. Modeling and results
      4.1 Initial tree
      4.2 Pruning
      4.3 Comparing M5 model and empirical formula
   5. Conclusions
   References

11. Entropy and resilience indices
   Mohammad Ali Olyaei, A.H. Ansari, Zahra Heydari, and Amin Zeynolabedin
   1. Introduction
   2. Water resource and infrastructure performance evaluation
   3. Entropy
      3.1 Thermodynamic entropy
      3.2 Statistical-mechanical entropy
      3.3 Information entropy
      3.4 Application of entropy in water resources area
   4. Resilience
      4.1 Application of resilience in water resources area
      4.2 Resilience in UWS
      4.3 Resilience in urban environments
      4.4 Resilience to floods
      4.5 Resilience to drought
   5. Conclusions
   References

12. Forecasting volatility in the stock market data using GARCH, EGARCH, and GJR models
   Sarbjit Singh, Kulwinder Singh Parmar, and Jatinder Kaur
   1. Introduction
   2. Methodology
      2.1 Types of GARCH models
   3. Application and results
   4. Conclusions
   References

13. Gene expression models
   Hossien Riahi-Madvar, Mahsa Gholami, and Saeid Eslamian
   1. Introduction
   2. Genetic programming
      2.1 The basic steps in GEP development
      2.2 The basic steps in GEP development
   3. Tree-based GEP
      3.1 Tree depth control
      3.2 Maximum tree depth
      3.3 Penalizing the large trees
      3.4 Dynamic maximum-depth technique
   4. Linear genetic programming
   5. Evolutionary polynomial regression
   6. Multigene genetic programming
   7. Pareto optimal-multigene genetic programming
   8. Some applications of GEP-based models in hydro informatics
      8.1 Derivation of quadric polynomial function using GEP
      8.2 Derivation of Colebrook-White equation using GEP
      8.3 Derivation of the exact form of shield's diagram using GEP
      8.4 Extraction of regime river equations using GEP
      8.5 Extraction of longitudinal dispersion coefficient equations using GEP
   9. Conclusions
   References

14. Gradient-based optimization
   Mohammad Zakwan
   1. Introduction
   2. Materials and method
      2.1 GRG solver
   3. Results and discussion
      3.1 Solving nonlinear equations
      3.2 Application in parameter estimation
      3.3 Fitting empirical equations
   4. Conclusions
   References

15. Gray wolf optimization algorithm
   Mohammad Reza Zaghiyan, Vahid Shokri Kuchak, and Saeid Eslamian
   1. Introduction
   2. Theory of GWO
   3. Mathematical modeling of gray wolf optimizer
      3.1 Social hierarchy
      3.2 Encircling prey
      3.3 Hunting behavior
      3.4 Exploitation in GWO-attacking prey
      3.5 Exploration in GWO-search for prey
   4. Gray wolf optimization example for reservoir operation
   5. Conclusions
   Appendix A: GWO Matlab codes for the reservoir example
   References

16. Kernel-based modeling
   Kiyoumars Roushangar, Roghayeh Ghasempour, and Saman Shahnazi
   1. Introduction
   2. Support vector machine
      2.1 Support vector classification
      2.2 Support vector regression
   3. Gaussian processes
      3.1 Gaussian process regression
      3.2 Gaussian process classification
   4. Kernel extreme learning machine
   5. Kernels type
      5.1 Fisher kernel
      5.2 Graph kernels
      5.3 Kernel smoother
      5.4 Polynomial kernel
      5.5 Radial basis function kernel
      5.6 Pearson kernel
      5.7 String kernels
      5.8 Neural tangent kernel
   6. Application of kernel-based approaches
      6.1 Total resistance and form resistance of movable bed channels
      6.2 Energy losses of rectangular and circular culverts
      6.3 Lake and reservoir water level prediction
      6.4 Streamflow forecasting
      6.5 Sediment load prediction
      6.6 Pier scour modeling
      6.7 Reservoir evaporation prediction
   7. Conclusions
   References
   Further reading

17. Large eddy simulation: Subgrid-scale modeling with neural network
   Tamas Karches
   1. Introduction
   2. LES and traditional subgrid-scale modeling
   3. Data-driven LES closures
   4. Guidelines for SGS modeling
      4.1 Simulation project definition
      4.2 A priori analysis with DNS
      4.3 Neural network based SGS model construction
   5. Conclusions
   References

18. Lattice Boltzmann method and its applications
   Mojtaba Aghajani Delavar and Junye Wang
   1. Introduction
   2. Lattice Boltzmann equations
      2.1 BGK approximation
      2.2 Lattice Boltzmann models
      2.3 Multirelaxation time lattice Boltzmann (MRT)
      2.4 Boundary conditions
   3. Thermal LBM
      3.1 Boundary condition with a given temperature
      3.2 Constant heat flux boundary condition
   4. Multicomponent LBM (species transport modeling)
   5. Flow simulation in porous media
   6. Dimensionless numbers
   7. Flow chart of the simulation procedure
   8. Multiphase flows
      8.1 The color-gradient model
      8.2 Shan-Chen model
   9. Sample test cases and codes
      9.1 Free convection in L-cavity
      9.2 Force convection in a channel
   10. Conclusions
   Appendix A: Computer code for free convection in L-cavity
   Appendix B: Computer code for force convection in a channel
   References

19. Multigene genetic programming and its various applications
   Majid Niazkar
   1. Introduction
   2. Genetic programming and its variants
   3. An introduction to multigene genetic programming
   4. Main controlling parameters of MGGP
   5. A review on MGGP applications
   6. Future trends of MGGP applications
   7. A case study of the MGGP application
   8. Conclusions
   References

20. Ontology-based knowledge management framework in business organizations and water users networks in Tanzania
   Neema Penance Kumburu
   1. Introduction
   2. Theoretical framework
   3. Empirical literature
   4. Ontology-based knowledge management framework in business organizations: A conceptual framework
   5. Ontology-based knowledge management framework in business organizations and water user networks proposed system
   6. The practice of knowledge organization and expression
      6.1 Ontology
      6.2 Knowledge representation and organization base on ontology
      6.3 Knowledge retrieval base ontology
      6.4 Knowledge application and implementation base on ontology
   7. Conclusions
   References

21. Parallel chaos search-based incremental extreme learning machine
   Salim Heddam
   1. Introduction
   2. Materials and methods
      2.1 Study area description
      2.2 Modeling approaches
      2.3 Performance assessment of the models
   3. Results and discussion
   4. Conclusions
   References

22. Relevance vector machine (RVM)
   Mohammad Reza Jabbari, Mohammad Mahdi Dorafshan, and Saeid Eslamian
   1. Introduction
   2. Machine learning algorithms
      2.1 Supervised learning
      2.2 Unsupervised learning
   3. Support vector machine
   4. Relevance vector machine
      4.1 Measurement model representation
      4.2 Relevance vector regression
      4.3 Relevance vector classification
      4.4 Limitations and performance analysis
      4.5 Multivariate relevance vector machines
   5. Preprocessing step
      5.1 Data normalization
      5.2 Data reduction
      5.3 Dataset split ratio
   6. Applications of relevance vector machine
      6.1 Sediment concentration estimation
      6.2 Drought monitoring
      6.3 Groundwater quality monitoring
      6.4 Evaporative losses in reservoirs
      6.5 Environmental science
   7. Conclusions
   References

23. Stochastic learning algorithms
   Amir Hossein Montazeri, Sajad Khodambashi Emami, Mohammad Reza Zaghiyan, and Saeid Eslamian
   1. Introduction
   2. Gradient descent
      2.1 Theory of batch gradient descent
      2.2 Theory of SGD
   3. Perceptron
      3.1 Theory of perceptron
      3.2 Perceptron learning procedure
   4. Adaline
      4.1 Theory of Adaline
      4.2 Adaline learning procedure
   5. Multilayer network
      5.1 Multilayer network learning procedure
   6. Learning vector quantization
      6.1 LVQ learning procedure
   7. K-means clustering
      7.1 What is clustering?
      7.2 Theory of K-means
   8. Gradient boosting
      8.1 What is boosting?
      8.2 Theory of gradient boosting (GB)
      8.3 Stochastic gradient boosting
   9. Conclusions
   References
   Appendix A
   Appendix B
   Appendix C
   Appendix D
   Appendix E

24. Supporting vector machines
   Kiyoumars Roushangar and Roghayeh Ghasempour
   1. Introduction
   2. SVMs for classification problems
      2.1 Linear classifiers
      2.2 Non-linear classifiers
   3. SVMs for regression problems
   4. Selection of SVM parameters
      4.1 Margin
      4.2 Regularization
      4.3 Kernels
      4.4 Gamma parameter
   5. Application of support vector machines
      5.1 Application of support vector regression in water resource engineering
   6. Conclusions
   References

25. Uncertainty analysis using fuzzy models in hydroinformatics
   Tayeb Boulmaiz, Mawloud Guermoui, Mohamed Saber, Hamouda Boutaghane, Habib Abida, and Saeid Eslamian
   1. Introduction
   2. Fuzzy logic theory
      2.1 Fuzzification
      2.2 Rule base
      2.3 Inference
      2.4 Defuzzification
   3. Concept of fuzzy uncertainty analysis
   4. Uncertainty analysis applications
      4.1 Flood forecasting
      4.2 Groundwater modeling
   5. Machine learning and fuzzy sets
   6. Fuzzy sets and probabilistic approach
   7. Conclusions
   References

26. Uncertainty-based resiliency evaluation
   Hossien Riahi-Madvar, Mohammad Mehdi Riyahi, and Saeid Eslamian
   1. Introduction
   2. Uncertainty analysis by the first-order method
   3. Risk and resilience analysis
   4. Reliability computation by direct integration
   5. Reliability computation using safety margin/safety factor
   6. Safety margin
   7. Safety factor
   8. Uncertainty-based hydraulic designs
   9. Hydrologic uncertainties
   10. Hydraulics uncertainties
   11. Monte-Carlo uncertainty analysis in quasi-2D model parameters
   12. SKM model
   13. Uncertainty based river flow modeling with Monte-Carlo simulator
   14. Monte-Carlo uncertainty analysis in machine learning techniques
   15. Uncertainty evaluation using the integrated Bayesian multimodel framework
   16. Copula-based uncertainty analysis
   17. Uncertainty analysis with Tsallis entropy
   18. Theory of evidence for uncertainty in hydroinformatics
   19. Resiliency quantification
   20. Conclusions
   References

Index
Contributors
Habib Abida (423), Laboratory of Modeling of Geological and Hydrological Systems (GEOMODELE (LR16ES17)), Faculty of Sciences, University of Sfax, Sfax, Tunisia
A.H. Ansari (189), Department of Agricultural and Biological Engineering, Pennsylvania State University,
State College, PA, United States
Alireza Baghban (1,89), Chemical Engineering
Department, Amirkabir University of Technology
(Tehran Polytechnic), Mahshahr Campus, Mahshahr,
Iran
M. Mehdi Bateni (153), University School for Advanced
Studies, Pavia, Italy
Tayeb Boulmaiz (423), Materials, Energy Systems Technology and Environment Laboratory, University of
Ghardaia, Ghardaia, Algeria
Hamouda Boutaghane (423), Laboratory of Soil and
Hydraulic, Badji Mokhtar Annaba University, Annaba,
Algeria
Reza Daneshfar (1,89), Department of Petroleum Engineering, Ahwaz Faculty of Petroleum Engineering,
Petroleum University of Technology, Ahwaz, Iran
Amir Ahmad Dehghani (171), Department of Water Engineering, Gorgan University of Agricultural Sciences &
Natural Resources, Gorgan, Iran
Mojtaba Aghajani Delavar (289), Faculty of Science and
Technology, Athabasca University, Athabasca, AB,
Canada
Mohammad Mahdi Dorafshan (135,365), Department of
Civil Engineering, Isfahan University of Technology,
Isfahan, Iran
Sajad Khodambashi Emami (385), Department of Water
Engineering and Management, Tarbiat Modares University, Tehran, Iran
Saeid Eslamian (1,57,69,89,135,153,171,221,253,365,
385,423,435), Department of Water Engineering,
College of Agriculture, Isfahan University of Technology; Center of Excellence in Risk Management
and Natural Hazards, Isfahan University of Technology,
Isfahan, Iran
Mohammad Esmaeili (1,89), Department of Petroleum
Engineering, Amirkabir University of Technology
(Polytechnic of Tehran); Department of Petroleum
Engineering, Amirkabir University of Technology
(Tehran Polytechnic), Tehran, Iran
Roghayeh Ghasempour (267,411), Department of Water
Resources Engineering, Faculty of Civil Engineering,
University of Tabriz, Tabriz, Iran
Mahsa Gholami (221), Department of Civil Engineering,
Faculty of Engineering, Bu-Ali Sina University,
Hamedan, Iran
Khalil Ghorbani (171), Department of Water Engineering,
Gorgan University of Agricultural Sciences & Natural
Resources, Gorgan, Iran
Mawloud Guermoui (423), Unite de Recherche Appliquee en Energies Renouvelables, Centre de Developpement des Energies Renouvelables, Ghardaïa, Algeria
Sajjad Habibzadeh (1,89), Chemical Engineering
Department, Amirkabir University of Technology
(Tehran Polytechnic), Mahshahr Campus, Mahshahr;
Surface Reaction and Advanced Energy Materials Laboratory, Chemical Engineering Department, Amirkabir
University of Technology (Tehran Polytechnic),
Tehran, Iran
Salim Heddam (39,349), Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology,
Hydraulics Division, Agronomy Department, Faculty
of Science, Skikda, Algeria
Zahra Heydari (189), Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
Mohammad Reza Jabbari (135,365), Department of Electrical and Computer Engineering, Isfahan University of
Technology, Isfahan, Iran
Tamas Karches (283), Faculty of Water Science, University of Public Service, Budapest, Hungary
Jatinder Kaur (207), Department of Mathematics,
I.K. Gujral Punjab Technical University, Kapurthala;
Guru Nanak Dev University College, Amritsar, Punjab,
India
Vahid Shokri Kuchak (253), Department of Water Engineering and Management, Tarbiat Modares University,
Tehran, Iran
Neema Penance Kumburu (333), Moshi Co-operative
University, Moshi, Tanzania
Vladan Kuzmanovic (119), Serbian Hydrological Society,
International Association of Hydrological Sciences,
Belgrade, Serbia
Mohammad Mohammadi-Khanaposhtani (1), Fouman
Faculty of Engineering, College of Engineering, University of Tehran, Tehran, Iran
Amir Hossein Montazeri (385), Department of Water
Engineering and Management, Tarbiat Modares University, Tehran, Iran
Neshat Movahedi (171), Department of Water Engineering, Gorgan University of Agricultural Sciences
& Natural Resources, Gorgan, Iran
Maryam Naseri (89), Chemical Engineering Department,
Babol Noshirvani University of Technology, Babol,
Iran
Majid Niazkar (321), Department of Agricultural and
Environmental Sciences - Production, Landscape,
Agroenergy, University of Milan, Milan, Italy
Mohammad Ali Olyaei (189), Department of Civil Environmental and Geo-Engineering, University of Minnesota, Minneapolis, MN, United States
Kulwinder Singh Parmar (207), Department of Mathematics, I.K. Gujral Punjab Technical University,
Kapurthala, Punjab, India
Hossien Riahi-Madvar (69,221,435), Department of
Water Engineering, Faculty of Agriculture, Vali-e-Asr
University of Rafsanjan, Rafsanjan, Iran
Mohammad Mehdi Riyahi (69), Department of Civil
Engineering, Faculty of Civil Engineering and Architecture, Shahid Chamran University of Ahvaz, Ahvaz,
Iran
Kiyoumars Roushangar (267,411), Department of Water
Resources Engineering, Faculty of Civil Engineering;
Center of Excellence in Hydroinformatics, University
of Tabriz, Tabriz, Iran
Mohamed Saber (423), Disaster Prevention Research
Institute (DPRI), Kyoto University, Kyoto, Japan
Yaser Sabzevari (57), Department of Water Engineering,
College of Agriculture, Isfahan University of Technology, Isfahan, Iran
Amir Seraj (89), Department of Instrumentation and
Industrial Automation, Ahwaz Faculty of Petroleum
Engineering, Petroleum University of Technology,
Ahwaz, Iran
Saman Shahnazi (267), Department of Water Resources
Engineering, Faculty of Civil Engineering, University
of Tabriz, Tabriz, Iran
Sarbjit Singh (207), Guru Nanak Dev University College,
Pathankot; Department of Mathematics, Guru Nanak
Dev University, Amritsar, Punjab, India
P. Sivakumar (107), Department of Civil Engineering,
North Eastern Regional Institute of Science and Technology, Nirjuli (Itanagar), Arunachal Pradesh, India
C.R. Suribabu (107), Centre for Advanced Research in
Environment, School of Civil Engineering, SASTRA
Deemed University, Thanjavur, Tamil Nadu, India
Junye Wang (289), Faculty of Science and Technology,
Athabasca University, Athabasca, AB, Canada
Mohammad Reza Zaghiyan (385), Department of Water
Engineering and Management, Tarbiat Modares University, Tehran, Iran
Mohammad Zakwan (243), School of Technology,
Maulana Azad National Urdu University, Hyderabad,
India
Amin Zeynolabedin (189), School of Civil Engineering,
College of Engineering, University of Tehran, Tehran,
Iran
About the editors
Saeid Eslamian has been a Full Professor of Environmental Hydrology and Water
Resources Engineering in the Department of Water Engineering at Isfahan University
of Technology since 1995. His research focuses mainly on statistical and environmental
hydrology in a changing climate. In recent years, he has worked on modeling natural
hazards, including floods, severe storms, wind, drought, and pollution, and on water reuse,
sustainable development and resiliency, etc. Formerly, he was a visiting professor at Princeton University, New Jersey, and at ETH Zurich, Switzerland. On the
research side, he started a research partnership in 2014 with McGill University, Canada.
He has contributed to more than 600 publications in journals, books, and technical reports.
He is the founder and Chief Editor of both the International Journal of Hydrology Science
and Technology (IJHST) and the Journal of Flood Engineering (JFE). Dr. Eslamian is
currently Associate Editor of four important publications: Journal of Hydrology
(Elsevier), Eco-Hydrology and Hydrobiology (Elsevier), Journal of Water Reuse and
Desalination (IWA), and Journal of the Saudi Society of Agricultural Sciences (Elsevier).
Professor Eslamian is the author of approximately 35 books and 180 book chapters.
Dr. Eslamian’s professional experience includes membership on editorial boards, and he is a reviewer of approximately
100 Web of Science (ISI) journals, including the ASCE Journal of Hydrologic Engineering, ASCE Journal of Water
Resources Planning and Management, ASCE Journal of Irrigation and Drainage Engineering, Advances in Water
Resources, Groundwater, Hydrological Processes, Hydrological Sciences Journal, Global Planetary Changes, Water
Resources Management, Water Science and Technology, Eco-Hydrology, Journal of the American Water Resources Association, American Water Works Association Journal, etc. Furthermore, in 2015, UNESCO nominated him for a special
issue of the Eco-Hydrology and Hydrobiology Journal.
Professor Eslamian was selected as an outstanding reviewer for the Journal of Hydrologic Engineering in 2009 and
received the EWRI/ASCE Visiting International Fellowship at the University of Rhode Island (2010). He was also awarded
prizes for outstanding work by the Iranian Hydraulics Association in 2005 and the Iranian petroleum and oil industry in
2011. Professor Eslamian was chosen as a distinguished researcher by Isfahan University of Technology (IUT) and Isfahan
Province in 2012 and 2014, respectively. In 2016, he was a candidate for National Distinguished Researcher in Iran.
Dr. Eslamian has also acted as a referee for many international organizations and universities. Some examples include
the US Civilian Research and Development Foundation (USCRDF), the Swiss Network for International Studies, the His
Majesty’s Trust Fund for Strategic Research of Sultan Qaboos University, Oman, the Royal Jordanian Geography Center
College, and the Research Department of Swinburne University of Technology of Australia. He is also a member of the
following associations: American Society of Civil Engineers (ASCE), International Association of Hydrologic Science
(IAHS), World Conservation Union (IUCN), GC Network for Drylands Research and Development (NDRD), International
Association for Urban Climate (IAUC), International Society for Agricultural Meteorology (ISAM), Association of Water
and Environment Modeling (AWEM), International Hydrological Association (STAHS), and UK Drought National
Center (UKDNC).
Professor Eslamian finished Hakim-Sanaei High School in Isfahan in 1979. After the Islamic Revolution, he was
admitted to Isfahan University of Technology (IUT) to study a BS in water engineering, and he graduated in 1986. He
was subsequently offered a scholarship for a master’s degree program at Tarbiat Modares University, Tehran. He finished
his studies in hydrology and water resources engineering in 1989. In 1991, he was awarded a scholarship for a PhD in civil
engineering at the University of New South Wales, Australia. His supervisor was Professor David H. Pilgrim, who
encouraged Professor Eslamian to work on “Regional Flood Frequency Analysis Using a New Region of Influence
Approach.” He earned a PhD in 1995 and returned to his home country and IUT. He was promoted in 2001 to Associate
Professor and in 2014 to Full Professor. For the past 26 years, he has been nominated for different positions at IUT,
including University President Consultant, Faculty Deputy of Education, and Head of Department. Dr. Eslamian is
now director of the Center of Excellence in Risk Management and Natural Hazards (RiMaNaH).
Professor Eslamian has made three scientific visits, to the United States, Switzerland, and Canada in 2006, 2008, and
2015, respectively. In the first, he was offered the position of visiting professor by Princeton University and worked jointly
with Professor Eric F. Wood at the School of Engineering and Applied Sciences for 1 year. The outcome was a contribution
to hydrological and agricultural drought interaction knowledge through developing multivariate L-moments between soil
moisture and low flows for northeastern US streams.
Recently, Professor Eslamian has written 14 handbooks published by Taylor & Francis (CRC Press): the three-volume
Handbook of Engineering Hydrology (2014), Urban Water Reuse Handbook (2016), Underground Aqueducts Handbook
(2017), the three-volume Handbook of Drought and Water Scarcity (2017), Constructed Wetlands: Hydraulic Design
(2019), Handbook of Irrigation System Selection for Semi-Arid Regions (2020), Urban and Industrial Water Conservation
Methods (2020), and the three-volume Flood Handbook (2022).
An Evaluation of Groundwater Storage Potentials in a Semiarid Climate (2019) and Advances in Hydrogeochemistry
Research (2020) by Nova Science Publishers are also among his book publications. The two-volume Handbook of Water
Harvesting and Conservation (2021, Wiley) and Handbook of Disaster Risk Reduction and Resilience (2021, New Frameworks for Building Resilience to Disasters) are further Springer publications by Professor Eslamian, as are the Handbook of
Disaster Risk Reduction and Resilience (2022, Disaster Risk Management Strategies) and the two-volume Earth Systems
Protection and Sustainability (2022).
Professor Eslamian was listed among the World’s Top 2% of Researchers by Stanford University, USA, in 2019 and
2020. He has also been a grant assessor, report referee, award jury member, and invited researcher for international organizations such as the United States Civilian Research and Development Foundation (2006), Intergovernmental Panel on
Climate Change (2012), World Bank Policy and Human Resources Development Fund (2021), and Stockholm International Peace Research Institute (2022), respectively.
Faezeh Eslamian holds a PhD in Bioresource Engineering from McGill University,
Canada. Her research focuses on the development of a novel lime-based product to mitigate phosphorus loss from agricultural fields. Dr. Eslamian completed her bachelor's and
master’s degrees in Civil and Environmental Engineering at the Isfahan University of
Technology, Iran, where she evaluated natural and low-cost absorbents for the removal
of pollutants such as textile dyes and heavy metals. Furthermore, she has conducted
research on worldwide water quality standards and wastewater reuse guidelines. Dr. Eslamian is an experienced multidisciplinary researcher with interests in soil and water
quality, environmental remediation, water reuse, and drought management.
Preface
Classic Soft-Computing Techniques is the first volume of three in the Handbook of HydroInformatics series. Through this
comprehensive, 26-chapter work, the contributors explore the difference between traditional computing, also known as
hard computing, and soft computing, which is based on the importance given to issues like precision, certainty, and rigor.
The chapters go on to define fundamental classic soft-computing techniques such as multivariate regressions, bat algorithm
optimized extreme learning machine (Bat-ELM), Bayesian inference, computational fluid dynamics (CFD) models, cross
validation, selected node and link-based performance indices, conodal system analysis, data assimilation, data reduction
techniques, decision tree algorithm, entropy and resilience indices, generalized autoregressive conditional heteroskedasticity (GARCH), exponential general autoregressive conditional heteroskedastic (EGARCH), and Glosten, Jagannathan,
and Runkle (GJR) models, gene expression models, gradient-based optimization, gray wolf optimization (GWO) algorithm, kernel-based modeling, subgrid-scale (SGS) modeling with neural network, lattice Boltzmann method (LBM), multigene genetic programming (MGGP), ontology-based knowledge management framework, parallel chaos search-based
incremental extreme learning, relevance vector machine (RVM), stochastic learning algorithms, support vector machine,
uncertainty analysis using fuzzy logic models, uncertainty-based resiliency evaluation, etc. It is a fully comprehensive
handbook providing all the information needed regarding classic soft-computing techniques.
This volume is a true interdisciplinary work, and the intended audience includes postgraduates and early-career
researchers interested in computer science, mathematical science, applied science, Earth and geoscience, geography, civil
engineering, engineering, water science, atmospheric science, social science, environment science, natural resources, and
chemical engineering.
The Handbook of HydroInformatics corresponds to courses that could be taught at the following levels: undergraduate,
postgraduate, research students, and short course programs. Typical course names of this type include: HydroInformatics,
Soft Computing, Learning Machine Algorithms, Statistical Hydrology, Artificial Intelligence, Optimization, Advanced
Engineering Statistics, Time Series, Stochastic Processes, Mathematical Modeling, Data Science, Data Mining, etc.
The three-volume Handbook of HydroInformatics is recommended not only for universities and colleges, but also for
research centers, governmental departments, policy makers, engineering consultants, federal emergency management
agencies, and related bodies.
Key features are as follows:
- Contains key insights from global contributors in the fields of data management research, climate change and resilience, insufficient data problems, etc.
- Offers applied examples and case studies in each chapter, providing the reader with real-world scenarios for comparison
- Introduces classic soft-computing techniques necessary for a range of disciplines
Saeid Eslamian
College of Agriculture, Isfahan University of Technology, Isfahan, Iran
Faezeh Eslamian
McGill University, Montreal, QC, Canada
Chapter 1
Advanced machine learning techniques:
Multivariate regression
Reza Daneshfar (a), Mohammad Esmaeili (b), Mohammad Mohammadi-Khanaposhtani (c), Alireza Baghban (d), Sajjad Habibzadeh (d), and Saeid Eslamian (e,f)
(a) Department of Petroleum Engineering, Ahwaz Faculty of Petroleum Engineering, Petroleum University of Technology, Ahwaz, Iran; (b) Department of Petroleum Engineering, Amirkabir University of Technology (Polytechnic of Tehran), Tehran, Iran; (c) Fouman Faculty of Engineering, College of Engineering, University of Tehran, Tehran, Iran; (d) Chemical Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Mahshahr Campus, Mahshahr, Iran; (e) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (f) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Complicated problems in a variety of fields that cannot be solved using conventional techniques are handled using machine
learning (Zeebaree et al., 2019; Bargarai et al., 2020; Dargan et al., 2020). Linear regression is a simple and popular
machine learning technique employed for prediction purposes. It was introduced by Galton (1894). It is a mathematical approach
to analyze and quantify the associations of variables (Akgün and Öğüdücü, 2015; Dehghan et al., 2015; Liu et al., 2017). To incorporate the outputs of other confounders/covariates into a model, one cannot utilize univariate regression—i.e.,
chi-square, Fisher exact test, and analysis of variance (ANOVA). As a result, partial correlation and regression are
chi-square, Fisher exact test, and analysis of variance (ANOVA). As a result, partial correlation and regression are
employed to identify the association of two variables and evaluate the confusion effect (Zebari et al., 2020; Sulaiman,
2020; Epskamp and Fried, 2018). Mathematical algorithms typically employ linear regression for the purpose of predicted
effect measurement and modeling versus several inputs (Lim, 2019). This data analysis approach linearly relates independent and dependent variables, modeling the relationships between the independent and dependent variables based
on model training. The present study conducts a review of recent popular methodologies in the machine learning and linear
regression literature, including databases, performance, accuracy, and algorithms, from 2017 to 2020 (Sarkar et al., 2015).
This chapter is divided into the following sections: The first section focuses on linear regression, this is followed by an
explanation of multivariate linear regression, and then the gradient descent method is described. The polynomial regression
concept is then explained and concepts such as overfitting and under-fitting, cross-validation, and learning curve are
expressed in a clear and fluent manner. Finally, the attractive and practical concepts that are discussed include: regularized
linear models, ridge regression, outliers impact, lasso regression, elastic net, early stopping, and logistic regression.
2. Linear regression
When we know a property or a dependent variable in general depends on several variables but the way of this dependence is
not clear to us, a linear model is the simplest choice to get an insight into this dependence. Although the simplest choice is
not necessarily the best one, linear models can do a lot in the case of algebraic dependency between a function and its
variables. A linear model can provide a reasonable estimation of any function at least in a small neighborhood. Moreover,
some nonlinear dependencies as suggested by theories could be transformed to a linear dependency. For example, consider
the following chemical reaction rate law
$$r_A = k C_A^n \qquad (1)$$
in which k and n are constants to be determined from experimental data of reaction rate ($r_A$) versus species concentration ($C_A$). To apply the favorable linear model, one can transform the above equation by taking the natural logarithm of both sides to obtain

$$\ln(r_A) = \ln k + n \ln(C_A) \qquad (2)$$
Another example that can be suited to multivariate problems is the polynomial regression. This topic will be discussed in a
separate section; however, the linear model can somewhat cover such problems by an interesting trick. Consider the following model
$$y = a_0 + a_1 z + a_2 z^2 \qquad (3)$$

In this case, by introducing new variables, the nonlinear model is transformed into a linear model. Assume that $z = x_1$ and $z^2 = x_2$; then

$$y = a_0 + a_1 x_1 + a_2 x_2 \qquad (4)$$
Although the values of x2 are not independent of x1, this does not interfere with the application of the linear regression algorithm. These two examples demonstrate that linear models for multivariate problems are a fundamental tool that cannot be ignored by practitioners, especially in the field of machine learning or, more elegantly, artificial intelligence (Olive, 2017; Matloff, 2017).
In this section, we are going to go through a project called nutrient removal efficiency: we use a data set containing 7876 data points to predict the total phosphorus (TP), ammonium (NH4-N), and total nitrogen (TN) removal efficiency of an anaerobic anoxic-oxic membrane bioreactor system, where the output values are predicted from the nine inputs given in Table 1. This dataset was taken from the data reported in an article published by Yaqub et al. (2020).
In this part, we are only using one explanatory variable (e.g., TOC) to explain the output (RE of TN).
The linear regression diagram for this example is shown in Fig. 1. After a successful fit, it can be seen that the removal efficiency of TN increases with increasing TOC.
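As a minimal sketch of this single-variable fit, the snippet below uses scikit-learn; the CSV file name and the column labels ("TOC", "RE_TN") are hypothetical stand-ins for however the Yaqub et al. (2020) data are actually stored, and this is not the chapter's own listing (those are collected in the appendix).

# Minimal single-variable linear regression sketch; file and column names are assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("nutrient_removal.csv")
X = df[["TOC"]].values   # single explanatory variable
y = df["RE_TN"].values   # removal efficiency of TN

model = LinearRegression()
model.fit(X, y)

print("intercept a0:", model.intercept_)
print("slope a1:", model.coef_[0])
print("R^2 on the training data:", model.score(X, y))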
3. Multivariate linear regression
When y is a function of n variables, namely x1 to xn, the simplest model for the dependency is a linear model, which can provide an estimation $\hat{y}$ of the function as

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \qquad (5)$$

TABLE 1 The attribute information of the nutrient removal efficiency project.

Code          Input or output   Description
TOC           Input             Total organic contents
TN            Input             Total nitrogen
TP            Input             Total phosphorus
COD           Input             Chemical oxygen demand
NH4-N         Input             Ammonium
SS            Input             Suspended solids
DO            Input             Dissolved oxygen
ORP           Input             Oxidation-reduction potential
MLSS          Input             Mixed liquor suspended solids
RE of NH4-N   Output            Removal efficiency of NH4-N
RE of TN      Output            Removal efficiency of TN
RE of TP      Output            Removal efficiency of TP
FIG. 1 Linear regression for the nutrient removal efficiency project (NH4-N-OUT vs. TOC).
where a0 to an are the model parameters to be determined using available data in combination with a proper linear regression algorithm (Hackeling, 2017). Matrix notation helps provide a compact form of the equations in multivariate problems. In matrix form,

$$\hat{y} = \mathbf{x}^T \mathbf{a} \qquad (6)$$
where $\mathbf{x}^T = [1 \; x_1 \; x_2 \; \cdots \; x_n]$ indicates the transpose of the column matrix x and, similarly, a denotes the column matrix of parameters. Note that a new term, $x_0 = 1$, is introduced to make the matrix product possible. Now the problem is reduced to the determination of the elements of the matrix a under suitable constraints that eventually provide a system of linear equations for specifying the model parameters. First, one must note that at each point the model error is defined as $e_i = y_i - \hat{y}_i$, that is,

$$e_i = y_i - \mathbf{x}_i^T \mathbf{a} \qquad (7)$$

where $\mathbf{x}_i^T$ can be interpreted as the i-th row of the matrix $X^T$, which includes the values of each variable at different points. The error vector can be defined as a column matrix as

$$\mathbf{e} = \mathbf{y} - X^T \mathbf{a} \qquad (8)$$
where both e and y have p elements (column vectors with p rows) and $X^T$ is a $p \times (n+1)$ matrix:

$$X^T = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{n,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{n,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,p} & x_{2,p} & \cdots & x_{n,p} \end{bmatrix} \qquad (9)$$
At the first glance, minimization of the absolute value of the error results in the best values of model parameters. But since
the error is presented by the vector e, one should talk about the minimization of a suitable norm of that. Moreover, the first
norm which adds up the absolute values of the elements of e would bring some problems in terms of differentiation. Thus,
the better choice is the second norm, or the Euclidean norm, of the error vector:

$$\|\mathbf{e}\|_2 = \sqrt{e_1^2 + e_2^2 + \cdots + e_p^2} \qquad (10)$$

And finally, since minimization of the above function is equivalent to the minimization of the summation on the right side, the Sum of Squares of Errors (SSE) is taken as the target function in linear regression problems:

$$SSE = \sum_{i=1}^{p} e_i^2 = \mathbf{e}^T \mathbf{e} = \left(\mathbf{y}^T - \mathbf{a}^T X\right)\left(\mathbf{y} - X^T \mathbf{a}\right) \qquad (11)$$
And minimization of this function without any constraint would result in an Ordinary Least Square (OLS) method for determination of model parameters, i.e., a0 to an. Obviously, this method involves vanishing the first partial derivatives of SSE
concerning the model parameters which provide the required n + 1 equations:
$$\frac{\partial (SSE)}{\partial a_k} = -2\sum_{i=1}^{p} y_i x_{k,i} + 2\sum_{i=0}^{n} a_i \left(XX^T\right)_{ki} = 0, \quad k = 0, 1, 2, \ldots, n$$

$$\Rightarrow \; \sum_{i=0}^{n} a_i \left(XX^T\right)_{ki} = \sum_{i=1}^{p} y_i x_{k,i} \qquad (12)$$

Or, in matrix form,

$$XX^T \mathbf{a} = X\mathbf{y} \;\Rightarrow\; \mathbf{a} = \left(XX^T\right)^{-1} X\mathbf{y} \qquad (13)$$
Of course, the solution of the system of linear equations can be accomplished with rather low-cost calculations. Indeed, determination of the inverse matrix for the solution of a linear system of equations is almost always avoided. Instead, a direct method such as Gauss-Jordan elimination or LU-factorization is advised when the system is not too large (say, for n < 100), and indirect methods such as SOR are advised for large systems. Of course, a favorable model should not
have too many parameters, that is, the number of independent variables is kept low by incorporating the effective terms
and neglecting the variables with minor impact on the output. Therefore, no matter how large the data set, one intends to
solve a linear system of equations with a reasonable number of unknowns.
Fig. 2 illustrates synthetic data for which a random error is involved in the measurement. Here, the data points are generated by y = 5 + 2.5x + error, and the linear regression results in a0 = 4.9943 and a1 = 2.4824.
The calculations involve the solution of a linear system of n + 1 equations, which involves inverting a matrix and generally requires O(n^p) operations, where p lies between 2.373 and 3 depending on the direct method applied. For example, Gauss-Jordan requires O(n^3) operations; hence, if the number of variables (terms) is doubled, the operations
are increased to eight times of the original problem. For many problems, this does not introduce much difficulty but
for problems with a large number of variables, iterative procedures are advised. Among these methods, the gradient descent
method is helpful for both problems with too many variables and problems with a very large set of data (Konishi, 2014;
Izenman, 2008).
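The normal-equation solution of Eq. (13) can be sketched in a few lines of NumPy; the snippet below uses synthetic data mimicking the y = 5 + 2.5x + error example above, with the design matrix written row-wise as is conventional in NumPy. It is an illustrative sketch, not the chapter's listing.

# OLS via the normal equations on synthetic linear data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 5 + 2.5 * x + rng.normal(scale=1.0, size=200)

# Design matrix with a leading column of ones for the intercept a0.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations; np.linalg.solve is preferred over an explicit inverse.
a = np.linalg.solve(X.T @ X, X.T @ y)
print("a0, a1 =", a)   # should be close to 5 and 2.5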
4. Gradient descent method
This technique, also known as steepest descent, is a well-known method for the optimization of differentiable functions. The strategy is based on taking steps proportional to the negative gradient of the function at each point to get closer to the local minimum of the function. For a convex function, the global minimum of the function can be determined, albeit with a proper choice of the step size (Harrington, 2012). For a multivariable function F(x), the gradient descent method is put into action as follows:

$$x_{n+1} = x_n - \lambda \nabla F(x_n) \qquad (14)$$

FIG. 2 Applying linear regression to the synthetic data.
where λ is a small positive number, which must be small enough to prevent missing the local minimum but not so small that the process gets stuck in the neighborhood of our initial guess. Note that λ can be updated at each step, and under certain circumstances values of this parameter can be chosen to guarantee convergence. As an example, consider the contour plot shown in Fig. 3.
FIG. 3 Contours of the function $y = x_1^2/4 + x_2^2 - x_2^3/8 + 1$: reaching the local minimum by GD.

This plot represents the function $y = x_1^2/4 + x_2^2 - x_2^3/8 + 1$, and if one starts searching for the minimum from (x1, x2) = (4, 4), the direction of steepest descent is opposite to the gradient of the function at this point, i.e., $\nabla y(4,4) = 2\mathbf{i} + 2\mathbf{j}$. Hence, the new point is given by

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} - \lambda \begin{bmatrix} 2 \\ 2 \end{bmatrix} \qquad (15)$$
Usually, at the first step, values of λ are set less than unity and then larger values are examined. Practically, enlarging λ is permitted as long as it provides smaller values of the objective function. For the present problem, the new values of the variables in terms of λ can be substituted into the function y, and with straightforward single-variable optimization one arrives at λopt = 2; in this way the local minimum is determined in just one shot at (x1, x2) = (0, 0) with ymin = 1. Of course, real-world problems are not that easy, and several steps with a proper step size are required. In each step, a single-variable optimization might be performed to infer the best value of the step size. Of course, direct searching based on a small step size, its enlargement (usually by 10 times), and comparison of the resulting function values is better suited for sophisticated functions.
As far as our regression problem is concerned, one has to put it in the form of a minimization problem to apply the gradient descent method. The objective function is simply the sum of the squares of errors,

$$SSE = \left(\mathbf{y}^T - \mathbf{a}^T X\right)\left(\mathbf{y} - X^T \mathbf{a}\right) \qquad (16)$$
whose gradient can be simply determined from the previous arguments on its derivatives with respect to the model parameters; that is,

$$\nabla SSE = 2X\left(X^T \mathbf{a} - \mathbf{y}\right) \qquad (17)$$
With an initial guess and a small enough value of step size, one can initiate the algorithm to obtain the right values of the
model’s parameters:
$$\mathbf{a}_{new} = \mathbf{a}_{old} - \lambda \nabla SSE(\mathbf{a}_{old}) \qquad (18)$$
There are three approaches to apply the gradient descent method for the training set of the data depending on how the huge
data set is handled by this method. In this respect, if the whole training set is used at every step of the calculations, the
method is addressed as batch gradient descent. For a very large data set, the batch gradient descent might not be economic.
Hence, two other variants are proposed by practitioners: Stochastic gradient descent and mini-batch gradient descent. In
stochastic GD at each step, only a small random set of the training data-set is used to calculate the gradient which makes it
much faster than the original Batch GD; however, there are some issues due to the stochastic nature of the method which
results in a nonmonotonic convergence to the local minimum and hence it needs stopping criteria to prevent bouncing
around the local minimum. On the other hand, the mini-batch variant of GD splits the training data-set into small sets
and computes the gradient on these sets which allows taking the benefit of parallel computation while it resolves the
problem of oscillatory convergence observed in stochastic GD (Shalev-Shwartz and Ben-David, 2014; Brownlee, 2016).
We are going to apply gradient descent to the nutrient removal efficiency project. We do this using the TOC parameter and the removal efficiency of NH4-N. As can be seen from Fig. 4, the sum of the squares of errors decreases with increasing epoch until the error reaches a minimum value.
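A possible batch gradient descent sketch for Eqs. (17) and (18) is shown below on the same kind of synthetic linear data; the 1/n scaling of the gradient and the particular step size are numerical-convenience choices made here, not prescriptions from the chapter.

# Batch gradient descent on synthetic linear data (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 5 + 2.5 * x + rng.normal(scale=1.0, size=200)
X = np.column_stack([np.ones_like(x), x])   # rows are observations

a = np.zeros(2)    # initial guess for [a0, a1]
lam = 0.01         # step size (lambda); too large diverges, too small is slow
for epoch in range(20_000):
    # Gradient of the squared error, divided by the sample count so that a
    # single step size works regardless of the data-set size.
    grad = 2.0 * X.T @ (X @ a - y) / len(y)
    a -= lam * grad

print("a0, a1 =", a)   # approaches the OLS solution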
5. Polynomial regression
When the data do not depict a linear behavior, linear regression can still be applied by introducing new variables in terms of the powers of the original variable, as discussed in the previous section. Consider Fig. 5, which displays synthetic data built up by y = 5 + 3x - 0.5x^2 + noise. To apply linear regression to this problem, consider the following model:

$$y = a_0 + a_1 x_1 + a_2 x_2 \qquad (19)$$
FIG. 4 SSE in terms of epoch for the nutrient removal efficiency project.

where $x_1 = x$ itself and $x_2 = x^2$, so that the values of the new variable are known everywhere. Therefore, the matrix of variables X is built up as follows:
$$X^T = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} \\ 1 & x_{1,2} & x_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,p} & x_{2,p} \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & x_{1,1}^2 \\ 1 & x_{1,2} & x_{1,2}^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{1,p} & x_{1,p}^2 \end{bmatrix} \qquad (20)$$

And finally, the model parameters are determined as

$$\mathbf{a} = \left(XX^T\right)^{-1} X\mathbf{y} = \begin{bmatrix} 5.1129 \\ 3.0234 \\ -0.5138 \end{bmatrix} \qquad (21)$$
The model prediction is also displayed in Fig. 5.
When there are indeed multiple variables (or features), a true polynomial regression is necessary to capture the relationship between these features. Mathematically, this relationship is depicted in nonlinear terms which contain a combination of the variables. For example, the second-order terms are either made by squaring a single variable or multiplying
one feature by another. In this respect, the number of terms of a polynomial of degree m for a problem with n features will be
determined as
$$\binom{n+m}{m} = \frac{(n+m)!}{n!\,m!} \qquad (22)$$
FIG. 5 Noisy data and the second-degree least-square polynomial.
FIG. 6 The variance of a too high-degree polynomial which makes it an improper choice.
which includes all possible combinations of variables to construct a multivariate mth degree polynomial. Now, the question
is what degree is the best for a problem. A high-degree polynomial can get closer to more data points but it can easily lose
track by following the inherent noise of the data. In this regard, its predictions might not be relied on. When this is the case,
we say that the model has a high variance. Fig. 6 compares a polynomial of 20th degree with the second-degree polynomial.
Choosing the proper degree of the regression polynomial is a statistical task that considers the trade-off between the high
variance (unacceptable sensitivity of the model with high-degree polynomials) and the bias (underfitting the data with low
degree polynomials) (Shalev-Shwartz and Ben-David, 2014; Raschka, 2015; Ramasubramanian and Singh, 2018). The
issue is discussed in the following section.
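The trade-off can be made concrete with a small sketch that fits polynomials of increasing degree to the synthetic quadratic data; numpy.polyfit is used here purely for illustration, and the noise level and chosen degrees are assumptions, not values from the chapter.

# Polynomial regression as linear regression on powers of x (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=50)
y = 5 + 3 * x - 0.5 * x**2 + rng.normal(scale=0.5, size=50)

for degree in (1, 2, 10):
    coeffs = np.polyfit(x, y, deg=degree)   # least-squares fit of the chosen degree
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)
    print(f"degree {degree:2d}: SSE on training data = {sse:.3f}")
# Training SSE always shrinks as the degree grows; only held-out data reveals overfitting.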
FIG. 7 The high-degree polynomial regression.
6. Overfitting and underfitting
Compared to plain linear regression, a high-degree polynomial regression provides a better opportunity for fitting the
training data. As shown in Fig. 7, when a 40-degree polynomial model is applied to the training data, the data points are approximated to a great extent, but obviously its trend is not acceptable at both ends; that is why it is considered
as an overfitting regression polynomial. The linear model neither follows the general trend nor touches the data points
satisfactorily; hence, it underfits the data. However, the quadratic regression satisfactorily follows the general trend
and presents a reasonable approximation as well (Swamynathan, 2019; Burger, 2018).
This is expected since the initial dataset was created through the introduction of some errors in a quadratic function.
Nonetheless, in many practical cases, there is no means to identify the original function behind the dataset. Therefore, there
is a need to determine the level of complexity of a model and to determine whether the model is underfitting or overfitting
the data (Ganesh, 2017).
7. Cross-validation
One of the most common ways to obtain an estimation of the performance of the model in terms of generalization involves
the utilization of cross-validation. It is said that the model is overfitting if it has good performance on the training data, but
provides poor generalization, which is determined by evaluating the cross-validation measures. However, the model is said
to be underfitting if it provides poor performance on both the training data and on the measures of cross-validation. Hence,
this is a satisfactory method for determining whether the model is too complex or too simple.
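A minimal sketch of this procedure with scikit-learn is given below (the arrays are illustrative); cross_val_score trains and evaluates the model on several folds, and the mean and spread of the fold scores provide the generalization estimate discussed above.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(1)
X = rng.rand(200, 1) * 10
y = 3.0 * X.ravel() + rng.normal(0, 2, 200)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())   # generalization estimate and its spread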
8. Comparison between linear and polynomial regressions
In this section, we intend to examine polynomial regression for the nutrient removal efficiency project. By plotting the MLSS on the horizontal axis against the removal efficiency of TN on the y-axis, a nonlinear downward trend is obtained. The figure related to this example is given in Fig. 8.
After applying linear regression to the data in this example, it becomes clear that this regression cannot fit the data properly (see Fig. 9).
After applying polynomial regression in the quadratic mode to the data of this example, it becomes clear that this regression can fit these data better than the linear mode. This issue was also examined numerically: the value of R² for this regression was equal to 0.14, while the value of R² for the linear regression was equal to 0.10 (see Fig. 10).
Now we change the polynomial features to degree 10 and run the code again; this time, the following figure shows that the direction of the graph changes. The value of R² is 0.04, and we appear to have a case of overfitting (see Fig. 11).
FIG. 8 The values of MLSS in terms of removal efficiency of TN correspond to the nutrient removal efficiency project.
FIG. 9 Using linear regression to predict the values of MLSS and removal efficiency of TN.
FIG. 10 Using polynomial regression (quadratic mode) to predict the values of MLSS and removal efficiency of TN.
9. Learning curve
One of the other available methods is to evaluate the learning curves. Learning curves show the performance of the model
on both training and validation sets as a function of the size of the training set or the training iteration. In order to plot these
curves, the model is trained several times using various subsets of the training set where each subset is of a different size
( Jaber, 2016).
It should be noted that, in general, a straight line cannot provide good performance for modeling the data. This is confirmed by the fact that the error level reaches a relatively constant level that is very close to the other curve. It is worth mentioning that such learning curves are typical of an underfitting model, i.e., curves that reach constant error levels which are close together and relatively high. It should also be noted that a common method for improving an overfitting model is to provide it with more training instances until the validation error reaches the training error.
The learning curve for the nutrient removal efficiency project is given in the diagram below. By plotting the train and test scores versus the number of samples, it is clear that the two curves become closer to each other as the number of training examples increases, and from 5000 examples onwards, the slope and the rate at which the two curves approach each other become less pronounced (see Fig. 12).
10. Regularized linear models
One of the possible ways for decreasing the overfitting phenomenon involves the regularization of the model, which is
another way to say limiting or restricting it. By reducing the degrees of freedom of the model, it will be harder for the
model to overfit the data. One of the easiest ways to regularize a polynomial model involves decreasing the polynomial degree.
FIG. 11 Using polynomial regression (degree of 10) to predict the values of MLSS and removal efficiency of TN.
FIG. 12 The learning curve for the nutrient removal efficiency project.
In the case of a linear model, the regularization is generally performed by restricting the weights of the model. In order to
better illustrate the model, the Ridge regression and the Lasso regression models can be evaluated since these models utilize
two distinct methods for restricting the weights (Gori, 2017).
11. The ridge regression
The ridge regression is in fact a regularized or restricted version of the linear regression. In order to regularize the linear regression model, a regularization term, i.e., α Σᵢ₌₁ⁿ θᵢ², is introduced into the cost function of the model. Adding this term will make the learning algorithm fit the data while also minimizing the weights of the model. It is worth mentioning that this regularization term must only be introduced into the cost function during the training stage. After training the model, the unregularized performance measure can be used to assess the performance of the model (Saleh et al., 2019; Aldrich and Auret, 2013).
The extent of the regularization of the model can be controlled using the hyper-parameter α. When α = 0, the Ridge regression will be the same as the linear regression. However, if α is very large, all the weights will be close to zero, resulting in a flat line going through the mean values of the data. The cost function of the Ridge regression model is presented in Eq. (23).
$$J(\theta) = \mathrm{MSE}(\theta) + \alpha\,\frac{1}{2}\sum_{i=1}^{n}\theta_i^{2} \qquad (23)$$
It should be noted that the bias term, denoted by θ₀, is not regularized, and the sum starts at i = 1 and not i = 0. If w denotes the vector of feature weights (θ₁ to θₙ), the regularization term becomes ½‖w‖₂², in which ‖w‖₂ signifies the ℓ2 norm of the weight vector. Moreover, for the gradient descent, αw is simply added to the MSE gradient vector (Alpaydin, 2020).
Similar to the linear regression model, the Ridge regression can be performed by calculating a closed-form equation or by applying the gradient descent. The advantages and disadvantages are the same. The closed-form solution is presented in Eq. (24). It should be noted that in this equation, A is the (n + 1) × (n + 1) identity matrix, with one difference, i.e., the presence of a 0 in the top-left cell, which corresponds to the bias term.
$$\hat{\theta} = \left(\mathbf{X}^{T}\mathbf{X} + \alpha\mathbf{A}\right)^{-1}\mathbf{X}^{T}\mathbf{y} \qquad (24)$$
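A minimal NumPy sketch of the closed-form solution of Eq. (24) is given below (the design matrix and target are illustrative); note that the top-left cell of A is set to 0 so that the bias term is not regularized.

import numpy as np

rng = np.random.RandomState(0)
X = np.c_[np.ones(50), rng.rand(50, 2)]      # design matrix with a bias column
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(0, 0.1, 50)

alpha = 1.0
A = np.eye(X.shape[1])
A[0, 0] = 0.0                                # do not regularize the bias term
theta_hat = np.linalg.solve(X.T @ X + alpha * A, X.T @ y)
print(theta_hat)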
12. The effect of collinearity in the coefficients of an estimator
As mentioned earlier, α ≥ 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of α, the greater the amount of shrinkage, and thus the coefficients become more robust to collinearity. In Fig. 13, each color represents a different feature of the coefficient vector, displayed as a function of the regularization parameter. This example also shows the usefulness of applying Ridge regression to highly ill-conditioned matrices. For such matrices, a slight change in the target variable can cause huge variances in the calculated weights. In such cases, it is useful to set a certain regularization (alpha) to reduce this variation (noise) (see Fig. 13).
13. Outliers impact
Before moving on, let's look at an example of the effect of outliers on the slope of the regression line, and then show how ridge regression can reduce these effects. For a data set containing 100 randomly generated points with a slope of 0.5, after performing linear regression, the slope of the fitted line is equal to 0.47134857. The diagram for this example is given below (see Fig. 14).
Now, to show the effect of outliers on the previous example, we change two points in the data set: we set the response at the leftmost point of the chart to −200 and at the rightmost point to +200. After performing linear regression, we see that the slope of the obtained line is equal to 1.50556072, which is significantly different from the slope of the previous chart and shows the effect of outliers. The diagram for this example is given below (see Fig. 15).
After applying the ridge regression, we can see that this regression is substantially less affected by the outliers than the linear regression and moves the estimated coefficient back toward the original value. The slope of the line obtained with this regression is equal to 1.00370714. The diagram for this example is given below (see Fig. 16).
FIG. 13 Ridge coefficients as a function of the regularization.
FIG. 14 Data set containing 100 randomly generated points and performing linear regression.
FIG. 15 Showing the outlier’s effect.
FIG. 16 Using ridge regression to offset the impact of outliers.
14. Lasso regression
Another regularized version of the linear regression is the Lasso regression, whose name comes from Least Absolute Shrinkage and Selection Operator Regression. Similar to the Ridge regression, this regularized version introduces a regularization term into the cost function, except that instead of half of the square of the ℓ2 norm, it utilizes the ℓ1 norm of the weight vector. This is expressed in Eq. (25) (Sra et al., 2012; Bali et al., 2016).
$$J(\theta) = \mathrm{MSE}(\theta) + \alpha\sum_{i=1}^{n}\left|\theta_i\right| \qquad (25)$$
One of the differences between this type of regression and the Ridge regression involves the fact that as the parameters get closer to the global optimum, the gradients get smaller and the gradient descent becomes slower, increasing the likelihood of convergence because of the lack of bouncing around. Another difference is that by increasing α, the optimal parameters gradually get closer to the origin, but they will never reach zero.
It should be noted that the cost function of the Lasso regression is not differentiable at θᵢ = 0 for i = 1, 2, …, n; however, if the subgradient vector g is utilized wherever θᵢ = 0, the gradient descent will perform well enough. A subgradient vector equation that can be utilized for gradient descent with the cost function of the Lasso regression is presented in Eq. (26).
$$g(\theta, J) = \nabla_{\theta}\,\mathrm{MSE}(\theta) + \alpha \begin{pmatrix} \operatorname{sign}(\theta_1) \\ \operatorname{sign}(\theta_2) \\ \vdots \\ \operatorname{sign}(\theta_n) \end{pmatrix}, \qquad \operatorname{sign}(\theta_i) = \begin{cases} -1 & \text{if } \theta_i < 0 \\ 0 & \text{if } \theta_i = 0 \\ +1 & \text{if } \theta_i > 0 \end{cases} \qquad (26)$$
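A minimal NumPy sketch of gradient descent using the subgradient vector of Eq. (26) is shown below (data and hyper-parameters are illustrative); the bias term is excluded from the penalty, and the weight of the useless feature is shrunk toward zero.

import numpy as np

rng = np.random.RandomState(0)
X = np.c_[np.ones(100), rng.rand(100, 3)]
y = X @ np.array([0.5, 2.0, 0.0, -1.5]) + rng.normal(0, 0.1, 100)

alpha, eta, m = 0.1, 0.05, len(y)
theta = np.zeros(X.shape[1])
for _ in range(5000):
    grad_mse = 2.0 / m * X.T @ (X @ theta - y)   # gradient of the MSE term
    sub = np.sign(theta)
    sub[0] = 0.0                                 # the bias term is not penalized
    theta -= eta * (grad_mse + alpha * sub)
print(theta)                                     # the weight of the useless feature stays near 0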
After applying Lasso regression to the example data related to the ridge regression, it was shown that this regression can fit the data of this example with a much better approximation than the linear one and is less affected by outliers. The slope of the line related to this regression is 1.06289489. The diagram for this example is given below (see Fig. 17):
FIG. 17 Using Lasso regression to offset the impact of outliers.
15. Elastic net
The elastic net is the middle point between the Ridge and the Lasso regression models. In this model, the regularization term is a combination of the regularization terms from the Ridge and Lasso regression models, controlled by the mix ratio r. It should be noted that the elastic net will be equal to the Ridge regression when r is set to 0, while it equals the Lasso regression when r is set to 1. This is expressed in Eq. (27) (Humphries et al., 2018; Forsyth, 2019).
$$J(\theta) = \mathrm{MSE}(\theta) + r\alpha\sum_{i=1}^{n}\left|\theta_i\right| + \frac{1-r}{2}\,\alpha\sum_{i=1}^{n}\theta_i^{2} \qquad (27)$$
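For reference, a minimal sketch with scikit-learn is given below (illustrative data); in terms of Eq. (27), the mix ratio r corresponds to the l1_ratio argument of ElasticNet and α to its alpha argument, up to scikit-learn's internal scaling of the MSE term.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 0.1, 100)

model = ElasticNet(alpha=0.05, l1_ratio=0.5)   # r = 0.5: halfway between Ridge and Lasso
model.fit(X, y)
print(model.coef_, model.intercept_)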
The question is when to use the linear regression without regularization, the Ridge regression, the Lasso regression, or the
Elastic Net. In order to make this decision, it should be noted that a level of regularization is always preferred. Therefore, using
the plain linear regression model must be avoided as much as possible. The Ridge regression model can be a good default
option. However, if there are only a limited number of useful features, it is better to utilize the Lasso regression or the
elastic net, since they usually set the weights of the useless features to zero, as noted earlier. Nonetheless, when the number
of features is larger than the number of training instances, or when there is a strong correlation between several features, it is
better to utilize the elastic net instead of the Lasso regression since the Lasso can have erratic behaviors in such cases.
After applying elastic net regression to the example data used for the ridge and lasso regressions, it was shown that this regression provides a much better approximation than the two previous regressions and is less affected by outliers. The slope of the fitted line for this regression was 0.74724704. The diagram for this example is given below (see Fig. 18):
16. Early stopping
Stopping the training once the validation error is minimized is another distinct way to regularize iterative learning algorithms, including the gradient descent. This method is known as “early stopping.” When applying the early stopping
method, as soon as the validation error is minimized, the training is halted. This is a simple, elegant, and efficient method
for the regularization of iterative learning algorithms (Shukla, 2018).
FIG. 18 Using elastic net regression to offset the impact of outliers.
It should be noted that when using stochastic and mini-batch gradient descent, it is difficult to determine if the error is minimized or not since the curves are not this smooth. A possible solution is to stop the training after the validation error has stayed above the minimum for a while and the possibility of a better performance by the model is not very high. Afterward, the parameters of the model can be set at the values they were when the validation error was at the minimum point.
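A minimal sketch of early stopping with scikit-learn is given below (the data, learning rate, and number of epochs are illustrative, not the project settings); the model is trained one epoch at a time with warm_start=True, and the parameters from the epoch with the lowest validation error are kept.

import numpy as np
from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.normal(0, 0.5, 200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                     learning_rate="constant", eta0=0.001)
best_val_error, best_model = float("inf"), None
for epoch in range(500):
    model.fit(X_train, y_train)                    # continues where it left off
    val_error = np.mean((model.predict(X_val) - y_val) ** 2)
    if val_error < best_val_error:
        best_val_error, best_model = val_error, deepcopy(model)
print(best_val_error)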
17. Logistic regression
Some algorithms based on regression can be applied to classification problems. The probability of an example belonging to a particular class can be estimated by logistic regression; for example, how likely is it that an email is spam? Logit regression is another name for this regression. In this model, whether or not a sample belongs to a class depends on whether its estimated probability is above or below 50%. An instance with an estimated probability above 50% is assigned to the "positive class," represented by "1," while one with a probability below 50% is assigned to the "negative class," indicated by "0." Such a division is called binary classification (Mohammed et al., 2016; Lesmeister, 2015).
18. Estimation of probabilities
The question that may be encountered here is how the logistic regression works. A logistic regression model is similar to a linear regression model in the sense that it computes the weighted sum of the inputs along with a bias term. However, its main difference with the linear regression model is that it does not provide a direct result; rather, its output is the logistic of the result. This is expressed in Eq. (28).
$$\hat{p} = h_{\theta}(\mathbf{x}) = \sigma\!\left(\mathbf{x}^{T}\boldsymbol{\theta}\right) \qquad (28)$$
It should be noted that the logistic, denoted by σ(·), is a sigmoid or S-shaped function whose output ranges from 0 to 1. This function is expressed by Eq. (29) and depicted in Fig. 19.
$$\sigma(t) = \frac{1}{1 + \exp(-t)} \qquad (29)$$
After the logistic regression model estimates the probability p̂ = h_θ(x) of an instance x belonging to the positive class, the model can calculate the prediction ŷ in a straightforward fashion. This prediction is expressed in Eq. (30) (Lantz, 2019).
$$\hat{y} = \begin{cases} 0 & \text{if } \hat{p} < 0.5 \\ 1 & \text{if } \hat{p} \geq 0.5 \end{cases} \qquad (30)$$
It should be noted that when t < 0 we have σ(t) < 0.5, while when t ≥ 0 we have σ(t) ≥ 0.5. Accordingly, if xᵀθ is positive, the logistic regression model's output as the prediction will be equal to 1; otherwise, the output will be equal to 0. It is worth mentioning that the score t is usually called the logit. This is because the logit function, which is expressed as logit(p) = log(p/(1 − p)), is in fact the inverse of the logistic function. Moreover, when calculating the logit of the estimated probability p̂, the output is t. It should also be noted that the logit is sometimes called the log-odds because it can be defined as the logarithm of the ratio of the estimated probability of the positive class to the estimated probability of the negative class (Harrell, 2015).
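The relationship between the logistic function and the logit can be verified with a few lines of NumPy (a minimal sketch; the test values are arbitrary):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    return np.log(p / (1.0 - p))

t = np.array([-2.0, 0.0, 3.0])
p = sigmoid(t)
print(p)           # probabilities in (0, 1); exactly 0.5 at t = 0
print(logit(p))    # recovers t, confirming that the logit inverts the logistic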
FIG. 19 The logistic function.
19. Training and the cost function
Based on the abovementioned considerations, it can be concluded that a logistic regression model is capable of estimating
probabilities and providing predictions. However, the method for training the model must be explained as well. The main goal
of training the model is to find a suitable value for the parameter vector θ in a way that the model provides high probabilities for the positive instances (y = 1), while it provides low probabilities for the negative instances (y = 0). This notion is expressed by the cost function, which is presented in Eq. (31) below for a single training instance x (Ayyadevara, 2018).
$$c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases} \qquad (31)$$
In order to better understand this cost function, it should be noted that as t → 0, the value of −log(t) grows very large. Therefore, if the model provides a probability close to 0 for a positive instance or a probability close to 1 for a negative instance, then the cost will become significantly large. In contrast, as t → 1, we have −log(t) → 0. In other words, if the estimated probability for a negative instance is close to 0 or if the estimated probability for a positive instance is close to 1, then the cost will be much closer to 0. Incidentally, the latter case is what we are actually seeking (Raschka and Mirjalili, 2019; Amamra et al., 2018).
On the other hand, the cost function over the entire training set can be an indicator for the average cost over all individual
training instances. This general cost function can be presented as the log loss, which is expressed in Eq. (32).
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log\!\left(\hat{p}^{(i)}\right) + \left(1 - y^{(i)}\right)\log\!\left(1 - \hat{p}^{(i)}\right)\right] \qquad (32)$$
Unfortunately, there is no closed-form formula to obtain a value for θ that minimizes the log loss or the general cost function. In other words, there is no equivalent of the normal equation available. In contrast, the log loss is a convex function, so any optimization algorithm, such as the gradient descent, can provide the global minimum for this function as long as the learning rate is small enough and the algorithm has enough time. Moreover, Eq. (33) presents the partial derivatives of the log loss with respect to the parameter θⱼ.
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\sigma\!\left(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)}\right) - y^{(i)}\right) x_j^{(i)} \qquad (33)$$
It should be noted that the above equation is similar to the partial derivatives of the cost function in the batch gradient descent equation, in that for each of the instances it computes the prediction error and multiplies it by the value of the jth feature.
Afterward, it calculates the average over all the training instances. When the gradient vector, which includes all the partial
derivatives, is available, it can be utilized for employing the Batch Gradient Descent algorithm. These were the main steps
for training a logistic regression model. It is also worth mentioning that in the stochastic gradient descent algorithm, each
instance is evaluated separately, while in the mini-batch gradient descent algorithm, a mini-batch is utilized each time.
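A minimal vectorized NumPy sketch of batch gradient descent using the gradient of Eq. (33) is given below (the synthetic data, learning rate, and iteration count are illustrative):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.RandomState(0)
m = 100
X = np.c_[np.ones(m), rng.rand(m, 2)]            # bias column plus two features
y = (X[:, 1] + X[:, 2] > 1.0).astype(float)      # a simple labeling rule
theta = np.zeros(X.shape[1])

eta = 0.1
for _ in range(1000):
    gradient = X.T @ (sigmoid(X @ theta) - y) / m   # Eq. (33) for all j at once
    theta -= eta * gradient
print(theta)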
For performing logistic regression in Python, the data displayed in Fig. 20 are used, and the evolution of the cost function over the iterations is shown in Fig. 21.
Finally, the decision boundary is specified by the violet curve in Fig. 22.
FIG. 20 Data for logistic regression.
FIG. 21 The cost function versus the number of iterations.
FIG. 22 The decision boundary plot.
20. Conclusions
This chapter describes different types of regressions and how they can be used to solve real problems. To solve our challenges in the different sections, we utilized the Scikit-learn Python library. In this chapter, a complete description of different regressions such as robust, multiple, regularized (ridge, lasso, and elastic net), polynomial, and logistic is given. The introduction and practical use of methods such as gradient descent, cross-validation, and the learning curve are illustrated, and it became clear how to deal with issues such as outliers, overfitting, and underfitting in order to take a better approach to regression problems.
Appendix: Python code
Linear regression
The following Python code presents the steps of calculations for this regression:
In [1]:
import numpy as np
import pandas as pd
In [2]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
In [3]:
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
In [4]:
df.columns = col_name
In [5]:
import matplotlib.pyplot as plt
import seaborn as sns
In [6]:
X = df['TOC'].values.reshape(-1,1)
In [7]:
y = df['TN-OUT'].values
In [8]:
from sklearn.linear_model import LinearRegression
In [9]:
model = LinearRegression()
In [10]:
model.fit(X, y)
Out[10]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [11]:
model.coef_
Out[11]:
array([0.02305826])
In [12]:
model.intercept_
Out[12]:
66.88380657331433
In [66]:
plt.figure(figsize=(12,10));
sns.regplot(X, y);
plt.xlabel('TOC')
plt.ylabel("TN-OUT")
plt.show();
Gradient descent method
The following Python code presents the steps of implementing the Gradient Descent method for this example:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
In [4]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
df.columns = col_name
import matplotlib.pyplot as plt
import seaborn as sns
In [5]:
X = df['TOC'].values.reshape(-1,1)
y = df['TN-OUT'].values
In [6]:
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
X_std = sc_x.fit_transform(X)
y_std = sc_y.fit_transform(y.reshape(-1,1)).flatten()
In [7]:
alpha = 0.0001
w_ = np.zeros(1 + X_std.shape[1])
cost_ = []
n_ = 100
for i in range(n_):
    y_pred = np.dot(X_std, w_[1:]) + w_[0]
    errors = (y_std - y_pred)
    w_[1:] += alpha * X_std.T.dot(errors)
    w_[0] += alpha * errors.sum()
    cost = (errors**2).sum() / 2.0
    cost_.append(cost)
In [8]:
plt.figure(figsize=(10,8))
plt.plot(range(1, n_ + 1), cost_);
plt.ylabel('SSE');
plt.xlabel('Epoch');
In [9]:
w_
Out[9]:
array([1.02318154e-15, 3.21037679e-02])
Comparison between linear and polynomial regressions
The code related to this example is given below:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
In [2]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
df.columns = col_name
import matplotlib.pyplot as plt
import seaborn as sns
In [3]:
X = df['MLSS'].values.reshape(-1,1)
y = df['TN-OUT'].values
In [4]:
plt.figure(figsize=(12,8))
plt.scatter(X, y);
In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

lr = LinearRegression()
lr.fit(X.reshape(-1, 1), y)
model_pred = lr.predict(X.reshape(-1,1))
plt.figure(figsize=(12,8))
plt.scatter(X, y);
plt.plot(X, model_pred);
print("R^2 score = {:.2f}".format(r2_score(y, model_pred)))
R^2 score = 0.10
In [6]:
poly_reg = PolynomialFeatures(degree=2)
X_poly_b = poly_reg.fit_transform(X.reshape(-1, 1))
lin_reg_2 = LinearRegression()
In [7]:
lin_reg_2.fit(X_poly_b, y)
Out[7]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [8]:
X_fit = np.arange(X.min(), X.max(), 1)[:, np.newaxis]
In [9]:
X_fit
Out[9]:
array([[ 99.08039698],
[ 100.08039698],
[ 101.08039698],
...,
[19974.08039698],
[19975.08039698],
[19976.08039698]])
In [10]:
y_pred = lin_reg_2.predict(poly_reg.fit_transform(X_fit.reshape(-1,1)))
In [11]:
plt.figure(figsize=(10,8));
plt.scatter(X, y);
plt.plot(X_fit, y_pred);
print("R^2 score = {:.2f}".format(r2_score(y,
lin_reg_2.predict(X_poly_b))))
R^2 score = 0.14
In [12]:
poly_reg = PolynomialFeatures(degree=10)
X_poly_b = poly_reg.fit_transform(X.reshape(-1, 1))
lin_reg_3 = LinearRegression()
In [13]:
lin_reg_3.fit(X_poly_b, y)
Out[13]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [14]:
X_fit = np.arange(X.min(), X.max(), 1)[:, np.newaxis]
In [15]:
y_pred = lin_reg_3.predict(poly_reg.fit_transform(X_fit.reshape(-1,1)))
In [16]:
plt.figure(figsize=(10,8));
plt.scatter(X, y);
plt.plot(X_fit, y_pred);
print("R^2 score = {:.2f}".format(r2_score(y,
lin_reg_3.predict(X_poly_b))))
R^2 score = 0.04
Learning curve
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve
from sklearn.model_selection import ShuffleSplit
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
"""
Generate a simple plot of the test and training learning curve.
Parameters
----------
estimator : object type that implements the "fit" and "predict" methods
An object of that type which is cloned for each validation.
title : string
Title for the chart.
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and
n_features is the number of features.
y : array-like, shape (n_samples) or (n_samples, n_features), optional
Target relative to X for classification or regression;
None for unsupervised learning.
ylim : tuple, shape (ymin, ymax), optional
Defines minimum and maximum yvalues plotted.
cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if ``y`` is binary or multiclass,
:class:`StratifiedKFold` used. If the estimator is not a classifier
or if ``y`` is neither binary nor multiclass, :class:`KFold` is used.
Refer :ref:`User Guide <cross_validation>` for the various
cross-validators that can be used here.
n_jobs : integer, optional
Number of jobs to run in parallel (default 1).
"""
plt.figure(figsize=(10, 8))
plt.title(title)
if ylim is not None:
plt.ylim(*ylim)
plt.xlabel("Training examples")
plt.ylabel("Score")
train_sizes, train_scores, test_scores = learning_curve(
estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
plt.grid()
plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
train_scores_mean + train_scores_std, alpha=0.1,
color="r")
plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
test_scores_mean + test_scores_std, alpha=0.1, color="g")
plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
label="Training score")
plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
label="Cross-validation score")
plt.legend(loc="best")
return plt
In [2]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
df.columns = col_name
import matplotlib.pyplot as plt
import seaborn as sns
X = df['TOC'].values.reshape(-1,1)
y = df['TN-OUT'].values
title = "Learning Curves (Ridge Regression)"
# Cross validation with 100 iterations to get smoother mean test and train
# score curves, each time with 20% data randomly selected as a validation set.
cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
estimator = Ridge()
plot_learning_curve(estimator, title, X, y, cv=cv, n_jobs=4)
plt.show()
The effect of collinearity in the coefficients of an estimator
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
# ###########################################################################
# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
# ###########################################################################
# Display results
plt.figure(figsize=(10,8))
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1])
# reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.axis('tight')
plt.show()
Outliers impact
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
import pandas as pd
In [2]:
from sklearn.linear_model import LinearRegression
In [3]:
np.random.seed(42)
n_samples = 100
rng = np.random.randn(n_samples) * 10
y_gen = 0.5 * rng + 2 * np.random.randn(n_samples)
lr = LinearRegression()
lr.fit(rng.reshape(-1, 1), y_gen)
model_pred = lr.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, model_pred);
print("Coefficient Estimate: ", lr.coef_)
Coefficient Estimate: [0.47134857]
In [4]:
idx = rng.argmax()
y_gen[idx] = 200
idx = rng.argmin()
y_gen[idx] = -200
In [5]:
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
o_lr = LinearRegression(normalize=True)
o_lr.fit(rng.reshape(-1, 1), y_gen)
o_model_pred = o_lr.predict(rng.reshape(-1,1))
plt.scatter(rng, y_gen);
plt.plot(rng, o_model_pred);
print("Coefficient Estimate: ", o_lr.coef_)
Coefficient Estimate: [1.50556072]
In [6]:
from sklearn.linear_model import Ridge
In [7]:
ridge_mod = Ridge(alpha=0.5, normalize=True)
ridge_mod.fit(rng.reshape(-1, 1), y_gen)
ridge_model_pred = ridge_mod.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, ridge_model_pred);
print("Coefficient Estimate: ", ridge_mod.coef_)
Coefficient Estimate: [1.00370714]
Lasso regression
In [8]:
from sklearn.linear_model import Lasso
In [9]:
lasso_mod = Lasso(alpha=0.4, normalize=True)
lasso_mod.fit(rng.reshape(-1, 1), y_gen)
lasso_model_pred = lasso_mod.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, lasso_model_pred);
print("Coefficient Estimate: ", lasso_mod.coef_)
Coefficient Estimate: [1.06289489]
Elastic net
In [11]:
from sklearn.linear_model import ElasticNet
In [12]:
en_mod = ElasticNet(alpha=0.02, normalize=True)
en_mod.fit(rng.reshape(-1, 1), y_gen)
en_model_pred = en_mod.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, en_model_pred);
print("Coefficient Estimate: ", en_mod.coef_)
Coefficient Estimate: [0.74724704]
Training and the cost function
The required libraries are imported as follows:
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pandas is used for reading the data:
In [2]:
data = pd.read_csv('ex2data2.txt', header=None)
In [3]:
data.head()
Out[3]:
          0        1  2
0  0.051267  0.69956  1
1  0.092742  0.68494  1
2  0.213710  0.69225  1
3  0.375000  0.50219  1
4  0.513250  0.46564  1
In [4]:
x = data.iloc[:,:2].values
y = data.iloc[:,2].values.reshape(-1,1)
Polynomial features are generated for the logistic regression using scikit-learn:
In [6]:
from sklearn.preprocessing import PolynomialFeatures
poly_feature = PolynomialFeatures(degree=6, include_bias=False)
x_poly = poly_feature.fit_transform(x)
The following function defines the sigmoid function:
In [9]:
def sigmoid(z):
    m = len(z)
    h = np.zeros((m, 1))
    for i in range(m):
        h[i] = 1 / (1 + np.exp(-z[i]))
    return h
The cost function is computed by the following code:
In [10]:
def cost_function(X, y, theta, Lambda):
    # X: features
    # y: outputs
    # theta: model's parameters vector
    m = len(y)  # number of instances
    J = 1 / m * (- y.T.dot(np.log(sigmoid(X.dot(theta))))
                 - (1 - y).T.dot(np.log(1 - sigmoid(X.dot(theta))))) \
        + 2 * Lambda / m * sum(theta[1:]**2)
    return J
The gradient descent is defined:
In [11]:
def gradient_descent(X, y, theta, alpha, num_itr):
    # X: features
    # y: outputs
    # theta: model's parameters vector
    # alpha: learning rate
    # num_itr: number of iterations
    m = len(y)  # number of instances
    Lambda = 1
    J = np.zeros((num_itr + 1, ))
    J[0] = cost_function(X, y, theta, Lambda)
    for i in range(num_itr):
        theta = theta - alpha / m * X.T.dot(sigmoid(X.dot(theta)) - y) \
                - alpha * Lambda / m * np.r_[np.zeros((1, 1)), theta[1:]]
        J[i + 1] = cost_function(X, y, theta, Lambda)
    return theta, J
The feature matrix X is built by standardizing the polynomial features and adding a bias column (a step inferred from the decision-boundary code below, which reuses Mean and Sigma); then the number of iterations, alpha, and the initial theta are set:
In [12]:
# standardization step inferred from the later use of Mean and Sigma
Mean = x_poly.mean(axis=0)
Sigma = x_poly.std(axis=0)
X = np.c_[np.ones((len(y), 1)), (x_poly - Mean) / Sigma]
num_itr = 20000
alpha = 0.001
int_theta = np.zeros((X.shape[1], 1))
theta, J = gradient_descent(X, y, int_theta, alpha, num_itr)
Then the cost function versus the number of iterations is plotted:
In [13]:
plt.figure(figsize=(10,5))
plt.plot(np.arange(num_itr+1), J)
plt.xlabel('number of iteration')
plt.ylabel('cost function')
plt.show()
In [14]:
x_list = np.c_[np.linspace(-1, 1.2, 100), np.linspace(-1, 1.2, 100)]
D = np.zeros((100, 100))
for j in range(100):
    for i in range(100):
        xx_poly = poly_feature.fit_transform(np.c_[x_list[i, 0], x_list[j, 1]])
        xx_poly = (xx_poly - Mean) / Sigma
        xx_poly = np.c_[np.ones((1, 1)), xx_poly]
        D[i, j] = xx_poly.dot(theta)
In [15]:
plt.contour(x_list[:,0], x_list[:,1], D, [0])
plt.scatter(x[np.array(y==1).reshape(118,),0], x[np.array(y==1).reshape(118,),1], label='y = 1')
plt.scatter(x[np.array(y==0).reshape(118,),0], x[np.array(y==0).reshape(118,),1], label='y = 0')
plt.xlabel('Microchip Test 1')
plt.ylabel('Microchip Test 2')
plt.legend()
plt.show()
References
Akgün, B., Öğüdücü, Ş.G., 2015. Streaming linear regression on Spark MLlib and MOA. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.
Aldrich, C., Auret, L., 2013. Unsupervised Process Monitoring and Fault Diagnosis With Machine Learning Methods. Springer.
Alpaydin, E., 2020. Introduction to Machine Learning. MIT Press, USA.
Amamra, A., Khanchoul, K., Eslamian, S., Hadj Zobir, S., 2018. Suspended sediment estimation using regression and artificial neural network models:
Kebir watershed, northeast of Algeria, North Africa. Int. J. Hydrol. Sci. Technol. 8 (4), 352–371.
Ayyadevara, V., 2018. Pro Machine Learning Algorithms. Apress, New York, USA.
Bali, R., et al., 2016. R: Unleash Machine Learning Techniques. Packt Publishing Ltd.
Bargarai, F.A.M., Abdulazeez, A.M., Tiryaki, V.M., Zeebaree, D.Q., 2020. Management of wireless communication systems using artificial intelligence-based software defined radio. Int. J. Interact. Mob. Technol. https://doi.org/10.3991/ijim.v14i13.14211.
Brownlee, J., 2016. Master Machine Learning Algorithms: Discover How They Work and Implement Them From Scratch. Machine Learning Mastery.
https://machinelearningmastery.com/.
Burger, S.V., 2018. Introduction to Machine Learning With R: Rigorous Mathematical Analysis. O’Reilly Media, Inc.
Dargan, S., et al., 2020. A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27 (4),
1071–1092.
Dehghan, M.H., Hamidi, F., Salajegheh, M., 2015. Study of linear regression based on least squares and fuzzy least absolutes deviations and its application
in geography. In: 2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS). IEEE.
Epskamp, S., Fried, E.I., 2018. A tutorial on regularized partial correlation networks. Psychol. Methods 23 (4), 617.
Forsyth, D., 2019. Applied Machine Learning. Springer.
Ganesh, T.V., 2017. Practical Machine Learning With R and Python: Machine Learning in Stereo. Independently Published.
Gori, M., 2017. Machine Learning: A Constraint-Based Approach. Morgan Kaufmann.
Hackeling, G., 2017. Mastering Machine Learning With Scikit-Learn. Packt Publishing Ltd., UK.
Harrell Jr., F.E., 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis.
Springer.
Harrington, P., 2012. Machine Learning in Action. Manning Publications Co., USA.
Humphries, G.R., Magness, D.R., Huettmann, F., 2018. Machine Learning for Ecology and Sustainable Natural Resource Management. Springer.
Izenman, A., 2008. Regression, Classification, and Manifold Learning, Modern Multivariate Statistical Techniques. Springer Texts in Statistics, Germany.
Jaber, M.Y., 2016. Learning Curves: Theory, Models, and Applications. CRC Press.
Konishi, S., 2014. Introduction to Multivariate Analysis: Linear and Nonlinear Modeling. CRC Press, USA.
Lantz, B., 2019. Machine Learning With R: Expert Techniques for Predictive Modeling. Packt Publishing Ltd.
Lesmeister, C., 2015. Mastering Machine Learning With R. Packt Publishing Ltd.
Lim, H.-I., 2019. A linear regression approach to modeling software characteristics for classifying similar software. In: 2019 IEEE 43rd Annual Computer
Software and Applications Conference (COMPSAC). IEEE.
Liu, Y., et al., 2017. Materials discovery and design using machine learning. J. Mater. 3 (3), 159–177.
Matloff, N., 2017. Statistical Regression and Classification: From Linear Models to Machine Learning. CRC Press, USA.
Mohammed, M., Khan, M.B., Bashier, E.B.M., 2016. Machine Learning: Algorithms and Applications. CRC Press.
Olive, D.J., 2017. Linear Regression. Springer.
Ramasubramanian, K., Singh, A., 2018. Machine Learning Using R: With Time Series and Industry-Based Use Cases in R. Apress, New York, USA.
Raschka, S., 2015. Python Machine Learning. Packt Publishing Ltd, UK.
Raschka, S., Mirjalili, V., 2019. Python Machine Learning: Machine Learning and Deep Learning With Python, Scikit-Learn, and Tensor Flow 2. Packt
Publishing Ltd.
Saleh, A.M.E., Arashi, M., Kibria, B.G., 2019. Theory of Ridge Regression Estimation With Applications. vol. 285 John Wiley & Sons.
Sarkar, M.R., et al., 2015. Electricity demand forecasting of Rajshahi City in Bangladesh using fuzzy linear regression model. In: 2015 International
Conference on Electrical Engineering and Information Communication Technology (ICEEICT). IEEE.
Shalev-Shwartz, S., Ben-David, S., 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, UK.
Shukla, N., 2018. Machine Learning With Tensor Flow. Manning Publications Co.
Sra, S., Nowozin, S., Wright, S.J., 2012. Optimization for Machine Learning. MIT Press, USA.
Sulaiman, M.A., 2020. Evaluating data mining classification methods performance in internet of things applications. J. Soft Comput. Data Mining 1 (2),
11–25.
Swamynathan, M., 2019. Mastering Machine Learning With Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using
Python. Apress, New York, USA.
Yaqub, M., et al., 2020. Modeling of a full-scale sewage treatment plant to predict the nutrient removal efficiency using a long short-term memory (LSTM)
neural network. J. Water Process Eng. 37, 101388.
Zebari, D.A., et al., 2020. Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images. IEEE Access 8, 203097–203116.
Zeebaree, D.Q., et al., 2019. Machine learning and region growing for breast cancer segmentation. In: 2019 International Conference on Advanced Science
and Engineering (ICOASE). IEEE.
Chapter 2
Bat algorithm optimized extreme learning machine: A new modeling strategy for predicting river water turbidity at the United States
Salim Heddam
Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology, Hydraulics Division, Agronomy Department, Faculty of Science, Skikda, Algeria
1. Introduction
Water turbidity (TU) among other water variables has been used for a long time as an indicator of water quality in rivers,
streams, and lakes freshwater ecosystems (Zolfaghari et al., 2020), and also for monitoring water contamination and
guiding pollution control (Gu et al., 2020). The concentration of TU in water comes from the high concentration of the
suspended solids caused by watershed runoff (Park et al., 2017), and it is often used as an indicator of the intensity of light
scattering (Gelda et al., 2009; Gelda and Effler, 2007). The water clarity and transparency are measured and evaluated using
the turbidity which is related to the scattering of light (Al-Yaseri et al., 2013). A high concentration of TU in freshwater can
cause serious problems and lead to a deterioration of the water quality that can cause serious health problems, affecting the
metabolic activity and leading to a significant increase in the net sedimentation rate (Gelda et al., 2013). In a study conducted in the Niulan River, China, it was demonstrated that high levels of turbidity originated from three sources, namely: interflows and underflows that caused the sudden spikes, strong mixing caused by the floods, and the very low settling
velocity of the very fine incoming sediments (Zhang and Wu, 2020). In addition, it was reported that the higher the TU
concentration in water, the higher the esthetic impairments manifested (Gelda and Effler, 2007). River TU is highly correlated to river discharge (Q) and the relation between TU and Q is a complex and dynamic process (Mather and Johnson,
2014). Water TU can be measured directly using in-situ sensors and calculated using various indirect methods based on the
application of different kinds of models. Over the years, several models have been developed and proposed for predicting
TU and mainly based on the artificial intelligence paradigms or remote sensing data.
Rajaee and Jafari (2018) applied several machine learning models for predicting daily river TU in the Blue River at Kenneth
Road, Overland Park, Kansas, United States. The authors used the standard artificial neural network (ANN), gene
expression programming (GEP), and the decision tree (DP) approaches. In addition, the proposed models were applied
combined with the discrete wavelet transforms (DWT) for improving the model’s accuracy. Based on the correlation coefficients, the explanatory variables were composed of turbidity and river discharge (Q) measured at several previous lag
times. From the obtained results, the authors demonstrated that the best accuracy was achieved using the wavelet-gene
expression programming (WGEP), compared to the wavelet-ANN (WANN) and wavelet-decision tree (WDT). In another
study, Liu and Wang (2019) compared the multiple linear regression (MLR) and the GEP models in predicting water turbidity measured at two reservoirs located in Taiwan: the Tseng-Wen and Nan-Hwa reservoirs. The authors have developed
the predictive models based on the satellite imagery obtained from the Landsat 8 satellite, and in total four inputs were
selected namely, the spectral wavelength band 2 (450–510 nm, blue), band 3 (530–590 nm, green), band 4 (640–
670 nm, red), and band 5 (850–880 nm). From the obtained results, the GEP model worked best compared to the MLR
model. Zounemat-Kermani et al. (2020) used several machine learning models in predicting river TU in Brandywine Creek, Pennsylvania,
United States, namely the online sequential extreme learning machine (OS-ELM), the ANN, the classification and
regression tree (CART), the group method of data handling (GMDH), and the response surface method (RSM) models.
The proposed machine learning models were developed using several predictors, i.e., Q, precipitation (P), water pH, suspended sediment (SS), dissolved oxygen (DO), and water temperature (TE). From the obtained results, they reported that
the best accuracy was obtained using the OSELM model, while the CART was the worst model. Gu et al. (2020) proposed a
new model for river TU retrieval using the random forest regression model (RF). The authors selected 13 bands from the
hyperspectral remote sensing data obtained by the Google earth engine (GEE) and the model was called RFE-GEE. To
demonstrate the superiority of the proposed RFE-GEE model, they compared its accuracy with those of RF, broad learning
system (BLS), bidirectional ELM (BELM), support vector regression (SVR), deep belief network, extreme learning
machine (ELM), and stacked selective ensemble-backed predictor (SSEP) models. From the obtained results, they reported
that the high accuracy was obtained using the developed RFE-GEE which ensured a 15.4% gain taking into account the
mean squared error (MSE). Allam et al. (2020) proposed the use of the Landsat 8 surface reflectance (L8SR) for predicting
TU in the Ramganga River, India. The proposed algorithm achieved a good correlation between in situ measured and calculated river TU, with R² of 0.760.
Najah et al. (2013) compared two artificial neural network models namely the MLPNN and the radial basis function
neural networks (RBFNN) for predicting river TU measured in the Johor River Basin located in Johor state, Malaysia. The
two models were developed and compared using only the total dissolved solids (TDS), and the results showed that the
RBFNN (R² = 0.80) was more accurate compared to the MLPNN (R² = 0.64). Mather and Johnson (2016) combined three
input variables namely river Q, P and air temperature (TE) for forecasting daily River TU 3 days in advance. The empirical
even model was developed using data from two USGS sites and acceptable accuracy was obtained. Tsai and Yen (2017)
used the group method of data handling algorithm (GMDH) for forecasting river TU measured at the Chiahsien Weir and its
upper stream in Taiwan. By combining the Q, P, and TU measured at the previous lag, they demonstrated that GMDH
(R = 0.975) was more accurate than the stepwise regressive (SGMDH) (R = 0.965) and achieved high accuracy. In a
recently published paper, Teixeira et al. (2020) compared MLPNN and the fuzzy inference system (FIS) in predicting river
TU using the Q and the area of the watersheds (A). According to the obtained results, the FIS model was more accurate with
Nash-Sutcliffe efficiency (NSE) of 0.860 for the validation dataset. In the same context, Iglesias et al. (2014) proposed a new modeling strategy for modeling river TU in the Nalón river basin, Northern Spain. The proposed approach used the so-called synergistic variables, which were obtained by multiplying two well-known variables: conductivity × ammonium, conductivity × pH, conductivity × dissolved oxygen, and so on. It was demonstrated that the new synergistic variables contribute significantly to the improvement of the model's performances.
According to the literature review discussed earlier, it is clear that several attempts have been made to provide general frameworks for river water TU modeling, and models based on machine learning are the most reported tools. While it was shown that river TU can be predicted very well using a combination of several water variables, we believe that the introduction of new working methods based on the use of fewer predictors will be a very promising area of research, and that the development of new modeling strategies can help improve our understanding of river TU modeling. In addition, the use of hybrid models based on the combination of standalone machine learning and several metaheuristic algorithms can help improve the models' performances. Consequently, the objective of this study is to introduce a
new kind of machine learning models called bat algorithm optimized extreme learning machine (Bat-ELM) for predicting
daily river turbidity using only river discharge. The Bat-ELM was compared to the feedforward artificial neural network
(FFNN), and the dynamic evolving neural-fuzzy inference system (DENFIS) models.
2. Study area and data
The study area for this investigation was composed of four USGS stations, two of them located on the Sprague River, Oregon, United States, and the other two in Clackamas County, Oregon, United States. The selected stations were: (i) USGS 11497500 at Sprague River near Beatty, Klamath Basin, Oregon, United States (Latitude 42°26′51.9″, Longitude 121°14′18.7″ NAD83), (ii) USGS 11501000 at Sprague River near Chiloquin, Klamath Basin, Oregon, United States (Latitude 42°35′03.5″, Longitude 121°50′54.0″ NAD83), (iii) USGS 14210000 at Clackamas River at Estacada, Oregon, United States (Latitude 45°18′00″, Longitude 122°21′10″ NAD27), and (iv) USGS 14211010 at Clackamas River near Oregon City, Oregon, United States (Latitude 45°22′46″, Longitude 122°34′34″ NAD27).
FIG. 1 Location map showing the four USGS stations selected for modeling river turbidity.
The location of the study area and the four USGS stations are shown in Fig. 1. The data from these four selected stations were used to build machine learning models for estimating river turbidity measured at a daily time scale, as a function of the river discharge. The length of the data set varied from one station to another, ranging from 990 to 6684 patterns, and the details for each station are provided in Table 1. For each station, the dataset was randomly divided into two subgroups: one for the calibration period (70%) and the rest (30%) for validation. Table 2 reports the mean, maximum, minimum, standard deviation, coefficient of variation, and coefficient of correlation with TU, i.e., Xmean, Xmax, Xmin, Sx, Cv, and R, respectively.
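A minimal sketch of this random 70%/30% split (assuming the station records are available in a pandas DataFrame with columns named "Q" and "TU"; the file name is hypothetical) is shown below.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("station_data.csv")       # hypothetical file name
X = df[["Q"]].values                       # river discharge as the single predictor
y = df["TU"].values                        # daily water turbidity
X_cal, X_val, y_cal, y_val = train_test_split(X, y, train_size=0.7, random_state=42)
print(len(X_cal), len(X_val))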
TABLE 1 Period of records for the USGS stations selected for modeling river turbidity.

Station         Begin date   End date     Total patterns   Incomplete patterns   Final patterns
USGS 11497500   01/11/2007   31/12/2015   2983             1993                  990
USGS 11501000   16/11/2007   02/09/2020   4675             711                   3964
USGS 14210000   01/07/2001   03/09/2020   7005             321                   6684
USGS 14211010   01/06/2002   03/09/2020   6670             229                   6441
TABLE 2 Summary statistics of water quality variables for the four stations.

USGS 11497500 Sprague River near Beatty, Klamath Basin, Oregon, United States
Variables   Subset       Unit   Xmean     Xmax       Xmin     Sx        Cv      R
TU          Training     FNU    7.306     47.700     1.900    7.376     1.010   1.000
TU          Validation   FNU    6.937     45.600     1.900    6.843     0.986   1.000
TU          All data     FNU    7.196     47.700     1.900    7.221     1.003   1.000
Q           Training     Kcfs   313.386   1500.000   82.600   264.966   0.845   0.503
Q           Validation   Kcfs   310.379   1520.000   82.800   270.773   0.872   0.531
Q           All data     Kcfs   312.486   1520.000   82.600   266.653   0.853   0.510

USGS 11501000 Sprague River near Chiloquin, Klamath Basin, Oregon, United States
Variables   Subset       Unit   Xmean     Xmax       Xmin      Sx        Cv      R
TU          Training     FNU    7.239     78.400     0.500     8.614     1.190   1.000
TU          Validation   FNU    7.407     63.700     0.500     8.483     1.145   1.000
TU          All data     FNU    7.290     78.400     0.500     8.575     1.176   1.000
Q           Training     Kcfs   477.993   4430.000   100.000   482.665   1.010   0.548
Q           Validation   Kcfs   501.307   4380.000   101.000   531.828   1.061   0.562
Q           All data     Kcfs   484.984   4430.000   100.000   497.999   1.027   0.552

USGS 14210000 Clackamas River at Estacada, Oregon, United States
Variables   Subset       Unit   Xmean      Xmax        Xmin      Sx         Cv      R
TU          Training     FNU    2.225      75.400      0.000     4.692      2.109   1.000
TU          Validation   FNU    2.524      78.300      0.000     5.575      2.209   1.000
TU          All data     FNU    2.314      78.300      0.000     4.975      2.149   1.000
Q           Training     Kcfs   2540.357   24800.000   589.000   2252.682   0.887   0.740
Q           Validation   Kcfs   2635.124   28900.000   601.000   2371.804   0.900   0.746
Q           All data     Kcfs   2568.781   28900.000   589.000   2289.388   0.891   0.741

USGS 14211010 Clackamas River near Oregon City, Oregon, United States
Variables   Subset       Unit   Xmean      Xmax        Xmin      Sx         Cv      R
TU          Training     FNU    3.088      100.000     0.000     6.390      2.069   1.000
TU          Validation   FNU    3.144      93.800      0.000     6.814      2.167   1.000
TU          All data     FNU    3.105      100.000     0.000     6.520      2.100   1.000
Q           Training     Kcfs   3270.327   27500.000   630.000   3115.037   0.953   0.784
Q           Validation   Kcfs   3265.621   32600.000   624.000   3312.813   1.014   0.811
Q           All data     Kcfs   3268.915   32600.000   624.000   3175.539   0.971   0.793

Xmean, mean; Xmax, maximum; Xmin, minimum; Sx, standard deviation; Cv, coefficient of variation; R, coefficient of correlation with TU; TU, water turbidity; Q, discharge; FNU, Formazin Nephelometric Unit; Kcfs, thousand cubic feet per second.
3. Methodology
3.1 Feedforward artificial neural network
Artificial neural networks (ANN) are widely used for solving a large number of problems in the area of water resources
management and now becoming a successful tool for tackling complex and nonlinear problem (Olyaie et al., 2017; Mehr
and Nourani, 2018; Hrnjica et al., 2019; Matouq et al., 2013). The success of the ANN in comparison to other regression
models was primarily due to their ability to adapt and to be flexible in extracting the nonlinear relationship between variables using a learning process (Haykin, 1999). There are a large number of ANN architectures; however, the FFNN is the
most widely used model in the literature. As the name suggests, the FFNN is composed of several layers: an input layer, hidden layers, and an output layer; generally only one hidden layer is adopted, and the available information spreads through the network from the input to the output layer. The input layer contains the independent variables (x1, x2, x3, …, xi); the hidden layer is composed of several neurons determined by trial and error, each of which receives all the input variables (xi) multiplied by their respective parameters (the weights), applies a summation function, and adds one bias to the result. The output of each hidden neuron is produced using an activation function, generally the sigmoidal function. Finally, the output layer sums the weighted outputs of the hidden neurons and uses a linear transfer function to provide the final output response. The weights and biases of the ANN model are adjusted during the training process to minimize a cost function, generally the sum of squares error calculated from the differences between the measured and predicted values. The most well-known and widely used training algorithm is back propagation (Haykin, 1999).
3.2 Dynamic evolving neural-fuzzy inference system
Evolving neural-fuzzy inference systems are intelligent models with high similarity with the classical neuro-fuzzy
approaches for which the linear and nonlinear parameters were adopted in an online manner, more precisely; the nonlinear
parameters were governed by the kind of partition of the input-output space (Škrjanc et al., 2019). DENFIS is the most
relevant evolving system introduced during the last decade (Kasabov and Song, 2002) and is mainly based on the so-called
evolving clustering method (Heddam and Kisi, 2020; Kasabov et al., 2008). From a computational point of view, the
DENFIS model can be run in two manners, namely online and offline. The first version is based on the online
training method and the model is called DENFIS_ON, while the second method is based on the offline training method and
the model is called DENFIS_OF (Kasabov and Song, 2002; Kasabov et al., 2008). The triangular fuzzy membership functions are used for both online and offline DENFIS models:
$$\mu(x) = \operatorname{mf}(x, a, b, c) = \begin{cases} 0, & x \leq a \\ \dfrac{x-a}{b-a}, & a \leq x \leq b \\ \dfrac{c-x}{c-b}, & b \leq x \leq c \\ 0, & c \leq x \end{cases} \qquad (1)$$
where b is the value of the cluster center on the x dimension, a = b − d × Dthr and c = b + d × Dthr, with d = 1.2–2; the distance threshold value, Dthr, is a clustering parameter (Kasabov and Song, 2002; Kasabov et al., 2008; Heddam et al., 2018).
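A minimal NumPy sketch of the triangular membership function of Eq. (1) is given below; the cluster centre, d, and Dthr values are assumed for illustration only.

import numpy as np

def tri_mf(x, a, b, c):
    # 0 outside [a, c], rising linearly on [a, b], falling linearly on [b, c]
    x = np.asarray(x, dtype=float)
    rise = np.clip((x - a) / (b - a), 0.0, 1.0)
    fall = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(rise, fall)

b, d, Dthr = 0.5, 1.2, 0.2          # assumed cluster centre and clustering parameters
a, c = b - d * Dthr, b + d * Dthr
x = np.linspace(0.0, 1.0, 11)
print(tri_mf(x, a, b, c))           # membership peaks at 1.0 when x equals the centre b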
During the last few years, DENFIS models have been applied for solving several engineering problems, and more details
related to its application can be found in Adnan et al. (2021), Sebbar et al. (2020), Heddam and Kisi (2020), Heddam et al.
(2018), and Kisi et al. (2019a,b). The MatLab software for DENFIS is available at https://kedri.aut.ac.nz/areas-of-expertise/data-mining-and-decision-support/neucom.
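As an illustration of Eq. (1) only, the short Python sketch below evaluates the triangular membership function for a cluster center b with a and c derived from the distance threshold Dthr; the numerical values of b, d, and Dthr are hypothetical and chosen just for the example.

```python
import numpy as np

def triangular_mf(x, a, b, c):
    """Triangular membership function of Eq. (1): zero outside [a, c],
    rising linearly on [a, b] and falling linearly on [b, c]."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(left, right)

# Cluster center b, with a and c derived from the distance threshold Dthr (d = 1.2 here)
b, d, Dthr = 5.0, 1.2, 2.0
a, c = b - d * Dthr, b + d * Dthr
print(triangular_mf([1.0, 3.0, 5.0, 7.0, 9.0], a, b, c))
```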
3.3 Bat algorithm optimized extreme learning machine
The single hidden layer feedforward neural network (SLFN) is the most relevant ANN model proposed during the last decades, not only because of its simplicity, i.e., having only one hidden layer, but also because of its robustness, high precision, and universal approximation capability. With the invention of the back-propagation training algorithm, the SLFN became famous (Hornik et al., 1989; Hornik, 1991). From a computational point of view, back propagation is used for iteratively updating all SLFN parameters (i.e., weights and biases) from the input to the output layers, which makes the total number of updated parameters high; in some cases (i.e., large data sets) the training process becomes very slow and suffers from overfitting. To meet these challenges, a new training algorithm called the extreme learning machine (ELM) was introduced (Huang et al., 2006a,b), in which the weights between the input and hidden layer are obtained directly and do not need to be updated during the training process (the so-called random generation of the hidden nodes), while those linking the hidden to the output layer are analytically determined. According to Huang et al. (2006a,b), an SLFN with N hidden layer nodes can be expressed as follows:
\( Y_j = \sum_{i=1}^{N} \beta_i \, \psi\left( w_i x_j + b_i \right), \quad j = 1, \dots, M \)   (2)
where M is the number of training samples, N is the number of hidden nodes, \( w_i \) are the input-to-hidden layer weights, \( \psi \) is the activation function, \( \beta_i \) are the hidden-to-output layer weights, \( b_i \) are the hidden node biases, and \( x_j \) corresponds to the input variable matrix. The mathematical formulation of the ELM approach can be described as:
\( H\beta = T \)   (3)
where H is the hidden layer output matrix, \( \beta \) is the weight matrix of the output layer, and T is the expected output matrix (Liu et al., 2020a; Cheng et al., 2020).
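A minimal sketch of the basic ELM training idea described above is given below (Python/NumPy): the input-to-hidden weights and biases are drawn at random and left untouched, and the output weights β are obtained analytically from Hβ = T via the Moore-Penrose pseudo-inverse. The toy data, activation choice, and number of hidden nodes are illustrative assumptions only.

```python
import numpy as np

def elm_fit(X, T, n_hidden=20, seed=42):
    """Train a basic ELM: random input-to-hidden weights/biases, then solve
    H @ beta = T for the output weights with the Moore-Penrose pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random hidden weights, never updated
    b = rng.normal(size=n_hidden)                 # random hidden biases, never updated
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                  # analytic output weights (Eq. 3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: learn a noisy nonlinear function of one input
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 1))
T = np.sin(2 * X) + 0.05 * rng.normal(size=(200, 1))
W, b, beta = elm_fit(X, T)
print(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))  # training RMSE
```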
Several metaheuristic training algorithms have been proposed during the last few years for improving the training process of ANN and ELM models, among them: the genetic algorithm (GA), particle swarm optimization (PSO), the artificial bee colony (ABC) optimization algorithm, ant colony optimization (ACO), differential evolution (DE), and the cuckoo search algorithm (CSA). In the present study, an efficient optimization method, the Bat algorithm, is introduced to optimize the ELM model; it is described below.
The Bat optimization algorithm introduced by Yang (2010) is a metaheuristic approach belonging to the category of swarm intelligence models; it was inspired by the way bats seek their prey using a special sense (Jaddi et al., 2015). The main idea behind the bat algorithm is based on the echolocation capability and social behavior of the bat population (Xie et al., 2019). From a computational point of view, the bat algorithm possesses the following three idealized rules (Shekhar et al., 2020; Liu et al., 2020b):
(i) Echolocation is used by the bats to estimate the relative distance to a food source and to obstacles in an unknown environment.
(ii) In order to search for prey, an initial velocity Vi is randomly assigned at a starting position Xi. The bats fly at the same relative velocity for different times due to different initial distances, using a fixed frequency fi ranging between two limits fmin and fmax, a varying wavelength λ, and a loudness (sound intensity) A0. According to the level of proximity to the target, the bat automatically adjusts the wavelength and pulse rate accordingly.
(iii) The loudness of the pulse is adjusted accordingly, ranging from a maximum (A0) to a minimum (Amin).
The output of this iterative process is reached over a series of iterations across a large number of candidate solutions, in which the loudness and pulse rate are updated in response to each accepted solution. The frequency, velocity, and position of each bat member are calculated as follows (Gangwar and Pathak, 2020):
\( f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \)   (4)

\( V_i^t = V_i^{t-1} + \left( X_i^{t-1} - X_{best}^t \right) f_i \)   (5)

\( X_i^t = X_i^{t-1} + V_i^t \)   (6)
where \( \beta \) is a random number ranging from 0 to 1; \( f_i \), ranging from \( f_{\min} \) to \( f_{\max} \), is the frequency used for controlling the step length (i.e., the step and range) of the bat movement and corresponds to a range of wavelengths \( [\lambda_{\min}, \lambda_{\max}] \); and \( X_{best} \) is the global best solution. During the iteration process, the updated solution is calculated as follows (Gangwar and Pathak, 2020):
\( X_{new} = X_{old} + \varepsilon A^t \)   (7)
where \( \varepsilon \) is a random number in the interval [0, 1], and \( A^t \) represents the average loudness of the bats at time t. The flowchart of the developed Bat-ELM model is shown in Fig. 2. The MatLab code of the Bat algorithm can be found at https://fr.mathworks.com/matlabcentral/fileexchange?q=Bat+algorithm.
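To illustrate Eqs. (4)–(7), the sketch below (Python/NumPy) performs one update of a small bat population; in the actual Bat-ELM each bat would encode candidate ELM parameters and the fitness would be the RMSE of the resulting model, but here the positions, dimensions, and parameter values are purely hypothetical.

```python
import numpy as np

def bat_step(X, V, X_best, f_min=-1.0, f_max=1.0, loudness=0.1, seed=1):
    """One update of the bat population following Eqs. (4)-(7): draw a frequency
    per bat, update the velocity toward the best solution, move the bats, then
    apply a small random walk scaled by the loudness."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    beta = rng.random((n, 1))
    f = f_min + (f_max - f_min) * beta        # Eq. (4)
    V_new = V + (X - X_best) * f              # Eq. (5)
    X_new = X + V_new                         # Eq. (6)
    eps = rng.random((n, dim))
    X_local = X_new + eps * loudness          # Eq. (7): local search around the new solutions
    return X_local, V_new

# Toy usage: 5 bats in 3 dimensions moving relative to the current best position
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)); V = np.zeros((5, 3)); X_best = np.zeros((1, 3))
X, V = bat_step(X, V, X_best)
print(X.round(3))
```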
3.4 Multiple linear regression
Using the MLR method, one dependent variable Y (in our study the river turbidity) is linked or correlated with several
predictor variables xi, using the following equation (Luu et al., 2021):
\( Y_i = \beta_0 + \sum_{i=1}^{K} \beta_i x_i + \varepsilon_i \)   (8)

where \( \beta_0 \) is the intercept, \( \beta_i \) are the partial regression coefficients for each predictor, and \( \varepsilon_i \) is the residual.
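For completeness, a minimal least-squares fit of Eq. (8) can be written as follows (Python/NumPy); the synthetic predictors and coefficients are assumptions made only for the example.

```python
import numpy as np

def mlr_fit(X, y):
    """Ordinary least squares for Eq. (8): returns the intercept b0 and the
    partial regression coefficients b1..bK."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Toy usage: a turbidity-like target driven by discharge plus a periodicity-like term
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 1.5 + 4.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)
b0, b = mlr_fit(X, y)
print(round(b0, 2), b.round(2))
```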
FIG. 2 Flowchart of the extreme learning machine optimized by the Bat algorithm (Bat-ELM). The Bat parameters used were: population size N = 25 (range 10–25), loudness A = 0.1, pulse rate r0 = 0.1, minimum frequency fmin = −1, maximum frequency fmax = +1, and m = 200 hidden neurons. The fitness function is the root of the mean square error between predicted (Pi) and observed (Oi) values; after generating a new solution the fitness of all bats is evaluated, one solution is selected among the best solutions, and the process iterates until the solution with the best fitness is obtained.
3.5 Performance assessment of the models
In the present chapter, the performances of the proposed models were evaluated using the coefficient of correlation (R), the Nash-Sutcliffe efficiency (NSE), the mean absolute error (MAE), and the root mean square error (RMSE), calculated as follows:
\( \mathrm{MAE} = \dfrac{1}{N}\sum_{i=1}^{N} \left| (TU_0)_i - (TU_p)_i \right|, \quad (0 \le \mathrm{MAE} < +\infty) \)   (9)

\( \mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_0)_i - (TU_p)_i \right]^2}, \quad (0 \le \mathrm{RMSE} < +\infty) \)   (10)

\( \mathrm{NSE} = 1 - \dfrac{\sum_{i=1}^{N} \left[ (TU_0)_i - (TU_p)_i \right]^2}{\sum_{i=1}^{N} \left[ (TU_0)_i - \overline{TU_0} \right]^2}, \quad (-\infty < \mathrm{NSE} \le 1) \)   (11)

\( R = \dfrac{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_0)_i - \overline{TU_0} \right]\left[ (TU_p)_i - \overline{TU_p} \right]}{\sqrt{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_0)_i - \overline{TU_0} \right]^2}\;\sqrt{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_p)_i - \overline{TU_p} \right]^2}}, \quad (-1 < R \le +1) \)   (12)
in which N is the number of data points, and \( TU_0 \), \( TU_p \), \( \overline{TU_0} \), and \( \overline{TU_p} \) are the measured, predicted, mean measured, and mean predicted river water turbidity, respectively.
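A compact implementation of Eqs. (9)–(12) might look like the following sketch (Python/NumPy); the short observed/predicted vectors are invented solely to exercise the function.

```python
import numpy as np

def evaluate(tu_obs, tu_pred):
    """Compute the four criteria of Eqs. (9)-(12) for observed and predicted turbidity."""
    tu_obs, tu_pred = np.asarray(tu_obs, float), np.asarray(tu_pred, float)
    mae = np.mean(np.abs(tu_obs - tu_pred))                                              # Eq. (9)
    rmse = np.sqrt(np.mean((tu_obs - tu_pred) ** 2))                                     # Eq. (10)
    nse = 1.0 - np.sum((tu_obs - tu_pred) ** 2) / np.sum((tu_obs - tu_obs.mean()) ** 2)  # Eq. (11)
    r = np.corrcoef(tu_obs, tu_pred)[0, 1]                                               # Eq. (12)
    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "R": r}

print(evaluate([2.0, 5.0, 9.0, 4.0], [2.4, 4.6, 8.1, 4.3]))
```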
4. Results and discussion
As stated above, the goal of our study was the prediction of river turbidity at four rivers located in the United States. For this purpose, four machine learning models were developed and compared according to two scenarios: (i) using the river discharge (Q) and the periodicity (i.e., year, month, and day numbers), and (ii) using only the river discharge. The RMSE, MAE, R, and NSE were calculated during the training and validation phases separately, and the obtained results were further analyzed using graphical representations. Overall, at the four stations, the river TU was poorly estimated using only the river Q compared to the estimation achieved using Q and the periodicity, and the Bat-ELM showed the best correlation among all proposed models over the four stations. Detailed results for each station are discussed hereafter.
4.1 USGS 11497500 station
The numerical results of daily river TU prediction at the USGS 11497500 station using the four machine learning models are presented in Table 3. According to Table 3, using only Q as an input variable, the DENFIS_O2, DENFIS_F2, FFNN2, and Bat-ELM2 models exhibit small variations during the validation phase, and none of them was able to correctly and accurately predict the TU concentration. The RMSE and MAE values, ranging from 5.593 to 6.230 and from 3.256 to 3.575, respectively, show the poor model performances during the validation stage. The NSE and R values were very low and did not exceed 0.331 and 0.576, respectively. However, inclusion of the periodicity guaranteed a significant improvement in the performances of all proposed models: the retrieved river TU has an NSE coefficient of no less than 0.660 for all models, the R values were superior to 0.827, and the RMSE and MAE values were no more than 3.99 and 2.26, respectively. These results imply that the four machine learning models were able to predict the river TU very accurately with the inclusion of the periodicity. Overall, the best accuracy was obtained using the Bat-ELM1, with R and NSE of 0.972 and 0.936, respectively, versus 0.905 and 0.770 for FFNN1 and 0.850 and 0.708 for DENFIS_O1.
TABLE 3 Performances of different River Turbidity models at the USGS 11497500 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.984   0.968   1.314   0.894     0.972   0.936   1.731   1.155
Bat-ELM2    0.596   0.355   5.919   3.684     0.576   0.331   5.593   3.256
DENFIS_O1   0.924   0.593   4.704   2.948     0.850   0.708   3.694   2.023
DENFIS_O2   0.916   0.364   5.876   3.630     0.530   0.170   6.230   3.575
DENFIS_F1   0.842   0.657   4.316   2.425     0.827   0.659   3.992   2.258
DENFIS_F2   0.573   0.294   6.191   4.044     0.564   0.278   5.809   3.503
FFNN1       0.979   0.959   1.490   0.964     0.905   0.770   3.276   1.787
FFNN2       0.638   0.408   5.673   3.563     0.500   0.225   6.018   3.468
FIG. 3 Scatterplots of measured against calculated turbidity at the USGS 11497500 station.
The weakest performances were obtained using the DENFIS_F1 model, with R and NSE values of 0.827 and 0.659, respectively, which are still markedly lower than those of the other three models. In addition, the Bat-ELM1 improves on the FFNN1, DENFIS_O1, and DENFIS_F1 by 47.16% and 35.37%, 53.14% and 42.19%, and 56.64% and 48.85% reductions in RMSE and MAE, respectively. Clearly, Bat-ELM1, FFNN1, and DENFIS_O1 were more accurate than DENFIS_F1, and Bat-ELM1 further improves the river TU estimation. Fig. 3 shows scatterplots of river TU values calculated by the DENFIS_O1, DENFIS_F1, FFNN1, and Bat-ELM1 models compared with the in situ measurements. A first look at the results reveals high-to-moderate agreement between calculated and measured data for all four algorithms. However, the Bat-ELM1 shows the highest accuracy with the least scattered data, followed by the FFNN1 and then the DENFIS_O1, while the most scattered data were obtained using the DENFIS_F1.
4.2 USGS 11501000 station
River TU estimates at the USGS 11501000 station obtained using the four machine learning models are compared to the in situ measured data in Fig. 4 for the validation dataset. For both the Bat-ELM1 and FFNN1 models, the simulated TU falls generally along the one-to-one line against the in situ measurements with less scattered data, and the superiority of the Bat-ELM1 is obvious; the DENFIS_O1 and DENFIS_F1 performed roughly equally with a slight difference, and they are less accurate than the Bat-ELM1 and FFNN1, with largely scattered data. Quantitative measures of all river TU comparisons are shown in Table 4 in terms of RMSE, MAE, NSE, and R values. From Table 4, the estimated river TU was poorly correlated with the in situ measured data for the models based only on river discharge, and the four machine learning models have low NSE and R values (Table 4).
FIG. 4 Scatterplots of measured against calculated turbidity at the USGS 11501000 station.
TABLE 4 Performances of different River Turbidity models at the USGS 11501000 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.941   0.885   2.921   1.582     0.937   0.877   2.972   1.748
Bat-ELM2    0.685   0.469   6.273   3.217     0.699   0.488   6.069   3.295
DENFIS_O1   0.868   0.705   4.680   1.836     0.867   0.746   4.271   2.174
DENFIS_O2   0.720   0.478   6.224   2.984     0.656   0.384   6.657   3.617
DENFIS_F1   0.877   0.764   4.187   2.101     0.871   0.754   4.206   2.250
DENFIS_F2   0.677   0.458   6.340   3.338     0.702   0.490   6.054   3.408
FFNN1       0.981   0.963   1.667   1.103     0.899   0.802   3.773   2.376
FFNN2       0.725   0.525   5.933   3.050     0.664   0.437   6.362   3.436
The NSE and R values range from 0.384 to 0.490 and from 0.656 to 0.702, respectively, and none of the models possesses an NSE value greater than 0.50. In terms of error metrics, the obtained RMSE and MAE were very high, ranging from 6.054 to 6.657 and from 3.408 to 3.617, respectively. The difference in model performances between scenarios 1 and 2 is apparent, and the significant contribution of the periodicity to the improvement of the models' accuracy is completely clear, the NSE and R values being somewhat larger and the RMSE and MAE values somewhat smaller. The RMSE and MAE of the Bat-ELM1 and FFNN1 were improved by 51.03% and 46.95%, and 40.69% and 30.85%, respectively. In addition, the RMSE and MAE of the DENFIS_O1 and DENFIS_F1 were improved by 35.84% and 39.89%, and 30.53% and 33.98%, respectively. The most significant improvement was achieved using the Bat-ELM1, for which the RMSE dropped from 6.069 to 2.972 while the MAE decreased from 3.295 to 1.748. In addition, an increase in the NSE and R values is to be expected: the R rose to almost 0.937 compared to 0.699 (a 25.40% improvement) obtained using only the river discharge, while the NSE value rose by 44.36% (from 0.488 to 0.877). The improvement in the models' accuracies is attributed to the introduction of the periodicity as an input variable combined with the discharge. Finally, comparison between the models' accuracies revealed the superiority of the Bat-ELM1, followed by the FFNN1, while the two DENFIS models have essentially the same performance. For comparison, Bat-ELM1 decreased the RMSE and MAE values of the FFNN1, DENFIS_O1, and DENFIS_F1 by 21.23% and 26.43%, 30.41% and 19.60%, and 29.34% and 22.31%, respectively.
4.3 USGS 14210000 station
Table 5 shows the numerical results obtained at the USGS 14210000 station using the machine learning models described above. During the validation phase, the minimum RMSE and MAE of the second scenario (i.e., using only Q) are reported together with the NSE and R values, showing the superiority of the Bat-ELM2 model, while the FFNN2, DENFIS_O2, and DENFIS_F2 exhibit relatively the same level of accuracy, with RMSE and MAE in the ranges 3.185–3.753 and 1.244–1.334, respectively, and the largest error values obtained by the FFNN2 (RMSE = 3.753, MAE = 1.334).
From Table 5, it can be seen that the error indices calculated for the Bat-ELM2 are generally the lowest, with RMSE and MAE of 3.185 FNU and 1.245 FNU, respectively. Across the two scenarios with and without the periodicity, scenario 1, having Q and the periodicity as input variables, shows better performances for all four machine learning models, with measurement errors (i.e., RMSE and MAE) significantly reduced. The RMSE varies from 3.396 at worst to 2.456 at best, and the MAE varies from 1.299 at worst to 1.117 at best. Quantitative comparisons of all models in terms of the RMSE, MAE, R, and NSE values between observed and simulated values, reported in Table 5, reveal that a significant percentage improvement was achieved using the Bat-ELM1 in comparison to the FFNN1, DENFIS_O1, and DENFIS_F1 models. The Bat-ELM1 increased the R and NSE values by 9.24% and 23.43% and decreased the RMSE and MAE values by 25.21% and 14.01%, respectively, in the validation phase compared to the FFNN1 model. In addition, the Bat-ELM1 decreased the RMSE and MAE values by 27.68% and 7.76% and increased the R and NSE values by 11.97% and 28.14%, respectively, in the validation phase compared with the DENFIS_O1 model. Finally, the Bat-ELM1 was more accurate than the DENFIS_F1, showing a significant decrease of the RMSE and MAE by 25.213% and 14.011%, respectively.
TABLE 5 Performances of different River Turbidity models at the USGS 14210000 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.850   0.723   2.469   1.109     0.898   0.806   2.456   1.117
Bat-ELM2    0.831   0.691   2.607   1.042     0.821   0.674   3.185   1.245
DENFIS_O1   0.808   0.594   2.989   1.160     0.802   0.629   3.396   1.211
DENFIS_O2   0.835   0.664   2.718   0.959     0.799   0.623   3.420   1.244
DENFIS_F1   0.898   0.804   2.075   0.901     0.807   0.646   3.318   1.241
DENFIS_F2   0.821   0.674   2.677   1.040     0.790   0.625   3.415   1.277
FFNN1       0.901   0.812   2.036   0.935     0.822   0.653   3.284   1.299
FFNN2       0.856   0.733   2.424   0.996     0.771   0.547   3.753   1.334
FIG. 5 Scatterplots of measured against calculated turbidity at the USGS 14210000 station.
The comparisons between simulated and in situ measured TU are shown as scatterplots in Fig. 5. The agreement is very good for the Bat-ELM1, with a coefficient of determination (R²) above 0.80, and the data are less scattered than for the other three models, for which the data are largely scattered with R² values approaching 0.670.
4.4 USGS 14211010 station
At the USGS 14211010 station (Table 6), for the four developed models, it can be concluded that both during the training and the validation phases the inclusion of the periodicity as an input variable has a marked effect on the performances of the models. During the validation phase, it is clear from the obtained results that, using only the river discharge as an input variable, the performances of the FFNN2, Bat-ELM2, DENFIS_O2, and DENFIS_F2 models were relatively similar, with a slight superiority in favor of the DENFIS_F2. An analysis of the statistical indices shows that the R and NSE values are in the ranges 0.824–0.852 and 0.679–0.726. Similarly, the RMSE and MAE range from 3.56 to 3.86 FNU and from 1.338 to 1.407, respectively. From Table 6, it is clear that the inclusion of the periodicity improves the performances of all models. Using the periodicity and Q as input variables, the best model, Bat-ELM1, had RMSE = 2.626, MAE = 1.161, R = 0.923, and NSE = 0.851, and surpasses all other models in terms of accuracy. Scatterplots of calculated versus measured river TU are given in Fig. 6. Finally, the performances of the models were evaluated and compared in terms of boxplots (Fig. 7) and a Taylor diagram (Fig. 8), showing the superiority and high performance of the Bat-ELM1 compared to all the developed models.
TABLE 6 Performances of different River Turbidity models at the USGS 14211010 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.934   0.872   2.289   1.145     0.923   0.851   2.626   1.161
Bat-ELM2    0.837   0.700   3.500   1.385     0.844   0.708   3.682   1.430
DENFIS_O1   0.867   0.727   3.336   1.289     0.862   0.739   3.480   1.422
DENFIS_O2   0.869   0.747   3.216   1.313     0.837   0.698   3.744   1.338
DENFIS_F1   0.931   0.867   2.329   1.062     0.847   0.704   3.705   1.226
DENFIS_F2   0.880   0.774   3.039   1.317     0.852   0.726   3.564   1.345
FFNN1       0.945   0.893   2.093   1.031     0.827   0.638   4.097   1.249
FFNN2       0.891   0.793   2.904   1.303     0.824   0.679   3.862   1.407
FIG. 6 Scatterplots of measured against calculated turbidity at the USGS 14211010 station.
FIG. 7 Box-plots of measured and calculated river turbidity (TU: RFU) for the four USGS stations (USGS 11497500, USGS 11501000, USGS 14210000, and USGS 14211010). Boxes are generated using the validation dataset and illustrate the 25th and 75th percentiles and the median; whiskers include the highest and lowest values, and the mean values are marked by a red line. M: measured; M1: FFNN1; M2: FFNN2; M3: Bat-ELM1; M4: Bat-ELM2; M5: DENFIS_O1; M6: DENFIS_O2; M7: DENFIS_F1; M8: DENFIS_F2.
FIG. 8 Taylor diagram of river turbidity (TU: RFU) illustrating the statistics of comparison between the proposed models at the four USGS stations.
5. Conclusions
As a key water quality variable, river turbidity is of great concern in a large number of environmental, water resources, and hydrological studies. In this study, first, a model for predicting the river TU using only river discharge was fitted, and the obtained results were low to moderate. Next, a nonlinear model between the river TU, the discharge, and the periodicity (i.e., day, month, and year numbers) was established using a new hybrid machine learning model (i.e., Bat-ELM). The proposed model was then applied and tested using data collected at four USGS stations. Finally, the estimation provided by the Bat-ELM was compared to those achieved using two other kinds of machine learning models, namely the FFNN and DENFIS models. The new method introduced in the present study (Bat-ELM) performed well to excellently, and notable progress in modeling the river TU was achieved; it therefore proved to be the best and most useful of the tested methods for estimating river TU. The overall accuracy of prediction was significantly improved by the inclusion of the periodicity: the correlation coefficient between the measured and predicted river TU reached 0.97, with a corresponding RMSE of 1.731. However, when the model was examined without the inclusion of the periodicity, using only the river discharge, the performance of the Bat-ELM was not the greatest and in some cases it was surpassed by the DENFIS models. The results obtained in the present study represent an encouraging record of progress achieved by the use of machine learning models and can be applied using data from other stations. Future work should focus on the performance of the proposed models using other input variables, and further research should be encouraged. The results obtained in the present chapter appear promising and highlight the overall merits of the proposed hybrid Bat-ELM. Seeing that the Bat-ELM surpasses all of the FFNN and DENFIS models at the four stations leads us to conclude that the idea of hybridizing machine learning models, i.e., the ELM, is very promising and should be extended to other machine learning models.
References
Adnan, R.M., Liang, Z., Parmar, K.S., Soni, K., Kisi, O., 2021. Modeling monthly streamflow in mountainous basin by MARS, GMDH-NN and DENFIS
using hydroclimatic data. Neural Comput. Applic. 33 (7), 2853–2871.
Allam, M., Khan, M.Y.A., Meng, Q., 2020. Retrieval of turbidity on a spatio-temporal scale using Landsat 8 SR: a case study of the Ramganga River in the
Ganges Basin, India. Appl. Sci. 10 (11), 3702. https://doi.org/10.3390/app10113702.
Al-Yaseri, I., Morgan, S., Retzlaff, W., 2013. Using turbidity to determine total suspended solids in storm-water runoff from green roofs. J. Environ. Eng.
139 (6), 822–828. https://doi.org/10.1061/(ASCE)EE.1943-7870.
Cheng, K., Gao, S., Dong, W., Yang, X., Wang, Q., Yu, H., 2020. Boosting label weighted extreme learning machine for classifying multi-label imbalanced
data. Neurocomputing 403, 360–370. https://doi.org/10.1016/j.neucom.2020.04.098.
Gangwar, S., Pathak, V.K., 2020. Dry sliding wear characteristics evaluation and prediction of vacuum casted marble dust (MD) reinforced ZA-27 alloy
composites using hybrid improved bat algorithm and ANN. Mater. Today Commun. 25, 101615. https://doi.org/10.1016/j.mtcomm.2020.101615.
Gelda, R.K., Effler, S.W., 2007. Modeling turbidity in a water supply reservoir: advancements and issues. J. Environ. Eng. 133 (2), 139–148. https://doi.
org/10.1061/(ASCE)0733-9372(2007)133:2(139).
Gelda, R.K., Effler, S.W., Peng, F., Owens, E.M., Pierson, D.C., 2009. Turbidity model for Ashokan Reservoir, New York: case study. J. Environ. Eng. 135
(9), 885–895. https://doi.org/10.1061/(ASCE)EE.1943-7870.0000048.
Gelda, R.K., Effler, S.W., Prestigiacomo, A.R., Peng, F., Effler, A.J., Wagner, B.A., et al., 2013. Characterizations and modeling of turbidity in a water
supply reservoir following an extreme runoff event. Inland Waters 3 (3), 377–390. https://doi.org/10.5268/IW-3.3.581.
Gu, K., Zhang, Y., Qiao, J., 2020. Random forest ensemble for river turbidity measurement from space remote sensing data. IEEE Trans. Instrum.
Meas. https://doi.org/10.1109/TIM.2020.2998615.
Haykin, S., 1999. Neural Networks a Comprehensive Foundation. Prentice Hall, Upper Saddle River, UK.
Heddam, S., Kisi, O., 2020. Evolving connectionist systems versus neuro-fuzzy system for estimating total dissolved gas at forebay and tailwater of dams
reservoirs. In: Intelligent Data Analytics for Decision-Support Systems in Hazard Mitigation. Springer, Singapore, pp. 109–126, https://doi.org/
10.1007/978-981-15-5772-9_6.
Heddam, S., Watts, M.J., Houichi, L., Djemili, L., Sebbar, A., 2018. Evolving connectionist systems (ECoSs): a new approach for modeling daily reference
evapotranspiration (ET0). Environ. Monit. Assess. 190 (9), 516. https://doi.org/10.1007/s10661-018-6903-0.
Hornik, K., 1991. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4 (2), 251–257. https://doi.org/10.1016/0893-6080 (91)
90009-T.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366. https://doi.org/
10.1016/0893-6080 (89)90020-8.
Hrnjica, B., Mehr, A.D., Behrem, Š., Agıralioglu, N., 2019. Genetic programming for turbidity prediction: hourly and monthly scenarios. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 25 (8), 992–997. https://doi.org/10.5505/pajes.2019.59458.
Huang, G.B., Chen, L., Siew, C.K., 2006a. Universal approximation using incremental constructive feedforward networks with random hidden nodes.
IEEE Trans. Neural Netw. 17 (4), 879–892. https://doi.org/10.1109/TNN.2006.875977.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006b. Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501. https://doi.org/
10.1016/j.neucom.2005.12.126.
Iglesias, C., Torres, J.M., Nieto, P.G., Fernández, J.A., Muñiz, C.D., Piñeiro, J.I., Taboada, J., 2014. Turbidity prediction in a river basin by using artificial
neural networks: a case study in northern Spain. Water Resour. Manage. 28 (2), 319–331. https://doi.org/10.1007/s11269-013-0487-9.
Jaddi, N.S., Abdullah, S., Hamdan, A.R., 2015. Multi-population cooperative bat algorithm-based optimization of artificial neural network model. Inf. Sci.
294, 628–644. https://doi.org/10.1016/j.ins.2014.08.050.
Kasabov, N.K., Song, Q., 2002. DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction. IEEE Trans.
Fuzzy Syst. 10 (2), 144–154. https://doi.org/10.1109/91.995117.
Kasabov, N., Song, Q., Ma, T.M., 2008. Fuzzy-neuro systems for local and personalized modelling. In: Forging New Frontiers: Fuzzy Pioneers II. Springer,
Berlin, Heidelberg, Germany, pp. 175–197.
Kisi, O., Heddam, S., Yaseen, Z.M., 2019a. The implementation of univariable scheme-based air temperature for solar radiation prediction: new development of dynamic evolving neural-fuzzy inference system model. Appl. Energy 241, 184–195. https://doi.org/10.1016/j.apenergy.2019.03.089.
Kisi, O., Khosravinia, P., Nikpour, M.R., Sanikhani, H., 2019b. Hydrodynamics of river-channel confluence: toward modeling separation zone using GEP,
MARS, M5 Tree and DENFIS techniques. Stoch. Environ. Res. Risk Assess., 1–19. https://doi.org/10.1007/s00477-019-01684-0.
Liu, L.W., Wang, Y.M., 2019. Modelling reservoir turbidity using Landsat 8 satellite imagery by gene expression programming. Water 11 (7),
1479. https://doi.org/10.3390/w11071479.
Liu, Z., Jin, W., Mu, Y., 2020a. Variances-constrained weighted extreme learning machine for imbalanced classification. Neurocomputing. https://doi.org/
10.1016/j.neucom.2020.04.052.
Liu, Q., Li, J., Wu, L., Wang, F., Xiao, W., 2020b. A novel bat algorithm with double mutation operators and its application to low-velocity impact localization problem. Eng. Appl. Artif. Intell. 90, 103505. https://doi.org/10.1016/j.engappai.2020.103505.
Luu, Q.H., Lau, M.F., Ng, S.P., Chen, T.Y., 2021. Testing multiple linear regression systems with metamorphic testing. J. Syst. Softw. 182, 111062. https://
doi.org/10.1016/j.jss.2021.111062.
Mather, A.L., Johnson, R.L., 2014. Quantitative characterization of stream turbidity-discharge behavior using event loop shape modeling and power law
parameter decorrelation. Water Resour. Res. 50 (10), 7766–7779. https://doi.org/10.1002/2014WR015417.
Mather, A.L., Johnson, R.L., 2016. Forecasting turbidity during streamflow events for two mid-Atlantic US streams. Water Resour. Manage. 30 (13),
4899–4912. https://doi.org/10.1007/s11269-016-1460-1.
Matouq, M., El-Hasan, T., Al-Bilbisi, H., Abdelhadi, M., Hindiyeh, M., Eslamian, S., Duheisat, S., 2013. The climate change implication on Jordan:
a case study using GIS and Artificial Neural Networks for weather forecasting. J. Taibah Univ. Sci. 7 (2), 44–55. https://doi.org/10.1016/j.
jtusci.2013.04.001.
Mehr, A.D., Nourani, V., 2018. Season algorithm-multigene genetic programming: a new approach for rainfall-runoff modelling. Water Resour. Manage.
32 (8), 2665–2679. https://doi.org/10.1007/s11269-018-1951-3.
Najah, A., El-Shafie, A., Karim, O.A., El-Shafie, A.H., 2013. Application of artificial neural networks for water quality prediction. Neural Comput. Applic.
22 (1), 187–201. https://doi.org/10.1007/s00521-012-0940-3.
Olyaie, E., Abyaneh, H.Z., Mehr, A.D., 2017. A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in
Delaware River. Geosci. Front. 8 (3), 517–527. https://doi.org/10.1016/j.gsf.2016.04.007.
Park, J.C., Um, M.J., Song, Y.I., Hwang, H.D., Kim, M.M., Park, D., 2017. Modeling of turbidity variation in two reservoirs connected by a water transfer
tunnel in South Korea. Sustainability 9 (6), 993. https://doi.org/10.3390/su9060993.
Rajaee, T., Jafari, H., 2018. Utilization of WGEP and WDT models by wavelet denoising to predict water quality parameters in rivers. J. Hydrol. Eng. 23
(12), 04018054. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001700.
Sebbar, A., Heddam, S., Kisi, O., Djemili, L., Houichi, L., 2020. Comparison of evolving connectionist systems (ECoS) and neural networks for modelling
daily pan evaporation from Algerian dam reservoirs. In: Negm, A.M., Bouderbala, A., Chenchouni, H., Barceló, D. (Eds.), Water Resources in
Algeria – Part I. The Handbook of Environmental Chemistry. vol. 97. Springer, Cham, Switzerland, https://doi.org/10.1007/698_2020_527.
Shekhar, C., Varshney, S., Kumar, A., 2020. Optimal control of a service system with emergency vacation using bat algorithm. J. Comput. Appl. Math. 364,
112332. https://doi.org/10.1016/j.cam.2019.06.048.
Škrjanc, I., Iglesias, J.A., Sanchis, A., Leite, D., Lughofer, E., Gomide, F., 2019. Evolving fuzzy and neuro-fuzzy approaches in clustering, regression,
identification, and classification: a survey. Inf. Sci. 490, 344–368. https://doi.org/10.1016/j.ins.2019.03.060.
Teixeira, L.C., Mariani, P.P., Pedrollo, O.C., dos Reis Castro, N.M., Sari, V., 2020. Artificial neural network and fuzzy inference system models for forecasting suspended sediment and turbidity in basins at different scales. Water Resour. Manage. 34 (11), 3709–3723. https://doi.org/10.1007/s11269-020-02647-9.
Tsai, T.M., Yen, P.H., 2017. GMDH algorithms applied to turbidity forecasting. Appl. Water Sci. 7 (3), 1151–1160. https://doi.org/10.1007/s13201-016-0458-4.
Xie, X., Qin, X., Zhou, Q., Zhou, Y., Zhang, T., Janicki, R., Zhao, W., 2019. A novel test-cost-sensitive attribute reduction approach using the binary bat
algorithm. Knowl.-Based Syst. 186, 104938. https://doi.org/10.1016/j.knosys.2019.104938.
Yang, X.S., 2010. A new metaheuristic Bat-inspired algorithm. In: González, J.R., Pelta, D.A., Cruz, C., Terrazas, G., Krasnogor, N. (Eds.), Nature
Inspired Cooperative Strategies for Optimization (NICSO 2010). Studies in Computational Intelligence, vol. 284. Springer, Berlin, Heidelberg,
Germany, https://doi.org/10.1007/978-3-642-12538-6_6.
Zhang, R., Wu, B., 2020. Environmental impacts of high water turbidity of the Niulan River to Dianchi Lake Water Diversion Project. J. Environ. Eng. 146
(1), 05019006. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001623.
Zolfaghari, K., Wilkes, G., Bird, S., Ellis, D., Pintar, K.D.M., Gottschall, N., McNairn, H., Lapen, D.R., 2020. Chlorophyll-a, dissolved organic carbon,
turbidity and other variables of ecological importance in river basins in southern Ontario and British Columbia, Canada. Environ. Monit. Assess. 192
(1), 1–16. https://doi.org/10.1007/s10661-019-7800-x.
Zounemat-Kermani, M., Alizamir, M., Fadaee, M., Sankaran Namboothiri, A., Shiri, J., 2020. Online sequential extreme learning machine in river water
quality (turbidity) prediction: a comparative study on different data mining approaches. Water Environ. J. https://doi.org/10.1111/WEJ.12630.
Chapter 3
Bayesian theory: Methods and applications
Yaser Sabzevari (a) and Saeid Eslamian (a,b)
(a) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (b) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Bayesian law expresses the relationship between dependent variables. The Bayesian relation uses a numerical estimate of
the probabilistic knowledge of the hypothesis before the observations occur, and provides a numerical estimate of the probabilistic knowledge of the hypothesis after the observations. This law for classifying phenomena is based on the probability
of occurrence or nonoccurrence of a phenomenon and is important and widely used in probability theory. If we can choose a partition of the sample space such that knowing which of the partitioned events occurred removes an important part of the uncertainty, the law becomes particularly useful (Alinezhad et al., 2020): it can be used to calculate the probability of an event conditional on the occurrence or nonoccurrence of another event. In many cases, it is difficult to calculate the probability of an incident directly; by conditioning one event on another, the probability can be calculated.
Bayesian theory has three main methods: the Bayes optimal classifier, the naive Bayes classifier, and the Bayesian network. In hydrological problems, the Bayesian network has been used most often. These networks are graphical models that represent a set of variables and their conditional dependencies by a directed acyclic graph (DAG). Bayesian network nodes represent variables that can be observed values, hidden variables, or unknown parameters. The edges of the network indicate dependencies. Each node has a probability function that contains the prior probability (for parentless nodes) or the conditional probabilities associated with the combinations of states of the parent nodes. The term Bayesian network was first coined by Judea Pearl in 1987 to emphasize the following three aspects:
1. The subjective and judgmental nature of the input information. In fact, many uncertain propositions do not involve a significant amount of historical data, and even with past historical information, judgmental information needs to be extracted in order to be able to measure uncertainty.
2. Reliance on conditional probabilities as the basis for updating information.
3. The distinction between causal states and evidential observations, with emphasis on the well-known law of Thomas Bayes (Bayes, 1763).
2. Bayesian inference
In statistical science, there are two doctrines, called the Frequentist doctrine and the Bayesian doctrine. In the Frequentist doctrine, only observations and the frequency of events are cited and problems are solved accordingly, while in the Bayesian doctrine, in addition to the observations, the information and initial beliefs of the researcher are also important and are considered in problem solving and drawing conclusions. Another difference between the two doctrines is that in the Bayesian doctrine the unknowns are random variables: unlike in the Frequentist doctrine, an unknown does not have a single fixed value but rather a probability function that assigns different probabilities to its possible values. For example, in the Frequentist doctrine a person is either sick or not, while in the Bayesian method a person can be 30% sick and 70% healthy.
In Bayesian inference, an initial estimate of the unknowns is required. This estimate is the researcher’s initial knowledge
or “Prior knowledge” which is expressed as a function of mathematical probability. Observations are then made and information about the unknowns is collected by the researcher, and using this new information, the initial probability function is
updated. By gathering more information and updating the probability functions corresponding to the unknowns, more
accurate probability distribution functions and better estimates can be obtained (Kouhestani et al., 2017).
Drayton (1978), in an introduction to the use of the Bayesian method in meta-analysis for humanities issues, argues that achieving general cause-and-effect relationships requires repeated experiments. Since such activities require
initial planning and coordination between the different researchers, and this coordination is almost impossible to implement, Drayton suggests that combined methods be used to achieve the goal in question.
3. Phases
In the Bayesian method, the three phases are as follows:
1. In the first stage, the researcher must express his or her belief about reality and pass it through the statistical filter of the expected mean, the expected variance, and the strength of belief in the initial trust. These three criteria can be based on previous experience, past research, or a combination of them. If past experience is expressed as a mean, a standard deviation, and a hypothetical sample size, there is nothing to prevent reference to past research.
2. The second stage is to collect the results of experiments or observations. This step can be done by summarizing statistics similar to those already predetermined.
3. The third stage is the combination of the likelihood and the initial belief and the formation of posterior information.
The posterior information can be newer and more informative than the original information. Combining this information with other research creates a new likelihood. These steps are repeated in the same way, leading to a new study with its own characteristics. As Drayton points out, sampling can continue as long as it covers the whole community or until the latest discrepancies are justified. This method is flexible in using different coefficients and mathematical transformations. Bayesian theory is sensitive to the sample size (n).
4. Estimates
The Bayesian method includes the classical estimators such as:
• Maximum a posteriori (MAP)
• Maximum likelihood (ML)
• Minimum mean square error (MMSE)
• Minimum average error size (MAVE), considered as a special case
The hidden Markov model, which is widely used in statistical signal processing, is an example of a Bayesian model. Bayesian inference is based on minimizing the Bayesian risk function, which is obtained using the mentioned models together with the observations and the value of the error function.
5. Bayes' theorem
Bayes' theorem provides a method for classifying phenomena based on the probability of occurrence or nonoccurrence of a phenomenon and is important and widely used in probability theory. If a partition of the hypothetical sample space can be chosen such that knowing which of the partitioned events occurred removes an important part of the uncertainty, the theorem becomes especially useful: it can be used to calculate the probability of an event conditional on the occurrence or nonoccurrence of another event. In many cases, it is difficult to calculate the probability of an incident directly; by conditioning one event on another, the probability can be calculated. This relationship is named in honor of the English philosopher Thomas Bayes and is known as Bayes' formula.
Main equation: Suppose \( B_1, \dots, B_k \) form a partition of the sample space S such that \( P(B_j) > 0 \) for every \( j = 1, \dots, k \), and suppose A is an event with \( P(A) > 0 \). Then, for \( i = 1, \dots, k \), we have:

\( P(B_i \mid A) = \dfrac{P(B_i)\, P(A \mid B_i)}{\sum_{j=1}^{k} P(B_j)\, P(A \mid B_j)} \)
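As a small numerical illustration of the main equation, the sketch below (plain Python) computes the posterior probabilities P(Bi | A) for a three-event partition; the prior and likelihood values are hypothetical.

```python
# Posterior over a partition B1..Bk given that event A occurred,
# using P(Bi|A) = P(Bi)P(A|Bi) / sum_j P(Bj)P(A|Bj).
prior = [0.5, 0.3, 0.2]          # P(Bi), hypothetical values
likelihood = [0.9, 0.5, 0.1]     # P(A|Bi), hypothetical values

evidence = sum(p * l for p, l in zip(prior, likelihood))     # P(A), law of total probability
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
print([round(p, 3) for p in posterior])                      # the posteriors sum to 1
```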
5.1 Argument of Bayes
By the definition of conditional probability, we have \( P(B_i \mid A) = \dfrac{P(B_i \cap A)}{P(A)} \). The numerator of this fraction equals \( P(B_i)\, P(A \mid B_i) \) by the multiplication rule for conditional probability, and the denominator equals \( P(A) \) by the law of total probability.
If A and B are two events, we can decompose event A as follows:

\( A = (A \cap B) \cup (A \cap B') \)

because a point in A must be either in both A and B, or in A and not in B. On the other hand, we know that \( A \cap B \) and \( A \cap B' \) are mutually exclusive, so we can write:

\( P(A) = P(AB) + P(AB') = P(A \mid B)P(B) + P(A \mid B')P(B') = P(A \mid B)P(B) + P(A \mid B')\left(1 - P(B)\right) \)
This relationship states that the probability of occurrence of event A is a weighted average of the conditional probability \( P(A \mid B) \) and the conditional probability \( P(A \mid B') \), where the weight given to each conditional probability is the probability of the condition on which A is conditioned. The above relation can be generalized as follows. Assume that the events \( B_1, B_2, \dots, B_n \) are pairwise mutually exclusive and that the following relation also holds between them:

\( S = \bigcup_{i=1}^{n} B_i \)

From this statement, it can be inferred that one of the events \( B_1, B_2, \dots, B_n \) must have occurred. On the other hand, we know that the events \( A \cap B_i \) are pairwise mutually exclusive for \( i = 1, \dots, n \), and we can write:

\( A = \bigcup_{i=1}^{n} (A \cap B_i) \)
Here we can write:

\( P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i) \)
This relation describes how \( P(A) \) can be calculated by conditioning on the events \( B_1, B_2, \dots, B_n \); in general, it states that \( P(A) \) equals a weighted average of the \( P(A \mid B_i) \), each weight being the probability of the event on which A is conditioned. Suppose now that event A has occurred and we want to calculate the probability that one of the events \( B_i \) happened:

\( P(B_i \mid A) = \dfrac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\, P(B_j)} \)
5.2 Bayesian estimation theory
Estimation theory is concerned with determining the best estimate of uncertain parameters by observing related signals, or with the recovery of a signal combined with noise. For example, given a noisy sinusoidal signal, there may be interest in obtaining its basic parameters (amplitude, frequency, phase, etc.) or in recovering the signal itself. The estimator takes a set of noisy observations as input and obtains estimates of the unspecified parameters using dynamic or statistical models. The accuracy of the estimate depends on the available data and the efficiency of the estimator.
The Bayesian model uses the data from the observed signal together with the accumulated prior probabilities of the process. Suppose we want to estimate the random variable \( \theta \) on the basis of the observed random variable y. According to Bayes' law, the density function of \( \theta \) given y is:

\( f_{\theta \mid y}(\theta \mid y) = \dfrac{f_{y \mid \theta}(y \mid \theta)\, f_{\theta}(\theta)}{f_{y}(y)} \)

where \( f_y(y) \) is a constant for a given observation and has only a scaling effect. There are two other density functions in the Bayesian formula: one is \( f_{y \mid \theta}(y \mid \theta) \), the probability of observing y provided that \( \theta \) occurs, and the other is \( f_{\theta}(\theta) \), the prior probability density function of \( \theta \).
The effect of the density functions \( f_{y \mid \theta}(y \mid \theta) \) and \( f_{\theta}(\theta) \) on \( f_{\theta \mid y}(\theta \mid y) \) depends on the form of the functions: the more peaked the function, the greater its effect, and if the function is constant, it has no effect (Sean, 2004).
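A minimal grid-based sketch of this updating (Python/NumPy) is shown below: a Gaussian prior on θ is multiplied by the likelihood of a single noisy observation and then normalized; the prior spread, noise level, and observed value are assumptions made only for the illustration.

```python
import numpy as np

# Grid approximation of the posterior f(theta|y) ∝ f(y|theta) f(theta) for a
# noisy scalar observation y = theta + noise, with a Gaussian prior on theta.
theta = np.linspace(-5, 5, 1001)                      # candidate parameter values
prior = np.exp(-0.5 * (theta / 2.0) ** 2)             # prior belief theta ~ N(0, 2^2), unnormalized
y_obs, sigma = 1.3, 0.5                               # one observation and its noise level (assumed)
likelihood = np.exp(-0.5 * ((y_obs - theta) / sigma) ** 2)
posterior = prior * likelihood
posterior /= posterior.sum()                          # normalize over the grid
print(theta[np.argmax(posterior)])                    # MAP estimate of theta
```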
5.3 Machine learning using Bayesian method
For a Bayesian approach to machine learning (or any other process), one must first:
• Formulate existing knowledge about the subject in a probabilistic way: to do this, we must model the qualitative knowledge in the form of probability distributions, independence hypotheses, and so on. This model will have unknown parameters; for each of the unknown values, a prior probability distribution is considered, which reflects our belief in the probability of each of these values before seeing the data.
• By collecting data and observing them, calculate the posterior (secondary) probability distribution.
• Using this posterior probability, come to a conclusion about the uncertainty.
• Make predictions by averaging the posterior probability values.
• Make decisions that reduce the expected posterior error.
5.4 Bayesian theory in machine learning
In machine learning, we usually look for the best hypothesis in the hypothesis space H that fits the training data D. One way to determine the best hypothesis is to look for the most probable hypothesis given the training data D and the prior probabilities of the different hypotheses, and one might expect Bayesian theory to provide such a solution.
The basis of Bayesian learning is Bayes' theorem, which makes it possible to calculate the posterior probability from the prior probabilities:
\( P(h \mid D) = \dfrac{P(D \mid h)\, P(h)}{P(D)} \)
As can be seen, \( P(h \mid D) \) decreases as \( P(D) \) increases, because the higher the probability of observing D independently of h, the less evidence D provides in support of h.
5.5 Definition of basic concepts
Assume that the hypothesis space H and the set of training examples D exist. We define the following probability values:
P(h): prior probability of hypothesis h, before viewing the training data D. If no such prior is available, all hypotheses can be given the same probability.
P(D): probability of viewing the training data D.
P(D | h): likelihood, i.e., the probability of viewing the training data D assuming that hypothesis h is true.
P(h | D): posterior probability of hypothesis h given that the training data D have been observed.
Note that the prior probability P(h) is independent of the training data, whereas the posterior probability P(h | D) reflects the effect of the training data.
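Using these definitions, the short sketch below (plain Python) computes P(h | D) for a toy hypothesis space and picks the most probable hypothesis; the prior and likelihood numbers are hypothetical.

```python
# Posterior probability of hypotheses given training data D,
# P(h|D) = P(D|h)P(h)/P(D), for a toy hypothesis space of three hypotheses.
prior = {"h1": 0.6, "h2": 0.3, "h3": 0.1}          # P(h), hypothetical values
likelihood = {"h1": 0.02, "h2": 0.10, "h3": 0.20}  # P(D|h), hypothetical values

p_data = sum(prior[h] * likelihood[h] for h in prior)               # P(D)
posterior = {h: prior[h] * likelihood[h] / p_data for h in prior}   # P(h|D)
map_hypothesis = max(posterior, key=posterior.get)                  # most probable hypothesis
print(posterior, map_hypothesis)
```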
5.6 Bayesian machine learning methods
Bayesian methods offer hypotheses that are able to predict probability. A number of Bayesian machine learning methods include the following:
• Bayes optimal classifier
• Naive Bayes classifier
• Bayesian networks
5.7 Optimal Bayes classifier
5.7.1 Background and theory
Consider now that, instead of a continuous output variable Y, we have a categorical output variable G. This model is summarized as:
• Input: \( X \in \mathbb{R}^p \) comes from a p-dimensional space.
• Output: a classification \( G \in \mathcal{G} \), where G is a random variable corresponding to the discrete output value and \( \mathcal{G} \) is the discrete output space.
• Joint distribution on the input and output: \( \Pr(X, G) = [(x_1, g_1), (x_2, g_2), \dots, (x_m, g_m)] \).
• Goal: learn a function \( f(x): \mathbb{R}^p \rightarrow \mathcal{G} \) which takes inputs from the p-dimensional input space and maps them to the discrete output space.
A first step is to decide on an appropriate loss function, as the usual "squared loss" is not appropriate for discrete outputs. Instead we will use the simple "0-1 loss" function, which is defined as follows.
Define the loss as a \( K \times K \) matrix, where \( K = \mathrm{card}(\mathcal{G}) \); the matrix has 0 on the diagonal and nonnegative values elsewhere. The loss \( L(k, l) \) is the (k, l) entry of the matrix and is the cost of classifying k as l. For example, in the case of three classes, we could get

\( L = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \)

which means we can write the 0-1 loss function as:

\( L(k, l) = \begin{cases} 0 & \text{if } k = l \\ 1 & \text{if } k \ne l \end{cases}, \qquad L(k, l) = \tau(k \ne l) = 1 - \tau(k = l) \)
The Expected Predicted Error (EPE) is therefore:

\( \mathrm{EPE}\left[ \hat{f}(x) \right] = E\left[ L\left( G, \hat{f}(X) \right) \right] \)

where the expectation is taken with respect to the joint distribution Pr(X, G). Again we can condition on X to obtain

\( \mathrm{EPE}\left[ \hat{f}(x) \right] = E_X \left[ E_{G \mid X} \left[ L\left( G, \hat{f}(X) \right) \mid X \right] \right] = E_X \left[ \sum_{k=1}^{K} L\left( k, \hat{f}(X) \right) \Pr(k \mid X) \right] \)

where \( k = 1, \dots, K \) are all the possible values that the random variable G can take, i.e., the set \( \mathcal{G} \). Note that this is the discrete version, which is analogous to the derivations discussed in the previous section.
As we want to minimize the expected loss, we can do the following:

\( \hat{f}(X) = \arg\min_{g} \sum_{k=1}^{K} L(k, g)\, \Pr(k \mid X) = \arg\min_{g} \sum_{k=1}^{K} \left( 1 - \tau(k = g) \right) \Pr(k \mid X) = \arg\max_{g} \sum_{k=1}^{K} \tau(k = g)\, \Pr(k \mid X) \)

Since the indicator function is 1 when \( k = g \), we get

\( \hat{f}(X) = \arg\max_{g} \Pr(g \mid X) = \mathrm{MAP} \)
In other words, the optimal Bayes decision rule is to choose the class presenting the maximum posterior probability, given the particular observation at hand. Classifiers such as these are called Bayes optimal classifiers or maximum a posteriori (MAP) classifiers.
Since, for a given observation x, the marginal distribution p(x) is constant in the denominator of Bayes' theorem, we can simplify this decision rule further as:

\( \hat{f}(X) = \arg\max_{g} \Pr(g \mid X = x) = \arg\max_{g} \dfrac{\Pr(x \mid g)\, p(g)}{p(x)} = \arg\max_{g} \Pr(x \mid g)\, p(g) = \arg\max_{g} \left[ \log \Pr(x \mid g) + \log p(g) \right] \)

This form makes clear that the MAP decision rule tries to reach a compromise between the a priori expectations p(g) and the evidence provided by the data via the likelihood function p(x | g).
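A minimal sketch of this MAP decision rule is given below (Python/NumPy), assuming, purely for illustration, two classes with Gaussian class-conditional densities on a single feature; the priors and density parameters are hypothetical.

```python
import numpy as np

def bayes_optimal_class(x, priors, class_conditional_pdfs):
    """MAP decision rule: pick argmax_g [log p(x|g) + log p(g)] over the classes."""
    scores = {g: np.log(pdf(x)) + np.log(priors[g])
              for g, pdf in class_conditional_pdfs.items()}
    return max(scores, key=scores.get)

def gaussian_pdf(mu, sigma):
    """Return a one-dimensional Gaussian density function with mean mu and std sigma."""
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = {"g1": 0.7, "g2": 0.3}
pdfs = {"g1": gaussian_pdf(0.0, 1.0), "g2": gaussian_pdf(3.0, 1.0)}
print(bayes_optimal_class(0.5, priors, pdfs), bayes_optimal_class(2.8, priors, pdfs))
```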
The optimal Bayes classifier chooses the class that has the greatest a posteriori probability of occurrence (so-called maximum a posteriori estimation, or MAP). It can be shown that, of all classifiers, the optimal Bayes classifier is the one that has the lowest probability of misclassifying an observation, i.e., the lowest probability of error. So if we know the posterior distribution, then using the Bayes classifier is as good as it gets.
In real life, we usually do not know the posterior distribution; rather, we estimate it. The naive Bayes classifier approximates the optimal Bayes classifier by looking at the empirical distribution and by assuming independence of the predictors. So the naive Bayes classifier is not itself optimal, but it approximates the optimal solution.
5.8 Naive Bayes classifier
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on Bayes' theorem under the assumption of independence between the predictor variables. The Bayesian method is simply a method of classifying phenomena based on the probability of occurrence or nonoccurrence of a phenomenon. This method is one of the simplest forecasting algorithms and also has acceptable accuracy (Sean, 2004). Its accuracy can be significantly increased by using kernel density estimation. The learning method in the naive Bayesian approach is supervised learning (Sean, 2004). This method was developed among information retrieval scientists decades ago and is still one of the most popular methods in document classification.
A naive Bayesian classifier assumes the independence of the prediction variables; hence, it is called naive (simple) Bayesian (Sean, 2004). There are many software applications that estimate the parameters of naive Bayes, so people can take advantage of this method to solve problems without needing Bayesian theory. Despite its simplifying design and assumptions, the naive Bayesian method is suitable for classifying most problems in the real world.
Probabilistic modeling:
Suppose we have n variables, that is, \( x = (x_1, \dots, x_n) \), and y is the output belonging to one of k classes. The purpose of modeling is to find the conditional probability of each of these k categories, that is, \( p(C_k \mid x_1, \dots, x_n) \). According to Bayes' law, this probability is (Jensen, 2001):

\( p(C_k \mid x) = \dfrac{p(C_k, x)}{p(x)} \propto p(C_k, x) \)

In other words, the conditional probability \( p(C_k \mid x) \) is proportional to the joint distribution of x and \( C_k \). According to the chain rule, this joint distribution equals:

\( p(C_k, x_1, \dots, x_n) = p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k) \)
Now, if we assume that each variable is independent of the other variables given the category \( C_k \), i.e., \( p(x_i \mid x_{i+1}, \dots, x_n, C_k) = p(x_i \mid C_k) \), then we get the following result:

\( p(C_k \mid x_1, \dots, x_n) \propto p(C_k, x_1, \dots, x_n) = p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \)

By normalizing the previous expression, the conditional probability distribution can be found, where in the equation below \( Z = p(x) = \sum_{k} p(C_k)\, p(x \mid C_k) \) is the normalization coefficient:

\( p(C_k \mid x_1, \dots, x_n) = \dfrac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \)
If the goal is to find the most probable category, the normalization coefficient Z is not needed:

\( \hat{y}(x) = \arg\max_{k \in \{1, \dots, K\}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \)
Estimation of parameters:
To build a naive Bayesian classifier, we need to estimate p(Ck) and p(xi | Ck) for all k. p(Ck) is simply obtained by calculating the percentage of the data belonging to class Ck. There are several ways to obtain p(xi | Ck); estimating multinomial distributions or normal distributions are common ways to do this (Chin et al., 2009).
In the normal-distribution estimation method, we estimate p(xi | Ck) with a normal distribution with mean μi,k and variance σ²i,k, and obtain μi,k and σi,k from the training data:

\[ p(x_i = u \mid C_k) = \frac{1}{\sqrt{2\pi \sigma_{i,k}^{2}}} \exp\!\left( -\frac{(u - \mu_{i,k})^{2}}{2 \sigma_{i,k}^{2}} \right) \]

If xi is discrete, the distribution p(xi = u | Ck) can be estimated by a multinomial distribution.
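To make the parameter estimation above concrete, the following is a minimal sketch (not from the chapter) of a Gaussian naive Bayes classifier: class priors p(Ck) are taken as class frequencies, p(xi | Ck) is a normal density with a per-class mean and variance, and prediction uses the argmax rule without the normalization coefficient Z. The toy data and class labels are hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: estimates p(C_k) and normal p(x_i | C_k)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])            # p(C_k)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])       # mu_{i,k}
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])  # sigma^2_{i,k}
        return self

    def predict(self, X):
        # log p(C_k) + sum_i log p(x_i | C_k); the normalizer Z is not needed for argmax
        log_post = []
        for prior, m, v in zip(self.priors_, self.means_, self.vars_):
            log_lik = -0.5 * (np.log(2 * np.pi * v) + (X - m) ** 2 / v).sum(axis=1)
            log_post.append(np.log(prior) + log_lik)
        return self.classes_[np.argmax(np.column_stack(log_post), axis=1)]

# Hypothetical toy data: two predictors, two classes
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.2], [3.1, 3.8]])
y = np.array([0, 0, 1, 1])
print(GaussianNaiveBayes().fit(X, y).predict(np.array([[1.1, 2.0], [2.9, 4.0]])))
```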
Advantages and disadvantages:
Research in 2004 provided theoretical reasons for the seemingly surprising effectiveness of the naive Bayes classifier, and in 2006 comprehensive comparisons were made with other classification methods such as boosted trees and random forests.
The advantages of this method include the following:
• Categorizing test data is easy and fast. It also performs well when the number of categories is more than two.
• As long as the condition of independence is met, a simple Bayesian classifier performs better than other models such as logistic regression and requires less training data.
• When the inputs are categorical, this method works better than when the inputs are numerical. For numerical inputs, it is usually assumed that they follow the normal distribution.
In addition to its advantages, this classifier also has disadvantages, including:
• If the input is categorical and the learning phase contains categories from which the classifier has seen no data, the estimated probability will be zero for that category and the classifier will not be able to categorize. To solve this problem, smoothing techniques such as the Laplace estimator can be used.
• Another disadvantage of this classifier is that it is almost impossible to achieve the condition of independence in the real world.
Applications:
Some of the uses of this classifier are as follows:
• Text categorization: Naive Bayesian classifiers are commonly used in text categorization and have a higher success rate than other methods.
• Spam filtering: One of the most popular uses of this classifier is spam filtering, in which a naive Bayesian classifier is used to identify spam e-mails. Many e-mail servers today use Bayesian spam filtering, and the method is also used in spam filtering software. Server-side filters such as Bogofilter, SpamBayes, SpamAssassin, DSPAM, and ASSP use Bayesian spam filtering techniques.
• Recommender systems: A naive Bayesian classifier combined with collaborative filtering forms a recommender system that uses machine learning and data mining techniques to filter out unseen information and predict a user's opinion on various items.
• Sentiment analysis: This classifier is used to analyze the sentiment of various texts and opinions (e.g., on social networks).
6. Bayesian network
This method is based on the calculation of conditional probabilities, i.e., Bayes' law.
A Bayesian network consists of a number of nodes that represent random variables interacting with each other. This interaction is represented by the connections between nodes (Cain, 2001). Fig. 1 shows the nodes and the relationships between them.
Definitions and concepts:
There are several definitions for Bayesian networks. Let G = (V, E) be a directed acyclic graph and let X = (Xv), v ∈ V, be a set of random variables indexed by V. X is a Bayesian network relative to G if its joint probability density function can be written as a product of the individual density functions, each conditioned on its parent variables (Russell Stuart and Norvig, 2003).
FIG. 1 The nodes (X1, X2, Y1, Y2, Y3) and the relationship between them.
\[ P(X) = \prod_{v \in V} P\!\left( X_v \mid X_{\mathrm{pa}(v)} \right) \]

where pa(v) is the set of parents of v (the nodes with an edge directed into v). For any set of random variables, the probability of each member of the joint distribution can be calculated from conditional probabilities using the chain rule as follows (Russell Stuart and Norvig, 2003):

\[ P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{v=1}^{n} P\!\left( X_v = x_v \mid X_{v+1} = x_{v+1}, \ldots, X_n = x_n \right) \]

By comparing this relationship with the definition above, we will have:

\[ P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{v=1}^{n} P\!\left( X_v = x_v \mid X_j = x_j \ \text{for each } X_j \text{ that is a parent of } X_v \right) \]

The difference between the two expressions is the conditional independence of the variables from their nondescendant nodes, given the values of their parent variables.
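As an illustration of the factorization P(X) = ∏ P(Xv | X_pa(v)), the following is a minimal sketch (not from the chapter) that evaluates the joint probability of a small, hand-specified discrete Bayesian network; the network structure and probability values are hypothetical.

```python
# Minimal sketch (hypothetical example): joint probability of a discrete Bayesian
# network as the product of parent-conditioned factors, P(X) = prod_v P(X_v | pa(v)).

# Each node maps to (list of parents, conditional probability table).
# CPT keys are tuples of parent values; values give P(node = 1 | parents).
network = {
    "Rain":      ([], {(): 0.2}),
    "Sprinkler": (["Rain"], {(0,): 0.4, (1,): 0.01}),
    "WetGrass":  (["Rain", "Sprinkler"], {(0, 0): 0.0, (0, 1): 0.9,
                                          (1, 0): 0.8, (1, 1): 0.99}),
}

def joint_probability(assignment):
    """P(X = assignment) for a full assignment of all binary nodes."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        parent_values = tuple(assignment[name] for name in parents)
        p1 = cpt[parent_values]                         # P(node = 1 | parents)
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

print(joint_probability({"Rain": 1, "Sprinkler": 0, "WetGrass": 1}))  # 0.2 * 0.99 * 0.8
```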
Creating Bayesian networks:
Creating a Bayesian network requires three steps:
• Identify important variables and their possible scenarios
• Identify the relationship between variables and express it in a graphical structure
• Evaluation of initial and conditional probabilities
It should be noted that creating a Bayesian network is a creative process in which the above steps are repeated until the desired network is reached.
The first step (identifying variables) is not always easy. Hecherman et al. have proposed the following to identify variables (Hecherman et al., 1995):
✓ Accurate identification of the modeling goals. For example, the purpose of modeling can be predictive, descriptive, or exploratory.
✓ Identify possible observations that may be related to the problem.
✓ Identify valuable subsets of these observations for the model, given the complexity of the network.
✓ Organize the observations into variables whose states are mutually exclusive.
Jensen (2001) has proposed three types of variables for the development of the Bayesian network (Jensen, 2001):
(a) Hypothesis variables: These variables are not observable (or are observed at an unjustifiable cost). Identifying these
variables is the first step in building Bayesian networks.
(b) Information variables: These types of variables are observable and provide information about hypothesis variables.
(c) Modeling variables: These variables are used for specific purposes and modeling, such as simplifying conditional
probability tables.
In the process of building a Bayesian network, variables (nodes) can be easily added or modified. The graphical structure of
this network allows variables to be added or removed without any noticeable effect on the rest of the network.
After defining the variables, the next step is to determine the graphical structure of the network. This requires identifying possible dependencies between variables and representing them as directed edges. The direction of these edges must be carefully defined. This can increase the complexity of the model; however, the modeling process must be continuously reviewed and modified in terms of dependency relationships.
The last step in building a Bayesian network is to evaluate the probability values and enter them in the node probability tables (NPTs). The NPT expresses the conditional dependencies of the related variables. Depending on the type of a variable (discrete or continuous), the NPT can be a discrete probability table or a continuous probability distribution. In parentless nodes, the NPT holds the initial (prior) probability, which can be estimated subjectively or from previous data. In a node with parents, the probability of each state of the node is evaluated under the condition of each state of its parents; therefore, the NPT of such nodes contains probability values for all possible combinations of their parent states.
In addition to the initial probabilities, the conditional probability values are also extracted from both past data sources and expert opinion. Extracting these probabilities is a difficult and time-consuming process. In many applications, the required information is limited or inaccessible. Therefore, the knowledge and experience of experts in these fields is the main source of probability data (Khodakarami et al., 2007).
Bayesian network structural learning algorithms are divided into two categories: constraint-based learning algorithms and score-based learning algorithms. The first category derives the structure from statistical tests of conditional independence and dependence between the variables (e.g., the PC and NPC algorithms). Score-based learning methods evaluate all possible relationships between nodes and select the structure with the highest score as the desired structure (Sadeghi Hesar et al., 2012). Due to their simplicity, the PC and NPC algorithms are the most widely used for training the structure of Bayesian networks.
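Constraint-based structural learning rests on statistical tests of conditional independence. As a minimal sketch of one such test (not the full PC or NPC algorithm), the following uses a Fisher z-test of partial correlation, assuming roughly Gaussian variables; the variable names and data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def fisher_z_independent(data, i, j, cond=(), alpha=0.05):
    """Test X_i independent of X_j given X_cond via partial correlation (Fisher z)."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)        # correlation matrix of the subset
    prec = np.linalg.inv(corr)                             # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])     # partial correlation of i, j | cond
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r))                    # Fisher z-transform
    stat = np.sqrt(n - len(cond) - 3) * abs(z)
    p_value = 2 * (1 - norm.cdf(stat))
    return p_value > alpha, p_value                        # True -> cannot reject independence

# Hypothetical data: x2 depends on x0, x1 is pure noise
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = 0.8 * x0 + 0.2 * rng.normal(size=500)
data = np.column_stack([x0, x1, x2])
print(fisher_z_independent(data, 0, 1))   # likely judged independent
print(fisher_z_independent(data, 0, 2))   # dependent
```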
Applications and benefits of Bayesian networks:
Bayesian networks provide a robust, comprehensive, and flexible approach to modeling risk and uncertainty. Today, the
benefits of Bayesian networks are well understood and used in a variety of areas. In recent years, researchers have
developed programs for the easy implementation of these networks that have made it possible to develop decision support
systems in a variety of applications (Fenton and Neil Martin, 2007).
In recent years, Bayesian network models for quantitative analysis of project scheduling risk (Khodakarami et al., 2007)
and new product development (Chin et al., 2009), environmental modeling (Aguilera et al., 2011), DNA analysis in legal
issues (Biedermann and Taroni, 2011), real-time flood event prediction (Biondi and De Luca, 2012), and runoff estimation
(Sadeghi Hesar et al., 2012) have expanded.
7. History of Bayesian model application in water resources
Niko and Karachian (2008), in a study using a pollutant trading-ratio system and a Bayesian network, and taking into account the one-way direction of river flow, prepared a water quality model of a river. In this study, the results of the trading-ratio system were used to train the Bayesian network. Because of the uncertainties in the river system, a new model combining Monte Carlo uncertainty analysis, the trading-ratio method, and a Bayesian network was proposed for pollutant discharge permits; in addition to providing a trading model for pollution discharge permits, it enables real-time water quality management of the river. The results showed that this model is an effective tool in the quality management of the river system.
Ghorbani and Dehghani (2016) investigated the applicability of a Bayesian network model, an artificial neural network, and gene expression programming to analyze the amount of dissolved solids in the Balkhuchai River located in Ardabil province. Quality variables including bicarbonate, chloride, sulfate, calcium, magnesium, sodium, and flow rate at the monthly time scale during the statistical period (1976–2006) were used as inputs of the Bayesian network model, and its results were compared with those of the artificial neural network and gene expression programming models. The results showed that although all three models were able to estimate the amount of dissolved solids in the water with acceptable accuracy, the Bayesian network model, with the highest correlation coefficient (0.966), the lowest root mean square error (0.094 mg/L), and the best stochasticity criterion (0.988) in the validation stage, was given priority.
Varis and Keskinen (2006) used a Bayesian network in multiobjective optimization research and explained its application in water resources and environmental management. Finally, they demonstrated the efficiency of their model with an example in the field of economic management of river water quality.
Borsuk et al. (2001) studied a river monitoring and quality management program using a Bayesian network. The program addressed the water quality problems (reduction of dissolved oxygen and increased growth of algae and toxic microorganisms) of a river in the state of Carolina in the United States. The river's qualitative parameters were divided into three categories: parameters related to water quality, biological quality, and water quality suitable for human health. Then the appropriate Bayesian network was formed. The results showed that the Bayesian network model can predict the changes that occur in the properties of the ecosystem in relation to the adopted policy.
Sun et al. (2019) improved the simulation of evapotranspiration by combining a Bayesian averaging model with surface energy balance models in China. In that study, four surface energy balance models (SEBAL, SSEB, S-SEBI, and SEBS) and the Bayesian averaging model were examined using Landsat 8 data in two arid/semiarid regions of China. The results showed that the Bayesian averaging model, with R² = 0.75, RMSE = 0.902 mm/day, and Nash coefficient = 0.746 for the high station, and R² = 0.796, RMSE = 0.602 mm/day, and Nash coefficient = 0.793 for the Sidaokiao station, predicted evapotranspiration better than the four surface energy balance models in both climates.
8. Case study of Bayesian network application in modeling of evapotranspiration
of reference plant
In a study conducted in the semiarid climate of Khorramabad in western Iran to evaluate artificial intelligence models for estimating reference crop evapotranspiration, the ability of the Bayesian network model to estimate reference evapotranspiration at monthly and daily time scales was examined. In this study, six input patterns were defined for the modeling; the PC learning algorithm (at a significance level of 5%) was used for training the network structure, and the parametric training of the network was carried out by setting two variables, the significance level and the maximum neighborhood size, according to the effect of the parameters on each other.
Table 1 shows the results of Bayesian network modeling in estimating monthly reference evapotranspiration. As can be seen, the hybrid structures show better performance: hybrid structure No. 5, with a high coefficient of determination (R² = 0.97), the lowest root mean square error in the training phase (RMSE = 1.09 mm/day), and the lowest root mean square error in the test phase (RMSE = 0.93 mm/day), performed better than the other structures and was able to simulate the reference evapotranspiration of the study area with appropriate accuracy.
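For reference, the two goodness-of-fit measures reported in Table 1 can be computed as in the following minimal sketch (not from the study); R² is evaluated here as one common variant (one minus the residual-to-total sum-of-squares ratio; the squared Pearson correlation is another), and the series values are hypothetical.

```python
import numpy as np

def r_squared(obs, sim):
    """Coefficient of determination between observed and simulated series."""
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(obs, sim):
    """Root mean square error, in the units of the series (e.g., mm/day)."""
    return np.sqrt(np.mean((obs - sim) ** 2))

# Hypothetical monthly reference evapotranspiration values (mm/day)
obs = np.array([1.2, 2.5, 4.1, 6.3, 7.8, 6.9, 5.2, 3.0])
sim = np.array([1.4, 2.3, 4.4, 6.0, 7.5, 7.2, 5.0, 3.3])
print(f"R2 = {r_squared(obs, sim):.3f}, RMSE = {rmse(obs, sim):.3f} mm/day")
```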
Fig. 2 shows the best Bayesian network structure. The main purpose of this method is to find the relationship between
reference evapotranspiration and parameters affecting it.
Fig. 3 shows the changes in the observed and computed values over time in the training phase. Based on this diagram, the model's performance at the maximum values is almost unsatisfactory: the Bayesian network model has performed poorly in estimating the maximum values and has estimated them below the actual values, which is evident throughout the figure. This can be explained by the fact that the Bayesian network determines the relationships between the variables based on conditional independence and dependence between variables and on probabilities, and the network is not well generalized. Fig. 3 also shows the model diagrams in the test phase.
The results of this study showed that the Bayesian network was weak in estimating the maximum values, which can be explained by the fact that the Bayesian network determines the relationships between the variables based on conditional independence and dependence between variables and on probabilities, and that the network was not well generalized.
TABLE 1 Bayesian network results in reference evapotranspiration modeling of Khorramabad station.

Row   Pattern   Training R²   Training RMSE   Testing R²   Testing RMSE
1     M1        0.93          2.83            0.93         2.85
2     M2        0.95          4.94            0.96         5.03
3     M3        0.96          5.59            0.96         5.72
4     M4        0.96          4.26            0.96         4.39
5     M5        0.97          1.09            0.97         0.93
6     M6        0.97          2.87            0.97         2.76
FIG. 2 Bayesian network structure used for simulation.
FIG. 3 Graph of computational and observational values (ET0, mm/day) relative to time (month).
9. Conclusions
To examine the relationship between dependent variables, Bayes' law is used. The Bayesian relation uses a numerical
estimate of the probabilistic knowledge of the hypothesis before the observations occur and provides a numerical estimate
of the probabilistic knowledge of the hypothesis after the observations. This law for classifying phenomena is based on the
probability of occurrence or nonoccurrence of a phenomenon and is important and widely used in probability theory. Due to
the application of Bayesian theory in probabilities and uncertainty problems, this method can be used in various problems
such as hydrological problems. The Bayesian network, due to its nonlinear mathematical structure, has the ability to
describe the complex nonlinear processes that occur between the input and output of any system. The Bayesian network
also provides the explicit solutions based on which the relationship between the input and output variables can be
determined.
References
Aguilera, P.A., Fernandez, A., Fernandez, R., Rumi, R., Salmeron, A., 2011. Bayesian networks in environmental modeling. Environ. Model Softw. 26
(12), 1376–1388.
Alinezhad, A., Gohari, A.R., Eslamian, S., Baghbani, R., 2020. Uncertainty analysis in climate change projection using Bayesian approach. In: World
Environmental and Water Resources Congress (ASCE), Henderson, Nevada, USA, May 17–21.
Bayes, T., 1763. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, FRS Communicated by Mr. Price, in a letter to
John Canton, AMFR S. Philos. Trans. Royal Soc. Lond. (53), 370–418.
Biedermann, A., Taroni, F., 2011. Bayesian networks for evaluating forensic DNA profiling evidence: a review and guide to literature. Forensic Sci. Int.
Genet. 6 (2), 147–157.
Biondi, D., De Luca, D.L., 2012. A Bayesian approach for real-time flood forecasting. Phys. Chem. Earth 42 (44), 91–97.
68
Handbook of hydroinformatics
Borsuk, M.E., Higdon, D., Stow, C.A., Reckhow, K.H., 2001. A Bayesian hierarchical model to predict benthic oxygen demand from organic matter
loading in estuaries and coastal zones. Ecol. Model. 143 (3), 165–181.
Cain, J., 2001. Planning Improvement in Natural Resource Management: Guideline for Using Bayesian Networks to Support the Planning and Management of Development Program in the Water Sector and Beyond. Centre for Ecology and Hydrology (CEH), Wallingford, UK.
Chin, K.S., Tang, D.W., Yang, J.B., Wang, S.Y., Wang, H., 2009. Assessing new product development project risk by Bayesian network. Expert Syst.
Appl. 36, 9879–9890.
Drayton, E.L., 1978. The Effect of Father Absence Upon Social Adjustment of Male and Female Institutionalized Juvenile Delinquents. Fordham
University, USA.
Fenton, N., Neil Martin, E., 2007. Managing Risk in the Modern World: Applications of Bayesian Networks – A Knowledge Transfer Report from the
London Mathematical Society and the Knowledge Transfer Network for Industrial Mathematics. London Mathematical Society, London, England.
Ghorbani, M.A., Dehghani, R., 2016. Comparison of Bayesian neural network and artificial neural network methods in river suspended sediment estimation (case study: Simine Road). Environ. Sci. Technol. Q. 19 (2). https://civilica.com/doc/1288926.
Hecherman, D., Mamdani, A., Wellman, M., 1995. Real-world application of Bayesian networks. Commun. ACM 3, 25–26.
Jensen, F.V., 2001. Bayesian Networks and Decision Graphs. Springer-Verlag, New York, USA.
Khodakarami, V., Fenton, N., Neil, M., 2007. Project scheduling: improved approach to incorporate uncertainty using Bayesian networks. Proj. Manage.
J. 38, 30–49.
Kouhestani, S., Eslamian, S., Besalatpour, A., 2017. The Effect of Climate change on the Zayandeh-Rud River Basin’s temperature using a Bayesian
machine learning, Soft Computing Technique. J. Water Soil Sci. 21 (1), 203–216.
Niko, M., Karachian, R., 2008. The use of Bayesian networks in the non-deterministic model of river pollution permit trading. The First Conference on
Environmental Systems Management and Planning Engineering. Civilica, Tehran. https://civilica.com/doc/50951.
Russell Stuart, J., Norvig, P., 2003. Artificial Intelligence: A Modern Approach, second ed. Upper Saddle River, New Jersey, USA, Prentice Hall. ISBN 013-790395-2.
Sadeghi Hesar, A., Tabatabaee, H., Jalali, M., 2012. Monthly rainfall forecasting using Bayesian Belief Networks. Int. Res. J. Appl. Basic Sci. 3 (11), 2226–
2231.
Sean, R.E., 2004. What is Bayesian statistics? Nat. Biotechnol. 22, 1177–1178.
Sun, S., Zhang, G., Shi, J., Grosse, R., 2019. Functional variational Bayesian neural networks. arXiv. https://arxiv.org/abs/1903.05779.
Varis, O., Keskinen, M., 2006. Policy analysis for the Tonle Sap Lake, Cambodia: a Bayesian network model approach. Int. J. Water Resour. Dev. 22 (3),
417–431.
Chapter 4
CFD models
Hossien Riahi-Madvar (a), Mohammad Mehdi Riyahi (b), and Saeid Eslamian (c, d)
(a) Department of Water Engineering, Faculty of Agriculture, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran; (b) Department of Civil Engineering, Faculty of Civil Engineering and Architecture, Shahid Chamran University of Ahvaz, Ahvaz, Iran; (c) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (d) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
In this chapter, computational fluid dynamics (CFD), as an advanced technique in hydroinformatics modeling, is presented.
Some representative applications of CFD in hydroinformatics including the one-dimensional solution of the advection-diffusion equation in pollutant transport modeling, one-dimensional solution of Saint-Venant equations for dam-break
simulation, quasi-two-dimensional solution of velocity distribution in compound rivers, three-dimensional modeling of
turbulent flow, and finally pollutant transport in rivers are introduced and numerically solved. In this chapter, different
types of CFD models are developed and used in different fields of river engineering simulations. The physically influenced
scheme (PIS) is introduced for the one-dimensional dam-break modeling via the finite volume method. PIS is used for the one-dimensional solution of the advection-diffusion equation in pollutant transport modeling and the one-dimensional solution of fully
dynamic Saint-Venant equations in dam-break simulation. For solving the quasi-two-dimensional flow in natural rivers, the
Shiono and Knight Model (SKM) with finite difference method is numerically solved. In the case of three-dimensional
modeling, seven turbulence models are used to simulate the three-dimensional turbulent flow in open channels. Finally,
three-dimensional pollutant transfer in rivers is simulated by three different numerical models. At each section, the outputs
of numerical models are compared to the analytical or measured values to evaluate the results of the techniques. This
chapter briefly introduces and applies CFD techniques in hydroinformatic modeling.
Numerical solutions of the governing equations of river flow and fluid mechanics are among the principal methods for the prediction of the flow field in hydroinformatics studies, including sediment transport, pollutant transport, open channel
hydraulics, and river engineering (Tucciarelli, 2003; Riahi-Madvar et al., 2019). The three major methods for numerical
discretization of nonlinear partial differential equations (PDEs) of fluid flow equations are the finite difference (FD), finite
element (FE), and finite volume (FV) methods (Aldrighetti, 2007). The finite volume method is widely implemented in
computational fluid dynamics (CFD) computer codes and commercial software with various discretization schemes
(Darbandi et al., 2007; Darbandi and Bostandoost, 2005). Recently, a new scheme for face flux estimations in FV is proposed based on the physical influence of flow field in gas dynamics (Darbandi et al., 2007; Darbandi and Bostandoost,
2005) as well as in dam-break simulations (Bozkus and Eslamian, 2022). In this chapter, different methods of the CFD
applications in hydroinformatics in one-, two-, and three-dimensional domains are introduced and used for the river flow
and pollutant transport simulations.
2. Numerical model of one-dimensional advection dispersion equation (1D-ADE)
Suspended sediment transport and pollutant dispersion in rivers in a one-dimensional framework are modeled by the numerical
solution of ADE, which is expressed as follows (Kashefipour and Falconer, 2002; Wu, 2007):
\[ \frac{\partial (AC)}{\partial t} + \frac{\partial (AUC)}{\partial x} = \frac{\partial}{\partial x}\left( A D_x \frac{\partial C}{\partial x} \right) + S_T \tag{1} \]
where C is the pollutant concentration, U is velocity, A is the cross-section area, t is time, x is the direction of flow, ST is the
source term, and Dx is the longitudinal dispersion coefficient.
In this section, the focus is on the one-dimensional mass dispersion, so the flow field is not important and it is supposed
that the flow field is uniform or it is predefined. Using this hypothesis, Eq. (1) transforms to:
\[ \frac{\partial C}{\partial t} + \frac{\partial (UC)}{\partial x} = \frac{\partial}{\partial x}\left( D_x \frac{\partial C}{\partial x} \right) \tag{2} \]
A cell-centered numerical grid, as presented in Fig. 1, is used to discretize the equation. The grid has N points, N − 1 interfaces, and two boundary faces. The conserved variables are determined at the cell centers and represent the average value of the cells, while the fluxes are calculated at the cell interfaces. Integrating Eq. (2) over the ith cell of length Δx, and applying Green's theorem, after simplification the following discretized implicit equation can be obtained (Seifi and Riahi-Madvar, 2019):

\[ \left[ (C)_P^{\,n+1} - (C)_P^{\,n} \right] \Delta x + U \Delta t \left[ (C)_e^{\,n+1} - (C)_w^{\,n+1} \right] = K_x \Delta t \left[ \left( \frac{\partial C}{\partial x} \right)_e^{n+1} - \left( \frac{\partial C}{\partial x} \right)_w^{n+1} \right] \tag{3} \]
In Eq. (3), the diffusion terms are in pressure form and are discretized using the central difference method, consistent with the physics of the pressure field (Darbandi et al., 2007; Darbandi and Bostandoost, 2005). The convection terms U(C)_e^{n+1} and U(C)_w^{n+1} represent the interface fluxes at the w and e faces. The interface fluxes should be determined by a proper scheme (Patankar, 1980; Wu, 2007). In this section, a physically based scheme rather than numerical interpolation is developed for these face fluxes.
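As a point of comparison for the PIS results shown later in Figs. 2 and 3, the following is a minimal sketch (not the authors' code) of an explicit first-order upwind/central finite volume update of Eq. (2) on a uniform grid; the velocity, dispersion coefficient, and grid values are illustrative only.

```python
import numpy as np

def advect_disperse_upwind(c, u, kx, dx, dt, c_in):
    """One explicit FV step of dC/dt + d(UC)/dx = d/dx(Kx dC/dx):
    first-order upwind for convection, central difference for dispersion."""
    c_new = c.copy()
    for i in range(1, len(c) - 1):
        conv = -u * (c[i] - c[i - 1]) / dx                       # upwind (u > 0 assumed)
        diff = kx * (c[i + 1] - 2.0 * c[i] + c[i - 1]) / dx**2   # central difference
        c_new[i] = c[i] + dt * (conv + diff)
    c_new[0] = c_in              # known inlet concentration
    c_new[-1] = c_new[-2]        # zero-gradient outlet
    return c_new

# Illustrative setup: 1 km reach, 25 cells, u = 0.5 m/s, Kx = 5 m^2/s
nx, dx, dt = 25, 40.0, 0.1
c = np.zeros(nx)
for _ in range(2000):            # march 200 s
    c = advect_disperse_upwind(c, u=0.5, kx=5.0, dx=dx, dt=dt, c_in=1.0)
print(np.round(c, 3))
```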
3. Physically influenced scheme
In this section, the new methodology of a physically based scheme for the derivation of face fluxes in ADE is presented. As
mentioned previously, Darbandi and Bostandoost (2005) introduced the PIS for aerospace applications. In this section, the
PIS is used in suspended sediment and pollutant dispersion modeling. Eq. (2) can be rewritten as (Seifi and Riahi-Madvar, 2019):
\[ \frac{\partial C}{\partial t} + U \frac{\partial C}{\partial x} = K_x \frac{\partial^2 C}{\partial x^2} \tag{4} \]
The terms in this equation need to be discretized in a scheme that complements the physical nature of mass and pollutant
transport. In this view, the convection term is discretized by the upwind, the central difference discretization is used for the
diffusion term, and the backward discretization is used for the unsteady term, i.e. (Seifi and Riahi-Madvar, 2019)
\[ \frac{\partial C}{\partial t} = \frac{C_e^{\,n+1} - C_e^{\,n}}{\Delta t} \tag{5} \]

\[ U \frac{\partial C}{\partial x} = U_e^{\,n+1}\, \frac{C_e^{\,n+1} - C_P^{\,n+1}}{\Delta x / 2} \tag{6} \]

\[ K_x \frac{\partial^2 C}{\partial x^2} = K_x\, \frac{C_P^{\,n+1} - 2 C_e^{\,n+1} + C_E^{\,n+1}}{(\Delta x / 2)^2} \tag{7} \]
Using this discretization in Eq. (4) and rearranging it yields an equation for the interface fluxes. This new expression for the face flux can be written as (Seifi and Riahi-Madvar, 2019):

\[ C_e^{\,n+1} = C_1\, C_P^{\,n+1} + C_2\, C_E^{\,n+1} + C_3 \tag{8} \]

in which:

\[ C_1 = \frac{2 \Delta t\, U_e^{\,n+1} + \dfrac{4 K_x \Delta t}{\Delta x}}{\Delta x + 2 \Delta t\, U_e^{\,n+1} + \dfrac{8 K_x \Delta t}{\Delta x}}, \qquad C_2 = \frac{\dfrac{4 K_x \Delta t}{\Delta x}}{\Delta x + 2 \Delta t\, U_e^{\,n+1} + \dfrac{8 K_x \Delta t}{\Delta x}} \tag{9} \]

\[ C_3 = \frac{\Delta x\, C_e^{\,n}}{\Delta x + 2 \Delta t\, U_e^{\,n+1} + \dfrac{8 K_x \Delta t}{\Delta x}} \tag{10} \]

FIG. 1 The grid layout of the one-dimensional FV (cells 1, 2, 3, …, N − 2, N − 1, N with nodes W, P, E and face e).
As can be seen, in PIS, in accordance with the physics of the phenomenon and the governing equations, there is a robust connection between the nodal variables and the intercell variables. At the inlet boundary, the known values of concentration are used, C(1, t) = C_in,t. The zero-mass-gradient condition C(N − 1, t) = C(N, t) is used for the outlet boundary. Known concentrations are used for the initial conditions, C(i, 0) = C_in,i.
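A minimal sketch (not the authors' FORTRAN implementation) of evaluating the PIS face-flux coefficients of Eqs. (9) and (10) and the resulting face concentration of Eq. (8) is given below; the numerical values are hypothetical.

```python
def pis_face_coefficients(u_e, kx, dx, dt):
    """Coefficients of Eq. (8): C_e^{n+1} = C1*C_P^{n+1} + C2*C_E^{n+1} + C3,
    with C3 = c3_factor * C_e^n (old face concentration)."""
    denom = dx + 2.0 * dt * u_e + 8.0 * kx * dt / dx
    c1 = (2.0 * dt * u_e + 4.0 * kx * dt / dx) / denom
    c2 = (4.0 * kx * dt / dx) / denom
    c3_factor = dx / denom
    return c1, c2, c3_factor

def pis_face_concentration(c_p, c_e_node, c_e_old, u_e, kx, dx, dt):
    """Face concentration C_e^{n+1} from Eq. (8), given nodal values C_P and C_E."""
    c1, c2, c3_factor = pis_face_coefficients(u_e, kx, dx, dt)
    return c1 * c_p + c2 * c_e_node + c3_factor * c_e_old

# Hypothetical values: u = 0.5 m/s, Kx = 5 m^2/s, dx = 40 m, dt = 0.1 s
print(pis_face_concentration(c_p=1.0, c_e_node=0.2, c_e_old=0.5,
                             u_e=0.5, kx=5.0, dx=40.0, dt=0.1))
```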
The PIS model results are evaluated in comparison with the analytical results given in Graf (1998). A hypothetical trapezoidal channel with a length of 1 km and a simulation duration of 200 s, discretized with 25 computational cells and a 0.1 s time step, is used. The results are presented in Figs. 2 and 3 for different values of the Courant number (Cr = uΔt/Δx) and Peclet number (P = uΔx/D).
FIG. 2 The results of PIS and upwind versus the analytical solution at P = 320 and Cr = 0.0024 (C, ppt, versus X, m).
FIG. 3 The results of PIS and upwind versus the analytical solution at P = 852 and Cr = 0.0045 (C, ppt, versus X, m).
4. Finite volume solution of Saint-Venant equations for dam-break simulation using PIS
The Saint-Venant (SV) equations are used to simulate one-dimensional unsteady flow in rivers with irregular cross-section. The SV equations in conservative form can be written as the continuity equation:

\[ \frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0 \tag{11} \]

and the momentum equation:

\[ \frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left( \frac{Q^2}{A} \right) + \frac{\partial}{\partial x}\left( g A h_c \right) = g A \left( S_0 - S_f \right) \tag{12} \]
where A = cross-sectional area, V = average velocity, Q = discharge, g = gravity acceleration, h_c = vertical distance below the free surface to the centroid of the flow cross-sectional area, S_0 = bed slope, S_f = friction slope, n = Manning coefficient, R = hydraulic radius of the river cross-section, and P = wetted perimeter of the river cross-section:

\[ Q = A V, \qquad S_f = \frac{n^2 Q |Q|}{A^2 R^{4/3}}, \qquad R = \frac{A}{P} \tag{13} \]

For a rectangular channel, we have:

\[ P = B + 2\frac{A}{B}, \qquad h_c = \frac{A}{2B} \tag{14} \]
in which B = channel width. As shown in Fig. 1, a cell-centered grid with N points and N − 1 interface cells with two boundary conditions is used. Integrating Eqs. (11) and (12) over the ith volume and applying Green's theorem, after simplification, the following discretized implicit equations can be obtained. The continuity equation:

\[ \left( A_P^{\,n+1} - A_P^{\,n} \right) \Delta x + \left( Q_e^{\,n+1} - Q_w^{\,n+1} \right) \Delta t = 0 \tag{15} \]
and the momentum equation using the finite volume approach:

\[ \left( Q_P^{\,n+1} - Q_P^{\,n} \right) \Delta x + \Delta t \left[ \left( \overline{\frac{Q}{A}} \right)_e^{n+1} Q_e^{\,n+1} - \left( \overline{\frac{Q}{A}} \right)_w^{n+1} Q_w^{\,n+1} \right] + g \Delta t \left[ (A h_c)_e^{\,n+1} - (A h_c)_w^{\,n+1} \right] = g \Delta t \Delta x \left[ A\left( S_0 - S_f \right) \right]_p^{\,n+1} \tag{16} \]
where the subscript P refers to the center of the cell, e denotes the fluxes at the east face, and w denotes the fluxes at the west face of the cell; the superscript n refers to the values at the current time step and n + 1 to the values at the future time step. The overbar notation in the momentum equation shows the simple linearization of the nonlinear terms of the momentum equation from lagged iterations. In this section, new physically based expressions for the face flux estimations at the e and w faces are developed, and the full coupling of the discretized continuity and momentum equations in the collocated grid of Fig. 1 is assessed. The linear interface fluxes (Q_e^{n+1}, Q_w^{n+1}) in the discretized continuity equation, the interface fluxes in the discretized momentum equation, and the overbar interface terms (Q̄/A)_e^{n+1} and (Q̄/A)_w^{n+1} in the momentum equation are estimated by an expression derived from the discretized form of the momentum equation, which is named the convecting flux. Although there are several methods for linearization of the momentum terms and the face flux estimations, in this section the original simple linearization schemes are used. The expressions for the face flux components can be derived from the momentum equation of SV. In this regard, the momentum equation given in Eq. (12), using Q = AV, is expanded to:
\[ \frac{\partial Q}{\partial t} + V \frac{\partial Q}{\partial x} + Q \frac{\partial V}{\partial x} + g \frac{\partial}{\partial x}\left( A h_c \right) = S \tag{17} \]
The terms in this equation are discretized in accordance with the correct physics of the flow. To achieve this purpose, they are approximated as follows:

\[ \frac{\partial Q}{\partial t} = \frac{Q_e^{\,n+1} - Q_e^{\,n}}{\Delta t} \tag{18a} \]
\[ V \frac{\partial Q}{\partial x} = \frac{Q_e^{\,n+1}}{A_e^{\,n+1}}\, \frac{Q_e^{\,n+1} - Q_p^{\,n+1}}{\Delta x / 2} \tag{18b} \]

\[ Q \frac{\partial V}{\partial x} = Q_e^{\,n+1}\, \frac{\dfrac{Q_e^{\,n+1}}{A_e^{\,n+1}} - \dfrac{Q_p^{\,n+1}}{A_p^{\,n+1}}}{\Delta x / 2} \tag{18c} \]

\[ g \frac{\partial}{\partial x}\left( A h_c \right) = g\, \frac{\left[ A h_c \right]_E^{\,n+1} - \left[ A h_c \right]_P^{\,n+1}}{\Delta x} \tag{18d} \]

\[ S = S_e^{\,n+1} \tag{18e} \]
According to the inherent physics of these terms, the convection parts are discretized with an upwind scheme, and the pressure terms are discretized by central difference. The substitution of the discretized terms into Eq. (17), with rearrangement, finally results in an expression for the face flux component at the cell faces. A compact form of the resulting equation for the e face flux can be written as:

\[ Q_e^{\,n+1} = C_1 Q_p^{\,n+1} + C_2 \left( h_{c,P}^{\,n+1} A_P^{\,n+1} - h_{c,E}^{\,n+1} A_E^{\,n+1} \right) + C_3 \tag{19} \]

in which

\[ C_1 = \frac{2 \Delta t}{C_F \Delta x}\, \overline{Q}_e^{\,n+1} \left( \frac{1}{A_P^{\,n+1}} + \frac{1}{A_e^{\,n+1}} \right) \tag{20a} \]

\[ C_F = 1 + 4 \frac{\Delta t}{\Delta x}\, \frac{\overline{Q}_e^{\,n+1}}{A_e^{\,n+1}} \tag{20b} \]

\[ C_2 = \frac{g \Delta t}{C_F \Delta x} \tag{20c} \]

\[ C_3 = \frac{S_e^{\,n+1} \Delta t}{C_F} + \frac{Q_e^{\,n}}{C_F} \tag{20d} \]

\[ S_e^{\,n+1} = g A_e^{\,n+1} \left( S_0 - \frac{n^2\, \overline{Q}_e^{\,n+1} \left| Q_e^{\,n+1} \right| \left( P_e^{\,n+1} \right)^{4/3}}{\left( A_e^{\,n+1} \right)^{10/3}} \right) \tag{20e} \]
5. Discretization of continuity equation using PIS
As stated in the previous section, the continuity equation is discretized using the convecting flux of Eqs. (19) and (20a)–(20e). The continuity equation, which is shown in the form of Eq. (15), can be rewritten as follows:

\[ A_P^{\,n+1} + \frac{\Delta t}{\Delta x}\left( Q_e^{\,n+1} - Q_w^{\,n+1} \right) = A_P^{\,n} \tag{21} \]

Replacing Eq. (19), together with the analogous expression for the w face, into Eq. (21) yields:

\[ A_P^{\,n+1} + \frac{\Delta t}{\Delta x}\left[ C_1 Q_p^{\,n+1} + C_2\left( h_{c,P}^{\,n+1} A_P^{\,n+1} - h_{c,E}^{\,n+1} A_E^{\,n+1} \right) + C_3 - D_1 Q_W^{\,n+1} - D_2\left( h_{c,W}^{\,n+1} A_W^{\,n+1} - h_{c,P}^{\,n+1} A_P^{\,n+1} \right) - D_3 \right] = A_P^{\,n} \tag{22} \]
6. Discretization of the momentum equation using PIS
Considering Eq. (16), the face fluxes are estimated by Eq. (19). The pressure terms, based on their physical meanings, are
treated as follows:
\[ (A h_c)_e^{\,n+1} = \frac{(A h_c)_E^{\,n+1} + (A h_c)_P^{\,n+1}}{2}, \qquad W_e = 1 - Cnr_e = 1 - \frac{\Delta t}{2}\, \frac{\left( \dfrac{|Q_E|}{A_E} \right)^{n} + \left( \dfrac{|Q_P|}{A_P} \right)^{n}}{\Delta x} \tag{23} \]

\[ (A h_c)_w^{\,n+1} = \frac{(A h_c)_W^{\,n+1} + (A h_c)_P^{\,n+1}}{2}, \qquad W_w = 1 - Cnr_w = 1 - \frac{\Delta t}{2}\, \frac{\left( \dfrac{|Q_P|}{A_P} \right)^{n} + \left( \dfrac{|Q_W|}{A_W} \right)^{n}}{\Delta x} \tag{24} \]

Finally, implementing Eqs. (23) and (24) in Eq. (16) yields:

\[
\begin{aligned}
\left( Q_P^{\,n+1} - Q_P^{\,n} \right) \Delta x
&+ \Delta t \left( \overline{\frac{Q}{A}} \right)_e^{n+1} \left[ C_1 Q_p^{\,n+1} + C_2\left( h_{c,P}^{\,n+1} A_P^{\,n+1} - h_{c,E}^{\,n+1} A_E^{\,n+1} \right) + C_3 \right] \\
&- \Delta t \left( \overline{\frac{Q}{A}} \right)_w^{n+1} \left[ D_1 Q_W^{\,n+1} + D_2\left( h_{c,W}^{\,n+1} A_W^{\,n+1} - h_{c,P}^{\,n+1} A_P^{\,n+1} \right) + D_3 \right] \\
&+ g \Delta t \left[ W_e\, \frac{(A h_c)_E^{\,n+1} + (A h_c)_P^{\,n+1}}{2} - W_w\, \frac{(A h_c)_W^{\,n+1} + (A h_c)_P^{\,n+1}}{2} \right]
= g \Delta t \Delta x \left[ A\left( S_0 - S_f \right) \right]_p^{\,n+1}
\end{aligned}
\tag{25}
\]
The nonlinear system of Eq. (25), together with the initial and boundary conditions, is solved using an initial guess for the nonlinear coefficients, where a direct solver is used for the 5-diagonal matrix of unknowns. A FORTRAN code developed by the first author is used to solve the system of nonlinear equations for dam-break simulation. The boundary conditions in the two test cases of dry bed and wet bed are as follows. In the case of the dry bed, the initial conditions are:

If (Xi ≤ Xdam) Then A(i, 0) = A1, Q(i, 0) = 0.0 Else A(i, 0) = Ads, Q(i, 0) = 0.0

The defined boundary conditions are Neumann boundary conditions (∂/∂x = 0.0) at the inlet and outlet boundaries, considered as an open boundary problem, so we have:

A(1, t) = A(2, t), Q(1, t) = Q(2, t), A(N, t) = A(N − 1, t), Q(N, t) = Q(N − 1, t)

In the dry bed Ads = 1e−8 and in the wet bed Ads = A2. The indices "2" and "1" refer to the downstream and upstream parts of the dam, respectively. The tests are idealized dam breaks in a rectangular channel with a dry and a wet bed. The dry-bed test is used to evaluate the performance of the PIS for waves with a very shallow front edge. In the wet-bed case, a right-traveling surge in the solution domain together with an upstream-traveling depression wave is obtained.
The test case conditions are: channel width B = 1 m; bed slope S0 = Sf = 0.0; A1 = 10 m, A2 = 0.0000001; channel length 1200 m; dam located at x = 500 m; dx = 10 m and dt = 0.3 s. Figs. 4–6 represent the numerical and exact solutions of water area A and discharge Q at 30 s after the dam failure.
As can be seen from these figures, the PIS-SV model can accurately predict the step and sharp variations of flow depth and area, including near the sharp depth changes. Fig. 7 shows the discharge obtained by PIS-SV and the exact results. From this figure, the model accurately predicts the peak flow discharge along the channel, but some discrepancies occur at low flows. It is noticed that the PIS-SV model does not use any special treatment such as a weighted water surface gradient, whereas the upwind model necessarily needs this shock-capturing technique. This reveals the capability of the newly developed model in capturing shocks in steep gradients without using any special numerical treatments.
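The "Exact" curves in Figs. 4–6 correspond to the analytical dam-break solution; assuming the dry-bed case follows the classical Ritter solution for an instantaneous dam break over a frictionless horizontal bed, a minimal sketch of that reference depth profile (with values taken from the test-case geometry above) is:

```python
import numpy as np

def ritter_dry_bed(x, t, x_dam=500.0, h0=10.0, g=9.81):
    """Ritter's analytical depth profile for an instantaneous dam break
    over a dry, frictionless, horizontal bed (used as a reference solution)."""
    c0 = np.sqrt(g * h0)                         # initial wave celerity
    xi = (x - x_dam) / t                         # similarity variable
    h = np.where(xi <= -c0, h0,                  # undisturbed reservoir
        np.where(xi >= 2 * c0, 0.0,              # dry bed ahead of the wave tip
                 (2 * c0 - xi) ** 2 / (9 * g)))  # parabolic rarefaction
    return h

x = np.linspace(0.0, 1200.0, 121)                # 1200 m channel, dam at x = 500 m
print(np.round(ritter_dry_bed(x, t=30.0), 2))    # depth profile 30 s after failure
```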
7. Quasi-two-dimensional flow simulation
The natural rivers in a compound shape have a deeper main cross-section and shallower floodplain sections.
FIG. 4 Dam break over the dry bed: comparison of PIS-SV with the exact solution of flow depth at t = 30 s.
FIG. 5 Dam break over dry bed: comparison of PIS-SV with exact solution of discharge at t = 30 s.
FIG. 6 Dam break over wet bed: comparison of PIS-SV with exact solution of area at t = 30 s.
Compound channels differ from single channels in terms of flood adjustment, cutting the flood peak, sediment transport, and lateral
variation in depth and flow velocities, etc. (Chatila, 1997). The quasi-2-D model is widely accepted for the conveyance estimation of natural rivers with compound channels (Riahi-Madvar et al., 2011).
Shiono and Knight (1989, 1991) derived the depth-averaged equation for quasi-two-dimensional flows by integrating the Navier-Stokes equations over the flow depth H. The Shiono and Knight model (SKM) is a depth-averaged model based on the RANS equations, which determines the lateral distributions of the depth-averaged velocity and the boundary shear stress across the river cross-section. The SKM is written as:
FIG. 7 Compound cross-section with solution network used in SKM method (nodes 1, 2, 3, …, N − 1, N).
\[ \rho g H S_0 - \frac{f}{8} \rho u_d^2 \sqrt{1 + \frac{1}{s^2}} + \frac{\partial}{\partial y}\left\{ \rho \lambda H^2 \left( \frac{f}{8} \right)^{1/2} u_d\, \frac{\partial u_d}{\partial y} \right\} = \frac{\partial\left[ H (\rho U V)_d \right]}{\partial y} \tag{26} \]
where H = water depth, U and V = velocities in the x and y directions, S0 = longitudinal bed slope, f = Darcy-Weisbach friction factor, s = side slope of the cross-section, ρ = fluid density, g = acceleration due to gravity, and λ = dimensionless eddy viscosity. Subscript d refers to the depth-averaged condition. In this section, the numerical solution of the quasi-two-dimensional model, as another application of the CFD models, is presented. Chatila (1997) presented laboratory observations of the distribution of depth-averaged velocity in a compound channel. The experimental channel was 29.26 m long,
0.787 m deep, and 1.498 m wide with a bed slope of 0.00069. The channel had a simple rectangular cross-section; it was
modified by using aluminum sheets to produce an asymmetrical compound shape. Velocity measurements were performed
at two stations, one 12.24 m and the other 22.76 m from the channel entrance. In this study, only the velocity measurement at
12.24 m section from the entrance is used to evaluate the numerical model and seven turbulence models. Detailed information about the instruments and measurements is given in Chatila (1997).
8. Numerical solution of quasi-two-dimensional model
By assuming X = u_d², the SKM relation is written as follows:

\[ g H S_0 - \frac{f}{8} X \sqrt{1 + \frac{1}{s^2}} + \frac{1}{2} \lambda H^2 \left( \frac{f}{8} \right)^{1/2} \frac{\partial^2 X}{\partial y^2} = \frac{G}{\rho} \tag{27} \]
Changing the following variables:

\[ P = \lambda H^2 \left( \frac{f}{8} \right)^{1/2} \tag{28a} \]

\[ T = \frac{f}{8} \sqrt{1 + \frac{1}{s^2}} \tag{28b} \]

\[ R = \frac{G}{\rho} - g H S_0 \tag{28c} \]

and replacing them into Eq. (27), the following equation is obtained:

\[ \frac{P}{2} \frac{\partial^2 X}{\partial y^2} - T X = R \tag{29} \]
By numerically solving the above equation at the flow cross-section, the lateral distributions of depth-averaged velocity are
obtained. Considering the compound cross-section according to Fig. 7 and discretizing Eq. (29) over the solution network, we have:
\[ P_i\, \frac{X_{i+1} - 2X_i + X_{i-1}}{4\Delta y} - T_i X_i = R_i \tag{30} \]
Finally, by simplifying the above relation and rearranging it according to the unknown parameter X, the following equation is obtained:

\[ \frac{P_i}{4\Delta y} X_{i-1} - \left( \frac{P_i}{2\Delta y} + T_i \right) X_i + \frac{P_i}{4\Delta y} X_{i+1} = R_i \tag{31} \]

or

\[ A X_{i-1} - B X_i + C X_{i+1} = D \tag{32} \]
By applying the above equation to all points in the cross-section, a tridiagonal system of unknowns is obtained, which can be easily solved using the Thomas algorithm. The no-slip condition at the walls (zero velocity) provides the boundary conditions for solving the equations in the SKM method. The system of tridiagonal equations is finally obtained as follows:

i = 1 → X_i = 0
i = 2 → A = 0, −B X_i + C X_{i+1} = D
i = 3, …, N − 2: A X_{i−1} − B X_i + C X_{i+1} = D
i = N − 1 → C = 0, A X_{i−1} − B X_i = D
i = N → X_i = 0     (33)
By solving the above system of equations, the variable X = u_d², and then the average velocity, is obtained at the different cross-section points. The results of the SKM model in comparison with observed values of velocity and shear stress are presented in Figs. 8–10. According to Figs. 8 and 9, it is clear that the SKM model has good capabilities in the prediction of the lateral profile of the depth-averaged velocity, and it can accurately predict strong velocity gradients between the main channel and flood plains. The boundary shear stress distribution simulated by the SKM model is compared with measurements of FCF (Ayyoubzadeh, 1997) in Fig. 10. The results of SKM in these figures show the applicability of SKM in compound channels. More details about these results are given in Riahi-Madvar et al. (2011).
FIG. 8 Lateral velocity distribution in a rectangular compound channel versus the measurements. (From Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels with overbank flows. J. Hydrol. Hydromech. 59(3), 171.)
FIG. 9 Lateral velocity distribution in a large-scale trapezoidal compound channel versus the measurements. (From Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels with overbank flows. J. Hydrol. Hydromech. 59(3), 171.)
FIG. 10 Lateral boundary shear stress distribution in a large-scale trapezoidal compound channel versus the measurements. (From Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels with overbank flows. J. Hydrol. Hydromech. 59(3), 171.)
9. 3D numerical modeling of flow in compound channel using turbulence models
The turbulent flow in a compound channel is an example of a complicated turbulent flow. In this test case, the flow field is affected by shear stresses produced by the momentum transfer between the main channel and the adjacent flood plains. Secondary flow of the second kind is generated by the anisotropic turbulence near the corners of compound cross-sections. Several researchers have used turbulence models in compound channel modeling, such as Wilson et al. (2002), Jiang
et al. (2008), Shiono et al. (2003), and Sugiyama et al. (2006), but the comparison of the accuracy of different turbulence models in predicting the lateral velocity distribution in compound channels has received less attention. In this section, the authors used seven
turbulence models in a CFD simulation and compared their results with experimental data.
With the development of computing ability, numerical simulation in the CFD models is used to study complex flow
problems. The flow of water in a straight compound channel with prismatic cross-section is investigated with a three-dimensional finite volume model which solves the Reynolds-averaged Navier-Stokes equations. In the following sections,
the mathematical equations of the flow and turbulence models, the numerical solution of the governing equations and
finally results of several turbulence models are presented.
10. Three-dimensional numerical model
The three-dimensional Navier-Stokes equations for turbulent flow, combined with turbulence models, are solved numerically to obtain the velocity field in compound channel flows. The equations are as follows:

\[ \frac{\partial U_i}{\partial x_i} = 0, \qquad i = 1, 2, 3 \tag{34} \]

\[ \frac{\partial U_i}{\partial t} + U_j \frac{\partial U_i}{\partial x_j} = \frac{1}{\rho} \frac{\partial}{\partial x_j}\left( -P \delta_{ij} - \rho \overline{u_i u_j} \right), \qquad i, j = 1, 2, 3 \tag{35} \]
U is the mean velocity, x is the spatial coordinate, P is the pressure, δ_ij is the Kronecker delta, and u is the turbulent velocity fluctuation. The last term of this equation is the turbulent (Reynolds stress) term, which is modeled by the Boussinesq approach:

\[ \overline{u_i u_j} = -\nu_t \left( \frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i} \right) + \frac{2}{3} k \delta_{ij} \tag{36} \]
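As a small illustration of Eq. (36) (where ν_t is the turbulent eddy viscosity and k the turbulent kinetic energy, defined in the following paragraph), the sketch below evaluates the Boussinesq Reynolds stresses for a given mean velocity gradient; the eddy viscosity and k values are hypothetical, whereas in the chapter ν_t is supplied by one of the seven turbulence models.

```python
import numpy as np

def reynolds_stress_boussinesq(grad_u, nu_t, k):
    """Reynolds stresses u_i u_j from Eq. (36):
    -nu_t*(dUi/dxj + dUj/dxi) + (2/3)*k*delta_ij, with grad_u[i, j] = dU_i/dx_j."""
    strain = grad_u + grad_u.T                      # dUi/dxj + dUj/dxi
    return -nu_t * strain + (2.0 / 3.0) * k * np.eye(3)

# Hypothetical values: simple shear dU1/dx2 = 0.5 1/s, nu_t = 1e-3 m^2/s, k = 0.01 m^2/s^2
grad_u = np.zeros((3, 3))
grad_u[0, 1] = 0.5
print(reynolds_stress_boussinesq(grad_u, nu_t=1e-3, k=0.01))
```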
Here, k is the turbulent kinetic energy. In this equation, the turbulent eddy viscosity ν_t is unknown and must be modeled by a turbulence model. In this section, seven turbulence models were used for the computation of the eddy viscosity: T1: Keefer eddy viscosity = 0.11 × depth × shear velocity; T2: constant nonisotropic eddy viscosity in the vertical and horizontal directions, ν_tH = 0.23 and ν_tV = 0.008; T3: constant eddy viscosity = 0.24, given by Wilson et al. (2006); T4: standard k-e; T5: local k-e (a k-e model based on the local water velocity); T6: k-w with Wilcox's wall law; and T7: k-w with k-e wall laws (Olsen, 2009; Wilson et al., 2002; Wilcox, 2000; Rodi, 1980). In this study, the numerical solution of the governing equations is done using the SSIIM model, which is a free online three-dimensional solver of flow and sediment transport in turbulent open channel flows. The final form of the general discretized equation (in the steady state) is obtained as follows:

\[ a_p \varphi_p = \sum_{nb} a_{nb} \varphi_{nb} + b \tag{37} \]
in which

\[ a_p = \sum_{nb} a_{nb} - s_{2p} \tag{38a} \]

\[ b = b_{No} + s_{1p} \tag{38b} \]

\[ b_{No} = \left[ GG_{12} \frac{\partial \varphi}{\partial x^2} + GG_{13} \frac{\partial \varphi}{\partial x^3} \right]_w^e + \left[ GG_{21} \frac{\partial \varphi}{\partial x^1} + GG_{23} \frac{\partial \varphi}{\partial x^3} \right]_s^n + \left[ GG_{31} \frac{\partial \varphi}{\partial x^1} + GG_{32} \frac{\partial \varphi}{\partial x^2} \right]_b^t \tag{38c} \]
In the above relations, the nb subscript represents the neighboring nodes around the central node p. To obtain a discretized equation including φ at the network nodes, it is necessary to consider a suitable profile between nodes. A linear profile for φ is used to discretize the diffusion fluxes, so the orthogonal flux gradient on the east side e is discretized as:

\[ \frac{\partial \varphi}{\partial x^1} = \frac{\varphi_E - \varphi_P}{\delta x^1} \tag{39} \]

In this equation, φ_E represents the value of the variable φ at the east node E. The other two gradients in the nonorthogonal flux on the east side e are discretized as follows:

\[ \frac{\partial \varphi}{\partial x^2} = \frac{\varphi_{(en)} - \varphi_{(es)}}{\delta x^2} \tag{40} \]

\[ \frac{\partial \varphi}{\partial x^3} = \frac{\varphi_{(et)} - \varphi_{(eb)}}{\delta x^3} \tag{41} \]
Also, an interpolation is performed to calculate the values of the variables on the control volume faces in a weighted linear way in the physical space, i.e.:

\[ \varphi_e = \varphi_p \left( 1 - f_{1p} \right) + \varphi_E\, f_{1p} \tag{42} \]

in which

\[ f_{1p} = \frac{\overline{Pe}}{\overline{PE}} \tag{43} \]

11. Grid generation and the flow field solution
The grid in this study was structured, composed of ΔX = 25 cm, ΔY = 7.9 cm, and 21 vertical cells, and resulted from a series of grid sensitivity analyses. In the vertical direction, grid intersections are selected at 0, 0.05, 0.1, 0.15, …, times the depth, uniformly spaced over the flow depth. The grid in vertical and plane view is given in Fig. 11. The solution field has a 29.26 m length and a 1.498 m width. The flow is steady and nearly uniform, with a depth ratio (depth in the floodplain to the main channel) of 0.138. The simulation results from the seven turbulence models are compared with measured depth-averaged longitudinal velocity profiles at 12.24 m from the upstream end. The Manning coefficient was n = 0.014.
12. Comparison of different turbulence models
In Fig. 12, the results of the seven turbulence models are compared with the observed values of the depth-averaged lateral velocity distribution. From this figure, all seven turbulence models qualitatively predict the lateral distribution of longitudinal velocity in compound channels: the flow velocity in the main channel is faster than in the floodplains, and the higher velocity gradients near the floodplains in multistage rivers are predicted fairly well. From Fig. 12, it is concluded that in the present test case the T4 and T7 models, i.e., the standard k-e and the k-w with k-e wall laws, give the best predictions of the velocity field in compound channels. The T1 case, the Keefer model, gives good predictions of the high velocities in the main channel, but the flow velocity it predicts in the floodplain is much lower than the observed values. The nonisotropic constant eddy viscosity model (T2) and the constant eddy viscosity model (T3) predict velocities in the main channel lower than observed and in the floodplains greater than observed. These models cannot truly predict the velocity gradients in compound channels because they do not use the velocity and depth variations in the eddy viscosity. The standard k-e turbulence model (T4) gives the best prediction, although in the floodplain and in the interaction zone of the shear layer it shows some discrepancies with the measurements. The local k-e model (T5) has results similar to the constant eddy viscosity models. The k-w model with Wilcox boundary wall laws (T6) gives good results in comparison with the eddy viscosity models, but its predictions are poorer than the standard k-e. Finally, the k-w with k-e wall laws is the second-best model and gives acceptable predictions. From these comparisons, it is clear that the transverse velocity field in multistage rivers is very sensitive to the turbulence model and requires further investigation with advanced turbulence models.
FIG. 11 The grid in vertical (top left and right) and horizontal (bottom) planes.
FIG. 12 Observed and simulated distribution of depth-averaged velocity for the seven turbulence models (depth-averaged velocity, m/s, versus lateral distance, m).
FIG. 13 Longitudinal velocity vectors (reference vector 1.2713 m/s, level 21).
As the first option in three-dimensional numerical modeling of turbulent flow in compound channels, the standard k-e model can be used as an acceptable model, but further analysis of flow variables such as transverse and vertical
velocities, shear stress, and Reynolds stresses is required. However, for the velocity modeling in compound channels, the
k-e model is the best one among the investigated turbulence models. Because of the best predictions of the T4 turbulence
model, it is selected as the base turbulence model, and some hydrodynamic behaviors of the flow in compound channels are investigated and interpreted numerically. The longitudinal velocity profile along the compound channel is presented in
Fig. 13. A net lateral momentum transfer from slow water in the flood plain toward the faster water in the main channel
occurs, and the high-velocity flow in the main channel pulls the low-velocity flow in the flood plain. From upstream toward
the downstream, a fully developed flow field occurs. Fig. 14 shows the secondary flows in the three sections of the flume,
one at a section near the upstream, another in the middle of the flume, and the other one at the downstream end cross-section. From this figure, it is concluded that the secondary flow decreases from the upstream to the downstream, in such
a way that the lateral mass and momentum transfer decreases. The horizontal and vertical velocity contour plots are presented in Fig. 15, and the computed flow field is given in Fig. 16.
13. Three-dimensional pollutant transfer modeling
The pollutant transport was calculated by solving the transient convection-diffusion equation for pollutant concentration:
\[ \frac{\partial c}{\partial t} + U_j \frac{\partial c}{\partial x_j} = \frac{\partial}{\partial x_j}\left( \Gamma_t \frac{\partial c}{\partial x_j} \right) \tag{44} \]
FIG. 14 Secondary flows at three locations (cross-sections no. 2, 25, and 55; reference vectors 0.0149, 0.0086, and 0.0045 m/s).
FIG. 15 Horizontal and vertical velocity contour plots.
where the Reynolds-averaged water velocity was denoted as U, and the diffusion coefficient Γt was set equal to the eddy viscosity taken from the best turbulence model achieved in the previous section (i.e., the standard k-e turbulence model). The
experimental data on pollutant transport are those from Shiono and Feng (2003). The experiments were done on a flume
with 20 m length, 0.2 m width, two depth ratios of 0.5 and 0.27, fixed bed slope of 0.0005, and the Manning’s roughness of
0.012334.
The tracer used was a fluorescent dye (Rhodamine), injected at a constant rate from a reservoir. The tracer was
injected in three positions in the deep channel, hereafter referred to as C1, C2, and C3. In the shallow channel, only
one injection point was used, referred to as S1 in Table 1.
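As a simplified illustration of Eq. (44) (not the SSIIM solver), the following sketch performs an explicit finite-difference update of a passive scalar on a uniform 3D grid with a constant streamwise velocity and diffusivity, holding one cell at a fixed concentration as a crude stand-in for the constant-rate injection; all values are hypothetical.

```python
import numpy as np

def convect_diffuse_step(c, u, gamma_t, dx, dy, dz, dt):
    """One explicit step of dc/dt + U_j dc/dx_j = d/dx_j(Gamma_t dc/dx_j),
    with convection only in x (upwind) and isotropic diffusion (central)."""
    cn = c.copy()
    adv = -u * (c[1:-1, 1:-1, 1:-1] - c[:-2, 1:-1, 1:-1]) / dx        # upwind in x
    dif = gamma_t * (
        (c[2:, 1:-1, 1:-1] - 2 * c[1:-1, 1:-1, 1:-1] + c[:-2, 1:-1, 1:-1]) / dx**2
        + (c[1:-1, 2:, 1:-1] - 2 * c[1:-1, 1:-1, 1:-1] + c[1:-1, :-2, 1:-1]) / dy**2
        + (c[1:-1, 1:-1, 2:] - 2 * c[1:-1, 1:-1, 1:-1] + c[1:-1, 1:-1, :-2]) / dz**2
    )
    cn[1:-1, 1:-1, 1:-1] = c[1:-1, 1:-1, 1:-1] + dt * (adv + dif)
    return cn

# Hypothetical grid (40 x 24 x 20 control volumes) with one continuously dosed cell
c = np.zeros((40, 24, 20))
for _ in range(500):
    c[5, 12, 10] = 1.0     # hold the injection cell at a fixed concentration (simplified source)
    c = convect_diffuse_step(c, u=0.2, gamma_t=1e-4, dx=0.5, dy=0.01, dz=0.005, dt=0.005)
print(round(float(c.max()), 4), round(float(c.sum()), 2))
```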
14. Results of pollutant transfer modeling
For three-dimensional numerical transfer modeling of neutral pollutants in the compound section, a network with 40 control
volumes in the flow direction, 24 control volumes in the lateral direction, and 20 control volumes in the vertical direction
has been used. The flow boundary conditions are like the boundary conditions used in the previous section and for solving
FIG. 16 Computational network and velocity distribution in main channel and floodplains.
TABLE 1 Dye injection locations and measurement locations.

Test case   Dye injection location (X, Y, Z)   Injection rate (mL/min)
C1          13, 0.05, 0.108                    54
C2          13, 0.1, 0.108                     54
C3          13, 0.15, 0.108                    54
S1          13, 0.1, 0.073                     33
the three-dimensional pollution-transfer equation, the boundary condition of symmetry (zero gradient) is used at the bed, downstream outlet, and side boundaries. At the water surface, the concentration is zero, and at the inlet boundary, at the specific injection points, the known injected concentration is used. The results of the numerical model are compared with experimental measurements. Fig. 17 shows the distribution of the cross-sectional concentration measured in the S1 experiment, simulated by the three-dimensional numerical model SSIIM in this study, and simulated by Shiono et al. (2003). In the laboratory measurements, as can be seen in Fig. 17, two peaks are observed, one at a width of 0.09 m and the other at a width of 0.135 m. The three-dimensional numerical model in this study was more accurate than the numerical model of Shiono et al. (2003), which assumed a fully developed, uniform flow and used two turbulence models, linear k-e and nonlinear k-e. Figs. 18–20 show the three experiments C1, C2, and C3, respectively, comparing the cross-distribution of concentration obtained from the present study with the experimental measurements and the numerical modeling of Shiono et al. (2003). The results of the three-dimensional modeling show that although the three-dimensional model has been able to predict the pattern of the cross-distribution of concentration in the compound section, it does not have the desired accuracy, like the numerical model of Shiono et al. (2003). This issue is related to the nature of turbulent flow in compound sections and its effect on the quantitative and qualitative pattern of pollutant distribution in composite sections.
FIG. 17 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the S1 experiment (C, ppm, versus Y, m).
FIG. 18 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the C1 experiment (C, ppm, versus Y, m).
FIG. 19 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the C2 experiment (C, ppm, versus Y, m).
FIG. 20 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the C3 experiment (C, ppm, versus Y, m).
15. Conclusions
This chapter presents the methods available in computational fluid dynamics (CFD) to solve the governing equations in
hydroinformatics modeling. Some applications of CFD in hydroinformatics, including the one-dimensional solution of
advection-diffusion equation in pollutant transport modeling, one-dimensional solution of Saint-Venant equations for
dam-break simulation, quasi-two-dimensional solution of velocity distribution in compound rivers, and three-dimensional
modeling of turbulent flow and pollutant transport in rivers, are provided. The physically influenced scheme (PIS) is presented for solving the one-dimensional solution of the advection-diffusion equation in pollutant transport modeling and the
one-dimensional solution of Saint-Venant equations for dam-break simulation. The PIS approach was initially developed
for Euler equations in gas dynamics. In this chapter, the authors extended the PIS approach to pollutant dispersion. In the 1D-ADE problem, the results of the PIS model are verified against an analytical solution. The comparison of the PIS
model with the analytical solution shows that this method is in good agreement with the analytical results. In the dam-break
problem, it is shown that the PIS model can accurately predict the step and sharp variation of the flow and it is indicated that
this model is capable of modeling high-speed open channel flow regimes. According to the quasi-two-dimensional flow
simulation section, the results of the Shiono and Knight Model (SKM) are compared with observed values of velocity, shear
stress, and discharge capacity. The performed comparisons reflect the fact that the SKM model can accurately predict
lateral velocity and shear stress profiles. In the three-dimensional numerical modeling of turbulent flow section, seven turbulence models are compared with experimental data, including the Keefer model, the nonisotropic constant eddy viscosity model, the constant eddy viscosity model, the local k-e model, the k-w model with Wilcox boundary wall laws, and the k-w model with k-e wall laws. According to the results, the T4 and T7 models, i.e., the standard k-e and k-w models, give the best predictions of the velocity field in compound channels in comparison with the other five models. A further conclusion is that the T4 model is the best model because of its generality, and the T7 model is the second best. In the study of three-dimensional pollutant transfer, the three-dimensional numerical model is compared to the
results of the numerical model of Shiono et al. (2003), which is assumed to be a fully developed and uniform flow with two
turbulent flow models, linear k-e and nonlinear k-e. The results of three-dimensional modeling show that although the
three-dimensional model can predict the pattern of cross-distribution of concentration in the compound section, it does
not have the desired accuracy, like the numerical model of Shiono et al. (2003).
References
Aldrighetti, E., 2007. Computational Hydraulic Techniques for the Saint Venant Equations in Arbitrarily Shaped Geometry. Doctoral dissertation, International Association for Hydrogen Research, Rotterdam, The Netherlands.
Ayyoubzadeh, S.A., 1997. Hydraulic Aspects of Straight-Compound Channel Flow and Bed Load Sediment Transport. PhD Thesis, The University of
Birmingham, UK.
Bozkus, Z., Eslamian, S., 2022. Simulating flood due to dam break. Chapter 25, In: Eslamian, S., Eslamian, F. (Eds.), Flood Handbook, Vol. 3: Flood
Impact and Management. Taylor and Francis, CRC Group, USA.
Chatila, J.G., 1997. Modeling of Pollutant Transport in Compound Open Channels. Doctoral dissertation, University of Ottawa, ON, Canada.
Darbandi, M., Bostandoost, S.M., 2005. A new formulation toward unifying the velocity role in collocated variable arrangement. Numer. Heat Transf.
B 47 (4), 361–382.
Darbandi, M., Mokarizadeh, V., Rouhi, E., 2007. Developing a shock-capturing formulation with high performance to capture normal standing shock in
all-speed regime. Esteghlal J. Eng. 25 (2), 167–181.
Graf, W.H., 1998. Fluvial Hydraulics: Flow and Transport Processes in Channels of Simple Geometry. In: Collaboration with M.S. Altinakar, Wiley,
England. 681 pp. ISBN 0-471-97714-4.
Jiang, H., Guo, Y., Li, C., Zhang, J., 2008. Three-dimensional numerical simulation of compound meandering open channel flow by the Reynolds stress
model. Int. J. Numer. Meth. Fluids 2008. https://doi.org/10.1002/fld.1855.
Kashefipour, S.M., Falconer, R.A., 2002. Longitudinal dispersion coefficients in natural channels. Water Res. 36 (6), 1596–1608.
Olsen, N.R.B., 2009. A Three-Dimensional Numerical Model for Simulation of Sediment Movements in Water Intakes with Multiblock Option.
Department of Hydraulic and Environmental Engineering, The Norwegian University of Science and Technology, 2002. http://folk.ntnu.no/nilsol/
ssiim/manual3.pdf. User’s manual.
Patankar, S., 1980. Numerical Heat Transfer and Fluid Flow. Hemisphere, Washington, DC. McGraw Hill, USA.
Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels
with overbank flows. J. Hydrol. Hydromech. 59 (3), 171.
Riahi-Madvar, H., Dehghani, M., Seifi, A., Salwana, E., Shamshirband, S., Mosavi, A., Chau, K.W., 2019. Comparative analysis of soft computing
techniques RBF, MLP, and ANFIS with MLR and MNLR for predicting grade-control scour hole geometry. Eng. Appl. Comput. Fluid Mech. 13
(1), 529–550.
Rodi, W., 1980. Turbulence Models and Their Application in Hydraulics – a State-of-the-Art Review. Int. Assoc. Hydr. Res., Balkema, Rotterdam, The
Netherlands,.
Seifi, A., Riahi-Madvar, H., 2019. Improving one-dimensional pollution dispersion modeling in rivers using ANFIS and ANN-based GA optimized
models. Environ. Sci. Pollut. Res. 26 (1), 867–885.
Shiono, K., Feng, T., 2003. Turbulence measurements of dye concentration and effects of secondary flow on distribution in open channel flows. J. Hydraul.
Eng. 129 (5), 373–384.
Shiono, K., Knight, D.W., 1989. Two-dimensional analytical solution compound channel. In: Proceedings of 3rd International Symposium on Refined
Flow Modeling and Turbulence Measurements. Universal Academy Press, pp. 591–599.
Shiono, K., Knight, D.W., 1991. Turbulent open-channel flows with variable depth across the channel. J. Fluid Mech. 222, 617–646.
Shiono, K., Scott, C.F., Kearney, D., 2003. Predictions of solute transport in a compound channel using turbulence models. J. Hydraul. Res. 41 (3),
247–258.
Sugiyama, H., Hitomi, D., Saito, T., 2006. Numerical analysis of turbulent structure in compound Meandering open channel by algebraic Reynolds stress
model. Int. J. Numer. Methods Fluids 51, 791–818.
Tucciarelli, T., 2003. A new algorithm for a robust solution of the fully dynamic Saint-Venant equations. J. Hydraul. Res. 41 (3), 239–246.
Wilcox, D.C., 2000. Turbulence Modeling for CFD. DCW Industries. ISBN 0-9636051-5-1.
Wilson, C., Bates, P.D., Hervouet, J.M., 2002. Comparison of turbulence models for stage-discharge rating curve prediction in reach-scale compound
channel flows using two-dimensional finite element methods. J. Hydrol. 257 (1–4), 42–58.
Wilson, C.A.M.E., Yagci, O., Rauch, H.P., Olsen, N.R.B., 2006. 3D numerical modelling of a willow vegetated river/floodplain system. J. Hydrol. 327
(1–2), 13–21.
Wu, W., 2007. Computational River Dynamics. CRC Press, USA.
Chapter 5
Cross-validation
Amir Seraj(a), Mohammad Mohammadi-Khanaposhtani(b), Reza Daneshfar(c), Maryam Naseri(d), Mohammad Esmaeili(e), Alireza Baghban(f), Sajjad Habibzadeh(g), and Saeid Eslamian(h,i)
(a) Department of Instrumentation and Industrial Automation, Ahwaz Faculty of Petroleum Engineering, Petroleum University of Technology, Ahwaz, Iran, (b) Fouman Faculty of Engineering, College of Engineering, University of Tehran, Tehran, Iran, (c) Department of Petroleum Engineering, Ahwaz Faculty of Petroleum Engineering, Petroleum University of Technology, Ahwaz, Iran, (d) Chemical Engineering Department, Babol Noshirvani University of Technology, Babol, Iran, (e) Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, (f) Chemical Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Mahshahr Campus, Mahshahr, Iran, (g) Surface Reaction and Advanced Energy Materials Laboratory, Chemical Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, (h) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran, (i) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
1.1 Importance of validation
In our everyday life, we face questions like "How do you know that?" or "Are you sure?" But why are these questions asked so frequently? To see why, let's consider the situations in which they arise. When someone talks about an event, they give us a piece of "information" about a "result." Now, if we want to use this information in other situations, we
need to make sure that it is “valid.” It is always about losing or gaining some benefit by either retelling the story or acting
based on that story. For example, we don’t like to risk our reputation on unverified stories or lose our money in an
investment based on unevaluated information. These examples demonstrate the importance of "validation": every piece of information that is intended to be useful in the future needs to be validated first. In a scientific study,
validation comes into play when some experimental data are acquired or when a model is proposed to generalize the
applicability of this data. Especially, when we are looking for a predictive tool, validation becomes a crucial step in
the development of a model. For example, when a model is proposed for engineering or economic purposes, it must be validated before any design, investment, or forecasting is based on it. In science and engineering, when a model is derived
by a mechanistic approach, a mathematical formula is obtained which relates the input to the specified output. In this
respect, the validation could be made by comparison of the model results with accurate experimental data or at least
by the model's performance in limiting situations, e.g., equilibrium or steady-state conditions. But there is an ever-increasing number of machine learning models that try to offer predictive tools based on their training with the available experimental data. Here a valid model must also perform well on independent data sets, i.e., data that were not used in the derivation of the model.
1.2 Validation of the training process
In machine learning, the training phase usually takes the major portion of the available data (training data set) to find the
model parameters that favorably minimize the error. Usually, the error rapidly decreases at the first iterations but as the
training proceeds, the error would slowly decrease toward a local minimum (Richert, 2013). The training is stopped when
the best generalization is achieved by the model which is the one that is not underfitting or overfitting the data. Since the
training data set provides the expected output, the error could be minimized by a suitable method to achieve the best model
parameters for the training data; however, if the model fits the training data set too closely, it might perform poorly on the testing data set or any other future data (Fontaine, 2018). Thus, a stopping criterion is required to avoid overfitting the model.
Indeed, searching should be stopped when the criteria of error minimization and cross-validation are both met (Brownlee,
2018). For example, the validation of the model on the testing set could be performed during the training phase and when the
error of the predictions for the testing set reaches a local minimum, the training is stopped (Raschka, 2015). This type of validation, which usually sets aside some 30% of the data as the testing set, is the simplest type of cross-validation, but it still functions properly in many situations. However, for a more robust validation, more elaborate techniques might be incorporated,
which are classified and discussed in the subsequent sections (Lei, 2019; Berrar, 2019).
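A minimal sketch of this stopping idea, using Keras's EarlyStopping callback (the synthetic data, network size, and patience value are placeholder choices for illustration, not taken from this chapter):

import numpy as np
from keras import models, layers
from keras.callbacks import EarlyStopping

X = np.random.rand(200, 5)                      # synthetic features (illustrative only)
y = np.random.rand(200)                         # synthetic targets
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(5,)))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
# Stop when the validation loss has not improved for 10 epochs and
# roll back to the weights of the best epoch seen so far.
stopper = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(X, y, validation_split=0.3, epochs=500, callbacks=[stopper], verbose=0)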
2. Cross-validation
The cross-validation is started by splitting the original data set into training set(s) and test set(s). When all possible combinations of splitting the available data are considered, an exhaustive cross-validation is performed (Brownlee, 2016). On the other hand, when not all possible combinations are considered, a nonexhaustive cross-validation is performed. A simple example of nonexhaustive cross-validation is the hold-out method, in which the data set is randomly split into only one training set (usually containing 70% of the data) and one test set containing the rest of the data (Müller and Guido, 2016). Besides the hold-out method, which some practitioners consider just a simple validation, the other cross-validation methods involve multiple training/test sets, and the final model is derived by averaging over the individual models obtained from different runs (Dangeti, 2017).
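A small sketch of the hold-out method using scikit-learn's train_test_split (the array sizes and the 70/30 ratio are arbitrary example values):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)                      # 100 synthetic samples with 4 features
y = np.random.rand(100)
# Keep 70% of the samples for training and hold out 30% as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)              # (70, 4) (30, 4)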
2.1 Exhaustive and nonexhaustive cross-validation
As suggested by its name, exhaustive cross-validation involves a great deal of computational effort as it considers all possible combinations of isolating a specified number of samples from the training phase to use them in the testing phase
(Karim and Kaysar, 2016).
• LpO-CV
Generally, this type of cross-validation is addressed as leave-p-out cross-validation (LpO-CV), since at each run p samples are taken out of the original data set and the training is made with the remaining "n-p" samples, where n represents the total number of samples in the original data set. Now, if there are only 50 samples in the original data set, with p = 15 the total number of training-validation runs would be C(50,15), which is about 2.25 × 10^12. Selecting p = 2 provides the main advantages of this method for many problems (Brunton and Kutz, 2019).
• LOO-CV
A variant of LpO with p = 1, denoted LOO-CV (leave-one-out cross-validation), attracts more attention as it involves less computational effort while preserving some essential features of LpO-CV. In this method, each model is trained on the remaining "n-1" samples of a data set of n samples and tested on the single left-out sample; since the ratio (n-1)/n is close to unity, LOO-CV retains variable-selection consistency in linear models (Viswanathan et al., 2016).
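The run counts quoted above can be checked with a short sketch using scikit-learn's LeavePOut and LeaveOneOut (the 50-sample array is illustrative):

import numpy as np
from math import comb
from sklearn.model_selection import LeavePOut, LeaveOneOut

X = np.arange(50).reshape(-1, 1)                # 50 samples, as in the example above
print(comb(50, 15))                             # roughly 2.25e12 runs for p = 15
lpo = LeavePOut(p=2)                            # p = 2 keeps the number of runs manageable
print(lpo.get_n_splits(X))                      # C(50, 2) = 1225 runs
loo = LeaveOneOut()                             # p = 1: one run per sample
print(loo.get_n_splits(X))                      # 50 runs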
2.2 Repeated random subsampling cross-validation
This cross-validation scheme creates multiple random splits from the original data set. For each of these random splits, the
training and testing are made to find the corresponding model and by averaging over these models, the final model is prepared. Since the selection of the training and testing sets is a random process, some of the samples may never be used in the
validation stage while other samples might be used more than once. Also, the stratified version of this method is proposed
for handling imbalanced data sets (Dua and Du, 2016).
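A sketch of repeated random subsampling with scikit-learn's ShuffleSplit, together with its stratified variant for imbalanced labels (the split counts and sizes are example values):

import numpy as np
from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

X = np.random.rand(40, 3)
y = np.array([0] * 30 + [1] * 10)               # imbalanced labels for the stratified variant
ss = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
for train_idx, test_idx in ss.split(X):
    pass                                        # fit and score one model per random split
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    pass                                        # class proportions are preserved in each split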
2.3 Time-series cross-validation
For time-series or time-dependent data, the order of the data is of crucial importance; that is why random splitting or k-fold
partitioning may not provide satisfactory results. Instead, the data split is made based on time and the training is performed
using the prior subsets with the next subset as a validation set. This method is referred to as rolling cross-validation and also
the walk-forward or forward chaining method. Note that this method prevents the leakage of data from the future to the
training set which makes the rolling CV stand out (De Prado, 2018; Jansen, 2018; Hadizadeh and Eslamian, 2017).
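Rolling (walk-forward) splits are available in scikit-learn as TimeSeriesSplit; the sketch below (with an arbitrary short series) shows that every validation fold comes strictly after its training samples:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)                # a short, time-ordered series
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in tscv.split(X):
    # Training indices always precede validation indices, so no future data leaks backward.
    print(train_idx, val_idx)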
2.4 k-fold cross-validation
The LOO-CV requires n runs, which is quite large for many problems; thus, to save computational effort, the k-fold technique was developed. This method partitions the data set into k equal-size subsets or folds. At each pass, "k-1" folds are used for the training and the remaining fold takes the role of the testing set (Geron, 2019). The process is repeated k times until
each of the k folds has been once used as the test set. Finally, the model is derived by averaging the results of different runs.
Although LOO-CV has been considered a variant of LpO-CV, one might equally view LOO as a form of k-fold cross-validation with k = n (Kumar, 2019; Kane, 2017).
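A minimal k-fold sketch with scikit-learn (the linear regressor and k = 5 are illustrative choices, not the Keras model used later in this chapter):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X = np.random.rand(100, 4)
y = np.random.rand(100)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
# One score per fold; their mean is the cross-validated estimate of performance.
scores = cross_val_score(LinearRegression(), X, y, cv=kf)
print(scores, scores.mean())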
2.5 Stratified k-fold cross-validation
For imbalanced data sets containing two or more classes with different numbers of data points, the naïve k-fold CV does not work properly. The solution is to partition the data set in such a way that the mean response value is nearly equal in all folds. This method is known as stratified k-fold cross-validation. Note that this method is not suitable for
time series data sets (Swamynathan, 2019; El Naqa et al., 2015).
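A sketch of the stratified variant with scikit-learn (the labels and fold count are example values); each fold keeps roughly the same class proportions:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(30, 2)
y = np.array([0] * 24 + [1] * 6)                # imbalanced classes
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold holds about 8 samples of class 0 and 2 of class 1.
    print(np.bincount(y[test_idx]))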
2.6 Nested
Finally, the k-fold cross-validation could be run in a nested scheme if both hyperparameter selection and error minimization
are planned to be made simultaneously. For example, in the k*l-fold cross-validation method, an outer loop and an inner
loop are designed to cover both tasks. Here the outer loop makes the usual k-fold cross-validation, while the inner loops are
responsible for fitting the model parameters. Note that the “l” in the name of the method refers to the number of subdivisions
of outer training sets (Raschka and Mirjalili, 2017; Deisenroth et al., 2020).
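A nested scheme can be sketched with scikit-learn by wrapping a hyperparameter search (the inner loop) inside an outer cross-validation; the estimator, grid, and fold counts below are illustrative assumptions:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X = np.random.rand(80, 5)
y = np.random.rand(80)
inner = KFold(n_splits=3)                       # "l": tunes the hyperparameter
outer = KFold(n_splits=5)                       # "k": estimates the generalization error
search = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]}, cv=inner)
scores = cross_val_score(search, X, y, cv=outer)
print(scores.mean())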
3. Computational procedures
To comprehend cross-validation completely and dive deeper into it, the problem of predicting Boston house prices is presented here. As mentioned in previous sections, the difference between a regression and a classification case lies in the type of output. In a regression case, the output is a continuous numeric variable; examples are predicting tomorrow's temperature or predicting the price of a stock given its previous prices. Another example is predicting the time at which a piece of software will terminate, given its specifications. In this chapter, we examine the case of predicting the price of homes in the Boston suburbs in the 1970s; the home price is clearly a continuous variable. Some features in this case study are the crime rate, the average number of rooms per house, accessibility to highways, and so forth. In this example, the concept of k-fold cross-validation will be investigated practically. The data for the Boston problem are available in the Keras library. For solving this case, the Spyder IDE for Python was selected; Python has several IDEs, such as Spyder, Atom, PyCharm, and so on, and for machine learning and deep learning tasks, Spyder and Jupyter are used most often. Let's dive into the solution. It is noted that all codes are written in the Spyder IDE (Zaman et al., 2019).
(1) from keras.datasets import boston_housing
First, the Boston dataset is imported. As mentioned before, we use the Keras datasets; boston_housing is a dataset in the Keras library.
(2) (train_data, train_targets ), (test_data, test_targets) = boston_housing.load_data();
For loading the data and splitting it into train and test sets, the load_data() function is applied. The definition of the load_data() function is as follows:
load_data(path='boston_housing.npz', test_split=0.2, seed=113)
The first argument of this function is the path, the directory where the dataset is cached locally. The first time a user wants to use a dataset in Keras, it is downloaded automatically by the load_data() function and saved locally in the path (relative to ~/.keras/datasets) (Sarkar et al., 2018). Another argument is test_split, which is the fraction
of data to reserve as test data and seed means random seed for shuffling the data before computing the test split. The output
of this function is returned as a tuple of NumPy arrays: (x_train, y_train), (x_test, y_test). In line 2, train_targets and test_targets are the labels of the train and test data; the train and test data contain the features, and the targets contain the labels of these data.
(3) train_data.shape => Out[3]: (404, 13)
After executing the code in line 3, the output is Out[3]: (404, 13). This means the dataset has 404 training samples with 13 features each; as mentioned before, some of the features are the crime rate, accessibility to highways, the number of rooms per house, etc.
(4) test_data.shape => Out[4]: (102, 13)
This means that 102 samples are randomly selected as test data and, like the train data, each has 13 features. The deep learning model is to predict the price of houses, so the output of the model is the house price in Boston.
(5) train_targets => Out[5]: array([15.2, 42.3, 50. , 21.1, 17.7, ........,19.4, 19.4, 29.1])
By executing line 5, the output looks like Out[5]: the price of the first house (the first district) in the Boston dataset is 15.2 thousand dollars, the second is 42.3 thousand dollars, and so on. These prices are the median values in each city district, in thousands of dollars. One important point concerns the features: their ranges differ. In other words, each feature has a different scale; some lie between 1 and 12, some between 0 and 1, others between 0 and 100, and so on. In general, there are 13 features, each with a different range. This point is significant because, before training, all features should be brought to the same range, and in many cases the values should lie between 0 and 1. This operation is called data normalization; without this preprocessing step, learning becomes slow and the model is hard to train. One important type of preprocessing is normalization, which means all data features should be on the same scale. One specific type is Z-score normalization, in which each feature is centered at zero and scaled to unit standard deviation (Han et al., 2011). This gives all features a comparable distribution: every feature ends up with the same mean and the same standard deviation. The mean of the train data is computed in line 6 by the mean() function and stored in a mean variable.
(6) mean = train_data.mean(axis=0)
axis=0 in the mean function means the averaging operation is executed over each column. As mentioned before, each column corresponds to a specific feature. The output of the code in line 6 is a vector of length 13.
Out[6]: array([3.74511057e+00, 1.14801980e+01, 1.11044307e+01, 6.18811881e-02,
5.57355941e-01, 6.26708168e+00, 6.90106436e+01, 3.74027079e+00,
9.44059406e+00, 4.05898515e+02, 1.84759901e+01, 3.54783168e+02,
1.27408168e+01]).
(7) train_data -= mean
In the code of line 7, the mean value is subtracted from all of the train data, so the mean of the train data becomes zero (lines 6 and 7 together accomplish this). Now we turn to the standard deviation (std), which should become one.
(8) std = train_data.std(axis=0);
(9) train_data /=std;
Lines 8 and 9 make the standard deviation of the train data equal to one. If we now take the standard deviation of the train data, it returns ones, as Out[10] confirms: when each centered feature is divided by its own standard deviation, the result has unit standard deviation.
(10) In[10] : train_data.std(axis=0)
Out[10]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
As a result of Z-score normalization, the mean is zero and the standard deviation (std) is one.
(11) test_data -=mean;
(12) test_data /=std;
Looking closely at the Python code in lines 11 and 12, the mean and std are computed from the train data; for the test data, the mean and std of the train data are used. When users want to normalize test data, this normalization should be done with the training data quantities (mean and std of the train data). A question arises here: why is the information (mean and std) of the train data used for normalizing the test data, instead of the information of the test data itself? The first reason is that when the model is in use and a new case (a new home) comes into the model for price prediction, it is impossible to compute
a standard deviation and mean from a single test sample. The second reason is that test data is, by definition, unseen data: while the model is training, the user should not give it any information about the test data, because in the real world no information is available about the new cases (test data) that will come into the model. Therefore, computing the std and mean from the test data is not allowed; it would be a kind of cheating, leaking information about the test data into the model. After loading and normalizing (preprocessing) the data, we should construct the network architecture.
Network architecture
First, the relevant modules, models and layers, should be imported from the Keras library to construct the model. Lines 13 and 14 import these modules from Keras.
(13) from keras import models
(14) from keras import layers
In line 15, a function named build_model() is defined with the keyword def; we construct the model inside this function. The reason for defining a function is that the same model will be built in several places in the program. Line 16 creates a sequential model. The sequential model is suitable for a plain stack of layers in which each layer has one input tensor and one output tensor (Krohn et al., 2019). This class is provided in the Keras engine, and the add function belongs to the Sequential class.
(15) def build_model():
(16) model = models.Sequential()
In line 17, using the add function with the layers library, a dense layer with 64 neurons and the relu activation function is created. It is important to mention that the train data has two dimensions (a two-dimensional tensor): the first dimension is the number of samples (rows) and the second dimension is the number of features (columns). Therefore, train_data.shape[0] is the number of rows (samples) and train_data.shape[1] is the number of columns (features).
(17) model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
A definition of activation functions
Activations can either be used through an “Activation” layer, or through the “activation” argument that is supported by
all forward layers. There are many activation functions such as elu, selu, softmax, softplus, tanh, sigmoid, hard_sigmoid,
softsign, exponential, linear and relu. Fig. 1 indicates the graph of relu activation function (Millstein, 2020).
In line 18, another dense layer is added with 64 neurons.
(18) model.add(layers.Dense(64, activation='relu'))
For additional explanation of activation functions such as relu and of the role of each layer, consider Fig. 2.
FIG. 1 Relu: the rectified linear unit function.
FIG. 2 The layer explanations: the input X and the layer parameters W (weights) and b (bias) produce the linear output WX + b, which is passed through an activation f(WX + b).
X (the data) is the input of the layer; each layer has two kinds of parameters, the weights (W) and the bias (b). The linear output of the layer is WX + b, which is then passed as input to a nonlinear function such as relu, elu, softmax, and so forth. This nonlinearity is what lets the network model nonlinear systems and is the basis of Artificial Neural Networks (ANN). In linear layers there is no f as in f(WX + b); the output is just the linear function WX + b.
In line 19, another dense layer is added with one neuron (Moolayil et al., 2019). Why is there one neuron in the output layer? As mentioned before, the goal of the Boston problem is to predict house prices in the Boston district; the output of this problem is a single variable (the house price), so there is one neuron in the output layer. The important point here is that there is no activation function in the output layer. When no activation function is specified, the activation is assumed to be linear. But why is no activation function used in the output layer? The answer is that most activation functions limit the output; for instance, the softmax activation function reduces the range of the output to [0, 1]. In this case, we intend to predict the price of a house, and this price can be any number; therefore, the last layer has no activation function. It is a linear layer, free to learn to predict values in any range. Generally, for regression problems, the activation function of the output layer is linear. Line 19 indicates this issue.
(19) model.add(layers.Dense(1))
The next step is assigning optimizer, loss function, and metrics.
(20) model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
In this step, as an explanation of the code in line 20, the model is compiled via the compile method. The compile method has several parameters, such as optimizer, loss, and metrics; its definition is shown below:
compile(self, optimizer, loss=None, metrics=None, loss_weights=None,sample_weight_mode=None,
weighted_metrics=None, target_tensors=None, **kwargs)
First, it is necessary to define the optimizer. The duty of the optimizer is to adjust the weights slightly, based on the loss value, in order to decrease the loss. Fig. 3 shows this mechanism clearly: it indicates the relationship between the network layers, the loss function, and the optimizer. Bringing the predictions and the true targets closer together is the job of the optimizer, and this operation is implemented by the backpropagation algorithm, the central algorithm of deep learning (DL) (Goodfellow et al., 2016).
FIG. 3 The loss value is used as a feedback signal to regulate the weights of the layers in an appropriate direction; this regulation is executed via the optimizer (elements: input X, layers (data transformations) with weights, predictions Y', true targets Y, loss function, loss score, and the optimizer performing the weight updates).
In this program, rmsprop is used; it is suggested to leave the parameters of this optimizer at their default values (except the learning rate, which can be freely tuned). This optimizer is a good choice for recurrent NNs. Another parameter of the compile function is the loss function, the quantity that the model tries to minimize. As mentioned before, the loss function is one of the parameters required to compile the model. The appropriate loss function depends on the type of problem: for example, the "mse" loss function is usually applied to regression problems, while for classification problems "binary_crossentropy" is applied to binary problems and "categorical_crossentropy" to multiclass problems. In line 20, the loss function is set to mse because, as mentioned before, this case is a regression problem (Shanmugamani, 2018). What is the definition of mse? The abbreviation mse stands for mean square error. In statistics, the mean square error (MSE) or mean square deviation (MSD) of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, i.e., the average squared difference between the estimated values and the actual values. The MSE is given in Eq. (1):
squared difference between the estimated values and the actual value. In Eq. (1), the MSE equation is bright:
n 2
1 X
MSE ¼
Y i Ybi
(1)
n i¼1
n ¼ number of data points
Yi ¼ observed value
Ybi ¼ predicted value
Another parameter of the compile function is metrics; a metric is a function used to judge the performance of the model. A metric behaves like a loss function, except that the results of evaluating a metric are not used when training the model. There are several different metrics, such as binary_accuracy, categorical_accuracy, and so on. The metric applied in this case is MAE. What is MAE? MAE stands for mean absolute error and is similar to MSE, with a small difference. Eq. (2) gives the MAE:
similar to mse with a little difference. Eq. (2) indicates the MAE.
X
n n X
e yixi i
MAE ¼
i¼1
n
¼
i¼1
n
(2)
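As a quick numerical check of Eqs. (1) and (2), both measures can be computed directly with NumPy (the observed and predicted values below are made up for illustration):

import numpy as np

y_true = np.array([15.2, 42.3, 21.1])           # observed values (arbitrary example)
y_pred = np.array([14.0, 40.0, 23.0])           # predicted values
mse = np.mean((y_true - y_pred) ** 2)           # Eq. (1)
mae = np.mean(np.abs(y_true - y_pred))          # Eq. (2)
print(mse, mae)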
In classification problems, accuracy is used to report performance, whereas in regression problems accuracy is meaningless; for a regression program, MAE is usually reported instead. For instance, an MAE of 0.5 on this problem would mean your predictions are off by 500 dollars on average. As described later, in k-fold cross-validation a model has to be built for every fold; to avoid repetition in the program, all code associated with the model has been put into one function. In code 21, this function is defined completely.
(21) def build_model():
         model = models.Sequential()
         model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
         model.add(layers.Dense(64, activation='relu'))
         model.add(layers.Dense(1))
         model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
         return model
• K-fold cross-validation
There are several ways of creating validation data; one is splitting the train data into two sections (train and validation data) (Dubitzky et al., 2007; Gollapudi, 2016), the so-called hold-out method. When the number of data points is small, as in the Boston problem (about 400 training samples), the hold-out method is not efficient: splitting such a small data set into two sections leaves too little training data, and the resulting estimate of model performance is not reliable. A way of solving such problems is K-fold cross-validation (Provost and Fawcett, 2013). In the K-fold procedure, the data is divided into k equal folds; Fig. 4, for instance, shows a threefold cross-validation. After dividing the data into k folds, the model is trained k times, for example three times: in run 1 (fold 1), part 1 becomes the validation set and the two remaining parts become the training set, and runs 2 and 3 are executed analogously by moving the validation section through the data set. In each fold, the score of the model (validation score) is assessed and reported, and finally the average of these scores is computed and taken as the model accuracy. As mentioned before, the idea of K-fold cross-validation is appropriate for small datasets such as the Boston
FIG. 4 The idea of K-fold cross-validation: the data is split into three partitions; in each fold one partition is used for validation and the remaining partitions for training, giving validation scores #1–#3, and the final score is their average.
dataset. The computational process becomes more demanding in K-fold cross-validation because k different models must be trained, whereas in the previous cases one model was trained and then evaluated with a single validation set; nevertheless, the K-fold method is more reliable than the other methods, particularly for small datasets. Fig. 4 indicates the idea of K-fold cross-validation in detail.
The following explains the k-fold cross-validation code in detail. In this example, fourfold cross-validation is assumed.
(22) import numpy as np
(23) k = 4  # number of folds
(24) num_val_samples = len(train_data) // k => the number of validation samples is calculated here. Note that // denotes integer (floor) division: for instance, 5/4 = 1.25 but 5//4 = 1, i.e., it returns an integer.
(25) num_epochs = 500 => number of epochs.
(26) all_scores = [ ] => this list stores the score of each fold, so its final length will be 4.
Now we should prepare the data: in lines 27 and 28, the validation data and targets (labels) are prepared separately. For example, when i = 0, val_data = train_data[0:101] and the validation targets are val_targets = train_targets[0:101].
(27) val_data = train_data[i * num_val_samples : (i+1) * num_val_samples]
(28) val_targets = train_targets[i * num_val_samples : (i+1) * num_val_samples]
In lines 29 and 30, the partial train data and partial train targets are prepared:
(29) partial_train_data = np.concatenate(
[train_data[:i*num_val_samples],
train_data[(i+1)*num_val_samples:]],
axis=0)
(30) partial_train_targets = np.concatenate(
[train_targets[:i*num_val_samples],
train_targets[(i+1)*num_val_samples:]],
axis=0)
To explain code lines 29 and 30, assume i = 1 and, to simplify, abbreviate num_val_samples as nvs. The validation data is then val_data = train_data[nvs:2*nvs], while partial_train_data is the concatenation of train_data[:nvs] and train_data[2*nvs:]; concatenate means attaching these two ranges of data together. Now, to make a new model, call the build_model() function:
(31) model = build_model()
The model should now be trained on partial_train_data using the fit function. As mentioned before, the training process is executed by the fit function, which has several parameters:
fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1,callbacks=None, validation_split=0.,
validation_data=None, shuffle=True,class_weight=None, sample_weight=None, initial_epoch=0,
steps_per_epoch=None,validation_steps=None, validation_freq=1, max_queue_size=10,
workers=1,use_multiprocessing=False, **kwargs)
(32) model.fit(partial_train_data,
partial_train_targets,
epochs=num_epochs,
batch_size=1,
verbose=0)
As can be seen, the number of epochs is 500, and the important point is that the verbose parameter is zero. What is the use of verbose while training the model? By setting verbose to 0, 1, or 2, you specify how you want to see the training progress for each epoch:
verbose = 0 shows nothing (silent);
verbose = 1 shows an animated progress bar, like [==============================];
verbose = 2 just mentions the epoch number, like: Epoch 1/10.
Therefore, the code in line 32 executes the training operation on the partial train data. Now the trained model should be evaluated on the validation data, and the mae (mean absolute error) measured in this step is appended to the all_scores list:
(33) val_mse, val_mae = model.evaluate(val_data,val_targets,verbose=0)
(34) all_scores.append(val_mae)
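For reference, code lines 22–34 belong inside a single loop over the folds; a consolidated sketch of that loop, using the variables and the build_model() function defined above, looks like this:

all_scores = []
for i in range(k):
    # Fold i: slice out the validation block and concatenate the rest for training.
    val_data = train_data[i * num_val_samples : (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples : (i + 1) * num_val_samples]
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples], train_data[(i + 1) * num_val_samples:]], axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples], train_targets[(i + 1) * num_val_samples:]], axis=0)
    model = build_model()
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)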
Because k = 4, the loop is repeated four times; as a result, four mae values are obtained for the validation data, and the final mae is the average of all maes in the all_scores list. It is worth mentioning that k-fold cross-validation is an important topic in machine learning and data mining, and ready-made implementations are available in scikit-learn and other machine learning libraries, so it is not strictly necessary to write this code from scratch. Now run the code and print all_scores:
(35) print(all_scores) => [2.018010139465332, 2.5180163383483887, 2.4935202598571777,
2.712078809738159] => with the number of epochs set to 500
As can be seen, there are four folds and one mae for each fold. As mentioned before, we defined a list known as all_scores and saved the mae of each fold in it. Averaging these maes gives:
(36) np.mean(all_scores)
=> 2.4354063868522644
This means that the average mean absolute error over all folds is approximately 2.43. The idea of k-fold is that, instead of making train and validation data one time, the train and validation split is made several times and the resulting maes are averaged; in this way, the output of the method is more reliable. As the code stands, the final result of the program is a single mae. If we want to record the status of the program in each fold and at every step (epoch) and save them (for instance, if the
program has 500 epochs, the modified code saves 500 maes per fold), we should modify the code a bit. The output of this code is effectively a matrix with 500 rows and four columns.
(37) history = model.fit(partial_train_data,partial_train_targets,
validation_data=(val_data, val_targets),
epochs=num_epochs,
batch_size=1,
verbose=0)
mae_history = history.history['val_mean_absolute_error']
all_mae_histories.append(mae_history)
Each mae_history is saved in a list known as all_mae_histories, but first we should define it: =>
all_mae_histories = [ ]
The output of this code (line 37) is a list with 4 elements, and every element in this list is itself a list containing the 500 mean absolute errors (mae), one for each epoch. The structure is like this:
[[MAE]_500×1, [MAE]_500×1, [MAE]_500×1, [MAE]_500×1]
Each list in this list belongs to one specific fold. With the epoch number set to 500, the output of the program (validation MAE) should be plotted to examine it precisely. But what does this figure mean? For more explanation, consider the list below:
Fold1
[[MAE1, MAE2, MAE3, .................,MAE500],
Fold2
[MAE1, MAE2, MAE3, .................,MAE500],
Fold3
[MAE1, MAE2, MAE3, .................,MAE500],
Fold4
[MAE1, MAE2, MAE3, .................,MAE500]]
When calculating the model performance after epoch 1, you should average all MAE1 values. All code for this operation and for plotting the results is reported below:
(38) average_mae_history = [np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
(39) import matplotlib.pyplot as plt
plt.plot(range(1, len(average_mae_history)+1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Looking closely at Fig. 5, the MAE decreases rapidly in the earlier epochs and then, little by little, begins to increase as the model starts to overfit; around epoch 80 the model has started to overfit. Two changes are applied to make the variations in the plot easier to recognize: first, the earliest epochs are removed because they carry no useful information; second, the sharp fluctuations in the plot are removed, a task called plot smoothing. One way of smoothing the plot and removing noise is averaging. To remove the sharpness and smooth the plot, the code below is written.
(40) def smooth_curve(points,factor=0.9):
smoothed_points=[]
for point in points:
if smoothed_points:
previous = smoothed_points[-1]
smoothed_points.append(previous*factor+point*(1-factor))
else:
smoothed_points.append(point)
return smoothed_points
(41) smoothed_mae_history=smooth_curve(average_mae_history[10:])
(42) plt.plot(range(1, len(smoothed_mae_history)+1), smoothed_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
FIG. 5 Output of the network (validation MAE per epoch).
First, in line 40, smooth_curve is defined; the duty of this function is to remove sharpness from the validation MAE plot. Then, in line 41, all data from epoch 10 onward are passed to this function, and in line 42 the diagram is plotted via the matplotlib library (a standard Python library for plotting). The resulting plot is much clearer and more explicit than the previous one. As can be seen, after about epoch 40 the network starts to overfit; therefore, epoch 40 is the best point at which to stop training. Fig. 6 shows the output of the network with epochs 1–10 omitted and the plot smoothed by the smooth function. The type of averaging applied in this example is the exponential moving average (Khosrow-Pour, 2012).
After determining the hyperparameter (number of epochs) in the previous step, the network should now be retrained on all the train data with 40 epochs. This task is executed with the fit method. Then the retrained network is evaluated on the test data, and finally the test_mae_score is reported.
FIG. 6 Validation MAE after applying the smooth_curve function to the data.
(43) model=build_model()
(44) model.fit(train_data,train_targets,epochs=40,batch_size=16,verbose=0)
(45) test_mse_score,test_mae_score=model.evaluate(test_data,test_targets)
>>>> test_mae_score = 2.845837354660034
This means the error of the network is about 2.8, in other words about 2.8 thousand dollars. For instance, if the price of a house is $15,000, this network predicts a price roughly between $12,200 and $17,800.
Now we are going to work through a project on nutrient removal efficiency, using a data set containing 7876 records to analyze the total nitrogen (TN) removal efficiency of an anaerobic-anoxic-oxic membrane bioreactor system. About 5000 records were set aside for training the network and the remainder were allocated to test data. In this example, the K-fold cross-validation method was used for training the network. The output values are predicted from the 9 inputs presented in Table 1. This dataset was taken from the data reported in a paper published by Yaqub et al. in 2020 (Brownlee, 2018).
At first, we consider K = 2 and execute the K-fold cross-validation method on the training data. According to Fig. 7, we can see visually that overfitting begins around epoch 70. To remove the oscillations in the figure, the smooth function was used to view the overfit point more clearly; it is concluded that overfitting begins around epoch 75, as shown in Fig. 8. After magnifying the curve of Fig. 8 around the overfit point, the exact value of the overfit point is found to be epoch 75, as can be seen in Fig. 9.
After completing the run of cross-validation code, the test MAE and MSE values are obtained:
test_mae_score: 3.131399154663086
test_mse_score: 21.55681976617551
In the second run, we consider K = 5. After training the network via the K-fold cross-validation code, Figs. 10–12 were obtained; they show that overfitting starts after epoch 47. After setting the number of epochs to 47 and evaluating the retrained network on the test data, the MAE and MSE values are obtained:
test_mae_score: 2.9801080226898193
test_mse_score: 19.03835214353075
Finally, after a complete run of the cross-validation code with k = 7, it was concluded that overfitting starts from epoch 40, and the following diagrams are reported for the output (Figs. 13–15):
TABLE 1 Attribute information of the nutrient removal efficiency project.
Code           Input or output   Description
TOC            Input             Total organic contents
TN             Input             Total nitrogen
TP             Input             Total phosphorous
COD            Input             Chemical oxygen demand
NH4-N          Input             Ammonium
SS             Input             Suspended solids
DO             Input             Dissolved oxygen
ORP            Input             Oxidation-reduction potential
MLSS           Input             Mixed liquor suspended solids
RE of NH4-N    Output            Removal efficiency of NH4-N
RE of TN       Output            Removal efficiency of TN
RE of TP       Output            Removal efficiency of TP
FIG. 7 The output of the network (validation MAE per epoch).
FIG. 8 Validation MAE after applying the smooth_curve function to the data.
After running the network on the test data with 40 epochs, the MAE and MSE values are reported below.
test_mae_score: 2.796928882598877
test_mse_score: 15.402697095683976
By comparing the values obtained from the data related to this project, it can be inferred that with increasing k, the mae (and
also mse) values decrease. Of course, it should not be forgotten that with increasing k, the computational time increases
sharply and a logical balance must be struck between these two inverse factors.
FIG. 9 Validation MAE after applying the smooth_curve function to the data and magnifying around the overfit point.
FIG. 10 The output of the network (validation MAE per epoch).
FIG. 11 Validation MAE after applying the smooth_curve function to the data.
FIG. 12 Validation MAE after applying the smooth_curve function to the data and magnifying around the overfit point.
FIG. 13 The output of the network (validation MAE per epoch).
FIG. 14 Validation MAE after applying the smooth_curve function to the data.
FIG. 15 Validation MAE after applying the smooth_curve function to the data and magnifying around the overfit point.
4. Conclusions
In this chapter, K-fold cross-validation is explained in detail and a famous and familiar example (predicting Boston house prices) is presented. As mentioned before, there are two approaches for implementing cross-validation: the first uses the ready-made machine learning library in Python known as scikit-learn, and the second codes cross-validation in Python from scratch. For a better understanding of the cross-validation concept, we chose the second approach, applied to a deep learning model. The result obtained for the Boston problem indicates that the mean absolute error is about 2.84, which means the predicted price differs by about $2800 from the actual value. Also, by analyzing the data of the nutrient removal efficiency project, it was concluded that with increasing k the obtained MAE values decrease while the calculation time increases sharply.
References
Berrar, D., 2019. Cross-validation. In: Encyclopedia of Bioinformatics and Computational Biology. vol. 1. Elsevier, pp. 542–545.
Brownlee, J., 2016. Machine Learning Mastery With Python. Machine Learning Mastery Pty Ltd, pp. 100–120.
Brownlee, J., 2018. Statistical Methods for Machine Learning: Discover How to Transform Data Into Knowledge With Python. Machine Learning
Mastery.
Brunton, S.L., Kutz, J.N., 2019. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press.
Dangeti, P., 2017. Statistics for Machine Learning. Packt Publishing Ltd.
De Prado, M.L., 2018. Advances in Financial Machine Learning. John Wiley & Sons.
Deisenroth, M.P., Faisal, A.A., Ong, C.S., 2020. Mathematics for Machine Learning. Cambridge University Press.
Dua, S., Du, X., 2016. Data Mining and Machine Learning in Cybersecurity. CRC Press.
Dubitzky, W., Granzow, M., Berrar, D.P., 2007. Fundamentals of Data Mining in Genomics and Proteomics. Springer Science & Business Media.
El Naqa, I., Li, R., Murphy, M.J., 2015. Machine Learning in Radiation Oncology: Theory and Applications. Springer.
Fontaine, A., 2018. Mastering Predictive Analytics With Scikit-Learn and TensorFlow: Implement Machine Learning Techniques to Build Advanced
Predictive Models Using Python. Packt Publishing.
Geron, A., 2019. Hands-on Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
O’Reilly Media.
Gollapudi, S., 2016. Practical Machine Learning. Packt Publishing Ltd.
Goodfellow, I., et al., 2016. Deep Learning. vol. 1 MIT Press, Cambridge.
Hadizadeh, R., Eslamian, S., 2017. Modeling hydrological process by ARIMA–GARCH time series. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of
Drought and Water Scarcity, Vol. 1: Principles of Drought and Water Scarcity. Taylor and Francis, CRC Press, USA, pp. 571–590 (Chapter 30).
Han, J., Pei, J., Kamber, M., 2011. Data Mining: Concepts and Techniques. Elsevier.
Jansen, S., 2018. Hands-On Machine Learning for Algorithmic Trading: Design and Implement Investment Strategies Based on Smart Algorithms That
Learn From Data Using Python. Packt Publishing Limited.
Kane, F., 2017. Hands-on Data Science and Python Machine Learning. Packt Publishing Ltd.
Karim, M.R., Kaysar, M.M., 2016. Large Scale Machine Learning with Spark. Packt Publishing Ltd.
Khosrow-Pour, M., 2012. Machine Learning: Concepts, Methodologies, Tools and Applications. Information Science Reference, Hershey, PA, USA.
Krohn, J., Beyleveld, G., Bassens, A., 2019. Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence. Addison-Wesley
Professional.
Kumar, R., 2019. Machine Learning Quick Reference: Quick and Essential Machine Learning Hacks for Training Smart Data Models. Packt
Publishing Ltd.
Lei, J., 2019. Cross-validation with confidence. J. Am. Stat. Assoc. 115 (532), 1–20.
Millstein, F., 2020. Convolutional Neural Networks in Python: Beginner’s Guide to Convolutional Neural Networks in Python. Frank Millstein.
Moolayil, J., Moolayil, J., John, S., 2019. Learn Keras for Deep Neural Networks. Springer.
Müller, A.C., Guido, S., 2016. Introduction to Machine Learning With Python: A Guide for Data Scientists. O'Reilly Media, Inc.
Provost, F., Fawcett, T., 2013. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O’Reilly Media, Inc.
Raschka, S., 2015. Python Machine Learning. Packt Publishing Ltd.
Raschka, S., Mirjalili, V., 2017. Python Machine Learning. Packt Publishing Ltd.
Richert, W., 2013. Building Machine Learning Systems With Python. Packt Publishing Ltd.
Sarkar, D., Bali, R., Sharma, T., 2018. Practical Machine Learning With Python. A Problem-Solvers Guide to Building Real-World Intelligent Systems.
Apress, Berkely.
Shanmugamani, R., 2018. Deep Learning for Computer Vision: Expert Techniques to Train Advanced Neural Networks Using TensorFlow and Keras.
Packt Publishing Ltd.
Swamynathan, M., 2019. Mastering Machine Learning With Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using
Python. Apress.
Viswanathan, V., et al., 2016. R: Recipes for Analysis, Visualization and Machine Learning. Packt Publishing Ltd.
Zaman, H.B., et al., 2019. Advances in Visual Informatics: 6th International Visual Informatics Conference, IVIC 2019, Bangi, Malaysia, November
19–21, 2019, Proceedings. vol. 11870 Springer Nature.
Chapter 6
Comparative study on the selected node
and link-based performance indices
to investigate the hydraulic capacity
of the water distribution network
C.R. Suribabu(a) and P. Sivakumar(b)
(a) Centre for Advanced Research in Environment, School of Civil Engineering, SASTRA Deemed University, Thanjavur, Tamil Nadu, India, (b) Department of Civil Engineering, North Eastern Regional Institute of Science and Technology, Nirjuli (Itanagar), Arunachal Pradesh, India
1. Introduction
A water distribution system failure is generally viewed as a condition in which the network does not supply water to the consumers with the full degree of satisfaction. This can happen due to failure of pumps, power outages, pipe breakage, valve malfunction, changes in the demand pattern, increase in pipe roughness due to age, water shortage at the source, an inadequate storage system, or inadequate pipe size; among these, pump failure, pipe breakage, power outages, and valve malfunction create a shortage or complete interruption of water supply to the consumer until the failure is rectified. If the failure is rectified in a short duration, then the system can be restored quickly. The degree of satisfaction of the network under such circumstances can be measured as the resilience of the water distribution network (WDN). If the availability of service pressure at all the demand nodes is ensured through the design of the network whatever type of failure happens, then the network is said to be a resilient system. In most cases, the reliability of water availability has not been considered while estimating the resilience of the network, since source reliability is treated as an independent parameter. The resilience indices proposed by Todini (2000) and Jayaram and Srinivasan (2008) quantify resilience through the availability of excess service pressure. Ensuring higher pressure at the nodes can compensate for a shortfall of pressure during failures.
Several performance measures are used to assess the capability of a network to perform the task for which it is designed. In any system, random failure is inevitable, and overcoming or minimizing its effect is one of the key factors when designing a water distribution system. The upper and lower limits of a measure can be set based on the level at which the system needs to perform under abnormal conditions. The cost of the system increases as a higher performance level is demanded, so the design choice becomes how best the system performance can be improved within the budget available for implementing the WDN. A few measures commonly adopted in design are described below. The resilience concept for a water supply system can be viewed in terms of resilient infrastructure. Wang et al. (2009) stated that a resilient infrastructure is one that shows (a) reduced failure probabilities, (b) reduced consequences of failure, and (c) reduced time to recovery. Howard and Bartram (2010) formulated the resilience of piped water supply as a function of the resilience of the individual components of the system, namely the source, treatment, distribution through primary, secondary, and tertiary pipes, and in-system storage infrastructure. Jeong and Kang (2020) presented a new link-based reliability index called the hydraulic uniformity index, which considers the head-loss distribution in the entire network. They introduced the concepts of equivalent head loss and equivalent hydraulic gradient, whose values are compared with the actual values for each pipe in the network. In this measure, the hydraulic uniformity of the network is quantified from an inherent property of the network's configuration rather than from the overall head loss permitted while designing the network.
The reliability of a water distribution network (WDN) decreases due to the aging of its pipes (Mazumder et al., 2019). If the pipe material is prone to corrosion, aging poses an increased resistance to flow, mainly due to roughness projections on the inner surface, and the strength of the pipe decreases further as a result of corrosion on the outer surface where the pipe material interacts with the soil. Commonly, the reliability of a water distribution network is quantified based on
the probability of failures during the design period, in which pipe failures are commonly taken as the prime factor affecting system reliability. Guercio and Xu (1997) defined the reliability of a WDN with respect to pipe breakage as the probability that the network can supply the design demand when some components go out of service. A typical single-objective optimization model for the design of WDNs tries to minimize the cost of investment in procuring pipes. The minimum-cost solution, however, may not fulfill the intended purpose during an abnormal operational condition. Except for dead-end pipelines, most pipelines carry flow destined for other pipelines; hence, the failure of one pipe seriously affects the demand of several nodes under abnormal operating conditions. Increasing the carrying capacity of each pipe at minimum overall capital cost has therefore become a new objective, and maintaining good carrying capacity at minimum cost invited multiobjective formulations for the design of WDNs, in which maximizing either reliability or resilience became an important second objective. The failure of a single pipe is considered to have a serious effect on demand satisfaction; the failure of a combination of several pipes would have an enormous effect on the system, but the probability of two or more pipes failing together is very small, as such incidents seldom happen (Reed et al., 2010; Paez and Filion, 2019). Providing excess pressure to all the demand nodes has therefore been the basis of several performance measures. Paez and Filion (2019) presented two reliability estimators for performance assessment of WDNs, in which the mechanical reliability estimator considers both the probability of failure of components and the proportion of supply maintained under failure. They suggested that the proportion of supply during a particular pipe failure equals the difference between the total base demand and the flow through the failed pipe under the abnormal condition. The hydraulic reliability estimator is formulated to evaluate the expected value of demand that is met under all possible demand variations. Mazumder et al. (2019) considered the change in pipe roughness due to aging for hydraulic performance analysis. The failure of pipes, pumps, valves, and power supply can usually be rectified in less than 24 h, so normal supply can be restored within 48 h in the worst conditions. Cities and towns provided with a continuous water supply system can be seriously affected if failures are not rectified on a war footing, and it is the responsibility of the respective water supply agency to have an emergency plan to restore the system upon failure. Fujiwara and De Silva (1990) specified a mean time to repair of 2 days irrespective of the pipe diameter. Standby units for pumps and generators can help greatly in restoring the system within a short interval of time, and water can be supplied by tanker truck in the area affected by a pipe failure during the repair period. A serious factor that reduces the reliability of water supply is the increase of pipe roughness, which needs to be quantified; its effect on water supply should be considered at the design stage itself and also during regular performance assessments. One way to consider it at the design stage is to take the roughness value corresponding to half of the service life instead of the roughness of a new pipe. It is common, while designing the network, to consider the peak demand corresponding to the planning period as per guidelines. The aging of a pipeline not only increases pipe breakage but also increases the roughness of the pipe surface; increasing inner roughness reduces the designed supply and increases the probability of component failures.
The availability of a WDN can be described as the percentage of time for which it will be able to deliver the design supply at the desired service pressure. The time taken to repair and restore service to normal is a crucial parameter in measuring availability, and reliability is closely related to availability. Although the aging of water pipes and other components may not affect their availability, their ability to fulfill the design demand under aged conditions may be lost; the system reaches a state in which it is available but not reliable. Sizing the pipes such that they can satisfy the design demand irrespective of their age gives the network the ability to recover after damage and failure. When the network is designed with more nodal pressure than is actually required at the beginning of the planning horizon, it will be able to provide supply despite the increase in head loss due to roughness, increase in demand, and damage to the system. During a restoration period, the network can still provide its best service when it is designed with such self-healing characteristics. Under conditions of full water availability at the source, the degree to which water can be supplied during failures is described as the resilience of the network, and ensuring service pressure at all times in the network is essential for good resiliency. An investigation by Swati and Janga Reddy (2020) on two benchmark networks and a real network concluded that resilience can be a good surrogate measure for hydraulic reliability rather than for mechanical reliability. Earlier, Reed et al. (2010) and Banos et al. (2011) highlighted that the resilience index fails to represent the mechanical reliability of a water distribution network. Walski (2020) mentioned that the reliability of a water distribution network depends heavily on redundancy and on the availability of more isolation valves, rather than on increasing pipe capacity while designing the network; the systematic placement of valves is important because a single pipe failure can otherwise isolate several pipes from supplying the nodes. Hence, a network design based on a hydraulic reliability indicator can help to tackle demand variation and aging effects on the supply. Maintaining a uniform flow distribution among the pipes can reduce the burden on any particular pipe of carrying the inflow to a node (Moosavian and Lence, 2020), and a network designed with uniform head loss among all the pipes can have good performance characteristics. Hashemi et al. (2020) investigated the effect of pipe size and of the location of the pipe relative to the source on the head loss, using 18 water distribution systems in
North America. The study suggested that the flow rate in the pipe is a more influential factor than the pipe diameter in causing head loss for pipes located near the water source, whereas at the periphery of the network the pipe diameter is found to be more critical than the flow rate. In the present work, another new link-based index is proposed, which takes the maximum permitted head loss in the network as the reference while designing the network. The relative head loss in each pipe is computed with reference to the average head loss per unit length of pipe. This work also illustrates how pipe dimensioning can be carried out to obtain uniformly distributed head loss across the network using the average unit head loss, without much computational effort.
2. Resilience of water distribution network
Todini (2000) proposed a resilience index to address the intrinsic capability of a system to overcome failures. If the system is designed from a resilience point of view, it can have a reduced number of failure instances, minimum or reduced failure consequences, and the ability to recover quickly from failure. Resilience can also be considered as a measure of the capability of the system to absorb shocks, or to perform under perturbation. Wu et al. (2011) applied the surplus power factor as a resilience measure in the optimal design of a water transmission system. Yazdani and Jeffray (2012) combined robustness and redundancy to address the resilience of a water distribution system. In water resources systems, resilience is considered one of the important measures for quantifying the capacity of the system to maintain its essential functions during unexpected stresses and disturbances (Liu et al., 2012).
The resilience index (RI) proposed by Todini (2000) is the ratio of the sum of the residual power at all the nodes to the sum of the potential residual power at all the nodes:

RI = \frac{\sum_{j=1}^{N} ND_j \left( h_{avl,j} - h_{min,j} \right)}{\sum_{i=1}^{R} Q_{r,i}\, h_{res,i} + \sum_{b=1}^{B} P_b/\gamma - \sum_{j=1}^{N} ND_j\, h_{min,j}}   (1)

where
ND_j = demand at node j
h_{avl,j} = available pressure head at node j
h_{min,j} = minimum pressure head at node j
Q_{r,i} = flow from reservoir i
h_{res,i} = sum of the reservoir elevation and the water level of reservoir i
P_b = capacity of pump b
\gamma = specific weight of the liquid
N = number of nodes
R = number of reservoirs
B = number of pumps
Later, Jayaram and Srinivasan (2008) proposed a modified resilience index (MRI), defined as the ratio of the sum of the residual power to the sum of the minimum power required at all the nodes:

MRI = \frac{\sum_{j=1}^{N} ND_j \left( h_{avl,j} - h_{min,j} \right)}{\sum_{j=1}^{N} ND_j\, h_{min,j}}   (2)
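As a minimal illustration of Eqs. (1) and (2), the following Python sketch computes RI and MRI from nodal demands and heads obtained from any hydraulic solver; the function name and the example values are hypothetical and are not taken from the chapter.

```python
import numpy as np

def resilience_indices(demand, h_avl, h_min, q_res, h_res, pump_power=(), gamma=9.81):
    """Todini's RI (Eq. 1) and the modified resilience index MRI (Eq. 2).

    demand, h_avl, h_min : per-node demand (m3/s) and available/minimum heads (m)
    q_res, h_res         : per-reservoir outflow (m3/s) and total head (m)
    pump_power           : pump powers (kW); gamma is the specific weight (kN/m3)
    """
    demand, h_avl, h_min = map(np.asarray, (demand, h_avl, h_min))
    residual = np.sum(demand * (h_avl - h_min))              # surplus power delivered at the nodes
    supplied = np.dot(q_res, h_res) + np.sum(np.asarray(pump_power)) / gamma
    required = np.sum(demand * h_min)                        # minimum power required at the nodes
    ri = residual / (supplied - required)
    mri = residual / required
    return ri, mri

# Hypothetical three-node example fed by a single reservoir
ri, mri = resilience_indices(demand=[0.05, 0.03, 0.02],
                             h_avl=[38.0, 35.0, 33.0],
                             h_min=[30.0, 30.0, 30.0],
                             q_res=[0.10], h_res=[100.0])
print(round(ri, 3), round(mri, 3))
```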
While Todini's (2000) resilience index theoretically varies between 0 and 1 (poor and good, respectively), the modified resilience index of Jayaram and Srinivasan (2008) can exceed 1, so fixing upper and lower bounds is quite challenging if the index is used as a constraint. In several studies, maximizing the resilience of the water supply system is taken as either the first or the second objective function (Prasad and Park, 2004; Fu et al., 2013; Wang et al., 2014; Ostfeld et al., 2014; Moosavian and Lence, 2020). Maximization of the reliability of water supply is another commonly used second objective after cost minimization in multiobjective optimization models (Walski, 2020). Walski (2020) indicated that a reliability indicator developed only to address the excess capacity of the pipes does not address the true reliability of a WDN. In practice, the demand nodes located near the source have more surplus pressure energy than faraway nodes along the flow path or the identified critical nodes in the network. Utilizing the pressure head available at those nodes to meet additional
demand will certainly affect the supply at the critical nodes. Critical nodes can be identified as nodes located on higher ground, nodes remotely connected to the source, high-demand nodes, and nodes with many incident pipes. At those nodes, the available surplus power is negligible, and supply to the consumers cannot be maintained fully during abnormal conditions. Hence, the availability of a minimum surplus power at the critical nodes is crucial in quantifying the intrinsic capability of the system. The major advantage of Todini's and of Jayaram and Srinivasan's resilience indices is that they do not involve statistical considerations of failures. Network reliability can be enhanced if the network is designed to have higher values of these indices. Todini (2000) indicated that increasing resilience is possible if the flow is distributed more evenly among all the pipes rather than being concentrated in a spanning tree.
3. Hydraulic uniformity index (HUI)
Jeong and Kang (2020) introduced a link-based index called the hydraulic uniformity index (HUI) to evaluate system design. For an individual pipe, HUI is the ratio of the hydraulic gradient of that pipe to the equivalent hydraulic gradient; for the entire network, HUI_sys is given as follows:

HUI_{sys} = \sqrt{\frac{\sum_{i=1}^{NP} \left( HUI_i - 1 \right)^2}{NP}}   (3)
The HUI_i of each pipe is computed using the following formula:

HUI_i = \frac{hl_i \sum_{i=1}^{NP} L_i \sum_{i=1}^{NP} Q_i}{L_i \cdot NP \cdot \sum_{i=1}^{NP} Q_i\, hl_i}   (4)
where
HUI_i = hydraulic uniformity index of the ith pipe
Q_i = flow in the ith pipe
L_i = length of the ith pipe
NP = number of pipes
hl_i = head loss in the ith pipe.
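A minimal Python sketch of Eqs. (3) and (4) is given below; the pipe flows, lengths, and head losses are assumed to come from a hydraulic simulation, and the numerical values are illustrative only.

```python
import numpy as np

def hydraulic_uniformity(flow, length, headloss):
    """Per-pipe HUI (Eq. 4) and the network value HUI_sys (Eq. 3)."""
    q, L, hl = map(np.asarray, (flow, length, headloss))
    n_pipes = len(q)
    # Ratio of each pipe's hydraulic gradient to the flow-weighted equivalent gradient
    hui = (hl * L.sum() * q.sum()) / (L * n_pipes * np.sum(q * hl))
    hui_sys = np.sqrt(np.sum((hui - 1.0) ** 2) / n_pipes)
    return hui, hui_sys

# Hypothetical four-pipe example (flows in m3/s, lengths in m, head losses in m)
hui, hui_sys = hydraulic_uniformity(flow=[0.12, 0.08, 0.05, 0.02],
                                    length=[1000, 800, 1200, 600],
                                    headloss=[3.1, 2.4, 3.8, 0.9])
print(hui.round(3), round(hui_sys, 3))
```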
4. Mean excess pressure (MEP)
The mean excess pressure for a network can be found using the following expression:

MEP = \frac{\sum_{i=1}^{N} \left( h_{avl,i} - h_{min,i} \right)}{N}   (5)

where
N = number of nodes.
The MEP helps to compare the excess pressure availability of networks. Two networks of different cost can have the same mean excess pressure while performing quite differently under abnormal conditions, so this measure is useful when assessing the performance of different network configurations.
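In code, Eq. (5) reduces to a single averaging step; the node heads below are illustrative.

```python
import numpy as np

h_avl = np.array([38.0, 35.0, 33.0, 41.0])   # available heads at the nodes (m), illustrative
h_min = 30.0                                  # minimum required head (m)
mep = np.mean(h_avl - h_min)                  # Eq. (5): mean excess pressure (m)
print(round(mep, 2))
```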
5. Proposed measure
5.1 Energy loss uniformity (ELU)
The minimum pressure head required at a node is generally defined while designing the WDN. Primarily, the source head acts as the driving force that transmits water from the source to the consumer nodes. Part of the available source head is
dissipated in the pipeline to overcome friction and minor losses. The maximum loss can be considered as the difference between the source head (hs) and the minimum pressure head (hmin). This maximum permitted head loss is a key parameter in the design of the WDN, and the economics of the network with respect to pipe size depends on how well this maximum head loss is utilized in selecting the network configuration. Configuring the network with the minimum possible pipe sizes gives the maximum head loss in each pipe, since head loss is inversely related to pipe size; increasing the pipe size reduces the head loss, and limiting the pipe size with respect to the average unit head loss can provide a balanced configuration. The average unit head loss is computed as the ratio of the maximum head loss to the total length of pipeline (L) in the network:
UHL_{avg} = \frac{h_s - h_{min}}{L}   (6)

where
h_s = source pressure head.
Maintaining the unit head loss in each link close to the average unit head loss (UHLavg) can eliminate both over- and under-sizing of the pipes in the network. It may not be feasible to have exactly the same unit head loss in each link, since the pipe sizes must be chosen from the available commercial sizes. If the ratio between the actual unit head loss in a pipe and the average unit head loss of the network equals one, the pipe is sized exactly according to the average unit head loss; a much larger ratio denotes that the pipe is undersized, and vice versa. In the optimal design of a WDN, the pipe sizing is performed to minimize cost subject to the minimum pressure head requirement as the major constraint. This specific head loss (HLspecific) can be considered as one of the design parameters for sizing the pipeline or for addressing the uniformity of the network:
HL_{specific} = \frac{UHL_{actual}}{UHL_{avg}}   (7)
The uniformity of the network's pipe configuration in terms of head loss (energy loss uniformity) can be computed from HLspecific by finding its standard deviation with respect to one, as follows:

ELU = \sqrt{\frac{\sum_{i=1}^{NP} \left( HL_{specific,i} - 1 \right)^2}{NP}}   (8)
where
NP is the number of pipes in the network.
ELU assesses the degree to which the network configuration possesses uniformity of pipe capacity to meet the nodal demands. A low value of this measure indicates good uniformity in the pipe configuration, whereas a higher value indicates that the network may be composed of both redundant pipes and pipes of deficient capacity. Redundant pipes in the network may help to augment the flow during abnormal conditions (e.g., a pipe failure scenario or a period of excess demand), but the cost of the network will be higher. A network with more deficient pipes could be a least-cost one, but its performance under abnormal conditions will be poor. The proposed ELU can be used both for the design of new networks and to assess the pipe uniformity of an existing system.
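The following Python sketch ties Eqs. (6) to (8) together for a given design; the per-pipe head losses are assumed to come from a demand-driven simulation of the configuration being assessed, and the numerical values are placeholders.

```python
import numpy as np

def energy_loss_uniformity(h_source, h_min, lengths, headlosses):
    """Energy loss uniformity (Eq. 8) built on the average unit head loss (Eq. 6)."""
    L = np.asarray(lengths, dtype=float)        # pipe lengths (m)
    hl = np.asarray(headlosses, dtype=float)    # simulated head loss in each pipe (m)
    uhl_avg = (h_source - h_min) / L.sum()      # Eq. (6): average unit head loss (m/m)
    uhl_actual = hl / L                         # actual unit head loss in each pipe
    hl_specific = uhl_actual / uhl_avg          # Eq. (7)
    return np.sqrt(np.sum((hl_specific - 1.0) ** 2) / len(L))

# Illustrative values: 100 m source head, 30 m minimum service head
elu = energy_loss_uniformity(h_source=100.0, h_min=30.0,
                             lengths=[1000, 800, 1200, 600],
                             headlosses=[22.0, 15.0, 18.0, 9.0])
print(round(elu, 3))
```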
6. Hanoi network
The Hanoi network has been widely used by researchers to examine the applicability of various types of optimization algorithms and to assess the performance of WDNs using performance indices. It is categorized as a medium-size network with one reservoir, 31 nodes, and 34 pipes. The source elevation is fixed at 100 m with respect to zero ground elevation, and all the nodes are at the same elevation with zero reduced level. A minimum service pressure of 30 m is required at all the demand nodes. To examine the proposed energy loss uniformity (ELU), various optimal network configurations reported in the literature (Suribabu, 2010, 2012, 2017; Beygi et al., 2014; Saldarriaga et al., 2020) have been considered. The layout of the network, pipe lengths, and demand details are available in Suribabu (2010). Further, the resilience index (Todini, 2000; Eslamian et al., 2019), the modified resilience index (Jayaram and Srinivasan, 2008), and the hydraulic uniformity index (Jeong and Kang, 2020) are considered for a comparative study to assess the performance of the network.
7. Results and discussion
The optimal solutions available for the Hanoi network in the literature are used to evaluate its resilience, hydraulic uniformity, and energy loss uniformity. ELU, MEP, RI, MRI, and HUIsys were calculated for 17 selected solutions, and Table 1 gives the computed values of these indices together with the cost of each network. The general observation is that, as cost increases, the ELU value decreases while the remaining three indices show an increasing trend. However, certain solutions do not match this trend. For example, at a higher cost the ELU value should be lower than that of the least-cost solution, yet solution 15 has a higher cost with a higher ELU value; RI and MRI show the same reversal for that solution, whereas HUIsys shows a higher value, indicating better uniformity, and rates it as a better solution. The denominators of ELU, RI, and MRI are constant for a given network, whereas the denominator of HUIsys is variable.
For a direct comparison of the indices over the selected solutions, three plots were prepared. It can be seen from Fig. 1 that ELU is not directly or linearly related to the other three indices; hence, the proposed index cannot be viewed as a replica of those indices. From the plot of the resilience indices against HUIsys (Fig. 2), it can be seen that HUIsys is not directly related to the resilience indices. Fig. 3 clearly shows that the modified resilience index is directly related to the resilience index, as the slope between them is found to be constant. It should be noted that RI and MRI are indices based on the available excess pressure energy, whereas the proposed ELU index and HUIsys are based on the energy utilized. ELU takes the maximum permissible head loss (the difference between the source head and the minimum service pressure head) as the reference for addressing uniformity, whereas HUI considers the equivalent head loss, obtained as a weighted average of the head losses of that network, with the ratio of the flow in a pipe to the sum of the flows in all the pipes used as the weighting factor. This value varies from solution to solution, since the distribution of flow changes significantly when the network configuration changes; consequently, the denominator of the HUIsys expression is variable. Hence, HUI cannot be used directly in an optimization model for arriving at the best configuration in terms of hydraulic uniformity, because the optimization process generates many infeasible solutions and the equivalent head loss is calculated with reference to the head losses of each particular network. HUIsys can, however, effectively assess the hydraulic uniformity of a feasible network configuration.
TABLE 1 Various performance measure values for selected solutions for the Hanoi network.
Solution number | Cost of the network ($) | ELU | MEP (m) | HUIsystem | RI | MRI
01 | 6,081,087 | 4.102 | 11.59 | 0.843 | 0.134 | 0.447
02 | 6,232,322 | 4.128 | 13.40 | 0.843 | 0.151 | 0.504
03 | 6,260,886 | 4.083 | 13.21 | 0.861 | 0.150 | 0.502
04 | 6,302,313 | 4.236 | 13.38 | 0.846 | 0.150 | 0.501
05 | 6,374,469 | 4.072 | 12.76 | 0.871 | 0.172 | 0.574
06 | 6,399,904 | 4.567 | 12.75 | 0.897 | 0.144 | 0.479
07 | 6,426,983 | 4.282 | 14.21 | 0.887 | 0.162 | 0.541
08 | 6,650,114 | 3.641 | 18.54 | 0.912 | 0.203 | 0.675
09 | 6,703,973 | 3.663 | 18.23 | 0.939 | 0.197 | 0.655
10 | 6,710,999 | 3.550 | 19.84 | 0.959 | 0.211 | 0.705
11 | 6,800,022 | 3.697 | 19.31 | 0.943 | 0.206 | 0.686
12 | 6,927,267 | 3.591 | 17.40 | 0.967 | 0.188 | 0.627
13 | 6,950,027 | 3.667 | 20.24 | 0.983 | 0.213 | 0.711
14 | 7,128,405 | 3.540 | 20.85 | 0.969 | 0.222 | 0.739
15 | 7,216,711 | 4.522 | 18.58 | 1.032 | 0.198 | 0.659
16 | 7,417,236 | 3.535 | 21.89 | 1.002 | 0.229 | 0.765
17 | 7,797,775 | 3.540 | 22.80 | 1.042 | 0.237 | 0.789
FIG. 1 Plot of the three performance indices (RI, MRI, and HUIsystem) against the proposed ELU index.
FIG. 2 Plot of the resilience indices (RI and MRI) against HUIsystem.
FIG. 3 Plot of the modified resilience index against the resilience index.
The proposed approach can be used both in an optimization model and to assess the ELU of any network configuration. Although RI and MRI have higher values at higher cost, such networks are still unable to satisfy a uniformly increased demand at several nodes.
Fig. 4 shows the supply-demand ratio for 10%, 20%, and 30% increases in the demand at all the nodes for the least-cost solution 1 ($6,081,087) given in Table 1. Nodes 2, 3, 4, 5, 18, 19, and 20 satisfy the increased demand fully. EPANET
FIG. 4 Supply-demand ratio of the solution of cost $6,081,087 for increased demand conditions (10%, 20%, and 30% increase in demand; plotted against node number).
2.2 was used to compute the supply at each node, with the minimum and required pressure heads taken as 30 m and 45 m, respectively, for the Hanoi network.
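As an illustration only, and not the authors' actual workflow, the sketch below shows how such a pressure-driven run could be scripted with the open-source WNTR package, which drives EPANET 2.2 from Python; the input file name is hypothetical, the option names are those assumed for recent WNTR releases, and the demands are scaled inside the script.

```python
import wntr

wn = wntr.network.WaterNetworkModel('hanoi.inp')      # hypothetical EPANET input file
for _, junction in wn.junctions():                    # scale all base demands by +10%
    junction.demand_timeseries_list[0].base_value *= 1.10

wn.options.hydraulic.demand_model = 'PDD'             # pressure-driven analysis ('PDA' in some releases)
wn.options.hydraulic.minimum_pressure = 30.0          # head below which no supply is delivered (m)
wn.options.hydraulic.required_pressure = 45.0         # head at which full demand is met (m)

results = wntr.sim.EpanetSimulator(wn).run_sim()
supplied = results.node['demand'][wn.junction_name_list].loc[0]
expected = wntr.metrics.expected_demand(wn)[wn.junction_name_list].loc[0]
print('network supply-demand ratio:', supplied.sum() / expected.sum())
```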
The pressure-driven analysis (PDA) of EPANET 2.2 indicates that all nodes other than the fully supplied ones can only partially meet the increased demand. The network supply-demand ratio for solution 1 for 10%, 20%, and 30% increases in demand is 0.892, 0.843, and 0.796, respectively. Three further solutions, with costs of $6,710,999, $7,128,405, and $7,797,775, were taken from Table 1 to compute the supply-demand ratio under 10%, 20%, and 30% increases in demand. It can be seen from the radar plots (Fig. 5A–C) that the increased demand is met only at those nodes close to the source that have more excess pressure; a deviation of the curves is visible only at those nodes, while the curves continue to overlie each other at the remaining nodes. There is only a marginal difference in the supply-demand ratio among the three solutions: for a 10% demand increase the ratio is 0.946, 0.954, and 0.967, respectively; for 20% it is 0.897, 0.904, and 0.915; and for 30% it is 0.848, 0.856, and 0.864. It is to be noted that an increase in cost alone does not guarantee full supply when the demand increases over a period. The main reason could be that, although there is excess capacity to meet the increased demand, the nodes located remotely from the source continue to face excess head loss along their supply paths. The variation of pressure among the nodes should be as low as possible for the increased demand to be met uniformly, which would be possible only if the reservoir were relocated to the center of the network.
The percentage increase in cost between the minimum- and maximum-cost solutions available for the Hanoi network in Table 1 is 22.01%, and the corresponding increases of MEP, RI, and MRI are 49.17%, 43.45%, and 43.34%, respectively. The similar percentage increases of RI and MRI indicate that these two indices are directly correlated with the mean excess pressure (MEP) available in the network. For HUIsystem and ELU, the percentage changes are 19.1% and 15.88%, respectively; although both indices use the head loss in the pipelines, there is a significant variation between them, since ELU uses the maximum permissible head loss rather than the equivalent head loss to measure uniformity. MEP is found to be a simple measure that can be used as a second objective in multiobjective optimization. For example, comparing solutions 10 and 11 in Table 1, the lower-cost solution has a higher MEP than the higher-cost solution, and the same is the case for solutions 14 and 15.
A new Hanoi network configuration was then derived to ascertain the usefulness of the average unit head loss. The diameter for each link was picked as the commercial size whose unit head loss is closest to the average unit head loss. This was done by setting the same size for all the pipes, simulating the network using demand-driven analysis, and recording, for each pipe ID, whether that diameter should be retained; the procedure was repeated for the remaining commercial sizes. Table 2 illustrates how the pipe diameter was selected for each pipe based on the unit head loss. The solution obtained by this approach has a total cost of $7,709,796, and the head loss in each pipe of the new configuration is closest to the head loss chosen in the diameter selection process (the head loss corresponding to the selected diameter in Table 2). Although the head loss for pipes 1 and 2 is very high, there is no larger pipe size available in the option set. For this new configuration, the MEP, ELU, HUIsystem, RI, and MRI were calculated as 21.68 m, 3.536, 1.011, 0.228, and 0.760, respectively. This solution costs less than the 17th solution listed in Table 1, while its performance measure values are nearest to those of that solution. Similarly, the supply-demand ratio for 10%, 20%, and 30% increases in demand is 0.959, 0.908, and 0.858, and it performs closely
FIG. 5 Radar plot for quantity of demand met at each node against demanded quantity. (A) Cost of network $6,710,999. (B) Cost of network $7,128,405.
(C) Cost of network $7,797,775.
with the 17th solution in meeting the additional demand. Hence, designing the network using the average unit head loss is an easy way to obtain a well-performing network configuration at low cost without much computational effort. The design procedure illustrated here can be used to pick pipe sizes and arrive at a good configuration in terms of high values of the performance indices. A still better configuration could be obtained through an optimization that keeps minimization of the difference between the actual head loss and the average head loss as a second objective function, with cost minimization as the first objective.
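The selection rule described above can be written in a few lines of Python. In this sketch, the dictionary of unit head losses per candidate diameter is assumed to have been produced beforehand by the uniform-diameter, demand-driven simulations mentioned in the text (the two rows shown are transcribed from Table 2), the total pipeline length of the Hanoi network is taken as 39.42 km, and the function name is illustrative.

```python
# unit_headloss[pipe_id][diameter_mm] = simulated unit head loss (m/km) when the whole
# network is set to that diameter (values for pipes 1 and 22 taken from Table 2).
unit_headloss = {
    1:  {304.8: 10073.94, 406.4: 2480.98, 508: 836.71, 609.6: 344.26, 762: 116.10, 1016: 28.59},
    22: {304.8: 10.33,    406.4: 2.54,    508: 0.86,   609.6: 0.35,   762: 0.12,   1016: 0.03},
}

def select_diameters(unit_headloss, h_source, h_min, total_length_km):
    """Pick, for each pipe, the commercial diameter whose unit head loss is closest
    to the network-average unit head loss UHL_avg (Eq. 6)."""
    uhl_avg = (h_source - h_min) / total_length_km        # m of head loss per km of pipeline
    return {pipe: min(losses, key=lambda d: abs(losses[d] - uhl_avg))
            for pipe, losses in unit_headloss.items()}

# Hanoi figures: 100 m source head, 30 m minimum head, about 39.42 km of pipe
print(select_diameters(unit_headloss, h_source=100.0, h_min=30.0, total_length_km=39.42))
```

With these inputs the rule returns 1016 mm for pipe 1 and 406.4 mm for pipe 22, matching the selections shown in Table 2.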
Further, the network performance was examined by relocating the reservoir to node 18, which occupies a central position in the Hanoi network. A new configuration for the relocated reservoir was derived using the average unit head loss, following the same steps as above. The cost of this network is $7,177,777, and its MEP, RI, MRI, HUIsys, and ELU values are 48.25 m, 0.493, 1.644, 1.487, and 2.850, respectively. When this solution is compared with solution 14 in Table 1, there is a drastic change in the values of the various indices: for a cost increase of 0.68%, the changes in MEP, RI, MRI, HUIsys, and ELU are 56%, 54.98%, 55.05%, 34.84%, and 19.50%, respectively. It is interesting to note that this solution is able to meet a 37% increase in demand at all the nodes. If the network were configured with all pipes of 1016 mm, its cost would be $10,969,797 and its MEP would be 28 m, which is about 58% of the MEP of the new solution obtained by changing the position of the reservoir.
TABLE 2 Unit head loss for each pipe and selected diameter for the new Hanoi network configuration.
Columns: pipe ID; unit head loss (m/km) for pipe diameters 304.8, 406.4, 508, 609.6, 762, and 1016 mm; selected diameter (mm); head loss for the selected configuration (m).
Pipe ID | 304.8 | 406.4 | 508 | 609.6 | 762 | 1016 | Selected diameter (mm) | Head loss for selected configuration (m)
1 | 10,073.94 | 2480.98 | 836.71 | 344.26 | 116.10 | 28.59 | 1016 | 28.59
2 | 9257.07 | 2279.80 | 768.86 | 316.34 | 106.69 | 26.27 | 1016 | 26.27
3 | 1048.56 | 258.23 | 87.09 | 35.83 | 12.08 | 2.98 | 1016 | 2.89
4 | 1006.01 | 247.76 | 83.56 | 34.38 | 11.59 | 2.86 | 1016 | 2.77
5 | 783.68 | 193.00 | 65.09 | 26.78 | 9.03 | 2.22 | 1016 | 2.15
6 | 518.26 | 127.63 | 43.04 | 17.71 | 5.97 | 1.47 | 1016 | 1.41
7 | 242.73 | 59.78 | 20.16 | 8.29 | 2.80 | 0.69 | 762 | 2.63
8 | 158.26 | 38.98 | 13.14 | 5.41 | 1.82 | 0.45 | 762 | 1.68
9 | 93.35 | 22.99 | 7.75 | 3.19 | 1.08 | 0.26 | 762 | 0.97
10 | 142.44 | 35.08 | 11.83 | 4.87 | 1.64 | 0.40 | 762 | 1.64
11 | 0.73 | 63.12 | 21.29 | 8.76 | 2.95 | 0.73 | 1016 | 0.73
12 | 1.24 | 26.56 | 8.96 | 3.69 | 1.24 | 0.31 | 762 | 1.24
13 | 34.7 | 8.54 | 2.88 | 1.19 | 0.40 | 0.10 | 609.6 | 1.4
14 | 88.62 | 21.83 | 7.36 | 3.03 | 1.02 | 0.25 | 762 | 1.13
15 | 120.58 | 29.70 | 10.01 | 4.12 | 1.39 | 0.34 | 762 | 1.52
16 | 460.74 | 113.47 | 38.27 | 15.74 | 5.31 | 1.31 | 1016 | 1.43
17 | 675.47 | 166.35 | 56.10 | 23.08 | 7.78 | 1.92 | 1016 | 2.07
18 | 1082.77 | 266.66 | 89.93 | 37.00 | 12.48 | 3.07 | 1016 | 3.26
19 | 1102.98 | 271.64 | 91.61 | 37.69 | 12.71 | 3.13 | 1016 | 3.32
20 | 1186.67 | 292.25 | 98.56 | 40.55 | 13.68 | 3.37 | 1016 | 3.27
21 | 75.04 | 18.48 | 6.23 | 2.56 | 0.86 | 0.21 | 609.6 | 2.56
22 | 10.33 | 2.54 | 0.86 | 0.35 | 0.12 | 0.03 | 406.4 | 2.54
23 | 421.54 | 103.82 | 35.01 | 14.41 | 4.86 | 1.20 | 1016 | 1.13
24 | 77.33 | 19.04 | 6.42 | 2.64 | 0.89 | 0.22 | 609.6 | 2.62
25 | 16.19 | 3.99 | 1.34 | 0.55 | 0.19 | 0.05 | 508 | 1.32
26 | 6.01 | 1.48 | 0.50 | 0.21 | 0.07 | 0.02 | 406.4 | 2.35
27 | 60.7 | 14.95 | 5.04 | 2.07 | 0.70 | 0.17 | 609.6 | 2.4
28 | 97.73 | 24.07 | 8.12 | 3.34 | 1.13 | 0.28 | 762 | 1.26
29 | 47.87 | 11.79 | 3.98 | 1.64 | 0.55 | 0.14 | 609.6 | 1.38
30 | 27.32 | 6.73 | 2.27 | 0.93 | 0.31 | 0.08 | 508 | 1.8
31 | 9.37 | 2.31 | 0.78 | 0.32 | 0.11 | 0.03 | 406.4 | 1.49
32 | 0.56 | 0.14 | 0.05 | 0.02 | 0.01 | 0 | 304.8 | 0
33 | 0 | 0 | 0 | 0 | 0 | 0 | 304.8 | 0.58
34 | 26.7 | 6.58 | 2.22 | 0.91 | 0.31 | 0.08 | 508 | 2.74
8. Conclusions
In this chapter, a new link-based index for evaluating pipe uniformity in terms of energy loss is proposed and compared with the resilience, modified resilience, and hydraulic uniformity indices. The Hanoi network was selected for investigating the proposed index, and the analysis shows that RI and MRI act as alternative indices in addressing the performance of the network. Although HUIsys and the ELU index both use the head loss in each pipe, they measure the energy consumed in the pipelines in distinct ways. Pipe selection using the average unit head loss directly gives a good network configuration according to the proposed ELU and the other indices used in this study. The average head loss guides the pipe dimensioning process in quite a simple way, and the resulting configuration promises economy while providing maximum performance. A network designed by the average head loss can be further improved toward least cost by bringing the unit head loss of each pipe closer to the average head loss adopted for design. It is further observed that the network performance cannot be improved just by increasing the pipe sizes alone, since an increase of pressure cannot be achieved in the network unless a booster pump is added or the source is located at the centroidal point of the network. In the Hanoi network the source is not situated at the centroidal point; hence the available pressure is not uniformly distributed across the network. It is found from the study that even a network with larger pipes, intended to meet an unexpected increase in demand, continues to suffer at the critical nodes; only the nodes closer to the source are able to supply the additional demand without compromising the service pressure. Enhanced performance of the network under higher demand or abnormal operation is possible only if the source of water is made available at the centroidal location of the network. These conclusions are drawn from the analysis carried out for the Hanoi network, and further study using other networks can shed more insight on the measures and the design approach.
References
Banos, R., Reca, J., Martínez, J., Gil, C., Márquez, A.L., 2011. Resilience indexes for water distribution network design: a performance analysis under
demand uncertainty. Water Resour. Manag. 25 (10), 2351–2366.
Beygi, S., Haddad, O.B., Mehdipour, E.F., Mariño, M.A., 2014. Bargaining models for optimal design of water distribution networks. J. Water Resour.
Plan. Manag. 140 (1), 92–99.
Eslamian, S., Syme, G., Reyhani, M.N., 2019. Building socio-hydrological resilience: theory to practice. Virtual Special Issue, J. Hydrol. 575, 930–932.
Fu, G., Kapelan, Z., Kasprzyk, J.R., Reed, P., 2013. Optimal design of water distribution systems using many-objective visual analytics. J. Water Resour.
Plan. Manag. 139, 624–633.
Fujiwara, O., De Silva, A.U., 1990. Algorithm for reliability-based optimal design of water networks. J. Environ. Eng. 116 (3), 575–587.
Guercio, R., Xu, Z., 1997. Linearized optimization model for reliability-based design of water systems. J. Hydraul. Eng. 123 (11), 1020–1026.
Hashemi, S., Filion, Y., Speight, V., Long, A., 2020. Effect of pipe size and location on water-main head loss in water distribution systems. J. Water Resour.
Plan. Manag. 146 (6), 06020006.
Howard, G., Bartram, J., 2010. Vision 2030: The Resilience of Water Supply and Sanitation in the Face of Climate Change. World Health Organization,
Geneva, Switzerland.
Jayaram, N., Srinivasan, K., 2008. Performance-based optimal design and rehabilitation of water distribution networks using life cycle costing. Water
Resour. Res. 44 (1), 1–15.
Jeong, G., Kang, D., 2020. Hydraulic uniformity index for water distribution networks. J. Water Resour. Plan. Manag. 146 (2), 04019078.
Liu, D., Chen, X., Nakato, T., 2012. Resilience assessment of water resources system. Water Resour. Manag. https://doi.org/10.1007/s11269-012-0100-7.
Mazumder, R.M., Salman, A.M., Li, Y., Yu, X., 2019. Reliability analysis of water distribution systems using physical probabilistic pipe failure method.
J. Water Resour. Plan. Manag. 145 (2), 04018097.
Moosavian, N., Lence, B.J., 2020. Flow-uniformity index for reliable-based optimal design of water-distribution networks. J. Water Resour. Plan. Manag.
146 (3), 04020005.
Ostfeld, A., Oliker, N., Salomons, E., 2014. Multiobjective optimization for least cost design and resiliency of water distribution systems. J. Water Resour.
Plan. Manag. 140 (12), 04014037-01-12.
Paez, D., Filion, Y., 2019. Mechanical and hydraulic reliability estimators for water distribution systems. J. Water Resour. Plan. Manag. 145 (11),
06019010.
Prasad, T.D., Park, N.S., 2004. Multiobjective genetic algorithms for design of water distribution networks. J. Water Resour. Plan. Manag. 130 (1), 73–82.
Reed, D.N., Sinske, A.N., Van Vuuren, J.H., 2010. Comparison of four reliability surrogate measures for water distribution systems design. Water Resour.
Res. 46 (5), W05524.
Saldarriaga, J., Paez, D., Salcedo, C., Cuero, P., Lopez, L.L., Leon, N., Celeita, D., 2020. A direct approach for the near-optimal design of water distribution
networks based on power use. Water 12, 1037. https://doi.org/10.3390/w12041037.
Suribabu, C.R., 2010. Differential evolution algorithm for optimal design of water distribution networks. J. Hydroinf. 12 (1), 66–82.
Suribabu, C.R., 2012. Heuristic based pipe dimensioning model for water distribution networks. J. Pipeline Syst. Eng. Pract. 3 (4), 115–124.
Suribabu, C.R., 2017. Resilience-based optimal design of water distribution network. Appl Water Sci 7 (7), 4055–4066.
Swati, S., Janga Reddy, M., 2020. Assessing the performance of surrogate measures for water distribution network reliability. J. Water Resour. Plan.
Manag. 146 (7), 04020048.
Todini, E., 2000. Looped water distribution networks design using a resilience index based heuristic approach. Urban Water 2 (2), 115–122.
Walski, T., 2020. Providing reliability in water distribution systems. J. Water Resour. Plan. Manag. 146 (2), 02519004.
Wang, C., Blackmore, J., Wang, X., Yum, K.K., Zhou, M., Diaper, C., McGregor, G., Anticev, J., 2009. Overview of Resilience Concepts, with Application to Water Resource Systems. eWater Cooperative Research Centre Technical Report, eWater CRC, University of Canberra, Australia.
Wang, Q., Guidolin, M., Savic, D., Kapelan, Z., 2014. Two-objective design of benchmark problems of a water distribution system via MOEAs: towards
the best known approximation of the true Pareto front. J. Water Resour. Plan. Manag. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000460.
Wu, W., Maier, H.R., Simpson, A.R., 2011. Surplus power factor as a resilience measure for assessing hydraulic reliability in water transmission system
optimization. J. Water Resour. Plan. Manag. 137 (6), 542–546.
Yazdani, A., Jeffray, P., 2012. Applying network theory to quantify the redundancy and structural robustness of water distribution system. J. Water Resour.
Plan. Manage. 138 (2), 153–161.
Chapter 7
The co-nodal system analysis
Vladan Kuzmanovic
Serbian Hydrological Society, International Association of Hydrological Sciences, Belgrade, Serbia
1. Introduction
The co-nodal analysis finds interesting examples in geographical fields such as hydrological and paleo-hydrological studies. Remote sensing techniques provide a more consistent insight into anastomosing phenomena as well as into their complex logical relationships. Flows and hydrodynamic functions are not unambiguous, even more so in cross-sections of paleo-hydrological periods. Remote sensing techniques allow a multidimensional analysis of hydrological phenomena in place of simplistic hydrological methods and reduced, insufficient field data. Remote sensing is particularly emphasized in the paleo-hydrological discipline of geography, enabling monitoring of the dynamics of macrogeographical phenomena and the use of correlations among macrogeographical factors. Macrophenomena are dimensional, complex geographical and spatial concepts that are partitions and subsets of geographical categories. GIS techniques enable more accurate mapping of geospatial objects, as well as aggregate visualizations of geographic and macrogeographic phenomena (Wilkinson, 1996; Langat et al., 2019). Hydrological and paleo-hydrological phenomena are dynamic long-term interactions with intriguing fluvial yields.
The co-nodal analysis considers a co-nodal system composed of nodules and flows that describe oriented hydrological functions. A river basin, or a set and subsets of river basins, is not just a linear and unambiguous hydrological network, as seen by conventional river basin theory; by definition of the nodal theory it is a multifaceted oriented network (reversible, bidirectional, complementary). The nodal theory is a temporal, time-spatial analysis of flows and fluvial vectors, not just a spatial, linear, and actual analysis of water maps. Flows are not one-way directions but poly-orientations; they are not spatial, but temporal-spatial categories.
The analysis considers nodal functions as a function of the nodus as a hub and intersection. Flows or branches are temporal functions. Vectors have a temporal or temporal-spatial orientation. Flows are elements of different quad-circular
systems. The nodal system is described by time layers, where the layer defines orientation (nodal, mono-orientation)
and set (nodal set, layout of active nodes, and functional flows). However, characteristics of dynamic models are multilayer
positioning, overlapping layers, complementing flows, and adding functions. The model features functional addition
(addition of flows) and poly-orientation of functions. The nodal system thus forms a circular fluvial model with active sets
and fluvial palimpsests of the nth time layer. Unlike the conventional net model, which is simplistic with point A to point B
flows, the nodal system is a multidimensional model with a multilayered circular flow. Downstream flows are paleohydrological or chronological consequences. The complementary model explains the dilemmas of anastomosing flows.
Funnels are not simple regular flows, but complex productions or results. An anastomosing case changes the conventional
way we understand fluvial dynamics. The co-nodal system enables the interpretation and analysis of anastomosing flows as
complex logical systems through the formalization of advanced flow models. Formalized models make it possible to understand contradictory and alogical results, from the standpoint of conventional theory, especially in cases of counter currents
and overlap in the light of nondimensional models.
2. Co-nodal and system analysis
Remote sensing techniques provide a clear insight into complex geo-natural phenomena, and thus an adequate relational
analysis of concepts and elements in the model. Often, conclusions and a full explanation of a phenomenon can be drawn
only after techniques have been performed, GIS data collected, distant samples acquired (Walsh et al., 1998). GIS data are
of particular importance in cases of limited data collecting, with positive physical phenomena, such as geo terrain, or multisected permeation. GIS provides obvious insights (geographic, hydrogeographic insights) or provides visual data on which
models can be set (Gilvear, 1999; Gilvear et al., 1999; Gangodagamage and Agrarwal, 2001).
Selected visual data can be formalized, as applied to hydrological phenomena, into nodal systems, models, vectorial, or
flow maps, with an explanation of specific physical systems, models that link groups of geographical objects and systems
into formal work frames (such as catchments, river mouths, anastomosing, intercourse, etc.).
Complex hydrological models find adequate formalization in co-nodal systems, given the abundance, multiplication,
dynamics, relations of elements (hubs and nodes) and systems (basins and rivers) as well as chronologies. Hydrological
models function on the principle of nodes and orientations. In paleo-hydrological analysis, orientations are not only spatial (geographical) but also temporal categories. Geographic models are not only physical, spatial systems, but also complex, multidimensional, supra-geographical, linear compositions. The placement of geographic models is oriented in space and time, and therefore basic geographical facts such as flow and orientation are not mono-dimensional but vector categories. Geographic elements are not merely physical but structured elements of a system. The contemporary theory involves a structured, multidimensional, realistic, diffuse model, versus a linear GIS model, in the classification of paleo-hydrological phenomena.
GIS visual data enable global access to geo-phenomena and allow them to be identified, defined, and classified with clear examples and explanations; they support comparisons and inferences based on similarities and differences, as well as deductions about specificity. Physical categories such as co-nodal systems (counter-fluxes, hub systems) are not simply elements of physical geography, but of modern geography, for which the dynamics of different hydrological orientations and values hold. For co-nodal systems, poly-values, relative focus, dynamics, multidimensionality, and linearity are valid. Co-nodal systems are also the basic hydrological fluvial systems. Paleo-hydrology implements the temporal element in geo-interpretations in a particular way, and co-nodal sets appear as a suitable work frame.
3. Paleo-hydrology and remote sensing
Co-nodal systems can generate certain postulates, such as lower-flow/sequestration, two-stream, and counter-flux. These postulates are linear, model-logical, relative, causal and acausal statements, relationships, and structured facts of advanced systems. The co-nodal system is thus composed of nodes and branches. In cases of anastomosing, flows take on inverse functions and abandoned channels become the channels of other rivers; through co-nodal models these dynamics become quite readable (Paraguay river system, Mesopotamian bifluvial system).
legible as branching, and the models that describe this and such branching are the most efficient models and the closest
models that fit the approximate Remote sensing data. Branches, networks, subset aggregations, and circular systems
effectively correspond to field data. The dynamics of water systems and flows is the formalization of dual, progressive
systems. Remote sensing data provide the material for formalizing hydrological models as river co-nodal systems, hubs
and flows, as multioriented interactions and vector exchanges. Axial models approximate causal ones as special cases of
interdynamics.
Remote sensing’s geo data are crucial to understanding hydrodynamics. Hydrosystems, especially hydrodynamic
systems, are complex phenomena and require comprehensive research. Fluvial analysis, dynamic analysis of flow/river
systems is provided at a high level of scientific accuracy/validity. Paleo-hydrological objects, phenomena, and paleohydrological issues require time analysis of the hydrological facts found. Temporal reconstruction is by all means facilitated by the use of visual data, primarily by remote sensing techniques given the nature, structure, and volume of information
data status, and physical internship that have been enhanced by hydrological and hydrodynamic analysis (less noticeable
phenomena). Remote sensing is a typical information technology technique. Information data can be tuned, calibrated,
planned, controlled, managed, and implemented through interactive research work and processes, such as process
alignment and implementation. Remote sensing data are dynamic info technology. Its data are much more interesting
and vibrant than other, secondary quantitative and qualitative techniques.
Paleo-hydrology is perhaps the most implicit field when implementing remote sensing techniques. Knowing that the
results and findings are the most intriguing and the area of possible applications most attractive, the interactive findings
open up opportunities for new geographical knowledge, interpretations, and the basics of comparative research (Abbasova
et al., 2017). Paleo-hydrological objects and topics are more easily visible with the remote sensing apparatus. Remote sensing is more widely applied, and applicable, to global and macrogeographic phenomena. Macrogeographical phenomena such as geographical systems are observable, and comprehensible paleo-hydrological objects and fluvial systems are macrogeographic entities.
As synthetic systems, i.e., as holistic and logical structures, phenomena have a spatial spread and are holistically structured. Remote sensing techniques are especially useful for macrophenomena and hydrogeographic systems, dynamic complexes, paleo chronologies, and fluvial models.
4. Methods
The purpose of this study is to analyze the flow of the Danube to determine the potential and dynamics of the channel system
and the complex time management of this significant European river. Paleo-hydrological analyses were done in chronological and spatially sequential terms, with sections (river flow fragments) and hydrological stages. The river has complex
canal potency, established or indicated relations, extraordinary canal, as well as prerequisites for the further development
and improvement of water management of the river. These connections are noticeable throughout the Danube and are particularly pronounced in the middle and lower middle parts, the Pannonian segment of the lower propagation. The lower
Danube segment registers intersecting flows and manageable communications that have long been the subject of management of the development and implementation of water management systems (Constantinescu et al., 2015).
The potentials of the upper Pannonian and lower Pannonian hydrological blocks have been generally investigated, as
recent hydrophysical outputs. The basic paleo-fluvial relations were represented by the observational method, as well as
repercussions, according to the current flows. On this occasion, advanced RS systems were used with the support of user-oriented geographic software and remote sensing spatial systems. Hydro flows are summarized by confluxes as marked
paleo-hydrological points (points of previous or current paleo-estuaries).
Hydro points were used as starting points or nodes for alternative subset and vector analysis. For the analysis of the basic
paleo hydro nodes of the Pannonian and sub-Pannonian blocks, the block is divided according to ref. geographic features
with due regard to the dynamics and sometimes complex logic of fluvial or laco-fluvial formations. On that occasion, neighboring paleo nodes (paleo nodes and current hydro nodes) and river dominants were considered. The dominant and the node
form a subset or hydrologic block. The block is determined by simultaneous dominant, parallel (paleo-distributary), or
consecutively oriented flows (one river in two flows, in the interphase, named hereafter as Danube transitions). A stream
changes its dominant over time; hence, two dominants of one river channel are consecutively possible (Leigh, 2008;
Gilvear, 1999). The paleo-island is bounded by flows entirely or consecutively, as positive dominants of the paleo-block.
Oriented paleo-block flows form actuals, recent, and present river flows. Paleo-island is a hydrologic block made of river
distributaries, or as in river systems, of more than two rivers and its paleo distributaries. The middle and middle-lower
Danube has three (or four) paleo-blocks: upper supra-Pannonian, lower supra-Pannonian, sub-Pannonian, and lower Banat.
The upper block consists consecutively of: Vac, Szolnok, Ulca, and Titel knot.
5. Nodes and cyclic confluent system
Nodes are complex fluvial solutions, that is, complex paleoconfluxes. Most often, these are multiple confluxes (multicons),
very rarely unique confluent points. The nodes are consecutively: trifluxes, bifluxes, paleoconfluxes, and confluxes. As the
layers are structured, triconfluxes are biconflux and paleoconflux, or biconflux and conflux. Paleoconflux constitutes the
inflow of a river into itself, during the paleological phase, and is a temporal phenomenon, typical of the transition between
the main phases.
The most appropriate way to show a system of river flows with respect to the change of orientation over time is a system
of streamlined flows (i.e., oriented graphs), as a limited yet complex combination providing both time-separated phases and
overlaps. The nodal system is the most adequate model in the analysis of anastomosing river systems. Oriented flows have
the advantage of explaining the complex chronology of Pannonian and sub-Pannonian flows (Gábris and Nádor, 2007;
Günther-Diringer, 2001). Pannonian meta-paleology is intertwined and multifaceted. The orientations are subdivided into
classes and subclasses that are separated, time-varying, and declared layers. A distinction has been made between the temporal structure of the layers from which further conclusions and implications were drawn (Fig. 1).
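As a purely illustrative aid, and not part of the original study, the following short Python sketch shows one way to encode such a time-layered oriented network: each flow is a directed edge tagged with the phase in which it is active, a single layer gives the mono-oriented network of one phase, and superposing all layers gives the poly-oriented palimpsest discussed above. The node and phase names are invented.

```python
from collections import defaultdict

# Each flow (branch) is an oriented edge tagged with the time layer (phase) in which it is active.
flows = [
    ("node_A", "node_B", "phase_I"),    # hypothetical paleo-flow in the first layer
    ("node_B", "node_C", "phase_I"),
    ("node_C", "node_B", "phase_II"),   # the same reach reversed in a later layer (counterflow)
    ("node_B", "node_D", "phase_II"),
]

def layer(flows, phase):
    """Return the oriented edges active in a given time layer."""
    return [(a, b) for a, b, p in flows if p == phase]

def palimpsest(flows):
    """Superpose all layers: for each oriented edge, list the phases in which it occurs."""
    stack = defaultdict(list)
    for a, b, p in flows:
        stack[(a, b)].append(p)
    return dict(stack)

print(layer(flows, "phase_I"))    # the mono-oriented network of one phase
print(palimpsest(flows))          # the multilayered, poly-oriented composite
```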
Outputs of a co-nodal model are cyclical alternatives, that is, overlaps building up to a full cycle, from the start point (input) to the end point (output) of the branching of a hub system or estuary. Input is the branching or flowing into branches and oriented subsets. The hypocycle is composed of two or more sets whose nodal iteration over time produces a full anastomosing effect. The hypocycloid therefore encompasses all the temporal cycles of paleo-hydrodynamics; it is a paleo addition, a time composite of complementary cycles from deflux (input) to conflux (output). The hypocycloid thus encompasses the largest time shift, combining the period of anastomosing evolution, that is, the cycles of the partial hubs. Hubs are fluvial or conflux outputs; they are confluxoids with respect to their alternating conflux character, and they are made up of iterative chronological rather than conventional spatial nodi.
Confluxoids are time-space meetings of a hub system. Although confluxoids are paleo-hydrological phenomena, they
also produce real effects, such as geohydrological geopaleological effects, and therefore they are also physico-geographical
phenomena. Again, each hydrological phenomenon is a special case of a complex system (Fig. 2).
FIG. 1 Chrono-slide with paleo points (Danube river): (A) Titel, (B) Titel-Czenta, (C) Czenta-Surduk, (D) Belgrade, (E) Savian counterflow, Sirmium
(Sremska Mitrovica)/Savacium (Sabac), (F) Belgrade, Danube Hub, (G) Titel, Belgrade Hub.
FIG. 2 Dynamic Paleo-Danube translations: (A) Cibelian counterflow Slavonski Brod (1), Cibalae/Volcae (2,3) (B, C) Paleo Dravus, Cibelae (Savus),
Dravus avulsion with Danube paleo bifluxes (D, E) Paleo Danube avulsion (the co-nodal system in Table 2).
The most pronounced occurrence of these fluvial dynamics and of the multiple Danube hub is the unique Danubian hypocycloid, a conflux system with 10 clearly differentiated paleoconfluxes. This makes the Danube hypocycloid the most remarkable multiconflux in Europe. The accumulation of river confluxes peculiar to the Danube cycloid, as a miniature of the Pannonian paleoinsula, is made up of different cyclical points (countercurrents) and of combinations of countercurrents and paleoconfluxes, as a dynamic maximum. Up to the Carpathians, the Danube dynamics are characterized only by anastomosing paleoconfluxes; in the Carpathian precomplex, however, a unique phenomenon of counterflux is found (Jenkins, 2007; Phillips, 2014).
5.1 H-cycloids analysis and fluvial dynamics
Due to chronological overlaps and stages, the cycloid is a sequential hydrological category. Fluvial systems include cyclic flows and partial cyclic layouts, but anabranching cycles are not cycloids, owing to the absence of interphase overlaps and of the summation of subcycles into a cycle. The problem seems to be that of the spatio-temporal set, and the fact that cyclical systems of distributaries and anabranching are physical categories and not just chronological segments. However, all downstream flows are spatial-dynamic, hence chronological-spatial functions. This also means that all such systems are dynamic linear models in which certain parallels are possible; parallelisms imply certain phases that can be explained consecutively in hub nodal systems. In any case, hypocycloids are not linear categories, but dynamic, multifunctional, oriented, and polyvalent variables.
The most impressive paleo-dynamic phenomena of hydrological geography are paleocycling and interfluvial paleocycles, and among them certainly hydro-cycloids (h-cycloids). The world's largest h-cycloid interfluves are the Euphrates-Tigris, Parana-Paraguay, Danube-Tisza, and Mississippi-Ohio. Interfluves are by nature paleo-hydrological phenomena. The Mesopotamian h-cycloid was formed by the cycling of the Euphrates together with the paleo-hydrological rotation of the Euphrates and the Tigris; the interfluve itself is characterized by two hydrological fluvial cycles and a quadri hub system. The Danube h-cycloid is a tricyclic fluvial process mediated by the Tisza in the second cycle. The Parana-Paraguay interfluve is an interactive phenomenon of the Paraguay and the Parana. Finally, the Ohio-Mississippi interaction, with four cycles, a complex paleo-dynamic form, and a pre-gulf estuary, forms a large North American interfluve.
A complete bifluvial paleo-cycle model or a bifluvial h-cycloid of two dominants and an interfluvial cycloid of two
dominants in which the flows are reciprocal and alternating are consequential axial, nodal, and polyconfluxoid points. Paleo
blocks of cyclic models are parallelograms with simultaneous flows of even stages and alternating flows of odd ones. The
parallelograms of the linear models consist of simultaneous and alternating, parallel and nonparallel flows and values (nth
and n + 1th phases), with polyvalent modes and constants (see Fig. 3).
The Baghdad hub system consists of parallel or alternating paleo-dynamic points, some of which are active simultaneously or consecutively as hydro nodes (Samawah, Hamza Ia, b). A cycloid is a specific alternating paleofluvium of two
rivers during the entire course of a cycle or subcycle. This means that unlike some other h-cycloids (such as the Danube),
perfect h-cycloids are unique paleo-hydrological bifluvial phenomena. In other words, in a certain phase of the model one river flows in the course of the other, and in the next phase interfluvial rotations take place. The model (see Fig. 3) is described
FIG. 3 Mesopotamian h-cycloid. (A) 1 Delli Abbas, 2 Al Zoor, 3 Bakuba, 4 Baghdad, 4a Habbaniyah, Karbala, 5 Kut, 6 Najaf, 7 Nasiriyah, Al Hamza, 8 Basrah. (B) Complete h-cycloid: Euphrates (Eu) Ia 4-5-6b-8, Euphrates (Eu) Ib 4-5-5a-7-8 with overlapping 4-6b, 4-6-6a-7, Euphrates (Eu) II (1)-(2)-(3)-4-4a-6-6a-7-8, counterflow with paleo-rotation Euphrates I-Euphrates II (Tigris II) 4-4a, 4a-5, Euphrates III 4a-6-6a-7-8. 5a Shatrah, 6a Samawah, 6b Amarah.
by interfluvial scissors, a specific double lever whose sides are, alternately, the variables, and whose only constants in the odd phases are the cycloid nodules of Baghdad and Samawah.
The hypocycloid is a complex, dynamic, iterative, and polyvalent model, where the values are at the same time: conventional (recent phases of the model, separate flows of the model), simultaneous (even stages), and alternating (odd stages
of the model). The summation of stages takes place in flows that overlap from a subcycle to a cycle or from a smaller cycle
to a larger cycle. Hence, the hypocycloid does not form a large cycle but a cycle that encompasses cycles and overlaps. The
fluvial cycle is usually made up of two rivers with an impressive paleogenesis of one river, the dominant one. The cycle, the
development of the dominant, can be rotational (with counterpoints and counterdirection), alternative (with alternation of
dominant and subdominant), and alternate-rotational (with two alternations and two subdominants, Uruguay and Paraguay). Although it is an interfluve, the cycle and hypocycle are performed by the dominant; the dynamic outlets are thus spatio-temporally understood as overlapping and consecutive additions of the chronologies of the hydro-dominant, with subdominant assists or alternations.
The nonlinear function implies the alternation of flows which are bifluvium or refluvium with several confluent points, such as quadrifluvium, double biconfluvium, or triconfluvium (for instance, biconfluvium and active confluvium). Interaction implies the use of factors and factorials (partial outcomes) in order to form a hydrodynamic product (Reinfelds et al., 1998).
Basically, every nonlinear function is a bifluvial relation. Each paleo-hydrological function is a real, active bifluvial relation (a relation between two rivers that is actually polyoriented). Relative geographical models are based on the theory of oriented flows, cycles, countercurrents, river basins, and cycloids, as well as on processes such as cycling, rotation, alternation, equivalence, and addition. The co-nodal analysis includes a co-nodal system composed of nodules and flows, which describe oriented hydrological functions. A river basin, and the sets and subsets of river basins, are not just linear and unambiguous hydrological networks, as seen by conventional river basin theory, but are, by definition of the nodal theory, multifaceted oriented networks (reversible, bidirectional, complementary).
Subcycles (see Fig. 8) are cyclohydrological phenomena, cyclodynamic fluvial phenomena that are formed by summing
river flows. Cycles are cyclohydrological phenomena that are formed by overlaps and additions of subcycles. Semihypocycloids are paleo-cyclohydrological phenomena that occur by alternating and overlapping cycles (in the compendium
of analysis s-c, s-s-c, s-h-c, h-c, further classified as Kuzmanovic h-cycloids, cycles, and subcycles). Perfect cycloids are
hydrocyclic phenomena that are formed by the alternation of flows, and the rotation of semi-cycloids (Mesopotamian
h-cycloid, see Fig. 3). The subcycle is formed by cycling, the cycle by overlapping, the semicycloid by overlapping
and alternating, and the complete h-cycloid by rotating cycles.
Contrafluvium (see Figs. 1E and 2A) is a paleo-dynamic phenomenon of the flow of the same, or two different rivers, in
the same riverbed in two opposite directions during different time periods and paleo-hydrological phases. The counterflow
is a paleo-hydrological relation of two rivers, in which chronologically the latter river flows in the opposite direction from
the original direction of the previous river.
Countercurrents (Cibalae, Centa, Baghdad, Sabac) occur at higher stages of paleo-hydrological development. These are
high-ranking hydrodynamic phenomena, requiring a greater degree of fluvial and confluent conditions: bifluvium, recon,
or bicon.
In order for the bicon to be formed, it is necessary to fulfill the condition of refluvium and bifluvium, as well as recon.
For each higher paleo-dynamic rank, it is necessary to fulfill the condition of the lower rank. The phenomena are not only temporally but also spatially scaled, forming a realistic model of paleo-hydrological chronology. This means that, for a river to form a bicon over time, there should be an actual estuary, the river should flow into itself (recon), and it should flow in its own course and in the course of another river (bifluvium) (Fig. 6). Hence, at least two bifluviums are required for biconfluxes, while higher hydrodynamic ranks such as tricons and quadricons require a larger number of subdominants: one dominant and three subdominants, or two dominants and two subdominants. Trifluvium is a phenomenon in which the course of one river carries the (paleo) chronological flow of the same river in different phases (after alternation, postalternation), of a second river during some of the phases (bifluvium), and of a third river (Fig. 7). Countercurrents
are the consequences of quadricons (Centa) or polyfluviums. Counterflows are therefore the phenomena of the highest
hydrodynamic systems, polyfluviums (Centa) or trifluviums (Sabac, Vinkovci). Cycling is the property of a river to produce
time-cycling products, subcycles, cycles, partial, and perfect hypocycloids. The rotation is a double, complete alternation of
the flow of two rivers during a certain paleo period (alternative phases). The alternation of two rivers is a reciprocal
replacement of the flow of one river by the flow of another river, and vice versa.
6. Three Danube phases
The Danube paleohistory consists of three prominent phases that correspond in a physico-dynamic sense to the paleogeographic segments of Pannonian development, the paleo-dynamic sectors whose dynamics have been described even
by recent flows. The three phases are the upper Pannonian and lower Pannonian (supra- and sub-Pannonian), with the lower Danube development (the Banat metatransition of the Carpathian effect). The sub-Pannonian transition is due to the first two phases, without neglecting the laco-fluvial phase of the Carpathian-Danube limes. The Danube limes make somewhat less recognizable fluvial transitions, but the elements of the extensions of the first and second classes of knots are clearly indicated: Belgrade, Pancevo, and Centa.
Each phase has a paleo-dynamic block with variable river dominants. The upper Pannonian sector is made up of the Danube and the Tisza dominant (Kiss et al., 2015; Stancíková, 2010), the lower sector of the Drava, the Danube, and the Tisza, and the sub-Pannonian sector of the Drava-Sava and the Danube dominant.
The supra-Pannonian phase (Table 1) includes the Danube Ia and Ib class 12 flows, the supra-Pannonian phase and the pre-Carpathian transition estuaries (2,11), the flow from Szolnok to Titel of class 124, from Titel to Belgrade, and from Titel to Centa. The second, lower supra-Pannonian phase includes class 134, the Danube flow from Vac to Ulca with chronological flows from Ulca to Titel, and with characteristic subclasses and further. The third, sub-Pannonian, classes 137 and 138 include specific sets of streams IIIa and IIIb, namely the Danube flow from Ulca to Cibalae and from Cibalae to the mouth of the Savus, and the Danube flow from Ulca to the mouth of the Savus, bypassing the sequence (7,8) (Table 2). The Sava flow is laco-fluvial and bifluvial by paleo nature up to the Sirmian node, as well as in the continuation of the upper laco-fluvial phase Hrtkovci-Belgrade; in front of that levee ridge on the line of the Sirmium-Sabac node, it is counterfluvial (Drina, Sava).
The dynamics of the Dravus are described by classes 56 and 34, which are peculiar to the Dravus translation from Bogojevo to Ulca, and specifically to the flows from Bogojevo to Becej (Bechay) (5,6) (6,9) and (5,6) (6,10) and from Ulca to Titel (3,4) (4,6) (Fig. 4).
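Because each class is written as a chain of oriented knot pairs, it can be read as a directed path over the knot numbering of Fig. 4. The short Python sketch below illustrates this reading; the knot names and the selected classes follow the chapter, while the helper functions, variable names, and the choice of which classes to list are illustrative assumptions only.

KNOTS = {1: "Vac", 2: "Szolnok", 3: "Ulca", 4: "Titel", 5: "Belgrade",
         6: "Pancevo", 7: "Cibalae", 8: "Samac", 9: "Czenta"}

# Each paleo-flow class is a sequence of oriented flows (edges) between knots.
CLASSES = {
    "124": [(1, 2), (2, 4), (4, 6)],
    "134": [(1, 3), (3, 4), (4, 6)],
    "137": [(1, 3), (3, 7), (7, 8), (8, 5), (5, 6)],
    "138": [(1, 3), (3, 8), (8, 5), (5, 6)],
}

def is_path(edges):
    # Consecutive oriented flows must share a knot (head of one = tail of the next).
    return all(a[1] == b[0] for a, b in zip(edges, edges[1:]))

def named_route(edges):
    # Render a class as the chain of knot names it traverses.
    knots = [edges[0][0]] + [head for _, head in edges]
    return " -> ".join(KNOTS[k] for k in knots)

for label, edges in CLASSES.items():
    print(label, is_path(edges), named_route(edges))
    # Class 124, for example, reads Vac -> Szolnok -> Titel -> Pancevo.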
TABLE 1 Classes of paleo-flows of Danube.

Class   Flow
12      (1,2) (2,6) (2,11)
124     (1,2) (2,4) (4,6); (1,2) (2,4) (4,9) (9,6); (1,2) (2,4) (4,5) (5,6)
134     (1,3) (3,4) (4,6); (1,3) (3,4) (4,9) (9,6); (1,3) (3,4) (4,5) (5,6)
137     (1,3) (3,7) (7,8) (8,5) (5,6)
138     (1,3) (3,8) (8,5) (5,6); (1,3) (3,8) (8,5) (5,5) (5,6)
TABLE 2 Bifluxes and polyfluxes with major fluvial transitions.

Biflux    River
(3,4)     Dravus, Danube
(7,3)     Savus, Danube
(7,8)     Savus, Danube
(8,5)     Savus, Danube
(2,4)     Tisza, Danube
(10,8)

Transitions
(1,2)/(1,3) in 4 (Translation)
(1,3) (3,4)/(1,3) (3,7) in (5,6) (Translation)
(3,4) (4,6) (Complement)

Polyflux  Flow classes
(9,6)     Danube I, Dravus, Danube II, Tamis
(6,11)    Danube Ib, Dravus, Danube II, Danube III, Savus
(4,5)     Danube I, Dravus, Danube II
(8,5)     Danube II, Danube III, Savus
(5,6)     Danube Ib, Danube III, Savus
(7,8)     Savus, Danube III
FIG. 4 Diagram: 1 Ia Class, 2 Ib, 3 IIa Classes 134 (1,3) (3,4) (4,9) (9,6), 4 IIb Classes 134 (1,3) (3,4) (4,6), 5 IIIa, 6 IIIb, 7 Paleo Island IIa, 8 Paleo Con IIa. Knots: 1 Vac, 2 Szolnok, 3 Ulca (Vukovar), 4 Titel, 5 Belgrade, 6 Pancsova (Pancevo), 7 Cibalae (Vinkovci), 8 Šamac, 9 Czenta (Centa).
In cases of accumulation of river junctions (fluvial and translational), the phenomenon of manifold points, characteristic
of counterfluxes, and river counterflows also occurs. The counterfluxes are the consequence of multiple confluxes of
numerous paleo-dynamic nodes. While nodes are characteristic in the later stages of paleo-dynamic development
(Schumm, 1968; Rittenour et al., 2003), counterflows occur only after the second or third paleo layer. It is a common
estuary, paleoconflux, and biconflux paleo-estuary. In the case of counterflows, it is necessary to satisfy the biconflux condition (condition of second order) and paleoconflux, which means that counterflows occur in the transition from biconflux
to the triconflux phase.
The lower Danube (see Fig. 6) is characterized by the quintoflux of five chronological streams of the Danube cycles Ib, II, III, Dravus, and Savus on the Danube limes from Pancevo to Banatska Palanka (6,11), the quadriflux of Danube I, II, Dravus, and Tamis from Centa to Pancevo (9,6), Danube Ib, III, Dravus, and Savus from Belgrade to Pancevo (5,6), at least several trifluxes from Titel to Belgrade (4,5), the biflux of Danube III and Savus from Cibalae to Šamac (7,8) and from Šamac to Belgrade (8,5), and the bifluxes of the Dravus and Danube from Ulca to Titel, of the Savus and Danube from Cibalae to Šamac, and of the Tamiš and Danube from Šamac to Belgrade. To this should be added the specific counterflux of the Danube from Ulca to Cibalae and of the Sava from Cibalae to Ulca (Table 2).
In the case of poly conflux, 12 clearly differentiated hydro paleo nodes (Figs. 5–12) are characteristic: Vac, Szolnak,
Sonta, Bechey, Ulca (Vukovar), Titel, Cibalae (Vinkovci), Šamac, Czenta, Belgrade, Pancevo, and Banatska Palanka. The
Danube hub consists of the lower Banat paleo quadrant with specific Belgrade polyconflux, the hydroinsula described by
the hubs Surduk, Centa, Pancevo, Belgrade, and further, more appropriately Titel, Centa, Pancevo, Belgrade with a dozen
paleoconfluxes. The Belgrade polyconflux hub consists of six striking paleoconfluxes: Danube I and III, Danube II and III, Danube I and Savus, Danube II and Savus, Dravus I and Savus II, and the Danube (Fig. 6). A broader variant of the Danube half-hub consists of at least 6 paleo-biconfluxes (as many were counted from the point of view of the analysis), 10 paleoconfluxes, and several fluvial confluxes (Tamis and Danube, Tisa and Danube, Savus and Danube).
The Danube hub with its specific polyconfluxoid consists of 4 or 5 hydro nodes with 20 recent and temporal paleoconfluxes, that is, a system of 20 current and former mouths. Belgrade and Pancevo make interesting Danube paleo trifluxes.
7. Danubian hypocycles as overlapping phases
The lower Danube flow is described by three hypocycles: large, medium, and small (in the analysis as Kuzmanovic s-c, s-sc, s-h-c, h-c), each characteristic for a phase of hydro-morphological development of the Danube and specific
paleo-hydrological block, I block for I and II phase, III block for III phase, with overlapping between the phases, and alternating partial flows, forming the Danubian hypocycloids of the Danube paleoinsula. These cycles, in addition to being
individual hubs, confluxes, also integrate joint flows (parts of joint flows, classes, bifluxes, and paleofluxes) (Figs. 10–12).
The first hydro cycle consists of overlaps of Phases I and III: the Vac, Szolnok, Titel, and Belgrade knots (Phase I) and the Vac, Ulca, Šamac, and Belgrade knots (Phase II); the second cycle consists of overlaps of Phases I and II: Vac, Szolnok, Titel (Phase I) and Vac, Ulca, Titel (Phase II); the third cycle overlaps Phases II and III: the Ulca, Titel, and Belgrade nodes (Phase II) and the Ulca, Samac, and Belgrade nodes (Phase III) (Figs. 8 and 9). The first circle includes the supra- and sub-Pannonian blocks, the second
supra-Pannonian block, and the third sub-Pannonian block, each with corresponding quadrants. The most impressive is the
fourth quadrant Danube hub, which, with complex fluvial dynamics, somewhat corrects and complements the image of the
consequent Danube paleo-cycles. The third hydro cycle unites all three phases, the Belgrade paleo-hydrological block consists of all three river dominants, and the flows are trifluxes. The Belgrade cycle consists of Titel (Surduk), Centa, Pancevo,
and Belgrade knots. Danube hub cycles generate a large Danube paleo-hydrological cycle of the Pannonian paleo block, in
the manner of partial, hypocycloid flows, in the sequences of phases and hydro cycles. Kuzmanovic’s paleo cycles consist
of inner tangent circles/subcycles with common points, nodes, and common paleo flows, classes.
One cycle is described by alternating partial flows of one tangent circle to the nth node, and another tangent circle from
the nth to (n + 1)th node. The cycle model contains a dozen nodes and three inscribed tangent circles with specific
alternating flows.
The Belgrade paleo hub (hypocycloid) serves as a correction of partial cycles to the large paleo cycle of the maximal
Pannonian-Carpathian extent. The Belgrade cyclic hub is a pronounced hydrodynamic system whose environment records
almost all the dynamic solutions and translations of Pannonian flows (see Fig. 1A). This pronounced hyper-dynamics,
which is effected in a small space, makes it behave as a hydrodynamic model and as a hub, for the translation of river
systems over a large area of the Danube. In this way, the Danube hypocycle is composed of cyclic phases and tangent
junctions, with supra-sub-Pannonian and pre-Carpathian coverage (Fig. 11). The Great Paleo H-Cycloid is described in
physical terms by the combination of the first and third phase of the upper supra-Pannonian and sub-Pannonian as two
FIG. 5 Diagram: 1 Ia Class Conflux, 2 Ib Conflux, 3 IIa Conflux Classes 134 (1,3) (3,4) (4,9) (9,6), 4 IIb Conflux Classes 134 (1,3) (3,4) (4,6), 5 IIIa
Conflux.
distinct physical-spatial hydro cycles, encompassing the entire Pannonian block, starting from the Vac flow to the Pancevo
and pre-Carpathian polyfluxes (Fig. 2B).
Larger coverage: Phases I and II, quadrants 1 and 2, form the first cycle; Phases II and III, quadrants 3 and 4, the second; and Phases I and III, quadrants 1, 2, 3, and 4, the third (cycles are listed according to overlapping blocks, not by chronology; the consequential representation is shown in Fig. 8). The first and second cycles have the common Phase II (Ulca, Titel, Pancevo); the large cycle and the first cycle have a
FIG. 6 Paleoknots: 1 Vac, 2 Szolnok, 3 Sonta, 4 Bechey (Becej), 5 Ulca, 6 Titel, 7 Cibalae, 8 Šamac, 9 Czenta (Centa), 10 Belgrade, 11 Pancsova (Pancevo) (Savski brod), 12 Banatska Palanka.
FIG. 7 Paleo triconfluxes, biconfluxes, and confluxes.
FIG. 8 Quadri system (fluvial sections and intersections) of Pannonian paleo-insula.
FIG. 9 Hypocycles and subcycles (Kuzmanovic cycles) with hydro knots. Line 1: subcycles, Line 2: hypocycles.
FIG. 10 Quadri-nodal system of oriented flows.
FIG. 11 Sections to cyclic representation diagram. First and second hypo and subcycle (h-s-cycle).
FIG. 12 Danube hypocycloid: 1 Danube subcycles (corresponding to the first, second, and third paleo-dynamic change), Kuzmanovic s-c (see flow diagram); 2 Danube hypocycloid with paleo nodes and minor hypocycles, Kuzmanovic h-c; 3 first and second subcycle of paleo-dynamic change, Kuzmanovic s-c; 4 subcycle and hypocycle of the Ia paleo-dynamic change, Kuzmanovic s-h-c.
common first cycle, and a large cycle and a second cycle have a common second cycle, from which it follows that the sets of cross sections are made up of cycle subsets. A large cycle and a large subcycle have a common large subcycle, hence the cycles are duplicated: each sub-cycle in a set of two sub-phases has a common flow of one phase plus the flow of another sub-phase of the same set or cycle, and each subset is duplicated, which eventually leads to a duplication of a large sub-cycle (summing sets) as a subset of the third cycle (Fig. 7).
Overlaps of subcycles: II and IIIa (1 and 3, and 3 and 4, in 3) in III; Ia and IIIa in II (1 and 2, and 1 and 4, in 1); Ia and II in I (3 and 4, and 2 and 4, in 4). Phase I covers quadrant I, Phase Ia quadrants I and II, Phase II quadrants I and III, and Phase III quadrants III and IV (Fig. 8). Cycles are composed of complementary flows of complementary phases, overlapping phases, and may be larger and smaller subcycles (4 or 6 knots). A cycle (at least 6 or 9 knots) is made up of subcycles with common flows; for example, a cycle of 9 knots may consist of subcycles of 6 knots, and a cycle of 6 knots may consist of subcycles of 4 knots. The intersection of two subcycles can be a class or a subcycle, and the intersection of a cycle and a subcycle is always a subcycle.
The subcycle is added into a cycle by overlapping flows in phase models. The cycle integrates (encompasses) n cycles of
iterative flows of the nodal system. A cycloid is formed by overlapping several cycles of iterative (alternating) subcycles.
Cyclical overlaps, in turn, are positive additions of cycles to the cycloid. In this sense, cyclic dynamics achieves its full
synergy in cycloid as the aqueous complex of the dynamic model. Iterative phases are inherent to subcycles and overlapping
iterative phases are cycles. H-cycloids are formed by the addition of cycles. Thus, each level of the dynamic model is
described by determined subcycle processes.
Each circle is formed by two river cycles (subcycle, s-cycle), and each cycle is formed by two phases with up to two
common nodes (Figs. 7 and 12). Subcycles are complementary subsets of fluvial classes in a set, while class subsets are
intersecting subsets in a set of subcycles. The large h-cycloid is formed by phase I and phase III with the joint hubs in Vac
and Pancevo. The first cycle is formed by phase I and phase II with the common Vac and Titel hubs. The second cycle is
formed by phase II and phase III of Ulca and Pancevo.
8. Conclusions
The consequential analysis of the lower Danube flow has led us to the model of the Danubian hypocycloid as a complex paleo-dynamic system composed of major cycles (hypocycles) and subcycles, effectuated in the Belgrade paleo hub, a segmental polyconfluental area acting as a correction to the other consequential hydro cycles. The Belgrade cyclic hub is a pronounced hydrodynamic system whose environment records almost all the dynamic solutions and translations of upper Pannonian flows.
The large Danube h-cycloid is composed of cyclic phases and tangent junctions, with supra-sub-Pannonian and pre-Carpathian coverage, while subcycles are made by complementing (flows), and cycles are made by overlapping (subcycles and flows) in nodal time alternation. This concept is crucial for understanding the dynamics and repercussions of Pannonian hydrology. The Great Paleo Hypo-Cycloid is formed in physical terms by the combination of the phases of the upper and lower supra-Pannonian and sub-Pannonian in distinct complementary hydro cycles, thereby encompassing the entire Pannonian block, starting from the Vac flow to the Pancevo and pre-Carpathian polyfluxes.
References
Abbasova, D., Eslamian, S., Nazari, R., 2017. Paleo-drought: measurements and analysis. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of Drought and
Water Scarcity. Environmental Impacts and Analysis of Drought and Water Scarcity, vol. 2. Taylor and Francis, CRC Press, USA, pp. 665–674
(Chapter 34).
Constantinescu, Ş., Achim, D., Rus, I., Giosan, L., 2015. Embanking the Lower Danube: from natural to engineered floodplains and back. In: Geomorphic
Approaches to Integrated Floodplain Management of Lowland Fluvial Systems in North America and Europe. Springer, New York,
pp. 265–288, https://doi.org/10.1007/978-1-4939-2380-9_11.
Gábris, G., Nádor, A., 2007. Long-term fluvial archives in Hungary: response of the Danube and Tisza rivers to tectonic movements and climatic changes
during the quaternary: a review and new synthesis. Quat. Sci. Rev. 26 (22–24), 2758–2782. https://doi.org/10.1016/j.quascirev.2007.06.030.
Gangodagamage, C., Agrarwal, S.P., 2001. Hydrological modeling using remote sensing and GIS. In: 22nd Asian Conference on Remote Sensing.
Gilvear, D.J., 1999. Fluvial geomorphology and river engineering: future roles utilizing a fluvial hydrosystems framework. Geomorphology 31 (1–4),
229–245.
Gilvear, D.J., Bryant, R., Hardy, T., 1999. Remote sensing of channel morphology and in-stream fluvial processes. Prog. Environ. Sci. 1, 257–284.
Günther-Diringer, D., 2001. Evaluation of Wetlands and Floodplain Areas in the Danube River Basin. River Restoration in Europe, p. 91.
Jenkins, P.A., 2007. Map-Based Tests on Controls of Anabranch River Character on the Lower Yellowstone River (Doctoral dissertation). Montana State
University-Bozeman, College of Letters & Science.
Kiss, T., Hernesz, P., Sümeghy, B., Györgyövics, K., Sipos, G., 2015. The evolution of the Great Hungarian Plain fluvial system–fluvial processes in a
subsiding area from the beginning of the Weichselian. Quat. Int. 388, 142–155. https://doi.org/10.1016/j.quaint.2014.05.050.
Langat, P.K., Kumar, L., Koech, R., 2019. Monitoring river channel dynamics using remote sensing and GIS techniques. Geomorphology 325, 92–102.
Leigh, D.S., 2008. Late quaternary climates and river channels of the Atlantic Coastal Plain, Southeastern USA. Geomorphology 101 (1–2), 90–
108. https://doi.org/10.1016/j.geomorph.2008.05.024.
Phillips, J.D., 2014. Anastamosing channels in the lower Neches River valley, Texas. Earth Surf. Process. Landf. https://doi.org/10.1002/esp.3582.
Reinfelds, I., Bishop, P., Benito, G., Baker, V.R., Gregory, K.J., 1998. Palaeohydrology, palaeodischarges and palaeochannel dimensions: research
strategies for meandering alluvial rivers. In: Palaeohydrology and Environmental Change, pp. 27–42.
Rittenour, T.M., Goble, R.J., Blum, M.D., 2003. An optical age chronology of Late Pleistocene fluvial deposits in the northern lower Mississippi valley.
Quat. Sci. Rev. 22 (10–13), 1105–1110. https://doi.org/10.1016/S0277-3791(03)00041-6.
Schumm, S.A., 1968. River Adjustment to Altered Hydrologic Regimen, Murrumbidgee River and Paleochannels, Australia. vol. 598 US Government
Printing Office, p. 1968.
134
Handbook of hydroinformatics
Stancíková, A., 2010. Training of the Danube River channel. In: Hydrological Processes of the Danube River Basin, pp. 305–341.
Walsh, S.J., Butler, D.R., Malanson, G.P., 1998. An overview of scale, pattern, process relationships in geomorphology: a remote sensing and GIS perspective. Geomorphology 21 (3–4), 183–205.
Wilkinson, G.G., 1996. A review of current issues in the integration of GIS and remote sensing data. Int. J. Geogr. Inf. Sci. 10 (1), 85–101.
Further reading
Mulligan, A.E., Evans, R.L., Lizarralde, D., 2007. The role of paleochannels in groundwater/seawater exchange. J. Hydrol. 335 (3–4), 313–329. https://doi.
org/10.1016/j.jhydrol.2006.11.025.
Pálfai, I., 1994. A Duna–Tisza közi hátság vízgazdálkodási problémái (The water management problems of the Danube–Tisza Interfluve). In: Pálfai, I. (Ed.), A Duna–Tisza közi hátság Vízgazdálkodási Problémái. A Nagyalföld Alapítvány Kötetei 3. Békéscsaba, Nagyalföld Alapítvány,
pp. 111–126.
Stevanovic, Z., Kozák, P., Lazic, M., Szanyi, J., Polomcic, D., Kovács, B., Papic, P., 2008. Towards sustainable management of transboundary
Hungarian–Serbian aquifer. In: Transboundary Water Resources Management: A Multidisciplinary Approach. vol. 1, pp. 143–149.
Timár, G., Szekely, B., Molnár, G., Ferencz, C., Kern, A., Galambos, C., Zentai, L., 2008. Combination of historical maps and satellite images of the Banat
region—re-appearance of an old wetland area. Glob. Planet. Chang. 262 (1–2), 29–38. https://doi.org/10.1016/j.gloplacha.2007.11.002.
Chapter 8
Data assimilation
Mohammad Mahdi Dorafshan (a), Mohammad Reza Jabbari (b), and Saeid Eslamian (c, d)
(a) Department of Civil Engineering, Isfahan University of Technology, Isfahan, Iran; (b) Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran; (c) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (d) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Research shows that global warming will increase the occurrence of extreme events (Karl et al., 1995) and, thus, reduce the
possibility of predicting the hydrological system’s future conditions (Tsonis, 2004). By increasing the occurrence risk of
floods and droughts, regions with insufficient observational data will be more exposed to the dangers of these events. Thus,
it is crucial to address hydrological modeling and reduce the uncertainty of the result of models by using the available data
and estimating the flow rate with more certainty in these areas (Wagener and Gupta, 2005). Hydrological models simulate
natural events by simplification, and as a result, their results are not definitive. The most important sources of uncertainty in
hydrological models include the uncertainty of the model input data, the initial state variable, the model structure, and the
model parameters. Data assimilation (DA) reduces overall uncertainty by considering the uncertainties of the inputs, observations, and updating variables promptly. Up to now, each of these sources of uncertainty has been reduced separately or
together, by Batch (Duan et al., 1992) and Recursive (Moradkhani et al., 2005) optimization methods, or a combination of
both methods such as Simultaneous Optimization and Data Assimilation (SODA) method (Vrugt et al., 2006). In Batch
methods, calibration is carried out using the time series of observations, and then, the parameters are estimated regardless
of the uncertainties of input, state, and output variables of the model, resulting in reducing the overall uncertainty of the
model. In this regard, the Shuffled Complex Evolution optimization algorithm developed at the University of Arizona,
called SCE-UA, is one of the Batch methods used by various researchers (Parajka et al., 2006; Misirli et al., 2003). Concerning the recursive methods, optimization is performed sequentially, and it is feasible to update state variables and
hydrology model parameters by addressing the uncertainty of the input data and the initial conditions (Moradkhani
et al., 2005; Weerts and El Serafy, 2006; Clark et al., 2008). Recursive methods have been vastly used in DA. These
methods carry out a successive continuous-time process to update the state variables and parameters of the model based
on the new observations, called the European Centre for Medium-range Weather Forecasts (ECMWF) process. In other
words, because the observations and the simulation results of each hydrological model alone are not complete, combining
both of them during the DA process can result in a more accurate prediction (Tiefenbacher, 2012). Note that the information
optimization during DA is performed sequentially in the Recursive methods while this process is conducted on a time series
of observations in the Batch methods. The model calibration is a particular type of DA, the purpose of which is to reduce the
model’s results error using observations (Tiefenbacher, 2012), and in the case of making predictions, using the DA methods
is the best way to reduce the error (Wagener and Gupta, 2005).
Generally, accurate and reliable prediction methods are the two essential foundations of effective and simultaneous
river management for flood control, flood warning, and reservoir management. Numerical models are not perfect and just
provide approximate models of reality. Therefore, various factors such as input information, problem information, computational model, and solving methods may cause the lack of consistency of the modeling results with reality
(Krzysztofowicz, 2001; Butts et al., 2005). Obtaining information about the sources of uncertainty, as well as quantifying
the extent of the impact of each source, is a key step in prioritizing research to develop future models. So far, various
methods have been developed for determining the degree of uncertainty of a model, including sensitivity analysis, multimodel approach, general effect method, and the probabilistic method. Probabilistic and general effect methods include the
most recent research related to flood prediction. In the probabilistic method, a large number of simulations are implemented
using the random samplings of the probability distribution from model parameters, initial conditions, boundary conditions,
and input data. For example, Pappenberger et al. (2004) used the Monte-Carlo sampling method to discuss the uncertainty
of the rating curve and the roughness coefficient of the hydraulic model. Further, Pappenberger et al. (2007) investigated the
effect of uncertainty on flooding by using multiple combinations of effective model parameters for a two-dimensional
flooding model. In contrast, the general effect method uses a concentrated sampling method to decrease the number of
samples. The existence of uncertainty sources in modeling, even when the quality of the model and input data is good
and the model is well-calibrated, may cause the results to be inconsistent with the observation values. In this case, making
accurate predictions requires postprocessing the output of the simulation models. It should be noted that the information
processing methods in this case are also called DA.
Finally, the objectives of this chapter include: (i) introducing DA and its methods, (ii) DA applications in water engineering (especially in flood simulation and forecasting), and (iii) considerations in the use of DA methods.
2. What is data assimilation?
Researchers have proposed different names, depending on their disciplines, for the methods developed to improve modeling results. For example, meteorologists and researchers on coastal management call these methods data assimilation, while hydrologists also call them data updating. The term Data Assimilation (DA) was first used in the 1960s in a military project to control and guide the trajectory of missiles. Owing to the lack of knowledge of atmospheric conditions, the results of missile trajectory models were generally inaccurate (Drecourt, 2003). Aerospace scientists tried to solve this problem by using actual observations obtained by satellites along the missile's trajectory to control the missile properly. Hence, the DA term can be defined as follows: "combining the observations of a phenomenon with the results of its simulation to improve the performance of the simulation model." Fig. 1 illustrates the difference between conventional modeling and real-time modeling (DA).
Based on previous research, different models for flood simulation, routing, and forecasting can be divided into two
categories: hydraulic models and hydrological models (Fread, 1981). Flood forecasting models can be used in two ways.
In the first case, forecasting the downstream is performed by hydrologic and hydrodynamic models along with the usage of
input data (such as rainfall or water level upstream), and model parameters calibrated by measured downstream parameters.
In the second case, the output of hydrological and hydrodynamic models is modified after calibration and recorded using
upstream and downstream information.
The application of DA in hydrology has been studied by combining, with the results of hydrological models, point observational data such as groundwater level (Hendricks Franssen et al., 2011) and large-scale remote sensing observations such as soil moisture (Sahoo et al., 2013), snow (Griessinger et al., 2016), and flow rate (Moradkhani et al., 2005), at daily (Aubert et al., 2003) or hourly (Neal et al., 2009) time steps. The results of previous studies have indicated the improvement of predictions by DA. Data assimilation can help water engineers in flood simulation and forecasting to better plan for and deal with this natural phenomenon. DA is performed by determining the estimation error, which is the difference between the actual value and the value calculated by the model, through which the forecasting result can be significantly improved. While many factors complicate real-time modeling, one advantage compared to non-real-time modeling is that the observed values of the river are comparable to the simulated values. If the observational values are valid and reliable, it is possible to update the results of future simulations by using the difference between the observed and simulated values. So far, various methods have been proposed to update the simulation results. In other words, the adoption
FIG. 1 Conventional modeling and real-time modeling using DA.
of the technique for dealing with the updating problem depends on the selection of the dominant factor causing nonconformity in the results of simulations and observations (Anctil et al., 2003).
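As a small illustration of the updating idea sketched in Fig. 1, the following Python snippet contrasts a conventional (uncorrected) simulation with a real-time run in which the most recent observed-minus-simulated error is carried forward to correct the next value. The synthetic data and the simple persistence correction are illustrative assumptions, not a method prescribed by this chapter.

import numpy as np

rng = np.random.default_rng(0)
observed = 100 + 10 * np.sin(np.arange(20) / 3.0) + rng.normal(0, 2, 20)  # "true" flows
simulated = observed + 5.0 + rng.normal(0, 2, 20)                          # biased model output

corrected = simulated.copy()
for k in range(1, len(simulated)):
    error_k = observed[k - 1] - simulated[k - 1]   # last known estimation error
    corrected[k] = simulated[k] + error_k          # persist it onto the next step

print("RMSE raw      :", np.sqrt(np.mean((observed - simulated) ** 2)))
print("RMSE corrected:", np.sqrt(np.mean((observed - corrected) ** 2)))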
3. Types of data assimilation methods
Depending on the type of prevailing error in the applied model and its characteristics, a consistent DA with that error should
be used to obtain appropriate simulation and forecasting results. There is not a comprehensive and unique categorization for
DA algorithms, but commonly they can be divided into categories according to their updating procedure and the type of
variable to be updated.
3.1 Types of updating procedure
Concerning updating procedure, Variational Data Assimilation (VDA) and Sequential Data Assimilation (SDA) methods
are two well-known approaches.
3.1.1 Variational data assimilation
In this method, during each time step, the present and all past observations are used to correct the initial conditions of the
model and bring the observed values and simulated values closer together (Schad et al., 2015) (e.g., Fig. 2). Depending on
the spatial and temporal dimensions of the model state variable, VDA is performed in three forms: 1D-Var, 3D-Var, and
4D-Var (Liu et al., 2010; Reichle et al., 2001; Seo et al., 2003).
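To make the variational idea concrete, the sketch below computes a single variational-style analysis: a background state x_b is corrected by minimizing the quadratic cost J(x) = (x - x_b)^T B^{-1} (x - x_b) + (y - Hx)^T R^{-1} (y - Hx), whose minimizer has the closed form x_a = x_b + B H^T (H B H^T + R)^{-1} (y - H x_b). The matrices, values, and variable names are illustrative assumptions, not taken from the chapter.

import numpy as np

x_b = np.array([1.0, 0.5])                 # background (first-guess) state
B = np.diag([0.2, 0.2])                    # background error covariance
H = np.array([[1.0, 0.0]])                 # only the first state component is observed
R = np.array([[0.1]])                      # observation error covariance
y = np.array([1.4])                        # observation

S = H @ B @ H.T + R
x_a = x_b + B @ H.T @ np.linalg.solve(S, y - H @ x_b)
print("analysis state:", x_a)              # pulled toward the observation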
3.1.2 Sequential data assimilation
As shown in Fig. 3, the updating is performed step-by-step in this method. By having the observed and calculated values in
the time step k, the updating is performed by DA, which improves the model prediction in the time step k + 1. Then, the same
process is carried out for time steps k + 1 and k + 2. In other words, this updating approach uses accessible observations to
update the model variable in the same time step (Schad et al., 2015). In contrast to the previous method, this method causes
discontinuities in the value of the system variable in the time series. This approach is more commonly used for systems
driven by boundary conditions. The primary DA methods used in flood forecasting, such as the Kalman Filter (KF), are
subdivisions of the SDA (Entekhabi et al., 1994; Evensen, 1994; Galantowicz et al., 1999).
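A minimal sequential-updating sketch is given below: at each step the toy model is propagated, and when an observation becomes available the state is pulled toward it with a fixed gain before the next forecast, in the step-by-step spirit of Fig. 3. The model, gain, and data are illustrative assumptions (simple nudging rather than a full filter).

import numpy as np

def model_step(x):
    return 0.9 * x + 1.0        # toy forecast model

rng = np.random.default_rng(1)
truth, x = 10.0, 6.0            # true state and (wrong) model state
g = 0.5                         # nudging gain in [0, 1]

for k in range(10):
    truth = model_step(truth) + rng.normal(0, 0.1)
    x = model_step(x)                       # forecast for step k
    y = truth + rng.normal(0, 0.3)          # observation at step k
    x = x + g * (y - x)                     # sequential update with the new observation
    print(f"k={k}: error after update = {abs(truth - x):.3f}")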
3.2 Types of updating variable
The structure of a model has different components. As shown in Fig. 4, each model has seven components: initial state (x0),
boundary (B), input (I), state variable (x), parameter (y), output (O), and structure (M). The model’s structure (M) consists of
two components of Mx and Mo, representing the conversion of the input to state variable and the conversion of the state
FIG. 2 Variational Data Assimilation (VDA) approach. The original model run (red dashed line and dots) is given a better initial condition that leads to a
new model run (blue dashed line and dots) that is closer to the observations (green dots).
FIG. 3 Sequential Data Assimilation (SDA) approach. When an observation is available (green dot), the model forecast (red dot) is updated to a value
closer to the observation (blue dot) that is used to make the next model forecast.
FIG. 4 Schematic diagram of model components from a system perspective.
variable to output, respectively. As can be seen, five of the seven components (x0, B, I, y, M) should be estimated, determined, or defined before the model can begin to operate. The two remaining model components (i.e., x, o) are obtained by a process performed by the model. Each of the five components that should be identified before the model begins to operate can introduce ambiguities and errors along its determination path, which in turn play a role in determining the values of x and o (Liu and Gupta, 2007).
Based on the type of error caused by each of the five components of the model, four DA methods are presented
(e.g., Fig. 5), including Updating Input Variable, Updating Model Parameter, Updating State Variable, and Updating
Output Variable (Error Correction) (O’Connell and Clarke, 1981; Refsgaard, 1997; Babovic et al., 2001).
3.2.1 Updating input variable
This method, which is an old approach and is rarely used today, is justified by the fact that ambiguity and uncertainty in the
model input can be the main and dominant source of error in the model prediction performance (Georgakakos, 1986; Xiong
et al., 2004).
3.2.2 Updating model parameter
It can be claimed that this method is the most widely used one for performing DA. This updating method is conducted by
using algorithms such as the Kalman Filter. In this case of updating, the model parameters are continuously considered
during the simulation and forecasting steps. The prevailing view in this case relies on the idea that the model calibration from one time step to the next will not change the results significantly in hydrodynamic models, because the actual changes of the model parameters are significantly slower than the computational intervals (Hsu et al., 2006; Chao et al., 2008; Lee and Singh, 1999; Young, 2002).
Updating the model parameters can, in practice, provide physical interpretations for the intended environment. However, the model parameters are not determined accurately and clearly, due to measurement, model calibration, and model processing uncertainties. Accepting this trade-off, updating the model parameters seeks the best match between the modeling results and the observed values. In other words, using conceptual and physical models to update model parameters may not provide an accurate and acceptable understanding of the model parameters (Kachroo, 1992). However, no such understanding of parameters exists for data-driven models, such as Artificial Neural Networks (ANNs) and Transfer Functions (TF) (Young, 2002). The selection of the runoff coefficient and the calculation of friction loss in
FIG. 5 Comparison of the classical model run (upper part) with the model run with an updating procedure (lower part).
river hydrodynamic models are among the most prominent examples of this method (Sene, 2008). Furthermore, a method combining the updating of state variables and of model parameters can better cover all error sources (Moradkhani et al., 2005).
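One common way to realize such a combination, shown schematically below, is to augment the state vector with the (slowly varying) parameter so that a single sequential update corrects both. The toy model, the fixed gains, and the synthetic data are illustrative assumptions and not the specific combination method cited above.

import numpy as np

def forecast(z):
    x, a = z                             # storage state x and recession parameter a
    return np.array([a * x + 1.0, a])    # parameter assumed (near-)constant in time

z = np.array([5.0, 0.7])                 # initial guess of [state, parameter]
K = np.array([0.6, 0.05])                # fixed gains for the augmented vector (illustrative)

observations = [6.2, 6.8, 7.1, 7.3]      # synthetic discharge-like observations of x
for y in observations:
    z = forecast(z)                      # propagate state and parameter
    innovation = y - z[0]                # only the state component is observed
    z = z + K * innovation               # both state and parameter receive a correction
    print(f"state={z[0]:.2f}, parameter={z[1]:.3f}")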
3.2.3 Updating state variable
DA is rarely used to correct model parameters, because it is generally assumed that model parameters do not change over time; this method instead considers the correction of the system state variable. The KF method is regarded as one of the methods used in this case and is an optimal updating method for linear systems; however, it can also be used for nonlinear systems with some modifications (Komma et al., 2008; Brocca et al., 2009). The updating state variable method aims to determine the initial conditions of the model in such a way that the results of the simulation model and the observational values are best fitted and, if possible, these results can be generalized to the next time steps leading to the forecasting time (Wöhling et al., 2006). The KF method and its various forms, including the Extended Kalman Filter (EKF) and
Unscented Kalman Filter (UKF) are among the well-known updating state variables methods, which have been used for
various models such as coastal models (Verlaan et al., 1831), as well as rainfall-runoff models and hydrodynamic models
(Butts et al., 2005; Weerts and El Serafy, 2005).
In general, the updating model parameter method tends to focus on ambiguities and errors caused by parameter estimation, while ignoring other sources of error. On the other hand, in updating state variable methods, the model has the potential to consider the various ambiguities and errors resulting from model inputs and observations, but it fails to address the errors associated with estimating the model parameters. As a result, there is a tendency for model estimation and prediction to rely on a combination of updating model parameters and updating state variables so that all sources of error can be covered appropriately.
3.2.4 Updating output variable
The differences between the model outputs and the actual observations are usually serially correlated. This feature makes it possible to predict future error values and to work directly on error modeling. The updating output variable method is based on error prediction and has been widely used in various studies (Babovic et al., 2001; Wang and Bai, 2008; Bao et al., 2011; Yu and Chen, 2005). The independence of this method from the prediction model is a salient feature, which allows it to be used as postprocessing for the model output. This feature is noteworthy compared to the updating of state variables and model parameters, which requires two-way interaction between the forecasting model and the updating model. Researchers have proposed a wide range of techniques to predict the error, such as Kalman Filter (KF), Auto-Regressive (AR), Transfer Function (TF), and ANN methods (Babovic et al., 2001; Xiong et al., 2004; Rungø et al., 1989; Moore, 1999), and Fuzzy Logic (Yu and Chen, 2005; Xiong and O'Connor, 2002).
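A minimal sketch of output updating by error prediction is given below: because the simulation errors are serially correlated, an AR(1) model fitted to past errors is used to predict the next error and post-process the raw forecast. The synthetic data and the simple AR(1) fit are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
observed = 50 + 5 * np.sin(np.arange(30) / 4.0) + rng.normal(0, 0.5, 30)
simulated = observed - 3.0 + rng.normal(0, 0.5, 30)      # model with a persistent bias

errors = observed[:-1] - simulated[:-1]                  # past errors (up to time t-1)
phi = np.sum(errors[1:] * errors[:-1]) / np.sum(errors[:-1] ** 2)  # AR(1) coefficient

predicted_error = phi * errors[-1]                       # one-step-ahead error forecast
corrected_forecast = simulated[-1] + predicted_error     # post-processed output at time t
print(f"phi={phi:.2f}, raw={simulated[-1]:.2f}, "
      f"corrected={corrected_forecast:.2f}, obs={observed[-1]:.2f}")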
4. Optimal filtering methods
Optimal filtering methods were first developed in communication engineering with the aim of separating noise and thus transmitting information or signals more accurately and efficiently. They then gained recognition in the field of hydrology and water resources (Lettenmaier and Burges, 1976; Rodríguez-Iturbe and Mejía, 1974). Now, there are two basic mathematical
approaches to interpret the complete behavior of a hydrologic system: Deterministic or Stochastic. In the case of a totally
deterministic system, the transfer function can usually be obtained by solving systems of simultaneous equations. On the
other hand, stochastic hydrologic systems necessitate methodology which utilizes estimation theory and interrelates the
statistical nature of the problem with the prediction of random hydrologic events (Husain, 1985). To this aim, the Wiener
Filter was the first method adopted to handle random events. It is not used much for state estimation anymore, due to its
major drawbacks such as being applicable only to stationary time-invariant events. So, in this section, we will give a brief
review of other alternative and well-known Optimal Filtering methods such as Kalman Filter (KF), Extended Kalman
Filter (EKF) and Unscented Kalman Filter (UKF). For more information, the interested reader can refer to standard books
on optimal filtering (Simon, 2006; Anderson and Moore, 2012).
The early success of the KF upon its arrival in the 1960s in aerospace applications led to attempts to apply it to other, more common applications in the 1970s, such as water resources management. The application of optimal filtering approaches
in the field of water resources engineering can be summarized as follows: soil moisture (Entekhabi et al., 1994), flood forecasting
(Kitanidis and Bras, 1980), estimation of hydraulic conductivity (Katul et al., 1993), groundwater flow and transport problems
(Eigbe et al., 1998), estimation of water table elevations (Bailey and Baù, 2010), land surface model (Zhang et al., 2017), surface
water quality modeling (Cho et al., 2020), remote sensing (Dorigo et al., 2007; Khaki et al., 2020).
4.1 Kalman filter
One remarkable aspect of the Kalman Filter (KF) is that it is optimal in several different senses. This approach quickly became known as a more efficient and useful tool due to its superiority over the Wiener Filter method (Kay, 1993). Navigating aircraft and spacecraft was the first usage of the KF method; for example, the Apollo project navigation system used the KF (Maybeck, 1979). Its applications in water-related fields started with meteorological issues, and the KF is now widely used in meteorology, oceanography, and hydrology (Chiu, 1978). The KF method is one of the state-variable updating methods that is widely used in various scientific fields today.
In the KF method, the system state is updated by updating the state variables. Suppose we have a linear discrete-time system with a measurement equation given as follows:

$$x_{k+1} = F_k x_k + G_k u_k + w_k \qquad (1)$$

$$y_{k+1} = H_{k+1} x_{k+1} + v_{k+1} \qquad (2)$$
where $w_k$ and $v_k$ are the process and measurement noise (or uncertainty), respectively, which are assumed to be white, zero-mean Gaussian noise with covariance matrices $Q_k$ and $R_k$ (i.e., $w_k \sim N(0, Q_k)$ and $v_k \sim N(0, R_k)$), and are also uncorrelated ($E\{w_k v_j^T\} = 0$ for all $k$ and $j$). The objective is to estimate the state vector $x_{k+1}$ based on our knowledge of the system dynamics Equation (1) and the measurement Equation (2), both of which have some uncertainty. The available information for estimating the state variables depends on the particular problem that we are trying to solve. If all of the measurements up to and including time $k+1$ are available for use in the estimation of $x_{k+1}$, then we can form an a posteriori estimate, which is denoted as $\hat{x}_{k+1}^{+}$ and is computed as the expected value of $x_{k+1}$ conditioned on all of the measurements up to and including time step $k+1$:

$$\hat{x}_{k+1}^{+} = E\left[x_{k+1} \mid y_{k+1}, y_{k}, \ldots, y_{1}\right] \qquad (3)$$
On the other hand, if all of the measurements up to (but not including) time $k+1$ are available for use in our estimate of $x_{k+1}$, then we can form an a priori estimate, which is denoted by $\hat{x}_{k+1}^{-}$ and computed as the expected value of $x_{k+1}$ conditioned on all of the measurements up to (but not including) time $k+1$:

$$\hat{x}_{k+1}^{-} = E\left[x_{k+1} \mid y_{k}, y_{k-1}, \ldots, y_{1}\right] \qquad (4)$$
It is important to note that both $\hat{x}_{k+1}^{-}$ and $\hat{x}_{k+1}^{+}$ are estimates of $x_{k+1}$. However, $\hat{x}_{k+1}^{-}$ and $\hat{x}_{k+1}^{+}$ are the estimates of $x_{k+1}$ before and after the measurement $y_{k+1}$ is taken into account, respectively. It is clear that $\hat{x}_{k+1}^{+}$ is potentially expected to be a better estimate than $\hat{x}_{k+1}^{-}$, since $\hat{x}_{k+1}^{+}$ uses more information to estimate $x_{k+1}$. Note that the first measurement is taken at time $k = 1$. We use the expected value of the initial state $x_0$ to denote our initial estimate of $x_0$, i.e., $\hat{x}_0^{+}$, before any measurements are available:

$$\hat{x}_0^{+} = E\left[x_0\right] \qquad (5)$$
In addition, we use $P_{k+1}^{-}$ and $P_{k+1}^{+}$ to denote the estimation error covariances of $\hat{x}_{k+1}^{-}$ and $\hat{x}_{k+1}^{+}$, respectively:

$$P_{k+1}^{-} = E\left\{\left(x_{k+1} - \hat{x}_{k+1}^{-}\right)\left(x_{k+1} - \hat{x}_{k+1}^{-}\right)^{T}\right\} \qquad (6)$$

$$P_{k+1}^{+} = E\left\{\left(x_{k+1} - \hat{x}_{k+1}^{+}\right)\left(x_{k+1} - \hat{x}_{k+1}^{+}\right)^{T}\right\} \qquad (7)$$
To aid understanding, the concepts above and their relationships are depicted in Fig. 6. In this notation, the discrete-time KF procedure can be summarized as follows:
* The Discrete-Time Kalman Filter (KF) Algorithm

1. Linear model identification:
   Dynamic system equation: $x_{k+1} = F_k x_k + G_k u_k + w_k$
   Measurement equation: $y_{k+1} = H_{k+1} x_{k+1} + v_{k+1}$
   Noise characteristics: $w_k \sim N(0, Q_k)$, $v_k \sim N(0, R_k)$, $E\{w_k v_j^T\} = 0$

2. Initialization:
   $\hat{x}_0^{+} = E\{x_0\}$
   $P_0^{+} = E\{(x_0 - \hat{x}_0^{+})(x_0 - \hat{x}_0^{+})^{T}\}$

3. Updating (for k = 1, 2, ...):
   3.1. Time updating (prediction):
        $P_{k+1}^{-} = F_k P_k^{+} F_k^{T} + Q_k$
        $\hat{x}_{k+1}^{-} = F_k \hat{x}_k^{+} + G_k u_k$
   3.2. Measurement updating (filtering):
        $K_{k+1} = P_{k+1}^{-} H_{k+1}^{T} \left(H_{k+1} P_{k+1}^{-} H_{k+1}^{T} + R_{k+1}\right)^{-1}$
        $\hat{x}_{k+1}^{+} = \hat{x}_{k+1}^{-} + K_{k+1}\left(y_{k+1} - H_{k+1} \hat{x}_{k+1}^{-}\right)$
        $P_{k+1}^{+} = \left(I - K_{k+1} H_{k+1}\right) P_{k+1}^{-} \left(I - K_{k+1} H_{k+1}\right)^{T} + K_{k+1} R_{k+1} K_{k+1}^{T}$
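As a concrete illustration of the algorithm box above, the following Python sketch implements the time update and measurement update for a scalar state observed directly; the specific matrices F, G, H, Q, R, the synthetic data, and the function names are illustrative assumptions. The covariance update uses the same (I - KH)P(I - KH)^T + KRK^T form as in step 3.2.

import numpy as np

F, G, H = np.array([[0.9]]), np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[0.05]]), np.array([[0.2]])

def kf_step(x_post, P_post, u, y):
    # Time update (prediction)
    x_prior = F @ x_post + G @ u
    P_prior = F @ P_post @ F.T + Q
    # Measurement update (filtering)
    S = H @ P_prior @ H.T + R                      # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)           # Kalman gain
    innovation = y - H @ x_prior
    x_post = x_prior + K @ innovation
    I = np.eye(len(x_post))
    P_post = (I - K @ H) @ P_prior @ (I - K @ H).T + K @ R @ K.T
    return x_post, P_post, innovation, S

rng = np.random.default_rng(3)
x_true, x_hat, P = np.array([5.0]), np.array([0.0]), np.array([[1.0]])
for k in range(20):
    u = np.array([1.0])                                           # known input (e.g., rainfall)
    x_true = F @ x_true + G @ u + rng.normal(0, np.sqrt(Q[0, 0]), 1)
    y = H @ x_true + rng.normal(0, np.sqrt(R[0, 0]), 1)
    x_hat, P, nu, S = kf_step(x_hat, P, u, y)
print("final estimate:", x_hat, "truth:", x_true)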
The matrix $K_{k+1}$ in the above equations is called the Kalman filter gain, which can be calculated offline before the system operates and saved in memory. The quantity $y_{k+1} - H_{k+1}\hat{x}_{k+1}^{-}$ is called the innovation, which can be interpreted as the part of
FIG. 6 Timeline of prior and posterior estimates in KF framework.
the measurement that contains new information about the state. When a KF is used for state estimation, the innovations can be measured, and if their mean and covariance are not equal to $0$ and $H_{k+1} P_{k+1}^{-} H_{k+1}^{T} + R_{k+1}$, respectively, something is wrong; perhaps either the assumed system model or the assumed noise statistics are incorrect.
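The consistency check just described can be automated: the short sketch below computes the mean of a batch of innovations and their normalized squares against an assumed innovation covariance S; values far from 0 and from the observation dimension would flag a mis-specified model or noise statistics. The numbers are synthetic and purely illustrative.

import numpy as np

rng = np.random.default_rng(4)
S = np.array([[0.25]])                              # assumed innovation covariance
innovations = rng.normal(0, np.sqrt(S[0, 0]), 500)  # well-behaved innovations (synthetic)

nis = innovations**2 / S[0, 0]                      # normalized innovations squared
print("innovation mean:", innovations.mean())       # should be near 0
print("mean NIS       :", nis.mean())               # should be near 1 (= dimension of y)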
The following is an overview of the application of the KF in water engineering, especially flood simulation and
forecasting.
Hino (1973) used the KF in flood prediction for the first time, updating the parameters of the Muskingum model.
Markussen (1985) applied the NAM hydrodynamic model with a new structure to utilize the KF for
finding water level values. The most important point in that research was the results of uncertainty analysis, which showed
that uncertainty at the output was mainly driven by uncertainty at the input (rainfall). Markussen (1985) used the KF technique
to update the output data of the centralized rainfall-runoff model. Considering the 6-h data for the Bird Creek Basin with
2344 km2 in Oklahoma, Georgakakos (1986) investigated the effectiveness of a hydrometric model to forecast flooding.
By developing a rainfall forecasting model and combining it with a modified rainfall-runoff model, the National Meteorological Institute forecasted the flood flow and variables such as soil moisture storage capacity. Ultimately, the flood flow was
modified by using the KF model. Guang-Te et al. (1987) also attempted to apply KF to estimate the flow in the Muskingum
model. Ferraresi et al. (1996) used a Darcy linear model along with KF to estimate Transmissibility in a real environment.
Refsgaard (1997) compared the combination of the NAM model integrated with the KF and the NAM/MIKE model with an
error prediction technique and concluded that correctly calibrating the initial hydrological model that leads the KF to have
better results relative to the error prediction model. Lee and Singh (1999) successfully used the KF model to estimate the
parameters in the Tank model. This model was used to modify the model parameters over time and update runoff uncertainty.
Groundwater researchers have mainly studied saturated areas and the possibility of deep diffusion, considering the
observation of moisture in the upper layers. In unsaturated activity areas, Walker et al. (2002) correctly demonstrated
the power of the KF in simulating relative soil moisture. The model used by these researchers was demanding in computational terms, due to its three-dimensional nonlinear nature. Information simulation in hydrology has been performed
using algorithms developed in other fields such as meteorology and oceanography through the KF method (Troch et al.,
2003). To predict daily soil moisture, Kashif Gill et al. (2007) used 6 months of meteorological data to train a Support
Vector Machine (SVM) model and 3 months of data to test the learned model. Other research was also conducted using
the KF on the Muskingum routing model (Wang and Bai, 2008; Huang, 1999).
The observability of the phenomenon model is one of the basic conditions for using the KF in predicting the phenomenon. Observability of a dynamic model means the ability to reconstruct the system state variables from the observations. Therefore, since lumped hydrological models are not observable, updating the parameters of these models is the only acceptable way to use the KF under such conditions (Huang, 1999). Distributed models, which are based on the Saint-Venant equations and used in flood routing, can satisfy the observability condition well and allow the KF to be used for updating the state variables of these models. However, they significantly increase the computational load. Given these conditions, studies indicate an increase in the accuracy of forecasts updated by the KF (Mu and Zhang, 2007; Xie and Zhang, 2010). Madsen and Skotner (2005) presented a simultaneously updating model for river flood forecasting, which was a combination of the KF and an error forecasting model. In their model, the error of the system state variables was first distributed over the locations of the measurement stations by the KF; after that, the resulting error values were distributed to the locations in the forecasting area by using the error forecasting model. Currently, it is possible to use this model in nonlinear and large-dimensional problems. On the other hand, since the KF can be adapted by updating the problem variables to provide proper lead time, this method is a suitable and efficient way for flood forecasting and timely warning problems.
4.1.1 Kalman filter limitations
The efficiency and accuracy of the Kalman filter are guaranteed only under certain conditions:
- The mean, covariance, and correlation of the process noise $w_k$ and measurement noise $v_k$ should be known at all time steps.
- The system model matrices $F_k$ and $H_k$ should be known.
- The KF is the Minimum Variance Unbiased Estimator (MVUE) if the noise is Gaussian, and it is the best linear unbiased minimum variance estimator if the noise is not Gaussian. So, if we want to minimize a different cost function, then the KF may not accomplish our objectives.
So, we will need an alternative filter if one of the Kalman filter assumptions is not satisfied. The H∞ filter does not make any assumptions about the noise, and it minimizes the worst-case estimation error. Transfer Function (TF) approaches, a formulation of H∞ filtering, have been proposed by Yaesh and Shaked (1991).
4.2 Transfer function
As stated in the earlier section, although the KF and its extensions are an effective tool for estimating the state variables of a system, there is a serious mismatch between the underlying assumptions of the KF and many practical state estimation situations. Accurate system models are not readily available for practical problems, and engineers realized they needed a new filter that could handle modeling errors and noise uncertainty. The Transfer Function (TF) approach is a frequency domain approach that has been proposed to mitigate these drawbacks. The transfer function, as one of the data assimilation methods, is among the output updating methods that attempt to model the error directly.
Consider the discrete-time time-invariant system as follows:
$$x_{k+1} = F x_k + w_k, \qquad y_k = H x_k + v_k, \qquad z_k = L x_k \qquad (8)$$
where $y_k$ is the measurement, $z_k$ is a vector of a linear combination of the states to be estimated, $L$ is a user-defined full-rank matrix, and the process and measurement noise are uncorrelated. The vectors $w_k$ and $v_k$ are process and measurement noise with unknown statistics such that they may not even be zero-mean. If we want to directly estimate $x_k$ as in the KF, then we set $L = I$. Consider the estimation error and the augmented disturbance vector as follows:
$$\tilde{z}_k = z_k - \hat{z}_k, \qquad e_k = \begin{bmatrix} w_k^T & v_k^T \end{bmatrix}^T \qquad (9)$$
The transfer function from the augmented disturbance vector $e$ to the estimation error $\tilde{z}$ is defined as follows:
$$\left\| G_{e\tilde{z}} \right\|_{\infty} = \sup_{\omega} \frac{\| z_k - \hat{z}_k \|_2^2}{\| e_k \|_2^2} = \sup_{\omega} \frac{\| z_k - \hat{z}_k \|_2^2}{\| w_k \|_2^2 + \| v_k \|_2^2} \qquad (10)$$
where $\omega$ is the frequency of the noise (Jukic and Denic-Jukic, 2004; Duffy and Gelhar, 1986; Riyahi et al., 2018). Clearly, $G_{e\tilde{z}}$ can be considered as a system that has $e$ as its input and $\tilde{z}$ as its output. We aim to find a steady-state estimator such that the infinity-norm of the transfer function (10) is less than some user-specified bound:
$$\left\| G_{e\tilde{z}} \right\|_{\infty} < \gamma^{-1} \qquad (11)$$
Therefore, if $P$ and $\tilde{P}$ are Positive Definite (PD) matrices, chosen based on the specific problem, the steady-state TF algorithm can be summarized as follows.
* The Discrete-Time Transfer Function (TF) Algorithm
1. Linear Model Identification:
Dynamic System Equation: $x_{k+1} = F x_k + w_k$
Measurements Equation: $y_k = H x_k + v_k$
Noise Characteristics: Unknown Statistics
2. Initialization:
$\hat{x}_0 = 0$
3. Updating (for k = 1, 2, …):
3.1. Priori Filter:
$P = I + F P F^T - F P H^T \left(I + H P H^T\right)^{-1} H P F^T + P L^T \left(\gamma^{-1} I + L P L^T\right)^{-1} L P$
$K = F P H^T \left(I + H P H^T\right)^{-1}$
$\hat{x}_{k+1} = F \hat{x}_k + K \left(y_k - H \hat{x}_k\right)$
3.2. Posteriori Filter:
$S^{-1} = \tilde{P}^{-1} - \gamma L^T L + H^T H$
$\tilde{P} = F \tilde{P} \left(H^T H \tilde{P} - \gamma L^T L \tilde{P} + I\right)^{-1} F^T + I$
$\tilde{K} = \left(I + \gamma L^T L\right)^{-1} S H^T$
$\hat{x}_{k+1} = F \hat{x}_k + \tilde{K} \left(y_{k+1} - H F \hat{x}_k\right)$
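Because the gain of such a steady-state filter is computed offline, running it online reduces to a fixed-gain recursion. The sketch below assumes a gain K has already been obtained from a design procedure such as the one summarized above; F, H, K, and the measurement series are illustrative placeholders only.

```python
import numpy as np

def steady_state_filter(F, H, K, y_seq, x0):
    """Run a fixed-gain (steady-state) estimator x̂_{k+1} = F x̂_k + K (y_k − H x̂_k).

    K is assumed to have been designed offline (e.g., from the priori TF/H∞ design
    equations summarized above); no noise statistics are needed online.
    """
    x_hat = x0.copy()
    out = []
    for y in y_seq:
        x_hat = F @ x_hat + K @ (y - H @ x_hat)
        out.append(x_hat.copy())
    return np.array(out)

# Illustrative numbers only (not from any calibrated model)
F = np.array([[0.9]]); H = np.array([[1.0]]); K = np.array([[0.5]])
y_seq = [np.array([1.0]), np.array([0.8]), np.array([1.1])]
print(steady_state_filter(F, H, K, y_seq, np.zeros(1)))
```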
The practical use of the TF in water engineering was first proposed in the MIKE11 software environment to improve flood prediction accuracy. This software eliminates two errors, called the Amplitude Error and the Phase Error, defined on the flood hydrograph in the flood forecasting process. In addition to performing the updating process separately on both the model output and the state variables, MIKE11 is one of the software packages that can minimize the errors in model simulations and present more accurate results.
The amplitude error and the phase error can be attributed to the momentum and continuity equations, respectively (Paudyal, 2002). Fig. 7 demonstrates these errors schematically. Many works have focused on the improvement of the TF method; for example, Saddagh and Abedini (2012) improved the results obtained by the TF method by considering the nonuniformity of the amplitude error around the peak of the hydrograph.
4.3 Extended Kalman filter
Unfortunately, truly linear systems do not really exist, and many systems are not close enough to linear for linear estimation approaches to give satisfactory results. In this case, we need to explore nonlinear estimators. Since the general KF algorithm is limited to linear systems, in this section we discuss the Extended Kalman Filter (EKF) as a nonlinear extension of the KF. A nonlinear system can be linearized, and then linear estimation techniques (such as the KF) can be applied.
Suppose we have the following nonlinear discrete-time deterministic system model:
$$x_{k+1} = f_k(x_k, u_k, w_k) \qquad (12)$$
$$y_{k+1} = h_{k+1}(x_{k+1}, v_{k+1}) \qquad (13)$$
FIG. 7 Schematic of types of errors in the transfer function method.
By performing a Taylor series expansion of the dynamic system equation (12) around $x_k = \hat{x}^+_k$ and $w_k = 0$, we obtain:
$$x_{k+1} \simeq f_k\!\left(\hat{x}^+_k, u_k, 0\right) + \left.\frac{\partial f_k}{\partial x_k}\right|_{x_k=\hat{x}^+_k}\!\left(x_k - \hat{x}^+_k\right) + \left.\frac{\partial f_k}{\partial w_k}\right|_{x_k=\hat{x}^+_k}\! w_k \ \triangleq\ F_k x_k + \tilde{u}_k + \tilde{w}_k \qquad (14)$$
where
$$F_k \triangleq \left.\frac{\partial f_k}{\partial x_k}\right|_{x_k=\hat{x}^+_k}, \qquad L_k \triangleq \left.\frac{\partial f_k}{\partial w_k}\right|_{x_k=\hat{x}^+_k}, \qquad \tilde{u}_k \triangleq f_k\!\left(\hat{x}^+_k, u_k, 0\right) - F_k \hat{x}^+_k, \qquad \tilde{w}_k \triangleq L_k w_k \qquad (15)$$
Similarly, we linearize the measurement equation (13) around $x_{k+1} = \hat{x}^-_{k+1}$ and $v_{k+1} = 0$ as follows:
$$y_{k+1} \simeq h_{k+1}\!\left(\hat{x}^-_{k+1}, 0\right) + \left.\frac{\partial h_{k+1}}{\partial x_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}\!\left(x_{k+1} - \hat{x}^-_{k+1}\right) + \left.\frac{\partial h_{k+1}}{\partial v_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}\! v_{k+1} \ \triangleq\ H_{k+1} x_{k+1} + \tilde{z}_{k+1} + \tilde{v}_{k+1} \qquad (16)$$
where we define:
$$H_{k+1} \triangleq \left.\frac{\partial h_{k+1}}{\partial x_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}, \qquad M_{k+1} \triangleq \left.\frac{\partial h_{k+1}}{\partial v_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}, \qquad \tilde{z}_{k+1} \triangleq h_{k+1}\!\left(\hat{x}^-_{k+1}, 0\right) - H_{k+1} \hat{x}^-_{k+1}, \qquad \tilde{v}_{k+1} \triangleq M_{k+1} v_{k+1} \qquad (17)$$
Obviously, the new noise components are $\tilde{w}_k \sim N\!\left(0, L_k Q_k L_k^T\right)$ and $\tilde{v}_{k+1} \sim N\!\left(0, M_{k+1} R_{k+1} M_{k+1}^T\right)$. Therefore, we obtain a linear state system in Eq. (14) and a linear measurement in Eq. (16). That means we can use the standard KF to estimate the state. This results in the following equations for the discrete-time EKF.
* The Discrete-Time Extended Kalman Filter (EKF) Algorithm
1. Nonlinear Model Identification:
Dynamic System Equation: $x_{k+1} = f_k(x_k, u_k, w_k)$
Measurements Equation: $y_{k+1} = h_{k+1}(x_{k+1}, v_{k+1})$
Noise Characteristics: $w_k \sim N(0, Q_k)$, $v_k \sim N(0, R_k)$, $E\{w_k v_j^T\} = 0$
2. Initialization:
$\hat{x}^+_0 = E\{x_0\}$
$P^+_0 = E\{(x_0 - \hat{x}^+_0)(x_0 - \hat{x}^+_0)^T\}$
3. Updating (for k = 1, 2, …):
3.1. Time Updating (Prediction):
$F_k = \left.\dfrac{\partial f_k}{\partial x_k}\right|_{x_k=\hat{x}^+_k}$, $L_k = \left.\dfrac{\partial f_k}{\partial w_k}\right|_{x_k=\hat{x}^+_k}$ (Linearization)
$P^-_{k+1} = F_k P^+_k F_k^T + Q_k$
$\hat{x}^-_{k+1} = F_k \hat{x}^+_k + G_k u_k$
3.2. Measurement Updating (Filtering):
$H_{k+1} = \left.\dfrac{\partial h_{k+1}}{\partial x_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}$, $M_{k+1} = \left.\dfrac{\partial h_{k+1}}{\partial v_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}$ (Linearization)
$K_{k+1} = P^-_{k+1} H_{k+1}^T \left(H_{k+1} P^-_{k+1} H_{k+1}^T + R_{k+1}\right)^{-1}$
$\hat{x}^+_{k+1} = \hat{x}^-_{k+1} + K_{k+1}\left(y_{k+1} - H_{k+1} \hat{x}^-_{k+1}\right)$
$P^+_{k+1} = (I - K_{k+1} H_{k+1}) P^-_{k+1} (I - K_{k+1} H_{k+1})^T + K_{k+1} R_{k+1} K_{k+1}^T$
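As a minimal illustration of the linearize-then-filter idea, the following sketch runs a scalar EKF on an invented nonlinear model; the dynamics, measurement operator, Jacobians, and noise variances are assumptions made purely for demonstration and are not drawn from any of the cited studies.

```python
import numpy as np

# Hypothetical scalar nonlinear model x_{k+1} = f(x_k) + w_k, y_k = h(x_k) + v_k
f  = lambda x: 0.9 * x + 0.1 * np.sin(x)       # dynamics (illustrative)
df = lambda x: 0.9 + 0.1 * np.cos(x)           # F_k = ∂f/∂x evaluated at the posterior estimate
h  = lambda x: x ** 2                          # measurement operator (illustrative)
dh = lambda x: 2.0 * x                         # H_{k+1} = ∂h/∂x evaluated at the prior estimate

Q, R = 0.01, 0.04
x_post, P_post = 1.0, 1.0
for y in (1.1, 0.9, 1.3, 1.0):
    # Time updating (prediction): linearize, then propagate mean and covariance
    Fk = df(x_post)
    x_prior = f(x_post)
    P_prior = Fk * P_post * Fk + Q
    # Measurement updating (filtering): linearize h around the prior estimate
    Hk = dh(x_prior)
    K = P_prior * Hk / (Hk * P_prior * Hk + R)
    x_post = x_prior + K * (y - h(x_prior))
    P_post = (1.0 - K * Hk) * P_prior * (1.0 - K * Hk) + K * R * K
    print(round(x_post, 3), round(P_post, 4))
```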
Various researchers have examined the application of the EKF in the field of water resources engineering (Entekhabi et al., 1994; Puente and Bras, 1987; Walker et al., 2001). In Wood and O'Connell (1985), the KF and EKF are exploited for real-time forecasting to simultaneously estimate state variables and parameters in the Sacramento Soil Moisture Accounting (SAC-SMA) model. Also, although the EKF was suboptimal and its convergence was not well understood, McLaughlin and Townley (1996) made several successful attempts to use this filter; indeed, the lack of high-degree nonlinear terms in their study was the main reason for these successes. Next, Eppstein and Dougherty (1996) simplified the EKF by simplifying the covariance update and used a classification algorithm to zone the regions for the transmissivity value. This method converts an ill-conditioned system into a well-conditioned one by decreasing the number of variables. Unfortunately, they did not apply this method to real-world situations. Moreover, Sun et al. (2016) used the EKF to investigate the performance improvement of the Soil & Water Assessment Tool (SWAT) hydrological model, in which the data updating process was performed in two ways: updating the state variable and updating the output of the model.
4.4 Unscented Kalman filter
As stated earlier, the EKF is a widely used estimator for nonlinear systems. However, if the nonlinearities of the system are severe, the EKF often gives unreliable estimates. More precisely, the mean and covariance can be updated exactly with the KF (Section 4.1) in case the system is linear. If the system is nonlinear, then the mean and covariance can be updated approximately with the EKF (Section 4.3). The Unscented Kalman Filter (UKF), in contrast, works on the principle of the Unscented Transformation to update the mean and covariance. An unscented transformation is based on two fundamental principles. First, it is easy to perform a nonlinear transformation on a single point, called a Sigma Point, rather than on an entire Probability Density Function (PDF). Second, it is not too hard to find a set of individual points in state space whose sample pdf approximates the true pdf of the state vector.
The UKF algorithm can be summarized as follows.
* The Discrete-Time Unscented Kalman Filter (UKF) Algorithm
1. Nonlinear Model Identification:
Dynamic System Equation: $x_{k+1} = f_k(x_k, u_k, t_k) + w_k$
Measurements Equation: $y_{k+1} = h_{k+1}(x_{k+1}, t_{k+1}) + v_{k+1}$
Noise Characteristics: $w_k \sim N(0, Q_k)$, $v_k \sim N(0, R_k)$, $E\{w_k v_j^T\} = 0$
2. Initialization:
$\hat{x}^+_0 = E\{x_0\}$
$P^+_0 = E\{(x_0 - \hat{x}^+_0)(x_0 - \hat{x}^+_0)^T\}$
3. Updating (for k = 1, 2, …):
3.1. Time Updating (Prediction):
3.1.1. Sigma Point Updating
$\hat{x}^{(i)}_k = \hat{x}^+_k + \tilde{x}^{(i)}$, $i = 1, 2, \ldots, 2n$, where $2n$ is the number of sigma points
$$\tilde{x}^{(i)} = \begin{cases} \left(\sqrt{n P^+_k}\right)^T_i & i = 1, 2, \ldots, n \\ -\left(\sqrt{n P^+_k}\right)^T_i & i = n+1, n+2, \ldots, 2n \end{cases}$$
$\hat{x}^{(i)}_{k+1} = f_k\!\left(\hat{x}^{(i)}_k, u_k, t_k\right)$
3.1.2. State Updating
$$\hat{x}^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{x}^{(i)}_{k+1}$$
$$P^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \left(\hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1}\right)\left(\hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1}\right)^T + Q_k$$
3.2. Measurement Updating (Filtering):
3.2.1. Sigma Point Updating
$\hat{x}^{(i)}_{k+1} = \hat{x}^-_{k+1} + \tilde{x}^{(i)}$, $i = 1, 2, \ldots, 2n$ (Sigma Point Generation)
$$\tilde{x}^{(i)} = \begin{cases} \left(\sqrt{n P^-_{k+1}}\right)^T_i & i = 1, 2, \ldots, n \\ -\left(\sqrt{n P^-_{k+1}}\right)^T_i & i = n+1, n+2, \ldots, 2n \end{cases}$$
$\hat{y}^{(i)}_{k+1} = h_{k+1}\!\left(\hat{x}^{(i)}_{k+1}, t_{k+1}\right)$
$$\hat{y}_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{y}^{(i)}_{k+1}$$
$$P_y = \frac{1}{2n} \sum_{i=1}^{2n} \left(\hat{y}^{(i)}_{k+1} - \hat{y}_{k+1}\right)\left(\hat{y}^{(i)}_{k+1} - \hat{y}_{k+1}\right)^T + R_{k+1}$$
$$P_{xy} = \frac{1}{2n} \sum_{i=1}^{2n} \left(\hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1}\right)\left(\hat{y}^{(i)}_{k+1} - \hat{y}_{k+1}\right)^T$$
3.2.2. State Updating
$K_{k+1} = P_{xy} P_y^{-1}$
$\hat{x}^+_{k+1} = \hat{x}^-_{k+1} + K_{k+1}\left(y_{k+1} - \hat{y}_{k+1}\right)$
$P^+_{k+1} = P^-_{k+1} - K_{k+1} P_y K_{k+1}^T$
In the UKF algorithm, $\sqrt{nP}$ is the matrix square root of $nP$ such that $nP = \left(\sqrt{nP}\right)^T \sqrt{nP}$, and the subscript $i$ denotes the $i$th row of the matrix $\sqrt{nP}$.
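A compact sketch of one UKF cycle built around the 2n sigma points defined above is shown below; the nonlinear functions, noise covariances, and measurements are invented for illustration, and the matrix square root is taken from a Cholesky factorization (one valid choice satisfying the definition above).

```python
import numpy as np

def sigma_points(x, P):
    """2n sigma points x ± rows of sqrt(nP), with sqrt(nP)ᵀ sqrt(nP) = nP as defined above."""
    n = len(x)
    A = np.linalg.cholesky(n * P).T        # A satisfies AᵀA = nP
    return np.vstack([x + A[i] for i in range(n)] + [x - A[i] for i in range(n)])

def ukf_step(f, hfun, Q, R, x_post, P_post, y):
    n = len(x_post)
    # Time updating: propagate sigma points through the dynamics
    X = np.array([f(s) for s in sigma_points(x_post, P_post)])
    x_prior = X.mean(axis=0)
    P_prior = (X - x_prior).T @ (X - x_prior) / (2 * n) + Q
    # Measurement updating: regenerate sigma points around the prior estimate
    Xs = sigma_points(x_prior, P_prior)
    Y = np.array([hfun(s) for s in Xs])
    y_hat = Y.mean(axis=0)
    Py  = (Y - y_hat).T @ (Y - y_hat) / (2 * n) + R
    Pxy = (Xs - x_prior).T @ (Y - y_hat) / (2 * n)
    K = Pxy @ np.linalg.inv(Py)
    x_post = x_prior + K @ (y - y_hat)
    P_post = P_prior - K @ Py @ K.T
    return x_post, P_post

# Illustrative 2-state example (not a calibrated hydrological model)
f = lambda x: np.array([x[0] + 0.1 * np.sin(x[1]), 0.95 * x[1]])
hfun = lambda x: np.array([x[0] ** 2])
x, P = np.zeros(2), np.eye(2)
for y in ([0.2], [0.3], [0.1]):
    x, P = ukf_step(f, hfun, 0.01 * np.eye(2), 0.05 * np.eye(1), x, P, np.array(y))
print(x, P)
```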
5. Auto-regressive method
The Auto-Regressive (AR) model, which is a time series-based approach, aims to model error directly in the same way as the
TF method. The AR technique is one of the error correction methods widely used by researchers because of its simplicity
and relevant results (Refsgaard, 1997; Xiong and O’Connor, 2002).
The standard AR model for updating the simulation errors involves a calibration procedure summarized below. Firstly, the simulation error of the model at time instant $k$ is obtained as:
$$e_k = Q_k - \hat{Q}_k \qquad (18)$$
where $e_k$ is the simulation error of the selected model, and $Q_k$ and $\hat{Q}_k$ denote the observed and estimated values, respectively. If the mean value of the simulation error series of the calibration period, denoted by $\bar{e}$, is not equal to zero, then that mean value should be subtracted from the simulation errors to produce the corresponding zero-mean time series $e_k$:
$$e_k = \left(Q_k - \hat{Q}_k\right) - \bar{e} \qquad (19)$$
Thus, the AR updating model of order $p$ at time step $k$ is given by:
$$\hat{e}_k = \sum_{i=1}^{p} a_i e_{k-i} \qquad (20)$$
where $\{a_i\}_{i=1}^{p}$ are the AR coefficients and $\hat{e}_k$ is the estimate of $e_k$. It is worth mentioning that the Auto-Correlation Function (ACF) of the time series $e_i$ satisfies a linear difference equation analogous to Eq. (20), so the Yule-Walker equations can be exploited to estimate the parameters $\{a_i\}_{i=1}^{p}$ by replacing the theoretical auto-correlations with their respective estimates (obtained from the $e_i$ time series) (Box and Jenkins, 1976). In practice, however, the AR parameters $\{a_i\}_{i=1}^{p}$ are generally estimated by the Least Squares (LS) method by treating Eq. (20) as a linear regression. Shamseldin and O'Connor (2001) show that the mathematical formulation of the AR error-forecast updating model is itself a special limiting case of the more general input-output structure of the linear Auto-Regressive Exogenous-input Model (ARXM) (Abdelrahman, 1995), also known as the Linear Transfer Function Model (LTFM) (Xiong and O'Connor, 2002).
By incorporating a residual updated forecast error term, $e_k$, the ARXM model-output updating procedure has the form (Shamseldin and O'Connor, 2001):
$$Q_k = \sum_{i=1}^{p} a_i Q_{k-i} + \sum_{i=0}^{q} b_i \hat{Q}_{k-i} + e_k \qquad (21)$$
where $Q_k$ and $\hat{Q}_k$ have the same definitions as in Eq. (18), $p$ and $q$ are the orders of the AR and the exogenous-input parts of the ARXM, respectively, and $a_i$ and $b_i$ are the corresponding coefficient parameters of the two parts. Clearly, if $p = q$, $b_0 = 1$, and $a_i = -b_i$ for $i = 1, 2, \ldots, p$, the ARXM becomes the AR model of the error series $e_k$, whereas if $b_i = 0$ for $i = 0, \ldots, q$, it becomes the naive AR updating model. Since Eq. (21) has the form of a multiple linear regression, the parameters of the ARXM can also be estimated directly using the LS method.
The results of Shamseldin and O'Connor (2001), who used the ARXM updating model as their benchmark updating procedure, show that the ARXM procedure is not significantly more efficient than the conventional AR model.
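As a sketch of the least-squares route described above, the snippet below fits an AR(p) error-updating model to a synthetic zero-mean simulation-error series and produces a one-step-ahead error estimate via Eq. (20); the series, its generating process, and the order p = 2 are illustrative assumptions.

```python
import numpy as np

def fit_ar_ls(e, p):
    """Least-squares estimate of the AR(p) coefficients by treating Eq. (20) as a regression."""
    X = np.column_stack([e[p - i:len(e) - i] for i in range(1, p + 1)])  # lagged errors
    t = e[p:]                                                            # targets e_k
    a, *_ = np.linalg.lstsq(X, t, rcond=None)
    return a

# Synthetic simulation-error series (illustrative only), centered to zero mean
rng = np.random.default_rng(0)
e = np.zeros(200)
for k in range(1, 200):
    e[k] = 0.7 * e[k - 1] + rng.normal(scale=0.1)
e -= e.mean()

a = fit_ar_ls(e, p=2)
e_next = a @ e[-1:-3:-1]          # one-step-ahead error estimate: a1*e_{k-1} + a2*e_{k-2}
print(a, e_next)
```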
6. Considerations in using data assimilation
Data assimilation (DA) methods can significantly increase the accuracy of predicted results and are often considered the best option. However, some considerations should be addressed in the use of these methods, as follows (House et al., 2003):
(i) Applying updating methods does not necessarily eliminate the need for a model that has been properly calibrated for a wide range of data. Further, it is necessary to examine the accuracy of the calibration performed for a validation period, which contains the recorded data.
(ii) The quality of the updating process of the forecasting values relies heavily on the quality of the input data, and the existence of errors in the input data of this process not only fails to improve the results but also degrades the accuracy. Therefore, it is suggested to control the quality of the input data manually or automatically.
(iii) Regarding the real-time control applications of hydraulic structures, the use of updating methods requires addressing special issues in network design; otherwise, failing to consider the interaction between the modeling and the control commands of the structures will adversely affect the results.
7. Conclusions
The modeling of hydrological processes involves a multitude of climatic parameters and data; therefore, providing an appropriate simulation model that leads to minimal error has always been a vital challenge in previous studies. The uncertainty and lack of reliability in the accuracy of the data and input parameters of simulation models lead to errors, which have a significant adverse effect on long-term forecasts and management policies. The forecasting of flow rate by hydrological models is always accompanied by uncertainty. For this reason, various methods, such as increasing the quality of the input data, improving the model structure, and assimilating observational data, are used to reduce the uncertainty of the models. Data assimilation is one of the ways to reduce uncertainty; it does so by considering the uncertainty of the inputs and observations, as well as by updating the state variables. Moreover, the KF method and its extensions are regarded as among the most widely used data assimilation methods. These methods, which are commonly used in various scientific fields, are recursive and can be applied to linear and nonlinear systems. Their main advantages are that there is no need to preserve all measured information from the beginning up to the present, and that the next step can be forecasted using only the measurements of the present step.
References
Abdelrahman, E., 1995. Real-Time Stream Flow Forecasting for Single-Input Single-Output Systems. Unpublished M. Sc. thesis, National University of
Ireland, Galway.
Anctil, F., Perrin, C., Andreassian, V., 2003. ANN output updating of lumped conceptual rainfall/runoff forecasting models. J. Am. Water Resour. Assoc.
39 (5), 1269–1279.
Anderson, B.D., Moore, J.B., 2012. Optimal Filtering. Courier Corporation.
Aubert, D., Loumagne, C., Oudin, L., 2003. Sequential assimilation of soil moisture and streamflow data in a conceptual rainfall–runoff model. J. Hydrol.
280 (1–4), 145–161.
Babovic, V., et al., 2001. Neural networks as routine for error updating of numerical models. J. Hydraul. Eng. 127 (3), 181–193.
Bailey, R.T., Baù, D.A., 2010. Assimilating Water Table Elevation Data Into a Catchment Hydrology Modeling Framework to Estimate Hydraulic Conductivity. Colorado State University. Libraries.
Bao, W., et al., 2011. Real-time equivalent conversion correction on river stage forecasting with Manning’s formula. J. Hydrol. Eng. 16 (1), 1–9.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. revised ed Holden-Day, San Francisco.
Brocca, L., et al., 2009. Assimilation of observed soil moisture data in storm rainfall-runoff modeling. J. Hydrol. Eng. 14 (2), 153–165.
Butts, M., et al., 2005. Ensemble-based methods for data assimilation and uncertainty estimation in the FLOODRELIEF project. In: ACTIF International
Conference on Innovation Advances and Implementation of Flood Forecasting Technology.
Chao, Z., et al., 2008. Robust recursive estimation of auto-regressive updating model parameters for real-time flood forecasting. J. Hydrol. 349 (3–4),
376–382.
Chiu, C.-L., 1978. Applications of Kalman Filter to Hydrology, Hydraulics, and Water Resources: Proceedings of AGU Chapman Conference, Held at
University of Pittsburgh, Pittsburgh, Pennsylvania, USA, May 22–24, 1978. American Geophysical Union (USA).
Cho, K.H., et al., 2020. Data assimilation in surface water quality modeling: a review. Water Res. 186, 116307.
Clark, M.P., et al., 2008. Hydrological data assimilation with the ensemble Kalman filter: use of streamflow observations to update states in a distributed
hydrological model. Adv. Water Resour. 31 (10), 1309–1324.
Dorigo, W.A., et al., 2007. A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. Int. J. Appl.
Earth Obs. Geoinf. 9 (2), 165–193.
Drecourt, J.-P., 2003. Kalman Filtering in Hydrological Modeling. DAIHM, Hørsholm, Denmark.
Duan, Q., Sorooshian, S., Gupta, V., 1992. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resour. Res. 28 (4),
1015–1031.
Duffy, C.J., Gelhar, L.W., 1986. A frequency domain analysis of groundwater quality fluctuations: interpretation of field data. Water Resour. Res. 22 (7),
1115–1128.
Eigbe, U., et al., 1998. Kalman filtering in groundwater flow modelling: problems and prospects. Stoch. Hydrol. Hydraul. 12 (1), 15–32.
Entekhabi, D., Nakamura, H., Njoku, E.G., 1994. Solving the inverse problem for soil moisture and temperature profiles by sequential assimilation of
multifrequency remotely sensed observations. IEEE Trans. Geosci. Remote Sens. 32 (2), 438–448.
Eppstein, M.J., Dougherty, D.E., 1996. Simultaneous estimation of transmissivity values and zonation. Water Resour. Res. 32 (11), 3321–3336.
Evensen, G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics.
J. Geophys. Res. Oceans 99 (C5), 10143–10162.
Ferraresi, M., Todini, E., Vignoli, R., 1996. A solution to the inverse problem in groundwater hydrology based on Kalman filtering. J. Hydrol. 175 (1–4),
567–581.
Fread, D.L., 1981. Flood routing: a synopsis of past, present, and future capability. In: Singh, V.P. (Ed.), Proceedings of International Symposium on
Rainfall-Runoff Modeling. Water Resources Publications, Littleton, CO, pp. 521–541.
Galantowicz, J.F., Entekhabi, D., Njoku, E.G., 1999. Tests of sequential data assimilation for retrieving profile soil moisture and temperature from
observed L-band radiobrightness. IEEE Trans. Geosci. Remote Sens. 37 (4), 1860–1870.
Georgakakos, K.P., 1986. A generalized stochastic hydrometeorological model for flood and flash-flood forecasting: 1. Formulation. Water Resour. Res.
22 (13), 2083–2095.
Griessinger, N., et al., 2016. Assessing the benefit of snow data assimilation for runoff modeling in Alpine catchments. Hydrol. Earth Syst. Sci. 20 (9),
3895–3905.
Guang-Te, W., Yu, Y.-S., Kay, W., 1987. Improved flood routing by ARMA modelling and the Kalman filter technique. J. Hydrol. 93 (1–2),
175–190.
Hendricks Franssen, H.-J., et al., 2011. Operational real-time modeling with ensemble Kalman filter of variably saturated subsurface flow including
stream-aquifer interaction and parameter updating. Water Resour. Res. 47 (2).
Hino, M., 1973. Stochastic approach to linear and nonlinear runoff analysis. In: Flood Investigation. vol. II. Asian Institute of Technology.
House, E., et al., 2003. Defra/Environment Agency Flood and Coastal Defence R&D Programme.
Hsu, M.-H., Fu, J.-C., Liu, W.-C., 2006. Dynamic routing model with real-time roughness updating for flood forecasting. J. Hydraul. Eng. 132 (6),
605–619.
Huang, W.-C., 1999. Kalman filter effective to hydrologic routing? J. Mar. Sci. Technol. 7 (1), 65–71.
Husain, T., 1985. Kalman filter estimation model in flood forecasting. Adv. Water Resour. 8 (1), 15–21.
Jukic, D., Denic-Jukic, V., 2004. A frequency domain approach to groundwater recharge estimation in karst. J. Hydrol. 289 (1–4), 95–110.
Kachroo, R., 1992. River flow forecasting. Part 1. A discussion of the principles. J. Hydrol. 133 (1–2), 1–15.
Karl, T.R., Knight, R.W., Plummer, N., 1995. Trends in high-frequency climate variability in the twentieth century. Nature 377 (6546), 217–220.
Kashif Gill, M., Kemblowski, M.W., McKee, M., 2007. Soil moisture data assimilation using support vector machines and ensemble Kalman filter 1.
J. Am. Water Resour. Assoc. 43 (4), 1004–1015.
Katul, G.G., et al., 1993. Estimation of in situ hydraulic conductivity function from nonlinear filtering theory. Water Resour. Res. 29 (4), 1063–1070.
Kay, S.M., 1993. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc.
Khaki, M., Hendricks Franssen, H.-J., Han, S., 2020. Multi-mission satellite remote sensing data for improving land hydrological models via data assimilation. Sci. Rep. 10 (1), 1–23.
Kitanidis, P.K., Bras, R.L., 1980. Real-time forecasting with a conceptual hydrologic model: 1. Analysis of uncertainty. Water Resour. Res. 16 (6),
1025–1033.
Komma, J., Blöschl, G., Reszler, C., 2008. Soil moisture updating by Ensemble Kalman Filtering in real-time flood forecasting. J. Hydrol. 357 (3–4),
228–242.
Krzysztofowicz, R., 2001. The case for probabilistic forecasting in hydrology. J. Hydrol. 249 (1–4), 2–9.
Lee, Y., Singh, V., 1999. Tank model using Kalman filter. J. Hydrol. Eng. 4 (4), 344–349.
Lettenmaier, D.P., Burges, S.J., 1976. Use of state estimation techniques in water resource system modeling 1. J. Am. Water Resour. Assoc. 12 (1), 83–99.
Liu, Y., Gupta, H.V., 2007. Uncertainty in hydrologic modeling: toward an integrated data assimilation framework. Water Resour. Res. 43 (7).
Liu, W.-C., et al., 2010. Dynamic routing modeling for flash flood forecast in river system. Nat. Hazards 52 (3), 519–537.
Madsen, H., Skotner, C., 2005. Adaptive state updating in real-time river flow forecasting—a combined filtering and error forecasting procedure. J. Hydrol.
308 (1–4), 302–312.
Markussen, L.M., 1985. Application of the Kalman filter to real time operation and to uncertainty analyses in hydrological modelling. In: Scientific Procedures Applied to the Planning, Design and Management of Water Resources Systems. vol. 147. International Association of Hydrological Sciences,
Wallingford, Oxfordshire, pp. 273–282.
Maybeck, P., 1979. Stochastic Models, Estimation and Control. vol. 1 Academic Press.
McLaughlin, D., Townley, L.R., 1996. A reassessment of the groundwater inverse problem. Water Resour. Res. 32 (5), 1131–1161.
Misirli, F., et al., 2003. Bayesian recursive estimation of parameter and output uncertainty for watershed models. In: Calibration of Watershed Models.
American Geophysical Union, Washington, DC, pp. 113–124.
Moore, R., 1999. Real-time flood forecasting systems: perspectives and prospects. In: Floods and landslides: Integrated Risk Assessment. Springer,
pp. 147–189.
Moradkhani, H., et al., 2005. Dual state–parameter estimation of hydrological models using ensemble Kalman filter. Adv. Water Resour. 28 (2), 135–147.
Mu, J.-b., Zhang, X.-f., 2007. Real-time flood forecasting method with 1-D unsteady flow model. J. Hydrodynam. 19 (2), 150–154.
Neal, J., et al., 2009. A data assimilation approach to discharge estimation from space. Hydrol. Process. 23 (25), 3641–3649.
O’Connell, P., Clarke, R., 1981. Adaptive hydrological forecasting—a review/Revue des methodes de prevision hydrologique ajustables. Hydrol. Sci. J. 26
(2), 179–205.
Pappenberger, F., et al., 2004. The influence of rating curve uncertainty on flood inundation predictions. In: Flood Risk Assessment, Bath.
Pappenberger, F., et al., 2007. Grasping the unavoidable subjectivity in calibration of flood inundation models: a vulnerability weighted approach.
J. Hydrol. 333 (2–4), 275–287.
Parajka, J., et al., 2006. Assimilating scatterometer soil moisture data into conceptual hydrologic models at the regional scale. Hydrol. Earth Syst. Sci. 10
(3), 353–368.
Paudyal, G.N., 2002. Forecasting and warning of water-related disasters in a complex hydraulic setting—the case of Bangladesh. Hydrol. Sci. J. 47 (S1),
S5–S18.
Puente, C.E., Bras, R.L., 1987. Application of nonlinear filtering in the real time forecasting of river flows. Water Resour. Res. 23 (4), 675–682.
Refsgaard, J.C., 1997. Validation and intercomparison of different updating procedures for real-time forecasting. Hydrol. Res. 28 (2), 65–84.
Reichle, R.H., Entekhabi, D., McLaughlin, D.B., 2001. Downscaling of radio brightness measurements for soil moisture estimation: a four-dimensional
variational data assimilation approach. Water Resour. Res. 37 (9), 2353–2364.
Riyahi, M.M., Rahmanshahi, M., Ranginkaman, M.H., 2018. Frequency domain analysis of transient flow in pipelines; application of the genetic programming to reduce the linearization errors. J. Hydraul. Struct. 4 (1), 75–90.
Rodríguez-Iturbe, I., Mejía, J.M., 1974. The design of rainfall networks in time and space. Water Resour. Res. 10 (4), 713–728.
Rungø, M., Refsgaard, J., Havnø, K., 1989. The updating procedure in the MIKE 11 modelling system for real-time forecasting. In: Proceedings of the
International Symposium for Hydrological Applications of Weather Radar. University of Salford.
Saddagh, M., Abedini, M., 2012. Enhancing MIKE11 updating kernel and evaluating its performance using numerical experiments. J. Hydrol. Eng. 17 (2),
252–261.
Sahoo, A.K., et al., 2013. Assimilation and downscaling of satellite observed soil moisture over the Little River Experimental Watershed in Georgia, USA.
Adv. Water Resour. 52, 19–33.
Schad, A., et al., 2015. Recent developments in helioseismic analysis methods and solar data assimilation. Space Sci. Rev. 196 (1), 221–249.
Sene, K., 2008. Flood Warning, Forecasting and Emergency Response. Springer Science & Business Media.
Seo, D.-J., Koren, V., Cajina, N., 2003. Real-time variational assimilation of hydrologic and hydrometeorological data into operational hydrologic forecasting. J. Hydrometeorol. 4 (3), 627–641.
Shamseldin, A.Y., O’Connor, K.M., 2001. A non-linear neural network technique for updating of river flow forecasts. Hydrol. Earth Syst. Sci. 5 (4),
577–598.
Simon, D., 2006. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. John Wiley & Sons.
Sun, C., et al., 2016. Fuzzy copula model for wind speed correlation and its application in wind curtailment evaluation. Renew. Energy 93, 68–76.
Tiefenbacher, J., 2012. Approaches to Managing Disaster: Assessing Hazards, Emergencies and Disaster Impacts. BoD–Books on Demand.
Troch, P., Paniconi, C., McLaughlin, D., 2003. Catchment-scale hydrological modeling and data assimilation. Adv. Water Resour. 26, 131–135.
Tsonis, A., 2004. Is global warming injecting randomness into the climate system? Eos, Transactions American Geophysical Union 85 (38), 361–364.
Verlaan, M., et al., 2005. Operational storm surge forecasting in the Netherlands: developments in the last decade. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 363 (1831), 1441–1453.
Vrugt, J.A., et al., 2006. Real-time data assimilation for operational ensemble streamflow forecasting. J. Hydrometeorol. 7 (3), 548–565.
Wagener, T., Gupta, H.V., 2005. Model identification for hydrological forecasting under uncertainty. Stoch. Environ. Res. Risk Assess. 19 (6), 378–387.
Walker, J.P., Willgoose, G.R., Kalma, J.D., 2001. One-dimensional soil moisture profile retrieval by assimilation of near-surface observations: a comparison of retrieval algorithms. Adv. Water Resour. 24 (6), 631–650.
Walker, J.P., Willgoose, G.R., Kalma, J.D., 2002. Three-dimensional soil moisture profile retrieval by assimilation of near-surface measurements: Simplified Kalman filter covariance forecasting and field application. Water Resour. Res. 38 (12). 37-1-37-13.
Wang, C.-H., Bai, Y.-L., 2008. Algorithm for real time correction of stream flow concentration based on Kalman filter. J. Hydrol. Eng. 13 (5), 290–296.
Weerts, A., El Serafy, G., 2005. Comparing particle filtering and ensemble Kalman filtering for input correction in rainfall runoff modelling.
Weerts, A.H., El Serafy, G.Y., 2006. Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff
models. Water Resour. Res. 42 (9).
Wöhling, T., Lennartz, F., Zappa, M., 2006. Updating procedure for flood forecasting with conceptual HBV-type models. Hydrol. Earth Syst. Sci. 10 (6),
783–788.
Wood, E.F., O’Connell, P.E., 1985. Real-time forecasting. In: Hydrol Forecast. John Wiley & Sons, pp. 505–558.
Xie, X., Zhang, D., 2010. Data assimilation for distributed hydrological catchment modeling via ensemble Kalman filter. Adv. Water Resour. 33 (6),
678–690.
Xiong, L., O’Connor, K.M., 2002. Comparison of four updating models for real-time river flow forecasting. Hydrol. Sci. J. 47 (4), 621–639.
Xiong, L., O’Connor, K.M., Guo, S., 2004. Comparison of three updating schemes using artificial neural network in flow forecasting. Hydrol. Earth Syst.
Sci. 8 (2), 247–255.
Yaesh, I., Shaked, U., 1991. A transfer function approach to the problems of discrete-time systems: H∞-optimal linear control and filtering. IEEE Trans. Autom. Control 36 (11), 1264–1271.
Young, P.C., 2002. Advances in real–time flood forecasting. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 360 (1796), 1433–1450.
Yu, P.-S., Chen, S.-T., 2005. Updating real-time flood forecasting using a fuzzy rule-based model/Mise à jour de prévision de crue en temps réel grâce à un modèle à base de règles floues. Hydrol. Sci. J. 50 (2).
Zhang, H., et al., 2017. State and parameter estimation of two land surface models using the ensemble Kalman filter and the particle filter. Hydrol. Earth
Syst. Sci. 21 (9), 4927–4958.
Chapter 9
Data reduction techniques
M. Mehdi Bateni (a) and Saeid Eslamian (b, c)
(a) University School for Advanced Studies, Pavia, Italy; (b) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (c) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Short and dense temporal and spatial intervals of data acquisition programs have resulted in dependent observations. In addition to important information, real data contain useless or even confusing information, which can be considered as noise. At the post-data-collection phase, different redundancy reduction algorithms can be used to decrease the redundant information in the data set. These algorithms can be categorized into sample reduction (e.g., clustering) and dimension reduction techniques. In order to determine the reliability of hydrologic design variables derived from these dependent observations, it is necessary to reduce the sample data to an equivalent series of independent observations. Moreover, dimension reduction techniques are useful to handle the heterogeneity and massiveness of data by reducing many-variable data to a manageable size. Dimension reduction techniques can also help reach a more parsimonious model through the selection of the more important features to include. Here, a review of the most common sample reduction and dimension reduction techniques in hydroinformatics is presented.
2. Principal component analysis
Principal component analysis (PCA) is usually utilized to transform a large number of variables into a small number of
orthogonal variables which present common causes of variable changes (Eslamian et al., 2010). It has been developed
to extract uncorrelated components of data which variation is maximized in those directions. PCA can save computation
time since there are less independent variables reconstructed from original data set by PCA. It is a basic robust multivariate
statistical method that does not require normally distributed and uncorrelated variables. PCA is able to remove the data
noise and to cluster the refined samples of similar composition into groups to reveal relationships among their variables.
All variables can be illustrated simultaneously by projecting basis vectors onto the two/three leading PCs.
PCA as a technique became popular following papers by Lorenz in the mid-1950s—who called this technique Empirical
Orthogonal Function (EOF) analysis. Both these names refer to the same set of procedures. However, EOF is the more
popular term in atmospheric sciences. The method was firstly introduced by Pearson (1901), though until the 1950s,
the method had limited applications due to the lack of computational equipment. Since then, it has been widely used in
environmental science.
The main objective of PCA is to look for new variables in the sample which are not correlated with each other. Each of these new variables, or principal components, is a linear combination of the former variables and describes a different source of the total variation. The method sorts the initially correlated data into components that explain successively less of the total variation. Hence, the data can be reduced by trimming off the less important transformed variables, which are the last ones (Fig. 1).
The dataset is typically represented by a matrix of samples or observations which are characterized by many physical,
chemical, and other variables of different magnitude and units. Suppose the data consist of $n$ observations of $p$ variables and are represented as an $n \times p$ matrix, called $X$. Without loss of generality, assume that the variables in the data matrix $X$ are standardized so that each has a zero mean and unit variance. The principal components are
$$t_1 = w_{1,1} x_{1,1} + w_{1,2} x_{1,2} + \ldots + w_{1,p} x_{1,p}$$
$$t_2 = w_{2,1} x_{2,1} + w_{2,2} x_{2,2} + \ldots + w_{2,p} x_{2,p}$$
$$\vdots$$
$$t_n = w_{n,1} x_{n,1} + w_{n,2} x_{n,2} + \ldots + w_{n,p} x_{n,p}$$
FIG. 1 X′ and Y′ are orthogonal directions which describe the most variation of the data and are therefore the first two principal components.
where $w_{i,j}$ and $x_{i,j}$ ($1 \le i \le n$, $1 \le j \le p$) are component loadings and original data, respectively. The component loadings are the contribution measures of a particular variable to the principal components. For each $1 \le i \le n$, it holds that $w_{i,1}^2 + w_{i,2}^2 + \cdots + w_{i,p}^2 = 1$. The variability of the principal components is ordered as $\mathrm{Var}(t_1) > \mathrm{Var}(t_2) > \ldots > \mathrm{Var}(t_n)$. The sum of the eigenvalues of the covariance matrix of the data gives the total variance of the original data matrix $X$.
Singular value decomposition (SVD) is a computational method often employed to efficiently calculate the principal components of a dataset. The SVD of an $n \times p$ matrix $M$ is a factorization into three matrices of the form $USV^T$, where $U$ is an $n \times n$ matrix with mutually perpendicular columns (orthogonal matrix), $S$ is an $n \times p$ rectangular diagonal matrix with nonnegative numbers in decreasing order on the diagonal, and $V$ is a $p \times p$ orthogonal matrix. The diagonal entries of $S$ are known as the singular values ($s_i$) of the original matrix $M$ (Austin, 2009). If we now perform singular value decomposition of the standardized data matrix $X$, we obtain
$$X = USV^T,$$
where $U$ and $V$ are orthogonal matrices and have columns with unit magnitude.a $S$ is the diagonal matrix of singular values $s_i$ of the data matrix. The columns of the matrices $U$ and $V$ contain the (left and right) singular vectors of $X$. The $p \times p$ covariance matrix ($C$) of the data is given by $C = X^T X/(n-1)$. One can easily see that the covariance matrix can be rewritten as follows using the SVD of $X$:
$$C = \frac{X^T X}{n-1} = \frac{\left(USV^T\right)^T \left(USV^T\right)}{n-1} = \frac{V S U^T U S V^T}{n-1} = V \frac{S^2}{n-1} V^T,$$
which can itself be considered as the SVD of the covariance matrix with diagonal matrix $\Lambda = S^2/(n-1)$. In the above equation, $V$ is a matrix of eigenvectors (each column is an eigenvector) and $\Lambda$ is a diagonal matrix with eigenvalues $\lambda_i = s_i^2/(n-1)$ on the diagonal. Since the eigenvectors of the covariance matrix are actually the directions of the axes where there is the most variance, they are called principal components (PCs). The principal component scores are given by the columns of $XV = USV^T V = US$. These components can be seen as new, transformed variables. The $j$th principal component is given by the $j$th column of $XV$. The coordinates of the $i$th data point in the new PC space are given by the $i$th row of $XV$. The eigenvalues are simply the coefficients attached to the eigenvectors, which give the amount of variance carried in each principal component.
To reduce the dimensions of the data from $p$ to $k < p$, select the first $k$ columns of $V$ and the $k \times k$ upper-left part of $S$. Their product, $U_k S_k$, is the $n \times k$ matrix containing the first $k$ principal components. Further multiplying the first $k$ principal components by the corresponding principal axes ($V_k^T$) yields $X_k = (X V_k) V_k^T$. The matrix $X_k$ provides a reconstruction of the original data from the first $k$ principal components and has the original $n \times p$ size but a lower rank of $k$.b Fortunately, there are some efficient implementations of SVD in common programming languages which find just the top $k$ eigenvectors.
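A short NumPy sketch of the SVD route described above is given below; the random data matrix and the choice of k = 2 retained components are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                 # n = 100 observations, p = 5 variables
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize: zero mean, unit variance

U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = s ** 2 / (len(X) - 1)               # eigenvalues of C: lambda_i = s_i^2 / (n - 1)
scores = U * s                                # principal component scores, equals X @ Vt.T

k = 2                                         # keep the first k components
Xk = scores[:, :k] @ Vt[:k, :]                # rank-k reconstruction of X
explained = eigvals[:k].sum() / eigvals.sum() # fraction of total variance retained
print(eigvals, round(explained, 3))
```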
If you look at the spatial patterns of PCs, there is a temptation to ascribe some physical meaning to them, but this is not always a good idea, because the orthogonality constraint on the eigenvectors can mean that the second and third PCs bear no resemblance to the physical mechanisms that drive the data. The first PC represents the most important mode of
a. An orthogonal matrix that has columns with unit magnitude is called an orthonormal matrix.
b. The rank of the matrix refers to the number of linearly independent rows or columns in the matrix. In other words, the rank of a matrix is the dimension of
the vector space generated by its columns/rows.
FIG. 2 A typical scree diagram of eigenvalues from the unreduced correlation matrix, arrow indicates region of curve where slope changes.
variability or physical process, but it may include aspects of other correlated modes and processes. As noted earlier, the variance of each PC is equal to its corresponding eigenvalue. In order to decide how many PCs are required to reduce the data without significant loss of information, a scree diagram can be helpful. A scree diagram is a graph plotting the eigenvalues of the covariance matrix $C$, in decreasing order, against their component number. By assessing the change in the slope of the diagram, one can choose the number of PCs to be used for the data reduction. For example, four or five PCs may be selected in Fig. 2. Moreover, the Kaiser criterion can be used for selecting the number of PCs. It is based on the fact that the more variables load onto a particular component (i.e., have a high correlation with the component), the more important that component is in summarizing the data. Hence, it drops the components for which the eigenvalues are less than 1 (Beavers et al., 2013). Employing the PCA method for data prefiltering can avoid multicollinearity. Multicollinearity is the occurrence of mutual correlations among a set of predictor variables, which can result in unstable regression parameters. When the data vectors are spatial distributions of values at a single time, PCA can filter away much of the small-scale noise with a minimal loss of information. When PCA is applied to a time series which is structured into overlapping moving windows of data, it may reveal oscillatory features in the series. In this case, the eigenvectors represent characteristic time patterns rather than characteristic spatial patterns. PCA might be misguided in the presence of outliers. Moreover, the complexity of PCA for a matrix of size $n \times p$ is $O(p^2 n + p^3)$, which is relatively high.
The description given here is by no means complete. Those who want a more complete description should read Jolliffe
(1986).
3. Singular spectrum analysis
Singular spectrum analysis (SSA) is a data reduction tool based on the PCA concept, usually called extended empirical orthogonal function (EEOF) analysis in the atmospheric sciences. Like PCA, it can be used as a data reduction method. SSA provides the ability to discern "common patterns of variability" shared among multiple datasets in both space and time. The extension may be in space (S-mode) or in time (T-mode). The mathematics is essentially the same as for PCA, and the difference lies in the preprocessing of the data. When the technique is applied to multivariate data (many time series), it is known as multivariate or multichannel singular spectrum analysis (MSSA).
3.1 Univariate singular spectral analysis
In order to explain the implementation of the univariate case of SSA, consider a single time series as a vector $X(t)$, $t = 1, \ldots, n$. Like PCA, eigenvectors and eigenvalues are extracted from the covariance matrix. Yet, the covariance matrix is calculated using a delay window, or by imposing an embedding dimension of length $m$ on the time series. This vector contains the values of the covariance between $X(t)$ and $X(t+k)$ with $k = 0, \ldots, m-1$. Note that we defined $X$ such that it has unit variance; hence, the covariance at lag zero equals one. The idea is thus to compute the covariance between the values $X(t)$ and $X(t+k)$, where $k$ is a delay (or "lag"). That is, using the definition above, if the covariance at lag $k$ is positive, the values $X(t)$ and $X(t+k)$ tend to vary together (Fig. 3).
Hence, the trajectory matrix ($Y$) would be
$$Y = \begin{bmatrix} x_{(1)}^T \\ x_{(2)}^T \\ \vdots \\ x_{(n-3)}^T \\ x_{(n-2)}^T \end{bmatrix}.$$
The number of rows of the matrix $Y$ is $n - m + 1$ and the number of its columns is $m = 3$. $C$ is computed as $C = Y^T Y/(n - m + 1)$, which follows from the definition of covariance. The diagonal of the matrix $C$ contains the variances of each column, which should be close to one in case the data are standardized. The eigenvectors of the covariance matrix ($C$) are the principal components and are called temporal empirical orthogonal functions (EOFs). Hence, the matrix of eigenvectors ($V_C$) can project the embedded time series onto its principal components ($PC = Y V_C$). For example, in the case of $m = 3$, the projection results in a PC matrix with three columns, which are the first three principal components PC1, PC2, and PC3. The first column is PC1, the second column is PC2, etc.
Principal components are a projection of the data in a different coordinate system, and hence their interpretation is different from that of the data series. However, by projecting the PCs back onto the eigenvectors, we obtain time series (referred to as reconstructed components (RCs) in SSA terminology) in the original coordinates. For this, we need to construct a matrix ($Z$) that is, like the matrix $Y$, an embedded time series. There is one such matrix for each one of the principal components. For example, let us construct the matrix $Z$ for the first principal component, PC1 (i.e., for the first column of the matrix $PC$). The first column of $Z$ is simply PC1. The second column is PC1 at time $t - 1$. The third column is PC1 at time $t - 2$. Note that, again, zeros have been put at the beginning, where data are not available. RC1 (the first reconstructed component) is derived by multiplying $Z$ by the first eigenvector of the covariance matrix and dividing by $m$: $RC1 = Z V_C(:,1)/m$. For the other RCs, a similar method can be adopted using the other PCs.
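The following sketch assembles the trajectory matrix, the lag-covariance matrix, the temporal EOFs, and the first reconstructed component for a synthetic noisy sine series; the series, the window length m = 3, and the zero-padding convention follow the description above but are otherwise illustrative assumptions.

```python
import numpy as np

def ssa_rc(x, m):
    """Univariate SSA sketch: trajectory matrix, temporal EOFs, and first reconstructed component."""
    x = (x - x.mean()) / x.std()                                # standardize the series
    n = len(x)
    Y = np.column_stack([x[i:n - m + 1 + i] for i in range(m)])  # (n-m+1) x m trajectory matrix
    C = Y.T @ Y / (n - m + 1)                                    # lag-covariance matrix
    eigvals, V = np.linalg.eigh(C)
    V = V[:, np.argsort(eigvals)[::-1]]                          # sort EOFs by decreasing variance
    PC = Y @ V                                                   # principal components
    # Reconstruct the first component: embed PC1 with lags 0..m-1 (zeros where unavailable)
    Z = np.zeros((n - m + 1, m))
    for j in range(m):
        Z[j:, j] = PC[:n - m + 1 - j, 0]
    RC1 = Z @ V[:, 0] / m
    return PC, RC1

t = np.arange(200)
x = np.sin(2 * np.pi * t / 25) + 0.2 * np.random.default_rng(1).normal(size=200)
PC, RC1 = ssa_rc(x, m=3)
print(PC.shape, RC1.shape)
```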
FIG. 3 Imposing an embedding dimension of length m = 3 on the data time series.
3.2 Multivariate singular spectral analysis
This section describes the differences between the univariate SSA and its multivariate (or multichannel) extension, MSSA. We go through the same steps as in the previous section and clarify the differences. The procedure for the multivariate case is quite close to what is done for the univariate case; nevertheless, some modifications are needed. As in the univariate case, an essential and necessary step is to standardize the multiple time series separately. Then, consider the multiple time series as multiple columns of the data matrix. The trajectory matrix in this case consists of multiple lags of each variable, one after another. For example, when there are two variables with embedding length $m = 2$, the first column is the first variable, the second column is the first variable with a one-step lag, the third column is the second variable, and finally the fourth column is the second variable with a one-step lag. The covariance matrix ($C$) of the trajectory matrix has $l \cdot m$ columns, where $l$ is the number of variables in the time series. Using SVD, $l \cdot m$ eigenvectors of $C$ can be extracted. The first $m$ eigenvectors belong to the first variable, and so on. Once the eigenvectors have been determined, the principal components are computed as $PC = Y V_C$, where $Y$ is the trajectory matrix and $V_C$ is the matrix of eigenvectors of $C$. Unlike the EOFs or the matrices $Y$ and $C$, we can no longer identify a part that corresponds to each separate time series; in other words, each PC contains characteristics of all time series. Analogously to univariate SSA, each time series can be reconstructed by projecting the PCs back onto the eigenvectors. As before, we embed each PC with delays $0, \ldots, m-1$, which yields a matrix $Z$ of size $n \times m$ with the same structure. In order to reconstruct the first time series, we use the first $m$ rows of $V_C$ ($RC_{x1}(:,i) = Z V_C(1{:}m, i)/m$), for the second time series the second $m$ rows ($RC_{x2}(:,j) = Z V_C(m+1{:}2m, j)/m$), and so on.
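A minimal sketch of the multichannel trajectory matrix and its eigen-decomposition for two synthetic series with embedding length m = 2 is shown below; the series themselves are invented placeholders.

```python
import numpy as np

def mssa_trajectory(series, m):
    """Multichannel trajectory matrix: for each variable, m lagged copies side by side."""
    cols = []
    for x in series:                                   # each x is one time series
        x = (x - x.mean()) / x.std()                   # standardize each series separately
        n = len(x)
        cols += [x[i:n - m + 1 + i] for i in range(m)] # variable, then its lags
    return np.column_stack(cols)

rng = np.random.default_rng(3)
x1 = np.sin(np.arange(150) / 10) + 0.1 * rng.normal(size=150)
x2 = np.cos(np.arange(150) / 10) + 0.1 * rng.normal(size=150)
Y = mssa_trajectory([x1, x2], m=2)                     # (n-m+1) x (l*m) trajectory matrix
C = Y.T @ Y / Y.shape[0]                               # (l*m) x (l*m) covariance matrix
eigvals, V = np.linalg.eigh(C)
PC = Y @ V[:, ::-1]                                    # principal components, leading first
print(Y.shape, C.shape, PC.shape)
```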
4. Canonical correlation analysis
Canonical correlation analysis (CCA) is a linear dimension reduction method, applied to pairs of multidimensional random
variables. It has found applications in many areas of earth science including regional flood frequency analysis, statistical
downscaling of general circulation models, and forecast of long-range temperature and precipitation (Wikle, 2003).
Proposed by Hotelling (1936), CCA can be seen as the problem of finding basis vectors for two sets of variables such
that the mutual information between the projections of the variables onto these basis vectors are mutually maximized
(Borga and Knutsson, 1998).
CCA could be considered as a dimension reduction method as well as a classification method. The standard formulation
of CCA assumes that the dimensions of the reduced space, i.e., the number of canonical components, are known a priori. For
the case when the number of canonical components is not known, refer to Tripathi and Govindaraju (2010).
Let us assume that the matrix $X$ contains observations of one set of variables while the matrix $Y$ contains relevant observations of a different set of variables. Typically, the data are time series of the observations of the two fields, which may be observed at the same time (coupled variability) or may be lagged in time (statistical prediction). Suppose that $X$ and $Y$ are standardized so that each column has a zero mean and unit variance. Then, the variance/covariance matrices can be estimated as
$$C_{XY} = C_{YX}^T = E\!\left[XY^T\right], \qquad C_{XX} = E\!\left[XX^T\right], \qquad C_{YY} = E\!\left[YY^T\right].$$
The standardized data, $X$ and $Y$, are transformed into sets of new variables (canonical variates), $V = A^T X$ and $W = B^T Y$, where $A$ and $B$ are linear weights, called canonical vectors. The number of pairs of canonical variates is $k = \min(\dim(X), \dim(Y))$. $A$ and $B$ are chosen such that
1. $\mathrm{corr}[v_1, w_1] \ge \mathrm{corr}[v_2, w_2] \ge \ldots \ge \mathrm{corr}[v_k, w_k] \ge 0$ (each of the $k$ pairs of canonical variates exhibits no greater correlation than the previous pair)
2. $\mathrm{corr}[v_i, w_j] = r_C(i)$ for $i = j$ and $\mathrm{corr}[v_i, w_j] = 0$ for $i \ne j$, where $r_C$ denotes the canonical correlation coefficients (each canonical variate is uncorrelated with all other variates except its twin)
In order to calculate the canonical vectors and variates, the matrix $K$ is constructed as
$$K = C_{XX}^{-1/2} C_{XY} C_{YY}^{-1/2}$$
where the power of $-1/2$ denotes the inverse of the square root of the matrix. Let $k$ denote the number of nonzero eigenvalues of $K$. If we now perform singular value decomposition on $K$, we obtain
$$K = Α S Β^T,$$
where the columns of $Α$ and $Β$, i.e., $α_1, \ldots, α_k$ and $β_1, \ldots, β_k$, are orthogonal and $S$ is the diagonal matrix of singular values ($s_i$) of $K$. The elements of the canonical vectors, i.e., $a_i$ and $b_i$, are
$$a_i = C_{XX}^{-1/2} α_i, \qquad b_i = C_{YY}^{-1/2} β_i, \qquad i = 1, \ldots, k,$$
and the canonical correlations are the singular values, $r_C(i) = s_i$.
Note that no distinction is made between the two fields X and Y; each can act interchangeably as predictors or
predictands.
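A brief NumPy sketch of the computation just outlined (inverse square roots of the covariance matrices, SVD of K, and back-transformation to the canonical vectors) is given below; the two synthetic data fields sharing one common signal are illustrative assumptions.

```python
import numpy as np

def cca(X, Y):
    """CCA via SVD of K = Cxx^{-1/2} Cxy Cyy^{-1/2} (X, Y already standardized)."""
    n = len(X)
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    def inv_sqrt(C):
        w, U = np.linalg.eigh(C)                  # symmetric eigendecomposition
        return U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    Alpha, s, Beta_t = np.linalg.svd(K)
    A = inv_sqrt(Cxx) @ Alpha                     # canonical vectors for X
    B = inv_sqrt(Cyy) @ Beta_t.T                  # canonical vectors for Y
    return A, B, s                                # s holds the canonical correlations

rng = np.random.default_rng(7)
common = rng.normal(size=(300, 1))                # shared signal between the two fields
X = np.hstack([common + 0.5 * rng.normal(size=(300, 1)), rng.normal(size=(300, 2))])
Y = np.hstack([common + 0.5 * rng.normal(size=(300, 1)), rng.normal(size=(300, 1))])
X = (X - X.mean(0)) / X.std(0); Y = (Y - Y.mean(0)) / Y.std(0)
A, B, r = cca(X, Y)
print(r)                                          # leading canonical correlation is the largest
```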
5. Factor analysis
Factor analysis (FA) is a linear method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors (Wallis, 1968). Like PCA, FA is another data reduction technique which allows you to capture the variance in the variables in a smaller set. Moreover, FA can be used as a numerical procedure for screening variables and helps build more effective regression equations. Despite all the similarities, there is a fundamental difference between PCA and FA: the former is a linear combination of variables, while the latter is a measurement model of latent variables/dimensions. Only if the unique factors are small (have close to zero variance) does FA give the same results as PCA.
Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and group items that
are part of unified concepts. The researcher makes no a priori assumptions about relationships among factors. In contrast, confirmatory factor analysis (CFA) tests the hypothesis that the items are associated with specific factors. CFA
uses structural equation modeling to test a measurement model whereby loadings on the factors allow for evaluation of
relationships between observed and unobserved variables. Structural equation modeling approaches can accommodate
measurement error, and are less restrictive than least-squares estimation. Hypothesized models are tested against actual
data, and the analysis would demonstrate loadings of observed variables on factors, as well as the correlation
between them.
5.1 Principal axis factoring
PCA and principal axis factoring (PAF) are used for factor extraction. PAF seeks factors which have the highest canonical correlation with the observed variables. It is unaffected by arbitrary rescaling of the data. Here, we explain factor extraction based on PAF.
Consider $k$ independent variables $x_1, \ldots, x_k$ and observed data for each of these variables. Our objective is to identify $m$ factors $y_1, \ldots, y_m$, preferably with $m \le k$ as small as possible, that explain the observed data more succinctly. Let $X = [x_i]$ be a random $k \times 1$ column vector where each $x_i$ represents a sample (observable trait), and let $Μ = [m_i]$ be the $k \times 1$ column vector of the population means; thus $E[x_i] = m_i$. Let $Y = [y_i]$ be an $m \times 1$ vector of unobserved common factors, where $m \le k$. These factors play a role similar to the principal components in PCA.
Suppose that each $x_i$ can be represented as a linear combination of the independent factors as follows:
$$x_i = b_{i0} + \sum_{j=1}^{m} b_{ij} y_j + e_i$$
where the coefficient $b_{ij}$ is called the loading of the $i$th variable on the $j$th factor, and the errors ($e_i$) are noises which are not explained by the linear relationship. It is assumed that the factors are independent of each other and of the noises, with zero mean and unit variance. Another assumption is that the mean of each $e_i$ is zero and that they are independent of each other.
We can consider the above equation to be a series of regression equations. Let $Β = [b_{ij}]$ be the $k \times m$ matrix of loading factors and let $Ε = [e_i]$ be the $k \times 1$ column vector of noises. Hence,
$$m_i = E[x_i] = E\!\left[b_{i0} + \sum_{j=1}^{m} b_{ij} y_j + e_i\right] = E[b_{i0}] + \sum_{j=1}^{m} b_{ij} E[y_j] + E[e_i] = b_{i0} + 0 + 0 = b_{i0}.$$
So, the above regression equations can be expressed as
$$x_i = m_i + \sum_{j=1}^{m} b_{ij} y_j + e_i$$
or, equivalently,
$$X = Μ + ΒY + Ε.$$
From the assumptions stated above it also follows that
$$E[x_i] = m_i \ \text{for all } i, \qquad E[e_i] = 0 \ \text{for all } i,$$
$$\mathrm{cov}(y_i, y_j) = 0 \ \text{if } i \ne j, \qquad \mathrm{cov}(e_i, e_j) = 0 \ \text{if } i \ne j, \qquad \mathrm{cov}(y_i, e_j) = 0 \ \text{for all } i, j.$$
Defining the diagonal variance matrix of the noises as $f = E[ΕΕ^T]$, the covariance matrix of $X$, which is a $k \times k$ matrix, has the form
$$S = E\!\left[(X - Μ)(X - Μ)^T\right] = E\!\left[(ΒY + Ε)(ΒY + Ε)^T\right] = Β E\!\left[YY^T\right] Β^T + Β E\!\left[YΕ^T\right] + E\!\left[ΕY^T\right] Β^T + E\!\left[ΕΕ^T\right] = Β I Β^T + 0 + 0 + f = ΒΒ^T + f.$$
The principal-axis method proceeds according to the following steps:
(1) Estimate Φ from the communalities, as discussed below.
(2) Find L and V, the eigenvalues and eigenvectors of S − Φ, using SVD (a more detailed description of the SVD method can be
found in the section “Principal Component Analysis (PCA)”).
(3) Calculate the loading matrix as

$$B = VL^{1/2}.$$

One can reduce the data by trimming out the smaller eigenvalues; hence B can be estimated as $V_{tr}L_{tr}^{1/2}$.
A new Φ matrix is estimated from the current loading matrix, and steps 1–3 are iterated until Φ converges.
(4) Calculate the factor scores as

$$F = ZVL^{-1/2} \quad \text{or} \quad F = ZV_{tr}L_{tr}^{-1/2}.$$
We close this section with a discussion of obtaining an initial value of Φ. We can use the initial estimate of Cureton and
D’Agostino (1993), which is outlined here. The initial communality estimates cii are calculated from the correlation
and inverse correlation matrices as follows:

$$c_{ii} = \left(1 - \frac{1}{s^{ii}}\right)\frac{\displaystyle\sum_{k=1}^{p}\max_{j \neq k} s_{jk}}{\displaystyle\sum_{k=1}^{p}\left(1 - \frac{1}{R_{kk}}\right)}$$

where $s^{ii}$ is the ith diagonal element of $S^{-1}$ and $s_{jk}$ is an element of S. The corresponding diagonal element of Φ is then estimated as $1 - c_{ii}$.
Like PCA, the Kaiser criterion can be used for determining the number of factors: keep the factors whose eigenvalues are greater than unity.
The scree diagram is also useful for determining the number of factors; the number of factors to keep is indicated by the point where the curve
makes an elbow and flattens out.
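To make the iteration above concrete, the following minimal NumPy sketch runs principal axis factoring on a correlation matrix. The function name, the use of squared multiple correlations as starting communalities (instead of the Cureton–D’Agostino estimate), and the synthetic data are illustrative assumptions, not part of the original description.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=50, tol=1e-6):
    """Iterative principal axis factoring on a p x p correlation matrix R (a sketch)."""
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1) (an assumption).
    communality = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, communality)            # replace the unit diagonal: S - Phi
        eigval, eigvec = np.linalg.eigh(Rr)          # symmetric eigendecomposition
        idx = np.argsort(eigval)[::-1][:n_factors]   # keep the largest eigenvalues
        L = np.clip(eigval[idx], 0.0, None)
        V = eigvec[:, idx]
        B = V * np.sqrt(L)                           # loading matrix B = V L^(1/2)
        new_comm = np.sum(B ** 2, axis=1)            # updated communalities from the loadings
        if np.max(np.abs(new_comm - communality)) < tol:
            communality = new_comm
            break
        communality = new_comm
    return B, communality

# Example on random standardized data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)
B, h2 = principal_axis_factoring(R, n_factors=2)
print(B.round(3), h2.round(3))
```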
6. Random projection
Random projection is a simple linear technique used to reduce the dimensions of a set of points lying in Euclidean
space. It is especially useful in conjunction with another technique (e.g., PCA or clustering): random projection first
reduces the dimensions from thousands to hundreds, and then PCA, clustering, or another reduction technique reduces the
dimensions further. This scheme is all the more attractive because the time complexity of random projection is far lower
than that of PCA and many other data reduction methods.
The theory behind the efficiency of random projection is presented in Johnson-Lindenstrauss lemma. The lemma states
that a small set of points in a high-dimensional space can be embedded into a space of much lower dimensions in such a way
that distances between the points are nearly preserved. Hence, it is powerful when its results are used for discriminative
models.
In order to benefit from the lemma, an orthonormal random matrix R is needed to transform the data from a high-dimensional
space into a space with lower dimensions. If the original data matrix has n × p dimensions, then
$X^{RP}_{n \times d} = X_{n \times p} R_{p \times d}$ is the
projection of the data onto a lower d-dimensional subspace. The computational cost of random projection is of order
O(pdn). If the data matrix X is sparse with c nonzero entries per column, the complexity of this operation is
of order O(cdn).
The orthonormal random matrix R can be generated using different methods. Two common ways to build R are the
Gaussian random matrix and the sparse random matrix. For Gaussian random projection, the elements of the randomly
generated matrix are drawn from a zero-mean normal distribution with variance equal to 1/d.
The sparse random projection uses a sparse random matrix that guarantees similar embedding quality while being much
more memory efficient and allowing faster computation of the projected data. If we define s = 1/density, the elements of the
random matrix are drawn from

$$R_{ij} = \begin{cases} +\sqrt{s/d} & \text{with probability } 1/(2s) \\ 0 & \text{with probability } 1 - 1/s \\ -\sqrt{s/d} & \text{with probability } 1/(2s) \end{cases}$$
Li et al. (2006) recommended setting the density parameter to 1/√p. Achlioptas (2001) proposed a simpler alternative,
which is commonly implemented in software packages:

$$R_{ij} = \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6 \end{cases}$$

It is worth mentioning that for both the Gaussian and sparse methods the projection matrix is not an exactly orthogonal
matrix. However, it has been shown that in high-dimensional spaces it is close enough to an orthogonal matrix to guarantee
the embedding quality.
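Both generation schemes are available in scikit-learn, which allows a quick sketch of the technique; the array shapes and parameter values below are arbitrary illustrations.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection, SparseRandomProjection

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5000))          # 500 points in a 5000-dimensional space

# Gaussian random projection: entries drawn from N(0, 1/d)
grp = GaussianRandomProjection(n_components=300, random_state=42)
X_gauss = grp.fit_transform(X)

# Sparse random projection; density='auto' uses the Li et al. recommendation 1/sqrt(p)
srp = SparseRandomProjection(n_components=300, density='auto', random_state=42)
X_sparse = srp.fit_transform(X)

print(X_gauss.shape, X_sparse.shape)      # (500, 300) for both
```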
It is common to use random projection in a Monte Carlo approach and aggregate multiple runs of randomly projected
data with the Expectation Maximization (EM) clustering technique. In this approach, the frequency of similarity-measure values
between pairs of data points is the criterion used to define the clusters. The probability that data point i belongs to cluster l under
model θ is p(l | i, θ), l = 1, …, k. Hence, the probability that data points i and j belong to the same cluster under model θ is

$$p^{\theta}_{ij} = \sum_{l=1}^{k} p(l \mid i, \theta)\; p(l \mid j, \theta).$$

To aggregate multiple clustering results, the values of $p^{\theta}_{ij}$ are averaged across multiple runs to
obtain an estimate of the probability that data points i and j belong to the same cluster (pij). The pij values are expected to be
large when data points i and j come from the same natural cluster and small otherwise. In order to produce the final clusters
from the aggregated similarity matrix P, whose elements are the pij, an agglomerative clustering procedure is adopted.
The similarity measures can be different kinds of distances, including Euclidean, cosine, Jaccard, Manhattan, Minkowski,
and Chebyshev. The distances are illustrated in Fig. 4.
The Euclidean measure is only recommended when the data is low dimensional and the straightforward distance between
data points is enough to gauge their similarity. The cosine measure is one of the most commonly used metrics; it measures
the similarity between two vectors/data points by calculating the cosine of the angle between them. It works well
with high-dimensional data and should ideally be used for such data. Jaccard similarity emphasizes the similarity between
two finite sample sets; it is defined as the size of the intersection of the sets divided by the size of their union.
Unfortunately, the Jaccard measure is highly dependent on the size of the data: for large datasets the union can increase
substantially while the intersection stays low, which significantly affects the similarity. When discrete/binary
attributes are present in the dataset, the Manhattan metric is more effective; however, it does not represent an optimal distance
for floating-point attributes. For high-dimensional data it works better than Euclidean, but it is still not the best option
performance-wise. Minkowski generalizes the other distance metrics such as Euclidean, Manhattan, and Chebyshev; it is also
called the p-norm because it adds a parameter p that allows different distance measures to be calculated. Chebyshev distance is
defined as the maximum difference between two vectors among all coordinate dimensions; in other words, it is simply the
maximum distance along any single axis. This metric is usually used for logistical problems and cannot be applied to
general-purpose problems the way Euclidean can. For more information on similarity metrics, refer to Cha (2007).
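For readers who want to compute these measures directly, SciPy exposes most of them; the vectors below are arbitrary examples, and note that SciPy returns cosine and Jaccard dissimilarities, which are converted to similarities here.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 0.0, 2.0, 3.0])
v = np.array([2.0, 1.0, 0.0, 3.0])

print(distance.euclidean(u, v))        # straight-line distance
print(1 - distance.cosine(u, v))       # cosine similarity (SciPy returns cosine distance)
print(distance.cityblock(u, v))        # Manhattan (L1)
print(distance.minkowski(u, v, p=3))   # Minkowski with p = 3
print(distance.chebyshev(u, v))        # maximum coordinate-wise difference

# Jaccard similarity of two binary attribute vectors
a = np.array([1, 1, 0, 1], dtype=bool)
b = np.array([1, 0, 0, 1], dtype=bool)
print(1 - distance.jaccard(a, b))      # SciPy returns the Jaccard dissimilarity
```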
FIG. 4 Similarity measures commonly used in EM clustering technique.
FIG. 5 Difference between Euclidean and geodesic distance between two sample points.
7. Isometric mapping
Isometric mapping (ISOMap) is a nonlinear way to reduce dimensions while preserving local structures. It uses geodesic
distance instead of Euclidean distance. Geodesic distance is the distance between two points following the path available/
possible between the two points whereas Euclidean distance doesn’t have a path constraint to follow (Varini et al., 2006).
As it is shown in Fig. 5, according to Euclidean distance, the two points appear deceptively close, while they are on the
opposite parts of the horseshoe. This highlights the fact that Euclidean distance could be misleading when working with
nonlinearly dependent data.
The geodesic distance between each pair of points is calculated using the Dijkstra or Floyd-Warshall algorithm. Given a source point,
Dijkstra’s algorithm finds the shortest path from that source to all other points, while the Floyd-Warshall algorithm computes the
shortest paths between all pairs of points. The time complexity of Dijkstra’s algorithm is much lower than that of the Floyd-Warshall
algorithm: on a graph of n data points with E edges, Dijkstra’s algorithm runs in O(E log n), whereas Floyd-Warshall has a time
complexity of O(n³).
Using the geodesic distances calculated between points, the dissimilarity matrix is formed. After squaring the
matrix, it is transformed so that the mean of every row and every column is zero. Finally, eigendecomposition of the transformed
matrix is performed and the first d eigenvectors are chosen (d is the reduced size). This is similar to what we do in PCA after
calculating the correlation matrix.
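A brief sketch of this procedure using the Isomap implementation in scikit-learn is shown below on a synthetic curved manifold; the neighborhood size and target dimension are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# A curved 2-D manifold embedded in 3-D, where Euclidean distance is misleading
X, _ = make_s_curve(n_samples=1000, random_state=0)

# ISOMap: geodesic distances over a k-nearest-neighbor graph, then eigendecomposition
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print(X_2d.shape)          # (1000, 2): the unrolled low-dimensional representation
```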
8. Self-organizing maps
Based on ideas first introduced by Von der Malsburg (1973), Kohonen (1982) described self-organizing maps (SOMs) in a
publication entitled “Self-organized formation of topologically correct feature maps.” He proposed a new algorithm aimed
at providing a representation in a smaller space while preserving the initial topology. When the data forms a
curved line or surface in input space, PCA does not perform well. In this case, a SOM overcomes the approximation
problem by virtue of its topological ordering property. It provides a discrete approximation of so-called principal
curves or principal surfaces and may therefore be viewed as a nonlinear generalization of PCA. SOMs have many real-world
applications in water science, including satellite remote sensing and discovering correlations and patterns
in hydro-climate data.
An SOM is suitable for extracting information from large datasets consisting of numerous sample units and variables in
different scales. In general, conventional multivariate analyses are not suitable to extract information from such large and
complex datasets. In case of dimension reduction by PCA, for instance, a large dataset with a large number of variables
would produce a large number of significant principal components. Therefore, a few principal components may not be
sufficient to address overall variation in the large multidimensional datasets (Melssen et al., 1993).
SOM is a neural network algorithm using unsupervised competitive learning which could be used as a dimension
reduction method. Competitive learning is a form of unsupervised learning, in which nodes of the neural network compete
for the right to respond to a subset of input data. The goal of learning in the SOM is to cause different parts of the network to
respond similarly to certain input patterns.
In SOM networks, the input layer feeds the hidden layer. The hidden layer is basically a lattice of neurons, usually
called the Kohonen layer. In the training procedure, neurons in the Kohonen layer accept and respond to a set of input signals
(Fig. 6). The responses are then compared and a winning neuron is selected from the lattice. The selected neuron is activated together
with its neighborhood neurons, and an adaptive process changes the weights to more closely resemble the inputs. The network must
be fed a large number of example vectors that represent, as closely as possible, the kinds of vectors expected during mapping.
FIG. 7 The red limited domain shows neighborhood of neuron i in Kohonen layer.
FIG. 6 Basic structure of the SOM neural network.
Each hidden-layer neuron has several neighbor neurons. In order to define the neighborhood mathematically, a
neighborhood function is deployed. The neighborhood function φ(i, k) indicates how closely any pair of
neurons i and k in the Kohonen layer are connected to each other. φ should be symmetric about the neuron and monotonically
decreasing with distance from it (Fig. 7). Usually, a Gaussian function of the distance between the two neurons in the layer is used:

$$\varphi(i, k) = \exp\!\left(-\frac{d_{i,k}^{2}}{2r^{2}}\right)$$

where $d_{i,k}$ is the Euclidean distance between node i and its neighboring neuron k, and r is the radius. The function is
maximal at neuron i and decreases monotonically outward.
The stages of the SOM algorithm can be summarized as follows (a minimal sketch is given after the list):
(1) Initialization: Randomly initialize the weight vectors Wj = [wj1, wj2, …, wjn] for all neurons in the Kohonen layer. The other
option is to sample them evenly from the subspace spanned by the two largest principal component eigenvectors.
(2) Sampling: Draw a sample X = [x1, x2, x3, …, xn] from the input.
(3) Similarity matching: For each node j, compute $D_j = \sum_i (w_{ij} - x_i)^2$ and find the index j for which Dj is minimum (the winner
neuron).
(4) Updating/learning: Update the winner so that it becomes more like X, together with the winner’s neighbors k:
$W_j := W_j(\text{old}) + \eta\,\varphi(j, k)\,\big(X - W_j(\text{old})\big)$.
(5) Continuation: Keep returning to step 2 until the map stops changing (i.e., there are no noticeable changes in the weights).
This process is usually reiterated over all available input samples. The learning rate (η) and radius (r) may be decreased
during the continuation stage.c The learning rate varies in the [0, 1] interval and must converge to 0 so as to guarantee convergence and stability of the SOM.
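A minimal NumPy sketch of steps 1–5, with exponentially decaying learning rate and radius as described in footnote c, is given below; the lattice size, decay schedules, and synthetic input data are illustrative assumptions.

```python
import numpy as np

def train_som(X, grid=(8, 8), n_epochs=20, eta0=0.5, r0=3.0):
    """Minimal SOM: a lattice of prototype vectors trained by competitive learning."""
    rng = np.random.default_rng(0)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))                    # step 1: random weights
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    for epoch in range(n_epochs):
        eta = eta0 * np.exp(-epoch / n_epochs)                   # decaying learning rate
        r = r0 * np.exp(-epoch / n_epochs)                       # decaying neighborhood radius
        for x in rng.permutation(X):                             # step 2: sample the inputs
            d = np.sum((W - x) ** 2, axis=1)                     # step 3: distance to every neuron
            winner = np.argmin(d)                                # winning neuron
            grid_dist2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            phi = np.exp(-grid_dist2 / (2 * r ** 2))             # Gaussian neighborhood function
            W += eta * phi[:, None] * (x - W)                    # step 4: update winner and neighbors
    return W

rng = np.random.default_rng(1)
X = rng.random((300, 4))      # 300 samples, 4 scaled variables (illustrative data)
W = train_som(X)
print(W.shape)                # (64, 4): one prototype vector per lattice neuron
```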
9. Discriminant analysis
The perspective on data reduction in this section is rather different from that of the previous sections. Up to this point, it was
assumed that we have no measure of data labels, groups, or strata: the sample data were reduced with unsupervised procedures
that produced as many groups as were requested. In this section, the number of groups k is known. In addition, the group
label of each observation has been recorded without any uncertainty.
Discriminant analysis is a well-known and widely used linear data reduction technique. It is limited by the fact
that all predictors must be continuous and that a parametric Gaussian assumption must be formulated, possibly after
transformation. However, it also works well with a mix of continuous (without parametric assumptions) and categorical
measurements. Linear discriminant analysis (LDA) is a generalization of Fisher’s linear discriminant, a method used in
statistics to maximize the ratio of the between-class variance to the within-class variance. Maximizing that ratio
guarantees maximum class separability through the transformation of features into a lower dimensional space.
c. The decrease from the initial value of the learning rate (η0) to 0 can be done linearly or exponentially. Usually, the decrease from the initial value of the
neighborhood radius (r0) is exponential ($r_n = r_0 e^{-n/\text{const}}$).
LDA seeks directions that are efficient for discriminating data, whereas PCA seeks directions that are efficient for
representing data. Fig. 8 shows how choosing an appropriate projection direction maximizes separability
among classes.
In order to explain the algorithm of LDA, assume that there are c classes, with class i containing ni samples, i = 1,
2, …, c, and let n be the total number of samples ($n = \sum_{i=1}^{c} n_i$). mi is the mean of the ith class, and m is the mean of the whole
dataset, $m = \frac{1}{c}\sum_{i=1}^{c} m_i$. The within-class scatter matrix is

$$S_W = \sum_{i=1}^{c}\sum_{j=1}^{n_i}(x_{ij} - m_i)(x_{ij} - m_i)^{T}$$

and the between-class scatter matrix is

$$S_b = \sum_{i=1}^{c}(m_i - m)(m_i - m)^{T}.$$

Suppose the desired projection transformation is Y = XUlda, where Ulda is the orthonormal projection matrix of LDA and Y is
the transformed data. According to the definition of LDA, it leads to maximum class separability. The class separability is
defined as the ratio of the norm of the between-class scatter matrix of the transformed data, $\bar{S}_b$, to the norm of its
within-class scatter matrix, $\bar{S}_W$. Hence, the problem is to maximize the ratio

$$\frac{\lvert \bar{S}_b \rvert}{\lvert \bar{S}_W \rvert} = \frac{\lvert U_{lda}^{T} S_b U_{lda} \rvert}{\lvert U_{lda}^{T} S_W U_{lda} \rvert},$$

which is called Fisher’s criterion. It has been shown that Ulda is in fact the solution of the eigensystem problem
$S_b U_{lda} - S_W U_{lda}\Lambda = 0$, where Λ is the diagonal matrix of eigenvalues. Multiplying both sides by the inverse of SW
yields $(S_W^{-1} S_b)\,U_{lda} = U_{lda}\Lambda$. If SW is a nonsingular matrix, then Fisher’s criterion is maximized when the
projection matrix Ulda is composed of the eigenvectors of $S_W^{-1} S_b$ with at most (c − 1) nonzero corresponding eigenvalues
(since there are only c points to estimate Sb). Hence, the four steps for performing LDA can be listed as follows (a minimal sketch follows the list):
(1) Compute the p-dimensional mean vectors of the different classes from the dataset.
(2) Compute the scatter matrices (the between-class and within-class scatter matrices).
(3) Sort the eigenvectors of $S_W^{-1} S_b$ by decreasing eigenvalue and choose the d eigenvectors corresponding to the d largest
eigenvalues to form the p × d matrix Ulda, in which every column represents an eigenvector.
(4) Use the p × d eigenvector matrix to transform the samples into the new subspace. This can be summarized by the matrix
multiplication Y = XUlda, where X is the n × p matrix representing the n samples and Y is the transformed
n × d matrix of samples in the new subspace.
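The following NumPy sketch mirrors these four steps using the scatter-matrix definitions given above; the two-class synthetic dataset and the function name are illustrative only, and an ill-conditioned SW would require one of the remedies discussed next.

```python
import numpy as np

def lda_projection(X, y, d):
    """Project X (n x p) onto the d most discriminative directions of S_W^-1 S_b (a sketch)."""
    classes = np.unique(y)
    p = X.shape[1]
    means = [X[y == c].mean(axis=0) for c in classes]   # step 1: class mean vectors
    m = np.mean(means, axis=0)                          # mean of the class means, as in the text
    S_W = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for c, mc in zip(classes, means):                   # step 2: scatter matrices
        Xc = X[y == c]
        S_W += (Xc - mc).T @ (Xc - mc)
        diff = (mc - m)[:, None]
        S_b += diff @ diff.T
    eigval, eigvec = np.linalg.eig(np.linalg.inv(S_W) @ S_b)
    order = np.argsort(eigval.real)[::-1][:d]           # step 3: d largest eigenvalues
    U = eigvec[:, order].real                           # p x d projection matrix U_lda
    return X @ U                                        # step 4: transformed n x d data

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
Y = lda_projection(X, y, d=1)
print(Y.shape)        # (100, 1)
```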
Like other linear methods (e.g., PCA, FA), LDA is easily applicable and has an analytical solution. However, if the number
of variables is much higher than the number of samples in the data matrix, LDA will be unable to find the lower-dimensional
space because the within-class scatter matrix is singular. This is known as the small sample size (SSS) problem. Different
approaches have been proposed to solve it. The first is to remove the null space of the
within-class matrix (Chen et al., 2000). The second utilizes conversion to an intermediate subspace using
another dimension reduction method, e.g., PCA: first, PCA is applied to reduce the dimensions of the data,
and then LDA is applied to find the most discriminative directions in the intermediate subspace (Zhao et al., 1999).
The third, well-known, approach is to apply regularization to solve the singular linear systems. The simplest
way to regularize is to add additional variance in all directions: a small constant is added to all
the diagonal elements of the within-class scatter matrix. Such regularization has the effect of decreasing the larger eigenvalues
and increasing the smaller ones, thereby counteracting the biasing. Another effect of the regularization is to stabilize
the smallest eigenvalues (Lu et al., 2005).
FIG. 8 Proper projection direction leads to good separation of classes (the two panels contrast good and bad separability in the x1–x2 plane).
10. Piecewise aggregate approximation
Piecewise aggregate approximation (PAA) is a very simple dimension reduction method for time-series mining. Time-series
datasets and databases tend to grow to extremely large sizes, and consistent sampling is a requirement in many cases where
these databases are involved. One algorithm that addresses this is PAA, discussed by Keogh et al. (2001). The basic idea
behind the algorithm is to reduce the dimensions of the input time series by splitting it into equal-sized segments and
averaging the values within each segment; i.e., the data are represented by the mean values of equal-sized frames. PAA
approximates a time series X of length n by a vector $\bar{X} = (\bar{x}_1, \ldots, \bar{x}_m)$ of any arbitrary length m ≤ n, where each $\bar{x}_i$ is
calculated as

$$\bar{x}_i = \frac{m}{n}\sum_{j=(n/m)(i-1)+1}^{(n/m)\,i} x_j.$$

In other words, in order to reduce the dimensions from n to m, we first divide
the original time series into m equally sized frames and then compute the mean value of each frame. The sequence
assembled from these mean values is the PAA approximation of the original time series. Using the distance measure

$$D_{PAA}(\bar{X}, \bar{Y}) = \sqrt{\frac{n}{m}}\sqrt{\sum_{i=1}^{m}(\bar{x}_i - \bar{y}_i)^2},$$

Yi and Faloutsos (2000) have shown that PAA satisfies the lower bounding condition
and guarantees no false dismissals, i.e., $D_{PAA}(\bar{X}, \bar{Y}) \le D(X, Y)$.
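A small NumPy sketch of the PAA reduction and its lower-bounding distance is given below; it assumes, for simplicity, that the series length is divisible by the number of frames, and the random-walk series are invented for illustration.

```python
import numpy as np

def paa(x, m):
    """Piecewise aggregate approximation of a 1-D series x (length n) into m frame means."""
    n = len(x)
    # This sketch assumes n is divisible by m.
    return x.reshape(m, n // m).mean(axis=1)

def d_paa(xa, ya, n):
    """Lower-bounding PAA distance between two reduced series of equal length m."""
    m = len(xa)
    return np.sqrt(n / m) * np.sqrt(np.sum((xa - ya) ** 2))

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=512))       # e.g., a streamflow-like random walk
y = np.cumsum(rng.normal(size=512))
xa, ya = paa(x, 32), paa(y, 32)
print(len(xa))                                        # 32 segment means
print(d_paa(xa, ya, 512) <= np.linalg.norm(x - y))    # the lower bound holds: True
```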
11. Clustering
Clustering algorithms partition the data examples into disjoint groups, or clusters, so that data samples within a cluster are
“similar” to one another and different from data examples that belong to other clusters. With this data reduction method, only
a cluster representation (e.g., centroid and diameter) of the data is used instead of the actual data (Fig. 9).
Cluster analysis has been applied to find patterns in the atmospheric pressure and temperature of regions that have a
significant impact on climate.
There are many choices of clustering definitions and clustering algorithms. Generally speaking, there are two types of
clustering: partitional clustering and hierarchical clustering. In partitional clustering, objects are divided into nonoverlapping
clusters. In hierarchical clustering, a set of nested clusters is organized as a hierarchical tree, which can be visualized as a
dendrogram. Two common partitional algorithms are k-means (and its variants) and density-based clustering.
11.1 k-means clustering
The most popular algorithm for clustering is k-means, which aims to identify the best k cluster centers in an iterative
manner. The cluster centers serve as representatives of the objects associated with each cluster. Usually, the number of clusters
is given a priori; otherwise, the notion of a cluster can be extremely ambiguous. Due to its iterative nature, k-means might
converge to a local minimum and thus produce an incorrect result.
The basic operation of k-means clustering algorithms is relatively simple. Given a fixed number of k clusters, assign
observations to those clusters so that the means across clusters (for all variables) are as different from each other as possible.
FIG. 9 The result of a cluster analysis which reduces
dimensions of the data.
TABLE 1 k-means: common choices for proximity, centroids, and objective functions.

Proximity function | Centroid | Objective function
Manhattan (L1) | Median | Minimize sum of the L1 distances of objects to their cluster centroid
Squared Euclidean (L2²) | Mean | Minimize sum of the squared L2 distances of objects to their cluster centroid
Cosine | Mean | Maximize sum of the cosine similarities of objects to their cluster centroid

From Tan, P.N., Steinbach, M., Kumar, V., 2016. Introduction to Data Mining: Global Edition. Pearson Education Limited.
In contrast to hierarchical clustering, the k-means procedure does not explicitly use pairwise distances between data points. In
order to determine which centroid is closest to a particular data point, we use a proximity measure; Manhattan,
Euclidean, and cosine are all proximity measures that help determine which cluster a point should be assigned to.
The term centroid refers to a measure of central tendency in the multivariate space. Once the centroids have been defined,
the objective of the clustering algorithm is to minimize the sum of the squared distances (or other measures) of the objects in a
cluster to their corresponding centroid. The k-means algorithm tries to minimize the value of the objective function for each set of
centroids in an iterative manner. In the first step, k data points are randomly selected as the centroids and the objective function
is calculated. This continues until there is no change to the centroids, i.e., the assignment of data points to clusters does not
change (Table 1).
It should be mentioned that k-means clustering is essentially a variant of the expectation maximization (EM) technique
and is efficient when clusters are spherical. The EM algorithm extends this basic approach to clustering in two important
ways: the EM clustering algorithm computes probabilities of cluster membership based on one or more probability
distributions, and the goal of the clustering algorithm is then to maximize the overall probability, or likelihood, of the data given
the (final) clusters. This is in contrast to k-means clustering, which assigns observations to clusters so as to maximize the distances
between clusters. Of course, as a final result of the EM algorithm, one can usually review an actual assignment of observations
to clusters based on the (largest) classification probability.
k-means has problems when clusters are of differing sizes or densities or have a nonspherical shape (i.e., not the same
variance in all directions), which highlights the need to standardize the data before applying k-means clustering.
Moreover, k-means has problems when the data contains outliers.
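As a brief illustration, the scikit-learn sketch below standardizes a synthetic dataset and reports the centroids (the reduced cluster representation) and the squared-distance objective; the data and parameter choices are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic spherical groups in 2-D (e.g., standardized hydro-chemical variables)
X = np.vstack([rng.normal(loc, 0.5, (100, 2)) for loc in ([0, 0], [4, 0], [2, 3])])

X_std = StandardScaler().fit_transform(X)        # standardize before k-means
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

print(km.cluster_centers_)    # the reduced representation: one centroid per cluster
print(km.inertia_)            # objective: sum of squared distances to the closest centroid
```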
11.2 Hierarchical clustering
There are two main types of hierarchical clustering: the agglomerative method, which starts with the points as individual clusters
and, at each step, merges the closest pair of clusters until only k clusters are left; and the divisive method, which starts with one
all-inclusive cluster and recursively splits the most appropriate cluster. The process continues until a stopping criterion (e.g., a
predefined number of clusters) is reached.
Common hierarchical algorithms use a similarity matrix and merge or split one cluster at a time based on specific rules.
However, there are several ways to measure the similarity between clusters in order to decide the rules for clustering, as shown in
Fig. 10.
All the approaches to calculating the similarity between clusters have their own advantages and disadvantages. The minimum
linkage method can handle nonelliptical shapes and is best for capturing clusters of different sizes, but it is sensitive to noise and
outliers. The maximum linkage method is less susceptible to noise and outliers, but it tends to break large
clusters and is biased toward spherical clusters. The centroid, average, and Ward linkage methods are robust to noise but
are also biased toward spherical clusters. The Ward linkage method can outperform the others in the presence of outliers.
A simple agglomerative clustering method can be briefly explained as follows. There is an n × n similarity matrix P, and
k clusters are desired; a partition of the n points into k clusters is the output of the procedure. The counter is initially set to n,
and each cluster ci is initialized to {xi} for i = 1, …, n. We then find the most similar pair of clusters based on P, say ci and cj,
and merge them while the counter is decremented by one. The steps of finding and merging clusters are repeated until the counter
is equal to or less than k (Fern and Brodley, 2003).
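The SciPy sketch below performs such an agglomerative procedure and cuts the resulting dendrogram at a chosen number of clusters; the linkage rule and the synthetic data are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])

# Agglomerative clustering with Ward linkage; 'single', 'complete', 'average'
# and 'centroid' correspond to the other linkage rules discussed above.
Z = linkage(X, method='ward')

# Cut the dendrogram to obtain any desired number of clusters, here k = 2
labels = fcluster(Z, t=2, criterion='maxclust')
print(np.unique(labels, return_counts=True))
```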
Hierarchical clustering does not work well on vast amounts of data. However, it has some strengths including
– There is no need to prespecify any particular number of clusters.
– Any desired number of clusters can be obtained by cutting the dendrogram at the proper level.
– It is easy to decide the number of clusters by merely looking at the dendrogram.
FIG. 10 Different approaches to calculate the similarity between clusters in hierarchical clustering.
11.3 Density-based clustering
In density-based clustering methods, clusters are dense regions in the data space separated by regions of lower point
density. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is the most common technique
in density-based clustering. The DBSCAN algorithm identifies dense regions by grouping together data points that are close
to each other based on a distance measure. The DBSCAN algorithm requires two parameters: the radius of the neighborhood (Eps),
within which density is defined as the number of points, and the minimum number of data points
within a neighborhood (MinPts), which is also needed to define the density-based clusters. If the distance between two points is
less than or equal to Eps, they are considered neighbors. If the Eps value is chosen too small, a large part of the data
will be considered outliers; if it is chosen very large, the clusters will merge and the majority of the data points will end up in
the same cluster. One way to find the Eps value is based on the k-distance graph. The larger the dataset, the larger the value of
MinPts that must be chosen. As a general rule, the minimum MinPts can be derived from the number of dimensions p in the
dataset as MinPts ≥ p + 1. The minimum acceptable value of MinPts is three. It is worth mentioning that the data need to be
scaled/normalized before density-based clustering.
There are three types of data points in the DBSCAN algorithm (Fig. 11):
Core point: a point is a core point if it has more than MinPts points within Eps. The clusters are built around the core
points.
Border point: a point that has fewer than MinPts points within Eps but is in the neighborhood of a core point.
Outlier or noise: a point that is neither a core point nor a border point.
Determining the parameters of DBSCAN is not an easy job; a possible approach is the k-distance graph. The idea
is that, for points in a cluster, the kth nearest neighbors are at roughly the same distance, whereas outliers have their kth nearest
neighbor at a farther distance. So, plot the sorted distance of every point to its kth nearest neighbor for values of
k larger than p. The optimum distance (Eps) is where the slope of the graph increases dramatically, and k can be selected
as MinPts (Fig. 12).
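A short scikit-learn sketch combining the k-distance idea with DBSCAN is given below; the Eps value is assumed to have been read off the elbow of the k-distance curve rather than derived automatically, and the synthetic data are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(2, 0.2, (100, 2)),
               rng.uniform(-1, 3, (10, 2))])          # two dense groups plus scattered noise
X = StandardScaler().fit_transform(X)                 # scale before density-based clustering

# k-distance graph: sorted distance of every point to its k-th nearest neighbor
k = 4                                                 # MinPts >= p + 1; here p = 2, so k = 4 is a common pick
dist, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
k_dist = np.sort(dist[:, -1])                         # look for the elbow in this curve to choose Eps

db = DBSCAN(eps=0.3, min_samples=k).fit(X)            # eps assumed to come from the elbow
print(np.unique(db.labels_, return_counts=True))      # label -1 marks outliers/noise
```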
FIG. 11 Three types of points defined in the DBSCAN algorithm.
FIG. 12 k-dist plot for sample data. (Adapted
from Kotary, D.K., Nanda, S.J., 2021. A distributed
neighbourhood DBSCAN algorithm for effective
data clustering in wireless sensor networks.
Wireless Pers. Commun. 121, 2545–2568.).
DBSCAN is resistant to noise and capable of handling clusters of different shapes and sizes. However, it has some
limitations:
– Time complexity is exponential in the number of dimensions (specifically, complexity is high if “too many” dense units
are generated at the lower stages).
– It may fail if clusters have widely differing densities, since the threshold is fixed.
– Determining an appropriate threshold and unit interval length can be challenging.
12. Conclusions
The area of dimension reduction is becoming very relevant in different application areas, including environmental science,
because of the sheer amount of data being generated in the era of big data. The purpose of this chapter was to provide
information on different dimension reduction techniques; it presented a review and comparative study of the common
techniques for dimension reduction. The ultimate goal of performing dimension reduction is to improve accuracy and
efficiency by decreasing the complexity of the computational work. This chapter gives a clear comparison
of common dimension reduction techniques in water science and is useful for selecting the optimum method for a particular
application based on the characteristics of the dataset. It is concluded that, in order to select a scheme to reduce the data dimension,
we should consider the type of dataset and the specific requirements of the work. The other important concern when choosing a
technique for dimension reduction is its computational difficulty, which should be feasible in practice. A combination
of schemes may also be used to overcome the disadvantages of one scheme relative to another.
The chapter presented a review and comparative study of techniques for dimension reduction. By far the most common
data reduction techniques are those based on the search for components in their different brands (e.g., PCA, SSA, FA),
although they tend to lose importance in favor of nonlinear techniques (e.g., ISOMap, SOMs) as a response
to the nonlinear nature of acquired data. The key benefit of the nonlinear methods is their ability to analyze nonlinearities and
adapt to the local structure of the data. However, nonlinear techniques for dimension reduction are often not capable
of outperforming traditional linear techniques such as PCA or FA.
References
Achlioptas, D., 2001. Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '01), pp. 274–281. https://doi.org/10.1145/375551.375608.
Austin, D., 2009. We recommend a singular value decomposition. American Mathematical Society. Feature Column.
Beavers, A.S., Lounsbury, J.W., Richards, J.K., Huck, S.W., Skolits, G.J., Esquivel, S.L., 2013. Practical considerations for using exploratory factor
analysis in educational research. Pract. Assess. Res. Eval. 18 (Article 6). https://doi.org/10.7275/qv2q-rk76.
Borga, M., Knutsson, H., 1998. An Adaptive Stereo Algorithm Based on Canonical Correlation Analysis. IEEE.
Cha, S.H., 2007. Comprehensive survey on distance/similarity measures between probability density functions. City 1 (2), 1.
Chen, L.F., Liao, H.Y.M., Ko, M.T., Lin, J.C., Yu, G.J., 2000. A new LDA-based face recognition system which can solve the small sample size problem.
Pattern Recogn. 33 (10), 1713–1726.
Cureton, E.E., D’Agostino, R.B., 1993. Factor Analysis: An Applied Approach, first ed. Psychology Press, https://doi.org/10.4324/9781315799476.
Eslamian, S., Ghasemizadeh, M., Biabanaki, M., Talebizadeh, M., 2010. A principal component regression method for estimating low flow index. Water
Resour. Manage. 24 (11), 2553–2566.
Fern, X.Z., Brodley, C.E., 2003. Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 186–193.
Hotelling, H., 1936. Relations between two sets of variates. Biometrika 28, 321–377.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer Series in Statistics. Springer, New York. https://doi.org/10.1007/978-1-4757-1904-8.
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S., 2001. Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf.
Syst. 3 (3), 263–286. https://doi.org/10.1007/PL00011669.
Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biol. Cybern. 43 (1), 59–69.
Li, P., Hastie, T.J., Church, K.W., 2006. Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 287–296.
Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N., 2005. Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn. Lett. 26 (2), 181–191. https://doi.org/10.1016/j.patrec.2004.09.014.
Melssen, W.J., Smits, J.R.M., Rolf, G.H., Kateman, G., 1993. Two-dimensional mapping of IR spectra using a parallel implemented self-organising feature
map. Chemom. Intell. Lab. Syst. 18 (2), 195–204.
Pearson, K., 1901. LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 2 (11), 559–572.
Tripathi, S., Govindaraju, R.S., 2010. Canonical correlation analysis for hydroclimatic datasets with known measurement uncertainties. In: World Environmental and Water Resources Congress 2010., https://doi.org/10.1061/41114(371)461.
Varini, C., Degenhard, A., Nattkemper, T.W., 2006. ISOLLE: LLE with geodesic distance. Neurocomputing 69 (13–15), 1768–1771.
Von der Malsburg, C., 1973. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14 (2), 85–100.
Wallis, J.R., 1968. Factor analysis in hydrology—an agnostic view. Water Resour. Res. 4 (3), 521–527. https://doi.org/10.1029/WR004i003p00521.
Wikle, C.K., 2003. Spatio temporal methods in climatology. In: Encyclopedia of Life Support Systems. EOLSS.
Yi, B.K., Faloutsos, C., 2000. Fast time sequence indexing for arbitrary Lp norms. In: Proceedings of the 26th VLDB Conference, Cairo, Egypt.
Zhao, W., Chellappa, R., Phillips, P.J., 1999. Subspace Linear Discriminant Analysis for Face Recognition. Computer Vision Laboratory, Center for Automation Research, University of Maryland, USA.
Chapter 10
Decision tree algorithms
Amir Ahmad Dehghania, Neshat Movahedia, Khalil Ghorbania, and Saeid Eslamianb,c
a
Department of Water Engineering, Gorgan University of Agricultural Sciences & Natural Resources, Gorgan, Iran, b Department of Water Engineering,
College of Agriculture, Isfahan University of Technology, Isfahan, Iran, c Center of Excellence in Risk Management and Natural Hazards, Isfahan
University of Technology, Isfahan, Iran
1. Introduction
Machine learning (ML) is a branch of artificial intelligence (AI) that trains a computer system to make decisions based on
the data fed to it. ML recognizes the data pattern through a learning process and can then make predictions when unseen data
arrive. ML algorithms are grouped by their learning style and by their similarity in form or function. Decision trees (DTs)
are very popular machine learning techniques grouped by their similarity. The learning process can be either
supervised or unsupervised; tree-based methods are supervised techniques. Unlike neuron-based models such as
ANNs, DTs divide complex problems into smaller ones, which makes them understandable to everyone, even readers without
analytical knowledge. The other advantages of DTs over neuron-based models are that the decision-making process can easily
be shown via a visual representation of the data, and that they are easy to program with only
IF, THEN, ELSE statements. Also, the lack of hidden layers and the modeling transparency of DT-based algorithms can enable
better modeling performance than neuron-based models (Bui et al., 2020a).
DTs are used to solve both classification and regression problems in the form of trees. Furthermore, based on the target
variable, they are divided into two types: categorical-variable DTs and continuous-variable DTs. DTs use various algorithms
to decide how to choose a suitable variable and how to split (Sullivan, 2017). There are some standard DT algorithms,
including Iterative Dichotomiser 3 (ID3), C4.5 and C5.0, Classification and Regression Tree (CART), Chi-squared Automatic
Interaction Detection (CHAID), the M5 model tree, and Random Forest (RF). There are also other well-known DT
algorithms such as Best First Decision Trees, Alternating Model Trees (AMT), Logistic Model Trees (LMT), Reduced Error
Pruning Trees (REPT), and Alternating Decision Trees (ADT) (Khosravi et al., 2018a, 2020; Bui et al., 2020b; Khosravi
et al., 2021), but here only the standard DT algorithms are introduced briefly. For better comparison of these algorithms,
the specifications of each algorithm are presented in Table 1 in terms of the type of target variable, splitting criterion, number of
branches, and pruning. A brief description of the DT algorithms is presented in the following part; before diving into the
working principle of a DT algorithm, some important keywords need to be defined (Fig. 1).
Root node: It is top-most node of a DT which represents the entire population or sample. The data set is divided into
homogeneous datasets based on Attribute Selection Techniques.
Splitting: It is the process of dividing a node into multiple subnodes.
Decision node: The subnodes created in this process of splitting are known as decision nodes.
Leaf/terminal node: The nodes that cannot be split any further are called leaf nodes.
Pruning: The process of removing the subnodes of a decision node is known as pruning.
Branch/subtree: A subsection of the entire tree is called a branch or subtree.
Parent and child node: The node that gets divided is known as parent node, whereas the subnodes are known as
child nodes.
1.1 ID3 algorithm
Induction of Decision Tree (ID3) is a very simple DT algorithm proposed by Quinlan (1986) and used to determine the
classification of objects. In this algorithm, at each node one attribute is tested with the aim of maximizing the information gain and
minimizing the entropy, and the attribute that produces the highest gain is selected as the root node. The entropy and gain
are calculated as follows:
TABLE 1 Comparison of different decision tree algorithms.

Characteristic | ID3 | C4.5 | CART | CHAID | M5 | RF
Independent variable | Categorical | Categorical/continuous | Categorical/continuous | Categorical/continuous | Continuous | Categorical/continuous
Dependent variable | Categorical | Categorical/continuous | Categorical/continuous | Categorical | Continuous | Categorical/continuous
Splitting criterion | Information gain | Gain ratio | Gini index or twoing criteria | Chi-square | Standard deviation reduction | Randomly
Branches | 2 | 2 | 2 | 2 | 2 | 2
Pruning | No | Yes | Yes | Yes | Yes | No
FIG. 1 Decision tree structure.

$$\text{Entropy}(D) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$

$$\text{Gain}(D, A) = \text{Entropy}(D) - \sum_{i=1}^{n} \big[p(D \mid A)\,\text{Entropy}(D \mid A)\big] \qquad (2)$$
where D and A denote the decision and attribute, respectively, pi is the probability of each class in the decision, and n is the
number of classes. It must be noted that, for a small dataset, the ID3 algorithm may give over-fitted or
over-classified results. Also, this algorithm is unable to handle numeric attributes and missing values (El Seddawy
et al., 2013).
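A minimal sketch of Eqs. (1) and (2) for a categorical attribute is given below; the toy rainfall/flood labels are invented purely for illustration.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels (Eq. 1)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Information gain of splitting `labels` on a categorical `attribute` (Eq. 2)."""
    total = entropy(labels)
    n = len(labels)
    for value in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == value]
        total -= (len(subset) / n) * entropy(subset)
    return total

# Toy example: does an antecedent-rainfall category explain a flood / no-flood label?
rainfall = ['high', 'high', 'low', 'low', 'high', 'low']
flood    = ['yes',  'yes',  'no',  'no',  'yes',  'no']
print(entropy(flood))                      # 1.0 bit for a balanced binary label
print(information_gain(rainfall, flood))   # 1.0: this attribute separates the classes perfectly
```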
1.2 C4.5 algorithm
Quinlan (1993) developed a series of improvements to ID3, called C4.5 (Salzberg, 1994; Quinlan, 2014). These improvements deal with numeric attributes, missing values and noisy data. C4.5 uses entropy for its impurity function. In ID3 algorithm, the gains are used for each attribute, but in C4.5, the gain ratios are used:
$$\text{GainRatio}(A) = \text{Gain}(A)/\text{SplitInfo}(A) \qquad (3)$$

$$\text{SplitInfo}(A) = -\sum \frac{\lvert D_j\rvert}{\lvert D\rvert}\,\log_2\!\frac{\lvert D_j\rvert}{\lvert D\rvert} \qquad (4)$$
One of the advantages of this algorithm is that C4.5 can handle both continuous and discrete attributes. In order to handle
continuous attributes, it splits the dataset into records whose attribute value is above a threshold and those whose value is less than
or equal to it (Singh and Gupta, 2014). Mazid et al. (2010) found that many nodes in this algorithm have values of zero or close
to zero; these nodes do not contribute to generating rules or help to construct any class for the classification task, while they
make the tree bigger and more complex.
1.3 CART algorithm
Classification and regression tree (CART), proposed by Breiman et al. (1984), develops visualized decision rules for predicting
either a categorical or a continuous variable. Classification trees are used when the target variable is categorical; regression trees,
on the other hand, are used when the target variable is continuous. The splitting rule in
classification is measured by the Gini index, which quantifies the homogeneity of the data:

$$\text{Gini} = 1 - \sum_{i=1}^{n} (p_i)^2 \qquad (5)$$

In a regression tree, the splitting is made in accordance with a squared-residuals minimization algorithm, which implies that
the expected sum of variances for the two resulting nodes should be minimized (Timofeev, 2004). CART can handle both
numerical and categorical variables. It also has the ability to identify the most significant variables and eliminate nonsignificant ones (Timofeev, 2004).
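As an illustration, the scikit-learn decision tree estimators below use the Gini index for a classification tree and squared-residual minimization for a regression tree; the synthetic data are arbitrary, and the 'squared_error' criterion name assumes a recent scikit-learn version.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))

# Classification tree with the Gini index (Eq. 5) as the splitting criterion
y_class = (X[:, 0] + X[:, 1] > 10).astype(int)
clf = DecisionTreeClassifier(criterion='gini', max_depth=3).fit(X, y_class)

# Regression tree: splits chosen to minimize the squared residuals of the two child nodes
y_reg = 2.0 * X[:, 0] + rng.normal(0, 0.5, 200)
reg = DecisionTreeRegressor(criterion='squared_error', max_depth=3).fit(X, y_reg)

print(clf.predict([[8.0, 5.0]]), reg.predict([[8.0, 5.0]]))
```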
1.4 CHAID algorithm
The Chi-Square Automatic Interaction Detection (CHAID) algorithm was proposed by Kass (1980) and is based on chi-square
tests, which are used to find the significance of a feature. This algorithm builds a decision tree for a categorical
target dataset. The formula of the chi-square statistic (χ²) is:

$$\chi^2 = \sum \frac{(y - y')^2}{y'} \qquad (6)$$

where y is the actual value and y' is the expected value. The advantage of this algorithm is that it is convenient for generating
nonbinary trees (Milanovic and Stamenkovic, 2016).
1.5 M5 algorithm
The M5 algorithm was first introduced by Quinlan (1992) for predicting continuous data, and the theory was later improved in a
system called M5' by Wang and Witten (1996). In this model, data are classified into different groups and, for each group, a
multilinear (multivariate linear) regression equation is built. The advantage of the M5 model tree is its ability to tackle
tasks with very high dimensionality—up to hundreds of attributes. In this algorithm, the computational cost does not grow
quickly as the number of attributes increases. Another main advantage of this algorithm is that it is able to produce a
regression equation for each class. The advantage of M5 over CART is that model trees are generally much smaller
than regression trees and have proven more accurate in the tasks investigated (Quinlan, 1992). The only disadvantage of
this algorithm, based on the experience of the authors, is related to its greedy nature: the accuracy of
the model does not necessarily increase with an increasing number of data.
1.6 Random forest
The random forest (RF) algorithm, first introduced by Breiman (2001), is a tree-based algorithm that combines multiple DTs for
making decisions. RF randomly chooses a set of samples from the dataset and creates a DT using a random subset
of the variables. By repeating this process, i.e., choosing another sample and again selecting variables randomly, a wide variety of
DTs is created. RF then combines the outputs, and the outcome with the higher vote is selected as the final output. RF performs well
in estimating missing data and handling large datasets, but it is a hyperparameter model. An advantage of this model is that it can be
used in both time series (Sharafati et al., 2019; Bui et al., 2020a; Khosravi et al., 2020) and spatial prediction (Khosravi
et al., 2019). Readers are referred to Tyralis et al. (2019) for further study of RF applications in water sciences.
Recently, Fisher et al. (2021) used RF to investigate the parameters that affect sediment rating curve shape, vertical offsets,
and slopes.
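A short scikit-learn sketch of the procedure described above (bootstrap samples, random feature subsets at each split, aggregated predictions) is shown below; the synthetic predictors and hyperparameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))                       # e.g., four hydrological predictors
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Each tree is grown on a bootstrap sample using a random subset of predictors at each split;
# for regression, the predictions of all trees are averaged.
rf = RandomForestRegressor(n_estimators=200, max_features='sqrt', random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))          # coefficient of determination on the test set
```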
1.7 Application of DT algorithms in water sciences
Charoenporn (2017) used the ID3 and C4.5 decision tree algorithms to forecast reservoir inflow. Galelli and Castelletti (2013)
investigated the possibility of using CART to predict characteristic flows in various morphoclimatic conditions. The CART
algorithm was employed by Choubin et al. (2018a,b) to estimate the monthly suspended sediment load and to forecast precipitation,
respectively. The M5 model tree has been applied in different fields of hydraulics, hydrology, and groundwater; for example in
flood forecasting (Solomatine and Xue, 2004), sediment transport prediction (Bhattacharya et al., 2007; Reddy and
Ghimire, 2009; Khosravi et al., 2018b; Zahiri et al., 2020; Salih et al., 2020), wave height estimation (Etemad-Shahidi
and Mahjoobi, 2009), daily river flow forecasting (Sattari et al., 2013; Kisi et al., 2017), groundwater level prediction
(Rezaie-Balf et al., 2017; Nalarajan and Mohandas, 2015; Sattari et al., 2018; Bahmani et al., 2020), precipitation
forecasting (Goyal and Ojha, 2012), derivation of reservoir operating rules (Goyal et al., 2013), prediction of hydraulic jump
characteristics (Mahtabi and Satari, 2016), drought prediction (Sattari and Sureh, 2019), pan evaporation estimation (Sattari
et al., 2020), and rainfall-runoff modeling (Nourani et al., 2019). Also, RF and the M5 model tree have been compared in several
studies, for example in prediction of local scour around bridge piers (Pal et al., 2013), shear stress in compound channels
(Khozani et al., 2019), bed load transport (Khosravi et al., 2020), water quality (Bui et al., 2020a), and estimation of
unsaturated hydraulic conductivity (Sihag et al., 2019). From the literature it can be concluded that RF and the M5 model tree
are frequently applied in water sciences; however, since RF is a hyperparameter model, the next section describes the M5
model tree, and the flow discharge through a side sluice gate under submerged conditions is solved with this model.
2. M5 model tree
As with other decision tree models, first the initial tree is built based on a splitting criterion, then this tree is pruned to
overcome the overfitting problem, and finally a smoothing process is employed to compensate for the sharp discontinuities between
adjacent linear models at the leaves. The most important steps in the M5 model tree are splitting and pruning, which are explained
in detail as follows. One of the most important principles of the decision tree is that an expert viewpoint is always needed in
the process of splitting and pruning. Different trees can be created depending on the viewpoints of different experts, and in some
cases pruning may remove rules that are necessary for us.
2.1 Splitting
The M5 model tree first constructs a regression tree by recursively splitting the instance space. Fig. 2 illustrates how the splitting
of space is done for a given 2-D input parameter domain of X1 and X2. The splitting criterion is used to minimize the intra-subset
variability in the values down from the root through the branch to the node (Bonakdar and Etemad-Shahidi, 2011).
The splitting criterion for the M5 model tree algorithm treats the standard deviation of the class values that
reach a node as a measure of the error at that node and calculates the expected reduction in this error as a result of testing
each attribute at that node. The standard deviation reduction (SDR) is calculated as follows:

$$SDR = sd(T) - \sum \frac{\lvert T_i \rvert}{\lvert T \rvert}\, sd(T_i) \qquad (7)$$
FIG. 2 Example of M5 model tree (LM 1–5 are linear regression models) (Solomatine and Xue, 2004).
where T represents a set of examples that reaches the node; Ti represents the subset of examples that have the ith outcome of
the potential set; and sd represents the standard deviation. After examining all possible splits (i.e., the attributes and the
possible split values), M5 chooses the one that maximizes the expected error reduction. Splitting in M5 terminates when the
class values of all the instances that reach a node vary just slightly, or only a few instances remain (Wang and Witten, 1996).
In order to better understand the processes in the M5 model tree, details of the splitting procedure are presented as follows (a minimal sketch of the SDR-based search is given after the steps):
Step 1: Compute the standard deviation of the target values.
Step 2: Choose one of the input variables (e.g., X1), sort its values in ascending order, and calculate the average of X1 for
all adjacent records. Then, for each of these averages, calculate the SDR of the corresponding target (Y) using Eq. (7).
For example, when the tree is divided based on the average value of the first and second points of X1 (Fig. 3), two groups are
created: values less than and values greater than this average. Calculate the SDR of the corresponding target based on the first point
and the rest of the points. Then assume that the tree is divided based on the average value of the second and third points of X1 (Fig. 4),
and again calculate the SDR of the corresponding target for values less than and greater than this average. These processes must be
continued until the SDR has been calculated for all X1 records. Then write down the maximum SDR of X1.
Step 3: Repeat Step 2 for all input variables.
Step 4: In this step, compare the maximum SDRs of the input variables. Choose the attribute with the maximum SDR as the root
node (e.g., X2); then, among the SDRs of all groups of this attribute (X2), choose the group with the maximum SDR. Consider
the value of X2 at this group as the point of splitting.
Step 5: After selecting the root node, continue all the above steps for splitting until the values of all instances that reach a
node vary only slightly or only a few instances remain.
Step 6: Finally, build a linear multiple regression model for each subspace (i.e., leaf), using all the attributes that participate in building the tree.
FIG. 3 Splitting based on the average of the first and second points of X1.
FIG. 4 Splitting based on the average of the second and third points of X1.
FIG. 5 Schematic of tree pruning.
2.2 Pruning
As mentioned before, M5 is a greedy approach: it just checks for the best split and continues until
the stopping conditions are reached. As the tree grows, the accuracy of the model increases, but overfitting may be inevitable,
so pruning can overcome this problem (Kumar et al., 2005). Starting from the bottom of the tree and examining
each nonleaf node, pruning is done by selecting either the linear model above or the model subtree, whichever has the lower
estimated error. If the linear model is chosen, the subtree at this node is replaced with a leaf (Quinlan, 1992). Fig. 5 illustrates
a conceptual scheme of the pruning process.
2.3 Smoothing
The smoothing process is done to compensate for the sharp discontinuities that will inevitably occur between adjacent linear
models at the leaves of the pruned tree, particularly for models built from a small number of training instances
(Bhattacharya et al., 2007). Smoothing can be accomplished by producing linear models for each internal node, as well
as for the leaves, at the time the tree is built. Then, once the leaf model has been used to obtain the raw predicted value
for a test instance, that value is filtered along the path back to the root, smoothing it at each node by combining it with the
value predicted by the linear model for that node (Witten and Frank, 2002). An appropriate smoothing calculation is:

$$p' = \frac{np + kq}{n + k} \qquad (8)$$

where p' is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted
by the model at this node, n is the number of training instances that reach the node below, and k is a smoothing constant.
3. Data set
To show the ability of the M5 model tree, the flow discharge through a side sluice gate under submerged conditions was solved with
the M5 model tree. Side sluice gates are diversion structures used in irrigation channels, urban sewage systems, and
as head regulators of distributaries, in order to divert flow from a main channel to a secondary channel (Swamee
et al., 1993; Ghodsian, 2003). Ghodsian (2003) investigated flow through a side sluice gate under both free and submerged flow
conditions, using a series of laboratory experiments (Fig. 6). His experiments were performed for various combinations of main
channel discharge (Q0), side sluice gate opening (a), and upstream depth of flow (y0) for free flow conditions, with the additional
parameter of tailwater depth (yt) for submerged flow conditions. Also, in his experiments, the flow was subcritical in the
main channel and it was assumed that the specific energy remains constant along the side sluice gate. Table 2 presents the
range of these parameters in the Ghodsian (2003) study for the submerged flow condition.
FIG. 6 Subcritical flow through side sluice gate (plan and section views of the main channel, secondary channel, and side sluice gate).
TABLE 2 Range of variables in the Ghodsian (2003) study for submerged flow through a side sluice gate.

Variable definition | Variable range
Upstream depth, y0 (m) | 0.08–0.4
Downstream depth, y1 (m) | 0.09–0.39
Tailwater depth, yt (m) | 0.05–0.36
Sluice gate opening, a (m) | 0.01–0.1
Upstream discharge, Q0 (m³/s) | 0.008–0.097
Side sluice gate discharge, QS (m³/s) | 0.003–0.046
y0/a | 1.577–32.38
yt/y0 | 0.267–1.181
Fr | 0.047–0.806
QS/Q0 | 0.067–0.968
3.1 Empirical formula for flow discharge
For submerged flow, Ghodsian (2003) proposed the following procedure to calculate the flow discharge through a submerged side
sluice gate:
1. For the flow depth and discharge in the upstream section, i.e., y0 and Q0, calculate the specific energy E0 from:

$$E_0 = y_0 + \frac{Q_0^2}{2gB^2y_0^2} \qquad (9)$$
2. Determine the value of Fr and hence Cm from:

$$C_m = 0.611\left(\frac{y_0/a - 1}{y_0/a + 1}\right)^{0.216}\left\{1 + 0.558\,Fr^{0.1526}\left[\frac{2.5y_t - a}{y_0 - y_t}\right]^{0.67}\left(\frac{y_t}{y_0}\right)^{0.24}\right\}^{-1} \qquad (10)$$
3. By assuming that E0 = E1 = E, calculate y1 using:

$$\frac{2abC_m}{BE} = 3\left[\frac{y_0}{E}\left(1 - \frac{y_0}{E}\right)\right]^{0.5} - 3\left[\frac{y_1}{E}\left(1 - \frac{y_1}{E}\right)\right]^{0.5} + \sin^{-1}\left(1 - \frac{y_0}{E}\right)^{0.5} - \sin^{-1}\left(1 - \frac{y_1}{E}\right)^{0.5} \qquad (11)$$
4. Calculate the downstream discharge Q1 from:

$$E_1 = y_1 + \frac{Q_1^2}{2gB^2y_1^2} \qquad (12)$$

5. Determine the side sluice gate discharge QS from:

$$Q_S = Q_0 - Q_1 \qquad (13)$$
where B and b are the width of main channel and side channel, respectively. Cm is the discharge coefficient which used in
general formula of flow discharge through sluice gate. As the Eq. (11) is nonlinear then QS is calculated by trial and error
process. Furthermore, the Eq. (10) is also obtained by fitting the equation on experimental data and there are errors for
estimation of Cm. So, applicability of M5 model tree to estimate QS was examined by obtaining nondimensional parameters.
The flow discharge through the submerged side sluice gate (QS) can be defined by the following set of parameters:

QS = φ(Q0, V0, y0, yt, a, g, ρ)    (14)

where V0 is the upstream flow velocity, g is the acceleration due to gravity, and ρ is the density of water. By applying the Buckingham π theorem to Eq. (14) and considering y0, V0, and ρ as basic parameters, the dimensionless relationship becomes:

φ(QS/(y0²V0), Q0/(y0²V0), yt/y0, a/y0, gy0/V0²) = 0    (15)

By combining the first and second terms, and considering the last term as the upstream Froude number, the following equation is obtained:

QS/Q0 = φ(yt/y0, y0/a, Fr)    (16)
Experimental data sets of Ghodsian (2003) are used to evaluate the M5 model tree in determining the flow discharge through the side sluice gate. The ranges of these dimensionless parameters are also presented in Table 2. The data set is randomly divided into two groups: of the total 185 data sets, 80% (148 sets) are used for training the M5 model, and the remaining 20% (37 sets) are used for testing.
3.2 Model evaluation and comparison
In order to evaluate the model on the testing data, the root mean square error (RMSE), discrepancy ratio (DR), mean percentage error (MPE), and Nash-Sutcliffe model efficiency coefficient (NSE) were used, as follows (Kouzehgar et al., 2021):

RMSE = [Σ_{i=1}^{n} (Xi − Yi)² / n]^0.5    (17)

DR = Yi/Xi    (18)

MPE = (100/n) Σ_{i=1}^{n} (Xi − Yi)/Xi    (19)

NSE = 1 − Σ_{i=1}^{n} (Xi − Yi)² / Σ_{i=1}^{n} (Xi − X̄)²    (20)

where Xi is the measured data, Yi is the calculated data, X̄ is the average of the measured data, and n is the number of data. Lower RMSE and MPE values, and higher NSE values, indicate greater predictive power of the model.
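The following Python sketch (an assumed helper, not part of the chapter's software) evaluates Eqs. (17)–(20) for a vector of measured values X and model predictions Y; the sample values in the example are hypothetical.

```python
# Illustrative sketch of the evaluation metrics of Eqs. (17)-(20).
import numpy as np

def evaluate(X, Y):
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    rmse = np.sqrt(np.mean((X - Y) ** 2))                           # Eq. (17)
    dr = Y / X                                                      # Eq. (18), one value per record
    mpe = np.mean((X - Y) / X) * 100                                # Eq. (19)
    nse = 1 - np.sum((X - Y) ** 2) / np.sum((X - X.mean()) ** 2)    # Eq. (20)
    return rmse, dr.mean(), dr.std(), mpe, nse

# Example with a few hypothetical measured/predicted QS/Q0 pairs
print(evaluate([0.46, 0.62, 0.28, 0.73], [0.44, 0.65, 0.30, 0.70]))
```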
4. Modeling and results
According to Eq. (16), QS/Q0 is selected as the dependent variable and y0/a, Fr, and yt/y0 are selected as independent variables for designing the M5 model tree. The ability of the M5 model tree to estimate QS/Q0 is then compared with that of the empirical formula.
4.1 Initial tree
The root node was selected with the following steps:
Step 1: The standard deviation of the target values was calculated:
Standard deviation of QS/Q0 = 0.211
Step 2: For each variable, the data sets were sorted in ascending order and the SDR was calculated using Eq. (7). In this case, the data were first sorted in ascending order of y0/a (Column 2, Table 3). Then, the following calculations were done (a computational sketch of this SDR calculation is given after this list):
– First, it was assumed that the tree is split into two groups: values less than, and values greater than, the average of the first and second points of y0/a (Column 4 of Table 3). The SD of the corresponding QS/Q0 values was then calculated for each of these two groups (i.e., the first point and the rest of the points) (Columns 5 and 6 of Table 3). Next, it was assumed that the tree is split at the average of the second and third points of y0/a, and the SD of QS/Q0 for values less than and greater than this average was calculated. This process was continued until the SDs of the two groups had been calculated for every candidate split (Columns 5 and 6 of Table 3).
– Then, each pair of SDs was weighted (Column 7 of Table 3) and subtracted from the SD of the target data (0.212) in order to calculate the SDR (Eq. 7) (Column 8 of Table 3).
– Finally, the maximum SDR for y0/a was identified.
Step 3: After finishing the calculation of the maximum SDR for y0/a, the data were sorted by yt/y0 and then by Fr, and the calculations were repeated until the maximum SDR had been obtained for all three variables (Tables 4 and 5).
Step 4: The maximum SDRs of the input variables are compared in Table 6. As Fr has the highest SDR, it is selected as the root node, and its value (Fr = 0.276) is used for splitting (Fig. 7).
Step 5: After selecting the root node, Steps 1–4 were repeated for each of the Fr branches (i.e., less than 0.276 and greater than 0.276). These steps were continued in each branch until the subset reached a specified threshold; here the threshold was chosen so that the number of records remaining in each leaf was less than 25. The initial M5 tree and its linear models before pruning are illustrated in Fig. 8 and Table 7, respectively.
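As a concrete illustration of Steps 1–2, the following Python sketch (an assumed implementation, not the chapter's code) scans the candidate splits of a single attribute and returns the split value with the maximum SDR; applied to Fr it would reproduce the root-node choice reported in Table 6.

```python
# Illustrative sketch: standard deviation reduction (SDR) over all candidate splits
# of one attribute x against the target y, as in Tables 3-5.
import numpy as np

def best_split_sdr(x, y):
    """Return (best split value, max SDR) for attribute x and target y."""
    order = np.argsort(x)
    x, y = np.asarray(x, dtype=float)[order], np.asarray(y, dtype=float)[order]
    sd_all = y.std()                       # Step 1: SD of all target values
    best_value, best_sdr = None, -np.inf
    for i in range(1, len(x)):
        split = 0.5 * (x[i - 1] + x[i])    # average of adjacent sorted points
        left, right = y[x <= split], y[x > split]
        weighted = (len(left) * left.std() + len(right) * right.std()) / len(y)
        sdr = sd_all - weighted            # Eq. (7): reduction in standard deviation
        if sdr > best_sdr:
            best_value, best_sdr = split, sdr
    return best_value, best_sdr

# The attribute with the largest SDR (here Fr, split at 0.276 per Table 6)
# would be chosen as the root node.
```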
4.2 Pruning
Pruning was done by merging some lower subtrees into one node in order to avoid overfitting. The most popular and logical pruning criterion is to remove sets of leaf nodes until a minimum overall error is reached. For the present study, the LM3 and LM4 leaf nodes were examined for pruning first. As shown in Table 8, pruning these two subtrees increases the RMSE and MPE values for the training data set. For LM7 and LM8, however, the RMSE and MPE did not change remarkably, so they are suitable for pruning. The M5 model tree and its linear models after the first pruning are presented
TABLE 3 The data sorted by y0/a.

Records | y0/a | QS/Q0 | Average of adjacent y0/a | SD group 1 | SD group 2 | Weighted SD | SDR
1 | 1.577 | 0.463 | 1.691 | 0 | 0.212 | 0.2103 | 0.001054
2 | 1.804 | 0.620 | 1.834 | 0.079 | 0.211 | 0.2092 | 0.002186
3 | 1.864 | 0.276 | 1.941 | 0.141 | 0.212 | 0.2102 | 0.001134
4 | 2.017 | 0.259 | 2.052 | 0.148 | 0.212 | 0.2106 | 0.000754
5 | 2.087 | 0.734 | 2.099 | 0.187 | 0.210 | 0.2094 | 0.001979
6 | 2.111 | 0.450 | 2.212 | 0.171 | 0.211 | 0.2090 | 0.002388
7 | 2.314 | 0.546 | 2.315 | 0.160 | 0.210 | 0.2080 | 0.00332
8 | 2.315 | 0.778 | … | … | … | … | …
… | … | … | … | … | … | … | …
96 | 9.032 | 0.106 | 8.936 | 0.202 | 0.122 | 0.1866 | 0.02476
… | … | … | … | … | … | … | …
148 | 32.38 | 0.239 | 31.965 | 0.212 | 0 | 0.2105 | 0.00813
max(SDR) = 0.02476
TABLE 4 The data sorted by yt/y0.

Records | yt/y0 | QS/Q0 | Average of adjacent yt/y0 | SD group 1 | SD group 2 | Weighted SD | SDR
1 | 0.267 | 0.089 | 0.276 | 0 | 0.211 | 0.2098 | 0.00156
2 | 0.285 | 0.260 | 0.296 | 0.086 | 0.212 | 0.2102 | 0.00117
3 | 0.307 | 0.191 | 0.308 | 0.071 | 0.212 | 0.2095 | 0.00187
4 | 0.309 | 0.239 | 0.316 | 0.066 | 0.213 | 0.2090 | 0.00234
5 | 0.323 | 0.253 | 0.336 | 0.064 | 0.214 | 0.2086 | 0.00278
6 | 0.350 | 0.247 | 0.352 | 0.060 | 0.214 | 0.2080 | 0.00330
7 | 0.355 | 0.315 | 0.370 | 0.066 | 0.215 | 0.2080 | 0.00334
8 | 0.386 | 0.231 | … | … | … | … | …
… | … | … | … | … | … | … | …
18 | 0.468 | 0.715 | 0.458 | 0.068 | 0.221 | 0.2031 | 0.00822
… | … | … | … | … | … | … | …
148 | 1.181 | 0.179 | 1.085 | 0.212 | 0 | 0.2103 | 0.00102
max(SDR) = 0.00822
TABLE 5 The data sorted by Fr.

Records | Fr | QS/Q0 | Average of adjacent Fr | SD group 1 | SD group 2 | Weighted SD | SDR
1 | 0.047 | 0.797 | 0.048 | 0.000 | 0.208 | 0.2069 | 0.00444
2 | 0.049 | 0.662 | 0.066 | 0.068 | 0.207 | 0.2051 | 0.00621
3 | 0.083 | 0.633 | 0.088 | 0.072 | 0.206 | 0.2033 | 0.00807
4 | 0.093 | 0.694 | 0.093 | 0.062 | 0.204 | 0.2004 | 0.01098
5 | 0.094 | 0.288 | 0.097 | 0.173 | 0.205 | 0.2038 | 0.00753
6 | 0.100 | 0.785 | 0.100 | 0.170 | 0.202 | 0.2004 | 0.01098
7 | 0.101 | 0.650 | 0.102 | 0.157 | 0.200 | 0.1982 | 0.01313
8 | 0.103 | 0.715 | … | … | … | … | …
… | … | … | … | … | … | … | …
76 | 0.281 | 0.291 | 0.276 | 0.235 | 0.110 | 0.1734 | 0.03794
… | … | … | … | … | … | … | …
148 | 0.838 | 0.806 | 0.787 | 0.212 | 0 | 0.2102 | 0.00112
max(SDR) = 0.03794
TABLE 6 The value of SDR for independent variables.

Variable | max(SDR) | Value
y0/a | 0.02476 | 8.94
yt/y0 | 0.00822 | 0.458
Fr | 0.03794 | 0.276
FIG. 7 Root node (split at Fr > 0.276).
in Fig. 9 and Table 9, respectively. Then, the new subtrees (LM7new and LM6) were pruned. Since the RMSE and MPE did not change remarkably in comparison with the training data set (Table 8), they were considered for pruning. The final M5 model tree and the final linear models are presented in Fig. 10 and Table 10, respectively. The comparison between predicted and measured QS/Q0 for the training data sets based on the final linear models shows that the present model, with a high NSE of 0.9605 and a low RMSE of 0.042, can estimate the flow discharge through the side sluice gate well (Fig. 11).
Fig. 12 shows the performance of the M5 model tree for the testing data sets. The results show that the data points are concentrated around the 1:1 line. The statistical parameters for the testing data sets are presented in Table 11. The RMSE of 0.0678 and NSE of 0.8837 confirm the goodness of the estimation.
FIG. 8 Initial M5 model tree.
TABLE 7 Initial linear models.

Condition | LM | Regression equation
Fr < 0.276, y0/a > 11.91 | LM1 | QS/Q0 = 0.8724 − 0.01147 y0/a − 0.3525 yt/y0 − 1.351 Fr
Fr < 0.276, y0/a < 11.91, Fr < 0.135 | LM2 | QS/Q0 = 1.787 − 0.0479 y0/a − 0.784 yt/y0 − 1.50 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a > 4.528 | LM3 | QS/Q0 = 0.539 − 0.02126 y0/a − 0.146 yt/y0 + 0.011 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a < 4.528 | LM4 | QS/Q0 = 3.489 − 0.1578 y0/a − 2.283 yt/y0 − 2.339 Fr
Fr > 0.276, y0/a < 3.281 | LM5 | QS/Q0 = 2.482 − 0.1484 y0/a − 1.653 yt/y0 − 0.7286 Fr
Fr > 0.276, y0/a > 3.281, y0/a < 7.934 | LM6 | QS/Q0 = 0.9449 − 0.03899 y0/a − 0.4947 yt/y0 − 0.3498 Fr
Fr > 0.276, y0/a > 3.281, y0/a > 7.934, Fr > 0.35 | LM7 | QS/Q0 = 0.4424 − 0.009900 y0/a − 0.1931 yt/y0 − 0.1918 Fr
Fr > 0.276, y0/a > 3.281, y0/a > 7.934, Fr < 0.35 | LM8 | QS/Q0 = 0.6728 − 0.012643 y0/a − 0.20633 yt/y0 − 0.7433 Fr
TABLE 8 Statistical parameters of the pruning process.

Data set | Statistical parameter | Without pruning | Only with LM3,4 pruning | Only with LM7,8 pruning | With LM6 and LM7new pruning
Train | RMSE | 0.0393 | 0.0867 | 0.0394 | 0.0420
Train | MPE | 1.8045 | 5.4721 | 1.8224 | 1.8599
Train | NSE | 0.9654 | 0.8317 | 0.9653 | 0.9605
Train | Average (DR) | 1.018 | 1.0547 | 1.0182 | 1.0186
Train | Standard deviation (DR) | 0.1565 | 0.3755 | 0.1576 | 0.1898
FIG. 9 M5 model tree after first pruning.
TABLE 9 Linear models of the first pruning.

Condition | LM | Regression equation
Fr < 0.276, y0/a > 11.91 | LM1 | QS/Q0 = 0.8724 − 0.01147 y0/a − 0.3525 yt/y0 − 1.351 Fr
Fr < 0.276, y0/a < 11.91, Fr < 0.135 | LM2 | QS/Q0 = 1.787 − 0.0479 y0/a − 0.784 yt/y0 − 1.50 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a > 4.528 | LM3 | QS/Q0 = 0.539 − 0.02126 y0/a − 0.146 yt/y0 + 0.011 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a < 4.528 | LM4 | QS/Q0 = 3.489 − 0.1578 y0/a − 2.283 yt/y0 − 2.339 Fr
Fr > 0.276, y0/a < 3.281 | LM5 | QS/Q0 = 2.482 − 0.1484 y0/a − 1.653 yt/y0 − 0.7286 Fr
Fr > 0.276, y0/a > 3.281, y0/a < 7.934 | LM6 | QS/Q0 = 0.9449 − 0.03899 y0/a − 0.4947 yt/y0 − 0.3498 Fr
Fr > 0.276, y0/a > 3.281, y0/a > 7.934 | LM7 | QS/Q0 = 0.4754 − 0.009931 y0/a − 0.2031 yt/y0 − 0.2444 Fr
FIG. 10 M5 model tree after second pruning (final M5 model tree).
TABLE 10 Linear models of the second pruning (final models).

Condition | LM | Regression equation
Fr < 0.276, y0/a > 11.91 | LM1 | QS/Q0 = 0.8724 − 0.01147 y0/a − 0.3525 yt/y0 − 1.351 Fr
Fr < 0.276, y0/a < 11.91, Fr < 0.135 | LM2 | QS/Q0 = 1.787 − 0.0479 y0/a − 0.784 yt/y0 − 1.50 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a > 4.528 | LM3 | QS/Q0 = 0.539 − 0.02126 y0/a − 0.146 yt/y0 + 0.011 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a < 4.528 | LM4 | QS/Q0 = 3.489 − 0.1578 y0/a − 2.283 yt/y0 − 2.339 Fr
Fr > 0.276, y0/a < 3.281 | LM5 | QS/Q0 = 2.482 − 0.1484 y0/a − 1.653 yt/y0 − 0.7286 Fr
Fr > 0.276, y0/a > 3.281 | LM6 | QS/Q0 = 0.6677 − 0.01923 y0/a − 0.3117 yt/y0 − 0.2728 Fr
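To make the final model concrete, the following minimal Python sketch (not the authors' software) routes an input through the splits of the final tree and applies the corresponding linear model, with coefficients transcribed from Table 10; the example input values are hypothetical but lie within the ranges of Table 2.

```python
# Illustrative evaluation of the final pruned M5 tree (Fig. 10 / Table 10).
# Each leaf model is stored as (intercept, coef. of y0/a, coef. of yt/y0, coef. of Fr).

LEAF_MODELS = {
    "LM1": (0.8724, -0.01147, -0.3525, -1.351),
    "LM2": (1.787,  -0.0479,  -0.784,  -1.50),
    "LM3": (0.539,  -0.02126, -0.146,   0.011),
    "LM4": (3.489,  -0.1578,  -2.283,  -2.339),
    "LM5": (2.482,  -0.1484,  -1.653,  -0.7286),
    "LM6": (0.6677, -0.01923, -0.3117, -0.2728),
}

def predict_qs_ratio(fr, y0_a, yt_y0):
    """Estimate QS/Q0 from Fr, y0/a, and yt/y0 using the final M5 model tree."""
    if fr > 0.276:
        leaf = "LM5" if y0_a < 3.281 else "LM6"
    elif y0_a > 11.91:
        leaf = "LM1"
    elif fr < 0.135:
        leaf = "LM2"
    else:
        leaf = "LM3" if y0_a > 4.528 else "LM4"
    c0, c1, c2, c3 = LEAF_MODELS[leaf]
    return c0 + c1 * y0_a + c2 * yt_y0 + c3 * fr

# Example with hypothetical input values inside the ranges of Table 2:
print(predict_qs_ratio(fr=0.15, y0_a=6.0, yt_y0=0.5))
```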
FIG. 11 Comparison between measured and predicted QS/Q0, training data set.
FIG. 12 Comparison between measured and predicted QS/Q0 for the M5 model tree and the empirical formula, testing data set.
4.3 Comparing the M5 model and the empirical formula
The performance of the M5 model tree was compared with the empirical formula proposed by Ghodsian (2003) for flow discharge through a side sluice gate under submerged flow conditions (Table 11). The superior performance of the M5 model tree over the empirical formula can be seen in all statistical parameters. The proposed formula is not straightforward and needs a trial-and-error process. The results also show that the empirical formula overestimates the QS/Q0 values.
TABLE 11 Statistical parameters for the testing data sets.

Data set | Statistical parameter | M5 model tree | Empirical formula (Ghodsian, 2003)
Test | RMSE | 0.0678 | 0.1279
Test | MPE | 2.7123 | 32.8597
Test | NSE | 0.8837 | 0.5863
Test | Average (DR) | 1.0271 | 1.3286
Test | Standard deviation (DR) | 0.2436 | 0.2190
5. Conclusions
In this chapter, some of the standard decision tree algorithms are briefly explained. Among these algorithms, the M5 model tree, which is extensively applied in the water sciences, is explained in detail and its application to the prediction of flow discharge through a side sluice gate is investigated. The model constructed herein is based on the laboratory experiments of Ghodsian (2003). In this study, QS/Q0 is selected as the dependent variable, and y0/a, Fr, and yt/y0 are selected as independent variables for designing the M5 model tree. The regression equations developed by the M5 model tree involve the same parameters used in the empirical formula proposed by Ghodsian (2003). The importance of the Froude number is also noted in Ghodsian (2003); it was therefore expected to be the root node, and in this study it indeed appeared at the top of the tree. Using different statistical parameters, the results were compared with those of the empirical formula. The results demonstrate that the M5 model tree is more accurate than the empirical formula, which requires a time-consuming trial-and-error process.
References
Bahmani, R., Solgi, A., Ouarda, T.B., 2020. Groundwater level simulation using gene expression programming and M5 model tree combined with wavelet
transform. Hydrol. Sci. J. 65 (8), 1430–1442.
Bhattacharya, B., Price, R., Solomatine, D., 2007. Machine learning approach to modeling sediment transport. J. Hydraul. Eng. 133, 440–450.
Bonakdar, L., Etemad-Shahidi, A., 2011. Predicting wave run-up on rubble-mound structures using M5 model tree. Ocean Eng. 38, 111–118.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Bui, D.T., Khosravi, K., Tiefenbacher, J., Nguyen, H., Kazakis, N., 2020a. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 742, 141568.
Bui, X.N., Nguyen, H., Choi, Y., et al., 2020b. Prediction of slope failure in open-pit mines using a novel hybrid artificial intelligence model based on
decision tree and evolution algorithm. Sci. Rep. 10 (9939). https://doi.org/10.1038/s41598-020-66904-y.
Charoenporn, P., 2017. Reservoir inflow forecasting using ID3 and C4.5 decision tree model. In: 2017 3rd IEEE International Conference on Control
Science and Systems Engineering (ICCSSE). IEEE, pp. 698–701.
Choubin, B., Darabi, H., Rahmati, O., Sajedi-Hosseini, F., Kløve, B., 2018a. River suspended sediment modelling using the CART model: a comparative
study of machine learning techniques. Sci. Total Environ. 615, 272–281.
Choubin, B., Zehtabian, G., Azareh, A., Rafiei-Sardooi, E., Sajedi-Hosseini, F., Kişi, O., 2018b. Precipitation forecasting using classification and regression trees (CART) model: a comparative study of different approaches. Environ. Earth Sci. 77, 314.
El Seddawy, A.B., Sultan, T., Khedr, A., 2013. Applying Classification Technique using DID3 Algorithm to improve Decision Support System under
Uncertain Situations. Available at: www.ijmer.com [Accessed 21 November 2015].
Etemad-Shahidi, A., Mahjoobi, J., 2009. Comparison between M5′ model tree and neural networks for prediction of significant wave height in Lake
Superior. Ocean Eng. 36, 1175–1181.
Fisher, A., Belmont, P., Murphy, B.P., Macdonald, L., Ferrier, K.L., Hu, K., 2021. Natural and anthropogenic controls on sediment rating curves in
northern California coastal watersheds. Earth Surf. Process. Landf. 46 (8), 1610–1628.
Galelli, S., Castelletti, A., 2013. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling. Hydrol. Earth Syst. Sci.
17, 2669.
Ghodsian, M., 2003. Flow through side sluice gate. J. Irrig. Drain. Eng. 129, 458–463.
Goyal, M.K., Ojha, C., 2012. Downscaling of precipitation on a lake basin: evaluation of rule and decision tree induction algorithms. Hydrol. Res. 43,
215–230.
Goyal, M.K., Ojha, C., Singh, R., Swamee, P., Nema, R., 2013. Application of ANN, fuzzy logic and decision tree algorithms for the development of
reservoir operating rules. Water Resour. Manage. 27, 911–925.
Kass, G.V., 1980. An exploratory technique for investigating large quantities of categorical data. J. R. Stat. Soc.: Ser. C: Appl. Stat. 29, 119–127.
Khosravi, K., Miraki, S., Saco, P.M., Farmani, R., 2021. Short-term river streamflow modeling using ensemble-based additive learner approach. J. Hydro
Environ. Res. 39, 81–91.
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I., Bui, D.T., 2018a. A comparative assessment of decision trees
algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 627, 744–755.
Khosravi, K., Mao, L., Kisi, O., Yaseen, Z.M., Shahid, S., 2018b. Quantifying hourly suspended sediment load using data mining models: case study of a
glacierized Andean catchment in Chile. J. Hydrol. 567, 165–179.
Khosravi, K., Melesse, A.M., Shahabi, H., Shirzadi, A., Chapi, K., Hong, H., 2019. Chapter 33: Flood susceptibility mapping at Ningdu catchment, China
using bivariate and data mining techniques. In: Extreme Hydrology and Climate Variability: Monitoring, Modelling, Adaptation and Mitigation.
Elsevier, pp. 419–434.
Khosravi, K., Cooper, J.R., Daggupati, P., Pham, B.T., Bui, D.T., 2020. Bedload transport rate prediction: application of novel hybrid data mining techniques. J. Hydrol. 585 (8), 124774.
Khozani, Z.S., Khosravi, K., Pham, B.T., Kløve, B., Wan Mohtar, W.H.M., Yaseen, Z.M., 2019. Determination of compound channel apparent shear stress:
application of novel data mining models. J. Hydroinform. 21, 798–811.
Kisi, O., Shiri, J., Demir, V., 2017. Hydrological time series forecasting using three different heuristic regression techniques. In: Handbook of Neural
Computation. Elsevier, pp. 45–65.
Kouzehgar, K., Hassanzadeh, Y., Eslamian, S., Yousefzadeh Fard, M., Babaeian Amini, A., 2021. Experimental investigations and soft computations for
predicting the erosion mechanisms and peak outflow discharge caused by embankment dam breach. Arab. J. Geosci. 14, 616.
Kumar, P., Folk, M., Markus, M., Alameda, J.C., 2005. Hydroinformatics: Data Integrative Approaches in Computation, Analysis, and Modeling. CRC
Press.
Mahtabi, G., Satari, M., 2016. Investigation of hydraulic jump characteristics in rough beds using M5 model tree. Jordan J. Agric. Sci 12, 631–648.
Mazid, M.M., Ali, S., Tickle, K.S., 2010. Improved C4.5 algorithm for rule based classification. In: Proceedings of the 9th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases. World Scientific and Engineering Academy and Society (WSEAS),
pp. 296–301.
Milanovic, M., Stamenkovic, M., 2016. Chaid decision tree: methodological frame and application. Econ. Themes 54, 563–586.
Nalarajan, N.A., Mohandas, C., 2015. Groundwater level prediction using M5 model trees. J. Inst. Eng. (India): A 96, 57–62.
Nourani, V., Davanlou Tajbakhsh, A., Molajou, A., Gokcekus, H., 2019. Hybrid wavelet-M5 model tree for rainfall-runoff modeling. J. Hydrol. Eng. 24,
04019012.
Pal, M., Singh, N., Tiwari, N., 2013. Pier scour modelling using random forest regression. ISH J. Hydraul. Eng. 19, 69–75.
Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 1, 81–106.
Quinlan, J.R., 1992. Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence. World Scientific, pp. 343–348.
Quinlan, J.R., 1993. C4.5 Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
Quinlan, J.R., 2014. C4.5: Programs for Machine Learning. Elsevier.
Reddy, M.J., Ghimire, B.N., 2009. Use of model tree and gene expression programming to predict the suspended sediment load in rivers. J. Intell. Syst. 18,
211–228.
Rezaie-Balf, M., Naganna, S.R., Ghaemi, A., Deka, P.C., 2017. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting.
J. Hydrol. 553, 356–373.
Salih, S.Q., Sharafati, A., Khosravi, K., Faris, H., Kisi, O., Tao, H., Ali, M., Yaseen, Z.M., 2020. River suspended sediment load prediction based on river
discharge information: application of newly developed data mining models. Hydrol. Sci. J. 65, 624–637.
Salzberg, S.L., 1994. In: Quinlan, J.R. (Ed.), C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., Springer.
Sattari, M.T., Sureh, F.S., 2019. Drought prediction based on standardized precipitation-evapotranspiration index by using M5 tree model. In: International
Civil Engineering and Architecture Conference (ICEARC). Trabzon, Turkey.
Sattari, M.T., Pal, M., Apaydin, H., Ozturk, F., 2013. M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey. Water Resour. 40,
233–242.
Sattari, M.T., Mirabbasi, R., Sushab, R.S., Abraham, J., 2018. Prediction of groundwater level in Ardebil plain using support vector regression and M5 tree
model. Groundwater 56, 636–646.
Sattari, M.T., Ahmadifar, V., Delirhasannia, R., Apaydin, H., 2020. Estimation of pan evaporation coefficient in cold and dry climate conditions with a
decision-tree model. Atmósfera 34 (3). https://doi.org/10.20937/ATM.52777.
Sharafati, A., Khosravi, K., Khosravinia, P., Ahmed, K., Salman, S.A., Yaseen, Z.M., Shahid, S., 2019. The potential of novel data mining models for
global solar radiation prediction. Int. J. Environ. Sci. Technol. 16, 7147–7164.
Sihag, P., Karimi, S.M., Angelaki, A., 2019. Random forest, M5P and regression analysis to estimate the field unsaturated hydraulic conductivity. Appl.
Water Sci. 9, 129.
Singh, S., Gupta, P., 2014. Comparative study of ID3, CART and C4.5 decision tree algorithms: a survey. Int. J. Adv. Inform. Sci. Technol. 27, 97–103.
Solomatine, D.P., Xue, Y., 2004. M5 model trees and neural networks: application to flood forecasting in the upper reach of the Huai River in China.
J. Hydrol. Eng. 9, 491–501.
Sullivan, W., 2017. 1: Machine Learning Beginners Guide Algorithms Supervised & Unsupervised Learning, Decision Tree & Random Forest Introduction. CreateSpace Independent Publishing Platform, South Carolina, USA.
Swamee, P.K., Pathak, S.K., Ali, M.S., 1993. Analysis of rectangular side sluice gate. J. Irrig. Drain. Eng. 119, 1026–1035.
Timofeev, R., 2004. Classification and Regression Trees (CART) Theory and Applications. Humboldt University, Berlin, Germany, pp. 1–40.
Tyralis, H., Papacharalampous, G., Langousis, A., 2019. A brief review of random forests for water scientists and practitioners and their recent history in
water resources. Water 11, 910.
Wang, Y., Witten, I.H., 1996. Induction of Model Trees for Predicting Continuous Classes, Working Paper 96/23. University of Waikato, Department of
Computer Science, Hamilton, New Zealand.
Witten, I.H., Frank, E., 2002. Data mining: practical machine learning tools and techniques with Java implementations. ACM SIGMOD Rec. 31, 76–77.
Zahiri, J., Mollaee, Z., Ansari, M.R., 2020. Estimation of suspended sediment concentration by M5 model tree based on hydrological and moderate resolution imaging spectroradiometer (MODIS) data. Water Resour. Manage. 34, 3725–3737.
Chapter 11
Entropy and resilience indices
Mohammad Ali Olyaeia, A.H. Ansarib, Zahra Heydaric, and Amin Zeynolabedind
a Department of Civil, Environmental and Geo-Engineering, University of Minnesota, Minneapolis, MN, United States; b Department of Agricultural and Biological Engineering, Pennsylvania State University, State College, PA, United States; c Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States; d School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
1. Introduction
Urbanization with its rapid population growth in the 21st century has been leading to the concentration of population and
assets in hazard-prone urban areas, a speeding-up trend not likely to slow down in any near future. The inevitable exposure
resulting from this urbanization trajectory and the embedded conditions of built environments, spatial, and industrial vulnerabilities produce disaster risks when coupled with climate change-driven natural disasters (Gencer, 2013). Settlement
patterns, urbanization, and changes in socioeconomic conditions have all influenced observed trends in exposure and vulnerability to climate extremes. These urban climate hazard risks, vulnerabilities, and impacts are increasing across the
world, specifically in regions that are not able to meet their city’s needs due to inadequate capacity, unstable governance
structures, and substandard infrastructure, built-environment, and urban services (Revi et al., 2014).
The effects of climate-related disasters are often exacerbated in cities due to interactions with urban infrastructure
systems, growing populations, and economic activities. Natural disasters and the severity of their impacts expose a need
for an enhanced policy framework, particularly in urban areas, holding the majority of the world’s population. It is essential
to understand the correlation between the effects of climate change and hazard risks in urban areas and to address integrated
strategies for disaster risk reduction. More frequent climate extreme events are being experienced in urban areas. The frequency and severity of these disasters have been intensifying in the last few decades and are projected to increase in the
coming decades likewise. The impacts of climate change put additional pressure on existing urban water systems (UWS)
and can lead to negative impacts on human health, economies, and the ecosystem. Such impacts include increased frequency of extreme weather events leading to large volumes of stormwater runoff, rising sea levels, and changes in surface
water and groundwater (Melo et al., 2010; Magrin et al., 2014; Zeynolabedin et al., 2021).
Climate patterns are not changing in our favor, but our approaches and strategies must. A number of new concepts of
disaster risk management (DRM) have gained prominence over the past decade in international development discourse.
Among the mentioned concepts, resilience has emerged as one of the core principles in sustainable urban development
(Eslamian and Eslamian, 2021). Given that more than half of the world’s population lives in urban areas and that this percentage is expected to significantly increase in upcoming decades, cities must focus attention on disaster risk reduction and
enhancing resilience (United Nations, 2004). Assuming that urban decision-makers have the necessary institutional
capacity, their ability to ensure resilient futures could be redirected through strategic development initiatives such as
effective risk management, adaptation, and urban planning systems (Folke et al., 2010; Solecki et al., 2011).
Disaster risk reduction and climate change adaptation are the cornerstones of making cities resilient to a changing
climate. Integrating these activities with a metropolitan region’s development vision requires a new, systems-oriented
approach to risk assessments and planning. Moreover, since past events can only partially inform decision-makers about
emerging and increasing climate risks, risk assessments must incorporate knowledge about current climate patterns and
future projections simultaneously (Wang et al., 2014). This revision requires urban stakeholders and decision-makers
to increase the institutional capacity of many communities and organizations to strategize and apply risk-reduction, disaster
response and recovery plans on a flexible and highly adaptive basis.
UWSs play a vital role in building resilient cities. Both natural and anthropogenic activities could exert pressure on these
systems in a way that interrupts their normal operation. Learning to be prepared for any potential hazard requires comprehensive information regarding the UWS's performance and how exactly failure could happen in such unlikely circumstances.
2. Water resource and infrastructure performance evaluation
The design and operation of UWSs require an evaluation of the performance of these systems against different stressors,
which is measured by different metrics. Reliability, resilience, vulnerability (Hashimoto et al., 1982; Fowler et al., 2003),
and risk are the most widely used performance metrics, which are very popular among UWSs researchers and engineers.
Evaluating the performance of UWSs by these metrics requires the identification of what constitutes an unsatisfactory state
or failure (Asefa et al., 2014). Failure means that the system is unable to perform its required function and the definition of
failure varies from system to system. For example, in a drainage system, hydraulic overloading caused by extreme rainfall
could be considered a failure (Mugume et al., 2015). In a water distribution system, pressure reduction caused by component failures (pumps and pipes) or by large demands could be taken into account as a failure (Setiadi et al., 2005).
In the context of urban sewer and treatment systems, the overflow of effluent with a concentration exceeding the capacity
of receiving water is considered a failure. In a water supply system, failure occurs when supply cannot meet demand
(hatched areas in Fig. 1).
UWS has long been designed based on the “fail-safe” approach, i.e., to provide a high level of reliability over design life
for an acceptable level of risk. Reliability refers to the capacity of the system to minimize the frequency of its failures under
design condition (Butler et al., 2014). In Fig. 1, the suppliers are in either satisfactory, S, or unsatisfactory (hatched areas),
U, states, depending on whether they meet a constant water demand, D, for which reliability is defined as follows:
Rel = P(Vt ∈ S)    (1)

where Vt represents the volume of water supplied at time t. In other words, the ratio of the total time the system operates successfully, i.e., Vt > D, to the total operating time, T, or simply the probability of successful operation, P(Vt ∈ S), is called reliability (Jung et al., 2014).
Although reliability is one of the key elements in the design, operation and maintenance of UWSs, relying on this metric
alone is not sufficient to evaluate the performance of these systems against all stressors (Butler et al., 2017). For example, in
urban drainage systems, Park et al. (2013) show that the reliability approach for evaluation of structural failure is not sufficient due to the unknown mechanisms of failure. Reliability-based design does not include all the statistics of system
performance (Asefa et al., 2014; Hashimoto et al., 1982). Reconsidering Fig. 1 for example, supplier 1 does not meet
the demand between t5 and t6 and supplier 2 fails to meet this demand between t1 and t2 and between t3 and t4. The total
failure duration of the two suppliers is the same during operation and thus their reliabilities are equal to each other.
However, the failure magnitude of supplier 1 is greater than that of supplier 2, which needs to be considered in characterizing the system performance. In addition, increasing the reliability of UWS does not necessarily lead to a reduction in
the level of service failures when subject to exceptional conditions (Sweetapple et al., 2018).
Uncertainties in exceptional conditions have led to criticism of the “fail-safe” approach. According to critics, system
failures are unavoidable and water resources and infrastructure need to be designed not only reliable under design condition
but also to be resilient to unexpected stressors (Butler et al., 2014; Mugume et al., 2015). Conventionally, the system’s
FIG. 1 Failure in supplying downstream water demand (two alternative suppliers).
behavior is evaluated against stresses outside of design condition by risk analysis. Risk is defined as a function of probability and magnitude of unsatisfactory state or failure. Based on Fig. 1, the risk is formulated as follows:
Risk = P(Vt ∈ U) × M(U)    (2)

where P(Vt ∈ U) indicates the probability of failure, and M(U) represents the severity or magnitude of this failure (the maximum gap between supply and demand).
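As a simple illustration, the following Python sketch (with assumed supply and demand values) estimates reliability (Eq. 1) and the risk measure of Eq. (2) from a supplied-volume series and a constant demand, following the satisfactory/unsatisfactory states of Fig. 1.

```python
# Illustrative sketch of Eqs. (1) and (2) for a supply time series V and constant demand D.
import numpy as np

def reliability_and_risk(V, D):
    V = np.asarray(V, dtype=float)
    satisfied = V >= D                               # satisfactory state S: supply meets demand
    rel = satisfied.mean()                           # Eq. (1): P(Vt in S)
    p_fail = 1.0 - rel                               # P(Vt in U)
    m_u = np.max(np.where(~satisfied, D - V, 0.0))   # M(U): maximum supply-demand gap
    risk = p_fail * m_u                              # Eq. (2)
    return rel, risk

# Example: demand of 10 units with two short failure periods
rel, risk = reliability_and_risk([12, 11, 9, 8, 12, 13, 10, 7], D=10)
print(rel, risk)
```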
Risk analysis is an efficient tool in the toolbox of UWS designers for evaluating the performance of systems in exceptional conditions. However, on many occasions, sufficient data are not available to describe risk (Ansari et al., 2021) and
even if there is enough recorded data of stressors, nonstationarity prohibits the accurate estimation of risk due to changes in
the response of systems to stressors (Karamouz and Mohammadi, 2020; Panos et al., 2021). Therefore, the records used in
risk analysis are not reliable for the prediction of future performance (Park et al., 2013).
Limitations in reliability and risk analysis (Sweetapple et al., 2018) and the inability of conventional design approaches
for addressing uncertainties have led to a transition in the design approach from “fail-safe” to “safe-fail.” In the “safe-fail”
approach, the different modes of system failure are investigated regardless of causal events and the system is designed to
absorb these stressors at least for a short time, and if it fails, it can recover quickly. In the “safe-fail” approach, achieving
maximum resilience is considered as a goal in multiobjective design optimization (Mohammadiun et al., 2018) and complements reliability and risk analysis (Kjeldsen and Rosbjerg, 2004; Sweetapple et al., 2018).
Sustainability in recent years has been recognized as an overarching goal in the design and operation of UWS. Sustainability refers to the capacity of the system to maintain its long-term performance while maximizing its economic, social,
and environmental goals (Sweetapple et al., 2019; Tavakol-Davani et al., 2019). In the light of achieving this goal under
uncertain circumstances and paradigm shift from “fail-safe” to “safe-fail,” concepts such as entropy and resilience have
been highlighted by researchers. Entropy addresses uncertainty to provide a basis for risk and reliability analysis while
resilience is crucial toward more sustainable urban water systems under uncertainty (Diao et al., 2016; Ahern, 2011). Basically, these two concepts became popular in water resource management in response to the limited data and incomplete
information. In the following, each of these concepts will be discussed and their applications will be reviewed in the environmental and water resources area.
3. Entropy
Water resources systems are inherently complex systems and are stochastic in nature due to the randomness of hydrological
processes and climate forces. Therefore, these systems require a stochastic description and probabilistic methods make such
a description explicitly possible. However, the lack of sufficient data, incomplete information, and small sample sizes challenge the estimation of probability distributions. Entropy theory alleviates this problem by providing least biased probability distributions with such limited data.
The concept of entropy was originally introduced by Clausius in thermodynamics. This term has a statistical and probabilistic nature and is interpreted as a measure of the amount of chaos, which shows the macroscopic behavior of a system in
a thermal equilibrium state. Entropy can be examined in three different but related areas: thermodynamic entropy,
statistical-mechanical entropy and information entropy.
3.1 Thermodynamic entropy
Thermodynamic entropy shows the state of systems in thermal equilibrium. In the equilibrium state, the entropy is at its
maximum and the entropy production per unit mass is at its minimum. This principle justifies the behavior of many natural
systems, such as hydrological and river morphological processes.
In recent decades, thermodynamic entropy has been considered by researchers to assess the sustainability of urban
systems. The relationship between the second law of thermodynamics and the degradation of natural resources was first
investigated by Georgescu-Roegen (1993). Subsequently, many researchers have tried to justify urban sustainability,
human environmental and economic activities (Daly, 1992), degradation of resources and the flow of matter and energy
upon the light of entropy and the second law of thermodynamics. According to the second law of thermodynamics, heat
cannot by itself transfer from a colder to a warmer body. This law states that all real processes are irreversible. In an irreversible process, work is lost and leads to the production of additional entropy in an isolated system. So the changes of total
entropy in the system, unlike in reversible processes, is not zero but positive, ΔS > 0. Today, many problems in the urban
basin and climate are linked to increased material entropy due to the irreversible degradation of resources and impossibility
of complete recycling of matter (Purvis et al., 2017).
192
Handbook of hydroinformatics
3.2 Statistical-mechanical entropy
In 1870, Boltzmann examined thermodynamic entropy at the molecular level using statistical mechanics. Each molecule in
a system can move at a set of discretized energy levels. As the temperature of a system rises, the molecules can move at
higher energy levels. Therefore, with increasing temperature, i.e., entropy of the system, the number of energy levels
available for molecules increases and the probability that the system will be found at a certain level of energy is reduced.
In other words, uncertainty about the state of the system increases. Lewis and Randall (1961) state that the most probable
distribution of energy in the system is such that the entropy of the whole system is at its maximum level. This theory, the so-called "principle of maximum entropy," has been applied to a variety of problems in water resource management including
the derivation of probability distributions and parameter estimation.
3.3 Information entropy
The concept of information entropy was formed in Shannon’s (Shannon, 1948) mind in response to the question of how to
quantify the uncertainty of an information system. Shannon's entropy is not unrelated to the concept of statistical-mechanical or Boltzmann's entropy. The entropy of a random process shows how uncertain we are about the outcome
of this process. So the more unpredictable the outcome of a process and the more uncertainty about its outcome, the more
entropy there is, and vice versa. Shannon used a logarithmic scale to measure the uncertainty of a random process or the
difficulty of guessing its result.
H(X) = −Σ P(X) log P(X)    (3)

where H(X) indicates the information entropy of a random variable X = {x1, x2, …, xn} with probability distribution P(X).
Consider, for example, tossing a coin. The sample space of this random event includes tail and head, S ¼ {T, H}. If the
coin is perfect, the probability of getting heads and tails on the toss is the same and equal to 0.5 and in this case, entropy is
equal to 1. If the coin is defective and the probability of getting heads on the toss approaches 1 or 0, uncertainty about the
outcome of this random event decreases. Fig. 2 shows the entropy for the different probabilities of getting heads on the toss.
According to Fig. 2, in extreme cases, i.e., when the coin toss is definitely tail or head, tossing the coin will not add any
information to us. Because we already knew the result of the event with certainty and coin toss is no longer a random event.
If the probabilities of getting tail and head on the toss are equal, or in other words, the probability distribution is uniform,
uncertainty about the outcome of the coin toss will be maximized in this case. Among the various distributions, uniform
distribution has the highest degree of uncertainty or entropy and is used when information is very limited, followed by
Gaussian, logistic, Laplace, and extreme-value distributions (Mukherjee and Ratnaparkhi, 1986).
Entropy has numerous applications in water resource management particularly in case of joint and conditional probabilities. Regarding the joint probability, consider two random events, X and Y. We show the joint probability of these two
events as P(X, Y) which indicates the probability that these two events occur simultaneously. The joint entropy of X and Y is
calculated based on their joint probability as follows:
FIG. 2 Entropy of tossing a coin with different probability of getting heads on the toss. (Adapted from Cover, T.M., Thomas, J.A., 1991. Information theory and statistics. In: Elements of Information Theory. John Wiley & Sons, New York, pp. 279–335.)
H(X, Y) = −Σ_{i,j} P(X, Y) log P(X, Y)    (4)
It can be shown that the entropy of the joint occurrence of these two events is never greater than the sum of the entropies of the individual events; only if the events are independent of each other is the entropy or uncertainty of their simultaneous occurrence equal to that sum:

H(X, Y) ≤ H(X) + H(Y)    (5)
If X and Y are not independent of each other, the conditional probability of Y given X, according to Bayes' law, is written as Eq. (6):

P(Y|X) = P(X ∩ Y)/P(X)    (6)
It can be expected that if two random events are not independent, observing the occurrence of each of them will lead to a
reduction in uncertainty in predicting the outcome of the other event. With this background on conditional entropy, we
calculate the conditional entropy of Y given X. Assuming that X and Y are two dependent random events, the entropy
of Y given X is calculated as follows:
H(Y|X) = H(X, Y) − H(X)    (7)
where H(Y j X) is the conditional entropy which measures the uncertainty of Y remaining after knowing X. According to
Eq. (7), the joint entropy of the two random events X and Y is equal to the entropy of event X plus the entropy of event
Y conditional upon the knowledge of event X, i.e.,
H(X, Y) = H(X) + H(Y|X)    (8)
Transinformation measures the redundant or mutual information between X and Y, which can be calculated as the difference
between the total entropy and the joint entropy of X and Y (Eq. 9).
T(X, Y) = H(X) + H(Y) − H(X, Y)    (9)
where T(X, Y) indicates the transinformation between X and Y. By inserting Eq. (8) into Eq. (9), the transinformation can be
calculated in terms of conditional entropy (Eq. 10).
T(X, Y) = H(X) − H(X|Y)  or  T(X, Y) = H(Y) − H(Y|X)    (10)
Transinformation or mutual information is an important variable in the design of monitoring networks. Entropy applications in water resource systems management will be reviewed later.
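The following Python sketch (using a small hypothetical joint probability table) illustrates Eqs. (3), (4), (7), and (9) for two discrete variables; it is a minimal example, not part of the chapter's material.

```python
# Illustrative sketch: Shannon entropy, joint entropy, conditional entropy, and
# transinformation for two discrete variables with a hypothetical joint distribution.
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (log base 2, Eq. 3)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution P(X, Y): rows = states of X, columns = states of Y
P_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

H_xy = entropy(P_xy.ravel())        # joint entropy, Eq. (4)
H_x = entropy(P_xy.sum(axis=1))     # marginal entropy of X
H_y = entropy(P_xy.sum(axis=0))     # marginal entropy of Y
H_y_given_x = H_xy - H_x            # conditional entropy, Eq. (7)
T_xy = H_x + H_y - H_xy             # transinformation, Eq. (9)

print(H_x, H_y, H_xy, H_y_given_x, T_xy)
# A fair coin (p = 0.5) gives an entropy of exactly 1 bit, matching Fig. 2.
```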
3.4 Application of entropy in water resources area
Entropy has been applied to a variety of problems in water resource systems planning and management, including the derivation of distributions (Papalexiou and Koutsoyiannis, 2012), parameter estimation (Chen and Singh, 2018), streamflow
forecasting (Cui and Singh, 2015; Darbandsari and Coulibaly, 2020), the hydrologic cycle and water budget (Kleidon and
Schymanski, 2008), design of hydrological and water quality networks (Xu et al., 2018), channel hydraulics (Greco et al.,
2014), subsurface hydrology (Barbe et al., 1994), morphology (Ranjbar and Singh, 2020), reliability of water resource
systems (Setiadi et al., 2005; Tanyimboh, 2017), and risk analysis (Mobley et al., 2019; Liu et al., 2019; Qiu et al.,
2021). Some researchers such as Fistola (2011) and Pelorosso et al. (2017) used entropy in the sense of thermodynamics
in assessing the sustainability of urban development. In this sense, entropy has justified many hydrological and watershed
processes and has been considered by researchers such as Reggiani et al. (1998) in hydrological and watershed modeling.
The entropy exchange in these models, along with the balance equations of mass, momentum, and energy, is taken into
account for a hydrological system.
The principle of maximum entropy which originates from statistical entropy has also many applications in water and
environmental engineering. This principle in water and environmental engineering can be expressed as follows: when statistical inference is based on limited data and small samples, the probability distribution to be drawn should have the
maximum entropy based on this available information. This is equivalent to maximizing the information entropy. This
principle has been used to derive a variety of distributions that are widely used in hydrology and water resources, and
194
Handbook of hydroinformatics
parameter estimation. For example, Dong et al. (2013) employed the principle of maximum entropy to derive the bivariate
distribution of significant wave heights and the corresponding peak periods. Zhang et al. (2020) and Swetapadma and Ojha
(2021) applied this principle for parameter estimation in flood frequency analysis to minimize error and bias arising from
sampling methods and the selection of distribution models. In system engineering, the principle of maximum entropy provides a basis for risk and reliability analysis by approximating least-biased distribution tails from limited data (Zhang et al.,
2020; Singh, 1997).
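As a brief illustration of the principle, the following Python sketch (with a hypothetical mean constraint, in the spirit of Jaynes' dice example) finds the least-biased discrete distribution on {1, …, 6} by maximizing Shannon entropy subject to a prescribed mean; it is only a sketch, not a method used in the cited studies.

```python
# Illustrative sketch of the principle of maximum entropy: among all distributions on
# {1,...,6} with mean 4.5, find the one with maximum Shannon entropy.
import numpy as np
from scipy.optimize import minimize

values = np.arange(1, 7)

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))        # negative entropy (to be minimized)

constraints = (
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},            # probabilities sum to 1
    {"type": "eq", "fun": lambda p: (p * values).sum() - 4.5},  # prescribed (hypothetical) mean
)
res = minimize(neg_entropy, x0=np.full(6, 1 / 6), bounds=[(0, 1)] * 6,
               constraints=constraints)
print(res.x)   # exponential (Gibbs-like) shape, tilted toward the larger outcomes
```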
Information entropy has found the most applications in water resource engineering. In designing hydrometeorological
networks (Xu et al., 2018; Wang et al., 2018) and water quality monitoring stations (Boroumand et al., 2018; Singh et al.,
2019; Banik et al., 2015), the optimal location of stations is determined based on information entropy by minimizing the
transinformation among stations. In risk analysis, the key influencing factors are identified based on information entropy
and their relative importance is determined based on entropy weight method (Ziarh et al., 2021; Liu et al., 2019;
Malekinezhad et al., 2021). Another application of information entropy in water resource system engineering is the estimation of the reliability of water distribution networks (Tanyimboh and Templeman, 1993), where the entropy acquired
another meaning, more connected to the redundancy. In other words, network entropy increases with adding pipes and
closing loops. This ensures that the flow of a node is supplied from alternative routes in case of failures, and that system
reliability is increased. The entropy of a water distribution network can be calculated as follows (Atkinson et al., 2014):
S = −Σ_{j∈IN} (Qj/T) ln(Qj/T) − (1/T) Σ_{j=1}^{J} Tj [(Qj/Tj) ln(Qj/Tj) + Σ_{i∈Nj} (qij/Tj) ln(qij/Tj)]    (11)

where j indicates a node, J is the number of nodes, and IN is the set of source nodes. T is the total supply and Tj is the total flow reaching node j, including any external inflow, while Nj represents all nodes immediately upstream of and connected to node j. Qj is the demand at demand nodes or the supply at source nodes, and qij is the flow rate in pipe ij. According to Eq. (11), increasing the number of source nodes and pipes upstream of the demand nodes results in higher entropy or redundancy of the network, and hence, higher reliability.
4. Resilience
The resilience concept originates from the field of ecology in the 1970s and refers to a natural system's persistence when facing natural or anthropogenic disturbances (Holling, 1973; Dong et al., 2017). A resilient ecosystem is capable of retaining its functionality when exposed to external stress by changing its structure. This concept later gained prominence in the engineering field in the mid-1990s, as the classical view of designing engineering systems in a way that prevents failure was challenged. A new paradigm considers extreme conditions as an opportunity for a system to adapt and reorganize (Juan-García et al., 2017).
In all previous work regarding resilience in engineering systems, particular attention has been paid to how resilience is implemented in practice. This effort is rather complex, taking into account the fact that urban systems encompass both technological and social processes within the resilience concept (Wang and Blackmore, 2009). Bruneau et al. (2003) demystify this seemingly vague concept by defining the dimensions of resilience in physical and social systems. They state that a resilient system must show a noticeable reduction in its failure probabilities, consequences, and recovery time, and accordingly define the 4R terms, i.e., redundancy, resourcefulness, rapidity, and robustness, as the dimensions of resilience. They further conceptualize resilience from different perspectives and break it into technical, organizational, social, and economic aspects.
Basically, the resilience literature provides two approaches to quantify this concept: (1) metrics that characterize the inherent properties of a resilient system (attribute-based approach), and (2) equations that monitor and evaluate the performance of a system when it is exposed to extreme stresses (performance-based approach) (Hosseini et al., 2016; Karamouz and Hojjat-Ansari, 2020). The underlying difference between these two methods lies in how resilience is perceived. While the attribute-based approach mainly concentrates on the system's properties, and sometimes these properties are suggested as indicators, the performance-based approach focuses on the ultimate objective of resilient systems, which is providing the specified services in an efficient and continuous way. Although there is clearly a relationship between a system's properties and its performance, the details of this impact are not completely known (Butler et al., 2017). It is believed that highly resilient systems can be achieved when a system is divided into various hierarchically organized subsystems in the so-called centralized control and decentralized execution (CCDE) mode (Diao, 2021).
The typical performance curve of an engineering system during normal and extreme conditions is shown in Fig. 3.
During normal conditions, there is a fluctuation in performance as a result of changes in forcing data, malfunction in
FIG. 3 Schematic presentation of the system's performance curve during normal and extreme conditions.
properties and so on. With the onset of a natural or manmade hazard at t0, the system’s performance starts declining until the
point when the stressor terminates (t1). The system needs a time called recovery time to return to its normal operation. The
following equations could be used to quantify resilience based on the performance curve.
r = ∫_{t0}^{t2} [P0 − P(τ)] dτ,  t ∈ [t0, t2]    (12)

Res = 1 − ( ∫_{t0}^{t2} [P0 − P(τ)] dτ ) / ( P0 (t2 − t0) )    (13)
where P0 is the mean state of the system's performance during normal conditions, P(τ) is the value of performance at measurement time τ, and t0 and t2 are the times at which the perturbation starts and ends, respectively. The Res metric value depends on the shaded area in Fig. 3; the larger the area, the less resilient the system.
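The following Python sketch (with a hypothetical performance series) computes the resilience index of Eq. (13) by trapezoidal integration of the performance deficit over the perturbation period; it is an assumed illustration rather than a prescribed implementation.

```python
# Illustrative sketch of Eqs. (12)-(13): resilience from a sampled performance curve.
import numpy as np

def resilience(times, performance, p0, t0, t2):
    """Res = 1 - (integral of [P0 - P(t)] over [t0, t2]) / (P0 * (t2 - t0))."""
    t = np.asarray(times, dtype=float)
    p = np.asarray(performance, dtype=float)
    mask = (t >= t0) & (t <= t2)
    deficit = p0 - p[mask]                         # performance deficit relative to P0
    lost = np.trapz(deficit, t[mask])              # shaded area in Fig. 3 (Eq. 12)
    return 1.0 - lost / (p0 * (t2 - t0))           # Eq. (13)

# Hypothetical performance series: full service (1.0) with a dip during a hazard
t = np.arange(0, 11)
p = np.array([1, 1, 1, 0.6, 0.4, 0.5, 0.7, 0.9, 1, 1, 1])
print(resilience(t, p, p0=1.0, t0=2, t2=8))
```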
Some studies argue that metrics based solely on the area of the performance curve are not sufficient and that other metrics representing the intensity and duration of the perturbation must also be taken into consideration (Sweetapple et al., 2017; Olyaei et al., 2018).
Resilience can be categorized in numerous ways. The first classification is attribute-based versus performance-based. Attribute-based or general resilience considers the system as a whole and refers to a system state that strengthens it to limit the failure magnitude and duration under any threat. Performance-based or specified resilience, on the other hand, focuses on a specified threat and refers to the agreed performance of the system in reducing failure magnitude and duration (Scholz et al., 2012; Butler et al., 2014; Olyaei et al., 2018). Another classification distinguishes engineering resilience from ecological resilience (Holling, 1996). With the latter considered a more appropriate theoretical framework for management (Liao, 2012), the application of resilience in both connotations is reviewed further on. Global resilience analysis (GRA) is another way to characterize resilience, which shifts the focus from the threat to performance alone by considering numerous and comprehensive sets of failure scenarios (Mugume et al., 2015).
It should be noted that resilience is interwoven with other terms such as reliability, risk, and sustainability. The goal of resilience is to maintain a satisfactory state of the system under exceptional conditions and to quickly recover once failure occurs (Park et al., 2013; Butler et al., 2014). In contrast to reliability and risk analysis, in which it is necessary to identify hazards and characterize probabilities, resilience analysis of a system can be performed for highly improbable and even unobserved stressors (Fig. 4). Therefore, resilience analysis considers a wider range of stressors and provides greater scope than risk analysis (Sweetapple et al., 2018).
4.1 Application of resilience in water resources area
Though the idea of resilience has a long history in engineering and ecology, its application to natural hazard management is
relatively recent (Berkes, 2007). Various strategies are used to reduce disaster risks and build resilience as well as to adapt
to climate change in urban areas. However, there is frequently a disconnection between climate change adaptation and
FIG. 4 "Fail-safe" design approach.
disaster risk reduction research communities and a lack of collaborative, integrated application of adaptation strategies in these areas (Solecki et al., 2011). This is due to differences of emphasis between disaster risk and climate change research, with the former focused on the past and present, and the latter on the impacts of future risk (Thomalla et al., 2006; UNDP, 2004; Gencer, 2008). The application of resilience is what is needed to fill this gap; learning from the past and preparing for the future is what underlies the notion of resilience. Since resilience is a multidisciplinary term, it has been used in various areas of research. Regarding the water resources area in general, the previous studies can be categorized into two distinct sections: resilience in UWS and resilience in the urban environment or ecology. The first typically focuses on specific water engineering systems, particularly urban wastewater systems, while the second deals with ways to build resilient urban environments against natural hazards, including flood and drought.
4.2 Resilience in UWS
The wastewater treatment plant (WWTP) is a strategic infrastructure with numerous purposes, such as protecting the environment and providing new resources in water-scarce areas (Karamouz et al., 2018a). This infrastructure is exposed to natural and man-made stressors that may impact its efficient performance. The satisfactory performance of a WWTP is assessed by monitoring its effluent quality variables (such as BOD, COD, TSS, TN, TP) and ensuring that they do not violate the standard values. These standards come from a so-called "permitting" approach that controls the risk imposed by wastewater treatment systems. The standard values are determined based on the estimation of the impact of releasing wastewater to the environment. These permits are becoming stricter as protecting our environment gains special significance; however, complying with strict regulations is costlier (Meng et al., 2016).
As stated in the definition of resilience, the performance of a system can be studied under a specified threat, or the attention can be placed on the mode of system failure regardless of the type of threat. Failure of a WWTP refers to the times when the effluent exceeds the standards, and this situation can happen as a result of two types of failure modes: (1) structural failure, which refers to malfunction of WWTP components such as pumps, tanks, or pipes; and (2) operational failure, which relates to component overloading, such as solids washout in the aeration tank. Both failure modes result in the inability of the failed components to deliver their desired function and eventually lead to failure of the whole system, i.e., the WWTP (Mugume et al., 2015). Internal and external failure is another categorization of WWTP failure modes (Sweetapple et al., 2019). Internal failure specifies component failure, which can be quantified by the percentage loss of function. External failure refers to changes in the sewer influent characteristics.
Sweetapple et al. (2017) assessed the performance of a WWTP against some predefined influent perturbations such as
increases in the flow rate, total nitrogen concentration, COD concentration, and temperature, and presented a general
framework for designing a resilient and reliable WWTP.
Flooding is an example of a natural hazard that could paralyze the normal serviceability of a WWTP. There are two
stressors in flooding that could cause the malfunction of the system: (1) Enormous increase in the influent discharge that
might go beyond the capacity of the plants’ unit operations such as settling and aeration tanks; and (2) the inundation depth
that could cause malfunction in different unit operations. Olyaei et al. (2018) assessed the performance of a hypothetical
WWTP under both structural and operational failures by assessing resilience from three perspectives: based on the area of
the performance curve, the failure magnitude, and failure duration. They showed that the effect of flooding on various
effluent quality variables is disproportionate; while TSS and TN experience a noticeable impact, the changes in BOD and COD are negligible. In another study, Olyaei and Karamouz (2020) showed that the biological parameters in WWTP
modeling could go through remarkable changes in flooding condition; therefore, uncertainty analysis is a useful tool to
quantify and capture these changes. In Fig. 5, the variation in TSS is shown, in which the rising in the time of flooding
is prominent.
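Resilience assessed from a performance curve, as in the three perspectives above, can be summarized with a few simple quantities. The sketch below is illustrative only: it assumes a normalized performance series (1 = full compliance with the effluent standards) sampled at a fixed time step, and it is not the exact formulation used by Olyaei et al. (2018).

```python
import numpy as np

def performance_curve_metrics(performance, dt=1.0, target=1.0):
    """Summarize a normalized performance curve P(t) from three perspectives:
    area under the curve, failure magnitude, and failure duration."""
    p = np.asarray(performance, dtype=float)
    area_resilience = p.mean() / target             # time-averaged performance relative to the target
    failure_magnitude = max(0.0, target - p.min())  # deepest drop below the target
    failure_duration = dt * np.count_nonzero(p < target)  # total time spent below the target
    return area_resilience, failure_magnitude, failure_duration

# Hypothetical hourly WWTP performance during a flood: full service,
# a drop to 40% of the target, then gradual recovery.
p = [1.0, 1.0, 0.8, 0.4, 0.5, 0.7, 0.9, 1.0, 1.0]
print(performance_curve_metrics(p, dt=1.0))
```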
To improve resilience, some measures called interventions should be implemented. These interventions include a wide range of actions with manifold effects on the system's long-term performance, i.e., its sustainability. Interventions typically divide into two classes: design and operational control.
Interventions based on design usually concentrate on enhancing the capacity of storage tanks preceding the WWTP and
the settling tanks by inserting backup tanks. In the case of flooding, there are other measures as well such as flood proofing
and elevating the equipment, providing backup power generation for pumping stations and emergency response such as
sandbagging (NYCDEP, 2013). These interventions sometimes require a considerable budget, for which optimal allocation is essential (Karamouz et al., 2018a). The other types of interventions are based on altering the operational control, such as the return flow or the aeration rate.
In an urban water distribution network, as another component of UWS, the performance of the system is measured by
the capacity to provide sufficient pressure and flow with appropriate quality. Different failure modes of the system include
structural failures such as pipe collapse due to traffic load, land subsidence, asset aging and decay, and operational failures
such as pump failure due to repairs or power outages, excess demand under fire fighting conditions and the intrusion of
chemical substances. The impact of such failures on the hydraulic and qualitative properties of the system can be simulated
in EPANET. Considering three common types of failure, including pipe failure, excess demand for firefighting, and pollutant intrusion, Diao et al. (2016) examined the performance loss of these networks. The resilience of these networks against external pollutants, for example, was assessed by injecting a contaminant into a set of network nodes and
evaluating its effect on the quality of the flow throughout the networks. Their study shows that increasing resilience to one
stressor may reduce the resilience of the system to other stressors. Therefore, in resilience analysis, evaluating the performance of the system against a variety of potential hazards is inevitable, which is referred to as general resilience assessment.
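The "one stressor at a time" logic of such studies can be prototyped with open-source tools. The sketch below assumes the Python package wntr (an EPANET wrapper) and a hypothetical input file, network.inp; it simply closes each pipe in turn and counts junctions whose pressure drops below a service threshold, which is only a rough stand-in for the performance-loss measures used in the cited work.

```python
import wntr  # assumption: the open-source wntr package is available

INP_FILE = "network.inp"   # hypothetical EPANET model, not from the cited studies
MIN_PRESSURE = 20.0        # hypothetical minimum service pressure (m)

def deficient_junctions(wn):
    """Run an EPANET simulation and count junctions that ever fall below the threshold."""
    results = wntr.sim.EpanetSimulator(wn).run_sim()
    pressure = results.node["pressure"].loc[:, wn.junction_name_list]
    return int((pressure.min(axis=0) < MIN_PRESSURE).sum())

baseline_model = wntr.network.WaterNetworkModel(INP_FILE)
baseline = deficient_junctions(baseline_model)

# Close each pipe in turn (a simple structural-failure scenario) and record
# how much additional service loss it causes relative to the intact network.
impact = {}
for pipe_name in baseline_model.pipe_name_list:
    wn = wntr.network.WaterNetworkModel(INP_FILE)  # fresh model per scenario
    wn.get_link(pipe_name).initial_status = wntr.network.LinkStatus.Closed
    impact[pipe_name] = deficient_junctions(wn) - baseline

worst = sorted(impact, key=impact.get, reverse=True)[:5]
print("Pipe failures causing the largest service loss:", worst)
```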
Recent studies regarding resilience in water distribution networks highlight the role of network topology on the way the
systems could recover after a failure. For example, Meng et al. (2018) presented various metrics under six categories: connectivity, efficiency, centrality, diversity, robustness, and modularity. In particular, modularity, or system decomposition, was found to be imperative for system resilience, i.e., the ability of a system to be decomposed into multiple modules or subsystems with stronger internal connections than external connections (Diao et al., 2021).
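Modularity, in particular, can be computed directly from the network graph. The sketch below uses the networkx package on a toy junction-pipe graph; it only illustrates the notion of decomposable modules and is not taken from the cited studies.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy pipe network: nodes are junctions, edges are pipes (illustrative only).
G = nx.Graph()
G.add_edges_from([
    ("J1", "J2"), ("J2", "J3"), ("J3", "J1"),   # first loop
    ("J4", "J5"), ("J5", "J6"), ("J6", "J4"),   # second loop
    ("J3", "J4"),                               # single tie between the loops
])

# Detect modules (densely connected groups of junctions) and score how
# decomposable the network is; values closer to 1 indicate stronger modularity.
communities = greedy_modularity_communities(G)
print("modules:", [sorted(c) for c in communities])
print("modularity:", round(modularity(G, communities), 3))
```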
FIG. 5 Typical effluent TSS concentration during flooding conditions (the standard limit is depicted with a dashed line) (Olyaei et al., 2018).
FIG. 6 Resilience analysis of urban drainage system in SWMM (before and after failure scenario).
In urban drainage systems, resilience is targeted against structural failures, such as pipe collapse and blockage, and operational failures caused by extreme rainfall events and land use change (Panos et al., 2021), which lead to hydraulic overloading of the systems and urban flooding. Mugume et al. (2015) evaluated the resilience of these systems against structural
failures with a global resilience approach. Simulation of these systems was performed through dynamic flow routing in the
SWMM environment. Pipe collapse can be simulated by removing pipes from the network, and blockage by increasing their Manning roughness coefficient (Fig. 6). To measure the resilience of these systems through classical resilience
formulas (Eqs. 12 and 13), the total flood volume and average flood duration due to the failure of system components are
defined as the magnitude and duration of failure. Dong et al. (2017) go one step further in evaluating the resilience of
drainage systems and, in addition to urban flooding (social severity), include sewer overflow (environmental severity),
and the operation of downstream wastewater treatment plant (technological severity) in the formulation of resilience
assessment of these systems.
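As a rough illustration of how such magnitude-and-duration measures combine, the sketch below implements one common severity-based index in the spirit of global resilience analysis; the exact Eqs. (12) and (13) referenced above are not reproduced here, and the SWMM output values are hypothetical.

```python
def severity_based_resilience(flood_volume, total_inflow,
                              mean_flood_duration, elapsed_time):
    """Residual resilience index in [0, 1] for a drainage failure scenario.

    One common formulation combines the relative magnitude of failure
    (flooded share of the total inflow) with its relative duration;
    this is an illustrative stand-in, not Eqs. (12) and (13) verbatim.
    """
    severity = (flood_volume / total_inflow) * (mean_flood_duration / elapsed_time)
    return 1.0 - severity

# Hypothetical SWMM outputs for a pipe-blockage scenario (the blockage itself
# is represented in the model by a very high Manning roughness on the pipe):
print(severity_based_resilience(
    flood_volume=1200.0,       # m3 flooded from manholes
    total_inflow=25000.0,      # m3 of runoff entering the system
    mean_flood_duration=2.5,   # hours
    elapsed_time=24.0,         # hours of simulation
))
```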
Simultaneously with the transition from the "fail-safe" to the "safe-to-fail" approach, researchers have evaluated the impacts of interventions on the resilience of urban infrastructure. Interventions in drainage systems are implemented with the aim of
achieving the best system performance and maximum resilience according to various criteria such as minimizing the frequency and volume of combined sewer overflow (CSO) or reaching the quality standard of effluent of wastewater treatment
plants. For example, increasing the capacity of pumps and the number of storage tanks are among the resilience-enhancing interventions for drainage systems. In general, as the number of storage tanks in drainage systems increases, flooding subsides and resilience increases (Wang et al., 2021). Considering the social, economic, and environmental aspects of the
failure of urban drainage systems, Sweetapple et al. (2019) found that by changing the type and magnitude of hazards,
resilience-enhancing interventions beyond a certain point, the so-called tipping point, will not lead to the sustainability of these systems. For example, their studies showed that increasing the capacity of pumps in integrated drainage systems for different threat types and severities will not always result in sustainability.
Although much research has been done in the last two decades on the resilience of urban systems and resilience-enhancing interventions, the scope of this research is not limited to urban infrastructure, but has found broader dimensions
at the scale of urban environments, which will be discussed below.
4.3 Resilience in urban environments
The concepts of urban ecology and resilience are framed by the interrelationships between communities and the natural and
built environments at local, regional and global scales. The dynamic between these changing entities is fundamental to
resilience thinking and underpins the intentions of resilience: to understand and strengthen a city’s capacity to mitigate,
adapt to, and recover from internal and external shocks and stresses. Urban ecosystems are important components when
building urban resilience through their ability to absorb climate-induced shocks and ameliorate the worst effects of extreme
climate events (McPhearson et al., 2015). Despite the increasing attention given to the concept of resilience in hazard management and urban ecology, what defines resilience to natural disasters remains ambiguous. This section addresses flooding and drought to develop a rigorous definition of "urban resilience" that embraces inherent dynamism and
uncertainties to provide unconventional perspectives for coping with natural hazards. Furthermore, the section aims to
emphasize the fact that it is vital for cities to catalyze the transformation from resistant to resilient; a shift from rigidity
to flexibility.
Urban areas are complex systems with social, ecological, economic, and technical/built components interacting dynamically in space and time (Pickett et al., 1997, 2001; Grimm et al., 2000; Niemelä et al., 2011; McPhearson et al., 2016b). The
complex nature of urban systems can make it challenging to predict how ecosystems will respond to climate change in cities
(Batty, 2008; Bettencourt and West, 2010). This complexity is driven by many intersecting feedbacks affecting ecosystems,
including climate, biogeochemistry, nutrient cycling, hydrology, population growth, urbanization and development, human
perceptions, behavior, and more (Bardsley and Hugo, 2010; Pandey and Bardsley, 2013; Alberti, 2015; McPhearson et al.,
2016a; Tavakol-Davani et al., 2019). These systems interrelate dynamically with the social, ecological, economic, and
technological-built infrastructure of the city (Grimm et al., 2000; McDonnell and Hahs, 2013).
Patterns and processes of urban systems in this view emerge from the interactions and feedbacks between components
and systems in cities, emphasizing the need to consider multiple sources of social-ecological patterns and processes to
understand reciprocal interactions between climate change and urban ecosystems (Peterson, 2000). Applying the engineering resilience concept to urban environments subject to natural hazards is fundamentally problematic because it rests on an outdated equilibrium theory. Recovery is often interpreted as returning to predisaster conditions, implicitly assuming an
optimal reference state, which nevertheless does not exist in coupled human-natural systems (Berkes, 2007). Urbanized
basins are such systems, where climate, socioeconomic trends, built systems, and riverine processes affect natural hazards
and disasters. They operate like evolving ecosystems rather than engineering systems and are characterized by complex
behaviors associated with nonlinearity, emergence, uncertainty, and surprise (Liu et al., 2007). Such dynamic systems will
not stay at a predetermined state. At the urban scale, resilience requires investment in man-made and nature-based “hard”
infrastructures, as well as “soft” systems such as knowledge and institutions. The concept of resilience when applied effectively can provide a useful base for more substantial changes in the underlying social, political and economic drivers of risk
and vulnerability. Factors that influence the resilience of cities include their (1) organizational structures, (2) functions, (3) physical entities, and (4) spatial scales. A system with applied resilience can continually survive, adapt, and grow in
the face of resource challenges and disturbances in an integrated and holistic manner for the well-being of the individual
and collective. Those challenges and disturbances may be discrete and temporary, such as floods, or endure over a longer
period, such as droughts, and are therefore discussed individually in the following subsections.
4.4 Resilience to floods
Resilience to climate change is a growing priority among urban decision-makers. Improving resilience will require transformations in social, ecological, and built infrastructure components of urban systems (Tyler and Moench, 2012; Ernstson
et al., 2010). Traditionally, urban planning and urban design have focused on settlement patterns, optimized land use, maximized proximity, community engagement, place-making, quality of life, and urban vitality. However, their focus is
increasingly expanding to include principles regarding the application of resilience. Conventional wisdom assumes that
flood resistance is necessary for cities; however, resilience theory suggests that it erodes urban resilience to floods
(Holling and Meffe, 1996). In effect, flood-control infrastructure puts the city in one or the other contrasting conditions:
dry and stable, or inundated and disastrous. With flood-control infrastructure in place, flooding results exclusively from the
infrastructure’s failure and is more hazardous than if there were no flood-control infrastructure (Tobin, 1995; Verchick,
2010), such that the natural process of flooding becomes synonymous with disaster.
In urban environments that are dependent on flood-control infrastructure, the river’s high flows are mostly confined
between levees or held behind the upstream dam. The flood frequency is dramatically reduced and river dynamics are
largely unnoticed. Each flood that is prevented is a loss of opportunity for learning (Klein et al., 1998; Colten and
Sumpter, 2009). Little flood experience leads to low awareness of flood risk among citizens (Nunes Correia et al.,
1998), who are too accustomed to operating under the dry-and-stable conditions, and know little about how to cope with
inundation once the flood-control infrastructure fails. Furthermore, flood-control infrastructure’s structural rigidity and
large scope leave little flexibility for making timely adjustments to constantly changing boundary conditions
(Pahl-Wostl, 2006). The existence of flood-control infrastructure also prevents the development of a diversity of flood-coping measures because the development of such measures is too expensive (Castonguay, 2007). Cities that are solely dependent on flood-control infrastructures tend to address only the source of the hazard and not the surrounding built environment.
Flood-control infrastructure, as a centralized measure, creates a false sense of security that precludes the need for localized
flood-response capacity. In a resilient flood defense system, redundancy entails diversity and functional replication across
scales (Peterson et al., 1998). A resilient flood management system with redundancy would comprise a diversity of
measures for mitigation, preparedness, response, and reorganization. The flood-response capacity would be distributed
across the levels, i.e., individuals, communities, and the municipality, such that when the capacity of one level is overwhelmed, the city can still count on the others. This high tolerance of socioeconomic state changes is resilience’s major
advantage over the traditional flood defense strategies which revolve around resistance.
To clarify the conceptual transformation from resistance to resilience, it could be simply stated that although resistance
and resilience have similar qualities, they hold different intentions. A resilient system is able to adjust flexibly in the event of a hazard. A resilient system's functions and core aims are maintained with only slight adjustment, although these
adjustments may be significant for subsystems or over time. In contrast to resistant systems, resilient systems can anticipate,
absorb, accommodate, or recover from the effects of a hazardous event in a timely and efficient manner through preservation, restoration, or improvement of the system’s essential basic structures and functions. Essentially, the system
responds by accepting loss and returning to its preshock/stress state, which in turn may be perceived by dominant actors
as the preferred state (Gencer et al., 2018).
Conventional planning approaches toward floods are heavily based on environmental stability, neglecting inherent
uncertainties and dynamism that are naturally coupled with the complexity of interactions in an urban system. Resilience
is inherently dynamic and, therefore, a suitable approach for future urban designs (Liao, 2012). Floods are an inevitable part
of urban dynamics, and therefore, the development of resilience to floods is significant to enlarge the existing body of
resilience in a new dimension that can account for the natural consequences of increasing storm events. However, research
on resilience to flood is still in its early stages with very few practical methods for real-world applications (Folke, 2006). In
what follows in this section, a number of approaches and recommendations are listed from the existing body of literature
that can be applied to urban flood hazard management from a resilience perspective:
– Applying a systems-oriented approach, such as a local bottom-up design approach that simultaneously addresses physical, cultural, societal, and economic issues. Urban areas are often not understood as part of their surrounding context, or in terms of the flows of resources, people, water, and energy (McPhearson et al., 2018). Ignoring resource
flows and the interdependence of urban, periurban, and rural areas, as well as the relation between a city and its natural
environment, can lead to policies which reinforce and enforce unsustainable resource use. Often, a lack of planning
tools and current data makes integration of the design approach into planning and policies challenging (Raven
et al., 2018).
– Establishing a map of risk sectors, with hotspots defined. Geotechnical studies map and classify risk areas, from which one can
estimate potential damage to dwellings and residents, considering their positions and distances to critical slopes, rivers,
and coast in coastal areas plus the degree of building vulnerability (construction pattern and level of urban
consolidation).
– Urban resilience strategies go hand in hand with the configuration of urban morphology, influenced by developed sustainable solutions like photovoltaic technologies, enhanced vegetation, and improved urban ventilation (Raven et al., 2018).
– Best practices of adaptation-driven urban policies worldwide provide significant examples of how the paradigm shift
toward water-sensitive and water-resilient cities allows for the implementation of an integrated approach that combines
risk prevention with a regeneration of urban fabric driven by adaptive design solutions often including natural elements
(Kazmierczak and Carter, 2010; Karamouz et al., 2018b). Urban planning and urban design strategies focusing on green
infrastructure and sustainable water management help restore interactions between built and ecological environments. It
is necessary to improve the resilience of urban systems by applying the following (yet not limited to) nature-based
solutions to urban environments: (1) Utilizing an integrated “gray-green” approach to flood management strategies
in coastal areas that mitigates coastal floods rather than completely stopping them from entering urban environments (Karamouz and Heydari, 2020); (2) Revaluing and restoring degraded ecosystems and remediating contaminated environmental elements, e.g., soil, air, and water. This will include monitoring air, water, and soil quality and
adopting measures to reduce pollutants and particulate matter; (3) Targeting water quality in coastal and riparian areas;
(4) Providing diverse open and safe public green space which enables cultural, community and recreation activities, and
contributes to food and water security (Kremer et al., 2016).
– Revamping the drainage management approach to strengthen flood resilience. This strategy aims at optimizing the management of stormwater using a comprehensive source-pathway-receptor approach that looks at catchment-wide
ecosystem-based solutions for achieving higher drainage and flood protection standards. It covers the entire drainage
system and not just the pathway over which the rainwater travels (Narayan et al., 2012). New provisions must be added
to surface-water drainage regulations, which should require a minimum land size dedicated to implementing measures
to slow surface runoff and reduce peak flows of stormwater into public drainage systems by implementing on-site
detention measures such as green roofs, rain gardens, and detention tanks.
– Specified resilience, as described earlier, although important, is not adequate on its own. Optimizing specified resilience
may undermine the general resilience of a social-ecological system. This is mainly due to the possibility that too much
focus on specified resilience will tend to make the whole system less diverse, less flexible, and less responsive in terms
of cross-sector actions (Walker and Salt, 2006).
Overall, resilience to floods fosters the principle of working with nature rather than against it. It does not mean accepting
system failure during flood events; rather, it embraces the society’s potential toward flexibility and adaptability.
4.5 Resilience to drought
One of the complex natural hazards that have extreme effects on society, the environment, and the economy is drought.
Generally, projected longer-term droughts and intense floods underscore the need to store more water to manage climate
extremes, but there are some fundamental differences between flood and drought contexts which make applying the resilience concept different for each event. Unlike floods, droughts do not necessarily have a short duration; there is no clear indication of when a drought will start or end (Karamouz et al., 2015). A drought can last for months or even years, which results in more complexity in
applying the resilience concept. Drought can affect economic aspects of a region by causing failure in water supply or
agricultural goals. There are four main stressors in drought that can cause failure in a region: (1) lack of precipitation,
(2) lack of water in reservoirs, (3) high temperature, and (4) lack of soil moisture.
One of the studies on drought resilience was done by Karamouz et al. (2016) which applied the 4R concept of resilience
in the drought context. They quantified resilience to drought in the Aharchay watershed, located in East Azerbaijan, Iran. This watershed, which includes the Sattarkhan Dam Reservoir, is one of the most important watersheds in the region. To quantify
resilience, they categorized the characteristics related to four resilience components namely rapidity, robustness, resourcefulness, and redundancy. They determined the relative importance of these components using multicriteria decision making
(MCDM). The criteria and subcriteria used in their research are presented in Table 1.
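As an illustration of this kind of weighted aggregation, the sketch below combines normalized sub-criterion scores (one per entry of Table 1 below) into component scores and a single drought-resilience value; the weights and scores are placeholders, not the values derived by Karamouz et al. (2016).

```python
# Hypothetical, illustrative aggregation of the 4R components into a single
# drought-resilience score; weights and sub-criterion scores are placeholders.
component_weights = {"RO": 0.35, "RD": 0.25, "RS": 0.25, "RA": 0.15}

# Normalized sub-criterion scores in [0, 1] for one watershed.
scores = {
    "RO": [0.6, 0.4, 0.7, 0.5, 0.6, 0.5],   # RO1..RO6
    "RD": [0.3, 0.5, 0.4, 0.6, 0.7],        # RD1..RD5
    "RS": [0.5, 0.6, 0.3, 0.4, 0.5, 0.6],   # RS1..RS6
    "RA": [0.7, 0.2, 0.4, 0.5, 0.6, 0.8],   # RA1..RA6
}

component_scores = {k: sum(v) / len(v) for k, v in scores.items()}
resilience = sum(component_weights[k] * component_scores[k] for k in scores)
print({k: round(v, 2) for k, v in component_scores.items()}, round(resilience, 2))
```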
Another example is managing urban hydrological systems through improved greening to decrease the vulnerability of
urban ecosystems. For example, during drought periods, a small share of water resources may be reserved as an environmental flow for use by plants and animals, thus allowing ecological systems such as forests, wetlands, and streams to
TABLE 1 Criteria and subcriteria for assessing drought resilience (Karamouz et al., 2016).
Robustness (RO): RO1: Water resources available in the region such as streams and springs; RO2: Economic vulnerability of the region; RO3: Geographic proximity (mountain/forest/desert); RO4: Average annual rainfall in the region and its variability; RO5: Historical drought experience and level of region adaptability to drought; RO6: Average water consumption in the region.
Redundancy (RD): RD1: Evaluation of water resources transfer in the surrounding area (water transfer); RD2: Groundwater resources availability; RD3: Agricultural water use method; RD4: Prioritization of water allocation facing drought; RD5: Reservoir operation policies facing drought.
Resourcefulness (RS): RS1: Availability of regional data; RS2: Risk and disaster management plan; RS3: Additional budget for water disaster; RS4: Drought forecasting and warning systems availability; RS5: Drought vulnerability maps; RS6: Coordination between organizations facing drought.
Rapidity (RA): RA1: Population of the region; RA2: Implementation of virtual drought exercises by authorities; RA3: Level of public awareness and understanding of the concept of drought (the culture of consumption and conservation); RA4: Intensity of the disaster; RA5: Infrastructure preparedness; RA6: Significance of the region (strategic values).
survive and maintain adaptive capacity. While drought may affect an entire region, urban ecosystems where water
resources are well managed can reduce the impact of such climate-driven water stress, but only provided that urban ecosystem management activities are part of a larger system-level urban resilience plan.
While greening solutions are presented as nature-based strategies, Van Loon et al. (2020) have explained the need for a
new approach to drought resilience: The “Creative Drought” project, led by the University of Birmingham, is a leading
example of how to increase drought resilience by utilizing local indigenous knowledge in addition to scientific methods.
The project brings together researchers from a number of different disciplines and develops a distinctive interdisciplinary approach built around a framework for understanding and helping manage responses to drought. Their
research has shown that a drought event (or other natural hazards) is not always a phenomenon to avoid. They believe
experiencing any kind of natural catastrophe leads to adaptation, better management, and better preparedness for next
events. But preparedness, by its nature, diminishes over time as events fade from memory and people find it hard
to imagine what future droughts could have in store. Van Loon et al. (2020) searched for an approach that gave people
the ability to imagine an event without experiencing it; with hopes that creative experiments based on past drought stories
and future drought model scenarios might overcome the issue and help increase drought resilience by engaging local communities and authorities. They then built a model that was used to extrapolate and calculate several scenarios that were
mentioned by community members and government representatives. Instead of predicting the future, they explored plausible futures. Droughts were calculated and compared between the scenario and the baseline. These were transformed into
storylines including information on the duration and severity of future droughts compared to previous experience (e.g.,
more severe than has been experienced in the past 40 years or twice as long as the drought in the early 1980s). Matlou
et al. (2021) determined the impact of agricultural drought resilience on smallholder livestock farming households’ welfare
in the Northern Cape Province of South Africa. They quantified smallholder farmers' welfare based on four capitals, namely human, social, natural, and economic, to enhance their resilience to agricultural drought. The results indicated that smallholder
farmers who received drought relief support saw an improvement in their welfare. However, the welfare improvements
varied across respondents and different gender categories, with males having higher welfare improvements relative to
females.
Pourmoghim et al. (2022) introduced a framework (based on Karamouz et al., 2016) to evaluate the resilience of lakes
under climatic and anthropogenic droughts. They proposed a hierarchical structure of criteria with four levels. The first
level included several indices such as long-term resilience, reliability, and implementation cost. In the second to fourth
levels, four main resilience-based criteria (i.e., robustness, resourcefulness, redundancy, and rapidity) with relative subcriteria were defined. They aggregated the values of criteria and subcriteria using the Evidential Reasoning (ER) approach.
In the end, they calculated the annual resilience time series and three resilience indices, namely the recovery time, loss of
resilience, and final resilience. The proposed methodology was applied in the Zarrinehrud river basin and Lake Urmia.
The results showed that more than 80% of the scenarios with the implementation costs of more than 50 million US dollars
have an overall resilience of more than 70%.
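The three summary indices can be illustrated on a synthetic annual resilience series. The reading of each index below is one plausible interpretation, not the exact definitions of Pourmoghim et al. (2022), and the numbers are hypothetical.

```python
import numpy as np

def lake_resilience_indices(annual_resilience, target=0.7):
    """One plausible reading of the three indices discussed in the text.

    - recovery time: number of years in which resilience falls below the target,
    - loss of resilience: cumulative shortfall below the target,
    - final resilience: value in the last year of the horizon.
    """
    r = np.asarray(annual_resilience, dtype=float)
    recovery_time = int(np.count_nonzero(r < target))
    loss_of_resilience = float(np.maximum(target - r, 0.0).sum())
    final_resilience = float(r[-1])
    return recovery_time, loss_of_resilience, final_resilience

# Hypothetical annual resilience of a lake under a restoration scenario.
series = [0.8, 0.65, 0.5, 0.55, 0.68, 0.72, 0.78]
print(lake_resilience_indices(series, target=0.7))
```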
5. Conclusions
More frequent climate and weather extreme events are being experienced in urban ecosystems. The frequency and severity
of weather and climate-related disasters in urban areas are projected to increase in the coming decades. With climate change
impacts taking hold, the environmental baselines of urban environments have started to shift. Given that more than half of
the world’s population resides in urban areas and that this trend is expected to significantly increase in the coming decades
(Rosenzweig et al., 2018), more attention needs to be directed to disaster risk reduction and a paradigm shift from fail-safe to safe-to-fail. This chapter has endorsed a comprehensive theory of entropy and resilience in order to build resilient
cities. These concepts embrace inherent dynamism of urban ecosystems and uncertainties of extreme conditions to provide
unconventional perspectives for mitigating the impact of natural hazards on ecology and urban water systems performance.
Resilience theory suggests that what underlies a truly resilient urban design is not how stable it appears or how many disturbances it has absorbed, but whether it can withstand an unpredictable shock that would fundamentally alter or erase an urban
system’s identity.
How urban design affects urban resilience, however, essentially depends on design principles that are increasingly influenced by ecological resilience rather than engineering resilience as past experiences are taken as lessons for the future. A
resilient urban water system can only be designed by viewing urban regions as complex socio-ecological systems with
cross-level interactions and innate uncertainties. Green infrastructures and their integration into traditional resistance-oriented designs have been proven to provide cost-effective, nature-based solutions for a resilient adaptation strategy
toward climate change and extreme events while also creating opportunities to increase socioeconomic equity, public green
spaces, and sustainable urban development. Comprehensive and ecosystem-friendly adaptation scenarios used to enhance
urban resilience are not limited to the strategies stated in this chapter but are still of the same nature. It is argued that in applying
the key ideas and principles of resilience, it is important to think of the seemingly opposing processes, such as rigidity vs
flexibility, general vs specified, and creativity vs conservation not as paradoxes but dialectical duals that must coexist to
achieve a synthesis of urban resilience.
References
Ahern, J., 2011. From fail-safe to safe-to-fail: sustainability and resilience in the new urban world. Landsc. Urban Plan. 100 (4), 341–343.
Alberti, M., 2015. Eco-evolutionary dynamics in an urbanizing planet. Trends Ecol. Evol. 30 (2), 114–126. https://doi.org/10.1016/j.tree.2014.11.007.
Ansari, A.H., Olyaei, M.A., Heydari, Z., 2021. Ensemble generation for hurricane hazard assessment along the United States’ Atlantic coast. Coast. Eng.
169, 103956.
Asefa, T., Clayton, J., Adams, A., Anderson, D., 2014. Performance evaluation of a water resources system under varying climatic conditions: reliability,
resilience, vulnerability and beyond. J. Hydrol. 508, 53–65.
Atkinson, S., Farmani, R., Memon, F.A., Butler, D., 2014. Reliability indicators for water distribution system design: comparison. J. Water Resour. Plan.
Manag. 140 (2), 160–168.
Banik, B.K., Alfonso, L., Torres, A.S., Mynett, A., Di Cristo, C., Leopardi, A., 2015. Optimal placement of water quality monitoring stations in sewer
systems: an information theory approach. Procedia Eng. 119, 1308–1317.
Barbe, D.E., Cruise, J.F., Singh, V.P., 1994. Derivation of a distribution for the piezometric head in groundwater flow using entropy. In: Stochastic and
statistical methods in hydrology and environmental engineering. Springer, Dordrecht, Netherlands, pp. 151–161.
Bardsley, D.K., Hugo, G.J., 2010. Migration and climate change: examining thresholds of change to guide effective adaptation decision-making. Popul.
Environ. 32, 238–262. https://doi.org/10.1007/s11111-010-0126-9.
Batty, M., 2008. The size, scale, and shape of cities. Science 319, 769–771. https://doi.org/10.1126/science.1151419.
Berkes, F., 2007. Understanding uncertainty and reducing vulnerability: lessons from resilience thinking. Nat. Hazards 41 (2), 283–295.
Bettencourt, L., West, G., 2010. A unified theory of urban living. Nature 467, 912–913. https://doi.org/10.1038/467912a.
Boroumand, A., Rajaee, T., Masoumi, F., 2018. Semivariance analysis and transinformation entropy for optimal redesigning of nutrients monitoring
network in San Francisco bay. Mar. Pollut. Bull. 129 (2), 689–694.
Bruneau, M., Chang, S.E., Eguchi, R.T., Lee, G.C., O’Rourke, T.D., Reinhorn, A.M., Shinozuka, M., Tierney, K., Wallace, W.A., Von Winterfeldt, D.,
2003. A framework to quantitatively assess and enhance the seismic resilience of communities. Earthq. Spectra 19 (4), 733–752.
Butler, D., Farmani, R., Fu, G., Ward, S., Diao, K., Astaraie-Imani, M., 2014. A new approach to urban water management: safe and sure. In: 16th Water
Distribution System Analysis Conference, WDSA. Procedia Engineering, pp. 347–354.
Butler, D., Ward, S., Sweetapple, C., Astaraie-Imani, M., Diao, K., Farmani, R., Fu, G., 2017. Reliable, resilient and sustainable water management: the
Safe & SuRe approach. Global Chall. 1 (1), 63–77. https://doi.org/10.1002/gch2.1010.
Castonguay, S., 2007. The production of flood as natural catastrophe: extreme events and the construction of vulnerability in the drainage basin of the St.
Francis River (Quebec), mid-nineteenth to mid-twentieth century. Environ. Hist. 12 (4), 820–844.
Chen, L., Singh, V.P., 2018. Entropy-based derivation of generalized distributions for hydrometeorological frequency analysis. J. Hydrol. 557, 699–712.
Colten, C.E., Sumpter, A.R., 2009. Social memory and resilience in New Orleans. Nat. Hazards 48 (3), 355–364.
Cui, H., Singh, V.P., 2015. Configurational entropy theory for streamflow forecasting. J. Hydrol. 521, 1–17.
Daly, H.E., 1992. Is the entropy law relevant to the economics of natural resource scarcity?—yes, of course it is! J. Environ. Econ. Manag. 23 (1), 91–95.
Darbandsari, P., Coulibaly, P., 2020. Introducing entropy-based Bayesian model averaging for streamflow forecast. J. Hydrol. 591, 125577.
Diao, K., 2021. Towards resilient water supply in centralized control and decentralized execution mode. J. Water Supply Res. Technol. AQUA 70 (4),
449–466.
Diao, K., Sweetapple, C., Farmani, R., Fu, G., Ward, S., Butler, D., 2016. Global resilience analysis of water distribution systems. Water Res. 106,
383–393.
Diao, K., Jung, D., Farmani, R., Fu, G., Butler, D., Lansey, K., 2021. Modular interdependency analysis for water distribution systems. Water Res.
201, 117320.
Dong, S., Wang, N., Liu, W., Soares, C.G., 2013. Bivariate maximum entropy distribution of significant wave height and peak period. Ocean Eng. 59,
86–99.
Dong, X., Guo, H., Zeng, S., 2017. Enhancing future resilience in urban drainage system: green versus grey infrastructure. Water Res. 124, 280–289.
Ernstson, H., Barthel, S., Andersson, E., Borgström, S.T., 2010. Scale-crossing brokers and network governance of urban ecosystem services: the case of
Stockholm. Ecol. Soc. 15 (4), 28.
Eslamian, S., Eslamian, F., 2021. Disaster Risk Reduction for Resilience: Disaster Risk Management Strategies. Springer Nature, Switzerland.
Fistola, R., 2011. The unsustainable city. Urban entropy and social capital: the needing of a new urban planning. Procedia Eng. 21, 976–984.
Folke, C., 2006. Resilience: the emergence of a perspective for social–ecological systems analyses. Global Environ. Change 16 (3), 253–267.
Folke, C., Carpenter, S.R., Walker, B.H., Scheffer, M., Chapin, T., Rockstrom, J., 2010. Resilience thinking: integrating resilience, adaptability and transformability. Ecol. Soc. 15 (4), 20.
Fowler, H.J., Kilsby, C.G., O’Connell, P.E., 2003. Modeling the impacts of climatic change and variability on the reliability, resilience, and vulnerability of
a water resource system. Water Resour. Res. 39 (8).
Gencer, E.A., 2008. Natural Disasters, Vulnerability, and Sustainable Development. VDM Verlag, Germany.
Gencer, E.A., 2013. The Interplay Between Urban Development, Vulnerability, and Risk Management: A Case Study of the Istanbul Metropolitan Area.
Springer Briefs in Environment, Security, Development and Peace, vol. 7 Springer Science & Business Media, Heidelberg, New York, Dordrecht,
London, UK.
Gencer, E., Folorunsho, R., Linkin, M., Wang, X., Natenzon, C.E., Wajih, S., Mani, N., Esquivel, M., Solecki, W., 2018. Disasters and risk in cities. In:
Rosenzweig, C., Solecki, W., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ali Ibrahim, S. (Eds.), Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, New York, USA, pp. 61–98.
Georgescu-Roegen, N., 1993. The entropy law and the economic problem. In: Valuing the Earth: Economics, Ecology, Ethics, pp. 75–88.
Greco, M., Mirauda, D., Plantamura, A.V., 2014. Manning’s roughness through the entropy parameter for steady open channel flows in low submergence.
Procedia Eng. 70, 773–780.
Grimm, N.B., Grove, J.M., Pickett, S.T.A., Redman, C.A., 2000. Integrated approaches to long-term studies of urban ecological systems. Bioscience 50 (7),
571. https://doi.org/10.1641/0006-3568(2000)050.
Hashimoto, T., Stedinger, J.R., Loucks, D.P., 1982. Reliability, resiliency, and vulnerability criteria for water resource system performance evaluation.
Water Resour. Res. 18 (1), 14–20.
Holling, C.S., 1973. Resilience and stability of ecological systems. Annu. Rev. Ecol. Syst. 4 (1), 1–23.
Holling, C.S., 1996. Engineering resilience versus ecological resilience. In: Engineering Within Ecological Constraints, pp. 31–43.
Holling, C.S., Meffe, G.K., 1996. Command and control and the pathology of natural resource management. Conserv. Biol. 10 (2), 328–337.
Hosseini, S., Barker, K., Ramirez-Marquez, J.E., 2016. A review of definitions and measures of system resilience. Reliab. Eng. Syst. Saf. 145, 47–61.
Juan-Garcı́a, P., Butler, D., Comas, J., Darch, G., Sweetapple, C., Thornton, A., Corominas, L., 2017. Resilience theory incorporated into urban wastewater
systems management. State of the art. Water Res. https://doi.org/10.1016/j.watres.2017.02.047.
Jung, D., Kang, D., Kim, J.H., Lansey, K., 2014. Robustness-based design of water distribution systems. J. Water Resour. Plan. Manag. 140 (11),
04014033.
Karamouz, M., Heydari, Z., 2020. Conceptual design framework for coastal flood best management practices. J. Water Resour. Plan. Manag. 146 (6),
04020041.
Karamouz, M., Hojjat-Ansari, A., 2020. Uncertainty based budget allocation of wastewater infrastructures’ flood resiliency considering interdependencies.
J. Hydroinf. 22 (4), 768–792.
Karamouz, M., Mohammadi, K., 2020. Nonstationary based framework for performance enhancement of coastal flood mitigation strategies. J. Hydrol.
Eng. 25 (6), 04020020.
Karamouz, M., Zeynolabedin, A., Olyaei, M.A., 2015. Mapping regional drought vulnerability: a case study. Int. Arch. Photogramm. Remote. Sens. Spat.
Inf. Sci. 40.
Karamouz, M., Zeynolabedin, A., Olyaei, M.A., 2016. Regional drought resiliency and vulnerability. J. Hydrol. Eng. 21 (11), 05016028.
Karamouz, M., Rasoulnia, E., Olyaei, M.A., Zahmatkesh, Z., 2018a. Prioritizing investments in improving flood resilience and reliability of wastewater
treatment infrastructure. J. Infrastruct. Syst. 24 (4), 04018021.
Karamouz, M., Taheri, M., Mohammadi, K., Heydari, Z., Farzaneh, H., 2018b. A new perspective on BMPs’ application for coastal flood preparedness. In:
World Environmental and Water Resources Congress 2018: Water, Wastewater, and Stormwater; Urban Watershed Management; Municipal Water
Infrastructure; and Desalination and Water Reuse. American Society of Civil Engineers, Reston, VA, USA, pp. 171–180.
Kazmierczak, A., Carter, J., 2010. Adaptation to climate change using green and blue infrastructure: a database of case studies. University of Manchester,
Interreg IVC Green and blue space adaptation for Urban areas and eco-towns (GRaBS). Accessed 19 March 2020.
Kjeldsen, T.R., Rosbjerg, D., 2004. Choice of reliability, resilience and vulnerability estimators for risk assessments of water resources systems (Choix
d’estimateurs de fiabilite, de resilience et de vulnerabilite pour les analyses de risque de systèmes de ressources en eau). Hydrol. Sci. J. 49 (5). https://
doi.org/10.1623/hysj.49.5.755.55136.
Kleidon, A., Schymanski, S., 2008. Thermodynamics and optimality of the water budget on land: a review. Geophys. Res. Lett. 35 (20). https://doi.org/
10.1029/2008GL035393.
Klein, R.J., Smit, M.J., Goosen, H., Hulsbergen, C.H., 1998. Resilience and vulnerability: coastal dynamics or Dutch dikes? Geogr. J., 259–268.
Kremer, P., Hamstead, Z.A., McPhearson, T., 2016. The value of urban ecosystem services: a spatially explicit multicriteria analysis of landscape scale
valuation scenarios in NYC. Environ. Sci. Pol. https://doi.org/10.1016/J.ENVSCI.2016.04.012.
Lewis, G.N., Randall, M., 1961. Thermodynamics, second ed. McGraw-Hill, New York, USA. Revised by Pitzer, K. S. and Brewer, L.
Liao, K.H., 2012. A theory on urban resilience to floods—a basis for alternative planning practices. Ecol. Soc. 17 (4), 48.
Liu, J., Dietz, T., Carpenter, S.R., Alberti, M., Folke, C., Moran, E., Pell, A.N., Deadman, P., Kratz, T., Lubchenco, J., Ostrom, E., Ouyang, Z., Provencher,
W., Redman, C.L., Schneider, S.H., Taylor, W.W., 2007. Complexity of coupled human and natural systems. Science 317, 1513–1516. https://doi.org/
10.1126/science.1144004.
Liu, Y., You, M., Zhu, J., Wang, F., Ran, R., 2019. Integrated risk assessment for agricultural drought and flood disasters based on entropy information
diffusion theory in the middle and lower reaches of the Yangtze River, China. Int. J. Disaster Risk Reduct. 38, 101194.
Magrin, G.O., Marengo, J.A., Boulanger, J.P., Buckeridge, M.S., Castellanos, E., Poveda, G., Vicuña, S., 2014. Central and South America, Climate
Change 2014: impacts, adaptation, and vulnerability. In: Part B: Regional Aspects. Contribution of Working Group II to the Fifth Assessment Report
of the Intergovernmental Panel on Climate Change, Cambridge, United Kingdom and New York (Chapter 27).
Malekinezhad, H., Sepehri, M., Pham, Q.B., Hosseini, S.Z., Meshram, S.G., Vojtek, M., Vojteková, J., 2021. Application of entropy weighting method for
urban flood hazard mapping. Acta Geophys., 1–14.
Matlou, R., Bahta, Y.T., Owusu-Sekyere, E., Jordaan, H., 2021. Impact of agricultural drought resilience on the welfare of smallholder livestock farming
households in the Northern Cape Province of South Africa. Land 10 (6), 562.
McDonnell, M.J., Hahs, A., 2013. The future of urban biodiversity research: moving beyond the “low-hanging fruit”. Urban Ecosyst. 16, 397–409. https://
doi.org/10.1007/s11252-013-0315-2.
McPhearson, T., Andersson, E., Elmqvist, T., Frantzeskaki, N., 2015. Resilience of and through urban ecosystem services. Ecosyst. Serv. 12, 152–156.
McPhearson, T., Haase, D., Kabisch, N., Gren, Å., 2016a. Advancing understanding of the complex nature of urban systems. Ecol. Indic. 70, 566–573.
McPhearson, T., Pickett, S.T.A., Grimm, N., Alberti, M., Elmqvist, T., Niemelä, J., Weber, C., Haase, D., Breuste, J., Qureshi, S., 2016b. Advancing urban
ecology toward a science of cities. Bioscience 66 (3), 198–212. https://doi.org/10.1093/biosci/biw002.
McPhearson, T., Karki, M., Herzog, C., Santiago Fink, H., Abbadie, L., Kremer, P., Clark, C.M., Perini, K., 2018. Urban ecosystems and biodiversity. In:
Rosenzweig, C., Solecki, W., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ali Ibrahim, S. (Eds.), Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, New York, USA, pp. 257–318.
Melo, O., Vargas, X., Vicuna, S., Meza, F., McPhee, J., 2010. Climate change economic impacts on supply of water for the M & I sector in the metropolitan
region of Chile. In: 2010 Watershed Management Conference: Innovations in Watershed Management Under Land Use and Climate Change, August
(23–27), Madison, Wisconsin, USA.
Meng, F., Fu, G., Butler, D., 2016. Water quality permitting: from end-of-pipe to operational strategies. Water Res. 101, 114–126.
Meng, F., Fu, G., Farmani, R., Sweetapple, C., Butler, D., 2018. Topological attributes of network resilience: a study in water distribution systems. Water
Res. 143, 376–386.
Mobley, W., Sebastian, A., Highfield, W., Brody, S.D., 2019. Estimating flood extent during Hurricane Harvey using maximum entropy to build a hazard
distribution model. J. Flood Risk Manage. 12, e12549.
Mohammadiun, S., Yazdi, J., Salehi Neyshabouri, S.A.A., Sadiq, R., 2018. Development of a stochastic framework to design/rehabilitate urban stormwater
drainage systems based on a resilient approach. Urban Water J. 15 (2), 167–176.
Mugume, S.N., Gomez, D.E., Fu, G., Farmani, R., Butler, D., 2015. A global analysis approach for investigating structural resilience in urban drainage
systems. Water Res. 81, 15–26.
Mukherjee, D., Ratnaparkhi, M.V., 1986. On the functional relationship between entropy and variance with related applications. Commun. Stat. Theory
Methods 15 (1), 291–311.
Narayan, S., Hanson, S., Nicholls, R.J., Clarke, D., Willems, P., Ntegeka, V., Monbaliu, J., 2012. A holistic model for coastal flooding using system diagrams and the source–pathway–receptor (SPR) concept. Nat. Hazards Earth Syst. Sci. 12 (5), 1431–1439.
Niemelä, J., Breuste, J.H., Guntenspergen, G., McIntyre, N.E., Elmqvist, T., James, P., 2011. Urban Ecology: Patterns, Processes, and Applications. Oxford
University Press, UK.
Nunes Correia, F., Castro Rego, F., Da Graca Saraiva, M., Ramos, I., 1998. Coupling GIS with hydrologic and hydraulic flood modelling. Water Resour.
Manag. 12 (3), 229–249.
NYCDEP, 2013. NYC Wastewater resiliency plan, climate risk assessment and adaptation study. In: Wastewater Treatment Plants. Department of Environmental Protection, New York City, USA.
Olyaei, M., Karamouz, M., 2020. A Bayesian approach for estimating biological treatment parameters under flood condition. J. Environ. Eng.
ASCE. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001756.
Olyaei, M.A., Karamouz, M., Farmani, R., 2018. Framework for assessing flood reliability and resilience of wastewater treatment plants. J. Environ. Eng.
144 (9), 04018081.
Pahl-Wostl, C., 2006. The importance of social learning in restoring the multifunctionality of rivers and floodplains. Ecol. Soc. 11 (1), 10.
Pandey, R., Bardsley, D.K., 2013. Human ecological implications of climate change in the Himalaya: pilot studies of adaptation in agro-ecosystems within
two villages from Middle Hills and Tarai, Nepal. In: Impacts World 2013, International Conference on Climate Change Effects, Potsdam, Germany,
May 27–30.
Panos, C.L., Wolfand, J.M., Hogue, T.S., 2021. Assessing resilience of a dual drainage urban system to redevelopment and climate change. J. Hydrol. 596,
126101.
Papalexiou, S.M., Koutsoyiannis, D., 2012. Entropy based derivation of probability distributions: a case study to daily rainfall. Adv. Water Resour. 45, 51–
57.
Park, J., Seager, T.P., Rao, P.S.C., Convertino, M., Linkov, I., 2013. Integrating risk and resilience approaches to catastrophe management in engineering
systems. Risk Anal. 33 (3), 356–367.
Pelorosso, R., Gobattoni, F., Leone, A., 2017. The low-entropy city: a thermodynamic approach to reconnect urban systems with nature. Landsc. Urban
Plan. 168, 22–30.
Peterson, G., 2000. Political ecology and ecological resilience: an integration of human and ecological dynamics. Ecol. Econ. 35 (3), 323–336.
Peterson, G., Allen, C.R., Holling, C.S., 1998. Ecological resilience, biodiversity, and scale. Ecosystems 1, 6–18.
Pickett, S.T., Burch, W.R., Dalton, S.E., Foresman, T.W., Grove, J.M., Rowntree, R., 1997. A conceptual framework for the study of human ecosystems in
urban areas. Urban Ecosyst. 1 (4), 185–199.
Pickett, S.T.A., Cadenasso, M.L., Grove, J.M., Nilon, C.H., Pouyat, R.V., Zipperer, W.C., Costanza, R., 2001. Urban ecological systems: linking terrestrial
ecological, physical, and socioeconomic components of metropolitan areas. Annu. Rev. Ecol. Syst. 32, 127–157.
Pourmoghim, P., Behboudian, M., Kerachian, R., 2022. An uncertainty-based framework for evaluating and improving the long-term resilience of lakes
under anthropogenic droughts. J. Environ. Manag. 301, 113900.
Purvis, B., Mao, Y., Robinson, D., 2017. Thermodynamic entropy as an indicator for urban sustainability? Procedia Eng. 198, 802–812.
Qiu, H., Chen, L., Zhou, J., He, Z., Zhang, H., 2021. Risk analysis of water supply-hydropower generation-environment nexus in the cascade reservoir
operation. J. Clean. Prod. 283, 124239.
Ranjbar, S., Singh, A., 2020. Entropy and intermittency of river bed elevation fluctuations. J. Geophys. Res. Earth Surf. 125 (8), e2019JF005499.
Raven, J., Stone, B., Mills, G., Towers, J., Katzschner, L., Leone, M., Gaborit, P., Georgescu, M., Hariri, M., 2018. Urban planning and design. In:
Rosenzweig, C., Solecki, W., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ali Ibrahim, S. (Eds.), Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, New York, USA, pp. 139–172.
Reggiani, P., Sivapalan, M., Hassanizadeh, S.M., 1998. A unifying framework for watershed thermodynamics: balance equations for mass, momentum,
energy and entropy, and the second law of thermodynamics. Adv. Water Resour. 22 (4), 367–398.
Revi, A., Satterthwaite, D.E., Aragón-Durand, F., Corfee-Morlot, J., Kiunsi, R.B.R., Pelling, M., Solecki, W., 2014. In: Field, C.B., Barros, V.R., Dokken,
D.J., et al. (Eds.), Urban areas Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of
Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, pp. 535–612.
Rosenzweig, C., Solecki, W.D., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ibrahim, S.A. (Eds.), 2018. Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, UK.
Scholz, R.W., Blumer, Y.B., Brand, F.S., 2012. Risk, vulnerability, robustness, and resilience from a decision-theoretic perspective. J. Risk Res. 15 (3),
313–330. https://doi.org/10.1080/13669877.2011.634522.
Setiadi, Y., Tanyimboh, T.T., Templeman, A.B., 2005. Modelling errors, entropy and the hydraulic reliability of water distribution systems. Adv. Eng.
Softw. 36 (11–12), 780–788.
Shannon, C.E., 1948. A mathematical theory of communication. Bell Syst. Tech. J. 27 (3), 379–423.
Singh, V.P., 1997. The use of entropy in hydrology and water resources. Hydrol. Process. 11 (6), 587–626.
Singh, K.R., Dutta, R., Kalamdhad, A.S., Kumar, B., 2019. An investigation on water quality variability and identification of ideal monitoring locations by
using entropy based disorder indices. Sci. Total Environ. 647, 1444–1455.
Solecki, W., O’Brien, K., Leichenko, R., 2011. Disaster risk reduction and climate change adaptation strategies: convergence and synergies. Curr. Opin.
Environ. Sustain. 3 (3), 135–141.
Sweetapple, C., Fu, G., Butler, D., 2017. Reliable, robust, and resilient system design framework with application to wastewater-treatment plant control.
J. Environ. Eng. 143 (3), 04016086.
Sweetapple, C., Astaraie-Imani, M., Butler, D., 2018. Design and operation of urban wastewater systems considering reliability, risk and resilience. Water
Res. 147, 1–12.
Sweetapple, C., Fu, G., Farmani, R., Butler, D., 2019. Exploring wastewater system performance under future threats: does enhancing resilience increase
sustainability? Water Res. 149, 448–459.
Swetapadma, S., Ojha, C.S.P., 2021. Flood frequency study using partial duration series coupled with entropy principle. Hydrol. Earth Syst. Sci. Discuss.,
1–23.
Tanyimboh, T.T., 2017. Informational entropy: a failure tolerance and reliability surrogate for water distribution networks. Water Resour. Manag. 31 (10),
3189–3204.
Tanyimboh, T.T., Templeman, A.B., 1993. Optimum design of flexible water distribution networks. Civ. Eng. Syst. 10 (3), 243–258.
Tavakol-Davani, H., Rahimi, R., Burian, S.J., Pomeroy, C.A., McPherson, B.J., Apul, D., 2019. Combining hydrologic analysis and life cycle assessment
approaches to evaluate sustainability of water infrastructure: uncertainty analysis. Water 11 (12), 2592.
Thomalla, F., Downing, T., Spanger-Siegfried, E., Han, G., Rockström, J., 2006. Reducing hazard vulnerability: towards a common approach between
disaster risk reduction and climate adaptation. Disasters 30 (1), 39–48.
Tobin, G.A., 1995. The levee love affair: a stormy relationship? J. Am. Water Resour. Assoc. 31 (3), 359–367.
Tyler, S., Moench, M., 2012. A framework for urban climate resilience. Clim. Dev. 4 (4), 311–326.
United Nations Development Programme (UNDP), 2004. Reducing Disaster Risk: A Challenge for Development. http://www.undp.org/bcpr. (Accessed 3
May 2020).
United Nations, Department of Economic and Social Affairs, 2004. World Urbanization Prospects: The 2003 Revision. vol. 216 United Nations Publications.
Van Loon, A.F., Lester-Moseley, I., Rohse, M., Jones, P., Day, R., 2020. Creative practice as a potential tool to build drought and flood resilience in the
Global South. EGU. https://doi.org/10.5194/gc-2020-11.
Verchick, R.R., 2010. Facing Catastrophe. Harvard University Press, USA, Cambridge, MA.
Walker, B., Salt, D., 2006. Resilience Thinking. Island Press.
Wang, C.H., Blackmore, J.M., 2009. Resilience concepts for water resource systems. J. Water Resour. Plan. Manag. 135 (6), 528–536.
Wang, X., Khoo, Y.B., Wang, C.H., 2014. Risk assessment and decision- making for residential housing adapting to increasing stormtide inundation due to
sea level rise in Australia. Civ. Eng. Environ. Syst. 31 (2), 125–139.
Wang, W., Wang, D., Singh, V.P., Wang, Y., Wu, J., Wang, L., He, R., 2018. Optimization of rainfall networks using information entropy and temporal
variability analysis. J. Hydrol. 559, 136–155.
Wang, M., Fang, Y., Sweetapple, C., 2021. Assessing flood resilience of urban drainage system based on a ‘do-nothing’ benchmark. J. Environ. Manag.
288, 112472.
Xu, P., Wang, D., Singh, V.P., Wang, Y., Wu, J., Wang, L., He, R., 2018. A kriging and entropy-based approach to rain gauge network design. Environ.
Res. 161, 61–75.
Zeynolabedin, A., Ghiassi, R., Norooz, R., Najib, S., Fadili, A., 2021. Evaluation of geoelectrical models efficiency for coastal seawater intrusion by
applying uncertainty analysis. J. Hydrol. 603, 127086.
Zhang, X., Low, Y.M., Koh, C.G., 2020. Maximum entropy distribution with fractional moments for reliability analysis. Struct. Saf. 83, 101904.
Ziarh, G.F., Asaduzzaman, M., Dewan, A., Nashwan, M.S., Shahid, S., 2021. Integration of catastrophe and entropy theories for flood risk mapping in
peninsular Malaysia. J. Flood Risk Manage. 14 (1), e12686.
Chapter 12
Forecasting volatility in the stock market
data using GARCH, EGARCH, and GJR
models
Sarbjit Singh (a,b), Kulwinder Singh Parmar (c), and Jatinder Kaur (c,d)
(a) Guru Nanak Dev University College, Pathankot, Punjab, India; (b) Department of Mathematics, Guru Nanak Dev University, Amritsar, Punjab, India; (c) Department of Mathematics, I.K. Gujral Punjab Technical University, Kapurthala, Punjab, India; (d) Guru Nanak Dev University College, Amritsar, Punjab, India
1. Introduction
Forecasting volatility in financial markets has been getting more and more attention from researchers in diverse fields, stock
market experts, and business analysts since October 19, 1987, when the stock market crashed. The volatility reflects uncertainty in stock market data and is affected by many factors like high corporate profit, sudden events, regulatory bodies,
emotions, and sentiments of investors. Volatility measures the risk of a security and helps in estimating short-period fluctuations. The GARCH model is a conditional variance model that estimates the volatility in stock market returns, bonds, and
other market indices. It helps modelers in assessing risks and optimizing their decisions (Tian and Guo, 2003). The GARCH
model is generally used when the observations tend to cluster and do not form a linear pattern. In the case of time-series
data, the GARCH model is appropriate when the variance of error terms is serially autocorrelated and follows an autoregressive moving average process (Engle, 1982; Bollerslev, 1986; Barunik et al., 2016).
The generalized autoregressive conditional heteroskedasticity (GARCH) model describes an approach for estimating volatility in financial markets; it grew out of the ARCH framework introduced by Robert F. Engle (winner of the 2003 Nobel Prize in Economics) and was generalized by Bollerslev (1986). Financial modeling professionals all over the globe prefer the GARCH model for its capability in modeling
and predicting conditional variances and volatility in financial data (Dellaportas and Pourahmadi, 2012).
The most significant letter in GARCH is H, for heteroskedasticity. Statistically, heteroskedasticity occurs when the
standard errors of a variable observed over a specific period are nonconstant (Hadizadeh and Eslamian, 2017). Depending
on future trends, heteroskedasticity is of two types: unconditional and conditional. Usually, lower
volatilities accompany upward (positive) movements in stock prices, while downward (negative) swings of the same
magnitude point toward much higher volatilities (Jach and Kokoszka, 2010). The presence of heteroskedastic effects
makes the model quite challenging. Engle (1982) developed the time-varying variance model, and Bollerslev (1986)
extended the model to include the structure of an ARMA model. Since then, many studies have adopted the GARCH
framework to explain the volatility of the stock market. Both the up and down trends of the stock market tend to affect
the volatility (Bouoiyour and Selmi, 2015). The stock market is highly influenced by massive changes, while minor changes
tend to have a low impact. There is a negative correlation between the shocks and the returns of the stock market. The
market takes a much longer time to recover from adverse shocks leaving substantial impacts on stock pricing as compared
to positive jolts (Liu and Morley, 2009). Thus, a normal or otherwise symmetric distribution is not always an appropriate
assumption (Nelson, 1991; Chuang et al., 2007). Therefore, researchers have experimented with the GARCH model under other
well-known distributions, which are summarized in Table 1.
The key assumption in the GARCH model is that the variance will revert to the average value in the future. In financial
econometrics, GARCH effects are very predominant, because they capture the stylized facts of such data that show, for
example, volatility clustering, dependence without correlation, and tail heaviness (Paolella, 2018).
TABLE 1 Literature review on the GARCH model with some well-known distributions other than normal distribution for financial studies.

Related literature | Distribution used | Purpose of study
Bollerslev (1988), Baillie and Bollerslev (1989) | Student's t-distribution | To model the foreign exchange rate
Hsieh (1989) | Exponential distribution | Used for foreign exchange rates
Akgirary et al. (1991) | Exponential distribution | Applied to the distribution of prices of precious metals
Nelson (1991) | Exponential distribution | To study the U.S. stock market
Ding et al. (1993) | Asymmetric power autoregressive conditional heteroskedastic (APARCH) model using Standard and Poor's data | To investigate the long-memory property of stock market returns
Theodossiou (1994) | Exponential distribution | Used for foreign exchange rates
Koutmos and Theodossiou (1994) | Exponential distribution | Used for foreign exchange rates
Gallant et al. (1997) | Nonnormal distribution | For financial analysis
McMillan et al. (2000) | Symmetric and asymmetric densities | For the United Kingdom stock market
Lambert and Laurent (2001) | Skewed Student's t-distribution | Used in the GARCH framework
Siourounis (2002) | Nonnormal distribution | For financial analysis
Harris et al. (2004) | Skewed generalized Student's t-distribution | To capture stylized facts (skewness and leverage effects) of daily returns
Yu (2005) | Nonnormal distribution | For financial analysis
Chuang et al. (2007) | Logistic distribution and the scaled Student's t-distribution | Forecasting volatility in the financial markets
Fat-tail distributions usually represent the stylized facts of the stock market. Alberg et al. (2011) proposed that GARCH models with fat-tail
distributions are relatively better suited for analyzing returns on stocks. Finite-dimensional distributions of GARCH processes exhibit interesting features of regular variation. This feature is consistent with the heavy-tailedness possessed by
real-life log-return data. The regular variation arises because the squares of a stationary GARCH process are embedded in a multivariate linear stochastic recurrence equation. The convergence of sample autocorrelations is slow when the tails of
GARCH processes are heavy (Basrak et al., 2002).
In the last two decades, maximizing profits has always been a driving force for investors to shift toward algorithmic
trading and apply machine learning methods to investment decisions. The use of quantitative methods in economics and
finance research has increased dramatically with the advent of technology. This has led to the development of many theories that
explain the risk preferences of investors and the optimal allocation of assets in a portfolio under different risk-aversion conditions. All these theories are clubbed under Modern Portfolio Theory and the Efficient Frontier of optimal asset allocation
(Markowitz, 1952). Under this theory, an investor selects a portfolio at time t − 1 that produces a stochastic return at time t.
The model assumes investors are risk-averse and, when choosing among portfolios, they care only about the mean and
variance of their one-period investment returns. Thus, investors choose “mean-variance-efficient portfolios” in the sense
that the portfolios minimize the variance of portfolio return, given the expected return, and maximize the expected return
given variance (Fama and French, 2004). This theory thus covered an optimal combination of securities. However, it also
sketched a critical assumption among others, i.e., “today’s returns are a function of the decisions made in the past.” This
connectivity between the past and present actions provides researchers with an abundant amount of information contained
in so-called “histories.” This leads to the idea that the “history repeats itself in that ‘patterns’ of past price behavior will tend
to recur in the future” (Fama, 1965; Charles and Darne, 2005). However, on the contrary, some researchers are also of the
opinion that the future path of the price level of security cannot be predicted from the past, i.e., they believe in the theory of
random walks. To be precise, it implies that the past cannot predict the future in any meaningful way (Fama, 1965). Nelson
(1990), while studying the relationship between GARCH and similar models, found that with some restrictions on parameters for short intervals in the sequence of GARCH models, the conditional variance converges to a stochastic differential
equation with an inverse-gamma stationary distribution. It implies that the GARCH log-returns can be modeled approximately with Student’s t-distribution for sufficiently short time intervals. Since the GARCH process is Markovian, it is
enough to consider the convergence of Markov chains (Engle, 1982). In the beginning, error terms obtained after modeling
the financial time series by GARCH were handled by normal distribution, which Bollerslev (1987) treated later by Student’s t-distribution. In 1981, Harvey gave the Generalized Error Distribution (GED) by taking into account fat tails. After
testing various distributions, Liu and Hung (2010) concluded that the error distribution did not help much in improving
volatility forecasting using the GARCH model. However, Wilhelmsson (2006) found that if the leptokurtic property is
allowed in error distributions, it leads to significant improvements in forecasting when compared to the normal distribution.
The GARCH model introduced by Engle (1982) and Bollerslev (1986) is frequently employed to model excess kurtosis and
volatility clustering and forecast their volatility. But the residuals standardized by the conditional volatility computed by
using an estimated GARCH model still have excess kurtosis (Baillie and Bollerslev, 1989). It indicates the presence of
outliers in the returns series, which are not detected by the GARCH model (Balke and Fomby, 1994). Outliers can have
undesirable effects on the estimates of the parameters of the equation governing the volatility dynamics and the tests of
conditional homoskedasticity. Chen and Liu (1993) proposed a procedure to detect and correct additive outliers (AOs),
which Franses and Ghijsels (1999) later adapted to GARCH models. Köksal (2009) and Liu and
Hung (2010) drew similar conclusions. However, Balke and Fomby (1994) and Tolvi (2001) found that a large number
of detected outliers in time series are innovative outliers (IOs), especially for high-frequency data. Sharpe (1964), Lintner
(1965), Mossin (1966), and Treynor (1961) developed the capital asset pricing model, and Fama and French (1992, 1993) extended
this line of work with their factor-based model, the Fama-French factor model, to predict stock returns. Nijman and Sentana (1996)
discussed the contemporaneous aggregation of independent univariate GARCH processes as well as marginalization in
more general multivariate GARCH processes. They concluded that the class of strong GARCH processes is not closed
under these transformations whereas the class of the weak process is closed. Poon and Granger (2003) found that GARCH
generally dominates ARCH. However, asymmetric models, such as the exponential GARCH by Nelson (1991) and
GJR-GARCH by Glosten et al. (1993) tend to perform better than the original GARCH in some cases.
The present study deals with testing the performance of GARCH, EGARCH, and GJR models with Gaussian and Student’s t-distributions to forecast the volatility derived from conditional variances. Indian Stock Market data consisting of
daily closing prices of the BSE 100 S&P stock index from 2009 to 2019 have been selected for the study. The detailed methodology for the proposed study is described in Section 2. The application and results of the study are then
discussed in Section 3. Finally, the conclusions of the study are given in Section 4.
2. Methodology
In the present study, GARCH and its variants EGARCH and GJR are used to model and forecast volatility in BSE stock market data.
The main applications of GARCH lie in analyzing financial time-series data to find its conditional variances and volatilities. The GARCH model can be appropriate for time series data in which the variance of the error term is serially autocorrelated and follows an autoregressive moving average process. It assesses risk and expected returns for assets that exhibit
clustered periods of volatility in returns (Krishnan and Mukherjee, 2010; Dhaene and Wu, 2019).
The model first converts the prices into relative returns and then fits the historical data to a mean-reverting volatility
term structure through its internal optimization technique. Specifically, GARCH modeling involves three steps (a minimal code sketch is given after this list):
(1) Estimating a best-fitting autoregressive model
(2) Computing autocorrelations of the error term
(3) Testing for significance.
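A minimal sketch of these three steps in Python is given below, assuming the open-source arch package; the package choice, the placeholder return series, and the model settings are illustrative assumptions made here and are not necessarily the software used in the study.

import numpy as np
from arch import arch_model  # third-party package, assumed available

# Placeholder return series; in the study these would be the BSE log returns
rng = np.random.default_rng(0)
returns = 100 * 0.01 * rng.standard_normal(2359)  # expressed in percent

# Step (1): estimate a best-fitting (here constant-mean) model with GARCH(1,1) errors
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1, dist="normal")
result = model.fit(disp="off")

# Steps (2)-(3): the summary reports the estimated ARCH/GARCH coefficients together
# with their standard errors, t-statistics, and p-values (the significance tests)
print(result.summary())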
Because financial data is high-frequency data, a large number of volatility matrix estimation methods have been developed.
Since these methods employ an approximate factor model, they lead to a low-rank and sparse structure for the integrated
volatility matrix. But these models lack a dynamic structure that can predict future volatility matrices as the volatilities are
highly unstable in real-life practice. Also, there is volatility clustering (Mandelbrot, 1963; McMillan and Speight 2004).
The heterogeneous and autocorrelated feature of volatilities motivated the researchers to develop parametric models. Kim
and Fan (2019) developed a factor GARCH-Itô model to overcome the problem of predicting future volatility matrices
based on an approximate factor model.
2.1 Types of GARCH models
Since the original introduction of GARCH, many variations accommodating specific qualities of stock, industry, or economic data have emerged. In addition to the magnitude of returns, these variants incorporate their direction, which was not addressed in the original model.
In assessing risk, financial institutions incorporate GARCH models into their Value-at-Risk (VaR) projections, i.e., the maximum expected
loss (whether for a single investment or trading position, a portfolio, or at a division or firm-wide level) over a specified
period. GARCH models provide better gauges of risk than can be obtained by tracking the standard deviation
alone (Bollerslev, 1986; Chong et al., 1999). The GARCH process extends the ARCH process in the
same way that the standard time-series AR process is extended to the general ARMA process. Just as the autocorrelation and
partial autocorrelation functions are useful tools in identifying and checking the behavior of time-series in the conditional
mean ARIMA model, the autocorrelations and partial autocorrelations for the squared process help to identify GARCH
behavior in the conditional variance equation (Franses and Van Dijk, 1996; Cryer and Chan, 2008).
One of the challenges in modeling financial time series is to identify heteroskedastic effects, which imply that the
volatility of financial data is not constant. The volatility is the square root of conditional variance of the log return series.
If $\{y_t : t \in T\}$ denotes the stock price series, then the log returns are defined by

$$ r_t = \log y_t - \log y_{t-1} = \log\!\left(\frac{y_t}{y_{t-1}}\right) \qquad (1) $$

The volatility $\sigma_t$ is defined by

$$ \sigma_t^2 = \operatorname{Var}\!\left(r_t \mid \mathcal{F}_{t-1}\right) \qquad (2) $$

where $\mathcal{F}_{t-1}$ is the $\sigma$-algebra generated by $r_0, r_1, \ldots, r_{t-1}$.
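As a small numerical illustration of Eqs. (1) and (2), the following Python sketch computes log returns from a hypothetical price series; the prices are invented for the example and are not BSE data.

import numpy as np

# Hypothetical closing prices (illustrative values only)
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9, 103.4])

# Eq. (1): r_t = log(y_t / y_{t-1})
returns = np.diff(np.log(prices))

# A simple unconditional volatility estimate; the conditional volatility sigma_t of
# Eq. (2) is what the GARCH-type models of the next subsections estimate over time
print(returns)
print("sample volatility:", returns.std(ddof=1))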
2.1.1 GARCH model
GARCH model, being an extension of Engle’s ARCH model for variance heteroskedasticity, deals with the prediction of
future variances using past variances whenever the price series exhibits volatility clustering. GARCH models are widely
used conditional heteroskedastic models to explain volatility clustering in an innovations process. Volatility clustering
occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes
with time. The GARCH model is an autoregressive moving average model for conditional variances, in which the lagged conditional variances carry the GARCH coefficients and the lagged squared innovations carry the ARCH coefficients. Mathematically, the
GARCH (p, q) model for the log return series $\{r_t : t \in T\}$ is given by:

$$ r_t = \mu + \varepsilon_t \qquad (3) $$

where

$$ \varepsilon_t = \sigma_t z_t \quad \text{and} \quad \sigma_t^2 = \omega + \sum_{i=1}^{p} \gamma_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 \qquad (4) $$

Here the $z_t$ (called the innovations) are independent and identically distributed with mean 0 and variance 1, and $\varepsilon_t$ denotes the error term. The GARCH (p, q) model imposes the following constraints for stationarity and positivity of the conditional variance: (a) $\omega > 0$, which implies that the volatility cannot be zero or negative; (b) $\gamma_i > 0$, $\alpha_j > 0$, which capture the stylized characteristic of volatility clustering, with the conditional variance forecast increasing after large variations in returns; and (c) $\sum_{i=1}^{p} \gamma_i + \sum_{j=1}^{q} \alpha_j < 1$, a further condition reflecting volatility clustering.
The conditional variance model GARCH (p, q) is composed of the GARCH component polynomial (p past conditional
variances) and the ARCH component polynomial (q past squared innovations) (Posedel, 2005).
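To make the recursion in Eqs. (3) and (4) concrete, the following minimal Python sketch simulates a GARCH(1,1) process; the parameter values are illustrative and are not the estimates reported later in Tables 3 and 4.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative GARCH(1,1) parameters satisfying omega > 0, gamma, alpha > 0, gamma + alpha < 1
omega, gamma, alpha = 1e-6, 0.90, 0.07
mu, n = 0.0005, 1000

sigma2 = np.empty(n)
eps = np.empty(n)
sigma2[0] = omega / (1 - gamma - alpha)              # unconditional variance as a start value
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    # Eq. (4): sigma_t^2 = omega + gamma*sigma_{t-1}^2 + alpha*eps_{t-1}^2
    sigma2[t] = omega + gamma * sigma2[t - 1] + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()   # eps_t = sigma_t * z_t, z_t ~ N(0, 1)

returns = mu + eps   # Eq. (3): r_t = mu + eps_t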
2.1.2 EGARCH model
In the exponential GARCH (EGARCH) model, the logarithm of the conditional variance is modeled, with additional leverage terms to capture asymmetry in volatility clustering; the EGARCH model is thus an extension of the GARCH model.
The EGARCH (p, q) model consists of p GARCH coefficients corresponding to lagged log-variance terms, q ARCH
coefficients corresponding to lagged absolute standardized innovations, and q leverage coefficients applied to lagged standardized innovations. Mathematically, the conditional variance equation of an EGARCH (p, q) model is given by:
$$ \log \sigma_t^2 = \omega + \sum_{i=1}^{p} \gamma_i \log \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \left[ \frac{\left|\varepsilon_{t-j}\right|}{\sigma_{t-j}} - E\!\left(\frac{\left|\varepsilon_{t-j}\right|}{\sigma_{t-j}}\right) \right] + \sum_{j=1}^{q} \xi_j \,\frac{\varepsilon_{t-j}}{\sigma_{t-j}} \qquad (5) $$

where the $\gamma_i$ ($i = 1, 2, \ldots, p$) are the GARCH coefficients, the $\alpha_j$ ($j = 1, 2, \ldots, q$) capture volatility clustering, and the $\xi_j$ measure the leverage effect. In the EGARCH model there is no need to impose positivity constraints, because the logarithm of the variance is modeled. In the EGARCH model equation, the distribution of the innovation $z_t$ can be Gaussian or Student's t
(Brummelhuis and Guegan, 2005).
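A minimal Python sketch of the EGARCH(1,1) recursion in Eq. (5) follows; the parameter values are illustrative, and the expected absolute innovation E|z| assumes Gaussian innovations.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative EGARCH(1,1) parameters (not the fitted values of Tables 3 and 4)
omega, gamma, alpha, xi = -0.3, 0.96, 0.12, -0.10
n = 1000

log_sigma2 = np.empty(n)
eps = np.empty(n)
log_sigma2[0] = omega / (1 - gamma)
eps[0] = np.exp(0.5 * log_sigma2[0]) * rng.standard_normal()

ez = np.sqrt(2.0 / np.pi)   # E|z| for standard normal innovations

for t in range(1, n):
    z_prev = eps[t - 1] / np.exp(0.5 * log_sigma2[t - 1])   # standardized innovation
    # Eq. (5): log sigma_t^2 = omega + gamma*log sigma_{t-1}^2
    #          + alpha*(|z_{t-1}| - E|z|) + xi*z_{t-1}
    log_sigma2[t] = omega + gamma * log_sigma2[t - 1] + alpha * (abs(z_prev) - ez) + xi * z_prev
    eps[t] = np.exp(0.5 * log_sigma2[t]) * rng.standard_normal()

A negative leverage coefficient xi makes negative shocks raise the log-variance more than positive shocks of the same size, which is the asymmetry the model is designed to capture.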
2.1.3 GJR model
The GJR model is a variant of the GARCH model that includes leverage terms for modeling asymmetric volatility clustering and is named for Glosten, Jagannathan, and Runkle. In the GJR formulation, large negative changes are more likely
to be clustered than positive changes. The GJR (p, q) model has p GARCH coefficients associated with lagged variances,
q ARCH coefficients associated with lagged squared innovations, and q leverage coefficients associated with the squares of
negative lagged innovations. Mathematically, the conditional variance equation of the GJR (p, q) model is given by:
$$ \sigma_t^2 = \omega + \sum_{i=1}^{p} \gamma_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 + \sum_{j=1}^{q} \xi_j \, I\!\left[\varepsilon_{t-j} < 0\right] \varepsilon_{t-j}^2 \qquad (6) $$
The term $I(\cdot)$ is the indicator function, which equals 1 if $\varepsilon_{t-j} < 0$ and 0 otherwise; thus, the leverage coefficients are
applied only to negative innovations. The stationarity and positivity constraints for the GJR model are (a) $\omega > 0$; (b) $\gamma_i > 0$, $\alpha_j > 0$; (c) $\alpha_j + \xi_j \geq 0$; and (d) $\sum_{i=1}^{p} \gamma_i + \sum_{j=1}^{q} \alpha_j + \tfrac{1}{2} \sum_{j=1}^{q} \xi_j < 1$.
The GJR model reduces to the GARCH model if all leverage coefficients are zero which makes the GARCH model a
special case of the GJR model (Baillie and Bollerslev, 1989; Agnolucci, 2009).
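The GJR recursion of Eq. (6) differs from Eq. (4) only through the indicator term; a minimal Python sketch with illustrative parameters is given below.

import numpy as np

rng = np.random.default_rng(2)

# Illustrative GJR(1,1) parameters; setting xi = 0 recovers the plain GARCH(1,1) model
omega, gamma, alpha, xi = 1e-6, 0.90, 0.02, 0.10
n = 1000

sigma2 = np.empty(n)
eps = np.empty(n)
sigma2[0] = omega / (1 - gamma - alpha - 0.5 * xi)   # unconditional variance (symmetric innovations)
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    indicator = 1.0 if eps[t - 1] < 0 else 0.0        # I[eps_{t-1} < 0]
    # Eq. (6): sigma_t^2 = omega + gamma*sigma_{t-1}^2 + (alpha + xi*I[eps_{t-1}<0])*eps_{t-1}^2
    sigma2[t] = omega + gamma * sigma2[t - 1] + (alpha + xi * indicator) * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()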
The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are probabilistic statistical measures
used to quantify the fit of a model in the modeling phase and to select the best model among a set of
candidate models. The fitted model with the lowest AIC and BIC values is considered best for model testing and forecasting purposes (Franses and Van Dijk, 1996). The AIC and BIC values for a model are found from the following equations:

$$ \mathrm{AIC} = -2 \log L + 2p \qquad (7) $$

$$ \mathrm{BIC} = -2 \log L + p \log(n) \qquad (8) $$

where $L$ is the likelihood function, $p$ is the number of parameters in the model, and $n$ is the number of samples in the
modeling phase.
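In code, the two criteria of Eqs. (7) and (8) reduce to a single small helper; the log-likelihood values and parameter counts below are invented purely to illustrate the comparison.

import numpy as np

def aic_bic(log_likelihood, n_params, n_obs):
    """Eqs. (7)-(8): information criteria computed from a fitted model's log-likelihood."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    bic = -2.0 * log_likelihood + n_params * np.log(n_obs)
    return aic, bic

# Hypothetical comparison of two candidate models fitted on the same sample
print(aic_bic(log_likelihood=8100.0, n_params=3, n_obs=2359))
print(aic_bic(log_likelihood=8110.0, n_params=4, n_obs=2359))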
Moreover, the conditional variance models fitted to the daily and weekly returns with the appropriate GARCH,
EGARCH, and GJR specifications are propagated forward using Monte Carlo simulation to obtain simulated conditional variance paths.
3. Application and results
The present study is based on the use of autoregressive conditional heteroskedastic models to estimate volatility in BSE 100
S&P stock index data of 10 years ranging from June 1, 2009, to June 14, 2019 (Data Source: https://www.bseindia.com).
FIG. 1 Time series plot of daily and weekly BSE stock prices. [Figure: daily stock prices (top panel) and weekly stock prices (bottom panel) plotted against date, 2010-2019.]
TABLE 2 Statistical analysis of daily and weekly stock index data.

Frequency | Min | Max | Mean | Standard deviation | Variance | Skewness | Kurtosis
Daily | 8.3065 | 9.4096 | 8.8804 | 0.2896 | 0.0839 | 0.1943 | 1.6690
Weekly | 8.3178 | 9.3963 | 8.8822 | 0.2900 | 0.0841 | 0.1906 | 1.6651
FIG. 2 Daily and weekly returns of BSE stock returns. [Figure: BSE daily stock returns (top panel) and weekly stock returns (bottom panel) plotted against date, 2010-2019.]
Daily and weekly frequencies of price indices are used as inputs for the proposed study, whose time series plots have been
shown in Fig. 1. A statistical analysis of the daily and weekly stock prices is given in Table 2. Because the BSE index reflects
the movement of both prices and returns, the stock index data are converted to returns for the daily and
weekly frequencies using Eq. (1). Fig. 2 shows the plots of daily and weekly BSE stock returns. For each frequency, the
in-sample modeling period is from June 1, 2009, to November 29, 2018, and the remaining period of 2018 and 2019
constitutes the out-of-sample forecasting period. For the daily frequency, the in-sample period consists of 2359 points and
the out-of-sample period of 132 points, while for the weekly frequency, the in-sample period consists of 495 points and
the out-of-sample period of 29 points.
For daily and weekly stock returns, the values of skewness are found to be 0.3022 and 0.1668 respectively, which
highlight the effect of asymmetric components contributing toward risk. The daily and weekly stock returns represent the
presence of skewness, leptokurtosis, and fat tails.
The daily and weekly returns of the BSE stock index are used as input to the GARCH, EGARCH, and GJR conditional
variance models to forecast the conditional variances and hence the volatility. First, the autocorrelation function
(ACF) and partial autocorrelation function (PACF) of the squared residuals are plotted to examine conditional heteroskedasticity. The residuals for the daily and weekly frequencies are calculated by the following formula:

$$ \text{Residuals} = \text{returns} - \operatorname{mean}(\text{returns}) $$
ACF and PACF plots shown in Figs. 3 and 4 indicate significant autocorrelation and the presence of volatility clustering in
the residual series. The volatility clustering in the residual series can also be estimated by Ljung-Box Q-test. The presence
of conditional heteroskedasticity can also be estimated by testing residuals for ARCH effects. The F statistic for the test is
57.1877, which is greater than the critical value 9.2103 from the distribution with two degrees of freedom for daily returns.
Thus, the null hypothesis of “no ARCH effects” is rejected at a level of significance. Similarly, the residuals of weekly
returns of BSE stock prices reflect ARCH effects, which are clear from the F-statistic value for the test (57.1877) and
critical value (9.2103) from the distribution with two degrees of freedom. It means that the residual series exhibits conditional heteroskedasticity (Drakos et al., 2010).
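A compact way to reproduce this kind of diagnostic is sketched below in Python: the sample autocorrelation function of the squared residuals signals ARCH effects when several lags lie outside an approximate significance band. The returns array here is a random placeholder, not the BSE series.

import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(3)
returns = 0.01 * rng.standard_normal(2359)      # placeholder daily returns
residuals = returns - returns.mean()

acf_squared = sample_acf(residuals ** 2)
band = 1.96 / np.sqrt(len(residuals))           # rough 95% band for an uncorrelated series
print("lags outside the band:", np.where(np.abs(acf_squared) > band)[0] + 1)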
The next step is to test the daily and weekly returns for normality using Jarque-Bera (JB) test. The value of the JB-test
statistic is 521.4188, which is larger than the critical value 5.9621 and indicates rejection of the null hypothesis of
“normality of daily returns” at a level of significance. It indicates that the daily returns are nonnormal with high values
of kurtosis. Similarly, for the weekly returns the JB-test statistic of 35.2496 exceeds the critical value of 5.8912, so the null hypothesis
is rejected, indicating that the weekly returns are also nonnormal.
FIG. 3 ACF and PACF of squared residuals for daily returns. [Figure: sample autocorrelation function (top panel) and sample partial autocorrelation function (bottom panel) of the squared residuals for lags 0-20.]
FIG. 4 ACF and PACF of squared residuals for weekly returns. [Figure: sample autocorrelation function (top panel) and sample partial autocorrelation function (bottom panel) of the squared residuals for lags 0-20.]
TABLE 3 Estimation results for GARCH, EGARCH, and GJR model parameters with Gaussian distribution.

Model | Frequency | Parameter | Value | SE | t-Statistic | P-value
GARCH (1,1) | Daily | Constant ω | 1.8923e-06 | 7.6804e-07 | 2.4638 | 0.013748
GARCH (1,1) | Daily | GARCH γ | 0.91403 | 0.0099932 | 91.465 | 0
GARCH (1,1) | Daily | ARCH α | 0.067787 | 0.0078861 | 8.5958 | 8.2721e-18
GARCH (1,1) | Weekly | Constant ω | 3.5898e-06 | 1.2718e-06 | 2.8227 | 0.0047624
GARCH (1,1) | Weekly | GARCH γ | 0.18505 | 0.11231 | 1.6477 | 0.099409
GARCH (1,1) | Weekly | ARCH α | 0.26434 | 0.087087 | 3.0353 | 0.0024026
EGARCH (1,1) | Daily | Constant ω | 0.30966 | 0.04304 | 7.1943 | 6.2781e-13
EGARCH (1,1) | Daily | GARCH γ | 0.96612 | 0.0046121 | 209.48 | 0
EGARCH (1,1) | Daily | ARCH α | 0.12541 | 0.016856 | 7.4405 | 1.003e-13
EGARCH (1,1) | Daily | Leverage ξ | 0.10396 | 0.0087637 | 11.863 | 1.8442e-32
EGARCH (1,1) | Weekly | Constant ω | 1.2102 | 0.40467 | 2.9905 | 0.0027848
EGARCH (1,1) | Weekly | GARCH γ | 0.84112 | 0.052609 | 15.988 | 1.542e-57
EGARCH (1,1) | Weekly | ARCH α | 0.21477 | 0.068153 | 3.1513 | 0.0016257
EGARCH (1,1) | Weekly | Leverage ξ | 0.1836 | 0.053037 | 3.4617 | 0.00053669
GJR (1,1) | Daily | Constant ω | 2.1252e-06 | 6.7635e-07 | 3.1421 | 0.0016773
GJR (1,1) | Daily | GARCH γ | 0.90967 | 0.0095547 | 95.207 | 0
GJR (1,1) | Daily | ARCH α | 0.016508 | 0.0089696 | 1.8405 | 0.0657
GJR (1,1) | Daily | Leverage ξ | 0.114 | 0.012789 | 8.9139 | 4.9259e-19
GJR (1,1) | Weekly | Constant ω | 5.382e-05 | 2.1272e-05 | 2.5301 | 0.011403
GJR (1,1) | Weekly | GARCH γ | 0.77692 | 0.069548 | 11.171 | 5.6523e-29
GJR (1,1) | Weekly | ARCH α | 0.016048 | 0.029889 | 0.53693 | 0.59132
GJR (1,1) | Weekly | Leverage ξ | 0.21371 | 0.078476 | 2.7233 | 0.0064639
TABLE 4 Estimation results for GARCH, EGARCH, and GJR model parameters with Student's t-distribution.

Model | Frequency | Parameter | Value | SE | t-Statistic | P-value
GARCH (1,1) | Daily | Constant ω | 2.1142e-06 | 9.8443e-07 | 2.1477 | 0.031739
GARCH (1,1) | Daily | GARCH γ | 0.90578 | 0.014978 | 60.473 | 0
GARCH (1,1) | Daily | ARCH α | 0.07413 | 0.012107 | 6.123 | 9.1813e-10
GARCH (1,1) | Daily | DoF | 8.1296 | 1.2305 | 6.6066 | 3.9333e-11
GARCH (1,1) | Weekly | Constant ω | 3.5898e-06 | 1.4695e-06 | 2.4429 | 0.01457
GARCH (1,1) | Weekly | GARCH γ | 0.18505 | 0.15613 | 1.1852 | 0.23593
GARCH (1,1) | Weekly | ARCH α | 0.26434 | 0.10143 | 2.6061 | 0.0091578
GARCH (1,1) | Weekly | DoF | 10 | 3.4478 | 2.9004 | 0.0037273
EGARCH (1,1) | Daily | Constant ω | 0.32685 | 0.059042 | 5.536 | 3.0953e-08
EGARCH (1,1) | Daily | GARCH γ | 0.96458 | 0.0063342 | 152.28 | 0
EGARCH (1,1) | Daily | ARCH α | 0.12384 | 0.020731 | 5.9734 | 2.3235e-09
EGARCH (1,1) | Daily | Leverage ξ | 0.11306 | 0.013209 | 8.5598 | 1.1305e-17
EGARCH (1,1) | Daily | DoF | 9.6831 | 1.6158 | 5.9926 | 2.0652e-09
EGARCH (1,1) | Weekly | Constant ω | 1.1334 | 0.40638 | 2.789 | 0.0052875
EGARCH (1,1) | Weekly | GARCH γ | 0.85133 | 0.052716 | 16.149 | 1.1457e-58
EGARCH (1,1) | Weekly | ARCH α | 0.20907 | 0.072454 | 2.8856 | 0.0039073
EGARCH (1,1) | Weekly | Leverage ξ | 0.18536 | 0.054362 | 3.4097 | 0.00065032
EGARCH (1,1) | Weekly | DoF | 22.575 | 16.368 | 1.3792 | 0.16782
GJR (1,1) | Daily | Constant ω | 2.8559e-06 | 8.9592e-07 | 3.1877 | 0.0014341
GJR (1,1) | Daily | GARCH γ | 0.89932 | 0.013685 | 65.714 | 0
GJR (1,1) | Daily | ARCH α | 0.0088796 | 0.010881 | 0.81604 | 0.41448
GJR (1,1) | Daily | Leverage ξ | 0.13669 | 0.020179 | 6.7736 | 1.2563e-11
GJR (1,1) | Daily | DoF | 7.986 | 1.2484 | 6.3971 | 1.5838e-10
GJR (1,1) | Weekly | Constant ω | 5.1359e-05 | 2.1476e-05 | 2.3915 | 0.016781
GJR (1,1) | Weekly | GARCH γ | 0.78324 | 0.07072 | 11.075 | 1.6545e-28
GJR (1,1) | Weekly | ARCH α | 0.014222 | 0.031427 | 0.45253 | 0.65088
GJR (1,1) | Weekly | Leverage ξ | 0.21657 | 0.083663 | 2.5886 | 0.009636
GJR (1,1) | Weekly | DoF | 24.552 | 21.96 | 1.118 | 0.26355
Now, the daily and weekly BSE stock returns are treated with GARCH, EGARCH, and GJR models with Gaussian and
student’s t-distributions. The model parameters are estimated using their respective conditional variance equations, given in
Tables 3 and 4. Using AIC and BIC criteria, the conditional variance models GARCH, EGARCH, and GJR with Gaussian
and student’s t-distributions are compared in respective Tables 5 and 6 for daily and weekly frequencies. The values of AIC
and BIC are least for the EGARCH model as compared to GARCH and GJR models for both frequencies with Gaussian and
student’s t-distributions. Using these models, daily and weekly out-of-sample volatilities are estimated. Mean absolute
error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE)
are used as measures to obtain error statistics which are defined by
TABLE 5 AIC and BIC values after modeling daily and weekly returns with GARCH, EGARCH, and GJR models with Gaussian distribution.

Frequency | Criteria | GARCH (1,1) | EGARCH (1,1) | GJR (1,1)
Daily | AIC | −1.6129e+04 | −1.6216e+04 | −1.6195e+04
Daily | BIC | −1.6112e+04 | −1.6193e+04 | −1.6172e+04
Weekly | AIC | −2.5239e+03 | −2.5385e+03 | −2.5358e+03
Weekly | BIC | −2.5111e+03 | −2.5215e+03 | −2.5188e+03
TABLE 6 AIC and BIC values after modeling daily and weekly returns with GARCH, EGARCH, and GJR models with Student's t-distribution.

Frequency | Criteria | GARCH (1,1) | EGARCH (1,1) | GJR (1,1)
Daily | AIC | −1.6190e+04 | −1.6266e+04 | −1.6253e+04
Daily | BIC | −1.6167e+04 | −1.6237e+04 | −1.6224e+04
Weekly | AIC | −2.5239e+03 | −2.5384e+03 | −2.5352e+03
Weekly | BIC | −2.5069e+03 | −2.5171e+03 | −2.5139e+03
$$ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| v(t) - \hat{v}(t) \right| $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left[ v(t) - \hat{v}(t) \right]^2 } $$

$$ \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{\left| v(t) - \hat{v}(t) \right|}{v(t)} $$

where $\hat{v}(t)$ denotes the predicted value of the volatility $v(t)$.
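These error measures are straightforward to compute; a small Python helper is shown below with invented volatility values used only to illustrate the calculation.

import numpy as np

def forecast_errors(v, v_hat):
    """MAE, MSE, RMSE, and MAPE between observed v(t) and predicted v_hat(t)."""
    v, v_hat = np.asarray(v, dtype=float), np.asarray(v_hat, dtype=float)
    err = v - v_hat
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err) / v)
    return mae, mse, rmse, mape

# Illustrative call with placeholder volatility series
v = np.array([0.010, 0.012, 0.011, 0.013])
v_hat = np.array([0.011, 0.011, 0.012, 0.012])
print(forecast_errors(v, v_hat))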
The simulated conditional variances obtained from Monte Carlo simulations for the daily and weekly frequencies with the Gaussian and Student's t-distributions are shown in Figs. 5 and 6, respectively. Table 7 shows the error statistics for the daily and weekly returns of the BSE 100 S&P index. The EGARCH model with the Student's t-distribution is more accurate than the GARCH and GJR models with Gaussian and Student's t-distributions for both daily and weekly frequencies; the improved prediction might be due to better capturing of the leverage effect. The MAPE obtained for daily returns using the EGARCH model with the Student's t-distribution is 14.2577, which is lower than the MAPE of 15.1214 for the EGARCH model with the Gaussian distribution. Similar behavior is seen in the MAPE values of the EGARCH model for the weekly frequency. Moreover, the MAE, MSE, and RMSE are lower for the EGARCH model with the Student's t-distribution than for the GARCH and GJR models with the Gaussian distribution for daily returns. The errors observed for the weekly frequency follow the same pattern as those for the daily frequency under both the Student's t and Gaussian distributions.
FIG. 5 Simulated conditional variances obtained using GARCH, EGARCH, and GJR models for daily and weekly frequencies with Gaussian distribution. [Figure: six panels (a)-(f) showing simulated paths, means, and confidence bounds of the conditional variances, 2010-2019.]
FIG. 6 Simulated conditional variances obtained using GARCH, EGARCH, and GJR models for daily and weekly frequencies with Student's t-distribution. [Figure: six panels (a)-(f) showing simulated paths, means, and confidence bounds of the conditional variances, 2010-2019.]
TABLE 7 Forecast error statistics for daily and weekly frequencies with Gaussian and Student's t-distributions.

Distribution | Frequency | Model | MAE | RMSE | MAPE
Gaussian | Daily | GARCH (1,1) | 0.0016 | 0.0018 | 20.5424
Gaussian | Daily | EGARCH (1,1) | 0.0014 | 0.0017 | 15.1214
Gaussian | Daily | GJR (1,1) | 0.0016 | 0.0019 | 17.3108
Gaussian | Weekly | GARCH (1,1) | 3.2795e-04 | 3.4871e-04 | 12.8366
Gaussian | Weekly | EGARCH (1,1) | 0.0028 | 0.0033 | 12.9630
Gaussian | Weekly | GJR (1,1) | 0.0030 | 0.0035 | 16.9324
Student's t | Daily | GARCH (1,1) | 0.0016 | 0.0018 | 20.6181
Student's t | Daily | EGARCH (1,1) | 0.0012 | 0.0015 | 14.2577
Student's t | Daily | GJR (1,1) | 0.0016 | 0.0019 | 17.1776
Student's t | Weekly | GARCH (1,1) | 3.2795e-04 | 3.4871e-04 | 12.8366
Student's t | Weekly | EGARCH (1,1) | 0.0026 | 0.0032 | 12.4820
Student's t | Weekly | GJR (1,1) | 0.0031 | 0.0036 | 17.3702
4. Conclusions
The present study investigates the modeling of daily and weekly returns of BSE 100 S&P index using conditional variance
models of GARCH, EGARCH, and GJR models to obtain out-of-sample volatility forecast. The daily and weekly BSE
stock returns from June 1, 2009, to November 29, 2018 are used as input to the selected models. For daily frequency,
the in-sample period consists of 2359 points and the out-of-sample period consists of 132 points, while for weekly frequency, the in-sample period consists of 495, and the out-of-sample period consists of 29 sample points. These models
were applied to input data of daily and weekly stock index returns with Gaussian distribution and student’s t-distribution.
The MAPE value for daily returns using the EGARCH model with student’s t-distribution is 14.2577, which is comparatively lower than the MAPE value of 15.1214 for the EGARCH model with Gaussian distribution. Similarly, the MAPE
values for weekly frequency have been observed to be quite low for the EGARCH model. The other errors have also been
observed to be low for the EGARCH model with student’s t-distribution as compared to GARCH and GJR model with
Gaussian distribution for daily returns. Overall, the volatility forecasts improve when the GARCH, EGARCH, and GJR models are fitted with the Student's t-distribution rather than the Gaussian distribution, and among these conditional variance models the EGARCH model with the Student's t-distribution produces the most accurate results for both daily and weekly frequencies.
References
Agnolucci, P., 2009. Volatility in crude oil futures: a comparison of the predictive ability of GARCH and implied volatility models. Energy Econ. 31 (2),
316–321.
Akgirary, V., Booth, G.C., Hatem, J.C., Mustafa, C., 1991. Conditional dependence in precious metal prices. Financ. Rev. 26 (3), 367–386.
Alberg, D., Shalit, H., Yosef, R., 2011. Estimating stock market volatility using asymmetric GARCH models. Appl. Financ. Econ. 18 (15), 1201–1208.
Baillie, R., Bollerslev, T., 1989. Common stochastic trends in a system of exchange rates. J. Monet. Econ. 44 (1), 167–181.
Balke, N.S., Fomby, T.B., 1994. Large shocks, small shocks, and economic fluctuations: outliers in macroeconomic time series. J. Appl. Econ. 9, 181–200.
Barunik, J., Krehlik, T., Vacha, L., 2016. Modeling and forecasting exchange rate volatility in time-frequency domain. Eur. J. Oper. Res. 251 (1), 329–340.
Basrak, B., Davis, R.A., Mikosch, T., 2002. Regular variation of GARCH processes. Stoch. Process. Appl. 99, 95–115.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. J. Econ. 31, 307–327.
Bollerslev, T., 1987. A conditionally heteroskedastic time series model for speculative prices and rates of return. Rev. Econ. Stat. 69 (3), 542–547.
Bollerslev, T., 1988. On the correlation structure for the generalized autoregressive heteroscedastic process. J. Time Ser. Anal. 9 (2), 121–131.
Bouoiyour, J., Selmi, R., 2015. Exchange volatility and export performance in Egypt: new insights from wavelet decomposition and optimal GARCH
model. J. Int. Trade Econ. Dev. 24 (2), 201–227.
Brummelhuis, R., Guegan, D., 2005. Multiperiod conditional distribution functions for conditionally normal GARCH (1, 1) models. J. Appl. Probab. 42
(2), 426–445.
Charles, A., Darne, O., 2005. Outliers and GARCH models in financial data. Econ. Lett. 86 (3), 347–352.
Chen, C., Liu, L.M., 1993. Joint estimation of model parameters and outlier effects in time series. J. Am. Stat. Assoc. 88, 284–297.
Chong, C.W., Ahmad, M.I., Abdullah, M.Y., 1999. Performance of GARCH models in forecasting stock market volatility. J. Forecast. 18 (5), 333–343.
Chuang, I.Y., Lu, J.R., Lee, P.H., 2007. Forecasting volatility in the financial markets: a comparison of alternative distributional assumptions. Appl.
Financ. Econ. 17, 1051–1060.
Cryer, J.D., Chan, K.S., 2008. Time series regression models. In: Time Series Analysis: With Applications in R. Springer, pp. 249–276.
Dellaportas, P., Pourahmadi, M., 2012. Cholesky-GARCH models with applications to finance. Stat. Comput. 22 (4), 849–855.
Dhaene, G., Wu, J., 2019. Incorporating overnight and intraday returns into multivariate GARCH volatility models. J. Econ. 217 (2), 471–495.
Ding, Z., Engle, R.F., Granger, C.W.J., 1993. A long memory property of stock market return and a new model. J. Empir. Finance 1 (1), 83–106.
Drakos, A.A., Kouretas, G.P., Zarangas, L.P., 2010. Forecasting financial volatility of the Athens stock exchange daily returns: an application of the asymmetric normal mixture GARCH model. Int. J. Finance Econ. 15 (4), 331–350.
Engle, R.F., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation. Econometrica 50, 987–1007.
Fama, E.F., 1965. The behavior of stock-market prices. J. Bus. 38 (1), 34–105.
Fama, E.F., French, K.R., 1992. The cross-section of expected stock returns. J. Finance 47 (2), 427–465.
Fama, E.F., French, K.R., 1993. Common risk factors in the returns on stock and bonds. J. Financ. Econ. 33, 3–56.
Fama, E.F., French, K.R., 2004. The capital asset pricing model: theory and evidence. J. Econ. Perspect. 18 (3), 25–46.
Franses, P.H., Ghijsels, H., 1999. Additive outliers, GARCH and forecasting volatility. Int. J. Forecast. 15, 1–9.
Franses, P.H., Van Dijk, D., 1996. Forecasting stock market volatility using (non-linear) Garch models. J. Forecast. 15 (3), 229–235.
Gallant, R., Hsieh, D., Tauchen, G., 1997. Estimation of stochastic volatility models with diagnostics. J. Econ. 81 (1), 159–192.
Glosten, L., Jangannathan, R., Runkle, D., 1993. On the relation between excepted value and the volatility of the nominal excess return of stocks. J. Finance
48, 1779–1801.
Hadizadeh, R., Eslamian, S., 2017. Modeling hydrological process by ARIMA–GARCH time series. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of
Drought and Water Scarcity. Principles of Drought and Water Scarcity, vol. 1. Taylor and Francis, CRC Press, USA, pp. 571–590 (Chapter 30).
Harris, R.D., Coskun Küçüközmen, C., Yilmaz, F., 2004. Skewness in the conditional distribution of daily equity returns. Appl. Financ. Econ. 14 (3), 195–
202.
Hsieh, D., 1989. Modeling heteroscedasticity in daily exchange rates. J. Bus. Econ. Stat. 7 (3), 307–317.
Jach, A., Kokoszka, P., 2010. Empirical wavelet analysis of tail and memory properties of LARCH and FIGARCH models. Comput. Stat. 25 (1), 163–182.
Kim, D., Fan, J., 2019. Factor GARCH-Ito models for high-frequency data with application to large volatility matrix prediction. J. Econ. 208, 395–417.
Köksal, B., 2009. A comparison of conditional volatility estimators for the ISE national 100 index returns. J. Econ. Soc. Res. 11 (2), 1–29.
Koutmos, G., Theodossiou, P., 1994. Time-series properties and predictability of Greek exchange rates. Manag. Decis. Econ. 15 (2), 159–177.
Krishnan, R., Mukherjee, C., 2010. Volatility in Indian stock markets: a conditional variance tale re-told. J. Emerg. Mark. Finance 9 (1), 71–93.
Lambert, P., Laurent, S., 2001. Modelling Financial Time Series Using GARCH-Type Models with a Skewed Student Distribution for the Innovations. No.
UCL-Universite Catholique de Louvain, Belgium.
Lintner, J., 1965. The valuation of risk assets on the selection of risky investments in stock portfolios and capital budgets. Rev. Econ. Stat. 47, 13–37.
Liu, H.C., Hung, J.C., 2010. Forecasting S&P-100 stock index volatility: the role of volatility asymmetry and distributional assumption in GARCH models.
Expert Syst. Appl. 37 (7), 4928–4934.
Liu, W., Morley, B., 2009. Volatility forecasting in the Hang Seng index using the GARCH approach. Asia-Pac. Financ. Mark. 16 (1), 51–63.
Mandelbrot, B., 1963. The variation of certain speculative prices. J. Bus. 36 (4), 394–419.
Markowitz, H., 1952. Portfolio selection. J. Finance 7 (1), 77–91.
McMillan, D.G., Speight, A.E., 2004. Daily volatility forecasts: reassessing the performance of GARCH models. J. Forecast. 23 (6), 449–460.
McMillan, D., Speight, A., Ap Gwilym, O., 2000. Forecasting UK stock market volatility. Appl. Financ. Econ. 10 (4), 435–448.
Mossin, J., 1966. Equilibrium in a capital asset market. Econometrica 34 (4), 768–783.
Nelson, D.B., 1990. ARCH models as diffusion approximations. J. Econ. 45, 7–28.
Nelson, D., 1991. Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59, 349–370.
Nijman, T., Sentana, E., 1996. Marginalization and contemporaneous aggregation in multivariate GARCH processes. J. Econ. 71, 71–87.
Paolella, M.S., 2018. Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH. John Wiley & Sons.
Poon, S., Granger, C.W.J., 2003. Forecasting volatility in financial markets: a review. J. Econ. Lit. 41 (2), 478–539.
Posedel, P., 2005. Properties and estimation of GARCH (1, 1) model. Metodoloski zvezki 2 (2), 243.
Sharpe, W., 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. J. Finance 19, 425–442.
Siourounis, D., 2002. Modeling volatility and testing for efficiency in emerging capital markets: the case of the Athens stock exchange. Appl. Financ. Econ.
12 (1), 47–55.
Theodossiou, P., 1994. The stochastic properties of major Canadian exchange rates. Financ. Rev. 29 (2), 193–221.
Tian, G., Guo, M., 2003. Intraday data and volatility models: evidence from Chinese stocks. Economics. Working Paper.
Tolvi, J., 2001. Outliers in eleven Finnish macroeconomic time series. Finn. Econ. Pap. 14, 14–32.
Treynor, J.L., 1961. Market Value, Time, and Risk. SSRN. August 8, 1961, https://doi.org/10.2139/ssrn.2600356.
Wilhelmsson, A., 2006. GARCH forecasting performance under different distribution assumptions. J. Forecast. 25 (8), 561–578.
Yu, J., 2005. On leverage in a stochastic volatility model. J. Econ. 127 (2), 165–178.
Chapter 13
Gene expression models
Hossien Riahi-Madvara, Mahsa Gholamib, and Saeid Eslamianc,d
a
Department of Water Engineering, Faculty of Agriculture, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran, b Department of Civil Engineering,
Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran, c Department of Water Engineering, College of Agriculture, Isfahan University of
Technology, Isfahan, Iran, d Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
This chapter considers the gene expression programming (GEP), different types, and developments of GEP-based techniques in hydroinformatics. The genetic programming (GP), linear genetic programming (LGP), GEP, multigene genetic
programming, tree-GEP, and pareto-optimal multigene genetic programming are introduced, and their applicability in
various fields of hydraulic and hydroinformatics is discussed. The results of the case studies of GEP-based models confirmed the accuracy and suitability of these techniques in various aspects of hydraulic and hydrology models. Several
function findings using GEP-based methods in hydraulic and river engineering are performed, and their explicit equations
are derived as a strength model induction engine. The results of the case studies confirmed the general aspects and accuracy
of GEP in function findings.
Gene expression programming (GEP) is one of the applied fields in evolutionary processing and genetic algorithms
(Ferreira, 2001, 2002a, 2002b, 2006). In GEP, based on the mathematical approach of genetic algorithms (GA), natural selection, and
the concept of parse trees, the computer produces the code automatically rather than having a human develop it
(Li et al., 2005). A high-level command is given to the computer, and the machine, considering the general
concepts of the problem, develops the required code in an expressional form (Ferreira, 2003). In GEP, the machine develops the
code, executes the optimized program, and produces a symbolic predictive equation. In the early 1990s, Koza (1990)
introduced and developed this family of automatic function-finding algorithms, which have since been widely applied in hydrology and
hydraulics. Because of its ability in automatic function finding, GEP is known under different names, such as automatic
program induction, program synthesis, automatic programming, function finding, model induction engine (Riahi-Madvar
et al., 2019). The GEP creates a population of symbolic programs and motives for the final model using the selection and
reproduction operators of biological science to achieve the goal of automatic function induction, based on Darwinian
Theory and GA ideas (Poli et al., 2008). In brief, the GEP can be defined as a biologically-motivated expert system that
produces computer codes to complete a function, and simultaneously evolves the symbolic assembly of the model and the
parameters of an evolved mathematical system (Searson et al., 2010).
Many studies of GEP-based modeling in hydroinformatics are available in the literature. These models have been used to
estimate the scour process around the hydraulic structures (Guven et al., 2009; Azamathulla and Ghani, 2010; Najafzadeh and
Barani, 2011; Najafzadeh and Oliveto, 2021; Najafzadeh and Kargar, 2019; Parsaie and Haghiabi, 2021; Khan et al., 2018);
longitudinal dispersion and transverse mixing in rivers (Riahi-Madvar et al., 2019; Nezaratian et al., 2021); modeling the
spatial distribution of flow depth in fluvial systems (Yan et al., 2021); stage-discharge modeling (Zahiri and Shabani, 2018;
Birbal et al., 2021); water quality modeling (Najafzadeh et al., 2019); Groundwater vulnerability assessment (Norouzi et al.,
2021); Reference evapotranspiration modeling (Kazemi et al., 2021); River water temperature modeling (Keramatloo et al.,
2020). This chapter introduces the different types of gene expression models in hydroinformatics.
2. Genetic programming
Genetic programming (GP) was defined by Koza (1992) to produce automatic solutions to different problems. GP
processes a complete set of components or computer codes for function finding. It then creates a random population of
computer codes, or specimens, to produce new populations and discover the fittest result (Sattar and Gharabaghi, 2015;
Danandeh Mehr and Kahya, 2017; Dufek et al., 2017). This population-based structure of GP results in more robustness
(Eslamian et al., 2012). Because of its intrinsic parallelism, the GP is a straightforward algorithm based on the terms of
conceptualization and performance (Dufek et al., 2017). GP is one of the most popular artificial intelligence and self-constructing techniques, in which the modeler does not need to specify the form and structure of the solution. The GP
structure follows a tree framework, containing root node(s), internal node(s), and leaves
(Danandeh Mehr and Kahya, 2017). GP creates the initial population by generating random individuals (i.e., trees) using the full, grow, and ramped half-and-half (RHH) methods to achieve the highest diversity (Koolivand-Salooki et al., 2017). Then,
three well-known operations, the reproduction, the crossover, and the mutation, are used to generate new generations
(Babovic and Keijzer, 2000). The reproduction engine evaluates the individuals in all iterations and selects them for
copying and inserting into the new populations and new generations. The crossover step exchanges subtrees between the preferred
parents' chromosomes to produce offspring. The mutation step replaces a randomly selected subtree with a new one from
the preferred parents (Saljoughi, 2017). In the growth of trees and subtrees, the new place for individuals is determined from
a competitive process that results in improvements of the population’s fitness at successive steps (Dufek et al., 2017).
Input feature selection is a crucial task for creating a GP model. There are four main inputs in a typical GP model: (1) data
clustering and subsets in train and test steps, (2) the goal function for final selection (e.g., root mean square error), (3) the
number of inner nodes and the structure of subtrees and location of leaves, and (4) GP features for the establishment of
an arrangement tree as a computer code of the potential solution. The GP model’s final set may include (a) the program’s
external inputs, (b) arithmetic functions, and (c) coefficients. To choose the best groups of the arithmetic functions, an
accurate initial guess that contains all the necessary functions is needed. Depending upon the level of complexity
in the investigated process, the arithmetic functions in the initial collection may include the basic mathematical operations
(i.e., plus (+), minus (−), times (×), divide (÷)) or more complicated ones (i.e., sin, cos, sqrt, log, exp, max, min, sigmoid, etc.)
(Danandeh Mehr and Kahya, 2017; Ravansalar et al., 2017). GP produces sequences of formulas with different complexities, but moderately complex formulas are favored. The final selected formula should be both accurate and straightforward: a
simple formula may not be very accurate, whereas a very complex formula is often over-fitted (Tinoco et al., 2015).
2.1 The basic steps in GEP development
To produce the GEP code for a high-level problem, the high-level program should be converted into a form appropriate for
GA. The following five steps should be taken into account by the human rather than the machine (Abraham et al., 2006):
1. The elements of final terms in the GEP should be determined. To convert a program code into the required chromosome
forms of GA as a parse tree, it is necessary to decide on the leaves in the program graph and program tree. These final
elements in the tree graph are the terminals of the model. The terminals are the collection of input variables, independent
variables, functions without input vectors, or the constants determined randomly.
2. Determination of the function set and operation commands could be used in the generation and reproduction phase
of code.
3. Defining the fitness function as an objective function to evaluate the code's fitness over generations and find the best-optimized GA solution for the symbolic models.
4. Determining the structural parameters of GA such as the method of selection of parents in GA, the mutation rate, the
reproduction rate, the cross-over rate, etc.
5. Setting the termination condition, and defining the procedure for determining the termination results of produced code.
2.2 Parse trees in GEP
The main building blocks of GEP are parse trees. A tree-based GEP model is a systematic procedure used to generate symbolic, expressional solutions to a given problem driven by genetic algorithms (Jin, 2020). The final solution in a tree-based GEP is a tree-like structure whose leaves are the terminal nodes at the lowest level of the model and whose nonterminal nodes sit at the higher levels. In parse trees, the equational form of a model is represented by a tree graph. A key requirement in generating programs is that the consistency and grammatical correctness of the generated code can be evaluated. For example, raw C++ code written by a programmer cannot be executed directly, because it may contain grammatical errors in its initial form and is written in a high-level language that must first be converted into machine form. The solution is the conversion of the code into a simpler, uniform form called a parse tree. In the execution phase of high-level code written in C/C++, the code is converted into a parse tree on which the compiler can operate. Using this framework in the GEP structure likewise makes the definition of the genetic operators simpler, and it eases error finding and the correction of syntax and runtime errors. If the compiler succeeds in generating a parse tree from the written code, the code has no syntax or runtime errors. How, then, is the parse-tree form of an expression such as x + 12 represented? The corresponding parse tree is shown in Fig. 1.
FIG. 1 The parse tree form of the x + 12 expression. [Figure: a tree with '+' at the root and 'x' and '12' as its two leaves.]
FIG. 2 The parse tree form of the ((x/y + (8 + 40)) − (3*pow(x, z))) expression. [Figure: a tree with '−' at the root; its left subtree combines x/y with 8 + 40 under '+', and its right subtree multiplies 3 by pow(x, z).]
In this parse tree, the operator is at the root, and the operands are the left and right leaves of the root node. The parse-tree form of a more complex expression such as ((x/y + (8 + 40)) − (3*pow(x, z))) is shown in Fig. 2.
In parse trees, all operators and functions that require input arguments appear in the internal (root and subtree) nodes, while the variables, constants, and functions that do not require inputs appear in the leaf nodes. The nodes that do not depend on anything else, or that have no children, are the terminals or variables. In this way, parse trees make it possible to represent computer code in a simple and consistent form. One of the best-known programming languages that works on parse trees is Lisp, in which the code itself is written as a kind of parse tree. For example, the Lisp code for the expression x + 5 is (+ x 5), and the Lisp code for the expression ((x/y + (8 + 40)) − (3*pow(x, z))) is (− (+ (/ x y) (+ 8 40)) (* 3 (pow x z))). Nowadays, this method of code representation is used in different programming environments such as MATLAB to provide the final output of GEP models (Raiahi-Madvar and Seifi, 2016).
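The same idea can be illustrated outside Lisp: the short Python sketch below represents the expression ((x/y + (8 + 40)) − (3*pow(x, z))) as a nested-tuple parse tree and evaluates it recursively. The tuple layout and the evaluate helper are assumptions made for this example, not part of any GEP package.

import operator

# Each internal node is (operator_name, left_child, right_child); leaves are variable
# names or numeric constants.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "pow": operator.pow}

tree = ("-",
        ("+", ("/", "x", "y"), ("+", 8, 40)),
        ("*", 3, ("pow", "x", "z")))

def evaluate(node, env):
    """Recursively evaluate a parse tree given the variable bindings in env."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(node, node)   # variable lookup, or the numeric constant itself

print(evaluate(tree, {"x": 2.0, "y": 4.0, "z": 3.0}))   # (2/4 + 48) - 3*2**3 = 24.5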
3. Tree-based GEP
The GEP is based on the GA in the automatic construction of optimal program structures and nonlinear equations. As stated,
the parse tree is one of the best methods in symbolic program generations. The tree encodes the individuals in the population
by utilizing genetic operators such as mutation and crossover over the parse-tree representation. The tree-based GEP, through a
hierarchical encoding of trees having roots, nodes, and leaves connected by branches, shows the model's syntactic structure in
a context-free grammar together with the corresponding pseudo-code, as shown in Fig. 3.
In this figure, the end nodes known as terminal nodes represent the input arguments of functions, and the arithmetic
operations are located at the nonterminal nodes. In optimizing a GEP model, the primary genetic operators for tree-based
production are the selection operator and the crossover operator. The mutation process implements a certain degree of
diversity over the population. In the generations, the fitness function in all the individuals is assessed and the goodness
of fit calculated, and the individuals are graded regarding their fitness. The selection operation regarding the fitness values
chooses the best individuals to contribute in mutation or cross over for offspring. The selection usually is based on the
tournament and roulette-wheel selection that acts based on fitness values. After applying the selection operation to the
individuals, the crossover is used over the branches to produce offspring by changing the selected branches of two parents,
as shown in Fig. 4.
FIG. 3 Parse-tree demonstrations of the computer programs in tree-based GEP (Jin, 2020). [Figure: two example trees, one encoding the mathematical expression f = (x + y)/z − y*(x*z) and the other encoding f = (x*x)/z*y.]
The branches are selected randomly so as to maintain appropriate diversity over the population in successive generations. The crossover points in a tree-based GEP can be either nonterminal or terminal nodes over the branches. The final operation in tree-based GEP development is the mutation, which randomly selects a branch and replaces it with a randomly produced branch, as shown in Fig. 5.
Fig. 6 shows a flowchart of tree-based GEP creation in a genetic-process context. Individuals are initially generated at random. In each generation, the fitness of the model is evaluated using the tree-based GEP, and the individuals are used to reproduce new generations through selection, crossover, and mutation, which enhances the model's performance over the generations. The generation process is continued until the model reaches the stopping criterion, based on a predetermined maximum number of generations or an error threshold. The evolution process is represented in the following figure.
First and foremost, two issues should be addressed in the construction of the GEP model. First, the closure property should be satisfied. Closure in GEP means that every function that has input arguments must be able to handle any value that may be passed to it. For instance, suppose we write a program that exclusively employs the mathematical operators {*, /, +, -} as the functions and the values {0, 1, 2, ...} as the terminal values. In this example, closure is not satisfied, because a program containing {2/0} produces a divide-by-zero error. To satisfy closure, the divide function should be defined in such a way that it does not break the program and does not cause it to terminate abnormally. The second issue is that the problem under consideration must be solvable using a combination of the chosen terminals and functions. For example, using the function set {+, -} and terminals {1, 2, 3, ...}, the problem of calculating the logarithm of numbers cannot be solved accurately. As a result, the modeler's skill in choosing the implemented functions and terminals is critical.
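A minimal sketch of a "protected" division commonly used to satisfy closure; returning 1.0 for a zero denominator is an assumed convention here, not a rule fixed by the text:

def protected_div(a, b, eps=1e-12):
    """Return a/b, or 1.0 when |b| is numerically zero, so that any evolved
    program remains executable instead of raising a divide-by-zero error."""
    return a / b if abs(b) > eps else 1.0

print(protected_div(8.0, 2.0))   # 4.0
print(protected_div(2.0, 0.0))   # 1.0 instead of a ZeroDivisionError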
3.1 Tree depth control
In the population generation of GEP, the maximum depth of the trees should be confined in order to control the complexity of the problem and the following challenges (Raiahi-Madvar and Seifi, 2016):
● growing the maximum depth of the trees increases the required system memory;
● increasing the depth of the trees reduces the running speed of the genetic operations and increases the time needed to reach the final solution;
● the optimum programs are usually composed of smaller commands and simpler trees.
Therefore, it is necessary to control the maximum allowable depth of the trees in the evolutionary process, which can be done by fixing a maximum depth or by penalizing large trees.
3.2 Maximum tree depth
In this approach, during the production of the initial population and during the application of the genetic operators, a pruning method is applied to any tree whose depth exceeds the predefined maximum depth, which guarantees that the trees do not grow beyond that depth. The weakness of this method is that the final optimum solution may require a maximum depth greater than the predefined value, which makes it necessary to increase the maximum depth and repeat the evolutionary process.
FIG. 4 Application of the crossover operation in the tree-based GEP model (Jin, 2020).
3.3 Penalizing the large trees
In this method, a penalty proportional to the depth of large trees is applied in the evaluation phase of the chromosomes. In this manner, as the trees grow the penalty increases, reducing the chance of selecting large trees and thereby controlling the maximum depth of the trees.
FIG. 5 Application of the mutation operation over the tree-based GEP: a randomly selected branch is removed and replaced by a randomly generated tree branch (Jin, 2020).
FIG. 6 The flowchart of tree-based GEP development: random generation of the initial population, fitness evaluation, selection, reproduction (crossover/mutation), and a stopping criterion (Jin, 2020).
3.4 Dynamic maximum-depth technique
As stated previously, the search space in the evolutionary process of tree-based GEP is potentially unlimited. Therefore, the
trees may grow in depth and size throughout the evolution process. As a result, some parts of the parse-tree structure may be
redundant with less influence on the improvements of fitness function (Koza, 1992). Another sophisticated technique in
maximum depth control is the dynamic procedure. In this approach, rather than the predefined fixed maximum depth, the
maximum depth is adjusted during the evolution according to the best fitness value of trees ( Jin, 2020). The pseudo-code of
dynamic maximum depth procedure is presented in Table 1.
TABLE 1 Pseudo-code for the dynamic maximum-depth (DMD) technique (Jin, 2020).
For i = 1 : total number of individuals
    Depth(i) = depth of the ith individual
    Fitness(i) = fitness of the ith individual
    If Depth(i) < Maximum depth
        Choose the ith individual
        If Fitness(i) < Best fitness
            Best fitness = Fitness(i)
        End
    Else
        If Fitness(i) < Best fitness
            Choose the ith individual
            Best fitness = Fitness(i)
            Maximum depth = Depth(i)
        End
    End
End
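A minimal Python sketch of the same selection rule, assuming fitness is minimized (smaller is better) and that depth and fitness are already available for each individual:

def dynamic_max_depth_filter(individuals, max_depth):
    """Apply the DMD rule of Table 1: individuals within the current depth
    limit are always accepted; a deeper individual is accepted only if it
    improves the best fitness seen so far, in which case the limit grows."""
    best_fitness = float('inf')
    chosen = []
    for depth, fitness in individuals:          # each individual = (depth, fitness)
        if depth < max_depth:
            chosen.append((depth, fitness))
            if fitness < best_fitness:
                best_fitness = fitness
        elif fitness < best_fitness:
            chosen.append((depth, fitness))
            best_fitness = fitness
            max_depth = depth                   # dynamically relax the limit
    return chosen, max_depth

population = [(5, 0.9), (7, 0.4), (12, 0.1), (12, 0.5)]
print(dynamic_max_depth_filter(population, max_depth=10))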
4. Linear genetic programming
LGP is an advanced version of tree-based GEP with a linear structure of program representation (Alavi and Gandomi, 2012). The LGP form of the expression x + 12, presented as a tree-based GP in Fig. 1, is as follows:
f[0] = 0;
L0: f[0] += x[2];
L1: f[0] += 12;
return f[0];
In LGP, an individual program is a variable-length sequence of simple C instructions. Arithmetic operations, conditional statements, and function calls compose the instructions. The LGP steps for solving a problem are as follows (Alavi and Gandomi, 2012):
● Initialization: randomly generate the initial population and calculate the objective function of each individual.
● Genetic operators:
  ✓ Tournament: randomly select individuals from the population and identify the best and worst among them.
  ✓ Crossover: generate new individuals by exchanging parts of the best-selected individuals.
  ✓ Elitism: replace the worst individuals with the results of the previous step.
  ✓ Repeat the genetic loop until convergence or termination of the model.
More details and descriptions of the basic parameters used to direct a search for a linear genetic program are given by
Brameier et al. (2007).
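As an illustration of the linear representation, the following Python sketch (a simplified stand-in, not Brameier's implementation) evaluates a variable-length instruction list operating on registers, in the spirit of the C-style instructions shown above:

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def run_linear_program(instructions, x):
    """Each instruction is (dest, op, src1, src2); operands starting with 'r'
    are registers, operands starting with 'x' are inputs, otherwise constants."""
    r = [0.0] * 4                                   # four working registers
    def value(tok):
        if isinstance(tok, str) and tok.startswith('r'):
            return r[int(tok[1:])]
        if isinstance(tok, str) and tok.startswith('x'):
            return x[int(tok[1:])]
        return float(tok)
    for dest, op, a, b in instructions:
        r[int(dest[1:])] = OPS[op](value(a), value(b))
    return r[0]                                     # register r0 holds the output

# Linear encoding of f = x0 + 12, analogous to the C-style listing above
program = [('r0', '+', 'x0', 12)]
print(run_linear_program(program, x=[5.0]))          # 17.0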
5. Evolutionary polynomial regression
EPR is an evolutionary data-driven technique that uses evolutionary computation to derive polynomial equations. It hybridizes the search ability of the genetic algorithm with numerical regression to find symbolic regressions.
The EPR procedure follows two primary steps. In the first step, the symbolic architecture of the regression is derived, and in the second step the coefficients of the symbolic model are determined by least-squares error. Giustolisi and Savic (2006) developed a multi-objective genetic algorithm using accuracy and complexity (minimum number of inputs, minimum length of expressions) as the optimization goals (Balf et al., 2018).
The number and range of the input parameters and the type of functions selected for the components of the symbolic regression (e.g., natural logarithm, hyperbolic tangent, and exponential) have a crucial impact on the EPR performance (Najafzadeh et al., 2017). More details on EPR and the software for its application are given by Giustolisi and Savic (2006).
6. Multigene genetic programming
Recently, the MGGP (Searson, 2009) has been established as one of the newest GP-based models. The MGGP produces more straightforward models in comparison with traditional GP, because it linearly combines simple, low-depth subtrees (Searson, 2015). Every computer code, or individual, in the MGGP model is a weighted linear combination of several genes (trees) plus a bias term as the noise component (Riahi-Madvar et al., 2021). An example of a hypothetical multigene tree structure is presented in Fig. 7, and the equivalent equation of this model can be written as
Gene 1: -6.27[(x1 - 7.65) + tan(7.51x1)] = -6.27 tan(7.51x1) - 6.27x1 + 47.99
Gene 2: -17[x1 + tan(7.57x1)] = -17 tan(7.57x1) - 17x1
Gene 3: -8[(-7.57x1) + (cos(x2) + x1 + x2)] = 52.56x1 - 8 cos(x2) - 8x2
F(x1, x2) = 29.29x1 - 8x2 - 6.27 tan(7.51x1) - 17 tan(7.57x1) - 8 cos(x2) + 47.99     (1)
where 47.99 is the bias component and -6.27, -17, and -8 are the regression coefficients, i.e., the gene weights. Generally, the linear coefficients of every MGGP individual are determined by ordinary least-squares optimization. The user specifies the maximum depth of the trees (Dmax) and the maximum number of genes (Gmax); these maxima control the complexity of the final solution (Searson et al., 2010). It has been demonstrated that the MGGP method captures nonlinear behavior more accurately than the traditional linear regression method (Danandeh Mehr and Kahya, 2017). Also, Searson et al. (2010) illustrated that the MGGP method can effectively be embedded into a nonlinear partial least-squares method.
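A minimal numerical sketch of how the gene weights and bias of an MGGP individual could be obtained by ordinary least squares; the gene functions below are stand-ins for illustration, not the trees of Fig. 7:

import numpy as np

# Assumed example genes: each gene maps the input matrix X (n samples x 2) to a vector
gene_funcs = [
    lambda X: np.tan(7.51 * X[:, 0]),        # stand-in for gene 1
    lambda X: np.tan(7.57 * X[:, 0]),        # stand-in for gene 2
    lambda X: np.cos(X[:, 1]) + X[:, 0],     # stand-in for gene 3
]

def fit_gene_weights(X, y, gene_funcs):
    """Build the design matrix [1, g1(X), g2(X), ...] and solve for the
    bias and gene weights by ordinary least squares."""
    G = np.column_stack([np.ones(len(X))] + [g(X) for g in gene_funcs])
    coeffs, *_ = np.linalg.lstsq(G, y, rcond=None)
    return coeffs                            # coeffs[0] = bias, coeffs[1:] = gene weights

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.2, size=(50, 2))
y = 2.0 + 0.5 * np.tan(7.51 * X[:, 0]) - 1.5 * (np.cos(X[:, 1]) + X[:, 0])   # synthetic target
print(fit_gene_weights(X, y, gene_funcs))    # approx. [2.0, 0.5, 0.0, -1.5]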
In the MGGP, the initial population is generated from GP trees having a randomly chosen number of genes between 1 and Gmax. Each gene is a simple GP tree that is independent of the other genes in the same chromosome.
FIG. 7 Representation of the MGGP model with three genes (Riahi-Madvar et al., 2021).
Throughout an MGGP run, genes are acquired and deleted using a two-point high-level crossover (Danandeh Mehr and Kahya, 2017; Gandomi and Alavi, 2012). For example, let Gi be the ith gene of an individual, let the first parent be constructed with three genes G1, G2, and G3, and let the second parent contain four genes G4, G5, G6, and G7. In each individual, two crossover points are chosen at random, and the genes enclosed by the selected crossover points (i.e., marked by [...]) are exchanged. Two new individuals are created as follows:
(G1 [G2] G3), (G4 G5 [G6 G7]) → (G1 G6 G7 G3), (G4 G5 G2)     (2)
If an offspring ends up with more genes than Gmax, the additional genes are deleted at random (Searson et al., 2010; Gandomi and Alavi, 2012).
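A minimal sketch of this two-point high-level crossover on gene lists (gene contents are just labels here; the Gmax trimming described above is included):

import random

def high_level_crossover(parent1, parent2, g_max, rng=random):
    """Swap randomly chosen gene segments between two parents and randomly
    delete surplus genes so each child respects the Gmax limit."""
    i1, j1 = sorted(rng.sample(range(len(parent1) + 1), 2))
    i2, j2 = sorted(rng.sample(range(len(parent2) + 1), 2))
    child1 = parent1[:i1] + parent2[i2:j2] + parent1[j1:]
    child2 = parent2[:i2] + parent1[i1:j1] + parent2[j2:]
    def trim(child):
        child = list(child)
        while len(child) > g_max:
            child.pop(rng.randrange(len(child)))   # delete a random surplus gene
        return child
    return trim(child1), trim(child2)

random.seed(1)
p1 = ['G1', 'G2', 'G3']
p2 = ['G4', 'G5', 'G6', 'G7']
print(high_level_crossover(p1, p2, g_max=4))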
7. Pareto optimal-multigene genetic programming
The Pareto optimal approach has been used in many engineering problems because it readily produces different Pareto frontiers (Coello and Becerra, 2003). Pareto optimization works by balancing different optimization goals. The feasible solutions of the multi-objective optimization are determined by separating the solutions that satisfy the dominance condition. If X1 and X2 are two feasible solutions, then X1 dominates X2 if it satisfies the following conditions (Riahi-Madvar et al., 2021):

Obj_d(X1) ≤ Obj_d(X2), ∀d ∈ {1, 2, …, D}
Obj_i(X1) < Obj_i(X2), ∃i ∈ {1, 2, …, D}     (3)

where Obj_d is the dth objective value of a solution, and D is the total number of goals.
If a solution X* satisfies the above conditions and there is no feasible solution X that dominates X*, then X* is preserved as a Pareto (noninferior) solution, and the dominated solutions are eliminated. The Pareto front is built up from the objective values of the selected solutions (Fig. 8). Fig. 8 illustrates that solution B is preferred to solution E with respect to both goals, so E is a dominated solution and is eliminated. Solutions B and C are incomparable: B is better than C in terms of Obj1, but worse in terms of Obj2. Therefore, A, B, C, and D are all Pareto solutions. The boundary line of the resulting domain is called the Pareto front, and E, F, and G are removed as dominated solutions (Zhang et al., 2017).
In the Pareto optimization algorithm, each new solution (pi) is compared with the nondominated solutions (qj). If pi dominates qj, pi is selected and replaces qj. If neither pi nor qj dominates the other with respect to Obj1 and Obj2, both qj and pi are kept for the next comparisons (Zhang et al., 2017). Fig. 9 shows the flowchart of PO-MGGP. The PO-MGGP hybridizes robust techniques of feature selection, Pareto optimization, and multi-expression discovery.
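A minimal sketch of the dominance test of Eq. (3) for minimization objectives (a simplified stand-in for the PO-MGGP bookkeeping, not its actual implementation):

def dominates(obj1, obj2):
    """True if obj1 dominates obj2: no worse in every objective and
    strictly better in at least one (Eq. 3)."""
    return (all(a <= b for a, b in zip(obj1, obj2))
            and any(a < b for a, b in zip(obj1, obj2)))

def pareto_front(solutions):
    """Keep only the nondominated solutions (each entry is a tuple of objective values)."""
    return [s for i, s in enumerate(solutions)
            if not any(dominates(o, s) for j, o in enumerate(solutions) if j != i)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.5), (3.5, 3.0), (4.0, 4.0)]
print(pareto_front(pts))   # [(1.0, 4.0), (2.0, 2.0), (3.0, 1.5)]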
FIG. 8 Relationship between Pareto solutions.
FIG. 9 The flowchart of PO-MGGP modeling.
8. Some applications of GEP-based models in hydroinformatics
In this section, some applications of GEP-based models to different hydroinformatic problems are presented, demonstrating the ability of GEP-based techniques to find explicit functions. In this study, a gene expression programming package implemented in MATLAB is used to derive the exact form of the functions. The software package is designed to accept different inputs and to examine different mathematical functions and operations.
8.1 Derivation of a quartic polynomial function using GEP
In the first applicability test of the gene expression program, GEP is used to derive a quartic polynomial function. A set of single-input, single-output data is generated, and the final form of the function is derived using the developed GEP model. The following function is evaluated between 0 and 100 with steps of 2:
Y = X + X² + X³ + X⁴     (4)
The GEP model is executed with the following parameters: population: 50; generations: 100; match selection value: 5; probability: 0.05; error threshold: 1e-30; maximum tree depth: 12; maximum depth of subtrees in mutation: 7; and the addition, subtraction, power, multiplication, division, and sine functions. Based on the results in Fig. 10, the minimum value of the target function, which is the sum of absolute errors, is 9.0896e-8.
FIG. 10 Changes of the absolute error sum of the target function over the generations for the quartic polynomial.
The results of the GEP model and of the original function are compared in Fig. 11. It is observed that, despite the wide variation of the output variable, gene expression programming predicted the values correctly. The above function is one of the functions used by Koza (1990) as a reference function in the evaluation of GEP-based models.
The final computer program generated by the model is as follows (Fig. 12):
plus(plus(times(x1, x1), times(times(plus(times(x1, x1), x1), x1), x1)), x1)     (5)
This symbolic expression simplifies exactly to the quartic polynomial of Eq. (4).
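A quick symbolic check of this simplification (a verification sketch using SymPy, not part of the original GEP toolchain):

import sympy as sp

x1 = sp.Symbol('x1')
# Eq. (5) written out exactly as produced by the GEP run
expr = (x1*x1 + ((x1*x1 + x1)*x1)*x1) + x1
print(sp.expand(expr))                                             # x1**4 + x1**3 + x1**2 + x1
print(sp.expand(expr) == sp.expand(x1 + x1**2 + x1**3 + x1**4))    # True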
8.2 Derivation of Colebrook-White equation using GEP
In the second test of GEP, a database is generated by numerical solution of the Colebrook-White equation, and the accuracy of GEP in extracting the exact form of the Colebrook-White equation is evaluated. The Reynolds number varies in the range 4000 to 100,000,000 and the relative roughness in the range 0.000001 to 0.05; by numerical solution of the Colebrook-White equation, f is obtained in the range 0.015751715 to 0.0715537. The whole dataset is divided randomly into three subsets: training 60%, test 20%, and cross-validation 20%. GEP is executed with a population size of 90, 100 generations, tournament selection of 4, maximum tree depth of 12, and the addition, subtraction, multiplication, division, exponential, power, and hyperbolic tangent functions.
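As an illustration of how such a database could be generated, the following sketch solves the Colebrook-White equation, 1/√f = -2 log10(e/(3.7D) + 2.51/(Re √f)), by fixed-point iteration; this is a standard formulation, and the exact variant used to build the chapter's dataset is an assumption here:

import math

def colebrook_f(re, rel_rough, tol=1e-12, max_iter=100):
    """Darcy friction factor from the Colebrook-White equation by
    fixed-point iteration on x = 1/sqrt(f)."""
    x = 8.0                                   # initial guess for 1/sqrt(f)
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_rough / 3.7 + 2.51 * x / re)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return 1.0 / x_new**2

# Generate a few (Re, e/D, f) samples like those used to train the GEP model
for re in (4e3, 1e5, 1e8):
    for ed in (1e-6, 1e-3, 0.05):
        print(re, ed, round(colebrook_f(re, ed), 6))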
FIG. 11 Comparison of GEP-predicted and actual values of the quartic polynomial.
FIG. 12 GEP-generated program for the quartic polynomial.
FIG. 13 The weights of the multigene terms in PO-MGGP for the Colebrook-White equation (gene weights: X2: 0.753, X3: 0.06727, X4: -1.321, X1: -3.207, plus a bias term).
Changes of the target function over the generations are shown in Fig. 13, in which the minimum value of the target function is 3.307e-5, reached in the 49th generation. The best RMSE values are 3.307e-5 for the training data, 3.3203e-5 for the test data, and 3.1432e-5 for the validation data. Figs. 14 and 15 show the observed and predicted values for the three data sets. The predicted and actual values are practically identical, confirming the model accuracy. GEP finally derived the explicit equation for the Colebrook-White friction factor as:
f = 0.6131 + 0.05028√(e/D) + 0.4137 Ln[2 Re Ln(√(e/D) + 0.05028√(e/D))] + 0.4137√(e/D) - 0.3955 Ln(Re) - 0.3955 Ln(√(e/D) + Ln(Re))     (6)
FIG. 14 Comparison of actual and GEP predicted values in Colebrook white relation derivation.
FIG. 15 Comparison of actual and GEP predicted values trend in Colebrook White equation derivation.
8.3 Derivation of the exact form of the Shields diagram using GEP
The Shields diagram shows how the critical shear stress changes with the sediment particle Reynolds number and is used to calculate the threshold of sediment particle movement, for which different equations have been derived by many researchers. In this part, the Wu (2007) set of equations is used to generate a data set, and then an exact expression for the Shields diagram is derived using GEP. Wu (2007) suggested the following set of equations for the Shields diagram:
τc / [(γs - γ)d] =
  0.126 D*^(-0.44),   D* < 1.5
  0.131 D*^(-0.55),   1.5 ≤ D* < 10
  0.0685 D*^(-0.27),  10 ≤ D* < 20
  0.0173 D*^(0.19),   20 ≤ D* < 40
  0.0115 D*^(0.30),   40 ≤ D* < 150
  0.052,              150 ≤ D*                              (7)

D* = d[(ρs/ρ - 1) g / ν²]^(1/3)
In the above equation, d is the particle size [m] and τc is the critical shear stress [N/m²]. A dataset for dimensionless particle sizes between 0.01 and 400 is generated, and the exact form is then extracted by GEP. GEP is executed with the settings of the previous example, and after convergence the RMSE values for the training, test, and validation data are 0.0003859, 0.00037937, and 0.00038873, respectively. Finally, the GEP exact form of the Shields diagram is derived as:
τc / [(γs - γ)d] = 0.08460 + 3.0255(D* + 8.639)^(-2) + 0.887646 Exp(√D*) / [(D* + 8.639)² √D*] - 0.0000127106 (D* - 2.939185942)(D*^0.5 + 8.639 D*) + 0.040208497 (D* + 3.509)^0.5 D*^0.25 - 0.00008529 D*^0.75     (8)
In Fig. 16, the Shields diagram from the GEP-based equation is plotted against Eq. (7), and it is clear that GEP reproduces it accurately and explicitly.
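For reference, a short sketch of Wu's piecewise relation (Eq. 7) as it could be used to generate such a training data set; the constant branch of 0.052 for D* ≥ 150 is assumed from continuity of the reconstructed equation:

import numpy as np

def shields_critical(d_star):
    """Dimensionless critical shear stress after Wu (2007), Eq. (7)."""
    conditions = [d_star < 1.5,
                  (d_star >= 1.5) & (d_star < 10),
                  (d_star >= 10) & (d_star < 20),
                  (d_star >= 20) & (d_star < 40),
                  (d_star >= 40) & (d_star < 150),
                  d_star >= 150]
    functions = [lambda D: 0.126 * D**-0.44,
                 lambda D: 0.131 * D**-0.55,
                 lambda D: 0.0685 * D**-0.27,
                 lambda D: 0.0173 * D**0.19,
                 lambda D: 0.0115 * D**0.30,
                 0.052]
    return np.piecewise(d_star, conditions, functions)

d_star = np.logspace(np.log10(0.01), np.log10(400), 200)   # range used in the text
print(shields_critical(d_star)[:3], shields_critical(d_star)[-3:])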
FIG. 16 Derivation of the Shields diagram using GEP: GEP-derived equation versus the Wu equation, plotted against D*.
8.4 Extraction of regime river equations using GEP
In this section, to study the capability and strength of GEP in deriving empirical geometry equations of regime rivers, a field database is used and a model is developed. To this end, a data set related to the geometry of river regimes is collected from field studies. This data set is then used in GEP, the exact forms of the desired relations are derived, and the accuracy of the developed program in practical problems and field studies is confirmed. Using GEP, exact relations are derived for the geometrical properties of regime rivers. Based on the results, the following equation is derived for the regime depth:
H = 0.00003256Q - 0.00001639Q·D50 - 0.002263·Q/Exp(D50) + 0.05363·Q^0.5·D50^0.5 + 0.5424·Q^0.25 - 0.1674     (9)
The above equation has a correlation coefficient of 0.90, which demonstrates acceptable results. Using the GEP method, an exact equation is also derived for the river regime width, with a correlation coefficient of 0.97; based on the results shown in Figs. 17 and 18, the GEP-based equation performs acceptably in predicting the stable width of the river regime. The derived equation for the width of the river is as below:
W = 18.69 - 0.085 Ln(1.310221Q + 8.255√Q + 7.717513) - 0.008295 √(Q + 8.00673)/D50 - 0.3159 √(Q·D50)     (10)
Using the collected data and GEP, the equation for the river regime slope is derived as follows:
S = 12.68 + 0.2521 √D50/(Q + D50) - 0.0837 Ln(3.76781·D50) - 0.006212 √D50/Q + 0.0008078 D50/(4.371538 - D50)     (11)
This equation has a correlation coefficient of 0.74. Its results are compared with the actual values in Fig. 19, where it is seen to perform as desired. Based on the equations presented in this section, it can be observed that GEP performs acceptably in deriving equations from field data. It can display the shape and final form of the optimized equations with their coefficients, which can easily be applied in practice.
FIG. 17 GEP accuracy in deriving the canal depth regime equation (RMS: 0.36232, R²: 0.91922, variation explained: 91.92%).
FIG. 18 GEP accuracy in deriving the canal width regime equation (RMS: 14.4625, R²: 0.97086, variation explained: 97.08%).
FIG. 19 GEP accuracy in deriving the canal slope regime equation (RMS: 0.12375, R²: 0.72194, variation explained: 72.18%).
8.5 Extraction of longitudinal dispersion coefficient equations using GEP
In this section, Pareto optimality combined with multigene GEP is used to find an equation for the longitudinal dispersion of pollutants in natural rivers (Riahi-Madvar et al., 2009). The primary goal of this section is to use the 503 data points collected in streams around the world to build a PO-MGGP equation for the longitudinal dispersion coefficient Kx. The advection-dispersion equation, in which the longitudinal dispersion coefficient is a crucial parameter, is written as follows:
∂C/∂t + u ∂C/∂x = Kx ∂²C/∂x²     (12)
Here, Kx is the longitudinal dispersion coefficient, x is the streamwise direction, t is time, u is the flow velocity, and C is the pollutant concentration. The longitudinal dispersion coefficient as a function of its effective parameters can be presented as follows:
Kx = f(U, U*, Sn, H, W)     (13)
in which the parameters are: velocity (U), mean depth of the stream (H), shear velocity (U*), sinuosity (Sn), and width of the stream (W). In this case, the PO-MGGP with the parameters listed in Table 2 is used, and the multigene results of the model are derived.
The simplified equation for Kx in terms of the effective dimensionless parameters is derived as follows:
Kx/(BU*) = 33.99X1² + 8.497X1²X2 + (X1³ - 2.391×10⁻¹⁵X1 - 1.02×10⁻¹⁶ - 2.033×10⁻²⁰)/X1 + 16.99X1X2² + 8.497X1/X2⁴ + 0.01478     (14)
TABLE 2 Parameters of the MGGP for Kx prediction.
Run parameter                        Value
Population size                      300
Max. generations                     500
Generations elapsed                  55
Input variables                      2
Training instances                   352
Tournament size                      15
Elite fraction                       0.3
Lexicographic selection pressure     On
Probability of Pareto tournament     0
Max. genes                           3
Max. tree depth                      4
Max. total nodes                     Inf
ERC probability                      0
Crossover probability                0.84 (high level 0.2, low level 0.8)
Mutation probabilities               0.14 (subtree 0.9, input 0.05, perturb ERC 0.05)
Complexity measure                   Expressional
Function set                         TIMES, MINUS, PLUS, RDIVIDE, SQUARE, TANH, EXP, LOG, MULT3, ADD3, SQRT, CUBE, NEGEXP, NEG, ABS
Comparisons of the predicted versus observed values of Kx are shown in Figs. 20 and 21, together with the error trends and error histograms for the training and test steps. These results clearly show that the developed equation predicts the observations with acceptable accuracy; the PO-MGGP is thus capable of estimating Kx accurately.
9. Conclusions
In this chapter, GEP-based techniques in hydroinformatics are considered, and different types of them, including GP, GEP, MGGP, tree-based GEP, and PO-MGGP, are introduced. Their applicability in different fields of hydraulics and hydroinformatics is discussed. The results of the case studies of GEP-based models confirmed the accuracy and suitability of GEP-based techniques in different aspects of hydraulics and hydrology. Function finding for a quartic polynomial, the explicit form of the Colebrook-White equation, the Shields diagram, river regime slope, depth, and width, and pollutant dispersion in natural rivers were studied, and the results confirmed the general ability of GEP in function finding. In GEP models, the number of combinations of functions, variables, numerical constants, and conditional expressions is very large, so the search space is vast. Also, taking into account the runtime of a single program (chromosome) and the need to test whether the program works appropriately, the runtime of GEP models can be considerable. Finally, the main challenges of GEP models can be summarized as:
● Ample search space of the problem (a large number of functions, variables, constants, operands, and their combinations).
● A large number of chromosomes in each generation, and the time required to calculate the fitness over all generations.
● Determination of chromosome fitness for different input combinations.
● The GA produces many generations, each with hundreds of individuals, requiring the evaluation of hundreds of thousands of chromosomes overall.
FIG. 20 Comparing observed Kx/HU* versus estimated PO-MGGP in the training phase.
It is worth mentioning that the performance and success of GEP depend mainly on the appropriate selection of operands, functions, and terminals in the parse trees; a proper choice improves the efficiency of the model and strongly affects its runtime. The main difference between GA and GEP lies in the concept of the chromosome and the final solution. In GA, the chromosomes are equal-length strings of numbers representing the solution of the problem. In GEP, a set of computer codes of the same or different sizes is produced as the final solution, and these computer codes are generated automatically as symbolic expressions of variables, functions, and operands.
FIG. 21 Comparing observed Kx/HU* versus estimated PO-MGGP in test phase.
References
Abraham, A., Nedjah, N., de Macedo Mourelle, L., 2006. Evolutionary computation: from genetic algorithms to genetic programming. In: Genetic Systems
Programming. Springer, Berlin, Heidelberg, Germany, pp. 1–20.
Alavi, A.H., Gandomi, A.H., 2012. Energy-based numerical models for assessment of soil liquefaction. Geosci. Front. 3 (4), 541–555.
Azamathulla, H.M., Ghani, A.A., 2010. Genetic programming to predict river pipeline scour. J. Pipeline Syst. Eng. Pract. 1 (3), 127–132.
Babovic, V., Keijzer, M., 2000. Genetic programming as a model induction engine. J. Hydroinf. 2 (1), 35–60.
Balf, M.R., Noori, R., Berndtsson, R., Ghaemi, A., Ghiasi, B., 2018. Evolutionary polynomial regression approach to predict longitudinal dispersion coefficient in rivers. J. Water Supply Res. Technol. AQUA 67 (5), 447–457.
Birbal, P., Azamathulla, H., Leon, L., Kumar, V., Hosein, J., 2021. Predictive modelling of the stage-discharge relationship using gene-expression programming. Water Supply 21 (7), 3503–3514.
Brameier, M., Banzhaf, W., Banzhaf, W., 2007. Linear Genetic Programming. Vol. 1 Springer, New York, USA.
Coello, C.A.C., Becerra, R.L., 2003. Evolutionary multiobjective optimization using a cultural algorithm. In: Proceedings of the 2003 IEEE Swarm Intelligence Symposium. SIS’03, Cat. No. 03EX706. IEEE, pp. 6–13.
Danandeh Mehr, A., Kahya, E., 2017. A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction. J. Hydrol.
549, 603–615.
240 Handbook of hydroinformatics
Dufek, A.S., Augusto, D.A., Dias, P.L., Barbosa, H.J., 2017. Application of evolutionary computation on ensemble forecast of quantitative precipitation.
Comput. Geosci. 106, 139–149.
Eslamian, S., Abedi-Koupai, J., Zareian, M.J., 2012. Measurement and modelling of the water requirement of some greenhouse crops with artificial neural
networks and genetic algorithm. Int. J. Hydrol. Sci. Technol. 2 (3), 237–251.
Ferreira, C., 2001. Gene expression programming: a new adaptive algorithm for solving problems. arXiv. preprint cs/0102027.
Ferreira, C., 2002a. Gene expression programming in problem solving. In: Soft Computing and Industry. Springer, London, UK, pp. 635–653.
Ferreira, C., 2002b. Mutation, transposition, and recombination: an analysis of the evolutionary dynamics. In: Proceedings of the 6th Joint Conference on
Information Sciences, 4th International Workshop on Frontiers in Evolutionary Algorithms, pp. 614–617.
Ferreira, C., 2003. Function finding and the creation of numerical constants in gene expression programming. In: Advances in Soft Computing. Springer,
London, UK, pp. 257–265.
Ferreira, C., 2006. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. vol. 21 Springer, Germany.
Gandomi, A.H., Alavi, A.H., 2012. Krill herd: a new bio-inspired optimization algorithm. Commun. Nonlinear Sci. Numer. Simul. 17 (12), 4831–4845.
Giustolisi, O., Savic, D.A., 2006. A symbolic data-driven technique based on evolutionary polynomial regression. J. Hydroinf. 8 (3), 207–222.
Guven, A., Azamathulla, H.M., Zakaria, N.A., 2009. Linear genetic programming for prediction of circular pile scour. Ocean Eng. 36 (12–13), 985–991.
Jin, S.S., 2020. Compositional kernel learning using tree-based genetic programming for Gaussian process regression. Struct. Multidiscip. Optim. 62,
1313–1351.
Kazemi, M.H., Majnooni-Heris, A., Kisi, O., Shiri, J., 2021. Generalized gene expression programming models for estimating reference evapotranspiration
through cross-station assessment and exogenous data supply. Environ. Sci. Pollut. Res. 28 (6), 6520–6532.
Keramatloo, M., Zahiri, A., Kordi, E., Ghorbani, K., Dehghani, A., 2020. Modeling of river water temperature using gene expression programming (case
study: MohammadAbad River in Golestan province). J. Water Soil Conserv. 27 (2), 237–244.
Khan, M., Tufail, M., Azamathulla, H.M., Ahmad, I., Muhammad, N., 2018. Genetic functions-based modelling for pier scour depth prediction in coarse
bed streams. Proc. Inst. Civil Eng. Water Manag. 171 (5), 225–240.
Koolivand-Salooki, M., Esfandyari, M., Rabbani, E., Koulivand, M., Azarmehr, A., 2017. Application of genetic programing technique for predicting
uniaxial compressive strength using reservoir formation properties. J. Petrol. Sci. Eng. 159, 35–48.
Koza, J.R., 1990. Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems. Vol. 34 Stanford
University, Department of Computer Science, USA, Stanford, CA.
Koza, J.R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. A Bradford Book. MIT Press.
Li, X., Zhou, C., Xiao, W., Nelson, P.C., 2005. Prefix gene expression programming. In: Proc. Genetic and Evolutionary Computation Conference, Washington, USA, pp. 25–31.
Najafzadeh, M., Barani, G.A., 2011. Comparison of group method of data handling based genetic programming and back propagation systems to predict
scour depth around bridge piers. Sci. Iranica 18 (6), 1207–1213.
Najafzadeh, M., Kargar, A.R., 2019. Gene-expression programming, evolutionary polynomial regression, and model tree to evaluate local scour depth at
culvert outlets. J. Pipeline Syst. Eng. Pract. 10 (3), 04019013.
Najafzadeh, M., Oliveto, G., 2021. More reliable predictions of clear-water scour depth at pile groups by robust artificial intelligence techniques while
preserving physical consistency. Soft. Comput. 25 (7), 5723–5746.
Najafzadeh, M., Laucelli, D.B., Zahiri, A., 2017. Application of model tree and evolutionary polynomial regression for evaluation of sediment transport in
pipes. KSCE J. Civ. Eng. 21 (5), 1956–1963.
Najafzadeh, M., Ghaemi, A., Emamgholizadeh, S., 2019. Prediction of water quality parameters using evolutionary computing-based formulations. Int.
J. Environ. Sci. Technol. 16 (10), 6377–6396.
Nezaratian, H., Zahiri, J., Peykani, M.F., Haghiabi, A., Parsaie, A., 2021. A genetic algorithm-based support vector machine to estimate the transverse
mixing coefficient in streams. Water Quality Res. J. 56 (3), 127–142.
Norouzi, H., Moghaddam, A.A., Celico, F., Shiri, J., 2021. Assessment of groundwater vulnerability using genetic algorithm and random forest methods
(case study: Miandoab plain, NW of Iran). Environ. Sci. Pollut. Res. 28, 1–16.
Parsaie, A., Haghiabi, A.H., 2021. Mathematical expression for discharge coefficient of weir-gate using soft computing techniques. J. Appl. Water Eng.
Res. 9 (3), 175–183.
Poli, R., Langdon, W.B., McPhee, N.F., Koza, J.R., 2008. A Field Guide to Genetic Programming. Springer, Germany.
Raiahi-Madvar, H., Seifi, A., 2016. Performance evaluation of gene expression programming approach in layout Design of Drippers in drip irrigation
systems comparing with empirical method. J. Water Soil Conserv. 23 (5), 25–45. https://doi.org/10.22069/jwfst.2017.9467.2359.
Ravansalar, M., Rajaee, T., Kisi, O., 2017. Wavelet-linear genetic programming: a new approach for modeling monthly streamflow. J. Hydrol. 549, 461–
475.
Riahi-Madvar, H., Ayyoubzadeh, S.A., Khadangi, E., Ebadzadeh, M.M., 2009. An expert system for predicting longitudinal dispersion coefficient in
natural streams by using ANFIS. Expert Syst. Appl. 36 (4), 8589–8596.
Riahi-Madvar, H., Dehghani, M., Seifi, A., Singh, V.P., 2019. Pareto optimal multigene genetic programming for prediction of longitudinal dispersion
coefficient. Water Resour. Manag. 33 (3), 905–921.
Riahi-Madvar, H., Gholami, M., Gharabaghi, B., Seyedian, S.M., 2021. A predictive equation for residual strength using a hybrid of subset selection of
maximum dissimilarity method with Pareto optimal multi-gene genetic programming. Geosci. Front. 12 (5), 101222.
Saljoughi, E., 2017. Application of genetic programming as a powerful tool for modeling of cellulose acetate membrane preparation. J. Textiles Polym. 5
(1), 1–7.
Gene expression models Chapter
13
241
Sattar, A.M.A, Gharabaghi, B., 2015. Gene expression models for prediction of longitudinal dispersion coefficient in streams. J. Hydrol. 524, 587–596.
Searson, D.P., 2009. GPTIPS: Genetic programming and symbolic regression for MATLAB. https://eprints.ncl.ac.uk/175261.
Searson, D.P., 2015. GPTIPS 2: an open-source software platform for symbolic data mining. In: Handbook of Genetic Programming Applications.
Springer, Cham, Switzerland, pp. 551–573.
Searson, D.P., Leahy, D.E., Willis, M.J., 2010, March. GPTIPS: An open source genetic programming toolbox for multigene symbolic regression. In:
Proceedings of the International Multiconference of Engineers and Computer Scientists. Vol. 1. Citeseer, USA, pp. 77–80.
Tinoco, R.O., Goldstein, E.B., Coco, G., 2015. A data-driven approach to develop physically sound predictors: application to depth-averaged velocities on
flows through submerged arrays of rigid cylinders. Water Resour. Res. 51 (2), 1247–1263.
Wu, W., 2007. Computational River Dynamics. CRC Press.
Yan, X., Mohammadian, A., Khelifa, A., 2021. Modeling spatial distribution of flow depth in fluvial systems using a hybrid two-dimensional hydraulic-multigene genetic programming approach. J. Hydrol. 600, 126517.
Zahiri, A., Shabani, M.A., 2018. Modeling of stage-discharge relationship in compound channels using multi-stage gene expression programming. Iranian
J. Ecohydrol. 5 (1), 37–48.
Chapter 14
Gradient-based optimization
Mohammad Zakwan
School of Technology, Maulana Azad National Urdu University, Hyderabad, India
Symbols
a, h, g    parameters of the Pearson three-parameter distribution
μC         average of the shifted log-transformed discharge data
ξ          parameter of the lognormal three-parameter (3P) distribution
σC         standard deviation of the log-transformed discharge data
A          watershed area in km²
b          coefficient of the sediment rating curve
c, d, e, f, g    empirical parameters
Fv         vegetal cover factor
I          channel inflow
K          constant in the Muskingum equation
p          weighted inflow parameter
P          rainfall term
Q          channel outflow
Qe         effective discharge
S          channel storage
SL         watershed slope
X          weighing ratio
1. Introduction
There are many applications of optimization techniques in water resource engineering (Datta and Harikrishna, 2005; Wang et al., 2009; Kisi et al., 2012; Bazaraa et al., 2013; Niazkar and Afzali, 2014; Asgari et al., 2016; Qin et al., 2018), and researchers have widely used various optimization techniques in this field (Eslamian and Lavaei, 2009; Bhattacharjya, 2011; Kisi et al., 2012). Starting from trial-and-error procedures, they have moved toward the application of evolutionary algorithms and gradient-based optimization methods (Hegazy and Ersahin, 2001; Haddad et al., 2006; Xu et al., 2012).
Evolutionary optimization techniques such as Genetic Algorithm (Mohan, 1997; Eslamian and Lavaei, 2009; Mondal
et al., 2010), Harmony Search (Kim et al., 2001), Particle Swarm Optimization (Chu and Chang, 2009), Ant Colony Algorithm (Afshar et al., 2015), Artificial Bee Colony (Kisi et al., 2012), Bat Algorithm (Ahmadianfar et al., 2016), Cuckoo
Search Algorithm (Meng et al., 2019), Differential Evolution (Xu et al., 2012), Genetic Programming (Mehr et al., 2018),
Gravitational Search Algorithm (Rashedi et al., 2009; Yazdani et al., 2014), Simulated Annealing (Wang et al., 2009),
teaching learning-based optimization (Rao et al., 2012), Honey Bee Mating Optimization (Haddad et al., 2006), Water
Cycle Algorithm (Sadollah et al., 2014), Modified Honey Bee Mating Optimization (Niazkar and Afzali, 2014), Firefly
Algorithm (Kisi et al., 2015) Weed Optimization Algorithm (Asgari et al., 2016), Gray Wolf Optimization Algorithm
(Choopan and Emami, 2019) and Harris Hawk Optimization (Tikhamarine et al., 2020) have been used in water resource
engineering. However, the efficiency of evolutionary algorithms depends largely on the tuning of their parameters, which is often cumbersome and requires greater computational effort and expertise. On the other hand, gradient-based optimization techniques are much simpler and can easily be used for solving optimization problems with differentiable convex objective functions and a convex constraint domain.
Fitting equations, parameter estimation and trade-off between the cost and benefit of various aspects of water resource
planning has remained an essential part of hydrology and hydraulics (Eslamian et al., 2000; Hegazy and Ersahin, 2001;
Zakwan, 2017; Yuan et al., 2020). In this regard, gradient-based optimization has largely helped water resource engineers,
hydraulicians and hydrologist. There are several evidences of the use of gradient-based optimization technique in water
resource engineering. Gupta and Sorooshian (1985) applied gradient-based optimization for rainfall runoff modeling.
Lall and Miller (1988) and Goy et al. (1989) employed gradient-based Berndt-Hall-Hall-Hausman (BHHH) algorithm
for fitting equations in the time series analysis based on the least square method. Peng and Buras (2000) employed
gradient-based optimization technique for developing optimal reservoir operation system. Jewell (2001) proposed the
application of gradient-based solver to teach the design of pipe flow network and computation of the flow area of open
channels. Geem (2006) proposed the application of gradient-based Broyden Fletcher Goldfarb Shanno (BFGS) technique
for determining the optimal parameter of Muskingum channel routing. Yeo and Guldmann (2010) applied hill climbing
algorithm for hydrological modeling of watershed to estimate the peak runoff. Bhattacharjya (2011) employed GRG optimization technique to model the groundwater inverse flow problem. Karahan (2009), Barati (2013), Hirpurkar and Ghare
(2014), and Zakwan and Muzzammil (2016) identified the optimal parameters of Muskingum flood routing equation based
on the GRG optimization. Ibtissem and Nouredine (2013) developed an algorithm based on conjugate gradient neural networks to address the nonlinearity in systems. Che et al. (2014) employed gradient-based optimization technique to
determine parameters of unit hydrograph. Wang and Brubaker (2015) employed gradient-based optimization to develop
a multi objective model for hydrologic modeling. Zakwan (2016a) used GRG optimization technique to obtain the
parameter of Intensity Duration Frequency (IDF) curves while Zakwan (2016b) utilized the same technique to fit the
rainfall runoff curve. Zakwan et al. (2016a,b) used the GRG optimization technique to estimate the parameters of various infiltration models and compare their performance. Muzzammil et al. (2015, 2018) and Zakwan et al. (2017)
applied gradient-based technique to identify the parameters of stage-discharge rating curves for different rivers. Cho
et al. (2017) applied Davidon Fletcher Powell (DFP) algorithm to estimate the parameters of the rainfall model and reported
it to be highly sensitive to initial values of decision variables. Niazkar and Afzali (2016, 2017) provided hybridized Honey
Bee Mating optimization with GRG technique to obtain a better solution for flood routing problems. Geem and Kim (2018)
applied the BFGS technique to propose an improved version of Manning equation. Zakwan (2018) applied GRG technique
to model the looped stage-discharge rating curve. Pandey et al. (2020) utilized the GRG technique to develop the
equation to estimate the scour depth. Samadi-koucheksaraee et al. (2019) made a novel attempt to apply Gradient Evolution
technique for optimal reservoir operation. Zakwan (2020) redefined maximum observed discharge and rainfall envelope
curves using the GRG technique. Niazkar and Zakwan (2021) applied the hybrid MGGP-GRG technique to model simple
and looped discharge ratings. Zakwan and Niazkar (2021) applied the hybrid MGGP-GRG technique to model the infiltration rate. Zakwan and Niazkar (2022)) also applied GRG technique to model the reverse flood routing problem.
Basically, gradient-based optimization techniques have been broadly used for solving nonlinear equations, estimation
of parameters and developing empirical equations in water resource engineering. In this regard, one example each of curve
fitting, solving nonlinear equation, and estimation of parameters is presented.
2. Materials and methods
Numerous applications of optimization techniques can be found in hydroinformatics. In the present chapter, three examples of gradient-based optimization techniques are discussed. Although the examples considered here are all optimization problems in one form or another, based on the form in which they appear in the hydrological literature they may be categorized as follows: solving nonlinear equations, parameter estimation, and development of empirical relations. As an example of solving nonlinear equations, the equations arising in the analytical estimation of effective discharge are considered. The parameters involved in Muskingum flood routing are estimated as an example of parameter estimation, while an empirical equation is formulated to estimate the mean annual discharge as an example of developing empirical relationships. Sediment discharge data of the Drava River are used for the analytical estimation of effective discharge, flood routing data available in Viessman and Lewis (2003) are used for the Muskingum channel routing, and the watershed characteristics of various Indian rivers are obtained from Garde and Kothyari (1990).
The gradient-based optimization technique selected in the present chapter is the Generalized Reduced Gradient (GRG) technique. The GRG optimization tool is available in many commonly used platforms such as Microsoft Excel, MATLAB, and Minitab; depending on suitability, users can apply the GRG technique to optimization problems through any of these platforms.
Woodbury et al. (2008) conducted a survey to assess the adaptability of undergraduate engineering students to various software platforms and found that the majority are familiar with Microsoft Excel but have little knowledge of other software platforms; therefore, the author has chosen to adopt the GRG optimization technique available in Microsoft Excel.
Microsoft Excel contains three solvers: the GRG solver, the Linear Programming (LP) solver, and the Evolutionary solver. The LP solver handles linear problems, while the Evolutionary solver is based on an evolutionary algorithm.
2.1 GRG solver
The GRG solver comes as an add-in tool with Microsoft Excel. The GRG solver algorithm was originally written in FORTRAN. The GRG solver program is a combination of a main program and numerous subprograms such as GRG, CONSBS, SEARCH, DEGEN, DIREC, REDOBJ, REDGRA, NEWTON, PARSH, and GCOMP (Lasdon et al., 1978). In this code, DATAIN reads the input, GRG works out the problem, and the results are printed by OUTRES (Lasdon and Smith, 1992). The GRG optimization method is a deterministic optimization approach, as it does not make use of random sampling and relies on gradient information. The search direction of the GRG solver is automatically governed by either the quasi-Newton method or the conjugate gradient method (Zakwan et al., 2017). The quasi-Newton method is computationally more intensive than the conjugate gradient method because it stores the Hessian matrix at each iteration (Barati, 2013; Hirpurkar and Ghare, 2014). A Multistart option is also available in the GRG solver, which tries multiple starting points to reduce the chance of getting trapped in a local optimum. Users can also select between the forward finite difference method and the central finite difference method to calculate the gradient. For applying the Excel Solver to nonlinear programming problems, the problems can be modeled using an Excel spreadsheet or through C and C++ programming.
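For readers working outside Excel, the same ideas (bounded gradient-based search with finite-difference gradients and multiple starting points) can be sketched, for example, with SciPy's L-BFGS-B method; this is an illustrative stand-in with an assumed toy objective, not the GRG algorithm itself:

import numpy as np
from scipy.optimize import minimize

def objective(v):
    """Assumed toy objective with several local minima."""
    x, y = v
    return (x - 2)**2 + (y + 1)**2 + 2 * np.sin(3 * x) * np.cos(3 * y)

bounds = [(-5, 5), (-5, 5)]
rng = np.random.default_rng(0)

# Simple multistart loop: gradients are approximated by finite differences
best = None
for _ in range(20):
    x0 = rng.uniform([-5, -5], [5, 5])
    res = minimize(objective, x0, method='L-BFGS-B', bounds=bounds)
    if best is None or res.fun < best.fun:
        best = res
print(best.x, best.fun)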
3. Results and discussion
In the present chapter, three applications of gradient-based optimization techniques are discussed; their details can be found in the following subsections. In general, optimization refers to attaining a certain target (objective function) under specific conditions (constraints) by adjusting some variables (decision variables). Therefore, the basic step in applying any optimization software and formulating an optimization problem is to recognize these components (objective function, constraints, and decision variables).
3.1 Solving nonlinear equations
As an example of applying gradient-based optimization to solve nonlinear equations, the nonlinear equations produced in the analytical estimation of effective discharge are considered. Effective discharge is defined as the discharge that transports the maximum sediment load in a river over a period of time and is often considered the discharge responsible for shaping the river (Zakwan et al., 2018). Apart from other applications, the computation of effective discharge plays a major role in the design of hydraulic projects across the river cross section. Analytically, effective discharge is computed at the peak of the curve obtained from the product of the sediment rating curve and the frequency distribution fitted to the discharge time series. If the lognormal three-parameter (3P) and Pearson three-parameter distributions are fitted to the discharge time series, the expressions for the effective discharge computation are given by Eqs. (1) and (2), respectively.
μC + (b - 1 - bξ/Qe)σC² - ln(Qe - ξ) = 0     (1)

aQe² - (h - 1 + b + ag)Qe + bg = 0     (2)
Fig. 1 shows the setup for the estimation of effective discharge based on the lognormal three-parameter distribution. In Fig. 1, Column B shows the parameters of the lognormal (3P) distribution and the rating curve parameters obtained by fitting the lognormal (3P) distribution and the sediment rating curve, respectively. Eq. (1) is set as the target or objective function (cell F3, shown in green in Fig. 1). The decision variable in Eq. (1) is obviously the effective discharge (Qe). Cell D4 (red) represents the decision variable, while cells D6 (brown) and D8 (gray) are the upper and lower bounds on the decision variable, respectively. The graphical user interface for the GRG technique is also shown in Fig. 1. In this interface, the target or objective is chosen as cell F3 and is targeted to a value of 0, as can be seen on the right-hand side (R.H.S.) of Eq. (1). The changing (decision) variable is set as cell D4, and the constraints are set as the lower and upper bounds on the decision variable. The equation was solved with an initial guess of 1000 m³/s for the effective discharge, and the optimal value of the effective discharge was obtained as 524.75 m³/s, with the target cell almost equal to zero. Similarly, Eq. (2) was formulated to obtain the analytical estimate of effective discharge based on the Pearson three-parameter frequency distribution, and the effective discharge was obtained as 514.32 m³/s.
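The same calculation can be sketched outside Excel with a bounded gradient-based optimizer; the distribution and rating-curve parameters below are hypothetical placeholders, not the Drava River values used in the chapter:

import numpy as np
from scipy.optimize import minimize

# Hypothetical LN3P and rating-curve parameters (mu_c, sigma_c, xi, b) for illustration only
mu_c, sigma_c, xi, b = 5.8, 0.6, 50.0, 1.8

def eq1(qe):
    """Left-hand side of Eq. (1); it equals zero at the effective discharge."""
    return mu_c + (b - 1.0 - b * xi / qe) * sigma_c**2 - np.log(qe - xi)

# Mimic the spreadsheet setup: drive Eq. (1) to zero by minimizing its square
# with a bounded gradient-based method and an initial guess of 1000 m3/s
res = minimize(lambda q: eq1(q[0])**2, x0=[1000.0],
               method='L-BFGS-B', bounds=[(xi + 1.0, 1e5)])
print(res.x[0], eq1(res.x[0]))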
FIG. 1 Set up showing analytical estimation of effective discharge.
3.2 Application in parameter estimation
The application of gradient-based optimization to estimate the parameters of hydrologic models is very common. Here we consider an example of the Muskingum hydrologic channel routing model with weighted inflow. Flood routing is an important aspect of the design of flood protection works (Ara and Zakwan, 2018): it is a technique to determine the time and magnitude of the flood at a river section from known hydrographs at one or more upstream sections. Various forms of Muskingum hydrologic channel routing are available in the literature. Several hydraulicians have observed that a weighted inflow from the previous and current time steps provides better estimates of the change in channel storage; therefore, a parameter p was introduced to compute the weighted inflow (Vatankhah, 2017). However, the introduction of parameter p makes the routing equation more complicated, eliminating any chance of using a trial-and-error procedure. The nonlinear Muskingum model with provision for weighted inflow may be presented as
St = K[X·Wt + (1 - X)·Qt]^m     (3)
where Wt = p·It + (1 - p)·It-1     (4)
Eqs. (3) and (4) represent a four-parameter (K, X, p, and m) model. The change in storage and the outflow for this model may be computed as
ΔSt/Δt = -[1/(1 - X)](St/K)^(1/m) + [1/(1 - X)]Wt     (5)

Qt = [1/(1 - X)](St/K)^(1/m) - [X/(1 - X)]Wt     (6)
Fig. 2 shows the modeled weighted Muskingum channel routing equation. Columns A, B, and C contain the observed data acting as the input for the present problem. Column D represents the storage at different time steps, calculated in accordance with Eq. (3). Column E represents the change in storage with respect to time, calculated in accordance with Eq. (5), while Column F represents the change in storage; the estimated outflow is calculated in accordance with Eq. (6). In Fig. 2, cells I3, J3, K3, and L3 (red) are the decision variables, while cells I5, J5, K5, L5 (brown) and I7, J7, K7, L7 (gray) represent the lower and upper bounds on the decision variables. Cell N5 holds the sum of squared errors as the target or objective function.
FIG. 2 Set up showing estimation of parameters of Muskingum equation.
To obtain the optimal values of the decision variables (K, X, m, and p), the target cell was set to minimization, as shown in the GUI in Fig. 2.
The outflow hydrograph obtained from the present analysis is shown in Fig. 3. It may be observed that the outflows obtained from the weighted Muskingum equation are in good agreement with the observed outflows, and the estimated hydrograph reproduces the peak outflows quite accurately. The observed outflows at the first and second peaks are 1509 m³/s and 1248 m³/s, respectively, while the estimated outflows are 1470 m³/s and 1245 m³/s, respectively.
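A compact sketch of the same parameter-estimation idea with a gradient-based optimizer; the hydrograph values below are synthetic and purely illustrative (the chapter's data come from Viessman and Lewis, 2003), and the initial storage is an assumption:

import numpy as np
from scipy.optimize import minimize

def route(params, inflow, dt=1.0):
    """Route an inflow hydrograph with the four-parameter weighted
    Muskingum model of Eqs. (3)-(6) and return the outflow series."""
    K, X, m, p = params
    outflow = np.zeros_like(inflow)
    outflow[0] = inflow[0]
    S = K * inflow[0]**m                                      # assumed initial storage
    for t in range(1, len(inflow)):
        W = p * inflow[t] + (1 - p) * inflow[t - 1]           # Eq. (4)
        Q = ((S / K)**(1 / m) - X * W) / (1 - X)              # Eq. (6)
        dS = (W - (S / K)**(1 / m)) / (1 - X)                 # Eq. (5)
        S = S + dS * dt
        outflow[t] = max(Q, 0.0)
    return outflow

inflow   = np.array([100., 300., 700., 1000., 800., 500., 300., 200., 150., 120.])
observed = np.array([100., 150., 350., 700., 900., 750., 500., 330., 230., 170.])

sse = lambda par: np.sum((route(par, inflow) - observed)**2)
res = minimize(sse, x0=[1.0, 0.2, 1.2, 0.5], method='L-BFGS-B',
               bounds=[(0.1, 5), (0.01, 0.49), (1.0, 2.5), (0.0, 1.0)])
print(res.x, res.fun)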
FIG. 3 Observed and estimated outflow from the Muskingum equation.
3.3 Fitting empirical equations
Another application of optimization tools in this field is the development of empirical equations, mostly on a regional basis. Several hydrologic phenomena are too complex to be measured exactly, and the governing factors of many of them depend strongly on watershed characteristics. In such cases, empirical equations are developed taking the regional watershed characteristics into account. Here we consider the example of estimating the discharge of 2.33-year return interval, Q2.33 (the mean annual flood in the case of the Gumbel distribution), for Indian watersheds. It was observed that for Indian watersheds
Q2.33 = f(A, P, SL, Fv)     (7)
So, the empirical equation will be of the form
Q2.33 = c·A^d·P^e·SL^f·Fv^g     (8)
The empirical parameters in Eq. (8) can be estimated by minimizing the sum of squared errors, as shown in Eq. (9):
Min SSE = Σ_{i=1}^{N} [Q2.33,i - c·A_i^d·P_i^e·SL,i^f·Fv,i^g]²     (9)
After determining the parameters, mean annual flood can be estimated based on the watershed characteristics.
The setup for establishing the empirical relationship is shown in Fig. 4, where the watershed characteristics (input data) are given in Columns B to E, while Column A represents the mean annual discharge as per the Gumbel distribution. Column F represents the modeled equation, i.e., Eq. (8). Cells H3, I3, J3, K3, L3 (red) are the decision variables, while H5, I5, J5, K5, L5 (brown) and H7, I7, J7, K7, L7 (gray) represent the lower and upper bounds on the decision variables. The sum of squared errors was set as the target or objective function (cell N3, shown in green in Fig. 4). To obtain the optimal values of the decision variables, the target cell was set to minimization, as shown in the GUI in Fig. 4.
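An equivalent fit outside the spreadsheet could be sketched as follows; the watershed records here are placeholder values, not the Garde and Kothyari (1990) data:

import numpy as np
from scipy.optimize import minimize

# Placeholder watershed data: A (km2), P, SL, Fv, and observed Q2.33
A  = np.array([120., 450., 800., 1500., 2600.])
P  = np.array([900., 1100., 1300., 1000., 1200.])
SL = np.array([0.010, 0.008, 0.006, 0.004, 0.003])
Fv = np.array([0.6, 0.5, 0.7, 0.4, 0.5])
Q  = np.array([85., 260., 420., 610., 950.])

def sse(par):
    """Sum of squared errors of Eq. (9) for parameters (c, d, e, f, g)."""
    c, d, e, f, g = par
    q_hat = c * A**d * P**e * SL**f * Fv**g          # Eq. (8)
    return np.sum((Q - q_hat)**2)

res = minimize(sse, x0=[0.01, 0.8, 0.5, 0.2, 0.3], method='L-BFGS-B',
               bounds=[(1e-6, 10), (0, 2), (-2, 2), (-2, 2), (-2, 2)])
print(res.x, res.fun)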
In most gradient-based methods, initial values of the decision variables must be supplied by the user, and these methods are highly sensitive to those initial values (Cho et al., 2017); evolutionary algorithms, on the other hand, are computationally expensive. Since gradient-based optimization follows the direction of the gradient (slope), these techniques may become trapped in a local optimum rather than finding the global optimum. Users should therefore try a fairly diverse and large number of initial guesses of the decision variables, covering the entire search space, to ascertain the global optimum solution. The Multistart option available in the GUI of the GRG Solver add-in of Microsoft Excel may help, as it automatically starts from different initial values of the decision variables, thereby increasing the chances of finding the global optimum solution.
FIG. 4 Set up for developing empirical equation.
4. Conclusions
Modeling of hydrologic events is often very complex and requires the application of nonlinear optimization techniques. Historically, such problems were dealt with through trial-and-error methods; however, with the advancement of programming techniques, several optimization techniques have become common. Both gradient-based optimization techniques and evolutionary techniques have gained considerable attention in hydrology and hydroinformatics.
The present chapter focuses on the application of gradient-based techniques in hydroinformatics. Over the years, several researchers have implemented gradient-based techniques to model various hydrologic events and have shown that these techniques are capable of successfully modeling them. In the present chapter, the gradient-based GRG optimization technique has been applied to obtain (i) the analytical solution of effective discharge based on the lognormal three-parameter (3P) and Pearson three-parameter frequency distributions, (ii) the optimal parameters of the Muskingum channel routing equation with weighted inflow, and (iii) an empirical equation to estimate the mean annual flood discharge for Indian watersheds.
However, gradient-based optimization techniques are sensitive to the initial values of the parameters (decision variables); in this regard, the results should be rigorously checked against local optimum solutions by considering multiple initial values of the decision variables covering the entire search space. When gradient-based techniques are rerun with multiple initial values of the decision variables, the chances of obtaining the global optimum solution increase. Hybrids of gradient-based and evolutionary algorithms can provide global optimal solutions with reduced computational expense, thereby opening the scope for future research and application in hydroinformatics.
References
Afshar, A., Massoumi, F., Afshar, A., Mariño, M.A., 2015. State of the art review of ant colony optimization applications in water resource management.
Water Resour. Manag. 29 (11), 3891–3904.
Ahmadianfar, I., Adib, A., Salarijazi, M., 2016. Optimizing multi reservoir operation: hybrid of bat algorithm and differential evolution. J. Water Resour.
Plan. Manag. 142 (2), 05015010.
Ara, Z., Zakwan, M., 2018. Estimating runoff using SCS curve number method. Int. J. Emerg. Technol. Adv. Eng. 8, 195–200.
Asgari, H.R., Bozorg Haddad, O., Pazoki, M., Loáiciga, H.A., 2016. Weed optimization algorithm for optimal reservoir operation. J. Irrig. Drain. Eng.
142 (2), 04015055.
Barati, R., 2013. Application of excel solver for parameter estimation of the nonlinear Muskingum models. KSCE J. Civ. Eng. 17 (5), 1139–1148.
Bazaraa, M.S., Sherali, H.D., Shetty, C.M., 2013. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons.
Bhattacharjya, R.K., 2011. Solving groundwater flow inverse problem using spreadsheet solver. J. Hydrol. Eng. 16, 472–477. https://doi.org/10.1061/
(ASCE)HE.1943-5584.0000329.
Che, D., Nangare, M., Mays, L.W., 2014. Determination of optimal unit hydrographs and green-ampt parameters for watersheds. J. Hydrol. Eng. 19 (2),
375–383.
Cho, H., Lee, K.E., Kim, G., 2017. Analysis of the applicability of parameter estimation methods for a stochastic rainfall generation model. J. Korean Data
Inf. Sci. Soc. 28 (6), 1447–1456.
Choopan, Y., Emami, S., 2019. Optimal operation of dam reservoir using gray wolf optimizer algorithm (Case study: Urmia Shaharchay dam in Iran).
J. Soft Comput. Civ. Eng. 3 (3), 47–61.
Chu, H.J., Chang, L.C., 2009. Applying particle swarm optimization to parameter estimation of the nonlinear Muskingum model. J. Hydrol. Eng. 14 (9),
1024–1027.
Datta, B., Harikrishna, V., 2005. Optimization applications in water resources systems engineering. Res. J. IIT Kanpur, 57–64.
Eslamian, S., Lavaei, N., 2009. Modelling nitrate pollution of groundwater using artificial neural network and genetic algorithm in an arid zone. Int.
J. Water 5 (2), 194–203.
Eslamian, S.S., Salimi, V., Chavoshi, S., 2000. Developing an empirical model for the estimation of peak discharge in some catchments in Western Iran.
J. Sci. Technol. Agric. Nat. Resour. 4 (2), 1–12.
Garde, R.J., Kothyari, U.C., 1990. Flood estimation in Indian catchments. J. Hydrol. 113 (1–4), 135–146.
Geem, Z.W., 2006. Parameter estimation for the non-linear Muskingum model using the BFGS technique. J. Irrig. Drain. Eng. 132 (5), 474–478.
Geem, Z.W., Kim, J.H., 2018. Application of computational intelligence techniques to an environmental flow formula. Int. J. Fuzzy Log. Intell. Syst. 18,
237–244.
Goy, J., Morand, P., Etienne, M., 1989. Long-term fluctuations of Pelagia noctiluca (Cnidaria, Scyphomedusa) in the western Mediterranean Sea. Prediction by climatic variables. Deep Sea Res. Part A 36 (2), 269–279.
Gupta, V.K., Sorooshian, S., 1985. The automatic calibration of conceptual catchment models using derivative-based optimization algorithms.
Water Resour. Res. 21 (4), 473–485.
Haddad, O.B., Afshar, A., Marino, M.A., 2006. Honey-bees mating optimization (HBMO) algorithm: a new heuristic approach for water resources optimization. Water Resour. Manag. 20 (5), 661–680.
Hegazy, T., Ersahin, T., 2001. Simplified spreadsheet solution overall construction optimization. J. Constr. Eng. Manag. 127 (6), 469–475.
Hirpurkar, P., Ghare, A.D., 2014. Parameter estimation for the nonlinear forms of the Muskingum model. J. Hydrol. Eng. 20 (8), 04014085.
Ibtissem, C., Nouredine, L., 2013, March. A hybrid method based on conjugate gradient trained neural network and differential evolution for non linear
systems identification. In: 2013 International Conference on Electrical Engineering and Software Applications. IEEE, pp. 1–5.
Jewell, T.K., 2001. Teaching hydraulic design using equation solvers. J. Hydraul. Eng. 127 (12), 1013–1021.
Karahan, H., 2009. Predicting Muskingum Flood Routing Parameters Using Spreadsheet. Wiley Periodicals Inc, pp. 280–286.
Kim, J.H., Geem, Z.W., Kim, E.S., 2001. Parameter estimation of the nonlinear Muskingum model using harmony search. J. Am. Water Resour. Assoc.
37 (5), 1131–1138.
Kisi, O., Ozkan, C., Akay, B., 2012. Modeling discharge–sediment relationship using neural networks with artificial bee colony algorithm. J. Hydrol. 428,
94–103.
Kisi, O., Shiri, J., Karimi, S., Shamshirband, S., Motamedi, S., Petkovic, D., Hashim, R., 2015. A survey of water level fluctuation predicting in Urmia Lake
using support vector machine with firefly algorithm. Appl. Math Comput. 270, 731–743.
Lall, U., Miller, C.W., 1988. An optimization model for screening multipurpose reservoir systems. Water Resour. Res. 24 (7), 953–968.
Lasdon, L.S., Smith, S., 1992. Solving sparse nonlinear programs using GRG. ORSA J. Comput. 4 (1), 2–15.
Lasdon, L.S., Waren, A.D., Jain, A., Ratner, M., 1978. Design and testing of a generalized reduced gradient code for nonlinear programming. ACM Trans.
Math. Softw. 4 (1), 34–50.
Mehr, A.D., Nourani, V., Kahya, E., Hrnjica, B., Sattar, A.M., Yaseen, Z.M., 2018. Genetic programming in water resources engineering: a state-of-the-art
review. J. Hydrol. 566, 643–667.
Meng, X., Chang, J., Wang, X., Wang, Y., 2019. Multi-objective hydropower station operation using an improved cuckoo search algorithm. Energy 168,
425–439.
Mohan, S., 1997. Parameter estimation of nonlinear Muskingum models using genetic algorithm. J. Hydraul. Eng. 123 (2), 137–142.
Mondal, A., Eldho, T.I., Rao, V.G., 2010. Multiobjective groundwater remediation system design using coupled finite-element model and non-dominated
sorting genetic algorithm II. J. Hydrol. Eng. 15 (5), 350–359.
Muzzammil, M., Alam, J., Zakwan, M., 2015. An optimization technique for estimation of rating curve parameters. In: Symposium on Hydrology, New
Delhi, India. Indian Association of Hydrologists (IAH), Roorkee, pp. 234–240.
Muzzammil, M., Alam, J., Zakwan, M., 2018. A spreadsheet approach for prediction of rating curve parameters. In: Hydrologic Modeling. Springer,
Singapore, pp. 525–533, https://doi.org/10.1007/978-981-10-5801-1_36.
Niazkar, M., Afzali, S.H., 2014. Assessment of modified honey bee mating optimization for parameter estimation of nonlinear Muskingum models.
J. Hydrol. Eng. 20 (4), 04014055.
Niazkar, M., Afzali, S.H., 2016. Application of new hybrid optimization technique for parameter estimation of new improved version of Muskingum
model. Water Resour. Manag. 30 (13), 4713–4730.
Niazkar, M., Afzali, S.H., 2017. Parameter estimation of an improved nonlinear Muskingum model using a new hybrid method. Hydrol. Res. 48 (5),
1253–1267.
Niazkar, M., Zakwan, M., 2021. Assessment of artificial intelligence models for developing single-value and loop rating curves. Complexity. https://doi.
org/10.1155/2021/6627011.
Pandey, M., Zakwan, M., Sharma, P.K., Ahmad, Z., 2020. Multiple linear regression and genetic algorithm approaches to predict temporal scour depth near
circular pier in non-cohesive sediment. ISH J. Hydraul. Eng. 26 (1), 96–103. https://doi.org/10.1080/09715010.2018.1457455.
Peng, C.S., Buras, N., 2000. Dynamic operation of a surface water resources system. Water Resour. Res. 36 (9), 2701–2709.
Qin, Y., Kavetski, D., Kuczera, G., 2018. A Robust Gauss-Newton algorithm for the optimization of hydrological models: benchmarking against
industry-standard algorithms. Water Resour. Res. 54 (11), 9637–9654.
Rao, R.V., Savsani, V.J., Balic, J., 2012. Teaching–learning-based optimization algorithm for unconstrained and constrained real-parameter optimization
problems. Eng. Optim. 44 (12), 1447–1462.
Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S., 2009. GSA: a gravitational search algorithm. Inform. Sci. 179 (13), 2232–2248.
Sadollah, A., Yoo, D.G., Yazdi, J., Kim, J.H., Choi, Y., 2014. Application of water cycle algorithm for optimal cost design of water distribution systems. In:
11th International Conference on Hydroinformatics, New York City, USA.
Samadi-koucheksaraee, A., Ahmadianfar, I., Bozorg-Haddad, O., Asghari-pari, S.A., 2019. Gradient evolution optimization algorithm to optimize reservoir operation systems. Water Resour. Manag. 33 (2), 603–625.
Tikhamarine, Y., Souag-Gamane, D., Ahmed, A.N., Sammen, S.S., Kisi, O., Huang, Y.F., El-Shafie, A., 2020. Rainfall-runoff modelling using improved
machine learning methods: Harris hawks optimizer vs. particle swarm optimization. J. Hydrol., 125133.
Vatankhah, A.R., 2017. Non-linear Muskingum model with inflow-based exponent. Proc. Inst. Civ. Eng. Water Manage. 170 (2), 66–80.
Viessman, W., Lewis, G.L., 2003. Introduction to Hydrology, fifth ed. Pearson, New Delhi, India.
Wang, Y., Brubaker, K., 2015. Multi-objective model auto-calibration and reduced parameterization: exploiting gradient-based optimization tool for a
hydrologic model. Environ. Model. Softw. 70, 1–15.
Wang, X., Sun, Y., Song, L., Mei, C., 2009. An eco-environmental water demand based model for optimising water resources using hybrid genetic simulated annealing algorithms. Part I. Model development. J. Environ. Manage. 90 (8), 2628–2635.
Woodbury, K.A., Taylor, R.P., Huguet, J., Dent, T., Chappell, J., Mahan, K., 2008. Vertical integration of excel in the thermal mechanical engineering
curriculum. In: ASME 2008 International Mechanical Engineering Congress and Exposition, pp. 317–325.
Xu, D., Qui, L., Chen, S., 2012. Estimation of nonlinear Muskingum model parameter using differential evolution. J. Hydrol. Eng. 17 (2), 348–353.
Yazdani, S., Nezamabadi-pour, H., Kamyab, S., 2014. A gravitational search algorithm for multimodal optimization. Swarm Evol. Comput. 14, 1–14.
Yeo, I.Y., Guldmann, J.M., 2010. Global spatial optimization with hydrological systems simulation: application to land-use allocation and peak runoff
minimization. Hydrol. Earth Syst. Sci. 14 (2). https://doi.org/10.5194/hess-14-325-2010.
Yuan, G., Wang, X., Sheng, Z., 2020. The projection technique for two open problems of unconstrained optimization problems. J. Optim. Theory Appl.,
1–30.
Zakwan, M., 2016a. Application of optimization technique to estimate IDF parameters. Water Energy Int. 59 (5), 69–71.
Zakwan, M., 2016b. Estimation of runoff using optimization technique. Water Energy Int. 59 (8), 42–44.
Zakwan, M., 2017. Assessment of dimensionless form of Kostiakov model. Aquademia 1 (1), 01. https://doi.org/10.20897/awet.201701.
Zakwan, M., 2018. Spreadsheet-based modelling of hysteresis-affected curves. Appl. Water Sci. 8 (4), 101–105. https://doi.org/10.1007/s13201-018-0745-3.
Zakwan, M., 2020. Revisiting maximum observed precipitation and discharge envelope curves. Int. J. Hydrol. Sci. Technol. 10 (3), 221–229. https://doi.
org/10.1504/IJHST.2020.107215.
Zakwan, M., Muzzammil, M., 2016. Optimization approach for hydrologic channel routing. Water Energy Int. 59 (3), 66–69.
Zakwan, M., Muzzammil, M., Alam, J., 2016a. Estimation of soil properties using infiltration data. In: Proceeding National Conference on Advances in
Geotechnical Engineering, Aligarh, pp. 198–201.
Zakwan, M., Muzzammil, M., Alam, J., 2016b. Application of spreadsheet to estimate infiltration parameters. Perspect. Sci. 8, 702–704. https://doi.org/
10.1016/j.pisc.2016.06.064.
Zakwan, M., Muzzammil, M., Alam, J., 2017. Developing stage-discharge relations using optimization techniques. Aquademia 1 (2), 05. https://doi.org/
10.20897/awet.201702.
Zakwan, M., Ahmad, Z., Sharief, S.M.V., 2018. Magnitude-frequency analysis for suspended sediment transport in the Ganga River. J. Hydrol. Eng. 23 (7),
05018013. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001671.
Zakwan, M., Niazkar, M., 2021. A comparative analysis of data-driven empirical and artificial intelligence models for estimating infiltration rates.
Complexity 2021, 9945218. https://doi.org/10.1155/2021/9945218.
Zakwan, M., Niazkar, M., 2022. Discussion of “Reverse Flood Routing in Rivers Using Linear and Nonlinear Muskingum Models” by Meisam Badfar,
Reza Barati, Emrah Dogan, and Gokmen Tayfur. J. Hydrol. Eng. 27 (5), 07022001.
Chapter 15
Gray wolf optimization algorithm
Mohammad Reza Zaghiyan a, Vahid Shokri Kuchak a, and Saeid Eslamian b,c
a Department of Water Engineering and Management, Tarbiat Modares University, Tehran, Iran; b Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; c Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Today, more than ever, water is considered one of the three factors in the formation and survival of the environment (along with soil and air). Undoubtedly, the preservation and protection of water resources and their optimal use are global issues, and water crises are therefore described as a pervasive human challenge of the 21st century (Damania et al., 2019). Increased
water demand in various sectors, pollution of water resources, climate change, and human activities can be considered the
leading causes of water stress. The emphasis on optimal water resources management and its sustainable development is
essential to deal with this type of stress. In this regard, optimization or, in other words, optimal use of available water
resources according to the associated constraints is one of the most fundamental steps in water resources management.
The first step in formulating macro water resources management policies is to propose different options according to the
limitations and comprehensive water resources development and management goals. Optimization models are an efficient
tool given the dimensions and complexities of water resource systems. However, uncertainties always affect their results
(Loucks and Van Beek, 2017). Today, with the development of information technology, new flexible tools have been
created. Their combination with optimization models has provided a new space for developing analysis, planning, and
management of water resources systems (Tayfur, 2017). Furthermore, the development of these methods can significantly
improve dealing with uncertainties.
Selecting the set of decision variables that maximizes or minimizes the objective function subject to the system constraints is called the optimization procedure (Simonovic, 2012). In other words, the goal of optimization is to find the best acceptable solution given the limitations and needs of the problem. Optimization algorithms are generally divided into two categories: exact and approximate algorithms. Exact algorithms, which are mathematical methods, include linear programming (LP), non-linear programming (NLP), gradient-based methods, gradient-free methods, etc. (Yang, 2010). Although exact algorithms can, in principle, find the definitive global optimum, in NP-hard (nondeterministic polynomial time) optimization problems they often cannot, because of the constraints and high dimensionality of the system, and their execution time increases exponentially. Approximate optimization algorithms, on the other hand, can find suitable solutions (close to the global optimum) in a shorter time for NP-hard problems. In other words, these algorithms are not guaranteed to reach the definitive global answer, but they are very useful for problems with a large number of decision variables and strict constraints.
Approximate algorithms are further classified into two categories: heuristic and meta-heuristic algorithms. Heuristic methods are problem-dependent techniques; in other words, they are adapted to the problem and try to make the most of its features. However, the greediness of such techniques in seeking the optimal solution causes them to become trapped in local optima, so the global optimal solution remains unknown (Sharma and Kaur, 2021). The SUFI-2 (sequential uncertainty fitting) algorithm in SWAT-CUP (SWAT calibration and uncertainty procedures) is an example of this type of technique. As issues and problems in water resources systems became more complex, including the allocation of water resources, optimization algorithms gradually improved, and the use of meta-heuristic methods surpassed other techniques. Meta-heuristic algorithms are problem-independent techniques inspired by nature or a natural rhythm and applied to solve optimization problems by converting them into mathematical equations. Meta-heuristic techniques are not restricted to a specific problem, and management of the search process while approaching the optimal solution is one of their most important features. In other words, by expanding the search scope, these methods choose the shortest path to reach the global optimal point and minimize the possibility of getting trapped in local optima (Oliva et al., 2019). Meta-heuristic algorithms are also classified into several categories, including evolutionary-based algorithms such as the Genetic Algorithm (Bonabeau et al., 1999), physics-based algorithms such as Simulated Annealing (Kirkpatrick et al., 1983), and swarm intelligence algorithms such as Ant Colony Optimization (Dorigo et al., 2006).
The gray wolf optimizer (GWO) is a meta-heuristic algorithm in the category of swarm intelligence, population-based algorithms. The algorithm was developed by Mirjalili et al. (2014) and is inspired by the strict social dominance hierarchy and the hunting behavior of gray wolves. The GWO algorithm has been applied in various fields of water resources studies, including water allocation optimization (Yu and Lu, 2018), optimal reservoir operation (Dahmani and Yebdri, 2020), soil properties (Mosavi et al., 2021), reference evapotranspiration (ET) estimation (Tikhamarine et al., 2019), streamflow forecasting (Tikhamarine et al., 2020), and groundwater studies (Majumder and Eldho, 2020). The complete theory and mathematical modeling of the GWO algorithm are discussed in the following sections. At the end of this chapter, an optimization example solved by the GWO algorithm in the MATLAB platform is presented.
2. Theory of GWO
The gray wolf (Canis lupus), also called the timber wolf (Fig. 1), is the most prominent wild member of the dog family (Canidae). It inhabits vast areas of the Northern Hemisphere. Different subspecies of the gray wolf are known around the world: for example, 5–24 subspecies are recognized in North America, 7–12 in Eurasia, and 1 in Africa.
In nature, the gray wolf prefers group life in packs of 5 to 12 individuals on average. Its main prey are large hoofed animals such as wild sheep, wild goats, and deer, and gray wolves sit at the top of the food chain and pyramid. The interesting point about this animal, briefly mentioned in the previous section, is that its life follows an exact, highly orderly social hierarchy, as shown in Fig. 2.
The leaders are a male and a female, called alphas, and are responsible for making decisions about hunting, the sleeping location, the waking time, etc. The alpha's decisions are dictated to the whole pack, and the other wolves acknowledge them by holding their tails down. Only the alpha wolves are allowed to mate within the pack. It should be noted that the alpha is not necessarily the strongest member of the group but rather the best member in terms of managing the pack; in other words, the discipline and organization of a pack are far more important than its strength.
FIG. 1 Gray wolf.
FIG. 2 Gray wolf social hierarchy.
The second level of the gray wolf hierarchy is the beta. The beta wolves are subject to the alpha's decisions and help with other group activities. Beta wolves can be male or female, and they are the best candidates to replace an alpha that grows old or dies. A beta must respect the alpha and, at the same time, command the lower-level wolves. This type of wolf acts as a consultant to the alpha and a helper for the group: the beta reinforces the alpha's commands throughout the pack and gives feedback to the alpha.
The lowest level belongs to the omega wolves. Omega wolves occupy the lowest rank in the pack and must always submit to all the higher, dominant wolves. They are also the last wolves allowed to eat. The omega may not seem to be an essential member of the pack, but without an omega, problems such as internal fighting among the wolves can arise. The presence of omega wolves therefore creates a sense of satisfaction among all wolves and maintains the dominance structure. In some cases in the wild, omega wolves have also been observed acting as babysitters for the group.
If a wolf is not an alpha, beta, or omega, it is called a delta (or subordinate). Delta wolves must submit to the alphas and betas but dominate the omegas. Scouts, sentinels, elders, hunters, and caretakers belong to this level. Scouts are responsible for watching the boundaries of the territory and warning the pack in case of any danger. Sentinels protect and guarantee the safety of the group. Elders are experienced wolves that were once alphas or betas. Hunters help the alphas and betas during hunting and provide food for the pack. Finally, caretakers are responsible for caring for the weak, sick, and injured wolves in the group.
Group hunting is another interesting social behavior of gray wolves, in addition to their social hierarchy. According to Muro et al. (2011), the main phases of gray wolf hunting are as follows (see Fig. 3):
- Tracking, chasing, and approaching the prey.
- Pursuing, encircling, and harassing the prey until it stops moving.
- Attacking the prey.
3. Mathematical modeling of gray wolf optimizer
In this section, social hierarchy and hunting techniques (optimization), including encircling, tracking, and attacking the
prey, are mathematically presented to design a GWO (Mirjalili et al., 2014).
3.1 Social hierarchy
In the GWO, for the mathematical modeling of the social hierarchy, the fittest solution is considered the alpha (α). Consequently, the second- and third-best solutions are named beta (β) and delta (δ), respectively, and the remaining candidate solutions are assumed to be omega (ω). Each gray wolf in this algorithm is considered a search agent in the optimization problem, and each search agent is evaluated in terms of its position according to the cost function. The optimization (hunting) is therefore guided by the alpha, beta, and delta, and the omegas follow them.
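For illustration, once the cost of every wolf in the pack has been evaluated, the alpha, beta, and delta can be picked out by sorting; the following minimal MATLAB sketch uses illustrative variable names (costs, positions) that are not part of the chapter's code:

% costs: nPop-by-1 vector of objective values; positions: nPop-by-nVar matrix of wolf positions
[~, order] = sort(costs, 'ascend');     % smaller cost = fitter wolf (minimization)
alphaPos = positions(order(1), :);      % best solution found so far
betaPos  = positions(order(2), :);      % second-best solution
deltaPos = positions(order(3), :);      % third-best solution
% all remaining wolves are treated as omegas and follow these three leaders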
FIG. 3 Hunting behavior of gray wolves: (A) chasing, approaching, and tracking the prey; (B–D) pursuing, harassing, and encircling; (E) stationary situation and attack.
3.2 Encircling prey
According to the details mentioned above, gray wolves encircle the prey during hunting. The mathematical model of the encircling behavior is given by the following equations:

$\vec{X}(t+1) = \vec{X}_P(t) - \vec{A} \cdot \vec{D}$   (1)

$\vec{D} = \left| \vec{C} \cdot \vec{X}_P(t) - \vec{X}(t) \right|$   (2)

The vectors $\vec{A}$ and $\vec{C}$ are calculated as follows:

$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}$   (3)

$\vec{C} = 2\vec{r}_2$   (4)

The descriptions of the parameters used in all GWO formulas are presented in Table 1.
As shown in Fig. 4A, a gray wolf at location (X, Y) can change its position according to the position of the prey (X*, Y*). The various locations around the best agent can be reached from its current position by adjusting the values of the $\vec{A}$ and $\vec{C}$ vectors; for instance, the position (X* - X, Y*) can be reached by setting $\vec{A} = [1, 0]$ and $\vec{C} = [1, 1]$. The possible updated locations of a gray wolf in 3D space are shown in Fig. 4B. The random vectors $\vec{r}_1$ and $\vec{r}_2$ allow the search agents to reach any position between the points shown in Fig. 4. Thus, a gray wolf can randomly update its position within the space around the prey using Eqs. (1) and (2).
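A minimal MATLAB sketch of one encircling step, Eqs. (1)–(4), for a single wolf is given below; all values (dimension, iteration counter, positions) are illustrative placeholders:

% One encircling step for a single wolf (Eqs. 1-4); all values are illustrative.
nVar  = 5;                          % problem dimension
t     = 10;  MaxIt = 100;           % current iteration and iteration budget
Xp    = rand(1, nVar);              % prey position (best solution so far)
X     = rand(1, nVar);              % current wolf position

a     = 2 - t*(2/MaxIt);            % a decreases linearly from 2 to 0
r1    = rand(1, nVar);  r2 = rand(1, nVar);
A     = 2*a.*r1 - a;                % Eq. (3)
C     = 2*r2;                       % Eq. (4)
D     = abs(C.*Xp - X);             % Eq. (2)
Xnew  = Xp - A.*D;                  % Eq. (1): updated wolf position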
3.3 Hunting behavior
As mentioned earlier, hunting is usually led by the alpha, while the beta and delta provide support. Since there is no prior knowledge of the optimal solution (the hunting or prey position $\vec{X}_P$) in the search space, the alpha position is treated as the best position obtained so far (the prey).
TABLE 1 Description of GWO parameters.
$\vec{X}$: the position vector of a gray wolf
$t$: the current iteration
$\vec{X}_P$: the position vector of the prey
$\vec{A}$: coefficient vector
$\vec{C}$: coefficient vector
$\vec{a}$: linearly decreased from 2 to 0 throughout the iterations
$\vec{r}_1$, $\vec{r}_2$: random vectors in [0, 1]
FIG. 4 Two-dimensional (A) and three-dimensional (B) location vectors and their next possible position.
The beta and delta are also assumed to be the next-best solutions. In other words, the three best solutions obtained so far are saved, and the other search agents (the omega wolves) are forced to update their positions according to the positions of these best agents. The following formulas are used:

$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|$   (5a)

$\vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|$   (5b)

$\vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|$   (5c)

$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha$   (6a)

$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta$   (6b)

$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$   (6c)

$\vec{X}(t+1) = \dfrac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$   (7)
As shown in Fig. 5, the final position is obtained at a random location (within the circle) defined by the positions of the alpha, beta, and delta. In other words, these three wolves estimate the position of the prey, and the other wolves (including the omegas) update their positions randomly around it.
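A compact sketch of the update in Eqs. (5)–(7), assuming the illustrative variables from the previous sketches (alphaPos, betaPos, deltaPos, X, a, nVar), could read as follows; the same logic appears in the main loop of Appendix A:

% Position update of one omega wolf from the alpha, beta, and delta (Eqs. 5-7).
update = @(Xlead, X, a, nVar) Xlead - (2*a.*rand(1,nVar) - a) .* ...
                              abs(2*rand(1,nVar).*Xlead - X);   % Eqs. (5)-(6) for one leader
X1 = update(alphaPos, X, a, nVar);
X2 = update(betaPos,  X, a, nVar);
X3 = update(deltaPos, X, a, nVar);
X  = (X1 + X2 + X3)/3;                                          % Eq. (7): new position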
All meta-heuristic algorithms run the search process in two stages: exploration and exploitation. The exploitation process moves toward the best solution in the neighborhood of a point and may get stuck in local minima, so it depends strongly on the starting point; classical methods that rely on this behavior include Newton-Raphson, gradient methods, and steepest descent. The exploration process, in contrast, keeps searching new regions of the decision space. If an algorithm used only exploration, it would become a random search with no proper direction. Balancing these two components is therefore always necessary in the optimization process. Table 2 summarizes the characteristics of the exploitation and exploration phases.
3.4 Exploitation in GWO: attacking prey
As mentioned above, the gray wolves finish the hunt by attacking the prey when it stops moving. Mathematical modeling of approaching the prey begins with decreasing the value of $\vec{a}$. As a result, the fluctuation range of $\vec{A}$ is also reduced toward zero, since $\vec{A}$ takes random values in $[-a, a]$. When the random values of $\vec{A}$ lie in $[-1, 1]$, the next position of a search agent will be between its current position and the position of the prey (Fig. 6).
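A quick numeric check of this mechanism (assuming, for illustration, a budget of 1000 iterations): early in the run $|\vec{A}|$ may exceed 1, which favors searching, while near the end it is forced below 1, which favors attacking.

MaxIt = 1000;
for t = [1, 500, 1000]
    a = 2 - t*(2/MaxIt);            %linear decay of a over the iterations
    fprintf('t = %4d:  a = %.2f, so |A| = |2*a*rand - a| <= %.2f\n', t, a, a);
end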
FIG. 5 Updating of wolves’ positions in GWO.
TABLE 2 Specifications of the exploitation and exploration phases.
Exploitation: maintains convergence; suitable for the final runs; local view of the space; low risk; may become trapped in local solutions.
Exploration: preserves diversity; suitable for the early runs; overall view of the space; high risk; sudden movements.
FIG. 6 Attacking prey versus searching for prey: (A) if $|\vec{A}| < 1$; (B) if $|\vec{A}| > 1$.
3.5 Exploration in GWO: search for prey
Gray wolves move away from each other in the search for prey and converge again, according to the positions of the alpha, beta, and delta, to attack it. The vector $\vec{A}$ is used to model this divergence, taking random values greater than 1 or less than -1, which forces a search agent to move away from the prey. In other words, this mechanism allows the GWO algorithm to search globally. As shown in Fig. 6B, if $|\vec{A}| > 1$, the wolves are forced to diverge from the prey in the hope of finding a fitter one.
Another parameter affecting the exploration process is $\vec{C}$. As mentioned before, this vector takes random values in the range [0, 2]. Depending on the position of a wolf, $\vec{C}$ assigns a random weight to the prey, making it harder or easier for the wolves to reach: when $C > 1$, the influence of the prey is emphasized, and when $C < 1$, it is reduced. $\vec{C}$ can also be interpreted as the effect of obstacles in nature that prevent the wolves from approaching the prey, making the hunt harder and longer. It should be noted that $\vec{C}$ is not decreased linearly, in contrast to $\vec{A}$; this is very useful for preventing the algorithm from becoming trapped in local solutions, especially in the final iterations.
Based on the above explanations and the material presented in the previous sections, the optimization process implemented by the gray wolf algorithm is shown as a detailed flowchart in Fig. 7.
4. Gray wolf optimization example for reservoir operation
This section presents the application of the gray wolf optimization algorithm to a problem that has been used as a primary example in many water resources optimization texts. The physical characteristics of a dam are presented in Table 3. The goal is to optimize the monthly releases from the dam so that all downstream monthly demands, including the environmental flow, are met (Araghinejad et al., 2017). In this example, all downstream needs are weighted equally.
FIG. 7 Optimization implementation process by the gray wolf algorithm.
TABLE 3 Physical and hydrological characteristics of the assumed dam (all values in MCM).
Mean inflow: 122.35; mean demand: 103.28; mean environmental flow: 23.45; initial storage: 750; minimum storage: 250; maximum storage: 1470; minimum release: 10; maximum release: 380.
If the downstream needs are not supplied, the damage is assumed to equal the square of the monthly shortage. The optimization period (the number of decision variables) is 336 months.
This model was implemented in MATLAB (Appendix A) using the standard GWO method described by Mirjalili et al. (2014). In this application, the pack size and the maximum number of iterations were set to 245 and 1000, respectively. The mean of the best solution (reservoir release) obtained by GWO is 120.27 MCM. The objective space is shown in Fig. 8.
FIG. 8 Objective space of the reservoir example.
5. Conclusions
This chapter first addressed the importance of optimization models in water resources and then introduced the types of optimization algorithms along with their classification. After that, the gray wolf optimization (GWO) algorithm was presented as one of the newest meta-heuristic algorithms, and a brief literature review of its application to various water resources problems was given. Gray wolves are apex predators, and their hunting mechanism and social hierarchy inspire the algorithm. The theory and mathematical modeling of the method were presented in separate sections, and the optimization process was shown in a detailed flowchart. Finally, as a simple, introductory problem for teaching optimization algorithms in water resources, the optimization of water allocation from a reservoir was implemented in MATLAB (Appendix A).
Appendix A: GWO MATLAB code for the reservoir example
function loss = Fit_Example(x)
global Inflow Totaldemand Maxstorage Minstorage Maxrelease Storage Release
loss = 0;
InitialStorage = 750;
Storage = zeros(336,1);
Release = x';
for m = 1:336
if m==1
Storage(m,1) = InitialStorage + Inflow(m,1) - Release(m,1);
else
%Continuity equation
Storage(m,1) = Storage(m-1) + Inflow(m,1) - Release(m,1);
end
Residual = Release(m,1) - Totaldemand(m,1);
if Residual==0
loss = loss + 0;
else
loss = loss + (Release(m,1) - Totaldemand(m,1))^2;   %squared monthly shortage/surplus
end
if Storage(m,1) < Minstorage
loss = loss + 1000000;
%Penalty approach (storage below minimum)
end
if Storage(m,1) > Maxstorage
loss = loss + 1000000;
%Penalty approach (storage above maximum)
end
if Release(m,1) > Maxrelease
loss = loss + 1000*(Release(m,1) - Maxrelease); %Penalty approach (release above maximum)
end
end
% This code performs Grey Wolf Optimization Algorithm for the reservoir example
clc
clear
close all
%% Problem definition
global EFlow Inflow demand Totaldemand Maxstorage Minstorage Maxrelease Minrelease Storage
data = readtable(fullfile('Evolutionary Algorithms.csv'));
EFlow = data.eflow_MCM; % Environmental flow
Inflow = data.damin_MCM;
demand = data.demand_MCM;
Totaldemand = demand + EFlow;
Maxstorage = 1470;
Minstorage = 250;
Maxrelease = 380;
Minrelease = 0;
%% Upper and Lower Bounds Definitions
nVar = size(data,1);
VarMin = 10;
VarMax = 380;
%% GWO Parameters
MaxIt = 1000;
nPop = 245;   %Pack size
%% Initialization
empty_GWO.Position = [];
empty_GWO.Cost = inf;   %Because it will be compared (minimized) in the following sections
GWO = repmat(empty_GWO,nPop,1);
Alpha = empty_GWO;
Beta = empty_GWO;
Delta = empty_GWO;
for i = 1:nPop
%Initialize Position
% GWO(i).Position = unifrnd(VarMin,VarMax,[1,nVar]);   %alternative: random initialization within the bounds
GWO(i).Position = Totaldemand';                        %initialize each wolf at the total demand
%Evaluation
GWO(i).Cost = Fit_Example(GWO(i).Position);
%Update Alpha, Beta, Delta
if GWO(i).Cost < Alpha.Cost
Alpha = GWO(i);
if GWO(i).Cost < Beta.Cost
Beta = GWO(i);
if GWO(i).Cost < Delta.Cost
Delta = GWO(i);
end
end
end
end
%% GWO Main Loop
NFE = 0; %Number of Function Evaluation
BestCost = zeros(1,MaxIt);
for it = 1:MaxIt
a = 2 - it*(2/MaxIt);
for i = 1:nPop
%Alpha Part
r1 = rand(1,nVar);
r2 = rand(1,nVar);
A = 2*a*r1 - a;
C = 2*r2;
D = abs(C.*Alpha.Position - GWO(i).Position);
X1 = Alpha.Position - A.*D;
%Beta Part
r1 = rand(1,nVar);
r2 = rand(1,nVar);
A = 2*a*r1 - a;
C = 2*r2;
D = abs(C.*Beta.Position - GWO(i).Position);
X2 = Beta.Position - A.*D;
%Delta Part
r1 = rand(1,nVar);
r2 = rand(1,nVar);
A = 2*a*r1 - a;
C = 2*r2;
D = abs(C.*Delta.Position - GWO(i).Position);
X3 = Delta.Position - A.*D;
%Final Steps
GWO(i).Position = (X1+X2+X3)/3;
flagUb = GWO(i).Position > VarMax;
flagLb = GWO(i).Position < VarMin;
GWO(i).Position = GWO(i).Position.*(~(flagUb+flagLb)) + VarMax.*flagUb + VarMin.*flagLb;   %clip positions to the bounds
%Evaluation
GWO(i).Cost = Fit_Example(GWO(i).Position);
%Update Alpha, Beta, Delta
if GWO(i).Cost < Alpha.Cost
Alpha = GWO(i);
if GWO(i).Cost < Beta.Cost
Beta = GWO(i);
if GWO(i).Cost < Delta.Cost
Delta = GWO(i);
end
end
end
end
NFE = NFE + nPop;
BestCost(it) = Alpha.Cost;
disp(['Iteration: ',num2str(it),', NFE = ',num2str(NFE),', BestCost = ',num2str(BestCost(it))]);
end
Fit_Example(Alpha.Position);            %re-evaluate the best (alpha) solution so Storage corresponds to it
finalstorage = Storage;
reservoir_release = Alpha.Position';    %best release schedule found by GWO
%% Plot Results
figure;
plot(1:MaxIt,BestCost,':','Color','r','LineWidth',2,'MarkerSize',8)
xlabel('Iteration')
ylabel('Cost value obtained per each iteration')
title(['Best Cost Obtained = ',num2str(BestCost(MaxIt))])
set(gca,'FontName','Times New Roman')
set(gca,'FontSize',12)
set(gca,'Color',[0.95 0.97 0.95])
set(gcf,'Color','w')
grid on
xlim([0 MaxIt])
ylim([0 max(BestCost)])
References
Araghinejad, S., Hosseini-Moghari, S.-M., Eslamian, S., 2017. Reservoir operation during drought. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of
Drought and Water Scarcity. Management of Drought and Water Scarcity, vol. 3. Taylor and Francis, CRC Press, USA, pp. 283–292 (Chapter 12).
Bonabeau, E., Marco, D., Theraulaz, G., 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, UK.
Dahmani, S., Yebdri, D., 2020. Hybrid algorithm of particle swarm optimization and grey wolf optimizer for reservoir operation management. Water
Resour. Manag. 34, 4545–4560.
Damania, R., Desbureaux, S., Rodella, A.-S., Russ, J., 2019. Quality Unknown: The Invisible Water Crisis. World Bank Publications, United Nations.
Dorigo, M., Birattari, M., Stutzle, T., 2006. Ant colony optimization. IEEE Comput. Intell. Mag. 1, 28–39.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220 (4598), 671–680.
Loucks, D.P., Van Beek, E., 2017. Water Resource Systems Planning and Management: An Introduction to Methods, Models, and Applications. Springer.
Majumder, P., Eldho, T.I., 2020. Artificial neural network and grey wolf optimizer based surrogate simulation-optimization model for groundwater remediation. Water Resour. Manag. 34, 763–783.
Mirjalili, S., Mirjalili, S.M., Lewis, A., 2014. Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61. https://doi.org/10.1016/j.advengsoft.2013.12.007.
Mosavi, A., Samadianfard, S., Darbandi, S., Nabipour, N., Qasem, S.N., Salwana, E., Band, S.S., 2021. Predicting soil electrical conductivity using multilayer perceptron integrated with grey wolf optimizer. J. Geochem. Explor. 220, 106639.
Muro, C., Escobedo, R., Spector, L., Coppinger, R.P., 2011. Wolf-pack (Canis lupus) hunting strategies emerge from simple rules in computational simulations. Behav. Process. 88, 192–197.
Oliva, D., Abd Elaziz, M., Hinojosa, S., 2019. Metaheuristic Optimization. Springer, pp. 13–26, https://doi.org/10.1007/978-3-030-12931-6_3.
Sharma, M., Kaur, P., 2021. A comprehensive analysis of nature-inspired meta-heuristic techniques for feature selection problem. Arch. Comput. Methods
Eng. 28, 1103–1127. https://doi.org/10.1007/s11831-020-09412-6.
Simonovic, S.P., 2012. Managing Water Resources Methods and Tools for a Systems Approach. UNESCO, Paris and Earthscan James & James,
London, https://doi.org/10.4324/9781849771917.
Tayfur, G., 2017. Modern optimization methods in water resources planning, engineering and management. Water Resour. Manag. 31, 3205–3233.
Tikhamarine, Y., Malik, A., Kumar, A., Souag-Gamane, D., Kisi, O., 2019. Estimation of monthly reference evapotranspiration using novel hybrid
machine learning approaches. Hydrol. Sci. J. 64, 1824–1842.
Tikhamarine, Y., Souag-Gamane, D., Ahmed, A.N., Kisi, O., El-Shafie, A., 2020. Improving artificial intelligence models accuracy for monthly
streamflow forecasting using grey wolf optimization (GWO) algorithm. J. Hydrol. 582, 124435.
Yang, X.-S., 2010. Engineering Optimization. John Wiley & Sons, Inc., Hoboken, NJ, USA, https://doi.org/10.1002/9780470640425.
Yu, S., Lu, H., 2018. An integrated model of water resources optimization allocation based on projection pursuit model-grey wolf optimization method in a
transboundary river basin. J. Hydrol. 559, 156–165.
Chapter 16
Kernel-based modeling
Kiyoumars Roushangar a,b, Roghayeh Ghasempour a, and Saman Shahnazi a
a Department of Water Resources Engineering, Faculty of Civil Engineering, University of Tabriz, Tabriz, Iran; b Center of Excellence in Hydroinformatics, University of Tabriz, Tabriz, Iran
1. Introduction
Kernel-based approaches such as Gaussian Process Regression (GPR), Support Vector Machine (SVM), and Kernel Extreme Learning Machine (KELM) are relatively new and important methods built on different kernel types and rooted in statistical learning theory. These methods can model non-linear decision boundaries, and there are many kernels to choose from. They are also robust against overfitting, especially in high-dimensional spaces. However, the appropriate selection of the kernel type is the most important step in these models because of its direct impact on training and classification precision. These methods are memory intensive and trickier to tune, owing to the importance of picking the right kernel, and although they allow the behavior of a system to be predicted, they do not characterize its intrinsic structure. A kernel method is an algorithm that depends on the data only through dot products. When this is the case, the dot product can be replaced by a kernel function that computes a dot product in some possibly high-dimensional feature space (Smola, 1996). This has two advantages: first, the ability to generate non-linear decision boundaries using methods designed for linear classifiers; second, the use of kernel functions allows a classifier to be applied to data that have no obvious fixed-dimensional vector space representation.
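As a small illustration of this idea (a sketch, not part of the chapter's material), the degree-2 polynomial kernel $k(x, z) = (x^T z)^2$ on two-dimensional inputs returns exactly the dot product of the explicit feature maps $\varphi(x) = [x_1^2, \sqrt{2}\,x_1 x_2, x_2^2]$, so the feature space never needs to be constructed:

% Kernel trick check: k(x,z) = (x'*z)^2 equals <phi(x), phi(z)> for
% phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2] (2-D inputs, degree-2 polynomial kernel).
x = [0.4; -1.2];
z = [2.0;  0.7];

kDirect = (x.'*z)^2;                                 % kernel evaluated in the input space
phi     = @(v) [v(1)^2; sqrt(2)*v(1)*v(2); v(2)^2];  % explicit feature map
kMapped = phi(x).' * phi(z);                         % dot product in the feature space

fprintf('kernel = %.6f, feature-space dot product = %.6f\n', kDirect, kMapped);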
Several kernel-based tools and techniques have been employed to find reliable solutions to complicated hydraulic and hydrological problems. Among the broader scientific applications of kernel-based methods, SVM is widely used in various fields of water engineering, and a large volume of published studies highlights its encouraging performance in classification, regression, and forecasting (Deka, 2014; Seifi et al., 2020; Seifi and Riahi, 2020). Recent years have seen an increase in the number of scientific studies using other kernel-based algorithms. Among them, Gaussian Process Regression (GPR) has gained enormous popularity as a highly effective Bayesian tool for complicated regression problems (Perez-Cruz et al., 2013). GPR, as a kernel-based model, is conceptually simple to understand and is known as a flexible nonparametric model that places a prior probability distribution directly over functions (Rasmussen and Williams, 2006). Promising applications of the GPR model have been reported in forecasting daily, monthly, and seasonal streamflow (Sun et al., 2014; Zhu et al., 2019), forecasting groundwater levels (Raghavendra and Deka, 2016), and forecasting the daily seepage discharge of an earth dam (Roushangar et al., 2016). Having employed the GPR model to predict the discharge coefficient of a gated piano key weir, Akbari et al. (2019) showed that the choice among various kernel functions had a trivial effect on model performance. Roushangar and Shahnazi (2020b) explored the use of GPR and SVM to predict the sediment transport rate of gravel-bed rivers, using two distinct scenarios (based on hydraulic and sediment properties) to frame the modeling process; a slight performance advantage of the GPR models over the SVM models was observed. Jaiswal and Goel (2020) went on to confirm the efficiency of the kernel-based GPR model in modeling the aeration efficiency of rectangular weirs. It is also noteworthy that recent studies recommend the Pearson kernel function for the application of the GPR tool to modeling the energy losses of culverts and the roughness coefficient of sewer pipes (Roushangar et al., 2019, 2020).
The concept of the kernel function has also been used to improve the performance of conventional learning methods such as the Extreme Learning Machine (ELM). Using a kernel function to determine the hidden-layer feature mapping in ELM leads to more stable predictions. Employing Relevance Vector Machine (RVM), GPR, and KELM as kernel-based techniques to model pier scour using field data, Pal et al. (2014) demonstrated that GPR outperformed the RVM and KELM models. Roushangar and Shahnazi (2019) introduced an effective prediction method based on KELM coupled with Particle Swarm Optimization (PSO); compared with classical approaches, their hybrid model showed superior accuracy when employed to predict bedload transport rates. In another study by the same authors, Roushangar and Shahnazi (2020a) investigated the generalization capability of three kernel-based techniques (KELM,
GPR, and SVM) for modeling the total sediment load of gravel-bed rivers. Li et al. (2020) conducted research on river water level forecasting and found that KELM can effectively improve the prediction accuracy of the model. Using kernel-based approaches effectively requires an understanding of how they work and of which kernel should be selected. In this chapter, several types of kernel-based approaches and the theory behind them are discussed, and some examples of their applications are provided.
2. Support vector machine
The support vector machine (SVM) algorithm is a popular machine learning tool that offers solutions for both classification and regression problems (Vapnik, 1995; Sharifi Garmdareh et al., 2018). SVM is built on the Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle, which are the core contents of statistical learning theory. SVM has both a solid theoretical foundation and good generalization ability. SVM has been used in many domains, such as handwriting recognition, biological character recognition (e.g., face recognition), credit card fraud checking, image segmentation, bioinformatics, function fitting, and medical data analysis.
An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
2.1 Support vector classification
2.1.1 Linear classifiers
The data for a two-class learning problem consist of objects labeled with one of two labels corresponding to the two classes; for convenience, based on Fig. 1, we assume the labels are +1 (positive examples) or -1 (negative examples). In what follows, boldface x denotes a vector with components $x_i$. The notation $x_i$ will also denote the $i$th vector in a dataset $\{(x_i, y_i)\}_{i=1}^{n}$, where $y_i$ is the label associated with $x_i$. The objects $x_i$ are called patterns or examples. We assume the examples belong to some set $X$. Initially we assume the examples are vectors, but once we introduce kernels this assumption will be relaxed, at which point they could be any continuous/discrete object.
A key concept required for defining a linear classifier is the dot product between two vectors, also referred to as an inner product or scalar product, defined as $w^T x = \sum_i w_i x_i$. A linear classifier is based on a linear discriminant function of the form (Yang et al., 2015):

$f(x) = w^T x + b$   (1)

The vector $w$ is known as the weight vector, and $b$ is called the bias. Consider first the case $b = 0$. The set of points $x$ such that $w^T x = 0$ are all points perpendicular to $w$ that go through the origin: a line in two dimensions, a plane in three dimensions, and, more generally, a hyperplane. The bias $b$ translates the hyperplane away from the origin, so the hyperplane is

$\{x : f(x) = w^T x + b = 0\}$   (2)
FIG. 1 SVM classification: the separating hyperplane $w^T x + b = 0$, the margin boundaries $w^T x + b = \pm 1$, the margin of width $2/\|w\|$, and the support vectors.
The hyperplane divides the space into two: the sign of the discriminant function f(x) denotes the side of the hyperplane a
point is on. The boundary between regions classified as positive and negative is called the decision boundary of the classifier (Roushangar and Ghasempour, 2017). The decision boundary defined by a hyperplane is said to be linear because it is
linear in the input examples. A classifier with a linear decision boundary is called a linear classifier. Conversely, when the
decision boundary of a classifier depends on the data in a non-linear way the classifier is said to be non-linear.
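As a minimal sketch with an arbitrary (untrained) weight vector and bias, the discriminant values and the resulting class labels can be computed as follows:

% Linear discriminant f(x) = w'*x + b evaluated for a few 2-D examples.
w = [1.5; -0.8];                 % weight vector (illustrative, not trained)
b = 0.2;                         % bias (illustrative)
X = [ 1.0  0.5;                  % each row is one example x
     -0.7  1.2;
      0.3 -2.0];

f    = X*w + b;                  % discriminant values
yhat = sign(f);                  % predicted labels: +1 / -1 depending on the side of the hyperplane
disp([f yhat]);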
2.1.2 Non-linear classifiers and kernel application
In many applications a non-linear classifier provides better accuracy. And yet, linear classifiers have advantages, one of them being that they often have simple training algorithms that scale well with the number of examples (Cristianini and Shawe-Taylor, 2000; Tezel and Buyukyildiz, 2016). This begs the question: can the machinery of linear classifiers be extended to generate non-linear decision boundaries? Furthermore, can we handle domains such as protein sequences or structures, where a representation in a fixed-dimensional vector space is not available?
According to Fig. 2, the naive way of making a non-linear classifier out of a linear classifier is to map the data from the input space $X$ to a feature space $F$ using a non-linear function $\varphi: X \to F$. In the space $F$ the discriminant function is:

$f(x) = w^T \varphi(x) + b$   (3)

When working with SVM, we usually deal with spaces of very high dimensionality. The objective of SVM is to find a hyperplane that separates the data, but in the original space (called the input space) we may not know how to do this or, worse, we may know that the data are not linearly separable there. A kernel is a tool that projects the data from the input space to a feature space in which the data are linearly separable.
In fact, a kernel is a mathematical function for transforming data from an input space to a feature space. Different kernel-based algorithms use different types of kernel functions. The appropriate selection of the kernel type is the most important step in kernel-based approaches because of its impact on the training process (Zhuang et al., 2011).
2.2 Support vector regression
The regression problem is a generalization of the classification problem, in which the model returns a continuous-valued output, as opposed to an output from a finite set. In other words, a regression model estimates a continuous-valued multivariate function (Roushangar and Ghasempour, 2017). The SVM method is based on the concept of the optimal hyperplane that separates samples of two classes by considering the widest gap between the two classes (see Fig. 3). Support vector regression (SVR) is the extension of SVM to regression. The aim of SVR is to find a function that deviates from the actually obtained targets $y_i$ by at most $\varepsilon$ for all the training data and, at the same time, is as flat as possible (Vapnik, 1995). The SVR formulation is as follows:
$f(x) = w\,\varphi(x) + b$   (4)

FIG. 2 Non-linear classifier and kernel mapping $\varphi: x \to \varphi(x) = z$ from the input space (x-space) to the feature space (z-space), where the decision boundary $w^T\varphi(x) + b = 0$ becomes linear (Zhuang et al., 2011).
FIG. 3 Data classification and support vectors.
$w$ is expressed as Eq. (5), in which $a_i$ are the Lagrange multipliers, $y_i$ is the target value, and $x_i$ is the input vector:

$w = \sum_{i=1}^{n} a_i y_i x_i$   (5)
The coefficients of Eq. (4) are determined by minimizing the regularized risk function:

$R_{\min} = C \dfrac{1}{N}\sum_{i=1}^{N} L_\varepsilon(t_i, y_i) + \dfrac{1}{2}\|w\|^2$   (6)

where

$L_\varepsilon(t_i, y_i) = \begin{cases} 0 & \left|t_i - y_i\right| \le \varepsilon \\ \left|t_i - y_i\right| - \varepsilon & \text{otherwise} \end{cases}$   (7)
The constant $C$ is the cost factor and represents the trade-off between the weight factor and the approximation error, and $\varepsilon$ is the radius of the hyper-tube within which the regression function must lie. $L_\varepsilon(t_i, y_i)$ is the loss function, in which $y_i$ is the forecasted value and $t_i$ is the desired value in period $i$. $\|w\|$ is the norm of the vector $w$, and the term $\|w\|^2$ can be expressed as $w^T w$, where $w^T$ is the transpose of $w$. According to Eq. (7), if the predicted value lies outside the $\varepsilon$-tube, the loss is the amount by which the absolute prediction error exceeds $\varepsilon$. Since some data may not lie inside the $\varepsilon$-tube, the slack variables ($\xi$, $\xi^*$) must be used; these variables measure the distance from the actual values to the corresponding boundary of the $\varepsilon$-tube. Therefore, Eq. (6) can be transformed into
$R_{\min} = C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) + \dfrac{1}{2}\|w\|^2$   (8)

subject to: $t_i - w\,\varphi(x_i) - b \le \varepsilon + \xi_i$, $\; w\,\varphi(x_i) + b - t_i \le \varepsilon + \xi_i^*$, $\; \xi_i, \xi_i^* \ge 0$
Using Lagrangian multipliers in Eq. (8) thus yields the dual Lagrangian form:
$\max\, l(a_i, a_i^*) = -\varepsilon \sum_{i=1}^{n}\left(a_i + a_i^*\right) + \sum_{i=1}^{n} t_i\left(a_i - a_i^*\right) - \dfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(a_i - a_i^*\right)\left(a_j - a_j^*\right) K\!\left(x_i, x_j\right)$   (9)

subject to: $\sum_{i=1}^{n}\left(a_i - a_i^*\right) = 0$, $\; 0 \le a_i, a_i^* \le C$, $\; i = 1, 2, \ldots, N$
where $a_i$ and $a_i^*$ are Lagrange multipliers and $l(a_i, a_i^*)$ represents the Lagrange function. $K(x_i, x_j)$ is a kernel function that yields the inner product of $\varphi(x_i)$ and $\varphi(x_j)$ in the feature space:

$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$   (10)
Different software packages, such as MATLAB and STATISTICA, can be used for data analysis with the SVM approach; the Python programming language can also be used for SVM modeling. More detail about SVM coding can be found at https://www.mathworks.com/help/stats/fitrsvm.html. For example, the following calling syntaxes can be used to fit an SVM regression model.
fitrsvm:
Mdl = fitrsvm(Tbl,ResponseVarName)
Mdl = fitrsvm(Tbl,formula)
Mdl = fitrsvm(Tbl,Y)
Mdl = fitrsvm(X,Y)
Mdl = fitrsvm(___,Name,Value)
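For example, an SVR model with a Gaussian kernel could be fitted and applied as in the sketch below; the data are synthetic and illustrative, while the name-value options shown are standard fitrsvm arguments:

% Sketch: epsilon-SVR with a Gaussian (RBF) kernel on synthetic 1-D data.
rng(1);
X = linspace(0, 10, 80)';                 % predictor
y = sin(X) + 0.2*randn(size(X));          % noisy response

Mdl = fitrsvm(X, y, ...
    'KernelFunction', 'gaussian', ...     % RBF kernel
    'Standardize', true, ...              % standardize the predictor
    'Epsilon', 0.1);                      % width of the epsilon-insensitive tube

Xnew = linspace(0, 10, 200)';
yfit = predict(Mdl, Xnew);                % SVR predictions at new points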
3. Gaussian processes
Gaussian processes (GPs) are powerful algorithms for both regression and classification (Melo, 2012). Their greatest practical advantage is that they can give a reliable estimate of their own uncertainty. Fig. 4 illustrates a typical example of a
prediction problem: given some noisy observations of a dependent variable at certain values of the independent variable x,
what is our best estimate of the dependent variable at a new value, x∗?
If we expect the underlying function f(x) to be linear, and can make some assumptions about the input data, we might use
a least-squares method to fit a straight line (linear regression). Moreover, if we suspect f(x) may also be quadratic, cubic, or
even non-polynomial, we can use the principles of model selection to choose among the various possibilities.
Gaussian processes extend multivariate Gaussian distributions to infinite dimensionality. Formally, a Gaussian
process generates data located throughout some domain such that any finite subset of the range follows a multivariate
Gaussian distribution. Now, the n observations in an arbitrary data set, y ¼ {y1, …, yn}, can always be imagined as a
sample from some multivariate (n-variate) Gaussian distribution, after enough thought. Hence, working backward, this
data set can be partnered with a GP. Thus, GPs are as universal as they are simple. Very often, it’s assumed that the
mean of this partner GP is zero everywhere. What relates one observation to another in such cases is just the covariance
function, k (xi, xj).
3.1 Gaussian process regression
This section introduces Gaussian process regression as a useful tool for formulating a Bayesian framework for regression
problems. The Gaussian process (GP) is achieved through extending the multivariate Gaussian distribution to infinite
dimensions, which can be considered as a statistical distribution of functions (Rasmussen and Williams, 2006). Suppose
the training data set of the Gaussian model is $D = \{(x_n, y_n),\ n = 1, 2, \ldots, N\}$, where $x_n \in \mathbb{R}^{d_x}$ refers to the input and $y_n \in \mathbb{R}$ refers
FIG. 4 Given seven noisy data points.
to the output. In Gaussian process regression, the observed target value $y$ of an underlying function $f$ at input $x$ can be given as:

$y = f(x) + \varepsilon$   (11)

where $\varepsilon$ represents independent, identically distributed Gaussian noise with a mean value of zero ($m(x) = 0$) and a variance of $\sigma_n^2$. Then, the prior distribution can be written as:

$Y = (y_1, \ldots, y_n) \sim N\!\left(0,\; k_{ij} + \sigma_n^2 I\right)$   (12)

where $k_{ij} = K(x_i, x_j)$ and $I$ is the identity matrix. The joint prior distribution of the observed and predicted values can be written as:

$\begin{bmatrix} Y \\ f_* \end{bmatrix} \sim N\!\left(0,\; \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$   (13)

where $X = [x_1, x_2, \ldots, x_n]$ is the training set, $X_*$ is the testing data, $Y = [y_1, y_2, \ldots, y_n]$ is the set of observed values, and $f_*$ is the set of predictive values. $K(X, X)$, with elements $K(x_i, x_j)$, is a symmetric positive definite covariance matrix that describes the correlation between $x_i$ and $x_j$ using the concept of the kernel function. $K(X, X_*)$ is the $n \times n_*$ covariance matrix evaluated at all pairs of training and test points, considering $n$ training data and $n_*$ test data; the same holds for $K(X_*, X)$ and $K(X_*, X_*)$. From the joint prior Gaussian distribution, the prediction for the target is inferred through the mean function $\bar{f}_*$ and the covariance function $\mathrm{Cov}(f_*)$ given by Eqs. (14), (15):

$\bar{f}_* = m(x_*) + K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} Y$   (14)

$\mathrm{Cov}(f_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*)$   (15)
(15)
The kernel function is an essential part of the GPR model development, as it includes assumption about the smoothness and
likely patterns to be expected in the data. Kernel function determines how the response at one point xi is affected by
responses at other points xj, i 6¼ j, i ¼ 1, 2, …, n. A sensible assumption is usually that the correlation between two points
decays with the distance between the points. This implies that The behavior of closer points is more similar than points
which are further away from each other. There are many choices for kernel functions such as the Matern kernel family:
1v 0
1
0
1v
x
x
x
x
p
ffiffiffiffiffi
p
ffiffiffiffiffi
i
j
i
j
2 2
@ 2v
A K @ 2v
A
(16)
k xi , xj ¼ s
v
l
l
GðvÞ
where Г represents the gamma function, Kv represents the modified Bessel function and j xi xj j is the distance between
input location xi and xj. Some forms of kernel functions are derived through half integer values of v. Here the most prominent forms are addressed. For v ¼ 0 the Ornstein-Uhlenbeck kernel function is obtained as:
1
0 xi xj 2
@
A
(17)
k xi , xj ¼ s exp l
For v ¼ 3/2 Matern 3/2 kernel function:
1
1
0
0 pffiffiffi
pffiffiffi
3 x i x j 3 x i x j 2@
A
@
A
k xi , xj ¼ s 1 +
exp l
l
For v ¼ 5/2 Matern 5/2 kernel function:
0 pffiffiffi
1
2 1
1
0
0 pffiffiffi
pffiffiffi
5 x i x j 5xi xj 5 x i x j B
C
2@
A
@
A
k xi , xj ¼ s 1 +
exp @
A exp l
l
3l2
(18)
(19)
Moreover, in the limit ν → ∞, the squared exponential kernel is obtained. This commonly used kernel function provides an expressive kernel in order to model smooth and stationary functions (Duvenaud, 2014). The squared exponential kernel is defined as:
$k(x_i, x_j) = \sigma^2 \exp\!\left(-\frac{|x_i - x_j|^2}{2 l^2}\right)$   (20)
The values of the length scale (l) and the signal variance (σ²) as hyper-parameters can affect the a priori correlation between points and can change the resulting function (Seifi and Riahi, 2020).
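As a small illustration (not from the chapter; variable names are ours), the squared exponential kernel of Eq. (20) and the role of its two hyper-parameters can be sketched in Python as follows:

import numpy as np

def squared_exponential_kernel(xi, xj, signal_variance=1.0, length_scale=1.0):
    # Eq. (20): k(xi, xj) = sigma^2 * exp(-|xi - xj|^2 / (2 l^2))
    sq_dist = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return signal_variance * np.exp(-sq_dist / (2.0 * length_scale ** 2))

# A shorter length scale makes the a priori correlation between the same two points decay faster.
print(squared_exponential_kernel([0.0], [0.5], length_scale=1.0))   # ~0.88
print(squared_exponential_kernel([0.0], [0.5], length_scale=0.2))   # ~0.04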
The kernel function and its parameters, together with the degree of noise, should be optimally determined during the training process of GPR models. Considering GPR with a fixed value of the Gaussian noise, a GP model can be trained using Bayesian inference, i.e., by maximizing the marginal likelihood. This corresponds to the minimization of the negative log-posterior:
$p(\sigma^2, k) = \frac{1}{2} Y^{T} \left(K + \sigma^2 I\right)^{-1} Y + \frac{1}{2} \log\left|K + \sigma^2 I\right| - \log p(\sigma^2) - \log p(k)$   (21)
In order to obtain the hyperparameters, the partial derivatives of Eq. (21) can be taken with respect to σ² and k, and the minimization can be carried out by gradient descent.
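As a hedged sketch of this training step (illustrative data and helper names, not taken from the chapter), the following Python code builds the negative log marginal likelihood of a zero-mean GP with a squared exponential kernel and minimizes it numerically with SciPy; it omits the hyper-parameter priors of Eq. (21), and in practice tools such as fitrgp or scikit-learn perform this optimization internally:

import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    # log_params = log([signal variance, length scale, noise variance])
    s2, l, noise = np.exp(log_params)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)      # pairwise squared distances
    K = s2 * np.exp(-d2 / (2.0 * l ** 2)) + noise * np.eye(len(X))  # K + sigma_n^2 I
    L = np.linalg.cholesky(K)                                       # stable inversion via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))             # (K + sigma_n^2 I)^-1 Y
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)                                      # toy noisy observations
X = rng.uniform(-2.0, 0.5, size=(7, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(7)

res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y),
               method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
print("optimized [signal variance, length scale, noise variance]:", np.exp(res.x))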
MATLAB and Python can both be used for modeling with GPR. More details about GPR codes can be found at https://www.mathworks.com/help/stats/fitrgp.html and https://www.mathworks.com/help/stats/gaussian-process-regression-models.html. The following MATLAB code can be used for fitting a Gaussian process regression (GPR) model.
fitrgp:
gprMdl = fitrgp(Tbl,ResponseVarName)
gprMdl = fitrgp(Tbl,formula)
gprMdl = fitrgp(Tbl,y)
gprMdl = fitrgp(X,y)
gprMdl = fitrgp(___,Name,Value)
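In Python, a broadly equivalent model can be fitted with scikit-learn's GaussianProcessRegressor; the kernel choice and the toy data below are only illustrative and not taken from the chapter:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(1)                      # toy (x_n, y_n) training pairs
X = rng.uniform(-2.0, 0.5, size=(30, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(30)

# Matern 5/2 kernel (Eq. 19) plus a white-noise term playing the role of sigma_n^2.
kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_star = np.linspace(-2.0, 0.5, 5).reshape(-1, 1)
mean, std = gpr.predict(X_star, return_std=True)    # predictive mean (Eq. 14) and uncertainty (Eq. 15)
print(mean, std)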
3.2 Gaussian process classification
The Gaussian process classifier implements Gaussian processes (GP) for classification purposes (Fig. 5), more specifically
for probabilistic classification, where test predictions take the form of class probabilities. The Gaussian process classification (GPC) places a GP prior on a latent function, which is then squashed through a link function to obtain the probabilistic classification. The latent function is a so-called nuisance function, whose values are not observed and are not
relevant by themselves. Its purpose is to allow a convenient formulation of the model, and is removed (integrated out)
during prediction. The GPC implements the logistic link function, for which the integral cannot be computed analytically
but is easily approximated in the binary case. In contrast to the regression setting, the posterior of the latent function is not
Gaussian even for a GP prior since a Gaussian likelihood is inappropriate for discrete class labels. Gaussian process classifier approximates the non-Gaussian posterior with a Gaussian based on the Laplace approximation. This method supports
multi-class classification by performing either one-versus-rest or one-versus-one based training and prediction. In one-versus-rest, one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from
the rest. In “one-vs-one,” one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate
these two classes. The predictions of these binary predictors are combined into multi-class predictions. In the case of
Gaussian process classification, “one-vs-one” must solve many problems involving only a subset of the whole training
set rather than fewer problems on the whole dataset. Since Gaussian process classification scales cubically with the size
of the dataset, this might be considerably faster. However, note that “one-vs-one” does not support predicting probability
estimates but only plain predictions.
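A minimal Python sketch of such a classifier, using scikit-learn's GaussianProcessClassifier on synthetic data (all names and values below are illustrative, not from the chapter), is:

from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Synthetic three-class problem standing in for any hydrological classification task.
X, y = make_classification(n_samples=150, n_features=6, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Laplace-approximated GP classifier; "one_vs_rest" fits one binary GP per class,
# while "one_vs_one" fits one per pair of classes but returns no probability estimates.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                multi_class="one_vs_rest", random_state=0)
gpc.fit(X, y)
print(gpc.predict_proba(X[:3]))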
FIG. 5 Gaussian process classification. (For more information see: https://sccn.ucsd.edu/svn/software/tags/EEGLAB7_0_2_9beta/external/fieldtrip20090727/classification/toolboxes/external/gpml-matlab/doc/classification.html.)
4. Kernel extreme learning machine
Extreme Learning Machine (ELM) is a fast training model with a simple mathematical structure that avoids the iterative weight tuning of back-propagation. However, the random assignment of the input and hidden layer weights causes variation in regression and classification accuracy, even if the inputs are exactly the same. In order to avoid this drawback, the kernel function is integrated into the basis of the ELM to design the Kernel Extreme Learning Machine (KELM). Compared with the widely used kernel-based SVM method, KELM can lead to better performance in the areas of classification, pattern recognition and regression through easier implementation and faster training speed (Shamshirband et al., 2015). A brief introduction of KELM is presented here.
The ELM is known as a single hidden layer neural network (Huang et al., 2006). Unlike the back-propagation approach, which needs adjustment of the input weights, the ELM uses randomly assigned input weights. For a given dataset, an ELM with H hidden neurons and activation function f(x) can be expressed as:
$\sum_{i=1}^{H} a_i f_i(x_j) = \sum_{i=1}^{H} a_i f(w_i \cdot x_j + c_i) = e_j, \qquad j = 1, \ldots, n$   (22)
where wi and ai are the weight vectors connecting the inputs to the ith hidden neuron and the ith hidden neuron to the output neurons, respectively, and ci is the bias of the ith hidden neuron. Huang et al. (2006) suggested that Eq. (22) can be written briefly as follows:
$A a = Y$   (23)
where A is the hidden layer output matrix of the neural network (Huang et al., 2006).
The set of weights (wi, ai) and biases should be adjusted when a neural network is applied with back-propagation learning algorithms. The back-propagation learning algorithm requires specifying the values of the learning rate and momentum, and it does not ensure that the absolute minimum of the error function will be found. As a result, the learning algorithm can fall into local minima and may be at risk of over-training during the training process. Using the smallest norm least-squares solution of Aa = Y is suggested to solve these problems (Huang et al., 2006). In most applications of ELM, the number of hidden neurons is much smaller than the number of training samples, making A a non-square matrix, and there may not exist an a such that Aa = Y; instead, one may need to find a′ (Huang et al., 2006). Consequently, the solution of Eq. (23) becomes:
$a' = A^{\dagger} Y$   (24)
where A† is the Moore-Penrose generalized inverse of matrix A.
Recently, Huang et al. (2011) proposed applying orthogonal projection and kernel methods in the design of the ELM. Based on the orthogonal projection method, A† = (AᵀA)⁻¹Aᵀ if AᵀA is non-singular, or A† = Aᵀ(AAᵀ)⁻¹ if AAᵀ is non-singular. Huang et al. (2011) proposed adding a positive value 1/r (where r is a user-defined parameter) to the diagonal of AAᵀ or AᵀA in the computation of the output weights a, which gives a more stable solution of the ELM with better generalization ability compared with the least-squares solution. Thus, to have a stable ELM algorithm, one can use:
$a = A^{T} \left(\frac{I}{r} + A A^{T}\right)^{-1} Y$   (25)
with a corresponding output function of the ELM defined by
$h(x)\,a = h(x)\, A^{T} \left(\frac{I}{r} + A A^{T}\right)^{-1} Y$   (26)
Employing a kernel function was proposed if the hidden layer feature mapping h(x) is unknown (Huang et al., 2006).
A kernel matrix for ELM can be expressed as follows:
$\Omega_{ELM} = A A^{T}: \quad \Omega_{ELM_{i,j}} = h(x_i) \cdot h(x_j) = K(x_i, x_j)$   (27)
where K(xi, xj) is a kernel function. Now the output function can be written as:
$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_n) \end{bmatrix}^{T} \left(\frac{I}{r} + \Omega_{ELM}\right)^{-1} Y$   (28)
In the application of kernel-based ELM, there is no need to know the number of hidden nodes or the hidden layer feature mapping; instead, a kernel function corresponding to h(x) can be used. For modeling via the KELM method, the MATLAB and Python programming languages can be used.
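As a rough illustration of Eqs. (27), (28) (a minimal Python sketch with an RBF kernel and made-up data, not the implementation used in the cited studies), KELM training and prediction reduce to a single regularized linear solve:

import numpy as np

def rbf_kernel_matrix(A, B, g=1.0):
    # K(a, b) = exp(-|a - b|^2 / (2 g^2)) for all row pairs of A and B
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * g**2))

class KernelELM:
    def __init__(self, r=1000.0, g=0.5):
        self.r, self.g = r, g                                    # regularization 1/r and kernel width

    def fit(self, X, y):
        self.X_train = np.asarray(X, float)
        omega = rbf_kernel_matrix(self.X_train, self.X_train, self.g)                  # Eq. (27)
        self.beta = np.linalg.solve(np.eye(len(X)) / self.r + omega, np.asarray(y, float))
        return self

    def predict(self, X):
        k = rbf_kernel_matrix(np.asarray(X, float), self.X_train, self.g)
        return k @ self.beta                                                            # Eq. (28)

rng = np.random.default_rng(2)                                   # toy regression data
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
model = KernelELM().fit(X[:150], y[:150])
print("test RMSE:", np.sqrt(np.mean((model.predict(X[150:]) - y[150:]) ** 2)))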
5. Kernel types
Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit
feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner
products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than
the explicit computation of the coordinates (Theodoridis, 2008). This approach is called the “kernel trick.” According to
Fig. 6, any linear model can be turned into a non-linear model by applying the kernel trick to the model: replacing its features (predictors) by a kernel function.
FIG. 6 An illustration of the kernel method (https://commons.wikimedia.org/wiki/File:Kernel_Machine.png).
5.1 Fisher kernel
The Fisher kernel is a function that measures the similarity of two objects based on sets of measurements for each object and
a statistical model. In a classification procedure, the class for a new object (whose real class is unknown) can be estimated
by minimizing, across classes, an average of the Fisher kernel distance from the new object to each known member of the
given class. The Fisher kernel was introduced in 1998 ( Jaakkola et al., 1999). It combines the advantages of generative
statistical models (like the hidden Markov model) and those of discriminative methods (like support vector machines).
5.2 Graph kernels
The graph kernel is a kernel function that computes an inner product on graphs (Vishwanathan et al., 2010). Graph kernels
can be intuitively understood as functions measuring the similarity of pairs of graphs. They allow kernelized learning algorithms such as support vector machines to work directly on graphs, without having to do feature extraction to transform
them to fixed-length, real-valued feature vectors.
5.3 Kernel smoother
A kernel smoother is a statistical technique to estimate a real-valued function f: ℝᵖ → ℝ as the weighted average of neighboring observed data. The weight is defined by the kernel, such that closer points are given higher weights. The estimated
function is smooth, and the level of smoothness is set by a single parameter (Wand and Jones, 1994).
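A common concrete instance is the one-dimensional Nadaraya-Watson estimator; the short Python sketch below (Gaussian kernel, single bandwidth parameter, synthetic data) is only illustrative:

import numpy as np

def kernel_smoother(x_query, x_obs, y_obs, bandwidth=0.5):
    # Weighted average of the observed y, with Gaussian weights that decay with distance.
    w = np.exp(-0.5 * ((x_query[:, None] - x_obs[None, :]) / bandwidth) ** 2)
    return (w * y_obs).sum(axis=1) / w.sum(axis=1)

x_obs = np.linspace(0.0, 10.0, 50)
y_obs = np.sin(x_obs) + 0.2 * np.random.default_rng(3).standard_normal(50)
print(kernel_smoother(np.array([2.0, 5.0]), x_obs, y_obs, bandwidth=0.8))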
5.4 Polynomial kernel
The polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized
models, that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing learning of non-linear models. Intuitively, the polynomial kernel looks not only at the given features of input
samples to determine their similarity, but also combinations of these. In the context of regression analysis, such combinations are known as interaction features (Aboutalebi et al., 2015). Polynomial kernel function is described as:
$K(x_i, x_j) = (x_i \cdot x_j + c)^{d}$   (29)
where d is the degree and c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When c = 0, the kernel is called homogeneous.
5.5 Radial basis function kernel
In machine learning, the radial basis function (RBF) kernel is a popular kernel function used in various kernelized learning algorithms. The RBF kernel is evaluated on two samples xi and xj, represented as feature vectors in some input space. Satisfactory performance of the RBF kernel function has been reported in the literature (Seifi and Riahi, 2020; Roushangar et al., 2021). The RBF kernel function is described as:
$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2 g^2}\right)$   (30)
where g stands for the optimal width of the kernel function. Large values of g allow kernel-based approaches to have a strong impact over a large area.
5.6 Pearson kernel
Pearson VII universal kernel (PUK) is the other type of kernel function that can be used in kernel-based algorithms. The
Pearson VII kernel function of multi-dimensional input space is given by the following formula:
$K(x_i, x_j) = 1 \Big/ \left[1 + \left(\frac{2\,\|x_i - x_j\|\,\sqrt{2^{(1/\omega)} - 1}}{\sigma}\right)^{2}\right]^{\omega}$   (31)
where the parameters ω and σ control the half-width (also named the Pearson width) and the tailing factor of the peak.
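Because scikit-learn's kernel-based learners accept a callable that returns the Gram matrix, a kernel such as PUK can be plugged into, e.g., SVR as follows (a hedged Python sketch; the parameter values and data are illustrative, not from the cited studies):

import numpy as np
from sklearn.svm import SVR

def puk_kernel(A, B, omega=1.0, sigma=1.0):
    # Pearson VII universal kernel (Eq. 31) between all row pairs of A and B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    d = np.sqrt(np.maximum(d2, 0.0))
    return 1.0 / (1.0 + (2.0 * d * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2) ** omega

rng = np.random.default_rng(4)
X = rng.uniform(-2.0, 2.0, size=(100, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.1 * rng.standard_normal(100)

svr = SVR(kernel=lambda A, B: puk_kernel(A, B, omega=1.0, sigma=1.0), C=10.0)
svr.fit(X, y)
print(svr.predict(X[:3]))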
5.7 String kernels
The string kernel is a kernel function that operates on strings, i.e. finite sequences of symbols that need not be of the same
length. String kernels can be intuitively understood as functions measuring the similarity of pairs of strings: the more
similar two strings are, the higher the value of a string kernel will be. The equation is K(xi, xj) = tanh(a·xi·xj + c).
5.8 Neural tangent kernel
The neural tangent kernel (NTK) is a kernel which describes the evolution of deep artificial neural networks during their
training by gradient descent. It allows ANNs to be studied using theoretical tools from Kernel Methods. For most common
neural network architectures, in the limit of large layer width the NTK becomes constant. This enables simple closed form
statements to be made about neural network predictions, training dynamics, generalization, and loss surfaces.
6. Application of kernel-based approaches
Application areas of kernel methods are diverse and include geo-statistics, inverse distance weighting, 3D reconstruction,
bioinformatics, information extraction, handwriting recognition, and regression issues. Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. The applications of kernel-based approaches in regression
of water resource engineering problems have been widely considered by researchers. In the following parts, some studies
related to kernel-based approaches are presented.
6.1 Total resistance and form resistance of movable bed channels
In general, total roughness coefficient in open channels includes both grain resistance and bedform resistance. Due to the
non-linearity of the roughness coefficient, an accurate prediction of the bedform roughness is difficult. Saghebian et al.
(2020) investigated the potential of the GPR kernel-based approach in the total resistance and form resistance prediction
in alluvial channels. The simulations were done for four different data series obtained from experimental studies in different
laboratories. The obtained results proved the capability of GPR method in the modeling process. It was found that using
kernel function of Pearson (Eq. 31) led to better prediction accuracy (see Fig. 7).
6.2 Energy losses of rectangular and circular culverts
An application of GPR and SVM regression was discussed in Roushangar et al. (2019). The approach taken was the use of GPR and SVM for predicting the energy dissipation in rectangular and circular culverts. According to Fig. 8, two types of loss were considered: bend loss in rectangular culverts and entrance loss in circular culverts with different inlet end treatments. Various input combinations were developed and tested using experimental data. For selecting the best kernel functions, models were built via GPR and SVM using various kernels. As Fig. 9 shows, the RBF kernel function led to better prediction accuracy in comparison to the other kernels. The obtained results showed the desirable accuracy of the applied kernel-based approaches in energy dissipation modeling.
FIG. 7 Statistics parameters (R, DC, MAPE) via GPR kernel function types (Saghebian et al., 2020).
FIG. 8 Schematic view of the different states considered in the Roushangar et al. (2019) study. Scenario 1: bend loss in rectangular culverts (inputs: Fr, θ). Scenario 2: entrance loss in circular culverts (inputs: Re, Fr, Hw/D; output: Ke) with square-edged (vertical headwall), mitered (flush to a 1.5:1 horizontal-to-vertical fill slope), thin-wall projecting, and 45° beveled (vertical headwall) inlet end treatments.
FIG. 9 Statistics parameters (R, DC, RMSE) via SVM and GPR kernel function types for a testing set of the rectangular culvert (Roushangar et al., 2019).
6.3 Lake and reservoir water level prediction
Khan and Coulibaly (2006) examined the potential of the support vector machine (SVM) in long-term prediction of lake
water levels. Lake Erie mean monthly water levels from 1918 to 2001 were used to predict future water levels up to
12 months ahead. Here the optimization technique (linearly constrained quadratic programming function) used in the
SVM for parameter selection has made it superior to traditional neural networks. They found the RBF kernel to be the most appropriate and adopted it with a common width of g = 0.3 for all points in the data set. Further, they set the regularization parameter to C = 100 and used an ε-insensitive loss function with ε = 0.005, chosen by a trial-and-error approach.
80%–90% of the input data were identified to be support vectors in the model. The performance was compared with a
multilayer perceptron (MLP) and with a conventional multiplicative seasonal autoregressive model (SAR). Overall,
SVM showed good performance and was proved to be competitive with the MLP and SAR models. For a 3- to
12-month-ahead prediction, the SVM model outperformed the two other models based on the root-mean square error
and correlation coefficient performance criteria.
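As a hedged sketch (not the authors' code; the lagged water-level inputs below are only placeholders), the configuration described above roughly corresponds to the following scikit-learn SVR setup:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)                       # placeholder monthly water-level series
levels = 174.0 + np.cumsum(0.01 * rng.standard_normal(1008))

lag, lead = 12, 12                                   # previous 12 months -> level 12 months ahead
X = np.array([levels[i:i + lag] for i in range(len(levels) - lag - lead)])
y = levels[lag + lead:]

# RBF width g = 0.3 corresponds to gamma = 1/(2 g^2) under scikit-learn's exp(-gamma |x - x'|^2) convention;
# C = 100 and epsilon = 0.005 as reported by Khan and Coulibaly (2006).
svr = SVR(kernel="rbf", gamma=1.0 / (2.0 * 0.3 ** 2), C=100.0, epsilon=0.005)
svr.fit(X[:800], y[:800])
print("fraction of support vectors:", len(svr.support_) / 800)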
6.4 Streamflow forecasting
Jian et al. (2006) noticed the importance of accurate time- and site-specific forecasts of streamflow and reservoir inflow for
effective hydropower reservoir management and scheduling. They used monthly flow data of the Manwan Reservoir
spanning over a time period from January 1974 to December 2003; the data set from January 1974 to December 1998 were
used for training and the data sets from January 1999 to December 2003 served for validation. The SVM model gave a good
prediction performance when compared with those of ARMA and ANN models. The authors point out SVMs’ distinct
capability and advantages in identifying hydrological time series comprising nonlinear characteristics and its potential
in the prediction of long-term discharges.
6.5 Sediment load prediction
As the sediment transport attracts great attention on environmental issues and water resources planning, the need for and use
of robust learning approaches such as kernel-based techniques has become more apparent. In an investigation into the application of SVM, GPR and KELM on sediment load prediction, Roushangar and Shahnazi (2020a) used a large number of
measurements and related information for increasing the prediction level of total sediment load. They employed the records
of stream flow and transported sediments of 19 gravel-bed rivers from 1980 to 2002, including 890 samples of transported bed and suspended loads. Different input combinations were evaluated, based on hydraulic characteristics and sediment features. The implementation of the KELM technique with a smaller number of input variables provided very good outcomes.
Moreover, the obtained results indicated great performance of KELM with minimum complexity of the model at the same
time. In addition, the results of this study showed that compared to SVM, RVM produces a much sparser solution, requiring
only 20 relevance vectors in comparison to 151 support vectors by SVM out of a total of 154 training data to create the
model, but may produce local minima because of the use of an expectation maximization-based learning approach.
6.6 Pier scour modeling
Scouring around the piers is the major cause of the bridge failures and accurate prediction of equilibrium scour depth is one
of the main concerns in the hydraulic design of bridge. Where empirical formulas are insufficient in providing persistent
success due to complexity and uncertainty of the phenomenon, Pal et al. (2014) introduced kernel-based methods as reliable
tools that provide solutions for prediction of pier scour depth. A total of 232 field data points were used to feed the employed
relevance vector machines (RVM), GPR and KELM methods. It was found that the employed kernel-based methods had superior performance compared with the empirical approaches. The models derived from GPR, and to some extent RVM, are capable of generalizing well, better than the KELM method.
6.7 Reservoir evaporation prediction
The water loss through evaporation is a significant component in the planning and management of water resources. Sebbar et al. (2020) utilized KELM to model monthly evaporation from Algerian dam reservoirs and reported the effectiveness of the proposed KELM tool for the prediction of evaporation across large climatic zones. The prediction process was carried out through three scenarios, and the generalization capability of different kernel functions as core tools of the KELM method was investigated. The results revealed that polynomial and RBF kernel functions achieved better performance. Further investigations showed that a hybrid SVM model with discrete wavelet transform can enhance the prediction accuracy and surpass the proposed KELM method in terms of reservoir evaporation prediction.
7. Conclusions
In this chapter the principles of several kernel-based approaches are discussed and it has been shown that they provide an
approach for feature classification and regression problems. Kernel methods give a systematic and principled approach to
training learning machines. These methods can be used to generate many possible learning machine architectures (RBF
networks, feedforward neural networks) through an appropriate choice of kernel. In particular these approaches are
properly motivated theoretically and systematic in execution. Kernel functions enable the kernel methods to operate in
a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space. By applying
the kernel trick to the model any linear model can be turned into a non-linear model. The application of kernel methods
related to water resource engineering problems was demonstrated. It was shown that kernel methods have been successfully applied for accurate prediction in water resource engineering problems. However, the appropriate selection of the kernel type is the most important step in these models due to its direct impact on the training and classification process. Also, these methods are memory intensive and trickier to tune due to the importance of picking the right kernel, and their application to larger datasets should be tested to determine the merits of the kernel methods.
References
Aboutalebi, M., Bozorg Haddad, O., Loaiciga, H.A., 2015. Optimal monthly reservoir operation rules for hydropower generation derived with
SVR-NSGAII. J. Water Resour. Plan. Manag. 141 (11), 04015029.
Akbari, M., Salmasi, F., Arvanaghi, H., Karbasi, M., Farsadizadeh, D., 2019. Application of Gaussian process regression model to predict discharge coefficient of gated piano key weir. Water Resour. Manag. 33 (11), 3929–3947.
Cristianini, N., Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University
Press, Cambridge, UK.
Deka, P.C., 2014. Support vector machine applications in the field of hydrology: a review. Appl. Soft Comput. 19, 372–386.
Duvenaud, D., 2014. Automatic Model Construction With Gaussian Processes (Doctoral Dissertation). University of Cambridge, UK.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501.
Huang, G.B., Zhou, H., Ding, X., Zhang, R., 2011. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern.
B Cybern. 42 (2), 513–529.
Jaakkola, T.S., Diekhans, M., Haussler, D., 1999. Using the Fisher kernel method to detect remote protein homologies. Proc. Int. Conf. Intell. Syst. Mol.
Biol. 99, 149–158.
Jaiswal, A., Goel, A., 2020. Evaluation of aeration efficiency of triangular weirs by using Gaussian process and M5P approaches. In: Advanced Engineering Optimization Through Intelligent Techniques. Springer, Singapore, pp. 749–756.
Jian, Y.L., Chun, T.C., Kwok, W.C., 2006. Using support vector machines for long term discharge prediction. Hydrol. Sci. J. 51 (4), 599–612.
Khan, S.M., Coulibaly, P., 2006. Application of support vector machine in lake waterlevel prediction. J. Hydraul. Eng. ASCE 11, 199–205.
Li, Y., Shi, H., Liu, H., 2020. A hybrid model for river water level forecasting: cases of Xiangjiang River and Yuanjiang River, China. J. Hydrol. 587,
124934.
Melo, J., 2012. Gaussian processes for regression: a tutorial (Technical Report). University of Porto, Portugal.
Pal, M., Singh, N.K., Tiwari, N.K., 2014. Kernel methods for pier scour modeling using field data. J. Hydroinf. 16 (4), 784–796.
Perez-Cruz, F., Van Vaerenbergh, S., Murillo-Fuentes, J.J., Lázaro-Gredilla, M., Santamaria, I., 2013. Gaussian processes for nonlinear signal processing:
an overview of recent advances. IEEE Signal Process. Mag. 30 (4), 40–50.
Raghavendra, N.S., Deka, P.C., 2016. Multistep ahead groundwater level time-series forecasting using Gaussian process regression and ANFIS. In:
Advanced Computing and Systems for Security. Springer, New Delhi, India, pp. 289–302.
Rasmussen, C.E., Williams, C.K., 2006. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, USA.
Roushangar, K., Ghasempour, R., 2017. Prediction of non-cohesive sediment transport in circular channels in deposition and limit of deposition states
using SVM. Water Sci. Technol. Water Supply 17 (2), 537–551.
Roushangar, K., Shahnazi, S., 2019. Bed load prediction in gravel-bed rivers using wavelet kernel extreme learning machine and meta-heuristic methods.
Int. J. Environ. Sci. Technol. 16 (12), 8197–8208.
Roushangar, K., Shahnazi, S., 2020a. Determination of influential parameters for prediction of total sediment loads in mountain rivers using kernel-based
approaches. J. Mt. Sci. 17 (2), 480–491.
Roushangar, K., Shahnazi, S., 2020b. Prediction of sediment transport rates in gravel-bed rivers using Gaussian process regression. J. Hydroinf. 22 (2),
249–262.
Roushangar, K., Garekhani, S., Alizadeh, F., 2016. Forecasting daily seepage discharge of an earth dam using wavelet–mutual information–Gaussian
process regression approaches. Geotech. Geol. Eng. 34 (5), 1313–1326.
Roushangar, K., Matin, G.N., Ghasempour, R., Saghebian, S.M., 2019. Evaluation of the effective parameters on energy losses of rectangular and circular
culverts via kernel-based approaches. J. Hydroinf. 21 (6), 1014–1029.
Roushangar, K., Ghasempour, R., Biukaghazadeh, S., 2020. Evaluation of the parameters affecting the roughness coefficient of sewer pipes with rigid and
loose boundary conditions via kernel-based approaches. Int. J. Sediment Res. 35 (2), 171–179.
Roushangar, K., Majedi Asl, M., Shahnazi, S., 2021. Hydraulic performance of PK weirs based on experimental study and kernel-based modeling. Water
Resour. Manag. 35 (11), 3571–3592.
Saghebian, S.M., Roushangar, K., Ozgur Kirca, V.S., Ghasempour, R., 2020. Modeling total resistance and form resistance of movable bed channels via
experimental data and a kernel-based approach. J. Hydroinf. 22 (3), 528–540.
Seifi, A., Riahi, H., 2020. Estimating daily reference evapotranspiration using hybrid gamma test-least square support vector machine, gamma test-ANN,
and gamma test-ANFIS models in an arid area of Iran. J. Water Clim. Chang. 11 (1), 217–240.
Seifi, A., Ehteram, M., Singh, V.P., Mosavi, A., 2020. Modeling and uncertainty analysis of groundwater level using six evolutionary optimization algorithms hybridized with ANFIS, SVM, and ANN. Sustainability 12 (10), 4023.
Shamshirband, S., Mohammadi, K., Chen, H.L., Samy, G.N., Petkovic, D., Ma, C., 2015. Daily global solar radiation prediction from air temperatures
using kernel extreme learning machine: a case study for Iran. J. Atmos. Sol. Terr. Phys. 134, 109–117.
Sharifi Garmdareh, E., Vafakhah, M., Eslamian, S., 2018. Regional flood frequency analysis using support vector regression in the arid and semi-arid
regions of Iran. Hydrol. Sci. J. 63 (3), 426–440.
Smola, A.J., 1996. Regression Estimation With Support Vector Learning Machines (Master's Thesis). Technische Universität München, Germany.
Sun, A.Y., Wang, D., Xu, X., 2014. Monthly streamflow forecasting using Gaussian process regression. J. Hydrol. 511, 72–81.
Tezel, G., Buyukyildiz, M., 2016. Monthly evaporation forecasting using artificial neural networks and support vector machines. Theor. Appl. Climatol.
124 (1–2), 69–80.
Theodoridis, S., 2008. Pattern Recognition. Elsevier B.V, p. 203. ISBN 9780080949123.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA, pp. 1–47.
Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M., 2010. Graph kernels. J. Mach. Learn. Res. 11, 1201–1242.
Wand, M.P., Jones, M.C., 1994. Kernel Smoothing. CRC Press.
Yang, Y., Li, J., Yang, Y., 2015, December. The research of the fast SVM classifier method. In: 2015 12th International Computer Conference on Wavelet
Active Media Technology and Information Processing (ICCWAMTIP). IEEE, pp. 121–124.
Zhu, S., Luo, X., Xu, Z., Ye, L., 2019. Seasonal streamflow forecasts using mixture-kernel GPR and advanced methods of input variable selection. Hydrol.
Res. 50 (1), 200–214.
Zhuang, J., Tsang, I.W., Hoi, S.C., 2011. A family of simple non-parametric kernel learning algorithms. J. Mach. Learn. Res. 12, 1313–1347.
Further reading
Sebbar, A., Heddam, S., Djemili, L., 2020. Kernel extreme learning machines (KELM): a new approach for modeling monthly evaporation (EP) from dams
reservoirs. Phys. Geogr., 1–23.
Chapter 17
Large eddy simulation: Subgrid-scale
modeling with neural network
Tamas Karches
Faculty of Water Science, University of Public Service, Budapest, Hungary
1. Introduction
Knowledge of the turbulent fluid flow behavior is essential in many hydroengineering applications; the spectrum is wide
from natural or constructed open channel flows, subsurface flows to water distribution networks, sewage collection systems
and flows through (waste)water treatment technologies. The various engineering goals require different levels of understanding of the flow structure, e.g., pipe flows generally allow dimensional simplifications, whereas the analysis of river bed change shall include the description of the fate of multiple species. Many efforts have been made to describe and resolve
the hydrodynamic systems, but due to the complexity of the system, the computational cost could be a barrier.
Turbulent flow can be characterized as a chaotic, dissipative, 3D unsteady flow which has intermittency, and its spatial and temporal distribution depends on upstream conditions. It increases the shear stress via eddy viscosity and, as turbulent diffusion develops, the homogeneity of passive scalars also increases. For resolving the turbulent fluid flow, the
starting point is the conservation equations for the continuity, momentum and scalar variables (species concentration or
energy/enthalpy). Direct numerical simulation (DNS) aims at resolving the entire range of spatial and temporal scales, including the largest integral scale and the smallest dissipative scales. DNS is a flagship computational approach in the understanding of turbulence (Lee et al., 2014), and it facilitates the improvement of turbulence models either as an a priori test, where DNS provides the input (Toutant et al., 2008), or as an a posteriori test for comparing a model with the DNS results (Bou-Zeid, 2015). Reynolds-averaged Navier-Stokes (RANS) modeling is based on the separation of the time-averaged and fluctuation terms, which produces an apparent stress, and a turbulence model is required for closing the RANS equations. Most engineering applications utilize the unsteady RANS approach because of its lower computational cost, but in several cases, such as transitional flows or flows dominated by large-scale structures, it is not sufficient to describe the flow characteristics.
Large eddy simulation utilizes a low-pass filter of the Navier-Stokes equations, which reduces the number of degrees of freedom; the small scales are time- and space-averaged and modeled, while the large scales are resolved as in DNS. The small scale is also called the subgrid scale (SGS). At the dissipative scale the eddies are isotropic, homogeneous, and universal, whereas at large scales the eddy characteristics are anisotropic, depend on the boundaries, and are subject to history effects.
The connection between the large and small scales is given by the energy cascade. RANS first averages and then computes the flow field, whereas it is the opposite in LES: the flow field is first computed and then the averages are made. In LES practice the role of the numerical errors is often underestimated, although they could strongly influence the prediction (Dairay et al., 2017). Explicit LES filtering reduces the numerical discretization error as the retained motion is well resolved (Moin, 1998), but when practicality and ease of numerical computation are the main concerns, the use of implicit LES could be
proposed since it leads to a nonoscillatory solution (Grinstein et al., 2007).
The main goal of the SGS model is to ensure that the energy dissipation in the LES is the same as in the fully resolved
energy cascade obtained with DNS. The SGS model also needs to incorporate the local, instantaneous energy transfer. A trade-off between SGS model improvement and mesh refinement is a key element in LES, since it determines the computational cost (Vollant et al., 2017). In this chapter, traditional SGS modeling is presented first, then the applicability of neural networks is detailed, followed by recommendations for good modeling practice in this field.
2. LES and traditional subgrid-scale modeling
One categorization of SGS models is based on its role in flow field reconstruction; the nature of SGS can be functional or
structural. Functional SGS focuses on the appropriate energy dissipation rate by adjusting artificial eddy viscosity, whereas
structural approaches try to incorporate the energy transfer rate between the scales into their calculation. Some hybrid SGS
models exist in conventional explicit LES, but implicit LES formulation could overcome the computational difficulties.
The Smagorinsky model mimics the Prandtl mixing length model applied in RANS, assuming a local equilibrium between the production of SGS kinetic energy and its dissipation. In the eddy viscosity calculation a constant is introduced, which assumes full isotropy, but in near-wall regions turbulence anisotropy may be present. In addition, the Smagorinsky constant has to be adjusted depending on the Reynolds number or the discretization scheme.
The dynamic model adapts the Smagorinsky constant locally and introduces a test filter near the cut-off scale, allowing automatic damping of the constant near wall zones; it vanishes in laminar flows and allows backscatter (Germano et al., 1991). The one-equation dynamic subgrid-scale model generally applies a transport equation for the turbulence kinetic energy (k), which can lead to a robust solution (Kajishima and Nomachi, 2006). The local dynamic k-based model (LDKM) has demonstrated its capability of correctly capturing the flow behavior near solid walls without any adjustments of the model (Kim and Menon, 1997). Another dynamic approach is the Lagrangian one, which accumulates the required averages over flow pathlines rather than over directions of statistical homogeneity (Meneveau et al., 1996).
Scale-similarity models assume that the most active SGS scales are close to the cut-off and interact with the zone right above the cut-off (Bardina et al., 1980); a later study justified that the scale-similarity terms modeled in the SGS could be included in the LES equation (Bensow and Fureby, 2007). The filtered density function proposed by Pope (1991), originating from the probability density function method, includes complete statistical information on the flow field variables and is thus a valuable approach for subgrid closure, especially in cases with multispecies transport and/or reacting flows (Drozda et al., 2007).
Now that some traditional SGS model directions have been outlined, it can be stated that a colorful and wide spectrum of mathematical tools is available for constructing a deterministic closure. There is no royal road to achieve our goals, because various flow regimes and numerous cases differing in boundaries and geometries exist. In parallel with the advancement of the above-mentioned methods, extensive research applying data-driven algorithms began in the early 2000s. Sarghini et al. (2003) indicated the course by using a multilayer feed-forward neural network for SGS modeling in LES. The authors admitted that some key features of neural networks should be addressed, like the spatial scaling of the input signals and the data set (abundance and type of data) used for training, but since then many achievements have been made in this field and some of them are summarized in the next section.
3. Data-driven LES closures
Soft computing deals with approximate models that could solve real-life engineering problems. Deep learning techniques
do not search for the physical background or the structural relationship between the input and output, but learn from historical data; in other words, they are trained from samples and in return give predictions. Unlike hard computing, these approaches are tolerant of uncertainty, approximation, and imprecision (Ibrahim, 2016).
In hydromechanics, where clearly defined deterministic models are available, the role of data-driven approaches like deep learning is frequently discussed nowadays. It is evident that in subproblems where uncertain closure equations are applied, well-trained neural networks could perform better than traditional models (Lapeyre et al., 2019).
Scalar flux modeling with dynamic approaches based on the Clark model (Clark et al., 1979) proved to be an efficient tool to reproduce the local SGS terms, applying an optimal estimator to determine the most accurate set of input parameters (Fabre and Balarac, 2011). The scalar flux divergence is present in the filtered passive scalar advection-diffusion equation and is the term to be modeled. Functional SGS modeling aims to determine an eddy diffusivity (analogous to the apparent eddy viscosity in the momentum equation). Structural SGS conditions could be met by Taylor series expansion and a dynamic nonlinear extension of the eddy viscosity (Vollant et al., 2017). A subset of the Taylor series expansion models is the gradient model, where the resolved scalar gradient can be decomposed into three elements: compressional, stretching, and rotational effects. From this tier of elements, the compressional part results in forward transfer, stretching induces backscatter of the scalar variable, whereas the rotational part has no transfer across scales (Balarac et al., 2013). Physics-informed neural networks have appeared as a novel tool in scalar flux modeling, since real-life applications require incorporating the case-specific behavior of the flow and the existing constraints. Frezat et al. (2021) stated that their transformation-invariant neural network outperformed both data-driven and parametric state-of-the-art SGS models.
The neural network training procedure aims to minimize irreducible errors via a large set of uncorrelated parameters. In the case of direct reconstruction of unresolved scalar sources from mesh-resolved quantities, there is no need for explicit filtering of the LES or for solving an additional transport equation (Seltz et al., 2019). Performance issues may arise if the training is carried out at low Reynolds number; therefore, Prat et al. (2020) suggest that the ANN should be trained at high Reynolds number, since extrapolating to lower values performs better and the correlations remain high.
Maulik et al. (2019) offered an approach which is both a priori and a posteriori, because they performed tests on 2D decaying turbulence in parallel with learning the SGS through the probability density function, supplemented with a hyperparameter optimization analysis. A neural network could decide whether a given point in the flow domain requires a turbulence closure or not using DNS data. If the analysis of the subgrid statistics gives an indication that dissipation is required, an upwind scheme is applied; however, in the case of no subgrid closure, a symmetric and second-order accurate Arakawa scheme is utilized (Maulik et al., 2020).
Wall modeling with LES needs to account for both the small scales in the near-wall zone and the large scales above the wall. This challenge could be addressed by describing the physics (vertically integrated thin-boundary-layer equation and eddy population density scaling) to complement the neural network training data (Yang et al., 2019).
4. Guidelines for SGS modeling
A unified modeling practice does not exist for neural network based SGS modeling, but the recent advancements in this field suggest that some structural guidelines for the modeling process should be presented. The suggestions are proposed based on state-of-the-art techniques, but they should be handled as a first step, which needs refinement or may be subject to profound changes later.
The modeling process can be divided into four stages: project definition, a priori analysis with DNS, neural network based SGS construction, and performing the LES simulations (Fig. 1). The subtasks in each step are detailed as follows.
4.1 Simulation project definition
In every modeling procedure, a clear statement of the aims and goals is the basis. Goals should be well defined, formulated briefly, and shall avoid too general statements. As an example, the following phrasing is not enough: "our aim is to analyze near-wall flows in a T-bend" or "the goal of the project is to determine coherent structures in a shearing zone." Instead, we should use: "our aim is to calculate the unsteady velocity profile assuming a steady velocity inlet boundary, focusing on near-wall zones in a T-bend" or "the goal of the project is to determine isosurfaces of Q in a shearing zone." Generally speaking, the goal should reflect the concrete output of the model.

FIG. 1 Simulation procedure flow chart for neural network aided LES simulation.
After the problem statement, the model scope and limitations should be outlined: what are the premises, the boundaries, and the untouched areas. An isotropic turbulence model could be appropriate for pollutant transport calculations via turbulent diffusion, but may not capture all the necessary details in separated or rotational flows. At this stage of the modeling, the simulation workflow with all the alternatives and related costs (time, hardware) and the data management strategies should be taken into account. Data management includes data acquisition (literature, measurement, premodeled data) and data reconciliation (sanity and plausibility checks), and each data set shall be linked to the respective modeling step (e.g. model parameters, training data, test data, validation data of the genetic algorithm). Since more data will be generated during the modeling process, a strategy for data processing should be defined. For example, the DNS output data will be stored and used for flow field visualization, but in parallel the data will be filtered in order to reconstruct the coarse grid model. As a last step the deliverables should be stated.
4.2 A priori analysis with DNS
Generally, DNS is not embedded in commercial fluid dynamics software codes, but in-house solutions are available, each built for a specific flow type. Development of such algorithms focuses mainly on the computational cost. As for the numerical considerations, spectral methods (Gottlieb and Orszag, 1977) or high-order finite difference methods can be applied efficiently. Temporal discretization uses both explicit and implicit schemes (Coleman and Sandberg, 2010). Since turbulence is a three-dimensional unsteady flow, the history of each time-dependent variable shall be known at the boundaries. Depending on the flow type, the initial conditions could have a significant effect on the result, and various inflow turbulence generation methods are detailed in the literature (Wu, 2017); however, in the case of stationary flows, the initial conditions are less important. Further reading and guidelines for performing and validating DNS simulations can be found in the work of Sandham (2005).
As the DNS simulation procedure finishes, the next step is to apply a filter as in LES and produce coarse grid data. The field gained by the DNS filtration is the so-called perfect LES (Beck et al., 2019). As a next step, the data shall be separated into training data, validation data and testing data. The training set is used to optimize the weights, the validation data are used to evaluate the training errors and predict the point of overfitting, and the testing data serve as an a posteriori check. Training and validation data could be gained from similar sets of DNS, whereas testing data can be produced by a sudden change in the inlet boundary for a short interval, as proposed in the study of Lapeyre et al. (2019).
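A minimal Python sketch of this split (placeholder arrays standing in for the filtered DNS samples and the exact SGS targets; the 70/15/15 ratio is an assumption, not a recommendation from the cited studies):

import numpy as np

rng = np.random.default_rng(6)
features = rng.standard_normal((10000, 9))   # e.g., filtered velocity gradients per grid point
targets = rng.standard_normal((10000, 1))    # e.g., exact subgrid term computed from filtered DNS

idx = rng.permutation(len(features))         # shuffle, then split 70/15/15
n_train, n_val = int(0.7 * len(idx)), int(0.15 * len(idx))
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
X_train, y_train = features[train_idx], targets[train_idx]
X_val, y_val = features[val_idx], targets[val_idx]
X_test, y_test = features[test_idx], targets[test_idx]
print(len(X_train), len(X_val), len(X_test))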
4.3 Neural network based SGS model construction
The basic elements of deep learning are interconnected nodes. Between the input and output layers there are several hidden layers. The process starts from the input and, by applying activation functions (e.g., sigmoid, threshold, rectified linear unit, hyperbolic tangent, etc.), produces the outputs. The information from the nodes is weighted, and the weight factors are determined during the training phase.
Many deep learning algorithms have been developed; some examples are deep belief networks, radial basis function networks, recurrent neural networks, generative adversarial networks, multilayer perceptrons, and convolutional neural networks. The latter two are commonly used for SGS modeling (Beck et al., 2019; Ling et al., 2016), and a tensor basis neural network appears as a specific architecture in the work of Ling et al. (2016).
The multilayer perceptron is a feed-forward neural network with multiple layers of perceptrons, each having an activation function. Training of the node weights aims to achieve the lowest mean squared error between the predicted and actual target values. The derivatives of the mean squared error are calculated by the back-propagation method, and once these are available, the weights are updated. The minibatch gradient method (Ioffe and Szegedy, 2015) is a popular approach, where the squared error is monitored during the process, which could help to avoid overfitting. Overfitting of a neural network could be prevented by training the network on more inputs or by changing the network complexity (number of weights, value of weights). It has to be mentioned that a recent study revealed that a spatially multiscale ANN model performed better than the gradient model (Xie et al., 2020).
Convolutional neural networks overcome the shortcomings of the multilayer perceptron technique; they are mainly used in image processing or in object detection. The architecture can be divided into various layers: the convolution layer uses filters, then the segments are rectified through the so-called rectified linear unit. The pooling layer flattens the arrays, producing single linear vectors, which are the inputs for the fully connected layers. In a convolutional layer the neurons get data only from their receptive field.
When applying a neural network, the modeler cannot follow the traditional CFD validation procedure, where grid independence and, in unsteady simulations, time independence are evaluated (Eslamian et al., 2012). Neural network cross validation helps to detect overfitting and to ensure the reliability of the predicted values (as in a convergence study of numerical schemes). K-fold cross validation means that the data set is divided into k subsets, from which one subset is retained for validation and the remaining k − 1 are used for training. The procedure is repeated k times, so that each data subset is used exactly once for validation.
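A short Python sketch of k-fold cross validation for an MLP-type SGS surrogate (synthetic placeholder data; the network size and k = 5 are illustrative assumptions, not recommendations from the cited studies):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)                        # placeholder features/targets from filtered DNS
X = rng.standard_normal((2000, 9))
y = np.tanh(X @ rng.standard_normal(9)) + 0.05 * rng.standard_normal(2000)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kfold.split(X):             # each subset is held out exactly once
    mlp = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                       max_iter=500, random_state=0)
    mlp.fit(X[train_idx], y[train_idx])
    scores.append(mlp.score(X[val_idx], y[val_idx]))  # R^2 on the held-out fold
print("fold R^2:", np.round(scores, 3))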
Even if the output of the neural network is a good approximation of the closure term, it cannot be expected to have long-term stability (Beck et al., 2019). As the direct construction of closure terms is not feasible, Beck et al. (2019) proposed two alternatives. One applies the neural network as a direct predictive tool with a dissipative model; the other is a neural network informed eddy viscosity model, described in more detail in the authors' paper. Both approaches are effective in managing stability issues, but the latter one showed better agreement with the reference. As a last step, if all the pieces are together, the LES simulation can be performed.
5. Conclusions
In this study, a brief overview of the data-driven SGS modeling applied in large eddy simulations was shown. After an
insight into the traditional SGS reconstruction approaches was given, the focus was set on artificial neural network based
modeling approaches. Some recent papers in this field were cited to show the variety of approaches, followed by a simplified guidance for SGS modeling. The author's aim was not a full description of each modeling practice; rather, the intention was to give a starting point which may lead to a unified modeling protocol. Many challenges lie ahead of us, as Kutz (2017) outlined in his paper: how many nodes and layers should be applied, what is the minimum size of data set that is enough for proper training, and what are the uncertainties in the output? How can overfitting be prevented? Can one predict data outside the training data? In addition to these general questions, what is the suggested architecture for SGS modeling? How can the stability of the neural-network-constructed SGS be maintained? As more robust and advanced tools are developed, new questions could be added to the above-mentioned list. Many fruitful directions are ongoing, e.g., the deconvolutional method proposed by Yuan et al. (2020), or the idea of the perfect LES (Beck et al., 2019); the next step is to evaluate and select the most appropriate (if there is any) modeling procedure.
References
Balarac, G., Le Sommer, J., Meunier, X., Vollant, A., 2013. A dynamic regularized gradient model of the subgrid-scale scalar flux for large eddy simulations. Phys. Fluids 25 (7), 075107. https://doi.org/10.1063/1.4813812.
Bardina, J., Ferziger, J.H., Reynolds, W.C., 1980. Improved subgrid scale models for large eddy simulations. In: 13th Fluid and Plasmadynamics Conference 1357., https://doi.org/10.2514/6.1980-1357.
Beck, A., Flad, D., Munz, C.D., 2019. Deep neural networks for data-driven LES closure models. J. Comput. Phys. 398, 108910. https://doi.org/10.1016/j.
jcp.2019.108910.
Bensow, R.E., Fureby, C., 2007. On the justification and extension of mixed models in LES. J. Turbul. 8, N54. https://doi.org/10.1080/146852
40701742335.
Bou-Zeid, E., 2015. Challenging the large eddy simulation technique with advanced a posteriori tests. J. Fluid Mech. 764, 1–4. https://doi.org/10.1017/
jfm.2014.616.
Clark, R.A., Ferziger, J.H., Reynolds, W.C., 1979. Evaluation of subgrid-scale models using an accurately simulated turbulent flow. J. Fluid Mech. 91 (1),
1–16. https://doi.org/10.1017/S002211207900001X.
Coleman, G.N., Sandberg, R.D., 2010. A primer on direct numerical simulation of turbulence-methods, procedures and guidelines. Technical Report AFM09/01a https://eprints.soton.ac.uk/66182/1/A_primer_on_DNS.pdf. (Accessed 14 March .2021).
Dairay, T., Lamballais, E., Laizet, S., Vassilicos, J.C., 2017. Numerical dissipation vs. subgrid-scale modelling for large eddy simulation. J. Comput. Phys.
337, 252–274. https://doi.org/10.1016/j.jcp.2017.02.035.
Drozda, T.G., Sheikhi, M.R.H., Madnia, C.K., Givi, P., 2007. Developments in formulation and application of the filtered density function. Flow Turbul.
Combust. 78 (1), 35–67. https://doi.org/10.1007/s10494-006-9052-4.
Eslamian, S., Abedi-Koupai, J., Zareian, M.J., 2012. Measurement and modelling of the water requirement of some greenhouse crops with artificial neural
networks and genetic algorithm. Int. J. Hydrol. Sci. Technol. 2 (3), 237–251.
Fabre, Y., Balarac, G., 2011. Development of a new dynamic procedure for the Clark model of the subgrid-scale scalar flux using the concept of optimal
estimator. Phys. Fluids 23 (11), 115103. https://doi.org/10.1063/1.3657090.
Frezat, H., Balarac, G., Le Sommer, J., Fablet, R., Lguensat, R., 2021. Physical invariance in neural networks for subgrid-scale scalar flux modelling. Phys.
Rev. Fluids 6 (2), 024607. https://doi.org/10.1103/PhysRevFluids.6.024607.
Germano, M., Piomelli, U., Moin, P., Cabot, W.H., 1991. A dynamic subgrid-scale eddy viscosity model. Phys. Fluids A Fluid Dynam. 3 (7), 1760–
1765. https://doi.org/10.1063/1.857955.
Gottlieb, D., Orszag, S.A., 1977. Numerical Analysis of Spectral Methods: Theory and Applications. CBMS-NSF Regional Conference Series in Applied
Mathematics, Series Number 26, Society for Industrial and Applied Mathematics.
Grinstein, F.F., Margolin, L.G., Rider, W.J., 2007. A rationale for implicit LES. In: Implicit Large-Eddy Simulation: Computing Turbulent Flow
Dynamics. Cambridge University Press, New York, USA, pp. 39–59.
Ibrahim, D., 2016. An overview of soft computing. Procedia Comput. Sci. 102, 34–38. https://doi.org/10.1016/j.procs.2016.09.366.
Ioffe, S., Szegedy, C., 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on
Machine Learning, pp. 448–456, https://doi.org/10.48550/arXiv.1502.03167.
Kajishima, T., Nomachi, T., 2006. One-equation subgrid scale model using dynamic procedure for the energy production. J. Appl. Mech. 73 (3), 368–
373. https://doi.org/10.1115/1.2164509.
Kim, W.W., Menon, S., 1997. Application of the localized dynamic subgrid-scale model to turbulent wall-bounded flows. In: 35th Aerospace Sciences
Meeting and Exhibit 210., https://doi.org/10.2514/6.1997-210.
Kutz, J.N., 2017. Deep learning in fluid dynamics. J. Fluid Mech. 814, 1–4. https://doi.org/10.1017/jfm.2016.803.
Lapeyre, C.J., Misdariis, A., Cazard, N., Veynante, D., Poinsot, T., 2019. Training convolutional neural networks to estimate turbulent sub-grid scale
reaction rates. Combust. Flame 203, 255–264. https://doi.org/10.1016/j.combustflame.2019.02.019.
Lee, M., Ulerich, R., Malaya, N., Moser, R.D., 2014. Experiences from leadership computing in simulations of turbulent fluid flows. Comput. Sci. Eng. 16
(5), 24–31. https://doi.org/10.1109/MCSE.2014.51.
Ling, J., Kurzawski, A., Templeton, J., 2016. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid
Mech. 807, 155–166. https://doi.org/10.1017/jfm.2016.615.
Maulik, R., San, O., Rasheed, A., Vedula, P., 2019. Subgrid modelling for two-dimensional turbulence using neural networks. J. Fluid Mech. 858, 122–
144. https://doi.org/10.1017/jfm.2018.770.
Maulik, R., San, O., Jacob, J.D., 2020. Spatiotemporally dynamic implicit large eddy simulation using machine learning classifiers. Phys. D Nonlinear
Phenom. 406, 132409. https://doi.org/10.1016/j.physd.2020.132409.
Meneveau, C., Lund, T.S., Cabot, W.H., 1996. A Lagrangian dynamic subgrid-scale model of turbulence. J. Fluid Mech. 319, 353–385.
Moin, P., 1998. Numerical and physical issues in large eddy simulation of turbulent flows. JSME Int. J. Ser. B Fluids Thermal Eng. 41 (2), 454–463. https://
doi.org/10.1299/jsmeb.41.454.
Pope, S.B., 1991. Computations of turbulent combustion: progress and challenges. In: Symposium of Combustion 23, pp. 591–612.
Prat, A., Sautory, T., Navarro-Martinez, S., 2020. A priori sub-grid modelling using artificial neural networks. Int. J. Comput. Fluid Dynam. 34 (6), 397–
417. https://doi.org/10.1080/10618562.2020.1789116.
Sandham, N.D., 2005. Turbulence simulation. In: Prediction of Turbulent Flows. Cambridge University Press, pp. 207–235.
Sarghini, F., De Felice, G., Santini, S., 2003. Neural networks based subgrid scale modeling in large eddy simulations. Comput. Fluids 32 (1), 97–
108. https://doi.org/10.1016/S0045-7930(01)00098-6.
Seltz, A., Domingo, P., Vervisch, L., Nikolaou, Z.M., 2019. Direct mapping from LES resolved scales to filtered-flame generated manifolds using convolutional neural networks. Combust. Flame 210, 71–82. https://doi.org/10.1016/j.combustflame.2019.08.014.
Toutant, A., Labourasse, E., Lebaigue, O., Simonin, O., 2008. DNS of the interaction between a deformable buoyant bubble and a spatially decaying
turbulence: a priori tests for LES two-phase flow modelling. Comput. Fluids 37 (7), 877–886. https://doi.org/10.1016/j.compfluid.2007.03.019.
Vollant, A., Balarac, G., Corre, C.E., 2017. Subgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures.
J. Turbul. 18 (9), 854–878. https://doi.org/10.1080/14685248.2017.1334907. Taylor and Francis.
Wu, X., 2017. Inflow turbulence generation methods. Annu. Rev. Fluid Mech. 49, 23–49. https://doi.org/10.1146/annurev-fluid-010816-060322.
Xie, C., Wang, J., Li, H., Wan, M., Chen, S., 2020. Spatially multi-scale artificial neural network model for large eddy simulation of compressible isotropic
turbulence. AIP Adv. 10 (1), 015044. https://doi.org/10.1063/1.5138681.
Yang, X.I.A., Zafar, S., Wang, J.X., Xiao, H., 2019. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev.
Fluids 4 (3), 034602. https://doi.org/10.1103/PhysRevFluids.4.034602.
Yuan, Z., Xie, C., Wang, J., 2020. Deconvolutional artificial neural network models for large eddy simulation of turbulence. Phys. Fluids 32 (11),
115106. https://doi.org/10.1063/5.0027146.
Chapter 18
Lattice Boltzmann method and its applications
Mojtaba Aghajani Delavar and Junye Wang
Faculty of Science and Technology, Athabasca University, Athabasca, AB, Canada
1. Introduction
Lattice Boltzmann method (LBM) is a mesoscopic approach rooted in microscopic particle dynamics and mesoscopic kinetic
theory; it originated from classical statistical physics and the lattice gas automata (LGA) method (Frisch et al., 1987).
The fundamental idea behind the LBM is to construct simplified kinetic models that simulate only a collection of pseudo
particles streaming and colliding over a discrete lattice domain, thereby avoiding both the use of the full Boltzmann equation and the
tracing of each particle as in molecular dynamics simulations. The streaming and interaction of these particle parcels are simulated in terms
of time evolution. In the lattice Bhatnagar-Gross-Krook (LBGK) development, an ensemble average of the particle distribution is described as a density distribution function (Bhatnagar et al., 1954; Ajarostaghi et al., 2019). The complex collision of the Boolean particles in LGA was simplified to an ensemble-averaged relaxation term of this density
distribution function. Thus, the discrete collision rule is replaced within LBGK by a continuous function known as the collision
operator. The transition from LGA to LBM removed the statistical noise and made simulations more efficient, stable, and flexible. The governing continuity and Navier-Stokes equations can be recovered by appropriately
choosing the equilibrium distribution function and applying Chapman-Enskog theory (Qian et al., 1992; Chen et al., 1992). Macroscopic averaged properties, such as the flow velocity, the pressure, and the fluid density, are recovered from the
moments of the (time- and space-discrete) density distribution function.
Instead of solving the Navier-Stokes equations directly, fluid distribution functions on a lattice are simulated with
streaming and collision (relaxation) processes, while the Navier-Stokes-based solvers rely on discretizing the governing
differential equations with a given set of boundary conditions on a computational grid network. On the other hand, the
molecular dynamics method considers the streams and collisions of all individual molecules constituting the fluid with
a detailed description of the intermolecular interactions. Thus, the complexity of interactions among a huge number of
molecules makes the molecular dynamics models computationally prohibitive for application to flows in porous media
(Saatsaz and Eslamian, 2020). Due to its underlying kinetic nature, the LBM has many advantages, such as easy parallelism
of the algorithm, simplicity of programming, and ease of incorporating microscopic nonequilibrium processes (Wang et al.,
2005), which enable the modeling of complex multiphysics phenomena involving interfacial dynamics and complex
boundaries, such as multiphase or multicomponent flows in porous media. As a result, the LBM is an ideal scale-bridging
numerical scheme, which incorporates simplified kinetic models to capture microscopic/mesoscopic flow physics, yet the
averaged quantities satisfy the desired macroscopic equations.
This chapter is organized as follows. First, the lattice Boltzmann equation is presented. Then the single-relaxation BGK
model, the multirelaxation-time LBM, and boundary conditions are discussed. Finally, LBM applications to multiphase flow
and an example are presented.
2. Lattice Boltzmann equations
In the simulation of transport phenomena, there are two basic descriptions: the macroscopic continuum description and the microscopic
(kinetic) description. In a macroscopic scheme, which includes fluid mechanics and thermodynamics, the domain is treated as a continuous
medium, while in a microscopic scheme, or kinetic theory, the domain is treated as a collection of small particles. Both
descriptions yield the same macroscopic governing equations for systems involving a large number of particles.
The microscopic scheme is based on the particle-level description provided by molecular dynamics (MD). The
position and velocity of each particle (atom or molecule) in the system evolve from their values at the current and previous
time steps according to Newton's equations of motion. Because of the huge number of particles involved, this
leads to substantial computational time, so molecular dynamics simulations are limited to very small systems. To
reduce the computational burden of these models, two practical modifications are commonly applied:
(1) First, reduce the number of tracked particles by considering a group of atoms or molecules instead of an individual atom or molecule. This changes the simulation scale from the microscopic to the mesoscopic scale.
(2) Second, increase simulation speed by reducing the degrees of freedom of particle movement, restricting motion to a finite number of directions.
In the lattice Boltzmann method, as a mesoscopic method, the distribution function is defined as the probability of finding
particles within a specific velocity range at a given position and time. This distribution function replaces the tracking of each
individual particle used in molecular dynamics. The change leads to considerable savings in
computational cost and makes the procedure feasible for much larger domains than molecular dynamics can handle.
In the lattice Boltzmann method, the movements of fluid particles are modeled to capture macroscopic fluid quantities
such as velocity and pressure. As in other numerical simulation procedures, the domain is discretized into uniform Cartesian
cells, each of which holds a fixed number of distribution functions representing the number of fluid particles moving in the corresponding
discrete directions. The number of distribution functions depends on the lattice Boltzmann model used (Table 1). These
distribution functions are calculated by solving the lattice Boltzmann equation (LBE).
TABLE 1 Common lattice Boltzmann models.
(The schematic column of the printed table, showing the numbered lattice directions of each model, is not reproduced here.)

D1Q3
Velocities: c_k = 0 (k = 0); c_k = ±1 (k = 1, 2)
Weighting factors: ω_0 = 0; ω_k = 1/2 (k = 1, 2)

D1Q5
Velocities: c_k = 0 (k = 0); c_k = ±1 (k = 1, 3); c_k = ±2 (k = 2, 4)
Weighting factors: ω_0 = 6/12; ω_k = 2/12 (k = 1, 3); ω_k = 1/12 (k = 2, 4)

D2Q5
Velocities: c_k = (0, 0) (k = 0); (±1, 0), (0, ±1) (k = 1-4)
Weighting factors: ω_0 = 0; ω_k = 1/4 (k = 1-4)

D2Q7
Velocities: c_k = (0, 0) (k = 0); (±1, 0) (k = 1, 2); (±1/2, ±√3/2) (k = 3-6)
Weighting factors: ω_0 = 1/2; ω_k = 1/12 (k = 1-6)

D2Q9
Velocities: c_k = (0, 0) (k = 0); (±1, 0), (0, ±1) (k = 1-4); (±1, ±1) (k = 5-8)
Weighting factors: ω_0 = 4/9; ω_k = 1/9 (k = 1-4); ω_k = 1/36 (k = 5-8)

D3Q15
Velocities: c_k = (0, 0, 0) (k = 0); (±1, 0, 0), (0, ±1, 0), (0, 0, ±1) (k = 1-6); (±1, ±1, ±1) (k = 7-14)
Weighting factors: ω_0 = 2/9; ω_k = 1/9 (k = 1-6); ω_k = 1/72 (k = 7-14)

D3Q19
Velocities: c_k = (0, 0, 0) (k = 0); (±1, 0, 0), (0, ±1, 0), (0, 0, ±1) (k = 1-6); (±1, ±1, 0), (±1, 0, ±1), (0, ±1, ±1) (k = 7-18)
Weighting factors: ω_0 = 1/3; ω_k = 1/18 (k = 1-6); ω_k = 1/36 (k = 7-18)
In order to capture the flow field in conventional numerical simulations, the continuity and momentum conservation
(Navier-Stokes) equations must be solved:

∇·U = 0   (1)

ρ(∂U/∂t + U·∇U) = −∇p + μ∇²U   (2)

where U is the velocity vector, ρ is the fluid density, and p and μ represent the pressure and the dynamic viscosity, respectively.
In LBM, however, these equations are not solved directly; instead, lattice Boltzmann equations, formulated in terms of distribution functions, are solved and satisfy the governing
equations at the mesoscopic scale.
The distribution function f(r, c, t) is defined as the number of particles that, at time t, are located between r and
r + dr with velocities between c and c + dc. Under an external force F, the velocity of a particle of unit mass
changes from c to c + (F/m)dt and its position from r to r + c dt. The change between the initial and final states is modeled using the
collision operator, Ω, and the particle balance can be written as (Mohamad, 2011):

f(r + c dt, c + (F/m)dt, t + dt) dr dc − f(r, c, t) dr dc = Ω(f) dr dc dt   (3)

In the limit dt → 0:

df/dt = Ω(f)   (4)

This means that the rate of change of the distribution function equals the collision term.
Since the distribution function f is a function of c, r, and t, the Boltzmann transport equation can be written as:

∂f/∂t + c·(∂f/∂r) + (F/m)·(∂f/∂c) = Ω   (5)

If there is no external force:

∂f/∂t + c·∇f = Ω   (6)
2.1 BGK approximation
The collision operator Ω is a function of f, and it must be specified in order to solve the Boltzmann equation. Bhatnagar, Gross, and
Krook introduced a simple model for the collision operator. With the BGK (Bhatnagar-Gross-Krook) approximation,
the Boltzmann equation without external forces becomes (Bhatnagar et al., 1954):

∂f/∂t + c·∇f = (1/τ)(f^eq − f) = ω(f^eq − f)   (7)

where ω = 1/τ is the collision frequency, τ is the relaxation time, and f^eq is the local equilibrium distribution
function. The discretized lattice Boltzmann equation (Eq. 7) with external forces is:

f_k(x + c_k Δt, t + Δt) = f_k(x, t) + (Δt/τ)[f_k^eq(x, t) − f_k(x, t)] + Δt F_k   (8)

f_k^eq(x, t) = ω_k ρ [1 + (c_k·U)/c_s² + 0.5 (c_k·U)²/c_s⁴ − 0.5 (U·U)/c_s²]   (9)

where Δt is the lattice time step, c_k the discrete lattice velocity in direction k, F_k the external force, f_k^eq the equilibrium
distribution function (Qian et al., 1992; Chen et al., 1992), and ρ the lattice fluid density. ω_k denotes a weighting factor
depending on the LB model, DnQm, as shown in Table 1.
The space x is usually discretized such that c_k Δt is the distance between two neighboring lattice nodes. After
a time step Δt, f_k(x, t) therefore moves to its neighboring node along the lattice velocity direction c_k. In LBM, particle
movement is limited to a set of specific directions, depending on the lattice arrangement of the chosen model, DnQm. In
DnQm, n and m represent the spatial dimension and the total number of lattice streaming directions (velocities), respectively. For example, D1Q2 represents one dimension and two streaming directions. The mesoscopic Eqs. (8) and (9)
recover the macroscopic Eqs. (1) and (2) through the Chapman-Enskog expansion.
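As an illustration, the following minimal Fortran sketch evaluates the D2Q9 equilibrium distribution of Eq. (9) and performs the BGK relaxation of Eq. (8) at a single node. The variable names (f, feq, rho, u, v, omega, w, cx, cy) mirror those of the collesion subroutine in Appendix A; the external force is omitted for brevity.

! BGK collision at one node (i,j), D2Q9, no external force
t1 = u(i,j)*u(i,j) + v(i,j)*v(i,j)                                    ! U.U
do k = 0, 8
   t2 = u(i,j)*cx(k) + v(i,j)*cy(k)                                   ! c_k.U (lattice units, c_s^2 = 1/3)
   feq(k,i,j) = rho(i,j)*w(k)*(1.0 + 3.0*t2 + 4.5*t2*t2 - 1.5*t1)     ! Eq. (9)
   f(k,i,j)   = f(k,i,j) + omega(i,j)*(feq(k,i,j) - f(k,i,j))         ! Eq. (8), omega = dt/tau
end do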
2.2 Lattice Boltzmann models
In Table 1, D2Q9 is a lattice Boltzmann model with two dimensions and nine streaming directions. Its
equations can be written as:

f_k(x + c_k Δt, t + Δt) = f_k(x, t) + (Δt/τ)[f_k^eq(x, t) − f_k(x, t)] + Δt F_k

f_k^eq(x, t) = ω_k ρ [1 + (c_k·U)/c_s² + 0.5 (c_k·U)²/c_s⁴ − 0.5 (U·U)/c_s²]   (10)

with the D2Q9 velocities and weights of Table 1,

c_k = (0, 0) for k = 0; (±1, 0), (0, ±1) for k = 1-4; (±1, ±1) for k = 5-8

ω_k = 4/9 for k = 0; 1/9 for k = 1-4; 1/36 for k = 5-8

and the lattice speed of sound

c_s = c/√3   (11)

where c = Δx/Δt is the lattice streaming speed. The density and velocity components are determined by:

ρ = Σ_k f_k   (12)

ρ u_i = Σ_k f_k c_ki   (13)
Subscript i denotes the Cartesian coordinate direction, and the sums run over all lattice directions k of the distribution functions.
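A minimal Fortran sketch of Eqs. (12) and (13), again using the variable names of the uvcalc subroutine in Appendix A, is:

! Macroscopic density and velocity from the D2Q9 distributions, Eqs. (12)-(13)
rho(i,j) = 0.0
usum = 0.0
vsum = 0.0
do k = 0, 8
   rho(i,j) = rho(i,j) + f(k,i,j)
   usum = usum + f(k,i,j)*cx(k)
   vsum = vsum + f(k,i,j)*cy(k)
end do
u(i,j) = usum/rho(i,j)
v(i,j) = vsum/rho(i,j)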
2.3 Multirelaxation time lattice Boltzmann (MRT)
To increase the numerical stability and accuracy of the single-relaxation BGK model, the multirelaxation time (MRT) collision operator was proposed (d'Humières, 2002; Mohamad, 2011; Sharma et al., 2019; Jami et al., 2016). The streaming
process remains the same in MRT; the difference lies in the collision step: streaming is carried out in
velocity space, while the collision is performed in moment space. The lattice Boltzmann equation for MRT is
(d'Humières, 2002; Mohamad, 2011; Jami et al., 2016):

f_k(x + c_k Δt, t + Δt) = f_k(x, t) + M⁻¹ R [m_k^eq(x, t) − m_k(x, t)]   (14)

where M is a transformation matrix that maps velocity space, f, to moment space, m, R is a diagonal matrix of
relaxation rates, and m_k and m_k^eq are the vectors of moments and equilibrium moments. The mapping between velocity and moment space is the linear transformation:

f = M⁻¹ m   (15)
The details of the MRT for D2Q9 are summarized below:

M⁻¹ = (1/36) ×
| 4  −4   4   0   0   0   0   0   0 |
| 4  −1  −2   6  −6   0   0   9   0 |
| 4  −1  −2   0   0   6  −6  −9   0 |
| 4  −1  −2  −6   6   0   0   9   0 |
| 4  −1  −2   0   0  −6   6  −9   0 |
| 4   2   1   6   3   6   3   0   9 |
| 4   2   1  −6  −3   6   3   0  −9 |
| 4   2   1  −6  −3  −6  −3   0   9 |
| 4   2   1   6   3  −6  −3   0  −9 |   (16)

m = (ρ, e, ε, j_x, q_x, j_y, q_y, p_xx, p_yy)ᵀ   (17)
where ρ is the fluid density, e is related to the total energy, ε is related to the square of the energy, q_x and q_y are related to the energy
flux, and p_xx and p_yy are related to the symmetric stress tensor. The momentum components are:

j_x = ρu = Σ_k f_k c_kx − F_x/2   (18)

j_y = ρv = Σ_k f_k c_ky − F_y/2
where F = (F_x, F_y) is the external force. The equilibrium moments are calculated as:

m_0^eq = ρ
m_1^eq = −2ρ + 3(j_x² + j_y²)
m_2^eq = ρ − 3(j_x² + j_y²)
m_3^eq = j_x
m_4^eq = −j_x
m_5^eq = j_y
m_6^eq = −j_y
m_7^eq = j_x² − j_y²
m_8^eq = j_x j_y   (19)
The diagonal relaxation matrix R can be written in compact notation as:

R = diag(1.0, 1.4, 1.4, 1.0, 1.2, 1.0, 1.2, 2/(1 + 6ν), 2/(1 + 6ν))   (20)
More information about details of MRT can be found in (d’Humières, 2002; Lallemand and Luo, 2003; Li et al., 2016a,b;
Mohamad, 2011; Jami et al., 2016; Zhang et al., 2020).
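The MRT collision of Eq. (14) can be coded compactly with matrix-vector products. The following Fortran fragment is our own sketch, not part of the Appendix codes: fk(0:8) holds the distributions at one node, Minv is the matrix of Eq. (16), Mmat is its inverse (the transformation M), and R(0:8) holds the rates of Eq. (20).

! MRT collision at one node: map to moment space, relax, map back (Eq. 14)
m = matmul(Mmat, fk)                      ! velocity space -> moment space
! ... fill meq(0:8) from rho, jx, jy according to Eq. (19) ...
do k = 0, 8
   m(k) = m(k) + R(k)*(meq(k) - m(k))     ! relax each moment with its own rate
end do
fk = matmul(Minv, m)                      ! moment space -> velocity space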
2.4 Boundary conditions
Implementation of boundary conditions in LBM is not as straightforward as in Navier-Stokes solvers and needs more attention.
For example, consider the boundary grids shown in Fig. 1 for the north, east, south, and west boundaries of the D2Q9 model. The
solid arrows show distribution functions pointing out of the domain, which are known from streaming. The unknown distribution functions, shown as dotted arrows pointing into the domain, must be supplied by the boundary condition in LBM; they are
discussed in this section.
2.4.1 Bounce back
The bounce-back boundary condition is used for solid walls with no-slip conditions. With no slip, fluid particles adhere to the (rough)
surface, so the fluid velocity at the wall is zero. This is the most common boundary condition for the solid boundaries of the domain and for obstacles located in the fluid flow, and it is
how the no-slip boundary condition is implemented in lattice Boltzmann. If bounce back is applied
to a cell in an obstacle, every distribution function is replaced during the collision by the distribution function in the opposite
direction. In this scheme, the unknown distribution functions at boundary sites are set equal to the distribution functions in
the reverse directions. During the collision, the following relation is applied:

f_p^in(x, t + 1) = f_q^out(x, t)   (21)

where p and q represent opposite directions in the LBM models of Table 1. For example, for the west boundary in Fig. 1 the
following conditions are used:

f_1 = f_3, f_5 = f_7, f_8 = f_6   (22)

FIG. 1 Known (solid black lines) and unknown (dotted green lines) distribution functions for the D2Q9 model.
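A minimal Fortran sketch of Eq. (22) for the west boundary, in the style of the Appendix codes, where f(k,i,j) holds the distribution in direction k at node (i,j):

! Bounce back on the west boundary (i = 0): unknown incoming directions
! (1, 5, 8) take the values of their opposites (3, 7, 6), Eq. (22)
do j = 0, m
   f(1,0,j) = f(3,0,j)
   f(5,0,j) = f(7,0,j)
   f(8,0,j) = f(6,0,j)
end do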
2.4.2 The boundary with a given velocity
The velocity components at a boundary are known in many applications, such as the inlet ports of the domain. The equations for this
type of boundary condition are summarized below for the main boundaries (north, east, south, and west) of Fig. 1 (Table 2).

TABLE 2 Velocity boundary conditions for known velocity components on the main boundaries (D2Q9).

North (known u_N and v_N):
ρ_N = [f_0 + f_1 + f_3 + 2(f_2 + f_5 + f_6)]/(1 + v_N)
f_4 = f_2 − (2/3)ρ_N v_N
f_7 = f_5 + (1/2)(f_1 − f_3) − (1/6)ρ_N v_N − (1/2)ρ_N u_N
f_8 = f_6 + (1/2)(f_3 − f_1) − (1/6)ρ_N v_N + (1/2)ρ_N u_N

East (known u_E and v_E):
ρ_E = [f_0 + f_2 + f_4 + 2(f_1 + f_5 + f_8)]/(1 + u_E)
f_3 = f_1 − (2/3)ρ_E u_E
f_6 = f_8 − (1/2)(f_2 − f_4) − (1/6)ρ_E u_E + (1/2)ρ_E v_E
f_7 = f_5 + (1/2)(f_2 − f_4) − (1/6)ρ_E u_E − (1/2)ρ_E v_E

South (known u_S and v_S):
ρ_S = [f_0 + f_1 + f_3 + 2(f_4 + f_7 + f_8)]/(1 − v_S)
f_2 = f_4 + (2/3)ρ_S v_S
f_5 = f_7 − (1/2)(f_1 − f_3) + (1/6)ρ_S v_S + (1/2)ρ_S u_S
f_6 = f_8 + (1/2)(f_1 − f_3) + (1/6)ρ_S v_S − (1/2)ρ_S u_S

West (known u_W and v_W):
ρ_W = [f_0 + f_2 + f_4 + 2(f_3 + f_6 + f_7)]/(1 − u_W)
f_1 = f_3 + (2/3)ρ_W u_W
f_5 = f_7 − (1/2)(f_2 − f_4) + (1/6)ρ_W u_W + (1/2)ρ_W v_W
f_8 = f_6 + (1/2)(f_2 − f_4) + (1/6)ρ_W u_W − (1/2)ρ_W v_W
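A Fortran sketch of the west (inlet) entries of Table 2, essentially what the bouncon subroutine of Appendix B does for a prescribed inlet velocity uin with v_W = 0:

! Zou-He velocity inlet on the west boundary (Table 2, west row, v_W = 0)
do j = 0, m
   rhow = (f(0,0,j)+f(2,0,j)+f(4,0,j)+2.*(f(3,0,j)+f(6,0,j)+f(7,0,j)))/(1.-uin)
   f(1,0,j) = f(3,0,j) + 2.*rhow*uin/3.
   f(5,0,j) = f(7,0,j) - (f(2,0,j)-f(4,0,j))/2. + rhow*uin/6.
   f(8,0,j) = f(6,0,j) + (f(2,0,j)-f(4,0,j))/2. + rhow*uin/6.
end do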
2.4.3 The boundary with given pressure
The procedure for specifying pressure at a boundary is similar to that for specifying velocity, since both conditions result from
a density difference in the flow. Here the density (and hence pressure) at the boundary is known, in contrast to the case of a boundary with
known inlet velocities. A summary of the pressure boundary conditions is given in Table 3.
2.4.4 Open boundary condition
This condition is used at outlet ports and is implemented by extrapolation. It assumes zero gradients in the flow direction
and is applied as:

f_{3,n} = 2f_{3,n−1} − f_{3,n−2}   (23)

f_{6,n} = 2f_{6,n−1} − f_{6,n−2}   (24)

f_{7,n} = 2f_{7,n−1} − f_{7,n−2}   (25)

where n is the lattice node on the boundary, and (n−1) and (n−2) denote the lattice nodes inside the domain, adjacent to the boundary
in the flow direction. Another possible extrapolation can be implemented as (Seta et al., 2006):

f_{3,n} = (4/3)f_{3,n−1} − (1/3)f_{3,n−2}   (26)

f_{6,n} = (4/3)f_{6,n−1} − (1/3)f_{6,n−2}   (27)

f_{7,n} = (4/3)f_{7,n−1} − (1/3)f_{7,n−2}   (28)
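In code, Eqs. (26)-(28) for an outlet on the east boundary (i = n) read as follows, matching the bouncon subroutine of Appendix B:

! Open (outlet) boundary on the east side, extrapolation of Eqs. (26)-(28)
do j = 0, m
   f(3,n,j) = 4.*f(3,n-1,j)/3. - f(3,n-2,j)/3.
   f(6,n,j) = 4.*f(6,n-1,j)/3. - f(6,n-2,j)/3.
   f(7,n,j) = 4.*f(7,n-1,j)/3. - f(7,n-2,j)/3.
end do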
2.4.5 Symmetry boundary condition
If the problem is symmetrical along a line (2D) or surface (3D), a symmetrical boundary condition can be used to solve the
problem for just a part of the domain and save computer resources. In this boundary, the unknown distribution functions are
assumed to be equal to their mirror directions. For example, for the south boundary in Fig. 1, the symmetry boundary condition can be applied as:
f_2 = f_4, f_5 = f_8, f_6 = f_7   (29)
TABLE 3 Pressure boundary conditions for known density on the main boundaries (D2Q9). The tangential velocity component is assumed to be zero on each boundary.

North (known ρ_N):
v_N = [f_0 + f_1 + f_3 + 2(f_2 + f_5 + f_6)]/ρ_N − 1
f_4 = f_2 − (2/3)ρ_N v_N
f_7 = f_5 + (1/2)(f_1 − f_3) − (1/6)ρ_N v_N
f_8 = f_6 + (1/2)(f_3 − f_1) − (1/6)ρ_N v_N

East (known ρ_E):
u_E = [f_0 + f_2 + f_4 + 2(f_1 + f_5 + f_8)]/ρ_E − 1
f_3 = f_1 − (2/3)ρ_E u_E
f_6 = f_8 − (1/2)(f_2 − f_4) − (1/6)ρ_E u_E
f_7 = f_5 + (1/2)(f_2 − f_4) − (1/6)ρ_E u_E

South (known ρ_S):
v_S = 1 − [f_0 + f_1 + f_3 + 2(f_4 + f_7 + f_8)]/ρ_S
f_2 = f_4 + (2/3)ρ_S v_S
f_5 = f_7 − (1/2)(f_1 − f_3) + (1/6)ρ_S v_S
f_6 = f_8 + (1/2)(f_1 − f_3) + (1/6)ρ_S v_S

West (known ρ_W):
u_W = 1 − [f_0 + f_2 + f_4 + 2(f_3 + f_6 + f_7)]/ρ_W
f_1 = f_3 + (2/3)ρ_W u_W
f_5 = f_7 − (1/2)(f_2 − f_4) + (1/6)ρ_W u_W
f_8 = f_6 + (1/2)(f_2 − f_4) + (1/6)ρ_W u_W
2.4.6 Periodic boundary condition
If the flow is periodic, a periodic boundary condition can be used so that only one repeating part of the domain needs to be
simulated, which saves computational time. In this approach, the two sides of the periodic domain are linked: the unknown
distribution functions on one side are taken from the known distribution functions on the opposite side. For instance, if in Fig. 1 the east and
west boundaries are periodic, then:

f_{1,W} = f_{1,E}, f_{5,W} = f_{5,E}, f_{8,W} = f_{8,E}   (30)

f_{3,E} = f_{3,W}, f_{6,E} = f_{6,W}, f_{7,E} = f_{7,W}   (31)
where W and E denote west and east boundaries, respectively.
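A Fortran sketch of Eqs. (30) and (31), our own illustration in the style of the Appendix codes, where i = 0 is the west column and i = n the east column:

! Periodic boundary condition in the x-direction, Eqs. (30)-(31)
do j = 0, m
   f(1,0,j) = f(1,n,j)      ! unknowns entering at the west come from the east
   f(5,0,j) = f(5,n,j)
   f(8,0,j) = f(8,n,j)
   f(3,n,j) = f(3,0,j)      ! unknowns entering at the east come from the west
   f(6,n,j) = f(6,0,j)
   f(7,n,j) = f(7,0,j)
end do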
3. Thermal LBM

The temperature distribution in the domain can be obtained by solving the energy conservation equation:

∂T/∂t + U·∇T = α∇²T + q_g/(ρC)   (32)

where T is the temperature, α the thermal diffusivity, ρ the density, C the specific heat capacity, and q_g the heat
source.
As discussed before, f in Eq. (10) is the distribution function used to find the flow field in the domain. To solve
Eq. (32) for the temperature field, a second distribution function, g, is required. To handle combined fluid flow and
temperature fields, the thermal LBM therefore uses two distribution functions, f and g, for the flow and temperature fields, respectively. The g distribution function evolves as:

g_k(x + c_k Δt, t + Δt) = g_k(x, t) + (Δt/τ_g)[g_k^eq(x, t) − g_k(x, t)] + Δt ω_k S_T   (33)

where g_k^eq and S_T are the corresponding equilibrium distribution function and the heat source term, respectively. Since the temperature
is a scalar, the equilibrium distribution function can be written as (Mohamad, 2011; Mezrhab et al., 2006;
Delavar et al., 2009, 2010; Ajarostaghi et al., 2019; Delavar and Wang, 2020a, 2021a):

g_k^eq(x, t) = ω_k T [1 + (c_k·U)/c_s²]   (34)

The source term is related to the macroscopic heat source q_g by:

S_T = q_g/(ρC)   (35)
The temperature is calculated as:

T = Σ_k g_k   (36)
The effect of the buoyancy force caused by temperature-induced density changes should be included in Eq. (10); in the vertical (y)
direction it is calculated as (Mohamad, 2011):

F_k = 3 ω_k g_y β ρ θ c_ky   (37)

where β is the thermal expansion coefficient, g_y the gravitational acceleration, c_ky the y-component of the lattice velocity c_k, and θ a dimensionless
temperature defined as:

θ = (T − T_c)/(T_h − T_c)   (38)
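A minimal Fortran sketch of the thermal collision step, Eqs. (33)-(34) without a heat source, matching the colls subroutine of Appendix A (omegat plays the role of Δt/τ_g):

! Thermal BGK collision at one node, Eqs. (33)-(34), no heat source
do k = 0, 8
   geq(k,i,j) = th(i,j)*w(k)*(1.0 + 3.0*(u(i,j)*cx(k) + v(i,j)*cy(k)))   ! Eq. (34)
   g(k,i,j)   = g(k,i,j) + omegat(i,j)*(geq(k,i,j) - g(k,i,j))           ! Eq. (33)
end do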
3.1 Boundary condition with a given temperature
A fixed temperature at a boundary is implemented in LBM in the same way as for any other scalar (such as
concentration). For example, if the temperature of the east boundary in Fig. 1 is equal to T_w, then:

g_3 = T_w(ω_1 + ω_3) − g_1
g_6 = T_w(ω_6 + ω_8) − g_8   (39)
g_7 = T_w(ω_5 + ω_7) − g_5

Note that if any nondimensionalization or normalization is applied during the solution, the boundary conditions must be adjusted accordingly.
3.2 Constant heat flux boundary condition
According to Fourier's law, the temperature gradient is related to the heat flux; for example, for the south boundary in Fig. 1:

q″_y = −λ ∂T/∂y   (40)

where λ is the thermal conductivity. This equation can be discretized as:

q″_y = −λ ∂T/∂y ≈ −λ [T(i, 1) − T(i, 0)]/Δy   (41)

so that:

T(i, 0) = T(i, 1) + (q″_y/λ) Δy   (42)

which is then applied in the same way as the fixed-temperature boundary condition discussed in the previous section.
If q″_y = 0, this reduces to an adiabatic boundary condition, which is treated like the open boundary condition discussed before.
4. Multicomponent LBM (species transport modeling)
In multicomponent models, different species react with each other and depend on the local flow and thermal fields. In
such simulations, a species conservation equation is solved for each species (Delavar and Wang,
2020b, 2021b,c, 2022a):

U·∇C_l = D_l^eff ∇²C_l + S_l   (43)

Σ_l C_l = 1   (44)

where l denotes the chemical species, C_l is the mole fraction of species l, S_l is the mass generation/consumption rate of
species l per unit volume, and D_l is the diffusion coefficient of the lth component.
Each species concentration is a scalar, so it is treated like temperature. A new distribution function g_l is used for each species, defined as (Delavar and Wang, 2020b, 2021a,b,c):

g_{lk}(x + c_k Δt, t + Δt) = g_{lk}(x, t) + (Δt/τ_l)[g_{lk}^eq(x, t) − g_{lk}(x, t)] + Δt ω_k S_l   (45)

g_{lk}^eq = ω_k C_l [1 + (c_k·U)/c_s²]   (46)

where S_l is the source term of the lth species. As with temperature, after computing the local distribution functions,
the local concentration is:

C_l = Σ_k g_{lk}   (47)
Because species concentration, like temperature, is a scalar quantity, its boundary conditions will be handled in the same
way as those for temperature computation, as detailed in Section 3.
5. Flow simulation in porous media

Since Darcy's law does not predict fluid flow in porous media well at high velocities, two prominent modifications have been proposed. The first is Forchheimer's equation, which adds a nonlinear drag effect due to the solid matrix,

∇P = −(μ/K) U − [1.75 ρ_f/√(150 ε³ K)] |U| U   (48)

and the second is Brinkman's equation, which accounts for the viscous stress introduced by the solid boundary,

∇P = −(μ/K) U + μ_eff ∇²U   (49)

where K is the permeability, ε denotes the porosity, and μ_eff represents the effective viscosity. Nield and Bejan (2006) combined
these two equations and derived the Brinkman-Forchheimer equation, which includes the viscous and inertial terms through the
local volume averaging technique. This model has been used successfully to investigate fluid flow in porous media over a wide
range of porosities and Rayleigh, Reynolds, and Darcy numbers (Seta et al., 2006; Yan et al., 2008; Delavar et al., 2010;
Ajarostaghi et al., 2019; Tian and Wang, 2017, 2018). The Brinkman-Forchheimer equation is:

∂U/∂t + (U·∇)(U/ε) = −(1/ρ)∇(εp) + ν_eff ∇²U + F   (50)

where ν_eff is the effective kinematic viscosity and F is the total body force due to viscous diffusion, the inertia induced by the
porous medium, and any external force:

F = −(εν/K) U − [1.75/√(150 ε³ K)] |U| U + εG   (51)

where G is the gravitational acceleration.
The equilibrium distribution function is modified to:

f_k^eq = w_k ρ [1 + (c_k·U)/c_s² + (c_k·U)²/(2εc_s⁴) − U²/(2εc_s²)]   (52)

The appropriate choices of the forcing term F_k and of U to capture the correct hydrodynamics are:

F_k = w_k ρ [1 − 1/(2τ_v)] [(c_k·F)/c_s² + (U·c_k)(F·c_k)/(εc_s⁴) − (U·F)/(εc_s²)]   (53)

U = V/(c_0 + √(c_0² + c_1 |V|)),   V = Σ_k c_k f_k/ρ + (Δt/2) εG   (54)

c_0 = (1/2)[1 + ε (Δt/2)(ν/K)],   c_1 = ε (Δt/2) [1.75/√(150 ε³ K)]   (55)
To solve the energy equation, the overall thermal conductivity of the porous medium must be specified; it depends on the
porous solid structure and on the fluid. If heat transfer through the solid structure and the fluid occurs in parallel, the overall conductivity λ_eff is a weighted sum of the solid and fluid conductivities, λ_s and λ_f (Nield and Bejan, 2006):

λ_eff = (1 − ε)λ_s + ελ_f   (56)

If the heat transfer occurs in series, the effective thermal conductivity is (Nield and Bejan, 2006):

λ_eff = [(1 − ε)/λ_s + ε/λ_f]⁻¹   (57)

In general, these two equations give upper and lower bounds, respectively. For practical purposes, it is recommended to
use (Nield and Bejan, 2006):

λ_eff = λ_s^(1−ε) λ_f^ε   (58)

The above equations are valid when λ_s and λ_f do not differ greatly; otherwise, the effective thermal conductivity in porous media can be calculated from (Jiang and Lu, 2006; Delavar and Hedayatpour, 2012):

λ_eff = λ_f {1 − √(1 − ε) + [2√(1 − ε)/(1 − σB)] [ (1 − σ)B/(1 − σB)² ln(1/(σB)) − (B + 1)/2 − (B − 1)/(1 − σB) ]}   (59)

B = 1.25 [(1 − ε)/ε]^(10/9),   σ = λ_f/λ_s
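As an illustration of Eqs. (51), (54), and (55), the following Fortran fragment (our own sketch, not taken from the Appendix codes; the names eps, perm, nu, gy and dt are assumptions denoting porosity, permeability, kinematic viscosity, gravity, and the time step) computes the composite velocity at one node of the porous-media model:

! Velocity correction for the porous-media (Brinkman-Forchheimer) LBM model
c0 = 0.5*(1.0 + eps*0.5*dt*nu/perm)                       ! Eq. (55)
c1 = eps*0.5*dt*1.75/sqrt(150.0*eps**3*perm)
vx = 0.0
vy = 0.0
do k = 0, 8                                               ! temporal velocity V, Eq. (54)
   vx = vx + cx(k)*f(k,i,j)/rho(i,j)
   vy = vy + cy(k)*f(k,i,j)/rho(i,j)
end do
vy = vy + 0.5*dt*eps*gy
vmag = sqrt(vx*vx + vy*vy)
u(i,j) = vx/(c0 + sqrt(c0*c0 + c1*vmag))                  ! Eq. (54)
v(i,j) = vy/(c0 + sqrt(c0*c0 + c1*vmag))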
Fig. 2 shows the results of lattice Boltzmann simulation of biohydrogen production by microorganisms in a porous
microbioreactor.
6. Dimensionless numbers
In LBM simulations, dimensionless numbers are needed to relate the lattice quantities of the simulation to the corresponding real
physical parameters. Dimensionless numbers are used in the simulations in order
to (Delavar and Wang, 2020a,b):
(1) Simplify the relation between real and simulation parameters by reducing the number of variables involved; for instance,
the Reynolds number can be used instead of the velocity, kinematic viscosity, and characteristic length individually.
(2) Analyze the system behavior independently of the units used to measure the variables.
FIG. 2 Velocity vectors and normalized biohydrogen contours (by
maximum concentration) in a porous microbioreactor simulated
using LBM.
(3) Rescale the governing parameters and variables so that all computed quantities are of comparable magnitude (in LBM, typically between 0 and 1).
For the simulation of fluid flow, the Reynolds number is the governing dimensionless group. For a given Reynolds number
(Delavar et al., 2010; Delavar and Wang, 2020b, 2021a):

Re = (UH/ν)_real = (UN/ν)_LBM   (60)

where N is the number of grid nodes used for meshing, H is the characteristic length, and the subscripts real and LBM denote the real
system and the LBM simulation, respectively.
The next important dimensionless number is the Prandtl number, the ratio of kinematic viscosity to thermal
diffusivity; for a given value of the Prandtl number:

Pr = (ν/α)_real = (ν/α)_LBM   (61)

The Rayleigh number is used when natural convection is important and is defined as:

Ra = gβΔTH³/(αν)   (62)
where β is the thermal expansion coefficient, g the gravitational acceleration, and ΔT the reference temperature difference (usually the maximum temperature difference in the domain).
For the mass transport equation, the main governing dimensionless number is the Schmidt number, defined as the ratio of
momentum diffusivity (kinematic viscosity), ν, to mass diffusivity, D_l. It characterizes the mass transport
equation with respect to the fluid flow (Delavar and Wang, 2021b):

Sc_l = ν/D_l   (63)

Therefore, for the transport of each species, the corresponding Schmidt number must be calculated and kept the same in the real system and in the
numerical simulation. In the biofilm example of Fig. 2, for instance, the diffusion coefficient of a species is reduced by 80% in cells occupied by biofilm, and
this change must be reflected in the local Schmidt number.
In Eqs. (34) and (45), the source terms are multiplied by Δt, which shows the importance of correctly relating the LBM time step
to real time. This relation is enforced through (Delavar et al., 2010; Delavar and Wang, 2020b, 2022b):

(D_l t/L_D²)_real = (D_l t/L_D²)_LBM   (64)

where L_D is a characteristic length, D_l the diffusion coefficient of the species, and t the time.
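As a short worked example of Eq. (60) (our own illustration, with assumed numbers): if a flow has Re = 100 and the characteristic length is resolved with N = 100 lattice nodes, then choosing a lattice velocity U_LBM = 0.02 fixes the lattice viscosity, and hence the BGK relaxation time used in the Appendix codes.

program re_example
  implicit none
  real :: Re, N_lat, u_lbm, nu_lbm, tau
  Re = 100.0        ! target Reynolds number (assumed)
  N_lat = 100.0     ! lattice nodes across the characteristic length (assumed)
  u_lbm = 0.02      ! lattice velocity, kept small compared with c_s (assumed)
  nu_lbm = u_lbm*N_lat/Re        ! from Re = U*N/nu, Eq. (60)
  tau = 3.0*nu_lbm + 0.5         ! BGK relaxation time, as set in the Appendix codes
  print *, 'nu_LBM =', nu_lbm, '  tau =', tau
end program re_example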
7. Flow chart of the simulation procedure
The simulation flow chart, shown in Fig. 3, contains the following steps:
(1) Input geometry: The computational domain is defined in this step.
(2) Initialization: Input initial values of parameters (such as velocity, temperature, etc.) and boundary conditions or update
their values regarding the previous time step.
(3) Solve the hydrodynamic equations: Collision and streaming are solved to find flow fields in the domain in both fluid
and porous zones.
(4) Solve the heat transfer equation: the temperature field is solved using the velocity field obtained in the previous step.
(5) Solve the mass transport of species: the effects of the flow and temperature fields are included in the mass transport
equation. A condensed Fortran skeleton of this loop, assembled from the Appendix A subroutines, is shown below.
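The following skeleton shows how the flow-chart steps map onto the subroutines of Appendix A; the loop body is condensed, and the full version, including autosaving and output, is given in the Appendix.

! Main time loop: one iteration of the flow chart in Fig. 3
do while (mstep == 1)
   call collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,gbeta,visco,thref)  ! step 3: collide
   call streaming(f,n,m)                                                 ! step 3: stream
   call bouncon(f,n,m)                                                   ! flow boundary conditions
   call uvcalc(f,rho,u,v,n,m,cx,cy)                                      ! macroscopic rho, u, v
   call colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)                           ! step 4: thermal collision
   call streaming(g,n,m)                                                 ! thermal streaming
   call gbouncon(g,tw1,tw2,w,n,m)                                        ! thermal boundary conditions
   call thcalcu(g,th,n,m)                                                ! temperature field
   kk = kk + 1
   if (kk == 99999) mstep = 2                                            ! stop after a fixed number of steps
end do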
8. Multiphase flows
Multiphase flows are important in many applications such as power plants, environmental systems, the recovery of
petroleum resources from reservoirs, fuel cell operations, etc. Consequently, they have been the subject of many numerical
and experimental studies. In the last decades, LBM has become a reliable technique for simulating multiphase flows (Huang
et al., 2015; Shan and Chen, 1993; Inamuro et al., 2004; Liu et al., 2016; Li et al., 2016a,b; Chen et al., 2018; Niu et al.,
2018).

FIG. 3 LBM simulation flow chart.

Compared to conventional CFD methods, LBM has some advantages due to its kinetic nature. In conventional CFD
methods, the interfacial behavior is usually captured through complicated closure relations, a transport equation for
the volume fraction, and an interface reconstruction step. In the LB method, by contrast, the incorporation of
intermolecular-level interactions makes it easy to capture the multiphase interface and the interfacial dynamics (Ba et al.,
2016). Among the different multiphase LBM models, a few are particularly popular: the color-gradient model, the
Shan-Chen (SC) model, the free-energy (FE) model, and the interface-tracking model. A brief review
of the color-gradient model and the Shan-Chen model is presented in this section. More information about the different LBM multiphase
models can be found in (Huang et al., 2015; Sattari et al., 2014, 2016; Inamuro et al., 2004; Yan and Zu, 2007; Zheng et al.,
2006; He et al., 1999). Fig. 4 shows the results of a bubble-rise simulation using the lattice Boltzmann method.
8.1 The color-gradient model
In this model, two distribution functions are used to simulate a two-component fluid flow: one for a red-colored fluid and
one for a blue-colored fluid. The model was proposed by Gunstensen et al. (1991) based on the Rothman-Keller (RK)
multiphase lattice gas model (Rothman and Keller, 1988) and has since been modified by several authors.
FIG. 4 Isosurfaces and velocity vectors of bubble rise in the vertical direction, simulated using LBM.
The model was modified to handle binary fluids with different density and viscosity ratios (Grunau et al., 1993). Latva-Kokko and Rothman (2005) introduced an extra collision term into the model. Ahrenholz et al. (2008) improved the model by
using a multiple relaxation time (MRT) LBM to simulate higher viscosity ratios and lower capillary numbers, with the
advantage of independent adjustment of the surface tension and the viscosity ratio. More recently, for density
ratios of order O(10) or larger, several schemes have been suggested to improve the model (Huang et al., 2013; Ba et al., 2016).
In this model, the two immiscible fluids are represented as red and blue fluids. The distribution function f_k^X (X = R or B)
represents fluid X, with R for the red fluid and B for the blue fluid. The evolution and total distribution functions are (Ba et al., 2016):

f_k^X(x + c_k Δt, t + Δt) = f_k^X(x, t) + Ω_k^X[f_k^X(x, t)]   (65)

f_k(x, t) = f_k^R(x, t) + f_k^B(x, t)   (66)

Ω_k^X = (Ω_k^X)^(3)[(Ω_k^X)^(1) + (Ω_k^X)^(2)]   (67)

where Ω_k^X is the total collision operator, (Ω_k^X)^(1) is the single-phase collision operator, (Ω_k^X)^(2) is the perturbation operator that generates the
interfacial tension, and (Ω_k^X)^(3) is the recoloring operator that produces the phase segregation and maintains the phase interface.
The BGK collision operator is (Ba et al., 2016):

(Ω_k^X)^(1) = −W^X [f_k^X − f_k^(X,eq)]   (68)

f_k^(X,eq) = ρ^X (φ_k^X + ω_k [1 + (c_k·U)/c_s² + 0.5 (c_k·U)²/c_s⁴ − 0.5 (U·U)/c_s²])   (69)

where W^X is the relaxation parameter and φ_k^X is a parameter related to the density ratio of the fluids and the sound speed in each fluid.
The perturbation operator is:

(Ω_k^X)^(2) = A^X ω_k (1 − W^X/2)[3(c_k − U) + 9(c_k·U)c_k]·F_S   (70)

where A^X is the fraction of the interfacial tension contributed by fluid X, and F_S is the interfacial tension force (Ba et al., 2016). The
recoloring operators for the red and blue fluids are calculated as:

(Ω_k^R)^(3)(f_k′) ≡ f_k^R = (ρ_R/ρ) f_k′ + β ω_k (ρ_R ρ_B/ρ) cos(φ_k) |c_k|   (71)

(Ω_k^B)^(3)(f_k′) ≡ f_k^B = (ρ_B/ρ) f_k′ − β ω_k (ρ_R ρ_B/ρ) cos(φ_k) |c_k|   (72)

where f_k′ is the post-perturbation value of the total distribution function, β is a segregation parameter, and φ_k is the angle between the color gradient and the lattice
direction. More detail about this model is presented in Ba et al. (2016) and Huang et al. (2015).
8.2 Shan-Chen model
The Shan-Chen (SC) model (Shan and Chen, 1993, 1994; Sukop and Thorne, 2006) is based on the incorporation of an
attractive or repulsive force between particles in LBM, which leads to phase separation. There are both single-component
and multicomponent multiphase models based on the SC model (Shan and Chen, 1993; Shan and Doolen, 1995). This
model has been successfully used to simulate various multiphase phenomena such as condensation, evaporation, cavitation, bubble rise, relative permeability calculation in porous media, and oil-water-like two-component flow in porous media
(Sankaranarayanan et al., 2002; Chen et al., 2014; Falcucci et al., 2013; Dong et al., 2010; Nekoeian et al., 2018;
Dauyeshova et al., 2018; Sipey et al., 2020).
The basic model produces good results for density ratios of up to about 10 (Huang et al., 2011), and the surface tension cannot be
specified independently of the interparticle force; some studies have therefore been carried out to increase the achievable density ratio
(Bao and Schaefer, 2013).
In the multicomponent multiphase SC model, each distribution function represents one fluid component and is calculated
using the following lattice Boltzmann equation (Bao and Schaefer, 2013; Sipey et al., 2020):

f_k^σ(x + c_k dt, t + dt) = f_k^σ(x, t) − (1/τ_σ)[f_k^σ(x, t) − f_k^(σ,eq)(x, t)]   (73)

f_k^(σ,eq)(x, t) = ω_k ρ_σ [1 + (c_k·u_σ^eq)/c_s² + (c_k·u_σ^eq)²/(2c_s⁴) − (u_σ^eq)²/(2c_s²)]   (74)

where f_k^σ is the density distribution function of the σth component, τ_σ is the single relaxation time for each component, related to the
kinematic viscosity by ν_σ = c_s²(τ_σ − 0.5 dt), and f_k^(σ,eq) is the equilibrium distribution. The density and velocity of the σth component are calculated as:

ρ_σ = Σ_k f_k^σ   (75)

u_σ = (Σ_{k=0}^{8} c_k f_k^σ)/ρ_σ   (76)

The equilibrium (composite) velocity is given by:

u_σ^eq = [Σ_{σ=1}^{2} (ρ_σ u_σ/τ_σ)] / [Σ_{σ=1}^{2} (ρ_σ/τ_σ)] + τ_σ F_σ/ρ_σ   (77)

where F_σ is the force acting on the σth component, including the fluid-fluid interaction, the fluid-solid interaction, and body forces:

F_σ = F_σ^f + F_σ^s + F_σ^b   (78)
Details about this model can be found in (Bao and Schaefer, 2013; Sipey et al., 2020).
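A minimal Fortran sketch of Eqs. (75)-(77) for two components (our own illustration; the arrays fs(0:8,1:2), tau(1:2), Fx(1:2), Fy(1:2) and the lattice vectors cx, cy are assumed to be available at the current node):

! Component densities, velocities, and composite velocity, Eqs. (75)-(77)
do s = 1, 2
   rhos(s) = 0.0; ux(s) = 0.0; uy(s) = 0.0
   do k = 0, 8
      rhos(s) = rhos(s) + fs(k,s)                   ! Eq. (75)
      ux(s)   = ux(s)   + cx(k)*fs(k,s)
      uy(s)   = uy(s)   + cy(k)*fs(k,s)
   end do
   ux(s) = ux(s)/rhos(s); uy(s) = uy(s)/rhos(s)     ! Eq. (76)
end do
denom = rhos(1)/tau(1) + rhos(2)/tau(2)
uxc = (rhos(1)*ux(1)/tau(1) + rhos(2)*ux(2)/tau(2))/denom   ! common velocity
uyc = (rhos(1)*uy(1)/tau(1) + rhos(2)*uy(2)/tau(2))/denom
do s = 1, 2
   uxeq(s) = uxc + tau(s)*Fx(s)/rhos(s)             ! per-component shift, Eq. (77)
   uyeq(s) = uyc + tau(s)*Fy(s)/rhos(s)
end do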
9. Sample test cases and codes
In this section, the thermal LBM described in Section 3 is applied to two sample cases. The test cases illustrate how to
set up a heat transfer problem, make simplifying assumptions, impose boundary conditions, and solve the thermal equations. The velocity
field is solved first, and the temperature field is then solved as a scalar variable. The codes are written in Intel(R) Visual
Fortran and are provided in the Appendix.
9.1 Free convection in L-cavity
This example describes a thermal fluid problem of free convection in an L-cavity (Fig. 5). The left boundary is set up as a
hot wall with a constant temperature, T_h = 1. The top boundary consists of two adiabatic walls with a constant-temperature
section, T_c = 0, between them. The right boundary is set at the constant temperature T_c, and the bottom boundary is an adiabatic wall. Free convection is driven by the buoyancy force, which is controlled by the local temperature and the Rayleigh number. Fig. 5
shows the interaction between the thermal field and the flow field of free convection. The code is included in Appendix A.
It provides a platform for implementing other source terms in the simulation procedure, such as heat or mass sources, viscous
forces due to porous media, or magnetohydrodynamic forces.
9.2 Forced convection in a channel
This example describes a thermal fluid problem of forced convection in a rectangular channel (Fig. 6), similar to an
electronic cooling case. An electronic device produces heat (right block). A cooling fluid is forced through the channel and removes
the heat produced by the device through forced convection. To reduce the temperature of the forced fluid, a heat pipe is used
as a low-temperature block (left block).
The left boundary is set up as an inlet and the right boundary as the outlet. The top and bottom boundaries are adiabatic
walls. The left block is a cold source with a constant low temperature, T_l = 0.25, and the right block is a heat source
with a constant high temperature, T_h = 1. The inlet temperature is 0.5. Fig. 6 shows the interaction between the thermal field
and the flow field of forced convection. The code is included in Appendix B.
FIG. 5 Boundary condition, temperature contours, and velocity vectors of free convection in
an L-cavity geometry.
FIG. 6 Velocity vectors and temperature contours of
forced convection in a channel with two obstacles.
10. Conclusions

In this chapter, the main concepts and applications of the lattice Boltzmann method were presented. As a mesoscopic
numerical approach, LBM is based on distribution functions streaming along a limited set of directions, rather than on directly solving the macroscopic conservation equations
of mass, momentum, and energy as conventional CFD methods do. This makes LBM well suited
to nonequilibrium dynamics, complex geometries and boundary conditions, and parallel processing, and it avoids solving
the computationally expensive Poisson equation to obtain the pressure field. Single-relaxation and multirelaxation time LBM
were described, followed by the thermal LBM model, multicomponent LBM models, and multiphase LBM models. Two
sample cases with their codes are presented at the end of this chapter to provide a platform for students, researchers,
and engineers to develop their own codes for scientific and practical numerical simulations.
Appendix A
Computer code for free convection in L-cavity
parameter (n=120,m=120)
real f(0:8,0:n,0:m),feq(0:8,0:n,0:m),visco(0:n,0:m),omega(0:n,0:m)
real rho(0:n,0:m),rhoo(0:n,0:m),uin(0:m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real alpha(0:n,0:m),gbeta(0:n,0:m),omegat(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8)
real u(-1:n+1,-1:m+1),v(-1:n,-1:m+1)
integer i,j
cx(:)=(/0.0,1.0,0.0,-1.0,0.0,1.0,-1.0,-1.0,1.0/)
cy(:)=(/0.0,0.0,1.0,0.0,-1.0,1.0,1.0,-1.0,-1.0/)
w(:)=(/4./9.,1./9.,1./9.,1./9.,1./9.,1./36.,1./36.,1./36.,1./36./)
ra=1.e6
SV=0.0
dx=1.0
dy=dx
tw1=1.0
tw2=0
thref=((tw1+tw2)/2)
dt=(tw1-tw2)
pr=.71
!**********************************************!
! Setting initial values
!**********************************************!
do i=0,n
do j=0,m
rhoo(i,j)=6.0
visco(i,j)=0.02
u(i,j)=0.
v(i,j)=0.
th(i,j)=0.0
end do
end do
do i=0,n
u(i,m)=0.0
v(i,m)=0.0
end do
!**********************************************!
! setting LBM solution parameters
!**********************************************!
do i=0,n
do j=0,m
alpha(i,j)=visco(i,j)/pr
rho(i,j)=rhoo(i,j)
end do
end do
do i=0,n
do j=0,m
omega(i,j)=1./(3.*visco(i,j)+0.5)
end do
end do
do i=0,n
do j=0,m
omegat(i,j)=1./(3.*alpha(i,j)+0.5)
end do
end do
do i=0,n
do j=0,m
gbeta(i,j)=ra*visco(i,j)*alpha(i,j)/(float(m*m*m)) ! Attention required
end do
end do
mstep=1 !
savenumber=0
kk=0
!**********************************************!
!main loop
!**********************************************!
!main loop
1 do while (mstep==1) !kk=1,mstep
call collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,gbeta,visco,thref)
call streaming(f,n,m)
call bouncon(f,n,m)
call uvcalc(f,rho,u,v,n,m,cx,cy)
! -----------------------------!collesion for th
call colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
!streaming for th
call streaming(g,n,m)
call gbouncon(g,tw1,tw2,w,n,m)
call thcalcu(g,th,n,m)
! Showing some parameters to check solution in iterations
print*,kk,"**",th(n/2,m/2),"**",u(n/2,m/2),"**"
! maximum iteration criterion
if (kk==99999) mstep=2
! setting autosave period
if(savenumber==5000) call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
kk=kk+1
savenumber=savenumber+1
errmaxu=0.0
errmaxsc=0.0
END DO
!**********************************************!
! end of the main loop
!**********************************************!
call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
stop
end
!**********************************************!
! Subroutine of collesion for FLOW field
!**********************************************!
subroutine collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,gbeta,visco,thref)
real f(0:8,0:n,0:m),omega(0:n,0:m),gbeta(0:n,0:m)
real feq(0:8,0:n,0:m),rho(0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),visco(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
t1=u(i,j)*u(i,j)+v(i,j)*v(i,j)
do k=0,8
t2=u(i,j)*cx(k)+v(i,j)*cy(k) !u.c(k)
force=3.0*w(k)*gbeta(i,j)*((th(i,j)-thref)*cy(k)*rho(i,j))
if(i.eq.0.or.i.eq.n) force=0.0 !Attention required
if(j.eq.0.or.j.eq.m) force=0.0 !Attention required
feq(k,i,j)=rho(i,j)*w(k)*(1.0+3.0*t2+4.5*t2*t2-1.50*t1)
f(k,i,j)=omega(i,j)*feq(k,i,j)+(1.-omega(i,j))*f(k,i,j)+force
end do
end do
end do
return
end
!**********************************************!
! Subroutine of collesion for Thermal field
!**********************************************!
subroutine colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),omegat(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
do k=0,8
geq(k,i,j)=th(i,j)*w(k)*(1.0+3.0*(u(i,j)*cx(k)+v(i,j)*cy(k)))
g(k,i,j)=omegat(i,j)*geq(k,i,j)+(1.0-omegat(i,j))*g(k,i,j)
end do
end do
end do
return
end
!**********************************************!
! Subroutine of streaming
!**********************************************!
subroutine streaming(f,n,m)
real f(0:8,0:n,0:m)
! streaming
do j=0,m
do i=n,1,-1 ! RIGHT TO LEFT
f(1,i,j)=f(1,i-1,j)
end do
do i=0,n-1 ! LEFT TO RIGHT
f(3,i,j)=f(3,i+1,j)
end do
end do
do j=m,1,-1 ! TOP TO BOTTOM
do i=0,n
f(2,i,j)=f(2,i,j-1)
end do
do i=n,1,-1
f(5,i,j)=f(5,i-1,j-1)
end do
do i=0,n-1
f(6,i,j)=f(6,i+1,j-1)
end do
end do
do j=0,m-1 !BOTTOM TO TOP
do i=0,n
f(4,i,j)=f(4,i,j+1)
end do
do i=0,n-1
f(7,i,j)=f(7,i+1,j+1)
end do
do i=n,1,-1
f(8,i,j)=f(8,i-1,j+1)
end do
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for flow field
!**********************************************!
subroutine bouncon(f,n,m)
real f(0:8,0:n,0:m)!,feq(0:8,0:n,0:m)
real uin(0:m+m)
! West boundary, Bounce Back
do j=0,m
f(1,0,j)=f(3,0,j)
f(5,0,j)=f(7,0,j)
f(8,0,j)=f(6,0,j)
! East Boundary, Bounce Back
f(3,n,j)=f(1,n,j)
f(6,n,j)=f(8,n,j)
f(7,n,j)=f(5,n,j)
end do
do i=0,n
! South Boundary, Bounce Back
f(2,i,0)=f(4,i,0)
f(5,i,0)=f(7,i,0)
f(6,i,0)=f(8,i,0)
! North Boundary, Bounce Back
f(4,i,m)=f(2,i,m)
f(8,i,m)=f(6,i,m)
f(7,i,m)=f(5,i,m)
end do
!obstacle
! West boundary, Bounce Back
do j=0,m
f(1,n/2,j)=f(3,n/2,j)
f(5,n/2,j)=f(7,n/2,j)
f(8,n/2,j)=f(6,n/2,j)
end do
do i=0,n
! South Boundary, Bounce Back
f(2,i,m/2)=f(4,i,m/2)
f(5,i,m/2)=f(7,i,m/2)
f(6,i,m/2)=f(8,i,m/2)
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for Thermal field
!**********************************************!
subroutine gbouncon(g,tw1,tw2,w,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m)
real w(0:8),tw1,tw2
! Boundary Conditions
! West Boundary Condition, T=1
do j=0,m
g(1,0,j)=tw1*(w(1)+w(3))-g(3,0,j)
g(5,0,j)=tw1*(w(5)+w(7))-g(7,0,j)
g(8,0,j)=tw1*(w(8)+w(6))-g(6,0,j)
end do
! East Boundary Condition, T=0
do j=0,m
g(6,n,j)=tw2*(w(8)+w(6))-g(8,n,j)
g(3,n,j)=tw2*(w(1)+w(3))-g(1,n,j)
g(7,n,j)=tw2*(w(5)+w(7))-g(5,n,j)
end do
! Top Boundary Condition, Part 1, Adiabatic
do i=0,n/4
do k=0,8
g(k,i,m)=g(k,i,m-1)
end do
end do
! Top Boundary Condition, Part 2, T=0
do i=n/4+1,3*n/4
g(4,i,m)=tw2*(w(2)+w(4))-g(2,i,m)
g(7,i,m)=tw2*(w(5)+w(7))-g(5,i,m)
g(8,i,m)=tw2*(w(6)+w(8))-g(6,i,m)
end do
! Top Boundary Condition, Part 3, Adiabatic
do i=3*n/4+1,n
do k=0,8
g(k,i,m)=g(k,i,m-1)
end do
end do
! Bottom Boundary Condition, Adiabatic
do i=0,n
do k=0,8
g(k,i,0)=g(k,i,1)
end do
end do
!Obstacle
! West Boundary Condition, T=1
do j=0,m/2
g(1,n/2,j)=tw1*(w(1)+w(3))-g(3,n/2,j)
g(5,n/2,j)=tw1*(w(5)+w(7))-g(7,n/2,j)
g(8,n/2,j)=tw1*(w(8)+w(6))-g(6,n/2,j)
end do
! Top Boundary Condition, T=1
do i=0,n/2
g(2,i,m/2)=tw1*(w(2)+w(4))-g(4,i,m/2)
g(5,i,m/2)=tw1*(w(5)+w(7))-g(7,i,m/2)
g(6,i,m/2)=tw1*(w(6)+w(8))-g(8,i,m/2)
end do
return
end
!**********************************************!
! Temperature calculation
!**********************************************!
subroutine thcalcu(g,th,n,m)
real g(0:8,0:n,0:m),th(0:n,0:m)
do j=0,m
do i=0,n
thsum=0.0
do k=0,8
thsum=thsum+g(k,i,j)
end do
th(i,j)=thsum
end do
end do
return
end
!**********************************************!
! Subroutine of velocity calculation
!**********************************************!
subroutine uvcalc(f,rho,u,v,n,m,cx,cy)
real f(0:8,0:n,0:m),rho(0:n,0:m),u(0:n,0:m),v(0:n,0:m)
real cx(0:8),cy(0:8)
do j=0,m
do i=0,n
uvsum=0.0
do k=0,8
uvsum=uvsum+f(k,i,j)
end do
rho(i,j)=uvsum
end do
end do
do j=0,m
do i=0,n
usum=0.0
vsum=0.0
do k=0,8
usum=usum+f(k,i,j)*cx(k)
vsum=vsum+f(k,i,j)*cy(k)
end do
u(i,j)=usum/rho(i,j)
v(i,j)=vsum/rho(i,j)
end do
end do
return
end
!**********************************************!
! Subroutine of exporting results
!**********************************************!
subroutine result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
real u(0:n,0:m),v(0:n,0:m),rho(-1:n+1,-1:m+1),th(0:n,0:m)
real strf(0:n,0:m)
real tw1,tw2,dt,thref
real tt(0:n,0:m)
CHARACTER FILOUT1*18
CHARACTER FILOUT2*18
WRITE(FILOUT1,'(4HUVTS,I8,4H.lbm)')kk
WRITE(FILOUT2,'(4HNUAV,I8,4H.txt)')kk
! Streamfunction Calculations
strf(0,0)=0.
do i=0,n
rhoav=0.5*(rho(i-1,0)+rho(i,0))
if(i.ne.0) strf(i,0)=strf(i-1,0)-rhoav*0.5*(v(i-1,0)+v(i,0))
do j=1,m
rhom=0.5*(rho(i,j)+rho(i,j-1))
strf(i,j)=strf(i,j-1)+rhom*0.5*(u(i,j-1)+u(i,j))
end do
end do
! Exporting velocity and thermal fields
OPEN(2,FILE=FILOUT1)
write(2,*)"VARIABLES=X,Y,U,V,th,StreamF,rho"
write(2,*)"ZONE ","I=",n+1,"J=",m+1,",","F=BLOCK"
do j=0,m
write(2,*)(i/float(m),i=0,n)
end do
do j=0,m
write(2,*)(j/float(m),i=0,n)
end do
do j=0,m
write(2,*)(u(i,j),i=0,n)
end do
do j=0,m
write(2,*)(v(i,j),i=0,n)
end do
do j=0,m
write(2,*)(th(i,j),i=0,n)
end do
do j=0,m
write(2,*)(strf(i,j),i=0,n)
end do
do j=0,m
write(2,*)(rho(i,j),i=0,n)
end do
savenumber=0.
return
end
Appendix B
Computer code for forced convection in a channel
parameter (n=400,m=50)
real f(0:8,0:n,0:m),feq(0:8,0:n,0:m),visco(0:n,0:m),omega(0:n,0:m)
real rho(0:n,0:m),rhoo(0:n,0:m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real alpha(0:n,0:m),gbeta(0:n,0:m),omegat(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8)
real u(-1:n+1,-1:m+1),v(-1:n,-1:m+1)
real uin
integer i,j
cx(:)=(/0.0,1.0,0.0,-1.0,0.0,1.0,-1.0,-1.0,1.0/)
cy(:)=(/0.0,0.0,1.0,0.0,-1.0,1.0,1.0,-1.0,-1.0/)
w(:)=(/4./9.,1./9.,1./9.,1./9.,1./9.,1./36.,1./36.,1./36.,1./36./)
!ra=1.e6
!SV=0.0
tw1=1.0
tw2=0
thref=((tw1+tw2)/2)
!dt=(tw1-tw2)
pr=.71
uin=0.02
!**********************************************!
! Setting initial values
!**********************************************!
do i=0,n
do j=0,m
rhoo(i,j)=6.0
visco(i,j)=0.02
u(i,j)=0.
v(i,j)=0.
th(i,j)=0.5
end do
end do
!**********************************************!
! setting LBM solution parameters
!**********************************************!
do i=0,n
do j=0,m
alpha(i,j)=visco(i,j)/pr
rho(i,j)=rhoo(i,j)
end do
end do
do i=0,n
do j=0,m
omega(i,j)=1./(3.*visco(i,j)+0.5)
end do
end do
do i=0,n
do j=0,m
omegat(i,j)=1./(3.*alpha(i,j)+0.5)
end do
end do
mstep=1 !
savenumber=0
kk=0
!**********************************************!
!main loop
!**********************************************!
!main loop
1 do while (mstep==1) !kk=1,mstep
call collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,visco,thref)
call streaming(f,n,m)
call bouncon(f,n,m,uin)
call uvcalc(f,rho,u,v,n,m,cx,cy)
! -----------------------------!collesion for th
call colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
!streaming for th
call streaming(g,n,m)
call gbouncon(g,tw1,tw2,w,n,m)
call thcalcu(g,th,n,m)
! Showing some parameters to check solution in iterations
print*,kk,"**",th(n/2,m/2),"**",u(n/2,m/2),"**"
! maximum iteration criterion
if (kk==39999) mstep=2
! setting autosave period
if(savenumber==5000) call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
kk=kk+1
savenumber=savenumber+1
errmaxu=0.0
errmaxsc=0.0
END DO
!**********************************************!
! end of the main loop
!**********************************************!
call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
stop
end
!**********************************************!
! Subroutine of collesion for FLOW field
!**********************************************!
subroutine collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,visco,thref)
real f(0:8,0:n,0:m),omega(0:n,0:m)
real feq(0:8,0:n,0:m),rho(0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),visco(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
t1=u(i,j)*u(i,j)+v(i,j)*v(i,j)
do k=0,8
t2=u(i,j)*cx(k)+v(i,j)*cy(k) !u.c(k)
feq(k,i,j)=rho(i,j)*w(k)*(1.0+3.0*t2+4.5*t2*t2-1.50*t1)
f(k,i,j)=omega(i,j)*feq(k,i,j)+(1.-omega(i,j))*f(k,i,j)
end do
end do
end do
return
end
!**********************************************!
! Subroutine of collesion for Thermal field
!**********************************************!
subroutine colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),omegat(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
do k=0,8
geq(k,i,j)=th(i,j)*w(k)*(1.0+3.0*(u(i,j)*cx(k)+v(i,j)*cy(k)))
g(k,i,j)=omegat(i,j)*geq(k,i,j)+(1.0-omegat(i,j))*g(k,i,j)
end do
end do
end do
return
end
!**********************************************!
! Subroutine of streaming
!**********************************************!
subroutine streaming(f,n,m)
real f(0:8,0:n,0:m)
! streaming
do j=0,m
do i=n,1,-1 ! RIGHT TO LEFT
f(1,i,j)=f(1,i-1,j)
end do
do i=0,n-1 ! LEFT TO RIGHT
f(3,i,j)=f(3,i+1,j)
end do
end do
do j=m,1,-1 ! TOP TO BOTTOM
do i=0,n
f(2,i,j)=f(2,i,j-1)
end do
do i=n,1,-1
f(5,i,j)=f(5,i-1,j-1)
end do
do i=0,n-1
f(6,i,j)=f(6,i+1,j-1)
end do
end do
do j=0,m-1 !BOTTOM TO TOP
do i=0,n
f(4,i,j)=f(4,i,j+1)
end do
do i=0,n-1
f(7,i,j)=f(7,i+1,j+1)
end do
do i=n,1,-1
f(8,i,j)=f(8,i-1,j+1)
end do
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for flow field
!**********************************************!
subroutine bouncon(f,n,m,uin)
real f(0:8,0:n,0:m)!,feq(0:8,0:n,0:m)
real uin
! West boundary, inlet, u=uin
do j=0,m
rhow=(f(0,0,j)+f(2,0,j)+f(4,0,j)+2.*(f(3,0,j)+f(6,0,j)+f(7,0,j)))/(1.-uin)
f(1,0,j)=f(3,0,j)+2.*rhow*uin/3
f(5,0,j)=f(7,0,j)-(f(2,0,j)-f(4,0,j))/2.+rhow*uin/6.
f(8,0,j)=f(6,0,j)+(f(2,0,j)-f(4,0,j))/2.+rhow*uin/6.
! East Boundary, open boundary
f(3,n,j)=4*f(3,n-1,j)/3-f(3,n-2,j)/3
f(6,n,j)=4*f(6,n-1,j)/3-f(6,n-2,j)/3
f(7,n,j)=4*f(7,n-1,j)/3-f(7,n-2,j)/3
end do
do i=0,n
! South Boundary, Bounce Back
f(2,i,0)=f(4,i,0)
f(5,i,0)=f(7,i,0)
f(6,i,0)=f(8,i,0)
! North Boundary, Bounce Back
f(4,i,m)=f(2,i,m)
f(8,i,m)=f(6,i,m)
f(7,i,m)=f(5,i,m)
end do
!obstacle 1
! East boundary, Bounce Back
do j=0,m/2
f(3,n/4,j)=f(1,n/4,j)
f(6,n/4,j)=f(8,n/4,j)
f(7,n/4,j)=f(5,n/4,j)
! West boundary, Bounce Back
f(1,n/4+m/2,j)=f(3,n/4+m/2,j)
f(5,n/4+m/2,j)=f(7,n/4+m/2,j)
f(8,n/4+m/2,j)=f(6,n/4+m/2,j)
end do
do i=n/4,n/4+m/2
! North Boundary, Bounce Back
f(2,i,m/2)=f(4,i,m/2)
f(5,i,m/2)=f(7,i,m/2)
f(6,i,m/2)=f(8,i,m/2)
end do
!obstacle 2
! East boundary, Bounce Back
do j=m/2,m
f(3,n/2,j)=f(1,n/2,j)
f(6,n/2,j)=f(8,n/2,j)
f(7,n/2,j)=f(5,n/2,j)
! West boundary, Bounce Back
f(1,n/2+m/2,j)=f(3,n/2+m/2,j)
f(5,n/2+m/2,j)=f(7,n/2+m/2,j)
f(8,n/2+m/2,j)=f(6,n/2+m/2,j)
end do
do i=n/2,n/2+m/2
! South Boundary, Bounce Back
f(4,i,m/2)=f(2,i,m/2)
f(7,i,m/2)=f(5,i,m/2)
f(8,i,m/2)=f(6,i,m/2)
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for Thermal field
!**********************************************!
subroutine gbouncon(g,tw1,tw2,w,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m)
real w(0:8),tw1,tw2
tw3=0.25*tw1
tw4=0.5*tw1
! Boundary Conditions
! West Boundary Condition, T=0.5
do j=0,m
g(1,0,j)=tw4*(w(1)+w(3))-g(3,0,j)
g(5,0,j)=tw4*(w(5)+w(7))-g(7,0,j)
g(8,0,j)=tw4*(w(8)+w(6))-g(6,0,j)
end do
! East Boundary Condition, open boundary
do j=0,m
g(6,n,j)=4*g(6,n-1,j)/3-g(6,n-2,j)/3
g(3,n,j)=4*g(3,n-1,j)/3-g(3,n-2,j)/3
g(7,n,j)=4*g(7,n-1,j)/3-g(7,n-2,j)/3
end do
do i=0,n
do k=0,8
! Top Boundary Condition, Adiabatic
g(k,i,m)=g(k,i,m-1)
! Bottom Boundary Condition, Adiabatic
g(k,i,0)=g(k,i,1)
end do
end do
! Obstacle 1, T=0.25
!Top Boundary Condition, T=0.25
do i=n/4,n/4+m/2
g(2,i,m/2)=tw3*(w(2)+w(4))-g(4,i,m/2)
g(5,i,m/2)=tw3*(w(5)+w(7))-g(7,i,m/2)
g(6,i,m/2)=tw3*(w(6)+w(8))-g(8,i,m/2)
end do
do j=0,m/2
! East Boundary Condition, T=0.25
g(3,n/4,j)=tw3*(w(1)+w(3))-g(1,n/4,j)
g(7,n/4,j)=tw3*(w(5)+w(7))-g(5,n/4,j)
g(6,n/4,j)=tw3*(w(8)+w(6))-g(8,n/4,j)
! West Boundary Condition, T=0.25
g(1,n/4+m/2,j)=tw3*(w(1)+w(3))-g(3,n/4+m/2,j)
g(5,n/4+m/2,j)=tw3*(w(5)+w(7))-g(7,n/4+m/2,j)
g(8,n/4+m/2,j)=tw3*(w(8)+w(6))-g(6,n/4+m/2,j)
end do
! Top Boundary Condition, T=1
! Obstacle 2, T=1.0
!Bottom Boundary Condition, T=1.0
do i=n/2,n/2+m/2
g(4,i,m)=tw1*(w(2)+w(4))-g(2,i,m)
g(7,i,m)=tw1*(w(5)+w(7))-g(5,i,m)
g(8,i,m)=tw1*(w(6)+w(8))-g(6,i,m)
end do
do j=m/2,m
! East Boundary Condition, T=1.0
g(3,n/2,j)=tw1*(w(1)+w(3))-g(1,n/2,j)
g(7,n/2,j)=tw1*(w(5)+w(7))-g(5,n/2,j)
g(6,n/2,j)=tw1*(w(8)+w(6))-g(8,n/2,j)
! West Boundary Condition, T=1.0
g(1,n/2+m/2,j)=tw1*(w(1)+w(3))-g(3,n/2+m/2,j)
g(5,n/2+m/2,j)=tw1*(w(5)+w(7))-g(7,n/2+m/2,j)
g(8,n/2+m/2,j)=tw1*(w(8)+w(6))-g(6,n/2+m/2,j)
end do
return
end
!**********************************************!
! Temperature calculation
!**********************************************!
subroutine thcalcu(g,th,n,m)
real g(0:8,0:n,0:m),th(0:n,0:m)
do j=0,m
do i=0,n
thsum=0.0
do k=0,8
thsum=thsum+g(k,i,j)
end do
th(i,j)=thsum
if (i>=n/4.and.i<=(n/4+m/2).and.j<=m/2) th(i,j)=0.25
if (i>=n/2.and.i<=(n/2+m/2).and.j>=m/2) th(i,j)=1.0
end do
end do
return
end
!**********************************************!
! Subroutine of velocity calculation
!**********************************************!
subroutine uvcalc(f,rho,u,v,n,m,cx,cy)
real f(0:8,0:n,0:m),rho(0:n,0:m),u(0:n,0:m),v(0:n,0:m)
real cx(0:8),cy(0:8)
do j=0,m
do i=0,n
uvsum=0.0
do k=0,8
uvsum=uvsum+f(k,i,j)
end do
rho(i,j)=uvsum
end do
end do
do j=0,m
do i=0,n
usum=0.0
vsum=0.0
do k=0,8
usum=usum+f(k,i,j)*cx(k)
vsum=vsum+f(k,i,j)*cy(k)
end do
u(i,j)=usum/rho(i,j)
v(i,j)=vsum/rho(i,j)
if (i>=n/4.and.i<=(n/4+m/2).and.j<=m/2) u(i,j)=0
if (i>=n/4.and.i<=(n/4+m/2).and.j<=m/2) v(i,j)=0
if (i>=n/2.and.i<=(n/2+m/2).and.j>=m/2) u(i,j)=0
if (i>=n/2.and.i<=(n/2+m/2).and.j>=m/2) v(i,j)=0
end do
end do
return
end
!**********************************************!
! Subroutine of exporting results
!**********************************************!
subroutine result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
real u(0:n,0:m),v(0:n,0:m),rho(-1:n+1,-1:m+1),th(0:n,0:m)
real strf(0:n,0:m)
real tw1,tw2,dt,thref
real tt(0:n,0:m)
CHARACTER FILOUT1*18
WRITE(FILOUT1,'(4HUVTS,I8,4H.lbm)')kk
! Streamfunction Calculations
strf(0,0)=0.
do i=0,n
rhoav=0.5*(rho(i-1,0)+rho(i,0))
if(i.ne.0) strf(i,0)=strf(i-1,0)-rhoav*0.5*(v(i-1,0)+v(i,0))
do j=1,m
rhom=0.5*(rho(i,j)+rho(i,j-1))
strf(i,j)=strf(i,j-1)+rhom*0.5*(u(i,j-1)+u(i,j))
end do
end do
! Exporting velocity and thermal fields
OPEN(2,FILE=FILOUT1)
write(2,*)"VARIABLES=X,Y,U,V,th,StreamF,rho"
write(2,*)"ZONE ","I=",n+1,"J=",m+1,",","F=BLOCK"
do j=0,m
write(2,*)(i/float(m),i=0,n)
end do
do j=0,m
write(2,*)(j/float(m),i=0,n)
end do
do j=0,m
write(2,*)(u(i,j),i=0,n)
end do
do j=0,m
write(2,*)(v(i,j),i=0,n)
end do
do j=0,m
write(2,*)(th(i,j),i=0,n)
end do
do j=0,m
write(2,*)(strf(i,j),i=0,n)
end do
do j=0,m
write(2,*)(rho(i,j),i=0,n)
end do
savenumber=0.
return
end
Chapter 19
Multigene genetic programming and its
various applications
Majid Niazkar
Department of Agricultural and Environmental Sciences - Production, Landscape, Agroenergy, University of Milan, Milan, Italy
1. Introduction
The emergence of machine learning and artificial intelligence (AI) techniques has an inevitable influence on every field of research. In essence, these methods owe their broad spectrum of possible applications to the fact that they do not require knowledge of the physical background of the problem under investigation. Instead, they work with data representing the relationship(s) between the problem state variables in order to capture any trend or relation. Therefore, machine learning methods can be applied to estimation-type problems in various fields, while the major focus of these models is on data analysis. Genetic programming (GP), as one of the AI techniques, has attracted the attention of many researchers in different fields. It has several modified versions, one of which is multigene genetic programming (MGGP). In the absence of a comprehensive review of MGGP applications, this chapter is devoted not only to introducing MGGP as one of the modified GP versions but also to reviewing its applications in many fields of research. As an example, MGGP is used to tackle a problem in water resources. Finally, some future trends for applying MGGP to water-related fields are suggested.
2. Genetic programming and its variants
Genetic algorithm (GA) is a well-established metaheuristic optimization algorithm. It basically adopts the principles of evolution, reproduction, and survival of the best gene(s) among a randomly generated population. In essence, the five main steps of GA are the generation of a random initial population, fitness evaluation, selection, crossover, and mutation. Since GA works adequately as a search-based optimization algorithm, it has been successfully applied to numerous problems in various fields, including water resources management (Nicklow et al., 2010). Despite the wide range of GA applications in practice, many practitioners still need a powerful tool to capture the nonlinear behavior of physical systems (Nedjah et al., 2006). In this regard, an improved version of GA, named GP, was proposed (Koza, 1992) not only to address this shortcoming but also to utilize the substantial power of GA (Kouzehgar et al., 2021).
Generally, GP is an AI technique with a flexible tree-like structure (Niazkar and Zakwan, 2021a). It basically exploits GA as a search engine to seek a suitable relationship that converts a set of input data into output data (Niazkar, 2020). In other words, GP tackles this problem by solving an optimization problem with GA, in which the objective function minimizes (or maximizes) the error between the estimated and observed output data. This problem-solving process becomes feasible by considering a tree-based architecture for each equation. A symbolic individual defined in GP is illustrated in Fig. 1, where X1 and X2 are the input variables and Y is an output variable. As shown, Fig. 1 introduces the three parts of a custom equation in GP: the root node, function nodes, and terminal nodes. Hence, each individual is a gene or an equation in which the functions and fixed coefficients are saved in a tree format.
The main steps of the process conducted by GP consist of initialization, selection, reproduction, and termination (Niazkar et al., 2019a). In the first step, GP commences by generating a random initial population, combining the functions and the terminal set (input variables and constant coefficients) randomly. In essence, the initialization process specifies a certain number of individuals with random shapes, nodal functions, and terminals (Garg and Tai, 2014). The functions used in GP include arithmetic operations, trigonometric functions, exponential and logarithmic functions, the square function, Boolean operators, the protected square root, and the protected natural logarithm. The last two functions
return zero and a very small number, respectively, if a negative value is inserted in the square root or natural logarithm (Marini and Conversi, 2012).
FIG. 1 Different parts of a symbolic individual in genetic programming: the root node (+), the function nodes (× and sin), and the terminal nodes (X1, X2, and the constant 3), which together represent Y = 3X1 + sin(X2), where X1 and X2 are input variables and Y is an output variable.
Each individual in the created population is a random equation and is counted as a possible candidate if it describes the relation between the input and output variables adequately. Since the initial population does not necessarily contain the best-fitted relation between the input and output data, GP needs to modify this population and search for the best equations. This modification can be interpreted as changing the shape and/or the information in each node of the individuals in the population. For this purpose, the population is subjected to the GA operators, including selection, crossover, and mutation, in the second step (Niazkar, 2019). The evolutionary process is essentially applied not only to create but also to select individuals with a better fitness value. A schematic view of the crossover and mutation processes conducted by GP is depicted in Fig. 2. As shown, the former process basically exchanges random parts of two parent genes, while the latter alters a random part of a parent gene to develop a new offspring gene. The reproduction process is applied to each generation until the termination criterion, which is either a maximum number of generations or a threshold error, is met (Garg and Tai, 2014). The termination step is the final step, in which a relation with the desirable accuracy is acquired.
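To make the tree representation described above concrete, the following minimal Python sketch (illustrative only, not part of any GP software such as GPTIPS; all names are assumptions of this example) encodes the individual of Fig. 1, Y = 3X1 + sin(X2), as nested nodes and evaluates it, together with the protected square root and logarithm mentioned above:
import math

def protected_sqrt(x):
    # protected square root: returns 0 for negative arguments, as described in the text
    return math.sqrt(x) if x >= 0 else 0.0

def protected_log(x):
    # protected natural logarithm: returns the log of a very small number for non-positive arguments
    return math.log(x) if x > 0 else math.log(1e-10)

# A GP individual is a tree: internal nodes hold functions, leaves hold terminals
# (input variables or constants). Nodes are ('const', value), ('var', name), or
# (function, child1, child2, ...).
def evaluate(node, inputs):
    kind = node[0]
    if kind == 'const':
        return node[1]
    if kind == 'var':
        return inputs[node[1]]
    children = [evaluate(child, inputs) for child in node[1:]]
    return kind(*children)

# The symbolic individual of Fig. 1: Y = 3*X1 + sin(X2)
individual = (lambda a, b: a + b,                  # root node (+)
              (lambda a, b: a * b,                 # function node (multiplication)
               ('const', 3.0), ('var', 'X1')),     # terminal nodes
              (math.sin, ('var', 'X2')))           # function node (sin) and its terminal

print(evaluate(individual, {'X1': 2.0, 'X2': 0.5}))   # 3*2 + sin(0.5) = 6.479...
print(protected_sqrt(-4.0), protected_log(0.0))       # protected functions handle invalid arguments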
Similar to GA, several versions of GP were developed to enhance the characteristics of this estimation tool. Some of
these versions are classical or monolithic genetic programming (GP), linear genetic programming (LGP), traceless genetic
programming (TGP), gene expression programming (GEP), grammar-based genetic programming (GGP), and multigene
genetic programming (MGGP). The next section introduces MGGP.
3. An introduction to multigene genetic programming
In the traditional GP, each individual or chromosome has a single tree or gene, whereas a typical individual in MGGP can
consist of more than one gene or tree, which is the key difference between GP and MGGP.
FIG. 2 Crossover and mutation processes of genetic programming: (a) crossover exchanges randomly selected subtrees between two parent genes to produce new offspring genes; (b) mutation alters a randomly selected subtree of a parent gene to produce a new offspring gene.
Another difference between
MGGP and GP is the evolutionary process. To be more specific, MGGP exploits two kinds of crossover processes: two-point high-level crossover and low-level crossover (Searson et al., 2010). In the former, genes of two multigene parents (two individuals) can be exchanged to develop two new offspring. This process also enables adding a gene to, or removing a gene from, an individual. After applying this process, new individuals are checked to ensure that they have no more genes than the maximum number of genes allowed in one individual, which is a controlling parameter specified by the user. If any individual exceeds this limit, it is removed from the population. Furthermore, the low-level crossover is similar to the routine subtree crossover process in GP. In this process, one of the genes in a parent individual is selected randomly and the crossover is conducted within that gene, which provides a new offspring individual. Also, MGGP employs several mutation methods in addition to the standard mutation process of GP, which may be counted as another difference between GP and MGGP.
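As a rough sketch of the two-point high-level crossover just described (an illustration under simplified, assumed representations, not the GPTIPS implementation), the following Python function exchanges randomly chosen gene sublists between two multigene parents and discards any offspring that exceeds the maximum allowed number of genes:
import random

def high_level_crossover(parent1, parent2, max_genes, rng=random):
    # parent1 and parent2 are lists of genes (each gene would itself be a tree);
    # two crossover points are picked in each parent and the enclosed genes are swapped
    i1, j1 = sorted(rng.sample(range(len(parent1) + 1), 2))
    i2, j2 = sorted(rng.sample(range(len(parent2) + 1), 2))
    child1 = parent1[:i1] + parent2[i2:j2] + parent1[j1:]
    child2 = parent2[:i2] + parent1[i1:j1] + parent2[j2:]
    # offspring may gain or lose genes; those violating the limit are removed,
    # mirroring the gene-count check described in the text
    return [child for child in (child1, child2) if 0 < len(child) <= max_genes]

# toy example with genes represented by labels
print(high_level_crossover(['g1', 'g2', 'g3'], ['h1', 'h2'], max_genes=4))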
Basically, an individual in MGGP, or multibranch GP, contains several genes or trees, and each gene in an individual is multiplied by a gene weight. Since MGGP utilizes either linear or nonlinear regression methods to determine the optimum values of the gene coefficients, they are also called regression coefficients (Mehr and Nourani, 2018). When the regression coefficients are obtained by a least-squares model, the corresponding MGGP model is called a pseudo-linear model, which is nevertheless capable of capturing the nonlinear behavior of systems. In this version of MGGP, the linear algebraic summation of the weighted genes of a single individual and a constant term, invariably named the bias or noise, gives the equation of the corresponding individual. For better clarification, Fig. 3 presents a schematic view of a single individual in MGGP that consists of two genes, where d1 and d2 are the gene coefficients and d0 is the bias.
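The pseudo-linear step can be sketched as follows: once the trees of an individual have been evaluated on the training data, the bias d0 and the gene coefficients d1, d2, ... are obtained by ordinary least squares. The minimal NumPy example below (with an invented synthetic target; the gene expressions are those of Fig. 3) is only meant to illustrate the idea:
import numpy as np

def fit_gene_coefficients(gene_outputs, y):
    # gene_outputs: (n_samples, n_genes) array, one column per gene-tree output
    # y: (n_samples,) array of observed target values
    # returns [d0, d1, ..., dG], where d0 is the bias and d1..dG the gene weights
    X = np.column_stack([np.ones(len(y)), gene_outputs])   # prepend the bias column
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients

# synthetic data built from the two genes of Fig. 3
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)
gene1 = 0.5 * x1 + np.cos(x2)
gene2 = 8.0 * np.sin(x1) + 5.0 * x2
target = 1.0 + 2.0 * gene1 - 0.5 * gene2                   # known coefficients for the test
print(fit_gene_coefficients(np.column_stack([gene1, gene2]), target))  # approx. [1.0, 2.0, -0.5]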
The flowchart of the pseudo-linear MGGP is depicted in Fig. 4. In a bid to improve this model, it can be combined with other approaches to work as a hybrid model (Ghorbani et al., 2018). These approaches can be used to modify different processes, including the determination of the optimum values of the gene coefficients and the process of selecting appropriate models. For instance, symbolic regression was replaced by a stepwise regression approach to improve the former process (Garg and Tai, 2014; Garg and Lam, 2015). Moreover, artificial neural networks, support vector machines (Garg and Tai, 2014), a Bayesian classifier (Garg and Lam, 2015), and the Pareto optimal method (Riahi-Madvar et al., 2019) were used to improve the selection of the models developed by MGGP.
In comparison with the classical GP, MGGP is supposed to exploit smaller trees (Garg and Tai, 2014; Mehr and Nourani, 2017). As a result, MGGP is expected to offer simpler models than those developed by classical GP (Searson, 2015). Furthermore, using more genes in MGGP makes it suitable for dealing with problems of higher complexity. In addition, MGGP and GP benefit not only from the random search of GA but also from the flexible tree architecture. The latter enables the successful implementation of the various built-in functions and constant coefficients required to develop the best-fitted expression describing the relation between any sets of input and output data.
Since GA is a zero-order optimization algorithm, the initialization process in MGGP is conducted without the need to assume the functions and terminals in advance. This advantage makes MGGP a more appealing tool in comparison with nonlinear regression models because the structure, functions, and constant coefficients of the relation between the input and output data do not need to be specified in advance (Niazkar, 2021). To be more precise, the type of equation, which is unknown at the beginning of the process in most cases, must be specified in nonlinear regression models. This requirement confines the relation under investigation to a very limited structure. However, MGGP develops the relations without a shape limitation, which is indeed a helpful characteristic when it comes to searching for the relationships between two sets of data (Lee and Suh, 2020). Finally, there is indeed a trade-off between the precision and complexity of the relations
that can determine the output data based on the input data, while such balance can be addressed by parameters controlling the process, which are presented in the next section.
FIG. 3 A schematic view of a two-gene individual in multigene genetic programming, representing Y = d0 + d1[0.5X1 + cos(X2)] + d2[8sin(X1) + 5X2], where X1 and X2 are input variables, Y is an output variable, d0 is the bias, and d1 and d2 are the gene coefficients.
FIG. 4 Flowchart of the pseudo-linear multigene genetic programming: insert the input data; specify the MGGP controlling parameters, particularly the maximum number of genes allowed in an individual (Gmax) and the maximum depth of trees (dmax); create the initial population randomly and check that each individual satisfies Gmax and dmax; find the gene coefficients for each individual using regression models and evaluate the fitness function; if the termination criteria are not satisfied, apply the genetic operators, create a new generation, and repeat the checks, coefficient fitting, and fitness evaluation; once the termination criteria are satisfied, evaluate the performance of the MGGP model on new data that were not used for developing the model.
4. Main controlling parameters of MGGP
The basic version of MGGP has several types of parameters that control each and every part of the processes conducted by MGGP. The different groups of controlling parameters in a typical MGGP model include (1) run control parameters, (2) fitness parameters, (3) selection parameters, (4) terminal nodes, (5) function nodes, (6) genetic operators, (7) tree build parameters, (8) multigene parameters, and (9) mutation settings (Searson, 2009). Fig. 5 lists the MGGP parameters in the aforementioned groups.
Among the various controlling parameters shown in Fig. 5, two parameters, which can only be set to nonzero integer values, have a profound impact on the MGGP results. The first is the maximum number of genes allowed in an individual. When this multigene parameter is set to one, the MGGP model turns into the classical GP model, while it needs to be more than one for the model to serve as an MGGP model. Furthermore, it is mainly used as a criterion to dismiss individuals with more than the allowed number of genes. Such individuals are generated either randomly in the very first population or by evolutionary processes such as crossover and mutation. The second crucial parameter is the maximum depth of trees in MGGP, which is clearly a tree-build parameter (Searson, 2009). It is also used as an upper limit on the number of function and terminal nodes in each gene. Obviously, the larger the values assumed for these two parameters, the more terms the
relation can theoretically have. Therefore, the trade-off between the complexity and accuracy of the MGGP results can be controlled by the user, mostly through these two parameters.
FIG. 5 Controlling parameters of multigene genetic programming: run control parameters (population size, number of generations to run); fitness parameters (termination threshold value, whether the fitness function is minimized or maximized); selection parameters (tournament size, elitism); terminal nodes (the range from which constant nodes are generated with uniform probability, the number of input nodes); function nodes (cell array of the user-defined functions); genetic operators (probabilities of GP tree mutation, crossover, and direct copy); tree build parameters (maximum depth of trees, maximum number of nodes per tree, maximum depth of subtrees created by mutation); multigene parameters (maximum number of genes per individual); and mutation settings (standard deviation of the Gaussian perturbation applied to a randomly selected constant during mutation).
In one of the open-source MATLAB-based implementations of MGGP, named GPTIPS (Searson, 2009; Searson et al., 2010), the fitness function is the root mean square of the errors between the estimated and observed output data. This software has been widely used for applying MGGP to various problems in the literature (Garg et al., 2014a; Garg and Lam, 2015; Mehr and Kahya, 2017; Safari and Mehr, 2018; Zakwan and Niazkar, 2021; Lee and Suh, 2020). The GPTIPS default values of some of the MGGP controlling parameters mentioned in Fig. 5 are given in Table 1. As shown, the default values of the maximum tree depth and the maximum number of genes per single individual are 6 and 1, respectively (Searson, 2009). Hence, the default version of GPTIPS works as a GP model, since MGGP models must have more than one gene in each individual; by specifying an integer value greater than one for the maximum number of genes allowed in a chromosome, GPTIPS works as an MGGP model. According to Table 1, the sum of the parameters describing the probabilities of the genetic operators should be equal to one (Searson et al., 2010). Moreover, Table 1 indicates that the population size and the maximum number of generations to run are both 100. Finally, the MGGP controlling parameters, particularly the maximum depth of trees and the maximum number of genes allowed in an individual, may be selected by adopting either a trial-and-error process (Garg and Lam, 2015) or a sensitivity analysis for a specific problem, because they may have an inevitable impact on the final results obtained by MGGP.
TABLE 1 Default values of some of the controlling parameters of multigene genetic programming (GPTIPS defaults).
Population size: 100
Number of generations: 100
Maximum number of genes allowed in an individual: 1
Maximum depth of trees: 6
Maximum depth of subtrees created by mutation: 6
Tournament size: 2
Elitism (the fraction of the population copied directly to the next generation without modification): 0.05
Probability of GP tree mutation: 0.1
Probability of GP tree crossover: 0.85
Probability of GP tree direct copy: 0.05
Standard deviation of the Gaussian perturbation applied in mutation to a randomly selected constant: 0.1
Range from which constant nodes are generated with uniform probability: [-10, 10]
5. A review on MGGP applications
MGGP has been utilized to solve various problems in different fields of research. These fields include aerospace science (De Giorgi and Quarta, 2020), biomedicine (Javed et al., 2016), medicine and global health (Hasan et al., 2016; Niazkar and Niazkar, 2020), chemical engineering (Esmaeili and Mohebbi, 2017), petroleum engineering (Kaydani et al., 2014), industrial engineering (Garg and Lam, 2015), electrical engineering (Pedrino et al., 2019), mechanical engineering (Garg et al., 2014c), urban engineering (Mousavi et al., 2015; Beura and Bhuyan, 2018), geotechnical engineering (Gandomi and Alavi, 2012a; Muduli and Das, 2014; Garg et al., 2014b; Muduli and Das, 2015; Chen et al., 2016), structural engineering (Gandomi and Alavi, 2012b; Mohammadi Bayazidi et al., 2014; Hoang et al., 2017), and environmental engineering (Pandey et al., 2015). Even though GP and several of its variants, such as GEP, have been applied to various problems in the water resources field of study (Mehr et al., 2018; Mohammad-Azari et al., 2020), MGGP has so far been used for only a few problems in this field. In the following, a chronological literature review of applying MGGP in the water resources field is presented:
Kumar et al. (2014) applied MGGP to propose models for predicting sediment transport, total bed load, and incipient motion. The results estimated by the MGGP models for the three sediment problems were found to have high accuracy in comparison with several methods available in the literature (Kumar et al., 2014). Garg et al. (2014a) compared support vector regression and artificial neural network with MGGP for predicting stress-dependent soil water retention curves. The comparison revealed that the MGGP estimations were more accurate than those of the other two AI models (Garg et al., 2014a). Mehr and Nourani (2017) proposed a moving average filtering MGGP (MA-MGGP) technique for estimating single- and multi-day ahead runoff. This hybrid method employs moving average filtering as a data preprocessing approach, while it uses MGGP as a prediction tool and a Pareto front to find the optimum models developed by MGGP. The performance of the hybrid MGGP-based model was compared with those of the classical GP, MGGP, and a multilayer perceptron, and the results showed the promising improvement made by the MA-MGGP model. Mehr and Kahya (2017) suggested the MA-MGGP approach to predict daily streamflow. They compared the performances of MA-MGGP with those of monolithic GP and conventional multilinear regression prediction models using the daily streamflow records observed at a single station on Senoz Stream, Turkey. The comparison indicated that the MA-MGGP model not only developed a parsimonious model for estimating streamflow but also enables implementing human insight to explore the top MA-MGGP solutions for further analysis (Mehr and Kahya, 2017). Hadi and Tombul (2018) utilized MGGP to select the best scales for forecasting monthly streamflow. Since MGGP, or any other AI technique, may not capture the seasonality of the data, a continuous wavelet transformation analysis was conducted to overcome the stationarity problem before applying MGGP. They estimated the 1-month-ahead downstream flow of a basin located in the southeast of Turkey, using downstream flow, upstream flow, rainfall, temperature, and potential evapotranspiration with associated lags as input variables. The results demonstrated that the proposed model improved the predicted streamflow and the peak values. This improvement was achieved because several scales, which were appropriate for capturing the seasonality and irregularity of the data, were used, while the inclusion of hydrological and meteorological variables as input data enhances the ability of monthly streamflow forecasting (Hadi and Tombul, 2018). Safari and Mehr (2018) proposed the Pareto-optimal MGGP to predict the particle Froude number in large sewers for scenarios assuming sediment deposition on the bed. Using four sets of data, it was found that the MGGP models yielded better estimations of the particle Froude number in comparison with the conventional regression models. Since the MGGP models used a lower number of input variables than the empirical models available in the literature, they may be counted as a parsimonious alternative for the design of self-cleansing large sewers (Safari and Mehr, 2018). Mehr and Nourani (2018) combined the season algorithm with MGGP to develop a rainfall-runoff model to estimate single-, two-, and three-day ahead streamflow at Haldizen Catchment, Trabzon, Turkey. The results of the case study indicate that MGGP may
capture the underlying structure of the rainfall-runoff process slightly better than the monolithic GP (Mehr and Nourani, 2018). Eray et al. (2018) compared MGGP with the dynamic evolving neural-fuzzy inference system (DENFIS), GP, and the Hargreaves-Samani empirical equation for modeling monthly pan evaporation. The input data included minimum temperature, maximum temperature, solar radiation, relative humidity, and wind speed, all gathered at the Antakya and Antalya stations in the Mediterranean region of Turkey. The comparison indicated that slightly better estimations than those of MGGP and DENFIS were obtained by GP for the Antakya station, while DENFIS reached better predictions for the Antalya station. Additionally, considering a periodicity input in the MGGP and DENFIS models enhanced the accuracy of the estimations (Eray et al., 2018). Ghorbani et al. (2018) integrated chaos theory with MGGP to develop a hybrid model for river flow forecasting. The hybrid model was compared with MGGP and a local prediction model for estimating daily flow at four stations. The results indicated that the hybrid MGGP outperformed the two other models (Ghorbani et al., 2018). Riahi-Madvar et al. (2019) applied the Pareto-optimal MGGP model to estimate longitudinal dispersion coefficients using 503 data sets gathered from natural streams around the world. The proposed model was compared with eight equations present in the literature, and the comparison indicated that the hybrid MGGP-based model provides simpler and more accurate equations than other available ones (Riahi-Madvar et al., 2019). Lee and Suh (2020) proposed MGGP to develop stability equations for rock armor and Tetrapods. They compared the MGGP models with available empirical formulas and an artificial neural network, and the MGGP model outperformed the others (Lee and Suh, 2020). More recently, Niazkar and Zakwan (2021b) combined MGGP with the generalized reduced gradient (GRG) method and introduced the hybrid MGGP-GRG. They applied it to develop stage-discharge relations for both single-value and loop rating curves. They compared the performances of the conventional method, an evolutionary algorithm, the modified honey bee mating optimization (MHBMO) algorithm, artificial neural network (ANN), MGGP, and the hybrid MGGP-GRG technique for developing the rating curves of eight different rivers. The obtained results demonstrated that the hybrid MGGP-GRG model was ranked the best method for developing single-value and loop rating curves.
6. Future trends of MGGP applications
Based on the literature review conducted in the previous section, MGGP has already been applied to several problems in the water resources field. These topics include streamflow prediction, rainfall-runoff modeling, sediment transport modeling, predicting the longitudinal dispersion coefficient, estimating pan evaporation, developing soil water retention curves, and developing stability equations for rock armor and Tetrapods. Since MGGP has been successfully applied to numerous problems in different fields, further application of MGGP to water resources problems is suggested for future studies. Based on the current literature, MGGP can be used as a prediction tool for many problems for which AI models other than MGGP have already been used. Several examples of these problems include developing bed roughness predictors (Giustolisi, 2004; Niazkar et al., 2019a), design of open channels (Niazkar, 2020), predicting scour depth around piers (Niazkar and Afzali, 2018), estimating water surface profiles (Niazkar et al., 2021), and hydrological flood and stage routing (Sivapragasam et al., 2008; Fallah-Mehdipour et al., 2013).
7. A case study of the MGGP application
The stage-discharge rating curve is a widely used diagram that presents the variation of the water depth (G) in terms of the discharge (Q) values. In essence, developing a rating curve is essential when the discharge in a natural stream is not measured directly, for whatever reason. Such development requires a historical database of stage and discharge values that were measured concurrently. This database is basically used for estimating the parameters of a rating curve model. According to the literature, various methods have been utilized for the parameter estimation of rating curve models.
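For illustration, a power-law rating curve of the form Q = a(G - b)^c (the form of the conventional model in Table 3) can be fitted to concurrent stage-discharge observations with a generic nonlinear least-squares routine. The sketch below uses SciPy with a handful of synthetic stage-discharge pairs as placeholders for real gauging records, and it is not necessarily the estimation procedure used in the studies cited here:
import numpy as np
from scipy.optimize import curve_fit

def rating_curve(G, a, b, c):
    # power-law stage-discharge relation Q = a*(G - b)**c
    return a * (G - b) ** c

# placeholder stage (m) and discharge (m3/s) pairs; a real application would use
# the concurrent observations described in the text
G_obs = np.array([1.79, 1.85, 1.95, 2.10, 2.30, 2.61])
Q_obs = np.array([17.7, 35.0, 70.0, 140.0, 280.0, 484.2])

params, _ = curve_fit(rating_curve, G_obs, Q_obs, p0=[500.0, 1.6, 2.0],
                      bounds=([0.0, 0.0, 0.5], [2000.0, 1.75, 4.0]))
a, b, c = params
print(f"Q = {a:.2f}(G - {b:.2f})^{c:.2f}")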
In this chapter, 235 data points observed at the Philadelphia gauging site on the Schuylkill River, United States, were exploited as a case study for assessing the performance of MGGP. This dataset was previously used in the literature (Niazkar and Zakwan, 2021b). To be more specific, MGGP with trigonometric functions was previously utilized to determine the stage-discharge relationship of the data, while the same problem is revisited in this chapter using MGGP with the square function. For this purpose, the considered data were divided into two parts, named the training (75% of the total data) and testing (25% of the total data) parts. The former was used to train MGGP, while a comparative analysis was conducted based on the latter. Furthermore, the data division, which is depicted in Fig. 6, is the same as the one considered in the previous study (Niazkar and Zakwan, 2021b). This similarity specifically enables comparing the results with those available in the literature. Additionally, the MGGP parameters were also set based on the previous study (Niazkar and Zakwan, 2021b), and GPTIPS was used for implementing MGGP for this application. Finally, Table 2 shows the maximum, minimum, and average values of the data parameters for both the training and testing parts.
FIG. 6 Training and testing parts of the stage-discharge data: stage (m, roughly 1.65 to 2.65) plotted against discharge (m3/s, roughly 0 to 500).
TABLE 2 Ranges of stage and discharge values for the training and testing data.
Data part | Parameter | Minimum | Maximum | Average
Training | G (m) | 1.79 | 2.61 | 1.94
Training | Q (m3/s) | 17.7 | 484.22 | 73.63
Testing | G (m) | 1.8 | 2.5 | 1.93
Testing | Q (m3/s) | 19.11 | 407.76 | 71.65
In the previous study conducted on this dataset, a few models were developed for estimating discharges from stage values, and four of them (the conventional method, MHBMO, ANN, and MGGP) are selected here for comparison. These models (Eqs. 1-3) and the new MGGP-based model (Eq. 4) are presented in Table 3. The previous MGGP-based model (Eq. 3) contains trigonometric functions, whereas the MGGP-based model (Eq. 4) proposed in this study was developed using the square function. Moreover, Eqs. (1) and (2) provide discharge values directly, whereas the stage and discharge values in Eqs. (3) and (4) are normalized variables. To be more specific, these variables were normalized by Q* = (Q - Qmin)/(Qmax - Qmin), where Q*, Qmin, and Qmax are the normalized, minimum, and maximum discharge values of the dataset, respectively.
The performances of the rating curve models presented in Table 3 were compared using four metrics, which are shown
in Eqs. (5)–(8). They are (1) Sum of Square of Error (SSE), (2) Nash-Sutcliffe efficiency (NE), (3) Mean Absolute Relative
Error (MARE), and (4) Maximum Absolute Relative Error (MXARE):
SSE = Σ (Qoi - Qei)^2 (5)
NE = 1 - [Σ (Qoi - Qei)^2] / [Σ (Qoi - (1/N) Σ Qoi)^2] (6)
MARE = (1/N) Σ |Qoi - Qei| / Qoi (7)
MXARE = max |Qoi - Qei| / Qoi, for i = 1, ..., N (8)
where the summations run over i = 1, ..., N, and Qoi and Qei are the ith observed and estimated discharges for the dataset, respectively.
TABLE 3 Stage-discharge relations of the compared rating curve models.
Model | Eq. | Stage-discharge relation
Conventional method | 1 | Q = 578.07(G - 1.61)^1.98
MHBMO | 2 | Q = 543.44(G - 1.68)^1.57
MGGP (trigonometric functions) | 3 | Q* = 0.05899cos(11.78y*) - 0.07737cos[9.779y*cos(y*)] + 7.196sin(0.8161y*) - 0.01603sin(15.46y*) + 36.2cos(y*)/(5.937y* + 6.1) - 5.953
MGGP (square function) | 4 | Q* = 0.4256y* + 0.4256y*^2 - 39.52y*^4 - 3.274y*^6 - 266.6y*^4/(y* - 7.282) + 0.003953y*^4/(y* - 0.5901) + 0.4879y*^2 - 0.0002393
(In Eqs. 3 and 4, Q* and y* denote the normalized discharge and stage, respectively.)
TABLE 4 Comparison of the performances of different rating curve models. The conventional method, MHBMO, ANN, and MGGP (trigonometric functions) are from the previous study; MGGP (square function) is from this study.
Data part | Criteria | Conventional method | MHBMO | ANN | MGGP (trigonometric functions) | MGGP (square function)
Training | SSE | 19621.30 | 511.32 | 4321.46 | 398.97 | 421.41
Training | NE | 0.98 | 1.00 | 1.00 | 1.00 | 1.00
Training | MARE | 0.04 | 0.02 | 0.02 | 0.02 | 0.02
Training | MXARE | 0.19 | 0.08 | 0.07 | 0.08 | 0.07
Testing | SSE | 3741.23 | 183.60 | 184.38 | 113.60 | 127.83
Testing | NE | 0.98 | 1.00 | 0.99 | 1.00 | 1.00
Testing | MARE | 0.04 | 0.02 | 0.02 | 0.02 | 0.02
Testing | MXARE | 0.13 | 0.06 | 0.06 | 0.06 | 0.06
Table 4 compares the performances of the five rating curve models. As shown, the new MGGP-based model outperforms the conventional method, MHBMO, and ANN in terms of SSE and NE for both the training and testing data. In particular, the MGGP with the square function improves the SSE values of the conventional method, MHBMO, and ANN by 97.8%, 17.6%, and 90.2% for the training data, respectively, while these improvement percentages are 96.6%, 30.4%, and 30.7% for the testing data, respectively. Although the MGGP-based model with trigonometric functions achieved better SSE and NE values than the new MGGP-based model, the new model may be much simpler to adopt in numerical modeling, such as flood routing, where the first derivative of the discharge is required (Niazkar et al., 2019b). According to Table 4, the MARE and MXARE metrics indicate the better performance of the AI-based models (ANN and MGGP). Finally, the comparison shown in Table 4 demonstrates that MGGP is capable of providing accurate explicit estimation models using different types of functions.
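A minimal sketch of how the four comparison metrics of Eqs. (5)-(8) and the SSE improvement percentages quoted above could be computed, assuming NumPy arrays of observed and estimated discharges (the numbers in the final line are taken from Table 4):
import numpy as np

def rating_metrics(q_obs, q_est):
    # SSE, NE, MARE and MXARE as defined in Eqs. (5)-(8)
    error = q_obs - q_est
    sse = np.sum(error ** 2)
    ne = 1.0 - sse / np.sum((q_obs - q_obs.mean()) ** 2)
    relative = np.abs(error) / q_obs
    return {"SSE": sse, "NE": ne, "MARE": relative.mean(), "MXARE": relative.max()}

def sse_improvement(sse_reference, sse_new):
    # percentage reduction in SSE of the new model relative to a reference model
    return 100.0 * (sse_reference - sse_new) / sse_reference

# e.g. conventional method vs. MGGP (square function) on the training data
print(sse_improvement(19621.30, 421.41))  # about 97.8%-97.9%, cf. the value reported in the text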
8. Conclusions
This chapter reviewed the applications of MGGP. It has been used for tackling numerous problems in different fields, including water resources management. Because of the flexibility of its tree-like structure, MGGP can perform as a powerful estimation tool in the problem-solving process. According to the literature review conducted in this chapter, many problems in the field of water resources have not yet been explored using MGGP, and some topics for future research on applying MGGP in water resources were therefore suggested. Furthermore, the problem of developing the stage-discharge relation of a river was revisited, and a new MGGP-based model was proposed considering the square function. According to the comparative analysis, the MGGP-based rating model improves the SSE values of the conventional method, MHBMO, and ANN by 17.6% to 97.8% for the training data and by 30.4% to 96.6% for the testing data. Finally, it is anticipated that MGGP will be utilized for solving many water resources problems in the future, as it has comparable merits with respect to other machine learning methods.
References
Beura, S.K., Bhuyan, P.K., 2018. Operational analysis of signalized street segments using multi-gene genetic programming and functional network techniques. Arab. J. Sci. Eng. 43 (10), 5365–5386.
Chen, J., Zeng, Z., Jiang, P., Tang, H., 2016. Application of multi-gene genetic programming based on separable functional network for landslide displacement prediction. Neural Comput. Appl. 27 (6), 1771–1784.
De Giorgi, M.G., Quarta, M., 2020. Hybrid MultiGene Genetic Programming-Artificial neural networks approach for dynamic performance prediction of
an aeroengine. Aerosp. Sci. Technol. 105902. https://doi.org/10.1016/j.ast.2020.105902.
Eray, O., Mert, C., Kisi, O., 2018. Comparison of multi-gene genetic programming and dynamic evolving neural-fuzzy inference system in modeling pan
evaporation. Hydrol. Res. 49 (4), 1221–1233.
Esmaeili, H., Mohebbi, A., 2017. Prediction of pressure drop in venturi scrubbers by multi-gene genetic programming and adaptive neuro-fuzzy inference
system. Chem. Prod. Process. Model. 12 (3), 1–12.
Fallah-Mehdipour, E., Haddad, O.B., Orouji, H., Mariño, M.A., 2013. Application of genetic programming in stage hydrograph routing of open channels.
Water Resour. Manage. 27 (9), 3261–3272.
Gandomi, A.H., Alavi, A.H., 2012a. A new multi-gene genetic programming approach to non-linear system modeling. Part II: geotechnical and earthquake
engineering problems. Neural Comput. Appl. 21 (1), 189–201.
Gandomi, A.H., Alavi, A.H., 2012b. A new multi-gene genetic programming approach to nonlinear system modeling. Part I: materials and structural
engineering problems. Neural Comput. Appl. 21 (1), 171–187.
Garg, A., Lam, J.S.L., 2015. Improving environmental sustainability by formulation of generalized power consumption models using an ensemble based
multi-gene genetic programming approach. J. Clean. Prod. 102, 246–263.
Garg, A., Tai, K., 2014. An improved multi-gene genetic programming approach for the evolution of generalized model in modelling of rapid prototyping
process. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer, Cham, Switzerland,
pp. 218–226.
Garg, A., Garg, A., Tai, K.J.C.G., 2014a. A multi-gene genetic programming model for estimating stress-dependent soil water retention curves. Comput.
Geosci. 18 (1), 45–56.
Garg, A., Garg, A., Tai, K., Sreedeep, S., 2014b. An integrated SRM-multi-gene genetic programming approach for prediction of factor of safety of 3-D
soil nailed slopes. Eng. Appl. Artif. Intell. 30, 30–40.
Garg, A., Tai, K., Gupta, A.K., 2014c. A modified multi-gene genetic programming approach for modelling true stress of dynamic strain aging regime of
austenitic stainless steel 304. Meccanica 49 (5), 1193–1209.
Ghorbani, M.A., Khatibi, R., Mehr, A.D., Asadi, H., 2018. Chaos-based multigene genetic programming: a new hybrid strategy for river flow forecasting.
J. Hydrol. 562, 455–467.
Giustolisi, O., 2004. Using genetic programming to determine Chezy resistance coefficient in corrugated channels. J. Hydroinform. 6 (3), 157–173.
Hadi, S.J., Tombul, M., 2018. Monthly streamflow forecasting using continuous wavelet and multi-gene genetic programming combination. J. Hydrol.
561, 674–687.
Hasan, M.K., Islam, M.M., Hashem, M.M.A., 2016. Mathematical model development to detect breast cancer using multigene genetic programming. In:
2016 5th International Conference on Informatics, Electronics and Vision (ICIEV). IEEE, Dhaka, Bangladesh, pp. 574–579.
Hoang, N.D., Chen, C.T., Liao, K.W., 2017. Prediction of chloride diffusion in cement mortar using multi-gene genetic programming and multivariate
adaptive regression splines. Measurement 112, 141–149.
Javed, S.G., Majid, A., Ali, S., Kausar, N., 2016. A bio-inspired parallel-framework based multi-gene genetic programming approach to Denoise biomedical images. Cogn. Comput. 8 (4), 776–793.
Kaydani, H., Mohebbi, A., Eftekhari, M., 2014. Permeability estimation in heterogeneous oil reservoirs by multi-gene genetic programming algorithm.
J. Pet. Sci. Eng. 123, 201–206.
Kouzehgar, K., Hassanzadeh, Y., Eslamian, S., Yousefzadeh Fard, M., Babaeian Amini, A., 2021. Application of gene expression programming and nonlinear regression in determining breach geometry and peak discharge resulting from embankment failure using laboratory data. J. Irrig. Sci.
Eng. https://doi.org/10.22055/jise.2021.35162.1931.
Koza, J.R., 1992. Genetic Programming II, Automatic Discovery of Reusable Subprograms. MIT Press, Cambridge, MA, USA.
Kumar, B., Jha, A., Deshpande, V., Sreenivasulu, G., 2014. Regression model for sediment transport problems using multi-gene symbolic genetic programming. Comput. Electron. Agric. 103, 82–90.
Lee, J.-S., Suh, K.-D., 2020. Development of stability formulas for rock armor and tetrapods using multigene genetic programming. J. Waterw. Port Coast.
Ocean Eng. 146 (1), 04019027. https://doi.org/10.1061/(ASCE)WW.1943-5460.0000540.
Marini, S., Conversi, A., 2012. Understanding zooplankton long term variability through genetic programming. In: European Conference on Evolutionary
Computation, Machine Learning and Data Mining in Bioinformatics. Springer, Heidelberg, Berlin, Germany, pp. 50–61.
Mehr, A.D., Kahya, E., 2017. A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction. J. Hydrol. 549,
603–615.
Mehr, A.D., Nourani, V., 2017. A Pareto-optimal moving average-multigene genetic programming model for rainfall-runoff modelling. Environ. Model.
Softw. 92, 239–251.
Mehr, A.D., Nourani, V., 2018. Season algorithm-multigene genetic programming: a new approach for rainfall-runoff modelling. Water Resour. Manage.
32 (8), 2665–2679.
Mehr, A.D., Nourani, V., Kahya, E., Hrnjica, B., Sattar, A.M., Yaseen, Z.M., 2018. Genetic programming in water resources engineering: a state-of-the-art
review. J. Hydrol. 566, 643–667.
Mohammad-Azari, S., Bozorg-Haddad, O., Loáiciga, H.A., 2020. State-of-art of genetic programming applications in water-resources systems analysis.
Environ. Monit. Assess. 192 (2), 73.
Mohammadi Bayazidi, A., Wang, G.G., Bolandi, H., Alavi, A.H., Gandomi, A.H., 2014. Multigene genetic programming for estimation of elastic modulus
of concrete. Math. Probl. Eng. 2014. https://doi.org/10.1155/2014/474289.
Mousavi, S.M., Mostafavi, E.S., Hosseinpour, F., 2015. Towards estimation of electricity demand utilizing a robust multi-gene genetic programming
technique. Energy Effic. 8 (6), 1169–1180.
Muduli, P.K., Das, S.K., 2014. CPT-based seismic liquefaction potential evaluation using multi-gene genetic programming approach. Indian Geotech. J. 44
(1), 86–93.
Muduli, P.K., Das, S.K., 2015. Model uncertainty of SPT-based method for evaluation of seismic soil liquefaction potential using multi-gene genetic
programming. Soils Found. 55 (2), 258–275.
Nedjah, N., Abraham, A., de Macedo Mourelle, L., 2006. Genetic Systems Programming: Theory and Experiences. ISSN Print Edition: 1860-949X,
Springer, Netherlands.
Niazkar, M., 2019. Revisiting the estimation of colebrook friction factor: a comparison between artificial intelligence models and C-W based explicit
equations. KSCE J. Civ. Eng. 23 (10), 4311–4326. https://doi.org/10.1007/s12205-019-2217-1.
Niazkar, M., 2020. Assessment of artificial intelligence models for calculating optimum properties of lined channels. J. Hydroinform. https://doi.org/
10.2166/hydro.2020.050.
Niazkar, M., 2021. Optimum design of straight circular channels incorporating constant and variable roughness scenarios: assessment of machine learning
models. Math. Probl. Eng. 2021, 1–21. https://doi.org/10.1155/2021/9984934. Article ID 9984934.
Niazkar, M., Afzali, S.H., 2018. Developing a new accuracy-improved model for estimating scour depth around piers using a hybrid method. Iran. J. Sci.
Technol. Trans. Civil Eng. 43 (2), 179–189. https://doi.org/10.1007/s40996-018-0129-9.
Niazkar, M., Niazkar, H.R., 2020. COVID-19 outbreak: application of multi-gene genetic programming to country-based prediction models. Electron.
J. Gen. Med. 17 (5), em247. https://doi.org/10.29333/ejgm/8232.
Niazkar, M., Zakwan, M., 2021a. Application of MGGP, ANN, MHBMO, GRG and linear regression for developing daily sediment rating curves. Math.
Probl. Eng. 2021, 1–13. Article ID 8574063 https://doi.org/10.1155/2021/8574063.
Niazkar, M., Zakwan, M., 2021b. Assessment of artificial intelligence models for developing single-value and loop rating curves. Complexity 2021, 1–21.
Article ID 6627011 https://doi.org/10.1155/2021/6627011.
Niazkar, M., Talebbeydokhti, N., Afzali, S.H., 2019a. Novel grain and form roughness estimator scheme incorporating artificial intelligence models. Water
Resour. Manage. 33 (2), 757–773. https://doi.org/10.1007/s11269-018-2141-z.
Niazkar, M., Talebbeydokhti, N., Afzali, S.H., 2019b. One dimensional hydraulic flow routing incorporating a variable grain roughness coefficient. Water
Resour. Manage. 33 (13), 4599–4620. https://doi.org/10.1007/s11269-019-02384-8.
Niazkar, M., Hajizadeh Mishi, F., Eryılmaz Türkkan, G., 2021. Assessment of artificial intelligence models for estimating lengths of gradually-varied flow
profiles. Complexity 2021, 1–11. Article ID 5547889 https://doi.org/10.1155/2021/5547889.
Nicklow, J., Reed, P., Savic, D., Dessalegne, T., Harrell, L., Chan-Hilton, A., et al., 2010. State of the art for genetic algorithms and beyond in water
resources planning and management. J. Water Resour. Plan. Manage. 136 (4), 412–432.
Pandey, D.S., Pan, I., Das, S., Leahy, J.J., Kwapinski, W., 2015. Multi-gene genetic programming based predictive models for municipal solid waste
gasification in a fluidized bed gasifier. Bioresour. Technol. 179, 524–533.
Pedrino, E.C., Yamada, T., Lunardi, T.R., de Melo Vieira Jr., J.C., 2019. Islanding detection of distributed generation by using multi-gene genetic programming based classifier. Appl. Soft Comput. 74, 206–215.
Riahi-Madvar, H., Dehghani, M., Seifi, A., Singh, V.P., 2019. Pareto optimal multigene genetic programming for prediction of longitudinal dispersion
coefficient. Water Resour. Manag. 33 (3), 905–921.
Safari, M.J.S., Mehr, A.D., 2018. Multigene genetic programming for sediment transport modeling in sewers for conditions of non-deposition with a bed
deposit. Int. J. Sediment Res. 33 (3), 262–270.
Searson, D., 2009. GPTIPS: Genetic Programming and Symbolic Regression for MATLAB.
Searson, D.P., 2015. GPTIPS 2: an open-source software platform for symbolic data mining. In: Handbook of Genetic Programming Applications.
Springer, Cham, pp. 551–573. https://doi.org/10.1007/978-3-319-20883-1_22.
332
Handbook of hydroinformatics
Searson, D.P., Leahy, D.E., Willis, M.J., 2010. GPTIPS: an open source genetic programming toolbox for multigene symbolic regression. In: Proc., Int.
Multiconf. of Engineers and Computer Scientists. Newswood, China, Hong Kong, pp. 77–80.
Sivapragasam, C., Maheswaran, R., Venkatesh, V., 2008. Genetic programming approach for flood routing in natural channels. Hydrol. Process. 22 (5),
623–628.
Zakwan, M., Niazkar, M., 2021. A comparative analysis of data-driven empirical and artificial intelligence models for estimating infiltration rates. Complexity 2021, 1–13. Article ID 9945218 https://doi.org/10.1155/2021/9945218.
Chapter 20
Ontology-based knowledge management
framework in business organizations
and water users networks in Tanzania
Neema Penance Kumburu
Moshi Co-operative University, Moshi, Tanzania
1. Introduction
Today's economy is considered a knowledge economy because the creation and use of knowledge are crucial to, and play a major part in, the creation of wealth. Economic achievement is increasingly founded on the effective use of intangible assets such as knowledge, skills and innovative potential as the key source of competitive advantage (Barkhordari et al., 2019). This emerging economic structure is therefore referred to as the knowledge economy (ESRC, 2005). The current wave of globalization has compelled the world, regions, and countries to participate aggressively in the global economy, in which competition is the foremost influence. This rests on the recognition that the traditional factors of production (land, labor, and capital), which were plentiful, readily available and long regarded as the prime means of attaining economic gain, have limitations. Knowledge once received scant attention and was seen as of little importance for competitive advantage; today it is the knowledge-based economy, characterized by the use of information technologies, that counts more (Kefela, 2010). This implies that the previously recognized factors of production are no longer sufficient to sustain a firm's competitive advantage, since knowledge now plays the crucial role. Organizations that understand how to exploit available information attain a more sustained competitive edge than others. A knowledge-dependent economy is founded on the creation, sharing and application of knowledge and information. Yoong and Molina (2003) argued that one means through which a business organization can thrive in today's turbulent commercial setting is by recognizing and utilizing the knowledge within the organization. Organizational effectiveness in both business and water or irrigation networks likewise hinges on making good use of this knowledge, which needs to be created, captured and exchanged (knowledge management) so as to form firm capital (Omotayo, 2015). Individuals enter and leave, but the firm preserves knowledge over time; or, as articulated by Fitz-Enz (2000), company knowledge capital remains with the organization when workers quit. Human capital, by contrast, is the cognitive asset that leaves the company every night and so cannot easily be controlled, because individuals choose where and how they want to invest their knowledge. Consequently, knowledge management (KM) becomes a critical activity in realizing outcomes.
An ontology is an explicit specification of a shared conceptualization. A conceptualization is an abstract model of phenomena in the world formed by identifying the relevant concepts of those phenomena. Explicit means that the types of concepts used, and the constraints on their use, are openly defined. Shared reflects the idea that an ontology captures consensual knowledge, that is, knowledge not private to an individual but accepted by a group. Basically, the role of ontology in the knowledge management process is to facilitate the construction of a domain model: it offers a vocabulary of terms and relations in a specific domain. In building a knowledge management system, two kinds of knowledge are needed. First is domain knowledge: knowledge about the objective facts in the domain of interest (objects, relations, events, states, causal relations, etc. that hold in some area). Second is problem-solving knowledge: knowledge about how to use domain knowledge to achieve various goals, normally presented in the form of a problem-solving method (PSM) that can be reused to attain goals in different domains.
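To make this distinction concrete, the minimal Python sketch below (illustrative only; the entity names and the triple representation are assumptions, not taken from any particular system) separates domain knowledge, encoded as facts, from problem-solving knowledge, encoded as a reusable method that can be applied to any such fact base.

# Domain knowledge: objective facts about the domain of interest,
# stored here as (subject, relation, object) triples. Names are illustrative.
domain_knowledge = [
    ("IrrigationScheme_A", "located_in", "Arusha"),
    ("IrrigationScheme_A", "water_source", "River_X"),
    ("River_X", "has_flow_regime", "seasonal"),
]

# Problem-solving knowledge: a reusable method (a simple PSM) that operates on
# whatever domain knowledge it is given, independently of the domain itself.
def find_related(triples, subject):
    """Return every fact whose subject matches, regardless of domain."""
    return [(rel, obj) for (s, rel, obj) in triples if s == subject]

print(find_related(domain_knowledge, "IrrigationScheme_A"))
# [('located_in', 'Arusha'), ('water_source', 'River_X')]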
In the case of business organizations and water user networks or co-operatives, knowledge is needed to address problems such as the non-acceptance of a particular product in the market or the behavior of users. In managing such knowledge, ontology plays a crucial role in enabling the conversion and distribution of knowledge between experts and knowledge beneficiaries (Sureephong et al., 2008). Along with the actors described earlier, it also offers a shared and mutual understanding of a domain that can be communicated across people and application systems.
Business organizations are legal entities through which shareholders and business persons offer goods and services and co-operate with each other to realize viable goals. Proper utilization of knowledge is key to a firm's existence and prosperity in competitive international markets and contributes critically to problem solving, decision making, company performance and innovation (Huang, 2008). Business organizations are therefore increasingly using knowledge to sustain a continuing competitive advantage. Knowledge is now a key management asset, since it permits organizations to apply and nurture organizational capabilities, augment competitive aptitude and advance a sustainable competitive gain.
Water users' networks or co-operatives are farmers' societies which are autonomous and governed by members (owners) who contribute both monetary and human capital for their socio-economic benefit as well as the conservation of an identifiable water body (Mosha et al., 2018). Since it is the knowledge, skills, and abilities of individuals that create value, emphasis ought to be placed on the mechanisms by which knowledge is acquired, exchanged and disseminated. Knowledge management is thus a prerequisite if business organizations and water users' networks are to fulfil the roles for which they were established (Omotayo, 2015). Knowledge management refers to the storage and distribution of the knowledge and understanding accrued in an organization, association or network concerning procedures, methods and actions, and it treats knowledge as a significant resource in meeting members' needs (Gonzalez and Martins, 2017).
Knowledge management puts emphasis on people and the means by which they obtain, exchange and circulate knowledge. With the fast advancement of information technology and the practical use of unconventional ideas, a diversity of tacit, explicit, structured and unstructured knowledge is growing exponentially. How to successfully gather and classify this multifaceted, varied knowledge, and how to retrieve and reuse it sensibly, has become central to building the competitiveness of the firm. The relevance of knowledge management as a key tool in business organizations and water users' associations cannot be overemphasized. Teng and Song (2011) and Omotayo (2015) assert that the importance of knowledge management stems from the recognition that firms compete on their knowledge-based assets, that even the success or failure of government institutions (non-competing organizations) depends on their capacity to leverage their knowledge assets, and that knowledge management is relevant not only in high-tech industries but in all spheres of the economy. Despite the importance of knowledge management for enhancing the competitive advantage of business organizations, the aspects of an ontology-based knowledge management framework in business organizations have rarely been recognized, and are mentioned in only a few research works.
As regards water users' networks, these are a vital sector for ensuring food security and poverty reduction, particularly among Tanzania's rural households. About 45% of Tanzania's GDP is contributed by the agrarian sector, around 30% of foreign exchange comes from the export of food items, and the sector engages more than 70% of rural inhabitants. As a result, agriculture continues to be the engine for advancing the country's economy (URT, 2011). Even though agriculture is the mainstay of the economy, it is still beset by a number of challenges, among them intermittent drought and undependable rainfall associated with natural catastrophes such as floods and drought (URT, 2011). For this reason, water users' networks or irrigation associations are regarded as crucial means to combat food insecurity, increase food production and create food security among rural households. However, in the Tanzanian context, food security remains among the national concerns. For the period between 2019 and April 2020, over 20% of the populace in sixteen districts, including Tanga, Arusha, and Manyara, faced acute food insecurity, comprising 16% in a crisis situation and 5% in a serious situation, respectively. Furthermore, about 34% of people were in IPC Phase 2 (Stress) and in need of livelihood support. This reveals the depth of the problem in Tanzania. While water user networks or irrigation societies were started mostly to increase agrarian production and improve food security, the degree to which these irrigation networks advance food security in Tanzania remains small. It is for this reason that this chapter documents, shares and proposes a framework for a knowledge management system, in order to assist business organizations and water user networks in Tanzania to establish a collective ontology that can be comprehended by both humans and computers, so that employees and network members can form a common space of different notions through an improved knowledge retrieval interface. This not only supports the creation of knowledge but can also be used as a management tool to ensure the timely preservation and revision of new knowledge. The system is also useful in ensuring that knowledge is shared, so that tacit knowledge can be accrued continually and explicit knowledge aggregated successfully, enabling organizations and networks to use knowledge to outperform competitors and sustain competitive advantage.
The development of this chapter was founded on theoretical and past studies. To ensure an extensive theoretical base for this work, numerous available sources were consulted. The study used a case study design in which the experiences of business organizations and water user networks in Tanzania were explored.
2. Theoretical framework
Several interpretations of knowledge are debated in diverse scientific forums, such as the strategy, management and organization theory literatures, and philosophy. Divergent interpretations of knowledge have resulted in dissimilar operationalizations of knowledge management (Ferreira et al., 2018). The assumption here is that knowledge is a strategic resource, as stipulated by business strategy theory, explicitly the resource-based view (RBV) of the firm. The core thesis of the RBV is that competitive advantage is based on valuable and rare internal resources and capabilities that are expensive for competitors to reproduce. This implies that, for a resource to be a basis of competitive advantage, it must meet three criteria. First, the outputs from these valued resources must be readily purchased by buyers at a price far higher than the costs incurred in bringing them to a marketable state. Second, the resource must be rare, that is, in limited supply. Third, it must be hard for rivals to either imitate or acquire (Cardeal and Antonio, 2012). The theory further posits that the desired outcome of managerial effort within the enterprise is sustained competitive advantage (SCA), which permits the organization to earn returns that are above the industry average (Mugera, 2012).
The theory sees SCA as originating from the distinctive resources of the enterprise that give it an advantage over its rivals. An enterprise such as a water user network is viewed as a bundle of particular resources that are deployed to establish an advantaged market position. The resource-based theory (RBT) therefore stresses strategic choices, with the managers of the organization having the significant duty of identifying, nurturing, and deploying key resources to maximize returns. In this case, managers must ensure that mechanisms are in place for gathering, organizing, interpreting, distributing and reapplying information and knowledge throughout the firm. This is because the RBT places emphasis on decisions and competencies emanating from within the firm rather than from its environment (Barney and Arikan, 2005). The resources a firm possesses, knowledge included, are a source of competitive advantage. Internal influences are those that affect the firm owner/manager's ability to work competently, notwithstanding any innate ability of the owner/manager (Amoah and Fordjour, 2012). Internal features are the individual qualities, skills, knowledge and capabilities of the individual owner/manager, which are critical to how well the business handles the inevitable emergencies that arise. Thus, RBT stipulates that knowledge is a main determinant of organizational performance. Knowledge management (KM) describes the processes and structures organizations use to acquire, generate and disseminate knowledge for articulating strategy and making strategic choices that will enable the firm to gain competitive advantage. When a business organization develops a knowledge strategy, this is described as the overall approach the organization intends to pursue to align its knowledge resources and competences with the intellectual requirements of its strategy. A strategic posture of this kind is essential to attain a sustainable competitive position.
In practice, business organizations and water user networks recognize the importance of managing knowledge if they are to withstand competition and grow. Consequently, numerous companies worldwide are beginning to actively organize their knowledge and innovation. Knowledge does matter, but the question is when, how and why (Carayannis and Campbell, 2009). Today, knowledge matters more, and in ways that are not always foreseeable or even controllable; knowledge frameworks are highly multifaceted, dynamic and adaptive (Carayannis and Campbell, 2009). This calls for an ontology-based knowledge management framework in business organizations and water user networks. However, RBT also undervalues the synergy among resource combinations in achieving competitive advantage (Kraaijenbrink et al., 2010). To overcome these criticisms, systems theory is drawn on as well.
Systems theory focuses on the links among parts and the features of a whole, instead of reducing a whole to its parts and assessing their separate properties (Senge, 1990). A system is described as "an object which preserves its being through the joint interaction of its parts"; systems theory provides a structure by which clusters of components and their characteristics can be studied together so as to understand outcomes. A systems theory framework is therefore important for the analysis of business firms and water users' networks and how they operate (Yari and Eslamian, 2021). A system is made up of at least two parts and the relationships that hold among them. At any particular time, a system or each of its parts exhibits a state, defined by its pertinent features, values or attributes. When considering knowledge, a significant notion in systems thinking is generative learning. Generative learning is the process of balancing, mixing and contextualizing existing knowledge to suit the needs of a new application or business organization (Chun et al., 2008). Generative learning permits advanced approaches to novel problems rather than the simple, reactive and frequently ill-suited reuse of old solutions to new difficulties. A systems theory approach to KM recognizes that any time one of the key knowledge processes is initiated, there can be a ripple effect of actions that may alter the condition of other subsystems. Events may form part of reinforcing processes that lead to the rise or deterioration of either wanted or unwanted consequences. Each knowledge process may result in unresponsive solutions or in genuine generative learning. Depending on how the four processes (the creation, storage, transfer, and application of knowledge) have been executed, they may be observed as closed, open or dynamic systems, each affected more or less by the external setting and each interrelated and interdependent. While the systems thinking viewpoint has been incorporated into the information systems (IS) literature (Panagiotidis and Edwards, 2001), few investigators have examined the holistic standpoint of systems thinking in the context of KM. This chapter therefore shows and documents how systems theory can best explain the assumptions of an ontology of knowledge management in business organizations and water user networks.
3. Empirical literature
This section presents a reflection on previous studies related to the subject matter. These studies provide insights into understanding the ontology-based knowledge management framework in business organizations as well as identifying gaps. Huang (2008) explained how enterprise culture and structure can effectively enhance knowledge management by reviewing the literature and presenting a knowledge enterprise model. The study further shows that, in order to utilize knowledge successfully, the organization should create a knowledge-sharing culture whose central element is trust, considered at four levels: interpersonal, group, organizational and institutional. Trust ought to run through the whole knowledge management process, with trust in individuals and trust in knowledge content emphasized concurrently. Omotayo (2015) reviewed the literature in the area of knowledge management and documented the relevance of knowledge management in organizations. The paper additionally shows that knowledge management is a critical element in the attainment of objectives and a crucial instrument for enterprise survival, competitiveness and viability. Thus, creating, handling, distributing and applying knowledge efficiently is important if an organization is to exploit the benefits of knowledge, and to manage knowledge effectively attention should be concentrated on three key factors: people, processes and technology.
Barkhordari et al. (2019) examined the empirical association between the knowledge-based economy and economic growth in MENA countries. The study used a growth model in the Barro and Sala-i-Martin (1995) framework for a five-year period (2010–15), together with panel data on annual economic growth rates for the selected MENA countries, within the theoretical and empirical context of the four variables used to characterize the knowledge-based economy. The results show that institutions, human capital and research infrastructure, and business sophistication are the pillars of the knowledge-based economy that have a substantial and positive influence on economic development in MENA countries. The recommendation emanating from this study is that governments should consider knowledge-related policies for hastening the transition to the knowledge-based economy and boosting economic performance.
Sureephong et al. (2008) proposed a knowledge management system that supports knowledge management activities inside the organization. In that investigation, ontology is key to knowledge management development in several ways, such as establishing reusable and faster knowledge bases and improved ways of representing knowledge explicitly. The study further found that generating and representing an ontology poses problems for the enterprise owing to the vagueness and unstructured origins of knowledge; it therefore proposes a methodology that understands, generates and represents an ontology for firm or community development by means of a knowledge engineering approach.
Liu et al. (2013) presented a knowledge management framework that tracks and combines provenance information across distributed heterogeneous systems. It is reinforced by a combined knowledge model that defines the domain knowledge and the provenance information included in the information life cycle of a given data product, and the proposed framework was evaluated in the setting of two real-world water irrigation information systems.
Based on the above empirical studies, it is clear that studies on ontology-based knowledge management frameworks in business organizations have been conducted worldwide. These studies found that institutions, human capital and research, infrastructure and business sophistication, as well as people, processes and technology, are critical in managing knowledge effectively. Yet little research has examined an ontology-based knowledge management framework for business organizations and water users' networks, particularly the extent to which business organizations continually accumulate tacit knowledge and competently aggregate explicit knowledge, which leads to improved organization and utilization of knowledge and sustains innovation for the innovators. Relying on this fact, this study therefore aims to cover the existing gap by validating the outline of a knowledge management scheme founded on an ontology of knowledge management that forms and explains knowledge with the ontology, with the aim of creating a shared ontology in which aspects of knowledge management such as humans, computers, and people can be understood and additional associations between different notions can be discovered through an improved knowledge retrieval interface. This will not only support the creation of knowledge but also apply management tools to ensure the timely storage and upgrading of new knowledge, and thus enhance business and water users' association performance.
4. Ontology-based knowledge management framework in business organizations:
A conceptual framework
The knowledge-based economy grows exponentially, and knowledge assets thus become irreplaceable to the organization. Efficient application of knowledge has become central to the firm's survival and success in competitive worldwide business and contributes to problem resolution, decision making, enterprise performance enhancement and innovation. The successful application of knowledge, as described in academia, is knowledge management. Knowledge management refers to the organized, explicit and deliberately established procedures needed to manage knowledge, the intention of which is to make the best use of an enterprise's knowledge-related effectiveness and to create value (Bixler, 2005). The process involved in KM comprises gathering, sorting, representing, distributing, and reusing information and knowledge throughout the enterprise; dealing with knowledge is the foremost principle of knowledge management. Knowledge is of two kinds, explicit and tacit. Explicit knowledge can be expressed in formal language and communicated among individuals, while tacit knowledge has more intangible features and is private knowledge embedded in personal skills. Both explicit and tacit knowledge must generate returns and resolve the organization's current problems. Mastery of relevant and contemporary knowledge for continuous organizational improvement is the main focus of KM. Effective KM has maturity, dynamic and self-development attributes. The maturity attribute means that KM must be durable and sufficient to cope with turbulence in attaining results, yet dynamic enough to adapt to change (Barkhordari et al., 2019). KM should also align with enterprise policy, strategy, culture, technology and structure, and offer an environment of well-organized, value-added and pertinent knowledge that stimulates creativity and new ideas. The dynamic attribute means that information and knowledge should flow through the enterprise without barriers, so that everybody can share and contribute to the knowledge asset. The self-development attribute implies, on the one hand, that KM should sense potentially valuable knowledge, capturing and storing it to increase enterprise assets, and, on the other, that it should create new knowledge from what the enterprise already has. Knowledge management can advance the organization by, for example, leveraging intellectual capital, exploiting knowledge assets and preserving cutting-edge performance. From an organizational perspective, it is expected that organizational policy, strategy, culture, technology and structure can enhance knowledge management.
For example, organizational culture can affect knowledge management in the sense that a knowledge-sharing culture enables knowledge dissemination, particularly of implicit knowledge. People are more inclined to hide what they know when they are uncertain about the consequences of sharing, so building trust is key to effective sharing. Trust here involves faith in persons and faith in the knowledge content itself. Confidence in persons is important in creating a collaborative and participative culture in the organization, which is expected to minimize barriers to knowledge sharing. Faith in the knowledge content, on the other hand, raises the perceived trustworthiness of knowledge, which helps people use knowledge without reservation and in turn fosters trust in other individuals. Research has further identified other elements of a knowledge-sharing culture, such as the ownership of knowledge and obligation. Ownership of knowledge relates to enterprise trust: firms should facilitate employees' work, offer the knowledge essential to complete their duties, be open to criticism and inspire confidence. Knowledge resources should not be held only by executives; they ought to be open to everybody in the enterprise, and all employees should have the right to hold and retrieve the knowledge asset. In addition, organizations should create an environment in which staff feel they have equal access to knowledge assets and are accountable for facilitating change.
Furthermore, organizational structure also affects knowledge management positively. This is mainly because, to implement knowledge management, business organizations appoint responsible persons, for example a chief knowledge officer (CKO), knowledge engineer or knowledge manager, to execute knowledge management. The organizational chart should be networked to offer occasions for workers to co-operate and liaise with one another and to facilitate knowledge-related action. Since it is intangible and closely tied to individual dispositions, organizing tacit knowledge is trickier than organizing explicit knowledge, and organizational settings must be in a position to capture tacit knowledge and convert it into explicit knowledge where needed. The organizational design must therefore provide opportunities for each member of staff to interact and liaise with others and support knowledge-related activities in the firm. In organizational setups, there must also be links between individual development and organizational development, as advocated by the systems approach discussed earlier.
Technology is an enabler: information technology provides the tools that allow executives to distribute knowledge and data. Consequently, information technology has a crucial function in knowledge management initiatives. In the contemporary business situation, the execution of knowledge management projects is far simpler with the assistance of technology (Subashini et al., 2011). The worth of knowledge management increases when knowledge is accessible to the right persons at the right time. Thus, knowledge distribution, storage and retrieval are eased via information technology equipment such as computers, phones, e-mail, folders, data-mining systems, search engines, video-conferencing equipment and microfilm.
The human factor is crucial to the successful management of knowledge; human resource management therefore assists greatly in managing knowledge by combining "congruence", "human capital" and "social capital" approaches. Through the congruence approach, human resource management mechanisms should be internally consistent so as to mutually reinforce each other, strengthen the entire management framework in the organization and "fit" with the external business situation. Human resource management in the areas of career progression and payment systems also needs special emphasis in order to manage knowledge effectively (Omotayo, 2015). Through the human and social capital approach, emphasis is placed on the significance of "the long-term growth of skills, culture and competences in the organization" (El-Farr and Hosseingholizadeh, 2019). Considering the thesis that personnel are carriers of much of the enterprise's key knowledge, proponents suggest that human resource experts should focus, first, on the retention of staff. Second, they recommend that workers' know-how be built into the enterprise routines through learning procedures. Third, they advocate that mechanisms be formulated for sharing the profits arising from the use of this knowledge.
On the other hand, Armstrong (2006) suggests ways in which human resource management can influence knowledge management: (i) help foster an open culture in which the values and norms stress the importance of sharing knowledge; (ii) promote a climate of commitment and trust; (iii) advise on the design and development of organizations that facilitate knowledge sharing through networks, communities of practice (groups of people who share common concerns about their work) and teamwork; (iv) propose recruitment strategies and provide resourcing services which ensure that valued staff who can contribute to knowledge creation and sharing are attracted and retained; (v) provide means of motivating people to make their knowledge explicit and reward those who do so; (vi) help in the development of performance management processes that focus on the development and sharing of knowledge; (vii) develop processes of collective and individual learning that will generate and help circulate knowledge; (viii) set up and organize workshops, conferences, training and seminars which enable knowledge to be shared on a person-to-person basis; (ix) in collaboration with information technology, develop systems for capturing and codifying explicit and tacit knowledge; and (x) generally, promote the cause of knowledge management with senior managers to encourage them to exert leadership and support knowledge management initiatives.
On the other hand, hydrology is the science of water, concerned with storages and fluxes over space, time and stage. Given the complexity of the technical problems, it is increasingly difficult to allocate water resources, and hence to solve problems, within one organization or one site (Liu et al., 2013). Data gathering and analysis are distributed and heterogeneous. Compared with other data-oriented scientific communities, one of the unique features of the hydrologic science community is its strong reliance on externally generated data, e.g. data gathered by other organizations. A noteworthy difficulty for field operators is to find the right data to suit their needs and to decide how to use those data (Tarboton et al., 2010). Additionally, communities have placed greater attention on the networking of distributed sensing and less on the tools needed to manage and comprehend the data. To correctly interpret data produced by an external agent, operators need to understand the provenance of the data. Thus, water user networks in Tanzania need a knowledge management framework founded on an ontology of knowledge management that shapes and explains knowledge with the ontology, with the aim of creating a shared ontology whereby aspects of knowledge management such as humans, computers, culture, structure, leadership, and technology can be understood, and more associations between dissimilar notions created, through an improved knowledge retrieval interface (Fig. 1).
[Figure: leadership (strategic planning and systems thinking, rewarding learning, risk taking and knowledge sharing), a culture of trust, open dialogue and teamwork, an organization structure that facilitates personal interaction and captures tacit knowledge, a technology infrastructure that supports efficient capture and sharing of tacit and explicit knowledge, and organizational learning together drive knowledge sharing and competitive advantage in the business organization/water user network.]
FIG. 1 Conceptual framework of ontology-based knowledge management in business organizations. (Modified from Anantatmula, V.S., 2005. Knowledge management criteria. In: Stankosky, M. (Ed.), Creating the Discipline of Knowledge Management: The Latest in University Research. Elsevier Butterworth-Heinemann, Amsterdam, Boston, pp. 171–188.)
5. Ontology-based knowledge management framework in business organizations and water user networks: proposed system
Business organizations and water user networks are formalized structures of roles or positions designed to facilitate the accomplishment of goals. Today's society is thus a society of organizations. This owes itself to the dynamics of capitalism as the dominant mode of economic organization: capitalism thrives through organizational activity, and organizations have to be managed as they grow in complexity. Organizations and water user networks therefore develop to achieve goals; they involve sets of interacting positions and the collaborative actions of individuals; they are deliberately structured and consciously coordinated; they involve departmentalized activities following a logical pattern; and they exist within the larger society (Daft, 2000). Since organizations and water users' networks operate within society, they are described as open systems in the sense that they are in continuous exchange with their environments in order to survive. They receive inputs from this environment, transform them into outputs, and return them to the environment as goods or services and waste. Through feedback the organization obtains information about what works and what does not, and makes the necessary corrections or adjustments. Changes in the external environment require the organization to adapt accordingly; such changes affect the strategy, structure, and culture of the organization (Fig. 2).
For the above system to operate effectively, an ontology of knowledge management is needed. This is because the core role of a knowledge management scheme is to facilitate knowledge distribution within the organization. The acquisition of knowledge does not in itself mark the start of knowledge management, but it is a crucial precondition (Huang, 2008) for knowledge to be suitable for reuse. Knowledge management creates explicit components that are stored in a knowledge base comprising structured, semi-structured, and unstructured information. If it is to work effectively and attain organizational goals, knowledge management is divided into three parts, namely the acquisition of knowledge, the storage of knowledge and the retrieval (reuse) of knowledge, together with procedures whose key concepts in the ontology are connected through knowledge mining, knowledge representation, and knowledge correlations.
Knowledge acquisition is the conceptualization procedure founded on the notion of ontology. The procedure converts all essential knowledge, and informal and semi-formal information, into formal information. Knowledge acquisition is realized through knowledge mining, which allows knowledge sources such as numerous databases, documents, and web resources to be placed in the knowledge repository after processing by a knowledge discovery system (KDS). Additional knowledge sources, such as data from various forums and comments on applications (containing tacit knowledge), are initially placed in a transit depot; once organized by knowledge administrators, these items are moved into the knowledge repository. Acquiring knowledge therefore refers to methods of knowledge building rather than knowledge adaptation (Peng et al., 2019). Aimed at changing semi-structured and unstructured data into structured knowledge and storing it in the knowledge base, knowledge storage is the process by which metadata are extracted from the knowledge sources described above and knowledge objects are annotated by means of the ontology and metadata values.
[Figure: within its environment, the organization takes inputs (raw materials, human resources, capital, technology, information), passes them through a transformation process (employees' work activities, management activities, technology and operations methods) and produces outputs (products and services, financial results, information, human results), with feedback returning from the environment.]
FIG. 2 Organization as an open system. (Modified from Daft, R.L., 2000. Management. The Dryden Press, Fort Worth.)
[Figure: the proposed system links users and a knowledge administrator, via an administration tool, a knowledge retrieval engine and knowledge push, to an ontology base, a metadata base, a knowledge base and a database; knowledge acquisition tools, knowledge mining and knowledge processing/discovering tools feed these stores from documents, databases, web sources, applications, domain-knowledge marking, a transit depot and forum/user feedback.]
FIG. 3 Ontology-based knowledge management framework in business organizations and water user networks: proposed system. (Modified from Peng, M.Y.P., Zhang, Z., Ho, S.S.H., 2019. A study on the relationship among knowledge acquisition sources at the teacher- and college-level, student absorptive capacity and learning outcomes: using student prior knowledge as a moderator. Educ. Sci. Theory Pract. 19(2), 22–39.)
The ontology base comprises the associations among the grouped concepts of domain knowledge objects and other notions in the system. The metadata needed by the KMS are placed in the metadata base, which forms the core tool for exploring knowledge objects successfully (Fig. 3).
The database and knowledge base combine syntax, metadata information and the connections among knowledge objects (in this case, information on employees' or members' titles, education, training, past service, present position, performance scores, pay levels, language proficiency, capabilities and specialized skills). Distinct classification of related knowledge in terms of the metadata store and the ontology is not only a principle but also the basis for effective search and inference over knowledge. It was also noted that knowledge acquisition is poorly practiced in business organizations and water user networks in Tanzania. In one company it was reported that the concept of knowledge management is completely new: "We still rely on trial and error to solve problems; we are yet to have knowledge management systems in place. Thus it is difficult to have an information and knowledge database that can easily facilitate knowledge acquisition." Most of the knowledge is still tacit, in the sense that it has yet to be retrieved from the people who hold it, debated and criticised, and stored in the company database.
Knowledge reuse is the mechanism by which knowledge is used in application systems. In this context, workers or members can use the knowledge retrieval engine to search for the knowledge they need across the various classified clusters of previous work; in this way knowledge is obtained through a pull technique. Furthermore, the KMS also pushes related knowledge to users according to individual preference and immediate requirements. Under favorable management conditions, knowledge in the repository is controlled, refreshed and maintained promptly by knowledge administrators, which gives the system a capacity for active involvement rather than being limited to static usage and closed preservation.
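The three parts of the scheme can be illustrated with a minimal Python sketch, assuming a simple in-memory store (class and method names are illustrative assumptions, not part of the proposed system): acquisition places raw contributions in a transit depot, storage annotates them with ontology concepts as metadata, and retrieval pulls items by concept.

class KnowledgeRepository:
    def __init__(self):
        self.transit_depot = []   # unreviewed contributions (forum posts, comments, ...)
        self.knowledge_base = []  # curated items with ontology-based metadata

    def acquire(self, text):
        """Knowledge acquisition: raw, semi-formal input enters the transit depot."""
        self.transit_depot.append(text)

    def store(self, ontology_terms):
        """Knowledge storage: a curator annotates items with ontology concepts (metadata)."""
        for text in self.transit_depot:
            tags = {t for t in ontology_terms if t.lower() in text.lower()}
            self.knowledge_base.append({"text": text, "concepts": tags})
        self.transit_depot.clear()

    def retrieve(self, concept):
        """Knowledge reuse (pull): return every item annotated with the requested concept."""
        return [item["text"] for item in self.knowledge_base
                if concept in item["concepts"]]

repo = KnowledgeRepository()
repo.acquire("Canal lining reduced seepage losses in the Moshi scheme.")
repo.acquire("Members reported pump maintenance problems during the dry season.")
repo.store(ontology_terms=["Canal", "Pump", "Seepage"])
print(repo.retrieve("Pump"))   # second item only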
6. The practice of knowledge organization and expression
6.1 Ontology
For organizations and water user networks to carry out their work effectively, an ontology is necessary. An ontology is a shared, abstract model given a formal specification. In implementation, five kinds of components are typically used to describe an ontology: concepts (or classes), relations, roles, axioms and instances. Among these five components, relations are the heart of the ontology. Establishing a well-related domain ontology base is the central aspect of a knowledge management framework founded on ontology.
The relations of the ontology capture the constraints, interactions or new connections among concepts, such as identity, opposition, hyponymy, composition, interconnection and noun-modifier relations. Numerous associations can logically link each knowledge node and establish a web of knowledge associations based on the ontology; the correct knowledge node can then be reached by following a relative path. There are two systems of knowledge relations: (i) major ontology relations and (ii) minor ontology relations (Zhang et al., 2011). The former describes all the terminology concerning a particular field and the ontology associations among the terms; the latter describes outside concepts of other domains linked to the terminology of the major ontology relations. Using the major and minor relations together, one can create a sound knowledge organization and likewise retrieve knowledge proficiently and rapidly. In principle, all functions carried out by a water user association or business organization are connected between firms to produce an outline of functions for a network; likewise, all resources in a firm are bound within these associations to generate a resource collection for the system. Lastly, the actors are attached within associations to form an actors' network. Yet individual managers cannot perceive the entire pattern in which the enterprise is embedded. For executives, this means taking a local view while building enough complexity into their understanding to permit action and to prioritize aims across time. A part of the relation network of a business and water user association is presented in Fig. 4.
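A minimal Python sketch of this relation network idea is given below, assuming a directed labelled graph like the fragment in Fig. 4. The relation and concept names follow the figure; the path-following helper is an illustrative assumption, not part of the published model.

# Concepts linked by named relations, as in the Fig. 4 fragment.
relations = [
    ("organization", "has_goal", "organization-goal"),
    ("organization", "consists_of", "division"),
    ("division", "consists_of", "subdivision"),
    ("organization-agent", "member_of", "division"),
    ("organization-agent", "plays", "role"),
    ("role", "requires_skill", "skill"),
    ("role", "performs", "activity"),
    ("activity", "consumes", "resource"),
]

def follow(start, path):
    """Follow a relative path of relation names from a starting concept."""
    frontier = {start}
    for rel in path:
        frontier = {o for (s, r, o) in relations if s in frontier and r == rel}
    return frontier

# Which resources are ultimately consumed through the roles an agent plays?
print(follow("organization-agent", ["plays", "performs", "consumes"]))
# {'resource'}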
6.2 Knowledge representation and organization based on ontology
The form of knowledge representation must be settled first. Owing to the variety and intricacy of knowledge, it is challenging to convey knowledge in an organized way: there are many types of knowledge representation, but no settled, systematic technique of representation (Zhang et al., 2011). Additionally, a single, undivided knowledge base may not be adequate to support diverse innovative plans, given how cross-cutting and interconnected knowledge across disciplines has become. A multi-ontology model of knowledge organization is therefore offered. This model generates one knowledge base for each domain, uses the elementary characteristics of ontology and the associations described above to realize the interconnection of multiple ontology bases, and finally creates a knowledge system, as shown in Fig. 5.
[Figure: concepts such as organization, organization-goal, division, subdivision, team, organization-agent, role, skill, authority, communication-link, activity, constraint and resource, linked by relations such as has_goal, decomposition_of, consists_of, member_of, plays, requires_skill, has_authority, has_communication_link, performs, constrained_by and consumes.]
FIG. 4 Part of the relation network of a business and water user association. (Modified from Grunninger, M., 2003. Enterprise modelling. In: Handbook on Enterprise Architecture. Springer, Berlin, Heidelberg, Germany, pp. 515–541.)
[Figure: several ontology knowledge bases are linked through an ontology system consisting of an ontology base, a logic ontology and a relations library, each described by conceptual sets (C1, C2, C3, ...), correlation sets (R1, R2, R3, ...) and function sets.]
FIG. 5 Ontology knowledge organization model. (Modified from Zhang, J., Zhao, W., Xie, G., Chen, H., 2011. Ontology-based knowledge management system and application. Procedia Eng. 15, 1021–1029.)
The relations of the ontology at the logical layer form a tree, an ontology relation library with an in-tree structure. Each concept is an atomic notion, and concepts are correlated to one another through ontology links. This layer comprises all the concepts and the relation web of the knowledge base. Moreover, it comprises four component parts for each concept: instances, axioms, relation formation and relation-set attributes. The ontology layer is the higher-level construction over the ontology relation library. Founded on the idea of modelling multiple knowledge centers, it creates a knowledge system for comparable uses to support the planning process (Zhang et al., 2011), for instance the processes, equipment, methods and guidelines in computer-aided process planning, together with ontology associations related to occupational similarity. Each knowledge web thus covers the basic concepts of a related discipline, and the ontology knowledge system is composed of these diversified ontology bases. Multiple ontology bases with connected domain knowledge are therefore organized by forming a logical layer and an ontology layer. For example, the system may be separated into several parts, such as a domain knowledge base, a principal knowledge base and a unified knowledge base. Each ontology knowledge base is connected by a minimal set of constraints among its concepts and designates the objects, concepts and semantic associations concerning the linked parts based on the ontology.
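A minimal sketch of this multi-ontology organization is given below in Python, under assumed names (the two domain bases and the cross-base mapping are illustrative, not taken from the model): each knowledge base keeps its own concept set and relation set, and the ontology system links the bases through a small set of cross-base mappings.

# Two assumed domain ontology bases, each with its own concept and relation sets.
hydraulics_base = {
    "concepts": {"Channel", "Weir", "FlowRate"},
    "relations": [("Weir", "measures", "FlowRate")],
}
irrigation_base = {
    "concepts": {"Scheme", "Canal", "WaterAllocation"},
    "relations": [("Scheme", "contains", "Canal")],
}

# The ontology system: the bases plus cross-base constraints linking related concepts.
ontology_system = {
    "bases": {"hydraulics": hydraulics_base, "irrigation": irrigation_base},
    "mappings": [("irrigation:Canal", "same_as", "hydraulics:Channel")],
}

def linked_concepts(qualified_name):
    """Return concepts in other bases that a given concept is mapped to."""
    maps = ontology_system["mappings"]
    return [b for (a, _, b) in maps if a == qualified_name] + \
           [a for (a, _, b) in maps if b == qualified_name]

print(linked_concepts("irrigation:Canal"))   # ['hydraulics:Channel']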
For irrigation co-operatives in Tanzania to increase efficiency and respond to members' needs, they require an ontology of knowledge management (Sandelin et al., 2021). Recognizing this, knowledge needs to be managed in a systematic and strategic way. Irrigation and water user networks hold significant physical and mechanical assets for which they often lack sufficient expertise. This is worrying because the ability of irrigation co-operatives and water users' networks to offer good services to their members depends on the appropriate operation and maintenance of key assets, just as in any other business enterprise. Knowledge asset enhancement needs to be a continuous activity in irrigation co-operatives and ought to begin by identifying the gap between current and desired knowledge and how technology, as an enabler, can facilitate knowledge management practices.
Irrigation co-operatives may lose knowledge through retirement, cessation of membership, severe or long-lasting sickness and death. When irrigation co-operatives experience these events, they lose part of their institutional memory. This is because most of their knowledge is tacit; they rely on solutions embedded in customary working practices and often hold negative attitudes toward technology. Making knowledge explicit is found to offer a solution, and for this reason the Ontology Knowledge Organization Model is suggested.
6.3 Knowledge retrieval based on ontology
The purpose of knowledge management is to offer suitable knowledge to the right persons so that they are able to make the best decisions. Knowledge retrieval is a main challenge of knowledge management, standing at the center of the link between people and knowledge. If knowledge is to help organizations and water user networks achieve the goals for which they were established, it must be retrieved at the right time and delivered to the right people. Unfortunately, this is missing in most business organizations and water user networks in Tanzania, and as a result they fail to make informed decisions on issues related to customers as well as product design. Knowledge retrieval should focus on knowledge organization, because the retrieval pattern is fixed by the organization's design and is in effect the reverse procedure of knowledge organization. This chapter therefore applies a domain-ontology-based design, founded on the analysis of the ontology and the knowledge representation, to design the retrieval path shown in Fig. 6.
FIG. 6 Model of knowledge retrieval based on ontology. (Modified from Zhang, J., Zhao, W., Xie, G., Chen, H., 2011. Ontology-based knowledge management system and application. Procedia Eng. 15, 1021–1029.)
This retrieval mode is divided into four parts: knowledge interconnection, ontology, knowledge resources, and matching retrieval. The key role of knowledge interconnection is to pass the query through the corresponding retrieval mechanism by way of retrieval channels, selecting the suitable entry point (self-defined retrieval or autoretrieval) to explore the knowledge nodes and concepts in the associated domains, obtaining acceptable matching results and returning them to the user. The knowledge store is the basis of ontology-based knowledge retrieval; it is the main point that distinguishes this knowledge retrieval arrangement from well-known information retrieval frameworks, and it is the central tenet of the system model. From query analysis and result management, to the knowledge-semantics retrieval procedure, to the knowledge asset design and the index base, everything is founded on the correlated knowledge in the ontology.
Highly effective knowledge retrieval rests on a high-quality retrieval strategy and technique. This chapter presents two retrieval paths, self-defined retrieval and autoretrieval (Zheng et al., 2012), which permit water users' networks and business organizations at different levels to select a suitable retrieval means according to their needs. In self-defined retrieval, users describe the facets of the knowledge themselves using concept navigation, annotation navigation and the graph of concept relations. As the system explores the classes, it can display the various relations of the ontology to the user at the human-computer interaction level, and the user can choose a view of the listed ontology. Since the ontology is specified by the user within a definite selection range, the search range is narrowed or widened to the same extent, matching the knowledge the user actually needs. This type of search method, with some adjustment, can largely satisfy the aspirations of most users and gives them extra control over the search.
The other retrieval path is automatic retrieval (extended retrieval), which is essentially the expansion of concepts. It is implemented according to the relations among concepts and the semantics of the ontology, and it comprises two facets. Using the associations expressed by the hierarchical structure of the domain ontology, extra retrieval results can be produced: the user's query concepts are substituted by "top-class" (broader) concepts, or a precise attribute value is substituted by a more general attribute value, both of which relax the constraints of the retrieval. Conversely, replacing the user's query concepts with subclass (narrower) concepts yields deeper, more specific concepts and representation formats. Multidisciplinary concepts, covering additional domains of knowledge and the concept-relation knowledge of those domains, can be used to enlarge the fields to which this multidisciplinary knowledge belongs.
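A minimal Python sketch of the extended (auto) retrieval idea is given below: the query concept is expanded along an assumed domain-ontology hierarchy so that documents tagged with narrower concepts are still found. The taxonomy and the document tags are illustrative assumptions, not data from any water user network.

# Assumed subclass hierarchy (narrower concept -> broader concept).
subclass_of = {
    "SurfaceIrrigation": "Irrigation",
    "DripIrrigation": "Irrigation",
    "FurrowIrrigation": "SurfaceIrrigation",
}

def expand(concept):
    """Return the concept together with all of its (transitive) subclasses."""
    result = {concept}
    changed = True
    while changed:
        new = {c for c, parent in subclass_of.items() if parent in result}
        changed = not new.issubset(result)
        result |= new
    return result

documents = {
    "doc1": {"FurrowIrrigation"},
    "doc2": {"DripIrrigation"},
    "doc3": {"Drainage"},
}

query = expand("Irrigation")
hits = [d for d, tags in documents.items() if tags & query]
print(hits)   # ['doc1', 'doc2'] - matched although neither is tagged 'Irrigation' directly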
In business organizations in Tanzania, knowledge storage involves soft or hard mechanisms for capturing and storing personal and enterprise knowledge in a manner that is easily retrieved. Knowledge storage exploits technical infrastructure, such as hardware and software, and human competencies to recognize organizational knowledge and then to codify and index it for retrieval purposes (Armstrong, 2006). This is the person-to-document method: a repository allows many people to search for and retrieve codified knowledge without meeting the person who created it, thus ensuring organizational continuity.
6.4 Knowledge application and implementation based on ontology
Having examined the structure of a knowledge management arrangement founded on ontology, together with knowledge organization, representation and retrieval, this system may be applied to the arena of hydroinformatics. First, in an effort to mine, classify and establish the related knowledge, the firm applies a variety of knowledge acquisition tools to the associated domain of knowledge (Zhang et al., 2011). A properly structured ontology is essential for a successful knowledge management framework. This chapter accordingly describes the hierarchy of terms that are mutually acknowledged in business organizations and water users' networks and adopts the top-down technique, which implies listing the highest-level concepts and progressively refining them to create subclasses. For example, hydroinformatics can be categorized into numerous fields, such as machinery, power, drivers and performance, and the subclasses are then gradually expanded by similarity. This yields a conceptual chart that captures the domain knowledge with its dependencies. Lastly, an ontology model is built after describing the attributes of the classes.
The worth of an ontology knowledge base lies in its usability; thus, the connection between the queries to be answered and the knowledge in the base is paramount for knowledge retrieval (Zheng et al., 2012). If the user opts for self-defined retrieval, the system enters the interface described earlier in the knowledge retrieval model, as presented in Fig. 7. On the left of the concept exploration part is the concept tree presenting the classes; the other side presents a diagram of the relation network. Clicking the "enter" key brings up the view in Fig. 7B. On picking a concept, the right side shows the key associations of the selected concept, including its superordinate, lateral and related concepts, which are displayed in the concept-viewing part. This offers narrowed or extended exploration options for obtaining the corresponding knowledge objects. If the user selects autoretrieval, the system enters the interface displayed in Fig. 8.
First, the query is separated into four facets, carrier, action, object, and condition, and the ontology is used to interpret and translate the query terms, producing a formal problem description. Clicking the "ontological expression" button then returns the relevant knowledge content after related-class mining and semantic integration. The knowledge acquired in this way helps the organization attain competitive advantage.
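The following small Python sketch (facet values, the synonym table, and the knowledge items are all invented for illustration) mimics this auto-retrieval step: the four facets are normalized to ontology terms and then matched against tagged knowledge objects:

# A query split into the four facets used by auto-retrieval.
query = {
    "carrier": "irrigation canal",
    "action": "reduce",
    "object": "seepage loss",
    "condition": "dry season",
}

# Synonym table standing in for the ontology term-normalization step.
ontology_terms = {"reduce": "mitigation", "seepage loss": "water loss"}

def to_ontology_expression(q):
    """Replace facet values with their ontology terms where a mapping exists."""
    return {facet: ontology_terms.get(value, value) for facet, value in q.items()}

knowledge_items = [
    {"id": "k7", "tags": {"irrigation canal", "mitigation", "water loss"}},
    {"id": "k9", "tags": {"reservoir", "flood control"}},
]

def auto_retrieve(q):
    expr = set(to_ontology_expression(q).values())
    return [item["id"] for item in knowledge_items if expr & item["tags"]]

print(auto_retrieve(query))   # ['k7']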
FIG. 7 (A) Self-defined retrieval module. (B) Self-defined retrieval module.
FIG. 8 (A) Autoretrieval module. (B) Autoretrieval model.
7. Conclusions
Although the ongoing debate on the performance of business organizations and water user networks focuses on the various factors that affect performance, the contribution of an ontology-based knowledge management framework has received scant attention. In this chapter, resource-based theory and systems theory are used to propose an ontology-based knowledge management framework that can enhance the performance of business organizations and water user networks. The framework can be used to explain how an ontology of knowledge management can support the competitiveness of these organizations through knowledge acquisition, representation, retrieval, and application. The chapter also shows how culture, structure, technology, and people can facilitate knowledge sharing and the development of knowledge management ontologies in organizations. At the same time, the chapter shows that most business organizations and water user associations in Tanzania lack an ontology-based knowledge management framework, which creates challenges for the enterprise owing to the ambiguous and unstructured nature of knowledge management in business. Irrigation co-operatives and water user networks may lose knowledge through retirement, cessation of membership, severe or long-lasting illness, and death; when this happens, part of their institutional memory is lost. It is therefore recommended that, in order to manage knowledge effectively, management should invest sufficient capital in establishing knowledge management structures and systems, educate employees and members on the various aspects of the knowledge management ontology, and foster a knowledge-sharing culture based on trust.
References
Amoah, M., Fordjour, F., 2012. New product development activities among small and medium-scale furniture enterprises in Ghana: a discriminant
analysis. Am. Int. J. Contemp. Res. 2 (12), 41–53.
Armstrong, M., 2006. A Handbook of Human Resource Management Practice. Kogan Page Publishers.
Barkhordari, S., Fattahi, M., Azimi, N.A., 2019. The impact of knowledge-based economy on growth performance: evidence from MENA countries.
J. Knowl. Econ. 10 (3), 1168–1182.
Barney, J.B., Arikan, A.M., 2005. The resource-based view: origins and implications. In: The Blackwell Handbook of Strategic Management. John Wiley
& Sons, Ltd, pp. 123–182.
Bixler, C.H., 2005. Developing a foundation for a successful Knowledge Management System. In: Stankosky, M. (Ed.), Creating the Discipline of Knowledge Management: The Latest in University Research. Elsevier Butterworth-Heinemann, Amsterdam, Boston, pp. 51–65.
Carayannis, E.G., Campbell, D.F., 2009. ‘Mode 3’ and ‘Quadruple Helix’: toward a 21st century fractal innovation ecosystem. Int. J. Technol. Manag. 46 (3–4), 201–234.
Cardeal, N., Antonio, N.S., 2012. Valuable, rare, inimitable resources and organization (VRIO) resources or valuable, rare, inimitable resources (VRI)
capabilities: What leads to competitive advantage? Afr. J. Bus. Manag. 6 (37), 10159–10170.
Chun, M., Sohn, K., Arling, P., Granados, N.F., 2008. Systems theory and knowledge management systems: the case of Pratt-Whitney Rocketdyne. In:
Proceedings of the 41st Annual Hawaii International Conference on System Sciences, Hawaii, USA. IEEE, p. 336.
Daft, R.L., 2000. Management. The Dryden Press, Fort Worth.
El-Farr, H., Hosseingholizadeh, R., 2019. Aligning human resource management with knowledge management for better organizational performance: how
human resource practices support knowledge management strategies? In: Current Issues in Knowledge Management. IntechOpen.
ESRC, 2005. Knowledge Economy in the UK. Retrieved from: http://www.esrcsocietytoday.ac./ESRCInfoCentre/facts/UK/index4.aspx?ComponentId=6978&SourcePageId=14971#0.
Ferreira, J., Mueller, J., Papa, A., 2018. Strategic knowledge management: theory, practice and future challenges. J. Knowl. Manage. 24 (2),
121–126. https://doi.org/10.1108/JKM-07-2018-0461.
Fitz-Enz, J., 2000. The ROI of Human Capital: Measuring the Economic Value of Employee Performance. AMACOM Division of American Management
Association.
Gonzalez, R.V.D., Martins, M.F., 2017. Knowledge Management Process: a theoretical-conceptual research. Gest. Prod. 24 (2), 248–265.
Huang, Y., 2008. Overview of knowledge management in organizations. UW-Stout J. Stud. Res. 1 (1), 1–5. http://digital.library.wisc.edu/1793/52955.
Kefela, G.T., 2010. Knowledge-based economy and society has become a vital commodity to countries. Int. NGO J. 5 (7), 160–166.
Kraaijenbrink, J., Spender, J.C., Groen, A.J., 2010. The resource-based view: a review and assessment of its critiques. J. Manag. 36 (1), 349–372.
Liu, Q., Bai, Q., Kloppers, C., Fitch, P., Bai, Q., Taylor, K., et al., 2013. An ontology-based knowledge management framework for a distributed water
information system. J. Hydroinf. 15 (4), 1169–1188.
Mosha, D.B., Vedeld, P., Katani, J.Z., Kajembe, G.C., Andrew, K.P.R., 2018. Contribution of paddy production to household income in farmer-managed
irrigation scheme communities in Iringa Rural and Kilombero Districts, Tanzania. J. Agric. Stud. 6 (2), 100–122. https://doi.org/10.5296/jas.
v6i2.13147.
Mugera, A.W., 2012. Sustained competitive advantage in agribusiness: applying the resource-based theory to human resources. Int. Food Agribusiness
Manag. Rev. 15 (4), 27–48.
Omotayo, F.O., 2015. Knowledge management as an important tool in organisational management: a review of literature. Libr. Philos. Pract. 1 (2015),
1–23.
Panagiotidis, P., Edwards, J.S., 2001. Organisational learning—a critical systems thinking discipline. Eur. J. Inform. Syst. 10 (3), 135–146.
Peng, M.Y.P., Zhang, Z., Ho, S.S.H., 2019. A study on the relationship among knowledge acquisition sources at the teacher-and college-level, student
absorptive capacity and learning outcomes: using student prior knowledge as a moderator. Educ. Sci. Theory Pract. 19 (2), 22–39.
Sandelin, S.K., Hukka, J.J., Katko, T.S., 2021. Importance of knowledge management at water utilities. Public Works Manage. Policy 26 (2), 164–179.
Senge, P., 1990. The Fifth Discipline: The Art and Practice of the Learning Organization. Doubleday, New York, USA.
Subashini, R., Rita, S., Vivek, M., 2011. The role of ICTs in knowledge management (KM) for organizational effectiveness. In: International Conference
on Computing and Communication Systems. Springer, Berlin, Heidelberg, Germany, pp. 542–549.
Sureephong, P., Chakpitak, N., Ouzrout, Y., Bouras, A., 2008. An ontology-based knowledge management system for industry clusters. In: Global Design
to Gain a Competitive Edge. Springer, London, UK, pp. 333–342.
Tarboton, D.G., Maidment, D.R., Zaslavsky, I., Ames, D.P., 2010. Hydrologic Information System 2010 Status Report. Consortium of University for the
Advancement of Hydrologic Science, Washington, DC, USA.
Teng, J.T., Song, S., 2011. An exploratory examination of knowledge-sharing behaviors: solicited and voluntary. J. Knowl. Manag. 15 (1), 104–117.
URT, 2011. Tanzania Agriculture and Food Security Investment Plan (TAFSIP). Government Publishing Press; Tanzania, Dar es Salaam.
Yari, A., Eslamian, S., 2021. An introduction to residential water users. Chapter 1, In: Yari, A., Eslamian, S., Eslamian, F. (Eds.), Urban and Industrial
Water Conservation Methods. Taylor and Francis, CRC Group, USA, pp. 1–8.
Yoong, P., Molina, M., 2003. Knowledge sharing and business clusters. In: PACIS 2003 Proceedings, p. 84.
Zhang, J., Zhao, W., Xie, G., Chen, H., 2011. Ontology-based knowledge management system and application. Procedia Eng. 15, 1021–1029.
Zheng, Y.L., He, Q.Y., Ping, Q.I.A.N., Ze, L.I., 2012. Construction of the ontology-based agricultural knowledge management system. J. Integr. Agric. 11
(5), 700–709.
Chapter 21
Parallel chaos search-based incremental
extreme learning machine
Salim Heddam
Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology, Hydraulics Division, Agronomy Department, Faculty of Science,
Skikda, Algeria
1. Introduction
Dam reservoirs play an important and critical role in both daily life and future development. Dams are constructed for freshwater storage, flood control, and the supply of hydroelectric power stations (Ahmed et al., 2020; Yang et al., 2020). The construction of dam reservoirs can have a significant impact on aquatic life downstream of the dam by interrupting and modifying the flow regime and, in particular, the thermal regime of rivers (Tao et al., 2020; Guo et al., 2020). Consequently, damming rivers offers several advantages and provides a wide range of services for humans, but it also has adverse effects on the aquatic environment, especially by changing water temperature, which directly affects fish and fish habitat (Wang et al., 2020) and ecological conditions in general (Shi et al., 2020). One of the most important dam operations is the release of water through spillways, which increases the amount of ambient air entrained in the water and leads to supersaturation of total dissolved gas (TDG) in the water downstream of the spillways (Bragg and Johnston, 2014, 2015, 2016). Close monitoring and control of TDG downstream of the spillways on the Snake and Columbia Rivers, United States, demonstrated that spill from dams is the leading cause of elevated, supersaturated TDG (Tanner et al., 2009, 2011, 2012, 2013). TDG should be maintained below 110% saturation to avoid gas-bubble trauma (GBT) and other barotrauma in fish (Ma et al., 2019), a major problem affecting the health of fish and other freshwater species (Yuan et al., 2020; Fan et al., 2020); as a result, a regional program of research on severely threatened fish species was strongly recommended for a given spill season, covering the passage of fish from the tailwaters to the forebay (Cao et al., 2020). The formation, composition, and level of TDG are influenced by several factors, among them water temperature (Tw) (Yuan et al., 2018), barometric pressure, total gas pressure, and the vapor pressure of water (Morris et al., 2003).
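Although this chapter does not write it out, TDG supersaturation is commonly expressed as a percentage of the local barometric pressure, e.g.,

TDG (%) = (P_tgp / P_atm) x 100,

where P_tgp denotes the total dissolved gas pressure and P_atm the ambient barometric pressure (symbol names here are ours); on this scale, values above 100% indicate supersaturation, and the 110% threshold cited above corresponds to a total gas pressure 10% above barometric pressure.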
Over the last few decades, many studies have investigated the physical process of TDG supersaturation, and a variety of methods have been proposed to address the TDG problem by defining new formulas and models that help control the evolution of TDG over time and space. Among the proposed methods, computational fluid dynamics (CFD) models have been used extensively to predict TDG (Ma et al., 2019), together with numerical hydrodynamic models (Weber et al., 2004), polydisperse two-phase flow and unsteady 3D two-phase models (Politano et al., 2007, 2009, 2012, 2016, 2017; Fu et al., 2010; Ma et al., 2016), and other numerical and hydrodynamic models (Stewart et al., 2015; Wang et al., 2019; Witt et al., 2017). Recently, machine learning models have been introduced as a strong and promising alternative for predicting TDG concentration, and substantial work has already been done (Heddam, 2017; Keshtegar et al., 2019; Heddam et al., 2020; Heddam and Kisi, 2020). For example, Heddam (2017) applied the generalized regression neural network (GRNN) model to predict TDG in the Columbia River, United States. Keshtegar et al. (2019) compared four machine learning models, namely the least squares support vector machine (LSSVM), the M5 model tree (M5Tree), multivariate adaptive regression splines (MARS), and the high-order response surface method (H-RSM), for predicting TDG concentration in the Columbia River, United States, and reported that the H-RSM was more accurate than the LSSVM, M5Tree, and MARS models. Heddam et al. (2020) compared the kriging interpolation method (KIM), the response surface method (RSM), and feedforward neural networks (FFNN) in modeling TDG at four dam reservoirs located in the Columbia
River, United States, and demonstrated that the KIM model was more accurate than the RSM and FFNN models. Recently, Heddam and Kisi (2020) compared several families of neurofuzzy models, namely the adaptive neurofuzzy inference system with subtractive clustering (ANFIS-S), ANFIS with grid partition (ANFIS-G), ANFIS with fuzzy c-means (ANFIS-F), and the dynamic evolving neural-fuzzy inference system with online learning (DENFIS_O) and offline learning (DENFIS_F), for predicting hourly TDG, and reported that overall the ANFIS models were more accurate than the DENFIS models.
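Because this chapter is concerned with extreme learning machine (ELM) variants driven by water temperature alone, a minimal sketch of a basic (nonincremental, nonchaotic) ELM regressor is given below; the data are synthetic and the hyperparameters arbitrary, so it only illustrates the random-hidden-layer, analytic-output-weight idea rather than the model developed in this chapter:

import numpy as np

rng = np.random.default_rng(42)

def elm_train(X, y, n_hidden=20):
    """Train a single-hidden-layer ELM: random input weights, analytic output weights."""
    n_features = X.shape[1]
    W = rng.normal(size=(n_hidden, n_features))   # random input weights (kept fixed)
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))       # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                   # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Illustrative synthetic example: TDG saturation (%) loosely increasing with water temperature.
Tw = rng.uniform(5.0, 25.0, size=(200, 1))                 # water temperature, degrees C
tdg = 100.0 + 0.8 * Tw[:, 0] + rng.normal(0, 1.0, 200)     # synthetic TDG saturation (%)
W, b, beta = elm_train(Tw, tdg)
print(elm_predict(np.array([[15.0]]), W, b, beta))         # predicted TDG at Tw = 15 degrees C

In an incremental ELM, by contrast, hidden neurons are typically added one at a time, and in the variant developed in this chapter their random parameters are presumably refined by a parallel chaos search rather than drawn once, as in this sketch.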
However, most studies on TDG prediction have focused on applying several kinds of models that incorporate a variety of input variables or predictors, which complicates the application of the proposed models. In addition, a single model linking TDG concentration to water temperature alone has not been well documented or investigated, which constitutes a strong motivation for our study. Extreme l