Handbook of HydroInformatics
Volume I: Classic Soft-Computing Techniques
Edited by
Saeid Eslamian
Full Professor of Hydrology and Water Resources Sustainability, Department of Water Engineering,
College of Agriculture, Isfahan University of Technology, Iran
Faezeh Eslamian
McGill University, Quebec, Canada
Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2023 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about the
Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance
Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher
(other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become
necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
any information, methods, compounds, or experiments described herein. In using such information or methods
they should be mindful of their own safety and the safety of others, including parties for whom they have a
professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability
for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or
from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-821285-1
For information on all Elsevier publications
visit our website at https://www.elsevier.com/books-and-journals
Publisher: Candice Janco
Acquisitions Editor: Maria Elekidou
Editorial Project Manager: Rupinder Heron
Production Project Manager: Bharatwaj Varatharajan
Cover Designer: Greg Harris
Typeset by STRAIVE, India
Dedication
To Late Dr. Mark Twain (American Writer, Humorist, Entrepreneur, Publisher,
and Lecturer, 1835–1910)
“Data is like garbage. You’d better know what you are going to do with it
before you collect it.”
Contents

Contributors
About the editors
Preface

1. Advanced machine learning techniques: Multivariate regression
   Reza Daneshfar, Mohammad Esmaeili, Mohammad Mohammadi-Khanaposhtani, Alireza Baghban, Sajjad Habibzadeh, and Saeid Eslamian
   1. Introduction
   2. Linear regression
   3. Multivariate linear regression
   4. Gradient descent method
   5. Polynomial regression
   6. Overfitting and underfitting
   7. Cross-validation
   8. Comparison between linear and polynomial regressions
   9. Learning curve
   10. Regularized linear models
   11. The ridge regression
   12. The effect of collinearity in the coefficients of an estimator
   13. Outliers impact
   14. Lasso regression
   15. Elastic net
   16. Early stopping
   17. Logistic regression
   18. Estimation of probabilities
   19. Training and the cost function
   20. Conclusions
   Appendix: Python code
      Linear regression
      Gradient descent method
      Comparison between linear and polynomial regressions
      Learning curve
      The effect of collinearity in the coefficients of an estimator
      Outliers impact
      Lasso regression
      Elastic net
      Training and the cost function
   References

2. Bat algorithm optimized extreme learning machine: A new modeling strategy for predicting river water turbidity at the United States
   Salim Heddam
   1. Introduction
   2. Study area and data
   3. Methodology
      3.1 Feedforward artificial neural network
      3.2 Dynamic evolving neural-fuzzy inference system
      3.3 Bat algorithm optimized extreme learning machine
      3.4 Multiple linear regression
      3.5 Performance assessment of the models
   4. Results and discussion
      4.1 USGS 1497500 station
      4.2 USGS 11501000 station
      4.3 USGS 14210000 station
      4.4 USGS 14211010 station
   5. Conclusions
   References

3. Bayesian theory: Methods and applications
   Yaser Sabzevari and Saeid Eslamian
   1. Introduction
   2. Bayesian inference
   3. Phases
   4. Estimates
   5. Theorem Bayes
      5.1 Argument of Bayes
      5.2 Bayesian estimation theory
      5.3 Machine learning using Bayesian method
      5.4 Bayesian theory in machine learning
      5.5 Definition of basic concepts
      5.6 Bayesian machine learning methods
      5.7 Optimal Bayes classifier
      5.8 Naive Bayes classifier
   6. Bayesian network
   7. History of Bayesian model application in water resources
   8. Case study of Bayesian network application in modeling of evapotranspiration of reference plant
   9. Conclusions
   References

4. CFD models
   Hossien Riahi-Madvar, Mohammad Mehdi Riyahi, and Saeid Eslamian
   1. Introduction
   2. Numerical model of one-dimensional advection dispersion equation (1D-ADE)
   3. Physically influenced scheme
   4. Finite Volume Solution of Saint-Venant equations for dam-break simulation using PIS
   5. Discretization of continuity equation using PIS
   6. Discretization of the momentum equation using PIS
   7. Quasi-two-dimensional flow simulation
   8. Numerical solution of quasi-two-dimensional model
   9. 3D numerical modeling of flow in compound channel using turbulence models
   10. Three-dimensional numerical model
   11. Grid generation and the flow field solution
   12. Comparison of different turbulence models
   13. Three-dimensional pollutant transfer modeling
   14. Results of pollutant transfer modeling
   15. Conclusions
   References

5. Cross-validation
   Amir Seraj, Mohammad Mohammadi-Khanaposhtani, Reza Daneshfar, Maryam Naseri, Mohammad Esmaeili, Alireza Baghban, Sajjad Habibzadeh, and Saeid Eslamian
   1. Introduction
      1.1 Importance of validation
      1.2 Validation of the training process
   2. Cross-validation
      2.1 Exhaustive and nonexhaustive cross-validation
      2.2 Repeated random subsampling cross-validation
      2.3 Time-series cross-validation
      2.4 k-fold cross-validation
      2.5 Stratified k-fold cross-validation
      2.6 Nested
   3. Computational procedures
   4. Conclusions
   References

6. Comparative study on the selected node and link-based performance indices to investigate the hydraulic capacity of the water distribution network
   C.R. Suribabu and P. Sivakumar
   1. Introduction
   2. Resilience of water distribution network
   3. Hydraulic uniformity index (HUI)
   4. Mean excess pressure (MEP)
   5. Proposed measure
      5.1 Energy loss uniformity (ELU)
   6. Hanoi network
   7. Results and discussion
   8. Conclusions
   References

7. The co-nodal system analysis
   Vladan Kuzmanović
   1. Introduction
   2. Co-nodal and system analysis
   3. Paleo-hydrology and remote sensing
   4. Methods
   5. Nodes and cyclic confluent system
      5.1 H-cycloids analysis and fluvial dynamics
   6. Three Danube phases
   7. Danubian hypocycles as overlapping phases
   8. Conclusions
   References
   Further reading

8. Data assimilation
   Mohammad Mahdi Dorafshan, Mohammad Reza Jabbari, and Saeid Eslamian
   1. Introduction
   2. What is data assimilation?
   3. Types of data assimilation methods
      3.1 Types of updating procedure
      3.2 Types of updating variable
   4. Optimal filtering methods
      4.1 Kalman filter
      4.2 Transfer function
      4.3 Extended Kalman filter
      4.4 Unscented Kalman filter
   5. Auto-regressive method
   6. Considerations in using data assimilation
   7. Conclusions
   References

9. Data reduction techniques
   M. Mehdi Bateni and Saeid Eslamian
   1. Introduction
   2. Principal component analysis
   3. Singular spectrum analysis
      3.1 Univariate singular spectral analysis
      3.2 Multivariate singular spectral analysis
   4. Canonical correlation analysis
   5. Factor analysis
      5.1 Principal axis factoring
   6. Random projection
   7. Isometric mapping
   8. Self-organizing maps
   9. Discriminant analysis
   10. Piecewise aggregate approximation
   11. Clustering
      11.1 k-means clustering
      11.2 Hierarchical clustering
      11.3 Density-based clustering
   12. Conclusions
   References

10. Decision tree algorithms
   Amir Ahmad Dehghani, Neshat Movahedi, Khalil Ghorbani, and Saeid Eslamian
   1. Introduction
      1.1 ID3 algorithm
      1.2 C4.5 algorithm
      1.3 CART algorithm
      1.4 CHAID algorithm
      1.5 M5 algorithm
      1.6 Random forest
      1.7 Application of DT algorithms in water sciences
   2. M5 model tree
      2.1 Splitting
      2.2 Pruning
      2.3 Smoothing
   3. Data set
      3.1 Empirical formula for flow discharge
      3.2 Model evaluation and comparison
   4. Modeling and results
      4.1 Initial tree
      4.2 Pruning
      4.3 Comparing M5 model and empirical formula
   5. Conclusions
   References

11. Entropy and resilience indices
   Mohammad Ali Olyaei, A.H. Ansari, Zahra Heydari, and Amin Zeynolabedin
   1. Introduction
   2. Water resource and infrastructure performance evaluation
   3. Entropy
      3.1 Thermodynamic entropy
      3.2 Statistical-mechanical entropy
      3.3 Information entropy
      3.4 Application of entropy in water resources area
   4. Resilience
      4.1 Application of resilience in water resources area
      4.2 Resilience in UWS
      4.3 Resilience in urban environments
      4.4 Resilience to floods
      4.5 Resilience to drought
   5. Conclusions
   References

12. Forecasting volatility in the stock market data using GARCH, EGARCH, and GJR models
   Sarbjit Singh, Kulwinder Singh Parmar, and Jatinder Kaur
   1. Introduction
   2. Methodology
      2.1 Types of GARCH models
   3. Application and results
   4. Conclusions
   References

13. Gene expression models
   Hossien Riahi-Madvar, Mahsa Gholami, and Saeid Eslamian
   1. Introduction
   2. Genetic programming
      2.1 The basic steps in GEP development
      2.2 The basic steps in GEP development
   3. Tree-based GEP
      3.1 Tree depth control
      3.2 Maximum tree depth
      3.3 Penalizing the large trees
      3.4 Dynamic maximum-depth technique
   4. Linear genetic programming
   5. Evolutionary polynomial regression
   6. Multigene genetic programming
   7. Pareto optimal-multigene genetic programming
   8. Some applications of GEP-based models in hydro informatics
      8.1 Derivation of quadric polynomial function using GEP
      8.2 Derivation of Colebrook-White equation using GEP
      8.3 Derivation of the exact form of shield's diagram using GEP
      8.4 Extraction of regime river equations using GEP
      8.5 Extraction of longitudinal dispersion coefficient equations using GEP
   9. Conclusions
   References

14. Gradient-based optimization
   Mohammad Zakwan
   1. Introduction
   2. Materials and method
      2.1 GRG solver
   3. Results and discussion
      3.1 Solving nonlinear equations
      3.2 Application in parameter estimation
      3.3 Fitting empirical equations
   4. Conclusions
   References

15. Gray wolf optimization algorithm
   Mohammad Reza Zaghiyan, Vahid Shokri Kuchak, and Saeid Eslamian
   1. Introduction
   2. Theory of GWO
   3. Mathematical modeling of gray wolf optimizer
      3.1 Social hierarchy
      3.2 Encircling prey
      3.3 Hunting behavior
      3.4 Exploitation in GWO-attacking prey
      3.5 Exploration in GWO-search for prey
   4. Gray wolf optimization example for reservoir operation
   5. Conclusions
   Appendix A: GWO Matlab codes for the reservoir example
   References

16. Kernel-based modeling
   Kiyoumars Roushangar, Roghayeh Ghasempour, and Saman Shahnazi
   1. Introduction
   2. Support vector machine
      2.1 Support vector classification
      2.2 Support vector regression
   3. Gaussian processes
      3.1 Gaussian process regression
      3.2 Gaussian process classification
   4. Kernel extreme learning machine
   5. Kernels type
      5.1 Fisher kernel
      5.2 Graph kernels
      5.3 Kernel smoother
      5.4 Polynomial kernel
      5.5 Radial basis function kernel
      5.6 Pearson kernel
      5.7 String kernels
      5.8 Neural tangent kernel
   6. Application of kernel-based approaches
      6.1 Total resistance and form resistance of movable bed channels
      6.2 Energy losses of rectangular and circular culverts
      6.3 Lake and reservoir water level prediction
      6.4 Streamflow forecasting
      6.5 Sediment load prediction
      6.6 Pier scour modeling
      6.7 Reservoir evaporation prediction
   7. Conclusions
   References
   Further reading

17. Large eddy simulation: Subgrid-scale modeling with neural network
   Tamas Karches
   1. Introduction
   2. LES and traditional subgrid-scale modeling
   3. Data-driven LES closures
   4. Guidelines for SGS modeling
      4.1 Simulation project definition
      4.2 A priori analysis with DNS
      4.3 Neural network based SGS model construction
   5. Conclusions
   References

18. Lattice Boltzmann method and its applications
   Mojtaba Aghajani Delavar and Junye Wang
   1. Introduction
   2. Lattice Boltzmann equations
      2.1 BGK approximation
      2.2 Lattice Boltzmann models
      2.3 Multirelaxation time lattice Boltzmann (MRT)
      2.4 Boundary conditions
   3. Thermal LBM
      3.1 Boundary condition with a given temperature
      3.2 Constant heat flux boundary condition
   4. Multicomponent LBM (species transport modeling)
   5. Flow simulation in porous media
   6. Dimensionless numbers
   7. Flow chart of the simulation procedure
   8. Multiphase flows
      8.1 The color-gradient model
      8.2 Shan-Chen model
   9. Sample test cases and codes
      9.1 Free convection in L-cavity
      9.2 Force convection in a channel
   10. Conclusions
   Appendix A: Computer code for free convection in L-cavity
   Appendix B: Computer code for force convection in a channel
   References

19. Multigene genetic programming and its various applications
   Majid Niazkar
   1. Introduction
   2. Genetic programming and its variants
   3. An introduction to multigene genetic programming
   4. Main controlling parameters of MGGP
   5. A review on MGGP applications
   6. Future trends of MGGP applications
   7. A case study of the MGGP application
   8. Conclusions
   References

20. Ontology-based knowledge management framework in business organizations and water users networks in Tanzania
   Neema Penance Kumburu
   1. Introduction
   2. Theoretical framework
   3. Empirical literature
   4. Ontology-based knowledge management framework in business organizations: A conceptual framework
   5. Ontology-based knowledge management framework in business organizations and water user networks proposed system
   6. The practice of knowledge organization and expression
      6.1 Ontology
      6.2 Knowledge representation and organization base on ontology
      6.3 Knowledge retrieval base ontology
      6.4 Knowledge application and implementation base on ontology
   7. Conclusions
   References

21. Parallel chaos search-based incremental extreme learning machine
   Salim Heddam
   1. Introduction
   2. Materials and methods
      2.1 Study area description
      2.2 Modeling approaches
      2.3 Performance assessment of the models
   3. Results and discussion
   4. Conclusions
   References

22. Relevance vector machine (RVM)
   Mohammad Reza Jabbari, Mohammad Mahdi Dorafshan, and Saeid Eslamian
   1. Introduction
   2. Machine learning algorithms
      2.1 Supervised learning
      2.2 Unsupervised learning
   3. Support vector machine
   4. Relevance vector machine
      4.1 Measurement model representation
      4.2 Relevance vector regression
      4.3 Relevance vector classification
      4.4 Limitations and performance analysis
      4.5 Multivariate relevance vector machines
   5. Preprocessing step
      5.1 Data normalization
      5.2 Data reduction
      5.3 Dataset split ratio
   6. Applications of relevance vector machine
      6.1 Sediment concentration estimation
      6.2 Drought monitoring
      6.3 Groundwater quality monitoring
      6.4 Evaporative losses in reservoirs
      6.5 Environmental science
   7. Conclusions
   References

23. Stochastic learning algorithms
   Amir Hossein Montazeri, Sajad Khodambashi Emami, Mohammad Reza Zaghiyan, and Saeid Eslamian
   1. Introduction
   2. Gradient descent
      2.1 Theory of batch gradient descent
      2.2 Theory of SGD
   3. Perceptron
      3.1 Theory of perceptron
      3.2 Perceptron learning procedure
   4. Adaline
      4.1 Theory of Adaline
      4.2 Adaline learning procedure
   5. Multilayer network
      5.1 Multilayer network learning procedure
   6. Learning vector quantization
      6.1 LVQ learning procedure
   7. K-means clustering
      7.1 What is clustering?
      7.2 Theory of K-means
   8. Gradient boosting
      8.1 What is boosting?
      8.2 Theory of gradient boosting (GB)
      8.3 Stochastic gradient boosting
   9. Conclusions
   References
   Appendix A
   Appendix B
   Appendix C
   Appendix D
   Appendix E

24. Supporting vector machines
   Kiyoumars Roushangar and Roghayeh Ghasempour
   1. Introduction
   2. SVMs for classification problems
      2.1 Linear classifiers
      2.2 Non-linear classifiers
   3. SVMs for regression problems
   4. Selection of SVM parameters
      4.1 Margin
      4.2 Regularization
      4.3 Kernels
      4.4 Gamma parameter
   5. Application of support vector machines
      5.1 Application of support vector regression in water resource engineering
   6. Conclusions
   References

25. Uncertainty analysis using fuzzy models in hydroinformatics
   Tayeb Boulmaiz, Mawloud Guermoui, Mohamed Saber, Hamouda Boutaghane, Habib Abida, and Saeid Eslamian
   1. Introduction
   2. Fuzzy logic theory
      2.1 Fuzzification
      2.2 Rule base
      2.3 Inference
      2.4 Defuzzification
   3. Concept of fuzzy uncertainty analysis
   4. Uncertainty analysis applications
      4.1 Flood forecasting
      4.2 Groundwater modeling
   5. Machine learning and fuzzy sets
   6. Fuzzy sets and probabilistic approach
   7. Conclusions
   References

26. Uncertainty-based resiliency evaluation
   Hossien Riahi-Madvar, Mohammad Mehdi Riyahi, and Saeid Eslamian
   1. Introduction
   2. Uncertainty analysis by the first-order method
   3. Risk and resilience analysis
   4. Reliability computation by direct integration
   5. Reliability computation using safety margin/safety factor
   6. Safety margin
   7. Safety factor
   8. Uncertainty-based hydraulic designs
   9. Hydrologic uncertainties
   10. Hydraulics uncertainties
   11. Monte-Carlo uncertainty analysis in quasi-2D model parameters
   12. SKM model
   13. Uncertainty based river flow modeling with Monte-Carlo simulator
   14. Monte-Carlo uncertainty analysis in machine learning techniques
   15. Uncertainty evaluation using the integrated Bayesian multimodel framework
   16. Copula-based uncertainty analysis
   17. Uncertainty analysis with Tsallis entropy
   18. Theory of evidence for uncertainty in hydroinformatics
   19. Resiliency quantification
   20. Conclusions
   References

Index
Contributors
Habib Abida (423), Laboratory of Modeling of Geological and Hydrological Systems (GEOMODELE (LR16ES17)), Faculty of Sciences, University of Sfax, Sfax, Tunisia
A.H. Ansari (189), Department of Agricultural and Biological Engineering, Pennsylvania State University,
State College, PA, United States
Alireza Baghban (1,89), Chemical Engineering
Department, Amirkabir University of Technology
(Tehran Polytechnic), Mahshahr Campus, Mahshahr,
Iran
M. Mehdi Bateni (153), University School for Advanced
Studies, Pavia, Italy
Tayeb Boulmaiz (423), Materials, Energy Systems Technology and Environment Laboratory, University of
Ghardaia, Ghardaia, Algeria
Hamouda Boutaghane (423), Laboratory of Soil and
Hydraulic, Badji Mokhtar Annaba University, Annaba,
Algeria
Reza Daneshfar (1,89), Department of Petroleum Engineering, Ahwaz Faculty of Petroleum Engineering,
Petroleum University of Technology, Ahwaz, Iran
Amir Ahmad Dehghani (171), Department of Water Engineering, Gorgan University of Agricultural Sciences &
Natural Resources, Gorgan, Iran
Mojtaba Aghajani Delavar (289), Faculty of Science and
Technology, Athabasca University, Athabasca, AB,
Canada
Mohammad Mahdi Dorafshan (135,365), Department of
Civil Engineering, Isfahan University of Technology,
Isfahan, Iran
Sajad Khodambashi Emami (385), Department of Water
Engineering and Management, Tarbiat Modares University, Tehran, Iran
Saeid Eslamian (1,57,69,89,135,153,171,221,253,365,
385,423,435), Department of Water Engineering,
College of Agriculture, Isfahan University of Technology; Center of Excellence in Risk Management
and Natural Hazards, Isfahan University of Technology,
Isfahan, Iran
Mohammad Esmaeili (1,89), Department of Petroleum
Engineering, Amirkabir University of Technology
(Polytechnic of Tehran); Department of Petroleum
Engineering, Amirkabir University of Technology
(Tehran Polytechnic), Tehran, Iran
Roghayeh Ghasempour (267,411), Department of Water
Resources Engineering, Faculty of Civil Engineering,
University of Tabriz, Tabriz, Iran
Mahsa Gholami (221), Department of Civil Engineering,
Faculty of Engineering, Bu-Ali Sina University,
Hamedan, Iran
Khalil Ghorbani (171), Department of Water Engineering,
Gorgan University of Agricultural Sciences & Natural
Resources, Gorgan, Iran
Mawloud Guermoui (423), Unite de Recherche Appliquee en Energies Renouvelables, Centre de Developpement des Energies Renouvelables, Ghardaïa, Algeria
Sajjad Habibzadeh (1,89), Chemical Engineering
Department, Amirkabir University of Technology
(Tehran Polytechnic), Mahshahr Campus, Mahshahr;
Surface Reaction and Advanced Energy Materials Laboratory, Chemical Engineering Department, Amirkabir
University of Technology (Tehran Polytechnic),
Tehran, Iran
Salim Heddam (39,349), Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology,
Hydraulics Division, Agronomy Department, Faculty
of Science, Skikda, Algeria
Zahra Heydari (189), Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
Mohammad Reza Jabbari (135,365), Department of Electrical and Computer Engineering, Isfahan University of
Technology, Isfahan, Iran
Tamas Karches (283), Faculty of Water Science, University of Public Service, Budapest, Hungary
Jatinder Kaur (207), Department of Mathematics,
I.K. Gujral Punjab Technical University, Kapurthala;
Guru Nanak Dev University College, Amritsar, Punjab,
India
Vahid Shokri Kuchak (253), Department of Water Engineering and Management, Tarbiat Modares University,
Tehran, Iran
Neema Penance Kumburu (333), Moshi Co-operative
University, Moshi, Tanzania
Vladan Kuzmanovic (119), Serbian Hydrological Society,
International Association of Hydrological Sciences,
Belgrade, Serbia
Mohammad Mohammadi-Khanaposhtani (1), Fouman
Faculty of Engineering, College of Engineering, University of Tehran, Tehran, Iran
Amir Hossein Montazeri (385), Department of Water
Engineering and Management, Tarbiat Modares University, Tehran, Iran
Neshat Movahedi (171), Department of Water Engineering, Gorgan University of Agricultural Sciences
& Natural Resources, Gorgan, Iran
Maryam Naseri (89), Chemical Engineering Department,
Babol Noshirvani University of Technology, Babol,
Iran
Majid Niazkar (321), Department of Agricultural and
Environmental Sciences - Production, Landscape,
Agroenergy, University of Milan, Milan, Italy
Mohammad Ali Olyaei (189), Department of Civil Environmental and Geo-Engineering, University of Minnesota, Minneapolis, MN, United States
Kulwinder Singh Parmar (207), Department of Mathematics, I.K. Gujral Punjab Technical University,
Kapurthala, Punjab, India
Hossien Riahi-Madvar (69,221,435), Department of
Water Engineering, Faculty of Agriculture, Vali-e-Asr
University of Rafsanjan, Rafsanjan, Iran
Mohammad Mehdi Riyahi (69), Department of Civil
Engineering, Faculty of Civil Engineering and Architecture, Shahid Chamran University of Ahvaz, Ahvaz,
Iran
Kiyoumars Roushangar (267,411), Department of Water
Resources Engineering, Faculty of Civil Engineering;
Center of Excellence in Hydroinformatics, University
of Tabriz, Tabriz, Iran
Mohamed Saber (423), Disaster Prevention Research
Institute (DPRI), Kyoto University, Kyoto, Japan
Yaser Sabzevari (57), Department of Water Engineering,
College of Agriculture, Isfahan University of Technology, Isfahan, Iran
Amir Seraj (89), Department of Instrumentation and
Industrial Automation, Ahwaz Faculty of Petroleum
Engineering, Petroleum University of Technology,
Ahwaz, Iran
Saman Shahnazi (267), Department of Water Resources
Engineering, Faculty of Civil Engineering, University
of Tabriz, Tabriz, Iran
Sarbjit Singh (207), Guru Nanak Dev University College,
Pathankot; Department of Mathematics, Guru Nanak
Dev University, Amritsar, Punjab, India
P. Sivakumar (107), Department of Civil Engineering,
North Eastern Regional Institute of Science and Technology, Nirjuli (Itanagar), Arunachal Pradesh, India
C.R. Suribabu (107), Centre for Advanced Research in
Environment, School of Civil Engineering, SASTRA
Deemed University, Thanjavur, Tamil Nadu, India
Junye Wang (289), Faculty of Science and Technology,
Athabasca University, Athabasca, AB, Canada
Mohammad Reza Zaghiyan (385), Department of Water
Engineering and Management, Tarbiat Modares University, Tehran, Iran
Mohammad Zakwan (243), School of Technology,
Maulana Azad National Urdu University, Hyderabad,
India
Amin Zeynolabedin (189), School of Civil Engineering,
College of Engineering, University of Tehran, Tehran,
Iran
About the editors
Saeid Eslamian has been a Full Professor of Environmental Hydrology and Water
Resources Engineering in the Department of Water Engineering at Isfahan University
of Technology since 1995. His research focuses mainly on statistical and environmental
hydrology in a changing climate. In recent years, he has worked on modeling natural
hazards, including floods, severe storms, wind, drought, and pollution, and on water reuse,
sustainable development and resiliency, etc. Formerly, he was a visiting professor at Princeton University, New Jersey, and at ETH Zurich, Switzerland. On the
research side, he started a research partnership in 2014 with McGill University, Canada.
He has contributed to more than 600 publications in journals, books, and technical reports.
He is the founder and Chief Editor of both the International Journal of Hydrology Science
and Technology (IJHST) and the Journal of Flood Engineering (JFE). Dr. Eslamian is
currently Associate Editor of four important publications: Journal of Hydrology
(Elsevier), Eco-Hydrology and Hydrobiology (Elsevier), Journal of Water Reuse and
Desalination (IWA), and Journal of the Saudi Society of Agricultural Sciences (Elsevier).
Professor Eslamian is the author of approximately 35 books and 180 book chapters.
Dr. Eslamian’s professional experience includes membership on editorial boards, and he is a reviewer of approximately
100 Web of Science (ISI) journals, including the ASCE Journal of Hydrologic Engineering, ASCE Journal of Water
Resources Planning and Management, ASCE Journal of Irrigation and Drainage Engineering, Advances in Water
Resources, Groundwater, Hydrological Processes, Hydrological Sciences Journal, Global Planetary Changes, Water
Resources Management, Water Science and Technology, Eco-Hydrology, Journal of the American Water Resources Association, American Water Works Association Journal, etc. Furthermore, in 2015, UNESCO nominated him for a special
issue of the Eco-Hydrology and Hydrobiology Journal.
Professor Eslamian was selected as an outstanding reviewer for the Journal of Hydrologic Engineering in 2009 and
received the EWRI/ASCE Visiting International Fellowship at the University of Rhode Island (2010). He was also awarded
prizes for outstanding work by the Iranian Hydraulics Association in 2005 and the Iranian petroleum and oil industry in
2011. Professor Eslamian was chosen as a distinguished researcher by Isfahan University of Technology (IUT) and Isfahan
Province in 2012 and 2014, respectively. In 2016, he was a candidate for National Distinguished Researcher in Iran.
Dr. Eslamian has also acted as a referee for many international organizations and universities. Some examples include
the US Civilian Research and Development Foundation (USCRDF), the Swiss Network for International Studies, the His
Majesty’s Trust Fund for Strategic Research of Sultan Qaboos University, Oman, the Royal Jordanian Geography Center
College, and the Research Department of Swinburne University of Technology of Australia. He is also a member of the
following associations: American Society of Civil Engineers (ASCE), International Association of Hydrologic Science
(IAHS), World Conservation Union (IUCN), GC Network for Drylands Research and Development (NDRD), International
Association for Urban Climate (IAUC), International Society for Agricultural Meteorology (ISAM), Association of Water
and Environment Modeling (AWEM), International Hydrological Association (STAHS), and UK Drought National
Center (UKDNC).
Professor Eslamian finished Hakim-Sanaei High School in Isfahan in 1979. After the Islamic Revolution, he was
admitted to Isfahan University of Technology (IUT) to study a BS in water engineering, and he graduated in 1986. He
was subsequently offered a scholarship for a master’s degree program at Tarbiat Modares University, Tehran. He finished
his studies in hydrology and water resources engineering in 1989. In 1991, he was awarded a scholarship for a PhD in civil
engineering at the University of New South Wales, Australia. His supervisor was Professor David H. Pilgrim, who
encouraged Professor Eslamian to work on “Regional Flood Frequency Analysis Using a New Region of Influence
Approach.” He earned a PhD in 1995 and returned to his home country and IUT. He was promoted in 2001 to Associate
Professor and in 2014 to Full Professor. For the past 26 years, he has been nominated for different positions at IUT,
including University President Consultant, Faculty Deputy of Education, and Head of Department. Dr. Eslamian is
now director of the Center of Excellence in Risk Management and Natural Hazards (RiMaNaH).
Professor Eslamian has made three scientific visits, to the United States, Switzerland, and Canada in 2006, 2008, and
2015, respectively. In the first, he was offered the position of visiting professor by Princeton University and worked jointly
with Professor Eric F. Wood at the School of Engineering and Applied Sciences for 1 year. The outcome was a contribution
to hydrological and agricultural drought interaction knowledge through developing multivariate L-moments between soil
moisture and low flows for northeastern US streams.
Recently, Professor Eslamian has written 14 handbooks published by Taylor & Francis (CRC Press): the three-volume
Handbook of Engineering Hydrology (2014), Urban Water Reuse Handbook (2016), Underground Aqueducts Handbook
(2017), the three-volume Handbook of Drought and Water Scarcity (2017), Constructed Wetlands: Hydraulic Design
(2019), Handbook of Irrigation System Selection for Semi-Arid Regions (2020), Urban and Industrial Water Conservation
Methods (2020), and the three-volume Flood Handbook (2022).
An Evaluation of Groundwater Storage Potentials in a Semiarid Climate (2019) and Advances in Hydrogeochemistry
Research (2020) by Nova Science Publishers are also among his book publications. The two-volume Handbook of Water
Harvesting and Conservation (2021, Wiley) and Handbook of Disaster Risk Reduction and Resilience (2021, New Frameworks for Building Resilience to Disasters) are further Springer publications by Professor Eslamian, as are the Handbook of
Disaster Risk Reduction and Resilience (2022, Disaster Risk Management Strategies) and the two-volume Earth Systems
Protection and Sustainability (2022).
Professor Eslamian was listed among the World’s Top 2% of Researchers by Stanford University, USA, in 2019 and
2020. He has also been a grant assessor, report referee, award jury member, and invited researcher for international organizations such as the United States Civilian Research and Development Foundation (2006), Intergovernmental Panel on
Climate Change (2012), World Bank Policy and Human Resources Development Fund (2021), and Stockholm International Peace Research Institute (2022), respectively.
Faezeh Eslamian holds a PhD in Bioresource Engineering from McGill University,
Canada. Her research focuses on the development of a novel lime-based product to mitigate phosphorus loss from agricultural fields. Dr. Eslamian completed her bachelor's and
master’s degrees in Civil and Environmental Engineering at the Isfahan University of
Technology, Iran, where she evaluated natural and low-cost absorbents for the removal
of pollutants such as textile dyes and heavy metals. Furthermore, she has conducted
research on worldwide water quality standards and wastewater reuse guidelines. Dr. Eslamian is an experienced multidisciplinary researcher with interests in soil and water
quality, environmental remediation, water reuse, and drought management.
Preface
Classic Soft-Computing Techniques is the first volume of three in the Handbook of HydroInformatics series. Through this
comprehensive, 26-chapter work, the contributors explore the difference between traditional computing, also known as
hard computing, and soft computing, which is based on the importance given to issues like precision, certainty, and rigor.
The chapters go on to define fundamental classic soft-computing techniques such as multivariate regressions, bat algorithm
optimized extreme learning machine (Bat-ELM), Bayesian inference, computational fluid dynamics (CFD) models, cross
validation, selected node and link-based performance indices, conodal system analysis, data assimilation, data reduction
techniques, decision tree algorithm, entropy and resilience indices, generalized autoregressive conditional heteroskedasticity (GARCH), exponential general autoregressive conditional heteroskedastic (EGARCH), and Glosten, Jagannathan,
and Runkle (GJR) models, gene expression models, gradient-based optimization, gray wolf optimization (GWO) algorithm, kernel-based modeling, subgrid-scale (SGS) modeling with neural network, lattice Boltzmann method (LBM), multigene genetic programming (MGGP), ontology-based knowledge management framework, parallel chaos search-based
incremental extreme learning, relevance vector machine (RVM), stochastic learning algorithms, support vector machine,
uncertainty analysis using fuzzy logic models, uncertainty-based resiliency evaluation, etc. It is a fully comprehensive
handbook providing all the information needed regarding classic soft-computing techniques.
This volume is a true interdisciplinary work, and the intended audience includes postgraduates and early-career
researchers interested in computer science, mathematical science, applied science, Earth and geoscience, geography, civil
engineering, engineering, water science, atmospheric science, social science, environment science, natural resources, and
chemical engineering.
The Handbook of HydroInformatics corresponds to courses that could be taught at the following levels: undergraduate,
postgraduate, research students, and short course programs. Typical course names of this type include: HydroInformatics,
Soft Computing, Learning Machine Algorithms, Statistical Hydrology, Artificial Intelligence, Optimization, Advanced
Engineering Statistics, Time Series, Stochastic Processes, Mathematical Modeling, Data Science, Data Mining, etc.
The three-volume Handbook of HydroInformatics is recommended not only for universities and colleges, but also for
research centers, governmental departments, policy makers, engineering consultants, federal emergency management
agencies, and related bodies.
Key features are as follows:
- Contains key insights from global contributors in the fields of data management research, climate change and resilience, insufficient data problems, etc.
- Offers applied examples and case studies in each chapter, providing the reader with real-world scenarios for comparison
- Introduces classic soft-computing techniques necessary for a range of disciplines
Saeid Eslamian
College of Agriculture, Isfahan University of Technology, Isfahan, Iran
Faezeh Eslamian
McGill University, Montreal, QC, Canada
Chapter 1
Advanced machine learning techniques:
Multivariate regression
Reza Daneshfar (a), Mohammad Esmaeili (b), Mohammad Mohammadi-Khanaposhtani (c), Alireza Baghban (d), Sajjad Habibzadeh (d), and Saeid Eslamian (e,f)
(a) Department of Petroleum Engineering, Ahwaz Faculty of Petroleum Engineering, Petroleum University of Technology, Ahwaz, Iran; (b) Department of Petroleum Engineering, Amirkabir University of Technology (Polytechnic of Tehran), Tehran, Iran; (c) Fouman Faculty of Engineering, College of Engineering, University of Tehran, Tehran, Iran; (d) Chemical Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Mahshahr Campus, Mahshahr, Iran; (e) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (f) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Complicated problems in a variety of fields that cannot be solved using conventional techniques are handled using machine
learning (Zeebaree et al., 2019; Bargarai et al., 2020; Dargan et al., 2020). Linear regression is a simple and popular
machine learning technique employed for prediction purposes. It was introduced by Galton (1894). It is a mathematical approach
to analyze and quantify the associations of variables (Akgün and Öğüdücü, 2015; Dehghan et al., 2015; Liu et al., 2017). To incorporate the outputs of other confounders/covariates into a model, one cannot utilize univariate regression—i.e.,
chi-square, Fisher exact test, and analysis of variance (ANOVA). As a result, partial correlation and regression are
chi-square, Fisher exact test, and analysis of variance (ANOVA). As a result, partial correlation and regression are
employed to identify the association of two variables and evaluate the confusion effect (Zebari et al., 2020; Sulaiman,
2020; Epskamp and Fried, 2018). Mathematical algorithms typically employ linear regression for the purpose of predicted
effect measurement and modeling versus several inputs (Lim, 2019). This data analysis approach linearly relates independent and dependent variables, modeling the relationships between the independent and dependent variables based
on model training. The present study conducts a review of recent popular methodologies in the machine learning and linear
regression literature, including databases, performance, accuracy, and algorithms, from 2017 to 2020 (Sarkar et al., 2015).
This chapter is divided into the following sections: The first section focuses on linear regression, this is followed by an
explanation of multivariate linear regression, and then the gradient descent method is described. The polynomial regression
concept is then explained and concepts such as overfitting and under-fitting, cross-validation, and learning curve are
expressed in a clear and fluent manner. Finally, the attractive and practical concepts that are discussed include: regularized
linear models, ridge regression, outliers impact, lasso regression, elastic net, early stopping, and logistic regression.
2. Linear regression
When we know a property or a dependent variable in general depends on several variables but the way of this dependence is
not clear to us, a linear model is the simplest choice to get an insight into this dependence. Although the simplest choice is
not necessarily the best one, linear models can do a lot in the case of algebraic dependency between a function and its
variables. A linear model can provide a reasonable estimation of any function at least in a small neighborhood. Moreover,
some nonlinear dependencies as suggested by theories could be transformed to a linear dependency. For example, consider
the following chemical reaction rate law
$$r_A = k C_A^n \qquad (1)$$
in which k and n are constants to be determined from experimental data of reaction rate ($r_A$) versus species concentration ($C_A$). To apply the favorable linear model, one can transform the above equation by taking the natural logarithm of both sides to obtain

$$\ln(r_A) = \ln k + n \ln(C_A) \qquad (2)$$
Another example that can be suited to multivariate problems is the polynomial regression. This topic will be discussed in a
separate section; however, the linear model can somewhat cover such problems by an interesting trick. Consider the following model
$$y = a_0 + a_1 z + a_2 z^2 \qquad (3)$$

In this case, by introducing new variables, the nonlinear model is transformed into a linear model. Assume that $z = x_1$ and $z^2 = x_2$; then

$$y = a_0 + a_1 x_1 + a_2 x_2 \qquad (4)$$
Although the values of x2 are not independent of x1, this does not interfere with the application of the linear regression algorithm. These two examples demonstrate that linear models for multivariate problems are a fundamental tool that cannot be ignored by practitioners, especially in the field of machine learning or, more elegantly, artificial intelligence (Olive, 2017; Matloff, 2017).
In this section, we are going to go through a project called nutrient removal efficiency: we use a data set containing 7876 data points to predict the total phosphorus (TP), ammonium (NH4-N), and total nitrogen (TN) removal efficiency of an anaerobic anoxic-oxic membrane bioreactor system, where the output values are predicted from the nine inputs given in Table 1. This dataset was taken from the data reported in an article published by Yaqub et al. (2020).
In this part, we are only using one explanatory variable (e.g., TOC) to explain the output (RE of TN).
The linear regression diagram for this example is shown in Fig. 1. After a successful fit, it can be seen that the removal efficiency of TN increases with increasing TOC.
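As a minimal sketch of this single-variable fit, the snippet below uses scikit-learn; the CSV file name and the column labels ("TOC", "RE_TN") are hypothetical stand-ins for however the Yaqub et al. (2020) data are actually stored, and this is not the chapter's own listing (those are collected in the appendix).

# Minimal single-variable linear regression sketch; file and column names are assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("nutrient_removal.csv")
X = df[["TOC"]].values   # single explanatory variable
y = df["RE_TN"].values   # removal efficiency of TN

model = LinearRegression()
model.fit(X, y)

print("intercept a0:", model.intercept_)
print("slope a1:", model.coef_[0])
print("R^2 on the training data:", model.score(X, y))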
3. Multivariate linear regression
When y is a function of n variables, namely x1 to xn, the simplest model for the dependency is a linear model, which can provide an estimation $\hat{y}$ of the function as

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \qquad (5)$$

TABLE 1 The attribute information of the nutrient removal efficiency project.

Code          Input or output   Description
TOC           Input             Total organic contents
TN            Input             Total nitrogen
TP            Input             Total phosphorus
COD           Input             Chemical oxygen demand
NH4-N         Input             Ammonium
SS            Input             Suspended solids
DO            Input             Dissolved oxygen
ORP           Input             Oxidation-reduction potential
MLSS          Input             Mixed liquor suspended solids
RE of NH4-N   Output            Removal efficiency of NH4-N
RE of TN      Output            Removal efficiency of TN
RE of TP      Output            Removal efficiency of TP
FIG. 1 Linear regression for the nutrient removal efficiency project (NH4-N-OUT vs. TOC).
where a0 to an are the model parameters to be determined using available data in combination with a proper linear regression algorithm (Hackeling, 2017). Matrix notation helps provide a compact form of the equations in multivariate problems. In matrix form,

$$\hat{y} = \mathbf{x}^T \mathbf{a} \qquad (6)$$
where $\mathbf{x}^T = [1 \; x_1 \; x_2 \; \cdots \; x_n]$ indicates the transpose of the column matrix x and, similarly, a denotes the column matrix of parameters. Note that a new term, $x_0 = 1$, is introduced to make the matrix product possible. Now the problem is reduced to the determination of the elements of the matrix a under suitable constraints that eventually provide a system of linear equations for specifying the model parameters. First, one must note that at each point the model error is defined as $e_i = y_i - \hat{y}_i$, that is,

$$e_i = y_i - \mathbf{x}_i^T \mathbf{a} \qquad (7)$$

where $\mathbf{x}_i^T$ can be interpreted as the i-th row of the matrix $X^T$, which includes the values of each variable at different points. The error vector can be defined as a column matrix as

$$\mathbf{e} = \mathbf{y} - X^T \mathbf{a} \qquad (8)$$
where both e and y have p elements (column vectors with p rows) and $X^T$ is a $p \times (n+1)$ matrix:

$$X^T = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{n,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{n,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,p} & x_{2,p} & \cdots & x_{n,p} \end{bmatrix} \qquad (9)$$
At the first glance, minimization of the absolute value of the error results in the best values of model parameters. But since
the error is presented by the vector e, one should talk about the minimization of a suitable norm of that. Moreover, the first
norm which adds up the absolute values of the elements of e would bring some problems in terms of differentiation. Thus,
the better choice is the second norm, or the Euclidean norm, of the error vector:

$$\|\mathbf{e}\|_2 = \sqrt{e_1^2 + e_2^2 + \cdots + e_p^2} \qquad (10)$$

And finally, since minimization of the above function is equivalent to the minimization of the summation on the right side, the Sum of Squares of Errors (SSE) is taken as the target function in linear regression problems:

$$SSE = \sum_{i=1}^{p} e_i^2 = \mathbf{e}^T \mathbf{e} = \left(\mathbf{y}^T - \mathbf{a}^T X\right)\left(\mathbf{y} - X^T \mathbf{a}\right) \qquad (11)$$
And minimization of this function without any constraint would result in an Ordinary Least Square (OLS) method for determination of model parameters, i.e., a0 to an. Obviously, this method involves vanishing the first partial derivatives of SSE
concerning the model parameters which provide the required n + 1 equations:
$$\frac{\partial (SSE)}{\partial a_k} = -2\sum_{i=1}^{p} y_i x_{k,i} + 2\sum_{i=0}^{n} a_i \left(XX^T\right)_{ki} = 0, \quad k = 0, 1, 2, \ldots, n$$

$$\Rightarrow \; \sum_{i=0}^{n} a_i \left(XX^T\right)_{ki} = \sum_{i=1}^{p} y_i x_{k,i} \qquad (12)$$

Or, in matrix form,

$$XX^T \mathbf{a} = X\mathbf{y} \;\Rightarrow\; \mathbf{a} = \left(XX^T\right)^{-1} X\mathbf{y} \qquad (13)$$
Of course, the solution of the system of linear equations can be accomplished with rather low-cost calculations. Indeed, determination of the inverse matrix for the solution of a linear system of equations is almost always avoided. Instead, a direct method such as Gauss-Jordan elimination or LU-factorization is advised when the system is not too large (say, for n < 100), and indirect methods such as SOR are advised for large systems. Of course, a favorable model should not
have too many parameters, that is, the number of independent variables is kept low by incorporating the effective terms
and neglecting the variables with minor impact on the output. Therefore, no matter how large the data set, one intends to
solve a linear system of equations with a reasonable number of unknowns.
Fig. 2 illustrates synthetic data for which a random error is involved in the measurement. Here, the data points are generated by y = 5 + 2.5x + error, and the linear regression results in a0 = 4.9943 and a1 = 2.4824.
The calculations involve the solution of a linear system of n + 1 equations, which involves inverting a matrix and generally requires O(n^p) operations, where p lies between 2.373 and 3 depending on the direct method applied. For example, Gauss-Jordan requires O(n^3) operations; hence, if the number of variables (terms) is doubled, the operations
are increased to eight times of the original problem. For many problems, this does not introduce much difficulty but
for problems with a large number of variables, iterative procedures are advised. Among these methods, the gradient descent
method is helpful for both problems with too many variables and problems with a very large set of data (Konishi, 2014;
Izenman, 2008).
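The normal-equation solution of Eq. (13) can be sketched in a few lines of NumPy; the snippet below uses synthetic data mimicking the y = 5 + 2.5x + error example above, with the design matrix written row-wise as is conventional in NumPy. It is an illustrative sketch, not the chapter's listing.

# OLS via the normal equations on synthetic linear data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 5 + 2.5 * x + rng.normal(scale=1.0, size=200)

# Design matrix with a leading column of ones for the intercept a0.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations; np.linalg.solve is preferred over an explicit inverse.
a = np.linalg.solve(X.T @ X, X.T @ y)
print("a0, a1 =", a)   # should be close to 5 and 2.5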
4. Gradient descent method
This technique, also known as steepest descent, is a well-known method for the optimization of differentiable functions. The strategy is based on taking steps proportional to the negative gradient of the function at each point to get closer to the local minimum of the function. For a convex function, the global minimum of the function can be determined, albeit with a proper choice of the step size (Harrington, 2012). For a multivariable function F(x), the gradient descent method is put into action as follows:

$$x_{n+1} = x_n - \lambda \nabla F(x_n) \qquad (14)$$

FIG. 2 Applying linear regression to the synthetic data.
where λ is a small positive number, which must be small enough to prevent missing the local minimum but not so small that the process gets stuck in the neighborhood of our initial guess. Note that λ can be updated at each step, and under certain circumstances values of this parameter can be chosen to guarantee convergence. As an example, consider the contour plot shown in Fig. 3.
FIG. 3 Contours of the function $y = x_1^2/4 + x_2^2 - x_2^3/8 + 1$: reaching the local minimum by GD.

This plot represents the function $y = x_1^2/4 + x_2^2 - x_2^3/8 + 1$, and if one starts searching for the minimum from (x1, x2) = (4, 4), the direction of steepest descent is opposite to the gradient of the function at this point, i.e., $\nabla y(4,4) = 2\mathbf{i} + 2\mathbf{j}$. Hence, the new point is given by

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} - \lambda \begin{bmatrix} 2 \\ 2 \end{bmatrix} \qquad (15)$$
Usually, at the first step, values of λ are set less than unity and then larger values are examined. Practically, enlarging λ is permitted as long as it provides smaller values of the objective function. For the present problem, the new values of the variables in terms of λ can be substituted into the function y, and with straightforward single-variable optimization one arrives at λopt = 2; in this way the local minimum is determined in just one shot at (x1, x2) = (0, 0) with ymin = 1. Of course, real-world problems are not that easy, and several steps with a proper step size are required. In each step, a single-variable optimization might be performed to infer the best value of the step size. Of course, direct searching based on a small step size, its enlargement (usually by 10 times), and comparison of the resulting function values is better suited for sophisticated functions.
As far as our regression problem is concerned, one has to put it in the form of a minimization problem to apply the gradient descent method. The objective function is simply the sum of the squares of errors,

$$SSE = \left(\mathbf{y}^T - \mathbf{a}^T X\right)\left(\mathbf{y} - X^T \mathbf{a}\right) \qquad (16)$$
whose gradient can be simply determined from the previous arguments on its derivatives with respect to the model parameters; that is,

$$\nabla SSE = 2X\left(X^T \mathbf{a} - \mathbf{y}\right) \qquad (17)$$
With an initial guess and a small enough value of step size, one can initiate the algorithm to obtain the right values of the
model’s parameters:
$$\mathbf{a}_{new} = \mathbf{a}_{old} - \lambda \nabla SSE(\mathbf{a}_{old}) \qquad (18)$$
There are three approaches to apply the gradient descent method for the training set of the data depending on how the huge
data set is handled by this method. In this respect, if the whole training set is used at every step of the calculations, the
method is addressed as batch gradient descent. For a very large data set, the batch gradient descent might not be economic.
Hence, two other variants are proposed by practitioners: Stochastic gradient descent and mini-batch gradient descent. In
stochastic GD at each step, only a small random set of the training data-set is used to calculate the gradient which makes it
much faster than the original Batch GD; however, there are some issues due to the stochastic nature of the method which
results in a nonmonotonic convergence to the local minimum and hence it needs stopping criteria to prevent bouncing
around the local minimum. On the other hand, the mini-batch variant of GD splits the training data-set into small sets
and computes the gradient on these sets which allows taking the benefit of parallel computation while it resolves the
problem of oscillatory convergence observed in stochastic GD (Shalev-Shwartz and Ben-David, 2014; Brownlee, 2016).
We are going to apply gradient descent to the nutrient removal efficiency project. We do this using the TOC parameter and the removal efficiency of NH4-N. As can be seen from Fig. 4, the sum of the squares of errors decreases with increasing epoch until the error reaches a minimum value.
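A possible batch gradient descent sketch for Eqs. (17) and (18) is shown below on the same kind of synthetic linear data; the 1/n scaling of the gradient and the particular step size are numerical-convenience choices made here, not prescriptions from the chapter.

# Batch gradient descent on synthetic linear data (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 5 + 2.5 * x + rng.normal(scale=1.0, size=200)
X = np.column_stack([np.ones_like(x), x])   # rows are observations

a = np.zeros(2)    # initial guess for [a0, a1]
lam = 0.01         # step size (lambda); too large diverges, too small is slow
for epoch in range(20_000):
    # Gradient of the squared error, divided by the sample count so that a
    # single step size works regardless of the data-set size.
    grad = 2.0 * X.T @ (X @ a - y) / len(y)
    a -= lam * grad

print("a0, a1 =", a)   # approaches the OLS solution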
5. Polynomial regression
When the data do not depict a linear behavior, linear regression can still be applied by introducing new variables in terms of the powers of the original variable, as discussed in the previous section. Consider Fig. 5, which displays synthetic data built up by y = 5 + 3x - 0.5x^2 + noise. To apply linear regression to this problem, consider the following model:

$$y = a_0 + a_1 x_1 + a_2 x_2 \qquad (19)$$
FIG. 4 SSE in terms of epoch for the nutrient removal efficiency project.

where $x_1 = x$ itself and $x_2 = x^2$, so that the values of the new variable are known everywhere. Therefore, the matrix of variables X is built up as follows:
$$X^T = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} \\ 1 & x_{1,2} & x_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,p} & x_{2,p} \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & x_{1,1}^2 \\ 1 & x_{1,2} & x_{1,2}^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{1,p} & x_{1,p}^2 \end{bmatrix} \qquad (20)$$

And finally, the model parameters are determined as

$$\mathbf{a} = \left(XX^T\right)^{-1} X\mathbf{y} = \begin{bmatrix} 5.1129 \\ 3.0234 \\ -0.5138 \end{bmatrix} \qquad (21)$$
The model prediction is also displayed in Fig. 5.
When there are indeed multiple variables (or features), a true polynomial regression is necessary to capture the relationship between these features. Mathematically, this relationship is depicted in nonlinear terms which contain a combination of the variables. For example, the second-order terms are either made by squaring a single variable or multiplying
one feature by another. In this respect, the number of terms of a polynomial of degree m for a problem with n features will be
determined as
$$\binom{n+m}{m} = \frac{(n+m)!}{n!\,m!} \qquad (22)$$
FIG. 5 Noisy data and the second-degree least-square polynomial.
FIG. 6 The variance of a too high-degree polynomial which makes it an improper choice.
which includes all possible combinations of variables to construct a multivariate mth degree polynomial. Now, the question
is what degree is the best for a problem. A high-degree polynomial can get closer to more data points but it can easily lose
track by following the inherent noise of the data. In this regard, its predictions might not be relied on. When this is the case,
we say that the model has a high variance. Fig. 6 compares a polynomial of 20th degree with the second-degree polynomial.
Choosing the proper degree of the regression polynomial is a statistical task that considers the trade-off between the high
variance (unacceptable sensitivity of the model with high-degree polynomials) and the bias (underfitting the data with low
degree polynomials) (Shalev-Shwartz and Ben-David, 2014; Raschka, 2015; Ramasubramanian and Singh, 2018). The
issue is discussed in the following section.
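The trade-off can be made concrete with a small sketch that fits polynomials of increasing degree to the synthetic quadratic data; numpy.polyfit is used here purely for illustration, and the noise level and chosen degrees are assumptions, not values from the chapter.

# Polynomial regression as linear regression on powers of x (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=50)
y = 5 + 3 * x - 0.5 * x**2 + rng.normal(scale=0.5, size=50)

for degree in (1, 2, 10):
    coeffs = np.polyfit(x, y, deg=degree)   # least-squares fit of the chosen degree
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)
    print(f"degree {degree:2d}: SSE on training data = {sse:.3f}")
# Training SSE always shrinks as the degree grows; only held-out data reveals overfitting.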
FIG. 7 The high-degree polynomial regression.
6. Overfitting and underfitting
Compared to plain linear regression, a high-degree polynomial regression provides a better opportunity for fitting the
training data. As shown in Fig. 7, when a 40-degree polynomial model is applied to the training data, the data points are approximated to a great extent, but obviously its trend is not acceptable at both ends; that is why it is considered
as an overfitting regression polynomial. The linear model neither follows the general trend nor touches the data points
satisfactorily; hence, it underfits the data. However, the quadratic regression satisfactorily follows the general trend
and presents a reasonable approximation as well (Swamynathan, 2019; Burger, 2018).
This is expected since the initial dataset was created through the introduction of some errors in a quadratic function.
Nonetheless, in many practical cases, there is no means to identify the original function behind the dataset. Therefore, there
is a need to determine the level of complexity of a model and to determine whether the model is underfitting or overfitting
the data (Ganesh, 2017).
7. Cross-validation
One of the most common ways to obtain an estimation of the performance of the model in terms of generalization involves
the utilization of cross-validation. It is said that the model is overfitting if it has good performance on the training data, but
provides poor generalization, which is determined by evaluating the cross-validation measures. However, the model is said
to be underfitting if it provides poor performance on both the training data and on the measures of cross-validation. Hence,
this is a satisfactory method for determining whether the model is too complex or too simple.
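A minimal sketch of this procedure with scikit-learn is given below (the arrays are illustrative); cross_val_score trains and evaluates the model on several folds, and the mean and spread of the fold scores provide the generalization estimate discussed above.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(1)
X = rng.rand(200, 1) * 10
y = 3.0 * X.ravel() + rng.normal(0, 2, 200)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())   # generalization estimate and its spread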
8. Comparison between linear and polynomial regressions
In this section, we intend to examine polynomial regression for the nutrient removal efficiency project. By plotting the MLSS on the horizontal axis against the removal efficiency of TN on the y-axis, a nonlinear downward trend is obtained. The figure related to this example is given in Fig. 8.
After applying linear regression to the data in this example, it becomes clear that this regression cannot fit the data properly (see Fig. 9).
After applying polynomial regression in the quadratic mode to the data of this example, it becomes clear that this regression can fit these data better than the linear mode. This issue was also examined numerically: the value of R² for this regression was equal to 0.14, while the value of R² for the linear regression was equal to 0.10 (see Fig. 10).
Now we change the polynomial features to degree 10 and run the code again; this time, the following figure shows that the direction of the graph changes. The value of R² is 0.04, and we appear to have a case of overfitting (see Fig. 11).
FIG. 8 The values of MLSS in terms of removal efficiency of TN correspond to the nutrient removal efficiency project.
FIG. 9 Using linear regression to predict the values of MLSS and removal efficiency of TN.
FIG. 10 Using polynomial regression (quadratic mode) to predict the values of MLSS and removal efficiency of TN.
9. Learning curve
One of the other available methods is to evaluate the learning curves. Learning curves show the performance of the model
on both training and validation sets as a function of the size of the training set or the training iteration. In order to plot these
curves, the model is trained several times using various subsets of the training set where each subset is of a different size
( Jaber, 2016).
It should be noted that, in general, a straight line cannot provide good performance for modeling the data. This is confirmed by the fact that the error level reaches a relatively constant level that is very close to the other curve. It is worth mentioning that such learning curves are typical of an underfitting model, i.e., curves that reach constant error levels which are close together and relatively high. It should also be noted that a common method for improving an overfitting model is to provide it with more training instances until the validation error reaches the training error.
The learning curve for the nutrient removal efficiency project is given in the diagram below. By plotting the train and test scores versus the number of samples, it is clear that the two curves become closer to each other as the number of training examples increases, and from 5000 examples onwards, the slope and the rate at which the two curves approach each other become less pronounced (see Fig. 12).
10. Regularized linear models
One of the possible ways for decreasing the overfitting phenomenon involves the regularization of the model, which is
another way to say limiting or restricting it. By reducing the degrees of freedom of the model, it will be harder for the
model to overfit the data. One of the easiest ways to regularize a polynomial model involves decreasing the polynomial degree.
FIG. 11 Using polynomial regression (degree of 10) to predict the values of MLSS and removal efficiency of TN.
FIG. 12 The learning curve for the nutrient removal efficiency project.
In the case of a linear model, the regularization is generally performed by restricting the weights of the model. In order to
better illustrate the model, the Ridge regression and the Lasso regression models can be evaluated since these models utilize
two distinct methods for restricting the weights (Gori, 2017).
11. The ridge regression
The ridge regression is in fact a regularized or restricted version of the linear regression. In order to regularize the linear regression model, a regularization term, i.e., α Σᵢ₌₁ⁿ θᵢ², is introduced into the cost function of the model. Adding this term will make the learning algorithm fit the data while also minimizing the weights of the model. It is worth mentioning that this regularization term must only be introduced into the cost function during the training stage. After training the model, the unregularized performance measure can be used to assess the performance of the model (Saleh et al., 2019; Aldrich and Auret, 2013).
The extent of the regularization of the model can be controlled using the hyper-parameter α. When α = 0, the Ridge regression will be the same as the linear regression. However, if α is very large, all the weights will be close to zero, resulting in a flat line going through the mean values of the data. The cost function of the Ridge regression model is presented in Eq. (23).
$$J(\theta) = \mathrm{MSE}(\theta) + \alpha\,\frac{1}{2}\sum_{i=1}^{n}\theta_i^{2} \qquad (23)$$
It should be noted that the bias term, denoted by θ₀, is not regularized, and the sum starts at i = 1 and not i = 0. If w denotes the vector of feature weights (θ₁ to θₙ), the regularization term becomes ½‖w‖₂², in which ‖w‖₂ signifies the ℓ2 norm of the weight vector. Moreover, for the gradient descent, αw is simply added to the MSE gradient vector (Alpaydin, 2020).
Similar to the linear regression model, the Ridge regression can be performed by calculating a closed-form equation or by applying the gradient descent. The advantages and disadvantages are the same. The closed-form solution is presented in Eq. (24). It should be noted that in this equation, A is the (n + 1) × (n + 1) identity matrix, with one difference, i.e., the presence of a 0 in the top-left cell, which corresponds to the bias term.
$$\hat{\theta} = \left(\mathbf{X}^{T}\mathbf{X} + \alpha\mathbf{A}\right)^{-1}\mathbf{X}^{T}\mathbf{y} \qquad (24)$$
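A minimal NumPy sketch of the closed-form solution of Eq. (24) is given below (the design matrix and target are illustrative); note that the top-left cell of A is set to 0 so that the bias term is not regularized.

import numpy as np

rng = np.random.RandomState(0)
X = np.c_[np.ones(50), rng.rand(50, 2)]      # design matrix with a bias column
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(0, 0.1, 50)

alpha = 1.0
A = np.eye(X.shape[1])
A[0, 0] = 0.0                                # do not regularize the bias term
theta_hat = np.linalg.solve(X.T @ X + alpha * A, X.T @ y)
print(theta_hat)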
12. The effect of collinearity in the coefficients of an estimator
As mentioned earlier, α ≥ 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of α, the greater the amount of shrinkage, and thus the coefficients become more robust to collinearity. In Fig. 13, each color represents a different feature of the coefficient vector, displayed as a function of the regularization parameter. This example also shows the usefulness of applying Ridge regression to highly ill-conditioned matrices. For such matrices, a slight change in the target variable can cause huge variances in the calculated weights. In such cases, it is useful to set a certain regularization (alpha) to reduce this variation (noise) (see Fig. 13).
13. Outliers impact
Before moving on, let's look at an example of the effect of outliers on the slope of the regression line, and then show how ridge regression can reduce these effects. For a data set containing 100 randomly generated points with a slope of 0.5, after performing linear regression, the slope of the fitted line is equal to 0.47134857. The diagram for this example is given below (see Fig. 14).
Now, to show the effect of outliers on the previous example, we change two points in the data set: we set the response at the leftmost point of the chart to −200 and at the rightmost point to +200. After performing linear regression, we see that the slope of the obtained line is equal to 1.50556072, which is significantly different from the slope of the previous chart and shows the effect of outliers. The diagram for this example is given below (see Fig. 15).
After applying the ridge regression, we can see that this regression is substantially less affected by the outliers than the linear regression and moves the estimated coefficient back toward the original value. The slope of the line obtained with this regression is equal to 1.00370714. The diagram for this example is given below (see Fig. 16).
FIG. 13 Ridge coefficients as a function of the regularization.
FIG. 14 Data set containing 100 randomly generated points and performing linear regression.
FIG. 15 Showing the outlier’s effect.
FIG. 16 Using ridge regression to offset the impact of outliers.
14. Lasso regression
Another regularized version of the linear regression is the Lasso regression, whose name comes from Least Absolute Shrinkage and Selection Operator Regression. Similar to the Ridge regression, this regularized version introduces a regularization term into the cost function, except that instead of half of the square of the ℓ2 norm, it utilizes the ℓ1 norm of the weight vector. This is expressed in Eq. (25) (Sra et al., 2012; Bali et al., 2016).
$$J(\theta) = \mathrm{MSE}(\theta) + \alpha\sum_{i=1}^{n}\left|\theta_i\right| \qquad (25)$$
One of the differences between this type of regression and the Ridge regression involves the fact that as the parameters get closer to the global optimum, the gradients get smaller and the gradient descent becomes slower, increasing the likelihood of convergence because of the lack of bouncing around. Another difference is that by increasing α, the optimal parameters gradually get closer to the origin, but they will never reach zero.
It should be noted that the cost function of the Lasso regression is not differentiable at θᵢ = 0 for i = 1, 2, …, n; however, if the subgradient vector g is utilized wherever θᵢ = 0, the gradient descent will perform well enough. A subgradient vector equation that can be utilized for gradient descent with the cost function of the Lasso regression is presented in Eq. (26).
$$g(\theta, J) = \nabla_{\theta}\,\mathrm{MSE}(\theta) + \alpha \begin{pmatrix} \operatorname{sign}(\theta_1) \\ \operatorname{sign}(\theta_2) \\ \vdots \\ \operatorname{sign}(\theta_n) \end{pmatrix}, \qquad \operatorname{sign}(\theta_i) = \begin{cases} -1 & \text{if } \theta_i < 0 \\ 0 & \text{if } \theta_i = 0 \\ +1 & \text{if } \theta_i > 0 \end{cases} \qquad (26)$$
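A minimal NumPy sketch of gradient descent using the subgradient vector of Eq. (26) is shown below (data and hyper-parameters are illustrative); the bias term is excluded from the penalty, and the weight of the useless feature is shrunk toward zero.

import numpy as np

rng = np.random.RandomState(0)
X = np.c_[np.ones(100), rng.rand(100, 3)]
y = X @ np.array([0.5, 2.0, 0.0, -1.5]) + rng.normal(0, 0.1, 100)

alpha, eta, m = 0.1, 0.05, len(y)
theta = np.zeros(X.shape[1])
for _ in range(5000):
    grad_mse = 2.0 / m * X.T @ (X @ theta - y)   # gradient of the MSE term
    sub = np.sign(theta)
    sub[0] = 0.0                                 # the bias term is not penalized
    theta -= eta * (grad_mse + alpha * sub)
print(theta)                                     # the weight of the useless feature stays near 0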
After applying Lasso regression to the example data related to the ridge regression, it was shown that this regression can fit the data of this example with a much better approximation than the linear one and is less affected by outliers. The slope of the line related to this regression is 1.06289489. The diagram for this example is given below (see Fig. 17):
FIG. 17 Using Lasso regression to offset the impact of outliers.
15. Elastic net
The elastic net is the middle point between the Ridge and the Lasso regression models. In this model, the regularization term is a combination of the regularization terms from the Ridge and Lasso regression models, controlled by the mix ratio r. It should be noted that the elastic net will be equal to the Ridge regression when r is set to 0, while it equals the Lasso regression when r is set to 1. This is expressed in Eq. (27) (Humphries et al., 2018; Forsyth, 2019).
$$J(\theta) = \mathrm{MSE}(\theta) + r\alpha\sum_{i=1}^{n}\left|\theta_i\right| + \frac{1-r}{2}\,\alpha\sum_{i=1}^{n}\theta_i^{2} \qquad (27)$$
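For reference, a minimal sketch with scikit-learn is given below (illustrative data); in terms of Eq. (27), the mix ratio r corresponds to the l1_ratio argument of ElasticNet and α to its alpha argument, up to scikit-learn's internal scaling of the MSE term.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 0.1, 100)

model = ElasticNet(alpha=0.05, l1_ratio=0.5)   # r = 0.5: halfway between Ridge and Lasso
model.fit(X, y)
print(model.coef_, model.intercept_)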
The question is when to use the linear regression without regularization, the Ridge regression, the Lasso regression, or the
Elastic Net. In order to make this decision, it should be noted that a level of regularization is always preferred. Therefore, using
the plain linear regression model must be avoided as much as possible. The Ridge regression model can be a good default
option. However, if there are only a limited number of useful features, it is better to utilize the Lasso regression or the
elastic net, since they usually set the weights of the useless features to zero, as noted earlier. Nonetheless, when the number
of features is larger than the number of training instances, or when there is a strong correlation between several features, it is
better to utilize the elastic net instead of the Lasso regression since the Lasso can have erratic behaviors in such cases.
After applying elastic net regression to the example data used for the ridge and lasso regressions, it was shown that this regression provides a much better approximation than the two previous regressions and is less affected by outliers. The slope of the fitted line for this regression was 0.74724704. The diagram for this example is given below (see Fig. 18):
16. Early stopping
Stopping the training once the validation error is minimized is another distinct way to regularize iterative learning algorithms, including the gradient descent. This method is known as “early stopping.” When applying the early stopping
method, as soon as the validation error is minimized, the training is halted. This is a simple, elegant, and efficient method
for the regularization of iterative learning algorithms (Shukla, 2018).
FIG. 18 Using elastic net regression to offset the impact of outliers.
It should be noted that when using stochastic and mini-batch gradient descent, it is difficult to determine if the error is minimized or not since the curves are not this smooth. A possible solution is to stop the training after the validation error has stayed above the minimum for a while and the possibility of a better performance by the model is not very high. Afterward, the parameters of the model can be set at the values they were when the validation error was at the minimum point.
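A minimal sketch of early stopping with scikit-learn is given below (the data, learning rate, and number of epochs are illustrative, not the project settings); the model is trained one epoch at a time with warm_start=True, and the parameters from the epoch with the lowest validation error are kept.

import numpy as np
from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.normal(0, 0.5, 200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                     learning_rate="constant", eta0=0.001)
best_val_error, best_model = float("inf"), None
for epoch in range(500):
    model.fit(X_train, y_train)                    # continues where it left off
    val_error = np.mean((model.predict(X_val) - y_val) ** 2)
    if val_error < best_val_error:
        best_val_error, best_model = val_error, deepcopy(model)
print(best_val_error)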
17. Logistic regression
Some algorithms based on regression can be applied to classification problems. The probability of an example belonging to a particular class can be estimated by logistic regression; for example, how likely is it that an email is spam? Logit regression is another name for this regression. In this model, whether or not a sample belongs to a class depends on whether its estimated probability is above or below 50%. An instance with an estimated probability above 50% is assigned to the "positive class," represented by "1," while one with a probability below 50% is assigned to the "negative class," indicated by "0." Such a division is called binary classification (Mohammed et al., 2016; Lesmeister, 2015).
18. Estimation of probabilities
The question that may be encountered here is how the logistic regression works. A logistic regression model is similar to a linear regression model in the sense that it computes the weighted sum of the inputs along with a bias term. However, its main difference with the linear regression model is that it does not provide a direct result; rather, its output is the logistic of the result. This is expressed in Eq. (28).
$$\hat{p} = h_{\theta}(\mathbf{x}) = \sigma\!\left(\mathbf{x}^{T}\boldsymbol{\theta}\right) \qquad (28)$$
It should be noted that the logistic, denoted by σ(·), is a sigmoid or S-shaped function whose output ranges from 0 to 1. This function is expressed by Eq. (29) and depicted in Fig. 19.
$$\sigma(t) = \frac{1}{1 + \exp(-t)} \qquad (29)$$
After the logistic regression model estimates the probability p̂ = h_θ(x) of an instance x belonging to the positive class, the model can calculate the prediction ŷ in a straightforward fashion. This prediction is expressed in Eq. (30) (Lantz, 2019).
$$\hat{y} = \begin{cases} 0 & \text{if } \hat{p} < 0.5 \\ 1 & \text{if } \hat{p} \geq 0.5 \end{cases} \qquad (30)$$
It should be noted that when t < 0 we have σ(t) < 0.5, while when t ≥ 0 we have σ(t) ≥ 0.5. Accordingly, if xᵀθ is positive, the logistic regression model's output as the prediction will be equal to 1; otherwise, the output will be equal to 0. It is worth mentioning that the score t is usually called the logit. This is because the logit function, which is expressed as logit(p) = log(p/(1 − p)), is in fact the inverse of the logistic function. Moreover, when calculating the logit of the estimated probability p̂, the output is t. It should also be noted that the logit is sometimes called the log-odds because it can be defined as the logarithm of the ratio of the estimated probability of the positive class to the estimated probability of the negative class (Harrell, 2015).
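The relationship between the logistic function and the logit can be verified with a few lines of NumPy (a minimal sketch; the test values are arbitrary):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    return np.log(p / (1.0 - p))

t = np.array([-2.0, 0.0, 3.0])
p = sigmoid(t)
print(p)           # probabilities in (0, 1); exactly 0.5 at t = 0
print(logit(p))    # recovers t, confirming that the logit inverts the logistic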
FIG. 19 The logistic function.
19. Training and the cost function
Based on the abovementioned considerations, it can be concluded that a logistic regression model is capable of estimating
probabilities and providing predictions. However, the method for training the model must be explained as well. The main goal
of training the model is to find a suitable value for the parameter vector θ in a way that the model provides high probabilities for the positive instances (y = 1), while it provides low probabilities for the negative instances (y = 0). This notion is expressed by the cost function, which is presented in Eq. (31) below for a single training instance x (Ayyadevara, 2018).
$$c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases} \qquad (31)$$
In order to better understand this cost function, it should be noted that as t → 0, the value of −log(t) grows very large. Therefore, if the model provides a probability close to 0 for a positive instance or a probability close to 1 for a negative instance, then the cost will become significantly large. In contrast, as t → 1, we have −log(t) → 0. In other words, if the estimated probability for a negative instance is close to 0 or if the estimated probability for a positive instance is close to 1, then the cost will be much closer to 0. Incidentally, the latter case is what we are actually seeking (Raschka and Mirjalili, 2019; Amamra et al., 2018).
On the other hand, the cost function over the entire training set can be an indicator for the average cost over all individual
training instances. This general cost function can be presented as the log loss, which is expressed in Eq. (32).
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log\!\left(\hat{p}^{(i)}\right) + \left(1 - y^{(i)}\right)\log\!\left(1 - \hat{p}^{(i)}\right)\right] \qquad (32)$$
Unfortunately, there is no closed-form formula to obtain a value for θ that minimizes the log loss or the general cost function. In other words, there is no equivalent of the normal equation available. In contrast, the log loss is a convex function, so any optimization algorithm, such as the gradient descent, can provide the global minimum for this function as long as the learning rate is small enough and the algorithm has enough time. Moreover, Eq. (33) presents the partial derivatives of the log loss with respect to the parameter θⱼ.
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\sigma\!\left(\boldsymbol{\theta}^{T}\mathbf{x}^{(i)}\right) - y^{(i)}\right) x_j^{(i)} \qquad (33)$$
It should be noted that the above equation is similar to the partial derivatives of the cost function in the batch gradient descent equation, in that for each of the instances it computes the prediction error and multiplies it by the value of the jth feature.
Afterward, it calculates the average over all the training instances. When the gradient vector, which includes all the partial
derivatives, is available, it can be utilized for employing the Batch Gradient Descent algorithm. These were the main steps
for training a logistic regression model. It is also worth mentioning that in the stochastic gradient descent algorithm, each
instance is evaluated separately, while in the mini-batch gradient descent algorithm, a mini-batch is utilized each time.
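A minimal vectorized NumPy sketch of batch gradient descent using the gradient of Eq. (33) is given below (the synthetic data, learning rate, and iteration count are illustrative):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.RandomState(0)
m = 100
X = np.c_[np.ones(m), rng.rand(m, 2)]            # bias column plus two features
y = (X[:, 1] + X[:, 2] > 1.0).astype(float)      # a simple labeling rule
theta = np.zeros(X.shape[1])

eta = 0.1
for _ in range(1000):
    gradient = X.T @ (sigmoid(X @ theta) - y) / m   # Eq. (33) for all j at once
    theta -= eta * gradient
print(theta)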
For performing logistic regression in Python, the data displayed in Fig. 20 are used, and the evolution of the cost function over the iterations is shown in Fig. 21.
Finally, the decision boundary is specified by the violet curve in Fig. 22.
FIG. 20 Data for logistic regression.
FIG. 21 The cost function versus the number of iterations.
FIG. 22 The decision boundary plot.
20. Conclusions
This chapter describes different types of regressions and how they can be used to solve real problems. To solve our challenges in the different sections, we utilized the Scikit-learn Python library. In this chapter, a complete description of different regressions such as robust, multiple, regularized (ridge, lasso, and elastic net), polynomial, and logistic is given. The introduction and practical use of methods such as gradient descent, cross-validation, and the learning curve are illustrated, and it became clear how to deal with issues such as outliers, overfitting, and underfitting in order to take a better approach to regression problems.
Appendix: Python code
Linear regression
The following Python code presents the steps of calculations for this regression:
In [1]:
import numpy as np
import pandas as pd
In [2]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
In [3]:
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
In [4]:
df.columns = col_name
In [5]:
import matplotlib.pyplot as plt
import seaborn as sns
In [6]:
X = df['TOC'].values.reshape(-1,1)
In [7]:
y = df['TN-OUT'].values
In [8]:
from sklearn.linear_model import LinearRegression
In [9]:
model = LinearRegression()
In [10]:
model.fit(X, y)
Out[10]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [11]:
model.coef_
Out[11]:
array([0.02305826])
In [12]:
model.intercept_
Out[12]:
66.88380657331433
In [66]:
plt.figure(figsize=(12,10));
sns.regplot(X, y);
plt.xlabel('TOC')
plt.ylabel("TN-OUT")
plt.show();
Gradient descent method
The following Python code presents the steps of implementing the Gradient Descent method for this example:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
In [4]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
df.columns = col_name
import matplotlib.pyplot as plt
import seaborn as sns
In [5]:
X = df['TOC'].values.reshape(-1,1)
y = df['TN-OUT'].values
In [6]:
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
X_std = sc_x.fit_transform(X)
y_std = sc_y.fit_transform(y.reshape(-1,1)).flatten()
In [7]:
alpha = 0.0001
w_ = np.zeros(1 + X_std.shape[1])
cost_ = []
n_ = 100
for i in range(n_):
    y_pred = np.dot(X_std, w_[1:]) + w_[0]
    errors = (y_std - y_pred)
    w_[1:] += alpha * X_std.T.dot(errors)
    w_[0] += alpha * errors.sum()
    cost = (errors**2).sum() / 2.0
    cost_.append(cost)
In [8]:
plt.figure(figsize=(10,8))
plt.plot(range(1, n_ + 1), cost_);
plt.ylabel('SSE');
plt.xlabel('Epoch');
In [9]:
w_
Out[9]:
array([1.02318154e-15, 3.21037679e-02])
Comparison between linear and polynomial regressions
The code related to this example is given below:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
In [2]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
df.columns = col_name
import matplotlib.pyplot as plt
import seaborn as sns
In [3]:
X = df['MLSS'].values.reshape(-1,1)
y = df['TN-OUT'].values
In [4]:
plt.figure(figsize=(12,8))
plt.scatter(X, y);
In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

lr = LinearRegression()
lr.fit(X.reshape(-1, 1), y)
model_pred = lr.predict(X.reshape(-1,1))
plt.figure(figsize=(12,8))
plt.scatter(X, y);
plt.plot(X, model_pred);
print("R^2 score = {:.2f}".format(r2_score(y, model_pred)))
R^2 score = 0.10
In [6]:
poly_reg = PolynomialFeatures(degree=2)
X_poly_b = poly_reg.fit_transform(X.reshape(-1, 1))
lin_reg_2 = LinearRegression()
In [7]:
lin_reg_2.fit(X_poly_b, y)
Out[7]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [8]:
X_fit = np.arange(X.min(), X.max(), 1)[:, np.newaxis]
In [9]:
X_fit
Out[9]:
array([[ 99.08039698],
[ 100.08039698],
[ 101.08039698],
...,
[19974.08039698],
[19975.08039698],
[19976.08039698]])
In [10]:
y_pred = lin_reg_2.predict(poly_reg.fit_transform(X_fit.reshape(-1,1)))
In [11]:
plt.figure(figsize=(10,8));
plt.scatter(X, y);
plt.plot(X_fit, y_pred);
print("R^2 score = {:.2f}".format(r2_score(y,
lin_reg_2.predict(X_poly_b))))
R^2 score = 0.14
In [12]:
poly_reg = PolynomialFeatures(degree=10)
X_poly_b = poly_reg.fit_transform(X.reshape(-1, 1))
lin_reg_3 = LinearRegression()
In [13]:
lin_reg_3.fit(X_poly_b, y)
Out[13]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [14]:
X_fit = np.arange(X.min(), X.max(), 1)[:, np.newaxis]
In [15]:
y_pred = lin_reg_3.predict(poly_reg.fit_transform(X_fit.reshape(-1,1)))
In [16]:
plt.figure(figsize=(10,8));
plt.scatter(X, y);
plt.plot(X_fit, y_pred);
print("R^2 score = {:.2f}".format(r2_score(y,
lin_reg_3.predict(X_poly_b))))
R^2 score = 0.04
Learning curve
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve
from sklearn.model_selection import ShuffleSplit
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
"""
Generate a simple plot of the test and training learning curve.
Parameters
----------
estimator : object type that implements the "fit" and "predict" methods
An object of that type which is cloned for each validation.
title : string
Title for the chart.
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and
n_features is the number of features.
y : array-like, shape (n_samples) or (n_samples, n_features), optional
Target relative to X for classification or regression;
None for unsupervised learning.
ylim : tuple, shape (ymin, ymax), optional
Defines minimum and maximum yvalues plotted.
cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if ``y`` is binary or multiclass,
:class:`StratifiedKFold` used. If the estimator is not a classifier
or if ``y`` is neither binary nor multiclass, :class:`KFold` is used.
Refer :ref:`User Guide <cross_validation>` for the various
cross-validators that can be used here.
n_jobs : integer, optional
Number of jobs to run in parallel (default 1).
"""
plt.figure(figsize=(10, 8))
plt.title(title)
if ylim is not None:
plt.ylim(*ylim)
plt.xlabel("Training examples")
plt.ylabel("Score")
train_sizes, train_scores, test_scores = learning_curve(
estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
plt.grid()
plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
train_scores_mean + train_scores_std, alpha=0.1,
color="r")
plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
test_scores_mean + test_scores_std, alpha=0.1, color="g")
plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
label="Training score")
plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
label="Cross-validation score")
plt.legend(loc="best")
return plt
In [2]:
df = pd.read_csv('nutrient.data', delim_whitespace=True, header=None)
col_name = ['TOC', 'TN' , 'TP', 'COD', 'NH4-N', 'SS', 'DO', 'ORP', 'MLSS', 'NH4-N-OUT'
, 'TN-OUT', 'TP-OUT']
df.columns = col_name
import matplotlib.pyplot as plt
import seaborn as sns
X = df['TOC'].values.reshape(-1,1)
y = df['TN-OUT'].values
title = "Learning Curves (Ridge Regression)"
# Cross validation with 100 iterations to get smoother mean test and train
# score curves, each time with 20% data randomly selected as a validation set.
cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
estimator = Ridge()
plot_learning_curve(estimator, title, X, y, cv=cv, n_jobs=4)
plt.show()
The effect of collinearity in the coefficients of an estimator
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
# ###########################################################################
# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
# ###########################################################################
# Display results
plt.figure(figsize=(10,8))
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1])
# reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.axis('tight')
plt.show()
Outliers impact
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
import pandas as pd
In [2]:
from sklearn.linear_model import LinearRegression
In [3]:
np.random.seed(42)
n_samples = 100
rng = np.random.randn(n_samples) * 10
y_gen = 0.5 * rng + 2 * np.random.randn(n_samples)
lr = LinearRegression()
lr.fit(rng.reshape(-1, 1), y_gen)
model_pred = lr.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, model_pred);
print("Coefficient Estimate: ", lr.coef_)
Coefficient Estimate: [0.47134857]
In [4]:
idx = rng.argmax()
y_gen[idx] = 200
idx = rng.argmin()
y_gen[idx] = -200
In [5]:
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
o_lr = LinearRegression(normalize=True)
o_lr.fit(rng.reshape(-1, 1), y_gen)
o_model_pred = o_lr.predict(rng.reshape(-1,1))
plt.scatter(rng, y_gen);
plt.plot(rng, o_model_pred);
print("Coefficient Estimate: ", o_lr.coef_)
Coefficient Estimate: [1.50556072]
In [6]:
from sklearn.linear_model import Ridge
In [7]:
ridge_mod = Ridge(alpha=0.5, normalize=True)
ridge_mod.fit(rng.reshape(-1, 1), y_gen)
ridge_model_pred = ridge_mod.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, ridge_model_pred);
print("Coefficient Estimate: ", ridge_mod.coef_)
Coefficient Estimate: [1.00370714]
Lasso regression
In [8]:
from sklearn.linear_model import Lasso
In [9]:
lasso_mod = Lasso(alpha=0.4, normalize=True)
lasso_mod.fit(rng.reshape(-1, 1), y_gen)
lasso_model_pred = lasso_mod.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, lasso_model_pred);
print("Coefficient Estimate: ", lasso_mod.coef_)
Coefficient Estimate: [1.06289489]
Elastic net
In [11]:
from sklearn.linear_model import ElasticNet
In [12]:
en_mod = ElasticNet(alpha=0.02, normalize=True)
en_mod.fit(rng.reshape(-1, 1), y_gen)
en_model_pred = en_mod.predict(rng.reshape(-1,1))
plt.figure(figsize=(10,8));
plt.scatter(rng, y_gen);
plt.plot(rng, en_model_pred);
print("Coefficient Estimate: ", en_mod.coef_)
Coefficient Estimate: [0.74724704]
Training and the cost function
The required libraries are imported as follows:
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pandas is used for reading the data:
In [2]:
data = pd.read_csv('ex2data2.txt', header=None)
In [3]:
data.head()
Out[3]:
          0        1  2
0  0.051267  0.69956  1
1  0.092742  0.68494  1
2  0.213710  0.69225  1
3  0.375000  0.50219  1
4  0.513250  0.46564  1
In [4]:
x = data.iloc[:,:2].values
y = data.iloc[:,2].values.reshape(-1,1)
Polynomial features are generated for the logistic regression using scikit-learn:
In [6]:
from sklearn.preprocessing import PolynomialFeatures
poly_feature = PolynomialFeatures(degree=6, include_bias=False)
x_poly = poly_feature.fit_transform(x)
The following function defines the sigmoid function:
In [9]:
def sigmoid(z):
    m = len(z)
    h = np.zeros((m, 1))
    for i in range(m):
        h[i] = 1 / (1 + np.exp(-z[i]))
    return h
The cost function is computed by the following code:
In [10]:
def cost_function(X, y, theta, Lambda):
    # X: features
    # y: outputs
    # theta: model's parameters vector
    m = len(y)  # number of instances
    J = 1 / m * (- y.T.dot(np.log(sigmoid(X.dot(theta))))
                 - (1 - y).T.dot(np.log(1 - sigmoid(X.dot(theta))))) \
        + 2 * Lambda / m * sum(theta[1:]**2)
    return J
The gradient descent is defined:
In [11]:
def gradient_descent(X, y, theta, alpha, num_itr):
    # X: features
    # y: outputs
    # theta: model's parameters vector
    # alpha: learning rate
    # num_itr: number of iterations
    m = len(y)  # number of instances
    Lambda = 1
    J = np.zeros((num_itr + 1, ))
    J[0] = cost_function(X, y, theta, Lambda)
    for i in range(num_itr):
        theta = theta - alpha / m * X.T.dot(sigmoid(X.dot(theta)) - y) \
                - alpha * Lambda / m * np.r_[np.zeros((1, 1)), theta[1:]]
        J[i + 1] = cost_function(X, y, theta, Lambda)
    return theta, J
The feature matrix X is built by standardizing the polynomial features and adding a bias column (a step inferred from the decision-boundary code below, which reuses Mean and Sigma); then the number of iterations, alpha, and the initial theta are set:
In [12]:
# standardization step inferred from the later use of Mean and Sigma
Mean = x_poly.mean(axis=0)
Sigma = x_poly.std(axis=0)
X = np.c_[np.ones((len(y), 1)), (x_poly - Mean) / Sigma]
num_itr = 20000
alpha = 0.001
int_theta = np.zeros((X.shape[1], 1))
theta, J = gradient_descent(X, y, int_theta, alpha, num_itr)
Then the cost function versus the number of iterations is plotted:
In [13]:
plt.figure(figsize=(10,5))
plt.plot(np.arange(num_itr+1), J)
plt.xlabel('number of iteration')
plt.ylabel('cost function')
plt.show()
In [14]:
x_list = np.c_[np.linspace(-1, 1.2, 100), np.linspace(-1, 1.2, 100)]
D = np.zeros((100, 100))
for j in range(100):
    for i in range(100):
        xx_poly = poly_feature.fit_transform(np.c_[x_list[i, 0], x_list[j, 1]])
        xx_poly = (xx_poly - Mean) / Sigma
        xx_poly = np.c_[np.ones((1, 1)), xx_poly]
        D[i, j] = xx_poly.dot(theta)
In [15]:
plt.contour(x_list[:,0], x_list[:,1], D, [0])
plt.scatter(x[np.array(y==1).reshape(118,),0], x[np.array(y==1).reshape(118,),1], label='y = 1')
plt.scatter(x[np.array(y==0).reshape(118,),0], x[np.array(y==0).reshape(118,),1], label='y = 0')
plt.xlabel('Microchip Test 1')
plt.ylabel('Microchip Test 2')
plt.legend()
plt.show()
References
Akgün, B., Öğüdücü, Ş.G., 2015. Streaming linear regression on Spark MLlib and MOA. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.
Aldrich, C., Auret, L., 2013. Unsupervised Process Monitoring and Fault Diagnosis With Machine Learning Methods. Springer.
Alpaydin, E., 2020. Introduction to Machine Learning. MIT Press, USA.
Amamra, A., Khanchoul, K., Eslamian, S., Hadj Zobir, S., 2018. Suspended sediment estimation using regression and artificial neural network models:
Kebir watershed, northeast of Algeria, North Africa. Int. J. Hydrol. Sci. Technol. 8 (4), 352–371.
Ayyadevara, V., 2018. Pro Machine Learning Algorithms. Apress, New York, USA.
Bali, R., et al., 2016. R: Unleash Machine Learning Techniques. Packt Publishing Ltd.
Bargarai, F.A.M., Abdulazeez, A.M., Tiryaki, V.M., Zeebaree, D.Q., 2020. Management of wireless communication systems using artificial intelligence-based software defined radio. Int. J. Interact. Mob. Technol. https://doi.org/10.3991/ijim.v14i13.14211.
Brownlee, J., 2016. Master Machine Learning Algorithms: Discover How They Work and Implement Them From Scratch. Machine Learning Mastery.
https://machinelearningmastery.com/.
Burger, S.V., 2018. Introduction to Machine Learning With R: Rigorous Mathematical Analysis. O’Reilly Media, Inc.
Dargan, S., et al., 2020. A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27 (4),
1071–1092.
Dehghan, M.H., Hamidi, F., Salajegheh, M., 2015. Study of linear regression based on least squares and fuzzy least absolutes deviations and its application
in geography. In: 2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS). IEEE.
Epskamp, S., Fried, E.I., 2018. A tutorial on regularized partial correlation networks. Psychol. Methods 23 (4), 617.
Forsyth, D., 2019. Applied Machine Learning. Springer.
Ganesh, T.V., 2017. Practical Machine Learning With R and Python: Machine Learning in Stereo. Independently Published.
Gori, M., 2017. Machine Learning: A Constraint-Based Approach. Morgan Kaufmann.
Hackeling, G., 2017. Mastering Machine Learning With Scikit-Learn. Packt Publishing Ltd., UK.
Harrell Jr., F.E., 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis.
Springer.
Harrington, P., 2012. Machine Learning in Action. Manning Publications Co., USA.
Humphries, G.R., Magness, D.R., Huettmann, F., 2018. Machine Learning for Ecology and Sustainable Natural Resource Management. Springer.
Izenman, A., 2008. Regression, Classification, and Manifold Learning, Modern Multivariate Statistical Techniques. Springer Texts in Statistics, Germany.
Jaber, M.Y., 2016. Learning Curves: Theory, Models, and Applications. CRC Press.
Konishi, S., 2014. Introduction to Multivariate Analysis: Linear and Nonlinear Modeling. CRC Press, USA.
Lantz, B., 2019. Machine Learning With R: Expert Techniques for Predictive Modeling. Packt Publishing Ltd.
Lesmeister, C., 2015. Mastering Machine Learning With R. Packt Publishing Ltd.
Lim, H.-I., 2019. A linear regression approach to modeling software characteristics for classifying similar software. In: 2019 IEEE 43rd Annual Computer
Software and Applications Conference (COMPSAC). IEEE.
Liu, Y., et al., 2017. Materials discovery and design using machine learning. J. Mater. 3 (3), 159–177.
Matloff, N., 2017. Statistical Regression and Classification: From Linear Models to Machine Learning. CRC Press, USA.
Mohammed, M., Khan, M.B., Bashier, E.B.M., 2016. Machine Learning: Algorithms and Applications. CRC Press.
Olive, D.J., 2017. Linear Regression. Springer.
Ramasubramanian, K., Singh, A., 2018. Machine Learning Using R: With Time Series and Industry-Based Use Cases in R. Apress, New York, USA.
Raschka, S., 2015. Python Machine Learning. Packt Publishing Ltd, UK.
Raschka, S., Mirjalili, V., 2019. Python Machine Learning: Machine Learning and Deep Learning With Python, Scikit-Learn, and Tensor Flow 2. Packt
Publishing Ltd.
Saleh, A.M.E., Arashi, M., Kibria, B.G., 2019. Theory of Ridge Regression Estimation With Applications. vol. 285 John Wiley & Sons.
Sarkar, M.R., et al., 2015. Electricity demand forecasting of Rajshahi City in Bangladesh using fuzzy linear regression model. In: 2015 International
Conference on Electrical Engineering and Information Communication Technology (ICEEICT). IEEE.
Shalev-Shwartz, S., Ben-David, S., 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, UK.
Shukla, N., 2018. Machine Learning With Tensor Flow. Manning Publications Co.
Sra, S., Nowozin, S., Wright, S.J., 2012. Optimization for Machine Learning. MIT Press, USA.
Sulaiman, M.A., 2020. Evaluating data mining classification methods performance in internet of things applications. J. Soft Comput. Data Mining 1 (2),
11–25.
Swamynathan, M., 2019. Mastering Machine Learning With Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using
Python. Apress, New York, USA.
Yaqub, M., et al., 2020. Modeling of a full-scale sewage treatment plant to predict the nutrient removal efficiency using a long short-term memory (LSTM)
neural network. J. Water Process Eng. 37, 101388.
Zebari, D.A., et al., 2020. Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images. IEEE Access 8, 203097–203116.
Zeebaree, D.Q., et al., 2019. Machine learning and region growing for breast cancer segmentation. In: 2019 International Conference on Advanced Science
and Engineering (ICOASE). IEEE.
Chapter 2
Bat algorithm optimized extreme learning machine: A new modeling strategy for predicting river water turbidity at the United States
Salim Heddam
Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology, Hydraulics Division, Agronomy Department, Faculty of Science, Skikda, Algeria
1. Introduction
Water turbidity (TU) among other water variables has been used for a long time as an indicator of water quality in rivers,
streams, and lakes freshwater ecosystems (Zolfaghari et al., 2020), and also for monitoring water contamination and
guiding pollution control (Gu et al., 2020). The concentration of TU in water comes from the high concentration of the
suspended solids caused by watershed runoff (Park et al., 2017), and it is often used as an indicator of the intensity of light
scattering (Gelda et al., 2009; Gelda and Effler, 2007). The water clarity and transparency are measured and evaluated using
the turbidity which is related to the scattering of light (Al-Yaseri et al., 2013). A high concentration of TU in freshwater can
cause serious problems and lead to a deterioration of the water quality that can cause serious health problems, affecting the
metabolic activity and leading to a significant increase in the net sedimentation rate (Gelda et al., 2013). In a study conducted in the Niulan River, China, it was demonstrated that high levels of turbidity originated from three sources, namely: interflows and underflows that caused the sudden spikes, strong mixing caused by the floods, and the very low settling
velocity of the very fine incoming sediments (Zhang and Wu, 2020). In addition, it was reported that the higher the TU
concentration in water, the higher the esthetic impairments manifested (Gelda and Effler, 2007). River TU is highly correlated to river discharge (Q) and the relation between TU and Q is a complex and dynamic process (Mather and Johnson,
2014). Water TU can be measured directly using in-situ sensors and calculated using various indirect methods based on the
application of different kinds of models. Over the years, several models have been developed and proposed for predicting
TU and mainly based on the artificial intelligence paradigms or remote sensing data.
Rajaee and Jafari (2018) applied several machine learning models for predicting daily river TU in the Blue River at Kenneth
Road, Overland Park, Kansas, United States. The authors used the standard artificial neural network (ANN), gene
expression programming (GEP), and the decision tree (DP) approaches. In addition, the proposed models were applied
combined with the discrete wavelet transforms (DWT) for improving the model’s accuracy. Based on the correlation coefficients, the explanatory variables were composed of turbidity and river discharge (Q) measured at several previous lag
times. From the obtained results, the authors demonstrated that the best accuracy was achieved using the wavelet-gene
expression programming (WGEP), compared to the wavelet-ANN (WANN) and wavelet-decision tree (WDT). In another
study, Liu and Wang (2019) compared the multiple linear regression (MLR) and the GEP models in predicting water turbidity measured at two reservoirs located in Taiwan: the Tseng-Wen and Nan-Hwa reservoirs. The authors have developed
the predictive models based on the satellite imagery obtained from the Landsat 8 satellite, and in total four inputs were
selected namely, the spectral wavelength band 2 (450–510 nm, blue), band 3 (530–590 nm, green), band 4 (640–
670 nm, red), and band 5 (850–880 nm). From the obtained results, the GEP model worked best compared to the MLR
model. Zounemat-Kermani et al. (2020) used several machine learning models in predicting river TU in Brandywine Creek, Pennsylvania,
United States, namely the online sequential extreme learning machine (OS-ELM), the ANN, the classification and
regression tree (CART), the group method of data handling (GMDH), and the response surface method (RSM) models.
The proposed machine learning models were developed using several predictors, i.e., Q, precipitation (P), water pH, suspended sediment (SS), dissolved oxygen (DO), and water temperature (TE). From the obtained results, they reported that
the best accuracy was obtained using the OSELM model, while the CART was the worst model. Gu et al. (2020) proposed a
new model for river TU retrieval using the random forest regression model (RF). The authors selected 13 bands from the
hyperspectral remote sensing data obtained by the Google earth engine (GEE) and the model was called RFE-GEE. To
demonstrate the superiority of the proposed RFE-GEE model, they compared its accuracy with those of RF, broad learning
system (BLS), bidirectional ELM (BELM), support vector regression (SVR), deep belief network, extreme learning
machine (ELM), and stacked selective ensemble-backed predictor (SSEP) models. From the obtained results, they reported
that the high accuracy was obtained using the developed RFE-GEE which ensured a 15.4% gain taking into account the
mean squared error (MSE). Allam et al. (2020) proposed the use of the Landsat 8 surface reflectance (L8SR) for predicting
TU in the Ramganga River, India. The proposed algorithm achieved a good correlation between in situ measured and calculated river TU, with R² of 0.760.
Najah et al. (2013) compared two artificial neural network models namely the MLPNN and the radial basis function
neural networks (RBFNN) for predicting river TU measured in the Johor River Basin located in Johor state, Malaysia. The
two models were developed and compared using only the total dissolved solids (TDS), and the results showed that the
RBFNN (R² = 0.80) was more accurate compared to the MLPNN (R² = 0.64). Mather and Johnson (2016) combined three
input variables namely river Q, P and air temperature (TE) for forecasting daily River TU 3 days in advance. The empirical
even model was developed using data from two USGS sites and acceptable accuracy was obtained. Tsai and Yen (2017)
used the group method of data handling algorithm (GMDH) for forecasting river TU measured at the Chiahsien Weir and its
upper stream in Taiwan. By combining the Q, P, and TU measured at the previous lag, they demonstrated that GMDH
(R = 0.975) was more accurate than the stepwise regressive (SGMDH) (R = 0.965) and achieved high accuracy. In a
recently published paper, Teixeira et al. (2020) compared MLPNN and the fuzzy inference system (FIS) in predicting river
TU using the Q and the area of the watersheds (A). According to the obtained results, the FIS model was more accurate with
Nash-Sutcliffe efficiency (NSE) of 0.860 for the validation dataset. In the same context, Iglesias et al. (2014) proposed a new modeling strategy for modeling river TU in the Nalón river basin, Northern Spain. The proposed approach used the so-called synergistic variables, which were obtained by multiplying two well-known variables: conductivity × ammonium, conductivity × pH, conductivity × dissolved oxygen, and so on. It was demonstrated that the new synergistic variables contribute significantly to the improvement of the model's performances.
According to the literature review discussed earlier, it is clear that several attempts have been made to provide general frameworks for river water TU modeling, and models based on machine learning are the most reported tools. While it was shown that river TU can be predicted very well using a combination of several water variables, we believe that the introduction of new working methods based on the use of fewer predictors will be a very promising area of research, and that the development of new modeling strategies can help improve our understanding of river TU modeling. In addition, the use of hybrid models based on the combination of standalone machine learning and several metaheuristic algorithms can help improve the models' performances. Consequently, the objective of this study is to introduce a
new kind of machine learning models called bat algorithm optimized extreme learning machine (Bat-ELM) for predicting
daily river turbidity using only river discharge. The Bat-ELM was compared to the feedforward artificial neural network
(FFNN), and the dynamic evolving neural-fuzzy inference system (DENFIS) models.
2. Study area and data
The study area for this investigation was composed of four USGS stations, two of them located on the Sprague River, Oregon, United States, and the other two in Clackamas County, Oregon, United States. The selected stations were: (i) USGS 11497500 at Sprague River near Beatty, Klamath Basin, Oregon, United States (Latitude 42°26′51.9″, Longitude 121°14′18.7″ NAD83), (ii) USGS 11501000 at Sprague River near Chiloquin, Klamath Basin, Oregon, United States (Latitude 42°35′03.5″, Longitude 121°50′54.0″ NAD83), (iii) USGS 14210000 at Clackamas River at Estacada, Oregon, United States (Latitude 45°18′00″, Longitude 122°21′10″ NAD27), and (iv) USGS 14211010 at Clackamas River near Oregon City, Oregon, United States (Latitude 45°22′46″, Longitude 122°34′34″ NAD27).
FIG. 1 Location map showing the four USGS stations selected for modeling river turbidity.
The location of the study area and the four USGS stations are shown in Fig. 1. The data from these four selected stations were used to build machine learning models for estimating river turbidity measured at a daily time scale, as a function of the river discharge. The length of the data set varied from one station to another, ranging from 990 to 6684 patterns, and the details for each station are provided in Table 1. For each station, the dataset was randomly divided into two subgroups: one for the calibration period (70%) and the rest (30%) for validation. Table 2 reports the mean, maximum, minimum, standard deviation, coefficient of variation, and coefficient of correlation with TU, i.e., Xmean, Xmax, Xmin, Sx, Cv, and R, respectively.
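A minimal sketch of this random 70%/30% split (assuming the station records are available in a pandas DataFrame with columns named "Q" and "TU"; the file name is hypothetical) is shown below.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("station_data.csv")       # hypothetical file name
X = df[["Q"]].values                       # river discharge as the single predictor
y = df["TU"].values                        # daily water turbidity
X_cal, X_val, y_cal, y_val = train_test_split(X, y, train_size=0.7, random_state=42)
print(len(X_cal), len(X_val))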
TABLE 1 Period of records for the USGS stations selected for modeling river turbidity.

Station         Begin date   End date     Total patterns   Incomplete patterns   Final patterns
USGS 11497500   01/11/2007   31/12/2015   2983             1993                  990
USGS 11501000   16/11/2007   02/09/2020   4675             711                   3964
USGS 14210000   01/07/2001   03/09/2020   7005             321                   6684
USGS 14211010   01/06/2002   03/09/2020   6670             229                   6441
TABLE 2 Summary statistics of water quality variables for the four stations.

USGS 11497500 Sprague River near Beatty, Klamath Basin, Oregon, United States
Variables   Subset       Unit   Xmean     Xmax       Xmin     Sx        Cv      R
TU          Training     FNU    7.306     47.700     1.900    7.376     1.010   1.000
TU          Validation   FNU    6.937     45.600     1.900    6.843     0.986   1.000
TU          All data     FNU    7.196     47.700     1.900    7.221     1.003   1.000
Q           Training     Kcfs   313.386   1500.000   82.600   264.966   0.845   0.503
Q           Validation   Kcfs   310.379   1520.000   82.800   270.773   0.872   0.531
Q           All data     Kcfs   312.486   1520.000   82.600   266.653   0.853   0.510

USGS 11501000 Sprague River near Chiloquin, Klamath Basin, Oregon, United States
Variables   Subset       Unit   Xmean     Xmax       Xmin      Sx        Cv      R
TU          Training     FNU    7.239     78.400     0.500     8.614     1.190   1.000
TU          Validation   FNU    7.407     63.700     0.500     8.483     1.145   1.000
TU          All data     FNU    7.290     78.400     0.500     8.575     1.176   1.000
Q           Training     Kcfs   477.993   4430.000   100.000   482.665   1.010   0.548
Q           Validation   Kcfs   501.307   4380.000   101.000   531.828   1.061   0.562
Q           All data     Kcfs   484.984   4430.000   100.000   497.999   1.027   0.552

USGS 14210000 Clackamas River at Estacada, Oregon, United States
Variables   Subset       Unit   Xmean      Xmax        Xmin      Sx         Cv      R
TU          Training     FNU    2.225      75.400      0.000     4.692      2.109   1.000
TU          Validation   FNU    2.524      78.300      0.000     5.575      2.209   1.000
TU          All data     FNU    2.314      78.300      0.000     4.975      2.149   1.000
Q           Training     Kcfs   2540.357   24800.000   589.000   2252.682   0.887   0.740
Q           Validation   Kcfs   2635.124   28900.000   601.000   2371.804   0.900   0.746
Q           All data     Kcfs   2568.781   28900.000   589.000   2289.388   0.891   0.741

USGS 14211010 Clackamas River near Oregon City, Oregon, United States
Variables   Subset       Unit   Xmean      Xmax        Xmin      Sx         Cv      R
TU          Training     FNU    3.088      100.000     0.000     6.390      2.069   1.000
TU          Validation   FNU    3.144      93.800      0.000     6.814      2.167   1.000
TU          All data     FNU    3.105      100.000     0.000     6.520      2.100   1.000
Q           Training     Kcfs   3270.327   27500.000   630.000   3115.037   0.953   0.784
Q           Validation   Kcfs   3265.621   32600.000   624.000   3312.813   1.014   0.811
Q           All data     Kcfs   3268.915   32600.000   624.000   3175.539   0.971   0.793

Xmean, mean; Xmax, maximum; Xmin, minimum; Sx, standard deviation; Cv, coefficient of variation; R, coefficient of correlation with TU; TU, water turbidity; Q, discharge; FNU, Formazin Nephelometric Unit; Kcfs, thousand cubic feet per second.
3. Methodology
3.1 Feedforward artificial neural network
Artificial neural networks (ANN) are widely used for solving a large number of problems in the area of water resources
management and now becoming a successful tool for tackling complex and nonlinear problem (Olyaie et al., 2017; Mehr
and Nourani, 2018; Hrnjica et al., 2019; Matouq et al., 2013). The success of the ANN in comparison to other regression
models was primarily due to their ability to adapt and to be flexible in extracting the nonlinear relationship between variables using a learning process (Haykin, 1999). There are a large number of ANN architectures; however, the FFNN is the
most widely used model in the literature. As the name suggests, the FFNN is composed of several layers: an input layer, hidden layers, and an output layer; generally only one hidden layer is adopted, and the available information spreads through the network from the input to the output layer. The input layer contains the independent variables (x1, x2, x3, …, xi); the hidden layer is composed of several neurons determined by trial and error, each of which receives all the input variables (xi) multiplied by their respective parameters (the weights), applies a summation function, and adds one bias to the result. The output of each hidden neuron is produced using an activation function, generally the sigmoidal function. Finally, the output layer sums the weighted outputs of the hidden neurons and uses a linear transfer function to provide the final output response. The weights and biases of the ANN model are adjusted during the training process to minimize a cost function, generally the sum of squares error calculated from the differences between the measured and predicted values. The most well-known and widely used training algorithm is back propagation (Haykin, 1999).
3.2 Dynamic evolving neural-fuzzy inference system
Evolving neural-fuzzy inference systems are intelligent models with high similarity with the classical neuro-fuzzy
approaches for which the linear and nonlinear parameters were adopted in an online manner, more precisely; the nonlinear
parameters were governed by the kind of partition of the input-output space (Škrjanc et al., 2019). DENFIS is the most
relevant evolving system introduced during the last decade (Kasabov and Song, 2002) and is mainly based on the so-called
evolving clustering method (Heddam and Kisi, 2020; Kasabov et al., 2008). From a computational point of view, the
DENFIS model can be run in two manners, namely online and offline. The first version is based on the online
training method and the model is called DENFIS_ON, while the second method is based on the offline training method and
the model is called DENFIS_OF (Kasabov and Song, 2002; Kasabov et al., 2008). The triangular fuzzy membership functions are used for both online and offline DENFIS models:
$$\mu(x) = \operatorname{mf}(x, a, b, c) = \begin{cases} 0, & x \leq a \\ \dfrac{x-a}{b-a}, & a \leq x \leq b \\ \dfrac{c-x}{c-b}, & b \leq x \leq c \\ 0, & c \leq x \end{cases} \qquad (1)$$
where b is the value of the cluster center on the x dimension, a = b − d × Dthr and c = b + d × Dthr, with d = 1.2–2; the distance threshold value, Dthr, is a clustering parameter (Kasabov and Song, 2002; Kasabov et al., 2008; Heddam et al., 2018).
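A minimal NumPy sketch of the triangular membership function of Eq. (1) is given below; the cluster centre, d, and Dthr values are assumed for illustration only.

import numpy as np

def tri_mf(x, a, b, c):
    # 0 outside [a, c], rising linearly on [a, b], falling linearly on [b, c]
    x = np.asarray(x, dtype=float)
    rise = np.clip((x - a) / (b - a), 0.0, 1.0)
    fall = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(rise, fall)

b, d, Dthr = 0.5, 1.2, 0.2          # assumed cluster centre and clustering parameters
a, c = b - d * Dthr, b + d * Dthr
x = np.linspace(0.0, 1.0, 11)
print(tri_mf(x, a, b, c))           # membership peaks at 1.0 when x equals the centre b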
During the last few years, DENFIS models have been applied for solving several engineering problems, and more details
related to its application can be found in Adnan et al. (2021), Sebbar et al. (2020), Heddam and Kisi (2020), Heddam et al.
(2018), and Kisi et al. (2019a,b). The MatLab software for DENFIS is available at https://kedri.aut.ac.nz/areas-of-expertise/data-mining-and-decision-support/neucom.
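As an illustration of Eq. (1) only, the short Python sketch below evaluates the triangular membership function for a cluster center b with a and c derived from the distance threshold Dthr; the numerical values of b, d, and Dthr are hypothetical and chosen just for the example.

```python
import numpy as np

def triangular_mf(x, a, b, c):
    """Triangular membership function of Eq. (1): zero outside [a, c],
    rising linearly on [a, b] and falling linearly on [b, c]."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(left, right)

# Cluster center b, with a and c derived from the distance threshold Dthr (d = 1.2 here)
b, d, Dthr = 5.0, 1.2, 2.0
a, c = b - d * Dthr, b + d * Dthr
print(triangular_mf([1.0, 3.0, 5.0, 7.0, 9.0], a, b, c))
```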
3.3 Bat algorithm optimized extreme learning machine
The single hidden layer feedforward neural network (SLFN) is the most relevant ANN model proposed during the last decades, not only because of its simplicity, i.e., having only one hidden layer, but also because of its robustness, high precision, and universal approximation capability. With the invention of the back-propagation training algorithm, the SLFN became famous (Hornik et al., 1989; Hornik, 1991). From a computational point of view, back propagation is used for iteratively updating all SLFN parameters (i.e., weights and biases) from the input to the output layers, which makes the total number of updated parameters high; in some cases (i.e., large data sets) the training process becomes very slow and suffers from overfitting. To meet these challenges, a new training algorithm called the extreme learning machine (ELM) was introduced (Huang et al., 2006a,b), in which the weights between the input and hidden layer are obtained directly and do not need to be updated during the training process (the so-called random generation of the hidden nodes), while those linking the hidden to the output layer are analytically determined. According to Huang et al. (2006a,b), an SLFN with N hidden layer nodes can be expressed as follows:
\( Y_j = \sum_{i=1}^{N} \beta_i \, \psi\left( w_i x_j + b_i \right), \quad j = 1, \dots, M \)   (2)
where M is the number of training samples, N is the number of hidden nodes, \( w_i \) are the input-to-hidden layer weights, \( \psi \) is the activation function, \( \beta_i \) are the hidden-to-output layer weights, \( b_i \) are the hidden node biases, and \( x_j \) corresponds to the input variable matrix. The mathematical formulation of the ELM approach can be described as:
\( H\beta = T \)   (3)
where H is the hidden layer output matrix, \( \beta \) is the weight matrix of the output layer, and T is the expected output matrix (Liu et al., 2020a; Cheng et al., 2020).
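A minimal sketch of the basic ELM training idea described above is given below (Python/NumPy): the input-to-hidden weights and biases are drawn at random and left untouched, and the output weights β are obtained analytically from Hβ = T via the Moore-Penrose pseudo-inverse. The toy data, activation choice, and number of hidden nodes are illustrative assumptions only.

```python
import numpy as np

def elm_fit(X, T, n_hidden=20, seed=42):
    """Train a basic ELM: random input-to-hidden weights/biases, then solve
    H @ beta = T for the output weights with the Moore-Penrose pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random hidden weights, never updated
    b = rng.normal(size=n_hidden)                 # random hidden biases, never updated
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                  # analytic output weights (Eq. 3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: learn a noisy nonlinear function of one input
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 1))
T = np.sin(2 * X) + 0.05 * rng.normal(size=(200, 1))
W, b, beta = elm_fit(X, T)
print(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))  # training RMSE
```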
Several metaheuristic training algorithms have been proposed during the last few years for improving the training process of ANN and ELM models, among them: the genetic algorithm (GA), particle swarm optimization (PSO), the artificial bee colony (ABC) optimization algorithm, ant colony optimization (ACO), differential evolution (DE), and the cuckoo search algorithm (CSA). In the present study, an efficient optimization method, the Bat algorithm, is introduced to optimize the ELM model; it is described below.
The Bat optimization algorithm introduced by Yang (2010) is a metaheuristic approach belonging to the category of swarm intelligence models; it was inspired by the way bats seek their prey using a special sense (Jaddi et al., 2015). The main idea behind the bat algorithm is based on the echolocation capability and social behavior of the bat population (Xie et al., 2019). From a computational point of view, the bat algorithm possesses the following three idealized rules (Shekhar et al., 2020; Liu et al., 2020b):
(i) Echolocation is used by the bats to estimate the relative distance to a food source and to obstacles in an unknown environment.
(ii) In order to search for prey, an initial velocity Vi is randomly assigned at a starting position Xi. The bats fly at the same relative velocity for different times due to different initial distances, using a fixed frequency fi ranging between two limits fmin and fmax, a varying wavelength λ, and a loudness (sound intensity) A0. According to the level of proximity to the target, the bat automatically adjusts the wavelength and pulse rate accordingly.
(iii) The loudness of the pulse is adjusted accordingly, ranging from a maximum (A0) to a minimum (Amin).
The output of this iterative process is reached over a series of iterations across a large number of candidate solutions, in which the loudness and pulse rate are updated in response to each accepted solution. The frequency, velocity, and position of each bat member are calculated as follows (Gangwar and Pathak, 2020):
\( f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \)   (4)

\( V_i^t = V_i^{t-1} + \left( X_i^{t-1} - X_{best}^t \right) f_i \)   (5)

\( X_i^t = X_i^{t-1} + V_i^t \)   (6)
where \( \beta \) is a random number ranging from 0 to 1; \( f_i \), ranging from \( f_{\min} \) to \( f_{\max} \), is the frequency used for controlling the step length (i.e., the step and range) of the bat movement and corresponds to a range of wavelengths \( [\lambda_{\min}, \lambda_{\max}] \); and \( X_{best} \) is the global best solution. During the iteration process, the updated solution is calculated as follows (Gangwar and Pathak, 2020):
\( X_{new} = X_{old} + \varepsilon A^t \)   (7)
where \( \varepsilon \) is a random number in the interval [0, 1], and \( A^t \) represents the average loudness of the bats at time t. The flowchart of the developed Bat-ELM model is shown in Fig. 2. The MatLab code of the Bat algorithm can be found at https://fr.mathworks.com/matlabcentral/fileexchange?q=Bat+algorithm.
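To illustrate Eqs. (4)–(7), the sketch below (Python/NumPy) performs one update of a small bat population; in the actual Bat-ELM each bat would encode candidate ELM parameters and the fitness would be the RMSE of the resulting model, but here the positions, dimensions, and parameter values are purely hypothetical.

```python
import numpy as np

def bat_step(X, V, X_best, f_min=-1.0, f_max=1.0, loudness=0.1, seed=1):
    """One update of the bat population following Eqs. (4)-(7): draw a frequency
    per bat, update the velocity toward the best solution, move the bats, then
    apply a small random walk scaled by the loudness."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    beta = rng.random((n, 1))
    f = f_min + (f_max - f_min) * beta        # Eq. (4)
    V_new = V + (X - X_best) * f              # Eq. (5)
    X_new = X + V_new                         # Eq. (6)
    eps = rng.random((n, dim))
    X_local = X_new + eps * loudness          # Eq. (7): local search around the new solutions
    return X_local, V_new

# Toy usage: 5 bats in 3 dimensions moving relative to the current best position
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)); V = np.zeros((5, 3)); X_best = np.zeros((1, 3))
X, V = bat_step(X, V, X_best)
print(X.round(3))
```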
3.4 Multiple linear regression
Using the MLR method, one dependent variable Y (in our study the river turbidity) is linked or correlated with several
predictor variables xi, using the following equation (Luu et al., 2021):
\( Y_i = \beta_0 + \sum_{i=1}^{K} \beta_i x_i + \varepsilon_i \)   (8)

where \( \beta_0 \) is the intercept, \( \beta_i \) are the partial regression coefficients for each predictor, and \( \varepsilon_i \) is the residual.
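For completeness, a minimal least-squares fit of Eq. (8) can be written as follows (Python/NumPy); the synthetic predictors and coefficients are assumptions made only for the example.

```python
import numpy as np

def mlr_fit(X, y):
    """Ordinary least squares for Eq. (8): returns the intercept b0 and the
    partial regression coefficients b1..bK."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Toy usage: a turbidity-like target driven by discharge plus a periodicity-like term
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 1.5 + 4.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)
b0, b = mlr_fit(X, y)
print(round(b0, 2), b.round(2))
```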
FIG. 2 Flowchart of the extreme learning machine optimized by the Bat algorithm (Bat-ELM). The Bat parameters used were: population size N = 25 (range 10–25), loudness A = 0.1, pulse rate r0 = 0.1, minimum frequency fmin = −1, maximum frequency fmax = +1, and m = 200 hidden neurons. The fitness function is the root of the mean square error between predicted (Pi) and observed (Oi) values; after generating a new solution the fitness of all bats is evaluated, one solution is selected among the best solutions, and the process iterates until the solution with the best fitness is obtained.
3.5 Performance assessment of the models
In the present chapter, the performances of the proposed models were evaluated using the coefficient of correlation (R), the Nash-Sutcliffe efficiency (NSE), the mean absolute error (MAE), and the root mean square error (RMSE), calculated as follows:
\( \mathrm{MAE} = \dfrac{1}{N}\sum_{i=1}^{N} \left| (TU_0)_i - (TU_p)_i \right|, \quad (0 \le \mathrm{MAE} < +\infty) \)   (9)

\( \mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_0)_i - (TU_p)_i \right]^2}, \quad (0 \le \mathrm{RMSE} < +\infty) \)   (10)

\( \mathrm{NSE} = 1 - \dfrac{\sum_{i=1}^{N} \left[ (TU_0)_i - (TU_p)_i \right]^2}{\sum_{i=1}^{N} \left[ (TU_0)_i - \overline{TU_0} \right]^2}, \quad (-\infty < \mathrm{NSE} \le 1) \)   (11)

\( R = \dfrac{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_0)_i - \overline{TU_0} \right]\left[ (TU_p)_i - \overline{TU_p} \right]}{\sqrt{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_0)_i - \overline{TU_0} \right]^2}\;\sqrt{\dfrac{1}{N}\sum_{i=1}^{N} \left[ (TU_p)_i - \overline{TU_p} \right]^2}}, \quad (-1 < R \le +1) \)   (12)
in which N is the number of data points, and \( TU_0 \), \( TU_p \), \( \overline{TU_0} \), and \( \overline{TU_p} \) are the measured, predicted, mean measured, and mean predicted river water turbidity, respectively.
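A compact implementation of Eqs. (9)–(12) might look like the following sketch (Python/NumPy); the short observed/predicted vectors are invented solely to exercise the function.

```python
import numpy as np

def evaluate(tu_obs, tu_pred):
    """Compute the four criteria of Eqs. (9)-(12) for observed and predicted turbidity."""
    tu_obs, tu_pred = np.asarray(tu_obs, float), np.asarray(tu_pred, float)
    mae = np.mean(np.abs(tu_obs - tu_pred))                                              # Eq. (9)
    rmse = np.sqrt(np.mean((tu_obs - tu_pred) ** 2))                                     # Eq. (10)
    nse = 1.0 - np.sum((tu_obs - tu_pred) ** 2) / np.sum((tu_obs - tu_obs.mean()) ** 2)  # Eq. (11)
    r = np.corrcoef(tu_obs, tu_pred)[0, 1]                                               # Eq. (12)
    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "R": r}

print(evaluate([2.0, 5.0, 9.0, 4.0], [2.4, 4.6, 8.1, 4.3]))
```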
4. Results and discussion
As stated above, the goal of our study was the prediction of river turbidity at four rivers located in the United States. For this purpose, four machine learning models were developed and compared according to two scenarios: (i) using the river discharge (Q) and the periodicity (i.e., year, month, and day numbers), and (ii) using only the river discharge. The RMSE, MAE, R, and NSE were calculated during the training and validation phases separately, and the obtained results were further analyzed using graphical representations. Overall, at the four stations, the river TU was poorly estimated using only the river Q compared to the estimation achieved using Q and the periodicity, and the Bat-ELM showed the best correlation among all proposed models over the four stations. Detailed results for each station are discussed hereafter.
4.1 USGS 11497500 station
The numerical results of daily river TU prediction at the USGS 11497500 station using the four machine learning models are presented in Table 3. According to Table 3, using only Q as an input variable, the DENFIS_O2, DENFIS_F2, FFNN2, and Bat-ELM2 models exhibit small variations during the validation phase, and none of them was able to correctly and accurately predict the TU concentration. The RMSE and MAE values, ranging from 5.593 to 6.230 and from 3.256 to 3.575, respectively, show the poor model performances during the validation stage. The NSE and R values were very low and did not exceed 0.331 and 0.576, respectively. However, inclusion of the periodicity guaranteed a significant improvement in the performances of all proposed models: the retrieved river TU has an NSE coefficient of no less than 0.660 for all models, the R values were superior to 0.827, and the RMSE and MAE values were no more than 3.99 and 2.26, respectively. These results imply that the four machine learning models were able to predict the river TU very accurately with the inclusion of the periodicity. Overall, the best accuracy was obtained using the Bat-ELM1, with R and NSE of 0.972 and 0.936, respectively, versus 0.905 and 0.770 for FFNN1 and 0.850 and 0.708 for DENFIS_O1.
TABLE 3 Performances of different River Turbidity models at the USGS 11497500 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.984   0.968   1.314   0.894     0.972   0.936   1.731   1.155
Bat-ELM2    0.596   0.355   5.919   3.684     0.576   0.331   5.593   3.256
DENFIS_O1   0.924   0.593   4.704   2.948     0.850   0.708   3.694   2.023
DENFIS_O2   0.916   0.364   5.876   3.630     0.530   0.170   6.230   3.575
DENFIS_F1   0.842   0.657   4.316   2.425     0.827   0.659   3.992   2.258
DENFIS_F2   0.573   0.294   6.191   4.044     0.564   0.278   5.809   3.503
FFNN1       0.979   0.959   1.490   0.964     0.905   0.770   3.276   1.787
FFNN2       0.638   0.408   5.673   3.563     0.500   0.225   6.018   3.468
FIG. 3 Scatterplots of measured against calculated turbidity at the USGS 11497500 station.
The weakest performances were obtained using the DENFIS_F1 model, with R and NSE values of 0.827 and 0.659, respectively, which are still markedly lower than those of the other three models. In addition, the Bat-ELM1 improves on the FFNN1, DENFIS_O1, and DENFIS_F1 by 47.16% and 35.37%, 53.14% and 42.19%, and 56.64% and 48.85% reductions in RMSE and MAE, respectively. Clearly, Bat-ELM1, FFNN1, and DENFIS_O1 were more accurate than DENFIS_F1, and Bat-ELM1 further improves the river TU estimation. Fig. 3 shows scatterplots of river TU values calculated by the DENFIS_O1, DENFIS_F1, FFNN1, and Bat-ELM1 models compared with the in situ measurements. A first look at the results reveals high-to-moderate agreement between calculated and measured data for all four algorithms. However, the Bat-ELM1 shows the highest accuracy with the least scattered data, followed by the FFNN1 and then the DENFIS_O1, while the most scattered data were obtained using the DENFIS_F1.
4.2 USGS 11501000 station
River TU estimates at the USGS 11501000 station obtained using the four machine learning models are compared to the in situ measured data in Fig. 4 for the validation dataset. For both the Bat-ELM1 and FFNN1 models, the simulated TU falls generally along the one-to-one line against the in situ measurements with less scattered data, and the superiority of the Bat-ELM1 is obvious; the DENFIS_O1 and DENFIS_F1 performed roughly equally with a slight difference, and they are less accurate than the Bat-ELM1 and FFNN1, with largely scattered data. Quantitative measures of all river TU comparisons are shown in Table 4 in terms of RMSE, MAE, NSE, and R values. From Table 4, the estimated river TU was poorly correlated with the in situ measured data for the models based only on river discharge, and the four machine learning models have low NSE and R values (Table 4).
FIG. 4 Scatterplots of measured against calculated turbidity at the USGS 11501000 station.
TABLE 4 Performances of different River Turbidity models at the USGS 11501000 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.941   0.885   2.921   1.582     0.937   0.877   2.972   1.748
Bat-ELM2    0.685   0.469   6.273   3.217     0.699   0.488   6.069   3.295
DENFIS_O1   0.868   0.705   4.680   1.836     0.867   0.746   4.271   2.174
DENFIS_O2   0.720   0.478   6.224   2.984     0.656   0.384   6.657   3.617
DENFIS_F1   0.877   0.764   4.187   2.101     0.871   0.754   4.206   2.250
DENFIS_F2   0.677   0.458   6.340   3.338     0.702   0.490   6.054   3.408
FFNN1       0.981   0.963   1.667   1.103     0.899   0.802   3.773   2.376
FFNN2       0.725   0.525   5.933   3.050     0.664   0.437   6.362   3.436
The NSE and R values range from 0.384 to 0.490 and from 0.656 to 0.702, respectively, and none of the models possesses an NSE value greater than 0.50. In terms of error metrics, the obtained RMSE and MAE were very high, ranging from 6.054 to 6.657 and from 3.408 to 3.617, respectively. The difference in model performances between scenarios 1 and 2 is apparent, and the significant contribution of the periodicity to the improvement of the models' accuracy is completely clear, the NSE and R values being somewhat larger and the RMSE and MAE values somewhat smaller. The RMSE and MAE of the Bat-ELM1 and FFNN1 were improved by 51.03% and 46.95%, and 40.69% and 30.85%, respectively. In addition, the RMSE and MAE of the DENFIS_O1 and DENFIS_F1 were improved by 35.84% and 39.89%, and 30.53% and 33.98%, respectively. The most significant improvement was achieved using the Bat-ELM1, for which the RMSE dropped from 6.069 to 2.972 while the MAE decreased from 3.295 to 1.748. In addition, an increase in the NSE and R values is to be expected: the R rose to almost 0.937 compared to 0.699 (a 25.40% improvement) obtained using only the river discharge, while the NSE value rose by 44.36% (from 0.488 to 0.877). The improvement in the models' accuracies is attributed to the introduction of the periodicity as an input variable combined with the discharge. Finally, comparison between the models' accuracies revealed the superiority of the Bat-ELM1, followed by the FFNN1, while the two DENFIS models have essentially the same performance. For comparison, Bat-ELM1 decreased the RMSE and MAE values of the FFNN1, DENFIS_O1, and DENFIS_F1 by 21.23% and 26.43%, 30.41% and 19.60%, and 29.34% and 22.31%, respectively.
4.3 USGS 14210000 station
Table 5 shows the numerical results obtained at the USGS 14210000 station using the machine learning models described above. During the validation phase, the minimum RMSE and MAE of the second scenario (i.e., using only Q) are reported together with the NSE and R values, showing the superiority of the Bat-ELM2 model, while the FFNN2, DENFIS_O2, and DENFIS_F2 exhibit relatively the same level of accuracy, with RMSE and MAE in the ranges 3.185–3.753 and 1.244–1.334, respectively, and the largest error values obtained by the FFNN2 (RMSE = 3.753, MAE = 1.334).
From Table 5, it can be seen that the error indices calculated for the Bat-ELM2 are generally the lowest, with RMSE and MAE of 3.185 FNU and 1.245 FNU, respectively. Across the two scenarios with and without the periodicity, scenario 1, having Q and the periodicity as input variables, shows better performances for all four machine learning models, with measurement errors (i.e., RMSE and MAE) significantly reduced. The RMSE varies from 3.396 at worst to 2.456 at best, and the MAE varies from 1.299 at worst to 1.117 at best. Quantitative comparisons of all models in terms of the RMSE, MAE, R, and NSE values between observed and simulated values, reported in Table 5, reveal that a significant percentage improvement was achieved using the Bat-ELM1 in comparison to the FFNN1, DENFIS_O1, and DENFIS_F1 models. The Bat-ELM1 increased the R and NSE values by 9.24% and 23.43% and decreased the RMSE and MAE values by 25.21% and 14.01%, respectively, in the validation phase compared to the FFNN1 model. In addition, the Bat-ELM1 decreased the RMSE and MAE values by 27.68% and 7.76% and increased the R and NSE values by 11.97% and 28.14%, respectively, in the validation phase compared with the DENFIS_O1 model. Finally, the Bat-ELM1 was more accurate than the DENFIS_F1, showing a significant decrease of the RMSE and MAE by 25.213% and 14.011%, respectively.
TABLE 5 Performances of different River Turbidity models at the USGS 14210000 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.850   0.723   2.469   1.109     0.898   0.806   2.456   1.117
Bat-ELM2    0.831   0.691   2.607   1.042     0.821   0.674   3.185   1.245
DENFIS_O1   0.808   0.594   2.989   1.160     0.802   0.629   3.396   1.211
DENFIS_O2   0.835   0.664   2.718   0.959     0.799   0.623   3.420   1.244
DENFIS_F1   0.898   0.804   2.075   0.901     0.807   0.646   3.318   1.241
DENFIS_F2   0.821   0.674   2.677   1.040     0.790   0.625   3.415   1.277
FFNN1       0.901   0.812   2.036   0.935     0.822   0.653   3.284   1.299
FFNN2       0.856   0.733   2.424   0.996     0.771   0.547   3.753   1.334
FIG. 5 Scatterplots of measured against calculated turbidity at the USGS 14210000 station.
The comparisons between simulated and in situ measured TU are shown as scatterplots in Fig. 5. The agreement is very good for the Bat-ELM1, with a coefficient of determination (R²) above 0.80, and the data are less scattered than for the other three models, for which the data are largely scattered with R² values approaching 0.670.
4.4 USGS 14211010 station
At the USGS 14211010 station (Table 6), for the four developed models, it can be concluded that both during the training and the validation phases the inclusion of the periodicity as an input variable has a marked effect on the performances of the models. During the validation phase, it is clear from the obtained results that, using only the river discharge as an input variable, the performances of the FFNN2, Bat-ELM2, DENFIS_O2, and DENFIS_F2 models were relatively similar, with a slight superiority in favor of the DENFIS_F2. An analysis of the statistical indices shows that the R and NSE values are in the ranges 0.824–0.852 and 0.679–0.726. Similarly, the RMSE and MAE range from 3.56 to 3.86 FNU and from 1.338 to 1.407, respectively. From Table 6, it is clear that the inclusion of the periodicity improves the performances of all models. Using the periodicity and Q as input variables, the best model, Bat-ELM1, had RMSE = 2.626, MAE = 1.161, R = 0.923, and NSE = 0.851, and surpasses all other models in terms of accuracy. Scatterplots of calculated versus measured river TU are given in Fig. 6. Finally, the performances of the models were evaluated and compared in terms of boxplots (Fig. 7) and a Taylor diagram (Fig. 8), showing the superiority and high performance of the Bat-ELM1 compared to all the developed models.
TABLE 6 Performances of different River Turbidity models at the USGS 14211010 station.

            Training                          Validation
Models      R       NSE     RMSE    MAE       R       NSE     RMSE    MAE
Bat-ELM1    0.934   0.872   2.289   1.145     0.923   0.851   2.626   1.161
Bat-ELM2    0.837   0.700   3.500   1.385     0.844   0.708   3.682   1.430
DENFIS_O1   0.867   0.727   3.336   1.289     0.862   0.739   3.480   1.422
DENFIS_O2   0.869   0.747   3.216   1.313     0.837   0.698   3.744   1.338
DENFIS_F1   0.931   0.867   2.329   1.062     0.847   0.704   3.705   1.226
DENFIS_F2   0.880   0.774   3.039   1.317     0.852   0.726   3.564   1.345
FFNN1       0.945   0.893   2.093   1.031     0.827   0.638   4.097   1.249
FFNN2       0.891   0.793   2.904   1.303     0.824   0.679   3.862   1.407
FIG. 6 Scatterplots of measured against calculated turbidity at the USGS 14211010 station.
FIG. 7 Box-plots of measured and calculated river turbidity (TU: RFU) for the four USGS stations (USGS 11497500, USGS 11501000, USGS 14210000, and USGS 14211010). Boxes are generated using the validation dataset and illustrate the 25th and 75th percentiles and the median; whiskers include the highest and lowest values, and the mean values are marked by a red line. M: measured; M1: FFNN1; M2: FFNN2; M3: Bat-ELM1; M4: Bat-ELM2; M5: DENFIS_O1; M6: DENFIS_O2; M7: DENFIS_F1; M8: DENFIS_F2.
FIG. 8 Taylor diagram of river turbidity (TU: RFU) illustrating the statistics of comparison between the proposed models at the four USGS stations.
5. Conclusions
As a key water quality variable, river turbidity is of great concern in a large number of environmental, water resources, and hydrological studies. In this study, first, a model for predicting the river TU using only river discharge was fitted, and the obtained results were low to moderate. Next, a nonlinear model between the river TU, the discharge, and the periodicity (i.e., day, month, and year numbers) was established using a new hybrid machine learning model (i.e., Bat-ELM). The proposed model was then applied and tested using data collected at four USGS stations. Finally, the estimation provided by the Bat-ELM was compared to those achieved using two other kinds of machine learning models, namely the FFNN and DENFIS models. The new method introduced in the present study (Bat-ELM) performed well to excellently, and notable progress in modeling the river TU was achieved; it therefore proved to be the best and most useful of the tested methods for estimating river TU. The overall accuracy of prediction was significantly improved by the inclusion of the periodicity: the correlation coefficient between the measured and predicted river TU reached 0.97, with a corresponding RMSE of 1.731. However, when the model was examined without the inclusion of the periodicity, using only the river discharge, the performance of the Bat-ELM was not the greatest and in some cases it was surpassed by the DENFIS models. The results obtained in the present study represent an encouraging record of progress achieved by the use of machine learning models and can be applied using data from other stations. Future work should focus on the performance of the proposed models using other input variables, and further research should be encouraged. The results obtained in the present chapter appear promising and highlight the overall merits of the proposed hybrid Bat-ELM. Seeing that the Bat-ELM surpasses all of the FFNN and DENFIS models at the four stations leads us to conclude that the idea of hybridizing machine learning models, i.e., the ELM, is very promising and should be extended to other machine learning models.
References
Adnan, R.M., Liang, Z., Parmar, K.S., Soni, K., Kisi, O., 2021. Modeling monthly streamflow in mountainous basin by MARS, GMDH-NN and DENFIS
using hydroclimatic data. Neural Comput. Applic. 33 (7), 2853–2871.
Allam, M., Khan, M.Y.A., Meng, Q., 2020. Retrieval of turbidity on a spatio-temporal scale using Landsat 8 SR: a case study of the Ramganga River in the
Ganges Basin, India. Appl. Sci. 10 (11), 3702. https://doi.org/10.3390/app10113702.
Al-Yaseri, I., Morgan, S., Retzlaff, W., 2013. Using turbidity to determine total suspended solids in storm-water runoff from green roofs. J. Environ. Eng.
139 (6), 822–828. https://doi.org/10.1061/(ASCE)EE.1943-7870.
Cheng, K., Gao, S., Dong, W., Yang, X., Wang, Q., Yu, H., 2020. Boosting label weighted extreme learning machine for classifying multi-label imbalanced
data. Neurocomputing 403, 360–370. https://doi.org/10.1016/j.neucom.2020.04.098.
Gangwar, S., Pathak, V.K., 2020. Dry sliding wear characteristics evaluation and prediction of vacuum casted marble dust (MD) reinforced ZA-27 alloy
composites using hybrid improved bat algorithm and ANN. Mater. Today Commun. 25, 101615. https://doi.org/10.1016/j.mtcomm.2020.101615.
Gelda, R.K., Effler, S.W., 2007. Modeling turbidity in a water supply reservoir: advancements and issues. J. Environ. Eng. 133 (2), 139–148. https://doi.
org/10.1061/(ASCE)0733-9372(2007)133:2(139).
Gelda, R.K., Effler, S.W., Peng, F., Owens, E.M., Pierson, D.C., 2009. Turbidity model for Ashokan Reservoir, New York: case study. J. Environ. Eng. 135
(9), 885–895. https://doi.org/10.1061/(ASCE)EE.1943-7870.0000048.
Gelda, R.K., Effler, S.W., Prestigiacomo, A.R., Peng, F., Effler, A.J., Wagner, B.A., et al., 2013. Characterizations and modeling of turbidity in a water
supply reservoir following an extreme runoff event. Inland Waters 3 (3), 377–390. https://doi.org/10.5268/IW-3.3.581.
Gu, K., Zhang, Y., Qiao, J., 2020. Random forest ensemble for river turbidity measurement from space remote sensing data. IEEE Trans. Instrum.
Meas. https://doi.org/10.1109/TIM.2020.2998615.
Haykin, S., 1999. Neural Networks a Comprehensive Foundation. Prentice Hall, Upper Saddle River, UK.
Heddam, S., Kisi, O., 2020. Evolving connectionist systems versus neuro-fuzzy system for estimating total dissolved gas at forebay and tailwater of dams
reservoirs. In: Intelligent Data Analytics for Decision-Support Systems in Hazard Mitigation. Springer, Singapore, pp. 109–126, https://doi.org/
10.1007/978-981-15-5772-9_6.
Heddam, S., Watts, M.J., Houichi, L., Djemili, L., Sebbar, A., 2018. Evolving connectionist systems (ECoSs): a new approach for modeling daily reference
evapotranspiration (ET0). Environ. Monit. Assess. 190 (9), 516. https://doi.org/10.1007/s10661-018-6903-0.
Hornik, K., 1991. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4 (2), 251–257. https://doi.org/10.1016/0893-6080 (91)
90009-T.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366. https://doi.org/
10.1016/0893-6080 (89)90020-8.
Hrnjica, B., Mehr, A.D., Behrem, Š., Agıralioglu, N., 2019. Genetic programming for turbidity prediction: hourly and monthly scenarios. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 25 (8), 992–997. https://doi.org/10.5505/pajes.2019.59458.
Huang, G.B., Chen, L., Siew, C.K., 2006a. Universal approximation using incremental constructive feedforward networks with random hidden nodes.
IEEE Trans. Neural Netw. 17 (4), 879–892. https://doi.org/10.1109/TNN.2006.875977.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006b. Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501. https://doi.org/
10.1016/j.neucom.2005.12.126.
Iglesias, C., Torres, J.M., Nieto, P.G., Fernández, J.A., Muñiz, C.D., Piñeiro, J.I., Taboada, J., 2014. Turbidity prediction in a river basin by using artificial
neural networks: a case study in northern Spain. Water Resour. Manage. 28 (2), 319–331. https://doi.org/10.1007/s11269-013-0487-9.
Jaddi, N.S., Abdullah, S., Hamdan, A.R., 2015. Multi-population cooperative bat algorithm-based optimization of artificial neural network model. Inf. Sci.
294, 628–644. https://doi.org/10.1016/j.ins.2014.08.050.
Kasabov, N.K., Song, Q., 2002. DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction. IEEE Trans.
Fuzzy Syst. 10 (2), 144–154. https://doi.org/10.1109/91.995117.
Kasabov, N., Song, Q., Ma, T.M., 2008. Fuzzy-neuro systems for local and personalized modelling. In: Forging New Frontiers: Fuzzy Pioneers II. Springer,
Berlin, Heidelberg, Germany, pp. 175–197.
Kisi, O., Heddam, S., Yaseen, Z.M., 2019a. The implementation of univariable scheme-based air temperature for solar radiation prediction: new development of dynamic evolving neural-fuzzy inference system model. Appl. Energy 241, 184–195. https://doi.org/10.1016/j.apenergy.2019.03.089.
Kisi, O., Khosravinia, P., Nikpour, M.R., Sanikhani, H., 2019b. Hydrodynamics of river-channel confluence: toward modeling separation zone using GEP,
MARS, M5 Tree and DENFIS techniques. Stoch. Environ. Res. Risk Assess., 1–19. https://doi.org/10.1007/s00477-019-01684-0.
Liu, L.W., Wang, Y.M., 2019. Modelling reservoir turbidity using Landsat 8 satellite imagery by gene expression programming. Water 11 (7),
1479. https://doi.org/10.3390/w11071479.
Liu, Z., Jin, W., Mu, Y., 2020a. Variances-constrained weighted extreme learning machine for imbalanced classification. Neurocomputing. https://doi.org/
10.1016/j.neucom.2020.04.052.
Liu, Q., Li, J., Wu, L., Wang, F., Xiao, W., 2020b. A novel bat algorithm with double mutation operators and its application to low-velocity impact localization problem. Eng. Appl. Artif. Intell. 90, 103505. https://doi.org/10.1016/j.engappai.2020.103505.
Luu, Q.H., Lau, M.F., Ng, S.P., Chen, T.Y., 2021. Testing multiple linear regression systems with metamorphic testing. J. Syst. Softw. 182, 111062. https://
doi.org/10.1016/j.jss.2021.111062.
Mather, A.L., Johnson, R.L., 2014. Quantitative characterization of stream turbidity-discharge behavior using event loop shape modeling and power law
parameter decorrelation. Water Resour. Res. 50 (10), 7766–7779. https://doi.org/10.1002/2014WR015417.
Mather, A.L., Johnson, R.L., 2016. Forecasting turbidity during streamflow events for two mid-Atlantic US streams. Water Resour. Manage. 30 (13),
4899–4912. https://doi.org/10.1007/s11269-016-1460-1.
Matouq, M., El-Hasan, T., Al-Bilbisi, H., Abdelhadi, M., Hindiyeh, M., Eslamian, S., Duheisat, S., 2013. The climate change implication on Jordan:
a case study using GIS and Artificial Neural Networks for weather forecasting. J. Taibah Univ. Sci. 7 (2), 44–55. https://doi.org/10.1016/j.
jtusci.2013.04.001.
Mehr, A.D., Nourani, V., 2018. Season algorithm-multigene genetic programming: a new approach for rainfall-runoff modelling. Water Resour. Manage.
32 (8), 2665–2679. https://doi.org/10.1007/s11269-018-1951-3.
Najah, A., El-Shafie, A., Karim, O.A., El-Shafie, A.H., 2013. Application of artificial neural networks for water quality prediction. Neural Comput. Applic.
22 (1), 187–201. https://doi.org/10.1007/s00521-012-0940-3.
Olyaie, E., Abyaneh, H.Z., Mehr, A.D., 2017. A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in
Delaware River. Geosci. Front. 8 (3), 517–527. https://doi.org/10.1016/j.gsf.2016.04.007.
Park, J.C., Um, M.J., Song, Y.I., Hwang, H.D., Kim, M.M., Park, D., 2017. Modeling of turbidity variation in two reservoirs connected by a water transfer
tunnel in South Korea. Sustainability 9 (6), 993. https://doi.org/10.3390/su9060993.
Rajaee, T., Jafari, H., 2018. Utilization of WGEP and WDT models by wavelet denoising to predict water quality parameters in rivers. J. Hydrol. Eng. 23
(12), 04018054. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001700.
Sebbar, A., Heddam, S., Kisi, O., Djemili, L., Houichi, L., 2020. Comparison of evolving connectionist systems (ECoS) and neural networks for modelling
daily pan evaporation from Algerian dam reservoirs. In: Negm, A.M., Bouderbala, A., Chenchouni, H., Barceló, D. (Eds.), Water Resources in
Algeria – Part I. The Handbook of Environmental Chemistry. vol. 97. Springer, Cham, Switzerland, https://doi.org/10.1007/698_2020_527.
Shekhar, C., Varshney, S., Kumar, A., 2020. Optimal control of a service system with emergency vacation using bat algorithm. J. Comput. Appl. Math. 364,
112332. https://doi.org/10.1016/j.cam.2019.06.048.
Škrjanc, I., Iglesias, J.A., Sanchis, A., Leite, D., Lughofer, E., Gomide, F., 2019. Evolving fuzzy and neuro-fuzzy approaches in clustering, regression,
identification, and classification: a survey. Inf. Sci. 490, 344–368. https://doi.org/10.1016/j.ins.2019.03.060.
Teixeira, L.C., Mariani, P.P., Pedrollo, O.C., dos Reis Castro, N.M., Sari, V., 2020. Artificial neural network and fuzzy inference system models for forecasting suspended sediment and turbidity in basins at different scales. Water Resour. Manage. 34 (11), 3709–3723. https://doi.org/10.1007/s11269-020-02647-9.
Tsai, T.M., Yen, P.H., 2017. GMDH algorithms applied to turbidity forecasting. Appl. Water Sci. 7 (3), 1151–1160. https://doi.org/10.1007/s13201-016-0458-4.
Xie, X., Qin, X., Zhou, Q., Zhou, Y., Zhang, T., Janicki, R., Zhao, W., 2019. A novel test-cost-sensitive attribute reduction approach using the binary bat
algorithm. Knowl.-Based Syst. 186, 104938. https://doi.org/10.1016/j.knosys.2019.104938.
Yang, X.S., 2010. A new metaheuristic Bat-inspired algorithm. In: González, J.R., Pelta, D.A., Cruz, C., Terrazas, G., Krasnogor, N. (Eds.), Nature
Inspired Cooperative Strategies for Optimization (NICSO 2010). Studies in Computational Intelligence, vol. 284. Springer, Berlin, Heidelberg,
Germany, https://doi.org/10.1007/978-3-642-12538-6_6.
Zhang, R., Wu, B., 2020. Environmental impacts of high water turbidity of the Niulan River to Dianchi Lake Water Diversion Project. J. Environ. Eng. 146
(1), 05019006. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001623.
Zolfaghari, K., Wilkes, G., Bird, S., Ellis, D., Pintar, K.D.M., Gottschall, N., McNairn, H., Lapen, D.R., 2020. Chlorophyll-a, dissolved organic carbon,
turbidity and other variables of ecological importance in river basins in southern Ontario and British Columbia, Canada. Environ. Monit. Assess. 192
(1), 1–16. https://doi.org/10.1007/s10661-019-7800-x.
Zounemat-Kermani, M., Alizamir, M., Fadaee, M., Sankaran Namboothiri, A., Shiri, J., 2020. Online sequential extreme learning machine in river water
quality (turbidity) prediction: a comparative study on different data mining approaches. Water Environ. J. https://doi.org/10.1111/WEJ.12630.
Chapter 3
Bayesian theory: Methods and applications
Yaser Sabzevari (a) and Saeid Eslamian (a,b)
(a) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (b) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Bayesian law expresses the relationship between dependent variables. The Bayesian relation uses a numerical estimate of
the probabilistic knowledge of the hypothesis before the observations occur, and provides a numerical estimate of the probabilistic knowledge of the hypothesis after the observations. This law for classifying phenomena is based on the probability
of occurrence or nonoccurrence of a phenomenon and is important and widely used in probability theory. If we can choose a partition of the sample space such that knowing which of the partitioned events occurred removes an important part of the uncertainty, the law becomes particularly useful (Alinezhad et al., 2020): it can be used to calculate the probability of an event conditional on the occurrence or nonoccurrence of another event. In many cases, it is difficult to calculate the probability of an incident directly; by conditioning one event on another, the probability can be calculated.
Bayesian theory has three main methods: the Bayes optimal classifier, the naive Bayes classifier, and the Bayesian network. In hydrological problems, the Bayesian network has been used most often. These networks are graphical models that represent a set of variables and their conditional dependencies by a directed acyclic graph (DAG). Bayesian network nodes represent variables that can be observed values, hidden variables, or unknown parameters. The edges of the network indicate dependencies. Each node has a probability function that contains the prior probability (for parentless nodes) or the conditional probabilities associated with the combinations of states of the parent nodes. The term Bayesian network was first coined by Judea Pearl in 1987 to emphasize the following three aspects:
1. The subjective and judgmental nature of the input information. In fact, many uncertain propositions do not involve a significant amount of historical data, and even with past historical information, judgmental information needs to be extracted in order to be able to measure uncertainty.
2. Reliance on conditional probabilities as the basis for updating information.
3. The distinction between causal states and evidential observations, with emphasis on the well-known law of Thomas Bayes (Bayes, 1763).
2. Bayesian inference
In statistical science, there are two doctrines, called the Frequentist doctrine and the Bayesian doctrine. In the Frequentist doctrine, only observations and the frequency of events are cited and problems are solved accordingly, while in the Bayesian doctrine, in addition to the observations, the information and initial beliefs of the researcher are also important and are considered in problem solving and drawing conclusions. Another difference between the two doctrines is that in the Bayesian doctrine the unknowns are random variables: unlike in the Frequentist doctrine, an unknown does not have a single fixed value but rather a probability function that assigns different probabilities to its possible values. For example, in the Frequentist doctrine a person is either sick or not, while in the Bayesian method a person can be 30% sick and 70% healthy.
In Bayesian inference, an initial estimate of the unknowns is required. This estimate is the researcher’s initial knowledge
or “Prior knowledge” which is expressed as a function of mathematical probability. Observations are then made and information about the unknowns is collected by the researcher, and using this new information, the initial probability function is
updated. By gathering more information and updating the probability functions corresponding to the unknowns, more
accurate probability distribution functions and better estimates can be obtained (Kouhestani et al., 2017).
Drayton (1978), in an introduction to the use of the Bayesian method in meta-analysis for humanities issues, argues that achieving general cause-and-effect relationships requires repeated experiments. Since such activities require
initial planning and coordination between the different researchers, and this coordination is almost impossible to implement, Drayton suggests that combined methods be used to achieve the goal in question.
3. Phases
In the Bayesian method, the three phases are as follows:
1. In the first stage, the researcher must express his or her belief about reality and pass it through the statistical filter of the expected mean, the expected variance, and the strength of belief in the initial trust. These three criteria can be based on previous experience, past research, or a combination of them. If past experience is expressed as a mean, a standard deviation, and a hypothetical sample size, there is nothing to prevent reference to past research.
2. The second stage is to collect the results of experiments or observations. This step can be done by summarizing statistics similar to those already predetermined.
3. The third stage is the combination of the likelihood and the initial belief and the formation of posterior information.
The posterior information can be newer and more informative than the original information. Combining this information with other research creates a new likelihood. These steps are repeated in the same way, leading to a new study with its own characteristics. As Drayton points out, sampling can continue as long as it covers the whole community or until the latest discrepancies are justified. This method is flexible in using different coefficients and mathematical transformations. Bayesian theory is sensitive to the sample size (n).
4. Estimates
The Bayesian method includes the classical estimators such as:
• Maximum a posteriori (MAP)
• Maximum likelihood (ML)
• Minimum mean square error (MMSE)
• Minimum average error size (MAVE), considered as a special case
The hidden Markov model, which is widely used in statistical signal processing, is an example of a Bayesian model. Bayesian inference is based on minimizing the Bayesian risk function, which is obtained using the mentioned models together with the observations and the value of the error function.
5. Bayes' theorem
Bayes' theorem provides a method for classifying phenomena based on the probability of occurrence or nonoccurrence of a phenomenon and is important and widely used in probability theory. If a partition of the hypothetical sample space can be chosen such that knowing which of the partitioned events occurred removes an important part of the uncertainty, the theorem becomes especially useful: it can be used to calculate the probability of an event conditional on the occurrence or nonoccurrence of another event. In many cases, it is difficult to calculate the probability of an incident directly; by conditioning one event on another, the probability can be calculated. This relationship is named in honor of the English philosopher Thomas Bayes and is known as Bayes' formula.
Main equation: Suppose \( B_1, \dots, B_k \) form a partition of the sample space S such that \( P(B_j) > 0 \) for every \( j = 1, \dots, k \), and suppose A is an event with \( P(A) > 0 \). Then, for \( i = 1, \dots, k \), we have:

\( P(B_i \mid A) = \dfrac{P(B_i)\, P(A \mid B_i)}{\sum_{j=1}^{k} P(B_j)\, P(A \mid B_j)} \)
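As a small numerical illustration of the main equation, the sketch below (plain Python) computes the posterior probabilities P(Bi | A) for a three-event partition; the prior and likelihood values are hypothetical.

```python
# Posterior over a partition B1..Bk given that event A occurred,
# using P(Bi|A) = P(Bi)P(A|Bi) / sum_j P(Bj)P(A|Bj).
prior = [0.5, 0.3, 0.2]          # P(Bi), hypothetical values
likelihood = [0.9, 0.5, 0.1]     # P(A|Bi), hypothetical values

evidence = sum(p * l for p, l in zip(prior, likelihood))     # P(A), law of total probability
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
print([round(p, 3) for p in posterior])                      # the posteriors sum to 1
```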
5.1 Argument of Bayes
By the definition of conditional probability, we have \( P(B_i \mid A) = \dfrac{P(B_i \cap A)}{P(A)} \). The numerator of this fraction equals \( P(B_i)\, P(A \mid B_i) \) by the multiplication rule for conditional probability, and the denominator equals \( P(A) \) by the law of total probability.
If A and B are two events, we can decompose event A as follows:

\( A = (A \cap B) \cup (A \cap B') \)

because a point in A must be either in both A and B, or in A and not in B. On the other hand, we know that \( A \cap B \) and \( A \cap B' \) are mutually exclusive, so we can write:

\( P(A) = P(AB) + P(AB') = P(A \mid B)P(B) + P(A \mid B')P(B') = P(A \mid B)P(B) + P(A \mid B')\left(1 - P(B)\right) \)
This relationship states that the probability of occurrence of event A is a weighted average of the conditional probability \( P(A \mid B) \) and the conditional probability \( P(A \mid B') \), where the weight given to each conditional probability is the probability of the condition on which A is conditioned. The above relation can be generalized as follows. Assume that the events \( B_1, B_2, \dots, B_n \) are pairwise mutually exclusive and that the following relation also holds between them:

\( S = \bigcup_{i=1}^{n} B_i \)

From this statement, it can be inferred that one of the events \( B_1, B_2, \dots, B_n \) must have occurred. On the other hand, we know that the events \( A \cap B_i \) are pairwise mutually exclusive for \( i = 1, \dots, n \), and we can write:

\( A = \bigcup_{i=1}^{n} (A \cap B_i) \)
Here we can write:

\( P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i) \)
This relation describes how \( P(A) \) can be calculated by conditioning on the events \( B_1, B_2, \dots, B_n \); in general, it states that \( P(A) \) equals a weighted average of the \( P(A \mid B_i) \), each weight being the probability of the event on which A is conditioned. Suppose now that event A has occurred and we want to calculate the probability that one of the events \( B_i \) happened:

\( P(B_i \mid A) = \dfrac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\, P(B_j)} \)
5.2 Bayesian estimation theory
Estimation theory is concerned with determining the best estimate of uncertain parameters by observing related signals, or with the recovery of a signal combined with noise. For example, given a noisy sinusoidal signal, there may be interest in obtaining its basic parameters (amplitude, frequency, phase, etc.) or in recovering the signal itself. The estimator takes a set of noisy observations as input and obtains estimates of the unspecified parameters using dynamic or statistical models. The accuracy of the estimate depends on the available data and the efficiency of the estimator.
The Bayesian model uses the data from the observed signal together with the accumulated prior probabilities of the process. Suppose we want to estimate the random variable \( \theta \) on the basis of the observed random variable y. According to Bayes' law, the density function of \( \theta \) given y is:

\( f_{\theta \mid y}(\theta \mid y) = \dfrac{f_{y \mid \theta}(y \mid \theta)\, f_{\theta}(\theta)}{f_{y}(y)} \)

where \( f_y(y) \) is a constant for a given observation and has only a scaling effect. There are two other density functions in the Bayesian formula: one is \( f_{y \mid \theta}(y \mid \theta) \), the probability of observing y provided that \( \theta \) occurs, and the other is \( f_{\theta}(\theta) \), the prior probability density function of \( \theta \).
The effect of the density functions \( f_{y \mid \theta}(y \mid \theta) \) and \( f_{\theta}(\theta) \) on \( f_{\theta \mid y}(\theta \mid y) \) depends on the form of the functions: the more peaked the function, the greater its effect, and if the function is constant, it has no effect (Sean, 2004).
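A minimal grid-based sketch of this updating (Python/NumPy) is shown below: a Gaussian prior on θ is multiplied by the likelihood of a single noisy observation and then normalized; the prior spread, noise level, and observed value are assumptions made only for the illustration.

```python
import numpy as np

# Grid approximation of the posterior f(theta|y) ∝ f(y|theta) f(theta) for a
# noisy scalar observation y = theta + noise, with a Gaussian prior on theta.
theta = np.linspace(-5, 5, 1001)                      # candidate parameter values
prior = np.exp(-0.5 * (theta / 2.0) ** 2)             # prior belief theta ~ N(0, 2^2), unnormalized
y_obs, sigma = 1.3, 0.5                               # one observation and its noise level (assumed)
likelihood = np.exp(-0.5 * ((y_obs - theta) / sigma) ** 2)
posterior = prior * likelihood
posterior /= posterior.sum()                          # normalize over the grid
print(theta[np.argmax(posterior)])                    # MAP estimate of theta
```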
5.3 Machine learning using Bayesian method
For a Bayesian approach to machine learning (or any other process), one must first:
• Formulate existing knowledge about the subject in a probabilistic way: to do this, we must model the qualitative knowledge in the form of probability distributions, independence hypotheses, and so on. This model will have unknown parameters; for each of the unknown values, a prior probability distribution is considered, which reflects our belief in the probability of each of these values before seeing the data.
• By collecting data and observing them, calculate the posterior (secondary) probability distribution.
• Using this posterior probability, come to a conclusion about the uncertainty.
• Make predictions by averaging the posterior probability values.
• Make decisions that reduce the expected posterior error.
5.4 Bayesian theory in machine learning
In machine learning, we usually look for the best hypothesis in the hypothesis space H that fits the training data D. One way to determine the best hypothesis is to look for the most probable hypothesis given the training data D and the prior probabilities of the different hypotheses, and one might expect Bayesian theory to provide such a solution.
The basis of Bayesian learning is Bayes' theorem, which makes it possible to calculate the posterior probability from the prior probabilities:
\( P(h \mid D) = \dfrac{P(D \mid h)\, P(h)}{P(D)} \)
As can be seen, \( P(h \mid D) \) decreases as \( P(D) \) increases, because the higher the probability of observing D independently of h, the less evidence D provides in support of h.
5.5 Definition of basic concepts
Assume that the hypothesis space H and the set of training examples D exist. We define the following probability values:
P(h): prior probability of hypothesis h, before viewing the training data D. If no such prior is available, all hypotheses can be given the same probability.
P(D): probability of viewing the training data D.
P(D | h): likelihood, i.e., the probability of viewing the training data D assuming that hypothesis h is true.
P(h | D): posterior probability of hypothesis h given that the training data D have been observed.
Note that the prior probability P(h) is independent of the training data, whereas the posterior probability P(h | D) reflects the effect of the training data.
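Using these definitions, the short sketch below (plain Python) computes P(h | D) for a toy hypothesis space and picks the most probable hypothesis; the prior and likelihood numbers are hypothetical.

```python
# Posterior probability of hypotheses given training data D,
# P(h|D) = P(D|h)P(h)/P(D), for a toy hypothesis space of three hypotheses.
prior = {"h1": 0.6, "h2": 0.3, "h3": 0.1}          # P(h), hypothetical values
likelihood = {"h1": 0.02, "h2": 0.10, "h3": 0.20}  # P(D|h), hypothetical values

p_data = sum(prior[h] * likelihood[h] for h in prior)               # P(D)
posterior = {h: prior[h] * likelihood[h] / p_data for h in prior}   # P(h|D)
map_hypothesis = max(posterior, key=posterior.get)                  # most probable hypothesis
print(posterior, map_hypothesis)
```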
5.6 Bayesian machine learning methods
Bayesian methods offer hypotheses that are able to predict probability. A number of Bayesian machine learning methods include the following:
• Bayes optimal classifier
• Naive Bayes classifier
• Bayesian networks
5.7 Optimal Bayes classifier
5.7.1 Background and theory
Consider now that, instead of a continuous output variable Y, we have a categorical output variable G. This model is summarized as:
• Input: \( X \in \mathbb{R}^p \) comes from a p-dimensional space.
• Output: a classification \( G \in \mathcal{G} \), where G is a random variable corresponding to the discrete output value and \( \mathcal{G} \) is the discrete output space.
• Joint distribution on the input and output: \( \Pr(X, G) = [(x_1, g_1), (x_2, g_2), \dots, (x_m, g_m)] \).
• Goal: learn a function \( f(x): \mathbb{R}^p \rightarrow \mathcal{G} \) which takes inputs from the p-dimensional input space and maps them to the discrete output space.
A first step is to decide on an appropriate loss function, as the usual "squared loss" is not appropriate for discrete outputs. Instead we will use the simple "0-1 loss" function, which is defined as follows.
Define the loss as a \( K \times K \) matrix, where \( K = \mathrm{card}(\mathcal{G}) \); the matrix has 0 on the diagonal and nonnegative values elsewhere. The loss \( L(k, l) \) is the (k, l) entry of the matrix and is the cost of classifying k as l. For example, in the case of three classes, we could get

\( L = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \)

which means we can write the 0-1 loss function as:

\( L(k, l) = \begin{cases} 0 & \text{if } k = l \\ 1 & \text{if } k \ne l \end{cases}, \qquad L(k, l) = \tau(k \ne l) = 1 - \tau(k = l) \)
The Expected Predicted Error (EPE) is therefore:

\( \mathrm{EPE}\left[ \hat{f}(x) \right] = E\left[ L\left( G, \hat{f}(X) \right) \right] \)

where the expectation is taken with respect to the joint distribution Pr(X, G). Again we can condition on X to obtain

\( \mathrm{EPE}\left[ \hat{f}(x) \right] = E_X \left[ E_{G \mid X} \left[ L\left( G, \hat{f}(X) \right) \mid X \right] \right] = E_X \left[ \sum_{k=1}^{K} L\left( k, \hat{f}(X) \right) \Pr(k \mid X) \right] \)

where \( k = 1, \dots, K \) are all the possible values that the random variable G can take, i.e., the set \( \mathcal{G} \). Note that this is the discrete version, which is analogous to the derivations discussed in the previous section.
As we want to minimize the expected loss, we can do the following:

\( \hat{f}(X) = \arg\min_{g} \sum_{k=1}^{K} L(k, g)\, \Pr(k \mid X) = \arg\min_{g} \sum_{k=1}^{K} \left( 1 - \tau(k = g) \right) \Pr(k \mid X) = \arg\max_{g} \sum_{k=1}^{K} \tau(k = g)\, \Pr(k \mid X) \)

Since the indicator function is 1 when \( k = g \), we get

\( \hat{f}(X) = \arg\max_{g} \Pr(g \mid X) = \mathrm{MAP} \)
In other words, the optimal Bayes decision rule is to choose the class presenting the maximum posterior probability, given the particular observation at hand. Classifiers such as these are called Bayes optimal classifiers or maximum a posteriori (MAP) classifiers.
Since, for a given observation x, the marginal distribution p(x) is constant in the denominator of Bayes' theorem, we can simplify this decision rule further as:

\( \hat{f}(X) = \arg\max_{g} \Pr(g \mid X = x) = \arg\max_{g} \dfrac{\Pr(x \mid g)\, p(g)}{p(x)} = \arg\max_{g} \Pr(x \mid g)\, p(g) = \arg\max_{g} \left[ \log \Pr(x \mid g) + \log p(g) \right] \)

This form makes clear that the MAP decision rule tries to reach a compromise between the a priori expectations p(g) and the evidence provided by the data via the likelihood function p(x | g).
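A minimal sketch of this MAP decision rule is given below (Python/NumPy), assuming, purely for illustration, two classes with Gaussian class-conditional densities on a single feature; the priors and density parameters are hypothetical.

```python
import numpy as np

def bayes_optimal_class(x, priors, class_conditional_pdfs):
    """MAP decision rule: pick argmax_g [log p(x|g) + log p(g)] over the classes."""
    scores = {g: np.log(pdf(x)) + np.log(priors[g])
              for g, pdf in class_conditional_pdfs.items()}
    return max(scores, key=scores.get)

def gaussian_pdf(mu, sigma):
    """Return a one-dimensional Gaussian density function with mean mu and std sigma."""
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = {"g1": 0.7, "g2": 0.3}
pdfs = {"g1": gaussian_pdf(0.0, 1.0), "g2": gaussian_pdf(3.0, 1.0)}
print(bayes_optimal_class(0.5, priors, pdfs), bayes_optimal_class(2.8, priors, pdfs))
```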
The optimal Bayes classifier chooses the class that has the greatest a posteriori probability of occurrence (so-called maximum a posteriori estimation, or MAP). It can be shown that, of all classifiers, the optimal Bayes classifier is the one that has the lowest probability of misclassifying an observation, i.e., the lowest probability of error. So if we know the posterior distribution, then using the Bayes classifier is as good as it gets.
In real life, we usually do not know the posterior distribution; rather, we estimate it. The naive Bayes classifier approximates the optimal Bayes classifier by looking at the empirical distribution and by assuming independence of the predictors. So the naive Bayes classifier is not itself optimal, but it approximates the optimal solution.
5.8 Naive Bayes classifier
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on Bayes' theorem under the assumption of independence between the predictor variables. The Bayesian method is simply a method of classifying phenomena based on the probability of occurrence or nonoccurrence of a phenomenon. This method is one of the simplest forecasting algorithms and also has acceptable accuracy (Sean, 2004). Its accuracy can be significantly increased by using kernel density estimation. The learning method in the naive Bayesian approach is supervised learning (Sean, 2004). This method was developed among information retrieval scientists decades ago and is still one of the most popular methods in document classification.
A naive Bayesian classifier assumes the independence of the prediction variables; hence, it is called naive (simple) Bayesian (Sean, 2004). There are many software applications that estimate the parameters of naive Bayes, so people can take advantage of this method to solve problems without needing Bayesian theory. Despite its simplifying design and assumptions, the naive Bayesian method is suitable for classifying most problems in the real world.
Probabilistic modeling:
Suppose we have n variables, that is, \( x = (x_1, \dots, x_n) \), and y is the output belonging to one of k classes. The purpose of modeling is to find the conditional probability of each of these k categories, that is, \( p(C_k \mid x_1, \dots, x_n) \). According to Bayes' law, this probability is (Jensen, 2001):

\( p(C_k \mid x) = \dfrac{p(C_k, x)}{p(x)} \propto p(C_k, x) \)

In other words, the conditional probability \( p(C_k \mid x) \) is proportional to the joint distribution of x and \( C_k \). According to the chain rule, this joint distribution equals:

\( p(C_k, x_1, \dots, x_n) = p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k) \)
Now, if we assume that each variable is independent of the other variables given the category \( C_k \), i.e., \( p(x_i \mid x_{i+1}, \dots, x_n, C_k) = p(x_i \mid C_k) \), then we get the following result:

\( p(C_k \mid x_1, \dots, x_n) \propto p(C_k, x_1, \dots, x_n) = p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \)

By normalizing the previous expression, the conditional probability distribution can be found, where in the equation below \( Z = p(x) = \sum_{k} p(C_k)\, p(x \mid C_k) \) is the normalization coefficient:

\( p(C_k \mid x_1, \dots, x_n) = \dfrac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \)
If the goal is to find the most probable category, the normalization coefficient Z is not needed:

\( \hat{y}(x) = \arg\max_{k \in \{1, \dots, K\}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \)
Estimation of parameters:
To build a naive Bayesian classifier, we need to estimate p(Ck) and p(xi | Ck) for all k. p(Ck) is simply obtained by calculating the percentage of the data belonging to class Ck. There are several ways to obtain p(xi | Ck); estimating multinomial distributions or normal distributions are common ways to do this (Chin et al., 2009).
In the normal-distribution estimation method, we estimate p(xi | Ck) with a normal distribution with mean μi,k and variance σ²i,k, and obtain μi,k and σi,k from the training data:

\[ p(x_i = u \mid C_k) = \frac{1}{\sqrt{2\pi \sigma_{i,k}^{2}}} \exp\!\left( -\frac{(u - \mu_{i,k})^{2}}{2 \sigma_{i,k}^{2}} \right) \]

If xi is discrete, the distribution p(xi = u | Ck) can be estimated by a multinomial distribution.
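To make the parameter estimation above concrete, the following is a minimal sketch (not from the chapter) of a Gaussian naive Bayes classifier: class priors p(Ck) are taken as class frequencies, p(xi | Ck) is a normal density with a per-class mean and variance, and prediction uses the argmax rule without the normalization coefficient Z. The toy data and class labels are hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: estimates p(C_k) and normal p(x_i | C_k)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])            # p(C_k)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])       # mu_{i,k}
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])  # sigma^2_{i,k}
        return self

    def predict(self, X):
        # log p(C_k) + sum_i log p(x_i | C_k); the normalizer Z is not needed for argmax
        log_post = []
        for prior, m, v in zip(self.priors_, self.means_, self.vars_):
            log_lik = -0.5 * (np.log(2 * np.pi * v) + (X - m) ** 2 / v).sum(axis=1)
            log_post.append(np.log(prior) + log_lik)
        return self.classes_[np.argmax(np.column_stack(log_post), axis=1)]

# Hypothetical toy data: two predictors, two classes
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.2], [3.1, 3.8]])
y = np.array([0, 0, 1, 1])
print(GaussianNaiveBayes().fit(X, y).predict(np.array([[1.1, 2.0], [2.9, 4.0]])))
```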
Advantages and disadvantages:
Research in 2004 provided theoretical reasons for the seemingly surprising effectiveness of the naive Bayes classifier, and in 2006 comprehensive comparisons were made with other classification methods such as boosted trees and random forests.
The advantages of this method include the following:
• Categorizing test data is easy and fast. It also performs well when the number of categories is more than two.
• As long as the condition of independence is met, a simple Bayesian classifier performs better than other models such as logistic regression and requires less training data.
• When the inputs are categorical, this method works better than when the inputs are numerical. For numerical inputs, it is usually assumed that they follow the normal distribution.
In addition to its advantages, this classifier also has disadvantages, including:
• If the input is categorical and the learning phase contains categories from which the classifier has seen no data, the estimated probability will be zero for that category and the classifier will not be able to categorize. To solve this problem, smoothing techniques such as the Laplace estimator can be used.
• Another disadvantage of this classifier is that it is almost impossible to achieve the condition of independence in the real world.
Applications:
Some of the uses of this classifier are as follows:
• Text categorization: Naive Bayesian classifiers are commonly used in text categorization and have a higher success rate than other methods.
• Spam filtering: One of the most popular uses of this classifier is spam filtering, in which a naive Bayesian classifier is used to identify spam e-mails. Many e-mail servers today use Bayesian spam filtering, and the method is also used in spam filtering software. Server-side filters such as Bogofilter, SpamBayes, SpamAssassin, DSPAM, and ASSP use Bayesian spam filtering techniques.
• Recommender systems: A naive Bayesian classifier combined with collaborative filtering forms a recommender system that uses machine learning and data mining techniques to filter out unseen information and predict a user's opinion on various items.
• Sentiment analysis: This classifier is used to analyze the sentiment of various texts and opinions (e.g., on social networks).
6. Bayesian network
This method is based on the calculation of conditional probabilities, i.e., Bayes' law.
A Bayesian network consists of a number of nodes that represent random variables interacting with each other. This interaction is represented by the connections between nodes (Cain, 2001). Fig. 1 shows the nodes and the relationships between them.
Definitions and concepts:
There are several definitions for Bayesian networks. Let G = (V, E) be a directed acyclic graph and let X = (Xv), v ∈ V, be a set of random variables indexed by V. X is a Bayesian network relative to G if its joint probability density function can be written as a product of the individual density functions, each conditioned on its parent variables (Russell Stuart and Norvig, 2003).
FIG. 1 The nodes (X1, X2, Y1, Y2, Y3) and the relationship between them.
\[ P(X) = \prod_{v \in V} P\!\left( X_v \mid X_{\mathrm{pa}(v)} \right) \]

where pa(v) is the set of parents of v (the nodes with an edge directed into v). For any set of random variables, the probability of each member of the joint distribution can be calculated from conditional probabilities using the chain rule as follows (Russell Stuart and Norvig, 2003):

\[ P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{v=1}^{n} P\!\left( X_v = x_v \mid X_{v+1} = x_{v+1}, \ldots, X_n = x_n \right) \]

By comparing this relationship with the definition above, we will have:

\[ P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{v=1}^{n} P\!\left( X_v = x_v \mid X_j = x_j \ \text{for each } X_j \text{ that is a parent of } X_v \right) \]

The difference between the two expressions is the conditional independence of the variables from their nondescendant nodes, given the values of their parent variables.
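As an illustration of the factorization P(X) = ∏ P(Xv | X_pa(v)), the following is a minimal sketch (not from the chapter) that evaluates the joint probability of a small, hand-specified discrete Bayesian network; the network structure and probability values are hypothetical.

```python
# Minimal sketch (hypothetical example): joint probability of a discrete Bayesian
# network as the product of parent-conditioned factors, P(X) = prod_v P(X_v | pa(v)).

# Each node maps to (list of parents, conditional probability table).
# CPT keys are tuples of parent values; values give P(node = 1 | parents).
network = {
    "Rain":      ([], {(): 0.2}),
    "Sprinkler": (["Rain"], {(0,): 0.4, (1,): 0.01}),
    "WetGrass":  (["Rain", "Sprinkler"], {(0, 0): 0.0, (0, 1): 0.9,
                                          (1, 0): 0.8, (1, 1): 0.99}),
}

def joint_probability(assignment):
    """P(X = assignment) for a full assignment of all binary nodes."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        parent_values = tuple(assignment[name] for name in parents)
        p1 = cpt[parent_values]                         # P(node = 1 | parents)
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

print(joint_probability({"Rain": 1, "Sprinkler": 0, "WetGrass": 1}))  # 0.2 * 0.99 * 0.8
```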
Creating Bayesian networks:
Creating a Bayesian network requires three steps:
• Identify important variables and their possible scenarios
• Identify the relationship between variables and express it in a graphical structure
• Evaluation of initial and conditional probabilities
It should be noted that creating a Bayesian network is a creative process in which the above steps are repeated until the desired network is reached.
The first step (identifying variables) is not always easy. Hecherman et al. have proposed the following to identify variables (Hecherman et al., 1995):
✓ Accurate identification of the modeling goals. For example, the purpose of modeling can be predictive, descriptive, or exploratory.
✓ Identify possible observations that may be related to the problem.
✓ Identify valuable subsets of these observations for the model, given the complexity of the network.
✓ Organize the observations into variables whose states are mutually exclusive.
Jensen (2001) has proposed three types of variables for the development of the Bayesian network (Jensen, 2001):
(a) Hypothesis variables: These variables are not observable (or are observed at an unjustifiable cost). Identifying these
variables is the first step in building Bayesian networks.
(b) Information variables: These types of variables are observable and provide information about hypothesis variables.
(c) Modeling variables: These variables are used for specific purposes and modeling, such as simplifying conditional
probability tables.
In the process of building a Bayesian network, variables (nodes) can be easily added or modified. The graphical structure of
this network allows variables to be added or removed without any noticeable effect on the rest of the network.
After defining the variables, the next step is to determine the graphical structure of the network. This requires identifying possible dependencies between variables and representing them as directed edges. The direction of these edges must be carefully defined. This can increase the complexity of the model; however, the modeling process must be continuously reviewed and modified in terms of dependency relationships.
The last step in building a Bayesian network is to evaluate the probability values and enter them in the node probability tables (NPTs). The NPT expresses the conditional dependencies of the related variables. Depending on the type of a variable (discrete or continuous), the NPT can be a discrete probability table or a continuous probability distribution. In parentless nodes, the NPT holds the initial (prior) probability, which can be estimated subjectively or from previous data. In a node with parents, the probability of each state of the node is evaluated under the condition of each state of its parents; therefore, the NPT of such nodes contains probability values for all possible combinations of their parent states.
In addition to the initial probabilities, the conditional probability values are also extracted from both past data sources and expert opinion. Extracting these probabilities is a difficult and time-consuming process. In many applications, the required information is limited or inaccessible. Therefore, the knowledge and experience of experts in these fields is the main source of probability data (Khodakarami et al., 2007).
Bayesian network structural learning algorithms are divided into two categories: constraint-based learning algorithms and score-based learning algorithms. The first category derives the structure from statistical tests of conditional independence and dependence between the variables (e.g., the PC and NPC algorithms). Score-based learning methods evaluate all possible relationships between nodes and select the structure with the highest score as the desired structure (Sadeghi Hesar et al., 2012). Due to their simplicity, the PC and NPC algorithms are the most widely used for training the structure of Bayesian networks.
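Constraint-based structural learning rests on statistical tests of conditional independence. As a minimal sketch of one such test (not the full PC or NPC algorithm), the following uses a Fisher z-test of partial correlation, assuming roughly Gaussian variables; the variable names and data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def fisher_z_independent(data, i, j, cond=(), alpha=0.05):
    """Test X_i independent of X_j given X_cond via partial correlation (Fisher z)."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)        # correlation matrix of the subset
    prec = np.linalg.inv(corr)                             # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])     # partial correlation of i, j | cond
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r))                    # Fisher z-transform
    stat = np.sqrt(n - len(cond) - 3) * abs(z)
    p_value = 2 * (1 - norm.cdf(stat))
    return p_value > alpha, p_value                        # True -> cannot reject independence

# Hypothetical data: x2 depends on x0, x1 is pure noise
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = 0.8 * x0 + 0.2 * rng.normal(size=500)
data = np.column_stack([x0, x1, x2])
print(fisher_z_independent(data, 0, 1))   # likely judged independent
print(fisher_z_independent(data, 0, 2))   # dependent
```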
Applications and benefits of Bayesian networks:
Bayesian networks provide a robust, comprehensive, and flexible approach to modeling risk and uncertainty. Today, the
benefits of Bayesian networks are well understood and used in a variety of areas. In recent years, researchers have
developed programs for the easy implementation of these networks that have made it possible to develop decision support
systems in a variety of applications (Fenton and Neil Martin, 2007).
In recent years, Bayesian network models for quantitative analysis of project scheduling risk (Khodakarami et al., 2007)
and new product development (Chin et al., 2009), environmental modeling (Aguilera et al., 2011), DNA analysis in legal
issues (Biedermann and Taroni, 2011), real-time flood event prediction (Biondi and De Luca, 2012), and runoff estimation
(Sadeghi Hesar et al., 2012) have expanded.
7. History of Bayesian model application in water resources
Niko and Karachian (2008), in a study using a pollutant trading-ratio system and a Bayesian network, and taking into account the one-way direction of river flow, prepared a water quality model of a river. In this study, the results of the trading-ratio system were used to train the Bayesian network. Because of the uncertainties in the river system, a new model combining Monte Carlo uncertainty analysis, the trading-ratio method, and a Bayesian network was proposed for pollutant discharge permits; in addition to providing a trading model for pollution discharge permits, it enables real-time water quality management of the river. The results showed that this model is an effective tool in the quality management of the river system.
Ghorbani and Dehghani (2016) investigated the applicability of a Bayesian network model, an artificial neural network, and gene expression programming to analyze the amount of dissolved solids in the Balkhuchai River located in Ardabil province. Quality variables including bicarbonate, chloride, sulfate, calcium, magnesium, sodium, and flow rate at the monthly time scale during the statistical period (1976–2006) were used as inputs of the Bayesian network model, and its results were compared with those of the artificial neural network and gene expression programming models. The results showed that although all three models were able to estimate the amount of dissolved solids in the water with acceptable accuracy, the Bayesian network model, with the highest correlation coefficient (0.966), the lowest root mean square error (0.094 mg/L), and the best stochasticity criterion (0.988) in the validation stage, was given priority.
Varis and Keskinen (2006) used a Bayesian network in multiobjective optimization research and explained its application in water resources and environmental management. Finally, they demonstrated the efficiency of their model with an example in the field of economic management of river water quality.
Borsuk et al. (2001) studied a river monitoring and quality management program using a Bayesian network. The program addressed the water quality problems (reduction of dissolved oxygen and increased growth of algae and toxic microorganisms) of a river in the state of Carolina in the United States. The river's qualitative parameters were divided into three categories: parameters related to water quality, biological quality, and water quality suitable for human health. Then the appropriate Bayesian network was formed. The results showed that the Bayesian network model can predict the changes that occur in the properties of the ecosystem in relation to the adopted policy.
Sun et al. (2019) improved the simulation of evapotranspiration by combining a Bayesian averaging model with surface energy balance models in China. In that study, four surface energy balance models (SEBAL, SSEB, S-SEBI, and SEBS) and the Bayesian averaging model were examined using Landsat 8 data in two arid/semiarid regions of China. The results showed that the Bayesian averaging model, with R² = 0.75, RMSE = 0.902 mm/day, and Nash coefficient = 0.746 for the high station, and R² = 0.796, RMSE = 0.602 mm/day, and Nash coefficient = 0.793 for the Sidaokiao station, predicted evapotranspiration better than the four surface energy balance models in both climates.
8. Case study of Bayesian network application in modeling of evapotranspiration
of reference plant
In a study conducted in the semiarid climate of Khorramabad in western Iran to evaluate artificial intelligence models for estimating reference crop evapotranspiration, the ability of the Bayesian network model to estimate reference evapotranspiration at monthly and daily time scales was examined. In this study, six input patterns were defined for the modeling; the PC learning algorithm (at a significance level of 5%) was used for training the network structure, and the parametric training of the network was carried out by setting two variables, the significance level and the maximum neighborhood size, according to the effect of the parameters on each other.
Table 1 shows the results of Bayesian network modeling in estimating monthly reference evapotranspiration. As can be seen, the hybrid structures show better performance: hybrid structure No. 5, with a high coefficient of determination (R² = 0.97), the lowest root mean square error in the training phase (RMSE = 1.09 mm/day), and the lowest root mean square error in the test phase (RMSE = 0.93 mm/day), performed better than the other structures and was able to simulate the reference evapotranspiration of the study area with appropriate accuracy.
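For reference, the two goodness-of-fit measures reported in Table 1 can be computed as in the following minimal sketch (not from the study); R² is evaluated here as one common variant (one minus the residual-to-total sum-of-squares ratio; the squared Pearson correlation is another), and the series values are hypothetical.

```python
import numpy as np

def r_squared(obs, sim):
    """Coefficient of determination between observed and simulated series."""
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(obs, sim):
    """Root mean square error, in the units of the series (e.g., mm/day)."""
    return np.sqrt(np.mean((obs - sim) ** 2))

# Hypothetical monthly reference evapotranspiration values (mm/day)
obs = np.array([1.2, 2.5, 4.1, 6.3, 7.8, 6.9, 5.2, 3.0])
sim = np.array([1.4, 2.3, 4.4, 6.0, 7.5, 7.2, 5.0, 3.3])
print(f"R2 = {r_squared(obs, sim):.3f}, RMSE = {rmse(obs, sim):.3f} mm/day")
```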
Fig. 2 shows the best Bayesian network structure. The main purpose of this method is to find the relationship between
reference evapotranspiration and parameters affecting it.
Fig. 3 shows the changes in the observed and computed values over time in the training phase. Based on this diagram, the model's performance at the maximum values is almost unsatisfactory: the Bayesian network model has performed poorly in estimating the maximum values and has estimated them below the actual values, which is evident throughout the figure. This can be explained by the fact that the Bayesian network determines the relationships between the variables based on conditional independence and dependence between variables and on probabilities, and the network is not well generalized. Fig. 3 also shows the model diagrams in the test phase.
The results of this study showed that the Bayesian network was weak in estimating the maximum values, which can be explained by the fact that the Bayesian network determines the relationships between the variables based on conditional independence and dependence between variables and on probabilities, and that the network was not well generalized.
TABLE 1 Bayesian network results in reference evapotranspiration modeling of Khorramabad station.

Row   Pattern   Training R²   Training RMSE   Testing R²   Testing RMSE
1     M1        0.93          2.83            0.93         2.85
2     M2        0.95          4.94            0.96         5.03
3     M3        0.96          5.59            0.96         5.72
4     M4        0.96          4.26            0.96         4.39
5     M5        0.97          1.09            0.97         0.93
6     M6        0.97          2.87            0.97         2.76
FIG. 2 Bayesian network structure used for simulation.
FIG. 3 Graph of computational and observational values (ET0, mm/day) relative to time (month).
9. Conclusions
To examine the relationship between dependent variables, Bayes' law is used. The Bayesian relation uses a numerical
estimate of the probabilistic knowledge of the hypothesis before the observations occur and provides a numerical estimate
of the probabilistic knowledge of the hypothesis after the observations. This law for classifying phenomena is based on the
probability of occurrence or nonoccurrence of a phenomenon and is important and widely used in probability theory. Due to
the application of Bayesian theory in probabilities and uncertainty problems, this method can be used in various problems
such as hydrological problems. The Bayesian network, due to its nonlinear mathematical structure, has the ability to
describe the complex nonlinear processes that occur between the input and output of any system. The Bayesian network
also provides the explicit solutions based on which the relationship between the input and output variables can be
determined.
References
Aguilera, P.A., Fernandez, A., Fernandez, R., Rumi, R., Salmeron, A., 2011. Bayesian networks in environmental modeling. Environ. Model Softw. 26
(12), 1376–1388.
Alinezhad, A., Gohari, A.R., Eslamian, S., Baghbani, R., 2020. Uncertainty analysis in climate change projection using Bayesian approach. In: World
Environmental and Water Resources Congress (ASCE), Henderson, Nevada, USA, May 17–21.
Bayes, T., 1763. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, FRS Communicated by Mr. Price, in a letter to
John Canton, AMFR S. Philos. Trans. Royal Soc. Lond. (53), 370–418.
Biedermann, A., Taroni, F., 2011. Bayesian networks for evaluating forensic DNA profiling evidence: a review and guide to literature. Forensic Sci. Int.
Genet. 6 (2), 147–157.
Biondi, D., De Luca, D.L., 2012. A Bayesian approach for real-time flood forecasting. Phys. Chem. Earth 42 (44), 91–97.
68
Handbook of hydroinformatics
Borsuk, M.E., Higdon, D., Stow, C.A., Reckhow, K.H., 2001. A Bayesian hierarchical model to predict benthic oxygen demand from organic matter
loading in estuaries and coastal zones. Ecol. Model. 143 (3), 165–181.
Cain, J., 2001. Planning Improvement in Natural Resource Management: Guideline for Using Bayesian Networks to Support the Planning and Management of Development Program in the Water Sector and Beyond. Centre for Ecology and Hydrology (CEH), Wallingford, UK.
Chin, K.S., Tang, D.W., Yang, J.B., Wang, S.Y., Wang, H., 2009. Assessing new product development project risk by Bayesian network. Expert Syst.
Appl. 36, 9879–9890.
Drayton, E.L., 1978. The Effect of Father Absence Upon Social Adjustment of Male and Female Institutionalized Juvenile Delinquents. Fordham
University, USA.
Fenton, N., Neil Martin, E., 2007. Managing Risk in the Modern World: Applications of Bayesian Networks – A Knowledge Transfer Report from the
London Mathematical Society and the Knowledge Transfer Network for Industrial Mathematics. London Mathematical Society, London, England.
Ghorbani, M.A., Dehghani, R., 2016. Comparison of Bayesian neural network and artificial neural network methods in river suspended sediment estimation (case study: Simine Road). Environ. Sci. Technol. Q. 19 (2). https://civilica.com/doc/1288926.
Hecherman, D., Mamdani, A., Wellman, M., 1995. Real-world application of Bayesian networks. Commun. ACM 3, 25–26.
Jensen, F.V., 2001. Bayesian Networks and Decision Graphs. Springer-Verlag, New York, USA.
Khodakarami, V., Fenton, N., Neil, M., 2007. Project scheduling: improved approach to incorporate uncertainty using Bayesian networks. Proj. Manage.
J. 38, 30–49.
Kouhestani, S., Eslamian, S., Besalatpour, A., 2017. The Effect of Climate change on the Zayandeh-Rud River Basin’s temperature using a Bayesian
machine learning, Soft Computing Technique. J. Water Soil Sci. 21 (1), 203–216.
Niko, M., Karachian, R., 2008. The use of Bayesian networks in the non-deterministic model of river pollution permit trading. The First Conference on
Environmental Systems Management and Planning Engineering. Civilica, Tehran. https://civilica.com/doc/50951.
Russell Stuart, J., Norvig, P., 2003. Artificial Intelligence: A Modern Approach, second ed. Upper Saddle River, New Jersey, USA, Prentice Hall. ISBN 013-790395-2.
Sadeghi Hesar, A., Tabatabaee, H., Jalali, M., 2012. Monthly rainfall forecasting using Bayesian Belief Networks. Int. Res. J. Appl. Basic Sci. 3 (11), 2226–
2231.
Sean, R.E., 2004. What is Bayesian statistics? Nat. Biotechnol. 22, 1177–1178.
Sun, S., Zhang, G., Shi, J., Grosse, R., 2019. Functional variational Bayesian neural networks. arXiv. https://arxiv.org/abs/1903.05779.
Varis, O., Keskinen, M., 2006. Policy analysis for the Tonle Sap Lake, Cambodia: a Bayesian network model approach. Int. J. Water Resour. Dev. 22 (3),
417–431.
Chapter 4
CFD models
Hossien Riahi-Madvar (a), Mohammad Mehdi Riyahi (b), and Saeid Eslamian (c, d)
(a) Department of Water Engineering, Faculty of Agriculture, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran; (b) Department of Civil Engineering, Faculty of Civil Engineering and Architecture, Shahid Chamran University of Ahvaz, Ahvaz, Iran; (c) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (d) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
In this chapter, computational fluid dynamics (CFD), as an advanced technique in hydroinformatics modeling, is presented.
Some representative applications of CFD in hydroinformatics including the one-dimensional solution of the advection-diffusion equation in pollutant transport modeling, one-dimensional solution of Saint-Venant equations for dam-break
simulation, quasi-two-dimensional solution of velocity distribution in compound rivers, three-dimensional modeling of
turbulent flow, and finally pollutant transport in rivers are introduced and numerically solved. In this chapter, different
types of CFD models are developed and used in different fields of river engineering simulations. The physically influenced
scheme (PIS) is introduced for the one-dimensional dam-break modeling via the finite volume method. PIS is used for the one-dimensional solution of the advection-diffusion equation in pollutant transport modeling and the one-dimensional solution of fully
dynamic Saint-Venant equations in dam-break simulation. For solving the quasi-two-dimensional flow in natural rivers, the
Shiono and Knight Model (SKM) with finite difference method is numerically solved. In the case of three-dimensional
modeling, seven turbulence models are used to simulate the three-dimensional turbulent flow in open channels. Finally,
three-dimensional pollutant transfer in rivers is simulated by three different numerical models. At each section, the outputs
of numerical models are compared to the analytical or measured values to evaluate the results of the techniques. This
chapter briefly introduces and applies CFD techniques in hydroinformatic modeling.
Numerical solutions of the governing equations of river flow and fluid mechanics are among the principal methods for the prediction of the flow field in hydroinformatics studies, including sediment transport, pollutant transport, open channel
hydraulics, and river engineering (Tucciarelli, 2003; Riahi-Madvar et al., 2019). The three major methods for numerical
discretization of nonlinear partial differential equations (PDEs) of fluid flow equations are the finite difference (FD), finite
element (FE), and finite volume (FV) methods (Aldrighetti, 2007). The finite volume method is widely implemented in
computational fluid dynamics (CFD) computer codes and commercial software with various discretization schemes
(Darbandi et al., 2007; Darbandi and Bostandoost, 2005). Recently, a new scheme for face flux estimations in FV is proposed based on the physical influence of flow field in gas dynamics (Darbandi et al., 2007; Darbandi and Bostandoost,
2005) as well as in dam-break simulations (Bozkus and Eslamian, 2022). In this chapter, different methods of the CFD
applications in hydroinformatics in one-, two-, and three-dimensional domains are introduced and used for the river flow
and pollutant transport simulations.
2. Numerical model of one-dimensional advection dispersion equation (1D-ADE)
Suspended sediment transport and pollutant dispersion in rivers in a one-dimensional framework are modeled by the numerical
solution of ADE, which is expressed as follows (Kashefipour and Falconer, 2002; Wu, 2007):
\[ \frac{\partial (AC)}{\partial t} + \frac{\partial (AUC)}{\partial x} = \frac{\partial}{\partial x}\left( A D_x \frac{\partial C}{\partial x} \right) + S_T \tag{1} \]
where C is the pollutant concentration, U is velocity, A is the cross-section area, t is time, x is the direction of flow, ST is the
source term, and Dx is the longitudinal dispersion coefficient.
In this section, the focus is on the one-dimensional mass dispersion, so the flow field is not important and it is supposed
that the flow field is uniform or it is predefined. Using this hypothesis, Eq. (1) transforms to:
\[ \frac{\partial C}{\partial t} + \frac{\partial (UC)}{\partial x} = \frac{\partial}{\partial x}\left( D_x \frac{\partial C}{\partial x} \right) \tag{2} \]
A cell-centered numerical grid, as presented in Fig. 1, is used to discretize the equation. The grid has N points, N − 1 interfaces, and two boundary faces. The conserved variables are determined at the cell centers and represent the average value of the cells, while the fluxes are calculated at the cell interfaces. Integrating Eq. (2) over the ith cell of length Δx, and applying Green's theorem, after simplification the following discretized implicit equation can be obtained (Seifi and Riahi-Madvar, 2019):

\[ \left[ (C)_P^{\,n+1} - (C)_P^{\,n} \right] \Delta x + U \Delta t \left[ (C)_e^{\,n+1} - (C)_w^{\,n+1} \right] = K_x \Delta t \left[ \left( \frac{\partial C}{\partial x} \right)_e^{n+1} - \left( \frac{\partial C}{\partial x} \right)_w^{n+1} \right] \tag{3} \]
In Eq. (3), the diffusion terms are in pressure form and are discretized using the central difference method, consistent with the physics of the pressure field (Darbandi et al., 2007; Darbandi and Bostandoost, 2005). The convection terms U(C)_e^{n+1} and U(C)_w^{n+1} represent the interface fluxes at the w and e faces. The interface fluxes should be determined by a proper scheme (Patankar, 1980; Wu, 2007). In this section, a physically based scheme rather than numerical interpolation is developed for these face fluxes.
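As a point of comparison for the PIS results shown later in Figs. 2 and 3, the following is a minimal sketch (not the authors' code) of an explicit first-order upwind/central finite volume update of Eq. (2) on a uniform grid; the velocity, dispersion coefficient, and grid values are illustrative only.

```python
import numpy as np

def advect_disperse_upwind(c, u, kx, dx, dt, c_in):
    """One explicit FV step of dC/dt + d(UC)/dx = d/dx(Kx dC/dx):
    first-order upwind for convection, central difference for dispersion."""
    c_new = c.copy()
    for i in range(1, len(c) - 1):
        conv = -u * (c[i] - c[i - 1]) / dx                       # upwind (u > 0 assumed)
        diff = kx * (c[i + 1] - 2.0 * c[i] + c[i - 1]) / dx**2   # central difference
        c_new[i] = c[i] + dt * (conv + diff)
    c_new[0] = c_in              # known inlet concentration
    c_new[-1] = c_new[-2]        # zero-gradient outlet
    return c_new

# Illustrative setup: 1 km reach, 25 cells, u = 0.5 m/s, Kx = 5 m^2/s
nx, dx, dt = 25, 40.0, 0.1
c = np.zeros(nx)
for _ in range(2000):            # march 200 s
    c = advect_disperse_upwind(c, u=0.5, kx=5.0, dx=dx, dt=dt, c_in=1.0)
print(np.round(c, 3))
```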
3. Physically influenced scheme
In this section, the new methodology of a physically based scheme for the derivation of face fluxes in ADE is presented. As
mentioned previously, Darbandi and Bostandoost (2005) introduced the PIS for aerospace applications. In this section, the
PIS is used in suspended sediment and pollutant dispersion modeling. Eq. (2) can be rewritten as (Seifi and Riahi-Madvar, 2019):
\[ \frac{\partial C}{\partial t} + U \frac{\partial C}{\partial x} = K_x \frac{\partial^2 C}{\partial x^2} \tag{4} \]
The terms in this equation need to be discretized in a scheme that complements the physical nature of mass and pollutant
transport. In this view, the convection term is discretized by the upwind, the central difference discretization is used for the
diffusion term, and the backward discretization is used for the unsteady term, i.e. (Seifi and Riahi-Madvar, 2019)
\[ \frac{\partial C}{\partial t} = \frac{C_e^{\,n+1} - C_e^{\,n}}{\Delta t} \tag{5} \]

\[ U \frac{\partial C}{\partial x} = U_e^{\,n+1}\, \frac{C_e^{\,n+1} - C_P^{\,n+1}}{\Delta x / 2} \tag{6} \]

\[ K_x \frac{\partial^2 C}{\partial x^2} = K_x\, \frac{C_P^{\,n+1} - 2 C_e^{\,n+1} + C_E^{\,n+1}}{(\Delta x / 2)^2} \tag{7} \]
Using this discretization in Eq. (4) and rearranging it yields an equation for the interface fluxes. This new expression for the face flux can be written as (Seifi and Riahi-Madvar, 2019):

\[ C_e^{\,n+1} = C_1\, C_P^{\,n+1} + C_2\, C_E^{\,n+1} + C_3 \tag{8} \]

in which:

\[ C_1 = \frac{2 \Delta t\, U_e^{\,n+1} + \dfrac{4 K_x \Delta t}{\Delta x}}{\Delta x + 2 \Delta t\, U_e^{\,n+1} + \dfrac{8 K_x \Delta t}{\Delta x}}, \qquad C_2 = \frac{\dfrac{4 K_x \Delta t}{\Delta x}}{\Delta x + 2 \Delta t\, U_e^{\,n+1} + \dfrac{8 K_x \Delta t}{\Delta x}} \tag{9} \]

\[ C_3 = \frac{\Delta x\, C_e^{\,n}}{\Delta x + 2 \Delta t\, U_e^{\,n+1} + \dfrac{8 K_x \Delta t}{\Delta x}} \tag{10} \]

FIG. 1 The grid layout of the one-dimensional FV (cells 1, 2, 3, …, N − 2, N − 1, N with nodes W, P, E and face e).
As can be seen, in PIS, in accordance with the physics of the phenomenon and the governing equations, there is a robust connection between the nodal variables and the intercell variables. At the inlet boundary, the known values of concentration are used, C(1, t) = C_in,t. The zero-mass-gradient condition C(N − 1, t) = C(N, t) is used for the outlet boundary. Known concentrations are used for the initial conditions, C(i, 0) = C_in,i.
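A minimal sketch (not the authors' FORTRAN implementation) of evaluating the PIS face-flux coefficients of Eqs. (9) and (10) and the resulting face concentration of Eq. (8) is given below; the numerical values are hypothetical.

```python
def pis_face_coefficients(u_e, kx, dx, dt):
    """Coefficients of Eq. (8): C_e^{n+1} = C1*C_P^{n+1} + C2*C_E^{n+1} + C3,
    with C3 = c3_factor * C_e^n (old face concentration)."""
    denom = dx + 2.0 * dt * u_e + 8.0 * kx * dt / dx
    c1 = (2.0 * dt * u_e + 4.0 * kx * dt / dx) / denom
    c2 = (4.0 * kx * dt / dx) / denom
    c3_factor = dx / denom
    return c1, c2, c3_factor

def pis_face_concentration(c_p, c_e_node, c_e_old, u_e, kx, dx, dt):
    """Face concentration C_e^{n+1} from Eq. (8), given nodal values C_P and C_E."""
    c1, c2, c3_factor = pis_face_coefficients(u_e, kx, dx, dt)
    return c1 * c_p + c2 * c_e_node + c3_factor * c_e_old

# Hypothetical values: u = 0.5 m/s, Kx = 5 m^2/s, dx = 40 m, dt = 0.1 s
print(pis_face_concentration(c_p=1.0, c_e_node=0.2, c_e_old=0.5,
                             u_e=0.5, kx=5.0, dx=40.0, dt=0.1))
```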
The PIS model results are evaluated in comparison with the analytical results given in Graf (1998). A hypothetical trapezoidal channel with a length of 1 km and a simulation duration of 200 s, discretized with 25 computational cells and a 0.1 s time step, is used. The results are presented in Figs. 2 and 3 for different values of the Courant number (Cr = uΔt/Δx) and Peclet number (P = uΔx/D).
FIG. 2 The results of PIS and upwind versus the analytical solution at P = 320 and Cr = 0.0024 (C, ppt, versus X, m).
FIG. 3 The results of PIS and upwind versus the analytical solution at P = 852 and Cr = 0.0045 (C, ppt, versus X, m).
4. Finite volume solution of Saint-Venant equations for dam-break simulation using PIS
The Saint-Venant (SV) equations are used to simulate one-dimensional unsteady flow in rivers with irregular cross-section. The SV equations in conservative form can be written as the continuity equation:

\[ \frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0 \tag{11} \]

and the momentum equation:

\[ \frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left( \frac{Q^2}{A} \right) + \frac{\partial}{\partial x}\left( g A h_c \right) = g A \left( S_0 - S_f \right) \tag{12} \]
where A = cross-sectional area, V = average velocity, Q = discharge, g = gravity acceleration, h_c = vertical distance below the free surface to the centroid of the flow cross-sectional area, S_0 = bed slope, S_f = friction slope, n = Manning coefficient, R = hydraulic radius of the river cross-section, and P = wetted perimeter of the river cross-section:

\[ Q = A V, \qquad S_f = \frac{n^2 Q |Q|}{A^2 R^{4/3}}, \qquad R = \frac{A}{P} \tag{13} \]

For a rectangular channel, we have:

\[ P = B + 2\frac{A}{B}, \qquad h_c = \frac{A}{2B} \tag{14} \]
in which B = channel width. As shown in Fig. 1, a cell-centered grid with N points and N − 1 interface cells with two boundary conditions is used. Integrating Eqs. (11) and (12) over the ith volume and applying Green's theorem, after simplification, the following discretized implicit equations can be obtained. The continuity equation:

\[ \left( A_P^{\,n+1} - A_P^{\,n} \right) \Delta x + \left( Q_e^{\,n+1} - Q_w^{\,n+1} \right) \Delta t = 0 \tag{15} \]
and the momentum equation using the finite volume approach:

\[ \left( Q_P^{\,n+1} - Q_P^{\,n} \right) \Delta x + \Delta t \left[ \left( \overline{\frac{Q}{A}} \right)_e^{n+1} Q_e^{\,n+1} - \left( \overline{\frac{Q}{A}} \right)_w^{n+1} Q_w^{\,n+1} \right] + g \Delta t \left[ (A h_c)_e^{\,n+1} - (A h_c)_w^{\,n+1} \right] = g \Delta t \Delta x \left[ A\left( S_0 - S_f \right) \right]_p^{\,n+1} \tag{16} \]
where the subscript P refers to the center of the cell, e denotes the fluxes at the east face, and w denotes the fluxes at the west face of the cell; the superscript n refers to the values at the current time step and n + 1 to the values at the future time step. The overbar notation in the momentum equation shows the simple linearization of the nonlinear terms of the momentum equation from lagged iterations. In this section, new physically based expressions for the face flux estimations at the e and w faces are developed, and the full coupling of the discretized continuity and momentum equations in the collocated grid of Fig. 1 is assessed. The linear interface fluxes (Q_e^{n+1}, Q_w^{n+1}) in the discretized continuity equation, the interface fluxes in the discretized momentum equation, and the overbar interface terms (Q̄/A)_e^{n+1} and (Q̄/A)_w^{n+1} in the momentum equation are estimated by an expression derived from the discretized form of the momentum equation, which is named the convecting flux. Although there are several methods for linearization of the momentum terms and the face flux estimations, in this section the original simple linearization schemes are used. The expressions for the face flux components can be derived from the momentum equation of SV. In this regard, the momentum equation given in Eq. (12), using Q = AV, is expanded to:
\[ \frac{\partial Q}{\partial t} + V \frac{\partial Q}{\partial x} + Q \frac{\partial V}{\partial x} + g \frac{\partial}{\partial x}\left( A h_c \right) = S \tag{17} \]
The terms in this equation are discretized in accordance with the correct physics of the flow. To achieve this purpose, they are approximated as follows:

\[ \frac{\partial Q}{\partial t} = \frac{Q_e^{\,n+1} - Q_e^{\,n}}{\Delta t} \tag{18a} \]
\[ V \frac{\partial Q}{\partial x} = \frac{Q_e^{\,n+1}}{A_e^{\,n+1}}\, \frac{Q_e^{\,n+1} - Q_p^{\,n+1}}{\Delta x / 2} \tag{18b} \]

\[ Q \frac{\partial V}{\partial x} = Q_e^{\,n+1}\, \frac{\dfrac{Q_e^{\,n+1}}{A_e^{\,n+1}} - \dfrac{Q_p^{\,n+1}}{A_p^{\,n+1}}}{\Delta x / 2} \tag{18c} \]

\[ g \frac{\partial}{\partial x}\left( A h_c \right) = g\, \frac{\left[ A h_c \right]_E^{\,n+1} - \left[ A h_c \right]_P^{\,n+1}}{\Delta x} \tag{18d} \]

\[ S = S_e^{\,n+1} \tag{18e} \]
According to the inherent physics of these terms, the convection parts are discretized with an upwind scheme, and the pressure terms are discretized by central difference. The substitution of the discretized terms into Eq. (17), with rearrangement, finally results in an expression for the face flux component at the cell faces. A compact form of the resulting equation for the e face flux can be written as:

\[ Q_e^{\,n+1} = C_1 Q_p^{\,n+1} + C_2 \left( h_{c,P}^{\,n+1} A_P^{\,n+1} - h_{c,E}^{\,n+1} A_E^{\,n+1} \right) + C_3 \tag{19} \]

in which

\[ C_1 = \frac{2 \Delta t}{C_F \Delta x}\, \overline{Q}_e^{\,n+1} \left( \frac{1}{A_P^{\,n+1}} + \frac{1}{A_e^{\,n+1}} \right) \tag{20a} \]

\[ C_F = 1 + 4 \frac{\Delta t}{\Delta x}\, \frac{\overline{Q}_e^{\,n+1}}{A_e^{\,n+1}} \tag{20b} \]

\[ C_2 = \frac{g \Delta t}{C_F \Delta x} \tag{20c} \]

\[ C_3 = \frac{S_e^{\,n+1} \Delta t}{C_F} + \frac{Q_e^{\,n}}{C_F} \tag{20d} \]

\[ S_e^{\,n+1} = g A_e^{\,n+1} \left( S_0 - \frac{n^2\, \overline{Q}_e^{\,n+1} \left| Q_e^{\,n+1} \right| \left( P_e^{\,n+1} \right)^{4/3}}{\left( A_e^{\,n+1} \right)^{10/3}} \right) \tag{20e} \]
5. Discretization of continuity equation using PIS
As stated in the previous section, the continuity equation is discretized using the convecting flux of Eqs. (19) and (20a)–(20e). The continuity equation, which is shown in the form of Eq. (15), can be rewritten as follows:

\[ A_P^{\,n+1} + \frac{\Delta t}{\Delta x}\left( Q_e^{\,n+1} - Q_w^{\,n+1} \right) = A_P^{\,n} \tag{21} \]

Replacing Eq. (19), together with the analogous expression for the w face, into Eq. (21) yields:

\[ A_P^{\,n+1} + \frac{\Delta t}{\Delta x}\left[ C_1 Q_p^{\,n+1} + C_2\left( h_{c,P}^{\,n+1} A_P^{\,n+1} - h_{c,E}^{\,n+1} A_E^{\,n+1} \right) + C_3 - D_1 Q_W^{\,n+1} - D_2\left( h_{c,W}^{\,n+1} A_W^{\,n+1} - h_{c,P}^{\,n+1} A_P^{\,n+1} \right) - D_3 \right] = A_P^{\,n} \tag{22} \]
6. Discretization of the momentum equation using PIS
Considering Eq. (16), the face fluxes are estimated by Eq. (19). The pressure terms, based on their physical meanings, are
treated as follows:
\[ (A h_c)_e^{\,n+1} = \frac{(A h_c)_E^{\,n+1} + (A h_c)_P^{\,n+1}}{2}, \qquad W_e = 1 - Cnr_e = 1 - \frac{\Delta t}{2}\, \frac{\left( \dfrac{|Q_E|}{A_E} \right)^{n} + \left( \dfrac{|Q_P|}{A_P} \right)^{n}}{\Delta x} \tag{23} \]

\[ (A h_c)_w^{\,n+1} = \frac{(A h_c)_W^{\,n+1} + (A h_c)_P^{\,n+1}}{2}, \qquad W_w = 1 - Cnr_w = 1 - \frac{\Delta t}{2}\, \frac{\left( \dfrac{|Q_P|}{A_P} \right)^{n} + \left( \dfrac{|Q_W|}{A_W} \right)^{n}}{\Delta x} \tag{24} \]

Finally, implementing Eqs. (23) and (24) in Eq. (16) yields:

\[
\begin{aligned}
\left( Q_P^{\,n+1} - Q_P^{\,n} \right) \Delta x
&+ \Delta t \left( \overline{\frac{Q}{A}} \right)_e^{n+1} \left[ C_1 Q_p^{\,n+1} + C_2\left( h_{c,P}^{\,n+1} A_P^{\,n+1} - h_{c,E}^{\,n+1} A_E^{\,n+1} \right) + C_3 \right] \\
&- \Delta t \left( \overline{\frac{Q}{A}} \right)_w^{n+1} \left[ D_1 Q_W^{\,n+1} + D_2\left( h_{c,W}^{\,n+1} A_W^{\,n+1} - h_{c,P}^{\,n+1} A_P^{\,n+1} \right) + D_3 \right] \\
&+ g \Delta t \left[ W_e\, \frac{(A h_c)_E^{\,n+1} + (A h_c)_P^{\,n+1}}{2} - W_w\, \frac{(A h_c)_W^{\,n+1} + (A h_c)_P^{\,n+1}}{2} \right]
= g \Delta t \Delta x \left[ A\left( S_0 - S_f \right) \right]_p^{\,n+1}
\end{aligned}
\tag{25}
\]
The nonlinear system of Eq. (25), together with the initial and boundary conditions, is solved using an initial guess for the nonlinear coefficients, where a direct solver is used for the 5-diagonal matrix of unknowns. A FORTRAN code developed by the first author is used to solve the system of nonlinear equations for dam-break simulation. The boundary conditions in the two test cases of dry bed and wet bed are as follows. In the case of the dry bed, the initial conditions are:

If (Xi ≤ Xdam) Then A(i, 0) = A1, Q(i, 0) = 0.0 Else A(i, 0) = Ads, Q(i, 0) = 0.0

The defined boundary conditions are Neumann boundary conditions (∂/∂x = 0.0) at the inlet and outlet boundaries, considered as an open boundary problem, so we have:

A(1, t) = A(2, t), Q(1, t) = Q(2, t), A(N, t) = A(N − 1, t), Q(N, t) = Q(N − 1, t)

In the dry bed Ads = 1e−8 and in the wet bed Ads = A2. The indices "2" and "1" refer to the downstream and upstream parts of the dam, respectively. The tests are idealized dam breaks in a rectangular channel with a dry and a wet bed. The dry-bed test is used to evaluate the performance of the PIS for waves with a very shallow front edge. In the wet-bed case, a right-traveling surge in the solution domain together with an upstream-traveling depression wave is obtained.
The test case conditions are: channel width B = 1 m; bed slope S0 = Sf = 0.0; A1 = 10 m, A2 = 0.0000001; channel length 1200 m; dam located at x = 500 m; dx = 10 m and dt = 0.3 s. Figs. 4–6 represent the numerical and exact solutions of water area A and discharge Q at 30 s after the dam failure.
As can be seen from these figures, the PIS-SV model can accurately predict the step and sharp variations of flow depth and area, including near the sharp depth changes. Fig. 7 shows the discharge obtained by PIS-SV and the exact results. From this figure, the model accurately predicts the peak flow discharge along the channel, but some discrepancies occur at low flows. It is noticed that the PIS-SV model does not use any special treatment such as a weighted water surface gradient, whereas the upwind model necessarily needs this shock-capturing technique. This reveals the capability of the newly developed model in capturing shocks in steep gradients without using any special numerical treatments.
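The "Exact" curves in Figs. 4–6 correspond to the analytical dam-break solution; assuming the dry-bed case follows the classical Ritter solution for an instantaneous dam break over a frictionless horizontal bed, a minimal sketch of that reference depth profile (with values taken from the test-case geometry above) is:

```python
import numpy as np

def ritter_dry_bed(x, t, x_dam=500.0, h0=10.0, g=9.81):
    """Ritter's analytical depth profile for an instantaneous dam break
    over a dry, frictionless, horizontal bed (used as a reference solution)."""
    c0 = np.sqrt(g * h0)                         # initial wave celerity
    xi = (x - x_dam) / t                         # similarity variable
    h = np.where(xi <= -c0, h0,                  # undisturbed reservoir
        np.where(xi >= 2 * c0, 0.0,              # dry bed ahead of the wave tip
                 (2 * c0 - xi) ** 2 / (9 * g)))  # parabolic rarefaction
    return h

x = np.linspace(0.0, 1200.0, 121)                # 1200 m channel, dam at x = 500 m
print(np.round(ritter_dry_bed(x, t=30.0), 2))    # depth profile 30 s after failure
```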
7. Quasi-two-dimensional flow simulation
The natural rivers in a compound shape have a deeper main cross-section and shallower floodplain sections.
FIG. 4 Dam break over the dry bed: comparison of PIS-SV with the exact solution of flow depth at t = 30 s.
FIG. 5 Dam break over dry bed: comparison of PIS-SV with exact solution of discharge at t = 30 s.
FIG. 6 Dam break over wet bed: comparison of PIS-SV with exact solution of area at t = 30 s.
Compound channels differ from single channels in terms of flood adjustment, cutting the flood peak, sediment transport, and lateral
variation in depth and flow velocities, etc. (Chatila, 1997). The quasi-2-D model is widely accepted for the conveyance estimation of natural rivers with compound channels (Riahi-Madvar et al., 2011).
Shiono and Knight (1989, 1991) derived the depth-averaged equation for quasi-two-dimensional flows by integrating the Navier-Stokes equations over the flow depth H. The Shiono and Knight model (SKM) is a depth-averaged model based on the RANS equations, which determines the lateral distributions of the depth-averaged velocity and the boundary shear stress across the river cross-section. The SKM is written as:
FIG. 7 Compound cross-section with solution network used in SKM method (nodes 1, 2, 3, …, N − 1, N).
\[ \rho g H S_0 - \frac{f}{8} \rho u_d^2 \sqrt{1 + \frac{1}{s^2}} + \frac{\partial}{\partial y}\left\{ \rho \lambda H^2 \left( \frac{f}{8} \right)^{1/2} u_d\, \frac{\partial u_d}{\partial y} \right\} = \frac{\partial\left[ H (\rho U V)_d \right]}{\partial y} \tag{26} \]
where H = water depth, U and V = velocities in the x and y directions, S0 = longitudinal bed slope, f = Darcy-Weisbach friction factor, s = side slope of the cross-section, ρ = fluid density, g = acceleration due to gravity, and λ = dimensionless eddy viscosity. Subscript d refers to the depth-averaged condition. In this section, the numerical solution of the quasi-two-dimensional model, as another application of the CFD models, is presented. Chatila (1997) presented laboratory observations of the distribution of depth-averaged velocity in a compound channel. The experimental channel was 29.26 m long,
0.787 m deep, and 1.498 m wide with a bed slope of 0.00069. The channel had a simple rectangular cross-section; it was
modified by using aluminum sheets to produce an asymmetrical compound shape. Velocity measurements were performed
at two stations, one 12.24 m and the other 22.76 m from the channel entrance. In this study, only the velocity measurement at
12.24 m section from the entrance is used to evaluate the numerical model and seven turbulence models. Detailed information about the instruments and measurements is given in Chatila (1997).
8. Numerical solution of quasi-two-dimensional model
By assuming X = u_d², the SKM relation is written as follows:

\[ g H S_0 - \frac{f}{8} X \sqrt{1 + \frac{1}{s^2}} + \frac{1}{2} \lambda H^2 \left( \frac{f}{8} \right)^{1/2} \frac{\partial^2 X}{\partial y^2} = \frac{G}{\rho} \tag{27} \]
Changing the following variables:

\[ P = \lambda H^2 \left( \frac{f}{8} \right)^{1/2} \tag{28a} \]

\[ T = \frac{f}{8} \sqrt{1 + \frac{1}{s^2}} \tag{28b} \]

\[ R = \frac{G}{\rho} - g H S_0 \tag{28c} \]

and replacing them into Eq. (27), the following equation is obtained:

\[ \frac{P}{2} \frac{\partial^2 X}{\partial y^2} - T X = R \tag{29} \]
By numerically solving the above equation at the flow cross-section, the lateral distributions of depth-averaged velocity are
obtained. Considering the compound cross-section according to Fig. 7 and discretizing Eq. (29) over the solution network, we have:
\[ P_i\, \frac{X_{i+1} - 2X_i + X_{i-1}}{4\Delta y} - T_i X_i = R_i \tag{30} \]
Finally, by simplifying the above relation and rearranging it according to the unknown parameter X, the following equation is obtained:

\[ \frac{P_i}{4\Delta y} X_{i-1} - \left( \frac{P_i}{2\Delta y} + T_i \right) X_i + \frac{P_i}{4\Delta y} X_{i+1} = R_i \tag{31} \]

or

\[ A X_{i-1} - B X_i + C X_{i+1} = D \tag{32} \]
By applying the above equation to all points in the cross-section, a tridiagonal system of unknowns is obtained, which can be easily solved using the Thomas algorithm. The no-slip condition at the walls (zero velocity) provides the boundary conditions for solving the equations in the SKM method. The system of tridiagonal equations is finally obtained as follows:

i = 1 → X_i = 0
i = 2 → A = 0, −B X_i + C X_{i+1} = D
i = 3, …, N − 2: A X_{i−1} − B X_i + C X_{i+1} = D
i = N − 1 → C = 0, A X_{i−1} − B X_i = D
i = N → X_i = 0     (33)
By solving the above system of equations, the variable X = u_d², and then the average velocity, is obtained at the different cross-section points. The results of the SKM model in comparison with observed values of velocity and shear stress are presented in Figs. 8–10. According to Figs. 8 and 9, it is clear that the SKM model has good capabilities in the prediction of the lateral profile of the depth-averaged velocity, and it can accurately predict strong velocity gradients between the main channel and flood plains. The boundary shear stress distribution simulated by the SKM model is compared with measurements of FCF (Ayyoubzadeh, 1997) in Fig. 10. The results of SKM in these figures show the applicability of SKM in compound channels. More details about these results are given in Riahi-Madvar et al. (2011).
FIG. 8 Lateral velocity distribution in a rectangular compound channel versus the measurements. (From Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels with overbank flows. J. Hydrol. Hydromech. 59(3), 171.)
FIG. 9 Lateral velocity distribution in a large-scale trapezoidal compound channel versus the measurements. (From Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels with overbank flows. J. Hydrol. Hydromech. 59(3), 171.)
FIG. 10 Lateral boundary shear stress distribution in a large-scale trapezoidal compound channel versus the measurements. (From Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels with overbank flows. J. Hydrol. Hydromech. 59(3), 171.)
9. 3D numerical modeling of flow in compound channel using turbulence models
The turbulent flow in a compound channel is an example of a complicated turbulent flow. In this test case, the flow field is affected by shear stresses produced by the momentum transfer between the main channel and the adjacent flood plains. Secondary flow of the second kind is generated by the anisotropic turbulence near the corners of compound cross-sections. Several researchers have used turbulence models in compound channel modeling, such as Wilson et al. (2002), Jiang
et al. (2008), Shiono et al. (2003), and Sugiyama et al. (2006), but the comparison of the accuracy of different turbulence models in predicting the lateral velocity distribution in compound channels has received less attention. In this section, the authors used seven
turbulence models in a CFD simulation and compared their results with experimental data.
With the development of computing ability, numerical simulation in the CFD models is used to study complex flow
problems. The flow of water in a straight compound channel with prismatic cross-section is investigated with a three-dimensional finite volume model which solves the Reynolds-averaged Navier-Stokes equations. In the following sections,
the mathematical equations of the flow and turbulence models, the numerical solution of the governing equations and
finally results of several turbulence models are presented.
10. Three-dimensional numerical model
The three-dimensional Navier-Stokes equations for turbulent flow, combined with turbulence models, are solved numerically to obtain the velocity field in compound channel flows. The equations are as follows:

\[ \frac{\partial U_i}{\partial x_i} = 0, \qquad i = 1, 2, 3 \tag{34} \]

\[ \frac{\partial U_i}{\partial t} + U_j \frac{\partial U_i}{\partial x_j} = \frac{1}{\rho} \frac{\partial}{\partial x_j}\left( -P \delta_{ij} - \rho \overline{u_i u_j} \right), \qquad i, j = 1, 2, 3 \tag{35} \]
U is the mean velocity, x is the spatial coordinate, P is the pressure, δ_ij is the Kronecker delta, and u is the turbulent velocity fluctuation. The last term of this equation is the turbulent (Reynolds stress) term, which is modeled by the Boussinesq approach:

\[ \overline{u_i u_j} = -\nu_t \left( \frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i} \right) + \frac{2}{3} k \delta_{ij} \tag{36} \]
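As a small illustration of Eq. (36) (where ν_t is the turbulent eddy viscosity and k the turbulent kinetic energy, defined in the following paragraph), the sketch below evaluates the Boussinesq Reynolds stresses for a given mean velocity gradient; the eddy viscosity and k values are hypothetical, whereas in the chapter ν_t is supplied by one of the seven turbulence models.

```python
import numpy as np

def reynolds_stress_boussinesq(grad_u, nu_t, k):
    """Reynolds stresses u_i u_j from Eq. (36):
    -nu_t*(dUi/dxj + dUj/dxi) + (2/3)*k*delta_ij, with grad_u[i, j] = dU_i/dx_j."""
    strain = grad_u + grad_u.T                      # dUi/dxj + dUj/dxi
    return -nu_t * strain + (2.0 / 3.0) * k * np.eye(3)

# Hypothetical values: simple shear dU1/dx2 = 0.5 1/s, nu_t = 1e-3 m^2/s, k = 0.01 m^2/s^2
grad_u = np.zeros((3, 3))
grad_u[0, 1] = 0.5
print(reynolds_stress_boussinesq(grad_u, nu_t=1e-3, k=0.01))
```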
Here, k is the turbulent kinetic energy. In this equation, the turbulent eddy viscosity ν_t is unknown and must be modeled by a turbulence model. In this section, seven turbulence models were used for the computation of the eddy viscosity: T1: Keefer eddy viscosity = 0.11 × depth × shear velocity; T2: constant nonisotropic eddy viscosity in the vertical and horizontal directions, ν_tH = 0.23 and ν_tV = 0.008; T3: constant eddy viscosity = 0.24, given by Wilson et al. (2006); T4: standard k-e; T5: local k-e (a k-e model based on the local water velocity); T6: k-w with Wilcox's wall law; and T7: k-w with k-e wall laws (Olsen, 2009; Wilson et al., 2002; Wilcox, 2000; Rodi, 1980). In this study, the numerical solution of the governing equations is done using the SSIIM model, which is a free online three-dimensional solver of flow and sediment transport in turbulent open channel flows. The final form of the general discretized equation (in the steady state) is obtained as follows:

\[ a_p \varphi_p = \sum_{nb} a_{nb} \varphi_{nb} + b \tag{37} \]
in which

\[ a_p = \sum_{nb} a_{nb} - s_{2p} \tag{38a} \]

\[ b = b_{No} + s_{1p} \tag{38b} \]

\[ b_{No} = \left[ GG_{12} \frac{\partial \varphi}{\partial x^2} + GG_{13} \frac{\partial \varphi}{\partial x^3} \right]_w^e + \left[ GG_{21} \frac{\partial \varphi}{\partial x^1} + GG_{23} \frac{\partial \varphi}{\partial x^3} \right]_s^n + \left[ GG_{31} \frac{\partial \varphi}{\partial x^1} + GG_{32} \frac{\partial \varphi}{\partial x^2} \right]_b^t \tag{38c} \]
In the above relations, the nb subscript represents the neighboring nodes around the central node p. To obtain a discretized equation including φ at the network nodes, it is necessary to consider a suitable profile between nodes. A linear profile for φ is used to discretize the diffusion fluxes, so the orthogonal flux gradient on the east side e is discretized as:

\[ \frac{\partial \varphi}{\partial x^1} = \frac{\varphi_E - \varphi_P}{\delta x^1} \tag{39} \]

In this equation, φ_E represents the value of the variable φ at the east node E. The other two gradients in the nonorthogonal flux on the east side e are discretized as follows:

\[ \frac{\partial \varphi}{\partial x^2} = \frac{\varphi_{(en)} - \varphi_{(es)}}{\delta x^2} \tag{40} \]

\[ \frac{\partial \varphi}{\partial x^3} = \frac{\varphi_{(et)} - \varphi_{(eb)}}{\delta x^3} \tag{41} \]
Also, an interpolation is performed to calculate the values of the variables on the control volume faces in a weighted linear way in the physical space, i.e.:

\[ \varphi_e = \varphi_p \left( 1 - f_{1p} \right) + \varphi_E\, f_{1p} \tag{42} \]

in which

\[ f_{1p} = \frac{\overline{Pe}}{\overline{PE}} \tag{43} \]

11. Grid generation and the flow field solution
The grid in this study was structured, composed of ΔX = 25 cm, ΔY = 7.9 cm, and 21 vertical cells, and resulted from a series of grid sensitivity analyses. In the vertical direction, grid intersections are selected at 0, 0.05, 0.1, 0.15, …, times the depth, uniformly spaced over the flow depth. The grid in vertical and plane view is given in Fig. 11. The solution field has a 29.26 m length and a 1.498 m width. The flow is steady and nearly uniform, with a depth ratio (depth in the floodplain to the main channel) of 0.138. The simulation results from the seven turbulence models are compared with measured depth-averaged longitudinal velocity profiles at 12.24 m from the upstream end. The Manning coefficient was n = 0.014.
12. Comparison of different turbulence models
In Fig. 12, the results of the seven turbulence models are compared with the observed values of the depth-averaged lateral velocity distribution. From this figure, all seven turbulence models qualitatively predict the lateral distribution of longitudinal velocity in compound channels: the flow velocity in the main channel is faster than in the floodplains, and the higher velocity gradients near the floodplains in multistage rivers are predicted fairly well. From Fig. 12, it is concluded that in the present test case the T4 and T7 models, i.e., the standard k-e and the k-w with k-e wall laws, give the best predictions of the velocity field in compound channels. The T1 case, the Keefer model, gives good predictions of the high velocities in the main channel, but the flow velocity it predicts in the floodplain is much lower than the observed values. The nonisotropic constant eddy viscosity model (T2) and the constant eddy viscosity model (T3) predict velocities in the main channel lower than observed and in the floodplains greater than observed. These models cannot truly predict the velocity gradients in compound channels because they do not use the velocity and depth variations in the eddy viscosity. The standard k-e turbulence model (T4) gives the best prediction, although in the floodplain and in the interaction zone of the shear layer it shows some discrepancies with the measurements. The local k-e model (T5) has results similar to the constant eddy viscosity models. The k-w model with Wilcox boundary wall laws (T6) gives good results in comparison with the eddy viscosity models, but its predictions are poorer than the standard k-e. Finally, the k-w with k-e wall laws is the second-best model and gives acceptable predictions. From these comparisons, it is clear that the transverse velocity field in multistage rivers is very sensitive to the turbulence model and requires further investigation with advanced turbulence models.
FIG. 11 The grid in vertical (top left and right) and horizontal (bottom) planes.
FIG. 12 Observed and simulated distribution of depth-averaged velocity for the seven turbulence models (depth-averaged velocity, m/s, versus lateral distance, m).
FIG. 13 Longitudinal velocity vectors (reference vector 1.2713 m/s, level 21).
As the first option in three-dimensional numerical modeling of turbulent flow in compound channels, the standard k-e model can be used as an acceptable model, but further analysis of flow variables such as transverse and vertical
velocities, shear stress, and Reynolds stresses is required. However, for the velocity modeling in compound channels, the
k-e model is the best one among the investigated turbulence models. Because of the best predictions of the T4 turbulence
model, it is selected as the base turbulence model, and some hydrodynamic behaviors of the flow in compound channels are investigated and interpreted numerically. The longitudinal velocity profile along the compound channel is presented in
Fig. 13. A net lateral momentum transfer from slow water in the flood plain toward the faster water in the main channel
occurs, and the high-velocity flow in the main channel pulls the low-velocity flow in the flood plain. From upstream toward
the downstream, a fully developed flow field occurs. Fig. 14 shows the secondary flows in the three sections of the flume,
one at a section near the upstream, another in the middle of the flume, and the other one at the downstream end cross-section. From this figure, it is concluded that the secondary flow decreases from the upstream to the downstream, in such
a way that the lateral mass and momentum transfer decreases. The horizontal and vertical velocity contour plots are presented in Fig. 15, and the computed flow field is given in Fig. 16.
13. Three-dimensional pollutant transfer modeling
The pollutant transport was calculated by solving the transient convection-diffusion equation for pollutant concentration:
\[ \frac{\partial c}{\partial t} + U_j \frac{\partial c}{\partial x_j} = \frac{\partial}{\partial x_j}\left( \Gamma_t \frac{\partial c}{\partial x_j} \right) \tag{44} \]
FIG. 14 Secondary flows at three locations (cross-sections no. 2, 25, and 55; reference vectors 0.0149, 0.0086, and 0.0045 m/s).
FIG. 15 Horizontal and vertical velocity contour plots.
where the Reynolds-averaged water velocity was denoted as U, and the diffusion coefficient Γt was set equal to the eddy viscosity taken from the best turbulence model achieved in the previous section (i.e., the standard k-e turbulence model). The
experimental data on pollutant transport are those from Shiono and Feng (2003). The experiments were done on a flume
with 20 m length, 0.2 m width, two depth ratios of 0.5 and 0.27, fixed bed slope of 0.0005, and the Manning’s roughness of
0.012334.
The tracer used was a fluorescent dye (Rhodamine), injected at a constant rate from a reservoir. The tracer was
injected in three positions in the deep channel, hereafter referred to as C1, C2, and C3. In the shallow channel, only
one injection point was used, referred to as S1 in Table 1.
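As a simplified illustration of Eq. (44) (not the SSIIM solver), the following sketch performs an explicit finite-difference update of a passive scalar on a uniform 3D grid with a constant streamwise velocity and diffusivity, holding one cell at a fixed concentration as a crude stand-in for the constant-rate injection; all values are hypothetical.

```python
import numpy as np

def convect_diffuse_step(c, u, gamma_t, dx, dy, dz, dt):
    """One explicit step of dc/dt + U_j dc/dx_j = d/dx_j(Gamma_t dc/dx_j),
    with convection only in x (upwind) and isotropic diffusion (central)."""
    cn = c.copy()
    adv = -u * (c[1:-1, 1:-1, 1:-1] - c[:-2, 1:-1, 1:-1]) / dx        # upwind in x
    dif = gamma_t * (
        (c[2:, 1:-1, 1:-1] - 2 * c[1:-1, 1:-1, 1:-1] + c[:-2, 1:-1, 1:-1]) / dx**2
        + (c[1:-1, 2:, 1:-1] - 2 * c[1:-1, 1:-1, 1:-1] + c[1:-1, :-2, 1:-1]) / dy**2
        + (c[1:-1, 1:-1, 2:] - 2 * c[1:-1, 1:-1, 1:-1] + c[1:-1, 1:-1, :-2]) / dz**2
    )
    cn[1:-1, 1:-1, 1:-1] = c[1:-1, 1:-1, 1:-1] + dt * (adv + dif)
    return cn

# Hypothetical grid (40 x 24 x 20 control volumes) with one continuously dosed cell
c = np.zeros((40, 24, 20))
for _ in range(500):
    c[5, 12, 10] = 1.0     # hold the injection cell at a fixed concentration (simplified source)
    c = convect_diffuse_step(c, u=0.2, gamma_t=1e-4, dx=0.5, dy=0.01, dz=0.005, dt=0.005)
print(round(float(c.max()), 4), round(float(c.sum()), 2))
```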
14. Results of pollutant transfer modeling
For three-dimensional numerical transfer modeling of neutral pollutants in the compound section, a network with 40 control
volumes in the flow direction, 24 control volumes in the lateral direction, and 20 control volumes in the vertical direction
has been used. The flow boundary conditions are like the boundary conditions used in the previous section and for solving
FIG. 16 Computational network and velocity distribution in main channel and floodplains.
TABLE 1 Dye injection locations and measurement locations.

Test case   Dye injection location (X, Y, Z)   Injection rate (mL/min)
C1          13, 0.05, 0.108                    54
C2          13, 0.1, 0.108                     54
C3          13, 0.15, 0.108                    54
S1          13, 0.1, 0.073                     33
the three-dimensional pollution-transfer equation, the boundary condition of symmetry (zero gradient) is used at the bed, downstream outlet, and side boundaries. At the water surface, the concentration is zero, and at the inlet boundary, at the specific injection points, the known injected concentration is used. The results of the numerical model are compared with experimental measurements. Fig. 17 shows the distribution of the cross-sectional concentration measured in the S1 experiment, simulated by the three-dimensional numerical model SSIIM in this study, and simulated by Shiono et al. (2003). In the laboratory measurements, as can be seen in Fig. 17, two peaks are observed, one at a width of 0.09 m and the other at a width of 0.135 m. The three-dimensional numerical model in this study was more accurate than the numerical model of Shiono et al. (2003), which assumed a fully developed, uniform flow and used two turbulence models, linear k-e and nonlinear k-e. Figs. 18–20 show the three experiments C1, C2, and C3, respectively, comparing the cross-distribution of concentration obtained from the present study with the experimental measurements and the numerical modeling of Shiono et al. (2003). The results of the three-dimensional modeling show that although the three-dimensional model has been able to predict the pattern of the cross-distribution of concentration in the compound section, it does not have the desired accuracy, like the numerical model of Shiono et al. (2003). This issue is related to the nature of turbulent flow in compound sections and its effect on the quantitative and qualitative pattern of pollutant distribution in composite sections.
FIG. 17 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the S1 experiment (C, ppm, versus Y, m).
FIG. 18 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the C1 experiment (C, ppm, versus Y, m).
FIG. 19 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the C2 experiment (C, ppm, versus Y, m).
FIG. 20 The results comparison of three-dimensional modeling of contamination transfer in the compound section with experimental measurements and the results of other researchers in the C3 experiment (C, ppm, versus Y, m).
15. Conclusions
This chapter presents the methods available in computational fluid dynamics (CFD) to solve the governing equations in
hydroinformatics modeling. Some applications of CFD in hydroinformatics, including the one-dimensional solution of
advection-diffusion equation in pollutant transport modeling, one-dimensional solution of Saint-Venant equations for
dam-break simulation, quasi-two-dimensional solution of velocity distribution in compound rivers, and three-dimensional
modeling of turbulent flow and pollutant transport in rivers, are provided. The physically influenced scheme (PIS) is presented for solving the one-dimensional solution of the advection-diffusion equation in pollutant transport modeling and the
one-dimensional solution of Saint-Venant equations for dam-break simulation. The PIS approach was initially developed
for Euler equations in gas dynamics. In this chapter, the authors extended the PIS approach to pollutant dispersion. In the 1D-ADE problem, the results of the PIS model are verified against an analytical solution. The comparison of the PIS
model with the analytical solution shows that this method is in good agreement with the analytical results. In the dam-break
problem, it is shown that the PIS model can accurately predict the step and sharp variation of the flow and it is indicated that
this model is capable of modeling high-speed open channel flow regimes. According to the quasi-two-dimensional flow
simulation section, the results of the Shiono and Knight Model (SKM) are compared with observed values of velocity, shear
stress, and discharge capacity. The performed comparisons reflect the fact that the SKM model can accurately predict
lateral velocity and shear stress profiles. In the three-dimensional numerical modeling of turbulent flow section, seven turbulence models are compared with experimental data, including the Keefer model, the nonisotropic constant eddy viscosity model, the constant eddy viscosity model, the local k-e model, the k-w model with Wilcox boundary wall laws, and the k-w model with k-e wall laws. According to the results, the T4 and T7 models, i.e., the standard k-e and k-w models, give the best predictions of the velocity field in compound channels in comparison with the other five models. A further conclusion is that the T4 model is the best model because of its generality, and the T7 model is the second best. In the study of three-dimensional pollutant transfer, the three-dimensional numerical model is compared to the
results of the numerical model of Shiono et al. (2003), which is assumed to be a fully developed and uniform flow with two
turbulent flow models, linear k-e and nonlinear k-e. The results of three-dimensional modeling show that although the
three-dimensional model can predict the pattern of cross-distribution of concentration in the compound section, it does
not have the desired accuracy, like the numerical model of Shiono et al. (2003).
References
Aldrighetti, E., 2007. Computational Hydraulic Techniques for the Saint Venant Equations in Arbitrarily Shaped Geometry. Doctoral dissertation, International Association for Hydrogen Research, Rotterdam, The Netherlands.
Ayyoubzadeh, S.A., 1997. Hydraulic Aspects of Straight-Compound Channel Flow and Bed Load Sediment Transport. PhD Thesis, The University of
Birmingham, UK.
Bozkus, Z., Eslamian, S., 2022. Simulating flood due to dam break. Chapter 25, In: Eslamian, S., Eslamian, F. (Eds.), Flood Handbook, Vol. 3: Flood
Impact and Management. Taylor and Francis, CRC Group, USA.
Chatila, J.G., 1997. Modeling of Pollutant Transport in Compound Open Channels. Doctoral dissertation, University of Ottawa, ON, Canada.
Darbandi, M., Bostandoost, S.M., 2005. A new formulation toward unifying the velocity role in collocated variable arrangement. Numer. Heat Transf.
B 47 (4), 361–382.
Darbandi, M., Mokarizadeh, V., Rouhi, E., 2007. Developing a shock-capturing formulation with high performance to capture normal standing shock in
all-speed regime. Esteghlal J. Eng. 25 (2), 167–181.
Graf, W.H., 1998. Fluvial Hydraulics: Flow and Transport Processes in Channels of Simple Geometry. In: Collaboration with M.S. Altinakar, Wiley,
England. 681 pp. ISBN 0-471-97714-4.
Jiang, H., Guo, Y., Li, C., Zhang, J., 2008. Three-dimensional numerical simulation of compound meandering open channel flow by the Reynolds stress
model. Int. J. Numer. Meth. Fluids 2008. https://doi.org/10.1002/fld.1855.
Kashefipour, S.M., Falconer, R.A., 2002. Longitudinal dispersion coefficients in natural channels. Water Res. 36 (6), 1596–1608.
Olsen, N.R.B., 2009. A Three-Dimensional Numerical Model for Simulation of Sediment Movements in Water Intakes with Multiblock Option.
Department of Hydraulic and Environmental Engineering, The Norwegian University of Science and Technology, 2002. http://folk.ntnu.no/nilsol/
ssiim/manual3.pdf. User’s manual.
Patankar, S., 1980. Numerical Heat Transfer and Fluid Flow. Hemisphere, Washington, DC. McGraw Hill, USA.
Riahi-Madvar, H., Ayyoubzadeh, S., Namin, M., Seifi, A., 2011. Uncertainty analysis of quasi-two-dimensional flow simulation in compound channels
with overbank flows. J. Hydrol. Hydromech. 59 (3), 171.
Riahi-Madvar, H., Dehghani, M., Seifi, A., Salwana, E., Shamshirband, S., Mosavi, A., Chau, K.W., 2019. Comparative analysis of soft computing
techniques RBF, MLP, and ANFIS with MLR and MNLR for predicting grade-control scour hole geometry. Eng. Appl. Comput. Fluid Mech. 13
(1), 529–550.
Rodi, W., 1980. Turbulence Models and Their Application in Hydraulics – a State-of-the-Art Review. Int. Assoc. Hydr. Res., Balkema, Rotterdam, The
Netherlands,.
Seifi, A., Riahi-Madvar, H., 2019. Improving one-dimensional pollution dispersion modeling in rivers using ANFIS and ANN-based GA optimized
models. Environ. Sci. Pollut. Res. 26 (1), 867–885.
Shiono, K., Feng, T., 2003. Turbulence measurements of dye concentration and effects of secondary flow on distribution in open channel flows. J. Hydraul.
Eng. 129 (5), 373–384.
Shiono, K., Knight, D.W., 1989. Two-dimensional analytical solution compound channel. In: Proceedings of 3rd International Symposium on Refined
Flow Modeling and Turbulence Measurements. Universal Academy Press, pp. 591–599.
Shiono, K., Knight, D.W., 1991. Turbulent open-channel flows with variable depth across the channel. J. Fluid Mech. 222, 617–646.
Shiono, K., Scott, C.F., Kearney, D., 2003. Predictions of solute transport in a compound channel using turbulence models. J. Hydraul. Res. 41 (3),
247–258.
Sugiyama, H., Hitomi, D., Saito, T., 2006. Numerical analysis of turbulent structure in compound Meandering open channel by algebraic Reynolds stress
model. Int. J. Numer. Methods Fluids 51, 791–818.
Tucciarelli, T., 2003. A new algorithm for a robust solution of the fully dynamic Saint-Venant equations. J. Hydraul. Res. 41 (3), 239–246.
Wilcox, D.C., 2000. Turbulence Modeling for CFD. DCW Industries. ISBN 0-9636051-5-1.
Wilson, C., Bates, P.D., Hervouet, J.M., 2002. Comparison of turbulence models for stage-discharge rating curve prediction in reach-scale compound
channel flows using two-dimensional finite element methods. J. Hydrol. 257 (1–4), 42–58.
Wilson, C.A.M.E., Yagci, O., Rauch, H.P., Olsen, N.R.B., 2006. 3D numerical modelling of a willow vegetated river/floodplain system. J. Hydrol. 327
(1–2), 13–21.
Wu, W., 2007. Computational River Dynamics. CRC Press, USA.
Chapter 5
Cross-validation
Amir Seraj(a), Mohammad Mohammadi-Khanaposhtani(b), Reza Daneshfar(c), Maryam Naseri(d), Mohammad Esmaeili(e), Alireza Baghban(f), Sajjad Habibzadeh(g), and Saeid Eslamian(h,i)
(a) Department of Instrumentation and Industrial Automation, Ahwaz Faculty of Petroleum Engineering, Petroleum University of Technology, Ahwaz, Iran, (b) Fouman Faculty of Engineering, College of Engineering, University of Tehran, Tehran, Iran, (c) Department of Petroleum Engineering, Ahwaz Faculty of Petroleum Engineering, Petroleum University of Technology, Ahwaz, Iran, (d) Chemical Engineering Department, Babol Noshirvani University of Technology, Babol, Iran, (e) Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, (f) Chemical Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Mahshahr Campus, Mahshahr, Iran, (g) Surface Reaction and Advanced Energy Materials Laboratory, Chemical Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, (h) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran, (i) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
1.1 Importance of validation
In our everyday life, we face questions like "How do you know that?" or "Are you sure?" But why are these questions asked so frequently? To see why, let's consider the situations in which they arise. When someone talks about an event, they give us a piece of "information" about a "result." Now, if we want to use this information in other situations, we
need to make sure that it is “valid.” It is always about losing or gaining some benefit by either retelling the story or acting
based on that story. For example, we don’t like to risk our reputation on unverified stories or lose our money in an
investment based on unevaluated information. These examples demonstrate the importance of "validation": every piece of information that is intended to be useful in the future needs to be validated first. In a scientific study,
validation comes into play when some experimental data are acquired or when a model is proposed to generalize the
applicability of this data. Especially, when we are looking for a predictive tool, validation becomes a crucial step in
the development of a model. For example, when a model is proposed for engineering or economic purposes, it must be validated before any design, investment, or forecasting is based on it. In science and engineering, when a model is derived
by a mechanistic approach, a mathematical formula is obtained which relates the input to the specified output. In this
respect, the validation could be made by comparison of the model results with accurate experimental data or at least
by the model's performance in limiting situations, e.g., equilibrium or steady-state conditions. But there is an ever-increasing number of machine learning models that try to offer predictive tools based on their training with the available experimental data. Here a valid model must also perform well on independent data sets, i.e., data that were not used in the derivation of the model.
1.2 Validation of the training process
In machine learning, the training phase usually takes the major portion of the available data (training data set) to find the
model parameters that favorably minimize the error. Usually, the error rapidly decreases at the first iterations but as the
training proceeds, the error would slowly decrease toward a local minimum (Richert, 2013). The training is stopped when
the best generalization is achieved by the model which is the one that is not underfitting or overfitting the data. Since the
training data set provides the expected output, the error could be minimized by a suitable method to achieve the best model
parameters for the training data; however, if the model fits the training data set too closely, it might perform poorly on the testing data set or any other future data (Fontaine, 2018). Thus, a stopping criterion is required to avoid overfitting the model.
Indeed, searching should be stopped when the criteria of error minimization and cross-validation are both met (Brownlee,
2018). For example, the validation of the model on the testing set could be performed during the training phase and when the
error of the predictions for the testing set reaches a local minimum, the training is stopped (Raschka, 2015). This type of validation, which usually sets aside some 30% of the data as the testing set, is the simplest type of cross-validation, but it still functions properly in many situations. However, for a more robust validation, more elaborate techniques might be incorporated,
which are classified and discussed in the subsequent sections (Lei, 2019; Berrar, 2019).
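A minimal sketch of this stopping idea, using Keras's EarlyStopping callback (the synthetic data, network size, and patience value are placeholder choices for illustration, not taken from this chapter):

import numpy as np
from keras import models, layers
from keras.callbacks import EarlyStopping

X = np.random.rand(200, 5)                      # synthetic features (illustrative only)
y = np.random.rand(200)                         # synthetic targets
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(5,)))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
# Stop when the validation loss has not improved for 10 epochs and
# roll back to the weights of the best epoch seen so far.
stopper = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(X, y, validation_split=0.3, epochs=500, callbacks=[stopper], verbose=0)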
2. Cross-validation
The cross-validation is started by splitting the original data set into training set(s) and test set(s). When all possible combinations of splitting the available data are considered, an exhaustive cross-validation is performed (Brownlee, 2016). On the other hand, when not all possible combinations are considered, a nonexhaustive cross-validation is performed. A simple example of nonexhaustive cross-validation is the hold-out method, in which the data set is randomly split into only one training set (usually containing 70% of the data) and one test set containing the rest of the data (Müller and Guido, 2016). Besides the hold-out method, which some practitioners consider just a simple validation, the other cross-validation methods involve multiple training/test sets, and the final model is derived by averaging over the individual models obtained from different runs (Dangeti, 2017).
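A small sketch of the hold-out method using scikit-learn's train_test_split (the array sizes and the 70/30 ratio are arbitrary example values):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)                      # 100 synthetic samples with 4 features
y = np.random.rand(100)
# Keep 70% of the samples for training and hold out 30% as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)              # (70, 4) (30, 4)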
2.1 Exhaustive and nonexhaustive cross-validation
As suggested by its name, exhaustive cross-validation involves a great deal of computational effort as it considers all possible combinations of isolating a specified number of samples from the training phase to use them in the testing phase
(Karim and Kaysar, 2016).
• LpO-CV
Generally, this type of cross-validation is addressed as leave-p-out cross-validation (LpO-CV), since at each run p samples are taken out of the original data set and the training is made with the remaining "n-p" samples, where n represents the total number of samples in the original data set. Now, if there are only 50 samples in the original data set, with p = 15 the total number of training-validation runs would be C(50,15), which is about 2.25 × 10^12. Selecting p = 2 provides the main advantages of this method for many problems (Brunton and Kutz, 2019).
• LOO-CV
A variant of LpO with p = 1, denoted LOO-CV (leave-one-out cross-validation), attracts more attention as it involves less computational effort while preserving some essential features of LpO-CV. In this method, each model is trained on the remaining "n-1" samples of a data set of n samples and tested on the single left-out sample; since the ratio (n-1)/n is close to unity, LOO-CV retains variable-selection consistency in linear models (Viswanathan et al., 2016).
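The run counts quoted above can be checked with a short sketch using scikit-learn's LeavePOut and LeaveOneOut (the 50-sample array is illustrative):

import numpy as np
from math import comb
from sklearn.model_selection import LeavePOut, LeaveOneOut

X = np.arange(50).reshape(-1, 1)                # 50 samples, as in the example above
print(comb(50, 15))                             # roughly 2.25e12 runs for p = 15
lpo = LeavePOut(p=2)                            # p = 2 keeps the number of runs manageable
print(lpo.get_n_splits(X))                      # C(50, 2) = 1225 runs
loo = LeaveOneOut()                             # p = 1: one run per sample
print(loo.get_n_splits(X))                      # 50 runs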
2.2 Repeated random subsampling cross-validation
This cross-validation scheme creates multiple random splits from the original data set. For each of these random splits, the
training and testing are made to find the corresponding model and by averaging over these models, the final model is prepared. Since the selection of the training and testing sets is a random process, some of the samples may never be used in the
validation stage while other samples might be used more than once. Also, the stratified version of this method is proposed
for handling imbalanced data sets (Dua and Du, 2016).
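A sketch of repeated random subsampling with scikit-learn's ShuffleSplit, together with its stratified variant for imbalanced labels (the split counts and sizes are example values):

import numpy as np
from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

X = np.random.rand(40, 3)
y = np.array([0] * 30 + [1] * 10)               # imbalanced labels for the stratified variant
ss = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
for train_idx, test_idx in ss.split(X):
    pass                                        # fit and score one model per random split
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    pass                                        # class proportions are preserved in each split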
2.3 Time-series cross-validation
For time-series or time-dependent data, the order of the data is of crucial importance; that is why random splitting or k-fold
partitioning may not provide satisfactory results. Instead, the data split is made based on time and the training is performed
using the prior subsets with the next subset as a validation set. This method is referred to as rolling cross-validation and also
the walk-forward or forward chaining method. Note that this method prevents the leakage of data from the future to the
training set which makes the rolling CV stand out (De Prado, 2018; Jansen, 2018; Hadizadeh and Eslamian, 2017).
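Rolling (walk-forward) splits are available in scikit-learn as TimeSeriesSplit; the sketch below (with an arbitrary short series) shows that every validation fold comes strictly after its training samples:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)                # a short, time-ordered series
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in tscv.split(X):
    # Training indices always precede validation indices, so no future data leaks backward.
    print(train_idx, val_idx)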
2.4 k-fold cross-validation
The LOO-CV requires n runs, which is quite large for many problems; thus, to save computational effort, the k-fold technique was developed. This method partitions the data set into k equal-size subsets or folds. At each pass, "k-1" folds are used for the training and the remaining fold takes the role of the testing set (Geron, 2019). The process is repeated k times until
each of the k folds has been once used as the test set. Finally, the model is derived by averaging the results of different runs.
Although LOO-CV has been considered a variant of LpO-CV, one might equally view LOO as a form of k-fold cross-validation with k = n (Kumar, 2019; Kane, 2017).
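A minimal k-fold sketch with scikit-learn (the linear regressor and k = 5 are illustrative choices, not the Keras model used later in this chapter):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X = np.random.rand(100, 4)
y = np.random.rand(100)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
# One score per fold; their mean is the cross-validated estimate of performance.
scores = cross_val_score(LinearRegression(), X, y, cv=kf)
print(scores, scores.mean())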
2.5 Stratified k-fold cross-validation
For imbalanced data sets containing two or more classes with different numbers of data points, the naïve k-fold CV does not work properly. The solution is to partition the data set in such a way that the mean response value is nearly equal in all folds. This method is known as stratified k-fold cross-validation. Note that this method is not suitable for
time series data sets (Swamynathan, 2019; El Naqa et al., 2015).
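A sketch of the stratified variant with scikit-learn (the labels and fold count are example values); each fold keeps roughly the same class proportions:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(30, 2)
y = np.array([0] * 24 + [1] * 6)                # imbalanced classes
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold holds about 8 samples of class 0 and 2 of class 1.
    print(np.bincount(y[test_idx]))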
2.6 Nested
Finally, the k-fold cross-validation could be run in a nested scheme if both hyperparameter selection and error minimization
are planned to be made simultaneously. For example, in the k*l-fold cross-validation method, an outer loop and an inner
loop are designed to cover both tasks. Here the outer loop makes the usual k-fold cross-validation, while the inner loops are
responsible for fitting the model parameters. Note that the “l” in the name of the method refers to the number of subdivisions
of outer training sets (Raschka and Mirjalili, 2017; Deisenroth et al., 2020).
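A nested scheme can be sketched with scikit-learn by wrapping a hyperparameter search (the inner loop) inside an outer cross-validation; the estimator, grid, and fold counts below are illustrative assumptions:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X = np.random.rand(80, 5)
y = np.random.rand(80)
inner = KFold(n_splits=3)                       # "l": tunes the hyperparameter
outer = KFold(n_splits=5)                       # "k": estimates the generalization error
search = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]}, cv=inner)
scores = cross_val_score(search, X, y, cv=outer)
print(scores.mean())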
3. Computational procedures
To comprehend cross-validation completely and dive deeper into it, the problem of predicting Boston house prices is presented here. As mentioned in previous sections, the difference between a regression and a classification case lies in the type of output. In a regression case, the output is a continuous numeric variable; examples are predicting tomorrow's temperature or predicting the price of a stock given its previous prices. Another example is predicting the time at which a piece of software will terminate, given its specifications. In this chapter, we examine the case of predicting the price of homes in the Boston suburbs in the 1970s; the home price is clearly a continuous variable. Some features in this case study are the crime rate, the average number of rooms per house, accessibility to highways, and so forth. In this example, the concept of k-fold cross-validation will be investigated practically. The data for the Boston problem are available in the Keras library. For solving this case, the Spyder IDE for Python was selected; Python has several IDEs, such as Spyder, Atom, PyCharm, and so on, and for machine learning and deep learning tasks, Spyder and Jupyter are used most often. Let's dive into the solution. It is noted that all codes are written in the Spyder IDE (Zaman et al., 2019).
(1) from keras.datasets import boston_housing
First, the Boston dataset is imported. As mentioned before, we use the Keras datasets; boston_housing is a dataset in the Keras library.
(2) (train_data, train_targets ), (test_data, test_targets) = boston_housing.load_data();
For loading the data and splitting it into train and test sets, the load_data() function is applied. The definition of the load_data() function is as follows:
load_data(path='boston_housing.npz', test_split=0.2, seed=113)
The first argument of this function is the path, the directory where the dataset is cached locally. The first time a user wants to use a dataset in Keras, it is downloaded automatically by the load_data() function and saved locally in the path (relative to ~/.keras/datasets) (Sarkar et al., 2018). Another argument is test_split, which is the fraction
of data to reserve as test data and seed means random seed for shuffling the data before computing the test split. The output
of this function is returned as a tuple of NumPy arrays: (x_train, y_train), (x_test, y_test). In line 2, train_targets and test_targets are the labels of the train and test data; the train and test data contain the features, and the targets contain the labels of these data.
(3) train_data.shape => Out[3]: (404, 13)
After executing the code in line 3, the output is Out[3]: (404, 13). This means the dataset has 404 training samples with 13 features each; as mentioned before, some of the features are the crime rate, accessibility to highways, the number of rooms per house, etc.
(4) test_data.shape => Out[4]: (102, 13)
This means that 102 samples are randomly selected as test data and, like the train data, each has 13 features. The deep learning model is to predict the price of houses, so the output of the model is the house price in Boston.
(5) train_targets => Out[5]: array([15.2, 42.3, 50. , 21.1, 17.7, ........,19.4, 19.4, 29.1])
By executing line 5, the output looks like Out[5]: the price of the first house (the first district) in the Boston dataset is 15.2 thousand dollars, the second is 42.3 thousand dollars, and so on. These prices are the median values in each city district, in thousands of dollars. One important point concerns the features: their ranges differ. In other words, each feature has a different scale; some lie between 1 and 12, some between 0 and 1, others between 0 and 100, and so on. In general, there are 13 features, each with a different range. This point is significant because, before training, all features should be brought to the same range, and in many cases the values should lie between 0 and 1. This operation is called data normalization; without this preprocessing step, learning becomes slow and the model is hard to train. One important type of preprocessing is normalization, which means all data features should be on the same scale. One specific type is Z-score normalization, in which each feature is centered at zero and scaled to unit standard deviation (Han et al., 2011). This gives all features a comparable distribution: every feature ends up with the same mean and the same standard deviation. The mean of the train data is computed in line 6 by the mean() function and stored in a mean variable.
(6) mean = train_data.mean(axis=0)
axis=0 in the mean function means the averaging operation is executed over each column. As mentioned before, each column corresponds to a specific feature. The output of the code in line 6 is a vector of length 13.
Out[6]: array([3.74511057e+00, 1.14801980e+01, 1.11044307e+01, 6.18811881e-02,
5.57355941e-01, 6.26708168e+00, 6.90106436e+01, 3.74027079e+00,
9.44059406e+00, 4.05898515e+02, 1.84759901e+01, 3.54783168e+02,
1.27408168e+01]).
(7) train_data -= mean
In the code of line 7, the mean value is subtracted from all of the train data, so the mean of the train data becomes zero (lines 6 and 7 together accomplish this). Now we turn to the standard deviation (std), which should become one.
(8) std = train_data.std(axis=0);
(9) train_data /=std;
Lines 8 and 9 make the standard deviation of the train data equal to one. If we now take the standard deviation of the train data, it returns ones, as Out[10] confirms: when each centered feature is divided by its own standard deviation, the result has unit standard deviation.
(10) In[10] : train_data.std(axis=0)
Out[10]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
As a result of Z-score normalization, the mean is zero and the standard deviation (std) is one.
(11) test_data -=mean;
(12) test_data /=std;
Looking closely at the Python code in lines 11 and 12, the mean and std are computed from the train data; for the test data, the mean and std of the train data are used. When users want to normalize test data, this normalization should be done with the training data quantities (mean and std of the train data). A question arises here: why is the information (mean and std) of the train data used for normalizing the test data, instead of the information of the test data itself? The first reason is that when the model is in use and a new case (a new home) comes into the model for price prediction, it is impossible to compute
a standard deviation and mean from a single test sample. The second reason is that test data is, by definition, unseen data: while the model is training, the user should not give it any information about the test data, because in the real world no information is available about the new cases (test data) that will come into the model. Therefore, computing the std and mean from the test data is not allowed; it would be a kind of cheating, leaking information about the test data into the model. After loading and normalizing (preprocessing) the data, we should construct the network architecture.
Network architecture
First, the relevant modules, models and layers, should be imported from the Keras library to construct the model. Lines 13 and 14 import these modules from Keras.
(13) from keras import models
(14) from keras import layers
In line 15, a function named build_model() is defined with the keyword def; we construct the model inside this function. The reason for defining a function is that the same model will be built in several places in the program. Line 16 creates a sequential model. The sequential model is suitable for a plain stack of layers in which each layer has one input tensor and one output tensor (Krohn et al., 2019). This class is provided in the Keras engine, and the add function belongs to the Sequential class.
(15) def build_model():
(16) model = models.Sequential()
In line 17, using the add function with the layers library, a dense layer with 64 neurons and the relu activation function is created. It is important to mention that the train data has two dimensions (a two-dimensional tensor): the first dimension is the number of samples (rows) and the second dimension is the number of features (columns). Therefore, train_data.shape[0] is the number of rows (samples) and train_data.shape[1] is the number of columns (features).
(17) model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
A definition of activation functions
Activations can either be used through an “Activation” layer, or through the “activation” argument that is supported by
all forward layers. There are many activation functions such as elu, selu, softmax, softplus, tanh, sigmoid, hard_sigmoid,
softsign, exponential, linear and relu. Fig. 1 indicates the graph of relu activation function (Millstein, 2020).
In line 18, another dense layer is added with 64 neurons.
(18) model.add(layers.Dense(64, activation='relu'))
For additional explanation of activation functions such as relu and of the role of each layer, consider Fig. 2.
FIG. 1 Relu: the rectified linear unit function.
FIG. 2 The layer explanations: the input X and the layer parameters W (weights) and b (bias) produce the linear output WX + b, which is passed through an activation f(WX + b).
X (the data) is the input of the layer; each layer has two kinds of parameters, the weights (W) and the bias (b). The linear output of the layer is WX + b, which is then passed as input to a nonlinear function such as relu, elu, softmax, and so forth. This nonlinearity is what lets the network model nonlinear systems and is the basis of Artificial Neural Networks (ANN). In linear layers there is no f as in f(WX + b); the output is just the linear function WX + b.
In line 19, another dense layer is added with one neuron (Moolayil et al., 2019). Why is there one neuron in the output layer? As mentioned before, the goal of the Boston problem is to predict house prices in the Boston district; the output of this problem is a single variable (the house price), so there is one neuron in the output layer. The important point here is that there is no activation function in the output layer. When no activation function is specified, the activation is assumed to be linear. But why is no activation function used in the output layer? The answer is that most activation functions limit the output; for instance, the softmax activation function reduces the range of the output to [0, 1]. In this case, we intend to predict the price of a house, and this price can be any number; therefore, the last layer has no activation function. It is a linear layer, free to learn to predict values in any range. Generally, for regression problems, the activation function of the output layer is linear. Line 19 indicates this issue.
(19) model.add(layers.Dense(1))
The next step is assigning optimizer, loss function, and metrics.
(20) model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
In this step, as an explanation of the code in line 20, the model is compiled via the compile method. The compile method has several parameters, such as optimizer, loss, and metrics; its definition is shown below:
compile(self, optimizer, loss=None, metrics=None, loss_weights=None,sample_weight_mode=None,
weighted_metrics=None, target_tensors=None, **kwargs)
First, it is necessary to define the optimizer. The duty of the optimizer is to adjust the weights slightly, based on the loss value, in order to decrease the loss. Fig. 3 shows this mechanism clearly: it indicates the relationship between the network layers, the loss function, and the optimizer. Bringing the predictions and the true targets closer together is the job of the optimizer, and this operation is implemented by the backpropagation algorithm, the central algorithm of deep learning (DL) (Goodfellow et al., 2016).
FIG. 3 The loss value is used as a feedback signal to regulate the weights of the layers in an appropriate direction; this regulation is executed via the optimizer (elements: input X, layers (data transformations) with weights, predictions Y', true targets Y, loss function, loss score, and the optimizer performing the weight updates).
In this program, rmsprop is used; it is suggested to leave the parameters of this optimizer at their default values (except the learning rate, which can be freely tuned). This optimizer is a good choice for recurrent NNs. Another parameter of the compile function is the loss function, the quantity that the model tries to minimize. As mentioned before, the loss function is one of the parameters required to compile the model. The appropriate loss function depends on the type of problem: for example, the "mse" loss function is usually applied to regression problems, while for classification problems "binary_crossentropy" is applied to binary problems and "categorical_crossentropy" to multiclass problems. In line 20, the loss function is set to mse because, as mentioned before, this case is a regression problem (Shanmugamani, 2018). What is the definition of mse? The abbreviation mse stands for mean square error. In statistics, the mean square error (MSE) or mean square deviation (MSD) of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, i.e., the average squared difference between the estimated values and the actual values. The MSE is given in Eq. (1):
squared difference between the estimated values and the actual value. In Eq. (1), the MSE equation is bright:
n 2
1 X
MSE ¼
Y i Ybi
(1)
n i¼1
n ¼ number of data points
Yi ¼ observed value
Ybi ¼ predicted value
Another parameter of the compile function is metrics; a metric is a function used to judge the performance of the model. A metric behaves like a loss function, except that the results of evaluating a metric are not used when training the model. There are several different metrics, such as binary_accuracy, categorical_accuracy, and so on. The metric applied in this case is MAE. What is MAE? MAE stands for mean absolute error and is similar to MSE, with a small difference. Eq. (2) gives the MAE:
similar to mse with a little difference. Eq. (2) indicates the MAE.
X
n n X
e yixi i
MAE ¼
i¼1
n
¼
i¼1
n
(2)
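As a quick numerical check of Eqs. (1) and (2), both measures can be computed directly with NumPy (the observed and predicted values below are made up for illustration):

import numpy as np

y_true = np.array([15.2, 42.3, 21.1])           # observed values (arbitrary example)
y_pred = np.array([14.0, 40.0, 23.0])           # predicted values
mse = np.mean((y_true - y_pred) ** 2)           # Eq. (1)
mae = np.mean(np.abs(y_true - y_pred))          # Eq. (2)
print(mse, mae)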
In classification problems, accuracy is used to report performance, whereas in regression problems accuracy is meaningless; for a regression program, MAE is usually reported instead. For instance, an MAE of 0.5 on this problem would mean your predictions are off by 500 dollars on average. As described later, in k-fold cross-validation a model has to be built for every fold; to avoid repetition in the program, all code associated with the model has been put into one function. In code 21, this function is defined completely.
(21) def build_model():
         model = models.Sequential()
         model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
         model.add(layers.Dense(64, activation='relu'))
         model.add(layers.Dense(1))
         model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
         return model
• K-fold cross-validation
There are several ways of creating validation data; one is splitting the train data into two sections (train and validation data) (Dubitzky et al., 2007; Gollapudi, 2016), the so-called hold-out method. When the number of data points is small, as in the Boston problem (about 400 training samples), the hold-out method is not efficient: splitting such a small data set into two sections leaves too little training data, and the resulting estimate of model performance is not reliable. A way of solving such problems is K-fold cross-validation (Provost and Fawcett, 2013). In the K-fold procedure, the data is divided into k equal folds; Fig. 4, for instance, shows a threefold cross-validation. After dividing the data into k folds, the model is trained k times, for example three times: in run 1 (fold 1), part 1 becomes the validation set and the two remaining parts become the training set, and runs 2 and 3 are executed analogously by moving the validation section through the data set. In each fold, the score of the model (validation score) is assessed and reported, and finally the average of these scores is computed and taken as the model accuracy. As mentioned before, the idea of K-fold cross-validation is appropriate for small datasets such as the Boston
FIG. 4 The idea of K-fold cross-validation: the data is split into three partitions; in each fold one partition is used for validation and the remaining partitions for training, giving validation scores #1–#3, and the final score is their average.
dataset. The computational process becomes more demanding in K-fold cross-validation because k different models must be trained, whereas in the previous cases one model was trained and then evaluated with a single validation set; nevertheless, the K-fold method is more reliable than the other methods, particularly for small datasets. Fig. 4 indicates the idea of K-fold cross-validation in detail.
The following explains the k-fold cross-validation code in detail. In this example, fourfold cross-validation is assumed.
(22) import numpy as np
(23) k = 4  # number of folds
(24) num_val_samples = len(train_data) // k => the number of validation samples is calculated here. Note that // denotes integer (floor) division: for instance, 5/4 = 1.25 but 5//4 = 1, i.e., it returns an integer.
(25) num_epochs = 500 => number of epochs.
(26) all_scores = [ ] => this list stores the score of each fold, so its final length will be 4.
Now we should prepare the data: in lines 27 and 28, the validation data and targets (labels) are prepared separately. For example, when i = 0, val_data = train_data[0:101] and the validation targets are val_targets = train_targets[0:101].
(27) val_data = train_data[i * num_val_samples : (i+1) * num_val_samples]
(28) val_targets = train_targets[i * num_val_samples : (i+1) * num_val_samples]
In lines 29 and 30, the partial train data and partial train targets are prepared:
(29) partial_train_data = np.concatenate(
[train_data[:i*num_val_samples],
train_data[(i+1)*num_val_samples:]],
axis=0)
(30) partial_train_targets = np.concatenate(
[train_targets[:i*num_val_samples],
train_targets[(i+1)*num_val_samples:]],
axis=0)
To explain code lines 29 and 30, assume i = 1 and, to simplify, abbreviate num_val_samples as nvs. The validation data is then val_data = train_data[nvs:2*nvs], while partial_train_data is the concatenation of train_data[:nvs] and train_data[2*nvs:]; concatenate means attaching these two ranges of data together. Now, to make a new model, call the build_model() function:
(31) model = build_model()
The model should now be trained on partial_train_data using the fit function. As mentioned before, the training process is executed by the fit function, which has several parameters:
fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1,callbacks=None, validation_split=0.,
validation_data=None, shuffle=True,class_weight=None, sample_weight=None, initial_epoch=0,
steps_per_epoch=None,validation_steps=None, validation_freq=1, max_queue_size=10,
workers=1,use_multiprocessing=False, **kwargs)
(32) model.fit(partial_train_data,
partial_train_targets,
epochs=num_epochs,
batch_size=1,
verbose=0)
As can be seen, the number of epochs is 500, and the important point is that the verbose parameter is zero. What is the use of verbose while training the model? By setting verbose to 0, 1, or 2, you specify how you want to see the training progress for each epoch:
verbose = 0 shows nothing (silent);
verbose = 1 shows an animated progress bar, like [==============================];
verbose = 2 just mentions the epoch number, like: Epoch 1/10.
Therefore, the code in line 32 executes the training operation on the partial train data. Now the trained model should be evaluated on the validation data, and the mae (mean absolute error) measured in this step is appended to the all_scores list:
(33) val_mse, val_mae = model.evaluate(val_data,val_targets,verbose=0)
(34) all_scores.append(val_mae)
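For reference, code lines 22–34 belong inside a single loop over the folds; a consolidated sketch of that loop, using the variables and the build_model() function defined above, looks like this:

all_scores = []
for i in range(k):
    # Fold i: slice out the validation block and concatenate the rest for training.
    val_data = train_data[i * num_val_samples : (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples : (i + 1) * num_val_samples]
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples], train_data[(i + 1) * num_val_samples:]], axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples], train_targets[(i + 1) * num_val_samples:]], axis=0)
    model = build_model()
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)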
Because k = 4, the loop is repeated four times; as a result, four mae values are obtained for the validation data, and the final mae is the average of all maes in the all_scores list. It is worth mentioning that k-fold cross-validation is an important topic in machine learning and data mining, and ready-made implementations are available in scikit-learn and other machine learning libraries, so it is not strictly necessary to write this code from scratch. Now run the code and print all_scores:
(35) print(all_scores) => [2.018010139465332, 2.5180163383483887, 2.4935202598571777,
2.712078809738159] => with the number of epochs set to 500
As can be seen, there are four folds and one mae for each fold. As mentioned before, we defined a list known as all_scores and saved the mae of each fold in it. Averaging these maes gives:
(36) np.mean(all_scores)
=> 2.4354063868522644
This means that the average mean absolute error over all folds is approximately 2.43. The idea of k-fold is that, instead of making train and validation data one time, the train and validation split is made several times and the resulting maes are averaged; in this way, the output of the method is more reliable. As the code stands, the final result of the program is a single mae. If we want to record the status of the program in each fold and at every step (epoch) and save them (for instance, if the
program has 500 epochs, the modified code saves 500 maes per fold), we should modify the code a bit. The output of this code is effectively a matrix with 500 rows and four columns.
(37) history = model.fit(partial_train_data,partial_train_targets,
validation_data=(val_data, val_targets),
epochs=num_epochs,
batch_size=1,
verbose=0)
mae_history = history.history['val_mean_absolute_error']
all_mae_histories.append(mae_history)
Each mae_history is saved in a list known as all_mae_histories, but first we should define it: =>
all_mae_histories = [ ]
The output of this code (line 37) is a list with 4 elements, and every element in this list is itself a list containing the 500 mean absolute errors (mae), one for each epoch. The structure is like this:
[[MAE]_500×1, [MAE]_500×1, [MAE]_500×1, [MAE]_500×1]
Each list in this list belongs to one specific fold. With the epoch number set to 500, the output of the program (validation MAE) should be plotted to examine it precisely. But what does this figure mean? For more explanation, consider the list below:
Fold1
[[MAE1, MAE2, MAE3, .................,MAE500],
Fold2
[MAE1, MAE2, MAE3, .................,MAE500],
Fold3
[MAE1, MAE2, MAE3, .................,MAE500],
Fold4
[MAE1, MAE2, MAE3, .................,MAE500]]
When calculating the model performance after epoch 1, you should average all MAE1 values. All code for this operation and for plotting the results is reported below:
(38) average_mae_history = [np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
(39) import matplotlib.pyplot as plt
plt.plot(range(1, len(average_mae_history)+1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Looking closely at Fig. 5, the MAE decreases rapidly in the earlier epochs and then, little by little, begins to increase as the model starts to overfit; around epoch 80 the model has started to overfit. Two changes are applied to make the variations in the plot easier to recognize: first, the earliest epochs are removed because they carry no useful information; second, the sharp fluctuations in the plot are removed, a task called plot smoothing. One way of smoothing the plot and removing noise is averaging. To remove the sharpness and smooth the plot, the code below is written.
(40) def smooth_curve(points,factor=0.9):
smoothed_points=[]
for point in points:
if smoothed_points:
previous = smoothed_points[-1]
smoothed_points.append(previous*factor+point*(1-factor))
else:
smoothed_points.append(point)
return smoothed_points
(41) smoothed_mae_history=smooth_curve(average_mae_history[10:])
(42) plt.plot(range(1, len(smoothed_mae_history)+1), smoothed_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
FIG. 5 Output of the network (validation MAE per epoch).
First, in line 40, smooth_curve is defined; the duty of this function is to remove sharpness from the validation MAE plot. Then, in line 41, all data from epoch 10 onward are passed to this function, and in line 42 the diagram is plotted via the matplotlib library (a standard Python library for plotting). The resulting plot is much clearer and more explicit than the previous one. As can be seen, after about epoch 40 the network starts to overfit; therefore, epoch 40 is the best point at which to stop training. Fig. 6 shows the output of the network with epochs 1–10 omitted and the plot smoothed by the smooth function. The type of averaging applied in this example is the exponential moving average (Khosrow-Pour, 2012).
After determining the hyperparameter (number of epochs) in the previous step, the network should now be retrained on all the train data with 40 epochs. This task is executed with the fit method. Then the retrained network is evaluated on the test data, and finally the test_mae_score is reported.
FIG. 6 Validation MAE after applying the smooth_curve function to the data.
(43) model=build_model()
(44) model.fit(train_data,train_targets,epochs=40,batch_size=16,verbose=0)
(45) test_mse_score,test_mae_score=model.evaluate(test_data,test_targets)
>>>> test_mae_score = 2.845837354660034
This means the error of the network is about 2.8, in other words about 2.8 thousand dollars. For instance, if the price of a house is $15,000, this network predicts a price roughly between $12,200 and $17,800.
Now we are going to work through a project on nutrient removal efficiency, using a data set containing 7876 records to analyze the total nitrogen (TN) removal efficiency of an anaerobic-anoxic-oxic membrane bioreactor system. About 5000 records were set aside for training the network and the remainder were allocated to test data. In this example, the K-fold cross-validation method was used for training the network. The output values are predicted from the 9 inputs presented in Table 1. This dataset was taken from the data reported in a paper published by Yaqub et al. in 2020 (Brownlee, 2018).
At first, we consider K = 2 and execute the K-fold cross-validation method on the training data. According to Fig. 7, we can see visually that overfitting begins around epoch 70. To remove the oscillations in the figure, the smooth function was used to view the overfit point more clearly; it is concluded that overfitting begins around epoch 75, as shown in Fig. 8. After magnifying the curve of Fig. 8 around the overfit point, the exact value of the overfit point is found to be epoch 75, as can be seen in Fig. 9.
After completing the run of cross-validation code, the test MAE and MSE values are obtained:
test_mae_score: 3.131399154663086
test_mse_score: 21.55681976617551
In the second run, we consider K = 5. After training the network via the K-fold cross-validation code, Figs. 10–12 were obtained; they show that overfitting starts after epoch 47. After setting the number of epochs to 47 and evaluating the retrained network on the test data, the MAE and MSE values are obtained:
test_mae_score: 2.9801080226898193
test_mse_score: 19.03835214353075
Finally, after a complete run of the cross-validation code with k = 7, it was concluded that overfitting starts from epoch 40, and the following diagrams are reported for the output (Figs. 13–15):
TABLE 1 Attribute information of the nutrient removal efficiency project.
Code           Input or output   Description
TOC            Input             Total organic contents
TN             Input             Total nitrogen
TP             Input             Total phosphorous
COD            Input             Chemical oxygen demand
NH4-N          Input             Ammonium
SS             Input             Suspended solids
DO             Input             Dissolved oxygen
ORP            Input             Oxidation-reduction potential
MLSS           Input             Mixed liquor suspended solids
RE of NH4-N    Output            Removal efficiency of NH4-N
RE of TN       Output            Removal efficiency of TN
RE of TP       Output            Removal efficiency of TP
FIG. 7 The output of the network (validation MAE per epoch).
FIG. 8 Validation MAE after applying the smooth_curve function to the data.
After running the network on the test data with 40 epochs, the MAE and MSE values are reported below.
test_mae_score: 2.796928882598877
test_mse_score: 15.402697095683976
By comparing the values obtained from the data related to this project, it can be inferred that with increasing k, the mae (and
also mse) values decrease. Of course, it should not be forgotten that with increasing k, the computational time increases
sharply and a logical balance must be struck between these two inverse factors.
FIG. 9 Validation MAE after applying the smooth_curve function to the data and magnifying around the overfit point.
FIG. 10 The output of the network (validation MAE per epoch).
FIG. 11 Validation MAE after applying the smooth_curve function to the data.
FIG. 12 Validation MAE after applying the smooth_curve function to the data and magnifying around the overfit point.
FIG. 13 The output of the network (validation MAE per epoch).
FIG. 14 Validation MAE after applying the smooth_curve function to the data.
FIG. 15 Validation MAE after applying the smooth_curve function to the data and magnifying around the overfit point.
4. Conclusions
In this chapter, K-fold cross-validation is explained in detail and a famous and familiar example (predicting Boston house prices) is presented. As mentioned before, there are two approaches for implementing cross-validation: the first uses the ready-made machine learning library in Python known as scikit-learn, and the second codes cross-validation in Python from scratch. For a better understanding of the cross-validation concept, we chose the second approach, applied to a deep learning model. The result obtained for the Boston problem indicates that the mean absolute error is about 2.84, which means the predicted price differs by about $2800 from the actual value. Also, by analyzing the data of the nutrient removal efficiency project, it was concluded that with increasing k the obtained MAE values decrease while the calculation time increases sharply.
References
Berrar, D., 2019. Cross-validation. In: Encyclopedia of Bioinformatics and Computational Biology. vol. 1. Elsevier, pp. 542–545.
Brownlee, J., 2016. Machine Learning Mastery With Python. Machine Learning Mastery Pty Ltd, pp. 100–120.
Brownlee, J., 2018. Statistical Methods for Machine Learning: Discover How to Transform Data Into Knowledge With Python. Machine Learning
Mastery.
Brunton, S.L., Kutz, J.N., 2019. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press.
Dangeti, P., 2017. Statistics for Machine Learning. Packt Publishing Ltd.
De Prado, M.L., 2018. Advances in Financial Machine Learning. John Wiley & Sons.
Deisenroth, M.P., Faisal, A.A., Ong, C.S., 2020. Mathematics for Machine Learning. Cambridge University Press.
Dua, S., Du, X., 2016. Data Mining and Machine Learning in Cybersecurity. CRC Press.
Dubitzky, W., Granzow, M., Berrar, D.P., 2007. Fundamentals of Data Mining in Genomics and Proteomics. Springer Science & Business Media.
El Naqa, I., Li, R., Murphy, M.J., 2015. Machine Learning in Radiation Oncology: Theory and Applications. Springer.
Fontaine, A., 2018. Mastering Predictive Analytics With Scikit-Learn and TensorFlow: Implement Machine Learning Techniques to Build Advanced
Predictive Models Using Python. Packt Publishing.
Geron, A., 2019. Hands-on Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
O’Reilly Media.
Gollapudi, S., 2016. Practical Machine Learning. Packt Publishing Ltd.
Goodfellow, I., et al., 2016. Deep Learning. vol. 1 MIT Press, Cambridge.
Hadizadeh, R., Eslamian, S., 2017. Modeling hydrological process by ARIMA–GARCH time series. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of
Drought and Water Scarcity, Vol. 1: Principles of Drought and Water Scarcity. Taylor and Francis, CRC Press, USA, pp. 571–590 (Chapter 30).
Han, J., Pei, J., Kamber, M., 2011. Data Mining: Concepts and Techniques. Elsevier.
Jansen, S., 2018. Hands-On Machine Learning for Algorithmic Trading: Design and Implement Investment Strategies Based on Smart Algorithms That
Learn From Data Using Python. Packt Publishing Limited.
Kane, F., 2017. Hands-on Data Science and Python Machine Learning. Packt Publishing Ltd.
Karim, M.R., Kaysar, M.M., 2016. Large Scale Machine Learning with Spark. Packt Publishing Ltd.
Khosrow-Pour, M., 2012. Machine Learning: Concepts, Methodologies, Tools and Applications. Information Science Reference, Hershey, PA, USA.
Krohn, J., Beyleveld, G., Bassens, A., 2019. Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence. Addison-Wesley
Professional.
Kumar, R., 2019. Machine Learning Quick Reference: Quick and Essential Machine Learning Hacks for Training Smart Data Models. Packt
Publishing Ltd.
Lei, J., 2019. Cross-validation with confidence. J. Am. Stat. Assoc. 115 (532), 1–20.
Millstein, F., 2020. Convolutional Neural Networks in Python: Beginner’s Guide to Convolutional Neural Networks in Python. Frank Millstein.
Moolayil, J., Moolayil, J., John, S., 2019. Learn Keras for Deep Neural Networks. Springer.
Müller, A.C., Guido, S., 2016. Introduction to Machine Learning With Python: A Guide for Data Scientists. O'Reilly Media, Inc.
Provost, F., Fawcett, T., 2013. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O’Reilly Media, Inc.
Raschka, S., 2015. Python Machine Learning. Packt Publishing Ltd.
Raschka, S., Mirjalili, V., 2017. Python Machine Learning. Packt Publishing Ltd.
Richert, W., 2013. Building Machine Learning Systems With Python. Packt Publishing Ltd.
Sarkar, D., Bali, R., Sharma, T., 2018. Practical Machine Learning With Python. A Problem-Solvers Guide to Building Real-World Intelligent Systems.
Apress, Berkely.
Shanmugamani, R., 2018. Deep Learning for Computer Vision: Expert Techniques to Train Advanced Neural Networks Using TensorFlow and Keras.
Packt Publishing Ltd.
Swamynathan, M., 2019. Mastering Machine Learning With Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using
Python. Apress.
Viswanathan, V., et al., 2016. R: Recipes for Analysis, Visualization and Machine Learning. Packt Publishing Ltd.
Zaman, H.B., et al., 2019. Advances in Visual Informatics: 6th International Visual Informatics Conference, IVIC 2019, Bangi, Malaysia, November
19–21, 2019, Proceedings. vol. 11870 Springer Nature.
Chapter 6
Comparative study on the selected node
and link-based performance indices
to investigate the hydraulic capacity
of the water distribution network
C.R. Suribabu(a) and P. Sivakumar(b)
(a) Centre for Advanced Research in Environment, School of Civil Engineering, SASTRA Deemed University, Thanjavur, Tamil Nadu, India, (b) Department of Civil Engineering, North Eastern Regional Institute of Science and Technology, Nirjuli (Itanagar), Arunachal Pradesh, India
1. Introduction
A water distribution system failure is generally viewed as a condition in which the network does not supply water to the consumers with the full degree of satisfaction. This can happen due to failure of pumps, power outages, pipe breakage, valve malfunction, changes in the demand pattern, increase in pipe roughness due to age, water shortage at the source, an inadequate storage system, or inadequate pipe size; among these, pump failure, pipe breakage, power outages, and valve malfunction create a shortage or complete interruption of water supply to the consumer until the failure is rectified. If the failure is rectified in a short duration, then the system can be restored quickly. The degree of satisfaction of the network under such circumstances can be measured as the resilience of the water distribution network (WDN). If the availability of service pressure at all the demand nodes is ensured through the design of the network whatever type of failure happens, then the network is said to be a resilient system. In most cases, the reliability of water availability has not been considered while estimating the resilience of the network, since source reliability is treated as an independent parameter. The resilience indices proposed by Todini (2000) and Jayaram and Srinivasan (2008) quantify resilience through the availability of excess service pressure. Ensuring higher pressure at the nodes can compensate for a shortfall of pressure during failures.
Several performance measures are used to assess the capability of a network to perform the task for which it is designed. In any system, random failure is inevitable, and overcoming or minimizing its effect is one of the key factors when designing a water distribution system. The upper and lower limits of a measure can be set based on the level at which the system needs to perform under abnormal conditions. The cost of the system increases as a higher performance level is demanded, so the design choice becomes how best the system performance can be improved within the budget available for implementing the WDN. A few measures commonly adopted in design are described below. The resilience concept for a water supply system can be viewed in terms of resilient infrastructure. Wang et al. (2009) stated that a resilient infrastructure is one that shows (a) reduced failure probabilities, (b) reduced consequences of failure, and (c) reduced time to recovery. Howard and Bartram (2010) formulated the resilience of piped water supply as a function of the resilience of the individual components of the system, namely the source, treatment, distribution through primary, secondary, and tertiary pipes, and in-system storage infrastructure. Jeong and Kang (2020) presented a new link-based reliability index called the hydraulic uniformity index, which considers the head-loss distribution in the entire network. They introduced the concepts of equivalent head loss and equivalent hydraulic gradient, whose values are compared with the actual values for each pipe in the network. In this measure, the hydraulic uniformity of the network is quantified from an inherent property of the network's configuration rather than from the overall head loss permitted while designing the network.
The reliability of a water distribution network (WDN) decreases due to the aging of its pipes (Mazumder et al., 2019). If the pipe material is prone to corrosion, aging poses an increased resistance to flow, mainly due to roughness projections on the inner surface, and the strength of the pipe decreases further as a result of corrosion on the outer surface where the pipe material interacts with the soil. Commonly, the reliability of a water distribution network is quantified based on
the probability of failures during the design period, in which pipe failures are commonly taken as the prime factor affecting system reliability. Guercio and Xu (1997) defined the reliability of a WDN with respect to pipe breakage as the probability that the network can supply the design demand when some components go out of service. A typical single-objective optimization model for the design of WDNs tries to minimize the cost of investment in procuring pipes. The minimum-cost solution, however, may not fulfill the intended purpose during an abnormal operational condition. Except for dead-end pipelines, most pipelines carry flow destined for other pipelines; hence, the failure of one pipe seriously affects the demand of several nodes under abnormal operating conditions. Increasing the carrying capacity of each pipe at minimum overall capital cost has therefore become a new objective, and maintaining good carrying capacity at minimum cost invited multiobjective formulations for the design of WDNs, in which maximizing either reliability or resilience became an important second objective. The failure of a single pipe is considered to have a serious effect on demand satisfaction; the failure of a combination of several pipes would have an enormous effect on the system, but the probability of two or more pipes failing together is very small, as such incidents seldom happen (Reed et al., 2010; Paez and Filion, 2019). Providing excess pressure to all the demand nodes has therefore been the basis of several performance measures. Paez and Filion (2019) presented two reliability estimators for performance assessment of WDNs, in which the mechanical reliability estimator considers both the probability of failure of components and the proportion of supply maintained under failure. They suggested that the proportion of supply during a particular pipe failure equals the difference between the total base demand and the flow through the failed pipe under the abnormal condition. The hydraulic reliability estimator is formulated to evaluate the expected value of demand that is met under all possible demand variations. Mazumder et al. (2019) considered the change in pipe roughness due to aging for hydraulic performance analysis. The failure of pipes, pumps, valves, and power supply can usually be rectified in less than 24 h, so normal supply can be restored within 48 h in the worst conditions. Cities and towns provided with a continuous water supply system can be seriously affected if failures are not rectified on a war footing, and it is the responsibility of the respective water supply agency to have an emergency plan to restore the system upon failure. Fujiwara and De Silva (1990) specified a mean time to repair of 2 days irrespective of the pipe diameter. Standby units for pumps and generators can help greatly in restoring the system within a short interval of time, and water can be supplied by tanker truck in the area affected by a pipe failure during the repair period. A serious factor that reduces the reliability of water supply is the increase of pipe roughness, which needs to be quantified; its effect on water supply should be considered at the design stage itself and also during regular performance assessments. One way to consider it at the design stage is to take the roughness value corresponding to half of the service life instead of the roughness of a new pipe. It is common, while designing the network, to consider the peak demand corresponding to the planning period as per guidelines. The aging of a pipeline not only increases pipe breakage but also increases the roughness of the pipe surface; increasing inner roughness reduces the designed supply and increases the probability of component failures.
The availability of a WDN can be described as the percentage of time for which it will be able to deliver the design supply at the desired service pressure. The time taken to repair and restore service to normal is a crucial parameter in measuring availability, and reliability is closely related to availability. Although the aging of water pipes and other components may not affect their availability, their ability to fulfill the design demand under aged conditions may be lost; the system reaches a state in which it is available but not reliable. Sizing the pipes such that they can satisfy the design demand irrespective of their age gives the network the ability to recover after damage and failure. When the network is designed with more nodal pressure than is actually required at the beginning of the planning horizon, it will be able to provide supply despite the increase in head loss due to roughness, increase in demand, and damage to the system. During a restoration period, the network can still provide its best service when it is designed with such self-healing characteristics. Under conditions of full water availability at the source, the degree to which water can be supplied during failures is described as the resilience of the network, and ensuring service pressure at all times in the network is essential for good resiliency. An investigation by Swati and Janga Reddy (2020) on two benchmark networks and a real network concluded that resilience can be a good surrogate measure for hydraulic reliability rather than for mechanical reliability. Earlier, Reed et al. (2010) and Banos et al. (2011) highlighted that the resilience index fails to represent the mechanical reliability of a water distribution network. Walski (2020) mentioned that the reliability of a water distribution network depends heavily on redundancy and on the availability of more isolation valves, rather than on increasing pipe capacity while designing the network; the systematic placement of valves is important because a single pipe failure can otherwise isolate several pipes from supplying the nodes. Hence, a network design based on a hydraulic reliability indicator can help to tackle demand variation and aging effects on the supply. Maintaining a uniform flow distribution among the pipes can reduce the burden on any particular pipe of carrying the inflow to a node (Moosavian and Lence, 2020), and a network designed with uniform head loss among all the pipes can have good performance characteristics. Hashemi et al. (2020) investigated the effect of pipe size and of the location of the pipe relative to the source on the head loss, using 18 water distribution systems in
North America. The study suggested that the flow rate in the pipe is a more influential factor than the pipe diameter in causing head loss for pipes located near the water source, whereas at the periphery of the network the pipe diameter is found to be more critical than the flow rate. In the present work, another new link-based index is proposed, which takes the maximum permitted head loss in the network as the reference while designing the network. The relative head loss in each pipe is computed with reference to the average head loss per unit length of pipe. This work also illustrates how pipe dimensioning can be carried out to obtain uniformly distributed head loss across the network using the average unit head loss, without much computational effort.
2. Resilience of water distribution network
Todini (2000) proposed a resilience index to address the intrinsic capability of a system to overcome failures. If the system is designed from a resilience point of view, it can have a reduced number of failure instances, minimum or reduced failure consequences, and the ability to recover quickly from failure. Resilience can also be considered as a measure of the capability of the system to absorb shocks, or to perform under perturbation. Wu et al. (2011) applied the surplus power factor as a resilience measure in the optimal design of a water transmission system. Yazdani and Jeffray (2012) combined robustness and redundancy to address the resilience of a water distribution system. In water resources systems, resilience is considered one of the important measures for quantifying the capacity of the system to maintain its essential functions during unexpected stresses and disturbances (Liu et al., 2012).
The resilience index (RI) proposed by Todini (2000) is the ratio of the sum of the residual power at all the nodes to the sum of the potential residual power at all the nodes:

RI = \frac{\sum_{j=1}^{N} ND_j \left( h_{avl,j} - h_{min,j} \right)}{\sum_{i=1}^{R} Q_{r,i}\, h_{res,i} + \sum_{b=1}^{B} P_b/\gamma - \sum_{j=1}^{N} ND_j\, h_{min,j}}   (1)

where
ND_j = demand at node j
h_{avl,j} = available pressure head at node j
h_{min,j} = minimum pressure head at node j
Q_{r,i} = flow from reservoir i
h_{res,i} = sum of the reservoir elevation and the water level of reservoir i
P_b = capacity of pump b
\gamma = specific weight of the liquid
N = number of nodes
R = number of reservoirs
B = number of pumps
Later, Jayaram and Srinivasan (2008) proposed a modified resilience index (MRI), defined as the ratio of the sum of the residual power to the sum of the minimum power required at all the nodes:

MRI = \frac{\sum_{j=1}^{N} ND_j \left( h_{avl,j} - h_{min,j} \right)}{\sum_{j=1}^{N} ND_j\, h_{min,j}}   (2)
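As a minimal illustration of Eqs. (1) and (2), the following Python sketch computes RI and MRI from nodal demands and heads obtained from any hydraulic solver; the function name and the example values are hypothetical and are not taken from the chapter.

```python
import numpy as np

def resilience_indices(demand, h_avl, h_min, q_res, h_res, pump_power=(), gamma=9.81):
    """Todini's RI (Eq. 1) and the modified resilience index MRI (Eq. 2).

    demand, h_avl, h_min : per-node demand (m3/s) and available/minimum heads (m)
    q_res, h_res         : per-reservoir outflow (m3/s) and total head (m)
    pump_power           : pump powers (kW); gamma is the specific weight (kN/m3)
    """
    demand, h_avl, h_min = map(np.asarray, (demand, h_avl, h_min))
    residual = np.sum(demand * (h_avl - h_min))              # surplus power delivered at the nodes
    supplied = np.dot(q_res, h_res) + np.sum(np.asarray(pump_power)) / gamma
    required = np.sum(demand * h_min)                        # minimum power required at the nodes
    ri = residual / (supplied - required)
    mri = residual / required
    return ri, mri

# Hypothetical three-node example fed by a single reservoir
ri, mri = resilience_indices(demand=[0.05, 0.03, 0.02],
                             h_avl=[38.0, 35.0, 33.0],
                             h_min=[30.0, 30.0, 30.0],
                             q_res=[0.10], h_res=[100.0])
print(round(ri, 3), round(mri, 3))
```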
While Todini's (2000) resilience index theoretically varies between 0 and 1 (poor and good, respectively), the modified resilience index of Jayaram and Srinivasan (2008) can exceed 1, so fixing upper and lower bounds is quite challenging if the index is used as a constraint. In several studies, maximizing the resilience of the water supply system is taken as either the first or the second objective function (Prasad and Park, 2004; Fu et al., 2013; Wang et al., 2014; Ostfeld et al., 2014; Moosavian and Lence, 2020). Maximization of the reliability of water supply is another commonly used second objective after cost minimization in multiobjective optimization models (Walski, 2020). Walski (2020) indicated that a reliability indicator developed only to address the excess capacity of the pipes does not address the true reliability of a WDN. In practice, the demand nodes located near the source have more surplus pressure energy than faraway nodes along the flow path or the identified critical nodes in the network. Utilizing the pressure head available at those nodes to meet additional
demand will certainly affect the supply at the critical nodes. Critical nodes can be identified as nodes located on higher ground, nodes remotely connected to the source, high-demand nodes, and nodes with many incident pipes. At those nodes, the available surplus power is negligible, and supply to the consumers cannot be maintained fully during abnormal conditions. Hence, the availability of a minimum surplus power at the critical nodes is crucial in quantifying the intrinsic capability of the system. The major advantage of Todini's and of Jayaram and Srinivasan's resilience indices is that they do not involve statistical considerations of failures. Network reliability can be enhanced if the network is designed to have higher values of these indices. Todini (2000) indicated that increasing resilience is possible if the flow is distributed more evenly among all the pipes rather than being concentrated in a spanning tree.
3. Hydraulic uniformity index (HUI)
Jeong and Kang (2020) introduced a link-based index called the hydraulic uniformity index (HUI) to evaluate system design. For an individual pipe, HUI is the ratio of the hydraulic gradient of that pipe to the equivalent hydraulic gradient; for the entire network, HUI_sys is given as follows:

HUI_{sys} = \sqrt{\frac{\sum_{i=1}^{NP} \left( HUI_i - 1 \right)^2}{NP}}   (3)
The HUI_i of each pipe is computed using the following formula:

HUI_i = \frac{hl_i \sum_{i=1}^{NP} L_i \sum_{i=1}^{NP} Q_i}{L_i \cdot NP \cdot \sum_{i=1}^{NP} Q_i\, hl_i}   (4)
where
HUI_i = hydraulic uniformity index of the ith pipe
Q_i = flow in the ith pipe
L_i = length of the ith pipe
NP = number of pipes
hl_i = head loss in the ith pipe.
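A minimal Python sketch of Eqs. (3) and (4) is given below; the pipe flows, lengths, and head losses are assumed to come from a hydraulic simulation, and the numerical values are illustrative only.

```python
import numpy as np

def hydraulic_uniformity(flow, length, headloss):
    """Per-pipe HUI (Eq. 4) and the network value HUI_sys (Eq. 3)."""
    q, L, hl = map(np.asarray, (flow, length, headloss))
    n_pipes = len(q)
    # Ratio of each pipe's hydraulic gradient to the flow-weighted equivalent gradient
    hui = (hl * L.sum() * q.sum()) / (L * n_pipes * np.sum(q * hl))
    hui_sys = np.sqrt(np.sum((hui - 1.0) ** 2) / n_pipes)
    return hui, hui_sys

# Hypothetical four-pipe example (flows in m3/s, lengths in m, head losses in m)
hui, hui_sys = hydraulic_uniformity(flow=[0.12, 0.08, 0.05, 0.02],
                                    length=[1000, 800, 1200, 600],
                                    headloss=[3.1, 2.4, 3.8, 0.9])
print(hui.round(3), round(hui_sys, 3))
```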
4. Mean excess pressure (MEP)
The mean excess pressure for a network can be found using the following expression:

MEP = \frac{\sum_{i=1}^{N} \left( h_{avl,i} - h_{min,i} \right)}{N}   (5)

where
N = number of nodes.
The MEP helps to compare the excess pressure availability of networks. Two networks of different cost can have the same mean excess pressure while performing quite differently under abnormal conditions, so this measure is useful when assessing the performance of different network configurations.
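In code, Eq. (5) reduces to a single averaging step; the node heads below are illustrative.

```python
import numpy as np

h_avl = np.array([38.0, 35.0, 33.0, 41.0])   # available heads at the nodes (m), illustrative
h_min = 30.0                                  # minimum required head (m)
mep = np.mean(h_avl - h_min)                  # Eq. (5): mean excess pressure (m)
print(round(mep, 2))
```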
5. Proposed measure
5.1 Energy loss uniformity (ELU)
The minimum pressure head required at a node is generally defined while designing the WDN. Primarily, the source head acts as the driving force that transmits water from the source to the consumer nodes. Part of the available source head is
dissipated in the pipeline to overcome friction and minor losses. The maximum loss can be considered as the difference between the source head (hs) and the minimum pressure head (hmin). This maximum permitted head loss is a key parameter in the design of the WDN, and the economics of the network with respect to pipe size depends on how well this maximum head loss is utilized in selecting the network configuration. Configuring the network with the minimum possible pipe sizes gives the maximum head loss in each pipe, since head loss is inversely related to pipe size; increasing the pipe size reduces the head loss, and limiting the pipe size with respect to the average unit head loss can provide a balanced configuration. The average unit head loss is computed as the ratio of the maximum head loss to the total length of pipeline (L) in the network:
UHL_{avg} = \frac{h_s - h_{min}}{L}   (6)

where
h_s = source pressure head.
Maintaining the unit head loss in each link close to the average unit head loss (UHLavg) can eliminate both over- and under-sizing of the pipes in the network. It may not be feasible to have exactly the same unit head loss in each link, since the pipe sizes must be chosen from the available commercial sizes. If the ratio between the actual unit head loss in a pipe and the average unit head loss of the network equals one, the pipe is sized exactly according to the average unit head loss; a much larger ratio denotes that the pipe is undersized, and vice versa. In the optimal design of a WDN, the pipe sizing is performed to minimize cost subject to the minimum pressure head requirement as the major constraint. This specific head loss (HLspecific) can be considered as one of the design parameters for sizing the pipeline or for addressing the uniformity of the network:
HL_{specific} = \frac{UHL_{actual}}{UHL_{avg}}   (7)
The uniformity of the network's pipe configuration in terms of head loss (energy loss uniformity) can be computed from HLspecific by finding its standard deviation with respect to one, as follows:

ELU = \sqrt{\frac{\sum_{i=1}^{NP} \left( HL_{specific,i} - 1 \right)^2}{NP}}   (8)
where
NP is the number of pipes in the network.
ELU assesses the degree to which the network configuration possesses uniformity of pipe capacity to meet the nodal demands. A low value of this measure indicates good uniformity in the pipe configuration, whereas a higher value indicates that the network may be composed of both redundant pipes and pipes of deficient capacity. Redundant pipes in the network may help to augment the flow during abnormal conditions (e.g., a pipe failure scenario or a period of excess demand), but the cost of the network will be higher. A network with more deficient pipes could be a least-cost one, but its performance under abnormal conditions will be poor. The proposed ELU can be used both for the design of new networks and to assess the pipe uniformity of an existing system.
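The following Python sketch ties Eqs. (6) to (8) together for a given design; the per-pipe head losses are assumed to come from a demand-driven simulation of the configuration being assessed, and the numerical values are placeholders.

```python
import numpy as np

def energy_loss_uniformity(h_source, h_min, lengths, headlosses):
    """Energy loss uniformity (Eq. 8) built on the average unit head loss (Eq. 6)."""
    L = np.asarray(lengths, dtype=float)        # pipe lengths (m)
    hl = np.asarray(headlosses, dtype=float)    # simulated head loss in each pipe (m)
    uhl_avg = (h_source - h_min) / L.sum()      # Eq. (6): average unit head loss (m/m)
    uhl_actual = hl / L                         # actual unit head loss in each pipe
    hl_specific = uhl_actual / uhl_avg          # Eq. (7)
    return np.sqrt(np.sum((hl_specific - 1.0) ** 2) / len(L))

# Illustrative values: 100 m source head, 30 m minimum service head
elu = energy_loss_uniformity(h_source=100.0, h_min=30.0,
                             lengths=[1000, 800, 1200, 600],
                             headlosses=[22.0, 15.0, 18.0, 9.0])
print(round(elu, 3))
```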
6. Hanoi network
The Hanoi network has been widely used by researchers to examine the applicability of various types of optimization algorithms and to assess the performance of WDNs using performance indices. It is categorized as a medium-size network with one reservoir, 31 nodes, and 34 pipes. The source elevation is fixed at 100 m with respect to zero ground elevation, and all the nodes are at the same elevation with zero reduced level. A minimum service pressure of 30 m is required at all the demand nodes. To examine the proposed energy loss uniformity (ELU), various optimal network configurations reported in the literature (Suribabu, 2010, 2012, 2017; Beygi et al., 2014; Saldarriaga et al., 2020) have been considered. The layout of the network, pipe lengths, and demand details are available in Suribabu (2010). Further, the resilience index (Todini, 2000; Eslamian et al., 2019), the modified resilience index (Jayaram and Srinivasan, 2008), and the hydraulic uniformity index (Jeong and Kang, 2020) are considered for a comparative study to assess the performance of the network.
7. Results and discussion
The optimal solutions available for the Hanoi network in the literature are used to evaluate its resilience, hydraulic uniformity, and energy loss uniformity. ELU, MEP, RI, MRI, and HUIsys were calculated for 17 selected solutions, and Table 1 gives the computed values of these indices together with the cost of each network. The general observation is that, as cost increases, the ELU value decreases while the remaining three indices show an increasing trend. However, certain solutions do not match this trend. For example, at a higher cost the ELU value should be lower than that of the least-cost solution, yet solution 15 has a higher cost with a higher ELU value; RI and MRI show the same reversal for that solution, whereas HUIsys shows a higher value, indicating better uniformity, and rates it as a better solution. The denominators of ELU, RI, and MRI are constant for a given network, whereas the denominator of HUIsys is variable.
For a direct comparison of the indices over the selected solutions, three plots were prepared. It can be seen from Fig. 1 that ELU is not directly or linearly related to the other three indices; hence, the proposed index cannot be viewed as a replica of those indices. From the plot of the resilience indices against HUIsys (Fig. 2), it can be seen that HUIsys is not directly related to the resilience indices. Fig. 3 clearly shows that the modified resilience index is directly related to the resilience index, as the slope between them is found to be constant. It should be noted that RI and MRI are indices based on the available excess pressure energy, whereas the proposed ELU index and HUIsys are based on the energy utilized. ELU takes the maximum permissible head loss (the difference between the source head and the minimum service pressure head) as the reference for addressing uniformity, whereas HUI considers the equivalent head loss, obtained as a weighted average of the head losses of that network, with the ratio of the flow in a pipe to the sum of the flows in all the pipes used as the weighting factor. This value varies from solution to solution, since the distribution of flow changes significantly when the network configuration changes; consequently, the denominator of the HUIsys expression is variable. Hence, HUI cannot be used directly in an optimization model for arriving at the best configuration in terms of hydraulic uniformity, because the optimization process generates many infeasible solutions and the equivalent head loss is calculated with reference to the head losses of each particular network. HUIsys can, however, effectively assess the hydraulic uniformity of a feasible network configuration.
TABLE 1 Various performance measure values for selected solutions for the Hanoi network.
Solution number | Cost of the network ($) | ELU | MEP (m) | HUIsystem | RI | MRI
01 | 6,081,087 | 4.102 | 11.59 | 0.843 | 0.134 | 0.447
02 | 6,232,322 | 4.128 | 13.40 | 0.843 | 0.151 | 0.504
03 | 6,260,886 | 4.083 | 13.21 | 0.861 | 0.150 | 0.502
04 | 6,302,313 | 4.236 | 13.38 | 0.846 | 0.150 | 0.501
05 | 6,374,469 | 4.072 | 12.76 | 0.871 | 0.172 | 0.574
06 | 6,399,904 | 4.567 | 12.75 | 0.897 | 0.144 | 0.479
07 | 6,426,983 | 4.282 | 14.21 | 0.887 | 0.162 | 0.541
08 | 6,650,114 | 3.641 | 18.54 | 0.912 | 0.203 | 0.675
09 | 6,703,973 | 3.663 | 18.23 | 0.939 | 0.197 | 0.655
10 | 6,710,999 | 3.550 | 19.84 | 0.959 | 0.211 | 0.705
11 | 6,800,022 | 3.697 | 19.31 | 0.943 | 0.206 | 0.686
12 | 6,927,267 | 3.591 | 17.40 | 0.967 | 0.188 | 0.627
13 | 6,950,027 | 3.667 | 20.24 | 0.983 | 0.213 | 0.711
14 | 7,128,405 | 3.540 | 20.85 | 0.969 | 0.222 | 0.739
15 | 7,216,711 | 4.522 | 18.58 | 1.032 | 0.198 | 0.659
16 | 7,417,236 | 3.535 | 21.89 | 1.002 | 0.229 | 0.765
17 | 7,797,775 | 3.540 | 22.80 | 1.042 | 0.237 | 0.789
FIG. 1 Plot of the three performance indices (RI, MRI, and HUIsystem) against the proposed ELU index.
FIG. 2 Plot of the resilience indices (RI and MRI) against HUIsystem.
FIG. 3 Plot of the modified resilience index against the resilience index.
The proposed approach can be used both in an optimization model and to assess the ELU of any network configuration. Although RI and MRI have higher values at higher cost, such networks are still unable to satisfy a uniformly increased demand at several nodes.
Fig. 4 shows the supply-demand ratio for 10%, 20%, and 30% increases in the demand at all the nodes for the least-cost solution 1 ($6,081,087) given in Table 1. Nodes 2, 3, 4, 5, 18, 19, and 20 satisfy the increased demand fully. EPANET
FIG. 4 Supply-demand ratio of the solution of cost $6,081,087 for increased demand conditions (10%, 20%, and 30% increase in demand; plotted against node number).
2.2 was used to compute the supply at each node, with the minimum and required pressure heads taken as 30 m and 45 m, respectively, for the Hanoi network.
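As an illustration only, and not the authors' actual workflow, the sketch below shows how such a pressure-driven run could be scripted with the open-source WNTR package, which drives EPANET 2.2 from Python; the input file name is hypothetical, the option names are those assumed for recent WNTR releases, and the demands are scaled inside the script.

```python
import wntr

wn = wntr.network.WaterNetworkModel('hanoi.inp')      # hypothetical EPANET input file
for _, junction in wn.junctions():                    # scale all base demands by +10%
    junction.demand_timeseries_list[0].base_value *= 1.10

wn.options.hydraulic.demand_model = 'PDD'             # pressure-driven analysis ('PDA' in some releases)
wn.options.hydraulic.minimum_pressure = 30.0          # head below which no supply is delivered (m)
wn.options.hydraulic.required_pressure = 45.0         # head at which full demand is met (m)

results = wntr.sim.EpanetSimulator(wn).run_sim()
supplied = results.node['demand'][wn.junction_name_list].loc[0]
expected = wntr.metrics.expected_demand(wn)[wn.junction_name_list].loc[0]
print('network supply-demand ratio:', supplied.sum() / expected.sum())
```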
The pressure-driven analysis (PDA) of EPANET 2.2 indicates that all nodes other than the fully supplied ones can only partially meet the increased demand. The network supply-demand ratio for solution 1 for 10%, 20%, and 30% increases in demand is 0.892, 0.843, and 0.796, respectively. Three further solutions, with costs of $6,710,999, $7,128,405, and $7,797,775, were taken from Table 1 to compute the supply-demand ratio under 10%, 20%, and 30% increases in demand. It can be seen from the radar plots (Fig. 5A–C) that the increased demand is met only at those nodes close to the source that have more excess pressure; a deviation of the curves is visible only at those nodes, while the curves continue to overlie each other at the remaining nodes. There is only a marginal difference in the supply-demand ratio among the three solutions: for a 10% demand increase the ratio is 0.946, 0.954, and 0.967, respectively; for 20% it is 0.897, 0.904, and 0.915; and for 30% it is 0.848, 0.856, and 0.864. It is to be noted that an increase in cost alone does not guarantee full supply when the demand increases over a period. The main reason could be that, although there is excess capacity to meet the increased demand, the nodes located remotely from the source continue to face excess head loss along their supply paths. The variation of pressure among the nodes should be as low as possible for the increased demand to be met uniformly, which would be possible only if the reservoir were relocated to the center of the network.
The percentage increase in cost between the minimum- and maximum-cost solutions available for the Hanoi network in Table 1 is 22.01%, and the corresponding increases of MEP, RI, and MRI are 49.17%, 43.45%, and 43.34%, respectively. The similar percentage increases of RI and MRI indicate that these two indices are directly correlated with the mean excess pressure (MEP) available in the network. For HUIsystem and ELU, the percentage changes are 19.1% and 15.88%, respectively; although both indices use the head loss in the pipelines, there is a significant variation between them, since ELU uses the maximum permissible head loss rather than the equivalent head loss to measure uniformity. MEP is found to be a simple measure that can be used as a second objective in multiobjective optimization. For example, comparing solutions 10 and 11 in Table 1, the lower-cost solution has a higher MEP than the higher-cost solution, and the same is the case for solutions 14 and 15.
A new Hanoi network configuration was then derived to ascertain the usefulness of the average unit head loss. The diameter for each link was picked as the commercial size whose unit head loss is closest to the average unit head loss. This was done by setting the same size for all the pipes, simulating the network using demand-driven analysis, and recording, for each pipe ID, whether that diameter should be retained; the procedure was repeated for the remaining commercial sizes. Table 2 illustrates how the pipe diameter was selected for each pipe based on the unit head loss. The solution obtained by this approach has a total cost of $7,709,796, and the head loss in each pipe of the new configuration is closest to the head loss chosen in the diameter selection process (the head loss corresponding to the selected diameter in Table 2). Although the head loss for pipes 1 and 2 is very high, there is no larger pipe size available in the option set. For this new configuration, the MEP, ELU, HUIsystem, RI, and MRI were calculated as 21.68 m, 3.536, 1.011, 0.228, and 0.760, respectively. This solution costs less than the 17th solution listed in Table 1, while its performance measure values are nearest to those of that solution. Similarly, the supply-demand ratio for 10%, 20%, and 30% increases in demand is 0.959, 0.908, and 0.858, and it performs closely
FIG. 5 Radar plot for quantity of demand met at each node against demanded quantity. (A) Cost of network $6,710,999. (B) Cost of network $7,128,405.
(C) Cost of network $7,797,775.
with the 17th solution in meeting the additional demand. Hence, designing the network using the average unit head loss is an easy way to obtain a well-performing network configuration at low cost without much computational effort. The design procedure illustrated here can be used to pick pipe sizes and arrive at a good configuration in terms of high values of the performance indices. A still better configuration could be obtained through an optimization that keeps minimization of the difference between the actual head loss and the average head loss as a second objective function, with cost minimization as the first objective.
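The selection rule described above can be written in a few lines of Python. In this sketch, the dictionary of unit head losses per candidate diameter is assumed to have been produced beforehand by the uniform-diameter, demand-driven simulations mentioned in the text (the two rows shown are transcribed from Table 2), the total pipeline length of the Hanoi network is taken as 39.42 km, and the function name is illustrative.

```python
# unit_headloss[pipe_id][diameter_mm] = simulated unit head loss (m/km) when the whole
# network is set to that diameter (values for pipes 1 and 22 taken from Table 2).
unit_headloss = {
    1:  {304.8: 10073.94, 406.4: 2480.98, 508: 836.71, 609.6: 344.26, 762: 116.10, 1016: 28.59},
    22: {304.8: 10.33,    406.4: 2.54,    508: 0.86,   609.6: 0.35,   762: 0.12,   1016: 0.03},
}

def select_diameters(unit_headloss, h_source, h_min, total_length_km):
    """Pick, for each pipe, the commercial diameter whose unit head loss is closest
    to the network-average unit head loss UHL_avg (Eq. 6)."""
    uhl_avg = (h_source - h_min) / total_length_km        # m of head loss per km of pipeline
    return {pipe: min(losses, key=lambda d: abs(losses[d] - uhl_avg))
            for pipe, losses in unit_headloss.items()}

# Hanoi figures: 100 m source head, 30 m minimum head, about 39.42 km of pipe
print(select_diameters(unit_headloss, h_source=100.0, h_min=30.0, total_length_km=39.42))
```

With these inputs the rule returns 1016 mm for pipe 1 and 406.4 mm for pipe 22, matching the selections shown in Table 2.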
Further, the network performance was examined by relocating the reservoir to node 18, which occupies a central position in the Hanoi network. A new configuration for the relocated reservoir was derived using the average unit head loss, following the same steps as above. The cost of this network is $7,177,777, and its MEP, RI, MRI, HUIsys, and ELU values are 48.25 m, 0.493, 1.644, 1.487, and 2.850, respectively. When this solution is compared with solution 14 in Table 1, there is a drastic change in the values of the various indices: for a cost increase of 0.68%, the changes in MEP, RI, MRI, HUIsys, and ELU are 56%, 54.98%, 55.05%, 34.84%, and 19.50%, respectively. It is interesting to note that this solution is able to meet a 37% increase in demand at all the nodes. If the network were configured with all pipes of 1016 mm, its cost would be $10,969,797 and its MEP would be 28 m, which is about 58% of the MEP of the new solution obtained by changing the position of the reservoir.
TABLE 2 Unit head loss for each pipe and selected diameter for the new Hanoi network configuration.
Columns: pipe ID; unit head loss (m/km) for pipe diameters 304.8, 406.4, 508, 609.6, 762, and 1016 mm; selected diameter (mm); head loss for the selected configuration (m).
Pipe ID | 304.8 | 406.4 | 508 | 609.6 | 762 | 1016 | Selected diameter (mm) | Head loss for selected configuration (m)
1 | 10,073.94 | 2480.98 | 836.71 | 344.26 | 116.10 | 28.59 | 1016 | 28.59
2 | 9257.07 | 2279.80 | 768.86 | 316.34 | 106.69 | 26.27 | 1016 | 26.27
3 | 1048.56 | 258.23 | 87.09 | 35.83 | 12.08 | 2.98 | 1016 | 2.89
4 | 1006.01 | 247.76 | 83.56 | 34.38 | 11.59 | 2.86 | 1016 | 2.77
5 | 783.68 | 193.00 | 65.09 | 26.78 | 9.03 | 2.22 | 1016 | 2.15
6 | 518.26 | 127.63 | 43.04 | 17.71 | 5.97 | 1.47 | 1016 | 1.41
7 | 242.73 | 59.78 | 20.16 | 8.29 | 2.80 | 0.69 | 762 | 2.63
8 | 158.26 | 38.98 | 13.14 | 5.41 | 1.82 | 0.45 | 762 | 1.68
9 | 93.35 | 22.99 | 7.75 | 3.19 | 1.08 | 0.26 | 762 | 0.97
10 | 142.44 | 35.08 | 11.83 | 4.87 | 1.64 | 0.40 | 762 | 1.64
11 | 0.73 | 63.12 | 21.29 | 8.76 | 2.95 | 0.73 | 1016 | 0.73
12 | 1.24 | 26.56 | 8.96 | 3.69 | 1.24 | 0.31 | 762 | 1.24
13 | 34.7 | 8.54 | 2.88 | 1.19 | 0.40 | 0.10 | 609.6 | 1.4
14 | 88.62 | 21.83 | 7.36 | 3.03 | 1.02 | 0.25 | 762 | 1.13
15 | 120.58 | 29.70 | 10.01 | 4.12 | 1.39 | 0.34 | 762 | 1.52
16 | 460.74 | 113.47 | 38.27 | 15.74 | 5.31 | 1.31 | 1016 | 1.43
17 | 675.47 | 166.35 | 56.10 | 23.08 | 7.78 | 1.92 | 1016 | 2.07
18 | 1082.77 | 266.66 | 89.93 | 37.00 | 12.48 | 3.07 | 1016 | 3.26
19 | 1102.98 | 271.64 | 91.61 | 37.69 | 12.71 | 3.13 | 1016 | 3.32
20 | 1186.67 | 292.25 | 98.56 | 40.55 | 13.68 | 3.37 | 1016 | 3.27
21 | 75.04 | 18.48 | 6.23 | 2.56 | 0.86 | 0.21 | 609.6 | 2.56
22 | 10.33 | 2.54 | 0.86 | 0.35 | 0.12 | 0.03 | 406.4 | 2.54
23 | 421.54 | 103.82 | 35.01 | 14.41 | 4.86 | 1.20 | 1016 | 1.13
24 | 77.33 | 19.04 | 6.42 | 2.64 | 0.89 | 0.22 | 609.6 | 2.62
25 | 16.19 | 3.99 | 1.34 | 0.55 | 0.19 | 0.05 | 508 | 1.32
26 | 6.01 | 1.48 | 0.50 | 0.21 | 0.07 | 0.02 | 406.4 | 2.35
27 | 60.7 | 14.95 | 5.04 | 2.07 | 0.70 | 0.17 | 609.6 | 2.4
28 | 97.73 | 24.07 | 8.12 | 3.34 | 1.13 | 0.28 | 762 | 1.26
29 | 47.87 | 11.79 | 3.98 | 1.64 | 0.55 | 0.14 | 609.6 | 1.38
30 | 27.32 | 6.73 | 2.27 | 0.93 | 0.31 | 0.08 | 508 | 1.8
31 | 9.37 | 2.31 | 0.78 | 0.32 | 0.11 | 0.03 | 406.4 | 1.49
32 | 0.56 | 0.14 | 0.05 | 0.02 | 0.01 | 0 | 304.8 | 0
33 | 0 | 0 | 0 | 0 | 0 | 0 | 304.8 | 0.58
34 | 26.7 | 6.58 | 2.22 | 0.91 | 0.31 | 0.08 | 508 | 2.74
8. Conclusions
In this chapter, a new link-based index for evaluating pipe uniformity in terms of energy loss is proposed and compared with the resilience, modified resilience, and hydraulic uniformity indices. The Hanoi network was selected for investigating the proposed index, and the analysis shows that RI and MRI act as alternative indices in addressing the performance of the network. Although HUIsys and the ELU index both use the head loss in each pipe, they measure the energy consumed in the pipelines in distinct ways. Pipe selection using the average unit head loss directly gives a good network configuration according to the proposed ELU and the other indices used in this study. The average head loss guides the pipe dimensioning process in quite a simple way, and the resulting configuration promises economy while providing maximum performance. A network designed by the average head loss can be further improved toward least cost by bringing the unit head loss of each pipe closer to the average head loss adopted for design. It is further observed that the network performance cannot be improved just by increasing the pipe sizes alone, since an increase of pressure cannot be achieved in the network unless a booster pump is added or the source is located at the centroidal point of the network. In the Hanoi network the source is not situated at the centroidal point; hence the available pressure is not uniformly distributed across the network. It is found from the study that even a network with larger pipes, intended to meet an unexpected increase in demand, continues to suffer at the critical nodes; only the nodes closer to the source are able to supply the additional demand without compromising the service pressure. Enhanced performance of the network under higher demand or abnormal operation is possible only if the source of water is made available at the centroidal location of the network. These conclusions are drawn from the analysis carried out for the Hanoi network, and further study using other networks can shed more insight on the measures and the design approach.
References
Banos, R., Reca, J., Martínez, J., Gil, C., Márquez, A.L., 2011. Resilience indexes for water distribution network design: a performance analysis under
demand uncertainty. Water Resour. Manag. 25 (10), 2351–2366.
Beygi, S., Haddad, O.B., Mehdipour, E.F., Mariño, M.A., 2014. Bargaining models for optimal design of water distribution networks. J. Water Resour.
Plan. Manag. 140 (1), 92–99.
Eslamian, S., Syme, G., Reyhani, M.N., 2019. Building socio-hydrological resilience: theory to practice. Virtual Special Issue, J. Hydrol. 575, 930–932.
Fu, G., Kapelan, Z., Kasprzyk, J.R., Reed, P., 2013. Optimal design of water distribution systems using many-objective visual analytics. J. Water Resour.
Plan. Manag. 139, 624–633.
Fujiwara, O., De Silva, A.U., 1990. Algorithm for reliability-based optimal design of water networks. J. Environ. Eng. 116 (3), 575–587.
Guercio, R., Xu, Z., 1997. Linearized optimization model for reliability-based design of water systems. J. Hydraul. Eng. 123 (11), 1020–1026.
Hashemi, S., Filion, Y., Speight, V., Long, A., 2020. Effect of pipe size and location on water-main head loss in water distribution systems. J. Water Resour.
Plan. Manag. 146 (6), 06020006.
Howard, G., Bartram, J., 2010. Vision 2030: The Resilience of Water Supply and Sanitation in the Face of Climate Change. World Health Organization,
Geneva, Switzerland.
Jayaram, N., Srinivasan, K., 2008. Performance-based optimal design and rehabilitation of water distribution networks using life cycle costing. Water
Resour. Res. 44 (1), 1–15.
Jeong, G., Kang, D., 2020. Hydraulic uniformity index for water distribution networks. J. Water Resour. Plan. Manag. 146 (2), 04019078.
Liu, D., Chen, X., Nakato, T., 2012. Resilience assessment of water resources system. Water Resour. Manag. https://doi.org/10.1007/s11269-012-0100-7.
Mazumder, R.M., Salman, A.M., Li, Y., Yu, X., 2019. Reliability analysis of water distribution systems using physical probabilistic pipe failure method.
J. Water Resour. Plan. Manag. 145 (2), 04018097.
Moosavian, N., Lence, B.J., 2020. Flow-uniformity index for reliable-based optimal design of water-distribution networks. J. Water Resour. Plan. Manag.
146 (3), 04020005.
Ostfeld, A., Oliker, N., Salomons, E., 2014. Multiobjective optimization for least cost design and resiliency of water distribution systems. J. Water Resour.
Plan. Manag. 140 (12), 04014037-01-12.
Paez, D., Filion, Y., 2019. Mechanical and hydraulic reliability estimators for water distribution systems. J. Water Resour. Plan. Manag. 145 (11),
06019010.
Prasad, T.D., Park, N.S., 2004. Multiobjective genetic algorithms for design of water distribution networks. J. Water Resour. Plan. Manag. 130 (1), 73–82.
Reed, D.N., Sinske, A.N., Van Vuuren, J.H., 2010. Comparison of four reliability surrogate measures for water distribution systems design. Water Resour.
Res. 46 (5), W05524.
Saldarriaga, J., Paez, D., Salcedo, C., Cuero, P., Lopez, L.L., Leon, N., Celeita, D., 2020. A direct approach for the near-optimal design of water distribution
networks based on power use. Water 12, 1037. https://doi.org/10.3390/w12041037.
Suribabu, C.R., 2010. Differential evolution algorithm for optimal design of water distribution networks. J. Hydroinf. 12 (1), 66–82.
Suribabu, C.R., 2012. Heuristic based pipe dimensioning model for water distribution networks. J. Pipeline Syst. Eng. Pract. 3 (4), 115–124.
Suribabu, C.R., 2017. Resilience-based optimal design of water distribution network. Appl Water Sci 7 (7), 4055–4066.
Swati, S., Janga Reddy, M., 2020. Assessing the performance of surrogate measures for water distribution network reliability. J. Water Resour. Plan.
Manag. 146 (7), 04020048.
Todini, E., 2000. Looped water distribution networks design using a resilience index based heuristic approach. Urban Water 2 (2), 115–122.
Walski, T., 2020. Providing reliability in water distribution systems. J. Water Resour. Plan. Manag. 146 (2), 02519004.
Wang, C., Blackmore, J., Wang, X., Yum, K.K., Zhou, M., Diaper, C., McGregor, G., Anticev, J., 2009. Overview of Resilience Concepts, with Application to Water Resource Systems. eWater Cooperative Research Centre Technical Report, eWater CRC, University of Canberra, Australia.
Wang, Q., Guidolin, M., Savic, D., Kapelan, Z., 2014. Two-objective design of benchmark problems of a water distribution system via MOEAs: towards
the best known approximation of the true Pareto front. J. Water Resour. Plan. Manag. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000460.
Wu, W., Maier, H.R., Simpson, A.R., 2011. Surplus power factor as a resilience measure for assessing hydraulic reliability in water transmission system
optimization. J. Water Resour. Plan. Manag. 137 (6), 542–546.
Yazdani, A., Jeffray, P., 2012. Applying network theory to quantify the redundancy and structural robustness of water distribution system. J. Water Resour.
Plan. Manage. 138 (2), 153–161.
Chapter 7
The co-nodal system analysis
Vladan Kuzmanovic
Serbian Hydrological Society, International Association of Hydrological Sciences, Belgrade, Serbia
1. Introduction
The co-nodal analysis finds interesting examples in geographical fields such as hydrological and paleo-hydrological studies. Remote sensing techniques provide a more consistent insight into anastomosing phenomena as well as into their complex logical relationships. Flows and hydrodynamic functions are not unambiguous, even more so in cross-sections of paleo-hydrological periods. Remote sensing techniques allow a multidimensional analysis of hydrological phenomena in place of simplistic hydrological methods and reduced, insufficient field data. Remote sensing is particularly emphasized in the paleo-hydrological discipline of geography, enabling monitoring of the dynamics of macrogeographical phenomena and the use of correlations among macrogeographical factors. Macrophenomena are dimensional, complex geographical and spatial concepts that are partitions and subsets of geographical categories. GIS techniques enable more accurate mapping of geospatial objects, as well as aggregate visualizations of geographic and macrogeographic phenomena (Wilkinson, 1996; Langat et al., 2019). Hydrological and paleo-hydrological phenomena are dynamic long-term interactions with intriguing fluvial yields.
The co-nodal analysis considers a co-nodal system composed of nodules and flows that describe oriented hydrological functions. A river basin, or a set and subsets of river basins, is not just a linear and unambiguous hydrological network, as seen by conventional river basin theory; by definition of the nodal theory it is a multifaceted oriented network (reversible, bidirectional, complementary). The nodal theory is a temporal, time-spatial analysis of flows and fluvial vectors, not just a spatial, linear, and actual analysis of water maps. Flows are not one-way directions but poly-orientations; they are not spatial, but temporal-spatial categories.
The analysis considers nodal functions as a function of the nodus as a hub and intersection. Flows or branches are temporal functions. Vectors have a temporal or temporal-spatial orientation. Flows are elements of different quad-circular
systems. The nodal system is described by time layers, where the layer defines orientation (nodal, mono-orientation)
and set (nodal set, layout of active nodes, and functional flows). However, characteristics of dynamic models are multilayer
positioning, overlapping layers, complementing flows, and adding functions. The model features functional addition
(addition of flows) and poly-orientation of functions. The nodal system thus forms a circular fluvial model with active sets
and fluvial palimpsests of the nth time layer. Unlike the conventional net model, which is simplistic with point A to point B
flows, the nodal system is a multidimensional model with a multilayered circular flow. Downstream flows are paleohydrological or chronological consequences. The complementary model explains the dilemmas of anastomosing flows.
Funnels are not simple regular flows, but complex productions or results. An anastomosing case changes the conventional
way we understand fluvial dynamics. The co-nodal system enables the interpretation and analysis of anastomosing flows as
complex logical systems through the formalization of advanced flow models. Formalized models make it possible to understand contradictory and alogical results, from the standpoint of conventional theory, especially in cases of counter currents
and overlap in the light of nondimensional models.
2. Co-nodal and system analysis
Remote sensing techniques provide a clear insight into complex geo-natural phenomena, and thus an adequate relational
analysis of concepts and elements in the model. Often, conclusions and a full explanation of a phenomenon can be drawn
only after techniques have been performed, GIS data collected, distant samples acquired (Walsh et al., 1998). GIS data are
of particular importance in cases of limited data collecting, with positive physical phenomena, such as geo terrain, or multisected permeation. GIS provides obvious insights (geographic, hydrogeographic insights) or provides visual data on which
models can be set (Gilvear, 1999; Gilvear et al., 1999; Gangodagamage and Agrarwal, 2001).
Selected visual data can be formalized, as applied to hydrological phenomena, into nodal systems, models, vectorial, or
flow maps, with an explanation of specific physical systems, models that link groups of geographical objects and systems
into formal work frames (such as catchments, river mouths, anastomosing, intercourse, etc.).
Complex hydrological models find adequate formalization in co-nodal systems, given the abundance, multiplication,
dynamics, relations of elements (hubs and nodes) and systems (basins and rivers) as well as chronologies. Hydrological
models function on the principle of nodes and orientations. In paleo-hydrological analysis, orientations are not only spatial (geographical) but also temporal categories. Geographic models are not only physical, spatial systems, but also complex, multidimensional, supra-geographical, linear compositions. The placement of geographic models is oriented in space and time, and therefore basic geographical facts such as flow and orientation are not mono-dimensional but vector categories. Geographic elements are not merely physical but structured elements of a system. The contemporary theory involves a structured, multidimensional, realistic, diffuse model, versus a linear GIS model, in the classification of paleo-hydrological phenomena.
GIS visual data enable global access to geo-phenomena and allow them to be identified, defined, and classified with clear examples and explanations; they support comparisons and inferences based on similarities and differences, as well as deductions about specificity. Physical categories such as co-nodal systems (counter-fluxes, hub systems) are not simply elements of physical geography, but of modern geography, for which the dynamics of different hydrological orientations and values hold. For co-nodal systems, poly-values, relative focus, dynamics, multidimensionality, and linearity are valid. Co-nodal systems are also the basic hydrological fluvial systems. Paleo-hydrology implements the temporal element in geo-interpretations in a particular way, and co-nodal sets appear as a suitable work frame.
3. Paleo-hydrology and remote sensing
Co-nodal systems can generate certain postulates, such as lower-flow/sequestration, two-stream, and counter-flux. These postulates are linear, model-logical, relative, causal and acausal statements, relationships, and structured facts of advanced systems. The co-nodal system is thus composed of nodes and branches. In cases of anastomosing, flows take on inverse functions and abandoned channels become the channels of other rivers; through co-nodal models these dynamics become quite readable (Paraguay river system, Mesopotamian bifluvial system).
legible as branching, and the models that describe this and such branching are the most efficient models and the closest
models that fit the approximate Remote sensing data. Branches, networks, subset aggregations, and circular systems
effectively correspond to field data. The dynamics of water systems and flows is the formalization of dual, progressive
systems. Remote sensing data provide the material for formalizing hydrological models as river co-nodal systems, hubs
and flows, as multioriented interactions and vector exchanges. Axial models approximate causal ones as special cases of
interdynamics.
Remote sensing’s geo data are crucial to understanding hydrodynamics. Hydrosystems, especially hydrodynamic
systems, are complex phenomena and require comprehensive research. Fluvial analysis, dynamic analysis of flow/river
systems is provided at a high level of scientific accuracy/validity. Paleo-hydrological objects, phenomena, and paleohydrological issues require time analysis of the hydrological facts found. Temporal reconstruction is by all means facilitated by the use of visual data, primarily by remote sensing techniques given the nature, structure, and volume of information
data status, and physical internship that have been enhanced by hydrological and hydrodynamic analysis (less noticeable
phenomena). Remote sensing is a typical information technology technique. Information data can be tuned, calibrated,
planned, controlled, managed, and implemented through interactive research work and processes, such as process
alignment and implementation. Remote sensing data are dynamic info technology. Its data are much more interesting
and vibrant than other, secondary quantitative and qualitative techniques.
Paleo-hydrology is perhaps the most implicit field when implementing remote sensing techniques. Knowing that the
results and findings are the most intriguing and the area of possible applications most attractive, the interactive findings
open up opportunities for new geographical knowledge, interpretations, and the basics of comparative research (Abbasova
et al., 2017). Paleo-hydrological objects and topics are more easily visible with the remote sensing apparatus. Remote sensing is more widely applied, and applicable, to global and macrogeographic phenomena. Macrogeographical phenomena such as geographical systems are observable, and comprehensible paleo-hydrological objects and fluvial systems are macrogeographic entities.
As synthetic systems, i.e., as holistic and logical structures, phenomena have a spatial spread and are holistically structured. Remote sensing techniques are especially useful for macrophenomena and hydrogeographic systems, dynamic complexes, paleo chronologies, and fluvial models.
4. Methods
The purpose of this study is to analyze the flow of the Danube to determine the potential and dynamics of the channel system
and the complex time management of this significant European river. Paleo-hydrological analyses were done in chronological and spatially sequential terms, with sections (river flow fragments) and hydrological stages. The river has complex
canal potency, established or indicated relations, extraordinary canal, as well as prerequisites for the further development
and improvement of water management of the river. These connections are noticeable throughout the Danube and are particularly pronounced in the middle and lower middle parts, the Pannonian segment of the lower propagation. The lower
Danube segment registers intersecting flows and manageable communications that have long been the subject of management of the development and implementation of water management systems (Constantinescu et al., 2015).
The potentials of the upper Pannonian and lower Pannonian hydrological blocks have been generally investigated, as
recent hydrophysical outputs. The basic paleo-fluvial relations were represented by the observational method, as well as
repercussions, according to the current flows. On this occasion, advanced RS systems were used with the support of user-oriented geographic software and remote sensing spatial systems. Hydro flows are summarized by confluxes as marked
paleo-hydrological points (points of previous or current paleo-estuaries).
Hydro points were used as starting points or nodes for alternative subset and vector analysis. For the analysis of the basic
paleo hydro nodes of the Pannonian and sub-Pannonian blocks, the block is divided according to ref. geographic features
with due regard to the dynamics and sometimes complex logic of fluvial or laco-fluvial formations. On that occasion, neighboring paleo nodes (paleo nodes and current hydro nodes) and river dominants were considered. The dominant and the node
form a subset or hydrologic block. The block is determined by simultaneous dominant, parallel (paleo-distributary), or
consecutively oriented flows (one river in two flows, in the interphase, named hereafter as Danube transitions). A stream
changes its dominant over time; hence, two dominants of one river channel are consecutively possible (Leigh, 2008;
Gilvear, 1999). The paleo-island is bounded by flows entirely or consecutively, as positive dominants of the paleo-block.
Oriented paleo-block flows form actuals, recent, and present river flows. Paleo-island is a hydrologic block made of river
distributaries, or as in river systems, of more than two rivers and its paleo distributaries. The middle and middle-lower
Danube has three (or four) paleo-blocks: upper supra-Pannonian, lower supra-Pannonian, sub-Pannonian, and lower Banat.
The upper block consists consecutively of: Vac, Szolnok, Ulca, and Titel knot.
5. Nodes and cyclic confluent system
Nodes are complex fluvial solutions, that is, complex paleoconfluxes. Most often, these are multiple confluxes (multicons),
very rarely unique confluent points. The nodes are consecutively: trifluxes, bifluxes, paleoconfluxes, and confluxes. As the
layers are structured, triconfluxes are biconflux and paleoconflux, or biconflux and conflux. Paleoconflux constitutes the
inflow of a river into itself, during the paleological phase, and is a temporal phenomenon, typical of the transition between
the main phases.
The most appropriate way to show a system of river flows with respect to the change of orientation over time is a system
of streamlined flows (i.e., oriented graphs), as a limited yet complex combination providing both time-separated phases and
overlaps. The nodal system is the most adequate model in the analysis of anastomosing river systems. Oriented flows have
the advantage of explaining the complex chronology of Pannonian and sub-Pannonian flows (Gábris and Nádor, 2007;
Günther-Diringer, 2001). Pannonian meta-paleology is intertwined and multifaceted. The orientations are subdivided into
classes and subclasses that are separated, time-varying, and declared layers. A distinction has been made between the temporal structure of the layers from which further conclusions and implications were drawn (Fig. 1).
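As a purely illustrative aid, and not part of the original study, the following short Python sketch shows one way to encode such a time-layered oriented network: each flow is a directed edge tagged with the phase in which it is active, a single layer gives the mono-oriented network of one phase, and superposing all layers gives the poly-oriented palimpsest discussed above. The node and phase names are invented.

```python
from collections import defaultdict

# Each flow (branch) is an oriented edge tagged with the time layer (phase) in which it is active.
flows = [
    ("node_A", "node_B", "phase_I"),    # hypothetical paleo-flow in the first layer
    ("node_B", "node_C", "phase_I"),
    ("node_C", "node_B", "phase_II"),   # the same reach reversed in a later layer (counterflow)
    ("node_B", "node_D", "phase_II"),
]

def layer(flows, phase):
    """Return the oriented edges active in a given time layer."""
    return [(a, b) for a, b, p in flows if p == phase]

def palimpsest(flows):
    """Superpose all layers: for each oriented edge, list the phases in which it occurs."""
    stack = defaultdict(list)
    for a, b, p in flows:
        stack[(a, b)].append(p)
    return dict(stack)

print(layer(flows, "phase_I"))    # the mono-oriented network of one phase
print(palimpsest(flows))          # the multilayered, poly-oriented composite
```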
Outputs of a co-nodal model are cyclical alternatives, that is, overlaps building up to a full cycle, from the start point (input) to the end point (output) of the branching of a hub system or estuary. Input is the branching or flowing into branches and oriented subsets. The hypocycle is composed of two or more sets whose nodal iteration over time produces a full anastomosing effect. The hypocycloid therefore encompasses all the temporal cycles of paleo-hydrodynamics; it is a paleo addition, a time composite of complementary cycles from deflux (input) to conflux (output). The hypocycloid thus encompasses the largest time shift, combining the period of anastomosing evolution, that is, the cycles of the partial hubs. Hubs are fluvial or conflux outputs; they are confluxoids with respect to their alternating conflux character, and they are made up of iterative chronological rather than conventional spatial nodi.
Confluxoids are time-space meetings of a hub system. Although confluxoids are paleo-hydrological phenomena, they
also produce real effects, such as geohydrological geopaleological effects, and therefore they are also physico-geographical
phenomena. Again, each hydrological phenomenon is a special case of a complex system (Fig. 2).
FIG. 1 Chrono-slide with paleo points (Danube river): (A) Titel, (B) Titel-Czenta, (C) Czenta-Surduk, (D) Belgrade, (E) Savian counterflow, Sirmium
(Sremska Mitrovica)/Savacium (Sabac), (F) Belgrade, Danube Hub, (G) Titel, Belgrade Hub.
FIG. 2 Dynamic Paleo-Danube translations: (A) Cibelian counterflow Slavonski Brod (1), Cibalae/Volcae (2,3) (B, C) Paleo Dravus, Cibelae (Savus),
Dravus avulsion with Danube paleo bifluxes (D, E) Paleo Danube avulsion (the co-nodal system in Table 2).
The most pronounced occurrence of these fluvial dynamics and of the multiple Danube hub is the unique Danubian hypocycloid, a conflux system with 10 clearly differentiated paleoconfluxes. This makes the Danube hypocycloid the most remarkable multiconflux in Europe. The accumulation of river confluxes peculiar to the Danube cycloid, as a miniature of the Pannonian paleoinsula, is made up of different cyclical points (countercurrents) and of combinations of countercurrents and paleoconfluxes, as a dynamic maximum. Up to the Carpathians, the Danube dynamics are characterized only by anastomosing paleoconfluxes; in the Carpathian precomplex, however, a unique phenomenon of counterflux is found (Jenkins, 2007; Phillips, 2014).
5.1 H-cycloids analysis and fluvial dynamics
Due to chronological overlaps and stages, the cycloid is a sequential hydrological category. Fluvial systems include cyclic flows and partial cyclic layouts, but anabranching cycles are not cycloids, owing to the absence of interphase overlaps and of the summation of subcycles into a cycle. The problem seems to be that of the spatio-temporal set, and the fact that cyclical systems of distributaries and anabranching are physical categories and not just chronological segments. However, all downstream flows are spatial-dynamic, hence chronological-spatial functions. This also means that all such systems are dynamic linear models in which certain parallels are possible; parallelisms imply certain phases that can be explained consecutively in hub nodal systems. In any case, hypocycloids are not linear categories, but dynamic, multifunctional, oriented, and polyvalent variables.
The most impressive paleo-dynamic phenomena of hydrological geography are paleocycling and interfluvial paleocycles, and among them certainly hydro-cycloids (h-cycloids). The world's largest h-cycloid interfluves are the Euphrates-Tigris, Parana-Paraguay, Danube-Tisza, and Mississippi-Ohio. Interfluves are by nature paleo-hydrological phenomena. The Mesopotamian h-cycloid was formed by the cycling of the Euphrates together with the paleo-hydrological rotation of the Euphrates and the Tigris; the interfluve itself is characterized by two hydrological fluvial cycles and a quadri hub system. The Danube h-cycloid is a tricyclic fluvial process mediated by the Tisza in the second cycle. The Parana-Paraguay interfluve is an interactive phenomenon of the Paraguay and the Parana. Finally, the Ohio-Mississippi interaction, with four cycles, a complex paleo-dynamic form, and a pre-gulf estuary, forms a large North American interfluve.
A complete bifluvial paleo-cycle model or a bifluvial h-cycloid of two dominants and an interfluvial cycloid of two
dominants in which the flows are reciprocal and alternating are consequential axial, nodal, and polyconfluxoid points. Paleo
blocks of cyclic models are parallelograms with simultaneous flows of even stages and alternating flows of odd ones. The
parallelograms of the linear models consist of simultaneous and alternating, parallel and nonparallel flows and values (nth
and n + 1th phases), with polyvalent modes and constants (see Fig. 3).
The Baghdad hub system consists of parallel or alternating paleo-dynamic points, some of which are active simultaneously or consecutively as hydro nodes (Samawah, Hamza Ia, b). A cycloid is a specific alternating paleofluvium of two
rivers during the entire course of a cycle or subcycle. This means that unlike some other h-cycloids (such as the Danube),
perfect h-cycloids are unique paleo-hydrological bifluvial phenomena. In other words, in a certain phase of the model one river flows in the course of the other, and in the next phase interfluvial rotations take place. The model (see Fig. 3) is described
FIG. 3 Mesopotamian h-cycloid. (A) 1 Delli Abbas, 2 Al Zoor, 3 Bakuba, 4 Baghdad, 4a Habbaniyah, Karbala, 5 Kut, 6 Najaf, 7 Nasiriyah, Al Hamza, 8 Basrah. (B) Complete h-cycloid: Euphrates (Eu) Ia 4-5-6b-8, Euphrates (Eu) Ib 4-5-5a-7-8 with overlapping 4-6b, 4-6-6a-7, Euphrates (Eu) II (1)-(2)-(3)-4-4a-6-6a-7-8, counterflow with paleo-rotation Euphrates I-Euphrates II (Tigris II) 4-4a, 4a-5, Euphrates III 4a-6-6a-7-8. 5a Shatrah, 6a Samawah, 6b Amarah.
by interfluvial scissors, a specific double lever whose sides are, alternately, the variables, and whose only constants in the odd phases are the cycloid nodules of Baghdad and Samawah.
The hypocycloid is a complex, dynamic, iterative, and polyvalent model, where the values are at the same time: conventional (recent phases of the model, separate flows of the model), simultaneous (even stages), and alternating (odd stages
of the model). The summation of stages takes place in flows that overlap from a subcycle to a cycle or from a smaller cycle
to a larger cycle. Hence, the hypocycloid does not form a large cycle but a cycle that encompasses cycles and overlaps. The
fluvial cycle is usually made up of two rivers with an impressive paleogenesis of one river, the dominant one. The cycle, the
development of the dominant, can be rotational (with counterpoints and counterdirection), alternative (with alternation of
dominant and subdominant), and alternate-rotational (with two alternations and two subdominants, Uruguay and Paraguay). Although it is an interfluve, the cycle and hypocycle are performed by the dominant; the dynamic outlets are thus spatio-temporally understood as overlapping and consecutive additions of the chronologies of the hydro-dominant, with subdominant assists or alternations.
The nonlinear function implies the alternation of flows which are bifluvium or refluvium with several confluent points, such as quadrifluvium, double biconfluvium, or triconfluvium (for instance, biconfluvium and active confluvium). Interaction implies the use of factors and factorials (partial outcomes) in order to form a hydrodynamic product (Reinfelds et al., 1998).
Basically, every nonlinear function is a bifluvial relation. Each paleo-hydrological function is a real, active bifluvial relation (a relation between two rivers that is actually polyoriented). Relative geographical models are based on the theory of oriented flows, cycles, countercurrents, river basins, and cycloids, as well as on processes such as cycling, rotation, alternation, equivalence, and addition. The co-nodal analysis includes a co-nodal system composed of nodules and flows, which describe oriented hydrological functions. A river basin, and the sets and subsets of river basins, are not just linear and unambiguous hydrological networks, as seen by conventional river basin theory, but are, by definition of the nodal theory, multifaceted oriented networks (reversible, bidirectional, complementary).
Subcycles (see Fig. 8) are cyclohydrological phenomena, cyclodynamic fluvial phenomena that are formed by summing
river flows. Cycles are cyclohydrological phenomena that are formed by overlaps and additions of subcycles. Semihypocycloids are paleo-cyclohydrological phenomena that occur by alternating and overlapping cycles (in the compendium
of analysis s-c, s-s-c, s-h-c, h-c, further classified as Kuzmanovic h-cycloids, cycles, and subcycles). Perfect cycloids are
hydrocyclic phenomena that are formed by the alternation of flows, and the rotation of semi-cycloids (Mesopotamian
h-cycloid, see Fig. 3). The subcycle is formed by cycling, the cycle by overlapping, the semicycloid by overlapping
and alternating, and the complete h-cycloid by rotating cycles.
Contrafluvium (see Figs. 1E and 2A) is a paleo-dynamic phenomenon of the flow of the same, or two different rivers, in
the same riverbed in two opposite directions during different time periods and paleo-hydrological phases. The counterflow
is a paleo-hydrological relation of two rivers, in which chronologically the latter river flows in the opposite direction from
the original direction of the previous river.
Countercurrents (Cibalae, Centa, Baghdad, Sabac) occur at higher stages of paleo-hydrological development. These are
high-ranking hydrodynamic phenomena, requiring a greater degree of fluvial and confluent conditions: bifluvium, recon,
or bicon.
In order for the bicon to be formed, it is necessary to fulfill the condition of refluvium and bifluvium, as well as recon.
For each higher paleo-dynamic rank, it is necessary to fulfill the condition of the lower rank. The phenomena are not only temporally but also spatially scaled, forming a realistic model of paleo-hydrological chronology. This means that, for a river to form a bicon over time, there should be an actual estuary, the river should flow into itself (recon), and it should flow in its own course and in the course of another river (bifluvium) (Fig. 6). Hence, at least two bifluviums are required for biconfluxes, while higher hydrodynamic ranks such as tricons and quadricons require a larger number of subdominants: one dominant and three subdominants, or two dominants and two subdominants. Trifluvium is a phenomenon in which the course of one river carries the (paleo) chronological flow of the same river in different phases (after alternation, postalternation), of a second river during some of the phases (bifluvium), and of a third river (Fig. 7). Countercurrents
are the consequences of quadricons (Centa) or polyfluviums. Counterflows are therefore the phenomena of the highest
hydrodynamic systems, polyfluviums (Centa) or trifluviums (Sabac, Vinkovci). Cycling is the property of a river to produce
time-cycling products, subcycles, cycles, partial, and perfect hypocycloids. The rotation is a double, complete alternation of
the flow of two rivers during a certain paleo period (alternative phases). The alternation of two rivers is a reciprocal
replacement of the flow of one river by the flow of another river, and vice versa.
6. Three Danube phases
The Danube paleohistory consists of three prominent phases that correspond in a physico-dynamic sense to the paleogeographic segments of Pannonian development, the paleo-dynamic sectors whose dynamics have been described even
by recent flows. The three phases are the upper Pannonian and lower Pannonian (supra- and sub-Pannonian), with the lower Danube development (the Banat metatransition of the Carpathian effect). The sub-Pannonian transition is due to the first two phases, without neglecting the laco-fluvial phase of the Carpathian-Danube limes. The Danube limes make somewhat less recognizable fluvial transitions, but the elements of the extensions of the first and second classes of knots are clearly indicated: Belgrade, Pancevo, and Centa.
Each phase has a paleo-dynamic block with variable river dominants. The upper Pannonian sector is made up of the Danube and the Tisza dominant (Kiss et al., 2015; Stancíková, 2010), the lower sector of the Drava, the Danube, and the Tisza, and the sub-Pannonian sector of the Drava-Sava and the Danube dominant.
The supra-Pannonian phase (Table 1) includes the Danube Ia and Ib class 12 flows, the supra-Pannonian phase and the pre-Carpathian transition estuaries (2,11), the flow from Szolnok to Titel of class 124, from Titel to Belgrade, and from Titel to Centa. The second, lower supra-Pannonian phase includes class 134, the Danube flow from Vac to Ulca with chronological flows from Ulca to Titel, and with characteristic subclasses and further. The third, sub-Pannonian, classes 137 and 138 include specific sets of streams IIIa and IIIb, namely the Danube flow from Ulca to Cibalae and from Cibalae to the mouth of the Savus, and the Danube flow from Ulca to the mouth of the Savus, bypassing the sequence (7,8) (Table 2). The Sava flow is laco-fluvial and bifluvial by paleo nature up to the Sirmian node, as well as in the continuation of the upper laco-fluvial phase Hrtkovci-Belgrade; in front of that levee ridge on the line of the Sirmium-Sabac node, it is counterfluvial (Drina, Sava).
The dynamics of the Dravus are described by classes 56 and 34, which are peculiar to the Dravus translation from Bogojevo to Ulca, and specifically to the flows from Bogojevo to Becej (Bechay) (5,6) (6,9) and (5,6) (6,10) and from Ulca to Titel (3,4) (4,6) (Fig. 4).
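Because each class is written as a chain of oriented knot pairs, it can be read as a directed path over the knot numbering of Fig. 4. The short Python sketch below illustrates this reading; the knot names and the selected classes follow the chapter, while the helper functions, variable names, and the choice of which classes to list are illustrative assumptions only.

KNOTS = {1: "Vac", 2: "Szolnok", 3: "Ulca", 4: "Titel", 5: "Belgrade",
         6: "Pancevo", 7: "Cibalae", 8: "Samac", 9: "Czenta"}

# Each paleo-flow class is a sequence of oriented flows (edges) between knots.
CLASSES = {
    "124": [(1, 2), (2, 4), (4, 6)],
    "134": [(1, 3), (3, 4), (4, 6)],
    "137": [(1, 3), (3, 7), (7, 8), (8, 5), (5, 6)],
    "138": [(1, 3), (3, 8), (8, 5), (5, 6)],
}

def is_path(edges):
    # Consecutive oriented flows must share a knot (head of one = tail of the next).
    return all(a[1] == b[0] for a, b in zip(edges, edges[1:]))

def named_route(edges):
    # Render a class as the chain of knot names it traverses.
    knots = [edges[0][0]] + [head for _, head in edges]
    return " -> ".join(KNOTS[k] for k in knots)

for label, edges in CLASSES.items():
    print(label, is_path(edges), named_route(edges))
    # Class 124, for example, reads Vac -> Szolnok -> Titel -> Pancevo.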
TABLE 1 Classes of paleo-flows of Danube.

Class   Flow
12      (1,2) (2,6) (2,11)
124     (1,2) (2,4) (4,6); (1,2) (2,4) (4,9) (9,6); (1,2) (2,4) (4,5) (5,6)
134     (1,3) (3,4) (4,6); (1,3) (3,4) (4,9) (9,6); (1,3) (3,4) (4,5) (5,6)
137     (1,3) (3,7) (7,8) (8,5) (5,6)
138     (1,3) (3,8) (8,5) (5,6); (1,3) (3,8) (8,5) (5,5) (5,6)
TABLE 2 Bifluxes and polyfluxes with major fluvial transitions.

Biflux    River
(3,4)     Dravus, Danube
(7,3)     Savus, Danube
(7,8)     Savus, Danube
(8,5)     Savus, Danube
(2,4)     Tisza, Danube
(10,8)

Transitions
(1,2)/(1,3) in 4 (Translation)
(1,3) (3,4)/(1,3) (3,7) in (5,6) (Translation)
(3,4) (4,6) (Complement)

Polyflux  Flow classes
(9,6)     Danube I, Dravus, Danube II, Tamis
(6,11)    Danube Ib, Dravus, Danube II, Danube III, Savus
(4,5)     Danube I, Dravus, Danube II
(8,5)     Danube II, Danube III, Savus
(5,6)     Danube Ib, Danube III, Savus
(7,8)     Savus, Danube III
FIG. 4 Diagram: 1 Ia Class, 2 Ib, 3 IIa Classes 134 (1,3) (3,4) (4,9) (9,6), 4 IIb Classes 134 (1,3) (3,4) (4,6), 5 IIIa, 6 IIIb, 7 Paleo Island IIa, 8 Paleo Con IIa. Knots: 1 Vac, 2 Szolnok, 3 Ulca (Vukovar), 4 Titel, 5 Belgrade, 6 Pancsova (Pancevo), 7 Cibalae (Vinkovci), 8 Šamac, 9 Czenta (Centa).
In cases of accumulation of river junctions (fluvial and translational), the phenomenon of manifold points, characteristic
of counterfluxes, and river counterflows also occurs. The counterfluxes are the consequence of multiple confluxes of
numerous paleo-dynamic nodes. While nodes are characteristic in the later stages of paleo-dynamic development
(Schumm, 1968; Rittenour et al., 2003), counterflows occur only after the second or third paleo layer. It is a common
estuary, paleoconflux, and biconflux paleo-estuary. In the case of counterflows, it is necessary to satisfy the biconflux condition (condition of second order) and paleoconflux, which means that counterflows occur in the transition from biconflux
to the triconflux phase.
The lower Danube (see Fig. 6) is characterized by the quintoflux of five chronological streams of the Danube cycles Ib, II, III, Dravus, and Savus on the Danube limes from Pancevo to Banatska Palanka (6,11), the quadriflux of Danube I, II, Dravus, and Tamis from Centa to Pancevo (9,6), Danube Ib, III, Dravus, and Savus from Belgrade to Pancevo (5,6), at least several trifluxes from Titel to Belgrade (4,5), the biflux of Danube III and Savus from Cibalae to Šamac (7,8) and from Šamac to Belgrade (8,5), and the bifluxes of the Dravus and Danube from Ulca to Titel, of the Savus and Danube from Cibalae to Šamac, and of the Tamiš and Danube from Šamac to Belgrade. To this should be added the specific counterflux of the Danube from Ulca to Cibalae and of the Sava from Cibalae to Ulca (Table 2).
In the case of poly conflux, 12 clearly differentiated hydro paleo nodes (Figs. 5–12) are characteristic: Vac, Szolnak,
Sonta, Bechey, Ulca (Vukovar), Titel, Cibalae (Vinkovci), Šamac, Czenta, Belgrade, Pancevo, and Banatska Palanka. The
Danube hub consists of the lower Banat paleo quadrant with specific Belgrade polyconflux, the hydroinsula described by
the hubs Surduk, Centa, Pancevo, Belgrade, and further, more appropriately Titel, Centa, Pancevo, Belgrade with a dozen
paleoconfluxes. The Belgrade polyconflux hub consists of six striking paleoconfluxes: Danube I and III, Danube II and III, Danube I and Savus, Danube II and Savus, Dravus I and Savus II, and the Danube (Fig. 6). A broader variant of the Danube half-hub consists of at least 6 paleo-biconfluxes (as many were counted from the point of view of the analysis), 10 paleoconfluxes, and several fluvial confluxes (Tamis and Danube, Tisa and Danube, Savus and Danube).
The Danube hub with its specific polyconfluxoid consists of 4 or 5 hydro nodes with 20 recent and temporal paleoconfluxes, that is, a system of 20 current and former mouths. Belgrade and Pancevo make interesting Danube paleo trifluxes.
7. Danubian hypocycles as overlapping phases
The lower Danube flow is described by three hypocycles: large, medium, and small (in the analysis as Kuzmanovic s-c, s-sc, s-h-c, h-c), each characteristic for a phase of hydro-morphological development of the Danube and specific
paleo-hydrological block, I block for I and II phase, III block for III phase, with overlapping between the phases, and alternating partial flows, forming the Danubian hypocycloids of the Danube paleoinsula. These cycles, in addition to being
individual hubs, confluxes, also integrate joint flows (parts of joint flows, classes, bifluxes, and paleofluxes) (Figs. 10–12).
The first hydro cycle consists of overlaps of Phases I and III: the Vac, Szolnok, Titel, and Belgrade knots (Phase I) and the Vac, Ulca, Šamac, and Belgrade knots (Phase II); the second cycle consists of overlaps of Phases I and II: Vac, Szolnok, Titel (Phase I) and Vac, Ulca, Titel (Phase II); the third cycle overlaps Phases II and III: the Ulca, Titel, and Belgrade nodes (Phase II) and the Ulca, Samac, and Belgrade nodes (Phase III) (Figs. 8 and 9). The first circle includes the supra- and sub-Pannonian blocks, the second
supra-Pannonian block, and the third sub-Pannonian block, each with corresponding quadrants. The most impressive is the
fourth quadrant Danube hub, which, with complex fluvial dynamics, somewhat corrects and complements the image of the
consequent Danube paleo-cycles. The third hydro cycle unites all three phases, the Belgrade paleo-hydrological block consists of all three river dominants, and the flows are trifluxes. The Belgrade cycle consists of Titel (Surduk), Centa, Pancevo,
and Belgrade knots. Danube hub cycles generate a large Danube paleo-hydrological cycle of the Pannonian paleo block, in
the manner of partial, hypocycloid flows, in the sequences of phases and hydro cycles. Kuzmanovic’s paleo cycles consist
of inner tangent circles/subcycles with common points, nodes, and common paleo flows, classes.
One cycle is described by alternating partial flows of one tangent circle to the nth node, and another tangent circle from
the nth to (n + 1)th node. The cycle model contains a dozen nodes and three inscribed tangent circles with specific
alternating flows.
The Belgrade paleo hub (hypocycloid) serves as a correction of partial cycles to the large paleo cycle of the maximal
Pannonian-Carpathian extent. The Belgrade cyclic hub is a pronounced hydrodynamic system whose environment records
almost all the dynamic solutions and translations of Pannonian flows (see Fig. 1A). This pronounced hyper-dynamics,
which is effected in a small space, makes it behave as a hydrodynamic model and as a hub, for the translation of river
systems over a large area of the Danube. In this way, the Danube hypocycle is composed of cyclic phases and tangent
junctions, with supra-sub-Pannonian and pre-Carpathian coverage (Fig. 11). The Great Paleo H-Cycloid is described in
physical terms by the combination of the first and third phase of the upper supra-Pannonian and sub-Pannonian as two
FIG. 5 Diagram: 1 Ia Class Conflux, 2 Ib Conflux, 3 IIa Conflux Classes 134 (1,3) (3,4) (4,9) (9,6), 4 IIb Conflux Classes 134 (1,3) (3,4) (4,6), 5 IIIa
Conflux.
distinct physical-spatial hydro cycles, encompassing the entire Pannonian block, starting from the Vac flow to the Pancevo
and pre-Carpathian polyfluxes (Fig. 2B).
Larger coverage: Phases I and II, quadrants 1 and 2, form the first cycle; Phases II and III, quadrants 3 and 4, the second; and Phases I and III, quadrants 1, 2, 3, and 4, the third (cycles are listed according to overlapping blocks, not by chronology; the consequential representation is shown in Fig. 8). The first and second cycles have the common Phase II (Ulca, Titel, Pancevo); the large cycle and the first cycle have a
FIG. 6 Paleoknots: 1 Vac, 2 Szolnok, 3 Sonta, 4 Bechey (Becej), 5 Ulca, 6 Titel, 7 Cibalae, 8 Šamac, 9 Czenta (Centa), 10 Belgrade, 11 Pancsova (Pancevo) (Savski brod), 12 Banatska Palanka.
FIG. 7 Paleo triconfluxes, biconfluxes, and confluxes.
FIG. 8 Quadri system (fluvial sections and intersections) of Pannonian paleo-insula.
FIG. 9 Hypocycles and subcycles (Kuzmanovic cycles) with hydro knots. Line 1: subcycles, Line 2: hypocycles.
FIG. 10 Quadri-nodal system of oriented flows.
FIG. 11 Sections to cyclic representation diagram. First and second hypo and subcycle (h-s-cycle).
FIG. 12 Danube hypocycloid: 1 Danube subcycles (corresponding to the first, second, and third paleo-dynamic change), Kuzmanovic s-c (see flow diagram); 2 Danube hypocycloid with paleo nodes and minor hypocycles, Kuzmanovic h-c; 3 first and second subcycle of paleo-dynamic change, Kuzmanovic s-c; 4 subcycle and hypocycle of the Ia paleo-dynamic change, Kuzmanovic s-h-c.
common first cycle, and a large cycle and a second cycle have a common second cycle, from which it follows that the sets of cross sections are made up of cycle subsets. A large cycle and a large subcycle have a common large subcycle, hence the cycles are duplicated: each sub-cycle in a set of two sub-phases has a common flow of one phase plus the flow of another sub-phase of the same set or cycle, and each subset is duplicated, which eventually leads to a duplication of a large sub-cycle (summing sets) as a subset of the third cycle (Fig. 7).
Overlaps of subcycles: II and IIIa (1 and 3, and 3 and 4, in 3) in III; Ia and IIIa in II (1 and 2, and 1 and 4, in 1); Ia and II in I (3 and 4, and 2 and 4, in 4). Phase I covers quadrant I, Phase Ia quadrants I and II, Phase II quadrants I and III, and Phase III quadrants III and IV (Fig. 8). Cycles are composed of complementary flows of complementary phases, overlapping phases, and may be larger and smaller subcycles (4 or 6 knots). A cycle (at least 6 or 9 knots) is made up of subcycles with common flows; for example, a cycle of 9 knots may consist of subcycles of 6 knots, and a cycle of 6 knots may consist of subcycles of 4 knots. The intersection of two subcycles can be a class or a subcycle, and the intersection of a cycle and a subcycle is always a subcycle.
The subcycle is added into a cycle by overlapping flows in phase models. The cycle integrates (encompasses) n cycles of
iterative flows of the nodal system. A cycloid is formed by overlapping several cycles of iterative (alternating) subcycles.
Cyclical overlaps, in turn, are positive additions of cycles to the cycloid. In this sense, cyclic dynamics achieves its full
synergy in cycloid as the aqueous complex of the dynamic model. Iterative phases are inherent to subcycles and overlapping
iterative phases are cycles. H-cycloids are formed by the addition of cycles. Thus, each level of the dynamic model is
described by determined subcycle processes.
Each circle is formed by two river cycles (subcycle, s-cycle), and each cycle is formed by two phases with up to two
common nodes (Figs. 7 and 12). Subcycles are complementary subsets of fluvial classes in a set, while class subsets are
intersecting subsets in a set of subcycles. The large h-cycloid is formed by phase I and phase III with the joint hubs in Vac
and Pancevo. The first cycle is formed by phase I and phase II with the common Vac and Titel hubs. The second cycle is
formed by phase II and phase III of Ulca and Pancevo.
8. Conclusions
The consequential analysis of the lower Danube flow has led us to the model of the Danubian hypocycloid as a complex paleo-dynamic system composed of major cycles (hypocycles) and subcycles, effectuated in the Belgrade paleo hub, a segmental polyconfluental area acting as a correction to the other consequential hydro cycles. The Belgrade cyclic hub is a pronounced hydrodynamic system whose environment records almost all the dynamic solutions and translations of upper Pannonian flows.
The large Danube h-cycloid is composed of cyclic phases and tangent junctions, with supra-sub-Pannonian and pre-Carpathian coverage, while subcycles are made by complementing (flows), and cycles are made by overlapping (subcycles and flows) in nodal time alternation. This concept is crucial for understanding the dynamics and repercussions of Pannonian hydrology. The Great Paleo Hypo-Cycloid is formed in physical terms by the combination of the phases of the upper and lower supra-Pannonian and sub-Pannonian in distinct complementary hydro cycles, thereby encompassing the entire Pannonian block, starting from the Vac flow to the Pancevo and pre-Carpathian polyfluxes.
References
Abbasova, D., Eslamian, S., Nazari, R., 2017. Paleo-drought: measurements and analysis. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of Drought and
Water Scarcity. Environmental Impacts and Analysis of Drought and Water Scarcity, vol. 2. Taylor and Francis, CRC Press, USA, pp. 665–674
(Chapter 34).
Constantinescu, Ş., Achim, D., Rus, I., Giosan, L., 2015. Embanking the Lower Danube: from natural to engineered floodplains and back. In: Geomorphic
Approaches to Integrated Floodplain Management of Lowland Fluvial Systems in North America and Europe. Springer, New York,
pp. 265–288, https://doi.org/10.1007/978-1-4939-2380-9_11.
Gábris, G., Nádor, A., 2007. Long-term fluvial archives in Hungary: response of the Danube and Tisza rivers to tectonic movements and climatic changes
during the quaternary: a review and new synthesis. Quat. Sci. Rev. 26 (22–24), 2758–2782. https://doi.org/10.1016/j.quascirev.2007.06.030.
Gangodagamage, C., Agrarwal, S.P., 2001. Hydrological modeling using remote sensing and GIS. In: 22nd Asian Conference on Remote Sensing.
Gilvear, D.J., 1999. Fluvial geomorphology and river engineering: future roles utilizing a fluvial hydrosystems framework. Geomorphology 31 (1–4),
229–245.
Gilvear, D.J., Bryant, R., Hardy, T., 1999. Remote sensing of channel morphology and in-stream fluvial processes. Prog. Environ. Sci. 1, 257–284.
Günther-Diringer, D., 2001. Evaluation of Wetlands and Floodplain Areas in the Danube River Basin. River Restoration in Europe, p. 91.
Jenkins, P.A., 2007. Map-Based Tests on Controls of Anabranch River Character on the Lower Yellowstone River (Doctoral dissertation). Montana State
University-Bozeman, College of Letters & Science.
Kiss, T., Hernesz, P., Sümeghy, B., Györgyövics, K., Sipos, G., 2015. The evolution of the Great Hungarian Plain fluvial system–fluvial processes in a
subsiding area from the beginning of the Weichselian. Quat. Int. 388, 142–155. https://doi.org/10.1016/j.quaint.2014.05.050.
Langat, P.K., Kumar, L., Koech, R., 2019. Monitoring river channel dynamics using remote sensing and GIS techniques. Geomorphology 325, 92–102.
Leigh, D.S., 2008. Late quaternary climates and river channels of the Atlantic Coastal Plain, Southeastern USA. Geomorphology 101 (1–2), 90–
108. https://doi.org/10.1016/j.geomorph.2008.05.024.
Phillips, J.D., 2014. Anastamosing channels in the lower Neches River valley, Texas. Earth Surf. Process. Landf. https://doi.org/10.1002/esp.3582.
Reinfelds, I., Bishop, P., Benito, G., Baker, V.R., Gregory, K.J., 1998. Palaeohydrology, palaeodischarges and palaeochannel dimensions: research
strategies for meandering alluvial rivers. In: Palaeohydrology and Environmental Change, pp. 27–42.
Rittenour, T.M., Goble, R.J., Blum, M.D., 2003. An optical age chronology of Late Pleistocene fluvial deposits in the northern lower Mississippi valley.
Quat. Sci. Rev. 22 (10–13), 1105–1110. https://doi.org/10.1016/S0277-3791(03)00041-6.
Schumm, S.A., 1968. River Adjustment to Altered Hydrologic Regimen, Murrumbidgee River and Paleochannels, Australia. vol. 598 US Government
Printing Office, p. 1968.
134
Handbook of hydroinformatics
Stancíková, A., 2010. Training of the Danube River channel. In: Hydrological Processes of the Danube River Basin, pp. 305–341.
Walsh, S.J., Butler, D.R., Malanson, G.P., 1998. An overview of scale, pattern, process relationships in geomorphology: a remote sensing and GIS perspective. Geomorphology 21 (3–4), 183–205.
Wilkinson, G.G., 1996. A review of current issues in the integration of GIS and remote sensing data. Int. J. Geogr. Inf. Sci. 10 (1), 85–101.
Further reading
Mulligan, A.E., Evans, R.L., Lizarralde, D., 2007. The role of paleochannels in groundwater/seawater exchange. J. Hydrol. 335 (3–4), 313–329. https://doi.
org/10.1016/j.jhydrol.2006.11.025.
Pálfai, I., 1994. A Duna–Tisza közi hátság vízgazdálkodási problémái (The water management problems of the Danube–Tisza Interfluve). In: Pálfai, I. (Ed.), A Duna–Tisza közi hátság Vízgazdálkodási Problémái. A Nagyalföld Alapítvány Kötetei 3. Békéscsaba, Nagyalföld Alapítvány,
pp. 111–126.
Stevanovic, Z., Kozák, P., Lazic, M., Szanyi, J., Polomcic, D., Kovács, B., Papic, P., 2008. Towards sustainable management of transboundary
Hungarian–Serbian aquifer. In: Transboundary Water Resources Management: A Multidisciplinary Approach. vol. 1, pp. 143–149.
Timár, G., Szekely, B., Molnár, G., Ferencz, C., Kern, A., Galambos, C., Zentai, L., 2008. Combination of historical maps and satellite images of the Banat
region—re-appearance of an old wetland area. Glob. Planet. Chang. 262 (1–2), 29–38. https://doi.org/10.1016/j.gloplacha.2007.11.002.
Chapter 8
Data assimilation
Mohammad Mahdi Dorafshan (a), Mohammad Reza Jabbari (b), and Saeid Eslamian (c, d)
(a) Department of Civil Engineering, Isfahan University of Technology, Isfahan, Iran; (b) Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran; (c) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (d) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Research shows that global warming will increase the occurrence of extreme events (Karl et al., 1995) and, thus, reduce the
possibility of predicting the hydrological system’s future conditions (Tsonis, 2004). By increasing the occurrence risk of
floods and droughts, regions with insufficient observational data will be more exposed to the dangers of these events. Thus,
it is crucial to address hydrological modeling and reduce the uncertainty of the result of models by using the available data
and estimating the flow rate with more certainty in these areas (Wagener and Gupta, 2005). Hydrological models simulate
natural events by simplification, and as a result, their results are not definitive. The most important sources of uncertainty in
hydrological models include the uncertainty of the model input data, the initial state variable, the model structure, and the
model parameters. Data assimilation (DA) reduces overall uncertainty by considering the uncertainties of the inputs, observations, and updating variables promptly. Up to now, each of these sources of uncertainty has been reduced separately or
together, by Batch (Duan et al., 1992) and Recursive (Moradkhani et al., 2005) optimization methods, or a combination of
both methods such as Simultaneous Optimization and Data Assimilation (SODA) method (Vrugt et al., 2006). In Batch
methods, calibration is carried out using the time series of observations, and then, the parameters are estimated regardless
of the uncertainties of input, state, and output variables of the model, resulting in reducing the overall uncertainty of the
model. In this regard, the Shuffled Complex Evolution optimization algorithm developed at the University of Arizona,
called SCE-UA, is one of the Batch methods used by various researchers (Parajka et al., 2006; Misirli et al., 2003). Concerning the recursive methods, optimization is performed sequentially, and it is feasible to update state variables and
hydrology model parameters by addressing the uncertainty of the input data and the initial conditions (Moradkhani
et al., 2005; Weerts and El Serafy, 2006; Clark et al., 2008). Recursive methods have been vastly used in DA. These
methods carry out a successive continuous-time process to update the state variables and parameters of the model based
on the new observations, called the European Centre for Medium-range Weather Forecasts (ECMWF) process. In other
words, because the observations and the simulation results of each hydrological model alone are not complete, combining
both of them during the DA process can result in a more accurate prediction (Tiefenbacher, 2012). Note that the information
optimization during DA is performed sequentially in the Recursive methods while this process is conducted on a time series
of observations in the Batch methods. The model calibration is a particular type of DA, the purpose of which is to reduce the
model’s results error using observations (Tiefenbacher, 2012), and in the case of making predictions, using the DA methods
is the best way to reduce the error (Wagener and Gupta, 2005).
Generally, accurate and reliable prediction methods are the two essential foundations of effective and simultaneous
river management for flood control, flood warning, and reservoir management. Numerical models are not perfect and just
provide approximate models of reality. Therefore, various factors such as input information, problem information, computational model, and solving methods may cause the lack of consistency of the modeling results with reality
(Krzysztofowicz, 2001; Butts et al., 2005). Obtaining information about the sources of uncertainty, as well as quantifying
the extent of the impact of each source, is a key step in prioritizing research to develop future models. So far, various
methods have been developed for determining the degree of uncertainty of a model, including sensitivity analysis, multimodel approach, general effect method, and the probabilistic method. Probabilistic and general effect methods include the
most recent research related to flood prediction. In the probabilistic method, a large number of simulations are implemented
using the random samplings of the probability distribution from model parameters, initial conditions, boundary conditions,
and input data. For example, Pappenberger et al. (2004) used the Monte-Carlo sampling method to discuss the uncertainty
of the rating curve and the roughness coefficient of the hydraulic model. Further, Pappenberger et al. (2007) investigated the
effect of uncertainty on flooding by using multiple combinations of effective model parameters for a two-dimensional
flooding model. In contrast, the general effect method uses a concentrated sampling method to decrease the number of
samples. The existence of uncertainty sources in modeling, even when the quality of the model and input data is good
and the model is well-calibrated, may cause the results to be inconsistent with the observation values. In this case, making
accurate predictions requires postprocessing the output of the simulation models. It should be noted that the information
processing methods in this case are also called DA.
Finally, the objectives of this chapter include: (i) introducing DA and its methods, (ii) DA applications in water engineering (especially in flood simulation and forecasting), and (iii) considerations in the use of DA methods.
2. What is data assimilation?
Researchers have proposed different names, depending on their disciplines, for the methods developed to improve modeling results. For example, meteorologists and researchers on coastal management call these methods data assimilation, while hydrologists also call them data updating. The term Data Assimilation (DA) was first used in the 1960s in a military project to control and guide the trajectory of missiles. Owing to the lack of knowledge of atmospheric conditions, the results of missile trajectory models were generally inaccurate (Drecourt, 2003). Aerospace scientists tried to solve this problem by using actual observations obtained by satellites along the missile's trajectory to control the missile properly. Hence, the DA term can be defined as follows: "combining the observations of a phenomenon with the results of its simulation to improve the performance of the simulation model." Fig. 1 illustrates the difference between conventional modeling and real-time modeling (DA).
Based on previous research, different models for flood simulation, routing, and forecasting can be divided into two
categories: hydraulic models and hydrological models (Fread, 1981). Flood forecasting models can be used in two ways.
In the first case, forecasting the downstream is performed by hydrologic and hydrodynamic models along with the usage of
input data (such as rainfall or water level upstream), and model parameters calibrated by measured downstream parameters.
In the second case, the output of hydrological and hydrodynamic models is modified after calibration and recorded using
upstream and downstream information.
The application of DA in hydrology has been studied by combining, with the results of hydrological models, point observational data such as groundwater level (Hendricks Franssen et al., 2011) and large-scale remote sensing observations such as soil moisture (Sahoo et al., 2013), snow (Griessinger et al., 2016), and flow rate (Moradkhani et al., 2005), at daily (Aubert et al., 2003) or hourly (Neal et al., 2009) time steps. The results of previous studies have indicated the improvement of predictions by DA. Data assimilation can help water engineers in flood simulation and forecasting to better plan for and deal with this natural phenomenon. DA is performed by determining the estimation error, which is the difference between the actual value and the value calculated by the model, through which the forecasting result can be significantly improved. While many factors complicate real-time modeling, one advantage compared to non-real-time modeling is that the observed values of the river are comparable to the simulated values. If the observational values are valid and reliable, it is possible to update the results of future simulations by using the difference between the observed and simulated values. So far, various methods have been proposed to update the simulation results. In other words, the adoption
FIG. 1 Conventional modeling and real-time modeling using DA.
of the technique for dealing with the updating problem depends on the selection of the dominant factor causing nonconformity in the results of simulations and observations (Anctil et al., 2003).
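As a small illustration of the updating idea sketched in Fig. 1, the following Python snippet contrasts a conventional (uncorrected) simulation with a real-time run in which the most recent observed-minus-simulated error is carried forward to correct the next value. The synthetic data and the simple persistence correction are illustrative assumptions, not a method prescribed by this chapter.

import numpy as np

rng = np.random.default_rng(0)
observed = 100 + 10 * np.sin(np.arange(20) / 3.0) + rng.normal(0, 2, 20)  # "true" flows
simulated = observed + 5.0 + rng.normal(0, 2, 20)                          # biased model output

corrected = simulated.copy()
for k in range(1, len(simulated)):
    error_k = observed[k - 1] - simulated[k - 1]   # last known estimation error
    corrected[k] = simulated[k] + error_k          # persist it onto the next step

print("RMSE raw      :", np.sqrt(np.mean((observed - simulated) ** 2)))
print("RMSE corrected:", np.sqrt(np.mean((observed - corrected) ** 2)))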
3. Types of data assimilation methods
Depending on the type of prevailing error in the applied model and its characteristics, a consistent DA with that error should
be used to obtain appropriate simulation and forecasting results. There is not a comprehensive and unique categorization for
DA algorithms, but commonly they can be divided into categories according to their updating procedure and the type of
variable to be updated.
3.1 Types of updating procedure
Concerning updating procedure, Variational Data Assimilation (VDA) and Sequential Data Assimilation (SDA) methods
are two well-known approaches.
3.1.1 Variational data assimilation
In this method, during each time step, the present and all past observations are used to correct the initial conditions of the
model and bring the observed values and simulated values closer together (Schad et al., 2015) (e.g., Fig. 2). Depending on
the spatial and temporal dimensions of the model state variable, VDA is performed in three forms: 1D-Var, 3D-Var, and
4D-Var (Liu et al., 2010; Reichle et al., 2001; Seo et al., 2003).
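To make the variational idea concrete, the sketch below computes a single variational-style analysis: a background state x_b is corrected by minimizing the quadratic cost J(x) = (x - x_b)^T B^{-1} (x - x_b) + (y - Hx)^T R^{-1} (y - Hx), whose minimizer has the closed form x_a = x_b + B H^T (H B H^T + R)^{-1} (y - H x_b). The matrices, values, and variable names are illustrative assumptions, not taken from the chapter.

import numpy as np

x_b = np.array([1.0, 0.5])                 # background (first-guess) state
B = np.diag([0.2, 0.2])                    # background error covariance
H = np.array([[1.0, 0.0]])                 # only the first state component is observed
R = np.array([[0.1]])                      # observation error covariance
y = np.array([1.4])                        # observation

S = H @ B @ H.T + R
x_a = x_b + B @ H.T @ np.linalg.solve(S, y - H @ x_b)
print("analysis state:", x_a)              # pulled toward the observation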
3.1.2 Sequential data assimilation
As shown in Fig. 3, the updating is performed step-by-step in this method. By having the observed and calculated values in
the time step k, the updating is performed by DA, which improves the model prediction in the time step k + 1. Then, the same
process is carried out for time steps k + 1 and k + 2. In other words, this updating approach uses accessible observations to
update the model variable in the same time step (Schad et al., 2015). In contrast to the previous method, this method causes
discontinuities in the value of the system variable in the time series. This approach is more commonly used for systems
driven by boundary conditions. The primary DA methods used in flood forecasting, such as the Kalman Filter (KF), are
subdivisions of the SDA (Entekhabi et al., 1994; Evensen, 1994; Galantowicz et al., 1999).
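A minimal sequential-updating sketch is given below: at each step the toy model is propagated, and when an observation becomes available the state is pulled toward it with a fixed gain before the next forecast, in the step-by-step spirit of Fig. 3. The model, gain, and data are illustrative assumptions (simple nudging rather than a full filter).

import numpy as np

def model_step(x):
    return 0.9 * x + 1.0        # toy forecast model

rng = np.random.default_rng(1)
truth, x = 10.0, 6.0            # true state and (wrong) model state
g = 0.5                         # nudging gain in [0, 1]

for k in range(10):
    truth = model_step(truth) + rng.normal(0, 0.1)
    x = model_step(x)                       # forecast for step k
    y = truth + rng.normal(0, 0.3)          # observation at step k
    x = x + g * (y - x)                     # sequential update with the new observation
    print(f"k={k}: error after update = {abs(truth - x):.3f}")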
3.2 Types of updating variable
The structure of a model has different components. As shown in Fig. 4, each model has seven components: initial state (x0),
boundary (B), input (I), state variable (x), parameter (y), output (O), and structure (M). The model’s structure (M) consists of
two components of Mx and Mo, representing the conversion of the input to state variable and the conversion of the state
FIG. 2 Variational Data Assimilation (VDA) approach. The original model run (red dashed line and dots) is given a better initial condition that leads to a
new model run (blue dashed line and dots) that is closer to the observations (green dots).
FIG. 3 Sequential Data Assimilation (SDA) approach. When an observation is available (green dot), the model forecast (red dot) is updated to a value
closer to the observation (blue dot) that is used to make the next model forecast.
FIG. 4 Schematic diagram of model components from a system perspective.
variable to output, respectively. As can be seen, five of the seven components (x0, B, I, y, M) should be estimated, determined, or defined before the model can begin to operate. The two remaining model components (i.e., x, o) are obtained by a process performed by the model. Each of the five components that should be identified before the model begins to operate can introduce ambiguities and errors along its determination path, which in turn play a role in determining the values of x and o (Liu and Gupta, 2007).
Based on the type of error caused by each of the five components of the model, four DA methods are presented
(e.g., Fig. 5), including Updating Input Variable, Updating Model Parameter, Updating State Variable, and Updating
Output Variable (Error Correction) (O’Connell and Clarke, 1981; Refsgaard, 1997; Babovic et al., 2001).
3.2.1 Updating input variable
This method, which is an old approach and is rarely used today, is justified by the fact that ambiguity and uncertainty in the
model input can be the main and dominant source of error in the model prediction performance (Georgakakos, 1986; Xiong
et al., 2004).
3.2.2 Updating model parameter
It can be claimed that this method is the most widely used one for performing DA. This updating method is conducted by
using algorithms such as the Kalman Filter. In this case of updating, the model parameters are continuously considered
during the simulation and forecasting steps. The prevailing view in this case relies on the idea that the model calibration from one time step to the next will not change the results significantly in hydrodynamic models, because the actual changes of the model parameters are significantly slower than the computational intervals (Hsu et al., 2006; Chao et al., 2008; Lee and Singh, 1999; Young, 2002).
Updating the model parameters can, in practice, provide physical interpretations for the intended environment. However, the model parameters are not determined accurately and clearly, due to measurement, model calibration, and model processing uncertainties. Accepting this trade-off, updating the model parameters seeks the best match between the modeling results and the observed values. In other words, using conceptual and physical models to update model parameters may not provide an accurate and acceptable understanding of the model parameters (Kachroo, 1992). However, no such understanding of parameters exists for data-driven models, such as Artificial Neural Networks (ANNs) and Transfer Functions (TF) (Young, 2002). The selection of the runoff coefficient and the calculation of friction loss in
FIG. 5 Comparison of the classical model run (upper part) with the model run with an updating procedure (lower part).
river hydrodynamic models are among the most prominent examples of this method (Sene, 2008). Furthermore, a method combining the updating of state variables and of model parameters can better cover all error sources (Moradkhani et al., 2005).
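One common way to realize such a combination, shown schematically below, is to augment the state vector with the (slowly varying) parameter so that a single sequential update corrects both. The toy model, the fixed gains, and the synthetic data are illustrative assumptions and not the specific combination method cited above.

import numpy as np

def forecast(z):
    x, a = z                             # storage state x and recession parameter a
    return np.array([a * x + 1.0, a])    # parameter assumed (near-)constant in time

z = np.array([5.0, 0.7])                 # initial guess of [state, parameter]
K = np.array([0.6, 0.05])                # fixed gains for the augmented vector (illustrative)

observations = [6.2, 6.8, 7.1, 7.3]      # synthetic discharge-like observations of x
for y in observations:
    z = forecast(z)                      # propagate state and parameter
    innovation = y - z[0]                # only the state component is observed
    z = z + K * innovation               # both state and parameter receive a correction
    print(f"state={z[0]:.2f}, parameter={z[1]:.3f}")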
3.2.3 Updating state variable
DA is rarely used to correct model parameters, because it is generally assumed that model parameters do not change over time; this method instead considers the correction of the system state variable. The KF method is regarded as one of the methods used in this case and is an optimal updating method for linear systems; however, it can also be used for nonlinear systems with some modifications (Komma et al., 2008; Brocca et al., 2009). The updating state variable method aims to determine the initial conditions of the model in such a way that the results of the simulation model and the observational values are best fitted and, if possible, these results can be generalized to the next time steps leading to the forecasting time (Wöhling et al., 2006). The KF method and its various forms, including the Extended Kalman Filter (EKF) and
Unscented Kalman Filter (UKF) are among the well-known updating state variables methods, which have been used for
various models such as coastal models (Verlaan et al., 1831), as well as rainfall-runoff models and hydrodynamic models
(Butts et al., 2005; Weerts and El Serafy, 2005).
In general, the updating model parameter method tends to focus on ambiguities and errors caused by parameter estimation, while ignoring other sources of error. On the other hand, in updating state variable methods, the model has the potential to consider the various ambiguities and errors resulting from model inputs and observations, but it fails to address the errors associated with estimating the model parameters. As a result, there is a tendency for model estimation and prediction to rely on a combination of updating model parameters and updating state variables so that all sources of error can be covered appropriately.
3.2.4 Updating output variable
The differences between the model outputs and the actual observations are usually serially correlated. This feature makes it possible to predict future error values and to work directly on error modeling. The updating output variable method is based on error prediction and has been widely used in various studies (Babovic et al., 2001; Wang and Bai, 2008; Bao et al., 2011; Yu and Chen, 2005). The independence of this method from the prediction model is a salient feature, which allows it to be used as postprocessing for the model output. This feature is noteworthy compared to the updating of state variables and model parameters, which requires two-way interaction between the forecasting model and the updating model. Researchers have proposed a wide range of techniques to predict the error, such as Kalman Filter (KF), Auto-Regressive (AR), Transfer Function (TF), and ANN methods (Babovic et al., 2001; Xiong et al., 2004; Rungø et al., 1989; Moore, 1999), and Fuzzy Logic (Yu and Chen, 2005; Xiong and O'Connor, 2002).
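A minimal sketch of output updating by error prediction is given below: because the simulation errors are serially correlated, an AR(1) model fitted to past errors is used to predict the next error and post-process the raw forecast. The synthetic data and the simple AR(1) fit are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
observed = 50 + 5 * np.sin(np.arange(30) / 4.0) + rng.normal(0, 0.5, 30)
simulated = observed - 3.0 + rng.normal(0, 0.5, 30)      # model with a persistent bias

errors = observed[:-1] - simulated[:-1]                  # past errors (up to time t-1)
phi = np.sum(errors[1:] * errors[:-1]) / np.sum(errors[:-1] ** 2)  # AR(1) coefficient

predicted_error = phi * errors[-1]                       # one-step-ahead error forecast
corrected_forecast = simulated[-1] + predicted_error     # post-processed output at time t
print(f"phi={phi:.2f}, raw={simulated[-1]:.2f}, "
      f"corrected={corrected_forecast:.2f}, obs={observed[-1]:.2f}")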
4. Optimal filtering methods
Optimal filtering methods were first developed in communication engineering with the aim of separating noise and thus transmitting information or signals more accurately and efficiently. They then gained recognition in the field of hydrology and water resources (Lettenmaier and Burges, 1976; Rodríguez-Iturbe and Mejía, 1974). Now, there are two basic mathematical
approaches to interpret the complete behavior of a hydrologic system: Deterministic or Stochastic. In the case of a totally
deterministic system, the transfer function can usually be obtained by solving systems of simultaneous equations. On the
other hand, stochastic hydrologic systems necessitate methodology which utilizes estimation theory and interrelates the
statistical nature of the problem with the prediction of random hydrologic events (Husain, 1985). To this aim, the Wiener
Filter was the first method adopted to handle random events. It is not used much for state estimation anymore, due to its
major drawbacks such as being applicable only to stationary time-invariant events. So, in this section, we will give a brief
review of other alternative and well-known Optimal Filtering methods such as Kalman Filter (KF), Extended Kalman
Filter (EKF) and Unscented Kalman Filter (UKF). For more information, the interested reader can refer to standard books
on optimal filtering (Simon, 2006; Anderson and Moore, 2012).
The early success of the KF upon its arrival in the 1960s in aerospace applications led to attempts to apply it to other, more common applications in the 1970s, such as water resources management. The application of optimal filtering approaches
in the field of water resources engineering can be summarized as follows: soil moisture (Entekhabi et al., 1994), flood forecasting
(Kitanidis and Bras, 1980), estimation of hydraulic conductivity (Katul et al., 1993), groundwater flow and transport problems
(Eigbe et al., 1998), estimation of water table elevations (Bailey and Baù, 2010), land surface model (Zhang et al., 2017), surface
water quality modeling (Cho et al., 2020), remote sensing (Dorigo et al., 2007; Khaki et al., 2020).
4.1 Kalman filter
One remarkable aspect of the Kalman Filter (KF) is that it is optimal in several different senses. This approach quickly became known as a more efficient and useful tool due to its superiority over the Wiener Filter method (Kay, 1993). Navigating aircraft and spacecraft was the first usage of the KF method; for example, the Apollo project navigation system used the KF (Maybeck, 1979). Its applications in water-related fields started with meteorological issues, and the KF is now widely used in meteorology, oceanography, and hydrology (Chiu, 1978). The KF method is one of the state-variable updating methods that is widely used in various scientific fields today.
In the KF method, the system state is updated by updating the state variables. Suppose we have a linear discrete-time system with a measurement equation given as follows:

$$x_{k+1} = F_k x_k + G_k u_k + w_k \qquad (1)$$

$$y_{k+1} = H_{k+1} x_{k+1} + v_{k+1} \qquad (2)$$
where $w_k$ and $v_k$ are the process and measurement noise (or uncertainty), respectively, which are assumed to be white, zero-mean Gaussian noise with covariance matrices $Q_k$ and $R_k$ (i.e., $w_k \sim N(0, Q_k)$ and $v_k \sim N(0, R_k)$), and are also uncorrelated ($E\{w_k v_j^T\} = 0$ for all $k$ and $j$). The objective is to estimate the state vector $x_{k+1}$ based on our knowledge of the system dynamics Equation (1) and the measurement Equation (2), both of which have some uncertainty. The available information for estimating the state variables depends on the particular problem that we are trying to solve. If all of the measurements up to and including time $k+1$ are available for use in the estimation of $x_{k+1}$, then we can form an a posteriori estimate, which is denoted as $\hat{x}_{k+1}^{+}$ and is computed as the expected value of $x_{k+1}$ conditioned on all of the measurements up to and including time step $k+1$:

$$\hat{x}_{k+1}^{+} = E\left[x_{k+1} \mid y_{k+1}, y_{k}, \ldots, y_{1}\right] \qquad (3)$$
On the other hand, if all of the measurements up to (but not including) time $k+1$ are available for use in our estimate of $x_{k+1}$, then we can form an a priori estimate, which is denoted by $\hat{x}_{k+1}^{-}$ and computed as the expected value of $x_{k+1}$ conditioned on all of the measurements up to (but not including) time $k+1$:

$$\hat{x}_{k+1}^{-} = E\left[x_{k+1} \mid y_{k}, y_{k-1}, \ldots, y_{1}\right] \qquad (4)$$
It is important to note that both $\hat{x}_{k+1}^{-}$ and $\hat{x}_{k+1}^{+}$ are estimates of $x_{k+1}$. However, $\hat{x}_{k+1}^{-}$ and $\hat{x}_{k+1}^{+}$ are the estimates of $x_{k+1}$ before and after the measurement $y_{k+1}$ is taken into account, respectively. It is clear that $\hat{x}_{k+1}^{+}$ is potentially expected to be a better estimate than $\hat{x}_{k+1}^{-}$, since $\hat{x}_{k+1}^{+}$ uses more information to estimate $x_{k+1}$. Note that the first measurement is taken at time $k = 1$. We use the expected value of the initial state $x_0$ to denote our initial estimate of $x_0$, i.e., $\hat{x}_0^{+}$, before any measurements are available:

$$\hat{x}_0^{+} = E\left[x_0\right] \qquad (5)$$
In addition, we use $P_{k+1}^{-}$ and $P_{k+1}^{+}$ to denote the estimation error covariances of $\hat{x}_{k+1}^{-}$ and $\hat{x}_{k+1}^{+}$, respectively:

$$P_{k+1}^{-} = E\left\{\left(x_{k+1} - \hat{x}_{k+1}^{-}\right)\left(x_{k+1} - \hat{x}_{k+1}^{-}\right)^{T}\right\} \qquad (6)$$

$$P_{k+1}^{+} = E\left\{\left(x_{k+1} - \hat{x}_{k+1}^{+}\right)\left(x_{k+1} - \hat{x}_{k+1}^{+}\right)^{T}\right\} \qquad (7)$$
To aid understanding, the concepts above and their relationships are depicted in Fig. 6. In this notation, the discrete-time KF procedure can be summarized as follows:
* The Discrete-Time Kalman Filter (KF) Algorithm

1. Linear model identification:
   Dynamic system equation: $x_{k+1} = F_k x_k + G_k u_k + w_k$
   Measurement equation: $y_{k+1} = H_{k+1} x_{k+1} + v_{k+1}$
   Noise characteristics: $w_k \sim N(0, Q_k)$, $v_k \sim N(0, R_k)$, $E\{w_k v_j^T\} = 0$

2. Initialization:
   $\hat{x}_0^{+} = E\{x_0\}$
   $P_0^{+} = E\{(x_0 - \hat{x}_0^{+})(x_0 - \hat{x}_0^{+})^{T}\}$

3. Updating (for k = 1, 2, ...):
   3.1. Time updating (prediction):
        $P_{k+1}^{-} = F_k P_k^{+} F_k^{T} + Q_k$
        $\hat{x}_{k+1}^{-} = F_k \hat{x}_k^{+} + G_k u_k$
   3.2. Measurement updating (filtering):
        $K_{k+1} = P_{k+1}^{-} H_{k+1}^{T} \left(H_{k+1} P_{k+1}^{-} H_{k+1}^{T} + R_{k+1}\right)^{-1}$
        $\hat{x}_{k+1}^{+} = \hat{x}_{k+1}^{-} + K_{k+1}\left(y_{k+1} - H_{k+1} \hat{x}_{k+1}^{-}\right)$
        $P_{k+1}^{+} = \left(I - K_{k+1} H_{k+1}\right) P_{k+1}^{-} \left(I - K_{k+1} H_{k+1}\right)^{T} + K_{k+1} R_{k+1} K_{k+1}^{T}$
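As a concrete illustration of the algorithm box above, the following Python sketch implements the time update and measurement update for a scalar state observed directly; the specific matrices F, G, H, Q, R, the synthetic data, and the function names are illustrative assumptions. The covariance update uses the same (I - KH)P(I - KH)^T + KRK^T form as in step 3.2.

import numpy as np

F, G, H = np.array([[0.9]]), np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[0.05]]), np.array([[0.2]])

def kf_step(x_post, P_post, u, y):
    # Time update (prediction)
    x_prior = F @ x_post + G @ u
    P_prior = F @ P_post @ F.T + Q
    # Measurement update (filtering)
    S = H @ P_prior @ H.T + R                      # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)           # Kalman gain
    innovation = y - H @ x_prior
    x_post = x_prior + K @ innovation
    I = np.eye(len(x_post))
    P_post = (I - K @ H) @ P_prior @ (I - K @ H).T + K @ R @ K.T
    return x_post, P_post, innovation, S

rng = np.random.default_rng(3)
x_true, x_hat, P = np.array([5.0]), np.array([0.0]), np.array([[1.0]])
for k in range(20):
    u = np.array([1.0])                                           # known input (e.g., rainfall)
    x_true = F @ x_true + G @ u + rng.normal(0, np.sqrt(Q[0, 0]), 1)
    y = H @ x_true + rng.normal(0, np.sqrt(R[0, 0]), 1)
    x_hat, P, nu, S = kf_step(x_hat, P, u, y)
print("final estimate:", x_hat, "truth:", x_true)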
The matrix $K_{k+1}$ in the above equations is called the Kalman filter gain, which can be calculated offline before the system operates and saved in memory. The quantity $y_{k+1} - H_{k+1}\hat{x}_{k+1}^{-}$ is called the innovation, which can be interpreted as the part of
FIG. 6 Timeline of prior and posterior estimates in KF framework.
the measurement that contains new information about the state. When a KF is used for state estimation, the innovations can be measured, and if their mean and covariance are not equal to $0$ and $H_{k+1} P_{k+1}^{-} H_{k+1}^{T} + R_{k+1}$, respectively, something is wrong; perhaps either the assumed system model or the assumed noise statistics are incorrect.
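The consistency check just described can be automated: the short sketch below computes the mean of a batch of innovations and their normalized squares against an assumed innovation covariance S; values far from 0 and from the observation dimension would flag a mis-specified model or noise statistics. The numbers are synthetic and purely illustrative.

import numpy as np

rng = np.random.default_rng(4)
S = np.array([[0.25]])                              # assumed innovation covariance
innovations = rng.normal(0, np.sqrt(S[0, 0]), 500)  # well-behaved innovations (synthetic)

nis = innovations**2 / S[0, 0]                      # normalized innovations squared
print("innovation mean:", innovations.mean())       # should be near 0
print("mean NIS       :", nis.mean())               # should be near 1 (= dimension of y)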
The following is an overview of the application of the KF in water engineering, especially flood simulation and
forecasting.
Hino (1973) used the KF in flood prediction for the first time, updating the parameters of the Muskingum model.
Markussen (1985) applied the NAM hydrodynamic model with a new structure to utilize the KF for
finding water level values. The most important point in that research was the results of uncertainty analysis, which showed
that uncertainty at the output was mainly driven by uncertainty at the input (rainfall). Markussen (1985) used the KF technique
to update the output data of the centralized rainfall-runoff model. Considering the 6-h data for the Bird Creek Basin with
2344 km2 in Oklahoma, Georgakakos (1986) investigated the effectiveness of a hydrometric model to forecast flooding.
By developing a rainfall forecasting model and combining it with a modified rainfall-runoff model, the National Meteorological Institute forecasted the flood flow and variables such as soil moisture storage capacity. Ultimately, the flood flow was
modified by using the KF model. Guang-Te et al. (1987) also attempted to apply KF to estimate the flow in the Muskingum
model. Ferraresi et al. (1996) used a Darcy linear model along with KF to estimate Transmissibility in a real environment.
Refsgaard (1997) compared the combination of the NAM model integrated with the KF and the NAM/MIKE model with an
error prediction technique and concluded that correctly calibrating the initial hydrological model that leads the KF to have
better results relative to the error prediction model. Lee and Singh (1999) successfully used the KF model to estimate the
parameters in the Tank model. This model was used to modify the model parameters over time and update runoff uncertainty.
Groundwater researchers have mainly studied saturated areas and the possibility of deep diffusion, considering the
observation of moisture in the upper layers. In unsaturated activity areas, Walker et al. (2002) correctly demonstrated
the power of the KF in simulating relative soil moisture. The model used by these researchers was demanding in computational terms, due to its three-dimensional nonlinear nature. Information simulation in hydrology has been performed
using algorithms developed in other fields such as meteorology and oceanography through the KF method (Troch et al.,
2003). To predict daily soil moisture, Kashif Gill et al. (2007) used 6 months of meteorological data to train a Support
Vector Machine (SVM) model and 3 months of data to test the learned model. Other research was also conducted using
the KF on the Muskingum routing model (Wang and Bai, 2008; Huang, 1999).
The observability of the phenomenon model is one of the basic conditions for using the KF in predicting the phenomenon. Observability of a dynamic model means the ability to reconstruct the system state variables from the observations. Therefore, since lumped hydrological models are not observable, updating the parameters of these models is the only acceptable way to use the KF under such conditions (Huang, 1999). Distributed models, which are based on the Saint-Venant equations and used in flood routing, can satisfy the observability condition well and allow the KF to be used for updating the state variables of these models. However, they significantly increase the computational load. Given these conditions, studies indicate an increase in the accuracy of forecasts updated by the KF (Mu and Zhang, 2007; Xie and Zhang, 2010). Madsen and Skotner (2005) presented a simultaneously updating model for river flood forecasting, which was a combination of the KF and an error forecasting model. In their model, the error of the system state variables was first distributed over the locations of the measurement stations by the KF; after that, the resulting error values were distributed to the locations in the forecasting area by using the error forecasting model. Currently, it is possible to use this model in nonlinear and large-dimensional problems. On the other hand, since the KF can be adapted by updating the problem variables to provide proper lead time, this method is a suitable and efficient way for flood forecasting and timely warning problems.
4.1.1 Kalman filter limitations
The efficiency and accuracy of the Kalman filter are guaranteed only under certain conditions:
- The mean, covariance, and correlation of the process noise $w_k$ and measurement noise $v_k$ should be known at all time steps.
- The system model matrices $F_k$ and $H_k$ should be known.
- The KF is the Minimum Variance Unbiased Estimator (MVUE) if the noise is Gaussian, and it is the best linear unbiased minimum variance estimator if the noise is not Gaussian. So, if we want to minimize a different cost function, then the KF may not accomplish our objectives.
So, we will need an alternative filter if one of the Kalman filter assumptions is not satisfied. The H∞ filter does not make any assumptions about the noise, and it minimizes the worst-case estimation error. Transfer Function (TF) approaches, a formulation of H∞ filtering, have been proposed by Yaesh and Shaked (1991).
4.2 Transfer function
As stated in the earlier section, although the KF and its extensions are an effective tool for estimating the state variables of a system, there is a serious mismatch between the underlying assumptions of the KF and many practical state estimation situations. Accurate system models are not readily available for practical problems, and engineers realized they needed a new filter that could handle modeling errors and noise uncertainty. The Transfer Function (TF) approach is a frequency domain approach that has been proposed to mitigate these drawbacks. The transfer function, as one of the data assimilation methods, is among the output updating methods that attempt to model the error directly.
Consider the discrete-time time-invariant system as follows:
$$x_{k+1} = F x_k + w_k, \qquad y_k = H x_k + v_k, \qquad z_k = L x_k \qquad (8)$$
where $y_k$ is the measurement, $z_k$ is a vector of a linear combination of the states to be estimated, $L$ is a user-defined full-rank matrix, and the process and measurement noise are uncorrelated. The vectors $w_k$ and $v_k$ are process and measurement noise with unknown statistics such that they may not even be zero-mean. If we want to directly estimate $x_k$ as in the KF, then we set $L = I$. Consider the estimation error and the augmented disturbance vector as follows:
$$\tilde{z}_k = z_k - \hat{z}_k, \qquad e_k = \begin{bmatrix} w_k^T & v_k^T \end{bmatrix}^T \qquad (9)$$
The transfer function from the augmented disturbance vector $e$ to the estimation error $\tilde{z}$ is defined as follows:
$$\left\| G_{e\tilde{z}} \right\|_{\infty} = \sup_{\omega} \frac{\| z_k - \hat{z}_k \|_2^2}{\| e_k \|_2^2} = \sup_{\omega} \frac{\| z_k - \hat{z}_k \|_2^2}{\| w_k \|_2^2 + \| v_k \|_2^2} \qquad (10)$$
where $\omega$ is the frequency of the noise (Jukic and Denic-Jukic, 2004; Duffy and Gelhar, 1986; Riyahi et al., 2018). Clearly, $G_{e\tilde{z}}$ can be considered as a system that has $e$ as its input and $\tilde{z}$ as its output. We aim to find a steady-state estimator such that the infinity-norm of the transfer function (10) is less than some user-specified bound:
$$\left\| G_{e\tilde{z}} \right\|_{\infty} < \gamma^{-1} \qquad (11)$$
Therefore, if $P$ and $\tilde{P}$ are Positive Definite (PD) matrices, chosen based on the specific problem, the steady-state TF algorithm can be summarized as follows.
* The Discrete-Time Transfer Function (TF) Algorithm
1. Linear Model Identification:
Dynamic System Equation: $x_{k+1} = F x_k + w_k$
Measurements Equation: $y_k = H x_k + v_k$
Noise Characteristics: Unknown Statistics
2. Initialization:
$\hat{x}_0 = 0$
3. Updating (for k = 1, 2, …):
3.1. Priori Filter:
$P = I + F P F^T - F P H^T \left(I + H P H^T\right)^{-1} H P F^T + P L^T \left(\gamma^{-1} I + L P L^T\right)^{-1} L P$
$K = F P H^T \left(I + H P H^T\right)^{-1}$
$\hat{x}_{k+1} = F \hat{x}_k + K \left(y_k - H \hat{x}_k\right)$
3.2. Posteriori Filter:
$S^{-1} = \tilde{P}^{-1} - \gamma L^T L + H^T H$
$\tilde{P} = F \tilde{P} \left(H^T H \tilde{P} - \gamma L^T L \tilde{P} + I\right)^{-1} F^T + I$
$\tilde{K} = \left(I + \gamma L^T L\right)^{-1} S H^T$
$\hat{x}_{k+1} = F \hat{x}_k + \tilde{K} \left(y_{k+1} - H F \hat{x}_k\right)$
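Because the gain of such a steady-state filter is computed offline, running it online reduces to a fixed-gain recursion. The sketch below assumes a gain K has already been obtained from a design procedure such as the one summarized above; F, H, K, and the measurement series are illustrative placeholders only.

```python
import numpy as np

def steady_state_filter(F, H, K, y_seq, x0):
    """Run a fixed-gain (steady-state) estimator x̂_{k+1} = F x̂_k + K (y_k − H x̂_k).

    K is assumed to have been designed offline (e.g., from the priori TF/H∞ design
    equations summarized above); no noise statistics are needed online.
    """
    x_hat = x0.copy()
    out = []
    for y in y_seq:
        x_hat = F @ x_hat + K @ (y - H @ x_hat)
        out.append(x_hat.copy())
    return np.array(out)

# Illustrative numbers only (not from any calibrated model)
F = np.array([[0.9]]); H = np.array([[1.0]]); K = np.array([[0.5]])
y_seq = [np.array([1.0]), np.array([0.8]), np.array([1.1])]
print(steady_state_filter(F, H, K, y_seq, np.zeros(1)))
```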
The practical use of the TF in water engineering was first proposed in the MIKE11 software environment to improve flood prediction accuracy. This software eliminates two errors, called the Amplitude Error and the Phase Error, defined on the flood hydrograph in the flood forecasting process. In addition to performing the updating process separately on both the model output and the state variables, MIKE11 is one of the software packages that can minimize the errors in model simulations and present more accurate results.
The amplitude error and the phase error can be attributed to the momentum and continuity equations, respectively (Paudyal, 2002). Fig. 7 demonstrates these errors schematically. Many works have focused on the improvement of the TF method; for example, Saddagh and Abedini (2012) improved the results obtained by the TF method by considering the nonuniformity of the amplitude error around the peak of the hydrograph.
4.3 Extended Kalman filter
Unfortunately, truly linear systems do not really exist, and many systems are not close enough to linear for linear estimation approaches to give satisfactory results. In this case, we need to explore nonlinear estimators. Since the general KF algorithm is limited to linear systems, in this section we discuss the Extended Kalman Filter (EKF) as a nonlinear extension of the KF. A nonlinear system can be linearized, and then linear estimation techniques (such as the KF) can be applied.
Suppose we have the following nonlinear discrete-time deterministic system model:
$$x_{k+1} = f_k(x_k, u_k, w_k) \qquad (12)$$
$$y_{k+1} = h_{k+1}(x_{k+1}, v_{k+1}) \qquad (13)$$
FIG. 7 Schematic of types of errors in the transfer function method.
By performing a Taylor series expansion of the dynamic system equation (12) around $x_k = \hat{x}^+_k$ and $w_k = 0$, we obtain:
$$x_{k+1} \simeq f_k\!\left(\hat{x}^+_k, u_k, 0\right) + \left.\frac{\partial f_k}{\partial x_k}\right|_{x_k=\hat{x}^+_k}\!\left(x_k - \hat{x}^+_k\right) + \left.\frac{\partial f_k}{\partial w_k}\right|_{x_k=\hat{x}^+_k}\! w_k \ \triangleq\ F_k x_k + \tilde{u}_k + \tilde{w}_k \qquad (14)$$
where
$$F_k \triangleq \left.\frac{\partial f_k}{\partial x_k}\right|_{x_k=\hat{x}^+_k}, \qquad L_k \triangleq \left.\frac{\partial f_k}{\partial w_k}\right|_{x_k=\hat{x}^+_k}, \qquad \tilde{u}_k \triangleq f_k\!\left(\hat{x}^+_k, u_k, 0\right) - F_k \hat{x}^+_k, \qquad \tilde{w}_k \triangleq L_k w_k \qquad (15)$$
Similarly, we linearize the measurement equation (13) around $x_{k+1} = \hat{x}^-_{k+1}$ and $v_{k+1} = 0$ as follows:
$$y_{k+1} \simeq h_{k+1}\!\left(\hat{x}^-_{k+1}, 0\right) + \left.\frac{\partial h_{k+1}}{\partial x_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}\!\left(x_{k+1} - \hat{x}^-_{k+1}\right) + \left.\frac{\partial h_{k+1}}{\partial v_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}\! v_{k+1} \ \triangleq\ H_{k+1} x_{k+1} + \tilde{z}_{k+1} + \tilde{v}_{k+1} \qquad (16)$$
where we define:
$$H_{k+1} \triangleq \left.\frac{\partial h_{k+1}}{\partial x_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}, \qquad M_{k+1} \triangleq \left.\frac{\partial h_{k+1}}{\partial v_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}, \qquad \tilde{z}_{k+1} \triangleq h_{k+1}\!\left(\hat{x}^-_{k+1}, 0\right) - H_{k+1} \hat{x}^-_{k+1}, \qquad \tilde{v}_{k+1} \triangleq M_{k+1} v_{k+1} \qquad (17)$$
Obviously, the new noise components are $\tilde{w}_k \sim N\!\left(0, L_k Q_k L_k^T\right)$ and $\tilde{v}_{k+1} \sim N\!\left(0, M_{k+1} R_{k+1} M_{k+1}^T\right)$. Therefore, we obtain a linear state system in Eq. (14) and a linear measurement in Eq. (16). That means we can use the standard KF to estimate the state. This results in the following equations for the discrete-time EKF.
* The Discrete-Time Extended Kalman Filter (EKF) Algorithm
1. Nonlinear Model Identification:
Dynamic System Equation: $x_{k+1} = f_k(x_k, u_k, w_k)$
Measurements Equation: $y_{k+1} = h_{k+1}(x_{k+1}, v_{k+1})$
Noise Characteristics: $w_k \sim N(0, Q_k)$, $v_k \sim N(0, R_k)$, $E\{w_k v_j^T\} = 0$
2. Initialization:
$\hat{x}^+_0 = E\{x_0\}$
$P^+_0 = E\{(x_0 - \hat{x}^+_0)(x_0 - \hat{x}^+_0)^T\}$
3. Updating (for k = 1, 2, …):
3.1. Time Updating (Prediction):
$F_k = \left.\dfrac{\partial f_k}{\partial x_k}\right|_{x_k=\hat{x}^+_k}$, $L_k = \left.\dfrac{\partial f_k}{\partial w_k}\right|_{x_k=\hat{x}^+_k}$ (Linearization)
$P^-_{k+1} = F_k P^+_k F_k^T + Q_k$
$\hat{x}^-_{k+1} = F_k \hat{x}^+_k + G_k u_k$
3.2. Measurement Updating (Filtering):
$H_{k+1} = \left.\dfrac{\partial h_{k+1}}{\partial x_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}$, $M_{k+1} = \left.\dfrac{\partial h_{k+1}}{\partial v_{k+1}}\right|_{x_{k+1}=\hat{x}^-_{k+1}}$ (Linearization)
$K_{k+1} = P^-_{k+1} H_{k+1}^T \left(H_{k+1} P^-_{k+1} H_{k+1}^T + R_{k+1}\right)^{-1}$
$\hat{x}^+_{k+1} = \hat{x}^-_{k+1} + K_{k+1}\left(y_{k+1} - H_{k+1} \hat{x}^-_{k+1}\right)$
$P^+_{k+1} = (I - K_{k+1} H_{k+1}) P^-_{k+1} (I - K_{k+1} H_{k+1})^T + K_{k+1} R_{k+1} K_{k+1}^T$
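As a minimal illustration of the linearize-then-filter idea, the following sketch runs a scalar EKF on an invented nonlinear model; the dynamics, measurement operator, Jacobians, and noise variances are assumptions made purely for demonstration and are not drawn from any of the cited studies.

```python
import numpy as np

# Hypothetical scalar nonlinear model x_{k+1} = f(x_k) + w_k, y_k = h(x_k) + v_k
f  = lambda x: 0.9 * x + 0.1 * np.sin(x)       # dynamics (illustrative)
df = lambda x: 0.9 + 0.1 * np.cos(x)           # F_k = ∂f/∂x evaluated at the posterior estimate
h  = lambda x: x ** 2                          # measurement operator (illustrative)
dh = lambda x: 2.0 * x                         # H_{k+1} = ∂h/∂x evaluated at the prior estimate

Q, R = 0.01, 0.04
x_post, P_post = 1.0, 1.0
for y in (1.1, 0.9, 1.3, 1.0):
    # Time updating (prediction): linearize, then propagate mean and covariance
    Fk = df(x_post)
    x_prior = f(x_post)
    P_prior = Fk * P_post * Fk + Q
    # Measurement updating (filtering): linearize h around the prior estimate
    Hk = dh(x_prior)
    K = P_prior * Hk / (Hk * P_prior * Hk + R)
    x_post = x_prior + K * (y - h(x_prior))
    P_post = (1.0 - K * Hk) * P_prior * (1.0 - K * Hk) + K * R * K
    print(round(x_post, 3), round(P_post, 4))
```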
Various researchers have examined the application of the EKF in the field of water resources engineering (Entekhabi et al., 1994; Puente and Bras, 1987; Walker et al., 2001). In Wood and O'Connell (1985), the KF and EKF are exploited for real-time forecasting to simultaneously estimate state variables and parameters in the Sacramento Soil Moisture Accounting (SAC-SMA) model. Also, although the EKF was suboptimal and its convergence was not well understood, McLaughlin and Townley (1996) made several successful attempts to use this filter; indeed, the lack of high-degree nonlinear terms in their study was the main reason for these successes. Next, Eppstein and Dougherty (1996) simplified the EKF by simplifying the covariance update and used a classification algorithm to zone the regions for the transmissivity value. This method converts an ill-conditioned system into a well-conditioned one by decreasing the number of variables. Unfortunately, they did not apply this method to real-world situations. Moreover, Sun et al. (2016) used the EKF to investigate the performance improvement of the Soil & Water Assessment Tool (SWAT) hydrological model, in which the data updating process was performed in two ways: updating the state variable and updating the output of the model.
4.4 Unscented Kalman filter
As stated earlier, the EKF is a widely used estimator for nonlinear systems. However, if the nonlinearities of the system are severe, the EKF often gives unreliable estimates. More precisely, the mean and covariance can be updated exactly with the KF (Section 4.1) in case the system is linear. If the system is nonlinear, then the mean and covariance can be updated approximately with the EKF (Section 4.3). The Unscented Kalman Filter (UKF), in contrast, works on the principle of the Unscented Transformation to update the mean and covariance. An unscented transformation is based on two fundamental principles. First, it is easy to perform a nonlinear transformation on a single point, called a Sigma Point, rather than on an entire Probability Density Function (PDF). Second, it is not too hard to find a set of individual points in state space whose sample pdf approximates the true pdf of the state vector.
The UKF algorithm can be summarized as follows.
* The Discrete-Time Unscented Kalman Filter (UKF) Algorithm
1. Nonlinear Model Identification:
Dynamic System Equation: $x_{k+1} = f_k(x_k, u_k, t_k) + w_k$
Measurements Equation: $y_{k+1} = h_{k+1}(x_{k+1}, t_{k+1}) + v_{k+1}$
Noise Characteristics: $w_k \sim N(0, Q_k)$, $v_k \sim N(0, R_k)$, $E\{w_k v_j^T\} = 0$
2. Initialization:
$\hat{x}^+_0 = E\{x_0\}$
$P^+_0 = E\{(x_0 - \hat{x}^+_0)(x_0 - \hat{x}^+_0)^T\}$
3. Updating (for k = 1, 2, …):
3.1. Time Updating (Prediction):
3.1.1. Sigma Point Updating
$\hat{x}^{(i)}_k = \hat{x}^+_k + \tilde{x}^{(i)}$, $i = 1, 2, \ldots, 2n$, where $2n$ is the number of sigma points
$$\tilde{x}^{(i)} = \begin{cases} \left(\sqrt{n P^+_k}\right)^T_i & i = 1, 2, \ldots, n \\ -\left(\sqrt{n P^+_k}\right)^T_i & i = n+1, n+2, \ldots, 2n \end{cases}$$
$\hat{x}^{(i)}_{k+1} = f_k\!\left(\hat{x}^{(i)}_k, u_k, t_k\right)$
3.1.2. State Updating
$$\hat{x}^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{x}^{(i)}_{k+1}$$
$$P^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \left(\hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1}\right)\left(\hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1}\right)^T + Q_k$$
3.2. Measurement Updating (Filtering):
3.2.1. Sigma Point Updating
$\hat{x}^{(i)}_{k+1} = \hat{x}^-_{k+1} + \tilde{x}^{(i)}$, $i = 1, 2, \ldots, 2n$ (Sigma Point Generation)
$$\tilde{x}^{(i)} = \begin{cases} \left(\sqrt{n P^-_{k+1}}\right)^T_i & i = 1, 2, \ldots, n \\ -\left(\sqrt{n P^-_{k+1}}\right)^T_i & i = n+1, n+2, \ldots, 2n \end{cases}$$
$\hat{y}^{(i)}_{k+1} = h_{k+1}\!\left(\hat{x}^{(i)}_{k+1}, t_{k+1}\right)$
$$\hat{y}_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{y}^{(i)}_{k+1}$$
$$P_y = \frac{1}{2n} \sum_{i=1}^{2n} \left(\hat{y}^{(i)}_{k+1} - \hat{y}_{k+1}\right)\left(\hat{y}^{(i)}_{k+1} - \hat{y}_{k+1}\right)^T + R_{k+1}$$
$$P_{xy} = \frac{1}{2n} \sum_{i=1}^{2n} \left(\hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1}\right)\left(\hat{y}^{(i)}_{k+1} - \hat{y}_{k+1}\right)^T$$
3.2.2. State Updating
$K_{k+1} = P_{xy} P_y^{-1}$
$\hat{x}^+_{k+1} = \hat{x}^-_{k+1} + K_{k+1}\left(y_{k+1} - \hat{y}_{k+1}\right)$
$P^+_{k+1} = P^-_{k+1} - K_{k+1} P_y K_{k+1}^T$
In the UKF algorithm, $\sqrt{nP}$ is the matrix square root of $nP$ such that $nP = \left(\sqrt{nP}\right)^T \sqrt{nP}$, and the subscript $i$ denotes the $i$th row of the matrix $\sqrt{nP}$.
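A compact sketch of one UKF cycle built around the 2n sigma points defined above is shown below; the nonlinear functions, noise covariances, and measurements are invented for illustration, and the matrix square root is taken from a Cholesky factorization (one valid choice satisfying the definition above).

```python
import numpy as np

def sigma_points(x, P):
    """2n sigma points x ± rows of sqrt(nP), with sqrt(nP)ᵀ sqrt(nP) = nP as defined above."""
    n = len(x)
    A = np.linalg.cholesky(n * P).T        # A satisfies AᵀA = nP
    return np.vstack([x + A[i] for i in range(n)] + [x - A[i] for i in range(n)])

def ukf_step(f, hfun, Q, R, x_post, P_post, y):
    n = len(x_post)
    # Time updating: propagate sigma points through the dynamics
    X = np.array([f(s) for s in sigma_points(x_post, P_post)])
    x_prior = X.mean(axis=0)
    P_prior = (X - x_prior).T @ (X - x_prior) / (2 * n) + Q
    # Measurement updating: regenerate sigma points around the prior estimate
    Xs = sigma_points(x_prior, P_prior)
    Y = np.array([hfun(s) for s in Xs])
    y_hat = Y.mean(axis=0)
    Py  = (Y - y_hat).T @ (Y - y_hat) / (2 * n) + R
    Pxy = (Xs - x_prior).T @ (Y - y_hat) / (2 * n)
    K = Pxy @ np.linalg.inv(Py)
    x_post = x_prior + K @ (y - y_hat)
    P_post = P_prior - K @ Py @ K.T
    return x_post, P_post

# Illustrative 2-state example (not a calibrated hydrological model)
f = lambda x: np.array([x[0] + 0.1 * np.sin(x[1]), 0.95 * x[1]])
hfun = lambda x: np.array([x[0] ** 2])
x, P = np.zeros(2), np.eye(2)
for y in ([0.2], [0.3], [0.1]):
    x, P = ukf_step(f, hfun, 0.01 * np.eye(2), 0.05 * np.eye(1), x, P, np.array(y))
print(x, P)
```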
5. Auto-regressive method
The Auto-Regressive (AR) model, which is a time series-based approach, aims to model error directly in the same way as the
TF method. The AR technique is one of the error correction methods widely used by researchers because of its simplicity
and relevant results (Refsgaard, 1997; Xiong and O’Connor, 2002).
The standard AR model for updating the simulation errors involves a calibration procedure summarized below. Firstly, the simulation error of the model at time instant $k$ is obtained as:
$$e_k = Q_k - \hat{Q}_k \qquad (18)$$
where $e_k$ is the simulation error of the selected model, and $Q_k$ and $\hat{Q}_k$ denote the observed and estimated values, respectively. If the mean value of the simulation error series of the calibration period, denoted by $\bar{e}$, is not equal to zero, then that mean value should be subtracted from the simulation errors to produce the corresponding zero-mean time series $e_k$:
$$e_k = \left(Q_k - \hat{Q}_k\right) - \bar{e} \qquad (19)$$
Thus, the AR updating model of order $p$ at time step $k$ is given by:
$$\hat{e}_k = \sum_{i=1}^{p} a_i e_{k-i} \qquad (20)$$
where $\{a_i\}_{i=1}^{p}$ are the AR coefficients and $\hat{e}_k$ is the estimate of $e_k$. It is worth mentioning that the Auto-Correlation Function (ACF) of the time series $e_i$ satisfies a linear difference equation analogous to Eq. (20), so the Yule-Walker equations can be exploited to estimate the parameters $\{a_i\}_{i=1}^{p}$ by replacing the theoretical auto-correlations with their respective estimates (obtained from the $e_i$ time series) (Box and Jenkins, 1976). In practice, however, the AR parameters $\{a_i\}_{i=1}^{p}$ are generally estimated by the Least Squares (LS) method by treating Eq. (20) as a linear regression. Shamseldin and O'Connor (2001) show that the mathematical formulation of the AR error-forecast updating model is itself a special limiting case of the more general input-output structure of the linear Auto-Regressive Exogenous-input Model (ARXM) (Abdelrahman, 1995), also known as the Linear Transfer Function Model (LTFM) (Xiong and O'Connor, 2002).
By incorporating a residual updated forecast error term, $e_k$, the ARXM model-output updating procedure has the form (Shamseldin and O'Connor, 2001):
$$Q_k = \sum_{i=1}^{p} a_i Q_{k-i} + \sum_{i=0}^{q} b_i \hat{Q}_{k-i} + e_k \qquad (21)$$
where $Q_k$ and $\hat{Q}_k$ have the same definitions as in Eq. (18), $p$ and $q$ are the orders of the AR and the exogenous-input parts of the ARXM, respectively, and $a_i$ and $b_i$ are the corresponding coefficient parameters of the two parts. Clearly, if $p = q$, $b_0 = 1$, and $a_i = -b_i$ for $i = 1, 2, \ldots, p$, the ARXM becomes the AR model of the error series $e_k$, whereas if $b_i = 0$ for $i = 0, \ldots, q$, it becomes the naive AR updating model. Since Eq. (21) has the form of a multiple linear regression, the parameters of the ARXM can also be estimated directly using the LS method.
The results of Shamseldin and O'Connor (2001), who used the ARXM updating model as their benchmark updating procedure, show that the ARXM procedure is not significantly more efficient than the conventional AR model.
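As a sketch of the least-squares route described above, the snippet below fits an AR(p) error-updating model to a synthetic zero-mean simulation-error series and produces a one-step-ahead error estimate via Eq. (20); the series, its generating process, and the order p = 2 are illustrative assumptions.

```python
import numpy as np

def fit_ar_ls(e, p):
    """Least-squares estimate of the AR(p) coefficients by treating Eq. (20) as a regression."""
    X = np.column_stack([e[p - i:len(e) - i] for i in range(1, p + 1)])  # lagged errors
    t = e[p:]                                                            # targets e_k
    a, *_ = np.linalg.lstsq(X, t, rcond=None)
    return a

# Synthetic simulation-error series (illustrative only), centered to zero mean
rng = np.random.default_rng(0)
e = np.zeros(200)
for k in range(1, 200):
    e[k] = 0.7 * e[k - 1] + rng.normal(scale=0.1)
e -= e.mean()

a = fit_ar_ls(e, p=2)
e_next = a @ e[-1:-3:-1]          # one-step-ahead error estimate: a1*e_{k-1} + a2*e_{k-2}
print(a, e_next)
```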
6. Considerations in using data assimilation
Data assimilation (DA) methods can significantly increase the accuracy of predicted results and are often considered the best option. However, some considerations should be addressed in the use of these methods, as follows (House et al., 2003):
(i) Applying updating methods does not necessarily eliminate the need for a model that has been properly calibrated for a wide range of data. Further, it is necessary to examine the accuracy of the calibration performed for a validation period, which contains the recorded data.
(ii) The quality of the updating process of the forecasting values relies heavily on the quality of the input data, and the existence of errors in the input data of this process not only fails to improve the results but also degrades the accuracy. Therefore, it is suggested to control the quality of the input data manually or automatically.
(iii) Regarding the real-time control applications of hydraulic structures, the use of updating methods requires addressing special issues in network design; otherwise, failing to consider the interaction between the modeling and the control commands of the structures will adversely affect the results.
7. Conclusions
The modeling of hydrological processes involves a multitude of climatic parameters and data; therefore, providing an appropriate simulation model that leads to minimal error has always been a vital challenge in previous studies. The uncertainty and lack of reliability in the accuracy of the data and input parameters of simulation models lead to errors, which have a significant adverse effect on long-term forecasts and management policies. The forecasting of flow rate by hydrological models is always accompanied by uncertainty. For this reason, various methods, such as increasing the quality of the input data, improving the model structure, and assimilating observational data, are used to reduce the uncertainty of the models. Data assimilation is one of the ways to reduce uncertainty; it does so by considering the uncertainty of the inputs and observations, as well as by updating the state variables. Moreover, the KF method and its extensions are regarded as among the most widely used data assimilation methods. These methods, which are commonly used in various scientific fields, are recursive and can be applied to linear and nonlinear systems. Their main advantages are that there is no need to preserve all measured information from the beginning up to the present, and that the next step can be forecasted using only the measurements of the present step.
References
Abdelrahman, E., 1995. Real-Time Stream Flow Forecasting for Single-Input Single-Output Systems. Unpublished M. Sc. thesis, National University of
Ireland, Galway.
Anctil, F., Perrin, C., Andreassian, V., 2003. ANN output updating of lumped conceptual rainfall/runoff forecasting models. J. Am. Water Resour. Assoc.
39 (5), 1269–1279.
Anderson, B.D., Moore, J.B., 2012. Optimal Filtering. Courier Corporation.
Aubert, D., Loumagne, C., Oudin, L., 2003. Sequential assimilation of soil moisture and streamflow data in a conceptual rainfall–runoff model. J. Hydrol.
280 (1–4), 145–161.
Babovic, V., et al., 2001. Neural networks as routine for error updating of numerical models. J. Hydraul. Eng. 127 (3), 181–193.
Bailey, R.T., Baù, D.A., 2010. Assimilating Water Table Elevation Data Into a Catchment Hydrology Modeling Framework to Estimate Hydraulic Conductivity. Colorado State University. Libraries.
Bao, W., et al., 2011. Real-time equivalent conversion correction on river stage forecasting with Manning’s formula. J. Hydrol. Eng. 16 (1), 1–9.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. revised ed Holden-Day, San Francisco.
Brocca, L., et al., 2009. Assimilation of observed soil moisture data in storm rainfall-runoff modeling. J. Hydrol. Eng. 14 (2), 153–165.
Butts, M., et al., 2005. Ensemble-based methods for data assimilation and uncertainty estimation in the FLOODRELIEF project. In: ACTIF International
Conference on Innovation Advances and Implementation of Flood Forecasting Technology.
Chao, Z., et al., 2008. Robust recursive estimation of auto-regressive updating model parameters for real-time flood forecasting. J. Hydrol. 349 (3–4),
376–382.
Chiu, C.-L., 1978. Applications of Kalman Filter to Hydrology, Hydraulics, and Water Resources: Proceedings of AGU Chapman Conference, Held at
University of Pittsburgh, Pittsburgh, Pennsylvania, USA, May 22–24, 1978. American Geophysical Union (USA).
Cho, K.H., et al., 2020. Data assimilation in surface water quality modeling: a review. Water Res. 186, 116307.
Clark, M.P., et al., 2008. Hydrological data assimilation with the ensemble Kalman filter: use of streamflow observations to update states in a distributed
hydrological model. Adv. Water Resour. 31 (10), 1309–1324.
Dorigo, W.A., et al., 2007. A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. Int. J. Appl.
Earth Obs. Geoinf. 9 (2), 165–193.
Drecourt, J.-P., 2003. Kalman Filtering in Hydrological Modeling. DAIHM, Hørsholm, Denmark.
Duan, Q., Sorooshian, S., Gupta, V., 1992. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resour. Res. 28 (4),
1015–1031.
Duffy, C.J., Gelhar, L.W., 1986. A frequency domain analysis of groundwater quality fluctuations: interpretation of field data. Water Resour. Res. 22 (7),
1115–1128.
Eigbe, U., et al., 1998. Kalman filtering in groundwater flow modelling: problems and prospects. Stoch. Hydrol. Hydraul. 12 (1), 15–32.
Entekhabi, D., Nakamura, H., Njoku, E.G., 1994. Solving the inverse problem for soil moisture and temperature profiles by sequential assimilation of
multifrequency remotely sensed observations. IEEE Trans. Geosci. Remote Sens. 32 (2), 438–448.
Eppstein, M.J., Dougherty, D.E., 1996. Simultaneous estimation of transmissivity values and zonation. Water Resour. Res. 32 (11), 3321–3336.
Evensen, G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics.
J. Geophys. Res. Oceans 99 (C5), 10143–10162.
Ferraresi, M., Todini, E., Vignoli, R., 1996. A solution to the inverse problem in groundwater hydrology based on Kalman filtering. J. Hydrol. 175 (1–4),
567–581.
Fread, D.L., 1981. Flood routing: a synopsis of past, present, and future capability. In: Singh, V.P. (Ed.), Proceedings of International Symposium on
Rainfall-Runoff Modeling. Water Resources Publications, Littleton, CO, pp. 521–541.
Galantowicz, J.F., Entekhabi, D., Njoku, E.G., 1999. Tests of sequential data assimilation for retrieving profile soil moisture and temperature from
observed L-band radiobrightness. IEEE Trans. Geosci. Remote Sens. 37 (4), 1860–1870.
Georgakakos, K.P., 1986. A generalized stochastic hydrometeorological model for flood and flash-flood forecasting: 1. Formulation. Water Resour. Res.
22 (13), 2083–2095.
Griessinger, N., et al., 2016. Assessing the benefit of snow data assimilation for runoff modeling in Alpine catchments. Hydrol. Earth Syst. Sci. 20 (9),
3895–3905.
Guang-Te, W., Yu, Y.-S., Kay, W., 1987. Improved flood routing by ARMA modelling and the Kalman filter technique. J. Hydrol. 93 (1–2),
175–190.
Hendricks Franssen, H.-J., et al., 2011. Operational real-time modeling with ensemble Kalman filter of variably saturated subsurface flow including
stream-aquifer interaction and parameter updating. Water Resour. Res. 47 (2).
Hino, M., 1973. Stochastic approach to linear and nonlinear runoff analysis. In: Flood Investigation. vol. II. Asian Institute of Technology.
House, E., et al., 2003. Defra/Environment Agency Flood and Coastal Defence R&D Programme.
Hsu, M.-H., Fu, J.-C., Liu, W.-C., 2006. Dynamic routing model with real-time roughness updating for flood forecasting. J. Hydraul. Eng. 132 (6),
605–619.
Huang, W.-C., 1999. Kalman filter effective to hydrologic routing? J. Mar. Sci. Technol. 7 (1), 65–71.
Husain, T., 1985. Kalman filter estimation model in flood forecasting. Adv. Water Resour. 8 (1), 15–21.
Jukic, D., Denic-Jukic, V., 2004. A frequency domain approach to groundwater recharge estimation in karst. J. Hydrol. 289 (1–4), 95–110.
Kachroo, R., 1992. River flow forecasting. Part 1. A discussion of the principles. J. Hydrol. 133 (1–2), 1–15.
Karl, T.R., Knight, R.W., Plummer, N., 1995. Trends in high-frequency climate variability in the twentieth century. Nature 377 (6546), 217–220.
Kashif Gill, M., Kemblowski, M.W., McKee, M., 2007. Soil moisture data assimilation using support vector machines and ensemble Kalman filter 1.
J. Am. Water Resour. Assoc. 43 (4), 1004–1015.
Katul, G.G., et al., 1993. Estimation of in situ hydraulic conductivity function from nonlinear filtering theory. Water Resour. Res. 29 (4), 1063–1070.
Kay, S.M., 1993. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc.
Khaki, M., Hendricks Franssen, H.-J., Han, S., 2020. Multi-mission satellite remote sensing data for improving land hydrological models via data assimilation. Sci. Rep. 10 (1), 1–23.
Kitanidis, P.K., Bras, R.L., 1980. Real-time forecasting with a conceptual hydrologic model: 1. Analysis of uncertainty. Water Resour. Res. 16 (6),
1025–1033.
Komma, J., Blöschl, G., Reszler, C., 2008. Soil moisture updating by Ensemble Kalman Filtering in real-time flood forecasting. J. Hydrol. 357 (3–4),
228–242.
Krzysztofowicz, R., 2001. The case for probabilistic forecasting in hydrology. J. Hydrol. 249 (1–4), 2–9.
Lee, Y., Singh, V., 1999. Tank model using Kalman filter. J. Hydrol. Eng. 4 (4), 344–349.
Lettenmaier, D.P., Burges, S.J., 1976. Use of state estimation techniques in water resource system modeling 1. J. Am. Water Resour. Assoc. 12 (1), 83–99.
Liu, Y., Gupta, H.V., 2007. Uncertainty in hydrologic modeling: toward an integrated data assimilation framework. Water Resour. Res. 43 (7).
Liu, W.-C., et al., 2010. Dynamic routing modeling for flash flood forecast in river system. Nat. Hazards 52 (3), 519–537.
Madsen, H., Skotner, C., 2005. Adaptive state updating in real-time river flow forecasting—a combined filtering and error forecasting procedure. J. Hydrol.
308 (1–4), 302–312.
Markussen, L.M., 1985. Application of the Kalman filter to real time operation and to uncertainty analyses in hydrological modelling. In: Scientific Procedures Applied to the Planning, Design and Management of Water Resources Systems. vol. 147. International Association of Hydrological Sciences,
Wallingford, Oxfordshire, pp. 273–282.
Maybeck, P., 1979. Stochastic Models, Estimation and Control. vol. 1 Academic Press.
McLaughlin, D., Townley, L.R., 1996. A reassessment of the groundwater inverse problem. Water Resour. Res. 32 (5), 1131–1161.
Misirli, F., et al., 2003. Bayesian recursive estimation of parameter and output uncertainty for watershed models. In: Calibration of Watershed Models.
American Geophysical Union, Washington, DC, pp. 113–124.
Moore, R., 1999. Real-time flood forecasting systems: perspectives and prospects. In: Floods and landslides: Integrated Risk Assessment. Springer,
pp. 147–189.
Moradkhani, H., et al., 2005. Dual state–parameter estimation of hydrological models using ensemble Kalman filter. Adv. Water Resour. 28 (2), 135–147.
Mu, J.-b., Zhang, X.-f., 2007. Real-time flood forecasting method with 1-D unsteady flow model. J. Hydrodynam. 19 (2), 150–154.
Neal, J., et al., 2009. A data assimilation approach to discharge estimation from space. Hydrol. Process. 23 (25), 3641–3649.
O’Connell, P., Clarke, R., 1981. Adaptive hydrological forecasting—a review/Revue des methodes de prevision hydrologique ajustables. Hydrol. Sci. J. 26
(2), 179–205.
Pappenberger, F., et al., 2004. The influence of rating curve uncertainty on flood inundation predictions. In: Flood Risk Assessment, Bath.
Pappenberger, F., et al., 2007. Grasping the unavoidable subjectivity in calibration of flood inundation models: a vulnerability weighted approach.
J. Hydrol. 333 (2–4), 275–287.
Parajka, J., et al., 2006. Assimilating scatterometer soil moisture data into conceptual hydrologic models at the regional scale. Hydrol. Earth Syst. Sci. 10
(3), 353–368.
Paudyal, G.N., 2002. Forecasting and warning of water-related disasters in a complex hydraulic setting—the case of Bangladesh. Hydrol. Sci. J. 47 (S1),
S5–S18.
Puente, C.E., Bras, R.L., 1987. Application of nonlinear filtering in the real time forecasting of river flows. Water Resour. Res. 23 (4), 675–682.
Refsgaard, J.C., 1997. Validation and intercomparison of different updating procedures for real-time forecasting. Hydrol. Res. 28 (2), 65–84.
Reichle, R.H., Entekhabi, D., McLaughlin, D.B., 2001. Downscaling of radio brightness measurements for soil moisture estimation: a four-dimensional
variational data assimilation approach. Water Resour. Res. 37 (9), 2353–2364.
Riyahi, M.M., Rahmanshahi, M., Ranginkaman, M.H., 2018. Frequency domain analysis of transient flow in pipelines; application of the genetic programming to reduce the linearization errors. J. Hydraul. Struct. 4 (1), 75–90.
Rodríguez-Iturbe, I., Mejía, J.M., 1974. The design of rainfall networks in time and space. Water Resour. Res. 10 (4), 713–728.
Rungø, M., Refsgaard, J., Havnø, K., 1989. The updating procedure in the MIKE 11 modelling system for real-time forecasting. In: Proceedings of the
International Symposium for Hydrological Applications of Weather Radar. University of Salford.
Saddagh, M., Abedini, M., 2012. Enhancing MIKE11 updating kernel and evaluating its performance using numerical experiments. J. Hydrol. Eng. 17 (2),
252–261.
Sahoo, A.K., et al., 2013. Assimilation and downscaling of satellite observed soil moisture over the Little River Experimental Watershed in Georgia, USA.
Adv. Water Resour. 52, 19–33.
Schad, A., et al., 2015. Recent developments in helioseismic analysis methods and solar data assimilation. Space Sci. Rev. 196 (1), 221–249.
Sene, K., 2008. Flood Warning, Forecasting and Emergency Response. Springer Science & Business Media.
Seo, D.-J., Koren, V., Cajina, N., 2003. Real-time variational assimilation of hydrologic and hydrometeorological data into operational hydrologic forecasting. J. Hydrometeorol. 4 (3), 627–641.
Shamseldin, A.Y., O’Connor, K.M., 2001. A non-linear neural network technique for updating of river flow forecasts. Hydrol. Earth Syst. Sci. 5 (4),
577–598.
Simon, D., 2006. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. John Wiley & Sons.
Sun, C., et al., 2016. Fuzzy copula model for wind speed correlation and its application in wind curtailment evaluation. Renew. Energy 93, 68–76.
Tiefenbacher, J., 2012. Approaches to Managing Disaster: Assessing Hazards, Emergencies and Disaster Impacts. BoD–Books on Demand.
Troch, P., Paniconi, C., McLaughlin, D., 2003. Catchment-scale hydrological modeling and data assimilation. Adv. Water Resour. 26, 131–135.
Tsonis, A., 2004. Is global warming injecting randomness into the climate system? Eos, Transactions American Geophysical Union 85 (38), 361–364.
Verlaan, M., et al., 2005. Operational storm surge forecasting in the Netherlands: developments in the last decade. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 363 (1831), 1441–1453.
Vrugt, J.A., et al., 2006. Real-time data assimilation for operational ensemble streamflow forecasting. J. Hydrometeorol. 7 (3), 548–565.
Wagener, T., Gupta, H.V., 2005. Model identification for hydrological forecasting under uncertainty. Stoch. Environ. Res. Risk Assess. 19 (6), 378–387.
Walker, J.P., Willgoose, G.R., Kalma, J.D., 2001. One-dimensional soil moisture profile retrieval by assimilation of near-surface observations: a comparison of retrieval algorithms. Adv. Water Resour. 24 (6), 631–650.
Walker, J.P., Willgoose, G.R., Kalma, J.D., 2002. Three-dimensional soil moisture profile retrieval by assimilation of near-surface measurements: Simplified Kalman filter covariance forecasting and field application. Water Resour. Res. 38 (12). 37-1-37-13.
Wang, C.-H., Bai, Y.-L., 2008. Algorithm for real time correction of stream flow concentration based on Kalman filter. J. Hydrol. Eng. 13 (5), 290–296.
Weerts, A., El Serafy, G., 2005. Comparing particle filtering and ensemble Kalman filtering for input correction in rainfall runoff modelling.
Weerts, A.H., El Serafy, G.Y., 2006. Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff
models. Water Resour. Res. 42 (9).
Wöhling, T., Lennartz, F., Zappa, M., 2006. Updating procedure for flood forecasting with conceptual HBV-type models. Hydrol. Earth Syst. Sci. 10 (6),
783–788.
Wood, E.F., O’Connell, P.E., 1985. Real-time forecasting. In: Hydrol Forecast. John Wiley & Sons, pp. 505–558.
Xie, X., Zhang, D., 2010. Data assimilation for distributed hydrological catchment modeling via ensemble Kalman filter. Adv. Water Resour. 33 (6),
678–690.
Xiong, L., O’Connor, K.M., 2002. Comparison of four updating models for real-time river flow forecasting. Hydrol. Sci. J. 47 (4), 621–639.
Xiong, L., O’Connor, K.M., Guo, S., 2004. Comparison of three updating schemes using artificial neural network in flow forecasting. Hydrol. Earth Syst.
Sci. 8 (2), 247–255.
Yaesh, I., Shaked, U., 1991. A transfer function approach to the problems of discrete-time systems: H∞-optimal linear control and filtering. IEEE Trans. Autom. Control 36 (11), 1264–1271.
Young, P.C., 2002. Advances in real–time flood forecasting. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 360 (1796), 1433–1450.
Yu, P.-S., Chen, S.-T., 2005. Updating real-time flood forecasting using a fuzzy rule-based model/Mise à jour de prévision de crue en temps réel grâce à un modèle à base de règles floues. Hydrol. Sci. J. 50 (2).
Zhang, H., et al., 2017. State and parameter estimation of two land surface models using the ensemble Kalman filter and the particle filter. Hydrol. Earth
Syst. Sci. 21 (9), 4927–4958.
Chapter 9
Data reduction techniques
M. Mehdi Bateni (a) and Saeid Eslamian (b, c)
(a) University School for Advanced Studies, Pavia, Italy; (b) Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; (c) Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Short and dense temporal and spatial intervals of data acquisition programs have resulted in dependent observations. In addition to important information, real data contain useless or even confusing information, which can be considered as noise. At the post-data-collection phase, different redundancy reduction algorithms can be used to decrease the redundant information in the data set. These algorithms can be categorized into sample reduction (e.g., clustering) and dimension reduction techniques. In order to determine the reliability of hydrologic design variables derived from these dependent observations, it is necessary to reduce the sample data to an equivalent series of independent observations. Moreover, dimension reduction techniques are useful to handle the heterogeneity and massiveness of data by reducing many-variable data to a manageable size. Dimension reduction techniques can also help reach a more parsimonious model through the selection of the more important features to include. Here, a review of the most common sample reduction and dimension reduction techniques in hydroinformatics is presented.
2. Principal component analysis
Principal component analysis (PCA) is usually utilized to transform a large number of variables into a small number of
orthogonal variables which present common causes of variable changes (Eslamian et al., 2010). It has been developed
to extract uncorrelated components of data which variation is maximized in those directions. PCA can save computation
time since there are less independent variables reconstructed from original data set by PCA. It is a basic robust multivariate
statistical method that does not require normally distributed and uncorrelated variables. PCA is able to remove the data
noise and to cluster the refined samples of similar composition into groups to reveal relationships among their variables.
All variables can be illustrated simultaneously by projecting basis vectors onto the two/three leading PCs.
PCA as a technique became popular following papers by Lorenz in the mid-1950s—who called this technique Empirical
Orthogonal Function (EOF) analysis. Both these names refer to the same set of procedures. However, EOF is the more
popular term in atmospheric sciences. The method was firstly introduced by Pearson (1901), though until the 1950s,
the method had limited applications due to the lack of computational equipment. Since then, it has been widely used in
environmental science.
The main objective of PCA is to look for new variables in the sample which are not correlated with each other. Each of these new variables, or principal components, is a linear combination of the former variables and describes a different source of the total variation. The method sorts the initially correlated data into components that explain successively less of the total variation. Hence, the data can be reduced by trimming off the less important transformed variables, which are the last ones (Fig. 1).
The dataset is typically represented by a matrix of samples or observations which are characterized by many physical,
chemical, and other variables of different magnitude and units. Suppose the data consist of $n$ observations of $p$ variables and are represented as an $n \times p$ matrix, called $X$. Without loss of generality, assume that the variables in the data matrix $X$ are standardized so that each has a zero mean and unit variance. The principal components are
$$t_1 = w_{1,1} x_{1,1} + w_{1,2} x_{1,2} + \ldots + w_{1,p} x_{1,p}$$
$$t_2 = w_{2,1} x_{2,1} + w_{2,2} x_{2,2} + \ldots + w_{2,p} x_{2,p}$$
$$\vdots$$
$$t_n = w_{n,1} x_{n,1} + w_{n,2} x_{n,2} + \ldots + w_{n,p} x_{n,p}$$
FIG. 1 X′ and Y′ are orthogonal directions which describe the most variation of the data and are therefore the first two principal components.
where $w_{i,j}$ and $x_{i,j}$ ($1 \le i \le n$, $1 \le j \le p$) are component loadings and original data, respectively. The component loadings are the contribution measures of a particular variable to the principal components. For each $1 \le i \le n$, it holds that $w_{i,1}^2 + w_{i,2}^2 + \cdots + w_{i,p}^2 = 1$. The variability of the principal components is ordered as $\mathrm{Var}(t_1) > \mathrm{Var}(t_2) > \ldots > \mathrm{Var}(t_n)$. The sum of the eigenvalues of the covariance matrix of the data gives the total variance of the original data matrix $X$.
Singular value decomposition (SVD) is a computational method often employed to efficiently calculate the principal components of a dataset. The SVD of an $n \times p$ matrix $M$ is a factorization into three matrices of the form $USV^T$, where $U$ is an $n \times n$ matrix with mutually perpendicular columns (orthogonal matrix), $S$ is an $n \times p$ rectangular diagonal matrix with nonnegative numbers in decreasing order on the diagonal, and $V$ is a $p \times p$ orthogonal matrix. The diagonal entries of $S$ are known as the singular values ($s_i$) of the original matrix $M$ (Austin, 2009). If we now perform singular value decomposition of the standardized data matrix $X$, we obtain
$$X = USV^T,$$
where $U$ and $V$ are orthogonal matrices and have columns with unit magnitude.a $S$ is the diagonal matrix of singular values $s_i$ of the data matrix. The columns of the matrices $U$ and $V$ contain the (left and right) singular vectors of $X$. The $p \times p$ covariance matrix ($C$) of the data is given by $C = X^T X/(n-1)$. One can easily see that the covariance matrix can be rewritten as follows using the SVD of $X$:
$$C = \frac{X^T X}{n-1} = \frac{\left(USV^T\right)^T \left(USV^T\right)}{n-1} = \frac{V S U^T U S V^T}{n-1} = V \frac{S^2}{n-1} V^T,$$
which can itself be considered as the SVD of the covariance matrix with diagonal matrix $\Lambda = S^2/(n-1)$. In the above equation, $V$ is a matrix of eigenvectors (each column is an eigenvector) and $\Lambda$ is a diagonal matrix with eigenvalues $\lambda_i = s_i^2/(n-1)$ on the diagonal. Since the eigenvectors of the covariance matrix are actually the directions of the axes where there is the most variance, they are called principal components (PCs). The principal component scores are given by the columns of $XV = USV^T V = US$. These components can be seen as new, transformed variables. The $j$th principal component is given by the $j$th column of $XV$. The coordinates of the $i$th data point in the new PC space are given by the $i$th row of $XV$. The eigenvalues are simply the coefficients attached to the eigenvectors, which give the amount of variance carried in each principal component.
To reduce the dimensions of the data from $p$ to $k < p$, select the first $k$ columns of $V$ and the $k \times k$ upper-left part of $S$. Their product, $U_k S_k$, is the $n \times k$ matrix containing the first $k$ principal components. Further multiplying the first $k$ principal components by the corresponding principal axes ($V_k^T$) yields $X_k = (X V_k) V_k^T$. The matrix $X_k$ provides a reconstruction of the original data from the first $k$ principal components and has the original $n \times p$ size but a lower rank of $k$.b Fortunately, there are some efficient implementations of SVD in common programming languages which find just the top $k$ eigenvectors.
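A short NumPy sketch of the SVD route described above is given below; the random data matrix and the choice of k = 2 retained components are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                 # n = 100 observations, p = 5 variables
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize: zero mean, unit variance

U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = s ** 2 / (len(X) - 1)               # eigenvalues of C: lambda_i = s_i^2 / (n - 1)
scores = U * s                                # principal component scores, equals X @ Vt.T

k = 2                                         # keep the first k components
Xk = scores[:, :k] @ Vt[:k, :]                # rank-k reconstruction of X
explained = eigvals[:k].sum() / eigvals.sum() # fraction of total variance retained
print(eigvals, round(explained, 3))
```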
If you look at the spatial patterns of PCs, there is a temptation to ascribe some physical meaning to them, but this is not always a good idea, because the orthogonality constraint on the eigenvectors can mean that the second and third PCs bear no resemblance to the physical mechanisms that drive the data. The first PC represents the most important mode of
a. An orthogonal matrix that has columns with unit magnitude is called an orthonormal matrix.
b. The rank of the matrix refers to the number of linearly independent rows or columns in the matrix. In other words, the rank of a matrix is the dimension of
the vector space generated by its columns/rows.
FIG. 2 A typical scree diagram of eigenvalues from the unreduced correlation matrix, arrow indicates region of curve where slope changes.
variability or physical process, but it may include aspects of other correlated modes and processes. As noted earlier, the variance of each PC is equal to its corresponding eigenvalue. In order to decide how many PCs are required to reduce the data without significant loss of information, a scree diagram can be helpful. A scree diagram is a graph plotting the eigenvalues of the covariance matrix $C$, in decreasing order, against their component number. By assessing the change in the slope of the diagram, one can choose the number of PCs to be used for the data reduction. For example, four or five PCs may be selected in Fig. 2. Moreover, the Kaiser criterion can be used for selecting the number of PCs. It is based on the fact that the more variables load onto a particular component (i.e., have a high correlation with the component), the more important that component is in summarizing the data. Hence, it drops the components for which the eigenvalues are less than 1 (Beavers et al., 2013). Employing the PCA method for data prefiltering can avoid multicollinearity. Multicollinearity is the occurrence of mutual correlations among a set of predictor variables, which can result in unstable regression parameters. When the data vectors are spatial distributions of values at a single time, PCA can filter away much of the small-scale noise with a minimal loss of information. When PCA is applied to a time series which is structured into overlapping moving windows of data, it may reveal oscillatory features in the series. In this case, the eigenvectors represent characteristic time patterns rather than characteristic spatial patterns. PCA might be misguided in the presence of outliers. Moreover, the complexity of PCA for a matrix of size $n \times p$ is $O(p^2 n + p^3)$, which is relatively high.
The description given here is by no means complete. Those who want a more complete description should read Jolliffe
(1986).
3. Singular spectrum analysis
Singular spectrum analysis (SSA) is a data reduction tool based on the PCA concept, usually called extended empirical orthogonal function (EEOF) analysis in the atmospheric sciences. Like PCA, it can be used as a data reduction method. SSA provides the ability to discern "common patterns of variability" shared among multiple datasets in both space and time. The extension may be in space (S-mode) or in time (T-mode). The mathematics is essentially the same as for PCA, and the difference lies in the preprocessing of the data. When the technique is applied to multivariate data (many time series), it is known as multivariate or multichannel singular spectrum analysis (MSSA).
3.1 Univariate singular spectral analysis
In order to explain the implementation of the univariate case of SSA, consider a single time series as a vector $X(t)$, $t = 1, \ldots, n$. Like PCA, eigenvectors and eigenvalues are extracted from the covariance matrix. Yet, the covariance matrix is calculated using a delay window, or by imposing an embedding dimension of length $m$ on the time series. This vector contains the values of the covariance between $X(t)$ and $X(t+k)$ with $k = 0, \ldots, m-1$. Note that we defined $X$ such that it has unit variance; hence, the covariance at lag zero equals one. The idea is thus to compute the covariance between the values $X(t)$ and $X(t+k)$, where $k$ is a delay (or "lag"). That is, using the definition above, if the covariance at lag $k$ is positive, the values $X(t)$ and $X(t+k)$ tend to vary together (Fig. 3).
Hence, the trajectory matrix ($Y$) would be
$$Y = \begin{bmatrix} x_{(1)}^T \\ x_{(2)}^T \\ \vdots \\ x_{(n-3)}^T \\ x_{(n-2)}^T \end{bmatrix}.$$
The number of rows of the matrix $Y$ is $n - m + 1$ and the number of its columns is $m = 3$. $C$ is computed as $C = Y^T Y/(n - m + 1)$, which follows from the definition of covariance. The diagonal of the matrix $C$ contains the variances of each column, which should be close to one in case the data are standardized. The eigenvectors of the covariance matrix ($C$) are the principal components and are called temporal empirical orthogonal functions (EOFs). Hence, the matrix of eigenvectors ($V_C$) can project the embedded time series onto its principal components ($PC = Y V_C$). For example, in the case of $m = 3$, the projection results in a PC matrix with three columns, which are the first three principal components PC1, PC2, and PC3. The first column is PC1, the second column is PC2, etc.
Principal components are a projection of the data in a different coordinate system, and hence their interpretation is different from that of the data series. However, by projecting the PCs back onto the eigenvectors, we obtain time series (referred to as reconstructed components (RCs) in SSA terminology) in the original coordinates. For this, we need to construct a matrix ($Z$) that is, like the matrix $Y$, an embedded time series. There is one such matrix for each one of the principal components. For example, let us construct the matrix $Z$ for the first principal component, PC1 (i.e., for the first column of the matrix $PC$). The first column of $Z$ is simply PC1. The second column is PC1 at time $t - 1$. The third column is PC1 at time $t - 2$. Note that, again, zeros have been put at the beginning, where data are not available. RC1 (the first reconstructed component) is derived by multiplying $Z$ by the first eigenvector of the covariance matrix and dividing by $m$: $RC1 = Z V_C(:,1)/m$. For the other RCs, a similar method can be adopted using the other PCs.
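The following sketch assembles the trajectory matrix, the lag-covariance matrix, the temporal EOFs, and the first reconstructed component for a synthetic noisy sine series; the series, the window length m = 3, and the zero-padding convention follow the description above but are otherwise illustrative assumptions.

```python
import numpy as np

def ssa_rc(x, m):
    """Univariate SSA sketch: trajectory matrix, temporal EOFs, and first reconstructed component."""
    x = (x - x.mean()) / x.std()                                # standardize the series
    n = len(x)
    Y = np.column_stack([x[i:n - m + 1 + i] for i in range(m)])  # (n-m+1) x m trajectory matrix
    C = Y.T @ Y / (n - m + 1)                                    # lag-covariance matrix
    eigvals, V = np.linalg.eigh(C)
    V = V[:, np.argsort(eigvals)[::-1]]                          # sort EOFs by decreasing variance
    PC = Y @ V                                                   # principal components
    # Reconstruct the first component: embed PC1 with lags 0..m-1 (zeros where unavailable)
    Z = np.zeros((n - m + 1, m))
    for j in range(m):
        Z[j:, j] = PC[:n - m + 1 - j, 0]
    RC1 = Z @ V[:, 0] / m
    return PC, RC1

t = np.arange(200)
x = np.sin(2 * np.pi * t / 25) + 0.2 * np.random.default_rng(1).normal(size=200)
PC, RC1 = ssa_rc(x, m=3)
print(PC.shape, RC1.shape)
```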
FIG. 3 Imposing an embedding dimension of length m = 3 on the data time series.
3.2 Multivariate singular spectral analysis
This section describes the differences between the univariate SSA and its multivariate (or multichannel) extension, MSSA. We go through the same steps as in the previous section and clarify the differences. The procedure for the multivariate case is quite close to what is done for the univariate case; nevertheless, some modifications are needed. As in the univariate case, an essential and necessary step is to standardize the multiple time series separately. Then, consider the multiple time series as multiple columns of the data matrix. The trajectory matrix in this case consists of multiple lags of each variable, one after another. For example, when there are two variables with embedding length $m = 2$, the first column is the first variable, the second column is the first variable with a one-step lag, the third column is the second variable, and finally the fourth column is the second variable with a one-step lag. The covariance matrix ($C$) of the trajectory matrix has $l \cdot m$ columns, where $l$ is the number of variables in the time series. Using SVD, $l \cdot m$ eigenvectors of $C$ can be extracted. The first $m$ eigenvectors belong to the first variable, and so on. Once the eigenvectors have been determined, the principal components are computed as $PC = Y V_C$, where $Y$ is the trajectory matrix and $V_C$ is the matrix of eigenvectors of $C$. Unlike the EOFs or the matrices $Y$ and $C$, we can no longer identify a part that corresponds to each separate time series; in other words, each PC contains characteristics of all time series. Analogously to univariate SSA, each time series can be reconstructed by projecting the PCs back onto the eigenvectors. As before, we embed each PC with delays $0, \ldots, m-1$, which yields a matrix $Z$ of size $n \times m$ with the same structure. In order to reconstruct the first time series, we use the first $m$ rows of $V_C$ ($RC_{x1}(:,i) = Z V_C(1{:}m, i)/m$), for the second time series the second $m$ rows ($RC_{x2}(:,j) = Z V_C(m+1{:}2m, j)/m$), and so on.
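A minimal sketch of the multichannel trajectory matrix and its eigen-decomposition for two synthetic series with embedding length m = 2 is shown below; the series themselves are invented placeholders.

```python
import numpy as np

def mssa_trajectory(series, m):
    """Multichannel trajectory matrix: for each variable, m lagged copies side by side."""
    cols = []
    for x in series:                                   # each x is one time series
        x = (x - x.mean()) / x.std()                   # standardize each series separately
        n = len(x)
        cols += [x[i:n - m + 1 + i] for i in range(m)] # variable, then its lags
    return np.column_stack(cols)

rng = np.random.default_rng(3)
x1 = np.sin(np.arange(150) / 10) + 0.1 * rng.normal(size=150)
x2 = np.cos(np.arange(150) / 10) + 0.1 * rng.normal(size=150)
Y = mssa_trajectory([x1, x2], m=2)                     # (n-m+1) x (l*m) trajectory matrix
C = Y.T @ Y / Y.shape[0]                               # (l*m) x (l*m) covariance matrix
eigvals, V = np.linalg.eigh(C)
PC = Y @ V[:, ::-1]                                    # principal components, leading first
print(Y.shape, C.shape, PC.shape)
```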
4. Canonical correlation analysis
Canonical correlation analysis (CCA) is a linear dimension reduction method, applied to pairs of multidimensional random
variables. It has found applications in many areas of earth science including regional flood frequency analysis, statistical
downscaling of general circulation models, and forecast of long-range temperature and precipitation (Wikle, 2003).
Proposed by Hotelling (1936), CCA can be seen as the problem of finding basis vectors for two sets of variables such
that the mutual information between the projections of the variables onto these basis vectors are mutually maximized
(Borga and Knutsson, 1998).
CCA could be considered as a dimension reduction method as well as a classification method. The standard formulation
of CCA assumes that the dimensions of the reduced space, i.e., the number of canonical components, are known a priori. For
the case when the number of canonical components is not known, refer to Tripathi and Govindaraju (2010).
Let us assume that the matrix $X$ contains observations of one set of variables while the matrix $Y$ contains relevant observations of a different set of variables. Typically, the data are time series of the observations of the two fields, which may be observed at the same time (coupled variability) or may be lagged in time (statistical prediction). Suppose that $X$ and $Y$ are standardized so that each column has a zero mean and unit variance. Then, the variance/covariance matrices can be estimated as
$$C_{XY} = C_{YX}^T = E\!\left[XY^T\right], \qquad C_{XX} = E\!\left[XX^T\right], \qquad C_{YY} = E\!\left[YY^T\right].$$
The standardized data, $X$ and $Y$, are transformed into sets of new variables (canonical variates), $V = A^T X$ and $W = B^T Y$, where $A$ and $B$ are linear weights, called canonical vectors. The number of pairs of canonical variates is $k = \min(\dim(X), \dim(Y))$. $A$ and $B$ are chosen such that
1. $\mathrm{corr}[v_1, w_1] \ge \mathrm{corr}[v_2, w_2] \ge \ldots \ge \mathrm{corr}[v_k, w_k] \ge 0$ (each of the $k$ pairs of canonical variates exhibits no greater correlation than the previous pair)
2. $\mathrm{corr}[v_i, w_j] = r_C(i)$ for $i = j$ and $\mathrm{corr}[v_i, w_j] = 0$ for $i \ne j$, where $r_C$ denotes the canonical correlation coefficients (each canonical variate is uncorrelated with all other variates except its twin)
In order to calculate the canonical vectors and variates, the matrix $K$ is constructed as
$$K = C_{XX}^{-1/2} C_{XY} C_{YY}^{-1/2}$$
where the power of $-1/2$ denotes the inverse of the square root of the matrix. Let $k$ denote the number of nonzero eigenvalues of $K$. If we now perform singular value decomposition on $K$, we obtain
$$K = Α S Β^T,$$
where the columns of $Α$ and $Β$, i.e., $α_1, \ldots, α_k$ and $β_1, \ldots, β_k$, are orthogonal and $S$ is the diagonal matrix of singular values ($s_i$) of $K$. The elements of the canonical vectors, i.e., $a_i$ and $b_i$, are
$$a_i = C_{XX}^{-1/2} α_i, \qquad b_i = C_{YY}^{-1/2} β_i, \qquad i = 1, \ldots, k,$$
and the canonical correlations are the singular values, $r_C(i) = s_i$.
Note that no distinction is made between the two fields X and Y; each can act interchangeably as predictors or
predictands.
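A brief NumPy sketch of the computation just outlined (inverse square roots of the covariance matrices, SVD of K, and back-transformation to the canonical vectors) is given below; the two synthetic data fields sharing one common signal are illustrative assumptions.

```python
import numpy as np

def cca(X, Y):
    """CCA via SVD of K = Cxx^{-1/2} Cxy Cyy^{-1/2} (X, Y already standardized)."""
    n = len(X)
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    def inv_sqrt(C):
        w, U = np.linalg.eigh(C)                  # symmetric eigendecomposition
        return U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    Alpha, s, Beta_t = np.linalg.svd(K)
    A = inv_sqrt(Cxx) @ Alpha                     # canonical vectors for X
    B = inv_sqrt(Cyy) @ Beta_t.T                  # canonical vectors for Y
    return A, B, s                                # s holds the canonical correlations

rng = np.random.default_rng(7)
common = rng.normal(size=(300, 1))                # shared signal between the two fields
X = np.hstack([common + 0.5 * rng.normal(size=(300, 1)), rng.normal(size=(300, 2))])
Y = np.hstack([common + 0.5 * rng.normal(size=(300, 1)), rng.normal(size=(300, 1))])
X = (X - X.mean(0)) / X.std(0); Y = (Y - Y.mean(0)) / Y.std(0)
A, B, r = cca(X, Y)
print(r)                                          # leading canonical correlation is the largest
```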
5. Factor analysis
Factor analysis (FA) is a linear method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors (Wallis, 1968). Like PCA, FA is another data reduction technique which allows you to capture the variance in the variables in a smaller set. Moreover, FA can be used as a numerical procedure for screening variables and helps build more effective regression equations. Despite all the similarities, there is a fundamental difference between PCA and FA: the former is a linear combination of variables, while the latter is a measurement model of latent variables/dimensions. Only if the unique factors are small (have close to zero variance) does FA give the same results as PCA.
Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and group items that
are part of unified concepts. The researcher makes no a priori assumptions about relationships among factors. In contrast, confirmatory factor analysis (CFA) tests the hypothesis that the items are associated with specific factors. CFA
uses structural equation modeling to test a measurement model whereby loadings on the factors allow for evaluation of
relationships between observed and unobserved variables. Structural equation modeling approaches can accommodate
measurement error, and are less restrictive than least-squares estimation. Hypothesized models are tested against actual
data, and the analysis would demonstrate loadings of observed variables on factors, as well as the correlation
between them.
5.1 Principal axis factoring
PCA and principal axis factoring (PAF) are used for factor extraction. PAF seeks factors which have the highest canonical correlation with the observed variables. It is unaffected by arbitrary rescaling of the data. Here, we explain factor extraction based on PAF.
Consider $k$ independent variables $x_1, \ldots, x_k$ and observed data for each of these variables. Our objective is to identify $m$ factors $y_1, \ldots, y_m$, preferably with $m \le k$ as small as possible, that explain the observed data more succinctly. Let $X = [x_i]$ be a random $k \times 1$ column vector where each $x_i$ represents a sample (observable trait), and let $Μ = [m_i]$ be the $k \times 1$ column vector of the population means; thus $E[x_i] = m_i$. Let $Y = [y_i]$ be an $m \times 1$ vector of unobserved common factors, where $m \le k$. These factors play a role similar to the principal components in PCA.
Suppose that each $x_i$ can be represented as a linear combination of the independent factors as follows:
$$x_i = b_{i0} + \sum_{j=1}^{m} b_{ij} y_j + e_i$$
where the coefficient $b_{ij}$ is called the loading of the $i$th variable on the $j$th factor, and the errors ($e_i$) are noises which are not explained by the linear relationship. It is assumed that the factors are independent of each other and of the noises, with zero mean and unit variance. Another assumption is that the mean of each $e_i$ is zero and that they are independent of each other.
We can consider the above equation to be a series of regression equations. Let $Β = [b_{ij}]$ be the $k \times m$ matrix of loading factors and let $Ε = [e_i]$ be the $k \times 1$ column vector of noises. Hence,
$$m_i = E[x_i] = E\!\left[b_{i0} + \sum_{j=1}^{m} b_{ij} y_j + e_i\right] = E[b_{i0}] + \sum_{j=1}^{m} b_{ij} E[y_j] + E[e_i] = b_{i0} + 0 + 0 = b_{i0}.$$
So, the above regression equations can be expressed as
$$x_i = m_i + \sum_{j=1}^{m} b_{ij} y_j + e_i$$
or, equivalently,
$$X = Μ + ΒY + Ε.$$
From the assumptions stated above it also follows that
$$E[x_i] = m_i \ \text{for all } i, \qquad E[e_i] = 0 \ \text{for all } i,$$
$$\mathrm{cov}(y_i, y_j) = 0 \ \text{if } i \ne j, \qquad \mathrm{cov}(e_i, e_j) = 0 \ \text{if } i \ne j, \qquad \mathrm{cov}(y_i, e_j) = 0 \ \text{for all } i, j.$$
Defining the diagonal variance matrix of the noises as $f = E[ΕΕ^T]$, the covariance matrix of $X$, which is a $k \times k$ matrix, has the form
$$S = E\!\left[(X - Μ)(X - Μ)^T\right] = E\!\left[(ΒY + Ε)(ΒY + Ε)^T\right] = Β E\!\left[YY^T\right] Β^T + Β E\!\left[YΕ^T\right] + E\!\left[ΕY^T\right] Β^T + E\!\left[ΕΕ^T\right] = Β I Β^T + 0 + 0 + f = ΒΒ^T + f.$$
The principal-axis method proceeds according to the following steps:
(1) Estimate Φ from the communalities, as discussed below.
(2) Find L and V, the eigenvalues and eigenvectors of S − Φ, using SVD (a more detailed description of the SVD method can be
found in the section “Principal Component Analysis (PCA)”).
(3) Calculate the loading matrix as

$$B = VL^{1/2}.$$

One can reduce the data by trimming out the smaller eigenvalues; hence B can be estimated as $V_{tr}L_{tr}^{1/2}$.
A new Φ matrix is estimated from the current loading matrix, and steps 1–3 are iterated until Φ converges.
(4) Calculate the factor scores as

$$F = ZVL^{-1/2} \quad \text{or} \quad F = ZV_{tr}L_{tr}^{-1/2}.$$
We close this section with a discussion of obtaining an initial value of Φ. We can use the initial estimate of Cureton and
D’Agostino (1993), which is outlined here. The initial communality estimates cii are calculated from the correlation
and inverse correlation matrices as follows:

$$c_{ii} = \left(1 - \frac{1}{s^{ii}}\right)\frac{\displaystyle\sum_{k=1}^{p}\max_{j \neq k} s_{jk}}{\displaystyle\sum_{k=1}^{p}\left(1 - \frac{1}{R_{kk}}\right)}$$

where $s^{ii}$ is the ith diagonal element of $S^{-1}$ and $s_{jk}$ is an element of S. The corresponding diagonal element of Φ is then estimated as $1 - c_{ii}$.
Like PCA, the Kaiser criterion can be used for determining the number of factors: keep the factors whose eigenvalues are greater than unity.
The scree diagram is also useful for determining the number of factors; the number of factors to keep is indicated by the point where the curve
makes an elbow and flattens out.
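To make the iteration above concrete, the following minimal NumPy sketch runs principal axis factoring on a correlation matrix. The function name, the use of squared multiple correlations as starting communalities (instead of the Cureton–D’Agostino estimate), and the synthetic data are illustrative assumptions, not part of the original description.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=50, tol=1e-6):
    """Iterative principal axis factoring on a p x p correlation matrix R (a sketch)."""
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1) (an assumption).
    communality = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, communality)            # replace the unit diagonal: S - Phi
        eigval, eigvec = np.linalg.eigh(Rr)          # symmetric eigendecomposition
        idx = np.argsort(eigval)[::-1][:n_factors]   # keep the largest eigenvalues
        L = np.clip(eigval[idx], 0.0, None)
        V = eigvec[:, idx]
        B = V * np.sqrt(L)                           # loading matrix B = V L^(1/2)
        new_comm = np.sum(B ** 2, axis=1)            # updated communalities from the loadings
        if np.max(np.abs(new_comm - communality)) < tol:
            communality = new_comm
            break
        communality = new_comm
    return B, communality

# Example on random standardized data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)
B, h2 = principal_axis_factoring(R, n_factors=2)
print(B.round(3), h2.round(3))
```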
6. Random projection
Random projection is a simple linear technique used to reduce the dimensions of a set of points lying in Euclidean
space. It is especially useful in conjunction with another technique (e.g., PCA or clustering): random projection first
reduces the dimensions from thousands to hundreds, and then PCA, clustering, or another reduction technique reduces the
dimensions further. This scheme is all the more attractive because the time complexity of random projection is far lower
than that of PCA and many other data reduction methods.
The theory behind the efficiency of random projection is presented in Johnson-Lindenstrauss lemma. The lemma states
that a small set of points in a high-dimensional space can be embedded into a space of much lower dimensions in such a way
that distances between the points are nearly preserved. Hence, it is powerful when its results are used for discriminative
models.
In order to benefit from the lemma, an orthonormal random matrix R is needed to transform the data from a high-dimensional
space into a space with lower dimensions. If the original data matrix has n × p dimensions, then
$X^{RP}_{n \times d} = X_{n \times p} R_{p \times d}$ is the
projection of the data onto a lower d-dimensional subspace. The computational cost of random projection is of order
O(pdn). If the data matrix X is sparse with c nonzero entries per column, the complexity of this operation is
of order O(cdn).
The orthonormal random matrix R can be generated using different methods. Two common ways to build R are the
Gaussian random matrix and the sparse random matrix. For Gaussian random projection, the elements of the randomly
generated matrix are drawn from a zero-mean normal distribution with variance equal to 1/d.
The sparse random projection uses a sparse random matrix that guarantees similar embedding quality while being much
more memory efficient and allowing faster computation of the projected data. If we define s = 1/density, the elements of the
random matrix are drawn from

$$R_{ij} = \begin{cases} +\sqrt{s/d} & \text{with probability } 1/(2s) \\ 0 & \text{with probability } 1 - 1/s \\ -\sqrt{s/d} & \text{with probability } 1/(2s) \end{cases}$$
Li et al. (2006) recommended setting the density parameter to 1/√p. Achlioptas (2001) proposed a simpler alternative,
which is commonly implemented in software packages:

$$R_{ij} = \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6 \end{cases}$$

It is worth mentioning that for both the Gaussian and sparse methods the projection matrix is not an exactly orthogonal
matrix. However, it has been shown that in high-dimensional spaces it is close enough to an orthogonal matrix to guarantee
the embedding quality.
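Both generation schemes are available in scikit-learn, which allows a quick sketch of the technique; the array shapes and parameter values below are arbitrary illustrations.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection, SparseRandomProjection

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5000))          # 500 points in a 5000-dimensional space

# Gaussian random projection: entries drawn from N(0, 1/d)
grp = GaussianRandomProjection(n_components=300, random_state=42)
X_gauss = grp.fit_transform(X)

# Sparse random projection; density='auto' uses the Li et al. recommendation 1/sqrt(p)
srp = SparseRandomProjection(n_components=300, density='auto', random_state=42)
X_sparse = srp.fit_transform(X)

print(X_gauss.shape, X_sparse.shape)      # (500, 300) for both
```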
It is common to use random projection in a Monte Carlo approach and aggregate multiple runs of randomly projected
data with the Expectation Maximization (EM) clustering technique. In this approach, the frequency of similarity-measure values
between pairs of data points is the criterion used to define the clusters. The probability that data point i belongs to cluster l under
model θ is p(l | i, θ), l = 1, …, k. Hence, the probability that data points i and j belong to the same cluster under model θ is

$$p^{\theta}_{ij} = \sum_{l=1}^{k} p(l \mid i, \theta)\; p(l \mid j, \theta).$$

To aggregate multiple clustering results, the values of $p^{\theta}_{ij}$ are averaged across multiple runs to
obtain an estimate of the probability that data points i and j belong to the same cluster (pij). The pij values are expected to be
large when data points i and j come from the same natural cluster and small otherwise. In order to produce the final clusters
from the aggregated similarity matrix P, whose elements are the pij, an agglomerative clustering procedure is adopted.
The similarity measures can be different kinds of distances, including Euclidean, cosine, Jaccard, Manhattan, Minkowski,
and Chebyshev. The distances are illustrated in Fig. 4.
The Euclidean measure is only recommended when the data is low dimensional and the straightforward distance between
data points is enough to gauge their similarity. The cosine measure is one of the most commonly used metrics; it measures
the similarity between two vectors/data points by calculating the cosine of the angle between them. It works well
with high-dimensional data and should ideally be used for such data. Jaccard similarity emphasizes the similarity between
two finite sample sets; it is defined as the size of the intersection of the sets divided by the size of their union.
Unfortunately, the Jaccard measure is highly dependent on the size of the data: for large datasets the union can increase
substantially while the intersection stays low, which significantly affects the similarity. When discrete/binary
attributes are present in the dataset, the Manhattan metric is more effective; however, it does not represent an optimal distance
for floating-point attributes. For high-dimensional data it works better than Euclidean, but it is still not the best option
performance-wise. Minkowski generalizes the other distance metrics such as Euclidean, Manhattan, and Chebyshev; it is also
called the p-norm because it adds a parameter p that allows different distance measures to be calculated. Chebyshev distance is
defined as the maximum difference between two vectors among all coordinate dimensions; in other words, it is simply the
maximum distance along any single axis. This metric is usually used for logistical problems and cannot be applied to
general-purpose problems the way Euclidean can. For more information on similarity metrics, refer to Cha (2007).
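For readers who want to compute these measures directly, SciPy exposes most of them; the vectors below are arbitrary examples, and note that SciPy returns cosine and Jaccard dissimilarities, which are converted to similarities here.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 0.0, 2.0, 3.0])
v = np.array([2.0, 1.0, 0.0, 3.0])

print(distance.euclidean(u, v))        # straight-line distance
print(1 - distance.cosine(u, v))       # cosine similarity (SciPy returns cosine distance)
print(distance.cityblock(u, v))        # Manhattan (L1)
print(distance.minkowski(u, v, p=3))   # Minkowski with p = 3
print(distance.chebyshev(u, v))        # maximum coordinate-wise difference

# Jaccard similarity of two binary attribute vectors
a = np.array([1, 1, 0, 1], dtype=bool)
b = np.array([1, 0, 0, 1], dtype=bool)
print(1 - distance.jaccard(a, b))      # SciPy returns the Jaccard dissimilarity
```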
FIG. 4 Similarity measures commonly used in EM clustering technique.
FIG. 5 Difference between Euclidean and geodesic distance between two sample points.
7. Isometric mapping
Isometric mapping (ISOMap) is a nonlinear way to reduce dimensions while preserving local structures. It uses geodesic
distance instead of Euclidean distance. Geodesic distance is the distance between two points following the path available/
possible between the two points whereas Euclidean distance doesn’t have a path constraint to follow (Varini et al., 2006).
As it is shown in Fig. 5, according to Euclidean distance, the two points appear deceptively close, while they are on the
opposite parts of the horseshoe. This highlights the fact that Euclidean distance could be misleading when working with
nonlinearly dependent data.
The geodesic distance between each pair of points is calculated using the Dijkstra or Floyd-Warshall algorithm. Given a source point,
Dijkstra’s algorithm finds the shortest path from that source to all other points, while the Floyd-Warshall algorithm computes the
shortest paths between all pairs of points. The time complexity of Dijkstra’s algorithm is much lower than that of the Floyd-Warshall
algorithm: on a graph of n data points with E edges, Dijkstra’s algorithm runs in O(E log n), whereas Floyd-Warshall has a time
complexity of O(n³).
Using the geodesic distances calculated between points, the dissimilarity matrix is formed. After squaring the
matrix, it is transformed so that the mean of every row and every column is zero. Finally, eigendecomposition of the transformed
matrix is performed and the first d eigenvectors are chosen (d is the reduced size). This is similar to what we do in PCA after
calculating the correlation matrix.
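A brief sketch of this procedure using the Isomap implementation in scikit-learn is shown below on a synthetic curved manifold; the neighborhood size and target dimension are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# A curved 2-D manifold embedded in 3-D, where Euclidean distance is misleading
X, _ = make_s_curve(n_samples=1000, random_state=0)

# ISOMap: geodesic distances over a k-nearest-neighbor graph, then eigendecomposition
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print(X_2d.shape)          # (1000, 2): the unrolled low-dimensional representation
```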
8. Self-organizing maps
Based on ideas first introduced by Von der Malsburg (1973), Kohonen (1982) described self-organizing maps (SOMs) in a
publication entitled “Self-organized formation of topologically correct feature maps.” He proposed a new algorithm aimed
at providing a representation in a smaller space while preserving the initial topology. When the data forms a
curved line or surface in input space, PCA does not perform well. In this case, a SOM overcomes the approximation
problem by virtue of its topological ordering property. It provides a discrete approximation of so-called principal
curves or principal surfaces and may therefore be viewed as a nonlinear generalization of PCA. SOMs have many real-world
applications in water science, including satellite remote sensing and discovering correlations and patterns
in hydro-climate data.
An SOM is suitable for extracting information from large datasets consisting of numerous sample units and variables in
different scales. In general, conventional multivariate analyses are not suitable to extract information from such large and
complex datasets. In case of dimension reduction by PCA, for instance, a large dataset with a large number of variables
would produce a large number of significant principal components. Therefore, a few principal components may not be
sufficient to address overall variation in the large multidimensional datasets (Melssen et al., 1993).
SOM is a neural network algorithm using unsupervised competitive learning which could be used as a dimension
reduction method. Competitive learning is a form of unsupervised learning, in which nodes of the neural network compete
for the right to respond to a subset of input data. The goal of learning in the SOM is to cause different parts of the network to
respond similarly to certain input patterns.
In SOM networks, the input layer feeds the hidden layer. The hidden layer is basically a lattice of neurons, usually
called the Kohonen layer. In the training procedure, neurons in the Kohonen layer accept and respond to a set of input signals
(Fig. 6). The responses are then compared and a winning neuron is selected from the lattice. The selected neuron is activated together
with its neighborhood neurons, and an adaptive process changes the weights to more closely resemble the inputs. The network must
be fed a large number of example vectors that represent, as closely as possible, the kinds of vectors expected during mapping.
FIG. 7 The red limited domain shows neighborhood of neuron i in Kohonen layer.
FIG. 6 Basic structure of the SOM neural network.
Each hidden-layer neuron has several neighbor neurons. In order to define the neighborhood mathematically, a
neighborhood function is deployed. The neighborhood function φ(i, k) indicates how closely any pair of
neurons i and k in the Kohonen layer are connected to each other. φ should be symmetric about the neuron and monotonically
decreasing with distance from it (Fig. 7). Usually, a Gaussian function of the distance between the two neurons in the layer is used:

$$\varphi(i, k) = \exp\!\left(-\frac{d_{i,k}^{2}}{2r^{2}}\right)$$

where $d_{i,k}$ is the Euclidean distance between node i and its neighboring neuron k, and r is the radius. The function is
maximal at neuron i and decreases monotonically outward.
The stages of the SOM algorithm can be summarized as follows (a minimal sketch is given after the list):
(1) Initialization: Randomly initialize the weight vectors Wj = [wj1, wj2, …, wjn] for all neurons in the Kohonen layer. The other
option is to sample them evenly from the subspace spanned by the two largest principal component eigenvectors.
(2) Sampling: Draw a sample X = [x1, x2, x3, …, xn] from the input.
(3) Similarity matching: For each node j, compute $D_j = \sum_i (w_{ij} - x_i)^2$ and find the index j for which Dj is minimum (the winner
neuron).
(4) Updating/learning: Update the winner so that it becomes more like X, together with the winner’s neighbors k:
$W_j := W_j(\text{old}) + \eta\,\varphi(j, k)\,\big(X - W_j(\text{old})\big)$.
(5) Continuation: Keep returning to step 2 until the map stops changing (i.e., there are no noticeable changes in the weights).
This process is usually reiterated over all available input samples. The learning rate (η) and radius (r) may be decreased
during the continuation stage.c The learning rate varies in the [0, 1] interval and must converge to 0 so as to guarantee convergence and stability of the SOM.
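A minimal NumPy sketch of steps 1–5, with exponentially decaying learning rate and radius as described in footnote c, is given below; the lattice size, decay schedules, and synthetic input data are illustrative assumptions.

```python
import numpy as np

def train_som(X, grid=(8, 8), n_epochs=20, eta0=0.5, r0=3.0):
    """Minimal SOM: a lattice of prototype vectors trained by competitive learning."""
    rng = np.random.default_rng(0)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))                    # step 1: random weights
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    for epoch in range(n_epochs):
        eta = eta0 * np.exp(-epoch / n_epochs)                   # decaying learning rate
        r = r0 * np.exp(-epoch / n_epochs)                       # decaying neighborhood radius
        for x in rng.permutation(X):                             # step 2: sample the inputs
            d = np.sum((W - x) ** 2, axis=1)                     # step 3: distance to every neuron
            winner = np.argmin(d)                                # winning neuron
            grid_dist2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            phi = np.exp(-grid_dist2 / (2 * r ** 2))             # Gaussian neighborhood function
            W += eta * phi[:, None] * (x - W)                    # step 4: update winner and neighbors
    return W

rng = np.random.default_rng(1)
X = rng.random((300, 4))      # 300 samples, 4 scaled variables (illustrative data)
W = train_som(X)
print(W.shape)                # (64, 4): one prototype vector per lattice neuron
```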
9. Discriminant analysis
The perspective on data reduction in this section is rather different from that of the previous sections. Up to this point, it was
assumed that we have no measure of data labels, groups, or strata: the sample data were reduced with unsupervised procedures
that produced as many groups as were requested. In this section, the number of groups k is known. In addition, the group
label of each observation has been recorded without any uncertainty.
Discriminant analysis is a well-known and widely used linear data reduction technique. It is limited by the fact
that all predictors must be continuous and that a parametric Gaussian assumption must be formulated, possibly after
transformation. However, it also works well with a mix of continuous (without parametric assumptions) and categorical
measurements. Linear discriminant analysis (LDA) is a generalization of Fisher’s linear discriminant, a method used in
statistics to maximize the ratio of the between-class variance to the within-class variance. Maximizing that ratio
guarantees maximum class separability through the transformation of features into a lower dimensional space.
c. The decrease from the initial value of the learning rate (η0) to 0 can be done linearly or exponentially. Usually, the decrease from the initial value of the
neighborhood radius (r0) is exponential ($r_n = r_0 e^{-n/\text{const}}$).
LDA seeks directions that are efficient for discriminating data, whereas PCA seeks directions that are efficient for
representing data. Fig. 8 shows how choosing an appropriate projection direction maximizes separability
among classes.
In order to explain the algorithm of LDA, assume that there are c classes, with class i containing ni samples, i = 1,
2, …, c, and let n be the total number of samples ($n = \sum_{i=1}^{c} n_i$). mi is the mean of the ith class, and m is the mean of the whole
dataset, $m = \frac{1}{c}\sum_{i=1}^{c} m_i$. The within-class scatter matrix is

$$S_W = \sum_{i=1}^{c}\sum_{j=1}^{n_i}(x_{ij} - m_i)(x_{ij} - m_i)^{T}$$

and the between-class scatter matrix is

$$S_b = \sum_{i=1}^{c}(m_i - m)(m_i - m)^{T}.$$

Suppose the desired projection transformation is Y = XUlda, where Ulda is the orthonormal projection matrix of LDA and Y is
the transformed data. According to the definition of LDA, it leads to maximum class separability. The class separability is
defined as the ratio of the norm of the between-class scatter matrix of the transformed data, $\bar{S}_b$, to the norm of its
within-class scatter matrix, $\bar{S}_W$. Hence, the problem is to maximize the ratio

$$\frac{\lvert \bar{S}_b \rvert}{\lvert \bar{S}_W \rvert} = \frac{\lvert U_{lda}^{T} S_b U_{lda} \rvert}{\lvert U_{lda}^{T} S_W U_{lda} \rvert},$$

which is called Fisher’s criterion. It has been shown that Ulda is in fact the solution of the eigensystem problem
$S_b U_{lda} - S_W U_{lda}\Lambda = 0$, where Λ is the diagonal matrix of eigenvalues. Multiplying both sides by the inverse of SW
yields $(S_W^{-1} S_b)\,U_{lda} = U_{lda}\Lambda$. If SW is a nonsingular matrix, then Fisher’s criterion is maximized when the
projection matrix Ulda is composed of the eigenvectors of $S_W^{-1} S_b$ with at most (c − 1) nonzero corresponding eigenvalues
(since there are only c points to estimate Sb). Hence, the four steps for performing LDA can be listed as follows (a minimal sketch follows the list):
(1) Compute the p-dimensional mean vectors of the different classes from the dataset.
(2) Compute the scatter matrices (the between-class and within-class scatter matrices).
(3) Sort the eigenvectors of $S_W^{-1} S_b$ by decreasing eigenvalue and choose the d eigenvectors corresponding to the d largest
eigenvalues to form the p × d matrix Ulda, in which every column represents an eigenvector.
(4) Use the p × d eigenvector matrix to transform the samples into the new subspace. This can be summarized by the matrix
multiplication Y = XUlda, where X is the n × p matrix representing the n samples and Y is the transformed
n × d matrix of samples in the new subspace.
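The following NumPy sketch mirrors these four steps using the scatter-matrix definitions given above; the two-class synthetic dataset and the function name are illustrative only, and an ill-conditioned SW would require one of the remedies discussed next.

```python
import numpy as np

def lda_projection(X, y, d):
    """Project X (n x p) onto the d most discriminative directions of S_W^-1 S_b (a sketch)."""
    classes = np.unique(y)
    p = X.shape[1]
    means = [X[y == c].mean(axis=0) for c in classes]   # step 1: class mean vectors
    m = np.mean(means, axis=0)                          # mean of the class means, as in the text
    S_W = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for c, mc in zip(classes, means):                   # step 2: scatter matrices
        Xc = X[y == c]
        S_W += (Xc - mc).T @ (Xc - mc)
        diff = (mc - m)[:, None]
        S_b += diff @ diff.T
    eigval, eigvec = np.linalg.eig(np.linalg.inv(S_W) @ S_b)
    order = np.argsort(eigval.real)[::-1][:d]           # step 3: d largest eigenvalues
    U = eigvec[:, order].real                           # p x d projection matrix U_lda
    return X @ U                                        # step 4: transformed n x d data

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
Y = lda_projection(X, y, d=1)
print(Y.shape)        # (100, 1)
```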
Like other linear methods (e.g., PCA, FA), LDA is easily applicable and has an analytical solution. However, if the number
of variables is much higher than the number of samples in the data matrix, LDA will be unable to find the lower-dimensional
space because the within-class scatter matrix is singular. This is known as the small sample size (SSS) problem. Different
approaches have been proposed to solve it. The first is to remove the null space of the
within-class matrix (Chen et al., 2000). The second utilizes conversion to an intermediate subspace using
another dimension reduction method, e.g., PCA: first, PCA is applied to reduce the dimensions of the data,
and then LDA is applied to find the most discriminative directions in the intermediate subspace (Zhao et al., 1999).
The third, well-known, approach is to apply regularization to solve the singular linear systems. The simplest
way to regularize is to add additional variance in all directions: a small constant is added to all
the diagonal elements of the within-class scatter matrix. Such regularization has the effect of decreasing the larger eigenvalues
and increasing the smaller ones, thereby counteracting the biasing. Another effect of the regularization is to stabilize
the smallest eigenvalues (Lu et al., 2005).
FIG. 8 Proper projection direction leads to good separation of classes (the two panels contrast good and bad separability in the x1–x2 plane).
10. Piecewise aggregate approximation
Piecewise aggregate approximation (PAA) is a very simple dimension reduction method for time-series mining. Time-series
datasets and databases tend to grow to extremely large sizes, and consistent sampling is a requirement in many cases where
these databases are involved. One algorithm that addresses this is PAA, discussed by Keogh et al. (2001). The basic idea
behind the algorithm is to reduce the dimensions of the input time series by splitting it into equal-sized segments and
averaging the values within each segment; i.e., the data are represented by the mean values of equal-sized frames. PAA
approximates a time series X of length n by a vector $\bar{X} = (\bar{x}_1, \ldots, \bar{x}_m)$ of any arbitrary length m ≤ n, where each $\bar{x}_i$ is
calculated as

$$\bar{x}_i = \frac{m}{n}\sum_{j=(n/m)(i-1)+1}^{(n/m)\,i} x_j.$$

In other words, in order to reduce the dimensions from n to m, we first divide
the original time series into m equally sized frames and then compute the mean value of each frame. The sequence
assembled from these mean values is the PAA approximation of the original time series. Using the distance measure

$$D_{PAA}(\bar{X}, \bar{Y}) = \sqrt{\frac{n}{m}}\sqrt{\sum_{i=1}^{m}(\bar{x}_i - \bar{y}_i)^2},$$

Yi and Faloutsos (2000) have shown that PAA satisfies the lower bounding condition
and guarantees no false dismissals, i.e., $D_{PAA}(\bar{X}, \bar{Y}) \le D(X, Y)$.
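A small NumPy sketch of the PAA reduction and its lower-bounding distance is given below; it assumes, for simplicity, that the series length is divisible by the number of frames, and the random-walk series are invented for illustration.

```python
import numpy as np

def paa(x, m):
    """Piecewise aggregate approximation of a 1-D series x (length n) into m frame means."""
    n = len(x)
    # This sketch assumes n is divisible by m.
    return x.reshape(m, n // m).mean(axis=1)

def d_paa(xa, ya, n):
    """Lower-bounding PAA distance between two reduced series of equal length m."""
    m = len(xa)
    return np.sqrt(n / m) * np.sqrt(np.sum((xa - ya) ** 2))

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=512))       # e.g., a streamflow-like random walk
y = np.cumsum(rng.normal(size=512))
xa, ya = paa(x, 32), paa(y, 32)
print(len(xa))                                        # 32 segment means
print(d_paa(xa, ya, 512) <= np.linalg.norm(x - y))    # the lower bound holds: True
```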
11. Clustering
Clustering algorithms partition the data examples into disjoint groups, or clusters, so that data samples within a cluster are
“similar” to one another and different from data examples that belong to other clusters. With this data reduction method, only
a cluster representation (e.g., centroid and diameter) of the data is used instead of the actual data (Fig. 9).
Cluster analysis has been applied to find patterns in the atmospheric pressure and temperature of regions that have a
significant impact on climate.
There are many choices of clustering definitions and clustering algorithms. Generally speaking, there are two types of
clustering: partitional clustering and hierarchical clustering. In partitional clustering, objects are divided into nonoverlapping
clusters. In hierarchical clustering, a set of nested clusters is organized as a hierarchical tree, which can be visualized as a
dendrogram. Two common partitional algorithms are k-means (and its variants) and density-based clustering.
11.1 k-means clustering
The most popular algorithm for clustering is k-means, which aims to identify the best k cluster centers in an iterative
manner. The cluster centers serve as representatives of the objects associated with each cluster. Usually, the number of clusters
is given a priori; otherwise, the notion of a cluster can be extremely ambiguous. Due to its iterative nature, k-means might
converge to a local minimum and thus produce an incorrect result.
The basic operation of k-means clustering algorithms is relatively simple. Given a fixed number of k clusters, assign
observations to those clusters so that the means across clusters (for all variables) are as different from each other as possible.
FIG. 9 The result of a cluster analysis which reduces
dimensions of the data.
TABLE 1 k-means: common choices for proximity, centroids, and objective functions.

Proximity function | Centroid | Objective function
Manhattan (L1) | Median | Minimize sum of the L1 distances of objects to their cluster centroid
Squared Euclidean (L2²) | Mean | Minimize sum of the squared L2 distances of objects to their cluster centroid
Cosine | Mean | Maximize sum of the cosine similarities of objects to their cluster centroid

From Tan, P.N., Steinbach, M., Kumar, V., 2016. Introduction to Data Mining: Global Edition. Pearson Education Limited.
In contrast to hierarchical clustering, the k-means procedure does not explicitly use pairwise distances between data points. In
order to determine which centroid is closest to a particular data point, we use a proximity measure; Manhattan,
Euclidean, and cosine are all proximity measures that help determine which cluster a point should be assigned to.
The term centroid refers to a measure of central tendency in the multivariate space. Once the centroids have been defined,
the objective of the clustering algorithm is to minimize the sum of the squared distances (or other measures) of the objects in a
cluster to their corresponding centroid. The k-means algorithm tries to minimize the value of the objective function for each set of
centroids in an iterative manner. In the first step, k data points are randomly selected as the centroids and the objective function
is calculated. This continues until there is no change to the centroids, i.e., the assignment of data points to clusters does not
change (Table 1).
It should be mentioned that k-means clustering is essentially a variant of the expectation maximization (EM) technique
and is efficient when clusters are spherical. The EM algorithm extends this basic approach to clustering in two important
ways: the EM clustering algorithm computes probabilities of cluster membership based on one or more probability
distributions, and the goal of the clustering algorithm is then to maximize the overall probability, or likelihood, of the data given
the (final) clusters. This is in contrast to k-means clustering, which assigns observations to clusters so as to maximize the distances
between clusters. Of course, as a final result of the EM algorithm, one can usually review an actual assignment of observations
to clusters based on the (largest) classification probability.
k-means has problems when clusters are of differing sizes or densities or have a nonspherical shape (i.e., not the same
variance in all directions), which highlights the need to standardize the data before applying k-means clustering.
Moreover, k-means has problems when the data contains outliers.
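As a brief illustration, the scikit-learn sketch below standardizes a synthetic dataset and reports the centroids (the reduced cluster representation) and the squared-distance objective; the data and parameter choices are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic spherical groups in 2-D (e.g., standardized hydro-chemical variables)
X = np.vstack([rng.normal(loc, 0.5, (100, 2)) for loc in ([0, 0], [4, 0], [2, 3])])

X_std = StandardScaler().fit_transform(X)        # standardize before k-means
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

print(km.cluster_centers_)    # the reduced representation: one centroid per cluster
print(km.inertia_)            # objective: sum of squared distances to the closest centroid
```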
11.2 Hierarchical clustering
There are two main types of hierarchical clustering: the agglomerative method, which starts with the points as individual clusters
and, at each step, merges the closest pair of clusters until only k clusters are left; and the divisive method, which starts with one
all-inclusive cluster and recursively splits the most appropriate cluster. The process continues until a stopping criterion (e.g., a
predefined number of clusters) is reached.
Common hierarchical algorithms use a similarity matrix and merge or split one cluster at a time based on specific rules.
However, there are several ways to measure the similarity between clusters in order to decide the rules for clustering, as shown in
Fig. 10.
All the approaches to calculating the similarity between clusters have their own advantages and disadvantages. The minimum
linkage method can handle nonelliptical shapes and is best for capturing clusters of different sizes, but it is sensitive to noise and
outliers. The maximum linkage method is less susceptible to noise and outliers, but it tends to break large
clusters and is biased toward spherical clusters. The centroid, average, and Ward linkage methods are robust to noise but
are also biased toward spherical clusters. The Ward linkage method can outperform the others in the presence of outliers.
A simple agglomerative clustering method can be briefly explained as follows. There is an n × n similarity matrix P, and
k clusters are desired; a partition of the n points into k clusters is the output of the procedure. The counter is initially set to n,
and each cluster ci is initialized to {xi} for i = 1, …, n. We then find the most similar pair of clusters based on P, say ci and cj,
and merge them while the counter is decremented by one. The steps of finding and merging clusters are repeated until the counter
is equal to or less than k (Fern and Brodley, 2003).
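The SciPy sketch below performs such an agglomerative procedure and cuts the resulting dendrogram at a chosen number of clusters; the linkage rule and the synthetic data are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])

# Agglomerative clustering with Ward linkage; 'single', 'complete', 'average'
# and 'centroid' correspond to the other linkage rules discussed above.
Z = linkage(X, method='ward')

# Cut the dendrogram to obtain any desired number of clusters, here k = 2
labels = fcluster(Z, t=2, criterion='maxclust')
print(np.unique(labels, return_counts=True))
```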
Hierarchical clustering does not work well on vast amounts of data. However, it has some strengths including
– There is no need to prespecify any particular number of clusters.
– Any desired number of clusters can be obtained by cutting the dendrogram at the proper level.
– It is easy to decide the number of clusters by merely looking at the dendrogram.
FIG. 10 Different approaches to calculate the similarity between clusters in hierarchical clustering.
11.3 Density-based clustering
In density-based clustering methods, clusters are dense regions in the data space separated by regions of lower point
density. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is the most common technique
in density-based clustering. The DBSCAN algorithm identifies dense regions by grouping together data points that are close
to each other based on a distance measure. The DBSCAN algorithm requires two parameters: the radius of the neighborhood (Eps),
within which density is defined as the number of points, and the minimum number of data points
within a neighborhood (MinPts), which is also needed to define the density-based clusters. If the distance between two points is
less than or equal to Eps, they are considered neighbors. If the Eps value is chosen too small, a large part of the data
will be considered outliers; if it is chosen very large, the clusters will merge and the majority of the data points will end up in
the same cluster. One way to find the Eps value is based on the k-distance graph. The larger the dataset, the larger the value of
MinPts that must be chosen. As a general rule, the minimum MinPts can be derived from the number of dimensions p in the
dataset as MinPts ≥ p + 1. The minimum acceptable value of MinPts is three. It is worth mentioning that the data need to be
scaled/normalized before density-based clustering.
There are three types of data points in the DBSCAN algorithm (Fig. 11):
Core point: a point is a core point if it has more than MinPts points within Eps. The clusters are built around the core
points.
Border point: a point that has fewer than MinPts points within Eps but is in the neighborhood of a core point.
Outlier or noise: a point that is neither a core point nor a border point.
Determining the parameters of DBSCAN is not an easy job; a possible approach is the k-distance graph. The idea
is that, for points in a cluster, the kth nearest neighbors are at roughly the same distance, whereas outliers have their kth nearest
neighbor at a farther distance. So, plot the sorted distance of every point to its kth nearest neighbor for values of
k larger than p. The optimum distance (Eps) is where the slope of the graph increases dramatically, and k can be selected
as MinPts (Fig. 12).
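A short scikit-learn sketch combining the k-distance idea with DBSCAN is given below; the Eps value is assumed to have been read off the elbow of the k-distance curve rather than derived automatically, and the synthetic data are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(2, 0.2, (100, 2)),
               rng.uniform(-1, 3, (10, 2))])          # two dense groups plus scattered noise
X = StandardScaler().fit_transform(X)                 # scale before density-based clustering

# k-distance graph: sorted distance of every point to its k-th nearest neighbor
k = 4                                                 # MinPts >= p + 1; here p = 2, so k = 4 is a common pick
dist, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
k_dist = np.sort(dist[:, -1])                         # look for the elbow in this curve to choose Eps

db = DBSCAN(eps=0.3, min_samples=k).fit(X)            # eps assumed to come from the elbow
print(np.unique(db.labels_, return_counts=True))      # label -1 marks outliers/noise
```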
FIG. 11 Three types of points defined in the DBSCAN algorithm.
FIG. 12 k-dist plot for sample data. (Adapted
from Kotary, D.K., Nanda, S.J., 2021. A distributed
neighbourhood DBSCAN algorithm for effective
data clustering in wireless sensor networks.
Wireless Pers. Commun. 121, 2545–2568.).
DBSCAN is resistant to noise and capable of handling clusters of different shapes and sizes. However, it has some
limitations:
– Time complexity is exponential in the number of dimensions (specifically, complexity is high if “too many” dense units
are generated at the lower stages).
– It may fail if clusters have widely differing densities, since the threshold is fixed.
– Determining an appropriate threshold and unit interval length can be challenging.
12. Conclusions
The area of dimension reduction is becoming very relevant in different application areas, including environmental science,
because of the sheer amount of data being generated in the era of big data. The purpose of this chapter was to provide
information on different dimension reduction techniques; it presented a review and comparative study of the common
techniques for dimension reduction. The ultimate goal of performing dimension reduction is to improve accuracy and
efficiency by decreasing the complexity of the computational work. This chapter gives a clear comparison
of common dimension reduction techniques in water science and is useful for selecting the optimum method for a particular
application based on the characteristics of the dataset. It is concluded that, in order to select a scheme to reduce the data dimension,
we should consider the type of dataset and the specific requirements of the work. The other important concern when choosing a
technique for dimension reduction is its computational difficulty, which should be feasible in practice. A combination
of schemes may also be used to overcome the disadvantages of one scheme relative to another.
The chapter presented a review and comparative study of techniques for dimension reduction. By far the most common
data reduction techniques are those based on the search for components in their different brands (e.g., PCA, SSA, FA),
although they tend to lose importance in favor of nonlinear techniques (e.g., ISOMap, SOMs) as a response
to the nonlinear nature of acquired data. The key benefit of the nonlinear methods is their ability to analyze nonlinearities and
adapt to the local structure of the data. However, nonlinear techniques for dimension reduction are often not capable
of outperforming traditional linear techniques such as PCA or FA.
References
Achlioptas, D., 2001. Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '01), pp. 274–281. https://doi.org/10.1145/375551.375608.
Austin, D., 2009. We recommend a singular value decomposition. American Mathematical Society. Feature Column.
Beavers, A.S., Lounsbury, J.W., Richards, J.K., Huck, S.W., Skolits, G.J., Esquivel, S.L., 2013. Practical considerations for using exploratory factor
analysis in educational research. Pract. Assess. Res. Eval. 18 (Article 6). https://doi.org/10.7275/qv2q-rk76.
Borga, M., Knutsson, H., 1998. An Adaptive Stereo Algorithm Based on Canonical Correlation Analysis. IEEE.
Cha, S.H., 2007. Comprehensive survey on distance/similarity measures between probability density functions. City 1 (2), 1.
Chen, L.F., Liao, H.Y.M., Ko, M.T., Lin, J.C., Yu, G.J., 2000. A new LDA-based face recognition system which can solve the small sample size problem.
Pattern Recogn. 33 (10), 1713–1726.
Cureton, E.E., D’Agostino, R.B., 1993. Factor Analysis: An Applied Approach, first ed. Psychology Press, https://doi.org/10.4324/9781315799476.
Eslamian, S., Ghasemizadeh, M., Biabanaki, M., Talebizadeh, M., 2010. A principal component regression method for estimating low flow index. Water
Resour. Manage. 24 (11), 2553–2566.
Fern, X.Z., Brodley, C.E., 2003. Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 186–193.
Hotelling, H., 1936. Relations between two sets of variates. Biometrika 28, 321–377.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer Series in Statistics. Springer, New York. https://doi.org/10.1007/978-1-4757-1904-8.
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S., 2001. Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf.
Syst. 3 (3), 263–286. https://doi.org/10.1007/PL00011669.
Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biol. Cybern. 43 (1), 59–69.
Li, P., Hastie, T.J., Church, K.W., 2006. Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 287–296.
Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N., 2005. Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn. Lett. 26 (2), 181–191. https://doi.org/10.1016/j.patrec.2004.09.014.
Melssen, W.J., Smits, J.R.M., Rolf, G.H., Kateman, G., 1993. Two-dimensional mapping of IR spectra using a parallel implemented self-organising feature
map. Chemom. Intell. Lab. Syst. 18 (2), 195–204.
Pearson, K., 1901. LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 2 (11), 559–572.
Tripathi, S., Govindaraju, R.S., 2010. Canonical correlation analysis for hydroclimatic datasets with known measurement uncertainties. In: World Environmental and Water Resources Congress 2010., https://doi.org/10.1061/41114(371)461.
Varini, C., Degenhard, A., Nattkemper, T.W., 2006. ISOLLE: LLE with geodesic distance. Neurocomputing 69 (13–15), 1768–1771.
Von der Malsburg, C., 1973. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14 (2), 85–100.
Wallis, J.R., 1968. Factor analysis in hydrology—an agnostic view. Water Resour. Res. 4 (3), 521–527. https://doi.org/10.1029/WR004i003p00521.
Wikle, C.K., 2003. Spatio temporal methods in climatology. In: Encyclopedia of Life Support Systems. EOLSS.
Yi, B.K., Faloutsos, C., 2000. Fast time sequence indexing for arbitrary Lp norms. In: Proceedings of the 26th VLDB Conference, Cairo, Egypt.
Zhao, W., Chellappa, R., Phillips, P.J., 1999. Subspace Linear Discriminant Analysis for Face Recognition. Computer Vision Laboratory, Center for Automation Research, University of Maryland, USA.
Chapter 10
Decision tree algorithms
Amir Ahmad Dehghania, Neshat Movahedia, Khalil Ghorbania, and Saeid Eslamianb,c
a
Department of Water Engineering, Gorgan University of Agricultural Sciences & Natural Resources, Gorgan, Iran, b Department of Water Engineering,
College of Agriculture, Isfahan University of Technology, Isfahan, Iran, c Center of Excellence in Risk Management and Natural Hazards, Isfahan
University of Technology, Isfahan, Iran
1. Introduction
Machine learning (ML) is a branch of artificial intelligence (AI) that trains a computer system to make decisions based on
the data fed to it. ML recognizes the data pattern through a learning process and can then make predictions when unseen data
arrive. ML algorithms are grouped by their learning style and by their similarity in form or function. Decision trees (DTs)
are very popular machine learning techniques grouped by their similarity. The learning process can be either
supervised or unsupervised; tree-based methods are supervised techniques. Unlike neuron-based models such as
ANNs, DTs divide complex problems into smaller ones, which makes them understandable to everyone, even readers without
analytical knowledge. The other advantages of DTs over neuron-based models are that the decision-making process can easily
be shown via a visual representation of the data, and that they are easy to program with only
IF, THEN, ELSE statements. Also, the lack of hidden layers and the modeling transparency of DT-based algorithms can enable
better modeling performance than neuron-based models (Bui et al., 2020a).
DTs are used to solve both classification and regression problems in the form of trees. Furthermore, based on the target
variable, they are divided into two types: categorical-variable DTs and continuous-variable DTs. DTs use various algorithms
to decide how to choose a suitable variable and how to split (Sullivan, 2017). There are some standard DT algorithms,
including Iterative Dichotomiser 3 (ID3), C4.5 and C5.0, Classification and Regression Tree (CART), Chi-squared Automatic
Interaction Detection (CHAID), the M5 model tree, and Random Forest (RF). There are also other well-known DT
algorithms such as Best First Decision Trees, Alternating Model Trees (AMT), Logistic Model Trees (LMT), Reduced Error
Pruning Trees (REPT), and Alternating Decision Trees (ADT) (Khosravi et al., 2018a, 2020; Bui et al., 2020b; Khosravi
et al., 2021), but here only the standard DT algorithms are introduced briefly. For better comparison of these algorithms,
the specifications of each algorithm are presented in Table 1 in terms of the type of target variable, splitting criterion, number of
branches, and pruning. A brief description of the DT algorithms is presented in the following part; before diving into the
working principle of a DT algorithm, some important keywords need to be defined (Fig. 1).
Root node: It is top-most node of a DT which represents the entire population or sample. The data set is divided into
homogeneous datasets based on Attribute Selection Techniques.
Splitting: It is the process of dividing a node into multiple subnodes.
Decision node: The subnodes created in this process of splitting are known as decision nodes.
Leaf/terminal node: The nodes that cannot be split any further are called leaf nodes.
Pruning: The process of removing the subnodes of a decision node is known as pruning.
Branch/subtree: A subsection of the entire tree is called a branch or subtree.
Parent and child node: The node that gets divided is known as parent node, whereas the subnodes are known as
child nodes.
1.1 ID3 algorithm
Induction of Decision Tree (ID3) is a very simple DT algorithm proposed by Quinlan (1986) and used to determine the
classification of objects. In this algorithm, at each node one attribute is tested with the aim of maximizing the information gain and
minimizing the entropy, and the attribute that produces the highest gain is selected as the root node. The entropy and gain
are calculated as follows:
TABLE 1 Comparison of different decision tree algorithms.

Characteristic | ID3 | C4.5 | CART | CHAID | M5 | RF
Independent variable | Categorical | Categorical/continuous | Categorical/continuous | Categorical/continuous | Continuous | Categorical/continuous
Dependent variable | Categorical | Categorical/continuous | Categorical/continuous | Categorical | Continuous | Categorical/continuous
Splitting criterion | Information gain | Gain ratio | Gini index or twoing criteria | Chi-square | Standard deviation reduction | Randomly
Branches | 2 | 2 | 2 | 2 | 2 | 2
Pruning | No | Yes | Yes | Yes | Yes | No
FIG. 1 Decision tree structure.

$$\text{Entropy}(D) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$

$$\text{Gain}(D, A) = \text{Entropy}(D) - \sum_{i=1}^{n} \big[p(D \mid A)\,\text{Entropy}(D \mid A)\big] \qquad (2)$$
where D and A denote the decision and attribute, respectively, pi is the probability of each class in the decision, and n is the
number of classes. It must be noted that, for a small dataset, the ID3 algorithm may give over-fitted or
over-classified results. Also, this algorithm is unable to handle numeric attributes and missing values (El Seddawy
et al., 2013).
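A minimal sketch of Eqs. (1) and (2) for a categorical attribute is given below; the toy rainfall/flood labels are invented purely for illustration.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels (Eq. 1)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Information gain of splitting `labels` on a categorical `attribute` (Eq. 2)."""
    total = entropy(labels)
    n = len(labels)
    for value in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == value]
        total -= (len(subset) / n) * entropy(subset)
    return total

# Toy example: does an antecedent-rainfall category explain a flood / no-flood label?
rainfall = ['high', 'high', 'low', 'low', 'high', 'low']
flood    = ['yes',  'yes',  'no',  'no',  'yes',  'no']
print(entropy(flood))                      # 1.0 bit for a balanced binary label
print(information_gain(rainfall, flood))   # 1.0: this attribute separates the classes perfectly
```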
1.2 C4.5 algorithm
Quinlan (1993) developed a series of improvements to ID3, called C4.5 (Salzberg, 1994; Quinlan, 2014). These improvements deal with numeric attributes, missing values and noisy data. C4.5 uses entropy for its impurity function. In ID3 algorithm, the gains are used for each attribute, but in C4.5, the gain ratios are used:
$$\text{GainRatio}(A) = \text{Gain}(A)/\text{SplitInfo}(A) \qquad (3)$$

$$\text{SplitInfo}(A) = -\sum \frac{\lvert D_j\rvert}{\lvert D\rvert}\,\log_2\!\frac{\lvert D_j\rvert}{\lvert D\rvert} \qquad (4)$$
One of the advantages of this algorithm is that C4.5 can handle both continuous and discrete attributes. In order to handle
continuous attributes, it splits the dataset into records whose attribute value is above a threshold and those whose value is less than
or equal to it (Singh and Gupta, 2014). Mazid et al. (2010) found that many nodes in this algorithm have values of zero or close
to zero; these nodes do not contribute to generating rules or help to construct any class for the classification task, while they
make the tree bigger and more complex.
1.3 CART algorithm
Classification and regression tree (CART), proposed by Breiman et al. (1984), develops visualized decision rules for predicting
either a categorical or a continuous variable. Classification trees are used when the target variable is categorical; regression trees,
on the other hand, are used when the target variable is continuous. The splitting rule in
classification is measured by the Gini index, which quantifies the homogeneity of the data:

$$\text{Gini} = 1 - \sum_{i=1}^{n} (p_i)^2 \qquad (5)$$

In a regression tree, the splitting is made in accordance with a squared-residuals minimization algorithm, which implies that
the expected sum of variances for the two resulting nodes should be minimized (Timofeev, 2004). CART can handle both
numerical and categorical variables. It also has the ability to identify the most significant variables and eliminate nonsignificant ones (Timofeev, 2004).
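As an illustration, the scikit-learn decision tree estimators below use the Gini index for a classification tree and squared-residual minimization for a regression tree; the synthetic data are arbitrary, and the 'squared_error' criterion name assumes a recent scikit-learn version.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))

# Classification tree with the Gini index (Eq. 5) as the splitting criterion
y_class = (X[:, 0] + X[:, 1] > 10).astype(int)
clf = DecisionTreeClassifier(criterion='gini', max_depth=3).fit(X, y_class)

# Regression tree: splits chosen to minimize the squared residuals of the two child nodes
y_reg = 2.0 * X[:, 0] + rng.normal(0, 0.5, 200)
reg = DecisionTreeRegressor(criterion='squared_error', max_depth=3).fit(X, y_reg)

print(clf.predict([[8.0, 5.0]]), reg.predict([[8.0, 5.0]]))
```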
1.4 CHAID algorithm
The Chi-Square Automatic Interaction Detection (CHAID) algorithm was proposed by Kass (1980) and is based on chi-square
tests, which are used to find the significance of a feature. This algorithm builds a decision tree for a categorical
target dataset. The formula of the chi-square statistic (χ²) is:

$$\chi^2 = \sum \frac{(y - y')^2}{y'} \qquad (6)$$

where y is the actual value and y' is the expected value. The advantage of this algorithm is that it is convenient for generating
nonbinary trees (Milanovic and Stamenkovic, 2016).
1.5 M5 algorithm
The M5 algorithm was first introduced by Quinlan (1992) for predicting continuous data, and the theory was later improved in a
system called M5' by Wang and Witten (1996). In this model, data are classified into different groups and, for each group, a
multilinear (multivariate linear) regression equation is built. The advantage of the M5 model tree is its ability to tackle
tasks with very high dimensionality—up to hundreds of attributes. In this algorithm, the computational cost does not grow
quickly as the number of attributes increases. Another main advantage of this algorithm is that it is able to produce a
regression equation for each class. The advantage of M5 over CART is that model trees are generally much smaller
than regression trees and have proven more accurate in the tasks investigated (Quinlan, 1992). The only disadvantage of
this algorithm, based on the experience of the authors, is related to its greedy nature: the accuracy of
the model does not necessarily increase with an increasing number of data.
1.6 Random forest
The random forest (RF) algorithm, first introduced by Breiman (2001), is a tree-based algorithm that combines multiple DTs for
making decisions. RF randomly chooses a set of samples from the dataset and creates a DT using a random subset
of the variables. By repeating this process, i.e., choosing another sample and again selecting variables randomly, a wide variety of
DTs is created. RF then combines the outputs, and the outcome with the higher vote is selected as the final output. RF performs well
in estimating missing data and handling large datasets, but it is a hyperparameter model. An advantage of this model is that it can be
used in both time series (Sharafati et al., 2019; Bui et al., 2020a; Khosravi et al., 2020) and spatial prediction (Khosravi
et al., 2019). Readers are referred to Tyralis et al. (2019) for further study of RF applications in water sciences.
Recently, Fisher et al. (2021) used RF to investigate the parameters that affect sediment rating curve shape, vertical offsets,
and slopes.
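A short scikit-learn sketch of the procedure described above (bootstrap samples, random feature subsets at each split, aggregated predictions) is shown below; the synthetic predictors and hyperparameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))                       # e.g., four hydrological predictors
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Each tree is grown on a bootstrap sample using a random subset of predictors at each split;
# for regression, the predictions of all trees are averaged.
rf = RandomForestRegressor(n_estimators=200, max_features='sqrt', random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))          # coefficient of determination on the test set
```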
1.7 Application of DT algorithms in water sciences
Charoenporn (2017) used the ID3 and C4.5 decision tree algorithms to forecast reservoir inflow. Galelli and Castelletti (2013)
investigated the possibility of using CART to predict characteristic flows in various morphoclimatic conditions. The CART
algorithm was employed by Choubin et al. (2018a,b) to estimate the monthly suspended sediment load and to forecast precipitation,
respectively. The M5 model tree has been applied in different fields of hydraulics, hydrology, and groundwater; for example in
flood forecasting (Solomatine and Xue, 2004), sediment transport prediction (Bhattacharya et al., 2007; Reddy and
Ghimire, 2009; Khosravi et al., 2018b; Zahiri et al., 2020; Salih et al., 2020), wave height estimation (Etemad-Shahidi
and Mahjoobi, 2009), daily river flow forecasting (Sattari et al., 2013; Kisi et al., 2017), groundwater level prediction
(Rezaie-Balf et al., 2017; Nalarajan and Mohandas, 2015; Sattari et al., 2018; Bahmani et al., 2020), precipitation
forecasting (Goyal and Ojha, 2012), derivation of reservoir operating rules (Goyal et al., 2013), prediction of hydraulic jump
characteristics (Mahtabi and Satari, 2016), drought prediction (Sattari and Sureh, 2019), pan evaporation estimation (Sattari
et al., 2020), and rainfall-runoff modeling (Nourani et al., 2019). Also, RF and the M5 model tree have been compared in several
studies, for example in prediction of local scour around bridge piers (Pal et al., 2013), shear stress in compound channels
(Khozani et al., 2019), bed load transport (Khosravi et al., 2020), water quality (Bui et al., 2020a), and estimation of
unsaturated hydraulic conductivity (Sihag et al., 2019). From the literature it can be concluded that RF and the M5 model tree
are frequently applied in water sciences; however, since RF is a hyperparameter model, the next section describes the M5
model tree, and the flow discharge through a side sluice gate under submerged conditions is solved with this model.
2. M5 model tree
As with other decision tree models, first the initial tree is built based on a splitting criterion, then this tree is pruned to
overcome the overfitting problem, and finally a smoothing process is employed to compensate for the sharp discontinuities between
adjacent linear models at the leaves. The most important steps in the M5 model tree are splitting and pruning, which are explained
in detail as follows. One of the most important principles of the decision tree is that an expert viewpoint is always needed in
the process of splitting and pruning. Different trees can be created depending on the viewpoints of different experts, and in some
cases pruning may remove rules that are necessary for us.
2.1 Splitting
The M5 model tree first constructs a regression tree by recursively splitting the instance space. Fig. 2 illustrates how the splitting
of space is done for a given 2-D input parameter domain of X1 and X2. The splitting criterion is used to minimize the intra-subset
variability in the values down from the root through the branch to the node (Bonakdar and Etemad-Shahidi, 2011).
The splitting criterion for the M5 model tree algorithm treats the standard deviation of the class values that
reach a node as a measure of the error at that node and calculates the expected reduction in this error as a result of testing
each attribute at that node. The standard deviation reduction (SDR) is calculated as follows:

$$SDR = sd(T) - \sum \frac{\lvert T_i \rvert}{\lvert T \rvert}\, sd(T_i) \qquad (7)$$
FIG. 2 Example of M5 model tree (LM 1–5 are linear regression models) (Solomatine and Xue, 2004).
where T represents a set of examples that reaches the node; Ti represents the subset of examples that have the ith outcome of
the potential set; and sd represents the standard deviation. After examining all possible splits (i.e., the attributes and the
possible split values), M5 chooses the one that maximizes the expected error reduction. Splitting in M5 terminates when the
class values of all the instances that reach a node vary just slightly, or only a few instances remain (Wang and Witten, 1996).
In order to better understand the processes in the M5 model tree, details of the splitting procedure are presented as follows (a minimal sketch of the SDR-based search is given after the steps):
Step 1: Compute the standard deviation of the target values.
Step 2: Choose one of the input variables (e.g., X1), sort its values in ascending order, and calculate the average of X1 for
all adjacent records. Then, for each of these averages, calculate the SDR of the corresponding target (Y) using Eq. (7).
For example, when the tree is divided based on the average value of the first and second points of X1 (Fig. 3), two groups are
created: values less than and values greater than this average. Calculate the SDR of the corresponding target based on the first point
and the rest of the points. Then assume that the tree is divided based on the average value of the second and third points of X1 (Fig. 4),
and again calculate the SDR of the corresponding target for values less than and greater than this average. These processes must be
continued until the SDR has been calculated for all X1 records. Then write down the maximum SDR of X1.
Step 3: Repeat Step 2 for all input variables.
Step 4: In this step, compare the maximum SDRs of the input variables. Choose the attribute with the maximum SDR as the root
node (e.g., X2); then, among the SDRs of all groups of this attribute (X2), choose the group with the maximum SDR. Consider
the value of X2 at this group as the point of splitting.
Step 5: After selecting the root node, continue all the above steps for splitting until the values of all instances that reach a
node vary only slightly or only a few instances remain.
Step 6: Finally, build a linear multiple regression model for each subspace (i.e., leaf), using all the attributes that participate in building the tree.
FIG. 3 Splitting based on the average of the first and second points of X1.
FIG. 4 Splitting based on the average of the second and third points of X1.
FIG. 5 Schematic of tree pruning.
2.2 Pruning
As mentioned before, M5 is a greedy approach: it just checks for the best split and continues until
the stopping conditions are reached. As the tree grows, the accuracy of the model increases, but overfitting may be inevitable,
so pruning can overcome this problem (Kumar et al., 2005). Starting from the bottom of the tree and examining
each nonleaf node, pruning is done by selecting either the linear model above or the model subtree, whichever has the lower
estimated error. If the linear model is chosen, the subtree at this node is replaced with a leaf (Quinlan, 1992). Fig. 5 illustrates
a conceptual scheme of the pruning process.
2.3 Smoothing
The smoothing process is done to compensate for the sharp discontinuities that will inevitably occur between adjacent linear
models at the leaves of the pruned tree, particularly for models built from a small number of training instances
(Bhattacharya et al., 2007). Smoothing can be accomplished by producing linear models for each internal node, as well
as for the leaves, at the time the tree is built. Then, once the leaf model has been used to obtain the raw predicted value
for a test instance, that value is filtered along the path back to the root, smoothing it at each node by combining it with the
value predicted by the linear model for that node (Witten and Frank, 2002). An appropriate smoothing calculation is:

$$p' = \frac{np + kq}{n + k} \qquad (8)$$

where p' is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted
by the model at this node, n is the number of training instances that reach the node below, and k is a smoothing constant.
3. Data set
To show the ability of the M5 model tree, the flow discharge through a side sluice gate under submerged conditions was solved with
the M5 model tree. Side sluice gates are diversion structures used in irrigation channels, urban sewage systems, and
as head regulators of distributaries, in order to divert flow from a main channel to a secondary channel (Swamee
et al., 1993; Ghodsian, 2003). Ghodsian (2003) investigated flow through a side sluice gate under both free and submerged flow
conditions, using a series of laboratory experiments (Fig. 6). His experiments were performed for various combinations of main
channel discharge (Q0), side sluice gate opening (a), and upstream depth of flow (y0) for free flow conditions, with the additional
parameter of tailwater depth (yt) for submerged flow conditions. Also, in his experiments, the flow was subcritical in the
main channel and it was assumed that the specific energy remains constant along the side sluice gate. Table 2 presents the
range of these parameters in the Ghodsian (2003) study for the submerged flow condition.
FIG. 6 Subcritical flow through side sluice gate (plan and section views of the main channel, secondary channel, and side sluice gate).
TABLE 2 Range of variables in the Ghodsian (2003) study for submerged flow through a side sluice gate.

Variable definition | Variable range
Upstream depth, y0 (m) | 0.08–0.4
Downstream depth, y1 (m) | 0.09–0.39
Tailwater depth, yt (m) | 0.05–0.36
Sluice gate opening, a (m) | 0.01–0.1
Upstream discharge, Q0 (m³/s) | 0.008–0.097
Side sluice gate discharge, QS (m³/s) | 0.003–0.046
y0/a | 1.577–32.38
yt/y0 | 0.267–1.181
Fr | 0.047–0.806
QS/Q0 | 0.067–0.968
3.1 Empirical formula for flow discharge
For submerged flow, Ghodsian (2003) proposed the following procedure to calculate the flow discharge through a submerged side
sluice gate:
1. For the flow depth and discharge in the upstream section, i.e., y0 and Q0, calculate the specific energy E0 from:

$$E_0 = y_0 + \frac{Q_0^2}{2gB^2y_0^2} \qquad (9)$$
2. Determine the value of Fr and hence Cm from:

$$C_m = 0.611\left(\frac{y_0/a - 1}{y_0/a + 1}\right)^{0.216}\left\{1 + 0.558\,Fr^{0.1526}\left[\frac{2.5y_t - a}{y_0 - y_t}\right]^{0.67}\left(\frac{y_t}{y_0}\right)^{0.24}\right\}^{-1} \qquad (10)$$
3. By assuming that E0 = E1 = E, calculate y1 using:

$$\frac{2abC_m}{BE} = 3\left[\frac{y_0}{E}\left(1 - \frac{y_0}{E}\right)\right]^{0.5} - 3\left[\frac{y_1}{E}\left(1 - \frac{y_1}{E}\right)\right]^{0.5} + \sin^{-1}\left(1 - \frac{y_0}{E}\right)^{0.5} - \sin^{-1}\left(1 - \frac{y_1}{E}\right)^{0.5} \qquad (11)$$
4. Calculate the downstream discharge Q1 from:

$$E_1 = y_1 + \frac{Q_1^2}{2gB^2y_1^2} \qquad (12)$$

5. Determine the side sluice gate discharge QS from:

$$Q_S = Q_0 - Q_1 \qquad (13)$$
where B and b are the width of main channel and side channel, respectively. Cm is the discharge coefficient which used in
general formula of flow discharge through sluice gate. As the Eq. (11) is nonlinear then QS is calculated by trial and error
process. Furthermore, the Eq. (10) is also obtained by fitting the equation on experimental data and there are errors for
estimation of Cm. So, applicability of M5 model tree to estimate QS was examined by obtaining nondimensional parameters.
The flow discharge through the submerged side sluice gate (QS) can be defined by the following set of parameters:

QS = φ(Q0, V0, y0, yt, a, g, ρ)    (14)

where V0 is the upstream flow velocity, g is the acceleration due to gravity, and ρ is the density of water. By applying the Buckingham π theorem to Eq. (14) and considering y0, V0, and ρ as basic parameters, the dimensionless relationship becomes:

φ(QS/(y0²V0), Q0/(y0²V0), yt/y0, a/y0, gy0/V0²) = 0    (15)

By combining the first and second terms, and considering the last term as the upstream Froude number, the following equation is obtained:

QS/Q0 = φ(yt/y0, y0/a, Fr)    (16)
Experimental data sets of Ghodsian (2003) are used to evaluate the M5 model tree in determining the flow discharge through the side sluice gate. The ranges of these dimensionless parameters are also presented in Table 2. The data set is randomly divided into two groups: of the total 185 data sets, 80% (148 sets) are used for training the M5 model, and the remaining 20% (37 sets) are used for testing.
3.2 Model evaluation and comparison
In order to evaluate the model on the testing data, the root mean square error (RMSE), discrepancy ratio (DR), mean percentage error (MPE), and Nash-Sutcliffe model efficiency coefficient (NSE) were used, as follows (Kouzehgar et al., 2021):

RMSE = [Σ_{i=1}^{n} (Xi − Yi)² / n]^0.5    (17)

DR = Yi/Xi    (18)

MPE = (100/n) Σ_{i=1}^{n} (Xi − Yi)/Xi    (19)

NSE = 1 − Σ_{i=1}^{n} (Xi − Yi)² / Σ_{i=1}^{n} (Xi − X̄)²    (20)

where Xi is the measured data, Yi is the calculated data, X̄ is the average of the measured data, and n is the number of data. Lower RMSE and MPE values, and higher NSE values, indicate greater predictive power of the model.
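The following Python sketch (an assumed helper, not part of the chapter's software) evaluates Eqs. (17)–(20) for a vector of measured values X and model predictions Y; the sample values in the example are hypothetical.

```python
# Illustrative sketch of the evaluation metrics of Eqs. (17)-(20).
import numpy as np

def evaluate(X, Y):
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    rmse = np.sqrt(np.mean((X - Y) ** 2))                           # Eq. (17)
    dr = Y / X                                                      # Eq. (18), one value per record
    mpe = np.mean((X - Y) / X) * 100                                # Eq. (19)
    nse = 1 - np.sum((X - Y) ** 2) / np.sum((X - X.mean()) ** 2)    # Eq. (20)
    return rmse, dr.mean(), dr.std(), mpe, nse

# Example with a few hypothetical measured/predicted QS/Q0 pairs
print(evaluate([0.46, 0.62, 0.28, 0.73], [0.44, 0.65, 0.30, 0.70]))
```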
4. Modeling and results
According to Eq. (16), QS/Q0 is selected as the dependent variable and y0/a, Fr, and yt/y0 are selected as independent variables for designing the M5 model tree. The ability of the M5 model tree to estimate QS/Q0 is then compared with that of the empirical formula.
4.1 Initial tree
The root node was selected with the following steps:
Step 1: The standard deviation of the target values was calculated:
Standard deviation of QS/Q0 = 0.211
Step 2: For each variable, the data sets were sorted in ascending order and the SDR was calculated using Eq. (7). In this case, the data were first sorted in ascending order of y0/a (Column 2, Table 3). Then, the following calculations were done (a computational sketch of this SDR calculation is given after this list):
– First, it was assumed that the tree is split into two groups: values less than, and values greater than, the average of the first and second points of y0/a (Column 4 of Table 3). The SD of the corresponding QS/Q0 values was then calculated for each of these two groups (i.e., the first point and the rest of the points) (Columns 5 and 6 of Table 3). Next, it was assumed that the tree is split at the average of the second and third points of y0/a, and the SD of QS/Q0 for values less than and greater than this average was calculated. This process was continued until the SDs of the two groups had been calculated for every candidate split (Columns 5 and 6 of Table 3).
– Then, each pair of SDs was weighted (Column 7 of Table 3) and subtracted from the SD of the target data (0.212) in order to calculate the SDR (Eq. 7) (Column 8 of Table 3).
– Finally, the maximum SDR for y0/a was identified.
Step 3: After finishing the calculation of the maximum SDR for y0/a, the data were sorted by yt/y0 and then by Fr, and the calculations were repeated until the maximum SDR had been obtained for all three variables (Tables 4 and 5).
Step 4: The maximum SDRs of the input variables are compared in Table 6. As Fr has the highest SDR, it is selected as the root node, and its value (Fr = 0.276) is used for splitting (Fig. 7).
Step 5: After selecting the root node, Steps 1–4 were repeated for each of the Fr branches (i.e., less than 0.276 and greater than 0.276). These steps were continued in each branch until the subset reached a specified threshold; here the threshold was chosen so that the number of records remaining in each leaf was less than 25. The initial M5 tree and its linear models before pruning are illustrated in Fig. 8 and Table 7, respectively.
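As a concrete illustration of Steps 1–2, the following Python sketch (an assumed implementation, not the chapter's code) scans the candidate splits of a single attribute and returns the split value with the maximum SDR; applied to Fr it would reproduce the root-node choice reported in Table 6.

```python
# Illustrative sketch: standard deviation reduction (SDR) over all candidate splits
# of one attribute x against the target y, as in Tables 3-5.
import numpy as np

def best_split_sdr(x, y):
    """Return (best split value, max SDR) for attribute x and target y."""
    order = np.argsort(x)
    x, y = np.asarray(x, dtype=float)[order], np.asarray(y, dtype=float)[order]
    sd_all = y.std()                       # Step 1: SD of all target values
    best_value, best_sdr = None, -np.inf
    for i in range(1, len(x)):
        split = 0.5 * (x[i - 1] + x[i])    # average of adjacent sorted points
        left, right = y[x <= split], y[x > split]
        weighted = (len(left) * left.std() + len(right) * right.std()) / len(y)
        sdr = sd_all - weighted            # Eq. (7): reduction in standard deviation
        if sdr > best_sdr:
            best_value, best_sdr = split, sdr
    return best_value, best_sdr

# The attribute with the largest SDR (here Fr, split at 0.276 per Table 6)
# would be chosen as the root node.
```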
4.2 Pruning
Pruning was done by merging some lower subtrees into one node in order to avoid overfitting. The most popular and logical pruning criterion is to remove sets of leaf nodes until a minimum overall error is reached. For the present study, the LM3 and LM4 leaf nodes were examined for pruning first. As shown in Table 8, pruning these two subtrees increases the RMSE and MPE values for the training data set. For LM7 and LM8, however, the RMSE and MPE did not change remarkably, so they are suitable for pruning. The M5 model tree and its linear models after the first pruning are presented
TABLE 3 The data sorted by y0/a.

Records | y0/a | QS/Q0 | Average of adjacent y0/a | SD group 1 | SD group 2 | Weighted SD | SDR
1 | 1.577 | 0.463 | 1.691 | 0 | 0.212 | 0.2103 | 0.001054
2 | 1.804 | 0.620 | 1.834 | 0.079 | 0.211 | 0.2092 | 0.002186
3 | 1.864 | 0.276 | 1.941 | 0.141 | 0.212 | 0.2102 | 0.001134
4 | 2.017 | 0.259 | 2.052 | 0.148 | 0.212 | 0.2106 | 0.000754
5 | 2.087 | 0.734 | 2.099 | 0.187 | 0.210 | 0.2094 | 0.001979
6 | 2.111 | 0.450 | 2.212 | 0.171 | 0.211 | 0.2090 | 0.002388
7 | 2.314 | 0.546 | 2.315 | 0.160 | 0.210 | 0.2080 | 0.00332
8 | 2.315 | 0.778 | … | … | … | … | …
… | … | … | … | … | … | … | …
96 | 9.032 | 0.106 | 8.936 | 0.202 | 0.122 | 0.1866 | 0.02476
… | … | … | … | … | … | … | …
148 | 32.38 | 0.239 | 31.965 | 0.212 | 0 | 0.2105 | 0.00813
max(SDR) = 0.02476
TABLE 4 The data sorted by yt/y0.

Records | yt/y0 | QS/Q0 | Average of adjacent yt/y0 | SD group 1 | SD group 2 | Weighted SD | SDR
1 | 0.267 | 0.089 | 0.276 | 0 | 0.211 | 0.2098 | 0.00156
2 | 0.285 | 0.260 | 0.296 | 0.086 | 0.212 | 0.2102 | 0.00117
3 | 0.307 | 0.191 | 0.308 | 0.071 | 0.212 | 0.2095 | 0.00187
4 | 0.309 | 0.239 | 0.316 | 0.066 | 0.213 | 0.2090 | 0.00234
5 | 0.323 | 0.253 | 0.336 | 0.064 | 0.214 | 0.2086 | 0.00278
6 | 0.350 | 0.247 | 0.352 | 0.060 | 0.214 | 0.2080 | 0.00330
7 | 0.355 | 0.315 | 0.370 | 0.066 | 0.215 | 0.2080 | 0.00334
8 | 0.386 | 0.231 | … | … | … | … | …
… | … | … | … | … | … | … | …
18 | 0.468 | 0.715 | 0.458 | 0.068 | 0.221 | 0.2031 | 0.00822
… | … | … | … | … | … | … | …
148 | 1.181 | 0.179 | 1.085 | 0.212 | 0 | 0.2103 | 0.00102
max(SDR) = 0.00822
TABLE 5 The data sorted by Fr.

Records | Fr | QS/Q0 | Average of adjacent Fr | SD group 1 | SD group 2 | Weighted SD | SDR
1 | 0.047 | 0.797 | 0.048 | 0.000 | 0.208 | 0.2069 | 0.00444
2 | 0.049 | 0.662 | 0.066 | 0.068 | 0.207 | 0.2051 | 0.00621
3 | 0.083 | 0.633 | 0.088 | 0.072 | 0.206 | 0.2033 | 0.00807
4 | 0.093 | 0.694 | 0.093 | 0.062 | 0.204 | 0.2004 | 0.01098
5 | 0.094 | 0.288 | 0.097 | 0.173 | 0.205 | 0.2038 | 0.00753
6 | 0.100 | 0.785 | 0.100 | 0.170 | 0.202 | 0.2004 | 0.01098
7 | 0.101 | 0.650 | 0.102 | 0.157 | 0.200 | 0.1982 | 0.01313
8 | 0.103 | 0.715 | … | … | … | … | …
… | … | … | … | … | … | … | …
76 | 0.281 | 0.291 | 0.276 | 0.235 | 0.110 | 0.1734 | 0.03794
… | … | … | … | … | … | … | …
148 | 0.838 | 0.806 | 0.787 | 0.212 | 0 | 0.2102 | 0.00112
max(SDR) = 0.03794
TABLE 6 The value of SDR for independent variables.

Variable | max(SDR) | Value
y0/a | 0.02476 | 8.94
yt/y0 | 0.00822 | 0.458
Fr | 0.03794 | 0.276
FIG. 7 Root node (split at Fr > 0.276).
in Fig. 9 and Table 9, respectively. Then, the new subtrees (LM7new and LM6) were pruned. Since the RMSE and MPE did not change remarkably in comparison with the training data set (Table 8), they were considered for pruning. The final M5 model tree and the final linear models are presented in Fig. 10 and Table 10, respectively. The comparison between predicted and measured QS/Q0 for the training data sets based on the final linear models shows that the present model, with a high NSE of 0.9605 and a low RMSE of 0.042, can estimate the flow discharge through the side sluice gate well (Fig. 11).
Fig. 12 shows the performance of the M5 model tree for the testing data sets. The results show that the data points are concentrated around the 1:1 line. The statistical parameters for the testing data sets are presented in Table 11. The RMSE of 0.0678 and NSE of 0.8837 confirm the goodness of the estimation.
FIG. 8 Initial M5 model tree.
TABLE 7 Initial linear models.

Condition | LM | Regression equation
Fr < 0.276, y0/a > 11.91 | LM1 | QS/Q0 = 0.8724 − 0.01147 y0/a − 0.3525 yt/y0 − 1.351 Fr
Fr < 0.276, y0/a < 11.91, Fr < 0.135 | LM2 | QS/Q0 = 1.787 − 0.0479 y0/a − 0.784 yt/y0 − 1.50 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a > 4.528 | LM3 | QS/Q0 = 0.539 − 0.02126 y0/a − 0.146 yt/y0 + 0.011 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a < 4.528 | LM4 | QS/Q0 = 3.489 − 0.1578 y0/a − 2.283 yt/y0 − 2.339 Fr
Fr > 0.276, y0/a < 3.281 | LM5 | QS/Q0 = 2.482 − 0.1484 y0/a − 1.653 yt/y0 − 0.7286 Fr
Fr > 0.276, y0/a > 3.281, y0/a < 7.934 | LM6 | QS/Q0 = 0.9449 − 0.03899 y0/a − 0.4947 yt/y0 − 0.3498 Fr
Fr > 0.276, y0/a > 3.281, y0/a > 7.934, Fr > 0.35 | LM7 | QS/Q0 = 0.4424 − 0.009900 y0/a − 0.1931 yt/y0 − 0.1918 Fr
Fr > 0.276, y0/a > 3.281, y0/a > 7.934, Fr < 0.35 | LM8 | QS/Q0 = 0.6728 − 0.012643 y0/a − 0.20633 yt/y0 − 0.7433 Fr
TABLE 8 Statistical parameters of the pruning process.

Data set | Statistical parameter | Without pruning | Only with LM3,4 pruning | Only with LM7,8 pruning | With LM6 and LM7new pruning
Train | RMSE | 0.0393 | 0.0867 | 0.0394 | 0.0420
Train | MPE | 1.8045 | 5.4721 | 1.8224 | 1.8599
Train | NSE | 0.9654 | 0.8317 | 0.9653 | 0.9605
Train | Average (DR) | 1.018 | 1.0547 | 1.0182 | 1.0186
Train | Standard deviation (DR) | 0.1565 | 0.3755 | 0.1576 | 0.1898
FIG. 9 M5 model tree after first pruning.
TABLE 9 Linear models of the first pruning.

Condition | LM | Regression equation
Fr < 0.276, y0/a > 11.91 | LM1 | QS/Q0 = 0.8724 − 0.01147 y0/a − 0.3525 yt/y0 − 1.351 Fr
Fr < 0.276, y0/a < 11.91, Fr < 0.135 | LM2 | QS/Q0 = 1.787 − 0.0479 y0/a − 0.784 yt/y0 − 1.50 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a > 4.528 | LM3 | QS/Q0 = 0.539 − 0.02126 y0/a − 0.146 yt/y0 + 0.011 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a < 4.528 | LM4 | QS/Q0 = 3.489 − 0.1578 y0/a − 2.283 yt/y0 − 2.339 Fr
Fr > 0.276, y0/a < 3.281 | LM5 | QS/Q0 = 2.482 − 0.1484 y0/a − 1.653 yt/y0 − 0.7286 Fr
Fr > 0.276, y0/a > 3.281, y0/a < 7.934 | LM6 | QS/Q0 = 0.9449 − 0.03899 y0/a − 0.4947 yt/y0 − 0.3498 Fr
Fr > 0.276, y0/a > 3.281, y0/a > 7.934 | LM7 | QS/Q0 = 0.4754 − 0.009931 y0/a − 0.2031 yt/y0 − 0.2444 Fr
FIG. 10 M5 model tree after second pruning (final M5 model tree).
TABLE 10 Linear models of the second pruning (final models).

Condition | LM | Regression equation
Fr < 0.276, y0/a > 11.91 | LM1 | QS/Q0 = 0.8724 − 0.01147 y0/a − 0.3525 yt/y0 − 1.351 Fr
Fr < 0.276, y0/a < 11.91, Fr < 0.135 | LM2 | QS/Q0 = 1.787 − 0.0479 y0/a − 0.784 yt/y0 − 1.50 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a > 4.528 | LM3 | QS/Q0 = 0.539 − 0.02126 y0/a − 0.146 yt/y0 + 0.011 Fr
Fr < 0.276, y0/a < 11.91, Fr > 0.135, y0/a < 4.528 | LM4 | QS/Q0 = 3.489 − 0.1578 y0/a − 2.283 yt/y0 − 2.339 Fr
Fr > 0.276, y0/a < 3.281 | LM5 | QS/Q0 = 2.482 − 0.1484 y0/a − 1.653 yt/y0 − 0.7286 Fr
Fr > 0.276, y0/a > 3.281 | LM6 | QS/Q0 = 0.6677 − 0.01923 y0/a − 0.3117 yt/y0 − 0.2728 Fr
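To make the final model concrete, the following minimal Python sketch (not the authors' software) routes an input through the splits of the final tree and applies the corresponding linear model, with coefficients transcribed from Table 10; the example input values are hypothetical but lie within the ranges of Table 2.

```python
# Illustrative evaluation of the final pruned M5 tree (Fig. 10 / Table 10).
# Each leaf model is stored as (intercept, coef. of y0/a, coef. of yt/y0, coef. of Fr).

LEAF_MODELS = {
    "LM1": (0.8724, -0.01147, -0.3525, -1.351),
    "LM2": (1.787,  -0.0479,  -0.784,  -1.50),
    "LM3": (0.539,  -0.02126, -0.146,   0.011),
    "LM4": (3.489,  -0.1578,  -2.283,  -2.339),
    "LM5": (2.482,  -0.1484,  -1.653,  -0.7286),
    "LM6": (0.6677, -0.01923, -0.3117, -0.2728),
}

def predict_qs_ratio(fr, y0_a, yt_y0):
    """Estimate QS/Q0 from Fr, y0/a, and yt/y0 using the final M5 model tree."""
    if fr > 0.276:
        leaf = "LM5" if y0_a < 3.281 else "LM6"
    elif y0_a > 11.91:
        leaf = "LM1"
    elif fr < 0.135:
        leaf = "LM2"
    else:
        leaf = "LM3" if y0_a > 4.528 else "LM4"
    c0, c1, c2, c3 = LEAF_MODELS[leaf]
    return c0 + c1 * y0_a + c2 * yt_y0 + c3 * fr

# Example with hypothetical input values inside the ranges of Table 2:
print(predict_qs_ratio(fr=0.15, y0_a=6.0, yt_y0=0.5))
```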
FIG. 11 Comparison between measured and predicted QS/Q0, training data set.
FIG. 12 Comparison between measured and predicted QS/Q0 for the M5 model tree and the empirical formula, testing data set.
4.3 Comparing the M5 model and the empirical formula
The performance of the M5 model tree was compared with the empirical formula proposed by Ghodsian (2003) for flow discharge through a side sluice gate under submerged flow conditions (Table 11). The superior performance of the M5 model tree over the empirical formula can be seen in all statistical parameters. The proposed formula is not straightforward and needs a trial-and-error process. The results also show that the empirical formula overestimates the QS/Q0 values.
TABLE 11 Statistical parameters for the testing data sets.

Data set | Statistical parameter | M5 model tree | Empirical formula (Ghodsian, 2003)
Test | RMSE | 0.0678 | 0.1279
Test | MPE | 2.7123 | 32.8597
Test | NSE | 0.8837 | 0.5863
Test | Average (DR) | 1.0271 | 1.3286
Test | Standard deviation (DR) | 0.2436 | 0.2190
5. Conclusions
In this chapter, some of the standard decision tree algorithms are briefly explained. Among these algorithms, the M5 model tree, which is extensively applied in the water sciences, is explained in detail and its application to the prediction of flow discharge through a side sluice gate is investigated. The model constructed herein is based on the laboratory experiments of Ghodsian (2003). In this study, QS/Q0 is selected as the dependent variable, and y0/a, Fr, and yt/y0 are selected as independent variables for designing the M5 model tree. The regression equations developed by the M5 model tree involve the same parameters used in the empirical formula proposed by Ghodsian (2003). The importance of the Froude number is also noted in Ghodsian (2003); it was therefore expected to be the root node, and in this study it indeed appeared at the top of the tree. Using different statistical parameters, the results were compared with those of the empirical formula. The results demonstrate that the M5 model tree is more accurate than the empirical formula, which requires a time-consuming trial-and-error process.
References
Bahmani, R., Solgi, A., Ouarda, T.B., 2020. Groundwater level simulation using gene expression programming and M5 model tree combined with wavelet
transform. Hydrol. Sci. J. 65 (8), 1430–1442.
Bhattacharya, B., Price, R., Solomatine, D., 2007. Machine learning approach to modeling sediment transport. J. Hydraul. Eng. 133, 440–450.
Bonakdar, L., Etemad-Shahidi, A., 2011. Predicting wave run-up on rubble-mound structures using M5 model tree. Ocean Eng. 38, 111–118.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Bui, D.T., Khosravi, K., Tiefenbacher, J., Nguyen, H., Kazakis, N., 2020a. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 742, 141568.
Bui, X.N., Nguyen, H., Choi, Y., et al., 2020b. Prediction of slope failure in open-pit mines using a novel hybrid artificial intelligence model based on
decision tree and evolution algorithm. Sci. Rep. 10 (9939). https://doi.org/10.1038/s41598-020-66904-y.
Charoenporn, P., 2017. Reservoir inflow forecasting using ID3 and C4.5 decision tree model. In: 2017 3rd IEEE International Conference on Control
Science and Systems Engineering (ICCSSE). IEEE, pp. 698–701.
Choubin, B., Darabi, H., Rahmati, O., Sajedi-Hosseini, F., Kløve, B., 2018a. River suspended sediment modelling using the CART model: a comparative
study of machine learning techniques. Sci. Total Environ. 615, 272–281.
Choubin, B., Zehtabian, G., Azareh, A., Rafiei-Sardooi, E., Sajedi-Hosseini, F., Kişi, O., 2018b. Precipitation forecasting using classification and regression trees (CART) model: a comparative study of different approaches. Environ. Earth Sci. 77, 314.
El Seddawy, A.B., Sultan, T., Khedr, A., 2013. Applying Classification Technique using DID3 Algorithm to improve Decision Support System under
Uncertain Situations. Available at: www.ijmer.com [Accessed 21 November 2015].
Etemad-Shahidi, A., Mahjoobi, J., 2009. Comparison between M5′ model tree and neural networks for prediction of significant wave height in Lake
Superior. Ocean Eng. 36, 1175–1181.
Fisher, A., Belmont, P., Murphy, B.P., Macdonald, L., Ferrier, K.L., Hu, K., 2021. Natural and anthropogenic controls on sediment rating curves in
northern California coastal watersheds. Earth Surf. Process. Landf. 46 (8), 1610–1628.
Galelli, S., Castelletti, A., 2013. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling. Hydrol. Earth Syst. Sci.
17, 2669.
Ghodsian, M., 2003. Flow through side sluice gate. J. Irrig. Drain. Eng. 129, 458–463.
Goyal, M.K., Ojha, C., 2012. Downscaling of precipitation on a lake basin: evaluation of rule and decision tree induction algorithms. Hydrol. Res. 43,
215–230.
Goyal, M.K., Ojha, C., Singh, R., Swamee, P., Nema, R., 2013. Application of ANN, fuzzy logic and decision tree algorithms for the development of
reservoir operating rules. Water Resour. Manage. 27, 911–925.
Kass, G.V., 1980. An exploratory technique for investigating large quantities of categorical data. J. R. Stat. Soc.: Ser. C: Appl. Stat. 29, 119–127.
Khosravi, K., Miraki, S., Saco, P.M., Farmani, R., 2021. Short-term river streamflow modeling using ensemble-based additive learner approach. J. Hydro
Environ. Res. 39, 81–91.
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I., Bui, D.T., 2018a. A comparative assessment of decision trees
algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 627, 744–755.
Khosravi, K., Mao, L., Kisi, O., Yaseen, Z.M., Shahid, S., 2018b. Quantifying hourly suspended sediment load using data mining models: case study of a
glacierized Andean catchment in Chile. J. Hydrol. 567, 165–179.
Khosravi, K., Melesse, A.M., Shahabi, H., Shirzadi, A., Chapi, K., Hong, H., 2019. Chapter 33: Flood susceptibility mapping at Ningdu catchment, China
using bivariate and data mining techniques. In: Extreme Hydrology and Climate Variability: Monitoring, Modelling, Adaptation and Mitigation.
Elsevier, pp. 419–434.
Khosravi, K., Cooper, J.R., Daggupati, P., Pham, B.T., Bui, D.T., 2020. Bedload transport rate prediction: application of novel hybrid data mining techniques. J. Hydrol. 585 (8), 124774.
Khozani, Z.S., Khosravi, K., Pham, B.T., Kløve, B., Wan Mohtar, W.H.M., Yaseen, Z.M., 2019. Determination of compound channel apparent shear stress:
application of novel data mining models. J. Hydroinform. 21, 798–811.
Kisi, O., Shiri, J., Demir, V., 2017. Hydrological time series forecasting using three different heuristic regression techniques. In: Handbook of Neural
Computation. Elsevier, pp. 45–65.
Kouzehgar, K., Hassanzadeh, Y., Eslamian, S., Yousefzadeh Fard, M., Babaeian Amini, A., 2021. Experimental investigations and soft computations for
predicting the erosion mechanisms and peak outflow discharge caused by embankment dam breach. Arab. J. Geosci. 14, 616.
Kumar, P., Folk, M., Markus, M., Alameda, J.C., 2005. Hydroinformatics: Data Integrative Approaches in Computation, Analysis, and Modeling. CRC
Press.
Mahtabi, G., Satari, M., 2016. Investigation of hydraulic jump characteristics in rough beds using M5 model tree. Jordan J. Agric. Sci 12, 631–648.
Mazid, M.M., Ali, S., Tickle, K.S., 2010. Improved C4.5 algorithm for rule based classification. In: Proceedings of the 9th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases. World Scientific and Engineering Academy and Society (WSEAS),
pp. 296–301.
Milanovic, M., Stamenkovic, M., 2016. Chaid decision tree: methodological frame and application. Econ. Themes 54, 563–586.
Nalarajan, N.A., Mohandas, C., 2015. Groundwater level prediction using M5 model trees. J. Inst. Eng. (India): A 96, 57–62.
Nourani, V., Davanlou Tajbakhsh, A., Molajou, A., Gokcekus, H., 2019. Hybrid wavelet-M5 model tree for rainfall-runoff modeling. J. Hydrol. Eng. 24,
04019012.
Pal, M., Singh, N., Tiwari, N., 2013. Pier scour modelling using random forest regression. ISH J. Hydraul. Eng. 19, 69–75.
Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 1, 81–106.
Quinlan, J.R., 1992. Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence. World Scientific, pp. 343–348.
Quinlan, J.R., 1993. C4.5 Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
Quinlan, J.R., 2014. C4.5: Programs for Machine Learning. Elsevier.
Reddy, M.J., Ghimire, B.N., 2009. Use of model tree and gene expression programming to predict the suspended sediment load in rivers. J. Intell. Syst. 18,
211–228.
Rezaie-Balf, M., Naganna, S.R., Ghaemi, A., Deka, P.C., 2017. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting.
J. Hydrol. 553, 356–373.
Salih, S.Q., Sharafati, A., Khosravi, K., Faris, H., Kisi, O., Tao, H., Ali, M., Yaseen, Z.M., 2020. River suspended sediment load prediction based on river
discharge information: application of newly developed data mining models. Hydrol. Sci. J. 65, 624–637.
Salzberg, S.L., 1994. In: Quinlan, J.R. (Ed.), C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., Springer.
Sattari, M.T., Sureh, F.S., 2019. Drought prediction based on standardized precipitation-evapotranspiration index by using M5 tree model. In: International
Civil Engineering and Architecture Conference (ICEARC). Trabzon, Turkey.
Sattari, M.T., Pal, M., Apaydin, H., Ozturk, F., 2013. M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey. Water Resour. 40,
233–242.
Sattari, M.T., Mirabbasi, R., Sushab, R.S., Abraham, J., 2018. Prediction of groundwater level in Ardebil plain using support vector regression and M5 tree
model. Groundwater 56, 636–646.
Sattari, M.T., Ahmadifar, V., Delirhasannia, R., Apaydin, H., 2020. Estimation of pan evaporation coefficient in cold and dry climate conditions with a
decision-tree model. Atmósfera 34 (3). https://doi.org/10.20937/ATM.52777.
Sharafati, A., Khosravi, K., Khosravinia, P., Ahmed, K., Salman, S.A., Yaseen, Z.M., Shahid, S., 2019. The potential of novel data mining models for
global solar radiation prediction. Int. J. Environ. Sci. Technol. 16, 7147–7164.
Sihag, P., Karimi, S.M., Angelaki, A., 2019. Random forest, M5P and regression analysis to estimate the field unsaturated hydraulic conductivity. Appl.
Water Sci. 9, 129.
Singh, S., Gupta, P., 2014. Comparative study of ID3, CART and C4.5 decision tree algorithms: a survey. Int. J. Adv. Inform. Sci. Technol. 27, 97–103.
Solomatine, D.P., Xue, Y., 2004. M5 model trees and neural networks: application to flood forecasting in the upper reach of the Huai River in China.
J. Hydrol. Eng. 9, 491–501.
Sullivan, W., 2017. 1: Machine Learning Beginners Guide Algorithms Supervised & Unsupervised Learning, Decision Tree & Random Forest Introduction. CreateSpace Independent Publishing Platform, South Carolina, USA.
Swamee, P.K., Pathak, S.K., Ali, M.S., 1993. Analysis of rectangular side sluice gate. J. Irrig. Drain. Eng. 119, 1026–1035.
Timofeev, R., 2004. Classification and Regression Trees (CART) Theory and Applications. Humboldt University, Berlin, Germany, pp. 1–40.
Tyralis, H., Papacharalampous, G., Langousis, A., 2019. A brief review of random forests for water scientists and practitioners and their recent history in
water resources. Water 11, 910.
Wang, Y., Witten, I.H., 1996. Induction of Model Trees for Predicting Continuous Classes, Working Paper 96/23. University of Waikato, Department of
Computer Science, Hamilton, New Zealand.
Witten, I.H., Frank, E., 2002. Data mining: practical machine learning tools and techniques with Java implementations. ACM SIGMOD Rec. 31, 76–77.
Zahiri, J., Mollaee, Z., Ansari, M.R., 2020. Estimation of suspended sediment concentration by M5 model tree based on hydrological and moderate resolution imaging spectroradiometer (MODIS) data. Water Resour. Manage. 34, 3725–3737.
Chapter 11
Entropy and resilience indices
Mohammad Ali Olyaeia, A.H. Ansarib, Zahra Heydaric, and Amin Zeynolabedind
a Department of Civil, Environmental and Geo-Engineering, University of Minnesota, Minneapolis, MN, United States; b Department of Agricultural and Biological Engineering, Pennsylvania State University, State College, PA, United States; c Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States; d School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
1. Introduction
Urbanization with its rapid population growth in the 21st century has been leading to the concentration of population and
assets in hazard-prone urban areas, a speeding-up trend not likely to slow down in any near future. The inevitable exposure
resulting from this urbanization trajectory and the embedded conditions of built environments, spatial, and industrial vulnerabilities produce disaster risks when coupled with climate change-driven natural disasters (Gencer, 2013). Settlement
patterns, urbanization, and changes in socioeconomic conditions have all influenced observed trends in exposure and vulnerability to climate extremes. These urban climate hazard risks, vulnerabilities, and impacts are increasing across the
world, specifically in regions that are not able to meet their city’s needs due to inadequate capacity, unstable governance
structures, and substandard infrastructure, built-environment, and urban services (Revi et al., 2014).
The effects of climate-related disasters are often exacerbated in cities due to interactions with urban infrastructure
systems, growing populations, and economic activities. Natural disasters and the severity of their impacts expose a need
for an enhanced policy framework, particularly in urban areas, holding the majority of the world’s population. It is essential
to understand the correlation between the effects of climate change and hazard risks in urban areas and to address integrated
strategies for disaster risk reduction. More frequent climate extreme events are being experienced in urban areas. The frequency and severity of these disasters have been intensifying in the last few decades and are projected to increase in the
coming decades likewise. The impacts of climate change put additional pressure on existing urban water systems (UWS)
and can lead to negative impacts on human health, economies, and the ecosystem. Such impacts include increased frequency of extreme weather events leading to large volumes of stormwater runoff, rising sea levels, and changes in surface
water and groundwater (Melo et al., 2010; Magrin et al., 2014; Zeynolabedin et al., 2021).
Climate patterns are not changing in our favor, but our approaches and strategies must. A number of new concepts of
disaster risk management (DRM) have gained prominence over the past decade in international development discourse.
Among the mentioned concepts, resilience has emerged as one of the core principles in sustainable urban development
(Eslamian and Eslamian, 2021). Given that more than half of the world’s population lives in urban areas and that this percentage is expected to significantly increase in upcoming decades, cities must focus attention on disaster risk reduction and
enhancing resilience (United Nations, 2004). Assuming that urban decision-makers have the necessary institutional
capacity, their ability to ensure resilient futures could be redirected through strategic development initiatives such as
effective risk management, adaptation, and urban planning systems (Folke et al., 2010; Solecki et al., 2011).
Disaster risk reduction and climate change adaptation are the cornerstones of making cities resilient to a changing
climate. Integrating these activities with a metropolitan region’s development vision requires a new, systems-oriented
approach to risk assessments and planning. Moreover, since past events can only partially inform decision-makers about
emerging and increasing climate risks, risk assessments must incorporate knowledge about current climate patterns and
future projections simultaneously (Wang et al., 2014). This revision requires urban stakeholders and decision-makers
to increase the institutional capacity of many communities and organizations to strategize and apply risk-reduction, disaster
response and recovery plans on a flexible and highly adaptive basis.
UWSs play a vital role in building resilient cities. Both natural and anthropogenic activities could exert pressure on these
systems in a way that interrupts their normal operation. Learning to be prepared for any potential hazard requires comprehensive information regarding the UWS's performance and how exactly failure could happen in such unlikely circumstances.
2. Water resource and infrastructure performance evaluation
The design and operation of UWSs require an evaluation of the performance of these systems against different stressors,
which is measured by different metrics. Reliability, resilience, vulnerability (Hashimoto et al., 1982; Fowler et al., 2003),
and risk are the most widely used performance metrics, which are very popular among UWSs researchers and engineers.
Evaluating the performance of UWSs by these metrics requires the identification of what constitutes an unsatisfactory state
or failure (Asefa et al., 2014). Failure means that the system is unable to perform its required function and the definition of
failure varies from system to system. For example, in a drainage system, hydraulic overloading caused by extreme rainfall
could be considered a failure (Mugume et al., 2015). In a water distribution system, pressure reduction caused by component failures (pumps and pipes) or by large demands could be taken into account as a failure (Setiadi et al., 2005).
In the context of urban sewer and treatment systems, the overflow of effluent with a concentration exceeding the capacity
of receiving water is considered a failure. In a water supply system, failure occurs when supply cannot meet demand
(hatched areas in Fig. 1).
UWS has long been designed based on the “fail-safe” approach, i.e., to provide a high level of reliability over design life
for an acceptable level of risk. Reliability refers to the capacity of the system to minimize the frequency of its failures under
design condition (Butler et al., 2014). In Fig. 1, the suppliers are in either satisfactory, S, or unsatisfactory (hatched areas),
U, states, depending on whether they meet a constant water demand, D, for which reliability is defined as follows:
Rel = P(Vt ∈ S)    (1)

where Vt represents the volume of water supplied at time t. In other words, the ratio of the total time the system operates successfully, i.e., Vt > D, to the total operating time, T, or simply the probability of successful operation, P(Vt ∈ S), is called reliability (Jung et al., 2014).
Although reliability is one of the key elements in the design, operation and maintenance of UWSs, relying on this metric
alone is not sufficient to evaluate the performance of these systems against all stressors (Butler et al., 2017). For example, in
urban drainage systems, Park et al. (2013) show that the reliability approach for evaluation of structural failure is not sufficient due to the unknown mechanisms of failure. Reliability-based design does not include all the statistics of system
performance (Asefa et al., 2014; Hashimoto et al., 1982). Reconsidering Fig. 1 for example, supplier 1 does not meet
the demand between t5 and t6 and supplier 2 fails to meet this demand between t1 and t2 and between t3 and t4. The total
failure duration of the two suppliers is the same during operation and thus their reliabilities are equal to each other.
However, the failure magnitude of supplier 1 is greater than that of supplier 2, which needs to be considered in characterizing the system performance. In addition, increasing the reliability of UWS does not necessarily lead to a reduction in
the level of service failures when subject to exceptional conditions (Sweetapple et al., 2018).
Uncertainties in exceptional conditions have led to criticism of the “fail-safe” approach. According to critics, system
failures are unavoidable and water resources and infrastructure need to be designed not only reliable under design condition
but also to be resilient to unexpected stressors (Butler et al., 2014; Mugume et al., 2015). Conventionally, the system’s
FIG. 1 Failure in supplying downstream water demand (two alternative suppliers).
behavior is evaluated against stresses outside of design condition by risk analysis. Risk is defined as a function of probability and magnitude of unsatisfactory state or failure. Based on Fig. 1, the risk is formulated as follows:
Risk = P(Vt ∈ U) × M(U)    (2)

where P(Vt ∈ U) indicates the probability of failure, and M(U) represents the severity or magnitude of this failure (the maximum gap between supply and demand).
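As a simple illustration, the following Python sketch (with assumed supply and demand values) estimates reliability (Eq. 1) and the risk measure of Eq. (2) from a supplied-volume series and a constant demand, following the satisfactory/unsatisfactory states of Fig. 1.

```python
# Illustrative sketch of Eqs. (1) and (2) for a supply time series V and constant demand D.
import numpy as np

def reliability_and_risk(V, D):
    V = np.asarray(V, dtype=float)
    satisfied = V >= D                               # satisfactory state S: supply meets demand
    rel = satisfied.mean()                           # Eq. (1): P(Vt in S)
    p_fail = 1.0 - rel                               # P(Vt in U)
    m_u = np.max(np.where(~satisfied, D - V, 0.0))   # M(U): maximum supply-demand gap
    risk = p_fail * m_u                              # Eq. (2)
    return rel, risk

# Example: demand of 10 units with two short failure periods
rel, risk = reliability_and_risk([12, 11, 9, 8, 12, 13, 10, 7], D=10)
print(rel, risk)
```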
Risk analysis is an efficient tool in the toolbox of UWS designers for evaluating the performance of systems in exceptional conditions. However, on many occasions, sufficient data are not available to describe risk (Ansari et al., 2021) and
even if there is enough recorded data of stressors, nonstationarity prohibits the accurate estimation of risk due to changes in
the response of systems to stressors (Karamouz and Mohammadi, 2020; Panos et al., 2021). Therefore, the records used in
risk analysis are not reliable for the prediction of future performance (Park et al., 2013).
Limitations in reliability and risk analysis (Sweetapple et al., 2018) and the inability of conventional design approaches
for addressing uncertainties have led to a transition in the design approach from “fail-safe” to “safe-fail.” In the “safe-fail”
approach, the different modes of system failure are investigated regardless of causal events and the system is designed to
absorb these stressors at least for a short time, and if it fails, it can recover quickly. In the “safe-fail” approach, achieving
maximum resilience is considered as a goal in multiobjective design optimization (Mohammadiun et al., 2018) and complements reliability and risk analysis (Kjeldsen and Rosbjerg, 2004; Sweetapple et al., 2018).
Sustainability in recent years has been recognized as an overarching goal in the design and operation of UWS. Sustainability refers to the capacity of the system to maintain its long-term performance while maximizing its economic, social,
and environmental goals (Sweetapple et al., 2019; Tavakol-Davani et al., 2019). In the light of achieving this goal under
uncertain circumstances and paradigm shift from “fail-safe” to “safe-fail,” concepts such as entropy and resilience have
been highlighted by researchers. Entropy addresses uncertainty to provide a basis for risk and reliability analysis while
resilience is crucial toward more sustainable urban water systems under uncertainty (Diao et al., 2016; Ahern, 2011). Basically, these two concepts became popular in water resource management in response to the limited data and incomplete
information. In the following, each of these concepts will be discussed and their applications will be reviewed in the environmental and water resources area.
3. Entropy
Water resources systems are inherently complex systems and are stochastic in nature due to the randomness of hydrological
processes and climate forces. Therefore, these systems require a stochastic description and probabilistic methods make such
a description explicitly possible. However, the lack of sufficient data, incomplete information, and small sample sizes challenge the estimation of probability distributions. Entropy theory alleviates this problem by providing least biased probability distributions with such limited data.
The concept of entropy was originally introduced by Clausius in thermodynamics. This term has a statistical and probabilistic nature and is interpreted as a measure of the amount of chaos, which shows the macroscopic behavior of a system in
a thermal equilibrium state. Entropy can be examined in three different but related areas: thermodynamic entropy,
statistical-mechanical entropy and information entropy.
3.1 Thermodynamic entropy
Thermodynamic entropy shows the state of systems in thermal equilibrium. In the equilibrium state, the entropy is at its
maximum and the entropy production per unit mass is at its minimum. This principle justifies the behavior of many natural
systems, such as hydrological and river morphological processes.
In recent decades, thermodynamic entropy has been considered by researchers to assess the sustainability of urban
systems. The relationship between the second law of thermodynamics and the degradation of natural resources was first
investigated by Georgescu-Roegen (1993). Subsequently, many researchers have tried to justify urban sustainability,
human environmental and economic activities (Daly, 1992), degradation of resources and the flow of matter and energy
upon the light of entropy and the second law of thermodynamics. According to the second law of thermodynamics, heat
cannot by itself transfer from a colder to a warmer body. This law states that all real processes are irreversible. In an irreversible process, work is lost and leads to the production of additional entropy in an isolated system. So the changes of total
entropy in the system, unlike in reversible processes, is not zero but positive, ΔS > 0. Today, many problems in the urban
basin and climate are linked to increased material entropy due to the irreversible degradation of resources and impossibility
of complete recycling of matter (Purvis et al., 2017).
192
Handbook of hydroinformatics
3.2 Statistical-mechanical entropy
In 1870, Boltzmann examined thermodynamic entropy at the molecular level using statistical mechanics. Each molecule in
a system can move at a set of discretized energy levels. As the temperature of a system rises, the molecules can move at
higher energy levels. Therefore, with increasing temperature, i.e., entropy of the system, the number of energy levels
available for molecules increases and the probability that the system will be found at a certain level of energy is reduced.
In other words, uncertainty about the state of the system increases. Lewis and Randall (1961) state that the most probable
distribution of energy in the system is such that the entropy of the whole system is at its maximum level. This theory, the so-called "principle of maximum entropy," has been applied to a variety of problems in water resource management including
the derivation of probability distributions and parameter estimation.
3.3 Information entropy
The concept of information entropy was formed in Shannon’s (Shannon, 1948) mind in response to the question of how to
quantify the uncertainty of an information system. Shannon's entropy is not unrelated to the concept of statistical-mechanical or Boltzmann's entropy. The entropy of a random process shows how uncertain we are about the outcome
of this process. So the more unpredictable the outcome of a process and the more uncertainty about its outcome, the more
entropy there is, and vice versa. Shannon used a logarithmic scale to measure the uncertainty of a random process or the
difficulty of guessing its result.
H(X) = −Σ P(X) log P(X)    (3)

where H(X) indicates the information entropy of a random variable X = {x1, x2, …, xn} with probability distribution P(X).
Consider, for example, tossing a coin. The sample space of this random event includes tail and head, S ¼ {T, H}. If the
coin is perfect, the probability of getting heads and tails on the toss is the same and equal to 0.5 and in this case, entropy is
equal to 1. If the coin is defective and the probability of getting heads on the toss approaches 1 or 0, uncertainty about the
outcome of this random event decreases. Fig. 2 shows the entropy for the different probabilities of getting heads on the toss.
According to Fig. 2, in extreme cases, i.e., when the coin toss is definitely tail or head, tossing the coin will not add any
information to us. Because we already knew the result of the event with certainty and coin toss is no longer a random event.
If the probabilities of getting tail and head on the toss are equal, or in other words, the probability distribution is uniform,
uncertainty about the outcome of the coin toss will be maximized in this case. Among the various distributions, uniform
distribution has the highest degree of uncertainty or entropy and is used when information is very limited, followed by
Gaussian, logistic, Laplace, and extreme-value distributions (Mukherjee and Ratnaparkhi, 1986).
Entropy has numerous applications in water resource management particularly in case of joint and conditional probabilities. Regarding the joint probability, consider two random events, X and Y. We show the joint probability of these two
events as P(X, Y) which indicates the probability that these two events occur simultaneously. The joint entropy of X and Y is
calculated based on their joint probability as follows:
FIG. 2 Entropy of tossing a coin with different probability of getting heads on the toss. (Adapted from Cover, T.M., Thomas, J.A., 1991. Information theory and statistics. In: Elements of Information Theory. John Wiley & Sons, New York, pp. 279–335.)
H(X, Y) = −Σ_{i,j} P(X, Y) log P(X, Y)    (4)
It can be shown that the entropy of the joint occurrence of these two events is never greater than the sum of the entropies of the individual events; only if the events are independent of each other is the entropy or uncertainty of their simultaneous occurrence equal to that sum:

H(X, Y) ≤ H(X) + H(Y)    (5)
If X and Y are not independent of each other, the conditional probability of Y given X, according to Bayes' law, is written as Eq. (6):

P(Y|X) = P(X ∩ Y)/P(X)    (6)
It can be expected that if two random events are not independent, observing the occurrence of each of them will lead to a
reduction in uncertainty in predicting the outcome of the other event. With this background on conditional entropy, we
calculate the conditional entropy of Y given X. Assuming that X and Y are two dependent random events, the entropy
of Y given X is calculated as follows:
H(Y|X) = H(X, Y) − H(X)    (7)
where H(Y j X) is the conditional entropy which measures the uncertainty of Y remaining after knowing X. According to
Eq. (7), the joint entropy of the two random events X and Y is equal to the entropy of event X plus the entropy of event
Y conditional upon the knowledge of event X, i.e.,
H(X, Y) = H(X) + H(Y|X)    (8)
Transinformation measures the redundant or mutual information between X and Y, which can be calculated as the difference
between the total entropy and the joint entropy of X and Y (Eq. 9).
T(X, Y) = H(X) + H(Y) − H(X, Y)    (9)
where T(X, Y) indicates the transinformation between X and Y. By inserting Eq. (8) into Eq. (9), the transinformation can be
calculated in terms of conditional entropy (Eq. 10).
T(X, Y) = H(X) − H(X|Y)  or  T(X, Y) = H(Y) − H(Y|X)    (10)
Transinformation or mutual information is an important variable in the design of monitoring networks. Entropy applications in water resource systems management will be reviewed later.
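The following Python sketch (using a small hypothetical joint probability table) illustrates Eqs. (3), (4), (7), and (9) for two discrete variables; it is a minimal example, not part of the chapter's material.

```python
# Illustrative sketch: Shannon entropy, joint entropy, conditional entropy, and
# transinformation for two discrete variables with a hypothetical joint distribution.
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (log base 2, Eq. 3)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution P(X, Y): rows = states of X, columns = states of Y
P_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

H_xy = entropy(P_xy.ravel())        # joint entropy, Eq. (4)
H_x = entropy(P_xy.sum(axis=1))     # marginal entropy of X
H_y = entropy(P_xy.sum(axis=0))     # marginal entropy of Y
H_y_given_x = H_xy - H_x            # conditional entropy, Eq. (7)
T_xy = H_x + H_y - H_xy             # transinformation, Eq. (9)

print(H_x, H_y, H_xy, H_y_given_x, T_xy)
# A fair coin (p = 0.5) gives an entropy of exactly 1 bit, matching Fig. 2.
```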
3.4 Application of entropy in water resources area
Entropy has been applied to a variety of problems in water resource systems planning and management, including the derivation of distributions (Papalexiou and Koutsoyiannis, 2012), parameter estimation (Chen and Singh, 2018), streamflow
forecasting (Cui and Singh, 2015; Darbandsari and Coulibaly, 2020), the hydrologic cycle and water budget (Kleidon and
Schymanski, 2008), design of hydrological and water quality networks (Xu et al., 2018), channel hydraulics (Greco et al.,
2014), subsurface hydrology (Barbe et al., 1994), morphology (Ranjbar and Singh, 2020), reliability of water resource
systems (Setiadi et al., 2005; Tanyimboh, 2017), and risk analysis (Mobley et al., 2019; Liu et al., 2019; Qiu et al.,
2021). Some researchers such as Fistola (2011) and Pelorosso et al. (2017) used entropy in the sense of thermodynamics
in assessing the sustainability of urban development. In this sense, entropy has justified many hydrological and watershed
processes and has been considered by researchers such as Reggiani et al. (1998) in hydrological and watershed modeling.
The entropy exchange in these models, along with the balance equations of mass, momentum, and energy, is taken into
account for a hydrological system.
The principle of maximum entropy which originates from statistical entropy has also many applications in water and
environmental engineering. This principle in water and environmental engineering can be expressed as follows: when statistical inference is based on limited data and small samples, the probability distribution to be drawn should have the
maximum entropy based on this available information. This is equivalent to maximizing the information entropy. This
principle has been used to derive a variety of distributions that are widely used in hydrology and water resources, and
194
Handbook of hydroinformatics
parameter estimation. For example, Dong et al. (2013) employed the principle of maximum entropy to derive the bivariate
distribution of significant wave heights and the corresponding peak periods. Zhang et al. (2020) and Swetapadma and Ojha
(2021) applied this principle for parameter estimation in flood frequency analysis to minimize error and bias arising from
sampling methods and the selection of distribution models. In system engineering, the principle of maximum entropy provides a basis for risk and reliability analysis by approximating least-biased distribution tails from limited data (Zhang et al.,
2020; Singh, 1997).
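As a brief illustration of the principle, the following Python sketch (with a hypothetical mean constraint, in the spirit of Jaynes' dice example) finds the least-biased discrete distribution on {1, …, 6} by maximizing Shannon entropy subject to a prescribed mean; it is only a sketch, not a method used in the cited studies.

```python
# Illustrative sketch of the principle of maximum entropy: among all distributions on
# {1,...,6} with mean 4.5, find the one with maximum Shannon entropy.
import numpy as np
from scipy.optimize import minimize

values = np.arange(1, 7)

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))        # negative entropy (to be minimized)

constraints = (
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},            # probabilities sum to 1
    {"type": "eq", "fun": lambda p: (p * values).sum() - 4.5},  # prescribed (hypothetical) mean
)
res = minimize(neg_entropy, x0=np.full(6, 1 / 6), bounds=[(0, 1)] * 6,
               constraints=constraints)
print(res.x)   # exponential (Gibbs-like) shape, tilted toward the larger outcomes
```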
Information entropy has found the most applications in water resource engineering. In designing hydrometeorological
networks (Xu et al., 2018; Wang et al., 2018) and water quality monitoring stations (Boroumand et al., 2018; Singh et al.,
2019; Banik et al., 2015), the optimal location of stations is determined based on information entropy by minimizing the
transinformation among stations. In risk analysis, the key influencing factors are identified based on information entropy
and their relative importance is determined based on entropy weight method (Ziarh et al., 2021; Liu et al., 2019;
Malekinezhad et al., 2021). Another application of information entropy in water resource system engineering is the estimation of the reliability of water distribution networks (Tanyimboh and Templeman, 1993), where the entropy acquired
another meaning, more connected to the redundancy. In other words, network entropy increases with adding pipes and
closing loops. This ensures that the flow of a node is supplied from alternative routes in case of failures, and that system
reliability is increased. The entropy of a water distribution network can be calculated as follows (Atkinson et al., 2014):
S = −Σ_{j∈IN} (Qj/T) ln(Qj/T) − (1/T) Σ_{j=1}^{J} Tj [(Qj/Tj) ln(Qj/Tj) + Σ_{i∈Nj} (qij/Tj) ln(qij/Tj)]    (11)

where j indicates a node, J is the number of nodes, and IN is the set of source nodes. T is the total supply and Tj is the total flow reaching node j, including any external inflow, while Nj represents all nodes immediately upstream of and connected to node j. Qj is the demand at demand nodes or the supply at source nodes, and qij is the flow rate in pipe ij. According to Eq. (11), increasing the number of source nodes and pipes upstream of the demand nodes results in higher entropy or redundancy of the network, and hence, higher reliability.
4. Resilience
The resilience concept originates from the field of ecology in the 1970s and refers to a natural system's persistence when facing natural or anthropogenic disturbances (Holling, 1973; Dong et al., 2017). A resilient ecosystem is capable of retaining its functionality when exposed to external stress by changing its structure. This concept later gained prominence in the engineering field in the mid-1990s, as the classical view of designing engineering systems in a way that prevents failure was challenged. A new paradigm considers extreme conditions as an opportunity for a system to adapt and reorganize (Juan-García et al., 2017).
In all previous work regarding resilience in engineering systems, particular attention has been paid to how resilience is implemented in practice. This effort is rather complex, taking into account the fact that urban systems encompass both technological and social processes within the resilience concept (Wang and Blackmore, 2009). Bruneau et al. (2003) demystify this seemingly vague concept by defining the dimensions of resilience in physical and social systems. They state that a resilient system must show a noticeable reduction in its failure probabilities, consequences, and recovery time, and accordingly define the 4R terms, i.e., redundancy, resourcefulness, rapidity, and robustness, as the dimensions of resilience. They further conceptualize resilience from different perspectives and break it into technical, organizational, social, and economic aspects.
Basically, the resilience literature provides two approaches to quantify this concept: (1) metrics that characterize the inherent properties of a resilient system (attribute-based approach), and (2) equations that monitor and evaluate the performance of a system when it is exposed to extreme stresses (performance-based approach) (Hosseini et al., 2016; Karamouz and Hojjat-Ansari, 2020). The underlying difference between these two methods lies in how resilience is perceived. While the attribute-based approach mainly concentrates on the system's properties, and sometimes these properties are suggested as indicators, the performance-based approach focuses on the ultimate objective of resilient systems, which is providing the specified services in an efficient and continuous way. Although there is clearly a relationship between a system's properties and its performance, the details of this impact are not completely known (Butler et al., 2017). It is believed that highly resilient systems can be achieved when a system is divided into various hierarchically organized subsystems in the so-called centralized control and decentralized execution (CCDE) mode (Diao, 2021).
The typical performance curve of an engineering system during normal and extreme conditions is shown in Fig. 3.
During normal conditions, there is a fluctuation in performance as a result of changes in forcing data, malfunction in
FIG. 3 Schematic presentation of the system's performance curve during normal and extreme conditions.
properties and so on. With the onset of a natural or manmade hazard at t0, the system’s performance starts declining until the
point when the stressor terminates (t1). The system needs a time called recovery time to return to its normal operation. The
following equations could be used to quantify resilience based on the performance curve.
r = ∫_{t0}^{t2} [P0 − P(τ)] dτ,  t ∈ [t0, t2]    (12)

Res = 1 − ( ∫_{t0}^{t2} [P0 − P(τ)] dτ ) / ( P0 (t2 − t0) )    (13)
where P0 is the mean state of the system's performance during normal conditions, P(τ) is the value of performance at measurement time τ, and t0 and t2 are the times at which the perturbation starts and ends, respectively. The Res metric value depends on the shaded area in Fig. 3; the larger the area, the less resilient the system.
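The following Python sketch (with a hypothetical performance series) computes the resilience index of Eq. (13) by trapezoidal integration of the performance deficit over the perturbation period; it is an assumed illustration rather than a prescribed implementation.

```python
# Illustrative sketch of Eqs. (12)-(13): resilience from a sampled performance curve.
import numpy as np

def resilience(times, performance, p0, t0, t2):
    """Res = 1 - (integral of [P0 - P(t)] over [t0, t2]) / (P0 * (t2 - t0))."""
    t = np.asarray(times, dtype=float)
    p = np.asarray(performance, dtype=float)
    mask = (t >= t0) & (t <= t2)
    deficit = p0 - p[mask]                         # performance deficit relative to P0
    lost = np.trapz(deficit, t[mask])              # shaded area in Fig. 3 (Eq. 12)
    return 1.0 - lost / (p0 * (t2 - t0))           # Eq. (13)

# Hypothetical performance series: full service (1.0) with a dip during a hazard
t = np.arange(0, 11)
p = np.array([1, 1, 1, 0.6, 0.4, 0.5, 0.7, 0.9, 1, 1, 1])
print(resilience(t, p, p0=1.0, t0=2, t2=8))
```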
Some studies argue that metrics based solely on the area of the performance curve are not sufficient and that other metrics representing the intensity and duration of the perturbation must also be taken into consideration (Sweetapple et al., 2017; Olyaei et al., 2018).
Resilience can be categorized in numerous ways. The first classification is attribute-based versus performance-based. Attribute-based or general resilience considers the system as a whole and refers to a system state that strengthens it to limit the failure magnitude and duration under any threat. Performance-based or specified resilience, on the other hand, focuses on a specified threat and refers to the agreed performance of the system in reducing failure magnitude and duration (Scholz et al., 2012; Butler et al., 2014; Olyaei et al., 2018). Another classification distinguishes engineering resilience from ecological resilience (Holling, 1996). With the latter considered a more appropriate theoretical framework for management (Liao, 2012), the application of resilience in both connotations is reviewed further on. Global resilience analysis (GRA) is another way to characterize resilience, which shifts the focus from the threat to performance alone by considering numerous and comprehensive sets of failure scenarios (Mugume et al., 2015).
It should be noted that resilience is interwoven with other terms such as reliability, risk, and sustainability. The goal of resilience is to maintain a satisfactory state of the system under exceptional conditions and to quickly recover once failure occurs (Park et al., 2013; Butler et al., 2014). In contrast to reliability and risk analysis, in which it is necessary to identify hazards and characterize probabilities, resilience analysis of a system can be performed for highly improbable and even unobserved stressors (Fig. 4). Therefore, resilience analysis considers a wider range of stressors and provides greater scope than risk analysis (Sweetapple et al., 2018).
4.1 Application of resilience in water resources area
Though the idea of resilience has a long history in engineering and ecology, its application to natural hazard management is
relatively recent (Berkes, 2007). Various strategies are used to reduce disaster risks and build resilience as well as to adapt
to climate change in urban areas. However, there is frequently a disconnection between climate change adaptation and
FIG. 4 "Fail-safe" design approach.
disaster risk reduction research communities and a lack of collaborative, integrated application of adaptation strategies in these areas (Solecki et al., 2011). This is due to differences of emphasis between disaster risk and climate change research, with the former focused on the past and present, and the latter on the impacts of future risk (Thomalla et al., 2006; UNDP, 2004; Gencer, 2008). The application of resilience is what is needed to fill this gap; learning from the past and preparing for the future is what underlies the notion of resilience. Since resilience is a multidisciplinary term, it has been used in various areas of research. Regarding the water resources area in general, the previous studies can be categorized into two distinct sections: resilience in UWS and resilience in the urban environment or ecology. The first typically focuses on specific water engineering systems, particularly urban wastewater systems, while the second deals with ways to build resilient urban environments against natural hazards, including flood and drought.
4.2 Resilience in UWS
The wastewater treatment plant (WWTP) is a strategic infrastructure with numerous purposes, such as protecting the environment and providing new resources in water-scarce areas (Karamouz et al., 2018a). This infrastructure is exposed to natural and man-made stressors that may impact its efficient performance. The satisfactory performance of a WWTP is assessed by monitoring its effluent quality variables (such as BOD, COD, TSS, TN, TP) and ensuring that they do not violate the standard values. These standards come from a so-called "permitting" approach that controls the risk imposed by wastewater treatment systems. The standard values are determined based on the estimation of the impact of releasing wastewater to the environment. These permits are becoming stricter as protecting our environment gains special significance; however, complying with strict regulations is costlier (Meng et al., 2016).
As stated in the definition of resilience, the performance of a system can be studied under a specified threat, or the attention can be placed on the mode of system failure regardless of the type of threat. Failure of a WWTP refers to the times when the effluent exceeds the standards, and this situation can happen as a result of two types of failure modes: (1) structural failure, which refers to malfunction of WWTP components such as pumps, tanks, or pipes; and (2) operational failure, which relates to component overloading, such as solids washout in the aeration tank. Both failure modes result in the inability of the failed components to deliver their desired function and eventually lead to failure of the whole system, i.e., the WWTP (Mugume et al., 2015). Internal and external failure is another categorization of WWTP failure modes (Sweetapple et al., 2019). Internal failure specifies component failure, which can be quantified by the percentage loss of function. External failure refers to changes in the sewer influent characteristics.
Sweetapple et al. (2017) assessed the performance of a WWTP against some predefined influent perturbations such as
increases in the flow rate, total nitrogen concentration, COD concentration, and temperature, and presented a general
framework for designing a resilient and reliable WWTP.
Flooding is an example of a natural hazard that could paralyze the normal serviceability of a WWTP. There are two
stressors in flooding that could cause the malfunction of the system: (1) Enormous increase in the influent discharge that
might go beyond the capacity of the plants’ unit operations such as settling and aeration tanks; and (2) the inundation depth
that could cause malfunction in different unit operations. Olyaei et al. (2018) assessed the performance of a hypothetical
WWTP under both structural and operational failures by assessing resilience from three perspectives: based on the area of
the performance curve, the failure magnitude, and failure duration. They showed that the effect of flooding on various
effluent quality variables is disproportionate; while TSS and TN experience a noticeable impact, the changes in BOD and COD are negligible. In another study, Olyaei and Karamouz (2020) showed that the biological parameters in WWTP
modeling could go through remarkable changes in flooding condition; therefore, uncertainty analysis is a useful tool to
quantify and capture these changes. In Fig. 5, the variation in TSS is shown, in which the rising in the time of flooding
is prominent.
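Resilience assessed from a performance curve, as in the three perspectives above, can be summarized with a few simple quantities. The sketch below is illustrative only: it assumes a normalized performance series (1 = full compliance with the effluent standards) sampled at a fixed time step, and it is not the exact formulation used by Olyaei et al. (2018).

```python
import numpy as np

def performance_curve_metrics(performance, dt=1.0, target=1.0):
    """Summarize a normalized performance curve P(t) from three perspectives:
    area under the curve, failure magnitude, and failure duration."""
    p = np.asarray(performance, dtype=float)
    area_resilience = p.mean() / target             # time-averaged performance relative to the target
    failure_magnitude = max(0.0, target - p.min())  # deepest drop below the target
    failure_duration = dt * np.count_nonzero(p < target)  # total time spent below the target
    return area_resilience, failure_magnitude, failure_duration

# Hypothetical hourly WWTP performance during a flood: full service,
# a drop to 40% of the target, then gradual recovery.
p = [1.0, 1.0, 0.8, 0.4, 0.5, 0.7, 0.9, 1.0, 1.0]
print(performance_curve_metrics(p, dt=1.0))
```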
To improve resilience, some measures called interventions should be implemented. These interventions include a wide range of actions with manifold effects on the system's long-term performance, i.e., its sustainability. Interventions typically divide into two classes: design and operational control.
Interventions based on design usually concentrate on enhancing the capacity of storage tanks preceding the WWTP and
the settling tanks by inserting backup tanks. In the case of flooding, there are other measures as well such as flood proofing
and elevating the equipment, providing backup power generation for pumping stations and emergency response such as
sandbagging (NYCDEP, 2013). These interventions sometimes require a considerable budget, for which optimal allocation is essential (Karamouz et al., 2018a). The other types of interventions are based on altering the operational control, such as the return flow or the aeration rate.
In an urban water distribution network, as another component of UWS, the performance of the system is measured by
the capacity to provide sufficient pressure and flow with appropriate quality. Different failure modes of the system include
structural failures such as pipe collapse due to traffic load, land subsidence, asset aging and decay, and operational failures
such as pump failure due to repairs or power outages, excess demand under fire fighting conditions and the intrusion of
chemical substances. The impact of such failures on the hydraulic and qualitative properties of the system can be simulated
in EPANET. Considering three common types of failure, including pipe failure, excess demand for firefighting, and pollutant intrusion, Diao et al. (2016) examined the performance loss of these networks. The resilience of these networks against external pollutants, for example, was assessed by injecting a contaminant into a set of network nodes and
evaluating its effect on the quality of the flow throughout the networks. Their study shows that increasing resilience to one
stressor may reduce the resilience of the system to other stressors. Therefore, in resilience analysis, evaluating the performance of the system against a variety of potential hazards is inevitable, which is referred to as general resilience assessment.
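The "one stressor at a time" logic of such studies can be prototyped with open-source tools. The sketch below assumes the Python package wntr (an EPANET wrapper) and a hypothetical input file, network.inp; it simply closes each pipe in turn and counts junctions whose pressure drops below a service threshold, which is only a rough stand-in for the performance-loss measures used in the cited work.

```python
import wntr  # assumption: the open-source wntr package is available

INP_FILE = "network.inp"   # hypothetical EPANET model, not from the cited studies
MIN_PRESSURE = 20.0        # hypothetical minimum service pressure (m)

def deficient_junctions(wn):
    """Run an EPANET simulation and count junctions that ever fall below the threshold."""
    results = wntr.sim.EpanetSimulator(wn).run_sim()
    pressure = results.node["pressure"].loc[:, wn.junction_name_list]
    return int((pressure.min(axis=0) < MIN_PRESSURE).sum())

baseline_model = wntr.network.WaterNetworkModel(INP_FILE)
baseline = deficient_junctions(baseline_model)

# Close each pipe in turn (a simple structural-failure scenario) and record
# how much additional service loss it causes relative to the intact network.
impact = {}
for pipe_name in baseline_model.pipe_name_list:
    wn = wntr.network.WaterNetworkModel(INP_FILE)  # fresh model per scenario
    wn.get_link(pipe_name).initial_status = wntr.network.LinkStatus.Closed
    impact[pipe_name] = deficient_junctions(wn) - baseline

worst = sorted(impact, key=impact.get, reverse=True)[:5]
print("Pipe failures causing the largest service loss:", worst)
```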
Recent studies regarding resilience in water distribution networks highlight the role of network topology on the way the
systems could recover after a failure. For example, Meng et al. (2018) presented various metrics under six categories: connectivity, efficiency, centrality, diversity, robustness, and modularity. In particular, modularity, or system decomposition, was found to be imperative for system resilience, i.e., the ability of a system to be decomposed into multiple modules or subsystems with stronger internal connections than external connections (Diao et al., 2021).
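Modularity, in particular, can be computed directly from the network graph. The sketch below uses the networkx package on a toy junction-pipe graph; it only illustrates the notion of decomposable modules and is not taken from the cited studies.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy pipe network: nodes are junctions, edges are pipes (illustrative only).
G = nx.Graph()
G.add_edges_from([
    ("J1", "J2"), ("J2", "J3"), ("J3", "J1"),   # first loop
    ("J4", "J5"), ("J5", "J6"), ("J6", "J4"),   # second loop
    ("J3", "J4"),                               # single tie between the loops
])

# Detect modules (densely connected groups of junctions) and score how
# decomposable the network is; values closer to 1 indicate stronger modularity.
communities = greedy_modularity_communities(G)
print("modules:", [sorted(c) for c in communities])
print("modularity:", round(modularity(G, communities), 3))
```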
FIG. 5 Typical effluent TSS concentration during flooding conditions (the standard limit is depicted with a dashed line) (Olyaei et al., 2018).
FIG. 6 Resilience analysis of urban drainage system in SWMM (before and after failure scenario).
In urban drainage systems, resilience is targeted against structural failures, such as pipe collapse and blockage, and operational failures caused by extreme rainfall events and land use change (Panos et al., 2021), which lead to hydraulic overloading of the systems and urban flooding. Mugume et al. (2015) evaluated the resilience of these systems against structural
failures with a global resilience approach. Simulation of these systems was performed through dynamic flow routing in the
SWMM environment. Pipe collapse can be simulated by removing pipes from the network, and blockage by increasing their Manning roughness coefficient (Fig. 6). To measure the resilience of these systems through classical resilience
formulas (Eqs. 12 and 13), the total flood volume and average flood duration due to the failure of system components are
defined as the magnitude and duration of failure. Dong et al. (2017) go one step further in evaluating the resilience of
drainage systems and, in addition to urban flooding (social severity), include sewer overflow (environmental severity),
and the operation of downstream wastewater treatment plant (technological severity) in the formulation of resilience
assessment of these systems.
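As a rough illustration of how such magnitude-and-duration measures combine, the sketch below implements one common severity-based index in the spirit of global resilience analysis; the exact Eqs. (12) and (13) referenced above are not reproduced here, and the SWMM output values are hypothetical.

```python
def severity_based_resilience(flood_volume, total_inflow,
                              mean_flood_duration, elapsed_time):
    """Residual resilience index in [0, 1] for a drainage failure scenario.

    One common formulation combines the relative magnitude of failure
    (flooded share of the total inflow) with its relative duration;
    this is an illustrative stand-in, not Eqs. (12) and (13) verbatim.
    """
    severity = (flood_volume / total_inflow) * (mean_flood_duration / elapsed_time)
    return 1.0 - severity

# Hypothetical SWMM outputs for a pipe-blockage scenario (the blockage itself
# is represented in the model by a very high Manning roughness on the pipe):
print(severity_based_resilience(
    flood_volume=1200.0,       # m3 flooded from manholes
    total_inflow=25000.0,      # m3 of runoff entering the system
    mean_flood_duration=2.5,   # hours
    elapsed_time=24.0,         # hours of simulation
))
```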
Simultaneously with the transition from the "fail-safe" to the "safe-to-fail" approach, researchers have evaluated the impacts of interventions on the resilience of urban infrastructure. Interventions in drainage systems are implemented with the aim of
achieving the best system performance and maximum resilience according to various criteria such as minimizing the frequency and volume of combined sewer overflow (CSO) or reaching the quality standard of effluent of wastewater treatment
plants. For example, increasing the capacity of pumps and the number of storage tanks are among the resilience-enhancing interventions for drainage systems. In general, as the number of storage tanks in drainage systems increases, flooding subsides and resilience increases (Wang et al., 2021). Considering the social, economic, and environmental aspects of the
failure of urban drainage systems, Sweetapple et al. (2019) found that by changing the type and magnitude of hazards,
resilience-enhancing interventions beyond a certain point, the so-called tipping point, will not lead to the sustainability of these systems. For example, their studies showed that increasing the capacity of pumps in integrated drainage systems for different threat types and severities will not always result in sustainability.
Although much research has been done in the last two decades on the resilience of urban systems and resilience-enhancing interventions, the scope of this research is not limited to urban infrastructure, but has found broader dimensions
at the scale of urban environments, which will be discussed below.
4.3 Resilience in urban environments
The concepts of urban ecology and resilience are framed by the interrelationships between communities and the natural and
built environments at local, regional and global scales. The dynamic between these changing entities is fundamental to
resilience thinking and underpins the intentions of resilience: to understand and strengthen a city’s capacity to mitigate,
adapt to, and recover from internal and external shocks and stresses. Urban ecosystems are important components when
building urban resilience through their ability to absorb climate-induced shocks and ameliorate the worst effects of extreme
climate events (McPhearson et al., 2015). Despite the increasing attention given to the concept of resilience in hazard management and urban ecology, what defines resilience to natural disasters remains ambiguous. This section addresses flooding and drought to develop a rigorous definition of "urban resilience" that embraces inherent dynamism and
uncertainties to provide unconventional perspectives for coping with natural hazards. Furthermore, the section aims to
emphasize the fact that it is vital for cities to catalyze the transformation from resistant to resilient; a shift from rigidity
to flexibility.
Urban areas are complex systems with social, ecological, economic, and technical/built components interacting dynamically in space and time (Pickett et al., 1997, 2001; Grimm et al., 2000; Niemelä et al., 2011; McPhearson et al., 2016b). The
complex nature of urban systems can make it challenging to predict how ecosystems will respond to climate change in cities
(Batty, 2008; Bettencourt and West, 2010). This complexity is driven by many intersecting feedbacks affecting ecosystems,
including climate, biogeochemistry, nutrient cycling, hydrology, population growth, urbanization and development, human
perceptions, behavior, and more (Bardsley and Hugo, 2010; Pandey and Bardsley, 2013; Alberti, 2015; McPhearson et al.,
2016a; Tavakol-Davani et al., 2019). These systems interrelate dynamically with the social, ecological, economic, and
technological-built infrastructure of the city (Grimm et al., 2000; McDonnell and Hahs, 2013).
Patterns and processes of urban systems in this view emerge from the interactions and feedbacks between components
and systems in cities, emphasizing the need to consider multiple sources of social-ecological patterns and processes to
understand reciprocal interactions between climate change and urban ecosystems (Peterson, 2000). Applying the engineering resilience concept to urban environments subject to natural hazards is fundamentally problematic because it rests on an outdated equilibrium theory. Recovery is often interpreted as returning to predisaster conditions, implicitly assuming an
optimal reference state, which nevertheless does not exist in coupled human-natural systems (Berkes, 2007). Urbanized
basins are such systems, where climate, socioeconomic trends, built systems, and riverine processes affect natural hazards
and disasters. They operate like evolving ecosystems rather than engineering systems and are characterized by complex
behaviors associated with nonlinearity, emergence, uncertainty, and surprise (Liu et al., 2007). Such dynamic systems will
not stay at a predetermined state. At the urban scale, resilience requires investment in man-made and nature-based “hard”
infrastructures, as well as “soft” systems such as knowledge and institutions. The concept of resilience when applied effectively can provide a useful base for more substantial changes in the underlying social, political and economic drivers of risk
and vulnerability. Factors that influence the resilience of cities include their (1) organizational structures, (2) functions, (3) physical entities, and (4) spatial scales. A system with applied resilience can continually survive, adapt, and grow in
the face of resource challenges and disturbances in an integrated and holistic manner for the well-being of the individual
and collective. Those challenges and disturbances may be discrete and temporary, such as floods, or endure over a longer
period, such as droughts, and are therefore discussed individually in the following subsections.
4.4 Resilience to floods
Resilience to climate change is a growing priority among urban decision-makers. Improving resilience will require transformations in social, ecological, and built infrastructure components of urban systems (Tyler and Moench, 2012; Ernstson
et al., 2010). Traditionally, urban planning and urban design have focused on settlement patterns, optimized land use, maximized proximity, community engagement, place-making, quality of life, and urban vitality. However, their focus is
increasingly expanding to include principles regarding the application of resilience. Conventional wisdom assumes that
flood resistance is necessary for cities; however, resilience theory suggests that it erodes urban resilience to floods
(Holling and Meffe, 1996). In effect, flood-control infrastructure puts the city in one or the other contrasting conditions:
dry and stable, or inundated and disastrous. With flood-control infrastructure in place, flooding results exclusively from the
infrastructure’s failure and is more hazardous than if there were no flood-control infrastructure (Tobin, 1995; Verchick,
2010), such that the natural process of flooding becomes synonymous with disaster.
In urban environments that are dependent on flood-control infrastructure, the river’s high flows are mostly confined
between levees or held behind the upstream dam. The flood frequency is dramatically reduced and river dynamics are
largely unnoticed. Each flood that is prevented is a loss of opportunity for learning (Klein et al., 1998; Colten and
Sumpter, 2009). Little flood experience leads to low awareness of flood risk among citizens (Nunes Correia et al.,
1998), who are too accustomed to operating under the dry-and-stable conditions, and know little about how to cope with
inundation once the flood-control infrastructure fails. Furthermore, flood-control infrastructure’s structural rigidity and
large scope leave little flexibility for making timely adjustments to constantly changing boundary conditions
(Pahl-Wostl, 2006). The existence of flood-control infrastructure also prevents the development of a diversity of flood-coping measures because the development of such measures is too expensive (Castonguay, 2007). Cities that are solely dependent on flood-control infrastructures tend to address only the source of the hazard and not the surrounding built environment.
Flood-control infrastructure, as a centralized measure, creates a false sense of security that precludes the need for localized
flood-response capacity. In a resilient flood defense system, redundancy entails diversity and functional replication across
scales (Peterson et al., 1998). A resilient flood management system with redundancy would comprise a diversity of
measures for mitigation, preparedness, response, and reorganization. The flood-response capacity would be distributed
across the levels, i.e., individuals, communities, and the municipality, such that when the capacity of one level is overwhelmed, the city can still count on the others. This high tolerance of socioeconomic state changes is resilience’s major
advantage over the traditional flood defense strategies which revolve around resistance.
To clarify the conceptual transformation from resistance to resilience, it could be simply stated that although resistance
and resilience have similar qualities, they hold different intentions. A resilient system is able to adjust flexibly in the event of a hazard. A resilient system's functions and core aims are maintained with only slight adjustment, although these
adjustments may be significant for subsystems or over time. In contrast to resistant systems, resilient systems can anticipate,
absorb, accommodate, or recover from the effects of a hazardous event in a timely and efficient manner through preservation, restoration, or improvement of the system’s essential basic structures and functions. Essentially, the system
responds by accepting loss and returning to its preshock/stress state, which in turn may be perceived by dominant actors
as the preferred state (Gencer et al., 2018).
Conventional planning approaches toward floods are heavily based on environmental stability, neglecting inherent
uncertainties and dynamism that are naturally coupled with the complexity of interactions in an urban system. Resilience
is inherently dynamic and, therefore, a suitable approach for future urban designs (Liao, 2012). Floods are an inevitable part
of urban dynamics, and therefore, the development of resilience to floods is significant to enlarge the existing body of
resilience in a new dimension that can account for the natural consequences of increasing storm events. However, research
on resilience to flood is still in its early stages with very few practical methods for real-world applications (Folke, 2006). In
what follows in this section, a number of approaches and recommendations are listed from the existing body of literature
that can be applied to urban flood hazard management from a resilience perspective:
– Applying a systems-oriented approach, such as a local bottom-up design approach that simultaneously addresses physical, cultural, societal, and economic issues. Urban areas are often not understood as part of their surrounding context, or in terms of the flows of resources, people, water, and energy (McPhearson et al., 2018). Ignoring resource
flows and the interdependence of urban, periurban, and rural areas, as well as the relation between a city and its natural
environment, can lead to policies which reinforce and enforce unsustainable resource use. Often, a lack of planning
tools and current data makes integration of the design approach into planning and policies challenging (Raven
et al., 2018).
– Establishing a map of risk sectors, with hotspots defined. Geotechnical studies map and classify risk areas, from which one can
estimate potential damage to dwellings and residents, considering their positions and distances to critical slopes, rivers,
and coast in coastal areas plus the degree of building vulnerability (construction pattern and level of urban
consolidation).
– Urban resilience strategies go hand in hand with the configuration of urban morphology, influenced by developed sustainable solutions like photovoltaic technologies, enhanced vegetation, and improved urban ventilation (Raven et al., 2018).
– Best practices of adaptation-driven urban policies worldwide provide significant examples of how the paradigm shift
toward water-sensitive and water-resilient cities allows for the implementation of an integrated approach that combines
risk prevention with a regeneration of urban fabric driven by adaptive design solutions often including natural elements
(Kazmierczak and Carter, 2010; Karamouz et al., 2018b). Urban planning and urban design strategies focusing on green
infrastructure and sustainable water management help restore interactions between built and ecological environments. It
is necessary to improve the resilience of urban systems by applying the following (yet not limited to) nature-based
solutions to urban environments: (1) Utilizing an integrated “gray-green” approach to flood management strategies
in coastal areas that mitigates coastal floods rather than completely stopping them from entering urban environments (Karamouz and Heydari, 2020); (2) Revaluing and restoring degraded ecosystems and remediating contaminated environmental elements, e.g., soil, air, and water. This will include monitoring air, water, and soil quality and
adopting measures to reduce pollutants and particulate matter; (3) Targeting water quality in coastal and riparian areas;
(4) Providing diverse open and safe public green space which enables cultural, community and recreation activities, and
contributes to food and water security (Kremer et al., 2016).
– Revamping the drainage management approach to strengthen flood resilience. This strategy aims at optimizing the management of stormwater using a comprehensive source-pathway-receptor approach that looks at catchment-wide
ecosystem-based solutions for achieving higher drainage and flood protection standards. It covers the entire drainage
system and not just the pathway over which the rainwater travels (Narayan et al., 2012). New provisions must be added
to surface-water drainage regulations, which should require a minimum land size dedicated to implementing measures
to slow surface runoff and reduce peak flows of stormwater into public drainage systems by implementing on-site
detention measures such as green roofs, rain gardens, and detention tanks.
– Specified resilience, as described earlier, although important, is not adequate on its own. Optimizing specified resilience
may undermine the general resilience of a social-ecological system. This is mainly due to the possibility that too much
focus on specified resilience will tend to make the whole system less diverse, less flexible, and less responsive in terms
of cross-sector actions (Walker and Salt, 2006).
Overall, resilience to floods fosters the principle of working with nature rather than against it. It does not mean accepting
system failure during flood events; rather, it embraces the society’s potential toward flexibility and adaptability.
4.5 Resilience to drought
One of the complex natural hazards that have extreme effects on society, the environment, and the economy is drought.
Generally, projected longer-term droughts and intense floods underscore the need to store more water to manage climate
extremes, but there are some fundamental differences between flood and drought contexts which make applying the resilience concept different for each event. Unlike floods, droughts do not necessarily have a short duration; there is no clear indication of when a drought will start or end (Karamouz et al., 2015). A drought can last for months or even years, which results in more complexity in
applying the resilience concept. Drought can affect economic aspects of a region by causing failure in water supply or
agricultural goals. There are four main stressors in drought that can cause failure in a region: (1) lack of precipitation,
(2) lack of water in reservoirs, (3) high temperature, and (4) lack of soil moisture.
One of the studies on drought resilience was done by Karamouz et al. (2016) which applied the 4R concept of resilience
in the drought context. They quantified resilience to drought in the Aharchay watershed, located in East Azerbaijan, Iran. This watershed, which includes the Sattarkhan Dam Reservoir, is one of the most important watersheds in the region. To quantify
resilience, they categorized the characteristics related to four resilience components namely rapidity, robustness, resourcefulness, and redundancy. They determined the relative importance of these components using multicriteria decision making
(MCDM). The criteria and subcriteria used in their research are presented in Table 1.
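As an illustration of this kind of weighted aggregation, the sketch below combines normalized sub-criterion scores (one per entry of Table 1 below) into component scores and a single drought-resilience value; the weights and scores are placeholders, not the values derived by Karamouz et al. (2016).

```python
# Hypothetical, illustrative aggregation of the 4R components into a single
# drought-resilience score; weights and sub-criterion scores are placeholders.
component_weights = {"RO": 0.35, "RD": 0.25, "RS": 0.25, "RA": 0.15}

# Normalized sub-criterion scores in [0, 1] for one watershed.
scores = {
    "RO": [0.6, 0.4, 0.7, 0.5, 0.6, 0.5],   # RO1..RO6
    "RD": [0.3, 0.5, 0.4, 0.6, 0.7],        # RD1..RD5
    "RS": [0.5, 0.6, 0.3, 0.4, 0.5, 0.6],   # RS1..RS6
    "RA": [0.7, 0.2, 0.4, 0.5, 0.6, 0.8],   # RA1..RA6
}

component_scores = {k: sum(v) / len(v) for k, v in scores.items()}
resilience = sum(component_weights[k] * component_scores[k] for k in scores)
print({k: round(v, 2) for k, v in component_scores.items()}, round(resilience, 2))
```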
Another example is managing urban hydrological systems through improved greening to decrease the vulnerability of
urban ecosystems. For example, during drought periods, a small share of water resources may be reserved as an environmental flow for use by plants and animals, thus allowing ecological systems such as forests, wetlands, and streams to
TABLE 1 Criteria and subcriteria for assessing drought resilience (Karamouz et al., 2016).
Robustness (RO): RO1: Water resources available in the region such as streams and springs; RO2: Economic vulnerability of the region; RO3: Geographic proximity (mountain/forest/desert); RO4: Average annual rainfall in the region and its variability; RO5: Historical drought experience and level of region adaptability to drought; RO6: Average water consumption in the region.
Redundancy (RD): RD1: Evaluation of water resources transfer in the surrounding area (water transfer); RD2: Groundwater resources availability; RD3: Agricultural water use method; RD4: Prioritization of water allocation facing drought; RD5: Reservoir operation policies facing drought.
Resourcefulness (RS): RS1: Availability of regional data; RS2: Risk and disaster management plan; RS3: Additional budget for water disaster; RS4: Drought forecasting and warning systems availability; RS5: Drought vulnerability maps; RS6: Coordination between organizations facing drought.
Rapidity (RA): RA1: Population of the region; RA2: Implementation of virtual drought exercises by authorities; RA3: Level of public awareness and understanding of the concept of drought (the culture of consumption and conservation); RA4: Intensity of the disaster; RA5: Infrastructure preparedness; RA6: Significance of the region (strategic values).
survive and maintain adaptive capacity. While drought may affect an entire region, urban ecosystems where water
resources are well managed can reduce the impact of such climate-driven water stress, but only provided that urban ecosystem management activities are part of a larger system-level urban resilience plan.
While greening solutions are presented as nature-based strategies, Van Loon et al. (2020) have explained the need for a
new approach to drought resilience: The “Creative Drought” project, led by the University of Birmingham, is a leading
example of how to increase drought resilience by utilizing local indigenous knowledge in addition to scientific methods.
The project brings together researchers from a number of different disciplines and develops a distinctive interdisciplinary approach built around a framework for understanding and helping manage responses to drought. Their
research has shown that a drought event (or other natural hazards) is not always a phenomenon to avoid. They believe
experiencing any kind of natural catastrophe leads to adaptation, better management, and better preparedness for next
events. But preparedness, by its nature, diminishes over time as events fade from memory and people find it hard
to imagine what future droughts could have in store. Van Loon et al. (2020) searched for an approach that gave people
the ability to imagine an event without experiencing it; with hopes that creative experiments based on past drought stories
and future drought model scenarios might overcome the issue and help increase drought resilience by engaging local communities and authorities. They then built a model that was used to extrapolate and calculate several scenarios that were
mentioned by community members and government representatives. Instead of predicting the future, they explored plausible futures. Droughts were calculated and compared between the scenario and the baseline. These were transformed into
storylines including information on the duration and severity of future droughts compared to previous experience (e.g.,
more severe than has been experienced in the past 40 years or twice as long as the drought in the early 1980s). Matlou
et al. (2021) determined the impact of agricultural drought resilience on smallholder livestock farming households’ welfare
in the Northern Cape Province of South Africa. They quantified smallholder farmers' welfare based on four capitals, namely human, social, natural, and economic, to enhance their resilience to agricultural drought. The results indicated that smallholder
farmers who received drought relief support saw an improvement in their welfare. However, the welfare improvements
varied across respondents and different gender categories, with males having higher welfare improvements relative to
females.
Pourmoghim et al. (2022) introduced a framework (based on Karamouz et al., 2016) to evaluate the resilience of lakes
under climatic and anthropogenic droughts. They proposed a hierarchical structure of criteria with four levels. The first
level included several indices such as long-term resilience, reliability, and implementation cost. In the second to fourth
levels, four main resilience-based criteria (i.e., robustness, resourcefulness, redundancy, and rapidity) with relative subcriteria were defined. They aggregated the values of criteria and subcriteria using the Evidential Reasoning (ER) approach.
In the end, they calculated the annual resilience time series and three resilience indices, namely the recovery time, loss of
resilience, and final resilience. The proposed methodology was applied in the Zarrinehrud river basin and Lake Urmia.
The results showed that more than 80% of the scenarios with the implementation costs of more than 50 million US dollars
have an overall resilience of more than 70%.
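The three summary indices can be illustrated on a synthetic annual resilience series. The reading of each index below is one plausible interpretation, not the exact definitions of Pourmoghim et al. (2022), and the numbers are hypothetical.

```python
import numpy as np

def lake_resilience_indices(annual_resilience, target=0.7):
    """One plausible reading of the three indices discussed in the text.

    - recovery time: number of years in which resilience falls below the target,
    - loss of resilience: cumulative shortfall below the target,
    - final resilience: value in the last year of the horizon.
    """
    r = np.asarray(annual_resilience, dtype=float)
    recovery_time = int(np.count_nonzero(r < target))
    loss_of_resilience = float(np.maximum(target - r, 0.0).sum())
    final_resilience = float(r[-1])
    return recovery_time, loss_of_resilience, final_resilience

# Hypothetical annual resilience of a lake under a restoration scenario.
series = [0.8, 0.65, 0.5, 0.55, 0.68, 0.72, 0.78]
print(lake_resilience_indices(series, target=0.7))
```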
5. Conclusions
More frequent climate and weather extreme events are being experienced in urban ecosystems. The frequency and severity
of weather and climate-related disasters in urban areas are projected to increase in the coming decades. With climate change
impacts taking hold, the environmental baselines of urban environments have started to shift. Given that more than half of
the world’s population resides in urban areas and that this trend is expected to significantly increase in the coming decades
(Rosenzweig et al., 2018), more attention needs to be directed to disaster risk reduction and a paradigm shift from fail-safe to safe-to-fail. This chapter has endorsed a comprehensive theory of entropy and resilience in order to build resilient
cities. These concepts embrace inherent dynamism of urban ecosystems and uncertainties of extreme conditions to provide
unconventional perspectives for mitigating the impact of natural hazards on ecology and urban water systems performance.
Resilience theory suggests that what underlies a truly resilient urban design is not how stable it appears or how many disturbances it has absorbed, but whether it can withstand an unpredictable shock that would fundamentally alter or erase an urban
system’s identity.
How urban design affects urban resilience, however, essentially depends on design principles that are increasingly influenced by ecological resilience rather than engineering resilience as past experiences are taken as lessons for the future. A
resilient urban water system can only be designed by viewing urban regions as complex socio-ecological systems with
cross-level interactions and innate uncertainties. Green infrastructures and their integration into traditional resistance-oriented designs have been proven to provide cost-effective, nature-based solutions for a resilient adaptation strategy
toward climate change and extreme events while also creating opportunities to increase socioeconomic equity, public green
spaces, and sustainable urban development. Comprehensive and ecosystem-friendly adaptation scenarios used to enhance
urban resilience are not limited to the strategies stated in this chapter but are still of the same nature. It is argued that in applying
the key ideas and principles of resilience, it is important to think of the seemingly opposing processes, such as rigidity vs
flexibility, general vs specified, and creativity vs conservation not as paradoxes but dialectical duals that must coexist to
achieve a synthesis of urban resilience.
References
Ahern, J., 2011. From fail-safe to safe-to-fail: sustainability and resilience in the new urban world. Landsc. Urban Plan. 100 (4), 341–343.
Alberti, M., 2015. Eco-evolutionary dynamics in an urbanizing planet. Trends Ecol. Evol. 30 (2), 114–126. https://doi.org/10.1016/j.tree.2014.11.007.
Ansari, A.H., Olyaei, M.A., Heydari, Z., 2021. Ensemble generation for hurricane hazard assessment along the United States’ Atlantic coast. Coast. Eng.
169, 103956.
Asefa, T., Clayton, J., Adams, A., Anderson, D., 2014. Performance evaluation of a water resources system under varying climatic conditions: reliability,
resilience, vulnerability and beyond. J. Hydrol. 508, 53–65.
Atkinson, S., Farmani, R., Memon, F.A., Butler, D., 2014. Reliability indicators for water distribution system design: comparison. J. Water Resour. Plan.
Manag. 140 (2), 160–168.
Banik, B.K., Alfonso, L., Torres, A.S., Mynett, A., Di Cristo, C., Leopardi, A., 2015. Optimal placement of water quality monitoring stations in sewer
systems: an information theory approach. Procedia Eng. 119, 1308–1317.
Barbe, D.E., Cruise, J.F., Singh, V.P., 1994. Derivation of a distribution for the piezometric head in groundwater flow using entropy. In: Stochastic and
statistical methods in hydrology and environmental engineering. Springer, Dordrecht, Netherlands, pp. 151–161.
Bardsley, D.K., Hugo, G.J., 2010. Migration and climate change: examining thresholds of change to guide effective adaptation decision-making. Popul.
Environ. 32, 238–262. https://doi.org/10.1007/s11111-010-0126-9.
Batty, M., 2008. The size, scale, and shape of cities. Science 319, 769–771. https://doi.org/10.1126/science.1151419.
Berkes, F., 2007. Understanding uncertainty and reducing vulnerability: lessons from resilience thinking. Nat. Hazards 41 (2), 283–295.
Bettencourt, L., West, G., 2010. A unified theory of urban living. Nature 467, 912–913. https://doi.org/10.1038/467912a.
Boroumand, A., Rajaee, T., Masoumi, F., 2018. Semivariance analysis and transinformation entropy for optimal redesigning of nutrients monitoring
network in San Francisco bay. Mar. Pollut. Bull. 129 (2), 689–694.
Bruneau, M., Chang, S.E., Eguchi, R.T., Lee, G.C., O’Rourke, T.D., Reinhorn, A.M., Shinozuka, M., Tierney, K., Wallace, W.A., Von Winterfeldt, D.,
2003. A framework to quantitatively assess and enhance the seismic resilience of communities. Earthq. Spectra 19 (4), 733–752.
Butler, D., Farmani, R., Fu, G., Ward, S., Diao, K., Astaraie-Imani, M., 2014. A new approach to urban water management: safe and sure. In: 16th Water
Distribution System Analysis Conference, WDSA. Procedia Engineering, pp. 347–354.
Butler, D., Ward, S., Sweetapple, C., Astaraie-Imani, M., Diao, K., Farmani, R., Fu, G., 2017. Reliable, resilient and sustainable water management: the
Safe & SuRe approach. Global Chall. 1 (1), 63–77. https://doi.org/10.1002/gch2.1010.
Castonguay, S., 2007. The production of flood as natural catastrophe: extreme events and the construction of vulnerability in the drainage basin of the St.
Francis River (Quebec), mid-nineteenth to mid-twentieth century. Environ. Hist. 12 (4), 820–844.
Chen, L., Singh, V.P., 2018. Entropy-based derivation of generalized distributions for hydrometeorological frequency analysis. J. Hydrol. 557, 699–712.
Colten, C.E., Sumpter, A.R., 2009. Social memory and resilience in New Orleans. Nat. Hazards 48 (3), 355–364.
Cui, H., Singh, V.P., 2015. Configurational entropy theory for streamflow forecasting. J. Hydrol. 521, 1–17.
Daly, H.E., 1992. Is the entropy law relevant to the economics of natural resource scarcity?—yes, of course it is! J. Environ. Econ. Manag. 23 (1), 91–95.
Darbandsari, P., Coulibaly, P., 2020. Introducing entropy-based Bayesian model averaging for streamflow forecast. J. Hydrol. 591, 125577.
Diao, K., 2021. Towards resilient water supply in centralized control and decentralized execution mode. J. Water Supply Res. Technol. AQUA 70 (4),
449–466.
Diao, K., Sweetapple, C., Farmani, R., Fu, G., Ward, S., Butler, D., 2016. Global resilience analysis of water distribution systems. Water Res. 106,
383–393.
Diao, K., Jung, D., Farmani, R., Fu, G., Butler, D., Lansey, K., 2021. Modular interdependency analysis for water distribution systems. Water Res.
201, 117320.
Dong, S., Wang, N., Liu, W., Soares, C.G., 2013. Bivariate maximum entropy distribution of significant wave height and peak period. Ocean Eng. 59,
86–99.
Dong, X., Guo, H., Zeng, S., 2017. Enhancing future resilience in urban drainage system: green versus grey infrastructure. Water Res. 124, 280–289.
Ernstson, H., Barthel, S., Andersson, E., Borgström, S.T., 2010. Scale-crossing brokers and network governance of urban ecosystem services: the case of
Stockholm. Ecol. Soc. 15 (4), 28.
Eslamian, S., Eslamian, F., 2021. Disaster Risk Reduction for Resilience: Disaster Risk Management Strategies. Springer Nature, Switzerland.
Fistola, R., 2011. The unsustainable city. Urban entropy and social capital: the needing of a new urban planning. Procedia Eng. 21, 976–984.
Folke, C., 2006. Resilience: the emergence of a perspective for social–ecological systems analyses. Global Environ. Change 16 (3), 253–267.
Folke, C., Carpenter, S.R., Walker, B.H., Scheffer, M., Chapin, T., Rockstrom, J., 2010. Resilience thinking: integrating resilience, adaptability and transformability. Ecol. Soc. 15 (4), 20.
Fowler, H.J., Kilsby, C.G., O’Connell, P.E., 2003. Modeling the impacts of climatic change and variability on the reliability, resilience, and vulnerability of
a water resource system. Water Resour. Res. 39 (8).
Gencer, E.A., 2008. Natural Disasters, Vulnerability, and Sustainable Development. VDM Verlag, Germany.
Gencer, E.A., 2013. The Interplay Between Urban Development, Vulnerability, and Risk Management: A Case Study of the Istanbul Metropolitan Area.
Springer Briefs in Environment, Security, Development and Peace, vol. 7 Springer Science & Business Media, Heidelberg, New York, Dordrecht,
London, UK.
Gencer, E., Folorunsho, R., Linkin, M., Wang, X., Natenzon, C.E., Wajih, S., Mani, N., Esquivel, M., Solecki, W., 2018. Disasters and risk in cities. In:
Rosenzweig, C., Solecki, W., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ali Ibrahim, S. (Eds.), Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, New York, USA, pp. 61–98.
Georgescu-Roegen, N., 1993. The entropy law and the economic problem. In: Valuing the Earth: Economics, Ecology, Ethics, pp. 75–88.
Greco, M., Mirauda, D., Plantamura, A.V., 2014. Manning’s roughness through the entropy parameter for steady open channel flows in low submergence.
Procedia Eng. 70, 773–780.
Grimm, N.B., Grove, J.M., Pickett, S.T.A., Redman, C.A., 2000. Integrated approaches to long-term studies of urban ecological systems. Bioscience 50 (7),
571. https://doi.org/10.1641/0006-3568(2000)050.
Hashimoto, T., Stedinger, J.R., Loucks, D.P., 1982. Reliability, resiliency, and vulnerability criteria for water resource system performance evaluation.
Water Resour. Res. 18 (1), 14–20.
Holling, C.S., 1973. Resilience and stability of ecological systems. Annu. Rev. Ecol. Syst. 4 (1), 1–23.
Holling, C.S., 1996. Engineering resilience versus ecological resilience. In: Engineering Within Ecological Constraints, pp. 31–43.
Holling, C.S., Meffe, G.K., 1996. Command and control and the pathology of natural resource management. Conserv. Biol. 10 (2), 328–337.
Hosseini, S., Barker, K., Ramirez-Marquez, J.E., 2016. A review of definitions and measures of system resilience. Reliab. Eng. Syst. Saf. 145, 47–61.
Juan-Garcı́a, P., Butler, D., Comas, J., Darch, G., Sweetapple, C., Thornton, A., Corominas, L., 2017. Resilience theory incorporated into urban wastewater
systems management. State of the art. Water Res. https://doi.org/10.1016/j.watres.2017.02.047.
Jung, D., Kang, D., Kim, J.H., Lansey, K., 2014. Robustness-based design of water distribution systems. J. Water Resour. Plan. Manag. 140 (11),
04014033.
Karamouz, M., Heydari, Z., 2020. Conceptual design framework for coastal flood best management practices. J. Water Resour. Plan. Manag. 146 (6),
04020041.
Karamouz, M., Hojjat-Ansari, A., 2020. Uncertainty based budget allocation of wastewater infrastructures’ flood resiliency considering interdependencies.
J. Hydroinf. 22 (4), 768–792.
Karamouz, M., Mohammadi, K., 2020. Nonstationary based framework for performance enhancement of coastal flood mitigation strategies. J. Hydrol.
Eng. 25 (6), 04020020.
Karamouz, M., Zeynolabedin, A., Olyaei, M.A., 2015. Mapping regional drought vulnerability: a case study. Int. Arch. Photogramm. Remote. Sens. Spat.
Inf. Sci. 40.
Karamouz, M., Zeynolabedin, A., Olyaei, M.A., 2016. Regional drought resiliency and vulnerability. J. Hydrol. Eng. 21 (11), 05016028.
Karamouz, M., Rasoulnia, E., Olyaei, M.A., Zahmatkesh, Z., 2018a. Prioritizing investments in improving flood resilience and reliability of wastewater
treatment infrastructure. J. Infrastruct. Syst. 24 (4), 04018021.
Karamouz, M., Taheri, M., Mohammadi, K., Heydari, Z., Farzaneh, H., 2018b. A new perspective on BMPs’ application for coastal flood preparedness. In:
World Environmental and Water Resources Congress 2018: Water, Wastewater, and Stormwater; Urban Watershed Management; Municipal Water
Infrastructure; and Desalination and Water Reuse. American Society of Civil Engineers, Reston, VA, USA, pp. 171–180.
Kazmierczak, A., Carter, J., 2010. Adaptation to climate change using green and blue infrastructure: a database of case studies. University of Manchester,
Interreg IVC Green and blue space adaptation for Urban areas and eco-towns (GRaBS). Accessed 19 March 2020.
Kjeldsen, T.R., Rosbjerg, D., 2004. Choice of reliability, resilience and vulnerability estimators for risk assessments of water resources systems (Choix
d’estimateurs de fiabilite, de resilience et de vulnerabilite pour les analyses de risque de systèmes de ressources en eau). Hydrol. Sci. J. 49 (5). https://
doi.org/10.1623/hysj.49.5.755.55136.
Kleidon, A., Schymanski, S., 2008. Thermodynamics and optimality of the water budget on land: a review. Geophys. Res. Lett. 35 (20). https://doi.org/
10.1029/2008GL035393.
Klein, R.J., Smit, M.J., Goosen, H., Hulsbergen, C.H., 1998. Resilience and vulnerability: coastal dynamics or Dutch dikes? Geogr. J., 259–268.
Kremer, P., Hamstead, Z.A., McPhearson, T., 2016. The value of urban ecosystem services: a spatially explicit multicriteria analysis of landscape scale
valuation scenarios in NYC. Environ. Sci. Pol. https://doi.org/10.1016/J.ENVSCI.2016.04.012.
Lewis, G.N., Randall, M., 1961. Thermodynamics, second ed. McGraw-Hill, New York, USA. Revised by Pitzer, K. S. and Brewer, L.
Liao, K.H., 2012. A theory on urban resilience to floods—a basis for alternative planning practices. Ecol. Soc. 17 (4), 48.
Liu, J., Dietz, T., Carpenter, S.R., Alberti, M., Folke, C., Moran, E., Pell, A.N., Deadman, P., Kratz, T., Lubchenco, J., Ostrom, E., Ouyang, Z., Provencher,
W., Redman, C.L., Schneider, S.H., Taylor, W.W., 2007. Complexity of coupled human and natural systems. Science 317, 1513–1516. https://doi.org/
10.1126/science.1144004.
Liu, Y., You, M., Zhu, J., Wang, F., Ran, R., 2019. Integrated risk assessment for agricultural drought and flood disasters based on entropy information
diffusion theory in the middle and lower reaches of the Yangtze River, China. Int. J. Disaster Risk Reduct. 38, 101194.
Magrin, G.O., Marengo, J.A., Boulanger, J.P., Buckeridge, M.S., Castellanos, E., Poveda, G., Vicuña, S., 2014. Central and South America, Climate
Change 2014: impacts, adaptation, and vulnerability. In: Part B: Regional Aspects. Contribution of Working Group II to the Fifth Assessment Report
of the Intergovernmental Panel on Climate Change, Cambridge, United Kingdom and New York (Chapter 27).
Malekinezhad, H., Sepehri, M., Pham, Q.B., Hosseini, S.Z., Meshram, S.G., Vojtek, M., Vojteková, J., 2021. Application of entropy weighting method for
urban flood hazard mapping. Acta Geophys., 1–14.
Matlou, R., Bahta, Y.T., Owusu-Sekyere, E., Jordaan, H., 2021. Impact of agricultural drought resilience on the welfare of smallholder livestock farming
households in the Northern Cape Province of South Africa. Land 10 (6), 562.
McDonnell, M.J., Hahs, A., 2013. The future of urban biodiversity research: moving beyond the “low-hanging fruit”. Urban Ecosyst. 16, 397–409. https://
doi.org/10.1007/s11252-013-0315-2.
McPhearson, T., Andersson, E., Elmqvist, T., Frantzeskaki, N., 2015. Resilience of and through urban ecosystem services. Ecosyst. Serv. 12, 152–156.
McPhearson, T., Haase, D., Kabisch, N., Gren, Å., 2016a. Advancing understanding of the complex nature of urban systems. Ecol. Indic. 70, 566–573.
McPhearson, T., Pickett, S.T.A., Grimm, N., Alberti, M., Elmqvist, T., Niemelä, J., Weber, C., Haase, D., Breuste, J., Qureshi, S., 2016b. Advancing urban
ecology toward a science of cities. Bioscience 66 (3), 198–212. https://doi.org/10.1093/biosci/biw002.
McPhearson, T., Karki, M., Herzog, C., Santiago Fink, H., Abbadie, L., Kremer, P., Clark, C.M., Perini, K., 2018. Urban ecosystems and biodiversity. In:
Rosenzweig, C., Solecki, W., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ali Ibrahim, S. (Eds.), Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, New York, USA, pp. 257–318.
Melo, O., Vargas, X., Vicuna, S., Meza, F., McPhee, J., 2010. Climate change economic impacts on supply of water for the M & I sector in the metropolitan
region of Chile. In: 2010 Watershed Management Conference: Innovations in Watershed Management Under Land Use and Climate Change, August
(23–27), Madison, Wisconsin, USA.
Meng, F., Fu, G., Butler, D., 2016. Water quality permitting: from end-of-pipe to operational strategies. Water Res. 101, 114–126.
Meng, F., Fu, G., Farmani, R., Sweetapple, C., Butler, D., 2018. Topological attributes of network resilience: a study in water distribution systems. Water
Res. 143, 376–386.
Mobley, W., Sebastian, A., Highfield, W., Brody, S.D., 2019. Estimating flood extent during Hurricane Harvey using maximum entropy to build a hazard
distribution model. J. Flood Risk Manage. 12, e12549.
Mohammadiun, S., Yazdi, J., Salehi Neyshabouri, S.A.A., Sadiq, R., 2018. Development of a stochastic framework to design/rehabilitate urban stormwater
drainage systems based on a resilient approach. Urban Water J. 15 (2), 167–176.
Mugume, S.N., Gomez, D.E., Fu, G., Farmani, R., Butler, D., 2015. A global analysis approach for investigating structural resilience in urban drainage
systems. Water Res. 81, 15–26.
Mukherjee, D., Ratnaparkhi, M.V., 1986. On the functional relationship between entropy and variance with related applications. Commun. Stat. Theory
Methods 15 (1), 291–311.
Narayan, S., Hanson, S., Nicholls, R.J., Clarke, D., Willems, P., Ntegeka, V., Monbaliu, J., 2012. A holistic model for coastal flooding using system diagrams and the source–pathway–receptor (SPR) concept. Nat. Hazards Earth Syst. Sci. 12 (5), 1431–1439.
Niemelä, J., Breuste, J.H., Guntenspergen, G., McIntyre, N.E., Elmqvist, T., James, P., 2011. Urban Ecology: Patterns, Processes, and Applications. Oxford
University Press, UK.
Nunes Correia, F., Castro Rego, F., Da Graca Saraiva, M., Ramos, I., 1998. Coupling GIS with hydrologic and hydraulic flood modelling. Water Resour.
Manag. 12 (3), 229–249.
NYCDEP, 2013. NYC Wastewater resiliency plan, climate risk assessment and adaptation study. In: Wastewater Treatment Plants. Department of Environmental Protection, New York City, USA.
Olyaei, M., Karamouz, M., 2020. A Bayesian approach for estimating biological treatment parameters under flood condition. J. Environ. Eng.
ASCE. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001756.
Olyaei, M.A., Karamouz, M., Farmani, R., 2018. Framework for assessing flood reliability and resilience of wastewater treatment plants. J. Environ. Eng.
144 (9), 04018081.
Pahl-Wostl, C., 2006. The importance of social learning in restoring the multifunctionality of rivers and floodplains. Ecol. Soc. 11 (1), 10.
Pandey, R., Bardsley, D.K., 2013. Human ecological implications of climate change in the Himalaya: pilot studies of adaptation in agro-ecosystems within
two villages from Middle Hills and Tarai, Nepal. In: Impacts World 2013, International Conference on Climate Change Effects, Potsdam, Germany,
May 27–30.
Panos, C.L., Wolfand, J.M., Hogue, T.S., 2021. Assessing resilience of a dual drainage urban system to redevelopment and climate change. J. Hydrol. 596,
126101.
Papalexiou, S.M., Koutsoyiannis, D., 2012. Entropy based derivation of probability distributions: a case study to daily rainfall. Adv. Water Resour. 45, 51–
57.
Park, J., Seager, T.P., Rao, P.S.C., Convertino, M., Linkov, I., 2013. Integrating risk and resilience approaches to catastrophe management in engineering
systems. Risk Anal. 33 (3), 356–367.
Pelorosso, R., Gobattoni, F., Leone, A., 2017. The low-entropy city: a thermodynamic approach to reconnect urban systems with nature. Landsc. Urban
Plan. 168, 22–30.
Peterson, G., 2000. Political ecology and ecological resilience: an integration of human and ecological dynamics. Ecol. Econ. 35 (3), 323–336.
Peterson, G., Allen, C.R., Holling, C.S., 1998. Ecological resilience, biodiversity, and scale. Ecosystems 1, 6–18.
Pickett, S.T., Burch, W.R., Dalton, S.E., Foresman, T.W., Grove, J.M., Rowntree, R., 1997. A conceptual framework for the study of human ecosystems in
urban areas. Urban Ecosyst. 1 (4), 185–199.
Pickett, S.T.A., Cadenasso, M.L., Grove, J.M., Nilon, C.H., Pouyat, R.V., Zipperer, W.C., Costanza, R., 2001. Urban ecological systems: linking terrestrial
ecological, physical, and socioeconomic components of metropolitan areas. Annu. Rev. Ecol. Syst. 32, 127–157.
Pourmoghim, P., Behboudian, M., Kerachian, R., 2022. An uncertainty-based framework for evaluating and improving the long-term resilience of lakes
under anthropogenic droughts. J. Environ. Manag. 301, 113900.
Purvis, B., Mao, Y., Robinson, D., 2017. Thermodynamic entropy as an indicator for urban sustainability? Procedia Eng. 198, 802–812.
Qiu, H., Chen, L., Zhou, J., He, Z., Zhang, H., 2021. Risk analysis of water supply-hydropower generation-environment nexus in the cascade reservoir
operation. J. Clean. Prod. 283, 124239.
Ranjbar, S., Singh, A., 2020. Entropy and intermittency of river bed elevation fluctuations. J. Geophys. Res. Earth Surf. 125 (8), e2019JF005499.
Raven, J., Stone, B., Mills, G., Towers, J., Katzschner, L., Leone, M., Gaborit, P., Georgescu, M., Hariri, M., 2018. Urban planning and design. In:
Rosenzweig, C., Solecki, W., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ali Ibrahim, S. (Eds.), Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, New York, USA, pp. 139–172.
Reggiani, P., Sivapalan, M., Hassanizadeh, S.M., 1998. A unifying framework for watershed thermodynamics: balance equations for mass, momentum,
energy and entropy, and the second law of thermodynamics. Adv. Water Resour. 22 (4), 367–398.
Revi, A., Satterthwaite, D.E., Aragón-Durand, F., Corfee-Morlot, J., Kiunsi, R.B.R., Pelling, M., Solecki, W., 2014. In: Field, C.B., Barros, V.R., Dokken,
D.J., et al. (Eds.), Urban areas Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of
Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, pp. 535–612.
Rosenzweig, C., Solecki, W.D., Romero-Lankao, P., Mehrotra, S., Dhakal, S., Ibrahim, S.A. (Eds.), 2018. Climate Change and Cities: Second Assessment
Report of the Urban Climate Change Research Network. Cambridge University Press, UK.
Scholz, R.W., Blumer, Y.B., Brand, F.S., 2012. Risk, vulnerability, robustness, and resilience from a decision-theoretic perspective. J. Risk Res. 15 (3),
313–330. https://doi.org/10.1080/13669877.2011.634522.
Setiadi, Y., Tanyimboh, T.T., Templeman, A.B., 2005. Modelling errors, entropy and the hydraulic reliability of water distribution systems. Adv. Eng.
Softw. 36 (11–12), 780–788.
Shannon, C.E., 1948. A mathematical theory of communication. Bell Syst. Tech. J. 27 (3), 379–423.
Singh, V.P., 1997. The use of entropy in hydrology and water resources. Hydrol. Process. 11 (6), 587–626.
Singh, K.R., Dutta, R., Kalamdhad, A.S., Kumar, B., 2019. An investigation on water quality variability and identification of ideal monitoring locations by
using entropy based disorder indices. Sci. Total Environ. 647, 1444–1455.
Solecki, W., O’Brien, K., Leichenko, R., 2011. Disaster risk reduction and climate change adaptation strategies: convergence and synergies. Curr. Opin.
Environ. Sustain. 3 (3), 135–141.
Sweetapple, C., Fu, G., Butler, D., 2017. Reliable, robust, and resilient system design framework with application to wastewater-treatment plant control.
J. Environ. Eng. 143 (3), 04016086.
Sweetapple, C., Astaraie-Imani, M., Butler, D., 2018. Design and operation of urban wastewater systems considering reliability, risk and resilience. Water
Res. 147, 1–12.
Sweetapple, C., Fu, G., Farmani, R., Butler, D., 2019. Exploring wastewater system performance under future threats: does enhancing resilience increase
sustainability? Water Res. 149, 448–459.
Swetapadma, S., Ojha, C.S.P., 2021. Flood frequency study using partial duration series coupled with entropy principle. Hydrol. Earth Syst. Sci. Discuss.,
1–23.
Tanyimboh, T.T., 2017. Informational entropy: a failure tolerance and reliability surrogate for water distribution networks. Water Resour. Manag. 31 (10),
3189–3204.
Tanyimboh, T.T., Templeman, A.B., 1993. Optimum design of flexible water distribution networks. Civ. Eng. Syst. 10 (3), 243–258.
Tavakol-Davani, H., Rahimi, R., Burian, S.J., Pomeroy, C.A., McPherson, B.J., Apul, D., 2019. Combining hydrologic analysis and life cycle assessment
approaches to evaluate sustainability of water infrastructure: uncertainty analysis. Water 11 (12), 2592.
Thomalla, F., Downing, T., Spanger-Siegfried, E., Han, G., Rockström, J., 2006. Reducing hazard vulnerability: towards a common approach between
disaster risk reduction and climate adaptation. Disasters 30 (1), 39–48.
Tobin, G.A., 1995. The levee love affair: a stormy relationship? J. Am. Water Resour. Assoc. 31 (3), 359–367.
Tyler, S., Moench, M., 2012. A framework for urban climate resilience. Clim. Dev. 4 (4), 311–326.
United Nations Development Programme (UNDP), 2004. Reducing Disaster Risk: A Challenge for Development. http://www.undp.org/bcpr. (Accessed 3
May 2020).
United Nations, Department of Economic and Social Affairs, 2004. World Urbanization Prospects: The 2003 Revision. vol. 216 United Nations Publications.
Van Loon, A.F., Lester-Moseley, I., Rohse, M., Jones, P., Day, R., 2020. Creative practice as a potential tool to build drought and flood resilience in the
Global South. EGU. https://doi.org/10.5194/gc-2020-11.
Verchick, R.R., 2010. Facing Catastrophe. Harvard University Press, USA, Cambridge, MA.
Walker, B., Salt, D., 2006. Resilience Thinking. Island Press.
Wang, C.H., Blackmore, J.M., 2009. Resilience concepts for water resource systems. J. Water Resour. Plan. Manag. 135 (6), 528–536.
Wang, X., Khoo, Y.B., Wang, C.H., 2014. Risk assessment and decision- making for residential housing adapting to increasing stormtide inundation due to
sea level rise in Australia. Civ. Eng. Environ. Syst. 31 (2), 125–139.
Wang, W., Wang, D., Singh, V.P., Wang, Y., Wu, J., Wang, L., He, R., 2018. Optimization of rainfall networks using information entropy and temporal
variability analysis. J. Hydrol. 559, 136–155.
Wang, M., Fang, Y., Sweetapple, C., 2021. Assessing flood resilience of urban drainage system based on a ‘do-nothing’ benchmark. J. Environ. Manag.
288, 112472.
Xu, P., Wang, D., Singh, V.P., Wang, Y., Wu, J., Wang, L., He, R., 2018. A kriging and entropy-based approach to rain gauge network design. Environ.
Res. 161, 61–75.
Zeynolabedin, A., Ghiassi, R., Norooz, R., Najib, S., Fadili, A., 2021. Evaluation of geoelectrical models efficiency for coastal seawater intrusion by
applying uncertainty analysis. J. Hydrol. 603, 127086.
Zhang, X., Low, Y.M., Koh, C.G., 2020. Maximum entropy distribution with fractional moments for reliability analysis. Struct. Saf. 83, 101904.
Ziarh, G.F., Asaduzzaman, M., Dewan, A., Nashwan, M.S., Shahid, S., 2021. Integration of catastrophe and entropy theories for flood risk mapping in
peninsular Malaysia. J. Flood Risk Manage. 14 (1), e12686.
Chapter 12
Forecasting volatility in the stock market
data using GARCH, EGARCH, and GJR
models
Sarbjit Singh (a,b), Kulwinder Singh Parmar (c), and Jatinder Kaur (c,d)
(a) Guru Nanak Dev University College, Pathankot, Punjab, India; (b) Department of Mathematics, Guru Nanak Dev University, Amritsar, Punjab, India; (c) Department of Mathematics, I.K. Gujral Punjab Technical University, Kapurthala, Punjab, India; (d) Guru Nanak Dev University College, Amritsar, Punjab, India
1. Introduction
Forecasting volatility in financial markets has been getting more and more attention from researchers in diverse fields, stock
market experts, and business analysts since October 19, 1987, when the stock market crashed. The volatility reflects uncertainty in stock market data and is affected by many factors like high corporate profit, sudden events, regulatory bodies,
emotions, and sentiments of investors. Volatility measures the risk of a security and helps in estimating short-period fluctuations. The GARCH model is a conditional variance model that estimates the volatility in stock market returns, bonds, and
other market indices. It helps modelers in assessing risks and optimizing their decisions (Tian and Guo, 2003). The GARCH
model is generally used when the observations tend to cluster and do not form a linear pattern. In the case of time-series
data, the GARCH model is appropriate when the variance of error terms is serially autocorrelated and follows an autoregressive moving average process (Engle, 1982; Bollerslev, 1986; Barunik et al., 2016).
The generalized autoregressive conditional heteroskedasticity (GARCH) model describes an approach for estimating volatility in financial markets; it grew out of the ARCH framework introduced by Robert F. Engle (winner of the 2003 Nobel Prize in Economics) and was generalized by Bollerslev (1986). Financial modeling professionals all over the globe prefer the GARCH model for its capability in modeling
and predicting conditional variances and volatility in financial data (Dellaportas and Pourahmadi, 2012).
The most significant letter in GARCH is H, for heteroskedasticity. Statistically, heteroskedasticity occurs when the
standard errors of a variable observed over a specific period are nonconstant (Hadizadeh and Eslamian, 2017). Depending
on future trends, heteroskedasticity is of two types: unconditional and conditional. Usually, lower
volatilities accompany upward (positive) movements in stock prices, while downward (negative) swings of the same
magnitude point toward much higher volatilities (Jach and Kokoszka, 2010). The presence of heteroskedastic effects
makes the model quite challenging. Engle (1982) developed the time-varying variance model, and Bollerslev (1986)
extended the model to include the structure of an ARMA model. Since then, many studies have adopted the GARCH
framework to explain the volatility of the stock market. Both the up and down trends of the stock market tend to affect
the volatility (Bouoiyour and Selmi, 2015). The stock market is highly influenced by massive changes, while minor changes
tend to have a low impact. There is a negative correlation between the shocks and the returns of the stock market. The
market takes a much longer time to recover from adverse shocks leaving substantial impacts on stock pricing as compared
to positive jolts (Liu and Morley, 2009). Thus, a normal or otherwise symmetric distribution is not always an appropriate
assumption (Nelson, 1991; Chuang et al., 2007). Therefore, researchers have experimented with the GARCH model under other
well-known distributions, which are summarized in Table 1.
The key assumption in the GARCH model is that the variance will revert to the average value in the future. In financial
econometrics, GARCH effects are very predominant, because they capture the stylized facts of such data that show, for
example, volatility clustering, dependence without correlation, and tail heaviness (Paolella, 2018).
TABLE 1 Literature review on the GARCH model with some well-known distributions other than normal distribution for financial studies.

Related literature | Distribution used | Purpose of study
Bollerslev (1988), Baillie and Bollerslev (1989) | Student's t-distribution | To model the foreign exchange rate
Hsieh (1989) | Exponential distribution | Used for foreign exchange rates
Akgirary et al. (1991) | Exponential distribution | Applied to the distribution of prices of precious metals
Nelson (1991) | Exponential distribution | To study the U.S. stock market
Ding et al. (1993) | Asymmetric power autoregressive conditional heteroskedastic (APARCH) model using Standard and Poor's data | To investigate the long-memory property of stock market returns
Theodossiou (1994) | Exponential distribution | Used for foreign exchange rates
Koutmos and Theodossiou (1994) | Exponential distribution | Used for foreign exchange rates
Gallant et al. (1997) | Nonnormal distribution | For financial analysis
McMillan et al. (2000) | Symmetric and asymmetric densities | For the United Kingdom stock market
Lambert and Laurent (2001) | Skewed Student's t-distribution | Used in the GARCH framework
Siourounis (2002) | Nonnormal distribution | For financial analysis
Harris et al. (2004) | Skewed generalized Student's t-distribution | To capture stylized facts (skewness and leverage effects) of daily returns
Yu (2005) | Nonnormal distribution | For financial analysis
Chuang et al. (2007) | Logistic distribution and the scaled Student's t-distribution | Forecasting volatility in the financial markets
Fat-tail distributions usually represent the stylized facts of the stock market. Alberg et al. (2011) proposed that GARCH models with fat-tail
distributions are relatively better suited for analyzing returns on stocks. Finite-dimensional distributions of GARCH processes exhibit interesting features of regular variation. This feature is consistent with the heavy-tailedness possessed by
real-life log-return data. The regular variation arises because the squares of a stationary GARCH process are embedded in a multivariate linear stochastic recurrence equation. The convergence of sample autocorrelations is slow when the tails of
GARCH processes are heavy (Basrak et al., 2002).
In the last two decades, maximizing profits has always been a driving force for investors to shift toward algorithmic
trading and apply machine learning methods to investment decisions. The use of quantitative methods in economics and
finance research has increased dramatically with the advent of technology. This has led to the development of many theories that
explain the risk preferences of investors and the optimal allocation of assets in a portfolio under different risk-aversion conditions. All these theories are clubbed under Modern Portfolio Theory and the Efficient Frontier of optimal asset allocation
(Markowitz, 1952). Under this theory, an investor selects a portfolio at time t − 1 that produces a stochastic return at time t.
The model assumes investors are risk-averse and, when choosing among portfolios, they care only about the mean and
variance of their one-period investment returns. Thus, investors choose “mean-variance-efficient portfolios” in the sense
that the portfolios minimize the variance of portfolio return, given the expected return, and maximize the expected return
given variance (Fama and French, 2004). This theory thus covered an optimal combination of securities. However, it also
sketched a critical assumption among others, i.e., “today’s returns are a function of the decisions made in the past.” This
connectivity between the past and present actions provides researchers with an abundant amount of information contained
in so-called “histories.” This leads to the idea that the “history repeats itself in that ‘patterns’ of past price behavior will tend
to recur in the future” (Fama, 1965; Charles and Darne, 2005). However, on the contrary, some researchers are also of the
opinion that the future path of the price level of security cannot be predicted from the past, i.e., they believe in the theory of
random walks. To be precise, it implies that the past cannot predict the future in any meaningful way (Fama, 1965). Nelson
(1990), while studying the relationship between GARCH and similar models, found that with some restrictions on parameters for short intervals in the sequence of GARCH models, the conditional variance converges to a stochastic differential
equation with an inverse-gamma stationary distribution. It implies that the GARCH log-returns can be modeled approximately with Student’s t-distribution for sufficiently short time intervals. Since the GARCH process is Markovian, it is
enough to consider the convergence of Markov chains (Engle, 1982). In the beginning, error terms obtained after modeling
the financial time series by GARCH were handled by normal distribution, which Bollerslev (1987) treated later by Student’s t-distribution. In 1981, Harvey gave the Generalized Error Distribution (GED) by taking into account fat tails. After
testing various distributions, Liu and Hung (2010) concluded that the error distribution did not help much in improving
volatility forecasting using the GARCH model. However, Wilhelmsson (2006) found that if the leptokurtic property is
allowed in error distributions, it leads to significant improvements in forecasting when compared to the normal distribution.
The GARCH model introduced by Engle (1982) and Bollerslev (1986) is frequently employed to model excess kurtosis and
volatility clustering and forecast their volatility. But the residuals standardized by the conditional volatility computed by
using an estimated GARCH model still have excess kurtosis (Baillie and Bollerslev, 1989). It indicates the presence of
outliers in the returns series, which are not detected by the GARCH model (Balke and Fomby, 1994). Outliers can have
undesirable effects on the estimates of the parameters of the equation governing the volatility dynamics and the tests of
conditional homoskedasticity. Chen and Liu (1993) proposed a procedure to detect and correct additive outliers (AOs),
which Franses and Ghijsels (1999) later adapted to GARCH models. Köksal (2009) and Liu and
Hung (2010) drew similar conclusions. However, Balke and Fomby (1994) and Tolvi (2001) found that a large number
of detected outliers in time series are innovative outliers (IOs), especially for high-frequency data. Sharpe (1964), Lintner
(1965), Mossin (1966), and Treynor (1961) developed the capital asset pricing model, and Fama and French (1992, 1993) extended
this line of work with their factor-based model, the Fama-French factor model, to predict stock returns. Nijman and Sentana (1996)
discussed the contemporaneous aggregation of independent univariate GARCH processes as well as marginalization in
more general multivariate GARCH processes. They concluded that the class of strong GARCH processes is not closed
under these transformations whereas the class of the weak process is closed. Poon and Granger (2003) found that GARCH
generally dominates ARCH. However, asymmetric models, such as the exponential GARCH by Nelson (1991) and
GJR-GARCH by Glosten et al. (1993) tend to perform better than the original GARCH in some cases.
The present study deals with testing the performance of GARCH, EGARCH, and GJR models with Gaussian and Student’s t-distributions to forecast the volatility derived from conditional variances. Indian Stock Market data consisting of
daily closing prices of the BSE 100 S&P stock index from 2009 to 2019 have been selected for the study. The detailed methodology for the proposed study is described in Section 2. The application and results of the study are then
discussed in Section 3. Finally, the conclusions of the study are given in Section 4.
2. Methodology
In the present study, GARCH and its variants EGARCH and GJR are used to model and forecast volatility in BSE stock market data.
The main applications of GARCH lie in analyzing financial time-series data to find its conditional variances and volatilities. The GARCH model can be appropriate for time series data in which the variance of the error term is serially autocorrelated and follows an autoregressive moving average process. It assesses risk and expected returns for assets that exhibit
clustered periods of volatility in returns (Krishnan and Mukherjee, 2010; Dhaene and Wu, 2019).
The model first converts the prices into relative returns and then fits the historical data to a mean-reverting volatility
term structure through its internal optimization technique. Specifically, GARCH modeling involves three steps (a minimal code sketch is given after this list):
(1) Estimating a best-fitting autoregressive model
(2) Computing autocorrelations of the error term
(3) Testing for significance.
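A minimal sketch of these three steps in Python is given below, assuming the open-source arch package; the package choice, the placeholder return series, and the model settings are illustrative assumptions made here and are not necessarily the software used in the study.

import numpy as np
from arch import arch_model  # third-party package, assumed available

# Placeholder return series; in the study these would be the BSE log returns
rng = np.random.default_rng(0)
returns = 100 * 0.01 * rng.standard_normal(2359)  # expressed in percent

# Step (1): estimate a best-fitting (here constant-mean) model with GARCH(1,1) errors
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1, dist="normal")
result = model.fit(disp="off")

# Steps (2)-(3): the summary reports the estimated ARCH/GARCH coefficients together
# with their standard errors, t-statistics, and p-values (the significance tests)
print(result.summary())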
Because financial data is high-frequency data, a large number of volatility matrix estimation methods have been developed.
Since these methods employ an approximate factor model, they lead to a low-rank and sparse structure for the integrated
volatility matrix. But these models lack a dynamic structure that can predict future volatility matrices as the volatilities are
highly unstable in real-life practice. Also, there is volatility clustering (Mandelbrot, 1963; McMillan and Speight 2004).
The heterogeneous and autocorrelated feature of volatilities motivated the researchers to develop parametric models. Kim
and Fan (2019) developed a factor GARCH-Itô model to overcome the problem of predicting future volatility matrices
based on an approximate factor model.
2.1 Types of GARCH models
Since the original introduction of GARCH, many variations accommodating specific qualities of stock, industry, or economic data have emerged. In addition to the magnitude of returns, these variants incorporate their direction, which was not addressed in the original model.
In assessing risk, financial institutions incorporate GARCH models into their Value-at-Risk (VaR) projections, i.e., the maximum expected
loss (whether for a single investment or trading position, a portfolio, or at a division or firm-wide level) over a specified
period. GARCH models provide better gauges of risk than can be obtained by tracking the standard deviation
alone (Bollerslev, 1986; Chong et al., 1999). The GARCH process extends the ARCH process in the
same way that the standard time-series AR process is extended to the general ARMA process. Just as the autocorrelation and
partial autocorrelation functions are useful tools in identifying and checking the behavior of time-series in the conditional
mean ARIMA model, the autocorrelations and partial autocorrelations for the squared process help to identify GARCH
behavior in the conditional variance equation (Franses and Van Dijk, 1996; Cryer and Chan, 2008).
One of the challenges in modeling financial time series is to identify heteroskedastic effects, which imply that the
volatility of financial data is not constant. The volatility is the square root of conditional variance of the log return series.
If $\{y_t : t \in T\}$ denotes the stock price series, then the log returns are defined by

$$ r_t = \log y_t - \log y_{t-1} = \log\!\left(\frac{y_t}{y_{t-1}}\right) \qquad (1) $$

The volatility $\sigma_t$ is defined by

$$ \sigma_t^2 = \operatorname{Var}\!\left(r_t \mid \mathcal{F}_{t-1}\right) \qquad (2) $$

where $\mathcal{F}_{t-1}$ is the $\sigma$-algebra generated by $r_0, r_1, \ldots, r_{t-1}$.
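As a small numerical illustration of Eqs. (1) and (2), the following Python sketch computes log returns from a hypothetical price series; the prices are invented for the example and are not BSE data.

import numpy as np

# Hypothetical closing prices (illustrative values only)
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9, 103.4])

# Eq. (1): r_t = log(y_t / y_{t-1})
returns = np.diff(np.log(prices))

# A simple unconditional volatility estimate; the conditional volatility sigma_t of
# Eq. (2) is what the GARCH-type models of the next subsections estimate over time
print(returns)
print("sample volatility:", returns.std(ddof=1))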
2.1.1 GARCH model
GARCH model, being an extension of Engle’s ARCH model for variance heteroskedasticity, deals with the prediction of
future variances using past variances whenever the price series exhibits volatility clustering. GARCH models are widely
used conditional heteroskedastic models to explain volatility clustering in an innovations process. Volatility clustering
occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes
with time. The GARCH model is an autoregressive moving average model for conditional variances, in which the lagged conditional variances carry the GARCH coefficients and the lagged squared innovations carry the ARCH coefficients. Mathematically, the
GARCH (p, q) model for the log return series $\{r_t : t \in T\}$ is given by:

$$ r_t = \mu + \varepsilon_t \qquad (3) $$

where

$$ \varepsilon_t = \sigma_t z_t \quad \text{and} \quad \sigma_t^2 = \omega + \sum_{i=1}^{p} \gamma_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 \qquad (4) $$

Here the $z_t$ (called the innovations) are independent and identically distributed with mean 0 and variance 1, and $\varepsilon_t$ denotes the error term. The GARCH (p, q) model imposes the following constraints for stationarity and positivity of the conditional variance: (a) $\omega > 0$, which implies that the volatility cannot be zero or negative; (b) $\gamma_i > 0$, $\alpha_j > 0$, which capture the stylized characteristic of volatility clustering, with the conditional variance forecast increasing after large variations in returns; and (c) $\sum_{i=1}^{p} \gamma_i + \sum_{j=1}^{q} \alpha_j < 1$, a further condition reflecting volatility clustering.
The conditional variance model GARCH (p, q) is composed of the GARCH component polynomial (p past conditional
variances) and the ARCH component polynomial (q past squared innovations) (Posedel, 2005).
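To make the recursion in Eqs. (3) and (4) concrete, the following minimal Python sketch simulates a GARCH(1,1) process; the parameter values are illustrative and are not the estimates reported later in Tables 3 and 4.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative GARCH(1,1) parameters satisfying omega > 0, gamma, alpha > 0, gamma + alpha < 1
omega, gamma, alpha = 1e-6, 0.90, 0.07
mu, n = 0.0005, 1000

sigma2 = np.empty(n)
eps = np.empty(n)
sigma2[0] = omega / (1 - gamma - alpha)              # unconditional variance as a start value
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    # Eq. (4): sigma_t^2 = omega + gamma*sigma_{t-1}^2 + alpha*eps_{t-1}^2
    sigma2[t] = omega + gamma * sigma2[t - 1] + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()   # eps_t = sigma_t * z_t, z_t ~ N(0, 1)

returns = mu + eps   # Eq. (3): r_t = mu + eps_t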
2.1.2 EGARCH model
In the exponential GARCH (EGARCH) model, the logarithm of the conditional variance is modeled, with additional leverage terms to capture asymmetry in volatility clustering; the EGARCH model is thus an extension of the GARCH model.
The EGARCH (p, q) model consists of p GARCH coefficients corresponding to lagged log-variance terms, q ARCH
coefficients corresponding to lagged absolute standardized innovations, and q leverage coefficients applied to lagged standardized innovations. Mathematically, the conditional variance equation of an EGARCH (p, q) model is given by:
$$ \log \sigma_t^2 = \omega + \sum_{i=1}^{p} \gamma_i \log \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \left[ \frac{\left|\varepsilon_{t-j}\right|}{\sigma_{t-j}} - E\!\left(\frac{\left|\varepsilon_{t-j}\right|}{\sigma_{t-j}}\right) \right] + \sum_{j=1}^{q} \xi_j \,\frac{\varepsilon_{t-j}}{\sigma_{t-j}} \qquad (5) $$

where the $\gamma_i$ ($i = 1, 2, \ldots, p$) are the GARCH coefficients, the $\alpha_j$ ($j = 1, 2, \ldots, q$) capture volatility clustering, and the $\xi_j$ measure the leverage effect. In the EGARCH model there is no need to impose positivity constraints, because the logarithm of the variance is modeled. In the EGARCH model equation, the distribution of the innovation $z_t$ can be Gaussian or Student's t
(Brummelhuis and Guegan, 2005).
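A minimal Python sketch of the EGARCH(1,1) recursion in Eq. (5) follows; the parameter values are illustrative, and the expected absolute innovation E|z| assumes Gaussian innovations.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative EGARCH(1,1) parameters (not the fitted values of Tables 3 and 4)
omega, gamma, alpha, xi = -0.3, 0.96, 0.12, -0.10
n = 1000

log_sigma2 = np.empty(n)
eps = np.empty(n)
log_sigma2[0] = omega / (1 - gamma)
eps[0] = np.exp(0.5 * log_sigma2[0]) * rng.standard_normal()

ez = np.sqrt(2.0 / np.pi)   # E|z| for standard normal innovations

for t in range(1, n):
    z_prev = eps[t - 1] / np.exp(0.5 * log_sigma2[t - 1])   # standardized innovation
    # Eq. (5): log sigma_t^2 = omega + gamma*log sigma_{t-1}^2
    #          + alpha*(|z_{t-1}| - E|z|) + xi*z_{t-1}
    log_sigma2[t] = omega + gamma * log_sigma2[t - 1] + alpha * (abs(z_prev) - ez) + xi * z_prev
    eps[t] = np.exp(0.5 * log_sigma2[t]) * rng.standard_normal()

A negative leverage coefficient xi makes negative shocks raise the log-variance more than positive shocks of the same size, which is the asymmetry the model is designed to capture.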
2.1.3 GJR model
The GJR model is a variant of the GARCH model that includes leverage terms for modeling asymmetric volatility clustering and is named for Glosten, Jagannathan, and Runkle. In the GJR formulation, large negative changes are more likely
to be clustered than positive changes. The GJR (p, q) model has p GARCH coefficients associated with lagged variances,
q ARCH coefficients associated with lagged squared innovations, and q leverage coefficients associated with the squares of
negative lagged innovations. Mathematically, the conditional variance equation of the GJR (p, q) model is given by:
$$ \sigma_t^2 = \omega + \sum_{i=1}^{p} \gamma_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 + \sum_{j=1}^{q} \xi_j \, I\!\left[\varepsilon_{t-j} < 0\right] \varepsilon_{t-j}^2 \qquad (6) $$
The term $I(\cdot)$ is the indicator function, which equals 1 if $\varepsilon_{t-j} < 0$ and 0 otherwise; thus, the leverage coefficients are
applied only to negative innovations. The stationarity and positivity constraints for the GJR model are (a) $\omega > 0$; (b) $\gamma_i > 0$, $\alpha_j > 0$; (c) $\alpha_j + \xi_j \geq 0$; and (d) $\sum_{i=1}^{p} \gamma_i + \sum_{j=1}^{q} \alpha_j + \tfrac{1}{2} \sum_{j=1}^{q} \xi_j < 1$.
The GJR model reduces to the GARCH model if all leverage coefficients are zero which makes the GARCH model a
special case of the GJR model (Baillie and Bollerslev, 1989; Agnolucci, 2009).
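The GJR recursion of Eq. (6) differs from Eq. (4) only through the indicator term; a minimal Python sketch with illustrative parameters is given below.

import numpy as np

rng = np.random.default_rng(2)

# Illustrative GJR(1,1) parameters; setting xi = 0 recovers the plain GARCH(1,1) model
omega, gamma, alpha, xi = 1e-6, 0.90, 0.02, 0.10
n = 1000

sigma2 = np.empty(n)
eps = np.empty(n)
sigma2[0] = omega / (1 - gamma - alpha - 0.5 * xi)   # unconditional variance (symmetric innovations)
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    indicator = 1.0 if eps[t - 1] < 0 else 0.0        # I[eps_{t-1} < 0]
    # Eq. (6): sigma_t^2 = omega + gamma*sigma_{t-1}^2 + (alpha + xi*I[eps_{t-1}<0])*eps_{t-1}^2
    sigma2[t] = omega + gamma * sigma2[t - 1] + (alpha + xi * indicator) * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()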
The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are probabilistic statistical measures
used to quantify the fit of a model in the modeling phase and to select the best model among a set of
candidate models. The fitted model with the lowest AIC and BIC values is considered best for model testing and forecasting purposes (Franses and Van Dijk, 1996). The AIC and BIC values for a model are found from the following equations:

$$ \mathrm{AIC} = -2 \log L + 2p \qquad (7) $$

$$ \mathrm{BIC} = -2 \log L + p \log(n) \qquad (8) $$

where $L$ is the likelihood function, $p$ is the number of parameters in the model, and $n$ is the number of samples in the
modeling phase.
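In code, the two criteria of Eqs. (7) and (8) reduce to a single small helper; the log-likelihood values and parameter counts below are invented purely to illustrate the comparison.

import numpy as np

def aic_bic(log_likelihood, n_params, n_obs):
    """Eqs. (7)-(8): information criteria computed from a fitted model's log-likelihood."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    bic = -2.0 * log_likelihood + n_params * np.log(n_obs)
    return aic, bic

# Hypothetical comparison of two candidate models fitted on the same sample
print(aic_bic(log_likelihood=8100.0, n_params=3, n_obs=2359))
print(aic_bic(log_likelihood=8110.0, n_params=4, n_obs=2359))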
Moreover, the conditional variance models fitted to the daily and weekly returns with the appropriate GARCH,
EGARCH, and GJR specifications are propagated forward using Monte Carlo simulation to obtain simulated conditional variance paths.
3. Application and results
The present study is based on the use of autoregressive conditional heteroskedastic models to estimate volatility in BSE 100
S&P stock index data of 10 years ranging from June 1, 2009, to June 14, 2019 (Data Source: https://www.bseindia.com).
FIG. 1 Time series plot of daily and weekly BSE stock prices. [Figure: daily stock prices (top panel) and weekly stock prices (bottom panel) plotted against date, 2010-2019.]
TABLE 2 Statistical analysis of daily and weekly stock index data.

Frequency | Min | Max | Mean | Standard deviation | Variance | Skewness | Kurtosis
Daily | 8.3065 | 9.4096 | 8.8804 | 0.2896 | 0.0839 | 0.1943 | 1.6690
Weekly | 8.3178 | 9.3963 | 8.8822 | 0.2900 | 0.0841 | 0.1906 | 1.6651
FIG. 2 Daily and weekly returns of BSE stock returns. [Figure: BSE daily stock returns (top panel) and weekly stock returns (bottom panel) plotted against date, 2010-2019.]
Daily and weekly frequencies of price indices are used as inputs for the proposed study, whose time series plots have been
shown in Fig. 1. A statistical analysis of the daily and weekly stock prices is given in Table 2. Because the BSE index reflects
the movement of both prices and returns, the stock index data are converted to returns for the daily and
weekly frequencies using Eq. (1). Fig. 2 shows the plots of daily and weekly BSE stock returns. For each frequency, the
in-sample modeling period is from June 1, 2009, to November 29, 2018, and the remaining period of 2018 and 2019
constitutes the out-of-sample forecasting period. For the daily frequency, the in-sample period consists of 2359 points and
the out-of-sample period of 132 points, while for the weekly frequency, the in-sample period consists of 495 points and
the out-of-sample period of 29 points.
For daily and weekly stock returns, the values of skewness are found to be 0.3022 and 0.1668 respectively, which
highlight the effect of asymmetric components contributing toward risk. The daily and weekly stock returns represent the
presence of skewness, leptokurtosis, and fat tails.
The daily and weekly returns of the BSE stock index are used as input to the GARCH, EGARCH, and GJR conditional
variance models to forecast the conditional variances and hence the volatility. First, the autocorrelation function
(ACF) and partial autocorrelation function (PACF) of the squared residuals are plotted to examine conditional heteroskedasticity. The residuals for the daily and weekly frequencies are calculated by the following formula:

$$ \text{Residuals} = \text{returns} - \operatorname{mean}(\text{returns}) $$
ACF and PACF plots shown in Figs. 3 and 4 indicate significant autocorrelation and the presence of volatility clustering in
the residual series. The volatility clustering in the residual series can also be estimated by Ljung-Box Q-test. The presence
of conditional heteroskedasticity can also be estimated by testing residuals for ARCH effects. The F statistic for the test is
57.1877, which is greater than the critical value 9.2103 from the distribution with two degrees of freedom for daily returns.
Thus, the null hypothesis of “no ARCH effects” is rejected at a level of significance. Similarly, the residuals of weekly
returns of BSE stock prices reflect ARCH effects, which are clear from the F-statistic value for the test (57.1877) and
critical value (9.2103) from the distribution with two degrees of freedom. It means that the residual series exhibits conditional heteroskedasticity (Drakos et al., 2010).
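A compact way to reproduce this kind of diagnostic is sketched below in Python: the sample autocorrelation function of the squared residuals signals ARCH effects when several lags lie outside an approximate significance band. The returns array here is a random placeholder, not the BSE series.

import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(3)
returns = 0.01 * rng.standard_normal(2359)      # placeholder daily returns
residuals = returns - returns.mean()

acf_squared = sample_acf(residuals ** 2)
band = 1.96 / np.sqrt(len(residuals))           # rough 95% band for an uncorrelated series
print("lags outside the band:", np.where(np.abs(acf_squared) > band)[0] + 1)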
The next step is to test the daily and weekly returns for normality using Jarque-Bera (JB) test. The value of the JB-test
statistic is 521.4188, which is larger than the critical value 5.9621 and indicates rejection of the null hypothesis of
“normality of daily returns” at a level of significance. It indicates that the daily returns are nonnormal with high values
of kurtosis. Similarly, for the weekly returns the JB-test statistic of 35.2496 exceeds the critical value of 5.8912, so the null hypothesis
is rejected, indicating that the weekly returns are also nonnormal.
FIG. 3 ACF and PACF of squared residuals for daily returns. [Figure: sample autocorrelation function (top panel) and sample partial autocorrelation function (bottom panel) of the squared residuals for lags 0-20.]
FIG. 4 ACF and PACF of squared residuals for weekly returns. [Figure: sample autocorrelation function (top panel) and sample partial autocorrelation function (bottom panel) of the squared residuals for lags 0-20.]
TABLE 3 Estimation results for GARCH, EGARCH, and GJR model parameters with Gaussian distribution.

Model | Frequency | Parameter | Value | SE | t-Statistic | P-value
GARCH (1,1) | Daily | Constant ω | 1.8923e-06 | 7.6804e-07 | 2.4638 | 0.013748
GARCH (1,1) | Daily | GARCH γ | 0.91403 | 0.0099932 | 91.465 | 0
GARCH (1,1) | Daily | ARCH α | 0.067787 | 0.0078861 | 8.5958 | 8.2721e-18
GARCH (1,1) | Weekly | Constant ω | 3.5898e-06 | 1.2718e-06 | 2.8227 | 0.0047624
GARCH (1,1) | Weekly | GARCH γ | 0.18505 | 0.11231 | 1.6477 | 0.099409
GARCH (1,1) | Weekly | ARCH α | 0.26434 | 0.087087 | 3.0353 | 0.0024026
EGARCH (1,1) | Daily | Constant ω | 0.30966 | 0.04304 | 7.1943 | 6.2781e-13
EGARCH (1,1) | Daily | GARCH γ | 0.96612 | 0.0046121 | 209.48 | 0
EGARCH (1,1) | Daily | ARCH α | 0.12541 | 0.016856 | 7.4405 | 1.003e-13
EGARCH (1,1) | Daily | Leverage ξ | 0.10396 | 0.0087637 | 11.863 | 1.8442e-32
EGARCH (1,1) | Weekly | Constant ω | 1.2102 | 0.40467 | 2.9905 | 0.0027848
EGARCH (1,1) | Weekly | GARCH γ | 0.84112 | 0.052609 | 15.988 | 1.542e-57
EGARCH (1,1) | Weekly | ARCH α | 0.21477 | 0.068153 | 3.1513 | 0.0016257
EGARCH (1,1) | Weekly | Leverage ξ | 0.1836 | 0.053037 | 3.4617 | 0.00053669
GJR (1,1) | Daily | Constant ω | 2.1252e-06 | 6.7635e-07 | 3.1421 | 0.0016773
GJR (1,1) | Daily | GARCH γ | 0.90967 | 0.0095547 | 95.207 | 0
GJR (1,1) | Daily | ARCH α | 0.016508 | 0.0089696 | 1.8405 | 0.0657
GJR (1,1) | Daily | Leverage ξ | 0.114 | 0.012789 | 8.9139 | 4.9259e-19
GJR (1,1) | Weekly | Constant ω | 5.382e-05 | 2.1272e-05 | 2.5301 | 0.011403
GJR (1,1) | Weekly | GARCH γ | 0.77692 | 0.069548 | 11.171 | 5.6523e-29
GJR (1,1) | Weekly | ARCH α | 0.016048 | 0.029889 | 0.53693 | 0.59132
GJR (1,1) | Weekly | Leverage ξ | 0.21371 | 0.078476 | 2.7233 | 0.0064639
TABLE 4 Estimation results for GARCH, EGARCH, and GJR model parameters with Student's t-distribution.

Model | Frequency | Parameter | Value | SE | t-Statistic | P-value
GARCH (1,1) | Daily | Constant ω | 2.1142e-06 | 9.8443e-07 | 2.1477 | 0.031739
GARCH (1,1) | Daily | GARCH γ | 0.90578 | 0.014978 | 60.473 | 0
GARCH (1,1) | Daily | ARCH α | 0.07413 | 0.012107 | 6.123 | 9.1813e-10
GARCH (1,1) | Daily | DoF | 8.1296 | 1.2305 | 6.6066 | 3.9333e-11
GARCH (1,1) | Weekly | Constant ω | 3.5898e-06 | 1.4695e-06 | 2.4429 | 0.01457
GARCH (1,1) | Weekly | GARCH γ | 0.18505 | 0.15613 | 1.1852 | 0.23593
GARCH (1,1) | Weekly | ARCH α | 0.26434 | 0.10143 | 2.6061 | 0.0091578
GARCH (1,1) | Weekly | DoF | 10 | 3.4478 | 2.9004 | 0.0037273
EGARCH (1,1) | Daily | Constant ω | 0.32685 | 0.059042 | 5.536 | 3.0953e-08
EGARCH (1,1) | Daily | GARCH γ | 0.96458 | 0.0063342 | 152.28 | 0
EGARCH (1,1) | Daily | ARCH α | 0.12384 | 0.020731 | 5.9734 | 2.3235e-09
EGARCH (1,1) | Daily | Leverage ξ | 0.11306 | 0.013209 | 8.5598 | 1.1305e-17
EGARCH (1,1) | Daily | DoF | 9.6831 | 1.6158 | 5.9926 | 2.0652e-09
EGARCH (1,1) | Weekly | Constant ω | 1.1334 | 0.40638 | 2.789 | 0.0052875
EGARCH (1,1) | Weekly | GARCH γ | 0.85133 | 0.052716 | 16.149 | 1.1457e-58
EGARCH (1,1) | Weekly | ARCH α | 0.20907 | 0.072454 | 2.8856 | 0.0039073
EGARCH (1,1) | Weekly | Leverage ξ | 0.18536 | 0.054362 | 3.4097 | 0.00065032
EGARCH (1,1) | Weekly | DoF | 22.575 | 16.368 | 1.3792 | 0.16782
GJR (1,1) | Daily | Constant ω | 2.8559e-06 | 8.9592e-07 | 3.1877 | 0.0014341
GJR (1,1) | Daily | GARCH γ | 0.89932 | 0.013685 | 65.714 | 0
GJR (1,1) | Daily | ARCH α | 0.0088796 | 0.010881 | 0.81604 | 0.41448
GJR (1,1) | Daily | Leverage ξ | 0.13669 | 0.020179 | 6.7736 | 1.2563e-11
GJR (1,1) | Daily | DoF | 7.986 | 1.2484 | 6.3971 | 1.5838e-10
GJR (1,1) | Weekly | Constant ω | 5.1359e-05 | 2.1476e-05 | 2.3915 | 0.016781
GJR (1,1) | Weekly | GARCH γ | 0.78324 | 0.07072 | 11.075 | 1.6545e-28
GJR (1,1) | Weekly | ARCH α | 0.014222 | 0.031427 | 0.45253 | 0.65088
GJR (1,1) | Weekly | Leverage ξ | 0.21657 | 0.083663 | 2.5886 | 0.009636
GJR (1,1) | Weekly | DoF | 24.552 | 21.96 | 1.118 | 0.26355
Now, the daily and weekly BSE stock returns are treated with GARCH, EGARCH, and GJR models with Gaussian and
student’s t-distributions. The model parameters are estimated using their respective conditional variance equations, given in
Tables 3 and 4. Using AIC and BIC criteria, the conditional variance models GARCH, EGARCH, and GJR with Gaussian
and student’s t-distributions are compared in respective Tables 5 and 6 for daily and weekly frequencies. The values of AIC
and BIC are least for the EGARCH model as compared to GARCH and GJR models for both frequencies with Gaussian and
student’s t-distributions. Using these models, daily and weekly out-of-sample volatilities are estimated. Mean absolute
error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE)
are used as measures to obtain error statistics which are defined by
TABLE 5 AIC and BIC values after modeling daily and weekly returns with GARCH, EGARCH, and GJR models with Gaussian distribution.

Frequency | Criteria | GARCH (1,1) | EGARCH (1,1) | GJR (1,1)
Daily | AIC | −1.6129e+04 | −1.6216e+04 | −1.6195e+04
Daily | BIC | −1.6112e+04 | −1.6193e+04 | −1.6172e+04
Weekly | AIC | −2.5239e+03 | −2.5385e+03 | −2.5358e+03
Weekly | BIC | −2.5111e+03 | −2.5215e+03 | −2.5188e+03
TABLE 6 AIC and BIC values after modeling daily and weekly returns with GARCH, EGARCH, and GJR models with Student's t-distribution.

Frequency | Criteria | GARCH (1,1) | EGARCH (1,1) | GJR (1,1)
Daily | AIC | −1.6190e+04 | −1.6266e+04 | −1.6253e+04
Daily | BIC | −1.6167e+04 | −1.6237e+04 | −1.6224e+04
Weekly | AIC | −2.5239e+03 | −2.5384e+03 | −2.5352e+03
Weekly | BIC | −2.5069e+03 | −2.5171e+03 | −2.5139e+03
$$ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| v(t) - \hat{v}(t) \right| $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left[ v(t) - \hat{v}(t) \right]^2 } $$

$$ \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{\left| v(t) - \hat{v}(t) \right|}{v(t)} $$

where $\hat{v}(t)$ denotes the predicted value of the volatility $v(t)$.
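These error measures are straightforward to compute; a small Python helper is shown below with invented volatility values used only to illustrate the calculation.

import numpy as np

def forecast_errors(v, v_hat):
    """MAE, MSE, RMSE, and MAPE between observed v(t) and predicted v_hat(t)."""
    v, v_hat = np.asarray(v, dtype=float), np.asarray(v_hat, dtype=float)
    err = v - v_hat
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err) / v)
    return mae, mse, rmse, mape

# Illustrative call with placeholder volatility series
v = np.array([0.010, 0.012, 0.011, 0.013])
v_hat = np.array([0.011, 0.011, 0.012, 0.012])
print(forecast_errors(v, v_hat))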
The simulated conditional variances obtained from Monte Carlo simulations for the daily and weekly frequencies with the Gaussian and Student's t-distributions are shown in Figs. 5 and 6, respectively. Table 7 shows the error statistics for the daily and weekly returns of the BSE 100 S&P index. The EGARCH model with the Student's t-distribution is more accurate than the GARCH and GJR models with Gaussian and Student's t-distributions for both daily and weekly frequencies; the improved prediction might be due to better capturing of the leverage effect. The MAPE obtained for daily returns using the EGARCH model with the Student's t-distribution is 14.2577, which is lower than the MAPE of 15.1214 for the EGARCH model with the Gaussian distribution. Similar behavior is seen in the MAPE values of the EGARCH model for the weekly frequency. Moreover, the MAE, MSE, and RMSE are lower for the EGARCH model with the Student's t-distribution than for the GARCH and GJR models with the Gaussian distribution for daily returns. The errors observed for the weekly frequency follow the same pattern as those for the daily frequency under both the Student's t and Gaussian distributions.
FIG. 5 Simulated conditional variances obtained using GARCH, EGARCH, and GJR models for daily and weekly frequencies with Gaussian distribution. [Figure: six panels (a)-(f) showing simulated paths, means, and confidence bounds of the conditional variances, 2010-2019.]
FIG. 6 Simulated conditional variances obtained using GARCH, EGARCH, and GJR models for daily and weekly frequencies with Student's t-distribution. [Figure: six panels (a)-(f) showing simulated paths, means, and confidence bounds of the conditional variances, 2010-2019.]
TABLE 7 Forecast error statistics for daily and weekly frequencies with Gaussian and Student's t-distributions.

Distribution | Frequency | Model | MAE | RMSE | MAPE
Gaussian | Daily | GARCH (1,1) | 0.0016 | 0.0018 | 20.5424
Gaussian | Daily | EGARCH (1,1) | 0.0014 | 0.0017 | 15.1214
Gaussian | Daily | GJR (1,1) | 0.0016 | 0.0019 | 17.3108
Gaussian | Weekly | GARCH (1,1) | 3.2795e-04 | 3.4871e-04 | 12.8366
Gaussian | Weekly | EGARCH (1,1) | 0.0028 | 0.0033 | 12.9630
Gaussian | Weekly | GJR (1,1) | 0.0030 | 0.0035 | 16.9324
Student's t | Daily | GARCH (1,1) | 0.0016 | 0.0018 | 20.6181
Student's t | Daily | EGARCH (1,1) | 0.0012 | 0.0015 | 14.2577
Student's t | Daily | GJR (1,1) | 0.0016 | 0.0019 | 17.1776
Student's t | Weekly | GARCH (1,1) | 3.2795e-04 | 3.4871e-04 | 12.8366
Student's t | Weekly | EGARCH (1,1) | 0.0026 | 0.0032 | 12.4820
Student's t | Weekly | GJR (1,1) | 0.0031 | 0.0036 | 17.3702
4. Conclusions
The present study investigates the modeling of daily and weekly returns of BSE 100 S&P index using conditional variance
models of GARCH, EGARCH, and GJR models to obtain out-of-sample volatility forecast. The daily and weekly BSE
stock returns from June 1, 2009, to November 29, 2018 are used as input to the selected models. For daily frequency,
the in-sample period consists of 2359 points and the out-of-sample period consists of 132 points, while for weekly frequency, the in-sample period consists of 495, and the out-of-sample period consists of 29 sample points. These models
were applied to input data of daily and weekly stock index returns with Gaussian distribution and student’s t-distribution.
The MAPE value for daily returns using the EGARCH model with student’s t-distribution is 14.2577, which is comparatively lower than the MAPE value of 15.1214 for the EGARCH model with Gaussian distribution. Similarly, the MAPE
values for weekly frequency have been observed to be quite low for the EGARCH model. The other errors have also been
observed to be low for the EGARCH model with student’s t-distribution as compared to GARCH and GJR model with
Gaussian distribution for daily returns. Overall, the volatility forecasts improve when the GARCH, EGARCH, and GJR models are fitted with the Student's t-distribution rather than the Gaussian distribution, and among these conditional variance models the EGARCH model with the Student's t-distribution produces the most accurate results for both daily and weekly frequencies.
References
Agnolucci, P., 2009. Volatility in crude oil futures: a comparison of the predictive ability of GARCH and implied volatility models. Energy Econ. 31 (2),
316–321.
Akgirary, V., Booth, G.C., Hatem, J.C., Mustafa, C., 1991. Conditional dependence in precious metal prices. Financ. Rev. 26 (3), 367–386.
Alberg, D., Shalit, H., Yosef, R., 2011. Estimating stock market volatility using asymmetric GARCH models. Appl. Financ. Econ. 18 (15), 1201–1208.
Baillie, R., Bollerslev, T., 1989. Common stochastic trends in a system of exchange rates. J. Monet. Econ. 44 (1), 167–181.
Balke, N.S., Fomby, T.B., 1994. Large shocks, small shocks, and economic fluctuations: outliers in macroeconomic time series. J. Appl. Econ. 9, 181–200.
Barunik, J., Krehlik, T., Vacha, L., 2016. Modeling and forecasting exchange rate volatility in time-frequency domain. Eur. J. Oper. Res. 251 (1), 329–340.
Basrak, B., Davis, R.A., Mikosch, T., 2002. Regular variation of GARCH processes. Stoch. Process. Appl. 99, 95–115.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. J. Econ. 31, 307–327.
Bollerslev, T., 1987. A conditionally heteroskedastic time series model for speculative prices and rates of return. Rev. Econ. Stat. 69 (3), 542–547.
Bollerslev, T., 1988. On the correlation structure for the generalized autoregressive heteroscedastic process. J. Time Ser. Anal. 9 (2), 121–131.
Bouoiyour, J., Selmi, R., 2015. Exchange volatility and export performance in Egypt: new insights from wavelet decomposition and optimal GARCH
model. J. Int. Trade Econ. Dev. 24 (2), 201–227.
Brummelhuis, R., Guegan, D., 2005. Multiperiod conditional distribution functions for conditionally normal GARCH (1, 1) models. J. Appl. Probab. 42
(2), 426–445.
Charles, A., Darne, O., 2005. Outliers and GARCH models in financial data. Econ. Lett. 86 (3), 347–352.
Chen, C., Liu, L.M., 1993. Joint estimation of model parameters and outlier effects in time series. J. Am. Stat. Assoc. 88, 284–297.
Chong, C.W., Ahmad, M.I., Abdullah, M.Y., 1999. Performance of GARCH models in forecasting stock market volatility. J. Forecast. 18 (5), 333–343.
Chuang, I.Y., Lu, J.R., Lee, P.H., 2007. Forecasting volatility in the financial markets: a comparison of alternative distributional assumptions. Appl.
Financ. Econ. 17, 1051–1060.
Cryer, J.D., Chan, K.S., 2008. Time series regression models. In: Time Series Analysis: With Applications in R. Springer, pp. 249–276.
Dellaportas, P., Pourahmadi, M., 2012. Cholesky-GARCH models with applications to finance. Stat. Comput. 22 (4), 849–855.
Dhaene, G., Wu, J., 2019. Incorporating overnight and intraday returns into multivariate GARCH volatility models. J. Econ. 217 (2), 471–495.
Ding, Z., Engle, R.F., Granger, C.W.J., 1993. A long memory property of stock market return and a new model. J. Empir. Finance 1 (1), 83–106.
Drakos, A.A., Kouretas, G.P., Zarangas, L.P., 2010. Forecasting financial volatility of the Athens stock exchange daily returns: an application of the asymmetric normal mixture GARCH model. Int. J. Finance Econ. 15 (4), 331–350.
Engle, R.F., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation. Econometrica 50, 987–1007.
Fama, E.F., 1965. The behavior of stock-market prices. J. Bus. 38 (1), 34–105.
Fama, E.F., French, K.R., 1992. The cross-section of expected stock returns. J. Finance 47 (2), 427–465.
Fama, E.F., French, K.R., 1993. Common risk factors in the returns on stock and bonds. J. Financ. Econ. 33, 3–56.
Fama, E.F., French, K.R., 2004. The capital asset pricing model: theory and evidence. J. Econ. Perspect. 18 (3), 25–46.
Franses, P.H., Ghijsels, H., 1999. Additive outliers, GARCH and forecasting volatility. Int. J. Forecast. 15, 1–9.
Franses, P.H., Van Dijk, D., 1996. Forecasting stock market volatility using (non-linear) Garch models. J. Forecast. 15 (3), 229–235.
Gallant, R., Hsieh, D., Tauchen, G., 1997. Estimation of stochastic volatility models with diagnostics. J. Econ. 81 (1), 159–192.
Glosten, L., Jangannathan, R., Runkle, D., 1993. On the relation between excepted value and the volatility of the nominal excess return of stocks. J. Finance
48, 1779–1801.
Hadizadeh, R., Eslamian, S., 2017. Modeling hydrological process by ARIMA–GARCH time series. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of
Drought and Water Scarcity. Principles of Drought and Water Scarcity, vol. 1. Taylor and Francis, CRC Press, USA, pp. 571–590 (Chapter 30).
Harris, R.D., Coskun Küçüközmen, C., Yilmaz, F., 2004. Skewness in the conditional distribution of daily equity returns. Appl. Financ. Econ. 14 (3), 195–
202.
Hsieh, D., 1989. Modeling heteroscedasticity in daily exchange rates. J. Bus. Econ. Stat. 7 (3), 307–317.
Jach, A., Kokoszka, P., 2010. Empirical wavelet analysis of tail and memory properties of LARCH and FIGARCH models. Comput. Stat. 25 (1), 163–182.
Kim, D., Fan, J., 2019. Factor GARCH-Ito models for high-frequency data with application to large volatility matrix prediction. J. Econ. 208, 395–417.
Köksal, B., 2009. A comparison of conditional volatility estimators for the ISE national 100 index returns. J. Econ. Soc. Res. 11 (2), 1–29.
Koutmos, G., Theodossiou, P., 1994. Time-series properties and predictability of Greek exchange rates. Manag. Decis. Econ. 15 (2), 159–177.
Krishnan, R., Mukherjee, C., 2010. Volatility in Indian stock markets: a conditional variance tale re-told. J. Emerg. Mark. Finance 9 (1), 71–93.
Lambert, P., Laurent, S., 2001. Modelling Financial Time Series Using GARCH-Type Models with a Skewed Student Distribution for the Innovations. No.
UCL-Universite Catholique de Louvain, Belgium.
Lintner, J., 1965. The valuation of risk assets on the selection of risky investments in stock portfolios and capital budgets. Rev. Econ. Stat. 47, 13–37.
Liu, H.C., Hung, J.C., 2010. Forecasting S&P-100 stock index volatility: the role of volatility asymmetry and distributional assumption in GARCH models.
Expert Syst. Appl. 37 (7), 4928–4934.
Liu, W., Morley, B., 2009. Volatility forecasting in the Hang Seng index using the GARCH approach. Asia-Pac. Financ. Mark. 16 (1), 51–63.
Mandelbrot, B., 1963. The variation of certain speculative prices. J. Bus. 36 (4), 394–419.
Markowitz, H., 1952. Portfolio selection. J. Finance 7 (1), 77–91.
McMillan, D.G., Speight, A.E., 2004. Daily volatility forecasts: reassessing the performance of GARCH models. J. Forecast. 23 (6), 449–460.
McMillan, D., Speight, A., Ap Gwilym, O., 2000. Forecasting UK stock market volatility. Appl. Financ. Econ. 10 (4), 435–448.
Mossin, J., 1966. Equilibrium in a capital asset market. Econometrica 34 (4), 768–783.
Nelson, D.B., 1990. ARCH models as diffusion approximations. J. Econ. 45, 7–28.
Nelson, D., 1991. Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59, 349–370.
Nijman, T., Sentana, E., 1996. Marginalization and contemporaneous aggregation in multivariate GARCH processes. J. Econ. 71, 71–87.
Paolella, M.S., 2018. Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH. John Wiley & Sons.
Poon, S., Granger, C.W.J., 2003. Forecasting volatility in financial markets: a review. J. Econ. Lit. 41 (2), 478–539.
Posedel, P., 2005. Properties and estimation of GARCH (1, 1) model. Metodoloski zvezki 2 (2), 243.
Sharpe, W., 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. J. Finance 19, 425–442.
Siourounis, D., 2002. Modeling volatility and testing for efficiency in emerging capital markets: the case of the Athens stock exchange. Appl. Financ. Econ.
12 (1), 47–55.
Theodossiou, P., 1994. The stochastic properties of major Canadian exchange rates. Financ. Rev. 29 (2), 193–221.
Tian, G., Guo, M., 2003. Intraday data and volatility models: evidence from Chinese stocks. Economics. Working Paper.
Tolvi, J., 2001. Outliers in eleven Finnish macroeconomic time series. Finn. Econ. Pap. 14, 14–32.
Treynor, J.L., 1961. Market Value, Time, and Risk. SSRN. August 8, 1961, https://doi.org/10.2139/ssrn.2600356.
Wilhelmsson, A., 2006. GARCH forecasting performance under different distribution assumptions. J. Forecast. 25 (8), 561–578.
Yu, J., 2005. On leverage in a stochastic volatility model. J. Econ. 127 (2), 165–178.
Chapter 13
Gene expression models
Hossien Riahi-Madvara, Mahsa Gholamib, and Saeid Eslamianc,d
a
Department of Water Engineering, Faculty of Agriculture, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran, b Department of Civil Engineering,
Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran, c Department of Water Engineering, College of Agriculture, Isfahan University of
Technology, Isfahan, Iran, d Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
This chapter considers the gene expression programming (GEP), different types, and developments of GEP-based techniques in hydroinformatics. The genetic programming (GP), linear genetic programming (LGP), GEP, multigene genetic
programming, tree-GEP, and pareto-optimal multigene genetic programming are introduced, and their applicability in
various fields of hydraulic and hydroinformatics is discussed. The results of the case studies of GEP-based models confirmed the accuracy and suitability of these techniques in various aspects of hydraulic and hydrology models. Several
function findings using GEP-based methods in hydraulic and river engineering are performed, and their explicit equations
are derived as a strength model induction engine. The results of the case studies confirmed the general aspects and accuracy
of GEP in function findings.
Gene expression programming (GEP) is one of the applied fields in evolutionary processing and genetic algorithms
(Ferreira, 2001, 2002a, 2002b, 2006). In GEP, based on the mathematical approach of genetic algorithms (GA), natural selection, and
the concept of parse trees, the computer produces the code automatically rather than having a human develop it
(Li et al., 2005). A high-level command is given to the computer, and the machine, considering the general
concepts of the problem, develops the required code in an expressional form (Ferreira, 2003). In GEP, the machine develops the
code, executes the optimized program, and produces a symbolic predictive equation. In the early 1990s, Koza (1990)
introduced and developed this family of automatic function-finding algorithms, which have since been widely applied in hydrology and
hydraulics. Because of its ability in automatic function finding, GEP is known under different names, such as automatic
program induction, program synthesis, automatic programming, function finding, model induction engine (Riahi-Madvar
et al., 2019). The GEP creates a population of symbolic programs and motives for the final model using the selection and
reproduction operators of biological science to achieve the goal of automatic function induction, based on Darwinian
Theory and GA ideas (Poli et al., 2008). In brief, the GEP can be defined as a biologically-motivated expert system that
produces computer codes to complete a function, and simultaneously evolves the symbolic assembly of the model and the
parameters of an evolved mathematical system (Searson et al., 2010).
Many studies of GEP-based modeling in hydroinformatics are available in the literature. These models have been used to
estimate the scour process around the hydraulic structures (Guven et al., 2009; Azamathulla and Ghani, 2010; Najafzadeh and
Barani, 2011; Najafzadeh and Oliveto, 2021; Najafzadeh and Kargar, 2019; Parsaie and Haghiabi, 2021; Khan et al., 2018);
longitudinal dispersion and transverse mixing in rivers (Riahi-Madvar et al., 2019; Nezaratian et al., 2021); modeling the
spatial distribution of flow depth in fluvial systems (Yan et al., 2021); stage-discharge modeling (Zahiri and Shabani, 2018;
Birbal et al., 2021); water quality modeling (Najafzadeh et al., 2019); Groundwater vulnerability assessment (Norouzi et al.,
2021); Reference evapotranspiration modeling (Kazemi et al., 2021); River water temperature modeling (Keramatloo et al.,
2020). This chapter introduces the different types of gene expression models in hydroinformatics.
2. Genetic programming
Genetic programming (GP) was defined by Koza (1992) to produce automatic solutions to different problems. GP
processes a complete set of components or computer codes for function finding. It then creates a random population of
computer codes, or specimens, to produce new populations and discover the fittest result (Sattar and Gharabaghi, 2015;
Danandeh Mehr and Kahya, 2017; Dufek et al., 2017). This population-based structure of GP results in more robustness
(Eslamian et al., 2012). Because of its intrinsic parallelism, the GP is a straightforward algorithm based on the terms of
conceptualization and performance (Dufek et al., 2017). GP is one of the most popular artificial intelligence and self-constructing techniques, in which the modeler does not need to specify the form and structure of the solution. The GP
structure follows a tree framework, containing root node(s), internal node(s), and leaves
(Danandeh Mehr and Kahya, 2017). GP creates the initial population by generating random individuals (i.e., trees) using the full, grow, and ramped half-and-half (RHH) methods to achieve the highest diversity (Koolivand-Salooki et al., 2017). Then,
three well-known operations, the reproduction, the crossover, and the mutation, are used to generate new generations
(Babovic and Keijzer, 2000). The reproduction engine evaluates the individuals in all iterations and selects them for
copying and inserting into the new populations and new generations. The crossover step exchanges subtrees between the preferred
parents' chromosomes to produce offspring. The mutation step replaces a randomly selected subtree with a new one from
the preferred parents (Saljoughi, 2017). In the growth of trees and subtrees, the new place for individuals is determined from
a competitive process that results in improvements of the population’s fitness at successive steps (Dufek et al., 2017).
Input feature selection is a crucial task for creating a GP model. There are four main inputs in a typical GP model: (1) data
clustering and subsets in train and test steps, (2) the goal function for final selection (e.g., root mean square error), (3) the
number of inner nodes and the structure of subtrees and location of leaves, and (4) GP features for the establishment of
an arrangement tree as a computer code of the potential solution. The GP model’s final set may include (a) the program’s
external inputs, (b) arithmetic functions, and (c) coefficients. To choose the best groups of the arithmetic functions, an
accurate initial guess that contains all the necessary functions is needed. Depending upon the level of complexity
in the investigated process, the arithmetic functions in the initial collection may include the basic mathematical operations
(i.e., plus (+), minus (−), times (×), divide (÷)) or more complicated ones (i.e., sin, cos, sqrt, log, exp, max, min, sigmoid, etc.)
(Danandeh Mehr and Kahya, 2017; Ravansalar et al., 2017). GP produces sequences of formulas with different complexities, but moderately complex formulas are favored. The final selected formula should be both accurate and straightforward: a
simple formula may not be very accurate, whereas a very complex formula is often over-fitted (Tinoco et al., 2015).
2.1 The basic steps in GEP development
To produce the GEP code for a high-level problem, the high-level program should be converted into a form appropriate for
GA. The following five steps should be taken into account by the human rather than the machine (Abraham et al., 2006):
1. The elements of final terms in the GEP should be determined. To convert a program code into the required chromosome
forms of GA as a parse tree, it is necessary to decide on the leaves in the program graph and program tree. These final
elements in the tree graph are the terminals of the model. The terminals are the collection of input variables, independent
variables, functions without input vectors, or the constants determined randomly.
2. Determination of the function set and operation commands could be used in the generation and reproduction phase
of code.
3. Defining the fitness function as an objective function to evaluate the code's fitness over generations and find the best-optimized GA solution for the symbolic models.
4. Determining the structural parameters of GA such as the method of selection of parents in GA, the mutation rate, the
reproduction rate, the cross-over rate, etc.
5. Setting the termination condition, and defining the procedure for determining the termination results of produced code.
2.2 Parse trees in GEP
The main building blocks of GEP are parse trees. A tree-based GEP model is a systematic procedure used to generate symbolic, expressional solutions to a given problem driven by genetic algorithms (Jin, 2020). The final solution in a tree-based GEP is a tree-like structure whose leaves are the terminal nodes at the lowest level of the model and whose nonterminal nodes sit at the higher levels. In parse trees, the equational form of a model is represented by a tree graph. A key requirement in generating programs is that the consistency and grammatical correctness of the generated code can be evaluated. For example, raw C++ code written by a programmer cannot be executed directly, because it may contain grammatical errors in its initial form and is written in a high-level language that must first be converted into machine form. The solution is the conversion of the code into a simpler, uniform form called a parse tree. In the execution phase of high-level code written in C/C++, the code is converted into a parse tree on which the compiler can operate. Using this framework in the GEP structure likewise makes the definition of the genetic operators simpler, and it eases error finding and the correction of syntax and runtime errors. If the compiler succeeds in generating a parse tree from the written code, the code has no syntax or runtime errors. How, then, is the parse-tree form of an expression such as x + 12 represented? The corresponding parse tree is shown in Fig. 1.
FIG. 1 The parse tree form of the x + 12 expression. [Figure: a tree with '+' at the root and 'x' and '12' as its two leaves.]
FIG. 2 The parse tree form of the ((x/y + (8 + 40)) − (3*pow(x, z))) expression. [Figure: a tree with '−' at the root; its left subtree combines x/y with 8 + 40 under '+', and its right subtree multiplies 3 by pow(x, z).]
In this parse tree, the operator is at the root, and the operands are the left and right leaves of the root node. The parse-tree form of a more complex expression such as ((x/y + (8 + 40)) − (3*pow(x, z))) is shown in Fig. 2.
In parse trees, all operators and functions that require input arguments appear in the internal (root and subtree) nodes, while the variables, constants, and functions that do not require inputs appear in the leaf nodes. The nodes that do not depend on anything else, or that have no children, are the terminals or variables. In this way, parse trees make it possible to represent computer code in a simple and consistent form. One of the best-known programming languages that works on parse trees is Lisp, in which the code itself is written as a kind of parse tree. For example, the Lisp code for the expression x + 5 is (+ x 5), and the Lisp code for the expression ((x/y + (8 + 40)) − (3*pow(x, z))) is (− (+ (/ x y) (+ 8 40)) (* 3 (pow x z))). Nowadays, this method of code representation is used in different programming environments such as MATLAB to provide the final output of GEP models (Raiahi-Madvar and Seifi, 2016).
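The same idea can be illustrated outside Lisp: the short Python sketch below represents the expression ((x/y + (8 + 40)) − (3*pow(x, z))) as a nested-tuple parse tree and evaluates it recursively. The tuple layout and the evaluate helper are assumptions made for this example, not part of any GEP package.

import operator

# Each internal node is (operator_name, left_child, right_child); leaves are variable
# names or numeric constants.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "pow": operator.pow}

tree = ("-",
        ("+", ("/", "x", "y"), ("+", 8, 40)),
        ("*", 3, ("pow", "x", "z")))

def evaluate(node, env):
    """Recursively evaluate a parse tree given the variable bindings in env."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(node, node)   # variable lookup, or the numeric constant itself

print(evaluate(tree, {"x": 2.0, "y": 4.0, "z": 3.0}))   # (2/4 + 48) - 3*2**3 = 24.5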
3. Tree-based GEP
The GEP is based on the GA in the automatic construction of optimal program structures and nonlinear equations. As stated,
the parse tree is one of the best methods in symbolic program generations. The tree encodes the individuals in the population
by utilizing genetic operators such as mutation and crossover over the parse-tree representation. The tree-based GEP, through a
hierarchical encoding of trees having roots, nodes, and leaves connected by branches, shows the model's syntactic structure in
a context-free grammar together with the corresponding pseudo-code, as shown in Fig. 3.
In this figure, the end nodes known as terminal nodes represent the input arguments of functions, and the arithmetic
operations are located at the nonterminal nodes. In optimizing a GEP model, the primary genetic operators for tree-based
production are the selection operator and the crossover operator. The mutation process implements a certain degree of
diversity over the population. In the generations, the fitness function in all the individuals is assessed and the goodness
of fit calculated, and the individuals are graded regarding their fitness. The selection operation regarding the fitness values
chooses the best individuals to contribute in mutation or cross over for offspring. The selection usually is based on the
tournament and roulette-wheel selection that acts based on fitness values. After applying the selection operation to the
individuals, the crossover is used over the branches to produce offspring by changing the selected branches of two parents,
as shown in Fig. 4.
FIG. 3 Parse-tree demonstrations of the computer programs in tree-based GEP (Jin, 2020). [Figure: two example trees, one encoding the mathematical expression f = (x + y)/z − y*(x*z) and the other encoding f = (x*x)/z*y.]
The branches are selected randomly so as to maintain appropriate diversity over the population in successive generations. The crossover points in a tree-based GEP can be either nonterminal or terminal nodes over the branches. The final operation in tree-based GEP development is the mutation, which randomly selects a branch and replaces it with a randomly produced branch, as shown in Fig. 5.
Fig. 6 shows a flowchart of tree-based GEP creation in a genetic-process context. Individuals are initially generated at random. In each generation, the fitness of the model is evaluated using the tree-based GEP, and the individuals are used to reproduce new generations through selection, crossover, and mutation, which enhances the model's performance over the generations. The generation process is continued until the model reaches the stopping criterion, based on a predetermined maximum number of generations or an error threshold. The evolution process is represented in the following figure.
First and foremost, two issues should be addressed in the construction of the GEP model. First, the closure property should be satisfied. Closure in GEP means that every function that has input arguments must be able to handle any value that may be passed to it. For instance, suppose we write a program that exclusively employs the mathematical operators {*, /, +, -} as the functions and the values {0, 1, 2, ...} as the terminal values. In this example, closure is not satisfied, because a program containing {2/0} produces a divide-by-zero error. To satisfy closure, the divide function should be defined in such a way that it does not break the program and does not cause it to terminate abnormally. The second issue is that the problem under consideration must be solvable using a combination of the chosen terminals and functions. For example, using the function set {+, -} and terminals {1, 2, 3, ...}, the problem of calculating the logarithm of numbers cannot be solved accurately. As a result, the modeler's skill in choosing the implemented functions and terminals is critical.
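A minimal sketch of a "protected" division commonly used to satisfy closure; returning 1.0 for a zero denominator is an assumed convention here, not a rule fixed by the text:

def protected_div(a, b, eps=1e-12):
    """Return a/b, or 1.0 when |b| is numerically zero, so that any evolved
    program remains executable instead of raising a divide-by-zero error."""
    return a / b if abs(b) > eps else 1.0

print(protected_div(8.0, 2.0))   # 4.0
print(protected_div(2.0, 0.0))   # 1.0 instead of a ZeroDivisionError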
3.1 Tree depth control
In the population generation of GEP, the maximum depth of the trees should be confined in order to control the complexity of the problem and the following challenges (Raiahi-Madvar and Seifi, 2016):
● growing the maximum depth of the trees increases the required system memory;
● increasing the depth of the trees reduces the running speed of the genetic operations and increases the time needed to reach the final solution;
● the optimum programs are usually composed of smaller commands and simpler trees.
Therefore, it is necessary to control the maximum allowable depth of the trees in the evolutionary process, which can be done by fixing a maximum depth or by penalizing large trees.
3.2 Maximum tree depth
In this approach, during the production of the initial population and during the application of the genetic operators, a pruning method is applied to any tree whose depth exceeds the predefined maximum depth, which guarantees that the trees do not grow beyond that depth. The weakness of this method is that the final optimum solution may require a maximum depth greater than the predefined value, which makes it necessary to increase the maximum depth and repeat the evolutionary process.
FIG. 4 Application of the crossover operation in the tree-based GEP model (Jin, 2020).
3.3 Penalizing the large trees
In this method, a penalty proportional to the depth of large trees is applied in the evaluation phase of the chromosomes. In this manner, as the trees grow the penalty increases, reducing the chance of selecting large trees and thereby controlling the maximum depth of the trees.
FIG. 5 Application of the mutation operation over the tree-based GEP: a randomly selected branch is removed and replaced by a randomly generated tree branch (Jin, 2020).
FIG. 6 The flowchart of tree-based GEP development: random generation of the initial population, fitness evaluation, selection, reproduction (crossover/mutation), and a stopping criterion (Jin, 2020).
3.4 Dynamic maximum-depth technique
As stated previously, the search space in the evolutionary process of tree-based GEP is potentially unlimited. Therefore, the
trees may grow in depth and size throughout the evolution process. As a result, some parts of the parse-tree structure may be
redundant with less influence on the improvements of fitness function (Koza, 1992). Another sophisticated technique in
maximum depth control is the dynamic procedure. In this approach, rather than the predefined fixed maximum depth, the
maximum depth is adjusted during the evolution according to the best fitness value of trees ( Jin, 2020). The pseudo-code of
dynamic maximum depth procedure is presented in Table 1.
TABLE 1 Pseudo-code for the dynamic maximum-depth (DMD) technique (Jin, 2020).
For i = 1 : total number of individuals
    Depth(i) = depth of the ith individual
    Fitness(i) = fitness of the ith individual
    If Depth(i) < Maximum depth
        Choose the ith individual
        If Fitness(i) < Best fitness
            Best fitness = Fitness(i)
        End
    Else
        If Fitness(i) < Best fitness
            Choose the ith individual
            Best fitness = Fitness(i)
            Maximum depth = Depth(i)
        End
    End
End
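A minimal Python sketch of the same selection rule, assuming fitness is minimized (smaller is better) and that depth and fitness are already available for each individual:

def dynamic_max_depth_filter(individuals, max_depth):
    """Apply the DMD rule of Table 1: individuals within the current depth
    limit are always accepted; a deeper individual is accepted only if it
    improves the best fitness seen so far, in which case the limit grows."""
    best_fitness = float('inf')
    chosen = []
    for depth, fitness in individuals:          # each individual = (depth, fitness)
        if depth < max_depth:
            chosen.append((depth, fitness))
            if fitness < best_fitness:
                best_fitness = fitness
        elif fitness < best_fitness:
            chosen.append((depth, fitness))
            best_fitness = fitness
            max_depth = depth                   # dynamically relax the limit
    return chosen, max_depth

population = [(5, 0.9), (7, 0.4), (12, 0.1), (12, 0.5)]
print(dynamic_max_depth_filter(population, max_depth=10))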
4. Linear genetic programming
LGP is an advanced version of tree-based GEP with a linear structure of program representation (Alavi and Gandomi, 2012). The LGP form of the expression x + 12, presented as a tree-based GP in Fig. 1, is as follows:
f[0] = 0;
L0: f[0] += x[2];
L1: f[0] += 12;
return f[0];
In LGP, an individual program is a variable-length sequence of simple C instructions. Arithmetic operations, conditional statements, and function calls compose the instructions. The LGP steps for solving a problem are as follows (Alavi and Gandomi, 2012):
● Initialization: randomly generate the initial population and calculate the objective function of each individual.
● Genetic operators:
  ✓ Tournament: randomly select individuals from the population and identify the best and worst among them.
  ✓ Crossover: generate new individuals by exchanging parts of the best-selected individuals.
  ✓ Elitism: replace the worst individuals with the results of the previous step.
  ✓ Repeat the genetic loop until convergence or termination of the model.
More details and descriptions of the basic parameters used to direct a search for a linear genetic program are given by
Brameier et al. (2007).
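As an illustration of the linear representation, the following Python sketch (a simplified stand-in, not Brameier's implementation) evaluates a variable-length instruction list operating on registers, in the spirit of the C-style instructions shown above:

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def run_linear_program(instructions, x):
    """Each instruction is (dest, op, src1, src2); operands starting with 'r'
    are registers, operands starting with 'x' are inputs, otherwise constants."""
    r = [0.0] * 4                                   # four working registers
    def value(tok):
        if isinstance(tok, str) and tok.startswith('r'):
            return r[int(tok[1:])]
        if isinstance(tok, str) and tok.startswith('x'):
            return x[int(tok[1:])]
        return float(tok)
    for dest, op, a, b in instructions:
        r[int(dest[1:])] = OPS[op](value(a), value(b))
    return r[0]                                     # register r0 holds the output

# Linear encoding of f = x0 + 12, analogous to the C-style listing above
program = [('r0', '+', 'x0', 12)]
print(run_linear_program(program, x=[5.0]))          # 17.0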
5. Evolutionary polynomial regression
EPR is an evolutionary data-driven technique that uses evolutionary computation to derive polynomial equations. It hybridizes the search ability of the genetic algorithm with numerical regression to find symbolic regressions.
The EPR procedure follows two primary steps. In the first step, the symbolic architecture of the regression is derived, and in the second step the coefficients of the symbolic model are determined by least-squares error. Giustolisi and Savic (2006) developed a multi-objective genetic algorithm using accuracy and complexity (minimum number of inputs, minimum length of expressions) as the optimization goals (Balf et al., 2018).
The number and range of the input parameters and the type of functions selected for the components of the symbolic regression (e.g., natural logarithm, hyperbolic tangent, and exponential) have a crucial impact on the EPR performance (Najafzadeh et al., 2017). More details on EPR and the software for its application are given by Giustolisi and Savic (2006).
6. Multigene genetic programming
Recently, the MGGP (Searson, 2009) has been established as one of the newest GP-based models. The MGGP produces more straightforward models in comparison with traditional GP, because it linearly combines simple, low-depth subtrees (Searson, 2015). Every computer code, or individual, in the MGGP model is a weighted linear combination of several genes (trees) plus a bias term as the noise component (Riahi-Madvar et al., 2021). An example of a hypothetical multigene tree structure is presented in Fig. 7, and the equivalent equation of this model can be written as
Gene 1: -6.27[(x1 - 7.65) + tan(7.51x1)] = -6.27 tan(7.51x1) - 6.27x1 + 47.99
Gene 2: -17[x1 + tan(7.57x1)] = -17 tan(7.57x1) - 17x1
Gene 3: -8[(-7.57x1) + (cos(x2) + x1 + x2)] = 52.56x1 - 8 cos(x2) - 8x2
F(x1, x2) = 29.29x1 - 8x2 - 6.27 tan(7.51x1) - 17 tan(7.57x1) - 8 cos(x2) + 47.99     (1)
where 47.99 is the bias component and -6.27, -17, and -8 are the regression coefficients, i.e., the gene weights. Generally, the linear coefficients of every MGGP individual are determined by ordinary least-squares optimization. The user specifies the maximum depth of the trees (Dmax) and the maximum number of genes (Gmax); these maxima control the complexity of the final solution (Searson et al., 2010). It has been demonstrated that the MGGP method captures nonlinear behavior more accurately than the traditional linear regression method (Danandeh Mehr and Kahya, 2017). Also, Searson et al. (2010) illustrated that the MGGP method can effectively be embedded into a nonlinear partial least-squares method.
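A minimal numerical sketch of how the gene weights and bias of an MGGP individual could be obtained by ordinary least squares; the gene functions below are stand-ins for illustration, not the trees of Fig. 7:

import numpy as np

# Assumed example genes: each gene maps the input matrix X (n samples x 2) to a vector
gene_funcs = [
    lambda X: np.tan(7.51 * X[:, 0]),        # stand-in for gene 1
    lambda X: np.tan(7.57 * X[:, 0]),        # stand-in for gene 2
    lambda X: np.cos(X[:, 1]) + X[:, 0],     # stand-in for gene 3
]

def fit_gene_weights(X, y, gene_funcs):
    """Build the design matrix [1, g1(X), g2(X), ...] and solve for the
    bias and gene weights by ordinary least squares."""
    G = np.column_stack([np.ones(len(X))] + [g(X) for g in gene_funcs])
    coeffs, *_ = np.linalg.lstsq(G, y, rcond=None)
    return coeffs                            # coeffs[0] = bias, coeffs[1:] = gene weights

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.2, size=(50, 2))
y = 2.0 + 0.5 * np.tan(7.51 * X[:, 0]) - 1.5 * (np.cos(X[:, 1]) + X[:, 0])   # synthetic target
print(fit_gene_weights(X, y, gene_funcs))    # approx. [2.0, 0.5, 0.0, -1.5]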
In the MGGP, the initial population is generated from GP trees having a randomly chosen number of genes between 1 and Gmax. Each gene is a simple GP tree that is independent of the other genes in the same chromosome.
FIG. 7 Representation of the MGGP model with three genes (Riahi-Madvar et al., 2021).
Throughout an MGGP run, genes are acquired and deleted using a two-point high-level crossover (Danandeh Mehr and Kahya, 2017; Gandomi and Alavi, 2012). For example, let Gi be the ith gene of an individual, let the first parent be constructed with three genes G1, G2, and G3, and let the second parent contain four genes G4, G5, G6, and G7. In each individual, two crossover points are chosen at random, and the genes enclosed by the selected crossover points (i.e., marked by [...]) are exchanged. Two new individuals are created as follows:
(G1 [G2] G3), (G4 G5 [G6 G7]) → (G1 G6 G7 G3), (G4 G5 G2)     (2)
If an offspring ends up with more genes than Gmax, the additional genes are deleted at random (Searson et al., 2010; Gandomi and Alavi, 2012).
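A minimal sketch of this two-point high-level crossover on gene lists (gene contents are just labels here; the Gmax trimming described above is included):

import random

def high_level_crossover(parent1, parent2, g_max, rng=random):
    """Swap randomly chosen gene segments between two parents and randomly
    delete surplus genes so each child respects the Gmax limit."""
    i1, j1 = sorted(rng.sample(range(len(parent1) + 1), 2))
    i2, j2 = sorted(rng.sample(range(len(parent2) + 1), 2))
    child1 = parent1[:i1] + parent2[i2:j2] + parent1[j1:]
    child2 = parent2[:i2] + parent1[i1:j1] + parent2[j2:]
    def trim(child):
        child = list(child)
        while len(child) > g_max:
            child.pop(rng.randrange(len(child)))   # delete a random surplus gene
        return child
    return trim(child1), trim(child2)

random.seed(1)
p1 = ['G1', 'G2', 'G3']
p2 = ['G4', 'G5', 'G6', 'G7']
print(high_level_crossover(p1, p2, g_max=4))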
7. Pareto optimal-multigene genetic programming
The Pareto optimal approach has been used in many engineering problems because it readily produces different Pareto frontiers (Coello and Becerra, 2003). Pareto optimization works by balancing different optimization goals. The feasible solutions of the multi-objective optimization are determined by separating the solutions that satisfy the dominance condition. If X1 and X2 are two feasible solutions, then X1 dominates X2 if it satisfies the following conditions (Riahi-Madvar et al., 2021):

Obj_d(X1) ≤ Obj_d(X2), ∀d ∈ {1, 2, …, D}
Obj_i(X1) < Obj_i(X2), ∃i ∈ {1, 2, …, D}     (3)

where Obj_d is the dth objective value of a solution, and D is the total number of goals.
If a solution X* satisfies the above conditions and there is no feasible solution X that dominates X*, then X* is preserved as a Pareto (noninferior) solution, and the dominated solutions are eliminated. The Pareto front is built up from the objective values of the selected solutions (Fig. 8). Fig. 8 illustrates that solution B is preferred to solution E with respect to both goals, so E is a dominated solution and is eliminated. Solutions B and C are incomparable: B is better than C in terms of Obj1, but worse in terms of Obj2. Therefore, A, B, C, and D are all Pareto solutions. The boundary line of the resulting domain is called the Pareto front, and E, F, and G are removed as dominated solutions (Zhang et al., 2017).
In the Pareto optimization algorithm, each new solution (pi) is compared with the nondominated solutions (qj). If pi dominates qj, pi is selected and replaces qj. If neither pi nor qj dominates the other with respect to Obj1 and Obj2, both qj and pi are kept for the next comparisons (Zhang et al., 2017). Fig. 9 shows the flowchart of PO-MGGP. The PO-MGGP hybridizes robust techniques of feature selection, Pareto optimization, and multi-expression discovery.
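A minimal sketch of the dominance test of Eq. (3) for minimization objectives (a simplified stand-in for the PO-MGGP bookkeeping, not its actual implementation):

def dominates(obj1, obj2):
    """True if obj1 dominates obj2: no worse in every objective and
    strictly better in at least one (Eq. 3)."""
    return (all(a <= b for a, b in zip(obj1, obj2))
            and any(a < b for a, b in zip(obj1, obj2)))

def pareto_front(solutions):
    """Keep only the nondominated solutions (each entry is a tuple of objective values)."""
    return [s for i, s in enumerate(solutions)
            if not any(dominates(o, s) for j, o in enumerate(solutions) if j != i)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.5), (3.5, 3.0), (4.0, 4.0)]
print(pareto_front(pts))   # [(1.0, 4.0), (2.0, 2.0), (3.0, 1.5)]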
FIG. 8 Relationship between Pareto solutions.
FIG. 9 The flowchart of PO-MGGP modeling.
8. Some applications of GEP-based models in hydroinformatics
In this section, some applications of GEP-based models to different hydroinformatic problems are presented, demonstrating the ability of GEP-based techniques to find explicit functions. In this study, a gene expression programming package implemented in MATLAB is used to derive the exact form of the functions. The software package is designed to accept different inputs and to examine different mathematical functions and operations.
8.1 Derivation of a quartic polynomial function using GEP
In the first applicability test of the gene expression program, GEP is used to derive a quartic polynomial function. A set of single-input, single-output data is generated, and the final form of the function is derived using the developed GEP model. The following function is evaluated between 0 and 100 with steps of 2:
Y = X + X² + X³ + X⁴     (4)
The GEP model is executed with the following parameters: population: 50; generations: 100; match selection value: 5; probability: 0.05; error threshold: 1e-30; maximum tree depth: 12; maximum depth of subtrees in mutation: 7; and the addition, subtraction, power, multiplication, division, and sine functions. Based on the results in Fig. 10, the minimum value of the target function, which is the sum of absolute errors, is 9.0896e-8.
FIG. 10 Changes of the absolute error sum of the target function over the generations for the quartic polynomial.
The results of the GEP model and of the original function are compared in Fig. 11. It is observed that, despite the wide variation of the output variable, gene expression programming predicted the values correctly. The above function is one of the functions used by Koza (1990) as a reference function in the evaluation of GEP-based models.
The final computer program generated by the model is as follows (Fig. 12):
plus(plus(times(x1, x1), times(times(plus(times(x1, x1), x1), x1), x1)), x1)     (5)
This symbolic expression simplifies exactly to the quartic polynomial of Eq. (4).
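A quick symbolic check of this simplification (a verification sketch using SymPy, not part of the original GEP toolchain):

import sympy as sp

x1 = sp.Symbol('x1')
# Eq. (5) written out exactly as produced by the GEP run
expr = (x1*x1 + ((x1*x1 + x1)*x1)*x1) + x1
print(sp.expand(expr))                                             # x1**4 + x1**3 + x1**2 + x1
print(sp.expand(expr) == sp.expand(x1 + x1**2 + x1**3 + x1**4))    # True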
8.2 Derivation of Colebrook-White equation using GEP
In the second test of GEP, a database is generated by numerical solution of the Colebrook-White equation, and the accuracy of GEP in extracting the exact form of the Colebrook-White equation is evaluated. The Reynolds number varies in the range 4000 to 100,000,000 and the relative roughness in the range 0.000001 to 0.05; by numerical solution of the Colebrook-White equation, f is obtained in the range 0.015751715 to 0.0715537. The whole dataset is divided randomly into three subsets: training 60%, test 20%, and cross-validation 20%. GEP is executed with a population size of 90, 100 generations, tournament selection of 4, maximum tree depth of 12, and the addition, subtraction, multiplication, division, exponential, power, and hyperbolic tangent functions.
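As an illustration of how such a database could be generated, the following sketch solves the Colebrook-White equation, 1/√f = -2 log10(e/(3.7D) + 2.51/(Re √f)), by fixed-point iteration; this is a standard formulation, and the exact variant used to build the chapter's dataset is an assumption here:

import math

def colebrook_f(re, rel_rough, tol=1e-12, max_iter=100):
    """Darcy friction factor from the Colebrook-White equation by
    fixed-point iteration on x = 1/sqrt(f)."""
    x = 8.0                                   # initial guess for 1/sqrt(f)
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_rough / 3.7 + 2.51 * x / re)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return 1.0 / x_new**2

# Generate a few (Re, e/D, f) samples like those used to train the GEP model
for re in (4e3, 1e5, 1e8):
    for ed in (1e-6, 1e-3, 0.05):
        print(re, ed, round(colebrook_f(re, ed), 6))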
FIG. 11 Comparison of GEP-predicted and actual values of the quartic polynomial.
FIG. 12 GEP-generated program for the quartic polynomial.
FIG. 13 The weights of the multigene terms in PO-MGGP for the Colebrook-White equation (gene weights: X2: 0.753, X3: 0.06727, X4: -1.321, X1: -3.207, plus a bias term).
Changes of the target function over the generations are shown in Fig. 13, in which the minimum value of the target function is 3.307e-5, reached in the 49th generation. The best RMSE values are 3.307e-5 for the training data, 3.3203e-5 for the test data, and 3.1432e-5 for the validation data. Figs. 14 and 15 show the observed and predicted values for the three data sets. The predicted and actual values are practically identical, confirming the model accuracy. GEP finally derived the explicit equation for the Colebrook-White friction factor as:
f = 0.6131 + 0.05028√(e/D) + 0.4137 Ln[2 Re Ln(√(e/D) + 0.05028√(e/D))] + 0.4137√(e/D) - 0.3955 Ln(Re) - 0.3955 Ln(√(e/D) + Ln(Re))     (6)
FIG. 14 Comparison of actual and GEP predicted values in Colebrook white relation derivation.
FIG. 15 Comparison of actual and GEP predicted values trend in Colebrook White equation derivation.
8.3 Derivation of the exact form of the Shields diagram using GEP
The Shields diagram shows how the critical shear stress changes with the sediment particle Reynolds number and is used to calculate the threshold of sediment particle movement, for which different equations have been derived by many researchers. In this part, the Wu (2007) set of equations is used to generate a data set, and then an exact expression for the Shields diagram is derived using GEP. Wu (2007) suggested the following set of equations for the Shields diagram:
τc / [(γs - γ)d] =
  0.126 D*^(-0.44),   D* < 1.5
  0.131 D*^(-0.55),   1.5 ≤ D* < 10
  0.0685 D*^(-0.27),  10 ≤ D* < 20
  0.0173 D*^(0.19),   20 ≤ D* < 40
  0.0115 D*^(0.30),   40 ≤ D* < 150
  0.052,              150 ≤ D*                              (7)

D* = d[(ρs/ρ - 1) g / ν²]^(1/3)
In the above equation, d is the particle size [m] and τc is the critical shear stress [N/m²]. A dataset for dimensionless particle sizes between 0.01 and 400 is generated, and the exact form is then extracted by GEP. GEP is executed with the settings of the previous example, and after convergence the RMSE values for the training, test, and validation data are 0.0003859, 0.00037937, and 0.00038873, respectively. Finally, the GEP exact form of the Shields diagram is derived as:
τc / [(γs - γ)d] = 0.08460 + 3.0255(D* + 8.639)^(-2) + 0.887646 Exp(√D*) / [(D* + 8.639)² √D*] - 0.0000127106 (D* - 2.939185942)(D*^0.5 + 8.639 D*) + 0.040208497 (D* + 3.509)^0.5 D*^0.25 - 0.00008529 D*^0.75     (8)
In Fig. 16, the Shields diagram from the GEP-based equation is plotted against Eq. (7), and it is clear that GEP reproduces it accurately and explicitly.
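For reference, a short sketch of Wu's piecewise relation (Eq. 7) as it could be used to generate such a training data set; the constant branch of 0.052 for D* ≥ 150 is assumed from continuity of the reconstructed equation:

import numpy as np

def shields_critical(d_star):
    """Dimensionless critical shear stress after Wu (2007), Eq. (7)."""
    conditions = [d_star < 1.5,
                  (d_star >= 1.5) & (d_star < 10),
                  (d_star >= 10) & (d_star < 20),
                  (d_star >= 20) & (d_star < 40),
                  (d_star >= 40) & (d_star < 150),
                  d_star >= 150]
    functions = [lambda D: 0.126 * D**-0.44,
                 lambda D: 0.131 * D**-0.55,
                 lambda D: 0.0685 * D**-0.27,
                 lambda D: 0.0173 * D**0.19,
                 lambda D: 0.0115 * D**0.30,
                 0.052]
    return np.piecewise(d_star, conditions, functions)

d_star = np.logspace(np.log10(0.01), np.log10(400), 200)   # range used in the text
print(shields_critical(d_star)[:3], shields_critical(d_star)[-3:])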
FIG. 16 Derivation of the Shields diagram using GEP: GEP-derived equation versus the Wu equation, plotted against D*.
8.4 Extraction of regime river equations using GEP
In this section, to study the capability and strength of GEP in deriving empirical geometry equations of regime rivers, a field database is used and a model is developed. To this end, a data set related to the geometry of river regimes is collected from field studies. This data set is then used in GEP, the exact forms of the desired relations are derived, and the accuracy of the developed program in practical problems and field studies is confirmed. Using GEP, exact relations are derived for the geometrical properties of regime rivers. Based on the results, the following equation is derived for the regime depth:
H = 0.00003256Q - 0.00001639Q·D50 - 0.002263·Q/Exp(D50) + 0.05363·Q^0.5·D50^0.5 + 0.5424·Q^0.25 - 0.1674     (9)
The above equation has a correlation coefficient of 0.90, which demonstrates acceptable results. Using the GEP method, an exact equation is also derived for the river regime width, with a correlation coefficient of 0.97; based on the results shown in Figs. 17 and 18, the GEP-based equation performs acceptably in predicting the stable width of the river regime. The derived equation for the width of the river is as below:
W = 18.69 - 0.085 Ln(1.310221Q + 8.255√Q + 7.717513) - 0.008295 √(Q + 8.00673)/D50 - 0.3159 √(Q·D50)     (10)
Using the collected data and GEP, the equation for the river regime slope is derived as follows:
S = 12.68 + 0.2521 √D50/(Q + D50) - 0.0837 Ln(3.76781·D50) - 0.006212 √D50/Q + 0.0008078 D50/(4.371538 - D50)     (11)
This equation has a correlation coefficient of 0.74. Its results are compared with the actual values in Fig. 19, where it is seen to perform as desired. Based on the equations presented in this section, it can be observed that GEP performs acceptably in deriving equations from field data. It can display the shape and final form of the optimized equations with their coefficients, which can easily be applied in practice.
FIG. 17 GEP accuracy in deriving the canal depth regime equation (RMS: 0.36232, R²: 0.91922, variation explained: 91.92%).
FIG. 18 GEP accuracy in deriving the canal width regime equation (RMS: 14.4625, R²: 0.97086, variation explained: 97.08%).
FIG. 19 GEP accuracy in deriving the canal slope regime equation (RMS: 0.12375, R²: 0.72194, variation explained: 72.18%).
8.5 Extraction of longitudinal dispersion coefficient equations using GEP
In this section, Pareto optimality combined with multigene GEP is used to find an equation for the longitudinal dispersion of pollutants in natural rivers (Riahi-Madvar et al., 2009). The primary goal of this section is to use the 503 data points collected in streams around the world to build a PO-MGGP equation for the longitudinal dispersion coefficient Kx. The advection-dispersion equation, in which the longitudinal dispersion coefficient is a crucial parameter, is written as follows:
∂C/∂t + u ∂C/∂x = Kx ∂²C/∂x²     (12)
Here, Kx is the longitudinal dispersion coefficient, x is the streamwise direction, t is time, u is the flow velocity, and C is the pollutant concentration. The longitudinal dispersion coefficient as a function of its effective parameters can be presented as follows:
Kx = f(U, U*, Sn, H, W)     (13)
in which the parameters are: velocity (U), mean depth of the stream (H), shear velocity (U*), sinuosity (Sn), and width of the stream (W). In this case, the PO-MGGP with the parameters listed in Table 2 is used, and the multigene results of the model are derived.
The simplified equation for Kx in terms of the effective dimensionless parameters is derived as follows:
Kx/(BU*) = 33.99X1² + 8.497X1²X2 + (X1³ - 2.391×10⁻¹⁵X1 - 1.02×10⁻¹⁶ - 2.033×10⁻²⁰)/X1 + 16.99X1X2² + 8.497X1/X2⁴ + 0.01478     (14)
TABLE 2 Parameters of the MGGP for Kx prediction.
Run parameter                        Value
Population size                      300
Max. generations                     500
Generations elapsed                  55
Input variables                      2
Training instances                   352
Tournament size                      15
Elite fraction                       0.3
Lexicographic selection pressure     On
Probability of Pareto tournament     0
Max. genes                           3
Max. tree depth                      4
Max. total nodes                     Inf
ERC probability                      0
Crossover probability                0.84 (high level 0.2, low level 0.8)
Mutation probabilities               0.14 (subtree 0.9, input 0.05, perturb ERC 0.05)
Complexity measure                   Expressional
Function set                         TIMES, MINUS, PLUS, RDIVIDE, SQUARE, TANH, EXP, LOG, MULT3, ADD3, SQRT, CUBE, NEGEXP, NEG, ABS
Comparisons of the predicted versus observed values of Kx are shown in Figs. 20 and 21, together with the error trends and error histograms for the training and test steps. These results clearly show that the developed equation predicts the observations with acceptable accuracy; the PO-MGGP is thus capable of estimating Kx accurately.
9. Conclusions
In this chapter, GEP-based techniques in hydroinformatics are considered, and different types of them, including GP, GEP, MGGP, tree-based GEP, and PO-MGGP, are introduced. Their applicability in different fields of hydraulics and hydroinformatics is discussed. The results of the case studies of GEP-based models confirmed the accuracy and suitability of GEP-based techniques in different aspects of hydraulics and hydrology. Function finding for a quartic polynomial, the explicit form of the Colebrook-White equation, the Shields diagram, river regime slope, depth, and width, and pollutant dispersion in natural rivers were studied, and the results confirmed the general ability of GEP in function finding. In GEP models, the number of combinations of functions, variables, numerical constants, and conditional expressions is very large, so the search space is vast. Also, taking into account the runtime of a single program (chromosome) and the need to test whether the program works appropriately, the runtime of GEP models can be considerable. Finally, the main challenges of GEP models can be summarized as:
● Ample search space of the problem (a large number of functions, variables, constants, operands, and their combinations).
● A large number of chromosomes in each generation, and the time required to calculate the fitness over all generations.
● Determination of chromosome fitness for different input combinations.
● The GA produces many generations, each with hundreds of individuals, requiring the evaluation of hundreds of thousands of chromosomes overall.
FIG. 20 Comparing observed Kx/HU* versus estimated PO-MGGP in the training phase.
It is worth mentioning that the performance and success of GEP depend mainly on the appropriate selection of operands, functions, and terminals in the parse trees; a proper choice improves the efficiency of the model and strongly affects its runtime. The main difference between GA and GEP lies in the concept of the chromosome and the final solution. In GA, the chromosomes are equal-length strings of numbers representing the solution of the problem. In GEP, a set of computer codes of the same or different sizes is produced as the final solution, and these computer codes are generated automatically as symbolic expressions of variables, functions, and operands.
FIG. 21 Comparing observed Kx/HU* versus estimated PO-MGGP in test phase.
References
Abraham, A., Nedjah, N., de Macedo Mourelle, L., 2006. Evolutionary computation: from genetic algorithms to genetic programming. In: Genetic Systems
Programming. Springer, Berlin, Heidelberg, Germany, pp. 1–20.
Alavi, A.H., Gandomi, A.H., 2012. Energy-based numerical models for assessment of soil liquefaction. Geosci. Front. 3 (4), 541–555.
Azamathulla, H.M., Ghani, A.A., 2010. Genetic programming to predict river pipeline scour. J. Pipeline Syst. Eng. Pract. 1 (3), 127–132.
Babovic, V., Keijzer, M., 2000. Genetic programming as a model induction engine. J. Hydroinf. 2 (1), 35–60.
Balf, M.R., Noori, R., Berndtsson, R., Ghaemi, A., Ghiasi, B., 2018. Evolutionary polynomial regression approach to predict longitudinal dispersion coefficient in rivers. J. Water Supply Res. Technol. AQUA 67 (5), 447–457.
Birbal, P., Azamathulla, H., Leon, L., Kumar, V., Hosein, J., 2021. Predictive modelling of the stage-discharge relationship using gene-expression programming. Water Supply 21 (7), 3503–3514.
Brameier, M., Banzhaf, W., Banzhaf, W., 2007. Linear Genetic Programming. Vol. 1 Springer, New York, USA.
Coello, C.A.C., Becerra, R.L., 2003. Evolutionary multiobjective optimization using a cultural algorithm. In: Proceedings of the 2003 IEEE Swarm Intelligence Symposium. SIS’03, Cat. No. 03EX706. IEEE, pp. 6–13.
Danandeh Mehr, A., Kahya, E., 2017. A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction. J. Hydrol.
549, 603–615.
240 Handbook of hydroinformatics
Dufek, A.S., Augusto, D.A., Dias, P.L., Barbosa, H.J., 2017. Application of evolutionary computation on ensemble forecast of quantitative precipitation.
Comput. Geosci. 106, 139–149.
Eslamian, S., Abedi-Koupai, J., Zareian, M.J., 2012. Measurement and modelling of the water requirement of some greenhouse crops with artificial neural
networks and genetic algorithm. Int. J. Hydrol. Sci. Technol. 2 (3), 237–251.
Ferreira, C., 2001. Gene expression programming: a new adaptive algorithm for solving problems. arXiv. preprint cs/0102027.
Ferreira, C., 2002a. Gene expression programming in problem solving. In: Soft Computing and Industry. Springer, London, UK, pp. 635–653.
Ferreira, C., 2002b. Mutation, transposition, and recombination: an analysis of the evolutionary dynamics. In: Proceedings of the 6th Joint Conference on
Information Sciences, 4th International Workshop on Frontiers in Evolutionary Algorithms, pp. 614–617.
Ferreira, C., 2003. Function finding and the creation of numerical constants in gene expression programming. In: Advances in Soft Computing. Springer,
London, UK, pp. 257–265.
Ferreira, C., 2006. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. vol. 21 Springer, Germany.
Gandomi, A.H., Alavi, A.H., 2012. Krill herd: a new bio-inspired optimization algorithm. Commun. Nonlinear Sci. Numer. Simul. 17 (12), 4831–4845.
Giustolisi, O., Savic, D.A., 2006. A symbolic data-driven technique based on evolutionary polynomial regression. J. Hydroinf. 8 (3), 207–222.
Guven, A., Azamathulla, H.M., Zakaria, N.A., 2009. Linear genetic programming for prediction of circular pile scour. Ocean Eng. 36 (12–13), 985–991.
Jin, S.S., 2020. Compositional kernel learning using tree-based genetic programming for Gaussian process regression. Struct. Multidiscip. Optim. 62,
1313–1351.
Kazemi, M.H., Majnooni-Heris, A., Kisi, O., Shiri, J., 2021. Generalized gene expression programming models for estimating reference evapotranspiration
through cross-station assessment and exogenous data supply. Environ. Sci. Pollut. Res. 28 (6), 6520–6532.
Keramatloo, M., Zahiri, A., Kordi, E., Ghorbani, K., Dehghani, A., 2020. Modeling of river water temperature using gene expression programming (case
study: MohammadAbad River in Golestan province). J. Water Soil Conserv. 27 (2), 237–244.
Khan, M., Tufail, M., Azamathulla, H.M., Ahmad, I., Muhammad, N., 2018. Genetic functions-based modelling for pier scour depth prediction in coarse
bed streams. Proc. Inst. Civil Eng. Water Manag. 171 (5), 225–240.
Koolivand-Salooki, M., Esfandyari, M., Rabbani, E., Koulivand, M., Azarmehr, A., 2017. Application of genetic programing technique for predicting
uniaxial compressive strength using reservoir formation properties. J. Petrol. Sci. Eng. 159, 35–48.
Koza, J.R., 1990. Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems. Vol. 34 Stanford
University, Department of Computer Science, USA, Stanford, CA.
Koza, J.R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. A Bradford Book. MIT Press.
Li, X., Zhou, C., Xiao, W., Nelson, P.C., 2005. Prefix gene expression programming. In: Proc. Genetic and Evolutionary Computation Conference, Washington, USA, pp. 25–31.
Najafzadeh, M., Barani, G.A., 2011. Comparison of group method of data handling based genetic programming and back propagation systems to predict
scour depth around bridge piers. Sci. Iranica 18 (6), 1207–1213.
Najafzadeh, M., Kargar, A.R., 2019. Gene-expression programming, evolutionary polynomial regression, and model tree to evaluate local scour depth at
culvert outlets. J. Pipeline Syst. Eng. Pract. 10 (3), 04019013.
Najafzadeh, M., Oliveto, G., 2021. More reliable predictions of clear-water scour depth at pile groups by robust artificial intelligence techniques while
preserving physical consistency. Soft. Comput. 25 (7), 5723–5746.
Najafzadeh, M., Laucelli, D.B., Zahiri, A., 2017. Application of model tree and evolutionary polynomial regression for evaluation of sediment transport in
pipes. KSCE J. Civ. Eng. 21 (5), 1956–1963.
Najafzadeh, M., Ghaemi, A., Emamgholizadeh, S., 2019. Prediction of water quality parameters using evolutionary computing-based formulations. Int.
J. Environ. Sci. Technol. 16 (10), 6377–6396.
Nezaratian, H., Zahiri, J., Peykani, M.F., Haghiabi, A., Parsaie, A., 2021. A genetic algorithm-based support vector machine to estimate the transverse
mixing coefficient in streams. Water Quality Res. J. 56 (3), 127–142.
Norouzi, H., Moghaddam, A.A., Celico, F., Shiri, J., 2021. Assessment of groundwater vulnerability using genetic algorithm and random forest methods
(case study: Miandoab plain, NW of Iran). Environ. Sci. Pollut. Res. 28, 1–16.
Parsaie, A., Haghiabi, A.H., 2021. Mathematical expression for discharge coefficient of weir-gate using soft computing techniques. J. Appl. Water Eng.
Res. 9 (3), 175–183.
Poli, R., Langdon, W.B., McPhee, N.F., Koza, J.R., 2008. A Field Guide to Genetic Programming. Springer, Germany.
Raiahi-Madvar, H., Seifi, A., 2016. Performance evaluation of gene expression programming approach in layout Design of Drippers in drip irrigation
systems comparing with empirical method. J. Water Soil Conserv. 23 (5), 25–45. https://doi.org/10.22069/jwfst.2017.9467.2359.
Ravansalar, M., Rajaee, T., Kisi, O., 2017. Wavelet-linear genetic programming: a new approach for modeling monthly streamflow. J. Hydrol. 549, 461–
475.
Riahi-Madvar, H., Ayyoubzadeh, S.A., Khadangi, E., Ebadzadeh, M.M., 2009. An expert system for predicting longitudinal dispersion coefficient in
natural streams by using ANFIS. Expert Syst. Appl. 36 (4), 8589–8596.
Riahi-Madvar, H., Dehghani, M., Seifi, A., Singh, V.P., 2019. Pareto optimal multigene genetic programming for prediction of longitudinal dispersion
coefficient. Water Resour. Manag. 33 (3), 905–921.
Riahi-Madvar, H., Gholami, M., Gharabaghi, B., Seyedian, S.M., 2021. A predictive equation for residual strength using a hybrid of subset selection of
maximum dissimilarity method with Pareto optimal multi-gene genetic programming. Geosci. Front. 12 (5), 101222.
Saljoughi, E., 2017. Application of genetic programming as a powerful tool for modeling of cellulose acetate membrane preparation. J. Textiles Polym. 5
(1), 1–7.
Gene expression models Chapter
13
241
Sattar, A.M.A, Gharabaghi, B., 2015. Gene expression models for prediction of longitudinal dispersion coefficient in streams. J. Hydrol. 524, 587–596.
Searson, D.P., 2009. GPTIPS: Genetic programming and symbolic regression for MATLAB. https://eprints.ncl.ac.uk/175261.
Searson, D.P., 2015. GPTIPS 2: an open-source software platform for symbolic data mining. In: Handbook of Genetic Programming Applications.
Springer, Cham, Switzerland, pp. 551–573.
Searson, D.P., Leahy, D.E., Willis, M.J., 2010, March. GPTIPS: An open source genetic programming toolbox for multigene symbolic regression. In:
Proceedings of the International Multiconference of Engineers and Computer Scientists. Vol. 1. Citeseer, USA, pp. 77–80.
Tinoco, R.O., Goldstein, E.B., Coco, G., 2015. A data-driven approach to develop physically sound predictors: application to depth-averaged velocities on
flows through submerged arrays of rigid cylinders. Water Resour. Res. 51 (2), 1247–1263.
Wu, W., 2007. Computational River Dynamics. CRC Press.
Yan, X., Mohammadian, A., Khelifa, A., 2021. Modeling spatial distribution of flow depth in fluvial systems using a hybrid two-dimensional hydraulic-multigene genetic programming approach. J. Hydrol. 600, 126517.
Zahiri, A., Shabani, M.A., 2018. Modeling of stage-discharge relationship in compound channels using multi-stage gene expression programming. Iranian
J. Ecohydrol. 5 (1), 37–48.
Chapter 14
Gradient-based optimization
Mohammad Zakwan
School of Technology, Maulana Azad National Urdu University, Hyderabad, India
Symbols
a, h, g    parameters of the Pearson three-parameter distribution
μC         average of the shifted log-transformed discharge data
ξ          parameter of the lognormal three-parameter (3P) distribution
σC         standard deviation of the log-transformed discharge data
A          watershed area in km²
b          coefficient of the sediment rating curve
c, d, e, f, g    empirical parameters
Fv         vegetal cover factor
I          channel inflow
K          constant in the Muskingum equation
p          weighted inflow parameter
P          rainfall term
Q          channel outflow
Qe         effective discharge
S          channel storage
SL         watershed slope
X          weighing ratio
1. Introduction
There are many applications of optimization techniques in water resource engineering (Datta and Harikrishna, 2005; Wang et al., 2009; Kisi et al., 2012; Bazaraa et al., 2013; Niazkar and Afzali, 2014; Asgari et al., 2016; Qin et al., 2018), and researchers have widely used various optimization techniques in this field (Eslamian and Lavaei, 2009; Bhattacharjya, 2011; Kisi et al., 2012). Starting from trial-and-error procedures, they have moved toward the application of evolutionary algorithms and gradient-based optimization methods (Hegazy and Ersahin, 2001; Haddad et al., 2006; Xu et al., 2012).
Evolutionary optimization techniques such as Genetic Algorithm (Mohan, 1997; Eslamian and Lavaei, 2009; Mondal
et al., 2010), Harmony Search (Kim et al., 2001), Particle Swarm Optimization (Chu and Chang, 2009), Ant Colony Algorithm (Afshar et al., 2015), Artificial Bee Colony (Kisi et al., 2012), Bat Algorithm (Ahmadianfar et al., 2016), Cuckoo
Search Algorithm (Meng et al., 2019), Differential Evolution (Xu et al., 2012), Genetic Programming (Mehr et al., 2018),
Gravitational Search Algorithm (Rashedi et al., 2009; Yazdani et al., 2014), Simulated Annealing (Wang et al., 2009),
teaching learning-based optimization (Rao et al., 2012), Honey Bee Mating Optimization (Haddad et al., 2006), Water
Cycle Algorithm (Sadollah et al., 2014), Modified Honey Bee Mating Optimization (Niazkar and Afzali, 2014), Firefly
Algorithm (Kisi et al., 2015) Weed Optimization Algorithm (Asgari et al., 2016), Gray Wolf Optimization Algorithm
(Choopan and Emami, 2019) and Harris Hawk Optimization (Tikhamarine et al., 2020) have been used in water resource
engineering. However, the efficiency of evolutionary algorithms depends largely on the tuning of their parameters, which is often cumbersome and requires greater computational effort and expertise. On the other hand, gradient-based optimization techniques are much simpler and can easily be used for solving optimization problems with differentiable convex objective functions and a convex constraint domain.
Fitting equations, parameter estimation and trade-off between the cost and benefit of various aspects of water resource
planning has remained an essential part of hydrology and hydraulics (Eslamian et al., 2000; Hegazy and Ersahin, 2001;
Zakwan, 2017; Yuan et al., 2020). In this regard, gradient-based optimization has largely helped water resource engineers,
hydraulicians and hydrologist. There are several evidences of the use of gradient-based optimization technique in water
resource engineering. Gupta and Sorooshian (1985) applied gradient-based optimization for rainfall runoff modeling.
Lall and Miller (1988) and Goy et al. (1989) employed gradient-based Berndt-Hall-Hall-Hausman (BHHH) algorithm
for fitting equations in the time series analysis based on the least square method. Peng and Buras (2000) employed
gradient-based optimization technique for developing optimal reservoir operation system. Jewell (2001) proposed the
application of gradient-based solver to teach the design of pipe flow network and computation of the flow area of open
channels. Geem (2006) proposed the application of gradient-based Broyden Fletcher Goldfarb Shanno (BFGS) technique
for determining the optimal parameter of Muskingum channel routing. Yeo and Guldmann (2010) applied hill climbing
algorithm for hydrological modeling of watershed to estimate the peak runoff. Bhattacharjya (2011) employed GRG optimization technique to model the groundwater inverse flow problem. Karahan (2009), Barati (2013), Hirpurkar and Ghare
(2014), and Zakwan and Muzzammil (2016) identified the optimal parameters of Muskingum flood routing equation based
on the GRG optimization. Ibtissem and Nouredine (2013) developed an algorithm based on conjugate gradient neural networks to address the nonlinearity in systems. Che et al. (2014) employed gradient-based optimization technique to
determine parameters of unit hydrograph. Wang and Brubaker (2015) employed gradient-based optimization to develop
a multi objective model for hydrologic modeling. Zakwan (2016a) used GRG optimization technique to obtain the
parameter of Intensity Duration Frequency (IDF) curves while Zakwan (2016b) utilized the same technique to fit the
rainfall runoff curve. Zakwan et al. (2016a,b) used the GRG optimization technique to estimate the parameters of various infiltration models and compare their performance. Muzzammil et al. (2015, 2018) and Zakwan et al. (2017)
applied gradient-based technique to identify the parameters of stage-discharge rating curves for different rivers. Cho
et al. (2017) applied Davidon Fletcher Powell (DFP) algorithm to estimate the parameters of the rainfall model and reported
it to be highly sensitive to initial values of decision variables. Niazkar and Afzali (2016, 2017) provided hybridized Honey
Bee Mating optimization with GRG technique to obtain a better solution for flood routing problems. Geem and Kim (2018)
applied the BFGS technique to propose an improved version of Manning equation. Zakwan (2018) applied GRG technique
to model the looped stage-discharge rating curve. Pandey et al. (2020) utilized the GRG technique to develop the
equation to estimate the scour depth. Samadi-koucheksaraee et al. (2019) made a novel attempt to apply Gradient Evolution
technique for optimal reservoir operation. Zakwan (2020) redefined maximum observed discharge and rainfall envelope
curves using the GRG technique. Niazkar and Zakwan (2021) applied the hybrid MGGP-GRG technique to model simple
and looped discharge ratings. Zakwan and Niazkar (2021) applied the hybrid MGGP-GRG technique to model the infiltration rate. Zakwan and Niazkar (2022)) also applied GRG technique to model the reverse flood routing problem.
Basically, gradient-based optimization techniques have been broadly used for solving nonlinear equations, estimation
of parameters and developing empirical equations in water resource engineering. In this regard, one example each of curve
fitting, solving nonlinear equation, and estimation of parameters is presented.
2. Materials and methods
Numerous applications of optimization techniques can be found in hydroinformatics. In the present chapter, three examples of gradient-based optimization techniques are discussed. Although the examples considered here are all optimization problems in one form or another, based on the form in which they appear in the hydrological literature they may be categorized as follows: solving nonlinear equations, parameter estimation, and development of empirical relations. As an example of solving nonlinear equations, the equations arising in the analytical estimation of effective discharge are considered. The parameters involved in Muskingum flood routing are estimated as an example of parameter estimation, while an empirical equation is formulated to estimate the mean annual discharge as an example of developing empirical relationships. Sediment discharge data of the Drava River are used for the analytical estimation of effective discharge, flood routing data available in Viessman and Lewis (2003) are used for the Muskingum channel routing, and the watershed characteristics of various Indian rivers are obtained from Garde and Kothyari (1990).
The gradient-based optimization technique selected in the present chapter is the Generalized Reduced Gradient (GRG) technique. The GRG optimization tool is available in many commonly used platforms such as Microsoft Excel, MATLAB, and Minitab; depending on suitability, users can apply the GRG technique to optimization problems through any of these platforms.
Woodbury et al. (2008) conducted a survey to assess the adaptability of undergraduate engineering students to various software platforms and found that the majority are familiar with Microsoft Excel but have little knowledge of other software platforms; therefore, the author has chosen to adopt the GRG optimization technique available in Microsoft Excel.
Microsoft Excel contains three solvers: the GRG solver, the Linear Programming (LP) solver, and the Evolutionary solver. The LP solver handles linear problems, while the Evolutionary solver is based on an evolutionary algorithm.
2.1 GRG solver
The GRG solver comes as an add-in tool with Microsoft Excel. The GRG solver algorithm was originally written in FORTRAN. The GRG solver program is a combination of a main program and numerous subprograms such as GRG, CONSBS, SEARCH, DEGEN, DIREC, REDOBJ, REDGRA, NEWTON, PARSH, and GCOMP (Lasdon et al., 1978). In this code, DATAIN reads the input, GRG works out the problem, and the results are printed by OUTRES (Lasdon and Smith, 1992). The GRG optimization method is a deterministic optimization approach, as it does not make use of random sampling and relies on gradient information. The search direction of the GRG solver is automatically governed by either the quasi-Newton method or the conjugate gradient method (Zakwan et al., 2017). The quasi-Newton method is computationally more intensive than the conjugate gradient method because it stores the Hessian matrix at each iteration (Barati, 2013; Hirpurkar and Ghare, 2014). A Multistart option is also available in the GRG solver, which tries multiple starting points to reduce the chance of getting trapped in a local optimum. Users can also select between the forward finite difference method and the central finite difference method to calculate the gradient. For applying the Excel Solver to nonlinear programming problems, the problems can be modeled using an Excel spreadsheet or through C and C++ programming.
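For readers working outside Excel, the same ideas (bounded gradient-based search with finite-difference gradients and multiple starting points) can be sketched, for example, with SciPy's L-BFGS-B method; this is an illustrative stand-in with an assumed toy objective, not the GRG algorithm itself:

import numpy as np
from scipy.optimize import minimize

def objective(v):
    """Assumed toy objective with several local minima."""
    x, y = v
    return (x - 2)**2 + (y + 1)**2 + 2 * np.sin(3 * x) * np.cos(3 * y)

bounds = [(-5, 5), (-5, 5)]
rng = np.random.default_rng(0)

# Simple multistart loop: gradients are approximated by finite differences
best = None
for _ in range(20):
    x0 = rng.uniform([-5, -5], [5, 5])
    res = minimize(objective, x0, method='L-BFGS-B', bounds=bounds)
    if best is None or res.fun < best.fun:
        best = res
print(best.x, best.fun)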
3. Results and discussion
In the present chapter, three applications of gradient-based optimization techniques are discussed; their details can be found in the following subsections. In general, optimization refers to attaining a certain target (objective function) under specific conditions (constraints) by adjusting some variables (decision variables). Therefore, the basic step in applying any optimization software and formulating an optimization problem is to recognize these components (objective function, constraints, and decision variables).
3.1 Solving nonlinear equations
As an example of applying gradient-based optimization to solve nonlinear equations, the nonlinear equations produced in the analytical estimation of effective discharge are considered. Effective discharge is defined as the discharge that transports the maximum sediment load in a river over a period of time and is often considered the discharge responsible for shaping the river (Zakwan et al., 2018). Apart from other applications, the computation of effective discharge plays a major role in the design of hydraulic projects across the river cross section. Analytically, effective discharge is computed at the peak of the curve obtained from the product of the sediment rating curve and the frequency distribution fitted to the discharge time series. If the lognormal three-parameter (3P) and Pearson three-parameter distributions are fitted to the discharge time series, the expressions for the effective discharge computation are given by Eqs. (1) and (2), respectively.
μC + (b - 1 - bξ/Qe)σC² - ln(Qe - ξ) = 0     (1)

aQe² - (h - 1 + b + ag)Qe + bg = 0     (2)
Fig. 1 shows the setup for the estimation of effective discharge based on the lognormal three-parameter distribution. In Fig. 1, Column B shows the parameters of the lognormal (3P) distribution and the rating curve parameters obtained by fitting the lognormal (3P) distribution and the sediment rating curve, respectively. Eq. (1) is set as the target or objective function (cell F3, shown in green in Fig. 1). The decision variable in Eq. (1) is obviously the effective discharge (Qe). Cell D4 (red) represents the decision variable, while cells D6 (brown) and D8 (gray) are the upper and lower bounds on the decision variable, respectively. The graphical user interface for the GRG technique is also shown in Fig. 1. In this interface, the target or objective is chosen as cell F3 and is targeted to a value of 0, as can be seen on the right-hand side (R.H.S.) of Eq. (1). The changing (decision) variable is set as cell D4, and the constraints are set as the lower and upper bounds on the decision variable. The equation was solved with an initial guess of 1000 m³/s for the effective discharge, and the optimal value of the effective discharge was obtained as 524.75 m³/s, with the target cell almost equal to zero. Similarly, Eq. (2) was formulated to obtain the analytical estimate of effective discharge based on the Pearson three-parameter frequency distribution, and the effective discharge was obtained as 514.32 m³/s.
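The same calculation can be sketched outside Excel with a bounded gradient-based optimizer; the distribution and rating-curve parameters below are hypothetical placeholders, not the Drava River values used in the chapter:

import numpy as np
from scipy.optimize import minimize

# Hypothetical LN3P and rating-curve parameters (mu_c, sigma_c, xi, b) for illustration only
mu_c, sigma_c, xi, b = 5.8, 0.6, 50.0, 1.8

def eq1(qe):
    """Left-hand side of Eq. (1); it equals zero at the effective discharge."""
    return mu_c + (b - 1.0 - b * xi / qe) * sigma_c**2 - np.log(qe - xi)

# Mimic the spreadsheet setup: drive Eq. (1) to zero by minimizing its square
# with a bounded gradient-based method and an initial guess of 1000 m3/s
res = minimize(lambda q: eq1(q[0])**2, x0=[1000.0],
               method='L-BFGS-B', bounds=[(xi + 1.0, 1e5)])
print(res.x[0], eq1(res.x[0]))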
FIG. 1 Set up showing analytical estimation of effective discharge.
3.2 Application in parameter estimation
The application of gradient-based optimization to estimate the parameters of hydrologic models is very common. Here we consider an example of the Muskingum hydrologic channel routing model with weighted inflow. Flood routing is an important aspect of the design of flood protection works (Ara and Zakwan, 2018): it is a technique to determine the time and magnitude of the flood at a river section from known hydrographs at one or more upstream sections. Various forms of Muskingum hydrologic channel routing are available in the literature. Several hydraulicians have observed that a weighted inflow from the previous and current time steps provides better estimates of the change in channel storage; therefore, a parameter p was introduced to compute the weighted inflow (Vatankhah, 2017). However, the introduction of parameter p makes the routing equation more complicated, eliminating any chance of using a trial-and-error procedure. The nonlinear Muskingum model with provision for weighted inflow may be presented as
St = K[X·Wt + (1 - X)·Qt]^m     (3)
where Wt = p·It + (1 - p)·It-1     (4)
Eqs. (3) and (4) represent a four-parameter (K, X, p, and m) model. The change in storage and the outflow for this model may be computed as
ΔSt/Δt = -[1/(1 - X)](St/K)^(1/m) + [1/(1 - X)]Wt     (5)

Qt = [1/(1 - X)](St/K)^(1/m) - [X/(1 - X)]Wt     (6)
Fig. 2 shows the modeled weighted Muskingum channel routing equation. Columns A, B, and C contain the observed data acting as the input for the present problem. Column D represents the storage at different time steps, calculated in accordance with Eq. (3). Column E represents the change in storage with respect to time, calculated in accordance with Eq. (5), while Column F represents the change in storage; the estimated outflow is calculated in accordance with Eq. (6). In Fig. 2, cells I3, J3, K3, and L3 (red) are the decision variables, while cells I5, J5, K5, L5 (brown) and I7, J7, K7, L7 (gray) represent the lower and upper bounds on the decision variables. Cell N5 holds the sum of squared errors as the target or objective function.
FIG. 2 Set up showing estimation of parameters of Muskingum equation.
To obtain the optimal values of the decision variables (K, X, m, and p), the target cell was set to minimization, as shown in the GUI in Fig. 2.
The outflow hydrograph obtained from the present analysis is shown in Fig. 3. It may be observed that the outflows obtained from the weighted Muskingum equation are in good agreement with the observed outflows, and the estimated hydrograph reproduces the peak outflows quite accurately. The observed outflows at the first and second peaks are 1509 m³/s and 1248 m³/s, respectively, while the estimated outflows are 1470 m³/s and 1245 m³/s, respectively.
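A compact sketch of the same parameter-estimation idea with a gradient-based optimizer; the hydrograph values below are synthetic and purely illustrative (the chapter's data come from Viessman and Lewis, 2003), and the initial storage is an assumption:

import numpy as np
from scipy.optimize import minimize

def route(params, inflow, dt=1.0):
    """Route an inflow hydrograph with the four-parameter weighted
    Muskingum model of Eqs. (3)-(6) and return the outflow series."""
    K, X, m, p = params
    outflow = np.zeros_like(inflow)
    outflow[0] = inflow[0]
    S = K * inflow[0]**m                                      # assumed initial storage
    for t in range(1, len(inflow)):
        W = p * inflow[t] + (1 - p) * inflow[t - 1]           # Eq. (4)
        Q = ((S / K)**(1 / m) - X * W) / (1 - X)              # Eq. (6)
        dS = (W - (S / K)**(1 / m)) / (1 - X)                 # Eq. (5)
        S = S + dS * dt
        outflow[t] = max(Q, 0.0)
    return outflow

inflow   = np.array([100., 300., 700., 1000., 800., 500., 300., 200., 150., 120.])
observed = np.array([100., 150., 350., 700., 900., 750., 500., 330., 230., 170.])

sse = lambda par: np.sum((route(par, inflow) - observed)**2)
res = minimize(sse, x0=[1.0, 0.2, 1.2, 0.5], method='L-BFGS-B',
               bounds=[(0.1, 5), (0.01, 0.49), (1.0, 2.5), (0.0, 1.0)])
print(res.x, res.fun)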
FIG. 3 Observed and estimated outflow from the Muskingum equation.
3.3 Fitting empirical equations
Another application of optimization tools in this field is the development of empirical equations, mostly on a regional basis. Several hydrologic phenomena are too complex to be measured exactly, and the governing factors of many of them depend strongly on watershed characteristics. In such cases, empirical equations are developed taking the regional watershed characteristics into account. Here we consider the example of estimating the discharge of 2.33-year return interval, Q2.33 (the mean annual flood in the case of the Gumbel distribution), for Indian watersheds. It was observed that for Indian watersheds
Q2.33 = f(A, P, SL, Fv)     (7)
So, the empirical equation will be of the form
Q2.33 = c·A^d·P^e·SL^f·Fv^g     (8)
The empirical parameters in Eq. (8) can be estimated by minimizing the sum of squared errors, as shown in Eq. (9):
Min SSE = Σ_{i=1}^{N} [Q2.33,i - c·A_i^d·P_i^e·SL,i^f·Fv,i^g]²     (9)
After determining the parameters, mean annual flood can be estimated based on the watershed characteristics.
The setup for establishing the empirical relationship is shown in Fig. 4, where the watershed characteristics (input data) are given in Columns B to E, while Column A represents the mean annual discharge as per the Gumbel distribution. Column F represents the modeled equation, i.e., Eq. (8). Cells H3, I3, J3, K3, L3 (red) are the decision variables, while H5, I5, J5, K5, L5 (brown) and H7, I7, J7, K7, L7 (gray) represent the lower and upper bounds on the decision variables. The sum of squared errors was set as the target or objective function (cell N3, shown in green in Fig. 4). To obtain the optimal values of the decision variables, the target cell was set to minimization, as shown in the GUI in Fig. 4.
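An equivalent fit outside the spreadsheet could be sketched as follows; the watershed records here are placeholder values, not the Garde and Kothyari (1990) data:

import numpy as np
from scipy.optimize import minimize

# Placeholder watershed data: A (km2), P, SL, Fv, and observed Q2.33
A  = np.array([120., 450., 800., 1500., 2600.])
P  = np.array([900., 1100., 1300., 1000., 1200.])
SL = np.array([0.010, 0.008, 0.006, 0.004, 0.003])
Fv = np.array([0.6, 0.5, 0.7, 0.4, 0.5])
Q  = np.array([85., 260., 420., 610., 950.])

def sse(par):
    """Sum of squared errors of Eq. (9) for parameters (c, d, e, f, g)."""
    c, d, e, f, g = par
    q_hat = c * A**d * P**e * SL**f * Fv**g          # Eq. (8)
    return np.sum((Q - q_hat)**2)

res = minimize(sse, x0=[0.01, 0.8, 0.5, 0.2, 0.3], method='L-BFGS-B',
               bounds=[(1e-6, 10), (0, 2), (-2, 2), (-2, 2), (-2, 2)])
print(res.x, res.fun)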
In most gradient-based methods, initial values of the decision variables must be supplied by the user, and these methods are highly sensitive to those initial values (Cho et al., 2017); evolutionary algorithms, on the other hand, are computationally expensive. Since gradient-based optimization follows the direction of the gradient (slope), these techniques may become trapped in a local optimum rather than finding the global optimum. Users should therefore try a fairly diverse and large number of initial guesses of the decision variables, covering the entire search space, to ascertain the global optimum solution. The Multistart option available in the GUI of the GRG Solver add-in of Microsoft Excel may help, as it automatically starts from different initial values of the decision variables, thereby increasing the chances of finding the global optimum solution.
FIG. 4 Set up for developing empirical equation.
4. Conclusions
Modeling of hydrologic events is often very complex and requires the application of nonlinear optimization techniques. Historically, such problems were dealt with through trial-and-error methods; however, with the advancement of programming techniques, several optimization techniques have become common. Both gradient-based optimization techniques and evolutionary techniques have gained considerable attention in hydrology and hydroinformatics.
The present chapter focuses on the application of gradient-based techniques in hydroinformatics. Over the years, several researchers have implemented gradient-based techniques to model various hydrologic events and have shown that these techniques are capable of successfully modeling them. In the present chapter, the gradient-based GRG optimization technique has been applied to obtain (i) the analytical solution of effective discharge based on the lognormal three-parameter (3P) and Pearson three-parameter frequency distributions, (ii) the optimal parameters of the Muskingum channel routing equation with weighted inflow, and (iii) an empirical equation to estimate the mean annual flood discharge for Indian watersheds.
However, gradient-based optimization techniques are sensitive to the initial values of the parameters (decision variables); in this regard, the results should be rigorously checked against local optimum solutions by considering multiple initial values of the decision variables covering the entire search space. When gradient-based techniques are rerun with multiple initial values of the decision variables, the chances of obtaining the global optimum solution increase. Hybrids of gradient-based and evolutionary algorithms can provide global optimal solutions with reduced computational expense, thereby opening the scope for future research and application in hydroinformatics.
References
Afshar, A., Massoumi, F., Afshar, A., Mariño, M.A., 2015. State of the art review of ant colony optimization applications in water resource management.
Water Resour. Manag. 29 (11), 3891–3904.
Ahmadianfar, I., Adib, A., Salarijazi, M., 2016. Optimizing multi reservoir operation: hybrid of bat algorithm and differential evolution. J. Water Resour.
Plan. Manag. 142 (2), 05015010.
Ara, Z., Zakwan, M., 2018. Estimating runoff using SCS curve number method. Int. J. Emerg. Technol. Adv. Eng. 8, 195–200.
Asgari, H.R., Bozorg Haddad, O., Pazoki, M., Loáiciga, H.A., 2016. Weed optimization algorithm for optimal reservoir operation. J. Irrig. Drain. Eng.
142 (2), 04015055.
Barati, R., 2013. Application of excel solver for parameter estimation of the nonlinear Muskingum models. KSCE J. Civ. Eng. 17 (5), 1139–1148.
Bazaraa, M.S., Sherali, H.D., Shetty, C.M., 2013. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons.
Bhattacharjya, R.K., 2011. Solving groundwater flow inverse problem using spreadsheet solver. J. Hydrol. Eng. 16, 472–477. https://doi.org/10.1061/
(ASCE)HE.1943-5584.0000329.
Che, D., Nangare, M., Mays, L.W., 2014. Determination of optimal unit hydrographs and green-ampt parameters for watersheds. J. Hydrol. Eng. 19 (2),
375–383.
Cho, H., Lee, K.E., Kim, G., 2017. Analysis of the applicability of parameter estimation methods for a stochastic rainfall generation model. J. Korean Data
Inf. Sci. Soc. 28 (6), 1447–1456.
Choopan, Y., Emami, S., 2019. Optimal operation of dam reservoir using gray wolf optimizer algorithm (Case study: Urmia Shaharchay dam in Iran).
J. Soft Comput. Civ. Eng. 3 (3), 47–61.
Chu, H.J., Chang, L.C., 2009. Applying particle swarm optimization to parameter estimation of the nonlinear Muskingum model. J. Hydrol. Eng. 14 (9),
1024–1027.
Datta, B., Harikrishna, V., 2005. Optimization applications in water resources systems engineering. Res. J. IIT Kanpur, 57–64.
Eslamian, S., Lavaei, N., 2009. Modelling nitrate pollution of groundwater using artificial neural network and genetic algorithm in an arid zone. Int.
J. Water 5 (2), 194–203.
Eslamian, S.S., Salimi, V., Chavoshi, S., 2000. Developing an empirical model for the estimation of peak discharge in some catchments in Western Iran.
J. Sci. Technol. Agric. Nat. Resour. 4 (2), 1–12.
Garde, R.J., Kothyari, U.C., 1990. Flood estimation in Indian catchments. J. Hydrol. 113 (1–4), 135–146.
Geem, Z.W., 2006. Parameter estimation for the non-linear Muskingum model using the BFGS technique. J. Irrig. Drain. Eng. 132 (5), 474–478.
Geem, Z.W., Kim, J.H., 2018. Application of computational intelligence techniques to an environmental flow formula. Int. J. Fuzzy Log. Intell. Syst. 18,
237–244.
Goy, J., Morand, P., Etienne, M., 1989. Long-term fluctuations of Pelagia noctiluca (Cnidaria, Scyphomedusa) in the western Mediterranean Sea. Prediction by climatic variables. Deep Sea Res. Part A 36 (2), 269–279.
Gupta, V.K., Sorooshian, S., 1985. The automatic calibration of conceptual catchment models using derivative-based optimization algorithms.
Water Resour. Res. 21 (4), 473–485.
Haddad, O.B., Afshar, A., Marino, M.A., 2006. Honey-bees mating optimization (HBMO) algorithm: a new heuristic approach for water resources optimization. Water Resour. Manag. 20 (5), 661–680.
Hegazy, T., Ersahin, T., 2001. Simplified spreadsheet solution overall construction optimization. J. Constr. Eng. Manag. 127 (6), 469–475.
Hirpurkar, P., Ghare, A.D., 2014. Parameter estimation for the nonlinear forms of the Muskingum model. J. Hydrol. Eng. 20 (8), 04014085.
Ibtissem, C., Nouredine, L., 2013, March. A hybrid method based on conjugate gradient trained neural network and differential evolution for non linear
systems identification. In: 2013 International Conference on Electrical Engineering and Software Applications. IEEE, pp. 1–5.
Jewell, T.K., 2001. Teaching hydraulic design using equation solvers. J. Hydraul. Eng. 127 (12), 1013–1021.
Karahan, H., 2009. Predicting Muskingum Flood Routing Parameters Using Spreadsheet. Wiley Periodicals Inc, pp. 280–286.
Kim, J.H., Geem, Z.W., Kim, E.S., 2001. Parameter estimation of the nonlinear Muskingum model using harmony search. J. Am. Water Resour. Assoc.
37 (5), 1131–1138.
Kisi, O., Ozkan, C., Akay, B., 2012. Modeling discharge–sediment relationship using neural networks with artificial bee colony algorithm. J. Hydrol. 428,
94–103.
Kisi, O., Shiri, J., Karimi, S., Shamshirband, S., Motamedi, S., Petkovic, D., Hashim, R., 2015. A survey of water level fluctuation predicting in Urmia Lake
using support vector machine with firefly algorithm. Appl. Math Comput. 270, 731–743.
Lall, U., Miller, C.W., 1988. An optimization model for screening multipurpose reservoir systems. Water Resour. Res. 24 (7), 953–968.
Lasdon, L.S., Smith, S., 1992. Solving sparse nonlinear programs using GRG. ORSA J. Comput. 4 (1), 2–15.
Lasdon, L.S., Waren, A.D., Jain, A., Ratner, M., 1978. Design and testing of a generalized reduced gradient code for nonlinear programming. ACM Trans.
Math. Softw. 4 (1), 34–50.
Mehr, A.D., Nourani, V., Kahya, E., Hrnjica, B., Sattar, A.M., Yaseen, Z.M., 2018. Genetic programming in water resources engineering: a state-of-the-art
review. J. Hydrol. 566, 643–667.
Meng, X., Chang, J., Wang, X., Wang, Y., 2019. Multi-objective hydropower station operation using an improved cuckoo search algorithm. Energy 168,
425–439.
Mohan, S., 1997. Parameter estimation of nonlinear Muskingum models using genetic algorithm. J. Hydraul. Eng. 123 (2), 137–142.
Mondal, A., Eldho, T.I., Rao, V.G., 2010. Multiobjective groundwater remediation system design using coupled finite-element model and non-dominated
sorting genetic algorithm II. J. Hydrol. Eng. 15 (5), 350–359.
Muzzammil, M., Alam, J., Zakwan, M., 2015. An optimization technique for estimation of rating curve parameters. In: Symposium on Hydrology, New
Delhi, India. Indian Association of Hydrologists (IAH), Roorkee, pp. 234–240.
Muzzammil, M., Alam, J., Zakwan, M., 2018. A spreadsheet approach for prediction of rating curve parameters. In: Hydrologic Modeling. Springer,
Singapore, pp. 525–533, https://doi.org/10.1007/978-981-10-5801-1_36.
Niazkar, M., Afzali, S.H., 2014. Assessment of modified honey bee mating optimization for parameter estimation of nonlinear Muskingum models.
J. Hydrol. Eng. 20 (4), 04014055.
Niazkar, M., Afzali, S.H., 2016. Application of new hybrid optimization technique for parameter estimation of new improved version of Muskingum
model. Water Resour. Manag. 30 (13), 4713–4730.
Niazkar, M., Afzali, S.H., 2017. Parameter estimation of an improved nonlinear Muskingum model using a new hybrid method. Hydrol. Res. 48 (5),
1253–1267.
Niazkar, M., Zakwan, M., 2021. Assessment of artificial intelligence models for developing single-value and loop rating curves. Complexity. https://doi.
org/10.1155/2021/6627011.
Pandey, M., Zakwan, M., Sharma, P.K., Ahmad, Z., 2020. Multiple linear regression and genetic algorithm approaches to predict temporal scour depth near
circular pier in non-cohesive sediment. ISH J. Hydraul. Eng. 26 (1), 96–103. https://doi.org/10.1080/09715010.2018.1457455.
Peng, C.S., Buras, N., 2000. Dynamic operation of a surface water resources system. Water Resour. Res. 36 (9), 2701–2709.
Qin, Y., Kavetski, D., Kuczera, G., 2018. A Robust Gauss-Newton algorithm for the optimization of hydrological models: benchmarking against
industry-standard algorithms. Water Resour. Res. 54 (11), 9637–9654.
Rao, R.V., Savsani, V.J., Balic, J., 2012. Teaching–learning-based optimization algorithm for unconstrained and constrained real-parameter optimization
problems. Eng. Optim. 44 (12), 1447–1462.
Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S., 2009. GSA: a gravitational search algorithm. Inform. Sci. 179 (13), 2232–2248.
Sadollah, A., Yoo, D.G., Yazdi, J., Kim, J.H., Choi, Y., 2014. Application of water cycle algorithm for optimal cost design of water distribution systems. In:
11th International Conference on Hydroinformatics, New York City, USA.
Samadi-koucheksaraee, A., Ahmadianfar, I., Bozorg-Haddad, O., Asghari-pari, S.A., 2019. Gradient evolution optimization algorithm to optimize reservoir operation systems. Water Resour. Manag. 33 (2), 603–625.
Tikhamarine, Y., Souag-Gamane, D., Ahmed, A.N., Sammen, S.S., Kisi, O., Huang, Y.F., El-Shafie, A., 2020. Rainfall-runoff modelling using improved
machine learning methods: Harris hawks optimizer vs. particle swarm optimization. J. Hydrol., 125133.
Vatankhah, A.R., 2017. Non-linear Muskingum model with inflow-based exponent. Proc. Inst. Civ. Eng. Water Manage. 170 (2), 66–80.
Viessman, W., Lewis, G.L., 2003. Introduction to Hydrology, fifth ed. Pearson, New Delhi, India.
Wang, Y., Brubaker, K., 2015. Multi-objective model auto-calibration and reduced parameterization: exploiting gradient-based optimization tool for a
hydrologic model. Environ. Model. Softw. 70, 1–15.
Wang, X., Sun, Y., Song, L., Mei, C., 2009. An eco-environmental water demand based model for optimising water resources using hybrid genetic simulated annealing algorithms. Part I. Model development. J. Environ. Manage. 90 (8), 2628–2635.
Woodbury, K.A., Taylor, R.P., Huguet, J., Dent, T., Chappell, J., Mahan, K., 2008. Vertical integration of excel in the thermal mechanical engineering
curriculum. In: ASME 2008 International Mechanical Engineering Congress and Exposition, pp. 317–325.
Xu, D., Qui, L., Chen, S., 2012. Estimation of nonlinear Muskingum model parameter using differential evolution. J. Hydrol. Eng. 17 (2), 348–353.
Yazdani, S., Nezamabadi-pour, H., Kamyab, S., 2014. A gravitational search algorithm for multimodal optimization. Swarm Evol. Comput. 14, 1–14.
Yeo, I.Y., Guldmann, J.M., 2010. Global spatial optimization with hydrological systems simulation: application to land-use allocation and peak runoff
minimization. Hydrol. Earth Syst. Sci. 14 (2). https://doi.org/10.5194/hess-14-325-2010.
Yuan, G., Wang, X., Sheng, Z., 2020. The projection technique for two open problems of unconstrained optimization problems. J. Optim. Theory Appl.,
1–30.
Zakwan, M., 2016a. Application of optimization technique to estimate IDF parameters. Water Energy Int. 59 (5), 69–71.
Zakwan, M., 2016b. Estimation of runoff using optimization technique. Water Energy Int. 59 (8), 42–44.
Zakwan, M., 2017. Assessment of dimensionless form of Kostiakov model. Aquademia 1 (1), 01. https://doi.org/10.20897/awet.201701.
Zakwan, M., 2018. Spreadsheet-based modelling of hysteresis-affected curves. Appl. Water Sci. 8 (4), 101–105. https://doi.org/10.1007/s13201-018-0745-3.
Zakwan, M., 2020. Revisiting maximum observed precipitation and discharge envelope curves. Int. J. Hydrol. Sci. Technol. 10 (3), 221–229. https://doi.
org/10.1504/IJHST.2020.107215.
Zakwan, M., Muzzammil, M., 2016. Optimization approach for hydrologic channel routing. Water Energy Int. 59 (3), 66–69.
Zakwan, M., Muzzammil, M., Alam, J., 2016a. Estimation of soil properties using infiltration data. In: Proceeding National Conference on Advances in
Geotechnical Engineering, Aligarh, pp. 198–201.
Zakwan, M., Muzzammil, M., Alam, J., 2016b. Application of spreadsheet to estimate infiltration parameters. Perspect. Sci. 8, 702–704. https://doi.org/
10.1016/j.pisc.2016.06.064.
Zakwan, M., Muzzammil, M., Alam, J., 2017. Developing stage-discharge relations using optimization techniques. Aquademia 1 (2), 05. https://doi.org/
10.20897/awet.201702.
Zakwan, M., Ahmad, Z., Sharief, S.M.V., 2018. Magnitude-frequency analysis for suspended sediment transport in the Ganga River. J. Hydrol. Eng. 23 (7),
05018013. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001671.
Zakwan, M., Niazkar, M., 2021. A comparative analysis of data-driven empirical and artificial intelligence models for estimating infiltration rates.
Complexity 2021, 9945218. https://doi.org/10.1155/2021/9945218.
Zakwan, M., Niazkar, M., 2022. Discussion of “Reverse Flood Routing in Rivers Using Linear and Nonlinear Muskingum Models” by Meisam Badfar,
Reza Barati, Emrah Dogan, and Gokmen Tayfur. J. Hydrol. Eng. 27 (5), 07022001.
Chapter 15
Gray wolf optimization algorithm
Mohammad Reza Zaghiyan a, Vahid Shokri Kuchak a, and Saeid Eslamian b,c
a Department of Water Engineering and Management, Tarbiat Modares University, Tehran, Iran; b Department of Water Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; c Center of Excellence in Risk Management and Natural Hazards, Isfahan University of Technology, Isfahan, Iran
1. Introduction
Today, more than ever, water is considered one of the three factors in the formation and survival of the environment (along with soil and air). Undoubtedly, the preservation and protection of water resources and their optimal use are global issues, and water crises are therefore described as a pervasive human challenge of the 21st century (Damania et al., 2019). Increased
water demand in various sectors, pollution of water resources, climate change, and human activities can be considered the
leading causes of water stress. The emphasis on optimal water resources management and its sustainable development is
essential to deal with this type of stress. In this regard, optimization or, in other words, optimal use of available water
resources according to the associated constraints is one of the most fundamental steps in water resources management.
The first step in formulating macro water resources management policies is to propose different options according to the
limitations and comprehensive water resources development and management goals. Optimization models are an efficient
tool given the dimensions and complexities of water resource systems. However, uncertainties always affect their results
(Loucks and Van Beek, 2017). Today, with the development of information technology, new flexible tools have been
created. Their combination with optimization models has provided a new space for developing analysis, planning, and
management of water resources systems (Tayfur, 2017). Furthermore, the development of these methods can significantly
improve dealing with uncertainties.
Selecting the set of decision variables that maximizes or minimizes the objective function subject to the system constraints is called the optimization procedure (Simonovic, 2012). In other words, the goal of optimization is to find the best acceptable solution given the limitations and needs of the problem. Optimization algorithms are generally divided into two categories: exact and approximate algorithms. Exact algorithms, which are mathematical methods, include linear programming (LP), non-linear programming (NLP), gradient-based methods, gradient-free methods, etc. (Yang, 2010). Although exact algorithms can, in principle, find the definitive global optimum, in NP-hard (nondeterministic polynomial time) optimization problems they often cannot, because of the constraints and high dimensionality of the system, and their execution time increases exponentially. Approximate optimization algorithms, on the other hand, can find suitable solutions (close to the global optimum) in a shorter time for NP-hard problems. In other words, these algorithms are not guaranteed to reach the definitive global answer, but they are very useful for problems with a large number of decision variables and strict constraints.
Approximate algorithms are further classified into two categories: heuristic and meta-heuristic algorithms. Heuristic methods are problem-dependent techniques; in other words, they are adapted to the problem and try to make the most of its features. However, the greediness of such techniques in seeking the optimal solution causes them to become trapped in local optima, so the global optimal solution remains unknown (Sharma and Kaur, 2021). The SUFI-2 (sequential uncertainty fitting) algorithm in SWAT-CUP (SWAT calibration and uncertainty procedures) is an example of this type of technique. As issues and problems in water resources systems became more complex, including the allocation of water resources, optimization algorithms gradually improved, and the use of meta-heuristic methods surpassed other techniques. Meta-heuristic algorithms are problem-independent techniques inspired by nature or a natural rhythm and applied to solve optimization problems by converting them into mathematical equations. Meta-heuristic techniques are not restricted to a specific problem, and management of the search process while approaching the optimal solution is one of their most important features. In other words, by expanding the search scope, these methods choose the shortest path to reach the global optimal point and minimize the possibility of getting trapped in local optima (Oliva et al., 2019). Meta-heuristic algorithms are also classified into several categories, including evolutionary-based algorithms such as the Genetic Algorithm (Bonabeau et al., 1999), physics-based algorithms such as Simulated Annealing (Kirkpatrick et al., 1983), and swarm intelligence algorithms such as Ant Colony Optimization (Dorigo et al., 2006).
The gray wolf optimizer (GWO) is a meta-heuristic algorithm in the category of swarm intelligence, population-based algorithms. The algorithm was developed by Mirjalili et al. (2014) and is inspired by the strict social dominance hierarchy and the hunting behavior of gray wolves. The GWO algorithm has been applied in various fields of water resources studies, including water allocation optimization (Yu and Lu, 2018), optimal reservoir operation (Dahmani and Yebdri, 2020), soil properties (Mosavi et al., 2021), reference evapotranspiration (ET) estimation (Tikhamarine et al., 2019), streamflow forecasting (Tikhamarine et al., 2020), and groundwater studies (Majumder and Eldho, 2020). The complete theory and mathematical modeling of the GWO algorithm are discussed in the following sections. At the end of this chapter, an optimization example solved by the GWO algorithm in the MATLAB platform is presented.
2. Theory of GWO
The gray wolf (Canis lupus), also called the timber wolf (Fig. 1), is the most prominent wild member of the dog family (Canidae). It inhabits vast areas of the Northern Hemisphere. Different subspecies of the gray wolf are known around the world: for example, 5–24 subspecies are recognized in North America, 7–12 in Eurasia, and 1 in Africa.
In nature, the gray wolf prefers group life in packs of 5 to 12 individuals on average. Its main prey are large hoofed animals such as wild sheep, wild goats, and deer, and gray wolves sit at the top of the food chain and pyramid. The interesting point about this animal, briefly mentioned in the previous section, is that its life follows an exact, highly orderly social hierarchy, as shown in Fig. 2.
The leaders are a male and a female, called alphas, and are responsible for making decisions about hunting, the sleeping location, the waking time, etc. The alpha's decisions are dictated to the whole pack, and the other wolves acknowledge them by holding their tails down. Only the alpha wolves are allowed to mate within the pack. It should be noted that the alpha is not necessarily the strongest member of the group but rather the best member in terms of managing the pack; in other words, the discipline and organization of a pack are far more important than its strength.
FIG. 1 Gray wolf.
FIG. 2 Gray wolf social hierarchy.
The second level of the gray wolf hierarchy is the beta. The beta wolves are subject to the alpha's decisions and help with other group activities. Beta wolves can be male or female, and they are the best candidates to replace an alpha that grows old or dies. A beta must respect the alpha and, at the same time, command the lower-level wolves. This type of wolf acts as a consultant to the alpha and a helper for the group: the beta reinforces the alpha's commands throughout the pack and gives feedback to the alpha.
The lowest level belongs to the omega wolves. Omega wolves occupy the lowest rank in the pack and must always submit to all the higher, dominant wolves. They are also the last wolves allowed to eat. The omega may not seem to be an essential member of the pack, but without an omega, problems such as internal fighting among the wolves can arise. The presence of omega wolves therefore creates a sense of satisfaction among all wolves and maintains the dominance structure. In some cases in the wild, omega wolves have also been observed acting as babysitters for the group.
If a wolf is not an alpha, beta, or omega, it is called a delta (or subordinate). Delta wolves must submit to the alphas and betas but dominate the omegas. Scouts, sentinels, elders, hunters, and caretakers belong to this level. Scouts are responsible for watching the boundaries of the territory and warning the pack in case of any danger. Sentinels protect and guarantee the safety of the group. Elders are experienced wolves that were once alphas or betas. Hunters help the alphas and betas during hunting and provide food for the pack. Finally, caretakers are responsible for caring for the weak, sick, and injured wolves in the group.
Group hunting is another interesting social behavior of gray wolves, in addition to their social hierarchy. According to Muro et al. (2011), the main phases of gray wolf hunting are as follows (see Fig. 3):
- Tracking, chasing, and approaching the prey.
- Pursuing, encircling, and harassing the prey until it stops moving.
- Attacking the prey.
3. Mathematical modeling of gray wolf optimizer
In this section, social hierarchy and hunting techniques (optimization), including encircling, tracking, and attacking the
prey, are mathematically presented to design a GWO (Mirjalili et al., 2014).
3.1 Social hierarchy
In the GWO, for the mathematical modeling of the social hierarchy, the fittest solution is considered the alpha (α). Consequently, the second- and third-best solutions are named beta (β) and delta (δ), respectively, and the remaining candidate solutions are assumed to be omega (ω). Each gray wolf in this algorithm is considered a search agent in the optimization problem, and each search agent is evaluated in terms of its position according to the cost function. The optimization (hunting) is therefore guided by the alpha, beta, and delta, and the omegas follow them.
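For illustration, once the cost of every wolf in the pack has been evaluated, the alpha, beta, and delta can be picked out by sorting; the following minimal MATLAB sketch uses illustrative variable names (costs, positions) that are not part of the chapter's code:

% costs: nPop-by-1 vector of objective values; positions: nPop-by-nVar matrix of wolf positions
[~, order] = sort(costs, 'ascend');     % smaller cost = fitter wolf (minimization)
alphaPos = positions(order(1), :);      % best solution found so far
betaPos  = positions(order(2), :);      % second-best solution
deltaPos = positions(order(3), :);      % third-best solution
% all remaining wolves are treated as omegas and follow these three leaders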
FIG. 3 Hunting behavior of gray wolves: (A) chasing, approaching, and tracking the prey; (B–D) pursuing, harassing, and encircling; (E) stationary situation and attack.
3.2 Encircling prey
According to the details mentioned above, gray wolves encircle the prey during hunting. The mathematical model of the encircling behavior is given by the following equations:

$\vec{X}(t+1) = \vec{X}_P(t) - \vec{A} \cdot \vec{D}$   (1)

$\vec{D} = \left| \vec{C} \cdot \vec{X}_P(t) - \vec{X}(t) \right|$   (2)

The vectors $\vec{A}$ and $\vec{C}$ are calculated as follows:

$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}$   (3)

$\vec{C} = 2\vec{r}_2$   (4)

The descriptions of the parameters used in all GWO formulas are presented in Table 1.
As shown in Fig. 4A, a gray wolf at location (X, Y) can change its position according to the position of the prey (X*, Y*). The various locations around the best agent can be reached from its current position by adjusting the values of the $\vec{A}$ and $\vec{C}$ vectors; for instance, the position (X* - X, Y*) can be reached by setting $\vec{A} = [1, 0]$ and $\vec{C} = [1, 1]$. The possible updated locations of a gray wolf in 3D space are shown in Fig. 4B. The random vectors $\vec{r}_1$ and $\vec{r}_2$ allow the search agents to reach any position between the points shown in Fig. 4. Thus, a gray wolf can randomly update its position within the space around the prey using Eqs. (1) and (2).
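A minimal MATLAB sketch of one encircling step, Eqs. (1)–(4), for a single wolf is given below; all values (dimension, iteration counter, positions) are illustrative placeholders:

% One encircling step for a single wolf (Eqs. 1-4); all values are illustrative.
nVar  = 5;                          % problem dimension
t     = 10;  MaxIt = 100;           % current iteration and iteration budget
Xp    = rand(1, nVar);              % prey position (best solution so far)
X     = rand(1, nVar);              % current wolf position

a     = 2 - t*(2/MaxIt);            % a decreases linearly from 2 to 0
r1    = rand(1, nVar);  r2 = rand(1, nVar);
A     = 2*a.*r1 - a;                % Eq. (3)
C     = 2*r2;                       % Eq. (4)
D     = abs(C.*Xp - X);             % Eq. (2)
Xnew  = Xp - A.*D;                  % Eq. (1): updated wolf position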
3.3 Hunting behavior
As mentioned earlier, hunting is usually led by the alpha, while the beta and delta provide support. Since there is no prior knowledge of the optimal solution (the hunting or prey position $\vec{X}_P$) in the search space, the alpha position is treated as the best position obtained so far (the prey).
TABLE 1 Description of GWO parameters.
$\vec{X}$: the position vector of a gray wolf
$t$: the current iteration
$\vec{X}_P$: the position vector of the prey
$\vec{A}$: coefficient vector
$\vec{C}$: coefficient vector
$\vec{a}$: linearly decreased from 2 to 0 throughout the iterations
$\vec{r}_1$, $\vec{r}_2$: random vectors in [0, 1]
FIG. 4 Two-dimensional (A) and three-dimensional (B) location vectors and their next possible position.
The beta and delta are also assumed to be the next-best solutions. In other words, the three best solutions obtained so far are saved, and the other search agents (the omega wolves) are forced to update their positions according to the positions of these best agents. The following formulas are used:

$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|$   (5a)

$\vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|$   (5b)

$\vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|$   (5c)

$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha$   (6a)

$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta$   (6b)

$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$   (6c)

$\vec{X}(t+1) = \dfrac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$   (7)
As shown in Fig. 5, the final position is obtained at a random location (within the circle) defined by the positions of the alpha, beta, and delta. In other words, these three wolves estimate the position of the prey, and the other wolves (including the omegas) update their positions randomly around it.
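A compact sketch of the update in Eqs. (5)–(7), assuming the illustrative variables from the previous sketches (alphaPos, betaPos, deltaPos, X, a, nVar), could read as follows; the same logic appears in the main loop of Appendix A:

% Position update of one omega wolf from the alpha, beta, and delta (Eqs. 5-7).
update = @(Xlead, X, a, nVar) Xlead - (2*a.*rand(1,nVar) - a) .* ...
                              abs(2*rand(1,nVar).*Xlead - X);   % Eqs. (5)-(6) for one leader
X1 = update(alphaPos, X, a, nVar);
X2 = update(betaPos,  X, a, nVar);
X3 = update(deltaPos, X, a, nVar);
X  = (X1 + X2 + X3)/3;                                          % Eq. (7): new position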
All meta-heuristic algorithms run the search process in two stages: exploration and exploitation. The exploitation process moves toward the best solution in the neighborhood of a point and may get stuck in local minima, so it depends strongly on the starting point; classical methods that rely on this behavior include Newton-Raphson, gradient methods, and steepest descent. The exploration process, in contrast, keeps searching new regions of the decision space. If an algorithm used only exploration, it would become a random search with no proper direction. Balancing these two components is therefore always necessary in the optimization process. Table 2 summarizes the characteristics of the exploitation and exploration phases.
3.4 Exploitation in GWO: attacking prey
As mentioned above, the gray wolves finish the hunt by attacking the prey when it stops moving. Mathematical modeling of approaching the prey begins with decreasing the value of $\vec{a}$. As a result, the fluctuation range of $\vec{A}$ is also reduced toward zero, since $\vec{A}$ takes random values in $[-a, a]$. When the random values of $\vec{A}$ lie in $[-1, 1]$, the next position of a search agent will be between its current position and the position of the prey (Fig. 6).
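A quick numeric check of this mechanism (assuming, for illustration, a budget of 1000 iterations): early in the run $|\vec{A}|$ may exceed 1, which favors searching, while near the end it is forced below 1, which favors attacking.

MaxIt = 1000;
for t = [1, 500, 1000]
    a = 2 - t*(2/MaxIt);            %linear decay of a over the iterations
    fprintf('t = %4d:  a = %.2f, so |A| = |2*a*rand - a| <= %.2f\n', t, a, a);
end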
FIG. 5 Updating of wolves’ positions in GWO.
TABLE 2 Specifications of the exploitation and exploration phases.
Exploitation: maintains convergence; suitable for the final runs; local view of the space; low risk; may become trapped in local solutions.
Exploration: preserves diversity; suitable for the early runs; overall view of the space; high risk; sudden movements.
FIG. 6 Attacking prey versus searching for prey: (A) if $|\vec{A}| < 1$; (B) if $|\vec{A}| > 1$.
3.5 Exploration in GWO: search for prey
Gray wolves move away from each other in the search for prey and converge again, according to the positions of the alpha, beta, and delta, to attack it. The vector $\vec{A}$ is used to model this divergence, taking random values greater than 1 or less than -1, which forces a search agent to move away from the prey. In other words, this mechanism allows the GWO algorithm to search globally. As shown in Fig. 6B, if $|\vec{A}| > 1$, the wolves are forced to diverge from the prey in the hope of finding a fitter one.
Another parameter affecting the exploration process is $\vec{C}$. As mentioned before, this vector takes random values in the range [0, 2]. Depending on the position of a wolf, $\vec{C}$ assigns a random weight to the prey, making it harder or easier for the wolves to reach: when $C > 1$, the influence of the prey is emphasized, and when $C < 1$, it is reduced. $\vec{C}$ can also be interpreted as the effect of obstacles in nature that prevent the wolves from approaching the prey, making the hunt harder and longer. It should be noted that $\vec{C}$ is not decreased linearly, in contrast to $\vec{A}$; this is very useful for preventing the algorithm from becoming trapped in local solutions, especially in the final iterations.
Based on the above explanations and the material presented in the previous sections, the optimization process implemented by the gray wolf algorithm is shown as a detailed flowchart in Fig. 7.
4. Gray wolf optimization example for reservoir operation
This section presents the application of the gray wolf optimization algorithm to a problem that has been used as a primary example in many water resources optimization texts. The physical characteristics of a dam are presented in Table 3. The goal is to optimize the monthly releases from the dam so that all downstream monthly demands, including the environmental flow, are met (Araghinejad et al., 2017). In this example, all downstream needs are weighted equally.
FIG. 7 Optimization implementation process by the gray wolf algorithm.
TABLE 3 Physical and hydrological characteristics of the assumed dam (all values in MCM).
Mean inflow: 122.35; mean demand: 103.28; mean environmental flow: 23.45; initial storage: 750; minimum storage: 250; maximum storage: 1470; minimum release: 10; maximum release: 380.
If the downstream needs are not supplied, the damage is assumed to equal the square of the monthly shortage. The optimization period (the number of decision variables) is 336 months.
This model was implemented in MATLAB (Appendix A) using the standard GWO method described by Mirjalili et al. (2014). In this application, the pack size and the maximum number of iterations were set to 245 and 1000, respectively. The mean of the best solution (reservoir release) obtained by GWO is 120.27 MCM. The objective space is shown in Fig. 8.
FIG. 8 Objective space of the reservoir example.
5. Conclusions
This chapter first addressed the importance of optimization models in water resources and then introduced the types of optimization algorithms along with their classification. After that, the gray wolf optimization (GWO) algorithm was presented as one of the newest meta-heuristic algorithms, and a brief literature review of its application to various water resources problems was given. Gray wolves are apex predators, and their hunting mechanism and social hierarchy inspire the algorithm. The theory and mathematical modeling of the method were presented in separate sections, and the optimization process was shown in a detailed flowchart. Finally, as a simple, introductory problem for teaching optimization algorithms in water resources, the optimization of water allocation from a reservoir was implemented in MATLAB (Appendix A).
Appendix A: GWO MATLAB code for the reservoir example
function loss = Fit_Example(x)
global Inflow Totaldemand Maxstorage Minstorage Maxrelease Storage Release
loss = 0;
InitialStorage = 750;
Storage = zeros(336,1);
Release = x';
for m = 1:336
if m==1
Storage(m,1) = InitialStorage + Inflow(m,1) - Release(m,1);
else
%Continuity equation
Storage(m,1) = Storage(m-1) + Inflow(m,1) - Release(m,1);
end
Residual = Release(m,1) - Totaldemand(m,1);
if Residual==0
loss = loss + 0;
else
loss = loss + (Release(m,1) - Totaldemand(m,1))^2;   %squared monthly shortage/surplus
end
if Storage(m,1) < Minstorage
loss = loss + 1000000;
%Penalty approach (storage below minimum)
end
if Storage(m,1) > Maxstorage
loss = loss + 1000000;
%Penalty approach (storage above maximum)
end
if Release(m,1) > Maxrelease
loss = loss + 1000*(Release(m,1) - Maxrelease); %Penalty approach (release above maximum)
end
end
% This code performs Grey Wolf Optimization Algorithm for the reservoir example
clc
clear
close all
%% Problem definition
global EFlow Inflow demand Totaldemand Maxstorage Minstorage Maxrelease Minrelease Storage
data = readtable(fullfile('Evolutionary Algorithms.csv'));
EFlow = data.eflow_MCM; % Environmental flow
Inflow = data.damin_MCM;
demand = data.demand_MCM;
Totaldemand = demand + EFlow;
Maxstorage = 1470;
Minstorage = 250;
Maxrelease = 380;
Minrelease = 0;
%% Upper and Lower Bounds Definitions
nVar = size(data,1);
VarMin = 10;
VarMax = 380;
%% GWO Parameters
MaxIt = 1000;
nPop = 245;   %Pack size
%% Initialization
empty_GWO.Position = [];
empty_GWO.Cost = inf;   %Because it will be compared (minimized) in the following sections
GWO = repmat(empty_GWO,nPop,1);
Alpha = empty_GWO;
Beta = empty_GWO;
Delta = empty_GWO;
for i = 1:nPop
%Initialize Position
% GWO(i).Position = unifrnd(VarMin,VarMax,[1,nVar]);   %alternative: random initialization within the bounds
GWO(i).Position = Totaldemand';                        %initialize each wolf at the total demand
%Evaluation
GWO(i).Cost = Fit_Example(GWO(i).Position);
%Update Alpha, Beta, Delta
if GWO(i).Cost < Alpha.Cost
Alpha = GWO(i);
if GWO(i).Cost < Beta.Cost
Beta = GWO(i);
if GWO(i).Cost < Delta.Cost
Delta = GWO(i);
end
end
end
end
%% GWO Main Loop
NFE = 0; %Number of Function Evaluation
BestCost = zeros(1,MaxIt);
for it = 1:MaxIt
a = 2 - it*(2/MaxIt);
for i = 1:nPop
%Alpha Part
r1 = rand(1,nVar);
r2 = rand(1,nVar);
A = 2*a*r1 - a;
C = 2*r2;
D = abs(C.*Alpha.Position - GWO(i).Position);
X1 = Alpha.Position - A.*D;
%Beta Part
r1 = rand(1,nVar);
r2 = rand(1,nVar);
A = 2*a*r1 - a;
C = 2*r2;
D = abs(C.*Beta.Position - GWO(i).Position);
X2 = Beta.Position - A.*D;
%Delta Part
r1 = rand(1,nVar);
r2 = rand(1,nVar);
A = 2*a*r1 - a;
C = 2*r2;
D = abs(C.*Delta.Position - GWO(i).Position);
X3 = Delta.Position - A.*D;
%Final Steps
GWO(i).Position = (X1+X2+X3)/3;
flagUb = GWO(i).Position > VarMax;
flagLb = GWO(i).Position < VarMin;
GWO(i).Position = GWO(i).Position.*(~(flagUb+flagLb)) + VarMax.*flagUb + VarMin.*flagLb;   %clip positions to the bounds
%Evaluation
GWO(i).Cost = Fit_Example(GWO(i).Position);
%Update Alpha, Beta, Delta
if GWO(i).Cost < Alpha.Cost
Alpha = GWO(i);
if GWO(i).Cost < Beta.Cost
Beta = GWO(i);
if GWO(i).Cost < Delta.Cost
Delta = GWO(i);
end
end
end
end
NFE = NFE + nPop;
BestCost(it) = Alpha.Cost;
disp(['Iteration: ',num2str(it),', NFE = ',num2str(NFE),', BestCost = ',num2str(BestCost(it))]);
end
Fit_Example(Alpha.Position);            %re-evaluate the best (alpha) solution so Storage corresponds to it
finalstorage = Storage;
reservoir_release = Alpha.Position';    %best release schedule found by GWO
%% Plot Results
figure;
plot(1:MaxIt,BestCost,':','Color','r','LineWidth',2,'MarkerSize',8)
xlabel('Iteration')
ylabel('Cost value obtained per each iteration')
title(['Best Cost Obtained = ',num2str(BestCost(MaxIt))])
set(gca,'FontName','Times New Roman')
set(gca,'FontSize',12)
set(gca,'Color',[0.95 0.97 0.95])
set(gcf,'Color','w')
grid on
xlim([0 MaxIt])
ylim([0 max(BestCost)])
References
Araghinejad, S., Hosseini-Moghari, S.-M., Eslamian, S., 2017. Reservoir operation during drought. In: Eslamian, S., Eslamian, F. (Eds.), Handbook of
Drought and Water Scarcity. Management of Drought and Water Scarcity, vol. 3. Taylor and Francis, CRC Press, USA, pp. 283–292 (Chapter 12).
Bonabeau, E., Marco, D., Theraulaz, G., 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, UK.
Dahmani, S., Yebdri, D., 2020. Hybrid algorithm of particle swarm optimization and grey wolf optimizer for reservoir operation management. Water
Resour. Manag. 34, 4545–4560.
Damania, R., Desbureaux, S., Rodella, A.-S., Russ, J., 2019. Quality Unknown: The Invisible Water Crisis. World Bank Publications, United Nations.
Dorigo, M., Birattari, M., Stutzle, T., 2006. Ant colony optimization. IEEE Comput. Intell. Mag. 1, 28–39.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220 (4598), 671–680.
Loucks, D.P., Van Beek, E., 2017. Water Resource Systems Planning and Management: An Introduction to Methods, Models, and Applications. Springer.
Majumder, P., Eldho, T.I., 2020. Artificial neural network and grey wolf optimizer based surrogate simulation-optimization model for groundwater remediation. Water Resour. Manag. 34, 763–783.
Mirjalili, S., Mirjalili, S.M., Lewis, A., 2014. Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61. https://doi.org/10.1016/j.advengsoft.2013.12.007.
Mosavi, A., Samadianfard, S., Darbandi, S., Nabipour, N., Qasem, S.N., Salwana, E., Band, S.S., 2021. Predicting soil electrical conductivity using multilayer perceptron integrated with grey wolf optimizer. J. Geochem. Explor. 220, 106639.
Muro, C., Escobedo, R., Spector, L., Coppinger, R.P., 2011. Wolf-pack (Canis lupus) hunting strategies emerge from simple rules in computational simulations. Behav. Process. 88, 192–197.
Oliva, D., Abd Elaziz, M., Hinojosa, S., 2019. Metaheuristic Optimization. Springer, pp. 13–26, https://doi.org/10.1007/978-3-030-12931-6_3.
Sharma, M., Kaur, P., 2021. A comprehensive analysis of nature-inspired meta-heuristic techniques for feature selection problem. Arch. Comput. Methods
Eng. 28, 1103–1127. https://doi.org/10.1007/s11831-020-09412-6.
Simonovic, S.P., 2012. Managing Water Resources Methods and Tools for a Systems Approach. UNESCO, Paris and Earthscan James & James,
London, https://doi.org/10.4324/9781849771917.
Tayfur, G., 2017. Modern optimization methods in water resources planning, engineering and management. Water Resour. Manag. 31, 3205–3233.
Tikhamarine, Y., Malik, A., Kumar, A., Souag-Gamane, D., Kisi, O., 2019. Estimation of monthly reference evapotranspiration using novel hybrid
machine learning approaches. Hydrol. Sci. J. 64, 1824–1842.
Tikhamarine, Y., Souag-Gamane, D., Ahmed, A.N., Kisi, O., El-Shafie, A., 2020. Improving artificial intelligence models accuracy for monthly
streamflow forecasting using grey wolf optimization (GWO) algorithm. J. Hydrol. 582, 124435.
Yang, X.-S., 2010. Engineering Optimization. John Wiley & Sons, Inc., Hoboken, NJ, USA, https://doi.org/10.1002/9780470640425.
Yu, S., Lu, H., 2018. An integrated model of water resources optimization allocation based on projection pursuit model-grey wolf optimization method in a
transboundary river basin. J. Hydrol. 559, 156–165.
Chapter 16
Kernel-based modeling
Kiyoumars Roushangar a,b, Roghayeh Ghasempour a, and Saman Shahnazi a
a Department of Water Resources Engineering, Faculty of Civil Engineering, University of Tabriz, Tabriz, Iran; b Center of Excellence in Hydroinformatics, University of Tabriz, Tabriz, Iran
1. Introduction
Kernel-based approaches such as Gaussian Process Regression (GPR), Support Vector Machine (SVM), and Kernel Extreme Learning Machine (KELM) are relatively new and important methods built on different kernel types and rooted in statistical learning theory. These methods can model non-linear decision boundaries, and there are many kernels to choose from. They are also robust against overfitting, especially in high-dimensional spaces. However, the appropriate selection of the kernel type is the most important step in these models because of its direct impact on training and classification precision. These methods are memory intensive and trickier to tune, owing to the importance of picking the right kernel, and although they allow the behavior of a system to be predicted, they do not characterize its intrinsic structure. A kernel method is an algorithm that depends on the data only through dot products. When this is the case, the dot product can be replaced by a kernel function that computes a dot product in some possibly high-dimensional feature space (Smola, 1996). This has two advantages: first, the ability to generate non-linear decision boundaries using methods designed for linear classifiers; second, the use of kernel functions allows a classifier to be applied to data that have no obvious fixed-dimensional vector space representation.
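As a small illustration of this idea (a sketch, not part of the chapter's material), the degree-2 polynomial kernel $k(x, z) = (x^T z)^2$ on two-dimensional inputs returns exactly the dot product of the explicit feature maps $\varphi(x) = [x_1^2, \sqrt{2}\,x_1 x_2, x_2^2]$, so the feature space never needs to be constructed:

% Kernel trick check: k(x,z) = (x'*z)^2 equals <phi(x), phi(z)> for
% phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2] (2-D inputs, degree-2 polynomial kernel).
x = [0.4; -1.2];
z = [2.0;  0.7];

kDirect = (x.'*z)^2;                                 % kernel evaluated in the input space
phi     = @(v) [v(1)^2; sqrt(2)*v(1)*v(2); v(2)^2];  % explicit feature map
kMapped = phi(x).' * phi(z);                         % dot product in the feature space

fprintf('kernel = %.6f, feature-space dot product = %.6f\n', kDirect, kMapped);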
Several kernel-based tools and techniques have been employed to find reliable solutions to complicated hydraulic and hydrological problems. Among the broader scientific applications of kernel-based methods, SVM is widely used in various fields of water engineering, and a large volume of published studies highlights its encouraging performance in classification, regression, and forecasting (Deka, 2014; Seifi et al., 2020; Seifi and Riahi, 2020). Recent years have seen an increase in the number of scientific studies using other kernel-based algorithms. Among them, Gaussian Process Regression (GPR) has gained enormous popularity as a highly effective Bayesian tool for complicated regression problems (Perez-Cruz et al., 2013). GPR, as a kernel-based model, is conceptually simple to understand and is known as a flexible nonparametric model that places a prior probability distribution directly over functions (Rasmussen and Williams, 2006). Promising applications of the GPR model have been reported in forecasting daily, monthly, and seasonal streamflow (Sun et al., 2014; Zhu et al., 2019), forecasting groundwater levels (Raghavendra and Deka, 2016), and forecasting the daily seepage discharge of an earth dam (Roushangar et al., 2016). Having employed the GPR model to predict the discharge coefficient of a gated piano key weir, Akbari et al. (2019) showed that the choice among various kernel functions had a trivial effect on model performance. Roushangar and Shahnazi (2020b) explored the use of GPR and SVM to predict the sediment transport rate of gravel-bed rivers, using two distinct scenarios (based on hydraulic and sediment properties) to frame the modeling process; a slight performance advantage of the GPR models over the SVM models was observed. Jaiswal and Goel (2020) went on to confirm the efficiency of the kernel-based GPR model in modeling the aeration efficiency of rectangular weirs. It is also noteworthy that recent studies recommend the Pearson kernel function for the application of the GPR tool to modeling the energy losses of culverts and the roughness coefficient of sewer pipes (Roushangar et al., 2019, 2020).
The concept of the kernel function has also been used to improve the performance of conventional learning methods such as the Extreme Learning Machine (ELM). Using a kernel function to determine the hidden-layer feature mapping in ELM leads to more stable predictions. Employing Relevance Vector Machine (RVM), GPR, and KELM as kernel-based techniques to model pier scour using field data, Pal et al. (2014) demonstrated that GPR outperformed the RVM and KELM models. Roushangar and Shahnazi (2019) introduced an effective prediction method based on KELM coupled with Particle Swarm Optimization (PSO); compared with classical approaches, their hybrid model showed superior accuracy when employed to predict bedload transport rates. In another study by the same authors, Roushangar and Shahnazi (2020a) investigated the generalization capability of three kernel-based techniques (KELM,
GPR, and SVM) for modeling the total sediment load of gravel-bed rivers. Li et al. (2020) conducted research on river water level forecasting and found that KELM can effectively improve the prediction accuracy of the model. Using kernel-based approaches effectively requires an understanding of how they work and of which kernel should be selected. In this chapter, several types of kernel-based approaches and the theory behind them are discussed, and some examples of their applications are provided.
2. Support vector machine
The support vector machine (SVM) algorithm is a popular machine learning tool that offers solutions for both classification and regression problems (Vapnik, 1995; Sharifi Garmdareh et al., 2018). SVM is built on the Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle, which are the core contents of statistical learning theory. SVM has both a solid theoretical foundation and good generalization ability. SVM has been used in many domains, such as handwriting recognition, biological character recognition (e.g., face recognition), credit card fraud checking, image segmentation, bioinformatics, function fitting, and medical data analysis.
An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
2.1 Support vector classification
2.1.1 Linear classifiers
The data for a two-class learning problem consist of objects labeled with one of two labels corresponding to the two classes; for convenience, based on Fig. 1, we assume the labels are +1 (positive examples) or -1 (negative examples). In what follows, boldface x denotes a vector with components $x_i$. The notation $x_i$ will also denote the $i$th vector in a dataset $\{(x_i, y_i)\}_{i=1}^{n}$, where $y_i$ is the label associated with $x_i$. The objects $x_i$ are called patterns or examples. We assume the examples belong to some set $X$. Initially we assume the examples are vectors, but once we introduce kernels this assumption will be relaxed, at which point they could be any continuous/discrete object.
A key concept required for defining a linear classifier is the dot product between two vectors, also referred to as an inner product or scalar product, defined as $w^T x = \sum_i w_i x_i$. A linear classifier is based on a linear discriminant function of the form (Yang et al., 2015):

$f(x) = w^T x + b$   (1)

The vector $w$ is known as the weight vector, and $b$ is called the bias. Consider first the case $b = 0$. The set of points $x$ such that $w^T x = 0$ are all points perpendicular to $w$ that go through the origin: a line in two dimensions, a plane in three dimensions, and, more generally, a hyperplane. The bias $b$ translates the hyperplane away from the origin, so the hyperplane is

$\{x : f(x) = w^T x + b = 0\}$   (2)
FIG. 1 SVM classification: the separating hyperplane $w^T x + b = 0$, the margin boundaries $w^T x + b = \pm 1$, the margin of width $2/\|w\|$, and the support vectors.
The hyperplane divides the space into two: the sign of the discriminant function f(x) denotes the side of the hyperplane a
point is on. The boundary between regions classified as positive and negative is called the decision boundary of the classifier (Roushangar and Ghasempour, 2017). The decision boundary defined by a hyperplane is said to be linear because it is
linear in the input examples. A classifier with a linear decision boundary is called a linear classifier. Conversely, when the
decision boundary of a classifier depends on the data in a non-linear way the classifier is said to be non-linear.
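As a minimal sketch with an arbitrary (untrained) weight vector and bias, the discriminant values and the resulting class labels can be computed as follows:

% Linear discriminant f(x) = w'*x + b evaluated for a few 2-D examples.
w = [1.5; -0.8];                 % weight vector (illustrative, not trained)
b = 0.2;                         % bias (illustrative)
X = [ 1.0  0.5;                  % each row is one example x
     -0.7  1.2;
      0.3 -2.0];

f    = X*w + b;                  % discriminant values
yhat = sign(f);                  % predicted labels: +1 / -1 depending on the side of the hyperplane
disp([f yhat]);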
2.1.2 Non-linear classifiers and kernel application
In many applications a non-linear classifier provides better accuracy. And yet, linear classifiers have advantages, one of them being that they often have simple training algorithms that scale well with the number of examples (Cristianini and Shawe-Taylor, 2000; Tezel and Buyukyildiz, 2016). This begs the question: can the machinery of linear classifiers be extended to generate non-linear decision boundaries? Furthermore, can we handle domains such as protein sequences or structures, where a representation in a fixed-dimensional vector space is not available?
According to Fig. 2, the naive way of making a non-linear classifier out of a linear classifier is to map the data from the input space $X$ to a feature space $F$ using a non-linear function $\varphi: X \to F$. In the space $F$ the discriminant function is:

$f(x) = w^T \varphi(x) + b$   (3)

When working with SVM, we usually deal with spaces of very high dimensionality. The objective of SVM is to find a hyperplane that separates the data, but in the original space (called the input space) we may not know how to do this or, worse, we may know that the data are not linearly separable there. A kernel is a tool that projects the data from the input space to a feature space in which the data are linearly separable.
In fact, a kernel is a mathematical function for transforming data from an input space to a feature space. Different kernel-based algorithms use different types of kernel functions. The appropriate selection of the kernel type is the most important step in kernel-based approaches because of its impact on the training process (Zhuang et al., 2011).
2.2 Support vector regression
The regression problem is a generalization of the classification problem, in which the model returns a continuous-valued output, as opposed to an output from a finite set. In other words, a regression model estimates a continuous-valued multivariate function (Roushangar and Ghasempour, 2017). The SVM method is based on the concept of the optimal hyperplane that separates samples of two classes by considering the widest gap between the two classes (see Fig. 3). Support vector regression (SVR) is the extension of SVM to regression. The aim of SVR is to find a function that deviates from the actually obtained targets $y_i$ by at most $\varepsilon$ for all the training data and, at the same time, is as flat as possible (Vapnik, 1995). The SVR formulation is as follows:
$f(x) = w\,\varphi(x) + b$   (4)

FIG. 2 Non-linear classifier and kernel mapping $\varphi: x \to \varphi(x) = z$ from the input space (x-space) to the feature space (z-space), where the decision boundary $w^T\varphi(x) + b = 0$ becomes linear (Zhuang et al., 2011).
FIG. 3 Data classification and support vectors.
$w$ is expressed as Eq. (5), in which $a_i$ are the Lagrange multipliers, $y_i$ is the target value, and $x_i$ is the input vector:

$w = \sum_{i=1}^{n} a_i y_i x_i$   (5)
The coefficients of Eq. (4) are determined by minimizing the regularized risk function:

$R_{\min} = C \dfrac{1}{N}\sum_{i=1}^{N} L_\varepsilon(t_i, y_i) + \dfrac{1}{2}\|w\|^2$   (6)

where

$L_\varepsilon(t_i, y_i) = \begin{cases} 0 & \left|t_i - y_i\right| \le \varepsilon \\ \left|t_i - y_i\right| - \varepsilon & \text{otherwise} \end{cases}$   (7)
The constant $C$ is the cost factor and represents the trade-off between the weight factor and the approximation error, and $\varepsilon$ is the radius of the hyper-tube within which the regression function must lie. $L_\varepsilon(t_i, y_i)$ is the loss function, in which $y_i$ is the forecasted value and $t_i$ is the desired value in period $i$. $\|w\|$ is the norm of the vector $w$, and the term $\|w\|^2$ can be expressed as $w^T w$, where $w^T$ is the transpose of $w$. According to Eq. (7), if the predicted value lies outside the $\varepsilon$-tube, the loss is the amount by which the absolute prediction error exceeds $\varepsilon$. Since some data may not lie inside the $\varepsilon$-tube, the slack variables ($\xi$, $\xi^*$) must be used; these variables measure the distance from the actual values to the corresponding boundary of the $\varepsilon$-tube. Therefore, Eq. (6) can be transformed into
$R_{\min} = C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) + \dfrac{1}{2}\|w\|^2$   (8)

subject to: $t_i - w\,\varphi(x_i) - b \le \varepsilon + \xi_i$, $\; w\,\varphi(x_i) + b - t_i \le \varepsilon + \xi_i^*$, $\; \xi_i, \xi_i^* \ge 0$
Using Lagrangian multipliers in Eq. (8) thus yields the dual Lagrangian form:
$\max\, l(a_i, a_i^*) = -\varepsilon \sum_{i=1}^{n}\left(a_i + a_i^*\right) + \sum_{i=1}^{n} t_i\left(a_i - a_i^*\right) - \dfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(a_i - a_i^*\right)\left(a_j - a_j^*\right) K\!\left(x_i, x_j\right)$   (9)

subject to: $\sum_{i=1}^{n}\left(a_i - a_i^*\right) = 0$, $\; 0 \le a_i, a_i^* \le C$, $\; i = 1, 2, \ldots, N$
where $a_i$ and $a_i^*$ are Lagrange multipliers and $l(a_i, a_i^*)$ represents the Lagrange function. $K(x_i, x_j)$ is a kernel function that yields the inner product of $\varphi(x_i)$ and $\varphi(x_j)$ in the feature space:

$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$   (10)
Different software packages, such as MATLAB and STATISTICA, can be used for data analysis with the SVM approach; the Python programming language can also be used for SVM modeling. More detail about SVM coding can be found at https://www.mathworks.com/help/stats/fitrsvm.html. For example, the following calling syntaxes can be used to fit an SVM regression model.
fitrsvm:
Mdl = fitrsvm(Tbl,ResponseVarName)
Mdl = fitrsvm(Tbl,formula)
Mdl = fitrsvm(Tbl,Y)
Mdl = fitrsvm(X,Y)
Mdl = fitrsvm(___,Name,Value)
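For example, an SVR model with a Gaussian kernel could be fitted and applied as in the sketch below; the data are synthetic and illustrative, while the name-value options shown are standard fitrsvm arguments:

% Sketch: epsilon-SVR with a Gaussian (RBF) kernel on synthetic 1-D data.
rng(1);
X = linspace(0, 10, 80)';                 % predictor
y = sin(X) + 0.2*randn(size(X));          % noisy response

Mdl = fitrsvm(X, y, ...
    'KernelFunction', 'gaussian', ...     % RBF kernel
    'Standardize', true, ...              % standardize the predictor
    'Epsilon', 0.1);                      % width of the epsilon-insensitive tube

Xnew = linspace(0, 10, 200)';
yfit = predict(Mdl, Xnew);                % SVR predictions at new points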
3. Gaussian processes
Gaussian processes (GPs) are powerful algorithms for both regression and classification (Melo, 2012). Their greatest practical advantage is that they can give a reliable estimate of their own uncertainty. Fig. 4 illustrates a typical example of a
prediction problem: given some noisy observations of a dependent variable at certain values of the independent variable x,
what is our best estimate of the dependent variable at a new value, x∗?
If we expect the underlying function f(x) to be linear, and can make some assumptions about the input data, we might use
a least-squares method to fit a straight line (linear regression). Moreover, if we suspect f(x) may also be quadratic, cubic, or
even non-polynomial, we can use the principles of model selection to choose among the various possibilities.
Gaussian processes extend multivariate Gaussian distributions to infinite dimensionality. Formally, a Gaussian
process generates data located throughout some domain such that any finite subset of the range follows a multivariate
Gaussian distribution. Now, the n observations in an arbitrary data set, y ¼ {y1, …, yn}, can always be imagined as a
sample from some multivariate (n-variate) Gaussian distribution, after enough thought. Hence, working backward, this
data set can be partnered with a GP. Thus, GPs are as universal as they are simple. Very often, it’s assumed that the
mean of this partner GP is zero everywhere. What relates one observation to another in such cases is just the covariance
function, k (xi, xj).
3.1 Gaussian process regression
This section introduces Gaussian process regression as a useful tool for formulating a Bayesian framework for regression
problems. The Gaussian process (GP) is achieved through extending the multivariate Gaussian distribution to infinite
dimensions, which can be considered as a statistical distribution of functions (Rasmussen and Williams, 2006). Suppose
the training data set of the Gaussian model is $D = \{(x_n, y_n),\ n = 1, 2, \ldots, N\}$, where $x_n \in \mathbb{R}^{d_x}$ refers to the input and $y_n \in \mathbb{R}$ refers
FIG. 4 Given seven noisy data points.
to the output. In Gaussian process regression, the observed target value $y$ of an underlying function $f$ at input $x$ can be given as:

$y = f(x) + \varepsilon$   (11)

where $\varepsilon$ represents independent, identically distributed Gaussian noise with a mean value of zero ($m(x) = 0$) and a variance of $\sigma_n^2$. Then, the prior distribution can be written as:

$Y = (y_1, \ldots, y_n) \sim N\!\left(0,\; k_{ij} + \sigma_n^2 I\right)$   (12)

where $k_{ij} = K(x_i, x_j)$ and $I$ is the identity matrix. The joint prior distribution of the observed and predicted values can be written as:

$\begin{bmatrix} Y \\ f_* \end{bmatrix} \sim N\!\left(0,\; \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$   (13)

where $X = [x_1, x_2, \ldots, x_n]$ is the training set, $X_*$ is the testing data, $Y = [y_1, y_2, \ldots, y_n]$ is the set of observed values, and $f_*$ is the set of predictive values. $K(X, X)$, with elements $K(x_i, x_j)$, is a symmetric positive definite covariance matrix that describes the correlation between $x_i$ and $x_j$ using the concept of the kernel function. $K(X, X_*)$ is the $n \times n_*$ covariance matrix evaluated at all pairs of training and test points, considering $n$ training data and $n_*$ test data; the same holds for $K(X_*, X)$ and $K(X_*, X_*)$. From the joint prior Gaussian distribution, the prediction for the target is inferred through the mean function $\bar{f}_*$ and the covariance function $\mathrm{Cov}(f_*)$ given by Eqs. (14), (15):

$\bar{f}_* = m(x_*) + K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} Y$   (14)

$\mathrm{Cov}(f_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*)$   (15)
(15)
The kernel function is an essential part of the GPR model development, as it includes assumption about the smoothness and
likely patterns to be expected in the data. Kernel function determines how the response at one point xi is affected by
responses at other points xj, i 6¼ j, i ¼ 1, 2, …, n. A sensible assumption is usually that the correlation between two points
decays with the distance between the points. This implies that The behavior of closer points is more similar than points
which are further away from each other. There are many choices for kernel functions such as the Matern kernel family:
1v 0
1
0
1v
x
x
x
x
p
ffiffiffiffiffi
p
ffiffiffiffiffi
i
j
i
j
2 2
@ 2v
A K @ 2v
A
(16)
k xi , xj ¼ s
v
l
l
GðvÞ
where Г represents the gamma function, Kv represents the modified Bessel function and j xi xj j is the distance between
input location xi and xj. Some forms of kernel functions are derived through half integer values of v. Here the most prominent forms are addressed. For v ¼ 0 the Ornstein-Uhlenbeck kernel function is obtained as:
1
0 xi xj 2
@
A
(17)
k xi , xj ¼ s exp l
For v ¼ 3/2 Matern 3/2 kernel function:
1
1
0
0 pffiffiffi
pffiffiffi
3 x i x j 3 x i x j 2@
A
@
A
k xi , xj ¼ s 1 +
exp l
l
For v ¼ 5/2 Matern 5/2 kernel function:
0 pffiffiffi
1
2 1
1
0
0 pffiffiffi
pffiffiffi
5 x i x j 5xi xj 5 x i x j B
C
2@
A
@
A
k xi , xj ¼ s 1 +
exp @
A exp l
l
3l2
(18)
(19)
Moreover, in the limit ν → ∞, the squared exponential kernel is obtained. This commonly used kernel function provides an expressive kernel in order to model smooth and stationary functions (Duvenaud, 2014). The squared exponential kernel is defined as:
$k(x_i, x_j) = \sigma^2 \exp\!\left(-\frac{|x_i - x_j|^2}{2 l^2}\right)$   (20)
The values of the length scale (l) and the signal variance (σ²) as hyper-parameters can affect the a priori correlation between points and can change the resulting function (Seifi and Riahi, 2020).
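As a small illustration (not from the chapter; variable names are ours), the squared exponential kernel of Eq. (20) and the role of its two hyper-parameters can be sketched in Python as follows:

import numpy as np

def squared_exponential_kernel(xi, xj, signal_variance=1.0, length_scale=1.0):
    # Eq. (20): k(xi, xj) = sigma^2 * exp(-|xi - xj|^2 / (2 l^2))
    sq_dist = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return signal_variance * np.exp(-sq_dist / (2.0 * length_scale ** 2))

# A shorter length scale makes the a priori correlation between the same two points decay faster.
print(squared_exponential_kernel([0.0], [0.5], length_scale=1.0))   # ~0.88
print(squared_exponential_kernel([0.0], [0.5], length_scale=0.2))   # ~0.04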
The kernel function and its parameters, together with the degree of noise, should be optimally determined during the training process of GPR models. Considering GPR with a fixed value of the Gaussian noise, a GP model can be trained using Bayesian inference, i.e., by maximizing the marginal likelihood. This corresponds to the minimization of the negative log-posterior:
$p(\sigma^2, k) = \frac{1}{2} Y^{T} \left(K + \sigma^2 I\right)^{-1} Y + \frac{1}{2} \log\left|K + \sigma^2 I\right| - \log p(\sigma^2) - \log p(k)$   (21)
In order to obtain the hyperparameters, the partial derivatives of Eq. (21) can be taken with respect to σ² and k, and the minimization can be carried out by gradient descent.
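As a hedged sketch of this training step (illustrative data and helper names, not taken from the chapter), the following Python code builds the negative log marginal likelihood of a zero-mean GP with a squared exponential kernel and minimizes it numerically with SciPy; it omits the hyper-parameter priors of Eq. (21), and in practice tools such as fitrgp or scikit-learn perform this optimization internally:

import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    # log_params = log([signal variance, length scale, noise variance])
    s2, l, noise = np.exp(log_params)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)      # pairwise squared distances
    K = s2 * np.exp(-d2 / (2.0 * l ** 2)) + noise * np.eye(len(X))  # K + sigma_n^2 I
    L = np.linalg.cholesky(K)                                       # stable inversion via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))             # (K + sigma_n^2 I)^-1 Y
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)                                      # toy noisy observations
X = rng.uniform(-2.0, 0.5, size=(7, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(7)

res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y),
               method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
print("optimized [signal variance, length scale, noise variance]:", np.exp(res.x))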
MATLAB and Python can both be used for modeling with GPR. More details about GPR codes can be found at https://www.mathworks.com/help/stats/fitrgp.html and https://www.mathworks.com/help/stats/gaussian-process-regression-models.html. The following MATLAB code can be used for fitting a Gaussian process regression (GPR) model.
fitrgp:
gprMdl = fitrgp(Tbl,ResponseVarName)
gprMdl = fitrgp(Tbl,formula)
gprMdl = fitrgp(Tbl,y)
gprMdl = fitrgp(X,y)
gprMdl = fitrgp(___,Name,Value)
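In Python, a broadly equivalent model can be fitted with scikit-learn's GaussianProcessRegressor; the kernel choice and the toy data below are only illustrative and not taken from the chapter:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(1)                      # toy (x_n, y_n) training pairs
X = rng.uniform(-2.0, 0.5, size=(30, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(30)

# Matern 5/2 kernel (Eq. 19) plus a white-noise term playing the role of sigma_n^2.
kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_star = np.linspace(-2.0, 0.5, 5).reshape(-1, 1)
mean, std = gpr.predict(X_star, return_std=True)    # predictive mean (Eq. 14) and uncertainty (Eq. 15)
print(mean, std)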
3.2 Gaussian process classification
The Gaussian process classifier implements Gaussian processes (GP) for classification purposes (Fig. 5), more specifically
for probabilistic classification, where test predictions take the form of class probabilities. The Gaussian process classification (GPC) places a GP prior on a latent function, which is then squashed through a link function to obtain the probabilistic classification. The latent function is a so-called nuisance function, whose values are not observed and are not
relevant by themselves. Its purpose is to allow a convenient formulation of the model, and is removed (integrated out)
during prediction. The GPC implements the logistic link function, for which the integral cannot be computed analytically
but is easily approximated in the binary case. In contrast to the regression setting, the posterior of the latent function is not
Gaussian even for a GP prior since a Gaussian likelihood is inappropriate for discrete class labels. Gaussian process classifier approximates the non-Gaussian posterior with a Gaussian based on the Laplace approximation. This method supports
multi-class classification by performing either one-versus-rest or one-versus-one based training and prediction. In one-versus-rest, one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from
the rest. In “one-vs-one,” one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate
these two classes. The predictions of these binary predictors are combined into multi-class predictions. In the case of
Gaussian process classification, “one-vs-one” must solve many problems involving only a subset of the whole training
set rather than fewer problems on the whole dataset. Since Gaussian process classification scales cubically with the size
of the dataset, this might be considerably faster. However, note that “one-vs-one” does not support predicting probability
estimates but only plain predictions.
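A minimal Python sketch of such a classifier, using scikit-learn's GaussianProcessClassifier on synthetic data (all names and values below are illustrative, not from the chapter), is:

from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Synthetic three-class problem standing in for any hydrological classification task.
X, y = make_classification(n_samples=150, n_features=6, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Laplace-approximated GP classifier; "one_vs_rest" fits one binary GP per class,
# while "one_vs_one" fits one per pair of classes but returns no probability estimates.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                multi_class="one_vs_rest", random_state=0)
gpc.fit(X, y)
print(gpc.predict_proba(X[:3]))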
FIG. 5 Gaussian process classification. (For more information see: https://sccn.ucsd.edu/svn/software/tags/EEGLAB7_0_2_9beta/external/fieldtrip20090727/classification/toolboxes/external/gpml-matlab/doc/classification.html.)
4. Kernel extreme learning machine
Extreme Learning Machine (ELM) is a fast training model with a simple mathematical structure that avoids the iterative weight tuning of back-propagation. However, the random assignment of the input and hidden layer weights causes variation in regression and classification accuracy, even if the inputs are exactly the same. In order to avoid this drawback, the kernel function is integrated into the basis of the ELM to design the Kernel Extreme Learning Machine (KELM). Compared with the widely used kernel-based SVM method, KELM can lead to better performance in the areas of classification, pattern recognition and regression through easier implementation and faster training speed (Shamshirband et al., 2015). A brief introduction of KELM is presented here.
The ELM is known as a single hidden layer neural network (Huang et al., 2006). Unlike the back-propagation approach, which needs adjustment of the input weights, the ELM uses randomly assigned input weights. For a given dataset, an ELM with H hidden neurons and activation function f(x) can be expressed as:
$\sum_{i=1}^{H} a_i f_i(x_j) = \sum_{i=1}^{H} a_i f(w_i \cdot x_j + c_i) = e_j, \qquad j = 1, \ldots, n$   (22)
where wi and ai are the weight vectors connecting the inputs to the ith hidden neuron and the ith hidden neuron to the output neurons, respectively, and ci is the bias of the ith hidden neuron. Huang et al. (2006) suggested that Eq. (22) can be written briefly as follows:
$A a = Y$   (23)
where A is the hidden layer output matrix of the neural network (Huang et al., 2006).
The set of weights (wi, ai) and biases should be adjusted when a neural network is applied with back-propagation learning algorithms. The back-propagation learning algorithm requires specifying the values of the learning rate and momentum, and it does not ensure that the absolute minimum of the error function will be found. As a result, the learning algorithm can fall into local minima and may be at risk of over-training during the training process. Using the smallest norm least-squares solution of Aa = Y is suggested to solve these problems (Huang et al., 2006). In most applications of ELM, the number of hidden neurons is much smaller than the number of training samples, making A a non-square matrix, and there may not exist an a such that Aa = Y; instead, one may need to find a′ (Huang et al., 2006). Consequently, the solution of Eq. (23) becomes:
$a' = A^{\dagger} Y$   (24)
where A† is the Moore-Penrose generalized inverse of matrix A.
Recently, Huang et al. (2011) proposed applying orthogonal projection and kernel methods in the design of the ELM. Based on the orthogonal projection method, A† = (AᵀA)⁻¹Aᵀ if AᵀA is non-singular, or A† = Aᵀ(AAᵀ)⁻¹ if AAᵀ is non-singular. Huang et al. (2011) proposed adding a positive value 1/r (where r is a user-defined parameter) to the diagonal of AAᵀ or AᵀA in the computation of the output weights a, which gives a more stable solution of the ELM with better generalization ability compared with the least-squares solution. Thus, to have a stable ELM algorithm, one can use:
$a = A^{T} \left(\frac{I}{r} + A A^{T}\right)^{-1} Y$   (25)
with a corresponding output function of the ELM defined by
$h(x)\,a = h(x)\, A^{T} \left(\frac{I}{r} + A A^{T}\right)^{-1} Y$   (26)
Employing a kernel function was proposed if the hidden layer feature mapping h(x) is unknown (Huang et al., 2006).
A kernel matrix for ELM can be expressed as follows:
$\Omega_{ELM} = A A^{T}: \quad \Omega_{ELM_{i,j}} = h(x_i) \cdot h(x_j) = K(x_i, x_j)$   (27)
where K(xi, xj) is a kernel function. Now the output function can be written as:
$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_n) \end{bmatrix}^{T} \left(\frac{I}{r} + \Omega_{ELM}\right)^{-1} Y$   (28)
In the application of kernel-based ELM, there is no need to know the number of hidden nodes or the hidden layer feature mapping; instead, a kernel function corresponding to h(x) can be used. For modeling via the KELM method, the MATLAB and Python programming languages can be used.
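As a rough illustration of Eqs. (27), (28) (a minimal Python sketch with an RBF kernel and made-up data, not the implementation used in the cited studies), KELM training and prediction reduce to a single regularized linear solve:

import numpy as np

def rbf_kernel_matrix(A, B, g=1.0):
    # K(a, b) = exp(-|a - b|^2 / (2 g^2)) for all row pairs of A and B
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * g**2))

class KernelELM:
    def __init__(self, r=1000.0, g=0.5):
        self.r, self.g = r, g                                    # regularization 1/r and kernel width

    def fit(self, X, y):
        self.X_train = np.asarray(X, float)
        omega = rbf_kernel_matrix(self.X_train, self.X_train, self.g)                  # Eq. (27)
        self.beta = np.linalg.solve(np.eye(len(X)) / self.r + omega, np.asarray(y, float))
        return self

    def predict(self, X):
        k = rbf_kernel_matrix(np.asarray(X, float), self.X_train, self.g)
        return k @ self.beta                                                            # Eq. (28)

rng = np.random.default_rng(2)                                   # toy regression data
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
model = KernelELM().fit(X[:150], y[:150])
print("test RMSE:", np.sqrt(np.mean((model.predict(X[150:]) - y[150:]) ** 2)))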
5. Kernel types
Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit
feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner
products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than
the explicit computation of the coordinates (Theodoridis, 2008). This approach is called the “kernel trick.” According to
Fig. 6, any linear model can be turned into a non-linear model by applying the kernel trick to the model: replacing its features (predictors) by a kernel function.
FIG. 6 An illustration of the kernel method (https://commons.wikimedia.org/wiki/File:Kernel_Machine.png).
5.1 Fisher kernel
The Fisher kernel is a function that measures the similarity of two objects based on sets of measurements for each object and
a statistical model. In a classification procedure, the class for a new object (whose real class is unknown) can be estimated
by minimizing, across classes, an average of the Fisher kernel distance from the new object to each known member of the
given class. The Fisher kernel was introduced in 1998 ( Jaakkola et al., 1999). It combines the advantages of generative
statistical models (like the hidden Markov model) and those of discriminative methods (like support vector machines).
5.2 Graph kernels
The graph kernel is a kernel function that computes an inner product on graphs (Vishwanathan et al., 2010). Graph kernels
can be intuitively understood as functions measuring the similarity of pairs of graphs. They allow kernelized learning algorithms such as support vector machines to work directly on graphs, without having to do feature extraction to transform
them to fixed-length, real-valued feature vectors.
5.3 Kernel smoother
A kernel smoother is a statistical technique to estimate a real-valued function f: ℝᵖ → ℝ as the weighted average of neighboring observed data. The weight is defined by the kernel, such that closer points are given higher weights. The estimated
function is smooth, and the level of smoothness is set by a single parameter (Wand and Jones, 1994).
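A common concrete instance is the one-dimensional Nadaraya-Watson estimator; the short Python sketch below (Gaussian kernel, single bandwidth parameter, synthetic data) is only illustrative:

import numpy as np

def kernel_smoother(x_query, x_obs, y_obs, bandwidth=0.5):
    # Weighted average of the observed y, with Gaussian weights that decay with distance.
    w = np.exp(-0.5 * ((x_query[:, None] - x_obs[None, :]) / bandwidth) ** 2)
    return (w * y_obs).sum(axis=1) / w.sum(axis=1)

x_obs = np.linspace(0.0, 10.0, 50)
y_obs = np.sin(x_obs) + 0.2 * np.random.default_rng(3).standard_normal(50)
print(kernel_smoother(np.array([2.0, 5.0]), x_obs, y_obs, bandwidth=0.8))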
5.4 Polynomial kernel
The polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized
models, that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing learning of non-linear models. Intuitively, the polynomial kernel looks not only at the given features of input
samples to determine their similarity, but also combinations of these. In the context of regression analysis, such combinations are known as interaction features (Aboutalebi et al., 2015). Polynomial kernel function is described as:
$K(x_i, x_j) = (x_i \cdot x_j + c)^{d}$   (29)
where d is the degree and c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When c = 0, the kernel is called homogeneous.
5.5 Radial basis function kernel
In machine learning, the radial basis function (RBF) kernel is a popular kernel function used in various kernelized learning algorithms. The RBF kernel is evaluated on two samples xi and xj, represented as feature vectors in some input space. Satisfactory performance of the RBF kernel function has been reported in the literature (Seifi and Riahi, 2020; Roushangar et al., 2021). The RBF kernel function is described as:
$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2 g^2}\right)$   (30)
where g stands for the optimal width of the kernel function. Large values of g allow kernel-based approaches to have a strong impact over a large area.
5.6 Pearson kernel
Pearson VII universal kernel (PUK) is the other type of kernel function that can be used in kernel-based algorithms. The
Pearson VII kernel function of multi-dimensional input space is given by the following formula:
$K(x_i, x_j) = 1 \Big/ \left[1 + \left(\frac{2\,\|x_i - x_j\|\,\sqrt{2^{(1/\omega)} - 1}}{\sigma}\right)^{2}\right]^{\omega}$   (31)
where the parameters ω and σ control the half-width (also named the Pearson width) and the tailing factor of the peak.
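Because scikit-learn's kernel-based learners accept a callable that returns the Gram matrix, a kernel such as PUK can be plugged into, e.g., SVR as follows (a hedged Python sketch; the parameter values and data are illustrative, not from the cited studies):

import numpy as np
from sklearn.svm import SVR

def puk_kernel(A, B, omega=1.0, sigma=1.0):
    # Pearson VII universal kernel (Eq. 31) between all row pairs of A and B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    d = np.sqrt(np.maximum(d2, 0.0))
    return 1.0 / (1.0 + (2.0 * d * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2) ** omega

rng = np.random.default_rng(4)
X = rng.uniform(-2.0, 2.0, size=(100, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.1 * rng.standard_normal(100)

svr = SVR(kernel=lambda A, B: puk_kernel(A, B, omega=1.0, sigma=1.0), C=10.0)
svr.fit(X, y)
print(svr.predict(X[:3]))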
5.7 String kernels
The string kernel is a kernel function that operates on strings, i.e. finite sequences of symbols that need not be of the same
length. String kernels can be intuitively understood as functions measuring the similarity of pairs of strings: the more
similar two strings are, the higher the value of a string kernel will be. The equation is K(xi, xj) = tanh(a·xi·xj + c).
5.8 Neural tangent kernel
The neural tangent kernel (NTK) is a kernel which describes the evolution of deep artificial neural networks during their
training by gradient descent. It allows ANNs to be studied using theoretical tools from Kernel Methods. For most common
neural network architectures, in the limit of large layer width the NTK becomes constant. This enables simple closed form
statements to be made about neural network predictions, training dynamics, generalization, and loss surfaces.
6. Application of kernel-based approaches
Application areas of kernel methods are diverse and include geo-statistics, inverse distance weighting, 3D reconstruction,
bioinformatics, information extraction, handwriting recognition, and regression issues. Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. The applications of kernel-based approaches in regression
of water resource engineering problems have been widely considered by researchers. In the following parts, some studies
related to kernel-based approaches are presented.
6.1 Total resistance and form resistance of movable bed channels
In general, total roughness coefficient in open channels includes both grain resistance and bedform resistance. Due to the
non-linearity of the roughness coefficient, an accurate prediction of the bedform roughness is difficult. Saghebian et al.
(2020) investigated the potential of the GPR kernel-based approach in the total resistance and form resistance prediction
in alluvial channels. The simulations were done for four different data series obtained from experimental studies in different
laboratories. The obtained results proved the capability of GPR method in the modeling process. It was found that using
kernel function of Pearson (Eq. 31) led to better prediction accuracy (see Fig. 7).
6.2 Energy losses of rectangular and circular culverts
An application of GPR and SVM regression was discussed in Roushangar et al. (2019). The approach taken was the use of GPR and SVM for predicting the energy dissipation in rectangular and circular culverts. According to Fig. 8, two types of loss were considered: bend loss in rectangular culverts and entrance loss in circular culverts with different inlet end treatments. Various input combinations were developed and tested using experimental data. For selecting the best kernel functions, models were built via GPR and SVM using various kernels. As Fig. 9 shows, the RBF kernel function led to better prediction accuracy in comparison to the other kernels. The obtained results showed the desirable accuracy of the applied kernel-based approaches in energy dissipation modeling.
FIG. 7 Statistics parameters (R, DC, MAPE) via GPR kernel function types (Saghebian et al., 2020).
FIG. 8 Schematic view of the different states considered in the Roushangar et al. (2019) study. Scenario 1: bend loss in rectangular culverts (inputs: Fr, θ). Scenario 2: entrance loss in circular culverts (inputs: Re, Fr, Hw/D; output: Ke) with square-edged (vertical headwall), mitered (flush to a 1.5:1 horizontal-to-vertical fill slope), thin-wall projecting, and 45° beveled (vertical headwall) inlet end treatments.
FIG. 9 Statistics parameters (R, DC, RMSE) via SVM and GPR kernel function types for a testing set of the rectangular culvert (Roushangar et al., 2019).
6.3 Lake and reservoir water level prediction
Khan and Coulibaly (2006) examined the potential of the support vector machine (SVM) in long-term prediction of lake
water levels. Lake Erie mean monthly water levels from 1918 to 2001 were used to predict future water levels up to
12 months ahead. Here the optimization technique (linearly constrained quadratic programming function) used in the
SVM for parameter selection has made it superior to traditional neural networks. They found the RBF kernel to be the most appropriate and adopted it with a common width of g = 0.3 for all points in the data set. Further, they set the regularization parameter to C = 100 and used an ε-insensitive loss function with ε = 0.005, chosen by a trial-and-error approach.
80%–90% of the input data were identified to be support vectors in the model. The performance was compared with a
multilayer perceptron (MLP) and with a conventional multiplicative seasonal autoregressive model (SAR). Overall,
SVM showed good performance and was proved to be competitive with the MLP and SAR models. For a 3- to
12-month-ahead prediction, the SVM model outperformed the two other models based on the root-mean square error
and correlation coefficient performance criteria.
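As a hedged sketch (not the authors' code; the lagged water-level inputs below are only placeholders), the configuration described above roughly corresponds to the following scikit-learn SVR setup:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)                       # placeholder monthly water-level series
levels = 174.0 + np.cumsum(0.01 * rng.standard_normal(1008))

lag, lead = 12, 12                                   # previous 12 months -> level 12 months ahead
X = np.array([levels[i:i + lag] for i in range(len(levels) - lag - lead)])
y = levels[lag + lead:]

# RBF width g = 0.3 corresponds to gamma = 1/(2 g^2) under scikit-learn's exp(-gamma |x - x'|^2) convention;
# C = 100 and epsilon = 0.005 as reported by Khan and Coulibaly (2006).
svr = SVR(kernel="rbf", gamma=1.0 / (2.0 * 0.3 ** 2), C=100.0, epsilon=0.005)
svr.fit(X[:800], y[:800])
print("fraction of support vectors:", len(svr.support_) / 800)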
6.4 Streamflow forecasting
Jian et al. (2006) noticed the importance of accurate time- and site-specific forecasts of streamflow and reservoir inflow for
effective hydropower reservoir management and scheduling. They used monthly flow data of the Manwan Reservoir
spanning over a time period from January 1974 to December 2003; the data set from January 1974 to December 1998 were
used for training and the data sets from January 1999 to December 2003 served for validation. The SVM model gave a good
prediction performance when compared with those of ARMA and ANN models. The authors point out SVMs’ distinct
capability and advantages in identifying hydrological time series comprising nonlinear characteristics and its potential
in the prediction of long-term discharges.
6.5 Sediment load prediction
As the sediment transport attracts great attention on environmental issues and water resources planning, the need for and use
of robust learning approaches such as kernel-based techniques has become more apparent. In an investigation into the application of SVM, GPR and KELM on sediment load prediction, Roushangar and Shahnazi (2020a) used a large number of
measurements and related information for increasing the prediction level of total sediment load. They employed the records
of stream flow and transported sediments of 19 gravel-bed rivers from 1980 to 2002, including 890 samples of transported bed and suspended loads. Different input combinations were evaluated, based on hydraulic characteristics and sediment features. The implementation of the KELM technique with a smaller number of input variables provided very good outcomes.
Moreover, the obtained results indicated great performance of KELM with minimum complexity of the model at the same
time. In addition, the results of this study showed that compared to SVM, RVM produces a much sparser solution, requiring
only 20 relevance vectors in comparison to 151 support vectors by SVM out of a total of 154 training data to create the
model, but may produce local minima because of the use of an expectation maximization-based learning approach.
6.6 Pier scour modeling
Scouring around the piers is the major cause of the bridge failures and accurate prediction of equilibrium scour depth is one
of the main concerns in the hydraulic design of bridge. Where empirical formulas are insufficient in providing persistent
success due to complexity and uncertainty of the phenomenon, Pal et al. (2014) introduced kernel-based methods as reliable
tools that provide solutions for prediction of pier scour depth. A total of 232 field data points were used to feed the employed
relevance vector machines (RVM), GPR and KELM methods. It was found that the employed kernel-based methods had superior performance compared with the empirical approaches. The models derived from GPR, and to some extent RVM, are capable of generalizing well, better than the KELM method.
6.7 Reservoir evaporation prediction
The water loss through evaporation is a significant component in the planning and management of water resources. Sebbar et al. (2020) utilized KELM to model monthly evaporation from Algerian dam reservoirs and reported the effectiveness of the proposed KELM tool for the prediction of evaporation across large climatic zones. The prediction process was carried out through three scenarios, and the generalization capability of different kernel functions as core tools of the KELM method was investigated. The results revealed that polynomial and RBF kernel functions achieved better performance. Further investigations showed that a hybrid SVM model with discrete wavelet transform can enhance the prediction accuracy and surpass the proposed KELM method in terms of reservoir evaporation prediction.
7. Conclusions
In this chapter the principles of several kernel-based approaches are discussed and it has been shown that they provide an
approach for feature classification and regression problems. Kernel methods give a systematic and principled approach to
training learning machines. These methods can be used to generate many possible learning machine architectures (RBF
networks, feedforward neural networks) through an appropriate choice of kernel. In particular these approaches are
properly motivated theoretically and systematic in execution. Kernel functions enable the kernel methods to operate in
a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space. By applying
the kernel trick to the model any linear model can be turned into a non-linear model. The application of kernel methods
related to water resource engineering problems was demonstrated. It was shown that kernel methods have been successfully applied for accurate prediction in water resource engineering problems. However, the appropriate selection of the kernel type is the most important step in these models due to its direct impact on the training and classification process. Also, these methods are memory intensive and trickier to tune due to the importance of picking the right kernel, and their application to larger datasets should be tested to determine the merits of the kernel methods.
References
Aboutalebi, M., Bozorg Haddad, O., Loaiciga, H.A., 2015. Optimal monthly reservoir operation rules for hydropower generation derived with
SVR-NSGAII. J. Water Resour. Plan. Manag. 141 (11), 04015029.
Akbari, M., Salmasi, F., Arvanaghi, H., Karbasi, M., Farsadizadeh, D., 2019. Application of Gaussian process regression model to predict discharge coefficient of gated piano key weir. Water Resour. Manag. 33 (11), 3929–3947.
Cristianini, N., Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University
Press, Cambridge, UK.
Deka, P.C., 2014. Support vector machine applications in the field of hydrology: a review. Appl. Soft Comput. 19, 372–386.
Duvenaud, D., 2014. Automatic Model Construction With Gaussian Processes (Doctoral Dissertation). University of Cambridge, UK.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501.
Huang, G.B., Zhou, H., Ding, X., Zhang, R., 2011. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern.
B Cybern. 42 (2), 513–529.
Jaakkola, T.S., Diekhans, M., Haussler, D., 1999. Using the Fisher kernel method to detect remote protein homologies. Proc. Int. Conf. Intell. Syst. Mol.
Biol. 99, 149–158.
Jaiswal, A., Goel, A., 2020. Evaluation of aeration efficiency of triangular weirs by using Gaussian process and M5P approaches. In: Advanced Engineering Optimization Through Intelligent Techniques. Springer, Singapore, pp. 749–756.
Jian, Y.L., Chun, T.C., Kwok, W.C., 2006. Using support vector machines for long term discharge prediction. Hydrol. Sci. J. 51 (4), 599–612.
Khan, S.M., Coulibaly, P., 2006. Application of support vector machine in lake waterlevel prediction. J. Hydraul. Eng. ASCE 11, 199–205.
Li, Y., Shi, H., Liu, H., 2020. A hybrid model for river water level forecasting: cases of Xiangjiang River and Yuanjiang River, China. J. Hydrol. 587,
124934.
Melo, J., 2012. Gaussian processes for regression: a tutorial (Technical Report). University of Porto, Portugal.
Pal, M., Singh, N.K., Tiwari, N.K., 2014. Kernel methods for pier scour modeling using field data. J. Hydroinf. 16 (4), 784–796.
Perez-Cruz, F., Van Vaerenbergh, S., Murillo-Fuentes, J.J., Lázaro-Gredilla, M., Santamaria, I., 2013. Gaussian processes for nonlinear signal processing:
an overview of recent advances. IEEE Signal Process. Mag. 30 (4), 40–50.
Raghavendra, N.S., Deka, P.C., 2016. Multistep ahead groundwater level time-series forecasting using Gaussian process regression and ANFIS. In:
Advanced Computing and Systems for Security. Springer, New Delhi, India, pp. 289–302.
Rasmussen, C.E., Williams, C.K., 2006. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, USA.
Roushangar, K., Ghasempour, R., 2017. Prediction of non-cohesive sediment transport in circular channels in deposition and limit of deposition states
using SVM. Water Sci. Technol. Water Supply 17 (2), 537–551.
Roushangar, K., Shahnazi, S., 2019. Bed load prediction in gravel-bed rivers using wavelet kernel extreme learning machine and meta-heuristic methods.
Int. J. Environ. Sci. Technol. 16 (12), 8197–8208.
Roushangar, K., Shahnazi, S., 2020a. Determination of influential parameters for prediction of total sediment loads in mountain rivers using kernel-based
approaches. J. Mt. Sci. 17 (2), 480–491.
Roushangar, K., Shahnazi, S., 2020b. Prediction of sediment transport rates in gravel-bed rivers using Gaussian process regression. J. Hydroinf. 22 (2),
249–262.
Roushangar, K., Garekhani, S., Alizadeh, F., 2016. Forecasting daily seepage discharge of an earth dam using wavelet–mutual information–Gaussian
process regression approaches. Geotech. Geol. Eng. 34 (5), 1313–1326.
Roushangar, K., Matin, G.N., Ghasempour, R., Saghebian, S.M., 2019. Evaluation of the effective parameters on energy losses of rectangular and circular
culverts via kernel-based approaches. J. Hydroinf. 21 (6), 1014–1029.
Roushangar, K., Ghasempour, R., Biukaghazadeh, S., 2020. Evaluation of the parameters affecting the roughness coefficient of sewer pipes with rigid and
loose boundary conditions via kernel-based approaches. Int. J. Sediment Res. 35 (2), 171–179.
Roushangar, K., Majedi Asl, M., Shahnazi, S., 2021. Hydraulic performance of PK weirs based on experimental study and kernel-based modeling. Water
Resour. Manag. 35 (11), 3571–3592.
Saghebian, S.M., Roushangar, K., Ozgur Kirca, V.S., Ghasempour, R., 2020. Modeling total resistance and form resistance of movable bed channels via
experimental data and a kernel-based approach. J. Hydroinf. 22 (3), 528–540.
Seifi, A., Riahi, H., 2020. Estimating daily reference evapotranspiration using hybrid gamma test-least square support vector machine, gamma test-ANN,
and gamma test-ANFIS models in an arid area of Iran. J. Water Clim. Chang. 11 (1), 217–240.
Seifi, A., Ehteram, M., Singh, V.P., Mosavi, A., 2020. Modeling and uncertainty analysis of groundwater level using six evolutionary optimization algorithms hybridized with ANFIS, SVM, and ANN. Sustainability 12 (10), 4023.
Shamshirband, S., Mohammadi, K., Chen, H.L., Samy, G.N., Petkovic, D., Ma, C., 2015. Daily global solar radiation prediction from air temperatures
using kernel extreme learning machine: a case study for Iran. J. Atmos. Sol. Terr. Phys. 134, 109–117.
Sharifi Garmdareh, E., Vafakhah, M., Eslamian, S., 2018. Regional flood frequency analysis using support vector regression in the arid and semi-arid
regions of Iran. Hydrol. Sci. J. 63 (3), 426–440.
Smola, A.J., 1996. Regression Estimation With Support Vector Learning Machines (Master's Thesis). Technische Universität München, Germany.
Sun, A.Y., Wang, D., Xu, X., 2014. Monthly streamflow forecasting using Gaussian process regression. J. Hydrol. 511, 72–81.
Tezel, G., Buyukyildiz, M., 2016. Monthly evaporation forecasting using artificial neural networks and support vector machines. Theor. Appl. Climatol.
124 (1–2), 69–80.
Theodoridis, S., 2008. Pattern Recognition. Elsevier B.V, p. 203. ISBN 9780080949123.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA, pp. 1–47.
Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M., 2010. Graph kernels. J. Mach. Learn. Res. 11, 1201–1242.
Wand, M.P., Jones, M.C., 1994. Kernel Smoothing. CRC Press.
Yang, Y., Li, J., Yang, Y., 2015, December. The research of the fast SVM classifier method. In: 2015 12th International Computer Conference on Wavelet
Active Media Technology and Information Processing (ICCWAMTIP). IEEE, pp. 121–124.
Zhu, S., Luo, X., Xu, Z., Ye, L., 2019. Seasonal streamflow forecasts using mixture-kernel GPR and advanced methods of input variable selection. Hydrol.
Res. 50 (1), 200–214.
Zhuang, J., Tsang, I.W., Hoi, S.C., 2011. A family of simple non-parametric kernel learning algorithms. J. Mach. Learn. Res. 12, 1313–1347.
Further reading
Sebbar, A., Heddam, S., Djemili, L., 2020. Kernel extreme learning machines (KELM): a new approach for modeling monthly evaporation (EP) from dams
reservoirs. Phys. Geogr., 1–23.
Chapter 17
Large eddy simulation: Subgrid-scale
modeling with neural network
Tamas Karches
Faculty of Water Science, University of Public Service, Budapest, Hungary
1. Introduction
Knowledge of the turbulent fluid flow behavior is essential in many hydroengineering applications; the spectrum is wide
from natural or constructed open channel flows, subsurface flows to water distribution networks, sewage collection systems
and flows through (waste)water treatment technologies. The various engineering goals require different levels of understanding of the flow structure, e.g., pipe flows generally allow dimensional simplifications, whereas the analysis of river bed change shall include the description of the fate of multiple species. Many efforts have been made to describe and resolve
the hydrodynamic systems, but due to the complexity of the system, the computational cost could be a barrier.
Turbulent flow can be characterized as a chaotic, dissipative, 3D unsteady flow which has intermittency, and its spatial and temporal distribution depends on upstream conditions. It increases the shear stress via eddy viscosity and, as turbulent diffusion develops, the homogeneity of passive scalars also increases. For resolving the turbulent fluid flow, the
starting point is the conservation equations for the continuity, momentum and scalar variables (species concentration or
energy/enthalpy). Direct numerical simulation (DNS) aims at resolving the entire range of spatial and temporal scales, including the largest integral scale and the smallest dissipative scales. DNS is a flagship computational approach in the understanding of turbulence (Lee et al., 2014), and it facilitates the improvement of turbulence models either as an a priori test, where DNS provides the input (Toutant et al., 2008), or as an a posteriori test for comparing a model with the DNS results (Bou-Zeid, 2015). Reynolds-averaged Navier-Stokes (RANS) modeling is based on the separation of the time-averaged and fluctuation terms, which produces an apparent stress, and a turbulence model is required for closing the RANS equations. Most engineering applications utilize the unsteady RANS approach because of its lower computational cost, but in several cases, such as transitional flows or flows dominated by large-scale structures, it is not sufficient to describe the flow characteristics.
Large eddy simulation utilizes a low-pass filter of the Navier-Stokes equations, which reduces the number of degrees of freedom; the small scales are time- and space-averaged and modeled, while the large scales are resolved as in DNS. The small scale is also called the subgrid scale (SGS). At the dissipative scale the eddies are isotropic, homogeneous, and universal, whereas at large scales the eddy characteristics are anisotropic, depend on the boundaries, and are subject to history effects.
The connection between the large and small scales is given by the energy cascade. RANS first averages and then computes the flow field, whereas it is the opposite in LES: the flow field is first computed and then the averages are made. In LES practice the role of the numerical errors is often underestimated, although they could strongly influence the prediction (Dairay et al., 2017). Explicit LES filtering reduces the numerical discretization error as the retained motion is well resolved (Moin, 1998), but when practicality and ease of numerical computation are the main concerns, the use of implicit LES could be
proposed since it leads to a nonoscillatory solution (Grinstein et al., 2007).
The main goal of the SGS model is to ensure that the energy dissipation in the LES is the same as in the fully resolved
energy cascade obtained with DNS. The SGS model also needs to incorporate the local, instantaneous energy transfer. A trade-off between SGS model improvement and mesh refinement is a key element in LES, since it determines the computational cost (Vollant et al., 2017). In this chapter, traditional SGS modeling is presented first, then the applicability of neural networks is detailed, followed by recommendations for good modeling practice in this field.
2. LES and traditional subgrid-scale modeling
One categorization of SGS models is based on its role in flow field reconstruction; the nature of SGS can be functional or
structural. Functional SGS focuses on the appropriate energy dissipation rate by adjusting artificial eddy viscosity, whereas
structural approaches try to incorporate the energy transfer rate between the scales into their calculation. Some hybrid SGS
models exist in conventional explicit LES, but implicit LES formulation could overcome the computational difficulties.
The Smagorinsky model mimics the Prandtl mixing length model applied in RANS, assuming a local equilibrium between the production of SGS kinetic energy and its dissipation. In the eddy viscosity calculation a constant is introduced, which assumes full isotropy, but in near-wall regions turbulence anisotropy may be present. In addition, the Smagorinsky constant has to be adjusted depending on the Reynolds number or the discretization scheme.
The dynamic model adapts the Smagorinsky constant locally and introduces a test filter near the cut-off scale, allowing automatic damping of the constant near wall zones; it vanishes in laminar flows and allows backscatter (Germano et al., 1991). The one-equation dynamic subgrid-scale model generally applies a transport equation for the turbulence kinetic energy (k), which can lead to a robust solution (Kajishima and Nomachi, 2006). The local dynamic k-based model (LDKM) has demonstrated its capability of correctly capturing the flow behavior near solid walls without any adjustments of the model (Kim and Menon, 1997). Another dynamic approach is the Lagrangian one, which accumulates the required averages over flow pathlines rather than over directions of statistical homogeneity (Meneveau et al., 1996).
Scale-similarity models assume that the most active SGS scales are close to the cut-off and interact with the zone right above the cut-off (Bardina et al., 1980); a later study justified that the scale-similarity terms modeled in the SGS could be included in the LES equation (Bensow and Fureby, 2007). The filtered density function proposed by Pope (1991), originating from the probability density function method, includes complete statistical information on the flow field variables and is thus a valuable approach for subgrid closure, especially in cases with multispecies transport and/or reacting flows (Drozda et al., 2007).
Now that some traditional SGS model directions have been outlined, it can be stated that a colorful and wide spectrum of mathematical tools is available for constructing a deterministic closure. There is no royal road to achieve our goals, because various flow regimes and numerous cases differing in boundaries and geometries exist. In parallel with the advancement of the above-mentioned methods, extensive research applying data-driven algorithms began in the early 2000s. Sarghini et al. (2003) indicated the course by using a multilayer feed-forward neural network for SGS modeling in LES. The authors admitted that some key features of neural networks should be addressed, like the spatial scaling of the input signals and the data set (abundance and type of data) used for training, but since then many achievements have been made in this field and some of them are summarized in the next section.
3. Data-driven LES closures
Soft computing deals with approximate models that could solve real-life engineering problems. Deep learning techniques
do not search for the physical background or the structural relationship between the input and output, but learn from historical data; in other words, they are trained from samples and in return give predictions. Unlike hard computing, these approaches are tolerant of uncertainty, approximation, and imprecision (Ibrahim, 2016).
In hydromechanics, where clearly defined deterministic models are available, the role of data-driven approaches like deep learning is frequently discussed nowadays. It is evident that in subproblems where uncertain closure equations are applied, well-trained neural networks could perform better than traditional models (Lapeyre et al., 2019).
Scalar flux modeling with dynamic approaches based on the Clark model (Clark et al., 1979) proved to be an efficient tool to reproduce the local SGS terms, applying an optimal estimator to determine the most accurate set of input parameters (Fabre and Balarac, 2011). The scalar flux divergence is present in the filtered passive scalar advection-diffusion equation and is the term to be modeled. Functional SGS modeling aims to determine an eddy diffusivity (analogous to the apparent eddy viscosity in the momentum equation). Structural SGS conditions could be met by Taylor series expansion and a dynamic nonlinear extension of the eddy viscosity (Vollant et al., 2017). A subset of the Taylor series expansion models is the gradient model, where the resolved scalar gradient can be decomposed into three elements: compressional, stretching, and rotational effects. From this tier of elements, the compressional part results in forward transfer, stretching induces backscatter of the scalar variable, whereas the rotational part has no transfer across scales (Balarac et al., 2013). Physics-informed neural networks have appeared as a novel tool in scalar flux modeling, since real-life applications require incorporating the case-specific behavior of the flow and the existing constraints. Frezat et al. (2021) stated that their transformation-invariant neural network outperformed both data-driven and parametric state-of-the-art SGS models.
The neural network training procedure aims to minimize irreducible errors via a large set of uncorrelated parameters. In the case of direct reconstruction of unresolved scalar sources from mesh-resolved quantities, there is no need for explicit filtering of the LES or for solving an additional transport equation (Seltz et al., 2019). Performance issues may arise if the training is carried out at low Reynolds number; therefore, Prat et al. (2020) suggest that the ANN should be trained at high Reynolds number, since extrapolating to lower values performs better and the correlations remain high.
Maulik et al. (2019) offered an approach which is both a priori and a posteriori, because they performed tests on 2D decaying turbulence in parallel with learning the SGS through the probability density function, supplemented with a hyperparameter optimization analysis. A neural network could decide whether a given point in the flow domain requires a turbulence closure or not using DNS data. If the analysis of the subgrid statistics gives an indication that dissipation is required, an upwind scheme is applied; however, in the case of no subgrid closure, a symmetric and second-order accurate Arakawa scheme is utilized (Maulik et al., 2020).
Wall modeling with LES needs to account for both the small scales in the near-wall zone and the large scales above the wall. This challenge could be addressed by describing the physics (vertically integrated thin-boundary-layer equation and eddy population density scaling) to complement the neural network training data (Yang et al., 2019).
4. Guidelines for SGS modeling
A unified modeling practice does not exist for neural network based SGS modeling, but the recent advancements in this field suggest that some structural guidelines for the modeling process should be presented. The suggestions are proposed based on state-of-the-art techniques, but they should be handled as a first step, which needs refinement or may be subject to profound changes later.
The modeling process can be divided into four stages: project definition, a priori analysis with DNS, neural network based SGS construction, and performing the LES simulations (Fig. 1). The subtasks in each step are detailed as follows.
4.1 Simulation project definition
In every modeling procedure, a clear statement of the aims and goals is the basis. Goals should be well defined, formulated briefly, and shall avoid too general statements. As an example, the following phrasing is not enough: "our aim is to analyze near-wall flows in a T-bend" or "the goal of the project is to determine coherent structures in a shearing zone." Instead, we should use: "our aim is to calculate the unsteady velocity profile assuming a steady velocity inlet boundary, focusing on near-wall zones in a T-bend" or "the goal of the project is to determine isosurfaces of Q in a shearing zone." Generally speaking, the goal should reflect the concrete output of the model.

FIG. 1 Simulation procedure flow chart for neural network aided LES simulation.
After the problem statement, the model scope and limitations should be outlined: what are the premises, the boundaries, and the untouched areas. An isotropic turbulence model could be appropriate for pollutant transport calculations via turbulent diffusion, but may not capture all the necessary details in separated or rotational flows. At this stage of the modeling, the simulation workflow with all the alternatives and related costs (time, hardware) and the data management strategies should be taken into account. Data management includes data acquisition (literature, measurement, premodeled data) and data reconciliation (sanity and plausibility checks), and each data set shall be linked to the respective modeling step (e.g. model parameters, training data, test data, validation data of the genetic algorithm). Since more data will be generated during the modeling process, a strategy for data processing should be defined. For example, the DNS output data will be stored and used for flow field visualization, but in parallel the data will be filtered in order to reconstruct the coarse grid model. As a last step the deliverables should be stated.
4.2 A priori analysis with DNS
Generally, DNS is not embedded in commercial fluid dynamics software codes, but in-house solutions are available, each built for a specific flow type. Development of such algorithms focuses mainly on the computational cost. As for the numerical considerations, spectral methods (Gottlieb and Orszag, 1977) or high-order finite difference methods can be applied efficiently. Temporal discretization uses both explicit and implicit schemes (Coleman and Sandberg, 2010). Since turbulence is a three-dimensional unsteady flow, the history of each time-dependent variable shall be known at the boundaries. Depending on the flow type, the initial conditions could have a significant effect on the result, and various inflow turbulence generation methods are detailed in the literature (Wu, 2017); however, in the case of stationary flows, the initial conditions are less important. Further reading and guidelines for performing and validating DNS simulations can be found in the work of Sandham (2005).
As the DNS simulation procedure finishes, the next step is to apply a filter as in LES and produce coarse grid data. The field gained by the DNS filtration is the so-called perfect LES (Beck et al., 2019). As a next step, the data shall be separated into training data, validation data and testing data. The training set is used to optimize the weights, the validation data are used to evaluate the training errors and predict the point of overfitting, and the testing data serve as an a posteriori check. Training and validation data could be gained from similar sets of DNS, whereas testing data can be produced by a sudden change in the inlet boundary for a short interval, as proposed in the study of Lapeyre et al. (2019).
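A minimal Python sketch of this split (placeholder arrays standing in for the filtered DNS samples and the exact SGS targets; the 70/15/15 ratio is an assumption, not a recommendation from the cited studies):

import numpy as np

rng = np.random.default_rng(6)
features = rng.standard_normal((10000, 9))   # e.g., filtered velocity gradients per grid point
targets = rng.standard_normal((10000, 1))    # e.g., exact subgrid term computed from filtered DNS

idx = rng.permutation(len(features))         # shuffle, then split 70/15/15
n_train, n_val = int(0.7 * len(idx)), int(0.15 * len(idx))
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
X_train, y_train = features[train_idx], targets[train_idx]
X_val, y_val = features[val_idx], targets[val_idx]
X_test, y_test = features[test_idx], targets[test_idx]
print(len(X_train), len(X_val), len(X_test))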
4.3 Neural network based SGS model construction
The basic elements of deep learning are interconnected nodes. Between the input and output layers there are several hidden layers. The process starts from the input and, by applying activation functions (e.g., sigmoid, threshold, rectified linear unit, hyperbolic tangent, etc.), produces the outputs. The information from the nodes is weighted, and the weight factors are determined during the training phase.
Many deep learning algorithms have been developed; some examples are deep belief networks, radial basis function networks, recurrent neural networks, generative adversarial networks, multilayer perceptrons, and convolutional neural networks. The latter two are commonly used for SGS modeling (Beck et al., 2019; Ling et al., 2016), and a tensor basis neural network appears as a specific architecture in the work of Ling et al. (2016).
The multilayer perceptron is a feed-forward neural network with multiple layers of perceptrons, each having an activation function. Training of the node weights aims to achieve the lowest mean squared error between the predicted and actual target values. The derivatives of the mean squared error are calculated by the back-propagation method, and once these are available, the weights are updated. The minibatch gradient method (Ioffe and Szegedy, 2015) is a popular approach, where the squared error is monitored during the process, which could help to avoid overfitting. Overfitting of a neural network could be prevented by training the network on more inputs or by changing the network complexity (number of weights, value of weights). It has to be mentioned that a recent study revealed that a spatially multiscale ANN model performed better than the gradient model (Xie et al., 2020).
Convolutional neural networks overcome the shortcomings of the multilayer perceptron technique; they are mainly used in image processing or in object detection. The architecture can be divided into various layers: the convolution layer uses filters, then the segments are rectified through the so-called rectified linear unit. The pooling layer flattens the arrays, producing single linear vectors, which are the inputs for the fully connected layers. In a convolutional layer the neurons get data only from their receptive field.
When applying a neural network, the modeler cannot follow the traditional CFD validation procedure, where grid independence and, in unsteady simulations, time independence are evaluated (Eslamian et al., 2012). Neural network cross validation helps to detect overfitting and to ensure the reliability of the predicted values (as in a convergence study of numerical schemes). K-fold cross validation means that the data set is divided into k subsets, from which one subset is retained for validation and the remaining k − 1 are used for training. The procedure is repeated k times, so that each data subset is used exactly once for validation.
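A short Python sketch of k-fold cross validation for an MLP-type SGS surrogate (synthetic placeholder data; the network size and k = 5 are illustrative assumptions, not recommendations from the cited studies):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)                        # placeholder features/targets from filtered DNS
X = rng.standard_normal((2000, 9))
y = np.tanh(X @ rng.standard_normal(9)) + 0.05 * rng.standard_normal(2000)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kfold.split(X):             # each subset is held out exactly once
    mlp = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                       max_iter=500, random_state=0)
    mlp.fit(X[train_idx], y[train_idx])
    scores.append(mlp.score(X[val_idx], y[val_idx]))  # R^2 on the held-out fold
print("fold R^2:", np.round(scores, 3))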
Even if the output of the neural network is a good approximation of the closure term, it cannot be expected to have long-term stability (Beck et al., 2019). As the direct construction of closure terms is not feasible, Beck et al. (2019) proposed two alternatives. One applies the neural network as a direct predictive tool with a dissipative model; the other is a neural network informed eddy viscosity model, described in more detail in the authors' paper. Both approaches are effective in managing stability issues, but the latter one showed better agreement with the reference. As a last step, if all the pieces are together, the LES simulation can be performed.
5. Conclusions
In this study, a brief overview of the data-driven SGS modeling applied in large eddy simulations was shown. After an
insight into the traditional SGS reconstruction approaches was given, the focus was set on artificial neural network based
modeling approaches. Some recent papers in this field were cited to show the variety of approaches, followed by a simplified guidance for SGS modeling. The author's aim was not a full description of each modeling practice; rather, the intention was to give a starting point which may lead to a unified modeling protocol. Many challenges lie ahead of us, as Kutz (2017) outlined in his paper: how many nodes and layers should be applied, what is the minimum size of data set that is enough for proper training, and what are the uncertainties in the output? How can overfitting be prevented? Can one predict data outside the training data? In addition to these general questions, what is the suggested architecture for SGS modeling? How can the stability of the neural-network-constructed SGS be maintained? As more robust and advanced tools are developed, new questions could be added to the above-mentioned list. Many fruitful directions are ongoing, e.g., the deconvolutional method proposed by Yuan et al. (2020), or the idea of the perfect LES (Beck et al., 2019); the next step is to evaluate and select the most appropriate (if there is any) modeling procedure.
References
Balarac, G., Le Sommer, J., Meunier, X., Vollant, A., 2013. A dynamic regularized gradient model of the subgrid-scale scalar flux for large eddy simulations. Phys. Fluids 25 (7), 075107. https://doi.org/10.1063/1.4813812.
Bardina, J., Ferziger, J.H., Reynolds, W.C., 1980. Improved subgrid scale models for large eddy simulations. In: 13th Fluid and Plasmadynamics Conference 1357., https://doi.org/10.2514/6.1980-1357.
Beck, A., Flad, D., Munz, C.D., 2019. Deep neural networks for data-driven LES closure models. J. Comput. Phys. 398, 108910. https://doi.org/10.1016/j.
jcp.2019.108910.
Bensow, R.E., Fureby, C., 2007. On the justification and extension of mixed models in LES. J. Turbul. 8, N54. https://doi.org/10.1080/146852
40701742335.
Bou-Zeid, E., 2015. Challenging the large eddy simulation technique with advanced a posteriori tests. J. Fluid Mech. 764, 1–4. https://doi.org/10.1017/
jfm.2014.616.
Clark, R.A., Ferziger, J.H., Reynolds, W.C., 1979. Evaluation of subgrid-scale models using an accurately simulated turbulent flow. J. Fluid Mech. 91 (1),
1–16. https://doi.org/10.1017/S002211207900001X.
Coleman, G.N., Sandberg, R.D., 2010. A primer on direct numerical simulation of turbulence-methods, procedures and guidelines. Technical Report AFM09/01a https://eprints.soton.ac.uk/66182/1/A_primer_on_DNS.pdf. (Accessed 14 March .2021).
Dairay, T., Lamballais, E., Laizet, S., Vassilicos, J.C., 2017. Numerical dissipation vs. subgrid-scale modelling for large eddy simulation. J. Comput. Phys.
337, 252–274. https://doi.org/10.1016/j.jcp.2017.02.035.
Drozda, T.G., Sheikhi, M.R.H., Madnia, C.K., Givi, P., 2007. Developments in formulation and application of the filtered density function. Flow Turbul.
Combust. 78 (1), 35–67. https://doi.org/10.1007/s10494-006-9052-4.
Eslamian, S., Abedi-Koupai, J., Zareian, M.J., 2012. Measurement and modelling of the water requirement of some greenhouse crops with artificial neural
networks and genetic algorithm. Int. J. Hydrol. Sci. Technol. 2 (3), 237–251.
Fabre, Y., Balarac, G., 2011. Development of a new dynamic procedure for the Clark model of the subgrid-scale scalar flux using the concept of optimal
estimator. Phys. Fluids 23 (11), 115103. https://doi.org/10.1063/1.3657090.
Frezat, H., Balarac, G., Le Sommer, J., Fablet, R., Lguensat, R., 2021. Physical invariance in neural networks for subgrid-scale scalar flux modelling. Phys.
Rev. Fluids 6 (2), 024607. https://doi.org/10.1103/PhysRevFluids.6.024607.
Germano, M., Piomelli, U., Moin, P., Cabot, W.H., 1991. A dynamic subgrid-scale eddy viscosity model. Phys. Fluids A Fluid Dynam. 3 (7), 1760–
1765. https://doi.org/10.1063/1.857955.
Gottlieb, D., Orszag, S.A., 1977. Numerical Analysis of Spectral Methods: Theory and Applications. CBMS-NSF Regional Conference Series in Applied
Mathematics, Series Number 26, Society for Industrial and Applied Mathematics.
Grinstein, F.F., Margolin, L.G., Rider, W.J., 2007. A rationale for implicit LES. In: Implicit Large-Eddy Simulation: Computing Turbulent Flow
Dynamics. Cambridge University Press, New York, USA, pp. 39–59.
Ibrahim, D., 2016. An overview of soft computing. Procedia Comput. Sci. 102, 34–38. https://doi.org/10.1016/j.procs.2016.09.366.
Ioffe, S., Szegedy, C., 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on
Machine Learning, pp. 448–456, https://doi.org/10.48550/arXiv.1502.03167.
Kajishima, T., Nomachi, T., 2006. One-equation subgrid scale model using dynamic procedure for the energy production. J. Appl. Mech. 73 (3), 368–
373. https://doi.org/10.1115/1.2164509.
Kim, W.W., Menon, S., 1997. Application of the localized dynamic subgrid-scale model to turbulent wall-bounded flows. In: 35th Aerospace Sciences
Meeting and Exhibit 210., https://doi.org/10.2514/6.1997-210.
Kutz, J.N., 2017. Deep learning in fluid dynamics. J. Fluid Mech. 814, 1–4. https://doi.org/10.1017/jfm.2016.803.
Lapeyre, C.J., Misdariis, A., Cazard, N., Veynante, D., Poinsot, T., 2019. Training convolutional neural networks to estimate turbulent sub-grid scale
reaction rates. Combust. Flame 203, 255–264. https://doi.org/10.1016/j.combustflame.2019.02.019.
Lee, M., Ulerich, R., Malaya, N., Moser, R.D., 2014. Experiences from leadership computing in simulations of turbulent fluid flows. Comput. Sci. Eng. 16
(5), 24–31. https://doi.org/10.1109/MCSE.2014.51.
Ling, J., Kurzawski, A., Templeton, J., 2016. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid
Mech. 807, 155–166. https://doi.org/10.1017/jfm.2016.615.
Maulik, R., San, O., Rasheed, A., Vedula, P., 2019. Subgrid modelling for two-dimensional turbulence using neural networks. J. Fluid Mech. 858, 122–
144. https://doi.org/10.1017/jfm.2018.770.
Maulik, R., San, O., Jacob, J.D., 2020. Spatiotemporally dynamic implicit large eddy simulation using machine learning classifiers. Phys. D Nonlinear
Phenom. 406, 132409. https://doi.org/10.1016/j.physd.2020.132409.
Meneveau, C., Lund, T.S., Cabot, W.H., 1996. A Lagrangian dynamic subgrid-scale model of turbulence. J. Fluid Mech. 319, 353–385.
Moin, P., 1998. Numerical and physical issues in large eddy simulation of turbulent flows. JSME Int. J. Ser. B Fluids Thermal Eng. 41 (2), 454–463. https://
doi.org/10.1299/jsmeb.41.454.
Pope, S.B., 1991. Computations of turbulent combustion: progress and challenges. In: Symposium of Combustion 23, pp. 591–612.
Prat, A., Sautory, T., Navarro-Martinez, S., 2020. A priori sub-grid modelling using artificial neural networks. Int. J. Comput. Fluid Dynam. 34 (6), 397–
417. https://doi.org/10.1080/10618562.2020.1789116.
Sandham, N.D., 2005. Turbulence simulation. In: Prediction of Turbulent Flows. Cambridge University Press, pp. 207–235.
Sarghini, F., De Felice, G., Santini, S., 2003. Neural networks based subgrid scale modeling in large eddy simulations. Comput. Fluids 32 (1), 97–
108. https://doi.org/10.1016/S0045-7930(01)00098-6.
Seltz, A., Domingo, P., Vervisch, L., Nikolaou, Z.M., 2019. Direct mapping from LES resolved scales to filtered-flame generated manifolds using convolutional neural networks. Combust. Flame 210, 71–82. https://doi.org/10.1016/j.combustflame.2019.08.014.
Toutant, A., Labourasse, E., Lebaigue, O., Simonin, O., 2008. DNS of the interaction between a deformable buoyant bubble and a spatially decaying
turbulence: a priori tests for LES two-phase flow modelling. Comput. Fluids 37 (7), 877–886. https://doi.org/10.1016/j.compfluid.2007.03.019.
Vollant, A., Balarac, G., Corre, C.E., 2017. Subgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures.
J. Turbul. 18 (9), 854–878. https://doi.org/10.1080/14685248.2017.1334907. Taylor and Francis.
Wu, X., 2017. Inflow turbulence generation methods. Annu. Rev. Fluid Mech. 49, 23–49. https://doi.org/10.1146/annurev-fluid-010816-060322.
Xie, C., Wang, J., Li, H., Wan, M., Chen, S., 2020. Spatially multi-scale artificial neural network model for large eddy simulation of compressible isotropic
turbulence. AIP Adv. 10 (1), 015044. https://doi.org/10.1063/1.5138681.
Yang, X.I.A., Zafar, S., Wang, J.X., Xiao, H., 2019. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev.
Fluids 4 (3), 034602. https://doi.org/10.1103/PhysRevFluids.4.034602.
Yuan, Z., Xie, C., Wang, J., 2020. Deconvolutional artificial neural network models for large eddy simulation of turbulence. Phys. Fluids 32 (11),
115106. https://doi.org/10.1063/5.0027146.
Chapter 18
Lattice Boltzmann method and its applications
Mojtaba Aghajani Delavar and Junye Wang
Faculty of Science and Technology, Athabasca University, Athabasca, AB, Canada
1. Introduction
Lattice Boltzmann method (LBM) is a mesoscopic approach rooted in microscopic particle dynamics and mesoscopic kinetic
theory; it originated from classical statistical physics and the lattice gas automata (LGA) method (Frisch et al., 1987).
The fundamental idea behind the LBM is to construct simplified kinetic models that simulate only a collection of pseudo
particles streaming and colliding over a discrete lattice domain, thereby avoiding both the use of the full Boltzmann equation and the
tracing of each particle as in molecular dynamics simulations. The streaming and interaction of these particle parcels are simulated in terms
of time evolution. In the lattice Bhatnagar-Gross-Krook (LBGK) development, an ensemble average of the particle distribution is described as a density distribution function (Bhatnagar et al., 1954; Ajarostaghi et al., 2019). The complex collision of the Boolean particles in LGA was simplified to an ensemble-averaged relaxation term of this density
distribution function. Thus, the discrete collision rule is replaced within LBGK by a continuous function known as the collision
operator. The transition from LGA to LBM removed the statistical noise and made simulations more efficient, stable, and flexible. The governing continuity and Navier-Stokes equations can be recovered by appropriately
choosing the equilibrium distribution function and applying Chapman-Enskog theory (Qian et al., 1992; Chen et al., 1992). Macroscopic averaged properties, such as the flow velocity, the pressure, and the fluid density, are recovered from the
moments of the (time- and space-discrete) density distribution function.
Instead of solving the Navier-Stokes equations directly, fluid distribution functions on a lattice are simulated with
streaming and collision (relaxation) processes, while the Navier-Stokes-based solvers rely on discretizing the governing
differential equations with a given set of boundary conditions on a computational grid network. On the other hand, the
molecular dynamics method considers the streams and collisions of all individual molecules constituting the fluid with
a detailed description of the intermolecular interactions. Thus, the complexity of interactions among a huge number of
molecules makes the molecular dynamics models computationally prohibitive for application to flows in porous media
(Saatsaz and Eslamian, 2020). Due to its underlying kinetic nature, the LBM has many advantages, such as easy parallelism
of the algorithm, simplicity of programming, and ease of incorporating microscopic nonequilibrium processes (Wang et al.,
2005), which enable the modeling of complex multiphysics phenomena involving interfacial dynamics and complex
boundaries, such as multiphase or multicomponent flows in porous media. As a result, the LBM is an ideal scale-bridging
numerical scheme, which incorporates simplified kinetic models to capture microscopic/mesoscopic flow physics, yet the
averaged quantities satisfy the desired macroscopic equations.
This chapter is organized as follows. First, the lattice Boltzmann equation is presented. Then the single-relaxation BGK
model, the multirelaxation-time LBM, and boundary conditions are discussed. Finally, LBM applications to multiphase flow
and an example are presented.
2. Lattice Boltzmann equations
In the simulation of transport phenomena, there are two basic descriptions: the macroscopic continuum description and the microscopic
(kinetic) description. In a macroscopic scheme, which includes fluid mechanics and thermodynamics, the domain is treated as a continuous
medium, while in a microscopic scheme, or kinetic theory, the domain is treated as a collection of small particles. Both
descriptions yield the same macroscopic governing equations for systems involving a large number of particles.
The microscopic scheme is based on the particle-level description provided by molecular dynamics (MD). The
position and velocity of each particle (atom or molecule) in the system evolve from their values at the current and previous
time steps according to Newton's equations of motion. Because of the huge number of particles involved, this
leads to substantial computational time, so molecular dynamics simulations are limited to very small systems. To
reduce the computational burden of these models, two practical modifications are commonly applied:
(1) First, reduce the number of tracked particles by considering a group of atoms or molecules instead of an individual atom or molecule. This changes the simulation scale from the microscopic to the mesoscopic scale.
(2) Second, increase simulation speed by reducing the degrees of freedom of particle movement, restricting motion to a finite number of directions.
In the lattice Boltzmann method, as a mesoscopic method, the distribution function is defined as the probability of finding
particles within a specific velocity range at a given position and time. This distribution function replaces the tracking of each
individual particle used in molecular dynamics. The change leads to considerable savings in
computational cost and makes the procedure feasible for much larger domains than molecular dynamics can handle.
In the lattice Boltzmann method, the movements of fluid particles are modeled to capture macroscopic fluid quantities
such as velocity and pressure. As in other numerical simulation procedures, the domain is discretized into uniform Cartesian
cells, each of which holds a fixed number of distribution functions representing the number of fluid particles moving in the corresponding
discrete directions. The number of distribution functions depends on the lattice Boltzmann model used (Table 1). These
distribution functions are calculated by solving the lattice Boltzmann equation (LBE).
TABLE 1 Common lattice Boltzmann models.
(The schematic column of the printed table, showing the numbered lattice directions of each model, is not reproduced here.)

D1Q3
Velocities: c_k = 0 (k = 0); c_k = ±1 (k = 1, 2)
Weighting factors: ω_0 = 0; ω_k = 1/2 (k = 1, 2)

D1Q5
Velocities: c_k = 0 (k = 0); c_k = ±1 (k = 1, 3); c_k = ±2 (k = 2, 4)
Weighting factors: ω_0 = 6/12; ω_k = 2/12 (k = 1, 3); ω_k = 1/12 (k = 2, 4)

D2Q5
Velocities: c_k = (0, 0) (k = 0); (±1, 0), (0, ±1) (k = 1-4)
Weighting factors: ω_0 = 0; ω_k = 1/4 (k = 1-4)

D2Q7
Velocities: c_k = (0, 0) (k = 0); (±1, 0) (k = 1, 2); (±1/2, ±√3/2) (k = 3-6)
Weighting factors: ω_0 = 1/2; ω_k = 1/12 (k = 1-6)

D2Q9
Velocities: c_k = (0, 0) (k = 0); (±1, 0), (0, ±1) (k = 1-4); (±1, ±1) (k = 5-8)
Weighting factors: ω_0 = 4/9; ω_k = 1/9 (k = 1-4); ω_k = 1/36 (k = 5-8)

D3Q15
Velocities: c_k = (0, 0, 0) (k = 0); (±1, 0, 0), (0, ±1, 0), (0, 0, ±1) (k = 1-6); (±1, ±1, ±1) (k = 7-14)
Weighting factors: ω_0 = 2/9; ω_k = 1/9 (k = 1-6); ω_k = 1/72 (k = 7-14)

D3Q19
Velocities: c_k = (0, 0, 0) (k = 0); (±1, 0, 0), (0, ±1, 0), (0, 0, ±1) (k = 1-6); (±1, ±1, 0), (±1, 0, ±1), (0, ±1, ±1) (k = 7-18)
Weighting factors: ω_0 = 1/3; ω_k = 1/18 (k = 1-6); ω_k = 1/36 (k = 7-18)
In order to capture the flow field in conventional numerical simulations, the continuity and momentum conservation
(Navier-Stokes) equations must be solved:

∇·U = 0   (1)

ρ(∂U/∂t + U·∇U) = −∇p + μ∇²U   (2)

where U is the velocity vector, ρ is the fluid density, and p and μ represent the pressure and the dynamic viscosity, respectively.
In LBM, however, these equations are not solved directly; instead, lattice Boltzmann equations, formulated in terms of distribution functions, are solved and satisfy the governing
equations at the mesoscopic scale.
The distribution function f(r, c, t) is defined as the number of particles that, at time t, are located between r and
r + dr with velocities between c and c + dc. Under an external force F, the velocity of a particle of unit mass
changes from c to c + (F/m)dt and its position from r to r + c dt. The change between the initial and final states is modeled using the
collision operator, Ω, and the particle balance can be written as (Mohamad, 2011):

f(r + c dt, c + (F/m)dt, t + dt) dr dc − f(r, c, t) dr dc = Ω(f) dr dc dt   (3)

In the limit dt → 0:

df/dt = Ω(f)   (4)

This means that the rate of change of the distribution function equals the collision term.
Since the distribution function f is a function of c, r, and t, the Boltzmann transport equation can be written as:

∂f/∂t + c·(∂f/∂r) + (F/m)·(∂f/∂c) = Ω   (5)

If there is no external force:

∂f/∂t + c·∇f = Ω   (6)
2.1 BGK approximation
The collision operator Ω is a function of f, and it must be specified in order to solve the Boltzmann equation. Bhatnagar, Gross, and
Krook introduced a simple model for the collision operator. With the BGK (Bhatnagar-Gross-Krook) approximation,
the Boltzmann equation without external forces becomes (Bhatnagar et al., 1954):

∂f/∂t + c·∇f = (1/τ)(f^eq − f) = ω(f^eq − f)   (7)

where ω = 1/τ is the collision frequency, τ is the relaxation time, and f^eq is the local equilibrium distribution
function. The discretized lattice Boltzmann equation (Eq. 7) with external forces is:

f_k(x + c_k Δt, t + Δt) = f_k(x, t) + (Δt/τ)[f_k^eq(x, t) − f_k(x, t)] + Δt F_k   (8)

f_k^eq(x, t) = ω_k ρ [1 + (c_k·U)/c_s² + 0.5 (c_k·U)²/c_s⁴ − 0.5 (U·U)/c_s²]   (9)

where Δt is the lattice time step, c_k the discrete lattice velocity in direction k, F_k the external force, f_k^eq the equilibrium
distribution function (Qian et al., 1992; Chen et al., 1992), and ρ the lattice fluid density. ω_k denotes a weighting factor
depending on the LB model, DnQm, as shown in Table 1.
The space x is usually discretized such that c_k Δt is the distance between two neighboring lattice nodes. After
a time step Δt, f_k(x, t) therefore moves to its neighboring node along the lattice velocity direction c_k. In LBM, particle
movement is limited to a set of specific directions, depending on the lattice arrangement of the chosen model, DnQm. In
DnQm, n and m represent the spatial dimension and the total number of lattice streaming directions (velocities), respectively. For example, D1Q2 represents one dimension and two streaming directions. The mesoscopic Eqs. (8) and (9)
recover the macroscopic Eqs. (1) and (2) through the Chapman-Enskog expansion.
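As an illustration, the following minimal Fortran sketch evaluates the D2Q9 equilibrium distribution of Eq. (9) and performs the BGK relaxation of Eq. (8) at a single node. The variable names (f, feq, rho, u, v, omega, w, cx, cy) mirror those of the collesion subroutine in Appendix A; the external force is omitted for brevity.

! BGK collision at one node (i,j), D2Q9, no external force
t1 = u(i,j)*u(i,j) + v(i,j)*v(i,j)                                    ! U.U
do k = 0, 8
   t2 = u(i,j)*cx(k) + v(i,j)*cy(k)                                   ! c_k.U (lattice units, c_s^2 = 1/3)
   feq(k,i,j) = rho(i,j)*w(k)*(1.0 + 3.0*t2 + 4.5*t2*t2 - 1.5*t1)     ! Eq. (9)
   f(k,i,j)   = f(k,i,j) + omega(i,j)*(feq(k,i,j) - f(k,i,j))         ! Eq. (8), omega = dt/tau
end do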
2.2 Lattice Boltzmann models
In Table 1, D2Q9 is a lattice Boltzmann model with two dimensions and nine streaming directions. Its
equations can be written as:

f_k(x + c_k Δt, t + Δt) = f_k(x, t) + (Δt/τ)[f_k^eq(x, t) − f_k(x, t)] + Δt F_k

f_k^eq(x, t) = ω_k ρ [1 + (c_k·U)/c_s² + 0.5 (c_k·U)²/c_s⁴ − 0.5 (U·U)/c_s²]   (10)

with the D2Q9 velocities and weights of Table 1,

c_k = (0, 0) for k = 0; (±1, 0), (0, ±1) for k = 1-4; (±1, ±1) for k = 5-8

ω_k = 4/9 for k = 0; 1/9 for k = 1-4; 1/36 for k = 5-8

and the lattice speed of sound

c_s = c/√3   (11)

where c = Δx/Δt is the lattice streaming speed. The density and velocity components are determined by:

ρ = Σ_k f_k   (12)

ρ u_i = Σ_k f_k c_ki   (13)
Subscript i denotes the Cartesian coordinate direction, and the sums run over all lattice directions k of the distribution functions.
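A minimal Fortran sketch of Eqs. (12) and (13), again using the variable names of the uvcalc subroutine in Appendix A, is:

! Macroscopic density and velocity from the D2Q9 distributions, Eqs. (12)-(13)
rho(i,j) = 0.0
usum = 0.0
vsum = 0.0
do k = 0, 8
   rho(i,j) = rho(i,j) + f(k,i,j)
   usum = usum + f(k,i,j)*cx(k)
   vsum = vsum + f(k,i,j)*cy(k)
end do
u(i,j) = usum/rho(i,j)
v(i,j) = vsum/rho(i,j)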
2.3 Multirelaxation time lattice Boltzmann (MRT)
To increase the numerical stability and accuracy of the single-relaxation BGK model, the multirelaxation time (MRT) collision operator was proposed (d'Humières, 2002; Mohamad, 2011; Sharma et al., 2019; Jami et al., 2016). The streaming
process remains the same in MRT; the difference lies in the collision step: streaming is carried out in
velocity space, while the collision is performed in moment space. The lattice Boltzmann equation for MRT is
(d'Humières, 2002; Mohamad, 2011; Jami et al., 2016):

f_k(x + c_k Δt, t + Δt) = f_k(x, t) + M⁻¹ R [m_k^eq(x, t) − m_k(x, t)]   (14)

where M is a transformation matrix that maps velocity space, f, to moment space, m, R is a diagonal matrix of
relaxation rates, and m_k and m_k^eq are the vectors of moments and equilibrium moments. The mapping between velocity and moment space is the linear transformation:

f = M⁻¹ m   (15)
The details of the MRT for D2Q9 are summarized below:

M⁻¹ = (1/36) ×
| 4  −4   4   0   0   0   0   0   0 |
| 4  −1  −2   6  −6   0   0   9   0 |
| 4  −1  −2   0   0   6  −6  −9   0 |
| 4  −1  −2  −6   6   0   0   9   0 |
| 4  −1  −2   0   0  −6   6  −9   0 |
| 4   2   1   6   3   6   3   0   9 |
| 4   2   1  −6  −3   6   3   0  −9 |
| 4   2   1  −6  −3  −6  −3   0   9 |
| 4   2   1   6   3  −6  −3   0  −9 |   (16)

m = (ρ, e, ε, j_x, q_x, j_y, q_y, p_xx, p_yy)ᵀ   (17)
where ρ is the fluid density, e is related to the total energy, ε is related to the square of the energy, q_x and q_y are related to the energy
flux, and p_xx and p_yy are related to the symmetric stress tensor. The momentum components are:

j_x = ρu = Σ_k f_k c_kx − F_x/2   (18)

j_y = ρv = Σ_k f_k c_ky − F_y/2
where F = (F_x, F_y) is the external force. The equilibrium moments are calculated as:

m_0^eq = ρ
m_1^eq = −2ρ + 3(j_x² + j_y²)
m_2^eq = ρ − 3(j_x² + j_y²)
m_3^eq = j_x
m_4^eq = −j_x
m_5^eq = j_y
m_6^eq = −j_y
m_7^eq = j_x² − j_y²
m_8^eq = j_x j_y   (19)
The diagonal relaxation matrix R can be written in compact notation as:

R = diag(1.0, 1.4, 1.4, 1.0, 1.2, 1.0, 1.2, 2/(1 + 6ν), 2/(1 + 6ν))   (20)
More information about details of MRT can be found in (d’Humières, 2002; Lallemand and Luo, 2003; Li et al., 2016a,b;
Mohamad, 2011; Jami et al., 2016; Zhang et al., 2020).
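The MRT collision of Eq. (14) can be coded compactly with matrix-vector products. The following Fortran fragment is our own sketch, not part of the Appendix codes: fk(0:8) holds the distributions at one node, Minv is the matrix of Eq. (16), Mmat is its inverse (the transformation M), and R(0:8) holds the rates of Eq. (20).

! MRT collision at one node: map to moment space, relax, map back (Eq. 14)
m = matmul(Mmat, fk)                      ! velocity space -> moment space
! ... fill meq(0:8) from rho, jx, jy according to Eq. (19) ...
do k = 0, 8
   m(k) = m(k) + R(k)*(meq(k) - m(k))     ! relax each moment with its own rate
end do
fk = matmul(Minv, m)                      ! moment space -> velocity space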
2.4 Boundary conditions
Implementation of boundary conditions in LBM is not as straightforward as in Navier-Stokes solvers and needs more attention.
For example, consider the boundary grids shown in Fig. 1 for the north, east, south, and west boundaries of the D2Q9 model. The
solid arrows show distribution functions pointing out of the domain, which are known from streaming. The unknown distribution functions, shown as dotted arrows pointing into the domain, must be supplied by the boundary condition in LBM; they are
discussed in this section.
2.4.1 Bounce back
The bounce-back boundary condition is used for solid walls with no-slip conditions. With no slip, fluid particles adhere to the (rough)
surface, so the fluid velocity at the wall is zero. This is the most common boundary condition for the solid boundaries of the domain and for obstacles located in the fluid flow, and it is
how the no-slip boundary condition is implemented in lattice Boltzmann. If bounce back is applied
to a cell in an obstacle, every distribution function is replaced during the collision by the distribution function in the opposite
direction. In this scheme, the unknown distribution functions at boundary sites are set equal to the distribution functions in
the reverse directions. During the collision, the following relation is applied:

f_p^in(x, t + 1) = f_q^out(x, t)   (21)

where p and q represent opposite directions in the LBM models of Table 1. For example, for the west boundary in Fig. 1 the
following conditions are used:

f_1 = f_3, f_5 = f_7, f_8 = f_6   (22)

FIG. 1 Known (solid black lines) and unknown (dotted green lines) distribution functions for the D2Q9 model.
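A minimal Fortran sketch of Eq. (22) for the west boundary, in the style of the Appendix codes, where f(k,i,j) holds the distribution in direction k at node (i,j):

! Bounce back on the west boundary (i = 0): unknown incoming directions
! (1, 5, 8) take the values of their opposites (3, 7, 6), Eq. (22)
do j = 0, m
   f(1,0,j) = f(3,0,j)
   f(5,0,j) = f(7,0,j)
   f(8,0,j) = f(6,0,j)
end do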
2.4.2 The boundary with a given velocity
The velocity components at a boundary are known in many applications, such as the inlet ports of the domain. The equations for this
type of boundary condition are summarized below for the main boundaries (north, east, south, and west) of Fig. 1 (Table 2).

TABLE 2 Velocity boundary conditions for known velocity components on the main boundaries (D2Q9).

North (known u_N and v_N):
ρ_N = [f_0 + f_1 + f_3 + 2(f_2 + f_5 + f_6)]/(1 + v_N)
f_4 = f_2 − (2/3)ρ_N v_N
f_7 = f_5 + (1/2)(f_1 − f_3) − (1/6)ρ_N v_N − (1/2)ρ_N u_N
f_8 = f_6 + (1/2)(f_3 − f_1) − (1/6)ρ_N v_N + (1/2)ρ_N u_N

East (known u_E and v_E):
ρ_E = [f_0 + f_2 + f_4 + 2(f_1 + f_5 + f_8)]/(1 + u_E)
f_3 = f_1 − (2/3)ρ_E u_E
f_6 = f_8 − (1/2)(f_2 − f_4) − (1/6)ρ_E u_E + (1/2)ρ_E v_E
f_7 = f_5 + (1/2)(f_2 − f_4) − (1/6)ρ_E u_E − (1/2)ρ_E v_E

South (known u_S and v_S):
ρ_S = [f_0 + f_1 + f_3 + 2(f_4 + f_7 + f_8)]/(1 − v_S)
f_2 = f_4 + (2/3)ρ_S v_S
f_5 = f_7 − (1/2)(f_1 − f_3) + (1/6)ρ_S v_S + (1/2)ρ_S u_S
f_6 = f_8 + (1/2)(f_1 − f_3) + (1/6)ρ_S v_S − (1/2)ρ_S u_S

West (known u_W and v_W):
ρ_W = [f_0 + f_2 + f_4 + 2(f_3 + f_6 + f_7)]/(1 − u_W)
f_1 = f_3 + (2/3)ρ_W u_W
f_5 = f_7 − (1/2)(f_2 − f_4) + (1/6)ρ_W u_W + (1/2)ρ_W v_W
f_8 = f_6 + (1/2)(f_2 − f_4) + (1/6)ρ_W u_W − (1/2)ρ_W v_W
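A Fortran sketch of the west (inlet) entries of Table 2, essentially what the bouncon subroutine of Appendix B does for a prescribed inlet velocity uin with v_W = 0:

! Zou-He velocity inlet on the west boundary (Table 2, west row, v_W = 0)
do j = 0, m
   rhow = (f(0,0,j)+f(2,0,j)+f(4,0,j)+2.*(f(3,0,j)+f(6,0,j)+f(7,0,j)))/(1.-uin)
   f(1,0,j) = f(3,0,j) + 2.*rhow*uin/3.
   f(5,0,j) = f(7,0,j) - (f(2,0,j)-f(4,0,j))/2. + rhow*uin/6.
   f(8,0,j) = f(6,0,j) + (f(2,0,j)-f(4,0,j))/2. + rhow*uin/6.
end do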
2.4.3 The boundary with given pressure
The procedure for specifying pressure at a boundary is similar to that for specifying velocity, since both conditions result from
a density difference in the flow. Here the density (and hence pressure) at the boundary is known, in contrast to the case of a boundary with
known inlet velocities. A summary of the pressure boundary conditions is given in Table 3.
2.4.4 Open boundary condition
This condition is used at outlet ports and is implemented by extrapolation. It assumes zero gradients in the flow direction
and is applied as:

f_{3,n} = 2f_{3,n−1} − f_{3,n−2}   (23)

f_{6,n} = 2f_{6,n−1} − f_{6,n−2}   (24)

f_{7,n} = 2f_{7,n−1} − f_{7,n−2}   (25)

where n is the lattice node on the boundary, and (n−1) and (n−2) denote the lattice nodes inside the domain, adjacent to the boundary
in the flow direction. Another possible extrapolation can be implemented as (Seta et al., 2006):

f_{3,n} = (4/3)f_{3,n−1} − (1/3)f_{3,n−2}   (26)

f_{6,n} = (4/3)f_{6,n−1} − (1/3)f_{6,n−2}   (27)

f_{7,n} = (4/3)f_{7,n−1} − (1/3)f_{7,n−2}   (28)
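In code, Eqs. (26)-(28) for an outlet on the east boundary (i = n) read as follows, matching the bouncon subroutine of Appendix B:

! Open (outlet) boundary on the east side, extrapolation of Eqs. (26)-(28)
do j = 0, m
   f(3,n,j) = 4.*f(3,n-1,j)/3. - f(3,n-2,j)/3.
   f(6,n,j) = 4.*f(6,n-1,j)/3. - f(6,n-2,j)/3.
   f(7,n,j) = 4.*f(7,n-1,j)/3. - f(7,n-2,j)/3.
end do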
2.4.5 Symmetry boundary condition
If the problem is symmetrical along a line (2D) or surface (3D), a symmetrical boundary condition can be used to solve the
problem for just a part of the domain and save computer resources. In this boundary, the unknown distribution functions are
assumed to be equal to their mirror directions. For example, for the south boundary in Fig. 1, the symmetry boundary condition can be applied as:
f_2 = f_4, f_5 = f_8, f_6 = f_7   (29)
TABLE 3 Pressure boundary conditions for known density on the main boundaries (D2Q9). The tangential velocity component is assumed to be zero on each boundary.

North (known ρ_N):
v_N = [f_0 + f_1 + f_3 + 2(f_2 + f_5 + f_6)]/ρ_N − 1
f_4 = f_2 − (2/3)ρ_N v_N
f_7 = f_5 + (1/2)(f_1 − f_3) − (1/6)ρ_N v_N
f_8 = f_6 + (1/2)(f_3 − f_1) − (1/6)ρ_N v_N

East (known ρ_E):
u_E = [f_0 + f_2 + f_4 + 2(f_1 + f_5 + f_8)]/ρ_E − 1
f_3 = f_1 − (2/3)ρ_E u_E
f_6 = f_8 − (1/2)(f_2 − f_4) − (1/6)ρ_E u_E
f_7 = f_5 + (1/2)(f_2 − f_4) − (1/6)ρ_E u_E

South (known ρ_S):
v_S = 1 − [f_0 + f_1 + f_3 + 2(f_4 + f_7 + f_8)]/ρ_S
f_2 = f_4 + (2/3)ρ_S v_S
f_5 = f_7 − (1/2)(f_1 − f_3) + (1/6)ρ_S v_S
f_6 = f_8 + (1/2)(f_1 − f_3) + (1/6)ρ_S v_S

West (known ρ_W):
u_W = 1 − [f_0 + f_2 + f_4 + 2(f_3 + f_6 + f_7)]/ρ_W
f_1 = f_3 + (2/3)ρ_W u_W
f_5 = f_7 − (1/2)(f_2 − f_4) + (1/6)ρ_W u_W
f_8 = f_6 + (1/2)(f_2 − f_4) + (1/6)ρ_W u_W
2.4.6 Periodic boundary condition
If the flow is periodic, a periodic boundary condition can be used so that only one repeating part of the domain needs to be
simulated, which saves computational time. In this approach, the two sides of the periodic domain are linked: the unknown
distribution functions on one side are taken from the known distribution functions on the opposite side. For instance, if in Fig. 1 the east and
west boundaries are periodic, then:

f_{1,W} = f_{1,E}, f_{5,W} = f_{5,E}, f_{8,W} = f_{8,E}   (30)

f_{3,E} = f_{3,W}, f_{6,E} = f_{6,W}, f_{7,E} = f_{7,W}   (31)
where W and E denote west and east boundaries, respectively.
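A Fortran sketch of Eqs. (30) and (31), our own illustration in the style of the Appendix codes, where i = 0 is the west column and i = n the east column:

! Periodic boundary condition in the x-direction, Eqs. (30)-(31)
do j = 0, m
   f(1,0,j) = f(1,n,j)      ! unknowns entering at the west come from the east
   f(5,0,j) = f(5,n,j)
   f(8,0,j) = f(8,n,j)
   f(3,n,j) = f(3,0,j)      ! unknowns entering at the east come from the west
   f(6,n,j) = f(6,0,j)
   f(7,n,j) = f(7,0,j)
end do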
3. Thermal LBM

The temperature distribution in the domain can be obtained by solving the energy conservation equation:

∂T/∂t + U·∇T = α∇²T + q_g/(ρC)   (32)

where T is the temperature, α the thermal diffusivity, ρ the density, C the specific heat capacity, and q_g the heat
source.
As discussed before, f in Eq. (10) is the distribution function used to find the flow field in the domain. To solve
Eq. (32) for the temperature field, a second distribution function, g, is required. To handle combined fluid flow and
temperature fields, the thermal LBM therefore uses two distribution functions, f and g, for the flow and temperature fields, respectively. The g distribution function evolves as:

g_k(x + c_k Δt, t + Δt) = g_k(x, t) + (Δt/τ_g)[g_k^eq(x, t) − g_k(x, t)] + Δt ω_k S_T   (33)

where g_k^eq and S_T are the corresponding equilibrium distribution function and the heat source term, respectively. Since the temperature
is a scalar, the equilibrium distribution function can be written as (Mohamad, 2011; Mezrhab et al., 2006;
Delavar et al., 2009, 2010; Ajarostaghi et al., 2019; Delavar and Wang, 2020a, 2021a):

g_k^eq(x, t) = ω_k T [1 + (c_k·U)/c_s²]   (34)

The source term is related to the macroscopic heat source q_g by:

S_T = q_g/(ρC)   (35)
The temperature is calculated as:

T = Σ_k g_k   (36)
The effect of the buoyancy force caused by temperature-induced density changes should be included in Eq. (10); in the vertical (y)
direction it is calculated as (Mohamad, 2011):

F_k = 3 ω_k g_y β ρ θ c_ky   (37)

where β is the thermal expansion coefficient, g_y the gravitational acceleration, c_ky the y-component of the lattice velocity c_k, and θ a dimensionless
temperature defined as:

θ = (T − T_c)/(T_h − T_c)   (38)
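A minimal Fortran sketch of the thermal collision step, Eqs. (33)-(34) without a heat source, matching the colls subroutine of Appendix A (omegat plays the role of Δt/τ_g):

! Thermal BGK collision at one node, Eqs. (33)-(34), no heat source
do k = 0, 8
   geq(k,i,j) = th(i,j)*w(k)*(1.0 + 3.0*(u(i,j)*cx(k) + v(i,j)*cy(k)))   ! Eq. (34)
   g(k,i,j)   = g(k,i,j) + omegat(i,j)*(geq(k,i,j) - g(k,i,j))           ! Eq. (33)
end do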
3.1 Boundary condition with a given temperature
A fixed temperature at a boundary is implemented in LBM in the same way as for any other scalar (such as
concentration). For example, if the temperature of the east boundary in Fig. 1 is equal to T_w, then:

g_3 = T_w(ω_1 + ω_3) − g_1
g_6 = T_w(ω_6 + ω_8) − g_8   (39)
g_7 = T_w(ω_5 + ω_7) − g_5

Note that if any nondimensionalization or normalization is applied during the solution, the boundary conditions must be adjusted accordingly.
3.2 Constant heat flux boundary condition
According to Fourier's law, the temperature gradient is related to the heat flux; for example, for the south boundary in Fig. 1:

q″_y = −λ ∂T/∂y   (40)

where λ is the thermal conductivity. This equation can be discretized as:

q″_y = −λ ∂T/∂y ≈ −λ [T(i, 1) − T(i, 0)]/Δy   (41)

so that:

T(i, 0) = T(i, 1) + (q″_y/λ) Δy   (42)

which is then applied in the same way as the fixed-temperature boundary condition discussed in the previous section.
If q″_y = 0, this reduces to an adiabatic boundary condition, which is treated like the open boundary condition discussed before.
4. Multicomponent LBM (species transport modeling)
In multicomponent models, different species react with each other and depend on the local flow and thermal fields. In
such simulations, a species conservation equation is solved for each species (Delavar and Wang,
2020b, 2021b,c, 2022a):

U·∇C_l = D_l^eff ∇²C_l + S_l   (43)

Σ_l C_l = 1   (44)

where l denotes the chemical species, C_l is the mole fraction of species l, S_l is the mass generation/consumption rate of
species l per unit volume, and D_l is the diffusion coefficient of the lth component.
Each species concentration is a scalar, so it is treated like temperature. A new distribution function g_l is used for each species, defined as (Delavar and Wang, 2020b, 2021a,b,c):

g_{lk}(x + c_k Δt, t + Δt) = g_{lk}(x, t) + (Δt/τ_l)[g_{lk}^eq(x, t) − g_{lk}(x, t)] + Δt ω_k S_l   (45)

g_{lk}^eq = ω_k C_l [1 + (c_k·U)/c_s²]   (46)

where S_l is the source term of the lth species. As with temperature, after computing the local distribution functions,
the local concentration is:

C_l = Σ_k g_{lk}   (47)
Because species concentration, like temperature, is a scalar quantity, its boundary conditions will be handled in the same
way as those for temperature computation, as detailed in Section 3.
5. Flow simulation in porous media

Since Darcy's law does not predict fluid flow in porous media well at high velocities, two prominent modifications have been proposed. The first is Forchheimer's equation, which adds a nonlinear drag effect due to the solid matrix,

∇P = −(μ/K) U − [1.75 ρ_f/√(150 ε³ K)] |U| U   (48)

and the second is Brinkman's equation, which accounts for the viscous stress introduced by the solid boundary,

∇P = −(μ/K) U + μ_eff ∇²U   (49)

where K is the permeability, ε denotes the porosity, and μ_eff represents the effective viscosity. Nield and Bejan (2006) combined
these two equations and derived the Brinkman-Forchheimer equation, which includes the viscous and inertial terms through the
local volume averaging technique. This model has been used successfully to investigate fluid flow in porous media over a wide
range of porosities and Rayleigh, Reynolds, and Darcy numbers (Seta et al., 2006; Yan et al., 2008; Delavar et al., 2010;
Ajarostaghi et al., 2019; Tian and Wang, 2017, 2018). The Brinkman-Forchheimer equation is:

∂U/∂t + (U·∇)(U/ε) = −(1/ρ)∇(εp) + ν_eff ∇²U + F   (50)

where ν_eff is the effective kinematic viscosity and F is the total body force due to viscous diffusion, the inertia induced by the
porous medium, and any external force:

F = −(εν/K) U − [1.75/√(150 ε³ K)] |U| U + εG   (51)

where G is the gravitational acceleration.
The equilibrium distribution function is modified to:

f_k^eq = w_k ρ [1 + (c_k·U)/c_s² + (c_k·U)²/(2εc_s⁴) − U²/(2εc_s²)]   (52)

The appropriate choices of the forcing term F_k and of U to capture the correct hydrodynamics are:

F_k = w_k ρ [1 − 1/(2τ_v)] [(c_k·F)/c_s² + (U·c_k)(F·c_k)/(εc_s⁴) − (U·F)/(εc_s²)]   (53)

U = V/(c_0 + √(c_0² + c_1 |V|)),   V = Σ_k c_k f_k/ρ + (Δt/2) εG   (54)

c_0 = (1/2)[1 + ε (Δt/2)(ν/K)],   c_1 = ε (Δt/2) [1.75/√(150 ε³ K)]   (55)
To solve the energy equation, the overall thermal conductivity of the porous medium must be specified; it depends on the
porous solid structure and on the fluid. If heat transfer through the solid structure and the fluid occurs in parallel, the overall conductivity λ_eff is a weighted sum of the solid and fluid conductivities, λ_s and λ_f (Nield and Bejan, 2006):

λ_eff = (1 − ε)λ_s + ελ_f   (56)

If the heat transfer occurs in series, the effective thermal conductivity is (Nield and Bejan, 2006):

λ_eff = [(1 − ε)/λ_s + ε/λ_f]⁻¹   (57)

In general, these two equations give upper and lower bounds, respectively. For practical purposes, it is recommended to
use (Nield and Bejan, 2006):

λ_eff = λ_s^(1−ε) λ_f^ε   (58)

The above equations are valid when λ_s and λ_f do not differ greatly; otherwise, the effective thermal conductivity in porous media can be calculated from (Jiang and Lu, 2006; Delavar and Hedayatpour, 2012):

λ_eff = λ_f {1 − √(1 − ε) + [2√(1 − ε)/(1 − σB)] [ (1 − σ)B/(1 − σB)² ln(1/(σB)) − (B + 1)/2 − (B − 1)/(1 − σB) ]}   (59)

B = 1.25 [(1 − ε)/ε]^(10/9),   σ = λ_f/λ_s
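As an illustration of Eqs. (51), (54), and (55), the following Fortran fragment (our own sketch, not taken from the Appendix codes; the names eps, perm, nu, gy and dt are assumptions denoting porosity, permeability, kinematic viscosity, gravity, and the time step) computes the composite velocity at one node of the porous-media model:

! Velocity correction for the porous-media (Brinkman-Forchheimer) LBM model
c0 = 0.5*(1.0 + eps*0.5*dt*nu/perm)                       ! Eq. (55)
c1 = eps*0.5*dt*1.75/sqrt(150.0*eps**3*perm)
vx = 0.0
vy = 0.0
do k = 0, 8                                               ! temporal velocity V, Eq. (54)
   vx = vx + cx(k)*f(k,i,j)/rho(i,j)
   vy = vy + cy(k)*f(k,i,j)/rho(i,j)
end do
vy = vy + 0.5*dt*eps*gy
vmag = sqrt(vx*vx + vy*vy)
u(i,j) = vx/(c0 + sqrt(c0*c0 + c1*vmag))                  ! Eq. (54)
v(i,j) = vy/(c0 + sqrt(c0*c0 + c1*vmag))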
Fig. 2 shows the results of lattice Boltzmann simulation of biohydrogen production by microorganisms in a porous
microbioreactor.
6. Dimensionless numbers
In LBM simulations, dimensionless numbers are needed to relate the lattice quantities of the simulation to the corresponding real
physical parameters. Dimensionless numbers are used in the simulations in order
to (Delavar and Wang, 2020a,b):
(1) Simplify the relation between real and simulation parameters by reducing the number of variables involved; for instance,
the Reynolds number can be used instead of the velocity, kinematic viscosity, and characteristic length individually.
(2) Analyze the system behavior independently of the units used to measure the variables.
FIG. 2 Velocity vectors and normalized biohydrogen contours (by
maximum concentration) in a porous microbioreactor simulated
using LBM.
(3) Rescale the governing parameters and variables so that all computed quantities are of comparable magnitude (in LBM, typically between 0 and 1).
For the simulation of fluid flow, the Reynolds number is the governing dimensionless group. For a given Reynolds number
(Delavar et al., 2010; Delavar and Wang, 2020b, 2021a):

Re = (UH/ν)_real = (UN/ν)_LBM   (60)

where N is the number of grid nodes used for meshing, H is the characteristic length, and the subscripts real and LBM denote the real
system and the LBM simulation, respectively.
The next important dimensionless number is the Prandtl number, the ratio of kinematic viscosity to thermal
diffusivity; for a given value of the Prandtl number:

Pr = (ν/α)_real = (ν/α)_LBM   (61)

The Rayleigh number is used when natural convection is important and is defined as:

Ra = gβΔTH³/(αν)   (62)
where β is the thermal expansion coefficient, g the gravitational acceleration, and ΔT the reference temperature difference (usually the maximum temperature difference in the domain).
For the mass transport equation, the main governing dimensionless number is the Schmidt number, defined as the ratio of
momentum diffusivity (kinematic viscosity), ν, to mass diffusivity, D_l. It characterizes the mass transport
equation with respect to the fluid flow (Delavar and Wang, 2021b):

Sc_l = ν/D_l   (63)

Therefore, for the transport of each species, the corresponding Schmidt number must be calculated and kept the same in the real system and in the
numerical simulation. In the biofilm example of Fig. 2, for instance, the diffusion coefficient of a species is reduced by 80% in cells occupied by biofilm, and
this change must be reflected in the local Schmidt number.
In Eqs. (34) and (45), the source terms are multiplied by Δt, which shows the importance of correctly relating the LBM time step
to real time. This relation is enforced through (Delavar et al., 2010; Delavar and Wang, 2020b, 2022b):

(D_l t/L_D²)_real = (D_l t/L_D²)_LBM   (64)

where L_D is a characteristic length, D_l the diffusion coefficient of the species, and t the time.
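As a short worked example of Eq. (60) (our own illustration, with assumed numbers): if a flow has Re = 100 and the characteristic length is resolved with N = 100 lattice nodes, then choosing a lattice velocity U_LBM = 0.02 fixes the lattice viscosity, and hence the BGK relaxation time used in the Appendix codes.

program re_example
  implicit none
  real :: Re, N_lat, u_lbm, nu_lbm, tau
  Re = 100.0        ! target Reynolds number (assumed)
  N_lat = 100.0     ! lattice nodes across the characteristic length (assumed)
  u_lbm = 0.02      ! lattice velocity, kept small compared with c_s (assumed)
  nu_lbm = u_lbm*N_lat/Re        ! from Re = U*N/nu, Eq. (60)
  tau = 3.0*nu_lbm + 0.5         ! BGK relaxation time, as set in the Appendix codes
  print *, 'nu_LBM =', nu_lbm, '  tau =', tau
end program re_example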
7. Flow chart of the simulation procedure
The simulation flow chart, shown in Fig. 3, contains the following steps:
(1) Input geometry: The computational domain is defined in this step.
(2) Initialization: Input initial values of parameters (such as velocity, temperature, etc.) and boundary conditions or update
their values regarding the previous time step.
(3) Solve the hydrodynamic equations: Collision and streaming are solved to find flow fields in the domain in both fluid
and porous zones.
(4) Solve the heat transfer equation: the temperature field is solved using the velocity field obtained in the previous step.
(5) Solve the mass transport of species: the effects of the flow and temperature fields are included in the mass transport
equation. A condensed Fortran skeleton of this loop, assembled from the Appendix A subroutines, is shown below.
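The following skeleton shows how the flow-chart steps map onto the subroutines of Appendix A; the loop body is condensed, and the full version, including autosaving and output, is given in the Appendix.

! Main time loop: one iteration of the flow chart in Fig. 3
do while (mstep == 1)
   call collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,gbeta,visco,thref)  ! step 3: collide
   call streaming(f,n,m)                                                 ! step 3: stream
   call bouncon(f,n,m)                                                   ! flow boundary conditions
   call uvcalc(f,rho,u,v,n,m,cx,cy)                                      ! macroscopic rho, u, v
   call colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)                           ! step 4: thermal collision
   call streaming(g,n,m)                                                 ! thermal streaming
   call gbouncon(g,tw1,tw2,w,n,m)                                        ! thermal boundary conditions
   call thcalcu(g,th,n,m)                                                ! temperature field
   kk = kk + 1
   if (kk == 99999) mstep = 2                                            ! stop after a fixed number of steps
end do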
8. Multiphase flows
Multiphase flows are important in many applications such as power plants, environmental systems, the recovery of
petroleum resources from reservoirs, fuel cell operations, etc. Consequently, they have been the subject of many numerical
and experimental studies. In the last decades, LBM has become a reliable technique for simulating multiphase flows (Huang
et al., 2015; Shan and Chen, 1993; Inamuro et al., 2004; Liu et al., 2016; Li et al., 2016a,b; Chen et al., 2018; Niu et al.,
2018).

FIG. 3 LBM simulation flow chart.

Compared to conventional CFD methods, LBM has some advantages due to its kinetic nature. In conventional CFD
methods, the interfacial behavior is usually captured through complicated closure relations, a transport equation for
the volume fraction, and an interface reconstruction step. In the LB method, by contrast, the incorporation of
intermolecular-level interactions makes it easy to capture the multiphase interface and the interfacial dynamics (Ba et al.,
2016). Among the different multiphase LBM models, a few are particularly popular: the color-gradient model, the
Shan-Chen (SC) model, the free-energy (FE) model, and the interface-tracking model. A brief review
of the color-gradient model and the Shan-Chen model is presented in this section. More information about the different LBM multiphase
models can be found in (Huang et al., 2015; Sattari et al., 2014, 2016; Inamuro et al., 2004; Yan and Zu, 2007; Zheng et al.,
2006; He et al., 1999). Fig. 4 shows the results of a bubble-rise simulation using the lattice Boltzmann method.
8.1 The color-gradient model
In this model, two distribution functions are used to simulate a two-component fluid flow: one for a red-colored fluid and
one for a blue-colored fluid. The model was proposed by Gunstensen et al. (1991) based on the Rothman-Keller (RK)
multiphase lattice gas model (Rothman and Keller, 1988) and has since been modified by several authors.
FIG. 4 Isosurfaces and velocity vectors of bubble rise in the vertical direction, simulated using LBM.
The model was modified to handle binary fluids with different density and viscosity ratios (Grunau et al., 1993). Latva-Kokko and Rothman (2005) introduced an extra collision term into the model. Ahrenholz et al. (2008) improved the model by
using a multiple relaxation time (MRT) LBM to simulate higher viscosity ratios and lower capillary numbers, with the
advantage of independent adjustment of the surface tension and the viscosity ratio. More recently, for density
ratios of order O(10) or larger, several schemes have been suggested to improve the model (Huang et al., 2013; Ba et al., 2016).
In this model, the two immiscible fluids are represented as red and blue fluids. The distribution function f_k^X (X = R or B)
represents fluid X, with R for the red fluid and B for the blue fluid. The evolution and total distribution functions are (Ba et al., 2016):

f_k^X(x + c_k Δt, t + Δt) = f_k^X(x, t) + Ω_k^X[f_k^X(x, t)]   (65)

f_k(x, t) = f_k^R(x, t) + f_k^B(x, t)   (66)

Ω_k^X = (Ω_k^X)^(3)[(Ω_k^X)^(1) + (Ω_k^X)^(2)]   (67)

where Ω_k^X is the total collision operator, (Ω_k^X)^(1) is the single-phase collision operator, (Ω_k^X)^(2) is the perturbation operator that generates the
interfacial tension, and (Ω_k^X)^(3) is the recoloring operator that produces the phase segregation and maintains the phase interface.
The BGK collision operator is (Ba et al., 2016):

(Ω_k^X)^(1) = −W^X [f_k^X − f_k^(X,eq)]   (68)

f_k^(X,eq) = ρ^X (φ_k^X + ω_k [1 + (c_k·U)/c_s² + 0.5 (c_k·U)²/c_s⁴ − 0.5 (U·U)/c_s²])   (69)

where W^X is the relaxation parameter and φ_k^X is a parameter related to the density ratio of the fluids and the sound speed in each fluid.
The perturbation operator is:

(Ω_k^X)^(2) = A^X ω_k (1 − W^X/2)[3(c_k − U) + 9(c_k·U)c_k]·F_S   (70)

where A^X is the fraction of the interfacial tension contributed by fluid X, and F_S is the interfacial tension force (Ba et al., 2016). The
recoloring operators for the red and blue fluids are calculated as:

(Ω_k^R)^(3)(f_k′) ≡ f_k^R = (ρ_R/ρ) f_k′ + β ω_k (ρ_R ρ_B/ρ) cos(φ_k) |c_k|   (71)

(Ω_k^B)^(3)(f_k′) ≡ f_k^B = (ρ_B/ρ) f_k′ − β ω_k (ρ_R ρ_B/ρ) cos(φ_k) |c_k|   (72)

where f_k′ is the post-perturbation value of the total distribution function, β is a segregation parameter, and φ_k is the angle between the color gradient and the lattice
direction. More detail about this model is presented in Ba et al. (2016) and Huang et al. (2015).
8.2 Shan-Chen model
The Shan-Chen (SC) model (Shan and Chen, 1993, 1994; Sukop and Thorne, 2006) is based on the incorporation of an
attractive or repulsive force between particles in LBM, which leads to phase separation. There are both single-component
and multicomponent multiphase models based on the SC model (Shan and Chen, 1993; Shan and Doolen, 1995). This
model has been successfully used to simulate various multiphase phenomena such as condensation, evaporation, cavitation, bubble rise, relative permeability calculation in porous media, and oil-water-like two-component flow in porous media
(Sankaranarayanan et al., 2002; Chen et al., 2014; Falcucci et al., 2013; Dong et al., 2010; Nekoeian et al., 2018;
Dauyeshova et al., 2018; Sipey et al., 2020).
The basic model produces good results for density ratios of up to about 10 (Huang et al., 2011), and the surface tension cannot be
specified independently of the interparticle force; some studies have therefore been carried out to increase the achievable density ratio
(Bao and Schaefer, 2013).
In the multicomponent multiphase SC model, each distribution function represents one fluid component and is calculated
using the following lattice Boltzmann equation (Bao and Schaefer, 2013; Sipey et al., 2020):

f_k^σ(x + c_k dt, t + dt) = f_k^σ(x, t) − (1/τ_σ)[f_k^σ(x, t) − f_k^(σ,eq)(x, t)]   (73)

f_k^(σ,eq)(x, t) = ω_k ρ_σ [1 + (c_k·u_σ^eq)/c_s² + (c_k·u_σ^eq)²/(2c_s⁴) − (u_σ^eq)²/(2c_s²)]   (74)

where f_k^σ is the density distribution function of the σth component, τ_σ is the single relaxation time for each component, related to the
kinematic viscosity by ν_σ = c_s²(τ_σ − 0.5 dt), and f_k^(σ,eq) is the equilibrium distribution. The density and velocity of the σth component are calculated as:

ρ_σ = Σ_k f_k^σ   (75)

u_σ = (Σ_{k=0}^{8} c_k f_k^σ)/ρ_σ   (76)

The equilibrium (composite) velocity is given by:

u_σ^eq = [Σ_{σ=1}^{2} (ρ_σ u_σ/τ_σ)] / [Σ_{σ=1}^{2} (ρ_σ/τ_σ)] + τ_σ F_σ/ρ_σ   (77)

where F_σ is the force acting on the σth component, including the fluid-fluid interaction, the fluid-solid interaction, and body forces:

F_σ = F_σ^f + F_σ^s + F_σ^b   (78)
Details about this model can be found in (Bao and Schaefer, 2013; Sipey et al., 2020).
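A minimal Fortran sketch of Eqs. (75)-(77) for two components (our own illustration; the arrays fs(0:8,1:2), tau(1:2), Fx(1:2), Fy(1:2) and the lattice vectors cx, cy are assumed to be available at the current node):

! Component densities, velocities, and composite velocity, Eqs. (75)-(77)
do s = 1, 2
   rhos(s) = 0.0; ux(s) = 0.0; uy(s) = 0.0
   do k = 0, 8
      rhos(s) = rhos(s) + fs(k,s)                   ! Eq. (75)
      ux(s)   = ux(s)   + cx(k)*fs(k,s)
      uy(s)   = uy(s)   + cy(k)*fs(k,s)
   end do
   ux(s) = ux(s)/rhos(s); uy(s) = uy(s)/rhos(s)     ! Eq. (76)
end do
denom = rhos(1)/tau(1) + rhos(2)/tau(2)
uxc = (rhos(1)*ux(1)/tau(1) + rhos(2)*ux(2)/tau(2))/denom   ! common velocity
uyc = (rhos(1)*uy(1)/tau(1) + rhos(2)*uy(2)/tau(2))/denom
do s = 1, 2
   uxeq(s) = uxc + tau(s)*Fx(s)/rhos(s)             ! per-component shift, Eq. (77)
   uyeq(s) = uyc + tau(s)*Fy(s)/rhos(s)
end do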
9. Sample test cases and codes
In this section, the thermal LBM described in Section 3 is applied to two sample cases. The test cases illustrate how to
set up a heat transfer problem, make simplifying assumptions, impose boundary conditions, and solve the thermal equations. The velocity
field is solved first, and the temperature field is then solved as a scalar variable. The codes are written in Intel(R) Visual
Fortran and are provided in the Appendix.
9.1 Free convection in L-cavity
This example describes a thermal fluid problem of free convection in an L-cavity (Fig. 5). The left boundary is set up as a
hot wall with a constant temperature, T_h = 1. The top boundary consists of two adiabatic walls with a constant-temperature
section, T_c = 0, between them. The right boundary is set at the constant temperature T_c, and the bottom boundary is an adiabatic wall. Free convection is driven by the buoyancy force, which is controlled by the local temperature and the Rayleigh number. Fig. 5
shows the interaction between the thermal field and the flow field of free convection. The code is included in Appendix A.
It provides a platform for implementing other source terms in the simulation procedure, such as heat or mass sources, viscous
forces due to porous media, or magnetohydrodynamic forces.
9.2 Forced convection in a channel
This example describes a thermal fluid problem of forced convection in a rectangular channel (Fig. 6), similar to an
electronic cooling case. An electronic device produces heat (right block). A cooling fluid is forced through the channel and removes
the heat produced by the device through forced convection. To reduce the temperature of the forced fluid, a heat pipe is used
as a low-temperature block (left block).
The left boundary is set up as an inlet and the right boundary as the outlet. The top and bottom boundaries are adiabatic
walls. The left block is a cold source with a constant low temperature, T_l = 0.25, and the right block is a heat source
with a constant high temperature, T_h = 1. The inlet temperature is 0.5. Fig. 6 shows the interaction between the thermal field
and the flow field of forced convection. The code is included in Appendix B.
FIG. 5 Boundary condition, temperature contours, and velocity vectors of free convection in
an L-cavity geometry.
FIG. 6 Velocity vectors and temperature contours of
forced convection in a channel with two obstacles.
10. Conclusions

In this chapter, the main concepts and applications of the lattice Boltzmann method were presented. As a mesoscopic
numerical approach, LBM is based on distribution functions streaming along a limited set of directions, rather than on directly solving the macroscopic conservation equations
of mass, momentum, and energy as conventional CFD methods do. This makes LBM well suited
to nonequilibrium dynamics, complex geometries and boundary conditions, and parallel processing, and it avoids solving
the computationally expensive Poisson equation to obtain the pressure field. Single-relaxation and multirelaxation time LBM
were described, followed by the thermal LBM model, multicomponent LBM models, and multiphase LBM models. Two
sample cases with their codes are presented at the end of this chapter to provide a platform for students, researchers,
and engineers to develop their own codes for scientific and practical numerical simulations.
Appendix A
Computer code for free convection in L-cavity
parameter (n=120,m=120)
real f(0:8,0:n,0:m),feq(0:8,0:n,0:m),visco(0:n,0:m),omega(0:n,0:m)
real rho(0:n,0:m),rhoo(0:n,0:m),uin(0:m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real alpha(0:n,0:m),gbeta(0:n,0:m),omegat(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8)
real u(-1:n+1,-1:m+1),v(-1:n,-1:m+1)
integer i,j
cx(:)=(/0.0,1.0,0.0,-1.0,0.0,1.0,-1.0,-1.0,1.0/)
cy(:)=(/0.0,0.0,1.0,0.0,-1.0,1.0,1.0,-1.0,-1.0/)
w(:)=(/4./9.,1./9.,1./9.,1./9.,1./9.,1./36.,1./36.,1./36.,1./36./)
ra=1.e6
SV=0.0
dx=1.0
dy=dx
tw1=1.0
tw2=0
thref=((tw1+tw2)/2)
dt=(tw1-tw2)
pr=.71
!**********************************************!
! Setting initial values
!**********************************************!
do i=0,n
do j=0,m
rhoo(i,j)=6.0
visco(i,j)=0.02
u(i,j)=0.
v(i,j)=0.
th(i,j)=0.0
end do
end do
do i=0,n
u(i,m)=0.0
v(i,m)=0.0
end do
!**********************************************!
! setting LBM solution parameters
!**********************************************!
do i=0,n
do j=0,m
alpha(i,j)=visco(i,j)/pr
rho(i,j)=rhoo(i,j)
end do
end do
do i=0,n
do j=0,m
omega(i,j)=1./(3.*visco(i,j)+0.5)
end do
end do
do i=0,n
do j=0,m
omegat(i,j)=1./(3.*alpha(i,j)+0.5)
end do
end do
do i=0,n
do j=0,m
gbeta(i,j)=ra*visco(i,j)*alpha(i,j)/(float(m*m*m)) ! Attention required
end do
end do
mstep=1 !
savenumber=0
kk=0
!**********************************************!
!main loop
!**********************************************!
!main loop
1 do while (mstep==1) !kk=1,mstep
call collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,gbeta,visco,thref)
call streaming(f,n,m)
call bouncon(f,n,m)
call uvcalc(f,rho,u,v,n,m,cx,cy)
! -----------------------------!collesion for th
call colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
!streaming for th
call streaming(g,n,m)
call gbouncon(g,tw1,tw2,w,n,m)
call thcalcu(g,th,n,m)
! Showing some parameters to check solution in iterations
print*,kk,"**",th(n/2,m/2),"**",u(n/2,m/2),"**"
! maximum iteration criterion
if (kk==99999) mstep=2
! setting autosave period
if(savenumber==5000) call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
kk=kk+1
savenumber=savenumber+1
errmaxu=0.0
errmaxsc=0.0
END DO
!**********************************************!
! end of the main loop
!**********************************************!
call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
stop
end
!**********************************************!
! Subroutine of collesion for FLOW field
!**********************************************!
subroutine collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,gbeta,visco,thref)
real f(0:8,0:n,0:m),omega(0:n,0:m),gbeta(0:n,0:m)
real feq(0:8,0:n,0:m),rho(0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),visco(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
t1=u(i,j)*u(i,j)+v(i,j)*v(i,j)
do k=0,8
t2=u(i,j)*cx(k)+v(i,j)*cy(k) !u.c(k)
force=3.0*w(k)*gbeta(i,j)*((th(i,j)-thref)*cy(k)*rho(i,j))
if(i.eq.0.or.i.eq.n) force=0.0 !Attention required
if(j.eq.0.or.j.eq.m) force=0.0 !Attention required
feq(k,i,j)=rho(i,j)*w(k)*(1.0+3.0*t2+4.5*t2*t2-1.50*t1)
f(k,i,j)=omega(i,j)*feq(k,i,j)+(1.-omega(i,j))*f(k,i,j)+force
end do
end do
end do
return
end
!**********************************************!
! Subroutine of collesion for Thermal field
!**********************************************!
subroutine colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),omegat(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
do k=0,8
geq(k,i,j)=th(i,j)*w(k)*(1.0+3.0*(u(i,j)*cx(k)+v(i,j)*cy(k)))
g(k,i,j)=omegat(i,j)*geq(k,i,j)+(1.0-omegat(i,j))*g(k,i,j)
end do
end do
end do
return
end
!**********************************************!
! Subroutine of streaming
!**********************************************!
subroutine streaming(f,n,m)
real f(0:8,0:n,0:m)
! streaming
do j=0,m
do i=n,1,-1 ! RIGHT TO LEFT
f(1,i,j)=f(1,i-1,j)
end do
do i=0,n-1 ! LEFT TO RIGHT
f(3,i,j)=f(3,i+1,j)
end do
end do
do j=m,1,-1 ! TOP TO BOTTOM
do i=0,n
f(2,i,j)=f(2,i,j-1)
end do
do i=n,1,-1
f(5,i,j)=f(5,i-1,j-1)
end do
do i=0,n-1
f(6,i,j)=f(6,i+1,j-1)
end do
end do
do j=0,m-1 !BOTTOM TO TOP
do i=0,n
f(4,i,j)=f(4,i,j+1)
end do
do i=0,n-1
f(7,i,j)=f(7,i+1,j+1)
end do
do i=n,1,-1
f(8,i,j)=f(8,i-1,j+1)
end do
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for flow field
!**********************************************!
subroutine bouncon(f,n,m)
real f(0:8,0:n,0:m)!,feq(0:8,0:n,0:m)
real uin(0:m+m)
! West boundary, Bounce Back
do j=0,m
f(1,0,j)=f(3,0,j)
f(5,0,j)=f(7,0,j)
f(8,0,j)=f(6,0,j)
! East Boundary, Bounce Back
f(3,n,j)=f(1,n,j)
f(6,n,j)=f(8,n,j)
f(7,n,j)=f(5,n,j)
end do
do i=0,n
! South Boundary, Bounce Back
f(2,i,0)=f(4,i,0)
f(5,i,0)=f(7,i,0)
f(6,i,0)=f(8,i,0)
! North Boundary, Bounce Back
f(4,i,m)=f(2,i,m)
f(8,i,m)=f(6,i,m)
f(7,i,m)=f(5,i,m)
end do
!obstacle
! West boundary, Bounce Back
do j=0,m
f(1,n/2,j)=f(3,n/2,j)
f(5,n/2,j)=f(7,n/2,j)
f(8,n/2,j)=f(6,n/2,j)
end do
do i=0,n
! South Boundary, Bounce Back
f(2,i,m/2)=f(4,i,m/2)
f(5,i,m/2)=f(7,i,m/2)
f(6,i,m/2)=f(8,i,m/2)
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for Thermal field
!**********************************************!
subroutine gbouncon(g,tw1,tw2,w,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m)
real w(0:8),tw1,tw2
! Boundary Conditions
! West Boundary Condition, T=1
do j=0,m
g(1,0,j)=tw1*(w(1)+w(3))-g(3,0,j)
g(5,0,j)=tw1*(w(5)+w(7))-g(7,0,j)
g(8,0,j)=tw1*(w(8)+w(6))-g(6,0,j)
end do
! East Boundary Condition, T=0
do j=0,m
g(6,n,j)=tw2*(w(8)+w(6))-g(8,n,j)
g(3,n,j)=tw2*(w(1)+w(3))-g(1,n,j)
g(7,n,j)=tw2*(w(5)+w(7))-g(5,n,j)
end do
! Top Boundary Condition, Part 1, Adiabatic
do i=0,n/4
do k=0,8
g(k,i,m)=g(k,i,m-1)
end do
end do
! Top Boundary Condition, Part 2, T=0
do i=n/4+1,3*n/4
g(4,i,m)=tw2*(w(2)+w(4))-g(2,i,m)
g(7,i,m)=tw2*(w(5)+w(7))-g(5,i,m)
g(8,i,m)=tw2*(w(6)+w(8))-g(6,i,m)
end do
! Top Boundary Condition, Part 3, Adiabatic
do i=3*n/4+1,n
do k=0,8
g(k,i,m)=g(k,i,m-1)
end do
end do
! Bottom Boundary Condition, Adiabatic
do i=0,n
do k=0,8
g(k,i,0)=g(k,i,1)
end do
end do
!Obstacle
! West Boundary Condition, T=1
do j=0,m/2
g(1,n/2,j)=tw1*(w(1)+w(3))-g(3,n/2,j)
g(5,n/2,j)=tw1*(w(5)+w(7))-g(7,n/2,j)
g(8,n/2,j)=tw1*(w(8)+w(6))-g(6,n/2,j)
end do
! Top Boundary Condition, T=1
do i=0,n/2
g(2,i,m/2)=tw1*(w(2)+w(4))-g(4,i,m/2)
g(5,i,m/2)=tw1*(w(5)+w(7))-g(7,i,m/2)
g(6,i,m/2)=tw1*(w(6)+w(8))-g(8,i,m/2)
end do
return
end
!**********************************************!
! Temperature calculation
!**********************************************!
subroutine thcalcu(g,th,n,m)
real g(0:8,0:n,0:m),th(0:n,0:m)
do j=0,m
do i=0,n
thsum=0.0
do k=0,8
thsum=thsum+g(k,i,j)
end do
th(i,j)=thsum
end do
end do
return
end
!**********************************************!
! Subroutine of velocity calculation
!**********************************************!
subroutine uvcalc(f,rho,u,v,n,m,cx,cy)
real f(0:8,0:n,0:m),rho(0:n,0:m),u(0:n,0:m),v(0:n,0:m)
real cx(0:8),cy(0:8)
do j=0,m
do i=0,n
uvsum=0.0
do k=0,8
uvsum=uvsum+f(k,i,j)
end do
rho(i,j)=uvsum
end do
end do
do j=0,m
do i=0,n
usum=0.0
vsum=0.0
do k=0,8
usum=usum+f(k,i,j)*cx(k)
vsum=vsum+f(k,i,j)*cy(k)
end do
u(i,j)=usum/rho(i,j)
v(i,j)=vsum/rho(i,j)
end do
end do
return
end
!**********************************************!
! Subroutine of exporting results
!**********************************************!
subroutine result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
real u(0:n,0:m),v(0:n,0:m),rho(-1:n+1,-1:m+1),th(0:n,0:m)
real strf(0:n,0:m)
real tw1,tw2,dt,thref
real tt(0:n,0:m)
CHARACTER FILOUT1*18
CHARACTER FILOUT2*18
WRITE(FILOUT1,'(4HUVTS,I8,4H.lbm)')kk
WRITE(FILOUT2,'(4HNUAV,I8,4H.txt)')kk
! Streamfunction Calculations
strf(0,0)=0.
do i=0,n
rhoav=0.5*(rho(i-1,0)+rho(i,0))
if(i.ne.0) strf(i,0)=strf(i-1,0)-rhoav*0.5*(v(i-1,0)+v(i,0))
do j=1,m
rhom=0.5*(rho(i,j)+rho(i,j-1))
strf(i,j)=strf(i,j-1)+rhom*0.5*(u(i,j-1)+u(i,j))
end do
end do
! Exporting velocity and thermal fields
OPEN(2,FILE=FILOUT1)
write(2,*)"VARIABLES=X,Y,U,V,th,StreamF,rho"
write(2,*)"ZONE ","I=",n+1,"J=",m+1,",","F=BLOCK"
do j=0,m
write(2,*)(i/float(m),i=0,n)
end do
do j=0,m
write(2,*)(j/float(m),i=0,n)
end do
do j=0,m
write(2,*)(u(i,j),i=0,n)
end do
do j=0,m
write(2,*)(v(i,j),i=0,n)
end do
do j=0,m
write(2,*)(th(i,j),i=0,n)
end do
do j=0,m
write(2,*)(strf(i,j),i=0,n)
end do
do j=0,m
write(2,*)(rho(i,j),i=0,n)
end do
savenumber=0.
return
end
Appendix B
Computer code for forced convection in a channel
parameter (n=400,m=50)
real f(0:8,0:n,0:m),feq(0:8,0:n,0:m),visco(0:n,0:m),omega(0:n,0:m)
real rho(0:n,0:m),rhoo(0:n,0:m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real alpha(0:n,0:m),gbeta(0:n,0:m),omegat(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8)
real u(-1:n+1,-1:m+1),v(-1:n,-1:m+1)
real uin
integer i,j
cx(:)=(/0.0,1.0,0.0,-1.0,0.0,1.0,-1.0,-1.0,1.0/)
cy(:)=(/0.0,0.0,1.0,0.0,-1.0,1.0,1.0,-1.0,-1.0/)
w(:)=(/4./9.,1./9.,1./9.,1./9.,1./9.,1./36.,1./36.,1./36.,1./36./)
!ra=1.e6
!SV=0.0
tw1=1.0
tw2=0
thref=((tw1+tw2)/2)
!dt=(tw1-tw2)
pr=.71
uin=0.02
!**********************************************!
! Setting initial values
!**********************************************!
do i=0,n
do j=0,m
rhoo(i,j)=6.0
visco(i,j)=0.02
u(i,j)=0.
v(i,j)=0.
th(i,j)=0.5
end do
end do
!**********************************************!
! setting LBM solution parameters
!**********************************************!
do i=0,n
do j=0,m
alpha(i,j)=visco(i,j)/pr
rho(i,j)=rhoo(i,j)
end do
end do
do i=0,n
do j=0,m
omega(i,j)=1./(3.*visco(i,j)+0.5)
end do
end do
do i=0,n
do j=0,m
omegat(i,j)=1./(3.*alpha(i,j)+0.5)
end do
end do
mstep=1 !
savenumber=0
kk=0
!**********************************************!
!main loop
!**********************************************!
!main loop
1 do while (mstep==1) !kk=1,mstep
call collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,visco,thref)
call streaming(f,n,m)
call bouncon(f,n,m,uin)
call uvcalc(f,rho,u,v,n,m,cx,cy)
! -----------------------------!collesion for th
call colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
!streaming for th
call streaming(g,n,m)
call gbouncon(g,tw1,tw2,w,n,m)
call thcalcu(g,th,n,m)
! Showing some parameters to check solution in iterations
print*,kk,"**",th(n/2,m/2),"**",u(n/2,m/2),"**"
! maximum iteration criterion
if (kk==39999) mstep=2
! setting autosave period
if(savenumber==5000) call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
kk=kk+1
savenumber=savenumber+1
errmaxu=0.0
errmaxsc=0.0
END DO
!**********************************************!
! end of the main loop
!**********************************************!
call result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
stop
end
!**********************************************!
! Subroutine of collesion for FLOW field
!**********************************************!
subroutine collesion(u,v,f,feq,rho,omega,w,cx,cy,n,m,th,visco,thref)
real f(0:8,0:n,0:m),omega(0:n,0:m)
real feq(0:8,0:n,0:m),rho(0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),visco(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
t1=u(i,j)*u(i,j)+v(i,j)*v(i,j)
do k=0,8
t2=u(i,j)*cx(k)+v(i,j)*cy(k) !u.c(k)
feq(k,i,j)=rho(i,j)*w(k)*(1.0+3.0*t2+4.5*t2*t2-1.50*t1)
f(k,i,j)=omega(i,j)*feq(k,i,j)+(1.-omega(i,j))*f(k,i,j)
end do
end do
end do
return
end
!**********************************************!
! Subroutine of collesion for Thermal field
!**********************************************!
subroutine colls(u,v,g,geq,th,omegat,w,cx,cy,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m),th(0:n,0:m)
real w(0:8),cx(0:8),cy(0:8),omegat(0:n,0:m)
real u(0:n,0:m),v(0:n,0:m)
do i=0,n
do j=0,m
do k=0,8
geq(k,i,j)=th(i,j)*w(k)*(1.0+3.0*(u(i,j)*cx(k)+v(i,j)*cy(k)))
g(k,i,j)=omegat(i,j)*geq(k,i,j)+(1.0-omegat(i,j))*g(k,i,j)
end do
end do
end do
return
end
!**********************************************!
! Subroutine of streaming
!**********************************************!
subroutine streaming(f,n,m)
real f(0:8,0:n,0:m)
! streaming
do j=0,m
do i=n,1,-1 ! RIGHT TO LEFT
f(1,i,j)=f(1,i-1,j)
end do
do i=0,n-1 ! LEFT TO RIGHT
f(3,i,j)=f(3,i+1,j)
end do
end do
do j=m,1,-1 ! TOP TO BOTTOM
do i=0,n
f(2,i,j)=f(2,i,j-1)
end do
do i=n,1,-1
f(5,i,j)=f(5,i-1,j-1)
end do
do i=0,n-1
f(6,i,j)=f(6,i+1,j-1)
end do
end do
do j=0,m-1 !BOTTOM TO TOP
do i=0,n
f(4,i,j)=f(4,i,j+1)
end do
do i=0,n-1
f(7,i,j)=f(7,i+1,j+1)
end do
do i=n,1,-1
f(8,i,j)=f(8,i-1,j+1)
end do
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for flow field
!**********************************************!
subroutine bouncon(f,n,m,uin)
real f(0:8,0:n,0:m)!,feq(0:8,0:n,0:m)
real uin
! West boundary, inlet, u=uin
do j=0,m
rhow=(f(0,0,j)+f(2,0,j)+f(4,0,j)+2.*(f(3,0,j)+f(6,0,j)+f(7,0,j)))/(1.-uin)
f(1,0,j)=f(3,0,j)+2.*rhow*uin/3
f(5,0,j)=f(7,0,j)-(f(2,0,j)-f(4,0,j))/2.+rhow*uin/6.
f(8,0,j)=f(6,0,j)+(f(2,0,j)-f(4,0,j))/2.+rhow*uin/6.
! East Boundary, open boundary
f(3,n,j)=4*f(3,n-1,j)/3-f(3,n-2,j)/3
f(6,n,j)=4*f(6,n-1,j)/3-f(6,n-2,j)/3
f(7,n,j)=4*f(7,n-1,j)/3-f(7,n-2,j)/3
end do
do i=0,n
! South Boundary, Bounce Back
f(2,i,0)=f(4,i,0)
f(5,i,0)=f(7,i,0)
f(6,i,0)=f(8,i,0)
! North Boundary, Bounce Back
f(4,i,m)=f(2,i,m)
f(8,i,m)=f(6,i,m)
f(7,i,m)=f(5,i,m)
end do
!obstacle 1
! East boundary, Bounce Back
do j=0,m/2
f(3,n/4,j)=f(1,n/4,j)
f(6,n/4,j)=f(8,n/4,j)
f(7,n/4,j)=f(5,n/4,j)
! West boundary, Bounce Back
f(1,n/4+m/2,j)=f(3,n/4+m/2,j)
f(5,n/4+m/2,j)=f(7,n/4+m/2,j)
f(8,n/4+m/2,j)=f(6,n/4+m/2,j)
end do
do i=n/4,n/4+m/2
! North Boundary, Bounce Back
f(2,i,m/2)=f(4,i,m/2)
f(5,i,m/2)=f(7,i,m/2)
f(6,i,m/2)=f(8,i,m/2)
end do
!obstacle 2
! East boundary, Bounce Back
do j=m/2,m
f(3,n/2,j)=f(1,n/2,j)
f(6,n/2,j)=f(8,n/2,j)
f(7,n/2,j)=f(5,n/2,j)
! West boundary, Bounce Back
f(1,n/2+m/2,j)=f(3,n/2+m/2,j)
f(5,n/2+m/2,j)=f(7,n/2+m/2,j)
f(8,n/2+m/2,j)=f(6,n/2+m/2,j)
end do
do i=n/2,n/2+m/2
! South Boundary, Bounce Back
f(4,i,m/2)=f(2,i,m/2)
f(7,i,m/2)=f(5,i,m/2)
f(8,i,m/2)=f(6,i,m/2)
end do
return
end
!**********************************************!
! Subroutine of Boundary condition for Thermal field
!**********************************************!
subroutine gbouncon(g,tw1,tw2,w,n,m)
real g(0:8,0:n,0:m),geq(0:8,0:n,0:m)
real w(0:8),tw1,tw2
tw3=0.25*tw1
tw4=0.5*tw1
! Boundary Conditions
! West Boundary Condition, T=0.5
do j=0,m
g(1,0,j)=tw4*(w(1)+w(3))-g(3,0,j)
g(5,0,j)=tw4*(w(5)+w(7))-g(7,0,j)
g(8,0,j)=tw4*(w(8)+w(6))-g(6,0,j)
end do
! East Boundary Condition, open boundary
do j=0,m
g(6,n,j)=4*g(6,n-1,j)/3-g(6,n-2,j)/3
g(3,n,j)=4*g(3,n-1,j)/3-g(3,n-2,j)/3
g(7,n,j)=4*g(7,n-1,j)/3-g(7,n-2,j)/3
end do
do i=0,n
do k=0,8
! Top Boundary Condition, Adiabatic
g(k,i,m)=g(k,i,m-1)
! Bottom Boundary Condition, Adiabatic
g(k,i,0)=g(k,i,1)
end do
end do
! Obstacle 1, T=0.25
!Top Boundary Condition, T=0.25
do i=n/4,n/4+m/2
g(2,i,m/2)=tw3*(w(2)+w(4))-g(4,i,m/2)
g(5,i,m/2)=tw3*(w(5)+w(7))-g(7,i,m/2)
g(6,i,m/2)=tw3*(w(6)+w(8))-g(8,i,m/2)
end do
do j=0,m/2
! East Boundary Condition, T=0.25
g(3,n/4,j)=tw3*(w(1)+w(3))-g(1,n/4,j)
g(7,n/4,j)=tw3*(w(5)+w(7))-g(5,n/4,j)
g(6,n/4,j)=tw3*(w(8)+w(6))-g(8,n/4,j)
! West Boundary Condition, T=0.25
g(1,n/4+m/2,j)=tw3*(w(1)+w(3))-g(3,n/4+m/2,j)
g(5,n/4+m/2,j)=tw3*(w(5)+w(7))-g(7,n/4+m/2,j)
g(8,n/4+m/2,j)=tw3*(w(8)+w(6))-g(6,n/4+m/2,j)
end do
! Top Boundary Condition, T=1
! Obstacle 2, T=1.0
!Bottom Boundary Condition, T=1.0
do i=n/2,n/2+m/2
g(4,i,m)=tw1*(w(2)+w(4))-g(2,i,m)
g(7,i,m)=tw1*(w(5)+w(7))-g(5,i,m)
g(8,i,m)=tw1*(w(6)+w(8))-g(6,i,m)
end do
do j=m/2,m
! East Boundary Condition, T=1.0
g(3,n/2,j)=tw1*(w(1)+w(3))-g(1,n/2,j)
g(7,n/2,j)=tw1*(w(5)+w(7))-g(5,n/2,j)
g(6,n/2,j)=tw1*(w(8)+w(6))-g(8,n/2,j)
! West Boundary Condition, T=1.0
g(1,n/2+m/2,j)=tw1*(w(1)+w(3))-g(3,n/2+m/2,j)
g(5,n/2+m/2,j)=tw1*(w(5)+w(7))-g(7,n/2+m/2,j)
g(8,n/2+m/2,j)=tw1*(w(8)+w(6))-g(6,n/2+m/2,j)
end do
return
end
!**********************************************!
! Temperature calculation
!**********************************************!
subroutine thcalcu(g,th,n,m)
real g(0:8,0:n,0:m),th(0:n,0:m)
do j=0,m
do i=0,n
thsum=0.0
do k=0,8
thsum=thsum+g(k,i,j)
end do
th(i,j)=thsum
if (i>=n/4.and.i<=(n/4+m/2).and.j<=m/2) th(i,j)=0.25
if (i>=n/2.and.i<=(n/2+m/2).and.j>=m/2) th(i,j)=1.0
end do
end do
return
end
!**********************************************!
! Subroutine of velocity calculation
!**********************************************!
subroutine uvcalc(f,rho,u,v,n,m,cx,cy)
real f(0:8,0:n,0:m),rho(0:n,0:m),u(0:n,0:m),v(0:n,0:m)
real cx(0:8),cy(0:8)
do j=0,m
do i=0,n
uvsum=0.0
do k=0,8
uvsum=uvsum+f(k,i,j)
end do
rho(i,j)=uvsum
end do
end do
do j=0,m
do i=0,n
usum=0.0
vsum=0.0
do k=0,8
usum=usum+f(k,i,j)*cx(k)
vsum=vsum+f(k,i,j)*cy(k)
end do
u(i,j)=usum/rho(i,j)
v(i,j)=vsum/rho(i,j)
if (i>=n/4.and.i<=(n/4+m/2).and.j<=m/2) u(i,j)=0
if (i>=n/4.and.i<=(n/4+m/2).and.j<=m/2) v(i,j)=0
if (i>=n/2.and.i<=(n/2+m/2).and.j>=m/2) u(i,j)=0
if (i>=n/2.and.i<=(n/2+m/2).and.j>=m/2) v(i,j)=0
end do
end do
return
end
!**********************************************!
! Subroutine of exporting results
!**********************************************!
subroutine result(u,v,th,rho,n,m,kk,ra,pr,savenumber,tw1,tw2,dt,thref)
real u(0:n,0:m),v(0:n,0:m),rho(-1:n+1,-1:m+1),th(0:n,0:m)
real strf(0:n,0:m)
real tw1,tw2,dt,thref
real tt(0:n,0:m)
CHARACTER FILOUT1*18
WRITE(FILOUT1,'(4HUVTS,I8,4H.lbm)')kk
! Streamfunction Calculations
strf(0,0)=0.
do i=0,n
rhoav=0.5*(rho(i-1,0)+rho(i,0))
if(i.ne.0) strf(i,0)=strf(i-1,0)-rhoav*0.5*(v(i-1,0)+v(i,0))
do j=1,m
rhom=0.5*(rho(i,j)+rho(i,j-1))
strf(i,j)=strf(i,j-1)+rhom*0.5*(u(i,j-1)+u(i,j))
end do
end do
! Exporting velocity and thermal fields
OPEN(2,FILE=FILOUT1)
write(2,*)"VARIABLES=X,Y,U,V,th,StreamF,rho"
write(2,*)"ZONE ","I=",n+1,"J=",m+1,",","F=BLOCK"
do j=0,m
write(2,*)(i/float(m),i=0,n)
end do
do j=0,m
write(2,*)(j/float(m),i=0,n)
end do
do j=0,m
write(2,*)(u(i,j),i=0,n)
end do
do j=0,m
write(2,*)(v(i,j),i=0,n)
end do
do j=0,m
write(2,*)(th(i,j),i=0,n)
end do
do j=0,m
write(2,*)(strf(i,j),i=0,n)
end do
do j=0,m
write(2,*)(rho(i,j),i=0,n)
end do
savenumber=0.
return
end
Chapter 19
Multigene genetic programming and its
various applications
Majid Niazkar
Department of Agricultural and Environmental Sciences - Production, Landscape, Agroenergy, University of Milan, Milan, Italy
1. Introduction
The emergence of machine learning and artificial intelligence (AI) techniques has an inevitable influence on every field of research. In essence, these methods owe their broad spectrum of possible applications to the fact that they do not require knowledge of the physical background of the problem under investigation. Instead, they work with data representing the relationship(s) between the problem state variables in order to capture any trend or relation. Therefore, machine learning methods can be applied to estimation-type problems in various fields, while the major focus of these models is on data analysis. Genetic programming (GP), as one of the AI techniques, has attracted the attention of many researchers in different fields. It has several modified versions, one of which is multigene genetic programming (MGGP). In the absence of a comprehensive review of MGGP applications, this chapter is devoted not only to introducing MGGP as one of the modified GP versions but also to reviewing its applications in many fields of research. As an example, MGGP is used to tackle a problem in water resources. Finally, some future trends for applying MGGP to water-related fields are suggested.
2. Genetic programming and its variants
Genetic algorithm (GA) is a well-established metaheuristic optimization algorithm. It basically adopts the principles of evolution, reproduction, and survival of the best gene(s) among a randomly generated population. In essence, the five main steps of GA are the generation of a random initial population, fitness evaluation, selection, crossover, and mutation. Since GA works adequately as a search-based optimization algorithm, it has been successfully applied to numerous problems in various fields, including water resources management (Nicklow et al., 2010). Despite the wide range of GA applications in practice, many practitioners still need a powerful tool to capture the nonlinear behavior of physical systems (Nedjah et al., 2006). In this regard, an improved version of GA, named GP, was proposed (Koza, 1992) not only to address this shortcoming but also to utilize the substantial power of GA (Kouzehgar et al., 2021).
Generally, GP is an AI technique with a flexible tree-like structure (Niazkar and Zakwan, 2021a). It basically exploits GA as a search engine to seek a suitable relationship that converts a set of input data into output data (Niazkar, 2020). In other words, GP tackles this problem by solving an optimization problem with GA, in which the objective function minimizes (or maximizes) the error between the estimated and observed output data. This problem-solving process becomes feasible by considering a tree-based architecture for each equation. A symbolic individual defined in GP is illustrated in Fig. 1, where X1 and X2 are the input variables and Y is an output variable. As shown, Fig. 1 introduces the three parts of a custom equation in GP: the root node, function nodes, and terminal nodes. Hence, each individual is a gene or an equation in which the functions and fixed coefficients are saved in a tree format.
The main steps of the process conducted by GP consist of initialization, selection, reproduction, and termination (Niazkar et al., 2019a). In the first step, GP commences by generating a random initial population, combining the functions and the terminal set (input variables and constant coefficients) randomly. In essence, the initialization process specifies a certain number of individuals with random shapes, nodal functions, and terminals (Garg and Tai, 2014). The functions used in GP include arithmetic operations, trigonometric functions, exponential and logarithmic functions, the square function, Boolean operators, the protected square root, and the protected natural logarithm. The last two functions
return zero and a very small number, respectively, if a negative value is inserted in the square root or natural logarithm (Marini and Conversi, 2012).
FIG. 1 Different parts of a symbolic individual in genetic programming: the root node (+), the function nodes (× and sin), and the terminal nodes (X1, X2, and the constant 3), which together represent Y = 3X1 + sin(X2), where X1 and X2 are input variables and Y is an output variable.
Each individual in the created population is a random equation and is counted as a possible candidate if it describes the relation between the input and output variables adequately. Since the initial population does not necessarily contain the best-fitted relation between the input and output data, GP needs to modify this population and search for the best equations. This modification can be interpreted as changing the shape and/or the information in each node of the individuals in the population. For this purpose, the population is subjected to the GA operators, including selection, crossover, and mutation, in the second step (Niazkar, 2019). The evolutionary process is essentially applied not only to create but also to select individuals with a better fitness value. A schematic view of the crossover and mutation processes conducted by GP is depicted in Fig. 2. As shown, the former process basically exchanges random parts of two parent genes, while the latter alters a random part of a parent gene to develop a new offspring gene. The reproduction process is applied to each generation until the termination criterion, which is either a maximum number of generations or a threshold error, is met (Garg and Tai, 2014). The termination step is the final step, in which a relation with the desirable accuracy is acquired.
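To make the tree representation described above concrete, the following minimal Python sketch (illustrative only, not part of any GP software such as GPTIPS; all names are assumptions of this example) encodes the individual of Fig. 1, Y = 3X1 + sin(X2), as nested nodes and evaluates it, together with the protected square root and logarithm mentioned above:
import math

def protected_sqrt(x):
    # protected square root: returns 0 for negative arguments, as described in the text
    return math.sqrt(x) if x >= 0 else 0.0

def protected_log(x):
    # protected natural logarithm: returns the log of a very small number for non-positive arguments
    return math.log(x) if x > 0 else math.log(1e-10)

# A GP individual is a tree: internal nodes hold functions, leaves hold terminals
# (input variables or constants). Nodes are ('const', value), ('var', name), or
# (function, child1, child2, ...).
def evaluate(node, inputs):
    kind = node[0]
    if kind == 'const':
        return node[1]
    if kind == 'var':
        return inputs[node[1]]
    children = [evaluate(child, inputs) for child in node[1:]]
    return kind(*children)

# The symbolic individual of Fig. 1: Y = 3*X1 + sin(X2)
individual = (lambda a, b: a + b,                  # root node (+)
              (lambda a, b: a * b,                 # function node (multiplication)
               ('const', 3.0), ('var', 'X1')),     # terminal nodes
              (math.sin, ('var', 'X2')))           # function node (sin) and its terminal

print(evaluate(individual, {'X1': 2.0, 'X2': 0.5}))   # 3*2 + sin(0.5) = 6.479...
print(protected_sqrt(-4.0), protected_log(0.0))       # protected functions handle invalid arguments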
Similar to GA, several versions of GP were developed to enhance the characteristics of this estimation tool. Some of
these versions are classical or monolithic genetic programming (GP), linear genetic programming (LGP), traceless genetic
programming (TGP), gene expression programming (GEP), grammar-based genetic programming (GGP), and multigene
genetic programming (MGGP). The next section introduces MGGP.
3. An introduction to multigene genetic programming
In the traditional GP, each individual or chromosome has a single tree or gene, whereas a typical individual in MGGP can
consist of more than one gene or tree, which is the key difference between GP and MGGP.
FIG. 2 Crossover and mutation processes of genetic programming: (a) crossover exchanges randomly selected subtrees between two parent genes to produce new offspring genes; (b) mutation alters a randomly selected subtree of a parent gene to produce a new offspring gene.
Another difference between
MGGP and GP is the evolutionary process. To be more specific, MGGP exploits two kinds of crossover processes: two-point high-level crossover and low-level crossover (Searson et al., 2010). In the former, genes of two multigene parents (two individuals) can be exchanged to develop two new offspring. This process also enables adding a gene to, or removing a gene from, an individual. After applying this process, new individuals are checked to ensure that they have no more genes than the maximum number of genes allowed in one individual, which is a controlling parameter specified by the user. If any individual exceeds this limit, it is removed from the population. Furthermore, the low-level crossover is similar to the routine subtree crossover process in GP. In this process, one of the genes in a parent individual is selected randomly and the crossover is conducted within that gene, which provides a new offspring individual. Also, MGGP employs several mutation methods in addition to the standard mutation process of GP, which may be counted as another difference between GP and MGGP.
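As a rough sketch of the two-point high-level crossover just described (an illustration under simplified, assumed representations, not the GPTIPS implementation), the following Python function exchanges randomly chosen gene sublists between two multigene parents and discards any offspring that exceeds the maximum allowed number of genes:
import random

def high_level_crossover(parent1, parent2, max_genes, rng=random):
    # parent1 and parent2 are lists of genes (each gene would itself be a tree);
    # two crossover points are picked in each parent and the enclosed genes are swapped
    i1, j1 = sorted(rng.sample(range(len(parent1) + 1), 2))
    i2, j2 = sorted(rng.sample(range(len(parent2) + 1), 2))
    child1 = parent1[:i1] + parent2[i2:j2] + parent1[j1:]
    child2 = parent2[:i2] + parent1[i1:j1] + parent2[j2:]
    # offspring may gain or lose genes; those violating the limit are removed,
    # mirroring the gene-count check described in the text
    return [child for child in (child1, child2) if 0 < len(child) <= max_genes]

# toy example with genes represented by labels
print(high_level_crossover(['g1', 'g2', 'g3'], ['h1', 'h2'], max_genes=4))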
Basically, an individual in MGGP, or multibranch GP, contains several genes or trees, and each gene in an individual is multiplied by a gene weight. Since MGGP utilizes either linear or nonlinear regression methods to determine the optimum values of the gene coefficients, they are also called regression coefficients (Mehr and Nourani, 2018). When the regression coefficients are obtained by a least-squares model, the corresponding MGGP model is called a pseudo-linear model, which is nevertheless capable of capturing the nonlinear behavior of systems. In this version of MGGP, the linear algebraic summation of the weighted genes of a single individual and a constant term, invariably named the bias or noise, gives the equation of the corresponding individual. For better clarification, Fig. 3 presents a schematic view of a single individual in MGGP that consists of two genes, where d1 and d2 are the gene coefficients and d0 is the bias.
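The pseudo-linear step can be sketched as follows: once the trees of an individual have been evaluated on the training data, the bias d0 and the gene coefficients d1, d2, ... are obtained by ordinary least squares. The minimal NumPy example below (with an invented synthetic target; the gene expressions are those of Fig. 3) is only meant to illustrate the idea:
import numpy as np

def fit_gene_coefficients(gene_outputs, y):
    # gene_outputs: (n_samples, n_genes) array, one column per gene-tree output
    # y: (n_samples,) array of observed target values
    # returns [d0, d1, ..., dG], where d0 is the bias and d1..dG the gene weights
    X = np.column_stack([np.ones(len(y)), gene_outputs])   # prepend the bias column
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients

# synthetic data built from the two genes of Fig. 3
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)
gene1 = 0.5 * x1 + np.cos(x2)
gene2 = 8.0 * np.sin(x1) + 5.0 * x2
target = 1.0 + 2.0 * gene1 - 0.5 * gene2                   # known coefficients for the test
print(fit_gene_coefficients(np.column_stack([gene1, gene2]), target))  # approx. [1.0, 2.0, -0.5]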
The flowchart of the pseudo-linear MGGP is depicted in Fig. 4. In a bid to improve this model, it can be combined with other approaches to work as a hybrid model (Ghorbani et al., 2018). These approaches can be used to modify different processes, including the determination of the optimum values of the gene coefficients and the process of selecting appropriate models. For instance, symbolic regression was replaced by a stepwise regression approach to improve the former process (Garg and Tai, 2014; Garg and Lam, 2015). Moreover, artificial neural networks, support vector machines (Garg and Tai, 2014), a Bayesian classifier (Garg and Lam, 2015), and the Pareto optimal method (Riahi-Madvar et al., 2019) were used to improve the selection of the models developed by MGGP.
In comparison with the classical GP, MGGP is supposed to exploit smaller trees (Garg and Tai, 2014; Mehr and Nourani, 2017). As a result, MGGP is expected to offer simpler models than those developed by classical GP (Searson, 2015). Furthermore, using more genes in MGGP makes it suitable for dealing with problems of higher complexity. In addition, MGGP and GP benefit not only from the random search of GA but also from the flexible tree architecture. The latter enables the successful implementation of the various built-in functions and constant coefficients required to develop the best-fitted expression describing the relation between any sets of input and output data.
Since GA is a zero-order optimization algorithm, the initialization process in MGGP is conducted without the need to assume the functions and terminals in advance. This advantage makes MGGP a more appealing tool in comparison with nonlinear regression models because the structure, functions, and constant coefficients of the relation between the input and output data do not need to be specified in advance (Niazkar, 2021). To be more precise, the type of equation, which is unknown at the beginning of the process in most cases, must be specified in nonlinear regression models. This requirement confines the relation under investigation to a very limited structure. However, MGGP develops the relations without a shape limitation, which is indeed a helpful characteristic when it comes to searching for the relationships between two sets of data (Lee and Suh, 2020). Finally, there is indeed a trade-off between the precision and complexity of the relations
that can determine the output data based on the input data, while such balance can be addressed by parameters controlling the process, which are presented in the next section.
FIG. 3 A schematic view of a two-gene individual in multigene genetic programming, representing Y = d0 + d1[0.5X1 + cos(X2)] + d2[8sin(X1) + 5X2], where X1 and X2 are input variables, Y is an output variable, d0 is the bias, and d1 and d2 are the gene coefficients.
FIG. 4 Flowchart of the pseudo-linear multigene genetic programming: insert the input data; specify the MGGP controlling parameters, particularly the maximum number of genes allowed in an individual (Gmax) and the maximum depth of trees (dmax); create the initial population randomly and check that each individual satisfies Gmax and dmax; find the gene coefficients for each individual using regression models and evaluate the fitness function; if the termination criteria are not satisfied, apply the genetic operators, create a new generation, and repeat the checks, coefficient fitting, and fitness evaluation; once the termination criteria are satisfied, evaluate the performance of the MGGP model on new data that were not used for developing the model.
4. Main controlling parameters of MGGP
The basic version of MGGP has several types of parameters that control each and every part of the processes conducted by MGGP. The different groups of controlling parameters in a typical MGGP model include (1) run control parameters, (2) fitness parameters, (3) selection parameters, (4) terminal nodes, (5) function nodes, (6) genetic operators, (7) tree build parameters, (8) multigene parameters, and (9) mutation settings (Searson, 2009). Fig. 5 lists the MGGP parameters in the aforementioned groups.
Among the various controlling parameters shown in Fig. 5, two parameters, which can only be set to nonzero integer values, have a profound impact on the MGGP results. The first is the maximum number of genes allowed in an individual. When this multigene parameter is set to one, the MGGP model turns into the classical GP model, while it needs to be more than one for the model to serve as an MGGP model. Furthermore, it is mainly used as a criterion to dismiss individuals with more than the allowed number of genes. Such individuals are generated either randomly in the very first population or by evolutionary processes such as crossover and mutation. The second crucial parameter is the maximum depth of trees in MGGP, which is clearly a tree-build parameter (Searson, 2009). It is also used as an upper limit on the number of function and terminal nodes in each gene. Obviously, the larger the values assumed for these two parameters, the more terms the
relation can theoretically have. Therefore, the trade-off between the complexity and accuracy of the MGGP results can be controlled by the user, mostly through these two parameters.
FIG. 5 Controlling parameters of multigene genetic programming: run control parameters (population size, number of generations to run); fitness parameters (termination threshold value, whether the fitness function is minimized or maximized); selection parameters (tournament size, elitism); terminal nodes (the range from which constant nodes are generated with uniform probability, the number of input nodes); function nodes (cell array of the user-defined functions); genetic operators (probabilities of GP tree mutation, crossover, and direct copy); tree build parameters (maximum depth of trees, maximum number of nodes per tree, maximum depth of subtrees created by mutation); multigene parameters (maximum number of genes per individual); and mutation settings (standard deviation of the Gaussian perturbation applied to a randomly selected constant during mutation).
In one of the open-source MATLAB-based implementations of MGGP, named GPTIPS (Searson, 2009; Searson et al., 2010), the fitness function is the root mean square of the errors between the estimated and observed output data. This software has been widely used for applying MGGP to various problems in the literature (Garg et al., 2014a; Garg and Lam, 2015; Mehr and Kahya, 2017; Safari and Mehr, 2018; Zakwan and Niazkar, 2021; Lee and Suh, 2020). The GPTIPS default values of some of the MGGP controlling parameters mentioned in Fig. 5 are given in Table 1. As shown, the default values of the maximum tree depth and the maximum number of genes per single individual are 6 and 1, respectively (Searson, 2009). Hence, the default version of GPTIPS works as a GP model, since MGGP models must have more than one gene in each individual; by specifying an integer value greater than one for the maximum number of genes allowed in a chromosome, GPTIPS works as an MGGP model. According to Table 1, the sum of the parameters describing the probabilities of the genetic operators should be equal to one (Searson et al., 2010). Moreover, Table 1 indicates that the population size and the maximum number of generations to run are both 100. Finally, the MGGP controlling parameters, particularly the maximum depth of trees and the maximum number of genes allowed in an individual, may be selected by adopting either a trial-and-error process (Garg and Lam, 2015) or a sensitivity analysis for a specific problem, because they may have an inevitable impact on the final results obtained by MGGP.
TABLE 1 Default values of some of the controlling parameters of multigene genetic programming (GPTIPS defaults).
Population size: 100
Number of generations: 100
Maximum number of genes allowed in an individual: 1
Maximum depth of trees: 6
Maximum depth of subtrees created by mutation: 6
Tournament size: 2
Elitism (the fraction of the population copied directly to the next generation without modification): 0.05
Probability of GP tree mutation: 0.1
Probability of GP tree crossover: 0.85
Probability of GP tree direct copy: 0.05
Standard deviation of the Gaussian perturbation applied in mutation to a randomly selected constant: 0.1
Range from which constant nodes are generated with uniform probability: [-10, 10]
5. A review on MGGP applications
MGGP has been utilized to solve various problems in different fields of research. These fields include aerospace science (De Giorgi and Quarta, 2020), biomedicine (Javed et al., 2016), medicine and global health (Hasan et al., 2016; Niazkar and Niazkar, 2020), chemical engineering (Esmaeili and Mohebbi, 2017), petroleum engineering (Kaydani et al., 2014), industrial engineering (Garg and Lam, 2015), electrical engineering (Pedrino et al., 2019), mechanical engineering (Garg et al., 2014c), urban engineering (Mousavi et al., 2015; Beura and Bhuyan, 2018), geotechnical engineering (Gandomi and Alavi, 2012a; Muduli and Das, 2014; Garg et al., 2014b; Muduli and Das, 2015; Chen et al., 2016), structural engineering (Gandomi and Alavi, 2012b; Mohammadi Bayazidi et al., 2014; Hoang et al., 2017), and environmental engineering (Pandey et al., 2015). Even though GP and several of its variants, such as GEP, have been applied to various problems in the water resources field of study (Mehr et al., 2018; Mohammad-Azari et al., 2020), MGGP has so far been used for only a few problems in this field. In the following, a chronological literature review of applying MGGP in the water resources field is presented:
Kumar et al. (2014) applied MGGP to propose models for predicting sediment transport, total bed load, and incipient motion. The results estimated by the MGGP models for the three sediment problems were found to have high accuracy in comparison with several methods available in the literature (Kumar et al., 2014). Garg et al. (2014a) compared support vector regression and artificial neural network with MGGP for predicting stress-dependent soil water retention curves. The comparison revealed that the MGGP estimations were more accurate than those of the other two AI models (Garg et al., 2014a). Mehr and Nourani (2017) proposed a moving average filtering MGGP (MA-MGGP) technique for estimating single- and multi-day ahead runoff. This hybrid method employs moving average filtering as a data preprocessing approach, while it uses MGGP as a prediction tool and a Pareto front to find the optimum models developed by MGGP. The performance of the hybrid MGGP-based model was compared with those of the classical GP, MGGP, and a multilayer perceptron, and the results showed the promising improvement made by the MA-MGGP model. Mehr and Kahya (2017) suggested the MA-MGGP approach to predict daily streamflow. They compared the performances of MA-MGGP with those of monolithic GP and conventional multilinear regression prediction models using the daily streamflow records observed at a single station on Senoz Stream, Turkey. The comparison indicated that the MA-MGGP model not only developed a parsimonious model for estimating streamflow but also enables implementing human insight to explore the top MA-MGGP solutions for further analysis (Mehr and Kahya, 2017). Hadi and Tombul (2018) utilized MGGP to select the best scales for forecasting monthly streamflow. Since MGGP, or any other AI technique, may not capture the seasonality of the data, a continuous wavelet transformation analysis was conducted to overcome the stationarity problem before applying MGGP. They estimated the 1-month-ahead downstream flow of a basin located in the southeast of Turkey, using downstream flow, upstream flow, rainfall, temperature, and potential evapotranspiration with associated lags as input variables. The results demonstrated that the proposed model improved the predicted streamflow and the peak values. This improvement was achieved because several scales, which were appropriate for capturing the seasonality and irregularity of the data, were used, while the inclusion of hydrological and meteorological variables as input data enhances the ability of monthly streamflow forecasting (Hadi and Tombul, 2018). Safari and Mehr (2018) proposed the Pareto-optimal MGGP to predict the particle Froude number in large sewers for scenarios assuming sediment deposition on the bed. Using four sets of data, it was found that the MGGP models yielded better estimations of the particle Froude number in comparison with the conventional regression models. Since the MGGP models used a lower number of input variables than the empirical models available in the literature, they may be counted as a parsimonious alternative for the design of self-cleansing large sewers (Safari and Mehr, 2018). Mehr and Nourani (2018) combined the season algorithm with MGGP to develop a rainfall-runoff model to estimate single-, two-, and three-day ahead streamflow at Haldizen Catchment, Trabzon, Turkey. The results of the case study indicate that MGGP may
capture the underlying structure of the rainfall-runoff process slightly better than the monolithic GP (Mehr and Nourani, 2018). Eray et al. (2018) compared MGGP with the dynamic evolving neural-fuzzy inference system (DENFIS), GP, and the Hargreaves-Samani empirical equation for modeling monthly pan evaporation. The input data included minimum temperature, maximum temperature, solar radiation, relative humidity, and wind speed, all gathered at the Antakya and Antalya stations in the Mediterranean region of Turkey. The comparison indicated that slightly better estimations than those of MGGP and DENFIS were obtained by GP for the Antakya station, while DENFIS reached better predictions for the Antalya station. Additionally, considering a periodicity input in the MGGP and DENFIS models enhanced the accuracy of the estimations (Eray et al., 2018). Ghorbani et al. (2018) integrated chaos theory with MGGP to develop a hybrid model for river flow forecasting. The hybrid model was compared with MGGP and a local prediction model for estimating daily flow at four stations. The results indicated that the hybrid MGGP outperformed the two other models (Ghorbani et al., 2018). Riahi-Madvar et al. (2019) applied the Pareto-optimal MGGP model to estimate longitudinal dispersion coefficients using 503 data sets gathered from natural streams around the world. The proposed model was compared with eight equations present in the literature, and the comparison indicated that the hybrid MGGP-based model provides simpler and more accurate equations than other available ones (Riahi-Madvar et al., 2019). Lee and Suh (2020) proposed MGGP to develop stability equations for rock armor and Tetrapods. They compared the MGGP models with available empirical formulas and an artificial neural network, and the MGGP model outperformed the others (Lee and Suh, 2020). More recently, Niazkar and Zakwan (2021b) combined MGGP with the generalized reduced gradient (GRG) method and introduced the hybrid MGGP-GRG. They applied it to develop stage-discharge relations for both single-value and loop rating curves. They compared the performances of the conventional method, an evolutionary algorithm, the modified honey bee mating optimization (MHBMO) algorithm, artificial neural network (ANN), MGGP, and the hybrid MGGP-GRG technique for developing the rating curves of eight different rivers. The obtained results demonstrated that the hybrid MGGP-GRG model was ranked the best method for developing single-value and loop rating curves.
6. Future trends of MGGP applications
Based on the literature review conducted in the previous section, MGGP has already been applied to several problems in the water resources field. These topics include streamflow prediction, rainfall-runoff modeling, sediment transport modeling, predicting the longitudinal dispersion coefficient, estimating pan evaporation, developing soil water retention curves, and developing stability equations for rock armor and Tetrapods. Since MGGP has been successfully applied to numerous problems in different fields, further application of MGGP to water resources problems is suggested for future studies. Based on the current literature, MGGP can be used as a prediction tool for many problems for which AI models other than MGGP have already been used. Several examples of these problems include developing bed roughness predictors (Giustolisi, 2004; Niazkar et al., 2019a), design of open channels (Niazkar, 2020), predicting scour depth around piers (Niazkar and Afzali, 2018), estimating water surface profiles (Niazkar et al., 2021), and hydrological flood and stage routing (Sivapragasam et al., 2008; Fallah-Mehdipour et al., 2013).
7. A case study of the MGGP application
The stage-discharge rating curve is a widely used diagram that presents the variation of the water depth (G) in terms of the discharge (Q) values. In essence, developing a rating curve is essential when the discharge in a natural stream is not measured directly, for whatever reason. Such development requires a historical database of stage and discharge values that were measured concurrently. This database is basically used for estimating the parameters of a rating curve model. According to the literature, various methods have been utilized for the parameter estimation of rating curve models.
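For illustration, a power-law rating curve of the form Q = a(G - b)^c (the form of the conventional model in Table 3) can be fitted to concurrent stage-discharge observations with a generic nonlinear least-squares routine. The sketch below uses SciPy with a handful of synthetic stage-discharge pairs as placeholders for real gauging records, and it is not necessarily the estimation procedure used in the studies cited here:
import numpy as np
from scipy.optimize import curve_fit

def rating_curve(G, a, b, c):
    # power-law stage-discharge relation Q = a*(G - b)**c
    return a * (G - b) ** c

# placeholder stage (m) and discharge (m3/s) pairs; a real application would use
# the concurrent observations described in the text
G_obs = np.array([1.79, 1.85, 1.95, 2.10, 2.30, 2.61])
Q_obs = np.array([17.7, 35.0, 70.0, 140.0, 280.0, 484.2])

params, _ = curve_fit(rating_curve, G_obs, Q_obs, p0=[500.0, 1.6, 2.0],
                      bounds=([0.0, 0.0, 0.5], [2000.0, 1.75, 4.0]))
a, b, c = params
print(f"Q = {a:.2f}(G - {b:.2f})^{c:.2f}")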
In this chapter, 235 data points observed at the Philadelphia gauging site on the Schuylkill River, United States, were exploited as a case study for assessing the performance of MGGP. This dataset was previously used in the literature (Niazkar and Zakwan, 2021b). To be more specific, MGGP with trigonometric functions was previously utilized to determine the stage-discharge relationship of the data, while the same problem is revisited in this chapter using MGGP with the square function. For this purpose, the considered data were divided into two parts, named the training (75% of the total data) and testing (25% of the total data) parts. The former was used to train MGGP, while a comparative analysis was conducted based on the latter. Furthermore, the data division, which is depicted in Fig. 6, is the same as the one considered in the previous study (Niazkar and Zakwan, 2021b). This similarity specifically enables comparing the results with those available in the literature. Additionally, the MGGP parameters were also set based on the previous study (Niazkar and Zakwan, 2021b), and GPTIPS was used for implementing MGGP for this application. Finally, Table 2 shows the maximum, minimum, and average values of the data parameters for both the training and testing parts.
FIG. 6 Training and testing parts of the stage-discharge data: stage (m, roughly 1.65 to 2.65) plotted against discharge (m3/s, roughly 0 to 500).
TABLE 2 Ranges of stage and discharge values for the training and testing data.
Data part | Parameter | Minimum | Maximum | Average
Training | G (m) | 1.79 | 2.61 | 1.94
Training | Q (m3/s) | 17.7 | 484.22 | 73.63
Testing | G (m) | 1.8 | 2.5 | 1.93
Testing | Q (m3/s) | 19.11 | 407.76 | 71.65
In the previous study conducted on this dataset, a few models were developed for estimating discharges from stage values, and four of them (the conventional method, MHBMO, ANN, and MGGP) are selected here for comparison. These models (Eqs. 1-3) and the new MGGP-based model (Eq. 4) are presented in Table 3. The previous MGGP-based model (Eq. 3) contains trigonometric functions, whereas the MGGP-based model (Eq. 4) proposed in this study was developed using the square function. Moreover, Eqs. (1) and (2) provide discharge values directly, whereas the stage and discharge values in Eqs. (3) and (4) are normalized variables. To be more specific, these variables were normalized by Q* = (Q - Qmin)/(Qmax - Qmin), where Q*, Qmin, and Qmax are the normalized, minimum, and maximum discharge values of the dataset, respectively.
The performances of the rating curve models presented in Table 3 were compared using four metrics, which are shown
in Eqs. (5)–(8). They are (1) Sum of Square of Error (SSE), (2) Nash-Sutcliffe efficiency (NE), (3) Mean Absolute Relative
Error (MARE), and (4) Maximum Absolute Relative Error (MXARE):
SSE = Σ (Qoi - Qei)^2 (5)
NE = 1 - [Σ (Qoi - Qei)^2] / [Σ (Qoi - (1/N) Σ Qoi)^2] (6)
MARE = (1/N) Σ |Qoi - Qei| / Qoi (7)
MXARE = max |Qoi - Qei| / Qoi, for i = 1, ..., N (8)
where the summations run over i = 1, ..., N, and Qoi and Qei are the ith observed and estimated discharges for the dataset, respectively.
TABLE 3 Stage-discharge relations of the compared rating curve models.
Model | Eq. | Stage-discharge relation
Conventional method | 1 | Q = 578.07(G - 1.61)^1.98
MHBMO | 2 | Q = 543.44(G - 1.68)^1.57
MGGP (trigonometric functions) | 3 | Q* = 0.05899cos(11.78y*) - 0.07737cos[9.779y*cos(y*)] + 7.196sin(0.8161y*) - 0.01603sin(15.46y*) + 36.2cos(y*)/(5.937y* + 6.1) - 5.953
MGGP (square function) | 4 | Q* = 0.4256y* + 0.4256y*^2 - 39.52y*^4 - 3.274y*^6 - 266.6y*^4/(y* - 7.282) + 0.003953y*^4/(y* - 0.5901) + 0.4879y*^2 - 0.0002393
(In Eqs. 3 and 4, Q* and y* denote the normalized discharge and stage, respectively.)
TABLE 4 Comparison of the performances of different rating curve models. The conventional method, MHBMO, ANN, and MGGP (trigonometric functions) are from the previous study; MGGP (square function) is from this study.
Data part | Criteria | Conventional method | MHBMO | ANN | MGGP (trigonometric functions) | MGGP (square function)
Training | SSE | 19621.30 | 511.32 | 4321.46 | 398.97 | 421.41
Training | NE | 0.98 | 1.00 | 1.00 | 1.00 | 1.00
Training | MARE | 0.04 | 0.02 | 0.02 | 0.02 | 0.02
Training | MXARE | 0.19 | 0.08 | 0.07 | 0.08 | 0.07
Testing | SSE | 3741.23 | 183.60 | 184.38 | 113.60 | 127.83
Testing | NE | 0.98 | 1.00 | 0.99 | 1.00 | 1.00
Testing | MARE | 0.04 | 0.02 | 0.02 | 0.02 | 0.02
Testing | MXARE | 0.13 | 0.06 | 0.06 | 0.06 | 0.06
Table 4 compares the performances of the five rating curve models. As shown, the new MGGP-based model outperforms the conventional method, MHBMO, and ANN in terms of SSE and NE for both the training and testing data. In particular, the MGGP with the square function improves the SSE values of the conventional method, MHBMO, and ANN by 97.8%, 17.6%, and 90.2% for the training data, respectively, while these improvement percentages are 96.6%, 30.4%, and 30.7% for the testing data, respectively. Although the MGGP-based model with trigonometric functions achieved better SSE and NE values than the new MGGP-based model, the new model may be much simpler to adopt in numerical modeling, such as flood routing, where the first derivative of the discharge is required (Niazkar et al., 2019b). According to Table 4, the MARE and MXARE metrics indicate the better performance of the AI-based models (ANN and MGGP). Finally, the comparison shown in Table 4 demonstrates that MGGP is capable of providing accurate explicit estimation models using different types of functions.
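A minimal sketch of how the four comparison metrics of Eqs. (5)-(8) and the SSE improvement percentages quoted above could be computed, assuming NumPy arrays of observed and estimated discharges (the numbers in the final line are taken from Table 4):
import numpy as np

def rating_metrics(q_obs, q_est):
    # SSE, NE, MARE and MXARE as defined in Eqs. (5)-(8)
    error = q_obs - q_est
    sse = np.sum(error ** 2)
    ne = 1.0 - sse / np.sum((q_obs - q_obs.mean()) ** 2)
    relative = np.abs(error) / q_obs
    return {"SSE": sse, "NE": ne, "MARE": relative.mean(), "MXARE": relative.max()}

def sse_improvement(sse_reference, sse_new):
    # percentage reduction in SSE of the new model relative to a reference model
    return 100.0 * (sse_reference - sse_new) / sse_reference

# e.g. conventional method vs. MGGP (square function) on the training data
print(sse_improvement(19621.30, 421.41))  # about 97.8%-97.9%, cf. the value reported in the text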
8. Conclusions
This chapter reviewed the applications of MGGP. It has been used for tackling numerous problems in different fields, including water resources management. Because of the flexibility of its tree-like structure, MGGP can perform as a powerful estimation tool in the problem-solving process. According to the literature review conducted in this chapter, many problems in the field of water resources have not yet been explored using MGGP, and some topics for future research on applying MGGP in water resources were therefore suggested. Furthermore, the problem of developing the stage-discharge relation of a river was revisited, and a new MGGP-based model was proposed considering the square function. According to the comparative analysis, the MGGP-based rating model improves the SSE values of the conventional method, MHBMO, and ANN by 17.6% to 97.8% for the training data and by 30.4% to 96.6% for the testing data. Finally, it is anticipated that MGGP will be utilized for solving many water resources problems in the future, as it has comparable merits with respect to other machine learning methods.
References
Beura, S.K., Bhuyan, P.K., 2018. Operational analysis of signalized street segments using multi-gene genetic programming and functional network techniques. Arab. J. Sci. Eng. 43 (10), 5365–5386.
Chen, J., Zeng, Z., Jiang, P., Tang, H., 2016. Application of multi-gene genetic programming based on separable functional network for landslide displacement prediction. Neural Comput. Appl. 27 (6), 1771–1784.
De Giorgi, M.G., Quarta, M., 2020. Hybrid MultiGene Genetic Programming-Artificial neural networks approach for dynamic performance prediction of
an aeroengine. Aerosp. Sci. Technol. 105902. https://doi.org/10.1016/j.ast.2020.105902.
Eray, O., Mert, C., Kisi, O., 2018. Comparison of multi-gene genetic programming and dynamic evolving neural-fuzzy inference system in modeling pan
evaporation. Hydrol. Res. 49 (4), 1221–1233.
Esmaeili, H., Mohebbi, A., 2017. Prediction of pressure drop in venturi scrubbers by multi-gene genetic programming and adaptive neuro-fuzzy inference
system. Chem. Prod. Process. Model. 12 (3), 1–12.
Fallah-Mehdipour, E., Haddad, O.B., Orouji, H., Mariño, M.A., 2013. Application of genetic programming in stage hydrograph routing of open channels.
Water Resour. Manage. 27 (9), 3261–3272.
Gandomi, A.H., Alavi, A.H., 2012a. A new multi-gene genetic programming approach to non-linear system modeling. Part II: geotechnical and earthquake
engineering problems. Neural Comput. Appl. 21 (1), 189–201.
Gandomi, A.H., Alavi, A.H., 2012b. A new multi-gene genetic programming approach to nonlinear system modeling. Part I: materials and structural
engineering problems. Neural Comput. Appl. 21 (1), 171–187.
Garg, A., Lam, J.S.L., 2015. Improving environmental sustainability by formulation of generalized power consumption models using an ensemble based
multi-gene genetic programming approach. J. Clean. Prod. 102, 246–263.
Garg, A., Tai, K., 2014. An improved multi-gene genetic programming approach for the evolution of generalized model in modelling of rapid prototyping
process. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer, Cham, Switzerland,
pp. 218–226.
Garg, A., Garg, A., Tai, K.J.C.G., 2014a. A multi-gene genetic programming model for estimating stress-dependent soil water retention curves. Comput.
Geosci. 18 (1), 45–56.
Garg, A., Garg, A., Tai, K., Sreedeep, S., 2014b. An integrated SRM-multi-gene genetic programming approach for prediction of factor of safety of 3-D
soil nailed slopes. Eng. Appl. Artif. Intell. 30, 30–40.
Garg, A., Tai, K., Gupta, A.K., 2014c. A modified multi-gene genetic programming approach for modelling true stress of dynamic strain aging regime of
austenitic stainless steel 304. Meccanica 49 (5), 1193–1209.
Ghorbani, M.A., Khatibi, R., Mehr, A.D., Asadi, H., 2018. Chaos-based multigene genetic programming: a new hybrid strategy for river flow forecasting.
J. Hydrol. 562, 455–467.
Giustolisi, O., 2004. Using genetic programming to determine Chezy resistance coefficient in corrugated channels. J. Hydroinform. 6 (3), 157–173.
Hadi, S.J., Tombul, M., 2018. Monthly streamflow forecasting using continuous wavelet and multi-gene genetic programming combination. J. Hydrol.
561, 674–687.
Hasan, M.K., Islam, M.M., Hashem, M.M.A., 2016. Mathematical model development to detect breast cancer using multigene genetic programming. In:
2016 5th International Conference on Informatics, Electronics and Vision (ICIEV). IEEE, Dhaka, Bangladesh, pp. 574–579.
Hoang, N.D., Chen, C.T., Liao, K.W., 2017. Prediction of chloride diffusion in cement mortar using multi-gene genetic programming and multivariate
adaptive regression splines. Measurement 112, 141–149.
Javed, S.G., Majid, A., Ali, S., Kausar, N., 2016. A bio-inspired parallel-framework based multi-gene genetic programming approach to Denoise biomedical images. Cogn. Comput. 8 (4), 776–793.
Kaydani, H., Mohebbi, A., Eftekhari, M., 2014. Permeability estimation in heterogeneous oil reservoirs by multi-gene genetic programming algorithm.
J. Pet. Sci. Eng. 123, 201–206.
Kouzehgar, K., Hassanzadeh, Y., Eslamian, S., Yousefzadeh Fard, M., Babaeian Amini, A., 2021. Application of gene expression programming and nonlinear regression in determining breach geometry and peak discharge resulting from embankment failure using laboratory data. J. Irrig. Sci.
Eng. https://doi.org/10.22055/jise.2021.35162.1931.
Koza, J.R., 1992. Genetic Programming II, Automatic Discovery of Reusable Subprograms. MIT Press, Cambridge, MA, USA.
Kumar, B., Jha, A., Deshpande, V., Sreenivasulu, G., 2014. Regression model for sediment transport problems using multi-gene symbolic genetic programming. Comput. Electron. Agric. 103, 82–90.
Lee, J.-S., Suh, K.-D., 2020. Development of stability formulas for rock armor and tetrapods using multigene genetic programming. J. Waterw. Port Coast.
Ocean Eng. 146 (1), 04019027. https://doi.org/10.1061/(ASCE)WW.1943-5460.0000540.
Marini, S., Conversi, A., 2012. Understanding zooplankton long term variability through genetic programming. In: European Conference on Evolutionary
Computation, Machine Learning and Data Mining in Bioinformatics. Springer, Heidelberg, Berlin, Germany, pp. 50–61.
Mehr, A.D., Kahya, E., 2017. A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction. J. Hydrol. 549,
603–615.
Mehr, A.D., Nourani, V., 2017. A Pareto-optimal moving average-multigene genetic programming model for rainfall-runoff modelling. Environ. Model.
Softw. 92, 239–251.
Mehr, A.D., Nourani, V., 2018. Season algorithm-multigene genetic programming: a new approach for rainfall-runoff modelling. Water Resour. Manage.
32 (8), 2665–2679.
Mehr, A.D., Nourani, V., Kahya, E., Hrnjica, B., Sattar, A.M., Yaseen, Z.M., 2018. Genetic programming in water resources engineering: a state-of-the-art
review. J. Hydrol. 566, 643–667.
Mohammad-Azari, S., Bozorg-Haddad, O., Loáiciga, H.A., 2020. State-of-art of genetic programming applications in water-resources systems analysis.
Environ. Monit. Assess. 192 (2), 73.
Mohammadi Bayazidi, A., Wang, G.G., Bolandi, H., Alavi, A.H., Gandomi, A.H., 2014. Multigene genetic programming for estimation of elastic modulus
of concrete. Math. Probl. Eng. 2014. https://doi.org/10.1155/2014/474289.
Mousavi, S.M., Mostafavi, E.S., Hosseinpour, F., 2015. Towards estimation of electricity demand utilizing a robust multi-gene genetic programming
technique. Energy Effic. 8 (6), 1169–1180.
Muduli, P.K., Das, S.K., 2014. CPT-based seismic liquefaction potential evaluation using multi-gene genetic programming approach. Indian Geotech. J. 44
(1), 86–93.
Muduli, P.K., Das, S.K., 2015. Model uncertainty of SPT-based method for evaluation of seismic soil liquefaction potential using multi-gene genetic
programming. Soils Found. 55 (2), 258–275.
Nedjah, N., Abraham, A., de Macedo Mourelle, L., 2006. Genetic Systems Programming: Theory and Experiences. ISSN Print Edition: 1860-949X,
Springer, Netherlands.
Niazkar, M., 2019. Revisiting the estimation of colebrook friction factor: a comparison between artificial intelligence models and C-W based explicit
equations. KSCE J. Civ. Eng. 23 (10), 4311–4326. https://doi.org/10.1007/s12205-019-2217-1.
Niazkar, M., 2020. Assessment of artificial intelligence models for calculating optimum properties of lined channels. J. Hydroinform. https://doi.org/
10.2166/hydro.2020.050.
Niazkar, M., 2021. Optimum design of straight circular channels incorporating constant and variable roughness scenarios: assessment of machine learning
models. Math. Probl. Eng. 2021, 1–21. https://doi.org/10.1155/2021/9984934. Article ID 9984934.
Niazkar, M., Afzali, S.H., 2018. Developing a new accuracy-improved model for estimating scour depth around piers using a hybrid method. Iran. J. Sci.
Technol. Trans. Civil Eng. 43 (2), 179–189. https://doi.org/10.1007/s40996-018-0129-9.
Niazkar, M., Niazkar, H.R., 2020. COVID-19 outbreak: application of multi-gene genetic programming to country-based prediction models. Electron.
J. Gen. Med. 17 (5), em247. https://doi.org/10.29333/ejgm/8232.
Niazkar, M., Zakwan, M., 2021a. Application of MGGP, ANN, MHBMO, GRG and linear regression for developing daily sediment rating curves. Math.
Probl. Eng. 2021, 1–13. Article ID 8574063 https://doi.org/10.1155/2021/8574063.
Niazkar, M., Zakwan, M., 2021b. Assessment of artificial intelligence models for developing single-value and loop rating curves. Complexity 2021, 1–21.
Article ID 6627011 https://doi.org/10.1155/2021/6627011.
Niazkar, M., Talebbeydokhti, N., Afzali, S.H., 2019a. Novel grain and form roughness estimator scheme incorporating artificial intelligence models. Water
Resour. Manage. 33 (2), 757–773. https://doi.org/10.1007/s11269-018-2141-z.
Niazkar, M., Talebbeydokhti, N., Afzali, S.H., 2019b. One dimensional hydraulic flow routing incorporating a variable grain roughness coefficient. Water
Resour. Manage. 33 (13), 4599–4620. https://doi.org/10.1007/s11269-019-02384-8.
Niazkar, M., Hajizadeh Mishi, F., Eryılmaz Türkkan, G., 2021. Assessment of artificial intelligence models for estimating lengths of gradually-varied flow
profiles. Complexity 2021, 1–11. Article ID 5547889 https://doi.org/10.1155/2021/5547889.
Nicklow, J., Reed, P., Savic, D., Dessalegne, T., Harrell, L., Chan-Hilton, A., et al., 2010. State of the art for genetic algorithms and beyond in water
resources planning and management. J. Water Resour. Plan. Manage. 136 (4), 412–432.
Pandey, D.S., Pan, I., Das, S., Leahy, J.J., Kwapinski, W., 2015. Multi-gene genetic programming based predictive models for municipal solid waste
gasification in a fluidized bed gasifier. Bioresour. Technol. 179, 524–533.
Pedrino, E.C., Yamada, T., Lunardi, T.R., de Melo Vieira Jr., J.C., 2019. Islanding detection of distributed generation by using multi-gene genetic programming based classifier. Appl. Soft Comput. 74, 206–215.
Riahi-Madvar, H., Dehghani, M., Seifi, A., Singh, V.P., 2019. Pareto optimal multigene genetic programming for prediction of longitudinal dispersion
coefficient. Water Resour. Manag. 33 (3), 905–921.
Safari, M.J.S., Mehr, A.D., 2018. Multigene genetic programming for sediment transport modeling in sewers for conditions of non-deposition with a bed
deposit. Int. J. Sediment Res. 33 (3), 262–270.
Searson, D., 2009. GPTIPS: Genetic Programming and Symbolic Regression for MATLAB.
Searson, D.P., 2015. GPTIPS 2: an open-source software platform for symbolic data mining. In: Handbook of Genetic Programming Applications.
Springer, Cham, pp. 551–573. https://doi.org/10.1007/978-3-319-20883-1_22.
332
Handbook of hydroinformatics
Searson, D.P., Leahy, D.E., Willis, M.J., 2010. GPTIPS: an open source genetic programming toolbox for multigene symbolic regression. In: Proc., Int.
Multiconf. of Engineers and Computer Scientists. Newswood, China, Hong Kong, pp. 77–80.
Sivapragasam, C., Maheswaran, R., Venkatesh, V., 2008. Genetic programming approach for flood routing in natural channels. Hydrol. Process. 22 (5),
623–628.
Zakwan, M., Niazkar, M., 2021. A comparative analysis of data-driven empirical and artificial intelligence models for estimating infiltration rates. Complexity 2021, 1–13. Article ID 9945218 https://doi.org/10.1155/2021/9945218.
Chapter 20
Ontology-based knowledge management
framework in business organizations
and water users networks in Tanzania
Neema Penance Kumburu
Moshi Co-operative University, Moshi, Tanzania
1. Introduction
Today's economy is considered a knowledge economy because the creation and use of knowledge are crucial to, and play a major part in, the creation of wealth. Economic achievement is increasingly founded on the effective use of intangible assets such as knowledge, skills and innovative potential as the key source of competitive advantage (Barkhordari et al., 2019). This emerging economic structure is therefore referred to as the knowledge economy (ESRC, 2005). The current wave of globalization has compelled the world, regions, and countries to participate aggressively in the global economy, in which competition is the foremost influence. This rests on the recognition that the traditional factors of production (land, labor, and capital), which were plentiful, readily available and long regarded as the prime means of attaining economic gain, have limitations. Knowledge once received scant attention and was seen as of little importance for competitive advantage; today it is the knowledge-based economy, characterized by the use of information technologies, that counts more (Kefela, 2010). This implies that the previously recognized factors of production are no longer sufficient to sustain a firm's competitive advantage, since knowledge now plays the crucial role. Organizations that understand how to exploit available information attain a more sustained competitive edge than others. A knowledge-dependent economy is founded on the creation, sharing and application of knowledge and information. Yoong and Molina (2003) argued that one means through which a business organization can thrive in today's turbulent commercial setting is by recognizing and utilizing the knowledge within the organization. Organizational effectiveness in both business and water or irrigation networks likewise hinges on making good use of this knowledge, which needs to be created, captured and exchanged (knowledge management) so as to form firm capital (Omotayo, 2015). Individuals enter and leave, but the firm preserves knowledge over time; or, as articulated by Fitz-Enz (2000), company knowledge capital remains with the organization when workers quit. Human capital, by contrast, is the cognitive asset that leaves the company every night and so cannot easily be controlled, because individuals choose where and how they want to invest their knowledge. Consequently, knowledge management (KM) becomes a critical activity in realizing outcomes.
An ontology is an explicit specification of a shared conceptualization. A conceptualization is an abstract model of phenomena in the world formed by identifying the relevant concepts of those phenomena. Explicit means that the types of concepts used, and the constraints on their use, are openly defined. Shared reflects the idea that an ontology captures consensual knowledge, that is, knowledge not private to an individual but accepted by a group. Basically, the role of ontology in the knowledge management process is to facilitate the construction of a domain model: it offers a vocabulary of terms and relations in a specific domain. In building a knowledge management system, two kinds of knowledge are needed. First is domain knowledge: knowledge about the objective facts in the domain of interest (objects, relations, events, states, causal relations, etc. that hold in some area). Second is problem-solving knowledge: knowledge about how to use domain knowledge to achieve various goals, normally presented in the form of a problem-solving method (PSM) that can be reused to attain goals in different domains.
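To make this distinction concrete, the minimal Python sketch below (illustrative only; the entity names and the triple representation are assumptions, not taken from any particular system) separates domain knowledge, encoded as facts, from problem-solving knowledge, encoded as a reusable method that can be applied to any such fact base.

# Domain knowledge: objective facts about the domain of interest,
# stored here as (subject, relation, object) triples. Names are illustrative.
domain_knowledge = [
    ("IrrigationScheme_A", "located_in", "Arusha"),
    ("IrrigationScheme_A", "water_source", "River_X"),
    ("River_X", "has_flow_regime", "seasonal"),
]

# Problem-solving knowledge: a reusable method (a simple PSM) that operates on
# whatever domain knowledge it is given, independently of the domain itself.
def find_related(triples, subject):
    """Return every fact whose subject matches, regardless of domain."""
    return [(rel, obj) for (s, rel, obj) in triples if s == subject]

print(find_related(domain_knowledge, "IrrigationScheme_A"))
# [('located_in', 'Arusha'), ('water_source', 'River_X')]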
In the case of business organizations and water user networks or co-operatives, knowledge is needed to address problems such as the non-acceptance of a particular product in the market or the behavior of users. In managing such knowledge, ontology plays a crucial role in enabling the conversion and distribution of knowledge between experts and knowledge beneficiaries (Sureephong et al., 2008). Along with the actors described earlier, it also offers a shared and mutual understanding of a domain that can be communicated across people and application systems.
Business organizations are legal entities through which shareholders and business persons offer goods and services and co-operate with each other to realize viable goals. Proper utilization of knowledge is key to a firm's existence and prosperity in competitive international markets and contributes critically to problem solving, decision making, company performance and innovation (Huang, 2008). Business organizations are therefore increasingly using knowledge to sustain a continuing competitive advantage. Knowledge is now a key management asset, since it permits organizations to apply and nurture organizational capabilities, augment competitive aptitude and advance a sustainable competitive gain.
Water users' networks or co-operatives are farmers' societies which are autonomous and governed by members (owners) who contribute both monetary and human capital for their socio-economic benefit as well as the conservation of an identifiable water body (Mosha et al., 2018). Since it is the knowledge, skills, and abilities of individuals that create value, emphasis ought to be placed on the mechanisms by which knowledge is acquired, exchanged and disseminated. Knowledge management is thus a prerequisite if business organizations and water users' networks are to fulfil the roles for which they were established (Omotayo, 2015). Knowledge management refers to the storage and distribution of the knowledge and understanding accrued in an organization, association or network concerning procedures, methods and actions, and it treats knowledge as a significant resource in meeting members' needs (Gonzalez and Martins, 2017).
Knowledge management puts emphasis on people and the means by which they obtain, exchange and circulate knowledge. With the fast advancement of information technology and the practical use of unconventional ideas, a diversity of tacit, explicit, structured and unstructured knowledge is growing exponentially. How to successfully gather and classify this multifaceted, varied knowledge, and how to retrieve and reuse it sensibly, has become central to building the competitiveness of the firm. The relevance of knowledge management as a key tool in business organizations and water users' associations cannot be overemphasized. Teng and Song (2011) and Omotayo (2015) assert that the importance of knowledge management stems from the recognition that firms compete on their knowledge-based assets, that even the success or failure of government institutions (non-competing organizations) depends on their capacity to leverage their knowledge assets, and that knowledge management is relevant not only in high-tech industries but in all spheres of the economy. Despite the importance of knowledge management for enhancing the competitive advantage of business organizations, the aspects of an ontology-based knowledge management framework in business organizations have rarely been recognized, and are mentioned in only a few research works.
As regards water users' networks, these are a vital sector for ensuring food security and poverty reduction, particularly among Tanzania's rural households. About 45% of Tanzania's GDP is contributed by the agrarian sector, around 30% of foreign exchange comes from the export of food items, and the sector engages more than 70% of rural inhabitants. As a result, agriculture continues to be the engine for advancing the country's economy (URT, 2011). Even though agriculture is the mainstay of the economy, it is still beset by a number of challenges, among them intermittent drought and undependable rainfall associated with natural catastrophes such as floods and drought (URT, 2011). For this reason, water users' networks or irrigation associations are regarded as crucial means to combat food insecurity, increase food production and create food security among rural households. However, in the Tanzanian context, food security remains among the national concerns. For the period between 2019 and April 2020, over 20% of the populace in sixteen districts, including Tanga, Arusha, and Manyara, faced acute food insecurity, comprising 16% in a crisis situation and 5% in a serious situation, respectively. Furthermore, about 34% of people were in IPC Phase 2 (Stress) and in need of livelihood support. This reveals the depth of the problem in Tanzania. While water user networks or irrigation societies were started mostly to increase agrarian production and improve food security, the degree to which these irrigation networks advance food security in Tanzania remains small. It is for this reason that this chapter documents, shares and proposes a framework for a knowledge management system, in order to assist business organizations and water user networks in Tanzania to establish a collective ontology that can be comprehended by both humans and computers, so that employees and network members can form a common space of different notions through an improved knowledge retrieval interface. This not only supports the creation of knowledge but can also be used as a management tool to ensure the timely preservation and revision of new knowledge. The system is also useful in ensuring that knowledge is shared, so that tacit knowledge can be accrued continually and explicit knowledge aggregated successfully, enabling organizations and networks to use knowledge to outperform competitors and sustain competitive advantage.
The development of this chapter was founded on theoretical and past studies. To ensure an extensive theoretical base for this work, numerous available sources were consulted. The study used a case study design in which the experiences of business organizations and water user networks in Tanzania were explored.
2. Theoretical framework
Several interpretations of knowledge are debated in diverse scientific forums, such as the strategy, management and organization theory literatures, and philosophy. Divergent interpretations of knowledge have resulted in dissimilar operationalizations of knowledge management (Ferreira et al., 2018). The assumption here is that knowledge is a strategic resource, as stipulated by business strategy theory, explicitly the resource-based view (RBV) of the firm. The core thesis of the RBV is that competitive advantage is based on valuable and rare internal resources and capabilities that are expensive for competitors to reproduce. This implies that, for a resource to be a basis of competitive advantage, it must meet three criteria. First, the outputs from these valued resources must be readily purchased by buyers at a price far higher than the costs incurred in bringing them to a marketable state. Second, the resource must be rare, that is, in limited supply. Third, it must be hard for rivals to either imitate or acquire (Cardeal and Antonio, 2012). The theory further posits that the desired outcome of managerial effort within the enterprise is sustained competitive advantage (SCA), which permits the organization to earn returns that are above the industry average (Mugera, 2012).
The theory sees SCA as originating from the distinctive resources of the enterprise that give it an advantage over its rivals. An enterprise such as a water user network is viewed as a bundle of particular resources that are deployed to establish an advantaged market position. The resource-based theory (RBT) therefore stresses strategic choices, with the managers of the organization having the significant duty of identifying, nurturing, and deploying key resources to maximize returns. In this case, managers must ensure that mechanisms are in place for gathering, organizing, interpreting, distributing and reapplying information and knowledge throughout the firm. This is because the RBT places emphasis on decisions and competencies emanating from within the firm rather than from its environment (Barney and Arikan, 2005). The resources a firm possesses, knowledge included, are a source of competitive advantage. Internal influences are those that affect the firm owner/manager's ability to work competently, notwithstanding any innate ability of the owner/manager (Amoah and Fordjour, 2012). Internal features are the individual qualities, skills, knowledge and capabilities of the individual owner/manager, which are critical to how well the business handles the inevitable emergencies that arise. Thus, RBT stipulates that knowledge is a main determinant of organizational performance. Knowledge management (KM) describes the processes and structures organizations use to acquire, generate and disseminate knowledge for articulating strategy and making strategic choices that will enable the firm to gain competitive advantage. When a business organization develops a knowledge strategy, this is described as the overall approach the organization intends to pursue to align its knowledge resources and competences with the intellectual requirements of its strategy. A strategic posture of this kind is essential to attain a sustainable competitive position.
In practice, business organizations and water user networks recognize the importance of managing knowledge if they are to withstand competition and grow. Consequently, numerous companies worldwide are beginning to actively organize their knowledge and innovation. Knowledge does matter, but the question is when, how and why (Carayannis and Campbell, 2009). Today, knowledge matters more, and in ways that are not always foreseeable or even controllable; knowledge frameworks are highly multifaceted, dynamic and adaptive (Carayannis and Campbell, 2009). This calls for an ontology-based knowledge management framework in business organizations and water user networks. However, RBT also undervalues the synergy among resource combinations in achieving competitive advantage (Kraaijenbrink et al., 2010). To overcome these criticisms, systems theory is drawn on as well.
Systems theory focuses on the links among parts and the features of a whole, instead of reducing a whole to its parts and assessing their separate properties (Senge, 1990). A system is described as "an object which preserves its being through the joint interaction of its parts"; systems theory provides a structure by which clusters of components and their characteristics can be studied together so as to understand outcomes. A systems theory framework is therefore important for the analysis of business firms and water users' networks and how they operate (Yari and Eslamian, 2021). A system is made up of at least two parts and the relationships that hold among them. At any particular time, a system or each of its parts exhibits a state, defined by its pertinent features, values or attributes. When considering knowledge, a significant notion in systems thinking is generative learning. Generative learning is the process of balancing, mixing and contextualizing existing knowledge to suit the needs of a new application or business organization (Chun et al., 2008). Generative learning permits advanced approaches to novel problems rather than the simple, reactive and frequently ill-suited reuse of old solutions to new difficulties. A systems theory approach to KM recognizes that any time one of the key knowledge processes is initiated, there can be a ripple effect of actions that may alter the condition of other subsystems. Events may form part of reinforcing processes that lead to the rise or deterioration of either wanted or unwanted consequences. Each knowledge process may result in unresponsive solutions or in genuine generative learning. Depending on how the four processes (the creation, storage, transfer, and application of knowledge) have been executed, they may be observed as closed, open or dynamic systems, each affected more or less by the external setting and each interrelated and interdependent. While the systems thinking viewpoint has been incorporated into the information systems (IS) literature (Panagiotidis and Edwards, 2001), few investigators have examined the holistic standpoint of systems thinking in the context of KM. This chapter therefore shows and documents how systems theory can best explain the assumptions of an ontology of knowledge management in business organizations and water user networks.
3. Empirical literature
This section presents a reflection on previous studies related to the subject matter. These studies provide insights into understanding the ontology-based knowledge management framework in business organizations as well as identifying gaps. Huang (2008) explained how enterprise culture and structure can effectively enhance knowledge management by reviewing the literature and presenting a knowledge enterprise model. The study further shows that, in order to utilize knowledge successfully, the organization should create a knowledge-sharing culture whose central element is trust, considered at four levels: interpersonal, group, organizational and institutional. Trust ought to run through the whole knowledge management process, with trust in individuals and trust in knowledge content emphasized concurrently. Omotayo (2015) reviewed the literature in the area of knowledge management and documented the relevance of knowledge management in organizations. The paper additionally shows that knowledge management is a critical element in the attainment of objectives and a crucial instrument for enterprise survival, competitiveness and viability. Thus, creating, handling, distributing and applying knowledge efficiently is important if an organization is to exploit the benefits of knowledge, and to manage knowledge effectively attention should be concentrated on three key factors: people, processes and technology.
Barkhordari et al. (2019) examined the empirical association between the knowledge-based economy and economic growth in MENA countries. The study used a growth model in the Barro and Sala-i-Martin (1995) framework for a five-year period (2010–15), together with panel data on annual economic growth rates for the selected MENA countries, within the theoretical and empirical context of the four variables used to characterize the knowledge-based economy. The results show that institutions, human capital and research infrastructure, and business sophistication are the pillars of the knowledge-based economy that have a substantial and positive influence on economic development in MENA countries. The recommendation emanating from this study is that governments should consider knowledge-related policies for hastening the transition to the knowledge-based economy and boosting economic performance.
Sureephong et al. (2008) proposed a knowledge management system that supports knowledge management activities inside the organization. In that investigation, ontology is key to knowledge management development in several ways, such as establishing reusable and faster knowledge bases and improved ways of representing knowledge explicitly. The study further found that generating and representing an ontology poses problems for the enterprise owing to the vagueness and unstructured origins of knowledge; it therefore proposes a methodology that understands, generates and represents an ontology for firm or community development by means of a knowledge engineering approach.
Liu et al. (2013) presented a knowledge management framework that tracks and combines provenance information across distributed heterogeneous systems. It is reinforced by a combined knowledge model that defines the domain knowledge and the provenance information included in the information life cycle of a given data product, and the proposed framework was evaluated in the setting of two real-world water irrigation information systems.
Based on the above empirical studies, it is clear that studies on ontology-based knowledge management frameworks in business organizations have been conducted worldwide. These studies found that institutions, human capital and research, infrastructure and business sophistication, as well as people, processes and technology, are critical in managing knowledge effectively. Yet little research has examined an ontology-based knowledge management framework for business organizations and water users' networks, particularly the extent to which business organizations continually accumulate tacit knowledge and competently aggregate explicit knowledge, which leads to improved organization and utilization of knowledge and sustains innovation for the innovators. Relying on this fact, this study therefore aims to cover the existing gap by validating the outline of a knowledge management scheme founded on an ontology of knowledge management that forms and explains knowledge with the ontology, with the aim of creating a shared ontology in which aspects of knowledge management such as humans, computers, and people can be understood and additional associations between different notions can be discovered through an improved knowledge retrieval interface. This will not only support the creation of knowledge but also apply management tools to ensure the timely storage and upgrading of new knowledge, and thus enhance business and water users' association performance.
4. Ontology-based knowledge management framework in business organizations:
A conceptual framework
The knowledge-based economy grows exponentially, and knowledge assets thus become irreplaceable to the organization. Efficient application of knowledge has become central to the firm's survival and success in competitive worldwide business and contributes to problem resolution, decision making, enterprise performance enhancement and innovation. The successful application of knowledge, as described in academia, is knowledge management. Knowledge management refers to the organized, explicit and deliberately established procedures needed to manage knowledge, the intention of which is to make the best use of an enterprise's knowledge-related effectiveness and to create value (Bixler, 2005). The process involved in KM comprises gathering, sorting, representing, distributing, and reusing information and knowledge throughout the enterprise; dealing with knowledge is the foremost principle of knowledge management. Knowledge is of two kinds, explicit and tacit. Explicit knowledge can be expressed in formal language and communicated among individuals, while tacit knowledge has more intangible features and is private knowledge embedded in personal skills. Both explicit and tacit knowledge must generate returns and resolve the organization's current problems. Mastery of relevant and contemporary knowledge for continuous organizational improvement is the main focus of KM. Effective KM has maturity, dynamic and self-development attributes. The maturity attribute means that KM must be durable and sufficient to cope with turbulence in attaining results, yet dynamic enough to adapt to change (Barkhordari et al., 2019). KM should also align with enterprise policy, strategy, culture, technology and structure, and offer an environment of well-organized, value-added and pertinent knowledge that stimulates creativity and new ideas. The dynamic attribute means that information and knowledge should flow through the enterprise without barriers, so that everybody can share and contribute to the knowledge asset. The self-development attribute implies, on the one hand, that KM should sense potentially valuable knowledge, capturing and storing it to increase enterprise assets, and, on the other, that it should create new knowledge from what the enterprise already has. Knowledge management can advance the organization by, for example, leveraging intellectual capital, exploiting knowledge assets and preserving cutting-edge performance. From an organizational perspective, it is expected that organizational policy, strategy, culture, technology and structure can enhance knowledge management.
For example, organizational culture can affect knowledge management in the sense that a knowledge-sharing culture enables knowledge dissemination, particularly of implicit knowledge. People are more inclined to hide what they know when they are uncertain about the consequences of sharing, so building trust is key to effective sharing. Trust here involves faith in persons and faith in the knowledge content itself. Confidence in persons is important in creating a collaborative and participative culture in the organization, which is expected to minimize barriers to knowledge sharing. Faith in the knowledge content, on the other hand, raises the perceived trustworthiness of knowledge, which helps people use knowledge without reservation and in turn fosters trust in other individuals. Research has further identified other elements of a knowledge-sharing culture, such as the ownership of knowledge and obligation. Ownership of knowledge relates to enterprise trust: firms should facilitate employees' work, offer the knowledge essential to complete their duties, be open to criticism and inspire confidence. Knowledge resources should not be held only by executives; they ought to be open to everybody in the enterprise, and all employees should have the right to hold and retrieve the knowledge asset. In addition, organizations should create an environment in which staff feel they have equal access to knowledge assets and are accountable for facilitating change.
Furthermore, organizational structure also affects knowledge management positively. This is mainly because, to implement knowledge management, business organizations appoint responsible persons, for example a chief knowledge officer (CKO), knowledge engineer or knowledge manager, to execute knowledge management. The organizational chart should be networked to offer occasions for workers to co-operate and liaise with one another and to facilitate knowledge-related action. Since it is intangible and closely tied to individual dispositions, organizing tacit knowledge is trickier than organizing explicit knowledge, and organizational settings must be in a position to capture tacit knowledge and convert it into explicit knowledge where needed. The organizational design must therefore provide opportunities for each member of staff to interact and liaise with others and support knowledge-related activities in the firm. In organizational setups, there must also be links between individual development and organizational development, as advocated by the systems approach discussed earlier.
Technology is an enabler: information technology provides the tools that allow executives to distribute knowledge and data. Consequently, information technology has a crucial function in knowledge management initiatives. In the contemporary business situation, the execution of knowledge management projects is far simpler with the assistance of technology (Subashini et al., 2011). The worth of knowledge management increases when knowledge is accessible to the right persons at the right time. Thus, knowledge distribution, storage and retrieval are eased via information technology equipment such as computers, phones, e-mail, folders, data-mining systems, search engines, video-conferencing equipment and microfilm.
The human factor is crucial to the successful management of knowledge; human resource management therefore assists greatly in managing knowledge by combining "congruence", "human capital" and "social capital" approaches. Through the congruence approach, human resource management mechanisms should be internally consistent so as to mutually reinforce each other, strengthen the entire management framework in the organization and "fit" with the external business situation. Human resource management in the areas of career progression and payment systems also needs special emphasis in order to manage knowledge effectively (Omotayo, 2015). Through the human and social capital approach, emphasis is placed on the significance of "the long-term growth of skills, culture and competences in the organization" (El-Farr and Hosseingholizadeh, 2019). Considering the thesis that personnel are carriers of much of the enterprise's key knowledge, proponents suggest that human resource experts should focus, first, on the retention of staff. Second, they recommend that workers' know-how be built into the enterprise routines through learning procedures. Third, they advocate that mechanisms be formulated for sharing the profits arising from the use of this knowledge.
On the other hand, Armstrong (2006) suggests ways in which human resource management can influence knowledge management: (i) help foster an open culture in which the values and norms stress the importance of sharing knowledge; (ii) promote a climate of commitment and trust; (iii) advise on the design and development of organizations that facilitate knowledge sharing through networks, communities of practice (groups of people who share common concerns about their work) and teamwork; (iv) propose recruitment strategies and provide resourcing services which ensure that valued staff who can contribute to knowledge creation and sharing are attracted and retained; (v) provide means of motivating people to make their knowledge explicit and reward those who do so; (vi) help in the development of performance management processes that focus on the development and sharing of knowledge; (vii) develop processes of collective and individual learning that will generate and help circulate knowledge; (viii) set up and organize workshops, conferences, training and seminars which enable knowledge to be shared on a person-to-person basis; (ix) in collaboration with information technology, develop systems for capturing and codifying explicit and tacit knowledge; and (x) generally, promote the cause of knowledge management with senior managers to encourage them to exert leadership and support knowledge management initiatives.
On the other hand, hydrology is the science of water, concerned with storages and fluxes over space, time and stage. Given the complexity of the technical problems, it is increasingly difficult to allocate water resources, and hence to solve problems, within one organization or one site (Liu et al., 2013). Data gathering and analysis are distributed and heterogeneous. Compared with other data-oriented scientific communities, one of the unique features of the hydrologic science community is its strong reliance on externally generated data, e.g. data gathered by other organizations. A noteworthy difficulty for field operators is to find the right data to suit their needs and to decide how to use those data (Tarboton et al., 2010). Additionally, communities have placed greater attention on the networking of distributed sensing and less on the tools needed to manage and comprehend the data. To correctly interpret data produced by an external agent, operators need to understand the provenance of the data. Thus, water user networks in Tanzania need a knowledge management framework founded on an ontology of knowledge management that shapes and explains knowledge with the ontology, with the aim of creating a shared ontology whereby aspects of knowledge management such as humans, computers, culture, structure, leadership, and technology can be understood, and more associations between dissimilar notions created, through an improved knowledge retrieval interface (Fig. 1).
[Figure: leadership (strategic planning and systems thinking, rewarding learning, risk taking and knowledge sharing), a culture of trust, open dialogue and teamwork, an organization structure that facilitates personal interaction and captures tacit knowledge, a technology infrastructure that supports efficient capture and sharing of tacit and explicit knowledge, and organizational learning together drive knowledge sharing and competitive advantage in the business organization/water user network.]
FIG. 1 Conceptual framework of ontology-based knowledge management in business organizations. (Modified from Anantatmula, V.S., 2005. Knowledge management criteria. In: Stankosky, M. (Ed.), Creating the Discipline of Knowledge Management: The Latest in University Research. Elsevier Butterworth-Heinemann, Amsterdam, Boston, pp. 171–188.)
5. Ontology-based knowledge management framework in business organizations and water user networks: proposed system
Business organizations and water user networks are formalized structures of roles or positions designed to facilitate the accomplishment of goals. Today's society is thus a society of organizations. This owes itself to the dynamics of capitalism as the dominant mode of economic organization: capitalism thrives through organizational activity, and organizations have to be managed as they grow in complexity. Organizations and water user networks therefore develop to achieve goals; they involve sets of interacting positions and the collaborative actions of individuals; they are deliberately structured and consciously coordinated; they involve departmentalized activities following a logical pattern; and they exist within the larger society (Daft, 2000). Since organizations and water users' networks operate within society, they are described as open systems in the sense that they are in continuous exchange with their environments in order to survive. They receive inputs from this environment, transform them into outputs, and return them to the environment as goods or services and waste. Through feedback the organization obtains information about what works and what does not, and makes the necessary corrections or adjustments. Changes in the external environment require the organization to adapt accordingly; such changes affect the strategy, structure, and culture of the organization (Fig. 2).
For the above system to operate effectively, an ontology of knowledge management is needed. This is because the core role of a knowledge management scheme is to facilitate knowledge distribution within the organization. The acquisition of knowledge does not in itself mark the start of knowledge management, but it is a crucial precondition (Huang, 2008) for knowledge to be suitable for reuse. Knowledge management creates explicit components that are stored in a knowledge base comprising structured, semi-structured, and unstructured information. If it is to work effectively and attain organizational goals, knowledge management is divided into three parts, namely the acquisition of knowledge, the storage of knowledge and the retrieval (reuse) of knowledge, together with procedures whose key concepts in the ontology are connected through knowledge mining, knowledge representation, and knowledge correlations.
Knowledge acquisition is the conceptualization procedure founded on the notion of ontology. The procedure converts all essential knowledge, and informal and semi-formal information, into formal information. Knowledge acquisition is realized through knowledge mining, which allows knowledge sources such as numerous databases, documents, and web resources to be placed in the knowledge repository after processing by a knowledge discovery system (KDS). Additional knowledge sources, such as data from various forums and comments on applications (containing tacit knowledge), are initially placed in a transit depot; once organized by knowledge administrators, these items are moved into the knowledge repository. Acquiring knowledge therefore refers to methods of knowledge building rather than knowledge adaptation (Peng et al., 2019). Aimed at changing semi-structured and unstructured data into structured knowledge and storing it in the knowledge base, knowledge storage is the process by which metadata are extracted from the knowledge sources described above and knowledge objects are annotated by means of the ontology and metadata values.
[Figure: within its environment, the organization takes inputs (raw materials, human resources, capital, technology, information), passes them through a transformation process (employees' work activities, management activities, technology and operations methods) and produces outputs (products and services, financial results, information, human results), with feedback returning from the environment.]
FIG. 2 Organization as an open system. (Modified from Daft, R.L., 2000. Management. The Dryden Press, Fort Worth.)
[Figure: the proposed system links users and a knowledge administrator, via an administration tool, a knowledge retrieval engine and knowledge push, to an ontology base, a metadata base, a knowledge base and a database; knowledge acquisition tools, knowledge mining and knowledge processing/discovering tools feed these stores from documents, databases, web sources, applications, domain-knowledge marking, a transit depot and forum/user feedback.]
FIG. 3 Ontology-based knowledge management framework in business organizations and water user networks: proposed system. (Modified from Peng, M.Y.P., Zhang, Z., Ho, S.S.H., 2019. A study on the relationship among knowledge acquisition sources at the teacher- and college-level, student absorptive capacity and learning outcomes: using student prior knowledge as a moderator. Educ. Sci. Theory Pract. 19(2), 22–39.)
The ontology base comprises the associations among the grouped concepts of domain knowledge objects and other notions in the system. The metadata needed by the KMS are placed in the metadata base, which forms the core tool for exploring knowledge objects successfully (Fig. 3).
The database and knowledge base combine syntax, metadata information and the connections among knowledge objects (in this case, information on employees' or members' titles, education, training, past service, present position, performance scores, pay levels, language proficiency, capabilities and specialized skills). Distinct classification of related knowledge in terms of the metadata store and the ontology is not only a principle but also the basis for effective search and inference over knowledge. It was also noted that knowledge acquisition is poorly practiced in business organizations and water user networks in Tanzania. In one company it was reported that the concept of knowledge management is completely new: "We still rely on trial and error to solve problems; we are yet to have knowledge management systems in place. Thus it is difficult to have an information and knowledge database that can easily facilitate knowledge acquisition." Most of the knowledge is still tacit, in the sense that it has yet to be retrieved from the people who hold it, debated and criticised, and stored in the company database.
Knowledge reuse is the mechanism by which knowledge is used in application systems. In this context, workers or members can use the knowledge retrieval engine to search for the knowledge they need across the various classified clusters of previous work; in this way knowledge is obtained through a pull technique. Furthermore, the KMS also pushes related knowledge to users according to individual preference and immediate requirements. Under favorable management conditions, knowledge in the repository is controlled, refreshed and maintained promptly by knowledge administrators, which gives the system a capacity for active involvement rather than being limited to static usage and closed preservation.
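The three parts of the scheme can be illustrated with a minimal Python sketch, assuming a simple in-memory store (class and method names are illustrative assumptions, not part of the proposed system): acquisition places raw contributions in a transit depot, storage annotates them with ontology concepts as metadata, and retrieval pulls items by concept.

class KnowledgeRepository:
    def __init__(self):
        self.transit_depot = []   # unreviewed contributions (forum posts, comments, ...)
        self.knowledge_base = []  # curated items with ontology-based metadata

    def acquire(self, text):
        """Knowledge acquisition: raw, semi-formal input enters the transit depot."""
        self.transit_depot.append(text)

    def store(self, ontology_terms):
        """Knowledge storage: a curator annotates items with ontology concepts (metadata)."""
        for text in self.transit_depot:
            tags = {t for t in ontology_terms if t.lower() in text.lower()}
            self.knowledge_base.append({"text": text, "concepts": tags})
        self.transit_depot.clear()

    def retrieve(self, concept):
        """Knowledge reuse (pull): return every item annotated with the requested concept."""
        return [item["text"] for item in self.knowledge_base
                if concept in item["concepts"]]

repo = KnowledgeRepository()
repo.acquire("Canal lining reduced seepage losses in the Moshi scheme.")
repo.acquire("Members reported pump maintenance problems during the dry season.")
repo.store(ontology_terms=["Canal", "Pump", "Seepage"])
print(repo.retrieve("Pump"))   # second item only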
6. The practice of knowledge organization and expression
6.1 Ontology
For organizations and water user networks to carry out their work effectively, an ontology is necessary. An ontology is a shared, abstract model given a formal specification. In implementation, five kinds of components are typically used to describe an ontology: concepts (or classes), relations, roles, axioms and instances. Among these five components, relations are the heart of the ontology. Establishing a well-related domain ontology base is the central aspect of a knowledge management framework founded on ontology.
The relations of the ontology capture the constraints, interactions or new connections among concepts, such as identity, opposition, hyponymy, composition, interconnection and noun-modifier relations. Numerous associations can logically link each knowledge node and establish a web of knowledge associations based on the ontology; the correct knowledge node can then be reached by following a relative path. There are two systems of knowledge relations: (i) major ontology relations and (ii) minor ontology relations (Zhang et al., 2011). The former describes all the terminology concerning a particular field and the ontology associations among the terms; the latter describes outside concepts of other domains linked to the terminology of the major ontology relations. Using the major and minor relations together, one can create a sound knowledge organization and likewise retrieve knowledge proficiently and rapidly. In principle, all functions carried out by a water user association or business organization are connected between firms to produce an outline of functions for a network; likewise, all resources in a firm are bound within these associations to generate a resource collection for the system. Lastly, the actors are attached within associations to form an actors' network. Yet individual managers cannot perceive the entire pattern in which the enterprise is embedded. For executives, this means taking a local view while building enough complexity into their understanding to permit action and to prioritize aims across time. A part of the relation network of a business and water user association is presented in Fig. 4.
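A minimal Python sketch of this relation network idea is given below, assuming a directed labelled graph like the fragment in Fig. 4. The relation and concept names follow the figure; the path-following helper is an illustrative assumption, not part of the published model.

# Concepts linked by named relations, as in the Fig. 4 fragment.
relations = [
    ("organization", "has_goal", "organization-goal"),
    ("organization", "consists_of", "division"),
    ("division", "consists_of", "subdivision"),
    ("organization-agent", "member_of", "division"),
    ("organization-agent", "plays", "role"),
    ("role", "requires_skill", "skill"),
    ("role", "performs", "activity"),
    ("activity", "consumes", "resource"),
]

def follow(start, path):
    """Follow a relative path of relation names from a starting concept."""
    frontier = {start}
    for rel in path:
        frontier = {o for (s, r, o) in relations if s in frontier and r == rel}
    return frontier

# Which resources are ultimately consumed through the roles an agent plays?
print(follow("organization-agent", ["plays", "performs", "consumes"]))
# {'resource'}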
6.2 Knowledge representation and organization based on ontology
The form of knowledge representation must be settled first. Owing to the variety and intricacy of knowledge, it is challenging to convey knowledge in an organized way: there are many types of knowledge representation, but no settled, systematic technique of representation (Zhang et al., 2011). Additionally, a single, undivided knowledge base may not be adequate to support diverse innovative plans, given how cross-cutting and interconnected knowledge across disciplines has become. A multi-ontology model of knowledge organization is therefore offered. This model generates one knowledge base for each domain, uses the elementary characteristics of ontology and the associations described above to realize the interconnection of multiple ontology bases, and finally creates a knowledge system, as shown in Fig. 5.
[Figure: concepts such as organization, organization-goal, division, subdivision, team, organization-agent, role, skill, authority, communication-link, activity, constraint and resource, linked by relations such as has_goal, decomposition_of, consists_of, member_of, plays, requires_skill, has_authority, has_communication_link, performs, constrained_by and consumes.]
FIG. 4 Part of the relation network of a business and water user association. (Modified from Grunninger, M., 2003. Enterprise modelling. In: Handbook on Enterprise Architecture. Springer, Berlin, Heidelberg, Germany, pp. 515–541.)
[Figure: several ontology knowledge bases are linked through an ontology system consisting of an ontology base, a logic ontology and a relations library, each described by conceptual sets (C1, C2, C3, ...), correlation sets (R1, R2, R3, ...) and function sets.]
FIG. 5 Ontology knowledge organization model. (Modified from Zhang, J., Zhao, W., Xie, G., Chen, H., 2011. Ontology-based knowledge management system and application. Procedia Eng. 15, 1021–1029.)
The relations of the ontology at the logical layer form a tree, an ontology relation library with an in-tree structure. Each concept is an atomic notion, and concepts are correlated to one another through ontology links. This layer comprises all the concepts and the relation web of the knowledge base. Moreover, it comprises four component parts for each concept: instances, axioms, relation formation and relation-set attributes. The ontology layer is the higher-level construction over the ontology relation library. Founded on the idea of modelling multiple knowledge centers, it creates a knowledge system for comparable uses to support the planning process (Zhang et al., 2011), for instance the processes, equipment, methods and guidelines in computer-aided process planning, together with ontology associations related to occupational similarity. Each knowledge web thus covers the basic concepts of a related discipline, and the ontology knowledge system is composed of these diversified ontology bases. Multiple ontology bases with connected domain knowledge are therefore organized by forming a logical layer and an ontology layer. For example, the system may be separated into several parts, such as a domain knowledge base, a principal knowledge base and a unified knowledge base. Each ontology knowledge base is connected by a minimal set of constraints among its concepts and designates the objects, concepts and semantic associations concerning the linked parts based on the ontology.
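A minimal sketch of this multi-ontology organization is given below in Python, under assumed names (the two domain bases and the cross-base mapping are illustrative, not taken from the model): each knowledge base keeps its own concept set and relation set, and the ontology system links the bases through a small set of cross-base mappings.

# Two assumed domain ontology bases, each with its own concept and relation sets.
hydraulics_base = {
    "concepts": {"Channel", "Weir", "FlowRate"},
    "relations": [("Weir", "measures", "FlowRate")],
}
irrigation_base = {
    "concepts": {"Scheme", "Canal", "WaterAllocation"},
    "relations": [("Scheme", "contains", "Canal")],
}

# The ontology system: the bases plus cross-base constraints linking related concepts.
ontology_system = {
    "bases": {"hydraulics": hydraulics_base, "irrigation": irrigation_base},
    "mappings": [("irrigation:Canal", "same_as", "hydraulics:Channel")],
}

def linked_concepts(qualified_name):
    """Return concepts in other bases that a given concept is mapped to."""
    maps = ontology_system["mappings"]
    return [b for (a, _, b) in maps if a == qualified_name] + \
           [a for (a, _, b) in maps if b == qualified_name]

print(linked_concepts("irrigation:Canal"))   # ['hydraulics:Channel']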
For irrigation co-operatives in Tanzania to increase efficiency and respond to members' needs, they require an ontology of knowledge management (Sandelin et al., 2021). Recognizing this, knowledge needs to be managed in a systematic and strategic way. Irrigation and water user networks hold significant physical and mechanical assets for which they often lack sufficient expertise. This is worrying because the ability of irrigation co-operatives and water users' networks to offer good services to their members depends on the appropriate operation and maintenance of key assets, just as in any other business enterprise. Knowledge asset enhancement needs to be a continuous activity in irrigation co-operatives and ought to begin by identifying the gap between current and desired knowledge and how technology, as an enabler, can facilitate knowledge management practices.
Irrigation co-operatives may lose knowledge through retirement, cessation of membership, severe or long-lasting sickness and death. When irrigation co-operatives experience these events, they lose part of their institutional memory. This is because most of their knowledge is tacit; they rely on solutions embedded in customary working practices and often hold negative attitudes toward technology. Making knowledge explicit is found to offer a solution, and for this reason the Ontology Knowledge Organization Model is suggested.
6.3 Knowledge retrieval based on ontology
The purpose of knowledge management is to offer suitable knowledge to the right persons so that they are able to make the best decisions. Knowledge retrieval is a main challenge of knowledge management, standing at the center of the link between people and knowledge. If knowledge is to help organizations and water user networks achieve the goals for which they were established, it must be retrieved at the right time and delivered to the right people. Unfortunately, this is missing in most business organizations and water user networks in Tanzania, and as a result they fail to make informed decisions on issues related to customers as well as product design. Knowledge retrieval should focus on knowledge organization, because the retrieval pattern is fixed by the organization's design and is in effect the reverse procedure of knowledge organization. This chapter therefore applies a domain-ontology-based design, founded on the analysis of the ontology and the knowledge representation, to design the retrieval path shown in Fig. 6.
FIG. 6 Model of knowledge retrieval based on ontology. (Modified from Zhang, J., Zhao, W., Xie, G., Chen, H., 2011. Ontology-based knowledge management system and application. Procedia Eng. 15, 1021–1029.)
This retrieval mode is divided into four parts: knowledge interconnection, ontology, knowledge resources, and matching retrieval. The key role of knowledge interconnection is to pass the query through the corresponding retrieval mechanism by way of retrieval channels, selecting the suitable entry point (self-defined retrieval or autoretrieval) to explore the knowledge nodes and concepts in the associated domains, obtaining acceptable matching results and returning them to the user. The knowledge store is the basis of ontology-based knowledge retrieval; it is the main point that distinguishes this knowledge retrieval arrangement from well-known information retrieval frameworks, and it is the central tenet of the system model. From query analysis and result management, to the knowledge-semantics retrieval procedure, to the knowledge asset design and the index base, everything is founded on the correlated knowledge in the ontology.
Highly effective knowledge retrieval rests on a high-quality retrieval strategy and technique. This chapter presents two retrieval paths, self-defined retrieval and autoretrieval (Zheng et al., 2012), which permit water users' networks and business organizations at different levels to select a suitable retrieval means according to their needs. In self-defined retrieval, users describe the facets of the knowledge themselves using concept navigation, annotation navigation and the graph of concept relations. As the system explores the classes, it can display the various relations of the ontology to the user at the human-computer interaction level, and the user can choose a view of the listed ontology. Since the ontology is specified by the user within a definite selection range, the search range is narrowed or widened to the same extent, matching the knowledge the user actually needs. This type of search method, with some adjustment, can largely satisfy the aspirations of most users and gives them extra control over the search.
The other retrieval path is automatic retrieval (extended retrieval), which is essentially the expansion of concepts. It is implemented according to the relations among concepts and the semantics of the ontology, and it comprises two facets. Using the associations expressed by the hierarchical structure of the domain ontology, extra retrieval results can be produced: the user's query concepts are substituted by "top-class" (broader) concepts, or a precise attribute value is substituted by a more general attribute value, both of which relax the constraints of the retrieval. Conversely, replacing the user's query concepts with subclass (narrower) concepts yields deeper, more specific concepts and representation formats. Multidisciplinary concepts, covering additional domains of knowledge and the concept-relation knowledge of those domains, can be used to enlarge the fields to which this multidisciplinary knowledge belongs.
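A minimal Python sketch of the extended (auto) retrieval idea is given below: the query concept is expanded along an assumed domain-ontology hierarchy so that documents tagged with narrower concepts are still found. The taxonomy and the document tags are illustrative assumptions, not data from any water user network.

# Assumed subclass hierarchy (narrower concept -> broader concept).
subclass_of = {
    "SurfaceIrrigation": "Irrigation",
    "DripIrrigation": "Irrigation",
    "FurrowIrrigation": "SurfaceIrrigation",
}

def expand(concept):
    """Return the concept together with all of its (transitive) subclasses."""
    result = {concept}
    changed = True
    while changed:
        new = {c for c, parent in subclass_of.items() if parent in result}
        changed = not new.issubset(result)
        result |= new
    return result

documents = {
    "doc1": {"FurrowIrrigation"},
    "doc2": {"DripIrrigation"},
    "doc3": {"Drainage"},
}

query = expand("Irrigation")
hits = [d for d, tags in documents.items() if tags & query]
print(hits)   # ['doc1', 'doc2'] - matched although neither is tagged 'Irrigation' directly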
In business organizations in Tanzania, knowledge storage involves soft or hard mechanisms for capturing and storing personal and enterprise knowledge in a manner that is easily retrieved. Knowledge storage exploits technical infrastructure, such as hardware and software, and human competencies to recognize organizational knowledge and then to codify and index it for retrieval purposes (Armstrong, 2006). This is the person-to-document method: a repository allows many people to search for and retrieve codified knowledge without meeting the person who created it, thus ensuring organizational continuity.
6.4 Knowledge application and implementation based on ontology
Having examined the structure of a knowledge management arrangement founded on ontology, together with knowledge organization, representation and retrieval, this system may be applied to the arena of hydroinformatics. First, in an effort to mine, classify and establish the related knowledge, the firm applies a variety of knowledge acquisition tools to the associated domain of knowledge (Zhang et al., 2011). A properly structured ontology is essential for a successful knowledge management framework. This chapter accordingly describes the hierarchy of terms that are mutually acknowledged in business organizations and water users' networks and adopts the top-down technique, which implies listing the highest-level concepts and progressively refining them to create subclasses. For example, hydroinformatics can be categorized into numerous fields, such as machinery, power, drivers and performance, and the subclasses are then gradually expanded by similarity. This yields a conceptual chart that captures the domain knowledge with its dependencies. Lastly, an ontology model is built after describing the attributes of the classes.
The worth of an ontology knowledge base lies in its usability; thus, the connection between the queries to be answered and the knowledge in the base is paramount for knowledge retrieval (Zheng et al., 2012). If the user opts for self-defined retrieval, the system enters the interface described earlier in the knowledge retrieval model, as presented in Fig. 7. On the left of the concept exploration part is the concept tree presenting the classes; the other side presents a diagram of the relation network. Clicking the "enter" key brings up the view in Fig. 7B. On picking a concept, the right side shows the key associations of the selected concept, including its superordinate, lateral and related concepts, which are displayed in the concept-viewing part. This offers narrowed or extended exploration options for obtaining the corresponding knowledge objects. If the user selects autoretrieval, the system enters the interface displayed in Fig. 8.
First, the query is separated into four facets, carrier, action, object, and condition, and the ontology is used to interpret and translate the query terms, producing a formal problem description. Clicking the "ontological expression" button then returns the relevant knowledge content after related-class mining and semantic integration. The knowledge acquired in this way helps the organization attain competitive advantage.
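The following small Python sketch (facet values, the synonym table, and the knowledge items are all invented for illustration) mimics this auto-retrieval step: the four facets are normalized to ontology terms and then matched against tagged knowledge objects:

# A query split into the four facets used by auto-retrieval.
query = {
    "carrier": "irrigation canal",
    "action": "reduce",
    "object": "seepage loss",
    "condition": "dry season",
}

# Synonym table standing in for the ontology term-normalization step.
ontology_terms = {"reduce": "mitigation", "seepage loss": "water loss"}

def to_ontology_expression(q):
    """Replace facet values with their ontology terms where a mapping exists."""
    return {facet: ontology_terms.get(value, value) for facet, value in q.items()}

knowledge_items = [
    {"id": "k7", "tags": {"irrigation canal", "mitigation", "water loss"}},
    {"id": "k9", "tags": {"reservoir", "flood control"}},
]

def auto_retrieve(q):
    expr = set(to_ontology_expression(q).values())
    return [item["id"] for item in knowledge_items if expr & item["tags"]]

print(auto_retrieve(query))   # ['k7']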
FIG. 7 (A) Self-defined retrieval module. (B) Self-defined retrieval module.
FIG. 8 (A) Autoretrieval module. (B) Autoretrieval model.
7. Conclusions
Although the ongoing debate on the performance of business organizations and water user networks focuses on the various factors that affect performance, the contribution of an ontology-based knowledge management framework has received scant attention. In this chapter, resource-based theory and systems theory are used to propose an ontology-based knowledge management framework that can enhance the performance of business organizations and water user networks. The framework can be used to explain how an ontology of knowledge management can support the competitiveness of these organizations through knowledge acquisition, representation, retrieval, and application. The chapter also shows how culture, structure, technology, and people can facilitate knowledge sharing and the development of knowledge management ontologies in organizations. At the same time, the chapter shows that most business organizations and water user associations in Tanzania lack an ontology-based knowledge management framework, which creates challenges for the enterprise owing to the ambiguous and unstructured nature of knowledge management in business. Irrigation co-operatives and water user networks may lose knowledge through retirement, cessation of membership, severe or long-lasting illness, and death; when this happens, part of their institutional memory is lost. It is therefore recommended that, in order to manage knowledge effectively, management should invest sufficient capital in establishing knowledge management structures and systems, educate employees and members on the various aspects of the knowledge management ontology, and foster a knowledge-sharing culture based on trust.
References
Amoah, M., Fordjour, F., 2012. New product development activities among small and medium-scale furniture enterprises in Ghana: a discriminant
analysis. Am. Int. J. Contemp. Res. 2 (12), 41–53.
Armstrong, M., 2006. A Handbook of Human Resource Management Practice. Kogan Page Publishers.
Barkhordari, S., Fattahi, M., Azimi, N.A., 2019. The impact of knowledge-based economy on growth performance: evidence from MENA countries.
J. Knowl. Econ. 10 (3), 1168–1182.
Barney, J.B., Arikan, A.M., 2005. The resource-based view: origins and implications. In: The Blackwell Handbook of Strategic Management. John Wiley
& Sons, Ltd, pp. 123–182.
Bixler, C.H., 2005. Developing a foundation for a successful Knowledge Management System. In: Stankosky, M. (Ed.), Creating the Discipline of Knowledge Management: The Latest in University Research. Elsevier Butterworth-Heinemann, Amsterdam, Boston, pp. 51–65.
Carayannis, E.G., Campbell, D.F., 2009. ‘Mode 3’ and ‘Quadruple Helix’: toward a 21st century fractal innovation ecosystem. Int. J. Technol. Manag. 46 (3–4), 201–234.
Cardeal, N., Antonio, N.S., 2012. Valuable, rare, inimitable resources and organization (VRIO) resources or valuable, rare, inimitable resources (VRI)
capabilities: What leads to competitive advantage? Afr. J. Bus. Manag. 6 (37), 10159–10170.
Chun, M., Sohn, K., Arling, P., Granados, N.F., 2008. Systems theory and knowledge management systems: the case of Pratt-Whitney Rocketdyne. In:
Proceedings of the 41st Annual Hawaii International Conference on System Sciences, Hawaii, USA. IEEE, p. 336.
Daft, R.L., 2000. Management. The Dryden Press, Fort Worth.
El-Farr, H., Hosseingholizadeh, R., 2019. Aligning human resource management with knowledge management for better organizational performance: how
human resource practices support knowledge management strategies? In: Current Issues in Knowledge Management. IntechOpen.
ESRC, 2005. Knowledge Economy in the UK. Retrieved from: http://www.esrcsocietytoday.ac./ESRCInfoCentre/facts/UK/index4.aspx?ComponentId=6978&SourcePageId=14971#0.
Ferreira, J., Mueller, J., Papa, A., 2018. Strategic knowledge management: theory, practice and future challenges. J. Knowl. Manage. 24 (2),
121–126. https://doi.org/10.1108/JKM-07-2018-0461.
Fitz-Enz, J., 2000. The ROI of Human Capital: Measuring the Economic Value of Employee Performance. AMACOM Division of American Management
Association.
Gonzalez, R.V.D., Martins, M.F., 2017. Knowledge Management Process: a theoretical-conceptual research. Gest. Prod. 24 (2), 248–265.
Huang, Y., 2008. Overview of knowledge management in organizations. UW-Stout J. Stud. Res. 1 (1), 1–5. http://digital.library.wisc.edu/1793/52955.
Kefela, G.T., 2010. Knowledge-based economy and society has become a vital commodity to countries. Int. NGO J. 5 (7), 160–166.
Kraaijenbrink, J., Spender, J.C., Groen, A.J., 2010. The resource-based view: a review and assessment of its critiques. J. Manag. 36 (1), 349–372.
Liu, Q., Bai, Q., Kloppers, C., Fitch, P., Bai, Q., Taylor, K., et al., 2013. An ontology-based knowledge management framework for a distributed water
information system. J. Hydroinf. 15 (4), 1169–1188.
Mosha, D.B., Vedeld, P., Katani, J.Z., Kajembe, G.C., Andrew, K.P.R., 2018. Contribution of paddy production to household income in farmer-managed
irrigation scheme communities in Iringa Rural and Kilombero Districts, Tanzania. J. Agric. Stud. 6 (2), 100–122. https://doi.org/10.5296/jas.
v6i2.13147.
Mugera, A.W., 2012. Sustained competitive advantage in agribusiness: applying the resource-based theory to human resources. Int. Food Agribusiness
Manag. Rev. 15 (4), 27–48.
Omotayo, F.O., 2015. Knowledge management as an important tool in organisational management: a review of literature. Libr. Philos. Pract. 1 (2015),
1–23.
Panagiotidis, P., Edwards, J.S., 2001. Organisational learning—a critical systems thinking discipline. Eur. J. Inform. Syst. 10 (3), 135–146.
Peng, M.Y.P., Zhang, Z., Ho, S.S.H., 2019. A study on the relationship among knowledge acquisition sources at the teacher-and college-level, student
absorptive capacity and learning outcomes: using student prior knowledge as a moderator. Educ. Sci. Theory Pract. 19 (2), 22–39.
Sandelin, S.K., Hukka, J.J., Katko, T.S., 2021. Importance of knowledge management at water utilities. Public Works Manage. Policy 26 (2), 164–179.
Senge, P., 1990. The Fifth Discipline: The Art and Practice of the Learning Organization. Doubleday, New York, USA.
Subashini, R., Rita, S., Vivek, M., 2011. The role of ICTs in knowledge management (KM) for organizational effectiveness. In: International Conference
on Computing and Communication Systems. Springer, Berlin, Heidelberg, Germany, pp. 542–549.
Sureephong, P., Chakpitak, N., Ouzrout, Y., Bouras, A., 2008. An ontology-based knowledge management system for industry clusters. In: Global Design
to Gain a Competitive Edge. Springer, London, UK, pp. 333–342.
Tarboton, D.G., Maidment, D.R., Zaslavsky, I., Ames, D.P., 2010. Hydrologic Information System 2010 Status Report. Consortium of University for the
Advancement of Hydrologic Science, Washington, DC, USA.
Teng, J.T., Song, S., 2011. An exploratory examination of knowledge-sharing behaviors: solicited and voluntary. J. Knowl. Manag. 15 (1), 104–117.
URT, 2011. Tanzania Agriculture and Food Security Investment Plan (TAFSIP). Government Publishing Press; Tanzania, Dar es Salaam.
Yari, A., Eslamian, S., 2021. An introduction to residential water users. Chapter 1, In: Yari, A., Eslamian, S., Eslamian, F. (Eds.), Urban and Industrial
Water Conservation Methods. Taylor and Francis, CRC Group, USA, pp. 1–8.
Yoong, P., Molina, M., 2003. Knowledge sharing and business clusters. In: PACIS 2003 Proceedings, p. 84.
Zhang, J., Zhao, W., Xie, G., Chen, H., 2011. Ontology-based knowledge management system and application. Procedia Eng. 15, 1021–1029.
Zheng, Y.L., He, Q.Y., Ping, Q.I.A.N., Ze, L.I., 2012. Construction of the ontology-based agricultural knowledge management system. J. Integr. Agric. 11
(5), 700–709.
Chapter 21
Parallel chaos search-based incremental
extreme learning machine
Salim Heddam
Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology, Hydraulics Division, Agronomy Department, Faculty of Science,
Skikda, Algeria
1. Introduction
Dam reservoirs play an important and critical role in both daily life and future development. Dams are constructed for freshwater storage, flood control, and the supply of hydroelectric power stations (Ahmed et al., 2020; Yang et al., 2020). The construction of dam reservoirs can have a significant impact on aquatic life downstream of the dam by interrupting and modifying the flow regime and, in particular, the thermal regime of rivers (Tao et al., 2020; Guo et al., 2020). Consequently, damming rivers offers several advantages and provides a wide range of services for humans, but it also has adverse effects on the aquatic environment, especially by changing water temperature, which directly affects fish and fish habitat (Wang et al., 2020) and ecological conditions in general (Shi et al., 2020). One of the most important dam operations is the release of water through spillways, which increases the amount of ambient air entrained in the water and leads to supersaturation of total dissolved gas (TDG) in the water downstream of the spillways (Bragg and Johnston, 2014, 2015, 2016). Close monitoring and control of TDG downstream of the spillways on the Snake and Columbia Rivers, United States, demonstrated that spill from dams is the leading cause of elevated, supersaturated TDG (Tanner et al., 2009, 2011, 2012, 2013). TDG should be maintained below 110% saturation to avoid gas-bubble trauma (GBT) and other barotrauma in fish (Ma et al., 2019), a major problem affecting the health of fish and other freshwater species (Yuan et al., 2020; Fan et al., 2020); as a result, a regional program of research on severely threatened fish species was strongly recommended for a given spill season, covering the passage of fish from the tailwaters to the forebay (Cao et al., 2020). The formation, composition, and level of TDG are influenced by several factors, among them water temperature (Tw) (Yuan et al., 2018), barometric pressure, total gas pressure, and the vapor pressure of water (Morris et al., 2003).
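Although this chapter does not write it out, TDG supersaturation is commonly expressed as a percentage of the local barometric pressure, e.g.,

TDG (%) = (P_tgp / P_atm) x 100,

where P_tgp denotes the total dissolved gas pressure and P_atm the ambient barometric pressure (symbol names here are ours); on this scale, values above 100% indicate supersaturation, and the 110% threshold cited above corresponds to a total gas pressure 10% above barometric pressure.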
Over the last few decades, many studies have investigated the physical process of TDG supersaturation, and a variety of methods have been proposed to address the TDG problem by defining new formulas and models that help control the evolution of TDG over time and space. Among the proposed methods, computational fluid dynamics (CFD) models have been used extensively to predict TDG (Ma et al., 2019), together with numerical hydrodynamic models (Weber et al., 2004), polydisperse two-phase flow and unsteady 3D two-phase models (Politano et al., 2007, 2009, 2012, 2016, 2017; Fu et al., 2010; Ma et al., 2016), and other numerical and hydrodynamic models (Stewart et al., 2015; Wang et al., 2019; Witt et al., 2017). Recently, machine learning models have been introduced as a strong and promising alternative for predicting TDG concentration, and substantial work has already been done (Heddam, 2017; Keshtegar et al., 2019; Heddam et al., 2020; Heddam and Kisi, 2020). For example, Heddam (2017) applied the generalized regression neural network (GRNN) model to predict TDG in the Columbia River, United States. Keshtegar et al. (2019) compared four machine learning models, namely the least squares support vector machine (LSSVM), the M5 model tree (M5Tree), multivariate adaptive regression splines (MARS), and the high-order response surface method (H-RSM), for predicting TDG concentration in the Columbia River, United States, and reported that the H-RSM was more accurate than the LSSVM, M5Tree, and MARS models. Heddam et al. (2020) compared the kriging interpolation method (KIM), the response surface method (RSM), and feedforward neural networks (FFNN) in modeling TDG at four dam reservoirs located in the Columbia
River, United States, and demonstrated that the KIM model was more accurate than the RSM and FFNN models. Recently, Heddam and Kisi (2020) compared several families of neurofuzzy models, namely the adaptive neurofuzzy inference system with subtractive clustering (ANFIS-S), ANFIS with grid partition (ANFIS-G), ANFIS with fuzzy c-means (ANFIS-F), and the dynamic evolving neural-fuzzy inference system with online learning (DENFIS_O) and offline learning (DENFIS_F), for predicting hourly TDG, and reported that overall the ANFIS models were more accurate than the DENFIS models.
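Because this chapter is concerned with extreme learning machine (ELM) variants driven by water temperature alone, a minimal sketch of a basic (nonincremental, nonchaotic) ELM regressor is given below; the data are synthetic and the hyperparameters arbitrary, so it only illustrates the random-hidden-layer, analytic-output-weight idea rather than the model developed in this chapter:

import numpy as np

rng = np.random.default_rng(42)

def elm_train(X, y, n_hidden=20):
    """Train a single-hidden-layer ELM: random input weights, analytic output weights."""
    n_features = X.shape[1]
    W = rng.normal(size=(n_hidden, n_features))   # random input weights (kept fixed)
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))       # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                   # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Illustrative synthetic example: TDG saturation (%) loosely increasing with water temperature.
Tw = rng.uniform(5.0, 25.0, size=(200, 1))                 # water temperature, degrees C
tdg = 100.0 + 0.8 * Tw[:, 0] + rng.normal(0, 1.0, 200)     # synthetic TDG saturation (%)
W, b, beta = elm_train(Tw, tdg)
print(elm_predict(np.array([[15.0]]), W, b, beta))         # predicted TDG at Tw = 15 degrees C

In an incremental ELM, by contrast, hidden neurons are typically added one at a time, and in the variant developed in this chapter their random parameters are presumably refined by a parallel chaos search rather than drawn once, as in this sketch.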
However, most studies on TDG prediction have focused on applying several kinds of models that incorporate a variety of input variables or predictors, which complicates the application of the proposed models. In addition, a single model linking TDG concentration to water temperature alone has not been well documented or investigated, which constitutes a strong motivation for our study. Extreme l