ISH Journal of Hydraulic Engineering
Journal homepage: https://www.tandfonline.com/loi/tish20

Leak detection in water distribution network using machine learning techniques

Nishant Sourabh, P.V. Timbadiya and P. L. Patel
Department of Civil Engineering, Sardar Vallabhbhai National Institute of Technology-Surat, Surat, Gujarat, India

To cite this article: Nishant Sourabh, P.V. Timbadiya & P. L. Patel (2023): Leak detection in water distribution network using machine learning techniques, ISH Journal of Hydraulic Engineering, DOI: 10.1080/09715010.2023.2198988
To link to this article: https://doi.org/10.1080/09715010.2023.2198988
Published online: 12 Apr 2023.

ABSTRACT
Leakage in the water distribution system (WDS), and its control, has long challenged the water resources fraternity in managing precious water supplies. This study examines an inverse engineering technique to find leaks in water supply pipelines. The main objective of the study is to identify the patterns of deviation in pressure/flow caused by a single leak in the network, by solving classification and regression problems using artificial neural networks (ANNs) and support vector machines (SVMs). The leak detection problems were solved for two scenarios, in which (a) only pressure measurements and (b) only flow measurements are taken in the system.
The multi-layered perceptron (MLP) model and multi-label multi-class SVM classification and regression models were developed and trained using the pressure and flow signals separately. The ANN model performed better than the SVM model in pressure- and flow-based leak detection in both the classification and regression problems. The model performance could also be improved by optimizing the number of inputs to the model during the training phase. The present study would be useful for water supply management when applying these techniques to minimise losses due to leakage in the water supply network.

Received 10 May 2022; Accepted 31 March 2023

1. Introduction

Every drop of clean water is precious. Leakage in water distribution systems is an important issue which affects customers worldwide. Leakage in the water supply system is essentially the wastage of water through cracks or fissures in pipes, tanks or reservoirs. A multitude of causes, including poor pipe connections, internal or exterior pipe corrosion, and mechanical damage brought on by an increased pipe load, can result in leakage in a WDS.

Kumar et al. (2005) state that India has 16% of the world's population but just 4% of the planet's freshwater resources. According to the National Commission for Irrigated Water Resource Development of India, the water shortage faced by the country has arisen due to wastage and poor management. According to the Food and Agriculture Organization, 92% of the available fresh water is used in the farming sector, 5% in domestic usage and the remaining 3% in the industrial sector. Due to leaks and inefficiencies in the water management system, the nation wastes close to 50% of its fresh water.
Keeping in view the Indian scenario of leakage in water distribution systems, it is of utmost importance to develop and demonstrate low-cost engineering solutions to identify the location of leakages, which would help public-health engineers to reduce 'unaccounted-for water (UFW)'. Water audits record the difference between the total quantity of water produced and the total quantity of water used. This difference provides an estimate of the leakage. The approach takes a lot of time and does not pinpoint the leak's location. Hamilton (2009) divided leak detection into three primary phases: localise, locate, and pinpoint. After a leak has occurred, localisation involves narrowing the leak down to a particular district metered area (DMA) or network segment. Locating the leak within the DMA is the second stage. The third and last stage, pinpointing, involves finding the leak's specific position within a 20 cm radius. El-Zahab and Zayed (2019) state that the challenge is to distinguish leak signals from those caused by pumps or open fire hydrants, which confuse vibration-based leak detection instruments or sensors and generate false alarms (El-Zahab et al. 2016; Khulief et al. 2012; Stoianov et al. 2007).

Puust et al. (2010) and El-Zahab and Zayed (2019) summarise the various methods used for leak detection. Puust et al. (2010) broadly categorised the leak detection approaches into three categories, viz., equipment-based methods, numerical or hydraulic modelling, and their combinations. The equipment-based methods mainly comprise the use of installed or portable sensors to detect leaks along the pipelines. These methods mainly include leak noise correlators, ground penetrating radars (Hunaidi 1998; Lockwood et al. 2003; O'Brien et al. 2003), acoustic logging (Moyer et al. 1983; Hough 1988; Rajtar and Muthiah 1997), step-testing (Farley and Trow 2003; Pilcher et al. 2007), etc. These approaches are labour-intensive and expensive, and small leaks are more challenging to find with them, especially when employing acoustic logging in plastic pipes.

The numerical/hydraulic modelling approaches for leak detection are mostly based on the analysis of data from the water supply system. Billmann and Isermann (1987) proposed transient modelling; Zhang (1993) analysed statistical methods; Lambert (2002) proposed analysis through the water balance method; Silva et al. (1996) used the negative pressure wave; and Alkasseh et al. (2013) used the minimum night flow method. These strategies often use calibration and optimization methods to examine various network segments. The effectiveness of these techniques depends on the quality of the monitoring system and how frequently water is used. Owing to the intrinsic complexity of urban water distribution systems, Mashhadi et al. (2021) demonstrated that the wide range of existing methodologies underlines the tremendous difficulty in identifying and localising water leaks. These methods are useful in locating leakage; however, they fail to locate leakages in real time. Such issues can be overcome using machine learning techniques.

KEYWORDS: leak detection; artificial neural network; support vector machines; EPANET software; MATLAB programming
CONTACT P.V. Timbadiya pvtimbadiya@ced.svnit.ac.in
This article has been corrected with minor changes. These changes do not impact the academic content of the article.
© 2023 Indian Society for Hydraulics

Machine learning-based technology has recently drawn a lot of interest. An artificial neural network (ANN) model working on steady-state process parameters was developed by Belsito et al. (1998) for the purpose of locating leaks in liquefied gas pipeline networks. The system could find leaks as small as 1% of the flow rate.
Although small leaks were misclassified, the ANNs performed very well in locating large leaks where noise was not present. Caputo and Pelagagge (2003) described a method using a multi-layered perceptron (MLP) ANN trained by backpropagation to detect leaks in pipeline networks with good accuracy, i.e. it correctly identified the leaking pipe/branch, but predicted the leak size with an error of 3%. They could not, however, account for noise and measurement errors. Shinozuka et al. (2005) described a method using neural networks that monitors the online water pressure at selected locations, using supervisory control and data acquisition in the system, to determine the location and severity of damage in the water supply system. Their results showed that the number of monitoring stations can be less than one-tenth of the number of nodes in the water distribution system. A fuzzy ANN system for detecting problems in water supply systems was described by Izquierdo et al. (2007). Fuzzy estimation states were produced by the fuzzy model and then used to train ANNs on multi-dimensional units. The system was found to offer good classification accuracy for large leaks. The modelling requirements and, to some extent, the computer processing needs can be partially met by employing ANNs to monitor the state of the pipeline network.

Aksela et al. (2009) found that obtaining reasonable leak predictions with an ANN requires a large amount of historical data to train the network, and that the training data need to be updated every month. Therefore, the efficiency of these methods is usually lower. In addition, the methods cannot detect leakages quickly because the training time is usually relatively long, which delays the alarm. In order to detect anomalies in the water distribution time series data using a pattern-based approach, Mounce et al.
(2011) presented an ANN method based on a similarity study between new events and profiles derived from past occurrences. This aided in categorising recent events to find anomalies that might be related to leaks. Jin et al. (2014) used a neural network to detect leaks from de-noised sound signals emitted by the pipeline network. The relative error of the proposed method was found to be 1.1%. Zhang et al. (2016) used a multiclass SVM and applied the K-means clustering method to subdivide a large-scale water distribution network into leakage zones. Monte Carlo simulations were used to generate the leakage data. Using flow and pressure data, the multiclass SVM could locate the leakage zone. Chan et al. (2018) noted a substantial difficulty in estimating the number of clusters and a significant influence of the randomised first cluster on the clustering procedure. Rojek and Studzinski (2019) proposed that the ANN method could correctly identify and locate leaks in water distribution systems. Shravani et al. (2019) employed the MLP to detect the leak, and predicted its location based on the deviation in flows due to leaks in the network. The results showed that, amongst the machine learning based models, the MLP performed the best, with an accuracy of 94.47%.

A leakage detection technique combined with GIS-based spatial flow data analysis was proposed by Cantos et al. (2020). Possible leaks in a DMA were identified through continuous evaluation of the real-time distributed volume and the DMA consumption, including periods without meter readings. In order to prevent false alarms, the deployment of such a machine learning model for leak detection necessitates reliable data quality control and real-time system monitoring of the flow parameters in a complicated sensor network.
For their reliable integration in existing smart water networks, such models also need further training under realistic settings, forecasting abilities for pattern recognition of the effects of externalities, and data quality control. Using density-based spatial clustering of applications with noise (DBSCAN) and multiscale fully convolutional networks (MFCN), Hu et al. (2021) suggested a novel leakage detection model. The accuracy of the suggested method was found to be enhanced by 78%, 72%, and 28%, respectively, when compared to support vector machine (SVM), naive Bayes classifier (NBC), and k-nearest neighbour (KNN).

Tariq et al. (2022) used a data-driven application of MEMS-based accelerometers to measure linear motion (movement, shock or vibration) due to leakage in the system. For the precise classification of leak and no-leak scenarios from the extracted features, the authors used machine learning models based on Random Forest and Decision Trees. Random Forest was found to perform better than the other machine learning models, and the overall accuracy reached 100% for metal pipes and 94.93% for non-metal pipes. However, the accelerometers can only be used for temporary monitoring, and because each accelerometer has a 30-minute power backup, personnel must change the accelerometers every 30 minutes. Due to the restrictions on data collection, two types of models (metal and non-metal-based models) were developed and tested on various types of pipes and materials. Moreover, the accelerometers cannot be placed very far from the gateway without using a long-distance transmission antenna, which may be expensive.

In their novel CtL-SSL (Clustering-then-Localization Semi-supervised Learning) machine learning framework, Fan and Yu (2022) proposed using the topographical link between the WDN and its leaking characteristics for the location of the WDN's sensors, and the monitoring data for leakage detection and localization.
The calculation of the ideal number of leakage zones for various types of WDNs, as well as the optimization of the final detection and localization accuracy, are the method's limitations.

For the leakage detection problem, the best approach is to install a sufficiently dense monitoring system that can measure the flow and pressure characteristics in the network and compare the measured values with the values expected when there is no leak in the network. An abnormal state, such as a potential leak in the vicinity of a certain measurement site, is indicated by a discrepancy between the measured value and the expected value. Because numerous measuring instruments must be installed in the monitoring system to detect emergency conditions satisfactorily, this strategy is expensive. Therefore, it is crucial to optimise the network's sensor placement and density. According to Hart and Murray (2010), the issue of where to place water quality and contamination sensors within water distribution networks in order to improve monitoring and security capability has been thoroughly researched over the past ten years by hydro-informatics researchers and the community working on water distribution systems. The Threat Ensemble Vulnerability Assessment and Sensor Placement Optimization Tool (TEVA-SPOT v2.5), created by Murray et al. (2010), is the most advanced software for placing sensors in water distribution networks. It is based on the EPANET software engine and simulates various contamination scenarios using the hydraulic and quality solver. Eliades et al. (2016) presented a software tool, named the Sensor Placement (S-PLACE) Toolkit, to compute the locations of water quality sensors on the basis of the impact of contaminations in the water distribution network. This toolkit is programmed in MATLAB using the EPANET software library.
The literature review presented in the preceding sections indicated that previous researchers applied machine learning algorithms either to identify leaks or to localise them in the pipe network. This research article proposes to fill the gap by applying machine learning techniques to detect, locate and pinpoint the leak and to find the leakage rate in a complex water distribution system, using the network characteristics obtained through an optimised network of pressure and flow sensors. The detection of leakage in a water distribution network requires three things, viz., detection of the leaking pipe (leak detection and leak localization), location of the leak in the detected pipe (leakage pinpointing) and the leak rate. The proposed methodology includes all three components. This article describes a method for using machine learning techniques to interpret data from a network of pressure sensors and/or flow-measuring devices monitoring a pipe network in order to determine the position and magnitude of network leaks.

2. Materials and methods

The present study was carried out on the Hanoi water distribution network. The main objective of the present study is to detect a single leakage in the water supply network. Classification and regression models have been developed using machine learning techniques, namely MLP ANN and SVM models, based on the nodal pressure and pipe flow measurements. The flowchart of the methodology is shown in Figure 1. The main steps included in the current methodology are:

● Development of a mathematical model of the network for the pressure/flow computations, according to the boundary conditions.
● Simulation of the leaks and computation of the resulting pressure/flow in the network under the various leak conditions.
● Correlation of the leak patterns to pressure/flow data using the machine learning techniques.
● Testing and predicting the leaking pipe as per the sensed pressure/flow data.

2.1. Problem formulation

The problem of leak detection requires three outputs, viz., the leaking pipe in the network, the leakage location in that pipe and the leakage rate. The analysis has been divided into two categories: a classification problem and a regression problem. The flows and pressure heads in the network due to leakages have been used to develop the classification problem, which detects the leaking pipe in the water distribution network using pattern recognition in ANN and SVM. Both ANN and SVM have their own model parameters, such as activation and kernel functions, learning algorithm, scaling of training data, etc., which need to be optimized for better predictions. The classification problem has been formulated in such a way that the output of the model is the leaking pipe. The solution of the regression problem provides the location of the leak in the leaking pipe and the leak rate.

A neural network is described as an interconnected assembly of basic processing elements, units, or nodes (Thirumalaiah and Deo 1998). Similar to the human brain, such networks are built to spot hidden patterns in the data. The majority of ANNs use a non-linear transfer function as their underlying method, which is applied to a collection of input variables. There are basically two types of problems which can be solved using ANNs: classification and regression problems. The goal of pattern classification is to assign input patterns to one of a finite number of classes.
In order to apply it effectively, features must be chosen that carry the information necessary to distinguish between classes, are insensitive to irrelevant input variability, and are few in number, so that discriminant functions can be computed effectively and the amount of training data needed is reduced (Dehuri and Cho 2010). The processing ability of the network is stored in the interconnection unit strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns (Huelss 2020). An example of a neural network architecture is shown in Figure 2.

Caputo and Pelagagge (2003) stated that ANNs have proven successful in approximating non-linear multivariable functions and in classification problems. By monitoring water pressures online at a few chosen points in the system, Shinozuka et al. (2005) described an approach for determining the location and extent of damage in a water delivery system. Mounce and Machell (2006) presented an application of ANN for the analysis of data from sensors measuring hydraulic parameters (flow and pressure) in a water distribution system, and were able to locate the leakage in the network with an accuracy of almost 98.33%.

SVM is a supervised machine learning model that uses classification algorithms for two-class problems. The SVM performs classification by constructing a hyperplane that optimally separates the data into two categories. In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space, i.e. the space surrounding an object (Suthaharan 2016). The model, together with its associated learning algorithms, analyses data and finds the equation of the hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks such as outlier detection (Samudrala 2018).
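As a minimal illustration of the hyperplane idea described above, the sketch below classifies points by the sign of w·x + b. The weight vector w, the offset b and the sample points are hypothetical values chosen for the example; in a trained SVM they would come from the optimisation, which is not shown here.

```python
import numpy as np

def hyperplane_side(X, w, b):
    """Return +1 or -1 for each row of X, depending on which side of
    the hyperplane w.x + b = 0 the point lies (points on it get +1)."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Hypothetical 2-D case: the hyperplane is the line x1 + x2 - 1 = 0,
# a 1-D subspace of the 2-D ambient space.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0],
              [0.0, 0.0],
              [1.0, 0.5]])

labels = hyperplane_side(X, w, b)
# Signed distance of each point from the hyperplane
distances = (X @ w + b) / np.linalg.norm(w)
```

In two dimensions the separating hyperplane is a line; with the seven pressure inputs used in this study it would be a six-dimensional surface in the seven-dimensional input space.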
The training data in the original feature space may not always be linearly separable, and the separation therefore needs to be improved by inducing additional dimensions in the feature space. The kernel trick is used to achieve this higher dimensionality without altering the data in the original feature space. Figure 3 shows the typical SVM architecture.

Figure 1. Proposed methodology in the present study.
Figure 2. Typical neural network architecture.

There are two different ways in which the inputs are trained against the target values in SVM modelling: (i) One Against One (O-A-O) and (ii) One Against All (O-A-A). In O-A-O, a binary classifier is trained for each pair of classes, so that one set of inputs is related to only one class at a time (Kreßel 1999). According to Hsu and Lin (2002), the total number of models created is n(n–1)/2, where n is the number of classes. Each binary classifier predicts one class label, and the class that receives the most predictions is the output of the one-against-one strategy. O-A-A, in contrast, is the earliest implemented method of SVM multi-class classification; in it, one set of inputs is related to all classes, so the total number of classifier models created equals the number of classes. The one-against-one strategy for training a multi-class SVM classification model is reported to be superior to the one-against-all strategy (Hsu and Lin 2002).

de Silva et al. (2010) explored SVMs as pattern recognisers to detect leaks in pipe networks. To solve the issue of erroneous leak detection, Mandal et al. (2012) suggested a novel leak detection system based on rough set theory and SVM. For the computational training of the SVM, they used a swarm intelligence technique, the artificial bee colony algorithm, which imitates the intelligent food-searching behaviour of honey bees. Mashford et al. (2012) developed a method for interpreting data from a network of pressure or flow-measuring devices monitoring a pipe network in order to determine the location and size of leaks in the network using SVM analysis. Abdulla et al. (2013) proposed a method to detect leaks in a pipeline using an ANN model with only three inputs. They investigated a neural network based probabilistic decision support system for detecting leakage in a pipeline system. Their model correlates measurements of inlet and outlet pressures and flow to leak status. Van der Walt et al. (2018) compared machine learning and statistical techniques in pipe network leak detection. In their study, they used a Bayesian probabilistic framework and compared it with two machine learning techniques (ANN and SVM), with application to three different networks, viz., a single pipe network, a benchmark problem from Poulakis et al. (2003) and an experimental network.

The modelling and computer processing needs can be mitigated in part by employing ANN and SVM to monitor the state of piping networks, with a particular emphasis on leaks and losses. These machine learning techniques can be used to correlate the effect of leakages or water losses with network characteristics such as pressures and flows. The correlation helps in finding the leaks and supports decision making by water infrastructure authorities.

Figure 3. Typical classification in SVM. (Reproduced after receiving permission from Baeldung Team).

2.2. Network simulation

The Hanoi water distribution network, first introduced by Fujiwara and Khang (1990), was modelled in the EPANET software (see Figure 4), and validated using the optimised solution of Eussuff and Lansey (2003), who had optimised the diameters in the network using the Genetic Algorithm and the Shuffled Frog Leap Algorithm.
The network is configured with 3 loops, 34 pipes and 31 demand nodes, with one reservoir having 100 m of head, as shown in Figure 4. The pipes in the network have lengths ranging from 100 to 3500 m; the total length of the network is 39.42 km and the total demand in the network is 5538.9 litres per second (lps). The source is a fixed-head reservoir with a water surface elevation of 100 m above mean sea level (MSL). The minimum head required at all the junctions is 30 m above MSL.

Figure 4. Layout of Hanoi water distribution network (Fujiwara and Khang 1990) (Figure reproduced after receiving permission).
Figure 5. Modified layout of the Hanoi water distribution network. (Original nodes: J-1 to J-31; extra nodes: J-32 to J-65)

The network's characteristics under 5000 different leak rates in the range of 10–350 lps were formulated using the EPANET model coupled with MATLAB programming and the EPANET toolkit. The pressure heads and pipe flows from these simulations were recorded and used for the development of the machine learning models using the MLP neural network and SVM. The head loss in the network was calculated using the Hazen-Williams equation. The maximum head loss per km of length was found in the first pipe (R1-P1), with a unit headloss of 28.59 m/km. The range of the velocity in the network is 0.01–6.83 m/s. The distribution of diameters, demand, pressure at the nodes, pipe lengths, and flow in the network is provided in APPENDIX A (Tables A1, A2, A3).

2.3. Data generation and feature selection

The leak rates were generated randomly from a uniform distribution through MATLAB programming (MATLAB 2014a). The details of the data set are given in Table 1. The programming code for the simulations was developed in MATLAB (2014a). To model the leakages in the network, an extra node was added at the centre of each pipe, and the leak rates were added as an extra demand at the newly added nodes. Each extra node represents a leak in the respective pipe. The modified EPANET model of the Hanoi water distribution network can be seen in Figure 5.

Table 1. Details of the randomly generated data for leak size.
Properties | Values | Units
Number of leak rates | 5000 | -
Mean leak size | 181.60 | lps
Maximum leak size | 349.93 | lps
Minimum leak size | 10.07 | lps
Standard deviation | 97.74 | lps

2.3.1. Case A – model based on pressure measurements

The hydraulic model developed in the EPANET software was solved for a single leak in the Hanoi water distribution network at a time. The locations of the pressure sensors were optimised using the S-PLACE toolkit (Eliades et al. 2014) in MATLAB and were found at nodes J-2, J-4, J-6, J-9, J-17, J-22 and J-24, as shown in Figure 3. The number of pressure and flow sensors is taken as per Van der Walt et al. (2018). This number can be optimized further on the basis of the available budget, which is left as future scope of the present work. The pressures at the sensor locations were recorded after simulating the EPANET model with the 5000 leak rates applied as a base demand at each extra node. The pressures at these nodes were used as the inputs for the ANN and SVM models, which were trained against the pipe number as the target for the classification problem. The same inputs were used again to train the ANN and SVM models against the targets of leak location and leak rate in the regression problem.

2.3.2. Case B – model based on flow measurements

The hydraulic model developed in the EPANET software was solved for a single leak in the network at a time. The locations of the flow sensors were optimised using the S-PLACE toolkit in MATLAB.
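The data generation common to Cases A and B can be sketched as follows. This is only an illustration of the sampling step under stated assumptions: it uses NumPy rather than the MATLAB code of the study, the random seed is arbitrary, and the EPANET hydraulic simulation that turns each scenario into pressures and flows is not reproduced.

```python
import numpy as np

N_PIPES = 34                 # pipes in the Hanoi network (from the text)
N_LEAK_RATES = 5000          # leak rates generated (from the text)
Q_MIN, Q_MAX = 10.0, 350.0   # leak-rate range in lps (from the text)

rng = np.random.default_rng(seed=0)  # arbitrary seed, for repeatability only
leak_rates = rng.uniform(Q_MIN, Q_MAX, size=N_LEAK_RATES)

# Theoretical moments of a uniform distribution on [Q_MIN, Q_MAX]
theory_mean = (Q_MIN + Q_MAX) / 2            # 180 lps
theory_std = (Q_MAX - Q_MIN) / np.sqrt(12)   # about 98.15 lps

# One scenario per (pipe, leak rate) pair: a single leak applied as an
# extra demand at the mid-pipe node of that pipe.
pipe_ids = np.repeat(np.arange(1, N_PIPES + 1), N_LEAK_RATES)
scenario_rates = np.tile(leak_rates, N_PIPES)
n_scenarios = pipe_ids.size  # 34 * 5000 = 170000
```

The theoretical mean (180 lps) and standard deviation (about 98.15 lps) of a uniform distribution on this range are close to the sample values of 181.60 lps and 97.74 lps reported in Table 1, and the 170,000 scenarios match the input sizes listed in the model configuration tables.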
The six flow sensors were located at pipes P-2, P-9, P-20, P-23, P-25 and P-28, and the flows in these pipes were recorded after simulating the EPANET model with the 5000 leak rates applied as a base demand at each extra node. The recorded flows were used as the inputs for the ANN and SVM models, which were trained against the pipe number as the target for the classification problem. The same inputs were used again to train the ANN and SVM models against the targets of leak location and leak rate in the regression problem.

Modelling leakage depends on understanding the hydraulics of leaks and how to incorporate that hydraulics into existing models of the water distribution system (Mutikanga et al. 2011). The leak has been modelled in the network as an extra demand at the leakage location. Consider a distribution network comprising m demand nodes and n pipes. The total demand in the network is $Q_{\mathrm{Total}}$, and $Q_{L}$ represents the amount of leakage in the i-th pipe of the network. The new total demand in the network, $Q'_{\mathrm{Total}}$, is then given by Eq. (1):

\[ Q'_{\mathrm{Total}} = Q_{\mathrm{Total}} + Q_{L} \tag{1} \]

3. Results and discussion

Leakage detection is an inverse engineering problem: if the pressures and/or flows in the network are recorded through sensors in real time, it becomes possible to detect the leak and its location in the network.

3.1. Classification problem

3.1.1. Solutions from pressure-based ANN model

The ANN recognizes patterns in the data and classifies them to identify and locate the leak. An ANN model usually consists of four main components: (a) input and output variables, (b) type of network, (c) transfer function, and (d) training and learning function. A feed-forward network with gradient descent backpropagation (Amari 1993) as the learning function was used in developing the pattern recognition neural network.
An MLP neural network classification model was developed to detect a single leakage in the network based on the pressure heads at the nodes. The network was solved for the randomly generated leak rates and the pressures in the network were recorded. In this case, the pressure heads at a total of seven nodes (J-2, J-4, J-6, J-9, J-17, J-22, and J-24) were taken as the inputs, and the pipe numbers were taken as the output in 34 different classes. The complete ANN configuration is given in Table 2. To guard against overfitting, the ANN model was validated using 10% of the data set before testing. The structure of the developed MLP neural network is 7-30-30-20-34. The optimal network architecture was achieved by manually optimizing the number of hidden layers, the number of neurons and the activation functions of the hidden layers to increase the accuracy of the model, keeping the maximum number of epochs and the other training parameters constant in each case. The mean squared error (MSE) vs epochs for the trained optimal model is shown in Figure 6. The figure indicates that the minimum MSE is observed at 2000 epochs.

Boyce et al. (2002) state that the confusion matrix, used especially in classification problems, compares the predictions of the model with the respective actual or observed values. The comparison of the predicted and actual leaking pipes in the network is shown as a confusion matrix in Figure 7. It indicates that the accuracy of the ANN model is 91.2%, i.e. the leak is correctly predicted in 31 out of 34 pipes. Cross-entropy, a measure of the difference between two probability distributions, is commonly used in machine learning as a loss function when optimizing classification models such as logistic regression, ANN and SVM. It has been used as the objective function in testing and determining the accuracy of the ANN and SVM classification models.
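The forward pass of a 7-30-30-20-34 network and the cross-entropy objective can be sketched as below. This is an illustrative NumPy sketch, not the MATLAB model of the study: the weights are random and untrained, the input vectors and labels are hypothetical, and the softmax output layer is an assumption, since the output transfer function is not stated. The hidden-layer functions follow the transfer function names listed in Table 2 (logsig, poslin, purelin).

```python
import numpy as np

def logsig(x):   # logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def poslin(x):   # positive linear (ReLU)
    return np.maximum(x, 0.0)

def purelin(x):  # linear (identity)
    return x

def softmax(z):  # assumed output layer: class probabilities per row
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean cross-entropy between predicted probabilities and integer labels."""
    picked = probs[np.arange(labels.size), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

LAYER_SIZES = [7, 30, 30, 20, 34]         # 7 pressures in, 34 pipe classes out
HIDDEN_ACTIVATIONS = [logsig, poslin, purelin]

rng = np.random.default_rng(0)
# Random, untrained weights: for illustrating shapes only
weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n) for n in LAYER_SIZES[1:]]

def forward(x):
    a = x
    for W, b, act in zip(weights[:-1], biases[:-1], HIDDEN_ACTIVATIONS):
        a = act(a @ W + b)
    return softmax(a @ weights[-1] + biases[-1])

x = rng.normal(size=(5, 7))             # 5 hypothetical scaled pressure vectors
labels = np.array([0, 5, 12, 20, 33])   # hypothetical pipe indices (0-based)
probs = forward(x)
loss = cross_entropy(probs, labels)

pipe_accuracy = 31 / 34                 # 31 of 34 pipes correct, about 91.2%
```

For reference, a classifier that assigns equal probability to all 34 pipes has a cross-entropy of ln(34), about 3.53, so a trained model should fall well below that value.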
The classification models are trained by minimizing the cross-entropy, whereas the sum of squared errors is used in the case of the regression models.

Table 2. Details of developed MLP neural network model (Case A).

| S. No. | ANN parameters | Observation |
|---|---|---|
| 1 | Size of input data | 7 × 5000 × 34, i.e. 7 × 170,000 |
| 2 | Size of output data | 1 × 34 × 5000, i.e. 1 × 170,000 |
| 3 | Network type | Feed forward pattern recognition |
| 4 | Number of hidden layers, with transfer functions and sizes | 1. Logsig − 30; 2. Poslin − 30; 3. Purelin − 20 |
| 5 | Objective function | Cross-entropy |
| 6 | Learning rate | 0.1 (default value) |
| 7 | Training:validation:test | 60:10:30 (in %) |
| 8 | Accuracy for testing data | 91.2% (misclassification for 3 pipes out of 34 pipes) |

Figure 6. MSE vs number of epochs for the trained optimal architecture of pressure-based MLP ANN used in the study.

Figure 7. Confusion matrix for the ANN classification on tested data based on pressure measurements.

3.1.2. Solutions from flow-based ANN model

The EPANET model developed for the Hanoi water distribution network was solved for 5000 different leak rates at the mid-location of the different pipes in the network. The flows at six flow sensors (P-2, P-9, P-20, P-23, P-25 and P-28), shown in Figure 2, were taken as the inputs, and the pipe numbers were taken as the output in the form of 34 different classes. A supervised MLP network model was developed; its complete configuration can be found in Table 3. The structure of the developed MLP neural network is 6-10-25-25-34. The confusion matrix showing the classification by the developed neural network is given in Figure 8. The accuracy of the model in identifying the leaking pipe is 91.2%, as shown in Table 3 and Figure 8.

Table 3. Details of the MLP neural network model developed.

| S. No. | ANN parameters | Observation |
|---|---|---|
| 1 | Size of input data | 6 × 5000 × 34, i.e. 6 × 170,000 |
| 2 | Size of output data | 1 × 34 × 5000, i.e. 1 × 170,000 |
| 3 | Network type | Feed forward pattern recognition |
| 4 | Number of hidden layers, with transfer functions and sizes | 1. Logsig − 10; 2. Radbas − 25; 3. Poslin − 25 |
| 5 | Objective function | Cross-entropy |
| 6 | Learning rate | 0.1 (default value) |
| 7 | Training:validation:test | 60:10:30 (in %) |
| 8 | Accuracy for testing data | 91.2% (misclassification for 3 pipes out of 34 pipes) |

Figure 8. Confusion matrix for the ANN classification on tested data for flow measurements.

3.1.3. Solutions from pressure-based SVM model

The EPANET model developed for the Hanoi water distribution network was solved for 5000 different leak rates at the mid-location of the different pipes. The resulting pressure heads due to the leak rates were recorded using the seven pressure sensors distributed in the Hanoi network. The supervised multi-class SVM classification model was developed using the ‘Gaussian’ function as the kernel function. The SVM configuration used in the present study is given in Table 4.

Table 4. Details of the SVM classification model developed.

| S. No. | SVM parameters | Observation |
|---|---|---|
| 1 | Size of input data | 7 × 5000 × 34, i.e. 7 × 170,000 |
| 2 | Size of output data | 1 × 34 × 5000, i.e. 1 × 170,000 |
| 3 | Kernel function | Radial basis function (RBF) |
| 4 | Objective function | Cross-entropy |
| 6 | Training:validation:test | 60:10:30 (in %) |
| 7 | Number of binary classifiers | 561 (n = 34) |
| 8 | Scaling factor | 0.1 |
| 9 | Bias | −0.196 |
| 10 | Accuracy against tested data | 91.2% (misclassification for 3 pipes out of 34 pipes) |

From Table 4, it is apparent that the accuracy achieved against the tested data was 91.2% when the records of the pressure sensors were used as input to the SVM model. The confusion matrix showing the accuracy of the pressure-based model is given in Figure 9.

Figure 9. Confusion matrix for the SVM classification on tested data based on pressure measurements.

3.1.4. Solutions from flow-based SVM model

The pipe flows due to the leak rates were recorded using the six flow sensors distributed in the Hanoi network. The supervised multi-class SVM classification model was developed using the ‘Gaussian’ function as the kernel function. The SVM configuration used in the current study is given in Table 5, and the confusion matrix for this classification can be found in Figure 10. From Table 5 and Figure 10, it is seen that the accuracy of the SVM model against the test data set is 61.8% when the records of the flow sensors are used as inputs to the model. As per Hsu and Lin (2002), the one-vs-one approach is superior to the one-vs-all approach, and hence the one-vs-one approach is used in the present study. The flow-based model showed the lower accuracy (61.8%), which may be improved by optimizing the number of sensors and their locations in the network.

3.2. Regression problem

The Hanoi water distribution network consists of pipes with lengths varying from 100 to 3500 m. Each pipe in the network is divided into segments at equal intervals of 20 m, giving a total of 1963 different probable leak locations in the network. The leak rates at all these locations were simulated using the EPANET model through MATLAB programming. The number of probable leak locations for each pipe is tabulated in Table 6.

3.2.1. Solutions from pressure- and flow-based ANN model

The supervised MLP neural networks for each pipe were developed using the pressure and flow records separately. The developed models were tested against an independent data set, and the results can be seen in Appendix B. The data set for model testing was generated using a known leak size of 300 lps in each pipe, with the midpoint of each pipe as the leak location.
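The 20 m segmentation described above can be checked with a short calculation. The Python sketch below uses the 34 pipe lengths listed in Table 6 and counts the number of whole 20 m intervals in each pipe. Rounding down to whole intervals is an inference from the tabulated counts rather than a rule stated in the text, and the function and variable names are illustrative only.

```python
# Pipe lengths (m) for pipes 1..34 of the Hanoi network, as listed in Table 6.
PIPE_LENGTHS_M = [
    100, 1350, 900, 1150, 1450, 450, 850, 850, 800, 950,
    1200, 3500, 800, 500, 550, 2730, 1750, 800, 400, 2200,
    1500, 500, 2650, 1230, 1300, 850, 300, 750, 1500, 2000,
    1600, 150, 860, 950,
]
SEGMENT_M = 20  # spacing between probable leak locations


def probable_leak_locations(length_m, segment_m=SEGMENT_M):
    """Number of whole segments of `segment_m` metres that fit in a pipe.

    Assumption: partial segments are discarded (floor division), which
    reproduces the per-pipe counts reported in Table 6.
    """
    return length_m // segment_m


counts = [probable_leak_locations(length) for length in PIPE_LENGTHS_M]
total_locations = sum(counts)

for pipe_no, (length, count) in enumerate(zip(PIPE_LENGTHS_M, counts), start=1):
    print(f"pipe {pipe_no:2d}: {length:5d} m -> {count:3d} locations")
print(f"total probable leak locations: {total_locations}")
```

Under this assumption the per-pipe counts sum to the 1963 locations quoted in the text, which is also the number of regression models trained per case.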
The model parameters, such as the learning scale, the number of hidden layers, the size of the hidden layers and the transfer functions, were optimized manually through MATLAB programming on the basis of the prediction accuracy of the models. The performance indices of the pressure- and flow-based models are tabulated in Table 7.

Table 5. Details of the SVM classification model developed.

| S. No. | SVM parameters | Observation |
|---|---|---|
| 1 | Size of input data | 6 × 5000 × 34, i.e. 6 × 170,000 |
| 2 | Size of output data | 1 × 34 × 5000, i.e. 1 × 170,000 |
| 3 | Kernel function | Radial basis function (RBF) |
| 4 | Objective function | Cross-entropy |
| 6 | Training:validation:test | 60:10:30 (in %) |
| 7 | Number of binary classifiers | 561 (n = 34) |
| 8 | Scaling factor | 0.1 |
| 9 | Bias | −0.291 |
| 10 | Accuracy against tested data | 61.8% (misclassification for 13 pipes out of 34 pipes) |

Figure 10. Confusion matrix for the SVM classification on tested data based on flow measurements.

Table 6. Number of probable leak locations in each pipe of Hanoi network.

| Pipe number | Pipe length (m) | No. of probable leak locations = number of models |
|---|---|---|
| 1 | 100 | 5 |
| 2 | 1350 | 67 |
| 3 | 900 | 45 |
| 4 | 1150 | 57 |
| 5 | 1450 | 72 |
| 6 | 450 | 22 |
| 7 | 850 | 42 |
| 8 | 850 | 42 |
| 9 | 800 | 40 |
| 10 | 950 | 47 |
| 11 | 1200 | 60 |
| 12 | 3500 | 175 |
| 13 | 800 | 40 |
| 14 | 500 | 25 |
| 15 | 550 | 27 |
| 16 | 2730 | 136 |
| 17 | 1750 | 87 |
| 18 | 800 | 40 |
| 19 | 400 | 20 |
| 20 | 2200 | 110 |
| 21 | 1500 | 75 |
| 22 | 500 | 25 |
| 23 | 2650 | 132 |
| 24 | 1230 | 61 |
| 25 | 1300 | 65 |
| 26 | 850 | 42 |
| 27 | 300 | 15 |
| 28 | 750 | 37 |
| 29 | 1500 | 75 |
| 30 | 2000 | 100 |
| 31 | 1600 | 80 |
| 32 | 150 | 7 |
| 33 | 860 | 43 |
| 34 | 950 | 47 |

Table 7. Performance of ANN regression models using pressure and flow data for leak location and size detection.

| Statistical parameters | Case A (pressure models): leak location | Case A (pressure models): leak size | Case B (flow models): leak location | Case B (flow models): leak size |
|---|---|---|---|---|
| RMSE | 54.18 m | 57.96 lps | 54.02 m | 29.56 lps |
| n-RMSE | 0.093 | 0.193 | 0.093 | 0.10 |
| R2 | 0.98 | - | 0.98 | - |
| MAE | 42.35 | 32.19 | 40.59 | 19.36 |
| MSE | 2935.29 | 3358.82 | 2917.65 | 873.69 |

3.2.2. Solutions from pressure- and flow-based SVM model

The pressure and flow values recorded at the respective sensors were used to train the SVM models against the leak locations and the leak rates. The resulting models were tested against the testing data set prepared by simulating a leak size of 300 lps at the centre of every pipe in the network. The performance indices of the pressure- and flow-based models are tabulated in Table 8. The results from the ANN and SVM regression modelling can be found in Appendix B (Tables B1 and B2).

Table 8. Performance of the SVM regression analysis using pressure and flow data for leak location and leak sizes.

| Statistical parameters | Case A (pressure models): leak location | Case A (pressure models): leak size | Case B (flow models): leak location | Case B (flow models): leak size |
|---|---|---|---|---|
| RMSE | 50.06 | 13.20 | 45.44 | 31.85 |
| n-RMSE | 0.086 | 0.044 | 0.078 | 0.11 |
| R2 | 0.986 | - | 0.986 | - |
| MAE | 20.59 | 12.88 | 21.47 | 20.74 |
| MSE | 2505.88 | 174.37 | 2064.71 | 1014.24 |

4. Discussions

The MLP neural network model was developed to detect the leaking pipe in the network using pressure values. The model used three hidden layers of 30, 30 and 20 neurons with logsig (log sigmoidal), poslin (positive linear) and purelin (pure linear) transfer functions, respectively. The model classified the leaking pipe with about 91.2% accuracy. The MLP neural network model for the flow values was also developed with three hidden layers, of 10, 25 and 25 neurons, with logsig, radbas (radial basis) and poslin transfer functions, respectively. This model also predicts the leaking pipe in the network with 91.2% accuracy.

The supervised multi-class SVM classification model was developed using the pressure values from the pressure sensors, with the ‘Gaussian’ function as the kernel function. The model predicted the leaking pipe with 91.2% accuracy when trained using the one-against-one strategy.
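The one-against-one strategy trains one binary classifier for every pair of classes, which is where the 561 binary classifiers reported for the 34 pipe classes come from. The Python sketch below enumerates the pairs and shows a simple majority-vote decision rule. The voting rule is a common choice shown here only for illustration; the paper does not state which decoding scheme was used, and the function names and the 3-class toy outcome are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


def majority_vote(pair_winners):
    """Return the class that wins the most pairwise contests."""
    return Counter(pair_winners).most_common(1)[0][0]


pipe_classes = list(range(1, 35))        # pipe numbers 1..34
pairs = one_vs_one_pairs(pipe_classes)
num_one_vs_one = len(pairs)              # n * (n - 1) / 2 binary classifiers
num_one_vs_all = len(pipe_classes)       # n binary classifiers, for comparison

# Hypothetical winners of the contests (1 vs 2), (1 vs 3), (2 vs 3).
toy_winners = [2, 3, 2]
toy_prediction = majority_vote(toy_winners)

print(f"one-vs-one classifiers for 34 classes: {num_one_vs_one}")
print(f"one-vs-all classifiers for 34 classes: {num_one_vs_all}")
print(f"toy prediction by majority vote      : class {toy_prediction}")
```

The one-vs-one scheme needs many more classifiers than one-vs-all, but each of them is trained on the samples of only two classes.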
The SVM multi-class classification model was also developed using the flow values from the sensors, again with the ‘Gaussian’ function as the kernel function. This model predicted the leaking pipe with 61.8% accuracy.

The number of sensors installed in the network is one of the major factors governing the performance of the classification and regression models developed in the present study. The number of sensors can be optimized further using existing algorithms. In the present study, seven pressure sensors and six flow sensors were selected, which corresponds to one sensor per 5.63 km and one sensor per 6.57 km of pipe, respectively.

Salam et al. (2014) used the emitter coefficient to model leakage in the network through the orifice equation. They selected emitter coefficients from 0.005 to 0.3 with an average system pressure of 3.74 m, which in turn gives leak rates of 0.01–0.6 lps. Rojek and Studzinski (2019) considered leak rates in the range of 15–35 lps. In previous studies, researchers have also expressed the leak rates as percentages (2%, 4%, 6%, 8% and 10%) of the flow rates in the pipes (Caputo and Pelagagge 2003), and as 0.7% to 3.3% of the maximum flow in the network (Van der Walt et al. 2018). In the present study, the range of leak rates has been chosen according to the maximum flow (5538.91 lps) in the Hanoi water distribution network. The range taken is 10–350 lps, which is approximately 0.18%–6.32% of the maximum flow in the network without leakage and is in line with existing practice in the literature. The network has been analyzed as a demand-driven network. The leak detection techniques presented in the current study can be combined with online monitoring systems to allow quick and accurate detection of leaks.
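The sensor densities and leak-rate percentages quoted above follow from simple arithmetic, sketched below in Python. The pipe lengths per diameter are those of Table A2 and the maximum flow is the value given in the text. Interpreting the sensor density as the total network length divided by the number of sensors is an assumption, and the variable names are illustrative.

```python
# Total pipe length (m) per diameter (mm), as listed in Table A2.
LENGTH_BY_DIAMETER_M = {
    1016: 12750,
    762: 4680,
    609.6: 4700,
    508: 5050,
    406.4: 8390,
    304.8: 3850,
}
MAX_FLOW_LPS = 5538.91        # maximum flow in the network without leakage
N_PRESSURE_SENSORS = 7
N_FLOW_SENSORS = 6
LEAK_RANGE_LPS = (10, 350)    # simulated leak rates

total_length_km = sum(LENGTH_BY_DIAMETER_M.values()) / 1000

# Assumed definition: network length served by each sensor.
km_per_pressure_sensor = total_length_km / N_PRESSURE_SENSORS
km_per_flow_sensor = total_length_km / N_FLOW_SENSORS

# Leak rates expressed as a percentage of the maximum flow.
leak_pct_low, leak_pct_high = (100 * q / MAX_FLOW_LPS for q in LEAK_RANGE_LPS)

print(f"total network length  : {total_length_km:.2f} km")
print(f"km per pressure sensor: {km_per_pressure_sensor:.2f}")
print(f"km per flow sensor    : {km_per_flow_sensor:.2f}")
print(f"leak rate range       : {leak_pct_low:.2f}% to {leak_pct_high:.2f}% of maximum flow")
```

With a total length of 39.42 km, this gives about 5.63 km per pressure sensor, 6.57 km per flow sensor, and a leak-rate range of about 0.18% to 6.32% of the maximum flow.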
The present study has been carried out to detect the leaking pipe, the leak location and the leak rate in the water distribution network. The classification and regression models developed through the ANN and SVM techniques require optimization of their model parameters. The performance of the models depends on the size of the data set used and the number of features selected for training. In the present study, the models based on pressure sensor measurements have performed better than those based on flow sensors. This is attributed to the way the leakages are modelled, namely as base demands at extra nodes added to the network. A basic assumption in the analysis of a water distribution network is that, whatever the demand in the network, enough water is available to fulfil the total demand. This means that the modelled network is never deficient in terms of demand and supply, whereas under actual field conditions the network is always in deficit after leakages and the corresponding losses. So, when the leak rates are added to the network as extra demand, the average system pressure changes, making the pressure sensors more sensitive to the deviations in nodal pressures.

Carreno-Alvarado et al. (2017) compared machine learning classifiers for leak detection and isolation in water distribution networks. They used principal component analysis (PCA), SVM and the relevance vector machine (RVM), and found that RVM is suitable for leak detection as it has almost the same accuracy as SVM but requires a smaller number of vectors. Quinones-Grueiro et al. (2018a) compared k-nearest neighbour, Bayes classifier, ANN and SVM for leak location in the Hanoi distribution network. Their results showed that SVM outperforms all the other techniques. Quinones-Grueiro et al. (2018b) also considered an unsupervised approach to leak detection and localization in water distribution networks and tested their methodology on the Hanoi benchmark problem. The leak rates they considered were in the range of 18–40 lps (less than 2.5%). They found that periodic dynamic PCA with three pressure sensors gives 72% accuracy in detecting the leak and 85.25% accuracy in locating it. Akinsete and Oshingbesan (2019) investigated five intelligent models, namely gradient boosting (GB), decision tree (DT), random forest (RF), SVM and ANN, in natural gas pipelines. Their results showed that the RF and DT models are the most sensitive, as they can detect a leak of 0.1% of nominal flow in about 2 h. However, the ANN and SVM showed the best performance. Lučin et al. (2021) used an RF classifier for data-driven leak localization in urban water distribution networks using big data and found that RF showed a maximum accuracy of 82% for smaller networks and 62% for larger networks.

The data generation, the training of the models and the evaluation of the trained models were performed on a PC with the following configuration: Core i5 7th Gen processor; 8 GB RAM; 4 GB NVIDIA GeForce GTX 1050Ti graphics card; 4 physical cores; multithreading available; maximum threads 2. The data generation took the longest, 7–8 days. Training the ANN and SVM classification and regression models took at most 5–6 h. Testing a model required less than 0.5 s.

5. Summary and conclusions

The present study has been carried out to detect, locate and pinpoint leakage in a water distribution network using machine learning techniques, viz., ANN and SVM. The benchmark problem of the Hanoi water distribution network has been used in the study, and the leak detection has been carried out for leak rates ranging from 10 to 350 lps (0.18–6.32% of the maximum flow in the network).
The network is assumed to have seven pressure sensors (J-2, J-4, J-6, J-9, J-17, J-22 and J-24) and six flow sensors (P-2, P-9, P-20, P-23, P-25 and P-28). The problem of leak detection was divided into two different problems, classification and regression. The classification problem was solved to find the leaking pipe in the network, whereas the regression problem was solved to find the leak location within the particular pipe and the leak rate. The ANN and SVM models were developed to detect a single leak in the network. The leak simulated for testing the models was 300 lps. The following are the key findings of the present study:

● ANN classification performed better than SVM multi-class classification, achieving 91.2% accuracy in both the pressure- and flow-based models, in contrast to the SVM models, which gave accuracies of 91.2% and only 61.8% in the pressure- and flow-based models, respectively.
● In the regression problem, all the pipes were divided into several parts at intervals of 20 m. For each location, a model was developed using ANN and SVM. Since there are a total of 1963 different locations in the network at 20 m intervals, the number of models developed is also 1963 for each case in ANN and SVM.
● The pressure-based ANN regression model yielded a normalised RMSE (n-RMSE) of 0.093 and 0.193, whereas the pressure-based SVM regression model showed an n-RMSE of 0.086 and 0.044, for the detection of leak location and leak rate, respectively.
● The flow-based ANN regression model performed with an n-RMSE of 0.093 and 0.10, whereas for the flow-based SVM regression model the n-RMSE was 0.078 and 0.11, for the detection of leak location and leak rate, respectively.
● For pinpointing the leak and detecting the leak rate in the regression problem, the SVM regression model performed better than the ANN regression model in both the pressure- and flow-based models.

6. Limitations and future scope

The present study was carried out to detect leakage in a water distribution network using machine learning techniques and a network of sensors. The techniques applied in the study require the sensors to capture the maximum effect of leakage on the flow characteristics of the network. The number and location of the sensors can be optimized further using existing algorithms. The reason is that the machine learning techniques applied here recognise the pattern of changes in the network due to leakages. This gives rise to the problem of optimising the number of sensors and their locations in the water distribution network. Also, the study has been carried out considering the pressure and flow measurements separately. The present study compares the ANN and SVM techniques in terms of leak detection and performance accuracy. Leak detection using more advanced methods such as convolutional neural networks (CNNs) can be the future scope of the present study. However, Yamashita et al. (2018) state that methods like CNN require a large amount of training data because of the estimation of their numerous learnable parameters, making them more computationally expensive. They also require graphics processing units (GPUs) for model training. Future work can also include the development of classification and regression models that combine the pressure and flow measurements using machine learning techniques.

Acknowledgements

The authors would like to acknowledge the Centre of Excellence on “Water Resources and Flood Management” at the Department of Civil Engineering, Sardar Vallabhbhai National Institute of Technology – Surat, Gujarat, India, established under the TEQIP-II grant of the Ministry of Education, for providing the required facilities and infrastructural support. The authors are thankful to the Editor, Associate Editor and Reviewers for their comments, which helped to improve the readability of the present paper.
Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

P.V. Timbadiya http://orcid.org/0000-0001-8472-3318

Data availability statement

The distribution network data used and the results from the regression analysis in this study are available in Appendices A and B after the references. Any other data related to the study will be available on request for academic purposes only. Interested readers may directly contact the corresponding author for any other data requirements.

References

Abdulla, M.B., Herzallah, R.O., and Hammad, M.A. (2013). “Pipeline leak detection using artificial neural network: Experimental study.” Proceedings of International Conference on Modelling, Identification and Control (ICMIC), Cairo, Egypt, 328–332.
Akinsete, O., and Oshingbesan, A. (2019). “Leak detection in natural gas pipelines using intelligent models.” Proceedings of SPE Nigeria Annual International Conference and Exhibition, Nigeria: OnePetro, 10.2118/198738-MS.
Aksela, K., Aksela, M., and Vahala, R. (2009). “Leakage detection in a real distribution network using a SOM.” Urban Water J., 6(4), 279–289. 10.1080/15730620802673079.
Alkasseh, J., Adlan, M.N., Abustan, I., Aziz, H.A., and Hanif, A.B.M. (2013). “Applying minimum night flow to estimate water loss using statistical modelling: A case study in Kinta Valley, Malaysia.” Water Resour. Manage., 27(5), 1439–1455. 10.1007/s11269-012-0247-2.
Amari, S.I. (1993). “Backpropagation and stochastic gradient descent method.” Neurocomputing, 5(4–5), 185–196. 10.1016/0925-2312(93)90006-O.
Belsito, S., Lombardi, P., Andreussi, P., and Banerjee, S. (1998). “Leak detection in liquefied gas pipelines by artificial neural networks.” Process Systems Engineering, AIChE Journal, 44(12), 2675–2688. 10.1002/aic.690441209.
Billmann, L., and Isermann, R. (1987). “Leak detection methods for pipelines.” Automatica, 23(3), 381–385. 10.1016/0005-1098(87)90011-2.
Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K. (2002). “Evaluating resource selection functions.” Ecol. Modell., 157(2–3), 281–300. 10.1016/S0304-3800(02)00200-4.
Cantos, W.P., Juran, I., and Tinelli, S. (2020). “Machine-learning–based risk assessment method for leak detection and geolocation in a water distribution system.” J. Infrastruct. Syst., 26(1), 04019039. 10.1061/(ASCE)IS.1943-555X.0000517.
Caputo, A.C., and Pelagagge, P.M. (2003). “Using neural networks to monitor piping systems.” Process Safety Progress, AIChE Journal, 22(2), 119–127. 10.1002/prs.680220208.
Carreno-Alvarado, E.P., Reynoso-Meza, G., Montalvo, I., and Izquierdo, J. (2017). “A comparison of machine learning classifiers for leak detection and isolation in urban networks.” In: Proceedings of Congress on numerical methods in engineering, SEMNI 2017, Valencia, Spain: International Center for Numerical Methods in Engineering (CIMNE), 1545–1552. http://hdl.handle.net/10251/160954.
Chan, T.K., Chin, C.S., and Zhong, X. (2018). “Review of current technologies and proposed intelligent methodologies for water distributed network leakage detection.” IEEE Access, 6, 78846–78867. 10.1109/ACCESS.2018.2885444.
Dehuri, S., and Cho, S.B. (2010). “A hybrid genetic based functional link artificial neural network with a statistical comparison of classifiers over multiple datasets.” Neural Comput. Appl., 19(2), 317–328. 10.1007/s00521-009-0310-y.
de Silva, D., Mashford, J., and Burn, S. (2010). “Computer aided leak location and sizing in pipe network.” St Lucia, Queensland, Australia: Urban Water Security Research Alliance Technical Report No 17.
Eliades, D.G., Kyriakou, M., and Polycarpou, M.M. (2014). “Sensor placement in water distribution systems using the S-PLACE Toolkit.” Procedia Engineering, 70, 602–611. 10.1016/j.proeng.2014.02.066.
Eliades, D.G., Kyriakou, M., Vrachimis, S., and Polycarpou, M.M. (2016).
“EPANET-MATLAB toolkit: An open-source software for interfacing EPANET with MATLAB.” Proceedings of the 14th International Conference on Computing and Control for the Water Industry, Computer Control for Water Industry (CCWI 2016), Amsterdam, Netherlands, 10.5281/zenodo.437751.
El-Zahab, S., and Zayed, T. (2019). “Leak detection in water distribution networks: An introductory overview.” Smart Water, 4(5), 1–23. 10.1186/s40713-019-0017-x.
Eussuff, M.M., and Lansey, K.E. (2003). “Optimization of water distribution network design using the shuffled frog leaping algorithm.” J. Water Resour. Plann. Manage. (ASCE), 129(3), 210–225. 10.1061/(ASCE)0733-9496(2003)129:3(210).
Fan, X., and Yu, X. (2022). “An innovative machine learning based framework for water distribution network leakage detection and localization.” Struct. Health Monit., 21(4), 1626–1644. 10.1177/14759217211040269.
Farley, M., and Trow, S. (2003). Losses in water distribution networks. IWA Publishing, London, UK.
Fujiwara, O., and Khang, D.B. (1990). “A two‐phase decomposition method for optimal design of looped water distribution networks.” Water Resour. Res., 26(4), 539–549. 10.1029/WR026i004p00539.
Hamilton, S. (2009). “ALC in low pressure areas - it can be done.” Proceedings of 5th IWA Water Loss Reduction Specialist Conference, Cape Town, South Africa, 131–137.
Hart, W.E., and Murray, R. (2010). “Review of sensor placement strategies for contamination warning systems in drinking water distribution systems.” J. Water Resour. Plann. Manage. (ASCE), 136(6), 611–619. 10.1061/(ASCE)WR.1943-5452.0000081.
Hough, J.E. (1988). “Leak testing of pipelines uses pressures and acoustic velocity.” Oil and Gas Journal, 86(47), 35–41.
Hsu, C.W., and Lin, C.J. (2002). “A comparison of methods for multiclass support vector machines.” IEEE Trans. Neural Netw., 13(2), 415–425. 10.1109/72.991427.
Huelss, H. (2020).
“Norms are what machines make of them: Autonomous Weapons Systems and the normative implications of human-machine interactions.” International Political Sociology, 14(2), 111–128. 10.1093/ips/olz023.
Hu, X., Han, Y., Yu, B., Geng, Z., and Fan, J. (2021). “Novel leakage detection and water loss management of urban water supply network using multiscale neural networks.” J. Clean. Prod., 278, 123611. 10.1016/j.jclepro.2020.123611.
Hunaidi, O. (1998). “Ground-penetrating radar for detection of leaks in buried plastic water distribution pipes.” Proceedings of the 7th International conference on Ground Penetrating Radar, GPR ’98, Lawrence, Kansas, USA.
Izquierdo, J., López, P.A., Martínez, F.J., and Pérez, R. (2007). “Fault detection in water supply systems using hybrid (theory and data-driven) modelling.” Math. Comput. Model., 46(3–4), 341–350. 10.1016/j.mcm.2006.11.013.
Jin, H., Zhang, L., Liang, W., and Ding, Q. (2014). “Integrated leakage detection and localization model for gas pipelines based on the acoustic wave method.” Journal of Loss Prevention in the Process Industries, 27, 74–88. 10.1016/j.jlp.2013.11.006.
Khulief, Y.A., Khalifa, A., Mansour, R.B., and Habib, M.A. (2012). “Acoustic detection of leaks in water pipelines using measurements inside pipe.” J. Pipeline Syst. Eng. Pract., 3(2), 47–54. 10.1061/(ASCE)PS.1949-1204.0000089.
Kreßel, U.H.G. (1999). “Pairwise classification and support vector machines.” Advances in Kernel Methods: Support Vector Learning, 255–268. Cambridge, Massachusetts, United States: The MIT Press. https://www.researchgate.net/publication/2346087_Advances_in_Kernel_Methods_-_Support_Vector_Learning.
Kumar, R., Singh, R.D., and Sharma, K.D. (2005). “Water resources of India.” Curr. Sci., 89(5), 794–811. https://www.jstor.org/stable/24111024.
Lambert, A.O. (2002). “International report: Water losses management and techniques.” Water Sci. Technol.: Water Supply, 2(4), 1–20. 10.2166/ws.2002.0115.
Lockwood, A., Murray, T., Stuart, G., and Scudder, L. (2003). “A study of geophysical methods for water leak location.” Proceedings of PEDS 2003 (Pumps, Electromechanical Devices and Systems Applied to Urban Water Management), Valencia, Spain.
Lučin, I., Lučin, B., Čarija, Z., and Sikirica, A. (2021). “Data-driven leak localization in urban water distribution networks using big data for random forest classifier.” Mathematics, 9(6), 672. 10.3390/math9060672.
Mandal, S.K., Chan, F.T., and Tiwari, M.K. (2012). “Leak detection of pipeline: An integrated approach of rough set theory and artificial bee colony trained SVM.” Expert Syst. Appl., 39(3), 3071–3080. 10.1016/j.eswa.2011.08.170.
Mashford, J., de Silva, D., Burn, S., and Marney, D. (2012). “Leak detection in simulated water pipe networks using SVM.” Appl. Artif. Intell., 26(5), 429–444. 10.1080/08839514.2012.670974.
Mashhadi, N., Shahrour, I., Attoue, N., El Khattabi, J., and Aljer, A. (2021). “Use of machine learning for leak detection and localization in water distribution systems.” Smart Cities, 4(4), 1293–1315. 10.3390/smartcities4040069.
MATLAB. (2014). The MathWorks: Natick, MA.
Mounce, S.R., and Machell, J. (2006). “Burst detection using hydraulic data from water distribution systems with artificial neural networks.” Urban Water J., 3(1), 21–31. 10.1080/15730620600578538.
Mounce, S.R., Mounce, R.B., and Boxall, J.B. (2011). “Novelty detection for time series data analysis in water distribution systems using support vector machines.” J. Hydro-Informatics, 13(4), 672–686. 10.2166/hydro.2010.144.
Moyer, E., Male, J.W., Moore, C., and Hock, G. (1983). “The economics of leak detection and repair – a case study.” Journal of American Water Works Association, 75(1), 29–35. 10.1002/j.1551-8833.1983.tb05054.x.
Murray, R., Haxton, T., Janke, R., Hart, W.E., Berry, J., and Phillips, C. (2010).
Sensor network design for drinking water contamination warning systems: A compendium of research results and case studies using the TEVA-SPOT software-Report, Cincinnati, OH: US Environmental Protection Agency, EPA Number: EPA/600/R-09/141.
Mutikanga, H.E., Vairavamoorthy, K., Sharma, S.K., and Akita, C.S. (2011). “Operational tool for decision support in leakage control.” Water Practice & Technology, 6(3), wpt2011057. 10.2166/wpt.2011.057.
O’Brien, E., Murray, T., and McDonald, A. (2003). “Detecting leaks from water pipes at a test facility using ground penetrating radar.” Proceedings of PEDS 2003 (Pumps, Electromechanical Devices and Systems Applied to Urban Water Management), April 22-25, 2003, 1, Valencia, Spain, CORDIS, 395–404.
Pilcher, R., Hamilton, S., Chapman, H., Ristovski, B., and Strapely, S. (2007). “Leak location and repair guidance notes.” Proceedings of International Water Association. Water Loss Task Forces: Specialist Group Efficient Operation and Management, Bucharest, Romania, IWA.
Poulakis, Z., Valougeorgis, D., and Papadimitriou, C. (2003). “Leakage detection in water pipe networks using a Bayesian probabilistic framework.” Probabilistic Engineering Mechanics, 18(4), 315–327. 10.1016/S0266-8920(03)00045-6.
Puust, R., Kapelan, Z., Savic, D.A., and Koppel, T. (2010). “A review of methods for leakage management in pipe networks.” Urban Water J., 7(1), 25–45. 10.1080/15730621003610878.
Quiñones-Grueiro, M., Bernal-de Lázaro, J.M., Verde, C., Prieto-Moreno, A., and Llanes-Santiago, O. (2018a). “Comparison of classifiers for leak location in water distribution networks.” IFAC-PapersOnLine, 51(24), 407–413. 10.1016/j.ifacol.2018.09.609.
Quiñones-Grueiro, M., Verde, C., Prieto-Moreno, A., and Llanes-Santiago, O. (2018b). “An unsupervised approach to leak detection and location in water distribution networks.” International Journal of Applied Mathematics and Computer Science, 28(2), 283–295. 10.2478/amcs-2018-0020.
Rajtar, J., and Muthiah, R. (1997).
“Pipeline leak detection system for oil and gas flowlines.” J. Manuf. Sci. Eng. (ASME), 19(1), 105–109. 10.1115/1.2836545.
Rojek, I., and Studzinski, J. (2019). “Detection and localization of water leaks in water nets supported by an ICT system with artificial intelligence methods as a way forward for smart cities.” Sustainability, 11(2), 518. 10.3390/su11020518.
Salam, A.E.U., Tola, M., Selintung, M., and Maricar, F. (2014). “On-line monitoring system of water leakage detection in pipe networks with artificial intelligence.” ARPN J. Eng. Appl. Sci., 9(10), 1817–1822.
Samudrala, S. (2018). Machine Intelligence: Demystifying machine learning, neural networks and deep learning. Notion Press, Chennai, India. https://books.google.co.id/books?id=LC2DDwAAQBAJ.
Shinozuka, M., Liang, J., and Feng, M.Q. (2005). “Use of supervisory control and data acquisition for damage location of water delivery systems.” J. Eng. Mech., 131(3), 225–230. 10.1061/(ASCE)0733-9399(2005)131:3(225).
Shravani, D., Prajwal, Y.R., Prapulla, S.B., Salanke, N.G.R., Shobha, G., and Ahmad, S.F. (2019). “A machine learning approach to water leak localization.” Proceedings of 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), Florida International University, Miami, USA, 4, 1–6. 10.1109/CSITSS47250.2019.9031010.
Silva, R.A., Buiatti, C.M., Cruz, S.L., and Pereira, J.A. (1996). “Pressure wave behaviour and leak detection in pipelines.” Comput. Chem. Eng., 20(1), S491–S496. 10.1016/0098-1354(96)00091-9.
Stoianov, I., Nachman, L., Madden, S., and Tokmouline, T. (2007). “Pipenet: a wireless sensor network for pipeline monitoring.” Proceedings of the 6th international conference on Information processing in sensor networks, 264–273, April 25 - 27, 2007, New York, United States: Association for Computing Machinery, Cambridge, Massachusetts, USA. 10.1145/1236360.1236396.
Suthaharan, S. (2016).
“Support vector machine.” In: Machine learning models and algorithms for big data classification, Integrated series in information systems, 36, 207–235. Springer: Boston, MA. 10.1007/978-1-4899-7641-3_9.
Tariq, S., Bakhtawar, B., and Zayed, T. (2022). “Data-driven application of MEMS-based accelerometers for leak detection in water distribution networks.” Sci. Total Environ., 809, 151110. 10.1016/j.scitotenv.2021.151110.
Thirumalaiah, K., and Deo, M.C. (1998). “Real‐time flood forecasting using neural networks.” Computer‐aided Civil and Infrastructure Engineering, 13(2), 101–111. 10.1111/0885-9507.00090.
Van der Walt, J.C., Heyns, P.S., and Wilke, D.N. (2018). “Pipe network leak detection: Comparison between statistical and machine learning techniques.” Urban Water J., 15(10), 953–960. 10.1080/1573062X.2019.1597375.
Yamashita, R., Nishio, M., Do, R.K.G., and Togashi, K. (2018). “Convolutional neural networks: An overview and application in radiology.” Insights Imaging, 9(4), 611–629. 10.1007/s13244-018-0639-9.
Zahab, S.E., Mosleh, F., and Zayed, T. (2016). “An accelerometer-based real-time monitoring and leak detection system for pressurized water pipelines.” Pipelines 2016, Kansas City, Missouri: American Society of Civil Engineers, 257–268. 10.1061/9780784479957.025.
Zhang, X.J. (1993). “Statistical leak detection in gas and liquid pipelines.” Pipes and Pipelines International, 38(4), 26–29.
Zhang, Q., Wu, Z.Y., Zhao, M., Qi, J., Huang, Y., and Zhao, H. (2016). “Leakage zone identification in large-scale water distribution systems using multiclass support vector machines.” J. Water Resour. Plann. Manage. (ASCE), 142(11), 4016042. 10.1061/(ASCE)WR.1943-5452.0000661.

APPENDICES

Appendix A. Details of the network simulation

Table A1. Details of the nodes in the network.
| Node ID | Demand (lps) | Head (m) |
|---|---|---|
| 1 (Source) | — | 100 |
| 2 | 247.22 | 97.14 |
| 3 | 236.11 | 61.67 |
| 4 | 36.11 | 57.25 |
| 5 | 201.39 | 51.77 |
| 6 | 279.17 | 46.03 |
| 7 | 375 | 44.71 |
| 8 | 152.78 | 43.17 |
| 9 | 145.83 | 41.96 |
| 10 | 145.83 | 41.08 |
| 11 | 138.89 | 39.52 |
| 12 | 155.56 | 38.37 |
| 13 | 261.11 | 34.16 |
| 14 | 170.83 | 34.72 |
| 15 | 77.78 | 34.26 |
| 16 | 86.11 | 34.26 |
| 17 | 240.28 | 41.31 |
| 18 | 373.61 | 51.36 |
| 19 | 16.67 | 58.14 |
| 20 | 354.17 | 50.78 |
| 21 | 258.33 | 41.43 |
| 22 | 134.72 | 36.27 |
| 23 | 290.28 | 44.84 |
| 24 | 227.78 | 39.88 |
| 25 | 47.22 | 36.82 |
| 26 | 250 | 33.55 |
| 27 | 102.78 | 33.01 |
| 28 | 80.56 | 36.31 |
| 29 | 100 | 31.72 |
| 30 | 100 | 30.85 |
| 31 | 29.17 | 31.34 |
| 32 | 223.61 | 32.65 |

Table A2. Lengths of the different pipe diameters in the network.

| S. No. | Diameter (mm) | No. of pipes | Length of the network (m) |
|---|---|---|---|
| 1 | 1016 | 11 | 12750 |
| 2 | 762 | 4 | 4680 |
| 3 | 609.6 | 3 | 4700 |
| 4 | 508 | 4 | 5050 |
| 5 | 406.4 | 6 | 8390 |
| 6 | 304.8 | 6 | 3850 |

Table A3. Details of the piping network (pipe length, optimal diameter and the flow in the pipes).

| Pipe No. | From node | To node | Length (m) | Diameter (mm) | Flow (lps) | Velocity (m/s) | Unit headloss (m/km) |
|---|---|---|---|---|---|---|---|
| P-1 | R-1 | J-1 | 100 | 1016 | 5538.9 | 6.83 | 28.59 |
| P-2 | J-1 | J-2 | 1350 | 1016 | 5291.68 | 6.53 | 26.27 |
| P-3 | J-2 | J-3 | 900 | 1016 | 2140.84 | 2.64 | 4.92 |
| P-4 | J-3 | J-4 | 1150 | 1016 | 2104.73 | 2.6 | 4.76 |
| P-5 | J-4 | J-5 | 1450 | 1016 | 1903.34 | 2.35 | 3.95 |
| P-6 | J-5 | J-6 | 450 | 1016 | 1624.17 | 2 | 2.95 |
| P-7 | J-6 | J-7 | 850 | 1016 | 1249.17 | 1.54 | 1.81 |
| P-8 | J-7 | J-8 | 850 | 1016 | 1096.39 | 1.35 | 1.42 |
| P-9 | J-8 | J-9 | 800 | 1016 | 950.56 | 1.17 | 1.09 |
| P-10 | J-9 | J-10 | 950 | 762 | 555.56 | 1.22 | 1.64 |
| P-11 | J-10 | J-11 | 1200 | 762 | 416.67 | 0.91 | 0.96 |
| P-12 | J-11 | J-12 | 3500 | 609.6 | 261.11 | 0.89 | 1.2 |
| P-13 | J-9 | J-13 | 800 | 406.4 | 249.17 | 1.92 | 7.95 |
| P-14 | J-13 | J-14 | 500 | 406.4 | 78.34 | 0.6 | 0.93 |
| P-15 | J-14 | J-15 | 550 | 304.8 | 0.56 | 0.01 | 0 |
| P-16 | J-15 | J-16 | 2730 | 406.4 | 135.79 | 1.05 | 2.58 |
| P-17 | J-17 | J-16 | 1750 | 508 | 376.07 | 1.86 | 5.74 |
| P-18 | J-18 | J-17 | 800 | 609.6 | 749.68 | 2.57 | 8.48 |
| P-19 | J-2 | J-18 | 400 | 609.6 | 766.35 | 2.63 | 8.83 |
| P-20 | J-2 | J-19 | 2200 | 1016 | 2148.38 | 2.65 | 4.95 |
| P-21 | J-19 | J-20 | 1500 | 508 | 393.05 | 1.94 | 6.23 |
| P-22 | J-20 | J-21 | 500 | 304.8 | 134.72 | 1.85 | 10.33 |
| P-23 | J-19 | J-22 | 2650 | 1016 | 1401.16 | 1.73 | 2.24 |
| P-24 | J-22 | J-23 | 1230 | 762 | 902.88 | 1.98 | 4.03 |
| P-25 | J-23 | J-24 | 1300 | 762 | 675.1 | 1.48 | 2.36 |
| P-26 | J-25 | J-24 | 850 | 508 | 302.54 | 1.49 | 3.84 |
| P-27 | J-26 | J-25 | 300 | 304.8 | 52.54 | 0.72 | 1.81 |
| P-28 | J-26 | J-15 | 750 | 304.8 | 50.24 | 0.69 | 1.66 |
| P-29 | J-22 | J-27 | 1500 | 406.4 | 208 | 1.6 | 5.69 |
| P-30 | J-27 | J-28 | 2000 | 406.4 | 127.44 | 0.98 | 2.3 |
| P-31 | J-28 | J-29 | 1600 | 304.8 | 27.44 | 0.38 | 0.54 |
| P-32 | J-30 | J-29 | 150 | 304.8 | 72.56 | 0.99 | 3.28 |
| P-33 | J-30 | J-31 | 860 | 406.4 | 101.73 | 0.78 | 1.51 |
| P-34 | J-31 | J-24 | 950 | 508 | 325.34 | 1.61 | 4.39 |

Appendix B. Results from regression modelling

Table B1. Regression results for Case A and Case B (ANN).

| Pipe No. | Length (m) | Observed leak location distance from u/s node (m) | Case A (pressure models): leak location predicted (m) | Case A (pressure models): leak size predicted (LPS) | Case B (flow models): leak location predicted (m) | Case B (flow models): leak size predicted (LPS) |
|---|---|---|---|---|---|---|
| P-01 | 100 | 50 | 50 | 14.7 | 50 | 181.6 |
| P-02 | 1350 | 675 | 770 | 229.5 | 670 | 349.89 |
| P-03 | 900 | 450 | 470 | 267.77 | 350 | 309.07 |
| P-04 | 1150 | 575 | 550 | 266.31 | 530 | 319.16 |
| P-05 | 1450 | 725 | 670 | 327.25 | 770 | 290.25 |
| P-06 | 450 | 225 | 190 | 260.56 | 250 | 275.41 |
| P-07 | 850 | 425 | 450 | 284.22 | 390 | 299.97 |
| P-08 | 850 | 425 | 530 | 274.56 | 470 | 297.68 |
| P-09 | 800 | 400 | 430 | 284.55 | 370 | 317.09 |
| P-10 | 950 | 475 | 470 | 299.5 | 510 | 295.71 |
| P-11 | 1200 | 600 | 670 | 307.54 | 570 | 301.97 |
| P-12 | 3500 | 1750 | 1710 | 296.25 | 1690 | 294.77 |
| P-13 | 800 | 400 | 390 | 254.27 | 390 | 322.93 |
| P-14 | 500 | 250 | 250 | 295.12 | 270 | 283.67 |
| P-15 | 550 | 275 | 230 | 271.52 | 290 | 279.21 |
| P-16 | 2730 | 1365 | 1370 | 282.31 | 1350 | 291.36 |
| P-17 | 1750 | 875 | 770 | 309.98 | 850 | 312.61 |
| P-18 | 800 | 400 | 430 | 303.78 | 370 | 290.37 |
| P-19 | 400 | 200 | 250 | 251.46 | 250 | 310.09 |
| P-20 | 2200 | 1100 | 1010 | 306.21 | 1190 | 285.55 |
| P-21 | 1500 | 750 | 850 | 265.7 | 770 | 240.94 |
| P-22 | 500 | 250 | 250 | 228.06 | 270 | 284.17 |
| P-23 | 2650 | 1325 | 1350 | 306 | 1250 | 309.5 |
| P-24 | 1230 | 615 | 690 | 310.79 | 490 | 291.79 |
| P-25 | 1300 | 650 | 710 | 254.24 | 750 | 263.93 |
| P-26 | 850 | 425 | 490 | 288.6 | 450 | 306.67 |
| P-27 | 300 | 150 | 150 | 238.14 | 150 | 266.02 |
| P-28 | 750 | 375 | 350 | 258.07 | 390 | 304.06 |
| P-29 | 1500 | 750 | 810 | 296.88 | 750 | 312.2 |
| P-30 | 2000 | 1000 | 930 | 287.71 | 850 | 343.25 |
| P-31 | 1600 | 800 | 890 | 286.82 | 750 | 305.35 |
| P-32 | 150 | 75 | 70 | 299.35 | 90 | 263.69 |
| P-33 | 860 | 430 | 410 | 268.22 | 450 | 303.34 |
| P-34 | 950 | 475 | 470 | 272.61 | 530 | 307.02 |

Table B2. Regression results for Case A and Case B (SVM).

| Pipe No. | Length (m) | Observed leak location distance from u/s node (m) | Case A (pressure models): leak location predicted (m) | Case A (pressure models): leak size predicted (LPS) | Case B (flow models): leak location predicted (m) | Case B (flow models): leak size predicted (LPS) |
|---|---|---|---|---|---|---|
| P-1 | 100 | 50 | 50 | 287.5 | 50 | 182.7 |
| P-2 | 1350 | 675 | 630 | 297.9 | 630 | 287.5 |
| P-3 | 900 | 450 | 450 | 287.5 | 450 | 287.5 |
| P-4 | 1150 | 575 | 570 | 287.5 | 570 | 286.7 |
| P-5 | 1450 | 725 | 730 | 287.3 | 730 | 286.6 |
| P-6 | 450 | 225 | 230 | 287.3 | 230 | 286.9 |
| P-7 | 850 | 425 | 430 | 287.4 | 430 | 287.1 |
| P-8 | 850 | 425 | 430 | 287.4 | 430 | 287.3 |
| P-9 | 800 | 400 | 390 | 287.5 | 390 | 287.3 |
| P-10 | 950 | 475 | 470 | 287.5 | 430 | 287.5 |
| P-11 | 1200 | 600 | 470 | 287.5 | 770 | 287.5 |
| P-12 | 3500 | 1750 | 1930 | 287.5 | 1630 | 287.5 |
| P-13 | 800 | 400 | 390 | 283.4 | 390 | 240.8 |
| P-14 | 500 | 250 | 250 | 287.5 | 250 | 287.5 |
| P-15 | 550 | 275 | 270 | 286.6 | 270 | 285.3 |
| P-16 | 2730 | 1365 | 1370 | 287.4 | 1370 | 287.6 |
| P-17 | 1750 | 875 | 870 | 287.2 | 870 | 285.8 |
| P-18 | 800 | 400 | 410 | 284.8 | 410 | 273 |
| P-19 | 400 | 200 | 190 | 275 | 210 | 194.3 |
| P-20 | 2200 | 1100 | 1090 | 286.6 | 1090 | 277.8 |
| P-21 | 1500 | 750 | 930 | 287.5 | 610 | 287.5 |
| P-22 | 500 | 250 | 250 | 287.5 | 210 | 287.5 |
| P-23 | 2650 | 1325 | 1310 | 287.4 | 1330 | 287.1 |
| P-24 | 1230 | 615 | 610 | 287.3 | 610 | 287.3 |
| P-25 | 1300 | 650 | 650 | 287.5 | 650 | 287.5 |
| P-26 | 850 | 425 | 430 | 287.3 | 430 | 286.3 |
| P-27 | 300 | 150 | 150 | 287.5 | 150 | 287.5 |
| P-28 | 750 | 375 | 370 | 286.7 | 370 | 285.1 |
| P-29 | 1500 | 750 | 750 | 287.5 | 750 | 287.5 |
| P-30 | 2000 | 1000 | 990 | 287.5 | 990 | 287.5 |
| P-31 | 1600 | 800 | 790 | 287.5 | 790 | 287.5 |
| P-32 | 150 | 75 | 70 | 287.5 | 70 | 287.5 |
| P-33 | 860 | 430 | 430 | 287.5 | 450 | 287.5 |
| P-34 | 950 | 475 | 490 | 287.5 | 490 | 287.5 |
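The predicted leak locations in Tables B1 and B2 can be compared with the observed location by taking an absolute error per pipe and, optionally, normalizing it by the pipe length. The sketch below does this for three pipes copied from the Case A (pressure models) columns. The choice of metric, the selection of rows, and the names `ROWS`, `location_errors` and `summarize` are illustrative assumptions and are not taken from the study's own procedure.

```python
# Rows copied from Tables B1 (ANN) and B2 (SVM), Case A (pressure models).
# Tuple layout: (pipe, length_m, observed_m, ann_predicted_m, svm_predicted_m)
# The three-row selection is arbitrary and only for illustration.
ROWS = [
    ("P-02", 1350, 675, 770, 630),
    ("P-12", 3500, 1750, 1710, 1930),
    ("P-21", 1500, 750, 850, 930),
]


def location_errors(observed_m, predicted_m, length_m):
    """Return (absolute error in m, error as a fraction of pipe length)."""
    abs_err = abs(predicted_m - observed_m)
    return abs_err, abs_err / length_m


def summarize(rows):
    """Map each pipe to its ANN and SVM location errors."""
    out = {}
    for pipe, length, obs, ann, svm in rows:
        out[pipe] = {
            "ann": location_errors(obs, ann, length),
            "svm": location_errors(obs, svm, length),
        }
    return out


if __name__ == "__main__":
    for pipe, errs in summarize(ROWS).items():
        ann_abs, ann_rel = errs["ann"]
        svm_abs, svm_rel = errs["svm"]
        print(f"{pipe}: ANN {ann_abs} m ({ann_rel:.1%}), "
              f"SVM {svm_abs} m ({svm_rel:.1%})")
```

Three rows are too few to rank the two models; a comparison across the network would require extending `ROWS` to all 34 pipes and both cases.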