IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) | 978-1-6654-0926-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/INFOCOMWKSHPS54753.2022.9798212

IEEE INFOCOM WKSHPS: A4E 2022: AI/ML for Edge/Fog Networks

Machine learning model for IoT-Edge device based Water Quality Monitoring

Yogendra Kumar, Siba K Udgata*
School of Computer and Information Sciences
University of Hyderabad, Hyderabad, India
email: yogendrak348@gmail.com, udgata@uohyd.ac.in

Abstract—The aim of this work is to intelligently detect alarming events in water quality using machine learning techniques at the edge device, in a way that adapts to locality, application and time. This work has four objectives: (1) to develop an edge device for sensing the water quality parameters; (2) to detect changes in the water quality with respect to baseline parameters using a machine learning approach at the edge device itself; (3) to generate alarm signals when water quality parameters go beyond their threshold values; and (4) to classify different types of contamination and analyze them to identify possible contamination types. For the experimentation, three water quality indicative methods are used to calculate the water quality, namely (a) Weighted Arithmetic Index, (b) NSF Water Quality Index and (c) user feedback on the water quality. Water quality is determined using water quality indexes (WQI) on the basis of six physico-chemical sensor parameters: biological oxygen demand, dissolved oxygen, pH, total hardness, total dissolved solids and turbidity. With the help of the WQI values from these methods, a lightweight machine learning model suitable for the edge device has been developed using the Support Vector Machine (SVM) algorithm. We also clustered the alarming events to identify their different types.

Index Terms—Edge Intelligence, Water Quality Sensors, Water Quality Index (WQI), Alarm Events, Machine Learning, Event Clustering.

I. INTRODUCTION

Water is essential to human life and health, and is equally important for the environment. In our daily lives, we use water for many purposes such as drinking, cooking, bathing, cleaning, agriculture and irrigation, among others. The required water quality differs for each of these purposes. Quality water is important not only for human health but also for fish farming, agricultural success and wildlife habitats, and it contributes to the health of the earth. If water quality is not monitored and maintained, it will harm human life and the environment in various ways.

In [1], the authors proposed different applications of wireless sensor networks using machine learning models. Intelligence at the edge is crucial in these types of applications, where the decision can be taken at the edge and the required follow-up action can be initiated. Many of the models proposed in the literature are cloud and fog based architectures and mostly suffer from large latency. In [2], the authors proposed a sensor node placement method for detecting the contamination source in a wireless sensor network based water quality monitoring system. Methods such as the Weighted Arithmetic Index, the NSF Water Quality Index and the CCME Water Quality Index can detect the water quality based on the value of the Water Quality Index. In this work, we additionally propose a provision for user feedback on the water quality, which can be considered as another method.

In this work, we propose a machine learning approach to detect changes in water quality and also the type of contamination. A Support Vector Machine (SVM) is used to build a machine learning (ML) model using alarming events, which are generated with the help of the given WQI values. The workflow is shown in Fig. 1. The training data set is used to create the model and the test data set is used to validate it.
A water quality index provides a single number (like a grade) that expresses the overall quality of water at a certain time and location, based on several water quality parameters. The objective of an index is to turn complex water quality data into usable and understandable information. Among water quality scientists, the use of an index to "grade" water quality is a debated issue. A single number cannot express the whole description of water quality, and many water quality parameters are not included in the index. The index we present here is not specifically intended for aquatic life regulations or human health. However, a water quality index can give a meaningful indicator of water quality based on some relevant and important parameters. It gives the public a general idea of the probable problems with the water in a region.

The rest of the paper is organized as follows. Section II discusses the related work and background. Section III explains the conventional WQI detection methods and the machine learning algorithm used in the proposed work. Section IV explains the design and development of the proposed algorithm and discusses the results and experimental analyses. Section V concludes the paper and outlines future work.

Fig. 1: Workflow for detection and classification of alarming events of water quality using machine learning approach.

Authorized licensed use limited to: Satbayev University. Downloaded on March 02,2023 at 07:37:11 UTC from IEEE Xplore. Restrictions apply.

II. RELATED WORK

As water pollution is increasing day by day, the requirement for quality water and the consequences of polluted water have drawn the attention of researchers. Although a good amount of quality work has been done in the recent decade using the latest technologies and algorithms [1][2], there is still a need to design and develop improved solutions for detecting the water quality index and water quality class. Most of the proposed methods are based on a wide area sensor network and a cloud based analysis model for monitoring the water quality, which has its own limitations. An independent and intelligent IoT-edge device that can collect the data in its IoT-layer and do the intelligent computation and analysis in the edge-layer still remains a challenge.

A supervised machine learning based approach to predict water quality was proposed by Ahmed et al. [10]. The parameters used in their model are total dissolved solids, turbidity, temperature and pH. To retrieve water quality indicators, Hafeez et al. [11] presented a study of different machine learning algorithms over the coastal waters of Hong Kong. Ho et al. [12] proposed an approach using a Decision Tree to predict the water quality index of the Klang River in Malaysia. Li et al. [13] designed an integrated approach using a firefly meta-heuristic algorithm with a Support Vector Machine and presented a hybrid evolutionary model for determining the water quality indicator. Granata et al. [14] presented an indirect methodology for estimating the main wastewater quality indicators based on Regression Trees and Support Vector Machines. Qin et al. [15] proposed a technique for monitoring the quality of wastewater, and Kim et al. [16] used GOCI satellite data for monitoring coastal water quality with machine learning approaches. Pattanayak et al. [17] proposed a soft sensor model for Chemical Oxygen Demand (COD) estimation, and Sahoo et al. [18] proposed a model for accurate estimation of water level using ultrasonic sensors, as a step towards providing safe and sufficient water. Das et al.
[2] proposed a method for optimal placement of sensor nodes in a water channel network for detecting the source of contamination.

Most of the above methods use recent machine learning algorithms and statistical methods, but they do not involve well established and accepted methods for finding the WQI. In this work, the traditional and popular methods, namely the Weighted Arithmetic Index, the NSF Water Quality Index and user feedback, are also used for training the machine learning model, which makes the proposed method more efficient, adaptive and acceptable. In addition, we propose to integrate the complete model with the IoT-edge device for independent and immediate decision making.

III. CONVENTIONAL WQI METHODS AND MACHINE LEARNING ALGORITHMS IN THE PROPOSED WORK

Water quality is the condition of the water body. It can be defined in two terms: (i) quantitative and (ii) qualitative. In this work, we focus on the qualitative term. The quality of water is determined by various methods based on different qualitative parameters such as pH, Dissolved Oxygen (DO), Oxidation Reduction Potential (ORP), turbidity, BOD, COD, salinity, arsenic, heavy metals, bacteria concentration, fluoride, nitrogen and total hardness. Many methods have been proposed in the literature to determine the quality of water based on these parameters. Water Quality Index based methods are mainly used to find a quality index of the water based on the values of the sample parameters. Based on the index value (a single number), the water is classified as very good, good, average, bad, worst, etc. In our study, we consider only two quality classes, good and bad. Some of the popular WQI based methods are (i) the Weighted Arithmetic Index and (ii) the NSF Water Quality Index. These methods express the water quality in terms of a Water Quality Index. The methodologies used for calculating the different water quality indexes are given as follows.

A. Water Quality Indexes

1) Weighted Arithmetic Index: The Weighted Arithmetic Index [3] is a standard of drinking water quality followed by the World Health Organization (WHO), the Indian Council for Medical Research (ICMR) and the Bureau of Indian Standards (BIS). The status of a given water sample is determined from its Water Quality Index. The water quality status for each Water Quality Index level is given in Table I.

TABLE I: Weighted Arithmetic Index WQI and status of water quality

WQI Level    Status of Water Quality
0-25         Excellent water quality
26-50        Good water quality
51-75        Poor water quality
76-100       Very Poor water quality
>100         Unsuitable for drinking

2) NSF Water Quality Index: This is a standard index of water quality that was designed and created by the National Sanitation Foundation (NSF). The calculating principle of the NSF Water Quality Index is given in [4].

IV. PROPOSED METHOD

A. Proposed Algorithm

The proposed work introduces the SVM based WQI, given as Algorithm 1, for classifying water into two categories, normal and abnormal water quality, based on traditional methods and machine learning approaches. The proposed algorithm takes the training and testing data sets, including the Water Quality Indexes calculated using the traditional methods, and returns the classified value of WQI as normal or abnormal based on the machine learning classifier. Step 1 of the proposed algorithm calculates the WQI and the corresponding alarming events for each of the given traditional methods.
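As a rough illustration of Step 1 for a single method, the following Python sketch computes a Weighted Arithmetic Index in its commonly cited form (quality rating Q_i = 100 (V_i - V_0) / (S_i - V_0), weight W_i proportional to 1 / S_i), maps the result to the status levels of Table I, and flags an alarming event. The permissible limits and ideal values in STANDARDS, the ALARM_THRESHOLD of 50 and all function names are assumptions made for this example; the paper defers the calculation to [3] and does not state these values.

```python
# Illustrative sketch of Step 1 for the Weighted Arithmetic Index only.
# All numeric values below are placeholder assumptions, not values from the paper.

# parameter name -> (assumed permissible limit S_i, assumed ideal value V_0)
STANDARDS = {
    "bod": (5.0, 0.0),
    "do": (5.0, 14.6),
    "ph": (8.5, 7.0),
    "total_hardness": (300.0, 0.0),
    "tds": (500.0, 0.0),
    "turbidity": (5.0, 0.0),
}

# Assumed rule: a WQI above 50 ("Poor" or worse in Table I) is an alarming event.
ALARM_THRESHOLD = 50.0


def weighted_arithmetic_wqi(sample):
    """Return sum(W_i * Q_i) / sum(W_i) with W_i proportional to 1 / S_i."""
    weighted_sum = 0.0
    weight_total = 0.0
    for name, (limit, ideal) in STANDARDS.items():
        # Quality rating: 0 at the ideal value, 100 at the permissible limit.
        quality_rating = 100.0 * (sample[name] - ideal) / (limit - ideal)
        weight = 1.0 / limit
        weighted_sum += weight * quality_rating
        weight_total += weight
    return weighted_sum / weight_total


def status_from_table_1(wqi):
    """Map a WQI value to the status labels listed in Table I."""
    if wqi <= 25:
        return "Excellent water quality"
    if wqi <= 50:
        return "Good water quality"
    if wqi <= 75:
        return "Poor water quality"
    if wqi <= 100:
        return "Very Poor water quality"
    return "Unsuitable for drinking"


def is_alarming(wqi, threshold=ALARM_THRESHOLD):
    """Flag the sample as an alarming event when the WQI exceeds the threshold."""
    return wqi > threshold
```

Each traditional method would produce its own list of alarming instances in this way; the remaining steps of the algorithm operate on those lists.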
Step 2 is one of the most important steps of the algorithm. In this step, the union of the abnormal instances flagged by all the traditional methods is calculated and stored in the variable Abnormalindex[]. That is, if an instance is classified as abnormal by any one of the given methods, it is considered to have an abnormal WQI, and all the remaining instances are set as normal WQI by taking the set difference. Step 3 initialises the WQI with 0 for the normal index and 1 for the abnormal index and assigns it to WQIclass[]. Step 4 appends WQIclass[] to the main dataset against the respective instances; the values in WQIclass[] serve as the labels for the supervised machine learning algorithm. Step 4 also separates the training and testing datasets. In Step 5, the machine learning model is trained using Support Vector Machines, followed by analysis of the trained model on the test data set in Step 6.

Algorithm 1 Proposed Algorithm
Input:
  D_{P×N} = water quality data for training with N water quality parameters {x1, x2, ..., xN} and P samples.
  T_{Q×N} = water quality data for testing with N water quality parameters {x1, x2, ..., xN} and Q examples.
  M = {WQIMethod1, ..., WQIMethodK}, the water quality index methods used to calculate water quality, where K is the number of methods.
Output: SVM model based classification of WQI.
1. Calculate the WQI and corresponding alarming events of all the given traditional methods.
2. Abnormalindex[] = Union(Abnormalindex1, Abnormalindex2, ..., AbnormalindexK)
   NormalIndex[] = SetDifference(Totalindex[], Abnormalindex[])
3. Initialize: ∀ Totalindex[], set 1 for Abnormalindexes, otherwise set 0.
   WQIclass[] = Totalindex[]
4. Append WQIclass[] to the main dataset and separate the training dataset from it.
5. Build the model on the training dataset using the Support Vector Machines algorithm.
6. Use and analyze the composite SVM model on test data or unseen examples.

B. Experimental set-up, results and discussions

This work is an outcome of a funded research project for monitoring water quality parameters in a wireless sensor zone. We have been considering different water quality parameters and locations for analyzing the water quality of the samples. The parameters considered for water quality measurement and their acceptable ranges are given in Fig. 2, along with water temperature. These parameters are chosen based on their importance, usability, and the availability of sensors and associated electronics. The parameters are mainly interdependent and change with environmental conditions. Thus, an abnormality analysis based on a single parameter leads to improper results, and a multi-parameter monitoring system is needed. The drinking water standards and respective recommending agencies in Fig. 2 use mg/L for all values except pH and electrical conductivity.

Fig. 2: Drinking Water standards and respective recommending agencies.

We collected samples from different locations using the IoT-edge device developed and installed for the purpose, and also generated data using a data augmentation approach to obtain a robust data set for training and testing. The data is cleaned to remove outliers caused by observational errors. We collected some 4000 sample test results for the experimentation. Out of the 4000 samples, 2000 are used for training and 2000 for testing the proposed model. Initially, the IoT-edge devices deployed at different locations send the data to the server over a wireless network and a cellular network. The algorithm is executed at the server to classify the data, generate alarms and also determine the possible nature of the alarm through clustering techniques.
Then, the trained model is embedded in the IoT-edge device for independent decision making without any cloud or server. An IoT-edge device for monitoring the different water quality parameters, with the configuration listed below, was developed under project AquaSense and is shown in Fig. 3(a). A snapshot of the quality parameter values at different installations is shown on a Google map in Fig. 3(b).

Fig 3(a): AquaSense Sensor Node.
Fig 3(b): Water quality monitoring at different sites.

Arduino Uno Board
Turbidity: TSL 235R LFC, IR LED
pH: Polar graphic probe, LMC6001
Temperature: DS18B20
Conductivity: Platinum electrode
Dissolved Oxygen: Galvanic probe
ORP: Platinum electrode probe, TL072
Wireless protocol: XBEE S2/GSM/Bluetooth
Voltage source: 5V, with an option to back up with a solar panel

We used two WQI methods for experimentation on the training data set, namely the Weighted Arithmetic Index and the NSF Water Quality Index. We also have a provision for user feedback on the water quality, which is used as a third method. The training data set is used to generate an SVM based model for event classification (alarming or not alarming). The event fingerprints (normalized deviations of the different parameter values of the alarming sample) are captured and classified further using the K-means and Fuzzy C-means clustering methods. For testing the model, we also used all three methods described above.

1) Alarm Events and Event Fingerprints Using SVM based WQI: The alarm signals and event fingerprints are generated using the SVM based WQI, as shown in Fig. 3. The input signals are generated from the input parameter values, and the SVM model is applied to these input signals to generate the output signals, which contain both normal signals and alarm signals. The SVM model calculates the water quality in binary format; technically, it detects WQI as either 0 or 1.
If the SVM based WQI is 0, the water quality is classified as normal, and if the SVM based WQI is 1, the water quality is classified as abnormal. The threshold value of WQI in this method is 1. A total of 48 signals reach the threshold value of 1; these signals are called alarm signals.

2) Detection Accuracy of Proposed Method: A total of 2000 instances of the test dataset were used to estimate and analyze the accuracy of the proposed SVM based machine learning model. Out of the 2000 instances, 1952 were classified as normal water quality and 48 were classified as abnormal water quality. Out of the 2000 instances, 1988 were correctly classified and only the remaining 12 were wrongly classified by the proposed approach. The confusion matrix of this approach is given in Fig. 4. Based on the confusion matrix, the accuracy of the SVM based WQI classifier is 1988/2000 = 99.4%, which is a promising result for the proposed method.

3) Event Fingerprints Clustering: The common characteristics or patterns in the deviated parameter values are easily identified by separating the fingerprints of abnormal instances into groups (clusters). The main purpose of clustering in this work is therefore to detect, identify, analyze and visualize the patterns in the deviated values of the water quality parameters. Two clustering algorithms, Fuzzy c-means and K-means, were used for clustering the fingerprints of the abnormal class instances. Inter-cluster and intra-cluster distances were used to identify the better clustering algorithm. The performance of the clustering algorithms showed minor variation according to the distribution of the parameter values. The following water quality parameters are used in the study:
1) Biological Oxygen Demand
2) Dissolved Oxygen
3) pH
4) Total Hardness
5) Total Dissolved Solids
6) Turbidity

Fig. 5 shows the event fingerprints clustering.
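To make the notion of an event fingerprint concrete, the sketch below computes the signed deviation of each of the six parameters from a reference value, normalized by that reference. This particular normalization, the reference values in STANDARD_VALUES and the function names are assumptions chosen for illustration only; the paper describes a fingerprint as the normalized deviation of the parameter values of an alarming sample without giving the exact formula.

```python
# Hypothetical reference ("standard") values for the six parameters.
# These numbers are placeholders for illustration, not values from the paper.
STANDARD_VALUES = {
    "bod": 5.0,
    "do": 5.0,
    "ph": 7.0,
    "total_hardness": 300.0,
    "tds": 500.0,
    "turbidity": 5.0,
}


def event_fingerprint(sample, standards=STANDARD_VALUES):
    """Signed relative deviation of each parameter from its reference value."""
    return {
        name: (sample[name] - reference) / reference
        for name, reference in standards.items()
    }


def deviation_signs(fingerprint):
    """Summarize a fingerprint as '+', '-' or '0' per parameter."""
    signs = {}
    for name, deviation in fingerprint.items():
        if deviation > 0:
            signs[name] = "+"
        elif deviation < 0:
            signs[name] = "-"
        else:
            signs[name] = "0"
    return signs
```

A vector of such deviations, one per alarming sample, is the kind of input a clustering algorithm such as K-means or Fuzzy c-means can group, and the sign pattern is what the cluster descriptions refer to as positive or negative deviation.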
Fingerprints of abnormal quality class instances are divided into four different clusters. In the given graph, the clusters are obtained using both the Fuzzy c-means and the K-means clustering algorithms. In cluster 1, all the parameter values have positively deviated except Total Hardness. In other words, Total Hardness is the only parameter whose value is decreased, and all the remaining parameter values are increased from their respective standard values in all the instances of this cluster. In cluster 2, the pH and Total Dissolved Solids values are decreased and the other four parameter values are increased from their respective standard values in all the instances of this cluster. In cluster 3, all parameter values negatively deviate except Dissolved Oxygen and Total Dissolved Solids. Dissolved Oxygen has positively deviated in all the instances of this cluster, and Total Dissolved Solids has positively deviated in most of them. In cluster 4, the pH value has positively deviated in all the instances of the cluster, and the Dissolved Oxygen and turbidity values have positively deviated in most of them. The remaining parameter values are negatively deviated in most of the instances of this cluster. All four clusters thus show specific patterns and characteristics of deviation from their respective standard values. Based on the above observations, we conclude that any of the above combinations may result in abnormal (bad) water quality.

Fig. 3: Alarm Signals and Event Fingerprints using SVM based WQI.
Fig. 4: Confusion Matrix obtained from the SVM model.

The inter-cluster distance has been calculated using the Dunn index.
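For reference, one common form of the Dunn index is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so larger values indicate more compact and better separated clusters. The sketch below implements that form with Euclidean distance; the paper does not state which variant or distance measure was used, so the function name and these choices are assumptions.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of equal-length points.

    Uses the minimum pairwise distance between points of different clusters
    as the separation and the maximum pairwise distance within any cluster as
    the diameter. Needs at least two clusters, and at least one cluster
    containing two distinct points.
    """
    # Smallest distance between any two points that lie in different clusters.
    min_separation = min(
        dist(p, q)
        for first, second in combinations(clusters, 2)
        for p in first
        for q in second
    )
    # Largest distance between any two points that lie in the same cluster.
    max_diameter = max(
        dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_separation / max_diameter
```

Under this definition, the same set of fingerprints can be scored once with the K-means assignment and once with the Fuzzy c-means assignment (after assigning each point to its highest-membership cluster), and the higher score indicates the better separated clustering.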
The inter-cluster distances of both clustering algorithms are as follows:
Using the fuzzy c-means clustering algorithm = 0.3193
Using the K-means clustering algorithm = 0.3534

The intra-cluster distance has been calculated using the Silhouette value [6]. Silhouette is a method for interpreting and validating the consistency within clusters of data. The technique provides a succinct graphical representation of how well each object lies within its cluster [6]. The quality of the clustering algorithms was estimated with the help of the above intra-cluster and inter-cluster distance results. The properties of good clustering are (i) minimal intra-cluster distances and (ii) maximal inter-cluster distance. From the results of both distances, K-means clustering shows better performance in this scenario, as it has the minimum average intra-cluster distance and the maximum inter-cluster distance in comparison to fuzzy c-means clustering.

4) Importance of Proposed Method: (a) It is a combination of other traditional methods, so it will always give results of at least average accuracy. (b) Any number of traditional water quality index methods can be combined, and the machine learning model can be applied successfully to the resultant dataset.

Fig. 5: Event Fingerprints Clustering.

V. CONCLUSIONS AND FUTURE SCOPE

In this work, we proposed an IoT-edge device embedded with a machine learning model that detects abnormal changes in water quality in order to raise alarming events. Various techniques for detecting changes in water quality were explored. The main indicator and the different parameters responsible for changes in water quality were analyzed. The fingerprints of the water parameters were clustered into different groups.
Three water quality methods, namely the Weighted Arithmetic Index, the NSF Water Quality Index and user feedback on the water quality, were used for detecting changes in water quality, and with the help of these methods the SVM model was built to learn and detect changes in water quality for alarming events. Two machine learning clustering algorithms, fuzzy c-means and k-means, were used for analyzing the behaviour and pattern of the abnormal class of the water quality index. Intra-cluster and inter-cluster distances were used for analyzing the quality of the clustering algorithms. In future, we will deploy more such IoT-edge devices and test the proposed algorithm on a larger number of water samples in a water distribution network system. We will also consider more water quality parameters and other machine learning algorithms, and check their impact on the proposed algorithm.

REFERENCES

[1] Mayur V. Bhanderi and Hitesh B. Shah. Machine Learning for Wireless Sensor Network: A Review, Challenges and Applications. Research India Publications, 4(12):475-486, 2014.
[2] S. Das and S. K. Udgata, "Sensor Placement for Contamination Source Detection in Water Channel Networks," ICC 2021 - IEEE International Conference on Communications, 2021, pp. 1-6, doi: 10.1109/ICC42927.2021.9500683.
[3] K. Yogendra and E. T. Puttaiah. Determination of Water Quality Index and Suitability of an Urban Waterbody in Shimoga Town, Karnataka. The 12th World Lake Conference (5):342-346, 2008.
[4] C. Sadashivaiah, C. R. Ramakrishnaiah and G. Ranganna. Assessment of Water Quality Index for the Groundwater in Tumkur Taluk, Karnataka State, India. E-Journal of Chemistry, 6(8):523-530, 2009.
Swapnil R. Kamble and Ritesh Vijay. Assessment of water quality using cluster analysis in coastal region of Mumbai, India. Environ Monit Assess 178:321-332, 2010.
[5] S. Ankita and M. Prerna. Comparison Of K-Means And Fuzzy C-Means Algorithms. International Journal of Engineering Research and Technology (IJERT), ISSN: 2278-0181, Vol. 2, Issue 5, 2013.
[6] Peter J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, North-Holland, 53-65, 1987.
[7] Kunwar P. Singh, Nikita Basant and Shikha Gupta. Support vector machines in water quality management. Elsevier, 152-162, 2011.
[8] Nabeel M. Gazzaz, Mohd Kamil Yusoff, Ahmad Zaharin Aris, Hafizan Juahir and Mohammad Firuz Ramli. Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors. Elsevier, 2409-2420, 2012.
[9] Rosaida Rosly, Mokhairi Makhtar, Mohd Khalid Awang, M Nordin A Rahman and Mustafa Mat Deris. The Study on the Accuracy of Classifiers for Water Quality Application. International Journal of u- and e- Service, Science and Technology, Vol. 8, 145-154, 2015.
[10] Umair Ahmed, Rafia Mumtaz, Hirra Anwar, Asad A. Shah, Rabia Irfan, and José García-Nieto. "Efficient Water Quality Prediction Using Supervised Machine Learning." Water 11, no. 11 (2019): 2210.
[11] Sidrah Hafeez, Man Sing Wong, Hung Chak Ho, Majid Nazeer, Janet Nichol, Sawaid Abbas, Danling Tang, Kwon Ho Lee, and Lilian Pun. "Comparison of machine learning algorithms for retrieval of water quality indicators in case-II waters: a case study of Hong Kong." Remote Sensing 11, no. 6 (2019): 617.
[12] Jun Yung Ho, Haitham Abdulmohsin Afan, Amr H. El-Shafie, Suhana Binti Koting, Nuruol Syuhadaa Mohd, Wan Zurina Binti Jaafar, Hin Lai Sai et al. "Towards a time and cost effective approach to water quality index class prediction." Journal of Hydrology 575 (2019): 148-165.
[13] Jing Li, Husam Ali Abdulmohsin, Samer Sami Hasan, Li Kaiming, Belal Al-Khateeb, Mazen Ismaeel Ghareb, and Muamer N. Mohammed. "Hybrid soft computing approach for determining water quality indicator: Euphrates River." Neural Computing and Applications 31, no. 3 (2019): 827-837.
[14] Francesco Granata, Stefano Papirio, Giovanni Esposito, Rudy Gargano, and Giovanni De Marinis. "Machine learning algorithms for the forecasting of wastewater quality indicators." Water 9, no. 2 (2017): 105.
[15] Xusong Qin, Furong Gao, and Guohua Chen. "Wastewater quality monitoring system using sensor fusion and machine learning techniques." Water Research 46, no. 4 (2012): 1133-1144.
[16] Yong Hoon Kim, Jungho Im, Ho Kyung Ha, Jong-Kuk Choi, and Sunghyun Ha. "Machine learning approaches to coastal water quality monitoring using GOCI satellite data." GIScience & Remote Sensing 51, no. 2 (2014): 158-174.
[17] A. S. Pattanayak, B. S. Pattnaik, S. K. Udgata and A. K. Panda, "Development of Chemical Oxygen on Demand (COD) Soft Sensor using Edge Intelligence," in IEEE Sensors Journal, doi: 10.1109/JSEN.2020.3010134.
[18] A. K. Sahoo and S. K. Udgata, "A Novel ANN-Based Adaptive Ultrasonic Measurement System for Accurate Water Level Monitoring," in IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 6, pp. 3359-3369, June 2020.