Adding Explainability to Machine Learning Models to Detect Chronic Kidney Disease Abstract—Chronic Kidney Disease is a common term for multiple heterogeneous diseases in the kidneys. It is also known as Chronic Renal Disease. Chronic kidney disease (CKD) has a gradual loss of glomerular filtration rate (GFR) over three months. The patient does not observe any significant symptoms in the earlier stage of CKD, and it is not identifiable without clinical tests like urine and blood tests. Patients with CKD would have a higher chance of developing heart disease. CKD is a progressive and often irreversible process of renal function decline, which may reach an endpoint of end-stage renal failure, requiring renal replacement therapy. It is critical to diagnose progressive CKD at an early stage and predict patients prone to developing the disease further for timely therapeutic interventions. As such, researchers have expended enormous efforts in the development of novel biomarkers that may identify subjects with early CKD at risk of progression. In this study, we have developed an explainable machine learning model to predict chronic kidney disease by implementing an automated data pipeline using the Random Forest ensemble learning trees model and feature selection algorithm. The explainability of the proposed model has been assessed in terms of feature importance and explainability metrics. Three explainability methods; LIME, SHAP, and SKATER have been applied to interpret the developed model and to compare the explainability results using Interpretability, Fidelity, and Fidelityto-Interpretability ratio as the explainability metrics. Index Terms—chronic kidney disease, ckd, explainability, interpretability, machine learning, neural network I. I NTRODUCTION Chronic Kidney Disease (CKD) is characterized by a glomerular filtration rate (GFR) under 60 mL/min/1.73 m2 or kidney damage as defined by structural or functional abnormalities other than decreased GFR [2]. Slow degradation of the kidney cells over a long time is a chronic kidney disease condition. It is a major kidney functionality failure where the kidney sans blood filtering process and there is a heavy fluid buildup in the body. CKD is a worldwide health issue, afflicting around 10-15% of the population [3] and its pervasiveness is continuously increasing all over the world. In the year 2016, globally, around 753 million people (417 million females and 336 million males) were affected by this disease [4]. CKD caused 1.2 million deaths in 2015 having a sharp increase from 409,000 in 1990 [5] [6]. High blood pressure (550,000), diabetes (418,000), and glomerulonephritis (238,000) [5] are among the major causes that contribute to the greatest number of deaths. Patients with CKD tend to have a higher chance of developing heart disease, anemia, bone diseases, elevated potassium, and calcium at later stages [7] [8]. According to an Australian study, earlier detection of CKD could reduce the growth of the disease even by nurses in the specialization of nephrology and primary care doctors [9]. Doctors usually apply imaging techniques to identify the presence of CKD. However, it is practically impossible to test each person due to the large number of patients for many reasons. Researchers from different communities worldwide have put significant efforts into identifying CKD at an early stage using machine learning models. These models employ other standard clinical features stored as part of a patient’s medical health records - for example, different blood tests, demographic features like age, gender, or medical features like diabetes, hypertension, anemia, appetite, or blood pressure. Extensive testing can be recommended only for patients with a higher possibility of having CKD. Researchers have employed different machine learning techniques and achieved significant results in accurately predicting CKD patients for further medical examinations (Table I). Explainability and interpretability of any proposed technique is a critical component like any machine learning technique in the healthcare domain. We found a significant gap in the interpretability of these developed models for chronic kidney disease identification. This study proposes a machine learning model to predict CKD, focusing on explainability. It will facilitate physicians understanding of the critical features affecting CKD development in the early stages and correlate them medically with other features to recommend patients for further diagnosis or preventive/curative measures. II. P ROBLEM S TATEMENT AND O BJECTIVES Existing machine learning and neural network/deep learning algorithms are limited to two themes in the identification of chronic kidney diseases: CKD prediction through ML/DL/NN and CKD prediction through explainable ML/DL/NN (Fig. 1). By examining the relevant studies under each cluster, we found that most works focused on increasing classification and prediction accuracy. In contrast, there is a lack of research in the second cluster focusing on the explainability of the model. This gap presented an opportunity for us to contribute through our study. In addition to using machine learning methods for prediction, which seem more suitable for the task at hand, our approach also focuses on model interpretability, a critical factor for healthcare models. Interpretable or explainable models help physicians analyze global features for CKD prediction. Local features help develop patient-centric care to deal with the varying levels of severity of kidney diseases. The objective of our study is to develop an interpretable machine learning model to predict chronic kidney disease. We used the Random Forest ensemble learning trees model and feature selection algorithms. We then assessed the model explainability in terms of feature importance and explainability metrics. We tried to answer the following research questions through our study: Q1. How can we add interpretability or explainability to existing machine learning models for predicting a chronic kidney disease? Q2. Which interpretation method between local and global is more suitable for prediction? Q3. How can we measure the explainability results between methods? IV. E XPERIMENTAL S ETUP AND M ETHODOLOGY A. Dataset The majority of the researchers have used the CKD dataset available from the University of California Irvine-Machine Learning repository [29], and we have also used it in our study. The data was collected over two months in India. The dataset contains 400 instances representing 400 patients. There are 11 numeric and 14 nominal features (age, bacteria, white blood cell count, blood pressure, blood glucose random, red blood cell count, specific gravity, blood urea, hypertension, albumin, serum creatinine, diabetes mellitus, sugar, sodium, coronary artery disease, red blood cells, potassium, appetite, pus cell, haemoglobin, pedal edema, pus cell clumps, packed cell volume, anemia) in the dataset. The target class is the classification which is either ckd or notckd. We found both normal and skewed data distributions among numeric features having continuous and discrete data containing a significant number of outliers. There is a high proportion of missing data in both numeric and categorical features. Theoretically, measures of central tendencies for imputation will not be the best approach; rather, KNN would be a more suitable technique here for imputation. B. Data Processing Fig. 1. An overview of research gap identification. III. L ITERATURE R EVIEW The healthcare domain is one of the fastest-growing application domains of artificial intelligence and machine learning techniques. There are many different areas of study in the healthcare domain. We found an opportunity to work on the explainability of the machine learning models for CKD identification due to a limited number of previous studies in this scope [1], [10]–[28]. Fig. 1 briefly shows the research gap identification methodology. Researchers have proposed many machine learning approaches to predict CKD effectively by exploiting patients’ medical data. Most of the work in this area has been heavily focused on comparing the standard machine learning approaches and improving their accuracy using numerous feature selection methods. Table I summarizes the results achieved by these previous studies and compares them based on the number of features, classification models, accuracy, and briefly analyses the techniques used. Moreno [1] developed a classifier model through a data pipeline built on feature importance and employed SHAP method for assessing the explainability of the model’s results. We have used this work as a foundation to develop and compare our work. 1) Missing Data: We found both normal and skewed data distributions among numeric features having continuous and discrete data containing a significant number of outliers. There was a high proportion of missing data in both numeric and categorical features. Theoretically, measures of central tendencies for imputation are not the best approaches; rather, KNN would be a more suitable technique here for imputation. This method works on the basic principle of the KNN algorithm rather than handling the missing data with a naive approach with mean and median. In this method, the K parameter indicates the distance from the missing data. Hence, the missing data were calculated using the K neighbor’s mean. 2) Feature Scaling: For numerical features, we normalized the dataset using the MinMaxScaler, which rescales all numerical feature columns to a range between 0 and 1. The categorical features were handled by One-Hot Encoding, in which all the values were converted to 0 and 1. 3) Class Imbalance: The dataset also contains class imbalance - the percentage of positive and negative CKD samples are 62.50% and 37.50%. We applied the SMOTE technique, which over-samples the minority class by generating synthetic samples. By interpolating between positive instances close together, it emphasizes the feature space for producing new instances. After this step, both the target values had the same instances count. 4) Data Transformation: For feature selection, we used the correlation matrix and PPScore to identify the highly correlated features with the output class and dropped the highly correlated columns. Selection begins with the complete set of features and removes the features that were not as correlated with the target variable. TABLE I A SUMMARY OF THE PREVIOUS STUDIES . Author Dataset P. A. Moreno [1] UCI No. of Features 8 Methods Results (Accuracy) Analyses Extra Trees - 99%, Random Forest - 98%, Decision Tree - 95% Applied SHAP for local explainability and feature selection techniques for global explainability. Decision Trees - 100%, Random Forest - 100%, XGBoost - 100%, Adaboost - 100%, ExtraTrees - 100% 100% Used statistical approach for identifying global feature importance. 23 Decision Trees, Random Forest, ExtraTrees, Adaboost, Gradient Boosting, XGBoost, Ensemble voting classifier Decision Trees, Random Forest, XGBoost, Adaboost, ExtraTrees, KNN, CNN, SVC Linear, SVC RBF, Linear Regression, Gaussian NB XGBoost Ekanayake et al. [10] UCI 7 Alaoui et al. [11] UCI Ogunleye et al. [12] UCI 12 XGBoost 97.6% Zeynu et al. [13] UCI 8 K-Nearest Neighbor, J48, Artificial Neural Network, Naı̈ve Bayes and Support Vector Machine Alaskar et al. [14] UCI 8 Raju et al. [15] UCI 5 Khan et al. [16] UCI 23 Hasan et al. [17] UCI 13 Abdullah et al. [18] UCI 24 Back Propagation Neural Network, Naı̈ve Bayes, Decision Table, Decision trees, K nearest neighbor and One Rule classifier Support Vector Machine, Random Forest, XGBoost, Logistic Regression, Neural networks, Naı̈ve Bayes Classifier NBTree, J48, Support Vector Machine, Logistic Regression, Multi-layer Perceptron, Naı̈ve Bayes, and Composite Hypercube on Iterated Random Projection (CHIRP) Adaptive Boosting, Gradient boosting, Bootstrap Agregation, Extra Trees, Random Forest Random Forest, Linear and Radial SVM, Naı̈ve Bayes and Logistic Regression K-Nearest Neighbour - 99%, ANN - 99.5%, Naı̈ve Bayes - 99%, Support Vector Machine 98.25% Naı̈ve Bayes - 99.36% UCI 11 Multilayer Perceptron, Radial Basis Function Network, Logistic Regression MLP - 99.75%, RBFN - 98.5%, LR - 97.5% Yousef [20] UCI 6 Decision Tree, Random Forest, Naive Bayes Dulhare et al. [21] UCI 5 Naive Bayes and Naive Bayes with OneR Random Forest - 100%, Decision Tree - 96.6%, Naive Bayes - 98.3% Naive Bayes - 80%, Naive Bayes with OneR 92.5% Pujari et al. [22] USG Images N/A Image inpainting (Fast Marching), region of interest(ROI) identification (manual), and noise filtering (ideal filter, butterworth and median filters) N/A Sinha et al. [23] UCI 23 Support Vector Machine and K-Nearest Neighbor KNN - 78.75%, SVM - 73.75% Sobrinho et al. [24] University Hospital, UFAL, Brazil 4 J48 decision tree, Random Forest, Naive Bayes, Support Vector Machine, Multilayer Perceptron, and K-Nearest Neighbor Neves et al. [25] Proprieatary 24 ANN Decision Tree -95.00%, Random Forest 93.33%, Naive Bayes - 88.33%, Support Vector Machine - 76.66%, Multilayer Perceptron 75.00%, and K-Nearest Neighbor - 71.67% 92.30% Polat et al. [26] UCI 13 Support Vector Machine 98.5% Qin et al. [27] UCI 21 Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbor, Naive Bayes and Feed-Forward Neural Network Random Forest - 99.75% Vijayarani et al. [28] Proprietary 6 Artificial Neural Network and Support Vector Machine ANN - 87.70% SVM - 76.32% Rubini et al. [19] C. Learning Methods We used Decision Tree and Random Forest as the learning models for our implementation. Using these simplest machine learning algorithms, we aimed to fulfill our two goals of high accuracy and explainability of the models. 1) Decision Tree: Decision trees are a simple and widely used classification technique where each node represents an attribute, and the branches represent the values that the attribute can have. Based on the attribute with the most information in the dataset, the decision tree builds recursively, starting with the instances of the category of the parent node. An instance is passed down the tree built on the training dataset from the root node to the leaf node, where a high degree of Random Forest - 99.29% NB - 95.75% ,LR - 96.50%, MLP - 97.25%, J48 - 97.75%, SVM - 98.25%, NBTree - 98.75%, CHIRP - 99.75% AdaBoost - 99%, Extra Trees - 98%, Random Forest - 98.825%, Linear SVM - 97.5% The study applied statistical analysis and prediction using IBM SPSS statistics and SPSS Modeler software packasges and compared the overall accuracy metric which represents the weighted average of sensitivity and specificity and measures the overall probability of correct classification. Optimized the extreme gradient boosting (XGBoost) model using a new set-theory based rule which combines a few feature selection methods with their collective strengths. Build two models using feature selection method and ensemble method. Implemented data mining classifiers tools to predict chronic kidney disease. Empirical work was performed on different classification algorithms on the patient medical record to identy the existence of chronic kidney disease. CHIRP also had the diminishing mean absolute error rate of 0.0025. AdaBoost classifier had outperformed all other procedures in expectation of quality of kidney ailment. Random Forest classifier with the Random Forest features, the selection using all features outperformed other machine learning models. Used Fruit fly optimization algorithm (FFOA) for feature selection and multi-kernel support vector machine (MKSVM) for classification of CKD/Not CKD using UCI dataset and compared with 3 other datasets. Correlation coefficient and recursive feature elimination methods were used for feature selection. WrapperSubsetEval attribute evaluator with best first search and SMO, IBK and Naı̈ve bayes classifiers had selected 12, 7 and 6 features respectively. Applied image processing techniques focusing on detecting the proportion of fibrosis conditions within kidney tissues for detecting and identifying five different stages of CKD. Aimed at creating prediction model for ckd prediction by comparing the performance of Bayes classifier, Support vector machine (SVM) and K-Nearest Neighbour (KNN) classifier on the basis of its accuracy, precision and execution time Study was performed using Brazil as a reference country. Aimed to develop a hybrid decision support system allowing to consider incomplete, unknown, and even contradictory information utilizing the model’s knowledge representation and reasoning procedures based on Logic Programming. Both wrapper and filter-based feature selection approaches were chosen to reduce the dataset dimensionality. Support Vector Machine classifier demonstrated a higher accuracy with a filtered subset evaluator using the best first search engine feature selection method. Proposed another integrated model that combined Logistic Regression and Random Forest by using perceptron, and achieved an average accuracy of 99.83% after ten times of simulation. Compared the performance of the selected two algorithms based on their execution time and accuracy. certainty has been achieved in class variables. In addition to being sensitive to variance as it creates complex boundaries, this type of algorithm is most suitable when a model has to be explainable [30] [31]. 2) Random Forest: Random forest is a type of ensemble classifier (built on decision trees) that is composed of a group of individually trained classifiers whose predictions are combined for predicting new instances. If N records are in the training set, these are sampled at random from the original data called bootstrapping to grow the tree. Similarly, m variables are selected at random from M input variables (m<<M), and the best splitting on these m attributes will determine the node. The value of m is maintained constant during forest growth. Trees are grown to the maximum extent possible without pruning. In this way, the forest will develop multiple trees. Choosing a low value of m leads to weak trees, while choosing a high value of m leads to trees that are more or less similar [32] [33]. D. Model Validation Methods 1) Stratified K-fold Cross-Validation: As a statistical technique, cross-validation evaluates and compares learning algorithms by repeatedly splitting a dataset into two segments: one to train the model and the other to validate it. We used stratified five-fold cross-validation to maintain an equal proportion of the class samples in the training and validation sets and balance the bias-variance tradeoff. Each iteration of a K-fold cross-validation iteration holds out a different fold of the data for validation and uses the remaining k-1 folds for learning. Fig. 2. Flowchart of the design process. 2) Performance Metrics: We used the four standard scores: accuracy, precision, recall, and F1 to evaluate each model’s performance. True positives refer to cases that had positive results and were predicted positively by the algorithms. True negatives refer to cases that had negative results and were predicted negatively. Similarly, false negatives represent cases predicted as negative but were positive, and false positives represent cases predicted as positive but were negative. We tested both the algorithms with the sequential feature selection algorithm using Sequential Floating Backward Selection. The models experimented with k=6 best features in the feature selection. As each fold input varies in data, the six best features may also vary in each fold. Later in this step, the fold’s complete training and testing data were contracted to the six best features. The new contracted data was passed to train the algorithm and continued to evaluate the accuracy of the test data. We used GridSearchCV function from sklearn’s model selection package to find optimal values of the hyper-parameters. The best rank test score or the highest mean test score parameter combination was used to test the model’s accuracy on the testing data of the fold. This exact procedure was repeated for all the folds of our stratified cross-validation. The model average of all the folds was determined to be the final average of the cross-validation model. E. Explainability Methods and Metrics We used three different methods, namely SKATER, SHAP, and LIME, for studying the interpretability of our model [34]– [36]. SKATER is a unified model interpretation framework designed to explain the learned structures of a black box model both globally and locally, inferring based on a complete data set or an individual data point respectively [34]. SHAP (SHapley Additive exPlanations) is another coalitional game theory-based approach for explaining any machine learning model outcome. It allows explaining the prediction of an instance by computing the contribution of each feature. SHAP uses classic Shapley values (the aggregate marginal contributions within a feature value for various coalitions) from game theory and their related extensions for local explanations. It uses a unified method to interpret the results of various machine learning models [35]. LIME is a model-agnostic explainability method that attempts to explain the model by altering the input of sample data and then observing the changes in model output or predictions. Model-specific approaches evaluate the interaction of the essential components of the black-box machine learning model in order to have a deeper understanding of it [36]. The evaluation metrics considered for the Explainability are Interpretability, Fidelity, and Fidelity-to-Interpretability Ratio (FIR) in order to have comparable results with the previous study [1]. Interpretability is defined as the % of masked features that do not contribute to the final classification of the total # of features in the dataset. Fidelity is the degree of accuracy of a fully interpretable counterpart model compared to the actual model. Fidelity-to-Interpretability Ratio (FIR) indicates how much of the model’s interpretability is sacrificed for performance. The ideal ratio is 0.5, calculated as FIR=F/(F+I). V. R ESULTS A. Model Performances We applied the Floating Feature Selection method to select the six best features out of 24 and employed stratified fivefold cross-validation with features in each fold to compare the Decision Tree and Random Forest algorithms. Table II shows the performance matrix of the Decision Tree with an accuracy of 98.8% on the testing dataset. It was achieved by choosing the hyper-parameter settings as ’criterion’: ’gini’ and ’max depth’: 5. A fully interpretable Decision Tree with all features had a 99.8% accuracy. Table III summarizes the performance of the Random Forest with an accuracy of 99.8% on the testing dataset. The hyperparameter settings as ’bootstrap’: True, ’max depth’: 50 and ’n estimators’: 800 gain the highest accuracy for the Random Forest. Comparing both the models, the Random Forest performed better than the Decision Tree by a minute distance. B. Model Explainability SKATER plot in Fig. 3 displays the best features from the dataset after completing the model training where features are sorted based on their importance. According to this plot, red blood cell count, albumin, and serum creatinine are the top three important features for classifying CKD. TABLE II P ERFORMANCE MATRIX OF D ECISION T REE WITH THE SIX BEST FEATURES . Class/Performance Metrics ckd not-ckd Precision Recall F1-Score Support/Instances 0.98 1.00 1.00 0.98 0.99 0.99 250 250 TABLE III P ERFORMANCE MATRIX OF R ANDOM F OREST RESULTS WITH THE SIX BEST FEATURES . Class/Performance Metrics ckd not-ckd Precision Recall F1-Score Support/Instances 1.00 1.00 1.00 1.00 1.00 1.00 250 250 Fig. 5. The feature importance chart obtained by using SHAP (globally). Fig. 3. The feature importance chart obtained by using SKATER. SHAP plot displays the crucial features and also the features that drive the target class of the algorithm. SHAP has been applied both locally and globally in the current context. Fig. 4 depicts the feature importance for a single instance (local) and Fig. 5 shows the same for the model output (global). According to SHAP, the three features that contribute the most in determining the model output are hemoglobin, specific gravity, and packed cell volume. Fig. 4. The feature importance chart obtained by using SHAP (locally). LIME plot in Fig. 6 displays the features in a sorted order giving both the probability of the target class and the probability of each feature contributing to the classification. It also gives the ideal deciding range of each feature in the trained model on a local level. According to LIME, specific gravity, hemoglobin and albumin are the top three critical features for classifying the selected instance. Fig. 6. The feature importance chart obtained by using LIME (local). C. Explainability Evaluation Results The accuracy of the Decision Tree trained with the six best features found through the floating Feature Selection method and five-fold cross-validation was 98.8%. The accuracy of the Random Forest model with the same parameters was 99.8%. A fully interpretable Decision Tree with all features had an accuracy of 99.8%. Therefore, according to the definition, our model’s Interpretability is 75% (18/24) as we used only six features, and the remaining 18 features did not bring any value to the model’s output. The Fidelity of our model is 98.8% / 99.8%, which is 99% because we have 1% loss in performance to achieve the interpretability of the model. Lastly, the Fidelityto-Interpretability ratio is 99/(99 + 75), which is 56.89%, is closer to the ideal value of 50%. We compared our explainability model with the previous study [1] as shown in Fig. 7. Our model’s Interpretability was 75% compared to 67% of the other study. In the case of Fidelity, our model has sacrificed only 1% accuracy to add explainability compared to 3% previously. We achieved Fidelity-to-Interpretability as 57.14% closer to the ideal value of 50%, whereas Moreno [1] shows an FIR of 59%. Fig. 7. A comparison of the explainability results with the previous work [1]. VI. D ISCUSSION AND C ONTRIBUTIONS Our key objective of the study was to apply both local and global explainability methods to different machine learning methods for classifying chronic kidney disease patients and compare the results among the methods. We experimented with three local and global explainability methods (SHAP, LIME, and SKATER), which gave us impressive results by identifying the key features and their contributions to the model’s output. Except for a few differences, all three methods identified the same features as the important ones for CKD classification with Random Forest and Decision Tree models. For global interpretability, SKATER and SHAP, both the methods identified serum creatinine and red blood cell count among the top three essential features for CKD classification based on their value range. Hemoglobin, albumin, and packed cell volume are among other crucial features impacting the overall classification output by the models. We found that both SHAP and LIME identified haemoglobin as a vital feature in local interpretability. Low hemoglobin levels, low red blood cell count, and high specific gravity values contribute to a patient diagnosed with CKD and the reverse levels diagnosed as not having CKD. Such information is beneficial in assessing each patient based on the features and associated values and drawing practical observations that might impact the global level findings. While investigating the suitability of local and global explainability methods, our study shows that both techniques would be helpful in the relevant contexts of CKD classification and play a crucial role for the physicians in understanding the progression of kidney disease. Global explainability metrics help to gain a general understanding of commonly important features for CKD classification. Local instance-level explainability would help the physicians to assess each patient separately, compare with global feature importance and draw valuable facts and information to determine the next course of action for individual patients. Patients, in turn, benefit by avoiding many clinical tests and expensive image scans to diagnose kidney disease at an earlier stage. According to an American Family Physician research article [37], Diabetes mellitus, hypertension, and older age have been identified as the primary risk factors of CKD where patients should be screened for further investigation. Cardiovascular disease, family history of chronic kidney disease, and ethnic and racial minority status have been identified as other risk factors for CKD. In our study, SHAP global, SKATER, and LIME also identified hypertension, diabetes mellitus, and heart (coronary artery) disease as important features contributing positively to a patient classified as CKD. However, the explainability methods identified other factors like red blood cell count, haemoglobin, albumin, specific gravity, and serum creatinine as influential too. The clinical correlation and verification of these factors are the potential future tasks to advance our study further. National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) [38], Center for Disease Control and Prevention (CDC) [39], and Johns Hopkins Medicine [40] also identified diabetes mellitus type 1 or type 2, blood pressure, hypertension, and cardiovascular disease with a family history of kidney failures. Additionally, obesity is a high-risk factor for CKD evaluation among patients. VII. L IMITATIONS AND F UTURE W ORK We have a few limitations in the study. We acknowledge that the dataset is small for conducting a significant critical analysis. Other datasets either had proprietary ownership or non-responsiveness from the sources. We applied explainability to only the Random Forest algorithm in our study. In order to have a generalized observation, we need to apply the same methods to other black-box algorithms. We intend to test existing and new explainability metrics with other classification algorithms like XGBoost, Adaboost, and ExtraTree to assess the performances of future healthcare models. R EFERENCES [1] P. Moreno, “An explainable classification model for chronic kidney disease patients”, [Online]. Available: https://arxiv.org/pdf/2105.10368. [Accessed: 12-Mar-2022]. [2] A. Levin, P. E. Stevens, R. W. Bilous, J. Coresh, A. L. De Francisco, P. E. De Jong, C. G. Winearls, & Others, ”Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease”, Kidney international supplements, vol. 3, no. 1, pp. 1–150, 2013. [3] R. Saran et al., ”US renal data system 2016 annual data report: Epidemiology of kidney disease in the United States”, American journal of kidney diseases, vol. 69, no. 3, pp. A7–A8, 2017. [4] B. Bikbov, G. Remuzzi, and N. Perico, “Disparities in chronic kidney disease prevalence among males and females in 195 countries: Analysis of the global burden of disease 2016 study,” Nephron,pp. 313–318, [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/29791905/. [Accessed: 12-Mar-2022]. [5] H. Wang et al., ‘Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980– 2015: a systematic analysis for the Global Burden of Disease Study 2015’, The lancet, vol. 388, no. 10053, pp. 1459–1544, 2016. [6] H. Wang, M. Naghavi, C. Allen, R. M. Barber, Z. A. Bhutta, and A. Carter, “Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: A systematic analysis for the global burden of disease study 2015,” The Lancet, vol. 388, no. 10053, pp. 1459–1544, 2016. [7] S. Ardhanari, M. Alpert and K. Aggarwal, ”Cardiovascular Disease in Chronic Kidney Disease: Risk Factors, Pathogenesis, and Prevention”, Advances in Peritoneal Dialysis, vol. 30, pp. 40-53, 2014. Available: https://www.advancesinpd.com/adv14/40-53 Ardhanari.pdf. [Accessed 13 March 2022]. [8] M. J. Sarnak, “Kidney disease as a risk factor for development of cardiovascular disease,” Circulation, vol. 108, no. 17, pp. 2154–2169, 2003. [9] R. Walker, M. Marshall, en N. Polaschek, “Improving self-management in chronic kidney disease: A pilot study”, Renal Society of Australasia Journal, vol 9, pp. 116–125, Sep 2013 [10] I. U. Ekanayake and D. Herath, “Chronic kidney disease prediction using machine learning methods,” 2020 Moratuwa Engineering Research Conference (MERCon),pp. 260–265, 2020. [11] S. Sossi Alaoui, B. Aksasse, and Y. Farhaoui, “Statistical and predictive analytics of chronic kidney disease,” Advances in Intelligent Systems and Computing, pp. 27–38, 2019. [12] A. Ogunleye and Q.-G. Wang, “Enhanced xgboost-based automatic diagnosis system for chronic kidney disease,” 2018 IEEE 14th International Conference on Control and Automation (ICCA), pp. 2131–2140, Nov. 2020. [13] S. Zeynu, and S. Patil, ”Survey on prediction of chronic kidney disease using data mining classification techniques and feature selection”, International Journal of Pure and Applied Mathematics, vol. 118, no. 8, pp. 149–156, 2018. [14] H. Alasker, S. Alharkan, W. Alharkan, A. Zaki, and L. S. Riza, “Detection of kidney disease using various intelligent classifiers,” 2017 3rd International Conference on Science in Information Technology (ICSITech),pp. 681–684, 2017. [15] N. V. Ganapathi Raju, K. Prasanna Lakshmi, K. G. Praharshitha, and C. Likhitha, “Prediction of chronic kidney disease (CKD) using Data Science,” 2019 International Conference on Intelligent Computing and Control Systems (ICCS),pp. 642–647, 2019. [16] B. Khan, R. Naseem, F. Muhammad, G. Abbas, and S. Kim, “An empirical evaluation of machine learning techniques for chronic kidney disease prophecy,” IEEE Access, vol. 8, pp. 55012–55022, 2020. [17] K. M. Zubair Hasan and M. Zahid Hasan, “Performance evaluation of ensemble-based machine learning techniques for prediction of chronic kidney disease,” Emerging Research in Computing, Information, Communication and Applications, pp. 415–426, 2019. [18] A. A. Abdullah, S. A. Hafidz, and W. Khairunizam, “Performance comparison of machine learning algorithms for classification of chronic kidney disease (CKD),” Journal of Physics: Conference Series, vol. 1529, no. 5, p. 052077, 2020. [19] L. Jerlin Rubini and E. Perumal, “Efficient classification of chronic kidney disease by using multi-kernel support vector machine and fruit fly optimization algorithm,” International Journal of Imaging Systems and Technology, vol. 30, no. 3, pp. 660–673, 2020. [20] M. Yousef, “Prediction of chronic kidney disease using different Classification Algorithms: A Comparative Study,” Prediction Of Chronic Kidney Disease Using Different Classification Algorithms: A Comparative Study. [Online]. Available: https://www.xisdxjxsu.asia/V17I1039.pdf. [Accessed: 13-Mar-2022]. [21] U. N. Dulhare and M. Ayesha, “Extraction of action rules for chronic kidney disease using naı̈ve Bayes classifier,” 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC),pp. 1-5, 2016. [22] R. Pujari and V. Hajare, “Analysis of ultrasound images for identification of chronic kidney disease stages,” 2014 First International Conference on Networks & Soft Computing (ICNSC2014), pp. 380–383, 2014. [23] P. Sinha and P. Sinha, “Comparative study of chronic kidney disease prediction using KNN and SVM,” International Journal of Engineering Research and, vol. V4, no. 12, 2015. [24] A. Sobrinho, A. C. Queiroz, L. Dias Da Silva, E. De Barros Costa, M. Eliete Pinheiro, and A. Perkusich, “Computer-aided diagnosis of chronic kidney disease in developing countries: A comparative analysis of machine learning techniques,” IEEE Access, vol. 8, pp. 25407–25419, 2020. [25] J. Neves, M. R. Martins, and J. Vilhena, “A soft computing approach to Kidney Diseases Evaluation,” Journal of Medical Systems, vol. 39, no. 10, 2015. [26] H. Polat, H. Danaei Mehr, and A. Cetin, “Diagnosis of chronic kidney disease based on support Vector Machine by feature selection methods,” Journal of Medical Systems, vol. 41, no. 4, 2017. [27] J. Qin, L. Chen, Y. Liu, C. Liu, C. Feng, and B. Chen, “A machine learning methodology for diagnosing chronic kidney disease,” IEEE Access, vol. 8, pp. 20991–21002, 2020. [28] M. Vijayarani, S. Dhayanand,”Kidney Disease Prediction using SVM and ANN algorithms,” International Journal of Computing and Business Research, vol. 6,no. 2, pp. 2229 - 6166, 2015. [29] D. Dua and C. Graff, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences (2017).Available: https://archive.ics.uci.edu/ml/datasets/chronic kidney disease. [Accessed: 2-Feb-2022]. [30] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and regression trees. Routledge, 2017. [31] J. R. Quinlan, ”Induction of decision trees”, Machine learning, vol. 1, no. 1, pp. 81–106, 1986. [32] T. K. Ho, ”Random decision forests”, Proceedings of 3rd international conference on document analysis and recognition, no. 1, pp. 278–282, 1995. [33] L. Breiman, ”Random Forests.”, Machine Learning, vol. 45, pp. 5–32, 2001. [34] Skater Documentation. (n.d.). Model Interpretation with Skater: Overview¶. Overview - skater 0 documentation. Available: https://oracle.github.io/Skater/overview.html. [Accessed: 12-Apr-2022]. [35] S. M. Lundberg and S. I. Lee, (2017), ”A Unified Approach to Interpreting Model Predictions,” Advances in Neural Information Processing Systems, vol. 30, pp. 4765–4774, 2017.Available: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpretingmodel-predictions.pdf. [Accessed: 12-Apr-2022]. [36] K. Peng and T. Menzies, “Documenting evidence of a reuse of ‘“why should I trust you?”: explaining the predictions of any classifier,’” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021, pp. 1600–1600. [37] M. Baumgarten and T. Gehr, “Chronic kidney disease: Detection and evaluation,” American family physician, 15-Nov-2011. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/22085668/. [Accessed: 22-May2022]. [38] “Identify & evaluate patients with chronic kidney disease,” National Institute of Diabetes and Digestive and Kidney Diseases. [Online]. Available: https://www.niddk.nih.gov/healthinformation/professionals/clinical-tools-patient-management/kidneydisease/identify-manage-patients/evaluate-ckd. [Accessed: 23-May2022]. [39] “Chronic kidney disease basics,” Centers for Disease Control and Prevention, 28-Feb-2022. [Online]. Available: https://www.cdc.gov/kidneydisease/basics.html. [Accessed: 23-May2022]. [40] “Chronic kidney disease,” Johns Hopkins Medicine, 08-Aug-2021. [Online]. Available: https://www.hopkinsmedicine.org/health/conditionsand-diseases/chronic-kidney-disease. [Accessed: 23-May-2022].