Adding Explainability to Machine Learning Models
to Detect Chronic Kidney Disease
Abstract—Chronic Kidney Disease is a common term for
multiple heterogeneous diseases in the kidneys. It is also known
as Chronic Renal Disease. Chronic kidney disease (CKD) involves
a gradual loss of glomerular filtration rate (GFR) over a period of three
months or more. Patients do not observe any significant symptoms in
the earlier stages of CKD, and the disease is not identifiable without clinical
tests such as urine and blood tests. Patients with CKD have a
higher chance of developing heart disease. CKD is a progressive
and often irreversible process of renal function decline, which
may reach an endpoint of end-stage renal failure, requiring renal
replacement therapy. It is critical to diagnose progressive CKD at
an early stage and predict patients prone to developing the disease
further for timely therapeutic interventions. As such, researchers
have expended enormous efforts in the development of novel
biomarkers that may identify subjects with early CKD at risk
of progression. In this study, we have developed an explainable
machine learning model to predict chronic kidney disease by implementing an automated data pipeline using the Random Forest
ensemble learning trees model and feature selection algorithm.
The explainability of the proposed model has been assessed in
terms of feature importance and explainability metrics. Three
explainability methods (LIME, SHAP, and SKATER) have been
applied to interpret the developed model and to compare the
explainability results using Interpretability, Fidelity, and the Fidelity-to-Interpretability ratio as the explainability metrics.
Index Terms—chronic kidney disease, ckd, explainability, interpretability, machine learning, neural network
I. INTRODUCTION
Chronic Kidney Disease (CKD) is characterized by a
glomerular filtration rate (GFR) under 60 mL/min/1.73 m²
or kidney damage as defined by structural or functional
abnormalities other than decreased GFR [2]. The condition involves
slow degradation of the kidney cells over a long period.
As kidney function fails, the kidneys can no longer
filter the blood adequately, and there is a
heavy fluid buildup in the body. CKD is a worldwide health
issue, afflicting around 10-15% of the population [3] and its
pervasiveness is continuously increasing all over the world.
In the year 2016, globally, around 753 million people (417
million females and 336 million males) were affected by this
disease [4]. CKD caused 1.2 million deaths in 2015, a
sharp increase from 409,000 in 1990 [5] [6]. High blood
pressure (550,000), diabetes (418,000), and glomerulonephritis
(238,000) are the causes that contribute
the greatest number of these deaths [5].
Patients with CKD tend to have a higher chance of developing heart disease, anemia, bone diseases, elevated potassium,
and calcium at later stages [7] [8]. According to an Australian
study, earlier detection of CKD, even by nephrology nurses
and primary care doctors, could slow the progression of
the disease [9]. Doctors usually apply imaging
techniques to identify the presence of CKD. However, it is
practically impossible to test every person in this way because of the large number of patients. Researchers from different
communities worldwide have put significant efforts into identifying CKD at an early stage using machine learning models.
These models employ other standard clinical features stored
as part of a patient’s medical health records - for example,
different blood tests, demographic features like age, gender, or
medical features like diabetes, hypertension, anemia, appetite,
or blood pressure. Extensive testing can be recommended
only for patients with a higher possibility of having CKD.
Researchers have employed different machine learning techniques and achieved significant results in accurately predicting
CKD patients for further medical examinations (Table I).
Explainability and interpretability are critical components
of any machine learning technique proposed
for the healthcare domain. We found a significant gap in the
interpretability of these developed models for chronic kidney
disease identification.
This study proposes a machine learning model to predict
CKD, focusing on explainability. It will help physicians
understand the critical features affecting CKD development in the early stages and correlate them medically with
other features to recommend patients for further diagnosis or
preventive/curative measures.
II. PROBLEM STATEMENT AND OBJECTIVES
Existing studies that apply machine learning and neural network/deep learning
algorithms to the identification of
chronic kidney disease fall into two themes: CKD prediction through ML/DL/NN
and CKD prediction through explainable ML/DL/NN (Fig.
1). By examining the relevant studies under each theme, we
found that most works focused on increasing classification and
prediction accuracy. In contrast, there is a lack of research in
the second theme, which focuses on the explainability of the model.
This gap presented an opportunity for us to contribute through
our study.
In addition to using machine learning methods for prediction, which seem more suitable for the task at hand, our
approach also focuses on model interpretability, a critical factor for healthcare models. Interpretable or explainable models
help physicians analyze global features for CKD prediction.
Local features help develop patient-centric care to deal with
the varying levels of severity of kidney diseases.
The objective of our study is to develop an interpretable
machine learning model to predict chronic kidney disease.
We used the Random Forest ensemble learning trees model
and feature selection algorithms. We then assessed the model
explainability in terms of feature importance and explainability
metrics. We tried to answer the following research questions
through our study:
Q1. How can we add interpretability or explainability to
existing machine learning models for predicting chronic
kidney disease?
Q2. Which interpretation method, local or global,
is more suitable for prediction?
Q3. How can we measure and compare the explainability results between
methods?
Fig. 1. An overview of research gap identification.
III. LITERATURE REVIEW
The healthcare domain is one of the fastest-growing application domains of artificial intelligence and machine learning
techniques. There are many different areas of study in the
healthcare domain. We found an opportunity to work on
the explainability of the machine learning models for CKD
identification due to the limited number of previous studies in
this scope [1], [10]–[28]. Fig. 1 briefly shows the research gap
identification methodology. Researchers have proposed many
machine learning approaches to predict CKD effectively by
exploiting patients’ medical data. Most of the work in this
area has been heavily focused on comparing the standard
machine learning approaches and improving their accuracy
using numerous feature selection methods. Table I summarizes
the results achieved by these previous studies, compares
them based on the number of features, classification models,
and accuracy, and briefly analyses the techniques used. Moreno [1]
developed a classifier model through a data pipeline built on
feature importance and employed the SHAP method for assessing
the explainability of the model’s results. We have used this
work as a foundation for developing our model and as a baseline for comparison.
IV. EXPERIMENTAL SETUP AND METHODOLOGY
A. Dataset
The majority of researchers have used the CKD dataset
available from the University of California Irvine Machine
Learning Repository [29], and we have also used it in our
study. The data was collected over two months in India. The
dataset contains 400 instances representing 400 patients. There
are 25 attributes, 11 numeric and 14 nominal: 24 features (age, bacteria, white
blood cell count, blood pressure, blood glucose random, red
blood cell count, specific gravity, blood urea, hypertension,
albumin, serum creatinine, diabetes mellitus, sugar, sodium,
coronary artery disease, red blood cells, potassium, appetite,
pus cell, haemoglobin, pedal edema, pus cell clumps, packed
cell volume, anemia) and the target class,
which is either ckd or notckd.
We found both normal and skewed data distributions among
the numeric features, which include continuous and discrete data with a significant number of outliers. There is a high proportion
of missing data in both numeric and categorical features.
Theoretically, measures of central tendency are not
the best approach for imputation here; KNN is a more
suitable technique.
B. Data Processing
1) Missing Data: We found both normal and skewed data
distributions among the numeric features, which include continuous and
discrete data with a significant number of outliers. There
was a high proportion of missing data in both numeric and categorical features. Theoretically, measures of central tendency
are not the best approach for imputation here; KNN is
a more suitable technique. This method
works on the basic principle of the KNN algorithm rather than
handling the missing data with a naive approach such as the mean
or median. In this method, the K parameter sets the number of
nearest neighbours considered for each sample with a missing value. Each missing value was then
imputed as the mean of that feature over the K nearest neighbours.
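The neighbour-mean idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used in the study: the function name `knn_impute`, the toy rows, and the choice of a distance computed over the features observed in both rows are assumptions made for the example.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper: distances use only features present in both rows.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                # A neighbour must be a different row that has feature j.
                if other_idx == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Toy data (not from the CKD dataset): the third row misses its second value.
toy_rows = [[1.0, 10.0], [2.0, 20.0], [1.5, None], [9.0, 90.0]]
toy_filled = knn_impute(toy_rows, k=2)
```

In the toy data the two closest rows to the incomplete one are the first two, so the gap is filled with the mean of their values rather than being pulled towards the outlying fourth row, which is the advantage over a global mean.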
2) Feature Scaling: For numerical features, we normalized
the dataset using the MinMaxScaler, which rescales all numerical feature columns to a range between 0 and 1. The
categorical features were handled by One-Hot Encoding, in
which each category value is converted to a binary (0 or 1) indicator column.
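Both transformations are simple enough to write out by hand. The following sketch shows the arithmetic behind min-max scaling, x' = (x - min)/(max - min), and one-hot encoding; the helper names and the sample values are hypothetical and stand in for the library calls a real pipeline would use.

```python
def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Return the sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Toy columns (illustrative values only).
scaled_bp = min_max_scale([60, 80, 100])
appetite_categories, appetite_rows = one_hot(["good", "poor", "good"])
```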
3) Class Imbalance: The dataset also contains class imbalance: the percentages of positive and negative CKD samples
are 62.50% and 37.50%. We applied the SMOTE technique,
which over-samples the minority class by generating synthetic
samples. SMOTE operates in feature space: it produces new
instances by interpolating between minority-class instances that lie close
together. After this step, both target classes had the same
number of instances.
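The interpolation step can be illustrated with a minimal sketch. It is not the SMOTE implementation used in the study; the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy points are assumptions for the example.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with three 2-D points (illustrative only).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_synthetic = smote(toy_minority, n_new=4)
```

Every synthetic point lies on a line segment between two existing minority points, so the new samples stay inside the region the minority class already occupies.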
4) Data Transformation: For feature selection, we used
the correlation matrix and PPScore to identify the highly
correlated features with the output class and dropped the
highly correlated columns. Selection begins with the complete
set of features and removes the features that were not as
correlated with the target variable.
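One reading of this step is a filter that keeps the features most strongly correlated with the target. The sketch below implements only a Pearson-correlation filter under that assumption; PPScore is not reproduced here, and the threshold, function names, and toy columns are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_by_target_correlation(columns, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores

# Toy columns (illustrative values only); target 1 = ckd, 0 = notckd.
toy_columns = {"hemo": [15, 14, 9, 8], "noise": [1, 2, 1, 2]}
toy_target = [0, 0, 1, 1]
toy_kept, toy_scores = select_by_target_correlation(toy_columns, toy_target)
```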
TABLE I
A SUMMARY OF THE PREVIOUS STUDIES.
| Author | Dataset | No. of Features | Methods | Results (Accuracy) | Analyses |
|---|---|---|---|---|---|
| P. A. Moreno [1] | UCI | 8 | Decision Trees, Random Forest, ExtraTrees, Adaboost, Gradient Boosting, XGBoost, Ensemble voting classifier | Extra Trees - 99%, Random Forest - 98%, Decision Tree - 95% | Applied SHAP for local explainability and feature selection techniques for global explainability. |
| Ekanayake et al. [10] | UCI | 7 | Decision Trees, Random Forest, XGBoost, Adaboost, ExtraTrees, KNN, CNN, SVC Linear, SVC RBF, Linear Regression, Gaussian NB | Decision Trees - 100%, Random Forest - 100%, XGBoost - 100%, Adaboost - 100%, ExtraTrees - 100% | Used statistical approach for identifying global feature importance. |
| Alaoui et al. [11] | UCI | 23 | XGBoost | 100% | The study applied statistical analysis and prediction using IBM SPSS Statistics and SPSS Modeler software packages and compared the overall accuracy metric, which represents the weighted average of sensitivity and specificity and measures the overall probability of correct classification. |
| Ogunleye et al. [12] | UCI | 12 | XGBoost | 97.6% | Optimized the extreme gradient boosting (XGBoost) model using a new set-theory based rule which combines a few feature selection methods with their collective strengths. |
| Zeynu et al. [13] | UCI | 8 | K-Nearest Neighbor, J48, Artificial Neural Network, Naïve Bayes and Support Vector Machine | K-Nearest Neighbour - 99%, ANN - 99.5%, Naïve Bayes - 99%, Support Vector Machine - 98.25% | Built two models using feature selection method and ensemble method. |
| Alaskar et al. [14] | UCI | 8 | Back Propagation Neural Network, Naïve Bayes, Decision Table, Decision trees, K nearest neighbor and One Rule classifier | Naïve Bayes - 99.36% | Implemented data mining classifiers tools to predict chronic kidney disease. |
| Raju et al. [15] | UCI | 5 | Support Vector Machine, Random Forest, XGBoost, Logistic Regression, Neural networks, Naïve Bayes Classifier | Random Forest - 99.29% | Empirical work was performed on different classification algorithms on the patient medical record to identify the existence of chronic kidney disease. |
| Khan et al. [16] | UCI | 23 | NBTree, J48, Support Vector Machine, Logistic Regression, Multi-layer Perceptron, Naïve Bayes, and Composite Hypercube on Iterated Random Projection (CHIRP) | NB - 95.75%, LR - 96.50%, MLP - 97.25%, J48 - 97.75%, SVM - 98.25%, NBTree - 98.75%, CHIRP - 99.75% | CHIRP also had the diminishing mean absolute error rate of 0.0025. |
| Hasan et al. [17] | UCI | 13 | Adaptive Boosting, Gradient boosting, Bootstrap Aggregation, Extra Trees, Random Forest | AdaBoost - 99%, Extra Trees - 98% | AdaBoost classifier had outperformed all other procedures in expectation of quality of kidney ailment. |
| Abdullah et al. [18] | UCI | 24 | Random Forest, Linear and Radial SVM, Naïve Bayes and Logistic Regression | Random Forest - 98.825%, Linear SVM - 97.5% | Random Forest classifier with the Random Forest features, the selection using all features outperformed other machine learning models. |
| Rubini et al. [19] | UCI | 11 | Multilayer Perceptron, Radial Basis Function Network, Logistic Regression | MLP - 99.75%, RBFN - 98.5%, LR - 97.5% | Used Fruit fly optimization algorithm (FFOA) for feature selection and multi-kernel support vector machine (MKSVM) for classification of CKD/Not CKD using UCI dataset and compared with 3 other datasets. |
| Yousef [20] | UCI | 6 | Decision Tree, Random Forest, Naive Bayes | Random Forest - 100%, Decision Tree - 96.6%, Naive Bayes - 98.3% | Correlation coefficient and recursive feature elimination methods were used for feature selection. |
| Dulhare et al. [21] | UCI | 5 | Naive Bayes and Naive Bayes with OneR | Naive Bayes - 80%, Naive Bayes with OneR - 92.5% | WrapperSubsetEval attribute evaluator with best first search and SMO, IBK and Naïve Bayes classifiers had selected 12, 7 and 6 features respectively. |
| Pujari et al. [22] | USG Images | N/A | Image inpainting (Fast Marching), region of interest (ROI) identification (manual), and noise filtering (ideal filter, butterworth and median filters) | N/A | Applied image processing techniques focusing on detecting the proportion of fibrosis conditions within kidney tissues for detecting and identifying five different stages of CKD. |
| Sinha et al. [23] | UCI | 23 | Support Vector Machine and K-Nearest Neighbor | KNN - 78.75%, SVM - 73.75% | Aimed at creating prediction model for CKD prediction by comparing the performance of Bayes classifier, Support vector machine (SVM) and K-Nearest Neighbour (KNN) classifier on the basis of its accuracy, precision and execution time. |
| Sobrinho et al. [24] | University Hospital, UFAL, Brazil | 4 | J48 decision tree, Random Forest, Naive Bayes, Support Vector Machine, Multilayer Perceptron, and K-Nearest Neighbor | Decision Tree - 95.00%, Random Forest - 93.33%, Naive Bayes - 88.33%, Support Vector Machine - 76.66%, Multilayer Perceptron - 75.00%, and K-Nearest Neighbor - 71.67% | Study was performed using Brazil as a reference country. |
| Neves et al. [25] | Proprietary | 24 | ANN | 92.30% | Aimed to develop a hybrid decision support system allowing to consider incomplete, unknown, and even contradictory information utilizing the model’s knowledge representation and reasoning procedures based on Logic Programming. |
| Polat et al. [26] | UCI | 13 | Support Vector Machine | 98.5% | Both wrapper and filter-based feature selection approaches were chosen to reduce the dataset dimensionality. Support Vector Machine classifier demonstrated a higher accuracy with a filtered subset evaluator using the best first search engine feature selection method. |
| Qin et al. [27] | UCI | 21 | Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbor, Naive Bayes and Feed-Forward Neural Network | Random Forest - 99.75% | Proposed another integrated model that combined Logistic Regression and Random Forest by using perceptron, and achieved an average accuracy of 99.83% after ten times of simulation. |
| Vijayarani et al. [28] | Proprietary | 6 | Artificial Neural Network and Support Vector Machine | ANN - 87.70%, SVM - 76.32% | Compared the performance of the selected two algorithms based on their execution time and accuracy. |
C. Learning Methods
We used Decision Tree and Random Forest as the learning
models for our implementation. Using these relatively simple machine
learning algorithms, we aimed to fulfill our two goals of high
accuracy and explainability of the models.
1) Decision Tree: Decision trees are a simple and widely
used classification technique where each node represents an attribute, and the branches represent the values that the attribute
can have. The decision tree is built recursively: at each node, the
instances that reach the node from its parent are split on the attribute
that carries the most information about the class. An
instance is passed down the tree built on the training dataset
from the root node to a leaf node, where a high degree of
certainty has been achieved in the class variable. Although
decision trees are sensitive to variance because they create complex boundaries,
this type of algorithm is most suitable when a model has to
be explainable [30] [31].
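The choice of the most informative attribute can be made concrete with entropy-based information gain. This is a minimal sketch for nominal attributes, not the study's implementation; the function names and the toy attribute table are assumptions, and a library tree configured with the Gini criterion would use Gini impurity in place of entropy.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(table, labels):
    """Name of the attribute with the highest information gain."""
    return max(table, key=lambda name: information_gain(table[name], labels))

# Toy data (illustrative only): 'htn' separates the classes, 'appet' does not.
toy_labels = ["ckd", "ckd", "notckd", "notckd"]
toy_table = {"htn": ["yes", "yes", "no", "no"],
             "appet": ["good", "poor", "good", "poor"]}
toy_root = best_attribute(toy_table, toy_labels)
```

A tree builder applies `best_attribute` at the root, partitions the instances by the chosen attribute's values, and repeats the same step on each partition until the labels in a partition are (nearly) pure.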
2) Random Forest: Random forest is a type of ensemble
classifier (built on decision trees) that is composed of a
group of individually trained classifiers whose predictions are
combined for predicting new instances. If the training set has N records,
N records are sampled at random with replacement from the original data
(bootstrapping) to grow each tree. Similarly, m variables
are selected at random from M input variables (m<<M), and
the best split on these m attributes is used to split the node.
The value of m is maintained constant during forest growth.
Trees are grown to the maximum extent possible without
pruning. In this way, the forest will develop multiple trees.
Choosing a low value of m leads to weak trees, while choosing
a high value of m leads to trees that are more or less similar
[32] [33].
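The two sources of randomness and the final vote can be sketched as follows. The helpers are hypothetical and only illustrate the sampling logic; they do not grow trees, and the values N = 10, M = 24, and m = 4 are chosen for the example.

```python
import random
from collections import Counter

def bootstrap_sample(n_records, rng):
    """Draw n_records row indices with replacement (one sample per tree)."""
    return [rng.randrange(n_records) for _ in range(n_records)]

def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices out of n_features for one node split."""
    return rng.sample(range(n_features), m)

def majority_vote(predictions):
    """Combine the per-tree predictions into the forest's prediction."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
toy_rows = bootstrap_sample(10, rng)              # some indices repeat
toy_features = random_feature_subset(24, 4, rng)  # m = 4 out of M = 24
toy_prediction = majority_vote(["ckd", "notckd", "ckd"])
```

Because each tree sees a different bootstrap sample and each split considers a different feature subset, the trees decorrelate, which is what makes the vote more stable than a single tree.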
D. Model Validation Methods
1) Stratified K-fold Cross-Validation: As a statistical technique, cross-validation evaluates and compares learning algorithms by repeatedly splitting a dataset into two segments: one
to train the model and the other to validate it. We used stratified
five-fold cross-validation to maintain an equal proportion of
the class samples in the training and validation sets and
balance the bias-variance tradeoff. Each iteration of K-fold
cross-validation holds out a different fold of the data
for validation and uses the remaining K-1 folds for learning.
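A stratified split can be sketched by dealing the indices of each class round-robin into the folds, so that every fold keeps the class proportions. This is an illustration under simplifying assumptions (no shuffling, hypothetical function names), not the routine used in the study.

```python
from collections import defaultdict

def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def train_validation_splits(labels, k=5):
    """Yield (training, validation) index lists, one pair per held-out fold."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        validation = folds[held_out]
        training = [i for f, fold in enumerate(folds) if f != held_out
                    for i in fold]
        yield training, validation

# Toy labels (illustrative only): 10 samples per class, five folds.
toy_labels = ["ckd"] * 10 + ["notckd"] * 10
toy_splits = list(train_validation_splits(toy_labels, k=5))
```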
Fig. 2. Flowchart of the design process.
2) Performance Metrics: We used the four standard scores:
accuracy, precision, recall, and F1 to evaluate each model’s
performance. True positives refer to cases that had positive
results and were predicted positively by the algorithms. True
negatives refer to cases that had negative results and were
predicted negatively. Similarly, false negatives represent cases
predicted as negative but were positive, and false positives
represent cases predicted as positive but were negative.
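The four scores follow directly from these counts: accuracy = (TP + TN)/(TP + FP + FN + TN), precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 is the harmonic mean of precision and recall. A short sketch, with a hypothetical function name and illustrative counts that are not taken from our results:

```python
def classification_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only: 8 true positives, 2 false positives,
# 0 false negatives, 10 true negatives.
toy_scores = classification_scores(tp=8, fp=2, fn=0, tn=10)
```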
We tested both algorithms with sequential feature
selection using Sequential Floating Backward Selection. The models were trained with the k=6 best features chosen by
the feature selection. As the input data differ in each fold, the six
best features may also differ between folds. In each fold,
the complete training and testing data were reduced
to the six best features. The reduced training data were then used
to train the algorithm, and its accuracy was evaluated
on the reduced test data.
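The backward direction of this search can be sketched as a greedy loop that repeatedly drops the feature whose removal hurts the score least. The sketch is a simplification: it omits the floating step (conditionally re-adding previously removed features), and the `score` callable, the feature names, and the weights are hypothetical stand-ins for a cross-validated accuracy.

```python
def backward_selection(features, score, k):
    """Greedy sequential backward selection down to k features."""
    selected = list(features)
    while len(selected) > k:
        # Evaluate every subset obtained by removing exactly one feature
        # and keep the best-scoring one.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = max(candidates, key=score)
    return selected

# Toy scoring function (illustrative only): each feature adds a fixed
# amount to the score; a real pipeline would score subsets by validation accuracy.
toy_weights = {"hemo": 0.4, "sg": 0.3, "al": 0.2, "age": 0.06, "bp": 0.04}

def toy_score(subset):
    return sum(toy_weights[f] for f in subset)

toy_selected = backward_selection(list(toy_weights), toy_score, k=3)
```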
We used the GridSearchCV function from sklearn’s model_selection module to find optimal values of the hyper-parameters.
The parameter combination with the best rank test score, i.e., the highest mean test score, was used to test the model’s accuracy on
the testing data of the fold. This procedure was repeated
for all the folds of our stratified cross-validation. The
average over all the folds was taken as the final
cross-validation score of the model.
E. Explainability Methods and Metrics
We used three different methods, namely SKATER, SHAP,
and LIME, for studying the interpretability of our model [34]–
[36].
SKATER is a unified model interpretation framework designed to explain the learned structures of a black box model
both globally and locally, inferring based on a complete data
set or an individual data point respectively [34].
SHAP (SHapley Additive exPlanations) is a coalitional game theory-based approach for explaining the outcome of any machine
learning model. It explains the prediction of
an instance by computing the contribution of each feature.
SHAP uses classic Shapley values (the average marginal
contribution of a feature value across all possible coalitions)
from game theory and their related extensions for local explanations. It uses a unified method to interpret the results of
various machine learning models [35].
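For a handful of features, the Shapley value can be computed exactly from its definition by enumerating every coalition. The sketch below does this for a toy value function; it illustrates the definition only and is not how the SHAP library computes its estimates. The function names, feature names, and payoff numbers are assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                gain = value_fn(members | {f}) - value_fn(members)
                total += weight * gain
        phi[f] = total
    return phi

def toy_value(coalition):
    """Toy model output for a set of known features (illustrative numbers)."""
    value = 0.0
    if "hemo" in coalition:
        value += 0.3
    if "sg" in coalition:
        value += 0.1
    if "hemo" in coalition and "sg" in coalition:
        value += 0.2  # interaction shared equally between the two features
    return value

toy_phi = shapley_values(toy_value, ["hemo", "sg", "age"])
```

The contributions sum to the difference between the value of the full feature set and the value of the empty set, which is the property that makes the per-feature attributions additive.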
LIME is a model-agnostic explainability method that attempts to explain the model by altering the input of sample
data and then observing the changes in model output or
predictions. Model-specific approaches evaluate the interaction
of the essential components of the black-box machine learning
model in order to have a deeper understanding of it [36].
The evaluation metrics considered for the Explainability are
Interpretability, Fidelity, and Fidelity-to-Interpretability Ratio
(FIR) in order to have comparable results with the previous
study [1].
Interpretability is defined as the percentage of the total number of
features in the dataset that are masked, i.e., that
do not contribute to the final classification.
Fidelity is the degree of accuracy of a fully interpretable
counterpart model compared to the actual model.
Fidelity-to-Interpretability Ratio (FIR) indicates how
much of the model’s interpretability is sacrificed for performance. It is calculated as FIR = F/(F + I), where F is the Fidelity and I is the Interpretability; the ideal ratio is 0.5.
V. RESULTS
A. Model Performances
We applied the Floating Feature Selection method to select
the six best features out of 24 and employed stratified five-fold cross-validation with the features selected in each fold to compare the
Decision Tree and Random Forest algorithms.
Table II shows the performance matrix of the Decision
Tree, which achieved an accuracy of 98.8% on the testing dataset. It
was achieved with the hyper-parameter settings
’criterion’: ’gini’ and ’max_depth’: 5. A fully interpretable
Decision Tree with all features had a 99.8% accuracy.
Table III summarizes the performance of the Random Forest,
which achieved an accuracy of 99.8% on the testing dataset. The hyper-parameter settings ’bootstrap’: True, ’max_depth’: 50 and
’n_estimators’: 800 gave the highest accuracy for the Random
Forest. Comparing both models, the Random Forest performed better than the Decision Tree by a small margin.
B. Model Explainability
The SKATER plot in Fig. 3 displays the best features from the
dataset after completing the model training where features are
sorted based on their importance. According to this plot, red
blood cell count, albumin, and serum creatinine are the top
three important features for classifying CKD.
TABLE II
PERFORMANCE MATRIX OF DECISION TREE WITH THE SIX BEST FEATURES.
| Class | Precision | Recall | F1-Score | Support/Instances |
|---|---|---|---|---|
| ckd | 0.98 | 1.00 | 0.99 | 250 |
| not-ckd | 1.00 | 0.98 | 0.99 | 250 |
TABLE III
PERFORMANCE MATRIX OF RANDOM FOREST RESULTS WITH THE SIX BEST FEATURES.
| Class | Precision | Recall | F1-Score | Support/Instances |
|---|---|---|---|---|
| ckd | 1.00 | 1.00 | 1.00 | 250 |
| not-ckd | 1.00 | 1.00 | 1.00 | 250 |
Fig. 5. The feature importance chart obtained by using SHAP (globally).
Fig. 3. The feature importance chart obtained by using SKATER.
The SHAP plot displays the crucial features and also the features
that drive the target class of the algorithm. SHAP has been
applied both locally and globally in the current context. Fig.
4 depicts the feature importance for a single instance (local)
and Fig. 5 shows the same for the model output (global).
According to SHAP, the three features that contribute the
most in determining the model output are hemoglobin, specific
gravity, and packed cell volume.
Fig. 4. The feature importance chart obtained by using SHAP (locally).
The LIME plot in Fig. 6 displays the features in a sorted
order giving both the probability of the target class and the
probability of each feature contributing to the classification.
It also gives the ideal deciding range of each feature in the
trained model on a local level. According to LIME, specific
gravity, hemoglobin and albumin are the top three critical
features for classifying the selected instance.
Fig. 6. The feature importance chart obtained by using LIME (local).
C. Explainability Evaluation Results
The accuracy of the Decision Tree trained with the six best
features found through the floating Feature Selection method
and five-fold cross-validation was 98.8%. The accuracy of the
Random Forest model with the same parameters was 99.8%.
A fully interpretable Decision Tree with all features had an
accuracy of 99.8%. Therefore, according to the definition, our
model’s Interpretability is 75% (18/24) as we used only six
features, and the remaining 18 features did not bring any value
to the model’s output. The Fidelity of our model is 98.8% /
99.8%, which is 99% because we have a 1% loss in performance
to achieve the interpretability of the model. Lastly, the Fidelity-to-Interpretability ratio is 99/(99 + 75), which is approximately 56.9% and
close to the ideal value of 50%.
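The arithmetic in the paragraph above can be checked with a few lines of code. The function and parameter names are hypothetical; the inputs are the figures reported above (24 features in total, 6 used, 98.8% and 99.8% accuracy).

```python
def explainability_metrics(n_total, n_used, acc_reduced, acc_full):
    """Interpretability, Fidelity and FIR = F / (F + I), all as fractions."""
    interpretability = (n_total - n_used) / n_total  # share of masked features
    fidelity = acc_reduced / acc_full                # accuracy retained
    fir = fidelity / (fidelity + interpretability)
    return interpretability, fidelity, fir

# Figures reported in this section.
i_value, f_value, fir_value = explainability_metrics(24, 6, 0.988, 0.998)
print(f"I = {i_value:.2%}, F = {f_value:.2%}, FIR = {fir_value:.2%}")
```

Without rounding the Fidelity to 99% first, the ratio still comes out at roughly 0.569, so the rounding in the text does not change the conclusion.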
We compared our explainability model with the previous
study [1] as shown in Fig. 7. Our model’s Interpretability
was 75% compared to 67% of the other study. In the case
of Fidelity, our model has sacrificed only 1% accuracy to
add explainability compared to 3% previously. We achieved
a Fidelity-to-Interpretability ratio of about 57%, closer to the ideal value
of 50% than the FIR of 59% reported by Moreno [1].
Fig. 7. A comparison of the explainability results with the previous work [1].
VI. DISCUSSION AND CONTRIBUTIONS
The key objective of our study was to apply both local and
global explainability methods to different machine learning
methods for classifying chronic kidney disease patients and
compare the results among the methods. We experimented
with three local and global explainability methods (SHAP,
LIME, and SKATER), which
identified the key features and their contributions to the
model’s output. Except for a few differences, all three methods
identified the same features as the important ones for CKD
classification with Random Forest and Decision Tree models.
For global interpretability, both SKATER and SHAP
identified serum creatinine and red blood cell count
among the top three essential features for CKD classification
based on their value range. Hemoglobin, albumin, and packed
cell volume are among the other crucial features impacting the
overall classification output by the models.
We found that both SHAP and LIME identified
haemoglobin as a vital feature in local interpretability.
Low haemoglobin levels, low red blood cell count, and high
specific gravity values contribute to a patient being classified as having
CKD, and the reverse levels to a patient being classified as not having CKD.
Such information is beneficial in assessing each patient based
on the features and associated values and drawing practical
observations that might impact the global level findings.
While investigating the suitability of local and global explainability methods, our study shows that both techniques
would be helpful in the relevant contexts of CKD classification
and play a crucial role for the physicians in understanding the progression of kidney disease. Global explainability
metrics help to gain a general understanding of commonly
important features for CKD classification. Local instance-level
explainability would help the physicians to assess each patient
separately, compare with global feature importance and draw
valuable facts and information to determine the next course
of action for individual patients. Patients, in turn, benefit by
avoiding many clinical tests and expensive image scans to
diagnose kidney disease at an earlier stage.
According to an American Family Physician research article
[37], diabetes mellitus, hypertension, and older age are
the primary risk factors of CKD for which
patients should be screened for further investigation. Cardiovascular disease, family history of chronic kidney disease,
and ethnic and racial minority status have been identified
as other risk factors for CKD. In our study, SHAP global,
SKATER, and LIME also identified hypertension, diabetes
mellitus, and heart (coronary artery) disease as important
features contributing positively to a patient being classified as CKD.
However, the explainability methods identified other factors
like red blood cell count, haemoglobin, albumin, specific
gravity, and serum creatinine as influential too. The clinical
correlation and verification of these factors are potential
future tasks to advance our study further. The National Institute
of Diabetes and Digestive and Kidney Diseases (NIDDK)
[38], the Centers for Disease Control and Prevention (CDC) [39],
and Johns Hopkins Medicine [40] also identified diabetes
mellitus type 1 or type 2, blood pressure, hypertension, and
cardiovascular disease, along with a family history of kidney failure, as risk factors.
Additionally, obesity is a high risk factor to consider when evaluating patients
for CKD.
VII. LIMITATIONS AND FUTURE WORK
Our study has a few limitations. We acknowledge
that the dataset is small for conducting a significant critical
analysis. Other datasets were either proprietary or
their sources did not respond to our requests. We applied explainability to only the Random Forest algorithm in our study. In order
to have a generalized observation, we need to apply the same
methods to other black-box algorithms. We intend to test existing and new explainability metrics with other classification
algorithms like XGBoost, Adaboost, and ExtraTree to assess
the performances of future healthcare models.
REFERENCES
[1] P. Moreno, “An explainable classification model for chronic kidney
disease patients”, [Online]. Available: https://arxiv.org/pdf/2105.10368.
[Accessed: 12-Mar-2022].
[2] A. Levin, P. E. Stevens, R. W. Bilous, J. Coresh, A. L. De Francisco,
P. E. De Jong, C. G. Winearls, & Others, ”Kidney Disease: Improving
Global Outcomes (KDIGO) CKD Work Group. KDIGO 2012 clinical
practice guideline for the evaluation and management of chronic kidney
disease”, Kidney international supplements, vol. 3, no. 1, pp. 1–150,
2013.
[3] R. Saran et al., ”US renal data system 2016 annual data report:
Epidemiology of kidney disease in the United States”, American journal
of kidney diseases, vol. 69, no. 3, pp. A7–A8, 2017.
[4] B. Bikbov, G. Remuzzi, and N. Perico, “Disparities in chronic kidney
disease prevalence among males and females in 195 countries: Analysis
of the global burden of disease 2016 study,” Nephron,pp. 313–318, [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/29791905/. [Accessed:
12-Mar-2022].
[5] H. Wang et al., ‘Global, regional, and national life expectancy, all-cause
mortality, and cause-specific mortality for 249 causes of death, 1980–
2015: a systematic analysis for the Global Burden of Disease Study
2015’, The lancet, vol. 388, no. 10053, pp. 1459–1544, 2016.
[6] H. Wang, M. Naghavi, C. Allen, R. M. Barber, Z. A. Bhutta, and A.
Carter, “Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: A
systematic analysis for the global burden of disease study 2015,” The
Lancet, vol. 388, no. 10053, pp. 1459–1544, 2016.
[7] S. Ardhanari, M. Alpert, and K. Aggarwal, “Cardiovascular Disease in
Chronic Kidney Disease: Risk Factors, Pathogenesis, and Prevention,”
Advances in Peritoneal Dialysis, vol. 30, pp. 40–53, 2014. [Online]. Available: https://www.advancesinpd.com/adv14/40-53 Ardhanari.pdf. [Accessed: 13-Mar-2022].
[8] M. J. Sarnak, “Kidney disease as a risk factor for development of
cardiovascular disease,” Circulation, vol. 108, no. 17, pp. 2154–2169,
2003.
[9] R. Walker, M. Marshall, and N. Polaschek, “Improving self-management
in chronic kidney disease: A pilot study,” Renal Society of Australasia
Journal, vol. 9, pp. 116–125, Sep. 2013.
[10] I. U. Ekanayake and D. Herath, “Chronic kidney disease prediction
using machine learning methods,” 2020 Moratuwa Engineering Research
Conference (MERCon), pp. 260–265, 2020.
[11] S. Sossi Alaoui, B. Aksasse, and Y. Farhaoui, “Statistical and predictive
analytics of chronic kidney disease,” Advances in Intelligent Systems
and Computing, pp. 27–38, 2019.
[12] A. Ogunleye and Q.-G. Wang, “Enhanced XGBoost-based automatic diagnosis system for chronic kidney disease,” 2018 IEEE 14th International
Conference on Control and Automation (ICCA), pp. 2131–2140, Nov.
2020.
[13] S. Zeynu and S. Patil, “Survey on prediction of chronic kidney disease
using data mining classification techniques and feature selection,” International Journal of Pure and Applied Mathematics, vol. 118, no. 8, pp.
149–156, 2018.
[14] H. Alasker, S. Alharkan, W. Alharkan, A. Zaki, and L. S. Riza,
“Detection of kidney disease using various intelligent classifiers,” 2017
3rd International Conference on Science in Information Technology
(ICSITech), pp. 681–684, 2017.
[15] N. V. Ganapathi Raju, K. Prasanna Lakshmi, K. G. Praharshitha, and
C. Likhitha, “Prediction of chronic kidney disease (CKD) using Data
Science,” 2019 International Conference on Intelligent Computing and
Control Systems (ICCS), pp. 642–647, 2019.
[16] B. Khan, R. Naseem, F. Muhammad, G. Abbas, and S. Kim, “An
empirical evaluation of machine learning techniques for chronic kidney
disease prophecy,” IEEE Access, vol. 8, pp. 55012–55022, 2020.
[17] K. M. Zubair Hasan and M. Zahid Hasan, “Performance evaluation of
ensemble-based machine learning techniques for prediction of chronic
kidney disease,” Emerging Research in Computing, Information, Communication and Applications, pp. 415–426, 2019.
[18] A. A. Abdullah, S. A. Hafidz, and W. Khairunizam, “Performance
comparison of machine learning algorithms for classification of chronic
kidney disease (CKD),” Journal of Physics: Conference Series, vol.
1529, no. 5, p. 052077, 2020.
[19] L. Jerlin Rubini and E. Perumal, “Efficient classification of chronic
kidney disease by using multi-kernel support vector machine and fruit
fly optimization algorithm,” International Journal of Imaging Systems
and Technology, vol. 30, no. 3, pp. 660–673, 2020.
[20] M. Yousef, “Prediction of chronic kidney disease using different Classification Algorithms: A Comparative Study.” [Online]. Available: https://www.xisdxjxsu.asia/V17I1039.pdf. [Accessed: 13-Mar-2022].
[21] U. N. Dulhare and M. Ayesha, “Extraction of action rules for chronic
kidney disease using naïve Bayes classifier,” 2016 IEEE International
Conference on Computational Intelligence and Computing Research
(ICCIC), pp. 1–5, 2016.
[22] R. Pujari and V. Hajare, “Analysis of ultrasound images for identification
of chronic kidney disease stages,” 2014 First International Conference
on Networks & Soft Computing (ICNSC2014), pp. 380–383, 2014.
[23] P. Sinha and P. Sinha, “Comparative study of chronic kidney disease
prediction using KNN and SVM,” International Journal of Engineering
Research and, vol. V4, no. 12, 2015.
[24] A. Sobrinho, A. C. Queiroz, L. Dias Da Silva, E. De Barros Costa,
M. Eliete Pinheiro, and A. Perkusich, “Computer-aided diagnosis of
chronic kidney disease in developing countries: A comparative analysis
of machine learning techniques,” IEEE Access, vol. 8, pp. 25407–25419,
2020.
[25] J. Neves, M. R. Martins, and J. Vilhena, “A soft computing approach to
Kidney Diseases Evaluation,” Journal of Medical Systems, vol. 39, no.
10, 2015.
[26] H. Polat, H. Danaei Mehr, and A. Cetin, “Diagnosis of chronic kidney
disease based on support Vector Machine by feature selection methods,”
Journal of Medical Systems, vol. 41, no. 4, 2017.
[27] J. Qin, L. Chen, Y. Liu, C. Liu, C. Feng, and B. Chen, “A machine
learning methodology for diagnosing chronic kidney disease,” IEEE
Access, vol. 8, pp. 20991–21002, 2020.
[28] M. Vijayarani and S. Dhayanand, “Kidney Disease Prediction using SVM
and ANN algorithms,” International Journal of Computing and Business
Research, vol. 6, no. 2, pp. 2229 - 6166, 2015.
[29] D. Dua and C. Graff, UCI Machine Learning Repository, University of
California, Irvine, School of Information and Computer Sciences (2017).
[Online]. Available: https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease.
[Accessed: 2-Feb-2022].
[30] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification
and regression trees. Routledge, 2017.
[31] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1,
no. 1, pp. 81–106, 1986.
[32] T. K. Ho, “Random decision forests,” Proceedings of 3rd International
Conference on Document Analysis and Recognition, no. 1, pp. 278–282,
1995.
[33] L. Breiman, “Random Forests,” Machine Learning, vol. 45, pp. 5–32,
2001.
[34] Skater Documentation, “Model Interpretation with Skater:
Overview.” [Online]. Available:
https://oracle.github.io/Skater/overview.html. [Accessed: 12-Apr-2022].
[35] S. M. Lundberg and S. I. Lee, “A Unified Approach to
Interpreting Model Predictions,” Advances in Neural Information
Processing Systems, vol. 30, pp. 4765–4774, 2017. [Online]. Available:
http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf. [Accessed: 12-Apr-2022].
[36] K. Peng and T. Menzies, “Documenting evidence of a reuse of ‘“why
should I trust you?”: explaining the predictions of any classifier,’” in
Proceedings of the 29th ACM Joint Meeting on European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering, 2021, pp. 1600–1600.
[37] M. Baumgarten and T. Gehr, “Chronic kidney disease: Detection and
evaluation,” American Family Physician, 15-Nov-2011. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/22085668/. [Accessed: 22-May-2022].
[38] “Identify & evaluate patients with chronic kidney disease,”
National Institute of Diabetes and Digestive and Kidney
Diseases. [Online]. Available: https://www.niddk.nih.gov/health-information/professionals/clinical-tools-patient-management/kidney-disease/identify-manage-patients/evaluate-ckd. [Accessed: 23-May-2022].
[39] “Chronic kidney disease basics,” Centers for Disease
Control and Prevention, 28-Feb-2022. [Online]. Available:
https://www.cdc.gov/kidneydisease/basics.html. [Accessed: 23-May-2022].
[40] “Chronic kidney disease,” Johns Hopkins Medicine, 08-Aug-2021.
[Online]. Available: https://www.hopkinsmedicine.org/health/conditions-and-diseases/chronic-kidney-disease. [Accessed: 23-May-2022].