A Comparative Study of Chronic Kidney Disease
Prediction Using Machine Learning Techniques
Rehnuma Ferdous
#ID: 19103008
A Thesis in the Partial Fulfillment of the Requirements
for the Award of Bachelor of Computer Science and Engineering (BCSE)
Department of Computer Science and Engineering
College of Engineering and Technology
IUBAT – International University of Business Agriculture and Technology
Summer 2022
A Comparative Study of Chronic Kidney Disease
Prediction Using Machine Learning Techniques
Rehnuma Ferdous
#ID: 19103008
A Thesis in the Partial Fulfillment of the Requirements for the Award of Bachelor of
Computer Science and Engineering (BCSE)
The thesis has been examined and approved,
_____________________________
Prof. Dr. Utpal Kanti Das
Professor and Chairman
_____________________________
Dr. Hasibur Rashid Chayon
Associate Professor and Coordinator
_____________________________
Arifa Tur Rahman
Assistant Professor and Supervisor
Department of Computer Science and Engineering
College of Engineering and Technology
IUBAT – International University of Business Agriculture and Technology
Summer 2022
Letter of Transmittal
14 August 2022
The Chair
Thesis Defense Committee
Department of Computer Science and Engineering
IUBAT–International University of Business Agriculture and Technology
4 Embankment Drive Road, Sector 10, Uttara Model Town
Dhaka 1230, Bangladesh
Subject: Letter of Transmittal.
Dear Sir,
We are pleased to present our thesis report titled ‘A Comparative Study of Chronic
Kidney Disease Prediction Using Machine Learning Techniques’, as required by IUBAT for the
partial fulfillment of the requirements for the award of Bachelor of Computer Science and
Engineering. Working on this project was a valuable opportunity for us to put our
theoretical knowledge into practice.
Finally, we would like to express our gratitude to you for giving us this opportunity to pursue
our studies at your renowned university.
Yours sincerely,
_____________
Rehnuma Ferdous
19103008
Student’s Declaration
We hereby declare that this thesis report titled ‘A Comparative Study of Chronic
Kidney Disease Prediction Using Machine Learning Techniques’ is our original work. It has
never been presented previously or concurrently for any other purpose, reward, or degree at
IUBAT or any other institution, either by us or by any other student. We also declare that there
is no plagiarism or data falsification, and that the materials used in this report from various
sources have been duly cited.
_____________
Rehnuma Ferdous
19103008
Supervisor’s Certification
This is to certify that the thesis report on “A Comparative Study of Chronic Kidney
Disease Prediction Using Machine Learning Techniques” has been carried out by Rehnuma
Ferdous (bearing ID# 19103008), a student of the Department of Computer Science and Engineering
of IUBAT – International University of Business Agriculture and Technology, as a partial
fulfillment of the requirements for the degree of Bachelor of Computer Science and Engineering.
The report has been prepared under my guidance and is a record of work carried out
successfully. They are now permitted to submit the report. I wish them success in their future
endeavors.
_______________________________
Arifa Tur Rahman
Supervisor and Assistant Professor
Department of Computer Science and Engineering
IUBAT–International University of Business Agriculture and Technology
Abstract
Chronic Kidney Disease (CKD), also known as chronic renal disease, has emerged
as a serious public health issue with a steadily increasing incidence. A person can survive
without functioning kidneys for only about 18 days on average, which creates a heavy demand
for kidney transplantation and dialysis. Effective approaches for the early detection, diagnosis,
and prognosis of CKD are therefore essential, and machine learning techniques are useful for
prognosis in a variety of settings. This work proposes an approach for predicting CKD status
from clinical data that incorporates data preprocessing, a missing-value handling method based
on collaborative filtering, and attribute selection. Of the 9 machine learning techniques
considered, the extra trees classifier and the random forest classifier are shown to produce the
highest accuracy and the least bias toward individual attributes. The study also considers the
practical problems of data collection and emphasizes the need to apply domain knowledge when
using machine learning to predict CKD status.
Index Terms - Machine learning, classification algorithms, chronic kidney disease,
chronic renal disease
Acknowledgments
We take this opportunity to express our sincere gratitude to our research supervisor,
Arifa Tur Rahman, Department of CSE, IUBAT- International University of Business
Agriculture and Technology. Our supervisor has a strong background in, and a genuine interest
in, "Machine Learning". This project was made possible by the supervisor's never-ending
patience, academic leadership, strong motivation, persistent encouragement, constant and
energetic supervision, constructive criticism, and invaluable counsel, and by the supervisor's
careful reading and correction of our many drafts at all stages. We would like to express our
heartiest gratitude to the Almighty Allah and also to Prof. Dr. Utpal Kanti Das, Chairman of the
Department of Computer Science and Engineering, Dr. Hasibur Rashid Chayon, Coordinator of
the Department of Computer Science & Engineering, and the other faculty members and staff of
the Department of CSE of IUBAT- International University of Business Agriculture and
Technology for helping us complete our research.
We would also like to thank all of our classmates at the International University of
Business, Agriculture, and Technology (IUBAT) who took part in discussions with us while
attending their own classes. Finally, we respectfully acknowledge our parents' unwavering
support and patience.
Table of Contents
Letter of Transmittal
Student’s Declaration
Supervisor’s Certification
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Literature Review
2.1 Chronic Kidney Disease (F. E. Murtagh et al.)
2.2 Five stages of CKD (C. A. Johnson et al.)
2.3 Little's MCAR method (S. Nair et al.)
2.4 Characteristics to predict CKD (P. Yildirim et al.)
2.5 WEKA data mining tool (D. Dua et al.)
Chapter 3. Research Methodology
3.1 Data Preprocessing: Missing Value Handling
3.2 Data Preprocessing: Feature Selection
3.3 Model Training
3.4 Model Evaluation and Selection
Chapter 4. Result and Discussion
4.1 Algorithm feature significance standard deviation
Chapter 5. Conclusion
References
List of Figures
Figure 3.1 Proposed workflow
Figure 3.2 Heat map of attributes' relationships with the class variable
Figure 3.3 Albumin over specific gravity distribution
Figure 3.4 The ratio of serum creatinine to hemoglobin
Figure 3.5 Diabetes mellitus prevalence compared to high blood pressure
Figure 3.6 The distribution of appetite
Figure 3.7 Feature importance of each trained model
List of Tables
Table 3.1 TESTS FOR MEASURING MULTIPLE ATTRIBUTES AND MISSING VALUE PERCENTAGE
Table 3.2 LITTLE'S MCAR TEST RESULT
Table 3.3 PERCENTAGE CHANGE OF STATISTICS OF ATTRIBUTES AFTER FILLING MISSING VALUES
Table 3.4 ACCURACIES OF EACH ALGORITHM
Table 3.5 PRECISION, RECALL AND F1-SCORE OF EACH ALGORITHM
Table 3.6 FEATURE IMPORTANCE OF EACH ALGORITHM
Table 3.7 ALGORITHM IMPORTANCE STANDARD DEVIATION OF FEATURE
Chapter 1. Introduction
The kidneys are two bean-shaped organs, each roughly the size of a fist. They are
located on either side of the spine, just below the rib cage. The kidneys filter 120 to 150 quarts
of blood each day to produce 1 to 2 quarts of urine. Their primary function is to remove excess
fluid and waste from the body through urine. Urine is formed through a series of highly
complex excretion and reabsorption processes, which the body needs to maintain a stable
chemical balance. The kidneys regulate the body's salt, potassium, and acid levels and also
produce hormones that affect how other organs function. Among other things, these hormones
stimulate the production of red blood cells, help regulate blood pressure, and control calcium
metabolism.
Chronic kidney disease (CKD) is currently regarded as one of the most serious threats
to public health. CKD can be diagnosed with laboratory tests, and treatments are available to
stop or slow its progression, reduce the complications of a lower GFR and the risk of
cardiovascular disease, and improve survival and quality of life. Inadequate water intake,
smoking, a poor diet, insufficient sleep, and a number of other factors can contribute to CKD.
Around the world, 753 million people were affected by this illness in 2016, including 417 million
females and 336 million males. Most often, the condition is discovered in its later stages, when it
can already lead to renal failure. (“Kidney Disease: The Basics,” Aug. 2014)
Chronic kidney disease, a major problem worldwide, is characterized by a gradual
decline in kidney function over time. About 14% of people worldwide have CKD. More than two
million people worldwide depend on dialysis or a kidney transplant to stay alive, yet this number
may represent only 10% of those who need such treatment to survive. More people die from
chronic kidney disease than from breast or prostate cancer.
The stage of CKD is generally determined by the estimated glomerular filtration rate (eGFR),
which is calculated from the serum creatinine level, sex, race, and age. CKD is divided into five
stages. Kidney function is normal in stage 1 and mildly impaired in stage 2; however, in the vast
majority of cases, the disease is at stage 3. (F. E. Murtagh, vol. 40, no. 3, pp. 342–352, 2010)
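As a rough illustration of how an eGFR value maps to a CKD stage, the sketch below uses the GFR categories of the KDIGO guideline (G1 to G5, with stage 3 split into G3a and G3b). These thresholds are an assumption, because the thesis does not state which staging table it follows. In clinical practice, stages 1 and 2 also require other evidence of kidney damage, such as albuminuria. The function name `ckd_stage` is hypothetical.

```python
def ckd_stage(egfr: float) -> str:
    """Map an eGFR value (mL/min/1.73 m^2) to a KDIGO GFR category (assumed thresholds)."""
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    if egfr >= 90:
        return "G1"   # normal or high
    if egfr >= 60:
        return "G2"   # mildly decreased
    if egfr >= 45:
        return "G3a"  # mildly to moderately decreased
    if egfr >= 30:
        return "G3b"  # moderately to severely decreased
    if egfr >= 15:
        return "G4"   # severely decreased
    return "G5"       # kidney failure


for value in (95, 72, 50, 35, 20, 10):
    print(value, ckd_stage(value))
```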
Patients with untreated CKD may also develop the following signs and symptoms: high
blood pressure, anemia, fatigue, poor nutrition, nerve damage, and a reduced immune response.
At more advanced stages, harmful concentrations of fluids, electrolytes, and wastes can
accumulate in the blood and body. It is therefore crucial to identify CKD as early as possible.
However, this can be challenging, because the signs and symptoms appear gradually, do not
always indicate CKD, and are absent altogether in some people. Machine learning can therefore
be effective in determining whether or not a patient has CKD, by training a predictive model on
historical CKD patient data. The most reliable test for determining kidney condition and the
stage of chronic kidney disease is the glomerular filtration rate (GFR). It can be estimated from
blood creatinine, age, race, sex, and other factors. The earlier the disease is detected, the greater
the chance of slowing or halting its progression. (D. Dua, 2017)
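The sketch below shows one way eGFR can be estimated from serum creatinine, age, sex, and race, using the 2009 CKD-EPI creatinine equation. This equation is a swapped-in example, not necessarily the one used by the sources cited here. A 2021 refit of CKD-EPI drops the race coefficient, and laboratories may use other equations. The function and parameter names are hypothetical.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9       # sex-specific creatinine knot
    alpha = -0.329 if female else -0.411  # exponent below the knot
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Example: 50-year-old non-Black man with serum creatinine 1.0 mg/dL
print(round(egfr_ckd_epi_2009(1.0, 50, female=False, black=False), 1))
```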
Machine learning can be used to predict both CKD status and CKD stage. Machine
learning is a major branch of artificial intelligence that applies classification and regression
algorithms to infer future outcomes from past data. Machine learning methods for CKD
prediction have been studied on a variety of datasets, among which the UCI repository dataset
is a widely used benchmark. As in the majority of similar studies, this benchmark dataset is used
in this analysis.
For that reason, this research discusses the problems of handling missing values in
CKD data, offers a new method for handling missing values, and compares novel solutions using
the UCI dataset. When making predictions from clinical data related to CKD, this work
emphasizes the need for statistical analysis as well as domain awareness of the factors involved.
The present method of diagnosis relies on urine analysis and the measurement of serum
creatinine concentration. A variety of clinical procedures, including screening and
ultrasonography, are employed for this purpose. People at risk are screened, including those with
hypertension, a history of cardiovascular disease, or a relevant medical history, and people with
relatives who have had kidney disease. The blood creatinine level is used to compute the
estimated GFR, and the urine albumin-to-creatinine ratio (ACR), measured on a first-morning
urine sample, is used to assess kidney damage. This study focuses on machine learning
techniques such as ACO and SVM to improve prediction accuracy by reducing the number of
features and choosing the right attributes.
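The urine albumin-to-creatinine ratio mentioned above can be computed from spot-urine concentrations. The sketch below assumes that both concentrations are reported in mg/dL. It classifies the result with the KDIGO albuminuria categories: A1 below 30 mg/g, A2 from 30 to 300 mg/g, and A3 above 300 mg/g. The unit convention and the function names are assumptions for illustration.

```python
def acr_mg_per_g(urine_albumin_mg_dl: float, urine_creatinine_mg_dl: float) -> float:
    """Urine albumin-to-creatinine ratio in mg/g from spot-urine concentrations in mg/dL."""
    if urine_creatinine_mg_dl <= 0:
        raise ValueError("urine creatinine must be positive")
    # mg albumin per mg creatinine, scaled to mg albumin per g creatinine
    return urine_albumin_mg_dl / urine_creatinine_mg_dl * 1000.0


def albuminuria_category(acr: float) -> str:
    """KDIGO albuminuria category for an ACR in mg/g (assumed cut-offs)."""
    if acr < 30:
        return "A1"  # normal to mildly increased
    if acr <= 300:
        return "A2"  # moderately increased
    return "A3"      # severely increased


acr = acr_mg_per_g(6.0, 40.0)
print(round(acr, 1), albuminuria_category(acr))
```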
Chapter 2. Literature Review
A. J. Hussain and colleagues were able to predict CKD in its early stages with an
accuracy of 0.995 using a multilayer perceptron, after preprocessing the dataset and using neural
networks to fill in the missing information. Outliers were eliminated, statistical analysis was
performed to identify the top seven attributes, and principal component analysis (PCA) was used
to omit the attributes with the strongest inter-correlation. The accuracy of the trained models in
that study is significantly affected by the missing-value imputation approach. However, the
accuracy of missing-value prediction was limited, since only 260 complete data instances were
used with the neural network for 20 attributes. Eliminating attributes with more than 20% of
their values missing had a significant impact on the accuracy of imputing such values.
Classifying attributes by their source, such as blood tests or urine tests, made it easier to select
attributes from each category for the training model. (A. J. Aljaaf, 2018, pp. 1–9.)
For the five stages of CKD, a method was proposed that predicted the stage with a best
per-stage accuracy of 0.997 and an average accuracy of 0.967. This method also computes the
eGFR from the aforementioned dataset with additional sex and race variables, while eliminating
cases with missing values. (C. A. Johnson, vol. 70, no. 5, pp. 869–876, 2004)
In that work, missing data are replaced with constants, which substantially lowers the
model's precision. In our study, however, Little's MCAR test (see the Methodology section) is
used to assess the randomness of the attributes with missing values. Furthermore, the feature
importance in that work is skewed toward serum creatinine: the combined importance of all
other parameters may not exceed that of serum creatinine alone. In the early stages of CKD,
however, serum creatinine can remain at normal levels, which makes a model that relies so
heavily on it less helpful for early disease prediction. Because domain knowledge is not
incorporated, the accuracy of the trained models in predicting new cases outside the dataset is
questionable. (S. Nair, vol. 37, no. 2, pp. 483–487, Feb. 2014)
In 2017, a team of researchers used a multiclass decision forest to predict CKD with
0.991 accuracy using 14 features. They also trained a neural network and a logistic regression
model, which produced average accuracies of 0.975 and 0.960, respectively. Cases with missing
values were excluded. The correlations of the chosen attributes range from 0.2 to 0.8. Clinically,
CKD can cause hypertension and hypertension can cause CKD, and specific gravity has a
correlation of 0.73 with the class, so eliminating these attributes may reduce accuracy.
(P. Yildirim, vol. 02, pp. 193–198, 2017)
In 2015, Lambodar J. and Narendra Ku. K. compared eight machine learning models
using the WEKA data mining tool. The Naive Bayes, Multilayer Perceptron, and J48 algorithms
exhibited the best receiver operating characteristic (ROC) and accuracy, with ROC values of 1
and accuracies of 0.950, 0.9975, and 0.99, respectively. The strength of agreement was measured
using the Kappa statistic. The multilayer perceptron received the highest score of 0.9947, and the
decision table and J48 algorithms received the lowest score of 0.9786. (D. Dua, 2017.)
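The Kappa statistic used above measures agreement between predicted and actual labels beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement (the accuracy), and p_e is the agreement expected from the label frequencies. Below is a minimal plain-Python sketch with made-up labels; scikit-learn's `cohen_kappa_score` computes the same quantity.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement from the marginal frequency of each label
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]  # toy labels: 1 = CKD, 0 = not CKD
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(cohen_kappa(y_true, y_pred))
```

With these labels, the accuracy is 0.8 but kappa is lower, because half of that agreement is expected by chance from the balanced classes.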
In light of previous research based on the UCI CKD dataset, it was found that many of
the lower-accuracy results are due to inadequate handling of missing values and to the attribute
selection mechanism.
Deep learning algorithms have become well-known methods for characterizing
patients, simulating the progression of disease (Choi et al. 2016a; Ma et al. 2017; Choi et al.
2016c; Lipton et al. 2016), and creating synthetic EHR data for research purposes (Choi et al.
2017b).
Predicting disease outcomes is the most common application of disease-progression
modeling. When learning disease trajectories from scratch, deep neural networks have
relatively limited capability; therefore, it is sometimes necessary to incorporate existing clinical
knowledge (Ma et al. 2018; Pham et al. 2017) or to complement EHR data with the naturally
hierarchical structure of medical ontologies (Choi et al. 2017a). Modeling EHR data also involves
imputing missing values. As demonstrated by Che et al. (2018), RNNs can exploit long-term
dependencies in time series to improve prediction performance.
Deep learning models need a lot of data to produce high-quality results, typically more
than most healthcare institutions have. Combining EHR data from many sources is a
straightforward option, but data harmonization is a time-consuming procedure. To avoid
site-specific data harmonization for deep learning models, Rajkomar et al. (2018) recently
proposed a representation of EHRs based on the Fast Healthcare Interoperability Resources
(FHIR) format. Data augmentation approaches can help training when working with an
imbalanced dataset and insufficient high-quality samples. To generate candidate positive and
negative samples for the identification of rare diseases, CONAN (Cui et al. 2020) incorporates
generative adversarial networks (GANs). Pre-training and transfer learning can also help to
address this issue (Bengio 2012; Dauphin et al. 2012). G-BERT (Shang et al. 2019) employed
pretrained hospital visit representations for downstream prediction tasks. (Rios and Kavuluru
019) trained a CNN on a large biomedical abstract library and used the acquired knowledge to
predict diagnosis codes for one medical facility.
Representation learning algorithms in the healthcare sector usually borrow from
natural language processing (NLP). The commonly accepted approach is to apply Word2Vec
methods (Mikolov et al. 2013a) to learn embeddings after discrete clinical concepts (such as
clinical codes) are encoded as one-hot vectors (Bengio, Courville, and Vincent 2013).
For instance, Med2Vec (Choi et al. 2016b) learned intra-visit medical code co-occurrences as
well as inter-visit sequential information using skip-gram (Mikolov et al. 2013b). The standard
skip-gram model is predicated on the idea that words can play different roles at different
positions within a sentence. Because clinical codes within a visit are unordered, this assumption
does not hold. The order in which clinical concepts occur is therefore often ignored when NLP
algorithms are used to represent them; instead, the algorithms are applied along the feature
dimension rather than the temporal dimension. To learn a multilevel embedding of EHR data,
MiME (Choi et al. 2018) took advantage of the inherent structure of clinical codes, but this model
required the EHR data to include complete structural information linking diagnoses and
treatments. GCT (Choi et al. 2020) was recently proposed as a solution to this issue. By learning
the graphical structure of EHR data during training, GCT demonstrated that the Transformer is a
suitable model for exploring such structure. GCT's use of the Transformer to encode clinic visits
served as the inspiration for our work.
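The sketch below shows why word order does not carry over to clinical codes. It builds skip-gram style (center, context) training pairs within each visit and treats the visit as an unordered set, so every code is paired with every other code in the same visit. This follows the spirit of Med2Vec's intra-visit objective but is not its actual implementation. The ICD-10 codes are illustrative.

```python
from itertools import permutations


def intra_visit_pairs(visits):
    """Build (center, context) pairs inside each visit, treating a visit as a set."""
    pairs = []
    for visit in visits:
        codes = sorted(set(visit))  # order inside a visit carries no meaning
        pairs.extend(permutations(codes, 2))
    return pairs


# Illustrative ICD-10 codes: CKD stage 3/4, hypertension, type 2 diabetes
visits = [["N18.3", "I10", "E11.9"], ["I10", "N18.4"]]
pairs = intra_visit_pairs(visits)
print(len(pairs), pairs[:3])
```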
Ge et al. (2019) predicted Parkinson's disease severity using a deep neural network on
UCI's Parkinson's telemonitoring voice dataset. The study comprised biomedical voice
measurements from 42 patients with Parkinson's disease (PD). Severity prediction based on the
total Unified Parkinson's Disease Rating Scale (UPDRS) achieved accuracy scores of 94.4422% and
62.7335% for the training and test datasets, respectively. A second PD severity prediction based on
the complete UPDRS achieved 83.367% and 81.6657% for the training and test datasets,
respectively.
Ayon and Islam (2019) proposed a method for diagnosing diabetes using a DNN on the
Pima Indian Diabetes (PID) dataset from the UCI machine learning repository. With five-fold
cross-validation, it achieved an accuracy of 98.35%, an F1 score of 98%, and an MCC of 97%.
With ten-fold cross-validation, it obtained an accuracy of 97.11%, a sensitivity of 96.25%, and a
specificity of 98.80%, indicating that five-fold cross-validation gave better performance.
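Five-fold and ten-fold cross-validation, as compared in the study above, differ only in the number of folds into which the data are split. Below is a minimal sketch with scikit-learn on a synthetic dataset. The dataset, the logistic regression model, and the random seeds are assumptions for illustration and do not reproduce the cited study's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data: 400 samples, 12 features, binary label
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
model = LogisticRegression(max_iter=1000)

results = {}
for k in (5, 10):
    # Stratified folds keep the class ratio similar in every fold
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    results[k] = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{k}-fold accuracy: mean {results[k].mean():.3f}, std {results[k].std():.3f}")
```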
Shafi et al. (2020) proposed a machine learning-based solution to help prevent cleft in the
mother's womb. They applied a deep learning method and four other methods to samples from
1,000 pregnant women at three different hospitals in Lahore, Punjab. The authors carried out data
cleaning, scaling, and feature selection, and compared the accuracy of all the algorithms: Random
Forest (RF), 85.77%; Decision Tree (DT), 88.14%; K-Nearest Neighbor (KNN), 89.72%; Support
Vector Machine (SVM), 90.69%; and Multilayer Perceptron (MLP), a deep neural network,
92.6%. They concluded that the MLP yields the highest accuracy.
Sharma and Parmar (2020) proposed a DNN model for heart disease prediction on the
UCI heart disease dataset and compared six distinct classifiers: KNN, logistic regression, SVM,
NB, RF, and a DNN with Talos optimization. Their work reported accuracies of 90.16% for KNN,
82.5% for logistic regression, 81.97% for SVM, 85.25% for NB, and 90.78% for the DNN with
Talos optimization.
Ahmed and Alshebly (2019) applied two machine learning algorithms, an artificial
neural network (ANN) and logistic regression (LR), to a problem in medical diagnosis. They
analyzed the prediction efficiency of both on 153 cases with eleven attributes of CKD patients.
The ANN classifier performed better than the LR model, with an accuracy of 84.44%, a sensitivity
of 84.21%, a specificity of 84.61%, and an area under the curve (AUC) of 84.41%. They
concluded that creatinine and urea are the most important factors with a clear impact on
chronic kidney disease patients.
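Accuracy, sensitivity, and specificity, as reported above, are all derived from the confusion matrix of a binary classifier. The helper below is a hypothetical sketch that computes them from label lists. The labels are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # toy labels: 1 = CKD
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```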
Chapter 3. Research Methodology
Three crucial components make up the proposed methodology: data preparation, model
training, and model selection (Fig. 1).
Fig. 1. Proposed workflow
A. Dataipreprocessing:
Missing Value Handling: Data preprocessing in this work was done in two parts.
The first step was to filter out the attributes for which more than 20% of the records lacked
values (see Table I). As a result, the study does not include the following features: red blood
cells, sodium, potassium, white blood cell count, and red blood cell count. The missing values in
the remaining data were addressed in the second stage of data preparation.
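The first preprocessing step, dropping attributes with more than 20% missing values, can be sketched with pandas as below. The column names (`hemo`, `sod`, `sc`) are assumed abbreviations in the style of the UCI CKD dataset, and the values are invented. Attributes missing at exactly 20% would be kept, which matches the "more than 20%" rule.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.20):
    """Drop columns whose fraction of missing values exceeds `threshold`."""
    missing_fraction = df.isna().mean()  # per-column share of NaN values
    keep = missing_fraction[missing_fraction <= threshold].index
    return df[keep], missing_fraction


df = pd.DataFrame({
    "hemo": [15.4, 11.3, np.nan, 9.6, 12.0, 13.1],      # 1/6 missing -> kept
    "sod": [np.nan, np.nan, 137.0, np.nan, 135.0, 140.0],  # 3/6 missing -> dropped
    "sc": [1.2, 0.8, 1.8, 3.8, 2.1, 1.0],              # complete -> kept
})
reduced, missing_fraction = drop_sparse_columns(df)
print(missing_fraction.round(3))
print(list(reduced.columns))
```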
TABLE I
TESTS FOR MEASURING MULTIPLE ATTRIBUTES AND MISSING VALUE PERCENTAGE
Attribute                  Missing percentage   Test to obtain
class                      0.01%                -
appetite                   0.27%                doctor inspection
pedal edema                0.24%                doctor inspection
anemia                     0.26%                FBC
hypertension               0.56%                doctor inspection
diabetes mellitus          0.54%                FBC
coronary artery disease    0.52%                doctor inspection
pus cell clumps            0.99%                UFR
bacteria                   0.99%                UFR
age                        2.35%                doctor inspection
blood pressure             3.03%                doctor inspection
serum creatinine           4.35%                serum creatinine
blood urea                 4.76%                blood urea
blood glucose random       11.10%               RBS
albumin                    11.60%               UFR
specific gravity           11.85%               UFR
sugar                      12.35%               UFR
hemoglobin                 13.10%               FBC
pus cell                   16.35%               UFR
packed cell volume         18.60%               FBC
sodium                     22.85%               serum electrolytes
potassium                  21.01%               serum electrolytes
white blood cell count     27.28%               FBC
red blood cell count       37.56%               FBC
red blood cells            38.50%               UFR
FBC = full blood count; UFR = urine full report; RBS = random blood sugar.
TABLE II
LITTLE'S MCAR TEST RESULT
name
value
Chi. square
3170.482
value
1
degreeiofifreedom
2171 P
missingipatterns
108
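Little's test compares its statistic with a chi-square distribution that has the reported degrees of freedom. The p-value is the upper-tail probability of that distribution, and the null hypothesis of the test is that the data are MCAR. Below is a minimal sketch using SciPy and the chi-square statistic and degrees of freedom from Table II.

```python
from scipy.stats import chi2

statistic = 3170.482  # Little's chi-square statistic (Table II)
dof = 2171            # degrees of freedom (Table II)

# Upper-tail probability of the chi-square distribution
p_value = chi2.sf(statistic, dof)
print(f"p-value = {p_value:.3e}")
```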
To achieve realistic accuracy, missing values must be treated according to their
distributions in the preprocessing stage. In this study, a Missing Completely at Random (MCAR)
test was run to check the randomness of the missing values. The mechanism that causes data to be
missing determines the possible bias resulting from the missing data and the analytical
techniques that should be used to fill in the gaps. (J. C. Jakobsen, vol. 17, no. 1, p. 162, 2017)
Little's MCAR chi-square test for multivariate quantitative data is applied here. It determines
whether there is a significant difference between the means of the different missing-value
patterns. The results of Little's MCAR test, shown in Table II, led to the conclusion that the
missing values were random, based on a 'p' value of zero. Because there are more positive CKD
cases than negative CKD cases, substituting missing values with a constant may also reduce
accuracy and bias the prediction technique. Some related works can be used to identify this
scenario. (W. Gunarathne, 2017, pp. 291–296), (S. Vijayarani, S. Dhayanand et al., vol. 4, no. 4,
pp. 13–25, 2015). Considering the drawback of the approach in (A. J. Aljaaf, D. Al-Jumeily, 2018,
pp. 1–9.), the K-Nearest Neighbor (KNN) Imputer technique was employed in this study to fill
in the missing values, as indicated in the related work. The number of neighbors (an algorithm
hyperparameter) was set equal to the number of complete instances, which resulted in the
smallest change in the mean and standard deviation. With this setting, the method was able to
preserve the dataset's original distribution.
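Below is a minimal sketch of the imputation step with scikit-learn's `KNNImputer`. It sets `n_neighbors` to the number of complete rows, as described above, and reports the percentage change in mean and standard deviation in the same way as Table III. The use of scikit-learn's implementation is an assumption, since the text does not name a library, and the toy columns and values are hypothetical. The features are not scaled here, although KNN distances are sensitive to scale.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def knn_impute(df: pd.DataFrame):
    """Fill missing values with KNNImputer, using as many neighbors as complete rows."""
    n_complete = int(df.notna().all(axis=1).sum())
    imputer = KNNImputer(n_neighbors=max(n_complete, 1))
    filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)

    # Percentage change of mean and standard deviation caused by imputation
    before = df.agg(["mean", "std"])  # NaN values are skipped
    after = filled.agg(["mean", "std"])
    pct_change = (after - before) / before * 100
    return filled, pct_change


df = pd.DataFrame({
    "hemo": [15.4, 11.3, np.nan, 9.6, 12.0, 13.1],
    "sg": [1.020, 1.010, 1.015, np.nan, 1.025, 1.020],
    "sc": [1.2, 0.8, 1.8, 3.8, np.nan, 1.0],
})
filled, pct_change = knn_impute(df)
print(pct_change.round(2))
```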
B. Data preprocessing:
Feature Selection: The absolute values in the heat map of the correlations between the
attributes and the class label (Fig. 2) reveal that the attributes most strongly correlated with the
class (absolute correlation greater than 0.5) are hemoglobin, specific gravity, albumin,
hypertension, and diabetes mellitus.
The secondary attributes, with correlations greater than 0.3, are pus cell, blood glucose
random, appetite, blood urea, pedal edema, sugar, anemia, and serum creatinine.
Fig. 2. Heat map of attributes' relationships with the class variable.
Albumin, hemoglobin, hypertension, diabetes mellitus, blood glucose random, and serum
creatinine were then selected as the best subset of attributes to predict CKD, after taking into
account the distribution of the attribute values and the clinical perspective on the attributes,
including specific gravity. A detailed explanation of how the chosen attributes were determined
is given below.
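The correlation screening described above can be sketched as follows. The code computes the absolute Pearson correlation of each attribute with the class label and then splits the attributes at the 0.5 and 0.3 thresholds. The synthetic data and column names are assumptions, and categorical attributes would first need numeric encoding.

```python
import numpy as np
import pandas as pd


def correlation_screen(df, target="class", strong=0.5, moderate=0.3):
    """Rank attributes by |Pearson r| with the target and split them by threshold."""
    corr = df.corr()[target].drop(target).abs().sort_values(ascending=False)
    primary = corr[corr > strong].index.tolist()
    secondary = corr[(corr > moderate) & (corr <= strong)].index.tolist()
    return corr, primary, secondary


rng = np.random.default_rng(0)
n = 2000
label = rng.integers(0, 2, n)  # synthetic class: 1 = CKD, 0 = not CKD
df = pd.DataFrame({
    "hemo": 15 - 4 * label + rng.normal(0, 1, n),    # strongly related to the class
    "bgr": 100 + 20 * label + rng.normal(0, 23, n),  # moderately related
    "noise": rng.normal(0, 1, n),                    # unrelated
    "class": label,
})
corr, primary, secondary = correlation_screen(df)
print(corr.round(2))
print("primary:", primary, "secondary:", secondary)
```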
Specific gravity and albumin each take only five distinct values (Fig. 3). When plotted
against one another, their values form a distinct cluster of CKD-negative cases.
Fig. 3. Albumin over specific gravity distribution
The amount of albumin is estimated from a test for protein in the urine. Extra protein in
the urine can be a sign that the kidney's filtering units have been damaged by disease, although
fever or strenuous activity can also raise it temporarily. Several tests over a few weeks are
therefore required to confirm the condition.
Hemoglobin levels commonly drop for three reasons: decreased production of red blood
cells, increased destruction of red blood cells, and blood loss. Healthy kidneys produce a hormone
called erythropoietin (EPO). (“Facts About Chronic Kidney Disease,” May 2020).
TABLE III
PERCENTAGE CHANGE OF STATISTICS OF ATTRIBUTES AFTER FILLING MISSING VALUES
Attribute              Count   Mean     Std     Min     25%     50%     75%     Max
Hb                     13.00   0.66     -6.32   0.000   5.39    2.072   -2.59   0.000
Specific Gravity       11.75   0.01     -6.15   0.000   0.47    -0.12   0.000   0.000
Albumin                11.50   -1.39    -5.95   0.000   0.000   100     0.000   0.000
Hypertension           0.50    -0.034   -0.19   0.000   0.000   0.000   0.000   0.000
DM                     0.50    -0.36    -0.19   0.000   0.000   0.000   0.000   0.000
Pus Cell               16.25   -2.22    -8.94   0.000   0.000   0.000   100     0.000
Blood Glucose Random   11.00   -0.14    -5.86   0.000   1.98    3.87    -2.46   0.000
Appetite               0.25    0.04     -0.12   0.000   0.000   0.000   0.000   0.000
BU                     4.75    -0.22    -2.43   0.000   0.000   4.55    -3.01   0.04
Pedal Edema            0.25    -0.10    -0.12   0.000   0.000   0.000   0.000   0.000
Sugar                  12.25   -4.53    -6.30   0.000   0.000   0.000   1000    0.000
Anemia                 0.25    -0.15    -0.12   0.000   0.000   0.000   0.000   0.000
SC                     4.25    -0.43    -2.16   0.000   0.000   7.15    0.85    0.000
All values are percentage changes (%). Hb = hemoglobin; DM = diabetes mellitus; BU = blood
urea; SC = serum creatinine; Std = standard deviation.
A hormone is a substance that the body produces and releases into the bloodstream to
help trigger or regulate specific bodily functions. EPO causes red blood cells to be produced in
the bone marrow, and these cells then transport oxygen throughout the body. When the kidneys
are diseased or injured, they are unable to produce enough EPO. As a result, the bone marrow
produces fewer red blood cells, which causes anemia. However, before anemia develops (which
occurs when less than 50% of one kidney is functioning well), hemoglobin levels vary only
slightly.
Furthermore, the plot of hemoglobin against serum creatinine also suggests a separation
of the two classes, CKD and non-CKD (Fig. 4) (“Kidney Disease: The Basics,” Aug. 2014).
Fig. 4. Hemoglobin over serum creatinine distribution.
Serum creatinine is also known as blood creatinine or simply creatinine. Creatinine is a waste product formed when muscles break down a substance called creatine, so the amount produced depends on muscle mass. The kidneys are responsible for removing creatinine from the body, and this test measures the level of creatinine in the blood. Creatine is part of the cycle that produces the energy muscles need to contract. The body produces both creatine and creatinine at a very steady rate. In addition to existing kidney problems, a high-protein diet, congestive heart failure, diabetic complications, and dehydration can all raise the level of creatinine in the blood. Creatinine typically ranges from 0.6 to 1.1 mg/dL in women and from 0.7 to 1.3 mg/dL in men.
Up to two-thirds of chronic kidney disease cases can be attributed to two key causes: diabetes and high blood pressure (see Fig. 5). Diabetes damages many organs, including the kidneys, heart, blood vessels, nerves, and eyes. High blood pressure, or hypertension, occurs when the force of blood against the blood vessel walls becomes too great. If it is uncontrolled or poorly controlled, high blood pressure can be a leading cause of heart attacks, strokes, and CKD. Conversely, CKD can also cause high blood pressure.
Fig. 5. Diabetes mellitus over hypertension distribution.
In addition to the factors noted above, the availability and feasibility of each attribute (Table I) were also considered during attribute selection.
The distribution of appetite values against the class label shows a bias toward good appetite (Fig. 6). However, CKD is not the only cause of poor appetite, which may mislead the predictions when the trained model is applied to a new scenario.
C. Model Training:
In this study, nine classification models were trained: the decision tree classifier, random forest classifier, CatBoost, gradient boosting classifier, stochastic gradient boosting, XGB classifier, extra trees classifier, AdaBoost classifier, and K-Nearest Neighbors (KNN) classifier. The dataset was split into two parts: 70% of the data was used for training and 30% for testing. The models were further improved using grid search on the training set and hyperparameter tuning with a genetic algorithm.
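The 70/30 split and grid search described above can be sketched with scikit-learn. This is a minimal illustration under assumptions: a synthetic dataset from `make_classification` stands in for the preprocessed CKD data, a single decision tree stands in for the nine models, and the parameter grid, five-fold inner cross-validation, and stratified split are illustrative choices rather than the thesis's settings. The genetic-algorithm tuning step is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed CKD feature matrix (assumption).
X, y = make_classification(n_samples=400, n_features=7, n_informative=4, random_state=0)

# 70% training / 30% testing; stratify keeps the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Hypothetical search space; the thesis does not list the grids it used.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # the grid search only ever sees the training split

print("best parameters:", search.best_params_)
print("training accuracy:", search.score(X_train, y_train))
print("testing accuracy:", search.score(X_test, y_test))
```

Keeping the test split out of the grid search means the reported testing accuracy is not inflated by the tuning step.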
Fig. 6. The distribution of appetite
Of these nine algorithms, the decision tree classifier, random forest classifier, XGB classifier, extra trees classifier, and AdaBoost classifier performed best in training accuracy, testing accuracy, and cross-validation accuracy.
The implementations and comparisons were carried out in Python using the scikit-learn and Keras frameworks.
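A comparison of training, testing, and cross-validation accuracy across several classifiers, as summarised in Table IV, might look like the following sketch. It uses synthetic stand-in data and default hyperparameters (both assumptions). Stochastic gradient boosting is expressed as `GradientBoostingClassifier` with `subsample` below 1.0, so each tree is fit on a random fraction of the rows. XGBoost and CatBoost come from separate packages and are left out so that the sketch depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a 70/30 split (assumptions).
X, y = make_classification(n_samples=400, n_features=7, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # Stochastic gradient boosting: each tree sees a random 80% of the rows.
    "Stochastic Gradient Boosting": GradientBoostingClassifier(subsample=0.8, random_state=0),
    "KNN": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (
        model.score(X_train, y_train),                          # training accuracy
        model.score(X_test, y_test),                            # testing accuracy
        cross_val_score(model, X_train, y_train, cv=5).mean(),  # 5-fold CV on training split
    )

for name, (train_acc, test_acc, cv_acc) in results.items():
    print(f"{name:30s} train={train_acc:.4f} test={test_acc:.4f} cv={cv_acc:.4f}")
```

A large gap between training and testing accuracy (for example, a fully grown tree scoring 1.0 on its own training data) is a sign of overfitting, which is why the cross-validation column is worth reporting alongside the other two.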
TABLE IV
ACCURACY OF EACH ALGORITHM
Algorithm | Training accuracy | Testing accuracy
Decision Tree Classifier | 99.92% | 93.23%
Random Forest Classifier | 100.0% | 96.53%
XG Boost | 100.0% | 95.26%
Extra Trees Classifier | 100.0% | 95.46%
Ada Boost Classifier | 100.0% | 92.93%
KNN | 78.68% | 61.88%
Gradient Boosting Classifier | 100.0% | 94.64%
Stochastic Gradient Boosting | 100.0% | 97.69%
Cat Boost | 100.0% | 96.51%
TABLE V
PRECISION, RECALL AND F1-SCORE OF EACH ALGORITHM
Algorithm | Precision | Recall | F1-Score
Decision Tree Classifier | 0.95 | 0.95 | 0.97
Random Forest Classifier | 0.94 | 1.0 | 0.94
XG Boost | 0.96 | 0.96 | 0.95
Extra Trees Classifier | 0.94 | 1.0 | 0.92
Ada Boost Classifier | 0.92 | 0.91 | 0.93
KNN | 0.66 | 0.65 | 0.62
Gradient Boosting Classifier | 0.98 | 1.0 | 0.94
Stochastic Gradient Boosting | 0.97 | 1.0 | 0.96
Cat Boost | 0.93 | 0.94 | 0.92
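The three metrics in Table V follow directly from a model's confusion counts. A minimal sketch with hypothetical labels, treating CKD as the positive class (the thesis does not state its positive class or averaging mode): a single healthy patient predicted as CKD yields perfect recall but lower precision.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 1 = CKD, 0 = not CKD.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # one healthy patient flagged as CKD

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 6 / 7
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 6 / 6
f1 = f1_score(y_true, y_pred)                # harmonic mean = 2TP / (2TP + FP + FN) = 12 / 13
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two for a given class.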
D. Model Evaluation and Selection
Based on the results (Tables IV and V), the methods with the highest accuracy in all three evaluations (training, testing, and cross-validation) were chosen. These are the XGB classifier, extra trees classifier, decision tree classifier, random forest classifier, and AdaBoost classifier.
Fig. 7. Feature importance in each trained model.
Chapter 4. Result and Discussion
Even though the cited models reached 100% or near-100% training accuracy, it is still important to identify the features that most influence each model before making a decision (Table VI). After determining the importance of the selected features for each prediction algorithm, the standard deviation of these feature importances was computed for each algorithm. This calculation directly shows each algorithm's preference for particular distinguishing attributes (Table VI, Table VII, Fig. 7). According to the results (Table VI, Table VII, and Fig. 7), the extra trees classifier has the least bias toward particular features, followed by the random forest classifier. The greatest amount of bias is present in the decision tree classifier.
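The bias measure used in Tables VI and VII, the standard deviation of a model's feature importances, can be sketched as follows. Assumptions: synthetic stand-in data, hypothetical column names borrowed from the attributes in Table VI, scikit-learn's impurity-based `feature_importances_`, and the population standard deviation (`numpy.std` with its default `ddof=0`); the thesis does not say which variant it used. XGB is omitted because it comes from a separate package.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names mirroring the attributes listed in Table VI.
feature_names = ["hemo", "sg", "sc", "al", "htn", "dm", "bgr"]
# Synthetic stand-in for the preprocessed CKD data (assumption).
X, y = make_classification(n_samples=400, n_features=len(feature_names),
                           n_informative=4, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

importance_std = {}
for name, model in models.items():
    model.fit(X, y)
    importances = model.feature_importances_           # impurity-based, normalised to sum to 1
    importance_std[name] = float(np.std(importances))  # population standard deviation
    top = feature_names[int(np.argmax(importances))]
    print(f"{name:14s} most important: {top:5s} std = {importance_std[name]:.4f}")

# Lower std = importance spread more evenly = less bias toward a single feature.
least_biased = min(importance_std, key=importance_std.get)
print("least biased model:", least_biased)
```

With seven features whose importances sum to 1, the standard deviation ranges from 0 (all features equally important) to about 0.35 (one feature carries all the importance), which gives a rough scale for reading Table VII.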
TABLE VI
FEATURE IMPORTANCE OF EACH ALGORITHM
Attribute | Decision Tree Classifier | Random Forest Classifier | XGB Classifier | Extra Tree Classifier | Ada Boost Classifier
Hemoglobin | 0.583 | 0.245 | 0.255 | 0.175 | 0.332
Specific Gravity | 0.269 | 0.276 | 0.137 | 0.244 | 0.328
Serum Creatinine | 0.032 | 0.165 | 0.503 | 0.056 | 0.004
Albumin | 0.107 | 0.197 | 0.086 | 0.153 | 0.146
Hypertension | 0.006 | 0.054 | 0.004 | 0.194 | 0.132
Diabetes Mellitus | 0.007 | 0.028 | 0.007 | 0.133 | 0.084
Blood Glucose Random | 0.027 | 0.047 | 0.025 | 0.042 | 0.003
Although the data distribution covers the full range of CKD cases, common symptoms such as poor appetite, anemia, and pedal edema are skewed toward the CKD class. Although this makes successful prediction easy on this dataset, it can also produce false positives when the model is applied in a typical clinical setting; in Table V, several models reach a recall of 1.0 but a lower precision, which reflects such false positives. Furthermore, it was difficult to achieve perfect accuracy except by filling in the missing values with a KNN imputer rather than a constant, because the values were clearly missing at random. Given the clinical value of these characteristics, some of them correlate more weakly than others depending on the patient's stage of disease.
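The contrast between filling a gap with a constant and filling it from similar records can be illustrated with scikit-learn's `SimpleImputer` and `KNNImputer`. A minimal sketch on hypothetical data, using a single neighbour to keep the arithmetic easy to follow; the thesis does not state the number of neighbours it used.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical two-attribute records; the second record lacks its second value.
X = np.array([
    [1.0, 2.0],
    [1.1, np.nan],
    [5.0, 10.0],
    [5.2, 10.4],
])

# Constant strategy: the gap becomes 0.0 regardless of the other attribute.
constant_filled = SimpleImputer(strategy="constant", fill_value=0.0).fit_transform(X)
# KNN strategy: the gap is copied from the record closest on the observed attribute
# (here the first record, whose first value 1.0 is nearest to 1.1).
knn_filled = KNNImputer(n_neighbors=1).fit_transform(X)

print("constant:", constant_filled[1])
print("KNN:     ", knn_filled[1])
```

The constant fill ignores how the attributes co-vary, whereas the KNN fill borrows a value from a record that looks similar on the attributes that were observed.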
TABLE VII
STANDARD DEVIATION OF FEATURE IMPORTANCE OF EACH ALGORITHM
Algorithm | Standard Deviation
Extra Tree Classifier | 0.070247746
Random Forest Classifier | 0.101241419
Ada Boost Classifier | 0.1362923604
XGB Classifier | 0.1853412263
Decision Tree Classifier | 0.21194746
The training procedure has a significant impact on the models' accuracy. After training, it is evident that tree-based models are more accurate than the other classification algorithms. This can be explained by the distribution of the dataset: with the exception of serum creatinine, the chosen attributes separate the two classes clearly. Considering the reasons why the nominal values of these attributes may change, there are many possible causes other than CKD. Finally, when choosing the method, note that certain trained models are biased toward particular features, as indicated in Table VI. Because a model that spreads its importance across several features encourages decision makers to weigh more than one factor when making choices, the extra trees classifier was chosen.
Chapter 5. Conclusion
Nearly 14% of the world's population is affected by chronic kidney disease (CKD), which can be fatal. Predicting CKD accurately (most of the selected models reached 100% training accuracy) allows people to learn about the disease early and receive treatment at the lowest cost and risk. Proper feature engineering reduces the number of features the prediction algorithm requires, which in turn lowers the number of clinical tests that must be taken. Using a K-Nearest Neighbors imputer (KNN imputer) to fill in missing values based on their distribution and their co-occurrence with other attributes, rather than simply replacing them with a constant, improves prediction accuracy compared to related work on the same dataset. Furthermore, the extra trees classifier and the random forest classifier are superior algorithms for predicting CKD because they consistently achieve 100% training accuracy and exhibit little bias toward particular features compared with the other models. This work proposes a new approach, comprising data preprocessing, missing-value handling, and feature selection, to predict whether a patient's CKD status is positive or negative. It also emphasizes the importance of domain knowledge when choosing features to analyze medical records related to CKD. In future work, it would be beneficial to apply a KNN-imputer-based strategy to handle missing values in datasets for other diseases. Additionally, incorporating knowledge about genetics, water consumption patterns, and the types of food consumed into the research could provide deeper insights into CKD.
References
National Kidney Foundation. 2022. Kidney Disease: The Basics. [online] Available at: <https://www.kidney.org/news/newsroom/fsindex> [Accessed 15 August 2022].
National Kidney Foundation. 2022. Kidney Disease. [online] Available at: <https://www.kidney.org/kidneydisease/global-facts-aboutkidneydisease/> [Accessed 15 August 2022].
National Kidney Foundation. 2022. Estimated Glomerular Filtration Rate (eGFR). [online] Available at: <https://www.kidney.org/atoz/content/gfr> [Accessed 15 August 2022].
Murtagh, F., Addington-Hall, J., Edmonds, P., Donohoe, P., Carey, I., Jenkins, K. and Higginson, I., 2010. Symptoms in the Month Before Death for Stage 5 Chronic Kidney Disease Patients Managed Without Dialysis. Journal of Pain and Symptom Management, 40(3), pp.342-352.
Xiao, J., Ding, R., Xu, X., Guan, H., Feng, X., Sun, T., Zhu, S. and Ye, Z., 2019. Comparison and development of machine learning tools in the prediction of chronic kidney disease progression. Journal of Translational Medicine, 17(1).
Archive.ics.uci.edu. 2022. UCI Machine Learning Repository. [online] Available at: <https://archive.ics.uci.edu/ml/index.php> [Accessed 15 August 2022].
Abdullah, A., Hafidz, S. and Khairunizam, W., 2020. Performance Comparison of Machine Learning Algorithms for Classification of Chronic Kidney Disease (CKD). Journal of Physics: Conference Series, 1529(5), p.052077.
International Journal of Recent Technology and Engineering, 2019. Predictive Analytics of Chronic Kidney Disease using Machine Learning Algorithm. 8(2), pp.940-947.
Aljaaf, A., Al-Jumeily, D., Haglan, H., Alloghani, M., Baker, T., Hussain, A. and Mustafina, J., 2018. Early Prediction of Chronic Kidney Disease Using Machine Learning Supported by Predictive Analytics. 2018 IEEE Congress on Evolutionary Computation (CEC).
Rady, E. and Anwar, A., 2019. Prediction of kidney disease stages using data mining algorithms. Informatics in Medicine Unlocked, 15, p.100178.
Johnson, C.A., Levey, A.S., Coresh, J., Levin, A., Lau, J. and Eknoyan, G., 2004. Clinical practice guidelines for chronic kidney disease in adults: Part I. Definition, disease stages, evaluation, treatment, and risk factors. American Family Physician, 70(5), pp.869-876.
Li, C., 2013. Little's Test of Missing Completely at Random. The Stata Journal: Promoting communications on statistics and Stata, 13(4), pp.795-809.
Nair, S., O'Brien, S., Hayden, K., Pandya, B., Lisboa, P., Hardy, K. and Wilding, J., 2014. Effect of a Cooked Meat Meal on Serum Creatinine and Estimated Glomerular Filtration Rate in Diabetes-Related Kidney Disease. Diabetes Care, 37(2), pp.483-487.
Yildirim, P., 2017. Chronic Kidney Disease Prediction on Imbalanced Data by Multilayer Perceptron: Chronic Kidney Disease Prediction. 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC).
Jakobsen, J., Gluud, C., Wetterslev, J. and Winkel, P., 2017. When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts. BMC Medical Research Methodology, 17(1).
S, V. and S, D., 2015. Data Mining Classification Algorithms for Kidney Disease Prediction. International Journal on Cybernetics & Informatics, 4(4), pp.13-25.
National Kidney Foundation. 2022. Facts About Chronic Kidney Disease. [online] Available at: <https://www.kidney.org/atoz/content/about-chronic-kidneydisease> [Accessed 15 August 2022].