International Journal of Trend in Scientific Research and Development (IJTSRD)
Volume 5 Issue 2, January-February 2021 Available Online: www.ijtsrd.com e-ISSN: 2456-6470

A Survey on Heart Disease Prediction Techniques

G. Niranjana, Dr I. Elizabeth Shanthi
Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, Tamil Nadu, India

ABSTRACT
Heart disease has been a leading cause of death worldwide over the last few decades and has become one of the most life-threatening diseases. The health care industry is rich in information, so there is a need to discover the hidden patterns and trends in it. For this purpose, data mining techniques can be applied to extract knowledge from large sets of data. In recent times, many researchers have used machine learning techniques to predict heart-related diseases because these techniques can predict the disease effectively. Even though machine learning has proved effective in assisting decision makers, there is still scope for developing an accurate and efficient system to diagnose and predict heart disease, thereby easing the work of doctors. This paper presents a survey of various techniques used for predicting heart disease and reviews their performance.

How to cite this paper: G. Niranjana | Dr I. Elizabeth Shanthi "A Survey on Heart Disease Prediction Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, pp.68-71, URL: www.ijtsrd.com/papers/ijtsrd38349.pdf
Unique Paper ID: IJTSRD38349

Copyright © 2021 by author(s) and International Journal of Trend in Scientific Research and Development Journal.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

KEYWORDS: Heart disease, data mining, machine learning, prediction

1. INTRODUCTION
Data mining is the process of examining large databases to discover hidden patterns and correlations using statistics, machine learning, artificial intelligence and database technology. A tremendous amount of data is generated in the medical field, and it is important to mine this data to help practitioners diagnose disease early. Heart disease can cause sudden death and claims more lives each year than all types of cancer or other major diseases. Heart disease prediction is a very challenging area in the medical field because of the many risk factors involved, such as high blood pressure, high cholesterol, uncontrolled diabetes, abnormal pulse rate and obesity. Highly skilled and experienced physicians are required to diagnose heart disease [1]. However, the death rate can be drastically reduced if the disease is detected at an early stage and preventive measures are adopted. Developing a heart disease prediction system is therefore indispensable for preventing patient deaths through early diagnosis. The main objective of this paper is to survey previous research on heart disease prediction and to analyze the techniques used.

The paper is organized as follows. Section 2 describes heart disease. Section 3 explains the data mining algorithms used for heart disease prediction. Section 4 presents the literature review. Section 5 presents the observations and Section 6 concludes the paper.

2. HEART DISEASE
Heart disease occurs when plaque develops in the arteries and blood vessels that lead to the heart. This plaque blocks oxygen and important nutrients from reaching the heart.
The types of heart disease are listed below:
1. Congenital heart disease
2. Arrhythmia
3. Coronary artery disease
4. Dilated cardiomyopathy
5. Myocardial infarction
6. Mitral valve prolapse
7. Pulmonary stenosis
8. Hypertrophic cardiomyopathy

3. DATA MINING TECHNIQUES FOR PREDICTION
3.1. DECISION TREE
A decision tree is a supervised machine learning algorithm that repeatedly splits the data according to the value of a chosen parameter. The tree consists of two kinds of entities: decision nodes and leaf nodes. Decision nodes are where the data is split, and leaf nodes hold the decisions or final outcomes. Decision trees perform both classification and regression tasks. Tree models whose target variable takes a set of discrete values are called classification trees, and those whose target variable takes continuous values are called regression trees. Decision trees are widely used in data mining, statistics and machine learning.

3.2. NAÏVE BAYES
A Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes' theorem for binary and multi-class classification problems. It is easy to build because it assumes independence between predictors: every pair of features is treated as independent of each other given the class. This assumption rarely holds in real data. Even so, Naïve Bayes can outperform many more sophisticated classification models. Bayes' theorem is stated as follows:

P(A|B) = P(B|A) · P(A) / P(B)

where P(A|B) is the probability of A given B, P(B|A) is the probability of B given A, and P(A) and P(B) are the probabilities of A and B on their own.

3.3. RANDOM FOREST
Random forest is a supervised learning algorithm that constructs multiple decision trees for both classification and regression. The algorithm builds decision trees on data samples, obtains a prediction from each tree, and selects the final output by voting. The main advantage of random forest is that it reduces overfitting and provides high accuracy.

3.4. SUPPORT VECTOR MACHINE
A support vector machine (SVM) is a discriminative classifier that finds a hyperplane in n-dimensional space that distinctly separates the data points. Given labelled training data, the algorithm outputs an optimal separating hyperplane that divides the classes. Support vectors are the data points closest to the hyperplane. SVM is a supervised learning method used for both classification and regression tasks. It is widely used because it produces high accuracy with little computation power.

3.5. ARTIFICIAL NEURAL NETWORK
A neural network is a machine learning algorithm modelled on the human brain. An artificial neural network (ANN) is an information processing technique that works in a way similar to how the human brain processes information. An ANN contains three kinds of layers, namely the input layer, the hidden layer and the output layer, made up of interconnected nodes, each of which contains an activation function. Neural networks are widely used in data mining for classifying large datasets, and they enhance data analysis technology.

4. LITERATURE REVIEW
This section describes previous research on data mining techniques for heart disease prediction.

Purushottam et al. [2] (2015) designed a system that efficiently discovers rules to predict the risk level of heart disease based on the given parameters. The rules are prioritized according to the user's requirements. The system builds its classification model using covering rules (C4.5Rules). The WEKA tool is used for dataset analysis, and the Knowledge Extraction based on Evolutionary Learning (KEEL) tool is used to find the classification decision rules. The dataset is taken from the Cleveland Clinic Foundation. It contains 76 raw attributes in total, of which only 14 are used. The dataset includes parameters such as ECR, cholesterol, chest pain, fasting sugar, MHR and many more. The decision tree achieves a classification accuracy of 87.4%.

Anchana Khemphila et al. [3] (2011) proposed a classification approach using a Multi-Layer Perceptron with the Back Propagation learning algorithm and a feature selection algorithm to diagnose heart disease. Information gain is used for feature selection. First, the model uses the ANN without information-gain-based feature selection; the accuracy is 88.46% on the training dataset and 80.17% on the validation dataset. Next, the ANN is used for classification after removing the feature with the lowest information gain. The accuracy is then 89.56% on the training dataset and 80.99% on the validation dataset. The results show that feature selection helps increase computational efficiency while improving classification accuracy.

Nikhil Gawande et al. [4] (2017) proposed a system to classify heart disease using a convolutional neural network (CNN). The system uses a CNN model to classify ECG signals. Because the model can classify heartbeats of different kinds directly, no separate feature extraction is needed. ECG signals taken from the MIT-BIH database are given as input to the system. A total of 340 samples are used to train and test the CNN, and the results are reported in a confusion matrix. Once the CNN is trained, even long ECG records can be classified accurately and records from different patients can be processed. The accuracy obtained is 99.46%.

V. Krishnaiah et al. [5] (2014) introduced a fuzzy classification technique to diagnose heart disease. The main objective of the work is to predict heart disease patients more accurately and to remove the uncertainty in unstructured data.
The data is collected from the Cleveland Heart Disease database and the Statlog Heart Disease database. The Cleveland database consists of 303 records and the Statlog database consists of 270 records; the remaining records are collected from different hospitals in Hyderabad.

M. A. Jabbar et al. [6] (2016) proposed a novel classification model, the Hidden Naïve Bayes (HNB) classifier, for heart disease prediction. The heart dataset is downloaded from the UCI repository. The Heart Statlog dataset contains 14 attributes and 270 instances. HNB is evaluated using 10-fold cross-validation, and the WEKA tool is used for HNB classification. The proposed approach pre-processes the data using discretization and IQR filters to improve the efficiency of HNB. The results show that the HNB classifier recorded 100% accuracy, higher than the NB classification model.

Jayshril S. Sonawane et al. [7] (2014) proposed a system to predict heart disease using a multilayer perceptron neural network architecture. The dataset is taken from the Cleveland heart disease database. Its 13 clinical attributes are fed as input to the neural network, and the back propagation algorithm is used for training. Compared with the decision support system for predicting heart disease that uses a multilayer perceptron and the back propagation algorithm, the proposed system achieves its highest accuracy of 98.58% with 20 neurons.

Tanmay Kasbe et al. [8] (2017) proposed a fuzzy expert system for heart disease diagnosis. The data is taken from the UCI repository and consists of four databases, from the V.A. Medical Center, Long Beach; the Cleveland Clinic Foundation; the Hungarian Institute of Cardiology, Budapest; and the University Hospital, Zurich, Switzerland. The dataset has a total of 76 input attributes and 1 output attribute, of which the proposed system uses 10 important input attributes and 1 output attribute.
MATLAB is used to develop the fuzzy system. The fuzzy system consists of three steps: fuzzification, rule base and defuzzification. The proposed fuzzy expert system achieves 93.33% accuracy and better performance than previous work in the same domain.

Aakash Chauhan et al. [9] (2018) proposed a new rule to predict coronary heart disease using evolutionary rule learning. Computational intelligence is used to discover the relationship between disease and patient. The dataset is taken from the Cleveland Heart Disease database. KEEL, a Java-based tool, is used for the simulation of evolutionary learning. A classification model was developed using association rules and the results were evaluated. The system will help doctors explore their data and predict coronary disease accurately.

Senthilkumar Mohan et al. [10] (2019) proposed a novel method to improve the prediction accuracy of cardiovascular disease using hybrid machine learning techniques. The prediction model is designed with different combinations of features and several known machine learning classification techniques. The Cleveland Heart Disease dataset is taken from the UCI repository. After feature selection, 13 attributes are considered for classification. R Studio Rattle is used to perform the classification and evaluate its performance. The prediction model uses a hybrid random forest with a linear model and recorded 88.7% accuracy.

Sinkon Nayak et al. [11] (2019) proposed a method to predict heart disease by mining frequent items and applying classification techniques. The dataset is taken from the UCI repository and pre-processed. Frequent item mining is used to filter the attributes, and then various classification techniques, namely Decision Tree, Naïve Bayes, Support Vector Machine and KNN, are used to predict heart disease at an early stage. The R analytical tool is used for implementation. Among these techniques, Naïve Bayes achieves 88.67% accuracy in predicting heart disease with attribute filtration. Performance is evaluated using the ROC curve.

Rahma Atallah et al. [12] (2019) proposed a majority voting ensemble method to predict heart disease in humans. The model classifies a patient based on the majority vote of several machine learning models in order to provide more accurate results. The dataset is taken from the UCI repository and 14 predominant attributes are considered. Min-Max normalization is applied in pre-processing. To analyze the data, a correlation value was calculated between each attribute and the target variable. The features most highly correlated with the target attribute were Cp, Thalach, Oldpeak and Exang. Four classifier models are tested: Stochastic Gradient Descent (88%), KNN (87%), Logistic Regression (87%) and Random Forest (87%). These four models are combined in an ensemble model that classifies by hard voting, and the ensemble achieves 90% accuracy.

Harita Jagad and Jehan Kandawalla [13] (2015) proposed a model to detect coronary artery disease using different data mining algorithms, namely Decision Tree, Naïve Bayes and Neural Network. Parameters such as patient age, sex and blood pressure are taken during check-up to evaluate the performance of these algorithms. Naïve Bayes proved to be the fastest of the three algorithms. The reliability of the decision tree algorithm depends on the input data, and it has difficulty dealing with large datasets. Neural networks are mainly used when the dataset is small in size. There is no clear information about the accuracy level.

M. Akhil Jabbar et al. [14] (2013) proposed a model to classify heart disease using an artificial neural network and feature subset selection. Feature subset selection is a method used to reduce the dimensionality of the input data. Reducing the number of attributes also reduces the number of diagnostic tests that doctors require from patients. The dataset is taken from a hospital in Andhra Pradesh, and the results show that accuracy improves over older classification techniques. The results also show that the system is faster and more precise.

Muhammad Saqlain, Wahid Hussain et al. [15] (2016) proposed a multinomial Naïve Bayes algorithm to detect heart failure. The data is collected from the Armed Forces Institute of Cardiology (AFIC), Pakistan, in the form of medical records, and 30 variables are used. The proposed algorithm is compared with other classification algorithms such as Logistic Regression, Neural Network, SVM, Random Forest and Decision Tree. Performance is measured in terms of precision, accuracy, recall and Area Under the Curve (AUC). Naïve Bayes achieved the highest accuracy, 86.7%, with an AUC of 92.4%.

5. OBSERVATION
From the literature review, it is observed that in most cases the dataset was taken from the UCI repository, which contains the Cleveland Heart Disease database and the Statlog Heart Disease database. Naïve Bayes, decision tree, random forest and neural network algorithms are the predominantly used data mining techniques for predicting heart disease with high accuracy. However, authors should also concentrate on minimizing time utilization. Most of the research works have used existing machine learning techniques or combined two or more existing techniques to improve prediction performance. Rather than voting over or combining existing algorithms, researchers should propose new algorithms for prediction.
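To make concrete what voting over existing classifiers involves, the following is a minimal Python sketch of hard voting together with a Min-Max normalization helper, in the spirit of the ensemble surveyed in [12]. It is an illustration only and not the implementation of any surveyed work: the function names `hard_vote` and `min_max_normalize`, the tie-breaking rule and the sample predictions are all assumptions made for this sketch.

```python
from collections import Counter


def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1] (Min-Max normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def hard_vote(predictions):
    """Return the majority label for each sample.

    `predictions` holds one list of labels per classifier, all of equal length.
    Ties go to the label that appears first among the classifiers' votes
    (an assumption of this sketch, not a rule taken from the survey).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must predict every sample")
    majority = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        majority.append(Counter(votes).most_common(1)[0][0])
    return majority


if __name__ == "__main__":
    # Hypothetical outputs of four classifiers on five patients
    # (1 = heart disease, 0 = no heart disease); not real data.
    model_outputs = [
        [1, 0, 1, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0],
    ]
    print(hard_vote(model_outputs))
    print(min_max_normalize([120, 140, 160]))
```

Hard voting counts only the predicted labels, so individual classifiers can disagree on a patient while the ensemble still returns the label most of them chose.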
Researchers should also explore deep learning concepts further in order to improve prediction performance. Since heart disease is very dangerous, a fast and reliable system needs to be developed.

6. CONCLUSION
The main motive for conducting this survey is to understand the work of different authors and to analyze how accurately heart disease can be predicted. These authors have proposed different data mining algorithms, but there are certain limitations; for example, neural networks perform well only with structured datasets [16]. MATLAB, Python and WEKA are widely used for implementing these algorithms. However, different algorithms produce different accuracies, depending on the size of the dataset, the number of attributes selected [17] and the tools used for implementation. Future work should concentrate on proposing new algorithms that achieve better performance and minimize time utilization.

REFERENCES
[1] M. A. Jabbar et al., "Prediction of heart disease using Random forest and Feature subset selection," AISC Springer, vol. 424, pp. 187-196, 2015.
[2] Purushottam, K. Saxena and R. Sharma, "Efficient heart disease prediction system using decision tree," International Conference on Computing, Communication & Automation, Noida, 2015, pp. 72-77.
[3] A. Khemphila and V. Boonjing, "Heart Disease Classification Using Neural Network and Feature Selection," 2011 21st International Conference on Systems Engineering, Las Vegas, NV, 2011, pp. 406-409.
[4] N. Gawande and A. Barhatte, "Heart diseases classification using convolutional neural network," 2017 2nd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, 2017, pp. 17-20.
[5] V. Krishnaiah, M. Srinivas, G. Narsimha and N. S. Chandra, "Diagnosis of heart disease patients using fuzzy classification technique," International Conference on Computing and Communication Technologies, Hyderabad, 2014, pp. 1-7.
[6] M. A. Jabbar and S. Samreen, "Heart disease prediction system based on hidden naïve bayes classifier," 2016 International Conference on Circuits, Controls, Communications and Computing (I4C), Bangalore, 2016, pp. 1-5.
[7] J. S. Sonawane and D. R. Patil, "Prediction of heart disease using multilayer perceptron neural network," International Conference on Information Communication and Embedded Systems (ICICES2014), Chennai, 2014, pp. 1-6.
[8] T. Kasbe and R. S. Pippal, "Design of heart disease diagnosis system using fuzzy logic," 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai, 2017, pp. 3183-3187.
[9] A. Chauhan, A. Jain, P. Sharma and V. Deep, "Heart Disease Prediction using Evolutionary Rule Learning," 2018 4th International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, 2018, pp. 1-4.
[10] S. Mohan, C. Thirumalai and G. Srivastava, "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques," in IEEE Access, vol. 7, pp. 81542-81554, 2019.
[11] S. Nayak, M. K. Gourisaria, M. Pandey and S. S. Rautaray, "Prediction of Heart Disease by Mining Frequent Items and Classification Techniques," 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 2019, pp. 607-611, doi: 10.1109/ICCS45141.2019.9065805.
[12] R. Atallah and A. Al-Mousa, "Heart Disease Detection Using Machine Learning Majority Voting Ensemble Method," 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), Amman, Jordan, 2019, pp. 1-6, doi: 10.1109/ICTCS.2019.8923053.
[13] Harita Jagad and Jehan Kandawalla, "Detection of Coronary Artery Diseases using Data Mining Techniques," International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, vol. 3, issue 11, pp. 6090-6092, 2015.
[14] M. Akhil Jabbar, B. L. Deekshatulu and Priti Chandra, "Classification of heart disease using artificial neural network and feature subset selection," Global Journal of Computer Science and Technology Neural & Artificial Intelligence, vol. 13, no. 3, 2013.
[15] Muhammad Saqlain, Wahid Hussain, Nazar A. Saqib et al., "Identification of heart failure by using unstructured data of cardiac patients," 45th International Conference on Parallel Processing Workshops, 2016, pp. 426-431, doi: 10.1109/ICPPW.2016.66.
[16] B. Gnaneswar and M. R. E. Jebarani, "A review on prediction and diagnosis of heart failure," 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, 2017, pp. 1-3.
[17] A. L. Rane, "A survey on Intelligent Data Mining Techniques used in Heart Disease Prediction," 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India, 2018, pp. 208-213.
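
As a supplementary illustration of the Naïve Bayes classifier described in Section 3.2, the short Python sketch below applies Bayes' theorem and the naive independence assumption to a single hypothetical patient. Every probability, feature name and function name here is invented for illustration and is not taken from any of the surveyed studies.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


def naive_bayes_posteriors(priors, likelihoods, observed):
    """Class posteriors under the naive independence assumption.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: {value: P(value | class)}}}
    observed:    {feature: value} for one patient
    """
    joint = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            # Independence assumption: multiply the per-feature likelihoods.
            score *= likelihoods[cls][feature][value]
        joint[cls] = score
    evidence = sum(joint.values())  # P(observed), summed over all classes
    return {cls: score / evidence for cls, score in joint.items()}


if __name__ == "__main__":
    # Made-up probabilities, chosen only to keep the arithmetic easy to follow.
    priors = {"disease": 0.4, "healthy": 0.6}
    likelihoods = {
        "disease": {"chest_pain": {"yes": 0.8, "no": 0.2},
                    "high_bp": {"yes": 0.5, "no": 0.5}},
        "healthy": {"chest_pain": {"yes": 0.2, "no": 0.8},
                    "high_bp": {"yes": 0.25, "no": 0.75}},
    }
    patient = {"chest_pain": "yes", "high_bp": "yes"}
    for cls, p in naive_bayes_posteriors(priors, likelihoods, patient).items():
        print(f"P({cls} | patient) = {p:.4f}")
```

With these made-up numbers the joint scores are 0.4 × 0.8 × 0.5 = 0.16 for "disease" and 0.6 × 0.2 × 0.25 = 0.03 for "healthy", so the posterior for "disease" is 0.16 / 0.19, roughly 0.84.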