International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 5 - May 2013

A Fuzzification Approach for Prediction of Heart Disease

Nitika#1, Madan Lal Yadav#2
Department of CSE, ASET, Amity University, Uttar Pradesh, Noida, India

Abstract: Data mining operations and approaches improve on statistical methods by enabling a user to analyze future outcomes based on the current dataset. One such analysis provided by data mining approaches is prediction-based analysis. In the present work, a heart disease prediction system is designed. Heart disease prediction is an expert system application that requires an authenticated dataset to process. A fuzzy-based soft computing approach is implemented on multiple parameters to predict heart disease. In this paper, earlier work in the area of medical disease prediction is studied, and a new fuzzy rule-based approach is suggested to perform heart disease prediction.

Keywords: Fuzzy System, Heart Disease, Rule Based, Dataset, Data Mining, Prediction based.

I. INTRODUCTION

Prediction-based systems are a major challenge for data mining approaches, in which analysis of current data is used to identify future aspects. This challenge becomes more critical for medical disease prediction and analysis. The medical field is one of the major research areas for data mining, but it is itself critical because it requires expert involvement [1]. Data mining approaches are widely involved in the health care industry. A number of health care organizations and medical industries use these mining approaches and analyses to derive effective results. There are a number of trends and patterns in medical data that can be used to analyze the patient's situation as well as to predict the disease and the diagnosis [4,6].
The influence of data mining on the quality of health care cannot be overstated. All health care organizations retain detailed and comprehensive records of patient data. Trends and patterns identified in these records can positively impact the quality of health care. The huge amount of patient data makes identification of these trends an arduous task. However, data mining applications built for this purpose can simplify the task and produce efficient results.

ISSN: 2231-5381

There have been several cases where the application of data mining techniques has helped to resolve a problem in the health industry. For instance, data mining on pneumonia patient records in a hospital showed that patients who were administered medication immediately on arrival responded better than patients who were not. To arrive at this conclusion, the data mining application used several inputs, such as the tests and other information of the patients who showed better medication results. Various relations were drawn between the inputs. One of these was the relation between the results and the time taken to administer medication after arrival. It was found that the shorter the time, the better the result [10,11].

Several other key issues were addressed at this time. The data mining analysis showed that several tests, which were largely extraneous, were conducted on the patients. These led to a delay in the administration of medication and thereby affected the recovery of the patient. To overcome this, a standardized plan was created to treat pneumonia patients. Identifying these associations between inputs and finding the resulting best outcome was possible only because of data mining techniques.

Some of the most used data mining techniques for medical data analysis are given below.

A. Data Mining Techniques

1) Association: Association is one of the basic and most widely used data mining terms.
Association mining identifies the relationships between the attributes of the dataset as well as the relationships between the records. It is one of the major modeling approaches and helps to identify the valid dataset at the initial stage and to remove the records or attributes having less association. Association mining is also used as a pre-processing stage to filter the dataset by removing impurities and keeping the most valuable data values. But the use of association mining is not limited to this. It is also appropriate in other mining operations such as classification [13].

http://www.ijettjournal.org

2) Clustering: When the dataset is large, instead of processing it as a single unit, it is subdivided into smaller units called clusters. Clustering is done using a systematic approach so that similar data is kept in one cluster. The similarity between data items is analyzed using distance-based measures such as Euclidean distance. There are a number of clustering approaches, such as C-Means and K-Means clustering [12].

3) Data Visualization: Visualization presents the results in an effective way so that any stakeholder can easily draw conclusions by viewing them. This kind of data transformation is performed in terms of pictorial data such as graphs, tables and charts. It is a management-level presentation approach to represent data conclusions.

4) Decision Tree: A decision tree, as the name suggests, is a tree-based approach in which decisions are represented by the parent nodes and the associated events are represented by the child nodes. This kind of algorithm is used mainly to perform data classification.
The decision about accepting or rejecting the data is taken under a rule defined at the parent node.

5) Linear Regression: It is another statistical approach that works as a filtration as well as an analytical approach to predict data values. Regression is the analysis of one attribute with respect to one or more other attributes of the same dataset.

II. LITERATURE SURVEY

N. Aditya Sundar presented a study of different classification approaches associated with heart disease. The paper describes two effective techniques, Naïve Bayes and WAC (weighted associative classifier). These classification approaches can answer complex questions and queries effectively. The author used a dataset in which the analysis is based on age, gender, blood pressure and blood sugar. The author analyzed performance under the defined approaches and predicted the patient's disease. The author also used different performance measures for the analysis [1].

Another work in the same direction, predicting heart disease, was performed by Chen. The author defined a system that can help medical professionals identify the heart disease status of a patient. The author described each processing stage broadly. The work is performed in three layers. In the first layer, the important features are selected for the patient. Once the features are selected, a neural network-based classification approach classifies the heart disease in the second layer. At the final stage, the author defined an analysis to identify the chances of heart disease as well as its criticality for a particular patient [2].

Another work, related to a decision support system for heart disease prediction, was performed by Mrs. G. Subbalakshmi.
In her research work, the authenticated dataset is derived under the standard parameters related to heart disease. The author designed a questionnaire-based web application to obtain the views of different experts and medical students. Based on a comparative analysis of the obtained dataset, disease-related conclusions are drawn [3].

Another work was performed by E. Barati to predict skin disease. The author presented a survey-based work to obtain information about the different approaches to disease prediction and elaborated on skin disease classification and prediction. The author defined different classification approaches to perform the prediction and classification of diseases. The author also suggested the related diagnosis [4].

Another survey-based work was performed by Milan Kumari on different classification approaches in cardiovascular disease prediction. The author studied different classifiers such as neural networks, support vector machines and regression analysis. The author also discussed the different analytical approaches of similarity measures. The comparison is based on sensitivity, accuracy, error rate and similar measures. The comparative analysis is provided to assess the performance of all these approaches [5].

Another work in the same direction was performed by Jyoti Soni to predict heart disease. The paper performs knowledge discovery under different mining techniques related to the medical area. The author performed heart disease prediction and analysis under all these approaches and concluded the relative decision based on the disease prediction [6][8].

A work related to medical disease prediction was presented by Dheeraj Dixit. In this approach, different symptoms and disorder analysis on the medical database are discussed for the prediction.
A hybrid model is defined that includes association rule mining and relative analysis to perform the disease prediction [7].

A probabilistic analysis of heart disease prediction was defined by Dr. D. Raghu. In this work, a decision support system is presented for heart disease prediction and a probabilistic analysis is performed by the author. Medical disease prediction is discussed under different attributes such as age, sex, blood pressure and blood sugar. The author performed the heart disease prediction analysis under the defined approaches [9].

Shantakumar B. Patil presented a research work on an intelligent and effective heart attack prediction system using a neural network-based approach. The author performed the classification by training on the dataset using a neural network and then identifying the possible patterns. These patterns are then studied and discussed to analyze heart attack prediction. The author defined a multi-layer perceptron-based training algorithm to perform the disease analysis. The results obtained from the system show effective prediction of heart disease [10].

Another work, based on association mining, was proposed by Jabbar to discover heart disease on an authenticated dataset. The results obtained from the system show the reliability of the work [11].

III. PROPOSED APPROACH

Heart disease prediction is one of the major research areas in medical disease analysis. Such systems are implemented under expert advice and require authentication at each stage of the work. In the present work a fuzzy rule-based system is designed to provide a probabilistic model so that the predictive analysis can be derived from the approach easily. This methodology is effective for deriving the predictive analysis on a single record as well as on a large dataset.

In the present work, the fuzzy-based approach is used to predict the patient's disease. As it is an intelligent soft computing approach, it can represent the probabilistic relation based on the patient symptom analysis. As the work is rule based, the interrelated variables can be estimated easily, which helps in understanding the approach followed by the fuzzy analysis.

The validity of the algorithm implementation depends on the validity of the collected dataset. The first step is to define a valid dataset with a large amount of data. In the present work we define a patient dataset that includes the patient's basic details as well as the patient's symptoms. Let us divide the tuples of the database into partitions, not necessarily of equal size. Once we get the normalized dataset, the next task is to apply the fuzzy rules to this dataset to perform the classification. The fuzzy system performs the work in three main steps by acquiring the domain knowledge:

1) Identify the attributes or variables that are most associated with some event.
2) Identify the relationship between these attributes so that the flow can be defined.
3) Define a probabilistic decision over the variables defined in step 1 so that the link associated in step 2 will be activated.

[Figure 1: Proposed Model. Flow: Input data set -> Identify the most related symptom attributes -> Fuzzify the dataset under defined ruleset -> Recognize the patient disease based on fuzzy rules and operators -> Predict the Disease Criticality based on disease analysis]

Figure 1 shows the overall model proposed in this work. The work begins with the input dataset. Once the dataset is obtained, the next task is to identify the associated attributes that play an important role in heart disease prediction. This attribute recognition is done based on expert input.
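The flow of the proposed model (select attributes, fuzzify each attribute, combine the degrees through fuzzy rules, and report a criticality level) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration only: the attribute names, the breakpoints, the two rules, the min/max fuzzy operators and the criticality cut-offs are hypothetical placeholders, not clinical thresholds and not the rule set of this work.

```python
# Illustrative sketch only. Attribute names, breakpoints, rules and cut-offs
# are hypothetical placeholders, not clinical values and not the paper's rules.

def ramp(x, low, high):
    """Degree in [0, 1] to which x counts as 'high': 0 at or below low, 1 at or above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

# Step 1: attributes assumed to be chosen by an expert, with assumed (low, high) breakpoints.
SELECTED = {
    "age": (40.0, 70.0),
    "blood_pressure": (120.0, 180.0),
    "blood_sugar": (100.0, 200.0),
}

def fuzzify(record):
    """Fuzzify one patient record: one membership degree per selected attribute."""
    return {name: ramp(record[name], low, high) for name, (low, high) in SELECTED.items()}

def fuzzy_and(*degrees):
    """Fuzzy AND taken as the minimum (an assumed operator choice)."""
    return min(degrees)

def fuzzy_or(*degrees):
    """Fuzzy OR taken as the maximum (an assumed operator choice)."""
    return max(degrees)

def risk_degree(memberships):
    """Steps 2 and 3: combine attribute degrees through two hypothetical rules."""
    rule_1 = fuzzy_and(memberships["blood_pressure"], memberships["blood_sugar"])
    rule_2 = fuzzy_and(memberships["age"], memberships["blood_pressure"])
    return fuzzy_or(rule_1, rule_2)

def criticality(degree):
    """Map the overall degree to a coarse criticality label (assumed cut-offs)."""
    if degree >= 0.75:
        return "High"
    if degree >= 0.4:
        return "Moderate"
    return "Low"

patient = {"age": 55, "blood_pressure": 150, "blood_sugar": 150}
memberships = fuzzify(patient)
degree = risk_degree(memberships)
print(memberships, degree, criticality(degree))
```

In this sketch each attribute is fuzzified independently, and the rules only combine the resulting degrees, so an expert could change the breakpoints of one attribute without touching the rules.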
Once the attributes are identified, the next task is to generate the fuzzy rules for each attribute individually. These rules are then applied to the dataset to obtain the disease prediction as well as the disease criticality prediction.

A. Strengths of system

- User-friendly environment to work with a user-defined dataset.
- Can work on an authenticated dataset by loading the dataset.
- Fuzzification of the input data is performed for each record.
- The fuzzification process is represented using a fuzzy-based graphical representation.
- Fuzzification is performed on each individual attribute.
- The selection of patients is based on the input symptom criteria.
- The symptom as well as the symptom criticality is considered in the work.
- More than one symptom is taken collectively under different fuzzy operators.
- The results are represented based on the fuzzy query as well as the standard SQL query.
- The criticality level of all symptoms is collected to identify the overall prediction of heart disease.

B. Symptoms considered

Fuzzy-based association mining starts from Boolean truth values, which are either true or false. For instance, if a patient suffering from high fever has a high temperature, the truth value is 1, and if not, it is 0. If the value is intermediate, so that it is neither fully true nor fully false, it takes a value between 0 and 1 that reflects the probability of both conditions. This approach is applied to every attribute of a patient, which eases the identification and diagnosis of the disease. In this manner we classify the whole database using the fuzzy approach, over attributes such as age, weight, blood pressure and other medical terms, using the probability of the true and false values.

Medical diagnosis usually involves careful examination of a patient to check the presence and strength of some features relevant to a suspected disease in order to decide whether the patient suffers from that disease or not. A feature, like a runny nose for instance, may appear to be very strong for one patient but moderate or even very light for another. It is the experience of the physician that tells him how to combine a set of symptoms (features and their strengths) to reach the correct diagnostic decision.

C. Fuzzy Logic

We consider a set of m diseases D, and define a collective set of n features F relevant to these diseases. Usually we have n >> m. Let:

$$D = \{d_1, d_2, d_3, \ldots, d_m\}$$

$$F = \{f_1, f_2, f_3, \ldots, f_n\}$$

To specify the symptoms of a patient, the patient is checked against all features in the set F and a value is assigned to each feature. The values are selected from the set:

{Very Low, Low, Moderate, High, Very High}

For example, a single symptom can be specified as $\langle \text{runny nose}, \text{Moderate} \rangle$. By checking the patient for all n features of the set F and assigning a proper value for each feature, the set of the patient's symptoms S is obtained as follows:

$$S = \{\langle f_1, v_1 \rangle, \langle f_2, v_2 \rangle, \langle f_3, v_3 \rangle, \ldots, \langle f_n, v_n \rangle\}$$

where $v_i$ is the value assigned to the feature $f_i$ when checking the patient, $i = 1, \ldots, n$.

IV. CONCLUSION

Clinical medicine is one of the most interesting areas in which data mining may have an important practical impact. The widespread availability of large clinical data collections enables thorough retrospective analysis, which may give healthcare institutions an unprecedented opportunity to better understand the nature and peculiarity of the underlying clinical processes. The present work analyzes patient symptom information, based on which a preliminary decision is taken to identify the chances of heart disease. The work falls under intelligent systems that can be adopted by a doctor. In this work we have used a parameter-based fuzzification that performs the analysis based on some parameters.

REFERENCES

[1] N. Aditya Sundar, "Performance analysis of classification data mining techniques over heart disease database", International Journal of Engineering Science & Advanced Technology (IJESAT), ISSN: 2250-3676.
[2] AH Chen, "HDPS: Heart Disease Prediction System", Computing in Cardiology 2011;38:557-560, ISSN 0276-6574.
[3] Mrs. G. Subbalakshmi, "Decision Support in Heart Disease Prediction System using Naive Bayes", Indian Journal of Computer Science and Engineering (IJCSE), ISSN 0976-5166, Vol. 2 No. 2, Apr-May 2011, 170-174.
[4] E. Barati, "A Survey on Utilization of Data Mining Approaches for Dermatological (Skin) Diseases Prediction", Journals in Science and Technology, Journal of Selected Areas in Health Informatics (JSHI), March Edition, 2011.
[5] Milan Kumari, "Comparative Study of Data Mining Classification Methods in Cardiovascular Disease Prediction", IJCST, ISSN: 2229-4333 (Print) | ISSN: 0976-8491.
[6] Jyoti Soni, "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction", International Journal of Computer Applications (0975-8887).
[7] Mr. Dhiraj Pandey, "Prediction system to support medical information system using data mining approach", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622.
[8] Jyoti Soni, "Intelligent and Effective Heart Disease Prediction System using Weighted Associative Classifiers", International Journal of Computer Applications (0975-8887), Volume 17, No. 8, March 2011.
[9] Dr. D. Raghu, "Probability based Heart Disease Prediction using Data Mining Techniques", IJCST, ISSN: 0976-8491 (Online) | ISSN: 2229-4333 (Print).
[10] Shantakumar B. Patil, "Intelligent and Effective Heart Attack Prediction System Using Data Mining and Artificial Neural Network", European Journal of Scientific Research, ISSN: 0975-3397, Vol. 3 No. 6, June 2011, 2385.
[11] M.A. Jabbar, "Knowledge discovery from mining association rules for heart disease prediction", Journal of Theoretical and Applied Information Technology, ISSN: 1992-8645, E-ISSN: 1817-3195, 2005.
[12] T Srinivasan, "Knowledge Discovery in Clinical Databases with Neural Network Evidence Combination".
[13] Sellappan Palaniappan, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol. 8 No. 8, August 2008.