A Fuzzification Approach for Prediction of Heart Disease

International Journal of Engineering Trends and Technology (IJETT) - Volume 4, Issue 5 - May 2013
ISSN: 2231-5381, http://www.ijettjournal.org
Nitika, Madan Lal Yadav
Department of CSE, ASET, Amity University, Noida, Uttar Pradesh, India
Abstract: Data mining operations and approaches improve on statistical methods by enabling a user to analyze future behavior based on a current dataset. One such analysis provided by data mining approaches is prediction based analysis. In the present work, a heart disease prediction system is designed. Heart disease prediction is an expert system application that requires an authenticated dataset to process. A fuzzy based soft computing approach is implemented on multiple parameters to predict heart disease. In this paper, earlier work in the area of medical disease prediction is studied, and a new fuzzy rule based approach is suggested to perform heart disease prediction.
Keywords: Fuzzy System, Heart Disease, Rule Based, Dataset, Data Mining, Prediction based.
I. INTRODUCTION
Prediction based systems are a major challenge for data mining approaches, in which the analysis of current data is used to identify future trends. The challenge becomes more critical in medical disease prediction and analysis. The medical field is one of the major research areas for data mining, but it is a critical one because it requires expert input[1].
The involvement of data mining approaches in the health care industry is substantial. A number of health care organizations and medical industries use these mining approaches and analyses to derive effective results. There are a number of trends and patterns in medical data that can be used to analyze the patient's situation as well as to predict the disease and the diagnosis[4,6].
The influence of data mining on the quality of health care cannot be overstated. All health care organizations retain detailed and comprehensive records of patient data. Trends and patterns identified in these records can positively impact the quality of health care. The huge amount of patient data makes identification of these trends an arduous task. However, data mining applications built for this purpose can simplify the task and produce efficient results.
There have been several cases where the application of data mining techniques has helped resolve a problem in the health industry. For instance, data mining on pneumonia patient records in a hospital showed that patients who were administered medication immediately on arrival responded better than patients who were not. To arrive at this conclusion, the data mining application used several inputs, such as the tests and other information of the patients who showed better medication results. Various relations were drawn between the inputs. One of these was the relation between the results and the time taken to administer medication after arrival. It was found that the shorter the time, the better the result[10,11].
Several other key issues were addressed at the same time. The data mining analysis showed that several largely extraneous tests were conducted on the patients. These delayed the administration of medication and thereby affected the recovery of the patient. To overcome this, a standardized plan was created to treat pneumonia patients. Identifying these associations between inputs and finding the resulting best outcome was possible only because of data mining techniques. Some of the most used data mining techniques for medical data analysis are given below.
A. Data Mining Techniques
1) Association: Association is one of the basic and most widely used data mining terms. Association mining identifies the relationships between the attributes of the dataset as well as the relationships between the records. It is one of the major modeling approaches, helpful for identifying the valid dataset at the initial stage and for removing the records or attributes that have little association. Association mining is also used as a pre-processing stage that filters the dataset by removing impurities and keeping the most valuable data values. The use of association mining is not limited to this, however. It is also appropriate in other mining operations such as classification[13].
2) Clustering: When the dataset is large, instead of processing it as a single unit, the dataset is subdivided into smaller units called clusters. The clustering is done in a systematic way so that similar data items are kept in the same cluster. The similarity between data items is analyzed using a distance based measure such as the Euclidean distance. There are a number of clustering approaches, such as C-Means and K-means clustering[12].
3) Data Visualization: Visualization is the approach of presenting results in an effective way so that any stakeholder can easily draw conclusions by viewing them. The data is transformed into pictorial forms such as graphs, tables and charts. This is a management level presentation approach for representing data conclusions.
4) Decision Tree: A decision tree, as the name suggests, is a tree based approach in which decisions are represented by the parent nodes and the associated events are represented by the child nodes. This kind of algorithm is used mainly to perform data classification. The decision to accept or reject the data is taken under a rule defined at the parent node.
5) Linear Regression: This is another statistical approach that works as a filtration as well as an analytical approach for predicting data values. Regression is the analysis of one attribute with respect to one or more other attributes of the same dataset.
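As a small illustration of the regression idea in item 5, the sketch below fits a straight line y = intercept + slope * x by ordinary least squares, relating one attribute to another. It is only a minimal example and is not part of the proposed system; the function names, the attribute names (age, systolic blood pressure) and the numbers are hypothetical illustration values, not data from this paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Predict y for a new x from a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical records: age versus systolic blood pressure (made-up numbers).
ages = [30, 40, 50, 60]
pressures = [115, 120, 125, 130]
model = fit_line(ages, pressures)
estimate = predict(model, 70)
```

Here one attribute (blood pressure) is analyzed with respect to a single other attribute (age). With several explanatory attributes the same idea extends to multiple regression, which would normally be handled by a numerical library.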
II. LITERATURE SURVEY
N. Aditya Sundar presented a study of the different classification approaches associated with heart disease. The paper describes two of the most effective techniques, Naïve Bayes and WAC (weighted associative classifier). These classification approaches can answer complex questions and queries effectively. The author used a dataset in which the analysis is based on age, gender, blood pressure and blood sugar. The author analyzed the performance of the defined approaches, used them to predict the patient's disease, and applied different performance measures in the analysis[1].
Another work in the same direction was performed by Chen, who defined a system that can help medical professionals identify the heart disease status of a patient. The author described each processing stage broadly and organized the work in three layers. In the first layer, the important features of the patient are selected. Once the features are selected, a neural network based classification approach is applied in the second layer to classify the heart disease. At the final stage, the author defined an analysis to identify the chances of heart disease as well as its criticality for a particular patient[2].
Another work, on a decision support system for heart disease prediction, was performed by Mrs. G. Subbalakshmi. In her research work, the authenticated dataset is derived under the standard parameters related to heart disease. The author designed a questionnaire based web application to obtain the views of different experts and medical students. Based on a comparative analysis of the obtained dataset, the disease related conclusions are drawn[3].
Another work was performed by E. Barati to predict skin disease. The author presented a survey of the different approaches to disease prediction and elaborated on skin disease classification and prediction. The author defined different classification approaches for the prediction and classification of the diseases and also suggested the related diagnosis[4].
Another survey was performed by Milan Kumari on different classification approaches in cardiovascular disease prediction. The author studied different classifiers such as the neural network, the support vector machine and regression analysis, and also discussed the different analytical approaches based on similarity measures. The comparison is based on sensitivity, accuracy, error rate, etc., and the comparative analysis is used to assess the performance of all these approaches[5].
Another work in the same direction, to predict heart disease, was performed by Jyoti Soni. The paper performed knowledge discovery under different mining techniques related to the medical area. The author performed heart disease prediction and analysis under all these approaches and drew the relative conclusions based on the disease prediction[6][8].
A work related to medical disease prediction was presented by Dheeraj Dixit. In this approach, different symptoms and the disorder analysis on the medical database are discussed for the prediction. A hybrid model is defined that includes association rule mining and the relative analysis to perform the disease prediction[7]. A probabilistic analysis of heart disease prediction was defined by Dr. D. Raghu. In this work, a decision support system is presented for heart disease prediction, and a probabilistic analysis is performed by the author. Medical disease prediction is discussed under different attributes such as age, sex, blood pressure and blood sugar. The author performed the heart disease prediction analysis under the defined approaches[9].
Shantakumar B. Patil presented an intelligent and effective heart attack prediction system using a neural network based approach. The author performed the classification by training on the dataset with a neural network and then identifying the possible patterns. These patterns are then studied and discussed to perform the analysis for heart attack prediction. The author defined a multi-layer perceptron based training algorithm to perform the disease analysis. The results obtained from the system show effective prediction of heart disease[10]. Another work, based on association mining, was proposed by Jabbar to discover heart disease in an authenticated dataset. The results obtained from the system show the reliability of the work[11].
III. PROPOSED APPROACH
Heart disease prediction is one of the major research areas in medical disease analysis. Such systems are implemented under expert advice and require authentication at each stage of the work. In the present work, a fuzzy rule based system is designed to provide a probabilistic model so that the predictive analysis can be derived from the approach easily. This methodology is effective for deriving the predictive analysis on a single record as well as on a large dataset. The fuzzy based approach is used to predict the patient's disease. As it is an intelligent soft computing approach, it can represent the probabilistic relation based on the patient symptom analysis. As the work is rule based, the interrelated variables can be estimated easily, which helps in understanding the approach followed by the fuzzy analysis.
The validity of the algorithm implementation depends on the validity of the collected dataset. The first step is to define a valid dataset with a large amount of data. In the present work, we define a patient dataset that contains the patient's basic details as well as the patient's symptoms. The tuples of the database are divided into partitions, not necessarily of equal size. Once the normalized dataset is obtained, the next task is to apply the fuzzy rules to this dataset to perform the classification. The fuzzy system performs the work in three main steps by acquiring the domain knowledge:
1) Identify the attributes or variables that are most associated with some event.
2) Identify the relationships between these attributes so that the flow can be defined.
3) Define a probabilistic decision over the variables defined in step 1 so that the link identified in step 2 will be activated.
[Figure 1 flowchart: Input data set -> Identify the most related symptom attributes -> Fuzzify the dataset under the defined ruleset -> Recognize the patient disease based on fuzzy rules and operators -> Predict the disease criticality based on disease analysis]
Figure 1: Proposed Model
Figure 1 shows the overall model proposed in this work. The work begins with the input dataset. Once the dataset is obtained, the next task is to identify the associated attributes that play an important role in heart disease prediction. This attribute recognition is done based on expert input. Once the attributes are identified, the next task is to generate the fuzzy rules on each attribute individually. These rules are then applied to the dataset to obtain the disease prediction as well as the disease criticality prediction.
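The flow described above can be illustrated with a minimal sketch: each selected attribute is fuzzified individually into the five linguistic values used in this paper, and the resulting degrees are combined with fuzzy operators. The paper does not specify membership functions, attribute ranges or rules, so everything concrete here is an assumption made for illustration: the triangular membership shape, the evenly spaced peaks, the min/max operators, the attribute names, the numeric ranges in `RANGES` and the single rule in `risk_degree`. None of these are clinical thresholds.

```python
LABELS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def triangular(x, left, peak, right):
    """Triangular membership: 0 at or beyond left/right, 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(value, low, high):
    """Map a crisp value in [low, high] to a degree for each label.

    Assumption: the five label peaks are evenly spaced over the range.
    """
    step = (high - low) / (len(LABELS) - 1)
    value = min(max(value, low), high)  # clamp out-of-range readings
    degrees = {}
    for i, label in enumerate(LABELS):
        peak = low + i * step
        degrees[label] = triangular(value, peak - step, peak, peak + step)
    return degrees


def fuzzy_and(*degrees):
    """Fuzzy AND taken as the minimum (one common choice)."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Fuzzy OR taken as the maximum (one common choice)."""
    return max(degrees)


# Hypothetical attribute ranges, chosen only for the example.
RANGES = {"systolic_bp": (90, 190), "cholesterol": (120, 320)}


def risk_degree(patient):
    """Hypothetical rule: IF blood pressure is high AND cholesterol is high."""
    bp = fuzzify(patient["systolic_bp"], *RANGES["systolic_bp"])
    chol = fuzzify(patient["cholesterol"], *RANGES["cholesterol"])
    bp_high = fuzzy_or(bp["High"], bp["Very High"])
    chol_high = fuzzy_or(chol["High"], chol["Very High"])
    return fuzzy_and(bp_high, chol_high)


degree = risk_degree({"systolic_bp": 152.5, "cholesterol": 270})
```

In this sketch a blood pressure of 152.5 lies halfway between the assumed "Moderate" and "High" peaks, so it belongs to each with degree 0.5, and the rule fires with the smaller of the two antecedent degrees. A real rule base would be defined with expert input, as the text requires.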
A. Strengths of system
• User friendly environment to work with a user defined dataset
• Can work on an authenticated dataset by loading the dataset
• Fuzzification of the input data is performed for each record
• The fuzzification process is represented using a fuzzy based graphical representation
• The fuzzification is performed on each individual attribute
• The selection of patients is based on the input symptom criteria
• The symptom as well as the symptom criticality is considered in the work
• More than one symptom is taken collectively under different fuzzy operators
• The results are represented based on the fuzzy query as well as the standard SQL query
• The criticality levels of all symptoms are collected to identify the overall prediction of heart disease
B. Symptoms considered
Fuzzy based association mining starts from Boolean values, which can be either true or false. For instance, if a patient suffering from a high fever has a high temperature, the truth value is 1, and if not, it is 0. If the value is intermediate, so that it is neither true nor false, it takes the probability of both conditions. This approach is applied to every attribute of a patient so that identification and diagnosis of the disease become easier. In this manner we classify the whole database using the fuzzy approach on attributes such as age, weight, blood pressure and other medical terms, using the probability of the true and false values.
Medical diagnosis usually involves careful examination of a patient to check the presence and strength of features relevant to a suspected disease, in order to decide whether the patient suffers from that disease. A feature such as a runny nose may appear very strong in one patient but moderate or even very light in another. It is the experience of the physician that tells them how to combine a set of symptoms (features and their strengths) to reach the correct diagnostic decision.
C. Fuzzy Logic
We consider a set of m diseases D and define a collective set of n features F relevant to these diseases. Usually we have n >> m. Let:
D = {d1, d2, d3, …, dm}
F = {f1, f2, f3, …, fn}
To specify the symptoms of a patient, the patient is checked against all features in the set F and a value is assigned to each feature. The values are selected from the set:
{Very Low, Low, Moderate, High, Very High}
For example, a single symptom can be specified as <runny nose, Moderate>. By checking the patient for all n features of the set F and assigning a proper value to each feature, the set of the patient's symptoms S is obtained as follows:
S = {<f1, v1>, <f2, v2>, <f3, v3>, …, <fn, vn>}
where vi is the value assigned to the feature fi when checking the patient, i = 1, …, n.
IV. CONCLUSION
Clinical medicine is one of the most interesting areas in which data mining may have an important practical impact. The widespread availability of large clinical data collections enables thorough retrospective analysis, which may give healthcare institutions an unprecedented opportunity to better understand the nature and peculiarities of the ongoing clinical processes. The present work analyzes patient symptom information, based on which a preliminary decision is taken to identify the chances of heart disease. The work is an intelligent system that can be adopted by a doctor. In this work we have used a parameter based fuzzification that performs the analysis based on some parameters.
REFERENCES
[1] N. Aditya Sundar, "Performance analysis of classification data mining techniques over heart disease database", International Journal of Engineering Science & Advanced Technology (IJESAT), ISSN: 2250-3676.
[2] A. H. Chen, "HDPS: Heart Disease Prediction System", Computing in Cardiology 2011;38:557-560, ISSN 0276-6574.
[3] Mrs. G. Subbalakshmi, "Decision Support in Heart Disease Prediction System using Naive Bayes", Indian Journal of Computer Science and Engineering (IJCSE), ISSN 0976-5166, Vol. 2 No. 2, Apr-May 2011, 170-174.
[4] E. Barati, "A Survey on Utilization of Data Mining Approaches for Dermatological (Skin) Diseases Prediction", Journals in Science and Technology, Journal of Selected Areas in Health Informatics (JSHI), March Edition, 2011.
[5] Milan Kumari, "Comparative Study of Data Mining Classification Methods in Cardiovascular Disease Prediction", IJCST, ISSN: 2229-4333 (Print) | ISSN: 0976-8491.
[6] Jyoti Soni, "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction", International Journal of Computer Applications (0975-8887).
[7] Mr. Dhiraj Pandey, "Prediction system to support medical information system using data mining approach", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622.
[8] Jyoti Soni, "Intelligent and Effective Heart Disease Prediction System using Weighted Associative Classifiers", International Journal of Computer Applications (0975-8887), Volume 17, No. 8, March 2011.
[9] Dr. D. Raghu, "Probability based Heart Disease Prediction using Data Mining Techniques", IJCST, ISSN: 0976-8491 (Online) | ISSN: 2229-4333 (Print).
[10] Shantakumar B. Patil, "Intelligent and Effective Heart Attack Prediction System Using Data Mining and Artificial Neural Network", European Journal of Scientific Research, ISSN: 0975-3397, Vol. 3 No. 6, June 2011, 2385.
[11] M. A. Jabbar, "Knowledge discovery from mining association rules for heart disease prediction", Journal of Theoretical and Applied Information Technology, ISSN: 1992-8645, E-ISSN: 1817-3195, 2005.
[12] T. Srinivasan, "Knowledge Discovery in Clinical Databases with Neural Network Evidence Combination".
[13] Sellappan Palaniappan, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol. 8 No. 8, August 2008.