A Data Mining Of Leukemia Cancer
Detection Using Genetic Algorithm and
Neural Network
Pawandeep (1), Arshdeep Singh (2)
(1) Department of Information Technology
(2) Department of Computer Science & Engineering
(1,2) Adesh Institute of Engineering & Technology, Faridkot
ABSTRACT
Medical imaging has become one of the most important interpretation and visualization methods in biology and medicine over the past decade. The most challenging aspect of medical imaging lies in the development of integrated systems for use in the clinical sector. Cancer is one of the diseases most feared by humans. Leukemia is a type of blood cancer, and if it is detected late, it can result in death. Leukemia occurs when the bone marrow produces a large number of abnormal white blood cells. In this paper, we propose a framework that uses data mining techniques to detect leukemia with a genetic algorithm and a neural network. In the proposed methodology, the genetic algorithm reduces the large leukemia data set (gene set) to a reduced gene set and evaluates the best fitness function. The neural network classifies the matched and unmatched values. Finally, the false rejection rate (FRR) and false acceptance rate (FAR) are evaluated as accuracy parameters.

Keywords: Leukemia, Cancer, Neural Network, Genetic Algorithm, Blood Cells.

1 INTRODUCTION
Data mining is the process of extracting valuable information from large datasets. It has grown rapidly over the past few years. Because data mining approaches are useful in health care, they have become an established technology in the healthcare domain, and this realization has led to an explosion of data mining approaches [1]. Medical data mining can exploit hidden patterns in voluminous medical data that would otherwise remain undiscovered. Data mining techniques applied to medical data include association rule mining for finding frequent patterns, prediction, classification and clustering.

Previous research on data mining in the medical field includes the following. Evans et al. [2] proposed a system based on data mining techniques to detect hereditary syndromes. Pradhan and Prabhakaran [3] proposed an association rule mining approach for mining high-dimensional, time-series medical data to discover high-confidence patterns. Doron Shalvi and Nicholas DeClaris [4] discussed medical data mining through unsupervised neural networks, together with a method for data visualization. They also emphasized the need for preprocessing prior to medical data mining. In 2000, Krzysztof J. Cios [5], a bioengineering professor, identified the need for data mining methods to mine medical multimedia content. Tsumoto [6] identified problems in medical data mining, including missing values, data storage with respect to temporal and multi-valued data, and the different medical coding systems used in Hospital Information Systems (HIS). Brameier and Banzhaf [7] explored and analyzed two programming models, neural networks and linear genetic programming, for medical data mining. Abidi and Hoe [8] proposed and implemented a symbolic rule extraction workbench for generating emerging rule-sets. Abidi et al. [9] explored the use of rule-sets produced by data mining for building rule-based expert systems. Olukunle and Ehikioya [10] proposed an algorithm for extracting association rules from medical image data. Association rule mining discovers frequently occurring items in a given dataset.

Traditionally, data mining techniques were used in various domains; they were introduced relatively late into the healthcare domain.

Blood is an essential part of the human body because it keeps a person alive. It performs many vital functions, such as transporting oxygen, carbon dioxide and minerals throughout the body to sustain metabolism. Blood consists of three main components: red blood cells (RBC), white blood cells (WBC) and platelets. An insufficient amount of blood can critically affect metabolism, which can be very hazardous if treatment is not given early. Leukemia is one of the common blood disorders and a common type of cancer in children. All cancers start in body cells, and leukemia is a type of cancer that starts in blood cells [11].

In this paper, we propose a framework that uses data mining techniques to detect leukemia with a genetic algorithm and a neural network. The next section gives a brief account of leukemia, its causes and risk factors, and its symptoms.

2 BACKGROUND OF LEUKEMIA
Leukemia is a type of cancer of the blood or bone marrow characterized by an abnormal increase of immature white blood cells called "blasts." It is a broad term covering a spectrum of diseases. According to the American Cancer Society, an estimated 48,610 people (27,880 men and 20,730 women) will be diagnosed with leukemia, and 23,720 men and women will die of it, in 2013 alone. In turn, leukemia is part of the even broader group of diseases affecting the blood, bone marrow and lymphoid system, which are all known as hematological neoplasms. Over time, leukemia cells can crowd out the normal blood cells. This can lead to serious problems such as anemia, bleeding and infections. Leukemia cells can also spread to the lymph nodes or other organs and cause swelling or pain. There are several different types of leukemia:
• Acute lymphoblastic leukemia, or ALL.
• Acute myelogenous leukemia, or AML.
• Chronic lymphocytic leukemia, or CLL.
• Chronic myelogenous leukemia, or CML.
In general, leukemia is grouped by how fast it gets worse and what kind of white blood cell it affects.
Acute lymphoblastic leukemia (ALL) is the most common type of leukemia in young children, while acute myelogenous leukemia (AML) occurs more often in adults than in children, and more often in men than in women [12]. The immature WBCs can also accumulate in various extramedullary sites, especially the meninges, gonads, thymus, liver, spleen and lymph nodes. Because of the excess of lymphoid or myeloid blasts in the marrow, these cells also flow into the peripheral blood stream. Acute myeloid leukemia (AML) is also known by other names, including acute myelocytic leukemia, acute myelogenous leukemia, acute granulocytic leukemia and acute non-lymphocytic leukemia. "Acute" means that this leukemia can progress rapidly if not treated, and would almost certainly be fatal within a few months. "Myeloid" refers to the type of cell from which this leukemia starts. In most cases AML develops from cells that would turn into white blood cells (other than lymphocytes), but in some cases AML develops in other types of blood-forming cells.

AML starts in the bone marrow (the soft inner part of certain bones, where new blood cells are made), but in most cases it quickly moves into the blood. It can sometimes spread to other parts of the body, including the lymph nodes, liver, spleen, central nervous system (brain and spinal cord) and testicles. Other types of cancer can start in these organs and then spread to the bone marrow, but cancers that start elsewhere and then spread to the bone marrow are not leukemias [13].

Diagnosis of leukemia is based on the fact that the white cell count is increased, with immature blast (lymphoid or myeloid) cells and decreased neutrophils and platelets. The presence of an excess number of blast cells in peripheral blood is a significant symptom of leukemia. Hematologists therefore routinely inspect blood smears under the microscope for proper identification and classification of blast cells [14].

2.1 Causes and Risk Factors of Leukemia
The exact causes of leukemia are unknown, and in most cases it is unclear why leukemia has developed. Research into possible causes is ongoing. Like other cancers, leukemia is not infectious and cannot be passed on to other people. Several factors may increase a person's risk of developing leukemia. Having a particular risk factor does not mean you will definitely get this type of disease, and people without any known risk factors can still develop it. The known risk factors for leukemia are explained here.
• Exposure to radiation: People who have been exposed to high levels of radiation, such as in nuclear industrial accidents, have a higher risk of developing leukemia than people who have not been exposed. However, only a small number of people in the UK will be exposed to radiation levels high enough to increase their risk.
• Smoking: Smoking increases the risk of developing leukemia. This may be due to the high levels of benzene in cigarette smoke.
• Exposure to benzene: In very rare cases, leukemia may occur due to long-term exposure to benzene (and possibly other solvents) used in industry.
• Cancer treatments: Occasionally, some anti-cancer treatments such as chemotherapy or radiotherapy can cause leukemia to develop some years after treatment. The risk increases when certain types of chemotherapy drugs are combined with radiotherapy. When leukemia develops because of earlier anti-cancer treatment, this is called secondary leukemia or treatment-related leukemia.
• Blood disorders: People with certain blood disorders, such as myelodysplasia or myeloproliferative disorders, have an increased risk of developing AML.
• Genetic disorders: People with certain genetic disorders, including Down's syndrome and Fanconi's anemia, have an increased risk of developing leukemia.

2.2 Symptoms
• Looking pale and feeling tired and breathless, which is due to anemia caused by a lack of red blood cells.
• Having more infections than usual, because of a lack of healthy white blood cells.
• Abnormal bleeding caused by too few platelets. This may include bruising (bruises may appear without any apparent injury), heavy periods in women, bleeding gums, nosebleeds, and blood spots or rashes on the skin (petechiae).
• Feeling generally unwell and run down.
• Having a fever and sweats, which may be due to an infection or the leukemia itself.
• A new lump or swollen gland in your neck, under your arm, or in your groin.
• Frequent fevers.
• Bone pain.
• Unexplained appetite loss or recent weight loss.
• Swelling and pain on the left side of the belly.

Other, less common symptoms may be caused by a build-up of leukemia cells in a particular area of the body. Your bones may ache because of the pressure from a build-up of immature cells in the bone marrow. You might also notice raised, bluish-purple areas under the skin due to leukemia cells in the skin, or swollen gums caused by leukemia cells in the gums [12]. Cancer starts when cells in a part of the body begin to grow out of control and can spread to other areas of the body.

3 PROPOSED FRAMEWORK
3.1 Methodology
Step 1 : Start by uploading the samples, which are in the form of a database file. Select the database file and upload it through the GUI homepage.
Step 2 : The uploaded file contains the gene information of patients who have leukemia.
Step 3 : After the data has been uploaded, create a graphical representation of the data for visualization.
Step 4 : Apply the genetic algorithm (GA) to reduce the gene set.
Step 5 : To reduce the gene set, carry out the following GA steps (Steps 6 to 10).
Step 6 : Take the total gene information as the population.
Step 7 : Apply the fitness function to the gene data.
Step 8 : Select a sub-population and evaluate a new fitness function.
Step 9 : Finally, evaluate the best fitness function.
Step 10 : The data set is reduced; the result is known as the reduced gene set.
Step 11 : Classification is then done using a neural network trained with back propagation.
Step 12 : Matched and unmatched values are categorized.
Step 13 : Compare the matched and unmatched values and evaluate the accuracy using accuracy parameters.
Step 14 : The accuracy parameters, false acceptance rate (FAR) and false rejection rate (FRR), are calculated.

3.2 Flowchart
Fig.1 Flowchart of Methodology
The flowchart contains the methodology for reducing the large leukemia data set (gene set) to a reduced gene set using the genetic algorithm. The genetic algorithm calculates the best fitness function. The neural network classifies the matched and unmatched values. Finally, FRR and FAR are evaluated as accuracy parameters.
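The GA steps of the methodology (population, fitness function, sub-population selection, best fitness, reduced gene set) can be illustrated with a minimal sketch. The paper does not specify its fitness function, encoding or GA parameters, so everything below is an assumption made for illustration: a chromosome is a binary mask over genes, the hypothetical `fitness` rewards genes whose mean expression differs between the two classes and penalizes large gene sets, and `select_genes`, `pop_size`, `generations` and `mutation_rate` are illustrative names and values.

```python
import random

def fitness(mask, samples, labels):
    """Hypothetical fitness: average class-mean separation of the selected
    genes, minus a small penalty per selected gene (not from the paper)."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return float("-inf")  # an empty gene set is never acceptable
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    separation = 0.0
    for j in selected:
        mean_pos = sum(s[j] for s in pos) / len(pos)
        mean_neg = sum(s[j] for s in neg) / len(neg)
        separation += abs(mean_pos - mean_neg)
    return separation / len(selected) - 0.01 * len(selected)

def select_genes(samples, labels, pop_size=20, generations=30,
                 mutation_rate=0.05, seed=0):
    """Return the indices of the genes kept by the best chromosome found."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    score = lambda m: fitness(m, samples, labels)
    # Initial population: random binary masks over all genes.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        # Sub-population: the fitter half becomes the parent pool.
        parents = sorted(population, key=score, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                 # bit-flip mutation
            children.append(child)
        population = children
        generation_best = max(population, key=score)
        if score(generation_best) > score(best):
            best = generation_best
    return [j for j, bit in enumerate(best) if bit]

# Toy data: 4 samples x 4 genes; label 1 = leukemia, 0 = healthy (made up).
samples = [[5.0, 1.0, 0.2, 0.1], [5.2, 0.9, 0.1, 0.3],
           [1.0, 1.1, 0.2, 0.2], [0.8, 1.0, 0.3, 0.1]]
labels = [1, 1, 0, 0]
reduced_gene_set = select_genes(samples, labels)
```

The returned index list plays the role of the reduced gene set that is passed on to the neural network; a real system would replace the toy fitness with whatever criterion the authors used.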
Fig.2 GUI Panel
The above figure shows the graphical user interface panel of the proposed system. The panel has GUI controls, including buttons to upload the data set, apply the genetic algorithm, initialize the neural network and test the data. These push buttons are labeled "click here to upload data set", "apply genetic algorithms", "initialize neural network" and "test leukaemia data". Code information is displayed at the bottom left.

The microarray gene classification technique involves three major steps: (i) dimensionality reduction, (ii) feature selection and (iii) gene classification. The GA performs the dimensionality reduction to obtain a smaller dataset. Initially, dimensionality reduction is carried out on the microarray cancer gene dataset to reduce the complexity of gene classification. This step is needed because the dataset is high dimensional, which increases the processing time and does not produce accurate classification results. The fitness function is used to choose the best chromosomes among the generated chromosomes.

The next step is classification of the reduced set using the neural network (NN). Based on the values acquired in the training phase, the performance of the NN is analyzed to obtain appropriate values for the testing phase. To find the optimum structure, the NN performance is analyzed for the optimum number of hidden nodes and epochs. First, the number of epochs is set to a fixed value and the NN is trained over an appropriate range of hidden nodes. The number of hidden nodes that gives the best performance is then selected as the optimum. After that, with the number of hidden nodes fixed at its optimum, the epochs are analyzed in a similar way to obtain the number of epochs that gives the highest accuracy.
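The two-stage search over hidden nodes and epochs can be sketched as follows. This is only an illustration under stated assumptions: the paper appears to use a neural network toolbox whose settings are not given, so the sketch substitutes a small one-hidden-layer network with sigmoid units trained by plain back propagation, and it scores candidates on a held-out validation set. The names `train_network`, `accuracy` and `tune`, the learning rate and the candidate ranges are all hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def forward(model, x):
    """Return hidden activations and the single output for input list x."""
    w_hidden, w_out = model
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, out

def train_network(samples, labels, hidden, epochs, lr=0.5, seed=0):
    """One-hidden-layer network, squared-error loss, online back propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # The last weight in every row is the bias.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h, out = forward((w_hidden, w_out), x)
            delta_out = (out - y) * out * (1.0 - out)
            # Hidden layer first, so it uses the output weights before update.
            for k in range(hidden):
                delta_h = delta_out * w_out[k] * h[k] * (1.0 - h[k])
                for i, v in enumerate(x + [1.0]):
                    w_hidden[k][i] -= lr * delta_h * v
            for k, v in enumerate(h + [1.0]):
                w_out[k] -= lr * delta_out * v
    return w_hidden, w_out

def accuracy(model, samples, labels):
    hits = sum((forward(model, x)[1] >= 0.5) == (y == 1)
               for x, y in zip(samples, labels))
    return hits / len(labels)

def tune(train_set, val_set, hidden_range, epoch_range, fixed_epochs):
    """Stage 1: fix the epochs, pick the best hidden-node count.
    Stage 2: fix that count, pick the best number of epochs."""
    (xt, yt), (xv, yv) = train_set, val_set
    best_hidden = max(hidden_range, key=lambda h: accuracy(
        train_network(xt, yt, h, fixed_epochs), xv, yv))
    best_epochs = max(epoch_range, key=lambda e: accuracy(
        train_network(xt, yt, best_hidden, e), xv, yv))
    return best_hidden, best_epochs

# Toy data (made up); a real run would use the reduced gene set.
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0, 1, 1, 1]
best_hidden, best_epochs = tune((xs, ys), (xs, ys), [1, 2, 3], [50, 100],
                                fixed_epochs=100)
```

Tuning one parameter at a time, as the text describes, is cheaper than a full grid search but can miss combinations that only work well together; with ties, `max` keeps the first candidate in the range.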
4 RESULTS & IMPLEMENTATION
Fig.3 GUI upload data interface
The above figure shows the data uploading process and the graphical representation of the genes, with the rows and columns being processed shown with respect to the total number of rows and columns. The graph plots each feature count against gene value. The data set values are loaded into the edit control of the GUI, as shown in the above figure.
• The figure shows the "upload data set" push button.
• The code information box consists of the total number of genes, the total number of features, the gene number being processed and the feature number being evaluated.
• The graph represents gene value against gene feature count; according to this graph, large gene values have the best fitness function.
• The graphical representation of the gene data set provides visualization of the data set.

Fig.4 Genetic algorithm graphs
The above figure shows the gene values with respect to gene number after applying the genetic algorithm. The genetic algorithm is an optimization algorithm, applied here to optimize the data set.
• After applying the GA, the rows and columns are visualized in the reduced feature set.
• The number of genes is reduced from 48 to approximately 33.
• The fitness function is evaluated for the reduction of the gene set.
• The GA steps are followed.
• The reduced data set is evaluated using the GA.

Fig.5 Neural Network
The above figure shows the neural network toolbox window with the neural structure, which consists of an input layer, hidden-layer neurons with synaptic weights, and an output layer. Quantities such as the gradient and validation checks are obtained after applying the neural network. The neural network classifies the matched and unmatched values. Accuracy parameters such as FRR and FAR are calculated.

Fig.6 False Acceptance rate
The above figure shows the false acceptance rate, which has a value of 0.002463; it should be low to obtain appropriate output.
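For reference, FAR, FRR and accuracy can be computed from the matched and unmatched decisions as in the minimal sketch below. The paper does not give its formulas, so the definitions here are the conventional ones and are an assumption: FAR is the fraction of true negatives that the classifier accepts, FRR is the fraction of true positives that it rejects, and accuracy is the percentage of correct decisions. The function name `far_frr_accuracy` and the toy labels are illustrative.

```python
def far_frr_accuracy(actual, predicted):
    """actual / predicted: 1 = leukemia (match), 0 = not leukemia (no match)."""
    pairs = list(zip(actual, predicted))
    false_accepts = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_rejects = sum(1 for a, p in pairs if a == 1 and p == 0)
    negatives = sum(1 for a in actual if a == 0)
    positives = sum(1 for a in actual if a == 1)
    far = false_accepts / negatives if negatives else 0.0
    frr = false_rejects / positives if positives else 0.0
    accuracy = 100.0 * sum(1 for a, p in pairs if a == p) / len(pairs)
    return far, frr, accuracy

# Toy example (made-up labels): one false reject and one false accept.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
far, frr, acc = far_frr_accuracy(actual, predicted)
# far == 0.25, frr == 0.25, acc == 75.0
```

Low FAR and FRR values correspond to few wrongly accepted and wrongly rejected samples, which is why both are expected to be as small as possible.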
Fig.7 False Rejection rate
The above figure shows the false rejection rate, which has a value of 0.00045437; it should be as low as possible to obtain efficient output.

Fig.8 Accuracy
The above figure shows the accuracy, which has a value of 99.7181 and is calculated on the basis of the false acceptance rate and the false rejection rate.

Fig.9 FAR, FRR and Accuracy Graph
The above bar graph shows the results obtained from the proposed system for the FAR, FRR and accuracy parameters.

5 CONCLUSION & FUTURE SCOPE
Leukemia is a cancer of the marrow and blood. It adversely affects the formation and normal function of blood tissues and cells. NNs are particularly attractive for diagnostic problems that have no linear solution. Physicians usually analyze the clinical and laboratory symptoms of blood cancer qualitatively and finally use bone marrow biopsies as a better procedure for assessing the nature of the disease. Precise and reliable detection of cancer therefore requires more paraclinical tests, more cost and more time. In this investigation we apply simple, early clinical assessment for proper detection of leukemia. By using a trained ANN and a genetic algorithm, we can predict cancer with few clinical and laboratory tests and without requiring much time. The accuracy of cancer detection by the assembled artificial neural network was analyzed by ROC and regression analysis. The outputs of the trained ANN for the testing data were used to plot the graphs.

Future work will include increasing the number of records in the dataset used for training in order to obtain more accurate results; the network will be able to learn more effectively with a larger number of records. In addition, larger datasets can be obtained and applied, and the approach can be tested to provide higher accuracy. Different neural networks and other classification techniques can also be tried to obtain better results. The use of support vector machines as a classification tool will be considered in future work.

REFERENCES
[1] Li, Eldon Y. "Artificial neural networks and their business applications." Information & Management 27.5 (1994): 303-313.
[2] Xue-wen Chen and Michael McKee, "Finding expressed genes using genetic algorithms and support vector machines", Department of Electrical and Computer Engineering, California State University, 18111 Nordhoff Street, Northridge, CA 91330, USA, 2003.
[3] Marc C. Chamberlain, M.D., "Leukemia and the Nervous System", 2003.
[4] S. H. Rezatofighi et al., "A New Approach to White Blood Cell Nucleus Segmentation Based on Gram-Schmidt Orthogonalization", International Conference on Digital Image Processing, 2005.
[5] Huerta, Edmundo Bonilla, Béatrice Duval, and Jin-Kao Hao. "A hybrid GA/SVM approach for gene selection and classification of microarray data." Applications of Evolutionary Computing. Springer Berlin Heidelberg, 2006. 34-44.
[6] M. Pei, "Feature Extraction Using Genetic Algorithms", Case Center for Computer-Aided Engineering and Manufacturing, Department of Computer Science, Genetic Algorithms Research and Applications Group, 2006.
[7] Yuh-Jye Lee and Chia-Huang Chao, "A Data Mining Application to Leukemia Microarray Gene Expression Data Analysis", Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, No. 43, Sec. 4, Keelung Rd., Taipei, 106, Taiwan, 2007.
[8] Alba, Enrique, et al. "Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms." Evolutionary Computation, 2007. CEC 2007. IEEE Congress on. IEEE, 2007.
[9] Nipon Theera-Umpon, "Morphological Granulometric Features of Nucleus in Automatic Bone Marrow White Blood Cell Classification", IEEE Transactions on Information Technology in Biomedicine, Vol. 11, No. 3, May 2007.
[10] Yvan Saeys et al., "A review of feature selection techniques in bioinformatics", Vol. 23 no. 19 2007, pages 2507–2517.
[11] Raje, Chaitali, and Jyoti Rangole. "Detection of Leukemia in microscopic images using image processing." Communications and Signal Processing (ICCSP), 2014 International Conference on. IEEE, 2014.
[12] El-Nasser, Ahmed Abd, Mohamed Shaheen, and Hesham El-Deeb. "Enhanced leukemia cancer classifier algorithm." Science and Information Conference (SAI), 2014. IEEE, 2014.
[13] Zadeh, Hossein Ghayoumi, Siamak Janianpour, and Javad Haddadnia. "Recognition and Classification of the Cancer Cells by Using Image Processing and Lab VIEW." International Journal of Computer Theory and Engineering (2013).
[14] Mohapatra, Subrajeet, et al. "Fuzzy based blood image segmentation for automated leukemia detection." Devices and Communications (ICDeCom), 2011 International Conference on. IEEE, 2011.