A Data Mining of Leukemia Cancer Detection Using Genetic Algorithm and Neural Network

Pawandeep1, Arshdeep Singh2
Department of Information Technology
Department of Computer Science & Engineering
1,2 Adesh Institute of Engineering & Technology, Faridkot

ABSTRACT

Medical imaging has become one of the most important visualization and interpretation methods in biology and medicine over the past decade. The most challenging aspect of medical imaging lies in the development of integrated systems for use in the clinical sector. Cancer is one of the diseases most feared by humans. Leukemia is a type of blood cancer, and if it is detected late, it can result in death. Leukemia occurs when the bone marrow produces a large number of abnormal white blood cells. In this paper, we propose a framework that uses data mining techniques to detect leukemia with a genetic algorithm and a neural network. In the proposed methodology, the genetic algorithm reduces the large leukemia data set (gene set) to a reduced gene set and also calculates the best fitness function. The neural network then classifies the matched and unmatched values. Finally, accuracy parameters are used to assess the result: the false rejection rate (FRR) and the false acceptance rate (FAR) are evaluated.

Keywords: Leukemia, Cancer, Neural Network, Genetic Algorithm, Blood Cells.

1 INTRODUCTION

Data mining is the process of extracting valuable information from large datasets. It has grown rapidly over the past few years. Because data mining approaches are useful in the health sector, they have become a well-regarded technology in the healthcare domain. This realization has led to an explosion of data mining approaches [1]. Medical data mining can exploit the hidden patterns present in voluminous medical data, which would otherwise be left undiscovered. Data mining techniques applied to medical data include association rule mining for finding frequent patterns, prediction, classification, and clustering.

The research work done in medical data mining is summarized as follows. Evans et al. [2] proposed a system based on data mining techniques to detect hereditary syndromes. Pradhan and Prabhakaran [3] proposed an approach based on association rule mining to mine high-dimensional, time-series medical data for discovering high-confidence patterns. Doron Shalvi and Nicholas DeClaris [4] discussed medical data mining through unsupervised neural networks, together with a method for data visualization. They also emphasized the need for preprocessing prior to medical data mining. In 2000, Krzysztof J. Cios [5], a bioengineering professor, identified the need for data mining methods to mine medical multimedia content. Tsumoto [6] identified problems in medical data mining. These problems include missing values, data storage with respect to temporal and multi-valued data, and the different medical coding systems used in Hospital Information Systems (HIS). Brameier and Banzhaf [7] explored and analyzed two programming models, neural networks and linear genetic programming, for medical data mining. Abidi and Hoe [8] proposed and implemented a symbolic rule extraction workbench for generating emerging rule-sets. Abidi et al. [9] explored the use of rule-sets produced by data mining for building rule-based expert systems. Olukunle and Ehikioya [10] proposed an algorithm for extracting association rules from medical image data. Association rule mining discovers frequently occurring items in the given dataset.

All cancers start in body cells, and leukemia is a type of cancer that starts in blood cells [11]. In this paper, we propose a framework that uses data mining techniques to detect leukemia with a genetic algorithm and a neural network. The next section gives a brief account of leukemia, its symptoms, its causes, and the associated risk factors.
Traditionally, data mining techniques were used in various domains; however, they were introduced relatively late into the healthcare domain.

2 BACKGROUND OF LEUKEMIA

Blood is an essential part of the human body since it keeps one alive. It performs many vital functions, such as transporting oxygen, carbon dioxide, minerals, etc. throughout the body to maintain metabolism. Blood consists of three main components: red blood cells (RBC), white blood cells (WBC), and platelets. An insufficient amount of blood can critically affect metabolism, which can be very hazardous if early treatment is not given. Leukemia is one of the common blood disorders, and it is the most common type of cancer in children.

Leukemia is a type of cancer of the blood or bone marrow characterized by an abnormal increase of immature white blood cells called "blasts." It is a broad term covering a spectrum of diseases. In turn, it is part of the even broader group of diseases affecting the blood, bone marrow, and lymphoid system, which are all known as hematological neoplasms. According to the American Cancer Society, an estimated 48,610 persons (27,880 men and 20,730 women) will be diagnosed with leukemia, and 23,720 men and women will die of it, in 2013 alone.

Over time, leukemia cells can crowd out the normal blood cells. This can lead to serious problems such as anemia, bleeding, and infections. Leukemia cells can also spread to the lymph nodes or other organs and cause swelling or pain. There are several different types of leukemia:

Acute lymphoblastic leukemia, or ALL.
Acute myelogenous leukemia, or AML.
Chronic lymphocytic leukemia, or CLL.
Chronic myelogenous leukemia, or CML.

In general, leukemia is grouped by how fast it gets worse and what kind of white blood cell it affects. Acute Lymphoblastic Leukemia (ALL) is the most common type of leukemia in young children, while Acute Myelogenous Leukemia (AML) occurs more often in adults than in children, and more often in men than in women [12]. The immature WBCs can also accumulate in various extramedullary sites, especially the meninges, gonads, thymus, liver, spleen, and lymph nodes. Because of the excess of lymphoblasts or myeloblasts in the marrow, these cells also flow into the peripheral blood stream.

Acute myeloid leukemia (AML) is also known by other names, including acute myelocytic leukemia, acute myelogenous leukemia, acute granulocytic leukemia, and acute non-lymphocytic leukemia. "Acute" means that this leukemia can progress rapidly if not treated, and would almost certainly be fatal within a few months. "Myeloid" refers to the type of cell from which this leukemia starts. In most cases AML develops from cells that would turn into white blood cells (other than lymphocytes), but some cases of AML develop in other types of blood-forming cells. AML starts in the bone marrow (the soft inner part of certain bones, where new blood cells are made), but in most cases it quickly moves into the blood. It can sometimes spread to other parts of the body, including the lymph nodes, liver, spleen, central nervous system (brain and spinal cord), and testicles. Other types of cancer can start in these organs and then spread to the bone marrow, but cancers that start elsewhere and then spread to the bone marrow are not leukemias [13].

The diagnosis of leukemia is based on the fact that the white cell count is increased with immature blast (lymphoid or myeloid) cells, while neutrophils and platelets are decreased. The presence of an excess number of blast cells in the peripheral blood is a significant symptom of leukemia. Hematologists therefore routinely inspect blood smears under the microscope for proper identification and classification of blast cells [14].

2.1 Causes and Risk Factors of Leukemia

The exact causes of leukemia are unknown, and in most cases it is unclear why leukemia has developed. Research into possible causes is ongoing. Like other cancers, leukemia is not infectious and cannot be passed on to other people. Several factors may increase a person's risk of developing leukemia. Having a particular risk factor does not mean that a person will definitely get the disease, and people without any known risk factors can still develop it. The known risk factors for leukemia are described here.

Exposure to radiation: People exposed to high levels of radiation, such as from nuclear accidents, have a higher risk of developing leukemia than people who have not been exposed. However, only a small number of people in the UK will be exposed to radiation levels high enough to increase their risk.

Smoking: Smoking increases the risk of developing leukemia. This may be due to the high levels of benzene in cigarette smoke.

Exposure to benzene: In very rare cases, leukemia may develop due to long-term exposure to benzene (and possibly other solvents) used in industry.

Cancer treatments: Occasionally, some anti-cancer treatments such as chemotherapy or radiotherapy can cause leukemia to develop some years after the treatment. The risk increases when certain types of chemotherapy drugs are combined with radiotherapy. When leukemia develops because of earlier anti-cancer treatment, it is called secondary leukemia or treatment-related leukemia.

Blood disorders: People with certain blood disorders, such as myelodysplasia or myeloproliferative disorders, have an increased risk of developing AML.

Genetic disorders: People with certain genetic disorders, including Down's syndrome and Fanconi's anemia, have an increased risk of developing leukemia.

Cancer starts when cells in a part of the body begin to grow out of control, and it can spread to other areas of the body.

2.2 Symptoms

Looking pale and feeling tired and breathless, which is due to anemia caused by a lack of red blood cells.
Having more infections than usual, because of a lack of healthy white blood cells.
Unusual bleeding caused by too few platelets; this may include bruising (bruises may appear without any obvious injury), heavy periods in women, bleeding gums, nosebleeds, and blood spots or rashes on the skin (petechiae).
Feeling generally unwell and run down.
Having a fever and sweats, which may be due to an infection or the leukemia itself.
A new lump or swollen gland in the neck, under the arm, or in the groin.
Frequent fevers.
Bone pain.
Unexplained appetite loss or recent weight loss.
Swelling and pain on the left side of the belly.

Other, less common symptoms may be caused by a build-up of leukemia cells in a particular area of the body. Bones may ache because of the pressure from a build-up of immature cells in the bone marrow. Raised, bluish-purple areas may also appear under the skin due to leukemia cells in the skin, and gums may swell due to leukemia cells in the gums [12].

3 PROPOSED FRAMEWORK

3.1 Methodology

Step 1 : Start by uploading the samples, which are in the form of a database file. We select the database file and upload it into the GUI homepage.
Step 2 : The uploaded file contains the gene information about the patients who have leukemia.
Step 3 : After the data has been uploaded, create a graphical representation of the data for visualization.
Step 4 : Apply the genetic algorithm (GA) for the reduction of the gene set.
Step 5 : For the reduction of the gene set, carry out the following GA steps:
Step 6 : Take the total gene information as the population.
Step 7 : Apply the fitness function to the gene data.
Step 8 : Select a sub-population and evaluate a new fitness function.
Step 9 : Finally, evaluate the best fitness function.
Step 10 : The data set is thereby reduced; the result is known as the reduced gene set.
Step 11 : Classification is then done using a neural network. Back propagation is used.
Step 12 : Matched and unmatched values are categorized.
Step 13 : Compare the matched and unmatched values and evaluate the accuracy by using accuracy parameters.
Step 14 : The accuracy parameters, false acceptance rate (FAR) and false rejection rate (FRR), are calculated.

3.2 Flowchart

Fig. 1 Flowchart of Methodology

This flowchart contains the methodology for reducing the large leukemia data set (gene set) to a reduced gene set using the genetic algorithm. The genetic algorithm calculates the best fitness function. The neural network classifies the matched and unmatched values. Finally, accuracy parameters are used to assess the result: FRR and FAR are evaluated.

The microarray gene classification technique involves three major steps, namely (i) dimensionality reduction, (ii) feature selection, and (iii) gene classification. The GA technique performs the dimensionality reduction to obtain a smaller dataset. Initially, dimensionality reduction is carried out on the microarray cancer gene dataset to reduce the complexity of gene classification. This step is performed because the dataset is high dimensional, which increases the processing time and prevents the classification process from producing accurate results. The fitness function is used to choose the best chromosomes among the generated chromosomes.

The next step is classification of the reduced set using a neural network (NN). Based on the values acquired from the training phase, the performance of the NN is analyzed to obtain appropriate values for the testing phase. In order to find the optimum structure, the NN performance is analyzed for the optimum number of hidden nodes and epochs. To do this, the number of epochs is first set to a certain fixed value, and the NN is trained over an appropriate range of hidden nodes. The number of hidden nodes that gives the best performance is then selected as the optimum. After that, with the optimum number of hidden nodes fixed, the epochs are analyzed in a similar way to obtain the number of epochs that gives the highest accuracy.

4 RESULTS & IMPLEMENTATION

Fig. 2 GUI Panel

The above figure shows the graphical user interface panel of the proposed system. The user panel has GUI controls that include buttons to upload the data set, to apply the genetic algorithm, to initialize the neural network, and to test the leukemia data. The code information is shown at the bottom left.

Fig. 3 GUI upload data interface

The figure shows the push button for uploading the data set. It also shows the data uploading process and the graphical representation of the genes, with the rows and columns being processed relative to the total number of rows and columns. The graph plots each feature count against the gene value. The data set values are loaded into the edit field of the GUI, as shown in the figure. The code information box lists the total number of genes, the total number of features, the gene number being processed, and the feature number being evaluated. According to this graph of gene value against gene feature count, large gene values have the best fitness function.

After applying the GA, the rows and columns are visualized in the reduced feature set. The number of genes is reduced from 48 to approximately 33. The fitness function is evaluated for the reduction of the gene set, following the GA steps listed in the methodology.
Fig. 4 Genetic algorithm graphs

The above figure shows the gene values with respect to gene number after applying the genetic algorithm. The genetic algorithm is an optimization algorithm, applied here to optimize the data set. The reduced data set is evaluated using the GA. The graphical representation of the gene data set provides a visualization of the data.

Fig. 5 Neural Network

The above figure shows the neural network toolbox window with the network structure, which consists of an input layer, hidden-layer neurons with synaptic weights, and an output layer. Quantities such as the gradient and validation checks are obtained after applying the neural network. The neural network classifies the matched and unmatched values. Accuracy parameters such as FRR and FAR are then calculated.

Fig. 6 False Acceptance Rate

The above figure shows the false acceptance rate, with a value of 0.002463, which should be low to get an appropriate output.

Fig. 7 False Rejection Rate

The above figure shows the false rejection rate, with a value of 0.00045437, which should be as low as possible to get an efficient output.

Fig. 8 Accuracy Parameter

The above figure shows the accuracy, with a value of 99.7181, which is calculated on the basis of the false acceptance rate and the false rejection rate.

Fig. 9 FAR, FRR and Accuracy Graph

The above bar graph shows the results obtained from our proposed system for the FAR, FRR, and accuracy parameters.

5 CONCLUSION & FUTURE SCOPE

Leukemia is a cancer of the marrow and blood. It adversely affects the formation and normal function of blood tissues and cells. NNs are particularly attractive for diagnostic problems that have no linear solution. Physicians usually analyze the clinical and laboratory symptoms of blood cancer qualitatively and finally use bone marrow biopsies as a better procedure for assessing the nature of the disease. Precise and reliable detection of cancer therefore needs more clinical tests, incurs more cost, and takes much time. In this investigation, we apply simple, early clinical assessment for proper detection of leukemia. By using a trained ANN and a genetic algorithm, we can predict cancer with fewer clinical and laboratory tests and without requiring much time. The accuracy of cancer detection by the assembled artificial neural network was analyzed by ROC and regression analysis. Outputs of the trained ANN for the testing data were used to plot graphs.

Future work will include increasing the number of records in the training dataset in order to get more accurate results, since the network will be able to learn more effectively from a larger number of records. In addition, larger datasets can be obtained and applied, and the approach can be tested on them to provide higher accuracy. Different neural networks and other classification techniques can also be tried to obtain better results. The use of support vector machines as a classification tool will be considered in future work.

REFERENCES

[1] Li, Eldon Y. "Artificial neural networks and their business applications." Information & Management 27.5 (1994): 303-313.
[2] Xue-wen Chen and Michael McKee, "Finding expressed genes using genetic algorithms and support vector machines", Department of Electrical and Computer Engineering, California State University, 18111 Nordhoff Street, Northridge, CA 91330, USA, 2003.
[3] Marc C. Chamberlain, M.D., "Leukemia and the Nervous System", 2003.
[4] S. H. Rezatofighi et al., "A New Approach to White Blood Cell Nucleus Segmentation Based on Gram-Schmidt Orthogonalization", International Conference on Digital Image Processing, 2005.
[5] Huerta, Edmundo Bonilla, Béatrice Duval, and Jin-Kao Hao. "A hybrid GA/SVM approach for gene selection and classification of microarray data." Applications of Evolutionary Computing. Springer Berlin Heidelberg, 2006. 34-44.
[6] M. Pei, "Feature Extraction Using Genetic Algorithms", Case Center for Computer-Aided Engineering and Manufacturing, Department of Computer Science, Genetic Algorithms Research and Applications Group, 2006.
[7] Yuh-Jye Lee and Chia-Huang Chao, "A Data Mining Application to Leukemia Microarray Gene Expression Data Analysis", Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, No. 43, Sec. 4, Keelung Rd., Taipei, 106, Taiwan, 2007.
[8] Alba, Enrique, et al. "Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms." Evolutionary Computation, 2007. CEC 2007. IEEE Congress on. IEEE, 2007.
[9] Nipon Theera-Umpon, "Morphological Granulometric Features of Nucleus in Automatic Bone Marrow White Blood Cell Classification", IEEE Transactions on Information Technology in Biomedicine, Vol. 11, No. 3, May 2007.
[10] Yvan Saeys et al., "A review of feature selection techniques in bioinformatics", Vol. 23 no. 19 2007, pages 2507–2517.
[11] Raje, Chaitali, and Jyoti Rangole. "Detection of Leukemia in microscopic images using image processing." Communications and Signal Processing (ICCSP), 2014 International Conference on. IEEE, 2014.
[12] El-Nasser, Ahmed Abd, Mohamed Shaheen, and Hesham El-Deeb. "Enhanced leukemia cancer classifier algorithm." Science and Information Conference (SAI), 2014. IEEE, 2014.
[13] Hossein Ghayoumi Zadeh, Siamak Janianpour, and Javad Haddadnia. "Recognition and Classification of the Cancer Cells by Using Image Processing and LabVIEW." International Journal of Computer Theory and Engineering (2013).
[14] Mohapatra, Subrajeet, et al. "Fuzzy based blood image segmentation for automated leukemia detection." Devices and Communications (ICDeCom), 2011 International Conference on. IEEE, 2011.