WELCOME

A Neural Network Based Intelligent System for Breast Cancer Diagnosis

Malay Mitra
Lecturer in Computer Science & Application
Jalpaiguri Polytechnic, West Bengal

Supervised by: Prof. (Dr.) Ranjit Kumar Samanta
Department of Computer Sc. & Appl., University of North Bengal, West Bengal, India

Why Breast Cancer?
• According to the American Cancer Society (ACS), there were 229,060 new breast cancer cases in the US alone in 2012.
• It ranks second as a cause of cancer death in women, after lung cancer.
• One of the reasons for the survival of 2.5 million breast cancer patients in the US is early diagnosis.

Different Methods for Detecting Breast Cancer
• Biopsy: accuracy 100%
• Mammography: accuracy 68% to 79%
• FNAC (fine needle aspiration cytology): accuracy 65% to 98%

Points Noted from Previous Studies
• Much of the work reporting high classification accuracy is based on hybrid approaches.
• NN and SVM in various combinations (e.g. with LS, F-score, RS) lead to higher classification accuracy.
• In some work it is not clear whether the reported accuracy is the best single simulation result or the average of several simulations.

Wisconsin Breast Cancer Database
Wisconsin Breast Cancer data description with statistics:

#    Attribute                      Domain      Mean   SD
1.   Sample code number             id number
2.   Clump thickness                1-10        4.44   2.83
3.   Uniformity of cell size        1-10        3.15   3.07
4.   Uniformity of cell shape       1-10        3.22   2.99
5.   Marginal adhesion              1-10        2.83   2.86
6.   Single epithelial cell size    1-10        2.23   2.22
7.   Bare nuclei                    1-10        3.54   3.64
8.   Bland chromatin                1-10        3.45   2.45
9.   Normal nucleoli                1-10        2.87   3.05
10.  Mitosis                        1-10        1.60   1.73
11.  Class                          2 for benign, 4 for malignant

Feature Extraction and Reduction
• CFS (Correlation-based Feature Selection)
• AR (Association Rule)
• RS (Rough Set)

Correlation-based Feature Subset Selection

rfc = (k · r̄fc) / √(k + k(k-1) · r̄ff)

where
rfc  : correlation between the summed features and the class variable
r̄fc : average correlation between the features and the class variable
k    : number of features
r̄ff : average inter-correlation between the features

Rough Set
Let there be an information system I = (U, A), where the universe of discourse U is a nonempty set and A is a nonempty set of attributes. For any P ⊆ A there is an equivalence relation IND(P), called the P-indiscernibility relation. Let X ⊆ U be the target set; it can be approximated by its P-lower approximation P̲X and its P-upper approximation P̄X.
Accuracy of the rough set approximation: α(X) = |P̲X| / |P̄X|

Artificial Neural Network (ANN)
[Figure: McCulloch-Pitts model of a neuron: the feature vector F1, F2, ..., Fn as inputs, feature weights as synapses, a summing part producing the activation value, and an output function (with bias) producing the output.]

Modeling with ANN
Modeling with ANN involves:
i) designing the network
ii) training the network

Design of the network involves:
i) fixing the number of layers
ii) fixing the number of neurons in each layer
iii) choosing the node function for each neuron
iv) choosing the form of the network, whether feed-forward or feedback type
v) choosing the connectivity pattern between the layers and neurons

The training phase involves:
i) adjustment of the weights as well as the threshold values from a set of training examples

Levenberg-Marquardt (LM) Algorithm
During iteration, the new configuration of weights at step k+1 is:

w(k+1) = w(k) - (JᵀJ + μI)⁻¹ Jᵀ e(k)

where
J : the Jacobian matrix
μ : the adjustable damping parameter
e : the error vector
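To make the update rule above concrete, here is a minimal NumPy sketch of a single Levenberg-Marquardt step, assuming a generic least-squares problem. The helper names (`residuals`, `jacobian`) and the fixed damping value `mu` are illustrative assumptions only; the actual training in this work was carried out in Alyuda NeuroIntelligence, not with this code.

```python
import numpy as np

def lm_step(w, residuals, jacobian, mu=1e-3):
    """One Levenberg-Marquardt update:
    w(k+1) = w(k) - (J^T J + mu*I)^(-1) J^T e(k)."""
    e = residuals(w)                      # error vector e(k)
    J = jacobian(w)                       # Jacobian of the errors w.r.t. the weights
    A = J.T @ J + mu * np.eye(w.size)     # damped Gauss-Newton matrix
    step = np.linalg.solve(A, J.T @ e)    # solve the linear system instead of inverting
    return w - step

# Toy usage: fit y = a*x + b by least squares, with residuals e(w) = a*x + b - y.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(x.size)
res = lambda w: w[0] * x + w[1] - y
jac = lambda w: np.column_stack([x, np.ones_like(x)])
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, res, jac)
print(w)   # close to [2.0, 1.0]
```

In practice μ is adapted between iterations (increased when a step fails to reduce the error, decreased when it succeeds); the fixed value here is only for brevity.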
Applications
The schematic view of our system:
Wisconsin Breast Cancer Database (original) → feature extraction and reduction using CFS & RS → classification using the two combinations (CFS+LM and RS+LM) → decision space: 2 = benign, 4 = malignant

Data Preprocessing
• We completely randomize the dataset after discarding the records with missing values.
• There are no outliers in our dataset.
• The dataset is partitioned into three sets:
  1. Training set (68%)
  2. Validation set (16%)
  3. Test set (16%)

Feature Selection & Extraction
The reduced feature sets after applying CFS & RS:

Sr. No.   Reduced attributes (CFS)    Reduced attributes (RS)
1.        Clump thickness             Clump thickness
2.        Uniformity of cell size     Uniformity of cell shape
3.        Bare nuclei                 Bare nuclei
4.        Bland chromatin             Marginal adhesion
5.        Normal nucleoli             Mitosis

Network Architecture
i) This work uses the logistic function f(x) = 1/(1 + e^(-x)) in the hidden and output nodes.
ii) This work uses one input layer, one hidden layer and one output layer.
iii) The number of neurons in the hidden layer is evaluated from the formula proposed by Gao:

s = √(a1·m² + a2·m·n + a3·n² + a4·m + a5·n + a6) + a7

where s is the number of hidden neurons, m the number of inputs, n the number of outputs, and a1 to a7 are coefficients to be determined.

Using LMS, Huang derived the formula:

s = √(0.43·m·n + 0.12·n² + 2.54·m + 0.77·n + 0.35) + 0.51

In this study m = 5 and n = 1, hence s = 5 after round off (a numerical check appears in the backup sketch at the end of the deck).

Modeling Results
• WEKA was used for feature set reduction using CFS.
• RSES was used for feature set reduction using RS.
• The classification algorithms using these two combinations were implemented in Alyuda NeuroIntelligence.

Classifier   Network structure (I-HL-O)   Epochs (retrains)   Training patterns   Validation patterns   Testing patterns
CFS+LM       5-5-1                        2000 (10)           465                 109                   109
RS+LM        5-5-1                        2000 (10)           465                 109                   109

Performance Evaluation Method
As performance measures we compute:

Accuracy    = (TP + TN) / (TP + FP + TN + FN) × 100%
Sensitivity = TP / (TP + FN) × 100%
Specificity = TN / (TN + FP) × 100%

(A small computational sketch of these measures is given in the backup slides at the end.)

Experimental Results
The table below compiles the results of 120 simulations.

Method     Metric            Highest (freq)   Lowest (freq)   Avg.
CFS + LM   Test set CCR (%)  100 (6)          94.29 (4)       97.45
           Specificity       100 (19)         84.21 (1)       95.28
           Sensitivity       100 (38)         94.20 (1)       98.53
           AUC               100 (10)         94 (1)          99.27
RS + LM    Test set CCR (%)  100 (3)          94.33 (1)       97.23
           Specificity       100 (16)         83.87 (1)       94.94
           Sensitivity       100 (40)         94.42 (1)       98.46
           AUC               100 (5)          93 (1)          99.11

Observations
• Of the two methods, CFS+LM shows better performance in terms of CCR, sensitivity, specificity and AUC.
• Our methods achieve 100% CCR as the highest performance, which is comparable to other studies.
• The lowest CCR is 94.29%.

Conclusion
• This work presents the highest, lowest and average behaviour of the methods used.
• This work provides better results than those obtained in much of the previous work.
• It is proposed that the CFS-derived feature set would be worthwhile when the final decision is made by doctors.
• Moreover, the highest, lowest and average performance of a decision support system (DSS) should be judged by the user before the system is put to use.

Thank You
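Backup sketch (not part of the original slides): a quick numerical check of the hidden-layer size formula quoted on the Network Architecture slide. The function name is illustrative, and standard rounding is assumed for the round-off step.

```python
import math

def hidden_neurons(m, n):
    """Huang's LMS-fitted formula for the number of hidden neurons s,
    given m inputs and n outputs:
        s = sqrt(0.43*m*n + 0.12*n^2 + 2.54*m + 0.77*n + 0.35) + 0.51
    """
    s = math.sqrt(0.43 * m * n + 0.12 * n ** 2 + 2.54 * m + 0.77 * n + 0.35) + 0.51
    return round(s)

# With the 5 reduced features and 1 output used in this study: s = 5 after round off.
print(hidden_neurons(m=5, n=1))   # -> 5
```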
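Backup sketch (not part of the original slides): the performance measures from the Performance Evaluation Method slide, written out in Python. The confusion counts below are invented purely for demonstration and are not results from this study.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN) x 100%."""
    return (tp + tn) / (tp + fp + tn + fn) * 100.0

def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN) x 100%."""
    return tp / (tp + fn) * 100.0

def specificity(tn, fp):
    """Specificity = TN / (TN + FP) x 100%."""
    return tn / (tn + fp) * 100.0

# Invented confusion counts for illustration only (not from the reported experiments).
tp, tn, fp, fn = 38, 68, 2, 1
print(f"Accuracy:    {accuracy(tp, tn, fp, fn):.2f}%")
print(f"Sensitivity: {sensitivity(tp, fn):.2f}%")
print(f"Specificity: {specificity(tn, fp):.2f}%")
```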