Detection and Classification of Breast Cancer

Nandi Nwe Win, Nang Aye Aye Htwe

Abstract: Breast cancer is the second most lethal cancer for women in the world today. X-ray mammography is the most widely used method for early detection of breast cancer. To detect the breast cancer region, Canny edge detection is used. To separate this region from the background, a thresholding method is used. This paper presents an implementation of a detection and classification system for cancerous tissues. Malignant and benign abnormalities are selected from the segmented images, and texture-based features are then extracted using the Gray Level Difference Method (GLDM). For pattern classification between malignant and benign samples, the optimum subset of texture features is modeled using an Artificial Neural Network (ANN). The detection and classification of cancerous tissues is implemented in the MATLAB programming language.

Index Terms: Artificial Neural Network, Canny Operation, Digital Mammograms, Feature Extraction, Gray Level Difference Method (GLDM), Thresholding

Manuscript received Oct 15, 2011.
Nandi Nwe Win, Department of Information Technology, Mandalay Technological University, Mandalay, Myanmar, 09-256269894 (e-mail: anonymous.mdy.85@gamil.com).
Nang Aye Aye Htwe, Department of Information Technology, Mandalay Technological University, Mandalay, Myanmar, 095661208 (e-mail: htwe.aye@gmail.com).

I. INTRODUCTION

Cancer is the uncontrolled growth of cells, and breast cancer is the uncontrolled growth of cells in the breast region. Breast cancer is the second leading cause of cancer deaths in women today. Early detection of the cancer can reduce the mortality rate. Mammography has a reported cancer detection rate of 70-90%, which means that 10-30% of breast cancers are missed with mammography [1]. Early detection of breast cancer can be achieved using digital mammography, typically through detection of characteristic masses and/or microcalcifications. A mammogram is an X-ray of the breast tissue that is designed to identify abnormalities. The presence of clustered microcalcifications in X-ray mammograms is considered an important indicator for the detection of breast cancer, especially for individual microcalcifications with diameters up to about 0.7 mm and with an average diameter of 0.3 mm [2]. Studies have shown that radiologists can miss a significant proportion of abnormalities, in addition to having high rates of false positives. Therefore, it would be valuable to develop a computer-aided method for mass/tumour classification based on features extracted from the Region of Interest (ROI) in mammograms [3].

Pattern recognition in image processing requires the extraction of features from regions of the image, and the processing of these features with a pattern recognition algorithm. Features are observable patterns in the image that give some information about the image. For every pattern classification problem, the most important stage is feature extraction, because the accuracy of the classification depends on it. Much research has been done in mammography towards detecting one or more abnormal structures: circumscribed masses [5], spiculated lesions [6] and microcalcifications [4]. Other researchers have focused on classifying breast lesions as benign or malignant. There are different feature descriptors, such as GLDM (Gray Level Difference Method), LBP (Local Binary Patterns), GLRLM (Gray Level Run Length Method), Haralick and Gabor texture features, and different classification methods, such as SVM, C4.5, the k-NN classifier and ANN. In this paper, we apply the GLDM feature extraction method to a set of mammography images and then test its performance with ANN classification.

II. RELATED WORKS
In the literature, a number of techniques are described for detecting and classifying breast cancer in digital mammograms, and a lot of research has been done on the textural analysis of mammographic images. Papadopoulos et al. [7] presented a hybrid intelligent system for the identification of microcalcification clusters in digital mammograms, which can be summarised in three steps: edge detection, segmentation, and feature extraction with classification. The work in [8] investigates the accuracy of a detection methodology that uses Haralick texture features as input to an ANN (Artificial Neural Network) to classify images as benign or malignant. Weidong Xu et al. proposed a new ANN-based algorithm for detecting masses automatically [9].

III. BACKGROUND THEORY

The system described in this paper has four main parts: image acquisition, edge detection, image segmentation, and feature extraction with classification.

A. Image Acquisition

Digital mammograms are used as the standard inputs to the proposed framework. The mammography dataset was obtained from the Mammographic Image Analysis Society (MIAS) database. MIAS mammography images are digitized at a 200 micron pixel edge, with a size of 1024 × 1024 pixels. Each pixel in the grayscale mammogram image has an intensity in the range [0, 255] (8-bit). Breast images from the MIAS database are shown in Figure 1.

Figure 1. Breast images in MIAS database

All Rights Reserved © 2012 IJSETR. International Journal of Science, Engineering and Technology Research (IJSETR), Volume 1, Issue 1, July 2012.

B. Canny Edge Detection

The Canny edge detector is known as the optimal edge detector. It was designed to improve on the many edge detectors that had already been published at the time. It is important that edges occurring in images are not missed and that there are no responses to non-edges. The Canny method finds edges by suppressing noise in the image without disturbing the edge features.
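The stages usually associated with the Canny method (smoothing, gradient estimation, non-maximum suppression, and double thresholding with hysteresis) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB implementation. The function names `filter2d` and `canny`, the 3x3 Gaussian and Sobel kernels, and the `low` and `high` thresholds are all assumptions chosen for the example.

```python
import math

# 3x3 Gaussian smoothing kernel and Sobel derivative kernels (illustrative choices).
GAUSS = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter2d(img, kernel):
    """Apply a square, odd-sized kernel to a 2-D list, replicating border pixels."""
    h, w, k = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(-k, k + 1):
                for i in range(-k, k + 1):
                    yy = min(max(y + j, 0), h - 1)
                    xx = min(max(x + i, 0), w - 1)
                    acc += img[yy][xx] * kernel[j + k][i + k]
            out[y][x] = acc
    return out


def canny(img, low, high):
    """Return a binary edge map (lists of 0/1) for a grayscale image given as lists."""
    h, w = len(img), len(img[0])
    smooth = filter2d(img, GAUSS)                      # 1. noise suppression
    gx = filter2d(smooth, SOBEL_X)                     # 2. gradient estimation
    gy = filter2d(smooth, SOBEL_Y)
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]

    # 3. Non-maximum suppression: keep a pixel only if its gradient magnitude
    #    is not smaller than both neighbours along the gradient direction.
    thin = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180
            if angle < 22.5 or angle >= 157.5:
                dy, dx = 0, 1
            elif angle < 67.5:
                dy, dx = 1, 1
            elif angle < 112.5:
                dy, dx = 1, 0
            else:
                dy, dx = 1, -1
            m = mag[y][x]
            if m >= mag[y + dy][x + dx] and m >= mag[y - dy][x - dx]:
                thin[y][x] = m

    # 4. Hysteresis: start from strong pixels (>= high) and grow the edge
    #    through 8-connected neighbours that are at least weak (>= low).
    edges = [[0] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if thin[y][x] >= high]
    for y, x in stack:
        edges[y][x] = 1
    while stack:
        y, x = stack.pop()
        for j in (-1, 0, 1):
            for i in (-1, 0, 1):
                yy, xx = y + j, x + i
                if 0 <= yy < h and 0 <= xx < w and not edges[yy][xx] and thin[yy][xx] >= low:
                    edges[yy][xx] = 1
                    stack.append((yy, xx))
    return edges
```

For example, `canny(image, low=100, high=300)` on an 8-bit image stored as a list of rows marks the pixels along a sharp intensity step and leaves flat regions at 0. Border pixels are never marked in this sketch, because non-maximum suppression is only evaluated on interior pixels.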
The experimental result for a tested breast image using the Canny method is shown in Figure 2.

Figure 2. Edge Detection Using Canny Method

C. Image Segmentation

The goal of image segmentation is to find regions that represent objects or meaningful parts of objects. Segmentation divides an image into its constituent regions or objects. Thresholding is used for segmentation because it is the most suitable method for the present application: it yields a binary image in which '1' represents the breast tumor and '0' represents the background. The segmented breast is shown in Figure 3.

Figure 3. Image Segmentation Using Thresholding

D. Texture Features Extraction Using Gray Level Difference Method (GLDM)

Texture feature extraction is a very important process in classification, and texture features have been widely used in mammogram classification because of their ability to distinguish between abnormal and normal cases. The Gray Level Difference Method (GLDM) is a good feature extraction method for our implementation. An example of the gray level difference method is shown in Figure 4(a) and (b).

Figure 4. (a) Original image (b) GLDM for Original image (distance=1, direction=0).

The Gray Level Difference Method is based on the statistics of the second-order joint conditional probability density function p(i | d), where i is the gray level (i.e. intensity) difference between two pixels and d is the displacement between them (a distance and a direction, as in Figure 4). The features listed in Table I are derived from this function.

TABLE I
DESCRIPTION OF TEXTURE FEATURES

No.   Feature                       Formula
1     Contrast
2     Mean
3     Entropy
4     Inverse Difference Moment
5     Angular Second Moment
6     Area

A complete set of 360 features is used for the classification of the breast images. The resulting feature vectors are shown in Figure 5. Finally, these sets of features are used to classify the breast images.

Figure 5. Extracted Features

E. Classification

A neural network is a powerful tool for pattern classification and is composed of three layers, as shown in Figure 6.

[Figure 6 diagram: input layer (Contrast, Mean, Entropy, ..., Angular Second Moment, Inverse Difference Moment, Area), hidden layer, output layer (Benign or Malignant)]

Figure 6. Architecture of Artificial Neural Networks

The classification process is divided into a training phase and a testing phase, and the classifier is trained and tested on mammogram images. In the training phase, known data are given. In the testing phase, unknown data are given and classification is performed using the trained classifier. The accuracy of the classification therefore depends on the efficiency of the training. Neural networks are trained by experience: when an unknown input is fed into the network, it can generalize from past experience and produce a result. Six features are fed to the input layer. The hidden layer has 20 neurons, and the output layer produces either 1 (Benign) or 0 (Malignant).

IV. SYSTEM DESIGN

A. Design of the Proposed System

In this system, the Canny method, the thresholding technique, the Gray Level Difference Method and an Artificial Neural Network are applied to implement the breast cancer detection and classification system. In the image acquisition step, we use images from the MIAS database. A total of 80 mammograms are used for training and testing. These images are already preprocessed. Applying the GLDM feature extractor yields the following values: Contrast, Angular Second Moment, Entropy, Mean, Inverse Difference Moment and Area. The ANN classifier is applied to these features and classifies the input image as malignant or benign. The overall block diagram of the system is shown in Figure 7.

[Figure 7 diagram: Input: Digital Mammogram -> Image Acquisition -> Edge Detection -> Image Segmentation -> Texture Feature Extraction -> Classification (Artificial Neural Network) -> Result: Malignant or Benign]

Figure 7. Overall Block Diagram of the System

V. EXPERIMENT

For the experiment, we used the MIAS database, which is a collection of 100 images. We implemented the GLDM feature extraction method in MATLAB V-7.1, R-12. These images are already preprocessed. After applying the GLDM feature extractor, values are obtained for the features listed in Table I. The ANN classifier is applied to these features and classifies the input image as malignant or non-malignant. This paper gives results for the two images shown in Figure 8 and Figure 9; their extracted features are listed in Table II.

Figure 8. Input Image 1 for GLDM

Figure 9. Input Image 2 for GLDM

TABLE II
GRAY LEVEL DIFFERENCE METHOD EXTRACTED FEATURES

Features                      Image 1 (Benign)   Image 2 (Malignant)
Angular Second Moment         216.0473           159.2742
Contrast                      52.1763            44.7974
Inverse Difference Moment     0.9604             0.6952
Mean                          0.3044             0.2616
Entropy                       0.0117             0.0099
Area                          0                  144.2500

After extracting the features, the user runs the classification. Figure 10 shows the results of breast classification as malignant or benign.

Figure 10. Classification result of the program

To evaluate the performance of the system, known images from a training data set and unknown images from a test data set are used. The system's accuracy of breast classification is given in Table III.

TABLE III
THE ACCURACY RATE OF BREAST CLASSIFICATION

Images set     Cancer   Non-cancer   Total No.   Correct Prediction   Accuracy rate
Training set   30       30           60          60                   100%
Testing set    50       50           100         100                  100%

VI. CONCLUSIONS

Breast cancer classification is a vital stage for the performance of the Canny-based method of breast cancer detection. The GLDM feature vector is calculated for each image cell and is used for better computational performance. The system reduces the false positive rate, and thereby unnecessary biopsies and health care costs as well. ANN shows very good performance in medical diagnostic systems. The computational time is around 36 seconds for each breast classification. The system was evaluated on 60 images containing malignant and benign masses of different sizes and shapes. Using the ANN classifier, breast cancer diagnosis with a training accuracy of 100% and a testing accuracy of 100% is achieved.

ACKNOWLEDGMENT

First of all, the author is grateful to her parents, who offered strong moral and physical support, care and kindness. The author is highly grateful to Dr. Myint Thein, the Pro-Rector of Mandalay Technological University, for his permission to complete this paper. The author is deeply thankful to Dr. Aung Myint Aye and Dr. Nang Aye Aye Htwe, Mandalay Technological University, for their overall support during the writing of this paper.

REFERENCES

[1] R. G. Bird, R. G. Wallace, and B. C. Yankaskas, "Analysis of cancers missed at screening mammography," Radiology, vol. 184, pp. 613-617, 1992.
[2] D. B. Kopans, Breast Imaging. Philadelphia, PA: J. B. Lippincott, pp. 81-95, 1989.
[3] M. Sampat, M. Markey, A. Bovik et al., "Computer-aided detection and diagnosis in mammography," Handbook of Image and Video Processing, vol. 10, no. 4, pp. 1195-1217, 2005.
[4] R. Strickland and H. Hahn, "Wavelet transforms for detecting microcalcifications in mammograms," IEEE Transactions on Medical Imaging, vol. 15, no. 2, pp. 218-229, 1996.
[5] M. Giger, F. Yin, K. Doi, C. Metz, R. Schmidt, and C. Vyborny, "Investigation of methods for the computerized detection and analysis of mammographic masses," in Proceedings of SPIE, vol. 1233, 1990, p. 183.
[6] S. Liu and E. J. Delp, "Multiresolution detection of stellate lesions in mammograms," in Proceedings of the IEEE International Conference on Image Processing, 1997, pp. 109-112.
[7] Y. Cairns, I. W. Ricketts, D. Folkes, M. Nimmo, P. E. Preece, A. Thompson, and C. Walker, "The automated detection of clusters of microcalcifications," in Proc. Inst. Elect. Eng. Colloquium on Applications of Image Processing in Mass Health Screening, pp. 3/1-5, 1982.
[8] A. Papadopoulos, D. I. Fotiadis, and A. Likas, "An automatic microcalcification detection system based on a hybrid neural network classifier," Artificial Intelligence in Medicine, vol. 25, pp. 149-167, 2002.
[9] R. M. Welch, K. S. Kuo, S. K. Sengupta, and D. W. Chen, "Cloud field classification based upon high spatial resolution textural feature (I): gray-level cooccurrence matrix approach," J. Geophys. Res., vol. 93, pp. 12,663-12,681, Oct. 1988.
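As a supplementary illustration of the GLDM statistics named in Section III-D, the sketch below estimates p(i | d) for one displacement and derives five of the texture features from it. It is written in Python rather than the MATLAB used for the paper, and it is not the authors' code. The function names `gldm_pdf` and `gldm_features` are hypothetical, and the formulas are the commonly used GLDM definitions (with a base-2 logarithm for entropy), which is an assumption because they may differ from the ones the authors used. The Area feature is omitted because the text does not define how it is computed.

```python
import math


def gldm_pdf(img, dy=0, dx=1, levels=256):
    """Estimate p(i | d): the probability that two pixels separated by the
    displacement d = (dy, dx) differ in gray level by exactly i."""
    h, w = len(img), len(img[0])
    hist = [0] * levels
    count = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                hist[abs(img[y][x] - img[yy][xx])] += 1
                count += 1
    if count == 0:
        raise ValueError("displacement is larger than the image")
    return [n / count for n in hist]


def gldm_features(p):
    """Texture statistics of a gray level difference distribution p (assumed formulas)."""
    return {
        "contrast": sum(i * i * pi for i, pi in enumerate(p)),
        "mean": sum(i * pi for i, pi in enumerate(p)),
        "entropy": -sum(pi * math.log2(pi) for pi in p if pi > 0),
        "inverse_difference_moment": sum(pi / (i * i + 1) for i, pi in enumerate(p)),
        "angular_second_moment": sum(pi * pi for pi in p),
    }


# Usage: horizontal neighbours (distance 1, direction 0 degrees) on a tiny 8-bit patch.
patch = [[0, 1, 3],
         [2, 2, 5]]
features = gldm_features(gldm_pdf(patch, dy=0, dx=1))
```

Repeating the computation for several distances and directions and concatenating the results is one plausible way to build a longer feature vector, although the text does not state how its 360 features are assembled.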