Breast Cancer Classification with Statistical Features of Wavelet Coefficient of Mammograms

Shital Lahamage1, Harishchandra Patil2
1 PG Student, Cummins College of Engineering, Pune
2 Associate Professor, Cummins College of Engineering, Pune
1sheetalsonare9@gmail.com
2 ht_patil143@yahoo.com

Abstract: Mammography is an X-ray imaging technique for diagnosing breast tumors. Segmentation of tumors in mammogram images is a difficult task because of poor contrast and because lesions are surrounded by tissue with similar characteristics. Feature extraction from mammogram images is a critical task for the classification of cancer. In this paper, a methodology to classify breast cancer using features extracted from mammograms is proposed. The method includes image enhancement, breast region (ROI) selection, and the discrete wavelet transform (DWT) for feature extraction. Image enhancement with Contrast Limited Adaptive Histogram Equalization (CLAHE) improves the image quality for processing. The DWT is used for image decomposition, and statistical features are extracted from the low-frequency coefficients. Principal Component Analysis (PCA) is used for data reduction and a Support Vector Machine (SVM) for classification. The method is applied to a set of images provided by the Mammographic Image Analysis Society (MIAS). The performance of the system is evaluated using a dataset containing 80 images, and an accuracy of about 90.47% is obtained.

Keywords: Region of Interest (ROI), Discrete Wavelet Transform (DWT), CLAHE enhancement, Principal Component Analysis (PCA), Support Vector Machine (SVM).

I. INTRODUCTION

Breast cancer is the most common type of cancer in women. Despite tremendous growth in the medical field, the cause of the cancer remains unknown. Mammograms therefore play an important role in the early diagnosis of breast cancer. Mammography is an X-ray imaging technique, and the resulting X-ray image (the mammogram) is used for breast cancer screening. Mammography is a simple, cheap, effective and easily available technique. There are two types of mammography: film mammography and digital mammography. For this experimentation we have chosen digital mammography because good contrast is achieved over dense breast tissue, image acquisition is fast, and the patient is exposed to radiation for only a small amount of time.

Breast cancer is a type of cancer originating from breast tissue, most commonly from the inner lining of the ducts or from the lobules that supply the ducts with milk. Cancer that originates from a duct is called ductal carcinoma, and cancer that originates from a lobule is called lobular carcinoma. The most common abnormalities present in mammograms are masses and calcifications. These are very small in size, and the contrast of the lesion is higher than that of the surrounding tissue. These abnormalities are classified into two classes, benign and malignant. Normally, it is very tedious for a radiologist to distinguish between a benign and a malignant mass. This study involves a novel classification approach and resulted in good accuracy rates in classifying benign and malignant masses. The results obtained were analysed for efficiency using performance metrics such as accuracy and sensitivity, with the help of an SVM.

A number of methods have been used to classify or detect abnormalities in mammograms. The main task of these methods is the extraction of the ROI that contains the abnormalities. A variety of techniques have been developed for mass detection, but most follow a two-step scheme. First, features are computed for each pixel and each pixel is classified. Second, regions are classified as normal or abnormal according to features such as size, shape, or contrast. Various techniques for pre-processing and ROI extraction on mammograms are available in the literature [1-8]. The region of interest (ROI) is detected in [1] using Kittler's method of segmentation. Chengdan et al. [2] proposed a marker-controlled watershed for breast region segmentation. Another approach, proposed in [3], uses morphological operations and seeded region growing to remove digitization noise, suppress artefacts, and remove the pectoral muscle to accentuate the ROI. Another approach was suggested in [4] to classify mammograms with the DWT and the RT transform, with an SVM as classifier. Maha Sharkas [5] presented a new method for the detection of microcalcifications (MCs) using the contourlet transform and principal component analysis (PCA) to extract features, with an SVM for classification. Andy Tirtajaya [6] proposed a methodology based on the dual-tree complex wavelet transform (DT-CWT) for feature extraction, with an SVM classifier to classify calcifications as benign or malignant. A superimposed classifier for mammograms using the DWT was constructed and evaluated in [7]. A new methodology was tested in [8] using db3 at three levels of decomposition to classify tumours as normal or abnormal, or as benign or malignant. Pravin Hajare [9] proposed a method using a Gabor filter, PCA and an SVM for breast tissue classification. The method proposed here performs feature extraction using the DWT, with PCA for data reduction and an SVM to classify tumours into two classes, benign and malignant.

II. CAD SYSTEM

The CAD system consists of six parts, shown in Fig. 1: image acquisition, pre-processing, detection (cancer area selection), feature extraction, feature selection, and classification.

Figure 1: Classification System

A. Image Acquisition

In this study, 80 cancerous mammography images were selected from MIAS, which currently has 322 "normal", "benign" and "malignant" cases [11]. Only circumscribed masses, ill-defined masses, spiculated masses, architectural distortion and asymmetry are considered. MIAS provides the following associated patient and image information.

There are three major groups for classifying breast density:
• Fatty (F) (106 images).
• Fatty-glandular (G) (104 images).
• Dense-glandular (D) (112 images).

The abnormalities are also described by kind:
• CALC Calcification.
• CIRC Well-defined/circumscribed masses.
• SPIC Spiculated masses.
• MISC Other, ill-defined masses.
• ARCH Architectural distortion.

Information about the x, y image coordinates of the centre of the abnormality and the approximate radius (in pixels) of a circle enclosing the abnormality is also provided.

B. Image Enhancement

Before any image processing algorithm can be applied to a mammogram, pre-processing steps are very important in order to limit the search for abnormalities without undue influence from the background of the mammogram. Digital mammograms are medical images that are difficult to interpret, so a preparation phase is needed to improve the image quality and make the detection of the cancer area more accurate. The main objective of this process is to improve the quality of the image and make it ready for further processing. Here, Contrast Limited Adaptive Histogram Equalization (CLAHE) enhancement is applied to the image. The lesion area enhanced by CLAHE, shown in Fig. 2, is used for further analysis.

Figure 2: Left: Original Mammogram, Right: Enhanced Image

C. Detection / Breast Region Selection

The original mammograms are 1024×1024 pixels, and almost 50% of the images contain a lot of noise. Therefore a manual cropping operation is applied to the images to remove unwanted portions such as labels and artefacts. The breast region is cropped according to the x, y image coordinates of the centre of the abnormality and the approximate radius (in pixels) of a circle enclosing the abnormality, and is then resized to 256×256. The selected breast region containing the abnormality is shown in Fig. 3.

Figure 3: Breast cancer area of image

D. Feature Extraction - Discrete Wavelet Transform

The discrete wavelet transform (DWT) is a linear transformation that operates on a data vector whose length is an integer power of two, transforming it into a numerically different vector of the same length. It is a tool that separates data into different frequency components and then studies each component with a resolution matched to its scale. The DWT is computed with a cascade of filtering followed by factor-2 sub-sampling (Fig. 4).

Figure 4: DWT Tree

H and L denote the high-pass and low-pass filters respectively, and ↓2 denotes sub-sampling. The outputs of these filters are given by equations (1) and (2):

$$a_{j+1}[p] = \sum_{n=-\infty}^{\infty} l[n-2p]\, a_j[n] \qquad (1)$$

$$d_{j+1}[p] = \sum_{n=-\infty}^{\infty} h[n-2p]\, a_j[n] \qquad (2)$$

The elements $a_j$ are used for the next step (scale) of the transform, and the elements $d_j$, called wavelet coefficients, determine the output of the transform. $l[n]$ and $h[n]$ are the coefficients of the low-pass and high-pass filters respectively. On scale j+1 there are only half as many a and d elements as on scale j. As a result, the DWT can be carried out until only two $a_j$ elements remain in the analysed signal; these elements are called scaling function coefficients.

The DWT algorithm for two-dimensional pictures is similar. The DWT is performed first for all image rows and then for all columns, as shown in Fig. 5.

Figure 5: Wavelet decomposition for two-dimensional pictures

The row decomposition results in two intermediate sub-images. The same procedure is then applied to each column of the intermediate sub-images. For one-level decomposition, this yields four quarter-sized sub-images LL(m, n), LH(m, n), HL(m, n) and HH(m, n). In hierarchical wavelet decomposition, the sub-image LL is further decomposed into four other sub-images.

The main feature of the DWT is the multistage representation of a function. By using wavelets, a given function can be analysed at various levels of resolution. The DWT is also invertible and can be orthogonal [14].

TEXTURE FEATURES: In this work only one set of DWT-derived features is considered. It is a vector that contains features of the wavelet coefficients calculated in sub-bands at successive scales. As a result of this transform there are 4 sub-band images at each scale (Fig. 4). The texture features selected for the mammograms are listed in Table 1.

Figure 6: Sub-band images

Table 1: Texture & Statistical Features

Feature               Formula
Mean                  $\mu = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} p(i,j)$
Standard Deviation    $\sigma = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(p(i,j)-\mu\right)^2}$
Energy                $E = \sum_{i,j} p(i,j)^2$
Entropy               $h = -\sum_{k=0}^{L-1} Pr_k \log_2 Pr_k$
Skewness              $S = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(\frac{p(i,j)-\mu}{\sigma}\right)^3$
Variance              $Var = \sigma^2$
Homogeneity           $H = \sum_{i,j} \frac{p(i,j)}{1+|i-j|}$
Kurtosis              $K = \left\{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left[\frac{p(i,j)-\mu}{\sigma}\right]^4\right\} - 3$
Smoothness            $R = 1 - \frac{1}{1+\sigma^2}$

E. Feature Selection

Feature selection and dimensionality reduction is the process of eliminating data that is closely related to other data items in a set. The result is a smaller set of features that preserves the properties of the original large data set. A commonly used dimensionality reduction technique is Principal Component Analysis (PCA). PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

PCA is a useful statistical technique that has found application in fields such as face recognition and image compression, and is a common technique for finding patterns in high-dimensional data. PCA is the simplest of the true eigenvector-based multivariate analyses. Its operation can be thought of as revealing the internal structure of the data in a way that best explains the variance in the data. If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space, PCA can supply the user with a lower-dimensional picture of this object when viewed from its most informative viewpoint. This is done by using only the first few principal components, so that the dimensionality of the transformed data is reduced.
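As a rough illustration of Sections D and E, the following Python sketch decomposes an ROI with a 2-D DWT, computes some of the Table 1 statistics on the low-frequency (LL) sub-band, and projects a set of feature vectors onto their first principal component. It is not the implementation used in this work. It assumes the Haar wavelet (the wavelet family is not stated above), computes only the moment-based features (mean, standard deviation, variance, energy, skewness, kurtosis, smoothness), and finds a single principal component by power iteration. The names haar_dwt2, stat_features, roi_features and first_principal_component are hypothetical helpers introduced for this example.

```python
import math


def haar_rows(mat):
    """One Haar analysis step along every row (assumed wavelet: Haar).

    Implements equations (1) and (2) with l = [1/sqrt(2), 1/sqrt(2)]
    and h = [1/sqrt(2), -1/sqrt(2)], followed by factor-2 sub-sampling.
    """
    s = 1.0 / math.sqrt(2.0)
    low = [[(r[2 * k] + r[2 * k + 1]) * s for k in range(len(r) // 2)] for r in mat]
    high = [[(r[2 * k] - r[2 * k + 1]) * s for k in range(len(r) // 2)] for r in mat]
    return low, high


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def haar_dwt2(img):
    """One-level 2-D DWT: filter all rows, then all columns.

    Returns the four quarter-sized sub-images LL, LH, HL, HH.
    Image height and width are assumed to be even.
    """
    low, high = haar_rows(img)
    ll, lh = (transpose(b) for b in haar_rows(transpose(low)))
    hl, hh = (transpose(b) for b in haar_rows(transpose(high)))
    return ll, lh, hl, hh


def stat_features(band):
    """Moment-based features of Table 1 for one sub-band."""
    vals = [v for row in band for v in row]
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    std = math.sqrt(var)
    energy = sum(v * v for v in vals)
    if std > 0:
        skew = sum(((v - mean) / std) ** 3 for v in vals) / n
        kurt = sum(((v - mean) / std) ** 4 for v in vals) / n - 3.0
    else:  # constant band: higher moments are undefined, report 0
        skew = kurt = 0.0
    smooth = 1.0 - 1.0 / (1.0 + var)
    return [mean, std, var, energy, skew, kurt, smooth]


def roi_features(roi, levels=2):
    """Decompose the ROI `levels` times and describe the final LL band."""
    ll = roi
    for _ in range(levels):
        ll = haar_dwt2(ll)[0]
    return stat_features(ll)


def first_principal_component(rows, iters=200):
    """Direction and scores of the first principal component.

    Uses power iteration on the sample covariance matrix; the start
    vector of ones is an arbitrary choice made for this sketch.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in x]
    return v, scores


if __name__ == "__main__":
    # Tiny synthetic 4x4 "ROI"; a real ROI would be 256x256.
    roi = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(roi_features(roi, levels=2))
```

In practice the features have very different ranges (energy in particular), so they would normally be standardized before PCA, and more than one component would usually be kept.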
F. Classification

Support vector machines (SVMs) are based on the Structural Risk Minimization principle from statistical learning theory. SVMs have also been applied to various real-world problems such as face recognition, cancer diagnosis and text categorization. The idea of structural risk minimization is to find a hypothesis h with the lowest true error. In their basic form, support vector machines find the hyperplane that separates the training data with maximum margin.

The SVM is a useful technique for data classification. A classification task usually involves training and testing data, each consisting of a number of data instances. Each instance in the training set contains one "target value" (class label) and several "attributes" (features). The standard SVM (Fig. 7) takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, which makes the SVM a non-probabilistic binary linear classifier.

Figure 7: Support Vector Machine with a hyperplane

Since an SVM is a classifier, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.

The basic principle of SVMs is that of a maximum margin classifier. Using kernel methods, the data can first be implicitly mapped to a high-dimensional kernel space. The maximum margin classifier is determined in the kernel space, and the corresponding decision function in the original space can be non-linear. Data that is not linearly separable in the original feature space thus becomes linearly separable in the kernel space. This is illustrated in Fig. 8. The aim of the SVM classification method is to find an optimal hyperplane separating relevant and irrelevant vectors by maximizing the size of the margin between both classes.

Figure 8: The function f embeds the data from the original space (a) into the kernel space (b), where the non-linear pattern becomes linear.

III. EXPERIMENTAL WORK & RESULT

This section is divided into two parts: first, SVM classification with the training and testing datasets, and second, single-image testing with the SVM.

A. Training-Testing Dataset

Each instance in the training set contains one "target value" (class label), benign or malignant, together with several features such as the texture and statistical features of the image. The testing dataset consists of a number of images on which the classification process is tested; 10 images are used as the testing dataset. The cropped ROIs are saved as the training and testing datasets. Each ROI is cropped according to the x, y coordinates of the centre of the abnormality and the radius of the lesion. The dataset used in this work is listed in Table 2.

Table 2: Dataset

Dataset       Benign   Malignant
Training      5        5
Testing       5        5
Other Data    24       36

B. Single Image Testing

In the following, we give a few examples to show the application of the proposed method. Here we used 60 different mammogram images, all digitized at a resolution of 1024×1024 pixels. Since these images were stored in JPEG format, they were converted to grayscale images. Each selected image was enhanced, cropped manually according to the information given by MIAS, and resized to 256×256 pixels.

Figure 9: (a) Original image, (b) Enhanced image, (c) Cropped ROI, (d) 1st-level decomposed ROI, (e) 2nd-level decomposed ROI
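The cropping and resizing step described above can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of pixel rows, the centre is given as a column index cx and a row index cy with the origin at the top-left pixel (a dataset that measures y from the bottom edge would first need cy = height - 1 - y), the crop is the square enclosing the circle of the given radius, and resizing uses nearest-neighbour sampling, since the interpolation method is not stated in this work. The function names crop_roi and resize_nearest are hypothetical.

```python
def crop_roi(image, cx, cy, radius):
    """Crop the square enclosing a circle of `radius` centred at (cx, cy).

    Assumes cx is a column index and cy a row index measured from the
    top-left pixel; the square is clipped to the image borders.
    """
    height, width = len(image), len(image[0])
    top, bottom = max(cy - radius, 0), min(cy + radius, height)
    left, right = max(cx - radius, 0), min(cx + radius, width)
    return [row[left:right] for row in image[top:bottom]]


def resize_nearest(image, out_h=256, out_w=256):
    """Resize by nearest-neighbour sampling (assumed interpolation)."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // out_h][c * width // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # Synthetic 1024x1024 image whose pixel value encodes its column.
    mammogram = [[c % 256 for c in range(1024)] for _ in range(1024)]
    roi = resize_nearest(crop_roi(mammogram, cx=500, cy=400, radius=60))
    print(len(roi), len(roi[0]))
```

Clipping at the borders means an abnormality close to the image edge yields a non-square crop, which the resize then stretches to 256×256; padding the crop instead would be an equally reasonable choice.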
The proposed algorithm uses the DWT to decompose the image at 2 levels for feature extraction, and the extracted features are then given to the SVM for classification. Examples of 4 images are shown in Fig. 9 for single-image testing.

C. Performance measures

We tested the performance of the classifier by calculating and analysing the accuracy, sensitivity and specificity for malignant and benign detection. Here TP and TN are the numbers of correctly classified malignant and benign masses, and FP and FN are the numbers of benign masses classified as malignant and of malignant masses classified as benign, respectively. The measures are defined as follows.

Accuracy: number of correctly classified masses / total number of masses.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

Sensitivity: number of correctly classified malignant masses / total number of malignant masses.

$$Sensitivity = \frac{TP}{TP + FN} \qquad (4)$$

Specificity: number of correctly classified benign masses / total number of benign masses.

$$Specificity = \frac{TN}{TN + FP} \qquad (5)$$

The accuracy, sensitivity and specificity of the proposed DWT-based method are given in Table 3, together with results previously obtained by others.

Table 3: Performance Measures

Method               Accuracy %   Sensitivity %   Specificity %
DWT [14]             89%          87%             87%
Gabor Wavelet [14]   86%          89%             85%
DWT [15]             89.41%       95.56%          82.5%
Proposed             90.47%       91.42%          89.79%

IV. CONCLUSION

In this work, breast cancer classification is performed with good results. Breast region enhancement is achieved using the CLAHE enhancement technique. The manual cropping method correctly extracts the particular region containing the abnormality. In the proposed algorithm, multi-resolution image analysis is performed to obtain a decomposed image with the DWT for feature extraction. With the help of PCA, we obtained feature data in a way that best explains the variance in the data. The extracted features improved the classification result with the help of the SVM. From the final result we see that we achieved a good classification accuracy of about 90.47%, with sensitivity 91.42% and specificity 89.79%, for all types of lesions.

References

[1] Pragathi J, H. T. Patil, "Segmentation Method for ROI Detection in Mammogram Images using Wiener Filter and Kittler's Method", Department of Instrumentation & Control Engineering, Cummins College of Engineering, Pune, International Conference on Recent Trends in Engineering & Technology, 2013.
[2] Chengadan Pei, Chunmei Wang, Shengzhou Xu, "Segmentation of the Breast Region in Mammograms using Marker-controlled Watershed Transform", IEEE.
[3] Jawad Nagi, Sameem Abdul Kareem, Farrukh Nagi, Syed Khaleel Ahmed, "Automated Breast Profile Segmentation for ROI Detection Using Digital Mammogram", College of Engineering, University of Malaya, Malaysia, IEEE EMBS Conference on Biomedical Engineering & Science, pp. 87-92, 2010.
[4] Salim Lahmiri, Mounir Boukadoum, "DWT and RT-Based Approach for Feature Extraction and Classification of Mammograms with SVM", Department of Computer Science, University of Quebec at Montreal, IEEE, pp. 412-415, 2011.
[5] Maha Sharkas, Mohamed Al-Sharkawy, "Detection of Microcalcification in Mammograms Using Support Vector Machine", Department of Electronics & Communication, AAST, IEEE, pp. 179-184, 2011.
[6] Andy Tirtajaya, Diaz D. Santika, "Classification of Microcalcification Using Dual-Tree Complex Wavelet Transform and Support Vector Machine", IEEE, 2nd International Conference on Advances in Computing, Control & Telecommunication Technologies, pp. 164-166, 2010.
[7] Cristiane Bastos Rocha Ferreira, Dibio Leandro Borges, "Analysis of Mammogram Classification Using a Wavelet Transform Decomposition", Elsevier Science, pp. 973-982, 2002.
[8] Ibrahima Faye, Brahim Belhaouari Samir, Mohamed M. M. Eltoukhy, "Digital Mammograms Classification Using a Wavelet Based Feature Extraction Method", IEEE, 2nd International Conference on Computer & Electrical Engineering, pp. 318-322, 2009.
[9] Pravin S. Hajare, Vaibhav V. Dixit, "Breast Tissue Classification Using Gabor Filter, PCA and Support Vector Machine", International Journal of Advancement in Electronics and Computer Engineering (IJAECE), Volume 1, Issue 4, 2012.
[10] Pragathi J, H. T. Patil, "Multiresolution Analysis for Computer-Aided Mass Detection in Mammogram Using Pixel Based Segmentation Method", International Conference on Recent Trends in Information Technology (ICRTIT), pp. 214-220, 2003.
[11] http://peipa.essex.ac.uk/ipa/pix/mias/
[12] Lori Mann Bruce, Reza R. Adhami, "Classifying Mammographic Mass Shapes Using the Wavelet Transform Modulus-Maxima Method", IEEE Transactions on Medical Imaging, vol. 18, pp. 214-220, 1999.
[13] Wang T.C., and Karayiannis N.B., "Detection of microcalcifications in digital mammograms using wavelets", IEEE Trans. Med. Imaging, vol. 17, no. 4, pp. 498-509, 1998.
[14] S. M. Salve, V. A. Chakkarwar, "Classification of Mammographic images using Gabor Wavelet and Discrete Wavelet Transform", International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), Volume 2, Issue 5, May 2013.
[15] J. Anitha, J. Dinesh Peter, "A Wavelet Based Morphological Mass Detection and Classification in Mammograms", 2012 IEEE.