International Journal of Engineering Trends and Technology (IJETT) – Volume 14 Number 4 – Aug 2014
ISSN: 2231-5381

Review Paper on Classification on Mammography

Khalid Bashir 1, Anuj Sharma 2
1 Department of Electronics & Communication Engineering, DIT, Dehradun, India
2 Assistant Professor, Department of Electronics & Communication Engineering, DIT, Dehradun, India

Abstract – Breast cancer is a leading cause of death among female cancer patients. Early detection of breast cancer depends on both the radiologist's ability to read mammograms and the quality of the mammogram images. Mammography is an effective technique for breast cancer screening and abnormality detection. Screening is one of the key factors in reducing death rates. The strong correlation between abnormalities and breast cancer suggests that radiologists could benefit from a computer-aided diagnosis (CAD) system capable of automated breast tissue classification. The aim of this paper is to present a survey of existing mammogram enhancement and segmentation techniques, with a clear classification and analysis of each technique.

Keywords: mammograms enhancement, mammograms segmentation, breast mass.

I- INTRODUCTION

At the beginning of 2005, the NHSBSP set up a steering group for digital mammography to address issues regarding breast screening programs. This group develops and studies different techniques for tackling the serious problem of breast cancer. Several techniques are available to detect breast cancer, each with its own procurement requirements.

Machine learning for breast cancer diagnosis is a very active research area. Several computer-aided diagnosis (CAD) systems have been developed. These CAD systems are used to analyze mammographic abnormalities and classify lesions as either benign or malignant in order to assist the radiologist in diagnostic decision making. Some of them are based on Bayesian networks learned from mammographic descriptions provided by radiologists or from features extracted by image processing. Another classification technique widely used for the diagnosis of breast tumors is the support vector machine (SVM). The advantage of SVMs is that, by choosing a specific hyperplane among the many that can separate the data in the feature space, the problem of overfitting the training data is reduced. They are often able to characterize a large training set with a small subset of the training points. Also, SVMs allow us to choose features with arbitrary distributions, and we do not need to make any independence assumptions. The advantage of Bayesian networks is that statistical dependences and independences between features are represented explicitly.

In this study we compare both classification methods and use two techniques, principal component analysis (PCA) and a transformation that brings the image features closer to a normal distribution, to improve the accuracy of the classifiers. Recently, the combination of PCA and SVMs has been used in medical imaging, where PCA is applied to extracted image features and the results are used to train an SVM classifier, but not specifically for mammograms.

II- MATERIALS AND METHODS

The digitized mammograms used were obtained from the Dutch breast cancer screening program. In this program, two mammographic views of each breast are obtained at the initial screening: the mediolateral oblique (MLO) view and the craniocaudal (CC) view. The study covers a number of cases, some benign and some malignant. A CAD scheme previously developed within the group was applied to each image in the dataset.
A) CAD scheme

The CAD scheme is described briefly by the following steps:

1) Likelihood detection: The first step of our CAD scheme is the segmentation of an image into breast tissue and background, using a skin line detection algorithm. Additionally, it locates the edge of the pectoral muscle if the image is an MLO view. Then, a thickness equalization algorithm is applied to enhance the periphery of the breast. Background equalization is done using a similar algorithm in the pectoral muscle, in order to avoid problems with the detection of masses located on or near the pectoral boundary. A neural network classifies each pixel using features computed for that pixel and assigns it a level of suspiciousness. The neural network is trained using pixels sampled inside and outside a representative series of malignant masses. The result is an image whose pixel values represent the likelihood that a malignant mass or architectural distortion is present.
2) Initial detection, resulting in a likelihood image and a number of suspect image locations.
3) Region segmentation by dynamic programming, using the suspicious locations as seed points.
4) Final classification of the regions as true abnormalities or false positives.

B) Naïve Bayesian Classifier

The naive Bayesian classifier is a Bayesian network with a limited topology, applicable to learning tasks where each instance is described by a conjunction of feature values and a class value. The method estimates the class of an unknown data item using a probabilistic statistical model. The challenge in Bayesian classification is to determine the class of a data sample that has a number of attributes. Let $Z$ be the dataset with $u$ objects $X_1, X_2, \ldots, X_u$. Each object has $n$ attributes $A_1, A_2, \ldots, A_n$. Let there be $m$ classes $C_1, C_2, \ldots, C_m$. The naive Bayes classifier assigns an unlabeled data sample $X$ to the class $C_i$ if and only if

$P(C_i \mid X) > P(C_j \mid X)$ for $1 \le j \le m$, $j \ne i$ (1)

Then $X$ is assigned to $C_i$. This is called the Bayes decision rule. In this method of classification the dataset is divided into two sets, training and testing. The training dataset is treated as prior information, and the model is constructed on the basis of this training dataset.

C) SVM scheme

The SVM scheme is based on the algorithm presented by Cortes and Vapnik. It is used for solving classification tasks and has been successfully applied in various areas of research. The basic idea of an SVM is that it projects the data points of a given two-class training set into a higher dimensional space and finds an optimal hyperplane. The optimal hyperplane is the one that separates the data with the maximal margin. SVMs identify the data points near the optimal separating hyperplane, which are called support vectors. The distance between the separating hyperplane and the nearest of the positive and negative data points is called the margin of the SVM classifier.

The separating hyperplane is defined as

$D(x) = (w \cdot x) + b$ (4)

where $x$ is a vector of the dataset mapped to a high dimensional space, and $w$ and $b$ are parameters of the hyperplane that the SVM will estimate.

The following methodologies were used on digitized mammograms:

1) Digitized mammogram database: The Mini-MIAS mammogram database is used to test the algorithm presented. The MIAS database was created by the Mammographic Image Analysis Society, an organization in the UK. This database includes a number of digital mammograms. The screening-program mammograms have been digitized to a 50-micron pixel edge, with each pixel represented by an 8-bit word. Every image in the database comes with extra information, or ground truth, from radiologists describing the characteristics of the background tissue.
2) Preprocessing: The preprocessing phase consists of two steps. First, artifacts and unwanted parts in the background of the mammogram are removed, which can be done simply with a thresholding technique. Then an enhancement is applied.
3) ROI detection: The preprocessed mammogram is input to the ROI detection module in order to extract one or more regions of interest (ROIs) from the background. These ROIs are regions that likely contain a mass lesion. Each detected ROI is then labeled as a true positive ROI (TP ROI) or a false positive ROI (FP ROI). The ratio of the number of TP ROIs to the number of masses is called sensitivity.

III- CONCLUSION AND RESULTS

We evaluated the effectiveness of dimension reduction and normal distribution transformation in improving the classification accuracy. After transformation, the difference in performance between the SVM classifier and the naive Bayesian classifier was not statistically significant. Despite the major drawback of principal component analysis, namely that it can eliminate a dimension that is good for discriminating positive cases from negative cases, this unsupervised dimension reduction algorithm improved the classification accuracy of both classifiers. The performance of the two classifiers after applying PCA was very similar, with no statistical differences in the area under the ROC curve.

The dataset we used contained many features that were highly skewed and hence did not follow a normal distribution. The algorithm used in Bayesian networks assumes that, within each state of the class, the observed continuous features follow a normal distribution.
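The normality assumption and the Bayes decision rule discussed above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the log(1 + x) transform is an assumed stand-in for the unspecified normalizing transformation, the function names are hypothetical, and the toy feature values and labels are invented purely for illustration.

```python
import math
from statistics import mean, pvariance


def log_transform(rows):
    """Apply log(1 + x) to each feature to reduce right skew (assumes x >= 0)."""
    return [[math.log1p(v) for v in row] for row in rows]


def fit_gaussian_nb(rows, labels):
    """Estimate the prior and the per-feature mean and variance of each class."""
    model = {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        columns = list(zip(*members))
        model[c] = {
            "prior": len(members) / len(rows),
            "mean": [mean(col) for col in columns],
            # A small floor keeps every variance strictly positive.
            "var": [max(pvariance(col), 1e-9) for col in columns],
        }
    return model


def log_posterior(params, x):
    """log P(C) + sum of per-feature Gaussian log densities (unnormalized)."""
    total = math.log(params["prior"])
    for v, m, s2 in zip(x, params["mean"], params["var"]):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total


def predict(model, x):
    """Bayes decision rule: choose the class with the largest posterior."""
    return max(model, key=lambda c: log_posterior(model[c], x))


# Hypothetical toy data: two skewed features per region, illustrative labels.
train_x = [[1.0, 2.0], [1.5, 1.0], [2.0, 3.0],
           [40.0, 90.0], [55.0, 60.0], [80.0, 120.0]]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]

model = fit_gaussian_nb(log_transform(train_x), train_y)
prediction_low = predict(model, log_transform([[1.2, 2.5]])[0])
prediction_high = predict(model, log_transform([[60.0, 100.0]])[0])
print(prediction_low, prediction_high)
```

The transform is applied to both training and test samples before the Gaussian parameters are estimated or evaluated, so the per-class normal model is fitted on the less skewed values. Comparing log posteriors rather than posteriors leaves the decision unchanged, because the normalizing term is the same for every class.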
For the SVM classifier, the transformation had no noticeable effect on performance, although it may give the best result when 6 samples are used.

FUTURE WORKS

In future work, researchers may wish to test the techniques discussed in this paper on a larger database. These techniques can also be improved by using an unsupervised Kohonen neural network to extract features from the extracted areas and then feeding those features to a feed-forward neural network.

REFERENCES

1. E. Burnside, D. Rubin, and R. Schachter, “A Bayesian network for mammography,” in Proceedings of AMIA Annual Symposium, pp. 106–110, 2000.
2. X. Wang, B. Zheng, W. Good, J. King, and Y. Chang, “Computer assisted diagnosis of breast cancer using a data-driven Bayesian belief network,” International Journal of Medical Informatics 54, pp. 115–126, May 1999.
3. C. Cortes and V. Vapnik, “Support vector networks,” Machine Learning 20(3), pp. 273–297, 1995.
4. S. Timp, Analysis of Temporal Mammogram Pairs to Detect and Characterise Mass Lesions. PhD in medical sciences, Radboud University Nijmegen, 2006. ISBN 9090205500.
5. T. Nattkemper, B. Arnrich, O. Lichte, W. Timm, A. Degenhard, L. Pointon, C. Hayes, and M. Leach, “Evaluation of radiological features for breast tumour classification in clinical screening with machine learning methods,” Artificial Intelligence in Medicine 34, pp. 129–139, June 2004.
6. M. Mavroforakis, H. Georgiou, N. Dimitropoulos, D. Cavouras, and S. Theodoridis, “Significance analysis of qualitative mammographic features, using linear classifiers, neural networks and support vector machines,” European Journal of Radiology 54, pp. 80–89, April 2005.
7. S. Li, T. Fevens, and A. Krzyzak, “Automatic clinical image segmentation using pathological modeling, PCA and SVM,” Engineering Applications of Artificial Intelligence 19, pp. 403–410, June 2006.
8. G. te Brake, N. Karssemeijer, and J. Hendriks, “An automatic method to discriminate malignant masses from normal tissue in digital mammograms,” Physics in Medicine and Biology 45(10), pp. 2843–2857, 2000.
9. N. Karssemeijer, “Automated classification of parenchymal patterns in mammograms,” Physics in Medicine and Biology 43(2), pp. 365–378, 1998.
10. P. Snoeren and N. Karssemeijer, “Thickness correction of mammographic images by anisotropic filtering and interpolation of dense tissue,” in Medical Imaging 2005: Image Processing, J. M. Fitzpatrick and J. M. Reinhardt, eds., Proceedings of the SPIE, vol. 5747, pp. 1521–1527, April 2005.
11. N. Karssemeijer and G. te Brake, “Detection of stellate distortions in mammograms,” IEEE Transactions on Medical Imaging 15(5), pp. 611–619, 1996.
12. S. Timp and N. Karssemeijer, “A new 2D segmentation method based on dynamic programming applied to computer aided detection in mammography,” Medical Physics (5), pp. 958–971, 2004.
13. S. Caulkin, S. Astley, J. Asquith, and C. Boggis, “Sites of occurrence of malignancies in mammograms,” in Digital Mammography Nijmegen, N. Karssemeijer, M. Thijssen, J. Hendriks, and L. Erning, eds., vol. 13, pp. 279–282, Kluwer Academic Publishers, Dordrecht, The Netherlands, first ed., December 1998.
14. C. Varela, S. Timp, and N. Karssemeijer, “Use of border information in the classification of mammographic masses,” Physics in Medicine and Biology, January 2006.
15. C. Metz, B. Herman, and J. Shen, “Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data,” Statistics in Medicine 17, pp. 1033–1053, 1998.
16. C. Metz, P. Wang, and H. Kronman, “A new approach for testing the significance for differences between ROC curves measured from correlated data,” in Information Processing in Medical Imaging, F. Deconinck, ed., pp. 432–445, The Hague: Nijhoff, 1984.
17. R. Neapolitan, Learning Bayesian Networks, Prentice Hall, Upper Saddle River, NJ, 2003.
18. P. Domingos and M. J. Pazzani, “On the optimality of the simple Bayesian classifier under zero-one loss,” Machine Learning 29(2-3), pp. 103–130, 1997.
19. A. J. Smola and B. Scholkopf, “A tutorial on support vector regression,” Statistics and Computing 14(3), pp. 199–222, 2004.
20. K. Murphy, “The Bayes Net Toolbox for Matlab,” 2001.
21. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley, second ed., 2001.

http://www.ijettjournal.org