Review Paper on Classification on Mammography Khalid bashir1, Anuj Sharma 2

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume 14 Number 4 – Aug 2014
Review Paper on Classification on
Mammography
Khalid bashir1, Anuj Sharma 2
1
2
Department of Electronics & Communication Engineering, DIT, Dehradun, India
Assistant Professor,, Department of Electronics & Communication Engineering, DIT, Dehradun, India
Abstract – Breast cancer is leading cause of death
among female cancer patient. However the early
detection of breast cancer is dependent on both
radiologist’s ability to read mammograms and the
quality of mammograms images. Mammography is the
effective technique for the screening of breast cancer
and abnormalities detection. Screening is one of the key
factor to reduce the death rates. The strong correlation
between abnormalities and breast cancer shows that
radiologists could get benefit by the CAD (Computer
aided Diagnosis) system with abilities of automated
breast tissue classification. The aim of this paper is to
present a survey of existing mammograms,
enhancement and segmentation techniques, with the
clear classification and analysis of each technique.
Keywords: mammograms enhancement, mammograms
segmentation, breast mass.
distributions, and we do not need to make any
independence assumptions.
Bayesian networks advantage is that statistical
dependences and independences between features are
represented explicitly. In this study we compare both
classification methods and use two techniques,
named as principal component analysis (PCA) and a
transformation for achieving a normal distribution of
image features, to improve the accuracy rate of the
various techniques. Recently, the combination of
PCA and support vector machines (SVMs) has been
used in medical imaging, where principal component
analysis is applied to extracted image features and the
results are used to arrange a SVM classifier, but not
specifically for mammograms.
II- MATERIALS AND METHODS
I-- INTRODUCTION
In the beginning of 2005, the NHSBSP set up
steering group for digital mammography to address
the issues regarding breast screening programs. This
group is trying to develop and study of different
techniques used to cover this serious problem of
breast cancer. There are several techniques to detect
the breast cancer with their procurement.
Machine learning techniques to diagnosis breast
cancer is a very active research area. Several.
These CAD systems are used to analyze
mammographic abnormalities and classify lesions as
either benign or malignant in order ot assist the
radiologist in the diagnostic decision making. Some
of them are based on Bayesian networks learned on
mammographic descriptions provided by radiologists
or on features extracted by image processing.
Another classification technique that is widely used
for the diagnosis of breast tumors are support vector
machines. The advantage of SVMs is that by
choosing a specific hyper plane among the many that
can separate the data in the feature space, the
problem of over fitting the training data is reduced.
They are often able to characterize a large training set
with a small subset of the training points. Also,
SVMs allow us to choose features with arbitrary
ISSN: 2231-5381
The used digitized mammograms have been obtained
from the Dutch Breast Cancer screening program. In
this program two mammographic views o each breast
were obtained in the initial screening. The mediolateral oblique (MLO) and a cranio caudal (CC) view
are used. This study is obtained by going through
several number of cases. Form these cases some were
benign and some are of malignant. To each image in
the dataset a CAD scheme was applied that was
previously developed in proposed group.
C) CAD scheme
The CAD scheme is described briefly by following
steps:
1) Likelihood detection- In segmentation first
step is of our CAD scheme is segmentation
of an image into breast tissue and
background, using a skin line detection
algorithm. Additionally, it in the edge of the
pectorals muscle. If the image is a MLO
view. Then, a thickness equalization
algorithm is applied to improve the
periphery o the breast.
Background
equalization is done using a similar
algorithm in the pectorals muscle, in order to
http://www.ijettjournal.org
Page 169
International Journal of Engineering Trends and Technology (IJETT) – Volume 14 Number 4 – Aug 2014
avoid the problem with detection of masses
located on or near the pectoral boundary.
A neural network classifies each pixel using
these features and assigns a level of
suspiciousness to it. The neural network is
trained using pixels sampled inside and
outside of a representative series of
malignant masses.
Finally, the result of this method is an image
whose pixel values represent the likelihood
at a malignant mass or architectural
distortion is present.
2) Initial detection step resulting in a likelihood
image and a number of suspect image
locations.
3) Region
segmentation,
any dynamic
programming, using the suspicious locations
as seed points.
4) Finally, classifying the regions as true
abnormalities and false positives.
a given two-class training set in a higher dimensional
space and finds an optimal hyper plane. The optimal
one is the one that separates the data with the
maximal margin. SVMs identify the data points near
the optimal separating hyper plane which are called
support vectors. The distance between the separating
hyper plane and the nearest of the positive and
negative data points is called the margin of the SVM
classifier.
The separating hyper plane is defined as
D(x) = (w · x) + b (4)
Where, x is a vector of the dataset mapped to a high
dimensional space, and w and b are parameters of the
hyper plane that the SVM will estimate.
There are some methodologies that were used in
digitalized mammograms:
1) Digitalized mammogram database
B) Naïve Bayesian Classifier
The naive Bayesian classifier is a Bayesian network
with a limited topology applicable to learning tasks
where each instance is described by a conjunction of
feature values and a class value. This method is also
used to estimates the unknown data item using
probabilistic statistics model. Challenge in the
Bayesian classification is to determine the class of
data sample which have some number of attributes.
Let Z be the dataset with ‘u’ objects such that X1,
X2,…,Xu. Each object have n attributes i.e.
A1,A2……… An. Let there are m classes
C1,C2…Cm. Native bayes classifier predicted the
unknown data sample X which is without the label to
the class Ci if and only
P(Ci|X)>P(Cj|X) for 1<= j
(1)
In this mammogram database Mini-MIAS to test
the algorithm presented is used. The database MIAS
was created by the mammographic image analysis
society – an organization of UK. This database
includes several number of digital mammograms.
Screening program have been digitized to 50 microns
pixel edge, and presenting each pixel with an 8-bit
word. Every image in database always has extrainformation or ground truth from the radiologists says
about characteristics of background tissue.
2) Preprocessing
The preprocessing phase composes of two steps.
First, the removal of artifact and unwanted parts in
background of the mammogram. It can be simply
done by using a thresholding technique, then an
enhancement is applied.
Where, <= m , j < > i
3) ROI detection
Then X is assigned to Ci. This is called Bayes
decision rule.
In this method of classification dataset are divided
into two sets, training and testing respectively.
Training dataset is considered as prior information
and model is constructed on the basis o this training
dataset.
C) SVM scheme
The SVM scheme is based on algorithm which is
being presented by CORTES AND VAPNIK. This is
used for solving classification tasks and have been
successfully applied in various areas of research.The
basic idea of SVM is that it projects data points from
ISSN: 2231-5381
The preprocessed mammogram is inputted to
ROI detection module in order to extract one or
more ROIs from background. These ROI s are
regions which likely contain a massive lesion.
Each detected ROIs is then labeled at true
positive ROI (TP ROI) or false positive ROI (FP
ROI). The proportion of number of TP ROIs to
number of masses is called sensitivity.
III- CONCLUSION AND RESULTS
We evaluated the effectiveness of dimension
reduction and normal distribution transformation in
http://www.ijettjournal.org
Page 170
International Journal of Engineering Trends and Technology (IJETT) – Volume 14 Number 4 – Aug 2014
improving the classification accuracy. After
transformation, the difference in performance of the
SVM classifier and the naive Bayesian classifier was
not statistically significant. Despite the major
drawback of principal component analysis, i.e., it can
eliminate a dimension that is good for discriminating
positive cases from negative cases, this unsupervised
dimension reduction algorithm improved the
classification accuracy of both classifiers. The
performance of the two classifiers after applying
PCA was very similar, with no statistical differences
in the area under the ROC curve Dataset we used
contained lot of features that were highly skewed and
hence didn’t allow a normal distribution. The
algorithm used in Bayesian networks assume that
within each state of the class the observed continuous
features follow a normal distribution. For SVM
classifier, the mainly transformation had no
noticeable effect on the performance, while it may
give the best result by using 6 samples.
FUTURE WORKS
In the future researchers like to test these techniques
with a larger database, which are discussed in this
paper. These techniques can also be improved by
using an unsupervised Kohonen neural network
which can extract features from extracted areas then
feed those features to another feed forward neural
networks.
REFERENCES
1. E. Burnside, D. Rubin, and R. Schachter, “A Bayesian network
for mammography,” in Proceedings of AMIA
Annual Symposium, pp. 106–110, 2000.
2. X. Wang, B. Zheng, W. Good, J. King, and Y. Chang,
“Computer assisted diagnosis of breast cancer using
a data-driven Bayesian belief network,” International Journal of
Medical Informatics 54, pp. 115–126,
May 1999.
3. C. Cortes and V. Vapnik, “Support vector networks,” Machine
Learning 20(3), pp. 273–297, 1995.
4. S. Timp, Analysis of Temporal Mammogram Pairs to Detect and
Characterise Mass Lesions. PhD in medical
sciences, Radboud University Nijmegen, 2006. ISBN 9090205500.
5. T. Nattkemper, B. Arnrich, O. Lichte, W. Timm, A. Degenhard,
L. Pointon, C. Hayes, and M. Leach, “Evaluation
of radiological features for breast tumour classification in clinical
screening with machine learning
methods,” Artificial Intelligence Medical 34, pp. 129–139, June
2004.
ISSN: 2231-5381
6. M. Mavroforakis, H. Georgiou, N. Dimitropoulos, D. Cavouras,
and S. Theodoridis, “Significance analysis of
qualitative mammographic features, using linear classifiers, neural
networks and support vector machines,”
European Journal of Radiology 54, pp. 80–89, April 2005.
7. S. Li, T. Fevens, and A. Krzyzak, “Automatic clinical image
segmentation using pathological modeling,
PCA and SVM,” Engineering Applications of Artificial
Intelligence 19, pp. 403–410, June 2006.
8. G. te Brake, N. Karssemeijer, and J. Hendriks, “An automatic
method to discriminate malignant masses
from normal tissue in digital mammograms1,” Physics in Medicine
and Biology 45(10), pp. 2843–2857, 2000.
9. N. Karssemeijer, “Automated classification of parenchymal
patterns in mammograms,” Physics in Medicine
and Biology 43(2), pp. 365–378, 1998.
10. P. Snoeren and N. Karssemeijer, “Thickness correction of
mammographic images by anisotropic filtering and
interpolation of dense tissue,” in Medical Imaging 2005: Image
Processing. Edited by Fitzpatrick, J. Michael;
Reinhardt, Joseph M. Proceedings of the SPIE, Volume 5747, pp.
1521-1527 (2005)., J. M. Fitzpatrick and
J. M. Reinhardt, eds., pp. 1521–1527, April.
11. N. Karssemeijer and G. te Brake, “Detection of stellate
distortions in mammograms,” IEEE Transactions
on Medical Imaging 15(5), pp. 611–619, 1996.
12. S. Timp and N. Karssemeijer, “A new 2d segmentation method
based on dynamic programming applied to
computer aided detection in mammography,” Medical Physics (5),
pp. 958–971, 2004.
13. S. Caulkin, S. Astley, J. Asquith, and C. Boggis, Sites of
occurrence of malignancies in mammograms, vol. 13
of Digital Mammography Nijmegen, pp. 279–282. Kluwer
Academic Publishers, Dordrecht, The Netherlands,
first ed., December 1998. Editors: N. Karssemeijer and M.
Thijssen and J. Hendriks and L. Erning.
14. C. Varela, S. Timp, and N. Karssemeijer, “Use of border
information in the classification of mammographic
masses,” Physics in Medicine and Biology , January 2006.
15. C. Metz, B. Herman, and J. Shen, “Maximum likelihood
estimation of receiver operating characteristic
(ROC) curves from continuously-distributed data,” Statistics in
Medicine 17, pp. 1033–1053, 1998.
16. C. Metz, P. Wang, and H. Kronman, “A new approach for
testing the significance for differences between
ROC curves measured from correlated data,” in Information
Processing in Medical Imaging, F. Deconinck,
ed., pp. 432–445, The Hague: Nijhoff, 1984.
17. R. Neapolitan, Learning Bayesian Networks, Prentice Hall,
Upper Saddle River, NJ, 2003.
18. P. Domingos and M. J. Pazzani, “On the optimality of the
simple bayesian classifier under zero-one loss,”
Machine Learning 29(2-3), pp. 103–130, 1997.
19. A. J. Smola and B. Scholkopf, “A tutorial on support vector
regression,” Statistics and Computing 14(3),
pp. 199–222, 2004.
20. K. Murphy, “The bayes net toolbox for Matlab,” 2001.
21. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification,
John Wiley, second ed., 2001.
http://www.ijettjournal.org
Page 171
Download