CLASSIFICATION OF BRAIN TUMOR USING SUPPORT VECTOR MACHINE CLASSIFIERS
Sanjivani N. Vaidya
Assistant Professor
Datta Meghe COE Mumbai

Namdeo B. Vaidya
Assistant Professor
Datta Meghe COE Mumbai

Dr. D J Pete
Professor & Head of Electronics
Datta Meghe COE Mumbai
ABSTRACT
MRI brain images play a vital role in assisting radiologists to assess patients for diagnosis and treatment. For the radiologist, studying medical images is not only a tedious and time-consuming process; its accuracy also depends on the radiologist's experience. The use of computer-aided systems therefore becomes necessary to overcome these limitations. Even though several automated methods are available, segmentation of MRI brain images still remains a challenging problem due to its complexity, and there is no standard algorithm that can produce satisfactory results. In this review paper, various current methodologies of brain image segmentation using automated algorithms that are accurate and require little user interaction are reviewed, and their advantages and disadvantages are discussed. This review paper guides in combining two or more methods to produce accurate results.

KEYWORDS: MRI brain image, Support Vector Machine, GLCM.

1. INTRODUCTION
Magnetic resonance imaging (MRI) of the brain is a safe and painless test that uses a magnetic field and radio waves to produce detailed images of the brain and brain stem. MRI differs from computed tomography (CT) because it does not use radiation. MRI can detect a variety of conditions of the brain such as cysts, tumours, bleeding, swelling, structural abnormalities, infections or problems with the blood vessels. MRI of the brain can be useful in evaluating problems such as persistent headaches, dizziness, weakness and seizures, and it can help to detect certain chronic diseases of the nervous system such as multiple sclerosis. In some cases, MRI can provide clear images of parts of the brain that cannot be seen with an X-ray, CT scan or ultrasound. There are many different types of pediatric brain tumors, ranging from those that can be cured with minimal therapy to those that cannot be cured even with aggressive therapy. Some of the common types are astrocytomas, gliomas, ependymomas, germ cell tumors, brainstem tumors and craniopharyngiomas. Segmentation of the brain into tissues such as gray matter, white matter, cerebrospinal fluid, skull and tumor is very important for detecting tumor, edema and hematoma.

Research in most developed countries has shown that the death rate of people affected by brain tumors has increased over the past three decades [39]. A tumour is a mass of tissue that grows out of control of the normal forces that regulate growth [50]. Tumours can directly destroy healthy brain cells. They can also damage healthy cells indirectly by crowding other parts of the brain and causing inflammation, brain swelling and pressure within the skull [28]. In early research on medical tumor detection, the algorithms directly used classic image processing methods (such as edge detection and region growing) based on the gray intensities of images. In recent years, classification of human brain MRI images has become possible via supervised techniques such as k-nearest neighbour, artificial neural networks and support vector machines (SVM), and unsupervised classification techniques such as the self-organizing map (SOM) and the fuzzy C-means algorithm have also been used to classify normal or pathological T2-weighted MRI images. Even though many algorithms are available for detecting brain tumours, the detection rate is still not satisfactory. In this paper, classification approaches (ANN, KNN, K-means clustering, SOM and SVM) are reviewed in section 2. Section 3 presents the proposed methodology: GLCM texture feature extraction and the SVM classifiers used for classifying brain images. Section 4 discusses the current status of the system, and section 5 concludes.

1.1 Objective of the project
Meningioma subtype classification is a real world problem from the domain of histological image analysis that requires new methods for its resolution. High intra-class variation and low inter-class differences in texture are often an issue in histological image analysis problems such as Meningioma subtype classification. The problem of Meningioma subtype classification requires discriminating between the four types of Meningioma, namely Meningothelial, Fibroblastic, Transitional and Psammomatous [1].

1.2 Scope of the project
Diagnosis of the tumour, up to the classification of the meningioma subtype, is time consuming, prone to error and highly dependent on the availability of experts. There is therefore a need for a computer-based technique for the discrimination of Meningioma subtypes. This project contributes to enhancing the knowledge in the field of Meningioma tumor classification [2][3].

2. LITERATURE REVIEW
The literature review is done to get an insight into the challenges of computer-based Meningioma subtype classification. Meningioma classification is a real world problem from the domain of medical image analysis that requires efficient pattern recognition. Real world problems, however, present a whole new set of challenges to the pattern recognition community. Many techniques that work exceptionally well in the lab fail, or do not perform as well, in the real world where conditions are not perfect. Hence there is a need to resolve the problem of computer-based Meningioma subtype (brain tumor) classification [6]. There are two approaches that could be used to solve the problem. Firstly, image segmentation could be used to extract structure in an image, and classification could then be carried out based upon the constituents of the image. The other approach is textural: textural features can be acquired from each image, and classification can be carried out based upon these features [4].

Approaches used for classification fall into two categories. The first category is supervised learning techniques such as Artificial Neural Network (ANN), Support Vector Machine (SVM) and the K-Nearest Neighbor (KNN) algorithm, which are used for classification. The other category is unsupervised learning for data clustering, such as K-means clustering and the Self Organizing Map (SOM). Many of the detailed decisions required for supervised classification are not required for unsupervised classification.
2.1 Artificial Neural Networks (ANN) Classifier:
An artificial neural network basically has three layers, namely the input layer, the hidden layer and the output layer. There may be one or more hidden layers, depending upon the number of dimensions of the training samples. A learning problem with binary outputs (1/0) is referred to as a binary classification problem, whose output layer has only one neuron. A learning problem with a finite number of outputs is referred to as a multi-class classification problem, whose output layer has more than one neuron. The examples of the input data set (or sets) are referred to as the training data. The algorithm which takes the training data as input and gives the output by selecting the best one among the hypothetical planes in the hypothesis space is referred to as the learning algorithm.

There are two different styles of training, i.e., incremental training and batch training. In incremental training the weights and biases of the network are updated each time an input is presented to the network. In batch training the weights and biases are only updated after all of the inputs are presented. In this algorithm, for learning the samples, tan-sigmoid and log-sigmoid functions are applied in the hidden layer and output layer respectively, and gradient descent is used for adjusting the weights as the training methodology. For the training process, different features are first extracted block by block in one image. When a new image comes, only those selected features are extracted and the trained classifier is used to categorize the tumor in the image [4].

Shortcomings of ANN
ANN appears to be a promising alternative; however, ANNs have failed to model sequence data such as online images, due to their complexity. Also, ANN cannot differentiate the different abnormal brain images based on the optimal feature set [7][14].

2.2 K-Nearest Neighbor Algorithm
KNN is a method for classifying objects based on the closest training examples in the feature space. It is a type of instance-based learning, where the function is only approximated locally and all computation is deferred until classification. An object is classified by a majority vote of its neighbors, the object being assigned to the class most common amongst its k nearest neighbors. The neighbors are taken from a set of objects for which the correct classification is known. In order to identify neighbors, the objects are represented by position vectors in a multidimensional feature space. The k-nearest neighbor algorithm is sensitive to the local structure of the data.

2.3 K-Means Clustering
K-means is a widely used clustering algorithm that partitions data into k clusters. Clustering is the process of grouping data points with similar feature vectors into a single cluster and grouping data points with dissimilar feature vectors into different clusters [6].

2.4 Self Organizing Map (SOM)
A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) for unsupervised learning. SOMs operate in two modes: training and mapping. Training is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector. Segmentation is an important process to extract information from complex medical images, and it has wide application in the medical field [6].

Shortcomings of SOM
The main shortcoming of the SOM is that the number of neural units in the competitive layer needs to be approximately equal to the number of regions desired in the segmented image. It is not, however, possible to determine a priori the correct number of regions M in the segmented image. This is the main limitation of the conventional SOM for image segmentation. The hierarchical SOM (HSOM) directly addresses the aforesaid shortcomings of the SOM [6].

2.5 Support Vector Machine (SVM)
SVM is a nonlinear classification algorithm based on kernel methods. In contrast to linear classification methods, kernel methods map the original parameter vectors into a higher (possibly infinite) dimensional feature space through a nonlinear kernel function.

High dimensional input spaces can be computationally difficult and time consuming for classifiers, e.g. for the weight adjustment of an artificial neural network (ANN), so the input dimension often needs to be reduced. With limited resources (computer memory, computer speed, etc.), a classifier should solve the computation as fast as possible. The computational efficiency of SVM is high [14].

3. PROPOSED METHODOLOGY
In this project, a statistical method is presented and applied to brain tumor classification. In images, different local textures can describe different physical characteristics. We use the gray level co-occurrence matrix (GLCM) approach introduced by Haralick, which is a well-known statistical method for extracting second-order texture information from images. The assumption is that the local texture of tumor cells is highly different from the local texture of other biological tissues. Thus texture measurements in the image could be part of an effective discrimination technique between healthy tissues and possible tumor areas. The association between local texture measures and recognized tumor areas is made using SVM classifiers; in this project, we use Support Vector Machine classifiers. Fig. 3.1 shows a block diagram of the proposed algorithm.

Figure 3.1: Operational flow chart for proposed system.
3.1 Textural Features
Texture is a commonly used feature in the analysis and interpretation of images. Texture is characterized by a set of local statistical properties of pixel intensities. We base our texture feature extraction on the spatial gray level co-occurrence matrix (SGLCM). The GLCM method considers the spatial relationship between pixels of different gray levels. It calculates how often a pixel with intensity i occurs in relation to another pixel with intensity j at a certain distance d and orientation θ. For instance, if the value of a pixel is 1, the method counts the number of times this pixel has a pixel of value 2 on its right side. Each element (i, j) of the GLCM is thus the number of times that a pixel with value i occurred in the specified relationship to a pixel with value j in the raw image. Once the GLCM is calculated, several second-order texture statistics can be computed, as illustrated below, where $P_{d,\theta}(i, j)$ is the GLCM between i and j [3].

The feature extraction step extracts the features that are important for image classification. The extracted features describe the properties of the image window and can be used for training in the database. The trained features are compared with the features obtained from the test sample, which is then classified into one of the learned categories.

Texture features, or more precisely GLCM features, are used to distinguish between normal and abnormal (tumorous) brain images. Five co-occurrence matrices are constructed: four in the spatial orientations horizontal, right diagonal, vertical and left diagonal (0°, 45°, 90° and 135°), and a fifth as the mean of the preceding four matrices. From each co-occurrence matrix, a set of eight features is extracted in different orientations for the training of the SVM model. Let P be the N×N co-occurrence matrix calculated for each sub-image; the features, as given by Byer, are then as follows:

1. Maximum Probability:
$$f_1 = \max_{i,j} P(i,j)$$
2. Contrast: a difference moment, defined as
$$f_2 = \sum_{i,j=1}^{N} |i-j|^2 \, P(i,j)$$
3. Inverse Difference Moment (Homogeneity): a measure of local homogeneity, defined as
$$f_3 = \sum_{i,j=1}^{N} \frac{P(i,j)}{1+(i-j)^2}$$
4. Entropy: a measure of non-uniformity in the image based on the probability of co-occurrence values, defined as
$$f_4 = \sum_{i,j=1}^{N} P(i,j)\,\bigl[-\log P(i,j)\bigr]$$
5. Angular Second Moment (Energy): a measure of homogeneity, defined as
$$f_5 = \sum_{i,j=1}^{N} P(i,j)^2$$
6. Correlation Coefficient: a measure of the linear dependency of brightness, defined as
$$f_6 = \frac{\sum_{i,j=1}^{N} i\,j\,P(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y}$$

where N is the number of distinct gray levels in the quantized image, equal to 256 for the images in the present study, and $\mu_x, \mu_y, \sigma_x, \sigma_y$ are the means and standard deviations of the GLCM in the x and y directions, respectively.

3.2 CLASSIFICATION
Classification is the procedure of assigning input patterns to analogous classes. When the input data set is represented by its class membership, it is called supervised learning. It employs two phases of processing: a training phase and a testing phase. In the training phase, characteristic properties of the image features are isolated and a unique description of each classification category is created. In the testing phase, these feature-space partitions are used to classify image features [13].
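As an illustration only, the GLCM construction and the features f1 to f6 of Section 3.1, which supply the feature vectors for the training and testing phases, can be sketched in plain Python. This is not the authors' implementation. Assumptions not taken from the paper: the image is a small list of rows of integer gray levels already quantized to `levels` values (indexed from 0 rather than 1, which does not change f1 to f6); pairs are counted in one direction only and normalized so that P sums to 1; the orientation offsets in `OFFSETS` follow a common convention with rows growing downward; entropy uses the natural logarithm; and μx, μy, σx, σy are taken from the row and column marginals of P. The names `glcm`, `mean_glcm` and `haralick_features` are hypothetical.

```python
import math

# Hypothetical offsets (dx = column step, dy = row step) for distance 1 at the
# four orientations of Section 3.1; rows grow downward, so "up" is dy = -1.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(image, dx, dy, levels):
    """Normalized (non-symmetric) gray level co-occurrence matrix P.

    P[i][j] is the fraction of pixel pairs in which a pixel of gray level i
    has a neighbour of gray level j at offset (dx, dy).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[n / total for n in row] for row in counts]


def mean_glcm(image, levels, distance=1):
    """Fifth matrix: element-wise mean of the 0, 45, 90 and 135 degree GLCMs."""
    mats = [glcm(image, dx * distance, dy * distance, levels)
            for dx, dy in OFFSETS.values()]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(levels)]
            for i in range(levels)]


def haralick_features(P):
    """Features f1..f6 of Section 3.1 (gray levels indexed from 0 here)."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    px = [sum(P[i]) for i in range(n)]                        # row marginal
    py = [sum(P[i][j] for i in range(n)) for j in range(n)]   # column marginal
    mu_x = sum(i * p for i, p in enumerate(px))
    mu_y = sum(j * p for j, p in enumerate(py))
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * p for i, p in enumerate(px)))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * p for j, p in enumerate(py)))
    cov_term = sum(i * j * p for i, j, p in cells) - mu_x * mu_y
    return {
        "f1_max_prob": max(p for _, _, p in cells),
        "f2_contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "f3_homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "f4_entropy": -sum(p * math.log(p) for _, _, p in cells if p > 0),
        "f5_energy": sum(p * p for _, _, p in cells),
        # Correlation is undefined when a marginal has zero variance;
        # this sketch returns 0.0 in that case (an assumption).
        "f6_correlation": cov_term / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0,
    }


if __name__ == "__main__":
    img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]  # 4 gray levels
    for name, value in haralick_features(glcm(img, 1, 0, levels=4)).items():
        print(f"{name}: {value:.4f}")
```

For this 4-level example image and the 0° offset there are 12 horizontal pixel pairs; the most frequent pair (2, 2) occurs 3 times, so f1 = 0.25, and the contrast is f2 = 7/12. With N = 256 gray levels, as in this study, the same loops apply to a 256 × 256 matrix.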
3.2.1 Support Vector Machine (Binary classifier)
SVM is one of the techniques used for classification. SVMs are generally capable of delivering high classification accuracy. SVM is a binary classifier based on supervised learning which often gives better performance than other classifiers. SVM classifies between two classes by constructing a hyperplane which can be used for classification [13][14]. The expression for the hyperplane is
$$w \cdot x + b = 0$$
where x is the set of training vectors, w is the weight vector, normal to the hyperplane, and b is the bias or threshold.

Figure 3.3: SVM Classifier

3.2.2 Linear SVM Classifier
SVM maps input vectors into a higher dimensional vector space where an optimal hyperplane is constructed. Among the many hyperplanes available, there is only one hyperplane that maximizes the distance between itself and the nearest data vectors of each category. This hyperplane, which maximizes the margin, is called the optimal separating hyperplane, and the margin is defined as the sum of the distances from the hyperplane to the closest training vectors of each category. The basic idea of SVM is to maximize the margin of the hyperplane between the two classes [13][14]. The detailed description is given below:

Step1: The simplest form of discriminating function is linear. The linear discriminating function f(x) is written as:
$$f(x) = w^T x + b$$
The expression for the hyperplane is
$$f(x) = w^T x + b = 0$$
where x is the set of training vectors, w is the vector perpendicular to the separating hyperplane and b is the offset parameter which allows the increase of the margin.

Step2: The distance d of a point x from the hyperplane can be calculated by:
$$d(x) = \frac{|f(x)|}{\lVert w \rVert}$$
|f(x)| is therefore a measure of the Euclidean distance of the point x from the decision hyperplane (it equals that distance when ‖w‖ = 1). On one side of the plane f(x) has positive values, and on the other side negative values. In the special case b = 0 the hyperplane passes through the origin. The margin is d1 + d2, where d1 and d2 are the distances from the hyperplane to the closest training vectors of each class.

A commonly used criterion in classification is the distance measure, explained below:
• The distance measure is the simplest and most direct approach to classify data points. Basically, the idea is to classify a data point into the class closest to it. The Euclidean distance is the most common definition. Suppose we have K classes with $(\mu_i, S_i)$ as the known parameter set of class i, where $\mu_i$ is the reference vector of class i and $S_i$ is the covariance. The Euclidean distance of an observation vector x from class i is given by the following equation [14]:
$$d_i(x) = \sqrt{\lVert x - \mu_i \rVert^2} = \lVert x - \mu_i \rVert$$
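A small sketch of Steps 1 and 2 and of the minimum Euclidean distance rule, written in plain Python for illustration. The weight vector, bias and class means are hand-picked, hypothetical values, not values from this study, and the hyperplane is chosen by hand rather than trained; the function names are also hypothetical.

```python
import math


def decision_value(w, b, x):
    """Linear discriminating function f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Euclidean distance d = |f(x)| / ||w|| from x to the plane f(x) = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


def classify(w, b, x):
    """Side of the hyperplane: +1 where f(x) >= 0, otherwise -1."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin(w, b, points, labels):
    """d1 + d2: distance from the plane to the closest vector of each class."""
    d1 = min(distance_to_hyperplane(w, b, x) for x, y in zip(points, labels) if y == 1)
    d2 = min(distance_to_hyperplane(w, b, x) for x, y in zip(points, labels) if y == -1)
    return d1 + d2


def nearest_mean(x, means):
    """Minimum Euclidean distance rule: the class i with the smallest ||x - mu_i||."""
    return min(means, key=lambda label: math.dist(x, means[label]))


if __name__ == "__main__":
    # Illustrative, hand-picked hyperplane x1 + x2 - 3 = 0 (not a trained model).
    w, b = [1.0, 1.0], -3.0
    points = [[2, 2], [3, 3], [1, 1], [0, 1]]
    labels = [1, 1, -1, -1]
    print([classify(w, b, p) for p in points])  # [1, 1, -1, -1]
    print(margin(w, b, points, labels))         # approximately 1.4142 (= 2 / ||w||)
```

Here the closest vector of each class lies at f(x) = ±1, so d1 = d2 = 1/‖w‖ and the margin d1 + d2 equals 2/‖w‖ = √2. Training an SVM amounts to choosing w and b so that this quantity is as large as possible while every vector stays on its own side.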
3.2.3 Non-Linear SVM
This section first introduces the idea of maximal margin classification and the optimal separating hyperplane, followed by kernel methods as the basis for the extension towards nonlinear classification, as introduced by Vapnik. A kernel function is used when the decision function is not a linear function of the data; the data are then mapped from the input space through a nonlinear transformation, rather than fitting nonlinear curves in the vector space to separate the data. With a suitable kernel function implemented in the SVM model, the classification task scales relatively well to high dimensional data, and the tradeoff between classifier complexity and classification error can be controlled explicitly [14]. The various steps are given below:

Step1: Maximal margin
Consider the class of hyperplanes $w^T x + b = 0$, $w \in \mathbb{R}^n$, $b \in \mathbb{R}$, corresponding to the decision function
$$f(x) = \operatorname{sign}(w^T x + b)$$
A hyperplane is constructed which maximally separates the classes (maximum margin):
$$\max_{w,b} \; \min \left\{ \lVert x - x_k \rVert : x \in \mathbb{R}^n,\ w^T x + b = 0,\ k = 1, \dots, N \right\}$$
To show how this hyperplane can be constructed efficiently, we need the definition of separability: a training set $D = \{(x_1, y_1), \dots, (x_N, y_N) : x_k \in \mathbb{R}^n,\ y_k \in \{-1, +1\}\}$ is called separable by a hyperplane $w^T x + b = 0$ if there exist both a unit vector w ($\lVert w \rVert = 1$) and a constant b such that the following inequalities hold:
$$w^T x_k + b \ge +1 \quad \text{for } y_k = +1 \qquad (1)$$
$$w^T x_k + b \le -1 \quad \text{for } y_k = -1 \qquad (2)$$

Step2: Optimal separating hyperplane (maximum margin hyperplane)
The optimal hyperplane of a training set D is defined by
$$(w^*, b^*) = \arg\max_{w,b} D(w, b)$$
that is, the unit vector w* and the constant b* which maximize the margin D(w, b) of the training set and also satisfy conditions (1) and (2).

Step3: Kernel criteria
Steps involved:
(i) Let $x \in D \subseteq \mathbb{R}^n$ denote a real valued random input vector, $y \in \{-1, +1\}$ a discrete real valued random output variable, and let $\Omega \subseteq \mathbb{R}^{n_H}$ denote a high dimensional feature space. The SVM method basically maps the input vector x into the high dimensional feature space through some nonlinear mapping $\varphi : D \to \Omega$. In this feature space, one considers the linear function
$$f(x) = \operatorname{sign}\left[w^T \varphi(x) + b\right]$$
This linear function is well suited to solving classification problems; however, carrying out the calculation in the high dimensional feature space remains a problem. Interestingly, no explicit construction of the nonlinear mapping φ(x) is needed. This is motivated by the following result.
(ii) The inner product $\varphi(x_k)^T \varphi(x_l)$ in the feature space can be replaced with the corresponding kernel $K(x_k, x_l)$ satisfying Mercer's condition.
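Result (ii) can be checked numerically for a case where φ is small enough to write out. The sketch below is an illustrative assumption, not part of the proposed system: for x, z in ℝ² and the quadratic kernel K(x, z) = (1 + xᵀz)², the explicit map φ(x) = (1, √2·x₁, √2·x₂, x₁², x₂², √2·x₁x₂) gives φ(x)ᵀφ(z) = K(x, z), so the six-dimensional inner product never has to be formed. The function names are hypothetical.

```python
import math


def quad_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2, computed directly in the 2-D input space."""
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit 6-D feature map whose inner product reproduces quad_kernel."""
    r2 = math.sqrt(2.0)
    x1, x2 = x
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


def feature_space_dot(x, z):
    """phi(x)^T phi(z), computed the explicit (expensive) way."""
    return sum(a * b for a, b in zip(phi(x), phi(z)))


def gram_matrix(points, kernel):
    """Kernel values K(x_k, x_l); in the dual problem the training vectors
    enter only through these values (together with the labels)."""
    return [[kernel(p, q) for q in points] for p in points]


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, -1.0)
    print(quad_kernel(x, z))                 # 4.0
    print(feature_space_dot(x, z))           # 4.0 up to floating-point rounding
    print(gram_matrix([x, z], quad_kernel))  # [[36.0, 4.0], [4.0, 121.0]]
```

For kernels such as the RBF kernel the corresponding feature space is infinite dimensional, so working with K directly, rather than with φ, is the practical route.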
Using Mercer's theorem to replace the inner product $\varphi(x_k)^T \varphi(x_l)$ with its corresponding kernel $K(x_k, x_l)$ is often called the kernel trick. It enables us to work in a huge dimensional feature space without actually having to do explicit computations in that space; the computations are done in another space after applying the kernel trick. In the case of support vector machines, one starts from a formulation in the primal weight space with a high dimensional feature space obtained by applying the transformations φ(·). The solution is calculated not in this primal weight space, but in the dual space of Lagrange multipliers after applying the kernel trick. In this way classification is done implicitly in a high dimensional feature space rather than in the original input space.

Step4: Non-linear conversion
With a slight modification, for the nonlinear case we can write
$$w^T \varphi(x_k) + b \ge +1 \quad \text{for } y_k = +1$$
$$w^T \varphi(x_k) + b \le -1 \quad \text{for } y_k = -1$$
In this quadratic form, the kernel trick is applied:
$$K(x_k, x_l) = \varphi(x_k)^T \varphi(x_l), \quad k, l = 1, \dots, N$$
Finally, the nonlinear SVM classifier takes the form
$$y(x) = \operatorname{sign}\left[\sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b\right]$$
where $\alpha_k$ are the Lagrange multipliers.

3.2.4 Choice of kernel function:
Two common choices of kernel functions are:
(i) RBF kernel:
$$K(x, z) = \exp\left(-\lVert x - z \rVert^2 / \sigma^2\right)$$
(ii) Polynomial kernel of degree d:
$$K(x, z) = \left(\tau + x^T z\right)^d$$

4. DISCUSSION
The system we are developing is 60% complete. Various GLCM-based features have been extracted. In this system a sample of 120 brain images is taken, of which 80 images are cancerous and 40 images are non-cancerous. The features have been calculated for all 120 images, and further (kernel-based) classification into the various subtypes of Meningioma is in progress.

5. CONCLUSION
In this dissertation we study the problems of conventional classifiers such as ANN, KNN, K-means clustering and HSOM, for which high dimensional input spaces are computationally difficult and time consuming. In the proposed system, advanced kernel-based SVM techniques (RBF, quadratic and linear kernels) are being implemented. Recent kernel-based SVM classifiers can handle the computation for high dimensional input spaces quickly, and the computational efficiency of SVM is high.

6. REFERENCES
[1] S. T. Acton, D. P. Mukherjee, “Scale space classification using area morphology,” IEEE Trans. Image Process., 9(4), 2000, pp. 623–635.
[2] M. N. Ahmed, S. M. Yamany, N. Mohamed, A. A. Farag, T. Moriarty, “A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data,” IEEE Trans. Medical Imaging, 21(3), 2002, pp. 193–199.
[3] Javad Alirezaie, M. E. Jernigan, C. Nahmias, “Automatic segmentation of cerebral MR images using Artificial Neural Network,” IEEE Transactions on Nuclear Science, vol. 45, no. 4, 1998.
[4] H. Azzag, N. Monmarche, M. Slimane, G. Venturini, “Ant Tree: A New Model for Clustering with Artificial Ants,” IEEE, 2003, pp. 2642–2647.
[5] E. F. Badran, E. G. Mahmoud, N. Hamdy, “An algorithm for detecting brain tumors in MRI images,” Proceedings of the International Conference on Computer Engineering and Systems (ICCES), 2010, pp. 368–373.
[6] J. C. Bezdek, “Pattern Recognition with Fuzzy Objective Function Algorithms,” New York, 1981.
[7] D. Bhattacharyya, Kim Tai-hoon, “Brain Tumor Detection Using MRI Image Analysis,” Communications in Computer and Information Science, vol. 151, 2011, pp. 307–314.
[8] B. H. Brinkmann, A. Manduca, R. A. Robb, “Optimized homomorphic unsharp masking for MR grayscale inhomogeneity correction,” IEEE Trans. Med. Imag., 17, 1998, pp. 161–171.
[9] S. Chandra, R. Bhat, H. Singh, “A PSO based method for detection of brain tumors from MRI,” Proceedings of the World Congress on Nature & Biologically Inspired Computing, Coimbatore, 2009, pp. 666–671.
[10] S. Chaplot, L. M. Patnaik, “Classification of magnetic resonance brain images using wavelets as input to support vector machines and neural networks,” Biomedical Signal Processing and Control, 2006, pp. 86–92.
[11] S. Chaplot, L. M. Patnaik, “Brain Tumor Diagnosis with Wavelets and Support Vector Machine,” Proceedings of the 3rd International Conference on Intelligent Systems and Knowledge Engineering, 2008.
[12] T. Chou, C. Chen, W. Lin, “Segmentation of dual-echo MR images using neural networks,” Proceedings of SPIE, Medical Imaging, 1993, pp. 220–227.
[13] M. H. Chowdhury, W. D. Little, “Image thresholding techniques,” IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Proceedings, 17–19 May 1995, pp. 585–589.
[14] L. P. Clarke, R. P. Velthuizen, S. Phuphanich, J. D. Schellenberg, J. A. Arrington, M. Silbiger, “MRI: Stability of Three Supervised Segmentation Techniques,” Magnetic Resonance Imaging, 11, 1993, pp. 95–106.