CLASSIFICATION OF BRAIN TUMOR USING SUPPORT VECTOR MACHINE CLASSIFIERS

Sanjivani N. Vaidya, Assistant Professor, Datta Meghe COE, Mumbai
Namdeo B. Vaidya, Assistant Professor, Datta Meghe COE, Mumbai
Dr. D. J. Pete, Professor & Head of Electronics, Datta Meghe COE, Mumbai

ABSTRACT

The MRI brain image plays a vital role in assisting radiologists to assess patients for diagnosis and treatment. Study of medical images by the radiologist is not only a tedious and time-consuming process, but its accuracy also depends upon the radiologist's experience. The use of computer-aided systems therefore becomes necessary to overcome these limitations. Even though several automated methods are available, segmentation of MRI brain images remains a challenging problem due to its complexity, and there is no standard algorithm that produces satisfactory results. In this paper, current methodologies for brain image segmentation using automated algorithms that are accurate and require little user interaction are reviewed, and their advantages and disadvantages are discussed. The review also serves as a guide for combining two or more methods to produce accurate results.

KEYWORDS: MRI brain image, Support Vector Machine, GLCM.

1. INTRODUCTION

Magnetic resonance imaging (MRI) of the brain is a safe and painless test that uses a magnetic field and radio waves to produce detailed images of the brain and brain stem. Magnetic resonance imaging differs from computed tomography (CT) because it does not use radiation. MRI can detect a variety of conditions of the brain such as cysts, tumours, bleeding, swelling, structural abnormalities, infections or problems with the blood vessels. MRI of the brain can be useful in evaluating problems such as persistent headaches, dizziness, weakness and seizures, and it can help to detect certain chronic diseases of the nervous system such as multiple sclerosis. In some cases, MRI can provide clear images of parts of the brain that cannot be seen with an x-ray, CT scan or ultrasound.

There are many different types of pediatric brain tumors, ranging from those that can be cured with minimal therapy to those that cannot be cured even with aggressive therapy. Some of the common types are astrocytomas, gliomas, ependymomas, germ cell tumors, brainstem tumors and craniopharyngiomas.

Segmentation of the brain into various tissues such as gray matter, white matter, cerebrospinal fluid, skull and tumor is very important for detecting tumor, edema and hematoma. Most research in developed countries has shown that the death rate of people affected by brain tumor has increased over the past three decades [39]. A tumour is a mass of tissue that grows out of control of the normal forces that regulate growth [50]. Tumours can directly destroy healthy brain cells; they can also indirectly damage healthy cells by crowding other parts of the brain and causing inflammation, brain swelling and pressure within the skull [28].

In the early research on computer-based medical tumor detection, the algorithms directly used classic image processing methods (such as edge detection and region growing) based on the gray intensities of images. In recent years, classification of the human brain in MRI images has become possible using supervised techniques such as the k-nearest neighbour classifier, artificial neural networks and the support vector machine (SVM), as well as unsupervised techniques such as the self-organizing map (SOM) and the fuzzy C-means algorithm, which have also been used to classify normal and pathological T2-weighted MRI images. Even though many algorithms are available for detecting brain tumour, the detection rate is still not satisfactory.

In this paper, the existing classification methods are reviewed in Section 2, the feature extraction and SVM classification approach of the proposed system is described in Section 3, the current status of the work is discussed in Section 4, and conclusions are drawn in Section 5.
1.1 Objective of the project

Meningioma subtype classification is a real-world problem from the domain of histological image analysis that requires new methods for its resolution. High intra-class variation and low inter-class differences in texture are often an issue in histological image analysis problems such as Meningioma subtype classification. The problem requires discriminating between the four types of Meningioma, namely Meningothelial, Fibroblastic, Transitional and Psammomatous [1].

1.2 Scope of the project

Diagnosis of the tumour, through to the classification of the Meningioma subtype, is time consuming, prone to error and highly dependent on the availability of experts; hence there is a necessity for a computer-based technique for the discrimination of Meningioma subtypes. This project contributes to enhancing the knowledge in the field of Meningioma tumor classification [2][3].

2. LITERATURE REVIEW

The literature review is done to get an insight into classification. Meningioma classification is a real-world problem from the domain of medical image analysis that requires efficient pattern recognition. Real-world problems, however, present a whole new set of challenges to the pattern recognition community: many techniques that work exceptionally well in the lab fail, or do not perform as well, in the real world where conditions are not perfect. Hence there is a need to resolve the problem of computer-based Meningioma subtype (brain tumor) classification [6]. There are two approaches that could be used to solve the problem. Firstly, image segmentation could be used to extract structure from an image, and classification could then be carried out based upon the constituents of the image. The other approach that can be used is textural: textural features can be acquired from each image and classification can be carried out based upon these features [4].

Approaches used for classification fall into two categories. The first category is supervised learning techniques such as the Artificial Neural Network (ANN), the Support Vector Machine (SVM) and the K-Nearest Neighbor (KNN) algorithm, which are used for classification. The other category is unsupervised learning for data clustering, such as K-means clustering and the Self-Organizing Map (SOM). Many of the detailed decisions required for supervised classification are not required for unsupervised classification.

2.1 Artificial Neural Networks (ANN) Classifier

The artificial neural network basically has three layers, namely an input layer, a hidden layer and an output layer. There can be one or more hidden layers, depending upon the number of dimensions of the training samples. A learning problem with binary outputs (1/0) is referred to as a binary classification problem, whose output layer has only one neuron. A learning problem with a finite number of outputs is referred to as a multi-class classification problem, whose output layer has more than one neuron. The examples in the input data set (or sets) are referred to as the training data. The algorithm which takes the training data as input and produces a classifier by selecting the best among the candidate separating planes in the hypothesis space is referred to as the learning algorithm.

There are two different styles of training, i.e., incremental training and batch training. In incremental training the weights and biases of the network are updated each time an input is presented to the network. In batch training the weights and biases are only updated after all of the inputs have been presented. In this algorithm, tan-sigmoid and log-sigmoid activation functions are applied in the hidden layer and the output layer respectively, and gradient descent is used for adjusting the weights during training. For the training process, different features are first extracted block by block from one image. When a new image arrives, only the selected features are extracted and the trained classifier is used to categorize the tumor in the image [4].

Shortcomings of ANN: The ANN appears to be a promising alternative; however, it fails to model sequence data such as online images, due to their complexity. Also, the ANN cannot differentiate the different abnormal brain images based on the optimal feature set [7][14].
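The following is a minimal sketch (not the authors' implementation) of such an ANN classifier in Python with scikit-learn. The feature matrix X and the binary labels y are placeholders standing in for pre-computed GLCM feature vectors, and the tanh activation approximates the tan-sigmoid / log-sigmoid pair mentioned above, since MLPClassifier applies a single activation to all hidden layers.

```python
# Minimal sketch (assumed data): a small feed-forward ANN trained on
# pre-computed GLCM feature vectors with gradient descent (solver="sgd").
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((120, 8))          # placeholder: 120 images x 8 GLCM features
y = rng.integers(0, 2, size=120)  # placeholder binary labels (1 = tumour)

# 'tanh' stands in for the tan-sigmoid hidden-layer activation; the output
# layer of MLPClassifier uses a logistic function for binary problems.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), activation="tanh",
                  solver="sgd", learning_rate_init=0.01,
                  max_iter=2000, random_state=0),
)
ann.fit(X, y)
print("training accuracy:", ann.score(X, y))
```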
2.2 K-Nearest Neighbor Algorithm

The K-nearest neighbor algorithm is a method for classifying objects based on the closest training examples in the feature space. It is a type of instance-based learning, where the function is only approximated locally and all computation is deferred until classification. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors. The neighbors are taken from a set of objects for which the correct classification is known. In order to identify the neighbors, the objects are represented by position vectors in a multidimensional feature space. The k-nearest neighbor algorithm is sensitive to the local structure of the data.

2.3 K-Means Clustering

K-means is a widely used clustering algorithm that partitions data into k clusters. Clustering is the process of grouping data points with similar feature vectors into a single cluster and grouping data points with dissimilar feature vectors into different clusters [6].
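As a rough illustration (a sketch on synthetic data, not the authors' pipeline), K-means can be applied to the pixel intensities of a single MRI slice to partition it into a small number of tissue clusters:

```python
# Minimal sketch (assumed data): K-means clustering of per-pixel intensity
# feature vectors to partition one MRI slice into k tissue clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
slice_img = rng.random((64, 64))   # placeholder for one MRI slice

# Each pixel becomes a 1-D feature vector (its intensity); k = 4 roughly
# corresponds to gray matter, white matter, CSF and background/tumour.
pixels = slice_img.reshape(-1, 1)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
label_map = kmeans.labels_.reshape(slice_img.shape)
print("cluster sizes:", np.bincount(kmeans.labels_))
```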
2.4 Self-Organizing Map (SOM)

A self-organizing map (SOM), or self-organizing feature map (SOFM), is a type of artificial neural network (ANN) for unsupervised learning. SOMs operate in two modes: training and mapping. Training is a competitive process, also called vector quantization; mapping automatically classifies a new input vector. Segmentation is an important process for extracting information from complex medical images, and segmentation has wide application in the medical field [6].

Shortcomings of SOM: The main shortcoming of the SOM is that the number of neural units in the competitive layer needs to be approximately equal to the number of regions desired in the segmented image. It is not, however, possible to determine a priori the correct number of regions M in the segmented image. This is the main limitation of the conventional SOM for image segmentation. The HSOM directly addresses the aforesaid shortcoming of the SOM [6].

2.5 Support Vector Machine (SVM)

The SVM is a nonlinear classification algorithm based on kernel methods. In contrast to linear classification methods, kernel methods map the original parameter vectors into a higher (possibly infinite) dimensional feature space through a nonlinear kernel function. High dimensional input spaces can be computationally difficult and time consuming for classifiers, e.g. the weight adjustment of an Artificial Neural Network (ANN), so it is often required that the input dimension be reduced. It is desired that, with limited resources (computer memory, computer speed, etc.), a classifier can solve the computation as fast as possible. The computational efficiency of the SVM is high [14].

3. PROPOSED METHODOLOGY

In this project, a statistical method is presented and applied to brain tumor classification. In images, different local textures can describe different physical characteristics. We use the gray level co-occurrence matrix (GLCM) approach introduced by Haralick, which is a well-known statistical method for extracting second-order texture information from images. The assumption is that the local texture of tumor cells is highly different from the local texture of other biological tissues. Thus texture measurements in the image can be part of an effective discrimination technique between healthy tissues and possible tumor areas. The association between local texture measures and the recognized tumor area is carried out using SVM classifiers. In this project, we use Support Vector Machine classifiers. Fig. 3.1 shows a block diagram of the proposed algorithm.

Figure 3.1: Operational flow chart for the proposed system.

3.1 Textural Features

Texture is a commonly used feature in the analysis and interpretation of images. Texture is characterized by a set of local statistical properties of the pixel intensities. We base our texture feature extraction on the spatial gray level co-occurrence matrix (SGLCM). The GLCM method considers the spatial relationship between pixels of different gray levels. The method calculates a GLCM by counting how often a pixel with a certain intensity i occurs in relation to another pixel with intensity j, at a certain distance d and orientation θ. For instance, if the value of a pixel is 1, the method counts the number of times this pixel has the value 2 on its right side. Each element (i, j) in the GLCM is the sum of the number of times that a pixel with value i occurred in the specified relationship to a pixel with value j in the raw image. Once the GLCM is calculated, several second-order texture statistics can be computed as listed below, where Pd,θ(i, j) is the GLCM entry for i and j [3].

Feature extraction extracts the features of importance for image classification. The extracted features describe the properties of the image window and are used for training from the database. The trained features are compared with the features extracted from the test sample, which is then classified into one of the learned categories. Texture features, or more precisely GLCM features, are used to distinguish between normal and abnormal (tumorous) brain images.

Five co-occurrence matrices are constructed: four in the spatial orientations horizontal, right diagonal, vertical and left diagonal (0°, 45°, 90° and 135°), and a fifth as the mean of the preceding four matrices. From each co-occurrence matrix a set of eight features is extracted in the different orientations for the training of the SVM model. Let P be the N×N co-occurrence matrix calculated for each sub-image; then the features as given by Byer are as follows:

1. Maximum probability:
   f1 = max P(i, j)

2. Contrast: a measure of the difference moment, defined as
   f2 = Σ_{i,j=1..N} |i − j|² P(i, j)

3. Inverse difference moment (homogeneity): a measure of local homogeneity, defined as
   f3 = Σ_{i,j=1..N} P(i, j) / (1 + (i − j)²)

4. Entropy: a measure of non-uniformity in the image based on the probability of the co-occurrence values, defined as
   f4 = Σ_{i,j=1..N} P(i, j) [−ln P(i, j)]

5. Angular second moment (energy): a measure of homogeneity, defined as
   f5 = Σ_{i,j=1..N} (P(i, j))²

6. Correlation coefficient: a measure of the linear dependency of brightness, defined as
   f6 = Σ_{i,j=1..N} [ i·j·P(i, j) − μx·μy ] / (σx·σy)

where N is the number of distinct gray levels in the quantized image, equal to 256 for the images in the present study, and μx, μy, σx, σy are the mean and standard deviation values of the GLCM in the x and y directions, respectively.
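A minimal sketch of this feature extraction using scikit-image is given below (the input image is an assumed 8-bit grayscale placeholder; in older scikit-image versions the functions are named greycomatrix/greycoprops). Entropy is not provided by graycoprops, so it is computed directly from the normalised matrix.

```python
# Minimal sketch (assumed 8-bit input): GLCM features at distance d = 1 in
# the four orientations 0, 45, 90 and 135 degrees, averaged over angles.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)  # placeholder MRI patch

angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
glcm = graycomatrix(img, distances=[1], angles=angles,
                    levels=256, symmetric=True, normed=True)

features = {
    "max_probability": glcm.max(axis=(0, 1)).mean(),   # f1
    "contrast":    graycoprops(glcm, "contrast").mean(),     # f2
    "homogeneity": graycoprops(glcm, "homogeneity").mean(),  # f3
    "energy":      graycoprops(glcm, "ASM").mean(),          # f5 (sum of squares)
    "correlation": graycoprops(glcm, "correlation").mean(),  # f6
}
# Entropy f4 = sum_ij P(i,j) * (-ln P(i,j)), averaged over the four angles.
p = glcm[:, :, 0, :]
features["entropy"] = float(np.mean(-np.sum(p * np.log(p + 1e-12), axis=(0, 1))))
print(features)
```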
3.2 CLASSIFICATION

Classification is the procedure for assigning an input pattern to the appropriate class. When the input data set is represented together with its class membership, it is called supervised learning. It employs two phases of processing: a training phase and a testing phase. In the training phase, characteristic properties of the image features are isolated and a unique description of each classification category is created. In the testing phase, these feature space partitions are used to classify image features [13].

3.2.1 Support Vector Machine (Binary Classifier)

The SVM is one of the techniques used for classification. SVMs are generally capable of delivering higher performance in terms of classification accuracy. The SVM is a binary classifier based on supervised learning which gives better performance than other classifiers. The SVM separates two classes by constructing a hyperplane which can be used for classification [13][14]. The expression for the hyperplane is

w·x + b = 0

where x is the set of training vectors, w is the weight vector normal to the hyperplane, and b is the bias or threshold.

Figure 3.3: SVM classifier.

3.2.2 Linear SVM Classifier

The SVM maps input vectors into a higher dimensional vector space where an optimal hyperplane is constructed. Among the many hyperplanes available, there is only one hyperplane that maximizes the distance between itself and the nearest data vectors of each category. This hyperplane, which maximizes the margin, is called the optimal separating hyperplane, and the margin is defined as the sum of the distances from the hyperplane to the closest training vectors of each category. The basic theme of the SVM is to maximize the margin between the two classes of the hyperplane [13][14]. The detailed description is given below.

Step 1: The simplest form of discriminating function is linear. The linear discriminating function f(x) is written as

f(x) = wᵀ·x + b

where x is the set of training vectors, w is the vector perpendicular to the separating hyperplane, and b is an offset parameter which allows the margin to be increased. The expression for the hyperplane is f(x) = wᵀ·x + b = 0, and the margin is d1 + d2.

Step 2: The distance d of a point x from the decision hyperplane can be calculated as d = |f(x)| / ‖w‖; |f(x)| is a measure of the Euclidean distance of the point x from the decision hyperplane. f(x) takes positive values on one side of the plane and negative values on the other. In the special case b = 0, the hyperplane passes through the origin.

Some criteria commonly used in classification are distance measures. In the following, these criteria are explained:

• The distance measure is the simplest and most direct approach to classifying data points. Basically, the idea is to classify a data point into the class closest to it. The Euclidean distance is the most common definition. Suppose we have K classes with (μi, Si) as the known parameter set of class i, where μi is the reference vector of class i and Si is the covariance. The Euclidean distance of an observation vector x from class i is given by the following equation [14]:

di(x) = √(‖x − μi‖²)
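A minimal sketch of these quantities on synthetic two-dimensional data (not the project's MRI features) is shown below; scikit-learn's linear SVM exposes w and b directly, so f(x), the point-to-hyperplane distance and the margin width can be read off as in Steps 1 and 2.

```python
# Minimal sketch (synthetic data): a linear SVM whose separating hyperplane
# w.x + b = 0 and signed distance |f(x)| / ||w|| follow the expressions above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [+1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

x_new = np.array([[0.5, 1.0]])
f_x = x_new @ w + b                        # f(x) = w.x + b
print("predicted class:", np.sign(f_x))
print("distance to hyperplane:", abs(f_x) / np.linalg.norm(w))
print("margin width 2/||w||:", 2 / np.linalg.norm(w))
```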
3.2.3 Non-Linear SVM

This subsection introduces the idea of maximal margin classification and the optimal separating hyperplane, followed by kernel methods as the basis for the extension towards nonlinear classification, as introduced by Vapnik. A kernel function is used when the decision function is not a linear function of the data: the data are mapped from the input space through a nonlinear transformation, rather than fitting nonlinear curves in the original vector space to separate the data. With an optimal kernel function implemented in the SVM model, the classification task scales to high dimensional data relatively well, and the tradeoff between classifier complexity and classification error can be controlled explicitly [14]. The various steps are given below.

Step 1: Maximal margin. Consider the class of hyperplanes wᵀx + b = 0, w ∈ Rⁿ, b ∈ R, corresponding to the decision function f(x) = sign(wᵀx + b). A hyperplane is constructed which maximally separates the classes (maximum margin):

max_{w,b} min { ‖x − xk‖ : x ∈ Rⁿ, wᵀx + b = 0, k = 1, …, N }

To show how this hyperplane can be constructed in an efficient way, we need the definition of separability given by the following conditions. A training set D = {(x1, y1), …, (xN, yN) : xk ∈ Rⁿ, yk ∈ {−1, +1}} is called separable by the hyperplane wᵀx + b = 0 if there exist both a unit vector w (‖w‖ = 1) and a constant b such that the following inequalities hold:

wᵀxk + b ≥ +1 for yk = +1 …… (1)
wᵀxk + b ≤ −1 for yk = −1 …… (2)

Step 2: Optimal separating hyperplane (maximum margin hyperplane). The optimal hyperplane of a training set D is defined by

(w*, b*) = arg max_{w,b} D(w, b)

i.e., the unit vector w* and the constant b* which maximize the margin D(w, b) of the training set and also satisfy conditions (1) and (2).

Step 3: Kernel criteria.

(i) Let x ∈ D ⊆ Rⁿ denote a real-valued random input vector, let y ∈ {−1, +1} denote a discrete-valued random output variable, and let Ω ⊆ R^(nH) denote a high dimensional feature space. The SVM method basically maps the input vector x into the high dimensional feature space through some nonlinear mapping φ : D → Ω. In this feature space, one considers the linear function f(x) = sign[wᵀφ(x) + b]. This linear function works well for solving classification problems; however, it remains a problem to carry out the calculation in the high dimensional feature space. Interestingly, no explicit construction of the nonlinear mapping φ(x) is needed. This is motivated by the following result.

(ii) The inner product in the feature space, φ(xk)ᵀφ(xl), can be replaced with the corresponding kernel K(xk, xl) satisfying Mercer's condition.

Using Mercer's theorem to replace the inner product φ(xk)ᵀφ(xl) with its corresponding kernel K(xk, xl) is often called the kernel trick. It enables us to work in a huge dimensional feature space without actually having to do explicit computations in that space; the computations are done in another space after applying the kernel trick. In the case of support vector machines, one starts from a formulation in the primal weight space with a high dimensional feature space obtained by applying the transformation φ(·). The solution is calculated not in this primal weight space, but in the dual space of Lagrange multipliers after applying the kernel trick. In this way classification is done implicitly in a high dimensional feature space rather than in the original input space.

Step 4: Non-linear conversion. With a slight modification, for the nonlinear case we can write

wᵀφ(xk) + b ≥ +1 for yk = +1
wᵀφ(xk) + b ≤ −1 for yk = −1

In this quadratic form, the kernel trick is applied with

K(xk, xl) = φ(xk)ᵀφ(xl) for k, l = 1, …, N.

Finally, the nonlinear SVM classifier takes the form

y(x) = sign[ Σ_{k=1..N} αk yk K(x, xk) + b ]

3.2.4 Choice of Kernel Function

Two common choices of kernel function are:

(i) K(x, z) = exp(−‖x − z‖² / σ²)   (RBF kernel)
(ii) K(x, z) = (τ + xᵀz)^d   (polynomial kernel of degree d)
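The kernel trick can be seen concretely in the short sketch below (synthetic, non-linearly-separable data; not the project's features): the Gram matrix K(xk, xl) is computed explicitly and passed to an SVM with a precomputed kernel, which agrees with scikit-learn's built-in RBF kernel when gamma = 1/σ².

```python
# Minimal sketch (synthetic data): the kernel trick with an RBF kernel.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Two classes that are not linearly separable: inner cluster vs. outer ring.
r = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([-1] * 50 + [+1] * 50)

sigma2 = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / sigma2)        # K(xk, xl) = exp(-||xk - xl||^2 / sigma^2)

svm_pre = SVC(kernel="precomputed").fit(K, y)       # explicit Gram matrix
svm_rbf = SVC(kernel="rbf", gamma=1.0 / sigma2).fit(X, y)  # built-in RBF kernel
print("agreement:", np.mean(svm_pre.predict(K) == svm_rbf.predict(X)))
```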
4. DISCUSSION

The system we are developing is 60% complete. The various GLCM-based features have been extracted. In this system a sample of 120 brain images is taken, of which 80 images are cancerous and 40 images are non-cancerous. The features have been successfully calculated for all 120 images, and further (kernel-based) classification into the various subtypes of Meningioma is in progress.
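For illustration only, the end-to-end flow described above can be sketched as follows. This is not the authors' system: the 120 images and labels below are random placeholders that merely mirror the 80 cancerous / 40 non-cancerous composition, the GLCM helper reuses the feature subset from Section 3.1, and the reported accuracies are meaningless on random data.

```python
# End-to-end sketch (entirely placeholder data): GLCM features feeding
# linear, quadratic (polynomial degree 2) and RBF kernel SVMs.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(5)
images = rng.integers(0, 256, size=(120, 64, 64), dtype=np.uint8)  # placeholders
labels = np.array([1] * 80 + [0] * 40)                             # 1 = cancerous

def glcm_features(img):
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    g = graycomatrix(img, [1], angles, levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "ASM", "correlation"]
    return np.array([graycoprops(g, p).mean() for p in props])

X = np.array([glcm_features(im) for im in images])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          stratify=labels, random_state=0)
for name, kwargs in [("linear", dict(kernel="linear")),
                     ("quadratic", dict(kernel="poly", degree=2)),
                     ("RBF", dict(kernel="rbf"))]:
    clf = make_pipeline(StandardScaler(), SVC(**kwargs)).fit(X_tr, y_tr)
    print(name, "test accuracy:", clf.score(X_te, y_te))
```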
5. CONCLUSION

In this dissertation we study the problems of conventional classifiers such as ANN, KNN, K-means clustering and HSOM, which are computationally difficult and time consuming for high dimensional input spaces. In the proposed system, advanced kernel-based techniques (RBF, quadratic and linear kernels) in the form of a kernel-based SVM are being implemented. SVM classifiers can solve the computation as fast as possible for high dimensional input spaces, and the computational efficiency of the SVM is high.

6. REFERENCES

[1] S. T. Acton, D. P. Mukherjee, "Scale space classification using area morphology," IEEE Transactions on Image Processing, 9(4), 2000, pp. 623–635.
[2] M. N. Ahmed, S. M. Yamany, N. Mohamed, A. A. Farag, T. Moriarty, "A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data," IEEE Transactions on Medical Imaging, 21(3), 2002, pp. 193–199.
[3] J. Alirezaie, M. E. Jernigan, C. Nahmias, "Automatic segmentation of cerebral MR images using Artificial Neural Network," IEEE Transactions on Nuclear Science, 45(4), 1998.
[4] H. Azzag, N. Monmarche, M. Slimane, G. Venturini, "AntTree: A new model for clustering with artificial ants," IEEE, 2003, pp. 2642–2647.
[5] E. F. Badran, E. G. Mahmoud, N. Hamdy, "An algorithm for detecting brain tumors in MRI images," Proceedings of the International Conference on Computer Engineering and Systems (ICCES), 2010, pp. 368–373.
[6] J. C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms," New York, 1981.
[7] D. Bhattacharyya, Kim Tai-hoon, "Brain tumor detection using MRI image analysis," Communications in Computer and Information Science, vol. 151, 2011, pp. 307–314.
[8] B. H. Brinkmann, A. Manduca, R. A. Robb, "Optimized homomorphic unsharp masking for MR grayscale inhomogeneity correction," IEEE Transactions on Medical Imaging, 17, 1998, pp. 161–171.
[9] S. Chandra, R. Bhat, H. Singh, "A PSO based method for detection of brain tumors from MRI," Proceedings of the World Congress on Nature & Biologically Inspired Computing, Coimbatore, 2009, pp. 666–671.
[10] S. Chaplot, L. M. Patnaik, "Classification of magnetic resonance brain images using wavelets as input to support vector machines and neural networks," Biomedical Signal Processing and Control, 2006, pp. 86–92.
[11] S. Chaplot, L. M. Patnaik, "Brain tumor diagnosis with wavelets and support vector machine," Proceedings of the 3rd International Conference on Intelligent Systems and Knowledge Engineering, 2008.
[12] T. Chou, C. Chen, W. Lin, "Segmentation of dual-echo MR images using neural networks," Proceedings of SPIE, Medical Imaging, 1993, pp. 220–227.
[13] M. H. Chowdhury, W. D. Little, "Image thresholding techniques," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 17–19 May 1995, pp. 585–589.
[14] L. P. Clarke, R. P. Velthuizen, S. Phuphanich, J. D. Schellenberg, J. A. Arrington, M. Silbiger, "MRI: Stability of three supervised segmentation techniques," Magnetic Resonance Imaging, 11, 1993, pp. 95–106.