Retrieval of Compressed Medical Images Using Data Mining Techniques

Enireddy Vamsidhar#1, Dr. Reddi Kiran Kumar*2
# Research Scholar, Department of CSE, JNTUK, Kakinada, Andhra Pradesh, INDIA
* Asst. Professor, Department of Computer Science, Krishna University, Machilipatnam, Andhra Pradesh, INDIA

Abstract— Advances in medical technology are creating a large amount of digital data in the form of digital medical images. These images are stored in large databases for easy accessibility, and Image Retrieval (IR) is used to retrieve diagnostic cases similar to a query medical image, helping the healthcare professional analyse the query image. Image compression is used for storage and transmission of the images. This paper studies the retrieval of compressed images. The proposed method integrates content-based retrieval of diagnostic cases similar to the query medical image with image compression techniques, to minimize bandwidth utilization. The Daubechies wavelet is used for image compression with minimal loss. Edge and texture features are extracted from the compressed medical images using the Sobel edge detector and Gabor transforms respectively. The features are then reduced using information gain, and the classification accuracy of retrieval is evaluated using Naïve Bayes, Support Vector Machine, IBL, CART and Random Forest.

Keywords— Medical Images, Image Retrieval, Compression, Data Mining, Support Vector Machine, Naïve Bayes, IBL, CART, Random Forest

I. INTRODUCTION

With the advance of medical technologies, digital images such as X-rays, MRI, ECG and CT have become the norm for diagnosis and treatment. These digital medical images are stored in large databases for easy accessibility, and Content-Based Image Retrieval (CBIR) is used to retrieve diagnostic cases similar to the query medical image [1][2]. Image retrieval using conventional methods such as indexing or semantics is not feasible, as the databases contain a huge amount of data and the image content is more versatile than the semantics. CBIR extracts relevant features from the images using different algorithms; when a query image is presented, images are retrieved from the database based on these features. Features such as colour, texture and shape are extracted from the image automatically by CBIR systems. Similarity measures are used to compare the features extracted from the query image with those of the images stored in the database, and images with features similar to the query are retrieved. CBIR is now widely applied in medical image applications; many CBIR systems are reviewed in the literature [3][4].

One major problem is storing the large amount of diagnostic data in the form of medical images; efficient transmission of this data within the available bandwidth is another. Image compression can be utilized to reduce the amount of data [5]. During the compression process, redundancies in the image are removed, resulting in a compact representation of the image. Compression is of two types: lossless and lossy. In lossless compression the original image is perfectly recovered, whereas in lossy compression a minor loss of detail occurs when the image is recovered. The major advantage of lossy compression is that a high compression ratio is achieved.
Medical image compression cannot afford to lose any detail on recovery of the image, as loss of information in a diagnostically important region may lead to problems. Thus, the compression ratio achieved through lossless compression of medical images is very low. A commonly used approach to overcome this issue is to segment the medical image into a region of interest (ROI) and the rest of the image: the ROI is compressed using lossless compression and the non-ROI using lossy compression. This achieves a better compression ratio while preserving the quality of the diagnostically crucial region.

In this paper, the retrieval of diagnostic cases similar to a query medical image from images that are compressed to minimize bandwidth utilization is investigated. The Daubechies wavelet is used for image compression with a decomposition level of one to reduce the losses. Edge and texture features are extracted from the compressed medical images using the Sobel edge detector and Gabor transforms respectively. The classification accuracy of retrieval is evaluated using Naïve Bayes, Support Vector Machine, CART, IBL and Random Forest.

II. METHODOLOGY

In this paper the medical images were compressed using the Daubechies wavelet with a decomposition level of one to obtain a high PSNR. The low-level edge and texture features are extracted from the compressed medical images using the Sobel edge detector and Gabor transforms. Feature reduction is done using the gain ratio, and the reduced features are used for classification. Fig. 1 shows the detailed methodology of the work.

Fig. 1 Detailed methodology: MRI input image → image compression (Daubechies wavelet) → texture features (Gabor filter) and edge features (Sobel edge detector) → feature selection (information gain) → classification and performance measurement

Daubechies had a lasting impact on the field with her construction of the first family of compactly supported, orthogonal wavelet bases [6]. Due to their remarkable properties and ease of implementation, the Daubechies wavelets have become popular and have led to a number of successful signal processing applications, such as compression, denoising, classification and fusion. In the Daubechies wavelet transforms, the scaling signals and wavelets have slightly longer supports, i.e., they produce averages and differences using just a few more values from the signal. This change, however, provides a tremendous improvement in the capabilities of these transforms; they provide a set of powerful tools for performing basic signal processing tasks.

The Daub4 wavelet transform is defined in essentially the same way as the Haar wavelet transform. If a signal $f$ has an even number $N$ of values, then the 1-level Daub4 transform is the mapping $f \xrightarrow{D_1} (\mathbf{a}^1 \mid \mathbf{d}^1)$ from the signal $f$ to its first trend subsignal $\mathbf{a}^1$ and first fluctuation subsignal $\mathbf{d}^1$. Each value $a_m$ of $\mathbf{a}^1 = (a_1, \ldots, a_{N/2})$ is equal to the scalar product $a_m = f \cdot V_m^1$ of $f$ with a 1-level scaling signal $V_m^1$. Likewise, each value $d_m$ of $\mathbf{d}^1 = (d_1, \ldots, d_{N/2})$ is equal to the scalar product $d_m = f \cdot W_m^1$ of $f$ with a 1-level wavelet $W_m^1$.
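The following is a minimal sketch of the one-level Daubechies decomposition and reconstruction described above, assuming the PyWavelets library (the paper names the wavelet, not an implementation); the random array is a hypothetical stand-in for a grayscale MRI slice.

```python
# Single-level Daubechies-4 decomposition/reconstruction with PyWavelets.
import numpy as np
import pywt

image = np.random.rand(256, 256)  # hypothetical grayscale MRI slice

# One decomposition level: trend (approximation) and fluctuation (detail) subbands.
cA, (cH, cV, cD) = pywt.dwt2(image, 'db4')

# Inverse transform; with unmodified coefficients the image is recovered up to
# floating-point error, which is why a single level keeps the losses low.
restored = pywt.idwt2((cA, (cH, cV, cD)), 'db4')[:256, :256]

mse = np.mean((restored - image) ** 2)
psnr = 10 * np.log10(image.max() ** 2 / mse) if mse > 0 else float('inf')
print(f'PSNR after round trip: {psnr:.1f} dB')
```

In a compression setting one would additionally quantize or threshold the detail subbands before the inverse transform; the round trip above only demonstrates the transform pair itself.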
In image processing and computer vision, edge detection is a fundamental tool, particularly in the areas of feature detection and feature extraction, which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities [7]. Detecting the edges in an image is a very important step towards understanding image features. Edges consist of meaningful features and contain significant information. Edge detection significantly reduces the image size and filters out information that may be regarded as less relevant, preserving the important structural properties of an image [8]. Using the information in the edges of an image, redundancies can sometimes be removed [9]; when image edges are detected, every kind of redundancy present in the image is removed [10].

The Sobel edge detector generates a series of gradient magnitudes with a simple convolution kernel. The gradient of an image $f(x,y)$ at location $(x,y)$ is given by the vector

$\nabla f = [G_x \;\; G_y]^T = [\partial f/\partial x \;\; \partial f/\partial y]^T$

Employing the magnitude of the gradient vector in the edge direction, represented as [11]:

$\nabla f = (G_x^2 + G_y^2)^{1/2}$

the magnitude of the gradient is approximated as $\nabla f \approx |G_x| + |G_y|$, and the direction of the gradient vector is given by $\alpha(x,y) = \tan^{-1}(G_y / G_x)$, where the angle is measured along the x-axis. The equivalent digital form of the gradient is given by the Sobel operators, $G_x = (P_7 + 2P_8 + P_9) - (P_1 + 2P_2 + P_3)$ and similarly $G_y = (P_3 + 2P_6 + P_9) - (P_1 + 2P_4 + P_7)$, where $P_1$ to $P_9$ are the pixels of the sub-image shown in Fig. 2 [12].

P1 P2 P3        -1 -2 -1        -1  0  1
P4 P5 P6         0  0  0        -2  0  2
P7 P8 P9         1  2  1        -1  0  1
   (a)              (b)             (c)

Fig. 2 Sobel masks. (a) Sub-image, (b) Sobel mask for the horizontal direction, (c) Sobel mask for the vertical direction

The mask in Fig. 2(b) computes $G_x$ at the centre of the 3×3 region, and the mask in Fig. 2(c) computes $G_y$.

Gabor filters model texture for image interpretation tasks, as there are strong relations between the outputs of different filters. Texture can be defined as the regular repetition of an element or pattern on a surface. The Gabor filter is capable of multi-scale and multi-resolution analysis and consists of a tunable band-pass filter with selectivity for orientation, spectral bandwidth and spatial extent. Visually different image regions can have the same first-order statistics; using second-order statistics improves the situation by taking into account not just grey pixel levels but also the spatial relationships between them. The Gabor filter and Gabor transform are governed by the "uncertainty principle" [13], and the Gabor function provides accurate time-frequency localization. A two-dimensional Gabor function $g(x,y)$ and its Fourier transform $G(u,v)$ are given by:

$g(x,y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right]$

$G(u,v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\}$

where $\sigma_u = 1/2\pi\sigma_x$ and $\sigma_v = 1/2\pi\sigma_y$; $\sigma$ is the spatial spread, $W$ the frequency and $\theta$ the orientation, obtained by rotating the filter coordinates.
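A minimal sketch of this edge/texture feature extraction step follows, assuming SciPy and scikit-image as the implementation (the paper names the operators, not a library); the image array, filter-bank parameters and summary statistics are illustrative assumptions.

```python
# Sobel edge features and Gabor texture features for one image.
import numpy as np
from scipy import ndimage
from skimage.filters import gabor

image = np.random.rand(128, 128)   # stand-in for a compressed MRI slice

# Sobel gradients: Gx, Gy, magnitude |Gx| + |Gy|, direction atan2(Gy, Gx).
gx = ndimage.sobel(image, axis=1)  # horizontal derivative
gy = ndimage.sobel(image, axis=0)  # vertical derivative
edge_magnitude = np.abs(gx) + np.abs(gy)
edge_direction = np.arctan2(gy, gx)

# Gabor responses over a small bank of orientations and frequencies;
# the mean and variance of each response magnitude serve as texture features.
features = [edge_magnitude.mean(), edge_magnitude.var()]
for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
    for frequency in (0.1, 0.2, 0.4):
        real, imag = gabor(image, frequency=frequency, theta=theta)
        magnitude = np.sqrt(real ** 2 + imag ** 2)
        features.extend([magnitude.mean(), magnitude.var()])

feature_vector = np.array(features)  # one row of the dataset per image
```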
On pre-processing the data, the features of the data set are identified as either significant to the classification process or redundant. Redundant features can be removed, and this process is known as feature selection. Redundant features are generally found to be closely correlated with one or more other features; as a result, omitting them from the process does not degrade classification accuracy. In fact, the accuracy may improve due to the resulting data reduction and the removal of noise and measurement errors associated with the omitted features. Therefore, choosing a good subset of features proves to be significant in improving the performance of the system [14].

In this work the gain ratio is used, a modification of information gain that solves the issue of bias towards features with a larger set of values exhibited by information gain. When choosing an attribute, the gain ratio takes the number and size of branches into account: it corrects the information gain by the intrinsic information of a split (how much information is needed to know which branch an instance belongs to), where the intrinsic information is the entropy of the distribution of instances into branches. The intrinsic information is large when the instances are spread evenly over the branches and small when all instances belong to one branch, so dividing by it penalizes attributes that split the data into many uniformly populated values. For a class feature y and a feature x, the calculation is as follows:

$\mathrm{GainRatio}(y, x) = \frac{\mathrm{Gain}(y, x)}{\mathrm{IntrinsicInfo}(x)}, \qquad \mathrm{IntrinsicInfo}(x) = -\sum_i \frac{|s_i|}{|s|} \log_2 \frac{|s_i|}{|s|}$

where $|s|$ is the total number of instances and $|s_i|$ is the number of instances taking the $i$-th of the possible values of feature x. Applying the gain ratio to every feature of the dataset provides an estimate of each feature's importance, and sorting the gain ratios ranks all features from the most influential to the least. The top k features then construct a simple classifier; these selected features are utilized for the classification of the images.

A database of 100 images, containing lung and brain images, was used. The classification accuracy of retrieval is evaluated using Naïve Bayes, Support Vector Machine, IBL, CART and Random Forest.

Bayesian classification is a supervised learning method as well as a statistical method for classification. Naïve Bayes uses Bayes' theorem and is a probabilistic method used for prediction. Since it is a supervised learning method, during training the conditional probability of each attribute given the predicted class is estimated from the training data set. Even from small training data the means and variances of the parameters can be obtained, and this is sufficient for classification. Naïve Bayes is commonly used because it provides good results and easy interpretation of the results [15]. Its disadvantage is that the classifier assumes the attributes occur independently, so correlations between the attributes are ignored.

Naïve Bayes classifies a given input, represented by its feature vector, into the most likely class. Learning is simplified by the assumption that the features are independent given the class:

$P(X \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$

where $X = (X_1, \ldots, X_n)$ is the feature vector and C is a class.
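Below is a minimal sketch of the gain-ratio ranking and Naïve Bayes steps described above, with the gain ratio written directly from the formulas in NumPy and the classifier taken from scikit-learn; both library choices and all data are assumptions, since the paper names the methods but not a toolkit.

```python
# Gain-ratio feature ranking followed by a Naive Bayes classifier.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def entropy(labels):
    # Shannon entropy (bits) of a discrete label vector.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(y, x):
    # Information gain divided by the intrinsic information of the split on x.
    values, counts = np.unique(x, return_counts=True)
    w = counts / counts.sum()
    gain = entropy(y) - sum(wi * entropy(y[x == v]) for wi, v in zip(w, values))
    intrinsic = -np.sum(w * np.log2(w))
    return gain / intrinsic if intrinsic > 0 else 0.0

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(100, 24))  # hypothetical discretized feature matrix
y = rng.integers(0, 2, size=100)        # hypothetical lung/brain labels

# Rank features by gain ratio and keep the top k for classification.
k = 10
ranking = sorted(range(X.shape[1]), key=lambda j: gain_ratio(y, X[:, j]), reverse=True)
X_top = X[:, ranking[:k]]

model = GaussianNB().fit(X_top, y)      # P(X|C) under feature independence
print('training accuracy:', model.score(X_top, y))
```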
The support vector machine (SVM) is a useful technique for data classification. A classification task generally separates the data into training and testing sets; each instance in the training set contains a target value (class label) and several attributes. The SVM is a learning method for binary classification whose basic idea is to locate a hyperplane that separates the d-dimensional data into its two classes [16]. A key insight in SVMs is that the higher-dimensional space does not need to be handled directly (only the formula for the dot product in that space is needed). The aim of an SVM is the production of a model, based on the training data, that predicts the target values of the test data when given only the test data attributes.

Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, the support vector machine requires the solution of the following optimization problem [17]:

$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0$

The classification problem can be restricted to consideration of a two-class problem without loss of generality [18].

Instance-based learning (IBL) is a machine learning method that classifies new examples by comparing them to those already seen and held in memory. Such methods are "lazy" in the sense that they perform little work when learning from the dataset but expend more effort classifying new examples. The simplest method, nearest neighbour, performs no work at all when learning; it stores all examples in memory verbatim. The effort is transferred to classification time, when the system decides which example in memory to use to classify the new one. Case-based reasoning systems perform a small amount of work indexing new cases, resulting in a reduction in classification effort. The examples stored in memory are called exemplars and are retained in an exemplar database [19].

Predictions are derived from the stored examples [20] by means of the nearest-neighbour estimation principle [21]. Let X be equipped with a distance measure $\Delta(\cdot,\cdot)$, i.e., $\Delta(x, x')$ is the distance between instances $x, x' \in X$. Usually the Euclidean distance is used and the attributes are normalized. The distance between two instances $x_i$ and $x_j$ is defined as

$d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$

where $a_r(x)$ is the value of the $r$-th attribute of instance $x$. $Y$ is the output space, and a pair $\langle x, y \rangle \in X \times Y$ is called a labelled instance, a case, or an example. In classification, $Y$ is a finite (usually small) set comprising m classes $\{\lambda_1, \ldots, \lambda_m\}$, whereas $Y = \mathbb{R}$ in regression. IBL reduces the number of training instances stored to a small set of representative examples; another advantage of IBL is that it can be used in problems other than classification.
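A minimal sketch of the SVM and instance-based classifiers described above follows, assuming scikit-learn as the implementation (the paper reports C-SVM and nu-SVM with linear kernels and an IBL/nearest-neighbour learner); the feature matrix and labels are hypothetical stand-ins.

```python
# C-SVM, nu-SVM and 1-nearest-neighbour classification on one train/test split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, NuSVC

X = np.random.rand(100, 10)            # hypothetical selected feature matrix
y = np.random.randint(0, 2, size=100)  # hypothetical lung/brain labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Attributes are normalized before Euclidean distances are computed.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

classifiers = {
    'C-SVM (linear)': SVC(kernel='linear', C=1.0),
    'nu-SVM (linear)': NuSVC(kernel='linear', nu=0.5),
    'IBL (1-NN)': KNeighborsClassifier(n_neighbors=1),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f'{name}: test accuracy {clf.score(X_test, y_test):.2f}')
```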
CART (Classification and Regression Trees) builds a binary decision tree by repeatedly splitting a node into two child nodes, beginning with the root node that contains the whole learning sample [22]. The basic idea of tree growing is to choose, among all the possible splits at each node, the split whose resulting child nodes are the "purest". If X is a nominal categorical variable with I categories, there are $2^{I-1} - 1$ possible splits for this predictor; if X is an ordinal categorical or continuous variable with K different values, there are K − 1 different splits on X. A tree is grown starting from the root node by repeatedly applying the following steps to each node:

1. Find each predictor's best split. For each continuous or ordinal predictor, sort its values from smallest to largest and examine each value as a candidate split point (call it v; if x ≤ v, the case goes to the left child node, otherwise to the right). The best split point is the one that maximizes the splitting criterion when the node is split according to it. For each nominal predictor, examine each possible subset of categories (call it A; if x ∈ A, the case goes to the left child node, otherwise to the right) to find the best split.

2. Find the node's best split: among the best splits found in step 1, choose the one that maximizes the splitting criterion.

3. Split the node using its best split found in step 2 if the stopping rules are not satisfied.

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [23]. The common element in all of these procedures is that for the k-th tree, a random vector $\Theta_k$ is generated, independent of the past random vectors $\Theta_1, \ldots, \Theta_{k-1}$ but with the same distribution; a tree is grown using the training set and $\Theta_k$, resulting in a classifier $h(x, \Theta_k)$, where x is an input vector. Analysis shows that the accuracy of a random forest depends on the strength of the individual tree classifiers and on a measure of the dependence between them [24]. Medical image retrieval problems often have the property that there are many input variables, often in the hundreds or thousands, each containing only a small amount of information. A single tree classifier will then have accuracy only slightly better than a random choice of class, but combining trees grown using random features can produce improved accuracy.

Given an ensemble of classifiers $h_1(x), h_2(x), \ldots, h_K(x)$, with the training set drawn at random from the distribution of the random vector $(X, Y)$, define the margin function as

$mg(X, Y) = \mathrm{av}_k\, I(h_k(X) = Y) - \max_{j \neq Y} \mathrm{av}_k\, I(h_k(X) = j)$

where $I(\cdot)$ is the indicator function. The margin measures the extent to which the average number of votes at $(X, Y)$ for the right class exceeds the average vote for any other class; the larger the margin, the more confidence in the classification. The generalization error is given by $PE^* = P_{X,Y}(mg(X, Y) < 0)$, where the subscripts X, Y indicate that the probability is over the (X, Y) space. In random forests, $h_k(X) = h(X, \Theta_k)$.
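The tree-based classifiers above can be sketched as follows, assuming scikit-learn as the implementation (the paper does not name a library); the data, tree parameters and 10-fold evaluation are illustrative assumptions.

```python
# A single CART tree versus a random forest, scored by cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(100, 10)            # hypothetical selected feature matrix
y = np.random.randint(0, 2, size=100)  # hypothetical lung/brain labels

# CART: one binary tree grown by choosing the best split at each node.
cart = DecisionTreeClassifier(criterion='gini', random_state=0)

# Random forest: trees grown on bootstrap samples with random feature subsets,
# i.e. the random vector Theta_k of the text; predictions are majority votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, clf in (('CART', cart), ('Random Forest', forest)):
    scores = cross_val_score(clf, X, y, cv=10)
    print(f'{name}: mean accuracy {scores.mean():.2f}')
```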
III. RESULTS AND DISCUSSIONS

The medical images are compressed using the Daubechies wavelet. The low-level edge and texture features are extracted from the compressed medical images using the Sobel edge detector and Gabor transforms. The extracted features are then reduced using the information gain, and the selected features are used for classification. The classification accuracy of retrieval is evaluated using Naïve Bayes, Support Vector Machine [25], IBL, CART [26] and Random Forest. Table 1 tabulates the classification accuracy and RMSE, and Fig. 2 shows the same.

Technique                            Classification Accuracy   RMSE
Naïve Bayes                          92%                       0.2828
C-SVM with linear kernel             91%                       0.3
nu-SVM with linear kernel            92%                       0.2828
Classification and Regression Tree   88%                       0.3325
Instance Based Learner               93%                       0.2646
Random Forest                        92%                       0.2632

Table 1: Classification accuracy and RMSE

Fig. 2 Graph showing the classification accuracy and RMSE

Table 2 lists the precision, recall and F-measure for the various classification techniques; Fig. 3 shows the precision, Fig. 4 the recall and Fig. 5 the F-measure.

Technique                            Precision   Recall   F-Measure
Naïve Bayes                          0.92        0.92     0.92
C-SVM with linear kernel             0.91        0.91     0.91
nu-SVM with linear kernel            0.92        0.92     0.92
CART                                 0.913       0.84     0.875
IBL                                  0.957       0.9      0.928
Random Forest                        0.920       0.920    0.920

Table 2: Precision, recall and F-measure

Fig. 3 Graph showing the precision
Fig. 4 Graph showing the recall
Fig. 5 Graph showing the F-measure
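The metrics reported in Tables 1 and 2 can be computed as in the following sketch, assuming scikit-learn's metrics module; the label, prediction and probability arrays are hypothetical stand-ins, and since the paper does not state exactly how the RMSE is computed, the probability-based RMSE shown here is an assumption.

```python
# Accuracy, precision, recall, F-measure and RMSE for one classifier's output.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, recall_score)

y_true = np.random.randint(0, 2, size=25)  # hypothetical test labels
y_pred = np.random.randint(0, 2, size=25)  # hypothetical predicted labels
y_prob = np.random.rand(25)                # hypothetical class-1 probabilities

print('accuracy :', accuracy_score(y_true, y_pred))
print('precision:', precision_score(y_true, y_pred, average='weighted'))
print('recall   :', recall_score(y_true, y_pred, average='weighted'))
print('f-measure:', f1_score(y_true, y_pred, average='weighted'))
# RMSE over the predicted class probabilities (an assumed definition).
print('rmse     :', np.sqrt(mean_squared_error(y_true, y_prob)))
```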
IV. CONCLUSION

This paper investigated the Image Retrieval (IR) problem on compressed images. The medical images are compressed and retrieved using traditional techniques, and the classification accuracy obtained is comparable to the accuracies obtained on uncompressed images. In the future, work needs to be carried out to investigate the effectiveness of soft computing classification algorithms for compressed medical image retrieval.

REFERENCES

[1] T. M. Lehmann, H. Schubert, D. Keysers, M. Kohnen and B. B. Wein, "The IRMA code for unique classification of medical images," in Proceedings of SPIE 5033, pp. 109-117, 2003.
[2] S. G. Armato III, et al., "Lung Image Database Consortium: developing a resource for the medical imaging research community," Radiology, vol. 232, pp. 739-748, 2004.
[3] M. Crucianu, M. Ferecatu and N. Boujemaa, "Relevance feedback for image retrieval: a short survey," in State of the Art in Audiovisual Content-Based Retrieval, Information Universal Access and Interaction, 2004.
[4] H. Müller, N. Michoux, D. Bandon and A. Geissbuhler, "A review of content-based image retrieval systems in medical applications: clinical benefits and future directions," International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
[5] D. Cerra and M. Datcu, "Image retrieval using compression-based techniques," in Proceedings of the International Conference on Source and Channel Coding (SCC), pp. 1-6, 2010.
[6] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Commun. Pure Appl. Math., vol. 41, pp. 909-996, Nov. 1988.
[7] N. R. Pal and S. K. Pal, "A review on image segmentation techniques," Pattern Recognition, vol. 26, no. 9, pp. 1277-1294, 1993.
[8] Y. Fisher, Fractal Image Compression: Theory and Application, Institute for Non-linear Science, University of California, San Diego, USA, 1996.
[9] E. Osuna, R. Freund and F. Girosi, "Training support vector machines: an application to face detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.
[10] G. Sparr, "Image processing and pattern classification for character recognition," Centre for Mathematical Sciences, Lund University, vol. 2, pp. 25-78, 2002.
[11] R. Maini and H. Aggarwal, "Study and comparison of various image edge detection techniques," International Journal of Image Processing (IJIP), vol. 3, no. 1, pp. 1-12.
[12] S. Annadurai and R. Shanmugalakshmi, Fundamentals of Digital Image Processing, third impression, Pearson Education, pp. 232-233.
[13] C. J. Setchell and N. W. Campbell, "Using colour Gabor texture features for scene understanding," in Proc. 7th International Conference on Image Processing and its Applications, vol. 67(5), pp. 372-376.
[14] Z. A. Baig, A. S. Shaheen and R. AbdelAal, "One-dependence estimators for accurate detection of anomalous network traffic," International Journal for Information Security Research (IJISR), vol. 1, no. 4, December 2011.
[15] M. Besserve, L. Garnero and J. Martinerie, "Cross-Spectral Discriminant Analysis (CSDA) for the classification of brain computer interfaces," in 3rd International IEEE/EMBS Conference on Neural Engineering (CNE '07), pp. 375-378, 2007.
[16] D. Boswell, "Introduction to Support Vector Machines," 2002.
[17] C.-W. Hsu, C.-C. Chang and C.-J. Lin, "A Practical Guide to Support Vector Classification," 2010.
[18] S. R. Gunn, "Support Vector Machines for Classification and Regression," 1998.
[19] B. Martin, "Instance-Based Learning: Nearest Neighbour with Generalisation," thesis report, Department of Computer Science, University of Waikato, Hamilton, New Zealand, March 1995.
[20] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems.
[21] B. V. Dasarathy, Ed., Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, California, 1991.
[22] L. Breiman, J. H. Friedman, R. Olshen and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1984.
[23] L. Breiman, "Random Forests," Technical Report, Statistics Department, UC Berkeley, January 2001.
[24] Y. Amit and D. Geman, "Shape quantization and recognition with randomized trees," Neural Computation, vol. 9, pp. 1545-1588, 1997.
[25] V. Enireddy and K. K. Reddi, "A data mining approach for compressed medical image retrieval," International Journal of Computer Applications, vol. 52, no. 5, pp. 26-30, August 2012.
[26] V. Enireddy and K. K. Reddi, "Application of CART and IBL for image retrieval," International Journal of Computer Science and Telecommunications, vol. 3, no. 12, December 2012.