Expert Systems With Applications 86 (2017) 190–198
journal homepage: www.elsevier.com/locate/eswa

Medical image analysis using wavelet transform and deep belief networks

Amin Khatami∗, Abbas Khosravi, Thanh Nguyen, Chee Peng Lim, Saeid Nahavandi
Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, VIC, 3216, Australia

Article info
Article history: Received 20 July 2016; Revised 12 May 2017; Accepted 28 May 2017; Available online 1 June 2017
Keywords: Deep belief network; Wavelet transform; Radiography image; Feature extraction; Kolmogorov Smirnov test; Classification

Abstract
This paper introduces a three-step framework for classifying multiclass radiography images. The first step utilizes a de-noising technique based on wavelet transform (WT) and the statistical Kolmogorov Smirnov (KS) test to remove noise and insignificant features of the images. An unsupervised deep belief network (DBN) is designed for learning the unlabelled features in the second step. Although small-scale DBNs have demonstrated significant potential, the computational cost of training the restricted Boltzmann machine is a major issue when scaling to large networks. Moreover, noise in radiography images can cause a significant corruption of information that hinders the performance of DBNs. The combination of WT and KS test in the first step helps improve the performance of DBNs. Discriminative feature subsets obtained in the first two steps serve as inputs to classifiers in the third step for evaluation. Five frequently used classifiers (naive Bayes, radial basis function network, random forest, sequential minimal optimization, and support vector machine) and four different case studies are implemented for experiments using the Image Retrieval in Medical Application data set.
The experimental results show that the three-step framework significantly reduces the computational cost and yields high performance for multiclass radiography image classification. Along with effective applications in image processing in other fields published in the literature, the deep learning network in this paper again demonstrates its robustness in handling a complex set of medical images. This implies that the proposed approach can be implemented in real practice for analysing noisy radiography images, which have many useful medical applications such as diagnosis of diseases related to lung, breast, musculoskeletal or pediatric studies.

© 2017 Elsevier Ltd. All rights reserved.

∗ Corresponding author. E-mail addresses: amin.khatami@deakin.edu.au, skhatami@deakin.edu.au (A. Khatami), abbas.khosravi@deakin.edu.au (A. Khosravi), thanh.nguyen@deakin.edu.au (T. Nguyen), chee.lim@deakin.edu.au (C.P. Lim), saeid.nahavandi@deakin.edu.au (S. Nahavandi).
http://dx.doi.org/10.1016/j.eswa.2017.05.073
0957-4174

1. Introduction

Radiography images have been widely analysed and studied to facilitate disease diagnosis and treatment in the medical domain. Manually analysing and classifying X-ray images, however, is time-consuming and labour-intensive. Computer vision provides effective techniques to overcome these difficulties. Recently, deep learning techniques, which learn rich features hierarchically, have demonstrated significant performance across many different image processing tasks. The capability of these techniques to extract information from large volumes of data, especially unlabelled data, makes them valuable tools for big data related problems. Wulsin, Gupta, Mani, Blanco, and Litt (2011) applied deep belief networks (DBNs) in a semi-supervised manner to model clinical electroencephalography waveforms for classification and anomaly detection.
Traditionally, DBNs are time-consuming to train because of their large number of trainable parameters. This encourages researchers to propose dimension reduction techniques that provide suitable information to the DBNs. Brosch, Tam, and Initiative (2013) introduced a model for learning the manifold of 3D brain images using DBNs. They proposed a pre-processing step to reduce the feature dimensionality of a single image by mapping the input image into multiple images with lower resolution. In other applications, mapping from raw data into different feature spaces has also been investigated recently (Khatami, Mirghasemi, Khosravi, Lim, & Nahavandi, 2017; Khatami, Mirghasemi, Khosravi, & Nahavandi, 2015a, 2015b).

One of the motivations of this study is to design and develop an effective pre-processing step to extract meaningful features for the DBNs. In this respect, we propose the use of the wavelet transformation (WT) technique because WT is able to remove noise and outliers from images while giving a compact feature representation. Korfiatis, Karahaliou, Kazantzi, Kalogeropoulou, and Costaridou (2010) used a wavelet pre-processing step for identifying parenchyma lung disease patterns. A similar technique for breast volume de-noising and noise characterization is utilized in Chen and Ning (2004) and Khatami, Khosravi, Lim, and Nahavandi (2016). Likewise, Zhang, Wang, Ji, and Dong (2013) used the WT followed by principal component analysis to extract relevant information from medical images, resulting in a reduced feature space. Alternatively, a feature selection technique for a brain-computer interface driven by auditory and spatial navigation imagery was proposed in Cabrera, Farina, and Dremstrup (2010). The classification performance could be improved when the features from a combination of two and three channels were optimized by a scaling filter with the discrete wavelets.
This paper proposes a sequential feature extraction framework for classifying images of different body parts. The proposed framework consists of three main steps: (1) extracting low-level wavelet features obtained from a low-pass filter, followed by the Kolmogorov Smirnov (KS) test, as a pre-processing step; (2) applying an unsupervised DBN for learning the image features selected by the KS test; (3) analyzing different classification algorithms to identify the best classifier for processing the structured features derived from the DBN. The impact of the coefficient matrices extracted by a wavelet transformation is considered in this study. The approximation coefficients matrix, which is obtained by the low-pass filter, and the detail coefficients matrices (horizontal, vertical, and diagonal), which are obtained by the high-pass filter, are computed by a single-level discrete 2-D wavelet decomposition of the input images. In this paper, we focus on the approximation coefficients, called the low-level wavelet features, because X-ray images contain texture information, and applying a high-pass filter could lead to removal of useful texture information (Acharya & Ray, 2005). The statistical KS test is utilized because it is an unsupervised feature selection technique that is computationally inexpensive and robust to outliers. The KS test converts all values to their ranks without changing the maximum dissimilarity among the cumulative frequency distributions (Lehmann & D’Abrera, 2006). Indeed, the experimental results of this research indicate a significant performance improvement when the KS test is used.

The main contributions of this study are two-fold. First, we introduce the combination of WT and the KS test to reduce noise as well as the dimensions of raw medical images. Although there are many studies on using wavelet features as the first step in analyzing X-ray images, the KS test has rarely been utilized with WT for data dimension reduction.
Second, we design DBN systems that are able to select significant deep features to facilitate fast and precise classification. We show that an appropriate DBN design results in a considerable increase in the performance of deep classification of X-ray images. In addition, a range of classifiers is used to validate the feature subsets obtained by the DBN systems.

2. Relevant literature

Many studies have used machine learning models to address problems in medical image processing. Schilham, Van Ginneken, and Loog (2006) presented a technique to detect nodules in chest X-ray images. They used a multi-scale Gaussian filterbank to obtain the relevant features for the classification step. Alternatively, a non-invasive classification model to determine scoliosis curve types from X-ray images was proposed in Adankon, Dansereau, Labelle, and Cheriet (2012). In that research, the principal component analysis technique was considered for feature dimensionality reduction, and the support vector machine (SVM) was applied as a classifier. Huber et al. (2012) analysed different classifiers to estimate the relevance of texture features to classify interstitial lung disease patterns in high-resolution computed tomography images. They found that the generalized matrix learning vector quantization performed better than the others. In Bricq, Collet, and Armspach (2008) and Awate, Tasdizen, Foster, and Whitaker (2006), Markov modelling techniques were investigated for MRI brain-tissue classification. Likewise, a hierarchical learning machine-based scheme is proposed in El-Naqa, Yang, Galatsanos, Nishikawa, and Wernick (2004) to estimate human perceptual similarity and to retrieve clinical mammograms containing clustered micro-calcifications. Quellec, Lamard, Cazuguel, Roux, and Cochener (2011) presented two techniques pertaining to the Bayesian network and Dezert Smarandache theory to capture contextual features as important information of images.
Naive Bayes, SVM, radial basis function (RBF) networks, and decision trees are the most frequently used techniques for image classification and retrieval in the medical domain. Subashini, Sahoo, Sunil, and Easwaran (2016) used a naive Bayes classifier to define a non-invasive methodology for grade identification of astrocytoma. In another study, Yang and Agarwal (2011) proposed a naive Bayes-based model to estimate indications for 145 diseases using clinical side-effects as important features. The RBF networks were used in the investigations of Udelhoven, Naumann, and Schmitt (2000) and Lei, He, and Zi (2009). In Udelhoven, Naumann, and Schmitt (2000), the RBF network was employed for top-level classification of microbial Fourier transform infrared spectra at the genus level. On the other hand, in Lei, He, and Zi (2009), the RBF network was used to automatically identify various machine operation conditions. Sommer, Straehle, Koethe, and Hamprecht (2011) proposed an easy-to-use toolkit to perform segmentation and classification tasks. They utilized a random forest model to categorize each pixel’s neighbourhood for a set of generic (non-linear) features. Nahar, Imam, Tickle, Ali, and Chen (2012) used a sequential minimal optimization (SMO) type of SVM to analyse microarray data for early diagnosis of breast cancer. Celebi et al. (2007) used the SVM to classify features extracted from dermoscopy images pertaining to skin lesions. Lee et al. (2010) introduced a classification model based on linear discriminant analysis. They combined the genetic algorithm with the random subspace method to estimate pulmonary nodules.

3. Image classification framework

Accurately and quickly classifying big-data medical images using data mining and machine learning techniques is challenging. Most conventional feature extraction models are not efficient at finding the useful information present in complex medical images.
We introduce a framework that uses an unsupervised deep learning model to learn the features obtained by the wavelet transformation and the Kolmogorov Smirnov test (Massey, 1951) for medical image classification. Statistical analysis is utilized to evaluate different classifiers based on the deep structured features. A useful classification system requires a proper data pre-processing scheme, a reliable feature extraction method, and an accurate classifier. In order to formulate a robust classification system, proper features and attributes should be extracted and used. In this study, we propose the combination of the single-level discrete 2-D WT and the KS test as a pre-processing step, and design DBNs as the feature extraction and selection technique. Moreover, a comprehensive investigation to identify a robust classifier from the naive Bayes, LibSVM, RBF, random forest, and SMO models is also conducted.

The proposed framework, consisting of three steps to classify different radiological images, is presented in Fig. 1. The pre-processing step aims to construct a sparse representation of the raw image data. This is followed by the feature extraction step, in which an unsupervised DBN is designed to find the deep structured features of the images. Then the resulting feature sets serve as inputs to supervised classifiers to classify the images. Through experiments, we will show the effectiveness of unsupervised deep structured features when they are used for image classification. Details of each step are presented in the following subsections.

Fig. 1. The proposed medical image classification framework.

3.1. Pre-processing phase

All images are converted to gray scale, resized to 160 × 160, and then reshaped into a single row vector. As discussed in the experimental results, feeding this matrix directly into the DBN and classification stages results in over-fitting and high computational cost. Therefore, the single-level discrete 2-D wavelet transform is applied to capture the highly discriminative coefficients that represent the complex structure of the original data. Once the conversion process is completed, the best coefficients, which are able to separate the image classes, are selected using statistical tests. The maximum variance (MV) criterion is a conventional procedure that chooses the features having the greatest variance. As discussed in Quiroga, Nadasdy, and Ben-Shaul (2004) and Nguyen, Khosravi, Creighton, and Nahavandi (2015), the features with the largest variance are not guaranteed to have the best discriminative properties among different classes. The preferred features should have the largest deviation from normality for the best discriminative property. For this purpose, Quiroga, Nadasdy, and Ben-Shaul (2004) proposed the use of the Lilliefors modification of the KS test for normality. The test is applied to select the best discriminative coefficients to form feature subsets.

3.1.1. Wavelet transformation

The 2D wavelet decomposition (Mallat, 1989) is a numerical analysis technique for dimensionality reduction. It seeks the corresponding density levels extracted by two (low- and high-pass) filters. The Haar wavelet conversion is one of the transformations used for feature extraction. We use the Haar function because of its orthogonality property, which allows the features to be represented efficiently with only a few wavelet coefficients. Suppose ϕ(x) is the mother Haar wavelet function, and consider the family

$$\phi\left[\frac{x - b}{a}\right], \quad (a, b) \in \mathbb{R}^{+} \times \mathbb{R} \tag{1}$$

where a = 2^{−j} and b = 2^{−j}k for j, k ∈ N. The dilations and translations of ϕ calculated by (2) lead to an orthogonal basis in L²(R). Therefore, each individual element in L²(R) can be represented as a linear combination of (2):

$$\phi_{jk}(x) = \mathrm{const} \cdot \phi(2^{j} x - k) \tag{2}$$

3.1.2. Kolmogorov Smirnov test

This method is a non-parametric test comparing two populations of samples based on both the location and the shape of their cumulative distribution functions. Suppose the empirical distribution function F_n for n independent and identically distributed observations X_i is defined by:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[-\infty, x]}(X_i) \tag{3}$$

where I_{[−∞, x]}(X_i) is the indicator function, equal to 1 if X_i ≤ x and 0 otherwise. If F(x) is a given cumulative distribution function, then the KS statistic is given by:

$$D_n = \sup_x \left| F_n(x) - F(x) \right| \tag{4}$$

where sup_x is the supremum of the set of distances. In the case of a two-sample KS test, the KS statistic is expressed as:

$$D_n = \sup_x \left| F_1(x) - F_2(x) \right| \tag{5}$$

The two-sample KS test checks the similarity of the distribution functions of two populations (data sets) by defining the null and alternative hypotheses (Massey, 1951). The null hypothesis, denoted H0, states that both sets of data were sampled from the same distribution. The null hypothesis is often an initial claim that analysts specify using previous examination or knowledge. The alternative hypothesis, H1, states that the two sets of data were sampled from different distributions.

3.2. Feature extraction stage

Informative representations of images (known as features) facilitate robust image classification. Deep learning, a sub-field of machine learning, is a useful approach to identify meaningful features of images. The DBN proposed by Hinton, Osindero, and Teh (2006) is a deep learning model that is able to extract and learn a deep hierarchical representation of data. The main reason for choosing the DBN as a feature extractor is that it is able to generate robust features that could lead to improved classification performance (Kim, Lee, & Provost, 2013). In this study, the wavelet coefficients generated by the pre-processing stage serve as inputs to a DBN. The best discriminative coefficients are fed into the deep learning model, which results in a smooth and time-efficient procedure.
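The pre-processing that produces these DBN inputs can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes an orthonormal Haar low-pass step that maps each 2 × 2 block to (a + b + c + d)/2, and a Lilliefors-style KS statistic measured against a normal distribution with the sample mean and standard deviation. The function names (`haar_approximation`, `ks_normality_statistic`, `select_features`) and the treatment of constant features are hypothetical choices made for illustration.

```python
import math
import random


def haar_approximation(image):
    """Single-level 2-D Haar approximation (low-pass) coefficients.

    `image` is a list of equal-length rows with even height and width.
    Each 2x2 block (a, b, c, d) maps to (a + b + c + d) / 2, which is the
    orthonormal Haar low-pass filter applied along rows and then columns.
    """
    rows, cols = len(image), len(image[0])
    return [
        [(image[r][c] + image[r][c + 1]
          + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def normal_cdf(x, mean, std):
    """Cumulative distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_normality_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0.0:
        # Assumption: a constant feature is uninformative, so rank it last.
        return 0.0
    d = 0.0
    for i, v in enumerate(sorted(values)):
        f = normal_cdf(v, mean, std)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def select_features(samples, keep):
    """Indices of the `keep` columns that deviate most from normality."""
    n_features = len(samples[0])
    scores = [ks_normality_statistic([s[j] for s in samples])
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy usage: 20 random 8x8 "images" -> 16 coefficients each -> keep 8.
random.seed(0)
images = [[[random.random() for _ in range(8)] for _ in range(8)]
          for _ in range(20)]
features = [[v for row in haar_approximation(img) for v in row]
            for img in images]
chosen = select_features(features, keep=8)
reduced = [[f[j] for j in chosen] for f in features]
```

With 160 × 160 inputs, the same steps would turn 25,600 pixels into 6400 approximation coefficients, from which 3200 could be kept, matching the dimensions reported for the case studies.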
A DBN is trained with an unsupervised greedy layer-wise procedure in which the output of each individual restricted Boltzmann machine (RBM) is fed to the visible layer of the next RBM. An RBM has a joint distribution defined by:

$$P(v, h) = \frac{1}{z} \exp\left(-h^{T} W v - b^{T} h - c^{T} v\right) \tag{6}$$

where v and h are the binary visible and hidden units, respectively, W is the matrix of symmetric weights between them, c and b are the respective biases, and z is the normalization constant. The energy function between v and h is given by:

$$E(v, h) = h^{T} W v + b^{T} h + c^{T} v \tag{7}$$

The jth binary hidden unit is activated by defining a logistic sigmoid function:

$$P(h_j = 1) = \mathrm{sigm}\left(b_j + \sum_i v_i W_{ij}\right) \tag{8}$$

In order to find the best W parameters, the gradient of the log likelihood of the visible units, defined by (9), is calculated using contrastive divergence after k iterations (Hinton, 2002):

$$\frac{\partial \log P(v)}{\partial W_{ij}} \approx \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k \tag{9}$$

where ⟨·⟩_m is the average value after m contrastive divergence iterations.

We experimented with different numbers of RBMs. The model with four RBMs showed the best performance. Fig. 1 shows the DBN used for feature extraction, which consists of four RBMs. Each RBM has one visible layer and one hidden layer. The output of each RBM is the input of the next RBM. Table 1 shows the description of the stacked RBMs used to construct the DBN. The network starts with 3200 units in the visible layer. This is one eighth of the original 25,600 dimensions, obtained by applying the wavelet transform and the KS test. A total of 400, 300, 200, and 100 units are used in the hidden layers of the four RBMs, respectively. The 100 units in the last layer force the DBN to select the 100 most relevant features to represent each image. These deep structured features are used in the classification stage.

Table 1
The details of the DBN with four stacked RBMs.

Parameters        DBN structure
                  RBM 1         RBM 2         RBM 3         RBM 4
Visible units     25600         400           300           200
Latent units      Binary        Binary        Binary        Gaussian
Latent units #    400           300           200           100
Performance       Free energy   Free energy   Free energy   Free energy
Max epoch         50            50            50            50
Learning rate     0.1           0.1           0.1           0.001
Model             Generative    Generative    Generative    Generative

3.3. Classification stage

The resulting set of deep structured features is fed to a classifier. The classification performances of the naive Bayes, LibSVM, RBF, random forest, and SMO models are examined and compared. The input of this classification stage is a 5968 × 100 matrix, where 5968 is the data set size and 100 is the number of features derived by the previous stage. Details of these classifiers are presented below.

3.3.1. Support vector machines

This technique is one of the state-of-the-art classifiers that splits a data set into two or more categories. SVMs use a function called a kernel to transform the input data samples into a higher dimensional space and to classify them linearly. We examine two different SVM training implementations here, i.e. SMO proposed by Platt (1999), and LibSVM, which uses the SMO-type decomposition method developed by Fan, Chen, and Lin (2005). The SMO algorithm trains the SVM classifier by decomposing the quadratic programming (QP) problem of the SVM solution into QP sub-problems. At each step, SMO solves the smallest possible optimization problem, which involves two Lagrange multipliers.

3.3.2. Naive Bayes

Naive Bayes is a probabilistic algorithm. It acts as a classifier that utilizes the Bayes theorem to find the class that is most likely relevant to a new instance (Zhang, Peña, & Robles, 2009). It estimates the highest posterior probability conditioned on the new instance. To make this feasible, it assumes that all attributes are independent. Despite its simplicity and its assumption of independent attributes, it generates reasonable results even for complex classification problems (Hand & Yu, 2001).

3.3.3. Random forest

A random forest model (Breiman, 2001) is a collection of k different decision trees. These decision trees are developed using different numbers of inputs and samples. All decision trees vote, and the highest ranked class is selected to provide the class label estimate of a new instance.

4. Experiments and discussions

4.1. Experimental data

The Image Retrieval in Medical Application (IRMA) data set is used for the experiments. IRMA is a joint project at the University Hospital, Aachen, Germany (Lehmann et al., 2004, 2005). The project aimed at developing image processing techniques applied to radiologic image archives. ImageCLEFmed 2005, 2007, and 2009 are some of the popular benchmark collections of IRMA for automated categorization and retrieval of medical images (IRMA, 2015). Mueen, Baha, and Zainuddin (2007) developed a radiological image classification model using ImageCLEFmed 2005. In their proposed method, a combination of global and local level features was deployed by an SVM classification model. Rahman, Bhattacharya, and Desai (2007) proposed a scheme to retrieve a part of the medical images from ImageCLEFmed 2007. By using the RBF as a kernel function, they utilized statistical similarity distance measures and a relevance feedback procedure to associate low-level and high-level features in their proposed model.

The ImageCLEFmed data set is popular for categorization and automated retrieval of medical images. The experiments of this research involve five selected classes of images from the ImageCLEFmed 2009 benchmark problem. Fig. 2 illustrates samples from each selected category of the IRMA benchmark. As in Pourghassem and Ghassemian (2008), these categories are among the classes with the highest semantic overlap and similarity to each other. Table 2 presents the details of the categories used in this study.
Ten data sets are randomly selected from the five classes of the original data, and each data set is randomly split into training (80%) and test (20%) sets. A total of five classes of the original data set consisting of 5968 images are chosen with respect to their labels. The images vary in dimensions and resolution. By considering computational costs and GPU limitations, the images are rescaled to zero-padded square images of size 160 × 160 to prevent any distortions. Thus, each individual image is represented by a row vector with 25,600 dimensions. Accordingly, the input data sample comprised a matrix with 10,489 × 25,600 dimensions (4773 of the total data samples are considered as the training set). A total of 25,600 features from each individual image are used.

Fig. 2. Five classes selected from IRMA to be classified.

Table 2
Five categories of the IRMA 2009 benchmark used in this study.

Class name   Anatomic        Direction   Number of images   Number of train   Number of test
1            Cranium         Coronal     411                328               83
2            Neuro cranium   Sagittal    365                292               73
3            Hand            Coronal     563                450               113
4            Chest           Coronal     3587               2869              718
5            Chest           Sagittal    1042               833               209

Four case studies are considered, as follows:

• Case Study 1: applying a pre-processing stage for re-scaling, re-sizing, and normalizing the raw data, followed by using the DBN along with a classifier at the top. The input data fed to the DBN is a matrix of dimension 5968 × 25600.
• Case Study 2: applying the KS test after the previous pre-processing step, followed by the same feature extraction and classification stages. The input data fed to the DBN is a matrix of dimension 5968 × 3200. Note that 3200 is one eighth of the original data dimension, selected by the KS test.
• Case Study 3: applying a low-pass wavelet filter at the end of the pre-processing stage as in case study (1), followed by the same procedure.
  In other words, the pre-processed input data samples are passed through a wavelet filter to find a sparse representation of each image, giving a matrix of dimension 5968 × 6400.
• Case Study 4 (the proposed framework): using the pre-processing step, i.e. re-scaling, re-sizing, normalizing, applying a low-pass wavelet filter and the KS test to obtain the best representative features of the raw data. This is followed by using the unsupervised DBN model to extract the deep structured features. The input data fed to the DBN is a matrix of dimension 5968 × 3200. In other words, the network starts with 3200 units in the visible layer. This number is half of the number of features fed to the network in case study 3. Note that this number is significantly smaller than that of case study 1 (less than one fourth).

Because deep networks suffer from the curse of dimensionality, our proposed model performs efficiently based on this feature selection. A further investigation is required to find the most suitable value for this hyper-parameter.

4.2. Performance evaluation metrics

Accuracy, sensitivity, specificity, and the F1 score (F-measure) are used to evaluate the classification performance. Sensitivity (the true positive rate) refers to the proportion of positive samples which are correctly classified as such. Conversely, specificity (the true negative rate) measures the proportion of negative samples which are correctly recognized as such. The F-measure is a single combined metric computed using (12). Precision (10) is the number of samples correctly labelled as positive divided by the total number of samples labelled as positive. In contrast, recall (11) is the number of samples correctly labelled as positive divided by the total number of samples that actually belong to the positive class.
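As a rough illustration of these definitions, the sketch below computes precision, recall (sensitivity), specificity, and the F-measure for a single class treated as positive. The one-vs-rest treatment of the five classes is an assumption, since the text does not state how per-class results were aggregated, and the function names `one_vs_rest_counts` and `class_metrics` are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN when `positive` is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a class has no samples."""
    return numerator / denominator if denominator else 0.0


def class_metrics(y_true, y_pred, positive, beta=1.0):
    """Precision, recall, specificity and F-measure for one class."""
    tp, fp, fn, tn = one_vs_rest_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # sensitivity, true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    b2 = beta * beta
    f_measure = safe_div((b2 + 1.0) * precision * recall,
                         b2 * precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Toy usage with three class labels; class 1 is treated as positive.
example = class_metrics([1, 1, 1, 0, 0, 2], [1, 1, 0, 0, 1, 2], positive=1)
```

With `beta=1.0` the F-measure reduces to the harmonic mean of precision and recall, which is the F1 score reported in the result tables.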
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$

$$F\text{-}\mathrm{measure} = \frac{(\beta^{2} + 1) \times \mathrm{Precision} \times \mathrm{Recall}}{\beta^{2} \times \mathrm{Precision} + \mathrm{Recall}} \tag{12}$$

To compute the F-measures, the value of β = 1 is used.

4.3. Results and discussions

This research investigates the effect of using the unsupervised KS test to select the best features obtained by the low-pass filters of the 2-D discrete wavelet transform. We then feed the selected features into a DBN to extract deep structured features. The low-pass filter is used to extract the approximation coefficient matrix because it contains more relevant features than the detail coefficients of the WT. The main reason is that radiological images contain texture information, which may be removed by applying high-pass filters (Acharya & Ray, 2005).

The top three features selected by the KS test are exhibited in Fig. 3. It is clear that these features have large deviations from a normal Gaussian distribution. This characteristic of the KS test based features helps improve the performance of the DBN and therefore increases the classification performance of the classifiers.

A comparison among the different case studies is shown in Fig. 4. As illustrated, the results of case study 4 indicate that there is a significant improvement once the KS test is applied to the features extracted by the low-pass filters. More specifically, case study 4, which is defined by a pre-processing step followed by the three unsupervised feature selection and extraction techniques, obtains the most relevant features for the classification stage. The features extracted by the low-pass wavelet filter are considered, followed by an unsupervised KS test feature selection technique, and the selected features are fed into the unsupervised deep belief network (DBN). Due to the complexity of deep structural models such as the DBN, a performance increase is promising if the necessary information is fed to the deep networks.
Hence, with respect to the literature and inspired by the usefulness of the wavelet transform for extracting the features of radiology images, we apply the low-pass filter, which keeps more relevant features than the high-pass filter. Note that, as expected and reported in Acharya and Ray (2005), the latter removes texture information, which is a relevant attribute in radiological images. Our results show the same behaviour. As seen in Fig. 4, the performance drops and the computational costs increase when the detail coefficients are included. The KS test is also an efficient technique to select the features which have large deviations from a normal Gaussian distribution. This characteristic of the KS test based features helps improve the performance of the DBN. As expected, a significant improvement is seen in Fig. 4 when comparing case study 4 with the remaining scenarios. The same observation is produced by all classifiers.

Fig. 3. Histograms of the top three features selected by the KS test.

Fig. 4. The comparison among case studies with respect to the classifiers.

Table 3
The proposed approach vs the others.

Case studies             P-value
Case study (4) vs (1)    0.016
Case study (4) vs (2)    0.009
Case study (4) vs (3)    0.009

The classification performances of the naive Bayes, LibSVM, RBF, random forest, and SMO classifiers are compared in this study. In the classification stage, an investigation is conducted to identify a robust classifier. Fig. 4 depicts the classification accuracies for all case studies. The accuracy rate is the median of 10 trials of the naive Bayes, LibSVM, RBF, random forest, and SMO classifiers. Case study (4) outperforms the other cases for all investigated classifiers. LibSVM, SMO, and random forest achieve similar accuracy rates for all case studies. We select LibSVM because it has a better recall rate than the other classifiers.
It is highly important to achieve a recall rate as close as possible to 100%, because a misunderstanding of a positive case lead to serious consequences in the medical domain. The t-test is implemented to draw convincing conclusions in performance evaluation. As shown in Table 3, the pairwise t-test compares the p-value among different case studies. It can be Fig. 5. A comparison among different case studies with respect to the LibSVM classifier. concluded that the fourth model outperforms others. The t-test outcomes strongly reject the null hypothesis of all the tests, since the p-values are smaller than 0.05 (the 95% significance level). As a result, it is concluded that the classification performance is remarkably improved by applying the KS test for features extracted by the low pass filter. Fig. 5 presents graphical comparisons of classification performance for four case studies by using a boxplot. Each box represents the distribution of 10 classification accuracy results for each case study using the LibSVM classifier. 196 A. Khatami et al. / Expert Systems With Applications 86 (2017) 190–198 Fig. 6. Performance comparisons between MV and KS test. Table 4 Performance obtained by LibSVM (mean ± std). Models Sensetivity Specificity F1 score Case study (1) Case study (2) Case study (3) 93.25 ± 0.8 92.9 ± 1.17 92 ± 0.9 97.05 ± 1 96.7 ± 1.16 95.7 ± 1.19 93.05 ± 0.87 92.9 ± 1.20 91.8 ± 1.01 Case study (4) 95.8 ± 0.01 97.8 ± 0.05 95.8 ± 0.01 Table 5 Performance metrics for different classifiers using proposed feature extraction method (the median of 10 trails). Models Sensetivity Specificity F1 score Naive Bayes RBF network Random forest SMO LibSVM 92 93.4 95.5 95.6 95.8 98.2 98.1 97.4 97.9 97.8 92.4 93.5 95.5 95.6 95.8 To highlight the effectiveness of the KS test, we compare its performance versus that of the MV criterion. Fig. 
6 shows that using the KS test leads to a significant accuracy improvement over the MV criterion, whether raw data or wavelet features are used. For example, the KS test obtains an accuracy of 93.21% whilst MV achieves only 89% when the raw data are used (blue bars in Fig. 6). When wavelet features are used, the KS test reaches an accuracy of 95.8% whilst MV reaches only 92.8% (orange bars in Fig. 6). In addition, the results reported in Fig. 6 indicate that combining the wavelet transform with the KS test considerably increases the classification accuracy compared with using either of them individually.

Table 4 shows another comparison between the case studies in terms of the sensitivity, specificity, and F-measure metrics using the LibSVM classifier. The variations are relatively small, especially for case study (4). This demonstrates the consistency of the classifier when fed with features extracted by the proposed approach. Note that the proposed method performs better than the other case studies. Although the specificities of case studies (1) and (4) are almost the same, their sensitivities and F1 scores differ significantly. Table 5 also compares the performance of different classifiers fed with features extracted by the proposed framework. It shows the median of 10 trials in terms of sensitivity, specificity, and F-measure achieved by the naive Bayes, RBF, random forest, SMO, and LibSVM classifiers. SMO and LibSVM have almost the same performance, which is reasonable since their design concepts and theories are similar. Naive Bayes outperforms the SVM in terms of specificity, but its recall rate is significantly lower. Due to the importance of recall in medical applications, the SVM is chosen.

The processing times of all case studies are shown in Fig. 7. All computations are conducted using a computer with an Intel Core i7-2640M CPU @ 2.7 GHz and 16 GB of memory.
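The KS-based feature selection compared against the MV criterion above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: each feature is scored with the one-sample KS statistic against a normal distribution fitted to that feature's own mean and standard deviation, features are ranked by the statistic (not by a p-value), and `top_k`, the function names, and the synthetic data are all hypothetical.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_vs_normal(values):
    """One-sample KS statistic between the empirical CDF of `values`
    and a normal distribution fitted to the same values."""
    n = len(values)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0.0:
        return 0.0  # assumption: constant features are scored as uninformative
    d = 0.0
    for i, x in enumerate(sorted(values), start=1):
        f = normal_cdf(x, mu, sigma)
        # largest gap just after and just before the i-th step of the ECDF
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rank_features_by_ks(samples, top_k):
    """Return the indices of the top_k columns with the largest KS
    statistic, together with the scores of all columns."""
    n_features = len(samples[0])
    scores = [ks_statistic_vs_normal([row[j] for row in samples])
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores


# Synthetic stand-in for wavelet coefficients (not data from the paper):
# column 0 is Gaussian, column 1 is bimodal, column 2 is exponential.
random.seed(0)
samples = []
for i in range(1000):
    gaussian = random.gauss(0.0, 1.0)
    bimodal = random.gauss(-3.0 if i % 2 == 0 else 3.0, 0.3)
    skewed = random.expovariate(1.0)
    samples.append([gaussian, bimodal, skewed])

top, scores = rank_features_by_ks(samples, top_k=2)
print("selected feature indices:", top)
```

In this sketch `top_k` is fixed by hand, which mirrors the manual choice of the number of KS-selected features that the paper itself lists as a weakness.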
The timings reported in Fig. 7 cover both the training and testing phases. The proposed framework, i.e. case study (4), is the fastest in both phases. It requires only 0.04 s to generate classification results for the test set, compared with 1.9 s for case study (1). This implies that the proposed framework can be implemented in a real-time radiology classification system.

The following discusses some observations, followed by the strengths and weaknesses of the proposed model. In contrast to non-medical applications, where large-scale labelled benchmarks can generally be created (e.g., for face recognition (Krizhevsky, Sutskever, & Hinton, 2012; Phillips, Wechsler, Huang, & Rauss, 1998)), large-scale labelled data sets are quite rare in medicine. This motivates researchers to investigate unsupervised feature extraction techniques. Therefore, for anyone who wishes to exploit the capabilities of deep solutions, addressing this challenge becomes an urgent task. In this study, we developed a three-step hierarchical unsupervised feature extraction technique for the medical domain, using the wavelet transform, the KS test, and the DBN. None of the three steps uses labels to extract the features, and this is the strength of our proposed method. According to the literature, many studies use the DBN for feature extraction; however, our study shows that applying proper unsupervised feature extraction techniques before the DBN not only improves the performance, but also reduces the computational cost. Case study 1 was defined because, according to the literature, the DBN has been commonly used for feature extraction in the medical domain. As seen in Fig. 4 and Table 4, the proposed method outperformed the others in terms of sensitivity, specificity, F-measure, and prediction rate. Moreover, as seen in Fig.
7, in terms of computational cost, our approach achieves a 47.5-fold reduction compared with case study 1. As mentioned in the literature, a combination of the wavelet transform and the DBN has been used in several studies for feature extraction in the medical domain. As such, we defined case study 3 for comparison with our contribution. As seen in Fig. 4 and Table 4, a considerable improvement in all evaluation terms was achieved, together with a 9.5-fold reduction in computational cost. The main weakness of our study is that the number of features selected by the KS test is set manually. An extensive investigation is required to find the optimal number of features selected by the KS test from the low-level wavelet features. This should be taken into account.

Fig. 7. A comparison among the case studies with SVM classifier in term of computational cost.

5. Conclusions

Extracting the most representative features and accurately classifying high-dimensional, noisy data are two important issues in medical imaging. This paper aims to address these challenges by proposing a robust framework that uses unsupervised deep structured features to classify different X-ray images. The main motivation is to generate highly relevant pre-processed feature sets for the unsupervised DBN. The DBN models further extract deep features and feed them as inputs into the LibSVM classifier for classification. The proposed framework uses the KS test to process information derived by a low-pass wavelet filter. This helps discover the most useful features from raw data efficiently for use by the DBN feature extractor. The proposed framework results in a significant reduction in the computational burden of the DBN. The experimental results show that using the proposed pre-processing procedure, i.e.
the combination of WT and KS test, before applying the DBN not only greatly reduces the computational time, but also remarkably improves the classification performance. The superior performance of the proposed approach against its competing methods indicates that it can be applied effectively in real practice for medical image analysis. This is useful for medical doctors and patients because serious diseases such as lung cancer, liver problems, cardiovascular or heart diseases, or osteoarthritis and articular cartilage conditions can be detected and diagnosed early. With such early detection, therapy planning and treatment can be devised effectively, which helps reduce the time, costs and efforts of doctors and patients and, more importantly, contributes to improving public health.

Investigating the optimal number of features selected by the KS test from the low-level wavelet features should be considered as a future study. Moreover, as the proposed approach has shown excellent performance on five classes of images, further research will verify the approach on more complex data sets that include more than five classes of X-ray images. Another direction for future work is to investigate other multiclass classifiers that would synergise with the proposed feature extraction approach to further improve the classification performance in processing high-dimensional and noisy radiographic imaging data.

Acknowledgement

The IRMA database is obtained from Thomas M. Deserno, Dept. of Medical Informatics, RWTH Aachen, Germany. The authors would like to thank Thomas M. Deserno for support of the IRMA database.

References

Acharya, T., & Ray, A. K. (2005). Image processing: Principles and applications. John Wiley & Sons. Adankon, M. M., Dansereau, J., Labelle, H., & Cheriet, F. (2012). Non invasive classification system of scoliosis curve types using least-squares support vector machines. Artificial Intelligence in Medicine, 56, 99–107. Awate, S.
P., Tasdizen, T., Foster, N., & Whitaker, R. T. (2006). Adaptive markov modeling for mutual-information-based, unsupervised mri brain-tissue classification. Medical Image Analysis, 10, 726–739. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. Bricq, S., Collet, C., & Armspach, J.-P. (2008). Unifying framework for multimodal brain mri segmentation based on hidden markov chains. Medical Image Analysis, 12, 639–652. Brosch, T., Tam, R., & Initiative, A. D. N. (2013). Manifold learning of brain mris by deep learning. In Medical image computing and computer-assisted intervention–MICCAI (pp. 633–640). Springer. Cabrera, A. F., Farina, D., & Dremstrup, K. (2010). Comparison of feature selection and classification methods for a brain–computer interface driven by non-motor imagery. Medical & Biological Engineering & Computing, 48, 123–132. Celebi, M. E., Kingravi, H. A., Uddin, B., Iyatomi, H., Aslandogan, Y. A., Stoecker, W. V., & Moss, R. H. (2007). A methodological approach to the classification of dermoscopy images. Computerized Medical Imaging and Graphics, 31, 362–373. Chen, Z., & Ning, R. (2004). Breast volume denoising and noise characterization by 3d wavelet transform. Computerized Medical Imaging and Graphics, 28, 235–246. El-Naqa, I., Yang, Y., Galatsanos, N. P., Nishikawa, R. M., & Wernick, M. N. (2004). A similarity learning approach to content-based image retrieval: Application to digital mammography. Medical Imaging, IEEE Transactions on, 23, 1233–1244. Fan, R.-E., Chen, P.-H., & Lin, C.-J. (2005). Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6, 1889–1918. Hand, D. J., & Yu, K. (2001). Idiot’s Bayes not so stupid after all? International Statistical Review, 69, 385–398. Hinton, G., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554. Hinton, G. E. (2002). 
Training products of experts by minimizing contrastive divergence. Neural Computation, 14, 1771–1800. Huber, M. B., Bunte, K., Nagarajan, M. B., Biehl, M., Ray, L. A., & Wismüller, A. (2012). Texture feature ranking with relevance learning to classify interstitial lung disease patterns. Artificial Intelligence in Medicine, 56, 91–97. IRMA (2015). Image retrieval in medical applications. https://ganymed.imib.rwth-aachen.de/irma/datasets.php. Khatami, A., Khosravi, A., Lim, C. P., & Nahavandi, S. (2016). A wavelet deep belief network-based classifier for medical images. International conference on neural information processing, Springer, 467–474. Khatami, A., Mirghasemi, S., Khosravi, A., Lim, C. P., & Nahavandi, S. (2017). A new pso-based approach to fire flame detection using k-medoids clustering. Expert Systems with Applications, 68, 69–80. Khatami, A., Mirghasemi, S., Khosravi, A., & Nahavandi, S. (2015a). An efficient hybrid algorithm for fire flame detection. IEEE. Neural networks (IJCNN), 2015 international joint conference on, 1–6. Khatami, A., Mirghasemi, S., Khosravi, A., & Nahavandi, S. (2015b). A new color space based on k-medoids clustering for fire detection. IEEE. Systems, man, and cybernetics (SMC), 2015 IEEE international conference on, 2755–2760. Kim, Y., Lee, H., & Provost, E. M. (2013). Deep learning for robust feature generation in audiovisual emotion recognition. IEEE. Acoustics, speech and signal processing (ICASSP), international conference on, 3687–3691. Korfiatis, P. D., Karahaliou, A. N., Kazantzi, A. D., Kalogeropoulou, C., & Costaridou, L. I. (2010). Texture-based identification and characterization of interstitial pneumonia patterns in lung multidetector ct. Information Technology in Biomedicine, IEEE Transactions on, 14, 675–680. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks.
In Advances in neural information processing systems (pp. 1097–1105). Lee, M. C., Boroczky, L., Sungur-Stasik, K., Cann, A. D., Borczuk, A. C., Kawut, S. M., & Powell, C. A. (2010). Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. Artificial Intelligence in Medicine, 50, 43–53. Lehmann, E. L., & D’Abrera, H. J. (2006). Nonparametrics: Statistical methods based on ranks. Springer New York. Lehmann, T. M., Gold, M., Thies, C., Fischer, B., Spitzer, K., Keysers, D., . . . Wein, B. B. (2004). Content-based image retrieval in medical applications. Methods of Information in Medicine, 43, 354–361. Lehmann, T. M., Güld, M. O., Deselaers, T., Keysers, D., Schubert, H., Spitzer, K., . . . Wein, B. B. (2005). Automatic categorization of medical images for content-based retrieval and data mining. Computerized Medical Imaging and Graphics, 29, 143–155. Lei, Y., He, Z., & Zi, Y. (2009). Application of an intelligent classification method to mechanical fault diagnosis. Expert Systems with Applications, 36, 9941–9948. Mallat, S. G. (1989). A theory for multiresolution signal decomposition: the wavelet representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 11, 674–693. Massey, F. J., Jr (1951). The Kolmogorov–Smirnov test for goodness of fit. Journal of the American Statistical Association, 46, 68–78. Mueen, A., Baha, M. S., & Zainuddin, R. (2007). Multilevel feature extraction and x-ray image classification. Journal of Applied Sciences, 1224–1229. Nahar, J., Imam, T., Tickle, K. S., Ali, A. S., & Chen, Y.-P. P. (2012). Computational intelligence for microarray data and biomedical image analysis for the early diagnosis of breast cancer. Expert Systems with Applications, 39, 12371–12377. Nguyen, T., Khosravi, A., Creighton, D., & Nahavandi, S. (2015). Medical data classification using interval type-2 fuzzy logic system and wavelets. Applied Soft Computing, 30, 812–822. 
Phillips, P. J., Wechsler, H., Huang, J., & Rauss, P. J. (1998). The feret database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16, 295–306. Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods Support Vector Learning, 3. Pourghassem, H., & Ghassemian, H. (2008). Content-based medical image classification using a new hierarchical merging scheme. Computerized Medical Imaging and Graphics, 32, 651–661. Quellec, G., Lamard, M., Cazuguel, G., Roux, C., & Cochener, B. (2011). Case retrieval in medical databases by fusing heterogeneous information. Medical Imaging, IEEE Transactions on, 30, 108–118. Quiroga, R. Q., Nadasdy, Z., & Ben-Shaul, Y. (2004). Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Computation, 16, 1661–1687. Rahman, M. M., Bhattacharya, P., & Desai, B. C. (2007). A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback. Information Technology in Biomedicine, IEEE Transactions on, 11, 58–69. Schilham, A. M., Van Ginneken, B., & Loog, M. (2006). A computer-aided diagnosis system for detection of lung nodules in chest radiographs with an evaluation on a public database. Medical Image Analysis, 10, 247–258. Sommer, C., Straehle, C., Koethe, U., & Hamprecht, F. A. (2011). ilastik: Interactive learning and segmentation toolkit. IEEE. Biomedical imaging: from Nano to Macro, 2011 IEEE international symposium on, 230–233. Subashini, M. M., Sahoo, S. K., Sunil, V., & Easwaran, S. (2016). A non-invasive methodology for the grade identification of astrocytoma using image processing and artificial intelligence techniques. Expert Systems with Applications, 43, 186–196. Udelhoven, T., Naumann, D., & Schmitt, J. (2000).
Development of a hierarchical classification system with artificial neural networks and ft-ir spectra for the identification of bacteria. Applied Spectroscopy, 54, 1471–1479. Wulsin, D., Gupta, J., Mani, R., Blanco, J., & Litt, B. (2011). Modeling electroencephalography waveforms with semi-supervised deep belief nets: Fast classification and anomaly measurement. Journal of Neural Engineering, 8, 036015. Yang, L., & Agarwal, P. (2011). Systematic drug repositioning based on clinical side-effects. PloS One, 6, e28025. Zhang, M.-L., Peña, J. M., & Robles, V. (2009). Feature selection for multi-label Naive Bayes classification. Information Sciences, 179, 3218–3229. Zhang, Y., Wang, S., Ji, G., & Dong, Z. (2013). An mr brain images classifier system via particle swarm optimization and kernel support vector machine. The Scientific World Journal, 130134.