A Novel Automatic Cardiac Auscultation with Hybrid Ant Colony Optimization-SVM Classification

Running title: Automatic Cardiac Auscultation with ACO/SVM

Prasertsak Charoen*, Khamin Subpinyo and Waree Kongprawechnon

Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand 12121
*Corresponding author e-mail: prasertsak@siit.tu.ac.th

Abstract
Cardiac auscultation is a method to examine the condition of a heart using a stethoscope. Since the method requires expertise, which is rare in suburban areas, automatic cardiac auscultation is introduced, using a machine learning technique to eliminate the need for an expert. In this paper, we develop a classification model based on a support vector machine (SVM) and ant colony optimization (ACO) for an automatic cardiac auscultation system. The proposed method uses ACO to further select heart sound features, after initial feature selection by principal component analysis (PCA), and uses them to train an SVM for classification. Experimental results show that the proposed ACO/SVM technique gives higher classification accuracy on heart sound samples than traditional SVM and genetic algorithm (GA) based SVM.

Keywords
Support vector machine, Cardiac auscultation, Ant colony optimization, Machine learning

1. Introduction
Cardiac disease is one of the leading causes of death in the modern world. The death rate is relatively high, considering that cardiac diseases can be prevented by regular heart check-ups that find symptoms before they become fatal. Regular health check-ups are not widely taken up, especially in suburban areas where their availability is low and they are considered expensive. The availability of cardiac auscultation is even lower, since traditional cardiac auscultation requires a skilled doctor to listen to a patient's heart sounds directly through a stethoscope and analyze the condition. Though diagnostic tools for cardiac auscultation exist, skilled doctors who can conduct cardiac auscultation are rare. Moreover, they are usually found only in major hospitals, making cardiac auscultation inaccessible in suburban areas. Patients who need a cardiac auscultation procedure normally have to travel to large downtown hospitals, because the equipment, such as ECG systems, is expensive and therefore available only in limited numbers.

The main objective of this work is to create low-cost cardiac auscultation by means of an automatic system, in order to promote public healthcare. If cardiac auscultation is easy and inexpensive to access, fatalities from heart disease will decrease, because early screening provides more possibilities of treatment before the disease becomes fatal. The main problem that we identified in the previous works of Chatunapalak et al., 2012 and Banpavivhit et al., 2013 is the classification accuracy of the system. In particular, we consider the rate of false negatives (FNs) a flaw, since the system is designed to be used in delicate situations that affect lives.
This is a vital part of the results, since FNs classify diseased heart samples as healthy ones.

The proposed system consists of 4 main parts: preprocessing, feature extraction, feature selection, and a support vector machine (SVM) classifier. The preprocessing part resamples, denoises, and normalizes heart sounds. The feature extraction part extracts features from each heart sound using Wavelet Packet (WP) based feature extraction. The feature selection part is very important, as it selects salient features using PCA and ant colony optimization. In this system, ACO is used to increase the accuracy of the classifier: since each feature of a heart sound has a different significance in identifying a health condition, ACO is implemented to select a significant feature set, which is then used to train the SVM classifier in the last part.

2. Materials and Methods
2.1 Classification by Support Vector Machine (SVM)
Classifying data is a common task in machine learning. SVM is known as a competent algorithm for analyzing data and recognizing patterns in statistics and computer science. It is used for classification and regression analysis. Given a set of data points, choosing the hyperplane with the largest separation makes SVM a non-probabilistic binary linear classifier, also called the perceptron of optimal stability. The classifier can also be optimized on the input data set for generalization.

2.2 Statistical Model
In statistics and machine learning, regularization is used to prevent overfitting, which describes a situation where the learning model is too complicated. The rule is to choose the simplest function that explains the data; when the model is less complicated, it shows less generalization error. Traditionally, SVM classifies between two classes, which is termed "binary classification". The derivation of an optimized classifier that uses an optimal hyperplane with maximum margin is described for heart sound (HS) analysis in the following sections.

2.3 Optimal Hyperplane (OHP)
A binary classifier is a function that maps vectors x_i ∈ R^n into one of two labels y_i ∈ {−1, +1}, where

  D = {(x_i, y_i) | x_i ∈ R^n, y_i ∈ {−1, +1}, i = 1, 2, …, N}  (1)

is a linearly separable training data set. A binary classifier is constructed by using two supporting hyperplanes separated by enough distance to isolate the two classes, with the OHP aligned between them, as in Figure 1. Support vectors (SVs) are the data points located on the corresponding supporting hyperplanes. The OHP is the optimal decision surface, chosen from the many possible hyperplanes generated from the input data, as in Eq. 2, once the two supporting hyperplanes are defined:

  w^T x + b = 0.  (2)

Two class labels are defined here, one with label +1 and the other with label −1, given by Eq. 3 and Eq. 4, respectively:

  w^T x_i + b ≥ +1 for y_i = +1,  (3)

  w^T x_i + b ≤ −1 for y_i = −1.  (4)

A test data point x to be classified then conforms with Eq. 5:

  f(x) = sign(w^T x + b).  (5)

Here, w is the normal vector to the hyperplane, b is the scalar-valued bias with respect to the origin, and x is the vector-valued input data point. When points are difficult to classify, more computation is typically needed.
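To make the decision rule of Eqs. 2-5 concrete, the short sketch below classifies points against a fixed hyperplane. This is a minimal illustration, not the paper's implementation (the paper's experiments run in MATLAB): the sample points, the weight vector w, and the bias b are hypothetical values standing in for the output of SVM training.

```python
import numpy as np

# Hypothetical 2-D samples to be labeled in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.5], [-2.0, -1.0]])

# Assumed (illustrative) hyperplane parameters; in practice w and b
# are produced by the SVM training described in Sections 2.4-2.6.
w = np.array([1.0, 1.0])
b = 0.0

def classify(x):
    # Decision rule of Eq. 5: f(x) = sign(w^T x + b).
    return 1 if np.dot(w, x) + b >= 0 else -1

print([classify(x) for x in X])  # -> [1, 1, -1, -1]
```

Points with w^T x + b = ±1 would lie exactly on the supporting hyperplanes of Eqs. 3 and 4.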
2.4 Maximum Margin and Minimum Error of Misclassification
The second concept needed in the formulation of a decision surface is the margin, represented by the distance 2/‖w‖ between the two supporting hyperplanes, as in the right panel of Figure 1. As Figure 3 shows, the larger the margin, the better the ability to categorize new data into the correct class. To satisfy this criterion, minimizing ‖w‖ is required, subject to the constraints:

  y_i (w^T x_i + b) ≥ 1, for i = 1, 2, …, N,  (6)

where y_i is the label associated with the sample x_i. This implies that the nearest samples to the separating hyperplane are the data points for which |w^T x_i + b| = 1.

Searching for the OHP with an optimality criterion of maximum distance implies an optimization problem. Constructing a maximum-margin classifier is therefore a convex optimization problem that can be solved via quadratic programming (QP) techniques. By substituting ‖w‖ with (1/2)‖w‖² for convenience, the optimization can be stated with the objective function min{(1/2)‖w‖²}, which relies on the constraint for minimum error of misclassification.

2.5 Slack Variable
In real classification problems, the data samples are not always linearly separable, and an additional trade-off between a large margin and a small error penalty must be optimized. This is provided by a slack variable ξ_i. It allows an error term in the SVM model and makes it a soft-margin classifier. Slack variables measure the distance between misclassified data points and the supporting hyperplane of their class, as shown in Figure 2, and enter the objective function through the penalty term C Σ_i ξ_i. This tolerates outliers relative to the separating hyperplane. There is a trade-off in the objective function between margin size and error size: when the margin is too large, more training data points fall on the wrong side of their respective supporting hyperplanes, giving a larger error. For this reason, soft-margin SVM chooses a hyperplane that splits the data points as cleanly as possible while still retaining maximum distance to the nearest cleanly split samples. The nonnegative constant C is a cost parameter, inversely proportional to the margin size, and it allows us to control the trade-off between margin size and error. Slack variables must be introduced for a non-separable data set; they modify the constraints in Eq. 6 to y_i (w^T x_i + b) ≥ 1 − ξ_i. As a result, the new optimized maximum-margin criterion is:

  min (1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i,  (7)

such that y_i (w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for i = 1, 2, …, N.
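The effect of the cost parameter C in Eq. 7 can be seen in a few lines of code. The sketch below assumes scikit-learn's SVC as a stand-in solver (the paper itself works in MATLAB) and hypothetical overlapping Gaussian clusters; it shows that a small C tolerates more margin violations and therefore keeps more support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian clusters: not linearly separable,
# so slack variables xi_i are required (hypothetical data).
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(2.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Small C -> wide margin, many tolerated violations, many SVs;
    # large C -> narrow margin, fewer support vectors.
    print(f"C={C}: support vectors per class = {clf.n_support_}")
```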
2.6 Nonlinearly Separable Problem
For a practical HS classification problem, the data samples may not always be linearly separable. By mapping nonlinear HS samples from the input space onto a higher-dimensional feature space via a mapping φ, linearly separable data φ(x_i) is formed. The final derived separating hyperplane in Eq. 8 satisfies the minimum error of misclassification and the maximum margin, with the primal representation in Eq. 9 and the dual representation in Eq. 10:

  w^T φ(x) + b = 0.  (8)

Minimizing,

  min (1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i,
  subject to y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0,  (9)

and, in dual form, maximizing,

  max Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j K(x_i, x_j),
  subject to Σ_{i=1}^{N} α_i y_i = 0 and 0 ≤ α_i ≤ C, for i = 1, 2, …, N,  (10)

where i and j index the input data points, x_i is the nonlinear input data, b is the scalar-valued bias, C is the regularization or cost parameter, ξ_i are the nonnegative slack variables, α_i are the nonnegative Lagrangian multipliers resulting from quadratic programming (QP), and w is the weight vector whose optimum solution in Eq. 11 is attained on the SVs:

  w = Σ_{i=1}^{N} α_i y_i φ(x_i).  (11)

In the higher-dimensional feature space, the decision is formulated for the classifier via the search for an OHP using the RBF kernel, as given in Eq. 12:

  K(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²),  (12)

where σ is the kernel width, chosen through ten-fold cross-validation together with the parameter C, to optimally define and separate groups of the classified SVs, in order to achieve sensitivity and specificity.

Competent generalization of the SVM model improves with the number of input samples utilized for training. The resulting decision function is:

  f(x) = sign(Σ_{i=1}^{N} α_i y_i K(x_i, x) + b).  (13)

With y as the target vector, the input is predicted to belong to one of the two real-label categories.

2.7 Kernel Method
The kernel method maps the sample data into a high-dimensional feature space, where the data become more easily separable. There are many forms of this mapping, and there is no constraint on it, meaning the dimension can increase without bound. Choosing the right kernel depends on the kind of problem the SVM must handle: each kernel has different properties, so it must be chosen according to what we want to model. For example, a polynomial kernel allows us to model feature interactions up to the order of the polynomial, and the RBF kernel allows us to pick out hyperspheres, just as the linear kernel picks out hyperplanes.

2.8 Ten-Fold Cross-Validation
N-fold cross-validation, with N = 10, is applied for model analysis. First, the data set D is divided into 10 partitions or folds, nine for training and one for testing, as in Eq. 14:

  D = D_1 ∪ D_2 ∪ … ∪ D_10, with D_i ∩ D_j = ∅ for i ≠ j,  (14)

where D_i is the i-th partition. Cross-validation is carried out by taking folds that are independent of one another. Each fold D_i is utilized for testing exactly once, while the remaining folds are used to train the model, as given in Eq. 15:

  T_i = D \ D_i.  (15)

The process continues, generating a new pair of training and test portions, until all ten folds have been used. Eventually, N-fold cross-validation needs to construct only N models for the evaluation of a single parameter set. This approach is computationally tractable even for large data sets, in contrast to the computational complexity of the leave-one-out method and the potential bias of the hold-out method. It has also been shown that cross-validation with N = 10 factors out biases effectively.

2.9 Principal Component Analysis (PCA)
PCA is applied to attain a compact feature set through a data-driven basis, so as to signify the most relevant HS features, according to Eq. 16:

  y = W^T x,  (16)

where y is a dimensionally reduced version of the extracted feature vector x, and W is a matrix whose columns are eigenvectors, arranged in decreasing order of the magnitude of their eigenvalues and retained until their eigenvalues sum to more than 90% of the total. These eigenvectors come from the covariance matrix of the training set, defined by Eq. 17:

  C = (1/N) Σ_{k=1}^{N} (x_k − μ)(x_k − μ)^T,  (17)

where N is the number of feature vectors, x_k are the original feature vectors, and μ is the mean vector. Best-tree coefficients of the WP feature set are then compactly supported, as the algorithm iteratively traces subsets of eigenvectors from the covariance matrix of WP-generated features to ensure the most related features are retained. For the selection of WP coefficients, a set of 9 features is acquired for each HS in MATLAB.
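A compact sketch of the PCA step of Eqs. 16-17 follows, keeping eigenvectors of the covariance matrix until their eigenvalues exceed 90% of the total. The random 352 x 126 matrix is only a hypothetical stand-in for the WP feature set; on the paper's real features this procedure retains 9 components.

```python
import numpy as np

def pca_reduce(F, retain=0.90):
    """Project features onto the leading eigenvectors (Eq. 16) of the
    covariance matrix (Eq. 17), keeping just enough eigenvectors for
    their eigenvalues to exceed `retain` of the total."""
    mu = F.mean(axis=0)
    cov = np.cov(F - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]    # descending order
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), retain)) + 1
    return (F - mu) @ vecs[:, :k]

# Hypothetical stand-in for 352 heart sounds x 126 WP features.
F = np.random.default_rng(1).normal(size=(352, 126))
print(pca_reduce(F).shape)
```

On unstructured random data nearly all components are kept; the strong reduction to 9 features reported in Section 2.11.3 reflects the redundancy of the real WP features.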
2.10 Ant Colony Optimization
Ant Colony Optimization (ACO), which originally aims to search for an optimal path in a graph, is based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since been diversified to solve a wider class of numerical problems, and as a result several procedures have emerged, drawing on various aspects of ant behavior. In nature, ants initially wander randomly and, upon finding food, return to their colony while laying down pheromone trails. If other ants find such a path, they are likely to stop travelling at random and instead follow the trail, returning and reinforcing it if they eventually find food. As time passes, however, the pheromone trail starts to evaporate, reducing its attractiveness. The more time it takes for an ant to travel down the path and back again, the more the pheromone evaporates. A short route, in comparison, gets marched over more frequently, so its pheromone density becomes higher than that of longer routes. Pheromone evaporation also has the advantage of avoiding convergence to a locally optimal solution: if there were no evaporation at all, the paths chosen by the first ants would be excessively attractive to the following ones, and the exploration of the solution space would be constrained.

An example of the ACO process is shown in Figure 4. The first ant wanders randomly until it finds the food source (F). Then it returns to its nest (N), laying a pheromone trail. Afterwards, other ants follow one of the paths at random, also laying pheromone trails. Since the ants traveling on the shortest path lay pheromone trails faster, this path is reinforced with more pheromone, making it more appealing to the following ants. Eventually the ants are likely to follow the shortest path more and more, since it is constantly reinforced with a larger amount of pheromone, while the pheromone trails of longer paths evaporate.
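The reinforcement-and-evaporation dynamic of Figure 4 can be reproduced with a toy two-path model. Everything here (path lengths, deposit rule, evaporation rate ρ = 0.1) is a hypothetical illustration of the mechanism, not the feature-selection algorithm of Section 2.11.4.

```python
import random

lengths = {'short': 1.0, 'long': 2.0}    # nest-to-food path lengths
pheromone = {'short': 1.0, 'long': 1.0}  # equal initial attractiveness
rho = 0.1                                # evaporation rate

random.seed(0)
for _ in range(200):
    # Choose a path with probability proportional to its pheromone.
    r = random.uniform(0.0, sum(pheromone.values()))
    path = 'short' if r < pheromone['short'] else 'long'
    # All trails evaporate; the travelled path is reinforced with a
    # deposit inversely proportional to its length.
    for p in pheromone:
        pheromone[p] *= (1.0 - rho)
    pheromone[path] += 1.0 / lengths[path]

print(pheromone)  # pheromone concentrates on the short path
```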
2.11 Proposed Methodology
The overall block diagram of the system is shown in Figure 5. It is divided into 4 main steps: preprocessing, feature extraction, feature selection, and classification. All of the processes and algorithms used in the system are implemented in MATLAB. Wavelet packet based feature extraction and PCA were studied in Chatunapalak et al., 2012; that method is implemented in the proposed system. There are 352 heart sound samples, consisting of 144 normal heart sounds and 208 pathological murmurs comprising aortic stenosis, aortic regurgitation, mitral stenosis, and mitral regurgitation. First, heart sounds enter the preprocessing stage, which is composed of resampling, noise cancellation, and normalization. After that, heart sound features are extracted by WP, using WP decomposition, WP entropy, and best-tree coefficients. Then, feature selection is carried out by PCA and ACO to select only the salient features. The final trained SVM can separate healthy heart sounds from abnormal heart sounds.

2.11.1 Data Collection and Preprocessing
During preprocessing, the length of each heart cycle is equalized and resampled at a rate of 4 kHz. The resampled heart sound then enters the noise cancellation process, where a 5-level DWT with soft thresholding of the detail coefficients is applied using the Daubechies-6 wavelet family. The physiological variations in intensity of each heart sound are discarded by normalization, using Eq. 18:

  x̂(n) = (x(n) − μ) / σ.  (18)

The result has zero mean and unit variance, where x̂ is the final heart sound signal, x is the signal before normalization, μ is the mean of the samples, and σ² is the variance of the samples.

2.11.2 Feature Extraction
The feature extraction part of the proposed system is based on Chatunapalak et al., 2012. Since a heart sound is a non-stationary signal, a WP-based filter is used in order to retain most of the time-frequency information in the signal; it performs well at characterizing these heart sound signals. Higher-order Daubechies family wavelets are used to improve frequency resolution and reduce aliasing. After WP decomposition, each subband energy is assessed by employing the non-normalized Shannon entropy criterion, as in Eq. 19:

  E(s) = −Σ_i s_i² log(s_i²),  (19)

where E is Shannon's entropy and s is a heart sound signal. The logarithmic function gives greater entropy weight to signal intensity while attenuating noise and disturbances. The WP suitable for the heart sound signal is a 6-level decomposition with the Daubechies-3 mother wavelet function. The selected percentage of retained energy is 99.9%, to characterize the best-basis feature with a 96% compression rate, approached hierarchically on the decomposition indices. The total number of outputs of the 2-channel filter banks is obtained as in Eq. 20:

  Σ_{l=1}^{6} 2^l = 2 + 4 + 8 + 16 + 32 + 64 = 126.  (20)

After the WP stage, the final number of different extracted features on the frequency plane is 126.
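As a sketch of this feature extraction step, the code below computes the non-normalized Shannon entropy of Eq. 19 for every node of a 6-level wavelet-packet tree, which yields the 2 + 4 + … + 64 = 126 values of Eq. 20. It assumes the PyWavelets package as a substitute for the paper's MATLAB toolbox, and a random signal as a hypothetical stand-in for a preprocessed heart sound.

```python
import numpy as np
import pywt

def wp_entropy_features(x, wavelet='db3', maxlevel=6):
    """Non-normalized Shannon entropy E = -sum(s^2 * log(s^2)) (Eq. 19)
    of every wavelet-packet node at levels 1..maxlevel."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=maxlevel)
    feats = []
    for level in range(1, maxlevel + 1):
        for node in wp.get_level(level, order='natural'):
            s2 = np.asarray(node.data) ** 2
            feats.append(-np.sum(s2 * np.log(s2 + 1e-12)))  # eps avoids log(0)
    return np.array(feats)

# Hypothetical 1-second "heart sound" at the 4 kHz preprocessing rate.
x = np.random.default_rng(2).normal(size=4000)
print(wp_entropy_features(x).shape)  # (126,)
```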
2.11.3 Feature Selection
PCA is commonly used to reduce the dimensionality of the given features. The algorithm locates a subset of the original extracted features. The dimensionality of this feature set is important, since it significantly shapes the performance and accuracy of the system. PCA uses subsets of eigenvectors from the covariance matrix of the extracted features; the eigenvectors whose eigenvalues cumulatively reach 90% of the total are retained as related features, meaning that these features are significant in identifying each sample. The resulting number of features remaining in this particular system is 9, considerably smaller than the previous 126 features.

2.11.4 Ant Colony Optimization Algorithm (ACO)
ACO is an optimization technique based on the way ants find food. The proposed system uses ACO for feature selection, to choose the best set of salient features for training the SVM classifier. The 9 features selected from the heart sound samples differ in significance, and they determine the accuracy of the proposed system, so their selection is very important. In our experiments, training the SVM on different single features leads to significantly different classification accuracy. Therefore, if the system selects only the salient features, the accuracy of the classifier can be increased; this is the reason ACO is chosen in the system.

The ACO process diagram is shown in Figure 6. The ACO method in the proposed system is based on a selection of the salient factors. First, the system generates ants and places them at initial nodes covering all of the features. Classification accuracy (CA) is determined for all of the ants, and a pheromone level of 1 is assigned to each feature. Then, for each ant, a random next feature is concatenated with the previous features and the CA is computed again. If the ant achieves a higher CA, the associated feature is given additional pheromone; otherwise, the system terminates the ant and generates the next one. The process continues until the termination criteria are reached. Note that the pheromone in this system takes discrete values, with increments of 1.

2.11.5 SVM Classification
In order to classify all the heart sound samples into two categories, healthy and unhealthy, the SVM method is chosen. Training an SVM with the RBF kernel has proven to be an effective classification method for higher-dimensional classification problems, such as the heart sound signals in the proposed system (Chatunapalak et al., 2012). The classification accuracy is evaluated using 10-fold cross-validation, as stated earlier, and is then used as the fitness value of each feature set provided by ACO feature selection. The SVM classifier with the RBF kernel finds the solution of the model by mapping the nonlinear input data into a higher-dimensional feature space, making the feature data separable. After the ACO loop, the system arranges the features in descending order of pheromone level. A total of 9 cases of SVM are considered: the first case trains the SVM on 1 feature, the second case on the top 2 features (in descending order), and so on up to the 9th case. The proposed method thereby determines how many features are needed for the best accuracy from SVM classification.
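To tie the pieces together, here is a heavily simplified sketch of the ACO/SVM loop described above: each ant grows a feature subset, the 10-fold cross-validated accuracy of an RBF-kernel SVM serves as the fitness, pheromone is incremented by 1 when a feature improves the CA, and an ant terminates when accuracy stops improving. The ant construction rule, the ant count, and the random data are illustrative assumptions; this follows the spirit of Sections 2.11.4-2.11.5 rather than reproducing the paper's exact MATLAB implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def aco_select(X, y, n_ants=20, seed=0):
    """Rank features by pheromone, using 10-fold CV accuracy of an
    RBF-kernel SVM as the fitness of each candidate feature subset."""
    rng = np.random.default_rng(seed)
    n_feats = X.shape[1]
    pheromone = np.ones(n_feats)  # initial pheromone level of 1
    for _ in range(n_ants):
        order = rng.permutation(n_feats)  # this ant's walk over features
        subset, best_ca = [], 0.0
        for f in order:
            trial = subset + [int(f)]
            ca = cross_val_score(SVC(kernel='rbf'), X[:, trial], y, cv=10).mean()
            if ca > best_ca:   # feature helped: keep it, deposit pheromone
                best_ca, subset = ca, trial
                pheromone[f] += 1  # discrete increment of 1
            else:              # no improvement: terminate this ant
                break
    return np.argsort(pheromone)[::-1]  # features, best pheromone first

# Hypothetical stand-ins for the 9 PCA-selected features and labels.
rng = np.random.default_rng(3)
X = rng.normal(size=(352, 9))
y = rng.integers(0, 2, size=352)
print(aco_select(X, y))  # a candidate ranking of the 9 features
```

The ranking feeds the final step: train nine SVMs on the top-1 through top-9 features and keep the subset with the best cross-validated accuracy.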
3. Results and Discussion
We evaluate the performance of the SVM classifier with three measures: sensitivity, specificity, and accuracy. Sensitivity measures the percentage of abnormal heart sounds that are classified correctly. Specificity measures the percentage of normal heart sounds that are classified correctly. Accuracy is the proximity of the measured results to the true values:

  Sensitivity = TP / (TP + FN),  (21)

  Specificity = TN / (TN + FP),  (22)

  Accuracy = (TP + TN) / (TP + TN + FP + FN).  (23)

From the simulation of the system with 50 iterations, Table 1 shows the total pheromone levels of the system. The highest pheromone value belongs to feature number 8, making it the most important feature; it is arranged to be the first feature used in SVM training. The lowest pheromone value belongs to feature number 1, which is assigned the lowest priority.

Table 2 shows the results of SVM classification according to the number of features used to train the SVM. We simulate 9 SVM classifications to see how many features yield the highest accuracy. As a result, the highest accuracy is obtained when only the 4 highest-pheromone features are used.

Table 3 compares the results to the previous works of Chatunapalak et al., 2012 and Banpavivhit et al., 2013. The system with ACO improves the CA by 1.32% compared to the system without GA or ACO, and by 0.99% compared to the system with GA. The accuracy improvement in the proposed system results from the reduction of FNs, which is vital in this application, compared to previous works.
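For completeness, here is a direct transcription of Eqs. 21-23, with TP, TN, FP, and FN denoting true/false positives and negatives; the counts used in the example are hypothetical, not the paper's results.

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)                   # Eq. 21

def specificity(tn, fp):
    return tn / (tn + fp)                   # Eq. 22

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. 23

# Hypothetical confusion-matrix counts for 352 samples.
tp, tn, fp, fn = 200, 132, 12, 8
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))
```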
4. Conclusions
This paper proposes a new, low-cost alternative screening diagnostic tool. The system uses wavelet packet based feature extraction to extract significant features from the heart sound samples, forming a 126-feature set. The dimensionality of the features is then reduced using PCA. To obtain sets of salient features for training an SVM classifier, ACO is used: it gauges the importance of features by pheromone level, and these pheromone values arrange the features in descending order for SVM training. The highest CA is obtained when choosing only 4 out of the 9 features. The resulting classifier accuracy is improved compared to not using ACO. Moreover, the proposed system reduces FNs, an important factor that can affect the life of a patient.

In future research, feature extraction using Linear Discriminant Analysis (LDA) and Latent Semantic Analysis (LSA) will be investigated. We plan to combine such techniques with SVM classification to achieve maximum classification accuracy. The proposed system will also be tested on raw heart sounds collected on site from patients using a digital stethoscope, and a Graphical User Interface (GUI) can be developed for real-time use of the algorithm.

References
Bunluechokchai, C., Ussawawongaraya, W.: A Wavelet-based Factor for Classification of Heart Sounds with Mitral Regurgitation. International Journal of Applied Biomedical Engineering, Vol. 2, No. 1 (2009)

Gupta, C.N., Palaniappan, R., Swaminathan, S., Krishnan, S.M.: Neural Network Classification of Homomorphic Segmented Heart Sounds. Biomedical Engineering Research Center, Nanyang Technological University, Singapore (2007)

Chebil, J., Al-Nabulsi, J.: Classification of Heart Sound Signal Using Discrete Wavelet Analysis. International Journal of Soft Computing 2(1): 37-41 (2007)

Chatunapalak, I., Phatiwuttipat, P., Kongprawechnon, W., Tungpimolrut, K.: Childhood Musical Murmur Classification with Support Vector Machine Technique. Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand (2012)

Phatiwuttipat, P., Kongprawechnon, W., Tungpimolrut, K.: Cardiac Auscultation Analysis System with Neural Network and SVM Technique. ECTI Conference, Khon Kaen (2011)

Banpavivhit, S., Kongprawechnon, W., Tungpimolrut, K.: Cardiac Auscultation with Hybrid GA/SVM. Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand (2013)

Ahmed, A.: Ant Colony Optimization for Feature Subset Selection. International Journal of Computer, Control, Quantum and Information Engineering, Vol. 1, No. 4 (2007)

Murase, K., Monirul, K., Shahjahan, M.: Ant Colony Optimization Toward Feature Selection. http://dx.doi.org/10.5772/51707

Aghdam, H.M., Ghasem-Adhaee, N., Basiri, E.M.: Text Feature Selection using Ant Colony Optimization. Department of Computer Engineering, Faculty of Engineering, University of Isfahan, Hezar Jerib Avenue, Esfahan, Iran

Majdi, M., Eleyan, D.: Ant Colony Optimization based Feature Selection in Rough Set Theory. International Journal of Computer Science and Electronics Engineering (IJCSEE), Volume 1, Issue 2 (2013)