A Novel Automatic Cardiac Auscultation with Hybrid Ant Colony Optimization-SVM Classification

Running title: Automatic Cardiac Auscultation with ACO/SVM

Prasertsak Charoen*, Khamin Subpinyo and Waree Kongprawechnon

Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand 12121

*Corresponding author e-mail: prasertsak@siit.tu.ac.th

Abstract

Cardiac auscultation is a method to examine the condition of a heart using a stethoscope. Since the method requires expertise to analyze the condition of a heart, and such expertise is rare in suburban areas, an automatic cardiac auscultation is introduced, using a machine learning technique to eliminate the need for expertise. In this paper, we develop a classification model based on support vector machine (SVM) and ant colony optimization (ACO) for an automatic cardiac auscultation system. The proposed method uses ACO to further select heart sound features resulting from initial feature selection by principal component analysis (PCA), and uses them to train an SVM for classification. Experimental results show that the proposed ACO/SVM technique is able to give higher classification accuracy on heart sound samples, compared to traditional SVM and genetic algorithm (GA) based SVM.

Keywords

Support vector machine, Cardiac auscultation, Ant colony optimization, Machine learning

1. Introduction

Cardiac disease is one of the leading causes of death in the modern world. The rate is relatively high, considering that cardiac diseases can be prevented by regular heart check-ups that find symptoms before they become fatal. Regular health check-ups are not widely done, especially in suburban areas where the availability of check-ups is low and they are considered to be expensive. The availability of cardiac auscultation is even lower, since traditional cardiac auscultation requires a skilled doctor to use a stethoscope to listen to the heart sounds of patients directly and analyze their conditions. Though the diagnostic tools for cardiac auscultation are available, skilled doctors who can conduct cardiac auscultation are rare. Moreover, they are only available in major hospitals, making cardiac auscultation inaccessible in suburban areas. Normally, when patients need a cardiac auscultation procedure, they need to go to big hospitals located in downtown areas, because ECG systems for cardiac auscultation are expensive. Since the tools for cardiac auscultation are expensive, the availability of tools is limited.

The main objective of this system is to create low-cost cardiac auscultation by using an automatic system, in order to promote public healthcare. If cardiac auscultation is easy and inexpensive to access, then fatalities from heart diseases will decrease, because ease of access to cardiac auscultation provides early measurements. The easy accessibility of cardiac auscultation will provide more possibilities of treatment in the early stages, before the disease becomes fatal. The main problem that we discovered from the previous works of Chatunapalak et al., 2012 and Banpavivhit et al., 2013 is the classification accuracy of the system. Moreover, the area that we consider a flaw is the rate of false negatives (FNs), since the system is designed to be used in delicate situations that affect lives. This is a vital part of the results, since FNs classify diseased heart samples as healthy ones.

The proposed system consists of 4 main parts: Preprocessing, Feature Extraction, Feature Selection, and a Support Vector Machine (SVM) classifier. The preprocessing part resamples, denoises, and normalizes heart sounds. The feature extraction part extracts features from each heart sound by using Wavelet Packet (WP) based feature extraction. The feature selection part is considered very important, as it selects salient features by using PCA and ant colony optimization. In this system, we use ant colony optimization to increase the accuracy of the classifier. Since each feature of each heart sound has a different significance in identifying a health condition, ant colony optimization is implemented to select a significant feature set, which is then used to train an SVM classifier in the last part.

2. Materials and Methods

2.1 Classification by Support Vector Machine (SVM)

Classifying data is a common task in machine learning. SVM is known as a competent algorithm for analyzing data and recognizing patterns in statistics and computer science, and it is used for classification and regression analysis. Given a set of data points, choosing the hyperplane with the largest separation between the two classes makes SVM a non-probabilistic binary linear classifier, equivalently the perceptron of optimal stability. The classifier can also be optimized with the input data set for generalization.

2.2 Statistical Model

In statistics and machine learning, regularization is used to prevent overfitting, which describes a situation where the learning model is too complicated. The rule is to choose the simplest function that explains the data. When the model is less complicated, the result shows less generalization error. Traditionally, SVM is used to classify between two classes, which is termed "binary classification". Details about deriving an optimized classifier that uses an optimal hyperplane with maximum margin are described for heart sound (HS) analysis in a later section.

2.3 Optimal Hyperplane (OHP)

A binary classifier is a function $f: \mathbb{R}^n \to \{-1, +1\}$ that maps vectors $\mathbf{x}_i \in \mathbb{R}^n$ into two labels $y_i \in \{-1, +1\}$, where $D$, from Eq. 1,

$$ D = \{ (\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^n,\ y_i \in \{-1, +1\} \}_{i=1}^{N} \qquad (1) $$

is a linearly separable training data set. A binary classifier is constructed by using two separating hyperplanes with far enough distance to isolate them. The OHP is aligned between them, as in Figure 1. Support vectors (SVs) are data points which are located on the corresponding supporting hyperplane, classifying data with their real labels. The OHP is the optimal decision surface which is chosen from many possible hyperplanes generated from the input data, as in Eq. 2, on an orthogonal basis when the two supporting hyperplanes are defined. Two class labels are defined here, one with +1 label and another with -1 label; these labels are defined by Eq. 3 and Eq. 4, respectively. The test data $\mathbf{x}$ that needs to be classified has to conform with Eq. 5.

$$ \mathbf{w} \cdot \mathbf{x} + b = 0 \qquad (2) $$

$$ \mathbf{w} \cdot \mathbf{x}_i + b \geq +1 \quad \text{for } y_i = +1 \qquad (3) $$

$$ \mathbf{w} \cdot \mathbf{x}_i + b \leq -1 \quad \text{for } y_i = -1 \qquad (4) $$

$$ f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b) \qquad (5) $$

Here, $\mathbf{w}$ is the normal vector to the hyperplane, $b$ is the scalar-valued bias with respect to the origin, and lastly, $\mathbf{x}_i$ is the vector-valued input data point. When points are difficult to classify, more computation is typically needed.

2.4 Maximum Margin, and Minimum Error of Misclassification

The second concept needed in the formulation of a decision surface is the margin, represented by the distance between the two supporting hyperplanes, as shown in the right panel of Figure 1. From Figure 3, the larger the margin is, the better the ability to categorize instance data into the correct type. In order to satisfy this criterion, minimizing $\|\mathbf{w}\|$ is required. The nearest samples to the separating hyperplane are subject to the constraints:

$$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \qquad i = 1, 2, \ldots, N \qquad (6) $$

where $y_i$ is the label associated with the sample $\mathbf{x}_i$. This implies that the nearest data points, the SVs, are given by $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$.

Searching for the OHP with an optimality criterion of maximum distance also implies an optimization problem. So, constructing a maximum-margin classifier is a convex optimization problem that can be solved via quadratic programming (QP) techniques. By substituting $\|\mathbf{w}\|$ with $\frac{1}{2}\|\mathbf{w}\|^2$ for convenience, the optimization can be expressed in the objective function $\min\{\frac{1}{2}\|\mathbf{w}\|^2\}$, which relies on the constraint in Eq. 6 for minimum error of misclassification.

2.5 Slack Variable

In real classification problems, the data samples are not always linearly separable and require an additional trade-off between a large margin and a small error penalty to be optimized. This is provided by using a slack variable $\xi_i$. It allows an error term in the SVM model and makes it a soft-margin classifier. Slack variables define the distance between data points and the separating hyperplane of their class, shown in Figure 2 as the penalty term $C\sum_i \xi_i$ in the objective function. This admits outliers with respect to the separating hyperplane. There is a trade-off in the optimizing function $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$ between margin size and error size. When the margin is too large, more training data points are on the wrong side of their respective supporting hyperplanes, which gives a larger error. This is the reason that soft-margin SVM chooses a hyperplane that splits the data points as clearly as possible, while still retaining maximum distance to the nearest clearly split samples. The nonnegative constant $C$ is a cost parameter; it is inversely proportional to the margin size and allows us to control the trade-off between margin size and error. Slack variables need to be introduced in a non-separable data set. This modifies the constraints in Eq. 6 to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i$. As a result, a new optimized maximum-margin criterion is:

$$ \min_{\mathbf{w},\, b,\, \xi} \ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N} \xi_i \qquad (7) $$

such that $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i = 1, 2, \ldots, N$ are constrained.

2.6 Nonlinearly Separable Problem

For a practical HS classification problem, the data sample may not always be linearly separable. By mapping nonlinear HS samples in the input space onto the higher dimensional feature space, linearly separable data, $\Phi(\mathbf{x}_i)$, is formed. The final derived separating hyperplanes in Eq. 8 satisfy the minimum error of misclassification and the maximum margin, with primal representation by Eq. 9, or with dual representation by Eq. 10, as

$$ \mathbf{w} \cdot \Phi(\mathbf{x}) + b = 0 \qquad (8) $$

Minimizing,

$$ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N} \xi_i \qquad (9) $$

Subject to $y_i(\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$

Minimizing,

$$ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{N} \alpha_i \qquad (10) $$

Subject to $\sum_{i=1}^{N} \alpha_i y_i = 0$, $0 \leq \alpha_i \leq C$ for $i = 1, 2, \ldots, N$

where $i$ and $j$ index each input data point, $\mathbf{x}_i$ is nonlinear input data, $b$ is the scalar-valued bias, $C$ is the regularization or cost parameter, $\xi_i$ are nonnegative slack variables, $\alpha_i$ are nonnegative Lagrangian multipliers resulting from quadratic programming (QP), and $\mathbf{w}$ is the weight vector such that its optimum solution in Eq. 11 is attained for the SVs. In the higher dimensional feature space, the decision is formulated for the classifier via the search for an OHP using the RBF kernel, as given in Eq. 12,

$$ \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \Phi(\mathbf{x}_i), \qquad i = 1, 2, \ldots, N \qquad (11) $$

$$ K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right) \qquad (12) $$

such that $\sigma$ is the kernel width, chosen from the process of ten-fold cross-validation, as well as for parameter $C$, to optimally define and separate groups of the classified SVs, in order to achieve sensitivity and specificity.

Competent generalization of the SVM model is directly proportional to the number of input samples utilized for training the model. The precisely generated decision function is:

$$ f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right) \qquad (13) $$

For $y$ as the target vector, the input $\mathbf{x}$ is predicted to be in one of the two real-label categories.

2.7 Kernel Method

The kernel method is an algorithm which maps the sample data into a high dimensional feature space to make the data more easily separable in that space. There are many forms of these mapping processes; moreover, there is no constraint, meaning that the dimension can increase infinitely.

Choosing the right kernel depends on the kind of problem SVM needs to deal with. Each kernel has different properties, so it needs to be chosen according to what we want to model. For example, a polynomial kernel allows us to model feature interactions up to the order of the polynomial, while the RBF kernel allows us to pick out hyperspheres, in the same way that the linear kernel picks out a hyperplane.

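To make the soft-margin RBF-kernel formulation above concrete, the following is a minimal illustrative sketch in Python with scikit-learn (the authors' implementation is in MATLAB). The synthetic data, the class sizes, and the values of C and gamma are placeholders, not values from the paper; gamma corresponds to $1/(2\sigma^2)$ in Eq. 12.

```python
# Minimal sketch (not the authors' MATLAB code): soft-margin SVM with RBF kernel
# on a synthetic two-class data set standing in for heart sound feature vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder classes: -1 for "normal", +1 for "abnormal" feature vectors (9 features each).
X = np.vstack([rng.normal(0.0, 1.0, (50, 9)), rng.normal(1.5, 1.0, (50, 9))])
y = np.array([-1] * 50 + [+1] * 50)

# C is the cost parameter of Eq. 7; gamma = 1 / (2 * sigma^2) plays the role of the
# RBF kernel width in Eq. 12. Both would normally be tuned by cross-validation.
clf = SVC(kernel="rbf", C=1.0, gamma=0.1)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```
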
2.8 Ten-Fold Cross-Validation

N-fold cross-validation, with N = 10, is applied to form the model analysis. First, the data set $D$ is divided into 10 partitions or folds $D_1, D_2, \ldots, D_{10}$, as in Eq. 14, which are nine training sets and one test set.

$$ D = \bigcup_{k=1}^{10} D_k, \qquad D_i \cap D_j = \varnothing \ \text{ for } i \neq j \qquad (14) $$

where $D_k$ is the $k$-th partition, and the partitions are independent of one another. Cross-validation is carried out by taking different folds: each fold $D_k$ is utilized for testing exactly once, while the remaining folds of that partition, $D \setminus D_k$, are used to train the models, as given in Eq. 15. The process continues via generating a new set of both portions until ten sets of data are reached.

$$ D_{\text{train}}^{(k)} = D \setminus D_k, \qquad D_{\text{test}}^{(k)} = D_k, \qquad k = 1, 2, \ldots, 10 \qquad (15) $$

Eventually, N-fold cross-validation only needs to construct N models for the evaluation of a single parameter set. This approach is computationally tractable even for large data sets, in contrast to the computational complexity of the models in the leave-one-out method, as well as the potential bias of the hold-out method. Also, it has been shown that cross-validation with N = 10 can factor out biases effectively.

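A minimal sketch of Eq. 14-15 follows, again in Python rather than the authors' MATLAB: the accuracy of one parameter set (C, gamma) is estimated by averaging over 10 folds. The data set and parameter values are placeholders.

```python
# Illustrative 10-fold cross-validation (Eq. 14-15) for a single (C, gamma) setting.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 9)), rng.normal(1.2, 1.0, (60, 9))])
y = np.array([0] * 60 + [1] * 60)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = []
for train_idx, test_idx in kfold.split(X, y):
    # Each fold D_k is used for testing exactly once; the remaining folds train the model.
    model = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print("10-fold CV accuracy: %.3f" % np.mean(scores))
```
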
2.9 Principal Component Analysis (PCA)

PCA is applied to attain the compactness of a feature set through a data-driven basis, so as to signify the most relevant HS features, according to Eq. 16:

$$ \mathbf{y} = W^{T}\mathbf{x} \qquad (16) $$

where $\mathbf{y}$ is a dimension-reduced version of the extracted feature vector $\mathbf{x}$, and $W$ is a matrix whose eigenvectors, arranged in decreasing order of magnitude of their eigenvalues, are accumulated until their eigenvalue sum is more than 90% of the sum of all eigenvalues. These eigenvectors are formed via the covariance matrix $C$ of the training set, defined by Eq. 17:

$$ C = \frac{1}{M}\sum_{i=1}^{M} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^{T} \qquad (17) $$

where $M$ is the number of feature vectors, $\mathbf{x}_i$ is the original feature vector, and $\boldsymbol{\mu}$ is the mean vector. Best-tree coefficients of the WP feature set are then compactly supported as the algorithm iteratively traces subsets of eigenvectors from the covariance matrix of WP-generated features to ensure that the best related features are in the record. For the selection of WP coefficients, the chosen set of 9 features is acquired for each HS, where the original features are densely aligned to the signified level, using MATLAB.

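The sketch below illustrates Eq. 16-17 in Python (the paper's processing is done in MATLAB): eigenvectors of the covariance matrix are kept until they account for 90% of the eigenvalue sum, and the features are projected onto them. The 352 x 126 feature matrix here is a random placeholder, so the retained dimension will not necessarily be 9 as in the paper.

```python
# Illustrative PCA reduction (Eq. 16-17): keep eigenvectors covering 90% of eigenvalue sum.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(352, 126))            # placeholder: 352 heart sounds x 126 WP features

mu = X.mean(axis=0)                        # mean vector (Eq. 17)
C = np.cov(X - mu, rowvar=False)           # covariance matrix of the training set
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # decreasing order of eigenvalue magnitude
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

ratio = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(ratio, 0.90) + 1)  # smallest k exceeding 90% of the total
W = eigvecs[:, :k]
Y = (X - mu) @ W                           # dimension-reduced features (Eq. 16)
print("retained components:", k, "reduced shape:", Y.shape)
```
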
2.10 Ant Colony Optimization

Ant Colony Optimization (ACO), which originally aims to search for an optimal path in a graph, is based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several procedures have emerged, drawing on various aspects of the behavior of ants. In nature, ants initially wander randomly and, upon finding food, return to their colony while laying down pheromone trails. If other ants find such a path, they are likely not to keep travelling at random, but to instead follow the trail, returning and reinforcing it if they eventually find food. As time passes, however, the pheromone trail starts to evaporate, thus reducing its attractiveness. The more time it takes for an ant to travel down the path and back again, the more the pheromone evaporates. A short route, in comparison, gets marched over more frequently, and thus its pheromone density becomes higher than that of longer ones. Pheromone evaporation also has the advantage of avoiding convergence to a locally optimal solution. If there were no evaporation at all, the paths chosen by the first ants would tend to be excessively attractive to the following ones. In that case, the exploration of the solution space would be constrained.

An example of the ACO process is shown in Figure 4. The first ant wanders randomly until it finds the food source (F). Then it returns to its nest (N), laying a pheromone trail. Afterwards, other ants follow one of the paths at random, also laying pheromone trails. Since the ants traveling on the shortest path lay pheromone trails faster, this path gets reinforced with more pheromone, making it more appealing to the following ants. Lastly, the ants are likely to follow the shortest path more and more, since it is constantly reinforced with a larger amount of pheromone, while the pheromone trails of longer paths evaporate.

2.11 Proposed Methodology

The overall block diagram of the system is shown in Figure 5. It is divided into 4 main steps: Preprocessing, Feature Extraction, Feature Selection, and Classification. All of the processes and algorithms used in the system are implemented in MATLAB. Wavelet packet based feature extraction and PCA are studied in Chatunapalak et al., 2012; this method is implemented in the proposed system. There are 352 heart sound samples consisting of 144 normal heart sounds and 208 pathological murmurs comprised of Aortic Stenosis, Aortic Regurgitation, Mitral Stenosis and Mitral Regurgitation. First, heart sounds go into the preprocessing stage, which is composed of resampling, noise cancellation, and normalization. After that, heart sound features are extracted by WP, using WP decomposition, WP entropy, and best-tree coefficients. Then, feature selection is done by PCA and ACO to select only the salient features. The final trained SVM can classify healthy heart sounds from abnormal heart sounds.

2.11.1 Data Collection and Preprocessing

During preprocessing, the length of each heart cycle is equalized and resampled at a rate of 4 kHz. Then, the resampled heart sound enters the noise cancellation process; here a 5-level DWT via soft thresholding is used, along with the Daubechies-6 wavelet family, on the detail coefficients. The physiological variations in intensity of each heart sound are discarded by normalization, using Eq. 18.

$$ \hat{x}[n] = \frac{x[n] - \mu}{\sqrt{\sigma^2}} \qquad (18) $$

This has zero mean and unity variance, where $\hat{x}[n]$ is the final heart sound signal, $x[n]$ is the signal before normalization, $\mu$ is the mean of the samples, and $\sigma^2$ is the variance of the samples.

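A minimal sketch of this preprocessing chain is given below in Python with SciPy and PyWavelets (the paper uses MATLAB). The input signal, the original sampling rate, and the universal-threshold noise estimate are assumptions for illustration, not details taken from the paper.

```python
# Illustrative preprocessing: resample to 4 kHz, 5-level DWT soft-threshold denoising
# with Daubechies-6 detail coefficients, then the normalization of Eq. 18.
import numpy as np
import pywt
from scipy.signal import resample_poly

fs_in, fs_out = 8000, 4000                        # assumed input rate; target is 4 kHz
rng = np.random.default_rng(3)
heart_sound = rng.normal(size=fs_in)              # placeholder 1-second recording

# Resample to 4 kHz.
x = resample_poly(heart_sound, fs_out, fs_in)

# 5-level DWT denoising: soft-threshold the detail coefficients (db6).
coeffs = pywt.wavedec(x, "db6", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # noise estimate (assumption, not from paper)
thr = sigma * np.sqrt(2 * np.log(len(x)))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
x = pywt.waverec(coeffs, "db6")

# Eq. 18: subtract the mean, divide by the standard deviation.
x_hat = (x - x.mean()) / np.sqrt(x.var())
print("mean ~ %.2e, variance ~ %.2f" % (x_hat.mean(), x_hat.var()))
```
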
2.11.2 Feature Extraction

The feature extraction part of the proposed system is based on Chatunapalak et al., 2012. Since the heart sound is a non-stationary signal, a WP-based filter is used in order to retain most of the time-frequency information in the signal, because it has a high performance yield at characterizing these heart sound signals. Higher-order Daubechies family wavelets are used to improve frequency resolution and reduce aliasing. After WP decomposition, the energy of each subband is assessed by employing the non-normalized Shannon entropy criterion, as in Eq. 19,

$$ E = -\sum_{n} s^2[n]\,\log\!\left(s^2[n]\right) \qquad (19) $$

where $E$ is Shannon's entropy and $s[n]$ is a heart sound signal. The logarithmic function allows a greater entropy weight for signal intensity while attenuating noise and disturbances. The WP that is suitable for the heart sound signal is a 6-level decomposition on the Daubechies-3 mother wavelet function. The selected percentage of retained energy is 99.9%. This is to characterize the best-basis feature with a 96% compression rate, approaching hierarchically on the decomposition indices. The total outputs of the 2-channel filter banks across all decomposition levels are obtained, as in Eq. 20,

$$ N = \sum_{l=1}^{L} 2^{l} = 2^{L+1} - 2 \qquad (20) $$

After the performance of the WP with $L = 6$, the final number of different extracted features on the frequency plane is 126.

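The sketch below illustrates this step in Python with PyWavelets (the paper's implementation is MATLAB): a 6-level WP decomposition with the Daubechies-3 wavelet, with the non-normalized Shannon entropy of every node as a feature, giving 2 + 4 + ... + 64 = 126 values. The best-tree pruning and 99.9% energy retention steps described above are omitted here, and the input is a placeholder signal.

```python
# Illustrative WP feature extraction: 6-level db3 wavelet packet decomposition and
# the non-normalized Shannon entropy (Eq. 19) of every node at levels 1..6.
import numpy as np
import pywt

rng = np.random.default_rng(4)
x = rng.normal(size=4000)                        # placeholder preprocessed heart sound

def shannon_entropy(coeffs):
    """Non-normalized Shannon entropy: -sum(c^2 * log(c^2)), ignoring zero coefficients."""
    c2 = coeffs[np.abs(coeffs) > 0] ** 2
    return float(-np.sum(c2 * np.log(c2)))

wp = pywt.WaveletPacket(data=x, wavelet="db3", maxlevel=6)
features = []
for level in range(1, 7):                        # all nodes at levels 1..6
    for node in wp.get_level(level, order="natural"):
        features.append(shannon_entropy(node.data))
print("number of WP entropy features:", len(features))   # 2 + 4 + ... + 64 = 126
```
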
2.11.3 Feature Selection

PCA is the method commonly used to reduce the dimensions of the given features. The algorithm tries to locate a subset of the original extracted features. The dimension of these feature sets is important, since it significantly shapes the performance and accuracy of the system. PCA uses subsets of eigenvectors from the covariance matrix of the extracted features. The features whose eigenvalues sum to 90% of the total (in decreasing order) are retained as related features, meaning that these features are significant in identifying each sample. The resulting number of features remaining in this particular system is 9, which is considerably smaller than the previous 126 features.

2.11.4 Ant Colony Optimization Algorithm (ACO)

ACO is an optimization technique based on the theory of ants finding food. The proposed system uses ACO for feature selection, to choose the best set of salient features to train the SVM classifier. The 9 features selected from the heart sound samples are significant factors that determine the accuracy of the proposed system; thus, the selection of these features is very important. From the experiment, training the SVM using a single feature leads to significantly different classification accuracies depending on the feature. Therefore, if the system selects only salient features, the accuracy of the classifier can be increased. This is the reason why ACO is chosen in the system, to enhance its accuracy.

The ACO process diagram is shown in Figure 6. The method of ACO in the proposed system is based on a selection of the salient factors. First, the system generates ants and places them onto initial nodes, one for each feature. Classification accuracy (CA) is determined for all of the ants, and a pheromone level of 1 is assigned to each feature. Then, a random next feature for each ant is chosen and concatenated with the previous features to compute the CA again. The feature associated with an ant that obtains a higher CA is given an additional pheromone level. Otherwise, the system terminates the ant and starts to generate the next ant. The process continues until it reaches the termination criteria. Note that the pheromone in this system is based on discrete values, with increments of 1.

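The following is a minimal Python interpretation of the ACO loop described above (the authors use MATLAB); it is a sketch of the stated procedure, not the authors' code. The data set, the number of outer iterations, and the SVM parameters are placeholders, and CA is estimated here by 10-fold cross-validated RBF-SVM accuracy as in the earlier sections.

```python
# Illustrative ACO feature selection: ants extend a feature subset one random feature at a
# time; a feature's pheromone level is incremented by 1 whenever adding it improves CA.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (60, 9)), rng.normal(1.0, 1.0, (60, 9))])
y = np.array([0] * 60 + [1] * 60)
n_features = X.shape[1]

def classification_accuracy(feature_subset):
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    return cross_val_score(clf, X[:, feature_subset], y, cv=10).mean()

pheromone = np.ones(n_features)                  # a level of 1 assigned to every feature
for _ in range(5):                               # termination criterion (placeholder count)
    for start in range(n_features):              # one ant per initial feature node
        subset = [start]
        best_ca = classification_accuracy(subset)
        for nxt in rng.permutation([f for f in range(n_features) if f != start]):
            ca = classification_accuracy(subset + [int(nxt)])
            if ca > best_ca:                     # higher CA: reward feature, keep walking
                pheromone[nxt] += 1
                subset.append(int(nxt))
                best_ca = ca
            else:                                # otherwise this ant is terminated
                break

ranking = np.argsort(pheromone)[::-1]            # features in descending pheromone order
print("pheromone levels:", pheromone)
print("feature ranking:", ranking)
```
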
2.11.5 SVM Classification

In order to classify all the heart sound samples into two categories, healthy and unhealthy, the SVM method is chosen. Training an SVM classifier with the RBF kernel has proven to be an effective classification method for higher-dimensional classification problems, such as the heart sound signals in the proposed system (Chatunapalak et al., 2012). The classification accuracy is evaluated using the 10-fold cross-validation method, as stated earlier. The accuracy is then used to determine the fitness value of each feature set provided by ACO feature selection. The SVM classifier with the RBF kernel function finds the solution of the model by mapping the nonlinear input data into a higher dimensional feature space. This is done to make the feature data separable. After the ACO loop, the system arranges the features in descending order of pheromone level. A total of 9 cases of SVM are considered: the first case has 1 feature to train the SVM, the second case has 2 features (in descending order), and so on until the 9th case is reached. The proposed method determines how many features are needed to ensure the best accuracy resulting from SVM classification.

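A minimal Python sketch of this final sweep over the 9 cases follows (again illustrative, not the authors' MATLAB code). The data and the pheromone ranking are placeholders; in the actual system they come from the preceding ACO step.

```python
# Illustrative final step: evaluate the 9 cases (top-1, top-2, ..., top-9 pheromone
# features) by 10-fold cross-validation and keep the feature count with the best accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 1.0, (60, 9)), rng.normal(1.0, 1.0, (60, 9))])
y = np.array([0] * 60 + [1] * 60)
ranking = np.arange(9)          # placeholder: features already sorted by descending pheromone

best_k, best_acc = 1, 0.0
for k in range(1, 10):                                   # the 9 cases
    subset = ranking[:k]                                 # top-k pheromone features
    acc = cross_val_score(SVC(kernel="rbf"), X[:, subset], y, cv=10).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print("best number of features:", best_k, "CV accuracy: %.3f" % best_acc)
```
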
3. Results and Discussion

We evaluate the performance of the SVM classifier by several metrics: sensitivity, specificity, and accuracy. Sensitivity measures the percentage of abnormal heart sounds which are classified correctly. Specificity measures the percentage of normal heart sounds which are classified correctly. Accuracy is the proximity of the measurement results to the true values.

$$ \text{Sensitivity} = \frac{TP}{TP + FN} \qquad (21) $$

$$ \text{Specificity} = \frac{TN}{TN + FP} \qquad (22) $$

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (23) $$

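A short Python sketch of Eq. 21-23 is given below; the label and prediction vectors are placeholders (1 = abnormal, 0 = normal), not results from the paper.

```python
# Illustrative computation of sensitivity, specificity, and accuracy (Eq. 21-23).
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])   # placeholder ground truth
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])   # placeholder classifier output

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))           # FN: diseased sample labeled healthy

sensitivity = tp / (tp + fn)                          # Eq. 21
specificity = tn / (tn + fp)                          # Eq. 22
accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. 23
print(sensitivity, specificity, accuracy)
```
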
From the simulation of the system with 50 iterations, Table 1 shows the total pheromone levels of the system. We can see that the highest pheromone value belongs to feature number 8, which is the most important feature. It is arranged to be the first feature used in SVM training. The lowest pheromone value belongs to feature number 1, which is assigned the lowest priority.

Table 2 shows the results of SVM classification according to the number of features trained in the SVM. We simulate 9 SVM classifications to see how many features acquire the highest accuracy for SVM classification. As a result, the highest accuracy is obtained when only the 4 highest-pheromone features are used.

Table 3 shows the results compared to the previous works of Chatunapalak et al., 2012 and Banpavivhit et al., 2013. We can see that the system with ACO has a better CA by 1.32%, compared to the system without GA or ACO. The system with ACO improves the CA by 0.99% when compared to the system with GA. The accuracy improvement in the proposed system is the result of the reduction in the FN area, which is vital in this application, compared to previous works.

4. Conclusions

This paper proposes a new, low-cost alternative screening diagnostic tool. The system uses wavelet packet based feature extraction to extract significant features from the heart sound samples, forming a set of 126 features. The dimensions of the features are then further reduced using PCA. To obtain sets of salient features to be used to train an SVM classifier, ACO is used. ACO gauges the importance of features by pheromone levels. These pheromone values arrange the features in descending order of pheromone level for SVM training. The highest CA result is obtained when choosing only 4 out of the 9 features. The resulting classifier accuracy is improved compared to not using ACO. Moreover, the proposed system can reduce the FN area, which is an important factor that can affect the life of a patient.

In future research, feature extraction using Linear Discriminant Analysis (LDA) and Latent Semantic Analysis (LSA) will be investigated. We plan to explore the combination of such techniques with SVM classification to achieve maximum classification accuracy. Also, the proposed system will be tested on raw heart sounds collected on site from patients using a digital stethoscope. A Graphical User Interface (GUI) can also be developed for real-time usage of the algorithm.

References

Bunluechokchai, C., Ussawawongaraya, W.: A Wavelet-based Factor for Classification of Heart Sounds with Mitral Regurgitation. International Journal of Applied Biomedical Engineering, Vol. 2, No. 1 (2009)

Gupta, C.N., Palaniappan, R., Swaminathan, S., Krishnan, S.M.: Neural Network Classification of Homomorphic Segmented Heart Sounds. Biomedical Engineering Research Center, Nanyang Technological University, Singapore (2007)

Chebil, J., Al-Nabulsi, J.: Classification of Heart Sound Signal Using Discrete Wavelet Analysis. International Journal of Soft Computing 2(1): 37-41 (2007)

Chatunapalak, I., Phatiwuttipat, P., Kongprawechnon, W., Tungpimolrut, K.: Childhood Musical Murmur Classification with Support Vector Machine Technique. Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand (2012)

Phatiwuttipat, P., Kongprawechnon, W., Tungpimolrut, K.: Cardiac Auscultation Analysis System with Neural Network and SVM Technique. ECTI Conference, Khon Kaen (2011)

Banpavivhit, S., Kongprawechnon, W., Tungpimolrut, K.: Cardiac Auscultation with Hybrid GA/SVM. Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand (2013)

Ahmed, A.: Ant Colony Optimization for Feature Subset Selection. International Journal of Computer, Control, Quantum and Information Engineering, Vol. 1, No. 4 (2007)

Murase, K., Monirul, K., Shahjahan, M.: Ant Colony Optimization Toward Feature Selection. http://dx.doi.org/10.5772/51707

Aghdam, H.M., Ghasem-Aghaee, N., Basiri, E.M.: Text Feature Selection using Ant Colony Optimization. Department of Computer Engineering, Faculty of Engineering, University of Isfahan, Hezar Jerib Avenue, Esfahan, Iran

Majdi, M., Eleyan, D.: Ant Colony Optimization based Feature Selection in Rough Set Theory. International Journal of Computer Science and Electronics Engineering (IJCSEE), Volume 1, Issue 2 (2013)