International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 06 Issue: 11 | Nov 2019
p-ISSN: 2395-0072
www.irjet.net
ELECTROENCEPHALOGRAM SIGNALS CLASSIFICATION USING GRADIENT
BOOST ALGORITHM AND SUPPORT VECTOR MACHINE
Ajao, T.A1, Oyewole, A.O2, Ojo, O.S3, Amore, T.O4, Amusan, D.G5 and Olabode, A.O6
1,2,4 Researcher, Federal Institute of Industrial Research Oshodi (FIIRO), Nigeria.
3,6 Research Scholar, Ladoke Akintola University of Technology, Ogbomoso, Nigeria.
5 E-Tutor, LAUTECH Open and Distance Learning Center, Ogbomoso, Nigeria.
Abstract - Automatic diagnosis of epileptic seizures from the electroencephalogram (EEG) has been an active research area in
biomedical science. A significant number of EEG signal classification methods have been proposed in recent research; most
achieve very promising performance but are characterized by a high false positive rate and are computationally intensive.
This research carried out a comparative analysis of the performance of the Extreme Gradient Boost (XGBoost) algorithm and the
Support Vector Machine (SVM) for the classification of epileptic seizures in the human EEG. The results revealed that the
XGBoost algorithm outperformed the SVM model in the classification of EEG signals.
Keywords- Extreme Gradient Boost, Support Vector Machine, Electroencephalogram (EEG)
1. INTRODUCTION
In recent times, a lot of effort has been directed toward the application of computer analysis to the bio-electric signals of
the human system. Several ill-health conditions in humans can be detected by evaluating the electrical signals within the body.
Some of the important bio-electric signals in the human system include those responsible for the heartbeat, brain signals and
those in the central nervous system. Breakthroughs in soft computing and artificial intelligence have improved the development
of more effective classification and diagnostic techniques and of treatment methodologies (Tzallas, et al., 2012). Soft
computing techniques have helped in extracting and classifying bio-signals such as the electromyogram (EMG),
electroencephalogram (EEG), electrooculogram (EOG) and electrocardiogram (ECG) in order to detect or treat ailments.
Different methods and techniques have been developed for detecting and classifying the EEG as either normal or epileptic.
However, complete visual analysis of an EEG signal is very difficult; hence, an automated means of detection is essential.
Epilepsy is characterized by sudden, recurrent and transient disturbances of perception or behaviour resulting from
excessive synchronization of cortical neuronal networks. Epileptic seizures are divided by their clinical manifestation into
partial or focal, generalized, unilateral and unclassified seizures (Tzallas, Tsipouras, and Fotiadis, 2009). The use of
classification systems in medical diagnosis has increased significantly. There is no doubt that evaluation of data taken from
patients and decisions of experts are the most important factors in diagnosis. Classification systems help to minimize possible
errors made by a fatigued or inexperienced physician. Automated diagnostic systems have been applied to a variety of medical
data, such as electrocardiograms (ECGs), electromyograms (EMGs), electroencephalograms (EEGs), ultrasound signals/images,
X-rays, and computed tomographic images (AlZubi, Islam and Abbod, 2011). This research focuses on the comparative analysis of
extreme gradient boost and support vector machine in the detection and classification of an EEG as either epileptic seizure or
non-epileptic seizure, for effective management of patients suffering from epileptic seizures.
The rest of this paper is organized as follows: a review of related work on EEG signals, the methodology of XGBoost and SVM,
and then the results and discussion. The final section concludes the paper along with some recommendations for future research.
2. REVIEWS ON ELECTROENCEPHALOGRAM
In recent times, a lot of effort has been directed toward the application of computer analysis to the bio-electric signals of the
human system. Although several methods for brain function analysis, such as magnetoencephalography (MEG), functional magnetic
© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 125
resonance imaging (fMRI) and positron emission tomography (PET) have been introduced, the EEG signal is still a valuable
tool for monitoring brain activity due to its relatively low cost and its convenience for the patient. Breakthroughs in soft
computing and artificial intelligence have improved the development of more effective classification and diagnostic techniques
and of treatment methodologies (Tzallas, et al., 2012). Different methods and techniques have been developed for detecting and
classifying the EEG as either normal or epileptic. However, complete visual analysis of an EEG signal is very difficult; hence,
an automated means of detection is essential.
Subasi et al. (2005) present a method for the analysis of EEG signals using the discrete wavelet transform, with classification
using an ANN. In their work, the signal was decomposed into 5 levels using the Daubechies order 4 (DB4) wavelet filter. The
energies of the details and approximation were used as the input features. Adeli et al. (2007) proposed a wavelet-chaos-neural
network methodology for classification of EEGs into healthy, ictal and interictal EEGs. Wavelet analysis is used to decompose
the EEG into the delta, theta, alpha, beta, and gamma sub-bands. Three parameters are used for EEG representation: standard
deviation, correlation dimension, and largest Lyapunov exponent. The research was carried out in two phases, band-specific
analysis and mixed-band analysis, with the intention of minimizing the computing time and output analysis. The results showed
that all three key components of the wavelet-chaos-neural network methodology are significant for enhancing the EEG
classification accuracy. Ganesan, et al. (2010) proposed a technique for the automatic detection of spikes in long-term
18-channel human EEGs with fewer data sets. The scheme for detecting epileptic and non-epileptic spikes in the EEG is based on
a multi-resolution, multi-level analysis and an Artificial Neural Network (ANN) approach. Overall, the results obtained in these
studies revealed a high false positive rate.
3. METHODOLOGY
The implementation tool used was the Python 3.6 software package (Spyder 3.5.1) on a Windows 10 Enterprise 64-bit operating
system, with a Core i5 CPU [email protected], 8 GB RAM and a 500 GB hard disk drive. A t-test was used to further validate the
performance of each of the techniques used. The Python programming language was used to implement the system because of its
support for multiple programming paradigms and its dynamic features for machine learning.
The methodology involved five major steps, which are:
a. Data acquisition
b. Removal of artifacts
c. Extraction of features
d. Decomposition of extracted features
e. Classification of decomposed features using XGBoost and SVM
A. Acquisition of datasets
A publicly available dataset from the Clinique of Bonn University was used for this study. The data were recorded by means of
a 128-channel 12-bit EEG system at 173.5 samples per second. The total dataset, comprising 500 segments, was grouped into
five sets. Each segment has a duration of 23.6 seconds. All sets were selected from EEG records after removing artifacts caused
by eye and muscle movements.
The EEG data acquired contain three different cases, namely: data for healthy people, epileptic people during the seizure-free
interval (interictal) and epileptic people during the seizure interval (ictal). The data comprise five sets (Z, O, N, S, and F)
that were used for training and testing XGBoost and SVM. Sets Z and O were obtained from healthy people with eyes open and
closed, respectively, using external surface electrodes in a standardized electrode placement scheme. Sets N and F were
obtained from epileptic people during the interictal interval. Set F was obtained from epileptogenic sections of the brain that
represent the focal activity, while N was taken from the hippocampal formation of the brain, which indicates non-focal
interictal activity. Set S was obtained from an epileptic subject during the seizure interval. The bands that are clinically
relevant are delta, theta, alpha, beta and gamma.
B. Removal of Artifacts
The EEG signal was normalized by rescaling the features so that the signal has the properties of a standard normal distribution
with μ=0 and σ=1, where μ is the mean (average) and σ is the standard deviation from the mean.
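
The rescaling described above can be illustrated with a minimal sketch. The function name `standardize` and the sample amplitudes are hypothetical and are not taken from the authors' code; the sketch only shows the z-score transform z = (x - μ) / σ on a single one-dimensional signal.

```python
from statistics import fmean, pstdev

def standardize(signal):
    """Rescale a 1-D signal to zero mean and unit standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal)  # population standard deviation (divides by N)
    if sigma == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(x - mu) / sigma for x in signal]

# Hypothetical amplitudes standing in for one short EEG segment.
segment = [12.0, -7.0, 30.0, 4.0, -19.0]
z = standardize(segment)
```

After the transform the values have mean 0 and standard deviation 1, which keeps features with large amplitudes from dominating the later LDA and classification stages.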
C. Extraction of features using Linear Discriminant Analysis
Different approaches have been considered for the extraction of features from EEG signals in previous work, where these
methods were usually used to explore the information in the EEG. In this work, Linear Discriminant Analysis (LDA) was used for
feature extraction, and this involved ten stages.
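These are the steps for the extraction of features using Linear Discriminant Analysis:
i. Given a set of N samples $\{x_i\}_{i=1}^{N}$, each of which is represented as a row of length M as in Figure 2.4 (step (A)), and $X$ ($N \times M$) as given by,
$$X = \begin{bmatrix} x_{(1,1)} & x_{(1,2)} & \cdots & x_{(1,M)} \\ x_{(2,1)} & x_{(2,2)} & \cdots & x_{(2,M)} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(N,1)} & x_{(N,2)} & \cdots & x_{(N,M)} \end{bmatrix}$$
ii. Compute the mean of each class $\mu_j$ ($1 \times M$)
iii. Compute the total mean of all data $\mu$ ($1 \times M$)
iv. Calculate between-class matrix $S_B$ ($M \times M$) as in Equation 3.2 below
$$S_B = \sum_{j=1}^{c} n_j (\mu_j - \mu)^T (\mu_j - \mu) \qquad (3.2)$$
where $c$ is the number of classes and $n_j$ is the number of samples in class $\omega_j$
v. for all Class $j$, $j = 1, 2, \ldots, c$ do
where represents coefficients of signal s in an orthonormal basis
vi. Compute within-class matrix of each class $S_{W_j}$ ($M \times M$), as follows:
$$S_{W_j} = \sum_{x_i \in \omega_j} (x_i - \mu_j)^T (x_i - \mu_j)$$
vii. Construct a transformation matrix for each class $W_j$:
$$W_j = S_{W_j}^{-1} S_B$$
viii. The eigenvalues ($\lambda^j$) and eigenvectors ($V^j$) of each transformation matrix ($W_j$) are then calculated, where $\lambda^j$ and $V^j$ represent the calculated eigenvalues and eigenvectors of the jth class respectively.
ix. Sort the eigenvectors in descending order according to their corresponding eigenvalues. The first k eigenvectors are then used as a lower dimensional space for each class ($V_k^j$).
x. Project all original samples ($x_i \in \omega_j$) onto their lower dimensional space ($V_k^j$), as:
$$\Omega_j = x_i V_k^j, \quad x_i \in \omega_j$$
where $\Omega_j$ represents the projected samples of the class $\omega_j$
xi. end for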
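
The class-dependent procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `class_dependent_lda`, the synthetic two-class data, and the use of a pseudo-inverse (to guard against a singular within-class matrix) are all assumptions introduced here.

```python
import numpy as np

def class_dependent_lda(X, y, k):
    """Return {class label: (V_k, projected samples)} for class-dependent LDA."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mu = X.mean(axis=0)                          # total mean (1 x M)
    n_features = X.shape[1]

    # Between-class matrix: sum over classes of n_j (mu_j - mu)^T (mu_j - mu).
    S_B = np.zeros((n_features, n_features))
    for label in classes:
        Xc = X[y == label]
        d = (Xc.mean(axis=0) - mu)[None, :]      # row vector (1 x M)
        S_B += len(Xc) * (d.T @ d)

    spaces = {}
    for label in classes:
        Xc = X[y == label]
        D = Xc - Xc.mean(axis=0)
        S_W = D.T @ D                            # within-class matrix of this class
        W = np.linalg.pinv(S_W) @ S_B            # pinv is an assumption for stability
        eigvals, eigvecs = np.linalg.eig(W)
        order = np.argsort(eigvals.real)[::-1]   # descending eigenvalues
        V_k = eigvecs[:, order[:k]].real         # first k eigenvectors
        spaces[label.item()] = (V_k, Xc @ V_k)   # project the samples of this class
    return spaces

# Synthetic two-class data standing in for EEG feature rows (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 3)),
               rng.normal(4.0, 1.0, size=(20, 3))])
y = np.array([0] * 20 + [1] * 20)
spaces = class_dependent_lda(X, y, k=1)
```

Each class gets its own projection matrix because the within-class matrix is computed per class, which is the feature that distinguishes this variant from the more common class-independent LDA.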
D. Classification of decomposed feature using XGBoost and SVM.
[Figure: block diagram with the blocks Amplifier; Analogue to Digital Converter; Data pre-processing (Normalization, Feature Scaling); Feature Extraction Using LDA (feature dimensionality reduction, extraction and separation); Classification using XGBoost and SVM classifier; Classification Result]
Fig 3.1: Conceptual view of EEG classification system

i. Gradient Boost Algorithm
The steps for the classification of extracted features using the gradient boost algorithm are:
i. Input data $(x, y)_{i=1}^{N}$
ii. Select the number of iterations $M$
iii. Choice of the loss-function $\Psi(y, f)$
iv. Choice of the base-learner model $h(x, \theta)$
v. Initialize $\hat{f}_0$ with a constant
vi. for $t = 1$ to $M$ do
vii. Compute the negative gradient $g_t(x)$
viii. Fit a new base-learner function $h(x, \theta_t)$
ix. Find the best gradient descent step-size $\rho_t$:
$$\rho_t = \arg\min_{\rho} \sum_{i=1}^{N} \Psi\left[y_i, \hat{f}_{t-1}(x_i) + \rho\, h(x_i, \theta_t)\right] \qquad (3.6)$$
x. Update the function estimate:
$$\hat{f}_t \leftarrow \hat{f}_{t-1} + \rho_t\, h(x, \theta_t)$$
xi. end for
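
The boosting loop above can be made concrete with a small pure-Python sketch. It assumes a squared-error loss Ψ(y, f) = ½(y − f)², for which the negative gradient is simply the residual y − f, and uses one-dimensional regression stumps as the base learner. These choices, the function names and the toy data are illustrative assumptions; this is plain gradient boosting, not the regularized second-order variant implemented by the XGBoost library.

```python
def fit_stump(x, r):
    """Least-squares regression stump on 1-D inputs: (threshold, left mean, right mean)."""
    best = None
    for thr in sorted(set(x))[:-1]:              # keeps both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct input values")
    return best[1:]

def predict_stump(stump, xi):
    thr, lm, rm = stump
    return lm if xi <= thr else rm

def gradient_boost(x, y, n_iter):
    f0 = sum(y) / len(y)                          # step v: constant initial estimate
    f = [f0] * len(y)
    model = []
    for _ in range(n_iter):                       # step vi
        g = [yi - fi for yi, fi in zip(y, f)]     # step vii: negative gradient (residuals)
        stump = fit_stump(x, g)                   # step viii: fit base learner to g
        h = [predict_stump(stump, xi) for xi in x]
        denom = sum(hi * hi for hi in h)
        # step ix: closed-form minimizer of the squared loss along direction h
        rho = sum(gi * hi for gi, hi in zip(g, h)) / denom if denom else 0.0
        f = [fi + rho * hi for fi, hi in zip(f, h)]   # step x: update the estimate
        model.append((rho, stump))
    return f0, model

def predict(f0, model, xi):
    return f0 + sum(rho * predict_stump(stump, xi) for rho, stump in model)

# Hypothetical 1-D feature with a binary seizure label (1) / non-seizure label (0).
x = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7]
y = [0, 0, 0, 1, 1, 1]
f0, model = gradient_boost(x, y, n_iter=5)
labels = [1 if predict(f0, model, xi) >= 0.5 else 0 for xi in x]
```

Thresholding the boosted score at 0.5 turns the regression output into a class label; a classifier would more commonly use a logistic loss, which changes only the gradient and step-size computations.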
ii. Support Vector Machine for classification of extracted features
The Support Vector Machine was applied for classifying normal and seizure activity from the continuously recorded EEG
signals. Feature vectors are generated for both seizure and non-seizure activity. Parameters such as the area and mean
frequency of the components are estimated and given as input to the LDA-SVM.
The selected best global position of the LDA output, with the detected feature subset mapped by $\phi$, is modelled with the
optimized parameter C and the kernel parameter using Equation (3.8):
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \qquad (3.8)$$
such that
$$y_i\left(w^T \phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, N$$
The final classification was obtained using Equation 3.9:
$$f(x) = \operatorname{sign}\left(w^T \phi(x) + b\right) \qquad (3.9)$$
where N is the size of the dataset, C is the cost parameter, $\xi_i$ are the slack variables, $x_i$ is the ith feature vector
with class label $y_i$, $w$ is the weight vector and b is an offset scalar.
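
To make the roles of C, the slack variables and the offset b concrete, the sketch below evaluates the soft-margin objective and the sign decision rule for a given weight vector. It assumes a linear feature map (φ(x) = x) and labels in {−1, +1}; the weights, offset and sample points are hypothetical values chosen for illustration, not fitted parameters from the study.

```python
def decision_value(w, b, x):
    """Linear decision value w . x + b (assumes the feature map is the identity)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks), where slack_i = max(0, 1 - y_i (w . x_i + b))."""
    slacks = [max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y)]
    objective = 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)
    return objective, slacks

def classify(w, b, x):
    """Sign of the decision value: +1 for one class, -1 for the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical 2-D feature vectors with labels +1 (seizure) and -1 (non-seizure).
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [0.2, 0.1]]
y = [1, 1, -1, -1]
w, b, C = [1.0, 1.0], -0.5, 1.0

objective, slacks = soft_margin_objective(w, b, X, y, C)
predictions = [classify(w, b, xi) for xi in X]
```

Only the last point lies inside the margin, so it is the only one with a non-zero slack; increasing C would penalize that violation more heavily relative to the margin term ½‖w‖².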
iii. Implementation Phase
The implementation phase for the gradient boost algorithm and support vector machine is presented in Fig 3.2. The first stage
was the signal data acquisition from the human EEG, which was pre-processed using Standard Scaler, and the second stage was
feature selection and dimensionality reduction using Linear Discriminant Analysis. The extracted features were classified into
epileptic seizure and non-epileptic seizure using the XGBoost and SVM algorithms. This was done using Spyder version 3.2.4.
[Flowchart. Training branch: Start; Acquire Dataset for training; Normalize the Dataset; Apply Feature Scaling on Dataset; Apply LDA for feature extraction and dimension reduction; Store selected features into the Library. Testing branch: Load Test Data; Normalize the Dataset; Apply Feature Scaling on Dataset; Apply LDA for feature extraction and dimension reduction; Features extracted; Classify extracted features using XGBoost or SVM; Classification Result; Test another data (Yes/No); Stop]
Fig 3.2: Flowchart showing trained and tested EEG signal with XGBoost and SVM
4. RESULTS AND DISCUSSION
The average training time of XGBoost for five datasets (Z-O-N-F-S) of 20490 is 2.817 sec, while the average training time of
SVM for the same five datasets is 0.610 sec. The computation time for training after five trials reveals that the time spent
increases as the number of datasets increases, which implies that the time consumed depends on the features in the training set
for both the XGBoost model and the SVM model.
The classification results for XGBoost showed that the computation time increases with an increase in the number of datasets,
as illustrated by Table 4.2a. For two (2) datasets (Z-S), the model achieved a false positive rate of 0.33%, sensitivity of
99.68%, specificity of 99.45%, precision of 98.48%, accuracy of 99.06% and F-measure of 99.07% at a classification time of
0.01 sec. For three (3) datasets (O-N-S) of 12294x100 dimension, the XGBoost model achieved a false positive rate of 1.86%,
sensitivity of 92.52%, specificity of 98.14%, precision of 96.53%, accuracy of 96.12% and F-measure of 94.49% at a
classification time of 0.03 sec. Similarly, for five (5) datasets (Z-O-N-F-S), the XGBoost model has a false positive rate of
2.04%, sensitivity of 86.92%, specificity of 97.96%, precision of 91.55%, accuracy of 95.72% and F-measure of 89.17% at a
classification time of 0.11 sec.
The classification results for the support vector machine algorithm likewise showed that the computational time increases with
an increase in the number of datasets, as indicated by Table 4.2b. For two (2) datasets (Z-S), the SVM model achieved a false
positive rate of 1.47%, sensitivity of 99.03%, specificity of 98.53%, precision of 98.55%, accuracy of 98.78% and F-measure of
98.79% at a classification time of 0.004 s. For three (3) datasets (O-N-S) of 12294x100 dimension, the SVM model achieved a
false positive rate of 3.24%, sensitivity of 94.24%, specificity of 96.76%, precision of 93.47%, accuracy of 95.93% and
F-measure of 93.86% at a classification time of 0.042 s. Similarly, for five (5) datasets (Z-O-N-F-S), the SVM model has a
false positive rate of 3.54%, sensitivity of 88.99%, specificity of 96.46%, precision of 85.66%, accuracy of 95.02% and
F-measure of 87.29% at a classification time of 0.016 s.
The classification results for XGBoost and SVM show that XGBoost outperformed SVM but is more computationally
expensive in terms of classification time.
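
The metrics reported above all derive from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not the study's actual confusion matrices, which are not given in the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    return {
        "fpr": fp / (fp + tn),            # false positive rate = 1 - specificity
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts: 100 seizure segments and 100 non-seizure segments.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Under these definitions the false positive rate and the specificity always sum to 1, which is a quick consistency check for any row of reported results.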
The results show that there is significant variation in the performance metrics as the number of datasets increases, and the
best result is obtained using two (2) datasets (Z-S) across all metrics (false positive rate, specificity, precision,
sensitivity, F-score and accuracy) for both the XGBoost and SVM models. The optimum performance is achieved with the
classification scheme using two (2) datasets, i.e. Z-S, as presented in Table 4.3. Based on the performance of the two
techniques using five datasets (Z-O-N-F-S) as shown in Table 4.3, XGBoost has the optimum performance with respect to all
performance metrics. In terms of training time, SVM trains and classifies datasets faster than XGBoost, as illustrated in Fig 3.3.
Further statistical analysis was also conducted between XGBoost and SVM. A t-test value was measured between the FPR
of XGBoost and SVM. The paired t-test analysis conducted reveals that the result for XGBoost was statistically significant.
The mean difference and t-value being negative support the fact that the XGBoost model has a reduced False Positive Rate.
The t-test result validates the fact that XGBoost outperformed the SVM technique in terms of FPR. Therefore, the alternative
hypothesis, which states that the difference between the FPR of XGBoost and SVM is statistically significant, is accepted.
In view of the hypothesis tested, it is confirmed statistically that XGBoost outperformed SVM in terms of False Positive Rate.
Similarly, the paired t-test analysis conducted between the F-Measure of XGBoost and SVM reveals that there is not much
difference in the test result, with a mean difference of 1.264. Nevertheless, the result confirmed that XGBoost is
statistically significant. The t-test result validates the fact that XGBoost outperformed the SVM model in terms of F-Measure.
Therefore, the alternative hypothesis, which states that the difference between the F-Measures of XGBoost and SVM is
statistically significant, is accepted. In view of the hypothesis tested, it is confirmed statistically that XGBoost
outperformed SVM in terms of F-Measure.
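
For readers unfamiliar with the paired t-test, the sketch below shows how the mean difference and t-value are computed. As an assumption made purely for illustration, it pairs the three FPR values per classification scheme listed in Tables 4.2a and 4.2b; the authors' test was presumably run on per-trial data that is not included in the paper, so the numbers produced here are not expected to reproduce the reported statistics.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    d = [ai - bi for ai, bi in zip(a, b)]
    mean_diff = mean(d)
    std_err = stdev(d) / sqrt(len(d))     # sample standard deviation of differences
    return mean_diff, mean_diff / std_err, len(d) - 1

# FPR (%) for the Z-S, O-N-S and Z-O-N-F-S schemes, taken from the two tables.
fpr_xgb = [0.33, 1.86, 2.04]
fpr_svm = [1.47, 3.24, 3.54]
mean_diff, t_value, dof = paired_t(fpr_xgb, fpr_svm)
```

A negative mean difference and t-value here indicate that the first sample (XGBoost) has the lower FPR; the t-value would then be compared against the critical value for the given degrees of freedom to judge significance.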
Table 4.2a: Classification results for Gradient boost algorithm
Dataset     FPR (%)  Sensitivity (%)  Specificity (%)  Precision (%)  Accuracy (%)  F-Score (%)  Classification Time (s)
Z-S         0.33     99.68            99.45            98.48          99.06         99.07        0.01
O-N-S       1.86     92.52            98.14            96.53          96.12         94.49        0.03
Z-O-N-F-S   2.04     86.92            97.96            91.55          95.72         89.17        0.11
Table 4.2b: Classification results for Support vector machine algorithm
Dataset     FPR (%)  Sensitivity (%)  Specificity (%)  Precision (%)  Accuracy (%)  F-Score (%)  Classification Time (s)
Z-S         1.47     99.03            98.53            98.55          98.78         98.79        0.004
O-N-S       3.24     94.24            96.76            93.47          95.93         93.86        0.042
Z-O-N-F-S   3.54     88.99            96.46            85.66          95.02         87.29        0.16
[Chart: Time (y-axis, 0 to 3) for XGBoost and SVM across the data sets Z-S, O-N-S and Z-O-N-F-S (x-axis)]
Fig 3.3: Graph of average time for each technique.
5. CONCLUSION AND FUTURE WORKS
Based on the experimental results generated by this work, the XGBoost model produced better results than SVM in terms of
recognition accuracy, precision, specificity, sensitivity, FPR and F-measure. In view of this, an EEG signal classification
system based on the XGBoost model produced a more reliable seizure or seizure-free detection system than the SVM model.
Future work can investigate and evaluate the performance of a hybrid approach using the XGBoost model with other classifiers,
to determine their overall performance in terms of classification, recognition accuracy and average response time.
REFERENCES
[1] Tzallas, et al. (2012). Automatic seizure detection based on time-frequency analysis and artificial neural networks. Computational Intelligence and Neuroscience, 1-13.
[2] Tzallas, A. T., Tsipouras, M. G., and Fotiadis, D. I. (2009). Epileptic seizure detection in EEGs using time-frequency analysis. IEEE Trans Inf Technol Biomed, 13(5), 703-710.
[3] AlZubi, S., Islam, N., and Abbod, M. (2011). Multiresolution Analysis Using Wavelet, Ridgelet, and Curvelet Transforms for Medical Image Segmentation. Int J Biomed Imaging, 11(4), 1-4.
[4] Subasi, A., Alkan, A., Koklukaya, E., and Kiymik, M. K. (2005). Wavelet neural network classification of EEG signals by using AR model with MLE preprocessing. Neural Netw, 18(7), 985-997.
[5] Adeli, H., Ghosh-Dastidar, S., and Dadmehr, N. (2007). A wavelet-chaos methodology for analysis of EEGs and EEG subbands to detect seizure and epilepsy. IEEE Trans Biomed Eng, 54(2), 205-211.
[6] Ganesan, M., Sumesh, E. P., and Vidhyalavanya, R. (2010). Multi-Stage, Multi-Resolution Method for Automatic Characterization of Epileptic Spikes in EEG. International Journal of Signal Processing, Image Processing and Pattern Recognition, 3(2), 33-40.