
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 11 | Nov 2019 p-ISSN: 2395-0072 www.irjet.net

ELECTROENCEPHALOGRAM SIGNALS CLASSIFICATION USING GRADIENT BOOST ALGORITHM AND SUPPORT VECTOR MACHINE

Ajao, T.A1, Oyewole, A.O2, Ojo, O.S3, Amore, T.O4, Amusan, D.G5 and Olabode, A.O6

1,2,4Researcher, Federal Institute of Industrial Research Oshodi (FIIRO), Nigeria.
3,6Research Scholar, Ladoke Akintola University of Technology, Ogbomoso, Nigeria.
5E-Tutor, LAUTECH Open and Distance Learning Center, Ogbomoso, Nigeria.

Abstract - Automatic diagnosis of epileptic seizures from the electroencephalogram (EEG) has been an active research area in biomedical science. A significant number of EEG signal classification methods have been proposed in recent research; most achieve very promising performance but are characterized by a high false positive rate and are computationally intensive. This research carried out a comparative analysis of the performance of the Extreme Gradient Boost (XGBoost) algorithm and the Support Vector Machine (SVM) for the classification of epileptic seizures in the human EEG. The results revealed that the XGBoost algorithm outperformed the SVM model in the classification of EEG signals.

Keywords - Extreme Gradient Boost, Support Vector Machine, Electroencephalogram (EEG)

1. INTRODUCTION

In recent times, a lot of effort has been directed toward the application of computer analysis to the bio-electric signals of the human body. Several ill-health conditions can be detected by evaluating the electrical signals within the body. Important bio-electric signals in the human body include those responsible for the heartbeat, brain signals and those in the central nervous system.
The breakthrough in soft computing and artificial intelligence has improved the development of more effective classification and diagnostic techniques, as well as treatment methodologies (Tzallas, et al., 2012). Soft computing techniques have helped in extracting and classifying bio-signals such as the electromyogram (EMG), electroencephalogram (EEG), electrooculogram (EOG) and electrocardiogram (ECG) in order to detect or treat ailments. Different methods and techniques have been developed for detecting and classifying the EEG as either normal or epileptic. However, complete visual analysis of an EEG signal is very difficult; hence, an automated means of detection is essential.

Epilepsy is characterized by sudden, recurrent and transient disturbances of perception or behaviour resulting from excessive synchronization of cortical neuronal networks. Epileptic seizures are divided by their clinical manifestation into partial or focal, generalized, unilateral and unclassified seizures (Tzallas, Tsipouras, and Fotiadis, 2009).

The use of classification systems in medical diagnosis has increased significantly. There is no doubt that the evaluation of data taken from patients and the decisions of experts are the most important factors in diagnosis. Classification systems help to minimize errors that may be made by a fatigued or inexperienced physician. Automated diagnostic systems have been applied to a variety of medical data, such as electrocardiograms (ECGs), electromyograms (EMGs), electroencephalograms (EEGs), ultrasound signals/images, X-rays, and computed tomographic images (AlZubi, Islam and Abbod, 2011).

This research focuses on the comparative analysis of extreme gradient boost and support vector machine in the detection and classification of an EEG as either epileptic seizure or non-epileptic seizure, for effective management of patients suffering from epileptic seizures.
The rest of this paper is organized in the following sections: a review of related work on EEG signals, the methodology for XGBoost and SVM, followed by results and discussion. The final section concludes the paper along with some recommendations for future research.

© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal

2. REVIEWS ON ELECTROENCEPHALOGRAM

In recent times, a lot of effort has been directed toward the application of computer analysis to the bio-electric signals of the human body. Although several methods for brain function analysis, such as magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), have been introduced, the EEG signal is still a valuable tool for monitoring brain activity because of its relatively low cost and its convenience for the patient. The breakthrough in soft computing and artificial intelligence has improved the development of more effective classification and diagnostic techniques, as well as treatment methodologies (Tzallas, et al., 2012). Different methods and techniques have been developed for detecting and classifying the EEG as either normal or epileptic. However, complete visual analysis of an EEG signal is very difficult; hence, an automated means of detection is essential.

Subasi et al. (2005) presented a novel method for the analysis of EEG signals using the discrete wavelet transform, with classification using an ANN. In their work, the signal was decomposed into 5 levels using the Daubechies order 4 (DB4) wavelet filter. The energies of the details and the approximation were used as the input features.

Adeli et al. (2007) proposed a wavelet-chaos-neural network methodology for classification of electroencephalograms (EEGs) into healthy, ictal and interictal EEGs.
Wavelet analysis is used to decompose the EEG into the delta, theta, alpha, beta, and gamma sub-bands. Three parameters are used for EEG representation: standard deviation, correlation dimension, and largest Lyapunov exponent. The research was carried out in two phases, band-specific analysis and mixed-band analysis, with the intention of minimizing the computing time and output analysis. The results showed that all three key components of the wavelet-chaos-neural network methodology are significant for enhancing the EEG classification accuracy.

Ganesan, et al. (2010) proposed a technique for the automatic detection of spikes in long-term 18-channel human electroencephalograms (EEG) with a small data set. The scheme for detecting epileptic and non-epileptic spikes in the EEG is based on a multi-resolution, multi-level analysis and an Artificial Neural Network (ANN) approach.

In view of these, the results obtained from the various studies reveal a high false positive rate.

3. METHODOLOGY

The implementation tool used was the Python 3.6 software package (Spyder 3.5.1) on a Windows 10 Enterprise 64-bit operating system with a Core i5 CPU, 8GB RAM and a 500 Gigabyte hard disk drive. A t-test was used to further validate the performance of each of the techniques used. The Python programming language was used to implement the system because of its multiple programming paradigms and dynamic features in machine learning. The methodology involved five major steps, which are:

a. Data acquisition
b. Removal of artifacts
c. Extraction of features
d. Decomposition of extracted features
e. Classification of decomposed features using XGBoost and SVM

A. Acquisition of datasets

A publicly available dataset from the clinic of Bonn University was used for this study.
This data was recorded by means of a 128-channel, 12-bit EEG system at 173.5 samples per second. The total dataset, comprising 500 segments, was grouped into five sets. Each segment has a duration of 23.6 seconds. All sets were selected from EEG records after removing artifacts caused by eye and muscle movements. The EEG data acquired contain three different cases, namely: data for healthy people, epileptic people during the seizure-free interval (interictal) and epileptic people during the seizure interval (ictal). The dataset has five sets (Z, O, N, F, and S) that were used for training and testing XGBoost and SVM. Sets Z and O were obtained from healthy people with eyes open and closed, using external surface electrodes in a standardized electrode placement scheme. Sets N and F were obtained from epileptic people during the interictal interval. Set F was obtained from epileptogenic sections of the brain that represent the focal activity, while N was taken from the hippocampal formation of the brain and indicates non-focal interictal activity. Set S was obtained from an epileptic subject during the seizure interval. The bands that are clinically relevant are delta, theta, alpha, beta and gamma.

B. Removal of Artifacts

The EEG signal was normalized by rescaling the features so that the signal has the properties of a standard normal distribution with μ=0 and σ=1, where μ is the mean (average) and σ is the standard deviation from the mean.

C. Extraction of features using Linear Discriminant Analysis

Different approaches to the extraction of features from EEG signals have been considered in previous work, where these methods were usually used to explore the information in the EEG.
In this work, Linear Discriminant Analysis (LDA) was used for feature extraction and this involved ten stages. These are the steps for the extraction of features using Linear Discriminant Analysis:

i. Given a set of N samples $[x_i]_{i=1}^{N}$, each of which is represented as a row of length M as in Figure 2.4 (step (A)), the data matrix $X$ $(N \times M)$ is given by
$$X = \begin{bmatrix} x_{(1,1)} & x_{(1,2)} & \cdots & x_{(1,M)} \\ x_{(2,1)} & x_{(2,2)} & \cdots & x_{(2,M)} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(N,1)} & x_{(N,2)} & \cdots & x_{(N,M)} \end{bmatrix}$$
ii. Compute the mean of each class, $\mu_i$ $(1 \times M)$.
iii. Calculate the between-class matrix $S_B$ $(M \times M)$ as in Equation 3.2 below, where $n_i$ is the number of samples in class $i$, $c$ is the number of classes and $\mu$ is the mean of all samples:
$$S_B = \sum_{i=1}^{c} n_i (\mu_i - \mu)^T (\mu_i - \mu) \qquad (3.2)$$
iv. For all classes $i$, $i = 1, 2, \dots, c$, where the samples represent coefficients of signal $s$ in an orthonormal basis, do:
v. Compute the within-class matrix of each class, $S_{W_i}$ $(M \times M)$, as follows:
$$S_{W_i} = \sum_{x \in \omega_i} (x - \mu_i)^T (x - \mu_i)$$
vi. Construct a transformation matrix for each class, $W_i$:
$$W_i = S_{W_i}^{-1} S_B$$
vii. The eigenvalues $(\lambda^i)$ and eigenvectors $(V^i)$ of each transformation matrix $(W_i)$ are then calculated, where $\lambda^i$ and $V^i$ represent the calculated eigenvalues and eigenvectors of the ith class respectively.
viii. Sort the eigenvectors in descending order according to their corresponding eigenvalues.
ix. The first k eigenvectors are then used as a lower dimensional space for each class $(V_k^i)$.
x. Project the original samples of each class $(\omega_i)$ onto their lower dimensional space $(V_k^i)$, as:
$$\Omega_i = x V_k^i, \quad x \in \omega_i$$
where $\Omega_i$ represents the projected samples of the class $\omega_i$.
xi. end for

D. Classification of decomposed feature using XGBoost and SVM

i. Gradient Boost Algorithm

The steps for the classification of extracted features using the gradient boost algorithm are:

i. Input data $(x, y)_{i=1}^{N}$
ii. Select the number of iterations $M$
iii. Choose the loss function $\Psi(y, f)$
iv. Choose the base-learner model $h(x, \theta)$
v. Initialize $\hat{f}_0$ with a constant
vi. for $t = 1$ to $M$ do
vii. Compute the negative gradient $g_t(x)$
viii. Fit a new base-learner function $h(x, \theta_t)$
ix. Find the best gradient descent step-size $\rho_t$:
$$\rho_t = \arg\min_{\rho} \sum_{i=1}^{N} \Psi\left[y_i, \hat{f}_{t-1}(x_i) + \rho\, h(x_i, \theta_t)\right] \qquad (3.6)$$
x. Update the function estimate:
$$\hat{f}_t \leftarrow \hat{f}_{t-1} + \rho_t\, h(x, \theta_t)$$
xi. end for

Fig 3.1: Conceptual view of EEG classification system (blocks: amplifier; analogue to digital converter; data pre-processing with normalization and feature scaling; feature extraction using LDA for feature dimensionality reduction, extraction and separation; classification using XGBoost and SVM classifier; classification result)

ii. Support Vector Machine for classification of extracted features

The Support Vector Machine was applied for classifying normal and seizure activity from the continuously recorded EEG signals. Feature vectors are generated for both seizure and non-seizure activity. Parameters like area and mean frequency of the components are estimated and given as input to the LDA-SVM. The selected best global position of the LDA output, with the detected feature subset, was mapped using the optimized SVM parameters (including C) and modelled with Equation (3.8):
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{such that} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \qquad (3.8)$$
The final classification was obtained using Equation (3.9):
$$f(x) = \operatorname{sign}(w^T x + b) \qquad (3.9)$$
where N is the size of the dataset, C is the cost parameter, $\xi_i$ are the slack variables, x is the input feature vector, w is the weight vector and b is an offset scalar.

iii. Implementation Phase

The implementation phase for the gradient boost algorithm and support vector machine is presented in Fig 3.2. The first stage was the acquisition of signal data from the human EEG, which was pre-processed using Standard Scaler. The second stage was feature selection and dimensionality reduction using Linear Discriminant Analysis. The extracted features were classified into epileptic seizure and non-epileptic seizure using the XGBoost and SVM algorithms.
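The gradient boosting loop listed in Section 3 D(i) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation and not XGBoost itself (which adds regularization and second-order information): the loss is assumed to be squared error, the base learner is assumed to be a one-split regression stump, and the one-dimensional feature values and labels are hypothetical toy data standing in for an LDA projection.

```python
from statistics import mean

def fit_stump(xs, targets):
    """Least-squares fit of a one-split regression stump h(x, theta) (assumed base learner)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # all feature values identical: fall back to a constant
        constant = mean(targets)
        return lambda x: constant
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value

def gradient_boost(xs, ys, n_iterations=20):
    """Gradient boosting with squared-error loss (an assumption made for this sketch)."""
    f0 = mean(ys)                         # initialize the estimate with a constant
    predictions = [f0] * len(xs)
    learners = []                         # (step size, base learner) pairs
    for _ in range(n_iterations):
        # negative gradient of 1/2 * (y - f)^2 with respect to f is the residual y - f
        gradients = [y - p for y, p in zip(ys, predictions)]
        h = fit_stump(xs, gradients)      # fit a new base learner to the negative gradient
        h_values = [h(x) for x in xs]
        # closed-form line search for squared error: rho = <g, h> / <h, h>
        denom = sum(v * v for v in h_values)
        rho = sum(g * v for g, v in zip(gradients, h_values)) / denom if denom else 0.0
        predictions = [p + rho * v for p, v in zip(predictions, h_values)]  # update estimate
        learners.append((rho, h))

    def model(x):
        return f0 + sum(rho * h(x) for rho, h in learners)
    return model

# Hypothetical toy data: one feature per segment, label 1.0 = seizure, 0.0 = non-seizure.
xs = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7, 2.0, 2.4]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
model = gradient_boost(xs, ys)
predicted_labels = [1 if model(x) >= 0.5 else 0 for x in (0.3, 1.5)]
print(predicted_labels)
```

With squared error the step size has a closed form, which keeps the sketch short; for a general loss the step size would be found by a numerical line search.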
The implementation was carried out in Spyder version 3.2.4.

Fig 3.2: Flowchart showing trained and tested EEG signal with XGBoost and SVM (training path: start, acquire dataset for training, normalize the dataset, apply feature scaling on the dataset, apply LDA for feature extraction and dimension reduction, store selected features into the library; testing path: load test data, normalize the dataset, apply feature scaling on the dataset, apply LDA for feature extraction and dimension reduction, classify extracted features using XGBoost or SVM, classification result; then test another data or stop)

4. RESULTS AND DISCUSSION

The average training time for XGBoost on the five datasets (Z-O-N-F-S) of size 20490 is 2.817sec, while the average training time for SVM on the same five datasets is 0.610sec. The computation time for training after five trials reveals that the time spent increases as the number of datasets increases, which implies that the time consumed depends on the features in the training set for both the XGBoost model and the SVM model.

The classification results for XGBoost showed that the computation time increases with an increase in the number of datasets, as illustrated by Table 4.2a. For two (2) datasets (Z-S), the model achieved a false positive rate of 0.33%, sensitivity of 99.68%, specificity of 99.45%, precision of 98.48%, an accuracy of 99.06% and F-measure of 99.07% at a classification time of 0.01sec.
For three (3) datasets (O-N-S) of 12294x100 dimension, the XGBoost model achieved a false positive rate of 1.86%, sensitivity of 92.52%, specificity of 98.14%, precision of 96.53%, accuracy of 96.12% and F-measure of 94.49% at a classification time of 0.03sec. Similarly, for five (5) datasets (Z-O-N-F-S) the XGBoost model has a false positive rate of 2.04%, sensitivity of 86.92%, specificity of 97.96%, precision of 91.55%, accuracy of 95.72% and F-measure of 89.17% at a classification time of 0.11sec.

The classification results for the support vector machine algorithm also showed that the computational time increases with an increase in the number of datasets, as indicated by Table 4.2b. For two (2) datasets (Z-S), the SVM model achieved a false positive rate of 1.47%, sensitivity of 99.03%, specificity of 98.53%, precision of 98.55%, accuracy of 98.78% and F-measure of 98.79% at a classification time of 0.004s. For three (3) datasets (O-N-S) of 12294x100 dimension, the SVM model achieved a false positive rate of 3.24%, sensitivity of 94.24%, specificity of 96.76%, precision of 93.47%, accuracy of 95.93% and F-measure of 93.86% at a classification time of 0.042s. Similarly, for five (5) datasets (Z-O-N-F-S) the SVM model has a false positive rate of 3.54%, sensitivity of 88.99%, specificity of 96.46%, precision of 85.66%, accuracy of 95.02% and F-measure of 87.29% at a classification time of 0.016s.

The classification results for XGBoost and SVM show that XGBoost outperformed the SVM but is computationally more expensive in terms of classification time. The results show that there is significant variation in the performance metrics as the number of datasets increases, and the best result is obtained using two (2) datasets (Z-S) across all metrics (false positive rate, specificity, precision, sensitivity, F-score and accuracy) for both the XGBoost and SVM models. The optimum performance is achieved with the classification scheme using two (2) datasets, i.e. Z-S, as presented in Table 4.3.
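The metrics reported above (false positive rate, sensitivity, specificity, precision, accuracy and F-measure) are conventionally derived from the four counts of a binary confusion matrix. The sketch below shows those standard definitions; the paper does not list its confusion-matrix counts, so the counts used here are hypothetical and the function name is chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics, returned as percentages."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)                  # equals 1 - specificity
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "fpr": 100 * fpr,
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        "precision": 100 * precision,
        "accuracy": 100 * accuracy,
        "f_measure": 100 * f_measure,
    }

# Hypothetical counts: 100 seizure segments and 100 non-seizure segments.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Under these definitions the false positive rate and the specificity always sum to 100%, which is a quick consistency check when reading a results table.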
Based on the performance of the two techniques using five datasets (Z-O-N-F-S) as shown in Table 4.3, XGBoost performs better on most performance metrics (false positive rate, specificity, precision, accuracy and F-measure), while SVM has the higher sensitivity. In terms of training time, SVM trains and classifies datasets faster than XGBoost, as illustrated in Fig 3.3.

Further statistical analysis was also conducted between XGBoost and SVM. A paired t-test was carried out between the FPR of XGBoost and SVM. The analysis reveals that the difference was statistically significant. The negative mean difference and t-value indicate that the XGBoost model has a reduced False Positive Rate. The t-test result supports the finding that XGBoost outperformed the SVM technique in terms of FPR. Therefore, the alternative hypothesis, which states that the difference between the FPR of XGBoost and SVM is statistically significant, is accepted.

Similarly, the paired t-test conducted between the F-Measure of XGBoost and SVM reveals only a small difference, with a mean difference of 1.264. Nevertheless, the result confirmed that the difference is statistically significant. The t-test result supports the finding that XGBoost outperformed the SVM model in terms of F-Measure. Therefore, the alternative hypothesis, which states that the difference between the F-Measures of XGBoost and SVM is statistically significant, is accepted.
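A paired t-test of the kind described above can be sketched as follows. The paper does not state which samples were paired, so this example pairs the three FPR values per dataset combination from the classification results purely as illustrative input; it is not expected to reproduce the authors' reported statistics. The function name is hypothetical, and the p-value lookup is omitted because it needs a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / sqrt(n)      # sample standard deviation of the differences
    return mean_diff, mean_diff / std_err, n - 1

# FPR (%) for Z-S, O-N-S and Z-O-N-F-S, used here only as illustrative paired input.
xgb_fpr = [0.33, 1.86, 2.04]
svm_fpr = [1.47, 3.24, 3.54]

mean_diff, t_value, dof = paired_t_statistic(xgb_fpr, svm_fpr)
print(f"mean difference = {mean_diff:.3f}, t = {t_value:.3f}, dof = {dof}")
```

A negative mean difference and t statistic here mean the first sample (XGBoost) has the lower FPR, which is the direction of effect the text describes.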
Table 4.2a: Classification results for Gradient boost algorithm

Dataset     FPR (%)   Sensitivity (%)   Specificity (%)   Precision (%)   Accuracy (%)   F-Score (%)   Classification Time
Z-S         0.33      99.68             99.45             98.48           99.06          99.07         0.01
O-N-S       1.86      92.52             98.14             96.53           96.12          94.49         0.03
Z-O-N-F-S   2.04      86.92             97.96             91.55           95.72          89.17         0.11

Table 4.2b: Classification results for Support vector machine algorithm

Dataset     FPR (%)   Sensitivity (%)   Specificity (%)   Precision (%)   Accuracy (%)   F-Score (%)   Classification Time
Z-S         1.47      99.03             98.53             98.55           98.78          98.79         0.004
O-N-S       3.24      94.24             96.76             93.47           95.93          93.86         0.042
Z-O-N-F-S   3.54      88.99             96.46             85.66           95.02          87.29         0.16

Fig 3.3: Graph of average time for each technique (x-axis: data sets Z-S, O-N-S, Z-O-N-F-S; y-axis: time, 0 to 3; series: XGBoost, SVM)

5. CONCLUSION AND FUTURE WORKS

The experimental results of this work reveal that the XGBoost model produced better results than SVM in terms of recognition accuracy, precision, specificity, sensitivity, FPR and F-measure. In view of this, an EEG signal classification system based on the XGBoost model produced a more reliable seizure or seizure-free detection system than the SVM model. Future work can investigate and evaluate the performance of a hybrid approach that combines the XGBoost model with other classifiers, to determine their overall performance in terms of classification, recognition accuracy and average response time.

REFERENCES

[1] Tzallas, et al. (2012). Automatic seizure detection based on time-frequency analysis and artificial neural networks. Computational Intelligence and Neuroscience, 1-13.
[2] Tzallas, A. T., Tsipouras, M. G., and Fotiadis, D. I. (2009). Epileptic seizure detection in EEGs using time-frequency analysis. IEEE Trans Inf Technol Biomed, 13(5), 703-710.
[3] AlZubi, S., Islam, N., and Abbod, M. (2011). Multiresolution Analysis Using Wavelet, Ridgelet, and Curvelet Transforms for Medical Image Segmentation. Int J Biomed Imaging, 11(4), 1-4.
[4] Subasi, A., Alkan, A., Koklukaya, E., and Kiymik, M. K. (2005). Wavelet neural network classification of EEG signals by using AR model with MLE preprocessing. Neural Netw, 18(7), 985-997.
[5] Adeli, H., Ghosh-Dastidar, S., and Dadmehr, N. (2007). A wavelet-chaos methodology for analysis of EEGs and EEG subbands to detect seizure and epilepsy. IEEE Trans Biomed Eng, 54(2), 205-211.
[6] Ganesan, M., Sumesh, E. P., and Vidhyalavanya, R. (2010). Multi-Stage, Multi-Resolution Method for Automatic Characterization of Epileptic Spikes in EEG. International Journal of Signal Processing, Image Processing and Pattern Recognition, 3(2), 33-40.