Uploaded by Colton Chen

1 Sample Inquiry Project Report

Epileptic Seizure Classification of EEGs using Time-Frequency Analysis by Support Vector Machine
INTRODUCTION
An epileptic seizure is a transient occurrence of signs or symptoms due to abnormal excessive or
synchronous neuronal activity in the brain. Epilepsy is a disease characterized by an enduring
predisposition to generate epileptic seizures and by the neurobiological, cognitive, psychological,
and social consequences of this condition (Fig. 1). Each year, about 150,000 Americans are diagnosed
with this central nervous system disorder. Over a lifetime, 1 in 26 people in the U.S. will be
diagnosed with the disease. Doctors usually prescribe medication for these patients, but the drugs
can have side effects such as thinning bones, trouble remembering things, weight gain, and dizziness.
Fig. 1
Automatically diagnosing epileptic seizures from EEG signals, in order to improve diagnostic
efficiency, has become a hot topic around the world. EEG is a recording of the electrical activity of
the brain from the scalp. The recorded waveforms reflect the cortical electrical activity, and the
shape of the wave may contain useful information about the state of the brain. However, a human
observer cannot directly monitor these subtle details, so various methods have been developed to
analyze EEG signals. Time-frequency analysis, for instance, is used in two ways: non-parametric and
parametric estimation. Non-parametric approaches include the Wigner-Ville distribution (WVD), the
short-time Fourier transform (STFT), and the continuous wavelet transform (CWT). They represent a
non-stationary signal as a simultaneous function of time and frequency, and wavelet transform
approaches in particular are commonly employed, although they face a trade-off between time and
frequency resolution. Parametric methods, on the other hand, express the signal in parameterized form
using a time-dependent autoregressive modeling approach.
One of the most popular parametric models is the time-varying autoregressive (TVAR) model, which is
an efficient tool for describing the dynamics of non-stationary signals. It is widely applied to
biomedical signals because of its simplicity and effectiveness. TVAR models have been used to analyze
non-stationary physiological signals in tasks including simulation, spectral estimation,
classification, diagnosis, and synchronization. The time-frequency distribution of a non-stationary
series can be estimated from the coefficients of the TVAR model. For example, such time-frequency
distributions have been used successfully to detect clinical events from intracranial pressure and to
identify EEG oscillation activities.
In this paper, a TVAR modeling approach using multi-scale basis functions is introduced. The method
represents a signal using a basis set that is localized in time and frequency, and it involves three
steps. First, the time-varying coefficients are approximated by a finite number of multi-scale basis
functions, which are well suited to general non-linear and non-stationary signals, and the power
spectrum density (PSD) of every EEG segment is obtained. A modified particle swarm optimization
(MPSO) algorithm is applied to search for the optimal parameters of the multi-scale radial basis
functions (MRBF). The PSD describes the energy density of the signal simultaneously in time and
frequency. Second, fractional-energy features are extracted from the PSD; these features represent
the evolution of the signal's energy over time. Finally, the features are fed into an RBF-based
support vector machine (SVM) classifier. The widely used recursive least squares (RLS) signal
processing technique is also applied for comparison with the MRBF-MPSO-SVM method, to illustrate the
advantages of the proposed method. The method, results, and conclusion are presented in the following
sections.
METHODS
A. Identification of the TVAR Model
In the proposed method, we apply a time-varying autoregressive model to describe the dynamic behavior
of the EEG signal. The pth-order TVAR model is written as:
$$y(t) = \sum_{m=1}^{p} a_m(t)\, y(t-m) + e(t) \tag{1}$$
In this formula, t (t = 1, 2, 3, …, N, where N is the total number of samples) is the time instant
or sampling index of the signal y(t), p is the order of the TVAR model, a_m(t) are the TVAR
coefficients, y(t−m) are the delayed samples of the signal, and e(t) is assumed to be an independent,
identically distributed noise sequence with zero mean and variance σ² (i.i.d. ~ (0, σ²)).
Generally, the TVAR parameters can be identified by expanding them onto a set of basis functions
π_l(t) for l = 1, 2, …, L, so that the TVAR coefficients a_m(t) can be expressed as:
$$a_m(t) = \sum_{l=1}^{L} c_{m,l}\, \pi_l(t) \tag{2}$$
In equation (2), c_{m,l} are the expansion parameters. Substituting equation (2) into equation (1)
gives equation (3):
$$y(t) = \sum_{m=1}^{p} \sum_{l=1}^{L} c_{m,l}\, \pi_l(t)\, y(t-m) + e(t) \tag{3}$$
Once a suitable set of basis functions is chosen, we can define new variables:
$$y_l(t-m) = \pi_l(t)\, y(t-m) \tag{4}$$
Substituting equation (4) into equation (3) gives equation (5):
$$y(t) = \sum_{m=1}^{p} \sum_{l=1}^{L} c_{m,l}\, y_l(t-m) + e(t) \tag{5}$$
Equation (5) shows that the TVAR model reduces to a time-invariant regression model, in which the
parameters c_{m,l} are not functions of time. These time-invariant coefficients can be estimated with
the least squares algorithm.
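The basis-expansion step can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function name fit_tvar_basis, the array layout, and the use of NumPy's ordinary least squares solver are choices made here for illustration, not details taken from the paper. It builds the regressors π_l(t)·y(t−m), solves for the constant parameters c_{m,l}, and then rebuilds the time-varying coefficients a_m(t).

```python
import numpy as np

def fit_tvar_basis(y, p, basis):
    """Estimate c[m, l] in y(t) = sum_m sum_l c[m, l] * pi_l(t) * y(t-m) + e(t).

    y     : 1-D signal of length N
    p     : model order
    basis : array of shape (L, N); row l holds pi_l(t) sampled at each t
    Returns (c, a) where c has shape (p, L) and a[m-1, t] is a_m(t).
    """
    y = np.asarray(y, dtype=float)
    basis = np.asarray(basis, dtype=float)
    L, N = basis.shape
    rows = np.arange(p, N)  # time indices that have p past samples
    # One regressor column per (m, l) pair: pi_l(t) * y(t - m).
    cols = [basis[l, rows] * y[rows - m]
            for m in range(1, p + 1) for l in range(L)]
    X = np.column_stack(cols)
    c, *_ = np.linalg.lstsq(X, y[rows], rcond=None)
    c = c.reshape(p, L)
    a = c @ basis  # a_m(t) = sum_l c[m, l] * pi_l(t)
    return c, a
```

Because the unknowns c_{m,l} do not depend on time, a single linear solve over all samples is enough; the time variation is carried entirely by the basis functions.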
B. Determine Scales of Radial Basis Function by Modified Particle Swarm Optimization
The radial basis function (RBF) can be interpreted as a three-layer neural network model with an
N-dimensional input vector x = (x1, x2, ···, xN) that is broadcast to each neuron. The output of each
neuron depends on the distance between the input vector and the vector c, which is defined as the
center of the RBF. Although a conventional single-scale RBF (SRBF) is easy to construct, it lacks
good generalization properties. A set of multi-scale RBFs (MRBF), by contrast, involves a number of
different basis functions, each with multiple scale parameters or kernel widths, so it gives good
local and global generalization performance. A multi-scale Gaussian kernel, which is infinitely
smooth and has good approximation properties, is usually chosen as the kernel function:
(6)
where c = [c1, c2, ···, cM] are the location or translation parameters that determine the kernel
positions, σi² are the scale or dilation parameters that determine the kernel widths, M is the
dimension of the RBF, and ∥·∥ denotes the Euclidean norm.
The next step is to determine the unknown location and scale parameters when the time-varying model
coefficients are expanded by the MRBF. To ensure the estimation accuracy of the time-varying
parameters, the kernel positions of the RBFs are distributed uniformly over the observation interval
as follows:
(7)
Where ci is the position (or center) of the ith RBF, M is the dimension of RBF, and N is the length
of observational samples.
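As an illustration of the kernel construction, the Python sketch below builds a bank of Gaussian kernels over the sample index t = 1, …, N. The exact forms of equations (6) and (7) are not reproduced in this text, so the kernel exp(−(t − c)²/(2σ²)), the evenly spaced centers from np.linspace, and the name gaussian_rbf_basis are all assumptions made for the example.

```python
import numpy as np

def gaussian_rbf_basis(N, M, sigmas):
    """Return an array of shape (M * len(sigmas), N) of Gaussian kernels.

    N      : number of samples (kernels are evaluated at t = 1..N)
    M      : number of kernel centers per scale
    sigmas : iterable of kernel widths, one family of kernels per width
    """
    t = np.arange(1, N + 1, dtype=float)
    centers = np.linspace(1, N, M)  # assumed uniform placement of c_i
    rows = []
    for sigma in sigmas:            # one family of kernels per scale
        for c in centers:
            rows.append(np.exp(-((t - c) ** 2) / (2.0 * sigma ** 2)))
    return np.array(rows)
```

Using several widths at each center is what makes the basis multi-scale: narrow kernels follow fast changes in the coefficients, while wide kernels capture slow trends.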
The scale parameters σi² of the MRBF are determined automatically by a modified particle swarm
optimization (MPSO) algorithm. The particles search the whole space for the optimum solution, and
each particle is influenced by its own best previous experience (pbest) and by the best experience
of all other members (gbest), which are also called the cognition part and the social part,
respectively.
Assume that C(h)best is the best previous position encountered by the ith particle, g(h)best is the
global best position, and h is the iteration counter. The current velocity vi(h) and position ui(h)
of the ith particle at iteration h are defined as:
(8)
(9)
Here ω is the inertia weight that controls the exploration degree of the search, β is a uniform
random scalar between 0 and 1, and l1 and l2 are acceleration coefficients that influence the
divergence of each particle at each iteration.
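The roles of ω, β, l1, and l2 can be seen in a standard global-best PSO, sketched below in Python. This is not the modified variant used in the paper, whose changes are not spelled out in this text. The function name pso_minimize, the default parameter values, the separate random scalars for the two attraction terms, and the clipping of positions to the search bounds are assumptions made for the example.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=30, n_iter=100,
                 omega=0.7, l1=1.5, l2=1.5, bounds=(0.1, 10.0), seed=0):
    """Minimize cost(x) over a box with a plain global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    u = rng.uniform(lo, hi, size=(n_particles, dim))  # positions u_i
    v = np.zeros_like(u)                              # velocities v_i
    pbest = u.copy()                                  # personal best positions
    pbest_cost = np.array([cost(x) for x in u])
    g = pbest[np.argmin(pbest_cost)].copy()           # global best position
    for _ in range(n_iter):
        b1 = rng.random((n_particles, 1))             # uniform in [0, 1)
        b2 = rng.random((n_particles, 1))
        # inertia + cognition part + social part
        v = omega * v + l1 * b1 * (pbest - u) + l2 * b2 * (g - u)
        u = np.clip(u + v, lo, hi)
        costs = np.array([cost(x) for x in u])
        improved = costs < pbest_cost
        pbest[improved] = u[improved]
        pbest_cost[improved] = costs[improved]
        g = pbest[np.argmin(pbest_cost)].copy()
    return g, float(pbest_cost.min())
```

In the setting described here, each particle position would hold a candidate set of kernel scales and cost would be a model-fit criterion for the resulting TVAR model; the choice of that criterion is left open in this sketch.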
C. Time-frequency Analysis based on Power Spectrum Density
The proposed MRBF-MPSO modeling method provides a high-resolution time-frequency representation of
non-stationary time series, in which good time resolution and good frequency resolution are achieved
simultaneously by using radial basis functions of multiple scales. Once the time-varying parameters
of the TVAR model have been estimated, the PSD can be obtained directly from the TVAR coefficients.
The PSD is derived from the TVAR model as:
(10)
In this equation, θ̂i(t) is the estimated TVAR parameter at time t, j = √−1, fs is the sampling
frequency, and δ̂e² is the variance of the estimated residual.
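A time-varying spectrum of this kind can be computed as in the Python sketch below. It assumes the usual parametric AR spectrum, P(t, f) = σe² / |1 − Σi ai(t)·exp(−j2πif/fs)|², evaluated separately at every time index. Equation (10) itself is not reproduced in this text, so this form, any scaling convention, and the name tvar_psd are assumptions rather than the paper's exact definition.

```python
import numpy as np

def tvar_psd(a, sigma2_e, fs, freqs):
    """Time-varying AR spectrum.

    a        : array of shape (p, N), a[i-1, t] is the ith coefficient at time t
    sigma2_e : variance of the estimated residual
    fs       : sampling frequency in Hz
    freqs    : frequencies in Hz at which to evaluate the spectrum
    Returns an array of shape (len(freqs), N).
    """
    a = np.asarray(a, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    p = a.shape[0]
    lags = np.arange(1, p + 1)
    # E[f, i-1] = exp(-j * 2*pi * i * f / fs)
    E = np.exp(-2j * np.pi * np.outer(freqs, lags) / fs)
    denom = np.abs(1.0 - E @ a) ** 2
    return sigma2_e / denom
```

With all coefficients equal to zero the result is flat at sigma2_e, as expected for white noise, and a pair of coefficients describing a narrow resonance produces a peak near the resonance frequency.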
The PSD computed with the spectral formula in equation (10) is used to extract time-frequency
features of the non-stationary time series. A grid division, based on medical knowledge, is applied
to extract the energy distribution features of the EEG signals in both the time and frequency
domains.
Five frequency sub-bands are chosen according to clinical interest in EEG: delta (0-4 Hz), theta
(4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-50 Hz). Three equal-sized time windows are
also selected in this paper. Fig. 2 presents a PSD distribution with the dotted grid used for feature
extraction. In this figure, each feature F(m, k) is calculated as follows:
(11)
In this equation, tm is the mth time window: t1 covers 0 to 7.87 s, t2 covers 7.87 to 15.73 s, and
t3 covers 15.73 to 23.6 s. fk is the kth frequency sub-band, i.e., f1 (delta), f2 (theta), f3
(alpha), f4 (beta), and f5 (gamma). Each feature represents the signal's fractional energy in a
specific time window and sub-band, so the overall feature set describes the energy distribution of
the signal on the time-frequency plane. For the detection of epileptic seizures from EEG signals,
only the frequency components from 0 Hz to 50 Hz are considered, because this is the range of
clinical interest. Following the work of Tzallas et al., we choose three time windows and five
frequency sub-bands in this study, which yields 3×5=15 features. The total energy of the signal is
also calculated as an additional feature. Consequently, the total number of features in each feature
set is 3×5+1=16; in other words, each EEG segment is described by a 16-dimensional feature vector.
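The grid-based feature extraction can be sketched in Python as follows. The sketch assumes that each feature is the PSD summed over one time window and one sub-band, divided by the total PSD energy between 0 and 50 Hz, and that sub-bands are half-open intervals [low, high). Equation (11) is not reproduced in this text, so this normalization, the interval convention, and the name psd_grid_features are assumptions.

```python
import numpy as np

# delta, theta, alpha, beta, gamma sub-bands in Hz
BANDS = [(0, 4), (4, 8), (8, 12), (12, 30), (30, 50)]

def psd_grid_features(psd, freqs, n_windows=3, bands=BANDS):
    """Return 15 fractional energies plus the total energy (16 values).

    psd   : array of shape (len(freqs), N), PSD over frequency and time
    freqs : frequencies in Hz for the rows of psd
    """
    psd = np.asarray(psd, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    in_range = (freqs >= bands[0][0]) & (freqs < bands[-1][1])
    total = psd[in_range].sum()
    # Split the time axis into (nearly) equal-sized windows.
    windows = np.array_split(np.arange(psd.shape[1]), n_windows)
    feats = []
    for w in windows:
        for lo, hi in bands:
            mask = (freqs >= lo) & (freqs < hi)
            feats.append(psd[mask][:, w].sum() / total)
    feats.append(total)  # total energy as the additional feature
    return np.array(feats)
```

Under these assumptions the first 15 values sum to one, so they describe how the energy is distributed over the grid, while the last value keeps the overall energy level that the normalization would otherwise discard.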
D. Classification and Performance Evaluation
The SVM classifier has good generalization properties, so the ability of the features extracted from
the PSD to separate normal and seizure EEG signals is evaluated with an RBF-kernel support vector
machine (RBF-SVM) from the LIBSVM library. The basic idea of the SVM is to construct a hyperplane
that maximizes the margin between positive and negative samples. A grid search is additionally used
to select the SVM parameters.
Sensitivity (SEN), specificity (SPE), and accuracy (ACC) are three important measures of
classification performance, which are defined below:
$$\mathrm{SEN} = \frac{TP}{TP + FN} \times 100\% \tag{12}$$
$$\mathrm{SPE} = \frac{TN}{TN + FP} \times 100\% \tag{13}$$
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{14}$$
In these equations, TP and TN are the total numbers of correctly detected normal events and seizure
events, respectively, and FP and FN are the total numbers of events erroneously detected as normal
and as seizure, respectively.
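The three measures can be computed directly from the four counts. The Python sketch below assumes the usual definitions SEN = TP/(TP+FN), SPE = TN/(TN+FP), and ACC = (TP+TN)/(TP+TN+FP+FN), expressed in percent; the function name classification_metrics and the example counts are illustrative only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (SEN, SPE, ACC) in percent from the four confusion counts."""
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (tn + fp)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sen, spe, acc

# Hypothetical counts, chosen only to show the call.
sen, spe, acc = classification_metrics(tp=9, tn=8, fp=2, fn=1)
print(sen, spe, acc)
```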
RESULTS
The EEG dataset we used is from Bonn University and consists of five subsets. In this study we
employed only three of them, denoted Z, F, and S. Subset Z contains signals from healthy volunteers
with eyes open. Subsets F and S contain signals from epileptic patients: subset F was recorded
during seizure-free intervals from three patients, while subset S contains seizure activity recorded
from all sites exhibiting ictal activity. Each subset contains 100 single-channel EEG recordings,
each 23.6 seconds long.
Two classification tasks are considered in order to evaluate the performance of the proposed method
on the data set described above:
Task 1: distinguish the normal class from the seizure class. The normal class consists of subset Z
EEG segments, while the seizure class consists of subset S EEG segments.
Task 2: detect the seizure class (subset S) in the presence of seizure-free epochs (subset F), which
is a harder task than Task 1.
The EEG segments are analyzed by two time-frequency analysis methods: the conventional adaptive RLS
method and our proposed MSRBF-based method. The figure below shows three typical EEG segments (S, Z,
and F) and their PSDs, where the color maps of the PSDs are in dB scale.
Fig. 3 The PSD distribution calculated from three typical EEG segments (S, Z and F)
From Fig. 3, we can see that, because data set S was recorded from all sites exhibiting seizure
activity, the PSD of EEG data set S is clearly higher than that of the other two data sets, Z and F.
Compared with the proposed method, the traditional RLS parametric estimation method gives poor
time-frequency resolution because of the slow convergence, or tracking lag, of its time-varying
parameter estimates. The MSRBF method, in contrast, is a basis function expansion method that can
rapidly capture transient changes in a time-varying system, and it therefore gives higher
time-frequency resolution. To further evaluate the two time-frequency analysis methods in terms of
accuracy, sensitivity, and specificity, the 16-dimensional PSD feature vector of each EEG segment
from the three data sets (S, Z, and F) is extracted for epileptic seizure classification.
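For reference, the tracking behavior of the RLS baseline can be illustrated with a textbook exponentially weighted RLS estimator of AR coefficients, sketched below in Python. The forgetting factor lam, the initialization constant delta, and the name rls_tvar are assumptions; the RLS settings used in the experiments are not stated in this text.

```python
import numpy as np

def rls_tvar(y, p, lam=0.98, delta=100.0):
    """Track AR(p) coefficients sample by sample with RLS.

    y     : 1-D signal
    p     : model order
    lam   : forgetting factor in (0, 1]; smaller values forget faster
    delta : scale of the initial inverse correlation matrix
    Returns an array of shape (p, N) with the estimate at every sample.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    theta = np.zeros(p)        # current coefficient estimate
    P = delta * np.eye(p)      # inverse correlation matrix
    a = np.zeros((p, N))
    for t in range(p, N):
        phi = y[t - p:t][::-1]                    # [y(t-1), ..., y(t-p)]
        k = P @ phi / (lam + phi @ P @ phi)       # gain vector
        theta = theta + k * (y[t] - phi @ theta)  # correct by prediction error
        P = (P - np.outer(k, phi @ P)) / lam
        a[:, t] = theta
    return a
```

Each estimate is a weighted average over past samples, with an effective memory of roughly 1/(1 − lam) samples, which is one way to see where the tracking lag comes from when the true coefficients change quickly.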
The average classification results of the support vector machine classifier with RLS and MSRBF
features are shown in Table A below. For each classification problem, the results obtained with the
proposed MSRBF method clearly exceed those obtained with RLS in terms of sensitivity, specificity,
and accuracy. Specifically, for classifying S and F, the round-10 accuracy is 97.8% with MSRBF and
94.2% with RLS, which is 3.6 percentage points lower. For classifying S and Z, the accuracy is 99.5%
with MSRBF and 97.3% with RLS, a difference of 2.2 percentage points in favor of MSRBF. These
experimental results demonstrate that the proposed time-frequency analysis method is effective and
is capable of classifying or detecting epileptic EEG signals.
Task      Method   SEN (%)   SPE (%)   ACC (of round 10)
S and F   MSRBF    97.2      98.4      97.8%
S and F   RLS      93        95.4      94.2%
S and Z   MSRBF    100       99        99.5%
S and Z   RLS      98        96.6      97.3%
Table A. Comparison of classification results on the EEG segments of S-F and S-Z
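The values in Table A can be cross-checked with a small Python sketch. When both classes contain the same number of segments, the accuracy equals the mean of sensitivity and specificity; equal class sizes are an assumption here, based on the statement that each subset contains 100 recordings. The names TABLE_A and balanced_accuracy are illustrative.

```python
# (task, method): (SEN, SPE, ACC), all in percent, as listed in Table A
TABLE_A = {
    ("S and F", "MSRBF"): (97.2, 98.4, 97.8),
    ("S and F", "RLS"): (93.0, 95.4, 94.2),
    ("S and Z", "MSRBF"): (100.0, 99.0, 99.5),
    ("S and Z", "RLS"): (98.0, 96.6, 97.3),
}

def balanced_accuracy(sen, spe):
    """Accuracy implied by SEN and SPE when both classes are the same size."""
    return (sen + spe) / 2.0

for (task, method), (sen, spe, acc) in TABLE_A.items():
    # Print the implied accuracy next to the reported one.
    print(task, method, round(balanced_accuracy(sen, spe), 1), acc)
```

All four rows agree with the reported accuracy under this assumption, which suggests the table was transcribed consistently.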
DISCUSSION
In this paper, a method for seizure detection in EEG signals is proposed. The method is based on a
novel time-frequency analysis named MSRBF, in which multi-scale Gaussian functions are used to
approximate the time-varying parameters of the TVAR model. Each basis function has multiple kernel
scales (widths), which helps represent the time-varying parameters more flexibly, so the model has
good generalization properties for extracting PSD features. After the features are extracted from
the PSD on the basis of neurological knowledge, the SVM classifier is applied to the two
classification tasks.
Fig. 3 above shows the PSDs of types S, Z, and F, calculated using the RLS and MSRBF methods
respectively. Subset S has a higher power spectrum density owing to the epileptic seizure discharges
in the EEG signals. In addition, the PSD calculated by the proposed method has a higher
time-frequency resolution and thus produces better classification performance than the RLS method.
The classification accuracy also illustrates the effectiveness and advantages of the proposed
method. Specifically, the RLS algorithm gives the lower classification accuracy of the two methods,
owing to its slow convergence and the resulting poor time-frequency distribution. The proposed
method, on the other hand, avoids the slow convergence of the RLS method and produces better
time-varying parameter estimates, so a higher time-frequency resolution can be achieved.
The experimental results show that the proposed MSRBF classification method achieves high
classification accuracy for detecting epileptic seizures from EEG signals, although it may have
higher computational complexity than the traditional RLS classification method. There are two main
reasons. First, the MPSO algorithm is used to search for the optimal scales in the MRBF-based TVAR
model. Second, a large number of redundant regressors or terms may be involved in the MRBF-based
expansion model. To reduce the complexity and further improve the classification performance of the
proposed method, classical sparse modeling algorithms such as orthogonal least squares (OLS) or
Lasso-based methods will be employed in future work.
CONCLUSION
In this paper, a new method of time-frequency analysis has been proposed that expands the
time-varying parameters onto a set of basis functions and then extracts time-frequency features to
classify epileptic seizures in EEG recordings. The modified PSO algorithm is employed to obtain the
optimal scales of the radial basis functions. The PSD features, which are calculated from the
time-varying coefficients, are fed into the RBF-SVM classifier to classify EEG segments and to
examine the effectiveness and superiority of the proposed method compared with traditional
time-frequency analysis methods such as RLS. The classification accuracy results show that the
proposed method achieves better classification performance than the RLS method on the EEG
classification problems considered, i.e., S-Z and S-F.