Uploaded by benaek29

eswa114031

advertisement
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/344360611
Exploiting dimensionality reduction and neural network techniques for the
development of expert brain-computer interfaces
Article in Expert Systems with Applications · September 2020
DOI: 10.1016/j.eswa.2020.114031
CITATIONS
READS
37
431
3 authors, including:
Xiaojun Yu
Nanyang Technological University
110 PUBLICATIONS 965 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
EXPONENTIAL KNOWLEDGE AUTOMATION View project
Robust Gait Generation View project
All content following this page was uploaded by Muhammad Tariq Sadiq on 24 September 2020.
The user has requested enhancement of the downloaded file.
Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa
Highlights
Exploiting dimensionality reduction and neural network
techniques for the development of expert brain–computer
interfaces
Expert Systems With Applications xxx (xxxx) xxx
Muhammad Tariq Sadiq, Xiaojun Yu, Zhaohui Yuan∗
•
•
•
•
Two-step filtering technique was adopted for cognitive and external noise removal.
Automated correlation-based criteria was proposed to select relevant components and coefficients for PCA, ICA and LDA respectively.
The regularization parameters for NCA were tuned to reduce the classification loss.
Extensive experiments with PCA, ICA, LDA, NCA techniques with several channel selection, neural networks and statistical measures were conducted in
EWT domain.
• The proposed framework provide 100% and 92.9% classification accuracy for subject dependent and independent experiments.
Graphical abstract and Research highlights will be displayed in online search result lists, the online contents
list and the online article, but will not appear in the article PDF file or print unless it is mentioned in the
journal specific style requirement. They are displayed in the proof pdf for review purpose only.
Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa
Exploiting dimensionality reduction and neural network techniques for the
development of expert brain–computer interfaces
Muhammad Tariq Sadiq 1 , Xiaojun Yu 1 , Zhaohui Yuan ∗
School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, PR China
ARTICLE
INFO
Keywords:
Electroencephalography
Brain–computer interface
Empirical wavelet transform
Motor imagery
Neighborhood component analysis
Neural networks
1
2
3
4
5
6
7
8
9
10
ABSTRACT
Background: Analysis and classification of extensive medical data (e.g. electroencephalography (EEG) signals)
is a significant challenge to develop effective brain–computer interface (BCI) system. Therefore, it is necessary
to build automated classification framework to decode different brain signals.
Methods: In the present study, two-step filtering approach is utilize to achieve resilience towards cognitive
and external noises. Then, empirical wavelet transform (EWT) and four data reduction techniques; principal
component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA) and
neighborhood component analysis (NCA) are first time integrated together to explore dynamic nature and
pattern mining of motor imagery (MI) EEG signals. Specifically, EWT helped to explore the hidden patterns of
MI tasks by decomposing EEG data into different modes where every mode was consider as a feature vector in
this study and each data reduction technique have been applied to all these modes to reduce the dimension of
huge feature matrix. Moreover, an automated correlation-based components/coefficients selection criteria and
parameter tuning were implemented for PCA, ICA, LDA, and NCA respectively. For the comparison purposes,
all the experiments were performed on two publicly available datasets (BCI competition III dataset IVa and
IVb). The performance of the experiments was verified by decoding three different channel combination
strategies along with several neural networks. The regularization parameter tuning of NCA guaranteed to
improve classification performance with significant features for each subject.
Results: The experimental results revealed that NCA provides an average sensitivity, specificity, accuracy,
precision, F1 score and kappa-coefficient of 100% for subject dependent case whereas 93%, 93%, 92.9%,
93%, 96.4% and 90% for subject independent case respectively. All the results were obtained with artificial
neural networks, cascade-forward neural networks and multilayer perceptron neural networks (MLP) for subject
dependent case while with MLP for subject independent case by utilizing 7 channels out of total 118. Such
an increase in results can alleviate users to explain more clearly their MI activities. For instance, physically
impaired person will be able to manage their wheelchair quite effectively, and rehabilitated persons may be
able to improve their activities.
1. Introduction
Brain–Computer Interface (BCI) system uses individual brain signals
to link the brain and a computer (Birbaumer et al., 2008). During
recent years, BCI has showed extensive contributions towards rehabilitations (Birbaumer et al., 2008; Pfurtscheller et al., 2003) and multimedia applications (Ebrahimi et al., 2003; Krepki et al., 2007; Szczuko,
2017). Electroencephalography (EEG)-based motor imagery (MI) BCI
systems are by far the most widely employed practical systems because
of reliability, non-invasive, low cost and superb temporal characteristics. Cincotti et al. (2008) and Kronegg et al. (2007). Nonetheless, a
key challenge for every real-time BCI device is to accurately interpret
various MI EEG signals (Siuly & Li, 2012).
Typically, a non-biased automated EEG classification system comprises three elements, i.e. preprocessing, extraction of features and
classification of signals. Preprocessing is primarily accountable for
noise suppression of information signals and there are several methods
proposed for this purpose. Readers are referred to Jiang et al. (2019) for
a comprehensive review of noise removal techniques from EEG signals.
Extraction and identification of features are essential components
of an automated system for assessing the outcomes of classification, for
which a broad range of approaches have been reported to classify MI
∗ Corresponding author.
E-mail addresses: tariq.sadiq@mail.nwpu.edu.cn (M.T. Sadiq), XJYU@nwpu.edu.cn (X. Yu), yuanzhh@nwpu.edu.cn (Z. Yuan).
1
Co-first authors.
https://doi.org/10.1016/j.eswa.2020.114031
Received 10 February 2020; Received in revised form 23 August 2020; Accepted 14 September 2020
11
12
13
14
15
16
17
18
19
20
21
M.T. Sadiq et al.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
EEG signals. For the spectral analysis of EEG signals, Fourier Transform
(FT) based methods are developed but these methods do not provide
time-domain information (Polat & Güneş, 2007; Rodríguez-Bermúdez
& García-Laencina, 2012). The autoregressive (AR) methods are computationally effective however; they suffered from artifacts, which limit
their applicability for practical BCI systems (Burke et al., 2005; Jansen
et al., 1981; Krusienski et al., 2006; Schlögl et al., 2002).
A variety of common spatial pattern (CSP) methods have been
described for feature extraction which includes regularized CSP with
selected subjects (SSRCSP), spatially regularization (SRCSP), CSP with
Tikhonov regularization (TRCSP) and CSP with weighted Tikhonov regularization (WTRCSP. In literature, sparse group representation model
(SGRM) (Jiao et al., 2018), temporally constrained sparse group spatial
pattern (TSGSP) method (Zhang et al., 2018), the CSP-rank channel
selection for multifrequency band EEG (CSP-RMF) (Feng et al., 2019),
as well as the sparse Bayesian extreme learning (Jin et al., 2018; Zhang
et al., 2017) have also been proposed. However, there is still gap for
the improvement of classification accuracy since these methods are not
applicable for subjects with small training samples.
More recently EEG signals have also been extracted and classified
using deep learning schemes based on the convolutional neural networks (CNN) and recurrent neural networks (RNN) (Sakhavi et al.,
2018; Thomas et al., 2017). For more information, readers are referred
to Zhang et al. (2019) for recent developments on BCI deep learning
systems. Nevertheless, the accuracy rate produced by the majority of
these approaches was not significant owing to the unavailability of
extensive data needed for the training phase. Two other drawbacks
which may impede the successful applicability of such methods are the
system resources needed and the computational complexity burdens.
Data decomposition (DD)-based techniques have recently gained
prominence in the identification of MI EEG signals. Some notable DDbased methods available in literature are intrinsic mode function (IMF)
with least square support vector machine (LS-SVM) classifier (Taran
et al., 2018), multivariate empirical mode decomposition (EMD) with
FT (Bashar & Bhuiyan, 2016), a comparative study between wavelet
packet decomposition (WPD), empirical mode decomposition (EMD)
and discrete wavelet transformation (DWT) with higher-order statistics
(HOS) (Kevric & Subasi, 2017) as well as empirical wavelet transform
(EWT) with HOS (Sadiq et al., 2019a).
Not all the features extracted from EEG signals are relevant for
the classification. Excessive numbers of features not only increase the
dimension of the feature matrix but also results in low classification
success rates. To reduce the dimension of large feature matrix, several
combinations of features are evaluated in studies (Li et al., 2014; Sadiq
et al., 2019a) to decode the best one for classification enhancement.
Moreover, several dimension reduction techniques are utilized to
choose the best features for EEG signal classification. Yu et al. extract
the features from CSP and analyze the effect of principal component
analysis (PCA) for feature reduction (Yu et al., 2014). Acharya et al.
use WPD to decompose EEG signals into several sub-bands and apply
PCA to them to reduce the size and then use several principal components as an input to the classifier (Acharya et al., 2012). Independent
component analysis (ICA) efficacy in selecting the best features subset
is demonstrated in the literature (Xu et al., 2004) by a linear transformation of large feature vector into low dimensional. Discrete Fourier
transform coefficients are considered as a feature set in BCI study and
further linear discriminant analysis (LDA) is applied to them to reduce
the classifier load (Kołodziej et al., 2012).
In Li et al. (2016) several bivariate features are extracted from EEG
seizure data and then a dimension reduction method known as lasso is
applied to reduce the feature dimension. Another dimensionality reduction method named as a t-distributed stochastic neighbor embedding
(t-SNE) is evaluated to reduce the nonlinear features extracted from
DWT (Li et al., 2016). Neighborhood component analysis (NCA) is a
weighting method that is used to reduce the dimension of feature matrix in study (Raghu & Sriraam, 2018) and increased the classification
accuracy up to 96.1% for EEG focal epileptic seizures data.
1.1. Limitations
68
We understood several limitations from available literature which
are listed as:
69
70
1. In reducing the size of the feature matrix, so many groupings
of features are investigated in studies (Bashar & Bhuiyan, 2016;
Kevric & Subasi, 2017; Li et al., 2014; Sadiq et al., 2019a; Taran
et al., 2018) to decode the better one for identification improvement, however, this procedure is manual and time-taking as
it necessitates numerous experimentations to choose the best
feature pair.
2. In results (Acharya et al., 2012; Kołodziej et al., 2012; Martis
et al., 2013; Subasi & Gursoy, 2010; Xu et al., 2004; Yu et al.,
2014), no automatic approach was used for the selection of the
PCA, ICA, and LDA components, consequently curtailing their
acceptability to potential implementation.
3. To avoid the over-fitting problem of NCA regularization parameter, its cost function parameters were tuned manually (Raghu &
Sriraam, 2018) which limit its applicability for practical systems.
4. Furthermore, most studies in literature only use one dataset that
reduces the versatility of those studies. In studies (Li et al., 2013,
2014; Siuly & Li, 2012) authors used only classification accuracy
as a performance measure conversely classification accuracy is
not enough to identify the MI signals (Sturm, 2013).
5. It is also worth noting that previous studies (Chaudhary et al.,
2020; Ince et al., 2009; Kevric & Subasi, 2017; Li et al., 2011;
Lu et al., 2010; Sadiq et al., 2019a; Siuly & Li, 2012; Song
& Epps, 2007; Wang et al., 2016; Zhang et al., 2013) limited
only for subject dependent studies however, recently, subject
independent experiments have gained significant importance
because of its ability to generalize many subject’s data to an
unknown subject and hence it helps the product developers to
develop a system for a large group of people by training their
system on few subjects.
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
1.2. Contributions
101
To address the aforementioned limitations in the existing studies,
the main contributions of this study are summarized as: (1) Design
and validate a new framework for automatic identification of MI tasks
from either sufficient or small training samples subjects (2) Introduce
and implement the PCA, ICA, LDA and NCA approaches for reducing
the large amount of EEG modes data obtained from the EWT (3)
Propose correlation-based criteria for the automated selection of the
components of PCA, ICA and LDA and the tuning of regularization
parameters for NCA, and they are used as efficient biomarkers of EEG
for MI task detections (4) Investigate a sustainable classification model
for the proposed features to differentiate the MI tasks (5) Improve
classification accuracy as compared with the existing methods (6) Build
an efficient subject independent MI EEG classification system.
In this study, we employ EWT for EEG signal classifications, since
it has been proved that EWT is very useful for non-linear and nonstationary signal analysis (Sadiq et al., 2019a). It is also worth mentioning that, although various nonlinear dimensionality reduction techniques have been employed in Li et al. (2016, 2017), Wang et al.
(2015) and Xu et al. (2020), the results for EEG signals are relatively
low. It is particularly noted that signal decomposition methods do not
perform well with nonlinear dimension reduction techniques in Li et al.
(2017). On the contrary, the effectiveness of linear dimension reduction
techniques with signal decomposition methods have been verified in
many studies (Acharya et al., 2012; Kołodziej et al., 2012; Martis et al.,
2013; Subasi & Gursoy, 2010; Xu et al., 2004; Yu et al., 2014).
Since the focus of this study is to design a flexible framework that
is effective for subjects with either sufficient or small training samples,
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
M.T. Sadiq et al.
Fig. 1. Block diagram of the proposed methodology.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
and is also suitable for both subject-dependent and independent experiments, which is the one of the biggest challenge in BCI field, for
the design of expert flexible BCI system, we focus mainly on the linear
dimension reduction techniques.
To the best of our knowledge, correlation-based strategy and automated tuned-NCA model are the first time being applied for the
automated selection of PCA, ICA and LDA components, and for the
subject-independent MI EEG tasks classification, respectively. Our work
is so far the only study which considers all the aforementioned limitations mentioned in Section 1.1 at one place for the development of
expert BCI systems.
We proposed a novel flexible framework which is effective for
subjects with small and sufficient training samples, and provide effective results for both subject-dependent and subject-independent experiments. For the fair evaluation of the proposed framework, along
with classification accuracy (𝐴𝑐𝑐 ), we utilized several other evaluation
metrics such as sensitivity (𝑆𝑒𝑛 ), precision (𝑃𝑟𝑒 ), F1 score (𝐹1 ), specificity
(𝑆𝑝𝑒 ), and kappa coefficient (𝐾𝑐𝑜 ) obtained from confusion matrix.
19
1.3. Organization
20
21
22
23
24
25
The remainder of the article is structured subsequently. The datasets
are described in Section 2. Section 3 discusses the aspects of the
suggested approach. The criteria of performance assessment for experiments are describe in Section 4. Section 5 describe the experimental
setup of the study, Section 6 describes the observations. Section 7
details the debates and Section 8 summarizes the work.
26
2. Materials
27
28
The suggested research utilize two publicly available datasets which
are describe in following sections.
29
2.1. Dataset 1 description
30
31
32
33
The IVa dataset (Blankertz et al., 2006) incorporates right-hand
(RH) and right-foot (RF) MI activities. Such a set of data was accumulated from five completely relaxed healthy individuals known as ‘‘aa’’,
‘‘al’’, ‘‘av’’, ‘‘aw’’ and ‘‘ay’’ in this research by adjusting 118 electrodes
on each subject as per the global 10/20 system guidelines (Jurcak
et al., 2007). Each subject set of data includes details about the MI
EEG data at its initial four sessions without feedback, whereas there
were a maximum of 280 trials for each subject, with 140 trials devoted
to class 1 activities and others to class 2 activities. For 3.5 s, every
subject served any of the two MI activities, but the training and testing
trials are dissimilar for each subject. Precisely, out from the 280 trials,
168, 224, 84, 56 and 28 are training trials for ‘‘aa’’, ‘‘al’’ ‘‘av’’ ‘‘aw’’
and ‘‘ay’’ respectively, whereas the remaining are for testing. This
research adopts a down-sampled frequency of 100 Hz, whilst the initial
measuring rate was 1000 Hz.
34
35
36
37
38
39
40
41
42
43
44
2.2. Dataset 2 description
45
BCI competition III dataset IVb (Blankertz et al., 2006) is made
up of left hand (LH) and right foot (RF) MI tasks. It was obtained
from a normal subject labeled as ‘‘Ivb’’ with 118 electrodes positioned
under the expanded 10/20 international system (Jurcak et al., 2007).
Without feedback, this dataset has seven original sessions. For both MI
activities, 210 trials were obtained from 118 electrodes and a bandpass filter (BPF) with 0.05 Hz and 200 Hz lower and upper frequencies
filtered those signals respectively. In this analysis, the down-sampled
information is used at a frequency of 100 Hz.
46
47
48
49
50
51
52
53
54
3. Methods
55
The proposed method consists of five different modules as depicted
in Fig. 1, and each is explained briefly as follows.
56
57
3.1. Module 1: Pre-processing of data
58
EEG measurements are infected with external and cognitive noises
that impede further analysis due to unwanted effects. Moreover, cross
talk also degrade the MI EEG data patterns due to interference from
neighboring electrodes. To avoid these effects, in this study two-step
filtering technique is employed. In first step, EEG signals are band-pass
filtered with 8–25 Hz to retain the 𝜇 (8–12 Hz) and 𝛽 (16–24 Hz) bands
as these two bands have information related to imagine movement. The
elliptical filter of 6th order with 1 dB and 50dB passband and stop band
59
60
61
62
63
64
65
66
M.T. Sadiq et al.
1
2
3
4
5
frequencies is used in this study due to it sharp cutoff characteristics.
In the second step, Laplacian filter is employed to reduce the cross talk
between channels where the mean signals of four nearest neighboring
channels is subtracted from each channel signal (Bhattacharyya et al.,
2014; Dornhege et al., 2007).
6
3.2. Module 2: Channel selection
7
8
9
10
11
12
13
14
15
16
17
Because of increasing complexity, the use of large number of channels for EEG signal analysis is discouraged. So, in current study three
different channel selection criteria were used for further experimentation.
3.2.1. Channel selection criteria 1
In criteria 1, we have chosen 𝐶3 , 𝐶𝑍 , and 𝐶4 electrodes in accordance with the 10–20 framework (Jurcak et al., 2007) since these
electrodes are the most discriminatory in hand and foot movements
data (Kevric & Subasi, 2017). It should be noted that RH’s MI operation
is usually detected above the left motor cortex across the 𝐶3 electrode,
and the foot’s MI action across the 𝐶𝑍 electrode.
18
19
20
21
22
23
3.2.2. Channel selection criteria 2
From studies, it is understood that the brain’s frontal, central and
parietal lobes are important from a neurological perspective for MI
commands. Information from seven electrodes i.e. 𝐹3 , 𝐹4 , 𝐶3 , 𝐶𝑍 , 𝐶4 ,
𝑃3 and 𝑃4 which reside above these lobes of interest according to the
10–20 standard (Jurcak et al., 2007) are considered in criteria 2.
24
25
26
27
28
29
3.2.3. Channel selection criteria 3
In criteria 3, electrodes around motor cortex region are nominated,
as this region is responsible for MI execution. According to 10–20 standard (Jurcak et al., 2007), 18 electrodes lies around motor cortex region
and labeled as 𝐶5, 𝐶3, 𝐶1, 𝐶2, 𝐶4, 𝐶6, 𝐶𝑃 5, 𝐶𝑃 3, 𝐶𝑃 1, 𝐶𝑃 2, 𝐶𝑃 4, 𝐶𝑃 6,
𝑃 5, 𝑃 3, 𝑃 1, 𝑃 2, 𝑃 4 and 𝑃 6, respectively (Sadiq et al., 2019a, 2019b).
30
3.3. Module 3: Feature extraction
31
32
In this work empirical wavelet (EWT) is consider as a feature
extraction tool for EEG signal analysis.
33
34
35
36
37
38
39
40
41
42
3.3.1. Empirical wavelet transform
In Gilles (2013), Gill et al. suggested using the EWT strategy to
address the shortcomings in signal decomposition and analysis faced by
the EMD technique. Wavelet filter bank is a vital component of EWT
that allows to break down non-stationary signals into multiple modes
in which each is adjusted to a unique IMF frequency (Gilles, 2013). The
key working steps of EWT procedure can be summarize in subsequent
three stages:
Step 1: Use the fast Fourier Transform (FFT) method to obtain the
Fourier spectrum from the examined signal’s frequency range [0, 𝜋].
43
44
45
Step 2: Use the scale-space boundary detection technique specified
in Gilles (2013) to partition the acquired Fourier spectrum into N
neighboring segments.
46
47
48
49
50
Step 3: Empirical wavelets are used as band-pass filters for all frequency segmentations. For this role, this study uses Meyer’s wavelets
concept and Littlewood–Paley’s idea (Daubechies, 1992). The Eqs. (1)
and (2) reflect the functions of empirical scaling and wavelet as (Gilles,
2013):
51
⎧1,
⎪ ( 𝜋𝜙(𝛼,𝑓𝑗 ) )
𝐴̂ 𝑗 (𝑓 ) = ⎨cos
,
2
⎪
⎩0,
If |𝑓 | ≤ (1 − 𝛼)𝑓𝑗
If (1 − 𝛼)𝑓𝑗 ≤ |𝑓 | ≤ (1 + 𝛼)𝑓𝑗
otherwise
(1)
⎧1,
If (1 + 𝛼)𝑓𝑗
⎪
⎪
≤ |𝑓 | ≤ (1 − 𝛼)𝑓𝑗+1
⎪ ( 𝜋𝜙(𝛼,𝑓𝑗+1 ) )
,
If
(1 − 𝛼)𝑓𝑗+1 ≤ |𝑓 |
⎪cos
2
⎪
̂
𝐵𝑗 (𝑓 ) = ⎨
≤ (1 + 𝛼)𝑓𝑗+1
( 𝜋𝜙(𝛼,𝑓 ) )
⎪
𝑗
,
If (1 − 𝛼)𝑓𝑗 ≤ |𝑓 |
sin
⎪
2
⎪
≤ (1 + 𝛼)𝑓𝑗
⎪
⎪0,
otherwise
⎩
)
(
(∣ 𝑓 ∣ −(1 − 𝛼)𝑓𝑗 )
𝜙(𝛼, 𝑓𝑗 ) = 𝛽
2𝛼𝑓𝑗
(2)
52
(3)
53
while 𝛼 variable is essential for preventing any interaction between
Eqs. (1) and (2) functions and set a tight frame as shown in Eq.
(4) (Gilles, 2013),
(
)
𝑓𝑗+1 − 𝑓𝑗
𝛼 < 𝑚𝑖𝑛𝑗
(4)
𝑓𝑗+1 + 𝑓𝑗
54
55
56
where Eq. (5) represent the arbitrary function 𝛽(𝑦) (Gilles, 2013),
58
⎧0,
⎪
𝛽(𝑦) = ⎨𝛽(𝑦) + 𝛽(1 − 𝑦) = 1,
⎪
⎩1,
57
If 𝑦 ≤ 0
(5)
59
The coefficients of Eqs. (1) and (2) are found by the dot product of
the analyzed signal with the empirical scaling and wavelet functions,
respectively, and thus, the empirical mode would be formalized.
In this study, 10 modes are extracted from each channel signal
empirically. We have included Fig. 2 to present both the original EEG
and the modes generated with the EWT. For clear visualization the blue
color is use for class 1 whereas red color for class 2. It should also be
noted that the same number modes of both classes have significance
difference in shapes, which indicate that the statistical independency
and chances of better classification among different classes are very
high (Siuly & Li, 2012).
60
61
62
63
64
65
66
67
68
69
70
3.4. Module 4: Dimensionality reduction
71
The number of modes from all channels are combine together,
which form a large feature matrix. To reduce the feature matrix dimension, following four dimension reduction techniques are utilize in
this study.
72
73
74
75
3.4.1. Principal component analysis
Principal Component Analysis (PCA) is a well-known technique of
data reduction in which 𝐷-dimensional dataset is interpreted in low
dimensional data to minimize the complexities, space and degrees of
freedom. PCA is effective for segmenting signals from numerous sources
and the goal is to depict data in a space, which effectively represents
the variance in a context of a sum-squared error. The procedure of
PCA can be summarize by succeeding phrases. First, for a data matrix,
a mean vector (𝑚) and covariance matrix (𝑐) with D-dimension and
𝐷 × 𝐷 dimension respectively are executed. Second, the eigenvectors
(𝑣1 , 𝑣2 ....) and eigenvalues (𝜆1 , 𝜆2 ....) are computed and arranged with
respect to lowering eigenvalue. Third, the eigenvectors spectrum are
visualize and leading eigenvectors (𝐾) are selected. There will often
be a dimension indicating an underlying dimensionality of signal subspace and rest of all dimensions are noise. At last, create a 𝐾 ×𝐾 matrix
(𝐵), the columns of which consists of 𝐾 eigenvectors. The data matrix
is pre-process by using following expression:
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
𝑋̇ = 𝐵 𝑡 (𝑋 − 𝑚)
(6)
93
Further details of PCA can be found in Cover and Hart (1967) and
Cao et al. (2003).
94
95
for all 𝑦[0 1]
if 𝑦 ≥ 1
M.T. Sadiq et al.
Fig. 2. Original EEG Signal with Modes for Class 1 and Class 2 for dataset IVa.
1
2
3
4
5
6
7
8
3.4.2. Independent component analysis
Independent Component Analysis (ICA) is data reduction technique
that helps to extract mutually exclusive independent components from
multivariate data. Data contained in one component cannot inferred
from rest of the others and factually this implies joint likelihood of autonomous amounts is acquired as the product of the likelihood of every
one of them. Due to our noise free and independence assumptions, we
can write the multivariate density function as
9
𝑃 𝑥(𝑡) =
𝑎
∏
(7)
𝑃 (𝑥𝑗 (𝑡))
𝑗=1
10
11
12
13
where 𝑎 represents the scalar source signals 𝑥𝑗 (𝑡) for 𝑗 = 1, 2 … 𝑎 with
independence assumptions and 𝑡 notate the index of time with range of
1 ≤ 𝑡 ≤ 𝑇 . Assume that at every moment a data vector of 𝐷-dimensional
is observed so,
14
𝑦(𝑡) = 𝐵𝑥(𝑡)
15
16
17
where 𝐵 is a scalar matrix of 𝑎 × 𝐷 and 𝐷 ≥ 𝑎 is needed. The goal of
ICA is to retrieve the source signals from the observed signals so a real
matrix 𝑄 is found as
18
𝑧(𝑡) = 𝑄𝑦(𝑡) = 𝑄𝐵𝑥(𝑡)
19
20
21
22
23
24
25
(8)
−1
(9)
−1
where 𝑄 = 𝐵 but 𝐵 and 𝐵 both are unknown. In this study
maximum-likelihood techniques are utilize to seek 𝐵. The density
estimate with specification of 𝑝(𝑦;
̂ 𝑟) is use and the vector 𝑟 is determine,
which reduces the difference between the distribution of source and the
approximate. In conclusion, 𝑝(𝑦;
̂ 𝑟) is an estimation of 𝑃 (𝑦) and 𝑟 is the
basis vectors of 𝐵. Additional information of ICA is found in Cao et al.
(2003) and Cover and Hart (1967).
3.4.3. Linear discriminant analysis
Linear Discriminant Analysis (LDA) is a data reduction technique
that reduces a D-dimensional data into single dimension. LDA’s goal
is to build a new variable incorporating the original indicators. This
is done by minimizing discrepancies concerning the new parameters
among the predetermined categories. The aim is to incorporate the
indicator values in such a manner to form a new composite attribute,
which provide the discriminating score. At last, every category is
supposed to have a Gaussian distribution of discriminating results with
the maximum possible discrepancy in average results for the categories.
Discriminant function helped to compute the discriminant scores and
can be formulated as:
26
27
28
29
30
31
32
33
34
35
36
37
(10)
38
Therefore, a discriminating score is a weighted linear indicator mixture. The weights are calculated to increase the discrepancies among
discriminating average category scores. In particular, indicators with
broad differences among category averages will have greater weights,
whereas weights will be low if category averages are same. More details
of LDA can be studied in Fielding (2007).
39
40
41
42
43
44
3.4.4. Neighborhood component analysis
Neighborhood component analysis (NCA) is a non-parametric technique, which ranked the attributes according to the substantial information. It formulated because of K-NN classification algorithm. The NCA
algorithm’s increase the leave-one-out separation outcome with tuned
regularization parameter by learning feature weights over the training
data. Let the training data be (Yang et al., 2012)
45
46
47
48
49
50
51
(11)
52
𝐷 = 𝑞1 𝑍1 + 𝑞2 𝑍2 + 𝑞3 𝑍3 ......𝑞𝑝 𝑍𝑝
𝑇 = {(𝑓𝑖 , 𝑙𝑖 ), 𝑖 = 1, 2...𝑛}
M.T. Sadiq et al.
Fig. 3. Proposed framework for feature selection by NCA.
1
2
3
4
5
Where 𝑓𝑖 represents the feature vector with 𝐹 dimensions, 𝑙𝑖 ∈
{1, 2 … ..𝑐} denotes the respective class labels, 𝑛 corresponds to the total
observations and 𝑐 are class labels. For two samples 𝑓𝑖 and 𝑓𝑗 , the
distance function 𝐷𝑤 can be represented in context of weight vector
and expressed in Eq.(12) as (Yang et al., 2012)
6
𝐷𝑤 (𝑓𝑖 , 𝑓𝑗 ) =
𝐷
∑
𝑘=1
|
|
𝑤2𝑘 |𝑓𝑖𝑘 − 𝑓𝑗𝑘 |
|
|
(12)
7
8
9
10
where 𝐷𝑤 are the attribute weights. In this technique, randomly a sample is chosen from 𝑇 , labeled respectively and considered as reference
point. The reference point can be represented in Eq.(13) as (Yang et al.,
2012)
11
𝑘𝑒𝑟(𝐷𝑤 (𝑓𝑖 , 𝑓𝑗 ))
𝑃𝑖𝑗 = ∑𝑛
𝑗=1 𝑘𝑒𝑟(𝐷𝑤 (𝑓𝑖 , 𝑓𝑗 ))
12
13
14
Where −𝑧
𝑘𝑒𝑟 represents the kernel function, which is define as,
𝑘𝑒𝑟(𝑧) = 𝑒 𝜎 and 𝜎 is the width of kernel. The correct classification
probability of 𝑓𝑖 is given as
15
𝑃𝑖 =
𝑛
∑
𝑘𝑒𝑟(𝐷𝑤 (𝑓𝑖 , 𝑓𝑗 ))
(13)
(14)
𝑗=1,𝑖≠𝑗
16
17
18
when 𝑓𝑖 = 𝑓𝑗 , the value of 𝑃𝑖𝑗 will be one. So the total classification
accuracy can be formulated by following objective function 𝐹 (𝑤) in Eq.
(15) as (Yang et al., 2012)
19
𝐹 (𝑤) =
20
21
22
23
The NCA algorithm’s objective is to increase the 𝐹 (𝑤) but this 𝐹 (𝑤)
is vulnerable to being over-fit. To prevent the over-fitting in final 𝐹 (𝑤)
of NCA framework regularization parameter 𝜆 is used which need to be
tuned. The 𝐹 (𝑤) with 𝜆 can be represented as (Yang et al., 2012)
24
𝐴=
𝑛
∑
(𝑃𝑖 )
(15)
𝑖=1
𝑛
∑
𝑖=1
25
26
27
(𝑃𝑖 ) − 𝜆
𝐷
∑
𝑤2𝑘
(16)
𝑘=1
The NCA method utilize conjugate gradient approach to increase the
objective function 𝐴. The best subset of attributes is chosen based on
the weights results.
3.5. Module 5: Selection of suitable features for PCA, ICA, LDA and NCA
28
In this study the same index modes from different channels were
arrange in a sequence. To reduce the length of data matrix build by
each index modes, the PCA, ICA and LDA were applied. To choose the
suitable components (whereas each component is considered as one
feature vector in the present study) of PCA, ICA and LDA coefficients,
we proposed a correlation-based suitable components and coefficients
selection criteria.
29
30
31
32
33
34
35
3.5.1. Suitable components and coefficient selection criteria for PCA, ICA
and LDA
The ‘‘Best-First’’ (Witten et al., 2005) technique is being used to
scan through the groups of components and coefficients, via greedy
climbing, which is improved by a backtracking mechanism. Afterwards
the ‘‘correlation based’’ (Hall, 1999) component and coefficient chosen
method is being employed to determine relevant components and
coefficients, by explicitly assessing the predictive potential of each
component and coefficient and its degree of reliability. It picks the
group of components and coefficients, which are closely correlated to
the category but have week interconnections (Hall, 1999) as formulated
mathematically,
36
37
𝑠𝑢𝑖𝑡𝑎𝑏𝑙𝑒 𝑐𝑜𝑚𝑝𝑜𝑛𝑒𝑛𝑡 𝑔𝑟𝑜𝑢𝑝 =
∑
𝑎𝑙𝑙𝑐𝑜𝑚𝑝𝑜𝑛𝑒𝑛𝑡𝑠𝑓 𝐶(𝑓 , 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦)
√∑
∑
𝑎𝑙𝑙𝑐𝑜𝑚𝑝𝑜𝑛𝑒𝑛𝑡𝑠𝑓
𝑎𝑙𝑙𝑐𝑜𝑚𝑝𝑜𝑛𝑒𝑛𝑡𝑠𝑔 𝐶(𝑓 , 𝑔)
(17)
where 𝐶 makes a comparison among two components and ‘‘symmetric
uncertainty’’ is employed in this study.
38
39
3.5.2. Suitable features selection by NCA
For NCA, all the modes (we consider each mode a single feature
vector here) are arranged in a sequence and for strategies 1, 2 and
3 we have total 180, 70 and 30 feature vectors. In this study the
proposed criteria as shown in Fig. 3 was used to reduce the large
feature matrix. As shown in Fig. 3 the large feature matrix was first
divided into training modes and testing modes data, and regularization
parameters were tuned on training modes data by employing 10-fold
40
41
42
43
44
45
46
47
M.T. Sadiq et al.
Fig. 4. Graphical representation of dimensionality reduction techniques (a) Original EEG signal (b) PCA (c) ICA (d) LDA (e) NCA.
1
2
3
4
5
6
cross-validation. For different folds and regularization parameters, NCA
models were constructed on training modes data and classification
costs were estimated for corresponding test modes data. Calculate the
average of all cost values and the regularization parameter, which
give the minimum average classification loss, is considered as best
parameter value in this study. The features weighing more than 0 are
trained on different neural networks and these classifiers were tested
of different test modes data to evaluate the classification outcomes.
In Fig. 4, we present the proposed framework graphically. To visually represent the two classes’ clearly, blue color is dedicated for class
1 and red for class 2. The Fig. 4(a) represents the original EEG with
huge data and Fig. 4(b)–(e), represents the data obtained after PCA,
7
8
9
10
11
12
M.T. Sadiq et al.
1
2
3
ICA, LDA and NCA. These figures showed that there is significant data
reduction by dimensionality reduction techniques in comparison with
original data.
4
3.6. Module 6: Classification
5
6
7
8
Once we obtained the suitable feature subsets from dimensionality
reduction techniques, we employed different neural network classifiers
to classify MI signals. The details of the classifiers are summarize in
subsequent discussion.
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
3.6.1. Artificial neural networks (single and multilayer)
Artificial Neural Networks (ANN) are computational structures consisting of a high number of complex, strongly integrated computing
components known as neurons which abstractly mimic the biological nervous system’s structure and activity. ANN learning is achieve
by developing certain training methodologies depending on learning
regulations deemed to emulate biological systems learning processes.
Typical ANN for linear problems contain two layers i.e. input and
output layers whereas for nonlinear problems additional layer is utilize
and known as hidden layer. The number of hidden layers is chosen
empirically based on the problem in hand. More number of hidden layers results in long training process speed. We utilize back propagation
algorithm with scaled conjugate gradient approach for fast training to
find the suitable weights. Two models of ANN, first employed single
layer with ten neurons and a multilayered ANN by three layers each
with ten neurons are experimented for the classification of MI tasks. All
parameters were selected by hit and trial manners (Subasi & Ercelebi,
2005).
27
28
29
30
31
32
33
34
3.6.2. Feed forward neural network
In feed forward neural networks (FFNN) neurons are arrange in multiple layers and signals are forwarded from input to output. When error
occurs these neurons are return back to the previous layer and weights
are adjusted again to reduce the error chances. In this study, we use tansigmoid transfer function, single hidden layer with empirically chosen
ten neurons and Levenberg–Marquardt algorithm for fast training (Jana
et al., 2018).
35
36
37
38
39
40
41
42
43
44
45
3.6.3. Cascade-forward neural networks
In cascade-forward neural networks, (CFNN) neurons are interlinked with previous and subsequent layers neurons. For example a
three-layer CFNN represents direct connection between layer one to
layer two, layer two to layer three and layer one to layer three i.e., neurons in input and output layers are connected directly and indirectly.
These extra connections help to achieve the better learning speed for required relationship. Like FFNN, in CFNN we utilize transfer function of
tan-sigmoid, one hidden layer with ten neurons selected by hit and trial
manner and Levenberg–Marquardt method for quick learning (Goyal &
Goyal, 2011).
46
47
48
49
50
51
52
53
3.6.4. Recurrent neural networks
In recurrent neural networks, (RNN) neurons can flow in circle because this network has one or more feedback links. The characteristics
of RNN allows the system to process temporarily and recognize the
trends. In this study, we implement Elman recurrent neural networks,
which is the common type of RNN. For the quick training of model,
Levenberg–Marquardt method and single hidden layer with empirically
selected ten neurons are utilized (Mandic & Chambers, 2001).
54
55
56
57
58
59
3.6.5. Probabilistic neural networks
Bayesian method derives the probabilistic neural network (PNN)
with input, pattern, summation and output layers. The PNN classification accuracy is largely dependent on accurate value of the spread factor. In this study spread factor is fixed to 0.1 after several experiments
for the classification of different MI tasks (Specht, 1990).
Table 1
Confusion matrix.
True positive class
(Class 1)
True negative class
(Class 2)
Predictive
positive
class(Class 1)
Predictive
negative
class(Class 2)
TP
FN
FP
TN
3.6.6. Multilayer perceptron neural networks
We utilized multilayer perceptron neural network (MLP) with backpropagation for the classification of different MI tasks. The amount
of neurons in input and output layers are same as that of features in
feature vector and MI classes. The amount of neurons in input and
output layers is equivalent to that of feature vectors and MI groups.
After comprehensive tests the quantity of neurons is chosen for hidden
layer. The following formulation is used in this analysis to pick the
number of neurons for a single hidden layer as (Subasi & Ercelebi,
2005).
𝑁=
𝑁𝑜.𝑜𝑓 𝑓 𝑒𝑎𝑡𝑢𝑟𝑒𝑠 + 𝑁𝑜.𝑜𝑓 𝑀𝐼𝑐𝑙𝑎𝑠𝑠𝑒𝑠
2
(18)
60
61
62
63
64
65
66
67
68
69
70
4. Performance verification
71
The performance of the proposed study is verified by using the
confusion matrix as shown in Table 1.
A number of performance matrices referred to as sensitivity (𝑆𝑒𝑛 ),
precision (𝑃𝑟𝑒 ), accuracy (𝐴𝑐𝑐 ), F1 score (𝐹1 ), specificity (𝑆𝑝𝑒 ), and kappa
coefficient (𝐾𝑐𝑜 ) are attained from Table 1 for classification performance validation and are as follows respectively: (Hossin & Sulaiman,
2015).
72
73
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
𝑇𝑃
𝑃𝑟𝑒 =
𝑇𝑃 + 𝐹𝑃
𝑇𝑃 + 𝑇𝑁
𝐴𝑐𝑐 =
𝑇𝑃 + 𝐹𝑁 + 𝑇𝑁 + 𝐹𝑃
2𝑃 𝑟𝑒𝑆𝑒𝑛
𝐹1 =
𝑃 𝑟𝑒 + 𝑆𝑒𝑛
𝑇𝑁
𝑆𝑝𝑒 =
𝐹𝑃 + 𝑇𝑁
𝐴𝑐𝑐 − 𝐸𝑥𝑝𝐴𝑐𝑐
𝐾𝑐𝑜 =
1 − 𝐸𝑥𝑝𝐴𝑐𝑐
(19)
𝑆𝑒𝑛 =
(20)
(21)
(22)
(23)
(24)
where
74
(𝑇 𝑃 +𝐹 𝑃 )(𝑇 𝑃 +𝐹 𝑁)
𝑇 𝑃 +𝐹 𝑁+𝑇 𝑁+𝐹 𝑃
+
(𝐹 𝑁+𝑇 𝑁)(𝐹 𝑃 +𝑇 𝑁)
𝑇 𝑃 +𝐹 𝑁+𝑇 𝑁+𝐹 𝑃
(25)
75
In Table 1 and Eqs. (20)–(26), 𝑇 𝑃 mean true positive represents
the correctly predicted instances of class 1; 𝑇 𝑁 means true negative
represents the correctly estimated instances of class 2; 𝐹 𝑃 represents
false positives representing the number of instances predicted as positive where it actually belongs to negative class and 𝐹 𝑁 estimated as
negative instances but actually belongs to positive class.
76
77
78
79
80
81
5. Experimental setup
82
The proposed algorithms (dimension reduction techniques) require
class labels for experimentations as the dataset IVa and IVb comprised
of training (labeled) and testing (un-labeled) datasets so we consider
only training (labeled) data as raw EEG signals. We extract features
from labeled data (which is considered as raw EEG signals) only thus
we split features into the training and testing part at later stage to
verify the effectiveness of the proposed experiments. Similar procedure
is adopted in studies (Raghu & Sriraam, 2018; Yang et al., 2012) for the
split of features into training and testing part. For more understanding
83
84
85
86
87
88
89
90
91
𝐸𝑥𝑝𝐴𝑐𝑐 =
𝑇𝑃 + 𝐹𝑁 + 𝑇𝑁 + 𝐹𝑃
M.T. Sadiq et al.
Fig. 5. Classification results represented by confusion matrix for ANN classifier. (a) Training confusion matrix (b) Validation confusion matrix (c) Test confusion matrix (d) All
confusion matrix.
Table 2
Number of trials used in each class.
Datasets
aa
al
av
aw
ay
ivb
Class 1
Class 2
80
112
42
30
18
105
86
112
42
26
10
105
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
of the dataset, the total number of trials (labeled data) used for experiments related to each subject and class are shown in following Table 2.
The motivations for using the labeled data can be found in subsequent discussion as well. For MI EEG classification, Siuly et al. proposed
a cross-correlation with logistic regression (CC-LR) method (Li et al.,
2013), a Modified CC-LR algorithm (Li et al., 2014) and a principal
component analysis-based technique (Siuly et al., 2017) in their previous works, and they used the labeled data (data with labels/training
data) for their experiments. Recently we proposed EWT-based algorithms (Sadiq et al., 2019a) for MI EEG classification and followed the
same procedure as utilized in earlier studies (Li et al., 2013, 2014;
Siuly et al., 2017) and selected identically labeled dataset for MI EEG
classification. We considered only those EEG segments containing MI
tasks information only, and position markers are used to obtain MI data
of both categories.
17
18
19
20
21
22
1. In the first phase, from each EEG segment, 350 samples were
taken as these samples are directly related to the MI tasks
information. To remove external, cognitive and interference of
neighboring channels, band pass filter (BPF) with lower and
upper frequency of 8 Hz and 25 Hz respectively was employed
at the first step, as these two frequency bands contain the
maximum MI information. In the second step, Laplacian filter
was adopted to remove the cross-channel interference effects.
2. In the second phase, we choose different combination of channels (3, 7 and 18 channels) around the motor cortex area of
the brain region based on the physiological arrangement and
channel labeling according to the 10–20 system.
3. In the third phase, owing to highly non-linear and non-stationary
nature of the EEG, each signal is decomposed into 10 modes
by employing EWT, which provides enough information for the
correct identification of MI signals (Sadiq et al., 2019a). At this
stage we have a total number of 30, 70 and 180 modes which
are obtained with 3, 7 and 18 channels, respectively.
4. In the fourth phase, we re-arranged the modes data, such that
same index modes of all channels make row vectors respectively
as follows,
23
24
25
26
27
28
29
30
31
32
33
34
𝑀1 = {𝐶ℎ1 𝑚1 𝐶ℎ2 𝑚1 ⋯ 𝐶ℎ7 𝑚1 }
𝑀2 = {𝐶ℎ1 𝑚2 𝐶ℎ2 𝑚2 ⋯ 𝐶ℎ7 𝑚2 }
(26)
⋮
𝑀10 = {𝐶ℎ1 𝑚10 𝐶ℎ2 𝑚10 ⋯ 𝐶ℎ7 𝑚10 }
where 𝐶ℎ1 𝑚1 represents mode 1 of channel 1 and similar remark
is applicable for all other modes of different channels. It is
witnessed that each mode have 350 samples, and thus, 2450
(350 × 7) parameters are obtained for one index modes, and
totally we have 24,500 (2450 × 10) parameters for all index (10)
modes of 7 channels. Since the total number of parameters is too
large, we reduced the dimension of each vector parameters using
dimension reduction techniques as below,
35
36
37
38
39
40
41
42
(a) We applied PCA on each index modes vector first. PCA
reduced the arrangement dimensions into 49 parameters
43
44
M.T. Sadiq et al.
Table 3
Selected number of PCA, ICA and LDA components and coefficient (features) with number of parameters for three channel selection strategies.
Subject
3 Channels
7 Channels
18 channels
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
DR Techniques
# of parameters
PCA
ICA
LDA
NCA
PCA
ICA
LDA
NCA
PCA
ICA
LDA
NCA
aa
3 × 14
42
3 × 11
33
350 × 10
3500
350 × 14
4900
7 × 6
42
7 × 8
56
350 × 10
3500
350 × 6
2100
18 × 6
108
18 × 8
144
350 × 10
3500
350 × 6
2100
al
3 × 12
36
3 × 18
54
350 × 10
3500
350 × 12
4200
7 × 8
56
7 × 8
56
350 × 10
3500
350 × 7
2450
18 × 8
144
18 × 8
144
350 × 10
3500
350 × 8
2800
av
3 × 15
45
3 × 9
27
350 × 10
3500
350 × 12
4200
7 × 10
70
7 × 8
56
350 × 10
3500
350 × 8
2800
18 × 10
180
18 × 8
144
350 × 10
3500
350 × 8
2800
aw
3 × 6
18
3 × 3
9
350 × 10
3500
350 × 6
2100
7 × 2
14
7 × 3
21
350 × 10
3500
350 × 2
700
18 × 2
36
18 × 3
54
350 × 10
3500
350 × 2
700
ay
3 × 9
27
3 × 12
36
350 × 10
3500
350 × 9
3150
7 × 70
49
7 × 7
49
350 × 10
3500
350 × 7
2450
18 × 7
126
18 × 7
126
350 × 10
3500
350 × 7
2450
Ivb
3 × 9
27
3 × 6
18
350 × 10
3500
350 × 1
350
7 × 7
49
7 × 4
28
350 × 10
3500
350 × 1
350
18 × 7
126
18 × 4
72
350 × 10
3500
350 × 1
350
(7 values×7 dimensions) for each index modes vector, and
finally, we obtained a total number of 490 parameters for
[𝑀1 , 𝑀2 , … … .𝑀10 ].
(b) We then applied ICA onto [𝑀1 , 𝑀2 , … … , 𝑀10 ] respectively. The number of ICA components during the modes
data reduction was fixed to be 6, since 6 components of
ICA obtained enough physiological information for the
analysis of biomedical signals (Martis et al., 2013). ICA
results in 42 parameters (7values×6 dimensions) for each
index modes vector. A total number of 420 parameters are
acquired.
(c) We next utilized LDA on [𝑀1 , 𝑀2 , … … , 𝑀10 ] respectively. The number of LDA components are number of
𝑐𝑙𝑎𝑠𝑠𝑒𝑠 − 1. So we have one component obtained from
𝑀1 , one from 𝑀2 and similar for all others. In total, LDA
results in 3500 (350 × 10) parameters out of total 24,500
parameters.
(d) For NCA, all the modes vectors were arranged sequentially and we obtain a huge data matrix of modes arranged
as
21
𝐹 𝑀 = [𝑀1 , 𝑀2 , … … , 𝑀10 ]
22
23
24
25
This specify that 𝐹 𝑀 obtained 180 modes (63,000 parameters), 70 modes (24,500 parameters) and 30 modes
(10,500 parameters) for 18, 7 and 3 channels respectively.
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
5. In the fifth phase, features are automatically selected from PCA,
ICA, LDA and NCA. In study (Martis et al., 2013), each component of PCA, ICA and LDA is considered to be one feature
for the ECG bead classification. However, components and coefficients were selected manually or in a hit and trial manner,
which makes their method less practical. In the present proposed
study, we also utilized the components and coefficients of PCA,
ICA and LDA as feature vectors, but we employed automated
correlation-based component and coefficient selection criteria to
make proposed method more adaptive for real time applications.
In this way, a component or coefficient is chosen best if its
characteristics are maximally matched to the characteristics of
the category. While the automatic selection of features with NCA
is explained clearly in Fig. 3. Table 3 shows the number of
selected features with parameters.
6. In the final phase all the selected features were fed to the six neural network classifiers, wherein several evaluation measures with
10-fold cross-validation strategy was employed for classification
of different MI tasks.
45
6. Results
46
6.1. Subject dependent results
47
(27)
In following sections we first discuss the subject dependent results.
6.1.1. Analysis for PCA, ICA and LDA
To explain the results obtained from PCA, we randomly consider
subject ‘‘av’’ with channel selection criteria 3 and ANN classifier as an
example. The training, validation and testing results of PCA classified
by ANN considering single layer and back-propagation algorithm with
scaled conjugate gradient approach are shown by confusion matrix in
Fig. 5. As shown in Fig. 5, the first two diagonal cells (shaded by green
color) of training confusion matrix (each representing class 1 and class
2 samples) reflect the number and percentage of correct classification
by trained network. For example in Fig. 5(a), 3701 cases are correctly
classified as class 1 samples and 3609 are correctly classified as class 2
samples. 91 cases of class 2 are incorrectly classified as part of class 1
whereas 159 cases of class 1 are incorrectly classified. Overall 95.9%
cases of class 1 are identified as correct and rest 4.1% as incorrect.
Likewise, for class 2, 97.5% cases are predicted as correct and rest
2.5% as wrong. In total, 96.7% samples are correctly classified and
rest 3.3% as wrong in training dataset. Similar analysis is performed
for validation and testing sets as shown in Fig. 5(b) and (c). At last, the
average classification outcome for all datasets is 96.5%.
The area under the receiver-operating curve (AUC) of ANN classifier
for subject ‘‘av’’ with channel selection criteria 3 is shown in Fig. 6.
The AUC value near to 1 represents the good classification capability of
classifier, center line represents that classification is perform by chance
whereas value near to 0 represents the poor classification capability of
a classifier. For both classes the ROC curve in our case are near to top
left represents that AUC value is near to 1.
The best performance of ANN classifier at specific period is shown
in Fig. 7. As seen in Fig. 7, best validation performance of 0.055788 is
obtained at epoch 2.
To show the network verification of training, validation and testing
error, histogram is shown in Fig. 8. Verification is not as good as the
best collection of data points represents the outliers. Part of histogram
displaying the null line deviation provides the basis for setting the limit,
this need to categorize the outliers based on chosen attribute values
being perfected and unperfected.
Similar analysis is valid for ICA and LDA.
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
6.1.2. Analysis for NCA
The analysis of NCA technique is describe in following discussions.
In NCA, 10-fold cross validation was used to measure the finest parameter value for regularization corresponding to the lowest possible
loss of discrimination. NCA results produced the best classifying loss of
6.6667e−04 for subject ‘‘av’’ with 3 channels as shown in Fig. 9 at the
finest regularization value of 4.0100𝑒−04
The NCA model was executed on an attribute matrix with the best
value for the regularization parameter and hence the weight of each
attribute was calculated. Attributes with a weight exceeding 5 percent
of total attribute weight were nominated to differentiate between the
different MI functions. The Fig. 10 Indicates the features weight according to their indices. The essential features in the order were listed: 𝑀10
to 𝑀16 and 𝑀29.
84
85
86
87
88
89
90
91
92
93
94
95
96
97
M.T. Sadiq et al.
Fig. 6. Receiver Operating Curve of training, validation and testing for ANN classifier. (a) Training ROC (b) Validation ROC (a) Test ROC (a) All ROC.
Fig. 7. (a) Best validation performance at epoch 2 is 0.055788 (b) zoom version of part (a).
1
2
6.1.3. Analysis using all channel selection criteria, statistical measures and
classifiers
3
In this section, we describe the details of results using 18, 7 and 3
4
channels by employing ANN, FFNN, CFFNN, RNN, PNN, and MLP with
5
𝑆𝑒𝑛 , 𝑃𝑟𝑒 , 𝐴𝑐𝑐 , 𝐹1 , 𝑆𝑝𝑒 , and 𝐾𝑐𝑜 (which are calculated from confusion
matrix) for PCA, ICA, LDA and NCA. Due to different mental and
physical nature of subjects, we obtained unique features combination
for every subject.
The results obtained by PCA and ICA are shown in Table 4 whereas
the results achieved by LDA and NCA for different channel selection
criteria can be seen in Table 5. The best classification results are shown
6
7
8
9
10
11
M.T. Sadiq et al.
Table 4
Classification outcome of three channel criteria strategies by PCA and ICA.
Subject
Classifier
Strategy 1
PCA
Strategy 2
ICA
Strategy 3
PCA
ICA
PCA
ICA
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
‘‘aa’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
80
46.7
69.2
80
69.2
54.6
80
80
40
85.7
80
85.7
55.6
80
80
45
75
80
75
55
80
100
100
100
100
90.9
91.7
100
94.7
97
69.2
94.7
68.8
70.8
94.7
97.2
97.2
77.8
97.2
75
77.8
97.2
100
100
90
100
100
72.7
100
100
100
90
100
100
77.8
100
100
100
90
100
100
75
100
100
66.7
66.7
100
100
100
100
100
66.7
66.7
100
75
75
100
100
66.7
66.7
100
83.3
83.3
100
87.5
75
77.8
87.5
69.2
83.3
87.5
75
87.5
72.7
75
85.7
64.3
75
80
80
75
80
75
70
80
92.9
80
80
92.9
80
100
92.9
94
66.7
66.7
94
66.7
77.8
94
92.8
71.4
71.4
92.8
71.4
85.7
92.8
‘‘al’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
90
75
87.5
90
61.5
70
90
90
66.7
75
90
71.4
70
90
90
70
80
90
65
70
90
100
83.3
81
100
76.5
62.5
100
100
83.3
93.3
100
73.7
75
100
100
83.3
86
100
75
66.7
100
77.8
40
66.7
77.8
77.8
45.5
77.8
72.7
40
75
72.7
72.7
44.4
72.7
75
40
70
75
75
45
75
100
100
100
100
100
100
100
100
75
75
100
75
100
100
100
83.3
83.3
100
83.3
100
100
100
87.5
100
100
100
62.5
100
83.3
75
83.3
83.3
83.3
58.3
83.3
90
80
90
90
90
60
90
85.7
75
83.3
85.7
80
100
85.7
85.7
83.3
75
85.7
66.7
77.8
85.7
85.7
78.5
78.5
85.7
71.4
85.7
85.7
‘‘av’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
72.7
66.7
57.4
72.7
64.3
64.3
72.4
77.6
63.6
66.7
77.6
83.3
83.3
77.6
75
65
60
75
70
70
75
80
68
77.3
80
73.9
65.4
80
87.5
90.1
92.9
87.5
92.3
90
87.5
83.3
75
83.3
83.3
80.6
72.2
83.3
81.8
72.7
81.8
81.8
81.8
61.5
81.8
88.9
77.8
88.9
88.9
88.9
71.4
88.9
85
75
90
85
85
65
85
83.3
83.3
83.3
83.3
66.7
66.7
83.3
83.3
83.3
83.3
83.3
66.7
66.7
83.3
83.3
83.3
83.3
83.3
66.7
66.7
83.3
80
77.8
80
80
66.7
83.3
80
80
72.7
80
80
63.6
64.3
80
80
75
80
80
65
70
80
87.5
55.7
63.4
87.5
50
44.4
87.5
100
80
100
100
50
40
100
92.8
71.4
71.4
92.8
50
42.8
92.8
‘‘aw’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
85.7
50
87.5
85.7
66.7
55
85.7
69.3
50
75
69.3
57.1
55
69.3
75
50
75
75
60
55
75
80.9
73.9
77.3
80.9
66.7
85
80.9
93.3
92.3
92.9
93.3
83.3
84.6
93.3
86.1
80.6
83.3
86.1
72.2
75
86.1
100
81.8
80
100
81.8
66.7
100
90.9
88.9
80
90.9
88.9
63.6
90.9
95
85
80
95
85
65
95
100
100
100
100
66.7
75
100
100
100
100
100
66.7
100
100
100
100
100
100
66.7
83.3
100
80
46.7
69.2
80
69.2
54.6
80
80
40
85.7
80
85.7
55.6
80
85
45
75
85
75
55
80
100
100
87.5
100
100
100
100
100
100
100
100
87.5
100
100
100
100
92.8
100
92.8
100
100
‘‘ay’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
90
90
70
90
90
66.7
90
90
90
70
90
90
57.1
90
90
90
70
90
90
60
90
100
88.2
85.7
100
100
88.9
100
81.8
84.2
72.7
81.8
78.3
62.9
81.8
88.9
86.1
77.8
88.9
86
69.4
88.9
72.7
63.6
77.8
72.7
58.3
58.3
72.7
77.8
66.7
72.7
77.8
62.5
62.5
77.8
75
65
75
75
60
60
75
100
100
100
100
66.7
100
100
100
100
100
100
66.7
100
100
100
100
100
100
66.7
100
100
80
80
69.2
80
69.3
54.6
80
80
80
85.7
80
85.7
55.6
80
80
80
75
80
75
55
80
100
85.7
100
100
85.7
80
100
87.5
85.7
70
87.5
85.7
66.7
87.5
92.8
85.7
78.5
92.8
85.7
71.4
92.8
Average
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
83.5
69
74.3
83.5
70.3
62.1
83.5
81.4
62.1
74.5
81.4
77.5
64.2
81.4
82
64
72
82
72
62
82
92.9
82.7
84.3
92.2
81.6
78.8
92.2
91.5
89.4
84.2
91.5
79.3
76.7
91.5
91.1
84.4
81.6
91.5
77.8
72.2
91.1
86.5
71.6
79.3
86.5
67.8
60.9
86.5
85.9
74.7
81.3
85.9
82.6
63.9
85.9
86
73
81
86
81
62
86
96.7
90
90
96.7
80
83.3
96.7
96.7
85
85
96.7
70
83.3
96.7
96.7
86.7
86.7
96.7
73.3
86.7
96.7
85.5
73.4
79.2
85.5
74.9
67.7
85.5
79.7
71
81.5
79.7
80.8
59.6
79.7
83
72
79
83
76
60
83
92.9
79.3
82.8
92.9
79.1
84.9
92.9
93.4
83.1
82.3
93.4
71.1
72.5
93.4
92.7
81.4
78.5
92.7
74.3
77.1
92.7
Fig. 8. Error histogram of training, validation and testing state with 20 bins.
Fig. 9. Plot between regularization parameters and mean loss values.
1
2
3
4
5
6
7
8
9
by bold text. As shown in those Tables, the most accurate attributes
were unique in each classifier, so each combination provides different
classification results.
For PCA, it is evident from Table 4, the combination of 7 channels
and chosen principal components results in 100%, 75%, 90%, 85%
and 75% classification accuracy for subjects ‘‘aa’’, ‘‘al’’, ‘‘av’’, ‘‘aw’’
and ‘‘ay’’ by employing ANN, CFNN and MLP respectively. The mean
classification accuracy for this case is 86%, which is higher than the
results obtained by 18 and 3 channels as shown in Table 4. It is also
worth noting that there is significance difference among 𝑆𝑒𝑛 , 𝑃𝑟𝑒 , 𝐴𝑐𝑐 ,
𝐹1 , 𝑆𝑝𝑒 , and 𝐾𝑐𝑜 which showed the instability of PCA for the detection
of different MI tasks.
10
11
12
For ICA with 7 channels, we obtained 100% 𝑆𝑒𝑛 , 𝑃𝑟𝑒 , 𝐴𝑐𝑐 , 𝐹1 , 𝑆𝑝𝑒 ,
and 𝐾𝑐𝑜 for subjects ‘‘aa’’, ‘‘al’’, ‘‘aw’’ and ‘‘ay’’ by utilizing ANN,CFNN
and MLP classifiers.
13
14
15
M.T. Sadiq et al.
Table 5
Classification outcome of three channel selection criteria by LDA and NCA.
Subject
Classifier
Strategy 1
LDA
Strategy 2
NCA
NCA
LDA
NCA
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
𝑆𝑒𝑛
𝑆𝑝𝑒
𝐴𝑐𝑐
‘‘aa’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
98.9
98.8
98.9
98.9
99.2
92.7
98.9
98.1
96.1
96
98.1
95.9
89.5
98.1
98.5
97.4
97.5
98.5
97.5
91
98.5
99.9
96.4
98.2
99.9
96.8
99.3
99.9
100
99.8
100
100
99.9
99.9
100
99.9
98.1
99.1
99.9
98.4
99.6
99.9
100
100
100
100
100
97.7
100
100
100
100
100
100
99.9
100
100
100
100
100
100
98.8
100
100
100
100
100
100
100
100
100
100
100
100
100
99.4
100
100
100
100
100
100
99.7
100
96.5
96.4
94.8
96.5
94.1
63.9
96.5
96.4
96.3
94.4
96.4
93.7
43.5
100
98.2
98.2
97.2
98.2
96.9
71.8
98.2
99.9
99.4
99.9
99.9
98.8
99.3
99.9
99.9
98.7
99.9
99.9
99.9
99.6
99.9
99.9
99.1
99.9
99.9
99.3
99.4
99.9
‘‘al’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
96.9
96.9
96.9
96.9
96.9
96.9
96.9
93.4
93.4
93.4
93.4
93.4
93.4
93.4
95.1
95.1
95.1
95.1
95.1
95.1
95.1
100
100
100
100
100
94.9
100
100
100
100
100
100
93.2
100
100
100
100
100
100
94.1
100
96.9
96.9
96.9
96.9
96.9
96.9
96.9
93.4
93.4
93.4
93.4
93.4
93.4
93.4
95.1
95.1
95.1
95.1
95.1
95.1
95.1
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
97.4
97.8
96.4
97.4
96.1
97.2
97.4
96.4
96.4
96.4
96.4
96.2
64.4
96.4
96.9
97.1
96.4
96.9
96.1
92.1
96.9
100
98.8
98.4
100
99.8
99.8
100
100
99.4
98.4
100
99.1
99.1
100
100
99.1
98.4
100
99.4
99.4
100
‘‘av’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
99.9
99.7
99.9
99.9
99.9
84.7
99.9
98.9
98.3
98.9
98.9
98.5
81.8
98.9
99.4
99.1
99.4
99.4
99.2
83.2
99.4
100
100
100
100
100
100
100
100
100
100
100
100
70.1
100
100
100
100
100
100
78.7
100
99.6
99.1
98.6
99.6
98.5
95.3
99.6
99.6
97.8
99.7
99.6
99.5
98.2
99.6
99.6
98.4
99.1
99.6
99.1
96.7
99.6
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
92.7
91.4
92.4
92.7
92.3
80.8
92.7
94.8
92.6
94.1
94.8
94.5
87.9
94.8
93.7
91.9
92.3
93.7
93.4
83.9
93.7
96.9
94.8
95.8
96.9
95.2
87.5
96.9
96.9
95.7
96.9
96.9
97.9
99.2
96.9
96.9
95.6
96.4
96.9
96.5
92.6
96.9
‘‘aw’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
99.9
99.4
99.4
99.9
97.5
97.3
99.9
99.8
99.6
99.1
99.8
99.2
93.6
99.8
99.9
99.5
99.2
99.9
98.3
95.3
99.9
98.4
98.2
98.2
98.4
98.4
98.3
98.4
100
99.9
99.9
100
100
98.7
100
99.2
99.1
99.1
99.2
99.2
98.7
99.2
100
98.9
99.5
100
99.5
96.3
100
100
99.6
100
100
100
99.1
100
99.9
99.2
99.7
99.9
99.6
97.7
99.9
100
100
100
100
100
95.4
100
100
90
90
100
100
97
100
100
100
100
100
100
96.2
100
97.2
97.2
94.3
97.2
94.3
100
97.2
97.2
97.2
97.1
97.2
96.9
78.2
97.2
97.2
97.2
95.6
97.2
95.6
86
97.2
99.9
99.8
99.9
99.9
99.3
100
99.9
99.9
99.8
99.9
99.9
99.5
99.9
99.9
99.9
99.8
99.9
99.9
99.5
99.9
99.9
‘‘ay’’
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
98.8
97.9
95.7
98.8
96.7
94.1
98.8
99.2
99
99.4
99.2
99.3
94.3
99.2
99.2
98.5
97.5
99.2
97.9
94.6
99.2
93.8
92.8
91.7
93.8
91.8
77.9
93.8
96.9
94.6
98.5
96.9
90
88.7
96.9
95.3
94.5
94.8
95.3
94.7
92.5
95.3
100
100
100
100
99.5
97.6
100
100
100
100
100
99.4
99.9
100
100
100
100
100
99.5
98.7
100
100
99.7
100
100
99.9
97.8
100
100
99.6
100
100
100
99.8
100
100
99.7
9.9
100
99.9
98.7
100
98
95.5
97.3
98
97.4
93.7
98
95.4
93.7
95.4
95.4
95.9
89.9
95.4
96.7
94.6
96.3
96.7
96.7
91.8
96.7
96.9
93.5
97.9
96.9
98.8
99.6
96.9
96.9
92.7
96.9
96.9
96
99.6
96.9
96.9
93.1
97.4
96.9
97.4
99.6
96.9
Average
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
98.9
98.5
98.2
98.9
98
93.1
98.9
97.9
97.3
97.4
97.9
97.3
90.5
97.9
98.4
97.2
97.7
98.4
97.6
91.8
98.4
98.4
97.5
98
98.4
97.4
94.1
98.4
99.4
98.9
99.4
99.4
97.9
89.8
99.4
98.9
98.3
98.7
98.9
98.5
90.7
98.9
99.3
98.9
99
99.3
98.9
96.8
99.3
98.6
98.2
98.6
98.6
98.5
98.1
98.6
98.9
98.5
98.8
98.9
98.7
97.4
98.9
100
99.9
100
100
99.9
98.6
100
100
97.9
100
100
100
99.2
100
100
99.9
100
100
99.9
98.9
100
96.4
95.7
95
96.4
94.8
87.1
96.4
96.8
95.9
96.6
96.8
96.7
84.1
96.8
96.5
95.8
95.6
96.5
95.7
85.1
96.5
98.1
97.3
98.4
98.1
98.4
96
98.1
98.1
97.3
98.4
98.1
98.5
99.5
98.1
98.1
97.3
98.3
98.1
98.4
98.2
98.1
Fig. 10. Features with different weights.
1
2
3
4
5
6
7
8
Strategy 3
LDA
Likewise, LDA with 7 channels provide 100%, 95.1%, 99.6%, 99.9%
and 100% classification outcome for subjects ‘‘aa’’, ‘‘al’’, ‘‘av’’, ‘‘aw’’
and ‘‘ay’’ respectively by utilizing same classifiers as mentioned for ICA.
The detailed results obtained by ICA and LDA with different decoded
channels are shown in Table 4 and Table 5 respectively. The average
𝑆𝑒𝑛 , 𝑃𝑟𝑒 , 𝐴𝑐𝑐 , 𝐹1 , 𝑆𝑝𝑒 , and 𝐾𝑐𝑜 are 96.7%, 97.5%, 96.7%, 96.6%, 96.7%
and 100% for ICA, and 99.3%, 98.5%, 98.9%, 98.9%, 98.6% and 97.9%
for LDA by employing ANN, CFNN and MLP classifiers. These results
indicate the less variations among different measures and shows the
fair nature of ICA and LDA methods in detection of different MI states
in comparison with PCA.
The classification results obtained by NCA are given in Table 5. We
obtained 100% average 𝑆𝑒𝑛 , 𝑃𝑟𝑒 , 𝐴𝑐𝑐 , 𝐹1 , 𝑆𝑝𝑒 , and 𝐾𝑐𝑜 , with 7 channels
and ANN, CFNN and MLP classifiers, which indicate the significance of
features obtained by NCA for MI EEG signal classification. These results
also indicate that NCA with 7 channels increase the classification performance measures by utilizing less number of features in comparison
with PCA, ICA and LDA.
In Fig. 11(a)–(c), precision, F-measure and kappa statistics values
obtained by different dimension reduction techniques with 7 channels
are shown by bar graphs. Each bar graph is labeled with the average
value in percentage. As represented in Fig. 11(a)–(c), for PCA, high
variations can be seen among different values by using six classifiers. The NCA provide maximum results with least variations among
different measures by utilizing ANN, CFNN and MLP classifiers
The results obtained for dataset IVb are shown in Table 6. These
results also provide 100% average 𝑆𝑒𝑛 , 𝑃𝑟𝑒 , 𝐴𝑐𝑐 , 𝐹1 , 𝑆𝑝𝑒 , and 𝐾𝑐𝑜 , for
NCA with 7 channels.
The Table 7 contains the probability (P) values obtained by the
Kruskal–Wallis (KW) test applied on features selected by NCA with
7 channels. As seen in Table 7, the values for features are extremely
low which indicate the significance of features that help to obtain
outstanding classification results in NCA.
These results conclude that 7 channels, suitable feature selection
and ANN, CFNN and MLP classifiers provide the best combination to
achieved benchmark classification output for different subjects. These
results conclude that NCA is a efficient feature selection tool for subjects with different mental and physical nature and can be used for
efficient BCI systems.
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
M.T. Sadiq et al.
Fig. 11. (a) Precision (b) F-measure (c) Kappa-Coefficient for NCA with 7 Channels.
M.T. Sadiq et al.
Table 6
Classification outcome of three channel selection criteria for Dataset IVb.
DR techniques
Classifier
Strategy 1
Strategy 2
Strategy 3
𝑆𝑒𝑛
𝑃𝑟𝑒
𝐴𝑐𝑐
𝐹1
𝑆𝑝𝑒
𝐾𝑐𝑜
𝑆𝑒𝑛
𝑃𝑟𝑒
𝐴𝑐𝑐
𝐹1
𝑆𝑝𝑒
𝐾𝑐𝑜
𝑆𝑒𝑛
𝑃𝑟𝑒
𝐴𝑐𝑐
𝐹1
𝑆𝑝𝑒
𝐾𝑐𝑜
PCA
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
85.5
60
65
85.5
65
70
85.5
88.3
60
65
88.3
65
70
88.3
86.7
60
65
86.7
65
70
86.7
88.9
60
65
88.9
65
70
88.9
87.9
60
65
87.9
65
70
87.9
73.3
60
65
73.3
65
70
73.3
97.3
96.2
97.3
97.3
97.3
88.8
97.3
98.3
97.2
98.3
98.3
98.3
88.6
98.3
97.8
96.7
97.8
97.8
97.8
88.6
97.8
97.8
96.7
97.8
97.8
97.8
88.6
97.8
98.3
97.2
98.3
98.3
98.3
88.4
98.3
95.6
93.3
95.6
95.6
95.6
77.2
95.6
88.8
88.8
70
88.8
70
88.8
88.8
88.6
88.6
70
88.6
70
88.6
88.6
88.6
88.6
70
88.6
70
88.6
88.6
88.6
88.6
70
88.6
70
88.6
88.6
88.4
88.4
70
88.4
70
88.4
88.4
77.2
77.2
70
77.2
70
77.2
77.2
ICA
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
83.3
100
100
100
100
100
100
100
94.4
94.4
94.4
94.4
94.4
94.4
94.4
97.2
97.2
97.2
97.2
97.2
97.2
97.2
97.1
97.1
97.1
97.1
97.1
97.1
97.1
94.7
94.7
94.7
94.7
94.7
94.7
94.7
94.4
94.4
94.4
94.4
94.4
94.4
94.4
87.5
87.5
87.5
87.5
87.5
87.5
87.5
100
100
100
100
100
100
100
92.9
92.9
92.9
92.9
92.9
92.9
92.9
93.3
93.3
93.3
93.3
93.3
93.3
93.3
100
100
100
100
100
100
100
85.7
85.7
85.7
85.7
85.7
85.7
85.7
LDA
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
98.4
97.8
99.7
98.4
98.4
92.5
98.4
97.5
95.4
94.8
97.5
96.9
92.9
97.5
97.9
96.6
97.3
97.9
97.6
92.7
97.9
97.9
96.6
97.2
97.9
97.6
92.7
97.9
97.5
95.5
95.1
97.5
96.9
92.8
97.5
95.9
93.3
94.6
95.9
95.3
85.3
95.9
99.6
99.7
99.6
99.6
99.4
96.9
99.6
99.7
97.8
99.7
99.7
99.9
99.8
99.7
99.7
98.8
99.7
99.7
99.6
98.3
99.7
99.7
99.8
99.7
99.7
99.6
98.4
99.7
99.7
97.8
99.7
99.7
99.9
99.8
99.7
99.7
97.5
99.3
99.7
99.3
96.7
99.7
99.7
99.7
99.6
99.7
99.4
96.9
99.7
99.7
97.8
99.7
99.7
99.9
99.8
99.7
99.7
98.8
99.7
99.7
99.6
98.3
99.7
99.7
98.8
99.7
99.7
99.6
98.4
99.7
99.7
97.8
99.7
99.7
99.9
99.8
99.7
99.7
97.5
99.3
99.7
99.3
96.7
99.7
NCA
ANN
MNN
FFNN
CFNN
RNN
PNN
MLP
97.9
96.7
94.5
97.9
95.8
96.4
97.9
97.9
98.9
98.8
97.9
99.2
99.9
97.9
97.9
97.8
96.5
97.9
97.4
98.1
97.9
97.9
97.8
96.6
97.9
97.5
98.1
97.9
97.9
98.8
98.7
97.9
99.2
99.9
97.9
97.9
95.5
93.1
97.9
94.8
96.2
97.9
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
99.7
Table 7
P values for NCA features with 7 channels.
Subjects
P values by KW test
‘‘aa’’
‘‘al’’
‘‘av’’
‘‘aw’’
‘‘ay’’
‘‘IVb’’
F1
F2
F3
F4
F5
F6
F7
F8
0
4.1 × 10−248
2.99 × 10−141
0
9.9 × 10−3
3.67 × 10−193
4.39 × 10−66
1.52 × 10−126
2.70 × 10−199
9.35 × 10−116
9.9 × 10−3
–
0
1.37 × 10−139
0
–
1.58 × 10−10
—
0
6.3 × 10−2
9.24 × 10−313
—
7.35 × 10−3
–
0
2.010 × 10−185
0
–
9.9 × 10−3
—
1.58 × 10−261
1.807 × 10−75
2.80 × 10−241
—
9.15 × 10−3
–
3.57 × 10−09
8.99 × 10−207
2.04 × 10−16
–
0
—
0.75
–
0
–
–
–
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
6.1.4. Timing execution of proposed methodology
In our experiments, we obtained maximum classification accuracy
by utilizing channel selection criteria 2 i.e. with 7 channels so we have
presented the execution time in seconds for this strategy in terms of
training, testing and total algorithm time in Fig. 12. This execution
time is calculated by utilizing MLP classifier for each subject. As seen
in Fig. 12, the total algorithm execution time increases due to the
complexity of feature selection algorithm. PCA provide the minimum
execution time whereas NCA provides the maximum execution time
in comparison with all other dimensionality reduction techniques. Although the execution time of NCA is more due to loop used for the
calculation of classification cost, this elapsed time is a single time
procedure so the training and testing execution period is very less
in comparison with other dimension reduction techniques, which is
required for online BCI applications. It is also worth noting that we
calculated time for all the trials by utilizing MATLAB software with 8
GB RAM on personal computer, the time required for single trial is still
very less and can be further reduced by utilizing powerful system and
software.
20
6.2. Subject independent results
21
22
23
24
25
Because of the dynamic characteristics of EEG signals, classifiers
training and testing are highly subject specific, and therefore, subject
dependent MI EEG signal identification strategies were formulated in
the literature (Ince et al., 2009; Kevric & Subasi, 2017; Li et al., 2011;
Lu et al., 2010; Siuly & Li, 2012; Song & Epps, 2007; Wang et al.,
2016; Zhang et al., 2013). Nevertheless, in reality it is incredibly hard
and tedious for stroke subjects for doing exhaustive training sessions
to use a particular device, and hence researchers (Joadder et al.,
2019) recently introduced a subject independent (SI) method for the
identification of MI EEG signals. However, it should be noted that such
a framework’s identification performance is low, a huge amount of
electrodes were also utilized in training phase.
In the present research, we have also sought to investigate the
effectiveness of the suggested method for the identification of MI
EEG signals in SI case. We adopted the similar procedure as utilized
in Joadder et al. (2019) for SI experiments where the first four subjects
in dataset IVa were employed for training purposes whereas subject
five was chosen as a test subject. In Fig. 13 we have shown the block
diagram of SI framework with NCA as it provided the best results in
this study. As it is seen in Fig. 13, there are two main building blocks
consists of training with first four subjects and testing with fifth subject.
All the results were obtained by ten-fold cross validation strategy with
MLP classifier. The results obtained by SI case are presented in Fig. 14.
We obtained 93%, 93%, 92.9%, 93%, 96.4% and 90%, of average 𝑆𝑒𝑛 ,
𝑆𝑝𝑒 , 𝐴𝑐𝑐 , 𝑃𝑟𝑒 , 𝐹1 and 𝐾𝑐𝑜 . These low deviations in different measures
summarize that the proposed SI framework have unbiased chances in
recognition of each MI task.
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
7. Discussions
48
The purpose of this study is to increase the classification outcome
of different MI signals by selection of efficient features. The objectives
obtained in study can be summarize as
49
50
51
M.T. Sadiq et al.
Fig. 12. Timing Execution with (a) PCA (b) ICA (c) LDA and (d) NCA.
Fig. 13. Schematic diagram of the suggested technique for interpretation of the MI tasks with subject independent case.
1
2
3
4
5
6
1. In the pre-processing unit, temporal and spatial filtering is
applied to eliminate cognitive noise and interference between
channels. The proposed system is made resilient against noise by
this pre-processing unit. To show the effectiveness of proposed
two step filtering on classification accuracy, we compare the
results with and without pre-processing in our analysis, yet
reviled results with pre-processing module are at least 5%–12%
7
improved in term of overall classification accuracy for NCA with
8
seven channels by utilizing MLP classifier as shown in Fig. 15.
2. In order to identify the appropriate channels relevant to MI
9
10
information, we select different combinations of channels based
11
M.T. Sadiq et al.
Table 8
Comparison with other studies.
Papers by
Methods employed
Classification accuracy (%)
‘‘aa‘‘
‘‘al’’
‘‘av‘‘
‘‘aw’’
‘‘ay‘‘
‘‘IVb’’
Average
The proposed
EWT+NCA+ANN/CFNN/MLP
Number of Channels
Number of Features
100
7
6
100
7
8
100
7
8
100
7
2
100
7
7
100
7
1
100
–
–
Siuly and Li (2012)
CC+tuned LS-SVM
Number of Channels
Number of Features
97.9
118
6
99.2
118
6
98.8
118
6
93.4
118
6
89.4
118
6
97.9
118
6
96.1
–
–
Ince et al. (2009)
CS+SVM
Number of Channels
Number of Features
95.6
33
16
99.7
33
16
90.5
33
16
98.4
33
64
95.7
33
64
–
–
–
96
–
–
Wang et al. (2016)
OA+HOS+NB
Number of Channels
Number of Features
97.9
118
11
97.9
118
11
98.3
118
11
94.5
118
11
93.7
118
11
91.9
118
11
95.6
–
–
Sadiq et al. (2019a)
EWT+IA2+tuned LS-SVM
Number of Channels
Number of Features
94.5
18
10
91.7
18
10
97.2
18
10
95.6
18
10
97
18
10
–
–
–
95.2
–
–
Kevric and Subasi (2017)
MSPCA +WPD+HOS + k-NN
Number of Channels
Number of Features
96
3
6
92.3
3
6
88.9
3
6
95.4
3
6
91.4
3
6
–
–
–
92.8
–
–
Li et al. (2011)
Clustering+LS-SVM
Number of Channels
Number of Features
92.6
118
9
84.9
118
9
90.8
118
9
86.5
118
9
86.7
118
9
–
–
–
88.3
–
–
Song and Epps (2007)
SSRCSP
Number of Channels
Number of Features
87.4
18
20
97.4
18
20
69.7
18
20
96.8
18
20
88.6
18
20
–
–
–
87.9
–
–
Lu et al. (2010)
R-CSP through aggregation
Number of Channels
Number of Features
76.8
118
6
98.2
118
6
74.5
118
6
92.2
118
6
77
118
6
–
–
–
83.7
–
–
Zhang et al. (2013)
Z-LDA
Number of Channels
Number of Features
77.70
118
6
100
118
6
68.4
118
6
99.6
118
6
59.9
118
6
–
–
–
81.1
–
–
Fig. 14. Results for subject independent MI EEG signal classification.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
on physiological understanding according to 10–20 system standard and so, in our research, only a combination of 18, 7 and
3 channels around the motor cortex brain area was chosen to
provide adequate identification results among various MI tasks.
3. The EWT procedure was developed to determine the complex
nature of the EEG signals, and 10 modes were selected by hit and
trial manner for each channel to distinguish between different
MI tasks. Although other modes are physiologically meaningful
but for the classification of MI tasks they do not contribute.
4. For channel selection strategies 1, 2 and 3, we grouped all modes
in sequential order and acquired 180, 70 and 30 modes in all.
Every mode is perceived as one feature vector in the analysis.
5. We next applied PCA, ICA, LDA and NCA on each feature matrix
obtained from three channel selection strategies to acquired
various components and coefficients from PCA, ICA and LDA.
Fig. 15. Effect of pre-processing (two-step filtering) on classification accuracy.
To select the efficient components and coefficients, a correlation
based criteria was implemented to further reduce the feature
dimension. For NCA a best value of regularization parameter was
selected which provide the minimum classification loss.
6. For the fair analysis of the proposed framework, extensive evaluation parameters (such as recall, sensitivity, precision, accuracy,
F1 score, specificity and kappa coefficient) with various neural
networks are evaluated and higher values of these parameters
showed the success of proposed method.
7. Our proposed experimental results showed that NCA provides
the higher classification results in comparison with PCA, ICA
and LDA as showed in Tables 4 and 5. This is because, PCA is a
linear transformation approach for discovering vector data representation, obtaining total variance in an unsupervised fashion.
It can result in data loss during the lower dimension projecting
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
M.T. Sadiq et al.
Fig. 16. Comparison of subject independent outcome with other studies.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
operation. For situations where the average and covariance are
not sufficient to identify datasets, PCA often suffers, and it tries
to seek linear correlation among variables, which is often not
desired. The ICA algorithm use the higher-order statistics (HOS)
to resolve the linear model’s blind sources separation (BSS)
limitations. ICA transform the subspace of the signal into independent components invisible to PCA, however HOS are highly
susceptible to outliers. LDA is a parametric process for learning
a linear set of features in a manner that discriminatory data is
retained by class labels. LDA makes the assumption that data
follows the Gaussian probability and fails if the discriminatory
data is not in the mean but possibly in the variance of the information. On the other hand, NCA ranks the features according
to weights to increase the classification accuracy and does not
lose any information during dimensionality reduction (Raghu &
Sriraam, 2018), therefore provides more classification accuracy
in our experiments in comparison with PCA, ICA and LDA.
It is also worth mentioning here, in our previous study (Sadiq
et al., 2019a), we employed the least square version of SVM
classifier and obtained an average classification accuracy of
95.19%. In the present proposed study, we again repeated the
experiments with the SVM and obtain a classification accuracy
of 94.7%, 92.2%, 97.77%, 95.7%, and 97% for the subjects ‘‘aa’’,
‘‘av’’, ‘‘al’’, ‘‘aw’’, and ‘‘ay’’ respectively. The average classification accuracy of 95.47% is obtained with the LS-SVM, which is
much closed to the results obtained in our previous study. As in
the proposed study, EWT features are tested with several neural
networks, and thus, we conclude that neural networks help in
obtaining much better results for the EWT.
8. At last, to show the success of the proposed approach, a comparison is made with several other studies applied on datasets
IVa and IVb. As seen in Table 8, our proposed approach with
channel selection strategy 2 and NCA provide benchmark classification accuracy of 100% for subjects ‘‘aa’’, ‘‘av’’, ‘‘al’’, ‘‘aw’’,
‘‘ay’’, and ‘‘IVb’’. This shows that proposed method is suitable for subjects with different training samples. The CC+tuned
LS-SVM (Siuly & Li, 2012) and CS+SVM (Ince et al., 2009)
achieved classification accuracy of 96.08% and 96% with 2nd
and 3rd ranks. The Z-LDA (Zhang et al., 2013) ranked at last
with an average classification accuracy of 81.1%. In comparison
with those studies in Table 8, our proposed approach ‘‘two-step
filtering+EWT+NCA+ANN/CFNN/MLP’’ provide classification
improvement of 3.92%–18.9%. Such an increase in accuracy
may alleviate subjects to explain their MI assignments more
clearly. As an illustration, handicapped people might be able
to regulate their wheelchair more effectively, and rehabilitated
patients might be able to increase their therapeutic activities
with adequate input after they perform the required action.
Moreover, other studies in Table 8, ‘‘CC+LS-SVM’’ (Siuly & Li,
2012), ‘‘OA+NB’’ (Wang et al., 2016), ‘‘RCSP+aggregation’’ (Lu
et al., 2010), ‘‘CS+SVM’’ (Ince et al., 2009), EWT+IA2+HOS
(Sadiq et al., 2019a) and ‘‘CSP+SVM’’ (Song & Epps, 2007)
utilized ‘‘118’’, ‘‘118’’, ‘‘118’’, ‘‘33’’, ‘‘18’’ and ‘‘18’’ number of
channels respectively in comparison of our proposed method
which utilized only 7 channels to achieve benchmark classification outcome. It is also shown in Table 8, the proposed
algorithm has chosen significant features that are particular
for each subject which represent such proposed algorithm is
versatile to adopt in developing a subject-specific BCI system.
9. For subject independent case, the compassion of the proposed
method with other studies is shown in Fig. 16 by bar graphs.
As represented in Fig. 16, the top of each bar is labeled with
maximum average classification accuracy for each study. The
proposed study ranked at number one in term of overall classification accuracy with 92.9%. Our previous study (Sadiq et al.,
2019b) ranked at 2nd with overall classification accuracy of
91.4%, ‘‘CSP+Katz Fractal Dimension+LDA’’ (Joadder et al.,
2019) ranked at 3rd with 84.3% average classification outcome whereas evolutionary based algorithm (Atyabi et al., 2017)
ranked at last with 71.9% results. These results suggest that
the proposed ‘‘two-step filtering+seven channels+regularized
NCA+MLP’’ framework in EWT domain provide up to 21% classification improvement in overall results for SI case. Moreover,
it is important to note that a total of ‘‘68’’, ‘‘118’’, ‘‘22’’ and
‘‘118’’ electrodes have been used in studies (Atyabi et al., 2017;
Devlaminck et al., 2011; Kang et al., 2009; Samek et al., 2013),
while in the proposed research just 7 channels have been used
to produce the best performance in distinction.
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
M.T. Sadiq et al.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
The key benefits of the suggested framework is its robustness against
noise, selection of the reduce features for different subjects, stable evaluation outcome for both subject dependent and subject independent
cases by using 7 channels only. The advantages of the proposed study
reveal that proposed method is suitable for the development of expert
BCI system.
In the present study, the number of channels have been identified
empirically for the EWT. Since one have to choose those channels with
relevant MI information manually, such an empirical selection strategy
takes a long time. To overcome this constraint, our next step work is to
establish adaptive channel discovery algorithms to allow for effective
and scalable signal detection approaches for practical applications. In
addition to the publicly accessible sets of data, our next level is to test
these applied approaches online for some other different applications.
15
8. Conclusion
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
In this work we developed automated methods for the pattern
mining of EEG data using EWT technique with PCA, ICA, LDA and NCA.
To authenticate the reliability of the proposed methods, three different
channel combinations were decoded from two publicly available BCI
competition III datasets. To eliminate the irrelevant components and
coefficients from PCA, ICA and LDA, a correlation based components
and coefficients selection criteria with the best first search technique
was utilized. The regularization parameters of NCA method was also
tuned to select the relevant attributes. All the experiments were investigated by using various statistical measures and neural networks. We
achieved 100% and 92.9% classification accuracy for subject dependent
and subject independent cases respectively by utilizing NCA with 7
channels and MLP classifier which is higher than the other works on
same public available datasets. The computing cost of NCA is higher
than among PCA, ICA and LDA owing to the diversification in the
selection of tuned parameters which govern the function of NCA. Nevertheless, the estimated model’s training and testing times were slower
for NCA than for PCA, ICA and LDA due to reduced attribute space.
In conclusion the combination of two-step filtering, 7 channels, EWT,
NCA and MLP is an effective framework for expert BCI applications.
36
CRediT authorship contribution statement
37
38
39
40
41
42
Muhammad Tariq Sadiq: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original
draft, Writing - review & editing, Visualization. Xiaojun Yu: Validation,
Formal analysis, Investigation, Writing - review & editing, Visualization, Supervision, Project administration. Zhaohui Yuan: Writing review & editing, Visualization, Supervision, Project administration.
43
Declaration of competing interest
44
45
46
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
47
References
48
49
50
51
52
53
54
55
56
57
58
59
60
61
Acharya, U. R., Sree, S. V., Alvin, A. P. C., & Suri, J. S. (2012). Use of principal
component analysis for automatic classification of epileptic EEG activities in
wavelet framework. Expert Systems with Applications, 39(10), 9072–9078.
Atyabi, A., Luerssen, M., Fitzgibbon, S. P., Lewis, T., & Powers, D. M. (2017). Reducing
training requirements through evolutionary based dimension reduction and subject
transfer. Neurocomputing, 224, 19–36.
Bashar, S. K., & Bhuiyan, M. I. H. (2016). Classification of motor imagery movements
using multivariate empirical mode decomposition and short time fourier transform
based hybrid method. Engineering Science and Technology, an International Journal,
19(3), 1457–1464.
Bhattacharyya, S., Sengupta, A., Chakraborti, T., Konar, A., & Tibarewala, D. (2014).
Automatic feature selection of motor imagery EEG signals using differential
evolution and learning automata. Medical & Biological Engineering & Computing,
52(2), 131–139.
Birbaumer, N., Murguialday, A. R., & Cohen, L. (2008). Brain–computer interface in
paralysis. Current Opinion in Neurology, 21(6), 634–638.
Blankertz, B., Muller, K.-R., Krusienski, D. J., Schalk, G., Wolpaw, J. R., Schlogl, A.,
Pfurtscheller, G., Millan, J. R., Schroder, M., & Birbaumer, N. (2006). The BCI
competition III: Validating alternative approaches to actual BCI problems. IEEE
Transactions on Neural Systems and Rehabilitation Engineering, 14(2), 153–159.
Burke, D. P., Kelly, S. P., De Chazal, P., Reilly, R. B., & Finucane, C. (2005).
A parametric feature extraction and classification strategy for brain-computer
interfacing. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 13(1),
12–17.
Cao, L., Chua, K. S., Chong, W., Lee, H., & Gu, Q. (2003). A comparison of PCA, KPCA
and ICA for dimensionality reduction in support vector machine. Neurocomputing,
55(1–2), 321–336.
Chaudhary, S., Taran, S., Bajaj, V., & Siuly, S. (2020). A flexible analytic wavelet
transform based approach for motor-imagery tasks classification in BCI applications.
Computer Methods and Programs in Biomedicine, 187, Article 105325.
Cincotti, F., Mattia, D., Aloise, F., Bufalari, S., Schalk, G., Oriolo, G., Cherubini, A.,
Marciani, M. G., & Babiloni, F. (2008). Non-invasive brain–computer interface
system: towards its application as assistive technology. Brain Research Bulletin,
75(6), 796–803.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions
on Information Theory, 13(1), 21–27.
Daubechies, I. (1992). Ten lectures on wavelets (vol. 61). Siam.
Devlaminck, D., Wyns, B., Grosse-Wentrup, M., Otte, G., & Santens, P. (2011). Multisubject learning for common spatial patterns in motor-imagery BCI. Computational
Intelligence and Neuroscience, 2011.
Dornhege, G., Millán, J. d. R., Hinterberger, T., McFarland, D., & Müller, K.-R. (2007).
Toward brain-computer interfacing (vol. 63). MIT Press Cambridge, MA.
Ebrahimi, T., Vesin, J.-M., & Garcia, G. (2003). Brain-computer interface in multimedia
communication. IEEE Signal Processing Magazine, 20(1), 14–24.
Feng, J. K., Jin, J., Daly, I., Zhou, J., Niu, Y., Wang, X., & Cichocki, A. (2019). An
optimized channel selection method based on multifrequency CSP-rank for motor
imagery-based BCI system. Computational Intelligence and Neuroscience, 2019.
Fielding, A. (2007). Cluster and classification techniques for the biosciences (vol. 260).
Cambridge University Press Cambridge.
Gilles, J. (2013). Empirical wavelet transform. IEEE Transactions on Signal Processing,
61(16), 3999–4010.
Goyal, S., & Goyal, G. K. (2011). Cascade and feedforward backpropagation artificial
neural networks models for prediction of sensory quality of instant coffee flavoured
sterilized drink. Canadian Journal on Artificial Intelligence, Machine Learning and
Pattern Recognition, 2(6), 78–82.
Hall, M. A. (1999). Correlation-based feature selection for machine learning. University of
Waikato Hamilton.
Hossin, M., & Sulaiman, M. (2015). A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management
Process, 5(2), 1.
Ince, N. F., Goksu, F., Tewfik, A. H., & Arica, S. (2009). Adapting subject specific motor
imagery EEG patterns in space–time–frequency for a brain computer interface.
Biomedical Signal Processing and Control, 4(3), 236–246.
Jana, G. C., Swetapadma, A., & Pattnaik, P. K. (2018). Enhancing the performance
of motor imagery classification to design a robust brain computer interface using
feed forward back-propagation neural network. Ain Shams Engineering Journal, 9(4),
2871–2878.
Jansen, B. H., Bourne, J. R., & Ward, J. W. (1981). Autoregressive estimation of short
segment spectra for computerized EEG analysis. IEEE Transactions on Biomedical
Engineering, (9), 630–638.
Jiang, X., Bian, G.-B., & Tian, Z. (2019). Removal of artifacts from EEG signals: a
review. Sensors, 19(5), 987.
Jiao, Y., Zhang, Y., Chen, X., Yin, E., Jin, J., Wang, X., & Cichocki, A. (2018). Sparse
group representation model for motor imagery EEG classification. IEEE Journal of
Biomedical and Health Informatics, 23(2), 631–641.
Jin, Z., Zhou, G., Gao, D., & Zhang, Y. (2018). EEG classification using sparse Bayesian
extreme learning machine for brain–computer interface. Neural Computing and
Applications, 1–9.
Joadder, M. A., Siuly, S., Kabir, E., Wang, H., & Zhang, Y. (2019). A new design
of mental state classification for subject independent BCI systems. IRBM, 40(5),
297–305.
Jurcak, V., Tsuzuki, D., & Dan, I. (2007). 10/20, 10/10, and 10/5 systems revisited:
their validity as relative head-surface-based positioning systems. NeuroImage, 34(4),
1600–1611.
Kang, H., Nam, Y., & Choi, S. (2009). Composite common spatial pattern for
subject-to-subject transfer. IEEE Signal Processing Letters, 16(8), 683–686.
Kevric, J., & Subasi, A. (2017). Comparison of signal decomposition methods in
classification of EEG signals for motor-imagery BCI system. Biomedical Signal
Processing and Control, 31, 398–406.
Kołodziej, M., Majkowski, A., & Rak, R. J. (2012). Linear discriminant analysis as EEG features reduction technique for brain-computer interfaces. Przeglad
Elektrotechniczny, 88, 28–30.
Krepki, R., Blankertz, B., Curio, G., & Müller, K.-R. (2007). The Berlin Brain-Computer
Interface (BBCI)–towards a new communication channel for online control in
gaming applications. Multimedia Tools and Applications, 33(1), 73–90.
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
M.T. Sadiq et al.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
Kronegg, J., Chanel, G., Voloshynovskiy, S., & Pun, T. (2007). EEG-based synchronized
brain-computer interfaces: A model for optimizing the number of mental tasks. IEEE
Transactions on Neural Systems and Rehabilitation Engineering, 15(1), 50–58.
Krusienski, D. J., McFarland, D. J., & Wolpaw, J. R. (2006). An evaluation of autoregressive spectral estimation model order for brain-computer interface applications.
In 2006 International conference of the IEEE engineering in medicine and biology society
(pp. 1323–1326). IEEE.
Li, M.-a., Luo, X.-y., & Yang, J.-f. (2016). Extracting the nonlinear features of motor
imagery EEG using parametric t-SNE. Neurocomputing, 218, 371–381.
Li, Y., & Wen, P. P. (2011). Clustering technique-based least square support vector machine for EEG signal classification. Computer Methods and Programs in Biomedicine,
104(3), 358–372.
Li, Y., & Wen, P. (2013). Identification of motor imagery tasks through CC-LR algorithm
in brain computer interface. International Journal of Bioinformatics Research and
Applications, 9(2), 156–172.
Li, Y., & Wen, P. P. (2014). Modified CC-LR algorithm with three diverse feature sets for
motor imagery tasks classification in EEG based brain–computer interface. Computer
Methods and Programs in Biomedicine, 113(3), 767–780.
Li, M.-a., Zhu, W., Liu, H.-n., & Yang, J.-f. (2017). Adaptive feature extraction of motor
imagery EEG with optimal wavelet packets and SE-isomap. Applied Sciences, 7(4),
390.
Lu, H., Eng, H.-L., Guan, C., Plataniotis, K. N., & Venetsanopoulos, A. N. (2010).
Regularized common spatial pattern with aggregation for EEG classification
in small-sample setting. IEEE Transactions on Biomedical Engineering, 57(12),
2936–2946.
Mandic, D. P., & Chambers, J. (2001). Recurrent neural networks for prediction: learning
algorithms, architectures and stability. John Wiley & Sons, Inc.
Martis, R. J., Acharya, U., & Min, L. C. (2013). ECG beat classification using PCA, LDA,
ICA and discrete wavelet transform. Biomedical Signal Processing and Control, 8(5),
437–448.
Pfurtscheller, G., Neuper, C., Muller, G., Obermaier, B., Krausz, G., Schlogl, A.,
Scherer, R., Graimann, B., Keinrath, C., & Skliris, D. (2003). Graz-BCI: state of the
art and clinical applications. IEEE Transactions on Neural Systems and Rehabilitation
Engineering, 11(2), 1–4.
Polat, K., & Güneş, S. (2007). Classification of epileptiform EEG using a hybrid system
based on decision tree classifier and fast fourier transform. Applied Mathematics and
Computation, 187(2), 1017–1026.
Raghu, S., & Sriraam, N. (2018). Classification of focal and non-focal EEG signals using
neighborhood component analysis and machine learning algorithms. Expert Systems
with Applications, 113, 18–32.
Rodríguez-Bermúdez, G., & García-Laencina, P. J. (2012). Automatic and adaptive
classification of electroencephalographic signals for brain computer interfaces.
Journal of Medical Systems, 36(1), 51–63.
Sadiq, M. T., Yu, X., Yuan, Z., Fan, Z., Rehman, A. U., Li, G., & Xiao, G. (2019a).
Motor imagery EEG signals classification based on mode amplitude and frequency
components using empirical wavelet transform. IEEE Access, 7, 127678–127692.
Sadiq, M. T., Yu, X., Yuan, Z., Zeming, F., Rehman, A. U., Ullah, I., Li, G., & Xiao, G.
(2019b). Motor imagery EEG signals decoding by multivariate empirical wavelet
transform-based framework for robust brain–computer interfaces. IEEE Access, 7,
171431–171451.
Sakhavi, S., Guan, C., & Yan, S. (2018). Learning temporal information for braincomputer interface using convolutional neural networks. IEEE Transactions on Neural
Networks and Learning Systems, 29(11), 5619–5629.
Samek, W., Meinecke, F. C., & Müller, K.-R. (2013). Transferring subspaces between
subjects in brain–computer interfacing. IEEE Transactions on Biomedical Engineering,
60(8), 2289–2298.
View publication stats
Schlögl, A., Neuper, C., & Pfurtscheller, G. (2002). Estimating the mutual information of an EEG-based brain-computer interface. Biomedizinische Technik/Biomedical
Engineering, 47(1–2), 3–8.
Siuly, S., & Li, Y. (2012). Improving the separability of motor imagery EEG signals using
a cross correlation-based least square support vector machine for brain–computer
interface. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(4),
526–538.
Siuly, S., Zarei, R., Wang, H., & Zhang, Y. (2017). A new data mining scheme for
analysis of big brain signal data. In Australasian database conference (pp. 151–164).
Springer.
Song, L., & Epps, J. (2007). Classifying EEG for brain-computer interface: Learning optimal filters for dynamical system features. Computational Intelligence and
Neuroscience, 2007.
Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1), 109–118.
Sturm, B. L. (2013). Classification accuracy is not enough. Journal of Intelligent
Information Systems, 41(3), 371–406.
Subasi, A., & Ercelebi, E. (2005). Classification of EEG signals using neural network and
logistic regression. Computer Methods and Programs in Biomedicine, 78(2), 87–99.
Subasi, A., & Gursoy, M. (2010). EEG signal classification using PCA, ICA, LDA and
support vector machines. Expert Systems with Applications, 37(12), 8659–8666.
Szczuko, P. (2017). Real and imaginary motion classification based on rough set analysis of EEG signals for multimedia applications. Multimedia Tools and Applications,
76(24), 25697–25711.
Taran, S., Bajaj, V., Sharma, D., Siuly, S., & Sengur, A. (2018). Features based
on analytic IMF for classifying motor imagery EEG signals in BCI applications.
Measurement, 116, 68–76.
Thomas, J., Maszczyk, T., Sinha, N., Kluge, T., & Dauwels, J. (2017). Deep learningbased classification for brain-computer interfaces. In 2017 IEEE International
Conference on Systems, Man, and Cybernetics (pp. 234–239). IEEE.
Wang, J.-J., Xue, F., & Li, H. (2015). Simultaneous channel and feature selection of
fused EEG features based on sparse group lasso. Biomed Research International, 2015.
Wang, H., & Zhang, Y. (2016). Detection of motor imagery EEG signals employing
Naïve Bayes based learning process. Measurement, 86, 148–158.
Witten, I. H., Frank, E., & Hall, M. A. (2005). Practical machine learning tools and
techniques (p. 578). Morgan Kaufmann.
Xu, N., Gao, X., Hong, B., Miao, X., Gao, S., & Yang, F. (2004). BCI competition 2003data set IIb: enhancing P300 wave detection using ICA-based subspace projections
for BCI applications. IEEE Transactions on Biomedical Engineering, 51(6), 1067–1072.
Xu, J., Zheng, H., Wang, J., Li, D., & Fang, X. (2020). Recognition of EEG signal motor
imagery intention based on deep multi-view feature learning. Sensors, 20(12), 3496.
Yang, W., Wang, K., & Zuo, W. (2012). Neighborhood component feature selection for
high-dimensional data. Journal of Computational Physics, 7(1), 161–168.
Yu, X., Chum, P., & Sim, K.-B. (2014). Analysis the effect of PCA for feature reduction in
non-stationary EEG based motor imagery of BCI system. Optik, 125(3), 1498–1502.
Zhang, Y., Nam, C. S., Zhou, G., Jin, J., Wang, X., & Cichocki, A. (2018). Temporally
constrained sparse group spatial patterns for motor imagery BCI. IEEE Transactions
on Cybernetics, 49(9), 3322–3332.
Zhang, Y., Wang, Y., Jin, J., & Wang, X. (2017). Sparse Bayesian learning for
obtaining sparsity of EEG frequency bands based feature vectors in motor imagery
classification. International Journal of Neural Systems, 27(02), Article 1650032.
Zhang, R., Xu, P., Guo, L., Zhang, Y., Li, P., & Yao, D. (2013). Z-score linear
discriminant analysis for EEG based brain-computer interfaces. PLoS One, 8(9).
Zhang, X., Yao, L., Wang, X., Monaghan, J., & Mcalpine, D. (2019). A survey on deep
learning based brain computer interface: Recent advances and new frontiers. arXiv
preprint arXiv:1905.04149.
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
Download