Exploiting dimensionality reduction and neural network techniques for the development of expert brain–computer interfaces

Muhammad Tariq Sadiq¹, Xiaojun Yu¹, Zhaohui Yuan*

School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, PR China

Article in Expert Systems With Applications, September 2020. DOI: 10.1016/j.eswa.2020.114031

Highlights

• A two-step filtering technique was adopted for cognitive and external noise removal.
• An automated correlation-based criterion was proposed to select the relevant components and coefficients for PCA, ICA and LDA, respectively.
• The regularization parameter of NCA was tuned to reduce the classification loss.
• Extensive experiments with the PCA, ICA, LDA and NCA techniques, together with several channel selection strategies, neural networks and statistical measures, were conducted in the EWT domain.
• The proposed framework provides 100% and 92.9% classification accuracy for the subject-dependent and subject-independent experiments, respectively.

ARTICLE INFO

Keywords: Electroencephalography; Brain–computer interface; Empirical wavelet transform; Motor imagery; Neighborhood component analysis; Neural networks

ABSTRACT

Background: The analysis and classification of extensive medical data (e.g., electroencephalography (EEG) signals) is a significant challenge in developing an effective brain–computer interface (BCI) system. It is therefore necessary to build an automated classification framework to decode different brain signals.

Methods: In the present study, a two-step filtering approach is utilized to achieve resilience towards cognitive and external noise. Then, the empirical wavelet transform (EWT) and four data reduction techniques, namely principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA) and neighborhood component analysis (NCA), are integrated for the first time to explore the dynamic nature and pattern mining of motor imagery (MI) EEG signals.
Specifically, the EWT helps to explore the hidden patterns of MI tasks by decomposing the EEG data into different modes, where every mode is considered a feature vector in this study, and each data reduction technique is applied to all these modes to reduce the dimension of the resulting large feature matrix. Moreover, an automated correlation-based component/coefficient selection criterion and parameter tuning were implemented for PCA, ICA, LDA and NCA, respectively. For comparison purposes, all experiments were performed on two publicly available datasets (BCI competition III datasets IVa and IVb). The performance of the experiments was verified by decoding three different channel combination strategies along with several neural networks. The regularization parameter tuning of NCA guaranteed improved classification performance with significant features for each subject.

Results: The experimental results revealed that NCA provides an average sensitivity, specificity, accuracy, precision, F1 score and kappa coefficient of 100% for the subject-dependent case, and of 93%, 93%, 92.9%, 93%, 96.4% and 90%, respectively, for the subject-independent case. All these results were obtained with artificial neural networks, cascade-forward neural networks and multilayer perceptron neural networks (MLP) for the subject-dependent case, and with MLP for the subject-independent case, utilizing 7 channels out of a total of 118. Such an improvement in results can help users express their MI activities more clearly. For instance, a physically impaired person will be able to manage a wheelchair quite effectively, and persons undergoing rehabilitation may be able to improve their activities.

∗ Corresponding author. E-mail addresses: tariq.sadiq@mail.nwpu.edu.cn (M.T. Sadiq), XJYU@nwpu.edu.cn (X. Yu), yuanzhh@nwpu.edu.cn (Z. Yuan). ¹ Co-first authors.

https://doi.org/10.1016/j.eswa.2020.114031. Received 10 February 2020; Received in revised form 23 August 2020; Accepted 14 September 2020.

1. Introduction

A Brain–Computer Interface (BCI) system uses an individual's brain signals to link the brain and a computer (Birbaumer et al., 2008). During recent years, BCI has shown extensive contributions towards rehabilitation (Birbaumer et al., 2008; Pfurtscheller et al., 2003) and multimedia applications (Ebrahimi et al., 2003; Krepki et al., 2007; Szczuko, 2017). Electroencephalography (EEG)-based motor imagery (MI) BCI systems are by far the most widely employed practical systems because of their reliability, non-invasiveness, low cost and superb temporal characteristics (Cincotti et al., 2008; Kronegg et al., 2007). Nonetheless, a key challenge for every real-time BCI device is to accurately interpret various MI EEG signals (Siuly & Li, 2012).

Typically, a non-biased automated EEG classification system comprises three elements, i.e., preprocessing, feature extraction and signal classification. Preprocessing is primarily accountable for the noise suppression of the information signals, and several methods have been proposed for this purpose; readers are referred to Jiang et al. (2019) for a comprehensive review of noise removal techniques for EEG signals. The extraction and identification of features are essential components of an automated system for assessing the outcomes of classification, for which a broad range of approaches have been reported to classify MI EEG signals.
For the spectral analysis of EEG signals, Fourier Transform (FT)-based methods have been developed, but these methods do not provide time-domain information (Polat & Güneş, 2007; Rodríguez-Bermúdez & García-Laencina, 2012). The autoregressive (AR) methods are computationally effective; however, they suffer from artifacts, which limits their applicability for practical BCI systems (Burke et al., 2005; Jansen et al., 1981; Krusienski et al., 2006; Schlögl et al., 2002). A variety of common spatial pattern (CSP) methods have been described for feature extraction, which include regularized CSP with selected subjects (SSRCSP), spatially regularized CSP (SRCSP), CSP with Tikhonov regularization (TRCSP) and CSP with weighted Tikhonov regularization (WTRCSP). In the literature, the sparse group representation model (SGRM) (Jiao et al., 2018), the temporally constrained sparse group spatial pattern (TSGSP) method (Zhang et al., 2018), the CSP-rank channel selection for multi-frequency band EEG (CSP-RMF) (Feng et al., 2019), as well as sparse Bayesian extreme learning (Jin et al., 2018; Zhang et al., 2017) have also been proposed. However, there is still a gap for the improvement of classification accuracy, since these methods are not applicable for subjects with small training samples.

More recently, EEG signals have also been extracted and classified using deep learning schemes based on convolutional neural networks (CNN) and recurrent neural networks (RNN) (Sakhavi et al., 2018; Thomas et al., 2017). Readers are referred to Zhang et al. (2019) for recent developments on BCI deep learning systems. Nevertheless, the accuracy rate produced by the majority of these approaches was not significant owing to the unavailability of the extensive data needed for the training phase. Two other drawbacks which may impede the successful applicability of such methods are the required system resources and the computational complexity burdens.

Data decomposition (DD)-based techniques have recently gained prominence in the identification of MI EEG signals. Some notable DD-based methods available in the literature are intrinsic mode functions (IMF) with a least square support vector machine (LS-SVM) classifier (Taran et al., 2018), multivariate empirical mode decomposition (EMD) with FT (Bashar & Bhuiyan, 2016), a comparative study between wavelet packet decomposition (WPD), empirical mode decomposition (EMD) and discrete wavelet transformation (DWT) with higher-order statistics (HOS) (Kevric & Subasi, 2017), as well as the empirical wavelet transform (EWT) with HOS (Sadiq et al., 2019a).

Not all the features extracted from EEG signals are relevant for classification. An excessive number of features not only increases the dimension of the feature matrix but also results in low classification success rates. To reduce the dimension of a large feature matrix, several combinations of features are evaluated in studies (Li et al., 2014; Sadiq et al., 2019a) to decode the best one for classification enhancement. Moreover, several dimension reduction techniques have been utilized to choose the best features for EEG signal classification. Yu et al. extract features from CSP and analyze the effect of principal component analysis (PCA) for feature reduction (Yu et al., 2014).
Acharya et al. use WPD to decompose EEG signals into several sub-bands, apply PCA to them to reduce the size, and then use several principal components as input to the classifier (Acharya et al., 2012). The efficacy of independent component analysis (ICA) in selecting the best feature subset is demonstrated in the literature (Xu et al., 2004) by a linear transformation of a large feature vector into a low-dimensional one. Discrete Fourier transform coefficients are considered as a feature set in a BCI study, and linear discriminant analysis (LDA) is further applied to them to reduce the classifier load (Kołodziej et al., 2012). In Li et al. (2016), several bivariate features are extracted from EEG seizure data and a dimension reduction method known as the lasso is then applied to reduce the feature dimension. Another dimensionality reduction method, named t-distributed stochastic neighbor embedding (t-SNE), is evaluated to reduce the nonlinear features extracted from the DWT (Li et al., 2016). Neighborhood component analysis (NCA) is a weighting method that is used to reduce the dimension of the feature matrix in the study (Raghu & Sriraam, 2018), and it increased the classification accuracy up to 96.1% for EEG focal epileptic seizure data.

1.1. Limitations

We identified several limitations in the available literature, which are listed as follows:

1. In reducing the size of the feature matrix, many groupings of features are investigated in studies (Bashar & Bhuiyan, 2016; Kevric & Subasi, 2017; Li et al., 2014; Sadiq et al., 2019a; Taran et al., 2018) to decode the better one for identification improvement; however, this procedure is manual and time-consuming, as it necessitates numerous experiments to choose the best feature pair.
2. In studies (Acharya et al., 2012; Kołodziej et al., 2012; Martis et al., 2013; Subasi & Gursoy, 2010; Xu et al., 2004; Yu et al., 2014), no automatic approach was used for the selection of the PCA, ICA and LDA components, consequently curtailing their acceptability for practical implementation.
3. To avoid the over-fitting problem of the NCA regularization parameter, its cost function parameters were tuned manually (Raghu & Sriraam, 2018), which limits its applicability for practical systems.
4. Furthermore, most studies in the literature use only one dataset, which reduces the versatility of those studies. In studies (Li et al., 2013, 2014; Siuly & Li, 2012), the authors used only classification accuracy as a performance measure; conversely, classification accuracy alone is not enough to identify the MI signals (Sturm, 2013).
5. It is also worth noting that previous studies (Chaudhary et al., 2020; Ince et al., 2009; Kevric & Subasi, 2017; Li et al., 2011; Lu et al., 2010; Sadiq et al., 2019a; Siuly & Li, 2012; Song & Epps, 2007; Wang et al., 2016; Zhang et al., 2013) were limited to subject-dependent experiments. Recently, however, subject-independent experiments have gained significant importance because of their ability to generalize many subjects' data to an unknown subject; this helps product developers build a system for a large group of people by training their system on a few subjects.
1.2. Contributions

To address the aforementioned limitations of the existing studies, the main contributions of this study are summarized as follows: (1) design and validate a new framework for the automatic identification of MI tasks from subjects with either sufficient or small training samples; (2) introduce and implement the PCA, ICA, LDA and NCA approaches for reducing the large amount of EEG mode data obtained from the EWT; (3) propose correlation-based criteria for the automated selection of the components of PCA, ICA and LDA, and the tuning of the regularization parameter for NCA, so that they can be used as efficient EEG biomarkers for MI task detection; (4) investigate a sustainable classification model for the proposed features to differentiate the MI tasks; (5) improve the classification accuracy as compared with the existing methods; and (6) build an efficient subject-independent MI EEG classification system.

In this study, we employ the EWT for EEG signal classification, since it has been proved that the EWT is very useful for non-linear and non-stationary signal analysis (Sadiq et al., 2019a). It is also worth mentioning that, although various nonlinear dimensionality reduction techniques have been employed in Li et al. (2016, 2017), Wang et al. (2015) and Xu et al. (2020), the results for EEG signals are relatively low. It is particularly noted in Li et al. (2017) that signal decomposition methods do not perform well with nonlinear dimension reduction techniques. On the contrary, the effectiveness of linear dimension reduction techniques combined with signal decomposition methods has been verified in many studies (Acharya et al., 2012; Kołodziej et al., 2012; Martis et al., 2013; Subasi & Gursoy, 2010; Xu et al., 2004; Yu et al., 2014). Since the focus of this study is to design a flexible framework that is effective for subjects with either sufficient or small training samples, and is also suitable for both subject-dependent and subject-independent experiments, which is one of the biggest challenges in the BCI field for the design of an expert flexible BCI system, we focus mainly on the linear dimension reduction techniques.

To the best of our knowledge, this is the first time that a correlation-based strategy is applied for the automated selection of PCA, ICA and LDA components, and that an automatically tuned NCA model is applied for subject-independent MI EEG task classification. Our work is so far the only study which considers all the limitations mentioned in Section 1.1 in one place for the development of expert BCI systems. We propose a novel flexible framework which is effective for subjects with small and sufficient training samples, and which provides effective results for both subject-dependent and subject-independent experiments. For a fair evaluation of the proposed framework, along with the classification accuracy (Acc), we utilize several other evaluation metrics, such as sensitivity (Sen), precision (Pre), F1 score (F1), specificity (Spe) and kappa coefficient (Kco), obtained from the confusion matrix.

1.3. Organization

The remainder of the article is structured as follows. The datasets are described in Section 2. Section 3 discusses the aspects of the suggested approach. The criteria of performance assessment for the experiments are described in Section 4.
Section 5 describes the experimental setup of the study, and Section 6 describes the observations. Section 7 details the discussion and Section 8 summarizes the work.

2. Materials

The suggested research utilizes two publicly available datasets, which are described in the following sections.

2.1. Dataset 1 description

The IVa dataset (Blankertz et al., 2006) incorporates right-hand (RH) and right-foot (RF) MI activities. This set of data was accumulated from five completely relaxed healthy individuals, known as "aa", "al", "av", "aw" and "ay" in this research, by adjusting 118 electrodes on each subject as per the global 10/20 system guidelines (Jurcak et al., 2007). Each subject's set of data includes the MI EEG data of the initial four sessions without feedback, with a maximum of 280 trials per subject, 140 trials devoted to class 1 activities and the others to class 2 activities. Every subject performed one of the two MI activities for 3.5 s, but the training and testing trials are dissimilar for each subject. Precisely, out of the 280 trials, 168, 224, 84, 56 and 28 are training trials for "aa", "al", "av", "aw" and "ay", respectively, whereas the remaining are for testing. This research adopts a down-sampled frequency of 100 Hz, whilst the initial measuring rate was 1000 Hz.

2.2. Dataset 2 description

BCI competition III dataset IVb (Blankertz et al., 2006) is made up of left-hand (LH) and right-foot (RF) MI tasks. It was obtained from a normal subject, labeled as "Ivb", with 118 electrodes positioned under the expanded 10/20 international system (Jurcak et al., 2007). This dataset has seven original sessions without feedback. For both MI activities, 210 trials were obtained from the 118 electrodes, and a band-pass filter (BPF) with 0.05 Hz and 200 Hz lower and upper cutoff frequencies, respectively, filtered those signals. In this analysis, the data down-sampled to a frequency of 100 Hz are used.

3. Methods

The proposed method consists of six different modules, as depicted in Fig. 1, and each is explained briefly as follows.

Fig. 1. Block diagram of the proposed methodology.

3.1. Module 1: Pre-processing of data

EEG measurements are contaminated with external and cognitive noises that impede further analysis due to unwanted effects. Moreover, cross-talk due to interference from neighboring electrodes also degrades the MI EEG data patterns. To avoid these effects, a two-step filtering technique is employed in this study. In the first step, the EEG signals are band-pass filtered with 8-25 Hz to retain the μ (8-12 Hz) and β (16-24 Hz) bands, as these two bands carry information related to imagined movement. A 6th-order elliptic filter with 1 dB passband ripple and 50 dB stopband attenuation is used in this study due to its sharp cutoff characteristics. In the second step, a Laplacian filter is employed to reduce the cross-talk between channels, where the mean signal of the four nearest neighboring channels is subtracted from each channel's signal (Bhattacharyya et al., 2014; Dornhege et al., 2007).
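The two-step filtering stage can be sketched as follows; this is a minimal Python example assuming SciPy, with the raw EEG arranged as (channels × samples) at the 100 Hz sampling rate of the datasets. The neighbour map used by the Laplacian step is a hypothetical placeholder; in practice it follows the 10-20 montage of the recording.

```python
# Minimal sketch of the two-step filtering (Module 1), assuming SciPy.
import numpy as np
from scipy import signal

FS = 100  # Hz, down-sampled rate of datasets IVa/IVb

def bandpass_8_25(eeg):
    """Elliptic band-pass 8-25 Hz with 1 dB passband ripple and 50 dB
    stopband attenuation; N=3 yields a 6th-order band-pass design
    (SciPy doubles N for band-pass filters)."""
    sos = signal.ellip(3, 1, 50, [8, 25], btype="bandpass", fs=FS, output="sos")
    return signal.sosfiltfilt(sos, eeg, axis=-1)

def laplacian(eeg, neighbours):
    """Subtract the mean of the four nearest neighbouring channels."""
    out = np.empty_like(eeg)
    for ch, nbrs in neighbours.items():
        out[ch] = eeg[ch] - eeg[list(nbrs)].mean(axis=0)
    return out

# Hypothetical neighbour indices for a small 5-channel example montage.
neighbours = {0: (1, 2, 3, 4), 1: (0, 2, 3, 4), 2: (0, 1, 3, 4),
              3: (0, 1, 2, 4), 4: (0, 1, 2, 3)}
eeg = np.random.randn(5, 350)               # placeholder raw trial
clean = laplacian(bandpass_8_25(eeg), neighbours)
```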
3.2. Module 2: Channel selection

Because of the increasing complexity, the use of a large number of channels for EEG signal analysis is discouraged. Therefore, in the current study, three different channel selection criteria were used for further experimentation.

3.2.1. Channel selection criteria 1

In criteria 1, we have chosen the C3, CZ and C4 electrodes in accordance with the 10-20 framework (Jurcak et al., 2007), since these electrodes are the most discriminatory for hand and foot movement data (Kevric & Subasi, 2017). It should be noted that the RH MI operation is usually detected above the left motor cortex across the C3 electrode, and the foot MI action across the CZ electrode.

3.2.2. Channel selection criteria 2

From previous studies, it is understood that the brain's frontal, central and parietal lobes are important from a neurological perspective for MI commands. Information from seven electrodes, i.e., F3, F4, C3, CZ, C4, P3 and P4, which reside above these lobes of interest according to the 10-20 standard (Jurcak et al., 2007), is considered in criteria 2.

3.2.3. Channel selection criteria 3

In criteria 3, the electrodes around the motor cortex region are nominated, as this region is responsible for MI execution. According to the 10-20 standard (Jurcak et al., 2007), 18 electrodes lie around the motor cortex region, labeled C5, C3, C1, C2, C4, C6, CP5, CP3, CP1, CP2, CP4, CP6, P5, P3, P1, P2, P4 and P6, respectively (Sadiq et al., 2019a, 2019b).

3.3. Module 3: Feature extraction

In this work, the empirical wavelet transform (EWT) is considered as the feature extraction tool for EEG signal analysis.

3.3.1. Empirical wavelet transform

Gilles (2013) suggested the EWT strategy to address the shortcomings in signal decomposition and analysis faced by the EMD technique. A wavelet filter bank is a vital component of the EWT that allows non-stationary signals to be broken down into multiple modes, each adjusted to a unique IMF frequency (Gilles, 2013). The key working steps of the EWT procedure can be summarized in the following three stages:

Step 1: Use the fast Fourier transform (FFT) method to obtain the Fourier spectrum over the examined signal's frequency range [0, π].

Step 2: Use the scale-space boundary detection technique specified in Gilles (2013) to partition the acquired Fourier spectrum into N neighboring segments.

Step 3: Empirical wavelets are used as band-pass filters for all frequency segmentations. For this role, this study uses Meyer's wavelet concept and Littlewood–Paley's idea (Daubechies, 1992). Eqs. (1) and (2) give the empirical scaling and wavelet functions as (Gilles, 2013):

$$\hat{A}_j(f) = \begin{cases} 1, & \text{if } |f| \le (1-\alpha)f_j \\ \cos\!\left(\dfrac{\pi\phi(\alpha, f_j)}{2}\right), & \text{if } (1-\alpha)f_j \le |f| \le (1+\alpha)f_j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$\hat{B}_j(f) = \begin{cases} 1, & \text{if } (1+\alpha)f_j \le |f| \le (1-\alpha)f_{j+1} \\ \cos\!\left(\dfrac{\pi\phi(\alpha, f_{j+1})}{2}\right), & \text{if } (1-\alpha)f_{j+1} \le |f| \le (1+\alpha)f_{j+1} \\ \sin\!\left(\dfrac{\pi\phi(\alpha, f_j)}{2}\right), & \text{if } (1-\alpha)f_j \le |f| \le (1+\alpha)f_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

with

$$\phi(\alpha, f_j) = \beta\!\left(\frac{|f| - (1-\alpha)f_j}{2\alpha f_j}\right) \tag{3}$$

The variable α is essential for preventing any interaction between the functions of Eqs. (1) and (2), and it sets a tight frame, as shown in Eq. (4) (Gilles, 2013):

$$\alpha < \min_j \left(\frac{f_{j+1} - f_j}{f_{j+1} + f_j}\right) \tag{4}$$

where Eq. (5) represents the arbitrary function β(y) (Gilles, 2013):

$$\beta(y) = \begin{cases} 0, & \text{if } y \le 0 \\ \beta(y) + \beta(1-y) = 1, & \text{for all } y \in [0, 1] \\ 1, & \text{if } y \ge 1 \end{cases} \tag{5}$$

The coefficients of Eqs. (1) and (2) are found by the dot product of the analyzed signal with the empirical scaling and wavelet functions, respectively, and thus the empirical modes are formed. In this study, 10 modes are extracted empirically from each channel signal.
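As a concrete illustration of this decomposition step, the sketch below uses the third-party ewtpy package, an open-source Python implementation of Gilles' EWT; using it is an assumption made here for illustration, since the paper does not name an implementation.

```python
# Sketch of the EWT step, assuming the open-source `ewtpy` package
# (pip install ewtpy). `x` stands in for one pre-processed 350-sample trial.
import numpy as np
import ewtpy

x = np.random.randn(350)                       # placeholder EEG segment
modes, mfb, boundaries = ewtpy.EWT1D(x, N=10)  # 10 modes, as in the study
print(modes.shape)                             # (350, 10): one column per mode
```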
We have included Fig. 2 to present both the original EEG and the modes generated with the EWT. For clear visualization, blue is used for class 1 and red for class 2. It should also be noted that the same-index modes of the two classes have significant differences in shape, which indicates that the statistical independence and the chances of better classification among the different classes are very high (Siuly & Li, 2012).

Fig. 2. Original EEG signal with modes for class 1 and class 2 for dataset IVa.

3.4. Module 4: Dimensionality reduction

The modes from all channels are combined together, which forms a large feature matrix. To reduce the feature matrix dimension, the following four dimension reduction techniques are utilized in this study.

3.4.1. Principal component analysis

Principal Component Analysis (PCA) is a well-known data reduction technique in which a D-dimensional dataset is interpreted in a low-dimensional space to minimize the complexity, space and degrees of freedom. PCA is effective for segmenting signals from numerous sources, and the goal is to depict the data in a space which effectively represents the variance in a sum-squared error sense. The PCA procedure can be summarized in the following phases. First, for a data matrix, a mean vector (m) of dimension D and a covariance matrix (c) of dimension D × D are computed. Second, the eigenvectors (v1, v2, ...) and eigenvalues (λ1, λ2, ...) are computed and arranged in order of decreasing eigenvalue. Third, the eigenvector spectrum is visualized and the K leading eigenvectors are selected. There will often be a dimension indicating the underlying dimensionality of the signal subspace, with the remaining dimensions corresponding to noise. Finally, a D × K matrix (B) is created, whose columns consist of the K leading eigenvectors. The data matrix is pre-processed by using the following expression:

$$\dot{X} = B^t (X - m) \tag{6}$$

Further details of PCA can be found in Cover and Hart (1967) and Cao et al. (2003).
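A minimal NumPy sketch of this procedure (Eq. (6)) is given below, with illustrative array shapes; since samples are stored as rows, the projection B^t(X − m) appears as (X − m)B.

```python
# NumPy sketch of the PCA reduction of Section 3.4.1 / Eq. (6).
import numpy as np

def pca_reduce(X, K):
    """X: (n_observations, D) data matrix; returns the K leading projections."""
    m = X.mean(axis=0)                      # D-dimensional mean vector
    c = np.cov(X, rowvar=False)             # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(c)    # eigen-decomposition (ascending)
    order = np.argsort(eigvals)[::-1]       # decreasing eigenvalue order
    B = eigvecs[:, order[:K]]               # D x K matrix of leading eigenvectors
    return (X - m) @ B                      # row-wise form of B^t (X - m)

X_dot = pca_reduce(np.random.randn(70, 350), K=7)   # e.g. 70 modes of 350 samples
```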
3.4.2. Independent component analysis

Independent Component Analysis (ICA) is a data reduction technique that helps to extract mutually exclusive independent components from multivariate data. The data contained in one component cannot be inferred from the rest of the components; factually, this implies that the joint likelihood of independent quantities is acquired as the product of the likelihoods of each of them. Due to our noise-free and independence assumptions, we can write the multivariate density function as

$$P(x(t)) = \prod_{j=1}^{a} P(x_j(t)) \tag{7}$$

where a represents the number of scalar source signals x_j(t), j = 1, 2, ..., a, under the independence assumption, and t denotes the time index with range 1 ≤ t ≤ T. Assume that at every moment a D-dimensional data vector is observed, so that

$$y(t) = B x(t) \tag{8}$$

where B is a scalar matrix and D ≥ a is required. The goal of ICA is to retrieve the source signals from the observed signals, so a real matrix Q is found such that

$$z(t) = Q y(t) = Q B x(t) \tag{9}$$

where Q = B⁻¹, but B and B⁻¹ are both unknown. In this study, maximum-likelihood techniques are utilized to seek B. A density estimate specified as p̂(y; r) is used and the vector r is determined, which reduces the difference between the source distribution and the approximation. In conclusion, p̂(y; r) is an estimate of P(y) and r gives the basis vectors of B. Additional information on ICA can be found in Cao et al. (2003) and Cover and Hart (1967).

3.4.3. Linear discriminant analysis

Linear Discriminant Analysis (LDA) is a data reduction technique that reduces D-dimensional data to a single dimension. LDA's goal is to build a new variable incorporating the original indicators. This is done by maximizing the discrepancies in the new variable among the predetermined categories. The aim is to combine the indicator values in such a manner as to form a new composite attribute, which provides the discriminating score. Finally, every category is supposed to have a Gaussian distribution of discriminating scores, with the maximum possible discrepancy in average scores between the categories. The discriminant function is used to compute the discriminant scores and can be formulated as:

$$D = q_1 Z_1 + q_2 Z_2 + q_3 Z_3 + \cdots + q_p Z_p \tag{10}$$

Therefore, a discriminating score is a weighted linear mixture of indicators. The weights are calculated to increase the discrepancies among the discriminating average category scores. In particular, indicators with broad differences among the category averages will have greater weights, whereas the weights will be low if the category averages are the same. More details of LDA can be studied in Fielding (2007).

3.4.4. Neighborhood component analysis

Neighborhood component analysis (NCA) is a non-parametric technique which ranks the attributes according to their substantial information. It was formulated on the basis of the K-NN classification algorithm. The NCA algorithm increases the leave-one-out separation outcome with a tuned regularization parameter by learning feature weights over the training data. Let the training data be (Yang et al., 2012)

$$T = \{(f_i, l_i),\ i = 1, 2, \ldots, n\} \tag{11}$$

where f_i represents a feature vector with F dimensions, l_i ∈ {1, 2, ..., c} denotes the respective class label, n corresponds to the total number of observations and c is the number of class labels. For two samples f_i and f_j, the distance function D_w can be represented in terms of a weight vector and expressed in Eq. (12) as (Yang et al., 2012)

$$D_w(f_i, f_j) = \sum_{k=1}^{D} w_k^2 \left| f_{ik} - f_{jk} \right| \tag{12}$$

where the w_k are the attribute weights. In this technique, a sample is randomly chosen from T, labeled accordingly, and considered as the reference point. The reference-point probability can be represented in Eq. (13) as (Yang et al., 2012)

$$P_{ij} = \frac{\mathrm{ker}(D_w(f_i, f_j))}{\sum_{j=1}^{n} \mathrm{ker}(D_w(f_i, f_j))} \tag{13}$$

where ker represents the kernel function, which is defined as ker(z) = e^(−z/σ), and σ is the kernel width. The correct classification probability of f_i is given as

$$P_i = \sum_{j=1,\ j \ne i}^{n} \mathrm{ker}(D_w(f_i, f_j)) \tag{14}$$

When f_i = f_j, the value of P_ij will be one. The total classification accuracy can thus be formulated by the following objective function F(w) in Eq. (15) as (Yang et al., 2012)

$$F(w) = \sum_{i=1}^{n} P_i \tag{15}$$

The NCA algorithm's objective is to increase F(w), but this F(w) is vulnerable to over-fitting. To prevent over-fitting, a regularization parameter λ, which needs to be tuned, is used in the final objective of the NCA framework. The objective F(w) with λ can be represented as (Yang et al., 2012)

$$A = \sum_{i=1}^{n} P_i - \lambda \sum_{k=1}^{D} w_k^2 \tag{16}$$

The NCA method utilizes a conjugate gradient approach to maximize the objective function A. The best subset of attributes is chosen based on the resulting weights.
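To make Eqs. (12)-(16) concrete, the following NumPy sketch evaluates the regularized objective for a given weight vector; it uses the standard NCA convention that the correct-classification probability sums P_ij over same-label references, and it leaves the conjugate-gradient maximization out.

```python
# NumPy sketch of the regularized NCA objective of Eqs. (12)-(16).
import numpy as np

def nca_objective(w, F, labels, lam, sigma=1.0):
    """F: (n, D) feature matrix; w: (D,) weights; lam: regularization parameter."""
    # Eq. (12): D_w(f_i, f_j) = sum_k w_k^2 |f_ik - f_jk|
    dist = np.abs(F[:, None, :] - F[None, :, :]) @ (w ** 2)
    ker = np.exp(-dist / sigma)                  # kernel of Eq. (13)
    np.fill_diagonal(ker, 0.0)                   # a point is never its own reference
    p_ij = ker / ker.sum(axis=1, keepdims=True)  # Eq. (13)
    same = labels[:, None] == labels[None, :]    # reference shares the true label
    p_i = (p_ij * same).sum(axis=1)              # correct-classification probability
    return p_i.sum() - lam * np.sum(w ** 2)      # Eqs. (15)-(16)

# toy usage with random placeholders
F = np.random.randn(40, 5)
labels = np.random.randint(1, 3, size=40)
print(nca_objective(np.ones(5), F, labels, lam=4.01e-4))
```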
3.5. Module 5: Selection of suitable features for PCA, ICA, LDA and NCA

In this study, the same-index modes from different channels were arranged in a sequence. To reduce the length of the data matrix built by each index mode, PCA, ICA and LDA were applied. To choose the suitable components (where each component is considered as one feature vector in the present study) of PCA and ICA and the coefficients of LDA, we propose a correlation-based suitable component and coefficient selection criterion.

3.5.1. Suitable component and coefficient selection criteria for PCA, ICA and LDA

The "Best-First" technique (Witten et al., 2005) is used to scan through the groups of components and coefficients via greedy hill climbing, which is improved by a backtracking mechanism. Afterwards, the "correlation-based" component and coefficient selection method (Hall, 1999) is employed to determine the relevant components and coefficients, by explicitly assessing the predictive potential of each component and coefficient and its degree of reliability. It picks the group of components and coefficients which are closely correlated to the category but have weak interconnections (Hall, 1999), as formulated mathematically:

$$\mathit{suitable\ component\ group} = \frac{\sum_{\mathrm{all\ components}\ f} C(f, \mathit{category})}{\sqrt{\sum_{\mathrm{all\ components}\ f}\ \sum_{\mathrm{all\ components}\ g} C(f, g)}} \tag{17}$$

where C makes a comparison between two components, and "symmetric uncertainty" is employed for it in this study.
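A sketch of this criterion is given below, using the standard correlation-based feature selection (CFS) merit with symmetric uncertainty as C, which is one concrete reading of Eq. (17); the component values are assumed to be pre-discretized (e.g., binned), and the Best-First search that proposes candidate groups is omitted.

```python
# Sketch of the correlation-based merit of Eq. (17), with symmetric
# uncertainty as the comparison C; assumes discretized component values.
import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()            # nats, to match mutual_info_score

def symmetric_uncertainty(x, y):
    return 2.0 * mutual_info_score(x, y) / (entropy(x) + entropy(y) + 1e-12)

def merit(components, labels, group):
    """Standard CFS merit for a candidate `group` of component indices."""
    k = len(group)
    rcf = np.mean([symmetric_uncertainty(components[f], labels) for f in group])
    pairs = [(f, g) for i, f in enumerate(group) for g in group[i + 1:]]
    rff = np.mean([symmetric_uncertainty(components[f], components[g])
                   for f, g in pairs]) if pairs else 0.0
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)
```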
3.5.2. Suitable feature selection by NCA

For NCA, all the modes (we consider each mode a single feature vector here) are arranged in a sequence, and for strategies 1, 2 and 3 we have a total of 30, 70 and 180 feature vectors, respectively. In this study, the proposed criteria shown in Fig. 3 were used to reduce the large feature matrix. As shown in Fig. 3, the large feature matrix was first divided into training-mode and testing-mode data, and the regularization parameters were tuned on the training-mode data by employing 10-fold cross-validation. For the different folds and regularization parameters, NCA models were constructed on the training-mode data and the classification costs were estimated on the corresponding test-mode data. The average of all cost values is calculated, and the regularization parameter which gives the minimum average classification loss is considered the best parameter value in this study. The features weighing more than 0 are trained on different neural networks, and these classifiers are tested on different test-mode data to evaluate the classification outcomes.

Fig. 3. Proposed framework for feature selection by NCA.

In Fig. 4, we present the proposed framework graphically. To visually represent the two classes clearly, blue is dedicated to class 1 and red to class 2. Fig. 4(a) represents the original EEG with its huge amount of data, and Fig. 4(b)-(e) represent the data obtained after PCA, ICA, LDA and NCA. These figures show that there is a significant data reduction by the dimensionality reduction techniques in comparison with the original data.

Fig. 4. Graphical representation of dimensionality reduction techniques: (a) original EEG signal, (b) PCA, (c) ICA, (d) LDA, (e) NCA.

3.6. Module 6: Classification

Once we obtained the suitable feature subsets from the dimensionality reduction techniques, we employed different neural network classifiers to classify the MI signals. The details of the classifiers are summarized in the subsequent discussion.

3.6.1. Artificial neural networks (single and multilayer)

Artificial Neural Networks (ANN) are computational structures consisting of a high number of complex, strongly integrated computing components known as neurons, which abstractly mimic the structure and activity of the biological nervous system. ANN learning is achieved by developing certain training methodologies based on learning rules deemed to emulate the learning processes of biological systems. A typical ANN for linear problems contains two layers, i.e., input and output layers, whereas for nonlinear problems an additional layer, known as the hidden layer, is utilized. The number of hidden layers is chosen empirically based on the problem at hand; a larger number of hidden layers results in a longer training process. We utilize the back-propagation algorithm with a scaled conjugate gradient approach for fast training to find the suitable weights. Two models of ANN are experimented with for the classification of the MI tasks: the first employs a single layer with ten neurons, and the second is a multilayered ANN with three layers, each with ten neurons. All parameters were selected by trial and error (Subasi & Ercelebi, 2005).

3.6.2. Feed-forward neural network

In feed-forward neural networks (FFNN), neurons are arranged in multiple layers and signals are forwarded from input to output. When an error occurs, it is propagated back to the previous layer and the weights are adjusted again to reduce the chance of error. In this study, we use a tan-sigmoid transfer function, a single hidden layer with an empirically chosen ten neurons, and the Levenberg–Marquardt algorithm for fast training (Jana et al., 2018).

3.6.3. Cascade-forward neural networks

In cascade-forward neural networks (CFNN), neurons are interlinked with the neurons of the previous and subsequent layers. For example, a three-layer CFNN has direct connections between layer one and layer two, layer two and layer three, and layer one and layer three; i.e., the neurons in the input and output layers are connected both directly and indirectly. These extra connections help to achieve a better learning speed for the required relationship. As with the FFNN, in the CFNN we utilize a tan-sigmoid transfer function, one hidden layer with ten neurons selected by trial and error, and the Levenberg–Marquardt method for quick learning (Goyal & Goyal, 2011).

3.6.4. Recurrent neural networks

In recurrent neural networks (RNN), signals can flow in cycles because the network has one or more feedback links. This characteristic of the RNN allows the system to process temporal information and recognize trends. In this study, we implement the Elman recurrent neural network, which is the most common type of RNN. For quick training of the model, the Levenberg–Marquardt method and a single hidden layer with an empirically selected ten neurons are utilized (Mandic & Chambers, 2001).

3.6.5. Probabilistic neural networks

The Bayesian method underlies the probabilistic neural network (PNN), with input, pattern, summation and output layers. The PNN classification accuracy is largely dependent on an accurate value of the spread factor. In this study, the spread factor is fixed to 0.1 after several experiments for the classification of the different MI tasks (Specht, 1990).

3.6.6. Multilayer perceptron neural networks

We utilized a multilayer perceptron neural network (MLP) with back-propagation for the classification of the different MI tasks. The numbers of neurons in the input and output layers are the same as the number of features in the feature vector and the number of MI classes, respectively. The number of neurons for the single hidden layer is chosen after comprehensive tests, using the following formulation (Subasi & Ercelebi, 2005):

$$N = \frac{\mathit{No.\ of\ features} + \mathit{No.\ of\ MI\ classes}}{2} \tag{18}$$
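A sketch of the MLP configuration is shown below, using scikit-learn's MLPClassifier as a stand-in for the back-propagation MLP used in the paper; the hidden-layer size follows Eq. (18), and the feature/class counts are illustrative.

```python
# Sketch of the MLP classifier of Section 3.6.6, assuming scikit-learn.
from sklearn.neural_network import MLPClassifier

n_features, n_classes = 7, 2                # e.g. 7 NCA-selected features, 2 MI tasks
hidden = (n_features + n_classes) // 2      # Eq. (18)

mlp = MLPClassifier(hidden_layer_sizes=(hidden,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=0)
# usage: mlp.fit(X_train, y_train); accuracy = mlp.score(X_test, y_test)
```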
4. Performance verification

The performance of the proposed study is verified by using the confusion matrix shown in Table 1. A number of performance measures, referred to as sensitivity (Sen), precision (Pre), accuracy (Acc), F1 score (F1), specificity (Spe) and kappa coefficient (Kco), are obtained from Table 1 for classification performance validation, and are given as follows, respectively (Hossin & Sulaiman, 2015):

Table 1. Confusion matrix.

|                               | Predicted positive class (Class 1) | Predicted negative class (Class 2) |
| True positive class (Class 1) | TP                                 | FN                                 |
| True negative class (Class 2) | FP                                 | TN                                 |

$$S_{en} = \frac{TP}{TP + FN} \tag{19}$$

$$P_{re} = \frac{TP}{TP + FP} \tag{20}$$

$$A_{cc} = \frac{TP + TN}{TP + FN + TN + FP} \tag{21}$$

$$F_1 = \frac{2\, P_{re}\, S_{en}}{P_{re} + S_{en}} \tag{22}$$

$$S_{pe} = \frac{TN}{FP + TN} \tag{23}$$

$$K_{co} = \frac{A_{cc} - Exp_{Acc}}{1 - Exp_{Acc}} \tag{24}$$

where

$$Exp_{Acc} = \frac{\dfrac{(TP + FP)(TP + FN)}{TP + FN + TN + FP} + \dfrac{(FN + TN)(FP + TN)}{TP + FN + TN + FP}}{TP + FN + TN + FP} \tag{25}$$

In Table 1 and Eqs. (19)-(25), TP (true positive) represents the correctly predicted instances of class 1; TN (true negative) represents the correctly estimated instances of class 2; FP (false positive) represents the number of instances predicted as positive which actually belong to the negative class; and FN denotes instances estimated as negative which actually belong to the positive class.
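The measures above translate directly into code; the sketch below is a direct transcription of Eqs. (19)-(25), evaluated on the example training confusion-matrix counts discussed later in Section 6.1.1.

```python
# Direct transcription of Eqs. (19)-(25): six measures from TP, FN, FP, TN.
def evaluation_measures(tp, fn, fp, tn):
    total = tp + fn + tn + fp
    sen = tp / (tp + fn)                               # Eq. (19)
    pre = tp / (tp + fp)                               # Eq. (20)
    acc = (tp + tn) / total                            # Eq. (21)
    f1 = 2 * pre * sen / (pre + sen)                   # Eq. (22)
    spe = tn / (fp + tn)                               # Eq. (23)
    exp_acc = ((tp + fp) * (tp + fn) / total
               + (fn + tn) * (fp + tn) / total) / total   # Eq. (25)
    kco = (acc - exp_acc) / (1 - exp_acc)              # Eq. (24)
    return sen, pre, acc, f1, spe, kco

# counts taken from the training confusion matrix example of Section 6.1.1
print(evaluation_measures(tp=3701, fn=159, fp=91, tn=3609))
```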
Recently we proposed EWT-based algorithms (Sadiq et al., 2019a) for MI EEG classification and followed the same procedure as utilized in earlier studies (Li et al., 2013, 2014; Siuly et al., 2017) and selected identically labeled dataset for MI EEG classification. We considered only those EEG segments containing MI tasks information only, and position markers are used to obtain MI data of both categories. 17 18 19 20 21 22 1. In the first phase, from each EEG segment, 350 samples were taken as these samples are directly related to the MI tasks information. To remove external, cognitive and interference of neighboring channels, band pass filter (BPF) with lower and upper frequency of 8 Hz and 25 Hz respectively was employed at the first step, as these two frequency bands contain the maximum MI information. In the second step, Laplacian filter was adopted to remove the cross-channel interference effects. 2. In the second phase, we choose different combination of channels (3, 7 and 18 channels) around the motor cortex area of the brain region based on the physiological arrangement and channel labeling according to the 10–20 system. 3. In the third phase, owing to highly non-linear and non-stationary nature of the EEG, each signal is decomposed into 10 modes by employing EWT, which provides enough information for the correct identification of MI signals (Sadiq et al., 2019a). At this stage we have a total number of 30, 70 and 180 modes which are obtained with 3, 7 and 18 channels, respectively. 4. In the fourth phase, we re-arranged the modes data, such that same index modes of all channels make row vectors respectively as follows, 23 24 25 26 27 28 29 30 31 32 33 34 𝑀1 = {𝐶ℎ1 𝑚1 𝐶ℎ2 𝑚1 ⋯ 𝐶ℎ7 𝑚1 } 𝑀2 = {𝐶ℎ1 𝑚2 𝐶ℎ2 𝑚2 ⋯ 𝐶ℎ7 𝑚2 } (26) ⋮ 𝑀10 = {𝐶ℎ1 𝑚10 𝐶ℎ2 𝑚10 ⋯ 𝐶ℎ7 𝑚10 } where 𝐶ℎ1 𝑚1 represents mode 1 of channel 1 and similar remark is applicable for all other modes of different channels. It is witnessed that each mode have 350 samples, and thus, 2450 (350 × 7) parameters are obtained for one index modes, and totally we have 24,500 (2450 × 10) parameters for all index (10) modes of 7 channels. Since the total number of parameters is too large, we reduced the dimension of each vector parameters using dimension reduction techniques as below, 35 36 37 38 39 40 41 42 (a) We applied PCA on each index modes vector first. PCA reduced the arrangement dimensions into 49 parameters 43 44 M.T. Sadiq et al. Table 3 Selected number of PCA, ICA and LDA components and coefficient (features) with number of parameters for three channel selection strategies. 
(a) We applied PCA to each index-mode vector first. PCA reduced the arrangement dimensions to 49 parameters (7 values × 7 dimensions) for each index-mode vector, and finally we obtained a total number of 490 parameters for [M1, M2, ..., M10].

(b) We then applied ICA to [M1, M2, ..., M10], respectively. The number of ICA components during the mode data reduction was fixed to 6, since 6 ICA components retain enough physiological information for the analysis of biomedical signals (Martis et al., 2013). ICA results in 42 parameters (7 values × 6 dimensions) for each index-mode vector; a total of 420 parameters is acquired.

(c) We next utilized LDA on [M1, M2, ..., M10], respectively. The number of LDA components is the number of classes − 1, so we have one component obtained from M1, one from M2, and similarly for all the others. In total, LDA results in 3500 (350 × 10) parameters out of the total of 24,500 parameters.

(d) For NCA, all the mode vectors were arranged sequentially and we obtain a huge data matrix of modes arranged as

$$FM = [M_1, M_2, \ldots, M_{10}] \tag{27}$$

This specifies that FM contains 180 modes (63,000 parameters), 70 modes (24,500 parameters) and 30 modes (10,500 parameters) for 18, 7 and 3 channels, respectively.

5. In the fifth phase, features are automatically selected from PCA, ICA, LDA and NCA. In the study (Martis et al., 2013), each component of PCA, ICA and LDA is considered to be one feature for ECG beat classification; however, the components and coefficients were selected manually or in a trial-and-error manner, which makes that method less practical. In the present study, we also utilize the components of PCA and ICA and the coefficients of LDA as feature vectors, but we employ the automated correlation-based component and coefficient selection criteria to make the proposed method more adaptive for real-time applications. In this way, a component or coefficient is chosen as best if its characteristics are maximally matched to the characteristics of the category. The automatic selection of features with NCA is explained in Fig. 3. Table 3 shows the number of selected features together with the numbers of parameters.

Table 3. Selected numbers of PCA, ICA and LDA components and NCA coefficients (features), with the corresponding numbers of parameters, for the three channel selection strategies. Each cell reads "selected dimensions (number of parameters)".

| Subject | PCA, 3 ch | ICA, 3 ch | LDA, 3 ch | NCA, 3 ch | PCA, 7 ch | ICA, 7 ch | LDA, 7 ch | NCA, 7 ch | PCA, 18 ch | ICA, 18 ch | LDA, 18 ch | NCA, 18 ch |
| aa  | 3×14 (42) | 3×11 (33) | 350×10 (3500) | 350×14 (4900) | 7×6 (42)  | 7×8 (56) | 350×10 (3500) | 350×6 (2100) | 18×6 (108)  | 18×8 (144) | 350×10 (3500) | 350×6 (2100) |
| al  | 3×12 (36) | 3×18 (54) | 350×10 (3500) | 350×12 (4200) | 7×8 (56)  | 7×8 (56) | 350×10 (3500) | 350×7 (2450) | 18×8 (144)  | 18×8 (144) | 350×10 (3500) | 350×8 (2800) |
| av  | 3×15 (45) | 3×9 (27)  | 350×10 (3500) | 350×12 (4200) | 7×10 (70) | 7×8 (56) | 350×10 (3500) | 350×8 (2800) | 18×10 (180) | 18×8 (144) | 350×10 (3500) | 350×8 (2800) |
| aw  | 3×6 (18)  | 3×3 (9)   | 350×10 (3500) | 350×6 (2100)  | 7×2 (14)  | 7×3 (21) | 350×10 (3500) | 350×2 (700)  | 18×2 (36)   | 18×3 (54)  | 350×10 (3500) | 350×2 (700)  |
| ay  | 3×9 (27)  | 3×12 (36) | 350×10 (3500) | 350×9 (3150)  | 7×7 (49)  | 7×7 (49) | 350×10 (3500) | 350×7 (2450) | 18×7 (126)  | 18×7 (126) | 350×10 (3500) | 350×7 (2450) |
| Ivb | 3×9 (27)  | 3×6 (18)  | 350×10 (3500) | 350×1 (350)   | 7×7 (49)  | 7×4 (28) | 350×10 (3500) | 350×1 (350)  | 18×7 (126)  | 18×4 (72)  | 350×10 (3500) | 350×1 (350)  |

6. In the final phase, all the selected features were fed to the six neural network classifiers, and the several evaluation measures with a 10-fold cross-validation strategy were employed for the classification of the different MI tasks.
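As announced in phase 4, a minimal NumPy sketch of the mode re-arrangement of Eqs. (26) and (27) is given here; the array shapes match the 7-channel strategy with 350 samples per mode.

```python
# Sketch of the mode re-arrangement of Eqs. (26)-(27): stack mode m of every
# channel into a row vector M_m, then concatenate all index modes into FM.
import numpy as np

n_channels, n_modes, n_samples = 7, 10, 350
# modes[c, m] holds mode m of channel c, e.g. from the EWT step
modes = np.random.randn(n_channels, n_modes, n_samples)

# M_m = [Ch1_m, Ch2_m, ..., Ch7_m]  -> 2450 parameters per index mode
M = [modes[:, m, :].reshape(-1) for m in range(n_modes)]
FM = np.concatenate(M)               # 24,500 parameters for all 10 index modes
print(len(M[0]), FM.size)            # 2450 24500
```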
6. Results

6.1. Subject-dependent results

In the following sections, we first discuss the subject-dependent results.

6.1.1. Analysis for PCA, ICA and LDA

To explain the results obtained with PCA, we randomly consider subject "av" with channel selection criteria 3 and the ANN classifier as an example. The training, validation and testing results of PCA classified by the ANN, considering a single layer and the back-propagation algorithm with a scaled conjugate gradient approach, are shown by the confusion matrices in Fig. 5. As shown in Fig. 5, the first two diagonal cells (shaded green) of the training confusion matrix (representing class 1 and class 2 samples, respectively) reflect the number and percentage of correct classifications by the trained network. For example, in Fig. 5(a), 3701 cases are correctly classified as class 1 samples and 3609 are correctly classified as class 2 samples; 91 cases of class 2 are incorrectly classified as class 1, whereas 159 cases of class 1 are incorrectly classified. Overall, 95.9% of the class 1 cases are identified correctly and the remaining 4.1% incorrectly. Likewise, for class 2, 97.5% of the cases are predicted correctly and the remaining 2.5% wrongly. In total, 96.7% of the samples in the training dataset are correctly classified and the remaining 3.3% wrongly. A similar analysis is performed for the validation and testing sets, as shown in Fig. 5(b) and (c). Finally, the average classification outcome over all sets is 96.5%.

Fig. 5. Classification results represented by confusion matrices for the ANN classifier: (a) training confusion matrix, (b) validation confusion matrix, (c) test confusion matrix, (d) all confusion matrix.

The area under the receiver operating curve (AUC) of the ANN classifier for subject "av" with channel selection criteria 3 is shown in Fig. 6. An AUC value near 1 represents good classification capability of a classifier, the center line represents classification by chance, and a value near 0 represents poor classification capability. For both classes, the ROC curves in our case lie near the top left, indicating that the AUC value is near 1.

Fig. 6. Receiver operating curves of training, validation and testing for the ANN classifier: (a) training ROC, (b) validation ROC, (c) test ROC, (d) all ROC.

The best performance of the ANN classifier at a specific epoch is shown in Fig. 7. As seen in Fig. 7, the best validation performance of 0.055788 is obtained at epoch 2.

Fig. 7. (a) Best validation performance at epoch 2 is 0.055788; (b) zoomed version of part (a).

To show the network verification of the training, validation and testing errors, a histogram is given in Fig. 8. Data points whose errors deviate markedly from the zero-error line represent outliers; the part of the histogram displaying the deviation from the null line provides the basis for setting a limit to categorize the outliers, based on whether the chosen attribute values are fitted well or poorly. A similar analysis is valid for ICA and LDA.

Fig. 8. Error histogram of the training, validation and testing states with 20 bins.

6.1.2. Analysis for NCA

The analysis of the NCA technique is described in the following discussion. In NCA, 10-fold cross-validation was used to determine the finest regularization parameter value corresponding to the lowest possible discrimination loss. NCA produced the best classification loss of 6.6667e−04 for subject "av" with 3 channels, as shown in Fig. 9, at the finest regularization value of 4.0100e−04. The NCA model was executed on the attribute matrix with the best value of the regularization parameter, and hence the weight of each attribute was calculated. Attributes with a weight exceeding 5 percent of the total attribute weight were nominated to differentiate between the different MI functions. Fig. 10 indicates the feature weights according to their indices. The essential features, in order, were M10 to M16 and M29.

Fig. 9. Plot between regularization parameters and mean loss values.

Fig. 10. Features with different weights.
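A sketch of the tuning loop behind Fig. 9 is given below; fit_nca_weights and nca_loss are hypothetical helpers standing in for an NCA implementation such as the objective of Section 3.4.4, since the paper does not name a particular one.

```python
# Sketch of the 10-fold CV regularization tuning of Section 6.1.2 / Fig. 9.
# `fit_nca_weights` and `nca_loss` are hypothetical, user-supplied callables.
import numpy as np
from sklearn.model_selection import KFold

def tune_lambda(F, y, lambdas, fit_nca_weights, nca_loss):
    """Return the regularization value with the lowest mean 10-fold CV loss."""
    mean_losses = []
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    for lam in lambdas:
        fold_losses = [nca_loss(fit_nca_weights(F[tr], y[tr], lam), F[te], y[te])
                       for tr, te in kf.split(F)]
        mean_losses.append(np.mean(fold_losses))   # average loss over folds
    return lambdas[int(np.argmin(mean_losses))]    # best regularization value

# grid bracketing the reported optimum of 4.0100e-04 for subject "av"
lambda_grid = np.linspace(1e-5, 1e-3, 25)
```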
6.1.3. Analysis using all channel selection criteria, statistical measures and classifiers

In this section, we describe the details of the results using 18, 7 and 3 channels, employing ANN, MNN, FFNN, CFNN, RNN, PNN and MLP with Sen, Pre, Acc, F1, Spe and Kco (calculated from the confusion matrix) for PCA, ICA, LDA and NCA. Due to the different mental and physical natures of the subjects, we obtained a unique feature combination for every subject. The results obtained by PCA and ICA are shown in Table 4, whereas the results achieved by LDA and NCA for the different channel selection criteria can be seen in Table 5. The best classification results are indicated in bold.

Table 4. Classification outcomes (%) of the three channel strategies by PCA and ICA. Each row lists the values for the classifiers ANN, MNN, FFNN, CFNN, RNN, PNN and MLP, in that order (ANN and MNN denote the single-layer and multilayer ANN of Section 3.6.1).

Subject "aa":
  Strategy 1, PCA: Sen 80 46.7 69.2 80 69.2 54.6 80; Spe 80 40 85.7 80 85.7 55.6 80; Acc 80 45 75 80 75 55 80
  Strategy 1, ICA: Sen 100 100 100 100 90.9 91.7 100; Spe 94.7 97 69.2 94.7 68.8 70.8 94.7; Acc 97.2 97.2 77.8 97.2 75 77.8 97.2
  Strategy 2, PCA: Sen 100 100 90 100 100 72.7 100; Spe 100 100 90 100 100 77.8 100; Acc 100 100 90 100 100 75 100
  Strategy 2, ICA: Sen 100 66.7 66.7 100 100 100 100; Spe 100 66.7 66.7 100 75 75 100; Acc 100 66.7 66.7 100 83.3 83.3 100
  Strategy 3, PCA: Sen 87.5 75 77.8 87.5 69.2 83.3 87.5; Spe 75 87.5 72.7 75 85.7 64.3 75; Acc 80 80 75 80 75 70 80
  Strategy 3, ICA: Sen 92.9 80 80 92.9 80 100 92.9; Spe 94 66.7 66.7 94 66.7 77.8 94; Acc 92.8 71.4 71.4 92.8 71.4 85.7 92.8

Subject "al":
  Strategy 1, PCA: Sen 90 75 87.5 90 61.5 70 90; Spe 90 66.7 75 90 71.4 70 90; Acc 90 70 80 90 65 70 90
  Strategy 1, ICA: Sen 100 83.3 81 100 76.5 62.5 100; Spe 100 83.3 93.3 100 73.7 75 100; Acc 100 83.3 86 100 75 66.7 100
  Strategy 2, PCA: Sen 77.8 40 66.7 77.8 77.8 45.5 77.8; Spe 72.7 40 75 72.7 72.7 44.4 72.7; Acc 75 40 70 75 75 45 75
  Strategy 2, ICA: Sen 100 100 100 100 100 100 100; Spe 100 75 75 100 75 100 100; Acc 100 83.3 83.3 100 83.3 100 100
  Strategy 3, PCA: Sen 100 87.5 100 100 100 62.5 100; Spe 83.3 75 83.3 83.3 83.3 58.3 83.3; Acc 90 80 90 90 90 60 90
  Strategy 3, ICA: Sen 85.7 75 83.3 85.7 80 100 85.7; Spe 85.7 83.3 75 85.7 66.7 77.8 85.7; Acc 85.7 78.5 78.5 85.7 71.4 85.7 85.7

Subject "av":
  Strategy 1, PCA: Sen 72.7 66.7 57.4 72.7 64.3 64.3 72.4; Spe 77.6 63.6 66.7 77.6 83.3 83.3 77.6; Acc 75 65 60 75 70 70 75
  Strategy 1, ICA: Sen 80 68 77.3 80 73.9 65.4 80; Spe 87.5 90.1 92.9 87.5 92.3 90 87.5; Acc 83.3 75 83.3 83.3 80.6 72.2 83.3
  Strategy 2, PCA: Sen 81.8 72.7 81.8 81.8 81.8 61.5 81.8; Spe 88.9 77.8 88.9 88.9 88.9 71.4 88.9; Acc 85 75 90 85 85 65 85
  Strategy 2, ICA: Sen 83.3 83.3 83.3 83.3 66.7 66.7 83.3; Spe 83.3 83.3 83.3 83.3 66.7 66.7 83.3; Acc 83.3 83.3 83.3 83.3 66.7 66.7 83.3
  Strategy 3, PCA: Sen 80 77.8 80 80 66.7 83.3 80; Spe 80 72.7 80 80 63.6 64.3 80; Acc 80 75 80 80 65 70 80
  Strategy 3, ICA: Sen 87.5 55.7 63.4 87.5 50 44.4 87.5; Spe 100 80 100 100 50 40 100; Acc 92.8 71.4 71.4 92.8 50 42.8 92.8

Subject "aw":
  Strategy 1, PCA: Sen 85.7 50 87.5 85.7 66.7 55 85.7; Spe 69.3 50 75 69.3 57.1 55 69.3; Acc 75 50 75 75 60 55 75
  Strategy 1, ICA: Sen 80.9 73.9 77.3 80.9 66.7 85 80.9; Spe 93.3 92.3 92.9 93.3 83.3 84.6 93.3; Acc 86.1 80.6 83.3 86.1 72.2 75 86.1
  Strategy 2, PCA: Sen 100 81.8 80 100 81.8 66.7 100; Spe 90.9 88.9 80 90.9 88.9 63.6 90.9; Acc 95 85 80 95 85 65 95
  Strategy 2, ICA: Sen 100 100 100 100 66.7 75 100; Spe 100 100 100 100 66.7 100 100; Acc 100 100 100 100 66.7 83.3 100
  Strategy 3, PCA: Sen 80 46.7 69.2 80 69.2 54.6 80; Spe 80 40 85.7 80 85.7 55.6 80; Acc 85 45 75 85 75 55 80
  Strategy 3, ICA: Sen 100 100 87.5 100 100 100 100; Spe 100 100 100 100 87.5 100 100; Acc 100 100 92.8 100 92.8 100 100

Subject "ay":
  Strategy 1, PCA: Sen 90 90 70 90 90 66.7 90; Spe 90 90 70 90 90 57.1 90; Acc 90 90 70 90 90 60 90
  Strategy 1, ICA: Sen 100 88.2 85.7 100 100 88.9 100; Spe 81.8 84.2 72.7 81.8 78.3 62.9 81.8; Acc 88.9 86.1 77.8 88.9 86 69.4 88.9
  Strategy 2, PCA: Sen 72.7 63.6 77.8 72.7 58.3 58.3 72.7; Spe 77.8 66.7 72.7 77.8 62.5 62.5 77.8; Acc 75 65 75 75 60 60 75
  Strategy 2, ICA: Sen 100 100 100 100 66.7 100 100; Spe 100 100 100 100 66.7 100 100; Acc 100 100 100 100 66.7 100 100
  Strategy 3, PCA: Sen 80 80 69.2 80 69.3 54.6 80; Spe 80 80 85.7 80 85.7 55.6 80; Acc 80 80 75 80 75 55 80
  Strategy 3, ICA: Sen 100 85.7 100 100 85.7 80 100; Spe 87.5 85.7 70 87.5 85.7 66.7 87.5; Acc 92.8 85.7 78.5 92.8 85.7 71.4 92.8

Average:
  Strategy 1, PCA: Sen 83.5 69 74.3 83.5 70.3 62.1 83.5; Spe 81.4 62.1 74.5 81.4 77.5 64.2 81.4; Acc 82 64 72 82 72 62 82
  Strategy 1, ICA: Sen 92.9 82.7 84.3 92.2 81.6 78.8 92.2; Spe 91.5 89.4 84.2 91.5 79.3 76.7 91.5; Acc 91.1 84.4 81.6 91.5 77.8 72.2 91.1
  Strategy 2, PCA: Sen 86.5 71.6 79.3 86.5 67.8 60.9 86.5; Spe 85.9 74.7 81.3 85.9 82.6 63.9 85.9; Acc 86 73 81 86 81 62 86
  Strategy 2, ICA: Sen 96.7 90 90 96.7 80 83.3 96.7; Spe 96.7 85 85 96.7 70 83.3 96.7; Acc 96.7 86.7 86.7 96.7 73.3 86.7 96.7
  Strategy 3, PCA: Sen 85.5 73.4 79.2 85.5 74.9 67.7 85.5; Spe 79.7 71 81.5 79.7 80.8 59.6 79.7; Acc 83 72 79 83 76 60 83
  Strategy 3, ICA: Sen 92.9 79.3 82.8 92.9 79.1 84.9 92.9; Spe 93.4 83.1 82.3 93.4 71.1 72.5 93.4; Acc 92.7 81.4 78.5 92.7 74.3 77.1 92.7

As shown in these tables, the most accurate attributes were unique to each classifier, so each combination provides different classification results. For PCA, it is evident from Table 4 that the combination of 7 channels and the chosen principal components results in 100%, 75%, 90%, 85% and 75% classification accuracy for subjects "aa", "al", "av", "aw" and "ay" when employing ANN, CFNN and MLP, respectively. The mean classification accuracy for this case is 86%, which is higher than the results obtained with 18 and 3 channels, as shown in Table 4. It is also worth noting that there are significant differences among Sen, Pre, Acc, F1, Spe and Kco, which shows the instability of PCA for the detection of the different MI tasks.

For ICA with 7 channels, we obtained 100% Sen, Pre, Acc, F1, Spe and Kco for subjects "aa", "al", "aw" and "ay" by utilizing the ANN, CFNN and MLP classifiers.
Table 5. Classification outcomes (%) of the three channel selection criteria by LDA and NCA. Each row lists the values for the classifiers ANN, MNN, FFNN, CFNN, RNN, PNN and MLP, in that order.

Subject "aa":
  Strategy 1, LDA: Sen 98.9 98.8 98.9 98.9 99.2 92.7 98.9; Spe 98.1 96.1 96 98.1 95.9 89.5 98.1; Acc 98.5 97.4 97.5 98.5 97.5 91 98.5
  Strategy 1, NCA: Sen 99.9 96.4 98.2 99.9 96.8 99.3 99.9; Spe 100 99.8 100 100 99.9 99.9 100; Acc 99.9 98.1 99.1 99.9 98.4 99.6 99.9
  Strategy 2, LDA: Sen 100 100 100 100 100 97.7 100; Spe 100 100 100 100 100 99.9 100; Acc 100 100 100 100 100 98.8 100
  Strategy 2, NCA: Sen 100 100 100 100 100 100 100; Spe 100 100 100 100 100 99.4 100; Acc 100 100 100 100 100 99.7 100
  Strategy 3, LDA: Sen 96.5 96.4 94.8 96.5 94.1 63.9 96.5; Spe 96.4 96.3 94.4 96.4 93.7 43.5 100; Acc 98.2 98.2 97.2 98.2 96.9 71.8 98.2
  Strategy 3, NCA: Sen 99.9 99.4 99.9 99.9 98.8 99.3 99.9; Spe 99.9 98.7 99.9 99.9 99.9 99.6 99.9; Acc 99.9 99.1 99.9 99.9 99.3 99.4 99.9

Subject "al":
  Strategy 1, LDA: Sen 96.9 96.9 96.9 96.9 96.9 96.9 96.9; Spe 93.4 93.4 93.4 93.4 93.4 93.4 93.4; Acc 95.1 95.1 95.1 95.1 95.1 95.1 95.1
  Strategy 1, NCA: Sen 100 100 100 100 100 94.9 100; Spe 100 100 100 100 100 93.2 100; Acc 100 100 100 100 100 94.1 100
  Strategy 2, LDA: Sen 96.9 96.9 96.9 96.9 96.9 96.9 96.9; Spe 93.4 93.4 93.4 93.4 93.4 93.4 93.4; Acc 95.1 95.1 95.1 95.1 95.1 95.1 95.1
  Strategy 2, NCA: Sen 100 100 100 100 100 100 100; Spe 100 100 100 100 100 100 100; Acc 100 100 100 100 100 100 100
  Strategy 3, LDA: Sen 97.4 97.8 96.4 97.4 96.1 97.2 97.4; Spe 96.4 96.4 96.4 96.4 96.2 64.4 96.4; Acc 96.9 97.1 96.4 96.9 96.1 92.1 96.9
  Strategy 3, NCA: Sen 100 98.8 98.4 100 99.8 99.8 100; Spe 100 99.4 98.4 100 99.1 99.1 100; Acc 100 99.1 98.4 100 99.4 99.4 100

Subject "av":
  Strategy 1, LDA: Sen 99.9 99.7 99.9 99.9 99.9 84.7 99.9; Spe 98.9 98.3 98.9 98.9 98.5 81.8 98.9; Acc 99.4 99.1 99.4 99.4 99.2 83.2 99.4
  Strategy 1, NCA: Sen 100 100 100 100 100 100 100; Spe 100 100 100 100 100 70.1 100; Acc 100 100 100 100 100 78.7 100
  Strategy 2, LDA: Sen 99.6 99.1 98.6 99.6 98.5 95.3 99.6; Spe 99.6 97.8 99.7 99.6 99.5 98.2 99.6; Acc 99.6 98.4 99.1 99.6 99.1 96.7 99.6
  Strategy 2, NCA: Sen 100 100 100 100 100 100 100; Spe 100 100 100 100 100 100 100; Acc 100 100 100 100 100 100 100
  Strategy 3, LDA: Sen 92.7 91.4 92.4 92.7 92.3 80.8 92.7; Spe 94.8 92.6 94.1 94.8 94.5 87.9 94.8; Acc 93.7 91.9 92.3 93.7 93.4 83.9 93.7
  Strategy 3, NCA: Sen 96.9 94.8 95.8 96.9 95.2 87.5 96.9; Spe 96.9 95.7 96.9 96.9 97.9 99.2 96.9; Acc 96.9 95.6 96.4 96.9 96.5 92.6 96.9

Subject "aw":
  Strategy 1, LDA: Sen 99.9 99.4 99.4 99.9 97.5 97.3 99.9; Spe 99.8 99.6 99.1 99.8 99.2 93.6 99.8; Acc 99.9 99.5 99.2 99.9 98.3 95.3 99.9
  Strategy 1, NCA: Sen 98.4 98.2 98.2 98.4 98.4 98.3 98.4; Spe 100 99.9 99.9 100 100 98.7 100; Acc 99.2 99.1 99.1 99.2 99.2 98.7 99.2
  Strategy 2, LDA: Sen 100 98.9 99.5 100 99.5 96.3 100; Spe 100 99.6 100 100 100 99.1 100; Acc 99.9 99.2 99.7 99.9 99.6 97.7 99.9
  Strategy 2, NCA: Sen 100 100 100 100 100 95.4 100; Spe 100 90 90 100 100 97 100; Acc 100 100 100 100 100 96.2 100
  Strategy 3, LDA: Sen 97.2 97.2 94.3 97.2 94.3 100 97.2; Spe 97.2 97.2 97.1 97.2 96.9 78.2 97.2; Acc 97.2 97.2 95.6 97.2 95.6 86 97.2
  Strategy 3, NCA: Sen 99.9 99.8 99.9 99.9 99.3 100 99.9; Spe 99.9 99.8 99.9 99.9 99.5 99.9 99.9; Acc 99.9 99.8 99.9 99.9 99.5 99.9 99.9

Subject "ay":
  Strategy 1, LDA: Sen 98.8 97.9 95.7 98.8 96.7 94.1 98.8; Spe 99.2 99 99.4 99.2 99.3 94.3 99.2; Acc 99.2 98.5 97.5 99.2 97.9 94.6 99.2
  Strategy 1, NCA: Sen 93.8 92.8 91.7 93.8 91.8 77.9 93.8; Spe 96.9 94.6 98.5 96.9 90 88.7 96.9; Acc 95.3 94.5 94.8 95.3 94.7 92.5 95.3
  Strategy 2, LDA: Sen 100 100 100 100 99.5 97.6 100; Spe 100 100 100 100 99.4 99.9 100; Acc 100 100 100 100 99.5 98.7 100
  Strategy 2, NCA: Sen 100 99.7 100 100 99.9 97.8 100; Spe 100 99.6 100 100 100 99.8 100; Acc 100 99.7 99.9 100 99.9 98.7 100
  Strategy 3, LDA: Sen 98 95.5 97.3 98 97.4 93.7 98; Spe 95.4 93.7 95.4 95.4 95.9 89.9 95.4; Acc 96.7 94.6 96.3 96.7 96.7 91.8 96.7
  Strategy 3, NCA: Sen 96.9 93.5 97.9 96.9 98.8 99.6 96.9; Spe 96.9 92.7 96.9 96.9 96 99.6 96.9; Acc 96.9 93.1 97.4 96.9 97.4 99.6 96.9

Average:
  Strategy 1, LDA: Sen 98.9 98.5 98.2 98.9 98 93.1 98.9; Spe 97.9 97.3 97.4 97.9 97.3 90.5 97.9; Acc 98.4 97.2 97.7 98.4 97.6 91.8 98.4
  Strategy 1, NCA: Sen 98.4 97.5 98 98.4 97.4 94.1 98.4; Spe 99.4 98.9 99.4 99.4 97.9 89.8 99.4; Acc 98.9 98.3 98.7 98.9 98.5 90.7 98.9
  Strategy 2, LDA: Sen 99.3 98.9 99 99.3 98.9 96.8 99.3; Spe 98.6 98.2 98.6 98.6 98.5 98.1 98.6; Acc 98.9 98.5 98.8 98.9 98.7 97.4 98.9
  Strategy 2, NCA: Sen 100 99.9 100 100 99.9 98.6 100; Spe 100 97.9 100 100 100 99.2 100; Acc 100 99.9 100 100 99.9 98.9 100
  Strategy 3, LDA: Sen 96.4 95.7 95 96.4 94.8 87.1 96.4; Spe 96.8 95.9 96.6 96.8 96.7 84.1 96.8; Acc 96.5 95.8 95.6 96.5 95.7 85.1 96.5
  Strategy 3, NCA: Sen 98.1 97.3 98.4 98.1 98.4 96 98.1; Spe 98.1 97.3 98.4 98.1 98.5 99.5 98.1; Acc 98.1 97.3 98.3 98.1 98.4 98.2 98.1
Fig. 10. Features with different weights.

Likewise, LDA with 7 channels provides 100%, 95.1%, 99.6%, 99.9% and 100% classification accuracy for subjects ‘‘aa’’, ‘‘al’’, ‘‘av’’, ‘‘aw’’ and ‘‘ay’’ respectively, utilizing the same classifiers as mentioned for ICA. The detailed results obtained by ICA and LDA with different decoded channels are shown in Tables 4 and 5 respectively. The average 𝑆𝑒𝑛, 𝑃𝑟𝑒, 𝐴𝑐𝑐, 𝐹1, 𝑆𝑝𝑒 and 𝐾𝑐𝑜 are 96.7%, 97.5%, 96.7%, 96.6%, 96.7% and 100% for ICA, and 99.3%, 98.5%, 98.9%, 98.9%, 98.6% and 97.9% for LDA, employing the ANN, CFNN and MLP classifiers. These results indicate smaller variations among the different measures and show the fair nature of the ICA and LDA methods in detecting different MI states in comparison with PCA.

The classification results obtained by NCA are given in Table 5. We obtained 100% average 𝑆𝑒𝑛, 𝑃𝑟𝑒, 𝐴𝑐𝑐, 𝐹1, 𝑆𝑝𝑒 and 𝐾𝑐𝑜 with 7 channels and the ANN, CFNN and MLP classifiers, which indicates the significance of the features obtained by NCA for MI EEG signal classification. These results also indicate that NCA with 7 channels improves the classification performance measures while utilizing fewer features in comparison with PCA, ICA and LDA.

In Fig. 11(a)–(c), the precision, F-measure and kappa statistics obtained by the different dimension reduction techniques with 7 channels are shown as bar graphs, each labeled with its average value in percentage. As represented in Fig. 11(a)–(c), for PCA, high variations can be seen among the different values across the six classifiers. NCA provides the maximum results with the least variation among the different measures when utilizing the ANN, CFNN and MLP classifiers.

Fig. 11. (a) Precision (b) F-measure (c) Kappa-Coefficient for NCA with 7 Channels.

The results obtained for dataset IVb are shown in Table 6. These results also yield 100% average 𝑆𝑒𝑛, 𝑃𝑟𝑒, 𝐴𝑐𝑐, 𝐹1, 𝑆𝑝𝑒 and 𝐾𝑐𝑜 for NCA with 7 channels. Table 7 contains the probability (P) values obtained by the Kruskal–Wallis (KW) test applied to the features selected by NCA with 7 channels. As seen in Table 7, the values for the features are extremely low, which indicates the significance of the features that help NCA obtain outstanding classification results. These results conclude that 7 channels, suitable feature selection and the ANN, CFNN and MLP classifiers provide the best combination to achieve benchmark classification output for different subjects, and that NCA is an efficient feature selection tool for subjects of different mental and physical nature that can be used in efficient BCI systems.
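The per-feature Kruskal–Wallis test behind the P values of Table 7 can be reproduced with standard tools; below is a minimal sketch of ours using SciPy, where the feature matrix and labels are hypothetical stand-ins for the NCA-selected features and the MI task labels.

```python
# Minimal sketch of the per-feature Kruskal-Wallis (KW) test used for Table 7;
# `features` and `labels` are hypothetical stand-ins for the NCA-selected
# features (F1-F8) and the MI task labels.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
features = rng.standard_normal((200, 8)) + labels[:, None]  # 8 features

for k in range(features.shape[1]):
    groups = [features[labels == c, k] for c in np.unique(labels)]
    stat, p = kruskal(*groups)
    print(f"F{k + 1}: P = {p:.3e}")  # small P => the feature separates tasks
```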
Table 6
Classification outcome of three channel selection criteria for Dataset IVb.

[Table 6 body: 𝑆𝑒𝑛, 𝑃𝑟𝑒, 𝐴𝑐𝑐, 𝐹1, 𝑆𝑝𝑒 and 𝐾𝑐𝑜 values (%) of PCA, ICA, LDA and NCA under channel-selection strategies 1, 2 and 3 with the ANN, MNN, FFNN, CFNN, RNN, PNN and MLP classifiers.]

Table 7
P values for NCA features with 7 channels.

[Table 7 body: Kruskal–Wallis P values of the NCA-selected features F1–F8 for subjects ‘‘aa’’, ‘‘al’’, ‘‘av’’, ‘‘aw’’, ‘‘ay’’ and ‘‘IVb’’.]

6.1.4. Timing execution of proposed methodology

In our experiments, we obtained the maximum classification accuracy with channel-selection criterion 2, i.e., with 7 channels, so we present the execution time in seconds for this strategy, in terms of training, testing and total algorithm time, in Fig. 12.
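Such timings can be gathered with simple wall-clock measurements around the training and testing calls. The sketch below is our own illustration in Python (the paper's timings were taken in MATLAB), with hypothetical data and classifier settings:

```python
# Hedged sketch: measuring training, testing and total execution time of a
# classifier, mirroring the procedure behind Fig. 12 (illustrative only; the
# reported timings were measured in MATLAB).
import time
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 8))   # hypothetical reduced features
y_train = rng.integers(0, 2, size=200)
X_test = rng.standard_normal((80, 8))

clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)

t0 = time.perf_counter()
clf.fit(X_train, y_train)                 # training time
t1 = time.perf_counter()
clf.predict(X_test)                       # testing time
t2 = time.perf_counter()
print(f"train {t1 - t0:.3f}s, test {t2 - t1:.3f}s, total {t2 - t0:.3f}s")
```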
These execution times were calculated using the MLP classifier for each subject. As seen in Fig. 12, the total algorithm execution time increases with the complexity of the feature selection algorithm: PCA provides the minimum execution time, whereas NCA has the maximum execution time among all the dimensionality reduction techniques. Although the execution time of NCA is higher, owing to the loop used for calculating the classification cost, this elapsed time is a one-time procedure, so the training and testing execution periods are much shorter than those of the other dimension reduction techniques, which is what online BCI applications require. It is also worth noting that we calculated the time over all trials using MATLAB software on a personal computer with 8 GB RAM; the time required for a single trial is still very small and can be further reduced on a more powerful system.

6.2. Subject independent results

Because of the dynamic characteristics of EEG signals, classifier training and testing are highly subject specific, and therefore subject-dependent MI EEG signal identification strategies were formulated in the literature (Ince et al., 2009; Kevric & Subasi, 2017; Li et al., 2011; Lu et al., 2010; Siuly & Li, 2012; Song & Epps, 2007; Wang et al., 2016; Zhang et al., 2013). Nevertheless, in reality it is incredibly hard and tedious for stroke subjects to perform exhaustive training sessions to use a particular device, and hence researchers (Joadder et al., 2019) recently introduced a subject independent (SI) method for the identification of MI EEG signals. However, such a framework's identification performance is low, and a large number of electrodes was utilized in the training phase. In the present research, we have therefore also investigated the effectiveness of the suggested method for identifying MI EEG signals in the SI case. We adopted the same procedure as in Joadder et al. (2019) for the SI experiments, where the first four subjects of dataset IVa were employed for training purposes and the fifth subject was chosen as the test subject. Fig. 13 shows the block diagram of the SI framework with NCA, as it provided the best results in this study. As seen in Fig. 13, there are two main building blocks: training with the first four subjects and testing with the fifth subject. All the results were obtained by a ten-fold cross-validation strategy with the MLP classifier. The results obtained for the SI case are presented in Fig. 14. We obtained 93%, 93%, 92.9%, 93%, 96.4% and 90% average 𝑆𝑒𝑛, 𝑆𝑝𝑒, 𝐴𝑐𝑐, 𝑃𝑟𝑒, 𝐹1 and 𝐾𝑐𝑜 respectively. These small deviations among the different measures indicate that the proposed SI framework has unbiased chances of recognizing each MI task.

7. Discussions

The purpose of this study is to increase the classification outcome for different MI signals by selecting efficient features. The objectives achieved in this study can be summarized as follows:

Fig. 12. Timing Execution with (a) PCA (b) ICA (c) LDA and (d) NCA.

Fig. 13. Schematic diagram of the suggested technique for interpretation of the MI tasks with subject independent case.

1. In the pre-processing unit, temporal and spatial filtering is applied to eliminate cognitive noise and interference between channels. The proposed system is made resilient against noise by this pre-processing unit.
To show the effectiveness of the proposed two-step filtering on classification accuracy, we compared the results with and without pre-processing in our analysis; the revealed results with the pre-processing module are at least 5%–12% better in terms of overall classification accuracy for NCA with seven channels using the MLP classifier, as shown in Fig. 15.

Fig. 15. Effect of pre-processing (two-step filtering) on classification accuracy.

2. In order to identify the appropriate channels relevant to MI information, we select different combinations of channels based on physiological understanding according to the 10–20 system standard; hence, in our research, only combinations of 18, 7 and 3 channels around the motor cortex brain area were chosen to provide adequate identification results among various MI tasks.

Table 8
Comparison with other studies. Each study is listed with its classification accuracy (%) per subject and, on the second line, the number of channels/features used.

Papers by                 Methods employed            ‘‘aa’’   ‘‘al’’   ‘‘av’’   ‘‘aw’’   ‘‘ay’’   ‘‘IVb’’  Average
The proposed              EWT+NCA+ANN/CFNN/MLP        100      100      100      100      100      100      100
                          channels/features           7/6      7/8      7/8      7/2      7/7      7/1      –
Siuly and Li (2012)       CC+tuned LS-SVM             97.9     99.2     98.8     93.4     89.4     97.9     96.1
                          channels/features           118/6    118/6    118/6    118/6    118/6    118/6    –
Ince et al. (2009)        CS+SVM                      95.6     99.7     90.5     98.4     95.7     –        96
                          channels/features           33/16    33/16    33/16    33/64    33/64    –        –
Wang et al. (2016)        OA+HOS+NB                   97.9     97.9     98.3     94.5     93.7     91.9     95.6
                          channels/features           118/11   118/11   118/11   118/11   118/11   118/11   –
Sadiq et al. (2019a)      EWT+IA2+tuned LS-SVM        94.5     91.7     97.2     95.6     97       –        95.2
                          channels/features           18/10    18/10    18/10    18/10    18/10    –        –
Kevric and Subasi (2017)  MSPCA+WPD+HOS+k-NN          96       92.3     88.9     95.4     91.4     –        92.8
                          channels/features           3/6      3/6      3/6      3/6      3/6      –        –
Li et al. (2011)          Clustering+LS-SVM           92.6     84.9     90.8     86.5     86.7     –        88.3
                          channels/features           118/9    118/9    118/9    118/9    118/9    –        –
Song and Epps (2007)      SSRCSP                      87.4     97.4     69.7     96.8     88.6     –        87.9
                          channels/features           18/20    18/20    18/20    18/20    18/20    –        –
Lu et al. (2010)          R-CSP through aggregation   76.8     98.2     74.5     92.2     77       –        83.7
                          channels/features           118/6    118/6    118/6    118/6    118/6    –        –
Zhang et al. (2013)       Z-LDA                       77.7     100      68.4     99.6     59.9     –        81.1
                          channels/features           118/6    118/6    118/6    118/6    118/6    –        –

Fig. 14. Results for subject independent MI EEG signal classification.

3. The EWT procedure was employed to capture the complex nature of the EEG signals, and 10 modes were selected in a trial-and-error manner for each channel to distinguish between different MI tasks. Although the other modes are physiologically meaningful, they do not contribute to the classification of MI tasks.

4. For channel selection strategies 1, 2 and 3, we grouped all modes in sequential order and acquired 180, 70 and 30 modes in all. Every mode is treated as one feature vector in the analysis.

5. We next applied PCA, ICA, LDA and NCA to each feature matrix obtained from the three channel selection strategies to acquire various components and coefficients from PCA, ICA and LDA. To select the efficient components and coefficients, a correlation-based criterion was implemented to further reduce the feature dimension. For NCA, the value of the regularization parameter providing the minimum classification loss was selected; a minimal sketch of this tuning is given below.
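The following is a hedged illustration, not the authors' implementation: scikit-learn's NCA does not expose a regularization term, so this sketch implements a small regularized NCA feature-weighting objective directly, following the neighborhood component feature selection formulation of Yang et al. (2012) (with the kernel width omitted for brevity), and picks the λ with the minimum loss from an assumed grid; all names and data are hypothetical.

```python
# Hedged sketch of regularized NCA feature weighting and lambda tuning
# (assumed details; not the authors' exact MATLAB implementation).
import numpy as np
from scipy.optimize import minimize

def nca_objective(w, X, y, lam):
    """Negative regularized NCA objective: average leave-one-out probability
    of correct classification minus an L2 penalty on the feature weights."""
    w2 = w ** 2
    dist = np.abs(X[:, None, :] - X[None, :, :]) @ w2  # weighted L1 distances
    np.fill_diagonal(dist, np.inf)                     # exclude self-pairs
    K = np.exp(-dist)
    P = K / K.sum(axis=1, keepdims=True)               # soft-neighbour probabilities
    p_correct = (P * (y[:, None] == y[None, :])).sum(axis=1)
    return -(p_correct.mean() - lam * w2.sum())

def fit_nca_weights(X, y, lam):
    res = minimize(nca_objective, np.ones(X.shape[1]), args=(X, y, lam),
                   method="L-BFGS-B")
    return res.x ** 2                                  # per-feature weights

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=60)                        # hypothetical MI labels
X = rng.standard_normal((60, 10))                      # hypothetical mode features
X[:, 0] += 2.0 * y                                     # make one feature informative

losses = {}
for lam in [0.0, 0.01, 0.1, 1.0]:                      # assumed grid of lambdas
    w = np.sqrt(fit_nca_weights(X, y, lam))
    losses[lam] = nca_objective(w, X, y, 0.0)          # classification loss only
best_lam = min(losses, key=losses.get)
print("lambda with minimum classification loss:", best_lam)
```

In the study itself, this kind of sweep is what Fig. 9 visualizes as the plot of regularization parameters against mean loss values.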
6. For the fair analysis of the proposed framework, extensive evaluation parameters (recall/sensitivity, precision, accuracy, F1 score, specificity and kappa coefficient) were evaluated with various neural networks, and the high values of these parameters showed the success of the proposed method.

7. Our experimental results showed that NCA provides higher classification results than PCA, ICA and LDA, as shown in Tables 4 and 5. This is because PCA is a linear transformation approach for discovering a vector data representation that captures the total variance in an unsupervised fashion. It can result in data loss during the projection to lower dimensions. For situations where the mean and covariance are not sufficient to characterize a dataset, PCA often suffers, and it seeks linear correlation among variables, which is often not desired. The ICA algorithm uses higher-order statistics (HOS) to resolve the blind source separation (BSS) limitations of the linear model. ICA transforms the signal subspace into independent components invisible to PCA; however, HOS are highly susceptible to outliers. LDA is a parametric process for learning a linear set of features such that the discriminatory information carried by the class labels is retained. LDA assumes that the data follow a Gaussian distribution and fails if the discriminatory information is not in the mean but in the variance of the data. NCA, on the other hand, ranks the features according to weights to increase the classification accuracy and does not lose any information during dimensionality reduction (Raghu & Sriraam, 2018); it therefore provides higher classification accuracy in our experiments in comparison with PCA, ICA and LDA. It is also worth mentioning that in our previous study (Sadiq et al., 2019a) we employed the least-squares version of the SVM classifier and obtained an average classification accuracy of 95.19%. In the present study, we repeated the experiments with the LS-SVM and obtained classification accuracies of 94.7%, 92.2%, 97.77%, 95.7% and 97% for subjects ‘‘aa’’, ‘‘av’’, ‘‘al’’, ‘‘aw’’ and ‘‘ay’’ respectively, i.e., an average of 95.47%, which is very close to our previous result. Since the EWT features are here tested with several neural networks, we conclude that neural networks help in obtaining much better results for the EWT.

Fig. 16. Comparison of subject independent outcome with other studies.

8. At last, to show the success of the proposed approach, a comparison is made with several other studies applied to datasets IVa and IVb. As seen in Table 8, our proposed approach with channel selection strategy 2 and NCA provides a benchmark classification accuracy of 100% for subjects ‘‘aa’’, ‘‘av’’, ‘‘al’’, ‘‘aw’’, ‘‘ay’’ and ‘‘IVb’’. This shows that the proposed method is suitable for subjects with different training samples. The CC+tuned LS-SVM (Siuly & Li, 2012) and CS+SVM (Ince et al., 2009) achieved classification accuracies of 96.08% and 96%, ranking 2nd and 3rd. The Z-LDA (Zhang et al., 2013) ranked last with an average classification accuracy of 81.1%.
In comparison with those studies in Table 8, our proposed approach ‘‘two-step filtering+EWT+NCA+ANN/CFNN/MLP’’ provides a classification improvement of 3.92%–18.9%. Such an increase in accuracy may help subjects convey their MI tasks more clearly. As an illustration, handicapped people might be able to regulate their wheelchairs more effectively, and rehabilitated patients might be able to increase their therapeutic activities with adequate feedback after they perform the required action. Moreover, the other studies in Table 8, ‘‘CC+LS-SVM’’ (Siuly & Li, 2012), ‘‘OA+NB’’ (Wang et al., 2016), ‘‘RCSP+aggregation’’ (Lu et al., 2010), ‘‘CS+SVM’’ (Ince et al., 2009), EWT+IA2+HOS (Sadiq et al., 2019a) and ‘‘CSP+SVM’’ (Song & Epps, 2007), utilized 118, 118, 118, 33, 18 and 18 channels respectively, whereas our proposed method utilized only 7 channels to achieve benchmark classification outcomes. Table 8 also shows that the proposed algorithm chooses significant features that are particular to each subject, which demonstrates that it is versatile enough to adopt in developing a subject-specific BCI system.

9. For the subject-independent case, the comparison of the proposed method with other studies is shown in Fig. 16 as bar graphs, where the top of each bar is labeled with the maximum average classification accuracy of each study. The proposed study ranked first in terms of overall classification accuracy with 92.9%. Our previous study (Sadiq et al., 2019b) ranked 2nd with an overall classification accuracy of 91.4%, ‘‘CSP+Katz Fractal Dimension+LDA’’ (Joadder et al., 2019) ranked 3rd with an 84.3% average classification outcome, whereas the evolutionary-based algorithm (Atyabi et al., 2017) ranked last with 71.9%. These results suggest that the proposed ‘‘two-step filtering+seven channels+regularized NCA+MLP’’ framework in the EWT domain provides up to 21% improvement in overall classification results for the SI case; a minimal sketch of this SI protocol is given at the end of this section. Moreover, it is important to note that a total of 68, 118, 22 and 118 electrodes were used in the studies (Atyabi et al., 2017; Devlaminck et al., 2011; Kang et al., 2009; Samek et al., 2013), while the proposed research used just 7 channels to produce the best performance.

The key benefits of the suggested framework are its robustness against noise, its selection of reduced features for different subjects, and its stable evaluation outcomes for both the subject-dependent and subject-independent cases using only 7 channels. These advantages reveal that the proposed method is suitable for the development of an expert BCI system. In the present study, the number of channels has been identified empirically for the EWT. Since one has to choose the channels with relevant MI information manually, such an empirical selection strategy takes a long time. To overcome this constraint, our next step is to establish adaptive channel discovery algorithms that allow effective and scalable signal detection approaches for practical applications. In addition to the publicly accessible datasets, our next goal is to test the applied approaches online for other applications.
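To make the subject-independent protocol of Fig. 13 concrete, the following minimal sketch of ours pools the features of the first four subjects for training and tests on the fifth subject with an MLP; the arrays are hypothetical placeholders for the NCA-reduced EWT features, not data from the study.

```python
# Hedged sketch of the subject-independent (SI) protocol: train on subjects
# "aa"-"aw" and test on the held-out fifth subject "ay". All data are
# hypothetical placeholders for the NCA-reduced EWT features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
subjects = {s: (rng.standard_normal((50, 7)), rng.integers(0, 2, size=50))
            for s in ["aa", "al", "av", "aw", "ay"]}

train_ids = ["aa", "al", "av", "aw"]                  # training subjects
X_train = np.vstack([subjects[s][0] for s in train_ids])
y_train = np.concatenate([subjects[s][1] for s in train_ids])
X_test, y_test = subjects["ay"]                       # unseen test subject

clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("SI accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```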
8. Conclusion

In this work, we developed automated methods for the pattern mining of EEG data using the EWT technique with PCA, ICA, LDA and NCA. To authenticate the reliability of the proposed methods, three different channel combinations were decoded from two publicly available BCI competition III datasets. To eliminate the irrelevant components and coefficients from PCA, ICA and LDA, a correlation-based selection criterion with the best-first search technique was utilized. The regularization parameter of the NCA method was also tuned to select the relevant attributes. All the experiments were evaluated using various statistical measures and neural networks. We achieved 100% and 92.9% classification accuracy for the subject-dependent and subject-independent cases respectively by utilizing NCA with 7 channels and the MLP classifier, which is higher than other works on the same publicly available datasets. The computing cost of NCA is higher than that of PCA, ICA and LDA owing to the search over the tuned parameter that governs the function of NCA. Nevertheless, the estimated model's training and testing times were shorter for NCA than for PCA, ICA and LDA due to the reduced attribute space. In conclusion, the combination of two-step filtering, 7 channels, EWT, NCA and MLP is an effective framework for expert BCI applications.

CRediT authorship contribution statement

Muhammad Tariq Sadiq: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. Xiaojun Yu: Validation, Formal analysis, Investigation, Writing - review & editing, Visualization, Supervision, Project administration. Zhaohui Yuan: Writing - review & editing, Visualization, Supervision, Project administration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Acharya, U. R., Sree, S. V., Alvin, A. P. C., & Suri, J. S. (2012). Use of principal component analysis for automatic classification of epileptic EEG activities in wavelet framework. Expert Systems with Applications, 39(10), 9072–9078.
Atyabi, A., Luerssen, M., Fitzgibbon, S. P., Lewis, T., & Powers, D. M. (2017). Reducing training requirements through evolutionary based dimension reduction and subject transfer. Neurocomputing, 224, 19–36.
Bashar, S. K., & Bhuiyan, M. I. H. (2016). Classification of motor imagery movements using multivariate empirical mode decomposition and short time fourier transform based hybrid method. Engineering Science and Technology, an International Journal, 19(3), 1457–1464.
Bhattacharyya, S., Sengupta, A., Chakraborti, T., Konar, A., & Tibarewala, D. (2014). Automatic feature selection of motor imagery EEG signals using differential evolution and learning automata. Medical & Biological Engineering & Computing, 52(2), 131–139.
Birbaumer, N., Murguialday, A. R., & Cohen, L. (2008). Brain–computer interface in paralysis. Current Opinion in Neurology, 21(6), 634–638.
Blankertz, B., Muller, K.-R., Krusienski, D. J., Schalk, G., Wolpaw, J. R., Schlogl, A., Pfurtscheller, G., Millan, J. R., Schroder, M., & Birbaumer, N. (2006). The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14(2), 153–159.
Burke, D. P., Kelly, S. P., De Chazal, P., Reilly, R. B., & Finucane, C. (2005). A parametric feature extraction and classification strategy for brain-computer interfacing. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 13(1), 12–17.
Cao, L., Chua, K. S., Chong, W., Lee, H., & Gu, Q. (2003). A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing, 55(1–2), 321–336.
Chaudhary, S., Taran, S., Bajaj, V., & Siuly, S. (2020). A flexible analytic wavelet transform based approach for motor-imagery tasks classification in BCI applications. Computer Methods and Programs in Biomedicine, 187, Article 105325.
Cincotti, F., Mattia, D., Aloise, F., Bufalari, S., Schalk, G., Oriolo, G., Cherubini, A., Marciani, M. G., & Babiloni, F. (2008). Non-invasive brain–computer interface system: towards its application as assistive technology. Brain Research Bulletin, 75(6), 796–803.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
Daubechies, I. (1992). Ten lectures on wavelets (vol. 61). SIAM.
Devlaminck, D., Wyns, B., Grosse-Wentrup, M., Otte, G., & Santens, P. (2011). Multisubject learning for common spatial patterns in motor-imagery BCI. Computational Intelligence and Neuroscience, 2011.
Dornhege, G., Millán, J. d. R., Hinterberger, T., McFarland, D., & Müller, K.-R. (2007). Toward brain-computer interfacing (vol. 63). Cambridge, MA: MIT Press.
Ebrahimi, T., Vesin, J.-M., & Garcia, G. (2003). Brain-computer interface in multimedia communication. IEEE Signal Processing Magazine, 20(1), 14–24.
Feng, J. K., Jin, J., Daly, I., Zhou, J., Niu, Y., Wang, X., & Cichocki, A. (2019). An optimized channel selection method based on multifrequency CSP-rank for motor imagery-based BCI system. Computational Intelligence and Neuroscience, 2019.
Fielding, A. (2007). Cluster and classification techniques for the biosciences (vol. 260). Cambridge: Cambridge University Press.
Gilles, J. (2013). Empirical wavelet transform. IEEE Transactions on Signal Processing, 61(16), 3999–4010.
Goyal, S., & Goyal, G. K. (2011). Cascade and feedforward backpropagation artificial neural networks models for prediction of sensory quality of instant coffee flavoured sterilized drink. Canadian Journal on Artificial Intelligence, Machine Learning and Pattern Recognition, 2(6), 78–82.
Hall, M. A. (1999). Correlation-based feature selection for machine learning. Hamilton: University of Waikato.
Hossin, M., & Sulaiman, M. (2015). A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management Process, 5(2), 1.
Ince, N. F., Goksu, F., Tewfik, A. H., & Arica, S. (2009). Adapting subject specific motor imagery EEG patterns in space–time–frequency for a brain computer interface. Biomedical Signal Processing and Control, 4(3), 236–246.
Jana, G. C., Swetapadma, A., & Pattnaik, P. K. (2018). Enhancing the performance of motor imagery classification to design a robust brain computer interface using feed forward back-propagation neural network. Ain Shams Engineering Journal, 9(4), 2871–2878.
Jansen, B. H., Bourne, J. R., & Ward, J. W. (1981). Autoregressive estimation of short segment spectra for computerized EEG analysis. IEEE Transactions on Biomedical Engineering, (9), 630–638.
Jiang, X., Bian, G.-B., & Tian, Z. (2019). Removal of artifacts from EEG signals: a review. Sensors, 19(5), 987.
Jiao, Y., Zhang, Y., Chen, X., Yin, E., Jin, J., Wang, X., & Cichocki, A. (2018). Sparse group representation model for motor imagery EEG classification. IEEE Journal of Biomedical and Health Informatics, 23(2), 631–641.
Jin, Z., Zhou, G., Gao, D., & Zhang, Y. (2018). EEG classification using sparse Bayesian extreme learning machine for brain–computer interface. Neural Computing and Applications, 1–9.
Joadder, M. A., Siuly, S., Kabir, E., Wang, H., & Zhang, Y. (2019). A new design of mental state classification for subject independent BCI systems. IRBM, 40(5), 297–305.
Jurcak, V., Tsuzuki, D., & Dan, I. (2007). 10/20, 10/10, and 10/5 systems revisited: their validity as relative head-surface-based positioning systems. NeuroImage, 34(4), 1600–1611.
Kang, H., Nam, Y., & Choi, S. (2009). Composite common spatial pattern for subject-to-subject transfer. IEEE Signal Processing Letters, 16(8), 683–686.
Kevric, J., & Subasi, A. (2017). Comparison of signal decomposition methods in classification of EEG signals for motor-imagery BCI system. Biomedical Signal Processing and Control, 31, 398–406.
Kołodziej, M., Majkowski, A., & Rak, R. J. (2012). Linear discriminant analysis as EEG features reduction technique for brain-computer interfaces. Przeglad Elektrotechniczny, 88, 28–30.
Krepki, R., Blankertz, B., Curio, G., & Müller, K.-R. (2007). The Berlin Brain-Computer Interface (BBCI) – towards a new communication channel for online control in gaming applications. Multimedia Tools and Applications, 33(1), 73–90.
Kronegg, J., Chanel, G., Voloshynovskiy, S., & Pun, T. (2007). EEG-based synchronized brain-computer interfaces: A model for optimizing the number of mental tasks. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(1), 50–58.
Krusienski, D. J., McFarland, D. J., & Wolpaw, J. R. (2006). An evaluation of autoregressive spectral estimation model order for brain-computer interface applications. In 2006 International conference of the IEEE engineering in medicine and biology society (pp. 1323–1326). IEEE.
Li, M.-a., Luo, X.-y., & Yang, J.-f. (2016). Extracting the nonlinear features of motor imagery EEG using parametric t-SNE. Neurocomputing, 218, 371–381.
Li, Y., & Wen, P. P. (2011). Clustering technique-based least square support vector machine for EEG signal classification. Computer Methods and Programs in Biomedicine, 104(3), 358–372.
Li, Y., & Wen, P. (2013). Identification of motor imagery tasks through CC-LR algorithm in brain computer interface. International Journal of Bioinformatics Research and Applications, 9(2), 156–172.
Li, Y., & Wen, P. P. (2014). Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain–computer interface. Computer Methods and Programs in Biomedicine, 113(3), 767–780.
Li, M.-a., Zhu, W., Liu, H.-n., & Yang, J.-f. (2017). Adaptive feature extraction of motor imagery EEG with optimal wavelet packets and SE-isomap. Applied Sciences, 7(4), 390.
Lu, H., Eng, H.-L., Guan, C., Plataniotis, K. N., & Venetsanopoulos, A. N. (2010). Regularized common spatial pattern with aggregation for EEG classification in small-sample setting. IEEE Transactions on Biomedical Engineering, 57(12), 2936–2946.
Mandic, D. P., & Chambers, J. (2001). Recurrent neural networks for prediction: learning algorithms, architectures and stability. John Wiley & Sons, Inc.
Martis, R. J., Acharya, U., & Min, L. C. (2013). ECG beat classification using PCA, LDA, ICA and discrete wavelet transform. Biomedical Signal Processing and Control, 8(5), 437–448.
Pfurtscheller, G., Neuper, C., Muller, G., Obermaier, B., Krausz, G., Schlogl, A., Scherer, R., Graimann, B., Keinrath, C., & Skliris, D. (2003). Graz-BCI: state of the art and clinical applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 11(2), 1–4.
Polat, K., & Güneş, S. (2007). Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast fourier transform. Applied Mathematics and Computation, 187(2), 1017–1026.
Raghu, S., & Sriraam, N. (2018). Classification of focal and non-focal EEG signals using neighborhood component analysis and machine learning algorithms. Expert Systems with Applications, 113, 18–32.
Rodríguez-Bermúdez, G., & García-Laencina, P. J. (2012). Automatic and adaptive classification of electroencephalographic signals for brain computer interfaces. Journal of Medical Systems, 36(1), 51–63.
Sadiq, M. T., Yu, X., Yuan, Z., Fan, Z., Rehman, A. U., Li, G., & Xiao, G. (2019a). Motor imagery EEG signals classification based on mode amplitude and frequency components using empirical wavelet transform. IEEE Access, 7, 127678–127692.
Sadiq, M. T., Yu, X., Yuan, Z., Zeming, F., Rehman, A. U., Ullah, I., Li, G., & Xiao, G. (2019b). Motor imagery EEG signals decoding by multivariate empirical wavelet transform-based framework for robust brain–computer interfaces. IEEE Access, 7, 171431–171451.
Sakhavi, S., Guan, C., & Yan, S. (2018). Learning temporal information for brain–computer interface using convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 29(11), 5619–5629.
Samek, W., Meinecke, F. C., & Müller, K.-R. (2013). Transferring subspaces between subjects in brain–computer interfacing. IEEE Transactions on Biomedical Engineering, 60(8), 2289–2298.
Schlögl, A., Neuper, C., & Pfurtscheller, G. (2002). Estimating the mutual information of an EEG-based brain-computer interface. Biomedizinische Technik/Biomedical Engineering, 47(1–2), 3–8.
Siuly, S., & Li, Y. (2012). Improving the separability of motor imagery EEG signals using a cross correlation-based least square support vector machine for brain–computer interface. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(4), 526–538.
Siuly, S., Zarei, R., Wang, H., & Zhang, Y. (2017). A new data mining scheme for analysis of big brain signal data. In Australasian database conference (pp. 151–164). Springer.
Song, L., & Epps, J. (2007). Classifying EEG for brain-computer interface: Learning optimal filters for dynamical system features. Computational Intelligence and Neuroscience, 2007.
Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1), 109–118.
Sturm, B. L. (2013). Classification accuracy is not enough. Journal of Intelligent Information Systems, 41(3), 371–406.
Subasi, A., & Ercelebi, E. (2005). Classification of EEG signals using neural network and logistic regression. Computer Methods and Programs in Biomedicine, 78(2), 87–99.
Subasi, A., & Gursoy, M. (2010). EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications, 37(12), 8659–8666.
Szczuko, P. (2017). Real and imaginary motion classification based on rough set analysis of EEG signals for multimedia applications. Multimedia Tools and Applications, 76(24), 25697–25711.
Taran, S., Bajaj, V., Sharma, D., Siuly, S., & Sengur, A. (2018). Features based on analytic IMF for classifying motor imagery EEG signals in BCI applications. Measurement, 116, 68–76.
Thomas, J., Maszczyk, T., Sinha, N., Kluge, T., & Dauwels, J. (2017). Deep learning-based classification for brain-computer interfaces. In 2017 IEEE International Conference on Systems, Man, and Cybernetics (pp. 234–239). IEEE.
Wang, J.-J., Xue, F., & Li, H. (2015). Simultaneous channel and feature selection of fused EEG features based on sparse group lasso. BioMed Research International, 2015.
Wang, H., & Zhang, Y. (2016). Detection of motor imagery EEG signals employing Naïve Bayes based learning process. Measurement, 86, 148–158.
Witten, I. H., Frank, E., & Hall, M. A. (2005). Practical machine learning tools and techniques (p. 578). Morgan Kaufmann.
Xu, N., Gao, X., Hong, B., Miao, X., Gao, S., & Yang, F. (2004). BCI competition 2003 – data set IIb: enhancing P300 wave detection using ICA-based subspace projections for BCI applications. IEEE Transactions on Biomedical Engineering, 51(6), 1067–1072.
Xu, J., Zheng, H., Wang, J., Li, D., & Fang, X. (2020). Recognition of EEG signal motor imagery intention based on deep multi-view feature learning. Sensors, 20(12), 3496.
Yang, W., Wang, K., & Zuo, W. (2012). Neighborhood component feature selection for high-dimensional data. Journal of Computers, 7(1), 161–168.
Yu, X., Chum, P., & Sim, K.-B. (2014). Analysis the effect of PCA for feature reduction in non-stationary EEG based motor imagery of BCI system. Optik, 125(3), 1498–1502.
Zhang, Y., Nam, C. S., Zhou, G., Jin, J., Wang, X., & Cichocki, A. (2018). Temporally constrained sparse group spatial patterns for motor imagery BCI. IEEE Transactions on Cybernetics, 49(9), 3322–3332.
Zhang, Y., Wang, Y., Jin, J., & Wang, X. (2017). Sparse Bayesian learning for obtaining sparsity of EEG frequency bands based feature vectors in motor imagery classification. International Journal of Neural Systems, 27(02), Article 1650032.
Zhang, R., Xu, P., Guo, L., Zhang, Y., Li, P., & Yao, D. (2013). Z-score linear discriminant analysis for EEG based brain-computer interfaces. PLoS One, 8(9).
Zhang, X., Yao, L., Wang, X., Monaghan, J., & Mcalpine, D. (2019). A survey on deep learning based brain computer interface: Recent advances and new frontiers. arXiv preprint arXiv:1905.04149.