Wireless Personal Communications
https://doi.org/10.1007/s11277-023-10445-w

A Machine Learning Framework for Major Depressive Disorder (MDD) Detection Using Non‑invasive EEG Signals

Nayab Bashir1 · Sanam Narejo2 · Bushra Naz2 · Fatima Ismail3 · Muhammad Rizwan Anjum4 · Ayesha Butt2 · Sadia Anwar5 · Ramjee Prasad5

Accepted: 7 April 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
According to a World Health Organization (WHO) report, one person dies by suicide somewhere in the world every 40 seconds. Depression, one of the most prevalent diseases worldwide, is a major contributor to these suicides. Early diagnosis of major depressive disorder (MDD) is believed to reduce the burden of this debilitating disorder. In recent years, various machine learning and advanced neurocomputing techniques have been applied to electroencephalogram (EEG)-based detection of several neurological diseases. This study presents EEG-based screening for MDD using several machine learning (ML) models and one deep learning model. Most previous EEG-based MDD decoding research has concentrated on a limited set of features, so in-depth comparisons of different approaches, together with a more detailed feature-based EEG analysis, were needed. This research first builds a complete feature-based framework and then compares it against state-of-the-art end-to-end techniques. The K-nearest neighbors (KNN) model outperformed the other models with an accuracy of 87.5%, while the long short-term memory (LSTM) model achieved an accuracy of 83.3%. This study can support the clinical diagnosis of multiple stages of MDD and help enable early intervention.
Keywords Neurocomputing · Major depressive disorder · Feature based framework · Machine learning and Deep learning

* Muhammad Rizwan Anjum
engr.muhammadrizwan@gmail.com

Sanam Narejo
Sanam.narejo@faculty.muet.edu.pk

1 Department of Biomedical Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
2 Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
3 Department of BBT, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
4 Department of Electronic Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
5 Department of Business Development and Technology, Aarhus University, Aarhus, Denmark

1 Introduction
Behavioral health disorders, particularly depression, are a critical health concern worldwide. Untreated behavioral health disorders have manifold consequences, with serious and sometimes harmful effects at the individual, interpersonal and societal levels. The distress of people suffering from depression often goes unrecognized, even while they are undergoing treatment. Modern technology has an as yet unrealized potential to identify individuals at high risk for behavioral health conditions and, in turn, to develop prevention and intervention strategies. Around one in every eight children suffers from a psychiatric disorder severe enough to cause functional impairment [1]. Major depressive disorder (MDD), often simply termed depression, is one of the most prevalent mental health conditions and is projected to be the leading contributor to the Global Burden of Disease (GBD) by 2030 [2]. The exact cause of MDD is unclear; genetic, environmental and psychological factors can all trigger it [3].
Stress and anxiety linked with early traumatic experiences and poverty have also been associated with mental health problems [4]. Many symptoms of poor mental health can be alleviated by appropriate services; however, if mental health problems are left untreated, the consequences can be severe. Electroencephalography (EEG) is an effective and well-established tool for recording the electrical activity of the human brain [5]. In recent years it has been used extensively to study and detect a range of brain disorders and states, including seizure prediction [6, 7], mild cognitive impairment (MCI) [8], Alzheimer's disease [9], epileptic seizures [10–12], Creutzfeldt–Jakob disease [13], Parkinson's disease [14, 15], schizophrenia [16], the evaluation of different emotional states [17], sleep studies [18, 19], and brain–computer interfaces (BCI) [20, 21]. EEG is a preferred diagnostic tool for these neurological conditions because it is non-invasive, has high temporal resolution, is easy to operate, and is considerably less expensive than magnetic resonance imaging (MRI) and computed tomography (CT). According to studies in neuroscience, psychology and cognitive science, EEG signals can reveal much about brain function and cognitive behaviour. EEG signals are closely linked to psychological processes and emotional states, and they can reflect emotional changes in real time. In addition, EEG records the brain's electrical activity continuously over time, whereas functional MRI records variations in cerebral blood flow over a few seconds to about a minute. Therefore, EEG signals are preferred over MRI scans for recognizing subjects with MDD.
The EEG signals of normal and depressed subjects are chaotic and complex, and the subtle differences that reflect brain activity in the two groups cannot readily be identified by visual inspection. Hence, a computer-aided detection (CAD) system is developed here to detect depression from individuals' EEG signals. Machine learning has offered promising ways to improve the detection and treatment of serious diseases, and many recent advances in healthcare research can be credited to it [22–24]. In behavioral health, machine learning techniques offer substantial benefits and can support earlier detection and the prevention of such disorders. However, although previous research has established the usefulness of EEG signals, classifying patients with MDD versus healthy subjects remains a challenging research area. The challenges include long preparation times owing to large volumes of data, long acquisition times needed to gather sufficient EEG, and, most crucially, reliance on resting-state EEG alone, recorded with the eyes either closed or open. Long recording sessions can also cause subjects to fall asleep or become bored. Moreover, achieving higher accuracy in classifying MDD versus normal subjects with shorter recordings and fewer recording sites remains an open concern. This study demonstrates a non-invasive, short-duration data acquisition setup for diagnosing MDD from subjects in both resting and non-resting states, the latter while performing a cognitive task.
In addition, various time-domain, frequency-domain and nonlinear features of the EEG signals are explored, and a complete feature-based framework is developed to analyze the EEG signals more extensively. A feature selection technique is applied to construct an optimal feature matrix for MDD classification. In total, 242 features were extracted from the recorded EEG signals. The extracted and selected features were then fed into a variety of ML models to evaluate their effect on these models.

2 Related Work
In [5], the researchers studied nonlinear features of EEG signals by computing Higuchi's fractal dimension (HFD). They found that, based on HFD, MDD patients and healthy controls are best separated in the beta band, contradicting a previous study which reported that the separation is best in the alpha band. HFD was higher in both the beta and gamma bands of MDD subjects. Using HFD, they obtained an accuracy of 91.3%, and HFD performed better than Katz's fractal dimension (KFD). The study in [25] used linear discriminant analysis (LDA) and logistic regression (LR) and achieved an accuracy of 73.3% using the alpha band of the EEG alone. With nonlinear features such as the correlation dimension, the LR classifier reached an accuracy of 91%; both LR and LDA performed well only with nonlinear features. The study also concluded that the right hemisphere discriminates depression better than the left hemisphere, and that nonlinear features classify depressed and healthy subjects better than linear features.

Acharya et al. [26] (2015) reported an accuracy of 98%, a sensitivity of 97% and a specificity of 98.5% using feature ranking, with the SVM classifier outperforming the other classifiers. An SVM with a third-order polynomial kernel was applied to the left and right hemispheres, using values averaged over each hemisphere. Bairy et al. [27], also in 2015, used only nonlinear EEG features (sample entropy, fractal dimension, correlation dimension and Hurst exponent) and achieved an accuracy of 93.8%, a sensitivity of 92% and a specificity of 95.9%. However, the study does not make clear whether internal or external validation was performed, and the technique used to compute the fractal dimension is not adequately reported, which limits reproducibility. Liao et al. [28] obtained an accuracy of 80% with a spectral-spatial EEG feature extractor called the kernel eigen-filter-bank common spatial pattern (KEFB-CSP); their data covered all EEG bands (delta, theta, alpha, beta and gamma). In 2018, Mumtaz et al. [29] used several classifiers to differentiate the EEG of MDD and normal subjects: SVM achieved an accuracy of 98%, LR 91.7% and naive Bayes (NB) 93.6%. Bachmann et al. [30], also in 2018, combined HFD with detrended fluctuation analysis (DFA) and HFD with Lempel–Ziv complexity (LZC), and obtained a maximum accuracy of 85% with an LR classifier. Čukić et al. [31] reported accuracies ranging from 90.24 to 97.56% for LR, SVM (with linear and polynomial kernels), decision tree (DT) and NB classifiers; of the two measures they computed, sample entropy (SampEn) performed better.
The number and placement of electrodes are important factors when acquiring resting-state EEG, because a principal component analysis (PCA) study showed that each electrode makes its own contribution to the results [31, 32]. In summary, the studies discussed above focused on improving classification results. Most of them classify depressed and healthy participants using resting-state EEG only, and the majority report high accuracy with different combinations of features and ML models. However, studies that differentiate depressed and normal subjects using resting-state EEG alone (eyes open or eyes closed) have reached a bottleneck that needs to be addressed. Lastly, all of these studies used modest sample sizes, which limits the generalizability of their models. A summary of the related research is presented in Table 1.

3 Non‑Invasive EEG Based Acquisition System
EEG records electrical activity originating from nerve cells in the cerebral cortex. Because it records brain activity non-invasively from the scalp, it is widely used in neuroscience research, commercial applications and medical diagnosis. In this study, the experimental dataset was obtained from 34 MDD patients (18 females and 16 males) with a mean age of 40.33 years and a standard deviation of ± 12.861. In addition, an age-matched group of 30 healthy individuals volunteered, which included 9 females and 21 males.
Their mean age was 38.227 years with a standard deviation of ± 15.64. The MDD subjects met the diagnostic criteria for depression of the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) [33], and were recruited from the outpatient clinic of Hospital Universiti Sains Malaysia. They met the criteria for major depression and showed no signs or symptoms of dementia. Participants in both groups signed a consent form, and the experimental procedure for both groups was approved by the ethics committee of the hospital.

Table 1 Summary of the related work
Study | Samples | Preprocessing | Features | ML models | Results (%)
Ahmadlou et al. [5] | 12 + 12 | Wavelets and spectral bands (Fourier), bootstrap | HFD and KFD | Enhanced probabilistic neural networks | Accuracy 91.30
Hosseinifard et al. [25] | 45 + 45 | Standard spectral bands | Power, DFA, HFD, CD, Lyapunov exponent | KNN, LR, LDA | Accuracy 90
Acharya et al. [26] | 15 + 15 | Broadband | FD, LLE, SampEn, DFA, DET, ENTR, LAM, T2 (DDI) | SVM, KNN, NB, PNN, DT | Accuracy 98, Sensitivity 97, Specificity 98.5
Bairy et al. [27] | 30 + 30 (left brain only) | Discrete cosine transforms | SampEn, FD, DFA, CD, Hurst exponent, LLE | DT, KNN, NB, SVM | Accuracy 93.80, Sensitivity 92, Specificity 95.9
Liao et al. [28] | 12 + 12 | Common spatial pattern | Spectral (common spatial pattern) | KEFB-CSP | Accuracy 80
Mumtaz et al. [29] | 34 + 30 | REST | Synchronization likelihood | SVM, LR, NB | Accuracy 87.50
Bachmann et al. [30] | 13 + 13 | Fourier | HFD, DFA, Lempel–Ziv complexity, and SASI | Logistic regression | Accuracy 88
Čukić et al. [32, 33] | 26 + 20 | Broadband EEG, PCA and ten-fold cross validation | HFD and SampEn | LR, SVM (with linear and polynomial kernel), DT, NB | Accuracy 97.50
The healthy individuals were examined to rule out any physical or mental impairment, and all were found to be normal. Participants were instructed to abstain from caffeine, alcohol and nicotine before the EEG recordings, and all recordings were made at the same time of day. The data acquisition comprised five-minute EEG recordings under three different conditions: eyes closed (EC), eyes open (EO) and a decision-making task (TASK). EEG was acquired with a 22-channel EEG cap whose electrodes were placed according to the 10–20 electrode placement system [34], covering the whole scalp and all four lobes. In the 10–20 system, electrode positions are spaced at 10% or 20% of the distances between anatomical landmarks, including the nasion and the inion. The inion is the tip of the external occipital protuberance; the nasion, located between the eyes, is the bony depression where the frontal bone meets the two nasal bones, as recommended in [35] and shown in Fig. 1. The electrodes thus covered the frontal, temporal, parietal and occipital lobes as well as the central area.

4 Data Preprocessing
EEG captures brain activity through scalp sensors; however, the recorded data are contaminated by various kinds of interference. The main goal of data preparation is to attenuate artifacts and noise from environmental (exogenous) and biological (endogenous) sources [36].

4.1 Preprocessing
To ensure accurate results in later steps such as feature selection and classification, all contaminated data should first be denoised. Noise reduction falls into two main categories: rejection of contaminated data segments, and attenuation of artifacts with signal-processing techniques [37, 38].
The National Institute of Mental Health recommended using an adaptive filter to estimate artifacts so that they can be removed from the EEG data [39]. In this study, a series of filters was used to denoise the EEG signals and remove interference. Butterworth high-pass and band-pass filters with cutoff frequencies spanning 0.1 to 70 Hz were applied to the signals, which were sampled at 256 Hz. The 50 Hz power-line interference was then removed with a notch filter, which is a narrow band-stop filter.

Fig. 1 The 10–20 electrode placement system

4.2 Feature Extraction
EEG signals are nonlinear, weak and time-varying, and they frequently show complex fluctuations. EEG features also change with a person's emotional state. In recent studies [40–43], analysis of these brain signals revealed numerous linear properties, such as skewness, peak and variance. Research has also established nonlinear measures of pathological signals, such as the correlation dimension, which are regarded as valuable markers of different illnesses [44]. To obtain the feature matrix, feature extraction is first performed on the preprocessed EEG data. EEG features are commonly divided into two basic types, frequency-domain and time-domain features; because EEG signals are nonlinear and non-stationary, nonlinear features were extracted as well. The following features were extracted in this study:

Frequency Domain Features
The frequency domain allows EEG data to be categorized and characterized by their spectral content. The frequency-domain features extracted in this study are absolute and relative power: the absolute power of a band is the signal power within that frequency range, and the relative power is the absolute power of the band divided by the total power.

Time Domain Features
Time-domain features are the most intuitive characteristics of EEG. EEG signals are sampled at a fixed rate over time.
Artifacts can be removed from the EEG signals directly in the time domain, and the resulting time-domain features can be used for continuous, long-term EEG diagnosis. The time-domain features investigated in this study are skewness, variance, peak and the Hjorth parameters. In 1970, Hjorth introduced three parameters describing the statistical properties of a signal in the time domain [45]: activity, mobility and complexity. Activity is the variance of the signal and represents its power. Mobility is the square root of the ratio of the variance of the signal's first derivative to the variance of the signal; it estimates the mean frequency, or the proportion of standard deviation of the power spectrum. Complexity is the mobility of the first derivative divided by the mobility of the signal; it reflects the change in frequency, that is, how closely the signal resembles a pure sine wave (for which complexity equals 1).

Nonlinear Features
EEG signals are dynamic and irregular and exhibit some features of nonlinear dynamical systems. Among the many studies of EEG, its nonlinear nature has received particular attention worldwide, and analyzing EEG signals with nonlinear dynamics theory has become an advanced field of research [46–48]. The nonlinear features extracted in this study are Shannon entropy, power-spectral entropy and correlation dimension.

1. Shannon entropy was introduced by Shannon in 1948 in the article "A Mathematical Theory of Communication" [49]. It measures the uncertainty of a random variable or random signal: the greater the uncertainty and randomness, the larger the entropy. In the proposed methodology, the entropy of an EEG signal serves as a measure of the irregularity of the series, quantifying the uncertainty of the EEG [50]. For a random variable whose probability distribution is known, the entropy is defined as:
$$E(S) = -\sum_{s \in \mathcal{S}} p(s) \log p(s) \tag{1}$$

where $S$ is a random variable with probability distribution $p(s)$ over the alphabet $\mathcal{S}$ [51].

2. Power-spectral entropy is derived from the power spectrum obtained with the Fourier transform, which describes how the signal's power is distributed over frequency. Once the power spectral density is normalized to sum to one, its Shannon entropy, the power-spectral entropy, can be calculated directly. Power-spectral entropy is used to characterize EEG signals and can serve as a physical indicator of the frequency composition and intensity of brain activity; greater brain activity is associated with higher entropy.

3. Correlation dimension. Grassberger and Procaccia devised the correlation dimension method in 1983 [52]. The correlation dimension describes the dynamic properties of EEG signals; a larger value indicates a more complex EEG time series. It is a type of fractal dimension, usually estimated from a phase-space reconstruction of the time series built from a single data vector, as follows:

$$C_R = \lim_{d \to 0} \frac{\ln C(d)}{\ln d} \tag{2}$$

where $C(d)$ is the correlation integral and $d$ is the radius around each reference point.

4.3 Feature Selection
Feature selection chooses a relevant subset of all the extracted features. This lowers the dimensionality of the classification problem and also removes noisy and irrelevant features. Here it was used to retain the features most important for EEG-based detection. To obtain better results, the minimal-redundancy-maximal-relevance (MRMR) technique was applied. The MRMR feature selection principle was introduced by Peng et al. [53], who address feature selection by considering relevance and redundancy simultaneously. Maximum relevance, $\max R(F, l)$, seeks to maximize the relevance of a feature subset $F$ to the class label $l$. The relevance of the feature subset is defined in [53] as:

$$\max R(F, l), \quad R(F, l) = \frac{1}{|F|} \sum_{r_i \in F} \varphi(r_i, l) \tag{3}$$

where $\varphi(r_i, l)$ is the relevance of feature $r_i$ to $l$. $\varphi$ can be approximated with a suitable dependence measure; Peng et al. [53] use mutual information. When two features are highly dependent on each other, removing one of them does not change the class-discriminative power much. Minimum redundancy, $\min D(S)$, is therefore used to select features that are mutually dissimilar; the redundancy of a feature subset $S$ is defined as:

$$\min D(S), \quad D(S) = \frac{1}{|S|^2} \sum_{p_j, p_k \in S} \varphi(p_j, p_k) \tag{4}$$

The MRMR criterion is the combined operator that maximizes $R$ and minimizes $D$ simultaneously. In [54], an incremental search method was used to find near-optimal features: given the subset $S_{t-1}$ of $t - 1$ already selected features, the $t$-th feature is the one that maximizes

$$\max_{p_j \notin S_{t-1}} \left[ \varphi(p_j, l) - \frac{1}{t-1} \sum_{p_k \in S_{t-1}} \varphi(p_j, p_k) \right] \tag{5}$$

In total, 242 features (11 basic features × 22 electrodes) were extracted from the five brain waves. Together, the frequency-domain, time-domain and nonlinear features describe the EEG signals.

5 Classification
As the literature suggests, classifiers such as SVM, KNN and DT are widely used in EEG-based research. In this study, the performance of these three classifiers (SVM, KNN and DT) was analyzed.
In addition, one deep learning classifier, long short-term memory (LSTM), was evaluated for the diagnosis of MDD. All classifications were performed with 10-fold cross-validation. The proposed methodology is illustrated in Fig. 2.

6 Machine Learning Classifiers

6.1 Decision Tree
Decision trees [55, 56] are non-parametric supervised learners that recursively partition the feature space into regions corresponding to the classes, at each step selecting the feature that provides the highest information gain. The recursive partitioning stops when the minimum number of samples per node reaches two. In this study, the random state of the DT was set to 0 to obtain consistent output on every call. Pruning, based on an estimate of the classification error, reduces model complexity and thereby improves generalization. Eighty percent of the data were used for training and 20% for testing, using the frequency-domain, time-domain and time–frequency-domain features. Figure 3a, b shows the training and validation accuracy and the training and validation loss of this model over 50 epochs.

Fig. 2 Block diagram of the proposed methodology

6.2 Support Vector Machine
An SVM divides the feature space with decision boundaries that are linear in the transformed space induced by the kernel function and are uniquely determined by a subset of the data, the support vectors [57]. SVMs build a maximum-margin classifier that maximizes the distance between the decision boundary and the support vectors. In this study, the radial basis function (RBF) kernel was used with a soft-margin classifier, a regularization constant C = 1, and an optimization algorithm [58].
The degree was set to 3 and the cache size to 200 (the degree parameter applies only to polynomial kernels, so it has no effect with the RBF kernel). SVMs are supervised and, by design, maximize the classifier margin, which tends to reduce overfitting. The training and validation accuracy and the training and validation loss of the SVM over 50 epochs are presented in Fig. 4a, b.

6.3 K‑Nearest Neighbors
K-nearest neighbors (KNN) is a supervised classifier and one of the simplest classification models. It relies on a distance function between pairs of observations. For each test sample, the algorithm finds the k nearest training samples and assigns the test sample to the most frequent class among them. KNN needs only an integer value for k and a metric to measure closeness [59]. In this study, the number of neighbors (n-neighbors) was set to 5, its default value; an odd number was chosen to avoid ties in the vote. To check the model's performance, the training and validation accuracy and the training and validation loss were obtained, as shown in Fig. 5a, b.

Fig. 3 a Training and validation accuracy of the decision tree model. b Training and validation loss of the decision tree model
Fig. 4 a Training and validation accuracy of the support vector machine model. b Training and validation loss of the support vector machine model
Fig. 5 a Training and validation accuracy of the K-nearest neighbors model. b Training and validation loss of the K-nearest neighbors model

7 Deep Learning Approach

7.1 Long Short Term Memory (LSTM)
LSTM is an extension of recurrent neural networks (RNNs) introduced by Hochreiter and Schmidhuber in 1997. Such networks carry a hidden state that reflects what has passed through the network so far, and LSTMs have been highly successful on many sequence classification problems [60]. LSTMs were introduced to address the vanishing gradient problem of RNNs [61]. In an RNN, the early layers must be updated through the back-propagated gradient, but the local gradient approaches zero by the time it reaches the first layers, which weakens learning; LSTMs mitigate this. An LSTM contains several gates that can retain important information over long sequences and control the flow of information. As in an RNN, every LSTM cell passes a hidden state forward to the next step. An LSTM model consists of several layers: an input layer, an output layer, and one or more LSTM and dense layers in between. The model was trained for 500 epochs, with the sigmoid selected as the activation function. The model used in this study had one LSTM layer with 64 units and 50% dropout, followed by a dense layer with 32 neurons. Because LSTMs extract temporal information efficiently and EEG has excellent temporal resolution, the model suits this task and obtained satisfactory performance. The training and validation accuracy and the training and validation loss of the LSTM model over 500 epochs are shown in Fig. 7a, b.
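To make the gating described above concrete, the sketch below implements one time step of the standard LSTM cell (input, forget and output gates with an additive cell-state update) in NumPy and runs it over a sequence. It illustrates the generic cell, not the authors' trained model: the weights are random, the gate ordering, initialisation scale and the 22-channel input shape are assumptions, and only the 64-unit hidden size is taken from the text. The additive update of the cell state (forget-gated old state plus input-gated new content) is what lets gradients pass through many steps without being repeatedly squashed toward zero.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell with a forget gate.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked as [input, forget, candidate, output] (an assumption of this
    sketch; libraries use different orderings).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # pre-activations of all four blocks
    i = sigmoid(z[0:H])                # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])            # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])        # candidate cell content
    o = sigmoid(z[3 * H:4 * H])        # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g           # additive cell-state update
    h_t = o * np.tanh(c_t)             # hidden state passed to the next step
    return h_t, c_t


def lstm_forward(X, hidden=64, seed=0):
    """Run a randomly initialised LSTM over X of shape (T, D); return the last h."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    W = rng.normal(scale=0.1, size=(4 * hidden, D))
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    h, c = np.zeros(hidden), np.zeros(hidden)
    for t in range(T):
        h, c = lstm_step(X[t], h, c, W, U, b)
    return h


# Usage: 100 time steps of 22-channel input (synthetic, hypothetical shape).
X = np.random.default_rng(1).normal(size=(100, 22))
h_last = lstm_forward(X)
print(h_last.shape)  # (64,)
```

In a trained classifier, the final hidden state would feed the dense and output layers; here it only demonstrates the shapes and bounded outputs of the recurrence.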
8 Results
The primary aim of this research was to provide a feature-based machine learning framework for MDD detection using EEG data. To this end, a total of 4224 EEG signals (22 channels × 3 states (EO, EC, TASK) × 64 subjects) were explored, preprocessed and analyzed by computing frequency-domain, time-domain and nonlinear features. Three ML classifiers, KNN, SVM and DT, were then trained to differentiate the EEG signals of MDD subjects from those of normal subjects. Given the advent and wide adoption of deep learning architectures, an LSTM model was also implemented and analyzed. First, the EEG signals were preprocessed with Butterworth and notch filters. Next, 11 features (absolute power, relative power, variance, skewness, peak, mobility, complexity, activity, correlation dimension, Shannon entropy and power-spectral entropy) were extracted from the preprocessed data; these span the frequency domain, the time domain and nonlinear measures. The MRMR feature selection technique was then applied to retain only the significant attributes, which were fed into the DT, SVM and KNN classifiers. Performance measures, namely accuracy, sensitivity, specificity and precision, were computed and compared. The results cover classification accuracy, loss and computational cost in terms of training time.

Table 2 Summary of the results of the proposed study (%)
Performance measure | SVM | DT | KNN | LSTM
Sensitivity | 60 | 86.6 | 93.3 | 86.7
Specificity | 77.78 | 66.66 | 77.78 | 77.7
Precision | 56.25 | 68.42 | 66.7 | 65
Accuracy | 66.6 | 79.1 | 87.5 | 83.3
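The four performance measures reported in Table 2 are standard functions of the confusion-matrix counts. The sketch below shows their usual definitions, treating MDD as the positive class; that convention, the helper names and the labels are assumptions made for illustration, not the authors' evaluation code or data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN) for a binary task; MDD is the positive class here."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def performance_measures(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and precision in percent."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100.0 * tp / (tp + fn),  # recall of the MDD class
        "specificity": 100.0 * tn / (tn + fp),  # recall of the healthy class
        "precision": 100.0 * tp / (tp + fp),    # share of MDD predictions that are correct
    }


# Hypothetical labels (1 = MDD, 0 = healthy); not data from this study.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6
counts = confusion_counts(y_true, y_pred)
print(counts)  # (8, 4, 6, 2)
print(performance_measures(*counts))
```

For these hypothetical labels the measures are 70% accuracy, 80% sensitivity, 60% specificity and about 66.7% precision. Sensitivity and precision both depend on true positives but differ in their denominators, which is why a model can have high sensitivity and modest precision at the same time.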
The KNN classifier outperformed the other two machine learning classifiers, achieving the highest accuracy of 87.5%, with a specificity of 77.78%, sensitivity of 93.3% and precision of 66.7%. DT achieved an accuracy of 79.1%, specificity of 66.66%, sensitivity of 86.67% and precision of 68.42%. SVM had the lowest accuracy, 66.6%, with a specificity of 77.78%, sensitivity of 60% and precision of 56.25%. The deep learning approach, the LSTM model, also performed competitively, achieving an accuracy of 83.3%, specificity of 77.77%, sensitivity of 86.67% and precision of 65%, as indicated in Table 2.

Fig. 6 The architecture of a single LSTM cell

Fig. 7 a The training and validation accuracy of the developed LSTM model. b The training and validation loss of the developed LSTM model

In summary, three machine learning models (KNN, DT and SVM) and one deep learning model (LSTM) were trained and analyzed on the EEG dataset. KNN performed best, with an accuracy of 87.5%. The training and validation accuracies and losses of all the models are shown in Figs. 4, 5, 6 and 7, and the ROC curve in Fig. 8 compares the performance of all the models in terms of the trade-off between true positive rate (TPR) and false positive rate (FPR). The computational time of the training process was also recorded for each model: the LSTM model required the most training time, whereas DT trained fastest, as shown in Table 3.

Fig. 8 ROC curve for the trained ML models

Table 3 Training time of all the models used in the proposed study

Model   Time (s)
KNN     198.8
DT      193.3
SVM     207.3
LSTM    1252.17

9 Conclusion and Future Work

Various health conditions arise from weak and fragile mental states, and depression is one of the most common and serious of them.
Physical injuries produce visible and painful signs, and because these symptoms are obvious, such injuries are recognized and taken seriously. Mental illnesses have no such visible signs and symptoms, and most people, including those affected, are not well informed about them. EEG signals are helpful for the analysis of MDD: they provide useful information for depression analysis and are easy and economical to acquire. Compared with previous research, the proposed experiment used more reliable data, including both resting and non-resting EEG recordings, to obtain better experimental results. Overall, the proposed machine learning approach to MDD detection, based on feature extraction and selection, reduced the amount of data to be processed while retaining strong data processing ability. To avoid complexity in the classifier, a smaller number of features was selected, which simplifies interpretation and reduces the computational burden on the system. The present study, which differentiates depressed patients from healthy subjects and their mental states, can therefore help healthcare experts with early intervention and prevention of depression. Extending this method to a larger dataset is an important next step. In addition, signals from other modalities such as fNIRS or MIRs should be analyzed and compared with EEG signals to obtain more robust results. The use of fewer electrodes should also be tested to optimize the existing models, and more emotion-related or non-resting datasets should be used to explore the potential applicability of various ML-based classification methods for diagnostic purposes.

Funding This research was partially supported by the Department of Computer Systems Engineering, Mehran University of Engineering and Technology.
Declarations

Conflict of Interest The authors have no conflicts of interest to declare.

Availability of Data and Material Enquiries about data availability should be directed to the authors.

Code Availability Enquiries about code should be directed to the authors.

References

1. Costello, E. J., Egger, H., & Angold, A. (2005). 10-year research update review: the epidemiology of child and adolescent psychiatric disorders: I. Methods and public health burden. Journal of the American Academy of Child & Adolescent Psychiatry, 44(10), 972–986.
2. Mathers, C. D., & Loncar, D. (2006). Projections of global mortality and burden of disease from 2002 to 2030. PLoS Medicine, 3(11), e442.
3. NAMI. (n.d.). Mental Health Conditions. Retrieved 18 Apr 2016. https://www.nami.org/Learn-More/Mental-Health-Conditions.
4. Gopalan, G., Goldstein, L., Klingenstein, K., Sicher, C., Blake, C., & McKay, M. M. (2010). Engaging families into child mental health treatment: updates and special considerations. Journal of the Canadian Academy of Child and Adolescent Psychiatry/Journal de l’Académie canadienne de psychiatrie de l’enfant et de l’adolescent.
5. Ahmadlou, M., Adeli, H., & Adeli, A. (2012). Fractality analysis of frontal brain in major depressive disorder. International Journal of Psychophysiology, 85(2), 206–211.
6. Direito, B., Teixeira, C. A., Sales, F., Castelo-Branco, M., & Dourado, A. (2017). A realistic seizure prediction study based on multiclass SVM. International Journal of Neural Systems, 27(03), 1750006.
7. Varatharajah, Y., Iyer, R. K., Berry, B. M., Worrell, G. A., & Brinkmann, B. H. (2017). Seizure forecasting and the preictal state in canine epilepsy. International Journal of Neural Systems, 27(01), 1650046.
8. Mammone, N., Bonanno, L., Salvo, S. D., Marino, S., Bramanti, P., Bramanti, A., & Morabito, F. C. (2017). Permutation disalignment index as an indirect, EEG-based, measure of brain connectivity in MCI and AD patients. International Journal of Neural Systems, 27(05), 1750020.
9. Morabito, F. C., Campolo, M., Labate, D., Morabito, G., Bonanno, L., Bramanti, A., & Bramanti, P. (2015). A longitudinal EEG study of Alzheimer’s disease progression based on a complex network approach. International Journal of Neural Systems, 25(02), 1550005.
10. Cogan, D., Birjandtalab, J., Nourani, M., Harvey, J., & Nagaraddi, V. (2017). Multi-biosignal analysis for epileptic seizure monitoring. International Journal of Neural Systems, 27(01), 1650031.
11. Geier, C., & Lehnertz, K. (2017). Which brain regions are important for seizure dynamics in epileptic networks? Influence of link identification and EEG recording montage on node centralities. International Journal of Neural Systems, 27(01), 1650033.
12. Guo, L., Wang, Z., Cabrerizo, M., & Adjouadi, M. (2017). A cross-correlated delay shift supervised learning method for spiking neurons with application to interictal spike detection in epilepsy. International Journal of Neural Systems, 27(03), 1750002.
13. Morabito, F. C., Campolo, M., Mammone, N., Versaci, M., Franceschetti, S., Tagliavini, F., et al. (2017). Deep learning representation from electroencephalography of early-stage Creutzfeldt–Jakob disease and features for differentiation from rapidly progressive dementia. International Journal of Neural Systems, 27(02), 1650039.
14. Hirschauer, T. J., Adeli, H., & Buford, J. A. (2015). Computer-aided diagnosis of Parkinson’s disease using enhanced probabilistic neural network. Journal of Medical Systems, 39(11), 1–12.
15. Yuvaraj, R., Murugappan, M., Acharya, U. R., Adeli, H., Ibrahim, N. M., & Mesquita, E. (2016). Brain functional connectivity patterns for emotional state classification in Parkinson’s disease patients without dementia. Behavioural Brain Research, 298, 248–260.
16. Akar, S. A., Kara, S., Latifoğlu, F., & Bilgiç, V. (2016). Analysis of the complexity measures in the EEG of schizophrenia patients. International Journal of Neural Systems, 26(02), 1650008.
17. Tonoyan, Y., Looney, D., Mandic, D. P., & Van Hulle, M. M. (2016). Discriminating multiple emotional states from EEG using a data-adaptive, multiscale information-theoretic approach. International Journal of Neural Systems, 26(02), 1650005.
18. Bruder, J. C., Dümpelmann, M., Piza, D. L., Mader, M., Schulze-Bonhage, A., & Van Jacobs-Le, J. (2017). Physiological ripples associated with sleep spindles differ in waveform morphology from epileptic ripples. International Journal of Neural Systems, 27(07), 1750011.
19. Dereymaeker, A., Pillay, K., Vervisch, J., Van Huffel, S., Naulaers, G., Jansen, K., & De Vos, M. (2017). An automated quiet sleep detection approach in preterm infants as a gateway to assess brain maturation. International Journal of Neural Systems, 27(06), 1750023.
20. Liu, R., Wang, Y., Newman, G. I., Thakor, N. V., & Ying, S. (2017). EEG classification with a sequential decision-making method in motor imagery BCI. International Journal of Neural Systems, 27(08), 1750046.
21. Sereshkeh, A. R., Trott, R., Bricout, A., & Chau, T. (2017). Online EEG classification of covert speech for brain–computer interfacing. International Journal of Neural Systems, 27(08), 1750033.
22. Marr, B. (2017). How Machine Learning is Transforming Healthcare. http://data-informed.com/how-machine-learning-is-transforming-healthcare/.
23. Imtiaz, S., Horchidan, S. F., Abbas, Z., Arsalan, M., Chaudhry, H. N., & Vlassov, V. (2020). Privacy preserving time-series forecasting of user health data streams. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 3428–3437). IEEE.
24. Daverio, P., Chaudhry, H. N., Margara, A., & Rossi, M. (2021). Temporal pattern recognition in graph data structures. In 2021 IEEE International Conference on Big Data (Big Data) (pp. 2753–2763). IEEE.
25. Hosseinifard, B., Moradi, M. H., & Rostami, R. (2013). Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Computer Methods and Programs in Biomedicine, 109(3), 339–345.
26. Acharya, U. R., Sudarshan, V. K., Adeli, H., Santhosh, J., Koh, J. E., Puthankatti, S. D., & Adeli, A. (2015). A novel depression diagnosis index using nonlinear features in EEG signals. European Neurology, 74(1–2), 79–83.
27. Bairy, G. M., Bhat, S., Eugene, L. W. J., Niranjan, U. C., Puthankattil, S. D., & Joseph, P. K. (2015). Automated classification of depression electroencephalographic signals using discrete cosine transform and nonlinear dynamics. Journal of Medical Imaging and Health Informatics, 5(3), 635–640.
28. Liao, S. C., Wu, C. T., Huang, H. C., Cheng, W. T., & Liu, Y. H. (2017). Major depression detection from EEG signals using kernel Eigen-filter-bank common spatial patterns. Sensors, 17(6), 1385.
29. Mumtaz, W., Xia, L., Ali, S. S. A., Yasin, M. A. M., Hussain, M., & Malik, A. S. (2017). Electroencephalogram (EEG)-based computer-aided technique to diagnose major depressive disorder (MDD). Biomedical Signal Processing and Control, 31, 108–115.
30. Bachmann, M., Lass, J., Suhhova, A., & Hinrikus, H. (2013). Spectral asymmetry and Higuchi’s fractal dimension measures of depression electroencephalogram. Computational and Mathematical Methods in Medicine.
31. Cukic, M., Pokrajac, D., Stokic, M., Radivojevic, V., & Ljubisavljevic, M. (2018). EEG machine learning with Higuchi fractal dimension and sample entropy as features for successful detection of depression. arXiv:1803.05985.
32. Čukić, M., Stokić, M., Simić, S., & Pokrajac, D. (2020). The successful discrimination of depression from EEG could be attributed to proper feature extraction and not to a particular classification method. Cognitive Neurodynamics, 14(4), 443–455.
33. Lewis, G. (1996). DSM-IV. Diagnostic and Statistical Manual of Mental Disorders, 4th edn. By the American Psychiatric Association. (Pp. 886; £34.95.) APA: Washington, DC. Psychological Medicine, 26(3), 651–652.
34. Jasper, H. H. (1958). The ten-twenty electrode system of the International Federation. Electroencephalography and Clinical Neurophysiology, 10, 370–375.
35. Qin, Y., Xu, P., & Yao, D. (2010). A comparative study of different references for EEG default mode network: The use of the infinity reference. Clinical Neurophysiology, 121(12), 1981–1991.
36. Tatum, W. O., Dworetzky, B. A., & Schomer, D. L. (2011). Artifact and recording concepts in EEG. Journal of Clinical Neurophysiology, 28(3), 252–263.
37. Gross, J., Baillet, S., Barnes, G. R., Henson, R. N., Hillebrand, A., Jensen, O., et al. (2013). Good practice for conducting and reporting MEG research. NeuroImage, 65, 349–363.
38. Tong, S., Bezerianos, A., Paul, J., Zhu, Y., & Thakor, N. (2001). Removal of ECG interference from the EEG recordings in small animals using independent component analysis. Journal of Neuroscience Methods, 108(1), 11–17.
39. Gevins, A. S., Du, W., & Leong, H. (1996). U.S. Patent No. 5,513,649. U.S. Patent and Trademark Office.
40. Pijn, J. P. (1990). Quantitative evaluation of EEG signals in epilepsy: nonlinear associations, time delays and nonlinear dynamics. Rodopi.
41. Pijn, J. P. M., Velis, D. N., van der Heyden, M. J., DeGoede, J., van Veelen, C. W., & Lopes da Silva, F. H. (1997). Nonlinear dynamics of epileptic seizures on basis of intracranial EEG recordings. Brain Topography, 9(4), 249–270.
42. Rombouts, S. A. R. B., Keunen, R. W. M., & Stam, C. J. (1995). Investigation of nonlinear structure in multichannel EEG. Physics Letters A, 202(5–6), 352–358.
43. Stam, C. J., Van Woerkom, T. C. A. M., & Keunen, R. W. M. (1997). Non-linear analysis of the electroencephalogram in Creutzfeldt-Jakob disease. Biological Cybernetics, 77(4), 247–256.
44. Acharya, R., Faust, O., Kannathal, N., Chua, T., & Laxminarayan, S. (2005). Non-linear analysis of EEG signals at various sleep stages. Computer Methods and Programs in Biomedicine, 80(1), 37–45.
45. Hjorth, B. (1970). EEG analysis based on time domain properties. Electroencephalography and Clinical Neurophysiology, 29(3), 306–310.
46. Narejo, S., Pasero, E., & Kulsoom, F. (2016). EEG based eye state classification using deep belief network and stacked autoencoder. International Journal of Electrical and Computer Engineering (IJECE), 6(6), 3131–3141.
47. Kalsum, T., Mehmood, Z., Kulsoom, F., Chaudhry, H. N., Khan, A. R., Rashid, M., & Saba, T. (2021). Localization and classification of human facial emotions using local intensity order pattern and shape-based texture features. Journal of Intelligent & Fuzzy Systems, 40(5), 9311–9331.
48. Bashir, N., Narejo, S., Naz, B., & Ali, A. (2022). EEG Based Major Depressive Disorder (MDD) Detection Using Machine Learning. In Mediterranean Conference on Pattern Recognition and Artificial Intelligence (pp. 172–183). Springer.
49. Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
50. Bruhn, J., Lehmann, L. E., Röpcke, H., Bouillon, T. W., & Hoeft, A. (2001). Shannon entropy applied to the measurement of the electroencephalographic effects of desflurane. The Journal of the American Society of Anesthesiologists, 95(1), 30–35.
51. Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.
52. American Psychiatric Association. (2000). Diagnostic and Statistical Manual of Mental Disorders (4th edn., vol. 1). American Psychiatric Association.
53. Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238.
54. Seligman, M. E. (1975). Helplessness: On depression, development and death.
55. Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc.
56. Vapnik, V. N. (1998). Statistical learning theory (chapter 10, p. 42). Wiley.
57. Jolliffe, I. T. (2002). Principal component analysis for special types of data (pp. 338–372). Springer.
58. Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines (pp. 41–65). MIT Press.
59. Webb, A. (1999). Statistical pattern recognition. Newnes.
60. Nagabushanam, P., Thomas George, S., & Radha, S. (2020). EEG signal classification using LSTM and improved neural network algorithms. Soft Computing, 24(13), 9981–10003.
61. Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02), 107–116.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Nayab Bashir received her Master of Engineering degree in Information Technology in 2022 from the Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro. She received her BE degree in Biomedical Engineering from MUET in 2018. Her research interests include Machine Learning, Deep Learning, Medical Imaging and Digital Signal Processing.

Sanam Narejo is currently working as an Associate Professor in the Department of Computer Systems Engineering, Mehran University of Engineering and Technology (MUET), Jamshoro.
She completed her Ph.D. at Politecnico di Torino, Italy, in 2018, and received her Master's degree in Communication Systems and Networking from MUET. Her research interests include Signal and Image Processing, Machine Learning and Deep Learning Architectures. She has also been an active member of the Italian Society of Neural Networks (SIREN), PEC, IEEE and the ACM-W Jamshoro chapter.

Bushra Naz received her B.E. degree in computer systems engineering and her M.E. degree in communication systems and networks from the Mehran University of Engineering and Technology (MUET), Jamshoro, Pakistan, in 2007 and 2009, respectively. From 2010 to 2011, she was a Senior Research Fellow with the University of Science and Technology, Beijing, China. She completed her Ph.D. degree at the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China, in 2018. She is currently working as an Associate Professor in the Department of Computer Systems Engineering, MUET. Her research interests include Image Processing, Regularization Methods for Machine Learning, Hyperspectral Image Classification, Denoising and Remote Sensing Image Analysis.

Fatima Ismail is currently working as an Assistant Professor in the Department of BBT, The Islamia University of Bahawalpur, Pakistan. Her research interests include Bioinformatics and Microbiology.

Muhammad Rizwan Anjum received his Ph.D. degree from Beijing Institute of Technology, Beijing, China, in 2015, and his M.Engg. in Telecommunication and Control Engineering and B.E. in Electronic Engineering from Mehran UET, Jamshoro, Pakistan, in 2011 and 2007, respectively. He is presently working as an Associate Professor in the Department of Electronic Engineering, The Islamia University of Bahawalpur, Pakistan. He has more than 30 international conference and journal publications. He is a member of PEC, IEEEP, IEP, IJPE, UACEE, IACSIT, ICCTD, IAENG and other bodies, and a reviewer for several journals and conferences.

Ayesha Butt is working as a lecturer at the SZABIST institute. She received her M.E. degree in Information Technology from Mehran University of Engineering and Technology (MUET) in 2022. Her research interests include Signal Processing, Machine Learning and Deep Learning Architectures.

Sadia Anwar completed her Ph.D. at Aarhus University, Denmark, in 2020, on Integrated Technology-based Health Applications and User Specificities for Treatment Adherence. She received her Doctor of Pharmacy degree from Government College University, Pakistan. She worked for three years as a community pharmacist and also served as a Hospital Pharmacist and Drug Information Consultant. She then worked as a Guest Researcher at CTIF in the Department of Electronic Systems, Aalborg University, under the supervision of Professor Ramjee Prasad, in an interdisciplinary area focused on four fields: Medicine, Telecommunication, Big Data and Economics. She joined the Department of Business Development and Technology, Aarhus University, as a Research Scientist in December 2016, and has been a Research Assistant in the same department since October 2017. Her research interests lie in the interdisciplinary area of Medicine, eHealth, ICT, Assistive Technologies and Green Business Model Development. She has worked on EU and Interreg projects and is also involved in administrative and managerial tasks. She has authored and co-authored many conference and journal articles.

Ramjee Prasad is a Professor of Future Technologies for Business Ecosystem Innovation (FT4BI) in the Department of Business Development and Technology, Aarhus University, Denmark. He is the Founder President of the CTIF Global Capsule (CGC). He is also the Founder Chairman of the Global ICT Standardization Forum for India (GISFI), established in 2009.
GISFI aims to increase collaboration between European, Indian, Japanese, North American and other worldwide standardization activities in the area of Information and Communication Technology (ICT) and related application areas. On March 15, 2016, the University of Rome “Tor Vergata”, Italy, honored him as a Distinguished Professor of the Department of Clinical Sciences and Translational Medicine. He is an Honorary Professor of the University of Cape Town, South Africa, and the University of KwaZulu-Natal, South Africa. In 2010 he received the Ridderkorset af Dannebrogordenen (Knight of the Dannebrog) from the Danish Queen for the internationalization of top-class telecommunication research and education. He has received several international awards, such as the IEEE Communications Society Wireless Communications Technical Committee Recognition Award in 2003 for contributions to the field of "Personal, Wireless and Mobile Systems and Networks", Telenor’s Research Award in 2005 for impressive merits, both academic and organizational, within the area of wireless and personal communication, and the 2014 IEEE AESS Outstanding Organizational Leadership Award for “Organizational Leadership in developing and globalizing the CTIF (Center for TeleInFrastruktur) Research Network”. He has been Project Coordinator of several EC projects, including MAGNET, MAGNET Beyond and eWALL. He has published more than 30 books and over 1000 journal and conference publications, holds more than 15 patents, and has supervised over 140 Ph.D. graduates and a larger number of Master's students (over 250). Several of his students are today worldwide telecommunication leaders themselves. Under his leadership, many close collaborations have been established among premier universities across the globe, regulated by Memoranda of Understanding (MoU) between the collaborating universities.