Capstone Project Midterm Progress Report Form
Department of Electrical and Computer Engineering
School of Engineering and Digital Sciences, Nazarbayev University

Project Details
Student Name/s: Daniyar Zhakyp
Project Title: Surface electromyography (sEMG) processing and classification for a myoelectric prosthesis design
Supervisor Name/s: Muhammad Tahir Akhtar, Co-Supervisor: Prashant Kumar Jamwal
Student contact (email): daniyar.zhakyp@nu.edu.kz

Background of the project (limit 500 words)
In recent years, the development of robust biological systems has become a scientific and research “anchor” in the field of Biomedical and Rehabilitation Engineering. Electromyography (EMG) signals, one of the members of the bio-signal “family” along with electroencephalography (EEG) and electrocardiography (ECG) signals, are widely deployed in prosthesis and exoskeleton designs. The surface electromyography (sEMG) signal is the algebraic sum of all electric pulses captured by the electrodes over a superficial muscle during its contraction. Until 2017, all commercially available prosthetic devices functioned based on the strength of the muscle contraction (i.e., the amplitude of the sEMG signal) [1]. This method avoids complex signal processing and classification, but it deprives the device of extensive functionality (i.e., additional hand motions) and the user of intuitive control (i.e., the device does not operate as smoothly as an intact hand). The latest approaches involve the development of machine learning (ML) models, either for offline simulations based on pre-recorded sEMG datasets or for integration on embedded devices. The new scheme is called Pattern-Recognition-based (PR) myoelectric control.
It consists of three crucial stages: (1) signal (dataset) preprocessing, where the time and frequency content of the target signals is analyzed in order to apply the necessary statistical signal processing algorithms; (2) feature extraction/selection, where useful information is extracted in windows in the form of mathematical features, from which those with the highest “descriptive” power are selected for classification; (3) signal classification, in which a statistical ML model learns the motion classes associated with certain sEMG signals from the feature vector and estimates the class of an unseen sEMG time series.

Since there is no single recipe for treating sEMG signals within the PR scheme, researchers have used various algorithms in each of the stages described above. For signal preprocessing, different classes of bandpass filters (e.g., Butterworth, Chebyshev Type I, etc.) with different orders and cutoff frequencies have been used to filter the raw sEMG signals [2]. The extracted feature vectors consisted of only Time-Domain (TD) or Time-Frequency Domain (TFD) features [2]. Some articles report the use of features from two or more domains, extracted in windows ranging from 150 to 400 milliseconds [2]. The Data-Driven Feature Selection techniques applied to sEMG signals comprised filter methods (Bhattacharyya distance, Pearson’s Correlation Coefficient) and wrapper methods (Sequential Forward Searching (SFS), Fisher-Markov selector (FMS)) [3], [4]. For the final sEMG classification model, researchers frequently choose linear classifiers such as Linear Discriminant Analysis (LDA), Naïve Bayes (NB), or a Support Vector Machine (SVM) with a linear kernel, and nonlinear models such as an SVM with the Radial Basis Function (RBF) kernel or Artificial Neural Networks (ANNs) [5].

The exploration of the PR approach in scientific research and the implementation of the model on real myoelectric prostheses is of utmost importance nowadays.
It will solve the problem of proper electrode arrangement, because ML algorithms do not require a specific electrode placement. The PR technique would give the user more intuitive and smooth device control and the ability to incorporate new desired motions into the controller’s memory.

Main achievements during week 1 - 7 (limit 3000 words)

Dataset preprocessing
For Capstone I, experiments have been performed with pre-recorded EMG datasets that contain extensive data from healthy subjects and amputees. The NinaPro Database 2 (DB2), which contains 40 healthy subjects, and the NinaPro Database 3 (DB3), with 11 trans-radial amputees, have been selected as the target datasets. Table I presents detailed information about both datasets. The setup for the collection of both datasets includes 12 double-differential electrodes from a Delsys Trigno Wireless system, which samples EMG signals at a 2 kHz sampling rate [6]. Subjects repeat the hand motions that appear on a laptop screen, holding every gesture for 5 seconds, alternated with 3 seconds of rest [6].

Table I. NinaPro DB2 and DB3 dataset information
Dataset name               NinaPro Database 2 (DB2)           NinaPro Database 3 (DB3)
No. of subjects            40 healthy subjects                11 amputees
No. of performed motions   40 distinct hand motions + rest    40 distinct hand motions + rest
No. of repetitions         6 repetitions                      6 repetitions
No. of sEMG channels       12 sEMG channels                   12 sEMG channels

All types of motions performed by the subjects from the two databases are shown in Figure 1. The execution of the forty hand motions is divided into three types of exercises. The first exercise includes all isometric finger configurations. The second exercise consists of all basic wrist movements, while the third exercise involves functional grasps of different objects by the user [6].

Figure 1. Forty distinct hand motions performed by every subject within three exercises.
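The acquisition protocol described above (2 kHz sampling, 5 seconds per gesture, 3 seconds of rest, 6 repetitions) determines how many samples each segment contains, which is worth checking before any clipping or windowing. The following is a minimal Python sketch of that arithmetic. Python is used here only for illustration (the project itself is implemented in MATLAB), the helper name `samples_for` and the constant names are hypothetical, and the total is a rough per-channel estimate that assumes every repetition is followed by one full rest period.

```python
# Acquisition parameters as stated in this report for NinaPro DB2/DB3.
FS_HZ = 2000        # sampling rate of the Delsys Trigno system (2 kHz)
GESTURE_S = 5.0     # duration of one gesture repetition, in seconds
REST_S = 3.0        # rest between gestures, in seconds
REPETITIONS = 6     # repetitions per motion
MOTIONS = 40        # distinct hand motions (rest class excluded)


def samples_for(duration_s: float, fs_hz: int = FS_HZ) -> int:
    """Number of samples per channel in a segment of the given duration."""
    return round(duration_s * fs_hz)


gesture_samples = samples_for(GESTURE_S)   # one gesture repetition
rest_samples = samples_for(REST_S)         # one rest period
cycle_samples = gesture_samples + rest_samples

# Rough estimate (assumption): every repetition is followed by one rest period.
total_samples = cycle_samples * REPETITIONS * MOTIONS

# Window lengths reported in the literature (150 to 400 ms), in samples.
window_min = samples_for(0.150)
window_max = samples_for(0.400)

print(gesture_samples, rest_samples, cycle_samples, total_samples)
print(window_min, window_max)
```

Under these assumptions, a single gesture repetition spans 10,000 samples per channel, so even the longest 400 ms window covers only a small fraction of one repetition.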
According to the Pattern-Recognition (PR) scheme, the first essential step is dataset preprocessing. The first preprocessing technique applied to the signals is clipping the initial and final parts of the time series, which correspond to the rest state (i.e., class 0). One typical sEMG signal is shown in Figure 2, and its Power Spectral Density (PSD) is presented in Figure 3. Raw sEMG signals are exposed to many sources of noise: electromagnetic radiation, muscle cross-talk, ECG artifacts, motion artifacts, and powerline interference [7]. Therefore, the second preprocessing step is to apply filtering to eliminate this noise. To choose the “proper” cutoff frequencies for a bandpass filter, the frequency content of the signals needs to be analyzed. The conventional Fast Fourier Transform (FFT) is inconvenient for sEMG frequency content analysis: its output is complex for a real-valued input, it contains both positive and negative frequency components, and it gives no direct information about signal power. The PSD is a more convenient engineering tool for analyzing how power is distributed over the frequency components present in the signal. Welch’s PSD estimate, computed in MATLAB with the pwelch() function, is used to plot the one-sided power spectral density of the raw sEMG signal.

Figure 2. The raw sEMG signal from one channel of the first subject, recorded over forty hand motions.

Figure 3. The Power Spectral Density (PSD) of the raw sEMG signal using Welch’s method.

As can be seen from Figure 3, the normalized power starts decreasing gradually after 250-300 Hz and drops substantially after 450-500 Hz. There is also less power at low frequencies up to 10-20 Hz, which correspond to the Motor Unit Action Potential (MUAP) firing rates [7]. It can be deduced that the frequencies with much lower power are potential sources of noise and can therefore be filtered out. Figure 4 shows the magnitude responses of four different filters.
As the amount of noise substantially affects the classification performance, different types of Infinite Impulse Response (IIR) digital bandpass filters with adjustable filter order and cutoff frequencies have been implemented as MATLAB functions. The selected bandpass filters are Butterworth, Chebyshev Type I, and Elliptic filters; the fourth filter is a second-order 50 Hz Notch filter for powerline noise elimination.

Figure 4. The frequency response of four different filters implemented in MATLAB. (a) 8th order Butterworth filter (20-500 Hz), (b) 8th order Chebyshev Type I filter (20-500 Hz), (c) 8th order Elliptic filter (20-500 Hz), (d) 2nd order Notch filter (50 Hz).

The intention is to compare the overall PR model performance based on the type, order, and cutoff frequencies of the filter used in the preprocessing stage.

Feature extraction
The preprocessed signal does not go directly to a classifier. Instead, mathematical features are extracted from it in limited portions called “windows”. A window function could be any function, but the rectangular one with an amplitude of 1 is the most frequently used option among researchers. Table II provides information about the features extracted in MATLAB. In total, 63 different feature functions are implemented in MATLAB, of which 43 are Time-Domain (TD) features, 7 are Frequency-Domain (FD) features, 11 are Time-Frequency Domain (TFD) features, and 2 are Entropy-based features.

Table II. Four types of features implemented in MATLAB
Type of the feature       No. of features extracted   Example of a feature
Time-Domain               43                          Zero Crossing (ZC)
Frequency-Domain          7                           Mean Frequency (MF)
Time-Frequency Domain     11                          Variance of Discrete Wavelet Transform Coefficients (DWTVAR)
Entropy-based             2                           Shannon Entropy (ShEn)

One of the computed Time-Domain features is the ZC feature. The implementation of the ZC feature is shown in Figure 5.
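To make the windowing step and the ZC feature more concrete, the following is a minimal Python sketch, not the MATLAB implementation referenced as Figure 5. It assumes a rectangular window, reads a zero crossing as a sign change between consecutive samples whose amplitude difference is at least a noise threshold, and uses hypothetical function names (`sliding_windows`, `zero_crossings`) and a made-up toy signal rather than real sEMG data.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], size: int, step: int) -> List[Sequence[float]]:
    """Split a signal into rectangular (unit-amplitude) windows of `size` samples,
    advancing by `step` samples; trailing samples that do not fill a window are dropped."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


def zero_crossings(window: Sequence[float], threshold: float = 0.0) -> int:
    """Count sign changes between consecutive samples whose amplitude
    difference is at least `threshold` (used to ignore background noise)."""
    count = 0
    for current, following in zip(window, window[1:]):
        sign_change = current * following < 0
        large_enough = abs(current - following) >= threshold
        if sign_change and large_enough:
            count += 1
    return count


# Toy signal with made-up values (not sEMG data).
signal = [0.5, -0.4, 0.3, 0.01, -0.01, 0.2, -0.6, 0.7]

# Windows of 4 samples with 50% overlap (illustrative sizes only).
windows = sliding_windows(signal, size=4, step=2)
zc_per_window = [zero_crossings(w, threshold=0.1) for w in windows]
print(zc_per_window)  # [2, 1, 3]
```

In the second window, the small crossing between 0.01 and -0.01 is rejected by the threshold, which is the intended effect of thresholding: low-amplitude fluctuations around zero do not inflate the count.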
The Zero Crossing (ZC) is computed as:

\[ ZC = \sum_{i=1}^{N-1} \left[ \operatorname{sgn}(x_i \times x_{i+1}) \cap |x_i - x_{i+1}| \geq Th \right], \qquad \operatorname{sgn}(x) = \begin{cases} 1, & \text{if } x \geq Th \\ 0, & \text{otherwise} \end{cases} \tag{1} \]

where \(\operatorname{sgn}(x)\) is the sign function and \(Th\) is a threshold value used to reject background noise [2]. The ZC feature shows how many times the signal crosses the zero-amplitude axis, which gives some insight into the frequency-domain properties of the time series [2]. The number of zero crossings is higher for some hand motions than for others, and it is certainly higher for any gesture than for the rest class, which helps separate the gestures from rest in time.

Figure 5. The implementation of the Zero Crossing (ZC) feature in MATLAB.

The threshold value \(Th\) for the ZC feature calculation is computed as:

\[ Th = R \cdot \sqrt{\sum_{j=1}^{N} \left( x_{NM}[j] \right)^2} \tag{2} \]

where \(R\) is a factor from 0 to 6 and \(x_{NM}[j]\) are the samples of the signal at rest (no motion) [8].

Feature selection
After all the features have been extracted from the signals, only those with the highest feature ranking, namely the greatest descriptive ability, are selected for the final feature vector fed into a classification model. This is achieved through Data-Driven Feature Selection algorithms, which can be of three types: filter, wrapper, and embedded methods. The implementation of the filter methods is shown in Figure 6. Filter methods are usually less computationally expensive than wrapper methods; they select the best subset of features irrespective of the classifier model.

Figure 6. Filter methods implementation scheme.

The implementation of the wrapper methods is shown in Figure 7. Wrapper methods use more complex algorithms to identify the best feature vector. They train the specified classification model on different combinations of features and then select the best-fitting combination.

Figure 7. Wrapper methods implementation scheme.

EMG sensor development
The EMG sensing board is used to amplify and filter the low-amplitude, noisy electrical potentials within a muscle and send them to the microcontroller.
As most commercially available EMG sensors are expensive and hard to obtain in Kazakhstan, a DIY model of the sensor is proposed. The PSpice schematic of the EMG sensor is shown in Figure 8. The proposed design consists of six stages: signal sensing, preamplification, 2nd order high-pass filtering, 2nd order low-pass filtering, final amplification, and rectification. The first two stages of the circuit are designed to ensure that the desired AC signal (i.e., sEMG) is amplified and the unwanted DC potentials are suppressed.

Figure 8. The DIY EMG sensor design consisting of six operational amplifiers.

The raw differential sEMG is sent as the input to the circuit, as shown in Figure 9. The processed output of the circuit is shown in Figure 10. The simulated sEMG signal has been amplified by a factor of approximately 1000-1500, filtered with 20-500 Hz cutoff frequencies, and rectified to leave only positive voltage values.

Figure 9. The raw differential sEMG signal of 0-7 mV.

Figure 10. The processed sEMG output of 0-5 V.

Project implementation and plan

Project activities and work plan (limit 2000 words)

Capstone Project I
Literature review: The up-to-date relevant papers on the sEMG-based Pattern-Recognition scheme have been continuously explored and analyzed.
Dataset choice: Since a decent number of pre-recorded EMG datasets are publicly available, such as NinaPro DB1-DB10, IEE EMG, putEMG, Megane Pro, etc., the documentation of each dataset (experimental setup, number of subjects, and performed motions) has been explored. Two databases, NinaPro DB2 and DB3, have been chosen as the primary data sources for Capstone I.
Signal preprocessing: The beginning and end of the sEMG signals, which correspond to the rest state, have been clipped before the signals are denoised. The frequency spectrum is analyzed using Welch’s PSD. The cutoff frequencies for bandpass filtering are chosen to be 20 and 450 Hz.
Different filter types have been implemented in MATLAB to compare their effect on the final PR model performance.
Feature extraction: Each feature extracted from sensor data (i.e., sEMG) is a mathematical or engineering concept and may be a whole research topic in itself. Therefore, topics such as Shannon entropy, the Autoregressive (AR) model, Welch’s PSD, the Teager-Kaiser Energy Operator (TKEO), and the Discrete Wavelet Transform (DWT) have been studied before the MATLAB implementation. In total, 63 features of four distinct categories have been extracted and put into a single feature vector.
Feature selection: The nature of Data-Driven Feature Selection algorithms has been studied. Filter and wrapper selection methods have been compared based on their fitness for the sEMG-based PR model.
EMG sensor development: The DIY EMG sensor design has been developed and tested in simulation software such as PSpice. The assembled sensor has shown unsatisfactory real-time performance with DIY dry electrodes. The results have to be re-evaluated using the ordered disposable gel electrodes.

Table 1. Gantt chart
Months: Aug, Sep, Oct, Nov, Dec, Jan, Feb, Mar, Apr
Legend: Planned, Completed
Rows: Capstone I, Literature review, Dataset choice, Signal preprocessing, Feature extraction, Feature selection, EMG sensor development, Signal classification, Model performance evaluation, Real-time experiments with sensor

Technical problems, or, concerns associated with realization of project (limit 1000 words)
The main aim of this Capstone project is to develop a real-time PR algorithm on an embedded device. To acquire an sEMG dataset and to send the electric muscle impulses in real time, an EMG sensing board is required. At the beginning of Capstone I, I came up with the design of this analog EMG sensor, which showed meaningful and positive results in amplifying and filtering the raw sEMG signal in simulations, but failed to repeat the outcome in real-time experiments.
The design followed all the guidelines for sensing low-amplitude bio-signals to avoid skin-sensor impedance mismatch and pre-amplifier saturation, but the sensor operation was still unsatisfactory. However, only a few tests have been done, as I have run out of disposable EMG gel electrodes. A new batch is expected to arrive by the 12th of October. I have also ordered some commercially available myoelectric sensors to use instead of the DIY one if it proves unreliable. After I finish implementing my PR model offline, I will start developing this model on a microcontroller using the sEMG sensor as the input source. If I have some time left, I will compare the performance of the commercial sensor with my design based on prediction accuracy and some real-time metrics.

References
[1] A. Calado, F. Soares and D. Matos, "A Review on Commercially Available Anthropomorphic Myoelectric Prosthetic Hands, Pattern-Recognition-Based Microcontrollers and sEMG Sensors used for Prosthetic Control", Ieeexplore.ieee.org, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8733629.
[2] D. Toledo-Pérez, J. Rodríguez-Reséndiz, R. Gómez-Loenzo and J. Jauregui-Correa, "Support Vector Machine-Based EMG Signal Classification Techniques: A Review", Applied Sciences, vol. 9, no. 20, p. 4402, 2019. Available: 10.3390/app9204402.
[3] A. Adewuyi, L. Hargrove and T. Kuiken, "Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control", Frontiers in Neurorobotics, vol. 10, 2016. Available: 10.3389/fnbot.2016.00015.
[4] Q. Cheng, H. Zhou and J. Cheng, "The Fisher-Markov Selector: Fast Selecting Maximally Separable Feature Subset for Multiclass Classification with Applications to High-Dimensional Data", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 6, pp. 1217-1233, 2011. Available: 10.1109/tpami.2010.195.
[5] L. Zhang, G. Liu, B. Han, Z. Wang and T.
Zhang, "sEMG Based Human Motion Intention Recognition", Journal of Robotics, vol. 2019, pp. 1-12, 2019. Available: 10.1155/2019/3679174.
[6] M. Atzori et al., "Characterization of a Benchmark Database for Myoelectric Movement Classification", IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 23, no. 1, pp. 73-83, 2015. Available: 10.1109/tnsre.2014.2328495.
[7] M. Jamal, Signal Acquisition Using Surface EMG and Circuit Design Considerations for Robotic Prosthesis. INTECH Open Access Publisher, 2012.
[8] A. Waris and E. Kamavuako, "Effect of threshold values on the combination of EMG time domain features: Surface versus intramuscular EMG", Biomedical Signal Processing and Control, vol. 45, pp. 267-273, 2018. Available: 10.1016/j.bspc.2018.05.036.