TECHNION – Israel Institute of Technology
Department of Electrical Engineering
The Physiological Signal Processing Laboratory

Classification of Finger Activation using Genetic Algorithm EMG Feature Selection

By Dori Peleg & Eyal Braiman
Supervised by Elad Yom-Tov
Winter 2000-2001

Abstract

In cases of hand amputation there is a need for a way to control a robotic replacement hand. Although the hand is missing, the muscles in the forearm, which are responsible for finger movement, remain, and can nonetheless be flexed. This muscle activity can be read as electromyographic signals via electrodes attached to the forearm. The objective of this project is to successfully identify when a finger is activated and which finger is activated. The interval of finger activation was derived from the signal's envelope (without regard to the identity of the finger at this point). Characteristic features were extracted from the signal, and a modified K-Nearest Neighbor classifier used those features in order to ascertain which finger was activated. Due to an overfitting problem, feature selection was required. Feature selection was performed with a Genetic Algorithm that searched for the combination of features with which the classifier generates the fewest errors. We show that after reviewing only ~3×10⁻⁹ % of the possible combinations of features, the algorithm managed to lower the probability of misclassification from approximately 25% to only 5%. This shows that it is possible to operate a robotic replacement arm with relatively few errors using only one pair of electrodes.

Table of contents

Abstract
Table of contents
1 Introduction
  1.1 Autoregressive (AR) coefficients
  1.2 Modified K Nearest Neighbor (KNN) classifier
    1.2.1 An example of the decision process
  1.3 Genetic Algorithms (GA) for variable selection
    1.3.1 Key GA parameters & terms
2 Experimental setup
  2.1 Location of electrode pair
  2.2 Finger label convention
  2.3 Finger activation sensor
  2.4 Course of experiment
3 Data processing
  3.1 Finger activation identification
  3.2 Envelope detection
  3.3 Ascertainment of the interval of finger activation via the signal's envelope
  3.4 Extraction of characteristic features from the finger activation
  3.5 GA implementation
  3.6 Error Calculation
4 Results
5 Discussion
References

1 Introduction

In cases of hand amputation there is a need for a way to control a robotic replacement hand. Although the hand is missing, the muscles in the forearm, which are responsible for finger movement, remain, and can nonetheless be flexed. This muscle activity can be read as electromyographic (EMG) signals via electrodes attached to the forearm.

In order to analyze and classify finger movements using EMG signals, previous studies [1] used several learning techniques, such as perceptron linear separation and a backpropagation-type neural network. These techniques yielded probabilities of misclassification too high for a realistic implementation. In this project a combination of a K-Nearest Neighbor (KNN) classifier of EMG signal features and a genetic algorithm (GA) for feature selection is used, with the result that the probability of finger identity misclassification is approximately 5%.

The process begins with reading the EMG signals from an electrode pair attached to the test subject's forearm. A finger activation (FA) was identified when the signal's envelope crossed a threshold higher than the maximum noise level. Characteristic features were then extracted in order to represent the FA. The characteristic features were derived from two sources. The first is the amplitude of the Discrete Fourier Transform of the EMG signal: the frequency region was divided into 20 equal sections of frequencies, and each section was characterized by its mean and variance values. The second source is the Auto Regressive (AR) model: in the AR model the signal is modeled as the output of an all-pole IIR filter with white Gaussian noise input. The features from this model are the AR coefficients.

A modified K Nearest Neighbor classifier used these features in order to ascertain which finger was activated (i.e., a test example's label is evaluated according to its features). The modified KNN classifier uses a set of examples (a training set) as its decision space. Each example is comprised of N characteristic features and a label indicating the true finger identity. The N-dimensional Euclidean distance between the test example and the training examples is calculated. The nearest example to the test example is declared the nearest neighbor, and the test example's label becomes the nearest neighbor's label. This is the case when the classifier looks for only k=1 nearest neighbors. For a general k, the classifier takes into account the k nearest labels and decides which label is nearest.

Due to an overfitting problem, feature selection was required. It was performed with a Genetic Algorithm that searched for the combination of features with which the KNN classifier classifies with the fewest errors. Genetic algorithms simulate nature's evolutionary process.
The equivalent of an organism is a "solution". A solution is a binary vector of length N (the number of features), where '1' terms in the solution indicate inclusion of the corresponding feature in the KNN classification. A certain number of solutions make up a "population" of solutions, and they are evaluated against a criterion of survival. The criterion is the number of misclassifications the KNN classifier makes when using the features denoted by the solution. The members of the population with the worst criteria die, while the rest survive and breed among themselves (crossover technique), and make up the population of the next iteration (equivalent to a generation). The algorithm reaches an end when most of the population is identical (meaning the algorithm has converged to a solution) or when a certain predetermined number of generations is reached. The GA managed to lower the probability of misclassification from about 25% to only 5%.

The classification process is illustrated in flow chart 1. In the following sections the theory behind AR modeling, the modified KNN classifier, and genetic algorithms is described. Section 2 details the experimental setup. Section 3 shows the stages of processing. Section 4 describes the results obtained by the processing algorithms, and in section 5 these results are discussed.

Reading EMG signals from the forearm muscles →
Finger activation detection →
Characteristic feature extraction →
Feature selection that minimizes finger misclassification →
Finger classification with selected features →
Calculation of probability of misclassification

Flow Chart 1: The general steps of the classification process

1.1 Autoregressive (AR) coefficients

Autoregressive modeling [2] allows the signal to be modeled as the output of an all-pole IIR filter with white Gaussian noise input. This is the Yule-Walker AR problem, which concerns modeling an unknown system as an autoregressive process. Using the signal's autocorrelation sequence to obtain an optimal estimate leads to a system of equations of the form Ra = b:

$$
\begin{bmatrix}
r(1) & r(2) & \cdots & r(p) \\
r(2) & r(1) & \cdots & r(p-1) \\
\vdots & \vdots & \ddots & \vdots \\
r(p) & r(p-1) & \cdots & r(1)
\end{bmatrix}
\begin{bmatrix} a(2) \\ a(3) \\ \vdots \\ a(p+1) \end{bmatrix}
=
\begin{bmatrix} -r(2) \\ -r(3) \\ \vdots \\ -r(p+1) \end{bmatrix}
$$

where the vector r is the autocorrelation estimate for the signal and p is the model order. These equations are most efficiently solved by the Levinson-Durbin recursion. The vector a contains the coefficients of the autoregressive process that optimally models the system. The coefficients are ordered in descending powers of z, and the AR process is minimum phase. The prediction error, G, defines the gain for the unknown system.
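For illustration, the following Python sketch estimates the AR coefficients by forming the biased autocorrelation estimate and solving the Yule-Walker system with the Levinson-Durbin recursion. It is a textbook implementation for clarity, not the project's original code; library routines (e.g. MATLAB's aryule) perform the same computation.

```python
import numpy as np

def ar_coefficients(x, order=10):
    """Yule-Walker AR estimate via the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and a[1:] are the AR coefficients in
    descending powers of z; err is the final prediction error power,
    from which the gain G is obtained."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimate r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        k_m = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a_prev = a.copy()
        a[m] = k_m
        # Levinson update: a[i] += k_m * a_prev[m - i], i = 1..m-1.
        a[1:m] = a_prev[1:m] + k_m * a_prev[m - 1:0:-1]
        err *= 1.0 - k_m ** 2
    return a, err
```

In section 3.4 an order-10 model is used; of the eleven returned coefficients, entries 2 through 11 are kept as features (the first is always 1).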
1.2 Modified K Nearest Neighbor (KNN) classifier

The modified K-Nearest Neighbors [3] (KNN) classifier operates on a set of M examples. Each example is a vector of N variables and is tagged with one of L labels. In order to test the classifier's validity, holdout was used: the set is separated into two groups, where 80% of the examples are used for training and 20% are used for testing. The training set is treated as known information, while the test set's labels are regarded as unknown. The KNN classifier uses the training set and its labels to find the test set's labels.

Each variable is treated as a dimension, and the classification space is the N-dimensional variable space. The classifier takes the test set's examples one at a time and calculates their N-dimensional Euclidean distance from the training set examples. Only the k nearest labels are taken into account.

The first criterion is the number of appearances a label makes in the k nearest. If there is only one label which appears the most times, then it is the nearest label. If there are two or more labels which appear the most times, then the algorithm advances to a second criterion. This criterion first defines the term "position" as the order of distance in the k nearest (e.g. the label with the smallest distance is in "position 1", the label with the second smallest distance is in "position 2", and so on). Of the labels that are tied on the first criterion, the label with the smallest sum of positions is the nearest label. The smallest sum of positions is related to the smallest average distance. If there are two or more labels which have the same smallest sum of positions, then the algorithm advances to a third criterion: of the labels that are tied on the first and second criteria, the label with the smallest single position (not sum) is the nearest label. In this criterion there can be no tie. Flow chart 2 illustrates the modified KNN classifier decision process:

1. Take one example from the test set and calculate its N-dimensional Euclidean distance from each of the training set examples.
2. Find the k nearest labels and order them according to distance.
3. Calculate how many times each label appears among the k nearest. If there is only one label that appears the most times, it is the nearest label.
4. Otherwise, for each of the tied labels, calculate the sum of its positions. If there is only one label with the smallest sum of positions, it is the nearest label.
5. Otherwise, find the lowest position of each remaining tied label. The nearest label is the one which appears the most times in the k nearest, has the smallest sum of positions and, of those positions, has the lowest one.

Flow Chart 2: The modified KNN classifier decision process. The classifier was modified from the classic KNN classifier so that ties can be resolved by the classifier.
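A minimal Python (NumPy) sketch of this tie-breaking logic follows; the variable names are ours, and the project itself was not necessarily implemented this way.

```python
import numpy as np
from collections import Counter

def modified_knn_classify(train_X, train_y, test_x, k=5):
    """Classify one test example with the modified KNN of section 1.2.

    Ties are resolved by (1) number of appearances among the k nearest,
    (2) smallest sum of positions, (3) smallest single position."""
    # N-dimensional Euclidean distance to every training example.
    dists = np.sqrt(((train_X - test_x) ** 2).sum(axis=1))
    order = np.argsort(dists)[:k]          # indices of the k nearest
    nearest_labels = train_y[order]        # ordered by distance
    positions = np.arange(1, k + 1)        # position 1 = closest

    # Criterion 1: most appearances among the k nearest.
    counts = Counter(nearest_labels)
    best_count = max(counts.values())
    tied = [lab for lab, c in counts.items() if c == best_count]
    if len(tied) == 1:
        return tied[0]

    # Criterion 2: smallest sum of positions (smallest average rank).
    pos_sum = {lab: positions[nearest_labels == lab].sum() for lab in tied}
    best_sum = min(pos_sum.values())
    tied = [lab for lab in tied if pos_sum[lab] == best_sum]
    if len(tied) == 1:
        return tied[0]

    # Criterion 3: smallest single position -- this cannot tie.
    return min(tied, key=lambda lab: positions[nearest_labels == lab].min())
```

On the example of section 1.2.1 below, where the ordered nearest labels are [5, 2, 1, 2, 5], the function returns label 5.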
1.2.1 An example of the decision process

Number of labels L=5; number of variables N=2, giving a two-dimensional decision (variable) space; k=5 nearest neighbors. The distance calculation from the test example to the five nearest training examples is shown in figure 1.

Figure 1: An example of modified KNN classification. In this example the number of labels is 5, the number of features is 2 and the number of nearest neighbors is 5. The distance from the test example to each of the 5 nearest labels is indicated with an arrow.

The five nearest labels to the test example, in ascending distance, are: label 5, label 2, label 1, label 2, label 5.

The first criterion is the number of appearances in the k nearest neighbors. Whichever label appears the most times is determined the "nearest label". The results are as follows:

Label No.                                   1  2  3  4  5
Appearances in the 5 nearest neighbors      1  2  0  0  2

According to this criterion there is a tie between labels 2 and 5, since they both appear twice in the 5 nearest neighbors. In a tie situation the algorithm advances to a second criterion. For each of the tied labels (labels 2 and 5 in our example) the sum of their positions is calculated. The sum of a label's positions is indicative of the average distance of that label, therefore the nearest label is the label with the smallest sum of positions.¹ The results are as follows:

Label No.    Sum of positions in the 5 nearest neighbors
2            2 + 4 = 6
5            1 + 5 = 6

According to this criterion there is still a tie between labels 2 and 5, since they both have the same sum of positions in the 5 nearest neighbors. In a tie situation the algorithm advances to a third and final criterion. For each of the tied labels (labels 2 and 5 in our example) we search for their smallest position. The label with the smallest position is declared the nearest label. In this criterion there can be no tie. The results are as follows:

Label No.    Smallest position in the 5 nearest neighbors
2            Position 2
5            Position 1

Therefore the label which appears the most times in the 5 nearest (appears twice), has the smallest sum of positions (sum = 6) and, of those positions, has the lowest one (position 1), is label 5. Thus, in this example the nearest label is declared to be label 5.

¹ The second criterion can be replaced by a criterion that calculates the actual average distance (in the variable space) of each of the labels tied on the first criterion and declares the label with the smallest average distance the nearest label. In that case the chances of a tie leading to the third criterion are much slimmer than in the utilized algorithm.

1.3 Genetic Algorithms (GA) for variable selection

Genetic algorithms [4], [5] are search algorithms founded upon the principles of natural evolution and selection. Possible solutions to a problem are generally coded as binary strings, and the search is initialized with a random population of possible solutions (a solution is also called a "chromosome"). Each member of the population, representing a candidate solution, is tested against some criterion, and the members of the population are ranked according to their "fitness". Fit solutions are allowed to "live" and "breed" while unfit solutions "die" and are discarded. Iteration is performed until either the population or the quality of the solutions converges. For instances in which this does not occur, a maximum number of iterations is predetermined.

Once the possible terms in the solution are chosen (a term is also called a "gene"), we can produce a random population of solutions in binary form, where a term of value "1" indicates inclusion of the corresponding variable and a term of value "0" indicates that the variable is not used. For instance, we may have the following three members in a population:

1 1 0 1 0 0 1 ...
0 1 0 1 0 1 1 ...
1 1 1 1 1 0 0 ...

Thus, in the first solution we would include the first, second, fourth and seventh variables, whereas in the second solution we would include the second, fourth, sixth and seventh variables, and so on.

At each step (generation), the half of the solutions with the best "fitness" is allowed to live and breed. Pairs of these solutions are randomly selected for breeding using a crossover technique: a randomly selected "crossover point" is chosen for each pair of solutions and the genes beyond it are swapped. For instance, suppose that the second and third solutions from above were chosen for breeding and the crossover point was selected to be between the third and fourth terms. The result would be as follows:

0 1 0 | 1 0 1 1 ...  =>  0 1 0 | 1 1 0 0 ...
1 1 1 | 1 1 0 0 ...  =>  1 1 1 | 1 0 1 1 ...

The two new members of the population are shown on the right. As in nature, an element of mutation exists: a small percentage of the terms are determined randomly and not through breeding, as sketched below.
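The selection, crossover and mutation steps described above can be sketched as one GA generation in Python. This is an illustration of the scheme, not the project's code; the population layout, pairing rule and RNG choices are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_generation(pop, fitness, mutation_rate=0.01):
    """One GA generation: the worse half dies, the better half breeds
    by single-point crossover, and a small fraction of bits is flipped.

    `pop` is an integer 0/1 array of shape (population, genes);
    `fitness` holds misclassification counts (lower is better)."""
    n, n_genes = pop.shape
    survivors = pop[np.argsort(fitness)[: n // 2]]   # best half lives

    children = []
    while len(children) < n - len(survivors):
        i, j = rng.choice(len(survivors), size=2, replace=False)
        point = rng.integers(1, n_genes)             # crossover point
        children.append(np.concatenate([survivors[i, :point],
                                        survivors[j, point:]]))
        children.append(np.concatenate([survivors[j, :point],
                                        survivors[i, point:]]))
    new_pop = np.vstack([survivors] + children)[:n]

    # Mutation: flip a small random fraction of all bits.
    flips = rng.random(new_pop.shape) < mutation_rate
    new_pop[flips] ^= 1
    return new_pop

# Example: a random initial population of 32 chromosomes over 50 genes,
# with ~20% of the terms initially set to 1 (the values used in 1.3.1).
pop = (rng.random((32, 50)) < 0.20).astype(int)
```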
1.3.1 Key GA parameters & terms

Parameters (a '%' prefix denotes a percentage):
- Population Size: the number of members (chromosomes) in the population (16 to 256, and must be divisible by 4).
- % Initial Terms: the percentage of terms with value "1" (i.e., the number of variables included) in the solutions of the initial population.
- Maximum Generations: the maximum number of generations allowed before the algorithm quits.
- % At Convergence: the convergence criterion; the percentage of identical members of the population that qualifies as "convergence".
- Mutation Rate: the fraction of bits in the binary strings which are randomly flipped each generation (typically 0.001 to 0.01).
- Crossover: sets the crossover to be single or double.
- Num. Iterations: the number of iterations for cross-validation at each generation. This parameter's function will be clarified in section 3.5.

The values selected for the GA parameters are: Population Size = 32; % Initial Terms = 20 and 50; Maximum Generations = 1000; % At Convergence = 80; Mutation Rate = 0.01; Crossover = single; Num. Iterations = 10.

Terms:
- Pop: the unique populations at either convergence or any other generation along the search.
- Fit: the fitness (cross-validation error) for each population in Pop.

2 Experimental setup

2.1 Location of electrode pair

Electrodes were attached to the areas indicated in figure 2.

Figure 2: Location of the recording electrodes: (a) the electrode pair location; (b) the anatomical makeup of the forearm. The electrode pair is attached over the superficial layer of flexors, consisting primarily of the Brachioradialis, the Flexor Carpi Radialis, and the Palmaris Longus. This location, at least theoretically, is usable even for partial forearm amputations. The EMG signals in this location are from flexor muscles, which are triggered before a finger actually presses, therefore reducing delays. This location is also an intersection of several flexors, making only one electrode pair necessary. The figure was taken from [6].

This particular area was chosen for several reasons:
1. Several different locations yielded worse results (smaller signal-to-noise ratios (SNR) and larger misclassification error).
2. The location is quite close to the elbow, which makes it theoretically usable even for partial forearm amputations.
3. There are two kinds of muscle groups in the forearm: flexors and extensors. The location of the electrode pair is on the palm-up side of the forearm, as indicated in figure 2(a). This side is primarily comprised of flexors, as seen in figure 2(b).
4. The difference between flexors and extensors in the forearm is that flexors "pull" the fingers inward, while extensors "pull" the fingers outward. Hence flexors are much more useful than extensors for real-time performance purposes, since flexors are activated before a finger actually presses the button, reducing delays in calculation.
5. This location is an intersection of several flexors which control all five fingers, making only one electrode pair necessary for a high SNR from all five fingers.

2.2 Finger label convention

The fingers were labeled as shown in figure 3. Thus, the thumb is labeled 1, the index finger 2, the middle finger 3, the ring finger 4, and the little finger 5.

Figure 3: Finger labeling convention.

2.3 Finger activation sensor

In order to correlate finger movements with the EMG signals, a simple voltage divider, seen in figure 4, acted as the finger activation sensor.

Figure 4: A simple voltage divider used as the finger activation sensor.

In the device (see figure 5) each finger is assigned a button, which is responsible for a switch in the voltage divider.

Figure 5: Finger movement sensor. Each finger is assigned a button, which is responsible for a switch in the voltage divider.

Finger activation causes a button press, which in turn closes the corresponding switch in the voltage divider. The resistances of the branches are ordered, so the smaller the finger label number, the higher the output voltage (note that when no button is pressed the output voltage is zero). This output is read in a separate channel as a reference, showing the true time interval in which the subject was pressing a button. From the relative voltage level the true finger identity can be extracted, as sketched below.
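As an illustration of the readout, the following sketch computes the reference-channel voltage for each button and maps a measured voltage back to a finger label. The supply voltage and resistances (VCC, R_LOAD, R_BRANCH) are hypothetical, since the report does not give the component values; they are merely chosen so that smaller finger numbers yield higher voltages, as described above.

```python
# Hypothetical component values -- not taken from the report.
VCC = 5.0                     # supply voltage (assumed)
R_LOAD = 1e3                  # resistor across which Vout is read (assumed)
R_BRANCH = {1: 1e3, 2: 2e3, 3: 4e3, 4: 8e3, 5: 16e3}  # per-finger branch

def output_voltage(pressed_finger):
    """Voltage on the reference channel when one button is pressed;
    0 V when no button is pressed (all switches open)."""
    if pressed_finger is None:
        return 0.0
    r = R_BRANCH[pressed_finger]
    return VCC * R_LOAD / (r + R_LOAD)

def finger_from_voltage(v, tol=0.05):
    """Recover the finger label from the measured reference voltage."""
    if v < tol:
        return None           # no button pressed
    return min(R_BRANCH, key=lambda f: abs(output_voltage(f) - v))
```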
2.4 Course of experiment

The EMG signal was sampled from one test subject using the BIOPAC Student Lab PRO kit at a sampling frequency of 500 Hz. At each finger activation the finger's number was noted and labeled as the "true answer". Each test session was one minute long, in order to avoid fatigue. Originally, two electrode pairs were attached to the forearm of the test subject; however, the data from the second channel was discarded due to a low signal-to-noise ratio (relative to the first channel).

3 Data processing

3.1 Finger activation identification

The objective in this stage is to find the interval of finger activation (without regard to finger identity). The EMG from a typical finger activation (FA) is portrayed in figure 6. This particular FA will be employed for further illustrations later on. The main source of noise is mains interference; a band-stop filter at this frequency was selected in the BIOPAC Student Lab PRO kit.

Figure 6: An example of a typical finger activation. This specific activation was recorded during movement of finger 4.

3.2 Envelope detection

The envelope of the signal is calculated in order to find the interval of FA, as will be explained below. The calculation of the envelope is a three-stage process:

1. The signal is passed through a 30 Hz High Pass Filter. The HPF removes interference from the movement itself (movement of the electrodes, etc.).
2. The absolute value is taken. This turns the signal from a two-phase into a single-phase signal.
3. The signal is passed through a 2.5 Hz Low Pass Filter. The LPF smoothes out the signal.

The outcome of this technique on our finger activation example is exhibited in figure 7.

Figure 7: The example signal and its envelope. The envelope of the signal is calculated in order to find the interval of FA. The FA envelope's amplitude is actually five times smaller; it is depicted in this fashion for illustrative purposes only.
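A sketch of the three stages in Python using SciPy is given below. The cutoff frequencies come from the text, while the filter family and order (4th-order Butterworth, applied zero-phase with filtfilt) are our assumptions, since the report does not specify them.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 500  # sampling frequency [Hz]

def envelope(emg, fs=FS):
    """Three-stage envelope of section 3.2: 30 Hz HPF, absolute value,
    2.5 Hz LPF. Filter type/order are assumed (4th-order Butterworth)."""
    b_hp, a_hp = butter(4, 30.0, btype="highpass", fs=fs)
    x = filtfilt(b_hp, a_hp, emg)      # 1. remove movement artifacts
    x = np.abs(x)                      # 2. rectify (two-phase -> one-phase)
    b_lp, a_lp = butter(4, 2.5, btype="lowpass", fs=fs)
    return filtfilt(b_lp, a_lp, x)     # 3. smooth
```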
3.3 Ascertainment of the interval of finger activation via the signal's envelope

The interval of finger activation is the interval in which the envelope of the signal is larger than a certain threshold. The threshold is determined by observation of a representative noise segment devoid of any finger activity, and is set slightly higher than the maximum noise level. Figure 8 illustrates this process.

Figure 8: Determination of the FA interval. The interval of finger activation is the interval in which the envelope of the signal is larger than a certain threshold.

First, the points at which the envelope crosses the threshold are found. All these crossing points are candidates for beginning and ending points of finger activations. Between these crossing points there can be finger activations, noise, or short finger twitches which do not constitute intentional finger activation. In order to find the beginning and end of only the intentional finger activations, two criteria were set:

1. An intentional finger activation lasts at least one second (= 500 samples).
2. The average level of the envelope in a finger activation interval must be larger than the value of the threshold with which the crossing points were determined.

Using these criteria, FAs can be recognized. Flow chart 3 illustrates this process. Given N crossing points numbered i = 1:N:

1. Check whether the time between two threshold crossing points (i, i+1) is long enough for a finger activation, and whether the average of the envelope between crossing points i and i+1 is greater than the threshold.
2. If both hold, this is an intentional finger activation: crossing point i is its beginning and crossing point i+1 is its end. Set i = i+2, skipping the next pair of crossing points (crossing point i+1 has already been determined to be the end of a finger activation), and evaluate crossing points (i+2, i+3).
3. Otherwise this is not an intentional finger activation. Set i = i+1 and go on to the next pair of crossing points (i+1, i+2).

Note: in each of the one-minute measurements, the last recognized finger activation was ignored because it may have been cut off in the middle.

Flow Chart 3: Identification of finger activation beginnings and endings from crossing points.
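The crossing-point bookkeeping of flow chart 3 can be sketched as follows; the envelope and threshold come from sections 3.2-3.3, and the helper name is ours.

```python
import numpy as np

def find_activations(env, threshold, fs=500, min_dur_s=1.0):
    """Apply the two criteria of section 3.3 to the threshold-crossing
    points of the envelope `env`; returns (start, end) sample indices."""
    above = env > threshold
    # Indices where the envelope crosses the threshold (either direction).
    crossings = np.flatnonzero(np.diff(above.astype(int)) != 0) + 1

    activations = []
    i = 0
    while i + 1 < len(crossings):
        start, end = crossings[i], crossings[i + 1]
        long_enough = (end - start) >= min_dur_s * fs        # criterion 1
        loud_enough = env[start:end].mean() > threshold      # criterion 2
        if long_enough and loud_enough:
            activations.append((start, end))
            i += 2   # crossing i+1 is already the end of this activation
        else:
            i += 1   # twitch or noise: advance to the next pair
    # The last recognized FA of each measurement may be cut off: drop it.
    return activations[:-1]
```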
3.4 Extraction of characteristic features from the finger activation

Each finger activation is stored from its beginning until 2000 samples later (four seconds were determined to be the maximal duration of a finger activation). A constant segment of the finger activation is selected for the extraction of characteristic features. Through trial and error, the segment which yielded the best results is samples (200:800), where sample number "1" corresponds to the first sample of the finger activation. This choice of region avoids the possibility of unintentionally analyzing even part of a second finger activation.

Each finger was activated a different number of times. Consequently, for reasons of symmetry alone, only 30 finger activations per finger were chosen, despite the fact that more were measured.

Two main sources of features were selected:

1. The first forty features were derived from the DFT amplitude of the chosen segment. Each finger activation interval was multiplied by a Hanning window of the same length, a DFT was performed, and the absolute value was taken. The frequency domain (0:250 Hz) was separated into 20 equal regions of 12.5 Hz each. Each region was characterized by its mean and variance values.
2. The last ten features were chosen as the coefficients of an Auto Regressive model of order 10. This order was selected after testing showed that a higher-order AR model provided coefficients that made no further contribution.

The variables in our implementation are the finger activation's characteristic features. Fifty features are associated with each finger activation, and together with its label they constitute an "example" (i.e., N=50). There are a total of 150 examples (i.e., M=150) and five labels (i.e., L=5).

The feature matrix is an M x (N+1) matrix (illustrated in figure 9). Each row represents an example. The first column stores the label of each example (i.e., which finger was truly activated); each of the remaining 50 columns represents a single feature. The first twenty features (1:20) are the means of the frequency regions. The next twenty (21:40) are the variances of the same frequency regions. The last ten (41:50) are the Auto Regressive coefficients, numbered 2 through 11 (the first coefficient is always normalized to the value 1 and is therefore disregarded).

Figure 9: The complete feature matrix. The matrix contains a total of 50 features for each finger activation.

The above fifty features pose a problem of overfitting. Since only 150 examples are available and there are 50 features, the situation is too near the case where every feature represents a single finger activation. In such a case no model can be established and only the current finger activations can be recognized; a classifier that uses these features will not generalize well. Therefore it is imperative that a smaller combination of features be chosen. The search algorithm chosen is the Genetic Algorithm.
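Before turning to the search, here is a sketch of the 50-feature computation for one stored activation, under the section's definitions (Hanning window, 20 bands of 12.5 Hz over 0-250 Hz, AR order 10). It reuses the illustrative ar_coefficients() sketch from section 1.1, and np.array_split stands in for the exact band-edge bookkeeping.

```python
import numpy as np

N_BANDS = 20

def fa_features(segment, ar_order=10):
    """Build the 50-element feature vector of section 3.4 for one FA
    segment (samples 200:800 of the stored activation): 20 band means,
    20 band variances, and 10 AR coefficients."""
    windowed = segment * np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed))     # DFT amplitude, 0..250 Hz
    # Split the 0..250 Hz axis into 20 equal bands of 12.5 Hz each.
    bands = np.array_split(spectrum, N_BANDS)
    means = np.array([b.mean() for b in bands])
    variances = np.array([b.var() for b in bands])

    a, _ = ar_coefficients(segment, order=ar_order)   # from section 1.1
    return np.concatenate([means, variances, a[1:]])  # a[0] = 1 is dropped
```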
3.5 GA implementation

The process of finding the right features for classification is normally an intuitive one. Both the identity and the number of features have to be determined. However, it is difficult to view a decision space of more than three dimensions (features) at once, making it very difficult to distinguish which features provide easy-to-classify formations (such as those seen in figure 1). Moreover, the immense number of combinations (2^50) excludes any hope of intuitive trial and error. In order to overcome this conundrum a new approach must be taken. First, all the features that may be helpful are calculated. Second, a search algorithm finds among these features the ones that actually help, and whichever do not are discarded. The Genetic Algorithm implements this approach. The criterion used to judge the "fitness" of each solution is its probability of misclassification via the KNN classifier, using only the features that the solution specifies. The GA decision process is illustrated in flow chart 4:

1. Create a population of chromosomes with a random genetic makeup.
2. For each chromosome in the population, iterate the following ten times (Num. Iterations): randomize the total set of 150 examples and classify via the KNN classifier as detailed earlier. Sum up the total number of errors from the ten iterations for each chromosome. This is its "fitness" - the survival criterion for that chromosome.
3. The best chromosome in the current generation is compared to the best chromosome of the entire evolution up to now; if it is better, it becomes the best chromosome of the evolution.
4. The half of the population with the worse values of the survival criterion dies. The members of the remaining half choose partners randomly among themselves and breed. This is the next generation.
5. If the population of chromosomes has converged, choose the chromosome which the population has converged into. If the evolution has gone on for longer than the maximum number of generations, choose the chromosome with the best fitness from the entire evolution. Otherwise, return to step 2.

Flow chart 4: Implementation of a genetic algorithm for feature selection.

3.6 Error Calculation

For each of the test set examples, the nearest label is established according to the KNN algorithm. The true label of each test set example is compared to the estimated label. If they are not identical, this is declared an error. Note that there is no preference as to the kind of error: for instance, if the true label is 1 and the estimated label is 2, this is an error just the same as if the true label is 1 and the estimated label is 5. The number of misclassification errors calculated by the KNN classifier is used for calculating the criterion of survival in the Genetic Algorithm.
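Putting sections 3.5 and 3.6 together, the survival criterion of a chromosome can be sketched as below: ten random 120/30 holdout splits, classification with only the selected features, and a summed error count. It reuses the modified_knn_classify() sketch from section 1.2; details such as the RNG are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(features, labels, selected, k=5, n_iter=10, test_frac=0.2):
    """Survival criterion of sections 3.5-3.6: total number of
    misclassifications over `n_iter` random 80/20 holdout splits,
    using only the features marked '1' in the chromosome `selected`."""
    X = features[:, selected.astype(bool)]
    m = len(labels)
    n_test = int(test_frac * m)          # 30 of the 150 examples
    errors = 0
    for _ in range(n_iter):
        perm = rng.permutation(m)        # randomize the total example set
        test, train = perm[:n_test], perm[n_test:]
        for t in test:
            pred = modified_knn_classify(X[train], labels[train], X[t], k=k)
            errors += int(pred != labels[t])   # any wrong label counts once
    return errors
```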
4 Results

The genetic algorithm was run with several different choices of initial parameters:

1. Analyzed region of the finger activation (FA): Area 1, corresponding to samples (200:600), and Area 2, corresponding to samples (200:800).
2. Number of nearest neighbors in the KNN classifier: k = 1, 3, 5.
3. Initial population: 20% or 50% of genes with value 1 (i.e., features selected for use in KNN classification) in the first generation.

Figures 10 and 11 illustrate the search process. Figure 10 exhibits the entire evolution (all 1000 generations). In figure 11 the evolution is stopped at the generation containing the chromosome with the best overall fitness.

Figure 10: A demonstration of the search process. The number of features selected for KNN classification wavers around 25 after ~200 generations. The initial parameters are: Area 2; k=5; Initial Population = 20%.

Figure 11: A demonstration of the search process. This figure is the same as figure 10, but the evolution is stopped when the generation with the chromosome with the best overall fitness is reached.

Explanation of figures 10 and 11: The top right panel displays the evolution of fitness (probability of misclassification in percent) over the generations. The blue graph is the average fitness of the chromosome population per generation. The green graph is the fitness of the chromosome with the best fitness per generation. The red asterisk indicates the generation in which the chromosome with the best fitness throughout the evolution appears. Note the conspicuous separation between the blue and green graphs in the first generations, and that with time they converge. The significance of this phenomenon is that at first the solutions are completely random and the difference between the best solution and the rest is relatively large. As the algorithm progresses, only the better solutions survive, and the difference between the average fitness of the population and the best fitness in the population decreases.

The top left panel displays the distribution of solutions in a two-dimensional space: the number of selected features versus the solution fitness. Each green circle indicates a chromosome. Notice that since the solutions tend to converge at least partially, several chromosomes appear in exactly the same position. The purpose of this panel is to give an indication of convergence. The genetic algorithm determines that a solution has been found when the population of solutions is 80% identical; in that case most of the solutions are in exactly the same position in the aforementioned plane (however, the fact that solutions have the same fitness and the same number of selected features does not necessarily mean they are identical!).

The bottom left panel displays the average number of selected features (genes) per generation over the generations. This panel denotes the progress of the search. The bottom right panel displays how many chromosomes use each feature in a generation. This is an indicator of which features are being considered at each generation.

Figures 10 and 11 are the results of one choice of the initial parameters. Table 1 and figure 12 summarize the results of all the choices.

Parameters (Area, k, s%)    Num. Var.    Prob. Error (%)
Area 1, k=1, s=20%          25           6.333
Area 1, k=1, s=50%          27           7
Area 1, k=3, s=20%          29           6.67
Area 1, k=3, s=50%          27           6.67
Area 1, k=5, s=20%          29           6
Area 1, k=5, s=50%          37           6.67
Area 2, k=1, s=20%          26           5
Area 2, k=1, s=50%          27           4.667
Area 2, k=3, s=20%          21           4.667
Area 2, k=3, s=50%          27           5
Area 2, k=5, s=20%          25           4.667
Area 2, k=5, s=50%          30           5

Table 1: The number of selected variables and the probability of error for each choice of initial parameters.

[Figure 12: bar chart of the average percentage of selected features for each feature type (Mean, Variance, AR), with values in the 45-55% range.]

Figure 12: Average percentage of selected features per type. The three types of features were chosen relatively evenly, which indicates that all three types are equally good predictors for the classification task.

5 Discussion

This project describes the possible use of EMG signals for operating a robotic arm. The EMG signals are detected and then classified using a modified KNN classifier. Features for the classifier are chosen using a genetic algorithm.

In all the search runs the genetic algorithm never converged, and terminated by reaching the maximum number of generations. This indicates either a highly local solution or that the convergence percentage parameter was set too high (80% in all the tests). In order to make use of the algorithm without its convergence, an option was added so that it remembers the best overall result throughout its iterations.

When the algorithm does not converge, the parameter that decides the number of iterations is the 'maximum number of generations' parameter. It was set to 1000 in all the searches. This is a compromise between more generations, which have a higher chance of finding a better solution, and fewer generations, which use fewer hardware resources and less calculation time. Yet if 1000 generations seems like a high value, a simple calculation proves that the genetic algorithm only searches through a tiny fraction of the possibilities:

Number of possible combinations: 2^50
Combinations examined per generation: 32
Number of generations: 1000

% of combinations examined by the GA = (32 × 1000 / 2^50) × 100% ≈ 3×10⁻⁹ %

Each of the results presented in table 1 is the best result (in the minimum-probability-of-error sense) from a single GA search per selection of initial parameters (Area, k, s%). The same choice of initial parameters has given higher probabilities of misclassification, of up to ~+1.5%. Despite this variance, the results show a clear distinction between Area 1 and Area 2: Area 2's percentage of misclassification error is in general ~2% lower than Area 1's. Area 2 was found, through trial and error, to exhibit the lowest overall percentage of misclassification. On the other hand, when a finger is activated, the flexor muscles may be active for less than 1.6 seconds (using the finger activation sensor, it was observed that the actual time interval in which a finger presses a button is shorter than the time interval in which the flexors are active). For this eventuality another, shorter region was chosen: Area 1, which requires only 1.2 seconds of finger activation.

In both areas the first 0.4 seconds (200 samples) were not utilized, because their inclusion raised the probability of misclassification. Dividing the finger activation into two kinds of regions suggests a possible explanation for this phenomenon:
1. Transient period: the first 0.4 seconds.
2. Steady state: the rest of the FA.
As opposed to the steady state, the transient period depends on how the finger is contracted, without any correlation to which finger was activated.

An interesting tendency is the GA's number of selected features in the best feature combinations.
Despite the different starting numbers of selected features - 10 (20% of the 50 features) and 25 (50% of the 50 features) - the best feature combinations contained ~25 features. This is part of a wider trend of the GA: even when initialized at ten selected features, after ~200 generations the GA searches in the vicinity of 25 features (see the bottom left panel of figures 10 and 11) and, as a direct consequence, finds the best feature combinations only in that vicinity.

If about 25 selected features still seems like an overfitting problem, one must consider the method by which the probability of error is calculated. The total set of 150 FA measurements is separated into a training set of 120 and a test set of 30. This process is repeated 10 times, and at each iteration a different training and test set are randomly selected. If there were a problem of overfitting (i.e., a feature identifying a particular measurement rather than a general quality), the probability of error would increase, because at each iteration there is a different training set and a different test set.

Notice that the choice of the number of nearest neighbors affects neither the probability of error nor the number of selected features. Also, roughly half of each type of feature (mean, variance, AR coefficients) is on average used for the best results.

The first few generations of the average chromosome population (the blue graph in the top right panel of figures 10 and 11) indicate the fitness error if no search algorithm were employed: if the choice of features is completely random, the probability of misclassification error is ~25%. An intuitive approach (for example, viewing one feature at a time and searching for a 'good' separation between the labels) yielded a probability of misclassification of ~15% (the same as the best chromosomes in the first few generations). The GA managed to lower the probability of misclassification to only ~5%, making finger identification in cases of hand amputation realistic.

References

[1] Uchida N, Hiraiwa A, Sonehara N, Shimohara K, "EMG pattern recognition by neural networks for multi finger control", Proceedings of the Annual Engineering in Medicine and Biology Society, 1992, pp. 1016-1018.
[2] Jackson LB, Digital Filters and Signal Processing, Second Edition, Kluwer Academic, 1989, pp. 255-257.
[3] Duda RO, Hart PE, Stork DG, Pattern Classification, John Wiley & Sons, 2000.
[4] Whitley D, "A genetic algorithm tutorial", Statistics and Computing, vol. 4(2), pp. 65-85, June 1994.
[5] Holland JH, "Genetic Algorithms", Scientific American, 1992, pp. 44-50.
[6] Tortora GJ, Grabowski SR, Principles of Anatomy and Physiology, 7th edition, Harper Collins, p. 313.