
TECHNION – Israel Institute of Technology
Department of Electrical Engineering
The Physiological Signal Processing Laboratory
Classification of Finger Activation using Genetic Algorithm EMG Feature Selection
By
Dori Peleg & Eyal Braiman
Supervised by
Elad Yom-Tov
Winter 2000-2001
Abstract
In cases of hand amputation there is a need for a way to control a robotic
replacement hand. Although the hand is missing, the muscles in the forearm which
are responsible for finger movement remain and can still be flexed. This muscle
activity can be read as electromyographic signals via electrodes attached to the
forearm. The objective of this project is to successfully identify when a finger is
activated and which finger is activated.
The interval of finger activation was derived from the signal's envelope (without
regard to the identity of the finger at this point). Characteristic features were
extracted from the signal, and a modified K-Nearest Neighbor classifier used those
features in order to ascertain which finger was activated.
Due to an overfitting problem, feature selection was required. Feature selection
was performed using a Genetic Algorithm that searched for the combination of
features with which the classifier generates the fewest errors. We show that after
reviewing only ~3 × 10⁻⁹% of the possible combinations of features, the algorithm
managed to lower the probability of misclassification from approximately 25% to
only 5%.
This shows that it is possible to operate a robotic replacement arm with
relatively few errors using only one pair of electrodes.
Table of contents
Abstract
1 Introduction
  1.1 Autoregressive (AR) coefficients
  1.2 Modified K Nearest Neighbor (KNN) classifier
    1.2.1 An example of the decision process
  1.3 Genetic Algorithms (GA) for variable selection
    1.3.1 Key GA parameters & terms
2 Experimental setup
  2.1 Location of electrode pair
  2.2 Finger label convention
  2.3 Finger activation sensor
  2.4 Course of experiment
3 Data processing
  3.1 Finger activation identification
  3.2 Envelope detection
  3.3 Ascertainment of the interval of finger activation via the signal's envelope
  3.4 Extraction of characteristic features from the finger activation
  3.5 GA implementation
  3.6 Error Calculation
4 Results
5 Discussion
References
1 Introduction
In cases of hand amputation there is a need for a way to control a robotic
replacement hand. Although the hand is missing, the muscles in the forearm which
are responsible for finger movement remain and can still be flexed. This muscle
activity can be read as electromyographic (EMG) signals via electrodes attached to
the forearm. In order to analyze and classify finger movements using EMG signals,
previous studies [1] used several learning techniques, such as perceptron linear
separation and a backpropagation-type neural network. These techniques yielded
probabilities of misclassification too high for a realistic implementation.
In this project a combination of a K-Nearest Neighbor (KNN) classifier of
EMG signal features and a genetic algorithm (GA) for feature selection is used, with
the result that the probability of finger identity misclassification is approximately
5%.
The process begins with reading the EMG signals from an electrode pair
attached to the test subject's forearm. Finger activation (FA) was identified when the
signal's envelope crossed a threshold higher than the maximum noise level.
Characteristic features were then extracted from the finger activation in order to
represent the FA. The characteristic features were derived from two sources. The first
is the amplitude of the Discrete Fourier Transform of the EMG signal: the frequency
region was divided into 20 equal sections of frequencies, and each section was
characterized by its mean and variance values. The second source is the
autoregressive (AR) model, in which the signal is modeled as the output of an
all-pole IIR filter with white Gaussian noise input; the features from this model are
the AR coefficients.
A modified K Nearest Neighbor classifier used these features in order to
ascertain which finger was activated (i.e. a test example's label is evaluated
according to its features). The modified KNN classifier uses a set of examples (a
training set) as its decision space. Each example is comprised of N characteristic
features and a label indicating the true finger identity. The N-dimensional Euclidean
distance between the test example and the training examples is calculated. The
training example nearest to the test example is declared the nearest neighbor, and the
test example's label becomes the nearest neighbor's label. This is the case when the
classifier looks for only k=1 nearest neighbors. For a general k, the classifier takes
into account the k nearest labels and decides which label is nearest.
Due to an overfitting problem, feature selection was required. It was performed
with a Genetic Algorithm that searched for the combination of features with which the
KNN classifier classifies with the fewest errors. Genetic algorithms simulate nature's
evolutionary process. The equivalent of an organism is a "solution": a binary vector
of length N (the number of features), where a '1' term in the solution indicates
inclusion of the corresponding feature in the KNN classification. A certain number of
solutions make up a "population" of solutions. They are evaluated against a criterion
of survival: the number of misclassifications the KNN classifier makes when using
the features denoted by the solution. The members of the population with the worst
criterion values die, while the rest survive and breed among themselves (crossover
technique) and make up the population of the next iteration (the equivalent of a
generation).
The algorithm ends when most of the population is identical (meaning the
algorithm has converged to a solution) or when a certain predetermined number of
generations is reached.
The GA managed to lower the probability of misclassification from about 25%
to only 5%. The classification process is illustrated in flow chart 1.
In the following sections the theory behind AR modeling, the modified
KNN classifier, and genetic algorithms is described. Section 2 details the
experimental setup, section 3 shows the stages of processing, section 4 describes the
results obtained by the processing algorithms, and in section 5 these results are
discussed.
Flow Chart 1: The general steps of the classification process:
1. Reading EMG signals from the forearm muscles.
2. Finger activation detection.
3. Characteristic feature extraction.
4. Feature selection that minimizes finger misclassification.
5. Finger classification with the selected features.
6. Calculation of the probability of misclassification.
1.1 Autoregressive (AR) coefficients
Autoregressive modeling [2] allows the signal to be modeled as the output of an
all-pole IIR filter with white Gaussian noise input.
This is the Yule-Walker AR problem, which concerns modeling an unknown
system as an autoregressive process. The use of the signal's autocorrelation sequence
to obtain an optimal estimate leads to a Ra = b equation; for an AR process of order
p, the standard form is

$$\begin{bmatrix} r(0) & r(1) & \cdots & r(p-1)\\ r(1) & r(0) & \cdots & r(p-2)\\ \vdots & \vdots & \ddots & \vdots\\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_p \end{bmatrix} = -\begin{bmatrix} r(1)\\ r(2)\\ \vdots\\ r(p) \end{bmatrix},$$

where the vector r is the autocorrelation estimate for the signal, and the vector a
contains the coefficients of the autoregressive process that optimally models the
system. These equations are most efficiently solved by the Levinson-Durbin
recursion. The coefficients are ordered in descending powers of z, and the AR
process is minimum phase. The prediction error, G, defines the gain for the unknown
system.
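As an illustration, here is a minimal numpy sketch of this estimate: a biased autocorrelation estimate followed by the Levinson-Durbin recursion. The function name and the exact estimator are our assumptions; the report does not show its implementation.

```python
import numpy as np

def yule_walker_ar(x, order=10):
    """Estimate AR(order) coefficients by solving the Yule-Walker
    equations with the Levinson-Durbin recursion. Returns (a, e):
    a = [1, a1, ..., ap] in descending powers of z, and e is the
    final prediction error power (related to the gain G)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimate r[0..order]
    r = np.array([x[: n - k] @ x[k:] for k in range(order + 1)]) / n
    a = np.array([1.0])
    e = r[0]
    for k in range(1, order + 1):
        # Reflection coefficient for order k
        lam = -(r[k] + a[1:] @ r[k - 1:0:-1]) / e
        a = np.concatenate([a, [0.0]])
        a = a + lam * a[::-1]     # order-update of the coefficients
        e *= 1.0 - lam ** 2       # prediction error shrinks each order
    return a, e
```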
1.2 Modified K Nearest Neighbor (KNN) classifier
The modified K-Nearest Neighbors [3] (KNN) classifier operates on a set of M
examples. Each example is a vector of N variables and is tagged with one of L labels.
In order to test the classifier's validity, holdout was used, whereby the set is separated
into two groups: 80% of the examples are used for training and 20% of the examples
are used for testing.
The training set is treated as known information, while the test set's labels
are regarded as unknown. The KNN classifier uses the training set and its labels to
find the test set's labels.
Each variable is treated as a dimension, and the classification space is the
N-dimensional variable space. The classifier takes the test set's examples one at a
time and calculates their N-dimensional Euclidean distance from the training set
examples. Only the k nearest labels are taken into account. The first criterion is the
number of appearances a label makes among the k nearest: if a single label appears
the most times, it is the nearest label. If two or more labels tie for the most
appearances, the algorithm advances to a second criterion.
This criterion first defines the term "position" as the order of distance among
the k nearest (e.g. the label with the smallest distance is in "position 1", the label
with the second smallest distance is in "position 2", and so on). Of the labels tied
under the first criterion, the one with the smallest sum of positions is the nearest
label; the smallest sum of positions corresponds to the smallest average distance. If
two or more labels share the same smallest sum of positions, the algorithm advances
to a third criterion.
Of the labels tied under the first and second criteria, the one with the smallest
single position (not sum) is the nearest label. Under this criterion there can be no tie.
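A minimal Python sketch of this decision rule follows; the helper names (resolve_label, knn_classify) are ours, since the report does not show its code:

```python
import numpy as np

def resolve_label(ordered_labels):
    """Tie-resolving decision of the modified KNN. 'ordered_labels' holds
    the labels of the k nearest neighbors sorted by ascending distance,
    so position 1 comes first. Criteria: (1) most appearances,
    (2) smallest sum of positions, (3) smallest single position."""
    positions = {}
    for pos, lab in enumerate(ordered_labels, start=1):
        positions.setdefault(lab, []).append(pos)
    # A lexicographic key applies the three criteria in order.
    return min(positions, key=lambda lab: (-len(positions[lab]),
                                           sum(positions[lab]),
                                           min(positions[lab])))

def knn_classify(train_x, train_y, test_x, k=5):
    """Label each test example from its k nearest training examples
    (N-dimensional Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    predictions = []
    for x in np.asarray(test_x, dtype=float):
        dist = np.linalg.norm(train_x - x, axis=1)
        nearest = np.argsort(dist)[:k]
        predictions.append(resolve_label([train_y[i] for i in nearest]))
    return np.array(predictions)
```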
Flow chart 2 illustrates the modified KNN classifier decision process.
Flow Chart 2: The modified KNN classifier decision process. The classifier was
modified from the classic KNN classifier so that ties can be resolved by the
classifier itself:
1. Take one example from the test set and calculate its N-dimensional Euclidean distance from each of the training set examples.
2. Find the k nearest labels and order them according to distance.
3. Count how many times each label appears among the k nearest. If only one label appears the most times, it is the nearest label.
4. Otherwise, two or more labels share the same largest number of appearances among the k nearest. For each of these labels, calculate the sum of its positions. If only one label has the smallest sum of positions, it is the nearest label.
5. Otherwise, two or more labels share both the largest number of appearances and the same sum of positions. Find, for each of these labels, its lowest position. The nearest label is the one that appears the most times among the k nearest, has the smallest sum of positions, and of those positions has the lowest one.
1.2.1 An example of the decision process
Number of labels: L = 5.
Number of variables: N = 2, giving a two-dimensional decision (variable) space.
Number of nearest neighbors: k = 5.
The distance calculation from the test label to the five nearest training labels is
shown in figure 1.
Figure 1: An example of modified KNN classification. In this example the
number of labels=5, the number of features=2, the number of nearest
neighbors=5. The distance from the test label to each of the 5 nearest labels is
indicated with an arrow.
The five nearest labels to the test label, in ascending distance, are:
1. Label 5
2. Label 2
3. Label 1
4. Label 2
5. Label 5
The first criterion is the number of appearances among the k nearest neighbors.
Whichever label appears the most times is declared the "nearest label". The results
are as follows:
Label No.   Appearances among the 5 nearest neighbors
1           1
2           2
3           0
4           0
5           2
According to this criterion there is a tie between labels 2 & 5, since they both
appear two times in the 5-nearest neighbors. In a tie situation the algorithm advances
to a second criterion.
For each of the tied labels (labels 2 & 5 in our example) the sum of their
positions is calculated. The sum of a label's positions is indicative of the average
distance of that label. Therefore the nearest label is the label with the smallest sum of
positions¹. The results are as follows:
Label No.   Sum of positions among the 5 nearest neighbors
2           2 + 4 = 6
5           1 + 5 = 6
According to this criterion there is still a tie between labels 2 & 5, since they
both have the same sum of positions in the 5-nearest neighbors. In a tie situation the
algorithm advances to a third and final criterion.
For each of the tied labels (labels 2 & 5 in our example) we search for their
smallest position. The label with the smallest position is declared the nearest label. In
this criterion there can be no tie. The results are as follows:
¹ The second criterion can be replaced by a criterion that calculates the actual average distance (in the
variable space) of each of the labels tied under the first criterion and decides that the label with the
smallest average distance is the nearest label. In that case the chances of a tie leading to the third
criterion are much slimmer than in the utilized algorithm.
Label No.   Smallest position among the 5 nearest neighbors
2           Position 2
5           Position 1
Therefore the label which appears the most times among the 5 nearest (appears
twice), has the smallest sum of positions (sum = 6), and of those positions has the
lowest one (position 1) is label 5. Thus, in this example the nearest label is declared
to be label 5.
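For illustration, the resolve_label helper sketched in section 1.2 reproduces this worked example:

```python
# Labels of the 5 nearest neighbors in ascending distance (figure 1).
# Counts tie (labels 2 and 5 appear twice), sums of positions tie (6 vs 6),
# and label 5 wins on the smallest position (position 1).
resolve_label([5, 2, 1, 2, 5])   # -> 5
```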
1.3 Genetic Algorithms (GA) for variable selection
Genetic algorithms [4], [5] are search algorithms founded upon the principles
of natural evolution and selection. Possible solutions to a problem are generally coded
as binary strings and the search is initialized with a random population of possible
solutions (a solution is also called a “chromosome”).
Each member of the population representing a candidate solution is tested
against some criteria and members of the population are ranked according to their
“fitness”. Fit solutions are allowed to “live” and “breed” while unfit solutions “die”
and are discarded.
Iteration is performed until either the population or the quality of the solutions
converges. For instances in which this does not occur, a maximum number of
iterations is predetermined.
Once the possible terms in the solution are chosen (a term is also called a
“gene”), we can produce a random population of solutions in binary form where a
term of value “1” indicates inclusion of the corresponding variable, and a term of
value “0” indicates that the variable is not used. For instance, we may have the
following three members in a population:
1 1 0 1 0 0 1 ...
0 1 0 1 0 1 1 ...
1 1 1 1 1 0 0 ...
Thus, in the first solution we would include the first, second, fourth and seventh
variables, whereas in the second solution we would include the second, fourth, sixth
and seventh variables, and so on.
At each step (generation), half of the solutions with the best “fitness” are
allowed to live and breed. Pairs of these solutions are randomly selected for breeding
using a crossover technique. A randomly selected “crossover point” is chosen for each
pair of solutions and the genes are “twisted.”
For instance, suppose that the second and third solutions from above were
chosen for breeding and the crossover point was selected to be between the third and
fourth terms. The result would be as follows:
0 1 0 1 0 1 1 ... => 0 1 0 1 1 0 0 ...
1 1 1 1 1 0 0 ... => 1 1 1 1 0 1 1 ...
The two new members of the population are shown on the right. As in nature, an
element of mutation exists: a small percentage of the terms are determined randomly
and not through breeding.
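A minimal sketch of such a GA is given below, assuming single-point crossover, survival of the fitter half, and random bit-flip mutation (the parameter names follow section 1.3.1 below; the report's own implementation may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, n_genes=50, pop_size=32, max_gen=1000,
           init_frac=0.2, mutation_rate=0.01, conv_frac=0.8):
    """Search for the binary chromosome minimizing 'fitness' (an error
    count). Returns the best chromosome seen and its fitness."""
    pop = (rng.random((pop_size, n_genes)) < init_frac).astype(int)
    best, best_fit = None, np.inf
    for _ in range(max_gen):
        fit = np.array([fitness(c) for c in pop])
        if fit.min() < best_fit:                 # remember the best ever
            best_fit, best = fit.min(), pop[fit.argmin()].copy()
        survivors = pop[np.argsort(fit)[: pop_size // 2]]  # fitter half lives
        children = []
        while len(children) < pop_size - len(survivors):
            p1, p2 = survivors[rng.choice(len(survivors), 2, replace=False)]
            cut = rng.integers(1, n_genes)       # single crossover point
            children.append(np.concatenate([p1[:cut], p2[cut:]]))
            children.append(np.concatenate([p2[:cut], p1[cut:]]))
        pop = np.vstack([survivors] + children)[:pop_size]
        flip = rng.random(pop.shape) < mutation_rate   # random mutation
        pop = np.where(flip, 1 - pop, pop)
        _, counts = np.unique(pop, axis=0, return_counts=True)
        if counts.max() >= conv_frac * pop_size:  # 80% identical: converged
            break
    return best, best_fit
```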
1.3.1 Key GA parameters & terms
Parameters:
Population Size: the number of members (chromosomes) in the population (16 to 256,
and must be divisible by 4).
% Initial Terms²: the fraction of terms with value "1" (i.e. variables included) in the
solutions of the initial population.
Maximum Generations: the maximum number of generations allowed before the
algorithm quits.
% At Convergence: the convergence criterion; the percentage of identical members of
the population that qualifies as "convergence".
Mutation Rate: the fraction of bits in the binary strings that are randomly flipped
each generation (typically 0.001 to 0.01).
Crossover: sets the crossover to be single or double.
Num. Iterations: the number of iterations for cross-validation at each generation. This
parameter's function is clarified in section 3.5.
The values selected for the GA parameters are:
Population Size = 32;
% Initial Terms = 20 and 50;
Maximum Generations = 1000;
% At Convergence = 80;
Mutation Rate = 0.01;
Crossover = single;
Num. Iterations = 10.
Terms:
Pop: the unique populations at convergence or at any other generation along the
search.
Fit: the fitness (cross-validation error) of each population in pop.
² "%" stands for "percentage of".
2 Experimental setup
2.1 Location of electrode pair
Electrodes were attached to the areas as indicated in figure 2.
Figure 2: Location of the recording electrodes. (a) The electrode pair location. (b) The
anatomical makeup of the forearm. The electrode pair is attached to the superficial layer of
flexors, consisting primarily of the Brachioradialis, the Flexor Carpi Radialis, and the Palmaris
Longus. This location, at least theoretically, is usable even for partial forearm amputations. The
EMG signals in this location come from flexor muscles, which are triggered before a finger actually
presses, thereby reducing delays. This location is also an intersection of several flexors, making
only one electrode pair necessary. The figure was taken from [6].
This particular area was chosen for several reasons:
1. Several other locations yielded worse results (smaller signal-to-noise ratios (SNR) and larger misclassification error).
2. The location is quite close to the elbow, which makes it theoretically usable even for partial forearm amputations.
3. There are two kinds of muscle groups in the forearm: flexors and extensors. The electrode pair is located on the palm-up side of the forearm, as indicated in figure 2(a). This side is primarily comprised of flexors, as seen in figure 2(b).
4. The difference between flexors and extensors in the forearm is that flexors "pull" the fingers inward, while extensors "pull" them outward. Hence flexors are much more useful than extensors for real-time performance purposes, since flexors are activated before a finger actually presses the button, reducing delays in calculation.
5. This location is an intersection of several flexors which control all five fingers, making only one electrode pair necessary for a high SNR from all five fingers.
2.2 Finger label convention
The fingers were labeled as shown in figure 3. Thus, the thumb is labeled as 1,
the index finger as 2, the middle finger as 3, the ring finger as 4, and the little finger
as 5.
Figure 3: Finger labeling convention.
2.3 Finger activation sensor
In order to correlate finger movements with the EMG signals, a simple
voltage divider, seen in figure 4, acted as the finger activation sensor.
Figure 4: A simple voltage divider used as the finger activation sensor.
In the device (see figure 5) each finger is assigned a button, which is
responsible for a switch in the voltage divider.
Figure 5: The finger movement sensor. Each finger is assigned a button, which is
responsible for a switch in the voltage divider.
Finger activation causes a button press, which in turn closes the corresponding
switch in the voltage divider. The resistance in each branch is in descending order, so
the smaller the finger label number, the higher the output voltage (notice that when
no button is pressed the output voltage is zero). This output is read in a separate
channel as a reference, showing the true time interval during which the subject was
pressing a button. From the relative voltage level, the true finger identity can be
extracted.
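For illustration only, a hypothetical decoder for the sensor output; the report does not give the resistor values, so the voltage levels below are assumptions:

```python
# Hypothetical nominal output voltages per finger (descending resistance
# means finger 1 gives the highest voltage); these values are assumed.
LEVELS = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0, 5: 0.5}

def finger_from_voltage(v, idle_tol=0.2):
    """Map a reference-channel voltage to a finger label, or None when
    no button is pressed (output voltage ~0)."""
    if v < idle_tol:
        return None
    return min(LEVELS, key=lambda f: abs(LEVELS[f] - v))
```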
2.4 Course of experiment
The EMG signal was sampled from one test subject using the Bio Pac Student
Lab PRO kit at a sampling frequency of 500 Hz. At each finger activation the
finger number was noted and labeled as the "true answer". Each test session was
one minute long, in order to avoid fatigue.
Originally, two electrode pairs were attached to the forearm of the test subject.
However, the data from the second channel was discarded due to a low signal to noise
ratio (relative to the first channel).
3 Data processing
3.1 Finger activation identification
The objective in this stage is to find the interval of finger activation (without
regard to identity). The EMG from a typical Finger Activation (FA) is portrayed in
figure 6. This particular FA will be employed later on for further illustrations. The
main source of noise is mains interference; a band-stop filter at this frequency was
selected in the Bio Pac Student Lab PRO kit.
Figure 6: An example of a typical finger activation. This specific activation was
recorded during movement of finger 4.
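The band-stop was applied in the acquisition hardware; a software equivalent, assuming the 50 Hz mains frequency, could be sketched with scipy as follows:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 500.0                            # sampling frequency (section 2.4)
emg = np.random.randn(30 * int(fs))   # placeholder for a recorded EMG trace
b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)  # notch at the mains frequency
clean = filtfilt(b, a, emg)              # zero-phase band-stop filtering
```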
3.2 Envelope detection
The envelope of the signal is calculated in order to find the interval of FA as
will be explained below.
The calculation of the envelope of the signal is a three-stage process (a code
sketch follows the list):
1. The signal is passed through a 30 Hz high-pass filter (HPF). The HPF removes interference from the movement itself (movement of the electrodes, etc.).
2. The absolute value is taken. This turns the signal from a two-phase into a single-phase signal.
3. The signal is passed through a 2.5 Hz low-pass filter (LPF). The LPF smoothes the signal.
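A scipy sketch of the three stages (the filter order is our assumption; the report specifies only the cutoff frequencies):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope(emg, fs=500.0):
    """Three-stage envelope: 30 Hz HPF, rectification, 2.5 Hz LPF."""
    b, a = butter(4, 30.0, btype="highpass", fs=fs)
    x = filtfilt(b, a, emg)     # 1. remove movement interference
    x = np.abs(x)               # 2. two-phase -> single-phase signal
    b, a = butter(4, 2.5, btype="lowpass", fs=fs)
    return filtfilt(b, a, x)    # 3. smooth the rectified signal
```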
The outcome of this technique on our finger activation example is exhibited in
figure 7.
Figure 7: The example signal and its envelope. The envelope of the signal is
calculated in order to find the interval of FA. The FA envelope's amplitude is
actually five times smaller; it is depicted in this fashion for illustrative purposes
only.
3.3 Ascertainment of the interval of finger activation via the signal's envelope
The interval of finger activation is the interval in which the envelope of the
signal is larger than a certain threshold. The threshold is determined by an observation
of a representative noise segment devoid of any finger activity and is set slightly
higher than the maximum noise level.
Figure 8 illustrates this process.
Figure 8: Determination of FA interval. The interval of finger activation is the
interval in which the envelope of the signal is larger than a certain threshold.
First, the points at which the envelope crosses the threshold are found.
All these crossing points are candidates for the beginning and ending points of finger
activations.
Between these crossing points there can be finger activations, noise, or short
finger twitches that do not constitute intentional finger activation. In order to find
the beginning and end of only the intentional finger activations, two criteria were
set (a code sketch follows the list):
1. An intentional finger activation lasts at least one second (= 500 samples).
2. The average level of the envelope in a finger activation interval must be larger than the value of the threshold with which the crossing points were determined.
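A sketch of the crossing-point logic under these two criteria (the helper name and the exact indexing conventions are ours):

```python
import numpy as np

def find_activations(env, threshold, fs=500.0, min_dur=1.0):
    """Find intentional finger activations from the envelope's threshold
    crossings, using the two criteria above."""
    above = env > threshold
    # Crossing points: first sample after each entry into / exit from
    # the above-threshold region.
    edges = np.flatnonzero(np.diff(above.astype(int))) + 1
    intervals = []
    i = 0
    while i + 1 < len(edges):
        start, end = edges[i], edges[i + 1]
        long_enough = (end - start) >= min_dur * fs          # criterion 1
        strong_enough = env[start:end].mean() > threshold    # criterion 2
        if above[start] and long_enough and strong_enough:
            intervals.append((start, end))
            i += 2   # crossing i+1 already ends this activation
        else:
            i += 1   # not an intentional FA; try the next pair
    # The last recognized FA of each one-minute record may be cut off
    # in the middle, so it is ignored (see the note in flow chart 3).
    return intervals[:-1]
```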
Using these criteria, FAs can be recognized. Flow chart 3 illustrates this
process.
Flow Chart 3: Identification of finger activation beginnings and endings from the N
crossing points, numbered i = 1:N.
1. Is the time between two threshold crossing points (i, i+1) long enough for a finger activation, and is the average of the envelope between crossing points i and i+1 larger than the threshold?
2. If not, this is not an intentional finger activation. Set i = i+1 and go on to the next pair of crossing points (i+1, i+2).
3. If so, this is an intentional finger activation: crossing point i is its beginning and crossing point i+1 is its end. Set i = i+2, skipping the next pair of crossing points (crossing point i+1 has already been determined to be the end of a finger activation), and evaluate crossing points (i+2, i+3).
Note: in each of the one-minute measurements the last recognized finger activation
was ignored, because it may have been cut off in the middle.
3.4 Extraction of characteristic features from the finger activation
Each finger activation is stored from its beginning until 2000 samples later³. A
constant segment of the finger activation is selected for the extraction of
characteristic features. Through trial and error, the segment which yielded the best
results was found to be samples (200:800)⁴, where sample number "1" corresponds to
the first sample of the finger activation.
Each finger was activated a different number of times. Consequently, for
reasons of symmetry alone, only 30 finger activations per finger were chosen, despite
the fact that more were measured.
Two main sources of features were selected:
1. The first forty features were derived from the DFT amplitude of the chosen segment. Each finger activation interval was multiplied by a Hanning window of the same length, a DFT was performed, and the absolute value was taken. The frequency domain (0:250 Hz) was separated into 20 equal regions of 12.5 Hz each, and each region was characterized by its mean and variance values.
2. The last ten features were chosen as the autoregressive coefficients of an AR model of order 10. This order was selected after testing showed that a higher-order AR model provided coefficients that made no further contribution.
The variables in our implementation are the finger activation's characteristic
features. Each finger activation is associated with 50 features, which constitute an
"example" (i.e. N=50). Each of the N terms in a chromosome determines whether the
corresponding feature is selected for use by the KNN classifier. There are a total of
150 examples (i.e. M=150) and five labels (i.e. L=5).
³ Four seconds was determined to be the maximal duration of a finger activation.
⁴ This choice of region denies the possibility of unintentionally analyzing even part of a second finger
activation.
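A sketch of the feature computation for one stored finger activation, reusing the yule_walker_ar helper sketched in section 1.1 (note that np.array_split only approximates the 20 equal 12.5 Hz regions, since the 600-sample segment yields 301 DFT bins):

```python
import numpy as np
# yule_walker_ar: see the sketch in section 1.1.

def extract_features(fa, fs=500.0):
    """Return the 50 features of one finger activation 'fa': 20 band
    means + 20 band variances of the windowed DFT amplitude, plus AR
    coefficients 2..11 (the first coefficient, 1, is dropped)."""
    seg = fa[200:800] * np.hanning(600)   # chosen segment, Hanning window
    amp = np.abs(np.fft.rfft(seg))        # DFT amplitude, 0..250 Hz
    bands = np.array_split(amp, 20)       # ~12.5 Hz frequency regions
    means = [b.mean() for b in bands]
    variances = [b.var() for b in bands]
    a, _ = yule_walker_ar(seg, order=10)
    return np.array(means + variances + list(a[1:]))
```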
The feature matrix is an M × (N+1) matrix (illustrated in figure 9). Each
row represents an example. The first column stores the label of each example (i.e.
which finger was truly activated); each of the remaining 50 columns represents a
single feature. The first twenty features (1:20) are the means of the frequency
regions. The next twenty (21:40) are the variances of the same frequency regions.
The last ten (41:50) are the autoregressive coefficients, numbered 2 through 11 (the
first coefficient is always normalized to the value 1 and is therefore disregarded).
Figure 9: The complete feature matrix. The matrix contains a total of 50
features for each finger activation.
These fifty features pose a problem of overfitting. Since only 150 examples are
available and there are 50 features, the situation is too near the case where every
feature represents a single finger activation. In such a case no model can be
established and only the current finger activations can be recognized; hence a
classifier that uses all these features will not generalize well. It is therefore
imperative that a smaller combination of features be chosen. The search algorithm
chosen for this task is the Genetic Algorithm.
3.5 GA implementation
The process of finding the right features for classification is usually an intuitive
one. Both the identity and the number of features have to be determined. It is difficult
to view a decision space of more than three dimensions (features) at once, which
makes it very difficult to distinguish which features provide easy-to-classify
formations (such as those seen in figure 1). Moreover, the immense number of
permutations (2⁵⁰) excludes any hope of intuitive trial and error.
In order to overcome this conundrum a different approach must be taken. First,
all the features that may be helpful are calculated. Second, a search algorithm finds
among these features the ones that actually help; whichever do not are discarded.
The Genetic Algorithm implements this approach. The criterion used to judge
the "fitness" of each solution is its probability of misclassification via the KNN
classifier using only the features that the GA specified. The GA decision process is
illustrated in flow chart 4.
Flow chart 4: Implementation of a genetic algorithm for feature selection:
1. Create a population of chromosomes with a random genetic makeup.
2. For each chromosome in the population, iterate the following ten times (Num. Iterations): randomize the total set of 150 examples and classify via the KNN classifier as detailed earlier. Sum the total number of errors from the ten iterations for each chromosome; this is its "fitness", the survival criterion for each chromosome.
3. The best chromosome in the current generation is compared to the best chromosome in the entire evolution so far; if better, it becomes the best chromosome of the evolution.
4. The half of the population with the worse values of the survival criterion dies. The remaining half choose partners randomly among themselves and breed; this is the next generation.
5. If the population of chromosomes has converged, choose the chromosome which the population has converged to. If the evolution has gone on for longer than the maximum number of generations, choose the chromosome with the best fitness from the entire evolution. Otherwise, return to step 2.
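A sketch of the survival criterion computed in step 2 above, reusing knn_classify from section 1.2; the signature and split mechanics are our assumptions, following the 80/20 holdout of section 1.2 and the ten iterations of Num. Iterations:

```python
import numpy as np
# knn_classify: see the sketch in section 1.2.

rng = np.random.default_rng()

def fitness(chromosome, X, y, k=5, n_iter=10, test_frac=0.2):
    """Total number of KNN misclassifications over n_iter random holdout
    splits, using only the features whose gene value is 1."""
    cols = np.flatnonzero(chromosome)      # selected features
    errors = 0
    for _ in range(n_iter):
        perm = rng.permutation(len(y))     # randomize the 150 examples
        n_test = int(test_frac * len(y))   # 30 test, 120 training
        test, train = perm[:n_test], perm[n_test:]
        pred = knn_classify(X[train][:, cols], y[train],
                            X[test][:, cols], k=k)
        errors += int(np.sum(pred != y[test]))
    return errors
```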
3.6 Error Calculation
For each of the test set examples the nearest label is established according to
the KNN algorithm. The true label of each test set example is compared to the
estimated label; if they are not identical, an error is declared.
Note that there is no preference as to the kind of error. For instance, if the true
label is 1 and the estimated label is 2, this is an error just the same as if the true label
is 1 and the estimated label is 5.
The number of misclassification errors made by the KNN classifier is used to
calculate the criterion of survival in the Genetic Algorithm.
4 Results
The genetic algorithm searched with several different initial parameters:
1. Analyzed region of the Finger Activation (FA): Area 1, corresponding to region (200:600), and Area 2, corresponding to region (200:800).
2. Number of k nearest neighbors in the KNN classifier: k = 1, 3, 5.
3. Initial population: 20% or 50% of genes with value 1 (i.e. features selected for use in KNN classification) in the first generation.
Figures 10 and 11 illustrate the search process. In figure 10 the entire evolution
(all 1000 generations) is exhibited. In figure 11 the evolution is stopped at the
generation containing the chromosome with the best overall fitness.
Figure 10: A demonstration of the search process. After ~200 generations the
number of features selected for KNN classification wavers around 25. The initial
parameters are: Area 2; k = 5; initial population = 20%.
Figure 11: A demonstration of the search process. The figure is the same as
figure 10, but the evolution is stopped when the generation with the chromosome
with the best overall fitness is reached.
Explanation of figures 10 & 11:
The top-right panel displays the evolution of fitness (probability of
misclassification, in percent) over the generations. The blue graph is the average
fitness of the chromosome population per generation. The green graph is the fitness
of the best chromosome per generation. The red asterisk indicates the generation in
which the chromosome with the best fitness throughout the evolution appears.
Note the conspicuous separation between the blue and green graphs in the first
generations, and their convergence with time. The significance of this phenomenon is
that at first the solutions are completely random and the difference between the best
solution and the rest is relatively large. As the algorithm progresses only the better
solutions survive, and the difference between the average fitness of the chromosome
population and the best fitness in the population decreases.
The top-left panel displays the distribution of solutions in a two-dimensional
space: number of selected features versus solution fitness. Each green circle indicates
a chromosome. Notice that since the solutions tend to converge at least partially,
several chromosomes appear in the exact same position. The purpose of this panel is
to give an indication of convergence. The genetic algorithm determines that a
solution has been found when the population of solutions is 80% identical. In this
case most of the solutions are in the exact same position in the aforementioned plane
(however, just because solutions have the same fitness and the same number of
selected features does not necessarily mean they are identical!).
The bottom-left panel displays the average number of selected features (genes)
per generation over the generations. This panel denotes the progress of the search.
The bottom-right panel displays how many chromosomes use each feature in a
generation. This is an indicator of which features are being considered at each
generation.
Figures 10 & 11 are the results of one choice of the initial parameters. Table 1
and figure 12 summarize the results of all the choices.
Parameters     Num. Var. Selected    Prob. Error (%)
Area1 k1s20    25                    6.333
Area1 k1s50    27                    7
Area1 k3s20    29                    6.67
Area1 k3s50    27                    6.67
Area1 k5s20    29                    6
Area1 k5s50    37                    6.67
Area2 k1s20    26                    5
Area2 k1s50    27                    4.667
Area2 k3s20    21                    4.667
Area2 k3s50    27                    5
Area2 k5s20    25                    4.667
Area2 k5s50    30                    5

Table 1: Number of selected variables (Num. Var.) and probability of
misclassification error for each choice of initial parameters (analyzed area, k
nearest neighbors, % initial terms s).
[Figure 12: bar chart of the average percentage of selected features per type (Mean,
Variance, AR), on a scale of roughly 40% to 55%.]
Figure 12: Average percentage of selected features per type. The three
types of features were chosen relatively evenly, which indicates that all
three types are equally good predictors for the classification task.
5 Discussion
This project describes the possible use of EMG signals for operating a robotic
arm. The EMG signals are detected and then classified using a modified KNN
classifier. Features for the classifier are chosen using a genetic algorithm.
In all the search runs the genetic algorithm never converged; it always
terminated by reaching the maximum number of generations. This indicates either a
highly local solution or that the convergence percentage parameter was set too high
(80% in all the tests). In order to make use of the algorithm despite its
non-convergence, an option was added for it to remember the best overall result
throughout its iterations.
When the algorithm does not converge, the parameter that decides the number
of iterations is the 'maximum number of generations' parameter. It was set to 1000 in
all the searches. This is a compromise between more generations, which have a
higher chance of finding a better solution, and fewer generations, which use fewer
hardware resources and less calculation time. If 1000 generations seems like a high
value, a simple calculation shows that the genetic algorithm searches through only a
tiny fraction of the possibilities:

Number of possible permutations ≈ 2⁵⁰
Number of permutations examined per generation = 32
Number of generations = 1000

% of permutations examined by the GA = (32 × 1000 / 2⁵⁰) × 100 ≈ 3 × 10⁻⁹ %
Each result presented in table 1 is the best result (in the minimum-probability-
of-error sense) from a single GA search per selection of initial parameters (Area, k,
s%). The same choice of initial parameters has given probabilities of
misclassification higher by up to ~1.5%.
Despite this variance, the results show a clear distinction between Area 1 and
Area 2: Area 2's percentage of misclassification error is in general ~2% lower than
Area 1's. Area 2 was found, through trial and error, to exhibit the lowest overall
percentage of misclassification.
On the other hand, when a finger is activated the flexor muscles may be
active for less than 1.6 seconds⁵. For this eventuality another, shorter region was
chosen: Area 1, which requires only 1.2 seconds of finger activation.
In both areas the first 0.4 seconds (200 samples) were not utilized, because
their inclusion raised the probability of misclassification. Dividing the finger
activation into two kinds of regions suggests a possible explanation for this
phenomenon:
1. Transient period: the first 0.4 seconds.
2. Steady state: the rest of the FA.
As opposed to the steady state, the transient period depends on how the finger
is contracted, without any correlation to which finger was activated.
An interesting tendency is the GA's number of selected features in the best
feature combinations. Despite the different starting numbers of selected features, 10
(20% of the 50 genes) and 25 (50% of the genes), the best feature combinations
selected ~25 features. This is part of a wider trend of the GA: even when initialized at
ten selected features, after ~200 generations the GA searches in the vicinity of 25
features (bottom-left panel in figure 10) and as a direct consequence finds the best
feature combinations only in that vicinity of features.
If about 25 selected features still seems like an overfitting problem, one must
consider the method by which the probability of error is calculated. The total set of
150 FA measurements is separated into a training set of 120 and a test set of 30. This
process is repeated 10 times, and at each iteration a different training set and test set
are randomly selected. If there were a problem of overfitting (i.e. a feature
identifying a particular measurement rather than some general quality), the
probability of error would increase, because at each iteration there is a different
training set and a different test set.

⁵ Using the finger activation sensor, it was observed that the actual time interval during which a finger
presses a button is shorter than the time interval during which the flexors are active.
Notice that the choice of the number of nearest neighbors affects neither the
probability of error nor the number of selected features. Also, roughly half of each
type of feature (mean, variance, AR coefficients) is on average used for the best
results.
The first few generations of the average chromosome population (the blue
graph in the top-right panel of figure 10) indicate the fitness error when no search
algorithm is employed: if the choice of features is completely random, the
probability of misclassification error is ~25%. An intuitive approach (for example,
viewing one feature at a time and searching for 'good' separation between the labels)
yielded a probability of misclassification of ~15% (the same as the best chromosomes
in the first few generations). The GA managed to lower the probability of
misclassification to only ~5%, hence making finger identification in cases of hand
amputation realistic.
References
[1] Uchida N, Hiraiwa A, Sonehara N, Shimohara K, "EMG pattern recognition by
neural networks for multi finger control", Proceedings of the Annual International
Conference of the IEEE Engineering in Medicine and Biology Society, 1992,
pp. 1016-1018.
[2] Jackson LB, Digital Filters and Signal Processing, Second Edition, Kluwer
Academic, 1989, pp. 255-257.
[3] Duda RO, Hart PE, Stork DG, Pattern Classification, John Wiley & Sons, 2000.
[4] Whitley D, "A genetic algorithm tutorial", Statistics and Computing, vol. 4(2),
pp. 65-85, June 1994.
[5] Holland JH, "Genetic Algorithms", Scientific American, 1992, pp. 44-50.
[6] Tortora GJ, Grabowski SR, Principles of Anatomy and Physiology, 7th Edition,
HarperCollins, p. 313.