OPTIMIZING HIDDEN MARKOV MODELS USING
GENETIC ALGORITHMS AND ARTIFICIAL IMMUNE
SYSTEMS
Mohamed Korayem, Amr Badr, Ibrahim Farag
Department of Computer Science
Faculty of Computers and Information
Cairo University
ABSTRACT
Hidden Markov Models are widely used in speech
recognition and bioinformatics systems. Conventional
methods are usually used in the parameter estimation
process of Hidden Markov Models (HMM). These
methods are based on iterative procedures, such as the
Baum-Welch method, or on gradient-based methods.
However, these methods can converge to locally
optimal parameter values. In this work, we use
artificial techniques such as Artificial Immune Systems
(AIS) and Genetic Algorithms (GA) to estimate HMM
parameters. These techniques are global search
optimization techniques inspired by biological systems.
In addition, a hybrid of genetic algorithms and
artificial immune systems is used to optimize HMM
parameters.
Keywords: Artificial Immune Systems; Genetic
Algorithms; Clonal Selection Algorithm; Hybrid
Genetic Immune System; Hidden Markov Models
(HMM); Baum-Welch (BW)
1. INTRODUCTION
Hidden Markov Models (HMM) have many
applications in signal processing, pattern recognition,
and speech recognition (Rabiner, 1993).
HMM are considered a basic component of speech
recognition systems. The estimation of good model
parameters affects the performance of the recognition
process, so the values of these parameters need to be
estimated such that the recognition error is minimized.
HMM parameters are determined during an iterative
process called the "training process". One of the
conventional methods applied to set the HMM model
parameter values is the Baum-Welch algorithm. A
drawback of this method is that it converges to a local
optimum.
Global search techniques can be used to optimize
HMM parameters. In this paper, the performance of
two global optimization techniques is compared with
that of the Baum-Welch algorithm, one of the
traditional techniques used to estimate HMM
parameters. These techniques are the genetic algorithm
and the clonal selection algorithm, the latter inspired
by the artificial immune system. In addition, a hybrid
genetic-immune method is proposed to optimize HMM
parameters and is compared with the above methods.
The natural immune system uses a variety of
evolutionary and adaptive mechanisms to protect
organisms from foreign pathogens and misbehaving
cells in the body (Forrest, 1997) (De Castro, 2005).
Artificial immune systems (AIS) (Somayaji, 1998)
(Hofmeyr, 2000) seek to capture some aspects of the
natural immune system in a computational framework,
either for the purpose of modeling the natural immune
system or for solving engineering problems
(Glickman, 2005). The clonal selection algorithm,
which captures one aspect of the immune system, is
used here to optimize HMM parameters. The clonal
selection algorithm (De Castro, 2000) (De Castro,
2002) is a special kind of artificial immune system
algorithm that uses the clonal expansion principle and
affinity maturation as the main forces of the
evolutionary process (De Castro, 2002b). The genetic
algorithm is another global optimization technique
used here to optimize HMM parameters. The main
forces of its evolutionary process, the crossover and
mutation operators, can be merged with the clonal
selection principle to optimize HMM parameters;
accordingly, a hybrid genetic-immune technique is
proposed.
2. HIDDEN MARKOV MODELS (HMM)
HMM are probabilistic models useful for modeling
stochastic sequences with an underlying finite-state
structure. Stochastic sequences in speech recognition
are called observation sequences $O = o_1 o_2 \ldots o_T$,
where $T$ is the length of the sequence. An HMM with
$n$ states $(S_1, S_2, \ldots, S_n)$ can be characterized
by a set of parameters $\lambda = \{\pi, A, B\}$, where
$\pi$ is the initial distribution, describing the
probability distribution over the states at the initial
moment, with $\sum_{i=1}^{n} \pi_i = 1$ and $\pi_i \geq 0$.
$A$ is the transition probability matrix
$\{a_{ij} \mid i, j = 1, 2, \ldots, n\}$, where $a_{ij}$ is
the probability of a transition from state $i$ to state $j$,
with $\sum_{j=1}^{n} a_{ij} = 1$ and $a_{ij} \geq 0$.
$B$ is the observation matrix
$\{b_{ik} \mid i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, m\}$,
where $n$ is the number of states and $m$ is the number
of observation symbols, with $\sum_{k=1}^{m} b_{ik} = 1$
and $b_{ik} \geq 0$; $b_{ik}$ is the probability that the
observation symbol with index $k$ is emitted by
state $i$.
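For concreteness, the parameter set described above can be held in a few NumPy arrays. The following is a minimal sketch of ours, not code from the paper: it builds a small random $\lambda = \{\pi, A, B\}$ and verifies the stochastic constraints.

import numpy as np

def random_hmm(n_states, n_symbols, rng=np.random.default_rng(0)):
    # Initial distribution pi: non-negative, sums to 1.
    pi = rng.random(n_states)
    pi /= pi.sum()
    # Transition matrix A: each row sums to 1.
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    # Observation matrix B: each row sums to 1 over the m symbols.
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B

pi, A, B = random_hmm(6, 32)
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)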
The main problems of HMM are the evaluation,
decoding, and learning problems.
Evaluation problem
Given the HMM $\lambda = \{\pi, A, B\}$ and the
observation sequence $O = o_1 o_2 \ldots o_T$, compute
the probability that the model $\lambda$ has generated
the sequence $O$. This problem is usually solved by the
forward-backward algorithm (Rabiner, 1989)
(Rabiner, 1993).
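A minimal sketch of the forward pass (the first half of the forward-backward algorithm) for the evaluation problem; pi, A, B are as defined above, and O is assumed here to be a sequence of observation-symbol indices (our naming, for illustration only).

import numpy as np

def forward_likelihood(pi, A, B, O):
    # alpha_1(i) = pi_i * b_i(o_1)
    alpha = pi * B[:, O[0]]
    # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o_t in O[1:]:
        alpha = (alpha @ A) * B[:, o_t]
    # P(O | lambda) = sum_i alpha_T(i)
    return alpha.sum()

In practice, scaling or log arithmetic is needed to avoid numerical underflow on long utterances; the log likelihoods reported in section 4 would come from such a scaled computation.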
Decoding problem
Given the HMM $\lambda = \{\pi, A, B\}$ and the
observation sequence $O = o_1 o_2 \ldots o_T$, find the
most likely sequence of hidden states that produced the
observation sequence $O$. This problem is usually
handled by the Viterbi algorithm (Rabiner, 1989)
(Rabiner, 1993).
Learning problem
Given some training observation sequences
$O = o_1 o_2 \ldots o_T$ and the general structure of the
HMM (the numbers of hidden and visible states),
determine the HMM parameters
$\lambda = \{\pi, A, B\}$ that best fit the training data.
The most common solution to this problem is the
Baum-Welch algorithm (Rabiner, 1989)
(Rabiner, 1993), which is considered the traditional
method for training HMM.
3. OPTIMIZING HMM TRAINING
The third problem, learning, is solved here by using
three global optimization techniques, and the results of
these techniques are compared with those of the
traditional technique.

Figure 1: Six-state left-right HMM model
In this paper, a six-state left-right HMM model is used,
as shown in figure 1. The HMM parameters are
estimated using the three global optimization
techniques mentioned above and the traditional
method. The speech vectors are vector-quantized into a
codebook of size 32. The transition matrix A is a 6 x 6
matrix, and the observation matrix B is of size 6 x 32.
According to this left-right configuration (see figure 1),
some transitions of the matrix are constantly zero.
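One way (ours, not necessarily the authors') to keep those transitions at zero during training is a binary mask that is applied after every update and followed by row renormalization. The sketch below assumes each state may only loop or advance one state; the exact allowed jumps depend on figure 1.

import numpy as np

n = 6
# Allow only self-loops (i -> i) and single forward steps (i -> i+1);
# all other entries of A stay fixed at zero.
mask = np.triu(np.ones((n, n)), k=0) * np.tril(np.ones((n, n)), k=1)

def apply_lr_constraint(A, mask):
    A = A * mask                              # zero the forbidden transitions
    return A / A.sum(axis=1, keepdims=True)   # restore row-stochasticity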
3.1 Genetic Algorithm (GA)
The genetic algorithm is a robust general purpose
optimization technique, which evolves a population of
solutions (Goldberg,1989).
GA is a search technique that has a representation of
the problem states and also has a set of operations to
move through the search space. The states in the GA
are represented using a set of chromosomes. Each
chromosome represents a candidate solution to the
problem. The set of candidate solutions forms a
population. In essence, the GA evolves this population
across generations in the hope of reaching a good
solution to the problem. Members (candidate solutions)
of the population are improved from generation to
generation through a set of operations that the GA
applies during the search process. The GA has three
basic operations for expanding a candidate solution
into other candidate solutions:
• Selection: An objective function (called the fitness
function) is used to assess the quality of each solution,
and the fittest solutions of each generation are kept.
• Crossover: This operation generates new solutions
from a set of selected members of the current
population (the outcome of the selection operation).
Crossover exchanges genetic material between two
single-chromosome parents.
• Mutation: Biological organisms are often subject to
sudden, unexpected changes in their chromosomes.
Such changes are simulated in the GA by the mutation
operation, which offers a clever way to escape the local
optima traps into which state-space search algorithms
may fall. In this operation, some values of a
chromosome are changed by adding random values to
the current values, which produces a different solution.
Genetic algorithm pseudo code:
1. Generate an initial random population of
chromosomes.
2. Compute the fitness of each chromosome in the
current population.
3. Form an intermediate population from the current
population using the reproduction operator.
4. From the intermediate population, generate a new
population by applying the crossover and mutation
operators.
5. If a member of the population satisfies the
requirements, stop; otherwise go to step 2.
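The pseudo code above maps to the following minimal Python sketch. It is an illustration of ours, not the authors' implementation: `fitness` and `init_chromosome` are assumed callables, the parameter values are placeholders, and the averaging crossover here simplifies the two-child arithmetic crossover described later in this section.

import numpy as np

def genetic_algorithm(fitness, init_chromosome, pop_size=30, generations=1000,
                      p_cross=0.8, sigma=0.05, rng=np.random.default_rng(1)):
    pop = [init_chromosome(rng) for _ in range(pop_size)]          # step 1
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])               # step 2
        ranked = [pop[i] for i in np.argsort(scores)[::-1]]
        parents = ranked[:pop_size // 2]                           # step 3: keep the fittest
        children = []
        while len(children) < pop_size - len(parents):             # step 4
            p1, p2 = rng.choice(len(parents), 2, replace=False)
            c = (parents[p1] + parents[p2]) / 2 if rng.random() < p_cross \
                else parents[p1].copy()
            c += rng.normal(0, sigma, c.shape)                     # creeping (Gaussian) mutation
            children.append(np.clip(c, 1e-6, 1.0))                 # stay in the valid range
        pop = parents + children
    return max(pop, key=fitness)          # step 5: fixed budget as the stop criterion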
In this work, the GA is applied to estimate the HMM
model parameters. The parameter estimation problem
is represented as shown in figure 2: each member
(chromosome) of the generation represents the A
matrix and the B matrix jointly. Each row of the A
matrix is encoded into an array, and the arrays are
concatenated into a single array in which the first row
is followed by the second row, then the third row, and
so on. Then, the B matrix is encoded row by row in the
same way, as sketched below.
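A sketch of this row-by-row encoding: A and B are flattened and concatenated into one real-valued chromosome, then decoded back with renormalization. The function names and the renormalization repair step are our assumptions.

import numpy as np

def encode(A, B):
    # Rows of A first (row 1, row 2, ...), then rows of B, in one flat array.
    return np.concatenate([A.ravel(), B.ravel()])

def decode(chrom, n=6, m=32):
    A = chrom[:n * n].reshape(n, n)
    B = chrom[n * n:].reshape(n, m)
    # Repair the stochastic constraints after crossover/mutation.
    A = A / A.sum(axis=1, keepdims=True)
    B = B / B.sum(axis=1, keepdims=True)
    return A, B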
Figure 2: Representation of a chromosome

Members of a given generation are selected for the
reproduction phase. The fitness function used depends
on the average of the log likelihoods over all utterances
of a word, as described in section 4. Arithmetic
crossover is applied to the population: it generates two
children from two parents, where the values of one
child are set to the averages of the parents' values and
the values of the other child are set using the
expression (3*p1 - p2)/2, with p1 the first parent and
p2 the second. When arithmetic crossover is applied,
the resulting values must remain within the permitted
range of each parameter.
For the mutation operation, the following method is
applied: since the chromosomes consist of real values,
the creeping operator is used as the mutation operator;
it adds randomly generated Gaussian values to the
original values. The resulting values must remain
within the defined limits.
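The arithmetic crossover and creeping mutation described above can be sketched as follows; the [0, 1] clipping bounds are our assumption for keeping the encoded probabilities within their permitted range.

import numpy as np

def arithmetic_crossover(p1, p2):
    child1 = (p1 + p2) / 2          # first child: average of the parents
    child2 = (3 * p1 - p2) / 2      # second child, as in the text
    # Keep the resulting values inside the permitted parameter range.
    return np.clip(child1, 0.0, 1.0), np.clip(child2, 0.0, 1.0)

def creeping_mutation(chrom, sigma=0.01, rng=np.random.default_rng(2)):
    # Creeping operator: add Gaussian noise to the current values.
    return np.clip(chrom + rng.normal(0.0, sigma, chrom.shape), 0.0, 1.0)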
3.2 Clonal Selection Algorithm
Artificial immune systems (AIS) are adaptive systems,
inspired by theoretical immunology and by observed
immune functions, principles, and models, that are
applied to problem solving (De Castro, 2002c). The
clonal selection algorithms are a special kind of
immune algorithm that uses clonal expansion and
affinity maturation as the main forces of the
evolutionary process.
The clonal selection algorithm is described as follows:
1. Generate the initial antibodies (each antibody
represents a solution; in our case, the HMM
parameters, i.e., the A and B matrices).
2. Compute the fitness of each antibody. The fitness
function used computes the average log probability
over the training data.
3. Select the antibodies from the population that will
be used to generate new antibodies (the selection can
be random or according to fitness rank). The antibodies
with the highest fitness are selected such that they are
sufficiently different, as described later.
4. For each selected antibody, generate clones and
mutate each clone according to its fitness.
5. Delete the antibodies with the lowest fitness from
the population, then add the new antibodies to the
population.
6. Repeat steps 2-5 until the stop criterion is met; the
number of iterations can be used as the stop criterion.
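Steps 1-6 correspond roughly to the following sketch of ours; the clone count and the 10% replacement rate follow the description later in this section, the mutation scale anticipates equation (3) below in simplified form, and the diversity check of equation (2) is omitted for brevity.

import numpy as np

def clonal_selection(fitness, init_antibody, pop_size=30, n_select=10,
                     n_clones=5, iterations=1000, rng=np.random.default_rng(3)):
    pop = [init_antibody(rng) for _ in range(pop_size)]            # step 1
    for _ in range(iterations):                                    # step 6: iteration budget
        pop.sort(key=fitness, reverse=True)                        # step 2: rank by fitness
        for ab in pop[:n_select]:                                  # step 3: fittest antibodies
            alpha = np.exp(-fitness(ab))                           # step 4: scale shrinks with fitness
            clones = [np.clip(ab + alpha * rng.normal(0, 0.01, ab.shape), 0, 1)
                      for _ in range(n_clones)]
            pop.append(max(clones, key=fitness))
        n_new = max(1, pop_size // 10)                             # step 5: drop weakest, add ~10% fresh
        pop.sort(key=fitness, reverse=True)
        pop = pop[:pop_size - n_new] + [init_antibody(rng) for _ in range(n_new)]
    return max(pop, key=fitness)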
Antibodies represent the parameters of the HMM, and
each antibody is a candidate solution. Each member
(antibody) of the generation represents the A matrix
and the B matrix jointly, like a chromosome in the GA
(see figure 2).
The fitness value of each antibody is computed as

$F = \frac{1}{\left( \sum_{i=1}^{n} L_i \right) / n}$ ,   (1)

where $L_i$ is the log likelihood of utterance $i$ and
$n$ is the number of utterances for the word.
Selection in the clonal selection algorithm depends on
the fitness value of each antibody; the antibodies with
the highest fitness are selected such that they are
sufficiently different, meaning that the Euclidean
distance between any two antibodies must be greater
than a threshold. The Euclidean distance is used to
measure the difference between two antibodies $Ab1$
and $Ab2$ of length $L$:

$D = \sqrt{\sum_{i=1}^{L} (Ab1_i - Ab2_i)^2}$ .   (2)
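Equations (1) and (2) translate directly into code; in this sketch the function names are ours, and the log likelihoods would come from a scaled forward pass like the one sketched in section 2.

import numpy as np

def antibody_fitness(log_likelihoods):
    # Equation (1): reciprocal of the average log likelihood
    # over a word's training utterances.
    return 1.0 / (np.sum(log_likelihoods) / len(log_likelihoods))

def diverse_enough(ab1, ab2, threshold):
    # Equation (2): antibodies count as different only if their
    # Euclidean distance exceeds the threshold.
    return np.sqrt(np.sum((ab1 - ab2) ** 2)) > threshold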
A fixed number of clones is generated for every
antibody. In each cycle of the algorithm, some new
antibodies are added to the population; the percentage
of these new antibodies is 10% of the population size.
For mutation, a value is added to each element of the
antibody; this value is generated as
($\alpha$ * Gaussian value), where $\alpha$ is computed
according to the following equation:

$\alpha = \frac{1}{\rho} e^{-F}$ ,   (3)

where $F$ is the fitness value of the antibody and
$\rho$ is a decaying factor.
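Equation (3) gives the mutation scale; a minimal sketch follows, in which the value of $\rho$ and the unit Gaussian spread are our assumptions.

import numpy as np

def mutate_antibody(ab, F, rho=1.0, rng=np.random.default_rng(4)):
    alpha = (1.0 / rho) * np.exp(-F)              # equation (3)
    # Add (alpha * Gaussian value) to every element of the antibody.
    return ab + alpha * rng.normal(size=ab.shape)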
3.3 Hybrid Genetic-Immune System Method
The proposed hybrid method draws on both genetic
algorithms and the immune system. The main forces of
the evolutionary process in the GA are the crossover
and mutation operators. For the clonal selection
algorithm, the main force of the evolutionary process is
the idea of clonal selection, in which new clones are
generated and mutated, the best of these clones are
added to the population, and newly generated members
are added as well. The hybrid method takes the main
evolutionary force of each of the two systems.
The hybrid method is described as follows:
1. Generate the initial population (candidate solutions).
2. Select the N best items from the population.
3. For each selected item, generate a number of clones
(Nc) and mutate each of the Nc clones.
4. Select the best mutated item from each group of
clones and add it to the population.
5. Select from the population the items to which
crossover will be applied; we select them randomly in
our system, but any selection method can be used.
6. Apply crossover and add the new items (the items
after crossover) to the population by replacing the
low-fitness items with the new ones.
7. Add a group of newly generated random items to the
population.
8. Repeat steps 2-7 until the stopping criterion is met.
Steps 2-5 are repeated a number of times before each
new group of randomly generated items is added, as in
the sketch below.
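A compact sketch of ours following steps 1-8, reusing arithmetic_crossover and creeping_mutation from the sketch in section 3.1; the population size, N, Nc, and the iteration counts are illustrative placeholders.

import numpy as np

def hybrid_genetic_immune(fitness, init_item, pop_size=30, n_best=10, n_clones=5,
                          outer_iters=100, inner_iters=10,
                          rng=np.random.default_rng(5)):
    pop = [init_item(rng) for _ in range(pop_size)]                 # step 1
    for _ in range(outer_iters):                                    # step 8: fixed budget
        for _ in range(inner_iters):                                # steps 2-6 repeated before step 7
            pop.sort(key=fitness, reverse=True)                     # step 2: N best items first
            for item in pop[:n_best]:                               # steps 3-4: clone, mutate, keep best
                clones = [creeping_mutation(item, rng=rng) for _ in range(n_clones)]
                pop.append(max(clones, key=fitness))
            i, j = rng.choice(len(pop), 2, replace=False)           # step 5: random parent selection
            c1, c2 = arithmetic_crossover(pop[i], pop[j])           # step 6
            pop.sort(key=fitness, reverse=True)
            pop = pop[:pop_size - 2] + [c1, c2]                     # children replace the weakest
        n_new = max(1, pop_size // 10)                              # step 7: inject fresh random items
        pop = pop[:pop_size - n_new] + [init_item(rng) for _ in range(n_new)]
    return max(pop, key=fitness)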
4. EXPERIMENTS
Dataset description
The data used were recorded for a speech recognition
task: 30 samples were collected for each of 9 words,
the digits from 1 to 9 spoken in Arabic. As is standard
procedure in evaluating machine learning techniques,
the dataset is split into a training set and a test set. The
training set is composed of 15 x 9 utterances, and the
same size is used for the test set. HMM models are
trained using the above three methods, and the
performance of each model is then measured on the
test set. Models are compared according to the average
log likelihood over all utterances of each word. In
addition, an HMM model is trained using the
traditional method (the Baum-Welch algorithm). The
results are reported in table 1.
The objective of these experiments is to determine
which of the four methods yields the better model in
terms of the maximum likelihood estimation (MLE)
over the training and testing data.
5. RESULTS
Table 1: Average log likelihood for the Genetic
Algorithm, Clonal Selection, and the Hybrid
Genetic-Immune method vs. Baum-Welch, with
training-data and testing-data columns per method, for
words 1-9. [The individual cell values were scrambled
in extraction and are not recoverable.]
Table 1 shows the average log likelihood for each word
resulting from applying the GA, clonal selection,
hybrid genetic-immune, and Baum-Welch algorithms.
Figures 3 and 4 present a comparison of the four
techniques.
• GA, Clonal Selection, and the Hybrid Method vs.
Baum-Welch
It is clear from table 1 that the GA, the clonal selection
algorithm, and the hybrid method optimize the HMM
parameters better than the Baum-Welch algorithm for
all words: they achieve higher likelihoods over both
the training data (see figure 3) and the testing data (see
figure 4). We note that for experiment eight
Baum-Welch, the GA, and the clonal selection
algorithm yield almost the same results, while the
hybrid method gives better results.
• GA vs. Clonal Selection Algorithm
The immune clonal selection gives better results than
the GA for all words. We also note that for experiments
one, four, and eight the two algorithms yield almost the
same results. Figures 5a and 5b show that the fitness
reached by clonal selection is better than that of the
genetic algorithm, which yields better results for
optimizing the HMM parameters; the figures also show
that the clonal selection fitness increases faster than the
genetic algorithm's, especially in the early iterations.
• Hybrid Method vs. GA and Clonal Selection
Algorithm
We note from the results above that the hybrid method
gives better results than the GA and the clonal
selection algorithm in all experiments, over both the
training data and the testing data. It is also clear from
figures 5a, b, and c that the fitness of the hybrid
method is higher at every point of the run than that of
the genetic algorithm and the clonal selection
algorithm.
Figure 3: Log likelihood over the training data for
words 1-9 (Baum-Welch, Genetic Algorithms, Clonal
Selection, Hybrid Method)

Figure 4: Log likelihood over the testing data for
words 1-9 (Baum-Welch, Genetic Algorithms, Clonal
Selection, Hybrid Method)
Figure 5 a, b, and c: Fitness function of the clonal
selection algorithm, the genetic algorithm, and the
hybrid method for word 9 (fitness vs. iteration, up to
1000 iterations)
CONCLUSION
In this paper, we presented the results of using the
genetic algorithm and the clonal selection algorithm to
optimize HMM parameters. We also proposed a hybrid
genetic-immune algorithm for optimizing HMM
parameters. It takes into account the main immune
aspects: selection and cloning of the most stimulated
cells, death of non-stimulated cells, affinity maturation
and reselection of the clones with higher affinity, and
generation and maintenance of diversity. It also takes
into account the main forces of the evolutionary
process in the GA, namely the crossover and mutation
operators. The results show that the global
optimization techniques used here produce better
results than the traditional Baum-Welch algorithm;
moreover, the proposed hybrid algorithm produced the
best results of all tested techniques. The global search
algorithms generate better results because they do not
become trapped in local optima the way the
Baum-Welch algorithm does.
REFERENCES
De Castro, L. N. & Von Zuben, F. J. (2000) "The
Clonal Selection Algorithm with Engineering
Applications", GECCO'00 - Workshop Proceedings,
pp. 36-37.
De Castro, L. N. & Timmis, J. (2002) "Artificial
Immune Systems: A New Computational Approach",
Springer-Verlag New York, Inc.
De Castro, L. N. & Von Zuben, F. J. (2002b)
"Learning and Optimization Using the Clonal Selection
Principle", IEEE Transactions on Evolutionary
Computation, Special Issue on Artificial Immune
Systems, 6(3), pp. 239-251.
De Castro, L. N. & Timmis, J. (2002c) "Artificial
Immune Systems: A Novel Paradigm to Pattern
Recognition", in Artificial Neural Networks in Pattern
Recognition, J. M. Corchado, L. Alonso, and C. Fyfe
(eds.), SOCO-2002, University of Paisley, UK,
pp. 67-84.
De Castro, L. N. & Von Zuben, F. J. (2005) Recent
Developments in Biologically Inspired Computing,
Idea Group Inc. (IGI) Publishing.
Forrest, S., Hofmeyr, S., & Somayaji, A. (1997)
"Computer Immunology", Communications of the
ACM, 40(10), pp. 88-96.
Goldberg, D. E. (1989) Genetic Algorithms in Search,
Optimization & Machine Learning, Addison-Wesley.
Glickman, M., Balthrop, J., & Forrest, S. (2005) "A
Machine Learning Evaluation of an Artificial Immune
System", Evolutionary Computation Journal, 13(2),
pp. 179-212.
Hofmeyr, S. & Forrest, S. (2000) "Architecture for an
Artificial Immune System", Evolutionary Computation,
7(1), Morgan Kaufmann, San Francisco, CA,
pp. 1289-1296.
Rabiner, L. & Juang, B. (1993) Fundamentals of
Speech Recognition, Prentice-Hall, Englewood Cliffs,
NJ.
Rabiner, L. R. (1989) "A Tutorial on Hidden Markov
Models and Selected Applications in Speech
Recognition", Proceedings of the IEEE, 77(2),
pp. 257-286.
Somayaji, A., Hofmeyr, S., & Forrest, S. (1998)
"Principles of a Computer Immune System", New
Security Paradigms Workshop, ACM, pp. 75-82.