OPTIMIZING HIDDEN MARKOV MODELS USING GENETIC ALGORITHMS AND ARTIFICIAL IMMUNE SYSTEMS

Mohamed Korayem, Amr Badr, Ibrahim Farag
Department of Computer Science, Faculty of Computers and Information, Cairo University

ABSTRACT
Hidden Markov Models (HMM) are widely used in speech recognition and bioinformatics systems. The parameters of an HMM are usually estimated with conventional methods based on iterative procedures, such as the Baum-Welch method, or on gradient-based methods. However, these methods can converge to locally optimal parameter values. In this work, we use artificial techniques, namely Artificial Immune Systems (AIS) and Genetic Algorithms (GA), to estimate HMM parameters. These are global search optimization techniques inspired by biological systems. A hybrid of genetic algorithms and artificial immune systems is also used to optimize HMM parameters.

Keywords: Artificial Immune Systems; Genetic Algorithms; Clonal Selection Algorithm; Hybrid Genetic Immune System; Hidden Markov Models (HMM); Baum-Welch (BW)

1. INTRODUCTION
Hidden Markov Models (HMM) have many applications in signal processing, pattern recognition, and speech recognition (Rabiner, 1993). The HMM is a basic component of speech recognition systems. The quality of the estimated model parameters affects the performance of the recognition process, so these parameters must be estimated such that the recognition error is minimized. HMM parameters are determined during an iterative process called the "training process". One of the conventional methods for setting the values of the HMM parameters is the Baum-Welch algorithm; a drawback of this method is that it converges to a local optimum. Global search techniques can instead be used to optimize HMM parameters. In this paper, the performance of two global optimization techniques is compared with the Baum-Welch algorithm, one of the traditional techniques used to estimate HMM parameters. These techniques are Genetic Algorithms and the Clonal Selection Algorithm, the latter inspired by the artificial immune system. A hybrid genetic-immune method is also proposed to optimize HMM parameters and is compared with the above methods.

The natural immune system uses a variety of evolutionary and adaptive mechanisms to protect organisms from foreign pathogens and misbehaving cells in the body (Forrest, 1997) (De Castro, 2005). Artificial immune systems (AIS) (Somayaji, 1998) (Hofmeyr, 2000) seek to capture some aspects of the natural immune system in a computational framework, either for the purpose of modeling the natural immune system or for solving engineering problems (Glickman, 2005). The clonal selection algorithm (De Castro, 2000) (De Castro, 2002), a special kind of artificial immune system algorithm that uses the clonal expansion principle and affinity maturation as the main forces of the evolutionary process (De Castro, 2002b), is used here to optimize HMM parameters. The Genetic Algorithm is another global optimization technique applied to the same task. Its main evolutionary forces, the crossover and mutation operators, can be merged with the clonal selection principle to optimize HMM parameters; on this basis, a hybrid genetic-immune technique is proposed.

2. HIDDEN MARKOV MODELS (HMM)
HMM are probabilistic models useful for modeling stochastic sequences with an underlying finite state structure.
Stochastic sequences in speech recognition are called observation sequences O = o_1 o_2 ... o_T, where T is the length of the sequence. An HMM with n states (S_1, S_2, ..., S_n) can be characterized by a set of parameters {π, A, B}, where:

π is the initial distribution probability, describing the probability distribution over the states at the initial moment, with $\sum_{i=1}^{n} \pi_i = 1$ and $\pi_i \ge 0$.

A is the transition probability matrix {a_ij | i, j = 1, 2, ..., n}, where a_ij is the probability of a transition from state i to state j, with $\sum_{j=1}^{n} a_{ij} = 1$ and $a_{ij} \ge 0$.

B is the observation matrix {b_ik | i = 1, 2, ..., n; k = 1, 2, ..., m}, where n is the number of states and m is the number of observation symbols; b_ik is the probability that the observation symbol with index k is emitted by state i, with $\sum_{k=1}^{m} b_{ik} = 1$ and $b_{ik} \ge 0$.

The main problems of HMM are the evaluation, decoding, and learning problems.

Evaluation problem: Given the HMM {π, A, B} and the observation sequence O = o_1 o_2 ... o_T, calculate the probability that the model generated the sequence O. This problem is usually solved by the Forward-Backward algorithm (Rabiner, 1989) (Rabiner, 1993).

Decoding problem: Given the HMM {π, A, B} and the observation sequence O = o_1 o_2 ... o_T, calculate the most likely sequence of hidden states that produced the observation sequence O. This problem is usually handled by the Viterbi algorithm (Rabiner, 1989) (Rabiner, 1993).

Learning problem: Given some training observation sequences O = o_1 o_2 ... o_T and the general structure of the HMM (the numbers of hidden and visible states), determine the HMM parameters {π, A, B} that best fit the training data. The most common solution to this problem is the Baum-Welch algorithm (Rabiner, 1989) (Rabiner, 1993), which is considered the traditional method for training HMM. A sketch of the forward recursion used to solve the evaluation problem is given below.
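To make the evaluation problem concrete, the following is a minimal Python sketch of the forward pass with the usual rescaling trick. It is a sketch only: the function name `forward_log_likelihood` and the plain-list representation of {π, A, B} are illustrative choices, not part of the paper's system.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Return log P(O | pi, A, B) for a discrete-observation HMM."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(1, len(obs)):
        # Rescale alpha and accumulate the log of the scaling factor,
        # so that long sequences do not underflow.
        c = sum(alpha)
        log_prob += math.log(c)
        alpha = [a / c for a in alpha]
        # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    # Termination: P(O) is the sum of the final alphas times all the scales.
    return log_prob + math.log(sum(alpha))
```

The fitness functions used later in the paper average such per-utterance log likelihoods over all utterances of a word (see equation (1) in section 3.2).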
3. OPTIMIZING HMM TRAINING

The third problem, learning, is solved here by using three global optimization techniques, and the results of these techniques are compared with the traditional technique. In this paper, a six-state left-right HMM model is used, as shown in figure 1.

Figure 1: Six-state left-right HMM model

HMM parameters are estimated using the three global optimization techniques mentioned above and the traditional method. The speech vectors are vector quantized into a codebook with a size of 32. The transition matrix A is a 6 x 6 matrix and the observation matrix B is of size 6 x 32. According to this configuration (the left-right HMM model of figure 1), some transitions of the matrix A are constantly zero.

3.1 Genetic Algorithm (GA)

The genetic algorithm is a robust, general-purpose optimization technique that evolves a population of solutions (Goldberg, 1989). The GA is a search technique that has a representation of the problem states and a set of operations to move through the search space. The states in the GA are represented by a set of chromosomes; each chromosome represents a candidate solution to the problem, and the set of candidate solutions forms a population. In essence, the GA produces successive generations of this population, aiming to reach a good solution to the problem. The members (candidate solutions) of the population are improved across generations through a set of operations that the GA uses during the search process. The GA has three basic operations to expand a candidate solution into other candidate solutions:

Selection: An objective function (called the fitness function) is used to assess the quality of each solution, and the fittest solutions from each generation are kept.

Crossover: This operation generates new solutions from a set of selected members of the current population (the outcome of the selection operation), exchanging genetic material between two single-chromosome parents.

Mutation: Biological organisms are often subject to sudden, unexpected changes in their chromosomes. Such a sudden change is simulated in the GA by the mutation operation, which is a useful way to escape the local-optimum traps into which state-space search algorithms may fall. In this operation, some values of a chromosome are changed by adding random values to the current values; this changes the member's values and hence produces a different solution.

Genetic algorithm pseudocode:
1. Generate an initial random population of chromosomes.
2. Compute the fitness of each chromosome in the current population.
3. Make an intermediate population from the current population using the reproduction operator.
4. Using the intermediate population, generate a new population by applying the crossover and mutation operators.
5. If a member of the population satisfies the requirements, stop; otherwise go to step 2.

In this work, the GA is applied to estimate the HMM model parameters. The parameter estimation problem is represented as shown in figure 2: each member (chromosome) of the generation represents the A matrix and the B matrix jointly. Each row of the A matrix is encoded into an array, and all the arrays are concatenated to constitute one array, where the first row is followed by the second row, then the third row, and so on. Then the B matrix is encoded row by row in the same way.

Figure 2: Representation of a chromosome

The members of a given generation are selected for the reproduction phase. The fitness function used depends on the average of the log likelihoods of all utterances of a word, as described in section 4. The arithmetic crossover is applied to the population: it generates two children from two parents, where the values of one child are set to the averages of the parents' values and the values of the other child are set by the equation (3*p1 - p2)/2, where p1 is the first parent and p2 is the second parent. When applying the arithmetic crossover, the resulting values must remain within the allowed range of each parameter.

For the mutation operation, since the chromosomes consist of real values, the creeping operator is used as the mutation operator: it adds randomly generated Gaussian values to the original values. The resulting values must be within the defined limits. A sketch of this representation and these operators follows.
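The following is a minimal Python sketch of the encoding and the two operators just described. The helper names (`encode`, `arithmetic_crossover`, `creep_mutate`), the [0, 1] clipping bounds, and the mutation defaults are illustrative assumptions; the paper does not specify them.

```python
import random

def encode(A, B):
    """Flatten A (6 x 6) and then B (6 x 32) row by row into one chromosome."""
    return [v for row in A for v in row] + [v for row in B for v in row]

def arithmetic_crossover(p1, p2):
    """One child is the parents' average; the other uses (3*p1 - p2)/2."""
    clip = lambda v: min(max(v, 0.0), 1.0)  # keep each gene in a valid range
    c1 = [clip((x + y) / 2.0) for x, y in zip(p1, p2)]
    c2 = [clip((3.0 * x - y) / 2.0) for x, y in zip(p1, p2)]
    return c1, c2

def creep_mutate(chrom, sigma=0.01, rate=0.1):
    """Creeping mutation: add small Gaussian noise to a fraction of the genes."""
    return [min(max(g + random.gauss(0.0, sigma), 0.0), 1.0)
            if random.random() < rate else g
            for g in chrom]
```

Because clipping alone does not preserve the row-stochastic constraints on A and B, each decoded row would in practice be renormalized to sum to one before the fitness is evaluated; the paper leaves this step implicit.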
3.2 Clonal Selection Algorithm

Artificial immune systems (AIS) are adaptive systems, inspired by theoretical immunology and by observed immune functions, principles, and models, that are applied to problem solving (De Castro, 2002c). Clonal selection algorithms are a special kind of immune algorithm that uses clonal expansion and affinity maturation as the main forces of the evolutionary process. The clonal selection algorithm is described as follows:
1. Generate initial antibodies (each antibody represents a solution; in our case, the parameters of the HMM, i.e., the A and B matrices).
2. Compute the fitness of each antibody. The fitness function used computes the average log probability over the training data.
3. Select from the population the antibodies that will be used to generate new antibodies (the selection can be random or according to fitness rank). The antibodies with the highest fitness are selected such that they are different enough, as described below.
4. For each selected antibody, generate clones and mutate each clone according to its fitness.
5. Delete the antibodies with the lowest fitness from the population, then add the new antibodies to the population.
6. Repeat steps 2-5 until a stop criterion is met. The number of iterations can be used as the stop criterion.

Antibodies represent the parameters of the HMM; each antibody is a candidate solution that represents the A matrix and the B matrix jointly, like a chromosome in the GA (see figure 2). The fitness value of each antibody is computed as

$F = \frac{1}{\left| \left( \sum_{i=1}^{n} L_i \right) / n \right|}$,  (1)

where L_i is the log likelihood of utterance i and n is the number of utterances for the word; since the log likelihoods are negative, F is positive and increases as the average likelihood improves.

Selection in the clonal selection algorithm depends on the fitness value of each antibody. The antibodies with the highest fitness are selected such that they are different enough: the Euclidean distance between any two selected antibodies must be greater than a threshold. The Euclidean distance is used to measure the difference between two antibodies:

$D = \sqrt{\sum_{i=1}^{L} (Ab1_i - Ab2_i)^2}$,  (2)

where L is the length of an antibody.

For all antibodies, a fixed number of clones is generated. In each cycle of the algorithm, some new antibodies are added to the population; the percentage of these new antibodies is equal to 10% of the population size. For mutation, a value is added to each element of the antibody; this value is generated as (α × Gaussian value), where α is computed according to

$\alpha = \frac{1}{\beta} e^{-F}$,  (3)

where F is the fitness value of the antibody and β is a decaying factor.

3.3 Hybrid Genetic-Immune System Method

The proposed hybrid method depends on genetic algorithms and the immune system. The main forces of the evolutionary process in the GA are the crossover and mutation operators. For the clonal selection algorithm, the main force of the evolutionary process is the idea of clonal selection, in which new clones are generated and mutated, the best of these clones are added to the population, and newly generated members are also added to the population. The hybrid method takes the main evolutionary forces of the two systems and is described as follows (a sketch of one generation is given after this list):
1. Generate the initial population (candidate solutions).
2. Select the N best items from the population.
3. For each selected item, generate a number of clones (Nc) and mutate each of the Nc clones.
4. Select the best mutated item from each group of Nc clones and add it to the population.
5. Select from the population the items to which the crossover will be applied. We select them randomly in our system, but any selection method can be used.
6. After selection, apply the crossover and add the new items (the items after crossover) to the population by replacing the low-fitness items with the new ones.
7. Add a group of newly generated random items to the population.
8. Repeat steps 2-7 until the stopping criterion is met. Steps 2-5 are repeated a number of times before a new group of randomly generated items is added.
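The following is a minimal Python sketch of one generation of this hybrid loop (steps 2-7 above), under stated assumptions: `fitness` is assumed to implement equation (1), `new_random` is an assumed factory for random chromosomes, only the averaging child of the arithmetic crossover is shown, and the parameter defaults are illustrative rather than the paper's settings.

```python
import math
import random

def hybrid_generation(pop, fitness, new_random, N=5, n_clones=10, beta=100.0,
                      n_random=4):
    """One generation of the hybrid genetic-immune loop (steps 2-7)."""
    pop = sorted(pop, key=fitness, reverse=True)
    new_items = []
    # Immune part (steps 2-4): clone each of the N best items and mutate the
    # clones with a rate that decays with fitness, as in equation (3).
    for item in pop[:N]:
        alpha = (1.0 / beta) * math.exp(-fitness(item))
        clones = [[min(max(g + alpha * random.gauss(0.0, 1.0), 0.0), 1.0)
                   for g in item] for _ in range(n_clones)]
        new_items.append(max(clones, key=fitness))
    # GA part (steps 5-6): arithmetic crossover on randomly selected parents
    # (only the averaging child is shown here).
    p1, p2 = random.sample(pop, 2)
    new_items.append([(x + y) / 2.0 for x, y in zip(p1, p2)])
    # Step 7: inject newly generated random items to maintain diversity.
    new_items.extend(new_random() for _ in range(n_random))
    # Replace the lowest-fitness members so the population size stays constant.
    return pop[:len(pop) - len(new_items)] + new_items
```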
4. EXPERIMENTS

Dataset description: The data used was recorded for a speech recognition task. Thirty samples of each of 9 words were collected; these words represent the digits from 1 to 9 spoken in the Arabic language.

As a standard procedure in evaluating machine learning techniques, the dataset is split into a training set and a test set. The training set is composed of 15 x 9 utterances, and the same size is used for the test set. HMM models are trained using the three methods above, and the performance of each model is then tested on the test dataset. Models are compared according to the average log likelihood over all utterances of each word. Moreover, an HMM model is trained using the traditional method (the Baum-Welch algorithm). The results are reported in table 1. The objective of these experiments is to determine which of the four methods yields the better model in terms of the maximum likelihood estimation (MLE) of the training and testing data.

5. RESULTS

Table 1: Average log likelihood for Genetic Algorithms, Clonal Selection, and the Hybrid Method vs. Baum-Welch (each cell: Training data, Testing data)

Experiment | Genetic Algorithms | Clonal Selection | Hybrid Genetic Immune | Baum-Welch
Word 1 | -87.678826, -122.846582 | -87.571879, -120.456613 | -79.952704, -112.972627 | -100.256415, -132.043827
Word 2 | -112.853853, -129.668485 | -110.821281, -127.068196 | -109.478160, -124.280939 | -126.857669, -144.217638
Word 3 | -120.233247, -136.341848 | -117.629193, -135.939801 | -105.532062, -116.882284 | -127.729650, -144.002114
Word 4 | -99.422550, -106.937365 | -97.869824, -107.990526 | -90.419239, -98.660715 | -100.071147, -109.651458
Word 5 | -114.924569, -128.168076 | -111.930453, -127.026957 | -100.729004, -120.318206 | -119.703586, -135.706757
Word 6 | -113.111271, -135.820565 | -105.539754, -123.945254 | -97.301037, -118.338618 | -114.680868, -139.742821
Word 7 | -103.412966, -112.493737 | -100.211900, -106.360118 | -94.428044, -99.285726 | -109.174607, -120.044928
Word 8 | -91.442692, -121.272038 | -91.163196, -122.357082 | -80.717414, -111.834818 | -92.313749, -123.730211
Word 9 | -87.677962, -112.124334 | -82.390500, -105.931426 | -76.158748, -98.281587 | -95.080516, -117.038164

Table 1 shows the average log likelihood for each word resulting from applying the GA, clonal selection, hybrid genetic-immune, and Baum-Welch algorithms. Figures 3 and 4 present a comparison of the four techniques.

GA, Clonal Selection, and the Hybrid Method vs. Baum-Welch: It is clear from table 1 that the GA, the clonal selection algorithm, and the hybrid method optimize HMM parameters better than the Baum-Welch algorithm for all words; they maximize the likelihood better over the training data (see figure 3) and the testing data (see figure 4). We note that for experiment eight the Baum-Welch, GA, and clonal selection algorithms yield almost the same results, but the hybrid method gives better results.

GA vs. the Clonal Selection Algorithm: The immune clonal selection gives better results than the GA for all words. We also note that for experiments one, four, and eight the two algorithms yield almost the same result. Figures 5a and 5b show that the fitness function in clonal selection is better than in the genetic algorithm, which yields better results for optimizing HMM parameters. The figures also show that the clonal selection fitness increases faster than the genetic algorithm's, especially in the early iterations.
Hybrid Method vs. GA and the Clonal Selection Algorithm: We note from the results above that the hybrid method gives better results than the GA and the clonal selection algorithm for all experiments, over both the training data and the testing data. It is also clear from figures 5a, 5b, and 5c that the fitness of the hybrid method is better at every point of the run than that of the genetic algorithm and the clonal selection algorithm.

Figure 3: Log likelihood for training data
Figure 4: Log likelihood for testing data
Figure 5 a, b, and c: Fitness function of the clonal selection algorithm, the genetic algorithm, and the hybrid method for word 9

CONCLUSION

In this paper, we presented the results of using the genetic algorithm and the clonal selection algorithm to optimize HMM parameters. We also proposed a hybrid genetic-immune algorithm for optimizing HMM parameters. It takes into account the main immune aspects: selection and cloning of the most stimulated cells, death of non-stimulated cells, affinity maturation and reselection of the clones with higher affinity, and generation and maintenance of diversity. It also takes into account the main forces of the evolutionary process in the GA, which are the crossover and mutation operators. The results show that the global optimization techniques used produce better results than the traditional Baum-Welch algorithm. Moreover, the proposed hybrid algorithm produced the best results over all tested techniques. The global search algorithms generate better results because they do not fall into local optima as the Baum-Welch algorithm does.

REFERENCES:

De Castro, L. N. & Von Zuben, F. J. (2000) "The Clonal Selection Algorithm with Engineering Applications", GECCO'00 - Workshop Proceedings, pp. 36-37.
De Castro, L. N. and Timmis, J. (2002) "Artificial Immune Systems: A New Computational Approach", Springer-Verlag New York, Inc.
De Castro, L. N. & Von Zuben, F. J. (2002b) "Learning and Optimization Using the Clonal Selection Principle", IEEE Transactions on Evolutionary Computation, Special Issue on Artificial Immune Systems, 6(3), pp. 239-251.
De Castro, L. N. and Timmis, J. (2002c) "Artificial Immune Systems: A Novel Paradigm to Pattern Recognition", in Artificial Neural Networks in Pattern Recognition, J. M. Corchado, L. Alonso, and C. Fyfe (eds.), SOCO-2002, University of Paisley, UK, pp. 67-84.
De Castro, L. N. and Von Zuben, F. J. (2005) Recent Developments in Biologically Inspired Computing, Idea Group Inc. (IGI) Publishing.
Forrest, S., Hofmeyr, S., and Somayaji, A. (1997) "Computer Immunology", Communications of the ACM, Vol. 40, No. 10, pp. 88-96.
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley.
Glickman, M., Balthrop, J., and Forrest, S. (2005) "A Machine Learning Evaluation of an Artificial Immune System", Evolutionary Computation Journal, Vol. 13, No. 2, pp. 179-212.
Hofmeyr, S. and Forrest, S. (2000) "Architecture for an Artificial Immune System",
Evolutionary Computation 7(1), Morgan-Kaufmann, San Francisco, CA, pp. 1289-1296.
Rabiner, L. and Juang, B. (1993) Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ.
Rabiner, L. R. (1989) "A Tutorial on HMM and Selected Applications in Speech Recognition", Proceedings of the IEEE, Vol. 77, No. 2, pp. 267-296.
Somayaji, A., Hofmeyr, S., and Forrest, S. (1998) "Principles of a Computer Immune System", New Security Paradigms Workshop, ACM, pp. 75-82.