Genetic Algorithms

Problem: find an optimal solution.
Classical solution: search through all possible solutions to find the best one.
Solution with genetic algorithms: start from a set (population) of candidate solutions and keep evolving them until the "best fit" to the real solution is found.

Each solution is encoded as a bit string (for example, the concatenated binary values of the features). This bit string is called a chromosome. The group of bits that encodes one feature is called a gene. The possible values of a bit (0 or 1) are called alleles. The position of a bit in the chromosome is called a locus.

For example, assume that we have samples S1 and S2, and the task is to find the optimal value x of feature F2 for sample S3 such that the class of that sample is either A or B.

   Sample    S1   S2   S3
   F1        1    4    2
   F2        2    5    x = ?
   Class     A    B    ?

Traditional methods measure the quality of a candidate with a distance or error measure, for example the Euclidean distance or the squared error. We would start from one solution and keep improving it, for example by slightly moving the unknown value. We could start with x = 2 and calculate the distance from the point S3 = (2, 2) to the points in class A. Then we would pick another value, for example x = 2.2, and check whether the error decreases. If not, we would try x = 1.8. Continuing this way, we would try all points at distances i·δ from the starting point, where δ is a predetermined step size and i = 1, 2, 3, ... is a counter. If the error did not improve, we could repeat the process from another starting value of x.

Genetic algorithms instead encode the solution as a binary string. A set of similar solutions can be described by a schema: a template in which * marks a "don't care" position (either 0 or 1). For example, the schema

   **00**11*

matches every 9-bit string that has 0 at positions 3 and 4 and 1 at positions 7 and 8.

Genetic algorithm:
1. Generate a random population of n chromosomes (i.e. candidate solutions for the problem).
2. Evaluate the fitness of each chromosome x in the population.
3. While the best fitness is below the desired level, create a new population:
   1. Select: select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected; a chromosome may be selected several times).
   2. Crossover: with a prespecified crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
   3. Mutate: with a prespecified mutation probability, mutate the new offspring at each locus.
   4. Place the new offspring in the new population; repeat steps 1 to 4 until it contains n chromosomes.
   5. Evaluate: evaluate the fitness of each chromosome x in the new population.
4. Return the best solution in the current population.

Parameters to be selected:
1. type and size of chromosomes
   1. the number of bits per gene, determined by the required minimum precision
2. number and values of the initial chromosomes
3. crossover probability and strategy
4. mutation probability
5. evaluation, i.e. assigning fitness to chromosomes and their offspring
6. selection of the new population
7. stopping criteria

Type and size of chromosomes
1. Binary encoding
   The most common way is to represent each feature as a binary string. The number of bits m is chosen so that

      $$\frac{b-a}{2^m-1} \le \text{required\_precision}$$

   where [a, b] is the range of values of the feature, m is the number of bits, and required_precision is set by the user; the smallest such m is $m = \lceil \log_2((b-a)/\text{required\_precision} + 1) \rceil$. Writing C = (b - a)/(2^m - 1) for short, a gene whose bits have the integer (decimal) value decimal decodes to the feature value

      $$\text{value} = a + \text{decimal} \cdot C$$

2. Gray encoding
3. Value encoding
4. Permutation encoding
   Each chromosome is a permutation of values, so ordinary crossover would create duplicates. Instead, crossover swaps all instances of the values to be exchanged. E.g. to exchange the segment EB of chromosome 1 (positions 3 and 4) for the segment CD of chromosome 2, swap E with C and B with D in both chromosomes (this is essentially partially mapped crossover, PMX):

                      parent   swap E, C   swap B, D
      chromosome 1    ADEBC    ADCBE       ABCDE
      chromosome 2    AECDB    ACEDB       ACEBD

   After the swaps, chromosome 1 has CD and chromosome 2 has EB at positions 3 and 4, and both are still valid permutations.
5. etc.

Crossover
1. Single-point crossover
   Pick a point in the chromosomes and swap the corresponding tail portions of the two parent chromosomes to make up the offspring:

      11111|111
      00000|000   ->  11111000

   (the second offspring, 00000111, combines the remaining parts).
2. Two-point crossover
   Same as above, except that the chromosomes are split at two points and the middle portions are swapped:

      1|1111|111
      0|0000|000   ->  10000111

3. Arithmetic crossover
4. Uniform crossover
5. etc.

Mutation
Each bit is either mutated (i.e. inverted) or left unchanged, independently, according to the mutation probability. Example: problem 10.8.

Fitness
Define a fitness function that assigns each chromosome a score; chromosomes that are better solutions get a higher fitness.

Selection
How do we select the fittest chromosomes to be used for crossover?
1. Roulette wheel selection
   The probability of selecting a chromosome is proportional to its fitness.
2. Ranking
   Chromosomes are ranked by fitness, and the selection probability depends on the rank rather than on the raw fitness value.
3. Tournament selection
   A few chromosomes are picked at random, and the fittest of them is selected for crossover.
4. Elitism
   At each iteration, the best chromosome is copied unchanged into the new population, so the best solution found so far is never lost.
5. etc.

Termination
1. When the best fitness changes very little between iterations.
2. When a given number of iterations has been completed.
3. etc.
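The steps above can be tied together in a short sketch. The Python below is a minimal illustration under stated assumptions, not the method prescribed in these notes: the fitness function (closeness of S3 = (2, x) to the class-A sample S1 = (1, 2)), the range [0, 8], the precision 0.01, and all function names and parameter values (n, p_cross, p_mut, generations) are hypothetical choices. It combines binary encoding with the bit-count and decoding formulas, roulette wheel selection, single-point crossover, bit-flip mutation, elitism, and a fixed number of iterations as the stopping criterion.

```python
import random

# Hypothetical example problem (an assumption, not from the notes):
# choose x = F2 of sample S3 = (2, x) so that S3 is as close as possible
# to sample S1 = (1, 2) of class A.
A_POINT = (1.0, 2.0)
F1_S3 = 2.0


def bits_needed(a, b, precision):
    """Smallest m with (b - a) / (2**m - 1) <= precision."""
    m = 1
    while (b - a) / (2 ** m - 1) > precision:
        m += 1
    return m


def decode(bits, a, b):
    """Map a bit list to a + decimal * C, with C = (b - a) / (2**m - 1)."""
    decimal = int("".join(map(str, bits)), 2)
    return a + decimal * (b - a) / (2 ** len(bits) - 1)


def fitness(x):
    # Higher is better: 1 / (1 + squared Euclidean distance from S3 to S1).
    d2 = (F1_S3 - A_POINT[0]) ** 2 + (x - A_POINT[1]) ** 2
    return 1.0 / (1.0 + d2)


def roulette_select(population, scores):
    # Selection probability proportional to fitness (all scores > 0 here).
    return random.choices(population, weights=scores, k=1)[0]


def single_point_crossover(p1, p2):
    # Cut both parents at the same random point and swap the tails.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(bits, p_mut):
    # Invert each bit independently with probability p_mut.
    return [1 - bit if random.random() < p_mut else bit for bit in bits]


def genetic_algorithm(a=0.0, b=8.0, precision=0.01, n=40,
                      p_cross=0.8, p_mut=0.02, generations=100):
    m = bits_needed(a, b, precision)
    population = [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]
    for _ in range(generations):  # stopping criterion: fixed number of iterations
        scores = [fitness(decode(c, a, b)) for c in population]
        best = population[scores.index(max(scores))]
        new_population = [best[:]]  # elitism: keep the best chromosome unchanged
        while len(new_population) < n:
            p1 = roulette_select(population, scores)
            p2 = roulette_select(population, scores)
            if random.random() < p_cross:
                c1, c2 = single_point_crossover(p1, p2)
            else:
                c1, c2 = p1[:], p2[:]  # no crossover: exact copies of the parents
            new_population.extend([mutate(c1, p_mut), mutate(c2, p_mut)])
        population = new_population[:n]
    scores = [fitness(decode(c, a, b)) for c in population]
    best = population[scores.index(max(scores))]
    return decode(best, a, b), best
```

For example, `x, bits = genetic_algorithm()` should return a value of x near 2, where S3 is closest to S1. Trying ranking or tournament selection instead would only require replacing `roulette_select`.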