Basics of Genetic Algorithms and some possibilities

Peter Spijker
Technische Universiteit Eindhoven, Department of Biomedical Engineering, Division of Biomedical Imaging and Modeling
California Institute of Technology, Materials Process and Simulation Center, Biochemistry & Molecular Biophysics
November 25, 2003

Presentation Overview
• Purpose of presentation
• General introduction to Genetic Algorithms (GAs)
• Biological background
  • Origin of species
  • Natural selection
• Genetic Algorithm
  • Search space
  • Basic algorithm
  • Coding
  • Methods
• Examples
• Possibilities

Purpose of presentation
• Optimising the parameters of force fields is a difficult and time-consuming task
• Optimisation methods might help
• Methods:
  - steepest descent
  - simulated annealing (Monte Carlo)
  - genetic algorithms
• Brief introduction to genetic algorithms in lecture style

General Introduction to GAs
• Genetic algorithms (GAs) are a technique for solving optimisation problems
• GAs are a subclass of Evolutionary Computing
• GAs are based on Darwin’s theory of evolution
• History of GAs
  • Evolutionary computing emerged in the 1960s.
  • GAs were created by John Holland in the mid-70s.

Biological Background (1) – The cell
• Every animal cell is a complex of many small “factories” working together
• At the centre of it all is the cell nucleus
• The nucleus contains the genetic information

Biological Background (2) – Chromosomes
• Genetic information is stored in the chromosomes
• Each chromosome is built of DNA
• Chromosomes in humans form pairs
• There are 23 pairs
• The chromosome is divided into parts: genes
• Genes code for properties
• The possible variants of a gene for one property are called alleles
• Every gene has a unique position on the chromosome: its locus

Biological Background (3) – Genetics
• The entire combination of genes is called the genotype
• A genotype develops into a phenotype
• Alleles can be either dominant or recessive
• Dominant alleles are always expressed from the genotype in the phenotype
• Recessive alleles can survive in the population for many generations without being expressed

Biological Background (4) – Reproduction
• Reproduction of genetic information
  • Mitosis
  • Meiosis
• Mitosis copies the same genetic information to new offspring: there is no exchange of information
• Mitosis is the normal way in which multicellular structures, such as organs, grow

Biological Background (5) – Reproduction
• Meiosis is the basis of sexual reproduction
• Meiotic division produces the gametes
• In reproduction two gametes fuse into a zygote, which will become the new individual
• Hence the genetic information of both parents is combined to create new offspring

Biological Background (6) – Reproduction
• During reproduction “errors” occur
• Due to these “errors” genetic variation exists
• The most important “errors” are:
  • Recombination (cross-over)
  • Mutation

Biological Background (7) – Natural selection
• The origin of species: “Preservation of favourable variations and rejection of unfavourable variations.”
• More individuals are born than can survive, so there is a continuous struggle for life.
• Individuals with an advantage have a greater chance of survival: survival of the fittest.

Biological Background (8) – Natural selection
• Important aspects in natural selection are:
  • adaptation to the environment
  • isolation of populations in different groups which cannot mutually mate
• If small changes in the genotypes of individuals are expressed easily, especially in small populations, we speak of genetic drift
• Success in life is expressed mathematically as fitness

Genetic Algorithm (1) – Search space
• Most often one is looking for the best solution in a specific set of solutions
• This set is called the search space (or state space)
• Every point in the search space is a possible solution
• Therefore every point has a fitness value, depending on the problem definition
• GAs are used to search the search space for the best solution, e.g. a minimum
• Difficulties are the local minima and the starting point of the search
[Figure: example function plotted for x from 0 to 1000, with values between 0 and 2.5]

Genetic Algorithm (2) – Basic algorithm
• Start with a set of n randomly chosen solutions from the search space (i.e. chromosomes). This is the population
• This population is used to produce a next generation of individuals by reproduction
• Individuals with a higher fitness have a greater chance to reproduce (i.e. natural selection)

Genetic Algorithm (3) – Basic algorithm
• Outline of the basic algorithm
0 START: Create a random population of n chromosomes
1 FITNESS: Evaluate the fitness f(x) of each chromosome in the population
2 NEW POPULATION
  0 SELECTION: Based on f(x)
  1 RECOMBINATION: Cross-over chromosomes
  2 MUTATION: Mutate chromosomes
  3 ACCEPTATION: Reject or accept the new one
3 REPLACE: Replace the old with the new population: the new generation
4 TEST: Test the problem criterion
5 LOOP: Continue steps 1 – 4 until the criterion is satisfied

Genetic Algorithm (4) – Coding
• Normal cells are diploid (containing 2 complete sets of chromosomes)
• Gametes, by contrast, are haploid
• Formalising diploid reproduction is much more difficult than haploid reproduction
• Diploid populations have an extra dimension compared to haploid populations
• For simplicity, therefore, only haploid genetic algorithms are considered

Genetic Algorithm (5) – Coding
• Chromosomes are encoded by bitstrings
• Every bitstring is therefore a solution, but not necessarily the best solution
• The way bitstrings encode a solution differs from problem to problem
1 0 0 1    Either: a sequence of on/off or the number 9

Genetic Algorithm (6) – Coding
• Recombination (cross-over) can, when using bitstrings, be represented schematically:
Parents:   1 0 0 1 1 0 1  X  0 1 0 1 1 1 0
Offspring: 1 0 0 1 1 1 0     0 1 0 1 1 0 1
• Using a specific cross-over point

Genetic Algorithm (7) – Coding
• Mutation helps prevent the algorithm from being trapped in a local minimum
• In the bitstring approach mutation is simply the flipping of one of the bits
1 0 0 1 1 0 1  →  1 1 0 1 1 0 1

Genetic Algorithm (8) – Coding
• Both recombination and mutation depend strongly on the exact definition of the problem and the chosen representation of the chromosomes (which need not be bitstrings)
• Different encodings can be used:
  • Binary encoding
  • Permutation encoding
  • Value encoding
  • Tree encoding
• The focus in this presentation stays with binary encoding

Example Minimum of Function (1)
• The first example shows how to find the minimum of a function
[Figure: the function f(x) for x from 0 to 1000; minimum of f(x) at x = 809, encoded as the bitstring 1100101001]

Example Minimum of Function (2)
[Figure: fitness of the individuals in generation 1, with mean fitness, best fitness and best individual marked; best fitness and mean fitness plotted against generations]

Example Minimum of Function (3)
• Interactive demonstration of this algorithm in Matlab
• Using the function: genalg2()
• Variables:
  • Population size
  • Bitstring length
  • Mutation chance
  • Recombination chance
  • Starting population adaptation

Genetic Algorithm (9) – Remarks
• It is clear from the example that the convergence speed of the algorithm depends on many factors:
  • Population size
  • Mutation probability
  • Recombination probability
  • Elitism
  • Selection methods
    • Random selection of parents
    • Roulette wheel selection of parents
• Strong points of GAs: mutation helps to avoid getting stuck in a local minimum, recombination gives fast initial convergence

Example Checkerboard (1)
• We are given an n by n checkerboard in which every field can have one colour from a set of four colours.
• The goal is to colour the checkerboard such that no two neighbouring fields have the same colour (diagonal neighbours do not count)
[Figure: two 10 × 10 checkerboards]

Example Checkerboard (2)
• Chromosomes represent the way the checkerboard is coloured.
• Chromosomes are not represented by bitstrings but by “bitmatrices”
• Each entry of the bitmatrix can have one of the four values 0, 1, 2 or 3, depending on the colour
• Crossing-over involves matrix manipulation instead of point-wise operations. Crossing-over can combine the parental matrices in a horizontal, vertical, triangular or square way
• Mutation still changes single entries, each into one of the other three values

Example Checkerboard (3)
• Fitness curve for the checkerboard example
[Figure: best fitness and mean fitness against generations (0 to 600); fitness axis from 130 to 180]
• Each row and each column of the board can be seen as a graph with n nodes and (n-1) edges, so the board has 2 · (n-1) · n neighbouring pairs and the maximum fitness is easily defined as: f(x) = 2 · (n-1) · n (180 for n = 10)

Example Checkerboard (4)
• Fitness curves for different cross-over rules
[Figure: four panels of fitness (130 to 180) against generations: Lower-Triangular Crossing Over, Square Crossing Over, Horizontal Cutting Crossing Over, Vertical Cutting Crossing Over]

Example Checkerboard (5)
• Interactive demonstration of this algorithm in Matlab
• Using the functions:
  • main()
  • checkers()
  • bestindividual()
  • mutate()
  • recombine()
  • select()
  • showbestindividual()

Possibilities
• Using the genetic algorithm to optimise parameters for a force field
• Parameters are real numbers, so adaptation of these algorithms is required
• Value encoding vs. bitstring encoding
• Difficulties:
  • Definition of the fitness function
  • Integration of the algorithm with software

Further Questions?
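
As a supplement to the outline in Genetic Algorithm (3), the following is a minimal Python sketch of the haploid bitstring algorithm applied to minimising a function. It is not the Matlab genalg2() from the presentation. The objective function, all parameter values and all function names are assumptions chosen for illustration. Roulette wheel selection here weights each chromosome by 1/f(x), which assumes f(x) > 0. Mutation flips each bit with a small probability rather than exactly one bit. The ACCEPTATION step accepts every child, and the stopping criterion is simply a fixed number of generations.

```python
import math
import random

BITS = 10  # 10-bit chromosomes encode x in 0..1023 (assumed, as in 1100101001 = 809)


def decode(bits):
    """Interpret a list of 0/1 values as an unsigned integer."""
    return int("".join(str(b) for b in bits), 2)


def objective(x):
    """Hypothetical test function with several local minima (always > 0)."""
    return 1.25 + math.sin(x / 40.0) + 0.001 * abs(x - 800)


def crossover(a, b, rng):
    """One-point cross-over: swap the tails of two parents at a random point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def run_ga(pop_size=30, generations=60, p_cross=0.8, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    # START: random population of n chromosomes
    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # FITNESS: lower objective value means fitter individual
        scores = [objective(decode(c)) for c in pop]
        best = pop[scores.index(min(scores))]
        history.append(min(scores))
        # SELECTION: roulette wheel with weights 1/f(x)
        weights = [1.0 / s for s in scores]
        new_pop = [best[:]]  # elitism: keep the best individual unchanged
        while len(new_pop) < pop_size:
            a = rng.choices(pop, weights=weights, k=1)[0]
            b = rng.choices(pop, weights=weights, k=1)[0]
            if rng.random() < p_cross:  # RECOMBINATION
                a, b = crossover(a, b, rng)
            new_pop.append(mutate(a, p_mut, rng))  # MUTATION
            if len(new_pop) < pop_size:
                new_pop.append(mutate(b, p_mut, rng))
        pop = new_pop  # REPLACE: the new generation
    scores = [objective(decode(c)) for c in pop]
    history.append(min(scores))
    return decode(pop[scores.index(min(scores))]), history


if __name__ == "__main__":
    x_best, history = run_ga()
    print("best x:", x_best, "f(x):", history[-1])
```

Because of elitism the best fitness in `history` can never get worse from one generation to the next. The mean fitness, by contrast, may fluctuate.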
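
For the checkerboard example, a comparable Python sketch of the matrix encoding is given below. It assumes that the fitness counts the horizontally and vertically neighbouring pairs with different colours, which is consistent with the maximum 2 · (n-1) · n but is not confirmed by the slides. The function names are hypothetical and do not correspond to the Matlab functions checkers(), mutate() or recombine(). Only the horizontal cutting cross-over is shown.

```python
import random


def count_fitness(board):
    """Number of neighbouring pairs (not diagonal) with different colours."""
    n = len(board)
    score = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n and board[r][c] != board[r][c + 1]:
                score += 1  # right-hand neighbour differs
            if r + 1 < n and board[r][c] != board[r + 1][c]:
                score += 1  # lower neighbour differs
    return score


def horizontal_crossover(a, b, cut):
    """Child takes the rows above `cut` from parent a and the rest from b."""
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def mutate_field(board, rng, colours=4):
    """Change one randomly chosen field into one of the other colours."""
    n = len(board)
    r, c = rng.randrange(n), rng.randrange(n)
    child = [row[:] for row in board]
    child[r][c] = rng.choice([k for k in range(colours) if k != board[r][c]])
    return child


if __name__ == "__main__":
    n = 10
    rng = random.Random(0)
    board = [[rng.randrange(4) for _ in range(n)] for _ in range(n)]
    print("fitness:", count_fitness(board), "of at most", 2 * (n - 1) * n)
```

Under this assumed definition a board coloured in an alternating two-colour pattern already reaches the maximum, so the four colours make many optimal solutions possible.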