Genetic algorithms: introduction and applications to force field development

Basics of Genetic Algorithms
and some possibilities
Peter Spijker
Technische Universiteit Eindhoven
Department of Biomedical Engineering
Division of Biomedical Imaging and Modeling
California Institute of Technology
Materials Process and Simulation Center
Biochemistry & Molecular Biophysics
November 25, 2003
Presentation Overview
• Purpose of presentation
• General introduction to Genetic Algorithms (GA’s)
• Biological background
  • Origin of species
  • Natural selection
• Genetic Algorithm
  • Search space
  • Basic algorithm
  • Coding
  • Methods
• Examples
• Possibilities
Purpose of presentation
• Optimising the parameters of force fields is a difficult and time-consuming task
• Optimisation methods may help
• Methods:
- steepest descent
- simulated annealing (Monte Carlo)
- genetic algorithms
• Brief introduction to genetic algorithms in lecture style
General Introduction to GA’s
• Genetic algorithms (GA’s) are a technique for solving optimization problems
• GA’s are a subclass of Evolutionary Computing
• GA’s are based on Darwin’s theory of evolution
• History of GA’s
  • Evolutionary computing emerged in the 1960s
  • GA’s were created by John Holland in the mid-1970s
Biological Background (1) – The cell
• Every animal cell is a complex of many small
“factories” working together
• At the centre of all this is the cell nucleus
• The nucleus contains the genetic information
Biological Background (2) – Chromosomes
• Genetic information is stored in the chromosomes
• Each chromosome is built of DNA
• Chromosomes in humans form pairs
  • There are 23 pairs
• The chromosome is divided into parts: genes
• Genes code for properties
• The possible variants of a gene for one property are called alleles
• Every gene has a unique position on the chromosome: its locus
Biological Background (3) – Genetics
• The entire combination of genes is called the genotype
• A genotype develops into a phenotype
• Alleles can be either dominant or recessive
• Dominant alleles are always expressed in the phenotype
• Recessive alleles can survive in the population for many generations without being expressed
Biological Background (4) – Reproduction
• Reproduction of genetic information
  • Mitosis
  • Meiosis
• Mitosis copies the same genetic information to the new offspring: there is no exchange of information
• Mitosis is the normal way in which multicellular structures, such as organs, grow
Biological Background (5) – Reproduction
• Meiosis is the basis of sexual reproduction
• Meiotic division produces haploid gametes
• In reproduction two gametes fuse into a zygote, which will become the new individual
• Hence genetic information from both parents is combined to create new offspring
Biological Background (6) – Reproduction
• During reproduction “errors” occur
• Due to these “errors” genetic variation exists
• Most important “errors” are:
• Recombination (cross-over)
• Mutation
Biological Background (7) – Natural selection
• The origin of species: “Preservation of favourable variations and rejection of unfavourable variations.”
• More individuals are born than can survive, so there is a continuous struggle for life.
• Individuals with an advantage have a greater chance to survive: survival of the fittest.
Biological Background (8) – Natural selection
• Important aspects in natural selection are:
  • adaptation to the environment
  • isolation of populations into different groups which cannot mate with each other
• If small changes in the genotypes of individuals are expressed easily, especially in small populations, we speak of genetic drift
• Success in life is expressed mathematically as fitness
Genetic Algorithm (1) – Search space
• Most often one is looking for the best solution
in a specific subset of solutions
• This subset is called the search space (or state space)
• Every point in the search space is a possible solution
• Therefore every point has a fitness value, depending on
the problem definition
• GA’s are used to search the
search space for the best
solution, e.g. a minimum
• Difficulties are the local
minima and the starting
point of the search
[Figure: example search space, f(x) between 0 and 2.5 plotted for x from 0 to 1000]
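The difficulty with local minima and the starting point can be illustrated with a small sketch. The function f below is a hypothetical landscape chosen only because its minima are easy to read off (it is not the function plotted on the slide), and local_descent is an assumed, deliberately naive search that only ever moves to a lower neighbour.

```python
def f(x):
    """Hypothetical landscape on 0..1000 with local minima at 100, 300, 500, 700, 900.

    The first term is a triangle wave; the second makes x = 700 the global minimum.
    """
    return abs(x % 200 - 100) / 100 + abs(x - 700) / 1000


def local_descent(x, lo=0, hi=1000):
    """Move to the lower neighbouring integer until neither neighbour improves on x."""
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(neighbours, key=f)
        if f(best) >= f(x):
            return x  # x is a local minimum
        x = best


# Two starting points end in two different minima.
trapped = local_descent(120)   # stops in the local minimum at x = 100
lucky = local_descent(650)     # reaches the global minimum at x = 700
```

Which minimum such a search finds depends entirely on where it starts; this is the behaviour a population-based search with mutation is meant to avoid.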
Genetic Algorithm (2) – Basic algorithm
• Start with a set of n randomly chosen solutions from the search space (i.e. chromosomes): this is the population
• This population is used to produce the next generation of individuals by reproduction
• Individuals with a higher fitness have a greater chance to reproduce (i.e. natural selection)
Genetic Algorithm (3) – Basic algorithm
• Outline of the basic algorithm
0 START: Create a random population of n chromosomes
1 FITNESS: Evaluate the fitness f(x) of each chromosome in the population
2 NEW POPULATION:
    0 SELECTION: Based on f(x)
    1 RECOMBINATION: Cross-over chromosomes
    2 MUTATION: Mutate chromosomes
    3 ACCEPTANCE: Reject or accept new one
3 REPLACE: Replace old with new population: the new generation
4 TEST: Test problem criterion
5 LOOP: Continue steps 1 to 4 until the criterion is satisfied
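The outline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab code used in the lecture: the toy cost function, the 10-bit chromosome length, the parameter defaults and all function names are hypothetical. Selection is fitness-proportional, every child is accepted, and the best chromosome is copied unchanged into the next generation (elitism).

```python
import random

N_BITS = 10   # bitstring length (assumed)
TARGET = 809  # location of the minimum of the toy cost function (assumed)


def decode(bits):
    """Read a list of 0/1 genes as an unsigned integer."""
    return int("".join(str(b) for b in bits), 2)


def cost(bits):
    """Toy function to minimise; stands in for the problem-specific f(x)."""
    return abs(decode(bits) - TARGET)


def fitness(bits):
    """Higher is better: a cost of 0 gives fitness 1."""
    return 1.0 / (1.0 + cost(bits))


def run_ga(pop_size=30, generations=60, p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # 0 START: random population of pop_size chromosomes
    pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size)]
    history = []  # best cost after each generation
    for _ in range(generations):
        # 1 FITNESS: evaluate every chromosome
        weights = [fitness(c) for c in pop]
        elite = max(pop, key=fitness)
        # 2 NEW POPULATION: selection, recombination, mutation, acceptance
        new_pop = [elite[:]]  # elitism: the best chromosome always survives
        while len(new_pop) < pop_size:
            mother, father = rng.choices(pop, weights=weights, k=2)
            child = mother[:]
            if rng.random() < p_cross:
                point = rng.randint(1, N_BITS - 1)
                child = mother[:point] + father[point:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)  # every child is accepted in this sketch
        # 3 REPLACE: the new generation takes over
        pop = new_pop
        # 4 TEST: stop when the criterion (cost 0) is met
        best = min(pop, key=cost)
        history.append(cost(best))
        if history[-1] == 0:
            break
        # 5 LOOP: otherwise continue with step 1
    return best, history


best, history = run_ga()
```

Because the elite is carried over, the best cost recorded in history can never increase from one generation to the next; whether and how fast it reaches 0 depends on the seed and the parameters.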
Genetic Algorithm (4) – Coding
• Normal cells are diploid (containing 2 complete sets of chromosomes)
• Gametes, in contrast, are haploid
• Formalizing diploid reproduction is much more difficult than haploid reproduction
• Diploid populations have an extra dimension compared to haploid populations
• For simplicity, only haploid genetic algorithms are considered here
Genetic Algorithm (5) – Coding
• Chromosomes are encoded by bitstrings
• Every bitstring is therefore a solution, but not necessarily the best solution
• The way bitstrings encode a solution differs from problem to problem
• Example: 1 0 0 1 can be read either as a sequence of on/off or as the number 9
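The two readings of the same bitstring can be written down directly. The helper names below are hypothetical; the sketch only assumes that the number is stored as an unsigned binary integer with the most significant bit first.

```python
def as_switches(bits):
    """Read a bitstring as a sequence of on/off settings."""
    return ["on" if b == "1" else "off" for b in bits]


def as_number(bits):
    """Read a bitstring as an unsigned binary integer."""
    return int(bits, 2)


def to_bits(x, n_bits):
    """Encode a non-negative integer as a bitstring of fixed length."""
    return format(x, "0{}b".format(n_bits))


switches = as_switches("1001")   # ['on', 'off', 'off', 'on']
number = as_number("1001")       # 9
encoded = to_bits(809, 10)       # '1100101001'
```

The same chromosome therefore means different things depending on the decoding chosen for the problem; the genetic operators themselves never need to know which reading is used.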
Genetic Algorithm (6) – Coding
• With bitstrings, recombination (cross-over) can be represented schematically:
  1 0 0 1 1 0 1  X  0 1 0 1 1 1 0
• Using a specific cross-over point, this gives the offspring:
  1 0 0 1 1 1 0  and  0 1 0 1 1 0 1
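A single-point cross-over on bitstrings can be sketched as follows. The function name is hypothetical, and the cut after the fourth bit is an assumption: the slide does not state the cross-over point, and for these two parents a cut after the fifth bit gives the same offspring because their fifth bits are equal.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bitstrings after position `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(parent_a):
        raise ValueError("cross-over point must lie inside the string")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


# The two parents from the schematic, cut after the fourth bit.
children = single_point_crossover("1001101", "0101110", 4)
# children == ('1001110', '0101101')
```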
Genetic Algorithm (7) – Coding
• Mutation helps prevent the algorithm from being trapped in a local minimum
• In the bitstring approach, mutation is simply the flipping of one of the bits
  1 0 0 1 1 0 1  ->  1 1 0 1 1 0 1
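Bit-flip mutation is short enough to show in full. Both helpers are hypothetical names: flip_bit reproduces the single flip shown in the example (the second bit), and mutate is the commonly used variant in which every bit flips independently with a small probability p_mut, which is an assumption about how the mutation chance is applied.

```python
import random


def flip_bit(bits, index):
    """Return a copy of the bitstring with the bit at `index` inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def mutate(bits, p_mut, rng=random):
    """Flip each bit independently with probability p_mut."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < p_mut else b
        for b in bits
    )


mutated = flip_bit("1001101", 1)   # '1101101'
```

With p_mut = 0 the chromosome is returned unchanged and with p_mut = 1 every bit is inverted; useful values lie close to 0, so that mutation perturbs good solutions without destroying them.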
Genetic Algorithm (8) – Coding
• Both recombination and mutation depend strongly on the exact definition of the problem and on the chosen representation of the chromosomes (which need not be bitstrings)
• Different encodings can be used:
  • Binary encoding
  • Permutation encoding
  • Value encoding
  • Tree encoding
• This presentation focuses on binary encoding
Example Minimum of Function (1)
• First example shows how to find the minimum
of a function
[Figure: plot of f(x) for x from 0 to 1000; the minimum of f(x) lies at x = 809, encoded as the bitstring 1100101001]
Example Minimum of Function (2)
[Figure: fitness of each individual in generation 1, with the best individual marked, and best and mean fitness plotted against generations]
Example Minimum of Function (3)
• Interactive demonstration of this algorithm in Matlab
• Using the function: genalg2()
• Variables:
  • Population size
  • Bitstring length
  • Mutation chance
  • Recombination chance
  • Starting population adaptation
Genetic Algorithm (9) – Remarks
• The example makes clear that the convergence speed of the algorithm depends on many factors:
  • Population size
  • Mutation probability
  • Recombination probability
  • Elitism
  • Selection methods
    • Random selection of parents
    • Roulette wheel selection of parents
• Strong points of GA’s: mutation helps prevent the search from getting stuck in a local minimum, and recombination gives a fast initial convergence
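Roulette wheel selection gives each individual a slice of the wheel proportional to its fitness. The sketch below is one common way to implement it and assumes non-negative fitness values where higher is better; the function name and the rng parameter are hypothetical.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness must be positive")
    pick = rng.uniform(0, total)   # where the wheel stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]          # guard against floating-point round-off


rng = random.Random(0)
parent = roulette_wheel_select(["a", "b", "c"], [1.0, 3.0, 6.0], rng)
```

Random selection corresponds to giving every individual the same fitness, so all slices are equal. Elitism is a separate choice: it copies the best individual into the next generation regardless of where the wheel stops.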
Example Checkerboard (1)
• We are given an n by n checkerboard in which every field can have one colour from a set of four colours.
• The goal is to colour the checkerboard so that no two neighbouring fields have the same colour (diagonal neighbours do not count)
[Figure: two colourings of a 10 by 10 checkerboard, rows and columns numbered 1 to 10]
Example Checkerboard (2)
• Chromosomes represent the way the checkerboard is coloured.
• Chromosomes are not represented by bitstrings but by bitmatrices
• The entries of the bitmatrix can take one of four values, 0, 1, 2 or 3, depending on the colour
• Crossing-over involves matrix manipulation instead of point-wise operations. Crossing-over can combine the parental matrices in a horizontal, vertical, triangular or square way
• Mutation remains an entry-wise operation: an entry is changed into one of the other three values
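Two of the matrix cross-over rules and the mutation can be sketched with nested lists. This is an illustration only: the function names are hypothetical, and which parent supplies which part of the board is an assumption. The triangular and square variants follow the same pattern with a different mask.

```python
import random

N_COLOURS = 4  # colours are coded 0, 1, 2, 3


def horizontal_crossover(a, b, row):
    """Take the rows above `row` from parent a and the remaining rows from parent b."""
    return [r[:] for r in a[:row]] + [r[:] for r in b[row:]]


def vertical_crossover(a, b, col):
    """Take the columns left of `col` from parent a and the remaining columns from parent b."""
    return [ra[:col] + rb[col:] for ra, rb in zip(a, b)]


def mutate_field(board, i, j, rng=random):
    """Return a copy of the board with field (i, j) changed to one of the other colours."""
    new_board = [row[:] for row in board]
    others = [c for c in range(N_COLOURS) if c != board[i][j]]
    new_board[i][j] = rng.choice(others)
    return new_board


parent_a = [[0, 0, 0] for _ in range(3)]
parent_b = [[1, 1, 1] for _ in range(3)]
child_h = horizontal_crossover(parent_a, parent_b, 1)  # [[0,0,0],[1,1,1],[1,1,1]]
child_v = vertical_crossover(parent_a, parent_b, 2)    # [[0,0,1],[0,0,1],[0,0,1]]
```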
Example Checkerboard (3)
• Fitness curve for the checkerboard example
[Figure: best and mean fitness against generations (0 to 600); fitness axis from 130 to 180]
• The board can be seen as a graph in which each row and each column is a path with n nodes and (n-1) edges, so the fitness f(x), the number of neighbouring pairs with different colours, is at most 2 · (n-1) · n (180 for n = 10)
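A fitness function in this spirit counts the neighbouring pairs that have different colours. The sketch assumes that this count is what the fitness curve shows, which fits the maximum of 180 for a 10 by 10 board; the function names are hypothetical.

```python
def checkerboard_fitness(board):
    """Count horizontally and vertically neighbouring pairs with different colours."""
    n = len(board)
    score = 0
    for i in range(n):
        for j in range(n):
            if j + 1 < n and board[i][j] != board[i][j + 1]:
                score += 1  # right-hand neighbour differs
            if i + 1 < n and board[i][j] != board[i + 1][j]:
                score += 1  # neighbour below differs
    return score


def max_fitness(n):
    """Number of neighbouring pairs on an n by n board: 2 * (n - 1) * n."""
    return 2 * (n - 1) * n


# A two-colour chessboard pattern is already a perfect colouring.
perfect = [[(i + j) % 2 for j in range(10)] for i in range(10)]
score = checkerboard_fitness(perfect)   # 180, equal to max_fitness(10)
```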
Example Checkerboard (4)
• Fitness curves for different cross-over rules
[Figure: fitness against generations for four cross-over rules: lower-triangular, square, horizontal cutting and vertical cutting crossing over]
Example Checkerboard (5)
• Interactive demonstration of this algorithm in Matlab
• Using the functions:
  • main()
  • checkers()
  • bestindividual()
  • mutate()
  • recombine()
  • select()
  • showbestindividual()
Possibilities
• Using the genetic algorithm to optimise parameters for a force field
• Parameters are real numbers, so adaptation of these algorithms is required
• Value encoding vs. bitstring encoding
• Difficulties:
  • Definition of the fitness function
  • Integration of the algorithm with software
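With value encoding, the chromosome is simply the list of real parameters, and the operators change accordingly. The sketch below shows one possible adaptation, not a method from the lecture: blend cross-over, Gaussian mutation and a squared-error fitness are all assumptions, and the linear toy model stands in for whatever force-field calculation would produce the values compared with reference data.

```python
import random


def blend_crossover(a, b, rng=random):
    """Child parameters are a random weighted average of the two parents."""
    w = rng.random()
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


def gaussian_mutation(params, sigma, p_mut, rng=random):
    """Add Gaussian noise of width sigma to each parameter with probability p_mut."""
    return [p + rng.gauss(0.0, sigma) if rng.random() < p_mut else p for p in params]


def fitness(params, reference, model):
    """Negative sum of squared deviations from the reference data (higher is better)."""
    return -sum((model(params, x) - y) ** 2 for x, y in reference)


def toy_model(params, x):
    """Hypothetical stand-in for a force-field evaluation: a straight line."""
    return params[0] * x + params[1]


reference = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # made-up reference data
rng = random.Random(0)
child = blend_crossover([1.0, 0.0], [3.0, 2.0], rng)
child = gaussian_mutation(child, sigma=0.1, p_mut=0.5, rng=rng)
score = fitness(child, reference, toy_model)
```

Defining the fitness function is the hard part in practice: it decides which reference properties the parameters are fitted to and how deviations in different properties are weighted against each other.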
Further Questions?