CSM6120
Introduction to Intelligent Systems
Evolutionary and Genetic Algorithms
rkj@aber.ac.uk
Informal biological terminology

Genes
  Encoding rules that describe how an organism is built up from
  the tiny building blocks of life

Chromosomes
  Long strings formed by connecting genes together

Recombination
  Process of two organisms mating, producing offspring that may
  end up sharing genes of their parents
Basic ideas of EAs

An EA is an iterative procedure which evolves a
population of individuals
  Each individual is a candidate solution to a given problem
  Each individual is evaluated by a fitness function, which
  measures the quality of its candidate solution

At each iteration (generation):
  The best individuals are selected
  Genetic operators are applied to selected individuals in order
  to produce new individuals (offspring)
  New individuals are evaluated by the fitness function
Taxonomy
[Figure: tree of search techniques]
Search Techniques
  Uninformed: DFS, BFS
  Informed: A*, Hill Climbing, Simulated Annealing, Swarm Intelligence, Evolutionary Algorithms
    Evolutionary Algorithms: Evolutionary Strategies, Genetic Programming, Genetic Algorithms
The Genetic Algorithm

Directed search algorithms based on the mechanics of
biological evolution

Developed by John Holland, University of Michigan
(1970s)
  To understand the adaptive processes of natural systems
  To design artificial systems software that retains the robustness
  of natural systems

Provide efficient, effective techniques for optimization and
machine learning applications
Some GA applications
Domain                       Application Types
Control                      gas pipeline, pole balancing, missile evasion, pursuit
Design                       semiconductor layout, aircraft design, keyboard
                             configuration, communication networks
Scheduling                   manufacturing, facility scheduling, resource allocation
Robotics                     trajectory planning
Machine Learning             designing neural networks, improving classification
                             algorithms, classifier systems
Signal Processing            filter design
Game Playing                 poker, checkers, prisoner’s dilemma
Combinatorial Optimization   set covering, travelling salesman, routing, bin packing,
                             graph colouring and partitioning
Application: function optimisation (1)
[Figure: plots of three example functions to optimise]
f(x) = x^2
g(x) = sin(x) - 0.1 x + 2
h(x,y) = x.sin(4x) - y.sin(4y+ ) + 1
Application: function optimisation (2)

Conventional approaches:
  Often require knowledge of derivatives or other specific
  mathematical techniques

Evolutionary algorithm approach:
  Requires only a measure of solution quality (fitness function)
Components of a GA
A problem to solve, and ...
  Encoding technique        (gene, chromosome)
  Initialization procedure  (creation)
  Evaluation function       (environment)
  Selection of parents      (reproduction)
  Genetic operators         (mutation, recombination)
  Parameter settings        (practice and art)
GA terminology

Population
  The collection of potential solutions (i.e. all the chromosomes)

Parents/Children
  Both are chromosomes
  Children are generated from the parent chromosomes

Generations
  Number of iterations/cycles through the GA process
Simple GA
initialize population;
evaluate population;
while TerminationCriteriaNotSatisfied
{
    select parents for reproduction;
    perform recombination and mutation;
    evaluate population;
}
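
The loop above can be sketched in Python. This is only an illustrative sketch, not the lecture's reference implementation: the function name `simple_ga`, the bit-string encoding, the 2-way tournament, the one-point crossover and all default parameter values are assumptions chosen to keep the example short. The fitness used in the demo ("OneMax", the number of 1 bits) is likewise just a toy problem.

```python
import random

def simple_ga(fitness, n_bits=8, pop_size=20, generations=50,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve bit-string chromosomes to maximise `fitness` (illustrative defaults)."""
    rng = random.Random(seed)
    # initialize population
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    # evaluate population (remember the best chromosome seen so far)
    best = max(population, key=fitness)
    # termination criterion: a fixed number of generations
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # select parents: each is the winner of a 2-way tournament
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # recombination: one-point crossover with probability crossover_rate
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # mutation: flip each bit with probability mutation_rate
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children                          # generational replacement
        best = max(population + [best], key=fitness)   # evaluate population
    return best

best = simple_ga(fitness=sum)   # toy "OneMax" problem: count the 1 bits
print(best, sum(best))
```

Only the `fitness` argument is problem-specific; everything else is generic GA machinery.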
The GA cycle
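[Figure: the GA cycle. Selection picks parents from the population;
recombination of the chosen parents produces children; modification
produces modified children; evaluation produces evaluated children,
which join the population; deleted members are discarded.]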
Population
Chromosomes could be:
  Bit strings                  (0101 ... 1100)
  Real numbers                 (43.2 -33.1 ... 0.0 89.2)
  Permutations of elements     (E11 E3 E7 ... E1 E15)
  Lists of rules               (R1 R2 R3 ... R22 R23)
  Program elements             (genetic programming)
  ... any data structure ...
Example: Discrete representation

An individual can be represented using discrete values
(binary, integer, or any other system with a discrete set of
values)

The following is an example of binary representation:
CHROMOSOME: 1 0 1 0 0 0 1 1
GENE: a single position (bit) in the chromosome
Example: Discrete representation
Genotype (8 bits): 1 0 1 0 0 0 1 1
Phenotype:
• Integer
• Real Number
• Schedule
• ...
• Anything?
Example: Discrete representation
Phenotype could be integer numbers
Genotype: 1 0 1 0 0 0 1 1
Phenotype: 163
1*2^7 + 0*2^6 + 1*2^5 + 0*2^4 + 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 =
128 + 32 + 2 + 1 = 163
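
As a small sketch of this decoding step, the following Python function turns a list of bits into the integer phenotype. The function name `bits_to_int` and the most-significant-bit-first ordering are assumptions, chosen to match the worked sum above.

```python
def bits_to_int(bits):
    """Interpret a list of bits (most significant bit first) as an unsigned integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit   # shift left by one place, then add the new bit
    return value

genotype = [1, 0, 1, 0, 0, 0, 1, 1]
print(bits_to_int(genotype))  # 163
```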
Example: Discrete representation
Phenotype could be real numbers
e.g. a number between 2.5 and 20.5 using 8 binary digits
Genotype: 1 0 1 0 0 0 1 1
Phenotype: 13.9609
x = 2.5 + (163 / 256) * (20.5 - 2.5) = 13.9609
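
The same mapping can be sketched in Python. The function name `bits_to_real` and its `low`/`high` parameters are illustrative assumptions; the division by 2^8 = 256 follows the slide's arithmetic, which means the upper bound 20.5 itself is never quite reached.

```python
def bits_to_real(bits, low, high):
    """Map a bit list (most significant bit first) onto the interval [low, high)."""
    n = int("".join(str(b) for b in bits), 2)   # integer value of the bit string
    return low + n / 2 ** len(bits) * (high - low)

x = bits_to_real([1, 0, 1, 0, 0, 0, 1, 1], low=2.5, high=20.5)
print(round(x, 4))  # 13.9609
```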
Example: Discrete representation
Phenotype could be a schedule
e.g. 8 jobs, 2 time steps
Genotype: 1 0 1 0 0 0 1 1
Phenotype:
Job        1 2 3 4 5 6 7 8
Time Step  2 1 2 1 1 1 2 2
Example: Real-valued representation

A very natural encoding: if the solution we are looking
for is a list of real-valued numbers, then encode it as a list
of real-valued numbers! (i.e., not as a string of 1s and 0s)

Lots of applications, e.g. parameter optimisation
Representation

Task – how to represent the travelling salesman problem
(TSP)?
Find a tour of a given set of cities so that
  Each city is visited only once
  The total distance travelled is minimised
Representation
One possibility - an ordered list of city numbers
(this is known as an order-based GA)
1) London     3) Dunedin     5) Beijing   7) Tokyo
2) Venice     4) Singapore   6) Phoenix   8) Victoria
Chromosome 1  (3 5 7 2 1 6 4 8)
Chromosome 2  (2 5 7 6 8 1 3 4)
Selection
[Figure: the selection step of the GA cycle, choosing parents from the population]
Selection

Need to choose which chromosomes to use based on
their ‘fitness’


Why not choose the best chromosomes?
We want a balance between exploration and
exploitation
Roulette wheel selection
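
In roulette wheel (fitness-proportionate) selection, each chromosome gets a slice of the wheel proportional to its fitness, so fitter chromosomes are more likely to be picked while weaker ones still have some chance. A minimal sketch, assuming maximisation and non-negative fitness values; the function name and signature are illustrative:

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability fitness / total fitness.

    Assumes every fitness value is non-negative and at least one is positive.
    """
    total = sum(fitnesses)
    spin = rng.uniform(0, total)          # where the wheel stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit                    # right-hand edge of this individual's slice
        if spin < running:
            return individual
    return population[-1]                 # guard against floating-point round-off

print(roulette_wheel_select(["a", "b", "c"], [1.0, 3.0, 6.0]))
```

The standard library call `random.choices(population, weights=fitnesses)` gives the same behaviour in one line.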
Rank-based selection

1st step
  Sort (rank) individuals according to fitness
  Ascending or descending order (minimization or maximization)

2nd step
  Select individuals with probability proportional to their rank only
  (ignoring the fitness value)
  The better the rank, the higher the probability of being selected

It avoids most of the problems associated with roulette-wheel
selection, but still requires global sorting of individuals,
reducing potential for parallel processing
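
The two steps can be sketched as follows. This assumes maximisation and a simple linear ranking in which the worst individual has weight 1 and the best has weight N; other rank-to-probability mappings exist, and the function name is illustrative.

```python
import random

def rank_select(population, fitness, rng=random):
    """Select one individual with probability proportional to its rank."""
    # 1st step: sort by fitness, worst first, so position + 1 is the rank
    ranked = sorted(population, key=fitness)
    weights = list(range(1, len(ranked) + 1))
    # 2nd step: draw using the ranks only; the raw fitness values are ignored
    return rng.choices(ranked, weights=weights, k=1)[0]

print(rank_select([4.0, 250.0, 7.5, 1.0], fitness=lambda x: x))
```

Note that the individual with fitness 250.0 is only 4 times as likely to be chosen as the worst one, whatever the gap in raw fitness.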
Tournament selection

A number of “tournaments” are run
  Several chromosomes chosen at random
  The chromosome with the highest fitness is selected each time
  Larger tournament size means that weak chromosomes are
  less likely to be selected

Advantages
  It is efficient to code
  It works on parallel architectures
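
A tournament takes only a few lines, which illustrates the "efficient to code" point. A sketch assuming maximisation; the function name and the default tournament size of 3 are illustrative choices:

```python
import random

def tournament_select(population, fitness, size=3, rng=random):
    """Run one tournament: sample `size` distinct chromosomes, return the fittest."""
    contestants = rng.sample(population, size)
    return max(contestants, key=fitness)

population = [3, 9, 4, 1, 7]
parent = tournament_select(population, fitness=lambda x: x)
print(parent)
```

No global sort or fitness total is needed, so each tournament can run independently of the others.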
Crossover: recombination
P1  (0 1 1 0 1 0 1 1)      C1  (1 1 0 1 1 0 1 1)
P2  (1 1 0 1 1 0 0 1)      C2  (0 1 1 0 1 0 0 1)

Crossover is a critical feature of GAs:
  It greatly accelerates search early in evolution of a population
  It leads to effective combination of sub-solutions on different
  chromosomes

Several methods for crossover exist…
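
One of the simplest methods is one-point crossover, sketched below. The cut point of 5 is an assumption: the slide does not mark where the parents were cut, and cutting after the 4th, 5th or 6th bit all reproduce the children shown above.

```python
def one_point_crossover(p1, p2, point):
    """Swap the tails of two equal-length parents after position `point`."""
    c1 = p2[:point] + p1[point:]   # head of P2, tail of P1
    c2 = p1[:point] + p2[point:]   # head of P1, tail of P2
    return c1, c2

P1 = [0, 1, 1, 0, 1, 0, 1, 1]
P2 = [1, 1, 0, 1, 1, 0, 0, 1]
C1, C2 = one_point_crossover(P1, P2, point=5)
print(C1)  # [1, 1, 0, 1, 1, 0, 1, 1]
print(C2)  # [0, 1, 1, 0, 1, 0, 0, 1]
```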
Crossover

How would we implement crossover for TSPs?
Parent 1  (3 5 7 2 1 6 4 8)
Parent 2  (2 5 7 6 8 1 3 4)
Crossover
Parent 1  (3 5 7 2 1 6 4 8)
Parent 2  (2 5 7 6 8 1 3 4)
Child 1   (3 5 7 6 8 1 3 4)
Child 2   (2 5 7 2 1 6 4 8)
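
With plain one-point crossover the children are not valid tours: Child 1 visits city 3 twice and never visits city 2, and Child 2 visits city 2 twice and never visits city 3. Order-based GAs therefore use crossover operators that keep each child a permutation. The sketch below is one simplified operator of that kind, not necessarily the one intended in the lecture: it keeps the head of one parent and fills the rest with the missing cities in the order they appear in the other parent. The function name and the cut point of 3 are assumptions.

```python
def prefix_order_crossover(head_parent, order_parent, cut):
    """Keep head_parent[:cut], then append the remaining cities in order_parent's order."""
    head = head_parent[:cut]
    tail = [city for city in order_parent if city not in head]
    return head + tail

parent1 = [3, 5, 7, 2, 1, 6, 4, 8]
parent2 = [2, 5, 7, 6, 8, 1, 3, 4]
child1 = prefix_order_crossover(parent1, parent2, cut=3)
child2 = prefix_order_crossover(parent2, parent1, cut=3)
print(child1)  # [3, 5, 7, 2, 6, 8, 1, 4]
print(child2)  # [2, 5, 7, 3, 1, 6, 4, 8]
```

Both children now contain every city exactly once.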
Mutation: local modification


Before: (1 0 1 1 0 1 1 0)
After:  (0 1 1 0 0 1 1 0)

Before: (1.38 -69.4 326.44 0.1)
After:  (1.38 -67.5 326.44 0.1)

Causes movement in the search space (local or global)
Restores lost information to the population
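
Both kinds of mutation can be sketched in a few lines. The function names, the per-gene `rate` parameter and the use of Gaussian noise for real-valued genes are assumptions made for illustration; the slide shows only before/after examples.

```python
import random

def bit_flip_mutation(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def gaussian_mutation(values, rate, sigma, rng=random):
    """Add Gaussian noise (std. dev. `sigma`) to each gene with probability `rate`."""
    return [v + rng.gauss(0, sigma) if rng.random() < rate else v for v in values]

print(bit_flip_mutation([1, 0, 1, 1, 0, 1, 1, 0], rate=0.1))
print(gaussian_mutation([1.38, -69.4, 326.44, 0.1], rate=0.25, sigma=2.0))
```

A small `sigma` gives local moves in the search space; a large one allows occasional global jumps.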
Mutation

Given the representation for TSPs, how could we achieve
mutation?
Mutation
Mutation involves reordering of the list:
             *     *
Before: (5 8 7 2 1 6 3 4)
After:  (5 8 6 2 1 7 3 4)
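
The example swaps the cities in the two starred positions, which keeps the chromosome a valid tour. A sketch of such a swap mutation; the function name and the option to pass the positions explicitly are illustrative:

```python
import random

def swap_mutation(tour, i=None, j=None, rng=random):
    """Return a copy of `tour` with positions i and j swapped (random if not given)."""
    mutated = list(tour)
    if i is None or j is None:
        i, j = rng.sample(range(len(tour)), 2)   # two distinct positions
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

before = [5, 8, 7, 2, 1, 6, 3, 4]
after = swap_mutation(before, i=2, j=5)   # 0-based indices of the starred positions
print(after)  # [5, 8, 6, 2, 1, 7, 3, 4]
```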
Note

Both mutation and crossover are applied based on user-supplied probabilities

We usually use a fairly high crossover rate and fairly low
mutation rate

Why do you think this is?
Evaluation of fitness
[Figure: the evaluation step of the GA cycle, turning modified children into evaluated children]

The evaluator decodes a chromosome and assigns it a fitness
measure

The evaluator is the only link between a classical GA and the
problem it is solving
Fitness functions

Evaluate the ‘goodness’ of chromosomes

(How well they solve the problem)

Critical to the success of the GA

Often difficult to define well

Must be fairly fast, as each chromosome must be
evaluated each generation (iteration)
Fitness functions

Fitness function for the TSP?


(3 5 7 2 1 6 4 8)
As we’re minimizing the distance travelled, the fitness is
the total distance travelled in the journey defined by the
chromosome
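
A sketch of this fitness function, where lower is better. The city coordinates are hypothetical, and the sketch assumes straight-line distances and a tour that returns to its starting city; neither detail is stated on the slide.

```python
import math

def tour_length(tour, coords):
    """Total length of the closed tour visiting the cities in `tour` order."""
    total = 0.0
    for k in range(len(tour)):
        a = coords[tour[k]]
        b = coords[tour[(k + 1) % len(tour)]]   # wraps round to the first city
        total += math.dist(a, b)
    return total

# Hypothetical coordinates: four cities on the corners of a 4 x 3 rectangle
coords = {1: (0, 0), 2: (0, 3), 3: (4, 3), 4: (4, 0)}
print(tour_length([1, 2, 3, 4], coords))  # 14.0 (around the perimeter)
print(tour_length([1, 3, 2, 4], coords))  # 18.0 (crosses both diagonals)
```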
Deletion
[Figure: the deletion step of the GA cycle, discarding deleted members of the population]

Generational GA:
  the entire population is replaced with each iteration

Steady-state GA:
  a few members are replaced each generation
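
The two policies can be contrasted in code. The steady-state version below assumes a "replace the worst" deletion policy under maximisation, which is only one of several possibilities; the function names are illustrative.

```python
def generational_replace(population, children):
    """Generational GA: the children replace the whole population."""
    return list(children)

def steady_state_replace(population, children, fitness):
    """Steady-state GA: the children replace the least fit members only."""
    keep = max(0, len(population) - len(children))
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    return survivors + list(children)

population = [5, 1, 9, 3]
print(generational_replace(population, [7, 2, 8, 6]))                    # [7, 2, 8, 6]
print(steady_state_replace(population, [7], fitness=lambda x: x))        # [9, 5, 3, 7]
```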
Stopping!

The GA cycle continues until


The system has ‘converged’; or
A specified number of iterations (‘generations’) has been
performed
An abstract example
Distribution of Individuals in Generation 0
Distribution of Individuals in Generation N

Good demo of the GA components

http://www.obitko.com/tutorials/genetic-algorithms/example-function-minimum.php
TSP example: 30 cities
[Figure: the 30 cities plotted on axes x (0 to 100) and y (0 to 120)]
TSP30 (Performance = 941)
[Figure: tour through the 30 cities, same axes]
TSP30 (Performance = 800)
[Figure: tour through the 30 cities, same axes]
TSP30 (Performance = 652)
[Figure: tour through the 30 cities, same axes]
TSP30 Solution (Performance = 420)
[Figure: final tour through the 30 cities, same axes]
Overview of performance
[Figure: TSP30 - Overview of Performance. Distance (0 to 1800) plotted
against Generations (1000), with curves for Best, Worst and Average]
Example: n-queens

Put n queens on an n × n board with no two queens on
the same row, column, or diagonal
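
One possible representation, given here as an assumption rather than the lecture's prescribed answer, is a permutation in which `rows[c]` is the row of the queen in column `c`. Rows and columns are then distinct by construction, and the fitness only has to count diagonal clashes (0 means a solution).

```python
def attacking_pairs(rows):
    """Count pairs of queens that share a diagonal; rows[c] is the row of column c's queen."""
    n = len(rows)
    clashes = 0
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            # same diagonal when the row gap equals the column gap
            if abs(rows[c1] - rows[c2]) == c2 - c1:
                clashes += 1
    return clashes

print(attacking_pairs([1, 3, 0, 2]))  # 0: a valid 4-queens placement
print(attacking_pairs([0, 1, 2, 3]))  # 6: all four queens on one diagonal
```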
Examples

Eaters
  http://math.hws.edu/xJava/GA/

TSP
  http://www.heatonresearch.com/articles/65/page1.html
  http://www.ads.tuwien.ac.at/raidl/tspga/TSPGA.html
Exercise: The Card Problem

You have 10 cards numbered from 1 to 10. You have to choose a way of
dividing them into 2 piles, so that the cards in Pile0 *sum* to a number as
close as possible to 36, and the remaining cards in Pile1 *multiply* to a
number as close as possible to 360

Encoding
  Each card can be in Pile0 or Pile1, so there are 1024 possible ways of
  sorting them into 2 piles, and you have to find the best. Think of a
  sensible way of encoding any possible solution.

Fitness
  Some of these chromosomes will be closer to the target than others.
  Think of a sensible way of evaluating any chromosome and scoring it
  with a fitness measure.
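
One possible answer, offered as a sketch rather than the model solution: encode a solution as 10 bits, where bit i gives the pile (0 or 1) of card i+1, and score it by the combined relative error of the two targets, so that lower is better and 0 is a perfect split. The function name and the choice to add the two relative errors with equal weight are assumptions.

```python
import math

def card_error(chromosome):
    """Combined relative error of a 10-bit chromosome; bit i is the pile of card i + 1."""
    pile0 = [card for card, bit in enumerate(chromosome, start=1) if bit == 0]
    pile1 = [card for card, bit in enumerate(chromosome, start=1) if bit == 1]
    sum_error = abs(sum(pile0) - 36) / 36
    product_error = abs(math.prod(pile1) - 360) / 360   # empty pile has product 1
    return sum_error + product_error

# Cards 1-5 in Pile0 (sum 15), cards 6-10 in Pile1 (product 30240)
print(card_error([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
```

Dividing each error by its target stops the product term, which can be very large, from swamping the sum term.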
Issues for GA practitioners

Choosing basic implementation issues:
  Representation
  Population size, mutation rate, ...
  Selection, deletion policies
  Crossover, mutation operators
  Termination criteria
  Performance, scalability

Solution is only as good as the fitness function (often hardest
part)

Your assignment will be to code a GA for a given task! Be
aware of the above issues…
Benefits of GAs

Concept is easy to understand

Supports multi-objective optimization

Good for “noisy” environments

Always an answer; answer gets better with time

Inherently parallel; easily distributed