The building Block Hypothesis

advertisement
Building Blocks
CS 5764
Evolutionary Computation
Hod Lipson
Unifying ideas
• Knowledge represented as a population of
solutions containing building blocks
• Progress is driven by two key processes:
– Incremental progress: e.g. mutation
(traditional optimization): Refinement
– Recombination of solutions (e.g. crossover):
Discovering new areas (possibly initially
inferior)
Terminology
“Chromosome”
“Gene”
01010100111001010101010010110
Allele
one of two or more forms of a gene or a genetic locus
A GA Schema
• A “template”
– a string of symbols taken from the alphabet
{0,1,*}
– 010*1, *110*, *****, 10101
• The character “*” means “don’t care”
– *10*1 represents 01001, 01011, 11001,
and 11011
Geometric Interpretation
A Schema is a hyperplane in the larger search space
manifold
Order of a schema
• Number of specified alleles in a gene
?
000 001 010 011
?
010 011 110 111
?
010 110
?
101
Order of a schema
• How many different strings of length N
does a schema of order “O” represent?
– A schema of order O represents 2N-O
different strings of length N
Schema
Order
Represented Strings
***
0
000 001 010 011 100 101 110 111
*1*
1
010 011 110 111
*10
2
010 110
101
3
101
Destructive Dynamics
• Probability of surviving mutation
Sm(H)=
Defining Length
• “D” = The distance between the furthest
two non-* symbols
Schemata
**** *1**
*10* 10**
1*1*
1*11 0**1 1001
D
0
1
2
3
Why is the length important?
Destructive Dynamics
• Probability of surviving single point crossover
Strings containing schemata
• A bit string represented by a schema is
said to “contain” the schema
Bit String
1
00
110
1011
Contained Schemata
1 *
00 0* *0 **
110 11* 1*0 1** *10 *1* **0 ***
1011 101* 10*1 10** 1*11 1*1* 1**1 1***
*011 *01* *0*1 *0** **11 **1* ***1 ****
How many schemata does a string of length N include?
How many schemata in a
population?
• There are 3N different schemata
(potential genes) of length N
• A population of P bit-strings each of
length N contains between 2N and
min(P2N, 3N) schemata
All possible
schemata
N
3
P
100
Number of Schemata
?-?
How many schemata in a
population?
• There are 3N different schemata of
length N
• A population of P bit-strings each of
length N contains between 2N and
min(P2N, 3N) schemata
N=3
All possible
schemata
N
6
20
40
100
P
20
50
100
300
Number of Schemata
64 - 729
1048576 - 52428800
-
Estimating Fitness
associated with a gene
Population
101
100
010
110
f
5
1
2
3
Schemata
***
**0
**1
*0*
*00
*01
*1*
f
(5+1+2+3) / 4 = 2.75
(1+2+3) / 3 = 2
5/1=5
(5+1) / 2 = 3
1/1=1
5/1=5
(2+3)/2 = 2.5
Estimation uncertainty: Standard error
Observations
• If only fitness-proportionate selection is applied (no crossover or
mutation), schemata with above (below) average fitness are
sampled, generation after generation, by an increasing
(decreasing) number of chromosomes.
• Schemata with a long defining length have a higher probability
to be disrupted by crossover
• Schemata with high order have a higher probability of being
disrupted by mutation
• Schemata with a low order and a short defining length are called
building blocks
• Building blocks are processed with minimum disruption by GAs,
therefore GAs use building blocks of relatively high fitness to
build entire solutions
• GAs will be successful insofar as the
problem has been encoded in a way
that can be solved with compact
building blocks (low order, low defining
lengths)
• What is the easiest problem you can
think of?
Dynamics
•
•
•
•
•
H is a schema present in the population at time t
m(H,t) is the number of instances of H at time t
u(H,t) is the observed average fitness of H
expected number of offspring of x is f(x)/favg(t)
If x is an instance of H, then
Destructive Dynamics
• Probability of surviving single point crossover
• Probability of surviving mutation
Sm(H)=
Combining Effects
13
12
GA (Diversity, Tight Linkage)
Best Fitness
11
10
GA (Diversity, Poor Linkage)
9
Parallel Simulated Annealing
Parallel Hillclimber
GA (Roulette, Tight Linkage)
8
7
6
5
Random Search
4
0
500
1000
1500
2000
2500
3000
Evaluations
Generation (x100)
Large defining length and small order = poor linkage
The Building Block Hypothesis
• GAs performs adaptation by identifying
and recombining "building blocks", i.e.
low order, low defining-length schemata
with above average fitness.
• GAs perform adaptation by implicitly
and efficiently implementing this
heuristic.
Caveats
• Model assumes particular form of
representation:
– Bit strings, single point crossover, mutation
• Assumes fitness-proportionate selection
• Assume fixed fitness criterion
• Assumes fixed population size
Many variations have been published
Download