Artificial intelligence 1: informed search

advertisement
Page
What Do the Following 3 Things
Have in Common?
Page
Genetic Algorithms (GAs)
• GAs design jet engines.
• GAs draw criminals.
• GAs program computers.
Page
A Potpourri of Applications
1. General Electric’s Engineous (generalized
engineering optimization).
2. Face space (criminology).
3. Genetic programming (machine learning).
Page
Gas Turbine Design
Jet engine design at General Electric (Powell,
Tong, & Skolkick, 1989)
•
•
•
•
•
Page
Coarse optimization - 100 design variables.
Hybrid GA + numerical optimization + expert system.
Found 2% increase in efficiency.
Spending $250K to test in laboratory.
Boeing 777 design based on these results.
Engineous
A new software system called Engineous combines artificial intelligence
and numerical methods for the design and optimization of complex
aerospace systems. Engineous combines the advanced computational
techniques of genetic algorithms, expert systems, and object-oriented
programming with the conventional methods of numerical optimization
and simulated annealing to create a design optimization environment
that can be applied to computational models in various disciplines.
Engineous has produced designs with higher predicted performance
gains that current manual design processes - on average a 10-to-1
reduction of turnaround time - and has yielded new insights into product
design. It has been applied to the aerodynamic preliminary design of an
aircraft engine turbine, concurrent aerodynamic and mechanical
preliminary design of an aircraft engine turbine blade and disk, a space
superconductor generator, a satellite power converter, and a nuclearpowered satellite reactor and shield.
Page6
Source: Tong, S.S. ; Powell, D. ; Goel, S. Integration of artificial intelligence and numerical optimization techniques for the design
of complex aerospace systems
Engineous
Engineous Software, Inc. provides process integration and design optimization
software solutions and services. The company's iSIGHT software integrates key
steps in the product design process, then automates and executes those steps
through design exploration tools like optimization, DOE, and DFSS techniques.
The FIPER infrastructure links these processes together in a unified environment.
Through FIPER, models, applications and "best" processes are easily shared,
accessed, and executed with other engineers, groups, and partners. Engineous
operates numerous sales offices in the U.S., as well as wholly owned subsidiaries in
Asia and Europe.
Customers include leading Global 500 companies such as Canon, General Electric,
General Motors, Pratt & Whitney, Honeywell, Lockheed Martin, Toshiba, MHI, Ford
Motor Company, Chrysler, Toyota, Nissan, Renault, Hitachi, Peugeot, MTU Aero
Engines, TRW, BMW, Rolls Royce, and Johnson Controls, Inc.
Additional information may be found at www.Engineous.com and
http://components.Engineous.com.
Page7
Recent news on Engineous
Dassault Systèmes to Acquire Engineous Software
06/17/2008
Paris, France, and Providence, R.I., USA, June 17, 2008 – Dassault Systèmes
(DS) (Nasdaq: DASTY; Euronext Paris: #13065, DSY.PA),
a world leader in 3D and Product Lifecycle Management (PLM) solutions
and Engineous Software, a market leader in process automation, integration
and optimization, today announced an agreement in which DS would
acquire Engineous Software.
This acquisition will extend SIMULIA’s leadership in providing Simulation
Lifecycle Management solutions on the V6 IP collaboration platform.
The proposed acquisition, for an estimated price of 40 million USD, should
be completed before the end of July subject to specific closing conditions.
Page8
http://www.3ds.com/company/news-media/press-releases-detail/release//single/1789/?no_cache=1
Criminal-likeness Reconstruction
No closed form fitness
function (Caldwell &
Johnston, 1991).
• Human witness
chooses faces that
match best.
• GA creates new
faces from which
to choose.
Page
What are GAs?
• GAs are biologically inspired class of algorithms that can be
applied to, among other things, the optimization of nonlinear
multimodal functions.
• Solves problems in the same way that nature solves the
problem of adapting living organisms to the harsh realities of
life in a hostile world: evolution.
Let’s watch a video...
Page
What is a Genetic Algorithm (GA)?
A GA is an adaptation procedure based on the
mechanics of natural selection and genetics.
GAs have 2 essential components:
1. Survival of the fittest (selection)
2. Variation
Page
Nature as Problem Solver
Beauty-of-nature
argument
How Life Learned to
Live (Tributsch, 1982,
MIT Press)
Example: Nature as
structural engineer
Page
Owl Butterfly
Page
Evolutionary is Revolutionary!
Street distinction evolutionary vs.
revolutionary is false dichotomy.
3.5 Billion years of evolution can’t be wrong.
Complexity achieved in short time in nature.
Can we solve complex problems as quickly
and reliably on a computer?
Page
Why Bother?
Lots of ways to solve problems:
–Calculus
–Hill-climbing
–Enumeration
–Operations research: linear,
quadratic, nonlinear programming
Why bother with biology?
Page
Combinatorial Problem
• deals with the study of finite or countable discrete structures
• deciding when certain criteria can be met, and constructing and analyzing objects
meeting the criteria
e.g. Satisfiability
Is there a truth assignment to the boolean variables such that every clause is
satisfied?
Page
http://www.cs.sunysb.edu/~algorith/files/satisfiability.shtml
Non-linear Programming
nonlinear programming (NLP) is the process of solving a system of equalities and
inequalities, collectively termed constraints, over a set of unknown real variables,
along with an objective function to be maximized or minimized, where some of the
constraints or the objective function are nonlinear
Consider the following nonlinear
program:
minimise x(sin(3.14159x))
subject to 0 <= x <= 6
In the diagram above there are many local minima; that is, points at the bottom
of some portion of the graph
Page17
http://people.brunel.ac.uk/~mastjjb/jeb/or/nlp.html
Why bother with biology?
Gradient search
technique
Robustness = Breadth + Efficiency.
A hypothetical problem
spectrum:
Page
GAs Not New
Professor of Psychology and Electrical Engineering &
Computer Science
Ph.D. University of Michigan
Area: Cognition and Cognitive Neuroscience
John Holland at University of Michigan pioneered in the
50s.
Other evolutionaries: Fogel, Rechenberg, Schwefel.
Back to the cybernetics movement and early computers.
Reborn in the 70s.
Page
http://www.lsa.umich.edu/psych/people/directory/profiles/faculty/?uniquename=jholland
Genetic algorithms
Variant of local beam search with sexual
recombination.
Page20
How GAs are different from traditional
methods?
1.
2.
3.
4.
Page21
GAs work with a coding of the parameter set, not the
parameter themselves.
GAs search from a population of points, not a single
point.
GAs use payoff (objective function) information, not
derivatives or other auxilliary knowledge.
GAs use probabilistic transition rules, not deterministic
rules.
Traditional Approach
Problem: Maximise the function
f(S1, S2, S3, S4, S5) = sin(S1)2 * sin(S2)2+s3 - loge(S3)*S4-S5
Traditional approach: twiddle with the switch parameters.
y = f(S1, S2, S3, S4, S5)
output
S1, S2, S3, S4, S5
Page22
Setting of five switches
Genetic Algorithm
Problem: Maximise the function
f(S1, S2, S3, S4, S5) = sin(S1)2 * sin(S2)2+s3 - loge(S3)*S4-S5
Natural parameter set of the optimisation problem is
represented as a finite-length string
y = f(S1, S2, S3, S4, S5)
Setting of five switches
f(x)
output
S1, S2, S3, S4, S5
Page 23
GA doesn’t need to know the
workings of the black box.
0
x
31
Main Attractions of Genetic Algorithm
GA
Traditional Optimization Approaches
• Simplicity of operation and power of
effect
• unconstrained
• Limitations: continuity, derivative
existence, unimodality
• work from a rich database of points
simultaneously, climbing many peaks in
parallel
• move gingerly from a single point in
the decision space to the next using some
transition rule
• population of strings = points
• Population of well-adapted diversity
Page24
GA
maximise 2x1 + x2 - 5loge(x1)sin(x2)
subject to
x1x2 <= 10
| x1 - x2 | <= 2
0.1 <= x1 <= 5
0.1 <= x2 <= 3
Source: Nonlinear Programming, by J E Beasley
http://people.brunel.ac.uk/~mastjjb/jeb/or/nlp.html
Page25
• works from a rich database of points
simultaneously, climbing many peaks in
parallel
Page
Genetic Algorithm
GA
• Initial Step: random start using
successive coin flips
GA uses coding
01101
11000
01000
10011
population
• blind to auxiliary information
GAs are blind, only payoff values associated
with individual strings are required
• Searches from a population
Uses probabilistic transition rules to guide their
search towards regions of the search space with
likely improvement
Page27
Page
Genetic Algorithm
REPRODUCTION
• Selection according to fitness
GA uses coding
01101
11000
01000
10011
• Replication
1
population
3
4
Weighted Roulette wheel
Mating pool (tentative population)
CROSSOVER
• Crossover – randomized
information exchange
2
Crossover point k = [1, l-1]
Page
29
GA
builds
solutions from the past partial solutions of the previous trials
Page
Genetic Algorithm
MUTATION
• Reproduction and crossover may become overzealous and lose some potentially
useful genetic material
• Mutation protects against irrecoverable loss; it serves as an insurance policy against
premature loss of important notions
• Mutation rates: in the order of 1 mutation per a thousand bit position transfers
Page31
Page
Genetic Algorithm: Example
Problem: Maximise the function f(x) = x2 on the integer interval [0,
31]
1.
Coding of decision variables as some finite length string
X as binary unsigned integer of length 5
[0, 31] = [00000, 11111]
2. Constant settings
Pmutation=0.0333
Pcross=0.6
Population Size=30
Page33
DeJong(1975) suggests high crossover
Probability, low mutation probability
(inversely proportional to the pop.size), and
A moderate population size
Genetic Algorithm: Example
SAMPLE PROBLEM
• Maximize f(x) = x2; where x is permitted to vary between 0 and 31
3.
Select initial population at random (use even numbered population size)
String
number
Initial
Population
X value
f(x)
pselect
fi
f
Expected
count
fi
f
Actual
count(Roulette
Wheel)
1
01101
13
169
0.14
0.58
1
2
11000
24
576
0.49
1.97
2
3
01000
8
64
0.06
0.22
0
4
10011
19
361
0.31
1.23
1
Page34
Sum
Ave.
Max.
1170
293
576
Genetic Algorithm: Example
SAMPLE PROBLEM
• Maximize f(x) = x2; where x is permitted to vary between 0 and 31
4.
Reproduction: select mating pool by spinning roulette wheel 4 times.
pselect
30.9%
5.5%
14.4%
1
2
3
49.2
Weighted Roulette wheel
Page35
4
01101
11000
01000
10011
0.14
0.49
0.06
0.31
The best get more copies.
The average stay even.
The worst die off.
Choosing offspring for the next generation
int Select(int Popsize, double Sumfitness, Population Pop){
[0,1]
partSum = 0
rand=Random * Sumfitness
j=0
Repeat
j++;
partSum = partSum + Pop[j].fitness
Until (partSum >= rand) or (j = Popsize)
Return j
}
Page36
Genetic Algorithm
SAMPLE PROBLEM
5. Crossover – strings are mated randomly using coin tosses to pair the couples
- mated string couples crossover using coin tosses to select the crossing site
String
number
Mating Pool
after
Reproduction
Mate
(randomly
selected)
Crossover
site
(random)
New
population
X-value
f(x)=x2
1
0110|1
2
4
01100
12
144
2
1100|0
1
4
11001
25
625
3
11|000
4
2
11011
27
729
4
10|011
3
2
10000
16
256
Page37
Sum
Ave.
Max.
1754
439
729
Page
The Genetic Algorithm
1.
Page39
Initialize the algorithm.
Randomly initialize each individual chromosome in
the population of size N (N must be even), and
compute each individual’s fitness.
The Genetic Algorithm
1.
Initialize the algorithm. Randomly initialize each individual
chromosome in the population of size N (N must be even), and
compute each individual’s fitness.
2.
Select N/2 pairs of individuals for crossover. The
probability that an individual will be selected for
crossover is proportional to its fitness.
Page40
The Genetic Algorithm
1.
2.
3.
Page41
Initialize the algorithm. Randomly initialize each individual
chromosome in the population of size N (N must be even), and
compute each individual’s fitness.
Select N/2 pairs of individuals for crossover. The probability that an
individual will be selected for crossover is proportional to its fitness.
Perform crossover operation on N/2 pairs selected
in Step1.
Randomly mutate bits with a small probability
during this operation.
The Genetic Algorithm
1.
2.
3.
4.
Page42
Initialize the algorithm. Randomly initialize each individual chromosome in the
population of size N (N must be even), and compute each individual’s fitness.
Select N/2 pairs of individuals for crossover. The probability that an individual
will be selected for crossover is proportional to its fitness.
Perform crossover operation on N/2 pairs selected in Step1. Randomly
mutate bits with a small probability during this operation.
Compute fitness of all individuals in new population.
The Genetic Algorithm
5. (Optional Optimization)
Select N fittest individuals from combined
population of size 2N consisting of old and new
populations pooled together.
Page43
The Genetic Algorithm
5.
(Optional Optimization) Select N fittest individuals from combined
population of size 2N consisting of old and new populations pooled
together.
6. (Optional Optimization)
Rescale fitness of population.
Page44
The Genetic Algorithm
5.
6.
(Optional Optimization) Select N fittest individuals from combined
population of size 2N consisting of old and new populations pooled
together.
(Optional Optimization) Rescale fitness of population.
7. Determine maximum fitness of individuals in
the population.
If |max fitness – optimum fitness| < tolerance Then
Stop
Else
Go to Step1.
Page45
A Simple GA Example
Page47
Let’s see a
demonstration for
a GA that
maximizes the
function
 x
f ( x)   
c
n =10
c = 230 -1 = 1,073,741,823
Page48
n
Simple GA Example
Function to evaluate:
10
 x 
f ( x)  

 coeff 
Fitness Function
or Objective
Function
coeff – chosen to normalize the x parameter when a bit string of
length lchrom =30 is chosen.
coeff  230  1
When the x value is normalized, the max. value of the function will be:
f ( x)  1.0
This happens when
Page49
x  230  1
for the case when lchrom=30
Test Problem Characteristics
With a string length=30, the search space is
much larger, and random walk or
enumeration should not be so profitable.
There are 230=1.07(1010) points. With over
1.07 billion points in the space, one-at-a-time
methods are unlikely to do very much very
quickly. Also, only 1.05 percent of the points
have a value greater than 0.9.
Page50
Comparison of the functions on the unit
interval
x2
f(x)
0
x
31
x10
f(x)
0
Page51
x
1,073,741,823
Actual Plot
1
0.9
0.8
0.7
f(x)
0.6
x^2
x^10
0.5
0.4
0.3
0.2
0.1
0
0
Page52
0.1
0.2
0.3
0.4
0.5
X
0.6
0.7
0.8
0.9
1
Decoding a String
For every problem, we must create a procedure
that decodes a string to create a parameter (or set
of parameters) appropriate for that problem.
first bit
11010101
Chromosome
1073741823.0
DECODE
Parameter
OBJ FCN
Fitness or Figure of Merit
Page53
GA Parameters
A series of parametric studies [De Jong, 1975] across a
five function suite of optimization problems suggested that
good GA performance requires the choice of:
– High crossover probability
– Low mutation probability (inversely proportional
to the population size)
– Moderate Population Size
(e.g. pmutation=0.0333, pcross=0.6, popsize=30)
Page54
Limits of GA
• GAs are characterized by a voracious appetite for
processing power and storage capacity.
• GAs have no convergence guarantees in arbitrary
problems.
Page55
Limits of GAs
• GAs sort out interesting areas of a space quickly,
but they are a weak method, without the
guarantees of more convergent procedures.
• This does not reduce their utility however. More
convergent methods sacrifice globality and
flexibility for their convergence, and are limited to
a narrow class of problem.
• GAs can be used where more convergent methods
dare not tread.
Page56
Advantages of GAs
• Well-suited to a wide-class of problems
• Do not rely on the analytical properties of
the function to be optimized (such as the
existence of a derivative)
Page57
Advanced GA Architectures
GA + Any Local Convergent Method
– Start search using GA to sort out the
interesting hills in your problem. Once
GA ferrets out the best regions, apply
locally convergent scheme to climb the
local peaks.
Page58
Other Applications
Optimization of a choice of Fuzzy
Logic parameters
Page59
Simple GA Implementation
Initial population of chromosomes
Offspring
Population
Calculate fitness value
Evolutionary
operations
No
Solution
Found?
Yes
Stop
Page60
Based on SGA-C, A C-language Implementation of a
Simple Genetic Algorithm
Let’s see the documentation (pdf file)
Page
Phase 1 – General Initialisation
Initialise Parameters()
Randomize()
Warmup_Random()
Advance_Random()
Initialise Population()
Decode()
Objective Function()
Page62
Phase 2 – Generation of Chromosomes
Generation()
SelectIndividual()
CrossOver()
Mutation()
Page63
Running the GA System
Gen = 0
Initialize( OldPop )
Do
Gen = Gen + 1
Generation( OldPop, NewPop )
For ii = 1 To PopSize
OldPop(ii) = NewPop(ii) 'advance the generation
Next ii
Loop Until ( (Gen > MaxGen) or (MaxFitness > DesiredFitness) )
Page64
Initialisation
Initialise Parameters
PopSize = 30
'population size
lchrom = 30
'chromosome length
MaxGen = 10
PCross = 0.6
PMutation = 0.0333
ReDim GenStat(1 To (MaxGen + 1))
'Initialize random number generator
Randomize
'Initialize counters
NMutation = 0
NCross = 0
Page65
Randomization
Randomize
Sub Randomize()
Randomize Timer
Warmup_Random (Rnd * 1)
End Sub
Page66
[0,1]
Randomization
Warmup_Random()
[0, 1]
Sub Warmup_Random(RandomSeed As Single)
Dim j1 As Integer
Dim ii As Integer
Dim NewRandom As Single
Dim PrevRandom As Single
[0, 1]
OldRand(55) = RandomSeed
NewRandom = 0.000000001
PrevRandom = RandomSeed
Page67
For j1 = 1 To 54
ii = (21 * j1) Mod 55 'multiply first, before modulus
OldRand(ii) = NewRandom
NewRandom = PrevRandom - NewRandom
If (NewRandom < 0) Then NewRandom = NewRandom + 1
PrevRandom = OldRand(ii)
Next j1
Advance_Random
Advance_Random
Advance_Random
jrand = 0
End Sub
Randomization
Advance_Random()
Sub Advance_Random()
Dim j1 As Integer
Dim New_Random As Single
For j1 = 1 To 24
New_Random = OldRand(j1) - OldRand(j1 + 31)
If (New_Random < 0) Then New_Random = New_Random + 1
OldRand(j1) = New_Random
Next j1
For j1 = 25 To 55
New_Random = OldRand(j1) - OldRand(j1 - 24)
If (New_Random < 0) Then New_Random = New_Random + 1
OldRand(j1) = New_Random
Next j1
End Sub
Page68
Max: 24+31=55
Max: 55-24=31
Random
'Fetch a single random number between 0.0 and 1.0 Subtractive Method
'See Knuth, D. (1969), v. 2 for details
Function Random() As Single
jrand = jrand + 1
If jrand > 55 Then
jrand = 1
Advance_Random
End If
Random = OldRand(jrand)
End Function
Page69
Knuth, 1969. D.E. Knuth The Art of Computer Programming 2, Addison-Wesley, Reading, MA (1969).
Initialise Population
InitPop()
Sub InitPop()
Dim j As Integer
Dim j1 As Integer
For j = 1 To PopSize
With OldPop(j)
Max: 24+31=55
For j1 = 1 To lchrom
.Chromosome(j1) = Flip(0.5)
Next j1
.x = Decode(.Chromosome, lchrom) 'decode the string
.Fitness = ObjFunc(.x) 'evaluate initial fitness
.Parent1 = 0
.Parent2 = 0
.XSite = 0
End With
Next j
End Sub
Page70
Initialise Population
Decode()
'decodes the string to create a parameter or set of parameters
'appropriate for that problem
'Decode string as unsigned binary integer: true=1, false=0
Function Decode(Chrom() As Boolean, lbits As Integer) As Single
Dim j As Integer
Dim Accum As Single
Dim PowerOf2 As Single
Accum = 0
PowerOf2 = 1
For j = 1 To lbits
If Chrom(j) Then Accum = Accum + PowerOf2
PowerOf2 = PowerOf2 * 2
Next j
Decode = Accum
Binary
End Function
Page71
to decimal conversion
Initialise Population
Objective Function()
'Fitness function = f(x) = (x/c) ^ n
Function ObjFunc(x As Single) As Single 'coef = (2 ^ 30)-1 = 1073741823
'coef is chosen to normalize the x parameter when a bit string of
length lchrom=30 is chosen
'since the x value has been normalized, the maximum value of the fcn wil be f(x)=1,
'when x=(2^30)-1, for the case when lchrom=30
Const coef As Single = 1073741823 'coefficient to normalize domain
Const n As Single = 10 'power of x
ObjFunc = (x / coef) ^ n
End Function
Page72
Generation of Chromosomes
SelectIndividual()
Function SelectIndividual(PopSize As Integer, SumFitness As Single, Pop() As
IndividualType) As Integer
Dim RandPoint As Single
Dim PartSum As Single
Select a single individual or offspring for
Dim j As Integer
PartSum = 0
j=0
RandPoint = Random * SumFitness
the next generation via roulette wheel
selection
Do 'find wheel slot
j=j+1
PartSum = PartSum + Pop(j).Fitness
Loop Until ((PartSum >= RandPoint ) Or (j = PopSize))
SelectIndividual = j
End Function
Page73
CrossOver()
Generation of Chromosomes
Function CrossOver(Parent1() As Boolean, Parent2() As Boolean, Child1() As Boolean, Child2() As Boolean,
lchrom As Integer, NCross As Integer, NMutation As Integer, jcross As Integer, Pcross, PMutation As Single)
Dim j As Integer
If (Flip(PCross)) Then
jcross = Rndx(1, lchrom - 1) 'cross-over site is selected between 1 and the last cross site
NCross = NCross + 1
Else ‘ use full-length string l, and so a bit-by-bit mutation will take place despite the absence of a cross
jcross = lchrom
End If
For j = 1 To jcross ‘ 1st exchange, 1 to 1 and 2 to 2
Child1(j) = Mutation(Parent1(j), PMutation, NMutation)
Child2(j) = Mutation(Parent2(j), PMutation, NMutation)
Next j
If jcross <> lchrom Then ‘ 2nd exchange, 1 to 2 and 2 to 1
For j = jcross + 1 To lchrom
Child1(j) = Mutation(Parent2(j), PMutation, NMutation)
Child2(j) = Mutation(Parent1(j), PMutation, NMutation)
Next j
End If
CrossOver OldPop(Mate1).Chromosome, OldPop(Mate2).Chromosome, _
End Function
NewPop(j).Chromosome, NewPop(j + 1).Chromosome, _
lchrom, NCross, NMutation, jcross, PCross, PMutation
Page74
Generation of Chromosomes
Mutation()
'Mutate an allele with PMutation, count number of mutations
Function Mutation(Alleleval As Boolean, PMutation As Single, NMutation As Integer)
As Boolean
Dim Mutate As Boolean
Mutate = Flip(PMutation)
If Mutate Then
NMutation = NMutation + 1
Mutation = Not Alleleval
Else
Mutation = Alleleval
End If
End Function
Page75
Function Flip(Probability As Single) As
Boolean
If Probability = 1 Then
Flip = True
Else
Flip = (Rnd <= Probability)
End If
End Function
Generation of Chromosomes
Generation()
Sub Generation()
Dim j As Integer
Dim Mate1 As Integer
Dim Mate2 As Integer
Dim jcross As Integer
j=1
Do
'Pick a pair of mates
Mate1 = SelectIndividual(PopSize, SumFitness, OldPop)
Mate2 = SelectIndividual(PopSize, SumFitness, OldPop)
With NewPop(j + 1)
.x = Decode(.Chromosome, lchrom)
.Fitness = ObjFunc(.x)
.Parent1 = Mate1
.Parent2 = Mate2
.XSite = jcross
End With
j = j + 2 'increment population index
Loop Until (j > PopSize)
End Sub
'Crossover and mutation - mutation embedded within crossover
CrossOver OldPop(Mate1).Chromosome, OldPop(Mate2).Chromosome, _
NewPop(j).Chromosome, NewPop(j + 1).Chromosome, _
lchrom, NCross, NMutation, jcross, PCross, PMutation
'Decode string, evaluate fitness & record parentage date on both children
With NewPop(j)
.x = Decode(.Chromosome, lchrom)
.Fitness = ObjFunc(.x)
.Parent1 = Mate1
.Parent2 = Mate2
.XSite = jcross
End
With
Page76
What sort of traits do fit chromosomes carry to
survive generations after generations of mating
and mutation?
Page
Schema
A Schema is a similarity template describing a
subset of strings with similarities at certain
string positions.
We can think of it as a pattern matching device:
a schema matches a particular string if at every
location in the schema a 1 matches a 1 in the
string, or a 0 matches a 0, or an * matches either.
Page78
Notation: Schema
For a binary alphabet {0, 1}, we motivate a schema by
appending a special symbol *, or don’t care symbol, producing
a ternary alphabet:
V={ 0, 1, * };
* asterisk is a don’t care symbol which matches
either a 0 or a 1 at a particular position.
This allows us to build schemata:
e.g.
Page79
H = *11*0**
String A = 0111000
Understanding the building blocks of future solutions
Schema Properties
Schemata and their properties serve as notational
devices for rigorously discussing and classifying
string similarities.
They provide the basic means for analyzing the net
effect of reproduction and genetic operators on
the building blocks contained within the population.
Page80
Schema Matching
A bit string matches a particular schemata if that
bit string can be constructed from the schemata
by replacing the symbol with the appropriate bit
value.
e.g.
H = *11*0**
String A = 0111000
String A is an example of the schema H
because the string alleles ai match
schema positions hi at the fixed positions
2, 3 and 5.
Page81
Understanding the building blocks of future solutions
Schema Properties
Defining Length of Schema:
δ(H) – is the distance between the first
and last specific string position
011*1**
Order of Schema:
o(H) – is the number of fixed positions
present in the template
0******
Page 82
Schema Defining Length:
δ(H) = 5-1 = 4
Schema Order:
o(011*1**) = 4
Schema Defining Length: δ(H) = 0, because
there is only one fixed position
Schema Order:
o(0******) = 1
Effects of Crossover on a Schema
What happens to a particular schema when crossover is
introduced?
Crossover leaves a schema unscathed if it does not cut the
schema; otherwise it disrupts it.
e.g.
H1 = 1***0
H2 = **11*
H1 is likely to be disrupted by crossover.
H2 is relatively unlikely to be destroyed.
Page83
Effects of Mutation on a Schema
What happens to a particular schema when mutation is
introduced?
Mutation at normal, low rates does not disrupt a particular
schema very frequently.
e.g.
H1 = 110*0
H2 = **11*
H1 is relatively more likely to be disrupted by mutation (many
alleles that can be altered).
H2 is relatively unlikely to be mutated because it has a lower
order than H1.
Page84
Building blocks
Who shall live and who shall die?
Highly-fit, low-order, short-defining-length schemata
(building blocks)
are propagated generation to generation by giving
exponentially increasing samples to the observed best;
All these goes in parallel with no special bookkeeping or
special memory other than our population of n strings.
Page85
How do we choose a good coding?
Page
Principles for Choosing a GA Coding
1
Principle of Meaningful Blocks
The user should select a coding so that short, low-order
schemata are relevant to the underlying problem and
relatively unrelated to schemata over other fixed
positions.
When we design a coding, we should check the distances
between related bit positions.
Page87
Principles for Choosing a GA Coding
2
Principle of Minimal Alphabets
The user should select the smallest alphabet that permits
a natural expression of the problem.
In our previous examples, we were using binary coding.
Was that a good choice then?
Page88
Principles for Choosing a GA Coding
Example: Refer to our previous problem
• Maximize f(x) = x2; where x is permitted to vary between 0 and 31
Binary
String
Value X
Non-binary
String
Fitness
01101
13
N
169
11000
24
Y
576
01000
8
I
64
10011
19
T
361
We want to represent 32 possible values: [0, 31]
Non-binary Coding: 26-letter alphabet {A-Z}, six digits {1-6}
Binary Coding: {0, 1}
Page89
Number of Schemata for a GA Coding
Example: Refer to our previous problem
• Maximize f(x) = x2; where x is permitted to vary between 0 and 31
Binary String
Value X
Non-binary
String
Fitness
01101
13
N
169
11000
24
Y
576
01000
8
I
64
10011
19
T
361
We want to represent 32
possible values: [0, 31]
Non-binary Coding: 26-letter alphabet {A-Z}, six digits {1-6}
Binary Coding: {0, 1}
For equality of the number of points in each space, we require:
Where l
Page90
= binary code length, m = non-binary code length
2l = km
Number of Schemata for a GA Coding
Example: Refer to our previous problem
• Maximize f(x) = x2; where x is permitted to vary between 0 and 31
We want to represent 32 possible values: [0, 31]
Non-binary Coding: 26-letter alphabet {A-Z}, six digits {1-6}
Binary Coding: {0, 1}
For equality of the number of points in each space, we require:
Where l
= binary code length, m = non-binary code length
The number of schemata may then be calculated as:
Binary case:
3l
; 2+1 (asterisk) alphabets
Non-Binary case: (k+1)m
91
Page
; k+1 (asterisk) alphabets
2l = km
Number of Schemata for a GA Coding
Example: Refer to our previous problem
• Maximize f(x) = x2; where x is permitted to vary between 0 and 31
We want to represent 32 possible values: [0, 31]
Non-binary Coding: 26-letter alphabet {A-Z}, six digits {1-6}
Binary Coding: {0, 1}
For equality of the number of points in each space, we require:
Where l
2l = km
= binary code length, m = non-binary code length
The number of schemata may then be calculated as:
Binary case:
3l
; 2+1 (asterisk) alphabets
Non-Binary case: (k+1)m
; k+1 (asterisk) alphabets
The binary alphabet offers the maximum number of schemata per bit of information
of any coding. Since these similarities are the essence of our search, when we design
92
a code, we should ,maximise the number of them available for the GA to exploit.
Page
Genetic algorithms: Exercise
Think of a GA approach to solving
the 8-Queens problem
Page93
Genetic algorithms
Variant of local beam search with sexual recombination.
Fitness function: number of non-attacking pairs of queens (min = 0, max = 8 × 7/2 = 28);
Page94
24/(24+23+20+11) = 31%
23/(24+23+20+11) = 29% etc
Why use Fitness Scaling?
At the start of the GA run, it is common to have a few extraordinary
individuals in a population of mediocre colleagues.
If left to the selection rule
pselecti =
f
i
f
the extraordinary individuals would
take over a significant proportion of the
finite population in a single generation.
Page95
This is undesirable,
leading to a premature
convergence!
Why use Fitness Scaling?
Late in the run, there may still be significant diversity within the
population. However, the population’s average fitness may be close to
the population’s best fitness.
If this is left alone,
• average members get nearly the same number of
copies in future generations, and
• the survival of the fittest necessary for improvement
becomes a random walk among the mediocre.
In both cases, at the beginning of the run, and as the run
matures, fitness scaling can help.
Page96
Why use Fitness Scaling?
Linear Scaling
Scaled
fitness
f’ = a f + b
Raw
fitness
In all cases, we want f’ave = fave because
subsequent use of the selection procedure will ensure
that each average population member contributes one
expected offspring to the next generation.
Page97
Why use Fitness Scaling?
To control the number of offspring given to the population member with the
maximum raw fitness, we choose the other scaling relationship to obtain a
scaled maximum fitness.
Scaled Maximum Fitness:
f’max = cmult * fave
For a typical population size of n = 50 to 100,
Cmult = [1.2, 2] has been used successfully
Page98
Number of expected
copies desired for the
best population
member
Fitness Scaling
Linear Scaling Under Normal Conditions
Scaled fitness
f’max
f’ave
f’min
fmin fave
Page99
fmax
Raw fitness
Problem with Linear Scaling
Toward the end of a run, the choice of Cmult stretches the raw fitness values
significantly.
This may in turn cause difficulty in applying the linear scaling rule.
The effects of the Linear Scaling rule works during the initial run of the GA:
• few extraordinary individuals get scaled down, and
• the lowly members of the population get scaled up
The problem: As the run matures, points with low
fitness can be scaled to negative values!
Page100
The stretching required on the relatively close average
and maximum raw fitness values causes the low fitness
values to go negative after scaling.
See for yourself, TestGA2.xls
Why use Fitness Scaling?
Difficult situation for linear scaling in mature run
f’max
f’ave
f’min
0
Raw fitness
fmin fave
Page101
fmax
Negative fitness violates
non-negativity
requirement!
Fitness Scaling
If it’s possible to scale to the desired multiple, Cmult
Then
Perform linear scaling
Else
Scaling is performed by pivoting about the
average value and stretching the fitness until the
minimum value maps to zero.
Page102
Scaling
Non-negative test:
If( min > (fmultiple*avg - max) / (fmultiple - 1.0) ) {
Perform Normal Scaling
}
Page103
Eliminating negative fitness values
Solution:
When we cannot scale to the desired cmult, we still
maintain equality of the raw and scaled fitness averages
and we map the minimum raw fitness fmin to a scaled
fitness f’min = 0.
Description of Routines:
Prescale – takes the average, maximum and minimum raw
fitness values and calculates linear scaling of the coefficients a and
b based on the logic described previously. It takes into account
whether the desired cmult can be reached or not.
Scalepop – called after Prescaling is done. It scales all the
individual raw fitness values using the function Scale.
Page104
Fitness Scaling
procedure scalepop(popsize:integer; var max, avg, min, sumfitness:real;
var pop:population);
{ Scale entire population }
var j:integer;
a, b:real; { slope & intercept for linear equation }
begin
prescale(max, avg, min, a, b); { Get slope and intercept for function }
sumfitness := 0.0;
for j := 1 to popsize do with pop[j] do begin
fitness := scale(objective, a, b);
sumfitness := sumfitness + fitness;
end;
function scale(u, a, b:real):real;
end;
{ Scale an objective function value }
begin scale := a * u + b end;
Page105
Fitness Scaling
{ scale.sga: contains prescale, scale, scalepop for scaling fitnesses }
procedure prescale (umax, uavg, umin:real; var a, b:real);
{ Calculate scaling coefficients for linear scaling }
const fmultiple = 2.0; { Fitness multiple is 2 }
var delta:real;
{ Divisor }
begin
if umin > (fmultiple*uavg - umax) / (fmultiple - 1.0) { Non-negative test }
then begin { Normal Scaling }
delta := umax - uavg;
Linear Scaling
a := (fmultiple - 1.0) * uavg / delta;
b := uavg * (umax - fmultiple*uavg) / delta;
end else begin { Scale as much as possible }
delta := uavg - umin;
Stretch fitness until
a := uavg / delta;
Minimum maps to zero.
b := -umin * uavg / delta;
end;
end;
Let’s try to solve an
Page106
example using a
stored GA run.
Why Scaling?
Simple scaling helps prevent the early domination
of extraordinary individuals, while it later on
encourages a healthy competition among near
equals.
Page107
Stretch as much as possible
RAW Fitness
See GA-Scaling-Goldberg.xls
1
0.9
0.8
0.7
0.6
0.5
RAW Fitness
0.4
0.3
0.2
0.1
Raw Fitness vs. Scaled Fitness
(Multiple Scaling)
0
0
2
4
6
8
10
1.20
1.00
0.80
0.60
Scaled Fitness
0.40
0.20
0.00
0
Page108
0.2
0.4
0.6
0.8
1
Linear Scaling
RAW Fitness
Raw Fitness vs. Scaled Fitness
(Normal Scaling)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
1.40
1.20
1.00
0.80
0.60
Scaled Fitness
0
2
4
6
8
10
0.40
0.20
0.00
-0.20
0
0.2
0.4
0.6
0.8
1
-0.40
See GA-Scaling-Goldberg.xls
Page109
This sample shows a violation of the nonnegativity requirement. This will never
happen had we followed the multiple scaling
scheme described previously
Page
Multiparameter, Mapped, Fixed-Point Coding
Single U1 Parameter (l1 = 4)
0 0 0 0 -> Umin
1 1 1 1 -> Umax
Others map linearly in between
Multiparameter Coding (10 parameters)
0 0 0 0 | 0 0 0 0 | ....... | 0 0 0 0 | 0 0 0 0 |
U1
U2
U9
U10
...
To construct a multiparameter coding, we can simply concatenate as many single
parameter codings as we require. Each coding may have its own sublength, its
own Umax and Umin values.
Page
111
Multiparameter, Mapped, Fixed-Point Coding
Extract a substring from a full string
procedure extract_parm(var chromfrom, chromto:chromosome;
var jposition, lchrom, lparm:integer);
{ Extract a substring from a full string }
var j, jtarget:integer;
begin
j := 1;
jtarget := jposition + lparm - 1;
if jtarget > lchrom then jtarget := lchrom; { Clamp if excessive }
while (jposition <= jtarget) do begin
chromto[j] := chromfrom[jposition];
jposition := jposition + 1;
j := j + 1;
end;
end;
Page112
Multiparameter, Mapped, Fixed-Point Coding
Map an unsigned binary integer to range [minparm, maxparm]
function map_parm(x, maxparm, minparm, fullscale: real) real;
begin
map_parm := minparm + (maxparm – minparm) / fullscale*x
end;
Page113
Multiparameter, Mapped, Fixed-Point Coding
Coordinates the decoding of all nparms parameters.
function decode_parms(var nparms, lchrom: integer;
var chrom: chromosome;
var parms: parmspecs);
Var j, jposition: integer;
Chromtemp:chromosome; {temporary string buffer}
begin
j:= 1; {parameter counter}
jposition := 1 ; {string position counter}
repeat
with parms[j] do if lparm > 0 then begin
extract_parm(chrom, chromtemp, jposition. lchrom, lparm);
parameter :=
map_parm(decode(chromtemp,
lparm), maxparm, minparm,
power(2.0, lparm)-1.0);
end else parameter := 0;
j := j + 1;
until j > nparms;
Page114
end;
Multiparameter, Mapped, Fixed-Point Coding
Decode a single parameter
function decode(chrom:chromosome; lbits:integer):real;
{ Decode string as unsigned binary integer - true=1, false=0 }
var j:integer;
accum, powerof2:real;
begin
accum := 0.0; powerof2 := 1;
for j := 1 to lbits do begin
if chrom[j] then accum := accum + powerof2;
powerof2 := powerof2 * 2;
end;
decode := accum;
end;
Page115
References
Genetic Algorithms: Darwin-in-a-box
Presentation by Prof. David E. Goldberg
Department of General Engineering
University of Illinois at Urbana-Champaign
deg@uiuc.edu
Neural Networks and Fuzzy Logic Algorithms
by Stephen Welstead
Soft Computing and Intelligent Systems Design
by Fakhreddine Karray and Clarence de Silva
Page116
References
http://lancet.mit.edu/~mbwall/presentations/IntroToGAs/P00
1.html
A Genetic Algorithm Tutorial
Darrell Whitley
Computer Science Department
Page117
Download