Evolutionary Algorithms
Andrea G. B. Tettamanzi
Andrea G. B. Tettamanzi, 2002
Contents of the Lectures
• Taxonomy and History;
• Evolutionary Algorithms basics;
• Theoretical background;
• Outline of the various techniques: plain genetic algorithms, evolutionary programming, evolution strategies, genetic programming;
• Practical implementation issues;
• Evolutionary algorithms and soft computing;
• Selected applications from the biological and medical area;
• Summary and Conclusions.
Bibliography








• Th. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.
• L. Davis. The Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991.
• D. B. Fogel. Evolutionary Computation. IEEE Press, 1995.
• D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
• J. Koza. Genetic Programming. MIT Press, 1992.
• Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. 3rd ed., Springer-Verlag, 1996.
• H.-P. Schwefel. Evolution and Optimum Seeking. Wiley & Sons, 1995.
• J. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 1992.
Taxonomy (1)
Stochastic optimization methods include:
• Monte Carlo methods
• Simulated Annealing
• Taboo Search
• Evolutionary Algorithms, comprising:
  – Genetic Algorithms
  – Evolution Strategies
  – Evolutionary Programming
  – Genetic Programming
Taxonomy (2)
Distinctive features of Evolutionary Algorithms:
• operate on an appropriate encoding of solutions;
• population-based search;
• no regularity conditions required;
• probabilistic transitions.
History (1)
• John H. Holland — University of Michigan, Ann Arbor, ’60s (genetic algorithms)
• L. Fogel — UC San Diego, ’60s (evolutionary programming)
• I. Rechenberg, H.-P. Schwefel — TU Berlin, ’60s (evolution strategies)
• John Koza — Stanford University, ’80s (genetic programming)
History (2)
1859 Charles Darwin: inheritance, variation, natural selection
1957 G. E. P. Box: random mutation & selection for optimization
1958 Fraser, Bremermann: computer simulation of evolution
1964 Rechenberg, Schwefel: mutation & selection
1966 Fogel et al.: evolving automata - “evolutionary programming”
1975 Holland: crossover, mutation & selection - “reproductive plan”
1975 De Jong: parameter optimization - “genetic algorithm”
1989 Goldberg: first textbook
1991 Davis: first handbook
1993 Koza: evolving LISP programs - “genetic programming”
Evolutionary Algorithms Basics
• what an EA is (the metaphor)
• object problem and fitness
• the ingredients
• schemata
• implicit parallelism
• the Schema Theorem
• the building-blocks hypothesis
• deception
The Metaphor
EVOLUTION          PROBLEM SOLVING
Environment    ↔   Object problem
Individual     ↔   Candidate solution
Fitness        ↔   Quality
Object problem and Fitness
Object problem: minimize c(s) over s ∈ S, where c : S → R.
A genotype is decoded by a map M into a candidate solution s, and the fitness f is derived from the objective value c(s).
(diagram: genotype → M → solution s → c → f)
The Ingredients
(diagram: the population at time t is transformed into the population at time t+1 by selection and reproduction, the latter comprising recombination and mutation)
The Evolutionary Cycle
Population → (Selection) → Parents → (Reproduction: recombination, mutation) → Offspring → (Replacement) → Population
Pseudocode
generation = 0;
SeedPopulation(popSize);         // at random or from a file
while (!TerminationCondition())
{
    generation = generation + 1;
    CalculateFitness();          // ... of new genotypes
    Selection();                 // select genotypes that will reproduce
    Crossover(pcross);           // mate pcross of them on average
    Mutation(pmut);              // mutate all the offspring with Bernoulli
                                 // probability pmut over genes
}
A Sample Genetic Algorithm
• the MAXONE problem
• genotypes are bit strings
• fitness-proportionate selection
• one-point crossover
• flip mutation (transcription error)
The MAXONE Problem
Problem instance: a string γ of l binary cells, γᵢ ∈ {0, 1}.
Fitness: f(γ) = Σ_{i=1}^{l} γᵢ
Objective: maximize the number of ones in the string.
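In Python the fitness is one line (the genotype is represented here as a list of 0/1 ints):

```python
def fitness(gamma):
    """MAXONE fitness: the number of ones in the bit string."""
    return sum(gamma)
```

The all-ones string of length l is the unique optimum, with f = l.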
Fitness Proportionate Selection
Probability of γ being selected:
P(γ) = f(γ) / Σ_{γ′} f(γ′)
Implementation: “roulette wheel”.
One Point Crossover
parents:   000|1111010   101|1001100
offspring: 000|1001100   101|1111010
(the crossover point, here after the third bit, is chosen at random; the tails are swapped)
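One-point crossover in Python (the optional point argument is an addition for testing; by default the cut is random):

```python
import random

def one_point_crossover(p1, p2, point=None):
    """Cut both parents at the same point and swap the tails."""
    if point is None:
        point = random.randint(1, len(p1) - 1)   # cut strictly inside
    return p1[:point] + p2[point:], p2[:point] + p1[point:]
```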
Mutation
before: 1011001101
after:  1011101100   (bits 5 and 10 flipped)
independent Bernoulli transcription errors, with probability pmut per gene
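Flip mutation in Python:

```python
import random

def mutate(gamma, pmut):
    """Flip each bit independently with Bernoulli probability pmut."""
    return [1 - bit if random.random() < pmut else bit for bit in gamma]
```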
Example: Selection
String        f    Cf    P
0111011011    7     7    0.125
1011011101    7    14    0.125
1101100010    5    19    0.089
0100101100    4    23    0.071
1100110011    6    29    0.107
1111001000    5    34    0.089
0110001010    4    38    0.071
1101011011    7    45    0.125
0110110000    4    49    0.071
0011111101    7    56    0.125
Random sequence: 43, 1, 19, 35, 15, 22, 24, 38, 44, 2
Example: Recombination & Mutation
selected (| = crossover point)  →  after recombination  →  after mutation   f
0111011011                         0111011011              0111111011       f=8
0111011011                         0111011011              0111011011       f=7
110|1100010                        1100101100              1100101100       f=5
010|0101100                        0101100010              0101100010       f=4
1|100110011                        1100110011              1100110011       f=6
1|100110011                        1100110011              1000110011       f=5
0110001010                         0110001010              0110001010       f=4
1101011011                         1101011011              1101011011       f=7
011000|1010                        0110001011              0110001011       f=5
110101|1011                        1101011010              1101011010       f=6
TOTAL = 57
Schemata
Don’t-care symbol: *
Example schema: * * 1 0 * 1 * * * *
order of a schema: o(S) = number of fixed positions
defining length: δ(S) = distance between the first and the last fixed position
a schema S matches 2^(l − o(S)) strings
a string of length l is matched by 2^l schemata
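These schema quantities are straightforward to compute (Python sketch, with '*' as the don't-care symbol):

```python
def order(schema):
    """o(S): number of fixed (non-*) positions."""
    return sum(1 for c in schema if c != '*')

def defining_length(schema):
    """delta(S): distance between first and last fixed position."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """A string is matched if it agrees with S on every fixed position."""
    return all(s == '*' or s == c for s, c in zip(schema, string))
```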
Implicit Parallelism
In a population of n individuals of length l:
2^l ≤ number of schemata processed ≤ n · 2^l,
of which n³ are processed usefully (Holland 1989),
i.e., are not disrupted by crossover and mutation.
But see A. Bertoni & M. Dorigo (1993), “Implicit Parallelism in Genetic Algorithms”, Artificial Intelligence 61(2), pp. 307–314.
Fitness of a schema
f(γ): fitness of string γ
q_x(γ): fraction of strings equal to γ in population x
q_x(S): fraction of strings matched by S in population x
f_x(S) = (1 / q_x(S)) · Σ_{γ∈S} q_x(γ) f(γ)
The Schema Theorem
{X_t}_{t=0,1,…}: the populations at times t.
Suppose that ( f_{X_t}(S) − f(X_t) ) / f(X_t) ≥ c is constant; then
E[ q_{X_t}(S) | X_0 ] ≥ q_{X_0}(S) · (1 + c)^t · ( 1 − p_cross · δ(S)/(l − 1) − o(S) · p_mut )^t
i.e. above-average schemata increase exponentially!
The Schema Theorem (proof)
E[ q_{X_t}(S) | X_{t−1} ] ≥ q_{X_{t−1}}(S) · ( f_{X_{t−1}}(S) / f(X_{t−1}) ) · P_surv[S] ≥ q_{X_{t−1}}(S) · (1 + c) · P_surv[S]
P_surv[S] ≥ 1 − p_cross · δ(S)/(l − 1) − p_mut · o(S)
The Building Blocks Hypothesis
‘‘An evolutionary algorithm seeks near-optimal performance
through the juxtaposition of short, low-order, high-performance
schemata — the building blocks’’
Deception
i.e. when the building-block hypothesis does not hold:
for some schema S containing the optimum γ*, there is a competing schema S′ with f(S′) > f(S).
Example:
S1 = 111*******,  S2 = ********11,  γ* = 1111111111
S  = 111*****11 (contains γ*),  S′ = 000*****00: deception if f(S′) > f(S)
Remedies to deception
• prior knowledge of the objective function → non-deceptive encoding;
• inversion → semantics of genes no longer positional;
• underspecification & overspecification → “messy genetic algorithms”.
Theoretical Background
• Theory of random processes;
• Convergence in probability;
• Open question: rate of convergence.
Events
(diagram: the sample space Ω contains elementary outcomes ω; events such as A and B are subsets of Ω)
Random Variables
A random variable is a function X : Ω → R mapping each outcome ω ∈ Ω to a real number X(ω). (diagram)
Stochastic Processes
A sequence of random variables X₁, X₂, …, X_t, …, each with its own probability distribution.
Notation: {X_t(ω)}_{t = 0, 1, …}
EAs as Random Processes
(Ω, F, P): a probability space — the source of the “random numbers” ω;
x ∈ Γ^(n): a population, i.e. a sample of size n from the genotype space Γ;
{X_t(ω)}_{t = 0, 1, …}: the evolutionary process, whose trajectory through the space of populations is determined by ω.
Markov Chains
A stochastic process {X_t(ω)}_{t = 0, 1, …} is a Markov chain iff, for all t,
P[ X_t = x | X_0, X_1, …, X_{t−1} ] = P[ X_t = x | X_{t−1} ]
(diagram: a three-state chain with states A, B, C and transition probabilities 0.4, 0.6, 0.3, 0.7, 0.25, 0.75)
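A Markov chain is easy to simulate; in this Python sketch the three-state transition probabilities are illustrative assumptions (the slide's diagram is only partially recoverable):

```python
import random

# Illustrative 3-state chain; each row gives the next-state distribution.
P = {
    'A': {'A': 0.4, 'B': 0.6, 'C': 0.0},
    'B': {'A': 0.0, 'B': 0.3, 'C': 0.7},
    'C': {'A': 0.75, 'B': 0.0, 'C': 0.25},
}

def step(state):
    """Draw the next state from the row for the current state only:
    the Markov property - history beyond X_{t-1} is irrelevant."""
    r = random.random()
    cumulative = 0.0
    for nxt, p in P[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return state
```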
Abstract Evolutionary Algorithm
select, mate, cross, mutate, and insert are stochastic functions: their result also depends on the random element ω ∈ Ω.
Composing them yields the transition function:
X_{t+1}(ω) = T_t(ω) X_t(ω)
(diagram: X_t → select → mate → cross → mutate → insert → X_{t+1})
Convergence to Optimum
Theorem: if {X_t(ω)}_{t = 0, 1, …} is monotone and homogeneous, x₀ is given, and from every y ∈ reach(x₀) the set O^(n) of optimal populations is reachable, then
lim_{t→∞} P[ X_t ∈ O^(n) | X_0 = x₀ ] = 1.
Theorem: if select and mutate are generous, the neighborhood structure is connective, and the transition functions T_t(ω), t = 0, 1, …, are i.i.d. and elitist, then
lim_{t→∞} P[ X_t ∈ O^(n) ] = 1.
Outline of various techniques
• Plain Genetic Algorithms
• Evolutionary Programming
• Evolution Strategies
• Genetic Programming
Plain Genetic Algorithms
• Individuals are bit strings
• Mutation as transcription error
• Recombination is crossover
• Fitness-proportionate selection
Evolutionary Programming
• Individuals are finite-state automata
• Used to solve prediction tasks
• State-transition table modified by uniform random mutation
• No recombination
• Fitness depends on the number of correct predictions
• Truncation selection
Evolutionary Programming: Individuals
Finite-state automaton: (Q, q₀, A, δ, ω)
• set of states Q;
• initial state q₀;
• set of accepting states A;
• alphabet of symbols Σ;
• transition function δ: Q × Σ → Q;
• output mapping function ω: Q × Σ → Σ.
(diagram: a three-state automaton q₀, q₁, q₂ whose arcs are labelled input/output, e.g. a/a, b/c, c/b, a/b, b/a, c/c, c/a)
Transition table (next state / output for each state and input):
          q0      q1      q2
input a:  q0/a    q2/b    q1/b
input b:  q1/c    q1/a    q0/c
input c:  q2/b    q0/c    q2/a
Evolutionary Programming: Fitness
(diagram) The individual δ reads a known sequence of symbols (e.g. a, b, c, a, b, c, …) and at each step outputs a prediction of the next symbol; each correct prediction increases the fitness: f(δ) ← f(δ) + 1.
Evolutionary Programming: Selection
Variant of stochastic q-tournament selection:
• each individual γ is compared against q randomly chosen opponents γ₁, …, γ_q;
• score(γ) = #{ i | f(γ) > f(γᵢ) };
• order individuals by decreasing score;
• select the first half (truncation selection).
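The scheme can be sketched in Python (q = 10 and drawing opponents from the whole population are assumptions):

```python
import random

def ep_select(population, fitness, q=10):
    """Keep the better half, ranked by wins against q random opponents."""
    def score(gamma):
        opponents = [random.choice(population) for _ in range(q)]
        return sum(1 for o in opponents if fitness(gamma) > fitness(o))
    ranked = sorted(population, key=score, reverse=True)
    return ranked[: len(population) // 2]
```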
Evolution Strategies
• Individuals are n-dimensional vectors of reals
• Fitness is the objective function
• Mutation distribution can be part of the genotype
(standard deviations and covariances evolve with solutions)
• Multi-parent recombination
• Deterministic selection (truncation selection)
Evolution Strategies: Individuals
An individual a = (x, σ, α) comprises:
• the candidate solution x;
• the standard deviations σ;
• the rotation angles α, with α_ij = ½ · arctan( 2 cov(i, j) / (σᵢ² − σⱼ²) ).
Evolution Strategies: Mutation
σᵢ′ = σᵢ · exp( τ′ · N(0,1) + τ · Nᵢ(0,1) )   (self-adaptation)
αⱼ′ = αⱼ + β · Nⱼ(0,1)
x′ = x + N(0, σ′, α′)
Hans-Paul Schwefel suggests:
τ′ = 1/√(2n),   τ = 1/√(2√n),   β ≈ 0.0873 (≈ 5°)
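A minimal Python sketch of the uncorrelated part of this mutation (rotation angles omitted), using Schwefel's suggested learning rates:

```python
import math
import random

def es_mutate(x, sigma):
    """Log-normal self-adaptation of the step sizes, then Gaussian moves."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    common = tau_prime * random.gauss(0, 1)   # one draw per individual
    new_sigma = [s * math.exp(common + tau * random.gauss(0, 1))
                 for s in sigma]
    new_x = [xi + si * random.gauss(0, 1) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma
```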
Genetic Programming
• Program induction
• LISP (historically), math expressions, machine language, ...
• Applications:
– optimal control;
– planning;
– sequence induction;
– symbolic regression;
– modelling and forecasting;
– symbolic integration and differentiation;
– inverse problems.
Genetic Programming: The Individuals
Individuals are a subset of LISP S-expressions, represented as trees; e.g.
(OR (AND (NOT d0) (NOT d1)) (AND d0 d1))
(diagram: the corresponding tree, with OR at the root, the two AND subtrees below it, and the terminals d0, d1 at the leaves)
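Such a tree can be represented with nested tuples and evaluated recursively (Python sketch):

```python
def evaluate(expr, env):
    """Recursively evaluate a Boolean S-expression given terminal values."""
    if isinstance(expr, str):                 # a terminal, e.g. 'd0'
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == 'AND':
        return all(vals)
    if op == 'OR':
        return any(vals)
    if op == 'NOT':
        return not vals[0]
    raise ValueError(op)

# (OR (AND (NOT d0) (NOT d1)) (AND d0 d1)) -- true iff d0 == d1
tree = ('OR', ('AND', ('NOT', 'd0'), ('NOT', 'd1')), ('AND', 'd0', 'd1'))
```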
Genetic Programming: Initialization
(diagram) Random trees are grown from the function set (OR, AND, NOT, …) and the terminal set (d0, d1, …): at each node either a function is chosen (and its arguments are grown recursively) or a terminal closes the branch.
Genetic Programming: Crossover
(diagram) Crossover picks one random subtree in each parent and swaps them, producing two offspring trees.
Genetic Programming: Other Operators
• Mutation: replace a terminal with a subtree
• Permutation: change the order of arguments to a function
• Editing: simplify S-expressions, e.g. (AND X X) → X
• Encapsulation: define a new function using a subtree
• Decimation: throw away most of the population
Genetic Programming: Fitness
Fitness cases: j = 1, …, Nₑ
“Raw” fitness: r(γ) = Σ_{j=1}^{Nₑ} | Output(γ, j) − C(j) |
“Standardized” fitness: s(γ) ∈ [0, +∞)
“Adjusted” fitness: a(γ) = 1 / (1 + s(γ))
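In Python, taking the standardized fitness equal to the raw fitness (an assumption that holds when lower raw fitness is better and 0 is perfect):

```python
def raw_fitness(outputs, targets):
    """r: summed absolute error over the fitness cases."""
    return sum(abs(o - c) for o, c in zip(outputs, targets))

def adjusted_fitness(s):
    """a = 1 / (1 + s): maps standardized fitness into (0, 1], 1 = perfect."""
    return 1.0 / (1.0 + s)
```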
Sample Application: Myoelectric
Prosthesis Control
• Control of an upper arm prosthesis
• Genetic Programming application
• Recognize thumb flection, extension and abduction patterns
Prosthesis Control: The Context
(diagram) The human arm produces myoelectric signals, measured by 2 electrodes over a 150 ms window; the raw myo-measurements are preprocessed into myo-signal features, from which the intended motion is deduced, mapped into a goal, and converted into actuator commands driving the robot arm.
Prosthesis Control: Terminals
Features for electrodes 1, 2:
• Mean absolute value (MAV)
• Mean absolute value slope (MAVS)
• Number of zero crossings (ZC)
• Number of slope sign changes (SC)
• Waveform length (LEN)
• Average value (AVG)
• Up slope (UP)
• Down slope (DOWN)
• MAV1/MAV2, MAV2/MAV1
• 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 0.01, -1.0
Prosthesis Control: Function Set
Addition             x+y
Subtraction          x−y
Multiplication       x*y
Division             x/y        (protected for y=0)
Square root          sqrt(|x|)
Sine                 sin x
Cosine               cos x
Tangent              tan x      (protected for x=π/2)
Natural logarithm    ln |x|     (protected for x=0)
Common logarithm     log |x|    (protected for x=0)
Exponential          exp x
Power function       x^y
Reciprocal           1/x        (protected for x=0)
Absolute value       |x|
Integer or truncate  int(x)
Sign                 sign(x)
Prosthesis Control: Fitness
(diagram: the evolved expression maps each signal to a value; the values for the three motion types should fall into distinct ranges, separated by undefined zones)
22 signals per motion;
r(γ) = [σ_abduction + σ_extension + σ_flexion]   (spread)
     + 100 / min( |μ_abduction − μ_extension|, |μ_abduction − μ_flexion|, |μ_extension − μ_flexion| )   (separation)
Myoelectric Prosthesis Control Reference
• Jaime J. Fernandez, Kristin A. Farry, John B. Cheatham. “Waveform Recognition Using Genetic Programming: The Myoelectric Signal Recognition Problem”. GP ’96, The MIT Press, pp. 63–71.
Classifier Systems (Michigan approach)
individual: a rule, e.g. IF X = A AND Y = B THEN Z = D; the population is a set of IF … THEN … rules.
Credit assignment on the n-th example:
f_{n+1}(γ) = (1 − e) · f_n(γ) + r,   if γ(n) = class(n)
f_{n+1}(γ) = (1 − p) · f_n(γ),       if γ(n) ≠ class(n)
where the reward r = (1 − g·N_γ) · R and N_γ is the number of attributes in the antecedent part.
Practical Implementation Issues
• from elegant academia to robust and efficient real-world applications: evolution programs
• handling constraints
• hybridization
• parallel and distributed algorithms
Evolution Programs
Slogan:
Genetic Algorithms + Data Structures = Evolution Programs
Key ideas:
• use a data structure as close as possible to object problem
• write appropriate genetic operators
• ensure that all genotypes correspond to feasible solutions
• ensure that genetic operators preserve feasibility
Encodings: “Pie” Problems
Each gene is an integer in the range 0–255; a variable’s share is its gene divided by the total, e.g. with W = 128, X = 32, Y = 90, Z = 20:
X = 32/270 = 11.85%
(pie-chart diagram of the shares W, X, Y, Z)
Encodings: “Permutation” Problems
Adjacency representation: (2, 4, 8, 3, 9, 7, 1, 5, 6)
Ordinal representation: (1, 1, 2, 1, 4, 1, 3, 1, 1)
Path representation: (1, 2, 4, 3, 8, 5, 9, 6, 7)
Matrix representation: a 9×9 binary precedence matrix M, with M[i][j] = 1 iff city i comes before city j in the tour
Sorting representation: (−23, −6, 2, 0, 19, 32, 85, 11, 25) — ranking the values sorts the cities into the tour
All of the above encode the same tour: 1-2-4-3-8-5-9-6-7
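The path and adjacency representations encode the same cyclic tour and can be converted into each other (Python sketch, 1-based city labels):

```python
def path_to_adjacency(path):
    """adjacency[i-1] = city visited immediately after city i (cyclic)."""
    n = len(path)
    adj = [0] * n
    for k, city in enumerate(path):
        adj[city - 1] = path[(k + 1) % n]
    return adj

def adjacency_to_path(adj, start=1):
    """Follow the successor pointers from the start city."""
    path, city = [], start
    for _ in range(len(adj)):
        path.append(city)
        city = adj[city - 1]
    return path
```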
Handling Constraints
• Penalty functions
  Risk of spending most of the time evaluating infeasible solutions, sticking with the first feasible solution found, or finding an infeasible solution that scores better than feasible solutions.
• Decoders or repair algorithms
  Computationally intensive, tailored to the particular application.
• Appropriate data structures and specialized genetic operators
  All possible genotypes encode feasible solutions.
Penalty Functions
f(γ) = Eval(c(z)) − P(z)   (for maximization)
P(z) = w(t) · Σᵢ wᵢ · Φᵢ(z)
(diagram: the decoded solution z may fall outside the feasible region of S; its score is reduced by the penalty P, which grows with the constraint violations Φᵢ)
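A penalized fitness can be sketched as follows (the subtraction assumes a maximization setting; the evaluation function, violation measures Φᵢ, and weights are problem-specific assumptions):

```python
def penalized_fitness(z, eval_fn, violations, weights, w_t=1.0):
    """Subtract a weighted measure of constraint violation from the score.
    Each violations[i](z) should be 0 for feasible z and grow otherwise."""
    penalty = w_t * sum(w * phi(z) for w, phi in zip(weights, violations))
    return eval_fn(z) - penalty
```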
Decoders / Repair Algorithms
(diagram: recombination and mutation act on genotypes; a decoder or repair map c carries every genotype to a feasible solution in S)
Hybridization
1) Seed the population with solutions provided by some heuristics.
2) Use local optimization algorithms as genetic operators (Lamarckian mutation).
3) Encode the parameters of a heuristics in the genotype: the heuristics then builds the candidate solution from those parameters.
Sample Application: Unit Commitment
• Multiobjective optimization problem: cost vs. emission
• Many linear and non-linear constraints
• Traditionally approached with dynamic programming
• Hybrid evolutionary/knowledge-based approach
• A flexible decision support system for planners
• Solution time increases linearly with the problem size
The Unit Commitment Problem
Cost:      z_$ = Σ_{i=1}^{n} [ Cᵢ(Pᵢ) + SUᵢ + SDᵢ + HSᵢ ],   with Cᵢ(Pᵢ) = aᵢ + bᵢ Pᵢ + cᵢ Pᵢ²
Emissions: z_E = Σ_{i=1}^{n} Eᵢ(Pᵢ),   with Eᵢ(Pᵢ) = Σ_{j=1}^{m} Eᵢⱼ(Pᵢ) and Eᵢⱼ(Pᵢ) = αᵢⱼ + βᵢⱼ Pᵢ + γᵢⱼ Pᵢ²
Predicted Load Curve
(figure: predicted load curve over the 24 hours from 12:00 AM to 10:00 PM, showing the load and, above it, the spinning reserve)
Unit Commitment: Constraints
• Power balance requirement
• Spinning reserve requirement
• Unit maximum and minimum output limits
• Unit minimum up and down times
• Power rate limits
• Unit initial conditions
• Unit status restrictions
• Plant crew constraints
• …
Unit Commitment: Encoding
Time    Unit 1  Unit 2  Unit 3  Unit 4
00:00   1.0     0.8     0.2     0.15
01:00   0.9     1.0     0.2     1.0
02:00   0.0     1.0     0.8     0.2
03:00   0.0     0.5     1.0     0.8
04:00   1.0     0.65    0.8     1.0
05:00   0.8     0.8     0.25    1.0
06:00   1.0     0.4     0.2     1.0
07:00   0.0     0.0     1.0     0.75
08:00   0.5     1.0     1.0     0.8
09:00   1.0     0.5     0.0     0.0
(a Fuzzy Knowledge Base translates these continuous values into unit statuses)
Unit Commitment: Solution
(table: for each unit — Unit 1 … Unit 4 — and each hour 00:00–09:00, the decoded status: down, hot-stand-by, starting, shutting down, or up)
Unit Commitment: Selection
competitive selection: candidate schedules are compared on both objectives — cost ($) and emission — e.g. $507,762 vs. $516,511 in cost and 213,489 vs. 60,080 in emission. (diagram)
Unit Commitment References
• D. Srinivasan, A. Tettamanzi. “An Integrated Framework for Devising Optimum Generation Schedules”. In Proceedings of the 1995 IEEE International Conference on Evolutionary Computing (ICEC ’95), vol. 1, pp. 1–4.
• D. Srinivasan, A. Tettamanzi. “A Heuristic-Guided Evolutionary Approach to Multiobjective Generation Scheduling”. IEE Proceedings Part C — Generation, Transmission, and Distribution, 143(6):553–559, November 1996.
• D. Srinivasan, A. Tettamanzi. “An Evolutionary Algorithm for Evaluation of Emission Compliance Options in View of the Clean Air Act Amendments”. IEEE Transactions on Power Systems, 12(1):336–341, February 1997.
Parallel Evolutionary Algorithms
• The standard evolutionary algorithm is stated as sequential…
• … but evolutionary algorithms are intrinsically parallel
• Several models:
  – cellular evolutionary algorithm
  – fine-grained parallel evolutionary algorithm (grid)
  – coarse-grained parallel evolutionary algorithm (islands)
  – sequential evolutionary algorithm with parallel fitness evaluation (master–slave)
Terminology
• Panmictic
• Apomictic
Island Model
Selected Applications in Biology and
Medical Science
• the protein folding problem, i.e. determining the tertiary structure
of proteins using evolutionary algorithms;
• quantitative structure-activity relationship modeling for drug
design;
• applications to medical diagnosis, like electroencephalogram
(EEG) classification and automatic feature detection in medical
imagery (PET, CAT, NMR, X-RAY, etc.);
• applications to radiotherapy treatment planning;
• applications to myoelectric prosthesis control.
Sample Application: Protein Folding
• Finding the 3-D geometry of a protein to understand its functionality
• Very difficult: one of the “grand challenge” problems
• Standard GA approach
• Simplified protein model
Protein Folding: The Problem
• Much of a protein’s function may be derived from its
conformation (3-D geometry or “tertiary” structure).
• Magnetic resonance & X-ray crystallography are currently used
to view the conformation of a protein:
– expensive in terms of equipment, computation and time;
– require isolation, purification and crystallization of protein.
• Prediction of the final folded conformation of a protein chain has
been shown to be NP-hard.
• Current approaches:
– molecular dynamics modelling (brute force simulation);
– statistical prediction;
– hill-climbing search techniques (simulated annealing).
Protein Folding: Simplified Model
• 90° lattice (6 degrees of freedom at each point);
• peptides occupy intersections;
• no side chains;
• hydrophobic or hydrophilic (no relative strengths) amino acids;
• only hydrophobic/hydrophilic forces considered;
• adjacency considered only in cardinal directions;
• cross-chain hydrophobic contacts are the basis for evaluation.
Protein Folding: Representation
relative move encoding:
UP DOWN FORWARD LEFT UP RIGHT …
preference order encoding: each gene ranks all five moves in order of preference, e.g.
(UP, DOWN, FORWARD, LEFT, RIGHT), (DOWN, LEFT, UP, RIGHT, FORWARD), …
and the highest-ranked move that is feasible on the lattice is taken.
Protein Folding: Fitness
Decode: plot the course encoded by the genotype, then test each occupied cell:
• any collision: −2;
• no collision AND a hydrophobe in an adjacent cell: +1.
Notes:
• for each contact: +2 (each contact is seen from both of its cells);
• adjacent hydrophobes on the chain are not discounted in the scoring;
• multiple collisions (>1 peptide in one cell) still score −2;
• hydrophobe collisions imply an additional penalty (no contacts are scored).
Protein Folding: Experiments
• preference ordering encoding;
• two-point crossover with a rate of 95%;
• bit mutation with a rate of 0.1%;
• population size: 1000 individuals;
• crowding and incest reduction;
• test sequences with known minimum configuration.
Protein Folding References
• S. Schulze-Kremer. “Genetic Algorithms for Protein Tertiary
Structure Prediction”. PPSN 2, North-Holland 1992.
• R. Unger and J. Moult. “A Genetic Algorithm for 3D Protein
Folding Simulations”. ICGA-5, 1993, pp. 581–588.
• Arnold L. Patton, W. F. Punch III and E. D. Goodman. “A
Standard GA Approach to Native Protein Conformation
Prediction”. ICGA 6, 1995, pp. 574–581.
Sample Application: Drug Design
Purpose: given a chemical specification (activity), design a tertiary
structure complying with it.
Requirement: a quantitative structure-activity relationship model.
Example: design ligands that can
bind targets specifically and
selectively. Complementary
peptides.
Drug Design: Implementation
An individual is a sequence of amino acids (residues), e.g. N L H A F G L F K A; each residue carries a name and a hydropathic value.
Operators:
• hill-climbing crossover
• hill-climbing mutation
• reordering
(no explicit selection: the hill-climbing operators provide implicit selection)
Drug Design: Fitness
Moving-average hydropathy of the target a and of the complement b:
aₖ = Σ_{i=k−s}^{k+s} hᵢ,   bₖ = Σ_{i=k−s}^{k+s} gᵢ,   k = s, …, n − s
where hᵢ, gᵢ are the hydropathies of the residues and n is the number of residues in the target.
Q = Σₖ (aₖ + bₖ)² / (n − 2s)
(lower Q = better complementarity)
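Reading the garbled formula as Q = Σ(aₖ + bₖ)²/(n − 2s) — an assumption consistent with “lower Q = better complementarity”, since complementary peptides have opposite hydropathies — the score can be sketched as:

```python
def moving_average(values, s):
    """Windowed hydropathy sums h_{k-s} ... h_{k+s} (0-based k)."""
    n = len(values)
    return [sum(values[k - s: k + s + 1]) for k in range(s, n - s)]

def complementarity_Q(target_h, complement_h, s=1):
    """Lower Q = better: windowed hydropathies should cancel out."""
    a = moving_average(target_h, s)
    b = moving_average(complement_h, s)
    n = len(target_h)
    return sum((ai + bi) ** 2 for ai, bi in zip(a, b)) / (n - 2 * s)
```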
Drug Design: Results
Sequence: FANSGNVYFGIIAL
(figure: hydropathic value per amino-acid position, comparing the Fassina complement, the GA complement, and the target)
Drug Design References
• T. S. Lim. A Genetic Algorithms Approach for Drug Design. MS
Dissertation, Oxford University, Computing Laboratory, 1995.
• A. L. Parrill. Evolutionary and Genetic Methods in Drug Design.
Drug Discovery Today, Vol. 1, No. 12, Dec 1996, pp. 514–521.
Sample Application: Medical Diagnosis
• Classifier Systems application
• Learning by examples
• Lymphography
– 148 examples, 18 attributes, 4 diagnoses
– estimated performance of a human expert: 85% correct
• Prognosis of breast cancer recurrence
– 288 examples, 10 attributes, 2 diagnoses
– performance of human expert unknown
• Location of primary tumor
– 339 examples, 17 attributes, 22 diagnoses
– estimated performance of a human expert: 42% correct
Medical Diagnosis Results
• Performance indistinguishable from humans
• Performance for breast cancer: about 75%
• In primary tumor, patients with identical symptoms have different diagnoses
• Symbolic (= comprehensible) diagnosis rules
Medical Diagnosis References
• Pierre Bonelli, Alexandre Parodi. “An Efficient Classifier System and its Experimental Comparison with two Representative Learning Methods on three Medical Domains”. ICGA 4, pp. 288–295.
• Tod A. Sedbrook, Haviland Wright, Richard Wright. “Application of a Genetic Classifier for Patient Triage”. ICGA 4, pp. 334–338.
• H. F. Gray, R. J. Maxwell, I. Martínez-Perez, C. Arús, S. Cerdán. “Genetic Programming Classification of Magnetic Resonance Data”. GP ’96, p. 424.
• Alejandro Pazos, Julian Dorado, Antonio Santos. “Detection of Patterns in Radiographs using ANN Designed and Trained with GA”. GP ’96, p. 432.
Sample Application: Radiotherapy
Treatment Planning
• X-rays or electron beams for cancer treatment
• Conformal therapy: uniform dose over cancerous regions, spare
healthy tissues
• Constrained optimization, inverse problem
• From dose specification to beam intensities
• Constraints:
– beam intensities are positive
– rate of intensity change is limited
• Conflicting objectives: Pareto-optimal set of solutions
RTP: The Problem
(diagram: a beam from the head crosses the plane of interest, which contains the treatment area and an organ at risk)
TA: dose delivered to the treatment area (target: TA = 100%)
OAR: dose delivered to organs at risk (constraint: OAR < 20%)
OHT: dose delivered to other healthy tissues (constraint: OHT < 30%)
RTP: Fitness and Solutions
(diagram: candidate solutions A, B, C plotted by |TA − TA*| against |OAR − OAR*|; the non-dominated solutions form the Pareto-optimal set)
Radiotherapy Treatment Planning
References
• O. C. L. Haas, K. J. Burnham, M. H. Fisher, J. A. Mills. “Genetic
Algorithm Applied to Radiotherapy Treatment Planning”.
ICANNGA ‘95, pp. 432–435.
Evolutionary Algorithms and Soft
Computing
(diagram: within Soft Computing (SC), EAs, FL, and NNs are linked by optimization, monitoring, and fitness relationships)
Soft Computing
• Tolerant of imprecision, uncertainty, and partial truth
• Adaptive
• Methodologies:
  – Evolutionary Algorithms
  – Neural Networks
  – Bayesian and Probabilistic Networks
  – Fuzzy Logic
  – Rough Sets
• Bio-inspired: Natural Computing
• A scientific discipline?
• Methodologies co-operate, do not compete (synergy)
Artificial Neural Networks
(diagram: a biological neuron — dendrites, axon, synapses — and its artificial counterpart: inputs x₁, …, xₙ, weighted by w₁, …, wₙ, are summed and passed through an activation to yield the output y)
Fuzzy Logic
(diagram: a fuzzy membership function taking values between 0 and 1)
(diagram: EAs, FL, NNs — focus on the EAs–NNs edge: optimization and fitness)
Neural Network Design and Optimization
• Evolving weights for a network of predefined structure
• Evolving network structure
– direct encoding
– indirect encoding
• Evolving learning rules
• Input data selection
Evolving the Weights (Predefined Structure)
(diagram: a network whose connection weights 0.2, −0.3, 0.6, 0.7, −0.5, 0.4 are encoded as the genotype (0.2, −0.3, 0.6, −0.5, 0.4, 0.7))
Evolving the Structure: Direct Encoding
The connectivity matrix (row i lists the incoming connections of neuron i) is flattened into the genotype:
     1  2  3  4  5  6
1    0  0  0  0  0  0
2    0  0  0  0  0  0
3    0  0  0  0  0  0
4    1  1  0  0  0  0
5    1  0  1  0  0  0
6    0  1  0  1  1  0
(diagram: the corresponding network — neurons 4 and 5 read inputs 1–3; neuron 6 reads 2, 4, 5)
Evolving Weights and Feed-Forward Structure: Direct Encoding
(diagram: the genotype lists the layer sizes, e.g. (3, 2, 3), followed by the weight matrices W0 (3×3), W1 (3×2), W2 (2×3), W3 (3×1))
Evolving Weights and Feed-Forward Structure: Direct Encoding
• Mutation operator:
  – neuron removal: delete a column in Wᵢ₋₁ and a row in Wᵢ;
  – neuron duplication: copy a column in Wᵢ₋₁ and a row in Wᵢ;
  – removal of a layer with a single neuron: replace with Wᵀᵢ₋₁ Wᵢ;
  – layer duplication: insert an identity matrix.
• Simplification operator:
  – remove neurons whose row in Wᵢ has norm < ε.
• Crossover operator:
  – choose two crossover points in the parents;
  – swap the tails;
  – join the pieces with a new random weight matrix.
Structure Evolution: Indirect Encoding
Graph-generating grammar: the start symbol rewrites into a 2×2 matrix of nonterminals, S → (A B; C D); each nonterminal rewrites into a 2×2 matrix of terminals, e.g. A → (c d; a c), B → (a a; a e), C → (a a; a a); each terminal rewrites into a 2×2 block of bits, e.g. a → (0 0; 0 0). Repeated rewriting yields the network’s connectivity matrix.
Genotype: (S: A, B, C, D | A: c, d, a, c | B: a, a, a, e | C: a, a, a, a | … )
(diagram: EAs, FL, NNs within SC — focus on the EAs–FL edge: optimization and monitoring)
Evolutionary Algorithms and Fuzzy Logic
(diagram: three interactions between evolutionary algorithms and fuzzy logic — 1: an evolutionary algorithm designs/optimizes a fuzzy system; 2: a fuzzy system governs the evolutionary algorithm (“fuzzy government”); 3: fuzzified EAs, with fuzzy fitness and fuzzy operators)
Fuzzy System Design and Optimization
• Representation
• Genetic operators
• Selection mechanism
• Example: learning fuzzy classifiers
Fuzzy Rule-Based Systems
Representation of a Fuzzy Rulebase
(diagram: each variable is partitioned by totally overlapping membership functions c₁, c₂, c₃, c₄)
The genotype concatenates:
• membership-function genes (e.g. 10011000 11011010) describing the fuzzy domains FA1, FA2, FA3 of each input and output variable;
• rule genes R1, R2, …, Rmax — Nmax = Ndom · Noutput genes, each with a value in (0 … Ndom), e.g. 00001010.
A richer representation
A genotype comprises the input membership functions, the output MFs, and a variable-length list of rules, e.g.:
IF x is A AND v is B THEN F is C
IF a is D THEN F is E
IF w is G AND x is H THEN F is C
IF true THEN F is K
Initialization
• Input variables: no. of domains = 1 + exponential(3); each domain is a trapezoid with corners a, b, c, d between min and max. (diagram)
• Output variables: no. of domains = 2 + exponential(3).
• Rules: no. of rules = 2 + exponential(6); template IF … is … AND … is … THEN … is …; for each input variable, flip a coin to decide whether to include it in the antecedent.
Recombination
Crossover exchanges rules between the parents; a rule takes with it all the domains it refers to, along with their membership functions. (diagram: rules from the two parents recombine into an offspring rulebase)
Mutation
• {add, remove, change} a domain of an {input, output} variable;
• {duplicate, remove} a rule;
• change a rule: {add, remove, change} a clause in the {antecedent, consequent};
• input MF perturbation: shift the trapezoid corners a, b, c, d. (diagram)
Example: Learning Fuzzy Classifiers
Controlling the Evolutionary Process
• Motivation:
– EAs easy to implement
– little specific knowledge required
– long computing time
• Features:
– complex dynamics
– non-binary conditions
– “intuitive” knowledge available
Knowledge Acquisition
(diagram: the running ALGORITHM produces statistics and visualization, from which KNOWLEDGE about its dynamics is acquired)
Fuzzifying Evolutionary Algorithms
• Fuzzy fitness (objective function)
• Fuzzy encoding
• Fuzzy operators
– recombination
– mutation
• Population Statistics
Fuzzy Fitness
• Faster calculation
• Less precision
• Specific Selection
Fuzzy Government
“A fuzzy rulebase for the dynamic control of an evolutionary algorithm”
(diagram: Population → Statistics → fuzzy rulebase → Parameters)
If D(X_t) is LOW then p_mut is HIGH
If f(X_t) is LOW and D(X_t) is HIGH then Emerg is NO
…
(diagram: EAs, FL, NNs — focus on the FL–NNs edge: integration)
Neuro-Fuzzy Systems
• Fuzzy Neural Networks
  – fuzzy neurons (OR, AND, OR/AND)
  – learning algorithms (backpropagation-style)
  – NEFPROX
  – ANFIS
• Co-operative Neuro-Fuzzy Systems
  – adaptive FAMs: differential competitive learning
  – Self-Organizing Feature Maps
  – Fuzzy ART and Fuzzy ARTMAP
Fuzzy Neural Networks
(diagram: inputs x₁, …, xₙ feed AND neurons through weights w₁₁, …, w_mn; the AND neurons’ outputs v₁, …, v_m are combined by an OR neuron into the output y)
FAM Systems
(diagram: the input x is fuzzified, processed in parallel by the fuzzy associative memory rules (A₁ → B₁), (A₂ → B₂), …, (A_k → B_k), whose outputs are aggregated and defuzzified into y)
EAs
(diagram: the full Soft Computing picture — EAs, FL, and NNs, linked by optimization, monitoring, fitness, and integration)
A. Tettamanzi, M. Tomassini. Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems. Springer-Verlag, 2001.
Summary and Conclusions