Evolutionary Algorithms
Andrea G. B. Tettamanzi, 2002

Contents of the Lectures
• Taxonomy and history;
• Evolutionary algorithm basics;
• Theoretical background;
• Outline of the various techniques: plain genetic algorithms, evolutionary programming, evolution strategies, genetic programming;
• Practical implementation issues;
• Evolutionary algorithms and soft computing;
• Selected applications from the biological and medical area;
• Summary and conclusions.

Bibliography
Th. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.
L. Davis. The Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991.
D. B. Fogel. Evolutionary Computation. IEEE Press, 1995.
D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
J. Koza. Genetic Programming. MIT Press, 1992.
Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 3rd ed., 1996.
H.-P. Schwefel. Evolution and Optimum Seeking. Wiley & Sons, 1995.
J. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 1992.

Taxonomy (1)
Stochastic optimization methods include:
• Monte Carlo methods
• Taboo search
• Simulated annealing
• Evolutionary algorithms, which in turn comprise:
  - genetic algorithms
  - evolution strategies
  - evolutionary programming
  - genetic programming

Taxonomy (2)
Distinctive features of evolutionary algorithms:
• they operate on an appropriate encoding of solutions;
• population-based search;
• no regularity conditions are required;
• probabilistic transitions.

History (1)
• L. Fogel, UC San Diego, '60s
• I. Rechenberg, H.-P. Schwefel, TU Berlin, '60s
• John H. Holland, University of Michigan, Ann Arbor, '60s
• John Koza, Stanford University, '80s

History (2)
1859 Charles Darwin: inheritance, variation, natural selection
1957 G. E. P. Box: random mutation & selection for optimization
1958 Fraser, Bremermann: computer simulation of evolution
1964 Rechenberg, Schwefel: mutation & selection
1966 Fogel et al.: evolving automata - "evolutionary programming"
1975 Holland: crossover, mutation & selection - "reproductive plan"
1975 De Jong: parameter optimization - "genetic algorithm"
1989 Goldberg: first textbook
1991 Davis: first handbook
1993 Koza: evolving LISP programs - "genetic programming"

Evolutionary Algorithm Basics
• what an EA is (the metaphor)
• object problem and fitness
• the ingredients
• schemata
• implicit parallelism
• the Schema Theorem
• the building blocks hypothesis
• deception

The Metaphor
EVOLUTION       PROBLEM SOLVING
Environment     Object problem
Individual      Candidate solution
Fitness         Quality

Object Problem and Fitness
A genotype $\gamma \in \Gamma$ is decoded by a mapping $M$ into a candidate solution $s \in S$. The object problem is to minimize $c: S \to \mathbb{R}$, i.e. $\min_{s \in S} c(s)$; the fitness $f(\gamma)$ is derived from the objective value $c(M(\gamma))$.

The Ingredients
From generation $t$ to generation $t+1$:
• reproduction
• selection
• mutation
• recombination

The Evolutionary Cycle
Population -> (selection) -> parents -> (reproduction: recombination and mutation) -> offspring -> (replacement) -> population.

Pseudocode
generation = 0;
SeedPopulation(popSize);          // at random or from a file
while (!TerminationCondition()) {
    generation = generation + 1;
    CalculateFitness();           // ... of new genotypes
    Selection();                  // select genotypes that will reproduce
    Crossover(pcross);            // mate pcross of them on average
    Mutation(pmut);               // mutate all the offspring with Bernoulli
                                  // probability pmut over genes
}
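A minimal, runnable rendering of this loop in Python follows. It mirrors the pseudocode above for bit-string genotypes with fitness-proportionate selection, one-point crossover and bit-flip mutation; the helper names, default parameter values and the generational replacement scheme are illustrative assumptions, not part of the original slides.

import random

def seed_population(pop_size, genome_len):
    # Random bit-string genotypes.
    return [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]

def roulette_select(population, fitnesses):
    # Fitness-proportionate ("roulette wheel") selection of one parent.
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    cumulative = 0.0
    for genotype, f in zip(population, fitnesses):
        cumulative += f
        if cumulative >= pick:
            return genotype
    return population[-1]

def crossover(a, b, p_cross):
    # One-point crossover with probability p_cross, otherwise copy the parents.
    if random.random() < p_cross:
        point = random.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(genotype, p_mut):
    # Independent Bernoulli "transcription errors" on every gene.
    return [(1 - g) if random.random() < p_mut else g for g in genotype]

def evolve(fitness, pop_size=20, genome_len=10, p_cross=0.7, p_mut=0.01, generations=50):
    population = seed_population(pop_size, genome_len)
    for _ in range(generations):
        fitnesses = [fitness(g) for g in population]
        offspring = []
        while len(offspring) < pop_size:
            a = roulette_select(population, fitnesses)
            b = roulette_select(population, fitnesses)
            c, d = crossover(a, b, p_cross)
            offspring += [mutate(c, p_mut), mutate(d, p_mut)]
        population = offspring[:pop_size]
    return max(population, key=fitness)

# Example: maximize the number of ones in the string.
best = evolve(fitness=sum)
print(best, sum(best))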
A Sample Genetic Algorithm
• The MAXONE problem
• Genotypes are bit strings
• Fitness-proportionate selection
• One-point crossover
• Flip mutation (transcription error)

The MAXONE Problem
Problem instance: a string of $\ell$ binary cells, $\gamma \in \{0,1\}^\ell$.
Fitness: $f(\gamma) = \sum_{i=1}^{\ell} \gamma_i$.
Objective: maximize the number of ones in the string.

Fitness-Proportionate Selection
Probability of being selected: $P(\gamma) = \dfrac{f(\gamma)}{\sum_{\gamma'} f(\gamma')}$.
Implementation: "roulette wheel" - each individual owns a slice of the wheel proportional to its fitness.

One-Point Crossover
parents:    0001|111010    1011|001100
offspring:  0001|001100    1011|111010
(the crossover point, marked by |, is chosen at random; the tails are swapped)

Mutation
before: 1011001101
after:  1011101100
Independent Bernoulli transcription errors: each gene is flipped with probability pmut.

Example: Selection
String        f   Cumulative f     P
0111011011    7        7        0.125
1011011101    7       14        0.125
1101100010    5       19        0.089
0100101100    4       23        0.071
1100110011    6       29        0.107
1111001000    5       34        0.089
0110001010    4       38        0.071
1101011011    7       45        0.125
0110110000    4       49        0.071
0011111101    7       56        0.125
Random sequence (spins of the wheel): 43, 1, 19, 35, 15, 22, 24, 38, 44, 2.

Example: Recombination & Mutation
Mating pool     After crossover   After mutation   f
0111011011      0111011011        0111111011       8
0111011011      0111011011        0111011011       7
110|1100010     1100101100        1100101100       5
010|0101100     0101100010        0101100010       4
1|100110011     1100110011        1100110011       6
1|100110011     1100110011        1000110011       5
0110001010      0110001010        0110001010       4
1101011011      1101011011        1101011011       7
011000|1010     0110001011        0110001011       5
110101|1011     1101011010        1101011010       6
                                        TOTAL  =  57
(| marks the crossover points of the mated pairs)

Schemata
Don't-care symbol: * (a schema such as 1*0*...*1 fixes some positions and leaves the others free).
Order of a schema: o(S) = number of fixed positions.
Defining length: $\delta(S)$ = distance between the first and last fixed position.
A schema S matches $2^{\ell - o(S)}$ strings; a string of length $\ell$ is matched by $2^{\ell}$ schemata.

Implicit Parallelism
In a population of n individuals of length $\ell$, between $2^{\ell}$ and $n \cdot 2^{\ell}$ schemata are processed, of which approximately $n^3$ are processed usefully (Holland 1989), i.e. are not disrupted by crossover and mutation.
But see: A. Bertoni, M. Dorigo. "Implicit Parallelism in Genetic Algorithms". Artificial Intelligence 61(2), 1993, pp. 307–314.

Fitness of a Schema
$f(\gamma)$: fitness of string $\gamma$;
$q_x(\gamma)$: fraction of strings equal to $\gamma$ in population $x$;
$q_x(S)$: fraction of strings matched by $S$ in population $x$;
$f_x(S) = \dfrac{1}{q_x(S)} \sum_{\gamma \in S} q_x(\gamma)\, f(\gamma)$.

The Schema Theorem
Let $\{X_t\}_{t=0,1,\dots}$ be the populations at times $t$, and suppose that the relative advantage $\dfrac{f_{X_t}(S) - \bar f(X_t)}{\bar f(X_t)} \ge c$ is constant over time. Then
$E[q_{X_t}(S) \mid X_0] \ge q_{X_0}(S)\,(1+c)^t \left(1 - p_{\mathrm{cross}}\dfrac{\delta(S)}{\ell - 1} - p_{\mathrm{mut}}\, o(S)\right)^t$,
i.e. above-average schemata increase exponentially!

The Schema Theorem (proof sketch)
$E[q_{X_t}(S) \mid X_{t-1}] \ge q_{X_{t-1}}(S)\,\dfrac{f_{X_{t-1}}(S)}{\bar f(X_{t-1})}\,P_{\mathrm{surv}}[S] \ge q_{X_{t-1}}(S)\,(1+c)\,P_{\mathrm{surv}}[S]$,
where $P_{\mathrm{surv}}[S] \ge 1 - p_{\mathrm{cross}}\dfrac{\delta(S)}{\ell - 1} - p_{\mathrm{mut}}\, o(S)$.

The Building Blocks Hypothesis
"An evolutionary algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata - the building blocks."
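To make the quantities appearing in the Schema Theorem concrete, here is a small Python sketch of the order, defining length and matching test for schemata written with * as the don't-care symbol, together with the survival bound used above; the helper names and the example parameter values are assumptions for illustration.

def order(schema):
    # o(S): number of fixed (non-'*') positions.
    return sum(1 for c in schema if c != '*')

def defining_length(schema):
    # delta(S): distance between the first and last fixed positions.
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    # True if the string is an instance of the schema.
    return all(c == '*' or c == b for c, b in zip(schema, string))

def survival_bound(schema, p_cross, p_mut):
    # Lower bound on the probability that S survives crossover and mutation,
    # as in the Schema Theorem: 1 - p_cross*delta(S)/(l-1) - p_mut*o(S).
    l = len(schema)
    return 1 - p_cross * defining_length(schema) / (l - 1) - p_mut * order(schema)

S = "111*****11"
print(order(S), defining_length(S), survival_bound(S, p_cross=0.7, p_mut=0.01))
# order 5, defining length 9, survival bound 1 - 0.7*9/9 - 0.01*5 = 0.25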
Deception
Deception occurs when the building blocks hypothesis does not hold: for some schema S containing the optimum ($\gamma^* \in S$), the average fitness of S is lower than that of the competing schema, $f(S) < f(\bar S)$.
Example: S1 = 111*******, S2 = ********11, $\gamma^*$ = 1111111111.
S = 111*****11 contains the optimum, yet the competing schema $\bar S$ = 000*****00 has higher average fitness, so selection is led away from the optimum.

Remedies to Deception
• Prior knowledge of the objective function
• Non-deceptive encoding
• Inversion: the semantics of genes is not positional
• Underspecification & overspecification: "messy genetic algorithms"

Theoretical Background
• Theory of random processes;
• Convergence in probability;
• Open question: rate of convergence.

Events
Sample space $\Omega$; elementary outcomes $\omega \in \Omega$; events are subsets $A, B \subseteq \Omega$.

Random Variables
A random variable is a function $X: \Omega \to \mathbb{R}$, $\omega \mapsto X(\omega)$.

Stochastic Processes
A sequence of random variables $X_1, X_2, \dots, X_t, \dots$, each with its own probability distribution.
Notation: $\{X_t(\omega)\}_{t=0,1,2,\dots}$.

EAs as Random Processes
Probability space $(\Omega, \mathcal{F}, P)$: the source of "random numbers".
A state $x \in \Gamma^{(n)}$ is a sample (population) of size n.
The evolutionary process $\{X_t(\omega)\}_{t=0,1,2,\dots}$ is a trajectory in the space of populations.

Markov Chains
A stochastic process $\{X_t(\omega)\}_{t=0,1,\dots}$ is a Markov chain iff, for all t,
$P[X_t = x \mid X_0, X_1, \dots, X_{t-1}] = P[X_t = x \mid X_{t-1}]$.
(The slide illustrates a three-state chain with states A, B, C and fixed transition probabilities.)

Abstract Evolutionary Algorithm
Stochastic functions, each depending on the current population and on $\omega \in \Omega$: select, mate, cross, mutate, insert.
Transition function: $X_{t+1}(\omega) = T_t(\omega)\,X_t(\omega)$, where $T_t(\omega)$ is the composition of the stochastic functions above (select, mate, cross, mutate, insert).

Convergence to Optimum
Theorem: if $\{X_t(\omega)\}_{t=0,1,\dots}$ is monotone and homogeneous, $x_0$ is given, and an optimal population $y \in \Gamma^{(n)}_O$ is reachable from $x_0$, then $\lim_{t\to\infty} P[X_t \in \Gamma^{(n)}_O \mid X_0 = x_0] = 1$.
Theorem: if select and mutate are generous, the neighborhood structure is connective, and the transition functions $T_t(\omega)$, $t = 0, 1, \dots$ are i.i.d. and elitist, then $\lim_{t\to\infty} P[X_t \in \Gamma^{(n)}_O] = 1$.

Outline of the Various Techniques
• Plain genetic algorithms
• Evolutionary programming
• Evolution strategies
• Genetic programming

Plain Genetic Algorithms
• Individuals are bit strings
• Mutation as transcription error
• Recombination is crossover
• Fitness-proportionate selection

Evolutionary Programming
• Individuals are finite-state automata
• Used to solve prediction tasks
• State-transition table modified by uniform random mutation
• No recombination
• Fitness depends on the number of correct predictions
• Truncation selection

Evolutionary Programming: Individuals
A finite-state automaton $(Q, q_0, A, \delta, \omega)$ with:
• set of states Q;
• initial state $q_0$;
• set of accepting states A;
• alphabet of symbols $\Sigma$;
• transition function $\delta: Q \times \Sigma \to Q$;
• output mapping function $\omega: Q \times \Sigma \to \Sigma$.
(The slide shows an example three-state automaton with states $q_0, q_1, q_2$ over the alphabet {a, b, c}, together with its state-transition and output table.)

Evolutionary Programming: Fitness
The automaton reads the observed sequence of symbols one at a time; after each symbol, its output is compared with the next symbol actually observed. If the prediction is correct, the fitness is incremented: $f(\gamma) \leftarrow f(\gamma) + 1$.

Evolutionary Programming: Selection
A variant of stochastic q-tournament selection:
• each individual $\gamma$ is compared with q opponents $\gamma_1, \dots, \gamma_q$ drawn at random;
• score($\gamma$) = #{ i | f($\gamma$) > f($\gamma_i$) };
• order the individuals by decreasing score;
• select the first half (truncation selection).
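A brief Python sketch of this selection scheme, under the assumption that individuals can be arbitrary objects evaluated by a fitness function; the default value of q and the toy usage at the end are illustrative.

import random

def ep_select(population, fitness, q=10):
    # Each individual is scored against q randomly chosen opponents:
    # score = number of opponents with strictly lower fitness.
    scores = []
    for individual in population:
        opponents = random.sample(population, q)
        score = sum(1 for opp in opponents if fitness(individual) > fitness(opp))
        scores.append((score, individual))
    # Order by decreasing score and keep the better half (truncation selection).
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [ind for _, ind in scores[: len(population) // 2]]

# Usage with a toy population of numbers and identity fitness:
survivors = ep_select(list(range(20)), fitness=lambda x: x, q=5)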
Evolution Strategies
• Individuals are n-dimensional vectors of reals
• Fitness is the objective function
• The mutation distribution can be part of the genotype (standard deviations and covariances evolve with the solutions)
• Multi-parent recombination
• Deterministic selection (truncation selection)

Evolution Strategies: Individuals
An individual carries a candidate solution $x \in \mathbb{R}^n$, a vector of standard deviations $\sigma$ and a vector of rotation angles $\alpha$, with
$\alpha_{ij} = \dfrac{1}{2}\arctan\dfrac{2\,\mathrm{cov}(i,j)}{\sigma_i^2 - \sigma_j^2}$.

Evolution Strategies: Mutation
$\sigma_i \leftarrow \sigma_i \exp(\tau' N(0,1) + \tau N_i(0,1))$
$\alpha_j \leftarrow \alpha_j + \beta N_j(0,1)$
$x \leftarrow x + N(0, \sigma, \alpha)$
Self-adaptation: the strategy parameters are mutated together with the solution.
Hans-Paul Schwefel suggests: $\tau' \propto \dfrac{1}{\sqrt{2n}}$, $\tau \propto \dfrac{1}{\sqrt{2\sqrt{n}}}$, $\beta \approx 0.0873$ (about 5 degrees).

Genetic Programming
• Program induction
• LISP (historically), math expressions, machine language, ...
• Applications:
  - optimal control;
  - planning;
  - sequence induction;
  - symbolic regression;
  - modelling and forecasting;
  - symbolic integration and differentiation;
  - inverse problems.

Genetic Programming: The Individuals
Individuals are program trees, a subset of LISP S-expressions, e.g.
(OR (AND (NOT d0) (NOT d1)) (AND d0 d1))

Genetic Programming: Initialization
(The slide shows random program trees being grown from the root by repeatedly choosing functions and terminals.)

Genetic Programming: Crossover
(The slide shows two parent trees exchanging randomly chosen subtrees, producing two offspring trees.)

Genetic Programming: Other Operators
• Mutation: replace a terminal with a subtree
• Permutation: change the order of arguments to a function
• Editing: simplify S-expressions, e.g. (AND X X) -> X
• Encapsulation: define a new function using a subtree
• Decimation: throw away most of the population

Genetic Programming: Fitness
Fitness cases: $j = 1, \dots, N_e$.
"Raw" fitness: $r(\gamma) = \sum_{j=1}^{N_e} \left| \mathrm{Output}(\gamma, j) - C(j) \right|$.
"Standardized" fitness: $s(\gamma) \in [0, +\infty)$, with 0 being best.
"Adjusted" fitness: $a(\gamma) = \dfrac{1}{1 + s(\gamma)}$.
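The following Python sketch shows how raw and adjusted fitness can be computed over a set of fitness cases; the nested-tuple program representation and the tiny Boolean evaluator are assumptions made for this example, not the implementation referred to on the slides.

def evaluate(program, case):
    # Evaluate a tiny S-expression-like program, e.g. ('OR', ('NOT', 'd0'), 'd1'),
    # against a fitness case given as a dict of terminal values.
    if isinstance(program, str):
        return case[program]
    op, *args = program
    values = [evaluate(a, case) for a in args]
    return {'AND': all, 'OR': any, 'NOT': lambda v: not v[0]}[op](values)

def raw_fitness(program, cases, targets):
    # r(gamma) = sum over fitness cases of |Output(gamma, j) - C(j)|.
    return sum(abs(int(evaluate(program, c)) - t) for c, t in zip(cases, targets))

def adjusted_fitness(program, cases, targets):
    # Here the raw fitness is already an error measure, so it can serve as the
    # standardized fitness s(gamma); adjusted fitness a(gamma) = 1 / (1 + s(gamma)).
    return 1.0 / (1.0 + raw_fitness(program, cases, targets))

# Target matching the S-expression example above on two Boolean inputs:
cases = [{'d0': a, 'd1': b} for a in (0, 1) for b in (0, 1)]
targets = [1, 0, 0, 1]
prog = ('OR', ('AND', ('NOT', 'd0'), ('NOT', 'd1')), ('AND', 'd0', 'd1'))
print(adjusted_fitness(prog, cases, targets))   # 1.0: the program fits all cases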
Sample Application: Myoelectric Prosthesis Control
• Control of an upper arm prosthesis
• Genetic programming application
• Recognize thumb flexion, extension and abduction patterns

Prosthesis Control: The Context
(The slide shows the control pipeline: the human arm's motion produces myoelectric signals, measured by 2 electrodes over 150 ms windows; the raw myo-measurements are preprocessed into myo-signal features; the features are mapped into a goal, i.e. the user's intention is deduced; the goal is converted into actuator commands that drive the robot arm.)

Prosthesis Control: Terminals
Features for electrodes 1, 2:
• Mean absolute value (MAV)
• Mean absolute value slope (MAVS)
• Number of zero crossings (ZC)
• Number of slope sign changes (SC)
• Waveform length (LEN)
• Average value (AVG)
• Up slope (UP)
• Down slope (DOWN)
• MAV1/MAV2, MAV2/MAV1
• Constants: 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 0.01, -1.0

Prosthesis Control: Function Set
Addition             x + y
Subtraction          x - y
Multiplication       x * y
Division             x / y        (protected for y = 0)
Square root          sqrt(|x|)
Sine                 sin x
Cosine               cos x
Tangent              tan x        (protected for x = π/2)
Natural logarithm    ln |x|       (protected for x = 0)
Common logarithm     log |x|      (protected for x = 0)
Exponential          exp x
Power function       x^y
Reciprocal           1/x          (protected for x = 0)
Absolute value       |x|
Integer or truncate  int(x)
Sign                 sign(x)

Prosthesis Control: Fitness
22 signals are recorded per motion (abduction, extension, flexion). The raw fitness is 100 times the minimum pairwise separation between the outputs produced for the three motion classes (abduction vs. extension, abduction vs. flexion, extension vs. flexion), measured relative to the spread of the outputs within each class; outputs falling between the class regions are left undefined.

Myoelectric Prosthesis Control Reference
• J. J. Fernandez, K. A. Farry, J. B. Cheatham. "Waveform Recognition Using Genetic Programming: The Myoelectric Signal Recognition Problem". GP '96, The MIT Press, pp. 63–71.

Classifier Systems (Michigan approach)
Each individual is a rule, e.g. IF X = A AND Y = B THEN Z = D, and the population is a set of such IF ... THEN ... rules.
Strength (fitness) update:
$f_{n+1}(\gamma) = (1 - e)\, f_n(\gamma) + e\, r(n)$ if $\gamma \in \mathrm{class}(n)$,
$f_{n+1}(\gamma) = (1 - p)\, f_n(\gamma)$ if $\gamma \notin \mathrm{class}(n)$,
where the reward $r = (1 - g N_\gamma) R$ and $N_\gamma$ is the number of attributes in the antecedent part of the rule.

Practical Implementation Issues
• From elegant academic models to less elegant but robust and efficient real-world applications: "evolution programs"
• Handling constraints
• Hybridization
• Parallel and distributed algorithms

Evolution Programs
Slogan: Genetic Algorithms + Data Structures = Evolution Programs
Key ideas:
• use a data structure as close as possible to the object problem;
• write appropriate genetic operators;
• ensure that all genotypes correspond to feasible solutions;
• ensure that the genetic operators preserve feasibility.

Encodings: "Pie" Problems
Each variable is encoded by an integer gene in 0-255; its actual share is the gene value divided by the total. Example:
W = 128, X = 32, Y = 90, Z = 20 (total 270), so X = 32/270 = 11.85%.

Encodings: "Permutation" Problems
All of the following encode the tour 1-2-4-3-8-5-9-6-7:
• Adjacency representation: (2, 4, 8, 3, 9, 7, 1, 5, 6) - position i holds the city that follows city i.
• Ordinal representation: (1, 1, 2, 1, 4, 1, 3, 1, 1) - each gene is an index into the list of not-yet-visited cities.
• Path representation: (1, 2, 4, 3, 8, 5, 9, 6, 7) - the cities in the order they are visited.
• Matrix representation: a Boolean precedence matrix (the slide shows the corresponding 9 x 9 0/1 matrix).
• Sorting representation: (-23, -6, 2, 0, 19, 32, 85, 11, 25) - sorting the values yields the order of the positions.

Handling Constraints
• Penalty functions: risk of spending most of the time evaluating infeasible solutions, of sticking with the first feasible solution found, or of finding an infeasible solution that scores better than feasible ones.
• Decoders or repair algorithms: computationally intensive, tailored to the particular application.
• Appropriate data structures and specialized genetic operators: all possible genotypes encode feasible solutions.

Penalty Functions
The genotype decodes to a point z that may lie outside the feasible region S. Evaluation adds a penalty to the objective:
$\mathrm{Eval}(z) = c(z) + P(z)$, with $P(z) = w(t) \sum_i w_i \Phi_i(z)$,
where each $\Phi_i$ measures the violation of constraint i, the $w_i$ are constraint weights, and $w(t)$ is an overall (possibly time-dependent) weight.

Decoders / Repair Algorithms
Recombination and mutation act freely on genotypes; a decoder or repair algorithm maps every genotype into the feasible region S before evaluation.
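A minimal Python sketch of the penalty-function approach; the quadratic violation term, the constraint encoding as g(z) <= 0 functions and the example weights are assumptions chosen for illustration.

def penalized_cost(cost, constraints, weights, w_t=1.0):
    # Eval(z) = c(z) + P(z),  P(z) = w(t) * sum_i w_i * violation_i(z)
    def evaluate(z):
        penalty = w_t * sum(w * max(0.0, g(z)) ** 2
                            for w, g in zip(weights, constraints))
        return cost(z) + penalty
    return evaluate

# Example: minimize (x - 2)^2 subject to x <= 1, written as g(x) = x - 1 <= 0.
evaluate = penalized_cost(cost=lambda x: (x - 2.0) ** 2,
                          constraints=[lambda x: x - 1.0],
                          weights=[100.0])
print(evaluate(0.5), evaluate(1.5))   # feasible point vs. penalized infeasible point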
Hybridization
1) Seed the population with solutions provided by some heuristics.
2) Use local optimization algorithms as genetic operators (Lamarckian mutation).
3) Encode the parameters of a heuristic in the genotype; the heuristic then builds the candidate solution.

Sample Application: Unit Commitment
• Multiobjective optimization problem: cost vs. emission
• Many linear and non-linear constraints
• Traditionally approached with dynamic programming
• Hybrid evolutionary/knowledge-based approach
• A flexible decision support system for planners
• Solution time increases linearly with the problem size

The Unit Commitment Problem
Cost: $z_\$ = \sum_{i=1}^{n} \left[ C_i(P_i) + SU_i + SD_i + HS_i \right]$, with fuel cost $C_i(P_i) = a_i + b_i P_i + c_i P_i^2$ plus start-up ($SU_i$), shut-down ($SD_i$) and hot-stand-by ($HS_i$) costs.
Emissions: $z_E = \sum_{i=1}^{n} E_i(P_i)$, with $E_i(P_i) = \sum_{j=1}^{m} E_{ij}(P_i)$ and $E_{ij}(P_i) = \alpha_{ij} + \beta_{ij} P_i + \gamma_{ij} P_i^2$.

Predicted Load Curve
(The slide shows the predicted hourly load over 24 hours, with the required spinning reserve margin above the load curve.)

Unit Commitment: Constraints
• Power balance requirement
• Spinning reserve requirement
• Unit maximum and minimum output limits
• Unit minimum up and down times
• Power rate limits
• Unit initial conditions
• Unit status restrictions
• Plant crew constraints
• ...

Unit Commitment: Encoding
Each gene is the output level of a unit at a given hour, expressed as a fraction of its maximum output; the genotype is a units-by-hours matrix, for example:
Time    Unit 1   Unit 2   Unit 3   Unit 4
00:00    1.0      0.8      0.2      0.15
01:00    0.9      1.0      0.2      1.0
02:00    0.0      1.0      0.8      0.2
03:00    0.0      0.5      1.0      0.8
04:00    1.0      0.65     0.8      1.0
05:00    0.8      0.8      0.25     1.0
06:00    1.0      0.4      0.2      1.0
07:00    0.0      0.0      1.0      0.75
08:00    0.5      1.0      1.0      0.8
09:00    1.0      0.5      0.0      0.0
A fuzzy knowledge base assists in decoding the matrix into an operational schedule.

Unit Commitment: Solution
(The slide shows the resulting schedule as a chart of unit states over time: down, hot-stand-by, starting, up, shutting down.)

Unit Commitment: Selection
Competitive selection compares candidate schedules on both objectives, cost and emission (example values from the slide: costs of $507,762 vs. $516,511 and emissions of 213,489 vs. 60,080).

Unit Commitment References
• D. Srinivasan, A. Tettamanzi. "An Integrated Framework for Devising Optimum Generation Schedules". Proceedings of the 1995 IEEE International Conference on Evolutionary Computation (ICEC '95), vol. 1, pp. 1-4.
• D. Srinivasan, A. Tettamanzi. "A Heuristic-Guided Evolutionary Approach to Multiobjective Generation Scheduling". IEE Proceedings Part C - Generation, Transmission, and Distribution, 143(6):553-559, November 1996.
• D. Srinivasan, A. Tettamanzi. "An Evolutionary Algorithm for Evaluation of Emission Compliance Options in View of the Clean Air Act Amendments". IEEE Transactions on Power Systems, 12(1):336-341, February 1997.

Parallel Evolutionary Algorithms
• The standard evolutionary algorithm is stated as a sequential procedure...
• ... but evolutionary algorithms are intrinsically parallel.
• Several models:
  - cellular evolutionary algorithm;
  - fine-grained parallel evolutionary algorithm (grid);
  - coarse-grained parallel evolutionary algorithm (islands);
  - sequential evolutionary algorithm with parallel fitness evaluation (master-slave).

Terminology
• Panmictic
• Apomictic

Island Model
(The slide shows several subpopulations, the islands, evolving independently and periodically exchanging migrants.)
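The coarse-grained (island) model can be sketched as follows in Python; the ring migration topology, the migration interval, the number of migrants and the evolve_one_generation hook are illustrative choices, not details given on the slides.

import random

def island_model(islands, evolve_one_generation, generations=100,
                 migration_interval=10, migrants=2):
    # islands: list of populations; each population is a list of individuals.
    for g in range(1, generations + 1):
        # Each island evolves independently (on a real system, in parallel).
        islands = [evolve_one_generation(pop) for pop in islands]
        if g % migration_interval == 0:
            # Ring topology: each island sends a few random individuals to its
            # neighbour, where they replace randomly chosen residents.
            for i, source in enumerate(islands):
                target = islands[(i + 1) % len(islands)]
                for _ in range(migrants):
                    target[random.randrange(len(target))] = random.choice(source)
    return islands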
Selected Applications in Biology and Medical Science
• the protein folding problem, i.e. determining the tertiary structure of proteins using evolutionary algorithms;
• quantitative structure-activity relationship modeling for drug design;
• applications to medical diagnosis, like electroencephalogram (EEG) classification and automatic feature detection in medical imagery (PET, CAT, NMR, X-ray, etc.);
• applications to radiotherapy treatment planning;
• applications to myoelectric prosthesis control.

Sample Application: Protein Folding
• Finding the 3-D geometry of a protein to understand its functionality
• Very difficult: one of the "grand challenge" problems
• Standard GA approach
• Simplified protein model

Protein Folding: The Problem
• Much of a protein's function may be derived from its conformation (3-D geometry or "tertiary" structure).
• Magnetic resonance & X-ray crystallography are currently used to view the conformation of a protein:
  - expensive in terms of equipment, computation and time;
  - require isolation, purification and crystallization of the protein.
• Prediction of the final folded conformation of a protein chain has been shown to be NP-hard.
• Current approaches:
  - molecular dynamics modelling (brute force simulation);
  - statistical prediction;
  - hill-climbing search techniques (simulated annealing).

Protein Folding: Simplified Model
• 90° lattice (6 degrees of freedom at each point);
• Peptides occupy intersections;
• No side chains;
• Hydrophobic or hydrophilic (no relative strengths) amino acids;
• Only hydrophobic/hydrophilic forces considered;
• Adjacency considered only in cardinal directions;
• Cross-chain hydrophobic contacts are the basis for evaluation.

Protein Folding: Representation
• Relative move encoding: one move per peptide, e.g. UP DOWN FORWARD LEFT UP RIGHT ...
• Preference order encoding: for each peptide, a ranking of all possible moves (e.g. UP DOWN FORWARD LEFT ..., LEFT LEFT UP DOWN RIGHT ..., ...); the first legal move in the ranking is applied.

Protein Folding: Fitness
Decode: plot the course encoded by the genotype, then test each occupied cell:
• any collision: -2;
• no collision AND a hydrophobe in an adjacent cell: +1.
Notes:
• for each contact: +2;
• hydrophobes adjacent along the chain are not discounted in the scoring;
• multiple collisions (more than one peptide in one cell): -2;
• hydrophobe collisions imply an additional penalty (no contacts are scored).

Protein Folding: Experiments
• Preference ordering encoding;
• Two-point crossover with a rate of 95%;
• Bit mutation with a rate of 0.1%;
• Population size: 1000 individuals;
• Crowding and incest reduction;
• Test sequences with known minimum-energy configuration.

Protein Folding References
• S. Schulze-Kremer. "Genetic Algorithms for Protein Tertiary Structure Prediction". PPSN 2, North-Holland, 1992.
• R. Unger, J. Moult. "A Genetic Algorithm for 3D Protein Folding Simulations". ICGA-5, 1993, pp. 581–588.
• A. L. Patton, W. F. Punch III, E. D. Goodman. "A Standard GA Approach to Native Protein Conformation Prediction". ICGA-6, 1995, pp. 574–581.

Sample Application: Drug Design
Purpose: given a chemical specification (activity), design a tertiary structure complying with it.
Requirement: a quantitative structure-activity relationship model.
Example: design ligands that can bind targets specifically and selectively - complementary peptides.

Drug Design: Implementation
An individual is a sequence of amino acids (residues), e.g. N L H A F G L F K A; each residue carries a name and a hydropathic value.
Operators:
• Hill-climbing crossover
• Hill-climbing mutation
• Reordering
There is no explicit selection: selection is implicit in the hill-climbing operators.

Drug Design: Fitness
For the target (hydropathy values $h_i$) and the complement (hydropathy values $g_i$), compute the moving-average hydropathies over a window of $\pm s$ residues:
$a_k = \sum_{i=k-s}^{k+s} h_i$, $b_k = \sum_{i=k-s}^{k+s} g_i$, for residues $k = s, \dots, n - s$.
Complementarity: $Q = \dfrac{\sum_i (a_i + b_i)^2}{n - 2s}$, where n is the number of residues in the target; lower Q = better complementarity.
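A small Python sketch of this complementarity score; dividing the windowed sum by the window width (a true moving average) and the handling of the chain ends are assumptions made for the example.

def moving_average(values, s):
    # Moving-average hydropathy over a window of +/- s residues.
    return [sum(values[k - s:k + s + 1]) / (2 * s + 1)
            for k in range(s, len(values) - s)]

def complementarity(target_h, complement_h, s=2):
    # Q = sum_i (a_i + b_i)^2 / (n - 2s); lower Q = better complementarity,
    # since a complementary peptide should have roughly opposite hydropathy.
    a = moving_average(target_h, s)
    b = moving_average(complement_h, s)
    n = len(target_h)
    return sum((ai + bi) ** 2 for ai, bi in zip(a, b)) / (n - 2 * s)

# Toy example: a perfectly mirrored hydropathy profile gives Q = 0.
target = [1.8, -3.5, 2.5, -0.9, 4.2, -1.3, 0.7, -2.8]
print(complementarity(target, [-h for h in target]))   # 0.0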
Drug Design: Results
(The slide plots the hydropathic values along the sequence FANSGNVYFGIIAL, comparing the GA-designed complement, Fassina's complement and the target.)

Drug Design References
• T. S. Lim. A Genetic Algorithms Approach for Drug Design. MSc dissertation, Oxford University, Computing Laboratory, 1995.
• A. L. Parrill. "Evolutionary and Genetic Methods in Drug Design". Drug Discovery Today, 1(12), December 1996, pp. 514–521.

Sample Application: Medical Diagnosis
• Classifier systems application
• Learning by examples
• Lymphography
  - 148 examples, 18 attributes, 4 diagnoses
  - estimated performance of a human expert: 85% correct
• Prognosis of breast cancer recurrence
  - 288 examples, 10 attributes, 2 diagnoses
  - performance of human experts unknown
• Location of primary tumor
  - 339 examples, 17 attributes, 22 diagnoses
  - estimated performance of a human expert: 42% correct

Medical Diagnosis Results
• Performance indistinguishable from that of human experts
• Performance for breast cancer: about 75%
• In the primary tumor domain, patients with identical symptoms have different diagnoses
• Symbolic (i.e. comprehensible) diagnosis rules

Medical Diagnosis References
• P. Bonelli, A. Parodi. "An Efficient Classifier System and its Experimental Comparison with two Representative Learning Methods on three Medical Domains". ICGA-4, pp. 288–295.
• T. A. Sedbrook, H. Wright, R. Wright. "Application of a Genetic Classifier for Patient Triage". ICGA-4, pp. 334–338.
• H. F. Gray, R. J. Maxwell, I. Martínez-Perez, C. Arús, S. Cerdán. "Genetic Programming Classification of Magnetic Resonance Data". GP '96, p. 424.
• A. Pazos, J. Dorado, A. Santos. "Detection of Patterns in Radiographs using ANN Designed and Trained with GA". GP '96, p. 432.

Sample Application: Radiotherapy Treatment Planning
• X-rays or electron beams for cancer treatment
• Conformal therapy: uniform dose over cancerous regions, spare healthy tissues
• Constrained optimization, inverse problem
• From dose specification to beam intensities
• Constraints:
  - beam intensities are positive;
  - the rate of intensity change is limited.
• Conflicting objectives: a Pareto-optimal set of solutions

RTP: The Problem
Beams cross the plane of interest through the head:
• TA: dose delivered to the treatment area
• OAR: dose delivered to the organs at risk
• OHT: dose delivered to other healthy tissues
Typical specification: TA = 100%, OAR < 20%, OHT < 30%.

RTP: Fitness and Solutions
(The slide plots candidate plans by their deviations |TA - TA*| and |OAR - OAR*|; the Pareto-optimal set consists of the non-dominated plans, e.g. A and B, while dominated plans such as C are discarded.)

Radiotherapy Treatment Planning References
• O. C. L. Haas, K. J. Burnham, M. H. Fisher, J. A. Mills. "Genetic Algorithm Applied to Radiotherapy Treatment Planning". ICANNGA '95, pp. 432–435.

Evolutionary Algorithms and Soft Computing
(The slide shows the soft-computing picture: evolutionary algorithms, fuzzy logic and neural networks, linked by optimization, fitness provision and monitoring.)

Soft Computing
• Tolerant of imprecision, uncertainty, and partial truth
• Adaptive
• Methodologies:
  - Evolutionary algorithms
  - Neural networks
  - Bayesian and probabilistic networks
  - Fuzzy logic
  - Rough sets
• Bio-inspired: natural computing
• A scientific discipline?
• The methodologies co-operate, they do not compete (synergy)

Artificial Neural Networks
(The slide compares a biological neuron - dendrites, axon, synapses - with the artificial neuron: inputs $x_1, \dots, x_n$ weighted by $w_1, \dots, w_n$ and combined into the output y.)

Fuzzy Logic
(The slide shows a fuzzy membership function taking values between 0 and 1.)

(The next slide highlights the EA-NN arm of the soft-computing picture: the EA optimizes the network, whose performance provides the fitness.)

Neural Network Design and Optimization
• Evolving the weights for a network of predefined structure
• Evolving the network structure
  - direct encoding
  - indirect encoding
• Evolving the learning rules
• Input data selection

Evolving the Weights (predefined structure)
The connection weights of a fixed network are concatenated into a real-valued genotype, e.g. a six-connection network encoded as (0.2, -0.3, 0.6, -0.5, 0.4, 0.7).
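A Python sketch of direct weight encoding on a fixed topology; the tiny 2-2-1 network, the truncation-selection loop and the Gaussian perturbation are illustrative assumptions rather than the scheme shown on the slide.

import math
import random

def forward(weights, x):
    # Fixed 2-2-1 feed-forward network; the genotype is the flat weight vector
    # (weights[0:4] hidden layer, weights[4:6] output layer), no biases for brevity.
    h0 = math.tanh(weights[0] * x[0] + weights[1] * x[1])
    h1 = math.tanh(weights[2] * x[0] + weights[3] * x[1])
    return math.tanh(weights[4] * h0 + weights[5] * h1)

def error(weights, samples):
    # Sum of squared errors over the training samples; lower is better.
    return sum((forward(weights, x) - y) ** 2 for x, y in samples)

def evolve_weights(samples, pop_size=30, generations=200, sigma=0.3):
    population = [[random.gauss(0, 1) for _ in range(6)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: error(w, samples))
        parents = population[: pop_size // 2]            # truncation selection
        children = [[w + random.gauss(0, sigma) for w in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children                  # elitist replacement
    return population[0]

# XOR-like toy task with targets in {-1, +1}:
xor = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
best = evolve_weights(xor)
print(error(best, xor))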
Evolving the Structure: Direct Encoding
The genotype is the connectivity matrix of the network: entry (i, j) = 1 iff there is a connection from node j to node i. Example (inputs 1-3, hidden nodes 4-5, output 6):
     1  2  3  4  5  6
1    0  0  0  0  0  0
2    0  0  0  0  0  0
3    0  0  0  0  0  0
4    1  1  0  0  0  0
5    1  0  1  0  0  0
6    0  1  0  1  1  0

Evolving Weights and Structure (feed-forward, direct encoding)
The genotype lists the layer sizes, e.g. (3, 2, 3), together with the weight matrices $W_0$ (3 x 3), $W_1$ (3 x 2), $W_2$ (2 x 3), $W_3$ (3 x 1).
Mutation operator:
• remove a neuron: delete the corresponding column of $W_{i-1}$ and row of $W_i$;
• duplicate a neuron: copy a column of $W_{i-1}$ and a row of $W_i$;
• remove a layer containing a single neuron: collapse $W_{i-1}$ and $W_i$ into their product;
• duplicate a layer: insert an identity matrix.
Simplification operator:
• remove the neurons whose row in $W_i$ has norm smaller than $\varepsilon$.
Crossover operator:
• choose two crossover points in the parents;
• swap the tails;
• join the pieces with a new random weight matrix.

Evolving the Structure: Graph-Generating Grammar (indirect encoding)
The start symbol S rewrites into a matrix of non-terminals (A, B; C, D); each non-terminal rewrites into a matrix of further symbols, and the terminal symbols (a, b, c, d, e, ...) rewrite into small blocks of 0s and 1s; repeated rewriting produces the full connectivity matrix.
Linear genotype: (S: A, B, C, D || A: c, d, a, c || B: a, a, a, e || C: a, a, a, a || ...).

Evolutionary Algorithms and Fuzzy Logic
(The slide highlights the EA-fuzzy arm of the soft-computing picture.) Three combinations:
• the evolutionary algorithm designs or optimizes a fuzzy system;
• fuzzy government: a fuzzy system dynamically controls the evolutionary algorithm;
• fuzzification of the evolutionary algorithm itself: fuzzy fitness and fuzzy operators.

Fuzzy System Design and Optimization
• Representation
• Genetic operators
• Selection mechanism
• Example: learning fuzzy classifiers

Fuzzy Rule-Based Systems
(Figure: the general structure of a fuzzy rule-based system.)

Representation of a Fuzzy Rulebase
• Membership-function genes: bit strings (e.g. 10011000, 11011010, 00001010) encoding the positions $c_1, c_2, c_3, c_4$ of totally overlapping membership functions.
• Rule genes $R_1, R_2, \dots, R_{max}$: each takes a value in (0 ... $N_{dom}$), selecting the output fuzzy set for one combination of input fuzzy sets (FA1, FA2, FA3 for each input variable), with $N_{max} = N_{dom} \cdot N_{output}$.
The genotype is the concatenation of the membership-function genes and the rule genes.

A Richer Representation
• Input membership functions
• Output membership functions
• Rules, e.g.:
  IF x is A AND v is B THEN F is C
  IF a is D THEN F is E
  IF w is G AND x is H THEN F is C
  IF true THEN F is K

Initialization
• Input variables: number of domains = 1 + exponential(3); each membership function is a trapezoid (a, b, c, d) between the variable's min and max.
• Output variables: number of domains = 2 + exponential(3).
• Rules: number of rules = 2 + exponential(6); for each input variable, flip a coin to decide whether to include it in the antecedent.
(A small sketch of this scheme is given after the Mutation operators below.)

Recombination
Offspring inherit rules from both parents; a rule takes with it all the domains (membership functions) it refers to.

Mutation
• {add, remove, change} a domain of an {input, output} variable;
• {duplicate, remove} a rule;
• change a rule: {add, remove, change} a clause in the {antecedent, consequent};
• perturb an input membership function (shift the trapezoid points a, b, c, d).
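A Python sketch of this initialization scheme; reading exponential(m) as an exponentially distributed value with mean m (rounded down), representing membership functions as trapezoids (a, b, c, d), and the variable-description format are all assumptions made for the example.

import random

def exponential(mean):
    return int(random.expovariate(1.0 / mean))

def random_trapezoid(lo, hi):
    # A membership function as a trapezoid (a, b, c, d) within [lo, hi].
    return tuple(sorted(random.uniform(lo, hi) for _ in range(4)))

def random_rulebase(input_vars, output_var):
    # input_vars / output_var: dicts {"name": ..., "range": (lo, hi)}.
    domains = {v["name"]: [random_trapezoid(*v["range"])
                           for _ in range(1 + exponential(3))]
               for v in input_vars}
    domains[output_var["name"]] = [random_trapezoid(*output_var["range"])
                                   for _ in range(2 + exponential(3))]
    rules = []
    for _ in range(2 + exponential(6)):
        # For each input variable, flip a coin to decide whether to include it.
        antecedent = [(v["name"], random.randrange(len(domains[v["name"]])))
                      for v in input_vars if random.random() < 0.5]
        consequent = (output_var["name"],
                      random.randrange(len(domains[output_var["name"]])))
        rules.append((antecedent, consequent))
    return domains, rules

domains, rules = random_rulebase(
    input_vars=[{"name": "x", "range": (0, 10)}, {"name": "v", "range": (-1, 1)}],
    output_var={"name": "F", "range": (0, 100)})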
Example: Learning Fuzzy Classifiers
(The slide shows an example run of the fuzzy-classifier learning system.)

Controlling the Evolutionary Process
• Motivation:
  - EAs are easy to implement;
  - little problem-specific knowledge is required;
  - but computing times are long.
• Features:
  - complex dynamics;
  - non-binary conditions;
  - "intuitive" knowledge available.

Knowledge Acquisition
Statistics collected from the running algorithm are visualized and distilled into knowledge about its behaviour.

Fuzzifying Evolutionary Algorithms
• Fuzzy fitness (objective function)
• Fuzzy encoding
• Fuzzy operators
  - recombination
  - mutation
• Population statistics

Fuzzy Fitness
• Faster calculation
• Less precision
• Requires a specific selection scheme

Fuzzy Government
A fuzzy rulebase for the dynamic control of an evolutionary algorithm.

Population Statistics -> Parameters
Example control rules:
If D(X_t) is LOW then p_mut is HIGH
If f(X_t) is LOW and D(X_t) is HIGH then Emerg is NO
...
(D(X_t) is a diversity statistic of the population X_t, and f(X_t) its fitness.)

Neuro-Fuzzy Systems
(The slide shows the fuzzy-logic/neural-network integration arm of the soft-computing picture.)
• Fuzzy neural networks
  - fuzzy neurons (OR, AND, OR/AND)
  - learning algorithms (backpropagation-style)
  - NEFPROX
  - ANFIS
• Co-operative neuro-fuzzy systems
  - adaptive FAMs: differential competitive learning
  - self-organizing feature maps
  - fuzzy ART and fuzzy ARTMAP

Fuzzy Neural Networks
(The slide shows a fuzzy neural network: the inputs $x_1, \dots, x_n$ feed a layer of AND neurons through weights $w_{ij}$; the AND neurons' outputs are combined by an OR neuron, through weights $v_1, \dots, v_m$, into the output y.)

FAM Systems
The input x is fuzzified, matched in parallel against the fuzzy associative memory rules $(A_1, B_1), (A_2, B_2), \dots, (A_k, B_k)$, and the combined result is defuzzified into the output y.

(The closing slide recapitulates the soft-computing picture: evolutionary algorithms, fuzzy logic and neural networks, linked by optimization, fitness, monitoring and integration.)
Reference: A. Tettamanzi, M. Tomassini. Soft Computing. Springer-Verlag, 2001.

Summary and Conclusions