Building Blocks CS 5764 Evolutionary Computation Hod Lipson Unifying ideas • Knowledge represented as a population of solutions containing building blocks • Progress is driven by two key processes: – Incremental progress: e.g. mutation (traditional optimization): Refinement – Recombination of solutions (e.g. crossover): Discovering new areas (possibly initially inferior) Terminology “Chromosome” “Gene” 01010100111001010101010010110 Allele one of two or more forms of a gene or a genetic locus A GA Schema • A “template” – a string of symbols taken from the alphabet {0,1,*} – 010*1, *110*, *****, 10101 • The character “*” means “don’t care” – *10*1 represents 01001, 01011, 11001, and 11011 Geometric Interpretation A Schema is a hyperplane in the larger search space manifold Order of a schema • Number of specified alleles in a gene ? 000 001 010 011 ? 010 011 110 111 ? 010 110 ? 101 Order of a schema • How many different strings of length N does a schema of order “O” represent? – A schema of order O represents 2N-O different strings of length N Schema Order Represented Strings *** 0 000 001 010 011 100 101 110 111 *1* 1 010 011 110 111 *10 2 010 110 101 3 101 Destructive Dynamics • Probability of surviving mutation Sm(H)= Defining Length • “D” = The distance between the furthest two non-* symbols Schemata **** *1** *10* 10** 1*1* 1*11 0**1 1001 D 0 1 2 3 Why is the length important? Destructive Dynamics • Probability of surviving single point crossover Strings containing schemata • A bit string represented by a schema is said to “contain” the schema Bit String 1 00 110 1011 Contained Schemata 1 * 00 0* *0 ** 110 11* 1*0 1** *10 *1* **0 *** 1011 101* 10*1 10** 1*11 1*1* 1**1 1*** *011 *01* *0*1 *0** **11 **1* ***1 **** How many schemata does a string of length N include? How many schemata in a population? • There are 3N different schemata (potential genes) of length N • A population of P bit-strings each of length N contains between 2N and min(P2N, 3N) schemata All possible schemata N 3 P 100 Number of Schemata ?-? How many schemata in a population? • There are 3N different schemata of length N • A population of P bit-strings each of length N contains between 2N and min(P2N, 3N) schemata N=3 All possible schemata N 6 20 40 100 P 20 50 100 300 Number of Schemata 64 - 729 1048576 - 52428800 - Estimating Fitness associated with a gene Population 101 100 010 110 f 5 1 2 3 Schemata *** **0 **1 *0* *00 *01 *1* f (5+1+2+3) / 4 = 2.75 (1+2+3) / 3 = 2 5/1=5 (5+1) / 2 = 3 1/1=1 5/1=5 (2+3)/2 = 2.5 Estimation uncertainty: Standard error Observations • If only fitness-proportionate selection is applied (no crossover or mutation), schemata with above (below) average fitness are sampled, generation after generation, by an increasing (decreasing) number of chromosomes. • Schemata with a long defining length have a higher probability to be disrupted by crossover • Schemata with high order have a higher probability of being disrupted by mutation • Schemata with a low order and a short defining length are called building blocks • Building blocks are processed with minimum disruption by GAs, therefore GAs use building blocks of relatively high fitness to build entire solutions • GAs will be successful insofar as the problem has been encoded in a way that can be solved with compact building blocks (low order, low defining lengths) • What is the easiest problem you can think of? Dynamics • • • • • H is a schema present in the population at time t m(H,t) is the number of instances of H at time t u(H,t) is the observed average fitness of H expected number of offspring of x is f(x)/favg(t) If x is an instance of H, then Destructive Dynamics • Probability of surviving single point crossover • Probability of surviving mutation Sm(H)= Combining Effects 13 12 GA (Diversity, Tight Linkage) Best Fitness 11 10 GA (Diversity, Poor Linkage) 9 Parallel Simulated Annealing Parallel Hillclimber GA (Roulette, Tight Linkage) 8 7 6 5 Random Search 4 0 500 1000 1500 2000 2500 3000 Evaluations Generation (x100) Large defining length and small order = poor linkage The Building Block Hypothesis • GAs performs adaptation by identifying and recombining "building blocks", i.e. low order, low defining-length schemata with above average fitness. • GAs perform adaptation by implicitly and efficiently implementing this heuristic. Caveats • Model assumes particular form of representation: – Bit strings, single point crossover, mutation • Assumes fitness-proportionate selection • Assume fixed fitness criterion • Assumes fixed population size Many variations have been published