Automated discovery in math
• Machine learning techniques (GP, ILP, etc.) have been successfully applied in science
• How about mathematics? Can they be used to discover interesting relationships in mathematical “data”?
• This is an exploration of using GP for that purpose
• Specifically, using GP to automatically discover Euler’s identity (V – E + F = 2) from a fairly limited amount of data

Cubes: V = 8, E = 12, F = 6, so V – E + F = 8 – 12 + 6 = 2
Tetrahedra: V = 4, E = 6, F = 4, so V – E + F = 4 – 6 + 4 = 2
Octahedra: V = 6, E = 12, F = 8, so V – E + F = 6 – 12 + 8 = 2

Data for Euler’s identity
#  Polyhedron           V   E   F
1  Cube                 8  12   6
2  Triangular prism     6   9   5
3  Pentagonal prism    10  15   7
4  Square pyramid       5   8   5
5  Triangular pyramid   4   6   4
6  Pentagonal pyramid   6  10   6
7  Octahedron           6  12   8
8  Tower                9  16   9
9  Truncated cube      10  15   7

At a glance
• 50 generations
• Population: 4000 ASTs
• New individuals per generation: 3600 (90% of population)
• Maximum AST depth: 13
• Ramped half-and-half initialization
• 3 non-terminals: +, -, *
• 12 terminals: V, E, F, 1, 2, …, 9
• Crossover, no mutation

Genetic algorithms (GA)
• Search a space of solution attempts (“individuals”)
• Use natural selection to guide the search
• Must have a fitness function that can evaluate any given individual
• Individuals procreate by exchanging (recombining) “genetic material”

Example: SAT solving
• Problem: Given a CNF formula P over n variables x_1, …, x_n, find a satisfying assignment
• Search space: all n-bit strings
• Fitness measure for a given individual b_1 … b_n: # of satisfied clauses in P
• Genetic operations: crossover and mutation
Crossover: a_1 … a_{j-1} | a_j … a_n  +  b_1 … b_{j-1} | b_j … b_n  →  a_1 … a_{j-1} | b_j … b_n  and  b_1 … b_{j-1} | a_j … a_n
Mutation: 01101001 → 01100001

Generic GA algorithm
Parameterized over: N (maximum number of generations), P (population size), G (new individuals per generation)
1. Construct a random initial population
2. Set i := 1
3. If i > N then halt
4. Compute the fitness of each individual; if the fittest solves the problem, halt
5. Create a new population:
   1. Pick P – G individuals and copy them
   2. Create G new individuals by repeated applications of genetic operations
6. Set i := i + 1 and go to step 3

Selection
• How is an individual “picked” for reproduction or copying?
• Main idea: the probability that an individual is selected should be proportional to the individual’s fitness
• Many ways to ensure that. One method is tournament selection:
  – Pick k individuals randomly, where 0 < k <= P
  – Select the fittest of the k
• When k = 1: No selection pressure
• When k = P: Too much selection pressure

Genetic Programming (GP)
• An instance of the generic GA scheme
• Individuals are now programs, i.e., syntactic objects
• Search space is kept finite by bounding program size
• Programs are represented as ASTs (abstract syntax trees)

Programs as ASTs
if x > 0 then y := x * x else y := z + 1
Parsing yields the AST, written here as a nested term: if(>(x, 0), :=(y, *(x, x)), :=(y, +(z, 1)))

Program structure in GP
• Programs are usually simple Herbrand terms, i.e., functional expressions
• AST leaves are called terminals
• Internal nodes are non-terminals
• Non-terminals are function symbols (e.g. +)
• Terminals are constants and variables
• Terminals + non-terminals must be sufficient for expressing solutions

Viewing a functional AST as a “program”
AST: +(*(x, y), 2), i.e. x * y + 2
The program has two “inputs”, x and y. Given specific values for these, it produces a unique result as output

AST Crossover
[Figure: two parent ASTs built from +, * and – over the subtrees T1 … T6, each with a marked crossover point; the two children are obtained by swapping the subtrees rooted at the two crossover points.]

Initial population
• Built randomly
• Two methods for building a random AST:
  – Full method: All branches are equally long
  – Grow method: Different subtrees can have different sizes (but less than the maximum)
• More usual: ramped half-and-half initialization: half of the trees are built with one method, the other half with the other method

Problem formulation
• Can cast it as a standard symbolic regression problem
• View F as a function of E and V, and search the space of all rational functions of two variables (up to a max depth)
• Error function: difference between actual # of faces and the result produced by the program
• Optimization: minimize the error
• Quick convergence

Another approach
• Search space of all identities
• Generated as follows:
  I → T1 = T2
  T → L | T1 + T2 | T1 – T2 | T1 * T2
  L → V | E | F | 1 | 2 | … | 9
• Any other integer can be built from 1, …, 9 and the given non-terminals
• Identity (=) is not a non-terminal; it can only appear at the root of an AST

Details
• Generate P identities randomly (using ramped half-and-half initialization)
• Crossover on two identities S1 = S2 and T1 = T2:
  – Mate two random subterms S_i and T_j, one from each identity, producing two new subterms S_i’ and T_j’
  – If either new term is deeper than the max depth, then use one of the original parents
  – Replace S_i and T_j in the identities by S_i’ and T_j’
• No mutation

Fitness
• An identity is evaluated on a given triple of values for V, E, and F
• Computing the fitness of an identity S = T: for each of the k data triples p, if S = T holds for p, then give the identity a point
• Higher score, greater fitness
• Maximum fitness: 9, minimum: 0

Problem
• Trivially true identities can get perfect scores, e.g.:
  – V = V
  – 1 + 2 = 5 – 2
  – E – E + E = E
• Solution: negative triples, e.g.: V = 0, E = 0, F = 1
• Trivial identities will hold for such negative triples, but plausible identities will not

Fitness computation
• To evaluate an identity S = T:
• For each of the k data triples p:
  – Allocate a point if S = T holds for p
  – Allocate a second point if S = T does not hold for the negative triple
• Maximum score: 18, minimum: 0
• Also impose a penalty of ⌊n/20⌋ points for an identity of length n (to discourage excessively long expressions)
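
The scoring scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the slides. The names `evaluate`, `size`, `holds` and `fitness` are hypothetical, as is the tuple encoding of ASTs. The slides do not say how negative triples are produced, so the sketch pairs each data triple with a negative one made by adding 1 to F. It also assumes that the two points are awarded independently and that "length" means the total number of AST nodes on both sides of the identity.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: an expression is a terminal (variable name or
# integer constant) or a tuple (op, left, right) with op in {"+", "-", "*"}.
Expr = Union[str, int, Tuple[str, "Expr", "Expr"]]
Triple = Dict[str, int]

# The nine (V, E, F) data triples from the slides.
DATA = [(8, 12, 6), (6, 9, 5), (10, 15, 7), (5, 8, 5), (4, 6, 4),
        (6, 10, 6), (6, 12, 8), (9, 16, 9), (10, 15, 7)]
POSITIVES = [dict(zip("VEF", t)) for t in DATA]
# Assumption: one negative triple per data triple, made by adding 1 to F.
NEGATIVES = [{**p, "F": p["F"] + 1} for p in POSITIVES]


def evaluate(expr: Expr, env: Triple) -> int:
    """Evaluate an expression AST for given values of V, E and F."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")


def size(expr: Expr) -> int:
    """Number of nodes in the AST (assumed meaning of 'length')."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def holds(lhs: Expr, rhs: Expr, env: Triple) -> bool:
    """True if the identity lhs = rhs holds for the given triple."""
    return evaluate(lhs, env) == evaluate(rhs, env)


def fitness(lhs: Expr, rhs: Expr) -> int:
    """Score the identity lhs = rhs, minus the floor(n/20) length penalty."""
    score = 0
    for pos, neg in zip(POSITIVES, NEGATIVES):
        if holds(lhs, rhs, pos):
            score += 1  # point for holding on the data triple
        if not holds(lhs, rhs, neg):
            score += 1  # point for failing on the negative triple
    n = size(lhs) + size(rhs)
    return score - n // 20


euler = (("+", ("-", "V", "E"), "F"), 2)  # V - E + F = 2
trivial = ("V", "V")                      # V = V

print(fitness(*euler))    # 18
print(fitness(*trivial))  # 9
```

Under these assumptions Euler's identity reaches the maximum score of 18, while the trivial identity V = V earns only the 9 points for the data triples, because it also holds on every negative triple. Both identities are shorter than 20 nodes, so neither is penalized.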