Problem Set 4
Genetic Programming and LIL-GP

Where we’ve been
Problem Sets 1 and 2: Theory
- Combinatorics
- Simple Genetic Algorithms (a.k.a. How to run an effective hamburger chain)
- Fundamental Theorem of Genetic Algorithms
- Hyperplane madness

Where we’ve been
Problem Set 3: Genetic Algorithms and the GENESIS system
- Optimization of the function x^10
- Minimization of the De Jong function

Where we’re going
Problem Set 4: Genetic Programming and LIL-GP
- Evolve LISP-like programs without writing one line of LISP! (sorry, John)
- Solve two simple problems: symbolic regression of a quadratic function, and symbolic regression of a boolean even-3-parity function.

Getting the software
1. Change to your cs426 directory.
       cd cs426
2. Copy the software to your AFS space.
       cp -r /afs/ir/class/cs426/lilgp .
   This will create a directory called lilgp under your cs426 directory.

Uncompressing the code
1. Change to your new lilgp directory.
       cd lilgp
   You should have one file: lilgp.tar
2. Uncompress with the following command:
       tar -xvf lilgp.tar

Uncompressing the code
You should now have a directory called lilgp1.1. Under this directory you only need to deal with the following subdirectories:
- 1.1: All the code.
- htmlMan: All the documentation.
Don’t worry about the others.

Where to start
Browse through the documentation. It is indexed and easy to follow. If you want to download it yourself, you can get it at:
    http://garage.cps.msu.edu/software/software-index.html#lilgp
Start with: lil-gp.contents.htm

Where to start
Try running one of the samples. They’re located in the app directory. After compiling and building (just type make), they’re ready to run. Try the regression sample. To run, type:
    gp -f input.file
If the program does not start running, there’s something wrong with your installation.

Warnings:
1. Don’t change anything in the kernel directory.
2. Don’t change anything in the kernel directory.
3. Don’t modify the makefiles. Even though they’re called GNUmakefile, typing make still works.

Implementing a problem
1. Fill in a tableau, deciding on terminal sets, function sets, fitness determination, etc.
2. Write the code.
3. Create a parameter file for the run.
4. Run the code and examine the output files.

Tableau
Terminal set: The leaves of the expression tree.
Function set: The internal nodes.
ex: symbolic regression
    terminal set = {x}
    function set = {+, -, *, %}

Tableau
Fitness
- Raw fitness – the problem's natural measure: sum of error over fitness cases, number of hits, etc.
- Standardized fitness – never negative; 0 is best
- Adjusted fitness – between 0 and 1, with 1 being the most fit individual

Tableau
In general, LIL-GP was designed after what is described in Genetic Programming (Koza, 1992), so all terminology is described in the book. See chapters 6 and 7 for full explanations of all terms used in the program.

Writing the code
Files you need to provide:
1. app.h – Defines the global data structure used to pass information back and forth between files.
2. app.c – Defines all the user callbacks: initialization, fitness evaluation, etc.
3. appdef.h – Two #defines: MAX_ARGS and DATATYPE.
   MAX_ARGS: Maximum number of arguments your functions will take.
   DATATYPE: Type that all of your functions will return (ex: double, int, etc.).
4. function.h – Prototypes for all of the functions in your function set.
5. function.c – Implementations of the functions listed in function.h.

Writing the code
In general, all of the tricky code has been written for you. In fact, two of the sample applications are:
1. Symbolic regression of a function
2. Boolean-11 multiplexer
Both are explained in Genetic Programming (Koza, 1992). You do not need to rewrite the functions in app.c, just modify them.

Creating a parameter file
The parameter files included with the samples are called input.file.
Just modify these for the homework problems. If you want to learn more about the extensive parameters that can be set, look in the documentation.

Creating a parameter file
The only confusion about parameter files will come in specifying crossover rates for the first problem. To avoid this confusion, here is what you need to put in your parameter file to meet the problem criteria:

Creating a parameter file
Parameters for proper breeding:
    breed_phases = 4
    breed[1].operator = crossover, select=fitness, internal=0.0
    breed[1].rate = 0.1
    breed[2].operator = crossover, select=fitness, internal=1.0
    breed[2].rate = 0.8
    breed[3].operator = reproduction, select=fitness
    breed[3].rate = 0.1
    breed[4].operator = mutation, select=fitness
    breed[4].rate = 0.00

Creating a parameter file
Aside from this, all you need to change is pop_size, max_generations, random_seed, and perhaps output.basename. The parameter output.basename sets the file prefix for all of the output files.

Creating a parameter file
For problems 2 and 3, the default crossover parameters work fine. No need to change them unless you’re curious.

Running the code
Very simple to run the code:
1. Compile and build with the command make. If there are compile errors, it will tell you now; fix them before continuing.
2. Run the code with the command:
       gp -f input.file

Output files
LIL-GP generates lots of good stuff:
    .sys – general info about the run
    .gen – stats on tree size and depth
    .prg – stats on fitness and hits
    .bst – info about current best individual
    .his – history of the .bst file
    .stt – unreadable version of all stats
You will want the .bst and .his files.

Problem 1
This problem deals with symbolic regression of the function x^2/2 + 2x + 2. There is a sample app called “regression” provided.
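To make the fitness terms from the tableau concrete for this target, here is a minimal standalone sketch in C. It is not LIL-GP code and does not use the LIL-GP callback interface. The number of fitness cases, the sampling interval, the hit tolerance, and every name in it (score, fitness_t, perfect_candidate, linear_candidate) are assumptions made for illustration; the sample app defines its own. The adjusted fitness uses the 1/(1 + standardized) form from Koza (1992).

```c
/* Illustrative only: names and constants below are hypothetical,
 * not taken from the LIL-GP regression sample. */
#define NUM_CASES 20          /* assumed number of fitness cases */
#define HIT_TOLERANCE 0.01    /* assumed error threshold for a "hit" */

typedef double (*candidate_fn)(double);

typedef struct {
    double raw;           /* sum of absolute error over all cases */
    double standardized;  /* 0 is best; equals raw for an error sum */
    double adjusted;      /* 1 / (1 + standardized), in (0, 1] */
    int hits;             /* cases with error within HIT_TOLERANCE */
} fitness_t;

/* Target function from Problem 1: x^2/2 + 2x + 2. */
double target(double x)
{
    return x * x / 2.0 + 2.0 * x + 2.0;
}

/* Score a candidate on NUM_CASES evenly spaced points in [lo, hi]. */
fitness_t score(candidate_fn f, double lo, double hi)
{
    fitness_t r = {0.0, 0.0, 0.0, 0};
    for (int i = 0; i < NUM_CASES; i++) {
        double x = lo + (hi - lo) * i / (NUM_CASES - 1);
        double d = f(x) - target(x);
        double err = d < 0.0 ? -d : d;   /* absolute error */
        r.raw += err;
        if (err <= HIT_TOLERANCE)
            r.hits++;
    }
    r.standardized = r.raw;
    r.adjusted = 1.0 / (1.0 + r.standardized);
    return r;
}

/* Two stand-ins for evolved programs. */
double perfect_candidate(double x) { return 0.5 * (x + 2.0) * (x + 2.0); }
double linear_candidate(double x)  { return 2.0 * x + 2.0; }
```

Calling score(perfect_candidate, -1.0, 1.0) gives a hit on every case and an adjusted fitness at (or within rounding of) 1, while score(linear_candidate, -1.0, 1.0) misses the quadratic term and scores lower on both measures.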
All you need to do is modify the input.file as described, delete a bunch of lines in app.c and function.h/c, change a few numbers in app.c, and it will run. Guaranteed.

Problem 1
Hint: With the provided random seed, it will not find a close solution even after 151 generations. Fiddle with the random seed and you will quickly find a perfect solution in far fewer than 151 generations.

Problem 2
This problem deals with symbolic regression of a boolean function that performs even-3-parity. Even-3-parity returns true given three inputs if an even number of inputs are true (i.e., 0 true, or 2 true).

Problem 2
There is a sample app called “multiplexer” that performs symbolic regression for a boolean-11 multiplexer. Just modify its files to do this problem. You will need to delete a bunch of functions and add code for nand and nor. You might want to add an even-3-parity test function in app.c for fitness determination.

Problem 2
You will also need to change the specification in app.h. Your global should have either three ints (one for each input line) or one int (if you like bit logic).
Hint: With a specified population size of 1000, you will very likely have a perfect individual at generation 0, since the solution is very simple. This is fine. You should be happy about it, not stressed.

Problem 3
This is the same as problem 2, except now with an automatically defined function. Do not attempt this problem before you have problem 2 working completely. You will be able to quickly modify your problem 2 code (just app.c) to make problem 3 work.

Problem 3
In specifying the function sets for this problem, follow the model given in the “lawnmower” sample app. This shows precisely how to build the function sets for a tree with automatically defined functions. There will be two different function sets: one for the result-producing branch, and one for the ADF branch.

General Hints
Once again, there is not much code to be written. All you are doing is modifying existing code.
Don’t feel bad; this is done all the time.
For the function sets, make sure to read the docs to find out what each of the arguments should be for the different types of arguments, terminals, and functions.

General Hints
Don’t modify the sample apps directly. Copy the files to a new directory, so you’ll always have the original to look back at.
If you get stuck, read the documentation. It’s actually very clear and well-indexed.

What to turn in
- Answers to the written problems.
- app.c and function.c for each problem.
- The .bst file for each problem. The default of one individual in this file is fine.

Final thoughts
This is not a test of your programming ability. It is just an introduction to one piece of easy-to-use software for genetic programming.
Don’t make this harder than it needs to be by rewriting everything from scratch. Use what works.

Final thoughts
Other GP software freely available:
- GPCPP – Has basic functionality, relatively easy to add new problem classes. (C++)
  Download at: http://wwwcgi.cs.cmu.edu/afs/cs/project/airepository/ai/areas/genetic/gp/systems/gpcpp/0.html
- GPQUICK – Simpler than GPCPP but less friendly interface. (C++)
  Download at: http://wwwcgi.cs.cmu.edu/afs/cs/project/airepository/ai/areas/genetic/gp/systems/gpquick/0.html
- ECJ8 – Most versatile, but very large and somewhat complicated. (Java)
  Download at: http://www.cs.umd.edu/projects/plus/ec/ecj/

Final thoughts
Good luck!