MECHANICS OF GENETIC PROGRAMMING
Tin-Shuk (Timmy) Wong

Introduction
I am working in ECJ to use genetic programming to develop a strategy for investing in the stock market. Because I don’t yet have visual results, I will give a brief presentation on the theory behind genetic programming in general.

What is genetic programming?
Genetic programming (GP) is a specialized variation of genetic algorithms (GA). The modern tree-based form was popularized by John Koza in the nineties. GP uses programs as individuals in the genetic pool, with each program usually modeled as a tree (a binary tree when every function takes two arguments) rather than a string of numbers. Leaf nodes are typically known as terminals (inputs and constants), while inner nodes are known as functions (operations applied to their children). One advantage of this representation is that the trees can be searched and ordered easily.

Crossover
Recall that crossover is the “reproduction” step of genetic algorithms. GP performs crossover by picking a random node in each of two parent trees and swapping them. If the chosen nodes are functions, their children are swapped along with them, so entire subtrees are exchanged. Another advantage of the tree structure is that even identical parents can produce different children, because the two crossover points need not be at the same position.

[Figure: Crossover with different parents]
[Figure: Crossover with identical parents]

Mutation
The other primary operation used in GA and GP is mutation. Mutation can be implemented in a few different ways: the GP algorithm can allow only terminals to be altered, or it can allow entire subtrees to be replaced.

[Figure: Mutation diagram]

GP applied to the stock market
Genetic programming is more powerful than generic GA because it outputs a full program rather than a string of numbers. In principle, it can be applied to any problem that is tractable for machine learning. It is a natural fit for the stock market, which has a simple fitness criterion (earning the most money) but many complicated variables that constantly change.

From here on out
ECJ is a popular GA/GP suite written in Java.
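Before getting into my own setup, here is a minimal plain-Java sketch of the tree representation, subtree crossover, and the two mutation styles described earlier. It does not use ECJ's API. Every name in it (GpSketch, Node, crossover, mutateTerminal, mutateSubtree, and so on) is hypothetical, and the function set {+, *} and terminal set {x, digits 0 to 9} are assumptions chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative GP sketch; every name here is hypothetical and not part of ECJ.
final class GpSketch {

    // A node with children is a function; a node without children is a terminal.
    static final class Node {
        String label;   // assumed symbols: "+", "*", "x", or a digit "0".."9"
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }
        boolean isTerminal() { return children.isEmpty(); }
        Node copy() {   // deep copy, so parent trees are never modified
            Node n = new Node(label);
            for (Node c : children) n.children.add(c.copy());
            return n;
        }
        int size() { int s = 1; for (Node c : children) s += c.size(); return s; }
        double eval(double x) {   // run the program on input x
            switch (label) {
                case "+": return children.get(0).eval(x) + children.get(1).eval(x);
                case "*": return children.get(0).eval(x) * children.get(1).eval(x);
                case "x": return x;
                default:  return Double.parseDouble(label);
            }
        }
        @Override public String toString() {
            if (isTerminal()) return label;
            return "(" + label + " " + children.get(0) + " " + children.get(1) + ")";
        }
    }

    static List<Node> nodes(Node root) {   // all nodes, pre-order
        List<Node> out = new ArrayList<>();
        out.add(root);
        for (Node c : root.children) out.addAll(nodes(c));
        return out;
    }

    // Overwrite target in place with the label and children of source.
    static void replace(Node target, Node source) {
        target.label = source.label;
        target.children.clear();
        target.children.addAll(source.children);
    }

    // Subtree crossover: swap one random subtree between copies of the parents.
    static Node[] crossover(Node p1, Node p2, Random rng) {
        Node c1 = p1.copy(), c2 = p2.copy();
        List<Node> n1 = nodes(c1), n2 = nodes(c2);
        Node a = n1.get(rng.nextInt(n1.size()));
        Node b = n2.get(rng.nextInt(n2.size()));
        Node savedA = a.copy();
        replace(a, b);
        replace(b, savedA);
        return new Node[] { c1, c2 };
    }

    static String randomTerminal(Random rng) {
        return rng.nextBoolean() ? "x" : String.valueOf(rng.nextInt(10));
    }

    static Node randomTree(Random rng, int depth) {
        if (depth <= 0 || rng.nextBoolean()) return new Node(randomTerminal(rng));
        return new Node(rng.nextBoolean() ? "+" : "*",
                        randomTree(rng, depth - 1), randomTree(rng, depth - 1));
    }

    // Terminal-only mutation: relabel one randomly chosen leaf.
    static Node mutateTerminal(Node parent, Random rng) {
        Node child = parent.copy();
        List<Node> leaves = new ArrayList<>();
        for (Node n : nodes(child)) if (n.isTerminal()) leaves.add(n);
        leaves.get(rng.nextInt(leaves.size())).label = randomTerminal(rng);
        return child;
    }

    // Subtree mutation: replace one random subtree with a fresh random tree.
    static Node mutateSubtree(Node parent, Random rng, int maxDepth) {
        Node child = parent.copy();
        List<Node> all = nodes(child);
        replace(all.get(rng.nextInt(all.size())), randomTree(rng, maxDepth));
        return child;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        Node p = new Node("+", new Node("x"), new Node("*", new Node("2"), new Node("x")));
        Node[] kids = crossover(p, p, rng);   // identical parents
        System.out.println(p + " -> " + kids[0] + " and " + kids[1]);
        System.out.println(mutateTerminal(p, rng) + " / " + mutateSubtree(p, rng, 2));
    }
}
```

The main method passes the same tree as both parents, which mirrors the point about identical parents: the children can differ from the parent whenever the two randomly chosen subtrees are not the same. Swapping node contents in place (rather than tracking parent pointers) is just one convenient design choice for a sketch this size.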
My strategy involves using ECJ to implement several classes of strategies as individuals. These strategies take historical market data as input, and their output predictions are compared against recent market data. Hopefully, this will combine many strategies into one improved heuristic.

[Figure: Program input]

Conclusion
Any questions? Thanks for your time.