EEC122 - Homework 4, Question 3

Problem description: We decided to solve instances of the subset sum problem: given a set of integers, determine whether there exists a non-empty subset whose sum is 0. This decision problem is NP-complete. We generated a training set of 150 sets of n integers, drawn uniformly at random from [-10, 10]. In this experiment the training set is also the test set, just as we did in the Intertwined Spirals problem. We then used GP to learn a Scheme program that, for each set, returns a positive number iff the set contains a subset summing to 0.

Individuals: The genome is a Scheme program represented as a tree. The terminals are the numbers of the set (X0 ... Xn-1) and a random constant R ∈ [-1, 1]; R is generated when its node is created and stays unchanged during the run. Although a random constant might seem irrelevant to this problem, experiments show that removing it from the terminal set harms the results. Terminals are chosen with equal probability, and so are functions. The function set (the inner nodes of the tree) contains the two-argument functions +, -, * and % (protected division: for all x, x % 0 = 0; for y ≠ 0, x % y = x / y), and the four-argument function IFLTE (if less than or equal): (IFLTE arg0 arg1 arg2 arg3) returns arg2 if arg0 ≤ arg1, else arg3. The population is initialized with random full trees of depth 3. To prevent the trees from growing too big, any tree exceeding 200 nodes is removed from the population and replaced with a new random depth-3 tree. The fitness of an individual is the number of sets it classifies correctly.
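As a minimal sketch (not the actual course code), the protected division, the IFLTE primitive, and the brute-force ground-truth label that the fitness function compares against could look like this in Python; the names `pdiv`, `iflte`, and `has_zero_subset` are our own for illustration:

```python
from itertools import chain, combinations

def pdiv(x, y):
    """Protected division, written % in the report: x % 0 = 0, else x / y."""
    return 0 if y == 0 else x / y

def iflte(a, b, c, d):
    """(IFLTE a b c d): return c if a <= b, else d."""
    return c if a <= b else d

def has_zero_subset(nums):
    """Ground-truth label: True iff some non-empty subset of nums sums to 0.

    Brute force over all 2^n - 1 non-empty subsets; fine for n = 3..5.
    """
    subsets = chain.from_iterable(
        combinations(nums, r) for r in range(1, len(nums) + 1))
    return any(sum(s) == 0 for s in subsets)
```

With labels from `has_zero_subset`, fitness is simply the count of sets on which the sign of the evolved program's output agrees with the label.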
One can notice that a random guess (a program returning a random number in [-1, 1]) is expected to classify 50% of the sets correctly; on our dataset of 150 sets, that means a fitness of 75. Therefore, for a program to be considered "good" it must achieve fitness greater than 75.

Selection: Tournament selection, k = 2. Population size: 1500.

Crossover: Subtree-exchange crossover. The operator randomly chooses a node in each parent tree and swaps the subtrees rooted at those nodes. After some tuning, we found that a medium crossover probability (pc = 0.4) works best.

Mutation: Random subtree-replacement mutator. It randomly picks a node in the tree and replaces it with a newly generated random terminal. After some tuning, we found that a low mutation probability (pm = 0.1) works best.

Generations: 501.

Experiment A (sets of size 3): We performed 10 independent runs. The best individual classifies, on average, 148.25 out of 150 sets correctly (98.83% success).

Experiment B (sets of size 4): Again over 10 independent runs, the best solver found classifies on average 134.33 sets correctly (89.56% success).

Experiment C (sets of size 5, integers ∈ [-10, 10]): The best solver found classifies on average 134 sets correctly (89.33% success).

Conclusions, observations and remarks: Even though this problem is indeed hard to solve, genetic programming can develop fairly good solvers, handling about 90% of the cases. Since the training set is also the test set, overfitting may be a problem with this methodology.

Ideas for future work: Try much longer runs (~10,000 generations). Try a different mutation method. Try different inner functions in the GP tree.
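The subtree-exchange crossover described above can be sketched as follows; the tree encoding (a terminal, or a list `[op, child, ...]`) and the helper names `nodes` and `replace` are our own assumptions, not the course framework:

```python
import random

def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a tree is a terminal or [op, child, ...]."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace(tree, path, new_sub):
    """Return a copy of tree with the subtree at path replaced by new_sub."""
    if not path:
        return new_sub
    i = path[0]
    return tree[:i] + [replace(tree[i], path[1:], new_sub)] + tree[i + 1:]

def subtree_crossover(parent_a, parent_b, rng=random):
    """Pick a random node in each parent and swap the subtrees rooted there."""
    path_a, sub_a = rng.choice(list(nodes(parent_a)))
    path_b, sub_b = rng.choice(list(nodes(parent_b)))
    return replace(parent_a, path_a, sub_b), replace(parent_b, path_b, sub_a)
```

For example, with `parent_a = ['+', 'X0', ['*', 'X1', 'X2']]`, `nodes` enumerates five nodes, and `replace(parent_a, (2, 1), 'R')` substitutes `X1` with the constant `R`. The same node-picking machinery also serves the size cap (counting nodes) and the subtree-replacement mutation.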