Report - Question 3

EEC122 - Homework 4
Question 3:
Problem description:
We chose to solve an instance of the subset sum problem.
The problem: given a set of integers, determine whether there exists a non-empty subset of
the numbers whose sum is equal to 0. This problem is NP-hard.
We generated a training set: 150 sets of n integers, drawn randomly with uniform
distribution from [-10, 10]. In this experiment, the training set is also the test set, just as
in the Intertwined Spirals Problem.
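As an illustration of the data described above, here is a minimal Python sketch (not the code used for the report). It assumes that each integer is drawn independently, so repeated values may occur, and that only non-empty subsets count. The names `has_zero_subset` and `make_dataset` are hypothetical.

```python
import random
from itertools import combinations

def has_zero_subset(numbers):
    """Brute-force check: does some non-empty subset of `numbers` sum to 0?"""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == 0:
                return True
    return False

def make_dataset(num_sets=150, n=3, low=-10, high=10, seed=0):
    """Build `num_sets` labelled examples of n integers drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable (assumption)
    data = []
    for _ in range(num_sets):
        numbers = [rng.randint(low, high) for _ in range(n)]
        data.append((numbers, has_zero_subset(numbers)))
    return data
```

The brute-force labelling is exponential in n, which is acceptable for the small set sizes (3 to 5) used in the experiments.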
We then used GP to evolve a Scheme program that, for each set, returns a positive number iff
the set has a subset that sums to 0.
The terminals for the GP are the numbers of the set (X0 .. Xn-1) and a random number R ∈ [-1, 1]. Although a random number might seem irrelevant to this problem, our experiments showed
that removing it from the terminal set harms the results.
The functions are +, -, %, * and IFLTE. % is protected division: for all X, X%0 = 0; for all X≠0
and all Y, Y%X = Y/X.
(IFLTE arg0 arg1 arg2 arg3) means: if arg0 ≤ arg1, then arg2, else arg3.
Individuals: The genome is a Scheme program, represented as a tree. The terminals are the
numbers of the set (X0 ... Xn-1) and a random number ∈ [-1, 1]. The random number is
generated when its node is created and stays unchanged for the rest of the run. The function
set (the inner nodes of the tree) contains the two-argument functions -, +, * and % (protected
division: for all x, x%0 = 0; for y≠0, x%y = x/y) and the four-argument function IFLTE (if less
than or equal), which returns arg2 if arg0 ≤ arg1 and arg3 otherwise. The terminals are
chosen with equal probability, and so are the functions.
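To make the semantics of the function set concrete, here is a small Python sketch of an interpreter for such trees. The evolved programs in the report are Scheme programs; the nested-tuple encoding, the string form "Xi" for terminals, and the names `protected_div`, `iflte` and `evaluate` are assumptions made only for this illustration.

```python
def protected_div(y, x):
    """Protected division: y % x is 0 when x is 0, otherwise y / x."""
    return 0 if x == 0 else y / x

def iflte(arg0, arg1, arg2, arg3):
    """IFLTE: return arg2 if arg0 <= arg1, else arg3."""
    return arg2 if arg0 <= arg1 else arg3

# Function name -> implementation (the function set listed in the text).
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "%": protected_div,
    "IFLTE": iflte,
}

def evaluate(tree, xs):
    """Evaluate a tree on the input set xs.

    Assumed encoding: inner node = (name, child, ...), terminal "Xi" = xs[i],
    any other leaf = the numeric constant R.
    """
    if isinstance(tree, tuple):
        args = [evaluate(child, xs) for child in tree[1:]]
        return FUNCTIONS[tree[0]](*args)
    if isinstance(tree, str):
        return xs[int(tree[1:])]
    return tree
```

For example, `evaluate(("IFLTE", "X0", "X1", 1.0, -1.0), [1, 2])` takes the first branch because X0 ≤ X1. Both branches are evaluated eagerly here, which is harmless because the functions have no side effects.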
The population is initialized with random full trees of 3 levels. To prevent the trees
from growing too large, any tree whose size exceeds 200 nodes is removed from the
population and replaced with a new random 3-level tree.
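The initialization and size limit described above could look roughly like the following Python sketch. It assumes that "3 levels" counts levels of nodes (two levels of functions above one level of terminals), that the constant R is one terminal choice alongside X0..Xn-1, and that trees are nested tuples. All names are hypothetical.

```python
import random

# Function name -> number of arguments (the function set listed in the text).
ARITY = {"+": 2, "-": 2, "*": 2, "%": 2, "IFLTE": 4}

def random_terminal(n, rng):
    """Pick X0..Xn-1 or a fresh constant R in [-1, 1], each with equal probability."""
    choice = rng.randrange(n + 1)
    if choice == n:
        return rng.uniform(-1.0, 1.0)
    return f"X{choice}"

def full_tree(levels, n, rng):
    """Build a full tree: functions on every level except the last, which holds terminals."""
    if levels == 1:
        return random_terminal(n, rng)
    name = rng.choice(sorted(ARITY))
    children = tuple(full_tree(levels - 1, n, rng) for _ in range(ARITY[name]))
    return (name,) + children

def size(tree):
    """Number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1

def enforce_size_limit(tree, n, rng, max_nodes=200, levels=3):
    """Replace an oversized tree with a new random full tree, as described in the text."""
    if size(tree) > max_nodes:
        return full_tree(levels, n, rng)
    return tree
```

Under these assumptions a fresh 3-level tree has between 7 nodes (binary functions only) and 21 nodes (IFLTE at both function levels).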
The fitness of an individual is the number of sets it classifies correctly. Note that a
random guess of a number ∈ [-1, 1] is expected to classify 50% of the sets correctly. On our
dataset of 150 sets, that corresponds to a fitness of 75. Therefore, for a program to be considered
"good", it must achieve a fitness greater than 75.
Selection: Tournament selection, k=2.
Population size: 1500
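The fitness measure and the tournament selection described above can be sketched in Python as follows. This is an illustration, not the code used for the report: `program` stands for any callable that maps a set to a number, the dataset is assumed to be a list of (numbers, label) pairs, and the tournament is assumed to draw its k contenders without replacement.

```python
import random

def fitness(program, dataset):
    """Count the sets that `program` classifies correctly.

    A positive output is read as "a zero-sum subset exists".
    """
    correct = 0
    for numbers, has_zero_subset in dataset:
        says_yes = program(numbers) > 0
        if says_yes == has_zero_subset:
            correct += 1
    return correct

def tournament_select(population, fitnesses, k=2, rng=random):
    """Pick k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

With k=2 the selection pressure is mild: an individual only needs to beat one randomly chosen rival to be selected.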
Crossover: Sub-tree exchange crossover. The operator randomly chooses a node in each
tree and swaps the sub-trees rooted at those nodes. After some tuning, we found that a
medium pc (0.4) works best.
Mutation: Random sub-tree replacement mutator. It randomly picks a node in the tree and
replaces it with a new randomly generated terminal. After some tuning, we found that a low pm
(0.1) works best.
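The two variation operators described above can be sketched in Python as follows. This is an illustration under stated assumptions: trees are nested tuples of the form (name, child, ...), pc and pm are applied once per pair or individual, every node is equally likely to be picked, and all function names are hypothetical.

```python
import random

def node_paths(tree, path=()):
    """Return the path (tuple of child indices) to every node in the tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(node_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    """Follow `path` from the root and return the sub-tree found there."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(a, b, rng=random, pc=0.4):
    """With probability pc, swap one randomly chosen sub-tree between a and b."""
    if rng.random() >= pc:
        return a, b
    pa = rng.choice(node_paths(a))
    pb = rng.choice(node_paths(b))
    return (replace_subtree(a, pa, get_subtree(b, pb)),
            replace_subtree(b, pb, get_subtree(a, pa)))

def random_terminal(n, rng):
    """Pick X0..Xn-1 or a fresh constant in [-1, 1], each with equal probability."""
    choice = rng.randrange(n + 1)
    return rng.uniform(-1.0, 1.0) if choice == n else f"X{choice}"

def mutate(tree, n, rng=random, pm=0.1):
    """With probability pm, replace one randomly chosen node with a new random terminal."""
    if rng.random() >= pm:
        return tree
    path = rng.choice(node_paths(tree))
    return replace_subtree(tree, path, random_terminal(n, rng))
```

Note that crossover preserves the total number of nodes across the two offspring, while this mutation can only shrink a tree or leave its size unchanged, since the picked node is always replaced by a single terminal.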
Generations: 501
Experiment A: sets of size 3:
We performed 10 independent runs.
In this experiment, the best individual classifies 148.25 sets correctly (on average) out of 150
(98.8333% success).
Experiment B: sets of size 4:
Again, we performed 10 independent runs.
The best solver found classifies 134.3333 sets correctly (on average), which is 89.56% success.
Experiment C: sets of size 5, integers ∈ [-10, 10]:
The best solver found classifies 134 sets correctly (on average), which is 89.33% success.
Conclusions, observations and remarks:
• Even though this problem is indeed hard to solve, genetic programming can evolve
fairly good solvers, which solve about 90% of the cases.
• With this method of development, overfitting might be a problem, since the training set is
also the test set.
• Ideas for future work: run for much longer (~10,000 generations); try a
different mutation method; try different inner functions in the GP tree.