Final Project Report for CS 182
Kobi Gal & Emir Kapanci
HBSS with a subproblem-based probability vector

1. Introduction

One approach to problems involving a large search space is Heuristic-Biased Stochastic Sampling (HBSS), an elaboration of iterative sampling search. HBSS generates a solution by starting at the root of the search tree and creating a trajectory: at each decision point it uses a heuristic function to rank the children of the current search node and then selects one of them. The stochastic character of the algorithm comes from the way a child is chosen, which is based on a probability vector (also called the bias function). The selection of this vector is an important factor in the performance of the algorithm.

We propose a technique that generates a vector tailored to the characteristics of the problem domain and thus leads to a quicker solution. We achieve this by dividing the large problem into a set of smaller problems and solving each subproblem optimally. At each level, we rank the children according to the same heuristic function used by HBSS and compare this ranking with the child that was chosen in the optimal solution. Keeping track of this rank information, we construct a probability vector as explained in Section 3. Finally, we run HBSS on the problem at hand, using this probability vector as a guide for choosing the next child at each decision point.

The report is organized as follows. Section 2 describes the HBSS probing routine and how it may be guided to a solution using a probability vector. Section 3 describes how to create a probability vector by dividing a large problem into distinct subproblems and solving each one optimally. In the last section, we present some experimental results and a discussion of them.

2. HBSS probing using a probability vector

The HBSS algorithm we implemented is given as Figures 1-3 in the Appendix. We first create an initial problem of size n; in our case the problem is a Latin square with some portion of random initial assignments. We assign initial values to a percentage of the cells and create a probability vector as described in Section 3. Because of these initial assignments, we will not necessarily be able to assign a value to every slot in the square without breaking a constraint, since the problem may not be optimally solvable. This actually gives the CSP a combinatorial optimization flavor: the aim is to fill as many cells as possible without violating any constraint.

We call the HBSS routine, which in turn invokes the HBSS-SEARCH algorithm to perform a sampling iteration resulting in either a failure or an optimal solution. At each decision point within HBSS-SEARCH, the alternative choices are sorted according to a given heuristic. In our test environment there are actually two decisions at each point of the search. First, we choose the next variable to be assigned, using the most-constrained-variable heuristic: the variable with the fewest possible values in its domain is chosen, and the domains are kept up to date with forward checking. The second decision is choosing a value to assign to that variable from its domain; this is where the value-ordering heuristic comes into play. We implemented the well-known least-constraining-value heuristic, which prefers the value that rules out the smallest number of values in the domains of the unassigned variables. A small sketch of these two decision rules is given below.
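To make the two decision rules concrete, the following is an illustrative sketch in Python (the project itself was written in Lisp; the cell/domain representation and the function names below are our own assumptions, not the project code). A partial assignment maps cells (row, column) to values, and every unassigned cell keeps a domain of still-possible values maintained by forward checking.

def peers(cell, n):
    """Cells sharing a row or a column with `cell` (the Latin square constraints)."""
    r, c = cell
    return [(r, j) for j in range(n) if j != c] + [(i, c) for i in range(n) if i != r]

def most_constrained_variable(domains, assignment):
    """Variable-ordering heuristic: the unassigned cell with the smallest domain."""
    unassigned = [cell for cell in domains if cell not in assignment]
    return min(unassigned, key=lambda cell: len(domains[cell]))

def least_constraining_value_order(cell, domains, assignment, n):
    """Value-ordering heuristic: rank the values in `cell`'s domain so that the
    value removing the fewest options from unassigned peers comes first."""
    def removed(value):
        return sum(1 for p in peers(cell, n)
                   if p not in assignment and value in domains[p])
    return sorted(domains[cell], key=removed)

HBSS uses the ordering returned by least_constraining_value_order as the ranking of the children; the probability vector described next decides which rank is actually expanded.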
At each step, the assignment of each possible value to the chosen variable is one child, and we can rank the children with the least-constraining-value heuristic by computing the domain sizes of the unassigned variables under the assumption that this particular value is assigned. A probability vector then guides HBSS in choosing one of the children, according to the probabilities given for that level. For example, in the map-coloring domain [2], suppose we are to choose a color for the variable Mexico. Suppose the possible values in Mexico's domain are Red, Blue and Green (with Red ranked highest), and that the probability vector for the level corresponding to Mexico is [0.2, 0.3, 0.1, 0.3]. Note that we have four probabilities here but only three possible values. After truncating the vector to the first three entries and normalizing, we get [0.33, 0.5, 0.17]: we choose Red with probability 0.33, Blue with probability 0.5, and Green with probability 0.17. We want to see whether the knowledge gained from solving the subproblems, once incorporated into the probability vector, provides a good prediction of the path leading to a solution.

Note that with the combination of forward checking and the non-backtracking nature of HBSS, it is not uncommon for the domain of a variable to become empty. Unlike conventional search techniques, we do not backtrack at this point but simply continue to the next variable to assign; since we are not guaranteed to be able to complete the square, the result may still be the best solution achievable. When HBSS has finished assigning values to all of the variables in the problem, except those left with an empty domain, a score is computed that penalizes each variable with an empty domain, and we always keep the best solution found so far. A sketch of the biased value-selection step follows.
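As an illustration of this selection step (again a Python sketch with assumed names, not the project's actual code), the truncation and normalization of the level's probability vector and the biased pick can be written as:

import random

def choose_ranked_child(ranked_values, level_probabilities, rng=random):
    """Pick one of the heuristically ranked values for this level.  The level's
    probability vector (assumed at least as long as the value list) is truncated
    to the number of available values and renormalized before the draw, so e.g.
    [0.2, 0.3, 0.1, 0.3] over three values becomes [0.33, 0.5, 0.17]."""
    weights = level_probabilities[:len(ranked_values)]
    total = sum(weights)
    return rng.choices(ranked_values, weights=[w / total for w in weights], k=1)[0]

# The map-coloring example: Red ranked first, Blue second, Green third.
print(choose_ranked_child(["Red", "Blue", "Green"], [0.2, 0.3, 0.1, 0.3]))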
3. Creating the Probability Vector

In our proposed HBSS technique, once we have ordered the children of a node according to a heuristic, we need a probability vector that determines which child rank to choose at each level. To restate briefly: the probability vector determines the likelihood that the child at a given rank is chosen, and usually children ranked higher by the heuristic have a higher probability of being chosen. In [1], what we call a probability vector is called a bias function, and six different bias functions are considered: equal weight, greedy, logarithmic, linear, polynomial and exponential. In our case we use subproblems to obtain the probability vector, and it will not necessarily be a function-like distribution. Our intuition is that solving the subproblems separately allows us to obtain a probability vector tailored specifically to the problem at hand.

Our approach is to find suitable divisions of the problem, i.e. to obtain subproblems of manageable size, solve these optimally, and keep track of the rank (according to our HBSS heuristic) of the child chosen in the optimal solution. This provides statistics from which we can construct a probability vector for the whole problem. There are, of course, advantages and disadvantages to this approach. The advantage is that if the subproblems do provide a good probability vector, we will not need to run many experiments to choose a good bias function. The disadvantage is the division problem itself: we need subproblems that have characteristics similar to those of the whole problem, yet are of manageable size. In the case of the traveling salesperson problem, for example, we could divide the graph into subgraphs to obtain the subproblems. This means that for any specific problem we need to somehow come up with such subproblems. The nice point is that we do not actually care about how to combine the subproblems afterwards; they simply guide us in the choice of the probability vector. So, to obtain subproblems we keep two points in mind: they should resemble the big problem, but be smaller in size.

A Latin Squares problem with no initial assignment of variables, which is the problem described in [3], is actually not a combinatorial optimization problem. But if we start from a randomly pre-assigned Latin square, which might correspond to an unsolvable problem, we can view it as a combinatorial optimization problem and try to assign as many variables as possible without breaking any constraint. In the case where the initial assignments correspond to an optimally solvable problem, in which all constraints can be satisfied, we will also be able to compare HBSS to DFS. The subproblems are simply obtained by cutting smaller squares out of the big square.

[Figure: a partially filled Latin square with a 3x3 subsquare cut out of it; the subsquare contains the initial values 2, 4, 5 and 7.]

The only change we need to make concerns the contents of the subproblems, since the initial assignments may not be compatible with the size of the subsquare. In the square above, for example, the values 2, 4, 5 and 7 appear in the same subsquare, but a 3x3 Latin square allows only 3 possible values, and any mapping of these values onto (0, 1, 2) could create a square in which some constraint is violated. So, instead, we keep track of the filled positions and solve these subproblems optimally using forward checking. This is acceptable because we never aim to solve the subproblems and then recombine them. Moreover, the size of the subproblems is a parameter that can be modified, so we do not fix it in advance but experiment with different sizes to choose the best one: larger subproblems will probably yield a better probability vector, but solving them optimally takes longer. A sketch of the extraction step is given below.
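For the Latin square domain, the extraction might look roughly like the following Python sketch (an assumption on our part: the big square is represented as a dictionary of filled cells, and the windows are taken as non-overlapping tiles, though overlapping windows would work just as well). Only the pattern of filled positions is kept for each subproblem:

def extract_subproblems(filled_cells, n, k):
    """filled_cells: dict mapping (row, col) of the big n-by-n square to its
    initial value.  Returns, for every k-by-k window, the set of positions
    (relative to that window) that are initially filled; the actual values are
    dropped because they need not be consistent with a k-by-k Latin square."""
    subproblems = []
    for top in range(0, n - k + 1, k):
        for left in range(0, n - k + 1, k):
            filled = {(r - top, c - left)
                      for (r, c) in filled_cells
                      if top <= r < top + k and left <= c < left + k}
            subproblems.append(filled)
    return subproblems

Each such pattern is then solved optimally (by depth-first search with forward checking), and the rank of every chosen value is tallied, as described next.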
The next step is to solve these subproblems in order to get the probability vector. Assume we have 3x3 subproblems; this means 3 values and 9 variables. We do not want to limit ourselves to a single vector for all levels (variables): it may be the case that as we approach the goal our heuristic performs better, so the higher-ranked children should be chosen with a higher probability than at the beginning (or vice versa). So we will generally get a different vector for different levels of the search, where a level corresponds to one position of the square being filled.

[Figure: the states of a 3x3 subproblem as it is solved, from the initial square with four pre-assigned cells on the left to the completed square on the right; at each step the most-constrained variable is marked with #.]

We start by solving the leftmost square using depth-first search; subproblems that cannot be solved are simply ignored. Assume that the rightmost square is the solution returned by depth-first search. In our HBSS algorithm we use the most-constrained-variable heuristic to choose the next variable to assign; it does not matter whether the depth-first algorithm followed the same assignment order or a different one, since all we care about is the value it returned for that variable in the final solution. Once we choose a variable, we need to rank the possible values it can take. Let us assume for simplicity that our heuristic always orders the possible values as 0-1-2. (In reality we use the same heuristic as in HBSS, which considers the domains of the unassigned variables, and we do not consider values that are no longer in a variable's domain.)

Since four values are already assigned, we start at level 5. The most-constrained variable is the one marked #. From the final square we see that 0 was chosen there, which means rank 1 is chosen at level 5. Next, at level 6, the value 2 is chosen, and its rank is 3 (in a real run, 2 would rank first, as it is the only value left in the domain). Continuing this way we get the following occurrence table:

            Rank 1   Rank 2   Rank 3
  Level 1     0        0        0
  Level 2     0        0        0
  Level 3     0        0        0
  Level 4     0        0        0
  Level 5     1        0        0
  Level 6     0        0        1
  Level 7     0        1        0
  Level 8     1        0        0
  Level 9     0        0        1

We repeat this for all of the subproblems and add their counts to this table, giving a 9x3 matrix of rank occurrences. As a final step, we must modify it to obtain the actual probability vector for the big problem. Two issues need to be addressed. First, every rank must receive a nonzero probability, so that HBSS can explore the whole search space; this also means that instead of only 3 ranks we need probabilities for N ranks, where N is the size of the problem. Second, the big problem has NxN levels instead of only 9. The second issue is easy to handle: we simply use each level of the small table for (NxN)/9 levels of the big one. Enlarging the width of these level vectors is a more delicate task. Consider the following vector, which we wish to map into one of length 10:

  5  3  0  1

A simple approach would be to assign a small probability to every cell that has zero probability, which would result, for example, in the following vector:

  5  3  0.1  1  0.1  0.1  0.1  0.1  0.1  0.1

The problem is that HBSS ranks the children according to a heuristic and uses this vector to choose among them; with this simple approach, the children ranked 3rd and 10th have the same probability of being chosen, so we lose the heuristic order. We can do better by superposing a linear (or logarithmic, etc.) probability vector, as follows:

  Raw prob. vector:           5     3     0     1     0     0     0     0     0     0
  Linear vector (weight 1):   1    1/2   1/3   1/4   1/5   1/6   1/7   1/8   1/9   1/10
  Resulting prob. vector:     6.00  3.50  0.33  1.25  0.20  0.16  0.14  0.13  0.11  0.10

This approach keeps both the information from the subproblems and the ranking coming from the heuristic; the weight of the superposed linear vector can be adjusted to balance the contribution of each. The HBSS algorithm then uses this final vector to probabilistically pick children at each level. Since variables may have domains smaller than N, the vector is normalized at the time of value assignment (the values are divided by their sum so that the probabilities total 1), considering only the first m entries, where m is the number of possible assignments. A small sketch of this construction follows.
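The construction above can be summarized in a few lines of Python (again an illustrative sketch; the count row, the weight parameter and the function names are as described in the text, but the names themselves are ours). One such vector is built per row of the occurrence table, and each is then reused for (NxN)/9 consecutive levels of the big problem:

def build_probability_vector(rank_counts, n_ranks, weight=1.0):
    """rank_counts: occurrence counts per rank for one level, gathered from the
    optimally solved subproblems (e.g. [5, 3, 0, 1]).  The counts are widened to
    n_ranks entries and a linear 1/rank vector, scaled by `weight`, is superposed
    so that every rank keeps a nonzero, heuristic-ordered weight."""
    widened = list(rank_counts) + [0] * (n_ranks - len(rank_counts))
    return [count + weight / (rank + 1) for rank, count in enumerate(widened)]

def normalize_prefix(vector, m):
    """At assignment time, keep only the first m entries (the number of values
    actually in the variable's domain) and renormalize them to sum to 1."""
    prefix = vector[:m]
    total = sum(prefix)
    return [p / total for p in prefix]

# The example from the text: [5, 3, 0, 1] widened to length 10 with weight 1.
print([round(x, 2) for x in build_probability_vector([5, 3, 0, 1], 10)])
# Agrees with the "Resulting prob. vector" row above (up to rounding).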
4. Experimental results

The main focus of our project was to compare the performance of our probability vector with the bias functions discussed in [1]. We also observed that, in cases where it was actually possible to complete the initial Latin square, the HBSS algorithm had a computation time comparable to DFS with forward checking and most-constrained-variable selection.

[Graph 1: run time of HBSS (bright line) vs. DFS + heuristic (dark line) as a function of problem size (number of variables), for 30% initial assignments and subproblem size 4.]

The experiments showed that when HBSS was able to reach a solution in the first few trials, it outperformed DFS. However, when it was not able to find an optimal solution in the first probes, DFS performed better, since HBSS took many iterations to explore the parts of the tree not favored by the heuristic. The probability vector we obtained using the subproblems in fact yielded a fast solution (in the first 1-3 trials) in most cases, so the average performance of HBSS was still similar to DFS, as can be seen in Graph 1.

In order to have a comparable scale for all bias functions, we used each one on the same Latin square and noted the best solution found after 20 trials. We then took the minimum of these best scores and recorded the first iteration in which each bias function reached a solution with an equal or higher score. The comparison of our probability vector (shown as SubHBSS) with the linear, exponential, polynomial and logarithmic bias functions is given in Graph 2 below; it outperforms the others in almost all cases, with the polynomial bias function being a close contestant. We could improve performance further by superposing a polynomial vector instead of a linear one, but tuning the superposed bias function to the problem at hand would destroy the generality of the method, and even without problem-specific bias functions our probability vector performs well. These results support our intuition that using subproblems can yield a good HBSS probability vector specific to the problem at hand.

[Graph 2: number of iterations needed by HBSS with our probability vector (SubHBSS) and with the linear, exponential, polynomial and logarithmic bias functions, for problems of 10, 13, 17 and 20 variables with 30% initial assignments.]

5. References

[1] Bresina, J. L. Heuristic-Biased Stochastic Sampling.
[2] Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach.
[3] Assignment 3, CS 182 Staff, Harvard University.

Appendix

I. Running directions
1. Enter Lisp and load "project.system".
2. Make the system, then type (solve-latin-squares n).

II. Pseudocode for generating and solving Latin squares with initial assignments using HBSS:

Solve-Problem ()
    Problem = create-problem (n)
    Problem = assign-initial-values (Problem, percentage)
    Prob-vector = create-probability-vector (Problem)
    Sol = HBSS (Problem, run-limit, optimal-solution, heuristic, Prob-vector)

HBSS (problem, run-limit, optimal-solution, heuristic, prob-vector)
    best-solution = none
    best-score = 0
    for j = 1 to run-limit {
        result = HBSS-SEARCH (problem, level, heuristic, prob-vector)
        when optimal-solution (result)
            return (result, scoring-function (result))
        when scoring-function (result) > best-score {
            best-solution = result
            best-score = scoring-function (result)
        }
    }
    return (best-solution, best-score)

HBSS-SEARCH (problem, level, value-heuristic, prob-vector)
    loop {
        if all-variables-assigned (problem)
            return
        else {
            var-to-assign = most-constrained-var (problem)
            empty-queue (queue)
            for each value-option in domain (var-to-assign) do
                push-value (value-option, queue)
            order-queue (queue, value-heuristic)
            value-to-assign = choose-value (queue, prob-vector, level)
            assign-value (var-to-assign, value-to-assign)
            perform-forward-checking (problem)
        }
    }
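III. For readers who prefer runnable code, the following is an illustrative end-to-end rendering of the pseudocode above in Python, combining the earlier sketches. It is only a sketch under simplifying assumptions and not the project's Lisp implementation: the score is simply the number of consistently filled cells, a single probability vector (assumed to have at least n entries) is shared by all levels rather than one vector per level as in Section 3, and all names are ours.

import random

def peers(cell, n):
    """Cells in the same row or column as `cell`."""
    r, c = cell
    return [(r, j) for j in range(n) if j != c] + [(i, c) for i in range(n) if i != r]

def assign(cell, value, assignment, domains, n):
    """Record the assignment and forward-check the peers' domains."""
    assignment[cell] = value
    for p in peers(cell, n):
        if p not in assignment:
            domains[p].discard(value)

def create_problem(n, fraction, rng):
    """An n-by-n square with roughly `fraction` of the cells pre-assigned
    consistently at random (sampled cells whose domain is already empty are skipped)."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    assignment, domains = {}, {cell: set(range(n)) for cell in cells}
    for cell in rng.sample(cells, int(fraction * n * n)):
        if domains[cell]:
            assign(cell, rng.choice(sorted(domains[cell])), assignment, domains, n)
    return assignment, domains

def hbss_search(n, assignment, domains, prob_vector, rng):
    """One non-backtracking probe; returns the number of cells filled."""
    assignment = dict(assignment)
    domains = {cell: set(d) for cell, d in domains.items()}
    while True:
        open_cells = [c for c in domains if c not in assignment and domains[c]]
        if not open_cells:
            return len(assignment)
        var = min(open_cells, key=lambda c: len(domains[c]))       # most constrained variable
        ranked = sorted(domains[var],                               # least constraining value first
                        key=lambda v: sum(1 for p in peers(var, n)
                                          if p not in assignment and v in domains[p]))
        weights = prob_vector[:len(ranked)]                         # truncated bias for this pick
        assign(var, rng.choices(ranked, weights=weights, k=1)[0], assignment, domains, n)

def hbss(n, fraction, prob_vector, run_limit, seed=0):
    """Repeated probes, keeping the best score; stops early on a full square."""
    rng = random.Random(seed)
    assignment, domains = create_problem(n, fraction, rng)
    best = 0
    for _ in range(run_limit):
        best = max(best, hbss_search(n, assignment, domains, prob_vector, rng))
        if best == n * n:
            break
    return best

print(hbss(n=5, fraction=0.3, prob_vector=[5, 3, 2, 1.5, 1], run_limit=20))

With a per-level table of vectors (one for each of the NxN levels, built as in Section 3), the only change would be to index prob_vector by the number of cells already assigned.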