Final Project Report for CS 182
Kobi Gal & Emir Kapanci, Harvard University
HBSS with subproblem-based probability vector
1. Introduction
One approach to problems with a large search space is Heuristic-Biased Stochastic Sampling (HBSS), an elaboration of iterative sampling search. HBSS generates a solution by starting at the root of the search tree and building a trajectory: at each decision point it uses a heuristic function to rank the children of the current node and then selects one of them. The stochastic character of the algorithm comes from the way a child is chosen, which is governed by a probability vector (also called the bias function). The choice of this vector is an important factor in the performance of the algorithm. We propose a technique that generates a vector tailored to the characteristics of the problem domain and thus leads to a quicker solution. We achieve this by dividing the large problem into a set of smaller problems and solving each subproblem optimally. At each level, we rank the children according to the same heuristic function used by HBSS and record the rank of the child chosen in the optimal solution. From this rank information we construct a probability vector, as explained in Section 3. Finally, we run HBSS on the problem at hand, using this probability vector as a guide for choosing the next child at each decision point.

This report is organized as follows: Section 2 describes the HBSS probing routine and how it can be guided to a solution by a probability vector. Section 3 describes how to create a probability vector by dividing a large problem into distinct subproblems and solving each one optimally. In the last section, we present some experimental results and discuss them.
2. HBSS probing using a probability vector
The HBSS algorithm we implemented is given as pseudocode in the Appendix. We first create an initial problem of size n; in our case the problem is a Latin square in which some portion of the cells has been given random initial assignments. We assign initial values to a percentage of the cells and create a probability vector as described in Section 3. Because of these initial assignments, it may be impossible to assign a value to every cell of the square without breaking a constraint; the problem may not be completely solvable. This gives the CSP a combinatorial optimization flavor: the aim is to fill as many cells as possible without violating any constraint.

We call the HBSS routine, which in turn invokes the HBSS-SEARCH procedure to perform a sampling iteration that results in either a failure or an optimal solution. At each decision point within HBSS-SEARCH, the alternative choices are sorted according to a given heuristic. In our test environment, there are actually two decisions at each point of the search. First, we choose the next variable to assign, using the most-constrained-variable heuristic: the variable with the fewest remaining values in its domain is chosen. We then update the domains using forward checking. The second decision is choosing a value for that variable from its domain; this is where our heuristic comes into play. We implemented the well-known least-constraining-value heuristic, which favors the value that rules out the smallest number of values in the domains of the unassigned variables. At each step, assigning one of the possible values to the variable corresponds to one child, and we can rank the children with this heuristic by computing the domain sizes of the unassigned variables under the assumption that the particular value is assigned.
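To make this ranking step concrete, the following Common Lisp sketch (an illustration written for this report, not the project code) orders the candidate values of one cell of a partially filled Latin square by the least-constraining-value heuristic. The grid is assumed to be an N×N array with NIL marking empty cells.

;; A sketch only, not the project code.  GRID is an N x N array of values
;; 0..N-1, with NIL marking an empty cell.

(defun cell-domain (grid row col n)
  "Values not yet used in row ROW or column COL."
  (loop for v below n
        unless (loop for k below n
                     thereis (or (eql v (aref grid row k))
                                 (eql v (aref grid k col))))
        collect v))

(defun peer-cells (row col n)
  "All cells sharing a row or column with (ROW, COL), excluding itself."
  (append (loop for k below n unless (= k col) collect (list row k))
          (loop for k below n unless (= k row) collect (list k col))))

(defun ruled-out-count (grid row col value n)
  "How many empty peer cells would lose VALUE from their domains."
  (loop for (r c) in (peer-cells row col n)
        count (and (null (aref grid r c))
                   (member value (cell-domain grid r c n)))))

(defun rank-values (grid row col n)
  "Candidate values for (ROW, COL), least constraining first."
  (sort (cell-domain grid row col n) #'<
        :key (lambda (v) (ruled-out-count grid row col v n))))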
A probability vector then guides HBSS in choosing one of the children, according to the probabilities given for that level. For example, in the map-coloring domain [2], suppose we are to choose a color for the variable Mexico. Suppose the values in Mexico's domain are Red, Blue and Green (ranked with Red highest), and that the probability vector for the level corresponding to Mexico is [0.2, 0.3, 0.1, 0.3]. Note that we have 4 probabilities here but only 3 possible values. After normalizing the probability vector over its first three entries, we get [0.33, 0.5, 0.17]; thus we choose Red with probability 0.33, Blue with probability 0.5, and Green with probability 0.17. We want to see whether the knowledge gained from solving the subproblems, once incorporated into the probability vector, provides a good prediction of the path leading to the solution. Note that with the combination of forward checking and the non-backtracking nature of HBSS, it is not uncommon for a variable's domain to become empty. Unlike conventional search techniques, we do not backtrack at this point but continue with the next variable to assign. Since we are not guaranteed to be able to complete the square, the result may still be the best attainable solution. When HBSS has finished assigning values to all variables except those whose domains became empty, a score is computed that penalizes each variable left with an empty domain, and we always keep the best solution found so far.
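The value-selection step described above can be sketched as follows (again an illustration, not the project code): given the heuristic-ordered candidates and the probability vector of the current level, we keep only the first m probabilities, re-normalize them and draw one candidate at random, exactly as in the Mexico example.

;; A sketch only.  RANKED-VALUES is the heuristic-ordered candidate list,
;; PROBS the (possibly longer) probability vector for the current level.

(defun choose-ranked-value (ranked-values probs)
  (let* ((m (length ranked-values))
         (weights (subseq probs 0 m))   ; keep only the first m entries
         (total (reduce #'+ weights))   ; normalizing constant
         (r (random (float total))))    ; random point in [0, total)
    (loop for value in ranked-values
          for w in weights
          do (decf r w)
          when (<= r 0) return value
          finally (return (car (last ranked-values))))))

;; Example, matching the map-coloring illustration above:
;; (choose-ranked-value '(red blue green) '(0.2 0.3 0.1 0.3))
;; returns RED with probability 0.33, BLUE with 0.5 and GREEN with 0.17.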
3. Creating the Probability Vector
In our proposed HBSS technique, once we have ordered the children of a node according to a heuristic, we need a probability vector that determines which child rank to choose at each level. To briefly restate, the probability vector determines the likelihood that a child of a certain rank is chosen; usually, children that rank higher according to the heuristic have a higher probability of being chosen. In [1], what we call a probability vector is termed a bias function, and six different bias functions are considered: equal weight, greedy, logarithmic, linear, polynomial and exponential. In our case, we use subproblems to obtain the probability vector, and it will not necessarily follow a function-like distribution. Our intuition is that solving the subproblems separately yields a probability vector tailored specifically to the problem at hand.
Our approach is to find suitable divisions of the problem, i.e., to obtain subproblems that are manageable, solve them optimally, and keep track of the rank (according to our HBSS heuristic) of the child chosen at each step. This provides statistics from which we can construct a probability vector for the whole problem. Of course, there are advantages and disadvantages to this approach. The advantage is that if the subproblems do provide a good probability vector, we will not need to run many experiments to choose a good bias function. The disadvantage, however, is the division itself: we need to obtain subproblems that have characteristics similar to the whole problem, yet are of manageable size. In the case of the traveling salesperson problem, for example, we could divide the graph into subgraphs to obtain the subproblems. This means that for any specific problem we need to somehow come up with such subproblems. The nice point is that we do not actually care about how to combine the subproblems afterwards; they simply guide the choice of our probability vector. So, to obtain subproblems we keep two points in mind: they should resemble the big problem, but be smaller in size.
A Latin Squares problem with no initial assignment of variables, which is the problem described in [3], is not actually a combinatorial optimization problem. But if we start from a Latin square with random initial assignments, which might correspond to an unsolvable problem, we can view it as a combinatorial optimization problem and try to assign as many variables as possible without breaking any constraints. In the case that the initial assignments correspond to an optimally solvable problem, in which all constraints can be satisfied, we can also compare HBSS to DFS. The subproblems are simply obtained by cutting smaller squares out of the big square:
[Figure: an example Latin square with random initial assignments; the subproblems are the smaller squares cut out of it.]
The only change we need to make concerns the contents of the subproblems, since the initial assignments may not be compatible with the size of the subsquare. For example, in the square above, the values 2, 4, 5 and 7 all appear within one subsquare, but a 3×3 Latin square allows only 3 distinct values. Any mapping of these values onto (0, 1, 2) could create a square in which some constraint is violated. So instead we simply keep track of the filled positions and optimally solve the subproblems using forward checking. This approach is acceptable because we are not aiming to solve the subproblems and then combine them. Moreover, the size of the subproblems is a parameter that can be modified, so rather than fixing a value we experiment with different sizes and choose the best one. Larger subproblems will probably yield a better probability vector, but they take longer to solve optimally.
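As an illustration of the cutting step, the following sketch (not the actual project code) copies non-overlapping k×k blocks out of the big square, keeping whatever initial assignments fall inside each block; it assumes the size of the square is a multiple of k.

;; A sketch only.  GRID is an N x N array with NIL for empty cells;
;; N is assumed to be a multiple of K.

(defun cut-subproblems (grid n k)
  "Return a list of K x K blocks copied out of GRID, filled cells included."
  (loop for top from 0 below n by k
        append (loop for left from 0 below n by k
                     collect
                     (let ((sub (make-array (list k k) :initial-element nil)))
                       (dotimes (i k)
                         (dotimes (j k)
                           (setf (aref sub i j) (aref grid (+ top i) (+ left j)))))
                       sub))))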
The next step is to solve these subproblems to obtain the probability vector. Let us assume we have 3×3 subproblems, which means 3 values and 9 variables. We do not want to limit ourselves to a single vector for all levels (variables): it may be that as we approach the goal our heuristic performs better, so the higher-ranked children should be chosen with a higher probability than at the beginning (or vice versa). We will therefore, in general, get a different vector for each level of the search. Here, a level corresponds to one location of the square being filled.
[Figure: a 3×3 subproblem example. The leftmost square shows the subproblem with its initial assignments, # marks the most-constrained variable chosen at each step, and the rightmost square is the depth-first-search solution.]
We start by solving the leftmost square using depth-first search; subproblems that cannot be solved are simply ignored. Assume that the rightmost square is the solution returned by depth-first search. In our HBSS algorithm, we use the most-constrained-variable heuristic to choose the next variable to assign. It does not matter whether the depth-first algorithm followed the same assignment order or a different one; all we care about is the value it gave that variable in the final solution. Once we choose a variable, we need to rank the possible values it can take. Let us assume our heuristic always orders the possible values as 0-1-2. (In reality we use the same heuristic as in HBSS, which considers the domains of the unassigned variables, and we do not consider values that are no longer in a variable's domain; we fix the order here for simplicity.) Since 4 values are already assigned, we are at level 5. The most-constrained variable is marked by #. From the final square, we see that 0 was chosen, which means rank 1 was chosen at level 5. Next, at level 6, 2 is chosen and its rank is 3 (in a real run, 2 would actually rank first, since it is the only value left in the domain). Continuing this way we obtain the following occurrence table:
Level     Rank 1    Rank 2    Rank 3
  1         0         0         0
  2         0         0         0
  3         0         0         0
  4         0         0         0
  5         1         0         0
  6         0         0         1
  7         0         1         0
  8         1         0         0
  9         0         0         1
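A sketch of how such an occurrence table could be accumulated is given below; it is only an illustration. For each level of a solved subproblem we ask for the heuristic ordering of the candidate values and for the value the optimal (DFS) solution assigned there, and increment the counter of the corresponding rank. The helpers ranked-values-at and dfs-value-at are placeholders standing in for the project's own routines.

;; A sketch only.  COUNTS is a (K*K) x K array of occurrence counts indexed
;; by (level, rank).  RANKED-VALUES-AT and DFS-VALUE-AT are placeholders for
;; the routines that give the heuristic ordering at a level and the value
;; the optimal solution chose there.

(defun tally-ranks (subproblem solution k counts)
  (dotimes (level (* k k) counts)
    (let* ((ranked (ranked-values-at subproblem level)) ; heuristic order of candidates
           (chosen (dfs-value-at solution level))       ; value in the optimal solution
           (rank (position chosen ranked)))             ; 0-based rank, or NIL
      (when rank
        (incf (aref counts level rank))))))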
We repeat this for all of the subproblems and add the counts together, which gives us a 9×3 table of occurrences. As a final step, we need to turn this into the actual probability vector for the big problem. Two issues must be addressed. First, every rank needs a non-zero probability, so that HBSS can explore the whole search space; this also means that instead of only 3 ranks we need probabilities for N ranks, where N is the size of the problem. Second, the big problem has N×N levels instead of only 9. The second issue is easy to handle: we simply use each level of the small vector for (N×N)/9 levels of the big one. Enlarging the width of these level vectors is a more complex task. Consider the following level vector, which we wish to map into one of length 10:
[5, 3, 0, 1]
A simple approach could be to assign a small probability to all cells that have 0
probability. This would for example result in the following vector:
[5, 3, 0.1, 1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
The problem is that HBSS ranks the children according to a heuristic and uses this vector to choose among them. As can be seen, with this simple approach the children ranked 3rd and 10th have the same probability of being chosen, so we lose the heuristic order. However, we can do better by superposing a linear (or logarithmic, etc.) probability vector, as follows:
Rank                        1      2      3      4      5      6      7      8      9      10
Raw prob. vector            5      3      0      1      0      0      0      0      0      0
Linear vector (weight 1)    1      1/2    1/3    1/4    1/5    1/6    1/7    1/8    1/9    1/10
Resulting prob. vector      6.00   3.50   0.33   1.25   0.20   0.16   0.14   0.13   0.11   0.10
This approach preserves both the information from the subproblems and the ranking produced by the heuristic. The weight of the superposed linear vector can be adjusted to balance the contribution of the two. The HBSS algorithm then uses this final vector to pick children probabilistically at each level. Since variables may have domains smaller than N, at the time of a value assignment we normalize the vector (divide the values by their sum so the total probability is 1) over only the first m entries, where m is the number of possible assignments.
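These two steps, widening a level vector and normalizing it at assignment time, can be sketched as follows; this is an illustration under the same assumptions as the worked example above, not the project code.

;; A sketch only.  COUNTS is the per-level occurrence vector obtained from
;; the subproblems (e.g. (5 3 0 1)); N is the size of the big problem.

(defun widen-level-vector (counts n &optional (weight 1.0))
  "Pad COUNTS to length N and superpose WEIGHT * 1/rank."
  (loop for rank from 1 to n
        for c = (or (nth (1- rank) counts) 0)
        collect (+ c (* weight (/ 1.0 rank)))))

(defun normalized-prefix (level-vector m)
  "First M entries of LEVEL-VECTOR rescaled to sum to 1."
  (let* ((prefix (subseq level-vector 0 m))
         (total (reduce #'+ prefix)))
    (mapcar (lambda (p) (/ p total)) prefix)))

;; (widen-level-vector '(5 3 0 1) 10) reproduces the resulting probability
;; vector of the table above.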
4. Experimental results
The main focus of our project was to compare the performance of our probability vector with the bias functions discussed in [1]. We also observed that, in cases where it was actually possible to complete the initial Latin square, the HBSS algorithm had a computation time comparable to that of DFS with forward checking and the most-constrained-variable heuristic.
Graph 1 - HBSS (bright line) vs. DFS + heuristic (dark line) for 30% initial assignments and subproblem size of 4. [Plot of run time against problem size (number of variables); graph not reproduced here.]
The experiments showed that when HBSS was able to reach a solution within the first few trials, it outperformed DFS. When it was not able to find an optimal solution in the first probes, however, DFS performed better, since HBSS took many iterations to explore the parts of the tree not favored by the heuristic. The probability vector we obtained from the subproblems did in fact yield a fast solution (within the first 1-3 trials) in most cases, so the average performance of HBSS was still similar to that of DFS, as can be seen in Graph 1.
In order to have a comparable scale for all bias functions, we ran each one on the same Latin square and noted the best solution found after 20 trials. We then took the minimum of these scores and recorded the iteration at which each bias function first reached a solution with an equal or higher score. The comparison of our probability vector (labeled SubHBSS) with the linear, exponential, polynomial and logarithmic bias functions is given in Graph 2 below; it outperforms the others in almost all cases, with the polynomial bias function being a close contender. We could improve performance further by superposing a polynomial vector instead of a linear one, but tailoring the superposed bias function to each problem would reduce the generality of the method, and we see that even without problem-specific bias functions our probability vector performs well. These results support our intuition that using subproblems can yield a good HBSS probability vector specific to the problem at hand.
Graph 2 - HBSS with our probability vector (SubHBSS) vs. HBSS with the regular bias functions (Linear, Exponential, Polynomial, Logarithmic) for 30% initial assignments. [Plot of number of iterations against number of variables; graph not reproduced here.]
5. References
[1] Bresina, John L. Heuristic-Biased Stochastic Sampling.
[2] Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach.
[3] CS 182 Staff. Assignment 3, Harvard University.
Appendix
I. Running directions
1. enter Lisp and load “project.system”
2. make and type (solve-latin-squares n)
II. Pseudocodes for generating and solving Latin squares with initial
assignments using HBSS:
Solve-Problem ()
  problem = create-problem (n)
  problem = assign-initial-values (problem, percentage)
  prob-vector = create-probability-vector (problem)
  sol = HBSS (problem, run-limit, optimal-solution, heuristic, prob-vector)

HBSS (problem, run-limit, optimal-solution, heuristic, prob-vector)
  best-solution = none
  best-score = 0
  for j = 1 to run-limit
  {  result = HBSS-search (copy-problem (problem), heuristic, prob-vector)
     when optimal-solution (result)
        return (result, scoring-function (result))
     when scoring-function (result) > best-score
     {  best-solution = result
        best-score = scoring-function (result)
     }
  }
  return (best-solution, best-score)

HBSS-search (problem, value-heuristic, prob-vector)
  level = 1
  loop
  {  if all-variables-assigned-or-empty-domain (problem)
        return problem
     else
     {  var-to-assign = most-constrained-var (problem)
        empty-queue (queue)
        for each value-option in domain (var-to-assign) do
           push-value (value-option, queue)
        order-queue (queue, value-heuristic)
        value-to-assign = choose-value (queue, prob-vector, level)
        assign-value (var-to-assign, value-to-assign)
        perform-forward-checking (problem)
        level = level + 1
     }
  }