Proceedings of the 7th Annual ISC Graduate Research Symposium
ISC-GRS 2013
April 24, 2013, Rolla, Missouri
AUTOMATED GENERATION OF BENCHMARKS WITH HIGH DISCRIMINATORY
POWER FOR SPECIFIC SETS OF BLACK BOX SEARCH ALGORITHMS
Matthew Nuckolls
Department of Computer Science
Missouri University of Science and Technology, Rolla, MO 65409
ABSTRACT
Determining the best black box search algorithm (BBSA) to use
on any given optimization problem is difficult. It is self-evident
that some BBSAs will perform better than others on a given
problem, but determining a priori which BBSA will perform best is
a task for which we currently lack theoretical underpinnings. A system
could be developed to employ heuristic measures to compare a
given optimization problem to a library of benchmark
problems, where the best performing BBSA is known for each
problem in the library. This paper describes a methodology for
automatically generating benchmarks for inclusion in that
library, via evolution of NK-Landscapes.
1. INTRODUCTION
Some BBSAs lend themselves to straightforward generation of
a benchmark problem. For example, a Hill Climber search
algorithm, faced with a hill leading to a globally optimal
solution, can be expected to rapidly and efficiently climb the
hill, and can furthermore be expected to reach the top faster
than an algorithm that considers the possibility that downhill
may lead to a better solution. For other BBSAs however,
constructing a benchmark problem for which that BBSA will
outperform all others is a non-trivial task. Imperfect
understanding of the interactions between the multitude of
moving parts in a modern search algorithm leads to an
imperfect understanding of what sorts of problems any given
BBSA is best suited for.
A method by which a benchmark problem can be
automatically generated to suit an arbitrary BBSA would allow
the user to assemble a library of benchmark problems. A first
step towards building a benchmark on which an arbitrary BBSA
beats all other BBSAs is a benchmark on which that BBSA beats
all other BBSAs in a small set.
2. BLACK BOX SEARCH ALGORITHMS
Several standard BBSAs were chosen for inclusion in the set,
spanning a spectrum of behaviors. By finding benchmark
problems for a variety of BBSAs, the validity of this
methodology for creating a library of benchmark problems is
strengthened.
Random Search (RA) simply generates random individuals
and records their fitness until it runs out of evaluations. The
individual found with the highest fitness is deemed optimal. This
BBSA is not expected to beat any other BBSA; however, one
consequence of the No Free Lunch Theorem [1] is that we should
be able to find a benchmark problem on which none of the
other included BBSAs does better than Random Search.
Hill Climber (HC) is a steepest-ascent restarting hill
climber. It starts from a random location, and at each time step
it examines all of its neighbors and moves to the neighbor with
the highest fitness. Should it find itself at a peak (a point at
which all neighbors are downhill), it starts over from a new
random location. Note that this algorithm is vulnerable to
plateaus, and will wander instead of restarting.
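As an illustration, the restarting steepest-ascent behavior can be sketched in Python over bit-string individuals, with single-bit flips as the neighborhood. This is a minimal sketch, not the paper's exact implementation; the fitness function and evaluation budget are supplied by the caller.

```python
import random

def hill_climb(fitness, n, max_evals, rng=random.Random(0)):
    """Steepest-ascent restarting hill climber over bit strings of length n.
    Plateaus are wandered rather than triggering a restart."""
    best, best_fit = None, float("-inf")
    evals = 0
    while evals < max_evals:
        x = [rng.randint(0, 1) for _ in range(n)]   # random restart point
        fx = fitness(x)
        evals += 1
        while evals < max_evals:
            # Examine every single-bit-flip neighbor of the current point.
            scored = []
            for i in range(n):
                y = x[:]
                y[i] ^= 1
                scored.append((fitness(y), y))
                evals += 1
            fy, y = max(scored, key=lambda t: t[0])
            if fy < fx:        # at a peak: all neighbors are downhill
                break
            x, fx = y, fy      # equal fitness also moves, i.e. wanders plateaus
        if fx > best_fit:
            best, best_fit = x, fx
    return best, best_fit
```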
At each time step, Simulated Annealing [2] (SA) picks a
random neighbor. If that neighbor has a higher fitness than the
current location, then SA moves to that location. If that
neighbor has a lower or equal fitness, then SA still may move to
that location, with probability determined by a 'cooling
schedule'. Earlier in the run, SA is more likely to move
downhill. SA does not restart. As implemented in this paper, SA
uses a linear cooling schedule so that it is more likely to
explore at the beginning of the run and more likely to exploit at
the end.
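The linear cooling schedule can be sketched as follows. The Metropolis-style acceptance probability exp((fy - fx)/T) is an assumption for illustration, since the paper specifies only that the move probability is determined by the cooling schedule.

```python
import math
import random

def simulated_annealing(fitness, n, max_evals, t0=1.0, rng=random.Random(0)):
    """Simulated annealing over bit strings with a linear cooling schedule:
    the temperature falls linearly from t0 toward zero over the run, so the
    search explores early and exploits late. No restarts."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    best, best_fit = x[:], fx
    for step in range(1, max_evals):
        temp = t0 * (1 - step / max_evals)      # linear cooling
        y = x[:]
        y[rng.randrange(n)] ^= 1                # random single-bit-flip neighbor
        fy = fitness(y)
        # Always accept uphill moves; accept downhill or equal moves with a
        # probability that shrinks as the temperature falls.
        if fy > fx or rng.random() < math.exp((fy - fx) / max(temp, 1e-12)):
            x, fx = y, fy
            if fx > best_fit:
                best, best_fit = x[:], fx
    return best, best_fit
```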
As implemented for this paper, the Evolutionary Algorithm [3]
(EA) is a mu + lambda (mu = 100, lambda = 10) evolutionary
algorithm, using linear ranking (s = 2.0) stochastic universal
sampling for both parent selection and survival selection.
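Linear ranking with stochastic universal sampling can be sketched as below. This is a generic textbook formulation, not the paper's exact code; note that with s = 2.0 the worst-ranked individual receives zero selection probability.

```python
import random

def linear_ranking_probs(pop_size, s=2.0):
    """Selection probabilities under linear ranking with selection pressure s,
    for a population sorted worst (rank 0) to best (rank pop_size - 1)."""
    mu = pop_size
    return [(2 - s) / mu + 2 * i * (s - 1) / (mu * (mu - 1))
            for i in range(mu)]

def sus(probs, n_select, rng=random.Random(0)):
    """Stochastic universal sampling: n_select equally spaced pointers are
    dropped onto the cumulative probability wheel in a single spin."""
    step = 1.0 / n_select
    start = rng.random() * step
    chosen, cum, i = [], 0.0, 0
    for p in (start + j * step for j in range(n_select)):
        # Advance to the wheel segment containing this pointer.
        while cum + probs[i] < p:
            cum += probs[i]
            i += 1
        chosen.append(i)
    return chosen
```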
3. N-K LANDSCAPES
When using NK-landscapes [4], typically the experiment is set up
to use a large number of randomly generated landscapes to
lessen the impact of any one landscape on the results. This line
of research, however, is explicitly searching for fitness
landscapes that have an outsized impact. Each NK-landscape
generated is evaluated on its ability to discriminate amongst a
set of search algorithms. A high scoring landscape is one that
shows a clear preference for one of the search algorithms, such
that the chosen algorithm consistently finds a better solution
than all other algorithms in the set. This is determined via the
following methodology.
3.1. Implementation
In this paper, NK-landscapes are implemented as a pair of
lists. The first list is the neighbors list. The neighbors list is n
elements long, where each element is a list of k+1 integer
indexes. Each element of the i'th inner list is a neighbor of i,
and will participate in the calculation of that part of the overall
fitness. An important implementation detail is that the first
element of the i’th inner list is i, for all i. Making the element
under consideration part of the underlying data (as opposed to a
special case) simplifies and regularizes the code, an important
consideration when metaprogramming is used. A second
important implementation detail is that no number may appear
in a neighbor list more than once. This forces the importance of
a single index point to be visible in the second list, allowing for
easier analysis. The first list is called the neighborses list, to
indicate the nested plurality of its structure.
The second list is the subfunctions list. The subfunctions
list is used in conjunction with the neighborses list to determine
the overall fitness of the individual under evaluation. The
subfunction list is implemented as a list of key value stores, of
list length n. Each key in the key value store is a binary tuple of
length k+1, matching the length of the corresponding inner list,
with every possible such tuple represented in every key value
store. For example, if k is 1, then the possible keys are (0, 0),
(0, 1), (1, 0), and (1, 1). The values for each key are
real numbers, both positive and negative.
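Under these conventions, a random NK-landscape in the two-list representation might be generated as in the following Python sketch. The function name and value range are illustrative assumptions; the structure follows the description above, with each inner list holding k+1 indexes starting with i itself.

```python
import random
from itertools import product

def random_nk(n, k, rng=random.Random(0)):
    """Build the two-list representation: neighborses[i] is [i] plus k distinct
    other indexes, and subfuncs[i] maps every possible length-(k+1) bit tuple
    to a random real value, both positive and negative."""
    neighborses, subfuncs = [], []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        neighborses.append([i] + others)          # i is always first
        subfuncs.append({key: rng.uniform(-1.0, 1.0)
                         for key in product((0, 1), repeat=k + 1)})
    return neighborses, subfuncs
```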
3.2. Evaluation of a Bit String Individual
To evaluate an individual, the system runs down the pair of
lists simultaneously. For each element in neighborses, it
extracts the binary value of the individual at the listed indexes
in the first list. It assembles those binary values into a single
tuple. It then looks at the corresponding subfunc key value store
in the subfuncs list and finds the value associated with that
tuple. The sum of the values found for each element in the pair
of lists is the fitness of that individual in the context of this
NK-landscape.
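The evaluation procedure described above amounts to a few lines. This sketch assumes the two-list representation from Section 3.1, with the individual given as a list of bits.

```python
def nk_fitness(bits, neighborses, subfuncs):
    """Walk the two lists in step: gather the individual's bits at the indexes
    in each inner list into a tuple, look that tuple up in the corresponding
    key value store, and sum the values found."""
    total = 0.0
    for nbrs, table in zip(neighborses, subfuncs):
        key = tuple(bits[j] for j in nbrs)
        total += table[key]
    return total
```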
Part of the design consideration for this structure was ease
of metaprogramming for CUDA [5]. The various components of
the lists plug into a string template of C++ code, which is then
compiled into a CUDA kernel. This kernel can then be run
against a large number of individuals simultaneously. This
approach is not expected to be as fast as a hand-tuned CUDA
kernel that pays proper respect to the various memory
subsystems available; however, it has proven to be faster than
running the fitness evaluations on the CPU, given a sufficiently
large number of individuals in need of evaluation.
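A minimal sketch of the metaprogramming step might look like the following. The template text is invented for illustration (the paper does not show its template), but it follows the described approach of plugging the two lists into a string template of C++ code, unrolling the landscape into straight-line kernel code.

```python
KERNEL_TEMPLATE = """extern "C" __global__
void nk_fitness(const unsigned char* pop, float* out, int n_individuals) {{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n_individuals) return;
    const unsigned char* bits = pop + idx * {n};
    float total = 0.0f;
{body}
    out[idx] = total;
}}
"""

def emit_kernel(n, neighborses, subfuncs):
    """Unroll the landscape into straight-line C++: one lookup table and one
    table access per subfunction, with the gathered bits forming the index."""
    lines = []
    for i, (nbrs, table) in enumerate(zip(neighborses, subfuncs)):
        k1 = len(nbrs)
        # Treat the gathered bits (in inner-list order) as a binary number.
        idx_expr = " + ".join(
            f"bits[{j}] * {1 << (k1 - 1 - p)}" for p, j in enumerate(nbrs))
        vals = ", ".join(f"{table[key]:.6f}f" for key in sorted(table))
        lines.append(f"    const float t{i}[{len(table)}] = {{{vals}}};")
        lines.append(f"    total += t{i}[{idx_expr}];")
    return KERNEL_TEMPLATE.format(n=n, body="\n".join(lines))
```

The emitted string would then be handed to the CUDA toolchain for compilation into a kernel; that step is omitted here.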
3.3. Evolutionary Operators
The search algorithm chosen to guide the modification of
the NK landscapes is a canonical mu + lambda evolutionary
algorithm, with stochastic universal sampling used for both
parent selection and survival selection. Such an algorithm needs
to be able to mutate an individual, perform genetic crossover
between individuals, and determine the fitness of an individual.
Mutation and crossover are intrinsic to the representation of the
individual, and will be covered first. Fitness evaluation is left
for a later section.
Mutation of an NK-landscape is performed in three ways,
and during any given mutation event all, some, or none of the
three ways may be used. The first mutation method is to alter
the neighbors list at a single neighborses location. This does not
alter the length of the list, nor may it ever alter the first element
in the list. The second mutation method is to alter the subfunc
at a single location. All possible tuple keys are still found, but
the values associated with those keys are altered by a random
amount.
The third mutation method alters k. When k is increased,
each element of the neighborses list gains one randomly chosen
neighbor, with care taken that no neighbor can be in the same
list twice, nor can k ever exceed n. Increasing k by 1 doubles
the size of the subfunc key value stores, since each key in the
parent has two corresponding entries in the child key, one
ending in 0, the other ending in 1. For example the key (0, 1) in
the original NK-landscape would need corresponding entries
for (0, 1, 0) and (0, 1, 1) in the mutated NK-landscape. This
implementation starts with the value in the original key and
alters it by a different random amount for each entry in the
mutated NK-landscape.
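The k-increasing mutation might be sketched as follows. The function name and the perturbation range of plus or minus 0.1 are illustrative assumptions; the doubling of each key value store follows the description above.

```python
import random

def increase_k(neighborses, subfuncs, n, rng=random.Random(0)):
    """Grow k by one: every inner list gains a distinct new neighbor, and every
    key value store doubles, each pair of child keys starting from the parent
    value plus an independent random perturbation."""
    if any(len(lst) >= n for lst in neighborses):
        return                        # k may never exceed its maximum
    for i in range(n):
        choices = [j for j in range(n) if j not in neighborses[i]]
        neighborses[i].append(rng.choice(choices))    # no duplicate neighbors
        subfuncs[i] = {key + (b,): val + rng.uniform(-0.1, 0.1)
                       for key, val in subfuncs[i].items()
                       for b in (0, 1)}
```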
When k is decreased, a single point in the inner lists is
randomly chosen, and the neighbor found at that point is
removed from each of the lists. Care is taken so that the first
neighbor is never removed, so k can never be less than zero.
The corresponding entry in subfuncs has two parents; for
example, if the second point in the inner list is chosen, then both
(0, 1, 0) and (0, 0, 0) will map to (0, 0) in the mutated
NK-landscape. This implementation averages the values of the two
parent keys for each index in the subfuncs list.
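Correspondingly, the k-decreasing mutation drops one inner-list position everywhere and averages the two parent keys that collide. A sketch, with the function name an illustrative assumption:

```python
import random

def decrease_k(neighborses, subfuncs, rng=random.Random(0)):
    """Shrink k by one: remove the neighbor at one randomly chosen inner-list
    position (never position 0, which holds i itself); each pair of parent
    keys that now collide maps to the average of their two values."""
    k1 = len(neighborses[0])          # inner-list length, k + 1
    if k1 <= 1:
        return                        # never remove the first element, so k >= 0
    pos = rng.randrange(1, k1)        # position to drop, never position 0
    for i in range(len(neighborses)):
        del neighborses[i][pos]
        merged = {}
        for key, val in subfuncs[i].items():
            short = key[:pos] + key[pos + 1:]
            merged.setdefault(short, []).append(val)
        # Each shortened key has exactly two parents; average their values.
        subfuncs[i] = {key: sum(vals) / len(vals)
                       for key, vals in merged.items()}
```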
Genetic crossover is only possible in this implementation
between individuals of identical n and k, via single point
crossover of the neighborses and subfuncs lists. The system is
therefore dependent on mutation to alter k, and holds n constant
during any given system run.
3.4. Evaluation of NK-Landscape Fitness
The NK-Landscape manipulation infrastructure is used to
evolve landscapes that clearly favor a given search algorithm
over all other algorithms in a set. Accordingly, a fitness score
must be assigned to each NK-Landscape in a population, so that
natural selection can favor the better landscapes, guiding the
meta-search towards an optimal landscape for the selected
search algorithm.
This implementation defines the fitness of the NK-landscape
as follows. First, the 'performance' of each search
algorithm is found. The performance is defined as the mean of
the fitness values of the optimal solutions found across 30
independent runs. While performance is being calculated, all search
algorithms also record the fitness of the worst individual they
ever encountered in the NK-landscape. This provides a
heuristic for an unknown value: the value of the worst possible
individual in the NK-landscape. Once a performance value has
been calculated for every search algorithm in the set, the
performance values and 'worst ever encountered' value are
linearly scaled into the range [0, 1], such that the worst ever
encountered value maps to zero and the best ever encountered
value maps to one. This provides a relative measure of the
performance of the various search algorithms, as well as
allowing for fair comparisons between NK-landscapes.
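The scaling step is a standard min-max normalization. A sketch, assuming performances are keyed by algorithm name:

```python
def normalize_performances(performances, worst_ever):
    """Linearly scale each algorithm's performance into [0, 1]: the worst
    fitness ever encountered on the landscape maps to 0 and the best
    performance in the set maps to 1."""
    best = max(performances.values())
    span = best - worst_ever
    if span == 0:
        return {name: 0.0 for name in performances}   # degenerate landscape
    return {name: (p - worst_ever) / span
            for name, p in performances.items()}
```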
4. NK-LANDSCAPE FITNESS COMPARISONS
Once each search algorithm has a normalized performance
value, the system needs to judge how well this NK landscape
'clearly favors a given search algorithm over all other
algorithms in the set'. This implementation tried two
approaches, one more successful than the other.
4.1. One versus All
The first heuristic used in this implementation was to
calculate the set of differences between the performance of the
favored algorithm and the performance of each of the other
algorithms, and then find the minimum of the set of differences.
The minimum of the set of differences is then used as the
fitness of the NK-landscape. Note that the fitness ranges from
negative 1 to positive 1. A fitness of positive 1 would
correspond to an NK-landscape where the optimal individual
found by all of the non-favored algorithms has identical fitness
to the 'worst ever encountered' individual, while the favored
algorithm finds any individual better than that.
This approach suffered because it needed a 'multiple
coincidence' to make any forward progress. The use of the
minimum function meant that an NK-landscape needed to
clearly favor one algorithm over all others. Favoring a pair of
algorithms over all others was indistinguishable from not
favoring any algorithms at all, so there was no gradient for the
meta-search algorithm to climb.
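For reference, the heuristic reduces to a one-line minimum over pairwise gaps. A sketch over the normalized performance values, keyed by algorithm name (an assumed convention):

```python
def one_vs_all_fitness(normalized, favored):
    """Landscape fitness under the One-versus-All heuristic: the smallest gap
    between the favored algorithm's normalized performance and that of every
    other algorithm in the set. Ranges over [-1, 1]."""
    return min(normalized[favored] - v
               for name, v in normalized.items() if name != favored)
```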
4.2. Pairwise Comparisons
Making pairwise comparisons and allowing the meta-search
algorithm to simply compare the normalized optimal
individuals between two search algorithms proved to provide a
better gradient. Since any change in the relative fitness of the
optimal individuals was reflected in the fitness score of the
NK-landscape, the meta-search had immediate feedback, not
needing to cross plateaus of unchanging fitness.
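The Pairwise heuristic is then simply the signed gap for a single ordered pair of algorithms:

```python
def pairwise_fitness(normalized, favored, opponent):
    """Landscape fitness under the Pairwise heuristic: the signed gap between
    the favored algorithm and a single opponent, so any change in their
    relative performance immediately moves the score."""
    return normalized[favored] - normalized[opponent]
```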
5. DISTRIBUTED ARCHITECTURE
Taking advantage of the parallelizable nature of evolutionary
algorithms, this implementation used an asynchronous message
queue (beanstalkd) and a web server (nginx) to distribute the
workload across a small cluster of high performance hardware.
Each of the four machines in the cluster has a quad-core
processor and two CUDA compute cards. A worker node
system was developed whereby the head node could place a job
request for a particular search algorithm to be run against a
particular NK landscape. This job request was put into the
message queue, to be delivered to the next available worker
node. When the worker node received the request, it first
checked if it had a copy of the requested NK-landscape in a
local cache. If not, the worker node used the index number of
the NK-landscape to place a web request with the head node
and download the data sufficient to recreate the NK-landscape
and place it in the local cache. The use of the web based
distribution channel was necessary because the representation
of an NK-landscape can grow very large, and the chosen
message queue has strict message size limits. For the results
presented in this paper, the distributed system ran using 16
worker nodes utilizing the CPU cores, plus another 8 worker
nodes running their fitness functions on the CUDA cards.
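The worker-side caching logic can be sketched as follows. Here `fetch_landscape` and `run_search` are hypothetical names standing in for the HTTP download from the head node and the actual BBSA run; the beanstalkd and nginx plumbing is omitted.

```python
def handle_job(job, cache, fetch_landscape, run_search):
    """Worker-side job handling: the queue message carries only the landscape
    index and algorithm name; the (potentially large) landscape itself is
    pulled over HTTP from the head node once, then served from a local cache."""
    idx = job["landscape_id"]
    if idx not in cache:
        cache[idx] = fetch_landscape(idx)   # web request to the head node
    return run_search(job["algorithm"], cache[idx])
```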
Interestingly, the CUDA-based worker nodes did not
outperform the CPU-based worker nodes. The CUDA-based
fitness evaluation is very fast, but the number of individuals
that need to be evaluated must be high before the speed difference
becomes apparent, due to the cost of moving the individuals and
kernels across the PCI bus. The exception is the random search
algorithm, which was rewritten for CUDA-based evaluation.
Since each of the evaluations is independent, all evaluations
can happen in parallel.
6. RESULTS
Figures 1 through 4 show the results for evolutionary runs
using the “One vs All” heuristic. For each figure, the bold red
line indicates the fitness of the NK-Landscape in the population
that best showcases the chosen BBSA. The bold black dashed
line indicates the value of k for that best NK-Landscape. The
thinner colored lines indicate the relative fitness of the best
individual found by the other BBSAs. The vertical axis on the
left measures the fitness of the colored lines, while the vertical
axis on the right measures only the dashed black line. The
horizontal axis shows how many fitness evaluations have
elapsed. While each “One vs All” experiment was repeated 30
times, for clarity each of these figures shows the results of only
a single randomly chosen representative run. All runs for each
experiment exhibited similar behavior, and inclusion of error
bars would unnecessarily clutter the graph.
Figures 5 through 8 show the results for evolutionary
runs using the “Pairwise” heuristic. In contrast to the previous
figures where each graph corresponds to a single evolutionary
run, this set of figures combines three runs into each graph.
Each graph shows three lines. Each line corresponds to a single
randomly chosen representative run, from the set of 30 runs
performed for each ordered pair of BBSAs. Each line in the
graph shows how performance of the BBSA in the title of the
graph compared to the performance of the BBSA corresponding
to that line, when applied to the NK-Landscape in the
population that best discriminates between the two BBSAs. For
clarity, the line showing the k value of the best NK-Landscape
is omitted from these graphs.
7. DISCUSSION
In each experiment, the evolutionary process made forward
progress, however most fell short of the goal of finding a highly
discriminatory NK-Landscape.
In the experiment shown for EA vs All in Figure 1, a
landscape with k=7 was found that improved the performance
of all search algorithms in the set, but EA improved the most.
Later in the run a landscape was found that hurt the
performance of all search algorithms in the set, but EA was hurt
the least. No further gains were seen in the 1000 evaluations
allocated.
The performance of Simulated Annealing is strongly
dependent on its cooling schedule. As no attempt was made to
optimize the cooling schedule for any given NK-Landscape, SA
underperformed all other search algorithms in the set.
Nevertheless, Figure 2 shows that the evolutionary process
found an NK-Landscape that hurt the performance of other
search algorithms more than it hurt the performance of SA,
resulting in a net positive gain for SA. An NK-Landscape found
later in the run proved to be easier for all algorithms to solve,
while increasing k. This shows that while an increase in k may
mean an increase in difficulty, it may also mean a decrease in
difficulty, across all search algorithms. Fitness never became
positive however, meaning that the system could not find an
NK-Landscape where SA beat all other search algorithms in the
set.
Hill Climber performed unexpectedly well in this sequence
of experiments, consistently beating all other search algorithms
in the set. In the experiment shown in Figure 3 it is interesting
to note that the performance of the non-competitive search
algorithms did change during the run without altering the
overall fitness, due to the “One versus All” heuristic. This
shows that this heuristic is indeed vulnerable to the need to
cross plateaus of unchanging fitness, blind to possible progress.
Random Search, as shown in Figure 4, was not expected to
reach a fitness of 0. A fitness landscape where random search
beat all other search algorithms would be an interesting
landscape indeed.
Figure 5 shows the performance of EA versus each of the
other search algorithms. It is interesting to contrast with Figure
1, where the performance of EA never surpassed that of HC. In
Figure 5, we see EA rapidly passing HC and continuing to
grow. Clearly the system is capable of evolving an NK-Landscape which favors EA over HC, so perhaps if the
experiment in Figure 1 were repeated with vastly more
evaluations allowed, the system would eventually wander
across the unchanging fitness plateau and find the solutions
found in Figure 5.
Figure 6 shows that evolving an NK-Landscape where
Simulated Annealing with a linear cooling schedule is the
preferred search algorithm may take quite some time.
Figure 7 provides further evidence that Hill Climber
performs very well on this sort of problem, and comparing
Figure 7 to all other figures provides evidence that it is
relatively easy to evolve NK-Landscapes where Hill Climber
outperforms other BBSAs.
The interesting part of Figure 8 is that Random Search
should not consistently beat other search algorithms, unless the
other algorithms are revisiting points on the landscape that have
already been tried, and doing so at a rate faster than random
search would be expected to. This behavior is expected of SA,
which may spend a great deal of time oscillating among
already-explored states. This may also happen with EA, which
may revisit a state via any number of mechanisms. As
implemented, Hill Climber may also revisit states, when
wandering across a fitness plateau. The results in Figure 8
indicate that this is a greater weakness in SA and EA than it is
in HC.
8. CONCLUSIONS
This work shows that discriminatory benchmark
problems can be evolved using fitness functions described
using the language of NK-Landscapes. However, the
discriminatory power between some BBSAs is low. A more
expressive description language may be needed to separate the
better BBSAs, or perhaps the key is simply to allow a much
longer runtime.
The distributed architecture developed for this
research allows for efficient parallelization of fitness
evaluations in a heterogeneous environment. The size of the
questions we can ask depends on the amount of computational
power we can efficiently harness. A framework that allows for
efficient horizontal scalability across commodity hardware
allows us to ask bigger questions.
9. ACKNOWLEDGMENTS
The author would like to acknowledge the support of the
Intelligent Systems Center for the research presented in this
paper. Furthermore, the author would like to thank Brian
Goldman, a prolific idea factory who epitomizes the concept
that if you generate a hundred ideas per day then even if 99.9%
of them are terrible, you’re still ahead.
10. REFERENCES
[1] Wolpert, D. H., and Macready, W. G., 1997, “No Free Lunch Theorems for Optimization,” IEEE Transactions on Evolutionary Computation, Vol 1(1), pp. 67-82.
[2] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P., 1983, “Optimization by Simulated Annealing,” Science, Vol 220, pp. 671-680.
[3] Eiben, A. E., and Smith, J. E., 2007, “Introduction to Evolutionary Computing,” Springer.
[4] Kauffman, S., and Weinberger, E., 1989, “The NK Model of Rugged Fitness Landscapes and Its Application to Maturation of the Immune Response,” Journal of Theoretical Biology, Vol 141(2), pp. 211-245.
[5] Nickolls, J., et al., 2008, “Scalable Parallel Programming with CUDA,” Queue, Vol 6(2), pp. 40-53.
[Figure: Evolutionary Algorithm vs All; fitness (left axis) and k, landscape overlap (right axis) vs. evaluations; lines: fitness, EA, SA, HC, RA, k]
Figure 1 – EA performance vs all other algorithms at once
[Figure: Simulated Annealing vs All; fitness (left axis) and k, landscape overlap (right axis) vs. evaluations; lines: fitness, EA, SA, HC, RA, k]
Figure 2 – Simulated Annealing vs all other algorithms at once
[Figure: Hill Climb vs All; fitness (left axis) and k, landscape overlap (right axis) vs. evaluations]
Figure 3 – Hill Climber vs all other algorithms at once
[Figure: Random Search vs All; fitness (left axis) and k, landscape overlap (right axis) vs. evaluations]
Figure 4 – Random Search vs all other algorithms at once
[Figure: Evolutionary Algorithm vs Each Pairwise; fitness vs. evaluations; lines: vs HC, vs RA, vs SA]
Figure 5 – Evolutionary Algorithm vs each other algorithm, pairwise
[Figure: Simulated Annealing vs Each Pairwise; fitness vs. evaluations; lines: vs HC, vs RA, vs EA]
Figure 6 – Simulated Annealing vs each other algorithm, pairwise
[Figure: Hill Climb vs Each Pairwise; fitness vs. evaluations; lines: vs SA, vs RA, vs EA]
Figure 7 – Hill Climber vs each other algorithm, pairwise
[Figure: Random Search vs Each Pairwise; fitness vs. evaluations; lines: vs SA, vs HC, vs EA]
Figure 8 – Random Search vs each other algorithm, pairwise