Using Genetic Algorithm to Solve The Tactical Fixed Interval Scheduling Problem

Xiaomin Zhong and Eugene Santos Jr.
Department of Computer Science and Engineering
University of Connecticut
Storrs, CT 06269-3155
zhongx@cse.uconn.edu and eugene@cse.uconn.edu
Abstract

Discrete optimization problems are usually NP-hard. The structural characteristics of these problems significantly dictate the solution landscape. In this paper, we explore a structured approach to solving these kinds of problems. We use a reinforcement learning system to adaptively learn the structural characteristics of the problem, thereby decomposing the problem into several subproblems. Based on these structural characteristics, we develop a Genetic Algorithm that uses structural operations to recombine these subproblems together to solve the problem. The reinforcement learning system directs the GA. We test our algorithm on the Tactical Fixed Interval Scheduling Problem (TFISP), which is the problem of determining the minimum number of parallel non-identical machines such that a feasible schedule exists for a given set of jobs. This work continues our work in exploiting structure for optimization (Zhong & Santos 1999).
Introduction

Discrete optimization problems are usually NP-hard. For example, belief revision over Bayesian networks is NP-hard (Shimony 1994), and the Tactical Fixed Interval Scheduling Problem (TFISP) is NP-hard (Kroon, Salomon, & Wassenhove 1997). The structural characteristics of these problems can drastically affect the solution landscape. How to efficiently solve these problems is still an open question. Existing algorithms seldom consider the structural characteristics of these problems, which makes them sensitive to the solution landscape (Zhong & Santos 1999). In this paper, we continue and extend the work on structure-based optimization (Zhong & Santos 1999). The basic idea is to find a way to decompose the problem into several subproblems based on structural characteristics. The algorithm then works on these subproblems and combines them together to get a near-optimal solution. We use a Reinforcement Learning (RL) system to learn the structural characteristics of the problem, thereby decomposing the problem into several subproblems based on those characteristics.
We then use these structural characteristics to develop a Genetic Algorithm (GA) by introducing structural operations that work on these subproblems and recombine them together. During run time the system evaluates the performance of the GA and continues with reinforcement learning to further tune the reinforcement learning system to search for a better decomposition of the problem.

We applied an earlier version of this strategy to belief revision over Bayesian networks (Zhong & Santos 1999) and obtained attractive results. In this paper, we explore using an updated strategy to solve another discrete optimization problem known as the Tactical Fixed Interval Scheduling Problem (TFISP) (Kroon, Salomon, & Wassenhove 1997).
TFISP is the problem of determining the minimum number of parallel non-identical machines such that a feasible non-preemptive schedule exists for a given set of jobs. In TFISP, each job belongs to a specific job class and must be carried out in a prespecified time interval. The problem is complicated by the restrictions that (1) each machine can handle only one job at a time, (2) each machine can handle only jobs from a subset of the classes, and (3) preemption is not allowed. The processing times of the jobs and the types of machine on which the jobs can be processed reflect the correlations among these jobs. The jobs of TFISP can be gathered into different groups according to their correlations. Here the groups can overlap, which means a job may belong to several groups at the same time. While the groups we used to solve belief revision did not overlap (Zhong & Santos 1999), overlapping groups can capture the correlations of jobs in TFISP more precisely than non-overlapping groups. Obviously, these overlapping groups decompose the problem into several subproblems. We build a reinforcement learning controller to identify these groups and recommend the use of "group" crossover and "group" mutation for the genetic algorithm based on these groupings.
The Tactical Fixed Interval Scheduling Problem
The tactical fixed interval scheduling problem is the problem of determining the minimum number of parallel non-identical machines, such that a feasible schedule
Copyright © 2000, AAAI. All rights reserved.
In this paper, we explore the use of a genetic algorithm directed by a reinforcement learning system to solve TFISP. We use reinforcement learning to continuously learn and identify the structural characteristics of TFISP, and employ this in the GA through "group" crossover and "group" mutation during the search process.
FIG. 0.1. An instance of TFISP and the corresponding graph G. Here the number of job classes is 3, and the number of different machine classes is 2. Machine 1 can handle jobs in classes 1 and 3, and machine 2 can handle jobs in classes 2 and 3.
exists for a given set of jobs. Each job belongs to a specific job class, and has a fixed start time, a fixed finish time, and a processing time that equals the length of the time interval between the job's start and finish times. Each machine is allowed to process jobs only from a prespecified subset of job classes and can process at most one job at a time.
We may describe TFISP as a network flow model with side constraints (Kroon, Salomon, & Wassenhove 1997). The underlying directed graph G contains C subgraphs G_c, each one corresponding to one of the machine classes. The node set N_c of G_c has a one-to-one correspondence with the set of start and finish times of the jobs that can be handled by machine class c. The node set N_c is also denoted as {n_{c,r} | r = 1, ..., p_c}, where p_c = |N_c|. Here, it is assumed that for r = 2, ..., p_c the nodes n_{c,r-1} and n_{c,r} correspond to subsequent time instants. A particular job j with j ∈ M_c is represented in G_c by an arc from the node corresponding to start time s_j to the node corresponding to finish time f_j, where M_c is the set of all jobs that can be carried out by machines in machine class c. This arc has upper capacity one. In G_c there is, for r = 2, ..., p_c, an arc from n_{c,r-1} to n_{c,r} with unlimited capacity. The graphs G_c are linked together by a super source P and a super sink Q. There is an arc from P to each of the nodes n_{c,1}, and there is an arc from each of the nodes n_{c,p_c} to the super sink Q. Both kinds of arcs have unlimited capacity. Figure 0.1 shows an example of a set of jobs and the corresponding graph G.
The side constraints that must be satisfied specify that all flows must be integer, and that for each job the total amount of flow in the corresponding arcs must be exactly one unit. Now, in TFISP the objective is to send an amount of flow from the super source P to the super sink Q in such a way that: (1) all capacity constraints are satisfied, (2) all side constraints are satisfied, and (3) the total amount of flow is minimal. Needless to say, exact algorithms are intractable as the number of jobs increases.
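The construction of G described above can be sketched directly. The following is a minimal sketch, not the paper's implementation; the dict-based encodings of jobs and machine classes, and the arc labels, are our own illustrative choices:

```python
def build_flow_network(jobs, machine_classes):
    """Construct the directed graph G of the network-flow model.
    jobs: {job_id: (start, finish, job_class)}
    machine_classes: {c: set of job classes machine class c handles}
    Returns a list of arcs (tail, head, capacity, label); a capacity
    of None stands for unlimited."""
    arcs = []
    for c, handled in machine_classes.items():
        # M_c: the jobs machine class c can carry out
        m_c = {j: jb for j, jb in jobs.items() if jb[2] in handled}
        if not m_c:
            continue
        # N_c: one node per distinct start/finish time, in time order
        times = sorted({t for s, f, _ in m_c.values() for t in (s, f)})
        node = {t: ('n', c, r) for r, t in enumerate(times, start=1)}
        # one job arc with upper capacity one per job
        for j, (s, f, _) in m_c.items():
            arcs.append((node[s], node[f], 1, 'job %d' % j))
        # unlimited-capacity arcs between subsequent time instants
        for t1, t2 in zip(times, times[1:]):
            arcs.append((node[t1], node[t2], None, 'idle'))
        # link G_c to the super source P and super sink Q
        arcs.append(('P', node[times[0]], None, 'from source'))
        arcs.append((node[times[-1]], 'Q', None, 'to sink'))
    return arcs
```

With a minimum-cost integral flow solver on top, the flow on each 'job' arc indicates which machine class carries that job; the sketch stops at building the graph.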
GA for Solving TFISP
Genetic algorithms (Goldberg 1989) are search and optimization algorithms based on the principles of natural evolution. To apply a GA, one generally expresses the problem in such a way that potential solutions can be coded in a gene-like sequence, and a population of those sequences is prepared. An optimal solution is searched for by evolutionary operations including selection, crossover, and mutation. Most researchers modify their implementations of the GA either by using non-standard chromosome representations or by designing problem-specific genetic operations that accommodate the problem to be solved, in order to build efficient evolution programs. In using a GA to solve TFISP, we develop "group" operations for this particular problem. The GA used in our method is based on the GENESIS (Grefenstette 1990) framework. In the following, we describe the representation and operations employed in our GA.
Representation

For the TFISP domain, we represent a feasible schedule of the underlying TFISP as a chromosome. Each gene corresponds to one job in TFISP and holds a machine class identifier, meaning that the corresponding job is handled by the machine class it identifies. The genetic operations manipulate each individual by changing the value of each element in the array. The fitness evaluation is simply our solution quality calculation; in this case, it is the number of machines required to carry out all jobs.
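A minimal sketch of this representation and fitness follows. Since the machines within one machine class are identical, the number that class needs equals the peak number of its assigned jobs running simultaneously; the helper names are our own, not from GENESIS:

```python
from collections import defaultdict

def machines_for(intervals):
    """Machines needed by one machine class: the peak number of its
    assigned (start, finish) jobs in progress at once (no preemption).
    A finish event sorts before a start at the same instant, so
    back-to-back jobs can share a machine."""
    events = sorted([(s, 1) for s, f in intervals] +
                    [(f, -1) for s, f in intervals])
    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak

def fitness(chromosome, jobs):
    """Solution quality: total machines required. chromosome[j] holds
    the machine class identifier for job j; jobs[j] is the fixed
    (start, finish) interval of job j."""
    by_class = defaultdict(list)
    for j, machine_class in enumerate(chromosome):
        by_class[machine_class].append(jobs[j])
    return sum(machines_for(iv) for iv in by_class.values())
```

Lower fitness is better here, since the objective is to minimize the number of machines.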
Selection

The selection operation is the standard "roulette wheel" selection approach based on the ranking of the individuals within the population instead of their absolute performance. Ranking helps prevent premature convergence by preventing "super" individuals from taking over the population within a few generations.
Crossover

The crossover operation performs a "group" crossover. We partition the jobs of TFISP into several groups based on their processing times and the machine classes on which they can be processed. Each group is a subset of jobs that forms a sub-schedule of the TFISP, and groups can overlap each other. We hope each sub-schedule is a suboptimal solution. For any two parents, we get one child by combining the best subgroups together and the other by combining the next best subgroups together.

AI APPLICATIONS 29

The mechanism of the "group" crossover is described in detail in our previous paper (Zhong & Santos 1999). In the following we give a brief description. Assume there are N jobs.
The N jobs have been appropriately divided into m groups. Assume there are two selected parents p1 and p2. Each parent represents a feasible schedule for the underlying TFISP. For any group i, we use E_i, the evaluation of its subsolution; E_i equals the number of machines needed by group i. For p1, we get E_i1 (i = 1, ..., m). For p2, we get E_i2 (i = 1, ..., m). For all groups, we then compare E_i1 with the corresponding E_i2. We get one child c1 by combining the groups with the best values of E_ik together. That means, for any group i, if E_i1 <= E_i2, then k = 1 and the corresponding schedule in G_i1 is added to c1; otherwise k = 2 and the corresponding schedule in G_i2 is added to c1. We get the other child by combining the groups with the next best values of E_ik together. Since the groups can overlap each other, a job can belong to several groups at the same time. For the overlapping jobs, the corresponding elements of the gene are updated by randomly choosing a machine class identifier among all the machine class identifiers by which the corresponding jobs can be handled.
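The mechanism above can be sketched as follows. This is our own reading of the operator, not the GENESIS code: E_i is supplied as a callable, "better" means fewer machines, and jobs touched by more than one group are resolved by a random admissible machine class as the text describes:

```python
import random

def group_crossover(p1, p2, groups, group_cost, admissible, rng=random):
    """'Group' crossover. p1, p2: parent chromosomes (machine class
    per job). groups: list of job-index sets, possibly overlapping.
    group_cost(parent, group) -> machines needed by that sub-schedule
    (the E_i of the text). admissible[j]: machine classes that can
    handle job j, used to resolve overlap conflicts."""
    c1, c2 = list(p1), list(p2)
    conflicts, seen = set(), set()
    for g in groups:
        # child 1 takes the better sub-schedule, child 2 the other
        if group_cost(p1, g) <= group_cost(p2, g):
            better, worse = p1, p2
        else:
            better, worse = p2, p1
        for j in g:
            if j in seen:          # job already set by another group
                conflicts.add(j)
            c1[j], c2[j] = better[j], worse[j]
            seen.add(j)
    # overlapping jobs: random admissible machine class identifier
    for j in conflicts:
        c1[j] = rng.choice(sorted(admissible[j]))
        c2[j] = rng.choice(sorted(admissible[j]))
    return c1, c2
```

With non-overlapping groups the operator is deterministic; randomness enters only where groups share jobs.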
Mutation

Two kinds of mutation operation are employed in our method. One is standard mutation and the other is called "group" mutation. Standard mutation randomly selects a gene to modify and then randomly chooses a new value for that gene. "Group" mutation randomly selects a gene and changes the group it belongs to. These mutation operations help the GA maintain diversity in the population to avoid premature convergence.
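Both mutations can be sketched as below. We read the mutated unit as a single gene (one job); the function names and the flat group-membership encoding for "group" mutation are illustrative assumptions:

```python
import random

def standard_mutation(chromosome, admissible, rng=random):
    """Pick one gene (job) at random and assign it a new admissible
    machine class identifier."""
    child = list(chromosome)
    j = rng.randrange(len(child))
    child[j] = rng.choice(sorted(admissible[j]))
    return child

def group_mutation(membership, n_groups, rng=random):
    """Pick one job at random and move it to a different group.
    membership[j] is job j's group index in 0 .. n_groups - 1."""
    moved = list(membership)
    j = rng.randrange(len(moved))
    moved[j] = rng.choice([g for g in range(n_groups) if g != moved[j]])
    return moved
```

Standard mutation perturbs the schedule itself, while "group" mutation perturbs the decomposition the crossover operates on.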
Reinforcement Learning System for Grouping Identification

Reinforcement learning (Kaelbling & Littman 1997) is a basic machine learning paradigm. It is a framework for learning to make sequences of decisions in an environment. In the reinforcement learning scheme, inputs describing the environment are given to the learning system. The system then makes a decision according to these inputs, thereby causing the environment to deliver a reinforcement to the system. The reinforcement measures the quality of the decision made by the system. In this paper, we use the same reinforcement learning scheme that we employed in our previous paper on implementing belief revision over Bayesian networks (Zhong & Santos 1999).
Architecture

The system has two major parts: the controller and the adaptive critic. The controller has one unit corresponding to each job of TFISP. Each unit is a learning automaton. The kth unit of the controller has a weight vector w_k = (w_k1, ..., w_km), and g_ki ∈ G = {1, 2, ..., m} indicates the corresponding group index. The unit generates a group output y_k ∈ G stochastically according to the law: y_k = i with probability p_ki, where

    p_ki = f(w_ki g_ki) / Σ_{j=1}^{m} f(w_kj g_kj)    (1)

Here i ∈ G, and f : R → R is the activation function given by f(x) = 1 / (1 + e^{-x}). From the above formula it follows that each value of the parameter w_k determines a specific stationary random grouping. The weight vector is updated at each instant using a scalar reinforcement signal supplied by the adaptive critic network.

FLAIRS-2000
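Equation (1) can be sketched directly; the argument names are ours, with w_k the unit's weight vector and g_k its vector of group indices:

```python
import math

def f(x):
    """Activation from equation (1): f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def group_probabilities(w_k, g_k):
    """Equation (1) for controller unit k: the probability that the
    unit outputs group i is f(w_ki * g_ki) normalized over all
    groups, so the p_ki always sum to one."""
    scores = [f(w * g) for w, g in zip(w_k, g_k)]
    total = sum(scores)
    return [s / total for s in scores]
```

Since f is strictly positive, every group keeps a nonzero probability of being emitted, which preserves exploration.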
The adaptive critic network has one unit corresponding to each job of TFISP. The function of the adaptive critic network is to generate a reinforcement, send this reinforcement to the corresponding controller unit, and update the estimates of the group values after observing the state and payoff. The group value is defined as the expected total discounted payoff over the infinite horizon. Our previous paper describes the architecture of the system in more detail (Zhong & Santos 1999). Since we allow group overlapping, for each group, the state of each job describes whether the job belongs to that group.
Learning Algorithm

At instant n, for any job k the control network uses (1) to compute p_ki (i = 1, ..., m), the probability that job k is in group i. These probabilities determine a specific stationary random grouping. The system gets a payoff for the grouping. The adaptive critic network generates a reinforcement based on the group values and updates the group values according to the payoff. Then the parameters of the controller network, the controller weights used to compute p_ki, are updated using the reinforcement. At instant n = 0, the controller weights and group values can be arbitrary. The modified learning algorithm with group overlapping is as follows:

1. For k = 1 to N (N is the number of jobs in TFISP) do
   (a) for each i ∈ G, the kth controller unit computes p_ki based on its weight vector.
   (b) for each i ∈ G, randomly generate a value u ∈ [0, 1] and compare p_ki with u. When u < p_ki, job k will belong to group i. Job k may thus be in several groups at the same time.
2. For k = 1 to N do
   (a) the reinforcement r' to the kth controller unit from the kth adaptive critic is:

       r' = 1 if q_ki = max_{m ∈ G} q_km, and r' = 0 otherwise

   (b) update the kth controller weights:

       Δw_ki = β(r' − f(w_ki)) − δ w_ki    (2)

       where β is the learning rate and δ is the weight decay rate. The weight decay prevents the weights from converging prematurely.

   (c) update the group value:

       Δq_ki = β(r(n) − q_ki + γ max_{m ∈ G} q_km)    (3)

       where β is the learning rate, γ is the discount factor, and r(n) is the payoff at instant n.

The above steps are repeated at each time instant.
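The updates in step 2 can be sketched as follows. The equations above are reconstructed from a garbled original, so treat this as a sketch under those reconstructions; the symbol delta for the weight decay rate is our assumption:

```python
import math

def f(x):
    # activation from equation (1)
    return 1.0 / (1.0 + math.exp(-x))

def reinforcement(q_k, i):
    """Step 2(a): r' = 1 when group i currently has the best value
    among job k's group values q_k, else 0."""
    return 1 if q_k[i] == max(q_k) else 0

def update_weight(w_ki, r_prime, beta, delta):
    """Equation (2): w_ki += beta * (r' - f(w_ki)) - delta * w_ki.
    beta is the learning rate; delta is the weight decay rate that
    guards against premature convergence."""
    return w_ki + beta * (r_prime - f(w_ki)) - delta * w_ki

def update_group_value(q_k, i, payoff, beta, gamma):
    """Equation (3): q_ki += beta * (r(n) - q_ki + gamma * max_m q_km),
    a discounted-value update toward the observed payoff."""
    q_k[i] += beta * (payoff - q_k[i] + gamma * max(q_k))
    return q_k[i]
```

One pass over all jobs and groups, applying these three functions in order, corresponds to a single time instant of step 2.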
Combining GA with RL

As stated above, the GA is used to solve TFISP, and the RL is used to identify groupings for the GA. During run time, the GA interacts with the RL controller in order to tune the GA's performance and find a better grouping that results in improved GA performance. Based on the grouping, for any group i we compute a sub-solution of the best individual of the population, and use the best sub-solution E_i to determine the performance of the GA. At instant n, we use the following to compute the payoff:

    r_i(n) = E_i(n) / E_i(n − 1)    (4)
We ran both GAs through 500 iterations. Table 0.2 through Table 0.5 show the results of testing on instances containing 100, 500, and 1000 jobs.
Num. of Jobs   Our GA     SGA        Improvement
100            9.36e+02   1.49e+03   37.18%
100            9.36e+02   1.66e+03   43.61%
100            9.79e+02   1.70e+03   42.41%
100            8.72e+02   1.37e+03   36.35%
100            1.04e+03   1.63e+03   36.20%
100            8.95e+02   1.56e+03   42.63%
100            8.52e+02   1.61e+03   47.08%
100            8.72e+02   1.51e+03   42.51%
100            1.00e+03   2.03e+03   50.73%
100            8.09e+02   1.57e+03   48.47%

TABLE 0.2. Performance Comparison on 100 jobs
Experiments

This section describes experimental results evaluating the performance of our method on various instances of TFISP. First, we randomly generated instances following the testing design proposed by Kroon (Kroon, Salomon, & Wassenhove 1997). We ran both our GA and the standard GA. Table 0.1 shows the results on instances containing 100 jobs. However, the solution landscape of the instances in Table 0.1 turned out to be quite flat, so the standard GA performed very well on such a landscape.
In more interesting and practical situations, we usually need to consider the cost of each machine class. This increases the complexity of the problem. We generated the new instances using the following design: we set the planning horizon T to 1,000, the number of job classes to 10, the number of machine classes to 8, and the maximum job duration for each job to 200. Each machine class can run several job classes, and each job class can run on several machine classes. Based on this design, we randomly generated 10 100-job instances, 10 500-job instances, and 10 1000-job instances. Where possible, we determined the optimal solution through brute-force computation. Since we consider the cost of each machine class, the fitness function becomes:
    fitness = Σ_{m=1}^{8} Σ_{j=1}^{K} C_m / Y_mj    (5)
Num. of Jobs   Our GA     SGA        Improvement
500            8.52e+03   1.01e+04   15.64%
500            8.24e+03   9.26e+03   11.01%
500            8.47e+03   1.05e+04   19.33%
500            8.32e+03   1.01e+04   18.51%
500            8.27e+03   1.03e+04   19.71%
500            8.27e+03   9.84e+03   16.00%
500            8.23e+03   9.98e+03   17.54%
500            7.92e+03   9.74e+03   18.69%
500            7.85e+03   9.42e+03   16.67%
500            8.19e+03   1.00e+04   18.10%

TABLE 0.3. Performance Comparison on 500 jobs
Num. of Jobs   Our GA     SGA        Improvement
1000           1.73e+04   2.05e+04   15.61%
1000           1.77e+04   2.00e+04   11.50%
1000           1.70e+04   1.94e+04   12.37%
1000           1.68e+04   2.00e+04   16.00%
1000           1.77e+04   1.96e+04   9.69%
1000           1.71e+04   1.87e+04   8.56%
1000           1.69e+04   1.95e+04   13.33%
1000           1.72e+04   1.91e+04   9.95%
1000           1.76e+04   2.03e+04   13.43%
1000           1.73e+04   1.96e+04   11.73%

TABLE 0.4. Performance Comparison on 1000 jobs
where C_m is the cost of machine class m, K is the number of machines in machine class m, and Y_mj is the number of jobs that can be handled on the same machine j during the planning horizon T. We use
From our experiments, we can see that our new GA with group crossover and mutation is quite promising. Intuitively, the reinforcement learning system captures groups of correlated jobs, thereby decomposing the problem into several subproblems, and uses these groups to direct the GA. By adaptively identifying the structural characteristics of the problem, our approach is less sensitive to the structure. However, there are still
    (S_SGA − S_NGA) / S_SGA    (6)

to compute the improvement, where S_NGA is the solution obtained using our new GA method and S_SGA is the solution obtained using the standard GA.
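The improvement measure can be checked against the tables; with the SGA-minus-new-GA sign convention, a positive value means the new GA found a cheaper solution, matching the reported percentages:

```python
def improvement(s_new_ga, s_sga):
    """Relative improvement of the new GA over the standard GA.
    Both solutions are costs to minimize, so the difference is taken
    as SGA minus new GA to make gains come out positive."""
    return (s_sga - s_new_ga) / s_sga

# first row of Table 0.2: 9.36e+02 vs 1.49e+03
print('%.2f%%' % (100 * improvement(9.36e2, 1.49e3)))  # 37.18%
```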
Num. of Jobs   Job Distribution   Machine Combination   Max Job Duration   Our GA   SGA
100            uniform            limit                 100                16       16
100            uniform            limit                 300                26       26
100            uniform            all                   100                13       14
100            uniform            all                   300                25       25
100            peak               limit                 100                25       25
100            peak               limit                 300                43       43
100            peak               all                   100                17       17
100            peak               all                   300                29       29

TABLE 0.1. Performance Comparison on Instances Generated By Following The Design Proposed By Kroon (Kroon, Salomon, & Wassenhove 1997)
Number of Jobs   Average Improvement
100              42.72%
500              17.12%
1000             12.22%

TABLE 0.5. Average Improvement by Using Our GA
limitations with the approach. From our experiments we found that the group size affected the performance of our new GA. When the size of the instances increased, using a small group size made the "group" crossover expensive; however, using a large group size may not yield good solutions.
Conclusion

We tested our new strategy on TFISP. The jobs of TFISP are gathered into different groups according to their correlations, and each group is a subproblem of the TFISP. We built a reinforcement learning controller to identify these groups and recommend the use of "group" crossover and "group" mutation for the genetic algorithm based on these groupings. The system then evaluates the performance of the genetic algorithm and continues with reinforcement learning to further tune the controller to search for a better grouping, especially with overlapping. Preliminary results have shown that our approach is quite promising.
Acknowledgements

This work was supported by the University of Connecticut Research Foundation and AFOSR Grant No. 9600989.
References

Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. New York, NY: Addison-Wesley.

Grefenstette, J. J. 1990. A User's Guide to GENESIS Version 5.0. Navy Center for Applied Research in AI.

Kaelbling, L. P., and Littman, M. L. 1997. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4:237-285.

Kroon, L. G.; Salomon, M.; and Wassenhove, L. N. V. 1997. Exact and approximation algorithms for the tactical fixed interval scheduling problem. Operations Research 45(4):624-638.

Shimony, S. E. 1994. Finding MAPs for belief networks is NP-hard. Artificial Intelligence 68:399-410.

Zhong, X., and Santos, Jr., E. 1999. Probabilistic reasoning through genetic algorithms and reinforcement learning. In The 12th International FLAIRS Conference, 477-481.