Using Genetic Algorithms to Solve the Tactical Fixed Interval Scheduling Problem

Xiaomin Zhong and Eugene Santos Jr.
Department of Computer Science and Engineering
University of Connecticut
Storrs, CT 06269-3155
zhongx@cse.uconn.edu and eugene@cse.uconn.edu

Abstract

Discrete optimization problems are usually NP-hard. The structural characteristics of these problems significantly dictate the solution landscape. In this paper, we explore a structured approach to solving these kinds of problems. We use a reinforcement learning system to adaptively learn the structural characteristics of the problem, thereby decomposing the problem into several subproblems. Based on these structural characteristics, we develop a genetic algorithm that uses structural operations to recombine these subproblems to solve the overall problem. The reinforcement learning system directs the GA. We test our algorithm on the Tactical Fixed Interval Scheduling Problem (TFISP), the problem of determining the minimum number of parallel non-identical machines such that a feasible schedule exists for a given set of jobs. This work continues our work in exploiting structure for optimization (Zhong & Santos 1999).

Introduction

Discrete optimization problems are usually NP-hard. For example, belief revision over Bayesian networks is NP-hard (Shimony 1994), and the Tactical Fixed Interval Scheduling Problem (TFISP) is NP-hard (Kroon, Salomon, & Wassenhove 1997). The structural characteristics of these problems can drastically affect the solution landscape. How to efficiently solve these problems is still an open question. Existing algorithms seldom consider the structural characteristics of these problems, which makes them sensitive to the solution landscape (Zhong & Santos 1999).

In this paper, we continue and extend our work on structure-based optimization (Zhong & Santos 1999). The basic idea is to find a way to decompose the problem into several subproblems based on structural characteristics. The algorithm then works on these subproblems and combines them to obtain a near-optimal solution. We use a reinforcement learning (RL) system to learn the structural characteristics of the problem, thereby decomposing it into several subproblems. We then use these structural characteristics to develop a genetic algorithm (GA) by introducing structural operations that work on these subproblems and recombine them. During the run, the system evaluates the performance of the GA and continues with reinforcement learning to further tune the RL system to search for a better decomposition of the problem.

We applied an earlier version of this strategy to belief revision over Bayesian networks (Zhong & Santos 1999) and obtained attractive results. In this paper, we explore using an updated strategy to solve another discrete optimization problem, the Tactical Fixed Interval Scheduling Problem (TFISP) (Kroon, Salomon, & Wassenhove 1997). TFISP is the problem of determining the minimum number of parallel non-identical machines such that a feasible non-preemptive schedule exists for a given set of jobs. In TFISP, each job belongs to a specific job class and must be carried out in a prespecified time interval. The problem is complicated by the restrictions that (1) each machine can handle only one job at a time, (2) each machine can handle only jobs from a subset of the classes, and (3) preemption is not allowed.
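To make this setting concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of a TFISP job record and of the standard sweep-line count that gives the minimum number of machines of one class needed to run a set of assigned jobs without overlap. All names here are ours.

    from dataclasses import dataclass

    @dataclass
    class Job:
        job_class: int   # job class this job belongs to
        start: int       # fixed start time
        finish: int      # fixed finish time (processing time = finish - start)

    def machines_needed(jobs):
        """Minimum machines of one class needed to run `jobs`: the maximum
        number of simultaneously active jobs, found by an event sweep."""
        events = [(j.start, +1) for j in jobs] + [(j.finish, -1) for j in jobs]
        active = best = 0
        for _, delta in sorted(events):  # a finish (-1) sorts before a start (+1) at ties
            active += delta
            best = max(best, active)
        return best

For example, machines_needed([Job(1, 0, 3), Job(1, 2, 5), Job(3, 4, 6)]) returns 2, since at most two of the three intervals overlap at any instant.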
The processing times of the jobs and the types of machine on which the jobs can be processed reflect the correlations among the jobs. The jobs of TFISP can be grouped according to these correlations. Here the groups can overlap, which means a job may belong to several groups at the same time. While the groups we used to solve belief revision did not overlap (Zhong & Santos 1999), overlapping groups can capture the correlations among jobs in TFISP more precisely than non-overlapping ones. These overlapping groups decompose the problem into several subproblems. We build a reinforcement learning controller to identify these groups and to recommend the use of "group" crossover and "group" mutation for the genetic algorithm based on these groupings.

In this paper, we explore the use of a genetic algorithm directed by a reinforcement learning system to solve TFISP. We use reinforcement learning to continuously learn and identify the structural characteristics of TFISP, and we employ this in the GA through "group" crossover and "group" mutation during the search process.

The Tactical Fixed Interval Scheduling Problem

The tactical fixed interval scheduling problem is the problem of determining the minimum number of parallel non-identical machines such that a feasible schedule exists for a given set of jobs. Each job belongs to a specific job class and has a fixed start time, a fixed finish time, and a processing time that equals the length of the time interval between the job's start and finish times. Each machine is allowed to process jobs only from a prespecified subset of job classes and can process at most one job at a time.

We may describe TFISP as a network flow model with side constraints (Kroon, Salomon, & Wassenhove 1997). The underlying directed graph G contains C subgraphs G_c, each corresponding to one of the machine classes. The node set N_c of G_c has a one-to-one correspondence with the set of start and finish times of the jobs that can be handled by machine class c. The node set N_c is also denoted as \{n_{c,r} \mid r = 1, \ldots, p_c\}, where p_c = |N_c|. Here it is assumed that for r = 2, \ldots, p_c the nodes n_{c,r-1} and n_{c,r} correspond to subsequent time instants. A particular job j with j \in M_c is represented in G_c by an arc from the node corresponding to start time s_j to the node corresponding to finish time f_j, where M_c is the set of all jobs that can be carried out by machines in machine class c. This arc has upper capacity one. In G_c there is, for r = 2, \ldots, p_c, an arc from n_{c,r-1} to n_{c,r} with unlimited capacity. The graphs G_c are linked together by a super source P and a super sink Q. There is an arc from P to each of the nodes n_{c,1}, and there is an arc from each of the nodes n_{c,p_c} to the super sink Q. Both kinds of arcs have unlimited capacity. Figure 0.1 shows an example of a set of jobs and the corresponding graph G.

[FIG. 0.1. An instance of TFISP and the corresponding graph G. Here the number of job classes is 3, and the number of different machine classes is 2. Machine class 1 can handle jobs in classes 1 and 3, and machine class 2 can handle jobs in classes 2 and 3.]

The side constraints that must be satisfied specify that all flows must be integer and that for each job the total amount of flow in the corresponding arcs must be exactly one unit.
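As an illustration of this construction (a sketch under our own naming conventions, not code from the paper), the following builds the node and arc lists of one subgraph G_c from the jobs a machine class can handle, reusing the Job record from the earlier sketch:

    def build_subgraph(jobs, c, can_handle):
        """Nodes and arcs of subgraph G_c for machine class c.

        jobs: dict, job id -> Job
        can_handle: set of job classes machine class c may process
        Returns (nodes, arcs); each arc is (tail, head, capacity),
        where capacity None means unlimited.
        """
        M_c = {jid: j for jid, j in jobs.items() if j.job_class in can_handle}
        # N_c: the distinct start/finish times of the jobs in M_c, in time order.
        times = sorted({t for j in M_c.values() for t in (j.start, j.finish)})
        node = {t: (c, r) for r, t in enumerate(times, start=1)}  # node n_{c,r}

        arcs = []
        for j in M_c.values():                        # job arcs, upper capacity one
            arcs.append((node[j.start], node[j.finish], 1))
        for r in range(1, len(times)):                # continuity arcs, unlimited
            arcs.append((node[times[r - 1]], node[times[r]], None))
        return list(node.values()), arcs

The super source P and super sink Q would then be connected to the first and last node of each subgraph with unlimited-capacity arcs, and an integer flow solver with the side constraints would complete the model.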
Now, in TFISP the objective is to send an amount of flow from the super source P to the super sink Q in such a way that (1) all capacity constraints are satisfied, (2) all side constraints are satisfied, and (3) the total amount of flow is minimal. Needless to say, exact algorithms become intractable as the number of jobs increases.

GA for Solving TFISP

Genetic algorithms (Goldberg 1989) are search and optimization algorithms based on the principles of natural evolution. To apply a GA, one generally expresses the problem in such a way that potential solutions can be coded in a gene-like sequence, and a population of such sequences is prepared. An optimal solution is searched for by evolutionary operations including selection, crossover, and mutation. Most researchers modify their GA implementations, either by using nonstandard chromosome representations or by designing problem-specific genetic operations, to accommodate the problem at hand and build efficient evolution programs. In using a GA to solve TFISP, we develop "group" operations for this particular problem. The GA used in our method is based on the GENESIS framework (Grefenstette 1990). In the following, we describe the representation and operations employed in our GA.

Representation

For the TFISP domain, we represent a feasible schedule of the underlying TFISP as a chromosome. Each gene corresponds to one job in TFISP and holds a machine class identifier, meaning that the corresponding job is handled by the machine class so identified. The genetic operations manipulate each individual by changing the value of each element in this array. The fitness evaluation is simply our solution quality calculation; in this case, it is the number of machines required to carry out all jobs.

Selection

The selection operation is the standard "roulette wheel" selection approach based on the ranking of the individuals within the population instead of their absolute performance. Ranking helps prevent premature convergence by preventing "super" individuals from taking over the population within a few generations.

Crossover

The crossover operation performs a "group" crossover. We partition the jobs of TFISP into several groups based on their processing times and the machine classes on which they can be processed. Each group is a subset of jobs that forms a sub-schedule of the TFISP, and groups can overlap each other. We hope that each sub-schedule is itself a suboptimal solution. For any two parents, we get one child by combining the best subgroups and the other by combining the next-best subgroups. The mechanism of the "group" crossover is described in detail in our previous paper (Zhong & Santos 1999); in the following we give a brief description. Assume there are N jobs, appropriately divided into m groups, and two selected parents p_1 and p_2, each representing a feasible schedule for the underlying TFISP. For any group i, we use the sub-solution value E_i as the evaluation of group i: E_i equals the number of machines needed by group i. For p_1 we get E_{i1} (i = 1, \ldots, m), and for p_2 we get E_{i2} (i = 1, \ldots, m). For all groups, we then compare E_{i1} with the corresponding E_{i2}. We get one child c_1 by combining the groups with the best values of E_{ik}: for any group i, if E_{i1} \le E_{i2} then k = 1 and the corresponding sub-schedule of group i in p_1 is added to c_1; otherwise k = 2 and the corresponding sub-schedule in p_2 is added to c_1. We get the other child by combining the groups with the next-best values of E_{ik}. Since the groups can overlap, a job can belong to several groups at the same time. For such overlapping jobs, the corresponding genes are set by randomly choosing a machine class identifier from among all the machine classes that can handle the corresponding job.
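A minimal sketch of this "group" crossover, building only the first child (our own illustrative code; machines_needed is the sweep-line helper from the first sketch, the per-class summation inside group_eval is one reading of "machines needed by group i", and all other names are assumptions):

    import random

    def group_crossover(p1, p2, groups, jobs, allowed, rng=random):
        """Build child c1 by taking, for each (possibly overlapping) group,
        the parent sub-schedule that needs fewer machines.

        p1, p2: chromosomes, dict job index -> machine class identifier
        groups: list of lists of job indices (groups may overlap)
        jobs: dict, job index -> Job
        allowed: dict, job index -> list of machine classes that can handle it
        """
        def group_eval(parent, group):
            # E_i: machines needed by group i under this parent's assignment,
            # summed over the machine classes used within the group.
            by_mc = {}
            for j in group:
                by_mc.setdefault(parent[j], []).append(jobs[j])
            return sum(machines_needed(js) for js in by_mc.values())

        child = dict(p1)        # fallback values for jobs in no group
        conflicts = set()       # jobs covered by more than one group
        seen = set()
        for g in groups:
            winner = p1 if group_eval(p1, g) <= group_eval(p2, g) else p2
            for j in g:
                if j in seen:
                    conflicts.add(j)   # overlapping job: resolve below
                child[j] = winner[j]
                seen.add(j)
        # Overlapping jobs get a random machine class among those allowed.
        for j in conflicts:
            child[j] = rng.choice(allowed[j])
        return child

The second child would be built the same way, taking the losing group of each comparison instead.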
Mutation

Two kinds of mutation operation are employed in our method: standard mutation and "group" mutation. Standard mutation randomly selects a gene to modify and then randomly chooses a new value for that gene. "Group" mutation randomly selects a gene and changes the group it belongs to. These mutation operations help the GA maintain diversity in the population and avoid premature convergence.

Reinforcement Learning System for Grouping Identification

Reinforcement learning (Kaelbling & Littman 1997) is a basic machine learning paradigm: a framework for learning to make sequences of decisions in an environment. In the reinforcement learning scheme, inputs describing the environment are given to the learning system. The system makes a decision according to these inputs, thereby causing the environment to deliver a reinforcement to the system. The reinforcement measures the quality of the decision made by the system. In this paper, we use the same reinforcement learning scheme that we employed in our previous paper on belief revision over Bayesian networks (Zhong & Santos 1999).

Architecture

The system has two major parts: the controller and the adaptive critic. The controller has one unit corresponding to each job of TFISP, and each unit is a learning automaton. The kth unit of the controller has a weight vector w_k = (w_{k1}, \ldots, w_{km}), and g_{ki} \in G = \{1, 2, \ldots, m\} indicates the corresponding group index. The unit generates a group output y_k \in G stochastically according to the law y_k = i with probability p_{ki}, where

    p_{ki} = \frac{f(w_{ki} g_{ki})}{\sum_{j=1}^{m} f(w_{kj} g_{kj})}    (1)

for i \in G, and f : R \to R is the activation function given by f(x) = 1/(1 + e^{-x}). From this formula it follows that each value of the parameter vector w_k determines a specific stationary random grouping. The weight vector is updated at each instant using a scalar reinforcement signal supplied by the adaptive critic network.

The adaptive critic network also has one unit corresponding to each job of TFISP. The function of the adaptive critic network is to generate a reinforcement, send it to the corresponding controller unit, and update the estimates of the group values after observing the state and payoff. The group value is defined as the expected total discounted payoff over the infinite horizon. Our previous paper describes the architecture of the system in more detail (Zhong & Santos 1999). Since we allow group overlapping, for each group the state of each job describes whether the job belongs to that group.

Learning Algorithm

At instant n, for any job k the controller network uses (1) to compute p_{ki} (i = 1, \ldots, m), the probability that job k is in group i. These probabilities determine a specific stationary random grouping. The system gets a payoff for the grouping. The adaptive critic network generates a reinforcement based on the group values and updates the group values according to the payoff. Then the parameters of the controller network, the controller weights used to compute p_{ki}, are updated using the reinforcement.

At instant n = 0, the controller weights and group values can be arbitrary. The learning algorithm, modified to allow group overlapping, is as follows:

1. For k = 1 to N (N is the number of jobs in TFISP):
   (a) For each i \in G, the kth controller unit computes p_{ki} from its weight vector.
   (b) For each i \in G, generate a random value u \in [0, 1] and compare it with p_{ki}; when u < p_{ki}, job k is placed in group i. Job k may thus belong to several groups at the same time.

2. For k = 1 to N:
   (a) The reinforcement r' sent to the kth controller unit by the kth adaptive critic unit is r' = 1 if q_{ki} = \max_{m \in G} q_{km}, and r' = 0 otherwise.
   (b) Update the kth controller weights:

       \Delta w_{ki} = \beta (r' - f(w_{ki})) - \delta w_{ki}    (2)

       where \beta is the learning rate and \delta is the weight decay rate; the weight decay prevents premature convergence.
   (c) Update the group values:

       \Delta q_{ki} = \beta (r(n) - q_{ki} + \gamma \max_{m \in G} q_{km})    (3)

       where \beta is the learning rate, \gamma is the discount factor, and r(n) is the payoff at instant n.

The above steps are repeated at each time instant.
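The following is a compact sketch of one such learning step (our own code and parameter names, a simplification rather than the authors' implementation; in particular, we drop the g_{ki} factors from eq. (1) and apply the reinforcement per (job, group) pair):

    import numpy as np

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    def grouping_step(W, Q, payoff, beta=0.1, gamma=0.9, delta=0.01, rng=None):
        """One learning step for all jobs at once.

        W, Q: (num_jobs, num_groups) arrays of controller weights w_ki and
        group-value estimates q_ki; payoff: the scalar payoff r(n).
        Returns the boolean membership matrix of the sampled grouping.
        """
        if rng is None:
            rng = np.random.default_rng()
        # Eq. (1): membership probabilities p_ki, normalized per job.
        act = logistic(W)
        P = act / act.sum(axis=1, keepdims=True)
        # Step 1(b): independent comparisons, so a job can join several groups.
        member = rng.random(W.shape) < P

        # Step 2(a): r' = 1 where q_ki is maximal for job k, else 0.
        r_prime = (Q == Q.max(axis=1, keepdims=True)).astype(float)
        # Eq. (2): weight update with decay to avoid premature convergence.
        W += beta * (r_prime - logistic(W)) - delta * W
        # Eq. (3): Q-learning-style group-value update toward the payoff.
        Q += beta * (payoff - Q + gamma * Q.max(axis=1, keepdims=True))
        return member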
Combining GA with RL

As stated above, the GA is used to solve TFISP, and the RL system is used to identify groupings for the GA. During runtime, the GA interacts with the RL controller in order to tune GA performance and find a better grouping that results in improved GA performance. Based on the grouping, for any group i we compute the sub-solution of the best individual in the population, and we use the best sub-solution value E_i to determine the performance of the GA. At instant n, we use the following to compute the payoff:

    r_i(n) = \frac{E_i(n)}{E_i(n-1)}    (4)

Experiments

This section describes experimental results evaluating the performance of our method on various instances of TFISP. First, we randomly generated instances following the testing design proposed by Kroon (Kroon, Salomon, & Wassenhove 1997) and ran both our GA and a standard GA (SGA). Table 0.1 shows the results on instances containing 100 jobs.

Num. of Jobs  Job Distribution  Machine Combination  Max Job Duration  Our GA  SGA
100           uniform           limit                100               16      16
100           uniform           limit                300               26      26
100           uniform           all                  100               13      14
100           uniform           all                  300               25      25
100           peak              limit                100               25      25
100           peak              limit                300               43      43
100           peak              all                  100               17      17
100           peak              all                  300               29      29

TABLE 0.1. Performance Comparison on Instances Generated by Following the Design Proposed by Kroon (Kroon, Salomon, & Wassenhove 1997)

However, the solution landscape of the instances in Table 0.1 turned out to be quite flat, so the standard GA performed very well on such a landscape. In more interesting and practical situations, we usually need to consider the cost of each machine class, which increases the complexity of the problem. We generated new instances using the following design: the planning horizon T is set to 1,000, the number of job classes to 10, the number of machine classes to 8, and the maximum duration of each job to 200. Each machine class can run several job classes, and each job class can run on several machine classes. Based on this design, we randomly generated ten 100-job instances, ten 500-job instances, and ten 1000-job instances. Where possible, we determined the optimal solution through brute-force computation. Since we consider the cost of each machine class, the fitness function becomes

    \text{fitness} = \sum_{m=1}^{8} \sum_{j=1}^{K} \frac{C_m}{Y_m}    (5)

where C_m is the cost of machine class m, K is the number of machines in machine class m, and Y_m is the number of jobs that can be handled on the same machine during the planning horizon T. We use

    (S_{SGA} - S_{NGA}) / S_{SGA}    (6)

to compute the improvement, where S_{NGA} is the solution value obtained with our new GA method and S_{SGA} is the solution value obtained with the standard GA. We ran both GAs through 500 iterations. Tables 0.2 through 0.5 show the results of testing on instances containing 100, 500, and 1000 jobs.
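A sketch of one possible reading of this cost-weighted fitness (illustrative Python with our own names; the published formula is partially obscured in the original, so the per-machine charge C_m / Y_m here is an assumption, and machines_needed and Job come from the earlier sketches):

    def fitness(chromosome, jobs, machine_cost):
        """Cost-weighted fitness of a schedule (to be minimized): each machine
        of class m contributes its cost C_m divided by Y_m, the average number
        of jobs a machine of that class handles over the horizon.

        chromosome: dict, job index -> machine class identifier
        jobs: dict, job index -> Job
        machine_cost: dict, machine class -> C_m
        """
        total = 0.0
        for mc in set(chromosome.values()):
            assigned = [jobs[j] for j, m in chromosome.items() if m == mc]
            K = machines_needed(assigned)   # machines of class mc required
            Y = len(assigned) / K           # average jobs handled per machine
            total += K * machine_cost[mc] / Y
        return total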
Num. of Jobs  Our GA    SGA       Improvement
100           9.36e+02  1.49e+03  37.18%
100           9.36e+02  1.66e+03  43.61%
100           9.79e+02  1.70e+03  42.41%
100           8.72e+02  1.37e+03  36.35%
100           1.04e+03  1.63e+03  36.20%
100           8.95e+02  1.56e+03  42.63%
100           8.52e+02  1.61e+03  47.08%
100           8.72e+02  1.51e+03  42.51%
100           1.00e+03  2.03e+03  50.73%
100           8.09e+02  1.57e+03  48.47%

TABLE 0.2. Performance Comparison on 100 jobs

Num. of Jobs  Our GA    SGA       Improvement
500           8.52e+03  1.01e+04  15.64%
500           8.24e+03  9.26e+03  11.01%
500           8.47e+03  1.05e+04  19.33%
500           8.32e+03  1.01e+04  18.51%
500           8.27e+03  1.03e+04  19.71%
500           8.27e+03  9.84e+03  16.00%
500           8.23e+03  9.98e+03  17.54%
500           7.92e+03  9.74e+03  18.69%
500           7.85e+03  9.42e+03  16.67%
500           8.19e+03  1.00e+04  18.10%

TABLE 0.3. Performance Comparison on 500 jobs

Num. of Jobs  Our GA    SGA       Improvement
1000          1.73e+04  2.05e+04  15.61%
1000          1.77e+04  2.00e+04  11.50%
1000          1.70e+04  1.94e+04  12.37%
1000          1.68e+04  2.00e+04  16.00%
1000          1.77e+04  1.96e+04  9.69%
1000          1.71e+04  1.87e+04  8.56%
1000          1.69e+04  1.95e+04  13.33%
1000          1.72e+04  1.91e+04  9.95%
1000          1.76e+04  2.03e+04  13.43%
1000          1.73e+04  1.96e+04  11.73%

TABLE 0.4. Performance Comparison on 1000 jobs

Number of Jobs  Average Improvement
100             42.72%
500             17.12%
1000            12.22%

TABLE 0.5. Average Improvement by Using Our GA

From our experiments, we can see that our new GA with group crossover and group mutation is quite promising. Intuitively, the reinforcement learning system captures groups of correlated jobs, thereby decomposing the problem into several subproblems, and uses these groups to direct the GA. By adaptively identifying the structural characteristics of the problem, our approach is less sensitive to the structure. However, there are still limitations to the approach. From our experiments, we found that group size affected the performance of our new GA: as the size of the instances increased, a small group size made the "group" crossover expensive, while a large group size did not always yield good solutions.

Conclusion

We tested our new strategy on TFISP. The jobs of TFISP are grouped according to their correlations, and each group is a subproblem of the TFISP. We built a reinforcement learning controller to identify these groups and to recommend the use of "group" crossover and "group" mutation for the genetic algorithm based on these groupings. The system then evaluated the performance of the genetic algorithm and continued with reinforcement learning to further tune the controller to search for a better grouping, especially with overlapping. Preliminary results have shown our approach to be quite promising.

Acknowledgements

This work was supported by the University of Connecticut Research Foundation and AFOSR Grant No. 9600989.

References

Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. New York, NY: Addison-Wesley.

Grefenstette, J. J. 1990. A User's Guide to GENESIS Version 5.0. Navy Center for Applied Research in AI.

Kaelbling, L. P., and Littman, M. L. 1997. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4:237-285.
Kroon, L. G.; Salomon, M.; and Wassenhove, L. N. V. 1997. Exact and approximation algorithms for the tactical fixed interval scheduling problem. Operations Research 45(4):624-638.

Shimony, S. E. 1994. Finding MAPs for belief networks is NP-hard. Artificial Intelligence 68:399-410.

Zhong, X., and Santos, Jr., E. 1999. Probabilistic reasoning through genetic algorithms and reinforcement learning. In Proceedings of the 12th International FLAIRS Conference, 477-481.