1 The Convergence Prediction method for Genetic and PBIL-like algorithms with binary representation Eugene A. Sopov, Sergey A. Sopov Abstract— Genetic algorithms (GA) are stochastic search procedures which have been used for solving many complex optimization problems. It’s obvious that GA collect and exploit some statistical information about the search space, but this information isn’t processed in explicit way. In this paper we consider GA with binary representation and its explicit statistics in a form of probability distribution of unit-values. The binary GA convergence property is discussed and a new convergence prediction method is proposed. The results of the prediction algorithm effectiveness investigation over the set of complex continuous and discrete deceptive problems are presented. Index Terms—Genetic algorithms, Estimation of Distribution Algorithms, convergence prediction, binary representation. the population. Many EDA algorithms are based upon such kind of distribution [4, 5, 6, 7] The main goal of this investigation is to increase the binary GA effectiveness by analyzing and exploiting explicit information of the algorithm dynamics via distribution estimation. II. DISTRIBUTION ESTIMATION FOR BINARY GENETIC ALGORITHMS To present the statistics processing by GA we estimate the probabilities vector of a population as: k k k k k P ( p1 ,..., pn ), pi P( xi 1), i 1, n (1) k here p i is a probability of unit-value for i-th position in a I. INTRODUCTION solution, k is a generation number, n is a solution length, x j is HERE is a necessity of complex system models development in many fields of science and technology. The modelling raises many optimization problems which are multiextremal, multiobjective and have implicit objectives, complex feasible area structure, many types of variables, etc. These complex optimization problems cannot be solved with the classical optimization procedures. Thus we have to design and implement more effective and versatile methods such as genetic algorithms [1]. A well-known approach to GA parameters set reduction is the Estimation of Distribution Algorithms (EDA), sometimes called the Probabilistic Model-Building Genetic Algorithms, which replaces population representation with a probability distribution over the choices available at each position in the vector that represents a population member [2, 3]. The EDA generate solutions and control populations according to given distribution instead of using nature-inspired genetics-based operations. A straightforward way to estimate distribution for GA with binary representation is to use the unit-value distribution over a value of j-th gene of i-th individual, k is a number of generation (iteration). We can write down the generalized scheme of the EDA algorithm using the given probability distribution in the following way: 1. Randomly generate the initial population according to the probability distribution. 2. Evaluate current population. 3. Update the distribution using the given strategy. 4. Form a new population according to the probability distribution. 5. Until stop criterion is satisfied, repeat steps 2-4. T Manuscript received May 10, 2011. E. A. Sopov is with the Department of System Analysis and Operations Research, Siberian State Aerospace University named after academician M.F. Reshetnev, Krasnoyarsk, Russian Federation (e-mail: es_gt@mail.ru). S. A. Sopov is with the Department of System Analysis and Operations Research, Siberian State Aerospace University named after academician M.F. Reshetnev, Krasnoyarsk, Russian Federation (e-mail: es_gt@mail.ru). i As we can see, GA use the same scheme. The main differences take a place with steps 3-4, when GA update information using such operations as selection, crossover and mutation. Step 3 also defines the different EDA algorithms. Let’s consider some EDA realizations. Variable probabilities method (“MIVER”) was first proposed in [4] and improved in [5]. This is a stochastic optimization procedure for pseudo-boolean problems, which operates with a set (a population) of binary solution. The distribution update strategy is based on unit-value distribution of the best (and/or the worst) solution in certain population. There exists a number of strong mathematical convergence proofs for some classes of optimization problems. Population-based incremental learning (PBIL) was proposed in [6]. This is a combination of evolutionary 2 optimization and artificial neural networks learning technique. PBIL updates the distribution in the same way as “MIVER” does – according to unit-value distribution of the best and the worst solution. Both PBIL and “MIVER” use greedy linear update strategy (taking into account only the best/worst solution), so they demonstrate much more local convergence than standard GA do. Thus standard GA performance on average is better for many complex optimization problems. Probabilities-based GA was proposed in [7]. It’s a further evolution of the ideas of Variable probabilities method and PBIL. In this algorithm genetic operators are not substituted with distribution estimation, but estimated distribution is used to model genetic operators. Probabilities-based GA uses the whole population information to update distribution, so it demonstrates as global performance as standard GA do. Moreover it contains less parameters to tune and process additional information about the search space (probabilities distribution), thus probabilities-based GA exceeds standard GA on average. We have investigated the performance of algorithms using a visual representation of distribution. Each component value of probability vector, which performs distribution, varies during the run, so we can show it on diagram (fig.1-4). As we can see the probability value oscillates about p 0.5 (initial steps, uniform distribution over the search space) and then converges to value one (or zero in other case) as the algorithm converges to optimal (or suboptimal) solution. We can estimate and visualize distribution of unit-values for any binary GA, even if GA doesn’t use such distribution in the run: 1 N i x j , j 1, n (2) P p1 , , pn , p j Px j 1 N i 1 where N is a population size. As different algorithms use different distribution update strategies, so we obtain different behavior of probabilities dynamics during the run (fig.1-4). Analyzing these diagrams with different optimization problems we can find that the components of the probabilities vector frequently converge to value one if optimal solution contains unit-value in corresponding position (or to zero if optimal solution contains zero value). It means that if probability converges to one (or zero), the value of corresponding gene of optimal solution most probably is equal to one (or zero). The higher algorithm performance the more often probabilities converge to the correct value. Fig. 2: Evolution of the probability vector component for a typical run of PBIL. Fig. 3: Evolution of the probability vector component for a typical run of the Probabilities-based GA. Fig. 4: Evolution of the probability vector component for a typical run of Standard GA. III. CONVERGENCE PREDICTION METHOD We describe the convergence feature of stochastic binary algorithms as: opt pi 1, if xi 1, (3) opt pi 0, if xi 0, opt for algorithm's iteration number converges infinity. Here xi is the i-th position value of the problem optimal solution, i = 1, n . So we can use this feature to predict the optimal solution of the given problem. The convergence prediction method is: 1. Choose the binary stochastic algorithm (e.g. GA, PBIL or other), set the iteration number i 1,..., I and the number of independent algorithm run s 1,..., S . 2. Collect the statistics ( p j ) i , j 1,..., n . Average ( p j ) i s over s. Determine the tendency for p j change. 1, if p j 1, opt 3. Set x j . 0, if p j 0 A simple way to determine the convergence tendency is to use the following integral criterion on step 3: 1, if I ( p ) 0.5 0, j i i 1 xj 0, otherwise opt Fig. 1: Evolution of the probability vector component for a typical run of the Variable probabilities method (“MIVER”). (4) The main idea is that the more often probability value is greater than 0.5, the higher probability of optimal solution to be equal to one. 3 Unfortunately, in many real-world problems there is a situation when binary stochastic algorithm collects not enough information on the early steps and the gene value of certain position is equal to one (or zero) for almost every individual. But at the final stage algorithm can find the much better solution with inverted values of gene and it means that the probability vector values will change their convergence direction. We propose to use the following modification of convergence prediction algorithm to solve above mentioned problem: 1. Set the prediction step K. 2. Every K iteration use the given statistics P i , i 1, N K , N K t K , t 1,2, , to evaluate the probability vector change: P i P i P i 1 . 3. Set the weights for every iteration accordingly to its number: i 2 i N K N K 1 , i 1, , N K . 4. Evaluate probability vector weighted change as: NK i Pi . ~ P ~ pj i 1 5. Set the optimal solution: p j 0, 1, if ~ opt xj 0, otherwise 6. Add the optimal solution in the current population. Stop the run and set the predicted solution as optimal or continue algorithm run to find better solution. The main idea of the given modification is that the probability values on the later iterations have the greater weights as algorithm collects more information about the search space. The weights should have values such that i 1 i and NK i 1. i 1 IV. CONVERGENCE AND DECEPTIVE PROBLEMS The practice has shown us that GA are very effective on many real-world problems, so GA researchers need to construct special test functions, which are real challenge for a certain algorithm. In a field of GA, these test functions are deceptive functions [9, 10]. The deceptive functions design is based upon the building blocks hypothesis [1] - the problems are GA-hard (GAdifficult) or deceptive [11]. It is necessary to notice that deceptive trap functions are a class of specially constructed functions, which consider how genetic search works – so standard GA have no opportunity to escape such traps in principle. In this work we supply GA with additional operation, which can be called a strong inversion. The inversion takes place in natural systems. Holland in [12] describes this operation as inversion of gene order in a selected segment of chromosome. The strong inversion for the binary representation is a bit-inversion for all positions in a chromosome. It’s obviously that in a real-world problem we cannot identify if a local optimum is the trap, so the strong inversion can destroy all useful genes in a chromosome. To avoid this harmful effect we should use the strong inversion with very low probability (e.g. 𝑝𝑖𝑛𝑣𝑒𝑟𝑡 ∈ {0.001; 0.01}) in a manner of mutation. It means that if there are no traps in a search space, the inverted solution probably will have low fitness and its genes will be eliminated from population. In a case, GA fall into a trap, the inverted solution provides the total escape (for simple problems) or new genes with better fitness. V. TEST PROBLEMS AND NUMERICAL EXPERIMENTS We have tested the proposed convergence prediction algorithms for discussed EDA and standard GA with a representative set of complex optimization problems. Continuous optimization problems. We have included 12 internationally known test problems, and 6 of them are from the test function set approved at Special Session on RealParameter Optimization, CEC’05 [8]. The test problem set contains the following functions: Rastrigin and Shifted Rotated Rastrigin, Rosenbrock, Griewank, De Jong 2, “Sombrero” (another name is “Mexican Hat”), Schekel, Katkovnik, Additive Potential function, Multiplicative Potential function and 2 one-dimensional multiextremal functions. Deceptive Trap functions. A variety of deceptive problems are based on the function of unitation, which is defined as number of ones in chromosome. We have included the following functions in test set: Trap on Unitation (with Ackley, Whitley and Goldberg parameters), Trap-4, Trap-5, DEC2TRAP, Liepins and Vose. To estimate algorithms efficiency we have carried out a series of independent run of each algorithm for each test problem. We used reliability as a criterion to evaluate algorithm efficiency. Reliability is the percentage of successful algorithm run (i.e. exact solution of the problem was found) over the whole number of independent runs. We have set the following parameters values for algorithms: Accuracy (binary solution length) – 100, Population size – 200, Generation number – 500, Independent run number – 100. The special parameters for each algorithm (e.g. selection in GA, learning rate in PBIL and others) are chosen to be efficient on average over function set. Table 1 presents efficiency estimation results for modified convergence prediction algorithm. As the probabilities-based GA exceeds standard GA, PBIL and MIVER, below we demonstrate results only for the probabilities-based GA. The investigation shows that the probabilities-based GA cannot find the global optimum solution on the given test functions set with chromosome length greater than 20 bits. The reliability is equal to zero even with fine tuning the algorithm parameters (the same situation with other algorithms). The probabilities-based GA rapidly falls into the local trap optimum and cannot overcome its attraction. 4 Next we have investigated the probabilities-based GA with the strong inversion and the prediction algorithm. The prediction takes place through each 10 generation, the predicted solution replaces the worst individual. At the final generation we fulfill the complete prediction over the whole run. TABLE I THE RELIABILITY ESTIMATION FOR MODIFIED CONVERGENCE PREDICTION ALGORITHM Problem Rastrigin Shifted Rotated Rastrigin Rosenbrock Griewank De Jong 2 Sombrero Schekel Katkovnik Additive potential Multiplicative potential Onedimensional N1 Onedimensional N2 88 Prob.based GA 98 Standard GA 94 28 40 72 50 24 10 8 22 28 36 41 51 28 55 62 100 80 100 70 60 30 100 38 58 22 56 70 100 76 100 98 100 28 100 86 98 70 98 75 100 MIVER PBIL 61 The stochastic search algorithms are not random walk through the search space, they have a purposeful behavior. We have demonstrates in the paper, that stochastic search procedures with binary representation (GA, PHIL and others) can be analyzed through visualization of its estimated distribution dynamics. We have proposed the efficient method to predict the point of convergence (optimal solution) using the results of distribution analysis. Two approaches were investigated: a simple way of using integral criterion and more complex strategy with contribution weights calculation. As numerical experiments shows the modified convergence prediction provides the better performance, even for algorithms with greedy search strategy and local behavior. It’s clear that the proposed prediction strategy is straightforward and there are situations, which lead to incorrect prediction. We have demonstrates that the slight modification of the base algorithms can overcome such GAhard problem as deceptiveness. In further work we suppose to develop the prediction and its implementations with other EDA and evolutionary algorithms and investigate this approach with a variety of test problems, which are the challenge for the search algorithms REFERENCES 86 100 80 100 TABLE II THE RELIABILITY AND AVERAGE EVALUATION NUMBER ESTIMATION OF THE PROBABILITIES-BASED GA WITH INVERSION AND PREDICTION FOR DECEPTIVE AND TRAP FUNCTIONS Prob.-based GA with inversion Problem 42 41 53 61 38 78 aver. evaluation number 37862 42891 27771 31463 38332 15712 51 39871 reliability Ackley Trap Whitley Trap Goldgerg Trap Trap-4 Trap-5 DEC2TRAP Liepins and Vose VI. CONCLUSIONS Prob.-based GA with inversion and prediction aver. reliaevaluability tion number 46 28732 40 38701 55 25981 62 23712 45 32778 98 15899 50 33520 The results are shown in a Table 2. As we can see, the modification with the strong inversion is able to overcome the attraction of the trap, so the algorithm can find the global optimum. In cases when the probabilities-based GA with prediction yields, its reliability is also is high and has comparable values. When we add predicted solution every 10 generation in a population, the probabilities-based GA solves the given problem with less evaluations of the objective function. [1] Goldberg, D.E., 1989, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley. [2] Muehlenbein, H., Paas, G., 1996, From Recombination of Genes to the Estimation of Distribution. Part 1, Binary Parameter, Lecture Notes in Computer Science 1141, Parallel Problem Solving from Nature, pp. 178-187. [3] Larraenaga, P., Lozano, J.A., 2001. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic Publishers. [4] Antamoshkin, A.N., 1986. Brainware for search pseudoboolean optimization, Pr. of 10th Prague Conf. on Inf. Theory, Stat. Dec. Functions, Random Processes - Prague: Academia. [5] Antamoshkin, A.N, Saraev, V.N., 1987. The method of variable probabilities", in Problemy sluchaynogo poiska (Problems of random search), Riga: Zinatne, (in Russian). [6] Baluja, S., 1994. Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning, Technical Report CMU-CS-94-163, Carnegie Mellon University. [7] Semenkin, E.S., Sopov, E.A., 2005. Probabilities-based evolutionary algorithms of complex systems optimization, In Intelligent Systems (AIS’05) and Intelligent CAD (CAD-2005), Moscow, FIZMATLIT (in Russian, abstract in English). [8] Suganthan, P.N., Hansen, N. et al., 2005. Problem Definitions and Evaluation Criteria for the CEC 2005 Special Session on Real-Parameter Optimization, Technical Report, Nanyang Technological University, Singapore and KanGAL Report Number 2005005, Kanpur Genetic Algorithms Laboratory, IIT Kanpur. [9] Deb, K., Goldberg, D.E., 1992. Analysing deception in trap functions. In Foundations of Genetic Algorithms 2, D. Whitley (Ed.), San Mateo: Morgan Kaufmann. [10] Whitley, L. D., 1991. Fundamental principles of deception in genetic search. In Foundations of Genetic Algorithms, G. J. E. Rawlins (Ed.), San Mateo, CA: Morgan Kaufmann. [11] Wilson, S.W., 1991. GA-easy does not imply steepest-ascent optimizable. Proceedings of the Fourth International Conference of Genetic Algorithms (pp. 85-89). San Mateo, CA: Morgan Kaufmann. [12] Holland, J.H., 1975. Adaptation in natural and artificial systems, Ann Arbor. MI: University of Michigan Press.