Adaptive population-based algorithm for global optimization

Josef Tvrdík, Ivan Křivý, and Ladislav Mišík
University of Ostrava, 701 03 Ostrava, Czech Republic
{josef.tvrdik,ivan.krivy,ladislav.misik}@osu.cz

Summary. This paper describes an adaptive stochastic algorithm for global optimization based on the competition of heuristics for local search. Experimental results obtained when applying the algorithm to the estimation of nonlinear-regression parameters for the NIST tasks are presented and compared with the results of other algorithms. The proposed algorithm proved to be more reliable at an acceptable time consumption.

Key words: global optimization, stochastic algorithms, nonlinear regression

This work was supported by grant 201/05/0284 of the Czech Grant Agency and by research scheme MSM 6198898701 of the Institute for Research and Applications of Fuzzy Modeling.

1 Global Optimization and Non-linear Regression

Consider the continuous global optimization problem with box constraints: for a given objective function f : D → R, D ⊂ R^d, a point x* is to be found such that x* = arg min_{x∈D} f(x). The point x* is called the global minimum. D is the search space defined as D = ∏_{i=1}^{d} [ai, bi], ai < bi, i = 1, 2, . . . , d, and the objective function f(x) is supposed to be computable.

In a nonlinear regression model, the elements of the random vector Y are expressed as

   Yi = g(xi, β) + εi ,   i = 1, 2, . . . , n ,                            (1)

where xi^T = (x1, x2, . . . , xk) is the i-th row of the regressor matrix X, β is the vector of parameters, g is a given function nonlinear in its parameters, and the εi are iid random variables with zero means. Estimating the parameters by the least squares method means finding estimates of β that minimize the residual sum of squares Q(β) given by

   Q(β) = Σ_{i=1}^{n} [Yi − g(xi, β)]^2 .                                  (2)

The estimation of β is a global optimization problem. Iterative deterministic algorithms (Levenberg-Marquardt or Gauss-Newton) used in standard statistical packages often fail when searching for the true solution of this problem. Several statistical packages were tested [TK04] on the higher-level-difficulty tasks collected by NIST [NIST]. For approximately one half of the tasks, the experiments either failed completely or resulted in significant disagreement with the certified parameter values. This paper presents an attempt to propose an adaptive population-based algorithm that gives reliable estimates at reasonable time consumption without changing the default values of its control parameters.

2 Controlled Random Search with Competing Heuristics

Controlled random search (CRS) is a simple stochastic algorithm searching for the global minimum in the problems defined above. The algorithm was proposed by Price [Pri77], and several modifications of it have been used successfully in solving global optimization problems [AT04]. The CRS algorithm can be written in pseudo-code as follows:

   1 generate P (population of N points in D at random);
   2 find xmax (the point in P with the highest function value);
   3 repeat
   4    generate a new trial point y ∈ D by using a heuristic;
   5    if f(y) < f(xmax) then
   6       xmax := y;
   7       find new xmax;
   8    endif
   9 until stopping condition;

Any non-deterministic rule generating a new trial point y ∈ D can play the role of the heuristic at line 4. Many different heuristics can be used and, moreover, the heuristics can alternate during the course of the search.
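To make the loop concrete, the following minimal sketch implements the pseudo-code above in Python for a generic heuristic passed in as a callable (the authors' published implementation is in MATLAB; the function and variable names here are ours, and a simple spread-based stopping rule stands in for the R^2-based rule of Section 3).

   import numpy as np

   def crs(f, a, b, heuristic, N, max_evals=100000, eps=1e-15):
       """CRS skeleton; a, b are arrays of box bounds defining D."""
       d = len(a)
       P = a + np.random.rand(N, d) * (b - a)       # line 1: random population in D
       fP = np.array([f(x) for x in P])
       evals = N
       while fP.max() - fP.min() >= eps and evals < max_evals:  # stopping condition
           # line 4: new trial point; clipping is one simple way to keep y in D
           # (the paper only requires y in D)
           y = np.clip(heuristic(P, fP), a, b)
           fy = f(y)
           evals += 1
           imax = fP.argmax()                       # x_max, the worst point of P
           if fy < fP[imax]:                        # lines 5-8: replace x_max by y
               P[imax], fP[imax] = y, fy
       imin = fP.argmin()
       return P[imin], fP[imin]

A call such as crs(f, a, b, heuristic, N=10*d) then mirrors the population size N = 10 d used in the experiments of Section 3.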
Suppose we have h heuristics and that at each iteration step one of them is chosen at random with probability qi, i = 1, 2, . . . , h. The probabilities qi can change according to the success of the heuristics in the preceding steps of the search. A heuristic is successful in the current step of the evolutionary process if it generates a trial point y such that f(y) < fmax, where fmax = f(xmax). The probability qi can be evaluated simply from the corresponding success rate. Another way consists in weighting the successes using the formula

   wi = (fmax − max(f(y), fmin)) / (fmax − fmin) .                         (3)

Thus wi ∈ (0, 1], and the corresponding probability qi is evaluated as

   qi = (Wi + w0) / Σ_{j=1}^{h} (Wj + w0) ,                                (4)

where Wi is the sum of the wi over the previous search steps and w0 > 0 is an input parameter of the algorithm. In order to avoid degeneration of the evolutionary process, the current values of qi are reset to their starting values (qi = 1/h) whenever any probability qi decreases below a given limit δ > 0. The CRS algorithm with competing heuristics belongs to the class of evolutionary algorithms described in [TKM01].

Four competing heuristics were used in the implementation of the CRS algorithm for nonlinear regression parameter estimation. Three of them are based on a randomized reflection in a simplex S (d + 1 points chosen from P) proposed in [KT95]. A new trial point y is generated from the simplex by the relation

   y = g + U(g − xH) ,                                                     (5)

where xH = arg max_{x∈S} f(x) and g is the centroid of the remaining d points of the simplex S. The multiplication factor U is a random variable distributed uniformly in [s, α − s), with α > 0 and s, 0 < s < α/2, being input parameters. In two of these heuristics all d + 1 points of the simplex are chosen at random from P: the heuristic denoted REFL1 uses α = 2 and s = 0.5, and the heuristic denoted REFL25 uses α = 5 and s = 1.5. In the third heuristic, denoted REFLB, one point of the simplex is the point of P with the minimum objective function value and the remaining d points of S are chosen at random from the remaining points of P; its input parameters are set to α = 2 and s = 0.5.

The fourth competing heuristic is based on differential evolution [SP97]. A point u is generated according to

   u = r1 + F(r2 − r3) ,                                                   (6)

where r1, r2, and r3 are three distinct points taken at random from P and F > 0 is an input parameter. The elements yj, j = 1, 2, . . . , d, of a trial point y are built up by the crossover of u with a randomly taken point x of P (not coinciding with the current r1, r2, and r3) using the rule

   yj = uj   if Uj ≤ C or j = l ,
   yj = xj   if Uj > C and j ≠ l ,                                         (7)

where l is a randomly chosen integer from {1, 2, . . . , d}, U1, U2, . . . , Ud are independent random variables uniformly distributed in [0, 1), and C ∈ [0, 1] is an input parameter affecting the number of elements exchanged by the crossover. Ali and Törn [AT04] suggested adapting the value of the scaling factor F during the search according to

   F = max(Fmin, 1 − |fmax/fmin|)   if |fmax/fmin| < 1 ,
   F = max(Fmin, 1 − |fmin/fmax|)   otherwise ,                            (8)

where fmin and fmax are the minimum and maximum objective function values in the population, respectively, and Fmin is an input parameter ensuring F ∈ [Fmin, 1). The heuristic that uses (8) for the evaluation of F, with Fmin = 0.4 and C = 0.9, is denoted CDEADP in the following text.
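The generators and the competition rule translate directly into code. The sketch below (Python, illustrative only; the function names and the guard for fmin = 0 are our own choices, not the authors' implementation) shows the randomized reflection (5), the differential-evolution heuristic with the adaptive F of (6)-(8), and the selection of a heuristic with the probabilities (4), including the reset rule.

   import numpy as np

   def reflect(P, fP, alpha=2.0, s=0.5, use_best=False):
       # randomized reflection in a simplex S of d+1 points of P, eq. (5)
       N, d = P.shape
       if use_best:  # REFLB: the best point of P always belongs to the simplex
           best = int(fP.argmin())
           others = np.random.choice(np.delete(np.arange(N), best), d, replace=False)
           S = np.append(others, best)
       else:         # REFL1 (alpha=2, s=0.5) and REFL25 (alpha=5, s=1.5)
           S = np.random.choice(N, d + 1, replace=False)
       iH = S[int(fP[S].argmax())]             # x_H, the worst point of the simplex
       g = P[S[S != iH]].mean(axis=0)          # centroid of the remaining d points
       U = np.random.uniform(s, alpha - s)     # U ~ Uniform[s, alpha - s)
       return g + U * (g - P[iH])

   def cdeadp(P, fP, C=0.9, F_min=0.4):
       # DE mutation (6) and binomial crossover (7) with F adapted by eq. (8)
       N, d = P.shape
       r1, r2, r3, ix = np.random.choice(N, 4, replace=False)
       ratio = abs(fP.max() / fP.min()) if fP.min() != 0 else np.inf
       if ratio < 1:
           F = max(F_min, 1 - ratio)
       else:
           F = max(F_min, 1 - 1 / ratio)
       u = P[r1] + F * (P[r2] - P[r3])
       l = np.random.randint(d)                # at least one element comes from u
       take_u = (np.random.rand(d) <= C) | (np.arange(d) == l)
       return np.where(take_u, u, P[ix])

   def choose_heuristic(W, w0=0.5, delta=0.05):
       # competition: probabilities (4); reset when any q_i drops below delta
       q = (W + w0) / np.sum(W + w0)
       if q.min() < delta:
           W[:] = 0.0
           q = np.full(len(W), 1.0 / len(W))
       return np.random.choice(len(W), p=q)

   # After a success (f(y) < fmax), the winning heuristic i is credited with the
   # weight of eq. (3):  W[i] += (fmax - max(fy, fmin)) / (fmax - fmin)

The competition thus costs almost nothing per iteration: one weighted draw and, after a success, one increment of Wi.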
3 Numerical Experiments and Their Results

One of the criteria for examining the reliability of software is the comparison of the experimental parameter values with certified values from sources considered reliable. The NIST collection of datasets [NIST] contains 27 nonlinear regression datasets (tasks) ordered according to their level of difficulty (lower: 8 tasks, average: 11 tasks, higher: 8 tasks). The certified values are reported to 11 decimal places for each dataset.

Numerical tests were carried out for all the NIST nonlinear regression tasks. The accuracy of the experimental results obtained by our algorithms was assessed by the number of digits in which they duplicate the certified results. The number of duplicated digits λ can be calculated via the log relative error [MW05] as

   λ = 0                         if |m − c|/|c| ≥ 1 ,
   λ = 11                        if |m − c|/|c| < 1 × 10^-11 ,             (9)
   λ = −log10(|m − c|/|c|)       otherwise ,

where c denotes the certified value and m denotes the estimated value. According to [NIST], except for the cases where the certified value is essentially zero (e.g. for the three Lanczos problems), a good nonlinear least squares procedure should be able to duplicate the certified results to at least 4 or 5 digits. Two values of the number of duplicated digits are reported in our results: λQ for the residual sum of squares, see (2), and λβ for the mean of the estimated parameters (β1, β2, . . . , βd).

Several algorithms for the estimation of parameters were compared in our numerical experiments. One of them is a modification of the Levenberg-Marquardt algorithm implemented in the nlinfit procedure of the MATLAB Statistics Toolbox [MTLB]. Each dataset contains two starting d-tuples of parameter values for iterative deterministic algorithms; the results obtained with these starting values are labelled Start1 and Start2 in the following text. The values of Start2, in particular, are very close to the certified values. The control parameters of nlinfit were set to their default values. The other algorithms are population-based, namely differential evolution and several variants of CRS. Because of their stochastic nature, they must be tested in repeated runs; one hundred repetitions were carried out for each task.

In addition to λQ and λβ, other characteristics are also reported. The time consumption is expressed as the average number (ne) of objective function evaluations needed to reach the stopping condition, subject to a maximum allowed number of objective function evaluations of 40000 d. The variability of ne is characterized by its coefficient of variation vc, given as a percentage of ne. The reliability RP of the search is measured by the percentage of successful runs, i.e. runs in which λQ > 4 (with the exception of Lanczos1, where RP is the percentage of runs with λQ > 2.4). The stopping condition was defined in all the tasks as R^2_max − R^2_min < ε, with ε = 1 × 10^-15 for all the population-based algorithms except CRS4HCe; R^2_max and R^2_min are the maximum and minimum values over the population of the determination index R^2 defined by

   R^2 = 1 − Q(β) / Σ_{i=1}^{n} (Yi − Ȳ)^2 .                               (10)

The population size was set to N = 10 d for all the population-based algorithms. The search spaces D for all the tasks are given in [TKM06].

The results of nlinfit and differential evolution are compared in Table 1. DER is the standard rand variant of differential evolution [SP97] with its tuning parameters set to the recommended values F = 0.8 and C = 0.5. DEADP uses the adaptive F of (8) with the parameters set as in CDEADP.
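Both reported quantities are easy to compute; the short sketch below (ours, in Python) evaluates λ of (9) and the determination index (10).

   import math

   def duplicated_digits(m, c):
       # lambda of eq. (9); assumes the certified value c is nonzero
       rel = abs(m - c) / abs(c)
       if rel >= 1:
           return 0.0
       if rel < 1e-11:
           return 11.0
       return -math.log10(rel)

   def determination_index(Q, Y):
       # R^2 of eq. (10): Q is the residual sum of squares, Y are the responses
       Ybar = sum(Y) / len(Y)
       return 1.0 - Q / sum((y - Ybar) ** 2 for y in Y)

For example, duplicated_digits(1.0000049, 1.0) is about 5.3, i.e. such an estimate duplicates the certified value to roughly five digits.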
Columns denoted rne contain the relative change of ne, in percent, with respect to CRS4HC (see Table 2); negative values therefore indicate lower time consumption than CRS4HC, positive values higher.

Table 1. Comparison of standard algorithms (dashes mark tasks where nlinfit failed to return a result)

                         nlinfit Start1  nlinfit Start2  DER                       DEADP
Task      d  1−R^2       λQ     λβ       λQ     λβ       λQ    λβ    RP    rne     λQ    λβ    RP   rne
chwirut1  3  2.0E-02     10.8   7.3      10.6   8.9      11.0  8.0   100   574     10.9  7.7   99    66
chwirut2  3  1.4E-02     11.0  10.7      11.0   9.2      11.0  7.9   100   571     10.8  7.7   98    69
danwood   2  5.7E-04     11.0   8.7      11.0   9.9      11.0  8.7   100   300     11.0  8.7  100    37
gauss1    8  3.0E-03     11.0  11.0      11.0   8.5      11.0  8.6   100   530     10.7  8.3   97   125
gauss2    8  3.5E-03     10.6   8.2      10.6   8.5      10.4  8.5    98  1039      6.0  5.4   56   192
lanczos3  6  1.5E-09     10.6  11.0      10.6   8.1       0.0  0.3     0   705      0.0  0.3    0   650
misra1a   2  1.8E-05     10.5  10.3      10.4  10.2      10.4  7.9   100   325      1.3  1.2   12   264
misra1b   2  1.1E-05     11.0  10.9      11.0  10.2      11.0  7.9   100   386      3.1  2.5   28   230
enso      9  4.0E-01      8.4   8.9       8.1   4.7      11.0  5.5   100   722     10.8  5.3   98    84
gauss3    8  3.1E-03     11.0  11.0      11.0   7.7       8.8  4.3    99  1912      4.7  2.4   43   413
hahn1     7  2.0E-04     10.6   6.4      10.6   6.0       0.0  0.1     0  1596      3.4  2.2   32  1037
kirby2    5  2.9E-05     11.0   2.0      10.2   6.6       8.7  5.5   100  2251     10.3  6.5   94   141
lanczos1  6  1.3E-26      1.9   1.3       1.9  10.7       0.0  0.3     0   746      0.0  0.3    0   678
lanczos2  6  2.1E-12     10.0   0.0       9.9   8.5       0.0  0.4     0   750      0.0  0.3    0   680
mgh17     5  4.7E-05      9.0   3.9      11.0   7.6       3.8  2.9    25  1714      0.1  0.8    0   730
misra1c   2  6.1E-06      9.1   0.0      11.0   8.2      11.0  7.9   100   435      0.2  0.2    2   292
misra1d   2  8.3E-06     10.2  11.0      11.0  10.3      11.0  7.9   100   414      0.1  0.1    1   320
nelson    3  7.0E-02        –     –      10.9   9.0      10.9  8.3   100   619      0.0  0.5    0   265
roszman1  4  1.6E-03      8.1  10.0      11.0   6.5      11.0  7.5   100  1245     11.0  7.4  100    73
bennett5  3  1.1E-05      8.3   7.8       2.1   1.4       1.7  1.2     5   190      1.3  1.0    1   -35
boxbod    2  1.2E-01      8.1   0.0      10.1   5.7      10.4  8.8   100   206     10.3  8.7  100    32
eckerle4  3  2.9E-03     10.2   2.1      10.6   7.7      10.7  9.5   100   140     10.7  9.3  100    46
mgh09     4  5.9E-03     10.2   2.0       8.2   4.3       9.9  5.9   100  1427      0.0  0.1    0   574
mgh10     3  6.2E-08        –     –       0.0   2.5       0.0  0.3     0   478      0.0  0.1    0    57
rat42     3  1.7E-03      4.9   7.0      10.9   7.0      11.0  8.6   100   474     11.0  8.4  100    59
rat43     4  8.2E-03      7.7   5.7      11.0   5.9      11.0  7.9   100   904     10.8  7.6   99    78
thurber   7  4.9E-04      7.0   5.0       8.2   5.2       1.5  2.1     0  1912      8.1  5.8   78   484

The results of the CRS algorithms with competing heuristics are given in Table 2. The common tuning parameters were set as follows: w0 used in (4), w0 = 0.5; the limit δ for resetting the probabilities qi to their initial values, δ = 0.05. CRS4HC denotes the algorithm with the four competing heuristics described in Section 2. In CRS5HC a fifth heuristic, generating a random point from the uniform distribution over D, is added in order to make the algorithm convergent with probability 1. Finally, CRS4HCe stands for the algorithm that contains the same four heuristics as CRS4HC and, moreover, provides a special procedure for adapting the value of ε: starting from ε0 = 1 × 10^-9, the procedure reduces the current value of ε by a factor of ten whenever 1 − R^2 < ε × 10^7 (for more details see [TKM06]; source code in MATLAB is available at http://albert.osu.cz/tvrdik).
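Read literally, this rule gives a simple update. The sketch below is our reconstruction in Python of the description above, not the authors' exact procedure (which is given in [TKM06]); we assume the test uses the current population's value of 1 − R^2. It reproduces, for example, the final value ε = 1E-11 reported in Table 2 for danwood, where 1 − R^2 = 5.7E-04.

   def adapt_eps(eps, one_minus_r2):
       # CRS4HCe: starting from eps = 1e-9, divide eps by ten whenever the
       # fit quality already beats the current tolerance scale, i.e. whenever
       # 1 - R^2 < eps * 1e7
       while one_minus_r2 < eps * 1e7:
           eps /= 10.0
       return eps

Thus adapt_eps(1e-9, 5.7e-4) returns 1e-11, and for extremely small 1 − R^2 (the Lanczos tasks) the tolerance keeps tightening, which explains the very small ε values in Table 2.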
Table 2. CRS with competing heuristics

          CRS4HC                      CRS5HC                    CRS4HCe
Task      λQ    RP     ne     vc     λQ    RP    rne    vc     ε       λQ   RP    rne   vc
chwirut1  11.0  100    3008    5     11.0  100    12     5     1E-09   8.5  100   -35    6
chwirut2  11.0  100    2987    4     11.0  100    13     4     1E-09   8.3  100   -35    6
danwood   11.0  100    1620    7     11.0  100    13     8     1E-11   8.4  100   -28    9
gauss1    11.0  100   14137    2     11.0  100    13     2     1E-10   7.1  100   -35    3
gauss2    10.4   98   14726    3     10.4   98    13     3     1E-10   7.0   98   -36    4
lanczos3   6.9  100   29810   19      7.0  100    11    17     1E-16   7.0  100     2   23
misra1a   10.4  100    2157    8     10.4  100    12     7     1E-12   7.8  100   -17    7
misra1b   10.9  100    1861    7     10.9  100    15     8     1E-12   7.7  100   -19    9
enso       9.7   87   19220    7      9.0   80    13     7     1E-09   8.1   86   -30   11
gauss3    11.0  100   15908    4     10.9   99    12     5     1E-10   7.0   99   -35    6
hahn1      9.9   93   16509    7      9.2   87    11     7     1E-11   6.5   93   -26   10
kirby2    11.0  100    8508    2     11.0  100    13     3     1E-12   7.3  100   -23    3
lanczos1   0.0    0   28361   18      0.0    0    14    19     1E-31   2.5  100   639   22
lanczos2   4.1   55   28251   17      4.1   40    18    20     1E-19   7.1  100     8   18
mgh17     11.0  100   11023    4     11.0  100    13     5     1E-12   7.5  100   -18    5
misra1c   11.0  100    2104    7     11.0  100    13     6     1E-13   8.3  100   -11    8
misra1d   11.0  100    2043    7     11.0  100    12     7     1E-13   8.4  100   -12    9
nelson    10.9  100    5904    4     10.9  100    12     4     1E-09   9.0  100   -17    5
roszman1  11.0  100    5301    4     11.0  100    12     4     1E-10   7.2  100   -36    5
bennett5  11.0  100   41335   34     11.0  100    14    35     1E-12   5.1  100   -11   39
boxbod    10.4  100    1308    7     10.4  100    13     8     1E-09   9.7  100   -37    9
eckerle4  10.7  100    2629    4     10.7  100    12     5     1E-10   7.6  100   -35    6
mgh09     11.0  100   10422    8     11.0  100    13     9     1E-10   7.7  100   -15   11
mgh10      9.0  100   20761   10      9.0  100    17     4     1E-15   8.1  100     1    8
rat42     11.0  100    2942    6     11.0  100    14     5     1E-10   7.4  100   -35    6
rat43     11.0  100    4807    4     11.0  100    13     4     1E-10   7.9  100   -39    5
thurber   11.0  100   13915    3     11.0  100    11     2     1E-11   7.4  100   -30    3

The benefit of the competition is evident when the results of the competitive CRS algorithms are compared with the results of the non-competitive ones in Table 3. In CRS4HA, the same four heuristics as in CRS4HC do not compete but only alternate, with constant probabilities qi = 1/4, i = 1, 2, 3, 4. The remaining columns of Table 3 give the results obtained by CRS with one heuristic only; these algorithms are labelled according to the name of the heuristic. Values of vc = 0 mean that all the runs of the given task were stopped because of exceeding the limit value of ne = 40000 d.

4 Discussion and Conclusions

The results obtained by the nlinfit procedure can be accepted for all the tasks of lower difficulty level. Among the remaining tasks, nlinfit crashes in two cases and does not perform well in most tasks of higher difficulty.
Table 3. CRS without competition

          CRS4HA           REFL1             REFL25            REFLB             CDEADP
Task      RP   rne   vc    RP    rne   vc    RP    rne    vc   RP    rne    vc   RP    rne   vc
chwirut1  100   17    5     84    47   57    100   1922    4   100    -10    4    71    71   65
chwirut2  100   19    5     71    74   60    100   1906    4   100    -11    4    74    62   63
danwood   100   40    7     81     3   46    100    406    6     0   4838    0    88    -3   41
gauss1    100   20    2     24   243  122      0   2164    0   100    -13    3    30   250  134
gauss2     96   20    3     38   280  129      0   2073    0    86    -14    6    36   339  133
lanczos3  100   28   24      0   444   44      0    705    0   100    -35   19     0   510   36
misra1a   100   30    7      2    59   23    100    359    7     0   3609    0     0    61   21
misra1b   100   36    8      9    57   29    100    367    8     0   4199    0     5    64   24
enso       80   22   10     86    56  123      0   1773    0    63    -14   10    78    42   52
gauss3    100   20    4      4   532   86      0   1912    0    91    -17    8     3   475   88
hahn1      91   19    7      0   992   32      0   1596    0    62    -25   13     0  1057   29
kirby2    100   19    3     34   261  115      0   2251    0   100    -15    3    27   188   58
lanczos1    0   36   21      0   523   38      0    746    0     0    -34   21     0   520   38
lanczos2   55   34   24      0   475   41      0    750    0    48    -36   20     0   467   41
mgh17     100   18    3      0   381   83      0   1714    0   100    -18   14     0   451   85
misra1c   100   33    6      0    62   18    100    326    8     0   3702    0     1    63   18
misra1d   100   31    8      0    73   17    100    340    9     0   3816    0     1    72   19
nelson    100   12    4      0    88   26    100   1417    7   100    -15    3     0    88   26
roszman1  100   19    4     91    51   62    100   2918    0   100    -12    3    96    39   55
bennett5  100    3   38      1   -69   24      0    190    0   100    -32   39     1   -71   25
boxbod    100   38    6     90    -1   46    100    510    6   100   6016    0    86     6   54
eckerle4  100   19    4     94     3   59    100   2297    4   100      1    5    95     5   75
mgh09     100   20    8      0   184   40      0   1435    0   100    -13    9     0   178   38
mgh10     100   15    4      0   -50   20      0    478    0   100    -25   10     0   -48   20
rat42     100   19    6     81    42   69    100   2016    4   100     -8    6    83    42   69
rat43     100   19    3     87    83   70    100   3228    0   100     -8    4    82    66   62
thurber   100   20    2      1   824   26      0   1912    0   100    -20    2     1   845   29

The reliability of differential evolution (DE) is poor in many cases when compared with the competitive CRS algorithms, in spite of its higher time consumption. The results of the two variants of DE differ from each other in both RP and ne, which indicates a high sensitivity of DE to the setting of its tuning parameters. CRS4HC is reliable enough for most tasks except Lanczos1 and Lanczos2 (a consequence of their extremely low values of 1 − R^2, see Table 1); among the other tasks, RP is less than 100 in only four of them, and it always exceeds 85 percent. The addition of the fifth heuristic in CRS5HC does not increase RP, despite the theoretical expectation, and only makes the time consumption greater by about one tenth or more. The best results in both RP and ne were achieved by CRS4HCe: the adaptive stopping condition decreases ne by one tenth to one third in most cases and increases RP in the tasks with a small value of 1 − R^2. The results of the competitive CRS algorithms also show that higher values of vc indicate a high time consumption of the search, which should be taken into account when proposing new variants of adaptive algorithms for global optimization.

As is clear from Table 3, the CRS4HA algorithm (where the cooperation of the four heuristics comes about through their alternation) outperforms all the CRS algorithms with one heuristic only. Comparing the results of CRS4HA and CRS4HC, the competition brings a further improvement in both ne and RP. CRS4HC and CRS4HCe proved to be more reliable than nlinfit at an acceptable time consumption (one objective function evaluation takes about 1 millisecond on a PC) for all the NIST nonlinear datasets, and they can therefore be used instead of deterministic algorithms in parameter estimation.
References

[AT04] Ali, M.M., Törn, A.: Population set based global optimization algorithms: some modifications and numerical studies. Computers and Operations Research 31, 1703–1725 (2004)
[KT95] Křivý, I., Tvrdík, J.: The controlled random search algorithm in optimizing regression models. Comp. Statist. & Data Anal. 20, 229–234 (1995)
[MTLB] MATLAB, version 7.1.0. The MathWorks, Inc. (2005)
[MW05] McCullough, B.D., Wilson, B.: On the accuracy of statistical procedures in Microsoft Excel 2003. Comp. Statist. & Data Anal. 49, 1244–1252 (2005)
[NIST] Statistical Reference Datasets. Nonlinear regression. NIST Information Technology Laboratory. http://www.itl.nist.gov/div898/strd (2005)
[Pri77] Price, W.L.: A controlled random search procedure for global optimization. Computer J. 20, 367–370 (1977)
[SP97] Storn, R., Price, K.: Differential evolution, a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optimization 11, 341–359 (1997)
[TK04] Tvrdík, J., Křivý, I.: Comparison of algorithms for nonlinear regression estimates. In: Antoch, J. (ed) COMPSTAT 2004. Physica-Verlag, Heidelberg New York, 1917–1924 (2004)
[TKM01] Tvrdík, J., Křivý, I., Mišík, L.: Evolutionary algorithm with competing heuristics. In: Ošmera, P. (ed) MENDEL 2001, 7th International Conference on Soft Computing. Technical University, Brno, 58–64 (2001)
[TKM06] Tvrdík, J., Křivý, I., Mišík, L.: Adaptive population-based search: application to nonlinear regression. IRAFM Research Report No. 98, University of Ostrava (2006). http://ac030.osu.cz/irafm/resrep.html