Adaptive population-based algorithm for global optimization

advertisement
Adaptive population-based algorithm for global
optimization
Josef Tvrdı́k, Ivan Křivý, and Ladislav Mišı́k
University of Ostrava
701 03 Ostrava
Czech Republic
{josef.tvrdik,ivan.krivy,ladislav.misik}@osu.cz
Summary. This paper describes an adaptive stochastic algorithm for the global
optimization. The algorithm is based on the competition of heuristics for local search.
The experimental results obtained when applying the algorithm to the estimation
of nonlinear-regression parameters of NIST tasks are presented and compared with
the results of other algorithms. The proposed algorithm proved to be more reliable
at an acceptable time consumption.
Key words: global optimization, stochastic algorithms, nonlinear regression
1 Global Optimization and Non-linear Regression
Let us consider continuous global optimization problem with box constraints, i.e.
for a given objective function f : D → R, D ⊂ Rd , the point x∗ is to be found such
∗
that x∗ = arg minx∈D f (x). The
Q point x is called the global minimum, D is the
search space defined as D = di=1 [ai , bi ], ai < bi , i = 1, 2, . . . , d and the objective
function f (x) is supposed to be computable.
In a nonlinear regression model, the elements of random vector Y are expressed
as follows
Yi = g(xi , β) + εi ,
i = 1, 2, . . . , n ,
(1)
xTi
where
= (x1 , x2 , . . . , xk ) is i-th row of regressor matrix X, β is vector of parameters, g is a given function nonlinear in parameters, and εi ’s are iid random variables
with zero means. The estimation of parameters by the least squares method means
to find such estimates of β that minimize the residual sum of squares Q(β) given
by the following equation
Q(β) =
n
X
[Yi − g(xi , β)]2 .
(2)
i=1
1
This work was supported by the grant 201/05/0284 of the Czech Grant Agency
and by the research scheme MSM 6198898701 of the Institute for Research and
Applications of Fuzzy Modeling.
1364
Josef Tvrdı́k et al.
The estimation of β is the global optimization problem. Iterative deterministic
algorithms (Levenberg-Marquardt or Gauss-Newton) used in standard statistical
packages often fail when searching for the true solution of the problem. Several
statistical packages were tested [TK04] on higher-level-difficulty tasks collected by
NIST [NIST]. For approximately one half of the tasks, the experiments either completely failed or resulted in a significant disagreement with the certified parameter
values.
This paper presents an attempt to propose an adaptive population-based algorithm giving reliable estimates at reasonable time consumption without changing
default values of its control parameters.
2 Controlled Random Search with Competing Heuristics
Controlled random search (CRS) is a simple stochastic algorithm searching for the
global minimum in the problems defined above. The algorithm was proposed by Price
[Pri77]. There are several modifications of the CRS algorithm which were successfully
used in solving the global optimization problems [AT04]. The CRS algorithm can
be written in pseudo-code as follows:
1 generate P (population of N points in D at random);
2 find xmax (the point in P with the highest function value);
3 repeat
4
generate a new trial point y ∈ D by using a heuristic;
5
if f (y) < f (xmax ) then
6
xmax := y;
7
find new xmax ;
8
endif
9 until stopping condition;
The role of a heuristic at line 4 can play any non-deterministic rule generating
a new trial point y ∈ D. There are many different heuristics that can be used and,
moreover, the heuristics can alternate during the course of searching process.
Suppose we have h heuristics and one of them can be chosen at each iteration
step at random with the probability qi , i = 1, 2, . . . , h. The probabilities qi can
change according to the successfulness of heuristics in preceding steps of searching
process. The heuristic is successful in the current step of evolutionary process, if it
generates such a trial point y that f (y) < fmax , fmax = f (xmax ). Probability qi can
be simply evaluated by taking into account the corresponding success rate. Other
way consists in the weighting of successes by using the formula
wi =
fmax − max(f (y), fmin )
.
fmax − fmin
(3)
Thus, wi ∈ (0, 1] and the corresponding probability qi is evaluated as
qi =
Wi + w0
,
j=1 (Wj + w0 )
Ph
(4)
where Wi is the sum of wi in previous searching steps and w0 > 0 is an input parameter of the algorithm. In order to avoid the degeneration of evolutionary process,
Adaptive population-based algorithm for global optimization
1365
the current values of qi are reset to their starting values (qi = 1/h) when any probability qi decreases below a given limit δ > 0. The CRS algorithm with competing
heuristics belongs to the class of evolutionary algorithms described in [TKM01].
Four competing heuristics were used in the implementation of the CRS algorithm for nonlinear regression parameter estimation. Three of them are based on
a randomized reflection in the simplex S (d + 1 points chosen from P ) proposed
in [KT95]. A new trial point y is generated from the simplex by the relation
y = g + U (g − xH ) ,
(5)
where xH = arg maxx∈S f (x) and g is the centroid of remaining d points of the
simplex S. The multiplication factor U is a random variable distributed uniformly
in [s, α−s), α > 0 and s being input parameters, 0 < s < α/2. All the d+1 points of
simplex are chosen at random from P in two heuristics: the heuristic denoted REFL1
uses α = 2, and s = 0.5, the heuristic denoted REFL25 uses α = 5, and s = 1.5.
Regarding the third heuristic denoted REFLB, one point of the simplex is the point
of P with the minimum objective function value and the remaining d points of the
simplex S are chosen at random from remaining points of P . Input parameters of
the heuristic are set to α = 2 and s = 0.5.
The fourth competing heuristic is based on differential evolution [SP97]. A point
u is generated according to
u = r1 + F (r2 − r3 ) ,
(6)
where r1 , r2 and r3 are three distinct points taken randomly from P and F > 0 is
an input parameter. The elements yj , j = 1, 2. . . . , d, of a trial point y are built up
by the crossover of randomly taken x (not coinciding with the current r1 , r2 , and
r3 ) and u using the following rule:
yj =
uj if Uj ≤ C or j = l
xj if Uj > C and j 6= l ,
(7)
where l is a randomly chosen integer from {1, 2, . . . , d}, U1 , U2 , . . . , Ud are independent random variables uniformly distributed in [0, 1), and C ∈ [0, 1] is an input
parameter affecting the number of elements to be exchanged by crossover. Ali and
Törn [AT04] suggested to adapt the value of the scaling factor F during searching
process according to the equation
(
F =
max(Fmin , 1 − | ffmax
|) if | ffmax
|<1
min
min
min
|) otherwise ,
max(Fmin , 1 − | ffmax
(8)
where fmin , fmax are the minimum and maximum objective function values in the
population, respectively, and Fmin is an input parameter ensuring F ∈ [Fmin , 1). The
heuristic which uses (8) for evaluation of F with Fmin = 0.4 and C = 0.9 is denoted
CDEADP in the following text.
3 Numerical Experiments and Their Results
One of the criteria for examining the reliability of software is the comparison of the
experimental parameter values with the certified values from sources that are considered to be reliable. The NIST collection of datasets [NIST] contains 27 nonlinear
1366
Josef Tvrdı́k et al.
regression datasets (tasks) ordered according to their level of difficulty (lower – 8
tasks, average – 11 tasks, and higher – 8 tasks). The certified values are reported
to 11 decimal places for each dataset. Numerical tests were carried out for all the
NIST nonlinear regression tasks. The accuracy of the experimental results obtained
by using our algorithms was estimated according to the number of duplicated digits
when compared with the certified results. The number of duplicated digits λ can be
calculated via log relative error [MW05] as
8
>
>
<0
λ = 11
>
>
:
− log10
|m−c|
|c|
if
if
|m−c|
|c|
|m−c|
|c|
≥1
< 1 × 10−11
(9)
otherwise ,
where c denotes the certified value and m denotes the estimated value. According
to [NIST], excepting the cases, where the certified value is essentially zero (e.g. for
the three Lanczos problems), a good nonlinear least squares procedure should be
able to duplicate the certified results to at least 4 or 5 digits. Two values of the
number of duplicated digits are reported in our results: λQ for the residual sum of
squares, see (2), and λβ for the mean of estimated parameters (β1 , β2 , . . . , βd ).
Several algorithms for the estimation of parameters were compared in our numerical experiments. One of them is a modification of Levenberg-Marquardt algorithm
implemented in nlinfit procedure of Statistical Toolbox [MTLB]. Each dataset
contains two starting d-tuples of parameters values for iterative deterministic algorithms. The results obtained with these starting values are labelled Start1 and
Start2 in the following text. Especially values in Start2 are very close to the certified values. Control parameters of nlinfit were set to their default values.
Other algorithms are population-based, namely differential evolution and several
variants of the CRS. Because of their stochastic nature, they must be tested in
repeated runs. One hundred of repetitions were carried out for each task. In addition
to λQ and λβ , other characteristics are also reported. The time consumption is
expressed as average number (ne) of objective function evaluations needed to reach
the stopping condition or maximum allowed number of objective function evaluations
(40000 d). The variability of ne is characterized by the coefficient of variance vc in
percents of ne. The reliability RP of the search is measured by the percentage of
successful searches, in which λQ > 4 (with exception of Lanczos1, where RP is
percentage of runs with λQ > 2.4). The stopping condition was defined in all the
2
2
tasks as Rmax
− Rmin
< ε, ε = 1 × 10−15 for all the population-based algorithms
2
2
except CRS4HCe. Rmax
and Rmin
are the maximum and the minimum values of the
2
determination index R defined by
R2 = 1 −
Pn
Q(β)
.
− Y )2
i=1 (Yi
(10)
Population size was set up to N = 10 d for all the population-based algorithms.
Search spaces D for all the tasks are given in [TKM06].
The results of nlinfit and differential evolution are compared in Table 1. DER
is standard rand variant of differential evolution [SP97] with tuning parameters set
to their recommended values, F = 0.8 and C = 0.5. DEADP uses adaptive F according
to (8) with the parameters set equally as in CDEADP. Columns denoted rne contain
the relative change of ne in percents when compared with CRS4HC, see Table 2.
Adaptive population-based algorithm for global optimization
1367
Therefore, negative values present less time consumption with respect to CRS4HC,
positive values bigger one.
Table 1. Comparison of standard algorithms
Task
d 1 − R2
chwirut1
chwirut2
danwood
gauss1
gauss2
lanczos3
misra1a
misra1b
enso
gauss3
hahn1
kirby2
lanczos1
lanczos2
mgh17
misra1c
misra1d
nelson
roszman1
bennett5
boxbod
eckerle4
mgh09
mgh10
rat42
rat43
thurber
3
3
2
8
8
6
2
2
9
8
7
5
6
6
5
2
2
3
4
3
2
3
4
3
3
4
7
2.0E-02
1.4E-02
5.7E-04
3.0E-03
3.5E-03
1.5E-09
1.8E-05
1.1E-05
4.0E-01
3.1E-03
2.0E-04
2.9E-05
1.3E-26
2.1E-12
4.7E-05
6.1E-06
8.3E-06
7.0E-02
1.6E-03
1.1E-05
1.2E-01
2.9E-03
5.9E-03
6.2E-08
1.7E-03
8.2E-03
4.9E-04
nlinfit
Start1
Start2
λQ λβ λQ λβ
10.8
11.0
11.0
11.0
10.6
10.6
10.5
11.0
8.4
11.0
10.6
11.0
1.9
10.0
9.0
9.1
10.2
8.1
8.3
8.1
10.2
10.2
4.9
7.7
7.0
7.3
10.7
8.7
11.0 8.2
11.0 10.3
10.9 8.9
11.0 6.4
2.0 1.3
0.0
3.9
0.0
11.0
10.0
7.8
0.0
2.1
2.0
7.0
5.7
5.0
10.6
11.0
11.0
11.0
10.6
10.6
10.4
11.0
8.1
11.0
10.6
10.2
1.9
9.9
11.0
11.0
11.0
10.9
11.0
2.1
10.1
10.6
8.2
0.0
10.9
11.0
8.2
8.9
9.2
9.9
8.5
8.5
8.1
10.2
10.2
4.7
7.7
6.0
6.6
10.7
8.5
7.6
8.2
10.3
9.0
6.5
1.4
5.7
7.7
4.3
2.5
7.0
5.9
5.2
DER
λQ
λβ RP
11.0
11.0
11.0
11.0
10.4
0.0
10.4
11.0
11.0
8.8
0.0
8.7
0.0
0.0
3.8
11.0
11.0
10.9
11.0
1.7
10.4
10.7
9.9
0.0
11.0
11.0
1.5
8.0
7.9
8.7
8.6
8.5
0.3
7.9
7.9
5.5
4.3
0.1
5.5
0.3
0.4
2.9
7.9
7.9
8.3
7.5
1.2
8.8
9.5
5.9
0.3
8.6
7.9
2.1
100
100
100
100
98
0
100
100
100
99
0
100
0
0
25
100
100
100
100
5
100
100
100
0
100
100
0
DEADP
rne
λQ
λβ RP
574
571
300
530
1039
705
325
386
722
1912
1596
2251
746
750
1714
435
414
619
1245
190
206
140
1427
478
474
904
1912
10.9
10.8
11.0
10.7
6.0
0.0
1.3
3.1
10.8
4.7
3.4
10.3
0.0
0.0
0.1
0.2
0.1
0.0
11.0
1.3
10.3
10.7
0.0
0.0
11.0
10.8
8.1
7.7
7.7
8.7
8.3
5.4
0.3
1.2
2.5
5.3
2.4
2.2
6.5
0.3
0.3
0.8
0.2
0.1
0.5
7.4
1.0
8.7
9.3
0.1
0.1
8.4
7.6
5.8
rne
99
66
98
69
100
37
97 125
56 192
0 650
12 264
28 230
98
84
43 413
32 1037
94 141
0 678
0 680
0 730
2 292
1 320
0 265
100
73
1 -35
100
32
100
46
0 574
0
57
100
59
99
78
78 484
The results of CRS algorithms with competing heuristics are given in Table 2.
The common tuning parameters were set as follows: w0 used in (4), w0 = 0.5, δ for
the reset of probabilities qi to initial values, δ = 0.05. CRS4HC denotes the algorithm
with four competing
Heuristics described in section 2, in CRS5HC the fifth heuristic generating a random point from uniform distribution in D is added to make the algorithm convergent
with probability 1. Finally, CRS4HCe stands for the algorithm that contains the same
four heuristics as CRS4HC and, moreover, provides a special procedure for adapting
the value ε. Starting from ε0 = 1 × 10−9 , the procedure reduces in principle current
value of ε ten times when 1 − R2 < ε × 107 (for more details see [TKM06], source
1368
Josef Tvrdı́k et al.
Table 2. CRS with competing heuristics
Task
λQ
CRS4HC
RP
ne
vc
λQ
CRS5HC
RP rne vc λQ
CRS4HCe
RP rne vc
chwirut1
chwirut2
danwood
gauss1
gauss2
lanczos3
misra1a
misra1b
enso
gauss3
hahn1
kirby2
lanczos1
lanczos2
mgh17
misra1c
misra1d
nelson
roszman1
bennett5
boxbod
eckerle4
mgh09
mgh10
rat42
rat43
thurber
11.0
11.0
11.0
11.0
10.4
6.9
10.4
10.9
9.7
11.0
9.9
11.0
0.0
4.1
11.0
11.0
11.0
10.9
11.0
11.0
10.4
10.7
11.0
9.0
11.0
11.0
11.0
100
100
100
100
98
100
100
100
87
100
93
100
0
55
100
100
100
100
100
100
100
100
100
100
100
100
100
5
4
7
2
3
19
8
7
7
4
7
2
18
17
4
7
7
4
4
34
7
4
8
10
6
4
3
11.0
11.0
11.0
11.0
10.4
7.0
10.4
10.9
9.0
10.9
9.2
11.0
0.0
4.1
11.0
11.0
11.0
10.9
11.0
11.0
10.4
10.7
11.0
9.0
11.0
11.0
11.0
100
100
100
100
98
100
100
100
80
99
87
100
0
40
100
100
100
100
100
100
100
100
100
100
100
100
100
100 -35 6 1E-09
100 -35 6 1E-09
100 -28 9 1E-11
100 -35 3 1E-10
98 -36 4 1E-10
100
2 23 1E-16
100 -17 7 1E-12
100 -19 9 1E-12
86 -30 11 1E-09
99 -35 6 1E-10
93 -26 10 1E-11
100 -23 3 1E-12
100 639 22 1E-31
100
8 18 1E-19
100 -18 5 1E-12
100 -11 8 1E-13
100 -12 9 1E-13
100 -17 5 1E-09
100 -36 5 1E-10
100 -11 39 1E-12
100 -37 9 1E-09
100 -35 6 1E-10
100 -15 11 1E-10
100
1 8 1E-15
100 -35 6 1E-10
100 -39 5 1E-10
100 -30 3 1E-11
3008
2987
1620
14137
14726
29810
2157
1861
19220
15908
16509
8508
28361
28251
11023
2104
2043
5904
5301
41335
1308
2629
10422
20761
2942
4807
13915
12
13
13
13
13
11
12
15
13
12
11
13
14
18
13
13
12
12
12
14
13
12
13
17
14
13
11
5
4
8
2
3
17
7
8
7
5
7
3
19
20
5
6
7
4
4
35
8
5
9
4
5
4
2
8.5
8.3
8.4
7.1
7.0
7.0
7.8
7.7
8.1
7.0
6.5
7.3
2.5
7.1
7.5
8.3
8.4
9.0
7.2
5.1
9.7
7.6
7.7
8.1
7.4
7.9
7.4
ε
code in Matlab available at http://albert.osu.cz/tvrdik). The benefit of competition
is evident, when we compare the results of competitive CRS algorithms with the results of non-competitive ones in Table 1. As regard CRS4HA, the same four heuristics
as in CRS4HC do not compete but only alternate, with constant probabilities qi = 1/4,
i = 1, 2, 3, 4. The remaining columns give the results obtained by CRS with only one
heuristic, the algorithms are labelled according to the name of heuristics. The values
of vc = 0 mean that all the runs of a given task were stopped because of exceeding
the limit value of ne = 40000 d.
4 Discussion and Conclusions
The results obtained by the nlinfit procedure can be accepted for all the tasks of
lower difficulty level. In other tasks the nlinfit crashes in two cases and does not
perform well in most tasks of higher difficulty. The reliability of differential evolution
Adaptive population-based algorithm for global optimization
1369
Table 3. CRS without competition
Task
CRS4HA
REFL1
REFL25
REFLB
CDEADP
RP rne vc RP rne vc RP rne vc RP rne vc RP rne vc
chwirut1
chwirut2
danwood
gauss1
gauss2
lanczos3
misra1a
misra1b
enso
gauss3
hahn1
kirby2
lanczos1
lanczos2
mgh17
misra1c
misra1d
nelson
roszman1
bennett5
boxbod
eckerle4
mgh09
mgh10
rat42
rat43
thurber
100
100
100
100
96
100
100
100
80
100
91
100
0
55
100
100
100
100
100
100
100
100
100
100
100
100
100
17
19
40
20
20
28
30
36
22
20
19
19
36
34
18
33
31
12
19
3
38
19
20
15
19
19
20
5
5
7
2
3
24
7
8
10
4
7
3
21
24
3
6
8
4
4
38
6
4
8
4
6
3
2
84
71
81
24
38
0
2
9
86
4
0
34
0
0
0
0
0
0
91
1
90
94
0
0
81
87
1
47
74
3
243
280
444
59
57
56
532
992
261
523
475
381
62
73
88
51
-69
-1
3
184
-50
42
83
824
57
60
46
122
129
44
23
29
123
86
32
115
38
41
83
18
17
26
62
24
46
59
40
20
69
70
26
100
100
100
0
0
0
100
100
0
0
0
0
0
0
0
100
100
100
100
0
100
100
0
0
100
100
0
1922
1906
406
2164
2073
705
359
367
1773
1912
1596
2251
746
750
1714
326
340
1417
2918
190
510
2297
1435
478
2016
3228
1912
4
4
6
0
0
0
7
8
0
0
0
0
0
0
0
8
9
7
0
0
6
4
0
0
4
0
0
100
100
0
100
86
100
0
0
63
91
62
100
0
48
100
0
0
100
100
100
100
100
100
100
100
100
100
-10
-11
4838
-13
-14
-35
3609
4199
-14
-17
-25
-15
-34
-36
-18
3702
3816
-15
-12
-32
6016
1
-13
-25
-8
-8
-20
4
4
0
3
6
19
0
0
10
8
13
3
21
20
14
0
0
3
3
39
0
5
9
10
6
4
2
71
71 65
74
62 63
88
-3 41
30 250 134
36 339 133
0 510 36
0
61 21
5
64 24
78
42 52
3 475 88
0 1057 29
27 188 58
0 520 38
0 467 41
0 451 85
1
63 18
1
72 19
0
88 26
96
39 55
1 -71 25
86
6 54
95
5 75
0 178 38
0 -48 20
83
42 69
82
66 62
1 845 29
(DE) is poor in many cases when compared with competitive CRS in spite of the
fact that the time consumption is higher. The results of two variants of DE differ
from each other in both RP and ne, which indicates a high sensitivity of DE to the
setting of tuning parameters.
CRS4HC is reliable enough for most tasks except Lanczos1 and Lanczos2 (caused
by their extremely low values of 1 − R2 , see Table 1). As regards other tasks, only in
four tasks RP is less than 100 but always exceeds 85 percents. The addition of the
fifth heuristic in CRS5HC does not increase RP despite of theoretical expectation but
only makes time consumption greater by about one tenth or more. The best results in
both RP and ne were achieved by using CRS4HCe, where the use of adaptive stopping
condition decreases ne by one tenth to one third in most cases and increases RP in
tasks with a small value of 1 − R2 . The results of competitive CRS also shows
that higher values of vc indicate a high time consumption of search, which should
be taken into account when proposing new variants of adaptive algorithms for the
global optimization.
1370
Josef Tvrdı́k et al.
As it is clear from Table 1, CRS4HA algorithm (where the cooperation of four
heuristics comes through their alternation) outperforms all the CRS algorithms with
one heuristic only. Comparing the results of CRS4HA and CRS4HC, the competition
gives another improvement in both ne and RP.
CRS4HC and CRS4HCe proved to be more reliable than nlinfit with acceptable
time consumption (one objective function evaluation takes about 1 millisecond on
PC) for all the NIST nonlinear datasets and, therefore, they can be used instead of
deterministic algorithms in the parameter estimation.
References
[AT04]
Ali, M.M., Törn, A.: Population set based global optimization algorithms:
Some modifications and numerical studies, Computers and Operations
Research 31, 1703 – 1725 (2004)
[KT95] Křivý, I., Tvrdı́k, J.: The controlled random search algorithm in optimizing regression models. Comp. Statist. & Data Anal. 20, 229 – 234 (1995)
[MTLB] MATLAB, version 7.1.0. The MathWorks, Inc. (2005)
[MW05] McCullough, B.D., Wilson, B.: On the accuracy of statistical procedures
in Microsoft Excel 2003. Comp. Statist. & Data Anal. 49, 1244 – 1252
(2005)
[Pri77]
Price, W.L.: A controlled random search procedure for global optimization. Computer J. 20, 367 – 370 (1977)
[NIST]
Statistical Reference Datasets. Nonlinear regression. NIST Information
Technology Laboratory. http://www.itl.nist.gov/div898/strd. (2005)
[SP97]
Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic
for global optimization over continuous spaces. J. Global Optimization 11,
341 – 359 (1997)
[TKM01] Tvrdı́k, J., Křivý, I., Mišı́k, L.: Evolutionary algorithm with competing
heuristics. In: Ošmera, P. (ed) MENDEL 2001, 7th International Conference on Soft Computing. Technical University, Brno, 58 – 64 (2001)
[TK04] Tvrdı́k, J., Křivý, I.: Comparison of algorithms for nonlinear regression
estimates. In: Antoch, J. (ed) COMPSTAT 2004. Physica-Verlag, Heidelberg New York, 1917 – 1924 (2004)
[TKM06] Tvrdı́k, J., Křivý, I., Mišı́k, L.: Adaptive population-based search: Application to nonlinear regression. IRAFM Research Report No. 98, University
of Ostrava (2006), http://ac030.osu.cz/irafm/resrep.html
Download