APPLICATION OF ORDER STATISTICS TO TERMINATION OF STOCHASTIC ALGORITHMS

Vaida Bartkutė, Leonidas Sakalauskas
Outline
- Introduction;
- Application of order statistics to optimality testing and termination of the algorithm:
  - Stochastic Approximation algorithms;
  - Simulated Annealing algorithm;
- Experimental results;
- Conclusions.
Introduction
Termination of the algorithm is a topical problem in stochastic and heuristic optimization.
We consider the application of order statistics to testing optimality in Markov-type optimization algorithms.
We build a method for estimating confidence intervals of the minimum using order statistics, which is implemented for optimality testing and termination.
Statement of the problem
The optimization (minimization) problem is as follows:

$$f(x) \to \min_{x \in \mathbb{R}^n},$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is a locally Lipschitz function bounded from below. Denote the generalized gradient of this function by $\partial f(x)$.
Let $(x_0,\eta_0), (x_1,\eta_1), \dots, (x_t,\eta_t), \dots$ be the sequence constructed by a stochastic search algorithm, where $\eta_t = f(x_t)$, $t = 0, 1, \dots$.
The Markovian algorithms for optimization
A Markovian random search algorithm represents a Markov chain in which the probability distribution of the point $x_{t+1}$ depends only on the location of the previous point $x_t$ and the function value $\eta_t = f(x_t)$ at it:

$$P_{t+1}\left(x_{t+1} \mid x_1,\eta_1,\dots,x_t,\eta_t\right) = P_{t+1}\left(x_{t+1} \mid x_t,\eta_t\right).$$

Examples: Stochastic Approximation; Simulated Annealing; Random Search (Rastrigin's method), etc.
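As a concrete illustration, here is a minimal Python sketch of such a Markovian step, assuming a Gaussian proposal and greedy acceptance; the function and parameter names are illustrative, not from the slides:

```python
import numpy as np

def random_search(f, x0, steps=1000, sigma=0.1, seed=None):
    """Markovian random search: the distribution of x_{t+1} depends only on
    the current point x_t and its value eta_t = f(x_t)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    eta = f(x)
    values = [eta]                                   # eta_t = f(x_t), t = 0, 1, ...
    for _ in range(steps):
        y = x + sigma * rng.standard_normal(x.size)  # proposal built from x_t only
        fy = f(y)
        if fy < eta:                                 # greedy acceptance step
            x, eta = y, fy
        values.append(eta)
    return x, np.array(values)
```

The recorded values $\eta_t$ form the sample to which the order-statistics test below is applied.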
Order statistics and target values for optimality testing and termination
- Beginning of the problem: Mockus (1968)
- Theoretical background: Zilinskas, Zhigljavsky (1991)
- Application to maximum location: Chen (1996)
- Time-to-target-solution value: Aiex, Resende & Ribeiro (2002), Pardalos (2005)
Method for optimality testing by order statistics
We build a method for estimating the minimum $M$ of the objective function using the values of the function provided in optimization:

$$M = f(x^*) = \min_x f(x),$$
$$H = \{\eta_1, \dots, \eta_N\}, \quad \eta_t = f(x_t).$$

Let only $k+1$ order statistics be chosen from the sample $H$:

$$\eta_{(0)}, \eta_{(1)}, \dots, \eta_{(k)}, \quad \text{where } \eta_{(0)} \le \eta_{(1)} \le \dots \le \eta_{(N)}.$$

Apply linear estimators for the estimation of the minimum:

$$M_{N,k} = \sum_{i=0}^{k} a_i \eta_{(i)}, \quad \text{where } \sum_{i=0}^{k} a_i = 1.$$
We examine a simple set of coefficients (Hall (1982)):

$$a = \left(1 + \left(\prod_{j=1}^{k}\Bigl(1+\frac{1}{j\alpha}\Bigr)-1\right)^{-1},\; 0,\; \dots,\; 0,\; -\left(\prod_{j=1}^{k}\Bigl(1+\frac{1}{j\alpha}\Bigr)-1\right)^{-1}\right).$$
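A minimal Python sketch of the linear estimate under these weights, in which only $\eta_{(0)}$ and $\eta_{(k)}$ receive nonzero coefficients (the helper name and signature are illustrative):

```python
import numpy as np

def hall_minimum_estimate(values, k, alpha):
    """Linear estimate M_{N,k} = sum_i a_i * eta_(i) with the simple
    Hall (1982) weights (1 + c, 0, ..., 0, -c)."""
    eta = np.sort(np.asarray(values, dtype=float))   # eta_(0) <= ... <= eta_(N-1)
    # c = (prod_{j=1..k} (1 + 1/(j*alpha)) - 1)^(-1)
    c = 1.0 / (np.prod(1.0 + 1.0 / (alpha * np.arange(1, k + 1))) - 1.0)
    return (1.0 + c) * eta[0] - c * eta[k]           # weights sum to 1
```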
The one-sided confidence interval for the minimum value of the objective function is

$$\left[\,\eta_{(0)} - r_{k,\delta}\bigl(\eta_{(k)} - \eta_{(0)}\bigr),\; \eta_{(0)}\,\right],$$

where

$$r_{k,\delta} = \frac{1}{\bigl(1-(1-\delta)^{1/k}\bigr)^{-1/\alpha} - 1},$$

$\delta$ is the confidence level, $\alpha = \frac{n}{\gamma}$ is the parameter of the extreme values distribution, $n$ is the dimension, and $\gamma$ is the parameter of homogeneity of the function $f(x)$ (Zilinskas & Zhigljavsky (1991)).
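A minimal sketch of the interval computation, assuming the formula above (names illustrative):

```python
import numpy as np

def minimum_confidence_interval(values, k, alpha, delta=0.95):
    """One-sided confidence interval
    [eta_(0) - r_{k,delta}*(eta_(k) - eta_(0)), eta_(0)] for the minimum M."""
    eta = np.sort(np.asarray(values, dtype=float))
    r = 1.0 / ((1.0 - (1.0 - delta) ** (1.0 / k)) ** (-1.0 / alpha) - 1.0)
    return eta[0] - r * (eta[k] - eta[0]), eta[0]    # (lower, upper) bounds
```

For a function with a sharp minimum the homogeneity parameter is $\gamma = 1$, so $\alpha = n$ can be used directly.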
Stochastic Approximation
Smoothing is the standard way to handle nondifferentiable optimization. We consider a function smoothed by a Lipschitz perturbation operator:

$$f(x,\sigma) = \mathbb{E}\, f(x + \sigma\xi), \quad \xi \sim p(\cdot),$$

where $\sigma > 0$ is the value of the perturbation parameter and $\xi$ is a random vector distributed with density $p(\cdot)$. If the density $p(\cdot)$ is locally Lipschitz, then the functions smoothed by this operator are twice continuously differentiable (Rubinstein & Shapiro (1993), Bartkute & Sakalauskas (2004)).
xt 1  xt  t  g t , t  1, 2, ...


t
t
where g  g x , t ,t stochastic gradient,
g x ,   E g x , , ,
 t is a scalar multiplier.
lim g x ,  f x .
 0
This scheme is the same for different Stochastic Approximation
algorithms whose distinguish only by approach to stochastic gradient
estimation. The minimizing sequence converges a.s. to solution of
the optimization problem under conditions typical for SA algorithms
(Ermolyev (1976), Mikhalevitch et at (1987), Spall (1992), Bartkute
& Sakalauskas (2004)).
ALGORITHM / ESTIMATE OF STOCHASTIC GRADIENT

- SPSAL, Lipschitz smoothing density (Bartkute & Sakalauskas (2007)):
$$g(x,\sigma,\xi) = \frac{f(x+\sigma\xi) - f(x)}{\sigma}\,\xi,$$
$\xi$ uniformly distributed in the unit ball.

- SPSAU, uniformly distributed density in the hypercube (Michalevitch et al. (1976), (1987)):
$$g(x,\sigma,\xi) = \frac{f(x+\sigma\xi) - f(x-\sigma\xi)}{2\sigma}\,\xi,$$
$\xi$ uniformly distributed in the hypercube $[-1,1]^n$.

- FDSA, standard finite differences (Ermoliev (1988), Mikhalevitch et al. (1987)):
$$g_i(x,\sigma,\delta,\xi) = \frac{f(x+\sigma\xi+\delta e_i) - f(x+\sigma\xi-\delta e_i)}{2\delta},$$
$\xi$ uniformly distributed in the unit ball; $e_i = (0,0,\dots,1,\dots,0,0)$, with zero components except the $i$th, equal to 1; $\sigma$ is the smoothing parameter.
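A minimal sketch combining the iteration $x_{t+1} = x_t - \rho_t g_t$ with an SPSAL-style gradient estimate as reconstructed above; the step schedules $\rho_t = a/t$, $\sigma_t = b/t^\beta$ follow the next slide, and all names are illustrative:

```python
import numpy as np

def spsal_gradient(f, x, sigma, rng):
    """SPSAL-type estimate g = ((f(x + sigma*xi) - f(x)) / sigma) * xi,
    with xi drawn uniformly from the unit ball."""
    xi = rng.standard_normal(x.size)
    xi *= rng.uniform() ** (1.0 / x.size) / np.linalg.norm(xi)  # uniform in the ball
    return (f(x + sigma * xi) - f(x)) / sigma * xi

def spsa_minimize(f, x0, steps=5000, a=0.1, b=0.1, beta=0.5, seed=None):
    """Iterate x_{t+1} = x_t - rho_t * g_t with rho_t = a/t, sigma_t = b/t^beta."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    values = []
    for t in range(1, steps + 1):
        rho, sigma = a / t, b / t ** beta
        x = x - rho * spsal_gradient(f, x, sigma, rng)
        values.append(f(x))                          # sample H for the minimum test
    return x, np.array(values)
```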
Rate of Convergence
Assume that the function $f(x)$ has a sharp minimum at the point $x^*$, to which the algorithm converges when

$$\rho_k = \frac{a}{k}, \quad \sigma_k = \frac{b}{k^\beta}, \quad a > 0, \; b > 0, \; 0 < \beta \le 1, \quad k \to \infty.$$

Then

$$\mathbb{E}\,\bigl\|x_{k+1} - \bar{x}^*_{k+1}\bigr\|^2 \le \left(\frac{K a^2}{(2aH - 1)\,k} + \frac{A b^2}{k^{2\beta}}\right)\bigl(1 + o(1)\bigr),$$

where $A > 0$, $H > 0$, $K > 0$ are certain constants and $\bar{x}^*_{k+1}$ is the minimum point of the smoothed function (Sakalauskas, Bartkute (2007)).
Experimental results
Unimodal testing functions (SPSAL, SPSAU, FDSA)
- Generated functions with sharp minimum:
$$f(x) = \sum_{k=1}^{n} b_k |x_k|;$$
- CB3:
$$f(x) = \max\bigl\{x_1^4 + x_2^2,\; (2-x_1)^2 + (2-x_2)^2,\; 2e^{-x_1+x_2}\bigr\};$$
- Rosen-Suzuki:
$$f(x) = \max\bigl\{f_1(x),\; f_1(x) + 10 f_2(x),\; f_1(x) + 10 f_3(x),\; f_1(x) + 10 f_4(x)\bigr\},$$
where $f_1$ is the objective and $f_2, f_3, f_4$ are the constraint functions of the Rosen-Suzuki problem.
Multiextremal testing functions (Simulated Annealing (SA))
- Branin:
$$f(x) = \left(x_2 - \frac{5}{4\pi^2}x_1^2 + \frac{5}{\pi}x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos x_1 + 10;$$
- Beale:
$$f(x_1,x_2) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2;$$
- Rastrigin:
$$f(x) = 10n + \sum_{i=1}^{n}\bigl(x_i^2 - 10\cos(2\pi x_i)\bigr).$$
The samples of T = 500 test functions were generated with $\alpha = 2$, $K = 5$, and minimized by SPSA with Lipschitz perturbation.
The coefficients of the optimizing sequence were chosen according to the convergence conditions (Bartkute & Sakalauskas (2006)):

$$\rho_t = \min\left\{0.1;\; \frac{1}{t}\right\}, \quad \sigma_t = 0.1\,\frac{(n+2)(n+3)}{n(n+1)}\,\min\left\{1;\; \frac{10}{t^\gamma}\right\}, \quad 0 < \gamma \le 1, \; t = 0, 1, \dots, N.$$
Testing the hypothesis about the Pareto distribution
If the order statistics follow the Weibull distribution, then

$$D_{N,k} = \frac{\eta_{(0,N)} - A}{\eta_{(k,N)} - \eta_{(0,N)}}$$

is distributed according to the Pareto-type distribution (Žilinskas, Zhigljavsky (1991)):

$$F_k(u) = 1 - \left(1 - \left(\frac{u}{1+u}\right)^{\alpha}\right)^{k}, \quad u \ge 0.$$

Thus, the statistical hypothesis tested is:

$$H_0:\; P\!\left(\frac{\eta_{(0,N)} - A}{\eta_{(k,N)} - \eta_{(0,N)}} \le u\right) = F_k(u).$$
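A minimal sketch of this check over T independent runs, comparing the empirical CDF of $D_{N,k}$ with $F_k$; a simple Kolmogorov-Smirnov-type distance is used here in place of the slide's goodness-of-fit statistic, and all names are illustrative:

```python
import numpy as np

def pareto_cdf(u, k, alpha):
    """F_k(u) = 1 - (1 - (u/(1+u))^alpha)^k, u >= 0."""
    u = np.asarray(u, dtype=float)
    return 1.0 - (1.0 - (u / (1.0 + u)) ** alpha) ** k

def gof_distance(d_values, k, alpha):
    """Largest gap between the empirical CDF of D_{N,k} (over T runs)
    and the theoretical Pareto-type CDF F_k."""
    d = np.sort(np.asarray(d_values, dtype=float))
    ecdf = np.arange(1, d.size + 1) / d.size
    return np.max(np.abs(ecdf - pareto_cdf(d, k, alpha)))
```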
Testing the hypothesis about the Pareto distribution
The hypothesis was tested by the ω² criterion (n = 10) for various stochastic algorithms (critical value 0.46).
One-sided confidence interval $[\eta_{(0)} - r_{k,\delta}(\eta_{(k)} - \eta_{(0)}),\; \eta_{(0)}]$, confidence level $\delta = 0.95$, $N = 10000$:

| n   | Estimate of M = 0 | Lower bound | Upper bound | Probability of hitting M in the confidence interval |
|-----|-------------------|-------------|-------------|------|
| 2   | 0.000014          | -0.000036   | 0.0000275   | 0.954 |
| 4   | 0.000289          | -0.000569   | 0.0005355   | 0.946 |
| 6   | 0.001027          | -0.001879   | 0.001819    | 0.942 |
| 10  | 0.004488          | -0.005359   | 0.0070606   | 0.946 |
| 100 | 0.006712          | 0.00087     | 0.16357     | 0.944 |
Figure: Confidence bounds of the minimum (lower and upper bounds vs. number of iterations); M = 0, N = 10000, T = 500, δ = 0.95, n = 2.
Figure: Confidence bounds of the hitting probability; M = 0, N = 10000, T = 500, δ = 0.95, n = 6.
Termination criterion of the algorithms
Stop the algorithm when the confidence interval of the minimum becomes smaller than an admissible value ε:

$$r_{k,\delta}\bigl(\eta_{(k)} - \eta_{(0)}\bigr) \le \varepsilon.$$
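A minimal sketch of this rule applied to the stream of observed function values (names illustrative):

```python
import numpy as np

def should_terminate(values, k, alpha, delta=0.95, eps=1e-3):
    """True when the one-sided confidence interval of the minimum,
    of width r_{k,delta} * (eta_(k) - eta_(0)), is narrower than eps."""
    eta = np.sort(np.asarray(values, dtype=float))
    r = 1.0 / ((1.0 - (1.0 - delta) ** (1.0 / k)) ** (-1.0 / alpha) - 1.0)
    return r * (eta[k] - eta[0]) <= eps
```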
Figure: Histogram of the number of iterations after the termination of the algorithm; ε = 0.002, δ = 0.975.
Simulated Annealing Algorithm
I. Choose a temperature updating function $T_t$, a neighborhood size function $\delta_t = \frac{d}{t^c}$, a solution generation density function

$$p(\lambda, T_t) = \prod_{j=1}^{n} \frac{(m-1)\, T_t^{\,m-1}}{2\,\bigl(|\lambda_j| + T_t\bigr)^{m}},$$

and an initial solution $x_0$ (Yang (2000)).

II. Construct the optimizing sequence:

$$y_t = x_t + \delta_t \lambda_t, \quad \lambda_t \sim p(\cdot), \quad t = 1, 2, \dots,$$

$$x_{t+1} = \begin{cases} y_t, & \text{if } \omega_t \le \pi_t, \quad \pi_t = \min\left\{1,\; e^{\frac{f(x_t) - f(y_t)}{T_t}}\right\}, \quad \omega_t \sim U(0,1), \\[4pt] x_t, & \text{otherwise.} \end{cases}$$
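A minimal sketch of this scheme, assuming a Gaussian generation density in place of $p(\lambda, T_t)$ and simple illustrative schedules:

```python
import numpy as np

def simulated_annealing(f, x0, steps=20000, d=1.0, c=0.5, T0=1.0, seed=None):
    """Metropolis acceptance with neighborhood size delta_t = d/t^c and
    temperature T_t = T0/t; Gaussian proposals stand in for p(., T_t)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    values = [fx]
    for t in range(1, steps + 1):
        T, delta = T0 / t, d / t**c
        y = x + delta * rng.standard_normal(x.size)  # y_t = x_t + delta_t * lambda_t
        fy = f(y)
        # accept with probability min(1, exp((f(x_t) - f(y_t)) / T_t))
        if fy <= fx or rng.uniform() <= np.exp((fx - fy) / T):
            x, fx = y, fy
        values.append(fx)
    return x, np.array(values)
```

The recorded values can be passed to the order-statistics estimator and the termination test above.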
Experimental results
Consider the results of optimality testing with the Beale test function:

F(x, y) = (1.5 - x + xy)² + (2.25 - x + xy²)² + (2.625 - x + xy³)²,

with search domain -4.5 ≤ x, y ≤ 4.5. This function has a few local minima, and its global minimum is 0 at the point (3, 0.5).
Figure: Confidence bounds of the minimum (estimate, lower and upper bounds vs. number of iterations); M = 0, N = 20000, T = 500, δ = 0.95, n = 2, α = n, γ = 1.
Figure: Confidence bounds of the hitting probability (hitting probability with upper and lower bounds vs. number of iterations, against the confidence probability); M = 0, N = 20000, T = 500, δ = 0.95, n = 2.
Figure: Histogram of the number of iterations after the termination of the algorithm; ε = 0.005, δ = 0.95, α = n, n = 2.
Conclusions
- A linear estimator for the minimum has been proposed using the theory of order statistics and studied experimentally;
- The developed procedures are simple and depend only on the parameter α of the extreme values distribution;
- The parameter α is easily estimated using the homogeneity of the objective function or statistically;
- Theoretical considerations and computer examples have shown that the confidence interval of a function extremum can be estimated with admissible accuracy as the number of iterations increases;
- A termination rule using the minimum confidence interval was proposed and implemented for Stochastic Approximation and Simulated Annealing.