APPLICATION OF ORDER STATISTICS TO TERMINATION OF STOCHASTIC ALGORITHMS

Vaida Bartkutė, Leonidas Sakalauskas

Outline
Introduction;
Application of order statistics to optimality testing and termination of the algorithm:
Stochastic Approximation algorithms;
Simulated Annealing algorithm;
Experimental results;
Conclusions.

Introduction
Termination of the algorithm is a topical problem in stochastic and heuristic optimization. We consider the application of order statistics to establish optimality in Markov-type optimization algorithms. We build a method for the estimation of confidence intervals of the minimum using order statistics, which is implemented for optimality testing and termination.

Statement of the problem
The optimization (minimization) problem is as follows:

    f(x) -> min, x in R^n,

where f: R^n -> R is a locally Lipschitz function bounded from below. Denote the generalized gradient of this function by ∂f(x). Let (x_0, η_0), (x_1, η_1), ..., (x_t, η_t), ... be the sequence constructed by a stochastic search algorithm, where η_t = f(x_t), t = 0, 1, ... .

The Markovian algorithms for optimization
A Markovian random search algorithm represents a Markov chain in which the probability distribution of the point x_{t+1} depends only on the location of the previous point x_t and the value of the function η_t = f(x_t) at it:

    P_{t+1}(x_{t+1} | x_1, η_1, ..., x_t, η_t) = P_{t+1}(x_{t+1} | x_t, η_t).

Examples: Stochastic Approximation; Simulated Annealing; Random Search (Rastrigin method), etc.

Order statistics and target values for optimality testing and termination
Beginning of the problem: Mockus (1968).
Theoretical background: Žilinskas, Zhigljavsky (1991).
Application to maximum location: Chen (1996).
Time-to-target-solution value: Aiex, Resende & Ribeiro (2002), Pardalos (2005).

Method for optimality testing by order statistics
We build a method for the estimation of the minimum M of the objective function using the values of the function provided by optimization:

    M = f(x*) = min f(x), H = {η_1, ..., η_N}, η_t = f(x_t).

Let only the k+1 smallest order statistics be chosen from the sample H: η_(0) ≤ η_(1) ≤ ... ≤ η_(k). Apply a linear estimator of the minimum:

    M_{N,k} = Σ_{i=0}^{k} a_i η_(i), where Σ_{i=0}^{k} a_i = 1.

We examine a simple set of coefficients (Hall (1982)):

    a = (1 + 1/v_k, 0, ..., 0, -1/v_k), where v_k = Π_{j=1}^{k} (1 + 1/(α·j)) - 1.

The one-sided confidence interval for the minimum value of the objective function is

    [η_(0) - r_{k,γ}·(η_(k) - η_(0)), η_(0)],

where

    r_{k,γ} = 1 / ((1 - (1 - γ)^{1/k})^{-1/α} - 1)

and γ is the confidence level. Here α = n/κ is the parameter of the extreme value distribution, n is the dimension, and κ is the parameter of homogeneity of the function f(x) (Žilinskas & Zhigljavsky (1991)). For instance, a sharp (Lipschitzian) minimum has κ = 1, so α = n, while a smooth quadratic minimum has κ = 2.
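To make the method concrete, below is a minimal Python sketch of the linear estimator and the one-sided confidence interval as reconstructed above. The function name minimum_confidence_interval and its interface are illustrative, not from the original; the demo at the end simply checks the interval on a sharp-minimum function with known minimum M = 0.

```python
import numpy as np

def minimum_confidence_interval(values, k, alpha, gamma=0.95):
    """Hall-type linear estimate of the minimum M and a one-sided
    confidence interval built from the k+1 smallest order statistics.

    values : objective values f(x_1), ..., f(x_N) observed during the search
    k      : number of extreme order statistics used (the k+1 smallest values)
    alpha  : extreme-value parameter, alpha = n / kappa
    gamma  : confidence level
    """
    eta = np.sort(np.asarray(values))[: k + 1]   # eta_(0) <= ... <= eta_(k)
    # Hall's weights a = (1 + 1/v, 0, ..., 0, -1/v)
    v = np.prod(1.0 + 1.0 / (alpha * np.arange(1, k + 1))) - 1.0
    m_hat = (1.0 + 1.0 / v) * eta[0] - eta[k] / v
    # one-sided interval [eta_(0) - r_{k,gamma}(eta_(k) - eta_(0)), eta_(0)]
    r = 1.0 / ((1.0 - (1.0 - gamma) ** (1.0 / k)) ** (-1.0 / alpha) - 1.0)
    return m_hat, eta[0] - r * (eta[k] - eta[0]), eta[0]

# Illustrative check on a sharp minimum in R^2 (kappa = 1, so alpha = n = 2):
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(10000, 2))
values = np.abs(x).sum(axis=1)               # f(x) = |x_1| + |x_2|, M = 0
print(minimum_confidence_interval(values, k=5, alpha=2.0))
```

With these weights the estimator is unbiased to first order, since for a Weibull-type tail E[η_(k) - M] exceeds E[η_(0) - M] by exactly the factor Π_{j=1}^{k}(1 + 1/(α·j)).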
Stochastic Approximation
Smoothing is the standard way to treat nondifferentiable optimization. We consider a function smoothed by the Lipschitz perturbation operator:

    f(x, σ) = E f(x + σξ), ξ ~ p(·),

where σ > 0 is the value of the perturbation parameter and ξ is a random vector distributed with density p(·). If the density p(·) is locally Lipschitz, then the functions smoothed by this operator are twice continuously differentiable (Rubinstein & Shapiro (1993), Bartkutė & Sakalauskas (2004)). The optimizing sequence is constructed as

    x_{t+1} = x_t - ρ_t·g_t, t = 1, 2, ...,

where g_t = g(x_t, σ_t, ξ_t) is a stochastic gradient with E g(x, σ, ξ) = ∇f(x, σ), ρ_t is a scalar step multiplier, and

    lim_{σ->0} ∇f(x, σ) ∈ ∂f(x).

This scheme is the same for the different Stochastic Approximation algorithms, which differ only in the approach to stochastic gradient estimation. The minimizing sequence converges a.s. to the solution of the optimization problem under conditions typical for SA algorithms (Ermolyev (1976), Mikhalevitch et al. (1987), Spall (1992), Bartkutė & Sakalauskas (2004)).

ALGORITHM / ESTIMATE OF STOCHASTIC GRADIENT
SPSAL, Lipschitz smoothing density (Bartkutė & Sakalauskas (2007)):
    g(x, σ, ξ) = ((f(x + σξ) - f(x))/σ)·ξ, ξ uniformly distributed in the unit ball.
SPSAU, uniformly distributed density in the hypercube (Mikhalevitch et al. (1976), (1987)):
    g(x, σ, ξ) = ((f(x + σξ) - f(x - σξ))/(2σ))·ξ, ξ uniformly distributed in the hypercube [-1, 1]^n.
FDSA, standard finite differences (Ermoliev (1988), Mikhalevitch et al. (1987)):
    g_i(x, σ) = (f(x + σζ_i) - f(x - σζ_i))/(2σ), where ζ_i = (0, 0, ..., 1, ..., 0, 0) has zero components except the i-th, which equals 1.
Here σ is the smoothing parameter.

Rate of Convergence
Consider the case where the function f(x) has a sharp minimum at the point x*, to which the algorithm converges, and the coefficients are

    ρ_k = a/k, σ_k = b/k^β, a > 0, b > 0, 0 < β ≤ 1.

Then E‖x_{k+1} - x*‖² admits an explicit bound, vanishing as k grows, in terms of certain constants A > 0, H > 0, K > 0 and the minimum point of the smoothed function (Sakalauskas & Bartkutė (2007)).

Experimental results
Unimodal test functions (SPSAL, SPSAU, FDSA):
Generated functions with a sharp minimum:
    f(x) = Σ_{k=1}^{n} b_k·|x_k|;
CB3:
    f(x) = max{x_1^4 + x_2^2, (2 - x_1)^2 + (2 - x_2)^2, 2e^{-x_1 + x_2}};
Rosen-Suzuki:
    f(x) = max{f_1(x), f_1(x) - 10f_2(x), f_1(x) - 10f_3(x), f_1(x) - 10f_4(x)}.

Multiextremal test functions (Simulated Annealing (SA)):
Branin:
    f(x) = (x_2 - (5/(4π²))x_1² + (5/π)x_1 - 6)² + 10(1 - 1/(8π))cos x_1 + 10;
Beale:
    f(x_1, x_2) = (1.5 - x_1 + x_1x_2)² + (2.25 - x_1 + x_1x_2²)² + (2.625 - x_1 + x_1x_2³)²;
Rastrigin:
    f(x) = 10n + Σ_{i=1}^{n} (x_i² - 10cos(2πx_i)).

Samples of T = 500 runs were generated for the test functions (α = 2, k = 5) and minimized by SPSA with Lipschitz perturbation. The coefficients of the optimizing sequence were chosen according to the convergence conditions (Bartkutė & Sakalauskas (2006)): the step multiplier ρ_t = min{0.1; 1/t} and the smoothing parameter σ_t decreasing in proportion to 1/t^{1/(2n+3)}, t = 0, 1, ..., N.

Testing the hypothesis about the Pareto distribution
If the extreme order statistics follow the Weibull law, then the statistic

    D_{N,k} = (η_(0,N) - A)/(η_(k,N) - η_(0,N))

is distributed according to the Pareto-type law (Žilinskas, Zhigljavsky (1991)):

    F_k(u) = 1 - (1 - (1 + 1/u)^{-α})^k, u ≥ 0,

where A is the minimum of the objective function. Thus the statistical hypothesis tested is

    H_0: P(D_{N,k} ≤ u) = F_k(u).

The hypothesis was tested by the ω² criterion (n = 10) for various stochastic algorithms (critical value 0.46).

One-sided confidence interval [η_(0) - r_{k,γ}·(η_(k) - η_(0)), η_(0)], γ = 0.95, N = 10000:

    n      Estimate of M = 0   Lower bound   Upper bound   Probability of hitting of M into the confidence interval
    2      0.000014            -0.000036    0.0000275     0.954
    4      0.000289            -0.000569    0.0005355     0.946
    6      0.001027            -0.001879    0.001819      0.942
    10     0.004488            -0.005359    0.0070606     0.946
    100    0.006712             0.00087     0.16357       0.944

[Figure: Confidence bounds of the minimum (lower and upper bounds); M = 0, N = 10000, T = 500, γ = 0.95, n = 2.]
[Figure: Confidence bounds of the hitting probability; M = 0, N = 10000, T = 500, γ = 0.95, n = 6.]

Termination criterion of the algorithms
Stop the algorithm when the confidence interval of the minimum becomes smaller than an admissible value ε:

    r_{k,γ}·(η_(k) - η_(0)) ≤ ε.

[Figure: Histogram of the number of iterations after the termination of the algorithm; ε = 0.002, γ = 0.975.]

Simulated Annealing Algorithm
I. Choose a temperature updating function T_t, a neighborhood size function d_t, a solution generation density p(·, T_t), and an initial solution x_0 (Yang (2000)).
II. Construct the optimizing sequence:

    y_t = x_t + d_t·ξ_t, ξ_t ~ p(·, T_t), t = 1, 2, ...;
    x_{t+1} = y_t, if ω_t ≤ min{1, exp((f(x_t) - f(y_t))/T_t)}, ω_t ~ U(0, 1),
    x_{t+1} = x_t, otherwise.

A sketch of this acceptance step combined with the termination rule above is given below.
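The following self-contained Python sketch combines the Metropolis acceptance step above with the order-statistics termination rule r_{k,γ}·(η_(k) - η_(0)) ≤ ε. The cooling schedule T_t = 1/ln(t + 2) and the neighborhood size d_t = 1/√t are illustrative placeholders rather than the functions of Yang (2000), and the name anneal_with_termination is likewise an assumption for this sketch.

```python
import math
import random

def anneal_with_termination(f, x0, eps, k=5, alpha=1.0, gamma=0.95,
                            max_iter=100000):
    """Simulated annealing that stops once the one-sided confidence
    interval of the minimum is narrower than the admissible value eps."""
    # r_{k,gamma} for the interval [eta_(0) - r(eta_(k) - eta_(0)), eta_(0)]
    r = 1.0 / ((1.0 - (1.0 - gamma) ** (1.0 / k)) ** (-1.0 / alpha) - 1.0)
    x, fx = list(x0), f(x0)
    smallest = [fx]                      # running k+1 smallest observed values
    for t in range(1, max_iter + 1):
        T = 1.0 / math.log(t + 2)        # placeholder temperature schedule
        d = 1.0 / math.sqrt(t)           # placeholder neighborhood size
        y = [xi + d * random.uniform(-1.0, 1.0) for xi in x]
        fy = f(y)
        # Metropolis acceptance step
        if fy <= fx or random.random() <= math.exp((fx - fy) / T):
            x, fx = y, fy
        smallest = sorted(smallest + [fy])[: k + 1]
        # termination: r_{k,gamma} * (eta_(k) - eta_(0)) <= eps
        if len(smallest) == k + 1 and r * (smallest[k] - smallest[0]) <= eps:
            break
    return x, smallest[0], t

# Beale function (global minimum 0 at (3, 0.5)); its minimum is smooth
# and quadratic, so kappa = 2 and alpha = n / kappa = 1 for n = 2:
def beale(v):
    return ((1.5 - v[0] + v[0] * v[1]) ** 2
            + (2.25 - v[0] + v[0] * v[1] ** 2) ** 2
            + (2.625 - v[0] + v[0] * v[1] ** 3) ** 2)

print(anneal_with_termination(beale, [0.0, 0.0], eps=0.005, alpha=1.0))
```

Only the k + 1 smallest observed values are retained during the run, which is all the termination test needs.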
Experimental results
Consider the results of optimality testing with the Beale test function:

    F(x, y) = (1.5 - x + xy)² + (2.25 - x + xy²)² + (2.625 - x + xy³)²,

with the search domain -4.5 ≤ x, y ≤ 4.5. It is known that this function has a few local minima and that the global minimum is 0 at the point (3, 0.5).

[Figure: Confidence bounds of the minimum (estimate, lower bound, upper bound) versus the number of iterations; M = 0, N = 20000, T = 500, γ = 0.95, n = 2.]
[Figure: Confidence bounds of the hitting probability (hitting probability, lower bound, upper bound, confidence probability) versus the number of iterations; M = 0, N = 20000, T = 500, γ = 0.95, n = 2.]
[Figure: Histogram of the number of iterations after the termination of the algorithm; ε = 0.005, γ = 0.95, n = 2.]

Conclusions
A linear estimator of the minimum has been proposed using the theory of order statistics and studied experimentally;
The developed procedures are simple and depend only on the parameter α of the extreme value distribution;
The parameter α is easily estimated using the homogeneity of the objective function or in a statistical way;
Theoretical considerations and computer examples have shown that the confidence interval of a function extremum can be estimated with admissible accuracy as the number of iterations increases;
A termination rule using the confidence interval of the minimum was proposed and implemented for Stochastic Approximation and Simulated Annealing.