Optimization methods
Aleksey Minin, Saint-Petersburg State University
Student of the ACOPhys master program (10th semester)
Joint Advanced Students School, 09.04.2015

What is optimization?

Content:
1. Applications of optimization
2. Global optimization
3. Local optimization
4. Discrete optimization
5. Constrained optimization
6. Real application: Bounded Derivative Network

Applications of optimization
• Advanced engineering design
• Biotechnology
• Data analysis
• Environmental management
• Financial planning
• Process control
• Scientific modeling
• etc.

Global or local?
• Global optimization: an overview.
• Local optimization: an overview and implementation.

What is global optimization?
• The objective of global optimization is to find the globally best solution of (possibly nonlinear) models, in the (possible or known) presence of multiple local optima.

Global optimization methods:
• Branch and bound
• Evolutionary algorithms
• Simulated annealing
• Tree annealing
• Tabu search

Branch and bound
• Start from the root problem: the original problem with the complete feasible region.
• Apply the lower-bounding and upper-bounding procedures to it.
• If the bounds match, a solution is found and the procedure terminates.
• Otherwise, the feasible region is divided into two or more regions, each a strict subregion of the original, and the procedure is applied to them in turn.

Branch and bound: example
Scientists are ready to carry out some experiments, but the quality of their work varies with the type of experiment according to the following table:

Type of experiment   Scientist 1   Scientist 2   Scientist 3   Scientist 4
A                    0.90          0.80          0.90          0.85
B                    0.70          0.60          0.80          0.70
C                    0.85          0.70          0.85          0.80
D                    0.75          0.70          0.75          0.70

Each node of the search tree is labelled with the experiment types assigned to scientists 1-4 and with its bound, the product of the corresponding qualities; scientists not yet fixed are given their best quality among the types still unused, even if this repeats a type.
• Root bound AAAA = 0.55.
• Branch on scientist 1: A gives ADCC 0.42, B gives BAAA 0.42, C gives CAAA 0.52, D gives DAAA 0.45.
• Expand the most promising node C (0.52), branching on scientist 2: A gives CABD 0.38, B gives CBAA 0.39, D gives CDAA 0.45.
• Expand CDAA (0.45), branching on scientist 3: A gives CDAB 0.37, B gives CDBA 0.40.
• The best complete assignment found is CDBA with quality 0.40; the remaining open nodes are compared against this value. (A minimal code sketch of the procedure follows below.)

Branch and bound: advantages and disadvantages.
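To make the bounding and pruning steps concrete, here is a minimal Python sketch of the scientist-assignment example above. It is an illustration, not code from the slides: the bound of a partial assignment multiplies the fixed qualities by each remaining scientist's best quality among the unused types, and branches whose bound cannot beat the incumbent are pruned. The slides expand the most promising node first; this sketch uses a simpler depth-first order with the same bounding rule, and the function names are assumptions.

```python
# Minimal branch-and-bound sketch for the scientist/experiment assignment example.
# Rows: experiment types; columns: quality of scientists 1-4 (values from the slide).
QUALITY = {
    "A": [0.90, 0.80, 0.90, 0.85],
    "B": [0.70, 0.60, 0.80, 0.70],
    "C": [0.85, 0.70, 0.85, 0.80],
    "D": [0.75, 0.70, 0.75, 0.70],
}
TYPES = list(QUALITY)


def bound(partial):
    """Upper bound of a partial assignment: fixed scientists keep their types,
    every remaining scientist takes his best quality among the unused types."""
    value = 1.0
    for scientist, exp_type in enumerate(partial):
        value *= QUALITY[exp_type][scientist]
    free_types = [t for t in TYPES if t not in partial]
    for scientist in range(len(partial), len(TYPES)):
        value *= max(QUALITY[t][scientist] for t in free_types)
    return value


def branch_and_bound(partial=(), best_value=0.0, best_assignment=None):
    """Depth-first branch and bound maximizing the product of qualities."""
    if len(partial) == len(TYPES):              # complete assignment: candidate solution
        value = bound(partial)
        if value > best_value:
            best_value, best_assignment = value, partial
        return best_value, best_assignment
    for t in TYPES:
        if t in partial:
            continue
        child = partial + (t,)
        if bound(child) > best_value:           # prune nodes that cannot beat the incumbent
            best_value, best_assignment = branch_and_bound(child, best_value, best_assignment)
    return best_value, best_assignment


if __name__ == "__main__":
    value, assignment = branch_and_bound()
    print(assignment, round(value, 2))          # ('C', 'D', 'B', 'A') 0.4, as on the last tree slide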
Evolutionary algorithms
Step 1:
• Initialize the population.
• Evaluate the initial population.
Step 2, repeat:
• Perform competitive selection.
• Apply genetic operators to generate new solutions.
• Evaluate the solutions in the population.
Until some convergence criterion is satisfied.

Evolutionary algorithms: advantages and disadvantages.

Simulated annealing
• Start with an initial temperature T and energy E.
• Compute the energy change dE for a candidate move.
• If dE < 0, accept the move.
• If dE > 0, accept it with probability exp(-dE/T).
• Decrease T and repeat until a good solution is found.
(A minimal code sketch of this loop is given at the end of the global-optimization overview.)

Simulated annealing results (plots on the slide).

Simulated annealing: advantages and disadvantages
Advantages: good for high-dimensional tasks; easy to program; has a clear physical meaning.
Disadvantages: how to define dT (the cooling schedule)?; depends heavily on the initial point; what is T?

Tree annealing (developed by Bilbro and Snyder, 1991)
1. Randomly choose an initial point x over the search interval S0.
2. Randomly travel down the tree to an arbitrary terminal node i, and generate a candidate point y over the subspace defined by Si.
3. If f(y) < f(x), replace x with y and go to step 5.
4. Otherwise compute P = exp(-(f(y) - f(x))/T). If P > R, where R is a random number uniformly distributed between 0 and 1, replace x with y.
5. If y replaced x, decrease T slightly and update the tree. Repeat from step 2 until T < Tmin.

Tree annealing: advantages and disadvantages.

Swarm intelligence (illustration on the slide).

Tabu search
• Select a current node at random.
• Repeat until some counter reaches its limit:
  - select as the new node the one with the lowest distance (evaluation) in the neighborhood of the current node that is not on the tabu list;
  - the new node becomes the current node;
  - if evalf(current node) < evalf(best node), the current node becomes the best node.

Tabu search implementation: a step-by-step example on a small graph (sequence of figures omitted).

Tabu search: advantages and disadvantages.
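The simulated-annealing slide above gives only the acceptance rule (accept if dE < 0, otherwise accept with probability exp(-dE/T), then lower T). Here is the promised minimal Python sketch of that loop for a generic one-dimensional objective; the proposal step, the cooling factor and all names are illustrative assumptions, not part of the slides.

```python
import math
import random


def simulated_annealing(energy, x0, t0=1.0, t_min=1e-3, cooling=0.95, steps_per_t=50):
    """Minimal simulated-annealing loop: propose a random move, accept it if it lowers
    the energy, otherwise accept it with probability exp(-dE/T); then decrease T."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            candidate = x + random.uniform(-1.0, 1.0)    # illustrative proposal step
            de = energy(candidate) - e
            if de < 0 or random.random() < math.exp(-de / t):
                x, e = candidate, e + de
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling                                      # illustrative geometric cooling schedule
    return best_x, best_e


if __name__ == "__main__":
    # A multi-modal test function with many local minima.
    f = lambda x: 0.1 * x * x + math.sin(3.0 * x)
    print(simulated_annealing(f, x0=5.0))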
What is local optimization?
• The term LOCAL refers both to the fact that only information about the function from the neighborhood of the current approximation is used in updating the approximation, and to the fact that we usually expect such methods to converge to whatever local extremum is closest to the starting approximation.
• The global structure of the objective function is unknown to a local method.

Local optimization
• Unconstrained optimization: gradient descent, conjugate gradients, BFGS, Gauss-Newton, Levenberg-Marquardt.
• Constrained optimization: simplex, SQP, interior point.

Gradient descent
Consider F(x), defined and differentiable in some neighborhood of a point a. F increases fastest if one goes from a in the direction of the gradient of F at a, so to minimize F we step in the opposite direction:
x_{k+1} = x_k - gamma_k * grad F(x_k), gamma_k > 0.
Therefore we obtain a decreasing sequence F(x_0) >= F(x_1) >= ... >= F(x_n).

Quasi-Newton methods
• These methods build up curvature information at each iteration to formulate a quadratic model problem of the form
  q(x) = (1/2) x^T H x + c^T x + b,
  where the Hessian matrix H is a positive definite symmetric matrix, c is a constant vector, and b is a constant.
• The optimal solution of this problem occurs where the partial derivatives with respect to x go to zero:
  grad q(x*) = H x* + c = 0, i.e. x* = -H^(-1) c.

BFGS algorithm
• Obtain the search direction s_k by solving B_k s_k = -grad f(x_k), where B_k is the current Hessian approximation.
• Perform a line search to find the optimal step alpha_k in that direction, then update x_{k+1} = x_k + alpha_k s_k.
• Update the Hessian approximation with the BFGS formula, where y_k = grad f(x_{k+1}) - grad f(x_k):
  B_{k+1} = B_k + (y_k y_k^T)/(y_k^T s_k) - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k).

Gauss-Newton algorithm
Given m functions f_1, ..., f_m of n parameters p_1, ..., p_n (m > n), we want to minimize the sum of squares
S(p) = sum_{i=1..m} f_i(p)^2.
The iteration is p_{k+1} = p_k - (J^T J)^(-1) J^T f(p_k), where J is the Jacobian of f at p_k. The matrix inverse is never computed explicitly in practice; instead of the above formula for p_{k+1}, we solve the linear system
(J^T J) delta_k = -J^T f(p_k)
and set p_{k+1} = p_k + delta_k.

Levenberg-Marquardt
This is an iterative procedure. The initial guess is p^T = (1, 1, ..., 1). At each step p is replaced by p + q, where q follows from the linearization f(p + q) ~ f(p) + J q. At the minimum of the sum of squares S the gradient with respect to q is zero; differentiating the square of the right-hand side gives
(J^T J) q = -J^T f.        (*)
The key to LMA is to replace (*) with the 'damped version'
(J^T J + lambda I) q = -J^T f,
where lambda is a non-negative damping parameter.

SQP – constrained minimization
Reformulation: minimize f(x) subject to equality and inequality constraints g_i(x).
The principal idea is the formulation of a QP sub-problem based on a quadratic approximation of the Lagrangian function
L(x, lambda) = f(x) + sum_i lambda_i g_i(x).

SQP – updating the Hessian matrix
• The Hessian approximation should be kept positive definite, which requires q_k^T s_k > 0 at each update.
• If q_k^T s_k < 0, then q_k is modified on an element-by-element basis.
• The aim is to distort the elements of q_k that lead to a positive definite update as little as possible.
• The most negative element of the elementwise product of q_k and s_k is repeatedly halved.
• This is repeated until q_k^T s_k > 10^-5.

Neural net analysis
What is a neuron? A typical formal neuron performs an elementary operation: it weighs the values of its inputs with locally stored weights and applies a nonlinear transformation to their sum:
y = f(u),  u = w_0 + sum_i w_i x_i.
The neuron performs a nonlinear operation on a linear combination of its inputs.

What is training? What kind of optimization to choose?
W – the set of synaptic weights; E(W) – the error function, to be minimized over W. (A minimal training sketch follows below.)

Neural network – any architecture: training by error back propagation (network diagram omitted).
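To connect the neuron model y = f(w_0 + sum_i w_i x_i) and the error function E(W) with the gradient methods of this section, here is a minimal sketch, not the author's code, that trains a single sigmoid neuron by plain gradient descent on the squared empirical error. The learning rate, iteration count and toy data are assumptions for illustration.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def train_neuron(x, y, lr=1.0, epochs=5000):
    """Gradient descent on E(W) = 1/2 * sum((f(w0 + x.w) - y)^2) for one sigmoid neuron."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=x.shape[1])   # synaptic weights w_i
    w0 = 0.0                                     # bias weight
    for _ in range(epochs):
        u = w0 + x @ w                           # linear combination of inputs
        out = sigmoid(u)                         # nonlinear transformation
        err = out - y
        grad_u = err * out * (1.0 - out)         # dE/du for each sample
        w -= lr * x.T @ grad_u                   # dE/dw_i summed over samples
        w0 -= lr * grad_u.sum()                  # dE/dw0 summed over samples
    return w0, w


if __name__ == "__main__":
    # Toy data: learn the logical OR of two binary inputs.
    x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 1], dtype=float)
    w0, w = train_neuron(x, y)
    print(np.round(sigmoid(w0 + x @ w), 2))      # outputs move toward [0, 1, 1, 1]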
How to optimize?
• The objective function is the empirical error (it should decay during training).
• The parameters to optimize are the weights.
• The constraints are equalities (or inequalities) on the weights, if they exist.

Neural net analysis: constrained and unconstrained minimization
RMS error:

Model   UNCON, training set   UNCON, test set   CON, training set   CON, test set
SBDN    34.84                 32.56             33.05               35.40
MLP4    20.60                 21.75             23.24               27.24

NB! For unconstrained optimization I applied the Levenberg-Marquardt method; for the constrained case I applied the SQP method. (A minimal sketch of setting up both kinds of solver follows after the final slide.)

Thank you for your attention
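The comparison above uses a Levenberg-Marquardt solver for the unconstrained network and an SQP solver for the constrained one. As a closing, hedged illustration of how those two kinds of solver can be set up with SciPy: the residual function, the constraint and the starting point below are toy assumptions, not the SBDN or MLP4 models from the slides.

```python
import numpy as np
from scipy.optimize import least_squares, minimize

# Toy residuals: fit y = w1 * exp(w2 * t) to a few samples, standing in for a
# network's empirical error (the real models on the slides are SBDN and MLP4).
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.3, 1.7, 2.2, 2.9])
residuals = lambda w: w[0] * np.exp(w[1] * t) - y

# Unconstrained case: Levenberg-Marquardt on the residual vector.
unconstrained = least_squares(residuals, x0=np.array([1.0, 1.0]), method="lm")

# Constrained case: an SQP-type solver (SLSQP) on the scalar error, with an
# illustrative inequality constraint w1 + w2 <= 2 standing in for weight constraints.
constrained = minimize(
    lambda w: 0.5 * np.sum(residuals(w) ** 2),
    x0=np.array([1.0, 1.0]),
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda w: 2.0 - (w[0] + w[1])}],
)

print(unconstrained.x, constrained.x)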