Optimization methods
Aleksey Minin
Saint-Petersburg State University
Student of ACOPhys master program (10th semester)
Joint Advanced Students School
09.04.2015
What is optimization?
Content:
1. Applications of optimization
2. Global optimization
3. Local optimization
4. Discrete optimization
5. Constrained optimization
6. Real application: the Bounded Derivative Network
Applications of optimization
• Advanced engineering design
• Biotechnology
• Data analysis
• Environmental management
• Financial planning
• Process control
• Scientific modeling, etc.
Global or Local?
Optimization splits into two branches:
• Global optimization: an overview
• Local optimization: an overview and implementation
What is global optimization?
• The objective of global optimization is to find the globally best solution of (possibly nonlinear) models, in the (possible or known) presence of multiple local optima.
Global optimization
• Branch and bound
• Evolutionary algorithms
• Simulated annealing
• Tree annealing
• Tabu search
Branch and bound
• Consider the original problem with the complete feasible region (the root problem).
• Apply the lower-bounding and upper-bounding procedures to the root problem.
• If the bounds match, a solution has been found and the procedure terminates.
• Otherwise, the feasible region is divided into two or more regions, each a strict subregion of the original, and the procedure is applied recursively to each of them.
Branch and bound
Scientists are ready to carry out some experiments, but the quality of the result varies with the scientist and the type of experiment, according to the following table:

Scientist   Exp. 1   Exp. 2   Exp. 3   Exp. 4
A           0.90     0.80     0.90     0.85
B           0.70     0.60     0.80     0.70
C           0.85     0.70     0.85     0.80
D           0.75     0.70     0.75     0.70
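The sketch below is a minimal branch-and-bound solver for this assignment example, under the assumption that the objective is to maximize the product of the chosen qualities and that each scientist may run at most one experiment; the bound at a node is the product of the best still-available quality for every unassigned experiment. The function and variable names are mine, not from the slides.

```python
# Branch and bound for the scientist/experiment assignment example.
# Assumed objective: maximize the product of the per-experiment qualities,
# assigning a distinct scientist to each experiment.

QUALITY = {            # QUALITY[scientist][experiment index]
    "A": [0.90, 0.80, 0.90, 0.85],
    "B": [0.70, 0.60, 0.80, 0.70],
    "C": [0.85, 0.70, 0.85, 0.80],
    "D": [0.75, 0.70, 0.75, 0.70],
}
N_EXPERIMENTS = 4

def bound(assigned, used):
    """Upper bound: product of the qualities already fixed times the best
    available quality for every experiment that is still unassigned."""
    value = 1.0
    for exp, sci in enumerate(assigned):
        value *= QUALITY[sci][exp]
    for exp in range(len(assigned), N_EXPERIMENTS):
        value *= max(QUALITY[s][exp] for s in QUALITY if s not in used)
    return value

best_value, best_assignment = 0.0, None

def branch(assigned, used):
    """Depth-first branch and bound over partial assignments."""
    global best_value, best_assignment
    if bound(assigned, used) <= best_value:     # prune: bound cannot beat incumbent
        return
    if len(assigned) == N_EXPERIMENTS:          # complete feasible assignment
        best_value, best_assignment = bound(assigned, used), assigned
        return
    for sci in QUALITY:
        if sci not in used:
            branch(assigned + [sci], used | {sci})

branch([], set())
print(best_assignment, round(best_value, 2))    # expected: ['C', 'D', 'B', 'A'], value ~0.40
```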
Branch and bound: the search tree for the example

[Diagram sequence over several slides: the search tree grows one level at a time; each node is labelled with the most optimistic completion of the partial assignment and its bound, i.e. the product of the best still-available quality for every remaining experiment. The quality table is repeated on every slide.]
• Root: nothing assigned yet; bound AAAA = 0.55.
• Branching on experiment 1: assigning A gives bound ADCC = 0.42, B gives BAAA = 0.42, C gives CAAA = 0.52, D gives DAAA = 0.45. The C branch has the largest bound and is expanded first.
• Branching the C node on experiment 2: A gives CABD = 0.38, B gives CBAA = 0.39, D gives CDAA = 0.45.
• Branching the CD node on experiment 3: the two complete assignments CDAB = 0.37 and CDBA = 0.40. CDBA, with value 0.40, is the best complete assignment found.
Branch and bound: advantages and disadvantages
[Diagram: advantages vs. disadvantages of branch and bound; details not recoverable from the extracted text.]
Evolutionary algorithms
Step 1:
• Initialize the population.
• Evaluate the initial population.
Step 2 (repeat until some convergence criterion is satisfied):
• Perform competitive selection.
• Apply genetic operators to generate new solutions.
• Evaluate the solutions in the population.
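A minimal sketch of this loop, assuming a real-valued minimization problem (the sphere function as a stand-in objective), tournament selection, uniform crossover and Gaussian mutation; the population size, generation budget and mutation strength are illustrative choices, not values from the slides.

```python
import random

def fitness(x):
    """Objective to minimize (illustrative): the sphere function."""
    return sum(v * v for v in x)

DIM, POP_SIZE, GENERATIONS, MUT_STD = 5, 30, 200, 0.3

# Step 1: initialize and evaluate the population.
population = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP_SIZE)]

def tournament(pop, k=3):
    """Competitive selection: the best of k randomly drawn individuals."""
    return min(random.sample(pop, k), key=fitness)

# Step 2: repeat until the convergence criterion (here: a generation budget) is met.
for gen in range(GENERATIONS):
    offspring = []
    while len(offspring) < POP_SIZE:
        mum, dad = tournament(population), tournament(population)
        # Genetic operators: uniform crossover followed by Gaussian mutation.
        child = [random.choice(pair) for pair in zip(mum, dad)]
        child = [v + random.gauss(0, MUT_STD) for v in child]
        offspring.append(child)
    population = offspring          # the new generation replaces the old one

best = min(population, key=fitness)
print(round(fitness(best), 4))
```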
Evolutionary algorithms: advantages and disadvantages
[Diagram: advantages vs. disadvantages; details not recoverable from the extracted text.]
Simulated annealing
• Start: choose an initial solution and set the temperature T and energy E.
• Decrease T.
• Propose a move and compute the energy change dE.
• If dE < 0, accept the move.
• If dE > 0, accept the move with probability exp(-dE/T).
• Repeat until a good solution is found.
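A minimal sketch of the annealing loop, assuming a continuous stand-in objective, a Gaussian proposal move and a geometric cooling schedule; the temperatures and the cooling rate are illustrative choices, not values from the slides.

```python
import math
import random

def energy(x):
    """Objective E to minimize (illustrative stand-in)."""
    return sum(v * v for v in x)

x = [random.uniform(-5, 5) for _ in range(3)]   # initial solution
T, T_MIN, COOLING = 1.0, 1e-3, 0.995            # assumed temperature schedule

while T > T_MIN:                                # "repeat until a good solution is found"
    T *= COOLING                                # decrease T
    candidate = [v + random.gauss(0, 0.1) for v in x]
    dE = energy(candidate) - energy(x)          # compute dE
    if dE < 0:                                  # always accept improving moves
        x = candidate
    elif random.random() < math.exp(-dE / T):   # accept worsening moves with prob exp(-dE/T)
        x = candidate

print([round(v, 3) for v in x], round(energy(x), 5))
```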
Simulated annealing: results
[Plot: results of a simulated annealing run; details not recoverable from the extracted text.]
Simulated annealing: advantages and disadvantages
Advantages:
• Good for high-dimensional tasks
• Easy to program
• Good physical meaning
Disadvantages:
• How to define the cooling step dT?
• Depends heavily on the initial point
• What is the temperature T for a given problem?
Tree annealing
Developed by Bilbro and Snyder [1991].
1. Randomly choose an initial point x over the search interval S0.
2. Randomly travel down the tree to an arbitrary terminal node i, and generate a candidate point y over the subspace defined by Si.
3. If f(y) < f(x), replace x with y and go to step 5.
4. Otherwise compute P = exp(-(f(y) - f(x))/T). If P > R, where R is a random number uniformly distributed between 0 and 1, replace x with y.
5. If y replaced x, decrease T slightly and update the tree. Repeat from step 2 until T < Tmin.
Tree annealing: advantages and disadvantages
Developed by Bilbro and Snyder [1991].
[Diagram: advantages vs. disadvantages; details not recoverable from the extracted text.]
Swarm intelligence
Tabu Search
• Select a current node (starting point) at random.
• Repeat until some counter reaches its limit:
• Select the new node with the lowest distance in the neighborhood of the current node that is not on the tabu list.
• If evalf(current node) < evalf(best node), the current node becomes the best node.
• The new node becomes the current node.
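The sketch below follows this loop on a small assumed graph: each node carries an objective value evalf, edges carry "distances", and a fixed-length tabu list stores recently visited nodes. The graph, the tabu-list length and the iteration limit are made up for illustration.

```python
import random
from collections import deque

# Assumed example graph: node -> objective value evalf(node).
EVALF = {1: 9.0, 2: 4.0, 3: 7.0, 4: 2.0, 5: 6.0, 6: 1.0}
# Assumed neighborhood with edge "distances": node -> {neighbor: distance}.
NEIGHBORS = {
    1: {2: 3.0, 3: 1.0},
    2: {1: 3.0, 4: 2.0, 5: 4.0},
    3: {1: 1.0, 5: 2.0},
    4: {2: 2.0, 6: 1.5},
    5: {2: 4.0, 3: 2.0, 6: 3.0},
    6: {4: 1.5, 5: 3.0},
}

current = random.choice(list(EVALF))        # select the current node at random
best = current
tabu = deque(maxlen=3)                      # fixed-length tabu list (length assumed)

for step in range(20):                      # "until some counter reaches its limit"
    # Non-tabu neighbors of the current node, lowest distance first.
    candidates = {n: d for n, d in NEIGHBORS[current].items() if n not in tabu}
    if not candidates:                      # dead end: every neighbor is tabu
        break
    new = min(candidates, key=candidates.get)
    if EVALF[current] < EVALF[best]:        # current node becomes the best node
        best = current
    tabu.append(current)
    current = new                           # new node becomes the current node

print("best node:", best, "evalf:", EVALF[best])
```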
Tabu search: implementation
[Diagram sequence over several slides: a step-by-step run of tabu search on an example graph with numbered nodes, showing how the current node moves through the graph while recently visited nodes stay on the tabu list.]
Tabu Search: advantages and disadvantages
[Diagram: advantages vs. disadvantages; details not recoverable from the extracted text.]
What is Local Optimization?
• The term LOCAL refers both to the fact that only information about the function from the neighborhood of the current approximation is used in updating the approximation, and to the fact that we usually expect such methods to converge to whatever local extremum is closest to the starting approximation.
• The global structure of the objective function is unknown to a local method.
Local optimization
Unconstrained optimization:
• Gradient descent
• Conjugate gradients
• BFGS
• Gauss-Newton
• Levenberg-Marquardt
Constrained optimization:
• Simplex
• SQP
• Interior point
Gradient descent
Consider F(x), defined with derivative F'(x) in some neighborhood of a point a. F increases fastest if one goes from a in the direction of the gradient of F at a. Hence, to minimize F, we step in the direction of the negative gradient:
x_{k+1} = x_k - γ_k ∇F(x_k), with step size γ_k > 0 small enough.
Therefore we obtain F(x0) ≥ F(x1) ≥ … ≥ F(xn): a monotonically non-increasing sequence of function values which, under suitable conditions on the step size, converges to a local minimum.
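A minimal sketch of the iteration x_{k+1} = x_k - γ ∇F(x_k) on a simple convex quadratic, with a fixed step size chosen for illustration.

```python
import numpy as np

def F(x):
    """Illustrative objective: a convex quadratic with minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

def grad_F(x):
    """Analytic gradient of F."""
    return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])

x = np.array([5.0, 5.0])          # starting approximation x0
gamma = 0.1                       # fixed step size (assumed small enough)

for k in range(200):
    x = x - gamma * grad_F(x)     # x_{k+1} = x_k - gamma * grad F(x_k)

print(np.round(x, 4), round(F(x), 8))   # converges to (1, -2)
```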
Quasi-Newton Methods
• These methods build up curvature information at each iteration to formulate a quadratic model problem of the form
  min_x (1/2) x^T H x + c^T x + b,
  where the Hessian matrix H is a positive definite symmetric matrix, c is a constant vector and b is a constant.
• The optimal solution of this problem occurs where the partial derivatives with respect to x go to zero:
  ∇q(x*) = H x* + c = 0, i.e. x* = -H^{-1} c.
BFGS algorithm
• Obtain the search direction s_k by solving H_k s_k = -∇f(x_k).
• Perform a line search to find the optimal step length α_k in the direction found in the first step, then update x_{k+1} = x_k + α_k s_k.
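Rather than re-deriving the Hessian update, the sketch below simply calls SciPy's ready-made BFGS routine on the Rosenbrock test function; the test function and starting point are stand-ins, not taken from the slides.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0, 1.5])           # arbitrary starting point

# BFGS builds up the inverse-Hessian approximation from gradient differences
# and performs a line search along the direction s_k at every iteration.
result = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print(result.x)                           # should approach the minimizer (1, 1, 1)
print(result.nit, result.fun)             # iteration count and final objective value
```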
Gauss-Newton algorithm
Given m functions f1, f2, …, fm of n parameters p1, p2, …, pn (m > n), we want to minimize the sum
S(p) = Σ_i f_i(p)^2.
The Gauss-Newton iteration is
p^{k+1} = p^k - (J^T J)^{-1} J^T f(p^k),
where J is the Jacobian of f at p^k. The matrix inverse is never computed explicitly in practice; instead of the above formula for p^{k+1}, we solve the linear system
(J^T J) δ_k = -J^T f(p^k) and use p^{k+1} = p^k + δ_k.
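A minimal hand-rolled sketch of this iteration, fitting the model a·exp(b·t) to noise-free synthetic data; the model, data and starting point are made up for illustration, and the linear system is solved instead of forming the inverse, as recommended above.

```python
import numpy as np

# Synthetic, noise-free data following y = 2 * exp(0.5 * t).
t = np.linspace(0.0, 3.0, 15)
y = 2.0 * np.exp(0.5 * t)

def residuals(p):
    """f_i(p): model minus data for parameters p = (a, b)."""
    return p[0] * np.exp(p[1] * t) - y

def jacobian(p):
    """J_ij = d f_i / d p_j."""
    return np.column_stack((np.exp(p[1] * t),
                            p[0] * t * np.exp(p[1] * t)))

# Start reasonably close to the solution, as plain Gauss-Newton assumes.
p = np.array([1.5, 0.45])
for k in range(50):
    J, f = jacobian(p), residuals(p)
    # Solve (J^T J) delta = -J^T f instead of forming the inverse explicitly.
    delta = np.linalg.solve(J.T @ J, -J.T @ f)
    p = p + delta
    if np.linalg.norm(delta) < 1e-10:
        break

print(np.round(p, 4))                     # should be close to (2, 0.5)
```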
Levenberg-Marquardt
This is an iterative procedure. The initial guess is, for example, p^T = (1, 1, …, 1). At each step p is replaced by p + q, and the functions are approximated by their linearization:
f(p + q) ≈ f(p) + J q.
At the minimum of the sum of squares S we have ∇_q S = 0. Differentiating the square of the right-hand side and setting it to zero gives the normal equations
(J^T J) q = -J^T f.   (*)
The key to LMA is to replace (*) with the 'damped version':
(J^T J + λ I) q = -J^T f.
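A minimal sketch of the damped iteration (J^T J + λ I) q = -J^T f for the same kind of exponential fit; the rule for raising and lowering λ is a common heuristic, not taken from the slides.

```python
import numpy as np

t = np.linspace(0.0, 3.0, 15)
y = 2.0 * np.exp(0.5 * t)                      # noise-free synthetic data

def residuals(p):
    return p[0] * np.exp(p[1] * t) - y

def jacobian(p):
    return np.column_stack((np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)))

p = np.ones(2)                                  # initial guess p = (1, 1)
lam = 1e-2                                      # damping parameter lambda

for k in range(100):
    J, f = jacobian(p), residuals(p)
    # Damped normal equations: (J^T J + lambda * I) q = -J^T f
    q = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ f)
    if np.sum(residuals(p + q) ** 2) < np.sum(f ** 2):
        p, lam = p + q, lam * 0.5               # step accepted: decrease damping
    else:
        lam *= 2.0                              # step rejected: increase damping
    if np.linalg.norm(q) < 1e-12:
        break

print(np.round(p, 4))                           # should approach (2, 0.5)
```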
SQP – constrained minimization
Reformulation: the general problem is to minimize f(x) subject to equality constraints g_i(x) = 0 (i = 1, …, m_e) and inequality constraints g_i(x) ≤ 0 (i = m_e + 1, …, m).
The principal idea is the formulation of a QP sub-problem based on a quadratic approximation of the Lagrangian function
L(x, λ) = f(x) + Σ_i λ_i g_i(x).
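As a practical illustration, the sketch below solves a small constrained problem with SciPy's SLSQP solver, a sequential (least-squares) quadratic programming method from the same family; the objective and constraints are made up.

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    """Illustrative objective f(x)."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

# One equality and one inequality constraint; SciPy's convention for "ineq" is fun(x) >= 0.
constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 3.0},    # x0 + x1 = 3
    {"type": "ineq", "fun": lambda x: x[0] - 0.5},            # x0 >= 0.5
]

result = minimize(objective, x0=np.array([0.0, 0.0]),
                  method="SLSQP", constraints=constraints)

print(result.x, result.fun)    # expected roughly x = (0.75, 2.25)
```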
SQP – constrained minimization: updating the Hessian matrix
• The Hessian approximation should be kept positive definite; this is guaranteed if q_k^T s_k > 0 at each update.
• If q_k^T s_k < 0, then q_k is modified on an element-by-element basis.
• The aim is to distort the elements of q_k that contribute to a positive definite update as little as possible.
• The most negative element of the element-wise product of q_k and s_k is repeatedly halved.
• This is repeated until q_k^T s_k > 10^-5.
Neural Net analysis
What is a neuron?
A typical formal neuron performs an elementary operation: it weighs the values of its inputs with locally stored weights and applies a nonlinear transformation to their sum:
y = f(u),   u = w0 + Σ_i w_i x_i
A neuron thus applies a nonlinear operation to a linear combination of its inputs.
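A minimal sketch of this formal neuron in code, with a sigmoid as the nonlinearity f and made-up inputs and weights.

```python
import numpy as np

def sigmoid(u):
    """Nonlinear transfer function f."""
    return 1.0 / (1.0 + np.exp(-u))

def neuron(x, w, w0):
    """y = f(u), u = w0 + sum_i w_i * x_i: a nonlinear function
    of a linear combination of the inputs."""
    u = w0 + np.dot(w, x)
    return sigmoid(u)

x = np.array([0.5, -1.0, 2.0])     # inputs x1..xn (illustrative)
w = np.array([0.8, 0.2, -0.4])     # locally stored weights w1..wn
print(neuron(x, w, w0=0.1))
```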
Neural Net analysis
What is training? What kind of optimization should we choose?
• W is the set of synaptic weights.
• E(W) is the error function.
Training means choosing the weights W so as to minimize E(W).
Neural Network – any architecture
Error back propagation
[Diagram: a small network with numbered nodes, illustrating forward evaluation and error back propagation through an arbitrary architecture.]
How to optimize?
• Objective function: the empirical error (it should decay during training).
• Parameters to optimize: the weights.
• Constraints: equalities (or inequalities) on the weights, if they exist.
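To connect this with the local methods above, here is a minimal sketch in which the empirical error E(W) of a single sigmoid neuron is minimized by plain gradient descent over the weights; the data and the learning rate are made up, and a real network would compute the gradient by error back propagation and could use any of the optimizers discussed earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                              # made-up training inputs
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)     # made-up targets

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

W = np.zeros(3); w0 = 0.0                          # parameters to optimize
lr = 0.5                                           # learning rate (assumed)

for epoch in range(500):
    out = sigmoid(X @ W + w0)                      # network output
    err = out - y
    E = np.mean(err ** 2)                          # empirical error E(W), should decay
    # Gradient of E with respect to the weights (chain rule through the sigmoid).
    dE_du = 2.0 * err * out * (1.0 - out) / len(y)
    W -= lr * (X.T @ dE_du)
    w0 -= lr * np.sum(dE_du)

print(round(float(E), 4))                          # final empirical error
```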
Neural Net analysis and constrained and unconstrained minimization

RMS      UNCON, trset   UNCON, tset   CON, trset   CON, tset
SBDN     34.84          32.56         33.05        35.40
MLP4     20.60          21.75         23.24        27.24

NB! For unconstrained optimization I applied the Levenberg-Marquardt method; for the constrained case I applied the SQP method.
Thank you for your attention