MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Civil and Environmental Engineering
1.731 Water Resource Systems
Lectures 12 & 13, Nonlinear Programming, Oct. 19 & 24, 2006
Unconstrained Nonlinear Programming
General statement of unconstrained nonlinear programming (NLP) problem:
\max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n)
All candidate solutions are feasible for unconstrained problems, and the optimum always lies inside the feasible region.
Necessary Conditions:
1. Feasibility
Any x* is feasible
2. Stationarity
If x* is a local maximum then:
\frac{\partial F(x^*)}{\partial x_j} = 0
3. Inequality Lagrange multiplier
Not applicable since there are no constraints
4. Curvature
If x* is a local maximum then:

W_{kl} = Z_{ik} \frac{\partial^2 L(x^*, \lambda)}{\partial x_i \partial x_j} Z_{jl} = \frac{\partial^2 F(x^*)}{\partial x_k \partial x_l} \le 0 \quad \text{(negative semidefinite)}

For an unconstrained problem Z is the identity and L = F, so this condition simply requires the Hessian of F to be negative semidefinite at x*.
In NLP problems local maxima are not necessarily global maxima (convexity conditions usually do not hold).
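As a brief illustration (this example is not in the original notes), the conditions can be checked for a simple concave quadratic:

F(x_1, x_2) = -(x_1 - 1)^2 - 2(x_2 + 3)^2

\frac{\partial F}{\partial x_1} = -2(x_1 - 1) = 0, \qquad \frac{\partial F}{\partial x_2} = -4(x_2 + 3) = 0 \;\Rightarrow\; x^* = (1, -3)

\frac{\partial^2 F(x^*)}{\partial x_j \partial x_k} = \begin{pmatrix} -2 & 0 \\ 0 & -4 \end{pmatrix} \le 0 \quad \text{(negative definite)}

So x* = (1, -3) satisfies the necessary conditions and, since this F is concave, it is also the global maximum.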
Finding a Local Maximum
Use an iterative search that moves from one candidate solution x^k = [x_i^k] on iteration k to a new candidate solution x^{k+1} = [x_i^{k+1}] on iteration k + 1:
Search procedure:
Start by selecting an initial solution x^0 and set k = 0.
1. Test for convergence; if converged, set x* = x^k and exit.
2. Compute a search direction vector p^k = [p_i^k].
3. Compute a search distance α^k > 0 along p^k. For some search methods α^k is set to 1.
4. Compute the new solution x^{k+1} = x^k + Δx^k = x^k + α^k p^k and go to Step 1.

If the search is an ascent method then F(x^{k+1}) ≥ F(x^k) for α^k sufficiently small.
[Figure: contours of F in the (x_1, x_2) plane showing successive iterates x^0, x^1, x^2 and steps such as Δx^1 = α^1 p^1. First determine the search direction p^k, then find the α^k that maximizes the function along this direction.]
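The search procedure above can be summarized as a generic loop. The sketch below (in Python) is illustrative only; the convergence test, direction rule, and step-length rule are placeholders to be filled in by a specific method such as steepest ascent or Newton's method, and the function and parameter names are not from the notes.

import numpy as np

def iterative_search(F, x0, direction_rule, step_rule, tol=1e-6, max_iter=200):
    # Generic ascent-search skeleton (illustrative; names are not from the notes).
    # direction_rule(F, x) returns the search direction p^k.
    # step_rule(F, x, p) returns the search distance alpha^k > 0 (may simply return 1.0).
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        p = direction_rule(F, x)            # Step 2: search direction p^k
        if np.linalg.norm(p) < tol:         # Step 1: convergence test (here on the direction/gradient)
            break
        alpha = step_rule(F, x, p)          # Step 3: search distance alpha^k > 0
        x = x + alpha * p                   # Step 4: x^(k+1) = x^k + alpha^k p^k, return to Step 1
    return x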
There are a number of different search alternatives for unconstrained NLP.
Steepest Ascent/Gradient Search
Derive p^k by expanding F(x^{k+1}) in a first-order Taylor series in Δx_i^k = α^k p_i^k:

F(x^{k+1}) = F(x^k) + \frac{\partial F(x^k)}{\partial x_i} \alpha^k p_i^k + \cdots

where \frac{\partial F(x^k)}{\partial x_i} = \left. \frac{\partial F(x)}{\partial x_i} \right|_{x = x^k} = \text{gradient}
The increase in F(x) is largest when p_i^k is aligned with the objective function gradient ∂F(x^k)/∂x_i:

p_i^k = \frac{\partial F(x^k)}{\partial x_i}
If the step Δx^k is too large the first-order expansion may not be valid. Control the step with α^k, which is derived by maximizing F(x^{k+1}) with respect to α^k (with p^k given):

\max_{\alpha^k} F(x^k + \alpha^k p^k)
Use an iterative univariate search algorithm to solve this problem (see Gill et al. for
alternatives). This requires multiple evaluations of F(x) on each iteration.
Usually the gradient ∂F(x^k)/∂x_i is derived numerically, using finite difference or adjoint methods that require multiple evaluations of F(x) on each iteration.
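A minimal steepest-ascent sketch is shown below. It is illustrative only: the finite-difference step, the bounded interval used for the univariate search, and the test function are assumptions, not specifications from the notes. The univariate search applies scipy.optimize.minimize_scalar to -F so that a standard minimizer can maximize along the direction.

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_ascent(F, x0, tol=1e-6, max_iter=500, h=1e-6):
    # Illustrative steepest-ascent sketch for maximizing F (names are not from the notes).
    x = np.asarray(x0, dtype=float)
    n = x.size
    for k in range(max_iter):
        # Central finite-difference gradient dF/dx_i (2n evaluations of F per iteration)
        g = np.array([(F(x + h * np.eye(n)[i]) - F(x - h * np.eye(n)[i])) / (2 * h) for i in range(n)])
        if np.linalg.norm(g) < tol:                  # convergence test on the gradient
            break
        p = g                                        # steepest-ascent direction p^k = gradient
        # Univariate search for alpha^k that maximizes F(x^k + alpha p^k); the interval is an arbitrary choice
        res = minimize_scalar(lambda a: -F(x + a * p), bounds=(0.0, 1.0), method='bounded')
        x = x + res.x * p                            # x^(k+1) = x^k + alpha^k p^k
    return x

# Example: the concave quadratic used earlier; the maximizer is (1, -3)
F = lambda x: -(x[0] - 1.0)**2 - 2.0 * (x[1] + 3.0)**2
print(steepest_ascent(F, np.zeros(2)))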
Steepest ascent characteristics:
• Converges very slowly (or not at all).
• Univariate search for α^k is essential
• Many function evaluations required
Newton’s Method
Derive p^k by expanding F(x^{k+1}) in a second-order Taylor series about x^k (with α^k = 1, so Δx^k = p^k):

F(x^{k+1}) = F(x^k) + \frac{\partial F(x^k)}{\partial x_i} p_i^k + \frac{1}{2} \frac{\partial^2 F(x^k)}{\partial x_i \partial x_j} p_i^k p_j^k + \cdots

where:

\left. \frac{\partial^2 F(x)}{\partial x_i \partial x_j} \right|_{x = x^k} = \frac{\partial^2 F(x^k)}{\partial x_i \partial x_j} = H_{ij}^k = \text{objective Hessian}
Maximize the increase in F(x) with respect to p^k:

\max_{p^k} \; \frac{\partial F(x^k)}{\partial x_i} p_i^k + \frac{1}{2} H_{ij}^k p_i^k p_j^k
Necessary conditions for this quadratic optimization problem imply that:
H_{ij}^k p_j^k = -\gamma_i^k \qquad \text{where } \gamma^k = [\gamma_i^k] = \partial F(x^k)/\partial x_i = \text{objective gradient}
The Newton direction Δx^k = p^k is the solution to this set of linear equations. It is an ascent direction if H^k is negative definite. Modifications are needed at points where ∂F(x^k)/∂x_i = 0 and/or H_ij^k = 0. Modified Newton methods replace an indefinite Hessian by a negative definite approximation (e.g. by using an eigenvalue decomposition).
Usually the gradient and Hessian need to be computed numerically, using finite difference or
adjoint methods. Exact Hessian computation may be infeasible for large problems because of the
number of function evaluations required.
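The sketch below illustrates one possible modified-Newton step of the kind just described. The eigenvalue shift, the finite-difference formulas, and the function names are assumptions for illustration, not prescriptions from the notes.

import numpy as np

def newton_direction(grad, hess, shift=1e-4):
    # Illustrative modified-Newton direction for maximization.
    # Solves H^k p^k = -gamma^k after forcing the Hessian to be negative definite
    # via an eigenvalue decomposition, so that p^k is an ascent direction.
    w, V = np.linalg.eigh(hess)              # eigenvalues w and eigenvectors V of H^k
    w = np.minimum(w, -shift)                # push any nonnegative eigenvalues below zero
    H_mod = V @ np.diag(w) @ V.T             # negative definite approximation of H^k
    return np.linalg.solve(H_mod, -grad)     # Newton direction p^k

def fd_gradient(F, x, h=1e-5):
    # Central finite-difference gradient (2n evaluations of F)
    n = x.size
    return np.array([(F(x + h*np.eye(n)[i]) - F(x - h*np.eye(n)[i])) / (2*h) for i in range(n)])

def fd_hessian(F, x, h=1e-4):
    # Central finite-difference Hessian (order n^2 evaluations of F, hence the expense noted above)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = h * np.eye(n)[i]
        for j in range(n):
            ej = h * np.eye(n)[j]
            H[i, j] = (F(x+ei+ej) - F(x+ei-ej) - F(x-ei+ej) + F(x-ei-ej)) / (4*h*h)
    return H

# One Newton step (alpha^k = 1) on the concave quadratic used earlier lands at the maximum (1, -3)
F = lambda x: -(x[0] - 1.0)**2 - 2.0 * (x[1] + 3.0)**2
x = np.zeros(2)
x = x + newton_direction(fd_gradient(F, x), fd_hessian(F, x))
print(x)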
Newton’s method characteristics:
• Hessian must be negative definite
• Converges faster than steepest ascent
• Many function evaluations required
• Very expensive to compute H_ij^k if numerical differentiation is required.
Quasi-Newton Methods
Quasi-Newton methods avoid the computational disadvantages of Newton's method and retain good convergence properties by gradually constructing a negative definite approximation to the Hessian.
The basic idea is to use an iterative algorithm to update an approximate negative definite Hessian matrix B^k = [B_ij^k], using only information about gradients and previous search steps. The search direction p^k is found from an equation having the same form as the Newton equation:

B_{ij}^k p_j^k = -\gamma_i^k
The new solution is usually obtained by using a search distance α^k obtained from a univariate search:

x^{k+1} = x^k + \Delta x^k = x^k + \alpha^k p^k
The iteration is often initialized with the negative of the identity matrix B^0 = -I = [-δ_ij].
Consequently, the first step is a steepest ascent step.
The approximate Hessian is obtained from:
B^{k+1} = B^k + U^k \qquad \text{where } U^k = [U_{ij}^k] \text{ is an update matrix}

How do we choose U^k? Many options!
The curvature of F(x) in the direction Δx_i^k = x_i^{k+1} - x_i^k at x^k is the quadratic form Δx_i^k H_ij^k Δx_j^k. This can be expressed in terms of the objective gradient since:

\Delta x_i^k H_{ij}^k \Delta x_j^k \approx \Delta x_i^k (\gamma_i^{k+1} - \gamma_i^k) = \Delta x_i^k \Delta\gamma_i^k \qquad \text{since} \quad \gamma_i^{k+1} = \gamma_i^k + H_{ij}^k \Delta x_j^k + \cdots

We wish to obtain the same curvature when we replace H^k by B^{k+1}. This implies:

B_{ij}^{k+1} \Delta x_j^k \approx \Delta\gamma_i^k \qquad \text{(quasi-Newton condition for } k \to k+1\text{)}
In most quasi-Newton methods U^k is chosen subject to the following requirements:
1. B^{k+1} should satisfy the quasi-Newton condition
2. U^k should be of low rank (one or two), which implies that U_ij^k is a sum of terms of the form u_i^k v_j^k for some vectors u^k and v^k
3. U^k should be symmetric (since B^{k+1} should be symmetric)
There are various methods that satisfy these requirements asymptotically (for large k). One of the most widely used is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm:
U_{ij}^k = -\frac{B_{il}^k \Delta x_l^k \, B_{jm}^k \Delta x_m^k}{\Delta x_l^k B_{lm}^k \Delta x_m^k} + \frac{\Delta\gamma_i^k \Delta\gamma_j^k}{\Delta\gamma_l^k \Delta x_l^k}
When we substitute Δx^k = α^k p^k and B_ij^k p_j^k = -γ_i^k this simplifies to:

U_{ij}^k = \frac{1}{\gamma_l^k p_l^k} \gamma_i^k \gamma_j^k + \frac{1}{\alpha^k \Delta\gamma_l^k p_l^k} \Delta\gamma_i^k \Delta\gamma_j^k
The BFGS update maintains the negative definiteness of the approximate Hessian from iteration
to iteration.
As in steepest ascent, the univariate search and gradient evaluation together require many
function evaluations on each iteration.
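A minimal quasi-Newton sketch following the notes' conventions (B^0 = -I, search direction from B^k p^k = -γ^k, BFGS update of B^k) is shown below. It is illustrative only: the analytic gradient, the bounded univariate-search interval, and the test function are assumptions, and a practical code would guard the update when Δγ^k·Δx^k is not negative.

import numpy as np
from scipy.optimize import minimize_scalar

def quasi_newton_bfgs(F, grad, x0, tol=1e-6, max_iter=200):
    # Illustrative quasi-Newton (BFGS) sketch for maximizing F (names are not from the notes).
    x = np.asarray(x0, dtype=float)
    n = x.size
    B = -np.eye(n)                            # B^0 = -I, so the first step is a steepest ascent step
    g = grad(x)                               # gamma^0
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:           # convergence test on the gradient
            break
        p = np.linalg.solve(B, -g)            # search direction from B^k p^k = -gamma^k
        res = minimize_scalar(lambda a: -F(x + a * p), bounds=(0.0, 2.0), method='bounded')
        dx = res.x * p                        # Delta x^k = alpha^k p^k
        x_new = x + dx
        g_new = grad(x_new)
        dg = g_new - g                        # Delta gamma^k
        # BFGS update: U^k = -(B dx)(B dx)^T / (dx^T B dx) + (dg dg^T) / (dg^T dx)
        # (assumes dg^T dx < 0, which the univariate search should ensure here)
        Bdx = B @ dx
        B = B - np.outer(Bdx, Bdx) / (dx @ Bdx) + np.outer(dg, dg) / (dg @ dx)
        x, g = x_new, g_new
    return x

# Usage on the concave quadratic used earlier; the maximizer is (1, -3)
F = lambda x: -(x[0] - 1.0)**2 - 2.0 * (x[1] + 3.0)**2
grad = lambda x: np.array([-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 3.0)])
print(quasi_newton_bfgs(F, grad, np.zeros(2)))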
Quasi-Newton characteristics:
• No need to derive Hessian
• Approximate Hessian is negative definite
• Univariate search for α^k is desirable
• Many function evaluations required
• Good convergence properties
• Somewhat more expensive than steepest ascent but much more reliable.
Gauss-Newton Methods
Unconstrained nonlinear least-squares problems have a special structure that leads to efficient
iterative search algorithms. These problems seek to minimize the weighted sum of squared
errors between a set of measurements yt (t = 1,…, m) and a nonlinear model ht ( x j ) that depends
on a set of decision variables (or parameters) x j (j = 1,…, n).
\min_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n)

where F(x) = \frac{1}{2} \sum_{t=1}^{m} K_t [y_t - h_t(x)]^2 = sum of weighted squared errors, K_t = positive weight
Solve iteratively, starting with an initial estimate x^0. Obtain the search step as follows:
1. Set the gradient at the new iterate x^{k+1} equal to zero (from the stationarity condition):

\frac{\partial F(x^{k+1})}{\partial x_j} = -\sum_{t=1}^{m} K_t [y_t - h_t(x^{k+1})] \frac{\partial h_t(x^{k+1})}{\partial x_j} = 0
2. Approximate h_t(x) by a first-order Taylor series expansion around the previous iterate:

h_t(x^{k+1}) = h_t(x^k) + \frac{\partial h_t(x^k)}{\partial x_j} \Delta x_j^k + \cdots

This is valid only if the step Δx_j^k is sufficiently small.
3. Assume ∂h_t(x^{k+1})/∂x_j = ∂h_t(x^k)/∂x_j. Then the stationarity condition reduces to:

B_{ij}^k \Delta x_j^k = \gamma_i^k

where:

\gamma_i^k = \sum_{t=1}^{m} K_t [y_t - h_t(x^k)] \frac{\partial h_t(x^k)}{\partial x_i} \;\left(= -\frac{\partial F(x^k)}{\partial x_i}\right)

B_{ij}^k = \sum_{t=1}^{m} K_t \frac{\partial h_t(x^k)}{\partial x_i} \frac{\partial h_t(x^k)}{\partial x_j}
This is the Gauss-Newton search algorithm. It has the same form as the quasi-Newton algorithm, with a positive definite approximate Hessian B^k computed from the sensitivity derivatives ∂h_t(x^k)/∂x_j and with Δx^k = p^k (α^k = 1). Sensitivity derivatives are usually computed numerically, from multiple evaluations of the function h(x).
Algorithm performance is often improved by adding a positive constant λ^k to the diagonal of B^k so that:

[B_{ij}^k + \lambda^k \delta_{ij}] \Delta x_j^k = \gamma_i^k
This gives the Levenberg-Marquardt search algorithm.

Small λ^k → unmodified Gauss-Newton (steps may be too large)
Large λ^k → steepest descent (steps may be too small)
It is helpful to adjust λ^k dynamically to improve convergence. Univariate search along Δx^k is an alternative.
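A minimal sketch of the Levenberg-Marquardt iteration just described is given below. The model, synthetic data, finite-difference Jacobian, and the simple rule used to adjust λ^k are illustrative assumptions rather than specifications from the notes.

import numpy as np

def levenberg_marquardt(h, y, K, x0, lam0=1e-2, tol=1e-8, max_iter=100, fd_h=1e-6):
    # Illustrative Levenberg-Marquardt sketch for min (1/2) sum_t K_t [y_t - h_t(x)]^2.
    # The lambda adjustment rule below is a common heuristic, not taken from the notes.
    x = np.asarray(x0, dtype=float)
    lam = lam0
    n = x.size
    F = lambda x: 0.5 * np.sum(K * (y - h(x))**2)
    for k in range(max_iter):
        hx = h(x)
        # Sensitivity derivatives dh_t/dx_j by finite differences (m x n Jacobian)
        J = np.column_stack([(h(x + fd_h * np.eye(n)[j]) - hx) / fd_h for j in range(n)])
        gamma = J.T @ (K * (y - hx))                   # gamma_i^k = sum_t K_t [y_t - h_t] dh_t/dx_i
        B = J.T @ (K[:, None] * J)                     # B_ij^k = sum_t K_t dh_t/dx_i dh_t/dx_j
        dx = np.linalg.solve(B + lam * np.eye(n), gamma)   # [B^k + lambda^k I] dx^k = gamma^k
        if np.linalg.norm(dx) < tol:
            break
        if F(x + dx) < F(x):                           # step reduces the objective: accept, shrink lambda
            x, lam = x + dx, lam * 0.5
        else:                                          # step rejected: increase lambda (smaller step)
            lam *= 2.0
    return x

# Hypothetical usage: fit an exponential model h_t(x) = x1 * exp(-x2 * t) to synthetic data y_t
t = np.linspace(0.0, 4.0, 20)
h = lambda x: x[0] * np.exp(-x[1] * t)
y = h(np.array([2.0, 0.7]))                            # synthetic "measurements" (no noise)
K = np.ones_like(y)                                    # unit weights
print(levenberg_marquardt(h, y, K, x0=np.array([1.0, 0.1])))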
Gauss-Newton characteristics:
• Specialized algorithm for least-squares (minimization) problems
• Approximate positive definite Hessian is derived from sensitivity derivatives
• Many function evaluations required
• Good convergence properties
• Usually the most efficient option for least-squares problems (depending on the effort required to evaluate the sensitivity derivatives).