
Gradient Based Optimization LN -4

Engg System Design
Optimization
Gradient Based
M S Prasad , AISST: Amity Univ
This lecture note is based on textbooks and open literature. It is suitable for graduate and postgraduate students of Aerospace & Avionics, and is to be read in conjunction with classroom discussions.
Gradient Based Optimization : LN – 4
Most local optimization algorithms are gradient-based. As the name indicates, gradient-based optimization techniques make use of gradient information to find the optimum solution. They are sometimes also referred to as numerical techniques.
Gradient-based algorithms are widely used for solving a variety of optimization problems in
engineering. These techniques are popular because they are efficient (in terms of the number
of function evaluations required to find the optimum), they can solve problems with large
numbers of design variables, and they typically require little problem-specific parameter tuning.
Gradient-based algorithms typically make use of a two-step process to reach the optimum.
The first step is to use gradient information for finding a desirable search direction S in which to
move. The second step is to move in this direction until no more progress can be made. The
second step is known as the one-dimensional or line search, and provides the optimum step
size α, a positive scalar. There are also gradient-based algorithms that do not rely on a one-dimensional search.
For most optimization problems, the gradient information is not readily available and is
obtained using finite difference gradient calculations. Finite difference gradients provide a
flexible means of estimating the gradient information. The different gradient-based algorithms
differ mostly in the logic used to determine the search direction.
For the one-dimensional search, there are many algorithms that will find the best step size, and
generally any of these techniques can be combined with a particular gradient-based algorithm
to perform the required one-dimensional search. Some of the popular one-dimensional search
algorithms include the Golden Section search, the Fibonacci search, and many variations of
polynomial approximations. ( Refer Line search LN – 2 ).
The iteration toward the optimum follows the scheme
X^(k+1) = X^k + ΔX^k,  ΔX^k = α_k S^k
where S^k is the search direction in the design space and α_k is the step size in that direction. For this reason gradient-based methods are also known as "search techniques" or "direct techniques".
Summary : The basic idea of numerical methods for nonlinear optimization problems is to start
with a reasonable estimate for the optimum design. Cost and constraint functions and their
derivatives are evaluated at that point. Based on them, the design is moved to a new point. The
process is continued until either optimality conditions or some other stopping criteria are met.
ESDO LN -4 # M S Prasad
Page 1
This iterative process represents an organized search through the design space for points that
represent local minima. Thus, the procedures are often called the search techniques or direct
methods of optimization.
Algorithm
Step 1: Estimate a starting value of the variables X^0 and set the iteration counter k = 0.
Step 2: Compute a search direction S^k in the design space.
Step 3: Calculate the step size α_k in the direction S^k.
Step 4: Calculate X^(k+1) = X^k + α_k S^k.
Step 5: Check that f(X^(k+1)) < f(X^k), i.e. f(X^k + α_k S^k) < f(X^k), before accepting the next iterate.
Step 6: Continue until convergence.
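The six steps above can be sketched as a short loop. The following is a minimal illustration; the test function and the simple step-halving rule used to enforce the Step 5 check are assumptions for the example, not part of the notes:

```python
import numpy as np

def descent_loop(f, grad, x0, step_size=0.1, tol=1e-6, max_iter=1000):
    """Generic iterative scheme X_{k+1} = X_k + alpha_k * S_k.

    Here S_k is the negative gradient, and alpha_k is halved until the
    cost decreases (the Step 5 check f(X_{k+1}) < f(X_k))."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        s = -grad(x)                      # Step 2: search direction
        if np.linalg.norm(s) < tol:      # Step 6: convergence check
            break
        alpha = step_size
        while f(x + alpha * s) >= f(x):  # Step 5: ensure a decrease
            alpha *= 0.5
            if alpha < 1e-12:
                return x
        x = x + alpha * s                # Step 4: update
    return x

# assumed test function: f(x, y) = (x - 1)^2 + 2*(y + 2)^2, minimum at (1, -2)
f = lambda x: (x[0] - 1)**2 + 2 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_star = descent_loop(f, grad, [0.0, 0.0])
```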
How to find the direction?
Expand the function in a Taylor series about X^k:
f(X^k + α_k S^k) = f(X^(k+1)) = f(X^k) + ∇^T f(X^k) ΔX^k + (1/2) (ΔX^k)^T H(X^k) ΔX^k
where ΔX^k = α_k S^k and H is the Hessian. (We may also retain only the first-order term of the series.)
For the step to be optimal along S^k, the derivative of this function with respect to α must be zero:
d f(X^k + α S^k)/dα = ∇^T f(X^k) S^k + α (S^k)^T H(X^k) S^k = 0
which gives
α = - [ ∇^T f(X^k) S^k ] / [ (S^k)^T H(X^k) S^k ]
For a reduction of the cost function we should have C^k · S^k < 0, where C^k = ∇f(X^k) is the gradient of the cost function. A direction satisfying this condition is known as a direction of descent.
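For a quadratic function the step-size formula above can be evaluated directly. A small numeric check (the matrix H and vector b are assumed example data):

```python
import numpy as np

# Assumed quadratic model: f(x) = 0.5 x^T H x + b^T x, so grad f = H x + b
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
grad = lambda x: H @ x + b

x = np.zeros(2)
S = -grad(x)                          # descent direction: C . S = -||C||^2 < 0
alpha = -(grad(x) @ S) / (S @ H @ S)  # optimal step from the formula above
x_new = x + alpha * S

# the directional derivative at x_new along S is zero (exact line minimum)
slope_after = grad(x_new) @ S
```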
Steepest Descent Search Method
We know that the gradient points in the direction of the maximum rate of increase of the function f(X) at the design point X*. Hence in the steepest descent algorithm we select the direction opposite to the gradient, i.e. the negative gradient.
Algorithm
Step 1: Select a starting point X^0.
Step 2: Calculate the gradient C^k (C^0 at the start).
Step 3: Check convergence: if ||C^k|| < ε, stop; x* = X^k is the minimum point.
Step 4: Set S^k = -C^k.
Step 5: Calculate α_k to minimize the function f(X^k + α_k S^k). #
Step 6: Calculate X^(k+1) and iterate until the convergence criterion or the minimum value is reached.
# This is a one-dimensional minimization in α, and the minimum can be found by any line search algorithm.
Problems with Steepest Descent Method
1. A large number of iterations may be required for the minimization of even positive definite
quadratic forms, i.e., the method can be quite slow to converge to the minimum point.
2. Information calculated at the previous iterations is not used. Each iteration is started
independent of others, which is inefficient.
3. Only first-order information about the function is used at each iteration to determine the
search direction. This is one reason that convergence of the method is slow. It can further
deteriorate if an inaccurate line search is used.
Note: treating f(X^(k+1)) as a function of α and differentiating,
df(X^(k+1))/dα = ∇f(X^(k+1))^T · dX^(k+1)/dα = C^(k+1) · S^k = 0
and since S^k = -C^k this gives C^(k+1) · C^k = 0, which shows that successive steepest descent directions are normal to one another.
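This orthogonality can be checked numerically. A minimal sketch on an assumed ill-conditioned quadratic, using the exact quadratic step size derived earlier; the slow zigzag convergence is visible in the iteration count:

```python
import numpy as np

# assumed ill-conditioned quadratic: f(x) = 0.5 x^T H x, minimum at the origin
H = np.array([[10.0, 0.0], [0.0, 1.0]])
grad = lambda x: H @ x

x = np.array([1.0, 10.0])
prev_c = None
for k in range(100):
    c = grad(x)
    if np.linalg.norm(c) < 1e-6:
        break
    if prev_c is not None:
        # successive gradients (hence steepest-descent directions) are orthogonal
        assert abs(c @ prev_c) < 1e-9 * (1.0 + np.linalg.norm(c) * np.linalg.norm(prev_c))
    s = -c                         # steepest descent direction
    alpha = (c @ c) / (c @ H @ c)  # exact line search for a quadratic
    x = x + alpha * s
    prev_c = c
```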
Conjugate Gradient Method
The steepest descent algorithm can take a large number of iterations to converge, even for quadratic functions. A modification was suggested by Fletcher and Reeves so that it converges faster. The concept is to form a conjugate direction at each point and then find the step size. The direction update is modified as
S^k = -∇f(X^k) + β_k S^(k-1)
where the conjugate-direction coefficient β_k is given by
π›½π‘˜ = {
||𝐢 π‘˜ ||
||𝐢 π‘˜−1 ||
2
}
The rest of the steps are the same as in the steepest descent algorithm, and convergence is faster.
In this algorithm the current steepest descent direction is modified by adding a scaled direction
used in the previous iteration. The scale factor is determined using lengths of the gradient
vector at the two iterations as shown in above equation of βk.
Thus, the conjugate direction is nothing but a deflected steepest descent direction. This is an
extremely simple modification that requires little additional calculation.
The conjugate gradient algorithm finds the minimum in n iterations for positive definite
quadratic functions having n design variables.
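The n-step convergence property can be demonstrated directly. A sketch of the Fletcher-Reeves update on an assumed positive definite quadratic in three variables, with the exact quadratic line search:

```python
import numpy as np

# assumed positive definite quadratic: f(x) = 0.5 x^T H x - b^T x, grad f = H x - b
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
grad = lambda x: H @ x - b

n = 3
x = np.zeros(n)
c = grad(x)
s = -c                                   # first step: plain steepest descent
for k in range(n):
    alpha = -(c @ s) / (s @ H @ s)       # exact line search for a quadratic
    x = x + alpha * s
    c_new = grad(x)
    beta = (np.linalg.norm(c_new) / np.linalg.norm(c))**2
    s = -c_new + beta * s                # deflected steepest-descent direction
    c = c_new
```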
Newton Method of optimization
The basic idea of Newton's method is to use a second-order Taylor expansion of the
function about the current design point. This gives a quadratic expression for the change in
design βˆ†π‘‹ . The necessary condition for minimization of this function then gives an explicit
calculation for design change.
The Taylor expansion of the function f(X) with a small change ΔX is:
f(X + ΔX) = f(X) + C^T ΔX + (1/2) ΔX^T H ΔX
where C = ∇f(X) and H = [∂²f(X)/∂x_i ∂x_j] is the Hessian of f(X).
Differentiating the above with respect to ΔX and equating to zero for minimization , we have
C+ H ΔX = 0 i.e ΔX = - H-1 C
Now we can update
X1 = X0 + ΔX.
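A minimal sketch of one Newton step on an assumed quadratic function; for a quadratic the single step lands exactly on the minimum, since the second-order Taylor model is the function itself:

```python
import numpy as np

# assumed quadratic: f(x) = (x1 - 1)^2 + x1*x2 + 2*(x2 + 2)^2
def grad(x):
    return np.array([2 * (x[0] - 1) + x[1], x[0] + 4 * (x[1] + 2)])

H = np.array([[2.0, 1.0],   # constant Hessian of the quadratic above
              [1.0, 4.0]])  # positive definite, so the step gives a minimum

x0 = np.zeros(2)
dx = np.linalg.solve(H, -grad(x0))  # solve H dX = -C rather than inverting H
x1 = x0 + dx
```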
Note: if H is positive definite, the computed ΔX minimizes the quadratic model. A quadratic function has a minimum only if it is convex, i.e. H is positive semidefinite; in that case any local minimum is also a global minimum.
The above Newton's method does not include a step size in the calculation of the design change ΔX; i.e., the step size is taken as one (a step of length one is called an ideal step size, or Newton's step). Therefore we cannot guarantee that the cost function will reduce at each iteration, i.e. that
f(x^(k+1)) < f(x^(k)).
Thus the basic algorithm may not converge.
Modified Newton’s Method
In the modified Newton's method we incorporate a step size α computation, resulting in a better convergence rate.
Algorithms ( Modified Newton )
Step 1: Start with X^0, set k = 0; select a convergence parameter ε (a small number).
Step 2: Calculate C^k; if ||C^k|| < ε, stop, else continue.
Step 3: Calculate the Hessian H^k at X^k (H^0 at the first iteration).
Step 4: Calculate S^k = -[H^k]^(-1) C^k (i.e., S^0 = -[H^0]^(-1) C^0 at the first iteration).
(Generally it is better to solve the linear system H^k S^k = -C^k instead of calculating the inverse.)
Step 5: Update X^(k+1) = X^k + α_k S^k, calculating α_k by minimizing the function f(X^k + α_k S^k).
Step 6: Set k = k + 1 and go to Step 2.
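The steps above can be sketched as follows. The strictly convex test function is an assumption for the example, and a simple step-halving search stands in for the exact minimization of Step 5:

```python
import numpy as np

# assumed strictly convex function: f(x) = e^x1 + e^x2 + x1^2 + x2^2 (separable)
def f(x):
    return np.exp(x[0]) + np.exp(x[1]) + x[0]**2 + x[1]**2

def grad(x):
    return np.array([np.exp(x[0]) + 2*x[0], np.exp(x[1]) + 2*x[1]])

def hess(x):
    return np.diag([np.exp(x[0]) + 2.0, np.exp(x[1]) + 2.0])

x = np.array([2.0, -3.0])
for k in range(50):
    c = grad(x)                          # Step 2
    if np.linalg.norm(c) < 1e-8:
        break
    s = np.linalg.solve(hess(x), -c)     # Steps 3-4: solve H^k S^k = -C^k
    alpha = 1.0                          # Step 5: crude step-halving line search
    while f(x + alpha * s) >= f(x) and alpha > 1e-12:
        alpha *= 0.5
    x = x + alpha * s
```

Both coordinates converge to the same root of e^t + 2t = 0, since the function is separable and symmetric in its two variables.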
It is important to note here that unless H is positive definite, the direction S^k determined above may not be one of descent for the cost function. The descent condition C^k · S^k < 0 becomes
-C^(k)T [H^k]^(-1) C^(k) < 0, i.e. C^(k)T [H^k]^(-1) C^(k) > 0.
If H is negative definite or negative semidefinite, the condition is always violated. With H indefinite or positive semidefinite, the condition may or may not be satisfied, so we must check for it. The condition is always satisfied if H is positive definite.
Disadvantages of Newton’s Method
1. It requires calculations of second-order derivatives at each iteration, which is usually quite
time consuming. In some applications it may not even be possible to calculate such derivatives.
Also, a linear system of equations Hk Sk = - CK needs to be solved , needing more computations
in each step.
2. The method is not convergent unless the Hessian remains positive definite and a step size is calculated along the search direction to update the design. However, the method has a quadratic rate of convergence when it converges. For a strictly convex quadratic function, the method converges in just one iteration from any starting point.
Quasi Newton Method
Sometimes it may be difficult to compute the Hessian matrix due to complex expressions. Can we approximate the second derivatives and proceed? In such cases we approximate the Hessian using two pieces of information: the change in design and the change in the gradient vector between two successive iterations. While updating, the properties of symmetry and positive definiteness must always be preserved. The derivation of the updating procedures is based on the so-called quasi-Newton condition.
This condition is derived by requiring the curvature of the cost function in the search direction
d(k) to be the same at two consecutive points x(k) and x(k+1).
The enforcement of this condition gives the updating formulas for the Hessian of the cost
Function or its inverse. For a strictly convex quadratic function, the updating procedure
converges to the exact Hessian in n iterations.
David – Fletcher – Powell ( Inverse Hessian Approximation ) : DFP algorithm
Algorithm
Step 1: Choose an initial value X^0. Select a symmetric positive definite (n×n) matrix A^0 as an estimate of H^(-1) (the identity matrix is a possible choice). Choose ε and set k = 0 (iteration counter).
Step 2: calculate || Ck || , if || Ck || < € stop .
else
Step 3 : Sk = - Ak Ck
Step 4 : calculate αk by minimizing the function f(Xk + αk Sk )
Step 5 : Update Xk+1 = Xk + αk Sk
Step 6 : Update Ak as below
A^(k+1) = A^k + B^k + C^k, where
B^k = d^k (d^k)^T / (d^k · Y^k) ;  C^k = - Z^k (Z^k)^T / (Y^k · Z^k)
Y^k = C^(k+1) - C^k ;  Z^k = A^k Y^k ;  d^k = α_k S^k
Step 7 : K = K+1
Go to step 2 .
1. The matrix A^(k) is positive definite for all k. This implies that the method will always converge to a local minimum point.
2. When this method is applied to a positive definite quadratic form, A^(k) converges to the inverse of the Hessian of the quadratic form.
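A sketch of Steps 1-7 on an assumed quadratic. The true Hessian appears only to provide the exact quadratic line search of Step 4; the update itself uses only gradients and design changes. Property 2 above can then be checked: A converges to the inverse Hessian in n iterations:

```python
import numpy as np

# assumed quadratic test problem: grad f = H_true x - b
H_true = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: H_true @ x - b

A = np.eye(2)                             # Step 1: A0 = I estimates H^-1
x = np.zeros(2)
c = grad(x)
for k in range(20):
    if np.linalg.norm(c) < 1e-10:         # Step 2
        break
    s = -A @ c                            # Step 3
    alpha = -(c @ s) / (s @ H_true @ s)   # Step 4 (exact for a quadratic)
    d = alpha * s                         # d_k = alpha_k S_k
    x = x + d                             # Step 5
    c_new = grad(x)
    y = c_new - c                         # Y_k
    z = A @ y                             # Z_k
    A = A + np.outer(d, d) / (d @ y) - np.outer(z, z) / (y @ z)  # Step 6
    c = c_new                             # Step 7: next iteration
```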
Direct Hessian Updating: BFGS Method
Instead of updating the inverse of H, we can update H itself, as suggested by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
Step 1. Estimate an initial design X^0. Choose a symmetric positive definite (n×n) matrix H^0 as an estimate for the Hessian of the cost function; in the absence of more information, H^0 = I can be chosen. Choose a convergence parameter ε. Set k = 0 and calculate the gradient C^0 = ∇f(X^0).
Step 2. calculate || Ck || , if || Ck || < € stop .
Step 3 : Sk = - Hk Ck
Step 4 : calculate αk by minimizing the function f(Xk + αk Sk )
Step 5 : Update Xk+1 = Xk + αk Sk
Step 6 : Update Hk as below
H^(k+1) = H^k + D^k + E^k, where
D^k = Y^k (Y^k)^T / (Y^k · d^k) ;  E^k = C^k (C^k)^T / (C^k · S^k)
Y^k = C^(k+1) - C^k ;  d^k = α_k S^k
(Note that C^k · S^k < 0 for a descent direction, so E^k contributes a negative correction.)
Step 7. Set k = k + 1 and go to Step 2.
Note again that the first iteration of the method is the same as that for the steepest descent
method when H(0) = I. It can be shown that the BFGS update formula keeps the Hessian
approximation positive definite if an accurate line search is used.
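A sketch mirroring the DFP example on the same assumed quadratic, now updating H directly and solving H^k S^k = -C^k at each step. For a quadratic with exact line searches, H converges to the true Hessian in n iterations:

```python
import numpy as np

# assumed quadratic test problem: grad f = H_true x - b
H_true = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: H_true @ x - b

Hk = np.eye(2)                            # Step 1: H0 = I
x = np.zeros(2)
c = grad(x)
for k in range(20):
    if np.linalg.norm(c) < 1e-10:         # Step 2
        break
    s = np.linalg.solve(Hk, -c)           # Step 3: solve Hk S = -C
    alpha = -(c @ s) / (s @ H_true @ s)   # Step 4 (exact for a quadratic)
    d = alpha * s                         # change in design d_k
    x = x + d                             # Step 5
    c_new = grad(x)
    y = c_new - c                         # Y_k
    # Step 6: D = y y^T / (y . d),  E = c c^T / (c . s)
    Hk = Hk + np.outer(y, y) / (y @ d) + np.outer(c, c) / (c @ s)
    c = c_new                             # Step 7
```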
Gradient Projection Algorithm
Gradient projection method is based on the concept of projecting the search direction into
subspace tangent to active constraint.
Problem definition
minimize f(x) subject to constraint
𝑔𝑗 (π‘₯) = 𝒂𝑻𝒋 π‘₯ − 𝑏𝑗 ≥ 0 π‘œπ‘Ÿ ∑𝑛𝑖 π‘Žπ‘—π‘– π‘₯𝑖 − 𝑏𝑗 ≥ 0
Solution
Assume there are p active constraints, let g_a be the vector of active constraints, and let N be the matrix whose columns are the gradients of the active constraints. Then we have
π’ˆπ’‚ = 𝑡𝑻 𝑿 − 𝒃 = 0 ---------------------- (A)
The basic assumption we make in this algorithm is that the search direction lies in the subspace tangent to the active constraints; that is, both X_i and X_(i+1) satisfy the constraints defined by equation (A). If X_(i+1) = X_i + α S, this amounts to N^T S = 0. Hence the direction-finding problem can be stated as a steepest descent subproblem: minimize S^T ∇f subject to N^T S = 0 and S^T S = 1.
To solve this, let us define a Lagrangian:
L(S, λ, μ) = S^T ∇f - λ^T N^T S - μ(S^T S - 1)
∂L/∂S = ∇f - N λ - 2μ S = 0 ------------ (1)
Premultiplying (1) by N^T and using N^T S = 0:
N^T ∇f - N^T N λ = 0, so λ = (N^T N)^(-1) N^T ∇f
Substituting this into equation (1) above:
S = (1/2μ) [ I - N (N^T N)^(-1) N^T ] ∇f = (1/2μ) P ∇f
P is known as the projection matrix; the scalar factor 1/2μ is insignificant. That is how we calculate the new search direction, S = -P ∇f.
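A small numeric check of the projection (the constraint gradients in N and the gradient vector are assumed example data): the projected direction lies in the tangent subspace and is a direction of descent.

```python
import numpy as np

# columns of N are the gradients of the active linear constraints (assumed data)
N = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])            # two active constraints in R^3
grad_f = np.array([3.0, -1.0, 2.0])   # assumed cost gradient

P = np.eye(3) - N @ np.linalg.inv(N.T @ N) @ N.T   # projection matrix
S = -P @ grad_f                                    # new search direction

# tangency: N^T S = 0, and descent: S . grad_f = -||P grad_f||^2 <= 0
```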
After a search direction has been determined, we have to determine the value of α. Unlike the unconstrained case, there is an upper limit on α set by the inactive constraints: as α increases, some of them may become active and then violated.
Since g_j(x) = a_j^T x - b_j, we require
g_j(x_i + αs) = g_j(x_i) + α a_j^T s ≥ 0
so for any constraint with a_j^T s < 0 the step is limited to α ≤ -g_j(x_i) / (a_j^T s).
The main difficulty caused by nonlinearity of the constraints is that the one-dimensional search typically moves away from the constraint boundary, because motion in the tangent subspace no longer follows the curved constraint boundary exactly. A linear approximation of g is then used to restore feasibility.
After the one-dimensional search is over, we require a restoration move to bring x back to the constraint boundaries, using the linear approximation
g_j ≈ g_j + ∇g_j^T (x̄_i - x_i)
We want to find a correction x̄_i - x_i with no component in the tangent subspace (i.e., P(x̄_i - x_i) = 0) that would reduce g_a to zero:
(x̄_i - x_i) = -N (N^T N)^(-1) g_a
is the desired correction, where g_a is the vector of active constraint values. In addition, we enforce an improvement by specifying a parameter β for the reduction in the cost function. We specify
𝑓(π‘₯𝑖 ) − 𝑓(π‘₯𝑖+1 ) ≅ 𝛽𝑓(π‘₯𝑖 )
And
𝛼 ∗ = − 𝛽 𝑓(π‘₯𝑖 )/𝑠 𝑇 ∇𝑓
We update
π‘₯𝑖+1 = π‘₯𝑖 + 𝛼 ∗ 𝑠 − 𝑁(𝑁 𝑇 𝑁)−1 π‘”π‘Ž
This method is well suited to nonlinear constrained optimization.
---------------------------------------------------------------------------------------------------------------------
SECTION II
Constrained Steepest descent Optimization
Concept of Descent function
In unconstrained optimization methods we used the cost function as the descent function to
monitor progress of algorithms toward the optimum point. For constrained problems,
the descent function is usually constructed by adding a penalty for constraint violations
to the current value of the cost function.
Pshenichny's descent function (also called the exact penalty function) is commonly used due to its simplicity. Pshenichny's descent function Φ at any point x is defined as
Φ(x) = f(x) + R·V(x)
where R > 0 is a positive number called the penalty parameter (initially specified by the user),
V(x) ≥ 0 is either the maximum constraint violation among all the constraints or zero, and
f (x) is the cost function value at x.
The descent function at the point x^(k) is
Φ_k = f_k + R·V_k
where Φ_k = Φ(x^(k)), f_k = f(x^(k)), V_k = V(x^(k)), and R is the most current value of the penalty parameter.
It must be ensured that R is greater than or equal to the sum of all the Lagrange multipliers of
the Quadratic sub problems at the point x(k).
r_k = Σ_{i=1}^p |v_i^k| + Σ_{i=1}^m u_i^k  and  R ≥ r_k
where v_i^k is the Lagrange multiplier of the i-th equality constraint and u_i^k ≥ 0 that of the i-th inequality constraint.
V_k ≥ 0 is the maximum constraint violation at the k-th iteration, i.e.
V_k = max{ 0; |h_1|, |h_2|, ... ; g_1, g_2, ... }
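Evaluating Φ = f + R·V is a one-line computation once V is defined. A minimal sketch on an assumed toy problem with one equality constraint (h = 0) and two inequality constraints (g ≤ 0 convention, so positive g is a violation):

```python
# assumed toy problem: min f subject to h = 0 and g1, g2 <= 0
def f(x):
    return x[0]**2 + x[1]**2

def V(x):
    h = x[0] + x[1] - 2.0        # equality constraint
    g1 = x[0]**2 - 4.0           # inequality constraints, violated when > 0
    g2 = -x[1]
    return max(0.0, abs(h), g1, g2)

R = 10.0                         # penalty parameter (assumed user choice)
x = [3.0, 0.5]                   # h = 1.5, g1 = 5.0, g2 = -0.5 -> V = 5.0
phi = f(x) + R * V(x)            # descent function value
```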
Quadratic sub Problems
Whenever we deal with constrained optimization problems, we seek to linearize the cost function and constraints to obtain a tractable subproblem. In general we have:
Minimize
f(X^k + ΔX^k) ≅ f(X^k) + ∇f^T(X^k) ΔX^k
subject to the linearized equality constraints
h_j(X^k + ΔX^k) ≅ h_j(X^k) + ∇h_j^T(X^k) ΔX^k = 0
and the linearized inequality constraints
g_j(X^k + ΔX^k) ≅ g_j(X^k) + ∇g_j^T(X^k) ΔX^k ≤ 0
Such minimization problems are written in vector form as:
Minimize f̃ = C^T d + (1/2) d^T d = Σ_{i=1}^n C_i d_i + (1/2) Σ_{i=1}^n d_i d_i ----- (1)
subject to the linear equality constraints
N^T d = e, or Σ_{i=1}^n n_ij d_i = e_j ----- (2)
and the linear inequality constraints
A^T d ≤ b, or Σ_{i=1}^n a_ij d_i ≤ b_j ----- (3)
where
C_i = ∂f(X^k)/∂X_i ; e_j = -h_j(X^k) ; b_j = -g_j(X^k) ; d_i = ΔX_i^k
n_ij = ∂h_j(X^k)/∂X_i ; a_ij = ∂g_j(X^k)/∂X_i, and the matrix A is formed from the components a_ij.
These set of equations are known as Quadratic sub problems .
The parameter V_k ≥ 0, the maximum constraint violation at the k-th iteration, is determined from the calculated values of the constraint functions at the design point x^(k):
V_k = max{ 0; |h_1|, |h_2|, ..., |h_p| ; g_1, g_2, ..., g_m } ----- (4)
Since an equality constraint is violated whenever it differs from zero, the absolute value is used with each h_i. Note that V_k is always nonnegative, i.e., V_k ≥ 0; if all constraints are satisfied at x^(k), then V_k = 0.
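The quantities in Eqs. (1)-(4) are straightforward to assemble at a design point. A sketch on an assumed example problem (min f = x1² + x2² with h = x1 + x2 - 2 = 0 and g = 1 - x1 ≤ 0):

```python
import numpy as np

# assumed example problem; the QP data are built at the point X^k = (0.5, 0.5)
x = np.array([0.5, 0.5])

C = 2.0 * x                      # C_i = df/dX_i (here grad f = 2x)
N = np.array([[1.0], [1.0]])     # n_ij = dh_j/dX_i (one equality constraint)
A = np.array([[-1.0], [0.0]])    # a_ij = dg_j/dX_i (one inequality constraint)
h = x[0] + x[1] - 2.0
g = 1.0 - x[0]
e = np.array([-h])               # e_j = -h_j(X^k)
b = np.array([-g])               # b_j = -g_j(X^k)

Vk = max(0.0, abs(h), g)         # maximum constraint violation, Eq. (4)
```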
Constrained Steepest Descent (CSD) Algorithm
The stopping criterion for the algorithm is ||d|| ≤ ε at a feasible point, where ε is a small positive number and d is the search direction obtained as the solution of the QP subproblem.
Step 1. Set k = 0. Select initial values X^0 for the design variables, an appropriate initial value R_0 for the penalty parameter, and two small numbers ε1 and ε2 defining the permissible constraint violation and the convergence parameter, respectively. R_0 = 1 is a reasonable starting choice.
Step 2. Compute at X^k the cost and constraint functions and their gradients. Calculate the maximum constraint violation V_k as defined in equation (4) above.
Step 3. Using the cost and constraint function values and their gradients, define the QP sub
problem given by Eqs. 1 to 3.
Solve the QP subproblem to obtain the search direction d^k and the Lagrange multiplier vectors v^k and u^k.
Step 4. Check whether ||d^k|| < ε2 and the maximum constraint violation V_k ≤ ε1. If both criteria are satisfied, stop; else continue.
Step 5. To enforce the necessary condition R ≥ r_k on the penalty parameter R, calculate the sum r_k of the Lagrange multipliers:
r_k = Σ_{i=1}^p |v_i^k| + Σ_{i=1}^m u_i^k
Set R = max{R_k, r_k}.
Step 6. Update X^(k+1) = X^k + α_k d^k. As in unconstrained problems, the step size is calculated by minimizing the descent function Φ along the search direction d^k.
Step 7. Save the current penalty parameter as Rk+1 = R. Update the iteration counter as k = k +
1, and go to Step 2.
The CSD algorithm is a first-order method that can treat both equality and inequality constraints, and it converges to a local minimum point from an arbitrary starting point. Its rate of convergence can be improved by including second-order information in the QP subproblem.
References
Standard textbooks on engineering optimization by J. S. Arora, Kalyanmoy Deb, and S. S. Rao.