Non-Linear Programming (NLP): Single-Variable, Unconstrained
Methods for Single-Variable Unconstrained Optimization

ChE 4G03: Optimization in Chemical Engineering
Benoît Chachuat <benoit@mcmaster.ca>
McMaster University, Department of Chemical Engineering

Prerequisites: Basic Concepts for Optimization – Part I and Part II

Solving Single-Variable, Unconstrained NLPs

Single Variable Optimization

"Aren't single-variable problems easy?" Sometimes.
"Won't the methods for multivariable problems work in the single-variable case?" Yes.

But,
1. A few important problems are single-variable, e.g., nonlinear regression.
2. Single-variable methods give us insight into multivariable solution techniques.
3. Single-variable optimization is a subproblem in many nonlinear optimization methods and software, e.g., the linesearch in (un)constrained NLP solvers.

This lecture:
- Numerical solution methods: region elimination methods; interpolation methods
- Analytical solution methods: derivative-based methods

For additional details, see Rardin (1998), Chapter 13.2.

Region Elimination Methods (Minimize Case)

Iteratively consider the function value at 4 carefully spaced points:
- Assume a unimodal function.
- The leftmost point x^ℓ is always a lower bound on the optimum x^*.
- The rightmost point x^u is always an upper bound on the optimum x^*.
- The points x^1 and x^2 fall in between.

(Figure: Case 1, f(x^1) < f(x^2): the subinterval [x^2, x^u] is eliminated. Case 2, f(x^1) > f(x^2): the subinterval [x^ℓ, x^1] is eliminated.)

How do we choose the intermediate points x^1, x^2?

Golden Section Search (Minimize Case)

The points x^1, x^2 are taken as
    x^1 := x^u − γ (x^u − x^ℓ),    x^2 := x^ℓ + γ (x^u − x^ℓ),
with γ := 2/(1 + √5) = (√5 − 1)/2 ≈ 0.618, a fraction known as the golden ratio. Golden section search proceeds by keeping the two possible remaining intervals, [x^ℓ, x^2] and [x^1, x^u], equal in length.

What is the advantage of choosing this ratio γ? The bracket shrinks by the same factor γ at every iteration, and one of the two interior points can be reused, so only one new function evaluation is needed per iteration.

Golden Section Search Algorithm (Minimize Case)

Step 0: Initialization
- Choose lower and upper bounds x^ℓ and x^u that bracket x^*, as well as a stopping tolerance ε > 0.
- Set x^1 ← x^u − γ (x^u − x^ℓ) and x^2 ← x^ℓ + γ (x^u − x^ℓ).

Step 1: Stopping
- If x^u − x^ℓ < ε, stop and report x^* ← (x^1 + x^2)/2 as an approximate solution.

Step 2a: Case f(x^1) < f(x^2) (optimum left)
- Narrow the search by eliminating the rightmost part: x^u ← x^2, x^2 ← x^1, x^1 ← x^u − γ (x^u − x^ℓ), and evaluate f(x^1). Return to step 1.

Step 2b: Case f(x^1) > f(x^2) (optimum right)
- Narrow the search by eliminating the leftmost part: x^ℓ ← x^1, x^1 ← x^2, x^2 ← x^ℓ + γ (x^u − x^ℓ), and evaluate f(x^2). Return to step 1.
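A minimal Python sketch of this algorithm, for illustration only (the function and variable names, the default tolerance, and reporting the midpoint of the two interior points are choices made here, not prescribed by the lecture):

```python
import math

def golden_section_search(f, x_lo, x_up, eps=1e-6):
    """Minimize a unimodal function f on the bracket [x_lo, x_up]."""
    gamma = (math.sqrt(5.0) - 1.0) / 2.0          # ~0.618
    x1 = x_up - gamma * (x_up - x_lo)
    x2 = x_lo + gamma * (x_up - x_lo)
    f1, f2 = f(x1), f(x2)
    while x_up - x_lo >= eps:
        if f1 < f2:                               # optimum left: drop [x2, x_up]
            x_up, x2, f2 = x2, x1, f1
            x1 = x_up - gamma * (x_up - x_lo)
            f1 = f(x1)
        else:                                     # optimum right: drop [x_lo, x1]
            x_lo, x1, f1 = x1, x2, f2
            x2 = x_lo + gamma * (x_up - x_lo)
            f2 = f(x2)
    return 0.5 * (x1 + x2)

# Lecture example: min (x - 20)^4 / 500 - 2x over [0, 40]
f = lambda x: (x - 20.0) ** 4 / 500.0 - 2.0 * x
print(golden_section_search(f, 0.0, 40.0, eps=1e-4))  # about 26.30
```

Note how each pass through the loop reuses one of the previous interior points, so only one new function evaluation is performed per iteration.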
Pros and Cons of Golden Section Search

Pros:
- The objective function can be nonsmooth or even discontinuous.
- Calculations are straightforward (only function evaluations).

Cons:
- Assumes a unimodal function and requires prior knowledge of an enclosure x^* ∈ [x^ℓ, x^u].
- Golden section search is slow! Considerable computation may be needed to get an accurate solution.

Example: Consider the problem min f(x) := (x − 20)^4/500 − 2x.

it.   x^ℓ      x^1      x^2      x^u      f(x^ℓ)    f(x^1)    f(x^2)    f(x^u)    x^u − x^ℓ
0      0.00    15.28    24.72    40.00    320.00    -29.57    -48.45    240.00    40.00
1     15.28    24.72    30.56    40.00    -29.57    -48.45    -36.27    240.00    24.72
2     15.28    21.12    24.72    30.56    -29.57    -42.23    -48.45    -36.27    15.28
3     21.12    24.72    26.95    30.56    -42.23    -48.45    -49.23    -36.27     9.44

Interpolation Methods (Minimize Case)

Improve speed by taking full advantage of a 3-point pattern:
- Assume a unimodal, continuous function.
- Fit a smooth curve through the points (x^ℓ, f(x^ℓ)), (x^m, f(x^m)), (x^u, f(x^u)); then optimize the fitted curve.

How do we fit the curve?

Math Refresher: Lagrange Polynomials

For N + 1 data points, there is one and only one polynomial of order N,
    p_N(x) := a_0 + a_1 x + ··· + a_N x^N,
passing through all the points. The Lagrange polynomials provide one way of computing this interpolation:
    p_N(x) = Σ_{k=0..N} p(x_k) L_k(x),    with    L_k(x) := Π_{i=0..N, i≠k} (x − x_i)/(x_k − x_i).

(Figure: interpolating polynomials p_2(x) and p_3(x) through example data points.)

Quadratic Fit Search (Minimize Case)

Quadratic fit search proceeds by fitting a 2nd-order polynomial through (x^ℓ, f(x^ℓ)), (x^m, f(x^m)), (x^u, f(x^u)):
    p_2(x) = f(x^ℓ) (x − x^m)(x − x^u) / [(x^ℓ − x^m)(x^ℓ − x^u)]
           + f(x^m) (x − x^ℓ)(x − x^u) / [(x^m − x^ℓ)(x^m − x^u)]
           + f(x^u) (x − x^ℓ)(x − x^m) / [(x^u − x^ℓ)(x^u − x^m)]

The unique optimum of the fitted quadratic function occurs at
    x^q := (1/2) [f(x^ℓ)((x^m)² − (x^u)²) + f(x^m)((x^u)² − (x^ℓ)²) + f(x^u)((x^ℓ)² − (x^m)²)]
                / [f(x^ℓ)(x^m − x^u) + f(x^m)(x^u − x^ℓ) + f(x^u)(x^ℓ − x^m)]

(Figure: update of the 3-point pattern. Case 1A, x^q > x^m and f(x^q) < f(x^m): the leftmost part [x^ℓ, x^m] is eliminated. Case 1B, x^q > x^m and f(x^q) > f(x^m): the rightmost part [x^q, x^u] is eliminated.)

Quadratic Fit Search Algorithm (Minimize Case)

Step 0: Initialization
- Choose a starting 3-point pattern (x^ℓ, x^m, x^u), as well as a stopping tolerance ε > 0.

Step 1: Stopping
- If x^u − x^ℓ < ε, stop and report x^* ← x^m as an approximate solution.

Step 2: Quadratic Fit
- Compute the quadratic fit optimum x^q from the formula above.

Step 3a: Case x^q < x^m
- If f(x^q) > f(x^m), update x^ℓ ← x^q; otherwise, update x^u ← x^m, x^m ← x^q. Return to step 1.

Step 3b: Case x^q > x^m
- If f(x^q) > f(x^m), update x^u ← x^q; otherwise, update x^ℓ ← x^m, x^m ← x^q. Return to step 1.
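A minimal Python sketch of quadratic fit search, for illustration (names and defaults are choices made here; the extra stop when the fitted point no longer moves, and the iteration cap, are safeguards added on top of the lecture algorithm, which tests only the bracket width):

```python
def quadratic_fit_search(f, x_lo, x_mid, x_up, eps=1e-6, max_iter=50):
    """Minimize a unimodal f given a 3-point pattern x_lo < x_mid < x_up."""
    f_lo, f_mid, f_up = f(x_lo), f(x_mid), f(x_up)
    for _ in range(max_iter):
        if x_up - x_lo < eps:                      # Step 1 of the lecture algorithm
            break
        # Step 2: stationary point of the quadratic through the three points
        num = (f_lo * (x_mid**2 - x_up**2) + f_mid * (x_up**2 - x_lo**2)
               + f_up * (x_lo**2 - x_mid**2))
        den = (f_lo * (x_mid - x_up) + f_mid * (x_up - x_lo)
               + f_up * (x_lo - x_mid))
        x_q = 0.5 * num / den
        if abs(x_q - x_mid) < eps:                 # safeguard, not in the lecture
            break
        f_q = f(x_q)
        if x_q < x_mid:                            # Step 3a
            if f_q > f_mid:
                x_lo, f_lo = x_q, f_q
            else:
                x_up, f_up = x_mid, f_mid
                x_mid, f_mid = x_q, f_q
        else:                                      # Step 3b
            if f_q > f_mid:
                x_up, f_up = x_q, f_q
            else:
                x_lo, f_lo = x_mid, f_mid
                x_mid, f_mid = x_q, f_q
    return x_mid

f = lambda x: (x - 20.0) ** 4 / 500.0 - 2.0 * x
print(quadratic_fit_search(f, 0.0, 32.0, 40.0, eps=1e-4))  # about 26.30
```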
Pros and Cons of Quadratic Fit Search

Pros:
- Takes full advantage of a 3-point pattern.
- Calculations are also straightforward (only function evaluations).
- Typically faster than golden section search (but not by much...).

Cons:
- Assumes a unimodal function and requires prior knowledge of an enclosure x^* ∈ [x^ℓ, x^u].
- The objective function has to be smooth.

Example: Consider the problem min f(x) := (x − 20)^4/500 − 2x.

it.   x^ℓ      x^m      x^u      f(x^ℓ)    f(x^m)    f(x^u)    x^q      f(x^q)    x^u − x^ℓ
0      0.00    32.00    40.00    320.00    -22.53    240.00    20.92    -41.84    40.00
1      0.00    20.92    32.00    320.00    -41.84    -22.53    25.00    -48.75    32.00
2     20.92    25.00    32.00    -41.84    -48.75    -22.53    30.00    -40.03    11.08
3     20.92    25.00    30.00    -41.84    -48.75    -40.03                        9.08

Derivative-Based Methods

Improve convergence speed by using first- and second-order derivatives:
- Assume a smooth function f.
- Consider the quadratic approximation of f passing through the current iterate x^k.
- Optimize this approximation to determine the next iterate, x^{k+1}.

(Figure: quadratic approximation of f at x^k and its minimizer x^{k+1}.)

Newton's Method

How do we get a quadratic approximation at x^k? Use a Taylor series approximation:
    f(x) ≈ f(x^k) + f'(x^k) (x − x^k) + (1/2) f''(x^k) (x − x^k)²

Calculate the optimum, x^{k+1}, of the quadratic approximation by setting its derivative to zero:
    0 = f'(x^k) + f''(x^k) (x^{k+1} − x^k)

Basic Newton's Search: the basic Newton's method proceeds iteratively as
    x^{k+1} := x^k − f'(x^k)/f''(x^k)

The term d^{k+1} = −f'(x^k)/f''(x^k) is called the Newton step.

Basic Newton Algorithm

Step 0: Initialization
- Choose an initial guess x^0, as well as a stopping tolerance ε > 0. Set k ← 0.

Step 1: Derivatives
- Compute the first- and second-order derivatives f'(x^k), f''(x^k).

Step 2: Stopping
- If |f'(x^k)| < ε, stop and report x^* ← x^k as an approximate solution.

Step 3: Newton Step
- Compute the Newton step d^{k+1} ← −f'(x^k)/f''(x^k).
- Update the iterate x^{k+1} ← x^k + d^{k+1}.
- Increment k ← k + 1 and return to step 1.
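A minimal Python sketch of the basic Newton algorithm, for illustration (the function names and default tolerance are choices made here; the iteration cap is an added safeguard):

```python
def newton_1d(df, d2f, x0, eps=1e-6, max_iter=50):
    """Basic Newton search for a stationary point, given f' (df) and f'' (d2f)."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < eps:              # Step 2: stop when |f'(x)| is small
            break
        x = x - g / d2f(x)            # Step 3: Newton step d = -f'(x)/f''(x)
    return x

# Lecture example: f(x) = (x - 20)^4/500 - 2x, started from x0 = 30
df  = lambda x: 4.0 * (x - 20.0) ** 3 / 500.0 - 2.0
d2f = lambda x: 12.0 * (x - 20.0) ** 2 / 500.0
print(newton_1d(df, d2f, 30.0))       # about 26.30
```

With x^0 = 30, the first steps are d^1 = −2.50 and d^2 = −1.02, matching the worked example given with the pros and cons below.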
Variant: Quasi-Newton Algorithm

Idea: Approximate the second-order derivative using finite differences,
    f''(x^k) ≈ [f'(x^k) − f'(x^{k−1})] / (x^k − x^{k−1})

Step 0: Initialization
- Choose initial points x^{−1} and x^0, as well as a stopping tolerance ε > 0. Set k ← 0.

Step 1: Derivatives
- Compute the first-order derivative f'(x^k) and the approximate inverse of the second-order derivative
      B(x^k) := (x^k − x^{k−1}) / [f'(x^k) − f'(x^{k−1})]

Step 2: Stopping
- If |f'(x^k)| < ε, stop and report x^* ← x^k as an approximate solution.

Step 3: Newton Step
- Compute the Newton step d^{k+1} ← −f'(x^k) B(x^k).
- Update the iterate x^{k+1} ← x^k + d^{k+1}.
- Increment k ← k + 1 and return to step 1.

Pros and Cons of (Quasi-)Newton Search

Pros:
- Very fast convergence close to the optimal solution (quadratic convergence rate).
- No need to bound the optimum within a range.

Cons:
- Requires a "good" initial guess, otherwise it typically diverges.
- No distinction between (local) minima and maxima!
- The objective function has to be smooth and first-order derivatives must be available (possibly second-order derivatives too).

Example: Consider the problem min f(x) := (x − 20)^4/500 − 2x, started from x^0 = 30.

k    x^k      f(x^k)    f'(x^k)   f''(x^k)   |f'(x^k)|   d^{k+1}
0    30.00    -40.00    6.000     2.40       6.000       -2.50
1    27.50    -48.67    1.37      1.35       1.375       -1.02
2    26.48    -49.43    0.18      1.01       0.178
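A minimal Python sketch of the quasi-Newton variant, for illustration (the names, defaults, and the two starting points used in the demo are assumptions made here; only the first derivative is required):

```python
def quasi_newton_1d(df, x_prev, x0, eps=1e-6, max_iter=50):
    """Quasi-Newton search: f'' is replaced by a finite difference of f' (df)."""
    g_prev = df(x_prev)
    x, g = x0, df(x0)
    for _ in range(max_iter):
        if abs(g) < eps:                      # Step 2: stopping test
            break
        B = (x - x_prev) / (g - g_prev)       # Step 1: approximate 1/f''(x)
        x_prev, g_prev = x, g
        x = x - g * B                         # Step 3: step d = -f'(x) * B
        g = df(x)
    return x

# Same example as above, started from x^{-1} = 29 and x^0 = 30 (starting points assumed here)
df = lambda x: 4.0 * (x - 20.0) ** 3 / 500.0 - 2.0
print(quasi_newton_1d(df, 29.0, 30.0))        # about 26.30
```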