Nonlinear Programming

i) Introduction

A general nonlinear programming problem (NLP) can be expressed as follows: find the values of the variables x1, x2, ..., xn that

Max (or Min) Z = f(x1, x2, ..., xn)
s.t. g1(x1, x2, ..., xn) (≤, =, ≥) b1
     g2(x1, x2, ..., xn) (≤, =, ≥) b2
     .............................................
     gm(x1, x2, ..., xn) (≤, =, ≥) bm

f(x1, x2, ..., xn) is the objective function, and g1( ), g2( ), ..., gm( ) are the NLP's constraints. If there are no constraints, it is an unconstrained NLP.

The feasible region of any LP is a convex set, and an optimal solution occurs at an extreme (corner) point. In an NLP, even if the feasible region is a convex set, an optimal solution need not be an extreme point; indeed, it may not even lie on the boundary of the feasible region.

Local Extremum

A point x is a local maximum if f(x) ≥ f(x') for all feasible x' that are close to x. Analogously, a point x is a local minimum if f(x) ≤ f(x') for all feasible x' that are close to x.

Convex and Concave Functions

A function f(x) is convex if and only if the line segment joining any two points on the curve y = f(x) is never below the curve. A function f(x) is concave if and only if the line segment joining any two points on the curve y = f(x) is never above the curve.

[Figure: graphs of y = f(x) for a convex function and a concave function]

Theorem 1: Consider a maximization NLP. Suppose the feasible region S is a convex set. If f(x) is concave on S, then any local maximum of the NLP is an optimal solution to the NLP.

Theorem 1': Consider a minimization NLP. Suppose the feasible region S is a convex set. If f(x) is convex on S, then any local minimum of the NLP is an optimal solution to the NLP.

How do we determine whether a function of a single variable is convex or concave? Suppose f''(x) exists for all x in a convex set S. Then f(x) is a convex function on S if and only if f''(x) ≥ 0 for all x in S. If f''(x) ≤ 0 for all x in S, then f(x) is a concave function on S.

Ex. f(x) = x²: f'(x) = 2x, f''(x) = 2 > 0, hence f is a convex function.
Ex. f(x) = x^(1/2): f'(x) = (1/2)x^(-1/2), f''(x) = -(1/4)x^(-3/2) ≤ 0 (for x > 0), hence f is a concave function.

How can we determine whether a function f(x1, x2, ..., xn) of n variables is convex or concave on a set S ⊂ Rⁿ?

Definition: The Hessian of f(x1, x2, ..., xn) is the n×n matrix whose ijth entry is ∂²f/∂xi∂xj.

e.g. If f(x1, x2) = x1³ + 2x1x2 + x2², then

H(x1, x2) = [ 6x1  2 ]
            [  2   2 ]

Definition: An ith principal minor of an n×n matrix is the determinant of any i×i matrix obtained by deleting (n − i) rows and the corresponding (n − i) columns of the matrix. Thus, for the matrix

[ -2  -1 ]
[ -1  -4 ]

the first principal minors are -2 and -4, and the second principal minor is (-2)(-4) − (-1)(-1) = 7.

Definition: The kth leading principal minor of an n×n matrix is the determinant of the k×k matrix obtained by deleting the last (n − k) rows and columns of the matrix. We let Hk(x1, x2, ..., xn) denote the kth leading principal minor of the Hessian evaluated at the point (x1, x2, ..., xn). Thus, if f(x1, x2) = x1³ + 2x1x2 + x2², then

H1(x1, x2) = 6x1
H2(x1, x2) = 6x1(2) − 2(2) = 12x1 − 4

Theorem: Suppose f(x1, x2, ..., xn) has continuous second-order partial derivatives for each x = (x1, x2, ..., xn) ∈ S. Then f(x1, x2, ..., xn) is a convex function on S if and only if, for each x ∈ S, all principal minors of H are nonnegative.

Ex. f(x1, x2) = x1² + 2x1x2 + x2²

H(x1, x2) = [ 2  2 ]
            [ 2  2 ]

The first principal minors are 2 and 2 (both ≥ 0), and the second principal minor is 2(2) − 2(2) = 0 ≥ 0. All principal minors are nonnegative; therefore, f(x1, x2) is a convex function.
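The principal-minor test above is easy to automate. The following is a minimal sympy sketch (not part of the original notes; the helper name principal_minors is illustrative) that builds the Hessian of the convexity example and lists all of its principal minors. Since every minor is nonnegative for all (x1, x2), the function is convex.

```python
import sympy as sp
from itertools import combinations

x1, x2 = sp.symbols("x1 x2")
f = x1**2 + 2*x1*x2 + x2**2              # convexity example from the notes

H = sp.hessian(f, [x1, x2])              # matrix of second partial derivatives

def principal_minors(M):
    # All ith principal minors: keep i rows and the same i columns, take det.
    n = M.shape[0]
    minors = []
    for i in range(1, n + 1):
        for rows in combinations(range(n), i):
            sub = M[list(rows), list(rows)]
            minors.append(sub.det())
    return minors

print("Hessian:", H)                               # Matrix([[2, 2], [2, 2]])
print("Principal minors:", principal_minors(H))    # [2, 2, 0] -> all >= 0
```

Because the Hessian here is constant, the minors do not depend on (x1, x2); in general they would be expressions that must be checked over the whole set S.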
Theorem: Suppose f(x1, x2, ..., xn) has continuous second-order partial derivatives for each x = (x1, x2, ..., xn) ∈ S. Then f(x1, x2, ..., xn) is a concave function on S if and only if, for each x ∈ S and k = 1, 2, ..., n, every nonzero kth principal minor of H has the same sign as (−1)^k.

Ex. f(x1, x2) = −x1² − x1x2 − 2x2²

H(x1, x2) = [ -2  -1 ]
            [ -1  -4 ]

The first principal minors are -2 and -4, both nonpositive (same sign as (−1)¹). The second principal minor is (-2)(-4) − (-1)(-1) = 7 > 0 (same sign as (−1)²). Therefore, f(x1, x2) is a concave function.

Ex. f(x1, x2) = x1² − 3x1x2 + 2x2² is neither a convex nor a concave function.

ii) Solving NLPs with One Variable

How do we solve: max (or min) f(x) s.t. x ∈ [a, b]?

Case 1: Points where a < x < b and f'(x) = 0

Suppose a < x0 < b and f'(x0) exists. If x0 is a local maximum or a local minimum, then f'(x0) = 0. Furthermore, if f''(x0) < 0, then x0 is a local maximum, and if f''(x0) > 0, then x0 is a local minimum. In some cases, as in panel C of the figure below, f'(x0) = 0 but x0 is not a local max or min; there, f''(x0) = 0.

[Figure: three panels A, B, C. A: f'(x0) = 0 and x0 is a local maximum. B: f'(x0) = 0 and x0 is a local minimum. C: f'(x0) = 0 and f''(x0) = 0, but x0 is not a local max or min.]

When f''(x0) = 0, further assessment is necessary to determine whether x0 is a local max, a local min, or neither.

Case 2: Points where f'(x) does not exist

If f(x) does not have a derivative at x0, then x0 may be a local max, a local min, or neither.

[Figure: graphs of y = f(x) with a kink at x0, illustrating cases where x0 is a local max, a local min, or not a local extremum.]

In this case, we determine whether x0 is a local max or a local min by checking values of f(x) at points x1 < x0 and x2 > x0 near x0:

f(x0) > f(x1) and f(x0) < f(x2): not a local extremum.
f(x0) < f(x1) and f(x0) > f(x2): not a local extremum.
f(x0) ≥ f(x1) and f(x0) ≥ f(x2): local max.
f(x0) ≤ f(x1) and f(x0) ≤ f(x2): local min.

Case 3: The endpoints a and b of [a, b]

If f'(a) > 0, then a is a local minimum. If f'(a) < 0, then a is a local maximum. If f'(b) > 0, then b is a local maximum. If f'(b) < 0, then b is a local minimum.

[Figure: behaviour of f near the endpoints, illustrating the four endpoint cases above.]

Examples to be solved in class.

iii) Unconstrained Maximization and Minimization with Several Variables

We now consider

max (or min) f(x1, x2, ..., xn)
s.t. (x1, x2, ..., xn) ∈ Rⁿ

We assume that the first and second partial derivatives of f(x1, x2, ..., xn) exist and are continuous. Let x' = (x1', x2', ..., xn') be a local extremum for the NLP above. If x' is a local extremum, then ∂f(x')/∂xi = 0 for i = 1, 2, ..., n.

Definition: A point x' having ∂f(x')/∂xi = 0 for i = 1, 2, ..., n is called a stationary point of f.

Theorem: If Hk(x') > 0 for k = 1, 2, ..., n, then a stationary point x' is a local minimum for the NLP.

Theorem: If, for k = 1, 2, ..., n, Hk(x') is nonzero and has the same sign as (−1)^k, then a stationary point x' is a local maximum for the NLP.

Theorem: If Hn(x') ≠ 0 and the conditions of the above two theorems do not hold, then the stationary point x' is not a local extremum; it is called a saddle point. If Hn(x') = 0, the stationary point may be a local minimum, a local maximum, or a saddle point (the test is inconclusive).

From the theorems we have seen so far, if f(x1, x2, ..., xn) is a concave function and the NLP is a max problem, any stationary point is an optimal solution; if f(x1, x2, ..., xn) is a convex function and the NLP is a min problem, any stationary point is an optimal solution.

Examples to be solved in class. See also Ex. 28, p. 658 in the textbook.
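As a small illustration of the Hk(x') tests, the following sympy sketch (not from the notes; the classify helper is illustrative) finds the stationary points of f(x1, x2) = x1³ + 2x1x2 + x2², the function used earlier in the Hessian example, and applies the leading-principal-minor tests to each one.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**3 + 2*x1*x2 + x2**2              # function from the Hessian example

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, [x1, x2])

def classify(point):
    # Leading principal minors H1, ..., Hn of the Hessian at the point.
    Hp = H.subs(point)
    n = Hp.shape[0]
    leading = [Hp[:k, :k].det() for k in range(1, n + 1)]
    if all(m > 0 for m in leading):
        return "local minimum"
    if all(m != 0 and sp.sign(m) == (-1) ** k
           for k, m in enumerate(leading, start=1)):
        return "local maximum"
    if leading[-1] != 0:
        return "saddle point"
    return "inconclusive (Hn = 0)"

for sol in sp.solve(grad, [x1, x2], dict=True):
    print(sol, "->", classify(sol))
# expected: {x1: 0, x2: 0}       -> saddle point
#           {x1: 2/3, x2: -2/3}  -> local minimum
```

At (0, 0) the leading minors are 0 and -4, so neither sign pattern holds while Hn ≠ 0 (saddle point); at (2/3, -2/3) they are 4 and 4, so the point is a local minimum.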
iv) NLPs with Constraints – Lagrange Multipliers

Lagrange multipliers can be used to solve NLPs in which all the constraints are equality constraints:

max (or min) Z = f(x1, x2, ..., xn)
s.t. g1(x1, x2, ..., xn) = b1
     g2(x1, x2, ..., xn) = b2
     .............................................
     gm(x1, x2, ..., xn) = bm        (1)

To solve (1), we associate a multiplier λi with the ith constraint and form the Lagrangian

L(x1, ..., xn, λ1, ..., λm) = f(x1, ..., xn) + ∑_{i=1}^{m} λi[bi − gi(x1, ..., xn)]        (2)

Then we attempt to find a point (x1', ..., xn', λ1', ..., λm') that maximizes (or minimizes) L(x1, ..., xn, λ1, ..., λm).

Suppose NLP (1) is a max problem. If (x1', ..., xn', λ1', ..., λm') maximizes L, then at that point

∂L/∂λi = bi − gi(x1', ..., xn') = 0

This shows that (x1', ..., xn') satisfies the constraints in (1). It can be shown that if (x1', ..., xn', λ1', ..., λm') solves the unconstrained maximization problem

max L(x1, ..., xn, λ1, ..., λm)        (3)

then (x1', ..., xn') solves (1). To solve (3), we solve

∂L/∂x1 = ∂L/∂x2 = ... = ∂L/∂xn = ∂L/∂λ1 = ∂L/∂λ2 = ... = ∂L/∂λm = 0        (4)

Theorem: Suppose NLP (1) is a maximization problem. If f(x1, ..., xn) is a concave function and each gi(x1, ..., xn) is a linear function, then any point (x1', ..., xn', λ1', ..., λm') satisfying (4) yields an optimal solution (x1', ..., xn') to NLP (1).

Theorem: Suppose NLP (1) is a minimization problem. If f(x1, ..., xn) is a convex function and each gi(x1, ..., xn) is a linear function, then any point (x1', ..., xn', λ1', ..., λm') satisfying (4) yields an optimal solution (x1', ..., xn') to NLP (1).

Lagrange Multipliers and Sensitivity Analysis

If the RHS of each constraint i is increased by a small amount Δbi (in either a max or a min problem), then the optimal Z-value of (1) will increase by approximately ∑_{i=1}^{m} (Δbi)λi. In particular, if we increase the RHS of only constraint i by Δbi, then the optimal Z-value of (1) will increase by approximately (Δbi)λi.

Ex. A company is planning to spend $10,000 on advertising. It costs $3,000 per minute to advertise on television and $1,000 per minute to advertise on radio. If the firm buys x minutes of television advertising and y minutes of radio advertising, then its revenue in thousands of dollars is given by

f(x, y) = −2x² − y² + xy + 8x + 3y

How can the firm maximize its revenue? Solution in class.
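Independently of the in-class solution, here is a minimal sympy sketch of how system (4) could be set up and solved for the advertising example. It assumes the entire $10,000 budget is spent, so the single equality constraint is 3x + y = 10 (in thousands of dollars); since f is concave and this constraint is linear, the first theorem above guarantees that the resulting stationary point of L is optimal.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)

f = -2*x**2 - y**2 + x*y + 8*x + 3*y      # revenue, in thousands of dollars
g, b = 3*x + y, 10                        # assumed budget constraint: g = b

L = f + lam*(b - g)                       # Lagrangian, as in (2)

# System (4): set every partial derivative of L to zero and solve.
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, [x, y, lam], dict=True)[0]
print(sol)                                # {x: 69/28, y: 73/28, lam: 1/4}
print("revenue:", f.subs(sol))

# Sensitivity: lam = 1/4, so a small increase Δb in the budget (in thousands)
# raises the optimal revenue by roughly 0.25 * Δb thousand dollars.
```

The multiplier value printed here is exactly the shadow-price interpretation discussed in the sensitivity-analysis paragraph above.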
v) The Kuhn–Tucker Conditions (or Karush–Kuhn–Tucker Conditions)

We now consider

max (or min) Z = f(x1, x2, ..., xn)
s.t. g1(x1, x2, ..., xn) ≤ b1
     g2(x1, x2, ..., xn) ≤ b2
     .............................................
     gm(x1, x2, ..., xn) ≤ bm        (A)

All constraints must be written as ≤ constraints; any constraint not already in this form must first be converted.

The following Theorems I and I' give conditions (the KT conditions) that are necessary for a point x' = (x1', ..., xn') to solve (A). Note: the partial derivative of f with respect to xj evaluated at x' is written ∂f(x')/∂xj.

For the theorems to hold, the functions g1, g2, ..., gm must satisfy certain regularity conditions, usually called constraint qualifications (to be discussed later). When the constraints are linear, these regularity assumptions are always satisfied.

Theorem I: Suppose (A) is a maximization problem. If x' = (x1', ..., xn') is an optimal solution to (A), then x' must satisfy the m constraints in (A), and there must exist multipliers λ1', λ2', ..., λm' satisfying

∂f(x')/∂xj − ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj] = 0        (j = 1, 2, ..., n)
λi'[bi − gi(x')] = 0        (i = 1, 2, ..., m)
λi' ≥ 0        (i = 1, 2, ..., m)

Theorem I': Suppose (A) is a minimization problem. If x' = (x1', ..., xn') is an optimal solution to (A), then x' must satisfy the m constraints in (A), and there must exist multipliers λ1', λ2', ..., λm' satisfying

∂f(x')/∂xj + ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj] = 0        (j = 1, 2, ..., n)
λi'[bi − gi(x')] = 0        (i = 1, 2, ..., m)
λi' ≥ 0        (i = 1, 2, ..., m)

λi' may be thought of as the shadow price of the ith constraint in (A).

If we also include the constraints that the variables must be nonnegative, the problem becomes

max (or min) Z = f(x1, x2, ..., xn)
s.t. g1(x1, x2, ..., xn) ≤ b1
     g2(x1, x2, ..., xn) ≤ b2
     .............................................
     gm(x1, x2, ..., xn) ≤ bm
     −x1 ≤ 0
     −x2 ≤ 0
     .......
     −xn ≤ 0        (B)

If we associate multipliers μ1, μ2, ..., μn with the nonnegativity constraints in (B), we have:

Theorem II: Suppose (B) is a maximization problem. If x' = (x1', ..., xn') is an optimal solution to (B), then x' must satisfy the constraints in (B), and there must exist multipliers λ1', ..., λm', μ1', ..., μn' satisfying

∂f(x')/∂xj − ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj] + μj' = 0        (j = 1, 2, ..., n)        (*)
λi'[bi − gi(x')] = 0        (i = 1, 2, ..., m)
{∂f(x')/∂xj − ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj]}·xj' = 0        (j = 1, 2, ..., n)
λi' ≥ 0        (i = 1, 2, ..., m)
μj' ≥ 0        (j = 1, 2, ..., n)

Because μj' ≥ 0, equation (*) may be replaced by

∂f(x')/∂xj − ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj] ≤ 0        (j = 1, 2, ..., n)

Theorem II': For a minimization problem, the following must be satisfied:

∂f(x')/∂xj + ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj] − μj' = 0        (j = 1, 2, ..., n)        (**)
λi'[bi − gi(x')] = 0        (i = 1, 2, ..., m)
{∂f(x')/∂xj + ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj]}·xj' = 0        (j = 1, 2, ..., n)
λi' ≥ 0        (i = 1, 2, ..., m)
μj' ≥ 0        (j = 1, 2, ..., n)

Because μj' ≥ 0, equation (**) may be replaced by

∂f(x')/∂xj + ∑_{i=1}^{m} λi'·[∂gi(x')/∂xj] ≥ 0        (j = 1, 2, ..., n)

Theorems I, I', II, and II' give conditions that are necessary for a point x' = (x1', ..., xn') to be an optimal solution to (A) or (B). The following two theorems give conditions that are sufficient for x' = (x1', ..., xn') to be an optimal solution to (A) or (B):

Theorem III: For maximization, if f(x1, x2, ..., xn) is a concave function and the constraint functions g1( ), g2( ), ..., gm( ) are convex functions, then any point x' = (x1', ..., xn') satisfying the conditions of Theorem I is an optimal solution to (A), and any point x' = (x1', ..., xn') satisfying the conditions of Theorem II is an optimal solution to (B).

Theorem III': For minimization, if f(x1, x2, ..., xn) is a convex function and the constraint functions g1( ), g2( ), ..., gm( ) are convex functions, then any point x' = (x1', ..., xn') satisfying the conditions of Theorem I' is an optimal solution to (A), and any point x' = (x1', ..., xn') satisfying the conditions of Theorem II' is an optimal solution to (B).

For minimization, both f( ) and the gi( ) are convex; the latter requirement ensures that the feasible region of (A) and (B) is a convex set.

Solutions in class.
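As an illustration of Theorem I (not the notes' in-class solution), the following sympy sketch restates the advertising example with an inequality budget constraint 3x + y ≤ 10 and enumerates the complementary-slackness cases. Because f is concave and the constraint is linear (hence convex), Theorem III guarantees that any candidate satisfying all of the Theorem I conditions is optimal.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = -2*x**2 - y**2 + x*y + 8*x + 3*y      # concave objective (maximization)
g, b = 3*x + y, 10                        # single <= constraint: g <= b

# Stationarity part of Theorem I: df/dxj - lam * dg/dxj = 0
stationarity = [sp.diff(f, x) - lam*sp.diff(g, x),
                sp.diff(f, y) - lam*sp.diff(g, y)]

candidates = []

# Case 1 (lam = 0): constraint allowed to be slack; keep only feasible points.
for s in sp.solve([e.subs(lam, 0) for e in stationarity], [x, y], dict=True):
    if g.subs(s) <= b:
        candidates.append({**s, lam: 0})

# Case 2 (constraint binding, g = b): keep only points with lam >= 0.
for s in sp.solve(stationarity + [g - b], [x, y, lam], dict=True):
    if s[lam] >= 0:
        candidates.append(s)

for c in candidates:
    print(c, "f =", f.subs(c))
# Only the binding case survives: x = 69/28, y = 73/28, lam = 1/4, matching the
# Lagrange-multiplier solution above; by Theorem III this point is optimal.
```

The unconstrained stationary point (Case 1) spends more than the budget, so it is rejected, and the constraint is active at the optimum with shadow price λ' = 1/4.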