2. Single Variable Unconstrained Optimization

2.1 Calculus Refresher

Consider the following problem in geometry.

Example: Construct a rectangle of perimeter L with the largest area.

Solution: Rectangles of various shapes can be constructed with perimeter L. Among these, we need to determine the one with the largest area. Let the sides of the optimal rectangle be x and y. We need to maximize Area = xy, subject to the constraint 2x + 2y = L. Since y = (L - 2x)/2, we have

Area = x(L - 2x)/2

For what value of x does the above function take a maximum? Recall from basic calculus that if a differentiable function f(x) takes a maximum or minimum at x*, then

\[ \left.\frac{df}{dx}\right|_{x=x^*} = 0 \]

Here f(x) = x(L - 2x)/2, so

\[ \frac{df}{dx} = \frac{L-2x}{2} - x \]

Setting the derivative to zero at x*:

\[ \frac{L-2x^*}{2} - x^* = 0 \quad\Rightarrow\quad x^* = L/4, \qquad y^* = \frac{L-2x^*}{2} = L/4 \]

Thus, the optimal shape is a square of side L/4, and the area is L^2/16. The above result is confirmed below, where we plot the area as a function of x.

close all;
L=2;
x= 0:0.01*L:L/2;
Area = x.*(L-2*x)/2;
plot(x,Area); grid on; axis('equal')

[Figure: Area versus x for L = 2; the maximum area L^2/16 = 0.25 occurs at x = L/4 = 0.5.]

The key mathematical concept used above was setting the derivative of a function to zero in order to find the maximum. Much of optimization stems from this fundamental idea. Let us state this result in a formal manner, and explore its meaning a little further.

Theorem 1: If a differentiable function f(x) takes a maximum or minimum at x*, then

\[ \left.\frac{df}{dx}\right|_{x=x^*} = 0 \]

Proof: We provide an informal proof here; see [Reference] for details. Expand f in a Taylor series about x*:

\[ f(x^* + \Delta x) = f(x^*) + \left.\frac{df}{dx}\right|_{x=x^*}\Delta x + \frac{1}{2!}\left.\frac{d^2f}{dx^2}\right|_{x=x^*}\Delta x^2 + \frac{1}{3!}\left.\frac{d^3f}{dx^3}\right|_{x=x^*}\Delta x^3 + O(\Delta x^4) \]

Consider the first two terms:

\[ f(x^* + \Delta x) \approx f(x^*) + \left.\frac{df}{dx}\right|_{x=x^*}\Delta x \]

Suppose f(x*) is a minimum. Then, by definition, f(x*) must be less than both f(x* + Δx) and f(x* - Δx) for sufficiently small Δx > 0. Thus:

\[ f(x^*) \le f(x^* + \Delta x) \approx f(x^*) + \left.\frac{df}{dx}\right|_{x=x^*}\Delta x \quad\Rightarrow\quad \left.\frac{df}{dx}\right|_{x=x^*} \ge 0 \]

\[ f(x^*) \le f(x^* - \Delta x) \approx f(x^*) - \left.\frac{df}{dx}\right|_{x=x^*}\Delta x \quad\Rightarrow\quad \left.\frac{df}{dx}\right|_{x=x^*} \le 0 \]

Both inequalities can hold simultaneously only if

\[ \left.\frac{df}{dx}\right|_{x=x^*} = 0 \]

The argument for a maximum is identical. End of Proof.

Example 2: Consider the function f(x) = x^2 - 5x + 10, which is plotted below.

close all;
x= -1:0.1:5;
f =x.^2 -5*x+10;
plot(x,f); grid on;

[Figure: parabola with its minimum at x = 2.5.]

Note that the function takes a minimum at x = 2.5. Indeed:

\[ \frac{df}{dx} = 2x - 5, \qquad \left.\frac{df}{dx}\right|_{x=2.5} = 2(2.5) - 5 = 0 \]

One can use the taylor command in MATLAB to observe that the linear term is indeed 'missing' from the expansion about x = 2.5:

f = sym('x*x-5*x+10');
ft = taylor(f,3,2.5) % 3 terms at x = 2.5

ft = 15/4+(x-5/2)^2

Example 3: Next consider the function f(x) = -3x^2 + 6x + 11, which is plotted below.

x= -5:0.1:5;
f =-3*x.^2 +6*x+11;
plot(x,f); grid on;

[Figure: downward parabola with its maximum at x = 1.]

\[ \frac{df}{dx} = -6x + 6, \qquad \left.\frac{df}{dx}\right|_{x=1} = -6(1) + 6 = 0 \]

f = sym('-3*x*x+6*x+11');
ft = taylor(f,3,1) % 3 terms at x = 1

Again, the expansion contains no linear term.

The above two examples merely confirm the correctness of the theorem. Note, however, that the converse is false: if the slope vanishes, we cannot conclude that the function takes a minimum or a maximum, i.e.,

\[ \left.\frac{df}{dx}\right|_{x=x^*} = 0 \quad\not\Rightarrow\quad f(x) \text{ takes a maximum or minimum at } x^* \]

Example: Let us consider a different function, f(x) = 3x^3 - 9, that highlights this limitation of the theorem. Note that at x = 0 the derivative vanishes:

\[ \frac{df}{dx} = 9x^2 = 0 \text{ at } x = 0 \]

But the function takes neither a minimum nor a maximum at x = 0, as is plotted below.

x= -5:0.1:5;
f =3*x.^3 -9;
plot(x,f); grid on;

[Figure: monotonically increasing cubic with an inflection point at x = 0.]
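Before moving on, note that such stationary-point calculations can be checked symbolically in MATLAB. The snippet below is a minimal sketch, assuming the Symbolic Math Toolbox is available; it differentiates each of the three functions above and solves for the points where the derivative vanishes.

% Locate stationary points symbolically
% (a sketch; assumes the Symbolic Math Toolbox is available)
syms x
f1 = x^2 - 5*x + 10;
f2 = -3*x^2 + 6*x + 11;
f3 = 3*x^3 - 9;
solve(diff(f1,x)) % x = 5/2
solve(diff(f2,x)) % x = 1
solve(diff(f3,x)) % x = 0 (a repeated root)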
So, let us separate out the issues:

1. Points where the derivative of a function vanishes are called STATIONARY points. Thus x = 2.5 in the first example, x = 1 in the second example, and x = 0 in the third example are all stationary points.

2. A stationary point can be a minimum (as in the first example), a maximum (as in the second example), or an inflection point (as in the third example).

How do we differentiate between the three cases? The three functions are reproduced below.

[Figure: the three plots side by side: f(x) = x^2 - 5x + 10, f(x) = -3x^2 + 6x + 11, and f(x) = 3x^3 - 9.]

To differentiate between the three cases, we must consider the second derivative. Note that in the first figure the derivative increases through zero at the minimum. Mathematically, we have:

\[ \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d^2f}{dx^2} > 0 \text{ at a minimum} \]

Indeed, for the given function

\[ \frac{d^2f}{dx^2} = 2 > 0 \]

In the second figure, the derivative decreases through zero at the maximum, i.e.,

\[ \frac{d^2f}{dx^2} < 0 \text{ at a maximum} \]

Indeed, for the given function

\[ \frac{d^2f}{dx^2} = -6 < 0 \]

Finally, one may be tempted to conclude that if the second derivative is zero at a stationary point, then it must be an inflection point (neither a minimum nor a maximum). Unfortunately, this conclusion is false. For example, consider the function f(x) = x^4 + 12. Note that

\[ \frac{df}{dx} = 4x^3 = 0 \quad\text{and}\quad \frac{d^2f}{dx^2} = 12x^2 = 0 \quad\text{at } x = 0 \]

yet the function indeed takes a minimum at x = 0, as is plotted below.

close all;
x= -5:0.1:5;
f =x.^4 +12;
plot(x,f); grid on;

[Figure: quartic with a flat minimum at x = 0.]

2.2 Optimality

In summary, we have the following.

Theorem 2: Consider a twice differentiable function f(x). Suppose at a point x* we have:

(a) \( \left.\frac{df}{dx}\right|_{x=x^*} = 0 \): then x* is a STATIONARY point (and of special interest in optimization).

(b) \( \left.\frac{df}{dx}\right|_{x=x^*} = 0 \) and \( \left.\frac{d^2f}{dx^2}\right|_{x=x^*} > 0 \): then the function takes a MINIMUM at x = x*.

(c) \( \left.\frac{df}{dx}\right|_{x=x^*} = 0 \) and \( \left.\frac{d^2f}{dx^2}\right|_{x=x^*} < 0 \): then the function takes a MAXIMUM at x = x*.

(d) \( \left.\frac{df}{dx}\right|_{x=x^*} = 0 \) and \( \left.\frac{d^2f}{dx^2}\right|_{x=x^*} = 0 \): one cannot conclude anything about the local behavior of f at x*.

Proof: We provide an informal proof here; see [Reference] for details. Consider the Taylor series:

\[ f(x^* + \Delta x) = f(x^*) + \left.\frac{df}{dx}\right|_{x=x^*}\Delta x + \frac{1}{2!}\left.\frac{d^2f}{dx^2}\right|_{x=x^*}\Delta x^2 + \frac{1}{3!}\left.\frac{d^3f}{dx^3}\right|_{x=x^*}\Delta x^3 + O(\Delta x^4) \]

Since x* is a stationary point, the first derivative vanishes there. Thus:

\[ f(x^* + \Delta x) \approx f(x^*) + \frac{1}{2!}\left.\frac{d^2f}{dx^2}\right|_{x=x^*}\Delta x^2 \]

(b) Suppose \( \left.\frac{d^2f}{dx^2}\right|_{x=x^*} > 0 \). Then

\[ \frac{1}{2!}\left.\frac{d^2f}{dx^2}\right|_{x=x^*}\Delta x^2 > 0 \text{ for all } \Delta x \neq 0 \text{ (both positive and negative)} \]

Thus

\[ f(x^* + \Delta x) > f(x^*) \]

i.e., f(x*) must be less than both f(x* + Δx) and f(x* - Δx) for sufficiently small Δx, so x* is a minimum. The other parts of the theorem can be similarly established. End of Proof.

We now have a very powerful theorem that can be applied to find and check for minima and maxima. In fact, the above theorem can be used to find and classify multiple minima and maxima.

Example: Consider the function f(x) = x^3 - 3x + 5.

\[ \frac{df}{dx} = 3x^2 - 3 = 0 \quad\Rightarrow\quad x = \pm 1 \]

Thus, we have two stationary points, x* = ±1. Let us now check what happens to the second derivative at these two stationary points:

\[ \frac{d^2f}{dx^2} = 6x \]

At the two stationary points, we have:

\[ \left.\frac{d^2f}{dx^2}\right|_{x^*=1} = 6 > 0, \qquad \left.\frac{d^2f}{dx^2}\right|_{x^*=-1} = -6 < 0 \]

Thus, we can conclude from the above theorem that x* = 1 is a minimum of the function, while x* = -1 is a maximum. We can confirm these findings by plotting the function:

close all;
x= -2:0.1:2;
f =x.^3 -3*x+5;
plot(x,f); grid on;

[Figure: cubic with a local maximum at x = -1 and a local minimum at x = 1.]
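The classification step in the previous example is easy to automate. The following is a minimal symbolic sketch (again assuming the Symbolic Math Toolbox; the loop and output formatting are our own illustrative choices) that finds each stationary point of f(x) = x^3 - 3x + 5 and tests the sign of the second derivative there.

% Find and classify the stationary points of f(x) = x^3 - 3x + 5
% (a sketch; assumes the Symbolic Math Toolbox)
syms x
f = x^3 - 3*x + 5;
xstars = solve(diff(f,x));     % stationary points: -1 and 1
for k = 1:length(xstars)
    d2 = double(subs(diff(f,x,2), x, xstars(k)));  % 2nd derivative at x*
    if d2 > 0
        fprintf('x* = %s is a minimum\n', char(xstars(k)));
    elseif d2 < 0
        fprintf('x* = %s is a maximum\n', char(xstars(k)));
    else
        fprintf('x* = %s: the test is inconclusive\n', char(xstars(k)));
    end
end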
So far we have only considered polynomials. But the above theorem applies to any function that is twice differentiable.

Example: Consider the function f(x) = sin x - x/√2.

\[ \frac{df}{dx} = \cos x - \frac{1}{\sqrt{2}} = 0 \quad\Rightarrow\quad \cos x = \frac{1}{\sqrt{2}} \]

There are infinitely many stationary points for this function (we list only the positive values):

\[ x^* = \pi/4,\ 7\pi/4,\ 9\pi/4,\ 15\pi/4,\ \ldots,\ 2n\pi \pm \pi/4,\ \ldots \]

Let us now check what happens to the second derivative at these stationary points:

\[ \frac{d^2f}{dx^2} = -\sin x \]

We have:

\[ -\sin(\pi/4) < 0, \quad -\sin(7\pi/4) > 0, \quad -\sin(9\pi/4) < 0, \quad \ldots \]

Thus there are infinitely many minima and maxima for this function, and they alternate. We can confirm these findings by plotting the function:

close all;
x= -3*pi:0.1:3*pi;
f =sin(x)-x/sqrt(2);
plot(x,f); grid on;

[Figure: sin(x) - x/√2, an oscillation superposed on a decreasing trend, with alternating local maxima and minima.]

Note: The above example also illustrates a very important concept: the second fundamental theorem of optimization (Theorem 2) only helps us find LOCAL minima and maxima. None of the minima/maxima of this function is a GLOBAL minimum/maximum, since the function is unbounded in both directions. Thus, one must often be content with finding local minima and maxima. The next example illustrates that even finding the local minima and maxima is hard using the above theorem.

Example: Consider the function

\[ f(x) = 3\sin x - x + 0.1x^2 + 0.1\cos(2x) \]

\[ \frac{df}{dx} = 3\cos x - 1 + 0.2x - 0.2\sin(2x) \]

The above theorem states that one must find the stationary points where the derivative vanishes. But solving for x in

\[ \frac{df}{dx} = 3\cos x - 1 + 0.2x - 0.2\sin(2x) = 0 \]

is not at all trivial.

2.4 Algorithms to Compute Stationary Points

For non-trivial problems, there are two broad classes of methods to find stationary points:

(1) Solve df/dx = 0 using a non-linear solver.
(2) Directly find the minima/maxima of f(x).

In particular, we will consider the function f(x) = 3 sin x - x + 0.1x^2 + 0.1 cos(2x) as an example. Since all algorithms require an initial guess point, let us first plot the function.

close all;
x= -15:0.1:15;
f = 3*sin(x) - x + 0.1*x.^2 + 0.1*cos(2*x);
plot(x,f); grid on;

[Figure: the example function over [-15, 15], showing numerous local minima and maxima.]

First, we will use built-in MATLAB programs to solve via both methods.

2.4.1 FSOLVE Non-Linear Solver

We will use the MATLAB "fsolve" function to solve the non-linear equation df/dx = 0. All non-linear solvers require an initial guess point. Observe that the function has numerous minima and maxima; depending on the starting guess, such algorithms typically converge to the nearest stationary point.

options = optimset('TolFun',1e-10,'TolX',1e-10,'Display','off');
dfdx = inline('3*cos(x) - 1 + 0.2*x - 0.2*sin(2*x)');
fsolve(dfdx,-1,options) % initial guess of -1
fsolve(dfdx,1, options) % initial guess of 1
fsolve(dfdx,5, options) % initial guess of 5

ans = -1.19457375875414
ans = 1.28268099939306
ans = 4.72836823924901

2.4.2 FMINUNC Minimization Solver

We will use the MATLAB "fminunc" function to find minima directly. All optimization solvers require an initial guess point. Observe that the function has numerous minima; depending on the starting guess, such algorithms typically converge to the nearest minimum.

options = optimset('TolFun',1e-10,'TolX',1e-10,'TolCon',1e-10, ...
    'Display','off','LargeScale','off','GradObj','off','GradConstr','off');
f = inline('3*sin(x) - x + 0.1*x^2 + 0.1*cos(2*x)');
fminunc(f,-1, options) % initial guess of -1
fminunc(f,1, options) % initial guess of 1
fminunc(f,5, options) % initial guess of 5

ans = -1.19457374992716
ans = -1.19457376204777
ans = 4.72836822582243

Also observe that for each initial guess we converge to the closest minimum (whereas the non-linear solver converged to the nearest stationary point, whether minimum or maximum). To find the closest maximum using fminunc, one must minimize the negative of the function:

fNeg = inline('-3*sin(x) +x -0.1*x^2 -0.1*cos(2*x)');
fminunc(fNeg,1, options) % initial guess of 1

ans = 1.28268099242501
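To get a feel for what a non-linear solver does with df/dx = 0, it is instructive to hand-code a plain Newton iteration, x_{k+1} = x_k - f'(x_k)/f''(x_k). The sketch below is purely illustrative: fsolve itself uses more robust trust-region strategies, and the iteration cap and tolerance here are arbitrary choices.

% Newton iteration on df/dx = 0 (an illustrative sketch only;
% fsolve uses more robust trust-region methods internally)
dfdx   = @(x)  3*cos(x) - 1 + 0.2*x - 0.2*sin(2*x);  % f'(x)
d2fdx2 = @(x) -3*sin(x) + 0.2 - 0.4*cos(2*x);        % f''(x)
xk = -1;                                % initial guess
for iter = 1:50
    dx = -dfdx(xk)/d2fdx2(xk);          % Newton step
    xk = xk + dx;
    if abs(dx) < 1e-10, break; end      % step-size convergence test
end
xk  % approaches the stationary point near x = -1.1946 found by fsolve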
2.4.3 Quadratic Polynomial Fitting

To get a better understanding of the minimization algorithms, let us try a simple polynomial-fitting algorithm that is very popular. The idea is quite simple: suppose we had a quadratic function q(x) = ax^2 + bx + c; its stationary point can be determined by setting the derivative to zero:

\[ \frac{dq}{dx} = 2ax + b = 0 \quad\Rightarrow\quad x^* = -\frac{b}{2a} \]

So, given a complicated function f(x) and an initial guess point x0, we sample f(x) at three points surrounding x0 (including x0). Using these three values we fit a quadratic q(x) = ax^2 + bx + c, take the new guess point x0 = -b/(2a), and repeat the process until convergence is reached. (The helper function fminQuadFit used below implements this idea; a sketch of it is given at the end of this section.)

tol = 1e-10;
f = inline('3*sin(x) - x + 0.1*x^2 + 0.1*cos(2*x)');
fminQuadFit(f,-1, tol) % initial guess of -1
fminQuadFit(f,1, tol) % initial guess of 1

ans = -1.19457375791758
ans = 1.28268099909323

Note that the code does not distinguish between minima and maxima: it converges to the nearest stationary point of either kind. Let us try the code on a different function:

tol = 1e-10;
f = inline('x^2');
fminQuadFit(f,0.1, tol)
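The listing of fminQuadFit is not reproduced in this excerpt. For completeness, here is a minimal sketch consistent with the description above; it keeps the three most recent sample points at each step (successive parabolic interpolation). The initial spread h = 0.1 and the iteration cap are our own illustrative choices, and the actual course code may differ.

function xBest = fminQuadFit(f, x0, tol)
% FMINQUADFIT  Iterative quadratic fitting (illustrative sketch only;
% the original course implementation may differ in its details).
h  = 0.1;                        % initial spread of the sample points
xs = [x0 - h; x0; x0 + h];       % three points surrounding the guess
for iter = 1:200
    fs = [f(xs(1)); f(xs(2)); f(xs(3))];
    p  = polyfit(xs, fs, 2);     % fit q(x) = p(1)*x^2 + p(2)*x + p(3)
    xNew = -p(2)/(2*p(1));       % stationary point of the fitted quadratic
    if abs(xNew - xs(3)) < tol   % converged: successive guesses agree
        xBest = xNew; return
    end
    xs = [xs(2); xs(3); xNew];   % keep the three most recent points
end
xBest = xNew;

Because the update only sets the fitted parabola's derivative to zero, the iteration lands on whichever stationary point is nearby, which is why the runs above returned both a minimum and a maximum.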