2. Single Variable Unconstrained Optimization

2.1 Calculus Refresher
Consider the following problem in geometry.
Example: Construct a rectangle of perimeter L with the largest area.
Solution: Rectangles of various shapes can be constructed with perimeter L . Among these, we
need to determine the one with the largest area. Let the sides of the optimal rectangle be x & y .
We need to maximize Area = xy, subject to the constraint 2x + 2y = L.
Since y = (L - 2x)/2, we have Area = x(L - 2x)/2.
For what value of x does the above function take a maximum? Recall from basic calculus that if the differentiable function f(x) takes a maximum or minimum at x*, then
df/dx |_{x=x*} = 0
In the above equation f(x) = x(L - 2x)/2, so
df/dx = (L - 2x)/2 + x(-2)/2
df/dx |_{x=x*} = (L - 2x*)/2 + x*(-2)/2 = 0
which gives x* = L/4
and therefore y* = (L - 2x*)/2 = L/4
Thus, the optimal shape is a square of side L / 4 and the area is L2 /16 . The above result is
confirmed below where we plot the area as a function of x.
(Move your mouse anywhere within the square brackets and press CTRL+ENTER to execute the MATLAB commands.)
close all; L=2;
x= 0:0.01*L:L/2;
Area = x.*(L-2*x)/2;
plot(x,Area); grid on; axis('equal')
[Plot: Area = x(L - 2x)/2 versus x for L = 2, showing the maximum at x = L/4 = 0.5.]
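The stationary point can also be verified symbolically. The following is a minimal sketch, assuming the Symbolic Math Toolbox (syms, diff, solve) is available:
syms x L                           % treat both the side x and the perimeter L symbolically
Area = x*(L - 2*x)/2;              % area as a function of x for a rectangle of perimeter L
xStar = solve(diff(Area, x), x)    % stationary point: should return L/4
yStar = simplify((L - 2*xStar)/2)  % corresponding side y: also L/4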
The key mathematical concept used above was setting the derivative of a function to zero in
order to find the maximum. Much of optimization stems from this fundamental idea. Let us state
this result in a formal manner, and explore its meaning a little further.
Theorem: If a differentiable function f(x) takes a maximum or minimum at x*, then
df/dx |_{x=x*} = 0
Proof: We provide an informal proof here; see [Reference] for details.
f(x* + Δx) = f(x*) + df/dx|_{x=x*} Δx + (1/2!) d^2f/dx^2|_{x=x*} (Δx)^2 + (1/3!) d^3f/dx^3|_{x=x*} (Δx)^3 + o((Δx)^4)
Consider the first two terms:
f(x* + Δx) ≈ f(x*) + df/dx|_{x=x*} Δx
Suppose f(x*) is a minimum; then, by definition, f(x*) must be less than both f(x* + Δx) and f(x* - Δx) for sufficiently small Δx:
f(x*) < f(x* + Δx) ≈ f(x*) + df/dx|_{x=x*} Δx, which requires df/dx|_{x=x*} Δx > 0
f(x*) < f(x* - Δx) ≈ f(x*) - df/dx|_{x=x*} Δx, which requires df/dx|_{x=x*} Δx < 0
Since both conditions must hold for the same small Δx > 0, we must have
df/dx |_{x=x*} = 0
End of Proof.
Example 2: Consider a function f(x) = x^2 - 5x + 10 that is plotted below.
close all;
x= -1:0.1:5;
f =x.^2 -5*x+10;
plot(x,f);
grid on;
[Plot of f(x) = x^2 - 5x + 10 for -1 ≤ x ≤ 5, showing the minimum at x = 2.5.]
Note that the function takes a minimum at x = 2.5. Indeed:
df/dx = 2x - 5
df/dx |_{x=2.5} = 2(2.5) - 5 = 0
One can use the taylor command in MATLAB to observe that the linear (second) term is indeed 'missing':
f = sym('x*x-5*x+10');
ft = taylor(f,3,2.5) % 3 terms at x = 2.5
ft =
15/4+(x-5/2)^2
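A quick numerical sketch of the same fact (the step size dx is an arbitrary choice here): near x = 2.5 the central-difference slope is essentially zero, and the neighboring values exceed f(2.5).
f  = @(x) x.^2 - 5*x + 10;                   % Example 2's function as an anonymous function
dx = 1e-3;
slope = (f(2.5 + dx) - f(2.5 - dx))/(2*dx)   % central-difference slope, ~0
[f(2.5 - dx), f(2.5), f(2.5 + dx)]           % f(2.5) = 3.75 is the smallest of the three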

Example 3: Next consider the function f(x) = -3x^2 + 6x + 11 that is plotted below.
x= -5:0.1:5;
f =-3*x.^2 +6*x+11;
plot(x,f);
grid on;
[Plot of f(x) = -3x^2 + 6x + 11 for -5 ≤ x ≤ 5, showing the maximum at x = 1.]
Note that the function takes a maximum at x = 1. Indeed:
df/dx = -6x + 6
df/dx |_{x=1} = -6(1) + 6 = 0
f = sym('-3*x*x+6*x+11');
ft = taylor(f,3,1) % 3 terms at x = 1
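Assuming the same (older) taylor syntax works as in the previous example, this should return 14 - 3*(x-1)^2; again the linear term is missing, confirming that x = 1 is a stationary point.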

The above two examples merely confirm the correctness of the theorem.
Note the following: if the slope vanishes, we cannot conclude that the function takes a minimum or a maximum, i.e.,
df/dx |_{x=x*} = 0 does NOT imply that f(x) takes a maximum or minimum at x*.
Example: Let us consider a different function f(x) = 3x^3 - 9 that highlights the limitations of the theorem. Note that at x = 0, the derivative vanishes:
df/dx = 9x^2
df/dx |_{x=0} = 0
But the function takes neither a minimum nor a maximum at x = 0, as is plotted below.
x= -5:0.1:5;
f =3*x.^3 -9;
plot(x,f);
grid on;
[Plot of f(x) = 3x^3 - 9 for -5 ≤ x ≤ 5; the function increases monotonically through x = 0.]
So, let us separate out the issues:
1. Points where the derivative of a function vanishes are called STATIONARY points. Thus x = 2.5 in the first example, x = 1 in the second example, and x = 0 in the third example are all stationary points.
2. A stationary point can be a minimum (as in the first example), a maximum (as in the second example), or neither, e.g., an inflection point (as in the third example).
How do we differentiate between the three cases? The three functions are reproduced below.
[Three plots, left to right: f(x) = x^2 - 5x + 10 (minimum at x = 2.5), f(x) = -3x^2 + 6x + 11 (maximum at x = 1), and f(x) = 3x^3 - 9 (neither at x = 0).]
To differentiate between the three cases, we must consider the second derivative. Note that in the first figure, the derivative reaches zero at the minimum and then increases. Mathematically, we have:
d/dx (df/dx) = d^2f/dx^2 > 0 at a minimum.
Indeed, for the given function
d^2f/dx^2 = 2 > 0
Meanwhile, in the second figure, the derivative reaches zero at the maximum and then decreases, i.e.,
d/dx (df/dx) = d^2f/dx^2 < 0 at a maximum.
Indeed, for the given function
d^2f/dx^2 = -6 < 0
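As a sketch (assuming the Symbolic Math Toolbox), the signs of the second derivatives of the three example functions can be checked directly:
syms x
d2f1 = diff(x^2 - 5*x + 10, x, 2)          %  2 > 0: minimum at x = 2.5
d2f2 = diff(-3*x^2 + 6*x + 11, x, 2)       % -6 < 0: maximum at x = 1
d2f3 = subs(diff(3*x^3 - 9, x, 2), x, 0)   %  0 at x = 0: the test is inconclusive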
Finally, one may be tempted to conclude that if the second derivative is zero at a stationary point, then the point must be neither a minimum nor a maximum. Unfortunately, this conclusion is false. For example, consider the function
f(x) = x^4 + 12.
Note that
df/dx = 4x^3 = 0 at x = 0
and the second derivative d^2f/dx^2 = 12x^2 also vanishes at x = 0. But the function indeed takes a minimum at x = 0, as is plotted below.
close all;
x= -5:0.1:5;
f =x.^4 +12;
plot(x,f);
grid on;
[Plot of f(x) = x^4 + 12 for -5 ≤ x ≤ 5, showing the minimum at x = 0.]
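A short numerical sketch (the step size dx is chosen here for illustration) makes the point: the second derivative of x^4 + 12 vanishes at x = 0, yet the neighboring function values confirm a minimum.
f  = @(x) x.^4 + 12;
dx = 1e-2;
d2f_at_0 = (f(dx) - 2*f(0) + f(-dx))/dx^2   % central-difference estimate, ~0
[f(-dx), f(0), f(dx)]                        % f(0) = 12 is the smallest of the three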
2.2 Optimality
In summary, we have the following.
Theorem 2: Consider a twice-differentiable function f(x). Suppose at a point x* we have:
(a) df/dx |_{x=x*} = 0: then x* is a STATIONARY point (and of special interest in optimization).
(b) df/dx |_{x=x*} = 0 and d^2f/dx^2 |_{x=x*} > 0: then the function takes a MINIMUM at x = x*.
(c) df/dx |_{x=x*} = 0 and d^2f/dx^2 |_{x=x*} < 0: then the function takes a MAXIMUM at x = x*.
(d) df/dx |_{x=x*} = 0 and d^2f/dx^2 |_{x=x*} = 0: one cannot conclude anything about the local behavior of f near x*.
Proof: We provide an informal proof here; see [Reference] for details. Consider the Taylor series:
f(x* + Δx) = f(x*) + df/dx|_{x=x*} Δx + (1/2!) d^2f/dx^2|_{x=x*} (Δx)^2 + (1/3!) d^3f/dx^3|_{x=x*} (Δx)^3 + o((Δx)^4)
Since x* is a stationary point,
df/dx |_{x=x*} = 0
Thus:
f(x* + Δx) ≈ f(x*) + (1/2!) d^2f/dx^2|_{x=x*} (Δx)^2
(b) Suppose
d^2f/dx^2 |_{x=x*} > 0
then
(1/2!) d^2f/dx^2|_{x=x*} (Δx)^2 > 0 for all Δx (both positive and negative)
Thus:
f(x* + Δx) ≈ f(x*) + (1/2!) d^2f/dx^2|_{x=x*} (Δx)^2 > f(x*)
Thus f(x*) must be less than both f(x* + Δx) and f(x* - Δx) for sufficiently small Δx, i.e., the function takes a minimum at x*.
The other parts of the theorem can be similarly established.
End of Proof.
We now have a very powerful theorem that can be applied to find and check for minima and maxima. In fact, the above theorem can be used to find and detect multiple minima and maxima.
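The second-derivative test of Theorem 2 is easy to mechanize. The helper below is only a sketch (the name classifyStationaryPoint and the finite-difference step h are choices made here, not part of these notes):
function label = classifyStationaryPoint(f, xStar, h)
% Classify a stationary point xStar of f using the sign of d2f/dx2,
% estimated by a central difference with step h.
d2f = (f(xStar + h) - 2*f(xStar) + f(xStar - h))/h^2;
if d2f > 0
    label = 'minimum';
elseif d2f < 0
    label = 'maximum';
else
    label = 'inconclusive (higher-order terms decide)';
end
end
For instance, classifyStationaryPoint(@(x) x.^3 - 3*x + 5, 1, 1e-4) should report a minimum, and the same call at -1 a maximum, matching the example that follows.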
Example: Consider the function f(x) = x^3 - 3x + 5.
df/dx = 3x^2 - 3 = 0, which gives x = ±1
Thus, we have two stationary points, x* = ±1. Let us now check what happens to the 2nd derivative at these two stationary points:
d^2f/dx^2 = 6x
At the two stationary points, we have:
d^2f/dx^2 |_{x*=1} = 6 > 0
d^2f/dx^2 |_{x*=-1} = -6 < 0
Thus, we can conclude from the above theorem that x* = 1 is a minimum of the function, while x* = -1 is a maximum of the function.
We can confirm these findings by plotting the function
close all;
x= -2:0.1:2;
f =x.^3 -3*x+5;
plot(x,f);
grid on;
[Plot of f(x) = x^3 - 3x + 5 for -2 ≤ x ≤ 2, showing the maximum at x = -1 and the minimum at x = 1.]
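The same conclusions can be reached symbolically. A minimal sketch, assuming the Symbolic Math Toolbox:
syms x
f      = x^3 - 3*x + 5;
xStars = solve(diff(f, x), x)            % the two stationary points, x = ±1
d2f    = subs(diff(f, x, 2), x, xStars)  % 6*xStars: negative at x = -1 (maximum), positive at x = 1 (minimum)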
So far we have only considered polynomials. But the above theorem applies to any function (that
is twice differentiable).
Example: Consider the function f(x) = sin x - x/√2.
df/dx = cos x - 1/√2 = 0, i.e., cos x = 1/√2
There are infinitely many stationary points for this function (we list only the positive values):
x* = π/4, 7π/4, 9π/4, ..., 2nπ ± π/4
Let us now check what happens to the 2nd derivative at these stationary points:
d^2f/dx^2 = -sin x
We have:
-sin(π/4) < 0
-sin(7π/4) > 0
-sin(9π/4) < 0
...
Thus there are infinitely many minima and maxima for this function, and they alternate.
We can confirm these findings by plotting the function
close all;
x= -3*pi:0.1:3*pi;
f =sin(x)-x/sqrt(2);
plot(x,f);
grid on;
[Plot of f(x) = sin x - x/√2 for -3π ≤ x ≤ 3π, showing alternating local maxima and minima.]
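A small sketch (the range of n is an arbitrary choice) that lists the first few positive stationary points 2nπ ± π/4 and the sign of d^2f/dx^2 = -sin x at each shows the alternation explicitly:
n  = 0:2;
xs = sort([2*n*pi + pi/4, 2*n*pi - pi/4]);
xs = xs(xs > 0);        % keep only the positive stationary points, as in the text
sign(-sin(xs))          % -1 marks a maximum, +1 a minimum; the signs alternate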
Note: The above example also illustrates a very important concept: Theorem 2 above only helps us find LOCAL minima and maxima. None of these minima/maxima is a GLOBAL minimum or maximum.
Thus one must be content with finding local minima and maxima. The next example illustrates that even finding the local minima and maxima can be hard using the above theorem.
Example: Consider the function f(x) = 3 sin x - x + 0.1x^2 + 0.1 cos(2x).
df/dx = 3 cos x - 1 + 0.2x - 0.2 sin(2x)
The above theorem states that one must find the stationary points where the derivative vanishes. But solving for x in
df/dx = 3 cos x - 1 + 0.2x - 0.2 sin(2x) = 0
is not at all trivial.
2.4 Algorithms to Compute Stationary Points
For non-trivial problems, there are two broad classes of methods to find stationary points:
(1) Solve df/dx = 0 using a non-linear solver.
(2) Directly find the minima/maxima of f(x).
In particular, we will consider the function f(x) = 3 sin x - x + 0.1x^2 + 0.1 cos(2x) from the previous example. Since all algorithms require an initial guess point, let us first plot the function.
close all;
x= -15:0.1:15;
f = 3*sin(x) - x + 0.1*x.^2 + 0.1*cos(2*x);
plot(x,f);
grid on;
[Plot of f(x) = 3 sin x - x + 0.1x^2 + 0.1 cos(2x) for -15 ≤ x ≤ 15, showing numerous local minima and maxima.]
First, we will use built-in MATLAB programs to solve via both methods.
2.4.1 FSOLVE Non-Linear Solver
We will use the MATLAB "fsolve" function to solve the non-linear equation df/dx = 0. All non-linear solvers require an initial guess point. Observe that the function has numerous minima and maxima. Depending on the starting guess point, the algorithm will typically converge to the nearest stationary point.
options = optimset('TolFun',1e-10,'TolX',1e-10,'Display','off');
dfdx = inline('3*cos(x) - 1 + 0.2*x - 0.2*sin(2*x)');
fsolve(dfdx,-1,options) % initial guess of -1
fsolve(dfdx,1, options) % initial guess of 1
fsolve(dfdx,5, options) % initial guess of 5
ans =
-1.19457375875414
ans =
1.28268099939306
ans =
4.72836823924901
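For a single-variable equation, MATLAB's fzero is a natural alternative to fsolve. A sketch (using an anonymous function rather than the older inline syntax):
dfdx = @(x) 3*cos(x) - 1 + 0.2*x - 0.2*sin(2*x);
fzero(dfdx, -1)   % should converge to the stationary point near -1.19
fzero(dfdx,  1)   % should converge to the stationary point near 1.28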
2.4.2 FMINUNC Minimization Solver
We will use the MATLAB "fminunc" function to find minima. All optimization solvers require an initial guess point. Observe that the function has numerous minima. Depending on the starting guess point, the algorithm will typically converge to the nearest local minimum.
options = optimset('TolFun',1e-10,'TolX',1e-10, 'TolCon',1e-10,'Display',
'Off', 'LargeScale', 'Off', 'GradObj', 'off', 'GradConstr','off');
f = inline('3*sin(x) - x + 0.1*x^2 + 0.1*cos(2*x)');
fminunc(f,-1, options) % initial guess of -1
fminunc(f,1, options) % initial guess of 1
fminunc(f,5, options) % initial guess of 5
ans =
-1.19457374992716
ans =
-1.19457376204777
ans =
4.72836822582243
Also observe that for each initial guess we converge to the closest minimum (whereas the non-linear solver converged to the nearest stationary point). To find the closest maximum using fminunc, you need to minimize the negative of the function.
fNeg = inline('-3*sin(x) +x -0.1*x^2 -0.1*cos(2*x)');
fminunc(fNeg,1, options) % initial guess of 1
ans =
1.28268099242501
2.4.3 Quadratic Polynomial Fitting
To get a better understanding of the minimization algorithms, let us try a simple polynomial-fitting algorithm that is very popular. The idea is quite simple: suppose we had a quadratic function
q(x) = ax^2 + bx + c
Its stationary point can be determined by setting the derivative to zero:
dq/dx = 2ax + b = 0
which gives x* = -b/(2a)
So, given a complicated function f(x) and an initial guess point x0, we sample f(x) at three points surrounding x0 (including x0). Using these three values we fit a quadratic
q(x) = ax^2 + bx + c
then we take the new guess point x0 = -b/(2a), and repeat the process till convergence is
reached.
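The fminQuadFit routine called below is provided as a separate download and is not reproduced in these notes. The following is only a minimal sketch of the idea (the sample spacing h, its shrinking schedule, and the iteration cap are assumptions made here):
function xStar = fminQuadFitSketch(f, x0, tol)
% Sketch of a quadratic-fit iteration (save as fminQuadFitSketch.m): fit
% q(d) = a*d^2 + b*d + c through three samples around the current guess,
% jump to the vertex, and repeat until the step is below tol.
h = 0.1;                              % spacing of the three sample points
for iter = 1:100                      % cap on the number of iterations
    ds = [-h; 0; h];                  % offsets of the three points around x0
    fs = [f(x0 - h); f(x0); f(x0 + h)];
    p  = polyfit(ds, fs, 2);          % p = [a b c] of the fitted quadratic in d = x - x0
    xNew = x0 - p(2)/(2*p(1));        % vertex of the quadratic, back in x
    if abs(xNew - x0) < tol
        xStar = xNew;
        return
    end
    x0 = xNew;
    h  = max(h/2, 1e-4);              % shrink the spacing (floored to limit round-off)
end
xStar = x0;                           % last iterate if the tolerance was not met
end
Like the downloadable routine (see the remark after the runs below), this vertex step does not by itself distinguish minima from maxima; it simply drives the iterate toward a nearby stationary point of the fitted quadratic.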
tol = 1e-10;
f = inline('3*sin(x) - x + 0.1*x^2 + 0.1*cos(2*x)');
fminQuadFit(f,-1, tol) % initial guess of -1
fminQuadFit(f,1, tol) % initial guess of 1
ans =
-1.19457375791758
ans =
1.28268099909323
Note that the code does not distinguish between minima and maxima. Let us try the code on a different function:
tol = 1e-10;
f = inline('x^2');
fminQuadFit(f,0.1, tol)