MA2213 Lecture 7
Optimization
Topics
The Best Approximation Problem (pages 159-165)
Chebyshev Polynomials (pages 165-171)
Finding the Minimum of a Function
Gradient of a Function
Method of Steepest Descent
Constrained Minimization
http://www.mat.univie.ac.at/~neum/glopt/applications.html
http://en.wikipedia.org/wiki/Optimization_(mathematics)
What is "argmin"?

    \min_{x \in [0,1]} x(x-1) = -1/4

    argmin_{x \in [0,1]} x(x-1) = 1/2

    argmin_{x \in [0,1]} x(1-x) = {0, 1}
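As a quick illustration, here is a minimal MATLAB sketch (using the built-in fminbnd) showing the difference between min and argmin:

    % fminbnd returns the minimizer (argmin) first, the minimum value second
    [xmin, fmin] = fminbnd(@(x) x.*(x-1), 0, 1)
    % xmin = 0.5000 is the argmin, fmin = -0.2500 is the min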
Optimization Problems

Least Squares: given (x_1, y_1), ..., (x_n, y_n) or y \in C([a,b]), and a subspace V \subset C([a,b]), compute

    argmin_{f \in V} \sum_{j=1}^{n} (f(x_j) - y_j)^2   or   argmin_{f \in V} \int_a^b (f(x) - y(x))^2 dx

Spline Interpolation: given a \le x_1 < x_2 < ... < x_{n-1} < x_n \le b, compute

    argmin_{s \in V} \int_a^b [s''(x)]^2 dx   where   V = { f \in C^2([a,b]) : f(x_i) = y_i, i = 1, ..., n }
LS equations page 179 are derived using differentiation.
Spline equations pages 149-151 are derived similarly.
The Best Approximation Problem p. 159

Definition. For f \in C([a,b]) and integer n \ge 0,

    \rho_n(f) = \min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]

where P_n = { polynomials of degree \le n }.

Definition. The best approximation problem is to compute

    argmin_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]
Computing the best approximation (pages 159-165) is more complicated than the least squares problems above.
Best Approximation Examples

    m_n = argmin_{p \in P_n} [ \max_{-1 \le x \le 1} |e^x - p(x)| ]

    m_0 = (e + e^{-1})/2 = 1.5431,    \max_{-1 \le x \le 1} |e^x - m_0| = 1.1752

    m_1(x) = 1.1752 x + 1.2643,       \max_{-1 \le x \le 1} |e^x - m_1(x)| = 0.279
[Figures: the best approximations of e^x of degrees 0 and 1, and the corresponding errors]
Properties of Best Approximation

Figures 4.13 and 4.14 on page 162, which display the error for the degree 3 Taylor approximation (at x = 0) and the error for the best approximation of degree 3 over the interval [-1,1] for exp(x), together with the figures in the preceding slides, support the assertions on pages 162-163:

1. Best approximation gives much smaller error than Taylor approximation.
2. Best approximation error tends to be dispersed over the interval rather than concentrated at the ends.
3. Best approximation error is oscillatory: it changes sign at least n+1 times in the interval, and the sizes of the oscillations are equal.
Theoretical Foundations
Theorem 1 (Weierstrass Approximation Theorem, 1885). If f \in C([a,b]) and \epsilon > 0, then there exists a polynomial p such that |f(x) - p(x)| \le \epsilon for all x \in [a,b].

Proof. Weierstrass's original proof used properties of solutions of a partial differential equation called the heat equation. A modern, more constructive proof based on Bernstein polynomials is given on pages 320-323 of Kincaid and Cheney's Numerical Analysis: Mathematics of Scientific Computing, Brooks Cole, 2002.

Corollary. f \in C([a,b]) \Rightarrow \lim_{n \to \infty} \rho_n(f) = \lim_{n \to \infty} \min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ] = 0
Accuracy of Best Approximation

If f \in C^\infty([a,b]) then

    \rho_n(f) = \min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]

satisfies

    \rho_n(f) \le \frac{[(b-a)/2]^{n+1}}{(n+1)! \, 2^n} \max_{a \le x \le b} |f^{(n+1)}(x)|

Table 4.6 on page 163 compares this upper bound with computed values of \rho_n(e^x), n = 1, 2, 3, 4, 5, 6, 7, and shows that the bound is about 2.5 times larger.
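The bound is easy to tabulate for f(x) = e^x on [-1,1], where ((b-a)/2)^{n+1} = 1 and \max |f^{(n+1)}(x)| = e (a quick MATLAB sketch):

    % Upper bound e/((n+1)! 2^n) for rho_n(e^x) on [-1,1]
    n = 1:7;
    bound = exp(1)./(factorial(n+1).*2.^n);
    disp([n; bound])    % e.g. n = 1 gives 0.6796, about 2.4 times rho_1 = 0.279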
Theoretical Foundations

Theorem 2 (Chebyshev's Alternation Theorem, 1859). If f \in C([a,b]) and n \ge 0, then

    p^* = argmin_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]

iff there exist points x_0 < x_1 < ... < x_n < x_{n+1} in [a,b] such that

    f(x_k) - p^*(x_k) = (-1)^k c ||f - p^*||,   0 \le k \le n+1

where |c| = 1 and ||f - p^*|| = \max_{a \le x \le b} |f(x) - p^*(x)|.

Proof. Kincaid and Cheney, page 416.
Sample Problem

In Example 4.4.1 on page 160 the author states that the function m_1(x) = 1.1752 x + 1.2643 is the best linear minimax polynomial to e^x on [-1,1]. Equivalently stated,

    m_1 = argmin_{p \in P_1} [ \max_{-1 \le x \le 1} |e^x - p(x)| ]

Problem. Use Theorem 2 to prove this statement.

Solution. It suffices to find points x_0 < x_1 < x_2 in [-1,1] such that

    |e^{x_j} - m_1(x_j)| = \max_{-1 \le x \le 1} |e^x - m_1(x)|,   j = 0, 1, 2

and the sequence e^{x_j} - m_1(x_j) changes sign twice.
Sample Problem

Step 1. Compute the set argmax_{-1 \le x \le 1} |e^x - m_1(x)|.

Question: Can this set be empty?

Observe that if |e^x - m_1(x)| has a maximum at x = y \in (-1,1), then e^x - m_1(x) has either a maximum or a minimum at x = y \in (-1,1), therefore

    d/dx [ e^x - 1.1752 x - 1.2643 ] |_{x=y} = e^y - 1.1752 = 0,

so y = \log_e(1.1752) = 0.1614 is the only point in (-1,1) where |e^x - m_1(x)| can have a maximum.
Sample Problem

Step 2. Observe that (-1,1) \subset [-1,1], therefore |e^x - m_1(x)| might also have a maximum at x = -1 and/or at x = 1. Equivalently stated,

    argmax_{-1 \le x \le 1} |e^x - m_1(x)| \subset { -1, 0.1614, 1 }

The maximum MUST occur at 1, 2, or all 3 of these points!

Step 3. Compute

    e^{-1} - m_1(-1) = 0.2788
    e^{0.1614} - m_1(0.1614) = -0.2788
    e^{1} - m_1(1) = 0.2788

Step 4. Choose the sequence x_0 = -1, x_1 = 0.1614, x_2 = 1.
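These three values are easy to verify numerically (a minimal MATLAB sketch):

    % Equioscillation check for m1(x) = 1.1752*x + 1.2643
    m1 = @(x) 1.1752*x + 1.2643;
    x = [-1, log(1.1752), 1];
    exp(x) - m1(x)    % approximately [0.2788, -0.2788, 0.2788]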
Remez Exchange Algorithm

The Remez exchange algorithm, described on pages 416-419 of Kincaid and Cheney, is based on Theorem 2. Invented by Evgeny Yakovlevich Remez in 1934, it is a powerful computational algorithm with vast applications in the design of engineering systems, such as the tuning filters that allow your TV and mobile telephone to tune in to the program of your choice or to listen (only) to the person who calls you.

http://en.wikipedia.org/wiki/Remez_algorithm
http://www.eepatents.com/receiver/Spec.html#D1
http://comelec.enst.fr/~rioul/publis/199302rioulduhamel.pdf
Chebyshev Polynomials

Definition. The Chebyshev polynomials T_0, T_1, T_2, ... are defined by the equation

    T_n(\cos\theta) = \cos(n\theta),   n = 0, 1, 2, ...

Remark. Clearly T_0(x) = 1, T_1(x) = x, T_2(x) = 2x^2 - 1; however, it is NOT obvious that there EXISTS a polynomial that satisfies the equation above for EVERY nonnegative integer n!
Triple Recursion Relation

The triple recursion relation derived on pages 167-168 is

    T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x),   n \ge 1

Result 1.  T_3(x) = 4x^3 - 3x,   T_4(x) = 8x^4 - 8x^2 + 1,   T_5(x) = 16x^5 - 20x^3 + 5x

Result 2.  T_n(x) = 2^{n-1} x^n + lower degree terms

Result 3.  T_n(-x) = (-1)^n T_n(x)
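The triple recursion is easy to implement. This MATLAB sketch stores each T_n as a coefficient vector (highest power first, MATLAB's polynomial convention) and reproduces Result 1:

    % Chebyshev coefficients via T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
    N = 5;
    T = cell(N+1,1);
    T{1} = 1;                     % T_0(x) = 1
    T{2} = [1 0];                 % T_1(x) = x
    for n = 2:N
        % 2x T_n: shift coefficients left; pad T_{n-1} to the same length
        T{n+1} = [2*T{n} 0] - [0 0 T{n-1}];
    end
    T{4}    % [4 0 -3 0]        = 4x^3 - 3x
    T{6}    % [16 0 -20 0 5 0]  = 16x^5 - 20x^3 + 5x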
Euler and the Binomial Expansion

Euler's formula and the binomial expansion give

    2\cos(n\theta) = e^{in\theta} + e^{-in\theta} = [\cos\theta + i\sin\theta]^n + [\cos\theta - i\sin\theta]^n

    = \sum_{k=0}^{n} \binom{n}{k} (\cos\theta)^{n-k} (i\sin\theta)^k + \sum_{k=0}^{n} \binom{n}{k} (\cos\theta)^{n-k} (-i\sin\theta)^k

    = 2 \sum_{0 \le 2j \le n} \binom{n}{2j} (-1)^j (\cos\theta)^{n-2j} (1 - \cos^2\theta)^j = 2 T_n(\cos\theta)

This exhibits \cos(n\theta) as a polynomial in \cos\theta, proving that T_n exists for every n.
Gradients

Definition.

    \nabla F(x_1, ..., x_n) = [ \partial F/\partial x_1, \partial F/\partial x_2, ..., \partial F/\partial x_n ]^T

Examples.

    \nabla (3x_1 + 7x_2) = [3, 7]^T        \nabla (x_1^2 + x_2^2) = [2x_1, 2x_2]^T

For F : R^n \to R defined by F(x_1, ..., x_n) = (1/2) x^T A x - b^T x, where x = [x_1, ..., x_n]^T, b \in R^n, and symmetric A \in R^{n \times n},

    \nabla F = A x - b

http://en.wikipedia.org/wiki/Gradient
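A finite-difference sanity check of the formula \nabla F = Ax - b (a sketch with an arbitrary symmetric A):

    % Central differences reproduce grad F for F(x) = .5*x'*A*x - b'*x
    A = [2 1; 1 3]; b = [1; -1];
    F = @(x) .5*x'*A*x - b'*x;
    x = [0.7; -0.2]; h = 1e-6; g = zeros(2,1);
    for j = 1:2
        e = zeros(2,1); e(j) = h;
        g(j) = (F(x+e) - F(x-e))/(2*h);
    end
    [g, A*x - b]    % the two columns agree to about 10 digits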
Geometric Meaning

Result. If F : R^n \to R, x \in R^n, and u \in R^n is a unit vector (u^T u = 1), then

    d/dt F(x + t u) |_{t=0} = \sum_{j=1}^{n} (\partial F/\partial x_j)(x) u_j = u^T (\nabla F(x))

This has a maximum value when u = \nabla F(x) / ||\nabla F(x)||_2, and it equals ||\nabla F(x)||_2.

Therefore, the gradient of F at x is a vector in whose direction F has steepest ascent (or increase), and whose magnitude equals the rate of increase.

Question: What is the direction of steepest descent?
Minima and Maxima

Theorem (Calculus). If F : R^n \to R has a minimal or a maximal value F(y), then (\nabla F)(y) = 0.

Example. If F(x_1, x_2) = x_1^2 - 2x_1 + 1 + 3x_2^2 = (x_1 - 1)^2 + 3x_2^2, then

    \min F(x_1, x_2) = F(1, 0) = 0   and   (\nabla F)(x_1, x_2) = [2x_1 - 2, 6x_2]^T,

so (\nabla F)(1, 0) = [0, 0]^T = 0.

Remark. The function G : R^2 \to R defined by G(x_1, x_2) = x_1^2 - x_2^2 satisfies (\nabla G)(0,0) = 0; however, G has no maxima and no minima.
Linear Equations and Optimization

Theorem. If P \in R^{n \times n} is symmetric and positive definite, then for every b \in R^n the function F : R^n \to R defined by

    F(x_1, ..., x_n) = (1/2) x^T P x - b^T x

satisfies the following three properties:

1. \lim_{||x|| \to \infty} F(x_1, ..., x_n) = \infty
2. F has a minimum value F(y)
3. y satisfies P y = b, therefore it is unique.

Proof. Let c = \min_{||x|| = 1} x^T P x. Since P is positive definite,

    c > 0  \Rightarrow  x^T P x \ge c ||x||^2  \Rightarrow  \lim_{||x|| \to \infty} F(x) = \infty
Linear Equations and Optimization

Therefore there exists a number r > 0 such that ||x|| \ge r \Rightarrow F(x) \ge F(0). Since the set B_r = { x \in R^n : ||x|| \le r } is bounded and closed, there exists y \in B_r such that

    F(y) = \min_{x \in R^n} F(x)

Therefore, by the preceding calculus theorem, 0 = (\nabla F)(y). Furthermore, since

    F(x) = (1/2) x^T P x - b^T x  \Rightarrow  \nabla F(x) = P x - b,

it follows that 0 = P y - b \Rightarrow y = P^{-1} b.
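The conclusion y = P^{-1} b can be checked against a direct minimization (a sketch using MATLAB's fminsearch):

    % The minimizer of F(x) = .5*x'*P*x - b'*x solves P*y = b
    P = [3 1; 1 2]; b = [1; 4];    % symmetric positive definite P
    F = @(x) .5*x'*P*x - b'*x;
    fminsearch(F, [0; 0])          % numerical minimizer
    P\b                            % exact minimizer y = P^{-1} b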
Application to Least Squares Geometry

Theorem. Given m \ge n > 0, a matrix B \in R^{m \times n} with rank(B) = n (or equivalently, B^T B nonsingular), and y \in R^m, the following conditions are equivalent:

(i) The function F(x) = (Bx - y)^T (Bx - y), x \in R^n, has a minimum value at x = c
(ii) c = (B^T B)^{-1} B^T y
(iii) Bc - y \perp span{ columns of B }

Condition (iii) is read as: Bc - y is orthogonal (or perpendicular) to the subspace of R^m spanned by the column vectors of B.
Application to Least Squares Geometry

Proof. (i) iff (ii): First observe that

    F(x) = (Bx - y)^T (Bx - y) = 2 [ (1/2) x^T P x - b^T x ] + y^T y

where b = B^T y and P = B^T B is symmetric and positive definite. If F(x) has a minimum value at x = c, then the preceding theorem implies that

    P c = b  \Rightarrow  c = (B^T B)^{-1} B^T y

(ii) iff (iii):

    Bc - y \perp span{ columns of B }  iff  B^T (Bc - y) = 0  iff  c = (B^T B)^{-1} B^T y

This proof that (ii) iff (iii) was emailed to me by Fu Xiang.
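The equivalences are easy to confirm numerically; the normal equations agree with MATLAB's backslash least squares solve (a sketch with random data):

    % Normal equations vs. backslash for a full-rank B
    m = 10; n = 3;
    B = rand(m,n); y = rand(m,1);
    c = (B'*B)\(B'*y);     % condition (ii)
    [c, B\y]               % same least squares solution
    B'*(B*c - y)           % condition (iii): residual orthogonal to columns of B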
Steepest Descent Method

The steepest descent method of Cauchy (1847) is a numerical algorithm to solve the following problem: given F : R^n \to R, compute

    y = argmin_{x \in R^n} F(x)

1. Start with y_1 and for k = 1 : N do the following:
2. Compute d_k = -\nabla F(y_k)
3. Compute t_k = argmin_t F(y_k + t d_k)
4. Compute y_{k+1} = y_k + t_k d_k

Reference: pages 440-441, Numerical Methods by Dahlquist, G. and Bjorck, A., Prentice-Hall, 1974.
Application of Steepest Descent

To minimize the previous function F : R^n \to R,

    F(x) = (1/2) x^T A x - b^T x,   \nabla F(x) = A x - b

1. Start with y_1 and for k = 1 : N do the following:
2. Compute d_k = -\nabla F(y_k) = b - A y_k
3. Compute t_k = argmin_t F(y_k + t d_k):

    d/dt F(y_k + t d_k) |_{t = t_k} = d_k^T \nabla F(y_k + t_k d_k) = 0
    \Rightarrow  t_k = d_k^T (b - A y_k) / d_k^T A d_k

4. Compute y_{k+1} = y_k + t_k d_k
MATLAB CODE

function [A,b,y,er] = steepdesc(N,y1)
% function [A,b,y,er] = steepdesc(N,y1)
% Steepest descent for F(x) = .5*x'*A*x - b'*x starting from y1.
A = [1 1;1 2];
b = [2 3]';
y(:,1) = y1;
for k = 1:N
    yk = y(:,k);
    dk = b - A*yk;                     % steepest descent direction
    tk = dk'*(b-A*yk)/(dk'*A*dk);      % exact line search step
    y(:,k+1) = yk + tk*dk;
    er(k) = norm(A*y(:,k+1)-b);        % residual norm
end
% Graphics: contours of F, its gradient field, and the iterates
dx = 1/10;
for i = 1:21
    for j = 1:21
        x = [(j-1)*dx (i-1)*dx]';      % row i <-> x2, column j <-> x1
        F(i,j) = .5*x'*A*x - b'*x;
    end
end
X = ones(21,1)*(0:.1:2);
Y = X';
[FX,FY] = gradient(F);
contour(X,Y,F,20)
hold on
quiver(X,Y,FX,FY);
plot(y(1,:),y(2,:),'ro')
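A typical call, assuming the function above is saved as steepdesc.m:

    [A,b,y,er] = steepdesc(20,[0 0]');
    semilogy(er)    % residual norms ||A*y_k - b|| decrease geometrically

Since A\b = [1 1]', the iterates y(:,k) approach that point.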
Graphics of Steepest Descent

[Figure: the contour plot of F, its gradient field, and the iterates y_k produced by the code above]
Constrained Optimization

Problem. Minimize F : R^n \to R subject to a constraint c(x) = 0, where c : R^n \to R^m.

The Lagrange multiplier method computes y \in R^n, \lambda \in R^m that solve the n equations

    \nabla F(y) = \sum_{j=1}^{m} \lambda_j \nabla c_j(y)

and the m equations

    c(y) = 0

This will generally result in a nonlinear system of equations, the topic discussed in Lecture 9.

http://en.wikipedia.org/wiki/Lagrange_multiplier
http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html
Examples

1. Minimize F(x_1, x_2) = x_1^2 + x_2^2 with the constraint x_1 + x_2 - 1 = 0.

Since \nabla(x_1 + x_2 - 1) = [1, 1]^T, the method of Lagrange multipliers gives

    \nabla F(y_1, y_2) = 2 [y_1, y_2]^T = \lambda \nabla(y_1 + y_2 - 1) = [\lambda, \lambda]^T

and y_1 + y_2 - 1 = 0  \Rightarrow  y_1 = y_2 = 1/2,  \lambda = 1.

2. Maximize F(x) = x^T A x, where A \in R^{n \times n} is symmetric and positive definite, subject to the constraint x^T x - 1 = 0.

This gives \nabla F(y) = 2 A y = \lambda \nabla(y^T y) = 2 \lambda y, hence y is an eigenvector of A and F(y) = y^T A y = \lambda y^T y = \lambda. Therefore \lambda > 0 and is the largest eigenvalue of A.
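Example 2 can be confirmed numerically: the constrained maximum is attained at a unit eigenvector and equals the largest eigenvalue (a MATLAB sketch):

    % Maximize x'*A*x subject to x'*x = 1
    A = [4 1; 1 3];            % symmetric positive definite
    [V,D] = eig(A);
    [lam, idx] = max(diag(D));
    y = V(:,idx);              % eig returns unit-norm eigenvectors
    [y'*A*y, lam]              % constrained maximum = largest eigenvalue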
Homework Due Tutorial 4 (Week 9, 15-19 Oct)
1. Do problem 7 on page 165. Suggestion: practice by doing
problem 2 on page 164 and problem 5 on page 165 since these
problems are similar and have solutions on pages 538-539. Do
NOT hand in solutions for your practice problems.
2. Do problem 10 on pages 170-171. Suggestion: study the
discussion of the minimum size property on pages 168-169.
Then practice by doing problem 3 on page 169. Do
NOT hand in solutions for your practice problems.
Extra Credit: Compute

    argmin_{p \in P_{n-1}} [ \max_{-1 \le x \le 1} |x^n - p(x)| ]

Suggestion: THINK about Theorem 2 and problem 3 on page 169.
Homework Due Tutorial 4 (Week 9, 15-19 Oct)

3. The trapezoid method for integrating a function f \in C^\infty([a,b]) using n equal length subintervals can be shown to give an estimate having the form

    T(n) = I + a_1 n^{-2} + a_2 n^{-4} + a_3 n^{-6} + ...

where I = \int_a^b f(x) dx and the sequence a_1, a_2, a_3, ... depends on f.

(a) Show that for any n, S(2n) = (4/3) T(2n) - (1/3) T(n), where S(2n) is the estimate for the integral obtained using Simpson's method with 2n equal length subintervals.

(b) Use this fact together with the form of T(n) above to prove that there exists a sequence b_1, b_2, b_3, ... with S(n) = I + b_1 n^{-4} + b_2 n^{-6} + ...

(c) Compute constants r_1, r_2, r_3 so that there exists a sequence c_1, c_2, ... with

    r_1 T(n) + r_2 T(2n) + r_3 T(4n) = I + c_1 n^{-6} + c_2 n^{-8} + ...
Homework Due Lab 4 (Week 10, 22-26 October)

4. Consider the equations for the 9 variables inside the array

                 x_{01} = 1   x_{02} = 2   x_{03} = 3
    x_{10} = 1   x_{11}       x_{12}       x_{13}       x_{14} = 4
    x_{20} = 1   x_{21}       x_{22}       x_{23}       x_{24} = 4
    x_{30} = 1   x_{31}       x_{32}       x_{33}       x_{34} = 4
                 x_{41} = 2   x_{42} = 3   x_{43} = 4

    x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1} - 4 x_{i,j} = 0,   i, j = 1, 2, 3

(a) Write these equations as Ax = b where A \in R^{9 \times 9}, b \in R^9, then solve using Gauss elimination and display the solution in the array.

(b) Compute the Jacobi iteration matrix B \in R^{9 \times 9} and ||B||.

(c) Write a MATLAB program to implement the Jacobi method for an (n+2) x (n+2) array without computing a sparse matrix A (a minimal sketch of the update follows).
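One possible shape for the update in part (c), using the boundary data of the 5 x 5 array above (array indices 0..4 become MATLAB indices 1..5); this is a sketch, not a required form:

    % Jacobi sweeps on the array itself; no matrix A is formed
    n = 3; maxit = 200;
    x = zeros(n+2);
    x(1,2:4) = [1 2 3]; x(5,2:4) = [2 3 4];   % top and bottom boundary rows
    x(2:4,1) = [1 1 1]; x(2:4,5) = [4 4 4];   % left and right boundary columns
    for sweep = 1:maxit
        xnew = x;
        for i = 2:n+1
            for j = 2:n+1
                xnew(i,j) = (x(i-1,j) + x(i+1,j) + x(i,j-1) + x(i,j+1))/4;
            end
        end
        x = xnew;
    end
    x    % interior entries approximate the solution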
Homework Due Tutorial 4 (Week 9, 15-19 Oct)

5. Consider the equation Ax = b where A \in R^{n \times n}, b \in R^n, and

    A = [  2 -1  0 ...  0 ]
        [ -1  2 -1 ...  0 ]
        [  0 -1  2 ...    ]
        [ ...        2 -1 ]
        [  0 ...  0 -1  2 ]

(a) Prove that the vectors

    v_m = [ sin(mh), sin(2mh), sin(3mh), ..., sin(nmh) ]^T,   where h = \pi/(n+1),  m = 1, ..., n,

are eigenvectors of A and compute their eigenvalues.

(b) Prove that the Jacobi method for this matrix converges by showing that the spectral radius of the iteration matrix is < 1.
Homework Due Lab 4 (Week 10, 22-26 October)
1. (a) Modify the computer code developed for Lab 3 to compute
polynomials that interpolate the function 1/(1+x*x) on the interval
[-5,5] based on N = 4, 8, 16, and 32 nodes located at the points
x(j) = 5 cos((2 j – 1)pi/(2N)), j = 1,…,N. (b) Compare the results
with the results you obtained in Lab 3 using uniform nodes.
(c) Plot the functions w(x) = (x - x(1))(x - x(2)) ... (x - x(N))
both for the case where the nodes x(j) are uniform and where
they are chosen as above. (d) Show that x(j)/5 are the zeros of a
Chebyshev polynomial, then derive a formula for w(x) and use this
formula to explain why the use of the nonuniform nodes x(j) above
gives a smaller interpolation error than the use of uniform nodes.
Homework Due Lab 4 (Week 10, 22-26 October)
2. (a) Write computer code to compute trapezoidal approximations for

    I = \int_0^4 dx/(1 + x^2) = \tan^{-1}(4) = 1.32581766366803
and run this code to compute approximations I(n) and associated
errors for n = 2, 4, 8, 16, 32, 64 and 128 intervals. (b) Use the
(Romberg) formula that you developed in Tutorial 4 to combine
I(n), I(2n), and I(4n) for n = 2,4,8,16,32 to develop more accurate
approximations R(n). Compute the ratios of consecutive errors
(I-I(2n))/(I-I(n)) and (I-R(2n))/(I-R(n)) for n = 2,4,8,16, present
them in a table and discuss them (I denotes exact integral).
(c) Compute approximations to the integral in (a) using Gauss
quadrature with n = 1, 2, 3, 4, and present the errors in a table
and compare them to the errors obtained in (a), (b) above.
Homework Due Lab 5 (Week 12, 5-9 November)

3. (a) Use the MATLAB program for Problem 4(c) of the homework due Tutorial 4 to compute the internal variables in the following array for n = 50

                     x_{0,1} = -1         ...   x_{0,n} = -n^2
    x_{1,0} = 1      x_{1,1}              ...   x_{1,n}            x_{1,n+1} = -n(n+2)
    ...              ...                        ...                ...
    x_{n,0} = n^2    x_{n,1}              ...   x_{n,n}            x_{n,n+1} = -(2n+1)
                     x_{n+1,1} = n(n+2)   ...   x_{n+1,n} = 2n+1

that satisfy the inequalities

    | x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1} - 4 x_{i,j} | \le 10^{-4},   1 \le i, j \le n

(b) Display the solution using MATLAB mesh & contour commands.

(c) Find a polynomial P of two variables so that the exact solution satisfies x_{i,j} = P(i,j), and use it to compute & display the error.