MA2213 Lecture 7

Optimization Topics
The Best Approximation Problem (pages 159-165)
Chebyshev Polynomials (pages 165-171)
Finding the Minimum of a Function
Gradient of a Function
Method of Steepest Descent
Constrained Minimization

http://www.mat.univie.ac.at/~neum/glopt/applications.html
http://en.wikipedia.org/wiki/Optimization_(mathematics)

What is "argmin"?

$\min_{x \in [0,1]} x(x-1) = -\tfrac{1}{4}$,  $\arg\min_{x \in [0,1]} x(x-1) = \tfrac{1}{2}$,  $\arg\min_{x \in [0,1]} x(1-x) = \{0, 1\}$.

Optimization Problems

Least Squares: given $(x_1, y_1), \dots, (x_n, y_n)$ or $y \in C([a,b])$, and a subspace $V \subset C([a,b])$, compute
$\arg\min_{f \in V} \sum_{j=1}^{n} (f(x_j) - y_j)^2$  or  $\arg\min_{f \in V} \int_a^b (f(x) - y(x))^2 \, dx$.

Spline Interpolation: given $a \le x_1 < x_2 < \dots < x_{n-1} < x_n \le b$, compute $\arg\min_{s \in V} \int_a^b [s''(x)]^2 \, dx$, where
$V = \{ f \in C^2([a,b]) : f(x_i) = y_i, \ i = 1, \dots, n \}$.

The least squares equations (page 179) are derived using differentiation; the spline equations (pages 149-151) are derived similarly.

The Best Approximation Problem (p. 159)

Definition. For $f \in C([a,b])$ and integer $n \ge 0$,
$\rho_n(f) = \min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]$,  where  $P_n = \{ \text{polynomials of degree} \le n \}$.

Definition. The best approximation problem is to compute $\arg\min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]$.

Best approximation (pages 159-165) is more complicated than the least squares and spline problems above.

Best Approximation Examples

$m_n = \arg\min_{p \in P_n} [ \max_{-1 \le x \le 1} |e^x - p(x)| ]$

$m_0 = (e + e^{-1})/2 \approx 1.5431$,  $\max_{-1 \le x \le 1} |e^x - m_0| \approx 1.1752$
$m_1(x) = 1.1752\,x + 1.2643$,  $\max_{-1 \le x \le 1} |e^x - m_1(x)| \approx 0.279$

[Figures: best approximation of degree 0 and its error; best approximation of degree 1 and its error.]

Properties of Best Approximation

Figures 4.13 and 4.14 on page 162 display the error of the degree 3 Taylor approximation (at x = 0) and the error of the degree 3 best approximation of exp(x) over the interval [-1,1]. These figures, together with the figures in the preceding slides, support the assertions on pages 162-163:
1. Best approximation gives a much smaller error than Taylor approximation.
2. The best approximation error tends to be dispersed over the interval rather than concentrated at the ends.
3. The best approximation error is oscillatory: it changes sign at least n+1 times in the interval, and the sizes of the oscillations are equal.

Theoretical Foundations

Theorem 1 (Weierstrass Approximation Theorem, 1885). If $f \in C([a,b])$ and $\epsilon > 0$, then there exists a polynomial $p$ such that $|f(x) - p(x)| \le \epsilon$, $x \in [a,b]$.

Proof. Weierstrass's original proof used properties of solutions of a partial differential equation called the heat equation. A modern, more constructive proof based on Bernstein polynomials is given on pages 320-323 of Kincaid and Cheney's Numerical Analysis: Mathematics of Scientific Computing, Brooks/Cole, 2002.

Corollary. $f \in C([a,b]) \implies \lim_{n \to \infty} \rho_n(f) = \lim_{n \to \infty} \min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ] = 0$.

Accuracy of Best Approximation

If $f \in C^{n+1}([a,b])$ then $\rho_n(f) = \min_{p \in P_n} [ \max_{a \le x \le b} |f(x) - p(x)| ]$ satisfies
$\rho_n(f) \le \dfrac{[(b-a)/2]^{n+1}}{(n+1)! \; 2^n} \max_{a \le x \le b} |f^{(n+1)}(x)|$.

Table 4.6 on page 163 compares this upper bound with computed values of $\rho_n(e^x)$, $n = 1, 2, \dots, 7$, and shows that the bound is about 2.5 times larger.

Theoretical Foundations

Theorem 2 (Chebyshev Alternation Theorem, 1859). If $f \in C([a,b])$ and $n \ge 0$, then $p = \arg\min_{q \in P_n} [ \max_{a \le x \le b} |f(x) - q(x)| ]$ iff there exist points $a \le x_0 < x_1 < \dots < x_n < x_{n+1} \le b$ such that
$f(x_k) - p(x_k) = (-1)^k \, c \, \|f - p\|_\infty$,  $0 \le k \le n+1$,
where $c \in \{+1, -1\}$ and $\|f - p\|_\infty = \max_{a \le x \le b} |f(x) - p(x)|$.

Proof. Kincaid and Cheney, page 416.
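A quick numerical illustration of Theorem 2 (an added sketch, not part of the original slides): the MATLAB fragment below samples the error $e^x - m_1(x)$ of the best linear approximation $m_1(x) = 1.1752\,x + 1.2643$ from the sample problem that follows, and evaluates it at the three alternation points computed there.

x  = linspace(-1,1,2001);            % fine grid on [-1,1]
m1 = 1.1752*x + 1.2643;              % best linear approximation (Example 4.4.1)
err = exp(x) - m1;                   % approximation error
fprintf('max |error| = %.4f\n', max(abs(err)))     % approx 0.2788
fprintf('error at -1, log(1.1752), 1 : %.4f %.4f %.4f\n', ...
    exp(-1) - (-1.1752 + 1.2643), ...
    1.1752 - (1.1752*log(1.1752) + 1.2643), ...
    exp(1) - (1.1752 + 1.2643))      % approx +0.2788, -0.2788, +0.2788
plot(x,err), grid on                 % the error equioscillates: + , - , +

The printed values alternate in sign and share the magnitude $\max |e^x - m_1(x)| \approx 0.2788$, exactly the pattern Theorem 2 requires, with $n = 1$, at $n+2 = 3$ points.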
Sample Problem

In Example 4.4.1 on page 160 the author states that the function $m_1(x) = 1.1752\,x + 1.2643$ is the best linear minimax polynomial to $e^x$ on $[-1,1]$; equivalently stated,
$m_1 = \arg\min_{p \in P_1} [ \max_{-1 \le x \le 1} |e^x - p(x)| ]$.

Problem. Use Theorem 2 to prove this statement.

Solution. It suffices to find points $x_0 < x_1 < x_2$ in $[-1,1]$ such that
$|e^{x_j} - m_1(x_j)| = \max_{-1 \le x \le 1} |e^x - m_1(x)|$, $j = 0, 1, 2$,
and the sequence $e^{x_j} - m_1(x_j)$ changes sign twice.

Step 1. Compute the set $\arg\max_{-1 \le x \le 1} |e^x - m_1(x)|$. Question: can this set be empty? (No: a continuous function on a closed bounded interval attains its maximum.) Observe that if $|e^x - m_1(x)|$ has a maximum at $x = y \in (-1,1)$, then $e^x - m_1(x)$ has either a maximum or a minimum at $x = y \in (-1,1)$, therefore
$\frac{d}{dx} [ e^x - 1.1752\,x - 1.2643 ] \big|_{x=y} = e^y - 1.1752 = 0$,
so $y = \log_e(1.1752) \approx 0.1614$ is the only point in $(-1,1)$ where $|e^x - m_1(x)|$ can have a maximum.

Step 2. Observe that $(-1,1) \ne [-1,1]$, therefore $|e^x - m_1(x)|$ might also have a maximum at $x = -1$ and/or at $x = 1$. Equivalently stated,
$\arg\max_{-1 \le x \le 1} |e^x - m_1(x)| \subset \{ -1, \ 0.1614, \ 1 \}$.
The maximum MUST occur at 1, 2, or all 3 of these points!

Step 3. Compute
$e^{-1} - m_1(-1) \approx 0.2788$,  $e^{0.1614} - m_1(0.1614) \approx -0.2788$,  $e^{1} - m_1(1) \approx 0.2788$.

Step 4. Choose the sequence $x_0 = -1$, $x_1 = 0.1614$, $x_2 = 1$.

Remez Exchange Algorithm

The Remez exchange algorithm, described on pages 416-419 of Kincaid and Cheney, is based on Theorem 2. Invented by Evgeny Yakovlevich Remez in 1934, it is a powerful computational algorithm with vast applications in the design of engineering systems, such as the tuning filters that allow your TV and mobile telephone to tune in to the program of your choice, or to listen (only) to the person who calls you.
http://en.wikipedia.org/wiki/Remez_algorithm
http://www.eepatents.com/receiver/Spec.html#D1
http://comelec.enst.fr/~rioul/publis/199302rioulduhamel.pdf

Chebyshev Polynomials

Definition. The Chebyshev polynomials $T_0, T_1, T_2, \dots$ are defined by the equation
$T_n(\cos\theta) = \cos(n\theta)$,  $n = 0, 1, 2, \dots$

Remark. Clearly $T_0(x) = 1$, $T_1(x) = x$, $T_2(x) = 2x^2 - 1$; however, it is NOT obvious that there EXISTS a polynomial that satisfies the equation above for EVERY nonnegative integer $n$!

Triple Recursion Relation (derived on pages 167-168):
$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$,  $n \ge 1$.

Result 1. $T_3(x) = 4x^3 - 3x$,  $T_4(x) = 8x^4 - 8x^2 + 1$,  $T_5(x) = 16x^5 - 20x^3 + 5x$.
Result 2. $T_n(x) = 2^{n-1} x^n + \text{lower degree terms}$.
Result 3. $T_n(-x) = (-1)^n T_n(x)$.

Euler and the Binomial Expansion

$2\cos(n\theta) = e^{in\theta} + e^{-in\theta} = [\cos\theta + i\sin\theta]^n + [\cos\theta - i\sin\theta]^n$
$= \sum_{k=0}^{n} \binom{n}{k} (\cos\theta)^{n-k} (i\sin\theta)^k + \sum_{k=0}^{n} \binom{n}{k} (\cos\theta)^{n-k} (-i\sin\theta)^k$
$= 2 \sum_{0 \le 2j \le n} \binom{n}{2j} (-1)^j (\cos\theta)^{n-2j} (1 - \cos^2\theta)^j = 2\,T_n(\cos\theta)$.

The odd powers of $i\sin\theta$ cancel, so the right side is a polynomial in $\cos\theta$; this proves that $T_n$ exists for every $n$.

Gradients

Definition. $\nabla F = \left[ \frac{\partial F}{\partial x_1}, \frac{\partial F}{\partial x_2}, \dots, \frac{\partial F}{\partial x_n} \right]^T$.

Examples. $\nabla(3x_1 + 7x_2) = [3, \ 7]^T$,  $\nabla(x_1^2 + x_2^2) = [2x_1, \ 2x_2]^T$.

For $F : R^n \to R$ defined by $F(x_1, \dots, x_n) = \frac{1}{2} x^T A x - b^T x$, where $x = [x_1, \dots, x_n]^T$, $b \in R^n$ and symmetric $A \in R^{n \times n}$:
$\nabla F = Ax - b$.
http://en.wikipedia.org/wiki/Gradient

Geometric Meaning

Result. If $F : R^n \to R$, $x \in R^n$ and $u \in R^n$ is a unit vector ($u^T u = 1$), then
$\frac{d}{dt} F(x + tu) \big|_{t=0} = \sum_{j=1}^{n} \frac{\partial F}{\partial x_j}(x) \, u_j = u^T (\nabla F(x))$.
This has maximum value when $u = \nabla F(x) / \|\nabla F(x)\|_2$, and the maximum equals $\|\nabla F(x)\|_2$. Therefore, the gradient of $F$ at $x$ is a vector in whose direction $F$ has steepest ascent (or increase) and whose magnitude equals the rate of increase.

Question: What is the direction of steepest descent? (Answer: $-\nabla F(x)$.)

Minima and Maxima

Theorem (Calculus). If $F : R^n \to R$ has a minimal or a maximal value $F(y)$, then $(\nabla F)(y) = 0$.

Example. If $F(x_1, x_2) = x_1^2 - 2x_1 + 1 + 3x_2^2 = (x_1 - 1)^2 + 3x_2^2$, then $\min F(x_1, x_2) = F(1,0) = 0$ and $(\nabla F)(x_1, x_2) = [2x_1 - 2, \ 6x_2]^T$, so $(\nabla F)(1,0) = [0, \ 0]^T = 0$.

Remark. The function $G : R^2 \to R$ defined by $G(x_1, x_2) = x_1^2 - x_2^2$ satisfies $(\nabla G)(0,0) = 0$; however, $G$ has no maxima and no minima.
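The following MATLAB fragment (an added sketch, not part of the original slides) checks the Example and Remark above numerically: a central-difference approximation of the gradient vanishes at $(1,0)$ for $F$ and at $(0,0)$ for $G$, even though $(0,0)$ is a saddle point of $G$, not an extremum.

F = @(x) (x(1)-1).^2 + 3*x(2).^2;          % has minimum F(1,0) = 0
G = @(x) x(1).^2 - x(2).^2;                % saddle at (0,0)
h = 1e-6;                                  % central-difference step
grad = @(f,x) [ (f(x+[h;0]) - f(x-[h;0])) / (2*h) ; ...
                (f(x+[0;h]) - f(x-[0;h])) / (2*h) ];
disp(grad(F,[1;0])')                       % approx [0 0]
disp(grad(G,[0;0])')                       % approx [0 0]
disp([G([h;0]) G([0;h])])                  % > 0 and < 0 : (0,0) is a saddle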
Linear Equations and Optimization

Theorem. If $P \in R^{n \times n}$ is symmetric and positive definite and $b \in R^n$, then the function $F : R^n \to R$ defined by
$F(x_1, \dots, x_n) = \frac{1}{2} x^T P x - b^T x$
satisfies the following three properties:
1. $\lim_{\|x\| \to \infty} F(x_1, \dots, x_n) = \infty$.
2. $F$ has a minimum value $F(y)$.
3. $y$ satisfies $Py = b$, therefore it is unique.

Proof. Let $c = \min_{\|x\| = 1} x^T P x$. Since $P$ is pos. def., $c > 0$ and $x^T P x \ge c \, \|x\|^2$, hence $F(x) \ge \frac{1}{2} c \, \|x\|^2 - \|b\| \, \|x\| \to \infty$ as $\|x\| \to \infty$.

Therefore there exists a number $r > 0$ such that $\|x\| > r \implies F(x) > F(0)$. Since the set $B_r = \{ x \in R^n : \|x\| \le r \}$ is bounded and closed, there exists $y \in B_r$ such that $F(y) = \min_{x \in R^n} F(x)$. Therefore, by the preceding calculus theorem, it follows that $0 = (\nabla F)(y)$. Furthermore, since
$F(x) = \frac{1}{2} x^T P x - b^T x \implies \nabla F(x) = Px - b$,
it follows that $0 = Py - b \implies y = P^{-1} b$.

Application to Least Squares Geometry

Theorem. Given $m > n > 0$, a matrix $B \in R^{m \times n}$ with rank$(B) = n$ (or equivalently, $B^T B$ nonsingular), and $y \in R^m$, the following conditions are equivalent:
(i) the function $F(x) = (Bx - y)^T (Bx - y)$, $x \in R^n$, has a minimum value at $x = c$;
(ii) $c = (B^T B)^{-1} B^T y$;
(iii) $Bc - y \perp \text{span}\{\text{columns of } B\}$ (read: $Bc - y$ is orthogonal, or perpendicular, to the subspace of $R^m$ spanned by the column vectors of $B$).

Proof. (i) iff (ii): First observe that
$F(x) = (Bx - y)^T (Bx - y) = 2 \left[ \frac{1}{2} x^T P x - b^T x \right] + y^T y$,  where  $b = B^T y$ and $P = B^T B$
is symmetric and positive definite. If $F(x)$ has a minimum value at $x = c$, then the preceding theorem implies that
$Pc = b \implies c = (B^T B)^{-1} B^T y$.
(ii) iff (iii): $Bc - y \perp \text{span}\{\text{columns of } B\}$ iff $B^T (Bc - y) = 0$ iff $c = (B^T B)^{-1} B^T y$.
(This proof that (ii) iff (iii) was emailed to me by Fu Xiang.)

Steepest Descent

The method of steepest descent, due to Cauchy (1847), is a numerical algorithm to solve the following problem: given $F : R^n \to R$, compute $y = \arg\min_{x \in R^n} F$.
1. Start with $y_1$ and for $k = 1 : N$ do the following:
2. Compute $d_k = -\nabla F(y_k)$.
3. Compute $t_k = \arg\min_t F(y_k + t \, d_k)$.
4. Compute $y_{k+1} = y_k + t_k d_k$.
Reference: pages 440-441 of Numerical Methods by Dahlquist, G. and Bjorck, A., Prentice-Hall, 1974.

Application of Steepest Descent

To minimize the previous function $F : R^n \to R$, $F(x) = \frac{1}{2} x^T A x - b^T x$, $\nabla F(x) = Ax - b$:
1. Start with $y_1$ and for $k = 1 : N$ do the following:
2. Compute $d_k = -\nabla F(y_k) = b - A y_k$.
3. Compute $t_k = \arg\min_t F(y_k + t \, d_k)$; setting
$0 = \frac{d}{dt} F(y_k + t \, d_k) \big|_{t = t_k} = d_k^T \nabla F(y_k + t_k d_k)$
gives $t_k = d_k^T (b - A y_k) / (d_k^T A d_k)$.
4. Compute $y_{k+1} = y_k + t_k d_k$.

MATLAB CODE

function [A,b,y,er] = steepdesc(N,y1)
% function [A,b,y,er] = steepdesc(N,y1)
% Steepest descent for F(x) = .5*x'*A*x - b'*x, with a contour plot of F,
% its gradient field, and the iterates.
A = [1 1;1 2];
b = [2 3]';
dx = 1/10;
for i = 1:21                              % tabulate F on a 21x21 grid
    for j = 1:21
        x = [(i-1)*dx (j-1)*dx]';
        F(i,j) = .5*x'*A*x - b'*x;
    end
end
X = ones(21,1)*(0:.1:2);
Y = X';
[FX,FY] = gradient(F);
contour(X,Y,F,20)
hold on
quiver(X,Y,FX,FY);
y(:,1) = y1;                              % steepest descent iteration
for k = 1:N
    yk = y(:,k);
    dk = b - A*yk;                        % dk = -grad F(yk)
    tk = dk'*(b-A*yk)/(dk'*A*dk);         % exact line search
    y(:,k+1) = yk + tk*dk;
    er(k) = norm(A*y(:,k+1)-b);           % residual norm
end
plot(y(1,:),y(2,:),'ro')

[Graphics of Steepest Descent: contour lines of F with the gradient field and the iterates y_k.]
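For example (an added usage note, not in the original slides), ten steps starting from the origin converge toward the solution of $Ay = b$:

[A,b,y,er] = steepdesc(10,[0;0]);   % 10 steps from y1 = (0,0)
disp(y(:,end)')                     % close to (A\b)' = [1 1]
figure, semilogy(er), grid on       % residuals ||A*yk - b|| shrink geometrically

Because the condition number of $A$ here is about 6.9, the error contracts by a roughly constant factor per step, so the semilogy plot of the residuals is close to a straight line.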
Constrained Optimization

Problem. Minimize $F : R^n \to R$ subject to a constraint $c(x) = 0$, where $c : R^n \to R^m$. The Lagrange multiplier method computes $y \in R^n$ and $\lambda \in R^m$ that solve the $n$ equations
$\nabla F(y) = \sum_{j=1}^{m} \lambda_j \nabla c_j(y)$
and the $m$ equations $c(y) = 0$. This will generally result in a nonlinear system of equations - the topic discussed in Lecture 9.
http://en.wikipedia.org/wiki/Lagrange_multiplier
http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html

Examples

1. Minimize $F(x_1, x_2) = x_1^2 + x_2^2$ subject to the constraint $x_1 + x_2 - 1 = 0$. Since $\nabla(x_1 + x_2 - 1) = [1, \ 1]^T$, the method of Lagrange multipliers gives
$\nabla F(y_1, y_2) = 2 [y_1, \ y_2]^T = \lambda [1, \ 1]^T$  and  $y_1 + y_2 - 1 = 0$,
therefore $y_1 = y_2 = \frac{1}{2}$ and $\lambda = 1$.

2. Maximize $F(x) = x^T A x$, where $A \in R^{n \times n}$ is symmetric and positive definite, subject to the constraint $x^T x - 1 = 0$. This gives
$\nabla F(y) = 2Ay = \lambda \nabla(y^T y) = 2 \lambda y$,
hence $y$ is an eigenvector of $A$ and $F(y) = y^T A y = \lambda y^T y = \lambda$. Therefore $\lambda > 0$, and at the constrained maximum $\lambda$ is the largest eigenvalue of $A$.

Homework Due Tutorial 4 (Week 9, 15-19 Oct)

1. Do problem 7 on page 165. Suggestion: practice by doing problem 2 on page 164 and problem 5 on page 165, since these problems are similar and have solutions on pages 538-539. Do NOT hand in solutions for your practice problems.

2. Do problem 10 on pages 170-171. Suggestion: study the discussion of the minimum size property on pages 168-169, then practice by doing problem 3 on page 169. Do NOT hand in solutions for your practice problems.

Extra Credit: Compute $\arg\min_{p \in P_{n-1}} [ \max_{-1 \le x \le 1} |x^n - p(x)| ]$. Suggestion: THINK about Theorem 2 and problem 3 on page 169.

Homework Due Tutorial 4 (Week 9, 15-19 Oct)

3. The trapezoid method for integrating a function $f \in C^\infty([a,b])$ using $n$ equal length subintervals can be shown to give an estimate having the form
$T(n) = I + a_1 n^{-2} + a_2 n^{-4} + a_3 n^{-6} + \cdots$
where $I = \int_a^b f(x) \, dx$ and the sequence $a_1, a_2, a_3, \dots$ depends on $f$.
(a) Show that for any $n$, $S(2n) = \frac{4}{3} T(2n) - \frac{1}{3} T(n)$, where $S(2n)$ is the estimate for the integral obtained using Simpson's method with $2n$ equal length subintervals.
(b) Use this fact together with the form of $T(n)$ above to prove that there exists a sequence $b_1, b_2, b_3, \dots$ with
$S(n) = I + b_1 n^{-4} + b_2 n^{-6} + \cdots$.
(c) Compute constants $r_1, r_2, r_3$ so that there exists a sequence $c_1, c_2, \dots$ with
$r_1 T(n) + r_2 T(2n) + r_3 T(4n) = I + c_1 n^{-6} + c_2 n^{-8} + \cdots$.

Homework Due Lab 4 (Week 10, 22-26 October)

4. Consider the equations for the 9 variables inside the array

        x01=1   x02=2   x03=3
x10=1   x11     x12     x13     x14=4
x20=1   x21     x22     x23     x24=4
x30=1   x31     x32     x33     x34=4
        x41=2   x42=3   x43=4

$x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1} - 4 x_{i,j} = 0$,  $i, j = 1, 2, 3$.
(a) Write these equations as $Ax = b$, where $A \in R^{9 \times 9}$ and $b \in R^9$, then solve using Gauss elimination and display the solution in the array.
(b) Compute the Jacobi iteration matrix $B \in R^{9 \times 9}$ and $\|B\|$.
(c) Write a MATLAB program to implement the Jacobi method for an (n+2) x (n+2) array without computing a sparse matrix A.

Homework Due Tutorial 4 (Week 9, 15-19 Oct)

5. Consider the equation $Ax = b$, where $A \in R^{n \times n}$, $b \in R^n$,
$A = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{bmatrix}$,  $v_m = \begin{bmatrix} \sin(mh) \\ \sin(2mh) \\ \sin(3mh) \\ \vdots \\ \sin(nmh) \end{bmatrix}$,  where $h = \frac{\pi}{n+1}$, $m = 1, \dots, n$.
(a) Prove that the vectors $v_m$ are eigenvectors of $A$ and compute their eigenvalues.
(b) Prove that the Jacobi method for this matrix converges by showing that the spectral radius of the iteration matrix is < 1.

Homework Due Lab 4 (Week 10, 22-26 October)

1. (a) Modify the computer code developed for Lab 3 to compute polynomials that interpolate the function 1/(1+x*x) on the interval [-5,5], based on N = 4, 8, 16, and 32 nodes located at the points x(j) = 5 cos((2j-1)pi/(2N)), j = 1,...,N.
(b) Compare the results with the results you obtained in Lab 3 using uniform nodes.
(c) Plot the functions $w(x) = (x - x(1))(x - x(2)) \cdots (x - x(N))$ both for the case where the nodes x(j) are uniform and where they are chosen as above (see the sketch after this problem).
(d) Show that x(j)/5 are the zeros of a Chebyshev polynomial, then derive a formula for w(x), and use this formula to explain why the use of the nonuniform nodes x(j) above gives a smaller interpolation error than the use of uniform nodes.
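As a starting point for parts (a) and (c) of problem 1, here is a small MATLAB sketch (an addition, not part of the handout) that tabulates the scaled Chebyshev nodes and plots $w(x)$ for both node choices:

N  = 8;
xc = 5*cos((2*(1:N)-1)*pi/(2*N));    % nonuniform (Chebyshev) nodes on [-5,5]
xu = linspace(-5,5,N);               % uniform nodes for comparison
x  = linspace(-5,5,1001);            % plotting grid
wc = ones(size(x)); wu = ones(size(x));
for j = 1:N
    wc = wc .* (x - xc(j));          % w(x) for the Chebyshev nodes
    wu = wu .* (x - xu(j));          % w(x) for the uniform nodes
end
plot(x,wc,'-',x,wu,'--'), grid on
legend('Chebyshev nodes','uniform nodes')

The plot shows that max|w| for the Chebyshev nodes is much smaller than for the uniform nodes, especially near the endpoints, which is the mechanism behind the smaller interpolation error asked about in part (d).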
Homework Due Lab 4 (Week 10, 22-26 October)

2. (a) Write computer code to compute trapezoidal approximations for
$I = \int_0^4 \frac{dx}{1 + x^2} = \tan^{-1}(4) = 1.32581766366803$
and run this code to compute approximations I(n) and associated errors for n = 2, 4, 8, 16, 32, 64 and 128 intervals.
(b) Use the (Romberg) formula that you developed in Tutorial 4 to combine I(n), I(2n), and I(4n) for n = 2, 4, 8, 16, 32 to develop more accurate approximations R(n). Compute the ratios of consecutive errors (I - I(2n))/(I - I(n)) and (I - R(2n))/(I - R(n)) for n = 2, 4, 8, 16, present them in a table, and discuss them (I denotes the exact integral).
(c) Compute approximations to the integral in (a) using Gauss quadrature with n = 1, 2, 3, 4; present the errors in a table and compare them to the errors obtained in (a) and (b) above.

Homework Due Lab 5 (Week 12, 5-9 November)

3. (a) Use the MATLAB program from problem 4(c) above to compute, for n = 50, the internal variables in the array

        x0,1=1       ...      x0,n=n
x1,0=1  x1,1         ...      x1,n         x1,n+1=n+2
 ...     ...                   ...          ...
xn,0=n  xn,1         ...      xn,n         xn,n+1=2n+1
        xn+1,1=n+2   ...      xn+1,n=2n+1

that satisfy the inequalities
$| x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1} - 4 x_{i,j} | \le 10^{-4}$,  $1 \le i, j \le n$.
(b) Display the solution using the MATLAB mesh & contour commands.
(c) Find a polynomial P of two variables so that the exact solution satisfies $x_{i,j} = P(i,j)$, and use it to compute & display the error.
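A minimal sketch of the matrix-free Jacobi sweep needed for problem 4(c) and problem 3(a) (an addition, not part of the handout; the function name jacobi2d and the tolerance argument are illustrative):

function x = jacobi2d(x, tol)
% x is an (n+2)-by-(n+2) array whose border holds the boundary values and
% whose interior holds an initial guess. Each Jacobi sweep replaces every
% interior entry by the average of its four neighbors; iteration stops when
% the largest 5-point residual falls below tol.
n = size(x,1) - 2;
res = inf;
while res > tol
    xi = x;                                          % previous iterate
    x(2:n+1,2:n+1) = ( xi(1:n,2:n+1) + xi(3:n+2,2:n+1) ...
                     + xi(2:n+1,1:n) + xi(2:n+1,3:n+2) ) / 4;
    res = max(max(abs( x(1:n,2:n+1) + x(3:n+2,2:n+1) ...
                     + x(2:n+1,1:n) + x(2:n+1,3:n+2) ...
                     - 4*x(2:n+1,2:n+1) )));
end

For example, for the 9-variable array of problem 4 the boundary could be loaded as

n = 3; x = zeros(n+2);
x(1,2:n+1) = [1 2 3]; x(2:n+1,1) = 1; x(2:n+1,n+2) = 4; x(n+2,2:n+1) = [2 3 4];
x = jacobi2d(x, 1e-4);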