Math 1540 Spring 2011 Notes #4: Higher derivatives, Taylor's theorem

1 The space of linear transformations from R^n to R^m

We have discussed linear transformations mapping R^n to R^m. We can add such linear transformations in the usual way: (L_1 + L_2)(x) = L_1(x) + L_2(x). Similarly, we can multiply such a linear transformation by a scalar. In this way, the set

    L(R^n, R^m) = {linear transformations from R^n to R^m}

becomes a vector space. If we choose bases for R^n and R^m, say the standard bases, then each element of L(R^n, R^m) has an m × n matrix with respect to these bases. Since there are mn entries in such a matrix, and they can all be chosen independently of each other, L(R^n, R^m) has dimension mn. A basis is the set of m × n matrices which are all zero except for a 1 in one entry.

2 Second derivative

Recall that if f : R^n → R^m is differentiable at a point x in R^n, then Df(x) is a linear transformation from R^n to R^m. Hence, for each x, Df(x) is in L(R^n, R^m). From this we see that Df is a function from R^n to L(R^n, R^m). We can then discuss D(Df), or D^2 f, the second derivative of f. For each x in R^n, D^2 f(x) is a linear transformation from R^n to L(R^n, R^m). Hence, for any v in R^n, D^2 f(x)(v) is in L(R^n, R^m). Therefore, for any w in R^n, D^2 f(x)(v)(w) is in R^m.

Recall that R^n × R^n = {(v, w) | v and w are in R^n}. We can therefore consider D^2 f(x) as a function from R^n × R^n to R^m. So instead of writing D^2 f(x)(v)(w), we write D^2 f(x)(v, w). This function from R^n × R^n to R^m is called "bilinear", because it is linear as a function of v for each fixed w, and also as a function of w for each fixed v. In other words,

    D^2 f(x)(v^1 + v^2, w^1 + w^2) = D^2 f(x)(v^1, w^1) + D^2 f(x)(v^2, w^1) + D^2 f(x)(v^1, w^2) + D^2 f(x)(v^2, w^2).

Now we will only consider the case m = 1. Thus,

    f : R^n → R.

For each x,

    Df(x) : R^n → R.

Similarly,

    Df : R^n → L(R^n, R).

For each x,

    D^2 f(x) : R^n → L(R^n, R).

Equivalently,

    D^2 f(x) : R^n × R^n →
R.    (1)

We wish to consider the nature of a general bilinear function L from R^n × R^n to R. Let e_1, ..., e_n be the standard basis vectors of R^n. Then for each i and j, L(e_i, e_j) is a real number; let L(e_i, e_j) = a_ij. It will be simplest now to consider the case n = 2. Suppose that v = c_1 e_1 + c_2 e_2 and w = d_1 e_1 + d_2 e_2. The bilinearity implies that

    L(v, w) = L(c_1 e_1 + c_2 e_2, d_1 e_1 + d_2 e_2) = c_1 d_1 a_11 + c_1 d_2 a_12 + c_2 d_1 a_21 + c_2 d_2 a_22.

It turns out that this equals

    (c_1, c_2) [ a_11  a_12 ] [ d_1 ]
               [ a_21  a_22 ] [ d_2 ].

(Check by multiplying this out.) In this way, each L is associated with an n × n matrix A. In the case where L = D^2 f(x), it is shown in the text that

    A = [ ∂^2 f / ∂x_i ∂x_j (x) ].

If you recall that for most functions the order in which you take partial derivatives doesn't matter, you see that under some assumptions on f, A is a symmetric matrix. Theorem 6.8.3 in the text says that this will be true if all of the second partial derivatives of f are continuous.

Example: Let f(x, y) = x^2 y + x y^3. We will find the standard matrix for D^2 f(1, 1), and check that the limit formula for the derivative works. For this function we have

    Df(x, y) = (2xy + y^3, x^2 + 3xy^2).

By this we mean that

    Df(x, y) (u, v)^T = (2xy + y^3) u + (x^2 + 3xy^2) v.

Also, D^2 f(x, y) has the matrix

    [ 2y          2x + 3y^2 ]
    [ 2x + 3y^2   6xy       ].

(Notice that ∂^2 f/∂x∂y = ∂^2 f/∂y∂x.) Recall that D^2 f(x, y) : R^n → L(R^n, R). Therefore, D^2 f(x)(u, v)^T must be a map from R^n to R. We saw such a map before: Df(x) maps R^n to R. Any element of L(R^n, R) can be written in the form x → bx, where b is a 1 × n matrix, that is, a row vector. And any linear map L from R^n to {all n-dimensional row vectors} can be written as y → y^T A for some n × n matrix A. It is shown in the text that if L = D^2 f(x), then A is the matrix of second partial derivatives of f, called the "Hessian".
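Both formulas in this example can be probed numerically. The following is a small Python sketch (an illustration, not part of the text; the helper names are my own). It checks the gradient and Hessian of f(x, y) = x^2 y + x y^3 at (1, 1) by central finite differences, and also evaluates the quotient from the limit definition of the derivative of Df, which is verified by hand below.

```python
# Illustrative sketch (not from the text): check the gradient and Hessian of
# f(x, y) = x^2 y + x y^3 at (1, 1) by central finite differences, then
# evaluate the quotient
#   ||Df(x,y) - Df(1,1) - (x-1, y-1) A|| / ||(x-1, y-1)||   (sup norms),
# which should tend to 0 as (x, y) -> (1, 1).

def f(x, y):
    return x**2 * y + x * y**3

def Df(x, y):
    # analytic gradient: Df(x, y) = (2xy + y^3, x^2 + 3xy^2)
    return (2*x*y + y**3, x**2 + 3*x*y**2)

A = [[2.0, 5.0], [5.0, 6.0]]   # Hessian [[2y, 2x+3y^2], [2x+3y^2, 6xy]] at (1, 1)

h = 1e-5
fd_grad = ((f(1 + h, 1) - f(1 - h, 1)) / (2*h),
           (f(1, 1 + h) - f(1, 1 - h)) / (2*h))
fd_fxy = (Df(1, 1 + h)[0] - Df(1, 1 - h)[0]) / (2*h)   # d/dy of f_x

print(Df(1.0, 1.0))   # (3.0, 4.0)
print(fd_grad)        # close to (3.0, 4.0)
print(fd_fxy)         # close to 5.0, the off-diagonal Hessian entry

def quotient(x, y):
    u, v = x - 1.0, y - 1.0
    lin = (u*A[0][0] + v*A[1][0], u*A[0][1] + v*A[1][1])   # row vector (u, v) A
    num = (Df(x, y)[0] - 3.0 - lin[0], Df(x, y)[1] - 4.0 - lin[1])
    return max(abs(num[0]), abs(num[1])) / max(abs(u), abs(v))

for t in (1e-1, 1e-2, 1e-3):
    print(quotient(1 + t, 1 + t))   # shrinks roughly like 10*t
```

Along the diagonal x = y = 1 + t, a short computation shows the quotient equals 10t + 3t^2 exactly, consistent with what the printed values show.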
This leads us to the equation

    D^2 f(x, y) (u, v)^T = (u, v) [ 2y          2x + 3y^2 ]
                                  [ 2x + 3y^2   6xy       ].

From this we get

    D^2 f(x, y) ((u, v)^T, (p, q)^T) = (u, v) [ 2y          2x + 3y^2 ] [ p ]
                                              [ 2x + 3y^2   6xy       ] [ q ].

Hence,

    D^2 f(1, 1) ((u, v)^T, (p, q)^T) = (u, v) [ 2  5 ] [ p ]  =  2up + 5uq + 5vp + 6vq.
                                              [ 5  6 ] [ q ]

We now check this last formula using the definition of derivative. However, it is a bit complicated to describe just what is meant by the norm of a linear operator. It turns out to be equivalent to discuss the corresponding matrices. Once again the sup norm will be convenient. We wish to check that

    lim_{(x,y)→(1,1)} || Df(x, y) - Df(1, 1) - (x - 1, y - 1) A || / || (x - 1, y - 1) || = 0,

where A is the matrix of D^2 f(1, 1) found above,

    A = [ 2  5 ]
        [ 5  6 ].

(Notice that in the numerator we are dealing with row vectors.) For the vector inside the norm in the numerator we obtain

    (2xy + y^3, x^2 + 3xy^2) - (3, 4) - (2(x - 1) + 5(y - 1), 5(x - 1) + 6(y - 1)).

It is sufficient to show that the ratio of the absolute value of each component of this vector to the norm in the denominator tends to zero as (x, y) → (1, 1). The first component is

    y^3 + 2xy - 2x - 5y + 4.

A little algebra is necessary. Since 2x = 2(x - 1) + 2 and 5y = 5(y - 1) + 5, we have

    y^3 + 2xy - 2x - 5y + 4 = y^3 - 3y + 2 + (x - 1)(2y - 2) = y^3 - 3y + 2 + 2(x - 1)(y - 1).

Further, it turns out that y^3 - 3y + 2 = (y - 1)^2 (y + 2). Hence, if (x, y) ≠ (1, 1), then the ratio of the absolute value of the first component of the numerator to the denominator is

    | (y - 1)^2 (y + 2) + 2(x - 1)(y - 1) | / max{|x - 1|, |y - 1|}

        ≤ ( (x - 1)^2 |y + 2| + 2(x - 1)^2 ) / |x - 1|    if |x - 1| ≥ |y - 1|,

        ≤ ( (y - 1)^2 |y + 2| + 2(y - 1)^2 ) / |y - 1|    if |x - 1| < |y - 1|.

Both alternatives on the right tend to zero as (x, y) → (1, 1). The second component can be handled similarly; it would be a nice algebra exercise to do this.

3 Third derivative

Notice the pattern: f : R^n → R, and for each x in R^n (where f is differentiable), Df(x) : R^n → R. In other words, Df(x) is in L(R^n, R). The linear transformation Df(x) has the standard matrix (1 × n) given by the gradient, which is in R^n. Thus, Df : R^n → R^n. Df is not usually a linear transformation.
As we explained, D^2 f(x) is a bilinear transformation from R^n × R^n to R, and this transformation has the standard n × n matrix given above. Therefore, D^2 f : R^n → L(R^n × R^n, R). Hence, we expect that for each x,

    D^3 f(x) : R^n → L(R^n × R^n, R).

This will involve the third derivatives ∂^3 f / ∂x_i ∂x_j ∂x_k. We will consider this further below. First, we have a review of Taylor series in one variable.

4 Taylor series for f : R → R

First recall the general formula for a Taylor series in one variable. Suppose that f : R → R, and all derivatives of f exist at every x in R. If x_0 is in R, then the Taylor series for f at x_0 is

    Σ_{n=0}^{∞} (1/n!) f^(n)(x_0) (x - x_0)^n.    (2)

Here f^(n) is the n-th derivative of f. We have the usual conventions that 0! = 1 and f^(0) = f. This series may converge for all x, or only for x in some interval containing x_0. (It obviously converges if x = x_0.) And if it converges for some x ≠ x_0, it might not converge to f(x). Examples of these possibilities will be given in class.

Definition 1. If the series (2) converges to f(x) in some neighborhood of x_0, then f is called "analytic" at x_0.

Perhaps of even more importance is using a finite sum of the terms in the Taylor series to approximate f on some interval containing x_0. This can sometimes be done even if f is not analytic at x_0, perhaps because not all of the derivatives of f at x_0 are defined. The theorem which allows us to give such approximations is called Taylor's theorem. To state Taylor's theorem we first need a definition.

Definition 2. Suppose that f : I ⊂ R → R, where I is an open interval containing a point x_0. Suppose that r is a nonnegative integer. We say that f is of class C^r on I if the first r derivatives f, f', f'', ..., f^(r) exist and are continuous on I.

Theorem 3. Suppose that f : I ⊂ R →
R, where I is an open interval containing a point x_0, and f is of class C^r on I. Suppose that x and y are in I. Then there is a c between x and y such that

    f(y) - f(x) = Σ_{n=1}^{r-1} (1/n!) f^(n)(x) (y - x)^n + (1/r!) f^(r)(c) (y - x)^r.

If r = 1, then this is the mean value theorem.

As an example, let f(x) = |x|^{5/2}, and consider f(2) - f(-1). Notice that f'(0) = f''(0) = 0, but f'''(0) doesn't exist. Also, f(x) = (-x)^{5/2} if x < 0. We wish to find c in (-1, 2) such that

    f(2) - f(-1) = f'(-1)(2 - (-1)) + (1/2) f''(c)(2 - (-1))^2,

that is,

    2^{5/2} - 1 = -(5/2)(3) + (1/2) f''(c)(9).

If c > 0, then f''(c) = (5/2)(3/2) c^{1/2} = (15/4) c^{1/2}, while if d = -c < 0, then f''(d) = (15/4)(-d)^{1/2} = f''(c), so f'' is an even function. Therefore we can assume c > 0, and we want

    2^{5/2} + 13/2 = (9/2)(15/4) c^{1/2} = (135/8) c^{1/2},

or

    √c = (8/135)(2^{5/2} + 13/2).

Since we assumed that c > 0, we have to check that c < 2. That is easily done: since 2^{5/2} < 8,

    √c < (8/135)(8 + 13/2) = 116/135 < 1,

so c < 1 < 2.

5 Taylor's theorem of order 2 and quadratic forms

As pointed out earlier, the mean value theorem is a special case of Taylor's theorem. Using the formula on page 353, in Theorem 6.3.1, we see that if f is differentiable at every x, then for any x_0 and x there is a c between x_0 and x such that

    f(x) = f(x_0) + Df(c)(x - x_0).

Recall that Df(c) is a row vector (the gradient). Extending by one more term, Theorem 6.8.5 gives

    f(x) = f(x_0) + Df(x_0)(x - x_0) + (1/2) D^2 f(c)(x - x_0, x - x_0).    (3)

We need to explain the last term.
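As an aside, the value of c found in the one-variable example above can be checked numerically. Here is a small Python sketch (an illustration, not part of the text; the helper names are my own):

```python
# Sketch: numerically verify the worked example.  For f(x) = |x|^(5/2),
# Taylor's theorem with r = 2, x = -1, y = 2 asserts
#     f(2) - f(-1) = f'(-1)(3) + (1/2) f''(c)(9)
# for some c between -1 and 2, and the computation above gives
#     sqrt(c) = (8/135)(2^(5/2) + 13/2).

def f(x):
    return abs(x)**2.5

def fpp(x):
    # f''(x) = (15/4)|x|^(1/2); note f'' is an even function
    return 3.75 * abs(x)**0.5

sqrt_c = (8.0/135.0) * (2**2.5 + 6.5)
c = sqrt_c**2

lhs = f(2) - f(-1)
rhs = -2.5 * 3 + 0.5 * fpp(c) * 9   # f'(-1) = -5/2

print(c)            # about 0.519, which lies in (-1, 2); indeed c < 1 < 2
print(lhs - rhs)    # zero up to rounding
```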
From the theory above, we see that if we write x - x_0 as a column vector, then the last term is of the form

    D^2 f(c)(x - x_0, x - x_0) = (x - x_0)^T A (x - x_0),    (4)

where A is the n × n matrix [ ∂^2 f / ∂x_i ∂x_j ]. Writing this out, we have the expression

    ((x - x_0)_1, (x - x_0)_2, ..., (x - x_0)_n) A ((x - x_0)_1, (x - x_0)_2, ..., (x - x_0)_n)^T    (5)

    = Σ_{i,j=1}^{n} a_ij (x - x_0)_i (x - x_0)_j.    (6)

Let's look again at n = 2. Let x - x_0 = (u, v)^T for scalars u and v. Then the expression in (5) becomes

    a_11 u^2 + a_12 uv + a_21 uv + a_22 v^2.

But A is symmetric, so we get

    a_11 u^2 + 2 a_12 uv + a_22 v^2.

Such an expression is called a "quadratic form". In the n-dimensional case, with x - x_0 = u, we get

    a_11 u_1^2 + a_22 u_2^2 + ... + a_nn u_n^2 + 2 a_12 u_1 u_2 + 2 a_13 u_1 u_3 + ... + 2 a_(n-1)n u_(n-1) u_n,

which is again called a quadratic form. One of the chief questions one asks about a quadratic form is whether it is positive whenever u ≠ 0. In that case it is called a "positive definite" quadratic form. One major reason that question is important is its application, in the next section of the text, to maxima and minima of functions f : R^n → R.

6 Taylor's theorem

This is Theorem 6.8.5, which was referred to above. The proof is somewhat complicated, and it is the longest proof in either Chapter 5 or Chapter 6. I will be content here to carry the expansion out one more term than in (3), thus adding a third derivative, and to discuss the resulting expression. It is

    f(x) = f(x_0) + Df(x_0)(x - x_0) + (1/2) D^2 f(x_0)(x - x_0, x - x_0) + (1/3!) D^3 f(c)(x - x_0, x - x_0, x - x_0).

The last term, with D^3 f(c), is called the "remainder term". Here c is a point on the line segment between x_0 and x. You can tell from the last term that D^3 f(c) : R^n × R^n × R^n → R. There are n^3 third derivatives ∂^3 f / ∂x_i ∂x_j ∂x_k (c), so they won't fit into a square matrix. We will denote these derivatives by f_abc(c), where the subscripts a, b, and c are each one of x or y. The third derivative term when n = 2, and "x" is (x, y)^T, turns out to be

    (1/3!) [ f_xxx(c)(x -
x_0)^3 + 3 f_xxy(c)(x - x_0)^2 (y - y_0) + 3 f_xyy(c)(x - x_0)(y - y_0)^2 + f_yyy(c)(y - y_0)^3 ].    (7)

Can you see what the third derivative term would be when n = 3? What about the fourth derivative term for n = 2 and n = 3?

7 Maxima and Minima

7.1 Positive definite quadratic forms

A quadratic form is a function Q : R^n → R of the form Q(u) = u^T A u for some symmetric n × n matrix A. The relevance of this to Taylor's theorem is seen by looking at equations (3) and (4).

Definition 4. A symmetric matrix A is called "positive definite" if Q(u) > 0 for every u ≠ 0 in R^n.

There are two particularly useful criteria for determining if A is positive definite. These are from linear algebra, and won't be proved here.

Theorem 5. A symmetric n × n matrix A is positive definite if either of the following conditions holds: (i) all eigenvalues of A are positive; (ii) all n "upper left" subdeterminants of A are positive. An upper left subdeterminant is one formed by deleting between zero and n - 1 of the last rows and columns of A. This will be illustrated in class.

If A is a symmetric matrix and -A is positive definite, then A is called negative definite.

7.2 Application to maxima and minima

Equation (3) above allows us to determine criteria guaranteeing that a point x_0 is a local maximum or local minimum for the function f. To apply it, we must assume that f is of class C^2. There cannot be a local maximum at x_0 unless Df(x_0) = 0, for otherwise there is a nonzero directional derivative in some direction e, which means that

    (d/dt) f(x_0 + te) |_{t=0} ≠ 0,

and so there are larger values of f either for t positive or for t negative, with |t| small.
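Criterion (ii) of Theorem 5 is easy to apply for small matrices. Here is a Python sketch (an illustration, not from the text; restricted to 2 × 2 and 3 × 3 matrices so the determinants can be written out explicitly) that checks the upper left subdeterminants and, as a sanity check, samples the quadratic form:

```python
# Sketch of criterion (ii): a symmetric matrix is positive definite iff all
# of its upper left subdeterminants are positive.  Restricted here to
# 2x2 and 3x3 matrices.

def det(M):
    n = len(M)
    if n == 1:
        return M[0][0]
    if n == 2:
        return M[0][0]*M[1][1] - M[0][1]*M[1][0]
    # 3x3: cofactor expansion along the first row
    return (M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
            - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
            + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]))

def is_positive_definite(A):
    n = len(A)
    # upper left k x k submatrices, k = 1, ..., n
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

def Q(A, u):
    # the quadratic form u^T A u
    n = len(u)
    return sum(A[i][j]*u[i]*u[j] for i in range(n) for j in range(n))

B = [[2.0, 1.0], [1.0, 3.0]]
print(is_positive_definite(B))   # True  (subdeterminants 2 and 5)
print(Q(B, (1.0, -1.0)))         # 2 - 1 - 1 + 3 = 3 > 0

C = [[2.0, 5.0], [5.0, 6.0]]     # the Hessian from the example in Section 2
print(is_positive_definite(C))   # False (2 > 0, but 12 - 25 = -13 < 0)
print(Q(C, (1.0, -1.0)))         # 2 - 5 - 5 + 6 = -2 < 0
```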
Definition 6. x_0 is called a "critical point" of f if Df(x_0) = 0.

We then repeat equation (3):

    f(x) = f(x_0) + Df(x_0)(x - x_0) + (1/2) D^2 f(c)(x - x_0, x - x_0).

Assuming that Df(x_0) = 0, we get

    f(x) = f(x_0) + (1/2)(x - x_0)^T D^2 f(c)(x - x_0).    (8)

Theorem 7. If x_0 is a critical point of f and the matrix corresponding to D^2 f(x_0) is positive definite, then x_0 is a local minimum for f. If D^2 f(x_0) is negative definite, then x_0 is a local maximum.

Proof. Suppose that x_0 is a critical point of f and A := D^2 f(x_0) is positive definite. Then e^T A e > 0 for every unit vector e in R^n. (If u is a nonzero vector in R^n, then e = u/||u|| is a unit vector; it follows that A is positive definite if and only if e^T A e > 0 for every unit vector e.) But the set of all unit vectors in R^n is a compact set. Hence,

    λ := min_{||e||=1} e^T A e > 0.

Because D^2 f(x) is continuous, it follows that there is a δ > 0 such that if ||c - x_0|| < δ, then e^T D^2 f(c) e > λ/2 > 0 for every unit vector e. Hence D^2 f(c) is also positive definite (i.e., the symmetric matrix corresponding to D^2 f(c) is positive definite). If ||x - x_0|| < δ, then ||c - x_0|| < δ, because c is on the line segment between x and x_0. Equation (8) then implies that if 0 < ||x - x_0|| < δ, then f(x) > f(x_0). Hence x_0 is a local minimum for f. The case of a local maximum is similar.

8 Homework, due Feb. 2 at the beginning of class

1. Use the formula in the middle of page 359 to write out completely the terms involving second derivatives in the Taylor series around x_0 = 0 of a function f : R^3 → R. (That is, give the expression corresponding to (7) above, which is the third derivative term for a function from R^2 to R^1.) Then write out completely the terms in the Taylor series around 0 involving the third derivative for a function g : R^3 → R.

2. pg. 386, #19 c.

3. pg. 386, #19 f.

4. pg. 384, #7 b.

5. pg. 384, #7 c. The answer is in the back of the book (page 708), but you need to show the calculations needed to get the answer.
Referring to the answer in the book, you need only consider the points where k = 0 and m = 0; n = j = 0; and n = 1, j = 0.
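Finally, here is a worked illustration of Theorem 7 in Python (the example function is chosen here for illustration; it is not from the notes or the textbook problems):

```python
import math

# Illustrative example (chosen here, not from the notes): apply Theorem 7 to
#   f(x, y) = x^2 + x y + 2 y^2.
# Df(x, y) = (2x + y, x + 4y) vanishes at the origin, so the origin is a
# critical point, and the constant Hessian [[2, 1], [1, 4]] is positive
# definite (upper left subdeterminants 2 and 7), so the origin is a local
# minimum.

def f(x, y):
    return x**2 + x*y + 2*y**2

def Df(x, y):
    return (2*x + y, x + 4*y)

A = [[2.0, 1.0], [1.0, 4.0]]

print(Df(0.0, 0.0))                                  # (0.0, 0.0): critical point
print(A[0][0], A[0][0]*A[1][1] - A[0][1]*A[1][0])    # 2.0 7.0: both positive

# empirically, f exceeds f(0, 0) = 0 on a small circle around the origin
r = 0.01
vals = [f(r*math.cos(2*math.pi*k/100), r*math.sin(2*math.pi*k/100))
        for k in range(100)]
print(min(vals) > f(0.0, 0.0))                       # True
```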