Math 1540
Spring 2011
Notes #4
Higher derivatives, Taylor’s theorem
1 The space of linear transformations from R^n to R^m
We have discussed linear transformations mapping R^n to R^m. We can add such linear transformations in the usual way: (L_1 + L_2)(x) = L_1(x) + L_2(x). Similarly we can multiply such a linear transformation by a scalar. In this way, the set

L(R^n, R^m) = {linear transformations from R^n to R^m}

becomes a vector space. If we choose bases for R^n and R^m, say the standard bases, then each element of L(R^n, R^m) has an m x n matrix with respect to these bases. Since there are mn entries in such a matrix, and they all can be chosen independently of each other, L(R^n, R^m) has dimension mn. A basis is the set of m x n matrices which are all zero except for a 1 in one entry.
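As a quick illustration (mine, not from the text), the basis just described for L(R^2, R^3), identified with 3 x 2 matrices, can be built in NumPy, and any matrix is recovered as the linear combination of the basis matrices E_ij with its own entries as coefficients:

```python
import numpy as np

m, n = 3, 2
# Basis of L(R^n, R^m): the m*n matrices with a single entry equal to 1.
basis = []
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        basis.append(E)

A = np.arange(1.0, 7.0).reshape(m, n)   # an arbitrary 3 x 2 matrix
# A = sum over i, j of a_ij * E_ij
recombined = sum(A[i, j] * basis[i * n + j] for i in range(m) for j in range(n))
assert np.allclose(A, recombined)
print(len(basis))  # the dimension mn = 6
```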
2 Second derivative
Recall that if f : R^n -> R^m is differentiable at a point x in R^n, then Df(x) is a linear transformation from R^n to R^m. Hence, for each x, Df(x) is in L(R^n, R^m). From this we see that Df is a function from R^n to L(R^n, R^m).
We can then discuss D(Df), or D^2 f, the second derivative of f. For each x in R^n, D^2 f(x) is a linear transformation from R^n to L(R^n, R^m). Hence, for any v in R^n, D^2 f(x)(v) is in L(R^n, R^m). Therefore, for any w in R^n, D^2 f(x)(v)(w) is in R^m.

Recall that R^n x R^n = {(v, w) | v and w are in R^n}. We can therefore consider D^2 f(x) as a transformation from R^n x R^n to R^m. So instead of writing D^2 f(x)(v)(w), we write D^2 f(x)(v, w).
This transformation from R^n x R^n to R^m is called "bilinear", because it is linear as a function of v for each fixed w, and also as a function of w for each fixed v. In other words,

D^2 f(x)(v^1 + v^2, w^1 + w^2) = D^2 f(x)(v^1, w^1) + D^2 f(x)(v^2, w^1) + D^2 f(x)(v^1, w^2) + D^2 f(x)(v^2, w^2).
Now we will only consider the case m = 1. Thus,

f : R^n -> R.

For each x, Df(x) : R^n -> R. Similarly,

Df : R^n -> L(R^n, R).

For each x, D^2 f(x) : R^n -> L(R^n, R). Equivalently,

D^2 f(x) : R^n x R^n -> R.   (1)
We wish to consider the nature of a general bilinear function L from R^n x R^n to R. Let e_1, ..., e_n be the standard basis vectors of R^n. Then for each i and j, L(e_i, e_j) is a real number. Let L(e_i, e_j) = a_{ij}.

It will be simplest now to consider the case n = 2. Suppose that v = c_1 e_1 + c_2 e_2 and w = d_1 e_1 + d_2 e_2. The bilinearity implies that

L(v, w) = L(c_1 e_1 + c_2 e_2, d_1 e_1 + d_2 e_2) = c_1 d_1 a_{11} + c_1 d_2 a_{12} + c_2 d_1 a_{21} + c_2 d_2 a_{22}.
It turns out that this equals

(c_1, c_2) \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}.

(Check by multiplying this out.) In this way, each L is associated with an n x n matrix A. In the case where L = D^2 f(x), it is shown in the text that

A = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right].
If you recall that for most functions, the order in which you take partial derivatives doesn't matter, you see that under some assumptions on f, A is a symmetric matrix. Theorem 6.8.3 in the text says that this will be true if all of the second partial derivatives of f are continuous.
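As a quick sanity check (my addition, not the text's), one can verify numerically that the bilinear expansion c_1 d_1 a_11 + c_1 d_2 a_12 + c_2 d_1 a_21 + c_2 d_2 a_22 equals the matrix product (c_1, c_2) A (d_1, d_2)^T:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # a_ij = L(e_i, e_j)
c = rng.standard_normal(2)        # coordinates of v
d = rng.standard_normal(2)        # coordinates of w

# Bilinear expansion: sum over i, j of c_i d_j a_ij.
expansion = sum(c[i] * d[j] * A[i, j] for i in range(2) for j in range(2))
# Matrix form: (c_1, c_2) A (d_1, d_2)^T.
matrix_form = c @ A @ d
assert np.isclose(expansion, matrix_form)
```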
Example: Let f(x, y) = x^2 y + x y^3. We will find the standard matrix for D^2 f(1, 1), and check that the limit formula for the derivative works. For this function we have

Df(x, y) = (2xy + y^3, x^2 + 3xy^2).
By this we mean that

Df(x, y) \begin{pmatrix} u \\ v \end{pmatrix} = (2xy + y^3) u + (x^2 + 3xy^2) v.

Also, D^2 f(x, y) has the matrix

\begin{pmatrix} 2y & 2x + 3y^2 \\ 2x + 3y^2 & 6xy \end{pmatrix}.

(Notice that \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}.) Recall that D^2 f(x, y) : R^n -> L(R^n, R). Therefore, D^2 f(x) \binom{u}{v} must be a map from R^n to R. We saw such a map before: Df(x) maps R^n to R. Any element of L(R^n, R) can be written in the form

x -> bx,

where b is a 1 x n matrix, that is, a row vector. And any linear map L from R^n to {all n-dimensional row vectors} can be written as y -> y^T A for some n x n matrix A. It is shown in the text that if L is the map \binom{u}{v} -> D^2 f(x) \binom{u}{v}, then A is the matrix of second partial derivatives of f, called the "Hessian".
This leads us to the equation

D^2 f(x, y) \begin{pmatrix} u \\ v \end{pmatrix} = (u, v) \begin{pmatrix} 2y & 2x + 3y^2 \\ 2x + 3y^2 & 6xy \end{pmatrix},

a row vector. From this we get

D^2 f(x, y) \begin{pmatrix} u \\ v \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = (u, v) \begin{pmatrix} 2y & 2x + 3y^2 \\ 2x + 3y^2 & 6xy \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix}.

Hence,

D^2 f(1, 1) \begin{pmatrix} u \\ v \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = (u, v) \begin{pmatrix} 2 & 5 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = 2up + 5uq + 5vp + 6vq.
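A numerical cross-check (not in the notes): central finite differences for the second partials of f(x, y) = x^2 y + x y^3 at (1, 1) should approximately reproduce the matrix [[2, 5], [5, 6]] just computed.

```python
import numpy as np

def f(x, y):
    return x**2 * y + x * y**3

def hessian_fd(f, x, y, h=1e-5):
    # Central second differences for f_xx, f_yy, and the mixed partial f_xy.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return np.array([[fxx, fxy], [fxy, fyy]])

H = hessian_fd(f, 1.0, 1.0)
assert np.allclose(H, [[2, 5], [5, 6]], atol=1e-4)
```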
We now check this last formula using the definition of derivative. However, it is a bit complicated to describe just what is meant by the norm of a linear operator. It turns out to be equivalent to discuss the corresponding matrices. Once again the sup norm will be convenient. We wish to check that

\lim_{(x,y) \to (1,1)} \frac{\left\| Df(x, y) - Df(1, 1) - (x - 1, y - 1) \begin{pmatrix} 2 & 5 \\ 5 & 6 \end{pmatrix} \right\|}{\left\| (x - 1, y - 1) \right\|} = 0.
(Notice that in the numerator we are dealing with row vectors.) In the numerator we obtain

(2xy + y^3, x^2 + 3xy^2) - (3, 4) - (2(x - 1) + 5(y - 1), 5(x - 1) + 6(y - 1)).

It is sufficient to show that the ratio of the absolute value of each component of the vector in this expression to the norm in the denominator tends to zero as (x, y) -> (1, 1).
The first component is y^3 + 2xy - 2x - 5y + 4. A little algebra is necessary. Since 2x = 2(x - 1) + 2 and 5y = 5(y - 1) + 5, we have

y^3 + (2(x - 1) + 2) y - 2x - 5(y - 1) - 5 + 4 = y^3 - 3y + 2 + (x - 1)(2y - 2).

Further, it turns out that y^3 - 3y + 2 = (y - 1)^2 (y + 2). Hence if (x, y) \ne (1, 1), then the ratio of the absolute value of the first component of the numerator to the denominator is at most

\frac{\left| (y - 1)^2 (y + 2) + 2(x - 1)(y - 1) \right|}{\max\{|x - 1|, |y - 1|\}} \le \begin{cases} \dfrac{(x - 1)^2 |y + 2| + 2|x - 1|^2}{|x - 1|} & \text{if } |x - 1| \ge |y - 1|, \\[6pt] \dfrac{(y - 1)^2 |y + 2| + 2|y - 1|^2}{|y - 1|} & \text{if } |x - 1| < |y - 1|. \end{cases}

Both alternatives on the right tend to zero as (x, y) -> (1, 1). The second component can be handled similarly. It would be a nice algebra exercise to do this.
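To make the limit concrete (my addition, not the notes'), one can evaluate the sup-norm ratio along a sequence of points approaching (1, 1) and watch it shrink:

```python
import numpy as np

def Df(x, y):
    return np.array([2 * x * y + y**3, x**2 + 3 * x * y**2])

H = np.array([[2.0, 5.0], [5.0, 6.0]])   # matrix of D^2 f(1, 1)

def ratio(x, y):
    h = np.array([x - 1.0, y - 1.0])
    num = Df(x, y) - Df(1.0, 1.0) - h @ H       # row-vector difference
    return np.max(np.abs(num)) / np.max(np.abs(h))  # sup norms

# Approach (1, 1) along the line (1 + t, 1 + 2t); the ratio decays like O(t).
ratios = [ratio(1 + t, 1 + 2 * t) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
assert ratios[-1] < 0.01
```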
3 Third derivative
Notice the pattern: f : R^n -> R, and for each x in R^n (where f is differentiable), Df(x) : R^n -> R. In other words, Df(x) is in L(R^n, R). The linear transformation Df(x) has the standard matrix (1 x n) given by the gradient, which is in R^n. Thus, Df : R^n -> R^n. Df is not usually a linear transformation.

As we explained, D^2 f(x) is a linear transformation from R^n x R^n to R, and this linear transformation has the standard n x n matrix given above. Therefore, D^2 f : R^n -> L(R^n x R^n, R). Hence, we expect that for each x, D^3 f(x) : R^n -> L(R^n x R^n, R). This will involve the third derivatives \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}. We will consider this further below. First, we have a review of Taylor series in one variable.
4 Taylor series for f : R -> R

First recall the general formula for a Taylor series in one variable. Suppose that f : R -> R, and all derivatives of f exist at every x in R. If x_0 is in R, then the Taylor series for f at x_0 is

\sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0) (x - x_0)^n.   (2)

Here f^{(n)} is the n-th derivative of f. We have the usual conventions that 0! = 1 and f^{(0)} = f.
This series may converge for all x, or only for x in some interval containing x_0. (It obviously converges if x = x_0.) And if it converges for some x \ne x_0, it might not converge to f(x). Examples of these possibilities will be given in class.
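As a quick numeric illustration of the "converges only on an interval" possibility (my example, not one from class): the Taylor series of ln(1 + x) at x_0 = 0 is \sum_{n=1}^{\infty} (-1)^{n+1} x^n / n, which converges for |x| < 1 but diverges at x = 2.

```python
import math

def partial_sum(x, terms):
    # Partial sum of the Taylor series of ln(1 + x) at x_0 = 0.
    return sum((-1) ** (n + 1) * x**n / n for n in range(1, terms + 1))

# Inside the interval of convergence: partial sums approach ln(1.5).
assert abs(partial_sum(0.5, 60) - math.log(1.5)) < 1e-12
# Outside it (x = 2): the terms 2^n / n blow up, so the partial sums do not settle.
assert abs(partial_sum(2.0, 60)) > 1e10
```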
Definition 1 If the series (2) converges to f(x) in some neighborhood of x_0, then f is called "analytic" at x_0.
Perhaps of even more importance is using a finite sum of the terms in the Taylor series to approximate f on some interval containing x_0. This can sometimes be done even if f is not analytic at x_0, perhaps because not all of the derivatives of f at x_0 are defined. The theorem which allows us to give such approximations is called Taylor's theorem. To state Taylor's theorem we first need a definition.
Definition 2 Suppose that f : I \subset R -> R, where I is an open interval containing a point x_0. Suppose that r is a nonnegative integer. We say that f is of class C^r on I if f and its first r derivatives f', f'', ..., f^{(r)} exist and are continuous on I.
Theorem 3 Suppose that f : I \subset R -> R, where I is an open interval containing a point x_0, and f is of class C^r on I. Suppose that x and y are in I. Then there is a c between x and y such that

f(y) - f(x) = \sum_{n=1}^{r-1} \frac{1}{n!} f^{(n)}(x) (y - x)^n + \frac{1}{r!} f^{(r)}(c) (y - x)^r.
If r = 1, then this is the mean value theorem.
As an example, let f(x) = |x|^{5/2}, and consider f(2) - f(-1). Notice that f'(0) = f''(0) = 0, but f'''(0) doesn't exist. Also, f(x) = (-x)^{5/2} if x < 0. We wish to find c in (-1, 2) such that

f(2) - f(-1) = f'(-1)(2 - (-1)) + \frac{1}{2} f''(c) (2 - (-1))^2,

that is,

2^{5/2} - 1 = \left( -\frac{5}{2} \right)(3) + \frac{1}{2} f''(c)(9).

If c > 0, then f''(c) = \frac{5}{2} \cdot \frac{3}{2} c^{1/2}, while if d = -c, then f''(d) = \frac{5}{2} \cdot \frac{3}{2} (-d)^{1/2} = f''(c), so f'' is an even function. Therefore we can assume c > 0, and we want

2^{5/2} + \frac{13}{2} = \frac{9}{2} \cdot \frac{5}{2} \cdot \frac{3}{2} c^{1/2},

or

\sqrt{c} = \frac{8}{135} \left( 2^{5/2} + \frac{13}{2} \right).

Since we assumed that c > 0, we have to check that c < 2. That is easily done: since 8/135 < 1/15, 2^{5/2} < 8, and 13/2 < 7, we get

\sqrt{c} < \frac{1}{15}(8 + 7) = 1,

so c < 1 < 2.
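The value of c found above is easy to verify numerically (a check I added, not part of the notes): plugging \sqrt{c} = (8/135)(2^{5/2} + 13/2) back into the order-2 Taylor formula should reproduce f(2) - f(-1) exactly.

```python
def f(x):
    return abs(x) ** 2.5              # f(x) = |x|^(5/2)

# c from the computation above: sqrt(c) = (8/135) (2^(5/2) + 13/2)
sqrt_c = (8 / 135) * (2 ** 2.5 + 13 / 2)
c = sqrt_c ** 2
assert 0 < c < 2                      # c lies in (-1, 2), as required

fpp_c = (5 / 2) * (3 / 2) * sqrt_c    # f''(c) = (15/4) sqrt(c) for c > 0
lhs = f(2) - f(-1)                    # 2^(5/2) - 1
rhs = (-5 / 2) * 3 + 0.5 * fpp_c * 9  # f'(-1)(3) + (1/2) f''(c) (3)^2
assert abs(lhs - rhs) < 1e-12
```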
5 Taylor's theorem of order 2 and quadratic forms
As pointed out earlier, the mean value theorem is a special case of Taylor's Theorem. Using the formula on page 353, in Theorem 6.3.1, we see that if f is differentiable at every x, then for any x_0 and x there is a c between x_0 and x such that

f(x) = f(x_0) + Df(c)(x - x_0).

Recall that Df(c) is a row vector (the gradient).

Extending by one more term, Theorem 6.8.5 gives

f(x) = f(x_0) + Df(x_0)(x - x_0) + \frac{1}{2} D^2 f(c)(x - x_0, x - x_0).   (3)
We need to explain the last term. From the theory above, we see that if we write x - x_0 as a column vector, then the last term is of the form

D^2 f(c)(x - x_0, x - x_0) = (x - x_0)^T A (x - x_0),   (4)

where A is the n x n matrix \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]. Writing this out, we have the expression

((x - x_0)_1, (x - x_0)_2, \dots, (x - x_0)_n) \, A \begin{pmatrix} (x - x_0)_1 \\ (x - x_0)_2 \\ \vdots \\ (x - x_0)_n \end{pmatrix}   (5)

= \sum_{i,j=1}^{n} a_{ij} (x - x_0)_i (x - x_0)_j.   (6)
Let's look again at n = 2. Let

x - x_0 = \begin{pmatrix} u \\ v \end{pmatrix}

for scalars u and v. Then the expression in (5) becomes

a_{11} u^2 + a_{12} uv + a_{21} uv + a_{22} v^2.

But A is symmetric, so we get

a_{11} u^2 + 2 a_{12} uv + a_{22} v^2.
Such an expression is called a "quadratic form". In the n-dimensional case, with (x - x_0) = u, we get

a_{11} u_1^2 + a_{22} u_2^2 + \dots + a_{nn} u_n^2 + 2 a_{12} u_1 u_2 + 2 a_{13} u_1 u_3 + \dots + 2 a_{(n-1)n} u_{n-1} u_n,

which is again called a quadratic form. One of the chief questions one asks about a quadratic form is whether it is positive whenever u \ne 0. In that case it is called a "positive definite" quadratic form. One major reason that question is important is its application in the next section of the text to maxima and minima of functions f : R^n -> R.
6 Taylor's theorem
This is Theorem 6.8.5, which was referred to above. The proof is somewhat complicated, and is the longest proof in either Chapter 5 or Chapter 6. I will be content here to carry the expansion out one more term than in (3), thus adding a third derivative, and discussing the resulting expression. It is

f(x) = f(x_0) + Df(x_0)(x - x_0) + \frac{1}{2} D^2 f(x_0)(x - x_0, x - x_0) + \frac{1}{3!} D^3 f(c)(x - x_0, x - x_0, x - x_0).

The last term, with D^3 f(c), is called the "remainder term". Here c is a point on the line segment between x_0 and x.

You can tell from the last term that D^3 f(c) : R^n x R^n x R^n -> R. There are n^3 third derivatives \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}(c). Thus, they won't fit into a square matrix. We will
denote these derivatives by f_{abc}(c), where a, b, and c are each one of x or y. The third derivative term when n = 2, and "x" is \binom{x}{y}, turns out to be

\frac{1}{3!} \left[ f_{xxx}(c)(x - x_0)^3 + 3 f_{xxy}(c)(x - x_0)^2 (y - y_0) + 3 f_{xyy}(c)(x - x_0)(y - y_0)^2 + f_{yyy}(c)(y - y_0)^3 \right].   (7)
Can you see what the third order derivative would be when n = 3? What about the
fourth derivative term for n = 2 and n = 3?
7 Maxima and Minima

7.1 Positive definite quadratic forms
A quadratic form is a function Q : R^n -> R of the form

Q(u) = u^T A u

for some symmetric n x n matrix A. The relevance of this to Taylor's Theorem is seen by looking at equations (3) and (4).

Definition 4 A symmetric matrix A is called "positive definite" if Q(u) > 0 for every u \ne 0 in R^n.
There are two particularly useful criteria for determining whether A is positive definite. These are from linear algebra, and won't be proved here.

Theorem 5 A symmetric n x n matrix A is positive definite if either of the following conditions holds:
(i) All eigenvalues of A are positive.
(ii) All n "upper left" subdeterminants of A are positive.

An upper left subdeterminant is one formed by deleting between zero and n - 1 of the last rows and columns of A. This will be illustrated in class.
If A is a symmetric matrix and -A is positive definite, then A is called negative definite.
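Both criteria in Theorem 5 are easy to test numerically. A sketch (mine, using NumPy) that checks the eigenvalue condition and the "upper left" subdeterminants, using the Hessian [[2, 5], [5, 6]] from the earlier example and a second matrix B of my own:

```python
import numpy as np

def is_positive_definite(A):
    # Criterion (i): all eigenvalues of the symmetric matrix A are positive.
    by_eigenvalues = bool(np.all(np.linalg.eigvalsh(A) > 0))
    # Criterion (ii): all n upper-left subdeterminants are positive.
    n = A.shape[0]
    by_minors = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))
    assert by_eigenvalues == by_minors   # the two criteria agree
    return by_eigenvalues

A = np.array([[2.0, 5.0], [5.0, 6.0]])   # Hessian of x^2 y + x y^3 at (1, 1)
B = np.array([[2.0, 1.0], [1.0, 4.0]])   # a positive definite example
print(is_positive_definite(A), is_positive_definite(B))  # False True
```

Note that A fails both tests (its determinant is 2*6 - 5*5 = -13 < 0), so the quadratic form 2up + 5uq + 5vp + 6vq computed earlier is not positive definite.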
7.2 Application to maxima and minima
Equation (3) above allows us to determine criteria guaranteeing that a point x_0 is a local maximum or local minimum for the function f. To apply it, we must assume that f is of class C^2. There cannot be a local maximum at x_0 unless Df(x_0) = 0, for otherwise there is a nonzero directional derivative in some direction e, which means that

\frac{d}{dt} f(x_0 + te) \Big|_{t=0} \ne 0,

and so there are larger values of f either for t positive or t negative, and |t| small.
Definition 6 x_0 is called a "critical point" of f if Df(x_0) = 0.
We then repeat equation (3):

f(x) = f(x_0) + Df(x_0)(x - x_0) + \frac{1}{2} D^2 f(c)(x - x_0, x - x_0).

Assuming that Df(x_0) = 0, we get

f(x) = f(x_0) + \frac{1}{2} (x - x_0)^T D^2 f(c) (x - x_0).   (8)
Theorem 7 If x_0 is a critical point of f and the matrix corresponding to D^2 f(x_0) is positive definite, then x_0 is a local minimum for f. If D^2 f(x_0) is negative definite, then x_0 is a local maximum.

Proof. Suppose that x_0 is a critical point of f and A := D^2 f(x_0) is positive definite. Then e^T A e > 0 for every unit vector e in R^n. If u is a nonzero vector in R^n, then e = u / ||u|| is a unit vector. It follows that A is positive definite if and only if e^T A e > 0 for every unit vector e. But the set of all unit vectors in R^n is a compact set. Hence,

\lambda := \min_{||e|| = 1} e^T A e > 0.

Because D^2 f(x) is continuous, it follows that there is a \delta > 0 such that if ||c - x_0|| < \delta, then e^T D^2 f(c) e > \lambda / 2 > 0 for every unit vector e. Hence D^2 f(c) is also positive definite (i.e. the symmetric matrix corresponding to D^2 f(c) is positive definite). If ||x - x_0|| < \delta, then ||c - x_0|| < \delta, because c is on the line segment between x and x_0. Equation (8) then implies that if 0 < ||x - x_0|| < \delta, then f(x) > f(x_0). Hence x_0 is a local minimum for f.

The case of a maximum is similar.
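A small worked check of Theorem 7 (my example; the function f(x, y) = x^2 + xy + 2y^2 is not from the notes): the origin is a critical point, the Hessian [[2, 1], [1, 4]] is positive definite, and indeed f exceeds f(0, 0) at nearby points.

```python
import numpy as np

# f(x, y) = x^2 + x*y + 2*y^2; Df(x, y) = (2x + y, x + 4y) vanishes at (0, 0).
def f(x, y):
    return x**2 + x * y + 2 * y**2

H = np.array([[2.0, 1.0], [1.0, 4.0]])    # D^2 f at the critical point (0, 0)
assert np.all(np.linalg.eigvalsh(H) > 0)  # positive definite => local minimum

# Sample points on a small circle around the critical point: f > f(0, 0) there.
for theta in np.linspace(0.0, 2.0 * np.pi, 100):
    x, y = 0.01 * np.cos(theta), 0.01 * np.sin(theta)
    assert f(x, y) > f(0.0, 0.0)
```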
8 Homework, due Feb. 2 at the beginning of class
1. Use the formula in the middle of page 359 to write out completely the terms involving second derivatives in the Taylor series around x_0 = 0 of a function f : R^3 -> R. (That is, give the expression corresponding to (7) above, which is the third derivative term for a function from R^2 to R^1.) Then write out completely the terms in the Taylor series around 0 involving the third derivative for a function g : R^3 -> R.

2. pg. 386, # 19 c.

3. pg. 386, # 19 f.

4. pg. 384, # 7 b.

5. pg. 384, # 7 c. The answer is in the back of the book (page 708), but you need to show the calculations needed to get the answer. Referring to the answer in the book, you need only consider the points where k = 0 and m = 0; n = j = 0; and n = 1, j = 0.