UCSC Supplementary Notes
AMS/Econ 11B

Taylor's approximation in several variables.
© 2008, Yonatan Katznelson
Finding the extreme (minimum or maximum) values of a function is one of the most
important applications of differential calculus to economics. In general, there are two steps
to this process: (i) finding the points where extreme values may occur, and (ii) analyzing
the behavior of the function near these points to determine whether or not extreme values
actually occur there.
For a function of one variable, we saw that step (i) consisted of finding the critical
points of the function, and step (ii) consisted of using the first or second derivative test
to analyze the behavior of the function near the point.
Broadly speaking, we use the same two steps for functions of several variables. However,
because there are more variables, there are some technical differences in (i) the way critical
points are found, and (ii) the second derivative test.† To understand these differences, we’ll
study the second degree Taylor approximation.
1.  The one-variable case.
To make sense of Taylor's approximation in several variables, we first need to recall
what the approximation looks like for functions of one variable.‡ If the first and second
derivatives of the function y = f(t) are defined at t0, then the second degree Taylor
polynomial of f(t), centered at t0, is given by
T2(t) = f(t0) + f′(t0)(t − t0) + (f″(t0)/2)(t − t0)².    (1.1)
If the first, second and third derivatives of f (t) are defined for all points t in the interval
(t0 − a, t0 + a), then f (t) ≈ T2 (t) if t is sufficiently close to t0 . More precisely,
|f(t) − T2(t)| ≤ (|f‴(ϑ)|/6)|t − t0|³,    (1.2)
where ϑ is some point between t and t0 . If |t − t0 | < 1, then |t − t0 |3 is even smaller, and if
f 000 (t) is reasonably ‘well behaved’ (e.g., continuous) in the vicinity of t0 , then the expression
on the right-hand side of (1.2) will be very small if |t − t0 | is small. In particular, we will
assume throughout this discussion that ‘sufficiently small’ means small enough so that the
term (|f‴(ϑ)|/6)|t − t0|³ is much smaller than the quadratic term (f″(t0)/2)(t − t0)² of the Taylor
polynomial T2 (t) in (1.1).
To summarize, the second degree Taylor approximation (in one variable) says that, if
|t − t0 | is sufficiently small, then
f(t) ≈ f(t0) + f′(t0)(t − t0) + (f″(t0)/2)(t − t0)².    (1.3)

† The first derivative test does not generalize easily (if at all) to the several-variable scenario.
‡ For a more thorough discussion of Taylor's approximation for functions of one variable, please see SN 7 on the
review page of the 11A website: http://people.ucsc.edu/~yorik/11A/review.htm.
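The one-variable formula is easy to sanity-check numerically. Below is a minimal sketch (assuming SymPy; the function f(t) = eᵗ and the center t0 = 0 are illustrative choices, not from the notes) that builds T2 from formula (1.3) and compares its error at a nearby point with the bound (1.2):

```python
# Build the second degree Taylor polynomial (1.3) for f(t) = exp(t)
# about t0 = 0, then check the error at t = 0.1 against the bound (1.2).
import sympy as sp

t, t0 = sp.symbols('t t0')
f = sp.exp(t)

# T2(t) = f(t0) + f'(t0)(t - t0) + (f''(t0)/2)(t - t0)^2, as in (1.1)/(1.3)
T2 = (f.subs(t, t0)
      + sp.diff(f, t).subs(t, t0) * (t - t0)
      + sp.diff(f, t, 2).subs(t, t0) / 2 * (t - t0)**2)

T2_at_0 = sp.expand(T2.subs(t0, 0))   # this is 1 + t + t**2/2
err = abs(float((f - T2_at_0).subs(t, sp.Rational(1, 10))))

# Bound (1.2): |f'''| = e^t is at most e^0.1 between 0 and 0.1,
# so the error is at most (e^0.1 / 6) * (0.1)^3.
bound = float(sp.exp(sp.Rational(1, 10)) / 6 * sp.Rational(1, 1000))
print(T2_at_0, err, bound)
```

The printed error is roughly 1.7 × 10⁻⁴, comfortably below the bound, which illustrates the "sufficiently small" discussion above.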
2.  The chain rule.§
To derive the analogous approximation in several variables, we use the one-variable
version and the multivariable version of the chain rule.
Recall, if z = z(x, y) is a function of two variables, and both x and y are functions of
the variable t, i.e., x = x(t) and y = y(t), then z = z(x(t), y(t)) is also a function of t. If
all the functions are differentiable, then z has a derivative with respect to t, and the chain
rule says that
z′(t) = zx(x(t), y(t)) · x′(t) + zy(x(t), y(t)) · y′(t),    (2.1)
where zx and zy are the partial derivatives of z with respect to x and y, respectively.
To extend Taylor’s approximation to the two-variable scenario, we will apply the chain
rule in the special case that x(t) = x0 + t∆x and y(t) = y0 + t∆y, where both ∆x and ∆y
are constants. In this case, x′(t) = ∆x and y′(t) = ∆y, and Equation (2.1) simplifies to

z′(t) = zx(x(t), y(t)) · ∆x + zy(x(t), y(t)) · ∆y.    (2.2)
We can use the chain rule to compute second derivatives as well. Continuing with the
special case above, we have
z″(t) = d/dt (z′(t))
      = d/dt (zx(x(t), y(t)) · ∆x + zy(x(t), y(t)) · ∆y)
      = d/dt (zx(x(t), y(t)) · ∆x) + d/dt (zy(x(t), y(t)) · ∆y)
      = (zxx(x(t), y(t)) · ∆x + zxy(x(t), y(t)) · ∆y) · ∆x
      + (zyx(x(t), y(t)) · ∆x + zyy(x(t), y(t)) · ∆y) · ∆y.
Collecting terms, and remembering that zxy = zyx , we can simplify the last expression
above to
z″(t) = zxx(x(t), y(t))(∆x)² + 2zxy(x(t), y(t))∆x∆y + zyy(x(t), y(t))(∆y)².    (2.3)
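Formulas (2.2) and (2.3) can be verified on a concrete function. Here is a minimal sketch (assuming SymPy; the function z(x, y) = x²y + sin y and the symbolic constants are illustrative choices):

```python
# Check (2.2) and (2.3): along the line x(t) = x0 + t*dx, y(t) = y0 + t*dy,
# the t-derivatives of z(x(t), y(t)) match the chain-rule formulas.
import sympy as sp

t, x, y, x0, y0, dx, dy = sp.symbols('t x y x0 y0 dx dy')
z = x**2 * y + sp.sin(y)

xt, yt = x0 + t*dx, y0 + t*dy
composite = z.subs({x: xt, y: yt})        # z(x(t), y(t))

# Right-hand sides of (2.2) and (2.3), evaluated along the line.
rhs1 = (sp.diff(z, x)*dx + sp.diff(z, y)*dy).subs({x: xt, y: yt})
rhs2 = (sp.diff(z, x, 2)*dx**2 + 2*sp.diff(z, x, y)*dx*dy
        + sp.diff(z, y, 2)*dy**2).subs({x: xt, y: yt})

assert sp.simplify(sp.diff(composite, t) - rhs1) == 0      # formula (2.2)
assert sp.simplify(sp.diff(composite, t, 2) - rhs2) == 0   # formula (2.3)
print("chain-rule formulas (2.2) and (2.3) verified")
```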
More generally, if w(x1 , . . . , xn ) is a function of n variables, and the variables x1 through
xn are all functions of the variable t, then Equation (2.1) generalizes to
w′(t) = wx1(x1(t), …, xn(t)) · x′1(t) + · · · + wxn(x1(t), …, xn(t)) · x′n(t)
      = Σ_{j=1}^{n} wxj(x1(t), …, xn(t)) · x′j(t).
In the special case that
xj(t) = x̃j + t∆xj,    for j = 1, …, n,
where x̃j and ∆xj are all constants, Equation (2.2) generalizes to
w′(t) = Σ_{j=1}^{n} wxj(x1(t), …, xn(t)) · ∆xj,    (2.4)

§ See section 17.6 in the textbook.
and Equation (2.3) generalizes to
w″(t) = Σ_{j=1}^{n} Σ_{k=1}^{n} wxjxk(x1(t), …, xn(t)) · ∆xj∆xk.    (2.5)
The double-sum above means that we let both indices, j and k, take every value between 1
and n. For example, if n = 3, (2.5) can be written out explicitly as
w″(t) = wx1x1(x1(t), x2(t), x3(t))∆x1∆x1 + wx1x2(x1(t), x2(t), x3(t))∆x1∆x2
      + wx1x3(x1(t), x2(t), x3(t))∆x1∆x3
      + wx2x1(x1(t), x2(t), x3(t))∆x2∆x1 + wx2x2(x1(t), x2(t), x3(t))∆x2∆x2
      + wx2x3(x1(t), x2(t), x3(t))∆x2∆x3
      + wx3x1(x1(t), x2(t), x3(t))∆x3∆x1 + wx3x2(x1(t), x2(t), x3(t))∆x3∆x2
      + wx3x3(x1(t), x2(t), x3(t))∆x3∆x3.
Remembering that wxj xk = wxk xj , we can rearrange the terms and shorten the expression
above a little, to obtain
w″(t) = wx1x1(x1(t), x2(t), x3(t))∆x1² + wx2x2(x1(t), x2(t), x3(t))∆x2²
      + wx3x3(x1(t), x2(t), x3(t))∆x3²
      + 2wx1x2(x1(t), x2(t), x3(t))∆x1∆x2 + 2wx1x3(x1(t), x2(t), x3(t))∆x1∆x3
      + 2wx2x3(x1(t), x2(t), x3(t))∆x2∆x3.
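The double-sum formula can be spot-checked symbolically as well. A minimal sketch (assuming SymPy; the test function w and the line through (a1, a2, a3) are illustrative choices) for the n = 3 case of (2.5):

```python
# Check formula (2.5) for n = 3: the second t-derivative of
# w(x1(t), x2(t), x3(t)) along the line xj(t) = aj + t*dj equals the
# double sum of second partials times dj*dk.
import sympy as sp

t = sp.symbols('t')
xs = sp.symbols('x1 x2 x3')
a = sp.symbols('a1 a2 a3')    # the constants written x~j in the text
d = sp.symbols('d1 d2 d3')    # the increments ∆xj

w = xs[0]**3 + xs[0]*xs[1]*xs[2] + sp.exp(xs[2])   # an arbitrary test function
line = {xs[j]: a[j] + t*d[j] for j in range(3)}

lhs = sp.diff(w.subs(line), t, 2)
rhs = sum(sp.diff(w, xs[j], xs[k]).subs(line) * d[j]*d[k]
          for j in range(3) for k in range(3))

assert sp.simplify(lhs - rhs) == 0
print("double-sum formula (2.5) verified for n = 3")
```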
3.  The second order Taylor approximation in two variables.
Equation (1.3) says that a function of one variable can be well approximated by a
quadratic polynomial. We would like to find a similar approximation for functions of two
variables.¶
In other words, if z = f(x, y) is a function of two variables that is sufficiently
differentiable in a neighborhood of the point (x0, y0),‖ then we hope to find a quadratic
polynomial T2(x, y) such that
f(x, y) ≈ T2(x, y)

when (x, y) is sufficiently close to (x0, y0). Sufficiently close means, in this case, that both
∆x = (x − x0)    and    ∆y = (y − y0)
are small.
To find T2 (x, y) we apply Taylor’s approximation (1.3) to the function
z(t) = f (x(t), y(t)) = f (x0 + t∆x, y0 + t∆y),
and use formulas (2.2) and (2.3) to compute z′(t) and z″(t).
First, observe that
f (x, y) = f (x0 + (x − x0 ), y0 + (y − y0 )) = f (x0 + ∆x, y0 + ∆y) = z(1).
¶ And, in general, for a function of n variables, but first things first.
‖ I.e., all of its partial derivatives of order up to and including order 3 are defined around (x0, y0).
Next, applying the approximation (1.3) we have
f(x, y) = z(1) ≈ z(0) + z′(0)(1 − 0) + (z″(0)/2)(1 − 0)² = z(0) + z′(0) + z″(0)/2.    (3.1)
All we have to do now is express the three terms on the far right of this equation in terms
of the function f and its partial derivatives, and this is not hard to do, but does require
that you pay attention.
First, note that (x(0), y(0)) = (x0 + 0 · ∆x, y0 + 0 · ∆y) = (x0 , y0 ), so
z(0) = f(x(0), y(0)) = f(x0, y0).    (3.2)
Next, applying formula (2.2), we find that
z′(0) = fx(x(0), y(0)) · ∆x + fy(x(0), y(0)) · ∆y = fx(x0, y0) · ∆x + fy(x0, y0) · ∆y.    (3.3)
Likewise, applying formula (2.3) gives
z″(0) = fxx(x(0), y(0)) · ∆x² + 2fxy(x(0), y(0)) · ∆x∆y + fyy(x(0), y(0)) · ∆y²
      = fxx(x0, y0) · ∆x² + 2fxy(x0, y0) · ∆x∆y + fyy(x0, y0) · ∆y².    (3.4)
Finally, inserting the expressions above for z(0), z′(0) and z″(0) into the approximation
formula (3.1), and remembering that ∆x = (x−x0 ) and ∆y = (y −y0 ), we derive the Taylor
approximation in two variables,
f(x, y) ≈ f(x0, y0) + fx(x0, y0)(x − x0) + fy(x0, y0)(y − y0)
        + (fxx(x0, y0)/2)(x − x0)² + fxy(x0, y0)(x − x0)(y − y0) + (fyy(x0, y0)/2)(y − y0)²,    (3.5)
which is accurate when both |x − x0 | and |y − y0 | are sufficiently small.
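The two-variable approximation is also easy to test numerically. A minimal sketch (assuming SymPy; f(x, y) = eˣ cos y and the center (0, 0) are illustrative choices) that assembles T2(x, y) from the partial derivatives, exactly as in (3.5):

```python
# Build T2(x, y) from (3.5) for f(x, y) = exp(x)*cos(y) at (x0, y0) = (0, 0),
# then check that it agrees closely with f at a nearby point.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x) * sp.cos(y)
x0, y0 = 0, 0

at = lambda g: g.subs({x: x0, y: y0})   # evaluate an expression at the center

T2 = (at(f) + at(sp.diff(f, x))*(x - x0) + at(sp.diff(f, y))*(y - y0)
      + at(sp.diff(f, x, 2))/2*(x - x0)**2
      + at(sp.diff(f, x, y))*(x - x0)*(y - y0)
      + at(sp.diff(f, y, 2))/2*(y - y0)**2)

err = abs(float((f - T2).subs({x: sp.Rational(1, 10), y: sp.Rational(1, 10)})))
print(sp.expand(T2))   # the quadratic 1 + x + x**2/2 - y**2/2
print(err)             # a third-order quantity, on the order of 10**-4
```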
4.  The second order approximation in general.
As you can imagine, as the number of variables grows, the number of terms in the second
order Taylor polynomial grows as well. On the other hand, the way in which the polynomial
is derived is exactly the same as it was for two variables—using the one variable formula
(1.3) and the two special cases of the chain rule, (2.4) and (2.5).
Leaving out the details, the second order approximation for a function of n variables
looks like this:
f(x1, …, xn) ≈ f(x̃1, …, x̃n) + Σ_{j=1}^{n} fxj(x̃1, …, x̃n)(xj − x̃j)
             + (1/2) Σ_{j=1}^{n} Σ_{k=1}^{n} fxjxk(x̃1, …, x̃n)(xj − x̃j)(xk − x̃k),    (4.1)
where (x̃1, …, x̃n) is a fixed point, and the approximation is accurate if the distances
|xj − x̃j|, for j = 1, …, n, are all sufficiently small.
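The quadratic terms of the n-variable approximation carry a factor 1/2 on the double sum because the sum visits each mixed partial twice; with that factor, the n = 2 case collapses to the terms of (3.5). A minimal sketch (assuming SymPy; the test function is an illustrative choice) checking exactly this:

```python
# Check that (1/2) * the double sum over j, k of fxjxk * (xj - aj)(xk - ak),
# for n = 2, equals the quadratic terms of the two-variable formula (3.5).
import sympy as sp

x1, x2, a1, a2 = sp.symbols('x1 x2 a1 a2')
f = sp.sin(x1) * sp.exp(x2)           # an arbitrary smooth test function
xs, a = (x1, x2), (a1, a2)
at = lambda g: g.subs({x1: a1, x2: a2})

quad = sp.Rational(1, 2) * sum(
    at(sp.diff(f, xs[j], xs[k])) * (xs[j] - a[j]) * (xs[k] - a[k])
    for j in range(2) for k in range(2))

expected = (at(sp.diff(f, x1, 2))/2 * (x1 - a1)**2
            + at(sp.diff(f, x1, x2)) * (x1 - a1)*(x2 - a2)
            + at(sp.diff(f, x2, 2))/2 * (x2 - a2)**2)

assert sp.simplify(quad - expected) == 0
print("the n = 2 case of the double sum matches (3.5)")
```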
5.  An example.
Consider the function
f(x, y) = (1 − x³/2 − y³/3) e^(0.75x² + y²),
whose graph in the neighborhood of (0, 0) appears in Figure 1, below. The point (0, 0, 1)
[Figure 1: The graph of f(x, y) = (1 − x³/2 − y³/3) e^(0.75x² + y²).]
on the graph is the point from which the vertical arrow is extending, and it appears to be
the lowest point on the graph in its own immediate vicinity. In other words, it appears that
f (0, 0) = 1 is a local minimum value of the function, meaning that
f (0, 0) < f (x, y)
for all points (x, y) that are sufficiently close to (0, 0). However, pictures can be misleading,
so it is important to have an analytic argument proving that f(0, 0) is a local minimum
value.
Analyzing the function f(x, y) as it is defined may be a little complicated, and this is
where the Taylor approximation of f(x, y), centered at (0, 0), becomes very useful. Applying
formula (3.5) in this case, with (x0, y0) = (0, 0), yields
f(x, y) ≈ 1 + fx(0, 0)x + fy(0, 0)y + (fxx(0, 0)/2)x² + fxy(0, 0)xy + (fyy(0, 0)/2)y²,    (5.1)

which is accurate for (x, y) sufficiently close to (0, 0). The quadratic on the right is a very
simple function of x and y, and as you will see, it will become even simpler once we
actually evaluate the various partial derivatives of f(x, y) at (0, 0).
Differentiating once yields

fx(x, y) = −(3x²/2) e^(0.75x² + y²) + 1.5x (1 − x³/2 − y³/3) e^(0.75x² + y²)

and

fy(x, y) = −y² e^(0.75x² + y²) + 2y (1 − x³/2 − y³/3) e^(0.75x² + y²).
Evaluating these expressions at the point (0, 0) gives
fx(0, 0) = 0    and    fy(0, 0) = 0,
which means that there are no linear terms in the Taylor approximation above.
Differentiating the first derivatives (and collecting terms) yields the second order derivatives:

fxx(x, y) = [−(3x + 4.5x³) + (1.5 + 2.25x²)(1 − x³/2 − y³/3)] e^(0.75x² + y²),

fxy(x, y) = [−3x²y − 1.5xy² + 3xy (1 − x³/2 − y³/3)] e^(0.75x² + y²)

and

fyy(x, y) = [−(2y + 4y³) + (2 + 4y²)(1 − x³/2 − y³/3)] e^(0.75x² + y²).
As complicated as they appear,∗∗ the second order partial derivatives are very easy to
evaluate at the point (0, 0). Indeed
fxx(0, 0) = [−0 + (1.5 + 0)(1 − 0 − 0)] e^0 = 1.5 · 1 · 1 = 1.5,

fxy(0, 0) = [0 − 0 + 0 · (1 − 0 − 0)] e^0 = 0,

and

fyy(0, 0) = [−0 + (2 + 0)(1 − 0 − 0)] e^0 = 2 · 1 · 1 = 2.
Returning to the Taylor approximation (5.1), and inserting the values that we have just
computed, we see that when (x, y) is close to (0, 0),
f(x, y) ≈ 1 + (3x²/4) + y².
This means that if (x, y) is close (but not equal) to (0, 0), then

f(x, y) − f(0, 0) ≈ (3x²/4) + y² > 0,
which means that f(x, y) > f(0, 0). This shows that f(0, 0) is a local minimum value,
as we were trying to show.
This example illustrates how the Taylor approximation is used to analyze the behavior
of a function in the neighborhood of a point where the first order partial derivatives vanish.
This approach leads to the second derivative test for functions of two or more variables.
∗∗ I recommend that you compute these partial derivatives on your own, for practice.
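For readers who would like to double-check the hand computations above, here is a minimal sketch (assuming SymPy) that recomputes the example's partial derivatives at (0, 0):

```python
# Recompute the partial derivatives of the example's function at (0, 0):
# f(x, y) = (1 - x^3/2 - y^3/3) * exp(0.75 x^2 + y^2).
import sympy as sp

x, y = sp.symbols('x y')
f = (1 - x**3/2 - y**3/3) * sp.exp(sp.Rational(3, 4)*x**2 + y**2)

at0 = lambda g: g.subs({x: 0, y: 0})   # evaluate at the center (0, 0)

# First order terms vanish, so (0, 0) is a critical point.
assert at0(sp.diff(f, x)) == 0 and at0(sp.diff(f, y)) == 0

# Second order data matching the text: fxx = 1.5, fxy = 0, fyy = 2.
assert at0(sp.diff(f, x, 2)) == sp.Rational(3, 2)
assert at0(sp.diff(f, x, y)) == 0
assert at0(sp.diff(f, y, 2)) == 2
print("second order data at (0, 0):",
      at0(sp.diff(f, x, 2)), at0(sp.diff(f, x, y)), at0(sp.diff(f, y, 2)))
```

Since both quadratic coefficients are positive and the cross term vanishes, the approximation 1 + (3x²/4) + y² confirms the local minimum analytically.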