UCSC supplementary notes, AMS/ECON 11B

Taylor's approximation in several variables.

© 2008, Yonatan Katznelson

Finding the extreme (minimum or maximum) values of a function is one of the most important applications of differential calculus to economics. In general, there are two steps to this process: (i) finding the points where extreme values may occur, and (ii) analyzing the behavior of the function near these points, to determine whether or not extreme values actually occur there.

For a function of one variable, we saw that step (i) consisted of finding the critical points of the function, and step (ii) consisted of using the first or second derivative test to analyze the behavior of the function near the point. Broadly speaking, we use the same two steps for functions of several variables. However, because there are more variables, there are some technical differences in (i) the way critical points are found, and (ii) the second derivative test.† To understand these differences, we'll study the second degree Taylor approximation.

1. The one-variable case.

To make sense of Taylor's approximation in several variables, we first need to recall what the approximation looks like for functions of one variable.‡ If the first and second derivatives of the function $y = f(t)$ are defined at $t_0$, then the second degree Taylor polynomial of $f(t)$, centered at $t_0$, is given by

$$T_2(t) = f(t_0) + f'(t_0)(t - t_0) + \frac{f''(t_0)}{2}(t - t_0)^2. \qquad (1.1)$$

If the first, second and third derivatives of $f(t)$ are defined for all points $t$ in the interval $(t_0 - a, t_0 + a)$, then $f(t) \approx T_2(t)$ if $t$ is sufficiently close to $t_0$. More precisely,

$$|f(t) - T_2(t)| \le \frac{f'''(\vartheta)}{6}\,|t - t_0|^3, \qquad (1.2)$$

where $\vartheta$ is some point between $t$ and $t_0$.
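The approximation (1.1) and the error bound (1.2) are easy to check numerically. The sketch below is not part of the original notes; the choice $f(t) = e^t$ and the sample point are invented for illustration. For this $f$, every derivative equals $e^t$, so $T_2(t) = 1 + t + t^2/2$ when centered at $t_0 = 0$.

```python
import math

# Check the second degree Taylor approximation (1.1) and the error bound (1.2)
# for the illustrative choice f(t) = e^t, centered at t0 = 0.  For this f,
# f(t0) = f'(t0) = f''(t0) = 1, so T2(t) = 1 + t + t^2/2.
def f(t):
    return math.exp(t)

def T2(t):
    return 1.0 + t + t**2 / 2.0

t = 0.1
error = abs(f(t) - T2(t))

# Bound (1.2): |f(t) - T2(t)| <= f'''(theta)/6 * |t - t0|^3 for some theta
# between 0 and t.  Since f''' = e^t is increasing, f'''(theta) <= e^t here.
bound = math.exp(t) / 6.0 * abs(t) ** 3

print(error, bound)   # the error is small and does not exceed the bound
```

Note how quickly the error shrinks: it is of order $|t - t_0|^3$, while the terms kept in $T_2$ are of order $|t - t_0|^2$ and larger.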
If $|t - t_0| < 1$, then $|t - t_0|^3$ is even smaller, and if $f'''(t)$ is reasonably 'well behaved' (e.g., continuous) in the vicinity of $t_0$, then the expression on the right-hand side of (1.2) will be very small if $|t - t_0|$ is small. In particular, we will assume throughout this discussion that 'sufficiently small' means small enough so that the term $\frac{f'''(\vartheta)}{6}|t - t_0|^3$ is much smaller than the quadratic term $\frac{f''(t_0)}{2}(t - t_0)^2$ of the Taylor polynomial $T_2(t)$ in (1.1).

To summarize, the second degree Taylor approximation (in one variable) says that, if $|t - t_0|$ is sufficiently small, then

$$f(t) \approx f(t_0) + f'(t_0)(t - t_0) + \frac{f''(t_0)}{2}(t - t_0)^2. \qquad (1.3)$$

† The first derivative test does not generalize easily (if at all) to the several-variable scenario.

‡ For a more thorough discussion of Taylor's approximation for functions of one variable, please see SN 7 on the review page of the 11A website: http://people.ucsc.edu/~yorik/11A/review.htm.

2. The chain rule.§

To derive the analogous approximation in several variables, we use the one-variable version and the multivariable version of the chain rule. Recall, if $z = z(x, y)$ is a function of two variables, and both $x$ and $y$ are functions of the variable $t$, i.e., $x = x(t)$ and $y = y(t)$, then $z = f(x(t), y(t))$ is also a function of $t$. If all the functions are differentiable, then $z$ has a derivative with respect to $t$, and the chain rule says that

$$z'(t) = z_x(x(t), y(t)) \cdot x'(t) + z_y(x(t), y(t)) \cdot y'(t), \qquad (2.1)$$

where $z_x$ and $z_y$ are the partial derivatives of $z$ with respect to $x$ and $y$, respectively.

To extend Taylor's approximation to the two-variable scenario, we will apply the chain rule in the special case that $x(t) = x_0 + t\Delta x$ and $y(t) = y_0 + t\Delta y$, where both $\Delta x$ and $\Delta y$ are constants. In this case, $x'(t) = \Delta x$ and $y'(t) = \Delta y$, and Equation (2.1) simplifies to

$$z'(t) = z_x(x(t), y(t)) \cdot \Delta x + z_y(x(t), y(t)) \cdot \Delta y. \qquad (2.2)$$

We can use the chain rule to compute second derivatives as well.
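Before turning to second derivatives, formula (2.2) can be checked numerically. In the sketch below (not from the notes; the function $z(x, y) = x^2 y$ and the numbers are invented for illustration), the derivative from (2.2) is compared against a centered difference quotient.

```python
# A numerical sanity check of the chain rule formula (2.2), using the made-up
# example z(x, y) = x^2 y along the line x(t) = x0 + t*dx, y(t) = y0 + t*dy.
def z(x, y):
    return x * x * y

def z_x(x, y):      # partial derivative of z with respect to x
    return 2 * x * y

def z_y(x, y):      # partial derivative of z with respect to y
    return x * x

x0, y0, dx, dy = 1.0, 2.0, 0.3, -0.5   # arbitrary center and increments

def x_of(t):
    return x0 + t * dx

def y_of(t):
    return y0 + t * dy

def z_prime(t):
    # Formula (2.2): z'(t) = z_x(x(t), y(t)) * dx + z_y(x(t), y(t)) * dy
    return z_x(x_of(t), y_of(t)) * dx + z_y(x_of(t), y_of(t)) * dy

# Compare against a centered difference quotient at t = 0.7.
t, h = 0.7, 1e-6
numeric = (z(x_of(t + h), y_of(t + h)) - z(x_of(t - h), y_of(t - h))) / (2 * h)
print(z_prime(t), numeric)   # the two values agree to many decimal places
```

The same comparison works for any differentiable $z$, $x(t)$ and $y(t)$; only the three derivative functions need to change.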
Continuing with the special case above, we have

$$\begin{aligned}
z''(t) &= \frac{d}{dt}\bigl(z'(t)\bigr)\\
&= \frac{d}{dt}\bigl(z_x(x(t), y(t)) \cdot \Delta x + z_y(x(t), y(t)) \cdot \Delta y\bigr)\\
&= \frac{d}{dt}\bigl(z_x(x(t), y(t)) \cdot \Delta x\bigr) + \frac{d}{dt}\bigl(z_y(x(t), y(t)) \cdot \Delta y\bigr)\\
&= \bigl(z_{xx}(x(t), y(t)) \cdot \Delta x + z_{xy}(x(t), y(t)) \cdot \Delta y\bigr) \cdot \Delta x + \bigl(z_{yx}(x(t), y(t)) \cdot \Delta x + z_{yy}(x(t), y(t)) \cdot \Delta y\bigr) \cdot \Delta y.
\end{aligned}$$

Collecting terms, and remembering that $z_{xy} = z_{yx}$, we can simplify the last expression above to

$$z''(t) = z_{xx}(x(t), y(t))(\Delta x)^2 + 2z_{xy}(x(t), y(t))\,\Delta x\,\Delta y + z_{yy}(x(t), y(t))(\Delta y)^2. \qquad (2.3)$$

More generally, if $w(x_1, \ldots, x_n)$ is a function of $n$ variables, and the variables $x_1$ through $x_n$ are all functions of the variable $t$, then Equation (2.1) generalizes to

$$w'(t) = w_{x_1}(x_1(t), \ldots, x_n(t)) \cdot x_1'(t) + \cdots + w_{x_n}(x_1(t), \ldots, x_n(t)) \cdot x_n'(t) = \sum_{j=1}^{n} w_{x_j}(x_1(t), \ldots, x_n(t)) \cdot x_j'(t).$$

In the special case that $x_j(t) = \tilde{x}_j + t\Delta x_j$, for $j = 1, \ldots, n$, where the $\tilde{x}_j$ and $\Delta x_j$ are all constants, Equation (2.2) generalizes to

$$w'(t) = \sum_{j=1}^{n} w_{x_j}(x_1(t), \ldots, x_n(t)) \cdot \Delta x_j, \qquad (2.4)$$

§ See section 17.6 in the textbook.

and Equation (2.3) generalizes to

$$w''(t) = \sum_{j=1}^{n} \sum_{k=1}^{n} w_{x_j x_k}(x_1(t), \ldots, x_n(t)) \cdot \Delta x_j\,\Delta x_k. \qquad (2.5)$$

The double sum above means that we let both indices, $j$ and $k$, take every value between 1 and $n$. For example, if $n = 3$, (2.5) can be written out explicitly as

$$\begin{aligned}
w''(t) ={}& w_{x_1 x_1}\,\Delta x_1\,\Delta x_1 + w_{x_1 x_2}\,\Delta x_1\,\Delta x_2 + w_{x_1 x_3}\,\Delta x_1\,\Delta x_3\\
&+ w_{x_2 x_1}\,\Delta x_2\,\Delta x_1 + w_{x_2 x_2}\,\Delta x_2\,\Delta x_2 + w_{x_2 x_3}\,\Delta x_2\,\Delta x_3\\
&+ w_{x_3 x_1}\,\Delta x_3\,\Delta x_1 + w_{x_3 x_2}\,\Delta x_3\,\Delta x_2 + w_{x_3 x_3}\,\Delta x_3\,\Delta x_3,
\end{aligned}$$

where each of the nine partial derivatives is evaluated at the point $(x_1(t), x_2(t), x_3(t))$.
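As with (2.2), formula (2.3) is easy to test numerically before we use it. A minimal sketch (the function $z(x, y) = x^2 y$ and the numbers are invented for illustration, not taken from the notes) compares (2.3) against a second difference quotient:

```python
# A numerical sanity check of formula (2.3) for z''(t), using the invented
# example z(x, y) = x^2 y along the line x(t) = x0 + t*dx, y(t) = y0 + t*dy.
def z(x, y):
    return x * x * y

# Second order partial derivatives of z(x, y) = x^2 y.
def z_xx(x, y):
    return 2 * y

def z_xy(x, y):
    return 2 * x

def z_yy(x, y):
    return 0.0

x0, y0, dx, dy = 1.0, 2.0, 0.3, -0.5   # arbitrary center and increments

def g(t):                              # z restricted to the line
    return z(x0 + t * dx, y0 + t * dy)

def g_second(t):
    # Formula (2.3): z''(t) = z_xx*dx^2 + 2*z_xy*dx*dy + z_yy*dy^2,
    # with the partials evaluated at (x(t), y(t)).
    xt, yt = x0 + t * dx, y0 + t * dy
    return z_xx(xt, yt) * dx**2 + 2 * z_xy(xt, yt) * dx * dy + z_yy(xt, yt) * dy**2

# Compare against a second difference quotient at t = 0.7.
t, h = 0.7, 1e-4
numeric = (g(t + h) - 2 * g(t) + g(t - h)) / h**2
print(g_second(t), numeric)   # the two values agree closely
```

Here $g(t)$ is a cubic polynomial in $t$, so the second difference quotient agrees with (2.3) up to roundoff.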
Remembering that $w_{x_j x_k} = w_{x_k x_j}$, we can rearrange and shorten the expanded form of (2.5) a little, to obtain

$$\begin{aligned}
w''(t) ={}& w_{x_1 x_1}\,\Delta x_1^2 + w_{x_2 x_2}\,\Delta x_2^2 + w_{x_3 x_3}\,\Delta x_3^2\\
&+ 2w_{x_1 x_2}\,\Delta x_1\,\Delta x_2 + 2w_{x_1 x_3}\,\Delta x_1\,\Delta x_3 + 2w_{x_2 x_3}\,\Delta x_2\,\Delta x_3,
\end{aligned}$$

where, again, each partial derivative is evaluated at $(x_1(t), x_2(t), x_3(t))$.

3. The second order Taylor approximation in two variables.

Equation (1.3) says that a function of one variable can be well approximated by a quadratic polynomial. We would like to find a similar approximation for functions of two variables.¶ In other words, if $z = f(x, y)$ is a function of two variables that is sufficiently differentiable in the neighborhood of the point $(x_0, y_0)$,‖ then we hope to find a quadratic polynomial $T_2(x, y)$ such that $f(x, y) \approx T_2(x, y)$ when $(x, y)$ is sufficiently close to $(x_0, y_0)$. Sufficiently close means, in this case, that both $\Delta x = (x - x_0)$ and $\Delta y = (y - y_0)$ are small.

¶ And, in general, for a function of $n$ variables, but first things first.

‖ I.e., all of its partial derivatives of order up to and including order 3 are defined around $(x_0, y_0)$.

To find $T_2(x, y)$ we apply Taylor's approximation (1.3) to the function $z(t) = f(x(t), y(t)) = f(x_0 + t\Delta x, y_0 + t\Delta y)$, and use formulas (2.2) and (2.3) to compute $z'(t)$ and $z''(t)$. First, observe that

$$f(x, y) = f(x_0 + (x - x_0), y_0 + (y - y_0)) = f(x_0 + \Delta x, y_0 + \Delta y) = z(1).$$

Next, applying the approximation (1.3) (centered at $t_0 = 0$, with $t = 1$) we have

$$f(x, y) = z(1) \approx z(0) + z'(0)(1 - 0) + \frac{z''(0)}{2}(1 - 0)^2 = z(0) + z'(0) + \frac{z''(0)}{2}. \qquad (3.1)$$

All we have to do now is express the three terms on the far right of this equation in terms of the function $f$ and its partial derivatives. This is not hard to do, but it does require that you pay attention. First, note that $(x(0), y(0)) = (x_0 + 0 \cdot \Delta x, y_0 + 0 \cdot \Delta y) = (x_0, y_0)$, so

$$z(0) = f(x(0), y(0)) = f(x_0, y_0).$$
(3.2)

Next, applying formula (2.2), we find that

$$z'(0) = f_x(x(0), y(0)) \cdot \Delta x + f_y(x(0), y(0)) \cdot \Delta y = f_x(x_0, y_0) \cdot \Delta x + f_y(x_0, y_0) \cdot \Delta y. \qquad (3.3)$$

Likewise, applying formula (2.3) gives

$$\begin{aligned}
z''(0) &= f_{xx}(x(0), y(0)) \cdot \Delta x^2 + 2f_{xy}(x(0), y(0)) \cdot \Delta x\,\Delta y + f_{yy}(x(0), y(0)) \cdot \Delta y^2\\
&= f_{xx}(x_0, y_0) \cdot \Delta x^2 + 2f_{xy}(x_0, y_0) \cdot \Delta x\,\Delta y + f_{yy}(x_0, y_0) \cdot \Delta y^2.
\end{aligned} \qquad (3.4)$$

Finally, inserting the expressions above for $z(0)$, $z'(0)$ and $z''(0)$ into the approximation formula (3.1), and remembering that $\Delta x = (x - x_0)$ and $\Delta y = (y - y_0)$, we derive the Taylor approximation in two variables,

$$\begin{aligned}
f(x, y) \approx{}& f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)\\
&+ \frac{f_{xx}(x_0, y_0)}{2}(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \frac{f_{yy}(x_0, y_0)}{2}(y - y_0)^2,
\end{aligned} \qquad (3.5)$$

which is accurate when both $|x - x_0|$ and $|y - y_0|$ are sufficiently small.

4. The second order approximation in general.

As you can imagine, as the number of variables grows, the number of terms in the second order Taylor polynomial grows as well. On the other hand, the polynomial is derived in exactly the same way as it was for two variables: using the one-variable formula (1.3) and the two special cases of the chain rule, (2.4) and (2.5). Leaving out the details, the second order approximation for a function of $n$ variables looks like this:

$$f(x_1, \ldots, x_n) \approx f(\tilde{x}_1, \ldots, \tilde{x}_n) + \sum_{j=1}^{n} f_{x_j}(\tilde{x}_1, \ldots, \tilde{x}_n)(x_j - \tilde{x}_j) + \frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n} f_{x_j x_k}(\tilde{x}_1, \ldots, \tilde{x}_n) \cdot (x_j - \tilde{x}_j)(x_k - \tilde{x}_k), \qquad (4.1)$$

where $(\tilde{x}_1, \ldots, \tilde{x}_n)$ is a fixed point and the approximation is accurate if the distances $|x_j - \tilde{x}_j|$, for $j = 1, \ldots, n$, are all sufficiently small. (The factor $\frac{1}{2}$ multiplying the double sum comes from the $z''(0)/2$ term in (3.1); with this factor, (4.1) reduces to (3.5) when $n = 2$.)

5. An example.

Consider the function

$$f(x, y) = \left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right) e^{0.75x^2 + y^2},$$

whose graph in the neighborhood of $(0, 0)$ appears in Figure 1, below.

[Figure 1: The graph of $f(x, y) = \left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right) e^{0.75x^2 + y^2}$.]

The point $(0, 0, 1)$
on the graph is the point from which the vertical arrow is extending, and it appears to be the lowest point on the graph in its own immediate vicinity. In other words, it appears that $f(0, 0) = 1$ is a local minimum value of the function, meaning that $f(0, 0) < f(x, y)$ for all points $(x, y)$ that are sufficiently close (but not equal) to $(0, 0)$. However, pictures can be misleading, so it is important to have an analytic argument that proves that $f(0, 0)$ is a local minimum value.

Analyzing the function $f(x, y)$ as it is defined may be a little complicated, and this is where the Taylor approximation of $f(x, y)$, centered at $(0, 0)$, becomes very useful. Applying formula (3.5) in this case yields

$$f(x, y) \approx 1 + f_x(0, 0)\,x + f_y(0, 0)\,y + \frac{f_{xx}(0, 0)}{2}x^2 + f_{xy}(0, 0)\,xy + \frac{f_{yy}(0, 0)}{2}y^2, \qquad (5.1)$$

which is accurate for $(x, y)$ sufficiently close to $(0, 0)$. The quadratic on the right is a very simple function of $x$ and $y$, and as you will see, it will become even simpler once we actually evaluate the various partial derivatives of $f(x, y)$ at $(0, 0)$.

Differentiating once yields

$$f_x(x, y) = -\frac{3x^2}{2}\,e^{0.75x^2 + y^2} + 1.5x\left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right) e^{0.75x^2 + y^2}$$

and

$$f_y(x, y) = -y^2\,e^{0.75x^2 + y^2} + 2y\left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right) e^{0.75x^2 + y^2}.$$

Evaluating these expressions at the point $(0, 0)$ gives $f_x(0, 0) = 0$ and $f_y(0, 0) = 0$, which means that there are no linear terms in the Taylor approximation above.

Differentiating the first derivatives (and collecting terms) yields the second order derivatives:

$$f_{xx}(x, y) = \left[-\left(3x + 4.5x^3\right) + \left(1.5 + 2.25x^2\right)\left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right)\right] e^{0.75x^2 + y^2},$$

$$f_{xy}(x, y) = \left[-3x^2 y - 1.5xy^2 + 3xy\left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right)\right] e^{0.75x^2 + y^2}$$

and

$$f_{yy}(x, y) = \left[-\left(2y + 4y^3\right) + \left(2 + 4y^2\right)\left(1 - \frac{x^3}{2} - \frac{y^3}{3}\right)\right] e^{0.75x^2 + y^2}.$$

As complicated as they appear,∗∗ the second order partial derivatives are very easy to evaluate at the point $(0, 0)$.
Indeed,

$$f_{xx}(0, 0) = \left[-0 + (1.5 + 0)(1 - 0 - 0)\right] e^0 = 1.5 \cdot 1 \cdot 1 = 1.5,$$

$$f_{xy}(0, 0) = \left[0 - 0 + 0 \cdot (1 - 0 - 0)\right] e^0 = 0$$

and

$$f_{yy}(0, 0) = \left[-0 + (2 + 0)(1 - 0 - 0)\right] e^0 = 2 \cdot 1 \cdot 1 = 2.$$

Returning to the Taylor approximation (5.1), and inserting the values that we have just computed, we see that when $(x, y)$ is close to $(0, 0)$,

$$f(x, y) \approx 1 + \frac{3x^2}{4} + y^2.$$

This means that if $(x, y)$ is close (but not equal) to $(0, 0)$, then

$$f(x, y) - f(0, 0) \approx \frac{3x^2}{4} + y^2 > 0,$$

which means that $f(x, y) > f(0, 0)$. But this shows that $f(0, 0)$ is a local minimum value, as we were trying to show.

This example illustrates how the Taylor approximation is used to analyze the behavior of a function in the neighborhood of a point where the first order partial derivatives vanish. This approach leads to the second derivative test for functions of two or more variables.

∗∗ I recommend that you compute these partial derivatives on your own, for practice.
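The arithmetic in this example is easy to double check numerically. The sketch below is not part of the original notes: it recomputes the second order partials at $(0, 0)$ by difference quotients and then confirms that $f(x, y)$ exceeds $f(0, 0) = 1$ at a few sample points near the origin, in line with the approximation $1 + \frac{3x^2}{4} + y^2$.

```python
import math

# f(x, y) = (1 - x^3/2 - y^3/3) e^{0.75 x^2 + y^2}, the function from Section 5.
def f(x, y):
    return (1 - x**3 / 2 - y**3 / 3) * math.exp(0.75 * x**2 + y**2)

h = 1e-4
# Second difference quotients at (0, 0); they should approximate the computed
# values f_xx(0,0) = 1.5, f_yy(0,0) = 2 and f_xy(0,0) = 0.
fxx = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h**2
fyy = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h**2
fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h**2)
print(fxx, fyy, fxy)

# f should exceed f(0, 0) = 1 near the origin, and the Taylor polynomial
# T2(x, y) = 1 + 3x^2/4 + y^2 should be a close approximation there.
def T2(x, y):
    return 1 + 0.75 * x**2 + y**2

for (x, y) in [(0.1, 0.05), (-0.1, 0.1), (0.05, -0.08)]:
    print(x, y, f(x, y) > 1.0, abs(f(x, y) - T2(x, y)) < 1e-3)
```

Of course, a finite sample of points is only a sanity check, not a proof; the analytic argument above is what actually establishes the local minimum.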