Section 14.5 The Chain Rule for Multivariate Functions

In Calculus 1, we often considered functions $f(x)$ where $x$ was actually a function of another variable, say $x = g(t)$. To differentiate $f$ with respect to $t$ (as opposed to differentiating it with respect to $x$), we must use the chain rule:

$$\frac{d}{dt} f(g(t)) = f'(g(t))\,g'(t) = f'(x)\,\frac{dx}{dt}.$$

For example, if $f(x) = \sin x$ and $x = g(t) = e^t$, then the rate of change of $f$ with respect to $x$ is given by

$$\frac{d}{dx} f(x) = \frac{d}{dx} \sin x = \cos x,$$

while the rate of change of $f$ with respect to $t$ is given by

$$\frac{d}{dt} f(x) = \frac{d}{dt} f(g(t)) = \frac{d}{dt} \sin(e^t) = e^t \cos(e^t).$$

Now that we are working with multivariate functions, we can consider similar questions for $f(x, y)$ when $x$ and $y$ are actually functions of another variable (or of multiple variables). There are many different such situations, but all of the outcomes look very similar to the chain rule for a single-variable function.

The Chain Rule for $f(x, y)$ when $x$ and $y$ are functions of a parameter $t$

We have seen that, if $f$ is a function of two variables $x$ and $y$, then each of the variables $x$ and $y$ plays a role in the rate of change of the height of the surface $f(x, y)$. Because of this, we consider the rate of change of $f$ with respect to $x$, $f_x$, and the rate of change of $f$ with respect to $y$, $f_y$, separately.

Now it may be the case that each of $x$ and $y$ is actually controlled by a single variable $t$; if so, we think of $x$ as $x = x(t)$ and $y$ as $y = y(t)$. Then we may rewrite $f(x, y) = f(x(t), y(t))$. While we can still consider the rate of change of $f$ with respect to $x$ or with respect to $y$ (the intermediate variables), thinking of $f$ as a function of the single variable $t$ allows us to consider the rate of change of $f$ with respect to $t$.

Let's break down the way the chain rule for one intermediate variable works: $\frac{d}{dt} f(g(t))$ is determined by finding the "intermediate derivative" $f'$, as well as the derivative $g'$ of the inside function $g$.
We put all of this information together by multiplying $f'(g(t))$ by $g'(t)$, so that

$$\frac{d}{dt} f(g(t)) = f'(g(t))\,g'(t).$$

The chain rule for $f(x(t), y(t))$ works nearly the same way: we calculate $\frac{d}{dt} f(x(t), y(t))$ by finding the "intermediate derivatives" $f_x$ and $f_y$, as well as the derivatives $x'(t)$ and $y'(t)$ of the inside functions $x(t)$ and $y(t)$. Finally, we put all of the information together:

Theorem 2. If $f(x, y)$ has continuous partial derivatives $f_x$ and $f_y$, and if $x = x(t)$ and $y = y(t)$ are differentiable functions of $t$, then the composite function $f(x(t), y(t))$ is a differentiable function of $t$ and

$$\frac{d}{dt} f(x(t), y(t)) = f_x(x(t), y(t))\,x'(t) + f_y(x(t), y(t))\,y'(t) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$

Notice that our final answer is no longer a partial derivative; since we are thinking of $f$ as being controlled by the single variable $t$, our answer is a full (ordinary) derivative, not a partial one.

Example: Given the function $f(x, y) = x^3 \sin y$, where $x = \ln t$ and $y = t^2$, find:

1. The rate of change of $f$ with respect to $x$
2. The rate of change of $f$ with respect to $y$
3. The rate of change of $f$ with respect to $t$.

To answer (1), we calculate the partial $f_x$:

$$f_x = 3x^2 \sin y.$$

Similarly, for (2),

$$f_y = x^3 \cos y.$$

To answer (3), we will need to know the rates of change of $x$ and $y$ with respect to $t$:

$$\frac{d}{dt} x(t) = \frac{d}{dt} \ln t = \frac{1}{t} \quad \text{and} \quad \frac{d}{dt} y(t) = \frac{d}{dt} t^2 = 2t.$$

So the derivative of $f$ with respect to $t$ is

$$\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} \\
&= (3x^2 \sin y)\left(\frac{1}{t}\right) + (x^3 \cos y)(2t) \\
&= \left(3(\ln t)^2 \sin(t^2)\right)\left(\frac{1}{t}\right) + \left((\ln t)^3 \cos(t^2)\right)(2t) \\
&= \frac{3(\ln t)^2 \sin(t^2)}{t} + 2t(\ln t)^3 \cos(t^2).
\end{aligned}$$

This is actually not the only way to calculate $\frac{df}{dt}$. Since we already know that $x = \ln t$ and $y = t^2$, we could begin by rewriting $f(x, y)$ as $f(\ln t, t^2) = (\ln t)^3 \sin(t^2)$. Then, by the product rule and the single-variable chain rule,

$$\frac{df}{dt} = \frac{d}{dt}\left[(\ln t)^3 \sin(t^2)\right] = \frac{3(\ln t)^2 \sin(t^2)}{t} + 2t(\ln t)^3 \cos(t^2).$$

This is exactly the same answer we got in the previous computation, and either method is perfectly acceptable to use (although the first calculation may be a bit simpler since it breaks the derivative down into more manageable pieces).

The Chain Rule for $f(x, y)$ when $x$ and $y$ are functions of two variables $s$ and $t$

In the case above, the intermediate variables $x$ and $y$ were controlled by a single variable $t$; in other words, we could view $x$ and $y$ as single-variable functions. However, the intermediate variables $x$ and $y$ could each be multivariate functions: if $x = g(s, t)$ and $y = h(s, t)$, then we can choose to think of $f(x, y)$ either as a function of the two variables $x$ and $y$, or as the composite $f(g(s, t), h(s, t))$. In particular, we can now consider the rate at which $f$ changes with respect to either of the variables $s$ or $t$, i.e. we can find $\frac{\partial f}{\partial s}$ and $\frac{\partial f}{\partial t}$.

We need another version of the chain rule for this situation:

Theorem 3. If the functions $f(x, y)$, $x = g(s, t)$, and $y = h(s, t)$ are differentiable, then the partial derivatives of $f$ with respect to $s$ and $t$ are given by

$$\frac{\partial}{\partial s} f(x, y) = \frac{\partial}{\partial s} f(g(s, t), h(s, t)) = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}$$

and

$$\frac{\partial}{\partial t} f(x, y) = \frac{\partial}{\partial t} f(g(s, t), h(s, t)) = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}.$$

We can make the above discussion far more general: we can consider a function $u$ of $n$ variables $x_1, x_2, \ldots, x_n$, each of which is a function of the $m$ variables $t_1, t_2, \ldots, t_m$. Then the function $u$ is actually a function of $t_1, t_2, \ldots, t_m$, and we can evaluate the partial derivative of $u$ with respect to any of the $t_i$.

Theorem 4. If $u$ is a differentiable function of $x_1, \ldots, x_n$ and each $x_j$ is a differentiable function of $t_1, \ldots, t_m$, then the partial derivative $\frac{\partial u}{\partial t_i}$ is given by

$$\frac{\partial u}{\partial t_i} = \frac{\partial u}{\partial x_1}\frac{\partial x_1}{\partial t_i} + \frac{\partial u}{\partial x_2}\frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial u}{\partial x_n}\frac{\partial x_n}{\partial t_i}.$$

Example: Let $f(x, y, z) = xyz$, where each of $x$, $y$, and $z$ is a function of the variables $r$ and $s$, given by $x = r^2 + s$, $y = r \cos s$, and $z = \sin(rs)$. Find the partials $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial s}$.

To use the formulas given above, we first need to calculate $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$:

$$\frac{\partial f}{\partial x} = yz, \quad \frac{\partial f}{\partial y} = xz, \quad \text{and} \quad \frac{\partial f}{\partial z} = xy.$$

We also need to find the partials of each intermediate variable $x$, $y$, and $z$ with respect to $r$ and $s$:

$$\frac{\partial x}{\partial r} = 2r, \quad \frac{\partial y}{\partial r} = \cos s, \quad \text{and} \quad \frac{\partial z}{\partial r} = s \cos(rs);$$

and

$$\frac{\partial x}{\partial s} = 1, \quad \frac{\partial y}{\partial s} = -r \sin s, \quad \text{and} \quad \frac{\partial z}{\partial s} = r \cos(rs).$$

Finally, we have

$$\begin{aligned}
\frac{\partial}{\partial r} f(x, y, z) &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial r} \\
&= (yz)(2r) + (xz)(\cos s) + (xy)(s \cos(rs)) \\
&= (r \cos s)(\sin(rs))(2r) + (r^2 + s)(\sin(rs))(\cos s) + (r^2 + s)(r \cos s)(s \cos(rs)) \\
&= 2r^2 (\cos s)(\sin(rs)) + (r^2 + s)(\cos s)(\sin(rs)) + (r^3 s + r s^2)(\cos s)(\cos(rs))
\end{aligned}$$

and

$$\begin{aligned}
\frac{\partial}{\partial s} f(x, y, z) &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial s} \\
&= (yz)(1) + (xz)(-r \sin s) + (xy)(r \cos(rs)) \\
&= (r \cos s)(\sin(rs)) + (r^2 + s)(\sin(rs))(-r \sin s) + (r^2 + s)(r \cos s)(r \cos(rs)) \\
&= r(\cos s)(\sin(rs)) - (r^3 + rs)(\sin s)(\sin(rs)) + (r^4 + r^2 s)(\cos s)(\cos(rs)).
\end{aligned}$$

Implicit Differentiation

The information we have seen in this section can help us simplify the process of implicit differentiation that we learned in Calculus 1. Recall that an equation in terms of $x$ and $y$, such as $x \sin(xy) = 0$, gives us a relationship between the two variables; in particular, the equation implicitly defines $y$ as a function of $x$, so that we can find $\frac{dy}{dx}$.

On the other hand, if $F(x, y)$ is a multivariate function, then by setting $F(x, y) = 0$ and thinking of $y$ as a function of $x$, $y = y(x)$, we can use the chain rule formula above to calculate $\frac{dy}{dx}$. Differentiating both sides of $0 = F(x, y)$ with respect to $x$ gives

$$\frac{d}{dx} 0 = \frac{d}{dx} F(x, y) = F_x \frac{dx}{dx} + F_y \frac{dy}{dx} = F_x + F_y \frac{dy}{dx}.$$

Since $\frac{d}{dx} 0 = 0$, we have $F_x + F_y \frac{dy}{dx} = 0$; solving for $\frac{dy}{dx}$ (which requires $F_y \neq 0$), we see that

$$\frac{dy}{dx} = -\frac{F_x}{F_y}.$$

Theorem 0.0.1. If $F(x, y)$ is a differentiable function and $F(x, y) = 0$ defines $y$ as a differentiable function of $x$, then, wherever $F_y \neq 0$, the rate of change of $y$ with respect to $x$ is given by

$$\frac{dy}{dx} = -\frac{F_x}{F_y}.$$

Example: Given the equation $\sin(xy) - x^3 y = 0$, find $\frac{dy}{dx}$.

Thinking of $F(x, y) = \sin(xy) - x^3 y$, we can use the formula in the theorem above: since $F_x = y \cos(xy) - 3x^2 y$ and $F_y = x \cos(xy) - x^3$, we have

$$\frac{dy}{dx} = -\frac{F_x}{F_y} = -\frac{y \cos(xy) - 3x^2 y}{x \cos(xy) - x^3}.$$
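The slope formula $\frac{dy}{dx} = -\frac{F_x}{F_y}$ for the equation $\sin(xy) - x^3 y = 0$ can be sanity-checked numerically. The following is a minimal Python sketch, not part of the original notes: it locates a point on the curve by bisection, then compares the formula with a central finite-difference estimate of the slope. The helper names (`F`, `dydx_formula`, `solve_y`, `check_implicit_slope`), the sample point $x = 0.9$, the bracketing interval $[0.5, 2]$, and the step size `h` are all illustrative assumptions.

```python
import math


def F(x, y):
    """F(x, y) = sin(xy) - x^3 y from the example; the curve is F(x, y) = 0."""
    return math.sin(x * y) - x**3 * y


def dydx_formula(x, y):
    """Slope from the theorem: dy/dx = -F_x / F_y (requires F_y != 0)."""
    Fx = y * math.cos(x * y) - 3 * x**2 * y
    Fy = x * math.cos(x * y) - x**3
    return -Fx / Fy


def solve_y(x, lo=0.5, hi=2.0, iters=200):
    """Find y in [lo, hi] with F(x, y) = 0 by bisection.

    Assumes F(x, lo) and F(x, hi) have opposite signs. The default
    bracket is an illustrative choice that works for x near 0.9 and
    avoids the trivial branch y = 0.
    """
    f_lo = F(x, lo)
    if f_lo * F(x, hi) > 0:
        raise ValueError("F does not change sign on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = F(x, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


def check_implicit_slope(x0=0.9, h=1e-5):
    """Return (slope from the formula, slope from a central difference)."""
    y0 = solve_y(x0)
    slope_fd = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
    return dydx_formula(x0, y0), slope_fd


if __name__ == "__main__":
    formula, numeric = check_implicit_slope()
    print(f"formula: {formula:.6f}, finite difference: {numeric:.6f}")
```

The two printed values should agree to several decimal places. The bracket is chosen to stay away from $y = 0$ because every point $(x, 0)$ satisfies the equation trivially, and the check is only meaningful at a point where $F_y \neq 0$.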