# Chain Rule

```
Chain Rule for Functions of Several Variables
June 21, 2011
1 Functions of Several Variables
We write $f : \mathbb{R}^n \to \mathbb{R}^m$ for a rule assigning to each vector in a domain $D \subseteq \mathbb{R}^n$ a unique
vector in $\mathbb{R}^m$.
Examples:
1. Suppose your position at time $t$ is given by
\[ p(t) = \langle 2t + 1,\ 3t,\ 1 - t \rangle. \]
This gives a function $p : \mathbb{R} \to \mathbb{R}^3$.
Observe that any vector line equation $r = r_0 + tv$ is a function $r : \mathbb{R} \to \mathbb{R}^n$, where
$r_0, v \in \mathbb{R}^n$.
2. On a windy day, the force of the wind at a point $(x, y, z)$ might be given by a function
$F : \mathbb{R}^3 \to \mathbb{R}^2$,
\[ F(x, y, z) = \langle 2xz + 1,\ y + z \rangle. \]
For a function $f : \mathbb{R}^n \to \mathbb{R}^m$, we write
\[ f(x_1, x_2, \ldots, x_n) = \bigl(f_1(x_1, x_2, \ldots, x_n),\ f_2(x_1, x_2, \ldots, x_n),\ \ldots,\ f_m(x_1, x_2, \ldots, x_n)\bigr). \]
For the function $p$ in Example 1, we have $p_1(t) = 2t + 1$, $p_2(t) = 3t$, and $p_3(t) = 1 - t$.
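2 Derivative Matrix
We can organize the partial derivatives of $f : \mathbb{R}^2 \to \mathbb{R}$ into a $1 \times 2$ matrix called the
derivative matrix,
\[ Df = \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{bmatrix}. \]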
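More generally, for $f : \mathbb{R}^n \to \mathbb{R}^m$, the derivative matrix of
\[ f(x_1, \ldots, x_n) = \bigl(f_1(x_1, \ldots, x_n),\ \ldots,\ f_m(x_1, \ldots, x_n)\bigr) \]
is the $m \times n$ matrix
\[
Df = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}.
\]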
Examples: Find the derivative matrix.
1. $r(t) = \langle 2t + 1,\ 3t,\ 1 - t \rangle$
\[ Dr = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} \]
2. $F(x, y) = \langle \sin x,\ \cos(xy),\ \ln x \rangle$
\[
DF = \begin{bmatrix}
\cos x & 0 \\
-y\sin(xy) & -x\sin(xy) \\
\dfrac{1}{x} & 0
\end{bmatrix}
\]
3 Composition for Functions of Several Variables
Recall from single variable calculus that for two functions $f, g : \mathbb{R} \to \mathbb{R}$, we may find
the composition
\[ f \circ g(x) = f(g(x)). \]
We can illustrate this as follows:
\[ \mathbb{R} \xrightarrow{\ g\ } \mathbb{R} \xrightarrow{\ f\ } \mathbb{R} \]
To take derivatives of these compositions, we use the Chain Rule:
\[ \frac{d}{dx}\bigl(f \circ g(x)\bigr) = \frac{df}{dx}(g(x))\,\frac{dg}{dx}(x) \]
For multivariable functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^k \to \mathbb{R}^n$, we can take the composition
\[ f \circ g(t) = f(g(t)) \]
illustrated by
\[ \mathbb{R}^k \xrightarrow{\ g\ } \mathbb{R}^n \xrightarrow{\ f\ } \mathbb{R}^m. \]
Our next task will be to find a rule for expressing the derivative matrix of $f \circ g$ in terms
of the matrices $Df$ and $Dg$.
Examples:
1. Find $F \circ g(x, y)$, where $F(t) = \langle t^2,\ 2t,\ 3t + 1 \rangle$ and $g(x, y) = 3x + 2y$.
\[ F \circ g(x, y) = F(g(x, y)) = F(3x + 2y) = \langle (3x + 2y)^2,\ 2(3x + 2y),\ 3(3x + 2y) + 1 \rangle \]
2. Express $T$ in terms of $t$, when $T(x, y) = \langle xy,\ 3x + y \rangle$, and $x = t^2$, $y = 2t$.
\[ T(t) = \langle (t^2)(2t),\ 3(t^2) + (2t) \rangle \]
We can view this as a composition by defining a function $f(t) = \langle x(t),\ y(t) \rangle = \langle t^2,\ 2t \rangle$.
Then expressing $T$ in terms of $t$ is the same as finding $T \circ f(t)$.
4 Chain Rule, Case 1
Suppose the function $f(x)$ gives the height of a mountain range at position $x$. We are
thinking of a cross section through the mountains, like the following graph.
[Figure: graph of a cross section through the mountains]
Now suppose you are traveling through the mountains along this cross section, with
position at time $t$ given by a function $g(t)$. Then your height at time $t$ is $f(g(t))$. We
can define a function $h(t) = f(g(t))$. Sometimes we write $h = f \circ g$ or $h(t) = (f \circ g)(t)$.
Recall that the one-dimensional chain rule is
\[ \frac{dh}{dt}(t) = \frac{df}{dx}(g(t))\,\frac{dg}{dt}(t). \]
We can state this in terms of the derivative matrices of $f$ and $g$. Since $f : \mathbb{R} \to \mathbb{R}$ and
$g : \mathbb{R} \to \mathbb{R}$, the matrices $Df$ and $Dg$ are $1 \times 1$ matrices:
\[ Df = \begin{bmatrix} \dfrac{df}{dx} \end{bmatrix}, \qquad Dg = \begin{bmatrix} \dfrac{dg}{dt} \end{bmatrix}. \]
Since multiplication of $1 \times 1$ matrices is just scalar multiplication, the one-dimensional
chain rule can be written
\[ Dh(t) = Df(g(t))\,Dg(t). \]
5 Chain Rule, Case 2
Now redefine the mountain range by a function of two variables $f(x, y)$. As before,
you travel across the mountain range in a path $g(t) = \langle g_1(t),\ g_2(t) \rangle$. Now your height
at time $t$ is given by $h(t) = f(g(t))$, which means that we evaluate $f(x, y)$ at $x = g_1(t)$
and $y = g_2(t)$.
The derivative $h'(t)$ will give the rate of change of height with respect to time. We
compute this by taking the product of derivative matrices
\[ Dh(t) = Df(g(t))\,Dg(t). \]
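This is what the computation looks like:
\[
\begin{aligned}
Dh(t) &= \begin{bmatrix} \dfrac{\partial f}{\partial x}(x, y) & \dfrac{\partial f}{\partial y}(x, y) \end{bmatrix}_{g(t)}
\begin{bmatrix} \dfrac{\partial g_1}{\partial t}(t) \\[2ex] \dfrac{\partial g_2}{\partial t}(t) \end{bmatrix} \\
&= \begin{bmatrix} \dfrac{\partial f}{\partial x}(g(t)) & \dfrac{\partial f}{\partial y}(g(t)) \end{bmatrix}
\begin{bmatrix} \dfrac{\partial g_1}{\partial t}(t) \\[2ex] \dfrac{\partial g_2}{\partial t}(t) \end{bmatrix} \\
&= \dfrac{\partial f}{\partial x}(g(t))\,\dfrac{\partial g_1}{\partial t}(t) + \dfrac{\partial f}{\partial y}(g(t))\,\dfrac{\partial g_2}{\partial t}(t)
\end{aligned}
\]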
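6 Chain Rule for Multivariable Functions
For general functions
\[ f : \mathbb{R}^n \to \mathbb{R}^m, \qquad f(x) = \langle f_1(x),\ \ldots,\ f_m(x) \rangle \]
and
\[ g : \mathbb{R}^k \to \mathbb{R}^n, \qquad g(t) = \langle g_1(t),\ \ldots,\ g_n(t) \rangle \]
we can take the composition
\[ h(t) = f(g(t)) = f(g_1(t), \ldots, g_n(t)). \]
We can find the derivative matrix of the composition $h$ by taking the product of derivative matrices.
\[ Dh(t) = Df(g(t))\,Dg(t) \]
Here $Df(g(t))$ is $m \times n$ and $Dg(t)$ is $n \times k$, so $Dh(t)$ is $m \times k$.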
7 Examples
1. Suppose in our example from the beginning, the height of the mountain range at
point $(x, y)$ is given by $f(x, y) = 2x\sin(xy)$, and the path you travel is given by
$g(t) = \langle t,\ t^2 \rangle$. Then your height at time $t$ is $h(t) = f(g(t))$. We can compute the
rate of change of height at time $t$ using the chain rule.
\[
\begin{aligned}
Dh(t) &= Df(g(t))\,Dg(t) \\
&= \begin{bmatrix} 2xy\cos(xy) + 2\sin(xy) & 2x^2\cos(xy) \end{bmatrix}_{g(t)} \begin{bmatrix} 1 \\ 2t \end{bmatrix} \\
&= \begin{bmatrix} 2t^3\cos(t^3) + 2\sin(t^3) & 2t^2\cos(t^3) \end{bmatrix} \begin{bmatrix} 1 \\ 2t \end{bmatrix} \\
&= 2t^3\cos(t^3) + 2\sin(t^3) + 4t^3\cos(t^3) \\
&= 6t^3\cos(t^3) + 2\sin(t^3)
\end{aligned}
\]
We can also compute this derivative by first composing the functions, then taking
the derivative of the composition.
\[ h(t) = f(g(t)) = 2t\sin(t^3) \]
\[ Dh(t) = 6t^3\cos(t^3) + 2\sin(t^3) \]
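2. Find $Dh$ and $Dh(1, -1)$ where $h = f(g(s, t))$, and
\[ f(x, y, z) = \langle 2x + z^2,\ xyz \rangle, \qquad g(s, t) = \langle t,\ 2st^2,\ s^3 \rangle. \]
Using the chain rule, we compute:
\[
\begin{aligned}
Dh(s, t) &= Df(g(s, t))\,Dg(s, t) \\
&= \begin{bmatrix} 2 & 0 & 2z \\ yz & xz & xy \end{bmatrix}_{g(s,t)}
\begin{bmatrix} 0 & 1 \\ 2t^2 & 4st \\ 3s^2 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 0 & 2s^3 \\ 2s^4t^2 & s^3t & 2st^3 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 2t^2 & 4st \\ 3s^2 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 6s^5 & 2 \\ 2s^3t^3 + 6s^3t^3 & 2s^4t^2 + 4s^4t^2 \end{bmatrix} \\
&= \begin{bmatrix} 6s^5 & 2 \\ 8s^3t^3 & 6s^4t^2 \end{bmatrix}
\end{aligned}
\]
Without using the chain rule, we can first compose the functions, then find the
derivative matrix of the composition.
\[ h(s, t) = (f \circ g)(s, t) = \langle 2t + s^6,\ 2s^4t^3 \rangle \]
\[ Dh(s, t) = D(f \circ g)(s, t) = \begin{bmatrix} 6s^5 & 2 \\ 8s^3t^3 & 6s^4t^2 \end{bmatrix} \]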
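From this, we compute
\[ Dh(1, -1) = \begin{bmatrix} 6 & 2 \\ -8 & 6 \end{bmatrix}. \]
3. Find $D(F \circ g)(3, 5)$, if $F(x, y) = \langle e^{xy+1},\ e^{-x} \rangle$, $Dg(3, 5) = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$, and $g(3, 5) = \langle 1,\ 0 \rangle$.
\[
\begin{aligned}
D(F \circ g)(3, 5) &= DF(g(3, 5))\,Dg(3, 5) \\
&= \begin{bmatrix} ye^{xy+1} & xe^{xy+1} \\ -e^{-x} & 0 \end{bmatrix}_{(1,0)} \begin{bmatrix} -1 \\ 2 \end{bmatrix} \\
&= \begin{bmatrix} 0 & e \\ -\dfrac{1}{e} & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \end{bmatrix} \\
&= \begin{bmatrix} 2e \\ \dfrac{1}{e} \end{bmatrix}
\end{aligned}
\]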
```
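
The first example in the Examples section can be checked numerically. The sketch below is not part of the original notes. It uses only the Python standard library, and the function names (`Df`, `Dg`, `Dh_chain`, `Dh_closed_form`, `Dh_numeric`) are illustrative choices. It computes $Dh(t)$ three ways: as the matrix product $Df(g(t))\,Dg(t)$, from the simplified formula $6t^3\cos(t^3) + 2\sin(t^3)$, and from a central-difference approximation of $h(t) = f(g(t))$. The step size `eps` is an assumed value.

```python
import math


def f(x, y):
    """Height of the mountain range at (x, y): f(x, y) = 2x sin(xy)."""
    return 2 * x * math.sin(x * y)


def g(t):
    """Path through the mountains: g(t) = <t, t^2>."""
    return (t, t * t)


def Df(x, y):
    """1 x 2 derivative matrix of f, stored as [df/dx, df/dy]."""
    return [
        2 * x * y * math.cos(x * y) + 2 * math.sin(x * y),
        2 * x * x * math.cos(x * y),
    ]


def Dg(t):
    """2 x 1 derivative matrix of g, stored as [dg1/dt, dg2/dt]."""
    return [1.0, 2 * t]


def Dh_chain(t):
    """Chain rule: Dh(t) = Df(g(t)) Dg(t), a (1 x 2)(2 x 1) product."""
    row = Df(*g(t))   # Df evaluated at the point g(t)
    col = Dg(t)
    return sum(a * b for a, b in zip(row, col))


def Dh_closed_form(t):
    """Simplified result from the worked example."""
    return 6 * t**3 * math.cos(t**3) + 2 * math.sin(t**3)


def Dh_numeric(t, eps=1e-6):
    """Central-difference estimate of h'(t) for h(t) = f(g(t))."""
    return (f(*g(t + eps)) - f(*g(t - eps))) / (2 * eps)


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.3):
        print(t, Dh_chain(t), Dh_closed_form(t), Dh_numeric(t))
```

For each sample value of `t`, the three printed numbers should agree up to the error of the finite-difference estimate. The same pattern, one row of $Df$ times one column of $Dg$ per entry, extends to the larger matrices in the other examples.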