2.5 chain rule 2.6 gradient

Lecture 6: 2.5 The Chain Rule.
The Chain rule in one variable: suppose that y = g(x), and z = f (y), i.e.
z = h(x), where h(x) = f (g(x)) = f ◦ g(x) then
\[
\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}
\quad\Longleftrightarrow\quad
h'(x) = f'(y)\, g'(x)
\]
The intuitive way to understand this is through the linear approximation:
△z = f (y + △y) − f (y) ∼ f ′ (y)△y
which is the same as saying that f is differentiable. Similarly
△y = g(x + △x) − g(x) ∼ g ′ (x)△x
and if we combine the two we get
△z ∼ f ′ (y) g ′ (x)△x
Since this is also equal to
△z = h(x + △x) − h(x) ∼ h′ (x)△x
the chain rule in one variable follows.
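The linear-approximation argument above can be checked numerically. The following is a minimal sketch, assuming the hypothetical choices g(x) = x^2 and f(y) = sin y (neither is from the notes) and a symmetric difference quotient in place of the exact derivative of h.

```python
import math

def g(x):
    return x * x            # inner function y = g(x) (hypothetical choice)

def f(y):
    return math.sin(y)      # outer function z = f(y) (hypothetical choice)

def h(x):
    return f(g(x))          # composition h = f o g

def central_diff(func, x, step=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.7
y0 = g(x0)
lhs = central_diff(h, x0)          # numerical h'(x0)
rhs = math.cos(y0) * 2 * x0        # f'(y0) * g'(x0), using f' = cos and g'(x) = 2x
chain_rule_gap = abs(lhs - rhs)
print(lhs, rhs, chain_rule_gap)
```

The two numbers should agree up to the error of the difference quotient, which is what h′(x) = f′(y) g′(x) predicts.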
The Chain rule in several variables: Suppose that g : R^n → R^m, f : R^m → R^p
and let h = f ◦ g : R^n → R^p (i.e. h(x) = f (g(x))). Then
Dh(x0 ) = Df (y0 ) Dg(x0 ),
where
y0 = g(x0 )
and the right hand side is the p × n matrix formed by the matrix product of the
p × m matrix Df (y0 ) by the m × n matrix Dg(x0 ).
The intuitive argument above actually generalizes to several variables just by replacing f ′ by Df etc. since differentiability of functions in several variables says
g(x + △x) − g(x) ∼ Dg(x)△x.
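The matrix form of the chain rule can also be tested numerically. The sketch below assumes two hypothetical maps g : R^2 → R^3 and f : R^3 → R^2 (chosen only for illustration) and approximates each derivative matrix by symmetric differences; the helper names `jacobian` and `matmul` are made up for this example.

```python
import math

def g(x):
    # hypothetical map g : R^2 -> R^3
    u, v = x
    return [u * v, u + v, math.sin(u)]

def f(y):
    # hypothetical map f : R^3 -> R^2
    a, b, c = y
    return [a + b * c, a * a - c]

def h(x):
    return f(g(x))          # h = f o g : R^2 -> R^2

def jacobian(func, x, step=1e-6):
    """Numerical derivative matrix: entry [i][j] approximates d func_i / d x_j."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * step)
    return J

def matmul(A, B):
    """Plain matrix product of A (p x m) and B (m x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x0 = [0.5, -1.2]
y0 = g(x0)
Dh = jacobian(h, x0)                                   # 2 x 2
product = matmul(jacobian(f, y0), jacobian(g, x0))     # (2 x 3)(3 x 2) = 2 x 2
max_gap = max(abs(Dh[i][j] - product[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

Here p = 2, m = 3 and n = 2, so Df(y0) is 2 × 3, Dg(x0) is 3 × 2, and their product has the same 2 × 2 shape as Dh(x0).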
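If h(t) = f (c(t)) where f : R^3 → R and c(t) = (x(t), y(t), z(t)) is a path or curve,
then by the chain rule
\[
\frac{dh}{dt} = Df\, Dc
= \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z} \end{bmatrix}
\begin{bmatrix} \dfrac{dx}{dt} \\[4pt] \dfrac{dy}{dt} \\[4pt] \dfrac{dz}{dt} \end{bmatrix}
= \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}
\]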
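The gradient of a function f : R^n → R is given by
\[
\operatorname{grad} f = \nabla f =
\begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}
\]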
This can also be expressed with the gradient notation and dot product
\[
\frac{dh}{dt}(t) = \nabla f(c(t)) \cdot c'(t)
\]
where
c′ (t) = (x′ (t), y ′ (t), z ′ (t)),
is the velocity vector of the path.
Note that c′ (t) is tangent to the path c(t), which follows since it can be obtained
as the limit as h → 0 of the vector between two close points on the curve
\[
\frac{c(t+h) - c(t)}{h}
= \left( \frac{x(t+h) - x(t)}{h},\ \frac{y(t+h) - y(t)}{h},\ \frac{z(t+h) - z(t)}{h} \right)
\to (x'(t), y'(t), z'(t))
\]
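Ex. If z = x^2 + y^2, x = cos t and y = sin t, find dz/dt.
Sol. 1
\[
\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}
= 2x(-\sin t) + 2y\cos t = 2\cos t\,(-\sin t) + 2\sin t\cos t = 0
\]
Sol. 2
\[
z = x^2 + y^2 = \cos^2 t + \sin^2 t = 1, \qquad \frac{dz}{dt} = 0
\]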
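Ex. If z = h(r, θ) = f (x, y) where x = g1 (r, θ) = r cos θ and y = g2 (r, θ) = r sin θ
then by the chain rule
\[
\left( \frac{\partial h}{\partial r},\ \frac{\partial h}{\partial \theta} \right)
= \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)
\begin{pmatrix}
\dfrac{\partial g_1}{\partial r} & \dfrac{\partial g_1}{\partial \theta} \\[6pt]
\dfrac{\partial g_2}{\partial r} & \dfrac{\partial g_2}{\partial \theta}
\end{pmatrix}
\]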
i.e.
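\[
\frac{\partial h}{\partial r} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r}
\]
\[
\frac{\partial h}{\partial \theta} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta}
\]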
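or, written more briefly, since h and f are the same function expressed in different coordinates,
\[
\frac{\partial}{\partial r} = \frac{\partial x}{\partial r}\frac{\partial}{\partial x} + \frac{\partial y}{\partial r}\frac{\partial}{\partial y}
\]
\[
\frac{\partial}{\partial \theta} = \frac{\partial x}{\partial \theta}\frac{\partial}{\partial x} + \frac{\partial y}{\partial \theta}\frac{\partial}{\partial y}
\]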
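In this case the matrix is
\[
\begin{pmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[6pt]
\dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta}
\end{pmatrix}
=
\begin{pmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{pmatrix}
\]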
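Note that in general ∂r/∂x ≠ (∂x/∂r)^{-1}, whereas for a function of one variable it is
true that dx/dy = (dy/dx)^{-1}. That this is true in one dimension follows from
differentiating the identity f (f^{-1}(x)) = x, which gives f ′(f^{-1}(x)) (f^{-1})′(x) = 1. The
higher dimension analogue of this holds for the derivative matrices, so
\[
\begin{pmatrix}
\dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[6pt]
\dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y}
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[6pt]
\dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta}
\end{pmatrix}^{-1}
= \frac{1}{r}
\begin{pmatrix}
r\cos\theta & r\sin\theta \\
-\sin\theta & \cos\theta
\end{pmatrix}
\]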
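The relation between the two derivative matrices can be checked at a sample point. This is a small sketch under stated assumptions: the point (r, θ) = (2, 0.6) is arbitrary, and the partial derivatives of r and θ are computed directly from r = sqrt(x^2 + y^2) and θ = atan2(y, x), which is one common choice of inverse valid away from the origin.

```python
import math

r, theta = 2.0, 0.6      # arbitrary sample point (assumption for illustration)

# Derivative matrix of (x, y) = (r cos θ, r sin θ) with respect to (r, θ)
J = [[math.cos(theta), -r * math.sin(theta)],
     [math.sin(theta),  r * math.cos(theta)]]

# Derivative matrix of (r, θ) with respect to (x, y),
# from r = sqrt(x^2 + y^2) and θ = atan2(y, x)
x, y = r * math.cos(theta), r * math.sin(theta)
K = [[x / r, y / r],
     [-y / r**2, x / r**2]]

# The product K J should be the 2 x 2 identity matrix
KJ = [[sum(K[i][k] * J[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

dr_dx = K[0][0]              # ∂r/∂x, equal to cos θ
inv_dx_dr = 1 / J[0][0]      # (∂x/∂r)^{-1}, equal to 1 / cos θ
print(KJ, dr_dx, inv_dx_dr)
```

The matrices are inverse to each other, yet the single entries ∂r/∂x and (∂x/∂r)^{-1} differ, which is the point of the remark above.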
2.6 The gradient and directional derivative.
The gradient of a function f : R^3 → R is the vector
\[
\nabla f = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right)
\]
i.e. it is the matrix of derivatives written as a vector.
Consider the equation of a line in space ℓ(t) = x + tv, −∞ < t < ∞.
The function h(t) = f◦ ℓ(t) = f (x + tv) represents the function f restricted to the line.
The directional derivative of f at x in the direction of unit vector v is given by
\[
\frac{d}{dt} f(x + tv)\Big|_{t=0} = \nabla f(x) \cdot v
\]
Here the equality follows from the chain rule: h ′ = Dh = Df Dℓ = ∇f · ℓ ′ = ∇f · v.
The reason we choose v to be a unit vector is that we want the directional derivative
to measure the rate of change of f per unit distance, so that its values in different
directions can be compared.
Suppose that f represents the temperature at different points in space. If a fly
flies along the line above at unit speed, then the change of temperature
per unit time (or, equivalently, per unit distance) is the directional derivative.
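As an illustration of the formula for the directional derivative, the sketch below compares a difference quotient of f along the line x + tv with the dot product ∇f(x) · v. The field f(x, y, z) = x^2 y + e^z, the point p and the direction w are hypothetical choices, not taken from the notes.

```python
import math

def f(p):
    x, y, z = p
    return x * x * y + math.exp(z)       # hypothetical temperature field

def grad_f(p):
    x, y, z = p
    return [2 * x * y, x * x, math.exp(z)]   # gradient of the field above, by hand

def directional_derivative(func, p, v, step=1e-6):
    """Approximate d/dt func(p + t v) at t = 0 by a symmetric difference."""
    forward = [pi + step * vi for pi, vi in zip(p, v)]
    backward = [pi - step * vi for pi, vi in zip(p, v)]
    return (func(forward) - func(backward)) / (2 * step)

p = [1.0, 2.0, 0.0]
w = [1.0, 2.0, 2.0]
norm = math.sqrt(sum(wi * wi for wi in w))   # |w| = 3
v = [wi / norm for wi in w]                  # unit vector in the direction of w

numeric = directional_derivative(f, p, v)
exact = sum(gi * vi for gi, vi in zip(grad_f(p), v))   # ∇f(p) · v
print(numeric, exact)
```

At this point ∇f(p) = (4, 1, 1) and v = (1/3, 2/3, 2/3), so the dot product is 8/3, and the difference quotient should agree with it closely.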
The gradient points in the direction along which f increases the fastest.
In fact ∇f · v = |∇f | |v| cos θ, where θ is the angle between ∇f and v, and the
max is when cos θ = 1.
Suppose we are lost in the woods and we want to reach a high hill-top to see where we
are. However, we can only see a few feet in front of us because of the tall trees.
In which direction should we go in order to reach a hill-top fast? The answer is that
if we go in the direction of the gradient, we are likely to reach a hill-top fast.
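The claim that the gradient direction maximizes ∇f · v over unit vectors v can be illustrated by brute force in two dimensions. This is only a sketch: the gradient value (3, 4) is a hypothetical example, and the search simply scans 3600 equally spaced directions.

```python
import math

grad = [3.0, 4.0]        # hypothetical gradient ∇f at some point, |∇f| = 5

best_angle, best_value = None, -math.inf
steps = 3600
for k in range(steps):
    angle = 2 * math.pi * k / steps
    v = [math.cos(angle), math.sin(angle)]       # unit vector at this angle
    value = grad[0] * v[0] + grad[1] * v[1]      # directional derivative ∇f · v
    if value > best_value:
        best_angle, best_value = angle, value

grad_angle = math.atan2(grad[1], grad[0])        # direction of ∇f itself
print(best_angle, grad_angle, best_value)
```

Up to the resolution of the scan, the best direction coincides with the direction of ∇f and the maximal rate equals |∇f|, in line with ∇f · v = |∇f| |v| cos θ.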
The gradient is normal to the tangent plane of the level surface:
Let f : R3 → R and let (x0 , y0 , z0 ) be a point on the level surface S defined by
f (x, y, z) = k, for some constant k. Then ∇f (x0 , y0 , z0 ) is normal to the level
surface in the following sense: If v = c′ (0) is a tangent vector to a path c(t) in S
with c(0) = (x0 , y0 , z0 ), then ∇f (x0 , y0 , z0 ) · v = 0.
In fact, since f (x(t), y(t), z(t)) = k it follows that
\[
0 = \frac{d}{dt} f(x(t), y(t), z(t)) = \nabla f(x(t), y(t), z(t)) \cdot c'(t)
\]
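The orthogonality can be seen on a concrete level surface. The sketch below assumes the hypothetical choice f(x, y, z) = x^2 + y^2 + z^2 with k = 1 (the unit sphere) and a tilted great circle as the path c(t); the tilt angle is arbitrary.

```python
import math

TILT = 0.3   # hypothetical tilt angle of the great circle

def c(t):
    # A path on the unit sphere: |c(t)|^2 = cos^2 t + sin^2 t = 1
    return [math.cos(t),
            math.sin(t) * math.cos(TILT),
            math.sin(t) * math.sin(TILT)]

def c_prime(t):
    # Velocity vector of the path, differentiated by hand
    return [-math.sin(t),
            math.cos(t) * math.cos(TILT),
            math.cos(t) * math.sin(TILT)]

def grad_f(p):
    # Gradient of f(x, y, z) = x^2 + y^2 + z^2
    return [2 * p[0], 2 * p[1], 2 * p[2]]

dots = []
for t in [0.0, 0.4, 1.1, 2.5]:
    point = c(t)
    dots.append(sum(g * v for g, v in zip(grad_f(point), c_prime(t))))

max_abs_dot = max(abs(d) for d in dots)
print(dots)
```

Every dot product ∇f(c(t)) · c′(t) vanishes up to rounding, as the derivation predicts for any path that stays in the level surface.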
Let S be a level surface f (x, y, z) = k. The tangent plane of S at a point
(x0 , y0 , z0 ) of S is defined by the equation
∇f (x0 , y0 , z0 ) · (x − x0 , y − y0 , z − z0 ) = 0
In fact (x, y, z) is in the tangent plane if (x, y, z)−(x0 , y0 , z0 ) is parallel to the plane
and hence perpendicular to the normal.
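As a worked illustration (the surface and point are chosen here for the example and are not from the notes), take f(x, y, z) = x^2 + y^2 + z^2 and k = 14, so that (x0, y0, z0) = (1, 2, 3) lies on S. Then
\[
\nabla f(x, y, z) = (2x, 2y, 2z), \qquad \nabla f(1, 2, 3) = (2, 4, 6),
\]
and the tangent plane at (1, 2, 3) is
\[
2(x - 1) + 4(y - 2) + 6(z - 3) = 0
\quad\Longleftrightarrow\quad
x + 2y + 3z = 14 .
\]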