Understanding the gradient vector in terms of indicating
the direction of change
Motivations and derivations
Let w be a function of two variables x and y, so that $w = f(x, y)$; the graph of w is then a surface in space whose height changes as x and y change. To describe this change by both its amount and its direction, we shall use a vector: the magnitude of the vector is the rate of change of w, and the direction of the vector is the direction in which the change in w flows as x and y change.
On the surface $w = f(x, y)$, at any point (x, y, w), we can take the partial derivatives and the directional derivatives to find the rate of change in a given direction. The rate of change, or the slope of the surface, may differ from one direction to another. We call the direction in which the surface has the maximum slope the direction of flow, and we can represent it by a vector. In other words, the direction of flow is the direction in the x-y plane in which we must move, starting from an initial point (a, b), for $w = f(x, y)$ to increase the fastest among all directions. Note that we wish to indicate a direction of travel in the x-y plane, which is where the level curves of w lie; thus the vector has a w component of 0 and is parallel to the x-y plane. Similarly, if w is a function of three variables x, y, and z, then the graph of w is four-dimensional and its level sets are surfaces in three-dimensional space; at a point (x, y, z) we are able to indicate the direction in which we should travel to get the maximum rate of change. For example, for the function $w = x^2 + y^2 + z^2$, the gradient vector field at various points is shown in the following graph:
The vectors indicate, in three-dimensional space, the directions in which to travel in order to get the maximum increase in w on its four-dimensional graph (which is hard to visualize). A good way of visualizing this is through a common application, such as a temperature function T in space: every position is a point (x, y, z) with a temperature $T = f(x, y, z)$. The vector field then shows, at each point, the direction in which the temperature rises the fastest (heat itself flows the opposite way, from hot to cold).
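As a small illustration of the example $w = x^2 + y^2 + z^2$, the sketch below evaluates its gradient $\langle 2x, 2y, 2z \rangle$ at a few points. The sample points and the helper names `grad_w` and `norm` are assumptions made only for this illustration.

```python
import math

def grad_w(x, y, z):
    """Gradient of w = x^2 + y^2 + z^2, i.e. <2x, 2y, 2z>."""
    return (2.0 * x, 2.0 * y, 2.0 * z)

def norm(v):
    """Euclidean length of a vector given as a tuple."""
    return math.sqrt(sum(c * c for c in v))

# A few sample points, chosen only for illustration.
points = [(1.0, 0.0, 0.0), (1.0, 2.0, 2.0), (-0.5, 0.5, 1.0)]

for p in points:
    g = grad_w(*p)
    # The gradient is twice the position vector, so it points straight
    # away from the origin and its length is twice the distance from it.
    print(p, "->", g, "length", norm(g))
```

For this particular function every gradient vector points radially outward, which matches level surfaces that are spheres centered at the origin.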
With this basic understanding of the vector we are after, the problem can be stated as follows: we want to find a vector $\vec{v} = \langle m, n, 0 \rangle$ such that

$$D_u f(x, y) = f_x \cdot \frac{m}{\sqrt{m^2 + n^2}} + f_y \cdot \frac{n}{\sqrt{m^2 + n^2}}$$

has a maximum. Note that the directional derivative of f(x, y) in the direction of a unit vector $\hat{u} = \langle a, b \rangle$ is $D_u f(x, y) = f_x \cdot a + f_y \cdot b$. Since $\vec{v}$ is a vector in the x-y plane, we treat it as a vector with only x and y components, $\vec{v} = \langle m, n \rangle$, so its unit vector is

$$\frac{\vec{v}}{|\vec{v}|} = \frac{\langle m, n \rangle}{\sqrt{m^2 + n^2}}.$$

From this we can see that $D_u f(x, y)$ above is the directional derivative in the direction of $\vec{v}$. Recall that we want $\vec{v}$ to point in the direction in which the function z = f(x, y) has the largest rate of change; that is, the directional derivative along the direction of $\vec{v}$ should have its maximum value at that point. Since we are creating such a useful tool, we can go further and see whether it can indicate the rate of change of z as well, that is, $|\vec{v}| = D_v f(x, y) = \sqrt{m^2 + n^2}$. Plugging this back into the equation for the directional derivative of f(x, y), we get

$$D_u f(x, y) = \sqrt{m^2 + n^2} = f_x \cdot \frac{m}{\sqrt{m^2 + n^2}} + f_y \cdot \frac{n}{\sqrt{m^2 + n^2}}.$$

Multiplying both sides of the equation by $\sqrt{m^2 + n^2}$, we have

$$m^2 + n^2 = f_x \cdot m + f_y \cdot n.$$

Matching the coefficients of the m and n terms gives the solution

$$m = f_x, \quad n = f_y.$$

This is indeed the maximizing choice: by the Cauchy-Schwarz inequality, $f_x \cdot a + f_y \cdot b \le \sqrt{f_x^2 + f_y^2}$ for every unit vector $\langle a, b \rangle$, with equality exactly when $\langle a, b \rangle$ points along $\langle f_x, f_y \rangle$ (provided $f_x$ and $f_y$ are not both zero). Thus a vector $\vec{v}$ that indicates both the direction and the magnitude of the largest rate of change of a function z = f(x, y) at a point (a, b) is $\vec{v} = \langle f_x(a, b), f_y(a, b) \rangle$ (here a and b are the coordinates of the point, not the components of $\hat{u}$). Since such a vector is a strong and useful tool for studying multivariable functions, we give it a name so that we can refer to it easily and clearly: it is called the gradient vector of the function at the point (a, b).
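The result can be checked numerically. The sketch below scans unit directions around a point and compares the direction of the largest directional derivative with $\langle f_x, f_y \rangle$. The sample function, the point (0.3, -0.2), the step sizes and all helper names are assumptions chosen only for illustration.

```python
import math

def f(x, y):
    # Sample function, chosen only for illustration.
    return x * x + x * y + y * y

def partials(func, x, y, h=1e-6):
    """Central-difference estimates of f_x and f_y at (x, y)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy

def directional_derivative(func, x, y, theta):
    """D_u f = f_x * a + f_y * b for the unit vector u = <cos theta, sin theta>."""
    fx, fy = partials(func, x, y)
    return fx * math.cos(theta) + fy * math.sin(theta)

def steepest_direction(func, x, y, steps=3600):
    """Scan `steps` evenly spaced unit directions; return (best_angle, best_rate)."""
    best_angle, best_rate = 0.0, -math.inf
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        rate = directional_derivative(func, x, y, theta)
        if rate > best_rate:
            best_angle, best_rate = theta, rate
    return best_angle, best_rate

x0, y0 = 0.3, -0.2
gx, gy = partials(f, x0, y0)
angle, rate = steepest_direction(f, x0, y0)
grad_angle = math.atan2(gy, gx) % (2 * math.pi)

print("gradient <f_x, f_y>:", (gx, gy))
print("angle of gradient:", grad_angle, "best scanned angle:", angle)
print("length of gradient:", math.hypot(gx, gy), "best scanned rate:", rate)
```

Up to the resolution of the scan, the best direction agrees with the direction of the gradient, and the best rate agrees with its length.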
Closer look at gradient vector to study its properties
A gradient vector is defined at a particular point (a, b) in the x-y plane for the function z = f(x, y), and the function takes some value at that point; call it $z_0 = f(a, b)$. We can then represent all pairs (x, y) with $f(x, y) = z_0$ as a level curve of the function f. The gradient vector starts at the point (a, b) on this level curve and points in the direction in which the function has its largest rate of change at that point.
[Figures: two three-dimensional plots over the x-y plane.]
In the figures above, the first is the graph of the function $z = f(x, y) = x^2 + x y + y^2$ and the second is the graph of its level curves. The gradient vector at a point on a level curve points in the direction of the largest slope, which on a graph of level curves is the direction in which the level curves are the tightest. To get a more direct visualization, we can look at the two-dimensional graph of the level curves:
[Figure: level curves in the x-y plane, with x and y each ranging from -0.4 to 0.4.]
At any chosen point on the graph, getting the largest slope means choosing the direction in which the level curves are the tightest, that is, the direction in which the distance from one curve to the next is the shortest. Since the difference in the value of f between two adjacent level curves is always the same (it depends on the values of k in f(x, y) = k that you choose when drawing the level curves), the shortest path from one curve to the next should be in the direction normal to the tangent line of the curve at that point (a, b). That is, the gradient vector at the point (a, b) should be normal to the tangent line of the level curve of f(x, y) at (a, b). We can predict a bit further that this holds in any number of dimensions, and we shall attempt to prove it.
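The claim about the tightest spacing can be tried out numerically. The sketch below takes the function $x^2 + x y + y^2$ from the figures and measures how far one must travel from a point, in several directions, before f has risen by a fixed amount. The point (0.3, -0.2), the level difference `dk`, the bisection search and the helper names are assumptions made only for this illustration.

```python
import math

def f(x, y):
    return x * x + x * y + y * y

def grad_f(x, y):
    # Analytic partial derivatives of f.
    return (2 * x + y, x + 2 * y)

def step_to_next_level(x, y, angle, dk):
    """Distance t >= 0 along <cos angle, sin angle> at which f has risen by dk.

    Found by bisection on [0, 1]; this assumes f keeps increasing along the
    chosen direction, which holds here for directions within 90 degrees of
    the gradient.
    """
    a, b = math.cos(angle), math.sin(angle)
    target = f(x, y) + dk
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(x + mid * a, y + mid * b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0, y0, dk = 0.3, -0.2, 1e-3
gx, gy = grad_f(x0, y0)
grad_angle = math.atan2(gy, gx)

steps = {}
for offset_deg in (0, 30, 60, -30, -60):
    angle = grad_angle + math.radians(offset_deg)
    steps[offset_deg] = step_to_next_level(x0, y0, angle, dk)
    print(f"{offset_deg:+4d} degrees from the gradient: step {steps[offset_deg]:.6f}")

# For a small dk the step along the gradient is close to dk / |grad f|.
print("estimate dk / |grad f|:", dk / math.hypot(gx, gy))
```

Among the sampled directions, the step along the gradient is the shortest, and it is close to the level difference divided by the length of the gradient.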
It is easy to apply the same method we used for a function z = f(x, y) to derive the form of the gradient vector of a function of more variables. Let $w = f(x_1, x_2, x_3, \ldots, x_n)$, where w is at least once differentiable. Then the gradient vector of w is $\nabla w = \langle f_{x_1}, f_{x_2}, f_{x_3}, \ldots, f_{x_n} \rangle$, where the gradient of w is denoted with the operator $\nabla$ as $\nabla w$, pronounced “del w”. We can get the level sets of the function w by setting w equal to some constant k; thus a level set of w is expressed as $k = f(x_1, x_2, x_3, \ldots, x_n)$. According to the chain rule, if each $x_i$ is a function of a parameter t, we have the total derivative of w:
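$$\frac{dw}{dt} = \frac{\partial w}{\partial x_1} \cdot \frac{dx_1}{dt} + \frac{\partial w}{\partial x_2} \cdot \frac{dx_2}{dt} + \cdots + \frac{\partial w}{\partial x_n} \cdot \frac{dx_n}{dt} = \nabla w \cdot \left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle.$$

The second vector $\left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle$ indicates how each $x_i$ changes as t changes, so it follows the “trace” of the point $(x_1, x_2, x_3, \ldots, x_n)$ as it moves; that is, it shows how the position of the point changes, but the “trace” does not show the change of w itself. For a better visualization, when w is a function of two variables x and y, the vector $\left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle$ shows how a point (x, y) moves in the x-y plane, and the value of w varies according to this position. Now let the point move so that it stays inside a level set of w, which is an (n-1)-dimensional set. Then the vector $\left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle$ is a vector on (tangent to) the level set of w. Since w = k for some constant k along this motion,

$$\frac{dw}{dt} = 0 = \nabla w \cdot \left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle.$$

The dot product of the gradient vector $\nabla w$ and the “trace” vector $\left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle$ on the level set is zero, and this holds for every such motion through the point; thus the gradient vector is normal to the level set.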
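The argument can be checked numerically for a concrete case. The sketch below uses $w = x_1^2 + x_2^2 + x_3^2$, moves a point along a curve that stays on one level set, and evaluates the dot product of the gradient with the “trace” vector. The choice of curve (a tilted great circle), the finite-difference step and all helper names are assumptions made only for this illustration.

```python
import math

def w(p):
    """w = x1^2 + x2^2 + ... + xn^2 (sample function, for illustration)."""
    return sum(c * c for c in p)

def numerical_gradient(func, p, h=1e-6):
    """Central-difference estimate of <f_x1, ..., f_xn> at the point p."""
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad

def curve(t, r=2.0, alpha=0.7):
    """A curve that stays on the level set w = r^2 (a tilted great circle)."""
    return [r * math.cos(t),
            r * math.sin(t) * math.cos(alpha),
            r * math.sin(t) * math.sin(alpha)]

def velocity(t, h=1e-6):
    """Central-difference estimate of <dx1/dt, ..., dxn/dt> along the curve."""
    ahead, behind = curve(t + h), curve(t - h)
    return [(a - b) / (2 * h) for a, b in zip(ahead, behind)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

results = []
for t in (0.0, 0.4, 1.3, 2.9):
    p = curve(t)
    product = dot(numerical_gradient(w, p), velocity(t))
    results.append((w(p), product))
    print(f"t = {t}: w = {w(p):.6f}, gradient . velocity = {product:.2e}")
```

The value of w stays constant along the curve, and the dot product vanishes up to rounding error, as the chain rule predicts.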
The following figures show the gradient vectors of the function $z = x^2 + y^2$ at different points. Note that since this is one of the simplest quadric surfaces, the level curves of the function are circles of different radii centered at the origin. (Due to technology limitations, I am not able to show its level curves in the graph.)
The gradient vector at the same point, viewed from the x-y plane and on the surface.
As we change the trace point, the gradient changes, and its direction is always normal to the level curves of z. (Recall that the level curves of z are circles.)
Deriving the vector of direction of change; the general,
imperfect gradient: directional gradient
The gradient vector at a point of the function w indicates the direction from this point in which w has the largest rate of change. To indicate the changes of w in all directions, we introduce a new vector called the directional gradient. To make the derivation of the directional gradient easier to visualize, we shall start with a function of two variables, whose graph is a surface in space; later we shall attempt to extend the idea to n-dimensional figures.
Let z be a function of two variables x and y, where z = f(x, y). Assume that f is continuous everywhere and at least once differentiable, so that all directional derivatives at a point (x, y, z) on the surface exist. At a point (x, y, z) on the surface, in the direction of the unit vector $\vec{u} = \langle a, b, c \rangle$, there exists a vector $\vec{d}$ that indicates the rate of change of x, y and z at the point in that direction. Put simply, the vector $\vec{d}$ indicates where the surface is growing in the direction of $\vec{u}$. According to this definition, we know the following about $\vec{d}$:
1. $\vec{d} = n\vec{u} = n \cdot \langle a, b, c \rangle$ for some scalar n, since the two vectors are in the same direction; thus $\vec{u} \parallel \vec{d}$.
2. $\vec{d} = \langle x, y, D_u f(x, y) \rangle$, since $\vec{d}$ indicates the rate of change of the function with respect to changes in x and y; thus the rate of change of z in the direction of $\vec{u}$ can be represented by the directional derivative of f(x, y) in the direction of $\vec{u}$, which is $D_u f(x, y)$.
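If z = f(x, y) is a continuous function in the neighborhood of (x, y, z), and $\vec{v}$ is a non-zero vector in space with components $\langle l, m, n \rangle$, then the directional gradient vector can be derived as follows. (The third component n of $\vec{v}$ is a different quantity from the scalar factor n in $\vec{d} = n\vec{u}$, even though the same letter is used for both.)

The unit vector $\vec{u}$ in the direction of the vector $\vec{v}$ is

$$\vec{u} = \frac{\vec{v}}{|\vec{v}|} = \left\langle \frac{l}{\sqrt{l^2 + m^2 + n^2}}, \frac{m}{\sqrt{l^2 + m^2 + n^2}}, \frac{n}{\sqrt{l^2 + m^2 + n^2}} \right\rangle = \langle a, b, c \rangle.$$

Since $\vec{d} = \langle x, y, D_u f(x, y) \rangle$ for some x and y, and $\vec{d} = n\vec{u} = n \cdot \langle a, b, c \rangle = \langle x, y, D_u f(x, y) \rangle$, we have the following relationships:

$$x = n \cdot a, \quad y = n \cdot b, \quad D_u f(x, y) = n \cdot c.$$

Therefore, provided $c \neq 0$, we can express the scalar factor as $n = \frac{D_u f(x, y)}{c}$. Plugging this into the equations above, we get

$$x = \frac{D_u f(x, y)}{c} \cdot a, \quad y = \frac{D_u f(x, y)}{c} \cdot b.$$

Thus the directional gradient is

$$\vec{d} = \left\langle \frac{D_u f(x, y)}{c} \cdot a, \; \frac{D_u f(x, y)}{c} \cdot b, \; D_u f(x, y) \right\rangle,$$

where $a = \frac{l}{\sqrt{l^2 + m^2 + n^2}}$, $b = \frac{m}{\sqrt{l^2 + m^2 + n^2}}$, $c = \frac{n}{\sqrt{l^2 + m^2 + n^2}}$.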
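The final formula can be written out as a short sketch. The function below is hypothetical: it takes the value of $D_u f(x, y)$ as an input called `rate` instead of computing it, because how that derivative is evaluated for a given direction is left to the text above, and it rejects directions with c = 0, for which the formula divides by zero. The names `directional_gradient`, `rate` and the sample numbers are assumptions made only for this illustration.

```python
import math

def directional_gradient(rate, v):
    """Return d = <(rate / c) * a, (rate / c) * b, rate> for the direction v.

    rate : the value of D_u f(x, y), supplied by the caller (an assumption
           of this sketch; it is not computed here).
    v    : a non-zero direction <l, m, n> whose unit vector is <a, b, c>.
    """
    l, m, n = v
    length = math.sqrt(l * l + m * m + n * n)
    if length == 0:
        raise ValueError("v must be a non-zero vector")
    a, b, c = l / length, m / length, n / length
    if c == 0:
        raise ValueError("the formula divides by c, so v needs a non-zero third component")
    scale = rate / c  # the scalar factor in d = (scalar) * u
    return (scale * a, scale * b, rate)

# Sample values, chosen only for illustration.
v = (1.0, 2.0, 2.0)
d = directional_gradient(1.5, v)
print("d =", d)

# d is a scalar multiple of v, so the component ratios d_i / v_i agree.
print("ratios:", [di / vi for di, vi in zip(d, v)])
```

The printed ratios show that the resulting vector is parallel to the chosen direction while its third component equals the supplied rate, which are the two properties the derivation starts from.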
The above figure shows the $f_x$ tangent line at a point on the surface $z = \frac{7xy}{e^{x^2 + y^2}}$.
In the above figure, the blue line is the line along which the directional gradient vector at that point lies; the red line is the projection onto the surface of the line along which the unit vector $\vec{u}$ lies; and in this special case, $\vec{u} = \hat{i}$, the unit vector along the x-axis.
Note especially that a directional gradient vector at a point is not always in the tangent plane at that point. The tangent plane at a point is the plane formed by the $f_x$ tangent line and the $f_y$ tangent line at the point. In other words, the tangent plane at a point is determined by two directional gradient vectors, one in the direction of the x-axis and the other in the direction of the y-axis. The tangent plane at the point $(x_0, y_0, z_0)$ is expressed in the form $z - z_0 = f_x(x_0, y_0) \cdot (x - x_0) + f_y(x_0, y_0) \cdot (y - y_0)$. Any two directional gradient vectors at a point on the surface can form a plane, and since the direction of each directional gradient vector may vary, they can form different planes. The reason that the tangent plane at a point on the surface is defined in this way is briefly discussed in a separate paper (likely the next one after this).
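The tangent plane formula can be tried on the surface $z = 7xy / e^{x^2 + y^2}$ from the figure. The sketch below builds the plane at a sample point and checks that it stays close to the surface nearby. The sample point (0.5, 0.25), the analytic partial derivatives worked out with the product rule, and the helper names are assumptions made only for this illustration.

```python
import math

def f(x, y):
    """The surface z = 7xy / e^(x^2 + y^2)."""
    return 7 * x * y / math.exp(x * x + y * y)

def partials(x, y):
    """Analytic f_x and f_y, obtained with the product rule."""
    e = math.exp(-(x * x + y * y))
    fx = 7 * y * e * (1 - 2 * x * x)
    fy = 7 * x * e * (1 - 2 * y * y)
    return fx, fy

def tangent_plane(x0, y0):
    """Return a function (x, y) -> z on the tangent plane at (x0, y0, f(x0, y0))."""
    z0 = f(x0, y0)
    fx, fy = partials(x0, y0)
    return lambda x, y: z0 + fx * (x - x0) + fy * (y - y0)

x0, y0 = 0.5, 0.25
plane = tangent_plane(x0, y0)
fx0, fy0 = partials(x0, y0)

# Close to the point of tangency, the plane and the surface nearly agree.
for dx, dy in ((0.01, 0.01), (-0.01, 0.02), (0.1, 0.1)):
    x, y = x0 + dx, y0 + dy
    print(f"offset ({dx}, {dy}): surface {f(x, y):.6f}, plane {plane(x, y):.6f}")

# Moving one unit along x (or y) in the plane raises z by f_x (or f_y),
# so the vectors <1, 0, f_x> and <0, 1, f_y> both lie in the plane.
print("rise along x:", plane(x0 + 1, y0) - plane(x0, y0), "f_x:", fx0)
print("rise along y:", plane(x0, y0 + 1) - plane(x0, y0), "f_y:", fy0)
```

The agreement gets worse as the offset grows, which is expected since the plane only matches the surface to first order at the point of tangency.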