Understanding the gradient vector in terms of indicating the direction of change

Motivations and derivations

Let $w$ be a function of two variables $x$ and $y$, so that $w = f(x, y)$; its graph is a surface in space whose height changes as $x$ and $y$ change. To describe this change in both amount and direction, we use a vector whose magnitude is the rate of change of $w$ and whose direction is the direction in which $w$ changes as $x$ and $y$ change.

At any point $(x, y, w)$ on the surface $w = f(x, y)$, we can take partial derivatives and directional derivatives to find the rate of change in a given direction. The rate of change, that is, the slope of the surface, generally varies with the direction. We call the direction in which the surface has its maximum slope the direction of flow, and we represent it by a vector. In other words, the direction of flow is the direction in the x-y plane in which, starting from a point $(a, b)$, the value $w = f(x, y)$ increases fastest among all directions. Note that this vector describes a direction of travel in the x-y plane, the plane in which the level curves of $w$ lie; it therefore has a $w$ component of 0 and is parallel to the x-y plane.

Similarly, if $w$ is a function of three variables $x$, $y$, and $z$, then the graph of $w$ is four-dimensional and its level sets are surfaces in three-dimensional space. At a point $(x, y, z)$ we can indicate the direction in which to travel to get the maximum rate of change. For example, for the function $w = x^2 + y^2 + z^2$, the gradient vector field at various points is shown in the following graph:

The vectors indicate, in three-dimensional space, the directions in which to travel to get the maximum increase in $w$ on its four-dimensional graph (which is hard to visualize).
A good way to visualize this is through a common application, such as temperature in a region of space: every position is a point $(x, y, z)$ with a temperature $T = f(x, y, z)$. The gradient vector field then shows, at each point, the direction in which the temperature rises fastest; heat itself flows the opposite way, from hot to cold.

With this basic understanding of the vector we are looking for, the problem can be stated as follows: find a vector $\vec{v} = \langle m, n, 0 \rangle$ for which the directional derivative

$$D_u f(x, y) = f_x \cdot \frac{m}{\sqrt{m^2 + n^2}} + f_y \cdot \frac{n}{\sqrt{m^2 + n^2}}$$

is a maximum. Recall that the directional derivative of $f(x, y)$ in the direction of a unit vector $\hat{u} = \langle a, b \rangle$ is $D_u f(x, y) = f_x \cdot a + f_y \cdot b$. Since $\vec{v}$ lies in the x-y plane, we treat it as a vector with only x and y components, $\vec{v} = \langle m, n \rangle$; the unit vector in its direction is

$$\hat{u} = \frac{\vec{v}}{|\vec{v}|} = \frac{\langle m, n \rangle}{\sqrt{m^2 + n^2}}.$$

From this we can see that $D_u f(x, y)$ above is the directional derivative in the direction of $\vec{v}$.

Recall that we want $\vec{v}$ to point in the direction in which the function $z = f(x, y)$ has the largest rate of change; that is, the directional derivative along $\vec{v}$ should take its maximum value at that point. Since we are building a useful tool, we can go one step further and ask that the length of $\vec{v}$ also give the rate of change of $z$. That is, we require

$$|\vec{v}| = D_u f(x, y) = \sqrt{m^2 + n^2}.$$

Substituting this into the formula for the directional derivative gives

$$\sqrt{m^2 + n^2} = f_x \cdot \frac{m}{\sqrt{m^2 + n^2}} + f_y \cdot \frac{n}{\sqrt{m^2 + n^2}}.$$

Multiplying both sides by $\sqrt{m^2 + n^2}$, we have

$$m^2 + n^2 = f_x \cdot m + f_y \cdot n.$$

Matching the coefficients of the $m$ and $n$ terms on the two sides, we get

$$m = f_x, \qquad n = f_y.$$

Matching coefficients picks out one solution of this equation. That it is the maximizing one follows from the Cauchy-Schwarz inequality: $f_x \cdot a + f_y \cdot b \le \sqrt{f_x^2 + f_y^2}$ for every unit vector $\langle a, b \rangle$, and when $\langle f_x, f_y \rangle$ is nonzero, equality holds exactly when $\langle a, b \rangle$ points along $\langle f_x, f_y \rangle$.

Thus, a vector $\vec{v}$ that indicates both the direction and the magnitude of the largest rate of change of a function $z = f(x, y)$ at a point $(a, b)$ is given by $\vec{v} = \langle f_x(a, b), f_y(a, b) \rangle$.
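The result $\vec{v} = \langle f_x, f_y \rangle$ can be checked numerically. The following is a minimal sketch, not part of the derivation itself: it assumes the example function $f(x, y) = x^2 + xy + y^2$ and an arbitrarily chosen sample point $(0.3, -0.2)$, scans unit directions in steps of 0.1 degree, and compares the best direction found with the direction and length of $\langle f_x, f_y \rangle$. All function names here are illustrative.

```python
import math

def partials(x, y):
    # Analytic partial derivatives of the assumed example
    # f(x, y) = x^2 + x*y + y^2:  f_x = 2x + y,  f_y = x + 2y.
    return 2*x + y, x + 2*y

def directional_derivative(x, y, a, b):
    # D_u f = f_x * a + f_y * b for a unit vector u = <a, b>.
    fx, fy = partials(x, y)
    return fx*a + fy*b

# Sample point (an arbitrary choice for this sketch).
px, py = 0.3, -0.2
fx, fy = partials(px, py)
grad_norm = math.hypot(fx, fy)
grad_angle = math.atan2(fy, fx) % (2 * math.pi)

# Scan unit directions u = <cos t, sin t> and keep the one with the
# largest directional derivative.
angles = [math.radians(k / 10) for k in range(3600)]
best_angle = max(
    angles,
    key=lambda t: directional_derivative(px, py, math.cos(t), math.sin(t)),
)
best_value = directional_derivative(
    px, py, math.cos(best_angle), math.sin(best_angle)
)

print("vector <f_x, f_y>:", (fx, fy), "length:", grad_norm)
print("direction of <f_x, f_y> (rad):", grad_angle)
print("best scanned direction (rad):", best_angle)
print("largest directional derivative found:", best_value)
```

Up to the resolution of the scan, the best direction agrees with the direction of $\langle f_x, f_y \rangle$, and the largest directional derivative agrees with its length.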
Since such a vector is a strong and useful tool for studying multivariable functions, we give it a name so that it can be referenced easily and clearly: it is called the gradient vector of the function at the point $(a, b)$.

Closer look at gradient vector to study its properties

A gradient vector is defined at a particular point $(a, b)$ in the x-y plane for the function $z = f(x, y)$. The input $(a, b)$ has a corresponding value of $z$; call it $z_0$. All pairs $(x, y)$ with $f(x, y) = z_0$ then form a level curve of $f$. The gradient vector starts at the point $(a, b)$, which lies on this level curve, and points in the direction in which the function has its largest rate of change at that point.

[Figure, two panels with axes x and y: the graph of $z = x^2 + xy + y^2$, and the graph of its level curves.]

In the figure above, the first panel is the graph of the function $z = f(x, y) = x^2 + xy + y^2$, and the second panel is the graph of its level curves. The gradient vector at a point points in the direction of largest slope, which on a level-curve graph is the direction in which the level curves are most tightly packed. For a more direct visualization, we can look at the two-dimensional graph of the level curves:

[Figure: level curves of $z = x^2 + xy + y^2$ in the x-y plane, with x and y ranging from about $-0.4$ to $0.4$.]

At any chosen point on this graph, getting the largest slope means choosing the direction in which the level curves are tightest, that is, the direction in which the distance from one curve to the next is shortest. Since the difference in the value of $f$ between two neighboring level curves is always the same (it depends on the values of $k$ chosen for drawing the curves $f(x, y) = k$), the shortest path from one curve to the next is along the direction normal to the tangent line of the curve at $(a, b)$. That is, the gradient vector at $(a, b)$ should be normal to the tangent line of the level curve of $f(x, y)$ through $(a, b)$.
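The claim that the gradient is normal to the level curve can be tested on the example $z = x^2 + xy + y^2$. The sketch below is only an illustration under stated assumptions: the point $(0.3, 0.2)$ is an arbitrary choice, and the helper names are hypothetical. It finds a second point on the same level curve a small horizontal step $h$ away, and measures the cosine of the angle between the gradient and the secant joining the two points. As $h$ shrinks, the secant approaches the tangent line, so the cosine should approach 0.

```python
import math

def f(x, y):
    return x**2 + x*y + y**2

def gradient(x, y):
    # <f_x, f_y> for f(x, y) = x^2 + x*y + y^2.
    return 2*x + y, x + 2*y

def nearby_point_on_level_curve(a, b, h):
    # Move x by h, then solve y^2 + x*y + (x^2 - k) = 0 for y with
    # k = f(a, b), keeping the root closest to b so that we stay on
    # the same branch of the level curve.
    k = f(a, b)
    x = a + h
    disc = 4*k - 3*x**2
    roots = [(-x + sign*math.sqrt(disc)) / 2 for sign in (1, -1)]
    y = min(roots, key=lambda r: abs(r - b))
    return x, y

a, b = 0.3, 0.2          # arbitrary sample point for this sketch
gx, gy = gradient(a, b)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    x, y = nearby_point_on_level_curve(a, b, h)
    sx, sy = x - a, y - b   # secant along the level curve
    cosine = (gx*sx + gy*sy) / (math.hypot(gx, gy) * math.hypot(sx, sy))
    print(f"h = {h:g}   cos(angle between gradient and secant) = {cosine:.6f}")
```

The printed cosines shrink roughly in proportion to $h$, which is the behavior expected if the gradient is perpendicular to the tangent line of the level curve.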
We can predict further that this result applies in any number of dimensions, and we shall attempt to prove it. The method we used for a function $z = f(x, y)$ carries over directly to a function of more variables. Let $w = f(\mathbf{x})$ where $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_n)$, and let $w$ be at least once differentiable. Then the gradient vector of $w$ is

$$\nabla w = \langle f_{x_1}, f_{x_2}, f_{x_3}, \ldots, f_{x_n} \rangle,$$

where the gradient of $w$ is denoted with the operator $\nabla$ as $\nabla w$, pronounced "del w".

We get the level sets of $w$ by setting $w$ equal to some constant $k$; thus a level set of $w$ is expressed as $k = f(x_1, x_2, x_3, \ldots, x_n)$. Now let each $x_i$ depend on a parameter $t$. According to the chain rule, the total derivative of $w$ is

$$\frac{dw}{dt} = \frac{\partial w}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial w}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial w}{\partial x_n}\frac{dx_n}{dt} = \nabla w \cdot \left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle.$$

The second vector, $\left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle$, indicates how each $x_i$ changes as $t$ changes, so it follows a "trace" of the point $(x_1, x_2, x_3, \ldots, x_n)$: it shows how the position of the point moves, which in turn changes the value of $w$, but the "trace" itself does not show the change in $w$. For a better visualization, when $w$ is a function of two variables $x$ and $y$, the vector $\left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle$ describes how a point $(x, y)$ moves in the x-y plane, and the value of $w$ varies according to this position.

Now keep the "trace" inside a level set of $w$, which is an $(n-1)$-dimensional set; the vector $\left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle$ is then tangent to the level set. Since $w = k$ for some constant $k$ along the trace,

$$\frac{dw}{dt} = 0 = \nabla w \cdot \left\langle \frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt} \right\rangle.$$

The dot product of the gradient vector $\nabla w$ and the "trace" vector on the level set is zero; thus the gradient vector is normal to the level set.

The following figures show the gradient vectors of the function $z = x^2 + y^2$ at different points. Note that since this surface is the simplest quadric, a circular paraboloid, its level curves are circles of different radii centered at the origin.
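For $z = x^2 + y^2$ the chain-rule argument can be tried out directly. The sketch below is an illustration only; the two paths, the radius 1.5, and the function names are assumptions made for the example. It estimates the velocity vector $\langle dx/dt, dy/dt \rangle$ by central differences and forms its dot product with the gradient, once for a path that stays on a level circle and once, for contrast, for a spiral that leaves it.

```python
import math

def grad(p):
    # Gradient of z = x^2 + y^2 at the point p = [x, y] is <2x, 2y>.
    return [2*c for c in p]

def circle(t, radius=1.5):
    # A path that stays on the level curve x^2 + y^2 = radius^2.
    return [radius*math.cos(t), radius*math.sin(t)]

def spiral(t):
    # A path that crosses level curves: here z = (1 + t)^2 along the path.
    return [(1 + t)*math.cos(t), (1 + t)*math.sin(t)]

def velocity(path, t, h=1e-6):
    # Central-difference estimate of <dx/dt, dy/dt> along the path.
    ahead, behind = path(t + h), path(t - h)
    return [(p - q) / (2*h) for p, q in zip(ahead, behind)]

def dz_dt(path, t):
    # Chain rule: dz/dt = (gradient of z) . (velocity of the path).
    return sum(g*v for g, v in zip(grad(path(t)), velocity(path, t)))

for t in (0.0, 0.7, 2.0, 4.5):
    print(f"t = {t}:  on the circle dz/dt = {dz_dt(circle, t):.2e},"
          f"  on the spiral dz/dt = {dz_dt(spiral, t):.6f}")
```

On the circle the dot product vanishes up to rounding error, so the gradient is perpendicular to the velocity. On the spiral it matches $2(1 + t)$, the derivative of $(1 + t)^2$, because that path does change the value of $z$.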
(Due to technology limitations, I am not able to show its level curves in the graph.)

[Figure: the gradient vector at the same point, viewed in the x-y plane and on the surface.]

As we move the trace point, the gradient changes, and its direction is always normal to the level curves of $z$. (Recall that the level curves of $z$ are circles.)

Deriving the vector of direction of change; the general, imperfect gradient: directional gradient

The gradient vector at a point of a function $w$ indicates the direction from this point in which $w$ has the largest rate of change. To indicate the change of $w$ in any direction, we introduce a new vector called the directional gradient. To make the derivation easier to visualize, we first work with a function of two variables, whose graph is a surface in space; later we shall attempt to extend the idea to $n$ dimensions.

Let $z$ be a function of two variables $x$ and $y$, where $z = f(x, y)$. Assume that $f$ is continuous everywhere and differentiable; then all directional derivatives at a point $(x, y, z)$ on the surface exist. At a point $(x, y, z)$ on the surface, and for a unit vector $\vec{u} = \langle a, b, c \rangle$, there is a vector $\vec{d}$ that indicates the rate of change of $x$, $y$ and $z$ at the point along that direction. Put simply, the vector $\vec{d}$ indicates how the surface is growing in the direction of $\vec{u}$. From this definition, we know the following about $\vec{d}$:

1. $\vec{d} = n\vec{u} = n \cdot \langle a, b, c \rangle$ for some scalar $n$, since the two vectors are in the same direction, that is, $\vec{u} \parallel \vec{d}$.

2. $\vec{d} = \langle x, y, D_u f(x, y) \rangle$, where the first two entries denote the x and y components of $\vec{d}$. Since $\vec{d}$ indicates the rate of change of the function according to changes in $x$ and $y$, the rate of change of $z$ in the direction of $\vec{u}$ is represented by the directional derivative of $f(x, y)$ in the direction of $\vec{u}$, which is $D_u f(x, y)$.
If $z = f(x, y)$ is a continuous function in a neighborhood of $(x, y, z)$, and $\vec{v}$ is a nonzero vector in space with components $\langle l, m, p \rangle$ (the third component is written $p$ here so that it is not confused with the scalar $n$ in $\vec{d} = n\vec{u}$), then the directional gradient vector can be derived as follows. The unit vector $\vec{u}$ in the direction of $\vec{v}$ is

$$\vec{u} = \frac{\vec{v}}{|\vec{v}|} = \left\langle \frac{l}{\sqrt{l^2 + m^2 + p^2}}, \frac{m}{\sqrt{l^2 + m^2 + p^2}}, \frac{p}{\sqrt{l^2 + m^2 + p^2}} \right\rangle = \langle a, b, c \rangle.$$

Since $\vec{d} = \langle x, y, D_u f(x, y) \rangle$ for some $x$ and $y$, and $\vec{d} = n\vec{u} = n \cdot \langle a, b, c \rangle$, we have the following relationships:

$$x = n \cdot a, \qquad y = n \cdot b, \qquad D_u f(x, y) = n \cdot c.$$

Therefore, provided $c \neq 0$, we can express $n$ as $n = \dfrac{D_u f(x, y)}{c}$. Substituting into the first two equations, we get

$$x = \frac{D_u f(x, y)}{c} \cdot a, \qquad y = \frac{D_u f(x, y)}{c} \cdot b.$$

Thus, the directional gradient is

$$\vec{d} = \left\langle \frac{D_u f(x, y)}{c} \cdot a, \; \frac{D_u f(x, y)}{c} \cdot b, \; D_u f(x, y) \right\rangle,$$

where $a = \dfrac{l}{\sqrt{l^2 + m^2 + p^2}}$, $b = \dfrac{m}{\sqrt{l^2 + m^2 + p^2}}$, and $c = \dfrac{p}{\sqrt{l^2 + m^2 + p^2}}$.

The above figure shows the $f_x$ tangent line at a point on the surface $z = \dfrac{7xy}{e^{x^2 + y^2}}$. In the figure, the blue line is the line that contains the directional gradient vector at that point; the red line is the projection onto the surface of the line that contains the unit vector $\vec{u}$; and in this special case, $\vec{u} = \hat{i}$, the unit vector along the x-axis. Note that $c = 0$ for this $\vec{u}$, so the quotient form above does not apply to it directly.

Note especially that a directional gradient vector at a point is not always in the tangent plane at that point. The tangent plane at a point is the plane determined by the $f_x$ tangent line and the $f_y$ tangent line at the point. In other words, the tangent plane at a point is determined by two directional gradient vectors, one in the direction of the x-axis, the other in the direction of the y-axis. The tangent plane at the point $(x_0, y_0, z_0)$ is expressed in the form

$$z - z_0 = f_x(x_0, y_0) \cdot (x - x_0) + f_y(x_0, y_0) \cdot (y - y_0).$$

Any two non-parallel directional gradients at a point on the surface span a plane, and since the direction of each directional gradient vector may vary, different pairs can form different planes.
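The tangent-plane equation can be tried out on the surface from the figure. The sketch below is illustrative only and rests on two assumptions: that the surface is $z = 7xy / e^{x^2 + y^2}$, and an arbitrarily chosen point $(x_0, y_0) = (0.5, 0.25)$. It builds the plane from the direction vectors $\langle 1, 0, f_x \rangle$ and $\langle 0, 1, f_y \rangle$ of the $f_x$ and $f_y$ tangent lines, and then measures how far the plane is from the surface a small step away from the point. The helper names are hypothetical.

```python
import math

def f(x, y):
    # Surface assumed from the figure: z = 7xy / e^(x^2 + y^2).
    return 7*x*y*math.exp(-(x**2 + y**2))

def f_x(x, y):
    # Partial derivative with respect to x.
    return 7*y*(1 - 2*x**2)*math.exp(-(x**2 + y**2))

def f_y(x, y):
    # Partial derivative with respect to y.
    return 7*x*(1 - 2*y**2)*math.exp(-(x**2 + y**2))

def cross(u, v):
    # Cross product of two 3-vectors.
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def tangent_plane(x0, y0):
    # z = z0 + f_x(x0, y0)*(x - x0) + f_y(x0, y0)*(y - y0)
    z0, sx, sy = f(x0, y0), f_x(x0, y0), f_y(x0, y0)
    return lambda x, y: z0 + sx*(x - x0) + sy*(y - y0)

x0, y0 = 0.5, 0.25                 # arbitrary point for this sketch
tx = (1.0, 0.0, f_x(x0, y0))       # direction of the f_x tangent line
ty = (0.0, 1.0, f_y(x0, y0))       # direction of the f_y tangent line
normal = cross(tx, ty)             # equals <-f_x, -f_y, 1>
plane = tangent_plane(x0, y0)

print("normal to the tangent plane:", normal)
for h in (1e-1, 1e-2, 1e-3):
    gap = abs(f(x0 + h, y0 + h) - plane(x0 + h, y0 + h))
    print(f"step h = {h:g}   |surface - plane| = {gap:.3e}")
```

The gap shrinks roughly like $h^2$, which is what one expects from a plane that matches the surface to first order at the point.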
The reason that the tangent plane at a point on a surface is defined in this way is briefly discussed in a separate paper (likely the next one after this).