Math 1321 Week 8 Lab Worksheet
Due Thursday 03/07

1. Find $\nabla f(x, y, z)$ if $f(x, y, z) = x^2 + 3xz + z^2 y$.

Solution:

$$\nabla f(x, y, z) = \left\langle \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right\rangle = \left\langle 2x + 3z,\ z^2,\ 3x + 2zy \right\rangle = (2x + 3z)\,\mathbf{i} + z^2\,\mathbf{j} + (3x + 2zy)\,\mathbf{k}$$

2. Consider Newton's second law $\mathbf{F} = m\mathbf{a}$. Suppose that the force is $\mathbf{F} = -\nabla U$, where $U = U(\mathbf{r})$. Let $\mathbf{r} = \mathbf{r}(t) = [x(t), y(t), z(t)]$ be the trajectory along which Newton's law is satisfied (i.e. $\mathbf{a}(t) = \mathbf{r}''(t)$). Prove that the quantity $E = mv^2/2 + U(\mathbf{r})$ is constant with respect to $t$, that is, $dE/dt = 0$, where $v = \|\mathbf{r}'(t)\|$. This constant is called the total energy of the particle. (Notice that $mv^2/2$ represents the kinetic energy and $U$ represents the potential energy of the particle.)

Solution: We want to show $dE/dt = 0$, where

$$E = \frac{mv^2}{2} + U(\mathbf{r}), \qquad \frac{dE}{dt} = \frac{m}{2}\frac{d}{dt}\left(v^2\right) + \frac{d}{dt}U.$$

Now notice that we can write $v^2 = \mathbf{v} \cdot \mathbf{v}$. Hence,

$$\frac{d}{dt}\left(v^2\right) = \frac{d}{dt}(\mathbf{v} \cdot \mathbf{v}) = \mathbf{v}' \cdot \mathbf{v} + \mathbf{v} \cdot \mathbf{v}' = 2\,\mathbf{v} \cdot \mathbf{v}' = 2\,\mathbf{v} \cdot \mathbf{a}.$$

Also, by the chain rule,

$$\frac{d}{dt}U = U_x\, x'(t) + U_y\, y'(t) + U_z\, z'(t) = \mathbf{r}' \cdot \nabla U = \mathbf{v} \cdot \nabla U.$$

This gives

$$\frac{dE}{dt} = m\,\mathbf{v} \cdot \mathbf{a} + \mathbf{v} \cdot \nabla U = \mathbf{v} \cdot (m\mathbf{a} - \mathbf{F}) = 0,$$

where we used $\mathbf{F} = -\nabla U$ and the fact that $m\mathbf{a} = \mathbf{F}$ along the trajectory. So the total energy is conserved along the trajectory of the motion.

3. Gradient Descent: Gradient descent (also known as the steepest descent method) is an iterative method used to find the minimum of a function $F$. The method is given an initial point $\mathbf{r}_0$, and it follows the negative of the gradient in order to move the point toward a critical point, which is hopefully the desired local minimum. The equation below is the general formulation of the iterative procedure (update equation):

$$\mathbf{r}_{n+1} = \mathbf{r}_n - \mu \nabla F(\mathbf{r}_n)$$

Gradient descent is popular for very large-scale optimization problems because it is easy to implement and each iteration is computationally cheap. In the following questions, you will verify the gradient descent algorithm for a simple quadratic bowl function and make inferences about the gradient descent step size $\mu$.

(a) Consider the convex function $f(x, y) = x^2 + y^2$.
We can rewrite the gradient descent update equation in terms of $x$ and $y$ by using the fact that $\mathbf{r}_n = (x_n, y_n)$. This gives the following scalar update equations:

$$x_{n+1} = x_n - \mu \frac{\partial}{\partial x} f(x, y), \qquad y_{n+1} = y_n - \mu \frac{\partial}{\partial y} f(x, y)$$

Derive the gradient descent update equations for the $x$ and $y$ components for the function given by $f(x, y)$.

Solution: Since $\partial f/\partial x = 2x$ and $\partial f/\partial y = 2y$,

$$x_{n+1} = x_n - 2\mu x_n, \qquad y_{n+1} = y_n - 2\mu y_n$$

(b) Let $\mathbf{r}_0 = (x_0, y_0) = (5, 3)$ be the initial values. Compute the first 5 iterations of the $x$ and $y$ updates (i.e. $n = 0, 1, 2, 3, 4$) when $\mu = 0.25$. Can you guess what the minimum of the function is from your calculations?

Solution: The table below continues past the requested $n = 4$ to make the trend clear.

  n      x_n      y_n
  0   5.0000   3.0000
  1   2.5000   1.5000
  2   1.2500   0.7500
  3   0.6250   0.3750
  4   0.3125   0.1875
  5   0.1563   0.0938
  6   0.0781   0.0469
  7   0.0391   0.0234
  8   0.0195   0.0117

From the values computed, we can infer that the minimum of $f(x, y)$ occurs at $(x, y) = (0, 0)$, since $(x_n, y_n) \to (0, 0)$.

(c) Repeat (b) for $\mu = 0.45$ and $\mu = 0.75$. What can you say about the choice of the step size $\mu$?

Solution: For $\mu = 0.45$ we have

  n      x_n      y_n
  0   5.0000   3.0000
  1   0.5000   0.3000
  2   0.0500   0.0300
  3   0.0050   0.0030
  4   0.0005   0.0003
  5   0.0000   0.0000
  6   0.0000   0.0000
  7   0.0000   0.0000
  8   0.0000   0.0000
  9   0.0000   0.0000

and for $\mu = 0.75$ we have

  n      x_n      y_n
  0   5.0000   3.0000
  1  -2.5000  -1.5000
  2   1.2500   0.7500
  3  -0.6250  -0.3750
  4   0.3125   0.1875
  5  -0.1563  -0.0938
  6   0.0781   0.0469
  7  -0.0391  -0.0234
  8   0.0195   0.0117
  9  -0.0098  -0.0059

Increasing the step size $\mu$ can speed up convergence: $\mu = 0.45$ reaches the minimum much faster than $\mu = 0.25$. However, if the step size is too large, as with $\mu = 0.75$, the iterates overshoot the minimum and oscillate around it, which slows convergence. Gradient descent is therefore quite sensitive to the choice of $\mu$.
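The iterations in parts (b) and (c) are easy to reproduce in code. Below is a minimal Python sketch (the function name `grad_descent` is an illustrative choice, not from the worksheet) that applies the derived updates $x_{n+1} = x_n - 2\mu x_n$ and $y_{n+1} = y_n - 2\mu y_n$ starting from $(5, 3)$ and prints the trajectory for each of the three step sizes:

```python
def grad_descent(r0, mu, steps):
    """Run gradient descent on f(x, y) = x^2 + y^2 from the point r0."""
    x, y = r0
    traj = [(x, y)]
    for _ in range(steps):
        x = x - mu * 2 * x   # x_{n+1} = x_n - mu * (df/dx) = x_n - 2*mu*x_n
        y = y - mu * 2 * y   # y_{n+1} = y_n - mu * (df/dy) = y_n - 2*mu*y_n
        traj.append((x, y))
    return traj

for mu in (0.25, 0.45, 0.75):
    traj = grad_descent((5.0, 3.0), mu, 9)
    print(f"mu = {mu}:", [(round(x, 4), round(y, 4)) for x, y in traj])
```

Running this reproduces the three tables above, including the sign-alternating (oscillating) iterates for $\mu = 0.75$.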
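The conservation result of Problem 2 can also be checked numerically. The sketch below is illustrative only: the quadratic potential $U(\mathbf{r}) = k\|\mathbf{r}\|^2/2$, the constants `m` and `k`, the initial conditions, and the choice of the velocity Verlet integrator are all assumptions made for the demonstration, not part of the worksheet. It integrates $m\mathbf{r}'' = -\nabla U$ and confirms that $E = mv^2/2 + U(\mathbf{r})$ stays constant up to discretization error.

```python
import numpy as np

m, k = 2.0, 3.0   # assumed mass and spring constant for the demo

def grad_U(r):
    # gradient of the assumed potential U(r) = 0.5 * k * |r|^2
    return k * r

def energy(r, v):
    # total energy E = m v^2 / 2 + U(r)
    return 0.5 * m * np.dot(v, v) + 0.5 * k * np.dot(r, r)

# Integrate Newton's law m r'' = -grad U with velocity Verlet,
# which tracks the conserved energy closely (plain Euler would drift).
r = np.array([1.0, 0.0, 0.5])
v = np.array([0.0, 1.0, 0.0])
dt = 1e-3
E0 = energy(r, v)
for _ in range(10_000):
    a = -grad_U(r) / m
    r = r + v * dt + 0.5 * a * dt**2
    v = v + 0.5 * (a - grad_U(r) / m) * dt

print(abs(energy(r, v) - E0))   # tiny: dE/dt = 0 up to discretization error
```

The printed energy drift is on the order of the integrator's error, consistent with $dE/dt = 0$ along the exact trajectory.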