Mathematics of Data Models
CS2810, Lecture 14: Optimization
Basic vector calculus and notation
Professor Wu

Today we are going to learn basic optimization notation.

We learned in basic calculus that we can just take the derivative and set it to 0: a location where the derivative is 0 is a maximum or a minimum. In the example plotted here, the x that gives us the lowest value is x = 2.

Example problem: We are going to fence in a rectangular field. Looking at the field from above, the vertical sides cost $10/ft, the bottom costs $2/ft, and the top costs $7/ft. If we have $700, determine the dimensions of the field that maximize the enclosed area. (A worked solution appears at the end of this section.)

Although finding the point where the gradient is 0 is a great way to find the optimal solution, you cannot always do it: for more complex equations it is not immediately obvious how to set the derivative to 0 and solve. Instead, we start by picking a random point, say α = 2.

Suppose the error function in this example is f(α) = α², so starting at α = 2 gives us an error of 4. Picture a little person at the current location α = 2 who wants to walk toward the next location. What should we put into the box to get the next α?

    α_next = α_now - [ ? ]

Starting from α = 2, which way should we walk? It turns out that looking at the derivative at α = 2 tells us exactly which direction to go: f'(2) = 4.

First attempt: subtract the full derivative.

    α_next = α_now - 4, so -2 = 2 - 4

Oh no: we were walking in the right direction, but we walked too far. The solution is to walk a little less by multiplying the 4 by a small step size η = 0.2:

    α_next = α_now - η · f'(α_now)

If you are not getting a smaller value, your step size is too big. If the value gets bigger after you take a step, η is way too big; keep making it smaller until you get a smaller objective. The step from α = 2 all the way to α = -2 was too big, so we did not get a smaller value; that is because we went too far. Let's take a smaller step:

    α_next = α_now - 0.2 · f'(α_now), so 1.2 = 2 - 0.2 · (4)

Moving from α = 2 to α = 1.2, we are now walking toward a lower error. The derivative tells you which direction to go, and if you take a small enough step you will always get a lower value until you reach the optimum.

If we repeat this process, we will slowly walk toward α = 0:

    1.2 = 2 - 0.2 · (4)
    Next step: 0.72 = 1.2 - 0.2 · (2.4)
    Next step: 0.43 = 0.72 - 0.2 · (1.44)

This iterative solver is one of the most used and most famous algorithms in machine learning. It is called Gradient Descent. (A small code sketch of the update rule appears at the end of this section.)

In case you were wondering, if we had started at α = -2, the derivative there is f'(-2) = -4, so the person would have walked toward the right instead of the left:

    -1.2 = -2 + 0.2 · (4)
    Next step: -0.72 = -1.2 + 0.2 · (2.4)
    Next step: -0.43 = -0.72 + 0.2 · (1.44)

We can apply the same idea in higher dimensions. Solve this problem.
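Below is a minimal code sketch of the gradient descent update from the walkthrough above. It assumes the error function f(α) = α², which is implied by the slide's numbers (f'(2) = 4, f'(1.2) = 2.4, f'(0.72) = 1.44); the function and variable names are illustrative, not from the lecture.

```python
# Minimal 1-D gradient descent sketch for the lecture's example.
# Assumes the error function f(alpha) = alpha**2 implied by the slides.

def f(alpha):
    """Error (objective) we want to minimize."""
    return alpha ** 2

def f_prime(alpha):
    """Derivative of the error; its sign tells us which way to walk."""
    return 2 * alpha

def gradient_descent(alpha, eta=0.2, n_steps=5):
    """Repeatedly apply alpha_next = alpha_now - eta * f'(alpha_now)."""
    for step in range(n_steps):
        alpha = alpha - eta * f_prime(alpha)
        print(f"step {step + 1}: alpha = {alpha:.2f}, error = {f(alpha):.3f}")
    return alpha

# Starting at alpha = 2 reproduces the slides' steps: 1.2, 0.72, 0.43, ...
gradient_descent(2.0)

# Starting at alpha = -2, the derivative is negative, so the update walks
# right instead of left: -1.2, -0.72, -0.43, ...
gradient_descent(-2.0)
```

If η were set too large (for example η = 1), the first step would jump from α = 2 to α = -2 and the error would not shrink, which is exactly the "step size is too big" failure described above.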
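For the fence example near the top of this section, here is one way to work the "set the derivative to 0" recipe. The variable names are my own choice: x for the length of the top and bottom sides and y for the length of each vertical side; the lecture may label them differently.

```latex
% Worked sketch of the fence problem (x, y are assumed variable names).
% Cost: two vertical sides at $10/ft, bottom at $2/ft, top at $7/ft, $700 budget.
\begin{align*}
10(2y) + 2x + 7x &= 20y + 9x = 700
  &&\Rightarrow\quad y = \frac{700 - 9x}{20} \\
A(x) &= x\,y = \frac{700x - 9x^2}{20} \\
A'(x) = \frac{700 - 18x}{20} &= 0
  &&\Rightarrow\quad x = \frac{700}{18} \approx 38.9\ \text{ft},\quad
     y = \frac{700 - 350}{20} = 17.5\ \text{ft}
\end{align*}
```

Since A''(x) = -18/20 < 0, this critical point is a maximum, so under these assumptions the area is maximized at roughly 38.9 ft by 17.5 ft, about 680 square feet.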