
Mathematics of Data Models
CS2810: Lecture 14
Optimization
Basic vector calculus and notations
By Professor Wu
Today, we are going to learn the
basic optimization notation
We learned from basic calculus that we can just take the derivative and set it to 0.
We learned in calculus that a location where the derivative is 0 is a maximum or a minimum.
Here, the x that gives us the lowest value is x = 2.
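As a quick worked check, here is that recipe in code. This is a sketch of my own, not from the lecture: the exact curve on the slide isn't shown in the text, so I assume a simple quadratic f(x) = (x − 2)² whose lowest value sits at x = 2.

import sympy as sp

x = sp.symbols('x')
f = (x - 2)**2                                 # assumed curve with its minimum at x = 2
critical_points = sp.solve(sp.diff(f, x), x)   # take the derivative and set it to 0
print(critical_points)                         # [2] -> the lowest value is at x = 2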
Example Problem: We are going to fence in a rectangular field. If we look at the field from above, the cost of the vertical sides is $10/ft, the cost of the bottom is $2/ft, and the cost of the top is $7/ft. If we have $700, determine the dimensions of the field that will maximize the enclosed area.
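One way to set this up with the derivative-equals-zero recipe (a sketch of my own, not the lecture's worked solution; the variable names x for the top/bottom length and y for the vertical side length are my choice):

import sympy as sp

x, y = sp.symbols('x y', positive=True)    # x: top/bottom length (ft), y: vertical side length (ft)
budget = sp.Eq(2*10*y + 2*x + 7*x, 700)    # two vertical sides at $10/ft, bottom at $2/ft, top at $7/ft
y_of_x = sp.solve(budget, y)[0]            # y = (700 - 9x) / 20
area = x * y_of_x                          # enclosed area as a function of x alone
x_best = sp.solve(sp.diff(area, x), x)[0]  # set dA/dx = 0
y_best = y_of_x.subs(x, x_best)
print(x_best, y_best)                      # 350/9 (about 38.9 ft) by 35/2 (17.5 ft)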
Although finding the gradient and setting it to 0 is a great way to find the optimal solution, you cannot always do that. For more complex equations, it is not immediately obvious how to solve for the point where the derivative is 0.
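For example (an illustration of my own, not from the slides): if f(x) = x² + eˣ, then f′(x) = 2x + eˣ, and there is no simple closed-form x that makes f′(x) = 0, so we need an iterative way to walk toward the minimum instead.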
We start by picking a random point, let's say α = 2. This gives us an error of 4.
We have a little person at the current location α = 2, and we want to walk left toward our next location. What should we put into the box to get the next α location?
α next = α now − [ ? ]
Starting from α = 2, which way should we walk? It turns out that the derivative at α = 2 tells us exactly which direction to go; the derivative there is 4.
α next = α now − 4
−2 = 2 − 4
Oh no, we were walking in the right direction, but we walked too far. The solution is to walk a little less by multiplying the 4 by a small step size η = 0.2.
α next = α now − 0.2 (4)
If you are not getting a smaller value, your step size is too big. If the value gets bigger after you take a step, η is way too big; keep making it smaller until you get a smaller objective.
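A minimal sketch of that rule (my own illustration; it assumes the error curve f(α) = α², which matches the numbers on these slides, and the halving factor is my choice, not the lecture's):

def shrink_eta(alpha, eta, f, df):
    # Halve the step size until one step actually lowers the objective f.
    while eta > 1e-12 and f(alpha - eta * df(alpha)) >= f(alpha):
        eta = eta / 2
    return eta

# With the assumed error curve f(α) = α² and its derivative 2α:
print(shrink_eta(2.0, 1.0, lambda a: a**2, lambda a: 2*a))   # 1.0 overshoots (2 -> -2), prints 0.5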
That first step, from α = 2 all the way to α = −2, was too big, so you didn't get a smaller value; that's because you went too far. Let's take a smaller step.
α next = α now − 0.2 (4)
1.2 = 2 − 0.2 (4)
We are now walking closer toward a lower error.
The derivative tells you which direction to go. If you take a small enough step, you will always get a lower value until you reach the optimum.
If we repeat this process, we will slowly walk toward α = 0.
α next = α now − 0.2 (4)
1.2 = 2 − 0.2 (4)
Next step: 0.72 = 1.2 − 0.2 (2.4)
Next step: 0.43 = 0.72 − 0.2 (1.44)
This iterative solver is one of the most widely used and famous algorithms in machine learning. It is called gradient descent.
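Here is the whole walk as a small program. This is a sketch of my own, not code from the lecture: it assumes the error curve is f(α) = α², which matches the slide's numbers (the derivative 2α equals 4, 2.4, and 1.44 at α = 2, 1.2, and 0.72), and it uses the step size η = 0.2 from above.

def f(alpha):
    return alpha ** 2                    # assumed error curve: f(2) = 4

def df(alpha):
    return 2 * alpha                     # its derivative: df(2) = 4, df(1.2) = 2.4

def gradient_descent(alpha, eta=0.2, steps=5):
    for _ in range(steps):
        alpha = alpha - eta * df(alpha)  # α next = α now − η · f′(α now)
        print(round(alpha, 2))
    return alpha

gradient_descent(2.0)                    # prints 1.2, 0.72, 0.43, ... walking toward 0

Calling gradient_descent(-2.0) instead reproduces the −1.2, −0.72, −0.43 walk on the next slide.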
In case you were wondering, if we had started at α = −2: the derivative there is −4, so subtracting 0.2 (−4) means adding 0.8.
α next = α now − 0.2 (−4)
−1.2 = −2 + 0.2 (4)
Next step: −0.72 = −1.2 + 0.2 (2.4)
Next step: −0.43 = −0.72 + 0.2 (1.44)
The person would have walked toward the right instead of the left.
We can apply the same idea in higher dimensions.
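A sketch of what that looks like in two dimensions (my own illustration, not from the lecture; it assumes a simple bowl-shaped objective f(α, β) = α² + β², whose gradient is the vector (2α, 2β)):

def grad(alpha, beta):
    return 2 * alpha, 2 * beta           # gradient of the assumed bowl f(α, β) = α² + β²

def gradient_descent_2d(alpha, beta, eta=0.2, steps=5):
    for _ in range(steps):
        g_a, g_b = grad(alpha, beta)
        # Step against the gradient in each coordinate, just like the 1-D update.
        alpha, beta = alpha - eta * g_a, beta - eta * g_b
        print(round(alpha, 2), round(beta, 2))
    return alpha, beta

gradient_descent_2d(2.0, -1.0)           # both coordinates walk toward 0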
Solve this problem