Machine learning
Lecture – 1, 2
Syed Qamar Askari
Topics
• Univariate Simple Linear Regression
We’ll start with Regression
Prerequisite knowledge required
Linear function
• It draws a line
• Standard form: y = mx + b
• m is the slope
• b is the y-intercept
Prerequisite knowledge required
Quadratic function
• It draws a parabola
• Standard form: f(x) = a(x − h)² + k
• a positive: graph opens upward
• a negative: graph opens downward
• Line of symmetry at x = h
• Vertex is the point (h, k)
Prerequisite knowledge required
Cubic function
• Forms: f(x) = ax³ + bx² + cx + d
         f(x) = (x − a)³ − b
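To make these prerequisite forms concrete, here is a minimal Python sketch that evaluates each of them at a point. The coefficient values are arbitrary illustrations, not values from the slides.

```python
# Evaluate the three prerequisite function forms at a sample point.
# All coefficient values below are arbitrary examples.

def linear(x, m=2.0, b=1.0):
    """Line with slope m and y-intercept b: y = m*x + b."""
    return m * x + b

def quadratic(x, a=1.0, h=2.0, k=-3.0):
    """Parabola in the form f(x) = a*(x - h)**2 + k.
    Opens upward if a > 0, downward if a < 0; vertex at (h, k)."""
    return a * (x - h) ** 2 + k

def cubic(x, a=1.0, b=2.0, c=0.0, d=-1.0):
    """Cubic in expanded form f(x) = a*x**3 + b*x**2 + c*x + d."""
    return a * x ** 3 + b * x ** 2 + c * x + d

if __name__ == "__main__":
    for f in (linear, quadratic, cubic):
        print(f.__name__, f(1.5))
```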
Regression Types
• Linear and non-linear regression
• Simple and multiple regression
• Univariate and multivariate regression
Simple univariate linear regression
Example: House price prediction
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Based on the given data, can you predict the price of a 1600 square-foot house?
House price prediction – Solution (on board)
[Scatter plot of the training data from the table above: Size in feet² on the x-axis (0–2500) vs. Price ($) in 1000's on the y-axis (0–400), with the question: what is the price of a 1600 sq. ft. house?]
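The on-board solution fits a straight line to the four training points and reads off the prediction at 1600 ft². A minimal sketch of the same idea in Python, using NumPy's ordinary least-squares polynomial fit; the line it finds may differ slightly from the one drawn on the board.

```python
import numpy as np

# Training data from the slide: size in square feet vs. price in $1000's.
size = np.array([2104, 1416, 1534, 852], dtype=float)
price = np.array([460, 232, 315, 178], dtype=float)

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(size, price, deg=1)

# Predict the price of a 1600 square-foot house.
predicted = slope * 1600 + intercept
print(f"h(x) = {slope:.4f} * x + {intercept:.2f}")
print(f"Predicted price for 1600 sq. ft.: about ${predicted:.0f}k")
```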
How could a computer do it?
• Explanation of the following steps on the board
• A simple (solvable by a line) one-featured dataset example on the board
• Hypothesis, linear model, parameters (slope and y-intercept)
• Do some exercises with different combinations of parameters
• Cost/error function formulation
• To decrease the error and get a better-fitting model, show the relationship of the error with the slope and y-intercept one by one, and then with both simultaneously
• Give the concept of slope calculation using partial derivatives
• Gradient descent for simple linear regression
  • Partial derivatives of the error function w.r.t. the slope and y-intercept
  • Concept of the learning rate
  • Equations to update both parameters
  • Overall gradient descent algorithm
• Running a few iterations of gradient descent on a simple example
Regression – Working model
Training Set → Learning Algorithm → h
Size of house → h → Estimated price
How to make the function h?
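As a sketch of what the learned function h looks like in code, assuming the linear hypothesis introduced on the next slide, h(x) = θ0 + θ1·x; the parameter values here are made up for illustration (a real learning algorithm chooses them from the training set).

```python
def make_hypothesis(theta0, theta1):
    """Return the learned function h: size of house -> estimated price."""
    def h(x):
        return theta0 + theta1 * x
    return h

# Example with made-up parameters.
h = make_hypothesis(theta0=50.0, theta1=0.13)
print(h(1600))  # estimated price (in $1000's) for a 1600 sq. ft. house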
Regression – Line fitting
Hypothesis: h(x) = θ0 + θ1·x
Parameters: θ0 (y-intercept) and θ1 (slope)
How to choose the θ's?
[Three example plots of h(x) over 0 ≤ x ≤ 3 for different combinations of θ0 and θ1.]
How do we know how good the fit is?
Mean squared error
Cost function (mean squared error):
J(θ0, θ1) = (1/(2m)) · Σ (h(xi) − yi)²,  summed over the m training examples
(the factor 1/2 is a convention that simplifies the derivatives)
Idea: choose θ0, θ1 so that h(x) is close to y for our training examples (x, y).
Goal: minimize J(θ0, θ1).
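A minimal sketch of this cost function in Python, assuming the 1/(2m) convention written above (dropping the 1/2 only rescales J and does not move its minimum). The toy data points are illustrative, chosen to lie on y = x as in the θ1 sweep discussed below.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Mean squared error cost J(theta0, theta1) over the training set."""
    m = len(x)
    predictions = theta0 + theta1 * x          # h(x) for every example
    errors = predictions - y                   # h(xi) - yi
    return np.sum(errors ** 2) / (2 * m)

# Toy check: on data lying exactly on y = x, the cost of (0, 1) is zero.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(cost(0.0, 1.0, x, y))  # 0.0
print(cost(0.0, 0.5, x, y))  # > 0
```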
Understanding the role of slope
We want to fit a line to the following data points.
By fixing θ0 = 0 and varying θ1, we study the behavior of J.
[Left: the data points in the x–y plane (0 to 3 on both axes). Right: J plotted against θ1 over the range −0.5 to 2.5.]
Understanding the role of slope
(for fixed θ1 = 1, this is a function of x)   (function of the parameter θ1 = 1)
[Left: the data with the line for θ1 = 1. Right: the corresponding point on the J(θ1) curve.]
Understanding the role of slope
(function of the parameter θ1 = 0.5)
[Left: the data with the line for θ1 = 0.5. Right: the corresponding point on the J(θ1) curve.]
Understanding the role of slope
(function of the parameter θ1 = 0)
[Left: the data with the horizontal line for θ1 = 0. Right: the corresponding point on the J(θ1) curve.]
Understanding the role of slope
(function of the parameter θ1 with different values)
[The J(θ1) curve traced out over the range −0.5 to 2.5.]
So the optimal value of θ1 is 1.
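A minimal sketch of this sweep, assuming the same illustrative points on y = x used above; it evaluates J for several θ1 values with θ0 fixed at 0 and reports the best one.

```python
import numpy as np

# Illustrative data lying on y = x (an assumption consistent with the
# slide's conclusion that theta1 = 1 is optimal when theta0 = 0).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost(theta0, theta1):
    errors = theta0 + theta1 * x - y
    return np.sum(errors ** 2) / (2 * len(x))

# Fix theta0 = 0 and sweep theta1 over the range shown on the slide.
theta1_grid = np.arange(-0.5, 2.51, 0.25)
costs = [cost(0.0, t1) for t1 in theta1_grid]

for t1, j in zip(theta1_grid, costs):
    print(f"theta1 = {t1:5.2f}   J = {j:.4f}")
print("Best theta1:", theta1_grid[int(np.argmin(costs))])  # -> 1.0
```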
Similarly, fixing θ1 and varying θ0 also gives this kind of curve.
What if we change both parameters simultaneously?
We end up with a landscape.
Landscape view of J w.r.t. both parameters
[3-D surface plot of J(θ0, θ1) over the θ0–θ1 plane.]
Idea
We'll generate a random solution and then search the landscape to reach the global minimum.
[Two views of the J(θ0, θ1) surface.]
How do we search such complex, unpredictable landscapes?
• Hill climbing
• Simulated annealing
• Gradient descent
Slope of J at some point in space
[Figure: the graph of a function, drawn in black, and a tangent line to that function, drawn in red. The slope of the tangent line is equal to the derivative of the function at the marked point.]
Partial derivatives of J w.r.t. both parameters
w.r.t. θ0:  ∂J/∂θ0 = (1/m) · Σ (h(xi) − yi)
w.r.t. θ1:  ∂J/∂θ1 = (1/m) · Σ (h(xi) − yi) · xi
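A quick way to sanity-check these expressions is to compare them against numerical (finite-difference) derivatives of J; a minimal sketch, reusing the cost function from earlier on the same illustrative data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

def cost(theta0, theta1):
    errors = theta0 + theta1 * x - y
    return np.sum(errors ** 2) / (2 * m)

def analytic_gradient(theta0, theta1):
    errors = theta0 + theta1 * x - y
    return np.sum(errors) / m, np.sum(errors * x) / m

def numeric_gradient(theta0, theta1, eps=1e-6):
    d0 = (cost(theta0 + eps, theta1) - cost(theta0 - eps, theta1)) / (2 * eps)
    d1 = (cost(theta0, theta1 + eps) - cost(theta0, theta1 - eps)) / (2 * eps)
    return d0, d1

print(analytic_gradient(0.5, 0.5))
print(numeric_gradient(0.5, 0.5))  # should match to several decimal places
```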
Gradient descent algorithm
repeat until convergence {
    temp0 := θ0 − α · ∂J/∂θ0
    temp1 := θ1 − α · ∂J/∂θ1
    θ0 := temp0
    θ1 := temp1
}
(update both parameters simultaneously)
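A minimal sketch of the whole algorithm for simple linear regression, using the partial derivatives above; the data, learning rate, and iteration count are illustrative choices, not values from the slides.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0            # start from an arbitrary initial solution
    for _ in range(iterations):
        errors = theta0 + theta1 * x - y
        grad0 = np.sum(errors) / m        # dJ/d(theta0)
        grad1 = np.sum(errors * x) / m    # dJ/d(theta1)
        # Simultaneous update: both gradients use the old parameter values.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Illustrative run on points lying on y = x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # close to (0.0, 1.0)
```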
Role of alpha and convergence
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
At a local optimum the slope of J is zero, so the update leaves the current value of θ unchanged.
Gradient descent can converge to a local minimum, even with the learning rate α fixed.
As we approach a local minimum, gradient descent will automatically take smaller steps, so there is no need to decrease α over time.
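A minimal sketch of the effect of α on the same illustrative data (a self-contained variant of the routine above); the specific learning rates are arbitrary, chosen to show a too-small, a reasonable, and a too-large step size.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

def run(alpha, iterations=50):
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = theta0 + theta1 * x - y
        theta0 -= alpha * np.sum(errors) / m
        theta1 -= alpha * np.sum(errors * x) / m
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

print(run(alpha=0.001))  # too small: cost still far from 0 after 50 steps
print(run(alpha=0.1))    # reasonable: cost close to 0
print(run(alpha=1.0))    # too large: cost blows up (diverges)
```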
Demonstration of convergence
[A sequence of paired plots, one pair per gradient descent step: the left panel shows the current hypothesis h(x) over the training data (for the fixed θ0, θ1 of that step, a function of x); the right panel shows J as a function of the parameters θ0, θ1, with the current point moving toward the minimum.]
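In code, the same demonstration amounts to printing the cost after each update and watching it shrink; a minimal sketch on the illustrative data used throughout, with an arbitrary starting point.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)
alpha = 0.1
theta0, theta1 = -1.0, 2.0   # arbitrary starting point

for i in range(10):
    errors = theta0 + theta1 * x - y
    print(f"iter {i}: theta0={theta0:6.3f} theta1={theta1:6.3f} "
          f"J={np.sum(errors ** 2) / (2 * m):.4f}")
    # Simultaneous update of both parameters.
    theta0, theta1 = (theta0 - alpha * np.sum(errors) / m,
                      theta1 - alpha * np.sum(errors * x) / m)
```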